Currently, 'systemd_unit.available' works correctly only when the provided
unit is present.
It raises an exception instead of returning a boolean
when the provided systemd unit is absent.
So, make it return a boolean in both cases.
Fixes https://github.com/scylladb/scylla/issues/9848
Closes #9849
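The fix can be sketched in Python (names and the lookup mechanism are illustrative of a Salt-style execution module, not the exact implementation):

```python
import subprocess

def available(name):
    """Return whether the systemd unit exists, instead of raising when it
    is absent (sketch)."""
    try:
        subprocess.run(["systemctl", "cat", name],
                       capture_output=True, check=True)
        return True
    except (subprocess.CalledProcessError, FileNotFoundError):
        # absent unit (or no systemctl on this host): report False, don't raise
        return False
```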
Move saving features to `system.local#supported_features`
to the point after passing all remote feature checks in
the gossiper, right before joining the ring.
This makes the `system.local#supported_features` column store the
advertised feature set. Leave a comment in the definition of
`system.local` schema to reflect that.
Since the column value is not actually used anywhere for now,
it shouldn't affect any tests or alter the existing behavior.
Later, we can optimize the gossip communication between nodes
in the cluster, removing the feature check altogether
in some cases (since the column value should now be monotonic).
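The new ordering can be sketched as follows (all names here are illustrative, not Scylla's actual API):

```python
def join_cluster(node, system_local):
    # Sketch: persist the advertised feature set only after every remote
    # feature check has passed, right before joining the ring.
    node.check_remote_features()      # raises on mismatch -> nothing saved
    system_local["supported_features"] = ",".join(sorted(node.features))
    node.join_ring()
```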
* manmanson/save_adv_features_v2:
db: save supported features after passing gossip feature check
db: add `save_local_supported_features` function
Commit dcc73c5d4e introduced a semaphore
for excluding concurrent recalculations - _reserve_recalculation_guard.
Unfortunately, the two places in the code which tried to take this
guard just called get_units() - which returns a future<units>, not
units - and never waited for this future to become available.
So this patch adds the missing "co_await" needed to wait for the
units to become available.
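The bug pattern translates directly to any async framework: the call that hands out units produces a future, and forgetting to await it means the guard is never actually held. A minimal asyncio analogue (not Seastar code):

```python
import asyncio

async def recalculate(sem, log, name):
    # The fix: actually wait for the unit. Without "await" here,
    # sem.acquire() only creates a coroutine and the guard is never taken.
    await sem.acquire()
    try:
        log.append(("enter", name))
        await asyncio.sleep(0)      # yield, as the real recalculation would
        log.append(("exit", name))
    finally:
        sem.release()

async def main():
    sem = asyncio.Semaphore(1)      # excludes concurrent recalculations
    log = []
    await asyncio.gather(recalculate(sem, log, "a"),
                         recalculate(sem, log, "b"))
    return log
```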
Fixes #9770.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211214122612.1462436-1-nyh@scylladb.com>
On newer versions of systemd-coredump, coredumps are handled in
systemd-coredump@.service, which may cause a timeout while running the
systemd unit, like this:
systemd[1]: systemd-coredump@xxxx.service: Service reached runtime time limit. Stopping.
To prevent that, we need to override TimeoutStartSec=infinity.
Fixes #9837
Closes #9841
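One way to apply such an override is a drop-in; the drop-in path below is illustrative, and the exact unit and location used by the Scylla packaging may differ:

```
# e.g. /etc/systemd/system/systemd-coredump@.service.d/timeout.conf
[Service]
TimeoutStartSec=infinity
```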
On CentOS8, mdmonitor.service does not work correctly when using
mdadm-4.1-15.el8.x86_64 and later versions.
Until we find a solution, let's pin the package version to an older one
which does not cause the issue (4.1-14.el8.x86_64).
Fixes #9540
Closes #9782
Along the way, our flat structure for docs was changed
to categorize the documents, but service_levels.md was forward-ported
later and missed the new directory structure, so it ended up
as a sole document in the top directory. Move it to where the other
similar docs live.
Message-Id: <68079d9dd511574ee32fce15fec541ca75fca1e2.1640248754.git.sarna@scylladb.com>
"
The token metadata and features should be kept on the query_processor itself,
so finally the "storage" API would look like this:
6 .query()
5 .get_max_result_size()
2 .mutate_with_triggers()
2 .cas()
1 .truncate_blocking()
The get_max_result_size() is probably also worth moving away from storage;
it seems to have nothing to do with it.
tests: unit(dev)
"
* 'br-query-processor-in-cql-statements' of https://github.com/xemul/scylla:
cql3: Generalize bounce-to-shard result creation
cql3: Get data dictionary directly from query_processor
create_keyspace_statement: Do not use proxy.shared_from_this()
cas_request: Make read_command() accept query_processor
select_statement: Replace all proxy-s with query_processor
create_|alter_table_statement: Make check_restricted_table_properties() accept query_processor
create_|alter_keyspace_statement: Make check_restricted_replication_strategy() accept query_processor
role_management_statement: Make validate_cluster_support() accept query_processor
drop_index_statement: Make lookup_indexed_table() accept query_processor
batch_|modification_statement: Make get_mutations accept query_processor
modification_statement: Replace most of proxy-s with query_processor
batch_statement: Replace most of proxy-s with query_processor
cql3: Make create_arg_types()/prepare_type() accept query_processor
cql3: Make .validate_while_executing() accept query_processor
cql3: Make execution stages carry query_processor over
cql3: Make .validate() and .check_access() accept query_processor
The current implementation of the Alternator expiration (TTL) feature
has each node scan for expired partitions in its own primary ranges.
This means that while a node is down, items in its primary ranges will
not get expired.
But we note that it doesn't have to be this way: If only a single node is
down, and RF=3, the items that node owns are still readable with QUORUM -
so these items can still be safely read and checked for expiration - and
also deleted.
This patch implements a fairly simple solution: When a node completes
scanning its own primary ranges, it also checks whether any of its *secondary*
ranges (ranges where it is the *second* replica) have their primary owner
down. For such ranges, this node will scan them as well. This secondary
scan stops if the remote node comes back up, but in that case it may
happen that both nodes will work on the same range at the same time.
The risks in that are minimal, though, and amount to wasted work and
duplicate deletion records in CDC. In the future we could avoid this by
using LWT to claim ownership on a range being scanned.
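The range-selection rule can be sketched like this (a simplification; the real replica sets come from the token metadata):

```python
def ranges_to_scan(ring, me, alive):
    """Which ranges this node's TTL scanner takes (sketch).
    ring: iterable of (range_id, ordered_replicas); alive: set of live nodes."""
    scan = []
    for rng, replicas in ring:
        if replicas[0] == me:
            # always scan our own primary ranges
            scan.append(rng)
        elif len(replicas) > 1 and replicas[1] == me and replicas[0] not in alive:
            # secondary range whose primary owner is down: cover for it
            scan.append(rng)
    return scan
```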
We have a new dtest (see a separate patch), alternator_ttl_tests.py::
TestAlternatorTTL::test_expiration_with_down_node, which reproduces this
and verifies this fix. The test starts a 5-node cluster, with 1000 items
with random tokens which are due to be expired immediately. The test
expects to see all items expiring ASAP, but when one of the five nodes
is brought down, this doesn't happen: Some of the items are not expired,
until this patch is used.
Fixes #9787
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211222131933.406148-1-nyh@scylladb.com>
Move saving features to `system.local#supported_features`
to the point after passing all remote feature checks in
the gossiper, right before joining the ring.
This makes the `system.local#supported_features` column store the
advertised feature set. Leave a comment in the definition of
`system.local` schema to reflect that.
Since the column value is not actually used anywhere for now,
it shouldn't affect any tests or alter the existing behavior.
Later, we can optimize the gossip communication between nodes
in the cluster, removing the feature check altogether
in some cases (since the column value should now be monotonic).
Tests: unit(dev)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
The main intention is actually to free the qp.proxy() from the
need to provide the get_stats() method.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
After previous patches there's a whole bunch of places that do
qp.proxy().data_dictionary()
while the data_dictionary is present on the query processor itself
and there's a public method to get one. So use it everywhere.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
prepare_schema_mutations() is not a sleeping method, so there's no
point in getting a call-local shared pointer to the proxy. A plain
reference is more than enough.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is the largest user of the proxy argument. Fix them all and
their callers (all sitting in the same .cc file).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This completes the batch_ and modification_statement rework.
Also touch the private batch_statement::read_command while at it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are some internal methods that take a proxy argument. Replace
most of them with query_processor, next patch will fix the rest --
those that interact with batch statement.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are some proxy arguments left in the batch_statement internals.
Fix most of them to be query_processors. A few remainders will come
later as they rely on other statements to be fixed.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The schema_altering_statement declares this pure virtual method. This
patch changes its first argument from proxy to query processor and
fixes the resulting compiler errors.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The batch_, modification_ and select_ statements get the proxy from the
query processor just to push it through the execution stage. Simplify
that by pushing the query processor itself.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is mostly a sed script that replaces methods' first argument
plus fixes of compiler-generated errors.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently this parse function reads only 100KB worth
of members in each iteration.
Since the default max_chunk_capacity is 128KB,
100KB underutilizes the chunk capacity, and it can
be safely increased to the max to reduce the number of
allocations and corresponding calls to read_exactly
for large arrays.
Expose utils::chunked_vector::max_chunk_capacity
so that the caller wouldn't have to guess this number,
and use it in parse().
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211222103126.1819289-2-bhalevy@scylladb.com>
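The sizing change above can be sketched as follows (illustrative names; the real code is the C++ parsing path over a Seastar input stream):

```python
MAX_CHUNK_CAPACITY = 128 * 1024   # exposed constant, instead of hard-coding 100KB

def read_chunks(stream, total):
    # Read full chunks of max_chunk_capacity: fewer iterations, fewer
    # allocations, fewer read_exactly calls for large arrays.
    chunks = []
    remaining = total
    while remaining > 0:
        n = min(MAX_CHUNK_CAPACITY, remaining)
        chunks.append(stream.read(n))
        remaining -= n
    return chunks
```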
Otherwise, it may trip an assertion when the underlying
file is closed, as seen in e.g.:
https://jenkins.scylladb.com/view/master/job/scylla-master/job/next/4318/artifact/testlog/x86_64_release/sstable_3_x_test.test_read_rows_only_index.4174.log
```
test/boost/sstable_3_x_test.cc(0): Entering test case "test_read_rows_only_index"
sstable_3_x_test: ./seastar/src/core/fstream.cc:205: virtual seastar::file_data_source_impl::~file_data_source_impl(): Assertion `_reads_in_progress == 0' failed.
Aborting on shard 0.
Backtrace:
0x22557e8
0x2286842
0x7f2799e99a1f
/lib64/libc.so.6+0x3d2a1
/lib64/libc.so.6+0x268a3
/lib64/libc.so.6+0x26788
/lib64/libc.so.6+0x35a15
0x222c53d
0x222c548
0xb929cc
0xc0b23b
0xa84bbf
0x24d0111
```
Decoded:
```
__GI___assert_fail at :?
~file_data_source_impl at ./build/release/seastar/./seastar/src/core/fstream.cc:205
~file_data_source_impl at ./build/release/seastar/./seastar/src/core/fstream.cc:202
std::default_delete<seastar::data_source_impl>::operator()(seastar::data_source_impl*) const at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/unique_ptr.h:85
(inlined by) ~unique_ptr at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/unique_ptr.h:361
(inlined by) ~data_source at ././seastar/include/seastar/core/iostream.hh:55
(inlined by) ~input_stream at ././seastar/include/seastar/core/iostream.hh:254
(inlined by) ~continuous_data_consumer at ././sstables/consumer.hh:484
(inlined by) ~index_consume_entry_context at ././sstables/index_reader.hh:116
(inlined by) std::default_delete<sstables::index_consume_entry_context<sstables::index_consumer> >::operator()(sstables::index_consume_entry_context<sstables::index_consumer>*) const at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/unique_ptr.h:85
(inlined by) ~unique_ptr at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/unique_ptr.h:361
(inlined by) ~index_bound at ././sstables/index_reader.hh:395
(inlined by) ~index_reader at ././sstables/index_reader.hh:435
std::default_delete<sstables::index_reader>::operator()(sstables::index_reader*) const at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/unique_ptr.h:85
(inlined by) ~unique_ptr at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/unique_ptr.h:361
(inlined by) ~index_reader_assertions at ././test/lib/index_reader_assertions.hh:31
(inlined by) operator() at ./test/boost/sstable_3_x_test.cc:4630
```
Test: unit(dev), sstable_3_x_test.test_read_rows_only_index(release X 10000)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211222132858.2155227-1-bhalevy@scylladb.com>
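The invariant that assertion enforces can be modeled simply: a data source must not be destroyed while reads are still in flight, so the consumer has to be closed (draining its reads) before teardown. A sketch, not the Seastar classes:

```python
class DataSource:
    """Models the file_data_source_impl destructor check (sketch)."""
    def __init__(self):
        self.reads_in_progress = 0
    def start_read(self):
        self.reads_in_progress += 1
    def finish_read(self):
        self.reads_in_progress -= 1
    def destroy(self):
        # mirrors: Assertion `_reads_in_progress == 0' failed
        assert self.reads_in_progress == 0

class IndexReader:
    def __init__(self, source):
        self.source = source
    def read_ahead(self):
        self.source.start_read()
    def close(self):
        # the point of the fix: drain outstanding reads before teardown
        while self.source.reads_in_progress:
            self.source.finish_read()
```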
Today, when resharding is interrupted, shutdown will not be clean
because stopped exception interrupts the shutdown process.
Let's handle stopped exception properly, to allow shutdown process
to run to completion.
Refs #9759
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211221175717.62293-1-raphaelsc@scylladb.com>
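The pattern is to treat the interruption as an expected outcome so the rest of shutdown runs to completion (illustrative, not the actual compaction-manager code):

```python
class Stopped(Exception):
    """Raised when an ongoing operation (e.g. resharding) is interrupted."""

def shutdown(steps):
    # run every shutdown step; a Stopped escaping from an interrupted
    # operation must not abort the rest of the sequence
    completed = []
    for name, step in steps:
        try:
            step()
        except Stopped:
            pass            # interruption is expected during shutdown
        completed.append(name)
    return completed
```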
Catches inconsistencies in LSA state.
Currently:
- discrepancy between segment set in _closed_segments and shard's
segment descriptors
- cross-shard segment references in _closed_segments
- discrepancy in _closed_occupancy stats and what's in segment
descriptors
- segments not present in _closed_segments but present in
segment descriptors
Refs https://github.com/scylladb/scylla/issues/9544
Closes #9834
* github.com:scylladb/scylla:
gdb: Introduce "scylla lsa-check"
gdb: Make get_base_class_offset() also see indirect base classes
The migration manager got a local storage proxy reference recently, but one
method still uses the global call. Fix it.
tests: unit(dev)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20211221120034.21824-1-xemul@scylladb.com>
"
The second patch in this series is a mechanical conversion of
reader_concurrency_semaphore to flat_mutation_reader_v2, and caller
updates.
The first patch is needed to pass the test suite, since without it a
real reader version conversion would happen on every entry to and exit
from reader_concurrency_semaphore, which is stressful (for example:
mutation_reader_test.test_multishard_streaming_reader reaches 8191
conversions for a couple of readers, which somehow causes it to catch
SIGSEGV in diverse and seemingly-random places).
Note that in a real workload it is unreasonable to expect readers being
parked in a reader_concurrency_semaphore to be pristine, so
short-circuiting their version conversions will be impossible and this
workaround will not really help.
"
* tag 'rcs-v2-v4' of https://github.com/cmm/scylla:
reader_concurrency_semaphore: convert to flat_mutation_reader_v2
short-circuit flat mutation reader upgrades and downgrades
Catches inconsistencies in LSA state.
Currently:
- discrepancy between segment set in _closed_segments and shard's
segment descriptors
- cross-shard segment references in _closed_segments
- discrepancy in _closed_occupancy stats and what's in segment
descriptors
- segments not present in _closed_segments but present in
segment descriptors
When asked to upgrade a reader that itself is a downgrade, try to
return the original v2 reader instead, and likewise when downgrading
upgraded v1 readers.
This is desirable because version transformations can result from,
say, entering/leaving a reader concurrency semaphore, and the number
of such transformations is practically unbounded.
Such short-circuiting is only done if it is safe, that is: the
transforming reader's buffer is empty and its internal range tombstone
tracking state is discardable.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
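A sketch of the short-circuit (class names are illustrative, not the real reader types):

```python
class Upgraded:                       # wraps a v1 reader as v2
    def __init__(self, v1):
        self.v1 = v1
        self.buffer = []
        self.pending_tombstones = False
    def pristine(self):
        # safe to unwrap only with an empty buffer and discardable
        # range-tombstone tracking state
        return not self.buffer and not self.pending_tombstones

class Downgraded:                     # wraps a v2 reader as v1
    def __init__(self, v2):
        self.v2 = v2
        self.buffer = []
        self.pending_tombstones = False
    def pristine(self):
        return not self.buffer and not self.pending_tombstones

def upgrade_to_v2(reader):
    # short-circuit: upgrading a pristine downgrade yields the original v2
    if isinstance(reader, Downgraded) and reader.pristine():
        return reader.v2
    return Upgraded(reader)

def downgrade_to_v1(reader):
    if isinstance(reader, Upgraded) and reader.pristine():
        return reader.v1
    return Downgraded(reader)
```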
Make sure that major will compact data in all sstables and memtable,
as tombstones sitting in memtable could shadow data in sstables.
For example, a tombstone in memtable deleting a large partition could
be missed in major, so space wouldn't be saved as expected.
Additionally, write amplification is reduced as data in memtable
won't have to travel through tiers once flushed.
Fixes #9514.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211217160055.96693-2-raphaelsc@scylladb.com>
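The essence of the change, as a sketch (sstables modeled as key/value dicts, with None as a tombstone):

```python
class Table:
    def __init__(self):
        self.memtable = {}      # newest writes, possibly tombstones (None)
        self.sstables = []      # list of dicts, oldest first

def major_compact(table):
    # the fix (sketch): flush the memtable first, so tombstones sitting in
    # it participate in the major and can actually purge shadowed data
    if table.memtable:
        table.sstables.append(dict(table.memtable))
        table.memtable = {}
    merged = {}
    for sst in table.sstables:          # oldest -> newest wins
        merged.update(sst)
    # a major can drop tombstones together with the data they shadow
    table.sstables = [{k: v for k, v in merged.items() if v is not None}]
```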
"
Convert sstable_set and table::make_sstable_reader() to v2. With this
all readers below cache use the v2 format.
Tests: unit(dev)
"
* 'table-make-sstable-reader-v2/v1' of https://github.com/denesb/scylla:
table: upgrade make_sstable_reader() to v2
sstables/sstable_set: create_single_key_sstable_reader() upgrade to v2
sstables/sstable_set: remove unused and undefined make_reader() member
"
TWCS performs STCS on a window as long as it's the most recent one.
From then on, TWCS will compact all files in the past window into
a single file. With some moderate write load, it could happen that
there's still some compaction activity in that past window, meaning
that per-window major may miss some files being currently compacted.
As a result, a past window may contain more than 1 file after all
compaction activity is done on its behalf, which may increase read
amplification. To avoid that, TWCS will now make sure that per-window
major is serialized, to make sure no files are missed.
Fixes #9553.
tests: unit(dev).
"
* 'fix_twcs_per_window_major_v3' of https://github.com/raphaelsc/scylla:
TWCS: Make sure major on past window is done on all its sstables
TWCS: remove needless param for STCS options
TWCS: kill unused param in newest_bucket()
compaction: Implement strategy control and wire it
compaction: Add interface to control strategy behavior.
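The serialization of per-window majors can be sketched with one lock per window (names illustrative; the real code serializes against the compaction manager, not a raw mutex):

```python
import threading
from collections import defaultdict

class PerWindowMajor:
    def __init__(self):
        self._locks = defaultdict(threading.Lock)   # one lock per window

    def compact(self, window, files_in_window, compact_fn):
        # Serialize majors per past window: by the time we run, any
        # concurrent compaction for this window has finished, so
        # files_in_window() sees every sstable and none are missed.
        with self._locks[window]:
            return compact_fn(files_in_window())
```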
"
Users are adjusted by sprinkling `upgrade_to_v2()` and
`downgrade_to_v1()` where necessary (or removing any of these where
possible). No attempt was made to optimize and reduce the amount of
v1<->v2 conversions. This is left for follow-up patches to keep this set
small.
The combined reader is composed of 3 layers:
1. fragment producer - pop fragments from readers, return them in batches
(each fragment in a batch having the same type and pos).
2. fragment merger - merge fragment batches into single fragments
3. reader implementation glue-code
Converting layers (1) and (3) was mostly mechanical. The logic of
merging range tombstone changes is implemented at layer (2), so the two
different producer (layer 1) implementations we have share this logic.
Tests: unit(dev)
"
* 'combined-reader-v2/v4' of https://github.com/denesb/scylla:
test/boost/mutation_reader_test: add test_combined_reader_range_tombstone_change_merging
mutation_reader: convert make_clustering_combined_reader() to v2
mutation_reader: convert position_reader_queue to v2
mutation_reader: convert make_combined_reader() overloads to v2
mutation_reader: combined_reader: convert reader_selector to v2
mutation_reader: convert combined reader to v2
mutation_reader: combined_reader: attach stream_id to mutation_fragments
flat_mutation_reader_v2: add v2 version of empty reader
test/boost/mutation_reader_test: clustering_combined_reader_mutation_source_test: fix end bound calculation
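Layer (1), the fragment producer, can be sketched as a merge that batches fragments sharing the smallest position (fragments modeled here as (pos, kind, payload) tuples; the real code works on mutation fragments and streams):

```python
def produce_batches(readers):
    # Sketch of the fragment producer: pop the head fragment of every
    # reader, and yield batches in which each fragment has the same
    # (pos, kind) key -- the merging itself is layer (2)'s job.
    heads = []
    iters = []
    for i, r in enumerate(readers):
        it = iter(r)
        iters.append(it)
        frag = next(it, None)
        if frag is not None:
            heads.append((frag, i))
    while heads:
        key = min((frag[0], frag[1]) for frag, _ in heads)
        batch, rest = [], []
        for frag, i in heads:
            if (frag[0], frag[1]) == key:
                batch.append(frag)
                nxt = next(iters[i], None)
                if nxt is not None:
                    rest.append((nxt, i))
            else:
                rest.append((frag, i))
        heads = rest
        yield batch
```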