This patch series splits up parts of repair pipeline to allow unit testing
various bits of code without having to run full dtest suite. The reason why
repair pipeline has no unit tests is that by definition repair requires multiple
nodes, while unit test environment works only for a single node.
However, it is possible to explicitly define interfaces between various parts of the
pipeline, inject dependencies and test them individually. This patch series is focused
on taking repair_rows_on_wire (frozen mutation representation of changes coming from
another node) and flushing them to an sstable.
The commits are split into the following parts:
- pulling out classes to separate headers so that they can be included (potentially indirectly) from the test,
- pulling out repair_meta::to_repair_rows_list and part of repair_meta::flush_rows_in_working_row_buf so that they can be tested,
- refactoring repair_writer so that the actual writing logic can be injected as dependency,
- creating the unit test.
tests: unit(dev), dtest(incremental_repair_test, read_repair_test, repair_additional_test, repair_test)
Closes#10345
* github.com:scylladb/scylla:
repair: Add unit test for flushing repair_rows_on_wire to disk.
repair: Extract mutation_fragment_queue and repair_writer::impl interfaces.
repair: Make parts of repair_writer interface private.
repair: Rename inputs to flush_rows.
repair: Make repair_meta::flush_rows a free function.
repair: Split flush_rows_in_working_row_buf to two functions and make one static.
repair: Rename inputs to to_repair_rows_list.
repair: Make to_repair_rows_list a free function.
repair: Make repair_meta::to_repair_rows_list a static function
repair: Fix indentation in repair_writer.
repair: Move repair_writer to separate header.
repair: Move repair_row to a separate header.
repair: Move repair_sync_boundary to a separate header.
repair: Move decorated_key_with_hash to separate header.
repair: Move row_repair hashing logic to separate class and file.
* 'raft_group0_early_startup_v3' of https://github.com/ManManson/scylla:
main: allow joining raft group0 before waiting for gossiper to settle
service: raft_group0: make `join_group0` re-entrant
service: storage_service: add `join_group0` method
raft_group_registry: update gossiper state only on shard 0
raft: don't update gossiper state if raft is enabled early or not enabled at all
gms: feature_service: add `cluster_uses_raft_mgmt` accessor method
db: system_keyspace: add `bootstrap_needed()` method
db: system_keyspace: mark getter methods for bootstrap state as "const"
"
Optimize consuming from a single partition.
This gives us significant improvement with single, small mutations,
as shown with perf_mutation_readers, compared to the vector-based
flat_mutation_reader_from_mutations_v2.
These are expected to be common on the write path,
and can be optimized for view building.
results from: perf_mutation_readers -c1 --random-seed=840478750
(userspace cpu-frequency governer, 2.2GHz)
test iterations median mad min max
Before:
combined.one_row 720118 825.668ns 1.020ns 824.648ns 827.750ns
After:
combined.one_mutation 881482 751.157ns 0.397ns 750.211ns 751.912ns
combined.one_row 843270 756.553ns 0.303ns 755.889ns 757.911ns
The grand plan is to follow up
with make_flat_mutation_reader_from_frozen_mutation_v2
so that we can read directly from either a mutation
or frozen_mutation without having to unfreeze it e.g. in
table::push_view_replica_updates.
Test: unit(dev)
Perf: perf_mutation_readers(release)
"
* tag 'flat_mutation_reader_from_mutation-v3' of https://github.com/bhalevy/scylla:
perf: perf_mutation_readers: add one_mutation case
test: mutation_query_test: make make_source static
mutation readers: refactor make_flat_mutation_reader_from_mutation*_v2
mutation readers: add make_flat_mutation_reader_from_mutation_v2
readers: delete slice_mutation.hh
test: flat_mutation_reader_test: mock_consumer: add debug logging
test: flat_mutation_reader_test: mock_consumer: make depth counter signed
"
There's a generic way to start-stop services in scylla, that includes
5 "actions" (some are optional and/or implicit though)
service_config cfg = ...
sharded<service>.start(cfg)
service.invoke_on_all(&service::start)
service.invoke_on_all(&service::shutdown)
service.invoke_on_all(&servuce::stop)
sharded<service>.stop()
and most of the service out there conforms to that scheme. Not snitch
(spoiler: and not tracing), for which there's a couple of helpers that
do all that magic behind the scenes, "configuring" snitch is done with
the help of overloaded constructors. The latter is extra complicated
with the need to register snitch drivers in class-registry for each
constructor overload. Also there's an external shards synchronization
on stop.
This set brings snitch start/stop code to the described standard: the
create/stop helpers are removed, creation acceps the config structure,
per-shard start/stop (snitch has no drain for now) happens in the
simple invoke-on-all manner.
The intended side effect of this change is the ability to add explicit
dependencies to snitch (in the future, not in this set).
tests: unit(dev)
"
* 'br-snitch-config' of https://github.com/xemul/scylla:
snitch: Remove create_snitch/stop_snitch
snitch: Simplify stop (and pause_io)
snitch: Move io_is_stopped to property-file driver
snitch: Remove init_snitch_obj()
snitch: Move instance creation into snitch_ptr constructor
snitch: Make config-based construction of all drivers
snitch: Declare snitch_ptr peering and rework container() method
snitch: Introduce container() method
A node can join group0 without waiting for gossiper if
it is either a fresh node, or it's an existing node, which
is already part of some group0 (i.e. have `group0_id` persisted
in system tables).
In that case the second `join_group0()` call inside the
`storage_service::join_token_ring` will be a no-op.
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Measure performance of the single-mutation reader:
make_flat_mutation_reader_from_mutation_v2.
Comparable to the `one_row` case that consumes the
single mutation using the multi-mutatio reader:
make_flat_mutation_reader_from_mutations_v2
perf_mutation_readers shows ~20-30% improvement of
make_flat_mutation_reader_from_mutation_v2
the same single mutation, just given as a single-item vector
to make_flat_mutation_reader_from_mutations_v2.
test iterations median mad min max
Before:
combined.one_row 720118 825.668ns 1.020ns 824.648ns 827.750ns
After:
combined.one_mutation 881482 751.157ns 0.397ns 750.211ns 751.912ns
combined.one_row 843270 756.553ns 0.303ns 755.889ns 757.911ns
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Extract the common parts of the single mutation reader
and the vector-based variant into mutation_reader_base
and reuse from both readers.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
slice_mutations() is currently used only by readers/mutation_readers.cc
so there's no need to expose it.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
We want to return stop_iteration::yes once we crossed
the initial depth threshold, with an unsigned depth counter,
it might wraparound and look > 1.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The test will now, with probability 1/2, enable forwarding of entries by
followers to leaders. This is possible thanks to the new abort_source&
APIs which we use to ensure that no operations are running on servers
before we destroy them.
Some adjustments were required to the server abort procedure in order to
prevent rare hangs (see first patch). We also translate some low-level
exceptions coming from seastar primitives to high-level Raft API
exceptions (second patch).
* kbr/nemesis-enable-fd-v1:
test: raft: randomized_nemesis_test: enable entry forwarding
test: raft: randomized_nemesis_test: increase logging level on some rare operations
raft: server: translate abort_requested_exception to raft::request_aborted
raft: fsm: when stopping, become follower to reject new requests
This pull request adds support for retrying failed forwarder calls
(currently used to parallelize `select count(*) from ...` queries).
Failed-to-forward sub-queries will be executed locally (on a
super-coordinator). This local execution is meant as a fallback for a
forward_requests that could not be sent to its destined coordinator
(e.g. due gossiper not reacting fast enough). Local execution was chosen
as the safest one - it does not require sending data to another
coordinator.
Due to problems with misscompilations, some parts of the
`forward_service` were uncoroutinized.
Fixes: #10131Closes#10329
* github.com:scylladb/scylla:
forward_service: uncoroutinize dispatch method
forward_service: uncoroutinize retrying_dispatcher
forward_service: rety a failed forwarder call
forward_service: copy arguments/captured vars to local variables
The restriction "WHERE m[NULL] = 2" should result in an invalid request
error, but currently results in an ugly internal server error.
This test reproduces it, and since the bug is still in the code - is
marked as xfail.
Refs #10361
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220412134118.829671-1-nyh@scylladb.com>
This is a reproducer for issue #10359 that a "CONTAINS NULL" and
"CONTAINS KEY NULL" restrictions should not match any set, but currently
do match non-empty or all sets.
The tests currently fail on Scylla, so marked xfail. They also fails on
Cassandra because Cassandra considers such a request an error, which
we consider a mistake (see #4776) - so the tests are marked "cassandra_bug".
Refs #10359.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220412130914.823646-1-nyh@scylladb.com>
We already have a test showing that WHERE v=NULL ALLOW FILTERING is
allowed in Scylla (unlike Cassandra), and matches nothing. Here
we add two further tests that confirm that:
1. Not only is v=NULL allowed - v<NULL, v<=NULL, and so on, is also
allowed and matches nothing.
2. The ALLOW FILTERING is required in in those requests. Without it,
both Scylla and Cassandra generate the same "ALLOW FILTERING is
required" error.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220411214503.770413-1-nyh@scylladb.com>
Protocol v4 added WRITE_FAILURE and READ_FAILURE. When running under v3
we downgrade these exceptions to WRITE_TIMEOUT and READ_TIMEOUT (since
the client won't understand the v4 errors), but we still send the new
error codes. This causes the client to become confused.
Fix by updating the error codes.
A better fix is to move the error code from the constructor parameter
list and hard-code it in the constructor, but that is left for a follow-up
after this minimal fix.
Fixes#5610.
Closes#10362
"
Repair code keeps its history in system keyspace and uses the qctx
global thing to update and query it. This set replaces the qctx with
the explicit reference on the system_keyspace object.
tests: unit(dev), dtest.repair_test(dev)
"
* 'br-repair-vs-qctx' of https://github.com/xemul/scylla:
repair, system_keyspace: Query repair_history with a helper
repair: Update loader code to use system_keyspace entry
repair, system_keyspace: Update repair_history with a helper
repair: Keep system keyspace reference
Fixes the case of make_room() invoked with last_chunk_capacity_deficit
but _size not in the last reserved chunk.
Found during code review, no user impact.
Fixes#10364.
Message-Id: <20220411224741.644113-1-tgrabiec@scylladb.com>
Fixes the case of make_room() invoked with last_chunk_capacity_deficit
but _size not in the last reserved chunk.
Found during code review, no known user impact.
Fixes#10363.
Message-Id: <20220411222605.641614-1-tgrabiec@scylladb.com>
This miniseries rewrites a few unnecessary throws into forwarding the exception directly. It's partially possible thanks to the new `co_await coroutine::return_exception` mechanism which allows returning from a coroutine early, without explicitly calling co_return (d5843f6e88).
Closes#10360
* github.com:scylladb/scylla:
sstables: : remove unnecessary throws
schema_tables: remove unnecessary throws
Querying the table is now done with the help of qctx directly. This
patch replaces it with a querying helper that calls the consumer
function with the entry struct as the argument.
After this change repair code can stop including query_context and
mess with untyped_result_set.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Patch the history entry loader to use the recently introduced
history entry. This is just to reduce the churn in the next patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Current code works directly on the qctx which is not nice. Instead,
make it use the system keyspace reference. To make it work, the patch
adds a helper method and introduces a helper struct for the table
entry. This struct will also be used to query the table (next patch).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Repair updates (and queries on start) the system.repair_history table
and thus depends on the system_keyspace object
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Adds a "named_file" wrapper type in commitlog, encapsulating file and disk size, the latter being updated automatically on write/truncate/allocate/delete operations. Use this instead of loose vars in segments, and also in recycle/delete lists.
Having the data propagate with the objects means we can dispose of re-reading sizes from disk, which in turn means we know what "our" view of the file sizes is when we try to delete/recycle them -> we can bookkeep accurately (from our view point) without having to resort to the rather horrible recalculation of disk footprint.
This series also drops non-recycled segment handling, since it is not used anywhere, and just makes things harder.
It also adds a parameter to set flush threshold.
These two first patches could be broken out into separate PR:s if need be.
Closes#10084
* github.com:scylladb/scylla:
commitlog: Fold named_file continuations into caller coroutine frame
commitlog: Use named named_file objects in delete/dispose/recycle lists
commitlog: Use named_file size tracking instead of segment var
commitlog: Use named_file in segment
commitlog: Add "named_file" file wrapping type
commitlog: Make flush threshold a config parameter
commitlog: kill non-recycled segment management
With off-strategy, we no longer need LCS explicitly switching to STCS
mode, and even without off-strategy, the dynamic fan-in approach
in compaction manager will cause LCS to automatically switch to
STCS under heavy write load.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220411181322.192830-1-raphaelsc@scylladb.com>
The unit test executes a simplified repair scenario by:
- producing a random stream of mutation mutation_fragments,
- convering them to repair_rows_on_wire,
- convering them to list of repair_rows using the conversion logic
extracted in previous commits from repair_meta,
- flushing the rows to an sstable using the logic extracted in previous
commits from repair_meta,
- comparing the sstable contents with the originally produced mutation
fragments.
The test checks only the flushing part and is not concerned with any
other piece of the repair pipeline.
It allows pulling out the logic of writing internal representation
of repair mutations to disk. This in turn is needed to unit test
this functionality without spinning up clusters, which significantly
improves developer iteration time.
It allows pulling out the logic of convering on-the-wire representation
of repair mutations to an internal representation used later for
flushing repair mutations to disk. This in turn is needed to unit test
the functionality without spinning up clusters, which significantly
improves developer iteration time.
Saves a continuation. That matters very little. But...
Uses a special awaiter type on returns from the "then(...)"-wrapping
named_file methods (which use a then([...update]) to keep internal
size counters up-to-date, making the continuation instead a stored func
into the returned awaiter, executed on successul resume of the caller
co_await.
Changes delete/close queue, as well as deletetion queue into one, using
named_file objects + marker. Recycle list now also contains said named
file type.
This removes the need to re-eval file sizes on disk when deleting etc,
which in turn means we can dispose of recalculate_footprint on errors,
thus making things simpler and safer.