When tombstone_gc=repair, the repaired compaction view's sstable_set_for_tombstone_gc() previously returned all sstables across all three views (unrepaired, repairing, repaired). This is correct but unnecessarily expensive: for base tables with tombstone_gc=repair, the unrepaired and repairing sets are never the source of a GC-blocking shadow.

The key ordering guarantee that makes this safe:
- topology_coordinator sends the send_tablet_repair RPC and waits for it to complete. Inside that RPC, mark_sstable_as_repaired() runs on all replicas, moving D from repairing → repaired (repaired_at stamped on disk).
- Only after the RPC returns does the coordinator commit repair_time + sstables_repaired_at to Raft.
- gc_before = repair_time - propagation_delay only advances once that Raft commit applies.

Therefore, when a tombstone T in the repaired set first becomes GC-eligible (its deletion_time < gc_before), any data D it shadows is already in the repaired set on every replica. This holds because:
- The memtable is flushed before the repairing snapshot is taken (take_storage_snapshot calls sg->flush()), capturing all data present at repair time.
- Hints and batchlog are flushed before the snapshot, ensuring remotely-hinted writes arrive before the snapshot boundary.
- Legitimate unrepaired data has timestamps close to 'now', always newer than any GC-eligible tombstone (writing backdated data with USING TIMESTAMP is user error / UB).

Excluding the repairing and unrepaired sets from the GC shadow check therefore cannot cause any tombstone to be wrongly collected. The memtable check is also skipped for the same reason: memtable data is either newer than the GC-eligible tombstone, or was flushed into the repairing/repaired set before gc_before advanced.

Safety restriction — materialized views: the optimization IS applied to materialized view tables. Two possible paths could inject D_view into the MV's unrepaired set after MV repair: view hints, and staging via the view-update-generator.
Both are safe:

(1) View hints: flush_hints() creates a sync point covering BOTH _hints_manager (base mutations) AND _hints_for_views_manager (view mutations). It waits until ALL pending view hints — including D_view entries queued in _hints_for_views_manager while the target MV replica was down — have been replayed to the target node before take_storage_snapshot() is called. D_view therefore lands in the MV's repairing sstable and is promoted to repaired. When a repaired compaction then checks for shadows, it finds D_view in the repaired set, keeping T_mv non-purgeable.

(2) View-update-generator staging path: base table repair can write a missing D_base to a replica via a staging sstable. The view-update-generator processes the staging sstable ASYNCHRONOUSLY: it may fire arbitrarily later, even after MV repair has committed repair_time and T_mv has been GC'd from the repaired set. However, the staging processor calls stream_view_replica_updates(), which performs a READ-BEFORE-WRITE via as_mutation_source_excluding_staging(): it reads the CURRENT base table state before building the view update. If T_base was written to the base table (as it always is before the base replica can be repaired and the MV tombstone can become GC-eligible), the view_update_builder sees T_base as the existing partition tombstone. D_base's row marker (ts_d < ts_t) is expired by T_base, so the view update is a no-op: D_view is never dispatched to the MV replica. No resurrection can occur regardless of how long staging is delayed.

A potential sub-edge case is T_base being purged BEFORE staging fires (leaving D_base as the sole survivor, so stream_view_replica_updates() would dispatch D_view). This is blocked by an additional invariant: for tablet-based tables, the repair writer stamps repaired_at on staging sstables (repair_writer_impl::create_writer sets mark_as_repaired = true, and perform_component_rewrite writes repaired_at = sstables_repaired_at + 1 on every staging sstable).
After base repair commits sstables_repaired_at to Raft, the staging sstable satisfies is_repaired(sstables_repaired_at, staging_sst) and therefore appears in make_repaired_sstable_set(). Any subsequent base repair that advances sstables_repaired_at further still includes the staging sstable (its repaired_at ≤ new sstables_repaired_at). D_base in the staging sstable thus shadows T_base in every repaired compaction's shadow check, keeping T_base non-purgeable as long as D_base remains in staging.

A base table hint also cannot bypass this. A base hint is replayed as a base mutation; the resulting view update is generated synchronously on the base replica and sent to the MV replica via _hints_for_views_manager (path 1 above), not via staging.

USING TIMESTAMP with timestamps predating (gc_before + propagation_delay) is explicitly UB and excluded from the safety argument.

For tombstone_gc modes other than repair (timeout, immediate, disabled) the invariant does not hold for base tables either, so the full storage-group set is returned.

The expected gain is reduced bloom-filter and memtable key-lookup I/O during repaired compactions: the unrepaired set is typically the largest (it holds all recent writes), yet for tombstone_gc=repair it never influences GC decisions.

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-231.

Closes scylladb/scylladb#29310

* github.com:scylladb/scylladb:
  compaction: Restrict tombstone GC sstable set to repaired sstables for tombstone_gc=repair mode
  test/repair: Add tombstone GC safety tests for incremental repair
Scylla unit tests using C++ and the Boost test framework
The source files in this directory are Scylla unit tests written in C++ using the Boost.Test framework. These unit tests come in three flavors:
- Some simple tests that check stand-alone C++ functions or classes use Boost's BOOST_AUTO_TEST_CASE.
- Some tests require Seastar features, and need to be declared with Seastar's extensions to Boost.Test, namely SEASTAR_TEST_CASE.
- Even more elaborate tests require not just a functioning Seastar environment but also a complete (or partial) Scylla environment. Those tests use the do_with_cql_env() or do_with_cql_env_thread() function to set up a mostly-functioning environment behaving like a single-node Scylla, in which the test can run.
While we have many tests of the third flavor, writing new tests of this type should be reserved for white-box tests - tests where it is necessary to inspect or control Scylla internals that lack user-facing APIs such as CQL. In contrast, black-box tests - tests that can be written using only user-facing APIs - should be written in one of the newer test frameworks that we offer, such as test/cqlpy or test/alternator (in Python, using the CQL or DynamoDB APIs respectively) or test/cql (using textual CQL commands), or - if more than one Scylla node is needed for a test - using the test/topology* framework.
Running tests
Because these are C++ tests, they need to be compiled before running.
To compile a single test executable row_cache_test, use a command like
ninja build/dev/test/boost/row_cache_test
You can also use ninja dev-test to build all C++ tests, or use
ninja dev-build to build the C++ tests and also the full Scylla executable
(note, however, that the full Scylla executable isn't needed to run Boost tests).
Replace "dev" with "debug" or "release" in the examples above and below to use the "debug" build mode (which, importantly, compiles the tests with ASAN and UBSAN enabled, helping catch difficult-to-find use-after-free bugs) or the "release" build mode (optimized for run speed).
To run an entire test file row_cache_test, including all its test
functions, use a command like:
build/dev/test/boost/row_cache_test -- -c1 -m1G
To run a single test function test_reproduce_18045() from that test
file, use a command like:
build/dev/test/boost/row_cache_test -t test_reproduce_18045 -- -c1 -m1G
In these command lines, the parameters before the -- are passed to
Boost.Test, while the parameters after the -- are passed to the test code,
and in particular to Seastar. In this example Seastar is asked to run on one
CPU (-c1) and use 1G of memory (-m1G) instead of hogging the entire
machine. The Boost.Test option -t test_reproduce_18045 asks it to run just
this one test function instead of all the test functions in the executable.
Unfortunately, interrupting a running test with control-C doesn't
work. This is a known bug (#5696). Kill a test with SIGKILL (-9) if you
need to stop it while it's running.
Boost tests can also be run using test.py - which is a script that provides a uniform way to run all tests in scylladb.git - C++ tests, Python tests, etc.
Execution with pytest
To run all tests with pytest, execute
pytest test/boost
To execute all tests in one file, provide the path to the source filename as a parameter
pytest test/boost/aggregate_fcts_test.cc
Since it's a normal path, autocompletion works in the terminal out of the box.
To execute only one test function, provide the path to the source file and function name
pytest --mode dev test/boost/aggregate_fcts_test.cc::test_aggregate_avg
To select a specific build mode, pass the parameter --mode dev;
if the parameter isn't provided, pytest runs ninja mode_list to find out which modes are compiled.
Parallel execution is controlled by pytest-xdist via the parameter -n auto,
which starts tests with a number of workers equal to the number of CPU cores.
A useful command to discover the tests in a file or directory is
pytest --collect-only -q --mode dev test/boost/aggregate_fcts_test.cc
which returns all test functions in the file.
To execute only one function, you can reuse the output of the previous command, but the mode suffix must be skipped.
For example, the output shows something like test/boost/aggregate_fcts_test.cc::test_aggregate_avg.dev,
so to execute this specific test function, use:
pytest --mode dev test/boost/aggregate_fcts_test.cc::test_aggregate_avg
Writing tests
Because of the long build time and large build size of each separate test executable, it is recommended to put test functions into relatively large source files. But not too large: the compilation time of a single source file (during development) should stay at reasonable levels.
When adding new source files in test/boost, don't forget to list the new source file in configure.py and also in CMakeLists.txt. The former is needed by our CI, while the latter is preferred by some developers.