The test hardcoded the expected number of coordinator elections
(2, 3, 4, 5) for each phase. If a prior phase triggered an extra
election, subsequent phases would wait for a count that was already
reached or would never match.
Fix by reading the current election count before each operation and
expecting exactly one more, making each phase independent of prior
history.
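A minimal sketch of the count-relative pattern, assuming the framework's ManagerClient log helpers; the grepped message and helper names are illustrative, not the test's actual code:
```python
import time
from test.pylib.util import wait_for  # framework helper: retries while the predicate returns None

async def get_election_count(manager, server) -> int:
    # Hypothetical helper: count election log lines on one server.
    log = await manager.server_open_log(server.server_id)
    return len(await log.grep("Coordinator election finished"))

async def restart_and_wait_election(manager, coordinator, observer):
    # Read the current count instead of hardcoding 2, 3, 4, 5, ...
    before = await get_election_count(manager, observer)
    await manager.server_restart(coordinator.server_id)

    async def one_more():
        count = await get_election_count(manager, observer)
        return True if count >= before + 1 else None  # None => keep retrying

    # ... and expect exactly one more election than we just observed.
    await wait_for(one_more, time.time() + 60)
```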
Also add wait_for_no_pending_topology_transition() calls after each
coordinator election to ensure the topology state machine has fully
settled before proceeding with restarts and further operations.
Decrease the failure detector timeout (failure_detector_timeout_in_ms)
to 2000 ms on all test nodes so that coordinator crashes are detected
faster, reducing test wallclock time and timeout-related flakiness.
Enable raft_topology=trace logging on all test nodes to aid
post-failure diagnosis. Add diagnostic logging in
wait_new_coordinator_elected().
Fixes: SCYLLADB-1089
Closes scylladb/scylladb#29284
- Revert the previous "test.py: fix test collection bug" commit (92c09d10) which worked around broken deduplication by filtering items without `BUILD_MODE` in `pytest_collection_modifyitems`. This approach masked the root cause and is superseded by the proper fixes below.
- Backport pytest 9.0.3's argument normalization algorithm into `test.py` to work around broken deduplication in pytest 8.3.5 ([pytest-dev/pytest#12083](https://github.com/pytest-dev/pytest/issues/12083)). Duplicate or subsumed test paths (e.g. `test/cql` and `test/cql/lua_test.cql`) are collapsed before invoking pytest. Revert when upgrading to pytest 9.x.
- Return a `DisabledFile` collector instead of an empty list in `pytest_collect_file` when all modes are disabled for a file, fixing a bug where subsequent files would not get their stash items set (`REPEATING_FILES`). Restructure `pytest_collect_file` to use a walrus operator (`if repeats := ...`) with a single `remove(file_path)` and `return collectors` at the end, eliminating the early return.
- Add `--keep-duplicates` CLI argument to bypass deduplication and forward to pytest.
- Move `RUN_ID` assignment from `pytest_collect_file` to `modify_pytest_item`. A shared `run_ids` cache (`defaultdict[tuple[str, str], count]`) is created in `pytest_collection_modifyitems` and passed to `modify_pytest_item`, keyed by `(build_mode, nodeid)` so each mode gets independent counters. This ensures unique run IDs even when `--keep-duplicates` causes the same file to be collected multiple times.
- Fix `--repeat` option default from string `"1"` to int `1` — argparse only applies `type=` to CLI-parsed values, not defaults.
pytest normally deduplicates overlapping test arguments — e.g. `test/cql test/cql/lua_test.cql` collects `lua_test.cql` only once. The original `test.py` never performed this deduplication, and the pytest version in the toolchain image (8.3.5) has a bug that breaks it ([pytest-dev/pytest#12083](https://github.com/pytest-dev/pytest/issues/12083)).
Since we are moving to bare pytest, `test.py` should match pytest's default behavior: deduplicate. Because we cannot easily upgrade pytest, commit 2 backports the deduplication logic from pytest 9.0.3.
To match pytest's interface, `--keep-duplicates` is added as an opt-out. This lets a user intentionally run overlapping paths — e.g. `./test.py test/blah test/blah/test_foo.py --keep-duplicates` runs `test_foo.py` twice. The flag is forwarded to pytest and also skips the backported deduplication in `test.py`.
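A condensed sketch of the subsumption idea behind the backported algorithm (the real pytest 9.0.3 code also normalizes `::` node-id parts, omitted here):
```python
from dataclasses import dataclass
from pathlib import Path

@dataclass(order=True)
class _CollectionArgument:
    # Normalized filesystem path of one CLI test argument.
    path: Path

    def __contains__(self, other: "_CollectionArgument") -> bool:
        # test/cql subsumes test/cql/lua_test.cql (and itself).
        return other.path == self.path or self.path in other.path.parents

def _deduplicate_test_args(args: list[str]) -> list[str]:
    kept: list[_CollectionArgument] = []
    # Sorting puts a directory before the files it contains, so a
    # subsumed argument always meets its parent first and is dropped.
    for arg in sorted(_CollectionArgument(Path(a).resolve()) for a in args):
        if not any(arg in k for k in kept):
            kept.append(arg)
    return [str(k.path) for k in kept]
```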
- Revert 92c09d10 which filtered items without `BUILD_MODE` in `pytest_collection_modifyitems` and added an early return in `CppFile.collect()`. This workaround is superseded by the proper deduplication and `DisabledFile` fixes.
- Add `_CollectionArgument` dataclass (`order=True`, `__contains__` for subsumption) and `_deduplicate_test_args()` function, adapted from pytest 9.0.3. Marked with a TODO to remove once we update to pytest 9.x.
- Call `_deduplicate_test_args()` on `options.name` before passing to pytest.
- Add `DisabledFile(pytest.File)` that skips collection with an informative message instead of returning an empty list.
- Restructure `pytest_collect_file` to use walrus operator: `if repeats := ...:` / `else:` — single `remove(file_path)` at end, no early return.
- Add `--keep-duplicates` argument that skips deduplication and is forwarded to pytest.
- Create a shared `run_ids` cache in `pytest_collection_modifyitems` and pass it to `modify_pytest_item`, which assigns unique sequential RUN_IDs via `itertools.count`. The cache is keyed by `(build_mode, nodeid)` so each mode gets independent counters (see the sketch below).
- Remove `RUN_ID` from `_STASH_KEYS_TO_COPY` — it is no longer set on collectors.
- Remove `CppFile.run_id` cached_property. `CppTestCase` now reads `RUN_ID` from its own item stash.
- Fix `--repeat` option default from `"1"` to `1` and drop redundant `int()` cast.
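A sketch of the shared-cache wiring described above (stash-key plumbing simplified; counters here start at 0 for illustration):
```python
from collections import defaultdict
from itertools import count
import pytest

BUILD_MODE = pytest.StashKey[str]()  # set on each item at collection time
RUN_ID = pytest.StashKey[int]()

def pytest_collection_modifyitems(session, config, items):
    # One counter per (build_mode, nodeid): each mode numbers its items
    # independently, and duplicates collected under --keep-duplicates
    # draw successive values from the same counter.
    run_ids: defaultdict[tuple[str, str], count] = defaultdict(count)
    for item in items:
        modify_pytest_item(item, run_ids)

def modify_pytest_item(item, run_ids):
    item.stash[RUN_ID] = next(run_ids[(item.stash[BUILD_MODE], item.nodeid)])
```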
Closes SCYLLADB-1730
Closes scylladb/scylladb#29665
* github.com:scylladb/scylladb:
test: add --keep-duplicates and assign RUN_ID via shared cache
test/pylib/runner: fix disabled file collection
test.py: deduplicate CLI test arguments before passing to pytest
Revert "test.py: fix test collection bug"
lock_tables_metadata() acquires a write lock on tables_metadata._cf_lock
on every shard. It used invoke_on_all(), which dispatches lock
acquisitions to all shards in parallel via parallel_for_each +
smp::submit_to.
When two fibers call lock_tables_metadata() concurrently, this can
deadlock. parallel_for_each starts all iterations unconditionally:
even when the local shard's lock attempt blocks (because the other
fiber already holds it), SMP messages are still sent to remote shards.
Both fibers' lock-acquisition messages land in the per-shard SMP
queues. The SMP queue itself is FIFO, but process_incoming() drains
it and schedules each item as a reactor task via add_task(), which —
in debug and sanitize builds with SEASTAR_SHUFFLE_TASK_QUEUE — shuffles
each newly added task against all pending tasks in the same scheduling
group's reactor task queue. This means fiber A's lock acquisition can
be reordered past fiber B's (and past unrelated tasks) on a given shard.
If fiber A wins the lock on shard X while fiber B wins on shard Y, this
creates a classic cross-shard lock-ordering deadlock (circular wait).
In production builds without SEASTAR_SHUFFLE_TASK_QUEUE, the reactor
task queue is FIFO. Still, even in release builds, the SMP queues can
reorder messages, so the deadlock remains possible, though much less
likely. In debug and sanitize builds, the task-queue shuffle
makes the deadlock very likely whenever both fibers' lock-acquisition
tasks are pending simultaneously in the reactor task queue on any shard.
This deadlock was exposed by ce00d61917 ("db: implement large_data
virtual tables with feature flag gating", merged as 88a8324e68),
which introduced legacy_drop_table_on_all_shards as a second caller
of lock_tables_metadata(). When LARGE_DATA_VIRTUAL_TABLES is enabled
during topology_state_load (via feature_service::enable), two fibers
can race:
1. activate_large_data_virtual_tables() — calls
legacy_drop_table_on_all_shards() which calls
lock_tables_metadata() synchronously via .get()
2. reload_schema_in_bg() — fires as a background fiber from
TABLE_DIGEST_INSENSITIVE_TO_EXPIRY, eventually reaches
schema_applier::commit() which also calls lock_tables_metadata()
If both reach lock_tables_metadata() while the lock is free on all
shards, the parallel acquisition creates the deadlock opportunity.
The deadlock blocks topology_state_load() from completing, which
prevents the bootstrapping node from finishing its topology state
transitions. The coordinator's topology coordinator then waits for
the node to reach the expected state, but the node is stuck, so
eventually the read_barrier times out after 300 seconds.
Fix by acquiring the shard 0 lock first before attempting to
acquire any other lock. Whichever fiber wins shard 0 is
guaranteed to acquire all remaining shards before the other fiber
can proceed past shard 0, eliminating the circular-wait condition.
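The fix itself is Seastar C++; as an analogue, the same shard-0-first ordering expressed in Python asyncio terms:
```python
import asyncio

async def lock_tables_metadata(locks: list[asyncio.Lock]):
    # Acquire shard 0's lock first: whichever fiber wins it is
    # guaranteed to take all remaining locks before any other fiber
    # gets past this point, so no circular wait can form.
    await locks[0].acquire()
    # The rest can still be taken in parallel, as before.
    await asyncio.gather(*(lock.acquire() for lock in locks[1:]))
```
Whether locks 1..N-1 are then taken in parallel or serially no longer matters for deadlock freedom, since only the shard-0 winner ever reaches that phase.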
Tested manually with 2 approaches:
1. causing different shard locks to be acquired by different
lock_tables_metadata() calls by adding different sleeps depending
on the lock_tables_metadata() call and target shard - this reproduced
the issue consistently
2. matching the time point at which both fibers reach lock_tables_metadata()
by adding a single sleep to one of the fibers - this heavily depends on
the machine so we can't create a universal reproducer this way, but
it did result in the observed failure on my machine after finding the
right sleep time
Also added a unit test for concurrent lock_tables_metadata() calls.
Fixes: SCYLLADB-1694
Fixes: SCYLLADB-1644
Fixes: SCYLLADB-1684
Closes scylladb/scylladb#29678
This series addresses three problems in the audit startup/shutdown
sequence:
1. [BUG] Shutdown SIGABRT. During graceful shutdown, deferred stops run in reverse order of construction. With the audit service constructed after the maintenance socket, audit was destroyed first, and in-flight queries on the maintenance socket could hit the destroyed audit service (assertion failure in sharded::local()).
2. [BUG] Startup audit bypass. The maintenance socket opened before audit storage was initialized, allowing queries (e.g. creating a superuser) to bypass auditing in that window.
3. [PROBLEM] Blocks SCYLLADB-1430. The existing order prevents audit configuration from being driven by group0 state, because audit started before group0.
The series is organized as: a test-helper refactor, a test for the audited maintenance-socket flow, a startup-phase split, the construction-order fix and its shutdown-race test, and finally the storage-before-socket fix and its startup-window test.
Fixes SCYLLADB-1615
No backport, bugs don't seem severe enough to justify backporting.
Closes scylladb/scylladb#29539
* github.com:scylladb/scylladb:
audit: assert storage ordering invariants at runtime
audit: start maintenance socket after audit storage
audit: move audit construction before maintenance socket
audit: split startup into construction and storage phases
test: audit: verify maintenance socket operations are audited
test: audit: parameterize source address in audit assertions
Commit 8d34127684 ("sstables: clean up TemporaryHashes file in wipe()")
unconditionally calls filename(..., component_type::TemporaryHashes)
inside filesystem_storage::wipe(). However, the TemporaryHashes
component is only registered in the component map of the 'ms' sstable
format. For older formats (ka, la, mc, md, me) the lookup goes through
sstable_version_constants::get_component_map(version).at(...) and throws
std::out_of_range.
The exception is then swallowed by the outer catch(...) in wipe(), which
just logs and ignores. As a side effect, the subsequent
remove_file(new_toc_name) is never reached and the TemporaryTOC
('*-TOC.txt.tmp') file is left as an orphan on disk after every unlink()
of a non-'ms' sstable.
Guard the lookup with get_component_map(version).contains() so the
cleanup is only attempted for formats that actually define the
component.
Add a regression test in test/boost/sstable_directory_test.cc that
creates an 'me'-format sstable, unlinks it and asserts that the sstable
directory is left empty. Without the fix the test fails with a leftover
'me-...-TOC.txt.tmp' file.
Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1697
Closes scylladb/scylladb#29620
`ScyllaCluster.nodelist()` creates new `ScyllaNode` objects on every call,
so per-node state set via `set_smp()`, `set_log_level()`, and
`_adjust_smp_and_memory()` was lost. This meant `set_smp()` had no effect
when `cluster.start()` was called after it, since `start_nodes()` calls
`nodelist()` internally which creates fresh nodes with default values.
- Add debug logging for smp/memory in ScyllaNode
- Store per-node settings (smp, memory, log levels) in a
`ScyllaCluster._node_resources` dict keyed by server_id, so they survive
`nodelist()` reconstruction. `ScyllaNode` restores its state from this dict
on construction and saves it back whenever `set_smp()`, `set_log_level()`,
or `_adjust_smp_and_memory()` modifies it (see the sketch below).
- Add a reproducer test verifying `set_smp()` takes effect on restart
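A sketch of the save/restore pattern with simplified attribute names (the real class also tracks memory and log levels):
```python
DEFAULT_SMP = 2  # illustrative default

class ScyllaNode:
    def __init__(self, cluster, server_id):
        self.cluster = cluster
        self.server_id = server_id
        # Restore per-node settings that must survive nodelist()
        # rebuilding this object from scratch.
        saved = cluster._node_resources.get(server_id, {})
        self._smp = saved.get("smp", DEFAULT_SMP)

    def set_smp(self, smp: int):
        self._smp = smp
        # Save back, so the next ScyllaNode constructed for this
        # server_id (e.g. via start_nodes() -> nodelist()) sees it.
        self.cluster._node_resources.setdefault(self.server_id, {})["smp"] = smp
```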
Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1629
--
No backport needed: this only fixes dtest infrastructure, no production code
is affected.
Closes scylladb/scylladb#29549
* github.com:scylladb/scylladb:
test/cluster/dtest: add test for node.set_smp() persistence
test/cluster/dtest: cache ScyllaNode instances in ScyllaCluster
test/cluster/dtest/ccmlib/scylla_node: add debug logging
Add --keep-duplicates CLI argument to bypass deduplication and forward
to pytest, allowing duplicate test file arguments to be collected
multiple times.
Move RUN_ID assignment from pytest_collect_file to modify_pytest_item.
All File collectors for the same source file share a single run_ids
dict (via RUN_ID_CACHE stash key), so items from duplicate collection
arguments (e.g. with --keep-duplicates) automatically get unique IDs.
Remove CppFile.run_id cached_property — CppTestCase now reads RUN_ID
from its own item stash, which is set during modify_pytest_item.
Fix --repeat option default from string "1" to int 1 — argparse only
applies type= to CLI-parsed values, not defaults.
Co-Authored-By: Claude Opus 4.6 (200K context) <noreply@anthropic.com>
Return a DisabledFile collector instead of an empty list when all modes
are disabled for a file. Returning an empty list caused subsequent
files to not get their stash items set because file_path was never
removed from REPEATING_FILES.
Co-Authored-By: Claude Opus 4.6 (200K context) <noreply@anthropic.com>
The table-based audit backend needs Raft to create its keyspace,
but the audit service must exist earlier so that CQL paths don't
silently skip auditing.
Split startup into two phases: construction and storage
initialization. Queries arriving between the two phases are
logged as errors.
This is a refactoring commit and the split sections will be
moved later in this patch series.
Refs SCYLLADB-1615
User creation via the maintenance socket should produce audit
entries, as this is the recommended flow for creating the
initial superuser when default credentials are disabled.
The test is parametrized by audit backend (table and syslog).
The maintenance socket source address is "::" because Seastar
returns a zero-initialised in6_addr for AF_UNIX sockets.
Test time in dev: 0.6s
Refs SCYLLADB-1615
Some tests in test_tablets.py read system_schema.keyspaces from an arbitrary node that may not have applied the latest schema change yet. Pin the read to a specific node and issue a read barrier before querying, ensuring the node has up-to-date data.
Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1700
Test fix; no backport
Closes scylladb/scylladb#29655
* github.com:scylladb/scylladb:
test: fix flaky rack list conversion tests by using read barrier
test: fix flaky test_enforce_rack_list_option by using read barrier
Commit 2b7aa32 (topology_coordinator: Refresh load stats after
table is created or altered) registered topology_coordinator as a
schema change listener and added on_create_column_family which
fire-and-forgets _tablet_load_stats_refresh.trigger(). The
triggered task runs on the gossip scheduling group via
with_scheduling_group and accesses the topology_coordinator via
'this'.
stop() unregisters the listener but does not wait for any
in-flight refresh task. If a notification fires between
_tablet_load_stats_refresh.join() in run() and unregister_listener
in stop(), the scheduled task can outlive the topology_coordinator
and access freed memory after run_topology_coordinator's coroutine
frame is destroyed.
Wait for the refresh to complete in stop() after unregistering the
listener, ensuring no task can fire after destruction.
Fixes SCYLLADB-1728
Backport to 2026.1 and 2026.2, because the issue was introduced in 2b7aa32
Closes scylladb/scylladb#29653
* https://github.com/scylladb/scylladb:
test: tablet_stats: reproduce shutdown refresh race
topology_coordinator: join tablet load stats refresh in stop()
Add a test that reproduces SCYLLADB-1629: set_smp() had no effect
because nodelist() created new ScyllaNode objects on every call,
losing the _smp_set_during_test value. The test fails without the
fix in the previous patch.
ScyllaCluster.nodelist() was creating new ScyllaNode objects on every
call, so per-node state set via set_smp(), set_log_level(), and
_adjust_smp_and_memory() was lost between calls.
Fix by caching ScyllaNode instances in a list populated by
_add_nodes() using the list returned by servers_add() in populate().
Nodes are assigned monotonically increasing names (node1, node2, ...).
nodelist() simply returns the cached list.
The LDAP role manager's `_cache_pruner` background fiber periodically calls cache::reload_all_permissions(). Two races cause it to hit SCYLLA_ASSERT(_permission_loader):
- Cross-shard race: The pruner used `_cache.container().invoke_on_all()` to reload permissions on every shard. Since both `service::start()` and `sharded<service>::stop()` execute per-shard in parallel, the pruner on one shard could call reload_all_permissions() on another shard before that shard set its loader (startup) or after it cleared its loader (shutdown). Each shard runs its own pruner instance, so reloading locally is sufficient — this also removes redundant N² reload calls.
- Intra-shard race: `service::stop()` cleared the permission loader and stopped the role manager concurrently (via when_all_succeed). A mid-reload pruner could yield and then call the now-null loader. Fixed by stopping the role manager first so the pruner is fully drained before the loader is cleared.
Fixes SCYLLADB-1679
Backport to 2026.2, introduced in 7eedf50c12
Closes scylladb/scylladb#29605
* github.com:scylladb/scylladb:
auth: make shutdown the exact reverse of startup
test: ldap: add test for pruner crash during shutdown
auth: start authorizer and set permission loader before role manager
auth: stop role manager before clearing permission loader
auth: reload LDAP permission cache on local shard only
In certain circumstances the current way of collecting can be error-prone. Collection can stop when the first file is skipped in a given mode, leaving the rest of the files from the CLI uncollected.
Another issue is that if a file is specified twice, via its directory and explicitly as a file, an incorrect CppFile is produced in the stash, causing a KeyError.
Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1714
No backport, test framework bug fix only.
Closes scylladb/scylladb#29634
* github.com:scylladb/scylladb:
test.py: fix framework test
test.py: fix test collection bug
The coordinator can receive a schema-change notification after run()
finishes but before stop() unregisters listeners. The test pins that
window with error injections and verifies stop() waits for the refresh
instead of letting it outlive the coordinator.
Test time in dev: 9.51s
Refs SCYLLADB-1728
There is a small race window where Raft leadership could transfer back
to servers[1] between the ensure_group0_leader_on() check and the
actual restart. If this happens, the new coordinator re-initiates
repair and masks the compaction-merge bug.
Extract the core test logic into _do_race_window_promotes_unrepaired_data()
which directly checks get_topology_coordinator() after restart and raises
_LeadershipTransferred if servers[1] became coordinator. The test
function calls this helper in a retry loop (up to 5 attempts).
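The retry structure, sketched (the helper body is elided; names follow the description above):
```python
import logging
import pytest

logger = logging.getLogger(__name__)

class _LeadershipTransferred(Exception):
    """Raft leadership moved back to servers[1] inside the race window."""

async def test_incremental_repair_race_window_promotes_unrepaired_data(manager):
    for attempt in range(5):
        try:
            await _do_race_window_promotes_unrepaired_data(manager)
            return
        except _LeadershipTransferred:
            logger.info("leadership transferred back, retrying (attempt %d)", attempt + 1)
    pytest.fail("leadership kept transferring back to servers[1]")
```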
Refs: SCYLLADB-1478
The test_incremental_repair_race_window_promotes_unrepaired_data test
was flaky because it hardcodes servers[1] as the restart target but did
not ensure servers[1] was NOT the topology coordinator.
When servers[1] happened to be the Raft group0 leader (topology
coordinator), restarting it killed the leader, forced a new election,
and the new coordinator re-initiated tablet repair. This re-repair
flushes memtables on all replicas via take_storage_snapshot() and marks
the resulting sstables as repaired -- causing post-repair keys to appear
in repaired sstables on servers[0] and servers[2]. The test then hit
the wrong assertion (servers[0]/[2] contaminated).
Fix: before starting the repair, check whether servers[1] is the
topology coordinator. If so, move leadership to another server via
ensure_group0_leader_on() so that restarting servers[1] only kills a
follower -- which does not trigger an election or coordinator change.
Reproducibility was confirmed by forcing leadership to servers[1] via
ensure_group0_leader_on() and observing deterministic failure with all
three servers showing post-repair keys in repaired sstables (confirming
the re-repair scenario), then verifying the fix passes reliably.
Fixes: SCYLLADB-1478
test_numeric_rf_to_rack_list_conversion and
test_numeric_rf_to_rack_list_conversion_abort were reading
system_schema.keyspaces from an arbitrary node that may not have
applied the latest schema change yet. Pin the read to a specific node
and issue a read barrier before querying, ensuring the node has
up-to-date data.
The test was reading system_schema.keyspaces from an arbitrary node
that may not have applied the latest schema change yet. Pin the read
to a specific node and issue a read barrier before querying, ensuring
the node has up-to-date data.
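A hedged sketch of the pattern both fixes use; the read_barrier REST helper shown is illustrative, and the per-request host override is how recent cassandra/scylla Python drivers pin a query to one node:
```python
async def read_keyspaces_on(manager, cql, server):
    # Hypothetical REST helper: make `server` catch up with group0
    # state before we read schema tables from it.
    await manager.api.read_barrier(server.ip_addr)
    # Pin the query to that same node via the driver's per-request
    # host override.
    host = next(h for h in cql.cluster.metadata.all_hosts()
                if h.address == server.ip_addr)
    return cql.execute(
        "SELECT keyspace_name, replication FROM system_schema.keyspaces",
        host=host)
```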
server_add() only waits for CQL readiness before returning. The
Alternator HTTP port may not be listening yet, causing
ConnectionRefused errors in Alternator tests.
Extend the ServerUpState enum and startup loop to also check Alternator
port readiness when configured. Whenever Alternator ports are
configured, each one is verified to be connectable and queryable,
similar to how CQL ports are probed.
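A minimal sketch of the readiness probe (the real loop also confirms the port is queryable, not just connectable):
```python
import asyncio
import time

async def wait_for_port(host: str, port: int, timeout: float = 60.0):
    # Retry a bare TCP connect until the port accepts connections.
    deadline = time.time() + timeout
    while True:
        try:
            _, writer = await asyncio.open_connection(host, port)
            writer.close()
            await writer.wait_closed()
            return
        except OSError:
            if time.time() > deadline:
                raise
            await asyncio.sleep(0.1)
```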
Fixes SCYLLADB-1701
Closes scylladb/scylladb#29625
Commit ce00d61917 ("db: implement large_data virtual tables with feature
flag gating") changed these two tests to construct their mutation with
a randomly generated partition key (simple_schema::make_pkey()) instead
of the previously fixed pk "pv", with the comment that this avoids a
"Failed to generate sharding metadata" error.
simple_schema::make_pkey() delegates to tests::generate_partition_key(),
which defaults to key_size{1, 128}, i.e. the partition key length is
uniformly random in [1, 128] bytes. That interacts badly with the fact
that both tests pick thresholds at exact byte boundaries of the MC
sstable row encoding:
- The large-data handler records a row's size as
_data_writer->offset() - current_pos
(sstables/mx/writer.cc: collect_row_stats()), i.e. the number of
bytes the row took on disk.
- For the first clustering row, the body includes a vint-encoded
prev_row_size = pos - _prev_row_start.
- _prev_row_start is captured at the start of the partition
(consume_new_partition()) before the partition key is written to
the data stream, so prev_row_size rolls in the partition key's
serialized length (2-byte prefix + pk bytes) + deletion_time +
static row size.
A random-size partition key therefore perturbs the first clustering
row's encoded size by 1-2 bytes across runs (the vint of prev_row_size
crosses the 128 boundary), flipping the test's byte-exact threshold
comparison. On seed 2104744000 this produced:
critical check row_size_count == expected.size() has failed [3 != 2]
Fix the two byte-exact-sensitive tests by reverting their partition key
to the fixed s.new_mutation("pv") used before ce00d61917. Under smp=1
(which these tests run with, per -c1 in the test invocation) a fixed
key is always shard-local, so no sharding-metadata issue arises here.
The other tests modified by ce00d61917 (test_sstable_log_too_many_rows,
test_sstable_log_too_many_dead_rows, test_sstable_too_many_collection_elements,
test_large_data_records_round_trip, etc.) assert on row/element counts
or use thresholds with enough slack that the partition key size does
not matter, and are left unchanged.
Add an explanatory comment to each fixed site so the pitfall is not
re-introduced by a future refactor.
Verified stable with:
./test.py --mode=dev test/boost/sstable_3_x_test.cc::test_sstable_write_large_row --repeat 100 --max-failures 1
./test.py --mode=dev test/boost/sstable_3_x_test.cc::test_sstable_write_large_cell --repeat 100 --max-failures 1
./test.py --mode=release test/boost/sstable_3_x_test.cc::test_sstable_write_large_row --repeat 100 --max-failures 1
./test.py --mode=release test/boost/sstable_3_x_test.cc::test_sstable_write_large_cell --repeat 100 --max-failures 1
All four invocations: 100/100 passed.
Fixes: SCYLLADB-1685
Closes scylladb/scylladb#29621
In certain circumstances the current way of collecting can be error-prone.
Collection can stop when the first file is skipped in a given mode, leaving
the rest of the files from the CLI uncollected.
Another issue is that if a file is specified twice, via its directory and
explicitly as a file, an incorrect CppFile is produced in the stash, causing
a KeyError.
Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1714
Verify that service::stop() drains the LDAP pruner before
clearing the permission loader. The test installs a slow
permission loader and confirms the pruner is actively
reloading when teardown begins.
Refs SCYLLADB-1679
The failure_detector_timeout_in_ms override of 2000ms in 6 cluster test files is too aggressive for debug/sanitize builds. During node joins, the coordinator's failure detector times out on RPC pings to the joining node while it is still applying schema snapshots, marks it DOWN, and bans it — causing flaky test failures.
Scale the timeout by MODES_TIMEOUT_FACTOR (3x for debug/sanitize, 2x for dev, 1x for release) via a shared failure_detector_timeout fixture in conftest.py.
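A sketch of the shared fixture, assuming a `build_mode` fixture exposing the current mode (the real conftest.py wiring may differ):
```python
import pytest

# 3x for debug/sanitize, 2x for dev, 1x for release
MODES_TIMEOUT_FACTOR = {"debug": 3, "sanitize": 3, "dev": 2, "release": 1}

@pytest.fixture
def failure_detector_timeout(build_mode) -> int:
    # Base 2000 ms, scaled so slow builds survive schema-snapshot
    # application during node joins without the detector firing.
    return 2000 * MODES_TIMEOUT_FACTOR.get(build_mode, 1)
```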
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1587
Backport: no, elasticsearch analyser shows only a single failure
Closes scylladb/scylladb#29522
* github.com:scylladb/scylladb:
test/cluster: scale failure_detector_timeout_in_ms by build mode
test/cluster: add failure_detector_timeout fixture
The log grep in get_sst_status searched from the beginning of the log
(no from_mark), so the second-repair assertions were checking cumulative
counts across both repairs rather than counts for the second repair alone.
The expected values (sst_add==2, sst_mark==2) relied on this cumulative
behaviour: 1 from the first repair + 1 from the second = 2. This works
when the second repair encounters exactly one unrepaired sstable, but
fails whenever the second repair sees two.
The second repair can see two unrepaired sstables when the 100 keys
inserted before it (via asyncio.gather) trigger a background auto-flush
before take_storage_snapshot runs. take_storage_snapshot always flushes
the memtable itself, so if an auto-flush already split the batch into two
sstables on disk, the second repair's snapshot contains both and logs
"Added sst" twice, making the cumulative count 3 instead of 2.
Fix: take a log mark per-server before each repair call and pass it to
get_sst_status so each check counts only the entries produced by that
repair. The expected values become 1/0/1 and 1/1/1 respectively,
independent of how many sstables happened to exist beforehand.
get_sst_status gains an optional from_mark parameter (default None)
which preserves existing call sites that intentionally grep from the
start of the log.
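The shape of the change, sketched with the framework's log-browsing helpers; the repair trigger and exact mark/grep signatures are illustrative:
```python
async def check_repair_sst_counts(manager, server, expected):
    log = await manager.server_open_log(server.server_id)
    # Take a mark just before this repair ...
    mark = await log.mark()
    await run_repair(manager, server)  # hypothetical repair trigger
    # ... and count only entries written after it; from_mark=None would
    # preserve the old grep-from-start behaviour.
    added = await log.grep("Added sst", from_mark=mark)
    assert len(added) == expected
```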
Fixes: SCYLLADB-1086
Closes scylladb/scylladb#29484
The test creates a sync point immediately after writing 100 rows
with CL=ANY, without waiting for pending hint writes to complete.
store_hint() is fire-and-forget: it submits do_store_hint() to a gate
and returns immediately. do_store_hint() updates _last_written_rp only
after writing to the commitlog. If create_sync_point() is called before
all do_store_hint() coroutines complete, the captured replay position
is stale, and await_sync_point() returns DONE before all hints are
replayed, leaving some rows missing.
Fix by waiting for the size_of_hints_in_progress metric to reach zero
before creating the sync point, ensuring all in-flight hint writes have
completed and _last_written_rp is up to date. This follows the same
pattern already used in test_sync_point.
Fixes: SCYLLADB-1560
Closes scylladb/scylladb#29623
Tracing events are written to system_traces.events with CL=ANY, so they
are only guaranteed to be present on the local node of the query
coordinator. Reading them back with the driver default (CL=LOCAL_ONE)
may route the query to a replica that has not yet received all events,
causing the assertion on 'digest mismatch, starting read repair' to fail
intermittently.
Fix execute_with_tracing() to read tracing via the ResponseFuture API
with query_cl=ConsistencyLevel.ALL, so events from all replicas are
merged before the caller inspects them.
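A sketch of the fixed helper, assuming a driver version whose trace getters accept query_cl:
```python
from cassandra import ConsistencyLevel

def execute_with_tracing(session, query):
    future = session.execute_async(query, trace=True)
    rows = future.result()
    # Fetch trace events at CL=ALL so events from every replica are
    # merged before the caller inspects them.
    traces = future.get_all_query_traces(query_cl=ConsistencyLevel.ALL)
    events = [event for trace in traces for event in trace.events]
    return rows, events
```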
Fixes: SCYLLADB-1633
Closes scylladb/scylladb#29566
The previous fix for the flakiness in test_file_streaming waited for
the scylla_database_view_update_backlog metric to drop to 0 via
wait_for(view_updates_drained, ...). However, the predicate returned
True/False, while wait_for treats any non-None result as 'done' and
keeps retrying only on None. So when the backlog was non-zero the
predicate returned False, which wait_for interpreted as success and
returned immediately - the test could then stop servers[0]/servers[1]
before the view updates generated by new_server from the migrated
staging sstable were actually delivered, leading to a partially
populated MV (e.g. 431/1000 rows) and a failing assertion.
Fix the predicate to return None instead of False when the backlog is
not yet drained, so wait_for will actually retry until the metric
reaches 0 (or the deadline is hit).
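The contract, made explicit (get_metric is a hypothetical stand-in for the test's metrics helper):
```python
from test.pylib.util import wait_for  # retries while the predicate returns None

async def wait_view_updates_drained(server, deadline):
    async def drained():
        backlog = await get_metric(server, "scylla_database_view_update_backlog")
        # wait_for() treats any non-None result as success, so "not
        # drained yet" must map to None, not False.
        return True if backlog == 0 else None
    await wait_for(drained, deadline)
```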
Fixes SCYLLADB-1182
Closes scylladb/scylladb#29587
Fixes: SCYLLADB-1523
As-is, the returned file object does not advance the file position on reads. One-line fix.
Added test to make sure this read path works as expected.
Closes scylladb/scylladb#29456
Wait for cql on all hosts after restarting a server in the test.
The problem that was observed is that the test restarts servers[1] and
doesn't wait for CQL to be ready on it. On test teardown it drops
the keyspace, trying to execute it on the host that is not ready, and
fails.
Fixes SCYLLADB-1632
Closes scylladb/scylladb#29562
Maintenance socket connections report a different source address
than regular CQL connections. Make the source field configurable
in the audit test helpers so that upcoming maintenance socket
tests can verify the correct address.
Also fix the syslog backend address parser to handle IPv6
addresses formatted as [ip]:port.
Refs SCYLLADB-1615
Commit 16b56c2451 ("Audit: avoid dynamic_cast on a hot path") moved
audit info into batch_statement via set_audit_info(), but only wired it
for the CQL-text BATCH path (raw::batch_statement::prepare()).
Native-protocol BATCH messages (opcode 0x0D), handled by
process_batch_internal in transport/server.cc, construct a
batch_statement without setting audit_info. This causes audit to
silently skip the entire batch.
Set audit_info on the batch_statement so these batches are audited.
Fixes SCYLLADB-1652
No backport - bug introduced recently.
Closes scylladb/scylladb#29570
* github.com:scylladb/scylladb:
test/audit: add reproducer for native-protocol batch not being audited
audit: set audit_info for native-protocol BATCH messages
test/audit: rename internal test methods to avoid CI misdetection
should be merged after #29235
Complete the typed skip markers migration started in the plugin PR.
Every bare `@pytest.mark.skip` decorator and `pytest.skip()` runtime call
across the test suite is replaced with a typed equivalent, making skip
reasons machine-readable in JUnit XML and Allure reports.
**62 files changed** across 8 commits, covering ~127 skip sites in total.
Bare `pytest.skip` provides only a free-text reason string. CI dashboards
(JUnit, Allure) cannot distinguish between a test skipped due to a known
bug, a missing feature, a slow test, or an environment limitation. This
makes it hard to track skip debt, prioritize fixes, or filter dashboards
by skip category.
The typed markers (`skip_bug`, `skip_not_implemented`, `skip_slow`,
`skip_env`) introduced by the `skip_reason_plugin` solve this by embedding
a `skip_type` field into every skip report entry.
**Decorator sites** (`@pytest.mark.skip`):
| Type | Count | Files | Description |
|------|-------|-------|-------------|
| `skip_bug` | 24 | 16 | Skip reason references a known bug/issue |
| `skip_not_implemented` | 10 | 5 | Feature not yet implemented in Scylla |
| `skip_slow` | 4 | 3 | Test too slow for regular CI runs |
| `skip_not_implemented` (bare) | 2 | 1 | Bare `@pytest.mark.skip` with no reason (COMPACT STORAGE, #3882) |
**Runtime call sites** (`pytest.skip()`):
| Type | Count | Files | Description |
|------|-------|-------|-------------|
| `skip_env` | ~85 | 34 | Feature/config/topology not available at runtime |
| `skip_bug` | 2 | 2 | Known bugs: Streams on tablets (#23838), coroutine task not found (#22501) |
- **Comments**: 7 comments/docstrings across 5 files updated from `pytest.skip()` to `skip()`
- **Plugin hardened**: `warnings.warn()` → `pytest.UsageError` for bare `@pytest.mark.skip` at collection time — bare skips are now a hard error, not a warning
- **Guard tests**: New `test/pylib_test/test_no_bare_skips.py` with 3 tests that prevent regression:
- AST scan for bare `@pytest.mark.skip` decorators
- AST scan for bare `pytest.skip()` runtime calls
- Real `pytest --collect-only` against all Python test directories
Runtime skip sites use the convenience wrappers from `test.pylib.skip_types`:
```python
from test.pylib.skip_types import skip_env
```
Usage:
```python
skip_env("Tablets not enabled")
```
1. **test: migrate @pytest.mark.skip to @pytest.mark.skip_bug for known bugs** — 24 decorator sites, 16 files
2. **test: migrate @pytest.mark.skip to @pytest.mark.skip_not_implemented** — 10 decorator sites, 5 files
3. **test: migrate @pytest.mark.skip to @pytest.mark.skip_slow** — 4 decorator sites, 3 files
4. **test: migrate bare @pytest.mark.skip to skip_not_implemented** — 2 bare decorators, 1 file
5. **test: migrate runtime pytest.skip() to typed skip_env()** — ~85 sites, 34 files
6. **test: migrate runtime pytest.skip() to typed skip_bug()** — 2 sites, 2 files
7. **test: update comments referencing pytest.skip() to skip()** — 7 comments, 5 files
8. **test/pylib: reject bare pytest.mark.skip and add codebase guards** — plugin hardening + 3 guard tests
- All 60 plugin + guard tests pass (`test/pylib_test/`)
- No bare `@pytest.mark.skip` or `pytest.skip()` calls remain in the codebase
- `pytest --collect-only` succeeds across all test directories with the hardened plugin
SCYLLADB-1349
Closes scylladb/scylladb#29305
* github.com:scylladb/scylladb:
test/alternator: replace bare pytest.skip() with typed skip helpers
test: migrate new bare skips introduced by upstream after rebase
test/pylib: reject bare pytest.mark.skip and add codebase guards
test: update comments referencing pytest.skip() to skip_env()
test: migrate runtime pytest.skip() to typed skip_bug()
test: migrate runtime pytest.skip() to typed skip_env()
test: migrate bare @pytest.mark.skip to skip_not_implemented
test: migrate @pytest.mark.skip to @pytest.mark.skip_slow
test: migrate @pytest.mark.skip to @pytest.mark.skip_not_implemented
test: migrate @pytest.mark.skip to @pytest.mark.skip_bug for known bugs
Add pylib_test to norecursedirs in pytest.ini so it is not collected
during ./test.py or pytest test/ runs, but can still be run directly
via 'pytest test/pylib_test'.
Also fix pytest log cleanup: worker log files (pytest_gw*) were not
being deleted on success because cleanup was restricted to the main
process only. Now each process (main and workers) cleans up its own
log file on success.
Closes scylladb/scylladb#29551
When tombstone_gc=repair, the repaired compaction view's sstable_set_for_tombstone_gc()
previously returned all sstables across all three views (unrepaired, repairing, repaired).
This is correct but unnecessarily expensive: the unrepaired and repairing sets are never
the source of a GC-blocking shadow for base tables when tombstone_gc=repair.
The key ordering guarantee that makes this safe is:
- topology_coordinator sends send_tablet_repair RPC and waits for it to complete.
Inside that RPC, mark_sstable_as_repaired() runs on all replicas, moving D from
repairing → repaired (repaired_at stamped on disk).
- Only after the RPC returns does the coordinator commit repair_time + sstables_repaired_at
to Raft.
- gc_before = repair_time - propagation_delay only advances once that Raft commit applies.
Therefore, when a tombstone T in the repaired set first becomes GC-eligible (its
deletion_time < gc_before), any data D it shadows is already in the repaired set on
every replica. This holds because:
- The memtable is flushed before the repairing snapshot is taken (take_storage_snapshot
calls sg->flush()), capturing all data present at repair time.
- Hints and batchlog are flushed before the snapshot, ensuring remotely-hinted writes
arrive before the snapshot boundary.
- Legitimate unrepaired data has timestamps close to 'now', always newer than any
GC-eligible tombstone (USING TIMESTAMP to write backdated data is user error / UB).
Excluding the repairing and unrepaired sets from the GC shadow check cannot cause any
tombstone to be wrongly collected. The memtable check is also skipped for the same
reason: memtable data is either newer than the GC-eligible tombstone, or was flushed
into the repairing/repaired set before gc_before advanced.
Safety restriction — materialized views:
The optimization IS applied to materialized view tables. Two possible paths could inject
D_view into the MV's unrepaired set after MV repair: view hints and staging via the
view-update-generator. Both are safe:
(1) View hints: flush_hints() creates a sync point covering BOTH _hints_manager (base
mutations) AND _hints_for_views_manager (view mutations). It waits until ALL pending view
hints — including D_view entries queued in _hints_for_views_manager while the target MV
replica was down — have been replayed to the target node before take_storage_snapshot() is
called. D_view therefore lands in the MV's repairing sstable and is promoted to repaired.
When a repaired compaction then checks for shadows it finds D_view in the repaired set,
keeping T_mv non-purgeable.
(2) View-update-generator staging path: Base table repair can write a missing D_base to a
replica via a staging sstable. The view-update-generator processes the staging sstable
ASYNCHRONOUSLY: it may fire arbitrarily later, even after MV repair has committed
repair_time and T_mv has been GC'd from the repaired set. However, the staging processor
calls stream_view_replica_updates() which performs a READ-BEFORE-WRITE via
as_mutation_source_excluding_staging(): it reads the CURRENT base table state before
building the view update. If T_base was written to the base table (as it always is before
the base replica can be repaired and the MV tombstone can become GC-eligible), the
view_update_builder sees T_base as the existing partition tombstone. D_base's row marker
(ts_d < ts_t) is expired by T_base, so the view update is a no-op: D_view is never
dispatched to the MV replica. No resurrection can occur regardless of how long staging is
delayed.
A potential sub-edge-case is T_base being purged BEFORE staging fires (leaving D_base as
the sole survivor, so stream_view_replica_updates would dispatch D_view). This is blocked
by an additional invariant: for tablet-based tables, the repair writer stamps repaired_at
on staging sstables (repair_writer_impl::create_writer sets mark_as_repaired = true and
perform_component_rewrite writes repaired_at = sstables_repaired_at + 1 on every staging
sstable). After base repair commits sstables_repaired_at to Raft, the staging sstable
satisfies is_repaired(sstables_repaired_at, staging_sst) and therefore appears in
make_repaired_sstable_set(). Any subsequent base repair that advances sstables_repaired_at
further still includes the staging sstable (its repaired_at ≤ new sstables_repaired_at).
D_base in the staging sstable thus shadows T_base in every repaired compaction's shadow
check, keeping T_base non-purgeable as long as D_base remains in staging.
A base table hint also cannot bypass this. A base hint is replayed as a base mutation. The
resulting view update is generated synchronously on the base replica and sent to the MV
replica via _hints_for_views_manager (path 1 above), not via staging.
USING TIMESTAMP with timestamps predating (gc_before + propagation_delay) is explicitly
UB and excluded from the safety argument.
For tombstone_gc modes other than repair (timeout, immediate, disabled) the invariant
does not hold for base tables either, so the full storage-group set is returned.
The expected gain is reduced bloom filter and memtable key-lookup I/O during repaired
compactions: the unrepaired set is typically the largest (it holds all recent writes),
yet for tombstone_gc=repair it never influences GC decisions.
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-231.
Closes scylladb/scylladb#29310
* github.com:scylladb/scylladb:
compaction: Restrict tombstone GC sstable set to repaired sstables for tombstone_gc=repair mode
test/repair: Add tombstone GC safety tests for incremental repair
Three test cases in multishard_query_test.cc set the querier_cache entry
TTL to 2s and then assert, between pages of a stateful paged query, that
cached queriers are still present (population >= 1) and that
time_based_evictions stays 0.
The 2s TTL is not load-bearing for what these tests exercise — they are
checking the paging-cache handoff, not TTL semantics. But on busy CI
runners (SCYLLADB-1642 was observed on aarch64 release), scheduling
jitter between saving a reader and sampling the population can exceed
2s. When that happens, the TTL fires, both saved queriers are
time-evicted, population drops to 0, and the assertion
`require_greater_equal(saved_readers, 1u)` fails. The trailing
`require_equal(time_based_evictions, 0)` check never runs because the
earlier assertion has already aborted the iteration — which is why the
Jenkins failure surfaces only as a bare "C++ failure at seastar_test.cc:93".
Reproduced deterministically in test_read_with_partition_row_limits by
injecting a `seastar::sleep(2500ms)` between the save and the sample:
the hook then reports
population=0 inserts=2 drops=0 time_based_evictions=2 resource_based_evictions=0
and the assertion fires — matching the Jenkins symptoms exactly.
Bump the TTL to 60s in all three affected tests:
- test_read_with_partition_row_limits (confirmed repro for SCYLLADB-1642)
- test_read_all (same pattern, same invariants — suspect)
- test_read_all_multi_range (same pattern, same invariants — suspect)
Leave test_abandoned_read (1s TTL, actually tests TTL-driven eviction)
and test_evict_a_shard_reader_on_each_page (tests manual eviction via
evict_one(); its TTL is not load-bearing but the fix is deferred for a
separate review) unchanged.
Fixes: SCYLLADB-1642
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes scylladb/scylladb#29564
With this change, you can add or remove one or more DCs in a single ALTER KEYSPACE statement. It requires the keyspace to use a rack-list replication factor.
In the existing approach, all tablet replicas are rebuilt at once during an RF change. This is no longer the case. In global_topology_request::keyspace_rf_change the request is added to ongoing_rf_changes - a new column in the system.topology table. The target RF is kept in next_replication - a new column in system_schema.keyspaces.
In make_rf_change_plan, the load balancer schedules the necessary migrations, considering node load and other pending tablet transitions. Requests from ongoing_rf_changes are processed concurrently, independently of one another. Within each request, racks are processed concurrently. No tablet replica is removed until all required replicas are added. While adding replicas to each rack we always start with base tables and don't proceed with views until they are done (while removing, the other way around). The intermediate steps aren't reflected in the schema. When the RF change is finished:
- in system_schema.keyspaces:
- next_replication is cleared;
- new keyspace properties are saved;
- request is removed from ongoing_rf_changes;
- the request is marked as done in system.topology_requests.
Until the request is done, DESCRIBE KEYSPACE shows replication_v2.
If a request hasn't started removing replicas, it can be aborted via the task manager: system.topology_requests::error is set (but the request isn't marked as done) and next_replication = replication_v2. The load balancer interprets this and starts rolling the request back. After the rollback is done, we mark the relevant system.topology_requests entry as done (failed), clear the request id from system.topology::ongoing_rf_changes, and remove next_replication.
Fixes: SCYLLADB-567.
No backport needed; new feature.
Closes scylladb/scylladb#24421
* github.com:scylladb/scylladb:
service: fix indentation
docs: update documentation
test: test multi RF changes
service: tasks: allow aborting ongoing RF changes
cql3: allow changing RF by more than one when adding or removing a DC
service: handle multi_rf_change
service: implement make_rf_change_plan
service: add keyspace_rf_change_plan to migration_plan
service: extend tablet_migration_info to handle rebuilds
service: split update_node_load_on_migration
service: rearrange keyspace_rf_change handler
db: add columns to system_schema.keyspaces
db: service: add ongoing_rf_changes to system.topology
gms: add keyspace_multi_rf_change feature
The existing test_batch sends a textual BEGIN BATCH ... APPLY BATCH as a
QUERY message, which goes through the CQL parser and raw::batch_statement::
prepare() — a path that correctly sets audit_info. This missed the bug
where native-protocol BATCH messages (opcode 0x0D), handled by
process_batch_internal in transport/server.cc, construct a batch_statement
without setting audit_info, causing audit to silently skip the batch.
Add _test_batch_native_protocol which uses the driver's BatchStatement
(both unprepared and prepared variants) to exercise this code path.
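Roughly what the new helper exercises (table name and values illustrative):
```python
from cassandra.query import BatchStatement, SimpleStatement

def _test_batch_native_protocol(session):
    # A driver-level BatchStatement is sent as a native-protocol BATCH
    # message (opcode 0x0D) and never passes through the CQL-text
    # "BEGIN BATCH ... APPLY BATCH" parser path.
    batch = BatchStatement()
    batch.add(SimpleStatement("INSERT INTO ks.t (pk, v) VALUES (1, 'a')"))
    # The prepared variant exercises the same server-side path.
    insert = session.prepare("INSERT INTO ks.t (pk, v) VALUES (?, ?)")
    batch.add(insert, (2, 'b'))
    session.execute(batch)
```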
Refs SCYLLADB-1652
The CI heuristic picks up any function named test_* in changed files
and tries to run it as a standalone pytest test. The AuditTester class
methods (test_batch, test_dml, etc.) are not top-level pytest tests —
they are internal helpers called from the actual test functions.
Prefix them with underscore so CI does not mistake them for
standalone tests.
A handful of cassandra-driver Cluster.shutdown() call sites in the
auth_cluster tests were missed by the previous sweep that introduced
safe_driver_shutdown(), because the local variable holding the Cluster
is named "c" rather than "cluster".
Direct Cluster.shutdown() is racy: the driver's "Task Scheduler"
thread may raise RuntimeError ("cannot schedule new futures after
shutdown") during or after the call, occasionally failing tests.
safe_driver_shutdown() suppresses this expected RuntimeError and
joins the scheduler thread.
Replace the remaining c.shutdown() calls in:
- test/cluster/auth_cluster/test_startup_response.py
- test/cluster/auth_cluster/test_maintenance_socket.py
with safe_driver_shutdown(c) and add the corresponding import from
test.pylib.driver_utils.
No behavioral change to the tests; only the driver teardown is
hardened against a known driver-side race.
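A sketch of what such a helper does; the real implementation in test.pylib.driver_utils may differ:
```python
import threading

def safe_driver_shutdown(cluster):
    # The driver's "Task Scheduler" thread may race shutdown and raise
    # RuntimeError("cannot schedule new futures after shutdown").
    try:
        cluster.shutdown()
    except RuntimeError:
        pass
    # Join the scheduler thread so it cannot outlive the test.
    for t in threading.enumerate():
        if t.name == "Task Scheduler":
            t.join(timeout=5)
```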
Fixes SCYLLADB-1662
Closes scylladb/scylladb#29576
When forcing a tablet count change via a CQL command, the underlying
tablet machinery takes some time to adjust. The original code waited
at most 0.1s for tablet data to be synchronized, which is not enough
on debug builds, so we add exponential backoff and increase the
maximum waiting time. The code now waits 0.1s the first time and
doubles the wait on each retry, up to a maximum of 6 attempts -
a total of ~6s.
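The backoff, sketched (check is a hypothetical predicate polling the tablet count):
```python
import asyncio

async def wait_for_tablet_count(check, attempts: int = 6):
    delay = 0.1
    for _ in range(attempts):
        if await check():
            return True
        await asyncio.sleep(delay)
        delay *= 2  # 0.1, 0.2, 0.4, 0.8, 1.6, 3.2 -> ~6s total
    return False
```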
Fixes: SCYLLADB-1655
Closes scylladb/scylladb#29573
When DROP TABLE races with an in-flight DML on a strongly-consistent
table, the node aborts in `groups_manager::acquire_server()` because the
raft group has already been erased from `_raft_groups`.
A concurrent `DROP TABLE` may have already removed the table from database
registries and erased the raft group via `schedule_raft_group_deletion`.
The `schema.table()` in `create_operation_ctx()` might not fail though
because someone might be holding `lw_shared_ptr<table>`, so that the
table is dropped but the table object is still alive.
Fix by accepting table_id in acquire_server and checking that the table
still exists in the database via `find_column_family` before looking up
the raft group. If the table has been dropped, find_column_family
throws no_such_column_family instead of the node aborting via
on_internal_error. When the table does exist, acquire_server proceeds
to acquire state.gate; schedule_raft_group_deletion co_awaits
gate::close, so it will wait for the DML operation to complete before
erasing the group.
backport: not needed (not released feature)
Fixes SCYLLADB-1450
Closes scylladb/scylladb#29430
* github.com:scylladb/scylladb:
strong_consistency: fix crash when DROP TABLE races with in-flight DML
test: add regression test for DROP TABLE racing with in-flight DML
assert_entries_were_added asserted that new audit rows always appear at
the tail of each per-node, event_time-sorted sequence. That invariant
is not a property of the audit feature: audit writes are asynchronous
with respect to query completion, and on a multi-node cluster QUORUM
reads of audit.audit_log can reveal a row with an older event_time
after a row with a newer one has already been observed.
Replace the positional tail slice with a per-node set difference
between the rows observed before and after the audited operation.
The wait_for retry loop, noise filtering, and final by-value
comparison against expected_entries are unchanged, so the test still
verifies the real contract, that the expected audit entries appear,
without relying on a visibility-ordering invariant that the audit log
does not guarantee.
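The replacement comparison, sketched (rows assumed hashable and compared by value):
```python
def added_entries(rows_before, rows_after):
    # Per-node set difference: order-insensitive, so a row with an
    # older event_time surfacing late in a QUORUM read cannot break
    # the check the way a positional tail slice did.
    return {node: set(rows_after[node]) - set(rows_before.get(node, ()))
            for node in rows_after}
```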
Fixes SCYLLADB-1589
Closes scylladb/scylladb#29567
The statement_restrictions code is responsible for analyzing the WHERE
clause, deciding on the query plan (which index to use), and extracting
the partition and clustering keys to use for the index.
Currently, it suffers from repetition in making its decisions: there are 15
calls to expr::visit in statement_restrictions.cc, and 14 find_binop calls. This
reduces to 2 visits (one nested in the other) and 6 find_binop calls. The analysis
of binary operators is done once, then reused.
The key data structure introduced is the predicate. While an expression
takes inputs from the row evaluated, constants, and bind variables, and
produces a boolean result, predicates ask which values for a column (or
a number of columns) are needed to satisfy (part of) the WHERE clause.
The WHERE clause is then expressed as a conjunction of such predicates.
The analyzer uses the predicates to select the index, then uses the predicates
to compute the partition and clustering keys.
The refactoring is composed of these parts (but patches from different parts
are interspersed):
1. an exhaustive regression test is added as the first commit, to ensure behavior doesn't change
2. move computation from query time to prepare time
3. introduce, gradually enrich, and use predicates to implement the statement_restrictions API
Major refactoring, and no bugs fixed, so definitely not backporting.
Closes scylladb/scylladb#29114
* github.com:scylladb/scylladb:
cql3: statement_restrictions: replace has_eq_restriction_on_column with precomputed set
cql3: statement_restrictions: replace multi_column_range_accumulator_builder with direct predicate iteration
cql3: statement_restrictions: use predicate fields in build_get_clustering_bounds_fn
cql3: statement_restrictions: remove extract_single_column_restrictions_for_column
cql3: statement_restrictions: use predicate vectors in prepare_indexed_local
cql3: statement_restrictions: use predicate vector size for clustering prefix length
cql3: statement_restrictions: replace do_find_idx and is_supported_by with predicate-based versions
cql3: statement_restrictions: remove expression-based has_supporting_index and index_supports_some_column
cql3: statement_restrictions: replace multi-column and PK index support checks with predicate-based versions
cql3: statement_restrictions: add predicate-based index support checking
cql3: statement_restrictions: use pre-built single-column maps for index support checks
cql3: statement_restrictions: build clustering-prefix restrictions incrementally
cql3: statement_restrictions: build partition-range restrictions incrementally
cql3: statement_restrictions: build clustering-key single-column restrictions map incrementally
cql3: statement_restrictions: build partition-key single-column restrictions map incrementally
cql3: statement_restrictions: build non-primary-key single-column restrictions map incrementally
cql3: statement_restrictions: use tracked has_mc_clustering for _has_multi_column
cql3: statement_restrictions: track has-token state incrementally
cql3: statement_restrictions: track partition-key-empty state incrementally
cql3: statement_restrictions: track first multi-column predicate incrementally
cql3: statement_restrictions: track last clustering column incrementally
cql3: statement_restrictions: track clustering-has-slice incrementally
cql3: statement_restrictions: track has-multi-column-clustering incrementally
cql3: statement_restrictions: track clustering-empty state incrementally
cql3: statement_restrictions: replace restr bridge variable with pred.filter
cql3: statement_restrictions: convert single-column branch to use predicate properties
cql3: statement_restrictions: convert multi-column branch to use predicate properties
cql3: statement_restrictions: convert constructor loop to iterate over predicates
cql3: statement_restrictions: annotate predicates with operator properties
cql3: statement_restrictions: annotate predicates with is_not_null and is_multi_column
cql3: statement_restrictions: complete preparation early
cql3: statement_restrictions: convert expressions to predicates without being directed at a specific column
cql3: statement_restrictions: refine possible_lhs_values() function_call processing
cql3: statement_restrictions: return nullptr for function solver if not token
cql3: statement_restrictions: refine possible_lhs_values() subscript solving
cql3: statement_restrictions: return nullptr from possible_lhs_values instead of on_internal_error
cql3: statement_restrictions: convert possible_lhs_values into a solver
cql3: statement_restrictions: split _where to boolean factors in preparation for predicates conversion
cql3: statement_restrictions: refactor IS NOT NULL processing
cql3: statement_restrictions: fold add_single_column_nonprimary_key_restriction() into its caller
cql3: statement_restrictions: fold add_single_column_clustering_key_restriction() into its caller
cql3: statement_restrictions: fold add_single_column_partition_key_restriction() into its caller
cql3: statement_restrictions: fold add_token_partition_key_restriction() into its caller
cql3: statement_restrictions: fold add_multi_column_clustering_key_restriction() into its caller
cql3: statement_restrictions: avoid early return in add_multi_column_clustering_key_restrictions
cql3: statement_restrictions: fold add_is_not_restriction() into its caller
cql3: statement_restrictions: fold add_restriction() into its caller
cql3: statement_restrictions: remove possible_partition_token_values()
cql3: statement_restrictions: remove possible_column_values
cql3: statement_restrictions: pass schema to possible_column_values()
cql3: statement_restrictions: remove fallback path in solve()
cql3: statement_restrictions: reorder possible_lhs_column parameters
cql3: statement_restrictions: prepare solver for multi-column restrictions
cql3: statement_restrictions: add solver for token restriction on index
cql3: statement_restrictions: pre-analyze column in value_for()
cql3: statement_restrictions: don't handle boolean constants in multi_column_range_accumulator_builder
cql3: statement_restrictions: split range_from_raw_bounds into prepare phase and query phase
cql3: statement_restrictions: adjust signature of range_from_raw_bounds
cql3: statement_restrictions: split multi_column_range_accumulator into prepare-time and query-time phases
cql3: statement_restrictions: make get_multi_column_clustering_bounds a builder
cql3: statement_restrictions: multi-key clustering restrictions one layer deeper
cql3: statement_restrictions: push multi-column post-processing into get_multi_column_clustering_bounds()
cql3: statement_restrictions: pre-analyze single-column clustering key restrictions
cql3: statement_restrictions: wrap value_for_index_partition_key()
cql3: statement_restrictions: hide value_for()
cql3: statement_restrictions: push down clustering prefix wrapper one level
cql3: statement_restrictions: wrap functions that return clustering ranges
cql3: statement_restrictions: do not pass view schema back and forth
cql3: statement_restrictions: pre-analyze token range restrictions
cql3: statement_restrictions: pre-analyze partition key columns
cql3: statement_restrictions: do not collect subscripted partition key columns
cql3: statement_restrictions: split _partition_range_restrictions into three cases
cql3: statement_restrictions: move value_list, value_set to header file
cql3: statement_restrictions: wrap get_partition_key_ranges
cql3: statement_restrictions: prepare statement_restrictions for capturing `this`
test: statement_restrictions: add index_selection regression test