Commit Graph

53574 Commits

Author SHA1 Message Date
Aleksandr Bykov
8afdae24d2 test: fix flaky test_kill_coordinator_during_op
The test hardcoded the expected number of coordinator elections
(2, 3, 4, 5) for each phase. If a prior phase triggered an extra
election, subsequent phases would wait for a count that was already
reached or would never match.

Fix by reading the current election count before each operation and
expecting exactly one more, making each phase independent of prior
history.
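
The phase-independence idea can be sketched as follows (a minimal Python sketch; `run_phase`, `read_election_count`, `do_operation`, and `wait_for_count` are illustrative names standing in for the real test helpers, not the actual test code):

```python
def run_phase(read_election_count, do_operation, wait_for_count):
    """Run one kill-coordinator phase independently of prior history.

    Instead of hardcoding the expected election count (2, 3, 4, 5...),
    read the current count first and expect exactly one more election.
    """
    target = read_election_count() + 1
    do_operation()          # e.g. kill the current coordinator
    wait_for_count(target)  # block until the count reaches the target
    return target
```

This way an extra election in a prior phase merely raises the baseline; it cannot make a later phase wait for a count that was already passed.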

Also add wait_for_no_pending_topology_transition() calls after each
coordinator election to ensure the topology state machine has fully
settled before proceeding with restarts and further operations.

Decrease the failure detector timeout (failure_detector_timeout_in_ms)
to 2000 ms on all test nodes so that coordinator crashes are detected
faster, reducing test wallclock time and timeout-related flakiness.

Enable raft_topology=trace logging on all test nodes to aid
post-failure diagnosis. Add diagnostic logging in
wait_new_coordinator_elected().

Fixes: SCYLLADB-1089

Closes scylladb/scylladb#29284
2026-04-30 21:27:56 +03:00
Ernest Zaslavsky
1febfbd9b5 test: rename sstable_tablet_streaming.cc to match the naming convention
Apparently, a Boost test file name MUST end with "_test" for it to be executed by test.py.

Closes scylladb/scylladb#29693
2026-04-30 11:16:39 +03:00
Pavel Emelyanov
1ca97f0c0a Merge 'test: fix disabled test handling and deduplicate CLI test arguments' from Evgeniy Naydanov
- Revert the previous "test.py: fix test collection bug" commit (92c09d10) which worked around broken deduplication by filtering items without `BUILD_MODE` in `pytest_collection_modifyitems`. This approach masked the root cause and is superseded by the proper fixes below.
- Backport pytest 9.0.3's argument normalization algorithm into `test.py` to work around broken deduplication in pytest 8.3.5 ([pytest-dev/pytest#12083](https://github.com/pytest-dev/pytest/issues/12083)). Duplicate or subsumed test paths (e.g. `test/cql` and `test/cql/lua_test.cql`) are collapsed before invoking pytest. Revert when upgrading to pytest 9.x.
- Return a `DisabledFile` collector instead of an empty list in `pytest_collect_file` when all modes are disabled for a file, fixing a bug where subsequent files would not get their stash items set (`REPEATING_FILES`). Restructure `pytest_collect_file` to use a walrus operator (`if repeats := ...`) with a single `remove(file_path)` and `return collectors` at the end, eliminating the early return.
- Add `--keep-duplicates` CLI argument to bypass deduplication and forward to pytest.
- Move `RUN_ID` assignment from `pytest_collect_file` to `modify_pytest_item`. A shared `run_ids` cache (`defaultdict[tuple[str, str], count]`) is created in `pytest_collection_modifyitems` and passed to `modify_pytest_item`, keyed by `(build_mode, nodeid)` so each mode gets independent counters. This ensures unique run IDs even when `--keep-duplicates` causes the same file to be collected multiple times.
- Fix `--repeat` option default from string `"1"` to int `1` — argparse only applies `type=` to CLI-parsed values, not defaults.

pytest normally deduplicates overlapping test arguments — e.g. `test/cql test/cql/lua_test.cql` collects `lua_test.cql` only once. The original `test.py` never performed this deduplication, and the pytest version in the toolchain image (8.3.5) has a bug that breaks it ([pytest-dev/pytest#12083](https://github.com/pytest-dev/pytest/issues/12083)).

Since we are moving to bare pytest, `test.py` should match pytest's default behavior: deduplicate. Because we cannot easily upgrade pytest, commit 2 backports the deduplication logic from pytest 9.0.3.

To match pytest's interface, `--keep-duplicates` is added as an opt-out. This lets a user intentionally run overlapping paths — e.g. `./test.py test/blah test/blah/test_foo.py --keep-duplicates` runs `test_foo.py` twice. The flag is forwarded to pytest and also skips the backported deduplication in `test.py`.
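
The subsumption idea can be sketched in a few lines of Python (a simplified illustration of the behavior, not the backported pytest 9.0.3 code; `dedup_args` is an invented name):

```python
import os

def dedup_args(args):
    # Simplified sketch: drop an argument when an earlier-kept argument
    # equals it or is one of its ancestor directories. The comparison is
    # component-wise, so 'test/cqlx' is not subsumed by 'test/cql'.
    kept, kept_parts = [], []
    for arg in args:
        parts = os.path.normpath(arg).split(os.sep)
        if not any(parts[:len(kp)] == kp for kp in kept_parts):
            kept.append(arg)
            kept_parts.append(parts)
    return kept
```

So `['test/cql', 'test/cql/lua_test.cql']` collapses to `['test/cql']`, which is what pytest's own normalization would collect anyway.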

- Revert 92c09d10 which filtered items without `BUILD_MODE` in `pytest_collection_modifyitems` and added an early return in `CppFile.collect()`. This workaround is superseded by the proper deduplication and `DisabledFile` fixes.

- Add `_CollectionArgument` dataclass (`order=True`, `__contains__` for subsumption) and `_deduplicate_test_args()` function, adapted from pytest 9.0.3. Marked with a TODO to remove once we update to pytest 9.x.
- Call `_deduplicate_test_args()` on `options.name` before passing to pytest.

- Add `DisabledFile(pytest.File)` that skips collection with an informative message instead of returning an empty list.
- Restructure `pytest_collect_file` to use walrus operator: `if repeats := ...:` / `else:` — single `remove(file_path)` at end, no early return.

- Add `--keep-duplicates` argument that skips deduplication and is forwarded to pytest.
- Create a shared `run_ids` cache in `pytest_collection_modifyitems` and pass it to `modify_pytest_item`, which assigns unique sequential RUN_IDs via `itertools.count`. The cache is keyed by `(build_mode, nodeid)` so each mode gets independent counters.
- Remove `RUN_ID` from `_STASH_KEYS_TO_COPY` — it is no longer set on collectors.
- Remove `CppFile.run_id` cached_property. `CppTestCase` now reads `RUN_ID` from its own item stash.
- Fix `--repeat` option default from `"1"` to `1` and drop redundant `int()` cast.

Closes SCYLLADB-1730

Closes scylladb/scylladb#29665

* github.com:scylladb/scylladb:
  test: add --keep-duplicates and assign RUN_ID via shared cache
  test/pylib/runner: fix disabled file collection
  test.py: deduplicate CLI test arguments before passing to pytest
  Revert "test.py: fix test collection bug"
2026-04-30 07:58:25 +03:00
Yaniv Michael Kaul
93722f2c89 gms/gossiper: fix use-after-move in do_send_ack2_msg
The second logger.debug() call accesses ack2_msg after it was moved
via std::move() in the co_await send_gossip_digest_ack2 call.
This is undefined behavior.

Fix by formatting ack2_msg to a string before the move, then using
that cached string in both debug log calls.

FIXES: https://scylladb.atlassian.net/browse/SCYLLADB-1778

Closes scylladb/scylladb#29227
2026-04-30 07:07:39 +03:00
Wojciech Mitros
ebaf536449 replica/database: fix cross-shard deadlock in lock_tables_metadata()
lock_tables_metadata() acquires a write lock on tables_metadata._cf_lock
on every shard.  It used invoke_on_all(), which dispatches lock
acquisitions to all shards in parallel via parallel_for_each +
smp::submit_to.

When two fibers call lock_tables_metadata() concurrently, this can
deadlock.  parallel_for_each starts all iterations unconditionally:
even when the local shard's lock attempt blocks (because the other
fiber already holds it), SMP messages are still sent to remote shards.
Both fibers' lock-acquisition messages land in the per-shard SMP
queues.  The SMP queue itself is FIFO, but process_incoming() drains
it and schedules each item as a reactor task via add_task(), which —
in debug and sanitize builds with SEASTAR_SHUFFLE_TASK_QUEUE — shuffles
each newly added task against all pending tasks in the same scheduling
group's reactor task queue.  This means fiber A's lock acquisition can
be reordered past fiber B's (and past unrelated tasks) on a given shard.
If fiber A wins the lock on shard X while fiber B wins on shard Y, this
creates a classic cross-shard lock-ordering deadlock (circular wait).

In production builds without SEASTAR_SHUFFLE_TASK_QUEUE, the reactor
task queue is FIFO. Still, even in release builds, the SMP queues can
reorder messages, so the deadlock is still possible, though much
less likely. In debug and sanitize builds, the task-queue shuffle
makes the deadlock very likely whenever both fibers' lock-acquisition
tasks are pending simultaneously in the reactor task queue on any shard.

This deadlock was exposed by ce00d61917 ("db: implement large_data
virtual tables with feature flag gating", merged as 88a8324e68),
which introduced legacy_drop_table_on_all_shards as a second caller
of lock_tables_metadata().  When LARGE_DATA_VIRTUAL_TABLES is enabled
during topology_state_load (via feature_service::enable), two fibers
can race:

  1. activate_large_data_virtual_tables() — calls
     legacy_drop_table_on_all_shards() which calls
     lock_tables_metadata() synchronously via .get()

  2. reload_schema_in_bg() — fires as a background fiber from
     TABLE_DIGEST_INSENSITIVE_TO_EXPIRY, eventually reaches
     schema_applier::commit() which also calls lock_tables_metadata()

If both reach lock_tables_metadata() while the lock is free on all
shards, the parallel acquisition creates the deadlock opportunity.
The deadlock blocks topology_state_load() from completing, which
prevents the bootstrapping node from finishing its topology state
transitions.  The coordinator's topology coordinator then waits for
the node to reach the expected state, but the node is stuck, so
eventually the read_barrier times out after 300 seconds.

Fix by acquiring the shard 0 lock first before attempting to
acquire any other lock. Whichever fiber wins shard 0 is
guaranteed to acquire all remaining shards before the other fiber
can proceed past shard 0, eliminating the circular-wait condition.
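
The fix is the classic lock-ordering argument; in Python threading terms (an illustrative sketch, not the Seastar code) it looks like:

```python
import threading

N_SHARDS = 4
shard_locks = [threading.Lock() for _ in range(N_SHARDS)]

def lock_all_shards():
    # Acquire shard 0 first: whichever fiber wins it is guaranteed to
    # take the remaining shards before the other fiber can get past
    # shard 0, so no circular wait is possible.
    shard_locks[0].acquire()
    for lock in shard_locks[1:]:
        lock.acquire()

def unlock_all_shards():
    for lock in reversed(shard_locks):
        lock.release()
```

Two concurrent callers of `lock_all_shards()` serialize on shard 0, unlike the original parallel acquisition where each caller could win a different shard first.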

Tested manually with two approaches:
1. Causing different shard locks to be acquired by different
lock_tables_metadata() calls, by adding sleeps that vary with the
lock_tables_metadata() call and the target shard. This reproduced
the issue consistently.
2. Matching the time point at which both fibers reach
lock_tables_metadata(), by adding a single sleep to one of the fibers.
This heavily depends on the machine, so we can't create a universal
reproducer this way, but it did result in the observed failure on my
machine after finding the right sleep time.

Also added a unit test for concurrent lock_tables_metadata() calls.

Fixes: SCYLLADB-1694
Fixes: SCYLLADB-1644
Fixes: SCYLLADB-1684

Closes scylladb/scylladb#29678
2026-04-29 21:13:53 +02:00
Patryk Jędrzejczak
15f35577ed Merge 'paxos_state: keep prepared message alive across statement execution' from Petr Gusev
In do_execute_cql_with_timeout(), when the prepared statement was not found in the cache, we called qp.prepare() and stored the returned result_message::prepared in a local variable scoped to the 'if' block. We then extracted ps_ptr (a checked_weak_ptr to the prepared statement) from the message, let the message go out of scope at the end of the 'if', and used ps_ptr after a co_await on st->execute().

Since 3ac4e258e8 ("transport/messages: hold pinned prepared entry in PREPARE result"), result_message::prepared owns a strong pinned reference to the prepared cache entry. While qp.prepare() runs it also holds its own pin on the entry, so on return the entry has at least the pin owned by the returned message. As long as that message is alive, the cache entry cannot be purged and the weak handle inside ps_ptr remains promotable.

The lifetime gap manifested only in debug builds. qp.prepare() returns a ready future on the cache-miss path, so in release builds the co_await resumes synchronously: control flows from the assignment of ps_ptr straight into st->execute() with no opportunity for any other task (in particular, prepared cache invalidation triggered by a concurrent schema change) to run in between. Debug builds, however, force a reactor preemption point on every co_await even when the awaited future is ready. With prepared_msg already destroyed at the end of the 'if' block, the only remaining handle on the cache entry was the weak ps_ptr, and the preemption gave a concurrent cache purge (triggered, for example, by Raft schema changes received during a node restart) the chance to drop the entry. The subsequent execute() then failed when promoting the weak pointer with checked_ptr_is_null_exception.

The exception propagated out of the Paxos prepare path as a generic std::exception with no type information in the log, surfacing on the coordinator as:

  WriteFailure: Failed to prepare ballot ... Replica errors:
  host_id ... -> seastar::rpc::remote_verb_error (std::exception)

Hoist the result_message::prepared into the outer scope so the pinned cache entry stays alive across co_await st->execute(...), closing the window in which a concurrent cache purge could invalidate the weak handle.

Fixes SCYLLADB-1173

backport: the patch is simple, we can backport it to all versions with "LWT over tablets" feature. Note that the problem is only in test runs in debug configuration, production is not affected.

Closes scylladb/scylladb#29675

* https://github.com/scylladb/scylladb:
  table_helper: retry insert prepare on concurrent cache invalidation
  paxos_state: keep prepared message alive across statement execution
2026-04-29 17:57:27 +02:00
Yaron Kaikov
d310e4b27d scylla-gdb: fix compaction-tasks command for intrusive list
Since commit e942c074f2 changed _tasks from std::list<shared_ptr<...>>
to a boost::intrusive_list, iterating yields raw compaction_task_executor
objects rather than shared_ptr wrappers. The GDB script was updated to
use intrusive_list() but still wrapped elements in seastar_shared_ptr(),
causing 'gdb.error: There is no member or method named _p' when
compaction tasks are active.

Move the seastar_shared_ptr unwrapping to the 6.2 compatibility
fallback path only, since the intrusive list path yields objects
directly.

Fixes: SCYLLADB-1762

Closes scylladb/scylladb#29690
2026-04-29 13:11:13 +03:00
Marcin Maliszkiewicz
45b4834ac4 Merge 'audit: fix maintenance socket startup/shutdown ordering' from Andrzej Jackowski
This series addresses three problems in the audit startup/shutdown
sequence:
1. [BUG] Shutdown SIGABRT. During graceful shutdown, deferred stops run in reverse order of construction. With the audit service constructed after the maintenance socket, audit was destroyed first, and in-flight queries on the maintenance socket could hit the destroyed audit service (assertion failure in sharded::local()).
2. [BUG] Startup audit bypass. The maintenance socket opened before audit storage was initialized, allowing queries (e.g. creating a superuser) to bypass auditing in that window.
3. [PROBLEM] Blocks SCYLLADB-1430. The existing order prevents audit configuration from being driven by group0 state, because audit started before group0.

The series is organized as: a test-helper refactor, a test for the audited maintenance-socket flow, a startup-phase split, the construction-order fix and its shutdown-race test, and finally the storage-before-socket fix and its startup-window test.

Fixes SCYLLADB-1615

No backport, bugs don't seem severe enough to justify backporting.

Closes scylladb/scylladb#29539

* github.com:scylladb/scylladb:
  audit: assert storage ordering invariants at runtime
  audit: start maintenance socket after audit storage
  audit: move audit construction before maintenance socket
  audit: split startup into construction and storage phases
  test: audit: verify maintenance socket operations are audited
  test: audit: parameterize source address in audit assertions
2026-04-29 10:37:38 +02:00
Łukasz Paszkowski
7e14ea5ac8 sstables: only wipe TemporaryHashes for sstable formats that have it
Commit 8d34127684 ("sstables: clean up TemporaryHashes file in wipe()")
unconditionally calls filename(..., component_type::TemporaryHashes)
inside filesystem_storage::wipe(). However, the TemporaryHashes
component is only registered in the component map of the 'ms' sstable
format. For older formats (ka, la, mc, md, me) the lookup goes through
sstable_version_constants::get_component_map(version).at(...) and throws
std::out_of_range.

The exception is then swallowed by the outer catch(...) in wipe(), which
just logs and ignores. As a side effect, the subsequent
remove_file(new_toc_name) is never reached and the TemporaryTOC
('*-TOC.txt.tmp') file is left as an orphan on disk after every unlink()
of a non-'ms' sstable.

Guard the lookup with get_component_map(version).contains() so the
cleanup is only attempted for formats that actually define the
component.
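
The shape of the bug and of the guard translates directly into a dictionary-lookup sketch (Python used for illustration; the real code is the C++ get_component_map(version) lookup, and the map contents below are invented):

```python
# Per-version component maps: only the 'ms' format registers
# the TemporaryHashes component.
COMPONENT_MAPS = {
    "me": {"TOC": "TOC.txt"},
    "ms": {"TOC": "TOC.txt", "TemporaryHashes": "Hashes.tmp"},
}

def components_to_wipe(version):
    cmap = COMPONENT_MAPS[version]
    names = [cmap["TOC"]]
    # Guard the lookup: cmap["TemporaryHashes"] would raise KeyError
    # (the analog of std::out_of_range) for non-'ms' formats, aborting
    # the rest of the cleanup.
    if "TemporaryHashes" in cmap:
        names.append(cmap["TemporaryHashes"])
    return names
```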

Add a regression test in test/boost/sstable_directory_test.cc that
creates an 'me'-format sstable, unlinks it and asserts that the sstable
directory is left empty. Without the fix the test fails with a leftover
'me-...-TOC.txt.tmp' file.

Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1697

Closes scylladb/scylladb#29620
2026-04-29 08:06:36 +03:00
Botond Dénes
809f12f988 Merge 'test/cluster/dtest: fix ScyllaNode state not persisting across nodelist() calls' from Benny Halevy
`ScyllaCluster.nodelist()` creates new `ScyllaNode` objects on every call,
so per-node state set via `set_smp()`, `set_log_level()`, and
`_adjust_smp_and_memory()` was lost. This meant `set_smp()` had no effect
when `cluster.start()` was called after it, since `start_nodes()` calls
`nodelist()` internally which creates fresh nodes with default values.

- Add debug logging for smp/memory in ScyllaNode
- Store per-node settings (smp, memory, log levels) in a
  `ScyllaCluster._node_resources` dict keyed by server_id, so they survive
  `nodelist()` reconstruction. `ScyllaNode` restores its state from this dict
  on construction and saves it back whenever `set_smp()`, `set_log_level()`,
  or `_adjust_smp_and_memory()` modifies it.
- Add a reproducer test verifying `set_smp()` takes effect on restart
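
The save/restore pattern can be sketched as follows (a minimal Python sketch with hypothetical `Cluster`/`Node` classes; the real dtest classes carry much more state):

```python
class Cluster:
    def __init__(self):
        # Survives node-object reconstruction; keyed by server_id.
        self._node_resources = {}

    def make_node(self, server_id):
        # nodelist() recreates node objects; each restores saved state.
        return Node(self, server_id)

class Node:
    def __init__(self, cluster, server_id):
        self._cluster = cluster
        self.server_id = server_id
        saved = cluster._node_resources.get(server_id, {})
        self.smp = saved.get("smp")  # restored, not reset to a default

    def set_smp(self, smp):
        self.smp = smp
        # Save back so the setting survives reconstruction.
        self._cluster._node_resources.setdefault(self.server_id, {})["smp"] = smp
```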

Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1629

--

No backport needed: this only fixes dtest infrastructure, no production code
is affected.

Closes scylladb/scylladb#29549

* github.com:scylladb/scylladb:
  test/cluster/dtest: add test for node.set_smp() persistence
  test/cluster/dtest: cache ScyllaNode instances in ScyllaCluster
  test/cluster/dtest/ccmlib/scylla_node: add debug logging
2026-04-29 06:25:36 +03:00
Evgeniy Naydanov
96d3f13245 test: add --keep-duplicates and assign RUN_ID via shared cache
Add --keep-duplicates CLI argument to bypass deduplication and forward
to pytest, allowing duplicate test file arguments to be collected
multiple times.

Move RUN_ID assignment from pytest_collect_file to modify_pytest_item.
All File collectors for the same source file share a single run_ids
dict (via RUN_ID_CACHE stash key), so items from duplicate collection
arguments (e.g. with --keep-duplicates) automatically get unique IDs.
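
The uniqueness mechanism boils down to one counter per key (sketched here keyed by (build_mode, nodeid), as the merge description puts it; `assign_run_id` is an illustrative name):

```python
from collections import defaultdict
from itertools import count

# One itertools.count per (build_mode, nodeid): each mode gets
# independent, sequential RUN_IDs even when the same file is
# collected more than once (e.g. under --keep-duplicates).
run_ids = defaultdict(count)

def assign_run_id(build_mode, nodeid):
    return next(run_ids[(build_mode, nodeid)])
```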

Remove CppFile.run_id cached_property — CppTestCase now reads RUN_ID
from its own item stash, which is set during modify_pytest_item.

Fix --repeat option default from string "1" to int 1 — argparse only
applies type= to CLI-parsed values, not defaults.
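
A small sketch of the default-type pitfall (assuming, as the dropped redundant int() cast suggests, that the option was declared without type=; option names here are illustrative):

```python
import argparse

broken = argparse.ArgumentParser()
broken.add_argument("--repeat", default="1")  # string default

fixed = argparse.ArgumentParser()
fixed.add_argument("--repeat", default=1)     # int default

# Without type=, the string default is passed through untouched,
# so numeric uses of the value need an explicit cast.
assert broken.parse_args([]).repeat == "1"
assert fixed.parse_args([]).repeat == 1
```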

Co-Authored-By: Claude Opus 4.6 (200K context) <noreply@anthropic.com>
2026-04-29 02:36:05 +00:00
Evgeniy Naydanov
497bd6b6c9 test/pylib/runner: fix disabled file collection
Return a DisabledFile collector instead of an empty list when all modes
are disabled for a file.  Returning an empty list caused subsequent
files to not get their stash items set because file_path was never
removed from REPEATING_FILES.

Co-Authored-By: Claude Opus 4.6 (200K context) <noreply@anthropic.com>
2026-04-29 02:36:05 +00:00
Evgeniy Naydanov
43f06ed19d test.py: deduplicate CLI test arguments before passing to pytest
Backport the argument normalization algorithm from pytest 9.0.3 to
work around broken deduplication in pytest 8.3.5
(https://github.com/pytest-dev/pytest/issues/12083).

Duplicate or subsumed test paths (e.g. 'test/cql' and
'test/cql/lua_test.cql') are now collapsed before invoking pytest.

Revert this commit when upgrading to pytest 9.x.

Co-Authored-By: Claude Opus 4.6 (200K context) <noreply@anthropic.com>
2026-04-29 02:36:05 +00:00
Evgeniy Naydanov
05f2c53931 Revert "test.py: fix test collection bug"
This reverts commit 92c09d106d.
2026-04-29 02:35:00 +00:00
Andrzej Jackowski
3755c370ac audit: assert storage ordering invariants at runtime
Abort if audit storage fails to start rather than silently
running with an unaudited maintenance socket. Also assert
that storage is already stopped when the audit service is
destroyed, documenting the defer-stack ordering requirement.

Refs SCYLLADB-1615
Refs SCYLLADB-1695
2026-04-28 18:58:49 +02:00
Andrzej Jackowski
543fb6a2db audit: start maintenance socket after audit storage
Without this, there is a window after startup where queries on
the maintenance socket bypass auditing because audit storage
is not yet initialized.

Fixes SCYLLADB-1615
2026-04-28 18:58:49 +02:00
Andrzej Jackowski
b7bc2d89e6 audit: move audit construction before maintenance socket
During graceful shutdown, deferred stops run in reverse order of
construction.  When the audit service was constructed after the
maintenance socket, audit was destroyed first.  A DML query
still in-flight on the maintenance socket could then bypass
auditing entirely.

Move construction as early as possible so the audit service
outlives the maintenance socket on the defer stack, and to
maximise the window in which attempts to use audit before
storage is ready are caught with on_internal_error_noexcept.

Refs SCYLLADB-1615
2026-04-28 18:58:49 +02:00
Andrzej Jackowski
bc67dd0b82 audit: split startup into construction and storage phases
The table-based audit backend needs Raft to create its keyspace,
but the audit service must exist earlier so that CQL paths don't
silently skip auditing.

Split startup into two phases: construction and storage
initialization.  Queries arriving between the two phases are
logged as errors.

This is a refactoring commit and the split sections will be
moved later in this patch series.

Refs SCYLLADB-1615
2026-04-28 18:58:42 +02:00
Andrzej Jackowski
1616c71bf0 test: audit: verify maintenance socket operations are audited
User creation via the maintenance socket should produce audit
entries, as this is the recommended flow for creating the
initial superuser when default credentials are disabled.

The test is parametrized by audit backend (table and syslog).
The maintenance socket source address is "::" because Seastar
returns a zero-initialised in6_addr for AF_UNIX sockets.

Test time in dev: 0.6s

Refs SCYLLADB-1615
2026-04-28 18:42:39 +02:00
Avi Kivity
c4de2b3c9d Merge 'test: fix flaky tablets test by using read barrier' from Aleksandra Martyniuk
Some tests in test_tablets.py read system_schema.keyspaces from an arbitrary node that may not have applied the latest schema change yet. Pin the read to a specific node and issue a read barrier before querying, ensuring the node has up-to-date data.

Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1700

Test fix; no backport

Closes scylladb/scylladb#29655

* github.com:scylladb/scylladb:
  test: fix flaky rack list conversion tests by using read barrier
  test: fix flaky test_enforce_rack_list_option by using read barrier
2026-04-28 17:15:59 +03:00
Petr Gusev
e6137ab11b table_helper: retry insert prepare on concurrent cache invalidation
table_helper::insert() retrieves the prepared statement via
cache_table_info() and then dereferences _prepared_stmt to read
bound_names. _prepared_stmt is a checked_weak_ptr into the prepared
statements cache and can be invalidated at any time by a concurrent
purge (for example, on a schema change).

cache_table_info() (re-)prepares the statement and assigns
_prepared_stmt before returning, and the strong pin held by the
result_message::prepared returned from qp.prepare() keeps the cache
entry alive only for the duration of try_prepare(). After try_prepare()
returns, the pin is gone and _prepared_stmt is the only remaining
handle on the entry.

In release builds this is fine: the chain of ready-future co_awaits
between try_prepare() finishing and _prepared_stmt->bound_names being
read resumes synchronously, so no other task -- in particular, no
cache purge -- can run in that window. In debug builds, however,
Seastar inserts a reactor preemption point on every co_await even when
the awaited future is ready. That preemption window is wide enough for
a concurrent invalidation to drop the freshly installed cache entry,
turning _prepared_stmt into a null weak handle and crashing the
subsequent dereference with checked_ptr_is_null_exception.

Wrap the cache_table_info() call in a loop that re-attempts the
preparation until a synchronous post-resume check finds _prepared_stmt
still valid. The check runs in the same task immediately after the
co_await resumes, with no co_await between the check and the
dereference, so a purge cannot slip in. _insert_stmt is a strong
shared_ptr to the statement object and is not affected by cache
invalidation, so it remains safe to use across the final co_await on
execute().
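
The retry pattern, with the weak handle re-checked synchronously after each attempt, can be sketched with Python weakrefs (an analogy only; class and function names are invented):

```python
import weakref

class Entry:
    pass

class PreparedCache:
    """Toy cache whose only entry can be purged concurrently."""
    def __init__(self):
        self._strong = None
    def prepare(self):
        self._strong = Entry()       # (re-)install the entry
        return weakref.ref(self._strong)
    def purge(self):
        self._strong = None          # drops the only strong reference

def prepare_with_retry(cache, max_attempts=10):
    for _ in range(max_attempts):
        weak = cache.prepare()
        # A preemption point (co_await) could let a purge run here.
        entry = weak()               # synchronous post-resume check
        if entry is not None:
            return entry             # we now hold a strong ref: safe
    raise RuntimeError("prepared statement kept getting invalidated")
```

The key point mirrors the commit: the validity check and the use happen in the same task with no suspension in between, so a purge cannot slip into that window.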

The other caller of cache_table_info(),
trace_keyspace_helper::apply_events_mutation(), accesses only the
strong _insert_stmt via insert_stmt() and never dereferences the weak
_prepared_stmt, so it is unaffected.

Refs SCYLLADB-1173
2026-04-28 16:03:06 +02:00
Patryk Jędrzejczak
d9dd3bfe53 Merge 'topology_coordinator: join tablet load stats refresh in stop()' from Andrzej Jackowski
Commit 2b7aa32 (topology_coordinator: Refresh load stats after
table is created or altered) registered topology_coordinator as a
schema change listener and added on_create_column_family which
fire-and-forgets _tablet_load_stats_refresh.trigger(). The
triggered task runs on the gossip scheduling group via
with_scheduling_group and accesses the topology_coordinator via
'this'.

stop() unregisters the listener but does not wait for any
in-flight refresh task. If a notification fires between
_tablet_load_stats_refresh.join() in run() and unregister_listener
in stop(), the scheduled task can outlive the topology_coordinator
and access freed memory after run_topology_coordinator's coroutine
frame is destroyed.

Wait for the refresh to complete in stop() after unregistering the
listener, ensuring no task can fire after destruction.

Fixes SCYLLADB-1728

Backport to 2026.1 and 2026.2, because the issue was introduced in 2b7aa32

Closes scylladb/scylladb#29653

* https://github.com/scylladb/scylladb:
  test: tablet_stats: reproduce shutdown refresh race
  topology_coordinator: join tablet load stats refresh in stop()
2026-04-28 12:54:28 +02:00
Benny Halevy
5eaa979f35 test/cluster/dtest: add test for node.set_smp() persistence
Add a test that reproduces SCYLLADB-1629: set_smp() had no effect
because nodelist() created new ScyllaNode objects on every call,
losing the _smp_set_during_test value. The test fails without the
fix in the previous patch.
2026-04-28 12:34:08 +03:00
Benny Halevy
7430c1efd7 test/cluster/dtest: cache ScyllaNode instances in ScyllaCluster
ScyllaCluster.nodelist() was creating new ScyllaNode objects on every
call, so per-node state set via set_smp(), set_log_level(), and
_adjust_smp_and_memory() was lost between calls.

Fix by caching ScyllaNode instances in a list populated by
_add_nodes() using the list returned by servers_add() in populate().
Nodes are assigned monotonically increasing names (node1, node2, ...).
nodelist() simply returns the cached list.
2026-04-28 12:34:06 +03:00
Marcin Maliszkiewicz
b0f988afc4 Merge 'auth: fix shutdown and startup races in LDAP cache pruner' from Andrzej Jackowski
The LDAP role manager's `_cache_pruner` background fiber periodically calls cache::reload_all_permissions(). Two races cause it to hit SCYLLA_ASSERT(_permission_loader):
- Cross-shard race: The pruner used `_cache.container().invoke_on_all()` to reload permissions on every shard. Since both `service::start()` and `sharded<service>::stop()` execute per-shard in parallel, the pruner on one shard could call reload_all_permissions() on another shard before that shard set its loader (startup) or after it cleared its loader (shutdown). Each shard runs its own pruner instance, so reloading locally is sufficient; this also removes redundant N² reload calls.
- Intra-shard race: `service::stop()` cleared the permission loader and stopped the role manager concurrently (via when_all_succeed). A mid-reload pruner could yield and then call the now-null loader. Fixed by stopping the role manager first so the pruner is fully drained before the loader is cleared.

Fixes SCYLLADB-1679
Backport to 2026.2, introduced in 7eedf50c12

Closes scylladb/scylladb#29605

* github.com:scylladb/scylladb:
  auth: make shutdown the exact reverse of startup
  test: ldap: add test for pruner crash during shutdown
  auth: start authorizer and set permission loader before role manager
  auth: stop role manager before clearing permission loader
  auth: reload LDAP permission cache on local shard only
2026-04-28 11:16:07 +02:00
Botond Dénes
a7e9c0e6d2 Merge 'test.py: fix test collection bug' from Andrei Chekun
In certain circumstances the current way of collecting tests is error-prone. Collection can stop when the first file is skipped in a mode, leaving the rest of the files given on the CLI uncollected.
Another issue is that if a file is specified twice, via its directory and explicitly by path, an incorrect CppFile ends up in the stash, causing a KeyError.

Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1714

No backport, test framework bug fix only.

Closes scylladb/scylladb#29634

* github.com:scylladb/scylladb:
  test.py: fix framework test
  test.py: fix test collection bug
2026-04-28 11:52:35 +03:00
Petr Gusev
e39267b55f paxos_state: keep prepared message alive across statement execution
In do_execute_cql_with_timeout(), when the prepared statement was not
found in the cache, we called qp.prepare() and stored the returned
result_message::prepared in a local variable scoped to the 'if' block.
We then extracted ps_ptr (a checked_weak_ptr to the prepared statement)
from the message, let the message go out of scope at the end of the
'if', and used ps_ptr after a co_await on st->execute().

Since 3ac4e258e8 ("transport/messages: hold pinned prepared entry in
PREPARE result"), result_message::prepared owns a strong pinned
reference to the prepared cache entry. While qp.prepare() runs it also
holds its own pin on the entry, so on return the entry has at least
the pin owned by the returned message. As long as that message is
alive, the cache entry cannot be purged and the weak handle inside
ps_ptr remains promotable.

The lifetime gap manifested only in debug builds. qp.prepare() returns
a ready future on the cache-miss path, so in release builds the
co_await resumes synchronously: control flows from the assignment of
ps_ptr straight into st->execute() with no opportunity for any other
task (in particular, prepared cache invalidation triggered by a
concurrent schema change) to run in between. Debug builds, however,
force a reactor preemption point on every co_await even when the
awaited future is ready. With prepared_msg already destroyed at the
end of the 'if' block, the only remaining handle on the cache entry
was the weak ps_ptr, and the preemption gave a concurrent cache purge
- triggered, for example, by Raft schema changes received during a
node restart - the chance to drop the entry. The subsequent execute()
then failed when promoting the weak pointer with
checked_ptr_is_null_exception.

The exception propagated out of the Paxos prepare path as a generic
std::exception with no type information in the log, surfacing on the
coordinator as:

  WriteFailure: Failed to prepare ballot ... Replica errors:
  host_id ... -> seastar::rpc::remote_verb_error (std::exception)

Hoist the result_message::prepared into the outer scope so the pinned
cache entry stays alive across co_await st->execute(...), closing the
window in which a concurrent cache purge could invalidate the weak
handle.
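
The lifetime rule, keep the strong owner in scope across the await, has a direct analog with Python weakrefs (an analogy only, relying on CPython dropping an object as soon as its last strong reference goes away):

```python
import weakref

class CacheEntry:
    pass

def scoped_in_if_block():
    entry = CacheEntry()           # the strong "pinned" reference
    weak = weakref.ref(entry)      # the ps_ptr-like weak handle
    del entry                      # message destroyed at end of 'if'
    # ...a preemption point here lets a concurrent purge run...
    return weak() is None          # weak handle no longer promotable

def hoisted_to_outer_scope():
    entry = CacheEntry()           # hoisted: stays alive across the await
    weak = weakref.ref(entry)
    promotable = weak() is not None  # still promotable after the await
    del entry
    return promotable
```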

Fixes SCYLLADB-1173
2026-04-28 10:42:13 +02:00
Botond Dénes
3ea4af1c8c Merge 'test/cluster/test_incremental_repair: fix flaky coordinator-change scenario' from Avi Kivity
- Ensure servers[1] is not the topology coordinator before restarting it, preventing the leader death + re-election + re-repair sequence that masked the compaction-merge bug
- Add a retry loop that detects post-restart leadership transfer to servers[1] via direct coordinator query, retrying up to 5 times

Fixes: SCYLLADB-1478

Backporting to 2026.2, which sees the failure regularly.

Closes scylladb/scylladb#29671

* github.com:scylladb/scylladb:
  test/cluster/test_incremental_repair: add retry for residual leadership race
  test/cluster/test_incremental_repair: fix flaky coordinator-change scenario
2026-04-28 09:05:02 +03:00
Andrzej Jackowski
459e3970cd test: tablet_stats: reproduce shutdown refresh race
The coordinator can receive a schema-change notification after run()
finishes but before stop() unregisters listeners. The test pins that
window with error injections and verifies stop() waits for the refresh
instead of letting it outlive the coordinator.

Test time in dev: 9.51s

Refs SCYLLADB-1728
2026-04-28 08:00:54 +02:00
Andrzej Jackowski
8756f7c068 topology_coordinator: join tablet load stats refresh in stop()
Commit 2b7aa3211d made schema changes trigger tablet load stats
refreshes in the background. A notification can still arrive after
run() stops the periodic refresher and before the coordinator object
is destroyed.

Move lifecycle subscription cleanup to stop() and join the serialized
refresh there after unregistering refresh trigger sources. This keeps
the coordinator alive until notification-triggered refresh work has
completed.

Fixes SCYLLADB-1728
2026-04-28 07:37:28 +02:00
Avi Kivity
2615d0e8d8 test/cluster/test_incremental_repair: add retry for residual leadership race
There is a small race window where Raft leadership could transfer back
to servers[1] between the ensure_group0_leader_on() check and the
actual restart.  If this happens, the new coordinator re-initiates
repair and masks the compaction-merge bug.

Extract the core test logic into _do_race_window_promotes_unrepaired_data()
which directly checks get_topology_coordinator() after restart and raises
_LeadershipTransferred if servers[1] became coordinator.  The test
function calls this helper in a retry loop (up to 5 attempts).

Refs: SCYLLADB-1478
2026-04-27 21:11:06 +03:00
Avi Kivity
914b70c75b test/cluster/test_incremental_repair: fix flaky coordinator-change scenario
The test_incremental_repair_race_window_promotes_unrepaired_data test
was flaky because it hardcodes servers[1] as the restart target but did
not ensure servers[1] was NOT the topology coordinator.

When servers[1] happened to be the Raft group0 leader (topology
coordinator), restarting it killed the leader, forced a new election,
and the new coordinator re-initiated tablet repair.  This re-repair
flushes memtables on all replicas via take_storage_snapshot() and marks
the resulting sstables as repaired -- causing post-repair keys to appear
in repaired sstables on servers[0] and servers[2].  The test then hit
the wrong assertion (servers[0]/[2] contaminated).

Fix: before starting the repair, check whether servers[1] is the
topology coordinator.  If so, move leadership to another server via
ensure_group0_leader_on() so that restarting servers[1] only kills a
follower -- which does not trigger an election or coordinator change.

Reproducibility was confirmed by forcing leadership to servers[1] via
ensure_group0_leader_on() and observing deterministic failure with all
three servers showing post-repair keys in repaired sstables (confirming
the re-repair scenario), then verifying the fix passes reliably.

Fixes: SCYLLADB-1478
2026-04-27 21:08:12 +03:00
Aleksandra Martyniuk
6b7ce5e244 test: fix flaky rack list conversion tests by using read barrier
test_numeric_rf_to_rack_list_conversion and
test_numeric_rf_to_rack_list_conversion_abort were reading
system_schema.keyspaces from an arbitrary node that may not have
applied the latest schema change yet. Pin the read to a specific node
and issue a read barrier before querying, ensuring the node has
up-to-date data.
2026-04-27 15:19:09 +02:00
Aleksandra Martyniuk
9d3d424d58 test: fix flaky test_enforce_rack_list_option by using read barrier
The test was reading system_schema.keyspaces from an arbitrary node
that may not have applied the latest schema change yet. Pin the read
to a specific node and issue a read barrier before querying, ensuring
the node has up-to-date data.
2026-04-27 14:44:38 +02:00
Anna Mikhlin
86472e43e1 Update ScyllaDB version to: 2026.3.0-dev 2026-04-26 15:30:13 +03:00
Andrei Chekun
f2f4915e09 test.py: fix framework test
The framework test was not skipping the unit directory where C++ tests
are located. After the collection bug fix this started to fail. Skip
this directory as well.
2026-04-25 18:04:55 +02:00
Piotr Szymaniak
d5efd1f676 test/cluster: wait for Alternator readiness in server startup
server_add() only waits for CQL readiness before returning. The
Alternator HTTP port may not be listening yet, causing
ConnectionRefused errors in Alternator tests.

Extend the ServerUpState enum and startup loop to also check Alternator
port readiness when configured. When Alternator ports are configured,
each is verified to be connectable and queryable, similar to how CQL
ports are probed.
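A readiness probe of this shape can be sketched as below (a generic TCP-connect poll, not the actual test.py implementation; a real Alternator check would additionally issue an HTTP request to confirm the server answers queries):

```python
import socket
import time

def wait_for_port(host, port, timeout=30.0, poll=0.1):
    # Poll until a TCP connect succeeds or the deadline passes.
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=poll):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll)
```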

Fixes SCYLLADB-1701

Closes scylladb/scylladb#29625
2026-04-25 16:35:44 +03:00
Piotr Smaron
d14d07a079 test: fix flaky test_sstable_write_large_{row,cell} by using a fixed partition key
Commit ce00d61917 ("db: implement large_data virtual tables with feature
flag gating") changed these two tests to construct their mutation with
a randomly generated partition key (simple_schema::make_pkey()) instead
of the previously fixed pk "pv", with the comment that this avoids a
"Failed to generate sharding metadata" error.

simple_schema::make_pkey() delegates to tests::generate_partition_key(),
which defaults to key_size{1, 128}, i.e. the partition key length is
uniformly random in [1, 128] bytes. That interacts badly with the fact
that both tests pick thresholds at exact byte boundaries of the MC
sstable row encoding:

  - The large-data handler records a row's size as
      _data_writer->offset() - current_pos
    (sstables/mx/writer.cc: collect_row_stats()), i.e. the number of
    bytes the row took on disk.
  - For the first clustering row, the body includes a vint-encoded
    prev_row_size = pos - _prev_row_start.
  - _prev_row_start is captured at the start of the partition
    (consume_new_partition()) before the partition key is written to
    the data stream, so prev_row_size rolls in the partition key's
    serialized length (2-byte prefix + pk bytes) + deletion_time +
    static row size.

A random-size partition key therefore perturbs the first clustering
row's encoded size by 1-2 bytes across runs (the vint of prev_row_size
crosses the 128 boundary), flipping the test's byte-exact threshold
comparison. On seed 2104744000 this produced:

  critical check row_size_count == expected.size() has failed [3 != 2]
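The size jump is easy to see with any 7-bits-per-byte varint. The sketch below uses unsigned LEB128 for illustration; the MC sstable vint format encodes the length differently (in the leading bits of the first byte), but it has the same one-to-two-byte jump at 128:

```python
def uleb128(value):
    # Unsigned LEB128: 7 payload bits per byte, high bit set on every
    # byte except the last.
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        out.append(byte | (0x80 if value else 0))
        if not value:
            return bytes(out)

assert len(uleb128(127)) == 1   # still one byte
assert len(uleb128(128)) == 2   # prev_row_size crossing 128 grows the row
```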

Fix the two byte-exact-sensitive tests by reverting their partition key
to the fixed s.new_mutation("pv") used before ce00d61917. Under smp=1
(which these tests run with, per -c1 in the test invocation) a fixed
key is always shard-local, so no sharding-metadata issue arises here.

The other tests modified by ce00d61917 (test_sstable_log_too_many_rows,
test_sstable_log_too_many_dead_rows, test_sstable_too_many_collection_elements,
test_large_data_records_round_trip, etc.) assert on row/element counts
or use thresholds with enough slack that the partition key size does
not matter, and are left unchanged.

Add an explanatory comment to each fixed site so the pitfall is not
re-introduced by a future refactor.

Verified stable with:
  ./test.py --mode=dev     test/boost/sstable_3_x_test.cc::test_sstable_write_large_row  --repeat 100 --max-failures 1
  ./test.py --mode=dev     test/boost/sstable_3_x_test.cc::test_sstable_write_large_cell --repeat 100 --max-failures 1
  ./test.py --mode=release test/boost/sstable_3_x_test.cc::test_sstable_write_large_row  --repeat 100 --max-failures 1
  ./test.py --mode=release test/boost/sstable_3_x_test.cc::test_sstable_write_large_cell --repeat 100 --max-failures 1

All four invocations: 100/100 passed.

Fixes: SCYLLADB-1685

Closes scylladb/scylladb#29621
2026-04-25 16:32:02 +03:00
Andrei Chekun
92c09d106d test.py: fix test collection bug
In certain circumstances the current way of collecting tests is
error-prone. Collection can stop when the first file is skipped in a
given mode, leaving the rest of the files passed on the CLI
uncollected. Another issue: if a file is specified twice, via its
directory and explicitly, an incorrect CppFile is produced in the
stash, causing a KeyError.

Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1714
2026-04-24 17:57:11 +02:00
Andrzej Jackowski
8855e77465 auth: make shutdown the exact reverse of startup
The previous parallel stop of the authenticator and authorizer
was a micro-optimization that obscured the lifecycle invariant
that shutdown should reverse startup.

Refs SCYLLADB-1679
2026-04-24 13:34:09 +02:00
Andrzej Jackowski
adf1e26bab test: ldap: add test for pruner crash during shutdown
Verify that service::stop() drains the LDAP pruner before
clearing the permission loader. The test installs a slow
permission loader and confirms the pruner is actively
reloading when teardown begins.

Refs SCYLLADB-1679
2026-04-24 13:34:09 +02:00
Andrzej Jackowski
37a547604f auth: start authorizer and set permission loader before role manager
LDAP role manager starts a pruner fiber that calls
reload_all_permissions() which asserts _permission_loader is set.
The permission loader calls _authorizer->authorize(), so the
authorizer must be started before the loader is set.

Start authorizer, then set the permission loader, then start the
role manager, ensuring both dependencies are satisfied before the
pruner can fire.

Fixes SCYLLADB-1679
2026-04-24 13:34:09 +02:00
Andrzej Jackowski
c3e5285d45 auth: stop role manager before clearing permission loader
service::stop() cleared the permission loader and stopped
the role manager concurrently (via when_all_succeed). The
LDAP pruner could be mid-reload at a yield point when the
loader was set to null, causing it to call a null function.

Stop the role manager first so the pruner is fully drained
before the loader is cleared.
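The ordering constraint can be illustrated with a small asyncio analogue (illustrative only; the real code is Seastar C++). The pruner calls the loader callback across a yield point, so the loader must not be cleared until the pruner is fully drained:

```python
import asyncio

class AuthService:
    def __init__(self):
        self.permission_loader = lambda: "permissions"
        self.pruner_calls = 0
        self._pruner = None

    async def _prune_loop(self):
        while True:
            await asyncio.sleep(0)          # yield point mid-reload
            self.permission_loader()        # TypeError if loader is None
            self.pruner_calls += 1
            await asyncio.sleep(0.001)

    def start(self):
        self._pruner = asyncio.ensure_future(self._prune_loop())

    async def stop(self):
        # Drain the pruner first, then clear the loader: shutdown is
        # the reverse of startup order.
        self._pruner.cancel()
        try:
            await self._pruner
        except asyncio.CancelledError:
            pass
        self.permission_loader = None

async def demo():
    svc = AuthService()
    svc.start()
    await asyncio.sleep(0.01)
    await svc.stop()
    return svc
```

Swapping the two steps in stop() reintroduces the race: the pruner can resume at its yield point after the loader was already set to None.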

Fixes SCYLLADB-1679
2026-04-24 13:34:09 +02:00
Andrzej Jackowski
f75e5ac65b auth: reload LDAP permission cache on local shard only
The LDAP role manager's _cache_pruner fiber used
invoke_on_all() to reload permissions on every shard.
Since auth::service::start() runs on all shards in
parallel via invoke_on_all(), the pruner on shard X
could call reload_all_permissions() on shard Y before
shard Y finished start() and set its permission loader,
hitting SCYLLA_ASSERT(_permission_loader). The same
cross-shard race existed during shutdown.

Each shard runs its own pruner instance, so reloading
locally is sufficient — all shards are still covered.
This also removes redundant N-squared reload calls.

Refs SCYLLADB-1679
2026-04-24 13:06:58 +02:00
Botond Dénes
70261dc674 Merge 'test/cluster: scale failure_detector_timeout_in_ms by build mode' from Marcin Maliszkiewicz
The failure_detector_timeout_in_ms override of 2000ms in 6 cluster test files is too aggressive for debug/sanitize builds. During node joins, the coordinator's failure detector times out on RPC pings to the joining node while it is still applying schema snapshots, marks it DOWN, and bans it — causing flaky test failures.

Scale the timeout by MODES_TIMEOUT_FACTOR (3x for debug/sanitize, 2x for dev, 1x for release) via a shared failure_detector_timeout fixture in conftest.py.
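The scaling rule can be sketched as follows (hypothetical shape; the constant and function names follow the commit text, and the mapping is as stated: 3x for debug/sanitize, 2x for dev, 1x for release):

```python
# Assumed shape of the per-mode factor table; not the verbatim conftest.py.
MODES_TIMEOUT_FACTOR = {"debug": 3, "sanitize": 3, "dev": 2, "release": 1}

def failure_detector_timeout_ms(build_mode: str, base_ms: int = 2000) -> int:
    """Scale the base failure-detector timeout by the build-mode factor."""
    return base_ms * MODES_TIMEOUT_FACTOR.get(build_mode, 1)
```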

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1587
Backport: no, elasticsearch analyser shows only a single failure

Closes scylladb/scylladb#29522

* github.com:scylladb/scylladb:
  test/cluster: scale failure_detector_timeout_in_ms by build mode
  test/cluster: add failure_detector_timeout fixture
2026-04-24 09:10:43 +03:00
Botond Dénes
d280517e27 test/cluster/test_incremental_repair: fix flaky do_tablet_incremental_repair_and_ops
The log grep in get_sst_status searched from the beginning of the log
(no from_mark), so the second-repair assertions were checking cumulative
counts across both repairs rather than counts for the second repair alone.

The expected values (sst_add==2, sst_mark==2) relied on this cumulative
behaviour: 1 from the first repair + 1 from the second = 2. This works
when the second repair encounters exactly one unrepaired sstable, but
fails whenever the second repair sees two.

The second repair can see two unrepaired sstables when the 100 keys
inserted before it (via asyncio.gather) trigger a background auto-flush
before take_storage_snapshot runs. take_storage_snapshot always flushes
the memtable itself, so if an auto-flush already split the batch into two
sstables on disk, the second repair's snapshot contains both and logs
"Added sst" twice, making the cumulative count 3 instead of 2.

Fix: take a log mark per-server before each repair call and pass it to
get_sst_status so each check counts only the entries produced by that
repair. The expected values become 1/0/1 and 1/1/1 respectively,
independent of how many sstables happened to exist beforehand.

get_sst_status gains an optional from_mark parameter (default None)
which preserves existing call sites that intentionally grep from the
start of the log.
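The counting change can be sketched as follows (illustrative; get_sst_status greps server logs, modelled here as a substring count over a list of lines):

```python
def count_log_matches(log_lines, pattern, from_mark=None):
    # from_mark=None keeps the old behaviour: count from the start of
    # the log. A mark taken before a repair restricts the count to
    # entries produced by that repair alone.
    start = 0 if from_mark is None else from_mark
    return sum(1 for line in log_lines[start:] if pattern in line)

log = ["Added sst 1", "noise", "Added sst 2", "Added sst 3"]
mark = 2                               # taken just before the second repair
assert count_log_matches(log, "Added sst") == 3        # cumulative count
assert count_log_matches(log, "Added sst", mark) == 2  # second repair only
```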

Fixes: SCYLLADB-1086

Closes scylladb/scylladb#29484
2026-04-23 17:17:16 +02:00
Wojciech Mitros
7634d3f7d4 test/cluster: fix flaky test_hints_consistency_during_replace
The test creates a sync point immediately after writing 100 rows
with CL=ANY, without waiting for pending hint writes to complete.

store_hint() is fire-and-forget: it submits do_store_hint() to a gate
and returns immediately. do_store_hint() updates _last_written_rp only
after writing to the commitlog. If create_sync_point() is called before
all do_store_hint() coroutines complete, the captured replay position
is stale, and await_sync_point() returns DONE before all hints are
replayed, leaving some rows missing.

Fix by waiting for the size_of_hints_in_progress metric to reach zero
before creating the sync point, ensuring all in-flight hint writes have
completed and _last_written_rp is up to date. This follows the same
pattern already used in test_sync_point.

Fixes: SCYLLADB-1560

Closes scylladb/scylladb#29623
2026-04-23 17:03:48 +02:00
Botond Dénes
b49cf6247f test: fix flaky test_read_repair_with_trace_logging by reading tracing with CL=ALL
Tracing events are written to system_traces.events with CL=ANY, so they
are only guaranteed to be present on the local node of the query
coordinator. Reading them back with the driver default (CL=LOCAL_ONE)
may route the query to a replica that has not yet received all events,
causing the assertion on 'digest mismatch, starting read repair' to fail
intermittently.

Fix execute_with_tracing() to read tracing via the ResponseFuture API
with query_cl=ConsistencyLevel.ALL, so events from all replicas are
merged before the caller inspects them.

Fixes: SCYLLADB-1633

Closes scylladb/scylladb#29566
2026-04-23 16:57:29 +02:00
Michał Jadwiszczak
878f341338 test/cluster/test_view_building_coordinator: fix view_updates_drained predicate
The previous fix for the flakiness in test_file_streaming waited for
the scylla_database_view_update_backlog metric to drop to 0 via
wait_for(view_updates_drained, ...). However, the predicate returned
True/False, while wait_for treats any non-None result as 'done' and
keeps retrying only on None. So when the backlog was non-zero the
predicate returned False, which wait_for interpreted as success and
returned immediately - the test could then stop servers[0]/servers[1]
before the view updates generated by new_server from the migrated
staging sstable were actually delivered, leading to a partially
populated MV (e.g. 431/1000 rows) and a failing assertion.

Fix the predicate to return None instead of False when the backlog is
not yet drained, so wait_for will actually retry until the metric
reaches 0 (or the deadline is hit).
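The predicate contract can be modelled in a few lines (a simplified stand-in for the harness helper, not its actual implementation): any non-None result, including False, ends the wait, so a draining predicate must return None until done.

```python
import time

def wait_for(pred, deadline, period=0.01):
    # Simplified model of the helper: retry only while the predicate
    # returns None; any non-None result (including False!) ends the wait.
    while True:
        result = pred()
        if result is not None:
            return result
        if time.monotonic() > deadline:
            raise TimeoutError("predicate did not complete in time")
        time.sleep(period)

backlog = [3, 1, 0]                    # simulated view-update backlog metric

def view_updates_drained():
    # Return None (not False) while the backlog is non-zero.
    value = backlog.pop(0)
    return True if value == 0 else None

assert wait_for(view_updates_drained, time.monotonic() + 5) is True
```

A predicate returning False instead of None would make wait_for return immediately, which is exactly the bug being fixed.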

Fixes SCYLLADB-1182

Closes scylladb/scylladb#29587
2026-04-23 17:52:22 +03:00
Andrei Chekun
67b3ad94a0 test.py: enhance error output in case no tests were executed
By default, pytest produces an error if a provided file does not
exist, but coupled with xdist it produces no error; this is due to how
pytest interacts with xdist. test.py always passes the -n parameter,
so if something goes wrong no error is produced and only exit code 5
is returned. Print a warning when pytest exits with code 5.
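The check can be sketched as follows (pytest documents exit code 5 as "no tests were collected"; the warning text here is illustrative, not the exact test.py message):

```python
PYTEST_EXIT_NO_TESTS_COLLECTED = 5   # documented pytest exit code

def warn_if_no_tests(exit_code):
    # With xdist (-n) pytest stays silent in this case, so surface it.
    if exit_code == PYTEST_EXIT_NO_TESTS_COLLECTED:
        return ("warning: pytest returned exit code 5 (no tests were "
                "collected); check the test paths passed on the CLI")
    return None
```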

Closes scylladb/scylladb#29584
2026-04-23 14:03:55 +02:00