scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-22 07:42:16 +00:00

Author	SHA1	Message	Date
Dario Mirovic	918130befd	utils: loading_cache: add insert() that is a no-op when caching is disabled When the cache is constructed with expiry == 0 the underlying storage is never instantiated and get_ptr() asserts via caching_enabled(). This is fine for callers that need a handle into the cache, but it makes get_ptr() unusable for write-only insertions on caches whose expiry is configurable at runtime (e.g. caches driven by a LiveUpdate config option that the operator may set to 0). Add a new insert(k, load) method on loading_cache that returns a future<> and is a no-op when caching is disabled, otherwise forwards to get_ptr(k, load) and discards the resulting handle. This completes the disabled-mode safety contract of the cache for the write side, mirroring the fallback that get() already provides for the read side. Switch authorized_prepared_statements_cache::insert() from get_ptr().discard_result() to the new insert(), which fixes the crash 'Assertion caching_enabled() failed' in authorized_prepared_statements_cache::insert() that occurs when permissions_validity_in_ms is set to 0 and a prepared statement is executed under authentication. Fixes SCYLLADB-1699	2026-04-30 16:51:23 +02:00
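The disabled-mode contract described in the commit above can be sketched in a few lines. This is a minimal Python illustration, not ScyllaDB's C++ loading_cache: the class name, the dict-based storage and the direct-load fallback in get() are all assumptions made for the example, and expiry-based eviction is left out.

```python
from typing import Any, Callable, Dict, Hashable, Optional


class LoadingCacheSketch:
    """Hypothetical stand-in for a loading cache with a disabled mode."""

    def __init__(self, expiry_seconds: float) -> None:
        # Storage is only instantiated when caching is enabled (expiry > 0).
        self._entries: Optional[Dict[Hashable, Any]] = (
            {} if expiry_seconds > 0 else None
        )

    def caching_enabled(self) -> bool:
        return self._entries is not None

    def get_ptr(self, key: Hashable, load: Callable[[Hashable], Any]) -> Any:
        # Callers want a handle into the cache, so a disabled cache is an error.
        if not self.caching_enabled():
            raise AssertionError("caching_enabled() failed")
        if key not in self._entries:
            self._entries[key] = load(key)
        return self._entries[key]

    def get(self, key: Hashable, load: Callable[[Hashable], Any]) -> Any:
        # Read side: fall back to loading directly when caching is disabled
        # (assumed behaviour for this sketch).
        if not self.caching_enabled():
            return load(key)
        return self.get_ptr(key, load)

    def insert(self, key: Hashable, load: Callable[[Hashable], Any]) -> None:
        # Write side: a no-op when disabled, otherwise populate the entry
        # and discard the handle.
        if not self.caching_enabled():
            return
        self.get_ptr(key, load)
```

With expiry set to 0, insert() returns without touching storage, while a direct get_ptr() call still fails loudly, which is the split the commit message describes.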
Marcin Maliszkiewicz	b08e0c67e4	test/boost: add dummy case to table_helper_test for non-injection modes The only test requires SCYLLA_ENABLE_ERROR_INJECTION. In modes without it (e.g. release) the suite was empty, so pytest exited with code 5 ("no tests collected") and CI failed. Add a no-op case in that branch so collection always yields at least one test.	2026-04-30 11:45:12 +02:00
Marcin Maliszkiewicz	515b5722fd	test/boost: add regression test for table_helper insert() UAF Deterministic reproducer using an error injection point placed in table_helper::insert() between cache_table_info() and execute(). The test parks fiber A at the injection, drops the target table (evicting the prepared_statements_cache entry), runs fiber B which nulls _insert_stmt, then releases fiber A. Without the fix this crashes in execute(); with the fix fiber A holds a local strong ref and proceeds. Uses the new waiters() API to synchronize with fiber A's entry into the injection.	2026-04-30 11:45:12 +02:00
Marcin Maliszkiewicz	4d234aaaa5	utils/error_injection: add waiters() API Returns the number of fibers currently suspended in wait_for_message() for a named injection. Lets tests synchronize precisely with code parked on an injection point.	2026-04-30 11:45:12 +02:00
Marcin Maliszkiewicz	aa18c3ed4a	table_helper: fix use-after-free on prepared-statement invalidation insert() held no local strong ref to the prepared modification_statement across the suspension in execute(). On a single shard: 1. Fiber A suspends inside _insert_stmt->execute(). 2. DROP TABLE / DROP KEYSPACE on the target, or LRU eviction, removes the prepared_statements_cache entry, releasing its strong ref. 3. Fiber B re-enters cache_table_info(), sees _prepared_stmt (checked_weak_ptr) invalidated, and runs _insert_stmt = nullptr, releasing the last strong ref. The modification_statement is freed. 4. Fiber A resumes inside execute() and touches freed *this. Pin strong ref to _insert_stmt locally before the suspension.	2026-04-30 11:45:12 +02:00
Ernest Zaslavsky	1febfbd9b5	test: rename sstable_tablet_streaming.cc to match the naming convention Apparently, a boost test file name must end with "_test" to be executed by test.py. Closes scylladb/scylladb#29693	2026-04-30 11:16:39 +03:00
Pavel Emelyanov	1ca97f0c0a	Merge 'test: fix disabled test handling and deduplicate CLI test arguments' from Evgeniy Naydanov - Revert the previous "test.py: fix test collection bug" commit (`92c09d10`) which worked around broken deduplication by filtering items without `BUILD_MODE` in `pytest_collection_modifyitems`. This approach masked the root cause and is superseded by the proper fixes below. - Backport pytest 9.0.3's argument normalization algorithm into `test.py` to work around broken deduplication in pytest 8.3.5 ([pytest-dev/pytest#12083](https://github.com/pytest-dev/pytest/issues/12083)). Duplicate or subsumed test paths (e.g. `test/cql` and `test/cql/lua_test.cql`) are collapsed before invoking pytest. Revert when upgrading to pytest 9.x. - Return a `DisabledFile` collector instead of an empty list in `pytest_collect_file` when all modes are disabled for a file, fixing a bug where subsequent files would not get their stash items set (`REPEATING_FILES`). Restructure `pytest_collect_file` to use a walrus operator (`if repeats := ...`) with a single `remove(file_path)` and `return collectors` at the end, eliminating the early return. - Add `--keep-duplicates` CLI argument to bypass deduplication and forward to pytest. - Move `RUN_ID` assignment from `pytest_collect_file` to `modify_pytest_item`. A shared `run_ids` cache (`defaultdict[tuple[str, str], count]`) is created in `pytest_collection_modifyitems` and passed to `modify_pytest_item`, keyed by `(build_mode, nodeid)` so each mode gets independent counters. This ensures unique run IDs even when `--keep-duplicates` causes the same file to be collected multiple times. - Fix `--repeat` option default from string `"1"` to int `1` — argparse only applies `type=` to CLI-parsed values, not defaults. pytest normally deduplicates overlapping test arguments — e.g. `test/cql test/cql/lua_test.cql` collects `lua_test.cql` only once. 
The original `test.py` never performed this deduplication, and the pytest version in the toolchain image (8.3.5) has a bug that breaks it ([pytest-dev/pytest#12083](https://github.com/pytest-dev/pytest/issues/12083).) Since we are moving to bare pytest, `test.py` should match pytest's default behavior: deduplicate. Because we cannot easily upgrade pytest, commit 2 backports the deduplication logic from pytest 9.0.3. To match pytest's interface, `--keep-duplicates` is added as an opt-out. This lets a user intentionally run overlapping paths — e.g. `./test.py test/blah test/blah/test_foo.py --keep-duplicates` runs `test_foo.py` twice. The flag is forwarded to pytest and also skips the backported deduplication in `test.py`. - Revert `92c09d10` which filtered items without `BUILD_MODE` in `pytest_collection_modifyitems` and added an early return in `CppFile.collect()`. This workaround is superseded by the proper deduplication and `DisabledFile` fixes. - Add `_CollectionArgument` dataclass (`order=True`, `__contains__` for subsumption) and `_deduplicate_test_args()` function, adapted from pytest 9.0.3. Marked with a TODO to remove once we update to pytest 9.x. - Call `_deduplicate_test_args()` on `options.name` before passing to pytest. - Add `DisabledFile(pytest.File)` that skips collection with an informative message instead of returning an empty list. - Restructure `pytest_collect_file` to use walrus operator: `if repeats := ...:` / `else:` — single `remove(file_path)` at end, no early return. - Add `--keep-duplicates` argument that skips deduplication and is forwarded to pytest. - Create a shared `run_ids` cache in `pytest_collection_modifyitems` and pass it to `modify_pytest_item`, which assigns unique sequential RUN_IDs via `itertools.count`. The cache is keyed by `(build_mode, nodeid)` so each mode gets independent counters. - Remove `RUN_ID` from `_STASH_KEYS_TO_COPY` — it is no longer set on collectors. - Remove `CppFile.run_id` cached_property. 
`CppTestCase` now reads `RUN_ID` from its own item stash. - Fix `--repeat` option default from `"1"` to `1` and drop redundant `int()` cast. Closes SCYLLADB-1730 Closes scylladb/scylladb#29665 * github.com:scylladb/scylladb: test: add --keep-duplicates and assign RUN_ID via shared cache test/pylib/runner: fix disabled file collection test.py: deduplicate CLI test arguments before passing to pytest Revert "test.py: fix test collection bug"	2026-04-30 07:58:25 +03:00
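The shared run_ids cache from the merge above, keyed by (build_mode, nodeid), can be sketched as follows. The helper names are hypothetical, and starting the counters at 1 is an assumption; the commit text only says the IDs are unique, sequential and produced with itertools.count.

```python
from collections import defaultdict
from itertools import count
from typing import DefaultDict, Tuple

RunIdCache = DefaultDict[Tuple[str, str], count]


def make_run_id_cache() -> RunIdCache:
    # One independent counter per (build_mode, nodeid) key.
    # Starting at 1 is an assumption made for this sketch.
    return defaultdict(lambda: count(1))


def next_run_id(cache: RunIdCache, build_mode: str, nodeid: str) -> int:
    # Each call hands out the next sequential ID for that key, so the same
    # test collected twice in one mode still gets distinct run IDs.
    return next(cache[(build_mode, nodeid)])
```

Because the key includes the build mode, collecting the same nodeid in two modes yields two separate sequences rather than one shared counter.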
Yaniv Michael Kaul	93722f2c89	gms/gossiper: fix use-after-move in do_send_ack2_msg The second logger.debug() call accesses ack2_msg after it was moved via std::move() in the co_await send_gossip_digest_ack2 call. This is undefined behavior. Fix by formatting ack2_msg to a string before the move, then using that cached string in both debug log calls. FIXES: https://scylladb.atlassian.net/browse/SCYLLADB-1778 Closes scylladb/scylladb#29227	2026-04-30 07:07:39 +03:00
Wojciech Mitros	ebaf536449	replica/database: fix cross-shard deadlock in lock_tables_metadata() lock_tables_metadata() acquires a write lock on tables_metadata._cf_lock on every shard. It used invoke_on_all(), which dispatches lock acquisitions to all shards in parallel via parallel_for_each + smp::submit_to. When two fibers call lock_tables_metadata() concurrently, this can deadlock. parallel_for_each starts all iterations unconditionally: even when the local shard's lock attempt blocks (because the other fiber already holds it), SMP messages are still sent to remote shards. Both fibers' lock-acquisition messages land in the per-shard SMP queues. The SMP queue itself is FIFO, but process_incoming() drains it and schedules each item as a reactor task via add_task(), which, in debug and sanitize builds with SEASTAR_SHUFFLE_TASK_QUEUE, shuffles each newly added task against all pending tasks in the same scheduling group's reactor task queue. This means fiber A's lock acquisition can be reordered past fiber B's (and past unrelated tasks) on a given shard. If fiber A wins the lock on shard X while fiber B wins on shard Y, this creates a classic cross-shard lock-ordering deadlock (circular wait). In production builds without SEASTAR_SHUFFLE_TASK_QUEUE, the reactor task queue is FIFO. Still, even in release builds, the SMP queues can reorder messages, so the deadlock remains possible, though much less likely. In debug and sanitize builds, the task-queue shuffle makes the deadlock very likely whenever both fibers' lock-acquisition tasks are pending simultaneously in the reactor task queue on any shard. This deadlock was exposed by `ce00d61917` ("db: implement large_data virtual tables with feature flag gating", merged as `88a8324e68`), which introduced legacy_drop_table_on_all_shards as a second caller of lock_tables_metadata(). When LARGE_DATA_VIRTUAL_TABLES is enabled during topology_state_load (via feature_service::enable), two fibers can race: 1.
activate_large_data_virtual_tables() — calls legacy_drop_table_on_all_shards() which calls lock_tables_metadata() synchronously via .get() 2. reload_schema_in_bg() — fires as a background fiber from TABLE_DIGEST_INSENSITIVE_TO_EXPIRY, eventually reaches schema_applier::commit() which also calls lock_tables_metadata() If both reach lock_tables_metadata() while the lock is free on all shards, the parallel acquisition creates the deadlock opportunity. The deadlock blocks topology_state_load() from completing, which prevents the bootstrapping node from finishing its topology state transitions. The coordinator's topology coordinator then waits for the node to reach the expected state, but the node is stuck, so eventually the read_barrier times out after 300 seconds. Fix by acquiring the shard 0 lock first before attempting to acquire any other lock. Whichever fiber wins shard 0 is guaranteed to acquire all remaining shards before the other fiber can proceed past shard 0, eliminating the circular-wait condition. Tested manually with 2 approaches: 1. causing different shard locks to be acquired by different lock_tables_metadata() calls by adding different sleeps depending on the lock_tables_metadata() call and target shard - this reproduced the issue consistently 2. matching the time point at which both fibers reach lock_tables_metadata() adding a single sleep to one of the fibers - this heavily depends on the machine so we can't create a universal reproducer this way, but it did result in the observed failure on my machine after finding the right sleep time Also added a unit test for concurrent lock_tables_metadata() calls. Fixes: SCYLLADB-1694 Fixes: SCYLLADB-1644 Fixes: SCYLLADB-1684 Closes scylladb/scylladb#29678	2026-04-29 21:13:53 +02:00
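The fix above is an instance of the usual "take a designated lock first" rule for avoiding circular wait. The following is a minimal Python sketch with threads and plain locks standing in for fibers and per-shard locks; the class and helper names are hypothetical, and unlike the real code the remaining locks are taken sequentially rather than in parallel.

```python
import threading
from typing import List


class ShardLocks:
    """Hypothetical per-shard locks, one per shard index."""

    def __init__(self, shard_count: int) -> None:
        self._locks: List[threading.Lock] = [
            threading.Lock() for _ in range(shard_count)
        ]

    def lock_all(self) -> None:
        # Shard 0 acts as the gate: whoever holds it is the only caller
        # that can be acquiring the remaining shards, so no circular wait.
        self._locks[0].acquire()
        for lock in self._locks[1:]:
            lock.acquire()

    def unlock_all(self) -> None:
        for lock in reversed(self._locks):
            lock.release()


def run_concurrently(locks: ShardLocks, workers: int = 2, rounds: int = 100) -> int:
    """Run several workers that repeatedly lock and unlock every shard."""
    completed: List[int] = []

    def worker() -> None:
        for _ in range(rounds):
            locks.lock_all()
            try:
                completed.append(1)  # critical section
            finally:
                locks.unlock_all()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(completed)
```

If lock_all() instead started acquisitions on all shards at once, two callers could each end up holding a different shard and wait on the other forever, which is the failure mode the commit describes.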
Patryk Jędrzejczak	15f35577ed	Merge 'paxos_state: keep prepared message alive across statement execution' from Petr Gusev In do_execute_cql_with_timeout(), when the prepared statement was not found in the cache, we called qp.prepare() and stored the returned result_message::prepared in a local variable scoped to the 'if' block. We then extracted ps_ptr (a checked_weak_ptr to the prepared statement) from the message, let the message go out of scope at the end of the 'if', and used ps_ptr after a co_await on st->execute(). Since `3ac4e258e8` ("transport/messages: hold pinned prepared entry in PREPARE result"), result_message::prepared owns a strong pinned reference to the prepared cache entry. While qp.prepare() runs it also holds its own pin on the entry, so on return the entry has at least the pin owned by the returned message. As long as that message is alive, the cache entry cannot be purged and the weak handle inside ps_ptr remains promotable. The lifetime gap manifested only in debug builds. qp.prepare() returns a ready future on the cache-miss path, so in release builds the co_await resumes synchronously: control flows from the assignment of ps_ptr straight into st->execute() with no opportunity for any other task (in particular, prepared cache invalidation triggered by a concurrent schema change) to run in between. Debug builds, however, force a reactor preemption point on every co_await even when the awaited future is ready. With prepared_msg already destroyed at the end of the 'if' block, the only remaining handle on the cache entry was the weak ps_ptr, and the preemption gave a concurrent cache purge - triggered, for example, by Raft schema changes received during a node restart - the chance to drop the entry. The subsequent execute() then failed when promoting the weak pointer with checked_ptr_is_null_exception. 
The exception propagated out of the Paxos prepare path as a generic std::exception with no type information in the log, surfacing on the coordinator as: WriteFailure: Failed to prepare ballot ... Replica errors: host_id ... -> seastar::rpc::remote_verb_error (std::exception) Hoist the result_message::prepared into the outer scope so the pinned cache entry stays alive across co_await st->execute(...), closing the window in which a concurrent cache purge could invalidate the weak handle. Fixes SCYLLADB-1173 backport: the patch is simple, we can backport it to all versions with "LWT over tablets" feature. Note that the problem is only in test runs in debug configuration, production is not affected. Closes scylladb/scylladb#29675 * https://github.com/scylladb/scylladb: table_helper: retry insert prepare on concurrent cache invalidation paxos_state: keep prepared message alive across statement execution	2026-04-29 17:57:27 +02:00
Yaron Kaikov	d310e4b27d	scylla-gdb: fix compaction-tasks command for intrusive list Since commit `e942c074f2` changed _tasks from std::list<shared_ptr<...>> to a boost::intrusive_list, iterating yields raw compaction_task_executor objects rather than shared_ptr wrappers. The GDB script was updated to use intrusive_list() but still wrapped elements in seastar_shared_ptr(), causing 'gdb.error: There is no member or method named _p' when compaction tasks are active. Move the seastar_shared_ptr unwrapping to the 6.2 compatibility fallback path only, since the intrusive list path yields objects directly. Fixes: SCYLLADB-1762 Closes scylladb/scylladb#29690	2026-04-29 13:11:13 +03:00
Marcin Maliszkiewicz	45b4834ac4	Merge 'audit: fix maintenance socket startup/shutdown ordering' from Andrzej Jackowski This series addresses three problems in the audit startup/shutdown sequence: 1. [BUG] Shutdown SIGABRT. During graceful shutdown, deferred stops run in reverse order of construction. With the audit service constructed after the maintenance socket, audit was destroyed first, and in-flight queries on the maintenance socket could hit the destroyed audit service (assertion failure in sharded::local()). 2. [BUG] Startup audit bypass. The maintenance socket opened before audit storage was initialized, allowing queries (e.g. creating a superuser) to bypass auditing in that window. 3. [PROBLEM] Blocks SCYLLADB-1430. The existing order prevents audit configuration from being driven by group0 state, because audit started before group0. The series is organized as: a test-helper refactor, a test for the audited maintenance-socket flow, a startup-phase split, the construction-order fix and its shutdown-race test, and finally the storage-before-socket fix and its startup-window test. Fixes SCYLLADB-1615 No backport, bugs don't seem severe enough to justify backporting. Closes scylladb/scylladb#29539 * github.com:scylladb/scylladb: audit: assert storage ordering invariants at runtime audit: start maintenance socket after audit storage audit: move audit construction before maintenance socket audit: split startup into construction and storage phases test: audit: verify maintenance socket operations are audited test: audit: parameterize source address in audit assertions	2026-04-29 10:37:38 +02:00
Łukasz Paszkowski	7e14ea5ac8	sstables: only wipe TemporaryHashes for sstable formats that have it Commit `8d34127684` ("sstables: clean up TemporaryHashes file in wipe()") unconditionally calls filename(..., component_type::TemporaryHashes) inside filesystem_storage::wipe(). However, the TemporaryHashes component is only registered in the component map of the 'ms' sstable format. For older formats (ka, la, mc, md, me) the lookup goes through sstable_version_constants::get_component_map(version).at(...) and throws std::out_of_range. The exception is then swallowed by the outer catch(...) in wipe(), which just logs and ignores. As a side effect, the subsequent remove_file(new_toc_name) is never reached and the TemporaryTOC ('*-TOC.txt.tmp') file is left as an orphan on disk after every unlink() of a non-'ms' sstable. Guard the lookup with get_component_map(version).contains() so the cleanup is only attempted for formats that actually define the component. Add a regression test in test/boost/sstable_directory_test.cc that creates an 'me'-format sstable, unlinks it and asserts that the sstable directory is left empty. Without the fix the test fails with a leftover 'me-...-TOC.txt.tmp' file. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1697 Closes scylladb/scylladb#29620	2026-04-29 08:06:36 +03:00
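The failure mode in the commit above (a lookup that throws, a catch-all that swallows it, and a later cleanup step that never runs) can be reproduced in miniature. This Python sketch uses a dict lookup and KeyError in place of the C++ component map and std::out_of_range; the map contents, file names and the `guarded` switch are hypothetical and exist only to contrast the two behaviours.

```python
from typing import Dict, Set

# Hypothetical component maps: only the "ms" format defines TemporaryHashes.
COMPONENT_MAPS: Dict[str, Dict[str, str]] = {
    "me": {"TemporaryTOC": "TOC.txt.tmp"},
    "ms": {"TemporaryTOC": "TOC.txt.tmp", "TemporaryHashes": "Hashes.tmp"},
}


def wipe(version: str, files: Set[str], guarded: bool) -> None:
    """Remove temporary components from `files`, ignoring any error."""
    component_map = COMPONENT_MAPS[version]
    try:
        # Unguarded: the lookup raises KeyError for formats without the
        # component, and everything after it in the try block is skipped.
        if not guarded or "TemporaryHashes" in component_map:
            files.discard(component_map["TemporaryHashes"])
        files.discard(component_map["TemporaryTOC"])
    except Exception:
        # Mirrors the catch-all that logs and ignores the failure.
        pass
```

Calling wipe("me", files, guarded=False) leaves the temporary TOC entry behind, while guarded=True removes it, matching the orphan-file symptom and the fix described in the message.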
Botond Dénes	809f12f988	Merge 'test/cluster/dtest: fix ScyllaNode state not persisting across nodelist() calls' from Benny Halevy `ScyllaCluster.nodelist()` creates new `ScyllaNode` objects on every call, so per-node state set via `set_smp()`, `set_log_level()`, and `_adjust_smp_and_memory()` was lost. This meant `set_smp()` had no effect when `cluster.start()` was called after it, since `start_nodes()` calls `nodelist()` internally which creates fresh nodes with default values. - Add debug logging for smp/memory in ScyllaNode - Store per-node settings (smp, memory, log levels) in a `ScyllaCluster._node_resources` dict keyed by server_id, so they survive `nodelist()` reconstruction. `ScyllaNode` restores its state from this dict on construction and saves it back whenever `set_smp()`, `set_log_level()`, or `_adjust_smp_and_memory()` modifies it. - Add a reproducer test verifying `set_smp()` takes effect on restart Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1629 -- No backport needed: this only fixes dtest infrastructure, no production code is affected. Closes scylladb/scylladb#29549 * github.com:scylladb/scylladb: test/cluster/dtest: add test for node.set_smp() persistence test/cluster/dtest: cache ScyllaNode instances in ScyllaCluster test/cluster/dtest/ccmlib/scylla_node: add debug logging	2026-04-29 06:25:36 +03:00
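The persistence scheme from the merge description above, where per-node settings live in a cluster-level dict keyed by server_id, can be sketched as below. The class names, the settings layout and the default smp value are assumptions for illustration and are not taken from the dtest code.

```python
from typing import Any, Dict, List


class ClusterSketch:
    """Hypothetical cluster that rebuilds node objects on every nodelist()."""

    def __init__(self, server_ids: List[int]) -> None:
        self.server_ids = list(server_ids)
        # Settings survive here even though node objects do not.
        self.node_resources: Dict[int, Dict[str, Any]] = {}

    def nodelist(self) -> List["NodeSketch"]:
        return [NodeSketch(self, sid) for sid in self.server_ids]


class NodeSketch:
    DEFAULT_SMP = 2  # assumed default, for the sketch only

    def __init__(self, cluster: ClusterSketch, server_id: int) -> None:
        self.server_id = server_id
        # Restore saved state on construction, or create it on first use.
        self._resources = cluster.node_resources.setdefault(
            server_id, {"smp": self.DEFAULT_SMP}
        )

    def set_smp(self, smp: int) -> None:
        # Writes go to the shared dict, so later node objects see them.
        self._resources["smp"] = smp

    def smp(self) -> int:
        return self._resources["smp"]
```

A set_smp() call on one node object is then visible through any node object that a later nodelist() call creates for the same server_id.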
Evgeniy Naydanov	96d3f13245	test: add --keep-duplicates and assign RUN_ID via shared cache Add --keep-duplicates CLI argument to bypass deduplication and forward to pytest, allowing duplicate test file arguments to be collected multiple times. Move RUN_ID assignment from pytest_collect_file to modify_pytest_item. All File collectors for the same source file share a single run_ids dict (via RUN_ID_CACHE stash key), so items from duplicate collection arguments (e.g. with --keep-duplicates) automatically get unique IDs. Remove CppFile.run_id cached_property — CppTestCase now reads RUN_ID from its own item stash, which is set during modify_pytest_item. Fix --repeat option default from string "1" to int 1 — argparse only applies type= to CLI-parsed values, not defaults. Co-Authored-By: Claude Opus 4.6 (200K context) <noreply@anthropic.com>	2026-04-29 02:36:05 +00:00
Evgeniy Naydanov	497bd6b6c9	test/pylib/runner: fix disabled file collection Return a DisabledFile collector instead of an empty list when all modes are disabled for a file. Returning an empty list caused subsequent files to not get their stash items set because file_path was never removed from REPEATING_FILES. Co-Authored-By: Claude Opus 4.6 (200K context) <noreply@anthropic.com>	2026-04-29 02:36:05 +00:00
Evgeniy Naydanov	43f06ed19d	test.py: deduplicate CLI test arguments before passing to pytest Backport the argument normalization algorithm from pytest 9.0.3 to work around broken deduplication in pytest 8.3.5 (https://github.com/pytest-dev/pytest/issues/12083). Duplicate or subsumed test paths (e.g. 'test/cql' and 'test/cql/lua_test.cql') are now collapsed before invoking pytest. Revert this commit when upgrading to pytest 9.x. Co-Authored-By: Claude Opus 4.6 (200K context) <noreply@anthropic.com>	2026-04-29 02:36:05 +00:00
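The effect of collapsing duplicate or subsumed test paths can be illustrated with a small path-based sketch. This is not the algorithm backported from pytest 9.0.3: it is a simplified stand-in with a hypothetical function name, it handles plain paths only (no `::` node-id suffixes), and it keeps the first-seen order of the surviving arguments.

```python
from pathlib import PurePosixPath
from typing import List


def deduplicate_test_args(args: List[str]) -> List[str]:
    """Drop arguments that repeat, or live under, another argument."""
    paths = [PurePosixPath(arg) for arg in args]
    kept: List[str] = []
    for i, path in enumerate(paths):
        subsumed = any(
            # An exact duplicate that appeared earlier ...
            (other == path and j < i)
            # ... or a different argument that is an ancestor directory.
            or (other != path and other in path.parents)
            for j, other in enumerate(paths)
        )
        if not subsumed:
            kept.append(args[i])
    return kept
```

For the example from the commit message, passing `test/cql` together with `test/cql/lua_test.cql` leaves only `test/cql`, since the directory already covers the file.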
Evgeniy Naydanov	05f2c53931	Revert "test.py: fix test collection bug" This reverts commit `92c09d106d`.	2026-04-29 02:35:00 +00:00
Andrzej Jackowski	3755c370ac	audit: assert storage ordering invariants at runtime Abort if audit storage fails to start rather than silently running with an unaudited maintenance socket. Also assert that storage is already stopped when the audit service is destroyed, documenting the defer-stack ordering requirement. Refs SCYLLADB-1615 Refs SCYLLADB-1695	2026-04-28 18:58:49 +02:00
Andrzej Jackowski	543fb6a2db	audit: start maintenance socket after audit storage Without this, there is a window after startup where queries on the maintenance socket bypass auditing because audit storage is not yet initialized. Fixes SCYLLADB-1615	2026-04-28 18:58:49 +02:00
Andrzej Jackowski	b7bc2d89e6	audit: move audit construction before maintenance socket During graceful shutdown, deferred stops run in reverse order of construction. When the audit service was constructed after the maintenance socket, audit was destroyed first. A DML query still in-flight on the maintenance socket could then bypass auditing entirely. Move construction as early as possible so the audit service outlives the maintenance socket on the defer stack, and to maximise the window in which attempts to use audit before storage is ready are caught with on_internal_error_noexcept. Refs SCYLLADB-1615	2026-04-28 18:58:49 +02:00
Andrzej Jackowski	bc67dd0b82	audit: split startup into construction and storage phases The table-based audit backend needs Raft to create its keyspace, but the audit service must exist earlier so that CQL paths don't silently skip auditing. Split startup into two phases: construction and storage initialization. Queries arriving between the two phases are logged as errors. This is a refactoring commit and the split sections will be moved later in this patch series. Refs SCYLLADB-1615	2026-04-28 18:58:42 +02:00
Andrzej Jackowski	1616c71bf0	test: audit: verify maintenance socket operations are audited User creation via the maintenance socket should produce audit entries, as this is the recommended flow for creating the initial superuser when default credentials are disabled. The test is parametrized by audit backend (table and syslog). The maintenance socket source address is "::" because Seastar returns a zero-initialised in6_addr for AF_UNIX sockets. Test time in dev: 0.6s Refs SCYLLADB-1615	2026-04-28 18:42:39 +02:00
Avi Kivity	c4de2b3c9d	Merge 'test: fix flaky tablets test by using read barrier' from Aleksandra Martyniuk Some tests in test_tablets.py read system_schema.keyspaces from an arbitrary node that may not have applied the latest schema change yet. Pin the read to a specific node and issue a read barrier before querying, ensuring the node has up-to-date data. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1700 Test fix; no backport Closes scylladb/scylladb#29655 * github.com:scylladb/scylladb: test: fix flaky rack list conversion tests by using read barrier test: fix flaky test_enforce_rack_list_option by using read barrier	2026-04-28 17:15:59 +03:00
Petr Gusev	e6137ab11b	table_helper: retry insert prepare on concurrent cache invalidation table_helper::insert() retrieves the prepared statement via cache_table_info() and then dereferences _prepared_stmt to read bound_names. _prepared_stmt is a checked_weak_ptr into the prepared statements cache and can be invalidated at any time by a concurrent purge (for example, on a schema change). cache_table_info() (re-)prepares the statement and assigns _prepared_stmt before returning, and the strong pin held by the result_message::prepared returned from qp.prepare() keeps the cache entry alive only for the duration of try_prepare(). After try_prepare() returns, the pin is gone and _prepared_stmt is the only remaining handle on the entry. In release builds this is fine: the chain of ready-future co_awaits between try_prepare() finishing and _prepared_stmt->bound_names being read resumes synchronously, so no other task -- in particular, no cache purge -- can run in that window. In debug builds, however, Seastar inserts a reactor preemption point on every co_await even when the awaited future is ready. That preemption window is wide enough for a concurrent invalidation to drop the freshly installed cache entry, turning _prepared_stmt into a null weak handle and crashing the subsequent dereference with checked_ptr_is_null_exception. Wrap the cache_table_info() call in a loop that re-attempts the preparation until a synchronous post-resume check finds _prepared_stmt still valid. The check runs in the same task immediately after the co_await resumes, with no co_await between the check and the dereference, so a purge cannot slip in. _insert_stmt is a strong shared_ptr to the statement object and is not affected by cache invalidation, so it remains safe to use across the final co_await on execute(). 
The other caller of cache_table_info(), trace_keyspace_helper::apply_events_mutation(), accesses only the strong _insert_stmt via insert_stmt() and never dereferences the weak _prepared_stmt, so it is unaffected. Refs SCYLLADB-1173	2026-04-28 16:03:06 +02:00
Ernest Zaslavsky	a97502920b	test: optimize compaction_strategy_cleanup_method for remote storage Parallelize SSTable creation using parallel_for_each. The file count is made a parameter with a default of 64, allowing future S3/GCS variants to use a smaller count if needed.	2026-04-28 16:59:38 +03:00
Ernest Zaslavsky	0b9a2844bd	test: optimize stcs_reshape_overlapping for remote storage Parallelize SSTable creation using parallel_for_each and reduce the SSTable count from 256 to 64 for S3/GCS variants. The local test variant retains the original 256 count.	2026-04-28 16:59:38 +03:00
Ernest Zaslavsky	ac89cffc9f	test: optimize twcs_reshape_with_disjoint_set for remote storage Parallelize SSTable creation across all sub-tests using parallel_for_each and reduce the SSTable count from 256 to 64 for S3/GCS variants. Re-enable the S3 test variant that was previously disabled due to taking 4+ minutes. With parallel creation and reduced count, the test now completes in a reasonable time.	2026-04-28 16:59:37 +03:00
Ernest Zaslavsky	01b4292f87	test: parallelize SSTable creation in cleanup_during_offstrategy_incremental Pre-extract mutation pairs and use parallel_for_each with make_sstable_containing_async to create SSTables concurrently instead of sequentially. The post-creation loop still runs serially to collect token ranges and generations.	2026-04-28 16:59:37 +03:00
Ernest Zaslavsky	923ff9abc9	test: parallelize SSTable creation in run_incremental_compaction_test Pre-extract mutation pairs and use parallel_for_each with make_sstable_containing_async to create SSTables concurrently instead of sequentially. The post-creation loop still runs serially to collect token ranges and generations that depend on SSTable order.	2026-04-28 16:59:37 +03:00
Ernest Zaslavsky	6a25f52473	test: parallelize SSTable creation in offstrategy_sstable_compaction Use parallel_for_each with make_sstable_containing_async to create SSTables concurrently instead of sequentially, reducing wall-clock time on remote storage backends (S3/GCS).	2026-04-28 16:59:37 +03:00
Ernest Zaslavsky	baca685629	test: parallelize SSTable creation in twcs_partition_estimate Use parallel_for_each with make_sstable_containing_async to create SSTables concurrently instead of sequentially, reducing wall-clock time on remote storage backends (S3/GCS).	2026-04-28 16:59:37 +03:00
Ernest Zaslavsky	716202b839	test: add trace-level logging for S3 and HTTP in compaction tests Raise log levels for s3 and gcp_storage from debug to trace, and add trace-level logging for http and default_http_retry_strategy modules. This provides better visibility into storage backend interactions when debugging slow or failing compaction tests on remote storage.	2026-04-28 16:59:37 +03:00
Ernest Zaslavsky	a4ebe16517	test: make sstable test utilities natively async The original make_memtable used seastar::thread::yield() for preemption, which required all callers to run inside a seastar::thread context. This prevented the utilities from being used directly in coroutines or parallel_for_each lambdas. Make the primary functions — make_memtable, make_sstable_containing, and verify_mutation — return future<> directly. Callers now .get() explicitly when in seastar::thread context, or co_await when in a coroutine. make_memtable now uses coroutine::maybe_yield() instead of seastar::thread::yield(). verify_mutation is converted to coroutines as well. Requested in: https://github.com/scylladb/scylladb/pull/29416#pullrequestreview-4112296282	2026-04-28 16:59:37 +03:00
Ernest Zaslavsky	4b637226a7	test: move make_memtable out of external_updater in row_cache_test test_exception_safety_of_update_from_memtable called make_memtable inside the row_cache::external_updater callback. external_updater runs as a synchronous execute() call that must not yield, but make_memtable calls seastar::thread::yield() every 10th mutation. The bug was latent because the test only inserted 5 mutations, so the yield was never reached. Move the call before the callback. Prerequisite for the next patch, which changes make_memtable to call make_memtable_async().get() -- that would yield on every mutation via coroutine::maybe_yield(), making this bug visible.	2026-04-28 16:59:37 +03:00
Ernest Zaslavsky	7c09f35ddf	test: increase S3 max connections for compaction tests Increase max_connections from the default to 32 for the S3 endpoint used in tests. This allows more concurrent HTTP connections to the S3 backend, which is needed to benefit from parallel SSTable creation that will be introduced in subsequent commits.	2026-04-28 16:59:37 +03:00
Taras Veretilnyk	784127c40b	sstables_loader: synchronously unlink streamed sstables before returning mark_for_deletion() only set an in-memory flag; the actual file deletion ran lazily when the last shared_sstable reference dropped, leaving a window in which a follow-up scan of the upload directory (e.g. a second 'nodetool refresh --load-and-stream') could observe a partially-deleted sstable and fail with malformed_sstable_exception. Force the unlink to complete before stream() returns. For tablet streaming, partially-contained sstables span multiple per-tablet batches, so a defer_unlinking flag postpones the unlink until after all sstables are streamed; with vnodes, and for fully-contained sstables, each sstable is streamed only once and can be removed right after it is streamed. Added a FIXME on object_storage_base::wipe and strengthened the doc on storage::wipe to make the never-fails contract explicit.	2026-04-28 14:52:28 +02:00
Patryk Jędrzejczak	d9dd3bfe53	Merge 'topology_coordinator: join tablet load stats refresh in stop()' from Andrzej Jackowski Commit `2b7aa32` (topology_coordinator: Refresh load stats after table is created or altered) registered topology_coordinator as a schema change listener and added on_create_column_family which fire-and-forgets _tablet_load_stats_refresh.trigger(). The triggered task runs on the gossip scheduling group via with_scheduling_group and accesses the topology_coordinator via 'this'. stop() unregisters the listener but does not wait for any in-flight refresh task. If a notification fires between _tablet_load_stats_refresh.join() in run() and unregister_listener in stop(), the scheduled task can outlive the topology_coordinator and access freed memory after run_topology_coordinator's coroutine frame is destroyed. Wait for the refresh to complete in stop() after unregistering the listener, ensuring no task can fire after destruction. Fixes SCYLLADB-1728 Backport to 2026.1 and 2026.2, because the issue was introduced in `2b7aa32` Closes scylladb/scylladb#29653 * https://github.com/scylladb/scylladb: test: tablet_stats: reproduce shutdown refresh race topology_coordinator: join tablet load stats refresh in stop()	2026-04-28 12:54:28 +02:00
Benny Halevy	5eaa979f35	test/cluster/dtest: add test for node.set_smp() persistence Add a test that reproduces SCYLLADB-1629: set_smp() had no effect because nodelist() created new ScyllaNode objects on every call, losing the _smp_set_during_test value. The test fails without the fix in the previous patch.	2026-04-28 12:34:08 +03:00
Benny Halevy	7430c1efd7	test/cluster/dtest: cache ScyllaNode instances in ScyllaCluster ScyllaCluster.nodelist() was creating new ScyllaNode objects on every call, so per-node state set via set_smp(), set_log_level(), and _adjust_smp_and_memory() was lost between calls. Fix by caching ScyllaNode instances in a list populated by _add_nodes() using the list returned by servers_add() in populate(). Nodes are assigned monotonically increasing names (node1, node2, ...). nodelist() simply returns the cached list.	2026-04-28 12:34:06 +03:00
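The difference between rebuilding node objects on every call and caching them can be sketched as follows. This is a simplified illustration, not the dtest code: `Node`, `UncachedCluster`, `CachedCluster`, and `add_nodes` are hypothetical names standing in for ScyllaNode, ScyllaCluster, and _add_nodes().

```python
from typing import Optional


class Node:
    """Hypothetical stand-in for ScyllaNode: holds per-node state."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.smp: Optional[int] = None

    def set_smp(self, smp: int) -> None:
        self.smp = smp


class UncachedCluster:
    """Pre-fix behaviour: every nodelist() call builds fresh objects."""

    def __init__(self, server_count: int) -> None:
        self._server_count = server_count

    def nodelist(self) -> list:
        return [Node(f"node{i}") for i in range(1, self._server_count + 1)]


class CachedCluster:
    """Post-fix behaviour: nodes are created once and then reused."""

    def __init__(self) -> None:
        self._nodes = []

    def add_nodes(self, count: int) -> None:
        # Names increase monotonically: node1, node2, ...
        start = len(self._nodes) + 1
        self._nodes.extend(Node(f"node{i}") for i in range(start, start + count))

    def nodelist(self) -> list:
        return self._nodes
```

Calling `set_smp()` on a node returned by `UncachedCluster.nodelist()` changes a throwaway object, so the next `nodelist()` call no longer sees the value; with `CachedCluster` the same object is returned each time and the state persists.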
Marcin Maliszkiewicz	b0f988afc4	Merge 'auth: fix shutdown and startup races in LDAP cache pruner' from Andrzej Jackowski The LDAP role manager's `_cache_pruner` background fiber periodically calls cache::reload_all_permissions(). Two races cause it to hit SCYLLA_ASSERT(_permission_loader): - Cross-shard race: The pruner used `_cache.container().invoke_on_all()` to reload permissions on every shard. Since both `service::start()` and `sharded<service>::stop()` execute per-shard in parallel, the pruner on one shard could call reload_all_permissions() on another shard before that shard set its loader (startup) or after it cleared its loader (shutdown). Each shard runs its own pruner instance, so reloading locally is sufficient; this also removes redundant N² reload calls. - Intra-shard race: `service::stop()` cleared the permission loader and stopped the role manager concurrently (via when_all_succeed). A mid-reload pruner could yield and then call the now-null loader. Fixed by stopping the role manager first so the pruner is fully drained before the loader is cleared. Fixes SCYLLADB-1679 Backport to 2026.2, introduced in `7eedf50c12` Closes scylladb/scylladb#29605 * github.com:scylladb/scylladb: auth: make shutdown the exact reverse of startup test: ldap: add test for pruner crash during shutdown auth: start authorizer and set permission loader before role manager auth: stop role manager before clearing permission loader auth: reload LDAP permission cache on local shard only	2026-04-28 11:16:07 +02:00
Botond Dénes	a7e9c0e6d2	Merge 'test.py: fix test collection bug' from Andrei Chekun In certain circumstances the current way of collecting tests is error-prone. Collection can stop when the first file is skipped in the current mode, leaving the rest of the files given on the CLI uncollected. Another issue is that if a file is specified twice, once via its directory and once explicitly, an incorrect CppFile ends up in the stash, causing a KeyError. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1714 No backport, test framework bug fix only. Closes scylladb/scylladb#29634 * github.com:scylladb/scylladb: test.py: fix framework test test.py: fix test collection bug	2026-04-28 11:52:35 +03:00
Petr Gusev	e39267b55f	paxos_state: keep prepared message alive across statement execution In do_execute_cql_with_timeout(), when the prepared statement was not found in the cache, we called qp.prepare() and stored the returned result_message::prepared in a local variable scoped to the 'if' block. We then extracted ps_ptr (a checked_weak_ptr to the prepared statement) from the message, let the message go out of scope at the end of the 'if', and used ps_ptr after a co_await on st->execute(). Since `3ac4e258e8` ("transport/messages: hold pinned prepared entry in PREPARE result"), result_message::prepared owns a strong pinned reference to the prepared cache entry. While qp.prepare() runs it also holds its own pin on the entry, so on return the entry has at least the pin owned by the returned message. As long as that message is alive, the cache entry cannot be purged and the weak handle inside ps_ptr remains promotable. The lifetime gap manifested only in debug builds. qp.prepare() returns a ready future on the cache-miss path, so in release builds the co_await resumes synchronously: control flows from the assignment of ps_ptr straight into st->execute() with no opportunity for any other task (in particular, prepared cache invalidation triggered by a concurrent schema change) to run in between. Debug builds, however, force a reactor preemption point on every co_await even when the awaited future is ready. With prepared_msg already destroyed at the end of the 'if' block, the only remaining handle on the cache entry was the weak ps_ptr, and the preemption gave a concurrent cache purge - triggered, for example, by Raft schema changes received during a node restart - the chance to drop the entry. The subsequent execute() then failed when promoting the weak pointer with checked_ptr_is_null_exception. 
The exception propagated out of the Paxos prepare path as a generic std::exception with no type information in the log, surfacing on the coordinator as: WriteFailure: Failed to prepare ballot ... Replica errors: host_id ... -> seastar::rpc::remote_verb_error (std::exception) Hoist the result_message::prepared into the outer scope so the pinned cache entry stays alive across co_await st->execute(...), closing the window in which a concurrent cache purge could invalidate the weak handle. Fixes SCYLLADB-1173	2026-04-28 10:42:13 +02:00
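The lifetime gap closed by this commit can be modelled with Python's `weakref` module. This is only an analogy under stated assumptions: `PreparedStatement`, `PreparedMessage`, `prepare`, and `run` are hypothetical names, the cache is a plain dict, and the immediate reclamation relies on CPython reference counting.

```python
import gc
import weakref


class PreparedStatement:
    """Hypothetical stand-in for a cached prepared statement."""

    def __init__(self, query: str) -> None:
        self.query = query


class PreparedMessage:
    """Stand-in for result_message::prepared: owns a strong pin on the entry."""

    def __init__(self, statement: PreparedStatement) -> None:
        self.pinned = statement


def prepare(cache: dict, query: str) -> PreparedMessage:
    statement = PreparedStatement(query)
    cache[query] = statement
    return PreparedMessage(statement)


def run(hoist_message: bool) -> bool:
    """Return True if the weak handle is still promotable after a purge."""
    cache = {}
    message = prepare(cache, "hypothetical query")
    handle = weakref.ref(message.pinned)  # Plays the role of ps_ptr.
    if not hoist_message:
        del message  # Pre-fix: the message dies at the end of the 'if' block.
    cache.clear()  # Concurrent purge running at the preemption point.
    gc.collect()
    return handle() is not None
```

`run(hoist_message=True)` keeps the message, and therefore its pin, alive across the purge, so the weak handle can still be promoted; `run(hoist_message=False)` drops the only strong owner outside the cache, and the purge leaves the handle dangling.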
Botond Dénes	3ea4af1c8c	Merge 'test/cluster/test_incremental_repair: fix flaky coordinator-change scenario' from Avi Kivity - Ensure servers[1] is not the topology coordinator before restarting it, preventing the leader death + re-election + re-repair sequence that masked the compaction-merge bug - Add a retry loop that detects post-restart leadership transfer to servers[1] via direct coordinator query, retrying up to 5 times Fixes: SCYLLADB-1478 Backporting to 2026.2, which sees the failure regularly. Closes scylladb/scylladb#29671 * github.com:scylladb/scylladb: test/cluster/test_incremental_repair: add retry for residual leadership race test/cluster/test_incremental_repair: fix flaky coordinator-change scenario	2026-04-28 09:05:02 +03:00
Andrzej Jackowski	459e3970cd	test: tablet_stats: reproduce shutdown refresh race The coordinator can receive a schema-change notification after run() finishes but before stop() unregisters listeners. The test pins that window with error injections and verifies stop() waits for the refresh instead of letting it outlive the coordinator. Test time in dev: 9.51s Refs SCYLLADB-1728	2026-04-28 08:00:54 +02:00
Andrzej Jackowski	8756f7c068	topology_coordinator: join tablet load stats refresh in stop() Commit `2b7aa3211d` made schema changes trigger tablet load stats refreshes in the background. A notification can still arrive after run() stops the periodic refresher and before the coordinator object is destroyed. Move lifecycle subscription cleanup to stop() and join the serialized refresh there after unregistering refresh trigger sources. This keeps the coordinator alive until notification-triggered refresh work has completed. Fixes SCYLLADB-1728	2026-04-28 07:37:28 +02:00
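The shutdown ordering described here (unregister the trigger sources, then join in-flight refresh work) can be sketched with `asyncio`. This is a hedged model rather than the Seastar implementation: the `Coordinator` class, its fields, and the use of a task set in place of the serialized refresh trigger are all assumptions made for illustration.

```python
import asyncio


class Coordinator:
    """Hypothetical stand-in for topology_coordinator."""

    def __init__(self) -> None:
        self.listening = True
        self.alive = True
        self.refreshes_completed = 0
        self._refresh_tasks = set()

    def on_schema_change(self) -> None:
        # Fire-and-forget trigger, as in the notification callback.
        if self.listening:
            task = asyncio.create_task(self._refresh())
            self._refresh_tasks.add(task)
            task.add_done_callback(self._refresh_tasks.discard)

    async def _refresh(self) -> None:
        await asyncio.sleep(0.01)  # Simulated refresh work.
        assert self.alive, "refresh outlived the coordinator"
        self.refreshes_completed += 1

    async def stop(self) -> None:
        self.listening = False  # Unregister trigger sources first.
        if self._refresh_tasks:
            # Then join whatever refresh work is still in flight.
            await asyncio.gather(*self._refresh_tasks)
        self.alive = False  # Only now is destruction safe.


async def demo() -> int:
    coordinator = Coordinator()
    coordinator.on_schema_change()  # Notification arrives just before stop().
    await coordinator.stop()
    coordinator.on_schema_change()  # Ignored: listener already unregistered.
    return coordinator.refreshes_completed
```

Running `asyncio.run(demo())` completes exactly one refresh. Without the `gather` in `stop()`, the refresh task would resume after `alive` is cleared and trip the assertion, which is the Python analogue of touching freed memory.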
Avi Kivity	2615d0e8d8	test/cluster/test_incremental_repair: add retry for residual leadership race There is a small race window where Raft leadership could transfer back to servers[1] between the ensure_group0_leader_on() check and the actual restart. If this happens, the new coordinator re-initiates repair and masks the compaction-merge bug. Extract the core test logic into _do_race_window_promotes_unrepaired_data() which directly checks get_topology_coordinator() after restart and raises _LeadershipTransferred if servers[1] became coordinator. The test function calls this helper in a retry loop (up to 5 attempts). Refs: SCYLLADB-1478	2026-04-27 21:11:06 +03:00
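The retry structure described in this commit can be sketched as below. The names are illustrative (`LeadershipTransferred`, `run_with_retries`, `flaky_scenario` are not the test's actual identifiers), and re-raising after the last attempt is an assumption, since the commit message does not say what happens when all attempts are exhausted.

```python
class LeadershipTransferred(Exception):
    """Raised when the restarted server turned out to be the coordinator."""


def run_with_retries(scenario, max_attempts: int = 5):
    """Call scenario(attempt) until it completes without a leadership transfer."""
    for attempt in range(1, max_attempts + 1):
        try:
            return scenario(attempt)
        except LeadershipTransferred:
            if attempt == max_attempts:
                raise  # Assumption: give up loudly after the last attempt.


def flaky_scenario(attempt: int) -> str:
    # Hypothetical: leadership moves back on the first two attempts only.
    if attempt <= 2:
        raise LeadershipTransferred(f"attempt {attempt}")
    return f"passed on attempt {attempt}"
```

`run_with_retries(flaky_scenario)` retries twice and then returns the result of the third attempt; any other exception type propagates immediately, so genuine test failures are not retried.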
Avi Kivity	914b70c75b	test/cluster/test_incremental_repair: fix flaky coordinator-change scenario The test_incremental_repair_race_window_promotes_unrepaired_data test was flaky because it hardcoded servers[1] as the restart target but did not ensure servers[1] was NOT the topology coordinator. When servers[1] happened to be the Raft group0 leader (topology coordinator), restarting it killed the leader, forced a new election, and the new coordinator re-initiated tablet repair. This re-repair flushes memtables on all replicas via take_storage_snapshot() and marks the resulting sstables as repaired -- causing post-repair keys to appear in repaired sstables on servers[0] and servers[2]. The test then hit the wrong assertion (servers[0]/[2] contaminated). Fix: before starting the repair, check whether servers[1] is the topology coordinator. If so, move leadership to another server via ensure_group0_leader_on() so that restarting servers[1] only kills a follower -- which does not trigger an election or coordinator change. Reproducibility was confirmed by forcing leadership to servers[1] via ensure_group0_leader_on() and observing deterministic failure with all three servers showing post-repair keys in repaired sstables (confirming the re-repair scenario), then verifying the fix passes reliably. Fixes: SCYLLADB-1478	2026-04-27 21:08:12 +03:00
Aleksandra Martyniuk	6b7ce5e244	test: fix flaky rack list conversion tests by using read barrier test_numeric_rf_to_rack_list_conversion and test_numeric_rf_to_rack_list_conversion_abort were reading system_schema.keyspaces from an arbitrary node that may not have applied the latest schema change yet. Pin the read to a specific node and issue a read barrier before querying, ensuring the node has up-to-date data.	2026-04-27 15:19:09 +02:00
Aleksandra Martyniuk	9d3d424d58	test: fix flaky test_enforce_rack_list_option by using read barrier The test was reading system_schema.keyspaces from an arbitrary node that may not have applied the latest schema change yet. Pin the read to a specific node and issue a read barrier before querying, ensuring the node has up-to-date data.	2026-04-27 14:44:38 +02:00


53948 Commits