scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-26 03:20:37 +00:00

Author	SHA1	Message	Date
Ernest Zaslavsky	116a2f43ee	sstables_loader: prevent use-after-free on table drop during streaming sstables_loader::load_and_stream holds a replica::table& reference via the sstable_streamer for the entire streaming operation. If the table is dropped concurrently (e.g. DROP TABLE or DROP KEYSPACE), the reference becomes dangling and the next access crashes with SEGV. This was observed in a longevity-50gb-12h-master test run where a keyspace was dropped while load_and_stream was still streaming SSTables from a previous batch. Fix by acquiring a stream_in_progress() phaser guard in load_and_stream before creating the streamer. table::stop() calls _pending_streams_phaser.close() which blocks until all outstanding guards are released, keeping the table alive for the duration of the streaming operation. Fixes: SCYLLADB-1639 Closes scylladb/scylladb#29403 (cherry picked from commit `e5e6608f20`) Closes scylladb/scylladb#29558 Closes scylladb/scylladb#29600	2026-04-24 10:33:51 +02:00
Piotr Dulikowski	04d8663052	Merge 'cql3: pin prepared cache entry in prepare() to avoid invalid weak handle race' from Alex Dathskovsky query_processor::prepare() could race with prepared statement invalidation: after loading from the prepared cache, we converted the cached object to a checked weak pointer and then continued asynchronous work (including error-injection waitpoints). If invalidation happened in that window, the weak handle could no longer be promoted and the prepare path could fail nondeterministically. This change keeps a strong cache entry reference alive across the whole critical section in prepare() by using a pinned cache accessor (get_pinned()), and only deriving the weak handle while the entry is pinned. This removes the lifetime gap without adding retry loops. Test coverage was extended in test/cluster/test_prepare_race.py: - reproduces the invalidation-during-prepare window with injection, - verifies prepare completes successfully, - then invalidates again and executes the same stale client prepared object, - confirms the driver transparently re-requests/re-prepares and execution succeeds. This change introduces: - no behavior change for normal prepare flow besides stronger lifetime guarantees, - no new protocol semantics, - preserves existing cache invalidation logic, - adds explicit cluster-level regression coverage for both the race and driver reprepare path. - pushes the re-prepare operation towards the driver: the server will return an unprepared error the first time, and the driver will have to re-prepare during the execution stage Fixes: https://github.com/scylladb/scylladb/issues/27657 Backport to active branches recommended: No node crash, but user-visible PREPARE failures under rare schema-invalidation race; low-risk timeout-bounded retry improves robustness. 
Closes scylladb/scylladb#28952 * github.com:scylladb/scylladb: transport/messages: hold pinned prepared entry in PREPARE result cql3: pin prepared cache entry in prepare() to avoid invalid weak handle race (cherry picked from commit `d9a277453e`) Closes scylladb/scylladb#29001 Closes scylladb/scylladb#29195	2026-04-20 12:59:53 +02:00
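
The lifetime gap described in the commit above can be illustrated with a minimal sketch. It uses Python's `weakref` as a stand-in for the checked weak pointer; the `Prepared` class, the `cache` dict and the `prepare` helper are hypothetical and are not ScyllaDB code. The result relies on CPython's reference counting, which frees the entry as soon as the last strong reference goes away.

```python
import weakref


class Prepared:
    """Hypothetical stand-in for a cached prepared statement."""

    def __init__(self, query: str) -> None:
        self.query = query


cache: dict[str, Prepared] = {}  # hypothetical prepared-statement cache


def prepare(query: str, pin: bool) -> bool:
    """Return True if the weak handle is still promotable after invalidation."""
    cache[query] = Prepared(query)
    # With pin=True a strong reference is held across the critical section.
    pinned = cache[query] if pin else None
    handle = weakref.ref(cache[query])
    cache.pop(query, None)  # simulated concurrent invalidation
    alive = handle() is not None
    del pinned  # the pin is released only after the handle was used
    return alive


unpinned_alive = prepare("SELECT 1", pin=False)
pinned_alive = prepare("SELECT 1", pin=True)
print(unpinned_alive, pinned_alive)
```

Without the pin, the handle cannot be promoted once the entry is invalidated; with the pin, the entry outlives the invalidation until the critical section ends.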
Botond Dénes	41e2c2d1c4	Merge 'tasks: do not fail the wait request if rpc fails' from Aleksandra Martyniuk During decommission, we first mark a topology request as done, then shut down a node and in the following steps we remove node from the topology. Thus, finished request does not imply that a node is removed from the topology. Due to that, in node_ops_virtual_task::wait, while gathering children from the whole cluster, we may hit the connection exception - because a node is still in topology, even though it is down. Modify the get_children method to ignore the exception and warn about the failure instead. Keep token_metadata_ptr in get_children to prevent topology from changing. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-867 Needs backports to all versions Closes scylladb/scylladb#29035 * github.com:scylladb/scylladb: tasks: fix indentation tasks: do not fail the wait request if rpc fails tasks: pass token_metadata_ptr to task_manager::virtual_task::impl::get_children (cherry picked from commit `2e47fd9f56`) Closes scylladb/scylladb#29193	2026-04-16 21:57:08 +03:00
Michał Chojnowski	18fc2eff31	test: add a missing reconnect_driver in test_sstable_compression_dictionaries_upgrade.py Need to work around https://github.com/scylladb/python-driver/issues/295, lest a CQL query fail spuriously after the cluster restart. Fixes: SCYLLADB-1114 Closes scylladb/scylladb#29118 (cherry picked from commit `6b18d95dec`) Closes scylladb/scylladb#29146 Closes scylladb/scylladb#29366	2026-04-16 10:56:59 +03:00
Patryk Jędrzejczak	a2c23793ab	raft_group0: join_group0: fix join hang when node joins group 0 before post_server_start A joining node hung forever if the topology coordinator added it to the group 0 configuration before the node reached `post_server_start`. In that case, `server->get_configuration().contains(my_id)` returned true and the node broke out of the join loop early, skipping `post_server_start`. `_join_node_group0_started` was therefore never set, so the node's `join_node_response` RPC handler blocked indefinitely. Meanwhile the topology coordinator's `respond_to_joining_node` call (which has no timeout) hung forever waiting for the reply that never came. Fix by only taking the early-break path when not starting as a follower (i.e. when the node is the discovery leader or is restarting). A joining node must always reach `post_server_start`. We also provide a regression test. It takes 6s in dev mode. Fixes SCYLLADB-959 Closes scylladb/scylladb#29266 (cherry picked from commit `b9f82f6f23`) Closes scylladb/scylladb#29291 Closes scylladb/scylladb#29308	2026-04-09 15:53:43 +02:00
Andrzej Jackowski	2b58d396e7	test: use exclusive driver connection in test_limited_concurrency_of_writes Use get_cql_exclusive(node1) so the driver only connects to node1 and never attempts to contact the stopped node2. The test was flaky because the driver received `Host has been marked down or removed` from node2. Fixes: SCYLLADB-1227 Closes scylladb/scylladb#29268 (cherry picked from commit `ab43420d30`) Closes scylladb/scylladb#29278 Closes scylladb/scylladb#29355	2026-04-07 14:22:48 +03:00
Botond Dénes	0aa03677b5	test/cluster: fix flaky test_cleanup_stop by using asyncio.sleep The test was using time.sleep(1) (a blocking call) to wait after scheduling the stop_compaction task, intending to let it register on the server before releasing the sstable_cleanup_wait injection point. However, time.sleep() blocks the asyncio event loop entirely, so the asyncio.create_task(stop_compaction) task never gets to run during the sleep. After the sleep, the directly-awaited message_injection() runs first, releasing the injection point before stop_compaction is even sent. By the time stop_compaction reaches Scylla, the cleanup has already completed successfully -- no exception is raised and the test fails. Fix by replacing time.sleep(1) with await asyncio.sleep(1), which yields control to the event loop and allows the stop_compaction task to actually send its HTTP request before message_injection is called. Fixes: SCYLLADB-834 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#29202 (cherry picked from commit `068a7894aa`) Closes scylladb/scylladb#29277 Closes scylladb/scylladb#29356	2026-04-07 14:22:27 +03:00
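
The ordering problem described in the commit above can be reproduced with the standard library alone. In this minimal sketch, `stop_compaction` and the "injection released" step are placeholders for the test's real calls, and the 0.05 s delay is arbitrary.

```python
import asyncio
import time


async def run(blocking: bool) -> list[str]:
    """Return the order in which the two steps happen."""
    events: list[str] = []

    async def stop_compaction() -> None:
        # Placeholder for the background task that must be sent first.
        events.append("stop_compaction sent")

    task = asyncio.create_task(stop_compaction())
    if blocking:
        time.sleep(0.05)           # blocks the event loop; the task cannot start
    else:
        await asyncio.sleep(0.05)  # yields to the loop; the task runs meanwhile
    events.append("injection released")
    await task
    return events


blocking_order = asyncio.run(run(blocking=True))
yielding_order = asyncio.run(run(blocking=False))
print(blocking_order)
print(yielding_order)
```

With `time.sleep` the scheduled task only gets to run after the injection has been released; with `await asyncio.sleep` it runs during the sleep, which is the order the test needs.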
Avi Kivity	d4c28ee317	Merge 'service_levels: mark v2 migration complete on empty legacy table' from Alex Dathskovsky During raft-topology upgrade in 2026.1, service_level_controller::migrate_to_v2() returns early when system_distributed.service_levels is empty. This skips the service_level_version = 2 write, so the cluster is never marked as upgraded to service levels v2 even though there is no data to migrate. Subsequent upgrades may then fail the startup check which requires service_level_version == 2. Remove the early return and let the migration commit the version marker even when there are no legacy service levels rows to copy. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1198 backport: should be backported to all versions that can be upgraded to 2026.2 Closes scylladb/scylladb#29333 * github.com:scylladb/scylladb: test/auth_cluster: cover empty legacy table in service level upgrade service_levels: mark v2 migration complete on empty legacy table (cherry picked from commit `95e422db48`) Closes scylladb/scylladb#29352	2026-04-06 17:51:34 +03:00
Patryk Jędrzejczak	be942e9a4f	test: test_remove_garbage_group0_members: wait for token ring and group0 consistency before removenode The removenode initiator could have an outdated token ring (still considering the node removed by the previous removenode a token owner) and unexpectedly reject the operation. Fix that by waiting for token ring and group0 consistency before removenode. Note that the test already checks that consistency, but only for one node, which is different from the removenode initiator. This test has been removed in master together with the code being tested (the gossip-based topology). Hence, the fix is submitted directly to 2026.1. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1103 Backport to all supported branches (other than 2026.1), as the test can fail there. Closes scylladb/scylladb#29108 (cherry picked from commit `1398a55d16`) Closes scylladb/scylladb#29205	2026-03-24 16:09:02 +01:00
Patryk Jędrzejczak	3863dfbc0a	test: test_raft_no_quorum: decrease group0_raft_op_timeout_in_ms after quorum loss `test_raft_no_quorum.py::test_cannot_add_new_node` is currently flaky in dev mode. The bootstrap of the first node can fail due to `add_entry()` timing out (with the 1s timeout set by the test case). Other test cases in this test file could fail in the same way as well, so we need a general fix. We don't want to increase the timeout in dev mode, as it would slow down the test. The solution is to keep the timeout unchanged, but set it only after quorum is lost. This prevents unexpected timeouts of group0 operations with almost no impact on the test running time. A note about the new `update_group0_raft_op_timeout` function: waiting for the log seems to be necessary only for `test_quorum_lost_during_node_join_response_handler`, but let's do it for all test cases just in case (including `test_can_restart` that shouldn't be flaky currently). Fixes https://scylladb.atlassian.net/browse/SCYLLADB-913 Closes scylladb/scylladb#28998 (cherry picked from commit `526e5986fe`) Closes scylladb/scylladb#29068 Closes scylladb/scylladb#29097	2026-03-18 10:15:34 +01:00
Patryk Jędrzejczak	9152a8d111	test: test_full_shutdown_during_replace: retry replace after the replacing node is removed from gossip The test is currently flaky with `reuse_ip = True`. The issue is that the test retries replace before the first replace is rolled back and the first replacing node is removed from gossip. The second replacing node can see the entry of the first replacing node in gossip. This entry has a newer generation than the entry of the node being replaced, and both replacing nodes have the same IP as the node being replaced. Therefore, the second replacing node incorrectly considers this entry as the entry of the node being replaced. This entry is missing rack and DC, so the second replace fails with ``` ERROR 2026-02-24 21:19:03,420 [shard 0:main] init - Startup failed: std::runtime_error (Cannot replace node 8762a9d2-3b30-4e66-83a1-98d16c5dd007/127.61.127.1 with a node on a different data center or rack. Current location=UNKNOWN_DC/UNKNOWN_RACK, new location=dc1/rack2) ``` Fixes SCYLLADB-805 Closes scylladb/scylladb#28829 (cherry picked from commit `ba7f314cdc`) Closes scylladb/scylladb#28953	2026-03-10 16:48:05 +01:00
Łukasz Paszkowski	4f5d10ccd0	compaction_manager: fix maybe_wait_for_sstable_count_reduction() hanging forever The futurization refactoring in `9d3755f276` ("replica: Futurize retrieval of sstable sets in compaction_group_view") changed maybe_wait_for_sstable_count_reduction() from a single predicated wait: ``` co_await cstate.compaction_done.wait([..] { return num_runs_for_compaction() <= threshold || !can_perform_regular_compaction(t); }); ``` to a while loop with a predicated wait: ``` while (can_perform_regular_compaction(t) && co_await num_runs_for_compaction() > threshold) { co_await cstate.compaction_done.wait([this, &t] { return !can_perform_regular_compaction(t); }); } ``` This was necessary because num_runs_for_compaction() became a coroutine (returns future<size_t>) and can no longer be called inside a condition_variable predicate (which must be synchronous). However, the inner wait's predicate — !can_perform_regular_compaction(t) — only returns true when compaction is disabled or the table is being removed. During normal operation, every signal() from compaction_done wakes the waiter, the predicate returns false, and the waiter immediately goes back to sleep without ever re-checking the outer while loop's num_runs_for_compaction() condition. This causes memtable flushes to hang forever in maybe_wait_for_sstable_count_reduction() whenever the sstable run count exceeds the threshold, because completed compactions signal compaction_done but the signal is swallowed by the predicate. Fix by replacing the predicated wait with a bare wait(), so that any signal (including from completed compactions) causes the outer while loop to re-evaluate num_runs_for_compaction(). Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-610 Closes scylladb/scylladb#28801 (cherry picked from commit `bb57b0f3b7`)	2026-02-27 01:39:05 +02:00
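
The swallowed-signal pattern from the commit above can be sketched with `asyncio.Condition`, used here as a rough analogue of Seastar's condition variable. The `State` class, the run counts and the timeout are illustrative assumptions rather than ScyllaDB code.

```python
import asyncio


class State:
    def __init__(self) -> None:
        self.runs = 3            # hypothetical sstable run count
        self.can_compact = True  # stand-in for can_perform_regular_compaction()
        self.done = asyncio.Condition()  # stand-in for compaction_done


async def waiter(st: State, threshold: int, bare_wait: bool) -> None:
    async with st.done:
        while st.can_compact and st.runs > threshold:
            if bare_wait:
                # Fixed shape: any signal re-evaluates the outer loop.
                await st.done.wait()
            else:
                # Buggy shape: only wakes once compaction is disabled.
                await st.done.wait_for(lambda: not st.can_compact)


async def compactor(st: State) -> None:
    # Each finished "compaction" lowers the run count and signals waiters.
    while st.runs > 0:
        await asyncio.sleep(0.01)
        async with st.done:
            st.runs -= 1
            st.done.notify_all()


async def finishes(bare_wait: bool) -> bool:
    st = State()
    comp = asyncio.create_task(compactor(st))
    try:
        await asyncio.wait_for(waiter(st, 1, bare_wait), timeout=0.5)
        return True
    except asyncio.TimeoutError:
        return False  # the waiter never left the inner wait
    finally:
        await comp


buggy_finishes = asyncio.run(finishes(bare_wait=False))
fixed_finishes = asyncio.run(finishes(bare_wait=True))
print(buggy_finishes, fixed_finishes)
```

In the buggy shape every notification is absorbed by the inner predicate, so the waiter times out even though the run count has dropped below the threshold; the bare wait lets the outer loop observe the new count.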
Andrzej Jackowski	1c9d3e14a3	test: fix configuration of test_autoretrain_dict `test_autoretrain_dict` sporadically fails because the default compression algorithm was changed after the test was written. `9ffa62a986815709d0a09c705d2d0caf64776249` was an attempt to fix it by changing the compression configuration during node startup. However, the configuration change had an incorrect YAML format and was ignored by ScyllaDB. This commit fixes it. Fixes: scylladb/scylladb#28204 Closes scylladb/scylladb#28746 (cherry picked from commit `cd4caed3d3`) Closes scylladb/scylladb#28792	2026-02-26 09:55:57 +02:00
Marcin Maliszkiewicz	1f1fc2c2ac	test: decrease strain in test_startup_response For 2025.3 and 2025.4 this test runs an order of magnitude slower in debug mode, potentially due to passwords::check running in an alien thread and overwhelming the CPU (this is fixed in newer versions). Decreasing the number of connections in the test makes it fast again, without breaking reproducibility. As an additional measure we double the timeout.	2026-02-20 10:13:55 +01:00
Marcin Maliszkiewicz	b7b7fef02c	test: auth_cluster: add test for hanged AUTHENTICATING connections Test runtime: Release - 2s Debug - 5s (cherry picked from commit `3b98451`)	2026-02-19 16:24:03 +01:00
Michael Litvak	790b0d5627	migration_listener: fix deadlock in nested notifications When calling a migration notification from the context of a notification callback, this could lead to a deadlock with unregistering a listener: A: the parent notification is called. it calls thread_for_each, where it acquires a read lock on the vector of listeners, and calls the callback function for each listener while holding the lock. B: a listener is unregistered. it calls `remove` and tries to acquire a write lock on the vector of listeners. it waits because the lock is held. A: the callback function calls another notification and calls thread_for_each which tries to acquire the read lock again. but it waits since there is a waiter. Currently we have such concrete scenario when creating a table, where the callback of `before_create_column_family` in the tablet allocator calls `before_allocate_tablet_map`, and this could deadlock with node shutdown where we unregister listeners. Fix this by not acquiring the read lock again in the nested notification. There is no need because the read lock is already held by the parent notification while the child notification is running. We add a function `thread_for_each_nested` that is similar to `thread_for_each` except it assumes the read lock is already held and doesn't acquire it, and it should be used for nested notifications instead of `thread_for_each`. Fixes scylladb/scylladb#27364 Closes scylladb/scylladb#27637 (cherry picked from commit `55f4a2b754`) Closes scylladb/scylladb#28557	2026-02-18 12:47:30 +02:00
Botond Dénes	4e9c84321b	Merge '[Backport 2025.4] test: cluster: Fix test_sync_point' from Scylladb[bot] The test `test_sync_point` had a few shortcomings that made it flaky or simply wrong: 1. We were verifying that hints were written by checking the size of in-flight hints. However, that could potentially lead to problems in rare situations. For instance, if all of the hints failed to be written to disk, the size of in-flight hints would drop to zero, but creating a sync point would correspond to the empty state. In such a situation, we should fail immediately and indicate what the cause was. 2. A sync point corresponds to the hints that have already been written to disk. The number of those is tracked by the metric `written`. It's a much more reliable way to make sure that hints have been written to the commitlog. That ensures that the sync point we'll create will really correspond to those hints. 3. The auxiliary function `wait_for` used in the test works like this: it executes the passed callback and looks at the result. If it's `None`, it retries it. Otherwise, the callback is deemed to have finished its execution and no further retries will be attempted. Before this commit, we simply returned a bool, and so the code was wrong. We improve it. --- Note that this fixes scylladb/scylladb#28203, which was a manifestation of scylladb/scylladb#25879. We created a sync point that corresponded to the empty state, and so it immediately resolved, even when node 3 was still dead. As a bonus, we rewrite the auxiliary code responsible for fetching metrics and manipulating sync points. Now it's asynchronous and uses the existing standard mechanisms available to developers. Furthermore, we reduce the time needed for executing `test_sync_point` by 27 seconds. 
--- The total difference in time needed to execute the whole test file (on my local machine, in dev mode): Before: CPU utilization: 0.9% real 2m7.811s user 0m25.446s sys 0m16.733s After: CPU utilization: 1.1% real 1m40.288s user 0m25.218s sys 0m16.566s --- Refs scylladb/scylladb#25879 Fixes scylladb/scylladb#28203 Backport: This improves the stability of our CI, so let's backport it to all supported versions. - (cherry picked from commit `628e74f157`) - (cherry picked from commit `ac4af5f461`) - (cherry picked from commit `c5239edf2a`) - (cherry picked from commit `a256ba7de0`) - (cherry picked from commit `f83f911bae`) Parent PR: #28602 Closes scylladb/scylladb#28622 * github.com:scylladb/scylladb: test: cluster: Reduce wait time in test_sync_point test: cluster: Fix test_sync_point test: cluster: Await sync points asynchronously test: cluster: Create sync points asynchronously test: cluster: Fetch hint metrics asynchronously	2026-02-18 12:46:35 +02:00
Patryk Jędrzejczak	63abb3e6cd	test: test_restart_leaving_replica_during_cleanup: reconnect driver after restart The test can currently fail like this: ``` > await cql.run_async(f"ALTER TABLE {ks}.test WITH tablets = {{'min_tablet_count': 1}}") E cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 127.158.27.9:9042 datacenter1>: <Error from server: code=0000 [Server error] message="Failed to apply group 0 change due to concurrent modification">}) ``` The following happens: - node A is restarted and becomes the group0 leader, - the driver sends the ALTER TABLE request to node B, - the request hits group 0 concurrent modification error 10 times and fails because node A performs tablet migrations at the same time. What is unexpected is that even though the driver session uses the default retry policy, the driver doesn't retry the request on node A. The request is guaranteed to succeed on node A because it's the only node adding group0 entries. The driver doesn't retry the request on node A because of a missing `wait_for_cql_and_get_hosts` call. We add it in this commit. We also reconnect the driver just in case to prevent hitting scylladb/python-driver#295. Moreover, we can revert the workaround from `4c9efc08d8`, as the fix from this commit also prevents DROP KEYSPACE failures. The commit has been tested in byo with `_concurrent_ddl_retries{0}` to verify that node A really can't hit group 0 concurrent modification error and always receives the ALTER TABLE request from the driver. All 300 runs in each build mode passed. Fixes #25938 Closes scylladb/scylladb#28632 (cherry picked from commit `0693091aff`) Closes scylladb/scylladb#28672	2026-02-18 12:43:31 +02:00
Andrzej Jackowski	c46ae2c2ab	test: explicitly set compression algorithm in test_autoretrain_dict When `test_autoretrain_dict` was originally written, the default `sstable_compression_user_table_options` was `LZ4Compressor`. The test assumed (correctly) that initially the compression doesn't use a trained dictionary, and later in the test scenario, it changed the algorithm to one with a dictionary. However, the default `sstable_compression_user_table_options` is now `LZ4WithDictsCompressor`, so the old assumption is no longer correct. As a result, the assertion that data is initially not compressed well may or may not fail depending on dictionary training timing. To fix this, this commit explicitly sets `ZstdCompressor` as the initial `sstable_compression_user_table_options`, ensuring that the assumption that initial compression is without a dictionary is always met. Note: `ZstdCompressor` differs from the former default `LZ4Compressor`. However, it's a better choice — the test aims to show the benefit of using a dictionary, not the benefit of Zstd over LZ4 (and the test uses ZstdWithDictsCompressor as the algorithm with the dictionary). Fixes: scylladb/scylladb#28204 (cherry picked from commit `9ffa62a986`)	2026-02-16 16:22:58 +00:00
Andrzej Jackowski	91bf817955	test: remove unneeded semicolons from python test (cherry picked from commit `e63cfc38b3`)	2026-02-16 16:22:58 +00:00
Dawid Mędrek	3e7602254a	test: cluster: Reduce wait time in test_sync_point If everything is OK, the sync point will not resolve with node 3 dead. As a result, the waiting will use all of the time we allocate for it, i.e. 30 seconds. That's a lot of time. There's no easy way to verify that the sync point will NOT resolve, but let's at least reduce the waiting to 3 seconds. If there's a bug, it should be enough to trigger it at some point, while reducing the average time needed for CI. (cherry picked from commit `f83f911bae`)	2026-02-12 12:12:43 +00:00
Dawid Mędrek	2334f297f2	test: cluster: Fix test_sync_point The test had a few shortcomings that made it flaky or simply wrong: 1. We were verifying that hints were written by checking the size of in-flight hints. However, that could potentially lead to problems in rare situations. For instance, if all of the hints failed to be written to disk, the size of in-flight hints would drop to zero, but creating a sync point would correspond to the empty state. In such a situation, we should fail immediately and indicate what the cause was. 2. A sync point corresponds to the hints that have already been written to disk. The number of those is tracked by the metric `written`. It's a much more reliable way to make sure that hints have been written to the commitlog. That ensures that the sync point we'll create will really correspond to those hints. 3. The auxiliary function `wait_for` used in the test works like this: it executes the passed callback and looks at the result. If it's `None`, it retries it. Otherwise, the callback is deemed to have finished its execution and no further retries will be attempted. Before this commit, we simply returned a bool, and so the code was wrong. We improve it. Note that this fixes scylladb/scylladb#28203, which was a manifestation of scylladb/scylladb#25879. We created a sync point that corresponded to the empty state, and so it immediately resolved, even when node 3 was still dead. Refs scylladb/scylladb#25879 Fixes scylladb/scylladb#28203 (cherry picked from commit `a256ba7de0`)	2026-02-12 12:12:43 +00:00
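
Point 3 in the commit above comes down to the difference between `False` and `None`. The following is a minimal sketch of a retry helper with the described contract; this `wait_for` is a simplified, hypothetical reimplementation and not the test suite's actual helper, and `written` is a local variable standing in for the metric.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def wait_for(pred: Callable[[], Awaitable[Optional[T]]],
                   deadline: float, period: float = 0.01) -> T:
    """Retry pred until it returns something other than None."""
    while True:
        result = await pred()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met before deadline")
        await asyncio.sleep(period)


async def demo() -> tuple[bool, bool]:
    written = 0  # stand-in for the `written` hints metric

    async def bump() -> None:
        nonlocal written
        await asyncio.sleep(0.03)
        written = 5

    async def wrong() -> bool:
        # False is not None, so wait_for returns on the first call.
        return written > 0

    async def right() -> Optional[bool]:
        # None asks wait_for to retry until the metric has moved.
        return True if written > 0 else None

    task = asyncio.create_task(bump())
    early = await wait_for(wrong, time.monotonic() + 1)
    late = await wait_for(right, time.monotonic() + 1)
    await task
    return early, late


early, late = asyncio.run(demo())
print(early, late)
```

The bool-returning callback ends the wait immediately with `False`, so nothing is actually waited for; the callback that returns `None` keeps the loop going until the condition holds.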
Dawid Mędrek	ebf8281b66	test: cluster: Await sync points asynchronously There's a dedicated HTTP API for communicating with the cluster, so let's use it instead of yet another custom solution. (cherry picked from commit `c5239edf2a`)	2026-02-12 12:12:43 +00:00
Dawid Mędrek	e9ae597d35	test: cluster: Create sync points asynchronously There's a dedicated HTTP API for communicating with the nodes, so let's use it instead of yet another custom solution. (cherry picked from commit `ac4af5f461`)	2026-02-12 12:12:43 +00:00
Dawid Mędrek	63d001e141	test: cluster: Fetch hint metrics asynchronously There's a dedicated API for fetching metrics now. Let's use it instead of developing yet another solution that's also worse. (cherry picked from commit `628e74f157`)	2026-02-12 12:12:43 +00:00
Patryk Jędrzejczak	0e78bec6e8	Merge '[Backport 2025.4] storage_service: set up topology properly in maintenance mode' from Scylladb[bot] We currently make the local node the only token owner (that owns the whole ring) in maintenance mode, but we don't update the topology properly. The node is present in the topology, but in the `none` state. That's how it's inserted by `tm.get_topology().set_host_id_cfg(host_id);` in `scylla_main`. As a result, the node started in maintenance mode crashes in the following way in the presence of a vnodes-based keyspace with the NetworkTopologyStrategy: ``` scylla: locator/network_topology_strategy.cc:207: locator::natural_endpoints_tracker::natural_endpoints_tracker( const token_metadata &, const network_topology_strategy::dc_rep_factor_map &): Assertion `!_token_owners.empty() && !_racks.empty()' failed. ``` Both `_token_owners` and `_racks` are empty. The reason is that `_tm.get_datacenter_token_owners()` and `_tm.get_datacenter_racks_token_owners()` called above filter out nodes in the `none` state. This bug basically made maintenance mode unusable in customer clusters. We fix it by changing the node state to `normal`. We also extend `test_maintenance_mode` to provide a reproducer for #27988. Fixes #27988 This PR must be backported to all branches, as maintenance mode is currently unusable everywhere. 
- (cherry picked from commit `a08c53ae4b`) - (cherry picked from commit `9d4a5ade08`) - (cherry picked from commit `c92962ca45`) - (cherry picked from commit `408c6ea3ee`) - (cherry picked from commit `53f58b85b7`) - (cherry picked from commit `867a1ca346`) - (cherry picked from commit `6c547e1692`) - (cherry picked from commit `7e7b9977c5`) Parent PR: #28322 Closes scylladb/scylladb#28498 * https://github.com/scylladb/scylladb: test: test_maintenance_mode: enable maintenance mode properly test: test_maintenance_mode: shutdown cluster connections test: test_maintenance_mode: run with different keyspace options test: test_maintenance_mode: check that group0 is disabled by creating a keyspace test: test_maintenance_mode: get rid of the conditional skip test: test_maintenance_mode: remove the redundant value from the query result storage_proxy: skip validate_read_replica in maintenance mode storage_service: set up topology properly in maintenance mode	2026-02-04 16:49:02 +01:00
Tomasz Grabiec	3243952c47	Merge '[Backport 2025.4] load_stats: fix problem with load_stats refresh throwing no_such_column_family' from Scylladb[bot] When the topology coordinator refreshes load_stats, it caches load_stats for every node. In case the node becomes unresponsive, and fresh load_stats can not be read from the node, the cached version of load_stats will be used. This is to allow the load balancer to have at least some information about the table sizes and disk capacities of the host. During load_stats refresh, we aggregate the table sizes from all the nodes. This procedure calls db.find_column_family() for each table_id found in load_stats. This function will throw if the table is not found. This will cause load_stats refresh to fail. It is also possible for a table to have been dropped between the time load_stats has been prepared on the host, and the time it is processed on the topology coordinator. This would also cause an exception in the refresh procedure. This fixes this problem by checking if the table still exists. Fixes: #28359 - (cherry picked from commit `71be10b8d6`) - (cherry picked from commit `92dbde54a5`) Parent PR: #28440 Closes scylladb/scylladb#28470 * github.com:scylladb/scylladb: test: add test and reproducer for load_stats refresh exception load_stats: handle dropped tables when refreshing load_stats	2026-02-03 11:33:57 +01:00
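
The fix described in the merge above amounts to checking that a table still exists before aggregating its size. The sketch below uses plain dictionaries; `aggregate_table_sizes`, the node names and the table IDs are hypothetical and do not reflect ScyllaDB's actual data structures.

```python
def aggregate_table_sizes(per_node_stats: dict[str, dict[str, int]],
                          existing_tables: set[str]) -> dict[str, int]:
    """Sum per-table sizes across nodes, ignoring tables that no longer exist."""
    totals: dict[str, int] = {}
    for sizes in per_node_stats.values():
        for table_id, size in sizes.items():
            if table_id not in existing_tables:
                # Dropped after the stats were produced (or cached): skip it
                # instead of raising and failing the whole refresh.
                continue
            totals[table_id] = totals.get(table_id, 0) + size
    return totals


stats = {
    "node1": {"t1": 100, "t2": 40},
    "node2": {"t1": 50, "t2": 60},  # e.g. cached stats that still list t2
}
totals = aggregate_table_sizes(stats, existing_tables={"t1"})
print(totals)
```

Skipping unknown table IDs keeps the refresh usable for the remaining tables, whether the stale entry came from a cached snapshot or from a drop that raced with the refresh.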
Patryk Jędrzejczak	dcffababbe	test: test_maintenance_mode: enable maintenance mode properly The same issue as the one fixed in `394207fd69`. This one didn't cause real problems, but it's still cleaner to fix it. (cherry picked from commit `7e7b9977c5`)	2026-02-03 11:33:53 +01:00
Patryk Jędrzejczak	4be6788083	test: test_maintenance_mode: shutdown cluster connections Leaked connections are known to cause inter-test issues. (cherry picked from commit `6c547e1692`)	2026-02-03 11:33:52 +01:00
Patryk Jędrzejczak	043287ac77	test: test_maintenance_mode: run with different keyspace options We extend the test to provide a reproducer for #27988 and to avoid similar bugs in the future. The test slows down from ~14s to ~19s on my local machine in dev mode. It seems reasonable. (cherry picked from commit `867a1ca346`)	2026-02-03 11:33:52 +01:00
Patryk Jędrzejczak	2b984f07cf	test: test_maintenance_mode: check that group0 is disabled by creating a keyspace In the following commit, we make the test run with multiple keyspaces, and the old check becomes inconvenient. We also move it below to the part of the code that won't be executed for each keyspace. Additionally, we check if the error message is as expected. (cherry picked from commit `53f58b85b7`)	2026-02-03 11:33:52 +01:00
Patryk Jędrzejczak	59aee20992	test: test_maintenance_mode: get rid of the conditional skip This skip has already caused trouble. After `0668c642a2`, the skip was always hit, and the test was silently doing nothing. This made us miss #26816 for a long time. The test was fixed in `222eab45f8`, but we should get rid of the skip anyway. We increase the number of writes from 256 to 1000 to make the chance of not finding the key on server A even lower. If that still happens, it must be due to a bug, so we fail the test. We also make the test insert rows until server A is a replica of one row. The expected number of inserted rows is a small constant, so it should, in theory, make the test faster and cleaner (we need one row on server A, so we insert exactly one such row). It's possible to make the test fully deterministic, by e.g., hardcoding the key and tokens of all nodes via `initial_token`, but I'm afraid it would make the test "too deterministic" and could hide a bug. (cherry picked from commit `408c6ea3ee`)	2026-02-03 11:33:52 +01:00
Patryk Jędrzejczak	7f133e72af	test: test_maintenance_mode: remove the redundant value from the query result (cherry picked from commit `c92962ca45`)	2026-02-03 11:33:52 +01:00
Tomasz Grabiec	ab4a4ad03f	test: Verify that repair doesn't block disabling of tablet load balancing Refs #27647 (cherry picked from commit `ffa11d6a2d`)	2026-02-02 21:26:18 +01:00
Ferenc Szili	22e0bafaa7	test: add test and reproducer for load_stats refresh exception This patch adds a test and reproducer for the issue where the load_stats refresh procedure throws exceptions if any of the tables have been dropped since load_stats was produced. (cherry picked from commit `92dbde54a5`)	2026-02-02 16:35:08 +01:00
Botond Dénes	3573535167	Merge '[Backport 2025.4] schema: Apply `sstable_compression_user_table_options` to CQL aux and Alternator tables' from Scylladb[bot] In PR `5b6570be52` we introduced the config option `sstable_compression_user_table_options` to allow adjusting the default compression settings for user tables. However, the new option was hooked into the CQL layer and applied only to CQL base tables, not to the whole spectrum of user tables: CQL auxiliary tables (materialized views, secondary indexes, CDC log tables), Alternator base tables, Alternator auxiliary tables (GSIs, LSIs, Streams). This gap also led to inconsistent default compression algorithms after we changed the option’s default algorithm from LZ4 to LZ4WithDicts (`adf9c426c2`). This series introduces a general “schema initializer” mechanism in `schema_builder` and uses it to apply the default compression settings uniformly across all user tables. This ensures that all base and aux tables take their default compression settings from config. Fixes #26914. Backport justification: LZ4WithDicts is the new default since 2025.4, but the config option exists since 2025.2. Based on severity, I suggest we backport only to 2025.4 to maintain consistency of the defaults. 
- (cherry picked from commit `4ec7a064a9`) - (cherry picked from commit `76b2d0f961`) - (cherry picked from commit `5b4aa4b6a6`) - (cherry picked from commit `d5ec66bc0c`) - (cherry picked from commit `1e37781d86`) - (cherry picked from commit `7fa1f87355`) Parent PR: #27204 Closes scylladb/scylladb#28305 * github.com:scylladb/scylladb: db/config: Update sstable_compression_user_table_options description schema: Add initializer for compression defaults schema: Generalize static configurators into schema initializers schema: Initialize static properties eagerly db: config: Add accessor for sstable_compression_user_table_options test: Check that CQL and Alternator tables respect compression config test/cqlpy: test compression setting for auxiliary table test/alternator: tests for schema of Alternator table	2026-01-30 16:10:35 +02:00
Patryk Jędrzejczak	538295e97b	test: test_gossiper_orphan_remover: get host ID of the bootstrapping node before it crashes The test is currently flaky. It tries to get the host ID of the bootstrapping node via the REST API after the node crashes. This can obviously fail. The test usually doesn't fail, though, as it relies on the host ID being saved in `ScyllaServer._host_id` at this point by `ScyllaServer.try_get_host_id()` repeatedly called in `ScyllaServer.start()`. However, with a very fast crash and unlucky timings, no such call may succeed. We deflake the test by getting the host ID before the crash. Note that at this point, the bootstrapping node must be serving the REST API requests because `await log.wait_for("finished do_send_ack2_msg")` above guarantees that the node has started the gossip shadow round, which happens after starting the REST API. Fixes #28385 Closes scylladb/scylladb#28388 (cherry picked from commit `a2c1569e04`) Closes scylladb/scylladb#28416	2026-01-29 11:27:37 +01:00
Nikos Dragazis	914d3f845a	schema: Add initializer for compression defaults In PR `5b6570be52` we introduced the config option `sstable_compression_user_table_options` to allow adjusting the default compression settings for user tables. However, the new option was hooked into the CQL layer and applied only to CQL base tables, not to the whole spectrum of user tables: CQL auxiliary tables (materialized views, secondary indexes, CDC log tables), Alternator base tables, Alternator auxiliary tables (GSIs, LSIs, Streams). Fix this by moving the logic into the `schema_builder` via a schema initializer. This ensures that the default compression settings are applied uniformly regardless of how the table is created, while also keeping the logic in a central place. Register the initializer at startup in all executables where schemas are being used (`scylla_main()`, `scylla_sstable_main()`, `cql_test_env`). Finally, remove the ad-hoc logic from `create_table_statement` (redundant as of this patch), remove the xfail markers from the relevant tests and adjust `test_describe_cdc_log_table_create_statement` to expect LZ4WithDicts as the default compressor. Fixes #26914. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com> (cherry picked from commit `1e37781d86`)	2026-01-28 12:42:10 +02:00
Nikos Dragazis	80e064d61c	test: Check that CQL and Alternator tables respect compression config In patches `11f6a25d44` and `7b9428d8d7` we added tests to verify that auxiliary tables for both CQL and Alternator have the same default compression settings as their base tables. These tests do not check where these defaults originate from; they just verify that they are consistent. Add some more tests to verify the actual source of the defaults, which is expected to be the `sstable_compression_user_table_options` from the configuration. Unlike the previous tests, these tests require dedicated Scylla instances with custom configuration, so they must be placed under `test/cluster/`. Mark them as xfail-ing. The marker will be removed later in this series. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com> (cherry picked from commit `4ec7a064a9`)	2026-01-27 19:33:35 +02:00
Petr Gusev	69a24bb933	test_lwt_shutdown: fix flakiness by removing storage_proxy::stop injection storage_proxy::stop() is not called by main (it is commented out due to #293), so the corresponding message injection is never hit. When the test releases paxos_state_learn_after_mutate, shutdown may already be in progress or even completed by the time we try to trigger the storage_proxy::stop injection, which makes the test flaky. Fix this by completely removing the storage_proxy::stop injection. The injection is not required for test correctness. Shutdown must wait for the background LWT learn to finish, which is released via the paxos_state_learn_after_mutate injection. The shutdown process blocks on in-flight api HTTP requests through seastar::httpd::http_server::stop and its _task_gate, so the shutdown will not prevent the HTTP request that released the paxos_state_learn_after_mutate from completing successfully. Fixes scylladb/scylladb#28260 (cherry picked from commit `f5ed3e9fea`)	2026-01-23 19:24:06 +00:00
Patryk Jędrzejczak	d4277c95e8	test: test_zero_token_nodes_multidc: properly handle reads with CL=LOCAL_ONE The test is currently flaky. It incorrectly assumes that a read with CL=LOCAL_ONE will see the data inserted by a preceding write with CL=LOCAL_ONE in the same datacenter with RF=2. The same issue has already been fixed for CL=ONE in `21edec1ace`. The difference is that for CL=LOCAL_ONE, only dc1 is problematic, as dc2 has RF=1. We fix the issue for CL=LOCAL_ONE by skipping the check for dc1. Fixes #28253 The fix addresses CI flakiness and only changes the test, so it should be backported. Closes scylladb/scylladb#28274 (cherry picked from commit `1f0f694c9e`) Closes scylladb/scylladb#28304	2026-01-22 18:22:05 +01:00
Patryk Jędrzejczak	a247e19f56	test: test_raft_recovery_during_join: get host ID of the bootstrapping node before it crashes The test is currently flaky. It tries to get the host ID of the bootstrapping node via the REST API after the node crashes. This can obviously fail. The test usually doesn't fail, though, as it relies on the host ID being saved in `ScyllaServer._host_id` at this point by `ScyllaServer.try_get_host_id()` repeatedly called in `ScyllaServer.start()`. However, with a very fast crash and unlucky timings, no such call may succeed. We deflake the test by getting the host ID before the crash. Note that at this point, the bootstrapping node must be serving the REST API requests because `await coordinator_log.wait_for("delay_node_bootstrap: waiting for message")` above guarantees that the node has submitted the join topology request, which happens after starting the REST API. Fixes #28227 Closes scylladb/scylladb#28233 (cherry picked from commit `e503340efc`) Closes scylladb/scylladb#28310	2026-01-22 18:19:36 +01:00
Patryk Jędrzejczak	8b15975cb8	test: test_group0_schema_versioning: wait for schema sync in system.local `test_schema_versioning_with_recovery` is currently flaky. It performs a write with CL=ALL and then checks if the schema version is the same on all nodes by calling `verify_table_versions_synced`. All nodes are expected to sync their schema before handling the replica write. The node in RECOVERY mode should do it through a schema pull, and other nodes should do it through a group 0 read barrier. The problem is in `verify_local_schema_versions_synced` that compares the schema versions in `system.local`. The node in RECOVERY mode updates the schema version in `system.local` after it acknowledges the replica write as completed. Hence, the check can fail. We fix the problem by making the function wait until the schema versions match. Note that RECOVERY mode is about to be retired together with the whole gossip-based topology in 2026.2. So, this test is about to be deleted. However, we still want to fix it, so that it doesn't bother us in older branches. Fixes #23803 Closes scylladb/scylladb#28114 (cherry picked from commit `6b5923c64e`) Closes scylladb/scylladb#28178	2026-01-19 16:35:49 +01:00
Gleb Natapov	4e4bfee41e	topology coordinator: complete pending operation for a replaced node A replaced node may have a pending operation on it. The replace operation will move the node into the 'left' state and the request will never be completed. Moreover, the code does not expect a left node to have a request. It will try to process the request and will crash because the node for the request will not be found. The patch checks if the replaced node has a pending request and completes it with failure. It also changes topology loading code to skip requests for nodes that are in a left state. This is not strictly needed, but makes the code more robust. Fixes #27990 Closes scylladb/scylladb#28009 (cherry picked from commit `bee5f63cb6`) Closes scylladb/scylladb#28180	2026-01-19 09:42:20 +02:00
Asias He	09da47a42c	repair: Fix sstable_list_to_mark_as_repaired with multishard writer It was observed: ``` test_repair_disjoint_row_2nodes_diff_shard_count was spuriously failing due to segfault. backtrace pointed to a failure when allocating an object from the chain of freed objects, which indicates memory corruption. (gdb) bt at ./seastar/include/seastar/core/shared_ptr.hh:275 at ./seastar/include/seastar/core/shared_ptr.hh:430 Usual suspect is use-after-free, so ran the reproducer in the sanitize mode, which indicated shared ptr was being copied into another cpu through the multi shard writer: seastar - shared_ptr accessed on non-owner cpu, at: ... -------- seastar::smp_message_queue::async_work_item<mutation_writer::multishard_writer::make_shard_writer... ``` The multishard writer itself was fine, the problem was in the streaming consumer for repair copying a shared ptr. It could work fine with same smp setting, since there will be only 1 shard in the consumer path, from rpc handler all the way to the consumer. But with mixed smp setting, the ptr would be copied into the cpus involved, and since the shared ptr is not cpu safe, the refcount change can go wrong, causing double free, use-after-free. To fix, we pass a generic incremental repair handler to the streaming consumer. The handler is safe to be copied to different shards. It will be a no op if incremental repair is not enabled or on a different shard. A reproducer test is added. The test could reproduce the crash consistently before the fix and work well after the fix. Fixes #27666 Closes scylladb/scylladb#27870 (cherry picked from commit `0aabf51380`) Closes scylladb/scylladb#28064	2026-01-19 09:39:49 +02:00
Asias He	c5aa29404d	repair: Add tablet repair progress report support This patch adds tablet repair progress report support so that the user could use the /task_manager/task_status API to query the progress. In order to support this, a new system table is introduced to record the user request related info, i.e., start of the request and end of the request. The progress is accurate when tablet split or merge happens in the middle of the request, since the tokens of the tablet are recorded when the request is started and when repair of each tablet is finished. The original tablet repair is considered as finished when the finished ranges cover the original tablet token ranges. After this patch, the /task_manager/task_status API will report correct progress_total and progress_completed. Fixes #22564 Fixes #26896 Closes scylladb/scylladb#27679 (cherry picked from commit `4f77dd058d`) Closes scylladb/scylladb#28065	2026-01-19 09:39:13 +02:00
Botond Dénes	b738be094f	Merge '[Backport 2025.4] Make commitlog replay handle files with corrupt file header (non-zero) as data loss, not startup failure' from Scylladb[bot] Fixes #26744 If a segment to replay is broken such that the main header is not zero, but still broken, we throw header_checksum_error. This was not handled in replayer, which grouped this into the "user error/fundamental problem" category. However, assuming we allow for "real" disk corruption, this should really be treated same as data corruption, i.e. reported data loss, not failure to start up. The `test_one_big_mutation_corrupted_on_startup` test accidentally sometimes provoked this issue, by doing random file wrecking, which on rare occasions provoked this, and thus failed test due to scylla not starting up, instead of losing data as expected. - (cherry picked from commit `9b5f3d12a3`) - (cherry picked from commit `e48170ca8e`) - (cherry picked from commit `8c4ac457af`) Parent PR: #27556 Closes scylladb/scylladb#27682 * github.com:scylladb/scylladb: test::cluster::dtest::tools::files: Remove file commitlog_replay: Handle fully corrupt files same as partial corruption. test::pylib::suite::base: Split options.name test specifier only once	2026-01-19 06:39:29 +02:00
Sergey Zolotukhin	96275adf1c	test: disable test_start_bootstrapped_with_invalid_seed The test intermittently fails when an invalid DNS name is resolved, likely due to ISP DNS error hijacking (see scylladb/scylladb#28153). Disable this test to unblock CI. Fixes scylladb/scylladb#28153 Closes scylladb/scylladb#28162 (cherry picked from commit `799d837295`)	2026-01-15 17:01:30 +02:00
Patryk Jędrzejczak	8a833e0400	Merge '[Backport 2025.4] raft topology: preserve IP -> ID mapping of a replacing node on restart' from Scylladb[bot] We currently do it only for a bootstrapping node, which is a bug. The missing IP can cause an internal error, for example, in the following scenario: - replace fails during streaming, - all live nodes are shut down before the rollback of replace completes, - all live nodes are restarted, - live nodes start hitting internal error in all operations that require IP of the replacing node (like client requests or REST API requests coming from nodetool). We fix the bug here, but we do it separately for replace with different IP and replace with the same IP. For replace with different IP, we persist the IP -> host ID mapping in `system.peers` just like for bootstrap. That's necessary, since there is no other way to determine IP of the replacing node on restart. For replace with the same IP, we can't do the same. This would require deleting the row corresponding to the node being replaced from `system.peers`. That's fine in theory, as that node is permanently banned, so its IP shouldn't be needed. Unfortunately, we have many places in the code where we assume that IP of a topology member is always present in the address map or that a topology member is always present in the gossiper endpoint set. Examples of such places: - nodetool operations, - REST API endpoints, - `db::hints::manager::store_hint`, - `group0_voter_handler::update_nodes`. We could fix all those places and verify that drivers work properly when they see a node in the token metadata, but not in `system.peers`. However, that would be too risky to backport. We take a different approach. We recover IP of the replacing node on restart based on the state of the topology state machine and `system.peers` just after loading `system.peers`. We rely on the fact that group 0 is set up at this point. 
The only case where this assumption is incorrect is a restart in the Raft-based recovery procedure. However, hitting this problem then seems improbable, and even if it happens, we can restart the node again after ensuring that no client and REST API requests come before replace is rolled back on the new topology coordinator. Hence, it's not worth complicating the fix (by e.g. looking at the persistent topology state instead of the in-memory state machine). Fixes #28057 Backport this PR to all branches as it fixes a problematic bug. - (cherry picked from commit `fc4c2df2ce`) - (cherry picked from commit `4526dd93b1`) - (cherry picked from commit `749b0278e5`) - (cherry picked from commit `0fed9f94f8`) Parent PR: #27435 Closes scylladb/scylladb#28100 * https://github.com/scylladb/scylladb: gossiper: add_saved_endpoint: make generations of excluded nodes negative test: introduce test_full_shutdown_during_replace utils: error_injection: allow aborting wait_for_message raft topology: preserve IP -> ID mapping of a replacing node on restart	2026-01-13 17:19:39 +01:00
Patryk Jędrzejczak	1104022d91	gossiper: add_saved_endpoint: make generations of excluded nodes negative The explanation is in the new comment in `gossiper::add_saved_endpoint`. We add a test for this change. It's "extremely white-box", but it's better than nothing. (cherry picked from commit `0fed9f94f8`)	2026-01-13 12:07:18 +01:00
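
Commit `8b15975cb8` above deflakes a test by making the check wait until the schema versions in `system.local` match instead of comparing them once. A minimal sketch of that wait-until-converged pattern is shown below. It is not the code from the commit: `wait_until`, `make_versions_probe`, and the `read_versions` callback are hypothetical names introduced here for illustration, and the timeout values are arbitrary.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, Optional, TypeVar

T = TypeVar("T")


async def wait_until(
    probe: Callable[[], Awaitable[Optional[T]]],
    timeout: float = 5.0,
    period: float = 0.01,
) -> T:
    """Poll `probe` until it returns a non-None value or `timeout` elapses.

    Hypothetical helper: the probe is always tried at least once, and a
    TimeoutError is raised if the deadline passes without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = await probe()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met before the deadline")
        await asyncio.sleep(period)


def make_versions_probe(
    read_versions: Callable[[], Awaitable[Dict[str, str]]],
) -> Callable[[], Awaitable[Optional[Dict[str, str]]]]:
    """Build a probe that succeeds once every node reports the same version.

    `read_versions` is an assumed callback returning {node: schema_version};
    in a real test it would query each node.
    """

    async def probe() -> Optional[Dict[str, str]]:
        versions = await read_versions()
        # Converged when the set of distinct versions has exactly one element.
        return versions if len(set(versions.values())) == 1 else None

    return probe
```

A test would call `await wait_until(make_versions_probe(read_versions))` in place of a one-shot equality assertion, so a node that updates its version slightly late no longer fails the check, while a node that never converges still fails with a timeout.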


590 Commits