scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-02 06:05:53 +00:00

Author	SHA1	Message	Date
Botond Dénes	2eda427b96	Merge 'tasks: do not fail the wait request if rpc fails' from Aleksandra Martyniuk During decommission, we first mark a topology request as done, then shut down a node and in the following steps we remove node from the topology. Thus, finished request does not imply that a node is removed from the topology. Due to that, in node_ops_virtual_task::wait, while gathering children from the whole cluster, we may hit the connection exception - because a node is still in topology, even though it is down. Modify the get_children method to ignore the exception and warn about the failure instead. Keep token_metadata_ptr in get_children to prevent topology from changing. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-867 Needs backports to all versions Closes scylladb/scylladb#29035 * github.com:scylladb/scylladb: tasks: fix indentation tasks: do not fail the wait request if rpc fails tasks: pass token_metadata_ptr to task_manager::virtual_task::impl::get_children (cherry picked from commit `2e47fd9f56`) Closes scylladb/scylladb#29531	2026-04-30 12:08:26 +03:00
Botond Dénes	42edeee977	Merge 'test: cluster: Deflake test_startup_with_keyspaces_violating_rf_rack_valid_keyspaces' from Dawid Mędrek The test was flaky. The scenario looked like this: 1. Stop server 1. 2. Set its rf_rack_valid_keyspaces configuration option to true. 3. Create an RF-rack-invalid keyspace. 4. Start server 1 and expect a failure during start-up. It was wrong. We cannot predict when the Raft mutation corresponding to the newly created keyspace will arrive at the node or when it will be processed. If the check of the RF-rack-valid keyspaces we perform at start-up was done before that, it won't include the keyspace. This will lead to a test failure. Unfortunately, it's not feasible to perform a read barrier during start-up. What's more, although it would help the test, it wouldn't be useful otherwise. Because of that, we simply fix the test, at least for now. The new scenario looks like this: 1. Disable the rf_rack_valid_keyspaces configuration option on server 1. 2. Start the server. 3. Create an RF-rack-invalid keyspace. 4. Perform a read barrier on server 1. This will ensure that it has observed all Raft mutations, and we won't run into the same problem. 5. Stop the node. 6. Set its rf_rack_valid_keyspaces configuration option to true. 7. Try to start the node and observe a failure. This will make the test perform consistently. --- I ran the test (in dev mode, on my local machine) three times before these changes, and three times with them. I include the time results below. Before: ``` real 0m47.570s user 0m41.631s sys 0m8.634s real 0m50.495s user 0m42.499s sys 0m8.607s real 0m50.375s user 0m41.832s sys 0m8.789s ``` After: ``` real 0m50.509s user 0m43.535s sys 0m9.715s real 0m50.857s user 0m44.185s sys 0m9.811s real 0m50.873s user 0m44.289s sys 0m9.737s ``` Fixes SCYLLADB-1137 Backport: The test is present on all supported branches, and so we should backport these changes to them. Closes scylladb/scylladb#29218 * github.com:scylladb/scylladb: test: cluster: Deflake test_startup_with_keyspaces_violating_rf_rack_valid_keyspaces test: cluster: Mark test with @pytest.mark.asyncio in test_multidc.py (cherry picked from commit `d52fbf7ada`) Closes scylladb/scylladb#29654	2026-04-30 12:07:54 +03:00
Marcin Maliszkiewicz	b29f3b4a9b	Merge 'ldap: fix double-free of LDAPMessage in poll_results()' from Andrzej Jackowski In the unregistered-ID branch, ldap_msgfree() was called on a result already owned by an RAII ldap_msg_ptr, causing a double-free on scope exit. Remove the redundant manual free. Fixes: SCYLLADB-1446 Backport: 2026.1, 2025.4, 2025.1 - it's a memory corruption, with a one-line fix, so better backport it everywhere. Closes scylladb/scylladb#29302 * github.com:scylladb/scylladb: test: ldap: add regression test for double-free on unregistered message ID ldap: fix double-free of LDAPMessage in poll_results() (cherry picked from commit `895fdb6d29`) Closes scylladb/scylladb#29393 Closes scylladb/scylladb#29454 Closes scylladb/scylladb#29646	2026-04-26 10:30:39 +03:00
Emil Maskovsky	c04cbd5aa2	encryption: cover system.raft table in system_info_encryption Extend system_info_encryption to encrypt system.raft SSTables. system.raft contains the Raft log, which may hold sensitive user data (e.g. batched mutations), so it warrants the same treatment as system.batchlog and system.paxos. During upgrade, existing unencrypted system.raft SSTables remain readable. Existing data is rewritten encrypted via compaction, or immediately via nodetool upgradesstables -a. Update the operator-facing system_info_encryption description to mention system.raft and add a focused test that verifies the schema extension is present on system.raft. Fixes: CUSTOMER-317 Backport: 2026.1 - closes an encryption-at-rest coverage gap: system.raft may persist sensitive user-originated data unencrypted; backport to the current LTS. Closes scylladb/scylladb#29242 (cherry picked from commit `91df3795fc`) Closes scylladb/scylladb#29526 Closes scylladb/scylladb#29582 Closes scylladb/scylladb#29615	2026-04-24 18:08:41 +03:00
Patryk Jędrzejczak	20ed7c3d3d	raft_group0: join_group0: fix join hang when node joins group 0 before post_server_start A joining node hung forever if the topology coordinator added it to the group 0 configuration before the node reached `post_server_start`. In that case, `server->get_configuration().contains(my_id)` returned true and the node broke out of the join loop early, skipping `post_server_start`. `_join_node_group0_started` was therefore never set, so the node's `join_node_response` RPC handler blocked indefinitely. Meanwhile the topology coordinator's `respond_to_joining_node` call (which has no timeout) hung forever waiting for the reply that never came. Fix by only taking the early-break path when not starting as a follower (i.e. when the node is the discovery leader or is restarting). A joining node must always reach `post_server_start`. We also provide a regression test. It takes 6s in dev mode. Fixes SCYLLADB-959 Closes scylladb/scylladb#29266 (cherry picked from commit `b9f82f6f23`) Closes scylladb/scylladb#29291 Closes scylladb/scylladb#29308 Closes scylladb/scylladb#29419	2026-04-20 17:35:54 +02:00
Botond Dénes	5f86c008e2	Merge 'cql3: fix null handling in data_value formatting' from Dario Mirovic `data_value::to_parsable_string()` crashes with a null pointer dereference when called on a `null` data_value. Return `"null"` instead. Added tests after the fix. Manually checked that tests fail without the fix. Fixes SCYLLADB-1350 This is a fix that prevents format crash. No known occurrence in production, but backport is desirable. Closes scylladb/scylladb#29262 * github.com:scylladb/scylladb: test: boost: test null data value to_parsable_string cql3: fix null handling in data_value formatting (cherry picked from commit `816f2bf163`) Closes scylladb/scylladb#29468	2026-04-16 11:00:44 +03:00
Aleksandra Martyniuk	88e00db91c	nodetool: cluster repair: do not fail if a table was dropped nodetool cluster repair without additional params repairs all tablet keyspaces in a cluster. Currently, if a table is dropped while the command is running, all tables are repaired but the command finishes with a failure. Modify nodetool cluster repair. If a table wasn't specified (i.e. all tables are repaired), the command finishes successfully even if a table was dropped. If a table was specified and it does not exist (e.g. because it was dropped before the repair was requested), then the behavior remains unchanged. Fixes: SCYLLADB-568. Closes scylladb/scylladb#28739 (cherry picked from commit `2e68f48068`) Closes scylladb/scylladb#29280	2026-04-06 22:06:07 +03:00
Patryk Jędrzejczak	95e8c6fd56	test: test_remove_garbage_group0_members: wait for token ring and group0 consistency before removenode The removenove initiator could have an outdated token ring (still considering the node removed by the previous removenode a token owner) and unexpectedly reject the operation. Fix that by waiting for token ring and group0 consistency before removenode. Note that the test already checks that consistency, but only for one node, which is different from the removenode initiator. This test has been removed in master together with the code being tested (the gossip-based topology). Hence, the fix is submitted directly to 2026.1. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1103 Backport to all supported branches (other than 2026.1), as the test can fail there. Closes scylladb/scylladb#29108 (cherry picked from commit `1398a55d16`) Closes scylladb/scylladb#29206	2026-03-24 13:09:36 +01:00
Wojciech Mitros	793a1e9e89	mv: don't mark the view as built if the reader produced no partitions When we build a materialized view we read the entire base table from start to end to generate all required view udpates. If a view is created while another view is being built on the same base table, this is optimized - we start generating view udpates for the new view from the base table rows that we're currently reading, and we read the missed initial range again after the previous view finishes building. The view building progress is only updated after generating view updates for some read partitions. However, there are scenarios where we'll generate no view updates for the entire read range. If this was not handled we could end up in an infinite view building loop like we did in https://github.com/scylladb/scylladb/issues/17293 To handle this, we mark the view as built if the reader generated no partitions. However, this is not always the correct conclusion. Another scenario where the reader won't encounter any partitions is when view building is interrupted, and then we perform a reshard. In this scenario, we set the reader for all shards to the last unbuilt token for an existing partition before the reshard. However, this partition may not exist on a shard after reshard, and if there are also no partitions with higher tokens, the reader will generate no partitions even though it hasn't finished view building. Additionally, we already have a check that prevents infinite view building loops without taking the partitions generated by the reader into account. At the end of stream, before looping back to the start, we advance current_key to the end of the built range and check for built views in that range. This handles the case where the entire range is empty - the conditions for a built view are: 1. the "next_token" is no greater than "first_token" (the view building process looped back, so we've built all tokens above "first_token") 2. the "current_token" is no less than "first_token" (after looping back, we've built all tokens below "first_token") If the range is empty, we'll pass these conditions on an empty range after advancing "current_key" to the end because: 1. after looping back, "next_token" will be set to `dht::minimum_token` 2. "current_key" will be set to `dht::ring_position::max()` In this patch we remove the check for partitions generated by the reader. This fixes the issue with resharding and it does not resurrect the issue with infinite view building that the check was introduced for. Fixes https://github.com/scylladb/scylladb/issues/26523 Closes scylladb/scylladb#26635 (cherry picked from commit `0a22ac3c9e`) Closes scylladb/scylladb#26880	2026-03-03 13:32:24 +02:00
Łukasz Paszkowski	94650ff482	test/pylib/util.py: Add retries and additional logging to start_writes() Consider the following scenario: 1. Let nodes A,B,C form a cluster with RF=3 2. Write query with CL=QUORUM is submitted and is acknowledged by nodes B,C 3. Follow-up read query with CL=QUORUM is sent to verify the write from the previous step 4. Coordinator sends data/digest requests to the nodes A,B. Since the node A is missing data, digest mismatches and data reconciliation is triggered 5. The node A or B fails, becomes unavailable, etc 6. During reconciliation, data requests are sent to node A,B and fail failing the entire read query When the above scenario happens, the tests using `start_writes()` fail with the following stacktrace: ``` ... > await finish_writes() test/cluster/test_tablets_migration.py:259: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ test/pylib/util.py:241: in finish await asyncio.gather(*tasks) test/pylib/util.py:227: in do_writes raise e _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ worker_id = 1 ... > rows = await cql.run_async(rd_stmt, [pk]) E cassandra.ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] message="Operation failed for test_1767777001181_bmsvk.test - received 1 responses and 1 failures from 2 CL=QUORUM." info={'consistency': 'QUORUM', 'required_responses': 2, 'received_responses': 1, 'failures': 1} ``` Note that when a node failure happens before/during a read query, there is no test failure as the speculative retries are enabled by default. Hence an additional data/digest read is sent to the third remaining node. However, the same speculative read is cancelled the moment, the read query reaches CL which may trigger a read-repair. This change: - Retries the verification read in start_writes() on failure to mitigate races between reads and node failures - Adds additional logging to correlate Python exceptions with Scylla logs Fixes https://github.com/scylladb/scylladb/issues/27478 Fixes https://github.com/scylladb/scylladb/issues/27974 Fixes https://github.com/scylladb/scylladb/issues/27494 Fixes https://github.com/scylladb/scylladb/issues/23529 Note that this change test flakiness observed during tablet transitions. However, it serves as a workaround for a higher-level issue https://github.com/scylladb/scylladb/issues/28125 Closes scylladb/scylladb#28140 (cherry picked from commit `e07fe2536e`) Closes scylladb/scylladb#28827	2026-03-03 13:29:12 +02:00
Dawid Mędrek	032b5f16e0	test: topology_custom: Reduce wait time in test_sync_point If everything is OK, the sync point will not resolve with node 3 dead. As a result, the waiting will use all of the time we allocate for it, i.e. 30 seconds. That's a lot of time. There's no easy way to verify that the sync point will NOT resolve, but let's at least reduce the waiting to 3 seconds. If there's a bug, it should be enough to trigger it at some point, while reducing the average time needed for CI. (cherry picked from commit `f83f911bae`)	2026-02-19 15:04:09 +01:00
Dawid Mędrek	654a824f58	test: topology_custom: Fix test_sync_point The test had a few shortcomings that made it flaky or simply wrong: 1. We were verifying that hints were written by checking the size of in-flight hints. However, that could potentially lead to problems in rare situations. For instance, if all of the hints failed to be written to disk, the size of in-flight hints would drop to zero, but creating a sync point would correspond to the empty state. In such a situation, we should fail immediately and indicate what the cause was. 2. A sync point corresponds to the hints that have already been written to disk. The number of those is tracked by the metric `written`. It's a much more reliable way to make sure that hints have been written to the commitlog. That ensures that the sync point we'll create will really correspond to those hints. 3. The auxiliary function `wait_for` used in the test works like this: it executes the passed callback and looks at the result. If it's `None`, it retries it. Otherwise, the callback is deemed to have finished its execution and no further retries will be attempted. Before this commit, we simply returned a bool, and so the code was wrong. We improve it. Note that this fixes scylladb/scylladb#28203, which was a manifestation of scylladb/scylladb#25879. We created a sync point that corresponded to the empty state, and so it immediately resolved, even when node 3 was still dead. Refs scylladb/scylladb#25879 Fixes scylladb/scylladb#28203 (cherry picked from commit `a256ba7de0`)	2026-02-19 15:04:08 +01:00
Dawid Mędrek	07758bce68	test: topology_custom: Await sync points asynchronously There's a dedicated HTTP API for communicating with the cluster, so let's use it instead of yet another custom solution. (cherry picked from commit `c5239edf2a`)	2026-02-19 15:04:08 +01:00
Dawid Mędrek	45577072fa	test: topology_custom: Create sync points asynchronously There's a dedicated HTTP API for communicating with the nodes, so let's use it instead of yet another custom solution. (cherry picked from commit `ac4af5f461`)	2026-02-19 15:04:08 +01:00
Dawid Mędrek	58a342c524	test: topology_custom: Fetch hint metrics asynchronously There's a dedicated API for fetching metrics now. Let's use it instead of developing yet another solution that's also worse. (cherry picked from commit `628e74f157`)	2026-02-19 15:04:08 +01:00
Patryk Jędrzejczak	5db584a037	test: test_restart_leaving_replica_during_cleanup: reconnect driver after restart The test can currently fail like this: ``` > await cql.run_async(f"ALTER TABLE {ks}.test WITH tablets = {{'min_tablet_count': 1}}") E cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 127.158.27.9:9042 datacenter1>: <Error from server: code=0000 [Server error] message="Failed to apply group 0 change due to concurrent modification">}) ``` The following happens: - node A is restarted and becomes the group0 leader, - the driver sends the ALTER TABLE request to node B, - the request hits group 0 concurrent modification error 10 times and fails because node A performs tablet migrations at the the same time. What is unexpected is that even though the driver session uses the default retry policy, the driver doesn't retry the request on node A. The request is guaranteed to succeed on node A because it's the only node adding group0 entries. The driver doesn't retry the request on node A because of a missing `wait_for_cql_and_get_hosts` call. We add it in this commit. We also reconnect the driver just in case to prevent hitting scylladb/python-driver#295. Moreover, we can revert the workaround from `4c9efc08d8`, as the fix from this commit also prevents DROP KEYSPACE failures. The commit has been tested in byo with `_concurrent_ddl_retries{0}` to verify that node A really can't hit group 0 concurrent modification error and always receives the ALTER TABLE request from the driver. All 300 runs in each build mode passed. Fixes #25938 Closes scylladb/scylladb#28632 (cherry picked from commit `0693091aff`) Closes scylladb/scylladb#28670	2026-02-17 16:05:53 +01:00
Patryk Jędrzejczak	280303154d	Merge '[Backport 2025.1] storage_service: set up topology properly in maintenance mode' from Scylladb[bot] We currently make the local node the only token owner (that owns the whole ring) in maintenance mode, but we don't update the topology properly. The node is present in the topology, but in the `none` state. That's how it's inserted by `tm.get_topology().set_host_id_cfg(host_id);` in `scylla_main`. As a result, the node started in maintenance mode crashes in the following way in the presence of a vnodes-based keyspace with the NetworkTopologyStrategy: ``` scylla: locator/network_topology_strategy.cc:207: locator::natural_endpoints_tracker::natural_endpoints_tracker( const token_metadata &, const network_topology_strategy::dc_rep_factor_map &): Assertion `!_token_owners.empty() && !_racks.empty()' failed. ``` Both `_token_owners` and `_racks` are empty. The reason is that `_tm.get_datacenter_token_owners()` and `_tm.get_datacenter_racks_token_owners()` called above filter out nodes in the `none` state. This bug basically made maintenance mode unusable in customer clusters. We fix it by changing the node state to `normal`. We also extend `test_maintenance_mode` to provide a reproducer for #27988 and to avoid similar bugs in the future. Fixes #27988 This PR must be backported to all branches, as maintenance mode is currently unusable everywhere. - (cherry picked from commit `a08c53ae4b`) - (cherry picked from commit `9d4a5ade08`) - (cherry picked from commit `c92962ca45`) - (cherry picked from commit `408c6ea3ee`) - (cherry picked from commit `53f58b85b7`) - (cherry picked from commit `867a1ca346`) - (cherry picked from commit `6c547e1692`) - (cherry picked from commit `7e7b9977c5`) Parent PR: #28322 Closes scylladb/scylladb#28496 * https://github.com/scylladb/scylladb: test: test_maintenance_mode: enable maintenance mode properly test: test_maintenance_mode: shutdown cluster connections test: test_maintenance_mode: run with different keyspace options test: test_maintenance_mode: check that group0 is disabled by creating a keyspace test: test_maintenance_mode: get rid of the conditional skip test: test_maintenance_mode: remove the redundant value from the query result storage_proxy: skip validate_read_replica in maintenance mode storage_service: set up topology properly in maintenance mode	2026-02-04 16:57:50 +01:00
Patryk Jędrzejczak	3408b5ee66	test: test_maintenance_mode: enable maintenance mode properly The same issue as the one fixed in `394207fd69`. This one didn't cause real problems, but it's still cleaner to fix it. (cherry picked from commit `7e7b9977c5`)	2026-02-03 12:20:11 +01:00
Patryk Jędrzejczak	7c40e90529	test: test_maintenance_mode: shutdown cluster connections Leaked connections are known to cause inter-test issues. (cherry picked from commit `6c547e1692`)	2026-02-03 12:20:11 +01:00
Patryk Jędrzejczak	687061cca9	test: test_maintenance_mode: run with different keyspace options We extend the test to provide a reproducer for #27988 and to avoid similar bugs in the future. The test slows down from ~14s to ~19s on my local machine in dev mode. It seems reasonable. (cherry picked from commit `867a1ca346`)	2026-02-03 12:20:10 +01:00
Patryk Jędrzejczak	f5af15c245	test: test_maintenance_mode: check that group0 is disabled by creating a keyspace In the following commit, we make the rest run with multiple keyspaces, and the old check becomes inconvenient. We also move it below to the part of the code that won't be executed for each keyspace. Additionally, we check if the error message is as expected. (cherry picked from commit `53f58b85b7`)	2026-02-03 12:20:10 +01:00
Patryk Jędrzejczak	e7837844e8	test: test_maintenance_mode: get rid of the conditional skip This skip has already caused trouble. After `0668c642a2`, the skip was always hit, and the test was silently doing nothing. This made us miss #26816 for a long time. The test was fixed in `222eab45f8`, but we should get rid of the skip anyway. We increase the number of writes from 256 to 1000 to make the chance of not finding the key on server A even lower. If that still happens, it must be due to a bug, so we fail the test. We also make the test insert rows until server A is a replica of one row. The expected number of inserted rows is a small constant, so it should, in theory, make the test faster and cleaner (we need one row on server A, so we insert exactly one such row). It's possible to make the test fully deterministic, by e.g., hardcoding the key and tokens of all nodes via `initial_token`, but I'm afraid it would make the test "too deterministic" and could hide a bug. (cherry picked from commit `408c6ea3ee`)	2026-02-03 12:20:10 +01:00
Patryk Jędrzejczak	77721b2f32	test: test_maintenance_mode: remove the redundant value from the query result (cherry picked from commit `c92962ca45`)	2026-02-03 12:20:10 +01:00
Ferenc Szili	f8ae81d8bb	test: add test and reproducer for load_stats refresh exception This patch adds a test and reproducer for the issue where the load_stats refresh procedure throws exceptions if any of the tables have been dropped since load_stats was produced. (cherry picked from commit `92dbde54a5`)	2026-02-03 08:55:26 +01:00
Botond Dénes	ac36cafa98	db/row_cache: make_nonpopulating_reader(): pass cache tracker to snapshot The API contract in partition_version.hh states that when dealing with evictable entries, a real cache tracker pointer has to be passed to all methods that ask for it. The nonpopulating reader violates this, passing a nullptr to the snapshot. This was observed to cause a crash when a concurrent cache read accessed the snapshot with the null tracker. A reproducer is included which fails before and passes after the fix. Fixes: #26847 Closes scylladb/scylladb#28163 (cherry picked from commit `a53f989d2f`) Closes scylladb/scylladb#28277	2026-01-30 16:22:22 +02:00
Botond Dénes	cb5621f17f	Merge '[Backport 2025.1] db: batchlog_manager: update _last_replay only if all batches were re…' from Scylladb[bot] …played Currently, if flushing hints falls within the repair cache timeout, then the flush_time is set to batchlog_manager::_last_replay. _last_replay is updated on each replay, even if some batches weren't replayed. Due to that, we risk the data resurrection. Update _last_replay only if all batches were replayed. Fixes: https://github.com/scylladb/scylladb/issues/24415. Needs backport to all live versions. - (cherry picked from commit `4d0de1126f`) - (cherry picked from commit `e3dcb7e827`) Parent PR: #26793 Closes scylladb/scylladb#27088 * github.com:scylladb/scylladb: test: extend test_batchlog_replay_failure_during_repair db: batchlog_manager: update _last_replay only if all batches were replayed	2026-01-30 16:21:43 +02:00
Aleksandra Martyniuk	fef5ae1e00	test: fix test_two_tablets_concurrent_repair_and_migration_repair_writer_level test_two_tablets_concurrent_repair_and_migration_repair_writer_level waits for the first node that logs info about repair_writer using asyncio.wait. The done group is never awaited, so we never learn about the error. The test itself is incorrect and the log about repair_writer is never printed. We never learn about that and tests finishes successfully after 10 minutes timeout. Fix the test: - disable hinted handoff; - repair tablets of the whole table: - new table is added so that concurrent migration is possible; - use wait_for_first_completed that awaits done group; - do some cleanups. Remove nightly mark. Fixes: #26148. Closes scylladb/scylladb#26209 (cherry picked from commit `48bbe09c8b`) Closes scylladb/scylladb#26217	2026-01-30 16:18:49 +02:00
Gleb Natapov	5e706c1790	topology coordinator: complete pending operation for a replaced node A replaced node may have pending operation on it. The replace operation will move the node into the 'left' state and the request will never be completed. More over the code does not expect left node to have a request. It will try to process the request and will crash because the node for the request will not be found. The patch checks is the replaced node has peening request and completes it with failure. It also changes topology loading code to skip requests for nodes that are in a left state. This is not strictly needed, but makes the code more robust. Fixes #27990 Closes scylladb/scylladb#28009 (cherry picked from commit `bee5f63cb6`) Closes scylladb/scylladb#28174	2026-01-27 10:17:06 +01:00
Patryk Jędrzejczak	d360d5a937	test: test_zero_token_nodes_multidc: properly handle reads with CL=LOCAL_ONE The test is currently flaky. It incorrectly assumes that a read with CL=LOCAL_ONE will see the data inserted by a preceding write with CL=LOCAL_ONE in the same datacenter with RF=2. The same issue has already been fixed for CL=ONE in `21edec1ace`. The difference is that for CL=LOCAL_ONE, only dc1 is problematic, as dc2 has RF=1. We fix the issue for CL=LOCAL_ONE by skipping the check for dc1. Fixes #28253 The fix addresses CI flakiness and only changes the test, so it should be backported. Closes scylladb/scylladb#28274 (cherry picked from commit `1f0f694c9e`) Closes scylladb/scylladb#28302	2026-01-22 11:08:22 +01:00
Aleksandra Martyniuk	b5dae10acb	test: extend test_batchlog_replay_failure_during_repair Modify test_batchlog_replay_failure_during_repair to also check that there isn't data resurrection if flushing hints falls within the repair cache timeout. (cherry picked from commit `e3dcb7e827`)	2026-01-22 10:48:32 +01:00
Botond Dénes	abf7c34b40	reader_concurrency_semaphore: improve handling of base resources reader_permit::release_base_resources() is a soft evict for the permit: it releases the resources aquired during admission. This is used in cases where a single process owns multiple permits, creating a risk for deadlock, like it is the case for repair. In this case, release_base_resources() acts as a manual eviction mechanism to prevent permits blockings each other from admission. Recently we found a bad interaction between release_base_resources() and permit eviction. Repair uses both mechanism: it marks its permits as inactive and later it also uses release_base_resources(). This partice might be worth reconsidering, but the fact remains that there is a bug in the reader permit which causes the base resources to be released twice when release_base_resources() is called on an already evicted permit. This is incorrect and is fixed in this patch. Improve release_base_resources(): * make _base_resources const * move signal call into the if (_base_resources_consumed()) { } * use reader_permit::impl::signal() instead of reader_concurrency_semaphore::signal() * all places where base resources are released now call release_base_resources() A reproducer unit test is added, which fails before and passes after the fix. Fixes: #28083 Closes scylladb/scylladb#28155 (cherry picked from commit `b7bc48e7b7`) Closes scylladb/scylladb#28238	2026-01-21 06:47:21 +02:00
Asias He	a04f31da5b	repair: Allow min max range to be updated for repair history It is observed that: repair - repair[667d4a59-63fb-4ca6-8feb-98da49946d8b]: Failed to update system.repair_history table of node d27de212-6f32-4649ad76-a9ef1165fdcb: seastar::rpc::remote_verb_error (repair[667d4a59-63fb-4ca6-8feb-98da49946d8b]: range (minimum token,maximum token) is not in the format of (start, end]) This is because repair checks the end of the range to be repaired needs to be inclusive. When small_table_optimization is enabled for regular repair, a (minimum token,maximum token) will be used. To fix, we can relax the check of (start, end] for the min max range. Fixes #27220 Backport to all active branches. (cherry picked from commit `e97a504`) Parent PR: #27357 Closes scylladb/scylladb#27458	2026-01-19 11:09:42 +02:00
Nikos Dragazis	d21b37c8eb	test: database_test: Fix serialization of partition key The `make_key` lambda erroneously allocates a fixed 8-byte buffer (`sizeof(s.size())`) for variable-length strings, potentially causing uninitialized bytes to be included. If such bytes exist and they are not valid UTF-8 characters, deserialization fails: ``` ERROR 2026-01-16 08:18:26,062 [shard 0:main] testlog - snapshot_list_contains_dropped_tables: cql env callback failed, error: exceptions::invalid_request_exception (Exception while binding column p1: marshaling error: Validation failed - non-UTF8 character in a UTF8 string, at byte offset 7) ``` Fixes #28195. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com> Closes scylladb/scylladb#28197 (cherry picked from commit `8aca7b0eb9`) Closes scylladb/scylladb#28207	2026-01-19 09:44:21 +02:00
Botond Dénes	729cf6d5b6	Merge '[Backport 2025.1] raft topology: preserve IP -> ID mapping of a replacing node on restart' from Scylladb[bot] Note: only the fix for replace with different IP has been backported to 2025.1. The IP-based gossiper in 2025.1 has made backporting the fix for replace with the same IP too difficult. We currently do it only for a bootstrapping node, which is a bug. The missing IP can cause an internal error, for example, in the following scenario: - replace fails during streaming, - all live nodes are shut down before the rollback of replace completes, - all live nodes are restarted, - live nodes start hitting internal error in all operations that require IP of the replacing node (like client requests or REST API requests coming from nodetool). We fix the bug here, but we do it separately for replace with different IP and replace with the same IP. For replace with different IP, we persist the IP -> host ID mapping in `system.peers` just like for bootstrap. That's necessary, since there is no other way to determine IP of the replacing node on restart. For replace with the same IP, we can't do the same. This would require deleting the row corresponding to the node being replaced from `system.peers`. That's fine in theory, as that node is permanently banned, so its IP shouldn't be needed. Unfortunately, we have many places in the code where we assume that IP of a topology member is always present in the address map or that a topology member is always present in the gossiper endpoint set. Examples of such places: - nodetool operations, - REST API endpoints, - `db::hints::manager::store_hint`, - `group0_voter_handler::update_nodes`. We could fix all those places and verify that drivers work properly when they see a node in the token metadata, but not in `system.peers`. However, that would be too risky to backport. We take a different approach. We recover IP of the replacing node on restart based on the state of the topology state machine and `system.peers` just after loading `system.peers`. We rely on the fact that group 0 is set up at this point. The only case where this assumption is incorrect is a restart in the Raft-based recovery procedure. However, hitting this problem then seems improbable, and even if it happens, we can restart the node again after ensuring that no client and REST API requests come before replace is rolled back on the new topology coordinator. Hence, it's not worth to complicate the fix (by e.g. looking at the persistent topology state instead of the in-memory state machine). Fixes #28057 Backport this PR to all branches as it fixes a problematic bug. - (cherry picked from commit `fc4c2df2ce`) - (cherry picked from commit `4526dd93b1`) - (cherry picked from commit `749b0278e5`) - (cherry picked from commit `0fed9f94f8`) Manually cherry-picked: - `90b5b2c5f5` - `92b165b8c0` Parent PR: #27435 Closes scylladb/scylladb#28096 * github.com:scylladb/scylladb: test: introduce test_full_shutdown_during_replace utils: error_injection: allow aborting wait_for_message raft topology: preserve IP -> ID mapping of a replacing node on restart pylib/rest_client.py: encode injection name utils/error_injection: allow to abort `injection_handler::wait_for_message()`	2026-01-19 09:43:43 +02:00
Raphael S. Carvalho	ab19da2bd7	replica: Fix race between drop table and merge completion handling Consider this: 1) merge finishes, wakes up fiber to merge compaction groups 2) drop table happens, which in turn invokes truncate underneath 3) merge fiber stops old groups 4) truncate disables compaction on all groups, but the ones stopped 5) truncate performs a check that compaction has been disabled on all groups, including the ones stopped 6) the check fails because groups being stopped didn't have compaction explicitly disabled on them To fix it, the check on step 6 will ignore groups that have been stopped, since those are not eligible for having compaction explicitly disabled on them. The compaction check is there, so ongoing compaction will not propagate data being truncated, but here it happens in the context of drop table which doesn't leave anything behind. Also, a group stopped is somewhat equivalent to compaction disabled on it, since the procedure to stop a group stops all ongoing compaction and eventually removes its state from compaction manager. Fixes #25551. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#25563 (cherry picked from commit `149f9d8448`) Closes scylladb/scylladb#25630	2026-01-19 06:40:30 +02:00
Calle Wilund	74a721b9db	db::commitlog: Fix sanity check error on race between segment flushing and oversized alloc Fixes #27992 When doing a commit log oversized allocation, we lock out all other writers by grabbing the _request_controller semaphore fully (max capacity). We thereafter assert that the semaphore is in fact zero. However, due to how things work with the bookkeep here, the semaphore can in fact become negative (some paths will not actually wait for the semaphore, because this could deadlock). Thus, if, after we grab the semaphore and execution actually returns to us (task schedule), new_buffer via segment::allocate is called (due to a non-fully-full segment), we might in fact grab the segment overhead from zero, resulting in a negative semaphore. The same problem applies later when we try to sanity check the return of our permits. Fix is trivial, just accept less-than-zero values, and take same possible ltz-value into account in exit check (returning units) Added whitebox (special callback interface for sync) unit test that provokes/creates the race condition explicitly (and reliably). Closes scylladb/scylladb#27998 (cherry picked from commit `a7cdb602e1`) Closes scylladb/scylladb#28095	2026-01-16 16:24:43 +02:00
Patryk Jędrzejczak	a1eec6f495	test: test_group0_schema_versioning: wait for schema sync in system.local `test_schema_versioning_with_recovery` is currently flaky. It performs a write with CL=ALL and then checks if the schema version is the same on all nodes by calling `verify_table_versions_synced`. All nodes are expected to sync their schema before handling the replica write. The node in RECOVERY mode should do it through a schema pull, and other nodes should do it through a group 0 read barrier. The problem is in `verify_local_schema_versions_synced` that compares the schema versions in `system.local`. The node in RECOVERY mode updates the schema version in `system.local` after it acknowledges the replica write as completed. Hence, the check can fail. We fix the problem by making the function wait until the schema versions match. Note that RECOVERY mode is about to be retired together with the whole gossip-based topology in 2026.2. So, this test is about to be deleted. However, we still want to fix it, so that it doesn't bother us in older branches. Fixes #23803 Closes scylladb/scylladb#28114 (cherry picked from commit `6b5923c64e`) Closes scylladb/scylladb#28172	2026-01-16 11:23:09 +01:00
Sergey Zolotukhin	c0f730bbaf	test: disable test_start_bootstrapped_with_invalid_seed The test intermittently fails when an invalid DNS name is resolved, likely due to ISP DNS error hijacking (see scylladb/scylladb#28153). Disable this test to unblock CI. Fixes scylladb/scylladb#28153 Closes scylladb/scylladb#28162 (cherry picked from commit `799d837295`)	2026-01-15 17:56:10 +02:00
Patryk Jędrzejczak	fe8738540e	test: introduce test_full_shutdown_during_replace (cherry picked from commit `749b0278e5`)	2026-01-13 17:14:19 +01:00
Petr Gusev	1b16fabbf6	pylib/rest_client.py: encode injection name Sometimes it's convenient to use slashes in injection names, for example my_component/my_method/my_condition. Without quote() we get 'handler not found' error from Scylla. (cherry picked from commit `92b165b8c0`)	2026-01-13 17:14:19 +01:00
Patryk Jędrzejczak	52f20e66ef	Merge '[Backport 2025.1] database: truncate_table_on_all_shards: consider can_flush on all shards' from Scylladb[bot] Currently, database::truncate_table_on_all_shards calls the table::can_flush only on the coordinator shard and therefore it may miss shards with dirty data if the coordinator shard happens to have empty memtables, leading to clearing the memtables with dirty data rather than flushing them. This change fixes that by making flush safe to be called, even if the memtable list is empty, and calling it on every shard that can flush (i.e. seal_immediate_fn is engaged). Also, change database_test::do_with_some_data is use random keys instead of hard-coded key names, to reproduce this issue with `snapshot_list_contains_dropped_tables`. Fixes #27639 * The issue exists since forever and might cause data loss due to wrongly clearing the memtable, so it needs backport to all live versions - (cherry picked from commit `ec4069246d`) - (cherry picked from commit `5be6b80936`) - (cherry picked from commit `0342a24ee0`) - (cherry picked from commit `02ee341a03`) - (cherry picked from commit `2a803d2261`) - (cherry picked from commit `93b827c185`) - (cherry picked from commit `ebd667a8e0`) Parent PR: #27643 Closes scylladb/scylladb#28067 * https://github.com/scylladb/scylladb: test: database_test: do_with_some_data: randomize keys database: truncate_table_on_all_shards: drop outdated TODO comment database: truncate_table_on_all_shards: consider can_flush on all shards memtable_list: unify can_flush and may_flush test: database_test: add test_flush_empty_table_waits_on_outstanding_flush replica: table, storage_group, compaction_group: add needs_flush test: database_test: do_with_some_data_in_thread: accept void callback function	2026-01-12 11:27:01 +01:00
Tomasz Grabiec	bec19944ca	test: cluster: Fix NoHostAvailable error in test_not_enough_token_owners The driver must see server_c before we stop server_a, otherwise there will be no live host in the pool when we attempt to drop the keyspace: ``` @pytest.mark.asyncio async def test_not_enough_token_owners(manager: ManagerClient): """ Test that: - the first node in the cluster cannot be a zero-token node - removenode and decommission of the only token owner fail in the presence of zero-token nodes - removenode and decommission of a token owner fail in the presence of zero-token nodes if the number of token owners would fall below the RF of some keyspace using tablets """ logging.info('Trying to add a zero-token server as the first server in the cluster') await manager.server_add(config={'join_ring': False}, property_file={"dc": "dc1", "rack": "rz"}, expected_error='Cannot start the first node in the cluster as zero-token') logging.info('Adding the first server') server_a = await manager.server_add(property_file={"dc": "dc1", "rack": "r1"}) logging.info('Adding two zero-token servers') # The second server is needed only to preserve the Raft majority. server_b = (await manager.servers_add(2, config={'join_ring': False}, property_file={"dc": "dc1", "rack": "rz"}))[0] logging.info(f'Trying to decommission the only token owner {server_a}') await manager.decommission_node(server_a.server_id, expected_error='Cannot decommission the last token-owning node in the cluster') logging.info(f'Stopping {server_a}') await manager.server_stop_gracefully(server_a.server_id) logging.info(f'Trying to remove the only token owner {server_a} by {server_b}') await manager.remove_node(server_b.server_id, server_a.server_id, expected_error='cannot be removed because it is the last token-owning node in the cluster') logging.info(f'Starting {server_a}') await manager.server_start(server_a.server_id) logging.info('Adding a normal server') await manager.server_add(property_file={"dc": "dc1", "rack": "r2"}) cql = manager.get_cql() await wait_for_cql_and_get_hosts(cql, [server_a], time.time() + 60) > async with new_test_keyspace(manager, "WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 2} AND tablets = { 'enabled': true }") as ks_name: test/cluster/test_not_enough_token_owners.py:57: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ /usr/lib64/python3.14/contextlib.py:221: in __aexit__ await anext(self.gen) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ manager = <test.pylib.manager_client.ManagerClient object at 0x7f37efe00830> opts = "WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 2} AND tablets = { 'enabled': true }" host = None @asynccontextmanager async def new_test_keyspace(manager: ManagerClient, opts, host=None): """ A utility function for creating a new temporary keyspace with given options. It can be used in a "async with", as: async with new_test_keyspace(ManagerClient, '...') as keyspace: """ keyspace = await create_new_test_keyspace(manager.get_cql(), opts, host) try: yield keyspace except: logger.info(f"Error happened while using keyspace '{keyspace}', the keyspace is left in place for investigation") raise else: > await manager.get_cql().run_async("DROP KEYSPACE " + keyspace, host=host) E cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 127.69.108.39:9042 dc1>: ConnectionException('Pool for 127.69.108.39:9042 is shutdown')}) test/cluster/util.py:544: NoHostAvailable ``` Fixes #28011 Closes scylladb/scylladb#28040 (cherry picked from commit `34df158605`) Closes scylladb/scylladb#28063	2026-01-09 19:14:34 +01:00
Botond Dénes	f9be7b4672	reader_concurrency_semaphore: add protection against negative count resource leaks The semaphore has detection and protection against regular resource leaks, where some resources go unaccounted for and are not released by the time the semaphore is destroyed. There is no detection or protection against negative leaks: where resources are "made up" of thin air. This kind of leaks looks benign at first sight, a few extra resources won't hurt anyone so long as this is a small amount. But turns out that even a single extra count resource can defeat a very important anti-deadlock protection in can_admit_read(): the special case which admits a new permit regardless of memory resources, when all original count resources all available. This check uses ==, so if resource > original, the protection is defeated indefinitely. Instead of just changing == to >=, we add detection of such negative leaks to signal(), via on_internal_error_noexcept(). At this time I still don't now how this negative leak happens (the code doesn't confess), with this detection, hopefully we'll get a clue from tests or the field. Note that on_internal_error_noexcept() will not generate a coredump, unless ScyllaDB is explicitely configured to do so. In production, it will just generate an error log with a backtrace. The detection also clams the _resources to _initial_resources, to prevent any damage from the negativae leak. I just noticed that there is no unit test for the deadlock protection described above, so one is added in this PR, even if only loosely related to the rest of the patch. Fixes: SCYLLADB-163 Closes scylladb/scylladb#27764 (cherry picked from commit `e4da0afb8d`) Closes scylladb/scylladb#28000	2026-01-09 13:28:46 +02:00
Benny Halevy	7389784151	test: database_test: do_with_some_data: randomize keys With randomized keys, and since we're inserting only 2 keys, it is possible that they would end up owned only by a single shard, reproducing #27639 in snapshot_list_contains_dropped_tables. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `ebd667a8e0`)	2026-01-09 09:02:29 +02:00
Benny Halevy	3920cacadc	test: database_test: add test_flush_empty_table_waits_on_outstanding_flush Test that table::flush waits on outstanding flushes, even if the active memtable is empty Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `0342a24ee0`)	2026-01-09 08:57:39 +02:00
Benny Halevy	faecb2aabc	test: database_test: do_with_some_data_in_thread: accept void callback function Many test cases already assume `func` is being called a seastar thread and although the function they pass returns a (ready) future, it serves no purpose other than to conform to the interface. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `ec4069246d`)	2026-01-09 08:53:16 +02:00
Botond Dénes	a6367cc48c	Merge '[Backport 2025.1] topology_coordinator: Add barrier to cleanup_target' from Scylladb[bot] Consider the following scenario: 1. A table has RF=3 and writes use CL=QUORUM 2. One node is down 3. There is a pending tablet migration from the unavailable node that is reverted During the revert, there can be a time window where the pending replica being cleaned up still accepts writes. This leads to write failures, as only two nodes (out of four) are able to acknowledge writes. This patch fixes the issue by adding a barrier to the cleanup_target tablet transition state, ensuring that the coordinator switches back to the previous replica set before cleanup is triggered. Fixes https://github.com/scylladb/scylladb/issues/26512 It's a pre existing issue. Backport is required to all recent 2025.x versions. - (cherry picked from commit `669286b1d6`) - (cherry picked from commit `67f1c6d36c`) - (cherry picked from commit `6163fedd2e`) Parent PR: #27413 Closes scylladb/scylladb#27425 * github.com:scylladb/scylladb: topology_coordinator: Fix the indentation for the cleanup_target case topology_coordinator: Add barrier to cleanup_target test_node_failure_during_tablet_migration: Increase RF from 2 to 3	2026-01-08 18:09:37 +02:00
Botond Dénes	9b10a6328d	Merge '[Backport 2025.1] db: repair: do not update repair_time if batchlog replay failed' from Scylladb[bot] Currently, batchlog replay is considered successful even if all batches fail to be sent (they are replayed later). However, repair requires all batches to be sent successfully. Currently, if batchlog isn't cleared, the repair never learns and updates the repair_time. If GC mode is set to "repair", this means that the tombstones written before the repair_time (minus propagation_delay) can be GC'd while not all batches were replied. Consider a scenario: - Table t has a row with (pk=1, v=0); - There is an entry in the batchlog that sets (pk=1, v=1) in table t; - The row with pk=1 is deleted from table t; - Table t is repaired: - batchlog reply fails; - repair_time is updated; - propagation_delay seconds passes and the tombstone of pk=1 is GC'd; - batchlog is replayed and (pk=1, v=1) inserted - data resurrection! Do not update repair_time if sending any batch fails. The data is still repaired. For tablet repair the repair runs, but at the end the exception is passed to topology coordinator. Thanks to that the repair_time isn't updated. The repair request isn't removed as well, due to which the repair will need to rerun. Apart from that, a batch is removed from the batchlog if its version is invalid or unknown. The condition on which we consider a batch too fresh to replay is updated to consider propagation_delay. Fixes: https://github.com/scylladb/scylladb/issues/24415 Data resurrection fix; needs backport to all versions - (cherry picked from commit `502b03dbc6`) - (cherry picked from commit `904183734f`) - (cherry picked from commit `7f20b66eff`) - (cherry picked from commit `e1b2180092`) - (cherry picked from commit `d436233209`) - (cherry picked from commit `1935268a87`) - (cherry picked from commit `6fc43f27d0`) Parent PR: #26319 Closes scylladb/scylladb#26752 * github.com:scylladb/scylladb: repair: throw if flush failed in get_flush_time db: fix indentation test: add reproducer for data resurrection repair: fail tablet repair if any batch wasn't sent successfully db/batchlog_manager: fix making decision to skip batch replay db: repair: throw if replay fails db/batchlog_manager: delete batch with incorrect or unknown version db/batchlog_manager: coroutinize replay_all_failed_batches	2026-01-08 18:07:03 +02:00
Patryk Jędrzejczak	310ce822c9	Merge '[Backport 2025.1] test/raft: use valid sentinel in liveness check to prevent digest errors' from Scylladb[bot] Replace -1 with 0 for the liveness check operation to avoid triggering digest validation failures. This prevents rare fatal errors when the cluster is recovering and ensures the test does not violate append_seq invariants. The value -1 was causing invalid digest results in the append_seq structure, leading to assertion failures. This could happen when the sentinel value was the first (or only) element being appended, resulting in a digest that did not match the expected value. By using 0 instead, we ensure that the digest calculations remain valid and consistent with the expected behavior of the test. The specific value of the sentinel is not important, as long as it is a valid elem_t that does not violate the invariants of the append_seq structure. In particular, the sentinel value is typically used only when no valid result is received from any server in the current loop iteration, in which case the loop will retry. Fixes: scylladb/scylladb#27307 Backporting to active branches - this is a test-only fix (low risk) for a flaky test that exists in older branches (thus affects the CI of active branches). - (cherry picked from commit `3af5183633`) - (cherry picked from commit `4ba3e90f33`) Parent PR: #28010 Closes scylladb/scylladb#28036 * https://github.com/scylladb/scylladb: test/raft: use valid sentinel in liveness check to prevent digest errors test/raft: improve debugging in randomized_nemesis_test test/raft: improve reporting in the randomized_nemesis_test digest functions	2026-01-08 15:42:08 +01:00
Aleksandra Martyniuk	e1776dc828	test: rename duplicate tests There are two test with name test_repair_options_hosts_tablets in test/nodetool/test_cluster_repair.py and and two test_repair_keyspace in test/nodetool/test_repair.py. Due to that one of each pair is ignored. Rename the tests so that they are unique. Fixes: https://github.com/scylladb/scylladb/issues/27701. Closes scylladb/scylladb#27720 (cherry picked from commit `bbe64e0e2a`) Closes scylladb/scylladb#27845	2026-01-08 14:52:38 +02:00

1 2 3 4 5 ...

8608 Commits