scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-20 00:20:47 +00:00

Author	SHA1	Message	Date
Yaniv Michael Kaul	ba21da2e02	compaction: set_skip_when_empty() for validation_errors metric Add .set_skip_when_empty() to compaction_manager::validation_errors. This metric only increments when scrubbing encounters out-of-order or invalid mutation fragments in SSTables, indicating data corruption. It is almost always zero and creates unnecessary reporting overhead. AI-Assisted: yes Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>	2026-04-06 14:42:07 +03:00
Avi Kivity	b4f652b7c1	test: fix flaky test_create_ks_auth by removing bad retry timeout get_session() was passing timeout=0.1 to patient_exclusive_cql_connection and patient_cql_connection, leaving only 0.1 seconds for the retry loop in retry_till_success(). Since each connection attempt can take up to 5 seconds (connect_timeout=5), the retry loop effectively got only one attempt with no chance to retry on transient NoHostAvailable errors. Use the default timeout=30 seconds, consistent with all other callers. Fixes: SCYLLADB-1373 Closes scylladb/scylladb#29332	2026-04-05 19:13:15 +03:00
Jenkins Promoter	ab4a2cdde2	Update pgo profiles - aarch64	2026-04-05 16:58:02 +03:00
Jenkins Promoter	b97cf0083c	Update pgo profiles - x86_64	2026-04-05 16:00:15 +03:00
Nikos Dragazis	6d50e67bd2	scylla_swap_setup: Remove Before=swap.target dependency from swap unit When a Scylla node starts, the scylla-image-setup.service invokes the `scylla_swap_setup` script to provision swap. This script allocates a swap file and creates a swap systemd unit to delegate control to systemd. By default, systemd injects a Before=swap.target dependency into every swap unit, allowing other services to use swap.target to wait for swap to be enabled. On Azure, this doesn't work so well because we store the swap file on the ephemeral disk [1] which has network dependencies (`_netdev` mount option, configured by cloud-init [2]). This makes the swap.target indirectly depend on the network, leading to dependency cycles such as: swap.target -> mnt-swapfile.swap -> mnt.mount -> network-online.target -> network.target -> systemd-resolved.service -> tmp.mount -> swap.target This patch breaks the cycle by removing the swap unit from swap.target using DefaultDependencies=no. The swap unit will still be activated via WantedBy=multi-user.target, just not during early boot. Although this problem is specific to Azure, this patch applies the fix to all clouds to keep the code simple. Fixes #26519. Fixes SCYLLADB-1257 [1] https://github.com/scylladb/scylla-machine-image/pull/426 [2] https://github.com/canonical/cloud-init/pull/1213#issuecomment-1026065501 Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com> Closes scylladb/scylladb#28504	2026-04-05 15:07:50 +03:00
Tomasz Grabiec	74542be5aa	test: pylib: Ignore exceptions in wait_for() ManagerClient::get_ready_cql() calls server_sees_others(), which waits for servers to see each other as alive in gossip. If one of the servers is still early in boot, the RESTful API call to "gossiper/endpoint/live" may fail. It throws an exception, which currently terminates the wait_for() and propagates up, failing the test. Fix this by ignoring errors when polling inside wait_for. In case of timeout, we log the last exception. This should fix the problem not only in this case, but for all uses of wait_for(). Example output: ``` pred = <function ManagerClient.server_sees_others.<locals>._sees_min_others at 0x7f022af9a140> deadline = 1775218828.9172852, period = 1.0, before_retry = None backoff_factor = 1.5, max_period = 1.0, label = None async def wait_for( pred: Callable[[], Awaitable[Optional[T]]], deadline: float, period: float = 0.1, before_retry: Optional[Callable[[], Any]] = None, backoff_factor: float = 1.5, max_period: float = 1.0, label: Optional[str] = None) -> T: tag = label or getattr(pred, '__name__', 'unlabeled') start = time.time() retries = 0 last_exception: Exception \| None = None while True: elapsed = time.time() - start if time.time() >= deadline: timeout_msg = f"wait_for({tag}) timed out after {elapsed:.2f}s ({retries} retries)" if last_exception is not None: timeout_msg += ( f"; last exception: {type(last_exception).__name__}: {last_exception}" ) raise AssertionError(timeout_msg) from last_exception raise AssertionError(timeout_msg) try: > res = await pred() test/pylib/util.py:80: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ async def _sees_min_others(): > raise Exception("asd") E Exception: asd test/pylib/manager_client.py:802: Exception The above exception was the direct cause of the following exception: manager = <test.pylib.manager_client.ManagerClient object at 0x7f022af7e7b0> @pytest.mark.asyncio async def test_auth_after_reset(manager: 
ManagerClient) -> None: servers = await manager.servers_add(3, config=auth_config, auto_rack_dc="dc1") > cql, _ = await manager.get_ready_cql(servers) test/cluster/auth_cluster/test_auth_after_reset.py:33: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ test/pylib/manager_client.py:137: in get_ready_cql await self.servers_see_each_other(servers) test/pylib/manager_client.py:820: in servers_see_each_other await asyncio.gather(others) test/pylib/manager_client.py:806: in server_sees_others await wait_for(_sees_min_others, time() + interval, period=.5) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ pred = <function ManagerClient.server_sees_others.<locals>._sees_min_others at 0x7f022af9a140> deadline = 1775218828.9172852, period = 1.0, before_retry = None backoff_factor = 1.5, max_period = 1.0, label = None async def wait_for( pred: Callable[[], Awaitable[Optional[T]]], deadline: float, period: float = 0.1, before_retry: Optional[Callable[[], Any]] = None, backoff_factor: float = 1.5, max_period: float = 1.0, label: Optional[str] = None) -> T: tag = label or getattr(pred, '__name__', 'unlabeled') start = time.time() retries = 0 last_exception: Exception \| None = None while True: elapsed = time.time() - start if time.time() >= deadline: timeout_msg = f"wait_for({tag}) timed out after {elapsed:.2f}s ({retries} retries)" if last_exception is not None: timeout_msg += ( f"; last exception: {type(last_exception).__name__}: {last_exception}" ) > raise AssertionError(timeout_msg) from last_exception E AssertionError: wait_for(_sees_min_others) timed out after 45.30s (46 retries); last exception: Exception: asd test/pylib/util.py:76: AssertionError ``` Fixes a failure observed in test_auth_after_reset: ``` manager = <test.pylib.manager_client.ManagerClient object at 0x7fb3740e1630> @pytest.mark.asyncio async def test_auth_after_reset(manager: ManagerClient) -> None: servers = await manager.servers_add(3, 
config=auth_config, auto_rack_dc="dc1") cql, _ = await manager.get_ready_cql(servers) await cql.run_async("ALTER ROLE cassandra WITH PASSWORD = 'forgotten_pwd'") logging.info("Stopping cluster") await asyncio.gather([manager.server_stop_gracefully(server.server_id) for server in servers]) logging.info("Deleting sstables") for table in ["roles", "role_members", "role_attributes", "role_permissions"]: await asyncio.gather([manager.server_wipe_sstables(server.server_id, "system", table) for server in servers]) logging.info("Starting cluster") # Don't try connect to the servers yet, with deleted superuser it will be possible only after # quorum is reached. await asyncio.gather([manager.server_start(server.server_id, connect_driver=False) for server in servers]) logging.info("Waiting for CQL connection") await repeat_until_success(lambda: manager.driver_connect(auth_provider=PlainTextAuthProvider(username="cassandra", password="cassandra"))) > await manager.get_ready_cql(servers) test/cluster/auth_cluster/test_auth_after_reset.py:50: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ test/pylib/manager_client.py:137: in get_ready_cql await self.servers_see_each_other(servers) test/pylib/manager_client.py:819: in servers_see_each_other await asyncio.gather(*others) test/pylib/manager_client.py:805: in server_sees_others await wait_for(_sees_min_others, time() + interval, period=.5) test/pylib/util.py:71: in wait_for res = await pred() test/pylib/manager_client.py:802: in _sees_min_others alive_nodes = await self.api.get_alive_endpoints(server_ip) test/pylib/rest_client.py:243: in get_alive_endpoints data = await self.client.get_json(f"/gossiper/endpoint/live", host=node_ip) test/pylib/rest_client.py:99: in get_json ret = await self._fetch("GET", resource_uri, response_type = "json", host = host, _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = <test.pylib.rest_client.TCPRESTClient object at 
0x7fb2404a0650> method = 'GET', resource = '/gossiper/endpoint/live', response_type = 'json' host = '127.15.252.8', port = 10000, params = None, json = None, timeout = None allow_failed = False async def _fetch(self, method: str, resource: str, response_type: Optional[str] = None, host: Optional[str] = None, port: Optional[int] = None, params: Optional[Mapping[str, str]] = None, json: Optional[Mapping] = None, timeout: Optional[float] = None, allow_failed: bool = False) -> Any: # Can raise exception. See https://docs.aiohttp.org/en/latest/web_exceptions.html assert method in ["GET", "POST", "PUT", "DELETE"], f"Invalid HTTP request method {method}" assert response_type is None or response_type in ["text", "json"], \ f"Invalid response type requested {response_type} (expected 'text' or 'json')" # Build the URI port = port if port else self.default_port if hasattr(self, "default_port") else None port_str = f":{port}" if port else "" assert host is not None or hasattr(self, "default_host"), "_fetch: missing host for " \ "{method} {resource}" host_str = host if host is not None else self.default_host uri = self.uri_scheme + "://" + host_str + port_str + resource logging.debug(f"RESTClient fetching {method} {uri}") client_timeout = ClientTimeout(total = timeout if timeout is not None else 300) async with request(method, uri, connector = self.connector if hasattr(self, "connector") else None, params = params, json = json, timeout = client_timeout) as resp: if allow_failed: return await resp.json() if resp.status != 200: text = await resp.text() > raise HTTPError(uri, resp.status, params, json, text) E test.pylib.rest_client.HTTPError: HTTP error 404, uri: http://127.15.252.8:10000/gossiper/endpoint/live, params: None, json: None, body: E {"message": "Not found", "code": 404} test/pylib/rest_client.py:77: HTTPError ``` Fixes: SCYLLADB-1367 Closes scylladb/scylladb#29323	2026-04-05 13:52:26 +03:00
Andrzej Jackowski	8c0920202b	test: protect populate_range in row_cache_test from bad_alloc When test_exception_safety_of_update_from_memtable was converted from manual fail_after()/catch to with_allocation_failures() in `74db08165d`, the populate_range() call ended up inside the failure injection scope without a scoped_critical_alloc_section guard. The other two tests converted in the same commit (test_exception_safety_of_transitioning... and test_exception_safety_of_partition_scan) were correctly guarded. Without the guard, the allocation failure injector can sometimes target an allocation point inside the cleanup path of populate_range(). In a rare corner case, this triggers a bad_alloc in a noexcept context (reader_concurrency_semaphore::stop()), causing std::terminate. Fixes SCYLLADB-1346 Closes scylladb/scylladb#29321	2026-04-04 21:13:26 +03:00
Botond Dénes	2c22d69793	Merge 'Pytest: fix variable handling in GSServer (mock) and ensure docker service logs go to test log as well' from Calle Wilund Fixes: SCYLLADB-1106 * Small fix in scylla_cluster - remove debug print * Fix GSServer::unpublish so it does not raise an exception if publish was not called beforehand * Improve dockerized_server so mock server logs echo to the test log to help diagnose CI failures (because we don't collect log files from mocks etc, and in any case correlation will be much easier). No backport needed. Closes scylladb/scylladb#29112 * github.com:scylladb/scylladb: dockerized_service: Convert log reader to pipes and push to test log test::cluster::conftest::GSServer: Fix unpublish for when publish was not called scylla_cluster: Use thread safe future signalling scylla_cluster: Remove left-over debug printout	2026-04-03 06:38:05 +03:00
Raphael S. Carvalho	b6ebbbf036	test/cluster/test_tablets2: Fix test_split_stopped_on_shutdown race with stale log messages The test was failing because the call to: await log.wait_for('Stopping.ongoing compactions') was missing the 'from_mark=log_mark' argument. The log mark was updated (line: log_mark = await log.mark()) immediately after detecting 'splitting_mutation_writer_switch_wait: waiting', and just before launching the shutdown task. However, the wait_for call on the following line was scanning from the beginning of the log, not from that mark. As a result, the search immediately matched old 'Stopping N tasks for N ongoing compactions for table system.X due to table removal' messages emitted during initial server bootstrap (for system.large_partitions, system.large_rows, system.large_cells), rather than waiting for the shutdown to actually stop the user-table split compaction. This caused the test to prematurely send the message to the 'splitting_mutation_writer_switch_wait' injection. The split compaction was unblocked before the shutdown had aborted it, so it completed successfully. Since the split succeeded, 'Failed to complete splitting of table' was never logged. Meanwhile, 'storage_service_drain_wait' was blocking do_drain() waiting for a message. With the split already done, the test was stuck waiting for the expected failure log that would never come (600s timeout). At the same time, after 60s the 'storage_service_drain_wait' injection timed out internally, triggering on_internal_error() which -- with --abort-on-internal-error=1 -- crashed the server (exit code -6). Fix: pass from_mark=log_mark to the wait_for('Stopping.ongoing compactions') call so it only matches messages that appear after the shutdown has started, ensuring the test correctly synchronizes with the shutdown aborting the user-table split compaction before releasing the injection. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1319. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#29311	2026-04-03 06:28:51 +03:00
Andrei Chekun	6526a78334	test.py: fix nodetool mock server port collision Replace the random port selection with an OS-assigned port. We open a temporary TCP socket, bind it to (ip, 0) with SO_REUSEADDR, read back the port number the OS selected, then close the socket before launching rest_api_mock.py. Add reuse_address=True and reuse_port=True to TCPSite in rest_api_mock.py so the server itself can also reclaim a TIME_WAIT port if needed. Fixes: SCYLLADB-1275 Closes scylladb/scylladb#29314	2026-04-02 16:24:07 +02:00
Botond Dénes	eb78498e07	test: fix flaky test_timeout_is_applied_on_lookup by using eventually_true On slow/overloaded CI machines the lowres_clock timer may not have fired after the fixed 2x sleep, causing the assertion on get_abort_exception() to fail. Replace the fixed sleep with sleep(1x) + eventually_true() which retries with exponential backoff, matching the pattern already used in test_time_based_cache_eviction. Fixes: SCYLLADB-1311 Closes scylladb/scylladb#29299	2026-04-01 18:20:11 +03:00
Robert Bindar	e7527392c4	test: close clients if cluster teardown throws Make sure the driver is stopped even if cluster teardown throws, so that stale driver connections cannot enter infinite reconnect loops that exhaust CPU resources. Fixes: SCYLLADB-1189 Signed-off-by: Robert Bindar <robert.bindar@scylladb.com> Closes scylladb/scylladb#29230	2026-04-01 17:22:19 +03:00
Tomasz Grabiec	2ec47a8a21	tests: address_map_test: Fix flakiness in debug mode due to task reordering Debug mode shuffles task position in the queue. So the following is possible: 1) shard 1 calls manual_clock::advance(). This expires timers on shard 1 and queues a background smp call to shard 0 which will expire timers there 2) the smp::submit_to(0, ...) from shard 1 called by the test submits the call 3) shard 0 creates tasks for both calls, but (2) is run first, and preempts the reactor 4) shard 1 sees the completion, completes m_svc.invoke_on(1, ..) 5) shard 0 inserts the completion from (4) before task from (1) 6) the check on shard 0: m.find(id1) fails because the timer is not expired yet To fix that, wait for timer expiration on shard 0, so that the test doesn't depend on task execution order. Note: I was not able to reproduce the problem locally using test.py --mode debug --repeat 1000. It happens in Jenkins very rarely, which is expected, as the scenario that leads to this is quite unlikely. Fixes SCYLLADB-1265 Closes scylladb/scylladb#29290	2026-04-01 17:17:35 +03:00
Aleksandra Martyniuk	4d4ce074bb	test: node_ops_tasks_tree: reconnect driver after topology changes The test exercises all five node operations (bootstrap, replace, rebuild, removenode, decommission) and by the end only one node out of four remains alive. The CQL driver session, however, still holds stale references to the dead hosts in its connection pool and load-balancing policy state. When the new_test_keyspace context manager exits and attempts DROP KEYSPACE, the driver routes the query to the dead hosts first, gets ConnectionShutdown from each, and throws NoHostAvailable before ever trying the single live node. Fix by calling driver_connect() after the decommission step, which closes the old session and creates a fresh one connected only to the servers the test manager reports as running. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1313. Closes scylladb/scylladb#29306	2026-04-01 17:13:11 +03:00
Botond Dénes	0351756b15	Merge 'test: fix fuzzy_test timeout in release mode' from Piotr Smaron The multishard_query_test/fuzzy_test was timing out (SIGKILL after 15 minutes) in release mode CI. In release mode the test generates up to 64 partitions with up to 1000 clustering rows and 1000 range tombstones each. With deeply nested randomly-generated types (e.g. frozen<map<varint, frozen<map<frozen<tuple<...>>>>>>), this volume of data can exceed the 15-minute CI timeout. Reduce the release-mode clustering-row and range-tombstone distributions from 0-1000 to 0-200. This caps the worst case at ~12,800 rows -- still 2x the devel-mode maximum (0-100) and sufficient to exercise multi-partition paged scanning with many pages. Fixes: SCYLLADB-1270 No need to backport for now, only appeared on master. Closes scylladb/scylladb#29293 * github.com:scylladb/scylladb: test: clean up fuzzy_test_config and add comments test: fix fuzzy_test timeout in release mode	2026-04-01 11:50:15 +03:00
Andrei Chekun	18f41dcd71	test.py: introduce new scheduler for choosing job count This commit improves how test.py chooses the default number of parallel jobs. This update keeps the logic of selecting the number of jobs from memory and CPU limits but simplifies the heuristic so it is smoother and easier to reason about. This avoids discontinuities such as neighboring machine sizes producing unexpectedly different job counts, and behaves more predictably on asymmetric machines where CPU and RAM do not scale together. Compared to the current threshold-based version, this approach: - avoids hard jumps around memory cutoffs - avoids bucketed debug scaling based on CPU count - keeps CPU and memory as separate constraints and combines them in one place - avoids double-penalizing debug mode - is easier to tune later by adjusting a few constants instead of rewriting branching logic Closes scylladb/scylladb#28904	2026-04-01 11:11:15 +03:00
Avi Kivity	d438e35cdd	test/cluster: fix race in test_insert_failure_standalone audit log query get_audit_partitions_for_operation() returns None when no audit log rows are found. In _test_insert_failure_doesnt_report_success_assign_nodes, this None is passed to set(), causing TypeError: 'NoneType' object is not iterable. The audit log entry may not yet be visible immediately after executing the INSERT, so use wait_for() from test.pylib.util with exponential backoff to poll until the entry appears. Import it as wait_for_async to avoid shadowing the existing wait_for from test.cluster.dtest.dtest_class, which has a different signature (timeout vs deadline). Fixes SCYLLADB-1330 Closes scylladb/scylladb#29289	2026-04-01 10:59:02 +03:00
Botond Dénes	2d2ff4fbda	sstables: use chunked_managed_vector for promoted indexes in partition_index_page Switch _promoted_indexes storage in partition_index_page from managed_vector to chunked_managed_vector to avoid large contiguous allocations. Avoid allocation failure (or crashes with --abort-on-internal-error) when large partitions have enough promoted index entries to trigger a large allocation with managed_vector. Fixes: SCYLLADB-1315 Closes scylladb/scylladb#29283	2026-03-31 18:43:57 +03:00
Piotr Smaron	2ce409dca0	test: clean up fuzzy_test_config and add comments Remove the unused timeout field from fuzzy_test_config. It was declared, initialized per build mode, and logged, but never actually enforced anywhere. Document the intentionally small max_size (1024 bytes) passed to read_partitions_with_paged_scan in run_fuzzy_test_scan: it forces many pages per scan to stress the paging and result-merging logic.	2026-03-31 17:13:26 +02:00
Piotr Smaron	df2924b2a3	test: fix fuzzy_test timeout in release mode The multishard_query_test/fuzzy_test was timing out (SIGKILL after 15 minutes) in release mode CI. In release mode the test generates up to 64 partitions with up to 1000 clustering rows and 1000 range tombstones each. With deeply nested randomly-generated types (e.g. frozen<map<varint, frozen<map<frozen<tuple<...>>>>>>), this volume of data can exceed the 15-minute CI timeout. Reduce the release-mode clustering-row and range-tombstone distributions from 0-1000 to 0-200. This caps the worst case at ~12,800 rows -- still 2x the devel-mode maximum (0-100) and sufficient to exercise multi-partition paged scanning with many pages. Fixes: SCYLLADB-1270	2026-03-31 17:13:06 +02:00
Piotr Szymaniak	6d8ec8a0c0	alternator: fix flaky test_update_condition_unused_entries_short_circuit The test was flaky because it stopped dc2_node immediately after an LWT write, before cross-DC replication could complete. The LWT commit uses LOCAL_QUORUM, which only guarantees persistence in the coordinator's DC. Replication to the remote DC is async background work, and CAS mutations don't store hints. Stopping dc2_node could drop in-flight RPCs, leaving DC1 without the mutation. Fix by polling both live DC1 nodes after the write to confirm cross-DC replication completed before stopping dc2_node. Both nodes must have the data so that the later ConsistentRead=True (LOCAL_QUORUM) read on restarted node1 is guaranteed to succeed. Fixes SCYLLADB-1267 Closes scylladb/scylladb#29287	2026-03-31 16:50:51 +03:00
Dawid Mędrek	f040f1b703	Merge 'raft: remake the read barrier optimization' from Patryk Jędrzejczak The approach taken in `1ae2ae50a6` turned out to be incorrect. The Raft member requesting a read barrier could incorrectly advance its commit_idx and break linearizability. We revert that commit in this PR. We also remake the read barrier optimization with a completely new approach. We make the leader replicate to the non-voting requester of a read barrier if its `commit_idx` is behind. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-998 No backport: the issue is present only in master. Closes scylladb/scylladb#29216 * github.com:scylladb/scylladb: raft: speed up read barrier requested by non-voters Revert "raft: read_barrier: update local commit_idx to read_idx when it's safe"	2026-03-31 15:11:56 +02:00
Avi Kivity	216d39883a	Merge 'test: audit: fix audit test syslog race' from Dario Mirovic Fix two independent race conditions in the syslog audit test that cause intermittent `assert 2 <= 1` failures in `assert_entries_were_added`. Datagram ordering race: `UnixSockerListener` used `ThreadingUnixDatagramServer`, where each datagram spawns a new thread. The notification barrier in `get_lines()` assumes FIFO handling, but the notification thread can win the lock before an audit entry thread, so `clear_audit_logs()` misses entries that arrive moments later. Fix: switch to sequential `UnixDatagramServer`. Config reload race: The live-update path used `wait_for_config` (REST API poll on shard 0) which can return before `broadcast_to_all_shards()` completes. Fix: wait for `"completed re-reading configuration file"` in the server log after each SIGHUP, which guarantees all shards have the new config. Fixes SCYLLADB-1277 This is a CI improvement for the latest code. No need for backport. Closes scylladb/scylladb#29282 * github.com:scylladb/scylladb: test: cluster: wait for full config reload in audit live-update path test: cluster: fix syslog listener datagram ordering race	2026-03-31 13:53:01 +03:00
Tomasz Grabiec	b355bb70c2	dtest/alternator: stop concurrent-requests test when workers hit limit `test_limit_concurrent_requests` could create far more tables than intended because worker threads looped indefinitely and only the probe path terminated the test. In practice, workers often hit `RequestLimitExceeded` first, but the test kept running and creating tables, increasing memory pressure and causing flakiness due to bad_alloc errors in logs. Fix by replacing the old probe-driven termination with worker-driven termination. Workers now run until any worker sees `RequestLimitExceeded`. Fixes SCYLLADB-1181 Closes scylladb/scylladb#29270	2026-03-31 13:35:50 +03:00
Patryk Jędrzejczak	b9f82f6f23	raft_group0: join_group0: fix join hang when node joins group 0 before post_server_start A joining node hung forever if the topology coordinator added it to the group 0 configuration before the node reached `post_server_start`. In that case, `server->get_configuration().contains(my_id)` returned true and the node broke out of the join loop early, skipping `post_server_start`. `_join_node_group0_started` was therefore never set, so the node's `join_node_response` RPC handler blocked indefinitely. Meanwhile the topology coordinator's `respond_to_joining_node` call (which has no timeout) hung forever waiting for the reply that never came. Fix by only taking the early-break path when not starting as a follower (i.e. when the node is the discovery leader or is restarting). A joining node must always reach `post_server_start`. We also provide a regression test. It takes 6s in dev mode. Fixes SCYLLADB-959 Closes scylladb/scylladb#29266	2026-03-31 12:33:56 +02:00
Dario Mirovic	0cb63fb669	test: cluster: wait for full config reload in audit live-update path _apply_config_to_running_servers used wait_for_config (REST API poll) to confirm live config updates. The REST API reads from shard 0 only, so it can return before broadcast_to_all_shards() completes — other shards may still have stale audit config, generating unexpected entries. Additionally, server_remove_config_option for absent keys sent separate SIGHUPs before server_update_config, and the single wait_for_config at the end could match a completion from an earlier SIGHUP. Wait for "completed re-reading configuration file" in the server log after each SIGHUP-producing operation. This message is logged only after both read_config() and broadcast_to_all_shards() finish, guaranteeing all shards have the new config. Each operation gets its own mark+wait so no stale completion is matched. Fixes SCYLLADB-1277	2026-03-31 02:27:11 +02:00
Dario Mirovic	1d623196eb	test: cluster: fix syslog listener datagram ordering race UnixSockerListener used ThreadingUnixDatagramServer, which spawns a new thread per datagram. The notification barrier in get_lines() relies on all prior datagrams being handled before the notification. With threading, the notification handler can win the lock before an audit entry handler, so get_lines() returns before the entry is appended. clear_audit_logs() then clears an incomplete buffer, and the late entry leaks into the next test's before/after diff. Switch to sequential UnixDatagramServer. The server thread now handles datagrams in kernel FIFO order, so the notification is always processed after all preceding audit entries. Refs SCYLLADB-1277	2026-03-31 02:27:11 +02:00
Patryk Jędrzejczak	ba54b2272b	raft: speed up read barrier requested by non-voters We achieve this by making the leader replicate to the non-voting requester of a read barrier if its commit_idx is behind. There are some corner cases where the new `replicate_to(*opt_progress, true);` call will be a no-op, while the corresponding call in `tick_leader()` would result in sending the AppendEntries RPC to the follower. These cases are: - `progress.state == follower_progress::state::PROBE && progress.probe_sent`, - `progress.state == follower_progress::state::PIPELINE && progress.in_flight == follower_progress::max_in_flight`. We could try to improve the optimization by including some of the cases above, but it would only complicate the code without noticeable benefits (at least for group0). Note: this is the second attempt for this optimization. The first approach turned out to be incorrect and was reverted in the previous commit. The performance improvement is the same as in the previous case.	2026-03-30 15:56:24 +02:00
Patryk Jędrzejczak	4913acd742	Revert "raft: read_barrier: update local commit_idx to read_idx when it's safe" This reverts commit `1ae2ae50a6`. The reverted change turned out to be incorrect. The Raft member requesting a read barrier could incorrectly advance its commit_idx and break linearizability. More details in https://scylladb.atlassian.net/browse/SCYLLADB-998?focusedCommentId=42935	2026-03-30 15:56:24 +02:00
Andrzej Jackowski	ab43420d30	test: use exclusive driver connection in test_limited_concurrency_of_writes Use get_cql_exclusive(node1) so the driver only connects to node1 and never attempts to contact the stopped node2. The test was flaky because the driver received `Host has been marked down or removed` from node2. Fixes: SCYLLADB-1227 Closes scylladb/scylladb#29268	2026-03-30 11:50:44 +02:00
Botond Dénes	068a7894aa	test/cluster: fix flaky test_cleanup_stop by using asyncio.sleep The test was using time.sleep(1) (a blocking call) to wait after scheduling the stop_compaction task, intending to let it register on the server before releasing the sstable_cleanup_wait injection point. However, time.sleep() blocks the asyncio event loop entirely, so the asyncio.create_task(stop_compaction) task never gets to run during the sleep. After the sleep, the directly-awaited message_injection() runs first, releasing the injection point before stop_compaction is even sent. By the time stop_compaction reaches Scylla, the cleanup has already completed successfully -- no exception is raised and the test fails. Fix by replacing time.sleep(1) with await asyncio.sleep(1), which yields control to the event loop and allows the stop_compaction task to actually send its HTTP request before message_injection is called. Fixes: SCYLLADB-834 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#29202	2026-03-30 11:40:47 +03:00
Nadav Har'El	d32fe72252	Merge 'alternator: check concurrency limit before memory acquisition' from Łukasz Paszkowski Fix the ordering of the concurrency limit check in the Alternator HTTP server so it happens before memory acquisition, and reduce test pressure to avoid LSA exhaustion on the memory-constrained test node. The patch moves the concurrency check to right after the content-length early-out, before any memory acquisition or I/O. The check was originally placed before memory acquisition but was inadvertently moved after it during a refactoring. This allowed unlimited requests to pile up consuming memory, reading bodies, verifying signatures, and decompressing — all before being rejected. Restores the original ordering and mirrors the CQL transport (`transport/server.cc`). Lowers `concurrent_requests_limit` from 5 to 3 and the thread multiplier from 5 to 2 (6 threads instead of 25). This is still sufficient to reliably trigger RequestLimitExceeded, while keeping flush pressure within what 512MB per shard can sustain. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1248 Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1181 The test started to fail quite recently. It affects master only. No backport is needed. We might want to consider backporting a commit moving the concurrency check earlier. Closes scylladb/scylladb#29272 * github.com:scylladb/scylladb: test: reduce concurrent-request-limit test pressure to avoid LSA exhaustion alternator: check concurrency limit before memory acquisition	2026-03-29 11:08:28 +03:00
Łukasz Paszkowski	b8e3ef0c64	test: reduce concurrent-request-limit test pressure to avoid LSA exhaustion The test_limit_concurrent_requests dtest uses concurrent CreateTable requests to verify Alternator's concurrency limiting. Each admitted CreateTable triggers Raft consensus, schema mutations, and memtable flushes—all of which consume LSA memory. On the 1 GB test node (2 SMP × 512 MB), the original settings (limit=5, 25 threads) created enough flush pressure to exhaust the LSA emergency reserve, producing logalloc::bad_alloc errors in the node log. The test was always marginal under these settings and became flaky as new system tables increased baseline LSA usage over time. Lower concurrent_requests_limit from 5 to 3 and the thread multiplier from 5 to 2 (6 threads total). This is still well above the limit and sufficient to reliably trigger RequestLimitExceeded, while keeping flush pressure within what 512 MB per shard can sustain.	2026-03-28 20:40:33 +01:00
Łukasz Paszkowski	a86928caa1	alternator: check concurrency limit before memory acquisition The concurrency limit check in the Alternator server was positioned after memory acquisition (get_units), request body reading (read_entire_stream), signature verification, and decompression. This allowed unlimited requests to pile up consuming memory before being rejected, exhausting LSA memory and causing logalloc::bad_alloc errors that cascade into Raft applier and topology coordinator failures, breaking subsequent operations. Without this fix, test_limit_concurrent_requests on a 1GB node produces 50 logalloc::bad_alloc errors and cascading failures: reads from system.scylla_local fail, the Raft applier fiber stops, the topology coordinator stops, and all subsequent CreateTable operations fail with InternalServerError (500). With this fix, the cascade is eliminated -- admitted requests may still cause LSA pressure on a memory-constrained node, but the server remains functional. Move the concurrency check to right after the content-length early-out, before any memory acquisition or I/O. This mirrors the CQL transport which correctly checks concurrency before memory acquisition (transport/server.cc). The concurrency check was originally added in `1b8c946ad7` (Sep 2020) before memory acquisition, which at the time lived inside with_gate (after the concurrency gate). The ordering was inverted by `f41dac2a3a` (Mar 2021, "avoid large contiguous allocation for request body"), which moved get_units() earlier in the function to reserve memory before reading the newly-introduced content stream -- but inadvertently also moved it before the concurrency check. `c3593462a4` (Mar 2025) further worsened the situation by adding a 16MB fallback reservation for requests without Content-Length and ungzip/deflate decompression steps -- all before the concurrency check -- greatly increasing the memory consumed by requests that would ultimately be rejected.	2026-03-28 20:40:33 +01:00
Emil Maskovsky	9dad68e58d	raft: abort stale snapshot transfers when term changes The Bug Assertion failure: `SCYLLA_ASSERT(res.second)` in `raft/server.cc` when creating a snapshot transfer for a destination that already had a stale in-flight transfer. Root Cause If a node loses leadership and later becomes leader again before the next `io_fiber` iteration, the old transfer from the previous term can remain in `_snapshot_transfers` while `become_leader()` resets progress state. When the new term emits `install_snapshot(dst)`, `send_snapshot(dst)` tries to create a new entry for the same destination and can hit the assertion. The Fix Abort all in-flight snapshot transfers in `process_fsm_output()` when `term_and_vote` is persisted. A term/vote change marks existing transfers as stale, so we clean them up before dispatching messages from that batch and before any new snapshot transfer is started. With cross-term cleanup moved to the term-change path, `send_snapshot()` now asserts the within-term invariant that there is at most one in-flight transfer per destination. Fixes: SCYLLADB-862 Backport: The issue is reproducible in master, but is present in all active branches. Closes scylladb/scylladb#29092	2026-03-27 10:00:15 +01:00
Andrzej Jackowski	181ad9f476	Revert "audit: disable DDL by default" This reverts commit `c30607d80b`. With the default configuration, enabling DDL has no effect because no `audit_keyspaces` or `audit_tables` are specified. Including DDL in the default categories can be misleading for some customers, and ideally we would like to avoid it. However, DDL has been one of the default audit categories for years, and removing it risks silently breaking existing deployments that depend on it. Therefore, the recent change to disable DDL by default is reverted. Fixes: SCYLLADB-1155 Closes scylladb/scylladb#29169	2026-03-27 09:55:11 +01:00
Botond Dénes	854c374ebf	test/encryption: wait for topology convergence after abrupt restart test_reboot uses a custom restart function that SIGKILLs and restarts nodes sequentially. After all nodes are back up, the test proceeded directly to reads after wait_for_cql_and_get_hosts(), which only confirms CQL reachability. While a node is restarted, other nodes might execute global token metadata barriers, which advance the topology fence version. The restarted node has to learn about the new version before it can send reads/writes to the other nodes. The test issues reads as soon as the CQL port is opened, which might happen before the last restarted node learns of the latest topology version. If this node acts as a coordinator for reads/writes before this happens, these will fail as the other nodes will reject the ops with the outdated topology fence version. Fix this by replacing wait_for_cql_and_get_hosts() on the abrupt-restart path with the more robust get_ready_cql(), which makes sure servers see each other before refreshing the cql connection. This should ensure that nodes have exchanged gossip and converged on topology state before any reads are executed. The rolling_restart() path is unaffected as it handles this internally. Fixes: SCYLLADB-557 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#29211	2026-03-27 09:52:27 +01:00
Avi Kivity	b708e5d7c9	Merge 'test: fix race condition in test_crashed_node_substitution' from Sergey Zolotukhin `test_crashed_node_substitution` intermittently failed: ```python assert len(gossiper_eps) == (len(server_eps) + 1) ``` The test crashed the node right after a single ACK2 handshake (`finished do_send_ack2_msg`), assuming the node state was visible to all peers. However, since gossip is eventually consistent, the update may not have propagated yet, so some nodes did not see the failed node. This change: Wait until the gossiper state is visible on peers before continuing the test and asserting. Fixes: [SCYLLADB-1256](https://scylladb.atlassian.net/browse/SCYLLADB-1256). backport: this issue may affect CI for all branches, so should be backported to all versions. [SCYLLADB-1256]: https://scylladb.atlassian.net/browse/SCYLLADB-1256?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ Closes scylladb/scylladb#29254 * github.com:scylladb/scylladb: test: test_crashed_node_substitution: add docstring and fix whitespace test: fix race condition in test_crashed_node_substitution	2026-03-26 21:40:33 +02:00
Petr Gusev	c38e312321	test_lwt_fencing_upgrade: fix quorum failure due to gossip lag If lwt_workload() sends an update immediately after a rolling restart, the coordinator might still see a replica as down due to gossip lagging behind. Concurrently restarting another node leaves only one available replica, failing the LOCAL_QUORUM requirement for learn or eventually consistent sp::query() in sp::cas() and resulting in a mutation_write_failure_exception. We fix this problem by waiting for the restarted server to see 2 other peers. The server_change_version doesn't do that by default -- it passes wait_others=0 to server_start(). Fixes SCYLLADB-1136 Closes scylladb/scylladb#29234	2026-03-26 21:25:53 +02:00
bitpathfinder	627a8294ed	test: test_crashed_node_substitution: add docstring and fix whitespace Add a description of the test's intent and scenario; remove extra blanks.	2026-03-26 18:40:17 +01:00
bitpathfinder	5a086ae9b7	test: fix race condition in test_crashed_node_substitution `test_crashed_node_substitution` intermittently failed: ``` assert len(gossiper_eps) == (len(server_eps) + 1) ``` The test crashed the node right after a single ACK2 handshake ("finished do_send_ack2_msg"), assuming the node state was visible to all peers. However, since gossip is eventually consistent, the update may not have propagated yet, so some nodes did not see the failed node. This change: Wait until the gossiper state is visible on peers before continuing the test and asserting. Fixes: SCYLLADB-1256.	2026-03-26 18:25:05 +01:00
Robert Bindar	c575bbf1e8	test_refresh_deletes_uploaded_sstables should wait for sstables to get deleted SSTable unlinking is async, so in some cases it may happen that the upload dir is not empty immediately after refresh is done. This patch adjusts test_refresh_deletes_uploaded_sstables so it waits with a timeout till the upload dir becomes empty instead of just assuming the API will sync on sstables being gone. Fixes SCYLLADB-1190 Signed-off-by: Robert Bindar <robert.bindar@scylladb.com> Closes scylladb/scylladb#29215	2026-03-26 08:43:14 +03:00
Marcin Maliszkiewicz	7fdd650009	Merge 'test: audit: clean up test helper class naming' from Dario Mirovic Remove unused `pytest.mark.single_node` marker from `TestCQLAudit`. Rename `TestCQLAudit` to `CQLAuditTester` to reflect that it is a test helper, not a test class. This avoids accidental pytest collection and subsequent warning about `__init__`. Logs before the fixes: ``` test/cluster/test_audit.py:514: 14 warnings /home/dario/dev/scylladb/test/cluster/test_audit.py:514: PytestCollectionWarning: cannot collect test class 'TestCQLAudit' because it has a __init__ constructor (from: cluster/test_audit.py) @pytest.mark.single_node ``` Fixes SCYLLADB-1237 This is an addition to the latest master code. No backport needed. Closes scylladb/scylladb#29237 * github.com:scylladb/scylladb: test: audit: rename TestCQLAudit to CQLAuditTester test: audit: remove unused pytest.mark.single_node	2026-03-25 15:30:16 +01:00
Dario Mirovic	552a2d0995	test: audit: rename TestCQLAudit to CQLAuditTester pytest tries to collect tests for execution in several ways. One is to pick all classes that start with 'Test'. Those classes must not have a custom '__init__' constructor. TestCQLAudit does. TestCQLAudit after migration from test/cluster/dtest is not a test class anymore, but rather a helper class. There are two ways to fix this: 1. Add __test__ = False to the TestCQLAudit class 2. Rename it to not start with 'Test' Option 2 feels better because the new name itself does not convey the wrong message about its role. Fixes SCYLLADB-1237	2026-03-25 13:21:08 +01:00
Dario Mirovic	73de865ca3	test: audit: remove unused pytest.mark.single_node Remove unused pytest.mark.single_node in TestCQLAudit class. This is a leftover from audit tests migration from test/cluster/dtest to test/cluster. Refs SCYLLADB-1237	2026-03-25 13:18:37 +01:00
Marcin Maliszkiewicz	f988ec18cb	test/lib: fix port in-use detection in start_docker_service Previously, the result of when_all was discarded. when_all stores exceptions in the returned futures rather than throwing, so the outer catch(in_use&) could never trigger. Now we capture the when_all result and inspect each future individually to properly detect in_use from either stream. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1216 Closes scylladb/scylladb#29219	2026-03-25 11:45:53 +02:00
Artsiom Mishuta	cd1679934c	test/pylib: use exponential backoff in wait_for() Change wait_for() defaults from period=1s/no backoff to period=0.1s with 1.5x backoff capped at 1.0s. This catches fast conditions in 100ms instead of 1000ms, benefiting ~100 call sites automatically. Add completion logging with elapsed time and iteration count. Tested locally with test/cluster/test_fencing.py::test_fence_hints (dev mode), log output: wait_for(at_least_one_hint_failed) completed in 0.83s (4 iterations) wait_for(exactly_one_hint_sent) completed in 1.34s (5 iterations) Fixes SCYLLADB-738 Closes scylladb/scylladb#29173	2026-03-24 23:49:49 +02:00
Botond Dénes	d52fbf7ada	Merge 'test: cluster: Deflake test_startup_with_keyspaces_violating_rf_rack_valid_keyspaces' from Dawid Mędrek The test was flaky. The scenario looked like this: 1. Stop server 1. 2. Set its rf_rack_valid_keyspaces configuration option to true. 3. Create an RF-rack-invalid keyspace. 4. Start server 1 and expect a failure during start-up. It was wrong. We cannot predict when the Raft mutation corresponding to the newly created keyspace will arrive at the node or when it will be processed. If the check of the RF-rack-valid keyspaces we perform at start-up was done before that, it won't include the keyspace. This will lead to a test failure. Unfortunately, it's not feasible to perform a read barrier during start-up. What's more, although it would help the test, it wouldn't be useful otherwise. Because of that, we simply fix the test, at least for now. The new scenario looks like this: 1. Disable the rf_rack_valid_keyspaces configuration option on server 1. 2. Start the server. 3. Create an RF-rack-invalid keyspace. 4. Perform a read barrier on server 1. This will ensure that it has observed all Raft mutations, and we won't run into the same problem. 5. Stop the node. 6. Set its rf_rack_valid_keyspaces configuration option to true. 7. Try to start the node and observe a failure. This will make the test perform consistently. --- I ran the test (in dev mode, on my local machine) three times before these changes, and three times with them. I include the time results below. Before: ``` real 0m47.570s user 0m41.631s sys 0m8.634s real 0m50.495s user 0m42.499s sys 0m8.607s real 0m50.375s user 0m41.832s sys 0m8.789s ``` After: ``` real 0m50.509s user 0m43.535s sys 0m9.715s real 0m50.857s user 0m44.185s sys 0m9.811s real 0m50.873s user 0m44.289s sys 0m9.737s ``` Fixes SCYLLADB-1137 Backport: The test is present on all supported branches, and so we should backport these changes to them. 
Closes scylladb/scylladb#29218 * github.com:scylladb/scylladb: test: cluster: Deflake test_startup_with_keyspaces_violating_rf_rack_valid_keyspaces test: cluster: Mark test with @pytest.mark.asyncio in test_multidc.py	2026-03-24 21:09:19 +02:00
Patryk Jędrzejczak	141aa2d696	Merge 'test/cluster/test_incremental_repair.py: fix typo + enable compaction DEBUG logs' from Botond Dénes This PR contains two small improvements to `test_incremental_repair.py` motivated by the sporadic failure of `test_tablet_incremental_repair_and_scrubsstables_abort`. The test fails with `assert 3 == 2` on `len(sst_add)` in the second repair round. The extra SSTable has `repaired_at=0`, meaning scrub unexpectedly produced more unrepaired SSTables than anticipated. Since scrub (and compaction in general) logs at DEBUG level and the test did not enable debug logging, the existing logs do not contain enough information to determine the root cause. Commit 1 fixes a long-standing typo in the helper function name (`preapre` -> `prepare`). Commit 2 enables `compaction=debug` for the Scylla nodes started by `do_tablet_incremental_repair_and_ops`, which covers all `test_tablet_incremental_repair_and_` variants. This will capture full compaction/scrub activity on the next reproduction, making the failure diagnosable. Refs: SCYLLADB-1086 Backport: test improvement, no backport Closes scylladb/scylladb#29175 https://github.com/scylladb/scylladb: test/cluster/test_incremental_repair.py: enable compaction DEBUG logs in do_tablet_incremental_repair_and_ops test/cluster/test_incremental_repair.py: fix typo preapre -> prepare	2026-03-24 16:27:01 +01:00
Ernest Zaslavsky	c670183be8	cmake: fix precompiled header (PCH) creation Two issues prevented the precompiled header from compiling successfully when using CMake directly (rather than the configure.py + ninja build system): a) Propagate build flags to Rust binding targets reusing the PCH. The wasmtime_bindings and inc targets reuse the PCH from scylla-precompiled-header, which is compiled with Seastar's flags (including sanitizer flags in Debug/Sanitize modes). Without matching compile options, the compiler rejects the PCH due to flag mismatch (e.g., -fsanitize=address). Link these targets against Seastar::seastar to inherit the required compile options. Closes scylladb/scylladb#28941	2026-03-24 15:53:40 +02:00
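
Note on 068a7894aa (test_cleanup_stop): the fix rests on the fact that `time.sleep()` blocks the asyncio event loop, so a task scheduled with `asyncio.create_task()` cannot start until the caller next awaits. The following is a minimal standalone sketch of that behaviour, not the Scylla test itself; the names `run` and `background` are hypothetical and stand in for the test body and the `stop_compaction` task.

```python
import asyncio
import time


async def run(blocking: bool) -> list[str]:
    """Schedule a background task, then wait either by blocking or by yielding."""
    events: list[str] = []

    async def background() -> None:
        # Hypothetical stand-in for the stop_compaction request in the commit.
        events.append("background task ran")

    task = asyncio.create_task(background())
    if blocking:
        # Blocks the whole event loop: the scheduled task cannot start yet.
        time.sleep(0.05)
    else:
        # Yields to the event loop: the scheduled task runs during the sleep.
        await asyncio.sleep(0.05)
    events.append("main continued")
    await task
    return events


blocking_order = asyncio.run(run(blocking=True))
yielding_order = asyncio.run(run(blocking=False))
print(blocking_order)
print(yielding_order)
```

With the blocking sleep, the main coroutine continues before the background task has run at all, which matches the ordering problem the commit message describes; with `await asyncio.sleep()`, the background task runs first.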

52868 Commits
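
Note on cd1679934c (wait_for() backoff): the polling policy described in that commit (initial period 0.1s, 1.5x backoff, capped at 1.0s) can be sketched roughly as below. This is an illustration under assumptions, not the code in test/pylib: the names `backoff_schedule`, `wait_for_sketch` and `demo`, and their signatures, are hypothetical.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


def backoff_schedule(initial: float = 0.1, factor: float = 1.5,
                     cap: float = 1.0, count: int = 8) -> list[float]:
    """Return the first `count` sleep periods of a capped exponential backoff."""
    delays = []
    period = initial
    for _ in range(count):
        delays.append(period)
        period = min(period * factor, cap)
    return delays


async def wait_for_sketch(pred: Callable[[], Awaitable[Optional[T]]],
                          deadline: float, period: float = 0.1,
                          factor: float = 1.5, cap: float = 1.0) -> tuple[T, int]:
    """Poll `pred` until it returns a non-None value or `deadline` passes.

    Hypothetical helper: returns the value and the number of iterations used.
    """
    iterations = 0
    while True:
        iterations += 1
        result = await pred()
        if result is not None:
            return result, iterations
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met before deadline")
        await asyncio.sleep(period)
        period = min(period * factor, cap)  # grow the period, never past the cap


async def demo() -> tuple[str, int]:
    calls = 0

    async def ready_on_third_call() -> Optional[str]:
        nonlocal calls
        calls += 1
        return "ready" if calls >= 3 else None

    return await wait_for_sketch(ready_on_third_call, time.monotonic() + 5)


schedule = backoff_schedule()
result, iterations = asyncio.run(demo())
print(schedule)
print(result, iterations)
```

A condition that becomes true quickly is detected after a short first sleep rather than a full second, while the cap keeps slow conditions from being polled less often than once per second.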