scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-29 20:57:00 +00:00

Author	SHA1	Message	Date
Łukasz Paszkowski	7e14ea5ac8	sstables: only wipe TemporaryHashes for sstable formats that have it Commit `8d34127684` ("sstables: clean up TemporaryHashes file in wipe()") unconditionally calls filename(..., component_type::TemporaryHashes) inside filesystem_storage::wipe(). However, the TemporaryHashes component is only registered in the component map of the 'ms' sstable format. For older formats (ka, la, mc, md, me) the lookup goes through sstable_version_constants::get_component_map(version).at(...) and throws std::out_of_range. The exception is then swallowed by the outer catch(...) in wipe(), which just logs and ignores. As a side effect, the subsequent remove_file(new_toc_name) is never reached and the TemporaryTOC ('*-TOC.txt.tmp') file is left as an orphan on disk after every unlink() of a non-'ms' sstable. Guard the lookup with get_component_map(version).contains() so the cleanup is only attempted for formats that actually define the component. Add a regression test in test/boost/sstable_directory_test.cc that creates an 'me'-format sstable, unlinks it and asserts that the sstable directory is left empty. Without the fix the test fails with a leftover 'me-...-TOC.txt.tmp' file. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1697 Closes scylladb/scylladb#29620	2026-04-29 08:06:36 +03:00
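The failure mode in 7e14ea5ac8 can be sketched in a few lines of Python. This is only an analogy for the C++ code: the `COMPONENT_MAPS` dict, the `wipe()` helper, and the file names are hypothetical stand-ins, and a `set` plays the role of the sstable directory.

```python
# Hypothetical component maps: only the 'ms' format registers TemporaryHashes.
COMPONENT_MAPS = {
    "me": {"TOC": "TOC.txt", "TemporaryTOC": "TOC.txt.tmp"},
    "ms": {
        "TOC": "TOC.txt",
        "TemporaryTOC": "TOC.txt.tmp",
        "TemporaryHashes": "TemporaryHashes.db.tmp",  # assumed name
    },
}


def wipe(version, files, guarded):
    """Remove temporary components from `files`, a set standing in for the directory."""
    components = COMPONENT_MAPS[version]
    try:
        if not guarded or "TemporaryHashes" in components:
            # Unguarded, this lookup raises KeyError for formats without the component.
            files.discard(components["TemporaryHashes"])
        # Only reached if the lookup above did not throw.
        files.discard(components["TemporaryTOC"])
    except Exception:
        pass  # mirrors the catch-all that logs and ignores
    return files


unguarded = wipe("me", {"TOC.txt.tmp"}, guarded=False)  # orphan stays behind
fixed = wipe("me", {"TOC.txt.tmp"}, guarded=True)       # directory is empty
```

The swallowed exception is what turns a failed lookup into a silent leak: nothing after the throwing statement runs, so the guard has to come before the lookup rather than inside the handler.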
Botond Dénes	809f12f988	Merge 'test/cluster/dtest: fix ScyllaNode state not persisting across nodelist() calls' from Benny Halevy `ScyllaCluster.nodelist()` creates new `ScyllaNode` objects on every call, so per-node state set via `set_smp()`, `set_log_level()`, and `_adjust_smp_and_memory()` was lost. This meant `set_smp()` had no effect when `cluster.start()` was called after it, since `start_nodes()` calls `nodelist()` internally which creates fresh nodes with default values. - Add debug logging for smp/memory in ScyllaNode - Store per-node settings (smp, memory, log levels) in a `ScyllaCluster._node_resources` dict keyed by server_id, so they survive `nodelist()` reconstruction. `ScyllaNode` restores its state from this dict on construction and saves it back whenever `set_smp()`, `set_log_level()`, or `_adjust_smp_and_memory()` modifies it. - Add a reproducer test verifying `set_smp()` takes effect on restart Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1629 -- No backport needed: this only fixes dtest infrastructure, no production code is affected. Closes scylladb/scylladb#29549 * github.com:scylladb/scylladb: test/cluster/dtest: add test for node.set_smp() persistence test/cluster/dtest: cache ScyllaNode instances in ScyllaCluster test/cluster/dtest/ccmlib/scylla_node: add debug logging	2026-04-29 06:25:36 +03:00
Avi Kivity	c4de2b3c9d	Merge 'test: fix flaky tablets test by using read barrier' from Aleksandra Martyniuk Some tests in test_tablets.py read system_schema.keyspaces from an arbitrary node that may not have applied the latest schema change yet. Pin the read to a specific node and issue a read barrier before querying, ensuring the node has up-to-date data. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1700 Test fix; no backport Closes scylladb/scylladb#29655 * github.com:scylladb/scylladb: test: fix flaky rack list conversion tests by using read barrier test: fix flaky test_enforce_rack_list_option by using read barrier	2026-04-28 17:15:59 +03:00
Patryk Jędrzejczak	d9dd3bfe53	Merge 'topology_coordinator: join tablet load stats refresh in stop()' from Andrzej Jackowski Commit `2b7aa32` (topology_coordinator: Refresh load stats after table is created or altered) registered topology_coordinator as a schema change listener and added on_create_column_family which fire-and-forgets _tablet_load_stats_refresh.trigger(). The triggered task runs on the gossip scheduling group via with_scheduling_group and accesses the topology_coordinator via 'this'. stop() unregisters the listener but does not wait for any in-flight refresh task. If a notification fires between _tablet_load_stats_refresh.join() in run() and unregister_listener in stop(), the scheduled task can outlive the topology_coordinator and access freed memory after run_topology_coordinator's coroutine frame is destroyed. Wait for the refresh to complete in stop() after unregistering the listener, ensuring no task can fire after destruction. Fixes SCYLLADB-1728 Backport to 2026.1 and 2026.2, because the issue was introduced in `2b7aa32` Closes scylladb/scylladb#29653 * https://github.com/scylladb/scylladb: test: tablet_stats: reproduce shutdown refresh race topology_coordinator: join tablet load stats refresh in stop()	2026-04-28 12:54:28 +02:00
Benny Halevy	5eaa979f35	test/cluster/dtest: add test for node.set_smp() persistence Add a test that reproduces SCYLLADB-1629: set_smp() had no effect because nodelist() created new ScyllaNode objects on every call, losing the _smp_set_during_test value. The test fails without the fix in the previous patch.	2026-04-28 12:34:08 +03:00
Benny Halevy	7430c1efd7	test/cluster/dtest: cache ScyllaNode instances in ScyllaCluster ScyllaCluster.nodelist() was creating new ScyllaNode objects on every call, so per-node state set via set_smp(), set_log_level(), and _adjust_smp_and_memory() was lost between calls. Fix by caching ScyllaNode instances in a list populated by _add_nodes() using the list returned by servers_add() in populate(). Nodes are assigned monotonically increasing names (node1, node2, ...). nodelist() simply returns the cached list.	2026-04-28 12:34:06 +03:00
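A minimal sketch of why caching the node objects matters, assuming simplified stand-ins for the dtest classes (`Node`, `RebuildingCluster`, and `CachingCluster` are hypothetical names, and the default of 2 shards is invented for illustration):

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.smp = 2  # hypothetical default shard count

    def set_smp(self, smp):
        self.smp = smp


class RebuildingCluster:
    """Old behaviour: nodelist() builds fresh Node objects on every call."""

    def __init__(self, size):
        self._size = size

    def nodelist(self):
        return [Node(f"node{i}") for i in range(1, self._size + 1)]


class CachingCluster:
    """Fixed behaviour: nodes are created once and the cached list is returned."""

    def __init__(self, size):
        self._nodes = [Node(f"node{i}") for i in range(1, size + 1)]

    def nodelist(self):
        return self._nodes


old = RebuildingCluster(3)
old.nodelist()[0].set_smp(4)
lost = old.nodelist()[0].smp  # the setting was applied to a discarded object

new = CachingCluster(3)
new.nodelist()[0].set_smp(4)
kept = new.nodelist()[0].smp  # the same object is returned, so the setting survives
```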
Marcin Maliszkiewicz	b0f988afc4	Merge 'auth: fix shutdown and startup races in LDAP cache pruner' from Andrzej Jackowski The LDAP role manager's `_cache_pruner` background fiber periodically calls cache::reload_all_permissions(). Two races cause it to hit SCYLLA_ASSERT(_permission_loader): - Cross-shard race: The pruner used `_cache.container().invoke_on_all()` to reload permissions on every shard. Since both `service::start()` and `sharded<service>::stop()` execute per-shard in parallel, the pruner on one shard could call reload_all_permissions() on another shard before that shard set its loader (startup) or after it cleared its loader (shutdown). Each shard runs its own pruner instance, so reloading locally is sufficient; this also removes redundant N² reload calls. - Intra-shard race: `service::stop()` cleared the permission loader and stopped the role manager concurrently (via when_all_succeed). A mid-reload pruner could yield and then call the now-null loader. Fixed by stopping the role manager first so the pruner is fully drained before the loader is cleared. Fixes SCYLLADB-1679 Backport to 2026.2, introduced in `7eedf50c12` Closes scylladb/scylladb#29605 * github.com:scylladb/scylladb: auth: make shutdown the exact reverse of startup test: ldap: add test for pruner crash during shutdown auth: start authorizer and set permission loader before role manager auth: stop role manager before clearing permission loader auth: reload LDAP permission cache on local shard only	2026-04-28 11:16:07 +02:00
Botond Dénes	a7e9c0e6d2	Merge 'test.py: fix test collection bug' from Andrei Chekun In certain circumstances the current way of collecting tests can be error-prone. Collection can stop when the first file is skipped in the mode, leaving the rest of the files given on the CLI uncollected. Another issue is that if a file is specified twice, once via its directory and once explicitly, it produces an incorrect CppFile in the stash, causing a KeyError. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1714 No backport, test framework bug fix only. Closes scylladb/scylladb#29634 * github.com:scylladb/scylladb: test.py: fix framework test test.py: fix test collection bug	2026-04-28 11:52:35 +03:00
Botond Dénes	3ea4af1c8c	Merge 'test/cluster/test_incremental_repair: fix flaky coordinator-change scenario' from Avi Kivity - Ensure servers[1] is not the topology coordinator before restarting it, preventing the leader death + re-election + re-repair sequence that masked the compaction-merge bug - Add a retry loop that detects post-restart leadership transfer to servers[1] via direct coordinator query, retrying up to 5 times Fixes: SCYLLADB-1478 Backporting to 2026.2, which sees the failure regularly. Closes scylladb/scylladb#29671 * github.com:scylladb/scylladb: test/cluster/test_incremental_repair: add retry for residual leadership race test/cluster/test_incremental_repair: fix flaky coordinator-change scenario	2026-04-28 09:05:02 +03:00
Andrzej Jackowski	459e3970cd	test: tablet_stats: reproduce shutdown refresh race The coordinator can receive a schema-change notification after run() finishes but before stop() unregisters listeners. The test pins that window with error injections and verifies stop() waits for the refresh instead of letting it outlive the coordinator. Test time in dev: 9.51s Refs SCYLLADB-1728	2026-04-28 08:00:54 +02:00
Andrzej Jackowski	8756f7c068	topology_coordinator: join tablet load stats refresh in stop() Commit `2b7aa3211d` made schema changes trigger tablet load stats refreshes in the background. A notification can still arrive after run() stops the periodic refresher and before the coordinator object is destroyed. Move lifecycle subscription cleanup to stop() and join the serialized refresh there after unregistering refresh trigger sources. This keeps the coordinator alive until notification-triggered refresh work has completed. Fixes SCYLLADB-1728	2026-04-28 07:37:28 +02:00
Avi Kivity	2615d0e8d8	test/cluster/test_incremental_repair: add retry for residual leadership race There is a small race window where Raft leadership could transfer back to servers[1] between the ensure_group0_leader_on() check and the actual restart. If this happens, the new coordinator re-initiates repair and masks the compaction-merge bug. Extract the core test logic into _do_race_window_promotes_unrepaired_data() which directly checks get_topology_coordinator() after restart and raises _LeadershipTransferred if servers[1] became coordinator. The test function calls this helper in a retry loop (up to 5 attempts). Refs: SCYLLADB-1478	2026-04-27 21:11:06 +03:00
Avi Kivity	914b70c75b	test/cluster/test_incremental_repair: fix flaky coordinator-change scenario The test_incremental_repair_race_window_promotes_unrepaired_data test was flaky because it hardcoded servers[1] as the restart target but did not ensure servers[1] was NOT the topology coordinator. When servers[1] happened to be the Raft group0 leader (topology coordinator), restarting it killed the leader, forced a new election, and the new coordinator re-initiated tablet repair. This re-repair flushes memtables on all replicas via take_storage_snapshot() and marks the resulting sstables as repaired -- causing post-repair keys to appear in repaired sstables on servers[0] and servers[2]. The test then hit the wrong assertion (servers[0]/[2] contaminated). Fix: before starting the repair, check whether servers[1] is the topology coordinator. If so, move leadership to another server via ensure_group0_leader_on() so that restarting servers[1] only kills a follower -- which does not trigger an election or coordinator change. Reproducibility was confirmed by forcing leadership to servers[1] via ensure_group0_leader_on() and observing deterministic failure with all three servers showing post-repair keys in repaired sstables (confirming the re-repair scenario), then verifying the fix passes reliably. Fixes: SCYLLADB-1478	2026-04-27 21:08:12 +03:00
Aleksandra Martyniuk	6b7ce5e244	test: fix flaky rack list conversion tests by using read barrier test_numeric_rf_to_rack_list_conversion and test_numeric_rf_to_rack_list_conversion_abort were reading system_schema.keyspaces from an arbitrary node that may not have applied the latest schema change yet. Pin the read to a specific node and issue a read barrier before querying, ensuring the node has up-to-date data.	2026-04-27 15:19:09 +02:00
Aleksandra Martyniuk	9d3d424d58	test: fix flaky test_enforce_rack_list_option by using read barrier The test was reading system_schema.keyspaces from an arbitrary node that may not have applied the latest schema change yet. Pin the read to a specific node and issue a read barrier before querying, ensuring the node has up-to-date data.	2026-04-27 14:44:38 +02:00
Anna Mikhlin	86472e43e1	Update ScyllaDB version to: 2026.3.0-dev	2026-04-26 15:30:13 +03:00
Andrei Chekun	f2f4915e09	test.py: fix framework test The framework test was not skipping the unit directory, where the C++ tests are located. With the collection bug fixed, the test started to fail. Ignore this directory as well.	2026-04-25 18:04:55 +02:00
Piotr Szymaniak	d5efd1f676	test/cluster: wait for Alternator readiness in server startup server_add() only waits for CQL readiness before returning. The Alternator HTTP port may not be listening yet, causing ConnectionRefused with Alternator tests. Extend the ServerUpState enum and startup loop to also check Alternator port readiness when configured. Whenever Alternator port(s) is/are configured, each is verified if connectable and queryable, similar to how CQL ports are probed. Fixes SCYLLADB-1701 Closes scylladb/scylladb#29625	2026-04-25 16:35:44 +03:00
Piotr Smaron	d14d07a079	test: fix flaky test_sstable_write_large_{row,cell} by using a fixed partition key Commit `ce00d61917` ("db: implement large_data virtual tables with feature flag gating") changed these two tests to construct their mutation with a randomly generated partition key (simple_schema::make_pkey()) instead of the previously fixed pk "pv", with the comment that this avoids a "Failed to generate sharding metadata" error. simple_schema::make_pkey() delegates to tests::generate_partition_key(), which defaults to key_size{1, 128}, i.e. the partition key length is uniformly random in [1, 128] bytes. That interacts badly with the fact that both tests pick thresholds at exact byte boundaries of the MC sstable row encoding: - The large-data handler records a row's size as _data_writer->offset() - current_pos (sstables/mx/writer.cc: collect_row_stats()), i.e. the number of bytes the row took on disk. - For the first clustering row, the body includes a vint-encoded prev_row_size = pos - _prev_row_start. - _prev_row_start is captured at the start of the partition (consume_new_partition()) before the partition key is written to the data stream, so prev_row_size rolls in the partition key's serialized length (2-byte prefix + pk bytes) + deletion_time + static row size. A random-size partition key therefore perturbs the first clustering row's encoded size by 1-2 bytes across runs (the vint of prev_row_size crosses the 128 boundary), flipping the test's byte-exact threshold comparison. On seed 2104744000 this produced: critical check row_size_count == expected.size() has failed [3 != 2] Fix the two byte-exact-sensitive tests by reverting their partition key to the fixed s.new_mutation("pv") used before `ce00d61917`. Under smp=1 (which these tests run with, per -c1 in the test invocation) a fixed key is always shard-local, so no sharding-metadata issue arises here. 
The other tests modified by `ce00d61917` (test_sstable_log_too_many_rows, test_sstable_log_too_many_dead_rows, test_sstable_too_many_collection_elements, test_large_data_records_round_trip, etc.) assert on row/element counts or use thresholds with enough slack that the partition key size does not matter, and are left unchanged. Add an explanatory comment to each fixed site so the pitfall is not re-introduced by a future refactor. Verified stable with: ./test.py --mode=dev test/boost/sstable_3_x_test.cc::test_sstable_write_large_row --repeat 100 --max-failures 1 ./test.py --mode=dev test/boost/sstable_3_x_test.cc::test_sstable_write_large_cell --repeat 100 --max-failures 1 ./test.py --mode=release test/boost/sstable_3_x_test.cc::test_sstable_write_large_row --repeat 100 --max-failures 1 ./test.py --mode=release test/boost/sstable_3_x_test.cc::test_sstable_write_large_cell --repeat 100 --max-failures 1 All four invocations: 100/100 passed. Fixes: SCYLLADB-1685 Closes scylladb/scylladb#29621	2026-04-25 16:32:02 +03:00
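The 128-byte boundary mentioned in d14d07a079 can be illustrated with a small size calculation. This is a sketch under stated assumptions: `unsigned_vint_size()` models the usual unsigned vint rule (7 payload bits in one byte, 14 in two, capped at 9 bytes), and `prev_row_size()` assumes a 12-byte deletion_time and no static row; neither is copied from sstables/mx/writer.cc.

```python
def unsigned_vint_size(value: int) -> int:
    """Bytes needed for an unsigned vint: 1 byte below 2**7, 2 bytes below 2**14, ..."""
    bits = max(value.bit_length(), 1)
    return min(9, (bits + 6) // 7)


ASSUMED_DELETION_TIME_SIZE = 12  # assumption: 4-byte local time + 8-byte timestamp


def prev_row_size(pk_len: int) -> int:
    """Assumed layout: 2-byte length prefix + key bytes + deletion_time, no static row."""
    return 2 + pk_len + ASSUMED_DELETION_TIME_SIZE


# A random key length in [1, 128] lands on both sides of the boundary ...
sizes_seen = {unsigned_vint_size(prev_row_size(n)) for n in range(1, 129)}
# ... while the fixed two-byte key "pv" always encodes the same way.
fixed_key_size = unsigned_vint_size(prev_row_size(len(b"pv")))
```

Under these assumptions the encoded row size depends on the key length, which is why a byte-exact threshold only stays deterministic with a fixed partition key.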
Andrei Chekun	92c09d106d	test.py: fix test collection bug In certain circumstances the current way of collecting tests can be error-prone. Collection can stop when the first file is skipped in the mode, leaving the rest of the files given on the CLI uncollected. Another issue is that if a file is specified twice, once via its directory and once explicitly, it produces an incorrect CppFile in the stash, causing a KeyError. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1714	2026-04-24 17:57:11 +02:00
Andrzej Jackowski	8855e77465	auth: make shutdown the exact reverse of startup The previous parallel stop of the authenticator and authorizer was a micro-optimization that obscured the lifecycle invariant that shutdown should reverse startup. Refs SCYLLADB-1679	2026-04-24 13:34:09 +02:00
Andrzej Jackowski	adf1e26bab	test: ldap: add test for pruner crash during shutdown Verify that service::stop() drains the LDAP pruner before clearing the permission loader. The test installs a slow permission loader and confirms the pruner is actively reloading when teardown begins. Refs SCYLLADB-1679	2026-04-24 13:34:09 +02:00
Andrzej Jackowski	37a547604f	auth: start authorizer and set permission loader before role manager LDAP role manager starts a pruner fiber that calls reload_all_permissions() which asserts _permission_loader is set. The permission loader calls _authorizer->authorize(), so the authorizer must be started before the loader is set. Start authorizer, then set the permission loader, then start the role manager, ensuring both dependencies are satisfied before the pruner can fire. Fixes SCYLLADB-1679	2026-04-24 13:34:09 +02:00
Andrzej Jackowski	c3e5285d45	auth: stop role manager before clearing permission loader service::stop() cleared the permission loader and stopped the role manager concurrently (via when_all_succeed). The LDAP pruner could be mid-reload at a yield point when the loader was set to null, causing it to call a null function. Stop the role manager first so the pruner is fully drained before the loader is cleared. Fixes SCYLLADB-1679	2026-04-24 13:34:09 +02:00
Andrzej Jackowski	f75e5ac65b	auth: reload LDAP permission cache on local shard only The LDAP role manager's _cache_pruner fiber used invoke_on_all() to reload permissions on every shard. Since auth::service::start() runs on all shards in parallel via invoke_on_all(), the pruner on shard X could call reload_all_permissions() on shard Y before shard Y finished start() and set its permission loader, hitting SCYLLA_ASSERT(_permission_loader). The same cross-shard race existed during shutdown. Each shard runs its own pruner instance, so reloading locally is sufficient — all shards are still covered. This also removes redundant N-squared reload calls. Refs SCYLLADB-1679	2026-04-24 13:06:58 +02:00
Botond Dénes	70261dc674	Merge 'test/cluster: scale failure_detector_timeout_in_ms by build mode' from Marcin Maliszkiewicz The failure_detector_timeout_in_ms override of 2000ms in 6 cluster test files is too aggressive for debug/sanitize builds. During node joins, the coordinator's failure detector times out on RPC pings to the joining node while it is still applying schema snapshots, marks it DOWN, and bans it — causing flaky test failures. Scale the timeout by MODES_TIMEOUT_FACTOR (3x for debug/sanitize, 2x for dev, 1x for release) via a shared failure_detector_timeout fixture in conftest.py. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1587 Backport: no, elasticsearch analyser shows only a single failure Closes scylladb/scylladb#29522 * github.com:scylladb/scylladb: test/cluster: scale failure_detector_timeout_in_ms by build mode test/cluster: add failure_detector_timeout fixture	2026-04-24 09:10:43 +03:00
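The scaling rule from 70261dc674 amounts to a single multiplication. The sketch below is a plain function rather than the pytest fixture the commit adds; representing `MODES_TIMEOUT_FACTOR` as a dict and the name `BASE_FAILURE_DETECTOR_TIMEOUT_MS` are assumptions, while the factors and the 2000 ms base come from the commit message.

```python
# Factors as described in the commit message; the dict shape is an assumption.
MODES_TIMEOUT_FACTOR = {"release": 1, "dev": 2, "debug": 3, "sanitize": 3}

BASE_FAILURE_DETECTOR_TIMEOUT_MS = 2000  # the previous hard-coded override


def failure_detector_timeout(mode: str) -> int:
    """Return the failure detector timeout in milliseconds for a build mode."""
    return BASE_FAILURE_DETECTOR_TIMEOUT_MS * MODES_TIMEOUT_FACTOR[mode]


timeouts = {mode: failure_detector_timeout(mode) for mode in MODES_TIMEOUT_FACTOR}
```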
Botond Dénes	d280517e27	test/cluster/test_incremental_repair: fix flaky do_tablet_incremental_repair_and_ops The log grep in get_sst_status searched from the beginning of the log (no from_mark), so the second-repair assertions were checking cumulative counts across both repairs rather than counts for the second repair alone. The expected values (sst_add==2, sst_mark==2) relied on this cumulative behaviour: 1 from the first repair + 1 from the second = 2. This works when the second repair encounters exactly one unrepaired sstable, but fails whenever the second repair sees two. The second repair can see two unrepaired sstables when the 100 keys inserted before it (via asyncio.gather) trigger a background auto-flush before take_storage_snapshot runs. take_storage_snapshot always flushes the memtable itself, so if an auto-flush already split the batch into two sstables on disk, the second repair's snapshot contains both and logs "Added sst" twice, making the cumulative count 3 instead of 2. Fix: take a log mark per-server before each repair call and pass it to get_sst_status so each check counts only the entries produced by that repair. The expected values become 1/0/1 and 1/1/1 respectively, independent of how many sstables happened to exist beforehand. get_sst_status gains an optional from_mark parameter (default None) which preserves existing call sites that intentionally grep from the start of the log. Fixes: SCYLLADB-1086 Closes scylladb/scylladb#29484	2026-04-23 17:17:16 +02:00
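The difference between grepping from the start of the log and grepping from a mark, as described in d280517e27, can be sketched with a toy log. `ServerLog` and its methods are hypothetical; only the `from_mark` parameter name and the "Added sst" message come from the commit message.

```python
import re


class ServerLog:
    """Toy stand-in for a server log that supports marks."""

    def __init__(self):
        self._lines = []

    def append(self, line):
        self._lines.append(line)

    def mark(self):
        # A mark is simply the current end of the log.
        return len(self._lines)

    def grep(self, pattern, from_mark=None):
        start = 0 if from_mark is None else from_mark
        return [line for line in self._lines[start:] if re.search(pattern, line)]


log = ServerLog()
log.append("repair 1: Added sst a")

mark = log.mark()  # taken just before the second repair
log.append("repair 2: Added sst b")
log.append("repair 2: Added sst c")  # an auto-flush produced a second sstable

cumulative = len(log.grep("Added sst"))                   # counts both repairs
second_only = len(log.grep("Added sst", from_mark=mark))  # counts the second repair only
```

Counting from the mark makes each assertion independent of whatever earlier repairs happened to log.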
Wojciech Mitros	7634d3f7d4	test/cluster: fix flaky test_hints_consistency_during_replace The test creates a sync point immediately after writing 100 rows with CL=ANY, without waiting for pending hint writes to complete. store_hint() is fire-and-forget: it submits do_store_hint() to a gate and returns immediately. do_store_hint() updates _last_written_rp only after writing to the commitlog. If create_sync_point() is called before all do_store_hint() coroutines complete, the captured replay position is stale, and await_sync_point() returns DONE before all hints are replayed, leaving some rows missing. Fix by waiting for the size_of_hints_in_progress metric to reach zero before creating the sync point, ensuring all in-flight hint writes have completed and _last_written_rp is up to date. This follows the same pattern already used in test_sync_point. Fixes: SCYLLADB-1560 Closes scylladb/scylladb#29623	2026-04-23 17:03:48 +02:00
Botond Dénes	b49cf6247f	test: fix flaky test_read_repair_with_trace_logging by reading tracing with CL=ALL Tracing events are written to system_traces.events with CL=ANY, so they are only guaranteed to be present on the local node of the query coordinator. Reading them back with the driver default (CL=LOCAL_ONE) may route the query to a replica that has not yet received all events, causing the assertion on 'digest mismatch, starting read repair' to fail intermittently. Fix execute_with_tracing() to read tracing via the ResponseFuture API with query_cl=ConsistencyLevel.ALL, so events from all replicas are merged before the caller inspects them. Fixes: SCYLLADB-1633 Closes scylladb/scylladb#29566	2026-04-23 16:57:29 +02:00
Michał Jadwiszczak	878f341338	test/cluster/test_view_building_coordinator: fix view_updates_drained predicate The previous fix for the flakiness in test_file_streaming waited for the scylla_database_view_update_backlog metric to drop to 0 via wait_for(view_updates_drained, ...). However, the predicate returned True/False, while wait_for treats any non-None result as 'done' and keeps retrying only on None. So when the backlog was non-zero the predicate returned False, which wait_for interpreted as success and returned immediately - the test could then stop servers[0]/servers[1] before the view updates generated by new_server from the migrated staging sstable were actually delivered, leading to a partially populated MV (e.g. 431/1000 rows) and a failing assertion. Fix the predicate to return None instead of False when the backlog is not yet drained, so wait_for will actually retry until the metric reaches 0 (or the deadline is hit). Fixes SCYLLADB-1182 Closes scylladb/scylladb#29587	2026-04-23 17:52:22 +03:00
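The predicate bug in 878f341338 is easy to reproduce with a simplified, synchronous stand-in for `wait_for`. The real helper is asynchronous; this sketch keeps only the contract the commit message describes (any non-None result means done, None means retry). The `Backlog` class and both predicate factories are hypothetical.

```python
import time


def wait_for(pred, deadline, period=0.001):
    """Poll `pred` until it returns something other than None or the deadline passes."""
    while True:
        result = pred()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("predicate never completed")
        time.sleep(period)


class Backlog:
    """Replays a fixed sequence of metric samples, repeating the last one."""

    def __init__(self, samples):
        self._samples = list(samples)
        self.reads = 0

    def read(self):
        self.reads += 1
        return self._samples.pop(0) if len(self._samples) > 1 else self._samples[0]


def buggy_predicate(backlog):
    return lambda: backlog.read() == 0  # False is "not None", so wait_for stops early


def fixed_predicate(backlog):
    return lambda: True if backlog.read() == 0 else None  # None keeps wait_for polling


buggy_backlog = Backlog([5, 3, 0])
buggy_result = wait_for(buggy_predicate(buggy_backlog), time.monotonic() + 1)

fixed_backlog = Backlog([5, 3, 0])
fixed_result = wait_for(fixed_predicate(fixed_backlog), time.monotonic() + 1)
```

With the buggy predicate the wait ends after a single sample while the backlog is still non-zero; the fixed one keeps polling until the backlog reaches 0.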
Andrei Chekun	67b3ad94a0	test.py: enhance error output in case no tests were executed By default, pytest reports an error if the provided file does not exist. When coupled with xdist, however, no error is produced, because of how pytest works with xdist. test.py always uses the -n parameter, so if something goes wrong no error is produced and pytest only returns exit code 5. This PR prints a warning when pytest's exit code is 5. Closes scylladb/scylladb#29584	2026-04-23 14:03:55 +02:00
Calle Wilund	c97ce32f47	Update position in dma_read(iovec) in create_file_for_seekable_source Fixes: SCYLLADB-1523 The returned file object does not increment file pos as is. One line fix. Added test to make sure this read path works as expected. Closes scylladb/scylladb#29456	2026-04-23 14:54:20 +03:00
Michael Litvak	3468e8de8b	test/mv/test_mv_staging: wait for cql after restart Wait for cql on all hosts after restarting a server in the test. The problem that was observed is that the test restarts servers[1] and doesn't wait for the cql to be ready on it. On test teardown it drops the keyspace, trying to execute it on the host that is not ready, and fails. Fixes SCYLLADB-1632 Closes scylladb/scylladb#29562	2026-04-23 12:40:19 +02:00
Benny Halevy	6cb4c27f8c	test/cluster/dtest/ccmlib/scylla_node: add debug logging Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2026-04-23 09:21:06 +03:00
Marcin Maliszkiewicz	3df951bc9c	Merge 'audit: set audit_info for native-protocol BATCH messages' from Andrzej Jackowski Commit `16b56c2451` ("Audit: avoid dynamic_cast on a hot path") moved audit info into batch_statement via set_audit_info(), but only wired it for the CQL-text BATCH path (raw::batch_statement::prepare()). Native-protocol BATCH messages (opcode 0x0D), handled by process_batch_internal in transport/server.cc, construct a batch_statement without setting audit_info. This causes audit to silently skip the entire batch. Set audit_info on the batch_statement so these batches are audited. Fixes SCYLLADB-1652 No backport - bug introduced recently. Closes scylladb/scylladb#29570 * github.com:scylladb/scylladb: test/audit: add reproducer for native-protocol batch not being audited audit: set audit_info for native-protocol BATCH messages test/audit: rename internal test methods to avoid CI misdetection	2026-04-22 18:56:28 +02:00
Botond Dénes	eb3326b417	Merge 'test.py: migrate all bare skips to typed skip markers' from Artsiom Mishuta

should be merged after #29235

Complete the typed skip markers migration started in the plugin PR. Every bare `@pytest.mark.skip` decorator and `pytest.skip()` runtime call across the test suite is replaced with a typed equivalent, making skip reasons machine-readable in JUnit XML and Allure reports. 62 files changed across 8 commits, covering ~127 skip sites in total.

Bare `pytest.skip` provides only a free-text reason string. CI dashboards (JUnit, Allure) cannot distinguish between a test skipped due to a known bug, a missing feature, a slow test, or an environment limitation. This makes it hard to track skip debt, prioritize fixes, or filter dashboards by skip category. The typed markers (`skip_bug`, `skip_not_implemented`, `skip_slow`, `skip_env`) introduced by the `skip_reason_plugin` solve this by embedding a `skip_type` field into every skip report entry.

| Type | Count | Files | Description |
|------|-------|-------|-------------|
| `skip_bug` | 24 | 16 | Skip reason references a known bug/issue |
| `skip_not_implemented` | 10 | 5 | Feature not yet implemented in Scylla |
| `skip_slow` | 4 | 3 | Test too slow for regular CI runs |
| `skip_not_implemented` (bare) | 2 | 1 | Bare `@pytest.mark.skip` with no reason (COMPACT STORAGE, #3882) |

| Type | Count | Files | Description |
|------|-------|-------|-------------|
| `skip_env` | ~85 | 34 | Feature/config/topology not available at runtime |
| `skip_bug` | 2 | 2 | Known bugs: Streams on tablets (#23838), coroutine task not found (#22501) |

- Comments: 7 comments/docstrings across 5 files updated from `pytest.skip()` to `skip()`
- Plugin hardened: `warnings.warn()` → `pytest.UsageError` for bare `@pytest.mark.skip` at collection time; bare skips are now a hard error, not a warning
- Guard tests: New `test/pylib_test/test_no_bare_skips.py` with 3 tests that prevent regression:
  - AST scan for bare `@pytest.mark.skip` decorators
  - AST scan for bare `pytest.skip()` runtime calls
  - Real `pytest --collect-only` against all Python test directories

Runtime skip sites use the convenience wrappers from `test.pylib.skip_types`:

```python
from test.pylib.skip_types import skip_env
```

Usage:

```python
skip_env("Tablets not enabled")
```

1. test: migrate @pytest.mark.skip to @pytest.mark.skip_bug for known bugs (24 decorator sites, 16 files)
2. test: migrate @pytest.mark.skip to @pytest.mark.skip_not_implemented (10 decorator sites, 5 files)
3. test: migrate @pytest.mark.skip to @pytest.mark.skip_slow (4 decorator sites, 3 files)
4. test: migrate bare @pytest.mark.skip to skip_not_implemented (2 bare decorators, 1 file)
5. test: migrate runtime pytest.skip() to typed skip_env() (~85 sites, 34 files)
6. test: migrate runtime pytest.skip() to typed skip_bug() (2 sites, 2 files)
7. test: update comments referencing pytest.skip() to skip() (7 comments, 5 files)
8. test/pylib: reject bare pytest.mark.skip and add codebase guards (plugin hardening + 3 guard tests)

- All 60 plugin + guard tests pass (`test/pylib_test/`)
- No bare `@pytest.mark.skip` or `pytest.skip()` calls remain in the codebase
- `pytest --collect-only` succeeds across all test directories with the hardened plugin

SCYLLADB-1349

Closes scylladb/scylladb#29305

* github.com:scylladb/scylladb:
  test/alternator: replace bare pytest.skip() with typed skip helpers
  test: migrate new bare skips introduced by upstream after rebase
  test/pylib: reject bare pytest.mark.skip and add codebase guards
  test: update comments referencing pytest.skip() to skip_env()
  test: migrate runtime pytest.skip() to typed skip_bug()
  test: migrate runtime pytest.skip() to typed skip_env()
  test: migrate bare @pytest.mark.skip to skip_not_implemented
  test: migrate @pytest.mark.skip to @pytest.mark.skip_slow
  test: migrate @pytest.mark.skip to @pytest.mark.skip_not_implemented
  test: migrate @pytest.mark.skip to @pytest.mark.skip_bug for known bugs	2026-04-22 15:48:27 +03:00
Avi Kivity	e84e7dfb7a	build: drop utils/rolling_max_tracker.hh from precompiled header Added by mistake. Precompiled headers should only include library headers that rarely change, since any dependency change causes a full rebuild. Closes scylladb/scylladb#29560	2026-04-22 15:46:50 +03:00
Botond Dénes	3aced88586	Merge 'audit: decrease allocations / instructions on will_log() fast path' from Marcin Maliszkiewicz Audit::will_log() runs on every CQL/Alternator request. Since `9646ee05bd` it constructs three temporary sstrings per call to look up the audited keyspaces set / tables map with std::string_view keys, costing ~180 insns/op and 2 allocations if sstring misses SSO. This series switches the containers to std::less<> comparators to enable heterogeneous lookup, then drops the sstring temporaries from will_log(). perf-simple-query --smp 1 --duration 15 --audit "table" --audit-keyspaces "ks-non-existing" --audit-categories "DCL,DDL,AUTH,DML,QUERY" baseline `3d0582d51e` 36777 insns/op regression `9646ee05bd` 36952 (+175) this series 36768 (-184, fixed) Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1616 Backport: no, offending commit is not backported Closes scylladb/scylladb#29565 * github.com:scylladb/scylladb: audit: drop sstring temporaries on the will_log() fast path audit: enable heterogeneous lookup on audited keyspaces/tables	2026-04-22 15:46:16 +03:00
Marcin Maliszkiewicz	4043d95810	Merge 'storage_service: fix REST API races during shutdown and cross-shard forwarding' from Piotr Smaron REST route removal unregisters handlers but does not wait for requests that already entered storage_service. A request can therefore suspend inside an async operation, restart proceeds to tear the service down, and the coroutine later resumes against destroyed members such as _topology_state_machine, _group0, or _sys_ks — a use-after-destruction bug that surfaces as UBSAN dynamic-type failures (e.g. the crash seen from topology_state_load()). Fix this by holding storage_service::_async_gate from the entry boundary of every externally-triggered async operation so that stop() drains them before teardown begins. The gate is acquired in run_with_api_lock, run_with_no_api_lock, and in individual REST handlers that bypass those wrappers (reload_raft_topology_state, mark_excluded, removenode, schema reload, topology-request waits/abort, cleanup, ring/schema queries, SSTable dictionary training/publish, and sampling). Additionally, fix get_ownership() and abort_topology_request() which forward work to shard 0 but were still referencing the caller-shard's `this` pointer instead of the destination-shard instance, causing silent cross-shard access to shard-local state. Add a cluster regression test that repeatedly exercises the multi-shard ownership REST path to cover the forwarding fix. Fixes: SCYLLADB-1415 Should be backported to all branches, the code has been introduced around 2024.1 release. Closes scylladb/scylladb#29373 * github.com:scylladb/scylladb: storage_service: fix shard-0 forwarding in REST helpers storage_service: gate REST-facing async operations during shutdown storage_service: prepare for async gate in REST handlers	2026-04-22 14:43:31 +02:00
Radosław Cybulski	cc39b54173	alternator: use `stream_arn` instead of `std::string` in list_streams Use a `stream_arn` object to store the last stream returned to the user instead of a raw `std::string`. `stream_arn` is already used to parse ARNs coming from the user; for the returned value `std::string` was used because of buggy copy / move operations of `stream_arn`. Those were fixed, so we're fixing the usage as well. Fixes: SCYLLADB-1241 Closes scylladb/scylladb#29578	2026-04-22 14:02:53 +02:00
Artsiom Mishuta	183c6d120e	test: exclude pylib_test from default test runs Add pylib_test to norecursedirs in pytest.ini so it is not collected during ./test.py or pytest test/ runs, but can still be run directly via 'pytest test/pylib_test'. Also fix pytest log cleanup: worker log files (pytest_gw*) were not being deleted on success because cleanup was restricted to the main process only. Now each process (main and workers) cleans up its own log file on success. Closes scylladb/scylladb#29551	2026-04-22 11:38:40 +02:00
Piotr Smaron	dffb266b79	storage_service: fix shard-0 forwarding in REST helpers get_ownership() and abort_topology_request() forward work to shard 0 via container().invoke_on(0, ...) but the lambda captured 'this' and accessed members through it instead of through the shard-0 'ss' parameter. This means the lambda used the caller-shard's instance, defeating the purpose of the forwarding. Use the 'ss' parameter consistently so the operations run against the correct shard-0 state.	2026-04-22 10:30:33 +02:00
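The forwarding bug described in the commit above can be sketched in plain, synchronous C++. Everything here is a hypothetical stand-in: `service`, `token_count`, and this `invoke_on` only mimic the shape of a sharded service whose `invoke_on(0, ...)` passes the destination-shard instance to the lambda; it is not Seastar's API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-shard service; one instance exists per shard.
struct service {
    std::size_t shard_id;
    int token_count;               // shard-local state
    std::vector<service>* peers;   // stand-in for container()

    // Runs f against the instance that lives on the destination shard.
    template <typename Func>
    auto invoke_on(std::size_t shard, Func f) const {
        return f((*peers)[shard]);
    }

    // Buggy: the lambda captures `this`, so it reads the caller shard's state
    // even though the work was "forwarded" to shard 0.
    int ownership_buggy() const {
        return invoke_on(0, [this](const service&) { return token_count; });
    }

    // Fixed: the lambda only uses the shard-0 instance it is handed.
    int ownership_fixed() const {
        return invoke_on(0, [](const service& ss) { return ss.token_count; });
    }
};

// Wires every instance to the shared vector of shards.
inline void link_shards(std::vector<service>& shards) {
    for (auto& s : shards) {
        s.peers = &shards;
    }
}
```

In this toy model the buggy variant merely returns the wrong number; in a real shard-per-core runtime the same mistake means touching another shard's state without synchronization.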
Piotr Smaron	6a91d046f3	storage_service: gate REST-facing async operations during shutdown Hold _async_gate in all REST-facing async operations so that stop() drains in-flight requests before teardown, preventing use-after-free crashes when REST calls race with shutdown. A centralized gated() wrapper in set_storage_service (api/storage_service.cc) automatically holds the gate for every REST handler registered there, so new handlers get shutdown-safety by default. run_with_api_lock_internal and run_with_no_api_lock hold _async_gate on shard 0 as well, because REST requests arriving on any shard are forwarded there for execution. Methods that previously self-forwarded to shard 0 (mark_excluded, prepare_for_tablets_migration, set_node_intended_storage_mode, get_tablets_migration_status, finalize_tablets_migration) now assert this_shard_id() == 0. Their REST handlers call them via run_with_no_api_lock, which performs the shard-0 hop and gate hold centrally. Fixes: SCYLLADB-1415	2026-04-22 10:30:33 +02:00
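The idea of a centralized `gated()` wrapper from the commit above can be sketched as follows. This is a simplified, synchronous model under stated assumptions: `toy_gate` is a hypothetical class that only counts holders and rejects new entries after `close()`; it is not `seastar::gate`, and the real `stop()` waits asynchronously for holders to leave.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical, synchronous model of a gate: counts in-flight operations
// and refuses new ones once closed.
class toy_gate {
    std::size_t _count = 0;
    bool _closed = false;
public:
    // RAII token: the operation is "inside" the gate while the holder lives.
    class holder {
        toy_gate* _g;
    public:
        explicit holder(toy_gate& g) : _g(&g) { ++_g->_count; }
        holder(holder&& o) noexcept : _g(std::exchange(o._g, nullptr)) {}
        holder(const holder&) = delete;
        holder& operator=(const holder&) = delete;
        holder& operator=(holder&&) = delete;
        ~holder() { if (_g) { --_g->_count; } }
    };

    holder hold() {
        if (_closed) {
            throw std::runtime_error("gate closed");
        }
        return holder(*this);
    }
    void close() { _closed = true; }
    std::size_t in_flight() const { return _count; }
    // Teardown may proceed only once the gate is closed and empty.
    bool drained() const { return _closed && _count == 0; }
};

// Wraps a handler so that every invocation holds the gate for its duration.
inline std::function<int(int)> gated(toy_gate& g, std::function<int(int)> handler) {
    return [&g, handler = std::move(handler)](int request) {
        auto h = g.hold();          // throws if shutdown has already begun
        return handler(request);
    };
}
```

Registering every handler through one wrapper means a newly added handler cannot forget to take the gate, which is the default-safety property the commit message describes.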
Piotr Smaron	74dd33811e	storage_service: prepare for async gate in REST handlers Add hold_async_gate() public accessor for use by the REST registration layer in a followup commit. Convert run_with_no_api_lock to a coroutine so a followup commit can hold the async gate across the entire forwarded operation. No functional changes.	2026-04-22 10:28:54 +02:00
Botond Dénes	18ceeaf3ef	Merge 'Restrict tombstone GC sstable set to repaired sstables for tombstone_gc=repair mode' from Raphael Raph Carvalho When tombstone_gc=repair, the repaired compaction view's sstable_set_for_tombstone_gc() previously returned all sstables across all three views (unrepaired, repairing, repaired). This is correct but unnecessarily expensive: the unrepaired and repairing sets are never the source of a GC-blocking shadow when tombstone_gc=repair, for base tables. The key ordering guarantee that makes this safe is: - topology_coordinator sends send_tablet_repair RPC and waits for it to complete. Inside that RPC, mark_sstable_as_repaired() runs on all replicas, moving D from repairing → repaired (repaired_at stamped on disk). - Only after the RPC returns does the coordinator commit repair_time + sstables_repaired_at to Raft. - gc_before = repair_time - propagation_delay only advances once that Raft commit applies. Therefore, when a tombstone T in the repaired set first becomes GC-eligible (its deletion_time < gc_before), any data D it shadows is already in the repaired set on every replica. This holds because: - The memtable is flushed before the repairing snapshot is taken (take_storage_snapshot calls sg->flush()), capturing all data present at repair time. - Hints and batchlog are flushed before the snapshot, ensuring remotely-hinted writes arrive before the snapshot boundary. - Legitimate unrepaired data has timestamps close to 'now', always newer than any GC-eligible tombstone (USING TIMESTAMP to write backdated data is user error / UB). Excluding the repairing and unrepaired sets from the GC shadow check cannot cause any tombstone to be wrongly collected. The memtable check is also skipped for the same reason: memtable data is either newer than the GC-eligible tombstone, or was flushed into the repairing/repaired set before gc_before advanced. Safety restriction — materialized views: The optimization IS applied to materialized view tables. 
Two possible paths could inject D_view into the MV's unrepaired set after MV repair: view hints and staging via the view-update-generator. Both are safe: (1) View hints: flush_hints() creates a sync point covering BOTH _hints_manager (base mutations) AND _hints_for_views_manager (view mutations). It waits until ALL pending view hints — including D_view entries queued in _hints_for_views_manager while the target MV replica was down — have been replayed to the target node before take_storage_snapshot() is called. D_view therefore lands in the MV's repairing sstable and is promoted to repaired. When a repaired compaction then checks for shadows it finds D_view in the repaired set, keeping T_mv non-purgeable. (2) View-update-generator staging path: Base table repair can write a missing D_base to a replica via a staging sstable. The view-update-generator processes the staging sstable ASYNCHRONOUSLY: it may fire arbitrarily later, even after MV repair has committed repair_time and T_mv has been GC'd from the repaired set. However, the staging processor calls stream_view_replica_updates() which performs a READ-BEFORE-WRITE via as_mutation_source_excluding_staging(): it reads the CURRENT base table state before building the view update. If T_base was written to the base table (as it always is before the base replica can be repaired and the MV tombstone can become GC-eligible), the view_update_builder sees T_base as the existing partition tombstone. D_base's row marker (ts_d < ts_t) is expired by T_base, so the view update is a no-op: D_view is never dispatched to the MV replica. No resurrection can occur regardless of how long staging is delayed. A potential sub-edge-case is T_base being purged BEFORE staging fires (leaving D_base as the sole survivor, so stream_view_replica_updates would dispatch D_view). 
This is blocked by an additional invariant: for tablet-based tables, the repair writer stamps repaired_at on staging sstables (repair_writer_impl::create_writer sets mark_as_repaired = true and perform_component_rewrite writes repaired_at = sstables_repaired_at + 1 on every staging sstable). After base repair commits sstables_repaired_at to Raft, the staging sstable satisfies is_repaired(sstables_repaired_at, staging_sst) and therefore appears in make_repaired_sstable_set(). Any subsequent base repair that advances sstables_repaired_at further still includes the staging sstable (its repaired_at ≤ new sstables_repaired_at). D_base in the staging sstable thus shadows T_base in every repaired compaction's shadow check, keeping T_base non-purgeable as long as D_base remains in staging. A base table hint also cannot bypass this. A base hint is replayed as a base mutation. The resulting view update is generated synchronously on the base replica and sent to the MV replica via _hints_for_views_manager (path 1 above), not via staging. USING TIMESTAMP with timestamps predating (gc_before + propagation_delay) is explicitly UB and excluded from the safety argument. For tombstone_gc modes other than repair (timeout, immediate, disabled) the invariant does not hold for base tables either, so the full storage-group set is returned. The expected gain is reduced bloom filter and memtable key-lookup I/O during repaired compactions: the unrepaired set is typically the largest (it holds all recent writes), yet for tombstone_gc=repair it never influences GC decisions. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-231. Closes scylladb/scylladb#29310 * github.com:scylladb/scylladb: compaction: Restrict tombstone GC sstable set to repaired sstables for tombstone_gc=repair mode test/repair: Add tombstone GC safety tests for incremental repair	2026-04-22 10:21:37 +03:00
Avi Kivity	f5eb99f149	test: bump multishard_query_test querier_cache TTL to 60s to avoid flake Three test cases in multishard_query_test.cc set the querier_cache entry TTL to 2s and then assert, between pages of a stateful paged query, that cached queriers are still present (population >= 1) and that time_based_evictions stays 0. The 2s TTL is not load-bearing for what these tests exercise — they are checking the paging-cache handoff, not TTL semantics. But on busy CI runners (SCYLLADB-1642 was observed on aarch64 release), scheduling jitter between saving a reader and sampling the population can exceed 2s. When that happens, the TTL fires, both saved queriers are time-evicted, population drops to 0, and the assertion `require_greater_equal(saved_readers, 1u)` fails. The trailing `require_equal(time_based_evictions, 0)` check never runs because the earlier assertion has already aborted the iteration — which is why the Jenkins failure surfaces only as a bare "C++ failure at seastar_test.cc:93". Reproduced deterministically in test_read_with_partition_row_limits by injecting a `seastar::sleep(2500ms)` between the save and the sample: the hook then reports population=0 inserts=2 drops=0 time_based_evictions=2 resource_based_evictions=0 and the assertion fires — matching the Jenkins symptoms exactly. Bump the TTL to 60s in all three affected tests: - test_read_with_partition_row_limits (confirmed repro for SCYLLADB-1642) - test_read_all (same pattern, same invariants — suspect) - test_read_all_multi_range (same pattern, same invariants — suspect) Leave test_abandoned_read (1s TTL, actually tests TTL-driven eviction) and test_evict_a_shard_reader_on_each_page (tests manual eviction via evict_one(); its TTL is not load-bearing but the fix is deferred for a separate review) unchanged. Fixes: SCYLLADB-1642 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Closes scylladb/scylladb#29564	2026-04-22 09:48:59 +03:00
Tomasz Grabiec	cddde464ca	Merge 'service: Support adding/removing a datacenter with tablets by changing RF' from Aleksandra Martyniuk With this change, you can add or remove one or more DCs in a single ALTER KEYSPACE statement. It requires the keyspace to use rack list replication factor. In the existing approach, during an RF change all tablet replicas are rebuilt at once. This isn't the case now. In global_topology_request::keyspace_rf_change the request is added to ongoing_rf_changes, a new column in the system.topology table. In a new column in system_schema.keyspaces - next_replication - we keep the target RF. In make_rf_change_plan, the load balancer schedules necessary migrations, considering the load of nodes and other pending tablet transitions. Requests from ongoing_rf_changes are processed concurrently, independently from one another. In each request racks are processed concurrently. No tablet replica will be removed until all required replicas are added. While adding replicas to each rack we always start with base tables and won't proceed with views until they are done (while removing - the other way around). The intermediary steps aren't reflected in schema. When the RF change is finished: - in system_schema.keyspaces: - next_replication is cleared; - new keyspace properties are saved; - request is removed from ongoing_rf_changes; - the request is marked as done in system.topology_requests. Until the request is done, DESCRIBE KEYSPACE shows the replication_v2. If a request hasn't started to remove replicas, it can be aborted using the task manager. system.topology_requests::error is set (but the request isn't marked as done) and next_replication = replication_v2. This is interpreted by the load balancer, which starts the rollback of the request. After the rollback is done, we set the relevant system.topology_requests entry as done (failed), clear the request id from system.topology::ongoing_rf_changes, and remove next_replication. Fixes: SCYLLADB-567. 
No backport needed; new feature. Closes scylladb/scylladb#24421 * github.com:scylladb/scylladb: service: fix indentation docs: update documentation test: test multi RF changes service: tasks: allow aborting ongoing RF changes cql3: allow changing RF by more than one when adding or removing a DC service: handle multi_rf_change service: implement make_rf_change_plan service: add keyspace_rf_change_plan to migration_plan service: extend tablet_migration_info to handle rebuilds service: split update_node_load_on_migration service: rearrange keyspace_rf_change handler db: add columns to system_schema.keyspaces db: service: add ongoing_rf_changes to system.topology gms: add keyspace_multi_rf_change feature	2026-04-22 01:46:11 +02:00
Andrzej Jackowski	b6cb025e9b	test/audit: add reproducer for native-protocol batch not being audited The existing test_batch sends a textual BEGIN BATCH ... APPLY BATCH as a QUERY message, which goes through the CQL parser and raw::batch_statement:: prepare() — a path that correctly sets audit_info. This missed the bug where native-protocol BATCH messages (opcode 0x0D), handled by process_batch_internal in transport/server.cc, construct a batch_statement without setting audit_info, causing audit to silently skip the batch. Add _test_batch_native_protocol which uses the driver's BatchStatement (both unprepared and prepared variants) to exercise this code path. Refs SCYLLADB-1652	2026-04-21 21:52:26 +02:00
Andrzej Jackowski	f5bb9b6282	audit: set audit_info for native-protocol BATCH messages Commit `16b56c2451` ("Audit: avoid dynamic_cast on a hot path") moved audit info into batch_statement via set_audit_info(), but only wired it for the CQL-text BATCH path (raw::batch_statement::prepare()). Native-protocol BATCH messages (opcode 0x0D), handled by process_batch_internal in transport/server.cc, construct a batch_statement without setting audit_info. This causes audit to silently skip the entire batch. Set audit_info on the batch_statement so these batches are audited. Fixes SCYLLADB-1652	2026-04-21 21:52:26 +02:00
Andrzej Jackowski	5f93d57d6e	test/audit: rename internal test methods to avoid CI misdetection The CI heuristic picks up any function named test_* in changed files and tries to run it as a standalone pytest test. The AuditTester class methods (test_batch, test_dml, etc.) are not top-level pytest tests — they are internal helpers called from the actual test functions. Prefix them with underscore so CI does not mistake them for standalone tests.	2026-04-21 21:52:26 +02:00


53554 Commits