scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-02 21:17:01 +00:00

Author	SHA1	Message	Date
Ferenc Szili	21f0ef209b	test: add test for intranode balance threshold in size-based mode Verify that the load balancer does not issue intranode migrations when the load difference between shards is within the size_based_balance_threshold, and that it does issue migrations when the difference exceeds the threshold. (cherry picked from commit `6856f51097`)	2026-05-13 16:58:04 +00:00
Ferenc Szili	8a71a5ba88	tablet_allocator: apply balance threshold to intranode shard balancing The intranode shard balancing loop only stopped when the most-loaded and least-loaded shard were the same (src == dst), meaning it would keep issuing migrations until the load difference reached exactly 0. This caused unnecessary migrations for negligible imbalances. Apply the same is_balanced() threshold check that is already used for inter-node balancing, so that intranode migrations stop when the relative load difference between shards is within the configured size_based_balance_threshold (default 1%). (cherry picked from commit `aaead10e5d`)	2026-05-13 16:58:04 +00:00
Nadav Har'El	5de73f5480	test/cluster/auth_cluster: use CREATE ROLE IF NOT EXISTS to fix flaky test test_create_role_mixed_cluster calls servers_add(2) to bootstrap two old nodes concurrently, then adds a new node before issuing CREATE ROLE. The concurrent bootstraps trigger the well-known Python driver bug (scylladb/python-driver#317): two on_add notifications race in update_created_pools, causing a second pool to be created for a host whose pool was already established. If CREATE ROLE is in-flight on the old pool when it is closed, the driver retries on the new pool, executing the statement twice. The second execution fails with "Role ... already exists", making the test flaky. Fix by using CREATE ROLE IF NOT EXISTS. This is safe because unique_name() generates a timestamp+random suffix that is guaranteed to be unique; the role can "already exist" only due to the driver double-execution bug, never due to a real conflict. This is the same workaround that has been applied many times elsewhere in our test suite for exactly the same root cause: - CREATE KEYSPACE was changed to CREATE KEYSPACE IF NOT EXISTS (scylladb#18368, later generalised in scylladb#22399 via new_test_keyspace helpers) - DROP KEYSPACE was changed to DROP KEYSPACE IF EXISTS (scylladb#29487) Fixes: SCYLLADB-1811 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#29732 (cherry picked from commit `983eb5ab43`) Closes scylladb/scylladb#29743	2026-05-13 09:30:52 +03:00
Nadav Har'El	594e8f35b4	test: fix replica_read_timeout_no_exception flakiness on slow systems The test uses a 10ms read timeout to exercise code paths that handle timed-out reads without throwing C++ exceptions. As part of setup, it inserts rows and flushes them to two SSTables, then runs a warm-up SELECT to populate internal caches (e.g. the auth cache) before the real test begins. The reason for this warm-up read was the possibility that the first read does additional operations (such as reading and caching authentication) that might throw exceptions internally. I couldn't verify that such exceptions actually happen in today's code, but they might (re)appear in the future, so we should keep the warm-up SELECT. On slow CI machines (aarch64, debug build), that warm-up SELECT can take longer than 10ms to read from the two SSTables. When it does, the read times out: the coordinator receives 0 responses from the local replica within the deadline and propagates a read_timeout_exception. Since the exception is not caught, it escapes the test lambda, is logged as "cql env callback failed", and causes Boost.Test to report a C++ failure at the do_with_cql_env_thread call site. This matches the CI failure seen in SCYLLADB-1774: ERROR ... replica_read_timeout_no_exception: cql env callback failed, error: exceptions::read_timeout_exception (Operation timed out for replica_read_timeout_no_exception.tbl - received only 0 responses from 1 CL=ONE.) The CI log also shows that only 12 reads were admitted (the warm-up read plus the 11 reads from the two prepare() calls and CREATE/INSERT statements made earlier), and the current permit was stuck in need_cpu state -- the reactor hadn't had a chance to schedule the read before the 10ms window elapsed. The fix catches read_timeout_exception from the warm-up SELECT and retries until the read succeeds. The warm-up is required for correctness: some lazy-init code paths (e.g. 
auth cache population) use C++ exceptions for control flow internally. Those exceptions must be absorbed before the cxx_exceptions baseline is sampled inside execute_test(); otherwise they would appear in the delta and cause a false test failure. Simply ignoring a timed-out warm-up is not safe, because the lazy-init exceptions would then fire during the 1000 test reads, inflating cxx_exceptions_after relative to cxx_exceptions_before. No other calls in setup are susceptible to the 10ms read timeout: - CREATE KEYSPACE, CREATE TABLE, INSERT, and flush use the write timeout (10s) and are not reads. - e.prepare() goes through the query processor without reading table data, so it is not subject to the read timeout. - The semaphore manipulation in Test 2 is internal and has no timeout. - All 1000 reads in execute_test() are expected to fail, so a timeout there is the happy path, not a failure. The 10ms timeout itself is fine for the test's purpose: it is deliberately aggressive so that reads reliably time out on the hot path being tested. The problem was only that the pre-test warm-up was not guarded against the same timeout. Fixes: SCYLLADB-1830 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#29731 (cherry picked from commit `1f15e05946`) Closes scylladb/scylladb#29760	2026-05-13 09:28:04 +03:00
Ferenc Szili	b1fad45a6d	test: fix flaky test_tablets_split_merge_with_many_tables In debug mode, this test can time out during tablets merge. While the test already decreases the number of tables in debug mode (20 tables, instead of 200 for dev mode), this is not enough, and the test can still time out during merge. This change reduces the number of tables from 20 to 5 in debug mode. It also drops the log level for load_balancer to debug. This should make any potential future problems with this test easier to investigate. Fixes: SCYLLADB-1863 Closes scylladb/scylladb#29682 (cherry picked from commit `ec4b483e88`) Closes scylladb/scylladb#29786	2026-05-13 09:18:30 +03:00
Botond Dénes	a0a61fe81f	Merge '[Backport 2026.2] load_balancer: fix tablet allocator dropped table' from Scylladb[bot] - Handle dropped tables gracefully in the tablet load balancer's `get_schema_and_rs()` instead of aborting with `on_internal_error` - The load balancer operates on a token metadata snapshot but accesses the live schema for table lookups. A DROP TABLE applied by another fiber between coroutine yield points can remove a table from the live schema while it still exists in the snapshot, causing an abort. `get_schema_and_rs()` now returns `std::optional` and logs a warning in debug log level instead of aborting when a table is missing. All callers skip dropped tables: - `make_sizing_plan`: skips to next table - `make_resize_plan`: skips to next table (merge suppression is moot) - `check_constraints`: returns `skip_info{}` with empty viable targets - `get_rs`: returns `nullptr`, checked by `check_constraints` The call chain is: `make_plan` → `make_internode_plan` → `check_constraints` → `get_rs` → `get_schema_and_rs`. The `make_internode_plan` coroutine has multiple `co_await` yield points (`maybe_yield`, `pick_candidate`) between building the candidate tablet list and checking replication constraints. A DROP TABLE schema mutation applied during any of these yields removes the table from `_db.get_tables_metadata()` while the candidate list still references it. Added `test_load_balancing_with_dropped_table` which simulates the race by capturing a token metadata snapshot, dropping the table, then calling `balance_tablets` with the stale snapshot. Fixes: SCYLLADB-1905 This fix needs to be backported to versions: 2025.4, 2026.1 - (cherry picked from commit `4987204f71`) - (cherry picked from commit `6b3e18c4a9`) Parent PR: #29585 Closes scylladb/scylladb#29818 * github.com:scylladb/scylladb: test: verify load balancer handles dropped tables gracefully tablet_allocator: handle dropped tables gracefully in get_schema_and_rs	2026-05-13 09:16:30 +03:00
Piotr Szymaniak	4d00019eff	test/alternator: stop avoiding tablets in Streams tests Alternator Streams now supports tablets, so stop skipping the TTL Streams test in tablet mode and stop forcing vnodes in the Streams audit test. Refs SCYLLADB-463 Closes scylladb/scylladb#29697 (cherry picked from commit `459c1dc32f`) Closes scylladb/scylladb#29819	2026-05-13 09:15:20 +03:00
Botond Dénes	3f57cdf7d7	Merge '[Backport 2026.2] sstables_loader: ensure upload directory is empty when load_and_stream returns' from Scylladb[bot] After `load_and_stream` (e.g. via `nodetool refresh --load-and-stream`) returns success, source sstable files in the `upload/` directory may still be on disk. `mark_for_deletion()` only sets an in-memory flag; the actual file deletion runs lazily when the last `shared_sstable` reference drops. This leaves a window between API success and physical deletion where a follow-up scan of the upload directory can detect sstables that are about to be deleted. Processing such an sstable can fail because its files may be wiped while it is being processed. The fix: force the unlink to complete before `stream()` returns, so the upload directory is in a consistent state by the time the API reports success. For tablet streaming, partially-contained sstables participate in multiple per-tablet batches; eagerly unlinking after each batch would break the next batch that still needs to read the file. A `defer_unlinking` flag on the streamer postpones the explicit unlink until after all batches complete (called once at the end of `tablet_sstable_streamer::stream()`). Vnode streaming unlinks eagerly at the end of `stream_sstable_mutations`. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1647 Backport is required, as it is a bug fix that was introduced in `517a4dc4df`. - (cherry picked from commit `7cdf215999`) - (cherry picked from commit `784127c40b`) Parent PR: #29599 Closes scylladb/scylladb#29845 * github.com:scylladb/scylladb: sstables_loader: synchronously unlink streamed sstables before returning sstables: make sstable::unlink() idempotent	2026-05-13 09:13:33 +03:00
Anna Stuchlik	664fc5bcc6	doc: mark Vector Search in Alternator as Cloud-only This commit adds the information missing from the Alternator docs that Vector Search is only available in ScyllaDB Cloud. Fixes https://github.com/scylladb/scylladb/issues/29661 Closes scylladb/scylladb#29664 (cherry picked from commit `4c01556f79`) Closes scylladb/scylladb#29847	2026-05-13 09:12:26 +03:00
Anna Stuchlik	1b6784df56	doc: label Migration from Vnodes to Tablets as experimental The procedure to migrate a vnodes-based keyspace to tablets-based keyspace has been labeled as experimental. Fixes SCYLLADB-1932 Closes scylladb/scylladb#29834 (cherry picked from commit `1f7d20f701`) Closes scylladb/scylladb#29849	2026-05-13 09:11:09 +03:00
Botond Dénes	473320df18	Merge '[Backport 2026.2] load_balance: fix drain with forced capacity-based balancing' from Scylladb[bot] When `force_capacity_based_balancing` is enabled and a node is being drained/excluded, the tablet allocator incorrectly aborts balancing due to incomplete tablet stats - even though capacity-based balancing doesn't depend on tablet sizes. The tablet allocator normally waits for complete load stats before balancing. An exception exists for drained+excluded nodes (they're unreachable and won't return stats). However, when forced capacity-based balancing is active, this exception was not being applied, causing the balancer to reject the drain plan. Adjust the condition in `tablet_allocator.cc` so that the "ignore missing data for drained nodes" logic applies regardless of whether capacity-based balancing is forced. Added a Boost unit test that forces capacity-based balancing and verifies a drained/excluded node gets its tablets migrated even when tablet size stats are missing. This bug was introduced in 2026.1, so this needs to be backported to 2026.1 and 2026.2 Fixes: SCYLLADB-1953 - (cherry picked from commit `906d2b817e`) - (cherry picked from commit `f7bc8f5fa7`) Parent PR: #29791 Closes scylladb/scylladb#29866 * github.com:scylladb/scylladb: test: boost: add drain test for forced capacity-based balancing service: allow draining with forced capacity-based balancing	2026-05-13 09:05:42 +03:00
Botond Dénes	ceae68b487	schema: fix DESCRIBE showing NullCompactionStrategy when compaction is disabled When a table's compaction is disabled via 'enabled': 'false', the DESCRIBE output incorrectly showed NullCompactionStrategy instead of the actual strategy. This happened because schema_properties() called compaction_strategy(), which returns compaction_strategy_type::null when compaction is disabled. Fix it by using configured_compaction_strategy(), which always returns the real strategy type - consistent with how schema_tables.cc serializes it to disk. Fixes SCYLLADB-1353 Closes scylladb/scylladb#29804 (cherry picked from commit `8d6f031a4a`) Closes scylladb/scylladb#29867	2026-05-13 08:59:59 +03:00
Andrzej Jackowski	3df25f1952	test: wait for TTL scheduling sanity metric The test samples sl:default runtime before and after setup writes to prove that it measures the scheduling group used by regular CQL writes. The metric is exported in milliseconds, so a single 200-row batch may not be visible immediately, or may be too small in some environments. Keep the original 200-row table size, but wait up to 30 seconds for the metric to advance. If it does not, retry the same writes before TTL is enabled. The retries update the same keys, so the expiration part of the test still waits for exactly the original number of rows. In a local 100-run with N=200 rows, the observed delta of `ms_statement_before - ms_statement_before_write` was: min=4.0, max=16.0, mean=8.13, and median=8.0. Therefore, it looks possible that in a rare corner case the delta drops even to 0. Fixes SCYLLADB-1869 Closes scylladb/scylladb#29797 (cherry picked from commit `89261bf759`) Closes scylladb/scylladb#29868	2026-05-13 08:59:23 +03:00
Wojciech Mitros	1ed765a381	test: run test_mv_admission_control_exception on one shard In the test we perform 2 consecutive writes where the first write is supposed to increase the view update backlog above the mv admission control threshold and the second one is expected to be rejected because of that. On each node/shard we have 2 types of view update backlogs: 1. for deciding whether we should admit writes 2. for propagating the backlog information to other nodes/shards. For the second write to be rejected, it must be performed on a node and shard which updated its backlog of type 1. The view update backlog of type 2. is immediately increased on the base table replica. For this backlog to be registered as a backlog of type 1., it needs to be either carried by gossip (happening once every second) or by attaching it to a replica write response. We don't want to increase the runtime of tests unnecessarily, so we don't wait and we rely on the second mechanism. The response to the first base table write (the one causing increase in the backlog) carries the increased backlog to the coordinator of this write. So for the second write to observe the increased backlog, it needs to be coordinated on the same node+shard as the first write. We make sure that both writes are coordinated on the same node+shard by using prepared statements combined with setting the host in `run_async`. Both writes target the same partition and with prepared statements we route them directly to the correct shard. That was the idea, at least. In practice, for the driver to learn the correct shard, it first needs to learn the token->shard mapping from the server. For vnodes it can expect a shard by calculating the token of the affected partition, but for tablets, it had no opportunity to learn the tablet->shard mapping so the first write may route to any shard. Additionally, we aren't guaranteed that the driver established connections to all shards on all nodes at the point of any write. 
So if a connection finishes establishing between the two writes, this may also cause us to coordinate these 2 writes on different shards, leading to a missed view backlog growth and not-rejected second write. We fix this in this patch by running the test using one shard on each node. This way, as long as we perform both writes on the same node, they'll also be coordinated on the same shard. This also makes the prepared statement and BoundStatement unnecessary — we can use SimpleStatement with FallthroughRetryPolicy directly. Fixes: SCYLLADB-1957 Closes scylladb/scylladb#29862 (cherry picked from commit `f3cf20803b`) Closes scylladb/scylladb#29873	2026-05-13 08:56:27 +03:00
Ferenc Szili	12f4280d1e	test: boost: add drain test for forced capacity-based balancing Add a Boost unit test that forces capacity-based balancing through configuration and verifies that a drained and excluded node will be drained of its tablets when tablet size stats are missing. The test covers the regression where the allocator rejected the plan due to incomplete tablet stats, even though forced capacity-based balancing does not depend on tablet sizes. (cherry picked from commit `f7bc8f5fa7`)	2026-05-12 12:59:33 +00:00
Ferenc Szili	7a47c4ceba	service: allow draining with forced capacity-based balancing When force_capacity_based_balancing is enabled, the tablet allocator balances by node and shard capacity rather than by tablet sizes. When the data needed for load balancing is incomplete, the balancer fails and waits until load_stats is available and correct for all the nodes. An exception to this is when a node is being drained and excluded: it is unreachable, and will not return. In this case the balancer has to do its best and ignore the missing data. This patch fixes a bug where forcing capacity based balancing made the balancer not ignore missing data in these cases, and instead abort the balancing. (cherry picked from commit `906d2b817e`)	2026-05-12 12:59:33 +00:00
copilot-swe-agent[bot]	7acb040470	docs: fix typo in materialized views docs - "columns are" instead of "is" The MV Select Statement description was missing the word "columns" and used incorrect verb agreement, making the sentence grammatically broken and ambiguous. docs/cql/mv.rst: "which of the base table is included" → "which of the base table columns are included" Fixes #29662 Closes #29663 Co-authored-by: annastuchlik <37244380+annastuchlik@users.noreply.github.com> (cherry picked from commit `9e7d67612c`) Closes scylladb/scylladb#29835	2026-05-12 12:03:09 +03:00
Anna Stuchlik	9563994298	doc: update the node size limit This commit increases the node size limit from 256 to 4096 CPUs based on `be1f566488` Fixes SCYLLADB-1676 Closes scylladb/scylladb#29602 (cherry picked from commit `a7b7019f90`) Closes scylladb/scylladb#29846	2026-05-12 12:01:38 +03:00
Asias He	714003ef2e	repair: Reject repair requests where start and end tokens are equal When a user calls the repair API with identical startToken and endToken values, the code creates a wrapping interval (T, T]. This causes unwrap() to split it into (-inf, T] and (T, +inf), covering the entire token ring and triggering a full repair. Reject such requests early with an error message matching Cassandra's behavior: "Start and end tokens must be different." Fixes: CUSTOMER-368 Closes scylladb/scylladb#29821 (cherry picked from commit `0204372156`) Closes scylladb/scylladb#29836	2026-05-12 11:58:04 +03:00
Calle Wilund	be2f0a8601	storage_service: Disable snapshots after raft decommission Fixes: SCYLLADB-1936 In case we abort a decommission operation, the snapshot/backup mechanism needs to remain open. This change moves the disabling of snapshots to after raft_decommission. In the case of a cluster snapshot, our node's ownership (or not) of tables will be serialized by raft anyway, so it should remain consistent. In that case we at worst coordinate from a node in "leave" status. In the case of a local snapshot, ownership matters less; only the sstables on disk matter, and those should not change. In the case of backup, this operates on a snapshot, the state of which is not affected. Adds an injection point for testing. v2: - Added injection point to ensure test can abort decommission Closes scylladb/scylladb#29667 (cherry picked from commit `2cc1a2c406`) Closes scylladb/scylladb#29848	2026-05-12 11:42:14 +03:00
Taras Veretilnyk	ca9abcdcbc	sstables_loader: synchronously unlink streamed sstables before returning mark_for_deletion() only set an in-memory flag; the actual file deletion ran lazily when the last shared_sstable reference dropped, leaving a window in which a follow-up scan of the upload directory (e.g. a second 'nodetool refresh --load-and-stream') could observe a partially-deleted sstable and fail with malformed_sstable_exception. Force the unlink to complete before stream() returns. For tablet streaming, partially-contained sstables span multiple per-tablet batches, so a defer_unlinking flag postpones the unlink until after all sstables are streamed; vnode sstables and fully-contained sstables are streamed only once and can be removed right after being streamed. Added a FIXME on object_storage_base::wipe and strengthened the doc on storage::wipe to make the never-fails contract explicit. (cherry picked from commit `784127c40b`)	2026-05-11 18:44:27 +00:00
Taras Veretilnyk	fae12d069e	sstables: make sstable::unlink() idempotent Avoid duplicate work when unlink() is called more than once on the same sstable. This happens when a caller invokes unlink() explicitly on an sstable that is also marked for deletion: the destructor's close_files() path would otherwise call unlink() again, re-firing _on_delete, double-counting _stats.on_delete() and double-invoking _manager.on_unlink(). (cherry picked from commit `7cdf215999`)	2026-05-11 18:44:26 +00:00
Piotr Dulikowski	0ac15b7030	database: add missing co_await on lock in create_local_system_table The function database::create_local_system_table calls get_tables_metadata().hold_write_lock(), but does not co_await the returned future. Effectively, this code does not guarantee mutual exclusion because it does not wait for the lock to be acquired and does not guarantee that the lock is held long enough. Fix this by adding the co_await that was missing. Found by manual inspection. This code is not known to have caused any problems so far, but it's clearly wrong - hence the fix. Fixes: SCYLLADB-1916 Closes scylladb/scylladb#29806 (cherry picked from commit `bc482bfdea`) Closes scylladb/scylladb#29815	2026-05-11 12:24:35 +02:00
Jenkins Promoter	aff9aa156b	Update ScyllaDB version to: 2026.2.1	2026-05-11 08:51:05 +03:00
Ferenc Szili	be56bf031f	test: verify load balancer handles dropped tables gracefully Add test_load_balancing_with_dropped_table that simulates the race between DROP TABLE and the load balancer by capturing a token metadata snapshot before dropping the table, then passing the stale snapshot to balance_tablets(). Verifies it completes without aborting and produces no migrations for the dropped table. (cherry picked from commit `6b3e18c4a9`)	2026-05-10 22:37:42 +00:00
Ferenc Szili	4375b502ea	tablet_allocator: handle dropped tables gracefully in get_schema_and_rs The load balancer's get_schema_and_rs() would trigger on_internal_error when a table present in the token metadata snapshot had been concurrently dropped from the live schema. This race is possible because the balancer coroutine yields between building the candidate list and checking replication constraints, allowing a DROP TABLE schema mutation to be applied by another fiber in the meantime. Change get_schema_and_rs() to return {nullptr, nullptr} for dropped tables instead of aborting. Update all callers to skip dropped tables: - make_sizing_plan: continue to next table - make_resize_plan: continue to next table (merge suppression is moot) - check_constraints: return skip_info with empty viable targets - get_rs: return nullptr, checked by check_constraints (cherry picked from commit `4987204f71`)	2026-05-10 22:37:41 +00:00
Botond Dénes	815260866c	sstables/trie: add preemption points in trie_writer The BTI partition index trie writer flushes all buffered nodes at the end of each SSTable via complete_until_depth(0), called from bti_partition_index_writer_impl::finish(). This is a tight synchronous loop that writes trie nodes through file_writer::write(), which uses a buffered output_stream: individual writes that fit in the buffer are plain memcpy operations returning a ready future, so .get() never yields. As a result the reactor can stall for several milliseconds on large SSTables. The entire call chain runs inside seastar::async() (via sstable::write_components()), so seastar::thread::maybe_yield() is safe to call here. Add it at the top of both tight loops: - complete_until_depth(), which iterates over trie depth - lay_out_children(), which iterates over child branches per node Fixes SCYLLADB-1885 Closes scylladb/scylladb#29798 (cherry picked from commit `d0813769ec`) Closes scylladb/scylladb#29810	2026-05-10 22:15:09 +03:00
Anna Stuchlik	119df703b0	doc: add the upgrade guide from 2026.1 to 2026.2 This commit adds the upgrade guide, including the updated metrics. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1746 Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1765 Closes scylladb/scylladb#29694 (cherry picked from commit `61d1cbfd20`) Closes scylladb/scylladb#29814	2026-05-10 22:14:34 +03:00
Patryk Jędrzejczak	4f87c9c510	Merge 'topology_coordinator: join tablet load stats refresh in stop()' from Andrzej Jackowski Commit `2b7aa32` (topology_coordinator: Refresh load stats after table is created or altered) registered topology_coordinator as a schema change listener and added on_create_column_family which fire-and-forgets _tablet_load_stats_refresh.trigger(). The triggered task runs on the gossip scheduling group via with_scheduling_group and accesses the topology_coordinator via 'this'. stop() unregisters the listener but does not wait for any in-flight refresh task. If a notification fires between _tablet_load_stats_refresh.join() in run() and unregister_listener in stop(), the scheduled task can outlive the topology_coordinator and access freed memory after run_topology_coordinator's coroutine frame is destroyed. Wait for the refresh to complete in stop() after unregistering the listener, ensuring no task can fire after destruction. Fixes SCYLLADB-1728 Backport to 2026.1 and 2026.2, because the issue was introduced in `2b7aa32` Closes scylladb/scylladb#29653 * https://github.com/scylladb/scylladb: test: tablet_stats: reproduce shutdown refresh race topology_coordinator: join tablet load stats refresh in stop() (cherry picked from commit `d9dd3bfe53`) Closes scylladb/scylladb#29686	2026-05-10 13:56:42 +03:00
Nadav Har'El	f9aae8c2f1	Merge 'test: fix race window test flakiness from residual re-repair' from Avi Kivity Fix the persistent flakiness in `test_incremental_repair_race_window_promotes_unrepaired_data` (SCYLLADB-1478, reopened). After restarting servers[1], the topology coordinator can initiate a residual re-repair when it sees tablets stuck in the `repair` stage. This re-repair flushes memtables on all replicas and marks post-repair data as repaired, contaminating the test state and masking the compaction-merge bug the test is designed to detect. The assertion then fails on the next retry because the previous attempt's re-repair left behind repaired sstables containing post-repair keys. 1. Propagating `current_key` through the exception — correctly advanced the key counter on retry, but the contaminated tablet metadata from the prior re-repair (repaired sstables with post-repair keys) was still present, causing assertion failures on the next attempt. 2. DROP TABLE + CREATE TABLE between retries — the tablet metadata (sstables_repaired_at, repair stage) is tied to the tablet identity, and recreating the table in the same keyspace still showed residual state issues. Instead of trying to clean up contaminated state, each retry creates a completely fresh keyspace (unique name via `create_new_test_keyspace`). This gives entirely new tablets with no residual repair metadata from prior attempts. Combined with broader detection of coordinator changes and residual re-repairs, the test reliably retries before any contamination can cause false failures. The detection is now comprehensive: - Broadened coordinator check: any coordinator change (`new_coord != coord`), not just migration to servers[1] - Re-repair detection at three points: post-restart, during the compaction poll, and after injection release — grep for `"Initiating tablet repair host="` in the coordinator log 1. 
`test: extract _setup_table_for_race_window helper` — pure code-movement refactor that extracts keyspace+table+data+repair1+data+flush into a reusable helper. Easily verifiable as a no-op behavioral change. 2. `test: fix race window test flakiness from residual re-repair` — the actual fix: broadened detection logic + re-repair grep at 3 points + fresh-keyspace retry on exception. Passed 1000 consecutive runs with the fix applied. Without the fix, about 2% flakiness was observed in debug mode. Fixes: SCYLLADB-1743 So far, we haven't observed flakiness of this test on branches, so not backporting yet. Will backport if seen. Closes scylladb/scylladb#29721 * github.com:scylladb/scylladb: test: fix race window test flakiness from residual re-repair test: extract _setup_table_for_race_window helper for race window test (cherry picked from commit `d33bb6ea00`) Closes scylladb/scylladb#29761	2026-05-08 12:24:23 +02:00
Piotr Dulikowski	104e9b3c32	Merge 'table_helper: fix use-after-free on prepared-statement invalidation' from Marcin Maliszkiewicz insert() held no local strong ref to the prepared modification_statement across the suspension in execute(). On a single shard: 1. Fiber A suspends inside _insert_stmt->execute(). 2. DROP TABLE / DROP KEYSPACE on the target, or LRU eviction, removes the prepared_statements_cache entry, releasing its strong ref. 3. Fiber B re-enters cache_table_info(), sees _prepared_stmt (checked_weak_ptr) invalidated, and runs _insert_stmt = nullptr, releasing the last strong ref. The modification_statement is freed. 4. Fiber A resumes inside execute() and touches freed this. Pin a strong ref to _insert_stmt locally before the suspension. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1667 Backport: all supported branches, as it is a long-present memory corruption bug Closes scylladb/scylladb#29588 * github.com:scylladb/scylladb: test/boost: add dummy case to table_helper_test for non-injection modes test/boost: add regression test for table_helper insert() UAF utils/error_injection: add waiters() API table_helper: fix use-after-free on prepared-statement invalidation (cherry picked from commit `efcc0b6376`) Closes scylladb/scylladb#29747	2026-05-08 10:47:42 +02:00
Wojciech Mitros	4fc4f4e9f9	test: propagate view update backlog before partition delete In the test_delete_partition_rows_from_table_with_mv case we perform a deletion of a large partition to verify that the deletion will self-throttle when generating many view updates. Before the deletion, we first build the materialized view, which causes the view update backlog to grow. The backlog should be back to empty when the view building finishes, and we do wait for that to happen, but the information about the backlog drop may not be propagated to the delete coordinator in time - the gossip interval is 1s and we perform no other writes between the nodes in the meantime, so we don't make use of the "piggyback" mechanism of propagating view backlog either. If the coordinator thinks that the backlog is high on the replica, it may reject the delete, failing this test. We change this in this patch - after the view is built, we perform an extra write from the coordinator. When the write finishes, the coordinator will have the up-to-date view backlog and can proceed with the DELETE. Additionally, we enable the "update_backlog_immediately" injection, which makes the node backlog (the highest backlog across shards) update immediately after each change. Fixes: SCYLLADB-1877 Closes scylladb/scylladb#29775 (cherry picked from commit `ab12083525`) Closes scylladb/scylladb#29793	2026-05-07 22:43:18 +03:00
Jenkins Promoter	ee34573bd1	Update pgo profiles - aarch64	2026-05-07 15:36:29 +03:00
Piotr Dulikowski	851c605b1d	Merge '[Backport 2026.2] vector_search: test: fix flaky test_dns_resolving_repeated' from Scylladb[bot] The `vector_store_client_test_dns_resolving_repeated` test was intermittently timing out on CI. The exact root cause is not fully understood, but the hypothesis is that a single trigger signal can be lost somewhere (not exactly known where). This is not an issue for the production code because refresh trigger will be called multiple times whenever all configured nodes will be unreachable. Fixes SCYLLADB-1794 Backport to 2026.1 and 2026.2, as the same CI flakiness can occur on these branches. - (cherry picked from commit `4722be1289`) - (cherry picked from commit `207de967fb`) Parent PR: #29752 Closes scylladb/scylladb#29784 * github.com:scylladb/scylladb: vector_search: test: default timeout in test_dns_resolving_repeated vector_search: test: fix flaky test_dns_resolving_repeated	2026-05-07 14:34:34 +02:00
Jenkins Promoter	57f9d9d581	Update pgo profiles - x86_64	2026-05-07 15:05:58 +03:00
Marcin Maliszkiewicz	15b2ed99f0	Merge '[Backport 2026.2] auth: fix crash on ghost rows in role_permissions' from Scylladb[bot] The auth cache crashes when it encounters rows in role_permissions that have a live row marker but no permissions column. These “ghost rows” were created by the now-removed auth v2 migration, which used INSERT (creating row markers) instead of UPDATE. When permissions were later revoked, the row marker remained while the permissions column became null. An empty collection appears as null, since its lifetime is based only on its element's cells. As a result, when the cache reloads and expects the permissions column to exist, it hits a missing_column exception. The series removes dead code that was the primary crash site, adds has() guards to the remaining access paths, and includes a test reproducer. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1816 Backport: all supported versions 2026.1, 2025.4, 2025.1 - (cherry picked from commit `797bc28aae`) - (cherry picked from commit `c44625ebdf`) - (cherry picked from commit `df69a5c79b`) - (cherry picked from commit `5c5306c692`) Parent PR: #29757 Closes scylladb/scylladb#29783 * github.com:scylladb/scylladb: test: add reproducer for auth cache crash on missing permissions column auth: tolerate missing permissions column in authorize() auth: add defensive has() guard for role_attributes value column auth: remove unused permissions field from cache role_record	2026-05-07 10:55:07 +02:00
Yaron Kaikov	3aac93f49e	pgo: fix ModuleNotFoundError in exec_cql.py by reverting safe_driver_shutdown Commit `cf237e060a` introduced 'from test.pylib.driver_utils import safe_driver_shutdown' in pgo/exec_cql.py. This module runs during PGO profile training (a build step) where the test package is not on the Python path, causing an immediate ModuleNotFoundError on both x86 and ARM. Revert to plain cluster.shutdown() which is sufficient for the single-use PGO training scenario. Fixes: SCYLLADB-1862 Closes scylladb/scylladb#29746 (cherry picked from commit `65eabda833`) Closes scylladb/scylladb#29785	2026-05-07 10:09:25 +02:00
Karol Nowacki	44249c0a75	vector_search: test: default timeout in test_dns_resolving_repeated Replace explicit 1-second timeouts in repeat_until() with the default STANDARD_WAIT (10s). The 1-second timeout could be too aggressive for loaded CI environments where lowres_clock granularity (~10ms) combined with OS scheduling delays and resource contention (-c2 -m2G) could cause the loop to expire before the DNS refresh task completes its cycle. This also unifies test timeouts across test cases. (cherry picked from commit `207de967fb`)	2026-05-06 20:48:55 +00:00
Karol Nowacki	e9240587f4	vector_search: test: fix flaky test_dns_resolving_repeated Move trigger_dns_resolver() inside the repeat_until loop instead of calling it once before the loop. The test was intermittently timing out on CI. The exact root cause is not fully understood, but the hypothesis is that a single trigger signal can be lost somewhere (not exactly known where). This is not an issue for the production code because refresh trigger will be called multiple times - in every query where all configured nodes will be unreachable. By triggering inside the loop, we ensure the signal is re-sent on each iteration until the resolver actually performs the refresh and picks up the new (failing) DNS resolution. This makes the test resilient to timing-dependent signal loss without changing production code. Fixes: SCYLLADB-1794 (cherry picked from commit `4722be1289`)	2026-05-06 20:48:54 +00:00
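A minimal Python sketch of the idea in the commit above: if one trigger signal can be lost, re-sending it on every loop iteration makes the wait robust. `LossyResolver`, `trigger_once_then_poll` and `trigger_inside_loop` are hypothetical names for illustration only. They are not the C++ test helpers, and the "drop the first signal" behaviour is an assumed stand-in for the not-yet-understood root cause.

```python
class LossyResolver:
    """Toy resolver that silently drops the first N trigger signals (assumption)."""

    def __init__(self, drop_first=1):
        self._to_drop = drop_first
        self.refreshed = False

    def trigger(self):
        if self._to_drop > 0:
            self._to_drop -= 1  # signal lost, no refresh happens
            return
        self.refreshed = True


def trigger_once_then_poll(resolver, attempts):
    """Old shape: trigger once before the loop, then only poll."""
    resolver.trigger()
    for _ in range(attempts):
        if resolver.refreshed:
            return True
    return False


def trigger_inside_loop(resolver, attempts):
    """New shape: re-send the trigger on each iteration until a refresh is seen."""
    for _ in range(attempts):
        resolver.trigger()
        if resolver.refreshed:
            return True
    return False
```

With a resolver that drops one signal, the first variant never observes a refresh, while the second succeeds on its second iteration.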
Marcin Maliszkiewicz	b39c7fa034	test: add reproducer for auth cache crash on missing permissions column (cherry picked from commit `5c5306c692`)	2026-05-06 20:47:30 +00:00
Marcin Maliszkiewicz	3e3096d6df	auth: tolerate missing permissions column in authorize() Ghost rows in role_permissions with a live row marker but no permissions column can occur when permissions created via INSERT (e.g. by the removed auth v2 migration) are later revoked. The row marker survives the revoke, leaving a row visible to queries but with permissions=null. Add a has() guard before accessing the permissions column, matching the pattern already used in list_all(). Return NONE permissions for such ghost rows instead of crashing. (cherry picked from commit `df69a5c79b`)	2026-05-06 20:47:29 +00:00
Marcin Maliszkiewicz	6195e08408	auth: add defensive has() guard for role_attributes value column Add a has() check before accessing the value column in role_attributes to tolerate ghost rows with missing regular columns. In practice this is unlikely to be a problem since attributes are not typically revoked, but the guard is added for consistency and defensive programming. (cherry picked from commit `c44625ebdf`)	2026-05-06 20:47:29 +00:00
Marcin Maliszkiewicz	53caa6eca4	auth: remove unused permissions field from cache role_record The permissions field in role_record was populated by fetch_role() but never read. Authorization uses cached_permissions instead, which is loaded via the permission_loader callback. Remove the dead field and its fetch code. The removed code also did not check for missing columns before accessing the permissions set, which could crash on ghost rows left by the removed auth v2 migration. The migration used INSERT (creating row markers), and when permissions were later revoked, the row marker survived while the permissions column became null. (cherry picked from commit `797bc28aae`)	2026-05-06 20:47:28 +00:00
Marcin Maliszkiewicz	fb6d5368bb	Merge 'auth: fix shutdown and startup races in LDAP cache pruner' from Andrzej Jackowski The LDAP role manager's `_cache_pruner` background fiber periodically calls cache::reload_all_permissions(). Two races cause it to hit SCYLLA_ASSERT(_permission_loader): - Cross-shard race: The pruner used `_cache.container().invoke_on_all()` to reload permissions on every shard. Since both `service::start()` and `sharded<service>::stop()` execute per-shard in parallel, the pruner on one shard could call reload_all_permissions() on another shard before that shard set its loader (startup) or after it cleared its loader (shutdown). Each shard runs its own pruner instance, so reloading locally is sufficient, which also removes redundant N² reload calls. - Intra-shard race: `service::stop()` cleared the permission loader and stopped the role manager concurrently (via when_all_succeed). A mid-reload pruner could yield and then call the now-null loader. Fixed by stopping the role manager first so the pruner is fully drained before the loader is cleared. Fixes SCYLLADB-1679 Backport to 2026.2, introduced in `7eedf50c12` Closes scylladb/scylladb#29605 * github.com:scylladb/scylladb: auth: make shutdown the exact reverse of startup test: ldap: add test for pruner crash during shutdown auth: start authorizer and set permission loader before role manager auth: stop role manager before clearing permission loader auth: reload LDAP permission cache on local shard only (cherry picked from commit `b0f988afc4`) Closes scylladb/scylladb#29681	2026-05-06 14:33:33 +02:00
Marcin Maliszkiewicz	9e0c86b7fd	Merge 'utils: loading_cache: add `insert()` that is a no-op when caching is disabled' from Dario Mirovic When `permissions_validity_in_ms` is set to 0, executing a prepared statement under authentication crashes with: ``` Assertion `caching_enabled()' failed. at utils/loading_cache.hh:319 in authorized_prepared_statements_cache::insert ``` `loading_cache::get_ptr()` asserts when caching is disabled (expiry == 0), but `authorized_prepared_statements_cache::insert()` was using it purely for its side effect of populating the cache, which is meaningless when caching is off. Add a new `loading_cache::insert(k, load)` method that is a no-op when caching is disabled and otherwise forwards to `get_ptr()`. Switch `authorized_prepared_statements_cache::insert()` to use it. This completes the disabled-mode safety contract of the cache for the write side, mirroring the fallback that `get()` already provides for the read side. Includes a regression test in `test/boost/loading_cache_test.cc` plus a positive test for the new `insert()` overload. Fixes SCYLLADB-1699 The crash is introduced a long time ago. It is present on all the live versions, from 2025.1 onward. No client tickets, but it should be backported. Closes scylladb/scylladb#29638 * github.com:scylladb/scylladb: test: boost: regression test for loading_cache::insert with caching disabled utils: loading_cache: add insert() that is a no-op when caching is disabled (cherry picked from commit `c00fee0316`) Closes scylladb/scylladb#29762	2026-05-06 14:27:41 +02:00
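A hedged Python model of the contract described in the commit above: `get_ptr()` asserts that caching is enabled, while the new `insert()` is a no-op when it is not. `LoadingCacheModel` is a hypothetical stand-in. The real class is the C++ `loading_cache` template, whose signatures are not reproduced here.

```python
class LoadingCacheModel:
    """Toy cache: an expiry of 0 means caching is disabled (as in the commit text)."""

    def __init__(self, expiry_ms):
        self._expiry_ms = expiry_ms
        self._entries = {}

    def caching_enabled(self):
        return self._expiry_ms != 0

    def get_ptr(self, key, load):
        # Mirrors the assertion that fired in the reported crash.
        if not self.caching_enabled():
            raise AssertionError("caching_enabled()")
        if key not in self._entries:
            self._entries[key] = load(key)
        return self._entries[key]

    def insert(self, key, load):
        # No-op when caching is off; otherwise forward to get_ptr()
        # purely for its side effect of populating the cache.
        if not self.caching_enabled():
            return
        self.get_ptr(key, load)

    def size(self):
        return len(self._entries)
```

In this model, calling `insert()` on a cache built with `expiry_ms=0` returns quietly and stores nothing, whereas calling `get_ptr()` on the same cache raises.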
Patryk Jędrzejczak	6d09897339	Merge 'Barrier and drain logging' from Gleb Natapov Add more logging to barrier and drain rpc to try and pinpoint https://github.com/scylladb/scylladb/issues/26281 Backport since we want to have it if it happens in the field. Fixes: SCYLLADB-1836 Refs: #26281 Closes scylladb/scylladb#29735 * https://github.com/scylladb/scylladb: session, raft_topology: add periodic warnings for hung drain and stale version waits session: add info-level logging to drain_closing_sessions raft_topology: log sub-step progress in local_topology_barrier raft_topology: log read_barrier progress in topology cmd handler (cherry picked from commit `b69d00b0a7`) Closes scylladb/scylladb#29763 scylla-2026.2.0-rc1-candidate-20260506022659 scylla-2026.2.0-rc1	2026-05-06 10:26:44 +02:00
Yaniv Michael Kaul	5c8662d606	raft/group0: fix destroy assertion on startup failure If start_server_for_group0() successfully registers a server in _raft_gr._servers but a subsequent step (e.g. enable_in_memory_state_machine()) throws, the server is never destroyed because abort_and_drain()/destroy() check std::get_if<raft::group_id>(&_group0) which was only set after the entire with_scheduling_group block completed. Move _group0.emplace<raft::group_id>() inside the lambda, immediately after start_server_for_group() succeeds, so that cleanup paths can always find and destroy the registered server. This fixes the assertion: "raft_group_registry - stop(): server for group ... is not destroyed" which manifests during shutdown after an upgrade where topology_state_load() fails due to netw::unknown_address. Backport: Yes, to 2026.1, 2026.2, as it causes a crash on upgrades Refs: SCYLLADB-1217 Refs: CUSTOMER-340 Refs: CUSTOMER-335 Fixes: SCYLLADB-1809 Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> AI-assisted: Yes, Opencode/Opus 4.6 Closes scylladb/scylladb#29702 (cherry picked from commit `6179406467`) Closes scylladb/scylladb#29742	2026-05-05 10:48:13 +02:00
Patryk Jędrzejczak	74a58a6757	Merge 'paxos_state: keep prepared message alive across statement execution' from Petr Gusev In do_execute_cql_with_timeout(), when the prepared statement was not found in the cache, we called qp.prepare() and stored the returned result_message::prepared in a local variable scoped to the 'if' block. We then extracted ps_ptr (a checked_weak_ptr to the prepared statement) from the message, let the message go out of scope at the end of the 'if', and used ps_ptr after a co_await on st->execute(). Since `3ac4e258e8` ("transport/messages: hold pinned prepared entry in PREPARE result"), result_message::prepared owns a strong pinned reference to the prepared cache entry. While qp.prepare() runs it also holds its own pin on the entry, so on return the entry has at least the pin owned by the returned message. As long as that message is alive, the cache entry cannot be purged and the weak handle inside ps_ptr remains promotable. The lifetime gap manifested only in debug builds. qp.prepare() returns a ready future on the cache-miss path, so in release builds the co_await resumes synchronously: control flows from the assignment of ps_ptr straight into st->execute() with no opportunity for any other task (in particular, prepared cache invalidation triggered by a concurrent schema change) to run in between. Debug builds, however, force a reactor preemption point on every co_await even when the awaited future is ready. With prepared_msg already destroyed at the end of the 'if' block, the only remaining handle on the cache entry was the weak ps_ptr, and the preemption gave a concurrent cache purge - triggered, for example, by Raft schema changes received during a node restart - the chance to drop the entry. The subsequent execute() then failed when promoting the weak pointer with checked_ptr_is_null_exception. 
The exception propagated out of the Paxos prepare path as a generic std::exception with no type information in the log, surfacing on the coordinator as: WriteFailure: Failed to prepare ballot ... Replica errors: host_id ... -> seastar::rpc::remote_verb_error (std::exception) Hoist the result_message::prepared into the outer scope so the pinned cache entry stays alive across co_await st->execute(...), closing the window in which a concurrent cache purge could invalidate the weak handle. Fixes SCYLLADB-1173 backport: the patch is simple, we can backport it to all versions with "LWT over tablets" feature. Note that the problem is only in test runs in debug configuration, production is not affected. Closes scylladb/scylladb#29675 * https://github.com/scylladb/scylladb: table_helper: retry insert prepare on concurrent cache invalidation paxos_state: keep prepared message alive across statement execution (cherry picked from commit `15f35577ed`) Closes scylladb/scylladb#29701	2026-05-05 10:02:19 +02:00
Aleksandr Bykov	148e05820b	test: fix flaky test_kill_coordinator_during_op The test hardcoded the expected number of coordinator elections (2, 3, 4, 5) for each phase. If a prior phase triggered an extra election, subsequent phases would wait for a count that was already reached or would never match. Fix by reading the current election count before each operation and expecting exactly one more, making each phase independent of prior history. Also add wait_for_no_pending_topology_transition() calls after each coordinator election to ensure the topology state machine has fully settled before proceeding with restarts and further operations. Decrease the failure detector timeout (failure_detector_timeout_in_ms) to 2000 ms on all test nodes so that coordinator crashes are detected faster, reducing test wallclock time and timeout-related flakiness. Enable raft_topology=trace logging on all test nodes to aid post-failure diagnosis. Add diagnostic logging in wait_new_coordinator_elected(). Fixes: SCYLLADB-1790 Closes scylladb/scylladb#29284 (cherry picked from commit `8afdae24d2`) Closes scylladb/scylladb#29723	2026-05-02 16:27:16 +03:00
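The baseline approach from the commit above can be sketched as follows: read the current election count before each operation and wait for one more, instead of hardcoding totals such as 2, 3, 4, 5. `wait_for_one_more_election` and `read_count` are hypothetical names, not the test's real helpers, and the `>=` comparison is a choice made for this sketch.

```python
import time


def wait_for_one_more_election(read_count, baseline, timeout=5.0, poll=0.01):
    """Wait until the counter passes `baseline` by one, independent of history.

    `read_count` is any zero-argument callable returning the current
    number of coordinator elections (hypothetical interface).
    """
    expected = baseline + 1
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        current = read_count()
        if current >= expected:  # sketch choice: tolerate overshoot
            return current
        time.sleep(poll)
    raise TimeoutError(f"election count did not reach {expected}")
```

Because the expectation is relative to the value read just before the operation, an extra election in an earlier phase does not affect later phases.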
Łukasz Paszkowski	1438830348	sstables: only wipe TemporaryHashes for sstable formats that have it Commit `8d34127684` ("sstables: clean up TemporaryHashes file in wipe()") unconditionally calls filename(..., component_type::TemporaryHashes) inside filesystem_storage::wipe(). However, the TemporaryHashes component is only registered in the component map of the 'ms' sstable format. For older formats (ka, la, mc, md, me) the lookup goes through sstable_version_constants::get_component_map(version).at(...) and throws std::out_of_range. The exception is then swallowed by the outer catch(...) in wipe(), which just logs and ignores. As a side effect, the subsequent remove_file(new_toc_name) is never reached and the TemporaryTOC ('*-TOC.txt.tmp') file is left as an orphan on disk after every unlink() of a non-'ms' sstable. Guard the lookup with get_component_map(version).contains() so the cleanup is only attempted for formats that actually define the component. Add a regression test in test/boost/sstable_directory_test.cc that creates an 'me'-format sstable, unlinks it and asserts that the sstable directory is left empty. Without the fix the test fails with a leftover 'me-...-TOC.txt.tmp' file. Fixes: SCYLLADB-1767 Closes scylladb/scylladb#29620 (cherry picked from commit `7e14ea5ac8`) Closes scylladb/scylladb#29692	2026-04-30 21:49:31 +03:00
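A small Python model of the failure mode in the commit above, with a dict lookup standing in for the C++ `get_component_map(version).at(...)` call and `KeyError` standing in for `std::out_of_range`. The component maps and file names below are placeholders invented for the sketch, not the real sstable component tables.

```python
# Placeholder component maps: only 'ms' defines TemporaryHashes (per the commit text).
COMPONENT_MAPS = {
    "me": {"TemporaryTOC": "TOC.txt.tmp"},
    "ms": {"TemporaryTOC": "TOC.txt.tmp", "TemporaryHashes": "hashes.tmp"},
}


def wipe_unguarded(version, files):
    """Buggy flow: a failing lookup aborts the rest of the cleanup."""
    remaining = set(files)
    try:
        components = COMPONENT_MAPS[version]
        remaining.discard(components["TemporaryHashes"])  # KeyError for 'me'
        remaining.discard(components["TemporaryTOC"])     # never reached for 'me'
    except Exception:
        pass  # swallowed, like the outer catch(...) described in the commit
    return remaining


def wipe_guarded(version, files):
    """Fixed flow: only look up the component if the format defines it."""
    remaining = set(files)
    try:
        components = COMPONENT_MAPS[version]
        if "TemporaryHashes" in components:
            remaining.discard(components["TemporaryHashes"])
        remaining.discard(components["TemporaryTOC"])
    except Exception:
        pass
    return remaining
```

For an 'me' sstable the unguarded variant leaves `TOC.txt.tmp` behind, which corresponds to the orphaned TemporaryTOC file the regression test checks for.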

53586 Commits