scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-22 07:42:16 +00:00

Author	SHA1	Message	Date
Benny Halevy	97e03762c5	test/cluster/test_keyspace_rf: extend test_create_keyspace_with_default_replication_factor for tablets rack lists Add more racks to dc2 to verify that the default replication factor covers all available racks (rather than e.g. limited to 3). With tablets and rf_rack_valid_keyspaces, verify also the automatically selected rack list. Restrict the extension to non-debug build modes to prevent running out of memory with --repeat=100. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#29931	2026-05-19 10:44:24 +03:00
Andrei Chekun	6414c48fc2	test.py: rewrite resource gather Python tests require different handling of metrics gathering from cgroup than C++ tests. pytest does not execute each Python test in a separate process, so we can't put it there and get the metrics. The idea is to put the whole pytest process into the cgroup and get the metrics. This will work because pytest runs the threads as completely separate processes and inside the thread it will run tests consecutively. Additionally, to simplify, the system resource monitor was moved to the pytest main thread.	2026-05-18 12:23:40 +02:00
Evgeniy Naydanov	39a10d6d67	test: remove dead suite subclasses and legacy execution pipeline After all test suites migrated to test_config.yaml with type: Python, the specialized suite classes (Topology, CQLApproval, Run, Tool) and the legacy execution pipeline (find_tests, run_test, TestSuite.run, Test.run) became unreachable. Remove all this dead code. Deleted files: - suite/topology.py, suite/cql_approval.py, suite/run.py, suite/tool.py Simplified: - base.py: remove run_test(), read_log(), TestSuite.run(), add_test_list(), build_test_list(), all_tests(), test_count(), SUITE_CONFIG_FILENAME, disabled/flaky test tracking, and dead Test attributes (args, core_args, valid_exit_codes, allure_dir, is_flaky, is_cancelled, etc.) - python.py: remove PythonTestSuite.run(), PythonTest.run(), _prepare_pytest_params(), pattern, test_file_ext, xmlout, server_log, scylla_env setup, and shlex import. Simplify run_ctx() to take no parameters. - runner.py: remove --scylla-log-filename option, print_scylla_log_filename fixture, SUITE_CONFIG_FILENAME import, and suite.yaml probe in TestSuiteConfig.from_pytest_node(). - __init__.py: remove re-exports of deleted classes. - test_config.yaml: Topology -> Python, Approval -> Python. - conftest files: run_ctx(options=...) -> run_ctx(). - docs/dev/testing.md: update to reflect current pytest-based architecture, log paths, and removed features. Co-Authored-By: Claude Opus 4.6 (200K context) <noreply@anthropic.com> Closes scylladb/scylladb#29613	2026-05-17 22:16:31 +03:00
Andrzej Jackowski	61e5ec9888	test: storage: retry fusermount3 unmount on teardown After stopping scylla server processes, the FUSE daemon (fuse2fs) may still be processing file handle closures. An immediate fusermount3 -u can fail with 'device busy', causing spurious test failures on teardown. Retry the unmount up to 10 times with 0.5s delay between attempts, and capture stderr for diagnostics. Fixes: SCYLLADB-2049 Closes scylladb/scylladb#29920	2026-05-16 19:36:48 +03:00
Piotr Dulikowski	460cb1656e	Merge 'test: limits: optimize test_max_cells to avoid large allocations and fragmentation' from Dario Mirovic The `test_max_cells` test was flaky due to `std::bad_alloc` caused by Seastar buddy allocator fragmentation. The root causes are: 1. The doubling loop with 24 iterations of CREATE/INSERT/DROP fragmented the allocator 2. The test built the whole batch as a single string that takes contiguous memory Also, some iterations inserted zero rows, but still did CREATE/DROP table which also contributed to the fragmentation. This patch series: - Skips iterations that insert zero rows - Creates the table once, truncates it after each test iteration - Switches to prepared statements Investigation results are presented in detail in https://scylladb.atlassian.net/browse/SCYLLADB-1645 Fixes SCYLLADB-1645 CI stability improvement. Backport to versions that have this test. Closes scylladb/scylladb#29759 * github.com:scylladb/scylladb: test: prepare max cells inserts test: reuse max cells schema test: limits: skip empty max cells iterations	2026-05-15 18:12:48 +02:00
Aleksandra Martyniuk	d874d355c2	service: skip load_sketch unload for excluded nodes on RF shrink When an RF change shrinks replicas on a DC and the node being shrunk is excluded, refresh_tablet_load_stats() only provides load_stats for that node if it has a cached snapshot from when the node was still up. If the snapshot is missing or predates the tables being shrunk (e.g. they were created after the node went down), stats stay incomplete. In that case load_sketch::unload() called from make_rf_change_plan() throws: Can't provide accurate load computation with incomplete load_stats for host: <uuid> Since an excluded node is not expected to come back, load_stats will never become complete, and the topology coordinator retries the plan infinitely, hanging ALTER KEYSPACE. Add a check for excluded nodes and skip unload() for them: we are removing the replica, so accurate load data for that node is not needed. For all other node states the throw-and-retry behavior is preserved. Modify test_excludenode_shrink_rf to always trigger the bug: a new error injection 'force_down_node_load_stats_invalid' forces the invalid-stats path in refresh_tablet_load_stats() for a down node, so the test does not depend on whether the load-stats refresher happened to cache the excluded node's stats while it was still up. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1702. Closes scylladb/scylladb#29622	2026-05-15 17:46:28 +02:00
Marcin Maliszkiewicz	0574055b73	test: prepare max cells inserts Switch from raw CQL batch string to using a prepared statement. The old approach constructed the entire 50-row batch as a single CQL text string (~19.8 MiB with 32768 column names spelled out per row). This caused large contiguous allocations in the server. Fixes SCYLLADB-1645	2026-05-14 17:25:39 +02:00
Marcin Maliszkiewicz	0fd6f6f292	test: reuse max cells schema Extract table creation into _create_max_cell_count_table(). Call it once before the loop instead of creating and dropping the table on every iteration. Use TRUNCATE instead of DROP TABLE between iterations to clear data while keeping the schema. This avoids repeated schema operations that fragment the Seastar buddy allocator's address space with scattered small allocations. Refs SCYLLADB-1645	2026-05-14 17:24:53 +02:00
Marcin Maliszkiewicz	3debae9a37	test: limits: skip empty max cells iterations The doubling loop in test_max_cells started from cells=1. Since each row has MAX_CELLS_COLUMNS (32768) cells, iterations where cells < MAX_CELLS_COLUMNS produced zero rows (cells // columns = 0). Those iterations only did CREATE TABLE / DROP TABLE with no data inserted. Start the loop from MAX_CELLS_COLUMNS and use a while loop. Co-authored-by: Dario Mirovic <dario.mirovic@scylladb.com> Refs SCYLLADB-1645	2026-05-14 17:00:15 +02:00
Piotr Dulikowski	5b269be37b	Merge 'test/cluster/test_view_building_coordinator: migrate test from dtest' from Michał Jadwiszczak Move `materialized_views_test.py::TestMaterializedViews::test_do_not_finish_view_building_with_hints` test from dtest to test.py. The dtest was throttling down IO throughput in the hope that the view building won't be finished too soon. This introduces some unreliability, which can be solved by using error injection and pausing view building until we stop necessary nodes. This patch adds 2 tests: one for tablet-based view and one for vnode-based. Both of the tests use error injection to pause view building. Fixes [SCYLLADB-1261](https://scylladb.atlassian.net/browse/SCYLLADB-1261) The issue was seen in 2026.2, so we should backport this patch to this version. Closes scylladb/scylladb#29788 * github.com:scylladb/scylladb: test/cluster/mv/test_mv_building: add similar test for vnode-based view test/cluster/test_view_building_coordinator: migrate test from dtest db/view/view_building_worker: add more logs when flushing base table	2026-05-14 15:34:26 +02:00
Michał Jadwiszczak	5c84cff78a	test/cluster/mv/test_mv_building: add similar test for vnode-based view In the dtest repo, the test runs for both vnode and tablet based views. Since in test.py infra we're using error injection to pause the view building process, we need separate tests for those two cases.	2026-05-14 10:52:44 +02:00
Piotr Dulikowski	0c016cecc3	Merge 'QOS: self-heal stale V1-to-V2 migration state on upgrade' from Alex Dathskovsky service_levels: self-heal stale v1 marker after raft topology upgrade This PR handles an upgrade corner case where a node may already be using raft topology, while `system.scylla_local` still marks service levels as v1. The problem was introduced by commit `2917ec5d51` ("service:qos: service levels migration"), which added the service-levels migration from `system_distributed.service_levels` to `system.service_levels_v2` as part of the raft topology upgrade. However, if the cluster had no service levels configured, there was no data to migrate. In that case, the migration path could leave the local version marker unchanged, so the node would later observe an inconsistent state: * raft topology is already enabled; * service levels are still marked as v1 in `system.scylla_local`. Such clusters can be left in a stale state and fail startup during upgrade to 2026.2 This PR makes the upgrade path self-healing. The first commit restores `service_level_controller::migrate_to_v2()`, giving us a group0-based path for writing the service-levels v2 state even after raft topology is already in use. The second commit wires this path into startup. When the node detects the stale raft-topology + service-levels-v1 state, it retries the migration a bounded number of times and updates the version marker to v2 instead of failing startup. With this change, clusters that were left in this stale state can recover automatically during upgrade to 2026. Fixes: SCYLLADB-1807 backport: 2026.2 2026.1 we need this functionality when we are upgrading older servers Closes scylladb/scylladb#29749 * github.com:scylladb/scylladb: test/auth_cluster: simulate v1 state in self-heal test When skip_service_levels_v2_initialization is used, write an explicit v1 service level version marker while skipping v2 initialization. This lets the restart test exercise self-healing from v1 to v2. 
qos: self-heal stale service levels version on startup qos: reintroduce service levels v2 migration self-heal	2026-05-14 10:32:43 +02:00
Michał Jadwiszczak	b887f8cb2b	test/cluster/test_view_building_coordinator: migrate test from dtest Move `materialized_views_test.py::TestMaterializedViews::test_do_not_finish_view_building_with_hints` test from dtest to test.py. The dtest was throttling down IO throughput in the hope that the view building won't be finished too soon. This introduces some unreliability, which can be solved by using error injection and pausing view building until we stop necessary nodes. Fixes SCYLLADB-1261	2026-05-14 10:23:42 +02:00
Avi Kivity	f2ab911a46	Merge 'test/cluster: fix server-starting functions to wait for all ports' from Nadav Har'El This series fixes a recurring source of flaky tests in the cluster test suite. When a test configures Scylla to listen on non-default ports (e.g. a custom Alternator port, proxy-protocol port or shard-aware port), server_add() and server_start() would declare the server ready by polling the hardcoded standard CQL and Alternator ports. Those ports can become available slightly before the custom ports finish binding, so the test could start using the custom port before it was open, causing intermittent failures. The fix for each affected test was to pass `expected_server_up_state=ServerUpState.SERVING` explicitly, which waits for Scylla's sd_notify("STATUS=serving") signal instead. That signal is sent only after all configured listeners are fully open, so it is always the right readiness signal regardless of the port configuration. This workaround was applied again in PR #29737 and will keep being needed for every new test that uses a non-default port. This series makes ServerUpState.SERVING the default at every level of the server start/add call stack so no test needs to remember it: * Make server_add(), servers_add(), server_start() et al. all default to ServerUpState.SERVING. * Document that server_add/server_start wait for all ports to be ready, so future test authors understand what the functions guarantee. * Remove now-redundant expected_server_up_state=SERVING from existing tests. * A small optimization: Fix check_serving_notification() returning False on first completion. When the sd_notify future completed, the function correctly updated _received_serving but still returned False, wasting one 100ms polling interval. Return self._received_serving directly. 
Closes scylladb/scylladb#29758 * github.com:scylladb/scylladb: test/pylib: fix missing protocol_version=4 on control_cluster scylla_cluster: guard poll_status() set_result() calls against cancelled future test/cluster: avoid repeated CQL checks and leaks while waiting for SERVING test/cluster: fix check_serving_notification() inefficiency test/cluster: remove now-redundant expected_server_up_state=SERVING test/cluster: document that add/start waits for all ports to be ready test/cluster: update remaining CQL_ALTERNATOR_QUERIED defaults to SERVING test/cluster: fix server_add/server_start hanging when starting in maintenance mode main: notify "entering maintenance mode" after the maintenance CQL server is ready test/cluster: make server_start() default to ServerUpState.SERVING test/cluster: make server_add() default to ServerUpState.SERVING	2026-05-13 21:23:18 +03:00
Alex	6188bf3e01	test/auth_cluster: simulate v1 state in self-heal test When skip_service_levels_v2_initialization is used, write an explicit v1 service level version marker while skipping v2 initialization. This lets the restart test exercise self-healing from v1 to v2.	2026-05-13 17:55:20 +03:00
Piotr Dulikowski	dc05bd35bb	Merge 'strong_consistency: limit available consistency levels in strong consistent requests' from Michał Jadwiszczak Strongly consistent requests take a different path than EC requests and consistency levels don’t map well. We should limit available consistency levels in SC requests to avoid ignoring them silently, which may confuse the user. For writes, there is only one option: - QUORUM/LOCAL_QUORUM (multi DC is not supported yet, so both of those CLs have the same effect) - we need a quorum of replicas to successfully commit new mutations to the Raft log. For reads, there are 2 options: - QUORUM/LOCAL_QUORUM - if the user wants to be sure they see the latest data and the query needs to execute `read_barrier()`, which requires a quorum of replicas - ONE/LOCAL_ONE - if the user just wants to read data from one replica without synchronization All tests were updated to use LOCAL_QUORUM for both reads and writes. Fixes SCYLLADB-1766 SC is in experimental phase and this patch is an improvement, no backport needed. Closes scylladb/scylladb#29691 * github.com:scylladb/scylladb: strong_consistency: allow QUORUM/LOCAL_QUORUM and ONE/LOCAL_ONE for reads strong_consistency: allow only QUORUM/LOCAL_QUORUM CL for writes	2026-05-13 16:31:05 +02:00
Piotr Smaron	0fcae72530	test: bootstrap tombstone gc repair cluster sequentially Avoid concurrent topology changes in the tombstone GC repair setup, where debug-mode nodes running hinted handoff and materialized view startup work can time out while applying Raft entries before the test starts. Keep the sequential path opt-in so unrelated repair tests still exercise concurrent bootstrap behavior. Closes scylladb/scylladb#29829	2026-05-13 13:58:44 +03:00
Patryk Jędrzejczak	3f2ff5a13f	Merge 'Remove raft_group0::finish_setup_after_join' from Gleb Natapov The function does nothing useful now. No backport needed. Removes code. Closes scylladb/scylladb#29828 * https://github.com/scylladb/scylladb: raft_group0: remove finish_setup_after_join function raft_group0: fix indentation after the last change raft_group: drop unneeded checks	2026-05-13 10:53:37 +02:00
Yaniv Michael Kaul	5d6f160129	test: update get_scylla_2025_1_executable() to use 2025.1.12 Update the hardcoded 2025.1.0 binary URL to the latest 2025.1.12 release for upgrade tests. The 2025.1.12 binary now supports and enforces the rf_rack_valid_keyspaces option which the test harness enables by default. Since test_sstable_compression_dictionaries_upgrade creates a 2-node cluster in a single rack with RF=2, it violates the constraint. Disable the option explicitly for this test. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes scylladb/scylladb#29714	2026-05-12 23:20:55 +02:00
Michał Jadwiszczak	68f0cf6fac	strong_consistency: allow only QUORUM/LOCAL_QUORUM CL for writes To successfully write data to a strongly consistent table, a quorum of replicas needs to be used to save the data to the Raft log. So the only reasonable consistency level is QUORUM/LOCAL_QUORUM (currently SC doesn't support multi DC).	2026-05-12 23:20:03 +02:00
Wojciech Mitros	f3cf20803b	test: run test_mv_admission_control_exception on one shard In the test we perform 2 consecutive writes where the first write is supposed to increase the view update backlog above the mv admission control threshold and the second one is expected to be rejected because of that. On each node/shard we have 2 types of view update backlogs: 1. for deciding whether we should admit writes 2. for propagating the backlog information to other nodes/shards. For the second write to be rejected, it must be performed on a node and shard which updated its backlog of type 1. The view update backlog of type 2. is immediately increased on the base table replica. For this backlog to be registered as a backlog of type 1., it needs to be either carried by gossip (happening once every second) or by attaching it to a replica write response. We don't want to increase the runtime of tests unnecessarily, so we don't wait and we rely on the second mechanism. The response to the first base table write (the one causing increase in the backlog) carries the increased backlog to the coordinator of this write. So for the second write to observe the increased backlog, it needs to be coordinated on the same node+shard as the first write. We make sure that both writes are coordinated on the same node+shard by using prepared statements combined with setting the host in `run_async`. Both writes target the same partition and with prepared statements we route them directly to the correct shard. That was the idea, at least. In practice, for the driver to learn the correct shard, it first needs to learn the token->shard mapping from the server. For vnodes it can expect a shard by calculating the token of the affected partition, but for tablets, it had no opportunity to learn the tablet->shard mapping so the first write may route to any shard. Additionally, we aren't guaranteed that the driver established connections to all shards on all nodes at the point of any write. 
So if a connection finishes establishing between the two writes, this may also cause us to coordinate these 2 writes on different shards, leading to a missed view backlog growth and not-rejected second write. We fix this in this patch by running the test using one shard on each node. This way, as long as we perform both writes on the same node, they'll also be coordinated on the same shard. This also makes the prepared statement and BoundStatement unnecessary — we can use SimpleStatement with FallthroughRetryPolicy directly. Fixes: SCYLLADB-1901 Closes scylladb/scylladb#29862	2026-05-12 17:34:19 +02:00
Piotr Dulikowski	129f193116	Merge 'strong_consistency: implement basic coordinator metrics' from Michał Jadwiszczak Add per-shard metrics for strong consistency coordinator operations (latency, timeouts, bounces, status unknown) under the `"strong_consistency_coordinator"` category. These are analogous to the eventual consistency metrics in `storage_proxy_stats`, enabling direct performance comparison between the two consistency modes. The metrics are simplified compared to `storage_proxy_stats` — no breakdown by table, tablet, scheduling group, or DC, only per-shard. Fixes SCYLLADB-1343 Strong consistency is still in experimental phase, no need to backport. Closes scylladb/scylladb#29318 * github.com:scylladb/scylladb: test/strong_consistency: verify metrics strong_consistency: wire up metrics to operations strong_consistency: add stats struct and metrics registration	2026-05-12 16:15:51 +02:00
Botond Dénes	e95eb21a16	Merge 'Tablet-aware restore' from Pavel Emelyanov The mechanics of the restore is like this - A /storage_service/tablets/restore API is called with (keyspace, table, endpoint, bucket, manifests) parameters - First, it populates the system_distributed.snapshot_sstables table with the data read from the manifests - Then it emplaces a bunch of tablet transitions (of a new "restore" kind), one for each tablet - The topology coordinator handles the "restore" transition by calling a new RESTORE_TABLET RPC against all the current tablet replicas - Each replica handles the RPC verb by - Reading the snapshot_sstables table - Filtering the read sstable infos against current node and tablet being handled - Downloading and attaching the filtered sstables This PR includes system_distributed.snapshot_sstables table from @robertbindar and preparation work from @kreuzerkrieg that extracts raw sstables downloading and attaching from existing generic sstables loading code. This is first step towards SCYLLADB-197 and lacks many things. 
In particular - the API only works for single-DC cluster - the caller needs to "lock" tablet boundaries with min/max tablet count - not abortable - no progress tracking - sub-optimal (re-kicking API on restore will re-download everything again) - not re-attachable (if API node dies, restoration proceeds, but the caller cannot "wait" for it to complete via other node) - nodes download sstables in maintenance/streaming sched group (should be moved to maintenance/backup) Other follow-up items: - have an actual swagger object specification for `backup_location` Closes #28436 Closes #28657 Closes #28773 Closes scylladb/scylladb#28763 * github.com:scylladb/scylladb: docs: Update topology_over_raft.md with `restore` transition kind test: Add test for backup vs migration race test: Restore resilience test sstables_loader: Fail tablet-restore task if not all sstables were downloaded sstables_loader: mark sstables as downloaded after attaching sstables_loader: return shared_sstable from attach_sstable db: add update_sstable_download_status method db: add downloaded column to snapshot_sstables db: extract snapshot_sstables TTL into class constant test: Add a test for tablet-aware restore tablets: Implement tablet-aware cluster-wide restore messaging: Add RESTORE_TABLET RPC verb sstables_loader: Add method to download and attach sstables for a tablet tablets: Add restore_config to tablet_transition_info sstables_loader: Add restore_tablets task skeleton test: Add rest_client helper to kick newly introduced API endpoint api: Add /storage_service/tablets/restore endpoint skeleton sstables_loader: Add keyspace and table arguments to manfiest loading helper sstables_loader_helpers: just reformat the code sstables_loader_helpers: generalize argument and variable names sstables_loader_helpers: generalize get_sstables_for_tablet sstables_loader_helpers: add token getters for tablet filtering sstables_loader_helpers: remove underscores from struct members sstables_loader: move 
download_sstable and get_sstables_for_tablet sstables_loader: extract single-tablet SST filtering sstables_loader: make download_sstable static sstables_loader: fix formating of the new `download_sstable` function sstables_loader: extract single SST download into a function sstables_loader: add shard_id to minimal_sst_info sstables_loader: add function for parsing backup manifests split utility functions for creating test data from database_test export make_storage_options_config from lib/test_services rjson: Add helpers for conversions to dht::token and sstable_id Add system_distributed_keyspace.snapshot_sstables add get_system_distributed_keyspace to cql_test_env code: Add system_distributed_keyspace dependency to sstables_loader storage_service: Export export handle_raft_rpc() helper storage_service: Export do_tablet_operation() storage_service: Split transit_tablet() into two tablets: Add braces around tablet_transition_kind::repair switch	2026-05-12 16:24:13 +03:00
Andrzej Jackowski	89261bf759	test: wait for TTL scheduling sanity metric The test samples sl:default runtime before and after setup writes to prove that it measures the scheduling group used by regular CQL writes. The metric is exported in milliseconds, so a single 200-row batch may not be visible immediately, or may be too small in some environments. Keep the original 200-row table size, but wait up to 30 seconds for the metric to advance. If it does not, retry the same writes before TTL is enabled. The retries update the same keys, so the expiration part of the test still waits for exactly the original number of rows. In a local 100-run with N=200 rows, the observed delta of `ms_statement_before - ms_statement_before_write` was: min=4.0, max=16.0, mean=8.13, and median=8.0. Therefore, it looks possible that in a rare corner case the delta drops even to 0. Fixes SCYLLADB-1869 Closes scylladb/scylladb#29797	2026-05-12 12:38:25 +03:00
Piotr Dulikowski	7c2b1ea0b5	Merge 'view_building: fix tombstone_warn_threshold warnings' from Michał Jadwiszczak `system.view_building_tasks` is a single-partition Raft group0 table (pk = `"view_building"`, CK = timeuuid). When `clean_finished_tasks()` deletes hundreds of finished tasks, the physical rows remain in SSTables until compaction. Any subsequent read of the partition counts every column of every tombstoned row as a dead cell, triggering `tombstone_warn_threshold` warnings in large clusters. Two-part fix: 1. Range tombstones instead of row tombstones (commits 2–3) Instead of one row tombstone per finished task, find the minimum alive task UUID (`min_alive_uuid`) and emit a single range tombstone `[before_all, min_alive_uuid)` covering all tasks below that boundary. This reduces the tombstone count significantly and also benefits future compaction. 2. Bounded scan with `min_task_id` (commits 4–6) Even with range tombstones, physical rows remain until compaction and still count as dead cells during reads. The only way to avoid them is to not read them at all. - Add a `min_task_id timeuuid` static column to `system.view_building_tasks`. - On every GC, write `min_task_id = min_alive_uuid` atomically with the range tombstone (same Raft batch). - On reload, read `min_task_id` first using a static-only partition slice (empty `_row_ranges` + `always_return_static_content`): the SSTable reader stops immediately after the static row before processing any clustering tombstones — zero dead cells counted. - Use `AND id >= min_task_id` as a lower bound for the main task scan, skipping all tombstoned rows. The static-only read and the bounded scan are gated on the `VIEW_BUILDING_TASKS_MIN_TASK_ID` cluster feature so mixed-version clusters fall back to the full scan. The issue is not critical, so the fix shouldn't be backported. 
Fixes SCYLLADB-657 Closes scylladb/scylladb#28929 * github.com:scylladb/scylladb: test/cluster/test_view_building_coordinator: add reproducer for tombstone threshold warning docs: document tombstone avoidance in view_building_tasks view_building: add `task_uuid_generator` to `view_building_task_mutation_builder` view_building: introduce `task_uuid_generator` view_building: store `min_alive_uuid` in view building state view_building: set min_task_id when GC-ing finished tasks view_building: add min_task_id support to view_building_task_mutation_builder view_building: add min_task_id static column and bounded scan to system_keyspace view_building: use range tombstone when GC-ing finished tasks view_building: add range tombstone support to view_building_task_mutation_builder view_building: introduce VIEW_BUILDING_TASKS_MIN_TASK_ID cluster feature	2026-05-12 12:38:25 +03:00
Pavel Emelyanov	150345cc52	Merge 'test: per-bucket isolation for S3/GCS object storage tests' from Ernest Zaslavsky This series adds per-test bucket isolation to all S3 and GCS object storage tests. Previously, every test shared a single pre-created bucket, which meant tests could interfere with each other through leftover objects and could not run concurrently across multiple `test.py` processes without risking collisions. New `create_bucket`, `delete_bucket`, and `delete_bucket_with_objects` methods on `s3::client`, following the existing `make_request` pattern. `create_bucket` handles the `BUCKET_ALREADY_OWNED_BY_YOU` error gracefully. A new `s3_test_fixture` RAII class for C++ Boost tests that creates a uniquely-named bucket on construction (derived from the Boost test name and pid) and tears down everything — objects, bucket, client — on destruction. All S3 tests in `s3_test.cc` are migrated to use it, removing manual `deferred_delete_object` and `deferred_close` boilerplate. The minio server policy is broadened to allow dynamic bucket creation/deletion. A `client::make` overload that accepts a custom `retry_strategy`, used in tests with a fast 1ms retry delay instead of exponential backoff, significantly reducing test runtime for transient errors during bucket lifecycle operations. Python-side (`test/cluster/object_store`): each pytest fixture (`object_storage`, `s3_storage`, `s3_server`) now creates a unique bucket per test function via `create_test_bucket()` and destroys it on teardown. Bucket names are sanitized from the pytest node name with a short UUID suffix for uniqueness. Object storage helpers (`S3Server`, `MinioWrapper`, `GSFront`, `GSServerImpl`, factory functions, CQL helpers, `s3_server` fixture) are extracted from `test/cluster/object_store/conftest.py` into a shared `test/pylib/object_storage.py` module, eliminating duplication across test suites. The conftest becomes a thin re-export wrapper. 
Old class names are preserved as aliases for backward compatibility.

| Test name | Execution time with new test-specific retry strategy (ms) | Original execution time (ms) | Δ (ms) | Speedup |
|---|---:|---:|---:|---:|
| test_client_upload_file_multi_part_with_remainder_proxy | 19,261 | 61,395 | −42,134 | 3.2× |
| test_client_upload_file_multi_part_without_remainder_proxy | 16,901 | 53,688 | −36,787 | 3.2× |
| test_client_upload_file_single_part_proxy | 3,478 | 6,789 | −3,311 | 2.0× |
| test_client_multipart_copy_upload_proxy | 1,303 | 1,619 | −316 | 1.2× |
| test_client_put_get_object_proxy | 150 | 365 | −215 | 2.4× |
| test_client_readable_file_stream_proxy | 125 | 327 | −202 | 2.6× |
| test_small_object_copy_proxy | 205 | 389 | −184 | 1.9× |
| test_client_put_get_tagging_proxy | 181 | 350 | −169 | 1.9× |
| test_client_multipart_upload_proxy | 1,252 | 1,416 | −164 | 1.1× |
| test_client_list_objects_proxy | 729 | 881 | −152 | 1.2× |
| test_chunked_download_data_source_with_delays_proxy | 830 | 960 | −130 | 1.2× |
| test_client_readable_file_proxy | 148 | 279 | −131 | 1.9× |
| test_client_upload_file_multi_part_with_remainder_minio | 3,358 | 3,170 | +188 | 0.9× |
| test_client_upload_file_multi_part_without_remainder_minio | 3,131 | 2,929 | +202 | 0.9× |
| test_client_upload_file_single_part_minio | 519 | 421 | +98 | 0.8× |
| test_download_data_source_proxy | 180 | 237 | −57 | 1.3× |
| test_client_list_objects_incomplete_proxy | 590 | 641 | −51 | 1.1× |
| test_large_object_copy_proxy | 952 | 991 | −39 | 1.0× |
| test_client_multipart_upload_fallback_proxy | 148 | 185 | −37 | 1.3× |
| test_client_multipart_copy_upload_minio | 641 | 674 | −33 | 1.1× |

No backport needed: this is a test infrastructure improvement with no production code impact beyond
the new `s3::client` methods. Closes scylladb/scylladb#29508 * github.com:scylladb/scylladb: test: extract object storage helpers to test/pylib/object_storage.py test: add per-test bucket isolation to object_store fixtures s3: add client::make overload with custom retry strategy test: add s3_test_fixture and migrate tests to per-bucket isolation s3: add create_bucket and delete_bucket to client	2026-05-12 12:38:24 +03:00
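The Δ and Speedup columns in the timing table of this commit message appear to follow from the two timing columns. A minimal sketch of that arithmetic, assuming Δ is new minus original and Speedup is original divided by new, rounded to one decimal (the function name `timing_summary` is hypothetical):

```python
def timing_summary(new_ms: int, original_ms: int) -> tuple[int, float]:
    """Return (delta_ms, speedup) for one test row.

    Assumption: delta = new - original, speedup = original / new,
    rounded to one decimal place as the table displays it.
    """
    delta_ms = new_ms - original_ms
    speedup = round(original_ms / new_ms, 1)
    return delta_ms, speedup


# Rows taken from the table in the commit message.
proxy_row = timing_summary(new_ms=19_261, original_ms=61_395)
minio_row = timing_summary(new_ms=519, original_ms=421)
```

A speedup below 1.0, as in the minio rows, means the test ran slower with the new retry strategy.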
Pavel Emelyanov	19820910f8	test: Add test for backup vs migration race The test starts regular backup+restore on a smaller cluster, but prior to it spawns tablet migration from one node to another and locks it in the middle with the help of block_tablet_streaming injection (even though tablets have no data and there's nothing to stream, the injection is located early enough to work). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-05-12 10:40:24 +03:00
Pavel Emelyanov	3bcefa42c5	test: Restore resilience test The test checks that losing one of the nodes from the cluster during restore is handled. In particular: - losing an API node makes the task waiting API throw (apparently) - losing a coordinator or replica node makes the API call fail, because some tablets should fail to get restored. If the coordinator is lost, it triggers coordinator re-election, and the new coordinator still notices that a tablet that was replicated to the "old" coordinator failed to get restored and fails the restore anyway Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-05-12 10:40:24 +03:00
Pavel Emelyanov	69b8f76a32	sstables_loader: Fail tablet-restore task if not all sstables were downloaded When the storage_service::restore_tablets() resolves, it only means that tablet transitions are done, including restore transitions, but not necessarily that they succeeded. So before resolving the restoration task with success need to check if all sstables were downloaded and, if not, resolve the task with exception. Test included. It uses fault-injection to abort downloading of a single sstable early, then checks that the error was properly propagated back to the task waiting API Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-05-12 10:40:24 +03:00
Pavel Emelyanov	4137211cf4	test: Add a test for tablet-aware restore The test is derived from test_restore_with_streaming_scopes() one, with the exception that it doesn't check for streaming directions, doesn't check mutations right after creation and doesn't loop over scoped sub-tests, because there's no scope concept here. Also it verifies just two topologies, it seems to be enough. The scopes test has many topologies because of the nature of the scoped restore, with cluster-wide restore such flexibility is not required. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-05-12 10:40:23 +03:00
Calle Wilund	2cc1a2c406	storage_service: Disable snapshots after raft decommission Fixes: SCYLLADB-1693 In case we abort a decommission operation, the snapshot/backup mechanism needs to remain open. This change moves disabling it to after raft_decommission. In the case of a cluster snapshot, whether or not our node owns tables will be serialized by raft anyway, so it should remain consistent. In that case we at worst coordinate from a node in "leave" status. In the case of a local snapshot, ownership matters less, only sstables on disk, which should not change. In the case of backup, this operates on a snapshot, state of which is not affected. Adds an injection point for testing. v2: - Added injection point to ensure test can abort decommission Closes scylladb/scylladb#29667	2026-05-11 17:04:09 +03:00
Piotr Smaron	71542206bc	cql: return InvalidRequest for oversized partition/clustering keys When a partition key or clustering key value exceeds the 64 KiB limit (65535 bytes serialized), Scylla used to raise a generic std::runtime_error "Key size too large: N > M" from the low-level compound-key serializer. That error surfaced to clients as a CQL server error (code 0x0000, "NoHostAvailable"-looking), which is both ugly and incompatible with Cassandra - Cassandra returns a clean InvalidRequest with the message "Key length of N is longer than maximum of M". Fix this at the single chokepoint: compound_type::serialize_value in keys/compound.hh. The serializer is on every path that materializes a key - INSERT/UPDATE/DELETE/BATCH build mutations through it, and SELECT builds partition and clustering ranges through it - so a single throw replacement produces a clean InvalidRequest consistently across all paths and all key shapes (single, compound PK, composite CK). The previous approach on this PR branch patched three call sites in cql3/restrictions/statement_restrictions.cc, which only covered SELECT, duplicated the check, and placed it mid-restrictions code (flagged in review). Dropping those changes in favour of the root-cause fix here. Un-xfail the tests this fixes: - test/cqlpy/test_key_length.py: test_insert_65k_pk, test_insert_65k_ck, test_where_65k_pk, test_where_65k_ck, test_insert_65k_ck_composite, test_insert_total_compound_pk_err, test_insert_total_composite_ck_err. - test/cqlpy/cassandra_tests/.../insert_test.py: testPKInsertWithValueOver64K, testCKInsertWithValueOver64K. - test/cqlpy/cassandra_tests/.../select_test.py: testPKQueryWithValueOver64K. test_insert_65k_pk_compound stays xfail: its oversized value gets rejected by the Python driver's CQL wire-protocol encoder (see CASSANDRA-19270) before reaching the server, so the fix can't apply. Updated its reason. 
testCKQueryWithValueOver64K stays xfail with an updated reason: Cassandra silently returns empty for an oversized clustering key in WHERE, while Scylla now throws InvalidRequest - a deliberate choice mirroring the partition-key case, documented in the discussion on #10366. Add three tight-boundary tests (addressing review feedback on the previous revision) that pin MAX+1 behaviour for SELECT and INSERT of both partition and clustering keys. Update test/cluster/dtest/limits_test.py to match the new message ("Key length of \\d+ is longer than maximum of 65535"). fixes #10366 fixes #12247 Co-authored-by: Alexander Turetskiy <someone.tur@gmail.com> Closes scylladb/scylladb#23433	2026-05-11 16:56:35 +03:00
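The single-chokepoint check described in this commit can be illustrated with a small sketch. This is not Scylla's `compound_type::serialize_value`; it is a simplified Python model, assuming a per-value limit of 65535 bytes and the message format quoted above. `InvalidRequest` and `check_key_length` are hypothetical names used only for illustration.

```python
MAX_KEY_VALUE_SIZE = 65535  # 64 KiB limit quoted in the commit message


class InvalidRequest(Exception):
    """Hypothetical stand-in for the CQL InvalidRequest error."""


def check_key_length(value: bytes, maximum: int = MAX_KEY_VALUE_SIZE) -> bytes:
    """Reject an oversized key value with a Cassandra-style message.

    Doing the check in the one place every key passes through means
    INSERT, UPDATE, DELETE, BATCH and SELECT all fail the same way.
    """
    if len(value) > maximum:
        raise InvalidRequest(
            f"Key length of {len(value)} is longer than maximum of {maximum}"
        )
    return value


# A value exactly at the limit passes; one byte more is rejected.
check_key_length(b"x" * MAX_KEY_VALUE_SIZE)
try:
    check_key_length(b"x" * (MAX_KEY_VALUE_SIZE + 1))
except InvalidRequest as exc:
    rejection_message = str(exc)
```

The MAX and MAX+1 cases mirror the tight-boundary tests the commit says it adds.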
Gleb Natapov	c3d2f0bde9	raft_group0: remove finish_setup_after_join function The only thing it does is change a bootstrapping node to become a voter in case the cluster does not support the limited voters feature. But the feature was introduced in 2025.2, and a direct upgrade from 2025.1 to a version newer than 2026.1 is not supported. Even if such an upgrade is done, the removed code has an effect only during bootstrap, not during regular boot. Also remove the upgrade test, since after the patch suppressing the feature on the first boot will no longer behave correctly.	2026-05-11 15:38:36 +03:00
Asias He	0204372156	repair: Reject repair requests where start and end tokens are equal When a user calls the repair API with identical startToken and endToken values, the code creates a wrapping interval (T, T]. This causes unwrap() to split it into (-inf, T] and (T, +inf), covering the entire token ring and triggering a full repair. Reject such requests early with an error message matching Cassandra's behavior: "Start and end tokens must be different." Fixes: https://scylladb.atlassian.net/browse/CUSTOMER-358 Closes scylladb/scylladb#29821	2026-05-11 14:08:20 +03:00
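Why equal tokens expand to the whole ring can be seen in a simplified model. The sketch below is not Scylla's interval code: it assumes integer tokens, treats `(start, end]` as wrapping whenever `start >= end`, and uses hypothetical function names.

```python
import math


def unwrap(start: int, end: int) -> list[tuple[float, float]]:
    """Split the half-open interval (start, end] into non-wrapping pieces.

    Assumption: the interval wraps around the ring when start >= end,
    in which case it becomes (-inf, end] plus (start, +inf).
    """
    if start < end:
        return [(start, end)]
    return [(-math.inf, end), (start, math.inf)]


def validate_repair_range(start: int, end: int) -> None:
    """Reject the degenerate case before it turns into a full repair."""
    if start == end:
        raise ValueError("Start and end tokens must be different.")


# With start == end == T, the two pieces meet at T and cover every token.
pieces = unwrap(100, 100)
```

Here `pieces` is `[(-inf, 100), (100, inf)]`, which is the entire ring, so rejecting the request up front avoids an unintended full repair.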
Botond Dénes	ad7ac62835	Merge ' Add a node_owner column (locator::host_id) to system.sstables and make it part of the partition key' from Dimitrios Symonidis Add a node_owner column (locator::host_id) to system.sstables and make it part of the partition key, so the primary key becomes PRIMARY KEY ((table_id, node_owner), generation). This is the first step toward moving the sstables registry into system_distributed: once distributed, each node's startup scan must read only the rows it owns, which requires the owning node to be part of the partition key. Partitioning by (table_id, node_owner) turns that scan into a single-partition read of exactly the local node's rows. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1562 No need to backport this, keyspace over object storage is experimental feature Closes scylladb/scylladb#29659 * github.com:scylladb/scylladb: db, sstables: add node_owner to sstables registry primary key db, sstables: rename sstables registry column owner to table_id	2026-05-11 14:08:19 +03:00
Wojciech Mitros	ab12083525	test: propagate view update backlog before partition delete In the test_delete_partition_rows_from_table_with_mv case we perform a deletion of a large partition to verify that the deletion will self-throttle when generating many view updates. Before the deletion, we first build the materialized view, which causes the view update backlog to grow. The backlog should be back to empty when the view building finishes, and we do wait for that to happen, but the information about the backlog drop may not be propagated to the delete coordinator in time - the gossip interval is 1s and we perform no other writes between the nodes in the meantime, so we don't make use of the "piggyback" mechanism of propagating view backlog either. If the coordinator thinks that the backlog is high on the replica, it may reject the delete, failing this test. We change this in this patch - after the view is built, we perform an extra write from the coordinator. When the write finishes, the coordinator will have the up-to-date view backlog and can proceed with the DELETE. Additionally, we enable the "update_backlog_immediately" injection, which makes the node backlog (the highest backlog across shards) update immediately after each change. Fixes: SCYLLADB-1795 Closes scylladb/scylladb#29775	2026-05-07 11:33:13 +03:00
Ferenc Szili	ec4b483e88	test: fix flaky test_tablets_split_merge_with_many_tables In debug mode, this test can time out during tablets merge. While the test already decreases the number of tables in debug mode (20 tables, instead of 200 for dev mode), this is not enough, and the test can still time out during merge. This change reduces the number of tables from 20 to 5 in debug mode. It also drops the log level for load_balancer to debug. This should make any potential future problems with this test easier to investigate. Fixes: SCYLLADB-1717 Closes scylladb/scylladb#29682	2026-05-06 17:02:10 +03:00
Petr Gusev	cab043323d	test/cluster: fix test_lwt_fencing_upgrade flakiness during rolling upgrade Replace the naive host.is_up check with wait_for_cql_and_get_hosts() which actually executes a query against each host, ensuring the driver's connection pool is fully re-established before proceeding to stop the last server. The is_up flag is set asynchronously via gossip and doesn't guarantee the connection pool has live TCP connections. After a server restart, the flag may be True while the pool still holds stale connections. When the pool monitor later discovers them dead it briefly marks the host DOWN, causing NoHostAvailable if another server is being stopped concurrently. Fixes SCYLLADB-1840 Closes scylladb/scylladb#29769	2026-05-06 15:40:09 +03:00
Tomasz Grabiec	d6346e68c1	Merge 'prevent gossiper from marking nodes as down in tests unexpectedly' from Patryk Jędrzejczak This PR includes two changes that make gossiper much less likely to mark nodes as down in tests unexpectedly, and cause test flakiness in issues like SCYLLADB-864: - fixing false node conviction when echo succeeds, - increasing the failure_detector_timeout fixture. Fixes: SCYLLADB-864 No need for backport: related CI failures are rare, and merging #29522 made them even more unlikely (I haven't seen one since then, but it's still possible to reproduce locally on dev machines). Closes scylladb/scylladb#29755 * github.com:scylladb/scylladb: test/cluster: increase failure_detector_timeout gossiper: fix false node conviction when echo succeeds	2026-05-06 14:01:15 +02:00
Botond Dénes	8d22ef3058	Merge 'commitlog_test.py: Fix size check aliasing, and threshold calc and fix CL chunk size est.' from Calle Wilund Fixes: SCYLLADB-1815 If we're in a brand new chunk (no buffer yet allocated), we would miscalculate the actual size of an entry to write, possibly causing segment size overshoot. Break out some logic to share between this calc and new_buffer. Also remove redundant (and possibly wrong) constant in oversized allocation. As for the test: Checking segment sizes should not use a size filter that rounds (up) sizes. More importantly, the estimate for what is acceptable limit for commitlog disk usage should be aligned. Simplified the calc, and also made logging more useful in case of failure. Closes scylladb/scylladb#29753 * github.com:scylladb/scylladb: commitlog_test.py: Fix size check aliasing, and threshold calc. commitlog: Fix segment/chunk overhead maybe not included in next_position calculation	2026-05-06 13:48:41 +03:00
Piotr Dulikowski	321006ecbd	Merge 'auth: fix crash on ghost rows in role_permissions' from Marcin Maliszkiewicz The auth cache crashes when it encounters rows in role_permissions that have a live row marker but no permissions column. These “ghost rows” were created by the now-removed auth v2 migration, which used INSERT (creating row markers) instead of UPDATE. When permissions were later revoked, the row marker remained while the permissions column became null. An empty collection appears as null, since its lifetime is based only on its element's cells. As a result, when the cache reloads and expects the permissions column to exist, it hits a missing_column exception. The series removes dead code that was the primary crash site, adds has() guards to the remaining access paths, and includes a test reproducer. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1816 Backport: all supported versions 2026.1, 2025.4, 2025.1 Closes scylladb/scylladb#29757 * github.com:scylladb/scylladb: test: add reproducer for auth cache crash on missing permissions column auth: tolerate missing permissions column in authorize() auth: add defensive has() guard for role_attributes value column auth: remove unused permissions field from cache role_record	2026-05-06 12:00:17 +02:00
Nadav Har'El	67384dbb96	test/cluster: remove now-redundant expected_server_up_state=SERVING ServerUpState.SERVING is now the default for server_add() and server_start(), so the explicit argument in various tests is no longer needed. Remove it along with the unused ServerUpState imports and the docstring comments that explained why it was there. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-05-05 18:56:37 +03:00
Marcin Maliszkiewicz	5c5306c692	test: add reproducer for auth cache crash on missing permissions column	2026-05-05 17:16:25 +02:00
Patryk Jędrzejczak	9f692857be	test/cluster: increase failure_detector_timeout Scaling the timeout by build mode (#29522) turned out to be not sufficient. Nodes can still be unexpectedly marked as down, even with a 4s timeout in dev mode. I managed to reproduce SCYLLADB-864 in such conditions. Increasing failure_detector_timeout will proportionally slow down tests that use it. That's bad, but currently these tests' flakiness is a much bigger problem than the tests' slowness. Also, not many tests use this fixture, and we hope to make it unneeded eventually (see #28495).	2026-05-05 15:12:33 +02:00
Patryk Jędrzejczak	b69d00b0a7	Merge 'Barrier and drain logging' from Gleb Natapov Add more logging to barrier and drain rpc to try and pinpoint https://github.com/scylladb/scylladb/issues/26281 Backport since we want to have it if it happens in the field. Fixes: SCYLLADB-1821 Refs: #26281 Closes scylladb/scylladb#29735 * https://github.com/scylladb/scylladb: session, raft_topology: add periodic warnings for hung drain and stale version waits session: add info-level logging to drain_closing_sessions raft_topology: log sub-step progress in local_topology_barrier raft_topology: log read_barrier progress in topology cmd handler	2026-05-05 15:04:50 +02:00
Calle Wilund	5cdfdd9ba3	commitlog_test.py: Fix size check aliasing, and threshold calc. Fixes: SCYLLADB-1815 Checking segment sizes should not use a size filter that rounds (up) sizes. More importantly, the estimate for what is acceptable limit for commitlog disk usage should be aligned. Simplified the calc, and also made logging more useful in case of failure.	2026-05-05 14:42:55 +02:00
Botond Dénes	afd9a55891	Merge 'test/cluster: wait for custom listener readiness' from Piotr Smaron server_add() defaults to CQL_ALTERNATOR_QUERIED. That proves the regular CQL driver path is queryable, and regular Alternator ports listed in YAML config if any. It does not prove that every custom listener the test will connect to is already accepting raw TCP connections. test_proxy_protocol_ssl_shard_aware connects directly to the shard-aware TLS proxy-protocol CQL port immediately after server startup. Wait for ServerUpState.SERVING in the fixture so the custom proxy-protocol listener is registered before opening raw sockets. test_uninitialized_conns_semaphore opens a raw TCP connection to native_shard_aware_transport_port immediately after startup. The default readiness check can succeed through native_transport_port while the shard-aware listener is still being started, because CQL listeners are registered independently. Wait for ServerUpState.SERVING before opening raw sockets. test_perf_alternator_remote now asks server_add() to wait for SERVING and uses the returned server address directly. This removes the redundant running_servers() plus get_ready_cql() sequence noted in review. Fixes: SCYLLADB-1797 No backport as of now, only appeared on master. Closes scylladb/scylladb#29737 * github.com:scylladb/scylladb: test/cluster: avoid redundant perf alternator CQL wait test/cluster: wait for shard-aware CQL listener test/cluster: wait for proxy protocol ports to serve	2026-05-05 14:45:58 +03:00
Gleb Natapov	e88ce09372	raft_topology: log sub-step progress in local_topology_barrier When a node processes a barrier_and_drain topology command, it performs two potentially long-running operations inside local_topology_barrier(): waiting for stale token metadata versions to be released (stale_versions_in_use) and draining closing sessions (drain_closing_sessions). Either of these can hang indefinitely -- for example, stale_versions_in_use blocks until all references to previous token metadata versions are released, which depends on in-flight requests completing. Previously, the only logging was a single 'done' message at the end, making it impossible to determine which sub-step was blocking when a barrier_and_drain RPC appeared stuck on a node. In a recent CI failure, a node never responded to barrier_and_drain during a removenode operation, and the logs showed the RPC was received but nothing about what it was waiting on internally. Add info-level logging before each blocking sub-step, including the topology version for correlation. This allows diagnosing hangs by showing whether the node is stuck waiting for stale metadata versions, stuck draining sessions, or never reached these steps at all.	2026-05-04 15:58:45 +03:00
Piotr Smaron	0a780d0ea1	test/cluster: avoid redundant perf alternator CQL wait server_add() already waits for the requested server-up state. For the remote perf-alternator test, request SERVING from server_add() and use the returned server address directly instead of asking for running servers and then calling get_ready_cql() again. This keeps the listener-readiness intent explicit while removing the redundant CQL readiness probe noted in review.	2026-05-04 14:09:28 +02:00
Piotr Smaron	c90012c22b	test/cluster: wait for shard-aware CQL listener server_add() defaults to CQL_ALTERNATOR_QUERIED. That proves the regular CQL driver path is queryable, and regular Alternator ports listed in YAML config if any. It does not prove that every CQL listener configured for the process is already accepting raw TCP connections. test_uninitialized_conns_semaphore opens a raw TCP connection to native_shard_aware_transport_port immediately after startup. The default readiness check can succeed through native_transport_port while the shard-aware listener is still being started, because CQL listeners are registered independently. Wait for ServerUpState.SERVING before opening raw sockets. Scylla sends that notification only after protocol servers are registered, so this closes the startup window without adding sleeps or local retry loops. Fixes: SCYLLADB-1797	2026-05-04 13:36:43 +02:00


1329 Commits