scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-30 19:46:48 +00:00

Author	SHA1	Message	Date
Patryk Jędrzejczak	ba7f314cdc	test: test_full_shutdown_during_replace: retry replace after the replacing node is removed from gossip The test is currently flaky with `reuse_ip = True`. The issue is that the test retries replace before the first replace is rolled back and the first replacing node is removed from gossip. The second replacing node can see the entry of the first replacing node in gossip. This entry has a newer generation than the entry of the node being replaced, and both replacing nodes have the same IP as the node being replaced. Therefore, the second replacing node incorrectly considers this entry as the entry of the node being replaced. This entry is missing rack and DC, so the second replace fails with ``` ERROR 2026-02-24 21:19:03,420 [shard 0:main] init - Startup failed: std::runtime_error (Cannot replace node 8762a9d2-3b30-4e66-83a1-98d16c5dd007/127.61.127.1 with a node on a different data center or rack. Current location=UNKNOWN_DC/UNKNOWN_RACK, new location=dc1/rack2) ``` Fixes SCYLLADB-805 Closes scylladb/scylladb#28829	2026-03-02 10:26:57 +02:00
Michael Litvak	8c4bc33e51	test: remove test_view_building_with_tablet_move remove the test since it's not relevant anymore, it's not testing what it's supposed to test and it's unstable. the purpose of the test was to reproduce an issue in the legacy view builder where a view starts to build at token T2 and then all tokens [T1, end) with T1<T2 migrate to another node while it's still building, exposing an issue when the view builder wraparounds the token ring. this is not relevant anymore because now view building with tablets is done via the view building coordinator for tablets, and all views start to build from the first token with no wraparound. besides, the test is unstable due to relying too much on specific timing, which was useful for investigating and fixing the original issue but not anymore. Fixes SCYLLADB-842 Closes scylladb/scylladb#28842	2026-03-02 07:42:08 +01:00
Botond Dénes	1f09fcfb26	Merge 'Use standard ks/cf/data creation methods in test_restore_with_streaming_scopes' from Pavel Emelyanov The test uses create_dataset helper duplicating the existing code that does the same. This PR patches basic tests to use standard facilities. Also the PR simplifies the 3-level nested loops used to combine several sets of restoration parameters by using itertools.product facility. Continuation of #28600. Cleaning tests, not backporting Closes scylladb/scylladb#28608 * github.com:scylladb/scylladb: test/object_store: Use itertools.product() for deeply nested loops test/object_store: Replace dataset creation usage with standard methods test/object_store: Shift indentation right for test_restore_with_streaming_scopes	2026-02-27 16:15:55 +02:00
Avi Kivity	450a09b152	test: tools: restrict embedded perf tests from taking over host The perf-simple-query tests were not restricted on CPU count, so on a 96-CPU machine, they would run on 96 CPUs, and time out in debug mode. All restrict memory usage and add --overprovisioned so that pinning is disabled. Apply that to all tests. Closes scylladb/scylladb#28821	2026-02-27 16:06:22 +02:00
Botond Dénes	d3a3921487	Merge 'Re-use and improve the take_snapshot() helper in backup tests' from Pavel Emelyanov The helper is very simple yet generic -- it takes a snapshot of a keyspace on all servers and collects the resulting sstables from workdirs. Re-using it in all test cases saves some lines of code. Also, the method is "sequential", making it "parallel" reduces the waiting time a bit. Will help generalizing existing backup/restore tests to support clustered snapshot/backup/restore API (see #28525) later. Cleaning up tests, not backporting. Closes scylladb/scylladb#28660 * github.com:scylladb/scylladb: test/backup: Run keyspace flush and snapshot taking API in parallel test/backup: Re-use take_snapshot() helper in do_abort_restore() test/backup: Move take_snapshot() helper up	2026-02-27 15:58:18 +02:00
Patryk Jędrzejczak	9a9202c909	Merge 'Remove gossiper topology code' from Gleb Natapov The PR removes most of the code that assumes that group0 and raft topology is not enabled. It also makes sure that joining a cluster in no raft mode or upgrading a node in a cluster that not yet uses raft topology to this version will fail. Refs #15422 No backport needed since this removes functionality. Closes scylladb/scylladb#28514 * https://github.com/scylladb/scylladb: group0: fix indentation after previous patch raft_group0: simplify get_group0_upgrade_state function since no upgrade can happen any more raft_group0: move service::group0_upgrade_state to use fmt::formatter instead of iostream raft_group0: remove unused code from raft_group0 node_ops: remove topology over node ops code topology: fix indentation after the previous patch topology: drop topology_change_enabled parameter from raft_group0 code storage_service: remove unused handle_state_* functions gossiper: drop wait_for_gossip_to_settle and deprecate correspondent option storage_service: fix indentation after the last patch storage_service: remove gossiper bootstrapping code storage_service: drop get_group_server_if_raft_topolgy_enabled storage_service: drop is_topology_coordinator_enabled and its uses storage_service: drop run_with_api_lock_in_gossiper_mode_only topology: remove code that assumes raft_topology_change_enabled() may return false test: schema_change_test: make test_schema_digest_does_not_change_with_disabled_features tests run in raft mode test: schema_change_test: drop schema tests relevant for no raft mode only topology: remove upgrade to raft topology code group0: remove upgrade to group0 code group0: refuse to boot if a cluster is still is not in a raft topology mode storage_service: refuse to join a cluster in legacy mode	2026-02-27 14:43:41 +01:00
Botond Dénes	9521a51e4c	Merge 'generic_server: scale connection concurrency semaphore by listener count' from Marcin Maliszkiewicz The concurrency semaphore gates uninitialized connections across all do_accepts loops, but was initialized to a fixed value regardless of how many listeners exist. With multiple listeners competing for the same units, each effectively gets less than the configured concurrency. Initialize the semaphore to concurrency - 1 and signal 1 per listen() call, so total capacity is concurrency - 1 + nr_listeners. This guarantees each listener's accept loop can have at least one unit available. It mainly fixes problem when setting uninitialized_connections_semaphore_cpu_concurrency config value to 1 would result in not being able to process connections, as only 1 out of 2 listeners got the semaphore. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-762 Backport: no, it's a minor problem Closes scylladb/scylladb#28747 * github.com:scylladb/scylladb: test: add test_uninitialized_conns_semaphore generic_server: fix waiters count in shed log generic_server: scale connection concurrency semaphore by listener count	2026-02-27 15:06:50 +02:00
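The capacity arithmetic described in this merge can be checked with a small sketch. The function name below is hypothetical and this is not the generic_server code; it only mirrors the numbers stated in the message (initial value `concurrency - 1`, plus one unit signalled per `listen()` call).

```python
def semaphore_capacity(concurrency: int, nr_listeners: int) -> int:
    """Total units available after every listener has called listen().

    Hypothetical helper: models only the arithmetic from the commit message.
    """
    initial = concurrency - 1          # semaphore is initialized to concurrency - 1
    return initial + nr_listeners      # each listen() call signals one extra unit


# With the configured concurrency set to 1 and two listeners,
# there are two units in total, one for each accept loop.
capacity = semaphore_capacity(concurrency=1, nr_listeners=2)
```

With a fixed initial value of 1 and two listeners, only one accept loop could hold a unit at a time, which is the situation the message describes.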
Łukasz Paszkowski	bb57b0f3b7	compaction_manager: fix maybe_wait_for_sstable_count_reduction() hanging forever The futurization refactoring in `9d3755f276` ("replica: Futurize retrieval of sstable sets in compaction_group_view") changed maybe_wait_for_sstable_count_reduction() from a single predicated wait: ``` co_await cstate.compaction_done.wait([..] { return num_runs_for_compaction() <= threshold \|\| !can_perform_regular_compaction(t); }); ``` to a while loop with a predicated wait: ``` while (can_perform_regular_compaction(t) && co_await num_runs_for_compaction() > threshold) { co_await cstate.compaction_done.wait([this, &t] { return !can_perform_regular_compaction(t); }); } ``` This was necessary because num_runs_for_compaction() became a coroutine (returns future<size_t>) and can no longer be called inside a condition_variable predicate (which must be synchronous). However, the inner wait's predicate — !can_perform_regular_compaction(t) — only returns true when compaction is disabled or the table is being removed. During normal operation, every signal() from compaction_done wakes the waiter, the predicate returns false, and the waiter immediately goes back to sleep without ever re-checking the outer while loop's num_runs_for_compaction() condition. This causes memtable flushes to hang forever in maybe_wait_for_sstable_count_reduction() whenever the sstable run count exceeds the threshold, because completed compactions signal compaction_done but the signal is swallowed by the predicate. Fix by replacing the predicated wait with a bare wait(), so that any signal (including from completed compactions) causes the outer while loop to re-evaluate num_runs_for_compaction(). Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-610 Closes scylladb/scylladb#28801	2026-02-26 20:13:50 +02:00
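The shape of the fix (an outer loop that re-checks the run count after a bare wait) can be illustrated with a minimal Python asyncio sketch. This is an analogy under stated assumptions, not the Seastar code: `CompactionState`, `finish_one_compaction` and `demo` are hypothetical names, and `asyncio.Condition` stands in for the `compaction_done` condition variable.

```python
import asyncio


class CompactionState:
    """Hypothetical stand-in for the per-table compaction state."""

    def __init__(self, runs: int):
        self.runs = runs              # current number of sstable runs
        self.enabled = True           # models can_perform_regular_compaction()
        self.done = asyncio.Condition()


async def wait_for_run_count_reduction(state: CompactionState, threshold: int) -> None:
    async with state.done:
        # The outer condition is re-evaluated after every wake-up.
        while state.enabled and state.runs > threshold:
            # Bare wait: any notification wakes the waiter, so a completed
            # compaction leads to a re-check of the run count.
            await state.done.wait()


async def finish_one_compaction(state: CompactionState) -> None:
    async with state.done:
        state.runs -= 1
        state.done.notify_all()


async def demo(initial_runs: int = 5, threshold: int = 2) -> int:
    state = CompactionState(initial_runs)
    waiter = asyncio.create_task(wait_for_run_count_reduction(state, threshold))
    while not waiter.done():
        await finish_one_compaction(state)
        await asyncio.sleep(0)        # let the waiter run
    return state.runs
```

If the inner wait were instead predicated only on `not state.enabled`, the notifications from `finish_one_compaction` would never let the loop re-check `state.runs`, which is the hang the commit describes.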
Marcin Maliszkiewicz	a03ebe1a29	Merge 'cql: implement a new per-row TTL feature' from Nadav Har'El This series implements a new per-row TTL feature for CQL. The per-row TTL feature was requested in issue #13000. It is a feature that does not exist in Cassandra, and was inspired by DynamoDB's TTL feature - and under the hood uses the same implementation that we used in Alternator to implement this DynamoDB feature. The new per-row TTL feature is completely separate from CQL's existing per-write (and per-cell) TTL, and both will be available to users. In the per-row TTL feature, one column in the table is designated as the "TTL" column, and its value for a row is the expiration time for that row. The TTL column can be designated at table creation time, e.g.: ```cql CREATE TABLE tab ( id int PRIMARY KEY, t text, expiration timestamp TTL ); ``` Or after the table already exists with: ```cql ALTER TABLE tab TTL expiration ``` Expiration can also be disabled, with: ```cql ALTER TABLE tab TTL NULL ``` The new per-row TTL feature has two features that users have been asking for: 1. A user can change the value of just the TTL column - without rewriting the entire row - to change the expiration time of the entire row. 2. When an expired row is finally deleted, a CDC event about this deletion appears in the CDC log (if CDC is enabled), including - if a preimage is enabled - the content of the deleted row. To achieve the second goal (CDC events), a row is not guaranteed to disappear at exactly its expiration time (as CQL's original TTL feature guarantees). Rather, the row is deleted some time later, depending on `alternator_ttl_period_in_seconds`; Until the actual deletion, the row is still readable (and even writable). But we are guaranteed that when the row is finally deleted, the CDC event will come too. The implementation uses the same background thread used by Alternator to periodically scan for expired items and delete them. 
The expiration thread keeps the same metrics as it did for Alternator: * `scylla_expiration_scan_passes` * `scylla_expiration_scan_table` * `scylla_expiration_items_deleted` * `scylla_expiration_secondary_ranges_scanned` The series begins with a few small preparation patches, followed by the main part of the feature (which isn't big, since we are just enabling the pre-existing Alternator expiration machinary for CQL) and finally 30 tests (single-node and multi-node tests) and documentation. This series is a new feature, so traditionally would not be backported. However, I wouldn't be surprised if we will be requested to backport it so that customers will not need to wait for a new major release. Fixes #13000 Closes scylladb/scylladb#28320 * github.com:scylladb/scylladb: test/cqlpy: verify that a column can't be both STATIC and PRIMARY KEY docs/cql: document the new CQL per-row TTL feature test/cluster: tests for the new CQL per-row TTL feature test/cqlpy: tests for the new CQL per-row TTL feature test: set low alternator_ttl_period_in_seconds in CQL tests cql ttl: fix ALTER TABLE to disable TTL if column is dropped cql ttl: add setting/unsetting of TTL column to ALTER TABLE cql ttl: add TTL column support to CREATE TABLE and DESC TABLE ttl: add CQL support to Alternator's TTL expiration service alternator ttl: move TTL_TAG_KEY to a header file alternator ttl: remove unnecessary check of feature flag cql: add "cql_row_ttl" cluster feature alternator: fix error message if UpdateTimeToLive is not supported	2026-02-26 15:29:12 +01:00
Marcin Maliszkiewicz	30f18a91fd	Merge 'dtest: wait_for speedup' from Dario Mirovic Audit tests have been slow. They rely on wait_for function. This function first sleeps for the duration of the time step specified, and then calls the given function. The audit tests need 0.02-0.03 seconds for the given function, but the operation lasts around 1.02-1.03 seconds, since step is 1 second. This patch modifies wait_for dtest function so it first executes the given function, and afterwards calls time.sleep(step). This reduces time needed for the given function from 1.03 to 0.03 seconds. Total audit tests suite speedup is 3x. On the developer machine the time is reduced from 13+ minutes to 4 minutes. This patch also improves performance of some alternator tests that use the same wait_for dtest function. `wait_for` in dtest framework has default time step reduced to make the environment more responsive and test execution faster. Refs SCYLLADB-573 This is a performance improvement of testing framework. No need to backport. Closes scylladb/scylladb#28590 * github.com:scylladb/scylladb: dtest: shorten default sleep step in wait_for dtest: wait_for speedup	2026-02-26 09:33:38 +01:00
Nadav Har'El	23ad0be034	test/cluster: tests for the new CQL per-row TTL feature The previous patch added single-node functional tests (in test/cqlpy) for everything which was possible to test on a single node. In this patch we add four tests that we couldn't test on a single node, using the test/cluster test framework: 1. Test that the TTL expiration work - both the scanning threads and the actual deletion work on all nodes - happens on the "streaming" scheduling group. 2. Test that even if one of the cluster's nodes is down, still all the items get expired - another node "takes over" the dead node's work. 3. Test that rolling upgrade works as designed for the CQL per-row TTL feature: Before every single node in the cluster is upgraded to support this feature, a TTL column cannot be enabled on a table. And as soon as the last node of the cluster is upgraded, the TTL feature begins to work completely (you don't need to reboot all the nodes again). 4. Test that expiration works correctly on a multi-DC setup. The test doesn't check the efficiency of this process - i.e., that today each DC scans part of the data, reading with LOCAL_QUORUM, and writing the deletions across the entire cluster. Rather, the test only verifies the correctness - that expired rows do get deleted - for the usual case the data across the DCs is consistent. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-02-25 14:59:44 +02:00
Botond Dénes	56cc7bbeec	Merge 'Allow "global" snapshot using topology coordinator + add tablet metadata to manifest' from Calle Wilund Refs: SCYLLADB-193 Adds a "snapshot_table" topology operation and associated data structure/table columns to support dispatching a snapshot operation as a topo coordinator op. Logic is similar, and thus broken out and semi-shared with, truncation. Also adds optional tablet metadata to manifest, listing all tablets present in a given snapshot, as well as tablet sstable ownership, repair status, and token ranges. As per description in SCYLLADB-193, the alternative snapshot mechanism is in a separate namespace under 'tablets', which while dubious is the desired destination. The API is accessed via `nodetool cluster snapshot`, which more or less mirrors `nodetool snapshot`, but using topo op. TTL is added to message propagation as a separate patch here, since it is not (yet) used from API (or nodetool). Requires a syntax for both API and command line. Closes scylladb/scylladb#28525 * github.com:scylladb/scylladb: topology::snapshot: Add expiry (ttl) to RPC/topo op test_snapshot_with_tablets: Extend test to check manifest content table::manifest: Add tablet info to manifest.json test::test_snapshot_with_tablets: Add small test for topo coordinated snapshot scylla-nodetool: Add "cluster snapshot" command api::storage_service: Add tablets/snapshots command for cluster level snapshot db::snapshot-ctl: Add method to do snapshot using topo coordinator storage_proxy: Add snapshot_keyspace method topology_coordinator: Add handler for snapshot_tables storage_proxy: Add handler for SNAPSHOT_WITH_TABLETS messaging_service: Add SNAPSHOT_WITH_TABLETS verb feature_service: Add SNAPSHOT_AS_TOPOLOGY_OPERATION feature topology_mutation: Add setter for snapshot part of row system_keyspace::topology_requests_entry: Add snapshot info to table topology_state_machine: Add snapshot_tables operation topology_coordinator: Break out logic from 
handle_truncate_table storage_proxy: Break out logic from request_truncate_with_tablets test/object_store: Remove create_ks_and_cf() helper test/object_store: Replace create_ks_and_cf() usage with standard methods test/object_store: Shift indentation right for test cases	2026-02-25 10:17:53 +02:00
Botond Dénes	166e245097	Merge 'test.py: Topology test pytest integration' from Andrei Chekun Migrate cluster tests directory to be handled by pytest. This is the next step in the process of unifying the tests and migrating to pytest. With this PR, cluster tests will be executed with the full path to the file instead of the `suite/test` paradigm. Backport is not needed because it is a framework enhancement. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-46 Closes scylladb/scylladb#27618 * github.com:scylladb/scylladb: test.py: remove setsid from the framework test.py: rename suite.yaml to test_config.yaml test.py: add cluster tests to be executed by pytest test.py: add random seed for topology tests reproducibility test.py: add explicit default values to pytest options test.py: replace SCYLLA env var with build_mode fixture	2026-02-25 10:17:20 +02:00
Botond Dénes	9dff9752b4	Merge 'Fix regression in Alternator TTL with tablets and node going down' from Nadav Har'El Recently we suffered a regression on how Alternator TTL behaves when a node goes down when tablets are used. Usually, expiration of data in a particular tablet are handled by this tablet's "primary replica". However, if that node is down, we want another node to perform these expiration until the primary replica goes back online. We created a function `tablet_map::get_secondary_replica()` to select that "other node". We don't care too much what the "secondary replica" means, but we do care that it's different from the primary replica - if it's the same the expiration of that tablet will never be done. It turns out that recently, in commits `817fdad` and `d88036d`, the implementation of get_primary_replica() changed without a corresponding change to get_secondary_replica(). After those changes, the two functions are mismatched, and sometimes return the same node for both primary and secondary replica. Unfortunately, although we had a dtest for the handling of a dead node in Alternator TTL, it failed to reproduce this bug, so this regression was missed - nothing else besides Alternator TTL ever used the get_secondary_replica() function. So this series, in addition to fixing the bug, we add two tests that reproduce this bug (fail before the fix, pass with the fix): 1. A unit test that checks that get_secondary_replica() always returns a different node from get_primary_replica() 2. A cluster test based on the original dtest, which does reproduce this bug in Alternator TTL where some of the data was never expired (but only failed in release build, for an unknown reason). Fixes SCYLLADB-777. 
Closes scylladb/scylladb#28771 * github.com:scylladb/scylladb: test: add unit test for tablet_map::get_secondary_replica() test, alternator: add test for TTL expiration with a node down locator: fix get_secondary_replica() to match get_primary_replica()	2026-02-25 10:13:55 +02:00
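The invariant that the new unit test checks (the secondary replica always differs from the primary) can be sketched as follows. The selection rules here are purely illustrative assumptions, not the algorithm used by `tablet_map`; only the invariant itself comes from the message above.

```python
from typing import Sequence


def primary_replica(replicas: Sequence[str], tablet_id: int) -> str:
    # Hypothetical rule: pick a replica by tablet id.
    return replicas[tablet_id % len(replicas)]


def secondary_replica(replicas: Sequence[str], tablet_id: int) -> str:
    # Hypothetical rule: pick the next replica after the primary, so the two
    # choices stay in step. Requires at least two distinct replicas.
    return replicas[(tablet_id + 1) % len(replicas)]


def secondary_always_differs(replicas: Sequence[str], tablet_count: int) -> bool:
    """Check the invariant for every tablet id in [0, tablet_count)."""
    return all(
        primary_replica(replicas, t) != secondary_replica(replicas, t)
        for t in range(tablet_count)
    )
```

The point of the sketch is that the two functions must be derived from the same rule; changing one without the other is how the mismatch described above can arise.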
Gleb Natapov	a8a167623a	topology: remove code that assumes raft_topology_change_enabled() may return false The patch removes the code protected by !raft_topology_change_enabled() since it is no longer reachable. Drop test_lwt_for_tablets_is_not_supported_without_raft since non-raft mode is no longer supported.	2026-02-25 10:08:30 +02:00
Dario Mirovic	3222a1a559	dtest: shorten default sleep step in wait_for Default sleep step of 1s is too long. Reduce it to make the test environment more responsive and faster. Refs SCYLLADB-573	2026-02-25 03:17:47 +01:00
Dario Mirovic	51e7c2f8d9	dtest: wait_for speedup Audit tests have been slow. They rely on wait_for function. This function first sleeps for the duration of the time step specified, and then calls the given function. The audit tests need 0.02-0.03 seconds for the given function, but the operation lasts around 1.02-1.03 seconds, since step is 1 second. This patch modifies wait_for dtest function so it first executes the given function, and afterwards calls time.sleep(step). This reduces time needed for the given function from 1.03 to 0.03 seconds. Total audit tests suite speedup is 3x. On the developer machine the time is reduced from 13+ minutes to 4 minutes. This patch also improves performance of some alternator tests that use the same wait_for dtest function. Refs SCYLLADB-573	2026-02-25 03:17:46 +01:00
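The reordering described here (call the function first, sleep only if it has not succeeded yet) can be sketched as below. This is a minimal illustration, not the dtest helper: the signature, the default `step` and the timeout handling are assumptions.

```python
import time


def wait_for(func, step=0.1, timeout=10.0):
    """Poll func() until it returns a truthy value or the timeout expires.

    Hypothetical signature. func() is called before any sleep, so a check
    that already succeeds costs only the time of the call itself.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = func()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(step)
```

With the old ordering (sleep, then call), a check that takes 0.03 seconds and succeeds immediately would still cost a full `step` first, which matches the 1.03 versus 0.03 seconds figures in the message.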
Marcin Maliszkiewicz	aa7816882e	test: add test_uninitialized_conns_semaphore Runtime in dev mode: 2s	2026-02-24 17:28:51 +01:00
Alex	5557770b59	test_mv_build_during_shutdown started two async CREATE MATERIALIZED VIEW operations and never awaited them (asyncio.gather(...) without await). This PR adds an await for each of the tasks, so the test waits for the MV schema to be added successfully before starting the server shutdown. With this change we will not get the shutdown races. Closes scylladb/scylladb#28774	2026-02-24 17:25:05 +01:00
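A minimal sketch of the pitfall: `asyncio.gather(...)` only returns a future, so without `await` the caller continues before the operations have finished. The names below are hypothetical stand-ins for the CREATE MATERIALIZED VIEW calls and the test body, and the sketch awaits the gather as a whole rather than each task separately.

```python
import asyncio


async def create_view(created: list, name: str) -> None:
    # Stand-in for an asynchronous schema change.
    await asyncio.sleep(0)
    created.append(name)


async def create_views_then_shutdown() -> list:
    created: list = []
    # Awaiting the gather guarantees that both views exist before the
    # next step (the shutdown, in the real test) begins.
    await asyncio.gather(
        create_view(created, "mv1"),
        create_view(created, "mv2"),
    )
    return sorted(created)
```

Dropping the `await` in front of `asyncio.gather` would let the function return while `created` may still be empty, which is the kind of race with shutdown that the commit removes.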
Andrzej Jackowski	cd4caed3d3	test: fix configuration of test_autoretrain_dict `test_autoretrain_dict` sporadically fails because the default compression algorithm was changed after the test was written. `9ffa62a986815709d0a09c705d2d0caf64776249` was an attempt to fix it by changing the compression configuration during node startup. However, the configuration change had an incorrect YAML format and was ignored by ScyllaDB. This commit fixes it. Fixes: scylladb/scylladb#28204 Closes scylladb/scylladb#28746	2026-02-24 12:08:44 +01:00
Marcin Maliszkiewicz	d5684b98c8	test: cluster: add continue-after-error to perf tool tests Add --continue-after-error true to perf-cql-raw and perf-alternator tests, and --stop-on-error false to perf-simple-query test, so that tests don't abort on the first error. The reason is that the tests are flaky, with example failure: Perf test failed: std::runtime_error (server returned ERROR to EXECUTE) When CPU is starved on CI we can return timeouts and/or other errors. The change should make tests more robust at the expense of smaller test scope. But those tests were written mostly to test the startup sequence, as it differs from Scylla's startup. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-759 Closes scylladb/scylladb#28767	2026-02-24 11:08:34 +02:00
Andrei Chekun	d3f5f7468c	test.py: rename suite.yaml to test_config.yaml Switch of discovery of the tests by test.py	2026-02-24 09:48:38 +01:00
Andrei Chekun	4a7d8cd99d	test.py: add explicit default values to pytest options Add explicit default values to pytest command line options to prevent issues when running tests with pytest's parallel execution where options are not present on upper conftest, so they're just not set at all.	2026-02-24 09:48:38 +01:00
Andrei Chekun	99234f0a83	test.py: replace SCYLLA env var with build_mode fixture Replace direct usage of SCYLLA environment variable with the build_mode pytest fixture and path_to helper function. This makes tests more flexible and consistent with the test framework. This also allows using the tests with xdist, where the environment variable can be left in the master process and will not be set in the workers. Add using the fixture to get the scylla binary from the suite; this will align with getting the relocatable Scylla exe.	2026-02-24 09:48:38 +01:00
Pavel Emelyanov	6b02b50e3d	Merge 'object_storage: add retryable machinery to object storage' from Ernest Zaslavsky - add an overload to the rest http client to accept retry strategy instance as an argument - remove hand rolled error handling from object storage client and replace with common machinery that supports handling and retrying when appropriate No backport needed since it is only refactoring Closes scylladb/scylladb#28161 * github.com:scylladb/scylladb: object_storage: add retryable machinery to object storage rest_client: add `simple_send` overload	2026-02-23 21:28:51 +03:00
Nadav Har'El	0c7f499750	test, alternator: add test for TTL expiration with a node down We have many single-node functional tests for Alternator TTL in test/alternator/test_ttl.py. This patch adds a multi-node test in test/cluster/test_alternator.py. The new test verifies that: 1. Even though Alternator TTL splits the work of scanning and expiring items between nodes, all the items get correctly expired. 2. When one node is down, all the items still expire because the "secondary" owner of each token range takes over expiring the items in this range while the "primary" owner is down. This new test is actually a port of a test we already had in dtest (alternator_ttl_tests.py::test_multinode_expiration). This port is faster and smaller than the original (fewer nodes, fewer rows), but it still found a regression (SCYLLADB-777) that dtest missed - the new test failed when running with tablets and in release build mode. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-02-23 16:19:43 +02:00
Andrei Chekun	6ae58c6fa6	test.py: move storage tests to cluster subdirectory Move the storage test suite from test/storage/ to test/cluster/storage/ to consolidate related cluster-based tests. This removes the standalone test/storage/suite.yaml as the tests will use the cluster's test configuration. Initially these tests were in cluster, but to use unshare at first iteration they were moved outside. Now that they use another way to handle volumes without unshare, they should be in cluster. Closes scylladb/scylladb#28634	2026-02-23 16:14:15 +02:00
Gleb Natapov	4a9cf687cc	group0: remove upgrade to group0 code This patch removes the ability of a cluster to upgrade from not having group0 to having one. This ability is used in the gossiper based recovery procedure that is deprecated and removed in this version. Also remove tests that use the procedure.	2026-02-23 14:54:24 +02:00
Marcin Maliszkiewicz	54dca90e8c	Merge 'test: move dtest/guardrails_test.py to test_guardrails.py' from Andrzej Jackowski This patch series moves `test/cluster/dtest/guardrails_test.py` to `test/cluster/test_guardrails.py`, and migrates it from `cluster/dtest/` to `cluster/` framework. There are two motivations for moving the test: - Execution time reduction (from 12s to 9s in 'dev' in my env) - Facilitate adding new tests to the `guardrails_test.py` file No backport, `dtest/guardrails_test.py` is only on master Closes scylladb/scylladb#28737 * github.com:scylladb/scylladb: test: move dtest/guardrails_test.py to test_guardrails.py test: prepare guardrails_test.py to be moved to test/cluster/	2026-02-23 12:34:43 +01:00
Calle Wilund	cc60d014ed	test_snapshot_with_tablets: Extend test to check manifest content Verifies we have the expected tablet info in manifest.	2026-02-23 11:37:17 +01:00
Calle Wilund	ae10b5a897	test::test_snapshot_with_tablets: Add small test for topo coordinated snapshot	2026-02-23 11:37:16 +01:00
Pavel Emelyanov	ad0c2de0d1	test/object_store: Remove create_ks_and_cf() helper Now all test cases use standard facilities to create data they test Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-02-23 10:43:28 +01:00
Pavel Emelyanov	6711afd73b	test/object_store: Replace create_ks_and_cf() usage with standard methods To create a keyspace there's the new_test_keyspace helper. The table is created with a single cql.run_async with explicit schema. The dataset is populated with a single parallel INSERT as well. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-02-23 10:43:28 +01:00
Pavel Emelyanov	ed3a326637	test/object_store: Shift indentation right for test cases This is preparational patch. Next will need to replace foo() bar() with with something() as s: foo() bar() Effectively -- only add the `with something()` line. Not to shift the whole file right together with that future change, do it here. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-02-23 10:43:28 +01:00
Pavel Emelyanov	3d07633300	test/object_store: Use itertools.product() for deeply nested loops The test_restore_with_streaming_scopes want to run some loop body for all (almost) combinations of scope, primary-replica-only and min tablet count. For that three nested loops are used. Using itertools.product() makes the code shorter, less indented and more explicit. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-02-23 12:28:53 +03:00
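The transformation described here can be sketched as follows. The parameter values are illustrative assumptions, not the ones used by test_restore_with_streaming_scopes; the sketch only shows that `itertools.product()` visits the same combinations, in the same order, as three nested loops.

```python
import itertools

# Hypothetical parameter sets, for illustration only.
scopes = ["all", "dc", "rack", "node"]
primary_replica_only = [True, False]
min_tablet_counts = [5, 512]


def combinations_nested() -> list:
    """The original shape: three levels of nesting."""
    out = []
    for scope in scopes:
        for pro in primary_replica_only:
            for tablets in min_tablet_counts:
                out.append((scope, pro, tablets))
    return out


def combinations_product() -> list:
    """The same combinations in a single, flat loop."""
    return [
        (scope, pro, tablets)
        for scope, pro, tablets in itertools.product(
            scopes, primary_replica_only, min_tablet_counts
        )
    ]
```

Skipping some combinations, as the "(almost)" in the message suggests, can then be expressed as a single `continue` or filter condition inside the flat loop.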
Pavel Emelyanov	a9a82f89ac	test/object_store: Replace dataset creation usage with standard methods Two places are fixed 1. The call to create_dataset() is replaced with three "library" methods. This makes it explicit which options and schema are used for that. Eventually, the large and bulky create_dataset will be removed 2. The part that restores data into a fresh new table calls some CQLs by hand, and partially re-uses variables obtained from previous call to create_dataset(). Using the same "library" methods to re-create an empty table makes this part much simpler Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-02-23 12:27:41 +03:00
Pavel Emelyanov	988606ac7f	test/object_store: Shift indentation right for test_restore_with_streaming_scopes This is preparational patch. Next will need to replace foo() bar() with with something() as s: foo() bar() Effectively -- only add the `with something()` line. Not to shift the whole file right together with that future change, do it here. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-02-23 12:27:09 +03:00
Pavel Emelyanov	5161aeee95	test/backup: Run keyspace flush and snapshot taking API in parallel The take_snapshot() helper runs these API sequentially for every server. Running them with asyncio.gather() slightly reduces the wait-time thus improving the total runtime. Before: CPU utilization: 2.1% real 0m33,871s user 0m22,500s sys 0m13,207s After: CPU utilization: 2.4% real 0m29,532s user 0m22,351s sys 0m12,890s Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-02-23 12:20:36 +03:00
Pavel Emelyanov	21752a43fe	test/backup: Re-use take_snapshot() helper in do_abort_restore() The test in question does _exactly_ what this helper does, but in a longer way. The only difference is that it uses server_id as key to dict with sstable components, but it's easy to tune. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-02-23 12:20:35 +03:00
Pavel Emelyanov	818a99810c	test/backup: Move take_snapshot() helper up So that it's not in the middle of tests themselves, but near other "helper" functions in the .py file Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-02-23 12:20:35 +03:00
Ernest Zaslavsky	321d4caf0c	object_storage: add retryable machinery to object storage remove hand rolled error handling from object storage client and replace with common machinery that supports exception handling and retrying when appropriate	2026-02-22 14:00:44 +02:00
Patryk Jędrzejczak	e8efcae991	Merge 'Use standard ks/cf/data creation methods in object_store/test_basic.py test' from Pavel Emelyanov The test uses create_ks_and_cf helper duplicating the existing code that does the same. This PR patches basic tests to use standard facilities. Also it prepares the ground for testing keyspace storage options with rf=3 Cleaning tests, not backporting Closes scylladb/scylladb#28600 * https://github.com/scylladb/scylladb: test/object_store: Remove create_ks_and_cf() helper test/object_store: Replace create_ks_and_cf() usage with standard methods test/object_store: Shift indentation right for test cases	2026-02-20 15:53:38 +01:00
Botond Dénes	6c04e02f66	Merge 'Fix restoration test's validation of streaming directions' from Pavel Emelyanov The test_restore_with_streaming_scopes among other things checks how data streams flow while restoring. Whether or not to check the streams is decided based on the min tablet count value, which is compared with a hardcoded 512. This value of 512 matched the tablet count used by this test until it was "optimized" by #27839, where this number changed to 5 and streaming checks became off. Good news is that the very same checks are still performed by test_refresh_with_streaming_scopes. But it's better to have a working restoration test anyway. Minor test fix, not backporting Closes scylladb/scylladb#28607 * github.com:scylladb/scylladb: test: Fix the condition for streaming directions validation test: Split test_backup.py::check_data_is_back() into two	2026-02-20 15:42:10 +02:00
Botond Dénes	6f88c0dbd3	Merge ' test_tablets_parallel_decommission: Fix flakiness due to delayed task appearance' from Tomasz Grabiec Currently, the test assumes that when 'topology_coordinator_pause_before_processing_backlog: waiting' is logged, the task for decommission must be there. This was based on the assumption that topology coordinator is idle and decommission request wakes it up. But if the server is slow enough, it may still be running the load balancer in reaction to table creation, and block on that injection point before decommission request was added. Fix by waiting for the task to appear rather than the injection. Fixes SCYLLADB-715 Only 2026.1 vulnerable. Closes scylladb/scylladb#28688 * github.com:scylladb/scylladb: test_tablets_parallel_decommission: Fix flakiness due to delayed task appearance test: cluster: task_manager_client: Introduce wait_task_appears() tests: pylib: util: Add exponential backoff to wait_for	2026-02-20 15:05:36 +02:00
Pavel Emelyanov	c96420c015	tests: Re-use manager.get_server_exe() There's a bunch of incremental repair tests that want to call the scylla sstable command. For that they try to find the scylla binary by scanning the /proc directory (see local_process_id and get_scylla_path helpers). There's a shorter way -- just call manager.get_server_exe(). Same for the backup-restore test. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#28676	2026-02-20 14:59:30 +02:00
Pavel Emelyanov	a4a0d75eee	test/object_store: Parametrize test_simple_backup_and_restore() There are three tests and a function with a pair of boolean parameters called by those. It's less code if the function becomes a test with parameters. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#28677	2026-02-20 14:57:30 +02:00
Pavel Emelyanov	a2e1293f86	test/object_store: Squash two simple-backup tests together The test_backup_simple creates a ks/cf, takes a snapshot, backs it up, then checks that the files were uploaded. The test_backup_move does the same, but also plays with 'move_files' parameter to be true/false. In fact, the "move" test was a copy of the "simple" one that dropped the check for the scheduling group being "streaming" (backup with --move-files can check the same, it's not bad), and the check for the destination bucket to contain needed files (same here -- checking that files arrived to bucket after --move-files is good). At the end of the day, after the change the backup test is run two times instead of three, and performs extra checks for the --move-files case. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#28606	2026-02-20 14:49:30 +02:00
Andrzej Jackowski	eb5a564df2	test: move dtest/guardrails_test.py to test_guardrails.py This commit moves `guardrails_test.py`, prepared in the previous commit of this patch series, to `test/cluster/test_guardrails.py`. It also cleans up `suite.yaml`.	2026-02-20 11:39:52 +01:00
Andrzej Jackowski	9df426d2ae	test: prepare guardrails_test.py to be moved to test/cluster/ Disable `test/cluster/dtest/guardrails_test.py` in `suite.yaml` and make it compatible with the `test/cluster/` framework. This will allow moving this file from `test/cluster/dtest/` to `test/cluster/` in the next commit of this patch series. There are two motivations for moving the test: - Execution time reduction (from 12s to 9s in 'dev' in my env) - Facilitate adding new tests to the `guardrails_test.py` file	2026-02-20 11:39:43 +01:00
Avi Kivity	27a5502f14	Merge 'Reapply "main: test: add future and abort_source to after_init_func"' from Marcin Maliszkiewicz The patchset fixes abort_source implementation for perf-alternator and perf-cql-raw. It moves run_standalone function to common code in perf.hh with necessary templating. We also add extensive testing so that it's more difficult to break the tooling in the future. Fixes SCYLLADB-560 Backport: no, internal tooling improvement Closes scylladb/scylladb#28541 * github.com:scylladb/scylladb: test: cluster: add tests for perf tools test: perf: fix port race condition on startup in connect workload test: perf: prepare benchmarks to bind to custom host test: perf: make perf-alterantor remote port configurable test: perf: fix ASAN leak warnings in perf-alternator Reapply "main: test: add future and abort_source to after_init_func"	2026-02-19 19:12:46 +02:00
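
Commit 5161aeee95 above describes replacing a sequential per-server loop in take_snapshot() with asyncio.gather(). The real helper is not shown in this log, so the following is only a minimal sketch of that pattern under stated assumptions: flush_keyspace() and snapshot_one_server() are hypothetical stand-ins for the REST API calls, and the sketch assumes each server still flushes before it snapshots, with only the per-server work overlapping.

```python
import asyncio
from typing import List


async def flush_keyspace(server: str, keyspace: str) -> None:
    # Hypothetical stand-in for the keyspace-flush API call.
    await asyncio.sleep(0.01)


async def snapshot_one_server(server: str, keyspace: str, tag: str) -> str:
    # Assumption: on a single server, the flush completes before the snapshot.
    await flush_keyspace(server, keyspace)
    await asyncio.sleep(0.01)  # hypothetical stand-in for the snapshot API call
    return f"{server}:{tag}"


async def take_snapshot_sequential(servers: List[str], keyspace: str, tag: str) -> List[str]:
    # One server at a time: total wait is the sum of all per-server waits.
    return [await snapshot_one_server(s, keyspace, tag) for s in servers]


async def take_snapshot_parallel(servers: List[str], keyspace: str, tag: str) -> List[str]:
    # asyncio.gather() runs the per-server coroutines concurrently and
    # returns their results in the same order as the inputs.
    return await asyncio.gather(
        *(snapshot_one_server(s, keyspace, tag) for s in servers)
    )
```

Because asyncio.gather() preserves input order, a caller that indexes results by server position does not need to change when switching from the sequential variant to the parallel one.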


935 Commits
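
Merge 6f88c0dbd3 above fixes a flaky test by polling until the decommission task appears, and mentions adding exponential backoff to wait_for. The actual signatures in test/pylib are not shown in this log, so the sketch below is an illustration only: the parameter names, the default periods, and the list_tasks callback are all assumptions, not the project's API.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, List, Optional, TypeVar

T = TypeVar("T")


async def wait_for(pred: Callable[[], Awaitable[Optional[T]]],
                   deadline: float,
                   period: float = 0.01,
                   backoff: float = 2.0,
                   max_period: float = 1.0) -> T:
    # Poll pred() until it returns a non-None value, sleeping longer after
    # each miss (exponential backoff, capped at max_period).
    while True:
        result = await pred()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met before deadline")
        await asyncio.sleep(period)
        period = min(period * backoff, max_period)


async def wait_task_appears(list_tasks: Callable[[], Awaitable[List[Dict[str, str]]]],
                            task_type: str,
                            timeout: float = 10.0) -> Dict[str, str]:
    # list_tasks is a hypothetical callback returning the current task list.
    async def first_matching_task() -> Optional[Dict[str, str]]:
        for task in await list_tasks():
            if task.get("type") == task_type:
                return task
        return None

    return await wait_for(first_matching_task, time.monotonic() + timeout)
```

Waiting on the observable state (the task being listed) rather than on a log line from an injection point avoids the ordering assumption the commit message identifies as the source of the flakiness.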