Add more racks to dc2 to verify that the default replication factor
covers all available racks (rather than e.g. limited to 3).
With tablets and rf_rack_valid_keyspaces, verify also the automatically
selected rack list.
Restrict the extension to non-debug build modes to prevent running out
of memory with --repeat=100.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closesscylladb/scylladb#29931
Python tests requires different handling of metrics gathering from
cgroup than C++ tests. pytest do not execute each python tests in
a separate process, so we can't put it there and get the metrics.
The idea is to put the whole pytest process to the cgroup and get the
metrics. This will work because pytest runs the threads as as completely
separate processes and inside the thread it will run tests consequently.
Additionally, to simplify system resource monitor moved to pytest main
thread.
After all test suites migrated to test_config.yaml with type: Python,
the specialized suite classes (Topology, CQLApproval, Run, Tool) and
the legacy execution pipeline (find_tests, run_test, TestSuite.run,
Test.run) became unreachable. Remove all this dead code.
Deleted files:
- suite/topology.py, suite/cql_approval.py, suite/run.py, suite/tool.py
Simplified:
- base.py: remove run_test(), read_log(), TestSuite.run(),
add_test_list(), build_test_list(), all_tests(), test_count(),
SUITE_CONFIG_FILENAME, disabled/flaky test tracking, and dead
Test attributes (args, core_args, valid_exit_codes, allure_dir,
is_flaky, is_cancelled, etc.)
- python.py: remove PythonTestSuite.run(), PythonTest.run(),
_prepare_pytest_params(), pattern, test_file_ext, xmlout,
server_log, scylla_env setup, and shlex import.
Simplify run_ctx() to take no parameters.
- runner.py: remove --scylla-log-filename option,
print_scylla_log_filename fixture, SUITE_CONFIG_FILENAME import,
and suite.yaml probe in TestSuiteConfig.from_pytest_node().
- __init__.py: remove re-exports of deleted classes.
- test_config.yaml: Topology -> Python, Approval -> Python.
- conftest files: run_ctx(options=...) -> run_ctx().
- docs/dev/testing.md: update to reflect current pytest-based
architecture, log paths, and removed features.
Co-Authored-By: Claude Opus 4.6 (200K context) <noreply@anthropic.com>
Closesscylladb/scylladb#29613
After stopping scylla server processes, the FUSE daemon
(fuse2fs) may still be processing file handle closures.
An immediate fusermount3 -u can fail with 'device busy',
causing spurious test failures on teardown.
Retry the unmount up to 10 times with 0.5s delay between
attempts, and capture stderr for diagnostics.
Fixes: SCYLLADB-2049
Closesscylladb/scylladb#29920
The `test_max_cells` test was flaky due to `std::bad_alloc` caused by Seastar buddy allocator fragmentation. The root causes are:
1. The doubling loop with 24 iterations of CREATE/INSERT/DROP fragmented the allocator
2. The test built the whole batch as a single string that takes contiguous memory
Also, some iterations inserted zero rows, but still did CREATE/DROP table which also contributed to the fragmentation.
This patch series:
- Skips iterations that insert zero rows
- Creates the table once, truncates it after each test iteration
- Switches to prepared statements
Investigation results are presented in detail in https://scylladb.atlassian.net/browse/SCYLLADB-1645
Fixes SCYLLADB-1645
CI stability improvement. Backport to versions that have this test.
Closesscylladb/scylladb#29759
* github.com:scylladb/scylladb:
test: prepare max cells inserts
test: reuse max cells schema
test: limits: skip empty max cells iterations
When an RF change shrinks replicas on a DC and the node being shrunk is
excluded, refresh_tablet_load_stats() only provides load_stats for that
node if it has a cached snapshot from when the node was still up. If the
snapshot is missing or predates the tables being shrunk (e.g. they were
created after the node went down), stats stay incomplete. In that case
load_sketch::unload() called from make_rf_change_plan() throws:
Can't provide accurate load computation with incomplete load_stats
for host: <uuid>
Since an excluded node is not expected to come back, load_stats will
never become complete, and the topology coordinator retries the plan
infinitely, hanging ALTER KEYSPACE.
Add a check for excluded nodes and skip unload() for them: we are
removing the replica, so accurate load data for that node is not
needed. For all other node states the throw-and-retry behavior is
preserved.
Modify test_excludenode_shrink_rf to always trigger the bug: a new
error injection 'force_down_node_load_stats_invalid' forces the
invalid-stats path in refresh_tablet_load_stats() for a down node, so
the test does not depend on whether the load-stats refresher happened
to cache the excluded node's stats while it was still up.
Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1702.
Closesscylladb/scylladb#29622
Switch from raw CQL batch string to using a prepared statement.
The old approach constructed the entire 50-row batch as a single
CQL text string (~19.8 MiB with 32768 column names spelled out
per row). This caused large contiguous allocations in the server.
Fixes SCYLLADB-1645
Extract table creation into _create_max_cell_count_table(). Call
it once before the loop instead of creating and dropping the table
on every iteration. Use TRUNCATE instead of DROP TABLE between
iterations to clear data while keeping the schema.
This avoids repeated schema operations that fragment the Seastar
buddy allocator's address space with scattered small allocations.
Refs SCYLLADB-1645
The doubling loop in test_max_cells started from cells=1. Since
each row has MAX_CELLS_COLUMNS (32768) cells, iterations where
cells < MAX_CELLS_COLUMNS produced zero rows (cells // columns = 0).
Those iterations only did CREATE TABLE / DROP TABLE with no data
inserted.
Start the loop from MAX_CELLS_COLUMNS and use a while loop.
Co-authored-by: Dario Mirovic <dario.mirovic@scylladb.com>
Refs SCYLLADB-1645
Move `materialized_views_test.py::TestMaterializedViews::test_do_not_finish_view_building_with_hints`
test from dtest to test.py.
The dtest was throttling down IO throughput in the hope that the view
building won't be finished too soon. This introduces some unreliability,
which can be solved by using error injection and pausing view building
until we stop necessary nodes.
This patch adds 2 tests: one for tablet-based view and one for vnode-based. Both of the tests use error injection to pause view building.
Fixes [SCYLLADB-1261](https://scylladb.atlassian.net/browse/SCYLLADB-1261)
The issue was seen in 2026.2, so we should backport this patch to this version.
[SCYLLADB-1261]: https://scylladb.atlassian.net/browse/SCYLLADB-1261?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQClosesscylladb/scylladb#29788
* github.com:scylladb/scylladb:
test/cluster/mv/test_mv_building: add similar test for vnode-based view
test/cluster/test_view_building_coordinator: migrate test from dtest
db/view/view_building_worker: add more logs when flushing base table
In the dtest repo, the test run for both vnode and tablet based views.
Since in test.py infra we're using error injection to pause the view
building process, we need separate tests for those two cases.
service_levels: self-heal stale v1 marker after raft topology upgrade
This PR handles an upgrade corner case where a node may already be using
raft topology, while `system.scylla_local` still marks service levels as v1.
The problem was introduced by commit 2917ec5d51
("service:qos: service levels migration"), which added the service-levels
migration from `system_distributed.service_levels` to
`system.service_levels_v2` as part of the raft topology upgrade.
However, if the cluster had no service levels configured, there was no data
to migrate. In that case, the migration path could leave the local version
marker unchanged, so the node would later observe an inconsistent state:
* raft topology is already enabled;
* service levels are still marked as v1 in `system.scylla_local`.
Such clusters can be left in a stale state and fail startup during upgrade to
2026.2
This PR makes the upgrade path self-healing.
The first commit restores `service_level_controller::migrate_to_v2()`, giving
us a group0-based path for writing the service-levels v2 state even after raft
topology is already in use.
The second commit wires this path into startup. When the node detects the
stale raft-topology + service-levels-v1 state, it retries the migration a
bounded number of times and updates the version marker to v2 instead of
failing startup.
With this change, clusters that were left in this stale state can recover
automatically during upgrade to 2026.
Fixes: SCYLLADB-1807
backport: 2026.2 2026.1 we need this functionality when we are upgrading older servers
Closesscylladb/scylladb#29749
* github.com:scylladb/scylladb:
test/auth_cluster: simulate v1 state in self-heal test When skip_service_levels_v2_initialization is used, write an explicit v1 service level version marker while skipping v2 initialization. This lets the restart test exercise self-healing from v1 to v2.
qos: self-heal stale service levels version on startup
qos: reintroduce service levels v2 migration self-heal
Move `materialized_views_test.py::TestMaterializedViews::test_do_not_finish_view_building_with_hints`
test from dtest to test.py.
The dtest was throttling down IO throughput in the hope that the view
building won't be finished too soon. This introduces some unreliability,
which can be solved by using error injection and pausing view building
until we stop necessary nodes.
Fixes SCYLLADB-1261
This series fixes a recurring source of flaky tests in the cluster test suite.
When a test configures Scylla to listen on non-default ports (e.g. a custom Alternator port, proxy-protocol port or shard-aware port), server_add() and server_start() would declare the server ready by polling the hardcoded standard CQL and Alternator ports. Those ports can become available slightly before the custom ports finish binding, so the test could start using the custom port before it was open — causing intermittent failures.
The fix for each affected test was to pass `expected_server_up_state=ServerUpState.SERVING` explicitly, which waits for Scylla's sd_notify("STATUS=serving") signal instead. That signal is sent only after all configured listeners are fully open, so it is always the right readiness signal regardless of the port configuration. This workaround was applied again in PR #29737 and will keep being needed for every new test that uses a non-default port.
This series makes ServerUpState.SERVING the default at every level of the server start/add call stack so no test needs to remember it:
* Make server_add(), servers_add(), server_start() et al. all default to ServerUpState.SERVING.
* Document that server_add/server_start wait for all ports to be ready, so future test authors understand what the functions guarantee.
* Remove now-redundant expected_server_up_state=SERVING from exiting tests.
* A small optimization: Fix check_serving_notification() returning False on first completion. When the sd_notify future completed, the function correctly updated _received_serving but still returned False, wasting one 100ms polling interval. Return self._received_serving directly.
Closesscylladb/scylladb#29758
* github.com:scylladb/scylladb:
test/pylib: fix missing protocol_version=4 on control_cluster
scylla_cluster: guard poll_status() set_result() calls against cancelled future
test/cluster: avoid repeated CQL checks and leaks while waiting for SERVING
test/cluster: fix check_serving_notification() inefficiency
test/cluster: remove now-redundant expected_server_up_state=SERVING
test/cluster: document that add/start waits for all ports to be ready
test/cluster: update remaining CQL_ALTERNATOR_QUERIED defaults to SERVING
test/cluster: fix server_add/server_start hanging when starting in maintenance mode
main: notify "entering maintenance mode" after the maintenance CQL server is ready
test/cluster: make server_start() default to ServerUpState.SERVING
test/cluster: make server_add() default to ServerUpState.SERVING
When skip_service_levels_v2_initialization is used, write an explicit
v1 service level version marker while skipping v2 initialization. This
lets the restart test exercise self-healing from v1 to v2.
Strong consistent requests take different patch then EC requests and consistency levels don’t map well.
We should limit available consistency levels in SC request to avoid ignoring them silently, which may cause confusion to user.
For writes, there is only one option:
- QUORUM/LOCAL_QUORUM (multi DC is not supported yet, so both of those CLs have the same effect) - we need quorum of replicas to successfully commit new mutations to Raft log.
For reads, there are 2 options:
- QUORUM/LOCAL_QUORUM - if user wants to be sure he sees latest data and the query needs to execute `read_barrier()`, which requires quorum of replicas
- ONE/LOCAL_ONE - if user just wants to read data from one replica without synchronization
All tests were updated to use LOCAL_QUORUM for both read and writes.
Fixes SCYLLADB-1766
SC is in experimental phase and this patch is an improvement, no backport needed.
Closesscylladb/scylladb#29691
* github.com:scylladb/scylladb:
strong_consistency: allow QUORUM/LOCAL_QUORUM and ONE/LOCAL_ONE for reads
strong_consistency: allow only QUORUM/LOCAL_QUORUM CL for writes
Avoid concurrent topology changes in the tombstone GC repair setup, where debug-mode nodes running hinted handoff and materialized view startup work can time out while applying Raft entries before the test starts.
Keep the sequential path opt-in so unrelated repair tests still exercise concurrent bootstrap behavior.
Closesscylladb/scylladb#29829
The function does nothing useful now.
No backport needed. Removes code.
Closesscylladb/scylladb#29828
* https://github.com/scylladb/scylladb:
raft_group0: remove finish_setup_after_join function
raft_group0: fix indentation after the last change
raft_group: drop unneeded checks
Update the hardcoded 2025.1.0 binary URL to the latest 2025.1.12
release for upgrade tests.
The 2025.1.12 binary now supports and enforces the
rf_rack_valid_keyspaces option which the test harness enables by
default. Since test_sstable_compression_dictionaries_upgrade creates
a 2-node cluster in a single rack with RF=2, it violates the
constraint. Disable the option explicitly for this test.
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
Closesscylladb/scylladb#29714
To successfully write data to strong consistent table, a quorum of
replicas need to be used to save the data to Raft log.
So the only reasonable consistency level is QUORUM/LOCAL_QUORUM
(currently SC doesn't support multi DC).
In the test we perform 2 consecutive writes where the first write
is supposed to increase the view update backlog above the mv
admission control threshold and the second one is expected to be
rejected because of that.
On each node/shard we have 2 types of view update backlogs:
1. for deciding whether we should admit writes
2. for propagating the backlog information to other nodes/shards.
For the second write to be rejected, it must be performed on a node
and shard which updated its backlog of type 1.
The view update backlog of type 2. is immediately increased on the
base table replica. For this backlog to be registered as a backlog
of type 1., it needs to be either carried by gossip (happening once
every second) or by attaching it to a replica write response. We
don't want to increase the runtime of tests unnecessarily, so we don't
wait and we rely on the second mechanism. The response to the first
base table write (the one causing increase in the backlog) carries
the increased backlog to the coordinator of this write. So for the
second write to observe the increased backlog, it needs to be coordinated
on the same node+shard as the first write.
We make sure that both writes are coordinated on the same node+shard by
using prepared statements combined with setting the host in `run_async`.
Both writes target the same partition and with prepared statements we
route them directly to the correct shard.
That was the idea, at least. In practice, for the driver to learn the
correct shard, it first needs to learn the token->shard mapping from
the server. For vnodes it can expect a shard by calculating the token
of the affected partition, but for tablets, it had no opportunity to
learn the tablet->shard mapping so the first write may route to any shard.
Additionally, we aren't guaranteed that the driver established connections
to all shards on all nodes at the point of any write. So if a connection
finishes establishing between the two writes, this may also cause us to
coordinate these 2 writes on different shards, leading to a missed view
backlog growth and not-rejected second write.
We fix this in this patch by running the test using one shard on each node.
This way, as long as we perform both writes on the same node, they'll also
be coordinated on the same shard. This also makes the prepared statement and
BoundStatement unnecessary — we can use SimpleStatement with
FallthroughRetryPolicy directly.
Fixes: SCYLLADB-1901
Closesscylladb/scylladb#29862
Add per-shard metrics for strong consistency coordinator operations (latency, timeouts, bounces, status unknown) under the `"strong_consistency_coordinator"` category. These are analogous to the eventual consistency metrics in `storage_proxy_stats`, enabling direct performance comparison between the two consistency modes.
The metrics are simplified compared to `storage_proxy_stats` — no breakdown by table, tablet, scheduling group, or DC, only per-shard.
Fixes SCYLLADB-1343
Strong consistency is still in experimental phase, no need to backport.
Closesscylladb/scylladb#29318
* github.com:scylladb/scylladb:
test/strong_consistency: verify metrics
strong_consistency: wire up metrics to operations
strong_consistency: add stats struct and metrics registration
The mechanics of the restore is like this
- A /storage_service/tablets/restore API is called with (keyspace, table, endpoint, bucket, manifests) parameters
- First, it populates the system_distributed.snapshot_sstables table with the data read from the manifests
- Then it emplaces a bunch of tablet transitions (of a new "restore" kind), one for each tablet
- The topology coordinator handles the "restore" transition by calling a new RESTORE_TABLET RPC against all the current tablet replicas
- Each replica handles the RPC verb by
- Reading the snapshot_sstables table
- Filtering the read sstable infos against current node and tablet being handled
- Downloading and attaching the filtered sstables
This PR includes system_distributed.snapshot_sstables table from @robertbindar and preparation work from @kreuzerkrieg that extracts raw sstables downloading and attaching from existing generic sstables loading code.
This is first step towards SCYLLADB-197 and lacks many things. In particular
- the API only works for single-DC cluster
- the caller needs to "lock" tablet boundaries with min/max tablet count
- not abortable
- no progress tracking
- sub-optimal (re-kicking API on restore will re-download everything again)
- not re-attacheable (if API node dies, restoration proceeds, but the caller cannot "wait" for it to complete via other node)
- nodes download sstables in maintenance/streaming sched gorup (should be moved to maintenance/backup)
Other follow-up items:
- have an actual swagger object specification for `backup_location`
Closes#28436Closes#28657Closes#28773Closesscylladb/scylladb#28763
* github.com:scylladb/scylladb:
docs: Update topology_over_raft.md with `restore` transition kind
test: Add test for backup vs migration race
test: Restore resilience test
sstables_loader: Fail tablet-restore task if not all sstables were downloaded
sstables_loader: mark sstables as downloaded after attaching
sstables_loader: return shared_sstable from attach_sstable
db: add update_sstable_download_status method
db: add downloaded column to snapshot_sstables
db: extract snapshot_sstables TTL into class constant
test: Add a test for tablet-aware restore
tablets: Implement tablet-aware cluster-wide restore
messaging: Add RESTORE_TABLET RPC verb
sstables_loader: Add method to download and attach sstables for a tablet
tablets: Add restore_config to tablet_transition_info
sstables_loader: Add restore_tablets task skeleton
test: Add rest_client helper to kick newly introduced API endpoint
api: Add /storage_service/tablets/restore endpoint skeleton
sstables_loader: Add keyspace and table arguments to manfiest loading helper
sstables_loader_helpers: just reformat the code
sstables_loader_helpers: generalize argument and variable names
sstables_loader_helpers: generalize get_sstables_for_tablet
sstables_loader_helpers: add token getters for tablet filtering
sstables_loader_helpers: remove underscores from struct members
sstables_loader: move download_sstable and get_sstables_for_tablet
sstables_loader: extract single-tablet SST filtering
sstables_loader: make download_sstable static
sstables_loader: fix formating of the new `download_sstable` function
sstables_loader: extract single SST download into a function
sstables_loader: add shard_id to minimal_sst_info
sstables_loader: add function for parsing backup manifests
split utility functions for creating test data from database_test
export make_storage_options_config from lib/test_services
rjson: Add helpers for conversions to dht::token and sstable_id
Add system_distributed_keyspace.snapshot_sstables
add get_system_distributed_keyspace to cql_test_env
code: Add system_distributed_keyspace dependency to sstables_loader
storage_service: Export export handle_raft_rpc() helper
storage_service: Export do_tablet_operation()
storage_service: Split transit_tablet() into two
tablets: Add braces around tablet_transition_kind::repair switch
The test samples sl:default runtime before and after setup writes to
prove that it measures the scheduling group used by regular CQL writes.
The metric is exported in milliseconds, so a single 200-row batch may
not be visible immediately, or may be too small in some environments.
Keep the original 200-row table size, but wait up to 30 seconds for the
metric to advance. If it does not, retry the same writes before TTL is
enabled. The retries update the same keys, so the expiration part of the
test still waits for exactly the original number of rows.
In a local 100-run with N=200 rows, the observed delta of
`ms_statement_before - ms_statement_before_write` was: min=4.0,
max=16.0, mean=8.13, and median=8.0. Therefore, it looks possible that
in a rare corner case the delta drops even to 0.
Fixes SCYLLADB-1869
Closesscylladb/scylladb#29797
`system.view_building_tasks` is a single-partition Raft group0 table (pk = `"view_building"`, CK = timeuuid). When `clean_finished_tasks()` deletes hundreds of finished tasks, the physical rows remain in SSTables until compaction. Any subsequent read of the partition counts every column of every tombstoned row
as a dead cell, triggering `tombstone_warn_threshold` warnings in large clusters.
Two-part fix:
**1. Range tombstones instead of row tombstones (commits 2–3)**
Instead of one row tombstone per finished task, find the minimum alive task UUID (`min_alive_uuid`) and emit a single range tombstone `[before_all, min_alive_uuid)` covering all tasks below that boundary. This reduces the tombstone count significantly and also benefits future compaction.
**2. Bounded scan with `min_task_id` (commits 4–6)**
Even with range tombstones, physical rows remain until compaction and still count as dead cells during reads. The only way to avoid them is to not read them at all.
- Add a `min_task_id timeuuid` static column to `system.view_building_tasks`.
- On every GC, write `min_task_id = min_alive_uuid` atomically with the range tombstone (same Raft batch).
- On reload, read `min_task_id` first using a **static-only partition slice** (empty `_row_ranges` + `always_return_static_content`): the SSTable reader stops immediately after the static row before processing any clustering tombstones — zero dead cells counted.
- Use `AND id >= min_task_id` as a lower bound for the main task scan, skipping all tombstoned rows.
The static-only read and the bounded scan are gated on the `VIEW_BUILDING_TASKS_MIN_TASK_ID` cluster feature so mixed-version clusters fall back to the full scan.
The issue is not critical, so the fix shouldn't be backported.
Fixes SCYLLADB-657
Closesscylladb/scylladb#28929
* github.com:scylladb/scylladb:
test/cluster/test_view_building_coordinator: add reproducer for tombstone threshold warning
docs: document tombstone avoidance in view_building_tasks
view_building: add `task_uuid_generator` to `view_building_task_mutation_builder`
view_building: introduce `task_uuid_generator`
view_building: store `min_alive_uuid` in view building state
view_building: set min_task_id when GC-ing finished tasks
view_building: add min_task_id support to view_building_task_mutation_builder
view_building: add min_task_id static column and bounded scan to system_keyspace
view_building: use range tombstone when GC-ing finished tasks
view_building: add range tombstone support to view_building_task_mutation_builder
view_building: introduce VIEW_BUILDING_TASKS_MIN_TASK_ID cluster feature
This series adds per-test bucket isolation to all S3 and GCS object storage tests. Previously, every test shared a single pre-created bucket, which meant tests could interfere with each other through leftover objects and could not run concurrently across multiple `test.py` processes without risking collisions.
New `create_bucket`, `delete_bucket`, and `delete_bucket_with_objects` methods on `s3::client`, following the existing `make_request` pattern. `create_bucket` handles the `BUCKET_ALREADY_OWNED_BY_YOU` error gracefully.
A new `s3_test_fixture` RAII class for C++ Boost tests that creates a uniquely-named bucket on construction (derived from the Boost test name and pid) and tears down everything — objects, bucket, client — on destruction. All S3 tests in `s3_test.cc` are migrated to use it, removing manual `deferred_delete_object` and `deferred_close` boilerplate. The minio server policy is broadened to allow dynamic bucket creation/deletion.
A `client::make` overload that accepts a custom `retry_strategy`, used in tests with a fast 1ms retry delay instead of exponential backoff, significantly reducing test runtime for transient errors during bucket lifecycle operations.
Python-side (`test/cluster/object_store`): each pytest fixture (`object_storage`, `s3_storage`, `s3_server`) now creates a unique bucket per test function via `create_test_bucket()` and destroys it on teardown. Bucket names are sanitized from the pytest node name with a short UUID suffix for uniqueness.
Object storage helpers (`S3Server`, `MinioWrapper`, `GSFront`, `GSServerImpl`, factory functions, CQL helpers, `s3_server` fixture) are extracted from `test/cluster/object_store/conftest.py` into a shared `test/pylib/object_storage.py` module, eliminating duplication across test suites. The conftest becomes a thin re-export wrapper. Old class names are preserved as aliases for backward compatibility.
| Test Name | new test specific retry strategy execution time (ms) | original execution time (ms) | Δ (ms) | Speedup |
|--------------------------------------------------------------|----------------:|-------------:|---------:|--------:|
| test_client_upload_file_multi_part_with_remainder_proxy | 19,261 | 61,395 | −42,134 | **3.2×** |
| test_client_upload_file_multi_part_without_remainder_proxy | 16,901 | 53,688 | −36,787 | **3.2×** |
| test_client_upload_file_single_part_proxy | 3,478 | 6,789 | −3,311 | **2.0×** |
| test_client_multipart_copy_upload_proxy | 1,303 | 1,619 | −316 | 1.2× |
| test_client_put_get_object_proxy | 150 | 365 | −215 | **2.4×** |
| test_client_readable_file_stream_proxy | 125 | 327 | −202 | **2.6×** |
| test_small_object_copy_proxy | 205 | 389 | −184 | 1.9× |
| test_client_put_get_tagging_proxy | 181 | 350 | −169 | 1.9× |
| test_client_multipart_upload_proxy | 1,252 | 1,416 | −164 | 1.1× |
| test_client_list_objects_proxy | 729 | 881 | −152 | 1.2× |
| test_chunked_download_data_source_with_delays_proxy | 830 | 960 | −130 | 1.2× |
| test_client_readable_file_proxy | 148 | 279 | −131 | 1.9× |
| test_client_upload_file_multi_part_with_remainder_minio | 3,358 | 3,170 | +188 | 0.9× |
| test_client_upload_file_multi_part_without_remainder_minio | 3,131 | 2,929 | +202 | 0.9× |
| test_client_upload_file_single_part_minio | 519 | 421 | +98 | 0.8× |
| test_download_data_source_proxy | 180 | 237 | −57 | 1.3× |
| test_client_list_objects_incomplete_proxy | 590 | 641 | −51 | 1.1× |
| test_large_object_copy_proxy | 952 | 991 | −39 | 1.0× |
| test_client_multipart_upload_fallback_proxy | 148 | 185 | −37 | 1.3× |
| test_client_multipart_copy_upload_minio | 641 | 674 | −33 | 1.1× |
No backport needed — this is a test infrastructure improvement with no production code impact beyond the new `s3::client` methods.
Closesscylladb/scylladb#29508
* github.com:scylladb/scylladb:
test: extract object storage helpers to test/pylib/object_storage.py
test: add per-test bucket isolation to object_store fixtures
s3: add client::make overload with custom retry strategy
test: add s3_test_fixture and migrate tests to per-bucket isolation
s3: add create_bucket and delete_bucket to client
The test starts regular backup+restore on a smaller cluster, but prior
to it spawns tablet migration from one node to another and locks it in
the middle with the help of block_tablet_streaming injection (even
though tablets have no data and there's nothing to stream, the injection
is located early enough to work).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The test checks that losing one of nodes from the cluster while restore
is handled. In particular:
- losing an API node makes the task waiting API to throw (apparently)
- losing coordinator or replica node makes the API call to fail, because
some tablets should fail to get restored. If the coordinator is lost,
it triggers coordinator re-election and new coordinator still notices
that a tablet that was replicated to "old" coordinator failed to get
restored and fails the restore anyway
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When the storage_service::restore_tablets() resolves, it only means that
tablet transitions are done, including restore transitions, but not
necessarily that they succeeded. So before resolving the restoration
task with success need to check if all sstables were downloaded and, if
not, resolve the task with exception.
Test included. It uses fault-injection to abort downloading of a single
sstable early, then checks that the error was properly propagated back
to the task waiting API
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The test is derived from test_restore_with_streaming_scopes() one, with
the excaption that it doesn't check for streaming directions, doesn't
check mutations right after creation and doesn't loop over scoped
sub-tests, because there's no scope concept here.
Also it verifies just two topologies, it seems to be enough. The scopes
test has many topologies because of the nature of the scoped restore,
with cluster-wide restore such flexibility is not required.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Fixes: SCYLLADB-1693
In case we abort a decommission operation, the snapshot/backup
mechanism need to remain open.
This change moves it to after raft_decommission.
In the case of a cluster snapshot, our nodes ownership
or not of tables will be serialized by raft anyway, so
should remain consistent. In that case we at worst coordinate
from a node in "leave" status
In the case of a local snapshot, ownership matters less,
only sstables on disk, which should not change.
In the case of backup, this operates on a snapshot, state of which
is not affected.
Adds an injection point for testing.
v2:
- Added injection point to ensure test can abort decommission
Closesscylladb/scylladb#29667
When a partition key or clustering key value exceeds the 64 KiB limit
(65535 bytes serialized), Scylla used to raise a generic
std::runtime_error "Key size too large: N > M" from the low-level
compound-key serializer. That error surfaced to clients as a CQL
server error (code 0x0000, "NoHostAvailable"-looking), which is both
ugly and incompatible with Cassandra - Cassandra returns a clean
InvalidRequest with the message "Key length of N is longer than
maximum of M".
Fix this at the single chokepoint: compound_type::serialize_value in
keys/compound.hh. The serializer is on every path that materializes a
key - INSERT/UPDATE/DELETE/BATCH build mutations through it, and
SELECT builds partition and clustering ranges through it - so a single
throw replacement produces a clean InvalidRequest consistently across
all paths and all key shapes (single, compound PK, composite CK).
The previous approach on this PR branch patched three call sites in
cql3/restrictions/statement_restrictions.cc, which only covered
SELECT, duplicated the check, and placed it mid-restrictions code
(flagged in review). Dropping those changes in favour of the
root-cause fix here.
Un-xfail the tests this fixes:
- test/cqlpy/test_key_length.py: test_insert_65k_pk, test_insert_65k_ck,
test_where_65k_pk, test_where_65k_ck, test_insert_65k_ck_composite,
test_insert_total_compound_pk_err, test_insert_total_composite_ck_err.
- test/cqlpy/cassandra_tests/.../insert_test.py: testPKInsertWithValueOver64K,
testCKInsertWithValueOver64K.
- test/cqlpy/cassandra_tests/.../select_test.py: testPKQueryWithValueOver64K.
test_insert_65k_pk_compound stays xfail: its oversized value gets
rejected by the Python driver's CQL wire-protocol encoder (see
CASSANDRA-19270) before reaching the server, so the fix can't apply.
Updated its reason. testCKQueryWithValueOver64K stays xfail with an
updated reason: Cassandra silently returns empty for an oversized
clustering key in WHERE, while Scylla now throws InvalidRequest - a
deliberate choice mirroring the partition-key case, documented in
the discussion on #10366.
Add three tight-boundary tests (addressing review feedback on the
previous revision) that pin MAX+1 behaviour for SELECT and INSERT of
both partition and clustering keys.
Update test/cluster/dtest/limits_test.py to match the new message
("Key length of \\d+ is longer than maximum of 65535").
fixes#10366fixes#12247
Co-authored-by: Alexander Turetskiy <someone.tur@gmail.com>
Closesscylladb/scylladb#23433
The only thing it does not change a bootstrapping node to become a voter
in case the cluster does not support limited voters feature. But the
feature was introduced in 2025.2 and direct upgrade from 2025.1 to
version newer than 2026.1 is not supported. But even if such upgrade is
done the removed code has affect only during bootstrap, not during
regular boot.
Also remove the upgrade test since after the patch suppressing the
feature on the first boot will no longer behave correctly.
When a user calls the repair API with identical startToken and endToken
values, the code creates a wrapping interval (T, T]. This causes
unwrap() to split it into (-inf, T] and (T, +inf), covering the entire
token ring and triggering a full repair.
Reject such requests early with an error message matching
Cassandra's behavior: "Start and end tokens must be different."
Fixes: https://scylladb.atlassian.net/browse/CUSTOMER-358Closesscylladb/scylladb#29821
Add a node_owner column (locator::host_id) to system.sstables and make it part of the partition key, so the primary key becomesv PRIMARY KEY ((table_id, node_owner), generation).
This is the first step toward moving the sstables registry into system_distributed: once distributed, each node's startup scan must read only the rows it owns, which requires the owning node to be part of the partition key. Partitioning by (table_id, node_owner) turns that scan into a single-partition read of exactly the local node's rows.
Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1562
No need to backport this, keyspace over object storage is experimental feature
Closesscylladb/scylladb#29659
* github.com:scylladb/scylladb:
db, sstables: add node_owner to sstables registry primary key
db, sstables: rename sstables registry column owner to table_id
In the test_delete_partition_rows_from_table_with_mv case we perform
a deletion of a large partition to verify that the deletion will
self-throttle when generating many view updates.
Before the deletion, we first build the materialized view, which causes
the view update backlog to grow. The backlog should be back to empty
when the view building finishes, and we do wait for that to happen, but
the information about the backlog drop may not be propagated to the
delete coordinator in time - the gossip interval is 1s and we perform
no other writes between the nodes in the meantime, so we don't make use
of the "piggyback" mechanism of propagating view backlog either. If the
coordinator thinks that the backlog is high on the replica, it may reject
the delete, failing this test.
We change this in this patch - after the view is built, we perform an
extra write from the coordinator. When the write finishes, the coordinator
will have the up-to-date view backlog and can proceed with the DELETE.
Additionally, we enable the "update_backlog_immediately" injection, which
makes the node backlog (the highest backlog across shards) update immediately
after each change.
Fixes: SCYLLADB-1795
Closesscylladb/scylladb#29775
In debug mode, this test can timeout during tablets merge. While the
test already decreases the number of tables in debug mode (20 tables,
instead of 200 for dev mode), this is not enough, and the test can still
timeout during merge. This change reduces the number of tables from 20
to 5 in debug mode.
It also drops the log level for lead_balancer to debug. This should make
any potential future problems with this test easier to investigate.
Fixes: SCYLLADB-1717
Closesscylladb/scylladb#29682
Replace the naive host.is_up check with wait_for_cql_and_get_hosts() which
actually executes a query against each host, ensuring the driver's connection
pool is fully re-established before proceeding to stop the last server.
The is_up flag is set asynchronously via gossip and doesn't guarantee the
connection pool has live TCP connections. After a server restart, the flag
may be True while the pool still holds stale connections. When the pool
monitor later discovers them dead it briefly marks the host DOWN, causing
NoHostAvailable if another server is being stopped concurrently.
Fixes SCYLLADB-1840
Closesscylladb/scylladb#29769
This PR includes two changes that make gossiper much less likely to mark
nodes as down in tests unexpectedly, and cause test flakiness in issues
like SCYLLADB-864:
- fixing false node conviction when echo succeeds,
- increasing the failure_detector_timeout fixture.
Fixes: SCYLLADB-864
No need for backport: related CI failures are rare, and merging #29522
made them even more unlikely (I haven't seen one since then, but it's
still possible to reproduce locally on dev machines).
Closesscylladb/scylladb#29755
* github.com:scylladb/scylladb:
test/cluster: increase failure_detector_timeout
gossiper: fix false node conviction when echo succeeds
Fixes: SCYLLADB-1815
If we're in a brand new chunk (no buffer yet allocated), we would miscalculate the actual size of an entry to write, possibly causing segment size overshoot. Break out some logic to share between this calc and new_buffer. Also remove redundant (and possibly wrong) constant in oversized allocation.
As for the test:
Checking segment sizes should not use a size filter that rounds (up) sizes.
More importantly, the estimate for what is acceptable limit for commitlog disk usage should be aligned. Simplified the calc, and also made logging more useful in case of failure.
Closesscylladb/scylladb#29753
* github.com:scylladb/scylladb:
commitlog_test.py: Fix size check aliasing, and threshold calc.
commitlog: Fix segment/chunk overhead maybe not included in next_position calculation
The auth cache crashes when it encounters rows in role_permissions that have a live row marker but no permissions column. These “ghost rows” were created by the now-removed auth v2 migration, which used INSERT (creating row markers) instead of UPDATE.
When permissions were later revoked, the row marker remained while the permissions column became null. An empty collection appears as null, since its lifetime is based only on its element's cells.
As a result, when the cache reloads and expects the permissions column to exist, it hits a missing_column exception.
The series removes dead code that was the primary crash site, adds has() guards to the remaining access paths, and includes a test reproducer.
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1816
Backport: all supported versions 2026.1, 2025.4, 2025.1
Closesscylladb/scylladb#29757
* github.com:scylladb/scylladb:
test: add reproducer for auth cache crash on missing permissions column
auth: tolerate missing permissions column in authorize()
auth: add defensive has() guard for role_attributes value column
auth: remove unused permissions field from cache role_record
ServerUpState.SERVING is now the default for server_add() and
server_start(), so the explicit argument in various tests are no longer
needed. Remove it along with the unused ServerUpState imports and the
docstring comments that explained why it was there.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Scaling the timeout by build mode (#29522) turned out to be not sufficient.
Nodes can still be unexpectedly marked as down, even with a 4s timeout in
dev mode. I managed to reproduce SCYLLADB-864 in such conditions.
Increasing failure_detector_timeout will proportionally slow down tests
that use it. That's bad, but currently these tests' flakiness is a much
bigger problem than the tests' slowness. Also, not many tests use this
fixture, and we hope to make it unneeded eventually (see #28495).
Add more logging to barrier and drain rpc to try and pinpoint https://github.com/scylladb/scylladb/issues/26281
Bakport since we want to have it if it happens in the field.
Fixes: SCYLLADB-1821
Refs: #26281Closesscylladb/scylladb#29735
* https://github.com/scylladb/scylladb:
session, raft_topology: add periodic warnings for hung drain and stale version waits
session: add info-level logging to drain_closing_sessions
raft_topology: log sub-step progress in local_topology_barrier
raft_topology: log read_barrier progress in topology cmd handler
Fixes: SCYLLADB-1815
Checking segment sizes should not use a size filter that rounds
(up) sizes.
More importantly, the estimate for what is acceptable limit for
commitlog disk usage should be aligned. Simplified the calc, and
also made logging more useful in case of failure.
server_add() defaults to CQL_ALTERNATOR_QUERIED. That proves the regular CQL driver path is queryable, and regular Alternator ports listed in YAML config if any. It does not prove that every custom listener the test will connect to is already accepting raw TCP connections.
test_proxy_protocol_ssl_shard_aware connects directly to the shard-aware TLS proxy-protocol CQL port immediately after server startup. Wait for ServerUpState.SERVING in the fixture so the custom proxy-protocol listener is registered before opening raw sockets.
test_uninitialized_conns_semaphore opens a raw TCP connection to native_shard_aware_transport_port immediately after startup. The default readiness check can succeed through native_transport_port while the shard-aware listener is still being started, because CQL listeners are registered independently. Wait for ServerUpState.SERVING before opening raw sockets.
test_perf_alternator_remote now asks server_add() to wait for SERVING and uses the returned server address directly. This removes the redundant running_servers() plus get_ready_cql() sequence noted in review.
Fixes: SCYLLADB-1797
No backport as of now, only appeared on master.
Closesscylladb/scylladb#29737
* github.com:scylladb/scylladb:
test/cluster: avoid redundant perf alternator CQL wait
test/cluster: wait for shard-aware CQL listener
test/cluster: wait for proxy protocol ports to serve
When a node processes a barrier_and_drain topology command, it performs
two potentially long-running operations inside local_topology_barrier():
waiting for stale token metadata versions to be released
(stale_versions_in_use) and draining closing sessions
(drain_closing_sessions). Either of these can hang indefinitely -- for
example, stale_versions_in_use blocks until all references to previous
token metadata versions are released, which depends on in-flight
requests completing.
Previously, the only logging was a single 'done' message at the end,
making it impossible to determine which sub-step was blocking when a
barrier_and_drain RPC appeared stuck on a node. In a recent CI failure,
a node never responded to barrier_and_drain during a removenode
operation, and the logs showed the RPC was received but nothing about
what it was waiting on internally.
Add info-level logging before each blocking sub-step, including the
topology version for correlation. This allows diagnosing hangs by
showing whether the node is stuck waiting for stale metadata versions,
stuck draining sessions, or never reached these steps at all.
server_add() already waits for the requested server-up state. For the remote perf-alternator test, request SERVING from server_add() and use the returned server address directly instead of asking for running servers and then calling get_ready_cql() again.
This keeps the listener-readiness intent explicit while removing the redundant CQL readiness probe noted in review.
server_add() defaults to CQL_ALTERNATOR_QUERIED. That proves the regular CQL driver path is queryable, and regular Alternator ports listed in YAML config if any. It does not prove that every CQL listener configured for the process is already accepting raw TCP connections.
test_uninitialized_conns_semaphore opens a raw TCP connection to native_shard_aware_transport_port immediately after startup. The default readiness check can succeed through native_transport_port while the shard-aware listener is still being started, because CQL listeners are registered independently.
Wait for ServerUpState.SERVING before opening raw sockets. Scylla sends that notification only after protocol servers are registered, so this closes the startup window without adding sleeps or local retry loops.
Fixes: SCYLLADB-1797