In the test we perform two consecutive writes: the first is supposed
to increase the view update backlog above the MV admission control
threshold, and the second is expected to be rejected because of that.
On each node/shard we have 2 types of view update backlogs:
1. for deciding whether we should admit writes
2. for propagating the backlog information to other nodes/shards.
For the second write to be rejected, it must be performed on a node
and shard which updated its backlog of type 1.
The view update backlog of type 2. is immediately increased on the
base table replica. For this backlog to be registered as a backlog
of type 1., it needs to be either carried by gossip (happening once
every second) or by attaching it to a replica write response. We
don't want to increase the runtime of tests unnecessarily, so we don't
wait and we rely on the second mechanism. The response to the first
base table write (the one causing increase in the backlog) carries
the increased backlog to the coordinator of this write. So for the
second write to observe the increased backlog, it needs to be coordinated
on the same node+shard as the first write.
We make sure that both writes are coordinated on the same node+shard by
using prepared statements combined with setting the host in `run_async`.
Both writes target the same partition and with prepared statements we
route them directly to the correct shard.
That was the idea, at least. In practice, for the driver to learn the
correct shard, it first needs to learn the token->shard mapping from
the server. For vnodes it can compute the expected shard from the token
of the affected partition, but for tablets it has had no opportunity to
learn the tablet->shard mapping, so the first write may route to any shard.
Additionally, we aren't guaranteed that the driver established connections
to all shards on all nodes at the point of any write. So if a connection
finishes establishing between the two writes, this may also cause us to
coordinate these 2 writes on different shards, leading to a missed view
backlog growth and not-rejected second write.
We fix this in this patch by running the test using one shard on each node.
This way, as long as we perform both writes on the same node, they'll also
be coordinated on the same shard. This also makes the prepared statement and
BoundStatement unnecessary — we can use SimpleStatement with
FallthroughRetryPolicy directly.
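A minimal sketch of the resulting write pattern, assuming the cluster-test `cql.run_async(..., host=...)` wrapper mentioned above; the fixture names and the expected rejection error type are assumptions, not the test's actual code:

```python
import pytest
from cassandra import WriteFailure  # assumed rejection type
from cassandra.policies import FallthroughRetryPolicy
from cassandra.query import SimpleStatement

async def run_two_writes_same_coordinator(cql, ks, host, big_value):
    # With one shard per node, pinning the coordinator node is enough: both writes
    # necessarily run on the same shard, so the second one sees the grown backlog.
    stmt = SimpleStatement(f"INSERT INTO {ks}.tbl (pk, v) VALUES (0, %s)",
                           retry_policy=FallthroughRetryPolicy())
    await cql.run_async(stmt, [big_value], host=host)      # grows the view update backlog
    with pytest.raises(WriteFailure):
        await cql.run_async(stmt, [big_value], host=host)  # should be rejected
```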
Fixes: SCYLLADB-1901
Closes scylladb/scylladb#29862
Add per-shard metrics for strong consistency coordinator operations (latency, timeouts, bounces, status unknown) under the `"strong_consistency_coordinator"` category. These are analogous to the eventual consistency metrics in `storage_proxy_stats`, enabling direct performance comparison between the two consistency modes.
The metrics are simplified compared to `storage_proxy_stats` — no breakdown by table, tablet, scheduling group, or DC, only per-shard.
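For reference, a rough way to eyeball the new counters on a running node (the exact metric names under the category are an assumption based on the description above; 9180 is the default Prometheus port):

```python
import requests

def dump_strong_consistency_metrics(node_addr: str = "127.0.0.1") -> None:
    # Scrape the node's Prometheus endpoint and print the per-shard counters
    # exposed under the strong_consistency_coordinator category.
    body = requests.get(f"http://{node_addr}:9180/metrics", timeout=10).text
    for line in body.splitlines():
        if "strong_consistency_coordinator" in line and not line.startswith("#"):
            print(line)
```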
Fixes SCYLLADB-1343
Strong consistency is still in experimental phase, no need to backport.
Closes scylladb/scylladb#29318
* github.com:scylladb/scylladb:
test/strong_consistency: verify metrics
strong_consistency: wire up metrics to operations
strong_consistency: add stats struct and metrics registration
The mechanics of the restore are as follows (a hedged sketch of calling the API follows the list):
- A /storage_service/tablets/restore API is called with (keyspace, table, endpoint, bucket, manifests) parameters
- First, it populates the system_distributed.snapshot_sstables table with the data read from the manifests
- Then it emplaces a bunch of tablet transitions (of a new "restore" kind), one for each tablet
- The topology coordinator handles the "restore" transition by calling a new RESTORE_TABLET RPC against all the current tablet replicas
- Each replica handles the RPC verb by
- Reading the snapshot_sstables table
- Filtering the read sstable infos against current node and tablet being handled
- Downloading and attaching the filtered sstables
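A hedged sketch of kicking the endpoint directly; the exact parameter encoding and response shape are assumptions, and the real tests go through their own rest_client helper:

```python
import requests

def start_tablet_restore(node_addr: str, keyspace: str, table: str,
                         endpoint: str, bucket: str, manifests: list[str]) -> str:
    # POST to the new API on the node's REST port (10000 by default).
    resp = requests.post(
        f"http://{node_addr}:10000/storage_service/tablets/restore",
        params={
            "keyspace": keyspace,
            "table": table,
            "endpoint": endpoint,              # object-storage endpoint to download from
            "bucket": bucket,
            "manifests": ",".join(manifests),  # encoding of the manifest list is assumed
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()  # assumed: a task id that can be waited on via the task manager API
```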
This PR includes the system_distributed.snapshot_sstables table from @robertbindar and preparation work from @kreuzerkrieg that extracts raw sstable downloading and attaching from the existing generic sstables loading code.
This is the first step towards SCYLLADB-197 and lacks many things. In particular:
- the API only works for single-DC cluster
- the caller needs to "lock" tablet boundaries with min/max tablet count
- not abortable
- no progress tracking
- sub-optimal (re-kicking the API on restore will re-download everything)
- not re-attachable (if the API node dies, restoration proceeds, but the caller cannot "wait" for it to complete via another node)
- nodes download sstables in the maintenance/streaming scheduling group (should be moved to maintenance/backup)
Other follow-up items:
- have an actual swagger object specification for `backup_location`
Closes #28436
Closes #28657
Closes #28773
Closes scylladb/scylladb#28763
* github.com:scylladb/scylladb:
docs: Update topology_over_raft.md with `restore` transition kind
test: Add test for backup vs migration race
test: Restore resilience test
sstables_loader: Fail tablet-restore task if not all sstables were downloaded
sstables_loader: mark sstables as downloaded after attaching
sstables_loader: return shared_sstable from attach_sstable
db: add update_sstable_download_status method
db: add downloaded column to snapshot_sstables
db: extract snapshot_sstables TTL into class constant
test: Add a test for tablet-aware restore
tablets: Implement tablet-aware cluster-wide restore
messaging: Add RESTORE_TABLET RPC verb
sstables_loader: Add method to download and attach sstables for a tablet
tablets: Add restore_config to tablet_transition_info
sstables_loader: Add restore_tablets task skeleton
test: Add rest_client helper to kick newly introduced API endpoint
api: Add /storage_service/tablets/restore endpoint skeleton
sstables_loader: Add keyspace and table arguments to manifest loading helper
sstables_loader_helpers: just reformat the code
sstables_loader_helpers: generalize argument and variable names
sstables_loader_helpers: generalize get_sstables_for_tablet
sstables_loader_helpers: add token getters for tablet filtering
sstables_loader_helpers: remove underscores from struct members
sstables_loader: move download_sstable and get_sstables_for_tablet
sstables_loader: extract single-tablet SST filtering
sstables_loader: make download_sstable static
sstables_loader: fix formatting of the new `download_sstable` function
sstables_loader: extract single SST download into a function
sstables_loader: add shard_id to minimal_sst_info
sstables_loader: add function for parsing backup manifests
split utility functions for creating test data from database_test
export make_storage_options_config from lib/test_services
rjson: Add helpers for conversions to dht::token and sstable_id
Add system_distributed_keyspace.snapshot_sstables
add get_system_distributed_keyspace to cql_test_env
code: Add system_distributed_keyspace dependency to sstables_loader
storage_service: Export handle_raft_rpc() helper
storage_service: Export do_tablet_operation()
storage_service: Split transit_tablet() into two
tablets: Add braces around tablet_transition_kind::repair switch
The test samples sl:default runtime before and after setup writes to
prove that it measures the scheduling group used by regular CQL writes.
The metric is exported in milliseconds, so a single 200-row batch may
not be visible immediately, or may be too small in some environments.
Keep the original 200-row table size, but wait up to 30 seconds for the
metric to advance. If it does not, retry the same writes before TTL is
enabled. The retries update the same keys, so the expiration part of the
test still waits for exactly the original number of rows.
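An outline of that wait-then-retry logic (the helper names are placeholders, not the test's real functions):

```python
import time

def wait_for_sl_runtime_to_advance(read_runtime_ms, baseline_ms, rewrite_rows,
                                   timeout_s=30):
    """Wait until the sl:default runtime counter moves past the pre-write baseline;
    if it stays flat, redo the same writes (same keys, so the row count used by the
    expiration part of the test is unchanged)."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        if read_runtime_ms() > baseline_ms:
            return
        time.sleep(1)
    # Metric still flat after the timeout: repeat the writes before TTL is enabled.
    rewrite_rows()
```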
In 100 local runs with N=200 rows, the observed delta of
`ms_statement_before - ms_statement_before_write` was: min=4.0,
max=16.0, mean=8.13, median=8.0. So it looks possible that in a rare
corner case the delta drops all the way to 0.
Fixes SCYLLADB-1869
Closes scylladb/scylladb#29797
`system.view_building_tasks` is a single-partition Raft group0 table (pk = `"view_building"`, CK = timeuuid). When `clean_finished_tasks()` deletes hundreds of finished tasks, the physical rows remain in SSTables until compaction. Any subsequent read of the partition counts every column of every tombstoned row
as a dead cell, triggering `tombstone_warn_threshold` warnings in large clusters.
Two-part fix:
**1. Range tombstones instead of row tombstones (commits 2–3)**
Instead of one row tombstone per finished task, find the minimum alive task UUID (`min_alive_uuid`) and emit a single range tombstone `[before_all, min_alive_uuid)` covering all tasks below that boundary. This reduces the tombstone count significantly and also benefits future compaction.
**2. Bounded scan with `min_task_id` (commits 4–6)**
Even with range tombstones, physical rows remain until compaction and still count as dead cells during reads. The only way to avoid them is to not read them at all.
- Add a `min_task_id timeuuid` static column to `system.view_building_tasks`.
- On every GC, write `min_task_id = min_alive_uuid` atomically with the range tombstone (same Raft batch).
- On reload, read `min_task_id` first using a **static-only partition slice** (empty `_row_ranges` + `always_return_static_content`): the SSTable reader stops immediately after the static row before processing any clustering tombstones — zero dead cells counted.
- Use `AND id >= min_task_id` as a lower bound for the main task scan, skipping all tombstoned rows.
The static-only read and the bounded scan are gated on the `VIEW_BUILDING_TASKS_MIN_TASK_ID` cluster feature so mixed-version clusters fall back to the full scan.
The issue is not critical, so the fix shouldn't be backported.
Fixes SCYLLADB-657
Closes scylladb/scylladb#28929
* github.com:scylladb/scylladb:
test/cluster/test_view_building_coordinator: add reproducer for tombstone threshold warning
docs: document tombstone avoidance in view_building_tasks
view_building: add `task_uuid_generator` to `view_building_task_mutation_builder`
view_building: introduce `task_uuid_generator`
view_building: store `min_alive_uuid` in view building state
view_building: set min_task_id when GC-ing finished tasks
view_building: add min_task_id support to view_building_task_mutation_builder
view_building: add min_task_id static column and bounded scan to system_keyspace
view_building: use range tombstone when GC-ing finished tasks
view_building: add range tombstone support to view_building_task_mutation_builder
view_building: introduce VIEW_BUILDING_TASKS_MIN_TASK_ID cluster feature
This series adds per-test bucket isolation to all S3 and GCS object storage tests. Previously, every test shared a single pre-created bucket, which meant tests could interfere with each other through leftover objects and could not run concurrently across multiple `test.py` processes without risking collisions.
New `create_bucket`, `delete_bucket`, and `delete_bucket_with_objects` methods on `s3::client`, following the existing `make_request` pattern. `create_bucket` handles the `BUCKET_ALREADY_OWNED_BY_YOU` error gracefully.
A new `s3_test_fixture` RAII class for C++ Boost tests that creates a uniquely-named bucket on construction (derived from the Boost test name and pid) and tears down everything — objects, bucket, client — on destruction. All S3 tests in `s3_test.cc` are migrated to use it, removing manual `deferred_delete_object` and `deferred_close` boilerplate. The minio server policy is broadened to allow dynamic bucket creation/deletion.
A `client::make` overload that accepts a custom `retry_strategy`, used in tests with a fast 1ms retry delay instead of exponential backoff, significantly reducing test runtime for transient errors during bucket lifecycle operations.
Python-side (`test/cluster/object_store`): each pytest fixture (`object_storage`, `s3_storage`, `s3_server`) now creates a unique bucket per test function via `create_test_bucket()` and destroys it on teardown. Bucket names are sanitized from the pytest node name with a short UUID suffix for uniqueness.
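A rough shape of such a per-test fixture; boto3 stands in for the suite's own S3 helpers, and the sanitization rule and endpoint are assumptions:

```python
import uuid
import boto3
import pytest

@pytest.fixture
def s3_storage(request):
    # Bucket name: sanitized pytest node name plus a short UUID suffix for uniqueness.
    base = "".join(c if c.isalnum() else "-" for c in request.node.name.lower()).strip("-")
    bucket = f"{base[:40]}-{uuid.uuid4().hex[:8]}"
    s3 = boto3.client("s3", endpoint_url="http://127.0.0.1:9000")  # MinIO endpoint assumed
    s3.create_bucket(Bucket=bucket)
    yield bucket
    # Teardown: delete leftover objects, then the bucket itself, so tests cannot interfere.
    listing = s3.list_objects_v2(Bucket=bucket)
    for obj in listing.get("Contents", []):
        s3.delete_object(Bucket=bucket, Key=obj["Key"])
    s3.delete_bucket(Bucket=bucket)
```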
Object storage helpers (`S3Server`, `MinioWrapper`, `GSFront`, `GSServerImpl`, factory functions, CQL helpers, `s3_server` fixture) are extracted from `test/cluster/object_store/conftest.py` into a shared `test/pylib/object_storage.py` module, eliminating duplication across test suites. The conftest becomes a thin re-export wrapper. Old class names are preserved as aliases for backward compatibility.
| Test Name | new test specific retry strategy execution time (ms) | original execution time (ms) | Δ (ms) | Speedup |
|--------------------------------------------------------------|----------------:|-------------:|---------:|--------:|
| test_client_upload_file_multi_part_with_remainder_proxy | 19,261 | 61,395 | −42,134 | **3.2×** |
| test_client_upload_file_multi_part_without_remainder_proxy | 16,901 | 53,688 | −36,787 | **3.2×** |
| test_client_upload_file_single_part_proxy | 3,478 | 6,789 | −3,311 | **2.0×** |
| test_client_multipart_copy_upload_proxy | 1,303 | 1,619 | −316 | 1.2× |
| test_client_put_get_object_proxy | 150 | 365 | −215 | **2.4×** |
| test_client_readable_file_stream_proxy | 125 | 327 | −202 | **2.6×** |
| test_small_object_copy_proxy | 205 | 389 | −184 | 1.9× |
| test_client_put_get_tagging_proxy | 181 | 350 | −169 | 1.9× |
| test_client_multipart_upload_proxy | 1,252 | 1,416 | −164 | 1.1× |
| test_client_list_objects_proxy | 729 | 881 | −152 | 1.2× |
| test_chunked_download_data_source_with_delays_proxy | 830 | 960 | −130 | 1.2× |
| test_client_readable_file_proxy | 148 | 279 | −131 | 1.9× |
| test_client_upload_file_multi_part_with_remainder_minio | 3,358 | 3,170 | +188 | 0.9× |
| test_client_upload_file_multi_part_without_remainder_minio | 3,131 | 2,929 | +202 | 0.9× |
| test_client_upload_file_single_part_minio | 519 | 421 | +98 | 0.8× |
| test_download_data_source_proxy | 180 | 237 | −57 | 1.3× |
| test_client_list_objects_incomplete_proxy | 590 | 641 | −51 | 1.1× |
| test_large_object_copy_proxy | 952 | 991 | −39 | 1.0× |
| test_client_multipart_upload_fallback_proxy | 148 | 185 | −37 | 1.3× |
| test_client_multipart_copy_upload_minio | 641 | 674 | −33 | 1.1× |
No backport needed — this is a test infrastructure improvement with no production code impact beyond the new `s3::client` methods.
Closes scylladb/scylladb#29508
* github.com:scylladb/scylladb:
test: extract object storage helpers to test/pylib/object_storage.py
test: add per-test bucket isolation to object_store fixtures
s3: add client::make overload with custom retry strategy
test: add s3_test_fixture and migrate tests to per-bucket isolation
s3: add create_bucket and delete_bucket to client
The test starts a regular backup+restore on a smaller cluster, but
prior to that it spawns a tablet migration from one node to another and
locks it in the middle with the help of the block_tablet_streaming
injection (even though the tablets have no data and there's nothing to
stream, the injection point is located early enough to work).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The test checks that losing one of the nodes from the cluster while
restore is in progress is handled. In particular:
- losing the API node makes the task-waiting API throw (apparently)
- losing a coordinator or replica node makes the API call fail, because
some tablets should fail to get restored. If the coordinator is lost,
it triggers coordinator re-election and the new coordinator still
notices that a tablet that was replicated to the "old" coordinator
failed to get restored and fails the restore anyway
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When storage_service::restore_tablets() resolves, it only means that
tablet transitions are done, including restore transitions, but not
necessarily that they succeeded. So before resolving the restoration
task with success we need to check whether all sstables were downloaded
and, if not, resolve the task with an exception.
Test included. It uses fault injection to abort downloading of a single
sstable early, then checks that the error is properly propagated back
to the task-waiting API.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The test is derived from the test_restore_with_streaming_scopes() test,
with the exception that it doesn't check for streaming directions,
doesn't check mutations right after creation, and doesn't loop over
scoped sub-tests, because there's no scope concept here.
Also it verifies just two topologies, which seems to be enough. The
scopes test has many topologies because of the nature of the scoped
restore; with cluster-wide restore such flexibility is not required.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Fixes: SCYLLADB-1693
In case we abort a decommission operation, the snapshot/backup
mechanism needs to remain open.
This change moves it to after raft_decommission.
In the case of a cluster snapshot, our node's ownership (or not) of
tables will be serialized by raft anyway, so it should remain
consistent. In that case we at worst coordinate from a node in
"leave" status.
In the case of a local snapshot, ownership matters less; only sstables
on disk matter, and those should not change.
In the case of a backup, the operation works on a snapshot, whose
state is not affected.
Adds an injection point for testing.
v2:
- Added injection point to ensure test can abort decommission
Closes scylladb/scylladb#29667
When a partition key or clustering key value exceeds the 64 KiB limit
(65535 bytes serialized), Scylla used to raise a generic
std::runtime_error "Key size too large: N > M" from the low-level
compound-key serializer. That error surfaced to clients as a CQL
server error (code 0x0000, "NoHostAvailable"-looking), which is both
ugly and incompatible with Cassandra - Cassandra returns a clean
InvalidRequest with the message "Key length of N is longer than
maximum of M".
Fix this at the single chokepoint: compound_type::serialize_value in
keys/compound.hh. The serializer is on every path that materializes a
key - INSERT/UPDATE/DELETE/BATCH build mutations through it, and
SELECT builds partition and clustering ranges through it - so a single
throw replacement produces a clean InvalidRequest consistently across
all paths and all key shapes (single, compound PK, composite CK).
The previous approach on this PR branch patched three call sites in
cql3/restrictions/statement_restrictions.cc, which only covered
SELECT, duplicated the check, and placed it mid-restrictions code
(flagged in review). Dropping those changes in favour of the
root-cause fix here.
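A minimal cqlpy-style check of the new behaviour (the `cql`/`table1` fixtures and the column names are assumptions):

```python
import pytest
from cassandra import InvalidRequest

def test_insert_oversized_pk_is_invalid_request(cql, table1):
    # A partition key longer than 65535 serialized bytes must now fail with a clean
    # InvalidRequest carrying the Cassandra-compatible message, not a server error.
    big = "x" * 65536
    with pytest.raises(InvalidRequest,
                       match=r"Key length of \d+ is longer than maximum of 65535"):
        cql.execute(f"INSERT INTO {table1} (p, c, v) VALUES ('{big}', 1, 1)")
```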
Un-xfail the tests this fixes:
- test/cqlpy/test_key_length.py: test_insert_65k_pk, test_insert_65k_ck,
test_where_65k_pk, test_where_65k_ck, test_insert_65k_ck_composite,
test_insert_total_compound_pk_err, test_insert_total_composite_ck_err.
- test/cqlpy/cassandra_tests/.../insert_test.py: testPKInsertWithValueOver64K,
testCKInsertWithValueOver64K.
- test/cqlpy/cassandra_tests/.../select_test.py: testPKQueryWithValueOver64K.
test_insert_65k_pk_compound stays xfail: its oversized value gets
rejected by the Python driver's CQL wire-protocol encoder (see
CASSANDRA-19270) before reaching the server, so the fix can't apply.
Updated its reason. testCKQueryWithValueOver64K stays xfail with an
updated reason: Cassandra silently returns empty for an oversized
clustering key in WHERE, while Scylla now throws InvalidRequest - a
deliberate choice mirroring the partition-key case, documented in
the discussion on #10366.
Add three tight-boundary tests (addressing review feedback on the
previous revision) that pin MAX+1 behaviour for SELECT and INSERT of
both partition and clustering keys.
Update test/cluster/dtest/limits_test.py to match the new message
("Key length of \\d+ is longer than maximum of 65535").
Fixes #10366
Fixes #12247
Co-authored-by: Alexander Turetskiy <someone.tur@gmail.com>
Closes scylladb/scylladb#23433
When a user calls the repair API with identical startToken and endToken
values, the code creates a wrapping interval (T, T]. This causes
unwrap() to split it into (-inf, T] and (T, +inf), covering the entire
token ring and triggering a full repair.
Reject such requests early with an error message matching
Cassandra's behavior: "Start and end tokens must be different."
Fixes: https://scylladb.atlassian.net/browse/CUSTOMER-358
Closes scylladb/scylladb#29821
Add a node_owner column (locator::host_id) to system.sstables and make it part of the partition key, so the primary key becomes PRIMARY KEY ((table_id, node_owner), generation).
This is the first step toward moving the sstables registry into system_distributed: once distributed, each node's startup scan must read only the rows it owns, which requires the owning node to be part of the partition key. Partitioning by (table_id, node_owner) turns that scan into a single-partition read of exactly the local node's rows.
Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1562
No need to backport this, keyspace over object storage is experimental feature
Closes scylladb/scylladb#29659
* github.com:scylladb/scylladb:
db, sstables: add node_owner to sstables registry primary key
db, sstables: rename sstables registry column owner to table_id
In the test_delete_partition_rows_from_table_with_mv case we perform
a deletion of a large partition to verify that the deletion will
self-throttle when generating many view updates.
Before the deletion, we first build the materialized view, which causes
the view update backlog to grow. The backlog should be back to empty
when the view building finishes, and we do wait for that to happen, but
the information about the backlog drop may not be propagated to the
delete coordinator in time - the gossip interval is 1s and we perform
no other writes between the nodes in the meantime, so we don't make use
of the "piggyback" mechanism of propagating view backlog either. If the
coordinator thinks that the backlog is high on the replica, it may reject
the delete, failing this test.
We change this in this patch - after the view is built, we perform an
extra write from the coordinator. When the write finishes, the coordinator
will have the up-to-date view backlog and can proceed with the DELETE.
Additionally, we enable the "update_backlog_immediately" injection, which
makes the node backlog (the highest backlog across shards) update immediately
after each change.
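A sketch of the two mitigations in test terms; the injection-enabling call follows the usual cluster-test REST helper, and the statement and fixture names are assumptions:

```python
async def refresh_view_backlog_on_coordinator(manager, cql, servers, coord_host, ks):
    # Make the node-level backlog (highest backlog across shards) update immediately
    # after each change on every replica.
    for srv in servers:
        await manager.api.enable_injection(srv.ip_addr, "update_backlog_immediately",
                                           one_shot=False)
    # One extra small write through the coordinator: its response piggybacks the
    # (now empty) view-update backlog, so the following DELETE is not falsely rejected.
    await cql.run_async(f"INSERT INTO {ks}.tab (key, c, v) VALUES (0, 0, 0)",
                        host=coord_host)
```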
Fixes: SCYLLADB-1795
Closes scylladb/scylladb#29775
In debug mode, this test can time out during tablet merge. While the
test already decreases the number of tables in debug mode (20 tables,
instead of 200 for dev mode), this is not enough, and the test can still
time out during merge. This change reduces the number of tables from 20
to 5 in debug mode.
It also drops the log level for lead_balancer to debug. This should make
any potential future problems with this test easier to investigate.
Fixes: SCYLLADB-1717
Closes scylladb/scylladb#29682
Replace the naive host.is_up check with wait_for_cql_and_get_hosts() which
actually executes a query against each host, ensuring the driver's connection
pool is fully re-established before proceeding to stop the last server.
The is_up flag is set asynchronously via gossip and doesn't guarantee the
connection pool has live TCP connections. After a server restart, the flag
may be True while the pool still holds stale connections. When the pool
monitor later discovers them dead it briefly marks the host DOWN, causing
NoHostAvailable if another server is being stopped concurrently.
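Roughly, the replacement looks like this (the import path is an assumption):

```python
import time
from test.pylib.util import wait_for_cql_and_get_hosts  # import path is an assumption

async def ensure_pool_ready(manager, cql):
    # Run a real query against every host instead of trusting host.is_up, so the
    # driver's connection pool is known to be re-established before we stop the
    # last remaining server.
    servers = await manager.running_servers()
    await wait_for_cql_and_get_hosts(cql, servers, time.time() + 60)
```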
Fixes SCYLLADB-1840
Closes scylladb/scylladb#29769
This PR includes two changes that make the gossiper much less likely to
unexpectedly mark nodes as down in tests, which is what causes test
flakiness in issues like SCYLLADB-864:
- fixing false node conviction when echo succeeds,
- increasing the failure_detector_timeout fixture.
Fixes: SCYLLADB-864
No need for backport: related CI failures are rare, and merging #29522
made them even more unlikely (I haven't seen one since then, but it's
still possible to reproduce locally on dev machines).
Closes scylladb/scylladb#29755
* github.com:scylladb/scylladb:
test/cluster: increase failure_detector_timeout
gossiper: fix false node conviction when echo succeeds
Fixes: SCYLLADB-1815
If we're in a brand new chunk (no buffer yet allocated), we would miscalculate the actual size of an entry to write, possibly causing segment size overshoot. Break out some logic to share between this calc and new_buffer. Also remove redundant (and possibly wrong) constant in oversized allocation.
As for the test:
Checking segment sizes should not use a size filter that rounds (up) sizes.
More importantly, the estimate for what is an acceptable limit for commitlog disk usage should be aligned. Simplified the calculation, and also made logging more useful in case of failure.
Closes scylladb/scylladb#29753
* github.com:scylladb/scylladb:
commitlog_test.py: Fix size check aliasing, and threshold calc.
commitlog: Fix segment/chunk overhead maybe not included in next_position calculation
The auth cache crashes when it encounters rows in role_permissions that have a live row marker but no permissions column. These “ghost rows” were created by the now-removed auth v2 migration, which used INSERT (creating row markers) instead of UPDATE.
When permissions were later revoked, the row marker remained while the permissions column became null. An empty collection appears as null, since its lifetime is based only on its elements' cells.
As a result, when the cache reloads and expects the permissions column to exist, it hits a missing_column exception.
The series removes dead code that was the primary crash site, adds has() guards to the remaining access paths, and includes a test reproducer.
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1816
Backport: all supported versions 2026.1, 2025.4, 2025.1
Closes scylladb/scylladb#29757
* github.com:scylladb/scylladb:
test: add reproducer for auth cache crash on missing permissions column
auth: tolerate missing permissions column in authorize()
auth: add defensive has() guard for role_attributes value column
auth: remove unused permissions field from cache role_record
Scaling the timeout by build mode (#29522) turned out to be insufficient.
Nodes can still be unexpectedly marked as down, even with a 4s timeout in
dev mode. I managed to reproduce SCYLLADB-864 under such conditions.
Increasing failure_detector_timeout will proportionally slow down tests
that use it. That's bad, but currently these tests' flakiness is a much
bigger problem than the tests' slowness. Also, not many tests use this
fixture, and we hope to make it unneeded eventually (see #28495).
Add more logging to the barrier and drain RPCs to try and pinpoint https://github.com/scylladb/scylladb/issues/26281
Backport since we want to have it if this happens in the field.
Fixes: SCYLLADB-1821
Refs: #26281
Closes scylladb/scylladb#29735
* https://github.com/scylladb/scylladb:
session, raft_topology: add periodic warnings for hung drain and stale version waits
session: add info-level logging to drain_closing_sessions
raft_topology: log sub-step progress in local_topology_barrier
raft_topology: log read_barrier progress in topology cmd handler
Fixes: SCYLLADB-1815
Checking segment sizes should not use a size filter that rounds
(up) sizes.
More importantly, the estimate for what is an acceptable limit for
commitlog disk usage should be aligned. Simplified the calculation,
and also made logging more useful in case of failure.
server_add() defaults to CQL_ALTERNATOR_QUERIED. That proves the regular CQL driver path is queryable, as well as any regular Alternator ports listed in the YAML config. It does not prove that every custom listener the test will connect to is already accepting raw TCP connections.
test_proxy_protocol_ssl_shard_aware connects directly to the shard-aware TLS proxy-protocol CQL port immediately after server startup. Wait for ServerUpState.SERVING in the fixture so the custom proxy-protocol listener is registered before opening raw sockets.
test_uninitialized_conns_semaphore opens a raw TCP connection to native_shard_aware_transport_port immediately after startup. The default readiness check can succeed through native_transport_port while the shard-aware listener is still being started, because CQL listeners are registered independently. Wait for ServerUpState.SERVING before opening raw sockets.
test_perf_alternator_remote now asks server_add() to wait for SERVING and uses the returned server address directly. This removes the redundant running_servers() plus get_ready_cql() sequence noted in review.
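In sketch form; ServerUpState.SERVING is the state named above, while the keyword argument name and import path are assumptions:

```python
from test.pylib.manager_client import ServerUpState  # import path is an assumption

async def add_server_with_custom_listeners(manager, config):
    # Ask server_add() to wait until the node reports SERVING, i.e. all protocol
    # servers (proxy-protocol and shard-aware listeners included) are registered,
    # before the test opens raw TCP sockets to them.
    return await manager.server_add(
        config=config,
        expected_server_up_state=ServerUpState.SERVING,  # keyword name is an assumption
    )
```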
Fixes: SCYLLADB-1797
No backport as of now, only appeared on master.
Closes scylladb/scylladb#29737
* github.com:scylladb/scylladb:
test/cluster: avoid redundant perf alternator CQL wait
test/cluster: wait for shard-aware CQL listener
test/cluster: wait for proxy protocol ports to serve
When a node processes a barrier_and_drain topology command, it performs
two potentially long-running operations inside local_topology_barrier():
waiting for stale token metadata versions to be released
(stale_versions_in_use) and draining closing sessions
(drain_closing_sessions). Either of these can hang indefinitely -- for
example, stale_versions_in_use blocks until all references to previous
token metadata versions are released, which depends on in-flight
requests completing.
Previously, the only logging was a single 'done' message at the end,
making it impossible to determine which sub-step was blocking when a
barrier_and_drain RPC appeared stuck on a node. In a recent CI failure,
a node never responded to barrier_and_drain during a removenode
operation, and the logs showed the RPC was received but nothing about
what it was waiting on internally.
Add info-level logging before each blocking sub-step, including the
topology version for correlation. This allows diagnosing hangs by
showing whether the node is stuck waiting for stale metadata versions,
stuck draining sessions, or never reached these steps at all.
server_add() already waits for the requested server-up state. For the remote perf-alternator test, request SERVING from server_add() and use the returned server address directly instead of asking for running servers and then calling get_ready_cql() again.
This keeps the listener-readiness intent explicit while removing the redundant CQL readiness probe noted in review.
server_add() defaults to CQL_ALTERNATOR_QUERIED. That proves the regular CQL driver path is queryable, as well as any regular Alternator ports listed in the YAML config. It does not prove that every CQL listener configured for the process is already accepting raw TCP connections.
test_uninitialized_conns_semaphore opens a raw TCP connection to native_shard_aware_transport_port immediately after startup. The default readiness check can succeed through native_transport_port while the shard-aware listener is still being started, because CQL listeners are registered independently.
Wait for ServerUpState.SERVING before opening raw sockets. Scylla sends that notification only after protocol servers are registered, so this closes the startup window without adding sleeps or local retry loops.
Fixes: SCYLLADB-1797
test_create_role_mixed_cluster calls servers_add(2) to bootstrap two old
nodes concurrently, then adds a new node before issuing CREATE ROLE. The
concurrent bootstraps trigger the well-known Python driver bug
(scylladb/python-driver#317): two on_add notifications race in
update_created_pools, causing a second pool to be created for a host whose
pool was already established. If CREATE ROLE is in-flight on the old pool
when it is closed, the driver retries on the new pool, executing the
statement twice. The second execution fails with "Role ... already exists",
making the test flaky.
Fix by using CREATE ROLE IF NOT EXISTS. This is safe because unique_name()
generates a timestamp+random suffix that is guaranteed to be unique; the
role can "already exist" only due to the driver double-execution bug, never
due to a real conflict.
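In other words (unique_name() is the existing test helper mentioned above; the import path and role options are assumptions):

```python
from test.pylib.util import unique_name  # import path is an assumption

async def create_test_role(cql) -> str:
    role = unique_name()  # timestamp + random suffix, unique by construction
    # IF NOT EXISTS makes the statement idempotent, so a driver-level double execution
    # (scylladb/python-driver#317) can no longer fail with "Role ... already exists".
    await cql.run_async(
        f"CREATE ROLE IF NOT EXISTS {role} WITH PASSWORD = 'pass' AND LOGIN = true")
    return role
```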
This is the same workaround that has been applied many times elsewhere in
our test suite for exactly the same root cause:
- CREATE KEYSPACE was changed to CREATE KEYSPACE IF NOT EXISTS (scylladb#18368,
later generalised in scylladb#22399 via new_test_keyspace helpers)
- DROP KEYSPACE was changed to DROP KEYSPACE IF EXISTS (scylladb#29487)
Fixes: SCYLLADB-1742
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#29732
If start_server_for_group0() successfully registers a server in
_raft_gr._servers but a subsequent step (e.g. enable_in_memory_state_machine())
throws, the server is never destroyed because abort_and_drain()/destroy()
check std::get_if<raft::group_id>(&_group0) which was only set after the
entire with_scheduling_group block completed.
Move _group0.emplace<raft::group_id>() inside the lambda, immediately after
start_server_for_group() succeeds, so that cleanup paths can always find
and destroy the registered server.
This fixes the assertion:
"raft_group_registry - stop(): server for group ... is not destroyed"
which manifests during shutdown after an upgrade where topology_state_load()
fails due to netw::unknown_address.
Backport: Yes, to 2026.1, 2026.2, as it causes a crash on upgrades
Refs: SCYLLADB-1217
Refs: CUSTOMER-340
Refs: CUSTOMER-335
Fixes: SCYLLADB-1801
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
AI-assisted: Yes, Opencode/Opus 4.6
Closes scylladb/scylladb#29702
server_add()'s default readiness only waits until CQL can be queried, but these tests immediately connect to custom proxy protocol listeners. Wait for SERVING so the shard-aware TLS proxy port is accepting connections before the test starts, matching the Alternator proxy protocol readiness fix.
Fix the persistent flakiness in `test_incremental_repair_race_window_promotes_unrepaired_data` (SCYLLADB-1478, reopened).
After restarting servers[1], the topology coordinator can initiate a **residual re-repair** when it sees tablets stuck in the `repair` stage. This re-repair flushes memtables on all replicas and marks post-repair data as repaired, contaminating the test state and masking the compaction-merge bug the test is designed to detect. The assertion then fails on the *next* retry because the previous attempt's re-repair left behind repaired sstables containing post-repair keys.
Two earlier approaches were tried and did not fully solve it:
1. **Propagating `current_key` through the exception** — correctly advanced the key counter on retry, but the contaminated tablet metadata from the prior re-repair (repaired sstables with post-repair keys) was still present, causing assertion failures on the next attempt.
2. **DROP TABLE + CREATE TABLE between retries** — the tablet metadata (sstables_repaired_at, repair stage) is tied to the tablet identity, and recreating the table in the same keyspace still showed residual state issues.
Instead of trying to clean up contaminated state, each retry creates a **completely fresh keyspace** (unique name via `create_new_test_keyspace`). This gives entirely new tablets with no residual repair metadata from prior attempts. Combined with broader detection of coordinator changes and residual re-repairs, the test reliably retries before any contamination can cause false failures.
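An outline of the retry structure; the helper and exception names come from the patches below, while the call signature and keyspace options are placeholders:

```python
import pytest

async def run_race_window_test_with_retries(manager, cql, attempts=5):
    for _ in range(attempts):
        # Entirely fresh keyspace => entirely fresh tablets, with no repair metadata
        # left over from a previous, contaminated attempt (signature is an assumption).
        ks = await create_new_test_keyspace(
            cql, "WITH replication = {'class': 'NetworkTopologyStrategy', "
                 "'replication_factor': 3} AND tablets = {'initial': 1}")
        try:
            await _do_race_window_promotes_unrepaired_data(manager, cql, ks)
            return
        except _LeadershipTransferred:
            # Coordinator change or residual re-repair detected: drop the keyspace
            # and start over before contamination can cause a false failure.
            await cql.run_async(f"DROP KEYSPACE {ks}")
    pytest.fail("coordinator kept moving / re-repair kept firing on every attempt")
```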
The detection is now comprehensive:
- **Broadened coordinator check**: any coordinator change (`new_coord != coord`), not just migration to servers[1]
- **Re-repair detection** at three points: post-restart, during the compaction poll, and after injection release — grep for `"Initiating tablet repair host="` in the coordinator log
The series consists of two commits:
1. **`test: extract _setup_table_for_race_window helper`** — pure code-movement refactor that extracts keyspace+table+data+repair1+data+flush into a reusable helper. Easily verifiable as a no-op behavioral change.
2. **`test: fix race window test flakiness from residual re-repair`** — the actual fix: broadened detection logic + re-repair grep at 3 points + fresh-keyspace retry on exception.
Passed 1000 consecutive runs with the fix applied. Without the fix, about 2% flakiness was observed in debug mode.
Fixes: SCYLLADB-1478
So far, we haven't observed flakiness of this test on branches, so not backporting yet. Will backport if seen.
Closes scylladb/scylladb#29721
* github.com:scylladb/scylladb:
test: fix race window test flakiness from residual re-repair
test: extract _setup_table_for_race_window helper for race window test
The test hardcoded the expected number of coordinator elections
(2, 3, 4, 5) for each phase. If a prior phase triggered an extra
election, subsequent phases would wait for a count that was already
reached or would never match.
Fix by reading the current election count before each operation and
expecting exactly one more, making each phase independent of prior
history.
Also add wait_for_no_pending_topology_transition() calls after each
coordinator election to ensure the topology state machine has fully
settled before proceeding with restarts and further operations.
Decrease the failure detector timeout (failure_detector_timeout_in_ms)
to 2000 ms on all test nodes so that coordinator crashes are detected
faster, reducing test wallclock time and timeout-related flakiness.
Enable raft_topology=trace logging on all test nodes to aid
post-failure diagnosis. Add diagnostic logging in
wait_new_coordinator_elected().
Fixes: SCYLLADB-1089
Closes scylladb/scylladb#29284
The test_incremental_repair_race_window_promotes_unrepaired_data test
was still flaky because:
1. Only coordinator changes TO servers[1] were detected, but ANY
coordinator change can trigger a residual re-repair that flushes
memtables on all replicas and marks post-repair data as repaired.
2. Even without a coordinator change, the topology coordinator can
initiate a residual re-repair when it sees tablets stuck in the
repair stage after the servers[1] restart. This re-repair
contaminates the repaired set with post-repair data, masking the
compaction-merge bug the test detects.
Fix by:
- Broadening the coordinator check from == servers[1] to != coord
- Adding re-repair detection (grep for 'Initiating tablet repair
host=') at three points: post-restart, during the compaction poll,
and after injection release
- On retry, creating a completely fresh keyspace+table via
_setup_table_for_race_window() so the new attempt starts with
clean tablet metadata uncontaminated by prior re-repairs
Fixes: SCYLLADB-1478
Move the keyspace+table setup logic for
test_incremental_repair_race_window_promotes_unrepaired_data into a
dedicated helper function _setup_table_for_race_window(). The helper
creates a fresh keyspace (unique name via create_new_test_keyspace),
the table, configures STCS min_threshold=2, inserts baseline keys,
runs repair 1, inserts keys for repair 2, and flushes.
This is a pure refactor with no behavioral change: the test function
now calls the helper once instead of inlining the setup. The
extraction enables a subsequent commit to call the helper again on
retry when a leadership transfer is detected.
This series addresses three problems in the audit startup/shutdown
sequence:
1. [BUG] Shutdown SIGABRT. During graceful shutdown, deferred stops run in reverse order of construction. With the audit service constructed after the maintenance socket, audit was destroyed first, and in-flight queries on the maintenance socket could hit the destroyed audit service (assertion failure in sharded::local()).
2. [BUG] Startup audit bypass. The maintenance socket opened before audit storage was initialized, allowing queries (e.g. creating a superuser) to bypass auditing in that window.
3. [PROBLEM] Blocks SCYLLADB-1430. The existing order prevents audit configuration from being driven by group0 state, because audit started before group0.
The series is organized as: a test-helper refactor, a test for the audited maintenance-socket flow, a startup-phase split, the construction-order fix and its shutdown-race test, and finally the storage-before-socket fix and its startup-window test.
Fixes SCYLLADB-1615
No backport, bugs don't seem severe enough to justify backporting.
Closes scylladb/scylladb#29539
* github.com:scylladb/scylladb:
audit: assert storage ordering invariants at runtime
audit: start maintenance socket after audit storage
audit: move audit construction before maintenance socket
audit: split startup into construction and storage phases
test: audit: verify maintenance socket operations are audited
test: audit: parameterize source address in audit assertions
`ScyllaCluster.nodelist()` creates new `ScyllaNode` objects on every call,
so per-node state set via `set_smp()`, `set_log_level()`, and
`_adjust_smp_and_memory()` was lost. This meant `set_smp()` had no effect
when `cluster.start()` was called after it, since `start_nodes()` calls
`nodelist()` internally which creates fresh nodes with default values.
- Add debug logging for smp/memory in ScyllaNode
- Store per-node settings (smp, memory, log levels) in a
`ScyllaCluster._node_resources` dict keyed by server_id, so they survive
`nodelist()` reconstruction. `ScyllaNode` restores its state from this dict
on construction and saves it back whenever `set_smp()`, `set_log_level()`,
or `_adjust_smp_and_memory()` modifies it.
- Add a reproducer test verifying `set_smp()` takes effect on restart
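A simplified sketch of the caching scheme (attribute names per the description above; the surrounding classes are heavily trimmed):

```python
class ScyllaCluster:
    def __init__(self):
        self._server_ids: list[int] = []
        # Per-node settings keyed by server_id; survives ScyllaNode reconstruction.
        self._node_resources: dict[int, dict] = {}

    def nodelist(self):
        # nodelist() still builds fresh ScyllaNode objects, but they now restore
        # their state from the cluster-held dict instead of starting from defaults.
        return [ScyllaNode(self, sid) for sid in self._server_ids]


class ScyllaNode:
    def __init__(self, cluster: "ScyllaCluster", server_id: int):
        self._cluster = cluster
        self._server_id = server_id
        saved = cluster._node_resources.setdefault(server_id, {})
        self._smp = saved.get("smp")
        self._memory = saved.get("memory")

    def set_smp(self, smp: int) -> None:
        self._smp = smp
        # Save back so the value is still there after the next nodelist() call.
        self._cluster._node_resources[self._server_id]["smp"] = smp
```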
Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1629
--
No backport needed: this only fixes dtest infrastructure, no production code
is affected.
Closes scylladb/scylladb#29549
* github.com:scylladb/scylladb:
test/cluster/dtest: add test for node.set_smp() persistence
test/cluster/dtest: cache ScyllaNode instances in ScyllaCluster
test/cluster/dtest/ccmlib/scylla_node: add debug logging
User creation via the maintenance socket should produce audit
entries, as this is the recommended flow for creating the
initial superuser when default credentials are disabled.
The test is parametrized by audit backend (table and syslog).
The maintenance socket source address is "::" because Seastar
returns a zero-initialised in6_addr for AF_UNIX sockets.
Test time in dev: 0.6s
Refs SCYLLADB-1615
Some tests in test_tablets.py read system_schema.keyspaces from an arbitrary node that may not have applied the latest schema change yet. Pin the read to a specific node and issue a read barrier before querying, ensuring the node has up-to-date data.
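Schematically (the read-barrier helper and its exact spelling are assumptions; run_async(host=...) pins the coordinator as elsewhere in the suite):

```python
import time
from test.pylib.util import wait_for_cql_and_get_hosts  # import path is an assumption

async def read_keyspaces_after_barrier(manager, cql, server, ks):
    # Pin the read to one node and barrier it first, so system_schema.keyspaces on
    # that node is guaranteed to reflect the latest schema change.
    host = (await wait_for_cql_and_get_hosts(cql, [server], time.time() + 60))[0]
    await manager.api.read_barrier(server.ip_addr)  # helper name is an assumption
    return await cql.run_async(
        "SELECT * FROM system_schema.keyspaces WHERE keyspace_name = %s",
        [ks], host=host)
```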
Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1700
Test fix; no backport
Closes scylladb/scylladb#29655
* github.com:scylladb/scylladb:
test: fix flaky rack list conversion tests by using read barrier
test: fix flaky test_enforce_rack_list_option by using read barrier
Commit 2b7aa32 (topology_coordinator: Refresh load stats after
table is created or altered) registered topology_coordinator as a
schema change listener and added on_create_column_family which
fire-and-forgets _tablet_load_stats_refresh.trigger(). The
triggered task runs on the gossip scheduling group via
with_scheduling_group and accesses the topology_coordinator via
'this'.
stop() unregisters the listener but does not wait for any
in-flight refresh task. If a notification fires between
_tablet_load_stats_refresh.join() in run() and unregister_listener
in stop(), the scheduled task can outlive the topology_coordinator
and access freed memory after run_topology_coordinator's coroutine
frame is destroyed.
Wait for the refresh to complete in stop() after unregistering the
listener, ensuring no task can fire after destruction.
Fixes SCYLLADB-1728
Backport to 2026.1 and 2026.2, because the issue was introduced in 2b7aa32
Closes scylladb/scylladb#29653
* https://github.com/scylladb/scylladb:
test: tablet_stats: reproduce shutdown refresh race
topology_coordinator: join tablet load stats refresh in stop()
Add a test that reproduces SCYLLADB-1629: set_smp() had no effect
because nodelist() created new ScyllaNode objects on every call,
losing the _smp_set_during_test value. The test fails without the
fix in the previous patch.
ScyllaCluster.nodelist() was creating new ScyllaNode objects on every
call, so per-node state set via set_smp(), set_log_level(), and
_adjust_smp_and_memory() was lost between calls.
Fix by caching ScyllaNode instances in a list populated by
_add_nodes() using the list returned by servers_add() in populate().
Nodes are assigned monotonically increasing names (node1, node2, ...).
nodelist() simply returns the cached list.
The coordinator can receive a schema-change notification after run()
finishes but before stop() unregisters listeners. The test pins that
window with error injections and verifies stop() waits for the refresh
instead of letting it outlive the coordinator.
Test time in dev: 9.51s
Refs SCYLLADB-1728
There is a small race window where Raft leadership could transfer back
to servers[1] between the ensure_group0_leader_on() check and the
actual restart. If this happens, the new coordinator re-initiates
repair and masks the compaction-merge bug.
Extract the core test logic into _do_race_window_promotes_unrepaired_data()
which directly checks get_topology_coordinator() after restart and raises
_LeadershipTransferred if servers[1] became coordinator. The test
function calls this helper in a retry loop (up to 5 attempts).
Refs: SCYLLADB-1478
The test_incremental_repair_race_window_promotes_unrepaired_data test
was flaky because it hardcodes servers[1] as the restart target but did
not ensure servers[1] was NOT the topology coordinator.
When servers[1] happened to be the Raft group0 leader (topology
coordinator), restarting it killed the leader, forced a new election,
and the new coordinator re-initiated tablet repair. This re-repair
flushes memtables on all replicas via take_storage_snapshot() and marks
the resulting sstables as repaired -- causing post-repair keys to appear
in repaired sstables on servers[0] and servers[2]. The test then hit
the wrong assertion (servers[0]/[2] contaminated).
Fix: before starting the repair, check whether servers[1] is the
topology coordinator. If so, move leadership to another server via
ensure_group0_leader_on() so that restarting servers[1] only kills a
follower -- which does not trigger an election or coordinator change.
Reproducibility was confirmed by forcing leadership to servers[1] via
ensure_group0_leader_on() and observing deterministic failure with all
three servers showing post-repair keys in repaired sstables (confirming
the re-repair scenario), then verifying the fix passes reliably.
Fixes: SCYLLADB-1478
test_numeric_rf_to_rack_list_conversion and
test_numeric_rf_to_rack_list_conversion_abort were reading
system_schema.keyspaces from an arbitrary node that may not have
applied the latest schema change yet. Pin the read to a specific node
and issue a read barrier before querying, ensuring the node has
up-to-date data.
The test was reading system_schema.keyspaces from an arbitrary node
that may not have applied the latest schema change yet. Pin the read
to a specific node and issue a read barrier before querying, ensuring
the node has up-to-date data.
Add a node_owner column (locator::host_id) to system.sstables and
make it part of the partition key, so the primary key becomes
PRIMARY KEY ((table_id, node_owner), generation).
This is the first step toward moving the sstables registry into
system_distributed: once distributed, each node's startup scan
must read only the rows it owns, which requires the owning node
to be part of the partition key. Partitioning by (table_id,
node_owner) turns that scan into a single-partition read of
exactly the local node's rows.
The new column is populated via sstables_manager::get_local_host_id().
No backward compatibility is preserved; the feature is experimental
and gated by keyspace-storage-options.
The partition-key column in system.sstables named 'owner' actually
holds a table_id. Rename the CQL column and the matching C++
parameter and member names so the identifier describes what it
stores. No behavior change.
This prepares the schema for an upcoming node_owner partition-key
column (the local host id), which needs a free name.
The failure_detector_timeout_in_ms override of 2000ms in 6 cluster test files is too aggressive for debug/sanitize builds. During node joins, the coordinator's failure detector times out on RPC pings to the joining node while it is still applying schema snapshots, marks it DOWN, and bans it — causing flaky test failures.
Scale the timeout by MODES_TIMEOUT_FACTOR (3x for debug/sanitize, 2x for dev, 1x for release) via a shared failure_detector_timeout fixture in conftest.py.
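A sketch of the shared fixture; the factors are the ones listed above, while the build-mode lookup and the way tests consume the value are assumptions about the real conftest.py:

```python
import pytest

# Factors per build mode, as described above.
MODES_TIMEOUT_FACTOR = {"debug": 3, "sanitize": 3, "dev": 2, "release": 1}

@pytest.fixture
def failure_detector_timeout(build_mode):  # build_mode fixture is an assumption
    factor = MODES_TIMEOUT_FACTOR.get(build_mode, 1)
    # Tests pass this as extra config when adding servers.
    return {"failure_detector_timeout_in_ms": 2000 * factor}
```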
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1587
Backport: no, elasticsearch analyser shows only a single failure
Closes scylladb/scylladb#29522
* github.com:scylladb/scylladb:
test/cluster: scale failure_detector_timeout_in_ms by build mode
test/cluster: add failure_detector_timeout fixture