3f7ee3ce5d introduced system.batchlog_v2, with a schema designed to speed up batchlog replays and make post-replay cleanups much more effective.
It did not introduce a cluster feature for the new table, because it is a node-local table, so the cluster can switch to the new table gradually, one node at a time.
However, https://github.com/scylladb/scylladb/issues/27886 showed that the switch causes timeouts during upgrades in mixed clusters. Furthermore, switching to the new table unconditionally on upgraded nodes means that on rollback, the batches saved into the v2 table are lost.
This PR re-introduces v1 (`system.batchlog`) support and guards the use of the v2 table with a cluster feature, so mixed clusters keep using v1 and thus remain rollback-compatible.
For simplicity, the re-introduced v1 support doesn't do post-replay cleanups. The cleanup in v1 was never particularly effective anyway, and we ended up disabling it for heavy batchlog users, so I don't think the lack of cleanup support is a problem.
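As a rough illustration of the gating (names and structure here are invented, not ScyllaDB's actual code), the table choice can hinge on a cluster-wide feature bit so that nodes keep writing to v1 until every node supports v2:
```cpp
// Illustrative sketch only: pick the batchlog table based on a cluster-wide
// feature bit, so mixed-version clusters keep writing to the v1 table.
#include <string_view>

struct feature_service_stub {
    bool batchlog_v2_enabled; // set only once all nodes advertise the feature
};

std::string_view batchlog_table_name(const feature_service_stub& features) {
    // While the feature is not yet cluster-wide, stick to the v1 table so a
    // rollback to the previous version does not lose saved batches.
    return features.batchlog_v2_enabled ? "batchlog_v2" : "batchlog";
}
```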
Fixes: https://github.com/scylladb/scylladb/issues/27886
Needs backport to 2026.1, to fix upgrades for clusters using batches
Closes scylladb/scylladb#28736
* github.com:scylladb/scylladb:
test/boost/batchlog_manager_test: add tests for v1 batchlog
test/boost/batchlog_manager_test: make prepare_batches() work with both v1 and v2
test/boost/batchlog_manager_test: fix indentation
test/boost/batchlog_manager_test: extract prepare_batches() method
test/lib/cql_assertions: is_rows(): add dump parameter
tools/scylla-sstable: extract query result printers
tools/scylla-sstable: add std::ostream& arg to query result printers
repair/row_level: repair_flush_hints_batchlog_handler(): add all_replayed to finish log
db/batchlog_manager: re-add v1 support
db/batchlog_manager: return all_replayed from process_batch()
db/batchlog_manager: process_batch(): fix indentation
db/batchlog_manager: make batch() a standalone function
db/batchlog_manager: make structs stats public
db/batchlog_manager: allocate limiter on the stack
db/batchlog_manager: add feature_service dependency
gms/feature_service: add batchlog_v2 feature
The PR removes most of the code that assumes that group0 and raft topology are not enabled. It also makes sure that joining a cluster in non-raft mode, or upgrading a node to this version in a cluster that does not yet use raft topology, will fail.
Refs #15422
No backport needed since this removes functionality.
Closes scylladb/scylladb#28514
* https://github.com/scylladb/scylladb:
group0: fix indentation after previous patch
raft_group0: simplify get_group0_upgrade_state function since no upgrade can happen any more
raft_group0: move service::group0_upgrade_state to use fmt::formatter instead of iostream
raft_group0: remove unused code from raft_group0
node_ops: remove topology over node ops code
topology: fix indentation after the previous patch
topology: drop topology_change_enabled parameter from raft_group0 code
storage_service: remove unused handle_state_* functions
gossiper: drop wait_for_gossip_to_settle and deprecate the corresponding option
storage_service: fix indentation after the last patch
storage_service: remove gossiper bootstrapping code
storage_service: drop get_group_server_if_raft_topolgy_enabled
storage_service: drop is_topology_coordinator_enabled and its uses
storage_service: drop run_with_api_lock_in_gossiper_mode_only
topology: remove code that assumes raft_topology_change_enabled() may return false
test: schema_change_test: make test_schema_digest_does_not_change_with_disabled_features tests run in raft mode
test: schema_change_test: drop schema tests relevant for no raft mode only
topology: remove upgrade to raft topology code
group0: remove upgrade to group0 code
group0: refuse to boot if a cluster is still not in raft topology mode
storage_service: refuse to join a cluster in legacy mode
This series closes a gap in how CQL request and response sizes are reported.
Previously, request_size and response_size were tracked as simple counters,
providing only cumulative totals per shard. This made it difficult to understand
the distribution of message sizes and identify potential issues with very large
or very small requests.
After this series, the CQL transport reports detailed histogram metrics showing
the distribution of request and response sizes. These histograms are tracked
per-instance, per-type (per op), and per-scheduling-group, providing
much better visibility into CQL traffic patterns.
The histograms are collected for QUERY, EXECUTE, and BATCH operations, which are
the primary data path operations where message size distribution is most relevant.
This data can help identify:
- Clients sending unexpectedly large requests
- Operations with oversized result sets
- Scheduling group differences in traffic patterns
To support this, the series extends the approx_exponential_histogram template to
track an accurate sum, and adds a bytes_histogram type alias optimized for byte-range measurements (1KB to 1GB).
The existing per-shard counter metrics are maintained for backward compatibility.
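For a feel of what such a histogram looks like, here is a hedged sketch (the class name and internal layout are invented; the real approx_exponential_histogram template differs) of an exponential-bucket byte histogram that keeps an exact sum alongside the bucket counts, with boundaries matching the 1 KiB to 1 GiB range in the example output below:
```cpp
// Sketch of an exponential-bucket histogram with an exact running sum.
#include <array>
#include <cstddef>
#include <cstdint>

class bytes_histogram_sketch {
    static constexpr uint64_t first_bucket = 1024; // 1 KiB
    static constexpr size_t num_buckets = 21;      // 1 KiB * 2^20 = 1 GiB
    std::array<uint64_t, num_buckets> _buckets{};
    uint64_t _count = 0;
    uint64_t _sum = 0;                              // exact sum, not an estimate
public:
    void add(uint64_t bytes) {
        size_t i = 0;
        uint64_t upper = first_bucket;
        // Find the first bucket whose upper bound covers the value (doubling bounds).
        while (i + 1 < num_buckets && bytes > upper) {
            upper *= 2;
            ++i;
        }
        ++_buckets[i];
        ++_count;
        _sum += bytes;                              // accumulate the true value
    }
    uint64_t count() const { return _count; }
    uint64_t sum() const { return _sum; }
};
```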
Metrics example:
```
scylla_transport_cql_request_bytes{kind="BATCH",scheduling_group_name="sl:default",shard="0"} 129808
scylla_transport_cql_request_bytes{kind="EXECUTE",scheduling_group_name="sl:default",shard="0"} 227409
scylla_transport_cql_request_bytes{kind="PREPARE",scheduling_group_name="sl:default",shard="0"} 631
scylla_transport_cql_request_bytes{kind="QUERY",scheduling_group_name="sl:default",shard="0"} 2809
scylla_transport_cql_request_bytes{kind="QUERY",scheduling_group_name="sl:driver",shard="0"} 4079
scylla_transport_cql_request_bytes{kind="REGISTER",scheduling_group_name="sl:default",shard="0"} 98
scylla_transport_cql_request_bytes{kind="STARTUP",scheduling_group_name="sl:driver",shard="0"} 432
scylla_transport_cql_request_histogram_bytes_sum{kind="QUERY",scheduling_group_name="sl:driver"} 4079
scylla_transport_cql_request_histogram_bytes_count{kind="QUERY",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="1024.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="2048.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="4096.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="8192.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="16384.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="32768.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="65536.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="131072.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="262144.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="524288.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="1048576.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="2097152.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="4194304.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="8388608.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="16777216.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="33554432.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="67108864.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="134217728.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="268435456.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="536870912.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="1073741824.000000",scheduling_group_name="sl:driver"} 57
```
**The field sees it as an important issue**
Fixes #14850
Closes scylladb/scylladb#28419
* github.com:scylladb/scylladb:
test/boost/estimated_histogram_test.cc: Switch to real Sum
transport/server: to bytes_histogram
approx_exponential_histogram: Add sum() method for accurate value tracking
utils/estimated_histogram.hh: Add bytes_histogram
When a permit is preemptively aborted, store the corresponding
exception in the permit's member `reader_permit::impl::_ex`.
This makes preemptively-aborted permits consistently report aborted()
and prevents them from being treated as eligible for inactive
registration in `register_inactive_read()`, avoiding assertion
failures on unexpected permit state.
Closes scylladb/scylladb#28591
Refs: SCYLLADB-193
Adds a "snapshot_table" topology operation and associated data structure/table columns to support dispatching a snapshot operation as a topo coordinator op.
The logic is similar to truncation, and is thus broken out and semi-shared with it.
Also adds optional tablet metadata to the manifest, listing all tablets present in a given snapshot, as well as
tablet sstable ownership, repair status, and token ranges.
As per the description in SCYLLADB-193, the alternative snapshot mechanism lives in
a separate namespace under 'tablets', which, while dubious, is the desired destination.
The API is accessed via `nodetool cluster snapshot`, which more or less mirrors `nodetool snapshot`, but uses the topology operation.
TTL is added to message propagation as a separate patch here, since it is not (yet) used from the API (or nodetool);
it requires a syntax for both the API and the command line.
Closes scylladb/scylladb#28525
* github.com:scylladb/scylladb:
topology::snapshot: Add expiry (ttl) to RPC/topo op
test_snapshot_with_tablets: Extend test to check manifest content
table::manifest: Add tablet info to manifest.json
test::test_snapshot_with_tablets: Add small test for topo coordinated snapshot
scylla-nodetool: Add "cluster snapshot" command
api::storage_service: Add tablets/snapshots command for cluster level snapshot
db::snapshot-ctl: Add method to do snapshot using topo coordinator
storage_proxy: Add snapshot_keyspace method
topology_coordinator: Add handler for snapshot_tables
storage_proxy: Add handler for SNAPSHOT_WITH_TABLETS
messaging_service: Add SNAPSHOT_WITH_TABLETS verb
feature_service: Add SNAPSHOT_AS_TOPOLOGY_OPERATION feature
topology_mutation: Add setter for snapshot part of row
system_keyspace::topology_requests_entry: Add snapshot info to table
topology_state_machine: Add snapshot_tables operation
topology_coordinator: Break out logic from handle_truncate_table
storage_proxy: Break out logic from request_truncate_with_tablets
test/object_store: Remove create_ks_and_cf() helper
test/object_store: Replace create_ks_and_cf() usage with standard methods
test/object_store: Shift indentation right for test cases
This patch adds a unit test for tablet_map::get_secondary_replica().
It was never officially defined how the "primary" and "secondary"
replicas were chosen, and their implementation changed over time,
but the one invariant that this test verifies is that the secondary
replica and the primary replica must be a different node.
This test reproduces issue SCYLLADB-777, where we discovered that
get_primary_replica() changed without a corresponding change to
get_secondary_replica(). So before the previous patch this test failed,
and after the previous patch it passes.
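For reference, the invariant the new unit test checks boils down to something like the following sketch (the stub type and its placement logic are invented, not the real locator::tablet_map API):
```cpp
// Hedged sketch of the invariant under test: for every tablet, the secondary
// replica must live on a different host than the primary replica.
#include <cassert>

struct tablet_map_stub {
    // Pretend placement: three hosts, replicas chosen round-robin.
    unsigned get_primary_replica(unsigned tablet) const   { return tablet % 3; }
    unsigned get_secondary_replica(unsigned tablet) const { return (tablet + 1) % 3; }
    unsigned tablet_count() const { return 128; }
};

void check_secondary_differs_from_primary(const tablet_map_stub& tmap) {
    for (unsigned t = 0; t < tmap.tablet_count(); ++t) {
        // The one invariant the test verifies.
        assert(tmap.get_primary_replica(t) != tmap.get_secondary_replica(t));
    }
}
```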
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
They were running in recovery mode to reuse existing system tables without
a group0 id, but since we want to remove recovery mode we need to
re-generate the tables.
Make the actual table name a parameter and add logic to adapt to the
variant used.
Also pass dump_to_log::yes to the is_rows() invocation to help with debugging
tests.
There is no point running repair for tables using RF one. Row-level
repair will skip it, but the auto repair scheduler will keep scheduling
such repairs since repair_time cannot be updated.
Skip such repairs at the scheduler level for auto repair.
If the request is issued by a user, we still have to schedule the
repair; otherwise the user request would never finish.
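A minimal sketch of that scheduler-level decision (names invented for illustration, not the actual repair scheduler code):
```cpp
// Auto-scheduled repairs skip RF=1 tables; user-initiated repairs still run
// so the user request can complete.
enum class repair_origin { auto_scheduled, user_requested };

bool should_skip_repair(unsigned replication_factor, repair_origin origin) {
    // With a single replica there is nothing to compare against, so an
    // automatically scheduled repair is pointless. A user request must still
    // be scheduled (and finished) so that the request itself can complete.
    return replication_factor <= 1 && origin == repair_origin::auto_scheduled;
}
```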
Fixes SCYLLADB-561
Closes scylladb/scylladb#28640
This patchset replaces the loading_cache-based permissions cache with a new unified (permissions and roles), full, coherent auth cache.
The reason for the change is that we want to improve behavior under stress and simplify operation manuals. The new cache doesn't require any tweaking, and it behaves particularly better in scenarios with lots of schema entities (e.g. tables) combined with unprepared queries. The old cache can generate a few thousand extra internal tps due to cache refresh.
A benchmark of unprepared statements (just to populate the cache) with 1000 tables shows a 3k tps reduction in internal reads and a 9.1% reduction in median instructions per op. That many tables were used to show the resource impact; the cache could be filled with other resource types to show the same improvement.
Backport: no, it's a new feature.
Fixes https://github.com/scylladb/scylladb/issues/7397
Fixes https://github.com/scylladb/scylladb/issues/3693
Fixes https://github.com/scylladb/scylladb/issues/2589
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-147
Closes scylladb/scylladb#28078
* github.com:scylladb/scylladb:
test: boost: add auth cache tests
auth: add cache size metrics
docs: conf: update permissions cache documentation
auth: remove old permissions cache
auth: use unified cache for permissions
auth: ldap: add permissions reload to unified cache
auth: add permissions cache to auth/cache
auth: add service::revoke_all as main entry point
auth: explicitly life-extend resource in auth_migration_listener
Today the S3 client has a well-established and (hopefully) well-tested HTTP request retry strategy; in the rest of the clients we keep trying to achieve the same by writing the same code over and over again, and of course missing corner cases that have already been addressed in the S3 client.
This PR extracts code that helps other clients detect the retryability of an error originating from the HTTP client, reuses the built-in seastar HTTP client retryability, and minimizes the boilerplate of HTTP client exception handling.
No backport needed since it is only refactoring of the existing code
Closes scylladb/scylladb#28250
* github.com:scylladb/scylladb:
exceptions: add helper to build a chain of error handlers
http: extract error classification code
aws_error: extract `retryable` from aws_error
- Correct `calc_part_size` function since it could return more than 10k parts
- Add tests
- Add more checks in `calc_part_size` to comply with S3 limits
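To make the constraint concrete, here is a hedged sketch of a part-size computation that respects S3's documented limits of at most 10,000 parts and a 5 MiB to 5 GiB per-part range (the function name and rounding policy are assumptions, not the actual s3_client code):
```cpp
// Grow the part size until the object fits into at most 10,000 parts,
// while keeping each part within the allowed per-part size range.
#include <algorithm>
#include <cstdint>
#include <stdexcept>

constexpr uint64_t s3_min_part_size = 5ull * 1024 * 1024;        // 5 MiB (minimum, except last part)
constexpr uint64_t s3_max_part_size = 5ull * 1024 * 1024 * 1024; // 5 GiB (maximum per part)
constexpr uint64_t s3_max_parts = 10'000;                        // maximum number of parts

uint64_t calc_part_size_sketch(uint64_t object_size, uint64_t preferred_part_size) {
    // Start from the preferred size, clamped into the allowed per-part range.
    uint64_t part_size = std::clamp(preferred_part_size, s3_min_part_size, s3_max_part_size);
    // Ceiling division avoids ending up with 10,001 parts just above a boundary.
    while ((object_size + part_size - 1) / part_size > s3_max_parts) {
        part_size *= 2;
        if (part_size > s3_max_part_size) {
            throw std::runtime_error("object too large for S3 multipart limits");
        }
    }
    return part_size;
}
```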
Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-640
Must be ported back to 2025.3/4 and 2026.1 since we may encounter this bug in production clusters
Closes scylladb/scylladb#28592
* github.com:scylladb/scylladb:
s3_client: add more constrains to the calc_part_size
s3_client: add tests for calc_part_size
s3_client: correct multipart part-size logic to respect 10k limit
Improves performance of deserialization of vector data for calculating similarity functions.
Instead of deserializing vector data into a std::vector<data_value>, we deserialize directly into a std::vector<float>
and then pass it to similarity functions as a std::span<const float>.
This avoids the overhead of data_value allocations and conversions.
Example QPS of `SELECT id, similarity_cosine({vector<float, 1536>}, {vector<float, 1536>}) ...`:
client concurrency 1: before: ~135 QPS, after: ~1005 QPS
client concurrency 20: before: ~280 QPS, after: ~2097 QPS
Measured using https://github.com/zilliztech/VectorDBBench (modified to call above query without ANN search)
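The shape of the optimization is roughly the following (function names and the big-endian wire layout are assumptions for the example, not ScyllaDB's actual code):
```cpp
// Decode a fixed-dimension vector<float> column straight into a contiguous
// float buffer and hand it to the similarity function as a span, skipping
// per-element data_value objects.
#include <cmath>
#include <cstdint>
#include <cstring>
#include <span>
#include <vector>

std::vector<float> deserialize_float_vector(const uint8_t* data, size_t dimensions) {
    std::vector<float> out(dimensions);
    for (size_t i = 0; i < dimensions; ++i) {
        const uint8_t* p = data + i * 4;
        // Assemble the big-endian 32-bit pattern and reinterpret it as a float.
        uint32_t bits = (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
                        (uint32_t(p[2]) << 8)  |  uint32_t(p[3]);
        std::memcpy(&out[i], &bits, sizeof(float));
    }
    return out;
}

float similarity_cosine(std::span<const float> a, std::span<const float> b) {
    float dot = 0, na = 0, nb = 0;
    for (size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    return dot / (std::sqrt(na) * std::sqrt(nb));
}
```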
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-471
Closes scylladb/scylladb#28615
The cache is already covered by general auth
dtests, but some cases are trickier and easier
to express directly as calls to the cache class.
A boost test file was added for such tests.
Fixes #28398
Fixes #28399
When used as path elements in Google Storage paths, object names need to be URL-encoded. This was missed due to
a.) tests not really using prefixes that include non-URL-valid characters (i.e. / etc.)
and
b.) the mock server used for most testing not enforcing this particular aspect.
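A minimal sketch of the encoding itself, assuming the RFC 3986 unreserved character set (illustrative only, not the actual utils/gcp/object_storage implementation):
```cpp
// Percent-encode an object name before embedding it as a path element in a URL.
#include <cstdio>
#include <string>
#include <string_view>

std::string url_encode_path_segment(std::string_view name) {
    std::string out;
    out.reserve(name.size());
    for (unsigned char c : name) {
        bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                          (c >= '0' && c <= '9') || c == '-' || c == '_' || c == '.' || c == '~';
        if (unreserved) {
            out.push_back(static_cast<char>(c));
        } else {
            char buf[4];
            std::snprintf(buf, sizeof(buf), "%%%02X", static_cast<unsigned>(c)); // e.g. '/' -> "%2F"
            out.append(buf);
        }
    }
    return out;
}
```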
Modified unit tests to use prefixing for all names, so that when running against real GS, any errors like this will show up.
"Real" GCS also behaves a bit differently from the mock when listing with a pager:
it will not return a pager token for the last page, only for the penultimate one.
Adds handling for this.
Needs backport to the releases that have (though might not really use) the feature, as it is technically possible to use google storage for backup and whatnot there, and it should work as expected.
Closes scylladb/scylladb#28400
* github.com:scylladb/scylladb:
utils/gcp/object_storage: URL-encode object names in URLs
utils::gcp::object_storage: Fix list object pager end condition detection
It turns out that the cdc driver requires permissions on two additional system tables. This patch adds them to VECTOR_SEARCH_INDEXING and modifies the unit tests. The integration with the vector store was tested manually; integration tests will be added in the vector-store repository in a follow-up PR.
Fixes: SCYLLADB-522
Closes scylladb/scylladb#28519
related PR: https://github.com/scylladb/scylladb/pull/27527
This PR changes test.py's logic for parsing boost test cases to use `-- --list_json_content`
and to pass boost labels as pytest markers.
Using `-- --list_json_content` is not ideal and currently requires implementing several [workarounds](https://github.com/scylladb/scylladb/pull/27527#issuecomment-3765499812), but having the ability to support boost labels in pytest is worth it, because now we can apply the tiering mechanism to boost tests as well.
Fixes SCYLLADB-246
Closes scylladb/scylladb#28232
* github.com:scylladb/scylladb:
test: add nightly label
test.py: support boost labels in test.py
add nightly label for test
test_foreign_reader_as_mutation_source
as an example of using boost labels as pytest markers.
Command to test:
./tools/toolchain/dbuild pytest --test-py-init --collect-only -q -m=nightly test/boost
output:
boost/mutation_reader_test.cc::test_foreign_reader_as_mutation_source.debug.1
boost/mutation_reader_test.cc::test_foreign_reader_as_mutation_source.release.1
boost/mutation_reader_test.cc::test_foreign_reader_as_mutation_source.dev.1
Move to `replica/`, drop `flat` from name and drop unused usages as well as unused includes.
Code cleanup, no backport
Closes scylladb/scylladb#28353
* github.com:scylladb/scylladb:
replica/partition_snapshot_reader: remove unused includes
partition_snapshot_reader: remove "flat" from name
mv partition_snapshot_reader.hh -> replica/
This test case was observed to take over 2 minutes to run on CI
machines, contributing to already bloated CI run times.
Disable this test in debug mode. This test checks for memtable flush
being slowed down when compaction can't keep up. So this test needs to
overwhelm the CPU by definition. On the other hand, this is not a
correctness test; there are such tests for the memtable and compaction
already, so it is not critical to run this in debug mode: it is not
expected to catch any use-after-free and such.
Closes scylladb/scylladb#28407
When reads arrive, they have to wait for admission on the reader
concurrency semaphore. If the node is overloaded, the reads will
be queued. They can time out while in the queue, but will not time
out once admitted.
Once the shard is sufficiently loaded, it is possible that most
queued reads will time out, because the average time it takes
for a queued read to be admitted is around that of the timeout.
If a read times out, any work we already did, or are about to do
on it is wasted effort. Therefore, the patch tries to prevent it
by checking if an admitted read has a chance to complete in time
and aborting it if not. It uses the following criterion:
if the read's remaining time <= the read's timeout on arrival at the semaphore * preemptive_abort_factor (a live-updatable setting),
the read is rejected and the next one from the wait list is considered.
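A hypothetical illustration of that admission-time check (names are invented; the real reader_concurrency_semaphore code differs):
```cpp
// A read whose remaining time is at most a configurable fraction of its
// original timeout is aborted instead of being admitted.
#include <chrono>

using steady = std::chrono::steady_clock;

bool should_preemptively_abort(steady::time_point now,
                               steady::time_point deadline,
                               steady::duration original_timeout,
                               double preemptive_abort_factor /* clamped to [0.0, 1.0] */) {
    auto remaining = deadline - now;
    // factor 0.0: never abort (a live read always has remaining > 0);
    // factor 1.0: always abort at admission, since remaining <= original_timeout.
    return remaining <= original_timeout * preemptive_abort_factor;
}
```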
Fixes https://github.com/scylladb/scylladb/issues/14909
Fixes: SCYLLADB-353
Backport is not needed. Better to first observe its impact.
Closes scylladb/scylladb#21649
* github.com:scylladb/scylladb:
reader_concurrency_semaphore: Check during admission if read may timeout
permit_reader::impl: Replace break with return after evicting inactive permit on timeout
reader_concurrency_semaphore: Add preemptive_abort_factor to constructors
config: Add parameters to control reads' preemptive_abort_factor
permit_reader: Add a new state: preemptive_aborted
reader_concurrency_semaphore: validate waiters counter when dequeueing a waiting permit
reader_concurrency_semaphore: Remove cpu_concurrency's default value
Now that the sum function in the histogram uses true values instead of
an estimate, the test should reflect that.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
When a shard on a replica is overloaded, it breaks down completely,
throughput collapses, latencies go through the roof and the
node/shard can even become completely unresponsive to new connection
attempts.
When reads arrive, they have to wait for admission on the reader
concurrency semaphore. If the node is overloaded, the reads will
be queued and thus they can time out while being in the queue or during
the execution. In the latter case, the timeout does not always
result in the read being aborted.
Once the shard is sufficiently loaded, it is possible that most
queued reads will time out, because the average time it takes
for a queued read to be admitted is around that of the timeout.
If a read times out, any work we already did, or are about to do
on it is wasted effort. Therefore, the patch tries to prevent it
by checking if an admitted read has a chance to complete in time
and aborting it if not. It uses the following criterion:
if the read's remaining time <= the read's timeout on arrival at the semaphore * preemptive factor,
the read is rejected and the next one from the wait list is
considered.
The new parameter parametrizes the factor used to reject a read
during admission. Its value shall be between 0.0 and 1.0 where
+ 0.0 means a read will never get rejected during admission
+ 1.0 means a read will immediately get rejected during admission
Although passing values outside the interval is possible, they
will have the exact same effect as if they were clamped to [0.0, 1.0].
Fixes #28398
When used as path elements in Google Storage paths, object names
need to be URL-encoded. This was missed because a.) tests were not really using
prefixes that include non-URL-valid characters (i.e. / etc.) and b.) the mock server
used for most testing does not enforce this particular aspect.
Modified unit tests to use prefixing for all names, so that when run
against real GS, any errors like this will show up.
In production environments, we observed cases where the S3 client would repeatedly fail to connect due to DNS entries becoming stale. Because the existing logic only attempted the first resolved address and lacked a way to refresh DNS state, the client could get stuck in a failure loop.
Introduce RR TTL and connection-failure retry to:
- re-resolve the RR in a timely manner
- forcefully reset and re-resolve addresses on connection failure
- handle the special case when the TTL is 0 and the record must be resolved for every request
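Conceptually, the TTL-driven part boils down to something like this sketch (types and names are invented; the real dns_connection_factory differs):
```cpp
// A cached resolution result expires after `ttl`; a TTL of zero means
// "resolve for every request".
#include <chrono>
#include <string>
#include <vector>

struct resolved_addresses {
    std::vector<std::string> addresses;                       // all resolved records, not just the first
    std::chrono::steady_clock::time_point resolved_at;
};

bool needs_re_resolution(const resolved_addresses& cached,
                         std::chrono::seconds ttl,
                         std::chrono::steady_clock::time_point now) {
    if (ttl == std::chrono::seconds::zero()) {
        return true;                                          // TTL 0: resolve on every request
    }
    return now - cached.resolved_at >= ttl;                   // stale entry: refresh the record
}
```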
Fixes: CUSTOMER-96
Fixes: CUSTOMER-139
Should be backported to 2025.3/4 and 2026.1 since we already encountered it in the production clusters for 2025.3
Closes scylladb/scylladb#27891
* github.com:scylladb/scylladb:
connection_factory: includes cleanup
dns_connection_factory: refine the move constructor
connection_factory: retry on failure
connection_factory: introduce TTL timer
connection_factory: get rid of shared_future in dns_connection_factory
connection_factory: extract connection logic into a member
connection_factory: remove unnecessary `else`
connection_factory: use all resolved DNS addresses
s3_test: remove client double-close
Commit 59faa6d introduces a new parameter called cpu_concurrency
and sets its default value to 1, which violates commit fbb83dd, which
removed all default values from constructors except the one used by the unit
tests.
The patch removes the default value of the cpu_concurrency parameter
and alters tests to use the test dedicated reader_concurrency_semaphore
constructor wherever possible.
consistent_cluster_management has been deprecated since scylla-5.2 and is no
longer used by ScyllaDB, so it should not be used by tests either.
Closes scylladb/scylladb#28340
These two streams mostly play together. The former provides an input_stream reading from in-memory temporary buffers; the latter wraps it to limit the size of the provided temporary buffers. Both are used to test the contiguous data consumer, and buffer_input_stream also has a caller in the sstables reversing reader.
This PR removes the buffer_input_stream in favor of seastar memory_data_source, and moves the limiting_input_stream into test/lib.
Enhancing testing code, not backporting.
Closes scylladb/scylladb#28352
* github.com:scylladb/scylladb:
code: Move limiting data source to test/lib
util: Simplify limiting_data_source API
util: Remove buffer_input_stream
test: Use seastar::util::temporary_buffer_data_source in data consumer test
sstables: Use seastar::util::as_input_stream() in mx reader
This compaction group testing is useless because the machinery for it
to work was removed. This was useful in the early tablet days, when
we wanted to test compaction groups directly. Today, groups are stressed
and tested in every tablet test.
I see a ~40% reduction in run time after this patch, since database_test is
one of the most (if not the most) time-consuming tests in the boost suite.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#28324
The partition snapshot lives in mutation/, however mutation/ is a lower
level concept than a mutation reader. The next best place for this
reader is the replica/ directory, where the memtable, its main user,
also lives.
Also move the code to the replica namespace.
test/boost/mvcc_test.cc includes this header but doesn't use anything
from it. Instead of updating the include path, just drop the unused
include.
Only two tests use it now -- the limiting-data-source-test itself and a test
that validates the continuous_data_consumer template.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The source maintains a "limit generator" -- a function that returns the
maximum number of bytes to return from the next buffer.
Currently all callers just return constant numbers from it. Passing a
function that returns a non-constant value could, probably, be used for a
fuzz test, but even the limiting-data-source-test itself doesn't do it,
so what's the point...
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The test creates buffer_data_source_impl and wraps it with limiting data
source. The former data_source duplicates the functionality of the
existing seastar temporary_buffer_data_source.
This patch makes the test code use seastar facility. The
buffer_data_source_impl will be removed soon.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Previously we only inspected std::system_error inside
std::nested_exception to support a specific TLS-related failure
mode. However, nested exceptions may contain any type, including
other restartable (retryable) errors. This change unwraps one
nested exception per iteration and re-applies all known handlers
until a match is found or the chain is exhausted.
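A simplified sketch of that unwrap-and-reclassify loop (the classifier type and function names are invented for illustration):
```cpp
// Walk a chain of std::nested_exception, re-applying every registered
// classifier at each level until one recognizes the error or the chain ends.
#include <exception>
#include <functional>
#include <vector>

using classifier = std::function<bool(const std::exception&)>; // returns true if retryable

bool is_retryable(const std::exception& e, const std::vector<classifier>& classifiers) {
    // Apply every known classifier to the current exception.
    for (const auto& c : classifiers) {
        if (c(e)) {
            return true;
        }
    }
    // Unwrap one level of std::nested_exception, if any, and recurse.
    if (const auto* nested = dynamic_cast<const std::nested_exception*>(&e);
            nested && nested->nested_ptr()) {
        try {
            std::rethrow_exception(nested->nested_ptr());
        } catch (const std::exception& inner) {
            return is_retryable(inner, classifiers);
        } catch (...) {
            return false; // nested exception of unknown type: treat as non-retryable
        }
    }
    return false;
}
```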
Closes scylladb/scylladb#28240
Allows other topology operations to execute while tablets are being
drained on decommission. In particular, bootstrap on scale-out. This
is important for elasticity.
Allows multiple decommission/removenode operations to happen in parallel, which
is important for efficiency.
Flow of decommission/removenode request:
1) pending and paused, has tablet replicas on target node.
Tablet scheduler will start draining tablets.
2) No tablets on target node, request is pending but not paused
3) Request is scheduled, node is in transition
4) Request is done
Nodes are considered draining as soon as there is a leave or remove
request on them. If there are tablet replicas present on the target
node, the request is in a paused state and will not be picked by
topology coordinator. The paused state is computed from topology state
automatically on reload.
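A rough sketch of that paused-state computation (types invented for illustration, not the actual topology state machine code):
```cpp
// A leave/remove request stays paused while the target node still holds any
// tablet replicas; once draining empties it, the request becomes schedulable.
#include <unordered_map>
#include <unordered_set>

using host_id = unsigned;
using tablet_id = unsigned;

bool request_is_paused(host_id target,
                       const std::unordered_map<tablet_id, std::unordered_set<host_id>>& tablet_replicas) {
    for (const auto& [tablet, hosts] : tablet_replicas) {
        if (hosts.contains(target)) {
            return true;  // keep draining; the topology coordinator must not pick the request yet
        }
    }
    return false;         // no tablet replicas left on the node: ready to be scheduled
}
```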
When a request is not paused, its execution starts in the
write_both_read_old state. The old tablet_draining state is not
entered (it's deprecated now).
Tablet load balancing will yield the state machine as soon as some
request is no longer paused and ready to be scheduled, based on
standard preemption mechanics.
Fixes #21452
Closes scylladb/scylladb#24129
* https://github.com/scylladb/scylladb:
docs: Document parallel decommission and removenode and relevant task API
test: Add tests for parallel decommission/removenode
test: util: Introduce ensure_group0_leader_on()
test: tablets: Check that there are no migrations scheduled on draining nodes
test: lib: topology_builder: Introduce add_draining_request()
topology_coordinator, tablets: Fail draining operations when tablet migration fails due to critical disk utilization
tablets: topology_coordinator: Refactor to propagate reason for migration rollback
tablet_allocator: Skip co-location on draining nodes
node_ops: task_manager_module: Populate entity field also for active requests
tasks: node_ops: Put node id in the entity field
tasks, node_ops: Unify setting of task_stats in get_status() and get_stats()
topology: Protect against empty cancelation reason
tasks, topology: Make pending node operations abortable
doc: topology-over-raft.md: Fix diagram for replacing, tablet_draining is not engaged
raft_topology, tablets: Drain tablets in parallel with other topology operations
virtual_tables: Show draining and excluded fields in system.cluster_status and system.load_by_node
locator: topology: Add "draining" flag to a node
topology_coordinator: Extract generate_cancel_request_update()
storage_service: Drop dependency in topology_state_machine.hh in the header
locator: Extract common code in assert_rf_rack_valid_keyspace()
topology_coordinator, storage_service: Validate node removal/decommission at request submission time