scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-19 16:15:07 +00:00

Author	SHA1	Message	Date
Yaniv Michael Kaul	7b17429d99	treewide: add explicit includes for headers losing transitive availability via PCH Adding Scylla internal headers to the PCH changes which transitive includes are available. Add explicit includes where needed: - group0_fwd.hh: timer, lowres_clock, variant, vector - discovery.hh: unordered_set - Various test files: result_message.hh, result_set.hh, selection.hh, gossiper.hh, seastar net/api and core/seastar headers	2026-04-19 10:54:19 +03:00
Yaniv Michael Kaul	8f491eb7c7	pch: exclude small test sources from precompiled header Small test binaries with partial link sets cannot satisfy symbol references injected by -fpch-instantiate-templates. Exclude source files used by tests with fewer than 50 dependencies from PCH compilation to avoid linker failures.	2026-04-19 10:54:19 +03:00
Yaniv Michael Kaul	b9823f3053	pch: add replica/database.hh to precompiled header replica/database.hh is included by 112 translation units and is one of the heaviest headers in the codebase. Adding it to the PCH provides a major compile-time reduction as its large transitive include tree is parsed only once. Clean dev build time drops from ~14m to ~6m20s (with previous PCH commits; ~22m33s baseline without any PCH changes).	2026-04-19 10:54:19 +03:00
Yaniv Michael Kaul	3e6d959867	lang: remove redundant std::equal_to<scheduling_group> specialization The specialization is unused and conflicts with PCH template pre-instantiation. scheduling_group already has operator==, so std::equal_to works via the default template.	2026-04-19 10:54:19 +03:00
Yaniv Michael Kaul	75c4fe1f33	replica/logstor: declare hist_key specialization before instantiation Move the declaration of hist_key<segment_descriptor> specialization into compaction.hh so it is visible before the primary template gets instantiated via log_heap. This prevents -fpch-instantiate-templates from instantiating the primary template in the PCH, which would conflict with the explicit specialization in the .cc file.	2026-04-19 10:54:19 +03:00
Yaniv Michael Kaul	28e59bae5a	utils, db: qualify seastar::coroutine:: to avoid shadowing by utils::coroutine class Inside namespace utils, unqualified coroutine:: resolves to the utils::coroutine class (utils/coroutine.hh) rather than the seastar::coroutine namespace. This causes build failures when replica/database.hh is added to the precompiled header, because utils/coroutine.hh becomes transitively visible in all TUs. Qualify all coroutine:: references with seastar:: in affected files under utils/ and db/.	2026-04-19 10:54:19 +03:00
Yaniv Michael Kaul	66618cd869	pch: expand precompiled header with more high-impact Scylla headers Add to stdafx.hh: locator/token_metadata.hh, gms/gossiper.hh, db/system_keyspace.hh, service/topology_state_machine.hh, cql3/query_options.hh, service/client_state.hh, cql3/query_processor.hh, db/config.hh, service/storage_proxy.hh, schema/schema_builder.hh, exceptions/exceptions.hh, gms/feature_service.hh, service/migration_manager.hh, sstables/sstables.hh, service/storage_service.hh, transport/messages/result_message.hh. These headers are included by 40-140 translation units each. Adding them to the PCH avoids redundant parsing across the build. Combined with the previous PCH commit, clean dev build time drops from 22m33s to ~14m23s (-36.2%).	2026-04-19 10:54:19 +03:00
Yaniv Michael Kaul	b4586f0789	utils: fix PCH compatibility in config_file and object_storage Convert config_file.cc read_from_file() from continuation-style to coroutines, avoiding a template instantiation conflict with -fpch-instantiate-templates when heavy Scylla headers are in the PCH. Qualify input_stream<char> in object_storage.cc lambda parameter with seastar:: to resolve the same PCH template parsing issue.	2026-04-19 10:54:19 +03:00
Yaniv Michael Kaul	37280265ef	pch: add commonly-used Scylla internal headers to precompiled header Add schema/schema.hh, types/types.hh, mutation/mutation_partition.hh, mutation/mutation_fragment.hh and their dependencies (bytes.hh, keys.hh, dht/token.hh, locator types, etc.) to the PCH. These are included by the vast majority of translation units and benefit greatly from being precompiled once rather than parsed ~400 times. Reduces clean dev build time from ~22m to ~18m (~19% faster).	2026-04-19 10:54:18 +03:00
Yaniv Michael Kaul	2fbba4a071	raft, service, locator: create raft_fwd.hh and reduce heavy header includes Create raft/raft_fwd.hh with lightweight type aliases (server_id, group_id, term_t, index_t) backed only by raft/internal.hh, avoiding the heavy raft/raft.hh (832 lines with futures, abort_source, bytes_ostream). Replace raft/raft.hh with raft/raft_fwd.hh in headers that only need the basic ID types: tablets.hh, topology_state_machine.hh, topology_coordinator.hh, storage_service.hh, group0_fwd.hh, view_building_coordinator.hh, view_building_worker.hh. Also remove gossiper.hh and tablet_allocator.hh from storage_service.hh (forward declarations suffice), and remove unused reactor.hh from tablets.hh. Add explicit includes in .cc files that lost transitive availability.	2026-04-17 01:08:04 +03:00
Yaniv Michael Kaul	be5fa64d36	db: break gossiper.hh include from system_keyspace.hh Extract loaded_endpoint_state into a standalone lightweight header to avoid pulling the heavy gossiper.hh (and transitively query-result-set.hh) into every includer of system_keyspace.hh. Add explicit includes where the full definitions are actually needed. Reduces clean dev build time by ~2 minutes (-8%).	2026-04-16 23:27:55 +03:00
Yaniv Michael Kaul	5c918d29cc	service: remove unused storage_service.hh include from storage_proxy.hh storage_proxy.hh included storage_service.hh but never referenced any symbol from it. storage_service.hh costs 3.7s to parse per file, and storage_proxy.hh has 75 direct includers. While most of those also include database.hh (which shares transitive deps), removing this unnecessary include still reduces total parse work. Speedup: part of a series measured at -5.8% wall-clock improvement (same-session A/B: 16m14s -> 15m17s at -j16, 16 cores).	2026-04-16 18:22:56 +03:00
Yaniv Michael Kaul	43e337a663	db, test: add explicit includes for storage_service.hh and system_keyspace.hh Add explicit includes that were previously available transitively through service/storage_proxy.hh -> service/storage_service.hh. This prepares for removing the unused storage_service.hh include from storage_proxy.hh in a follow-up commit. Speedup: prerequisite for storage_proxy.hh include chain reduction (measured -5.8% wall-clock combined with all changes in this series, same-session A/B: 16m14s -> 15m17s at -j16).	2026-04-16 18:22:41 +03:00
Yaniv Michael Kaul	a67efb031c	db: break heavy include chain from config.hh by extracting replication_strategy_type Extract replication_strategy_type enum from locator/abstract_replication_strategy.hh into a new lightweight header locator/replication_strategy_type.hh, and use it in db/config.hh instead of the full abstract_replication_strategy.hh. abstract_replication_strategy.hh pulls in a large transitive dependency tree (schema.hh, mutation serializers, etc.) costing ~1.7s per file. With this change, config.hh's incremental parse cost drops from 1.7s to 0.6s. Since ~85 files include config.hh without also including database.hh (which would bring in these deps anyway), this saves ~93s total CPU. Speedup: part of a series measured at -5.8% wall-clock improvement (same-session A/B: 16m14s -> 15m17s at -j16, 16 cores).	2026-04-16 18:19:19 +03:00
Yaniv Michael Kaul	5b0933c453	utils: add explicit include for exceptions.hh in s3/client.cc Add explicit #include for utils/exceptions.hh which was previously available transitively through db/config.hh -> abstract_replication_strategy.hh. This prepares for removing the heavy abstract_replication_strategy.hh include from db/config.hh in a follow-up commit. Speedup: prerequisite for config.hh include chain reduction (measured -5.8% wall-clock combined with all changes in this series, same-session A/B: 16m14s -> 15m17s at -j16).	2026-04-16 18:19:04 +03:00
Yaniv Michael Kaul	2ac834d797	pch: remove seastar/http/api_docs.hh from precompiled header The api_docs.hh header contains inline method bodies (api_registry::handle) that call seastar::json::formatter::to_json(), forcing the compiler to instantiate seastar::json template specializations (json_list_template, formatter::write, do_with, etc.) in every compilation unit, even in files that never use any HTTP/JSON API types. Measured ~6s of wasted template instantiation per file × ~620 files = ~3,700s total CPU. Only 2 files outside the PCH include api_docs.hh directly, so removing it has no impact on code that actually uses these types. Wall-clock build time (-j16, Seastar/Abseil cached): before (with loading_cache fix): avg 23m29s; after: avg 23m04s (-1.8%, or -4.0% vs. the original baseline of avg 24m01s).	2026-04-15 09:29:25 +03:00
Yaniv Michael Kaul	b324c84a04	cql3: break loading_cache include chain from query_processor.hh utils/loading_cache.hh is an expensive template header that costs ~2,494 seconds of aggregate CPU time across 133 files that include it. 88 of those files include it only transitively via query_processor.hh through the chain: query_processor.hh -> prepared_statements_cache.hh -> loading_cache.hh, costing ~1,690s of template instantiation. Break the chain by: - Replacing #include of prepared_statements_cache.hh and authorized_prepared_statements_cache.hh in query_processor.hh with forward declarations and the lightweight prepared_cache_key_type.hh - Replacing #include of result_message.hh with result_message_base.hh (which doesn't pull in prepared_statements_cache.hh) - Changing prepared_statements_cache and authorized_prepared_statements_cache members to std::unique_ptr (PImpl) since forward-declared types cannot be held by value - Moving get_prepared(), execute_prepared(), execute_direct(), and execute_batch() method bodies from the header to query_processor.cc - Updating transport/server.cc to use the concrete type instead of the no-longer-visible authorized_prepared_statements_cache::value_type Per-file measurement: files including query_processor.hh now show zero loading_cache template instantiation events (previously 20-32s each). Wall-clock measurement (clean build, -j16, 16 cores, Seastar cached): baseline (origin/master): avg 24m01s (24m03s, 23m59s); with loading_cache chain break: avg 23m29s (23m32s, 23m29s, 23m27s); improvement: ~32s (~2.2%).	2026-04-15 04:21:15 +03:00
Yaniv Michael Kaul	b499dc8e9d	cql3: extract prepared_cache_key_type into standalone lightweight header Move prepared_cache_key_type class and its std::hash / fmt::formatter specializations from prepared_statements_cache.hh into a new header cql3/prepared_cache_key_type.hh. The new header only depends on bytes.hh, utils/hash.hh, and cql3/dialect.hh -- it does NOT include utils/loading_cache.hh. This allows code that needs the cache key type (e.g. for function signatures) to use it without pulling in the expensive loading_cache template machinery. prepared_statements_cache.hh now includes prepared_cache_key_type.hh, so existing includers are unaffected. No functional change. Prepares for breaking the loading_cache include chain from query_processor.hh.	2026-04-15 04:20:57 +03:00
Yaniv Michael Kaul	8ad8e76c3b	cql3, service, test: add explicit includes for headers losing transitive availability Add explicit #include directives for headers that are currently available transitively through cql3/query_processor.hh but will stop being available after a subsequent refactoring that removes the loading_cache include chain. Files changed: - cql3/statements/drop_keyspace_statement.cc: add unimplemented.hh - cql3/statements/truncate_statement.cc: add unimplemented.hh - cql3/statements/batch_statement.cc: add result_message.hh - cql3/statements/broadcast_modification_statement.cc: add result_message.hh - service/paxos/paxos_state.cc: add result_message.hh - test/lib/cql_test_env.cc: add result_message.hh - table_helper.cc: add result_message.hh No functional change. Prepares for subsequent query_processor.hh cleanup.	2026-04-15 04:20:49 +03:00
Avi Kivity	0ae22a09d4	LICENSE: Update to version 1.1 Updated terms of non-commercial use (must be a never-customer).	2026-04-12 19:46:33 +03:00
Avi Kivity	22949bae52	Merge 'logstor: implement tablet split/merge and migration' from Michael Litvak implement tablet split, tablet merge and tablet migration for tables that use the experimental logstor storage engine. * tablet merge simply merges the histograms of segments of one compaction group with another. * for tablet split we take the segments from the source compaction group, read them and write all live records to separate segments according to the split classifier, and move the separated segments to the target compaction groups. * for tablet migration we use stream_blob, similarly to file streaming of sstables. we add a new op type for streaming a logstor segment. on the source we take a snapshot of the segments with an input stream that reads the segment, and on the target we create a sink that allocates a new segment on the target shard and writes to it. * we also make some improvements to recovery and loading of segments. we add a segment header that contains useful information for non-mixed segments, such as the table and token range. Refs SCYLLADB-770 no backport - still a new and experimental feature Closes scylladb/scylladb#29207 * github.com:scylladb/scylladb: test: logstor: additional logstor tests docs/dev: add logstor on-disk format section logstor: add version and crc to buffer header test: logstor: tablet split/merge and migration logstor: enable tablet balancing logstor: streaming of logstor segments using stream_blob logstor: add take_logstor_snapshot logstor: segment input/output stream logstor: implement compaction_group::cleanup logstor: tablet split logstor: tablet merge logstor: add compaction reenabler logstor: add segment header logstor: serialize writes to active segment replica: extend compaction_group functions for logstor replica: add compaction_group_for_logstor_segment logstor: code cleanup	2026-04-12 16:11:12 +03:00
Israel Fruchter	79c736455e	cqlsh: update to v6.0.34-scylla Update cqlsh to version v6.0.34-scylla. Notable fix: - Fix vector type formatting error (scylladb/scylla-cqlsh#165) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Closes scylladb/scylladb#29401	2026-04-12 14:54:50 +03:00
Avi Kivity	8ccee6803e	Merge 'Remove upgrade view builder' from Gleb Natapov Since we no longer support upgrading from versions that do not support v2 of the "view building status" code (building status is managed by raft), we can remove the v1 code and the upgrade code, and make sure we do not boot with an old "builder status" version. The v2 version was introduced by `8d25a4d678`, which is included in scylla-2025.1.0. No backport needed since this is code removal. Closes scylladb/scylladb#29105 * github.com:scylladb/scylladb: view: drop unused v1 builder code view: remove upgrade to raft code	2026-04-12 00:39:26 +03:00
Botond Dénes	9770a4c081	test/cluster/test_encryption.py: use single-partition reads in read_verify_workload() Replace the range scan in read_verify_workload() with individual single-partition queries, using the keys returned by prepare_write_workload() instead of hard-coding them. The range scan was previously observed to time out in debug mode after a hard cluster restart. Single-partition reads are lighter on the cluster and less likely to time out under load. The new verification is also stricter: instead of merely checking that the expected number of rows is returned, it verifies that each written key is individually readable, catching any data-loss or key-identity mismatch that the old count-only check would have missed. This is the second attempt at stabilizing this test, after the recent `854c374ebf`. That fix made sure that the cluster has converged on topology and nodes see each other before running the verify workload. Fixes: SCYLLADB-1331 Closes scylladb/scylladb#29313	2026-04-12 00:38:20 +03:00
Avi Kivity	ca80ee8586	Merge 'Introduce maintenance scheduling supergroup and do initial population' from Pavel Emelyanov The supergroup replaces the streaming (a.k.a. maintenance) group, inherits its 200 shares, and consists of four sub-groups (each with equal shares of 200 within the new supergroup): * maintenance_compaction. This group configures the `compaction_manager::maintenance_sg()` group. User-triggered compaction runs in it. * backup. This group configures `snapshot_ctl::config::backup_sched_group`. Native backup activity runs there. * maintenance. This is a new "visible" name: everything that was called "maintenance" in the code ran in the "streaming" group; now it will run in "maintenance". The activities include those that don't communicate over RPC (see below for why): * `tablet_allocator::balance_tablets()` * `sstables_manager::components_reclaim_reload_fiber()` * `tablet_storage_group_manager::merge_completion_fiber()` * the metrics-exporting HTTP server altogether * streaming. This is the existing streaming group, which simply moves under the new supergroup. Everything else that ran there continues to do so, including: * hints sender * all view-building-related components (update generator, builder, workers) * repair * stream_manager * messaging service (except for verb handlers that switch groups) * join_cluster() activity * REST API * ... something else I forgot The `--maintenance_io_throughput_mb_per_sec` option is introduced. It controls the IO throughput limit applied to the maintenance supergroup. If it is not set, the `--stream_io_throughput_mb_per_sec` option is used to preserve backward compatibility. All new sched groups inherit `request_class::maintenance` (however, "backup" seems not to make any requests yet). Moving more activities from "streaming" into "maintenance" (or its own group) is possible, but one will need to take care of RPC group switching.
The thing is that when a client makes an RPC call, the server may switch to one of the pre-negotiated scheduling groups. Verbs for existing activities that run in the "streaming" group are routed through an RPC index that negotiates the "streaming" group on the server side. If any of that client code moves to some other group, the server will still run the handlers in "streaming", which is not quite expected. That's one of the main reasons why only the selected fibers were moved to their own "maintenance" group. The same applies to backup: this code doesn't use RPC, so it can be moved. Restoring code uses load-and-stream and corresponding RPCs, so it cannot simply be moved into its own new group. Fixes SCYLLADB-351 New feature, not backporting Closes scylladb/scylladb#28542 * github.com:scylladb/scylladb: code: Add maintenance/maintenance group backup: Add maintenance/backup group compaction: Add maintenance/maintenance_compaction group main: Introduce maintenance supergroup main: Move all maintenance sched group into streaming one database: Use local variable for current_scheduling_group code: Live-update IO throughputs from main	2026-04-12 00:34:48 +03:00
Botond Dénes	3289928679	repair: fix quadratic complexity when loading repair history shared_tombstone_gc_state::update_repair_time() uses copy-on-write semantics: each call copies the entire per_table_history_maps and the per-table repair_history_map. repair_service::load_history() called this once per history entry, making the load O(N²) in both time and memory. Introduce batch_update_repair_time() which performs a single copy-on-write for any number of entries belonging to the same table. Restructure load_history() to collect entries into batches of up to 1000 and flush each batch in one call, keeping peak memory bounded. The batch size limit is intentional: the repair history table currently has no bound on the number of entries and can grow large. Note that this does not cause a problem in the in-memory history map itself: entries are coalesced internally and only the latest repair time is kept per range. The unbounded entry count only makes the batched update during load expensive. Fixes: SCYLLADB-104 Closes scylladb/scylladb#29326	2026-04-11 23:54:26 +03:00
Michał Hudobski	7d648961ed	vector_search: forward non-primary key restrictions to Vector Store service Include non-primary key restrictions (e.g. regular column filters) in the filter JSON sent to the Vector Store service. Previously only partition key and clustering column restrictions were forwarded, so filtering on regular columns was silently ignored. Add get_nonprimary_key_restrictions() getter to statement_restrictions. Add unit tests for non-primary key equality, range, and bind marker restrictions in filter_test. Fixes: SCYLLADB-970 Closes scylladb/scylladb#29019	2026-04-10 17:16:29 +02:00
Piotr Dulikowski	3bd770d4d9	Merge 'counters: reuse counter IDs by rack' from Michael Litvak For counter updates, use a counter ID that is constructed from the node's rack instead of the node's host ID. A rack can have at most two active replicas of a tablet at a time: a single normal replica, plus a pending replica during tablet migration. Therefore we can have two unique counter IDs per rack that are reused by all replicas in the rack. We construct the counter ID from the rack UUID, which is constructed from the name "dc:rack". The pending replica uses a deterministic variation of the rack's counter ID, obtained by negating it. This improves the performance and size of counter cells by having fewer unique counter IDs and fewer counter shards in a counter cell. Previously the number of counter shards was the number of different host_ids that updated the counter, which is typically the number of nodes in the cluster and can keep growing indefinitely as nodes are replaced. With the rack-based counter ID, the number of counter shards will be at most twice the number of different racks (including removed racks, which should not be significant). Fixes SCYLLADB-356 backport not needed - an enhancement Closes scylladb/scylladb#28901 * github.com:scylladb/scylladb: docs/dev: add counters doc counters: reuse counter IDs by rack	2026-04-10 12:24:18 +02:00
Wojciech Mitros	163c6f71d6	transport: refactor result_message bounce interface Replace move_to_shard()/move_to_host() with as_bounce()/target_shard()/ target_host() to clarify the interface after bounce was extended to support cross-node bouncing. - Add virtual as_bounce() returning const bounce* to the base class (nullptr by default, overridden in bounce to return this), replacing the virtual move_to_shard() which conflated bounce detection with shard access - Rename move_to_shard() -> target_shard() (now non-virtual, returns unsigned directly) and move_to_host() -> target_host() on bounce - Replace dynamic_pointer_cast with static_pointer_cast at call sites that already checked as_bounce() - Move forward declarations of message types before the virtual methods so as_bounce() can reference bounce Fixes: SCYLLADB-1066 Closes scylladb/scylladb#29367	2026-04-10 12:17:43 +02:00
Piotr Dulikowski	32e3a01718	Merge 'service: strong_consistency: Allow for aborting operations' from Dawid Mędrek Motivation: Since strongly consistent tables are based on the concept of Raft groups, operations on them can get stuck for indefinite amounts of time. That may be problematic, and so we'd like to implement a way to cancel those operations at suitable times. Description of solution: The situations we focus on are the following: * Timed-out queries * Leader changes * Tablet migrations * Table drops * Node shutdowns We handle each of them and provide validation tests. Implementation strategy: 1. Auxiliary commits. 2. Abort operations on timeout. 3. Abort operations on tablet removal. 4. Extend `client_state`. 5. Abort operations on shutdown. 6. Help `state_machine` be aborted as soon as possible. Tests: We provide tests that validate the correctness of the solution. The total time spent on `test_strong_consistency.py` (measured on my local machine, dev mode): before: real 0m31.809s, user 1m3.048s, sys 0m21.812s; after: real 0m34.523s, user 1m10.307s, sys 0m27.223s. The incremental differences in time can be found in the commit messages. Fixes SCYLLADB-429 Backport: not needed. This is an enhancement to an experimental feature.
Closes scylladb/scylladb#28526 * github.com:scylladb/scylladb: service: strong_consistency: Abort state_machine::apply when aborting server service: strong_consistency: Abort ongoing operations when shutting down service: client_state: Extend with abort_source service: strong_consistency: Handle abort when removing Raft group service: strong_consistency: Abort Raft operations on timeout service: strong_consistency: Use timeout when mutating service: strong_consistency: Fix indentation service: strong_consistency: Enclose coordinator methods with try-catch service: strong_consistency: Crash at unexpected exception test: cluster: Extract default config & cmdline in test_strong_consistency.py	2026-04-10 11:11:21 +02:00
Pavel Emelyanov	0b336da89d	Revert "cmake: add missing rolling_max_tracker_test and symmetric_key_test" This reverts commit `8b4a91982b`. Two commits independently added rolling_max_tracker_test to test/boost/CMakeLists.txt: `8b4a919` (cmake: add missing rolling_max_tracker_test and symmetric_key_test) and `f3a91df` (test/cmake: add missing tests to boost test suite). The second was merged two days after the first. They did not conflict at the code level and applied cleanly, resulting in duplicate add_scylla_test() entries that break the CMake build: CMake Error: add_executable cannot create target "test_boost_rolling_max_tracker_test" because another target with the same name already exists. Remove the duplicate. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Reported-by: Łukasz Paszkowski <lukasz.paszkowski@scylladb.com>	2026-04-10 11:19:43 +03:00
Patryk Jędrzejczak	751bf31273	Merge 'More gossiper cleanups' from Gleb Natapov The PR contains more code cleanups, mostly in the gossiper. It drops more gossiper states, leaving only NORMAL and SHUTDOWN; all other states are checked against the topology state. Those two are kept because the SHUTDOWN state is propagated only through the gossiper, and when a node is not in SHUTDOWN it should be in some other state. No need to backport. Cleanups. Closes scylladb/scylladb#29129 * https://github.com/scylladb/scylladb: storage_service: cleanup unused code storage_service: simplify get_peer_info_for_update gossiper: send shutdown notifications in parallel gms: remove unused code virtual_tables: no need to call gossiper if we already know that the node is in shutdown gossiper: print node state from raft topology in the logs gossiper: use is_shutdown instead of code it manually gossiper: mark endpoint_state(inet_address ip) constructor as explicit gossiper: remove unused code gossiper: drop last use of LEFT state and drop the state gossiper: drop unused STATUS_BOOTSTRAPPING state gossiper: rename is_dead_state to is_left since this is all that the function checks now. gossiper: use raft topology state instead of gossiper one when checking node's state storage_service: drop check_for_endpoint_collision function storage_service: drop is_first_node function gossiper: remove unused REMOVED_TOKEN state gossiper: remove unused advertise_token_removed function	2026-04-10 09:56:20 +02:00
Nadav Har'El	6674aa29ca	Merge 'Add Cassandra SAI (StorageAttachedIndex) compatibility' from Szymon Wasik Cassandra's native vector index type is StorageAttachedIndex (SAI). Libraries such as CassIO, LangChain, and LlamaIndex generate `CREATE CUSTOM INDEX` statements using the SAI class name. Previously, ScyllaDB rejected these with "Non-supported custom class". This PR adds compatibility so that SAI-style CQL statements work on ScyllaDB without modification. 1. test: enable SAI_VECTOR_ALLOW_CUSTOM_PARAMETERS for Cassandra tests Enables the `SAI_VECTOR_ALLOW_CUSTOM_PARAMETERS` Cassandra system property so that `search_beam_width` tests pass against Cassandra 5.0.7. 2. test: modernize vector index test comments and fix xfail Updates test comments from "Reproduces" to "Validates fix for" for clarity, and converts the `test_ann_query_with_pk_restriction` xfail into a stripped-down CREATE INDEX syntax test (removing unused INSERT/SELECT lines). Removes the redundant `test_ann_query_with_non_pk_restriction` test. 3. cql: add Cassandra SAI (StorageAttachedIndex) compatibility Core implementation: the SAI class name is detected and translated to ScyllaDB's native `vector_index`. The fully-qualified class name (`org.apache.cassandra.index.sai.StorageAttachedIndex`) requires exact case; short names (`StorageAttachedIndex`, `sai`) are matched case-insensitively — matching Cassandra's behavior. Non-vector and multi-column SAI targets are rejected with clear errors. Adds `skip_on_scylla_vnodes` fixture, SAI compatibility docs, and the Cassandra compatibility table entry (split into "SAI general" vs "SAI for vector search"). 4. cql: accept source_model option for Cassandra SAI compatibility The `source_model` option is a Cassandra SAI property used by Cassandra libraries (e.g., CassIO) to tag vector indexes with the name of the embedding model. ScyllaDB accepts it for compatibility but does not use it — the validator is a no-op lambda. 
The option is preserved in index metadata and returned in DESCRIBE INDEX output. - `cql3/statements/create_index_statement.cc`: SAI class detection and rewriting logic - `index/secondary_index_manager.cc`: case-insensitive class name lookup (lowercasing restored before `classes.find()`) - `index/vector_index.cc`: `source_model` accepted as a valid option with no-op validator - `docs/cql/secondary-indexes.rst`: SAI compatibility documentation with `source_model` table row - `docs/using-scylla/cassandra-compatibility.rst`: SAI entry split into general (not supported) and vector search (supported) - `test/cqlpy/conftest.py`: `scylla_with_tablets` renamed to `skip_on_scylla_vnodes` - `test/cqlpy/test_vector_index.py`: SAI tests inlined (no constants), `check_bad_option()` helper for numeric validation, uppercase class name test, merged `source_model` tests with DESCRIBE check Test results (passed / skipped / failed): ScyllaDB (dev): 42 / 0 / 0; Cassandra 5.0.7: 16 / 26 / 0. None: new feature. Fixes: SCYLLADB-239 Closes scylladb/scylladb#28645 * github.com:scylladb/scylladb: cql: accept source_model option and show options in DESCRIBE cql: add Cassandra SAI (StorageAttachedIndex) compatibility test: modernize vector index test comments and fix xfail test: enable SAI_VECTOR_ALLOW_CUSTOM_PARAMETERS for Cassandra tests	2026-04-10 10:21:20 +03:00
Avi Kivity	f67d0739d0	test: user_function_test: adjust Lua error message tests Lua 5.5 changed the error message slightly ("?:-1" -> "?:?"). Relax the error message tests to avoid this unimportant fragment. Closes scylladb/scylladb#29414	2026-04-10 01:09:35 +03:00
Piotr Szymaniak	98d6edaa88	alternator: add comment explaining delta_mode::keys in add_stream_options() Clarify that cdc::delta_mode is ignored by Alternator, so we use the least expensive mode (keys) to reduce overhead. Fixes scylladb/scylladb#24812 Closes scylladb/scylladb#29408	2026-04-10 01:07:21 +03:00
Michał Hudobski	c8b9fde828	auth: allow VECTOR_SEARCH_INDEXING permission to access system.tablets Add system.tablets to the set of system resources that can be accessed with the VECTOR_SEARCH_INDEXING permission. Fixes: VECTOR-605 Closes scylladb/scylladb#29397	2026-04-09 21:53:07 +03:00
Szymon Wasik	573def7cd8	cql: accept source_model option and show options in DESCRIBE Accept the Cassandra SAI 'source_model' option for vector indexes. This option is used by Cassandra libraries (e.g., CassIO, LangChain) to tag vector indexes with the name of the embedding model that produced the vectors. ScyllaDB does not use the source_model value but stores it and includes it in the DESCRIBE INDEX output for Cassandra compatibility. Additionally, extend vector_index::describe() to emit a WITH OPTIONS = {...} clause containing all user-provided index options (filtering out system keys: target, class_name, index_version). This makes options like similarity_function, source_model, etc. visible in DESCRIBE output.	2026-04-09 17:20:03 +02:00
Szymon Wasik	80a2e4a0ab	cql: add Cassandra SAI (StorageAttachedIndex) compatibility Libraries such as CassIO, LangChain, and LlamaIndex create vector indexes using Cassandra's StorageAttachedIndex (SAI) class name. This commit lets ScyllaDB accept these statements without modification. When a CREATE CUSTOM INDEX statement specifies an SAI class name on a vector column, ScyllaDB automatically rewrites it to the native vector_index implementation. Accepted class names (case-insensitive): - org.apache.cassandra.index.sai.StorageAttachedIndex - StorageAttachedIndex - sai SAI on non-vector columns is rejected with a clear error directing users to a secondary index instead. The SAI detection and rewriting logic is extracted into a dedicated static function (maybe_rewrite_sai_to_vector_index) to keep the already-long validate_while_executing method manageable. Multi-column (local index) targets and nonexistent columns are skipped with continue — the former are treated as filtering columns by vector_index::check_target(), and the latter are caught later by vector_index::validate(). Tests that exercise features common to both backends (basic creation, similarity_function, IF NOT EXISTS, bad options, etc.) now use the SAI class name with the skip_on_scylla_vnodes fixture so they run against both ScyllaDB and Cassandra. ScyllaDB-specific tests continue to use USING 'vector_index' with scylla_only.	2026-04-09 17:20:03 +02:00
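A minimal sketch of the class-name check and the resulting decision, assuming the comparison lowercases the name and matches the three accepted spellings listed above. `is_sai_class_name`, `resolve_index_class`, and the error text are hypothetical; they do not reflect the signature of `maybe_rewrite_sai_to_vector_index`.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical check: true if `class_name` is one of the three SAI
// spellings from the commit message, compared case-insensitively.
bool is_sai_class_name(std::string class_name) {
    std::transform(class_name.begin(), class_name.end(), class_name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return class_name == "org.apache.cassandra.index.sai.storageattachedindex"
        || class_name == "storageattachedindex"
        || class_name == "sai";
}

// Hypothetical decision: SAI on a vector column becomes the native
// vector_index class; SAI on any other column is rejected.
std::string resolve_index_class(const std::string& class_name, bool is_vector_column) {
    if (!is_sai_class_name(class_name)) {
        return class_name;  // not SAI: leave the statement untouched
    }
    if (!is_vector_column) {
        throw std::invalid_argument("SAI is only supported on vector columns; use a secondary index instead");
    }
    return "vector_index";
}
```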
Szymon Wasik	fa7edc627c	test: modernize vector index test comments and fix xfail - Change 'Reproduces' to 'Validates fix for' in test comments to reflect that the referenced issues are already fixed. - Condense the VECTOR-179 comment to two lines. - Replace the xfailed test_ann_query_with_restriction_works_only_on_pk with a focused test (test_ann_query_with_pk_restriction) that creates a vector index on a table with a PK column restriction, validating the VECTOR-374 fix.	2026-04-09 17:20:02 +02:00
Szymon Wasik	4eab050be4	test: enable SAI_VECTOR_ALLOW_CUSTOM_PARAMETERS for Cassandra tests	2026-04-09 17:20:02 +02:00
Andrzej Jackowski	23c386a27f	test: perf: add audit-unix-socket-path to perf-simple-query To allow performance benchmarking with custom syslog sinks. Example use case: -- Audit + default syslog: ~100k tps taskset -c 0,2,4 ./build/release/scylla perf-simple-query --smp 3 --write --duration 30 --audit "syslog" --audit-keyspace "ks" --audit-categories "DCL,DDL,AUTH,DML,QUERY" ``` 110263.72 tps ( 66.1 allocs/op, 16.0 logallocs/op, 25.7 tasks/op, 254900 insns/op, 144796 cycles/op, 0 errors) throughput: mean= 107137.48 standard-deviation=3142.98 median= 106665.00 median-absolute-deviation=1786.03 maximum=111435.19 minimum=97620.79 instructions_per_op: mean= 256311.36 standard-deviation=5037.13 median= 256288.09 median-absolute-deviation=2223.08 maximum=274220.89 minimum=248141.40 cpu_cycles_per_op: mean= 146443.47 standard-deviation=2844.19 median= 146001.85 median-absolute-deviation=1514.82 maximum=157177.54 minimum=142981.03 ``` -- Audit + custom syslog: ~400k tps socat -u UNIX-RECV:/tmp/audit-null.sock,type=2 OPEN:/dev/null taskset -c 0,2,4 ./build/release/scylla perf-simple-query --smp 3 --write --duration 30 --audit "syslog" --audit-keyspace "ks" --audit-categories "DCL,DDL,AUTH,DML,QUERY" --audit-unix-socket-path /tmp/audit-null.sock ``` 404929.62 tps ( 65.9 allocs/op, 16.0 logallocs/op, 25.5 tasks/op, 77406 insns/op, 35559 cycles/op, 0 errors) throughput: mean= 399868.39 standard-deviation=6232.88 median= 401770.65 median-absolute-deviation=3859.09 maximum=406126.79 minimum=383434.84 instructions_per_op: mean= 77481.26 standard-deviation=168.31 median= 77405.54 median-absolute-deviation=84.33 maximum=78081.46 minimum=77332.84 cpu_cycles_per_op: mean= 35871.32 standard-deviation=516.83 median= 35699.70 median-absolute-deviation=251.15 maximum=37454.86 minimum=35432.60 ``` -- No audit: ~800k tps taskset -c 0,2,4 ./build/release/scylla perf-simple-query --smp 3 --write --duration 30 ``` 808970.95 tps ( 53.3 allocs/op, 16.0 logallocs/op, 14.9 tasks/op, 49904 insns/op, 
20471 cycles/op, 0 errors) throughput: mean= 809065.31 standard-deviation=6222.39 median= 810507.10 median-absolute-deviation=1827.99 maximum=815213.41 minimum=782104.84 instructions_per_op: mean= 49905.50 standard-deviation=21.81 median= 49900.12 median-absolute-deviation=7.72 maximum=50010.97 minimum=49892.57 cpu_cycles_per_op: mean= 20429.00 standard-deviation=41.40 median= 20425.18 median-absolute-deviation=29.11 maximum=20530.74 minimum=20355.42 ``` Closes scylladb/scylladb#29396	2026-04-09 16:00:41 +03:00
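The three runs above can be compared per operation with a little arithmetic. The figures are copied from the reported results; attributing the whole difference to the audit sink assumes the runs are otherwise identical.

```cpp
#include <cassert>

// Figures copied from the three perf-simple-query runs above.
struct run {
    double tps;
    double insns_per_op;
};

constexpr run audit_default_syslog{110263.72, 254900};
constexpr run audit_unix_socket{404929.62, 77406};
constexpr run no_audit{808970.95, 49904};

// Extra instructions per operation compared with the no-audit baseline.
constexpr double extra_insns(run r) { return r.insns_per_op - no_audit.insns_per_op; }

// How many times higher the no-audit throughput is than the given run's.
constexpr double slowdown(run r) { return no_audit.tps / r.tps; }
```

Relative to running without audit, the default syslog sink adds about 205k instructions per operation (roughly a 7.3x throughput drop), while the Unix-socket sink adds about 27.5k (roughly halving throughput).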
Anna Stuchlik	c6587c6a70	doc: Fix malformed markdown link in alternator network docs Fixes https://github.com/scylladb/scylladb/issues/29400 Closes scylladb/scylladb#29402	2026-04-09 15:54:43 +03:00
Botond Dénes	5886d1841a	Merge 'cmake: align CMake build system with configure.py and add comparison script' from Ernest Zaslavsky Every time someone modifies the build system — adding a source file, changing a compilation flag, or wiring a new test — the change tends to land in only one of our two build systems (configure.py or CMake). Over time this causes three classes of problems: 1. CMake stops compiling entirely. Missing defines, wrong sanitizer flags, or misplaced subdirectory ordering cause hard build failures that are only discovered when someone tries to use CMake (e.g. for IDE integration). 2. Missing build targets. Tests or binaries present in configure.py are never added to CMake, so `cmake --build` silently skips them. This PR fixes several such cases (e.g. `symmetric_key_test`, `auth_cache_test`, `sstable_tablet_streaming`). 3. Missing compilation units in targets. A `.cc` file is added to a test binary in one system but not the other, causing link errors or silently omitted test coverage. To fix the existing drift and prevent future divergence, this series: Adds a build-system comparison script (`scripts/compare_build_systems.py`) that configures both systems into a temporary directory, parses their generated `build.ninja` files, and compares per-file compilation flags, link target sets, and per-target libraries. configure.py is treated as the baseline; CMake must match it. The script supports a `--ci` mode suitable for gating PRs that touch build files. Fixes all current mismatches found by the script: - Mode flag alignment in `mode.common.cmake` and `mode.Coverage.cmake` (sanitizer flags, `-fno-lto`, stack-usage warnings, coverage defines). - Global define alignment (`SEASTAR_NO_EXCEPTION_HACK`, `XXH_PRIVATE_API`, `BOOST_ALL_DYN_LINK`, `SEASTAR_TESTING_MAIN` placement). - Seastar build configuration (shared vs static per mode, coverage sanitizer link options). - Abseil sanitizer flags (`-fno-sanitize=vptr`). 
- Missing test targets in `test/boost/CMakeLists.txt`. - Redundant per-test flags now covered by global settings. - Lua library resolution via a custom `cmake/FindLua.cmake` using pkg-config, matching configure.py's approach. Adds documentation (`docs/dev/compare-build-systems.md`) describing how to run the script and interpret its output. No backport needed — this is build infrastructure improvement only. Closes scylladb/scylladb#29273 * github.com:scylladb/scylladb: scripts: remove lua library rename workaround from comparison script cmake: add custom FindLua using pkg-config to match configure.py test/cmake: add missing tests to boost test suite test/cmake: remove per-test LTO disable cmake: add BOOST_ALL_DYN_LINK and strip per-component defines cmake: move SEASTAR_TESTING_MAIN after seastar and abseil subdirs cmake: add -fno-sanitize=vptr for abseil sanitizer flags cmake: align Seastar build configuration with configure.py cmake: align global compile defines and options with configure.py cmake: fix Coverage mode in mode.Coverage.cmake cmake: align mode.common.cmake flags with configure.py configure.py: add sstable_tablet_streaming to combined_tests docs: add compare-build-systems.md scripts: add compare_build_systems.py to compare ninja build files	2026-04-09 15:46:09 +03:00
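The per-file flag comparison boils down to a set difference with configure.py as the baseline. The C++ sketch below is a hypothetical illustration of that idea only; the actual tool is the Python script `scripts/compare_build_systems.py`. It assumes both `build.ninja` files have already been parsed into a map from source file to compile flags.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Source file -> set of compile flags, as extracted from one build.ninja.
using flag_map = std::map<std::string, std::set<std::string>>;

// Hypothetical record of one per-file mismatch.
struct mismatch {
    std::string file;
    std::vector<std::string> missing_in_cmake;  // flags only configure.py uses
    std::vector<std::string> extra_in_cmake;    // flags only CMake uses
};

// Compares CMake against the configure.py baseline, file by file.
std::vector<mismatch> compare_flags(const flag_map& baseline, const flag_map& cmake) {
    static const std::set<std::string> no_flags;
    std::vector<mismatch> result;
    for (const auto& [file, base_flags] : baseline) {
        auto it = cmake.find(file);
        // A file CMake does not compile at all counts as missing every flag.
        const std::set<std::string>& cmake_flags = (it == cmake.end()) ? no_flags : it->second;
        mismatch m{file, {}, {}};
        for (const auto& flag : base_flags) {
            if (cmake_flags.count(flag) == 0) {
                m.missing_in_cmake.push_back(flag);
            }
        }
        for (const auto& flag : cmake_flags) {
            if (base_flags.count(flag) == 0) {
                m.extra_in_cmake.push_back(flag);
            }
        }
        if (!m.missing_in_cmake.empty() || !m.extra_in_cmake.empty()) {
            result.push_back(std::move(m));
        }
    }
    return result;
}
```

Files that exist only in the CMake map are not reported in this sketch; a fuller check would also walk the CMake side.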
Yaniv Michael Kaul	13879b023f	tracing: set_skip_when_empty() for error-path metrics Add .set_skip_when_empty() to all error-path metrics in the tracing module. Tracing itself is not a commonly used feature, making all of these metrics almost always zero: Tier 1 (very rare - corruption/schema issues): - tracing_keyspace_helper::bad_column_family_errors: tracing schema missing or incompatible, should never happen post-bootstrap - tracing::trace_errors: internal error building trace parameters Tier 2 (overload - tracing backend saturated): - tracing::dropped_sessions: too many pending sessions - tracing::dropped_records: too many pending records Tier 3 (general tracing write errors): - tracing_keyspace_helper::tracing_errors: errors during writes to system_traces keyspace Since tracing is an opt-in feature that most deployments rarely use, all five metrics are almost always zero and create unnecessary reporting overhead. AI-Assisted: yes Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes scylladb/scylladb#29346	2026-04-09 14:28:16 +03:00
Michael Litvak	3964040008	docs/dev: add counters doc Add a documentation of the counters feature implementation in docs/dev/counters.md. The documentation is taken from the wiki and updated according to the current state of the code - legacy details are removed, and a section about the counter id is added.	2026-04-09 13:08:02 +02:00
Michael Litvak	b71762d5da	counters: reuse counter IDs by rack For counter updates, use a counter ID that is constructed from the node's rack instead of the node's host ID. A rack holds at most two active replicas of a given tablet at a time: a single normal replica, plus a pending replica during tablet migration. Therefore we can have two unique counter IDs per rack that are reused by all replicas in the rack. We construct the counter ID from the rack UUID, which is constructed from the name "dc:rack". The pending replica uses a deterministic variation of the rack's counter ID by negating it. This improves the performance and size of counter cells by having fewer unique counter IDs and fewer counter shards in a counter cell. Previously the number of counter shards was the number of different host_ids that updated the counter, which is typically the number of nodes in the cluster and can keep growing indefinitely as nodes are replaced. With the rack-based counter ID, the number of counter shards will be at most twice the number of different racks (including removed racks, which should not be significant). Fixes SCYLLADB-356	2026-04-09 13:08:02 +02:00
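A sketch of how two IDs per rack could be derived deterministically. The following are all hypothetical assumptions: the ID is modeled as two `std::uint64_t` halves, `std::hash` stands in for the name-based UUID built from `"dc:rack"`, and "negating" is modeled as two's-complement negation of each half. The names `counter_id`, `rack_counter_id`, and `pending_counter_id` are illustrative only.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical 128-bit counter ID split into two halves.
struct counter_id {
    std::uint64_t msb;
    std::uint64_t lsb;
    bool operator==(const counter_id& o) const { return msb == o.msb && lsb == o.lsb; }
    bool operator!=(const counter_id& o) const { return !(*this == o); }
};

// ID used by a rack's normal replica, derived only from "dc:rack".
// std::hash stands in for the real name-based UUID construction.
counter_id rack_counter_id(const std::string& dc, const std::string& rack) {
    const std::string name = dc + ":" + rack;
    const std::hash<std::string> h{};
    return counter_id{h(name), h(name + "/lsb")};
}

// ID used by the rack's pending replica during migration: negate both
// halves (unsigned negation is well defined, modulo 2^64).
counter_id pending_counter_id(const counter_id& id) {
    return counter_id{0 - id.msb, 0 - id.lsb};
}
```

Because both functions are pure, every replica in a rack computes the same pair of IDs without coordination, which is what bounds the number of counter shards per rack.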
Yaniv Michael Kaul	2c0076d3ef	replica: set_skip_when_empty() for rare error-path metrics Add .set_skip_when_empty() to four metrics in replica/database.cc that are only incremented on very rare error paths and are almost always zero: - database::dropped_view_updates: view updates dropped due to overload. NOTE: this metric appears to never be incremented in the current codebase and may be a candidate for removal. - database::multishard_query_failed_reader_stops: documented as a 'hard badness counter' that should always be zero. NOTE: no increment site was found in the current codebase; may be a candidate for removal. - database::multishard_query_failed_reader_saves: documented as a 'hard badness counter' that should always be zero. - database::total_writes_rejected_due_to_out_of_space_prevention: only fires when disk utilization is critical and user table writes are disabled, a very rare operational state. These metrics create unnecessary reporting overhead when they are perpetually zero. set_skip_when_empty() suppresses them from metrics output until they become non-zero. AI-Assisted: yes Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes scylladb/scylladb#29345	2026-04-09 14:07:28 +03:00
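The effect of `set_skip_when_empty()` on exported output can be modeled with a toy registry. This is a hypothetical stand-in, not Seastar's metrics API; only the rule (a flagged metric stays hidden while its value is zero) comes from the commit messages.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical metric record; not the seastar::metrics types.
struct metric {
    std::string name;
    std::uint64_t value = 0;
    bool skip_when_empty = false;
};

// Returns the names that would appear in an export. Flagged metrics are
// suppressed until they become non-zero; all others are always exported.
std::vector<std::string> exported_names(const std::vector<metric>& metrics) {
    std::vector<std::string> out;
    for (const auto& m : metrics) {
        if (m.skip_when_empty && m.value == 0) {
            continue;
        }
        out.push_back(m.name);
    }
    return out;
}
```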
Botond Dénes	86417d49de	Merge 'transport: improve memory accounting for big responses and slow network' from Marcin Maliszkiewicz After obtaining the CQL response, check if its actual size exceeds the initially acquired memory permit. If so, acquire additional semaphore units and adopt them into the permit, ensuring accurate memory accounting for large responses. Additionally, move the permit into a .then() continuation so that the semaphore units are kept alive until write_message finishes, preventing premature release of memory permit. This is especially important with slow networks and big responses when buffers can accumulate and deplete a node's memory. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1306 Related https://scylladb.atlassian.net/browse/SCYLLADB-740 Backport: all supported versions Closes scylladb/scylladb#29288 * github.com:scylladb/scylladb: transport: add per-service-level pending response memory metric transport: hold memory permit until response write completes transport: account for response size exceeding initial memory estimate	2026-04-09 13:36:31 +03:00
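The top-up step can be sketched with a simple byte-counting semaphore. `memory_semaphore` and `permit` are hypothetical, synchronous stand-ins; the real transport code acquires units asynchronously and keeps the permit alive in a `.then()` continuation rather than a block scope.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical byte-counting semaphore (synchronous, for illustration).
struct memory_semaphore {
    std::size_t available;

    bool try_acquire(std::size_t n) {
        if (n > available) {
            return false;
        }
        available -= n;
        return true;
    }
    void release(std::size_t n) { available += n; }
};

// Hypothetical permit: owns units already taken from the semaphore and
// returns them when it is destroyed.
class permit {
public:
    permit(memory_semaphore& sem, std::size_t units) : _sem(sem), _units(units) {}
    permit(const permit&) = delete;
    permit& operator=(const permit&) = delete;
    ~permit() { _sem.release(_units); }

    std::size_t units() const { return _units; }

    // If the serialized response is larger than the initial estimate,
    // acquire the shortfall and adopt it into this permit.
    bool account_for_response(std::size_t response_size) {
        if (response_size <= _units) {
            return true;
        }
        const std::size_t extra = response_size - _units;
        if (!_sem.try_acquire(extra)) {
            return false;  // real code would wait for units instead
        }
        _units += extra;
        return true;
    }

private:
    memory_semaphore& _sem;
    std::size_t _units;
};
```

Releasing in the destructor mirrors the second fix: the units return to the semaphore only when the permit's owner (here, a block scope) ends, not when the response is merely built.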
Yaniv Michael Kaul	5c8b4a003e	db: set_skip_when_empty() for rare error-path metrics Add .set_skip_when_empty() to four metrics in the db module that are only incremented on very rare error paths and are almost always zero: - cache::pinned_dirty_memory_overload: described as 'should sit constantly at 0, nonzero is indicative of a bug' - corrupt_data::entries_reported: only fires on actual data corruption - hints::corrupted_files: only fires on on-disk hint file corruption - rate_limiter::failed_allocations: only fires when the rate limiter hash table is completely full and gives up allocating, requiring extreme cardinality pressure These metrics create unnecessary reporting overhead when they are perpetually zero. set_skip_when_empty() suppresses them from metrics output until they become non-zero. AI-Assisted: yes Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes scylladb/scylladb#29344	2026-04-09 13:32:09 +03:00
Gleb Natapov	dbaba7ab8a	storage_service: cleanup unused code Remove unused definition and double includes.	2026-04-09 13:31:41 +03:00
Gleb Natapov	b050b593b3	storage_service: simplify get_peer_info_for_update It does nothing for fields managed in raft, so drop their processing.	2026-04-09 13:31:41 +03:00
Gleb Natapov	d0576c109f	gossiper: send shutdown notifications in parallel	2026-04-09 13:31:40 +03:00
Gleb Natapov	1586fa65af	gms: remove unused code Also moved version_string(...) and make_token_string(...) to private: — they are internal helpers used only by normal(), not part of the public API	2026-04-09 13:31:40 +03:00
Gleb Natapov	b2e35c538f	virtual_tables: no need to call gossiper if we already know that the node is in shutdown	2026-04-09 13:31:40 +03:00
Gleb Natapov	e17fc180a0	gossiper: print node state from raft topology in the logs Raft topology now holds the node's real state. Gossiper states are now set to NORMAL and SHUTDOWN only.	2026-04-09 13:31:40 +03:00
Gleb Natapov	8439154851	gossiper: use is_shutdown instead of code it manually	2026-04-09 13:31:39 +03:00
Gleb Natapov	7d700d0377	gossiper: mark endpoint_state(inet_address ip) constructor as explicit The get_live_members function called is_shutdown with an inet_address argument, which caused a temporary endpoint_state to be created. Fix it by prohibiting the implicit conversion and calling the correct is_shutdown function instead.	2026-04-09 13:31:39 +03:00
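The bug class can be reproduced in isolation. In this hypothetical sketch (simplified `inet_address` and `endpoint_state`, not the gossiper's real types), `explicit` turns the silent conversion into a compile error, which pushes callers toward an overload that looks up the node's actual state.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>
#include <utility>

// Simplified, hypothetical stand-ins for the gossiper types.
struct inet_address {
    std::string ip;
    bool operator<(const inet_address& o) const { return ip < o.ip; }
};

struct endpoint_state {
    // `explicit` forbids turning an address into a fresh, empty state object.
    explicit endpoint_state(inet_address a) : addr(std::move(a)) {}
    inet_address addr;
    bool shutdown = false;
};

// With `explicit`, the implicit conversion no longer compiles.
static_assert(!std::is_convertible_v<inet_address, endpoint_state>,
              "inet_address must not implicitly convert to endpoint_state");

// Inspects a state object the caller already holds.
bool is_shutdown(const endpoint_state& st) {
    return st.shutdown;
}

// Looks up the node's known state by address; unknown nodes are not shut down.
bool is_shutdown(const std::map<inet_address, endpoint_state>& states, const inet_address& ip) {
    auto it = states.find(ip);
    return it != states.end() && is_shutdown(it->second);
}
```

In this sketch, without `explicit`, a call such as `is_shutdown(ip)` would compile against the first overload by building a temporary `endpoint_state` whose `shutdown` flag is always false.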
Gleb Natapov	6df4f572d5	gossiper: remove unused code	2026-04-09 13:31:39 +03:00
Gleb Natapov	67102496c8	gossiper: drop last use of LEFT state and drop the state Decommission sets the LEFT gossiper state only to prevent the node from issuing a shutdown notification during shutdown. Since the notification code now checks the state in raft topology, this is no longer needed.	2026-04-09 13:31:39 +03:00
Gleb Natapov	54d2c95094	gossiper: drop unused STATUS_BOOTSTRAPPING state	2026-04-09 13:31:38 +03:00
Gleb Natapov	7c895ced19	gossiper: rename is_dead_state to is_left since this is all that the function checks now.	2026-04-09 13:31:38 +03:00
Gleb Natapov	7dfb0577b8	gossiper: use raft topology state instead of gossiper one when checking node's state Raft topology state is the source of truth for the node's state, so use it instead of the gossiper one.	2026-04-09 13:31:38 +03:00
Gleb Natapov	c17c4806a1	storage_service: drop check_for_endpoint_collision function All the checks that it does are also done by join coordinator and the join coordinator uses more reliable raft state instead of gossiper one.	2026-04-09 13:31:37 +03:00
Gleb Natapov	1ac8edb22b	storage_service: drop is_first_node function It makes no sense now, since the first node to bootstrap is determined by the discover_group0 algorithm.	2026-04-09 13:31:37 +03:00
Gleb Natapov	681aa9ebe1	gossiper: remove unused REMOVED_TOKEN state	2026-04-09 13:31:37 +03:00
Gleb Natapov	5af17aa578	gossiper: remove unused advertise_token_removed function	2026-04-09 13:31:36 +03:00
Dawid Mędrek	f0dfe29d88	service: strong_consistency: Abort state_machine::apply when aborting server The state machine used by strongly consistent tablets may block on a read barrier if the local schema is insufficient to resolve pending mutations [1]. To deal with that, we perform a read barrier that may block for a long time. When a strongly consistent tablet is being removed, we'd like to cancel all ongoing executions of `state_machine::apply`: the shard is no longer responsible for the tablet, so it doesn't matter what the outcome is. --- In the implementation, we abort the operations by simply throwing an exception from `state_machine::apply` and not doing anything. That's a red flag considering that it may lead to the instance being killed on the spot [2]. Fortunately for us, strongly consistent tables use the default Raft server implementation, i.e. `raft::server_impl`, which actually handles one type of exception thrown by the method, namely `abort_requested_exception`, which is the default exception thrown by `seastar::abort_source` [3]. We leverage this property. --- Unfortunately, `raft::server_impl::abort` isn't perfectly suited for us. If we look into its code, we'll see that the relevant portion of the procedure boils down to three steps: 1. Prevent scheduling adding new entries. 2. Wait for the applier fiber. 3. Abort the state machine. Since aborting the state machine happens only after the applier fiber has already finished, there will no longer be anything to abort. Either all executions of `state_machine::apply` have already finished, or they are hanging and we cannot do anything. That's a pre-existing problem that we won't be solving here (even though it's possible). We hope the problem will be solved, and it seems likely: the code suggests that the behavior is not intended. For more details, see e.g. [4]. --- We provide two validation tests. 
They simulate the abortion of `state_machine::apply` in two different scenarios: * when the table is dropped (which should also cover the case of tablet migration), * when the node is shutting down. The value of the tests isn't high since they don't ensure that the state of the group is still valid (though it should be), nor do they perform any other check. Instead, we rely on the testing framework to spot any anomalies or errors. That's probably the best we can do at the moment. Unfortunately, both tests are marked as skipped because of the current limitations of `raft::server_impl::abort` described above and in [4]. References: [1] `4c8dba1` [2] See the description of `raft::state_machine` in `raft/raft.hh`. [3] See `server_impl::applier_fiber` in `raft/server.cc`. [4] SCYLLADB-1056	2026-04-09 11:36:51 +02:00
Dawid Mędrek	ad8a263683	service: strong_consistency: Abort ongoing operations when shutting down These changes are complementary to those from a recent commit where we handled aborting ongoing operations during tablet events, such as tablet migration. In this commit, we consider the case of shutting down a node. When a node is shutting down, we eventually close the connections. When the client can no longer get a response from the server, it makes no sense to continue with the queries. We'd like to cancel them at that point. We leverage the abort source passed via `client_state` down to the strongly consistent coordinator. This way, the transport layer can communicate with it and signal that the queries should be canceled. The abort source is triggered by the CQL server (cf. `generic_server::server::{stop,shutdown}`). --- Note that this is not an optional change. In fact, if we don't abort those requests, we might hang for an indefinite amount of time when executing the following code in `main.cc`: ``` // Register at_exit last, so that storage_service::drain_on_shutdown will be called first auto do_drain = defer_verbose_shutdown("local storage", [&ss] { ss.local().drain_on_shutdown().get(); }); ``` The problem boils down to the fact that `generic_server::server::stop` will wait for all connections to be closed, but that won't happen until all ongoing operations (at least those to strongly consistent tables) are finished. It's important to highlight that even though we hang on this, the client can no longer get any response. Thus, it's crucial that at that point we simply abort ongoing operations to proceed with the rest of shutdown. --- Two tests are added to verify that the implementation is correct: one focusing on local operations, the other -- on a forwarded write. 
Difference in time spent on the whole test file `test_strong_consistency.py` on my local machine, in dev mode: Before: ``` real 0m31.775s user 1m4.475s sys 0m22.615s ``` After: ``` real 0m32.024s user 1m10.751s sys 0m23.871s ``` Individual runs of the added tests: test_queries_when_shutting_down: ``` real 0m12.818s user 0m36.726s sys 0m4.577s ``` test_abort_forwarded_write_upon_shutdown: ``` real 0m12.930s user 0m36.622s sys 0m4.752s ```	2026-04-09 11:36:17 +02:00
Dawid Mędrek	4a87bdc778	service: client_state: Extend with abort_source We make `client_state` store a pointer to an `abort_source`. This will be useful in the following commit that will implement aborting ongoing requests to strongly consistent tables upon connection shutdowns. It might also be useful in some other places in the code in the future. We set the abort source for client states in relevant places.	2026-04-09 11:35:35 +02:00
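A minimal, single-threaded sketch of the pattern: the connection owns the abort source, `client_state` stores only a non-owning pointer (null when there is no connection), and the coordinator checks it before doing more work. The `abort_source` and `abort_requested_exception` below are hypothetical stand-ins that borrow names from Seastar; they are not the Seastar classes, and the real code reacts to the abort asynchronously rather than by polling.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in that borrows names from seastar::abort_source.
class abort_source {
public:
    void request_abort() { _aborted = true; }
    bool abort_requested() const { return _aborted; }

private:
    bool _aborted = false;
};

// Hypothetical stand-in for the exception type Raft treats as an abort.
struct abort_requested_exception : std::runtime_error {
    abort_requested_exception() : std::runtime_error("abort requested") {}
};

// client_state keeps a non-owning pointer; null means "no connection to watch".
struct client_state {
    abort_source* as = nullptr;
};

// Hypothetical coordinator check performed before continuing an operation.
void check_not_aborted(const client_state& cs) {
    if (cs.as != nullptr && cs.as->abort_requested()) {
        throw abort_requested_exception();
    }
}
```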
Dawid Mędrek	89c049b889	service: strong_consistency: Handle abort when removing Raft group When a strongly consistent Raft group is being removed, it means one of the following cases: (A) The node is shutting down and it's simply part of the shutdown procedure. (B) The tablet is somehow leaving the replica. For example, due to: - Tablet migration - Tablet split/merge - Tablet removal (e.g. because the table is dropped) In this commit, we focus on case (B). Case (A) will be handled in the following one. --- The changes in the code are literally none, and there's a reason for it. First, let's note that we've already implemented abortion of timed-out requests. There is a limit to how long a query can run and sooner or later it will finish, regardless of what we do. Second, we need to ask ourselves if the case we're considering in this commit (i.e. case (B)) is a situation where we'd like to speed up the process. The answer is no. Tablet migrations are effectively internal operations that are invisible to the users. User requests are, quite obviously, the opposite of that. Because of that, we want to patiently wait for the queries to finish or time out, even though it's technically possible to abort them earlier. Lastly, the changes in the code that actually appear in this commit are not completely irrelevant either. We consider the important case of the `leader_info_updater` fiber and argue that it's safe to not pass any abort source to the Raft methods used by it. --- Unfortunately, we don't have tablet migrations implemented yet [1], so our testing capabilities are limited. Still, we provide a new test that corresponds to case (B) described above. We simulate a tablet migration by dropping a table and observe how reads and writes behave in such a situation. There's no extremely careful validation involved there, but that's what we can have for the time being. 
Difference in time spent on the whole test file `test_strong_consistency.py` on my local machine, in dev mode: Before: ``` real 0m30.841s user 1m3.294s sys 0m21.091s ``` After: ``` real 0m31.775s user 1m4.475s sys 0m22.615s ``` The time spent on the new test only: ``` real 0m5.264s user 0m34.646s sys 0m3.374s ``` References: [1] SCYLLADB-868	2026-04-09 11:35:31 +02:00
Dawid Mędrek	7dcc3e85b9	service: strong_consistency: Abort Raft operations on timeout If a query (either a write or a read) to a strongly consistent table times out, we immediately abort the operation and throw an exception. Unfortunately, the inconsistency in exception types thrown on timeout by the many methods we use in the code results in pretty messy `try-catch` clauses. Perhaps there's a better alternative to this, but it's beyond the scope of this work, so we leave it as-is. We provide a validation test that consists of three cases corresponding to reads, writes, and waiting for the leader. They verify that the code works as expected in all affected places. A comparison of time spent on the whole `test_strong_consistency.py` on my local machine, in dev mode: Before: ``` real 0m32.185s user 0m55.391s sys 0m15.745s ``` After: ``` real 0m30.841s user 1m3.294s sys 0m21.091s ``` The time spent on the new test only: ``` real 0m7.077s user 0m35.359s sys 0m3.717s ```	2026-04-09 11:35:04 +02:00
Piotr Szymaniak	65a1bdd368	docs: document Alternator auditing in the operator-facing auditing guide - Document Alternator (DynamoDB-compatible API) auditing support in the operator-facing auditing guide (docs/operating-scylla/security/auditing.rst) - Cover operation-to-category mapping, operation field format, keyspace/table filtering, and audit log examples - Document the audit_tables=alternator.<table> shorthand format - Minor wording improvements throughout (Scylla -> ScyllaDB, clarify default audit backend) Closes scylladb/scylladb#29231	2026-04-09 12:26:57 +03:00
Dawid Mędrek	2243e0ffea	service: strong_consistency: Use timeout when mutating We remove the inconsistency between reads and writes to strongly consistent tables. Before the commit, only reads used a timeout. Now, writes do as well. Although the parameter isn't used yet, that will change in the following commit. This is a prerequisite for it.	2026-04-09 11:25:57 +02:00
Dawid Mędrek	fd9c907be1	service: strong_consistency: Fix indentation	2026-04-09 11:25:57 +02:00
Dawid Mędrek	ca7f24516e	service: strong_consistency: Enclose coordinator methods with try-catch We enclose `coordinator::{mutate,query}` with `try-catch` clauses. They do nothing at the moment, but we'll use them later. We do this now to avoid noise in the upcoming commits. We'll fix the indentation in the following commit.	2026-04-09 11:25:57 +02:00
Dawid Mędrek	e9ea9e7259	service: strong_consistency: Crash at unexpected exception The loop shouldn't throw any exception other than the ones already covered by the `catch` clauses. Crash, at least when `abort_on_internal_error` is set, if we catch any other type, since that may be a sign of a bug.	2026-04-09 11:25:57 +02:00
Dawid Mędrek	f499a629ab	test: cluster: Extract default config & cmdline in test_strong_consistency.py All used configs and cmdlines share the same values. Let's extract them to avoid repeating them every time a new test is written. Those options should be enabled for all tests in the file anyway.	2026-04-09 11:25:57 +02:00
Geoff Montee	7d7ec7025e	docs: Document system keyspaces for developers / internal usage Fixes #29043 with the following docs changes: - docs/dev/system-keyspaces.md: Added a new file that documents all keyspaces created internally Closes scylladb/scylladb#29044	2026-04-09 11:49:58 +03:00
Guy Shtub	40a861016a	docs/faq.rst: Fixing small spelling mistake Closes scylladb/scylladb#29131	2026-04-09 11:48:46 +03:00
Pavel Emelyanov	78f5bab7cf	table: Add formatter for group_id argument in tablet merge exception message Fixes: SCYLLADB-1432 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#29143	2026-04-09 11:45:57 +03:00
Botond Dénes	fbbe2bdce8	Merge 'Introduce repair_service::config and cut dependency from db::config' from Pavel Emelyanov Spreading db::config around and making all services depend on it is not nice. Most other services that need configuration provide their own config that's populated from db::config in main.cc/cql_test_env.cc and use it, not the global config. This PR does the same for repair_service. Improves component dependencies, not backporting Closes scylladb/scylladb#29153 * github.com:scylladb/scylladb: repair: Remove db/config.hh from repair/*.cc files repair: Move repair_multishard_reader options onto repair_service::config repair: Move critical_disk_utilization_level onto repair_service::config repair: Move repair_partition_count_estimation_ratio onto repair_service::config repair: Move repair_hints_batchlog_flush_cache_time_in_ms onto repair_service::config repair: Move enable_small_table_optimization_for_rbno onto repair_service::config repair: Introduce repair_service::config	2026-04-09 11:44:25 +03:00
Botond Dénes	76c8794f4f	Merge 'Strong consistency: allow taking snapshots (but not transfer) and make them less likely' from Piotr Dulikowski While working on benchmarks for strong consistency we noticed that the raft logic attempted to take snapshots during the benchmark. Snapshot transfer is not implemented for strong consistency yet and the methods that take or transfer snapshots throw exceptions. This causes the raft groups to stop working completely. While implementing snapshot transfers is out of scope, we can implement some mitigations now to stop the tests from breaking: - The first commit adjusts the configuration options. First, it disables periodic snapshotting (i.e. creating a snapshot every X log entries). Second, it increases the memory threshold for the raft log before which a snapshot is created from 2MB to 10MB. - The second commit relaxes the take snapshot / drop snapshot methods and makes it possible to actually use them - they are no-ops. It is still forbidden to transfer snapshots. I am including both commits because applying only the first one didn't completely prevent the issue from occurring when testing locally. Refs: SCYLLADB-1115 Strong consistency is experimental, no need for backport. Closes scylladb/scylladb#29189 * github.com:scylladb/scylladb: strong_consistency: fake taking and dropping snapshots strong_consistency: adjust limits for snapshots	2026-04-09 11:44:03 +03:00
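The two mitigations can be sketched together. `snapshot_policy` and `fake_snapshot_store` are hypothetical names, and the periodic interval value is made up. Only the disabled periodic snapshots, the 2 MB to 10 MB memory threshold (modeled here in MiB), and the no-op take/drop versus the still-throwing transfer come from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical snapshot trigger policy reflecting the adjusted limits.
struct snapshot_policy {
    bool periodic_enabled = false;                    // "every X entries" snapshots disabled
    std::size_t entries_period = 1024;                // made-up value, ignored while disabled
    std::size_t memory_threshold = 10 * 1024 * 1024;  // raised from 2 MB

    bool should_snapshot(std::size_t entries_since_snapshot, std::size_t log_bytes) const {
        if (periodic_enabled && entries_since_snapshot >= entries_period) {
            return true;
        }
        return log_bytes >= memory_threshold;
    }
};

// Hypothetical snapshot hooks: taking and dropping are no-ops,
// while transferring is still unsupported.
struct fake_snapshot_store {
    void take_snapshot() {}
    void drop_snapshot() {}
    void transfer_snapshot() {
        throw std::logic_error("snapshot transfer is not implemented");
    }
};
```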
Anna Stuchlik	dd34d2afb4	doc: remove references to old versions from Docker Hub docs This commit removes references to ScyllaDB versions ("Since x.y") from the ScyllaDB documentation on Docker Hub, as they are redundant and confusing (some versions are super ancient). Fixes SCYLLADB-1212 Closes scylladb/scylladb#29204	2026-04-09 11:43:40 +03:00
Botond Dénes	c162277b28	Merge 'Perform full connection set-up for CertificateAuthorization in process_startup()' from Pavel Emelyanov The code responds early with a READY message but lacks some necessary setup, namely: * update_scheduling_group(): without it, the connection runs under the default scheduling group instead of the one mapped to the user's service level. * on_connection_ready(): without it, the connection never releases its slot in the uninitialized-connections concurrency semaphore (acquired at connection creation), leaking one unit per cert-authenticated connection for the lifetime of the connection. * _authenticating = false / _ready = true: without them, system.clients reports connection_stage = AUTHENTICATING forever instead of READY (not critical, but not nice either) The PR fixes it and adds a regression test that (for sanity) also covers the AllowAll and Password authenticators Fixes SCYLLADB-1226 Present since 2025.1, probably worth backporting Closes scylladb/scylladb#29220 * github.com:scylladb/scylladb: transport: fix process_startup cert-auth path missing connection-ready setup transport: test that connection_stage is READY after auth via all process_startup paths	2026-04-09 11:43:02 +03:00
Raphael S. Carvalho	16e387d5f9	repair/replica: Fix race window where post-repair data is wrongly promoted to repaired During incremental repair, each tablet replica holds three SSTable views: UNREPAIRED, REPAIRING, and REPAIRED. The repair lifecycle is: 1. Replicas snapshot unrepaired SSTables and mark them REPAIRING. 2. Row-level repair streams missing rows between replicas. 3. mark_sstable_as_repaired() runs on all replicas, rewriting the SSTables with repaired_at = sstables_repaired_at + 1 (e.g. N+1). 4. The coordinator atomically commits sstables_repaired_at=N+1 and the end_repair stage to Raft, then broadcasts repair_update_compaction_ctrl which calls clear_being_repaired(). The bug lives in the window between steps 3 and 4. After step 3, each replica has on-disk SSTables with repaired_at=N+1, but sstables_repaired_at in Raft is still N. The classifier therefore sees: is_repaired(N, sst{repaired_at=N+1}) == false sst->being_repaired == null (lost on restart, or not yet set) and puts them in the UNREPAIRED view. If a new write arrives and is flushed (repaired_at=0), STCS minor compaction can fire immediately and merge the two SSTables. The output gets repaired_at = max(N+1, 0) = N+1 because compaction preserves the maximum repaired_at of its inputs. Once step 4 commits sstables_repaired_at=N+1, the compacted output is classified REPAIRED on the affected replica even though it contains data that was never part of the repair scan. Other replicas, which did not experience this compaction, classify the same rows as UNREPAIRED. This divergence is never healed by future repairs because the repaired set is considered authoritative. The result is data resurrection: deleted rows can reappear after the next compaction that merges unrepaired data with the wrongly-promoted repaired SSTable. The fix has two layers: Layer 1 (in-memory, fast path): mark_sstable_as_repaired() now also calls mark_as_being_repaired(session) on the new SSTables it writes. 
This keeps them in the REPAIRING view from the moment they are created until repair_update_compaction_ctrl clears the flag after step 4, covering the race window in the normal (no-restart) case. Layer 2 (durable, restart-safe): a new is_being_repaired() helper on tablet_storage_group_manager detects the race window even after a node restart, when being_repaired has been lost from memory. It checks: sst.repaired_at == sstables_repaired_at + 1 AND tablet transition kind == tablet_transition_kind::repair Both conditions survive restarts: repaired_at is on-disk in SSTable metadata, and the tablet transition is persisted in Raft. Once the coordinator commits sstables_repaired_at=N+1 (step 4), is_repaired() returns true and the SSTable naturally moves to the REPAIRED view. The classifier in make_repair_sstable_classifier_func() is updated to call is_being_repaired(sst, sstables_repaired_at) in place of the previous sst->being_repaired.uuid().is_null() check. A new test, test_incremental_repair_race_window_promotes_unrepaired_data, reproduces the bug by: - Running repair round 1 to establish sstables_repaired_at=1. - Injecting delay_end_repair_update to hold the race window open. - Running repair round 2 so all replicas complete mark_sstable_as_repaired (repaired_at=2) but the coordinator has not yet committed step 4. - Writing post-repair keys to all replicas and flushing servers[1] to create an SSTable with repaired_at=0 on disk. - Restarting servers[1] so being_repaired is lost from memory. - Waiting for autocompaction to merge the two SSTables on servers[1]. - Asserting that the merged SSTable contains post-repair keys (the bug) and that servers[0] and servers[2] do not see those keys as repaired. NOTE FOR MAINTAINER: Copilot initially only implemented Layer 1 (the in-memory being_repaired guard), missing the restart scenario entirely. I pointed out that being_repaired is lost on restart and guided Copilot to add the durable Layer 2 check. 
I also polished the implementation: moving is_being_repaired into tablet_storage_group_manager so it can reuse the already-held _tablet_map (avoiding an ERM lookup and try/catch), passing sstables_repaired_at in from the classifier to avoid re-reading it, and using compaction_group_for_sstable inside the function rather than threading a tablet_id parameter through the classifier. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1239. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#29244	2026-04-09 11:42:28 +03:00
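The classification rule described in this commit can be sketched as a pure function of the SSTable's on-disk `repaired_at`, the committed `sstables_repaired_at`, and the tablet transition kind. The definition of `is_repaired()` used here (non-zero and not newer than the committed round) is an assumption inferred from the `is_repaired(N, sst{repaired_at=N+1}) == false` example. `in_race_window()` mirrors the two durable conditions of `is_being_repaired()`. All types are simplified stand-ins.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <initializer_list>

enum class tablet_transition_kind { none, repair, migration };
enum class repair_view { unrepaired, repairing, repaired };

struct sstable_info {
    std::int64_t repaired_at;       // SSTable metadata, survives restarts
    bool being_repaired_in_memory;  // layer 1 flag, lost on restart
};

// Assumption: repaired once repaired_at is set and not newer than the committed round.
bool is_repaired(std::int64_t sstables_repaired_at, const sstable_info& sst) {
    return sst.repaired_at != 0 && sst.repaired_at <= sstables_repaired_at;
}

// Layer 2: rewritten for round N+1 while the tablet is still in a repair transition.
bool in_race_window(const sstable_info& sst, std::int64_t sstables_repaired_at,
                    tablet_transition_kind kind) {
    return sst.repaired_at == sstables_repaired_at + 1 &&
           kind == tablet_transition_kind::repair;
}

repair_view classify(const sstable_info& sst, std::int64_t sstables_repaired_at,
                     tablet_transition_kind kind) {
    if (is_repaired(sstables_repaired_at, sst)) {
        return repair_view::repaired;
    }
    if (sst.being_repaired_in_memory || in_race_window(sst, sstables_repaired_at, kind)) {
        return repair_view::repairing;  // not compacted together with fresh writes
    }
    return repair_view::unrepaired;
}

// Compaction output keeps the largest repaired_at of its inputs, which is
// how unrepaired rows got promoted in the original bug.
std::int64_t compacted_repaired_at(std::initializer_list<std::int64_t> inputs) {
    return std::max(inputs);
}
```

For example, after a restart inside the window (N=1, SSTable rewritten with repaired_at=2, transition still `repair`), the SSTable lands in REPAIRING rather than UNREPAIRED, so it cannot be merged with a freshly flushed SSTable (repaired_at=0).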
Dawid Mędrek	a8bc90a375	Merge 'cql3: fix DESCRIBE INDEX WITH INTERNALS name' from Piotr Smaron This series fixes two related inconsistencies around secondary-index names. 1. `DESCRIBE INDEX ... WITH INTERNALS` returned the backing materialized-view name in the `name` column instead of the logical index name. 2. The snapshot REST API accepted backing table names for MV-backed secondary indexes, but not the logical index names exposed to users. The snapshot side now resolves logical secondary-index names to backing table names where applicable, reports logical index names in snapshot details, rejects vector index names with HTTP 400, and keeps multi-keyspace DELETE atomic by resolving all keyspaces before deleting anything. The tests were also extended accordingly, and the snapshot test helper was fixed to clean up multi-table snapshots using one DELETE per table. Fixes: SCYLLADB-1122 Minor bugfix, no need to backport. Closes scylladb/scylladb#29083 * github.com:scylladb/scylladb: cql3: fix DESCRIBE INDEX WITH INTERNALS name test: add snapshot REST API tests for logical index names test: fix snapshot cleanup helper api: clarify snapshot REST parameter descriptions api: surface no_such_column_family as HTTP 400 db: fix clear_snapshot() atomicity and use C++23 lambda form db: normalize index names in get_snapshot_details() db: add resolve_table_name() to snapshot_ctl	2026-04-09 08:37:51 +03:00
Piotr Dulikowski	ec0231c36c	Merge 'db/view/view_building_worker: lock staging sstables mutex for all necessary shards when creating tasks' from Michał Jadwiszczak To create `process_staging` view building tasks, we first need to collect information about them on shard0, create necessary mutations, commit them to group0 and move staging sstables objects to their original shards. But there is a possible race after committing the group0 command and before moving the staging sstables to their shards. Between those two events, the coordinator may schedule freshly created tasks and dispatch them to the worker but the worker won't have the sstables objects because they weren't moved yet. This patch fixes the race by holding `_staging_sstables_mutex` locks from all necessary shards when executing `create_staging_sstable_tasks()`. With this, even if the task is scheduled and dispatched quickly, the worker will wait to execute it until the sstables objects are moved and the locks are released. Fixes SCYLLADB-816 This PR should be backported to all versions containing view building coordinator (2025.4 and newer). Closes scylladb/scylladb#29174 * github.com:scylladb/scylladb: db/view/view_building_worker: fix indentation db/view/view_building_worker: lock staging sstables mutex for necessary shards when creating tasks	2026-04-09 08:37:51 +03:00
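The locking pattern can be illustrated with a thread-based analogy. Scylla runs on Seastar shards rather than OS threads, so `std::mutex` here only stands in for the per-shard `_staging_sstables_mutex`. `shards`, `lock_shards` and `create_tasks_locked` are hypothetical. Acquiring the locks in ascending shard order is a deadlock-avoidance choice made for this sketch, not something the commit states.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <vector>

// Hypothetical: one staging-sstables mutex per shard.
struct shards {
    std::vector<std::mutex> staging_sstables_mutex;
    explicit shards(std::size_t n) : staging_sstables_mutex(n) {}
};

// Locks every involved shard; std::set iterates in ascending order,
// giving a single global lock order.
std::vector<std::unique_lock<std::mutex>> lock_shards(shards& s, const std::set<std::size_t>& involved) {
    std::vector<std::unique_lock<std::mutex>> held;
    held.reserve(involved.size());
    for (std::size_t shard : involved) {
        held.emplace_back(s.staging_sstables_mutex.at(shard));
    }
    return held;
}

// Commit and move both happen under the locks, so a worker that needs one of
// these mutexes cannot start a task before its sstables have arrived.
template <typename Commit, typename Move>
void create_tasks_locked(shards& s, const std::set<std::size_t>& involved, Commit commit, Move move) {
    auto locks = lock_shards(s, involved);
    commit();  // tasks become visible to the coordinator here
    move();    // staging sstables reach their shards
}              // locks released when `locks` goes out of scope
```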
Piotr Smaron	d458ff50b0	cql3: fix DESCRIBE INDEX WITH INTERNALS name DESCRIBE INDEX ... WITH INTERNALS returned the name of the backing materialized view in the name column instead of the logical index name. Return the logical index name from schema::describe() for index schemas so all callers observe the user-facing name consistently. Fixes: SCYLLADB-1122	2026-04-08 13:38:17 +02:00
Piotr Smaron	04837ba20f	test: add snapshot REST API tests for logical index names Add focused REST coverage for logical secondary-index names in snapshot creation, deletion, and details output. Also cover vector-index rejection and verify multi-keyspace delete resolves all keyspaces before deleting anything so mixed index kinds cannot cause partial removal.	2026-04-08 13:38:17 +02:00
Piotr Smaron	6b85da3ce3	test: fix snapshot cleanup helper The snapshot REST helper cleaned up multi-table snapshots with a single DELETE request that passed a comma-separated cf filter, but the API accepts only one table name there. Delete each table snapshot separately so existing tests that snapshot multiple tables use the API as documented.	2026-04-08 13:36:27 +02:00
Piotr Smaron	3090684dad	api: clarify snapshot REST parameter descriptions Document the current /storage_service/snapshots behavior more accurately. For DELETE, cf is a table filter applied independently in each keyspace listed in kn. If cf is omitted or empty, snapshots for all tables are eligible, and secondary indexes can be addressed by their logical index name.	2026-04-08 13:36:27 +02:00
Piotr Smaron	6ee75c74bd	api: surface no_such_column_family as HTTP 400 Snapshot requests that name a non-existent table or a non-snapshotable logical index currently surface an internal server error. Translate no_such_column_family into a bad request so callers get a client-facing error that matches the invalid input.	2026-04-08 13:36:27 +02:00
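The translation can be sketched as an exception-to-status mapping around the handler. `no_such_column_family`, `http_reply` and `run_snapshot_handler` are simplified, hypothetical stand-ins here; the real code uses Seastar's HTTP server types.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the database's "no such table" exception.
struct no_such_column_family : std::runtime_error {
    using std::runtime_error::runtime_error;
};

struct http_reply {
    int status;
    std::string body;
};

// Runs a snapshot handler and maps its failures to HTTP status codes.
template <typename Handler>
http_reply run_snapshot_handler(Handler h) {
    try {
        return http_reply{200, h()};
    } catch (const no_such_column_family& e) {
        return http_reply{400, e.what()};  // invalid table or index name: client error
    } catch (const std::exception& e) {
        return http_reply{500, e.what()};  // anything else stays a server error
    }
}
```

The more specific handler must come first, since `no_such_column_family` also derives from `std::exception`.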
Piotr Smaron	7d83a264ac	db: fix clear_snapshot() atomicity and use C++23 lambda form clear_snapshot() applies a table filter independently in each keyspace, so logical index names must be resolved per keyspace on the delete path as well. Resolve all keyspaces before deleting anything so a later failure cannot partially remove a snapshot, and use the explicit-object-parameter coroutine lambda form for the asynchronous implementation.	2026-04-08 13:36:27 +02:00
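The resolve-everything-first pattern can be sketched as two phases: resolve the table filter in every keyspace, and only then delete. `clear_snapshot_sketch` and the `resolver` callback are hypothetical and do not reflect the real snapshot_ctl signatures; the coroutine and lambda form mentioned in the commit are not modelled.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Resolves a table filter within one keyspace; nullopt means "cannot resolve".
using resolver = std::function<std::optional<std::string>(const std::string& ks, const std::string& cf)>;

// Phase 1 resolves every keyspace; phase 2 deletes. A failure in any keyspace
// throws before anything has been removed.
std::vector<std::string> clear_snapshot_sketch(const std::vector<std::string>& keyspaces,
                                               const std::string& cf,
                                               const resolver& resolve,
                                               const std::function<void(const std::string&)>& remove) {
    std::vector<std::string> targets;
    for (const auto& ks : keyspaces) {
        auto table = resolve(ks, cf);
        if (!table) {
            throw std::invalid_argument("cannot resolve " + cf + " in " + ks);
        }
        targets.push_back(ks + "." + *table);
    }
    for (const auto& t : targets) {
        remove(t);
    }
    return targets;
}
```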
Piotr Smaron	39baa1870e	db: normalize index names in get_snapshot_details() Snapshot details exposed backing secondary-index view names instead of logical index names. Normalize index entries in get_snapshot_details() so the REST API reports the user-facing name, and update the existing REST test to assert that behavior directly.	2026-04-08 13:36:27 +02:00
Piotr Smaron	9c37f1def2	db: add resolve_table_name() to snapshot_ctl The snapshot REST API accepted backing secondary-index table names, but not logical index names. Introduce resolve_table_name() so snapshot creation can translate a logical index name to the backing table when the index is materialized as a view.	2026-04-08 13:36:27 +02:00
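A simplified model of the name resolution described here. The `keyspace_schema` struct is hypothetical, and the convention that a view-backed index `<name>` is stored in a view named `<name>_index` is an assumption made for illustration. Vector indexes resolve to nothing, matching the rejection behaviour described in the merge above.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <set>
#include <string>

enum class index_kind { materialized_view, vector };

// Hypothetical per-keyspace schema summary.
struct keyspace_schema {
    std::set<std::string> tables;               // includes backing views
    std::map<std::string, index_kind> indexes;  // logical index name -> kind
};

// Assumption: backing view of index "<name>" is called "<name>_index".
std::string backing_view_name(const std::string& index_name) {
    return index_name + "_index";
}

// Returns the table to snapshot, or nullopt if the name is unknown or not snapshotable.
std::optional<std::string> resolve_table_name(const keyspace_schema& ks, const std::string& name) {
    if (ks.tables.count(name)) {
        return name;  // a regular table, or a backing name passed directly
    }
    auto it = ks.indexes.find(name);
    if (it != ks.indexes.end() && it->second == index_kind::materialized_view) {
        return backing_view_name(name);
    }
    return std::nullopt;  // vector indexes have no backing table to snapshot
}
```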
Petr Gusev	7750d5737c	strong consistency: replace local consistency with global Currently we don't support 'local' consistency, which would imply maintaining a separate raft group for each dc. What we support is actually 'global' consistency -- one raft group per tablet replica set. We don't plan to support local consistency for the first GA. Closes scylladb/scylladb#29221	2026-04-08 12:52:32 +02:00
Patryk Jędrzejczak	850db950f8	Merge 'raft: include demoted voters in read barrier during joint config' from Qian Cheng Hi, thanks for Scylla! We found a small issue in tracker::set_configuration() during joint consensus and put together a fix. When a server is demoted from voter to non-voter, set_configuration processes the current config first (can_vote=false), then the previous config. But when it finds the server already in the progress map (tracker.cc:118), it hits `continue` without updating can_vote. So the server's follower_progress::can_vote stays false even though it's still a voter in the previous config. This causes broadcast_read_quorum (fsm.cc:1055) to skip the demoted server, reducing the pool of responders. Since committed() correctly includes the server in _previous_voters for quorum calculation, read barriers can stall if other servers are slow. The fix is to use configuration::can_vote() in tracker::set_configuration. We included a reproduction unit test (test_tracker_voter_demotion_joint_config) that extracts the set_configuration algorithm and demonstrates the mismatch. We weren't able to build the full Scylla test suite to add an in-tree test, so we kept it as a standalone file for reference. No backport: the bug is non-critical and the change needs some soak time in master. Closes scylladb/scylladb#29226 * https://github.com/scylladb/scylladb: fix: use is_voter::yes instead of true in test assertions test: add tracker voter demotion test to fsm_test.cc fix: use configuration::can_vote() in tracker::set_configuration	2026-04-08 12:37:27 +02:00
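The bug and the fix can be reproduced with a simplified tracker. The `configuration` fields and both `set_configuration_*` functions are hypothetical simplifications, not the real `raft::tracker` code. In the buggy version, the current config is processed first and a later `continue` skips servers already present, so a voter demoted during joint consensus keeps `can_vote == false`. The fixed version asks the configuration, which checks both the current and the previous halves.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>
#include <set>

using server_id = int;

struct configuration {
    std::set<server_id> current_voters, current_non_voters;
    std::set<server_id> previous_voters, previous_non_voters;  // non-empty during joint consensus
    bool can_vote(server_id id) const {
        return current_voters.count(id) || previous_voters.count(id);
    }
};

struct follower_progress { bool can_vote; };
using tracker = std::map<server_id, follower_progress>;

// Buggy shape: the first config seen wins; duplicates hit `continue`.
tracker set_configuration_buggy(const configuration& c) {
    tracker t;
    auto add = [&](const std::set<server_id>& ids, bool vote) {
        for (auto id : ids) {
            if (t.count(id)) continue;  // can_vote is never revisited
            t[id] = follower_progress{vote};
        }
    };
    add(c.current_voters, true);
    add(c.current_non_voters, false);
    add(c.previous_voters, true);
    add(c.previous_non_voters, false);
    return t;
}

// Fixed shape: configuration::can_vote() considers both halves.
tracker set_configuration_fixed(const configuration& c) {
    tracker t;
    for (const auto* ids : {&c.current_voters, &c.current_non_voters,
                            &c.previous_voters, &c.previous_non_voters}) {
        for (auto id : *ids) {
            t[id] = follower_progress{c.can_vote(id)};
        }
    }
    return t;
}
```

With server 2 demoted (current voters {1, 3}, current non-voter {2}, previous voters {1, 2, 3}), the buggy tracker excludes server 2 from read-quorum broadcasts even though `committed()` still counts it among `_previous_voters`.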
Qian-Cheng-nju	a416238155	test: add tracker voter demotion test to fsm_test.cc	2026-04-08 12:37:19 +02:00
Qian-Cheng-nju	f72528c759	raft: use configuration::can_vote() in tracker::set_configuration	2026-04-08 12:37:16 +02:00
Michał Jadwiszczak	568f20396a	test: fix flaky test_create_index_synchronous_updates trace event race The test_create_index_synchronous_updates test in test_secondary_index_properties.py was intermittently failing with 'assert found_wanted_trace' because the expected trace event 'Forcing ... view update to be synchronous' was missing from the trace events returned by get_query_trace(). Root cause: trace events are written asynchronously to system_traces.events. The Python driver's populate() method considers a trace complete once the session row in system_traces.sessions has duration IS NOT NULL, then reads events exactly once. Since the session row and event rows are written as separate mutations with no transactional guarantee, the driver can read an incomplete set of events. Evidence from the failed CI run logs: - The entire test (CREATE TABLE through DROP TABLE) completed in ~300ms (01:38:54,859 - 01:38:55,157) - The INSERT with tracing happened in a ~50ms window between the second CREATE INDEX completing (01:38:55,108) and DROP TABLE starting (01:38:55,157) - The 'Forcing ... synchronous' trace message is generated during the INSERT write path (db/view/view.cc:2061), so it was produced, but not yet flushed to system_traces.events when the driver read them - This matches the known limitation documented in test/alternator/ test_tracing.py: 'we have no way to know whether the tracing events returned is the entire trace' Fix: replace the single-shot trace.events read with a retry loop that directly queries system_traces.events until the expected event appears (with a 30s timeout). Use ConsistencyLevel.ONE since system_traces has RF=2 and cqlpy tests run on a single-node cluster. The same race condition pattern exists in test_mv_synchronous_updates in test_materialized_view.py (which this test was modeled after), so the same fix is proactively applied there as well. Fixes SCYLLADB-1314 Closes scylladb/scylladb#29374	2026-04-08 12:35:10 +02:00
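The fix is a poll-until-deadline loop. The actual tests are Python (cqlpy), and there the probe re-queries system_traces.events. The following is a C++ analogue of the same pattern, with a hypothetical `wait_until` helper whose probe is any callable returning `bool`.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Calls `probe` until it returns true or `timeout` elapses, sleeping
// `interval` between attempts. Returns whether the probe ever succeeded.
template <typename Probe>
bool wait_until(Probe probe, std::chrono::milliseconds timeout,
                std::chrono::milliseconds interval = std::chrono::milliseconds(100)) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (true) {
        if (probe()) {
            return true;
        }
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;
        }
        std::this_thread::sleep_for(interval);
    }
}
```

Unlike a single read, this tolerates trace events that are written after the session row is marked complete, at the cost of waiting up to the timeout when the event genuinely never appears.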
Raphael S. Carvalho	f941a77867	scripts/base36-uuid: dump date in UTC Previously, the timestamp decoded from a timeuuid was printed using the local timezone via datetime.fromtimestamp(), which produces different output depending on the machine's locale settings. ScyllaDB logs are emitted in UTC by default. Printing the decoded date in UTC makes it straightforward to correlate SSTable identifiers with log entries without having to mentally convert timezones. Also fix the embedded pytest assertion, which was accidentally correct only on machines in UTC+8 — it now uses an explicit UTC-aware datetime. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#29253	2026-04-08 12:19:55 +03:00
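The underlying arithmetic works as follows: a version-1 (time) UUID stores a 60-bit count of 100ns intervals since 1582-10-15, split across time_low, time_mid and time_hi fields. The script itself is Python. This C++ sketch (hypothetical function names; base36 decoding of SSTable identifiers is not modelled) shows the conversion, formatting with `std::gmtime` so the result does not depend on the machine's time zone.

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>
#include <string>

// 100ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
constexpr std::uint64_t uuid_epoch_offset = 0x01B21DD213814000ULL;

// Rebuilds the 60-bit timestamp from the most significant 64 bits of a v1 UUID
// (layout: time_low:32 | time_mid:16 | version:4 + time_hi:12).
std::uint64_t timeuuid_timestamp(std::uint64_t msb) {
    std::uint64_t time_low = msb >> 32;
    std::uint64_t time_mid = (msb >> 16) & 0xFFFF;
    std::uint64_t time_hi = msb & 0x0FFF;  // drop the 4 version bits
    return (time_hi << 48) | (time_mid << 32) | time_low;
}

// Formats a UUID timestamp in UTC. Assumes ts is not before the Unix epoch.
std::string timeuuid_to_utc(std::uint64_t ts) {
    std::time_t secs = static_cast<std::time_t>((ts - uuid_epoch_offset) / 10'000'000);
    std::tm* tm = std::gmtime(&secs);  // UTC, unlike std::localtime
    char buf[32];
    std::strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S", tm);
    return std::string(buf);
}
```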
Yaniv Michael Kaul	c385c0bdf9	.github/workflows/call_validate_pr_author_email.yml: add missing workflow permissions Add explicit permissions block (contents: read, pull-requests: write, statuses: write) matching the requirements of the called reusable workflow which checks out code, posts PR comments, and sets commit statuses. Fixes code scanning alert #172. Closes scylladb/scylladb#29183	2026-04-08 12:19:55 +03:00
Pavel Emelyanov	788ecaa682	api: Fix enable_injection to accept case-insensitive bool parameter Replace strict case-sensitive '== "True"' check with strcasecmp(..., "true") so that Python's str(True) -> "True" is properly recognized. Accepts any case variation of "true" ("True", "TRUE", etc.), with empty string defaulting to false. Maintains backward compatibility with out-of-tree tests that rely on Python's bool stringification. The goal is to reduce the number of distinct ways API handlers use to convert string http query parameters into bool variables. This place is the only one that simply compares param to "True". Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#29236	2026-04-08 12:19:55 +03:00
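The new check reduces to a single case-insensitive comparison. In this sketch, `parse_bool_param` is a hypothetical name, and treating every value other than a case variant of "true" (for example "1" or "yes") as false is an assumption consistent with the description.

```cpp
#include <cassert>
#include <string>
#include <strings.h>  // POSIX strcasecmp

// Any case of "true" is true; everything else, including "", is false.
bool parse_bool_param(const std::string& value) {
    return strcasecmp(value.c_str(), "true") == 0;
}
```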
Avi Kivity	0fd9ea9701	abseil: update to lts_2026_01_07 Switch to branch lts_2026_01_07, which is exactly equal to upstream now. There were no notable changes in the release notes, but the new versions are more friendly to newer compilers (specifically, in include hygiene). configure.py needs a few library updates; cmake works without change. scylla-gdb.py updated for new hash table layout (by Claude Opus 4.6). * abseil d7aaad83...255c84da (1179): > Abseil LTS branch, Jan 2026, Patch 1 (#2007) > Cherry-picks for LTS 20260107 (#1990) > Apply LTS transformations for 20260107 LTS branch (#1989) > Mark legacy Mutex methods and MutexLock pointer constructors as deprecated > `cleanup`: specify that it's safe to use the class in a signal handler. > Suppress bugprone-use-after-move in benign cases > StrFormat: format scientific notation without heap allocation > Introduce a legacy copy of GetDebugStackTraceHook API. > Report 1ns instead of 0ns for probe_benchmarks. Some tools incorrectly assume that benchmark was not run if 0ns reported. > Add absl::chunked_queue > `CRC32` version of `CombineContiguous` for length <= 32. > Add `absl::down_cast` > Fix FixedArray iterator constructor, which should require input_iterator, not forward_iterator > Add a latency benchmark for hashing a pair of integers. > Delete absl::strings_internal::STLStringReserveAmortized() > As IsAtLeastInputIterator helper > Use StringAppendAndOverwrite() in CEscapeAndAppendInternal() > Add support for absl::(u)int128 in FastIntToBuffer() > absl/strings: Prepare helper for printing objects to string representations. > Use SimpleAtob() for parsing bool flags > No-op changes to relative timeout support code. > Adjust visibility of heterogeneous_lookup_testing.h > Remove -DUNORDERED_SET_CXX17 since the macro no longer exists > [log] Prepare helper for streaming container contents to strings. 
> Restrict the visibility of some internal testing utilities > Add absl::linked_hash_set and absl::linked_hash_map > [meta] Add constexpr testing helper. > BUILD file reformatting. > `absl/meta`: Add C++17 port of C++20 `requires` expression for internal use > Remove the implementation of `absl::string_view`, which was only needed prior to C++17. `absl::string_view` is now an alias for `std::string_view`. It is recommended that clients simply use `std::string_view`. > No public description > absl::flags: Stop echoing file content in flagfile parsing errors Modified ArgsList::ReadFromFlagfile to redact the content of unexpected lines from error messages. > Refactor the declaration of `raw_hash_set`/`btree` to omit default template parameters from the subclasses. > Import of CCTZ from GitHub. > Add ABSL_ATTRIBUTE_LIFETIME_BOUND to Flag help generator > Correct `Mix4x16Vectors` comment. > Special implementation for string hash with sizes greater than 64. > Reorder function parameters so that hash state is the first argument. > Search more aggressively for open slots in absl::internal_stacktrace::BorrowedFixupBuffer > Implement SpinLockHolder in terms of std::lock_guard. > No public description > Avoid discarding test matchers. > Import of CCTZ from GitHub. > Automated rollback of commit 9f40d6d6f3cfc1fb0325dd8637eb65f8299a4b00. > Enable clang-specific warnings on the clang-cl build instead of just trying to be MSVC > Enable clang-specific warnings on the clang-cl build instead of just trying to be MSVC > Make AnyInvocable remember more information > Add further diagnostics under clang for string_view(nullptr) > Import of CCTZ from GitHub. > Document the differing trimming behavior of absl::Span::subspan() and std::span::subspan() > Special implementation for string hash with sizes in range [33, 64]. 
> Add the deleted string_view(std::nullptr_t) constructor from C++23 > CI: Use a cached copy of GoogleTest in CMake builds if possible to minimize the possibility of errors downloading from GitHub > CI: Enable libc++ hardening in the ASAN build for even more checks https://libcxx.llvm.org/Hardening.html > Call the common case of AllocateBackingArray directly instead of through the function pointer. > Change AlignedType to have a void* array member so that swisstable backing arrays end up in the pointer-containing partition for heap partitioning. > base: Discourage use of ABSL_ATTRIBUTE_PACKED > Revert: Add an attribute to HashtablezInfo which performs a bitwise XOR on all hashes. The purposes of this attribute is to identify if identical hash tables are being created. If we see a large number of identical tables, it's likely the code can be improved by using a common table as opposed to keep rebuilding the same one. > Import of CCTZ from GitHub. > Record insert misses in hashtable profiling. > Add absl::StatusCodeToStringView. > Add a missing dependency on str_format that was being pulled in transitively > Pico-optimize `SkipWhitespace` to use `StripLeadingAsciiWhitespace`. > absl::string_view: Upgrade the debug assert on the single argument char* constructor to ABSL_HARDENING_ASSERT > Use non-stack storage for stack trace buffers > Fixed incorrect include for ABSL_NAMESPACE_BEGIN > Add ABSL_REFACTOR_INLINE to separate the inliner directive from the deprecated directive so that we can give users a custom deprecation message. > Reduce stack usage when unwinding without fixups > Reduce stack usage when unwinding from 170 to 128 on x64 > Rename RecordInsert -> RecordInsertMiss. > PR #1968: Use std::move_backward within InlinedVector's Storage::Insert > Use the new absl::StringResizeAndOverwrite() in CUnescape() > Explicitly instantiate common `raw_hash_set` backing array functions. > Rollback reduction of maximum load factor. Now it is back to 28/32. 
> Export Mutex::Dtor from shared libraries in NDEBUG mode > Allow `IsOkAndHolds` to rely on duck typing for matching `StatusOr` like types instead of uniquely `absl::StatusOr`, e.g. `google::cloud::StatusOr`. > Fix typo in macro and add missing static_cast for WASM builds. > windows(cmake): add abseil_test_dll to target link libraries when required > Handle empty strings in `SimpleAtof` after stripping whitespace > Avoid using a thread_local in an inline function since this causes issues on some platforms. > (Roll forward) Change Abseil's SpinLock adaptive_spin_count to a class static variable that can be set by tcmalloc friend classes. > Change Abseil's SpinLock adaptive_spin_count to a class static variable that can be set by tcmalloc friend classes. > Change Abseil's SpinLock adaptive_spin_count to a class static variable that can be set by tcmalloc friend classes. > Fixes for String{Resize|Append}AndOverwrite - StringAppendAndOverwrite() should always call StringResizeAndOverwrite() with at least capacity() in case the standard library decides to shrink the buffer (Fixes #1965) - Small refactor to make the minimum growth an addition for clarity and to make it easier to test 1.5x growth in the future - Turn an ABSL_HARDENING_ASSERT into a ThrowStdLengthError - Add a missing std::move > Correct the supported features of Status Matchers > absl/time: Use "memory order acquire" for loads, which would allow for the safe removal of the data memory barrier. > Use the new absl::StringResizeAndOverwrite() in string escaping utilities > Add an internal-only helper StringAppendAndOverwrite() similar to StringResizeAndOverwrite() but optimized for repeated appends, using exponential growth to ensure amortized complexity of increasing a string size by a small amount is O(1). > Release `ABSL_EXPECT_OK` and `ABSL_ASSERT_OK`. > Fix the CHECK_XX family of macros to not print `char` arguments as C-strings if the comparison happened as pointers. 
Printing as pointers is more relevant to the result of the comparison. > Rollback StringAppendAndOverwrite() - the problem is that StringResizeAndOverwrite has MSAN testing of the entire string. This causes quadratic MSAN verification on small appends. > Add an internal-only helper StringAppendAndOverwrite() similar to StringResizeAndOverwrite() but optimized for repeated appends, using exponential growth to ensure amortized complexity of increasing a string size by a small amount is O(1). > PR #1961: Fix Clang warnings on powerpc > Use the new absl::StringResizeAndOverwrite() in string escaping utilities > Use the new absl::StringResizeAndOverwrite() in string escaping utilities > macOS CI: Move the Bazel vendor_dir to ${HOME} to workaround a Bazel issue where it does not work when it is in ${TMP} and also fix the quoting which was causing it to incorrectly receive the argument > Use __msan_check_mem_is_initialized for detailed MSan report > Optimize stack unwinding by reducing `AddressIsReadable` calls. > Add internal API to allow bypassing stack trace fixups when needed > absl::StrFormat: improve test coverage with scientific exponent test cases > Add throughput and latency benchmarks for `absl::ToDoubleXYZ` functions. > CordzInfo: Use absl::NoDestructor to remove a global destructor. Chromium requires no global destructors. > string_view: Enable std::view and std::borrowed_range > cleanup: s/logging_internal/log_internal/ig for consistency > Use the new absl::StringResizeAndOverwrite() in string escaping utilities > Use the new absl::StringResizeAndOverwrite() in string escaping utilities > Use the new absl::StringResizeAndOverwrite() in absl::AsciiStrTo{Lower|Upper} > Use the new absl::StringResizeAndOverwrite() in absl::StrJoin() > Use the new absl::StringResizeAndOverwrite() in absl::StrCat() > string_view: Fix include order > Don't pass nullptr as the 1st arg of `from_chars` > absl/types: format code with clang-format. 
> Validate absl::StringResizeAndOverwrite op has written bytes as expected. > Skip the ShortStringCollision test on WASM. > Rollback `absl/types`: format code with clang-format. > Remove usage of the WasmOffsetConverter for Wasm / Emscripten stack-traces. > Use the new absl::StringResizeAndOverwrite() in absl::CordCopyToString() > Remove an undocumented behavior of --vmodule and absl::SetVLogLevel that could set a module_pattern to defer to the global vlog threshold. > Update to rules_cc 0.2.9 > Avoid redefine warnings with ntstatus constants > PR #1944: Use same element-width for non-temporal loads and stores on Arm > absl::StringResizeAndOverwrite(): Add the requirement that the only value that can be written to buf[size] is the terminator character. > absl/types: format code with clang-format. > Minor formatting changes. > Remove `IntIdentity` and `PtrIdentity` from `raw_hash_set_probe_benchmark`. > Automated rollback of commit cad60580dba861d36ed813564026d9774d9e4e2b. > FlagStateInterface implementors need only support being restored once. > Clarify the post-condition of `reserve()` in Abseil hash containers. > Clarify the post-condition of `reserve()` in Abseil hash containers. > Represent dropped samples in hashtable profile. > Add lifetimebound to absl::implicit_cast and make it work for rvalue references as it already does with lvalue references > Clean up a doc example where we had `absl_nonnull` and `= nullptr;` > Change Cordz to synchronize tracked cords with Snapshots / DeleteQueue > Minor refactor to `num_threads` in deadlock test > Rename VLOG macro parameter to match other uses of this pseudo type. > `time`: Fix indentation > Automated Code Change > Adds `absl::StringResizeAndOverwrite` as a polyfill for C++23's `std::basic_string<CharT,Traits,Allocator>::resize_and_overwrite` > Internal-only change > absl/time: format code with clang-format. > No public description > Expose typed releasers of externally appended memory. 
> Fix __declspec support for ABSL_DECLARE_FLAG() > Annotate absl::AnyInvocable as an owner type via [[gsl::Owner]] and absl_internal_is_view = std::false_type > Annotate absl::FunctionRef as a view type via [[gsl::Pointer]] and absl_internal_is_view > Remove unnecessary dep on `core_headers` from the `nullability` cc_library > type_traits: Add type_identity and type_traits_t backfills > Refactor raw_hash_set range insertion to call private insert_range function. > Fix bug in absl::FunctionRef conversions from non-const to const > PR #1937: Simplify ConvertSpecialToEmptyAndFullToDeleted > Improve absl::FunctionRef compatibility with C++26 > Add a workaround for unused variable warnings inside of not-taken if-constexpr codepaths in older versions of GCC > Annotate ABSL_DIE_IF_NULL's return type with `absl_nonnull` > Move insert index computation into `PrepareInsertLarge` in order to reduce inlined part of insert/emplace operations. > Automated Code Change > PR #1939: Add missing rules_cc loads > Expose (internally) a LogMessage constructor taking file as a string_view for (internal, upcoming) FFI integration. > Fixed up some #includes in mutex.h > Make absl::FunctionRef support non-const callables, aligning it with std::function_ref from C++26 > Move capacity update in `Grow1To3AndPrepareInsert` after accessing `common.infoz()` to prevent assertion failure in `control()`. > Fix check_op(s) compilation failures on gcc 8 which eagerly tries to instantiate std::underlying_type for non-num types. > Use `ABSL_ATTRIBUTE_ALWAYS_INLINE`for lambda in `find_or_prepare_insert_large`. > Mark the implicit floating operators as constexpr for `absl::int128` and `absl::uint128` > PR #1931: raw_hash_set: fix instantiation for recursive types on MSVC with /Zc:__cplusplus > Add std::pair specializations for IsOwner and IsView > Cast ABSL_MIN_LOG_LEVEL to absl::LogSeverityAtLeast instead of absl::LogSeverity. 
> Fix a corner case in the aarch64 unwinder > Fix inconsistent nullability annotation in ReleasableMutexLock > Remove support for Native Client > Rollback f040e96b93dba46e8ed3ca59c0444cbd6c0a0955 > When printing CHECK_XX failures and both types are unprintable, don't bother printing " (UNPRINTABLE vs. UNPRINTABLE)". > PR #1929: Fix shorten-64-to-32 warning in stacktrace_riscv-inl.inc > Refactor `find_or_prepare_insert_large` to use a single return statement using a lambda. > Use possible CPUs to identify NumCPUs() on Linux. > Fix incorrect nullability annotation of `absl::Cord::InlineRep::set_data()`. > Move SetCtrl family of functions to cc file. > Change absl::InlinedVector::clear() so that it does not deallocate any allocated space. This allows allocations to be reused and matches the behavior specification of std::vector::clear(). > Mark Abseil container algorithms as `constexpr` for C++20. > Fix `CHECK_<OP>` ambiguous overload for `operator<<` in older versions of GCC when C-style strings are compared > stacktrace_test: avoid spoiling errno in the test signal handler. > Optimize `CRC32AcceleratedX86ARMCombinedMultipleStreams::Extend` by interleaving the `CRC32_u64` calls at a lower level. > stacktrace_test: avoid spoiling errno in the test signal handler. > stacktrace_test: avoid spoiling errno in the test signal handler. > std::multimap::find() is not guaranteed to return the first entry with the requested key. Any may be returned if many exist. > Mark `/`, `%`, and `*` operators as constexpr when intrinsics are available. > Add the C++20 string_view contructor that uses iterators > Implement absl::erase_if for absl::InlinedVector > Adjust software prefetch to fetch 5 cachelines ahead, as benchmarking suggests this should perform better. > Reduce maximum load factor to 27/32 (from 28/32). 
> Remove unused include > Remove unused include statement > PR #1921: Fix ABSL_BUILD_DLL mode (absl_make_dll) with mingw > PR #1922: Enable mmap for WASI if it supports the mman header > Rollback C++20 string_view constructor that uses iterators due to broken builds > Add the C++20 string_view contructor that uses iterators > Bump versions of dependencies in MODULE.bazel > Automated Code Change > PR #1918: base: add musl + ppc64le fallback for UnscaledCycleClock::Frequency > Optimize crc32 Extend by removing obsolete length alignment. > Fix typo in comment of `ABSL_ATTRIBUTE_UNUSED`. > Mark AnyInvocable as being nullability compatible. > Ensure stack usage remains low when unwinding the stack, to prevent stack overflows > Shrink #if ABSL_HAVE_ATTRIBUTE_WEAK region sizes in stacktrace_test.cc > <filesystem> is not supported for XTENSA. Disable it in //absl/hash/internal/hash.h. > Use signal-safe dynamic memory allocation for stack traces when necessary > PR #1915: Fix SYCL Build Compatibility with Intel LLVM Compiler on Windows for abseil > Import of CCTZ from GitHub. > Tag tests that currently fail on ios_sim_arm64 with "no_test_ios_sim_arm64" > Automated Code Change > Automated Code Change > Import of CCTZ from GitHub. > Move comment specific to pointer-taking MutexLock variant to its definition. > Add lifetime annotations to MutexLock, SpinLockHolder, etc. > Add lifetimebound annotations to absl::MakeSpan and absl::MakeConstSpan to detect dangling references > Remove comment mentioning deferenceability. > Add referenceful MutexLock with Condition overload. > Mark SpinLock camel-cased methods as ready for inlining. > Whitespace change > In logging tests that write expectations against `ScopedMockLog::Send`, suppress the default behavior that forwards to `ScopedMockLog::Log` so that unexpected logs are printed with full metadata. Many of these tests are poking at those metadata, and a failure message that doesn't include them is unhelpful. 
> Add ABSL_ATTRIBUTE_LIFETIME_BOUND to absl::ClippedSubstr > Inline internal usages of Mutex::Lock, etc. in favor of lock. > Inline internal usages of pointerful SpinLockHolder/MutexLock. > Remove wrong comment in Cord::Unref > Update the crc32 dynamic dispatch table with newer platforms. > PR #1914: absl/base/internal/poison.cc: Minor build fix > Accept references on SpinLockHolder/MutexLock > Import of CCTZ from GitHub. > Fix typos in comments. > Inline SpinLock Lock->lock, Unlock->unlock internal to Abseil. > Rename Mutex methods to use the typical C++ lower case names. > Rename SpinLock methods to use the typical C++ lower case names. > Add an assert that absl::StrSplit is not called with a null char argument. > Fix sign conversion warning > PR #1911: Fix absl_demangle_test on ppc64 > Disallow using a hash function whose return type is smaller than size_t. > Optimize CRC-32C extension by zeroes > Deduplicate stack trace implementations in stacktrace.cc > Align types of location_table_ and mapping_table_ keys (-Wshorten-64-to-32). > Move SigSafeArena() out to absl/base/internal/low_level_alloc.h > Allow CHECK_<OP> variants to be used with unprintable types. > Import of CCTZ from GitHub. > Adds required load statements for C++ rules to BUILD and bzl files. > Disable sanitizer bounds checking in ComputeZeroConstant. > Roll back NDK weak symbol mode for backtrace() due to internal test breakage > Add converter for extracting SwissMap profile information into a https://github.com/google/pprof suitable format for inspection. > Allocate memory for frames and sizes during stack trace fix-up when no memory is provided > Support NDK weak symbol mode for backtrace() on Android. > Change skip_empty_or_deleted to not use groups. > Fix bug of dereferencing invalidated iterator in test case. > Refactor: split erase_meta_only into large and small versions. > Fix a TODO to use std::is_nothrow_swappable when it became available. 
> Clean up the testing of alternate options that were removed in previous changes > Only use generic stacktrace when ABSL_HAVE_THREAD_LOCAL. > Automated Code Change > Add triviality tests for absl::Span > Loosen the PointerAlignment test to allow up to 5 stuck bits to avoid flakiness. > Prevent conversion constructions from absl::Span to itself > Skip flaky expectations in waiter_test for MSVC. > Refactor: call AssertIsFull from iterator::assert_is_full to avoid passing the same arguments repeatedly. > In AssertSameContainer, remove the logic checking for whether the iterators are from SOO tables or not since we don't use it to generate a more informative debug message. > Remove unused NonIterableBitMask::HighestBitSet function. > Refactor: move iterator unchecked_* members before data members to comply with Google C++ style guide. > Mix pointers once instead of twice now that we've improved mixing on 32-bit platforms and improved the kMul constant. > Remove unused utility functions/constants. > Revert a change for breaking downstream third party libs > Remove unneeded include from cord_rep_btree_navigator.h > Refactor: move find_first_non_full into raw_hash_set.cc. > Perform stronger mixing on 32-bit platforms and enable the LowEntropyStrings test. > Include deallocated caller-provided size in delete hooks. > Roll back one more time: In debug mode, assert that the probe sequence isn't excessively long. > Allow a `std::move` of `delimiter_` to happen in `ByString::ByString(ByString&&)`. Right now the move ctor is making a copy because the source object is `const`. > Assume that control bytes don't alias CommonFields. > Consistently use [[maybe_unused]] in raw_hash_set.h for better compiler warning compatibility. > Roll forward: In debug mode, assert that the probe sequence isn't excessively long. > Add a new test for hash collisions for short strings when PrecombineLengthMix has low quality. 
> Refactor: define CombineRawImpl for repeated `Mix(state ^ value, kMul)` operations. > Automated Code Change > Mark hash_test as large so that the timeout is increased. > Change the value of kMul to have higher entropy and prevent collisions when keys are aligned integers or pointers. > Fix LIFETIME annotations for op/op->/value operators for reference types. > Update StatusOr to support lvalue reference value types. > Rollback debug assertion that the probe sequence isn't excessively long. > AnyInvocable: Fix operator==/!= comments > In debug mode, assert that the probe sequence isn't excessively long. > Improve NaN handling in absl::Duration arithmetic. > Change PrecombineLengthMix to sample data from kStaticRandomData. > Fix includes and fuse constructors of SpinLock. > Enable `operator==` for `StatusOr` only if the contained type is equality-comparable > Enable SIMD memcpy-crc on ARM cores. > Improve mixing on 32-bit platforms. > Change DurationFromDouble to return -InfiniteDuration() for all NaNs. > Change return type of hash internal `Seed` to `size_t` from `uint64_t` > CMake: Add a fatal error when the compiler defaults to or is set to a C++ language standard prior to C++17. > Make bool true hash be ~size_t{} instead of 1 so that all bits are different between true/false instead of only one. > Automated Code Change > Pass swisstable seed as seed to absl::Hash so we can save an XOR in H1. > Add support for scoped enumerations in CHECK_XX(). > Revert no-inline on Voidify::operator&&() -- caused unexpected binary size growth > Mark Voidify::operator&&() as no-inline. This improves stack trace for `LOG(FATAL)` with optimization on. > Refactor long strings hash computations and move `len <= PiecewiseChunkSize()` out of the line to keep only one function call in the inlined hash code. 
> rotr/rotl: Fix undefined behavior when passing INT_MIN as the number of positions to rotate by > Reorder members of MixingHashState to comply with Google C++ style guide ordering of type declarations, static constants, ctors, non-ctor functions. > Delete unused function ShouldSampleHashtablezInfoOnResize. > Remove redundant comments that just name the following symbol without providing additional information. > Remove unnecessary modification of growth info in small table case. > Suppress CFI violation on VDSO call. > Replace WeakMix usage with Mix and change H2 to use the most significant 7 bits - saving 1 cycle in H1. > Fix -Wundef warning > Fix conditional constexpr in ToInt64{Nano|Micro|Milli}seconds under GCC7 and GCC8 using an else clause as a workaround > Enable CompressedTupleTest.NestedEbo test case. > Lift restriction on using EBCO[1] for nested CompressedTuples. The current implementation of CompressedTuple explicitly disallows EBCO for cases where CompressedTuples are nested. This is because the implementation for a tuple with EBCO-compatible element T inherits from Storage<T, I>, where I is the index of T in the tuple, and > absl::string_view: assert against (data() == nullptr && size() != 0) > Fix a false nullability warning in [Q]CHECK_OK by replacing nullptr with an empty char* > Make `combine_contiguous` mix the length in a weak way by adding `size << 24`, so that we can avoid a separate mixing of the size later. The empty range mixes in the byte 0x57. > Add a test case that -1.0 and 1.0 have different hashes. > Update CI to a more recent Clang on Linux x86-64 > `absl::string_view`: Add a debug assert to the single-argument constructor that the argument is not `nullptr`. > Fix CI on macOS Sequoia > Use Xcode 16.3 for testing > Use a proper fix instead of a workaround for a parameter annotated absl_nonnull since the latest Clang can see through the workaround > Assert that SetCtrl isn't called on small tables - there are no control bytes in such cases.
> Use `MaskFullOrSentinel` in `skip_empty_or_deleted`. > Reduce flakiness in MockDistributions.Examples test case. > Rename PrepareInsertNonSoo to PrepareInsertLarge now that it's no longer used in all non-SOO cases. > PR #1895: use c++17 in podspec > Avoid hashing the key in prefetch() for small tables. > Remove template alias nullability annotations. > Add `Group::MaskFullOrSentinel` implementation without usage. > Move `hashtable_control_bytes` tests into their own file. > Simplify calls to `EqualElement` by introducing `equal_to` helper function. > Do `common.increment_size()` directly in SmallNonSooPrepareInsert if inserting to reserved 1 element table. > Import of CCTZ from GitHub. > Small cleanup of `infoz` processing to get the logic out of the line or removed. > Extract the entire PrepareInsert to Small non SOO table out of the line. > Take `get_hash` implementation out of the SwissTable class to minimize number of instantiations. > Change kEmptyGroup to kDefaultIterControl now that it's only used for default-constructed iterators. > [bits] Add tests for return types > Avoid allocating control bytes in capacity==1 swisstables. > PR #1888: Adjust Table.GrowExtremelyLargeTable to avoid OOM on i386 > Avoid mixing after `Hash64` calls for long strings by passing `state` instead of `Seed` to low level hash. > Indent absl container examples consistently > Revert- Doesn't actually work because SWIG doesn't use the full preprocessor > Add tags to skip some tests under UBSAN. > Avoid subtracting `it.control()` and `table.control()` in single element table during erase. > Remove the `salt` parameter from low level hash and use a global constant. That may potentially remove some loads. > In SwissTable, don't hash the key when capacity<=1 on insertions. > Remove the "small" size designation for thread_identity_test, which causes the test to timeout after 60s. > Add comment explaining math behind expressions. 
> Exclude SWIG from ABSL_DEPRECATED and ABSL_DEPRECATE_AND_INLINE > stacktrace_x86: Handle nested signals on altstack > Import of CCTZ from GitHub. > Simplify MixingHashState::Read9To16 to not depend on endianness. > Delete deprecated `absl::Cord::Get` and its remaining call sites. > PR #1884: Remove duplicate dependency > Remove relocatability test that is no longer useful > Import of CCTZ from GitHub. > Fix a bug of casting sizeof(slot_type) to uint16_t instead of uint32_t. > Rewrite `WideToUtf8` for improved readability. > Avoid requiring default-constructability of iterator type in algorithms that use ContainerIterPairType > Added test cases for invalid surrogate sequences. > Use __builtin_is_cpp_trivially_relocatable to implement absl::is_trivially_relocatable in a way that is compatible with PR2786 in the upcoming C++26. > Remove dependency on `wcsnlen` for string length calculation. > Stop being strict about validating the "clone" part of mangled names > Add support for logging wide strings in `absl::log`. > Deprecate `ABSL_HAVE_STD_STRING_VIEW`. > Change some nullability annotations in absl::Span to absl_nullability_unknown to work around a bug that makes nullability checks trigger in foreach loops, while still fixing the -Wnullability-completeness warnings. > Linux CI update > Fix new -Wnullability-completeness warnings found after upgrading the Clang version used in the Linux ARM CI to Clang 19. > Add __restrict for uses of PolicyFunctions. > Use Bazel vendor mode to cache external dependencies on Windows and macOS > Move PrepareInsertCommon from header file to cc file. > Remove the explicit from the constructor to a test allocator in hash_policy_testing.h. This is rejected by Clang when using the libstdc++ that ships with GCC15 > Extract `WideToUtf8` helper to `utf8.h`. > Updates the documentation for `CHECK` to make it more explicit that it is used to require that a condition is true.
> Add PolicyFunctions::soo_capacity() so that the compiler knows that soo_capacity() is always 0 or 1. > Expect different representations of pointers from the Windows toolchain. > Add set_no_seed_for_testing for use in GrowExtremelyLargeTable test. > Update GoogleTest dependency to 1.17.0 to support GCC15 > Assume that frame pointers inside known stack bounds are readable. > Remove fallback code in absl/algorithm/container.h > Fix GCC15 warning that <ciso646> is deprecated in C++17 > Fix misplaced closing brace > Remove unused include. > Automated Code Change > Type erase copy constructor. > Refactor to use hash_of(key) instead of hash_ref()(key). > Create Table.Prefetch test to make sure that it works. > Remove NOINLINE on the constructor with buckets. > In SwissTable, don't hash the key in find when capacity<=1. > Use 0x57 instead of Seed() for weakly mixing of size. > Use absl::InsecureBitGen in place of std::random_device in Abseil tests. > Remove unused include. > Use large 64 bits kMul for 32 bits platforms as well. > Import of CCTZ from GitHub. > Define `combine_weakly_mixed_integer` in HashSelect::State in order to allow `friend auto AbslHashValue` instead of `friend H AbslHashValue`. > PR #1878: Fix typos in comments > Update Abseil dependencies in preparation for release > Use weaker mixing for absl::Hash for types that mix their sizes. > Update comments on UnscaledCycleClock::Now. > Use alignas instead of the manual alignment for the Randen entropy pool. > Document nullability annotation syntax for array declarations (not many people may know the syntax). > Import of CCTZ from GitHub. > Release tests for ABSL_RAW_DCHECK and ABSL_RAW_DLOG. > Adjust threshold for stuck bits to avoid flaky failures. > Deprecate template type alias nullability annotations. > Add more probe benchmarks > PR #1874: Simplify detection of the powerpc64 ELFv1 ABI > Make `absl::FunctionRef` copy-assignable. This brings it more in line with `std::function_ref`. 
> Remove unused #includes from absl/base/internal/nullability_impl.h > PR #1870: Retry SymInitialize on STATUS_INFO_LENGTH_MISMATCH > Prefetch from slots in parallel with reading from control. > Migrate template alias nullability annotations to macros. > Improve dependency graph in `TryFindNewIndexWithoutProbing` hot path evaluation. > Add latency benchmarks for Hash for strings with size 3, 5 and 17. > Exclude UnwindImpl etc. from thread sanitizer due to false positives. > Use `GroupFullEmptyOrDeleted` inside of `transfer_unprobed_elements_to_next_capacity_fn`. > PR #1863: [minor] Avoid variable shadowing for absl btree > Extend stack-frame walking functionality to allow dynamic fixup > Fix "unsafe narrowing" in absl for Emscripten > Roll back change to address breakage > Extend stack-frame walking functionality to allow dynamic fixup > Introduce `absl::Cord::Distance()` > Avoid aliasing issues in growth information initialization. > Make `GrowSooTableToNextCapacityAndPrepareInsert` in order to initialize control bytes all at once and avoid two function calls on growth right after SOO. > Simplify `SingleGroupTableH1` since we do not need to mix all bits anymore. The per-table seed has a good last-bit distribution. > Use `NextSeed` instead of `NextSeedBaseNumber` and make the result type `uint16_t`. That avoids unnecessary bit twiddling and simplifies the code. > Optimize `GrowthToLowerBoundCapacity` in order to avoid division. > [base] Make :endian internal to absl > Fully qualify absl names in check macros to avoid invalid name resolution when the user scope has those names defined. > Fix memory sanitization in `GrowToNextCapacityAndPrepareInsert`. > Define and use `ABSL_SWISSTABLE_ASSERT` in cc file since a lot of logic moved there. > Remove `ShouldInsertBackwards` functionality. It was used for additional order randomness in debug mode. It is no longer necessary with the introduction of a separate per-table `seed`.
> Fast growing to the next capacity based on carbon hash table ideas. > Automated Code Change > Refactor CombinePiecewiseBuffer test case to (a) call PiecewiseChunkSize() to get the chunk size and (b) use ASSERT for expectation in a loop. > PR #1867: Remove global static in stacktrace_win32-inl.inc > Mark Abseil hardening assert in AssertIsValidForComparison as slow. > Roll back a problematic change. > Add absl::FastTypeId<T>() > Automated Code Change > Update TestIntrinsicInt128 test to print the indices with the conflicting hashes. > Code simplification: we don't need XOR and kMul when mixing large string hashes into hash state. > Refactor absl::CUnescape() to use direct string output instead of pointer/size. > Rename `policy.transfer` to `policy.transfer_n`. > Optimize `ResetCtrl` for small tables with `capacity < Group::KWidth * 2` (<32 if SSE enabled and <16 if not). > Use 16 bits of per-table-seed so that we can save an `and` instruction in H1. > Fully annotate nullability in headers where it is partially annotated. > Add note about sparse containers to (flat|node)_hash_(set|map). > Make low_level_alloc compatible with -Wthread-safety-pointer > Add missing direct includes to enable the removal of unused includes from absl/base/internal/nullability_impl.h. > Add tests for macro nullability annotations analogous to existing tests for type alias annotations. > Adds functionality to return stack frame pointers during stack walking, in addition to code addresses > Use even faster reduction algorithm in FinalizePclmulStream() > Add nullability annotations to some very-commonly-used APIs. > PR #1860: Add `unsigned` to character buffers to ensure they can provide storage (https://eel.is/c++draft/intro.object#3) > Release benchmarks for absl::Status and absl::StatusOr > Use more efficient reduction algorithm in FinalizePclmulStream() > Add a test case to make it clear that `--vmodule=foo/=1` does match any children and grandchildren and so on under `foo/`.
> Gate use of clang nullability qualifiers through absl nullability macros on `nullability_on_classes`. > Mark `absl::StatusOr::status()` as ABSL_MUST_USE_RESULT > Cleanups related to benchmarks: - Fix many benchmarks to be cc_binary instead of cc_test. - Add a few benchmarks for StrFormat. - Add benchmarks for Substitute. - Add benchmarks for Damerau-Levenshtein distance used in flags. > Add a log severity alias `DO_NOT_$UBMIT` intended for logging during development > Avoid relying on true and false tokens in the preprocessor macros used in any_invocable.h > Avoid relying on true and false tokens in the preprocessor macros used in absl/container > Refactor to make it clear that H2 computation is not repeated in each iteration of the probe loop. > Turn on C++23 testing for GCC and Clang on Linux > Fix overflow of kSeedMask on 32-bit platforms in `generate_new_seed`. > Add a workaround for std::pair not being trivially copyable in C++23 in some standard library versions > Refactor WeakMix to include the XOR of the state with the input value. > Migrate ClearPacBits() to a more generic implementation and location > Annotate more Abseil container methods with [[clang::lifetime_capture_by(...)]] and make them all forward to the non-captured overload > Make PolicyFunctions always be the second argument (after CommonFields) for type-erased functions. > Move GrowFullSooTableToNextCapacity implementation with some dependencies to cc file. > Optimize btree_iterator increment/decrement to avoid aliasing issues by using local variables instead of repeatedly writing to `this`. > Add constexpr conversions from absl::Duration to int64_t > PR #1853: Add support for QCC compiler > Fix documentation for key requirements of flat_hash_set > Use `extern template` for `GrowFullSooTableToNextCapacity` since we know the most common set of parameters. > C++23: Fix log_format_test to match the stream format for volatile pointers > C++23: Fix compressed_tuple_test.
> Implement `btree::iterator::+=` and `-=`. > Stop calling `ABSL_ANNOTATE_MEMORY_IS_INITIALIZED` for threadlocal counter. > Automated Code Change > Introduce seed stored in the hash table inside of the size. > Replace ABSL_ATTRIBUTE_UNUSED with [[maybe_unused]] > Minor consistency cleanups to absl::BitGen mocking. > Restore the empty CMake targets for bad_any_cast, bad_optional_access, and bad_variant_access to allow clients to migrate. > bits.h: Add absl::endian and absl::byteswap polyfills > Use absl::NoDestructor for an absl::Mutex instance in the flags library to prevent some exit-time destructor warnings > Add thread GetEntropyFromRandenPool test > Update nullability annotation documentation to focus on macro annotations. > Simplify some random/internal types; expose one function to acquire entropy. > Remove pre-C++17 workarounds for lack of std::launder > UBSAN: Use -fno-sanitize-recover > int128_test: Avoid testing signed integer overflow > Remove leading commas in `Describe` methods of `StatusIs` matcher. > absl::StrFormat: Avoid passing null to memcpy > str_cat_test: Avoid using invalid enum values > hash_generator_testing: Avoid using invalid enum values > absl::Cord: Avoid passing null to memcpy and memset > graphcycles_test: Avoid applying a non-zero offset to a null pointer > Make warning about wrapping empty std::function in AnyInvocable stronger. > absl/random: Convert absl::BitGen / absl::InsecureBitGen to classes from aliases. > Fix buffer overflow in the internal demangling function > Avoid calling `ShouldRehashForBugDetection` on the first two inserts to the table. > Remove the polyfill implementations for many type traits and alias them to their std equivalents. It is recommended that clients now simply use the std equivalents. > ROLLBACK: Limit slot_size to 2^16-1 and maximum table size to 2^43-1. > Limit `slot_size` to `2^16-1` and maximum table size to `2^43-1`.
> Use C++17 [[nodiscard]] instead of the deprecated ABSL_MUST_USE_RESULT > Remove the polyfills for absl::apply and absl::make_from_tuple, which were only needed prior to C++17. It is recommended that clients simply use std::apply and std::make_from_tuple. > PR #1846: Fix build on big endian > Bazel: Move environment variables to --action_env > Remove the implementation of `absl::variant`, which was only needed prior to C++17. `absl::variant` is now an alias for `std::variant`. It is recommended that clients simply use `std::variant`. > MSVC: Fix warnings c4244 and c4267 in the main library code > Update LowLevelHashLenGt16 to be LowLevelHashLenGt32 now that the input is guaranteed to be >32 in length. > Xtensa does not support thread_local. Disable it in absl/base/config.h. > Add support for 8-bit and 16-bit integers to absl::SimpleAtoi > CI: Update Linux ARM latest container > Add time hash tests > `any_invocable`: Update comment that refers to C++17 and C++11 > `check_test_impl.inc`: Use C++17 features unconditionally > Remove the implementation of `absl::optional`, which was only needed prior to C++17. `absl::optional` is now an alias for `std::optional`. It is recommended that clients simply use `std::optional`. > Move hashtable control bytes manipulation to a separate file. > Fix a use-after-free bug in which the string passed to `AtLocation` may be referenced after it is destroyed. While the string does live until the end of the full statement, logging previously occurred in the destructor of the `LogMessage`, which may be constructed before the temporary string (and thus destroyed after the temporary string's destructor). > `internal/layout`: Delete pre-C++17 out of line definition of constexpr class member > Extract slow path for PrepareInsertNonSoo to a separate function `PrepareInsertNonSooSlow`.
> Minor code cleanups > `internal/log_message`: Use `if constexpr` instead of SFINAE for `operator<<` > [absl] Use `std::min` in `constexpr` contexts in `absl::string_view` > Remove the implementation of `absl::any`, which was only needed prior to C++17. `absl::any` is now an alias for `std::any`. It is recommended that clients simply use `std::any`. > Remove ABSL_INTERNAL_NEED_REDUNDANT_CONSTEXPR_DECL which is no longer needed with the C++17 floor > Make `OptimalMemcpySizeForSooSlotTransfer` ready to work with MaxSooSlotSize up to `3*sizeof(size_t)`. > `internal/layout`: Replace SFINAE with `if constexpr` > PR #1830: C++17 improvement: use if constexpr in internal/hash.h > `absl`: Deprecate `ABSL_HAVE_CLASS_TEMPLATE_ARGUMENT_DEDUCTION` > Add a verification for access to a table that is being destroyed. Also enabled access after destroy check in ASAN optimized mode. > Store `CharAlloc` in SwissTable in order to simplify type erasure of functions accepting allocator as `void*`. > Introduce and use `SetCtrlInLargeTable`, when we know that table is at least one group. Similarly to `SetCtrlInSingleGroupTable`, we can save some operations. > Make raw_hash_set::slot_type private. > Delete absl/utility/internal/if_constexpr.h > `internal/any_invocable`: Use `if constexpr` instead of SFINAE when initializing storage accessor > Depend on string_view directly > Optimize and slightly simplify `PrepareInsertNonSoo`. > PR #1833: Make ABSL_INTERNAL_STEP_n macros consistent in crc code > `internal/any_invocable`: Use alias `RawT` consistently in `InitializeStorage` > Move the implementation of absl::ComputeCrc32c to the header file, to facilitate inlining. > Delete absl/base/internal/inline_variable.h > Add lifetimebound to absl::StripAsciiWhitespace > Revert: Random: Use target attribute instead of -march > Add return for opt mode in AssertNotDebugCapacity to make sure that code is not evaluated in opt mode.
> `internal/any_invocable`: Delete TODO, improve comment and simplify pragma in constructor > Split resizing routines and type erase similar instructions. > Random: Use target attribute instead of -march > `internal/any_invocable`: Use `std::launder` unconditionally > `internal/any_invocable`: Remove suppression of false positive -Wmaybe-uninitialized on GCC 12 > Fix feature test for ABSL_HAVE_STD_OPTIONAL > Support C++20 iterators in raw_hash_map's random-access iterator detection > Fix mis-located test dependency > Disable the DestroyedCallsFail test on GCC due to flakiness. > `internal/any_invocable`: Implement invocation using `if constexpr` instead of SFINAE > PR #1835: Bump deployment_target version and add visionos to podspec > PR #1828: Fix spelling of pseudorandom in README.md > Make raw_hash_map::key_arg private. > `overload`: Delete obsolete macros for undefining `absl::Overload` when C++ < 17 > `absl/base`: Delete `internal/invoke.h` and `invoke_test.cc` > Remove `WORKSPACE.bazel` > `absl`: Replace `base_internal::{invoke,invoke_result_t,is_invocable_r}` with `std` equivalents > Allow C++20 forward iterators to use fast paths > Factor out some iterator traits detection code > Type erase IterateOverFullSlots to decrease code size. > `any_invocable`: Delete pre-C++17 workarounds for `noexcept` and guaranteed copy elision > Make raw_hash_set::key_arg private. > Rename nullability macros to use new lowercase spelling. > Fix bug where ABSL_REQUIRE_EXPLICIT_INIT did not actually result in a linker error > Make Randen benchmark program use runtime CPU detection. > Add CI for the C++20/Clang/libstdc++ combination > Move Abseil to GoogleTest 1.16.0 > `internal/any_invocable`: Use `if constexpr` instead of SFINAE in `InitializeStorage` > More type-erasing of InitializeSlots by removing the Alloc and AlignOfSlot template parameters.
> Actually use the hint space instruction to strip PAC bits for return addresses in stack traces as the comment says > `log/internal`: Replace `..._ATTRIBUTE_UNUSED_IF_STRIP_LOG` with C++17 `[[maybe_unused]]` > `attributes`: Document `ABSL_ATTRIBUTE_UNUSED` as deprecated > `internal/any_invocable`: Initialize using `if constexpr` instead of ternary operator, enum, and templates > Fix flaky tests due to sampling by introducing utility to refresh sampling counters for the current thread. > Minor reformatting in raw_hash_set: - Add a clear_backing_array member to declutter calls to ClearBackingArray. - Remove some unnecessary `inline` keywords on functions. - Make PoisonSingleGroupEmptySlots static. > Update CI for linux_gcc-floor to use GCC9, Bazel 7.5, and CMake 3.31.5. > `internal/any_invocable`: Rewrite `IsStoredLocally` type trait into a simpler constexpr function > Add ABSL_REQUIRE_EXPLICIT_INIT to Abseil to enable enforcing explicit field initializations > Require C++17 > Minimize number of `InitializeSlots` with respect to SizeOfSlot. > Leave the call to `SampleSlow` only in type erased InitializeSlots. > Update comments for Read4To8 and Read1To3. > PR #1819: fix compilation with AppleClang > Move SOO processing inside of InitializeSlots and move it once. > PR #1816: Random: use getauxval() via <sys/auxv.h> > Optimize `InitControlBytesAfterSoo` to have less writes and make them with compile time known size. > Remove stray plus operator in cleanup_internal::Storage > Include <cerrno> to fix compilation error in chromium build. > Adjust internal logging namespacing for consistency s/ABSL_LOGGING_INTERNAL_/ABSL_LOG_INTERNAL_/ > Rewrite LOG_EVERY_N (et al) docs to clarify that the first instance is logged. Also, deliberately avoid giving exact numbers or examples since IRL behavior is not so exact. > ABSL_ASSUME: Use a ternary operator instead of do-while in the implementations that use a branch marked unreachable so that it is usable in more contexts. 
> Simplify the comment for raw_hash_set::erase. > Remove preprocessors for now unsupported compilers. > `absl::ScopedMockLog`: Explicitly document that it captures logs emitted by all threads > Fix potential integer overflow in hash container create/resize > Add lifetimebound to StripPrefix/StripSuffix. > Random: Rollforward support runtime dispatch on AArch64 macOS > Crc: Only test non_temporal_store_memcpy_avx on AVX targets > Provide information about types of all flags. > Deprecate the precomputed hash find() API in swisstable. > Import of CCTZ from GitHub. > Adjust whitespace > Expand documentation for absl::raw_hash_set::erase to include idiom example of iterator post-increment. > Performance improvement for absl::AsciiStrToUpper() and absl::AsciiStrToLower() > Crc: Remove the __builtin_cpu_supports path for SupportsArmCRC32PMULL > Use absl::NoDestructor for some absl::Mutex instances in the flags library to prevent some exit-time destructor warnings > Update the WORKSPACE dependency of rules_cc to 0.1.0 > Rollback support runtime dispatch on AArch64 macOS for breaking some builds > Downgrade to rules_cc 0.0.17 because 0.1.0 was yanked > Use unused set in testing. > Random: Support runtime dispatch on AArch64 macOS > crc: Use absl::nullopt when returning absl::optional > Annotate absl::FixedArray to warn when unused. > PR #1806: Fix undefined symbol: __android_log_write > Move ABSL_HAVE_PTHREAD_CPU_NUMBER_NP to the file where it is needed > Use rbit instruction on ARM rather than rev. > Debugging: Report the CPU we are running on under Darwin > Add a microbenchmark for very long int/string tuples. > Crc: Detect support for pmull and crc instructions on Apple AArch64 With a newer clang, we can use __builtin_cpu_supports which caches all the feature bits. > Add special handling for hashing integral types so that we can optimize Read1To3 and Read4To8 for the strings case. > Use unused FixedArray instances. 
> Minor reformatting > Avoid flaky expectation in WaitDurationWoken test case in MSVC. > Use Bazel rules_cc for many compiler-specific rules instead of our custom ones from before the Bazel rules existed. > Mix pointers twice in absl::Hash. > New internal-use-only classes `AsStructuredLiteralImpl` and `AsStructuredValueImpl` > Annotate some Abseil container methods with [[clang::lifetime_capture_by(...)]] > Faster copy from inline Cords to inline Strings > Add new benchmark cases for hashing string lengths 1,2,4,8. > Move the Arm implementation of UnscaledCycleClock::Now() into the header file, like the x86 implementation, so it can be more easily inlined. > Minor include cleanup in absl/random/internal > Import of CCTZ from GitHub. > Use Bazel Platforms to support AES-NI compile options for Randen > In HashState::Create, require that T is a subclass of HashStateBase in order to discourage users from defining their own HashState types. > PR #1801: Remove unnecessary <iostream> includes > New class StructuredProtoField > Mix pointers twice in TSan and MSVC to avoid flakes in the PointerAlignment test. > Add a test case that type-erased absl::HashState is consistent with absl::HashOf. > Mix pointers twice in build modes in which the PointerAlignment test is flaky if we mix once. > Increase threshold for stuck bits in PointerAlignment test on android. > Use hashing ideas from Carbon's hashtable in absl hashing: - Use byte swap instead of mixing pointers twice. - Change order of branches to check for len<=8 first. - In len<=16 case, do one multiply to mix the data instead of using the logic from go/absl-hash-rl (reinforcement learning was used to optimize the instruction sequence). - Add special handling for len<=32 cases in 64-bit architectures. > Test that using a table that was moved-to from a moved-from table fails in sanitizer mode. > Remove a trailing comma causing an issue for an OSS user > Add missing includes in hash.h.
> Use the public implementation rule for "@bazel_tools//tools/cpp:clang-cl" > Import of CCTZ from GitHub. > Change the definition of is_trivially_relocatable to be a bit less conservative. > Updates to CI to support newer versions of tools > Check if ABSL_HAVE_INTRINSIC_INT128 is defined > Print hash expansions in the hash_testing error messages. > Avoid flakiness in notification_test on MSVC. > Roll back: Add more debug capacity validation checks on moves. > Add more debug capacity validation checks on moves. > Add macro versions of nullability annotations. > Improve fork-safety by opening files with `O_CLOEXEC`. > Move ABSL_HARDENING_ASSERTs in constexpr methods to their own lines. > Add test cases for absl::Hash: - That hashes are consistent for the same int value across different int types. - That hashes of vectors of strings are unequal even when their concatenations are equal. - That FragmentedCord hashes works as intended for small Cords. > Skip the IterationOrderChangesOnRehash test case in ASan mode because it's flaky. > Add missing includes in absl hash. > Try to use file descriptors in the 2000+ range to avoid mis-behaving client interference. > Add weak implementation of the __lsan_is_turned_off in Leak Checker > Fix a bug where EOF resulted in infinite loop. > static_assert that absl::Time and absl::Duration are trivially destructible. > Move Duration ToInt64<unit> functions to be inline. > string_view: Add defaulted copy constructor and assignment > Use `#ifdef` to avoid errors when `-Wundef` is used. > Strip PAC bits for return addresses in stack traces > PR #1794: Update cpu_detect.cc fix hw crc32 and AES capability check, fix undefined > PR #1790: Respect the allocator's .destroy method in ~InlinedVector > Cast away nullability in the guts of CHECK_EQ (et al) where Clang doesn't see that the nullable string returned by Check_EQImpl is statically nonnull inside the loop. 
> string_view: Correct string_view(const char, size_type) docs > Add support for std::string_view in StrCat even when absl::string_view != std::string_view. > Misc. adjustments to unit tests for logging. > Use local_config_cc from rules_cc and make it a dev dependency > Add additional iteration order tests with reservation. Reserved tables have a different way of iteration randomization compared to gradually resized tables (at least for small tables). > Use all the bits (`popcount`) in `FindFirstNonFullAfterResize` and `PrepareInsertAfterSoo`. > Mark ConsumePrefix, ConsumeSuffix, StripPrefix, and StripSuffix as constexpr since they are all pure functions. > PR #1789: Add missing #ifdef pp directive to the TypeName() function in the layout.h > PR #1788: Fix warning for sign-conversion on riscv > Make StartsWith and EndsWith constexpr. > Simplify logic for growing single group table. > Document that absl::Time and absl::Duration are trivially destructible. > Change some C-arrays to std::array as this enables bounds checking in some hardened standard library builds > Replace outdated select() on --cpu with platform API equivalent. > Take failure_message as const char* instead of string_view in LogMessageFatal and friends. > Mention `c_any_of` in the function comment of `absl::c_linear_search`. > Import of CCTZ from GitHub. > Rewrite some string_view methods to avoid a -Wunreachable-code warning > IWYU: Update includes and fix minor spelling mistakes. > Add comment on how to get next element after using erase. > Add ABSL_ATTRIBUTE_LIFETIME_BOUND and a doc note about absl::LogAsLiteral to clarify its intended use. > Import of CCTZ from GitHub. > Reduce memory consumption of structured logging proto encoding by passing tag value > Remove usage of _LIBCPP_HAS_NO_FILESYSTEM_LIBRARY. > Make Span's relational operators constexpr since C++20. > distributions: support a zero max value in Zipf. > PR #1786: Fix typo in test case. > absl/random: run clang-format. 
> Add some nullability annotations in logging and tidy up some NOLINTs and comments. > CMake: Change the default for ABSL_PROPAGATE_CXX_STD to ON > Delete UnvalidatedMockingBitGen > PR #1783: [riscv][debugging] Fix a few warnings in RISC-V inlines > Add conversion operator to std::array for StrSplit. > Add a comment explaining the extra comparison in raw_hash_set::operator==. Also add a small optimization to avoid the extra comparison in sets that use hash_default_eq as the key_equal functor. > Add benchmark for absl::HexStringToBytes > Avoid installing options.h with the other headers > Add ABSL_ATTRIBUTE_LIFETIME_BOUND to absl::Span constructors. > Annotate absl::InlinedVector to warn when unused. > Make `c_find_first_of`'s `options` parameter a const reference to allow temporaries. > Disable Elf symbols for Xtensa > PR #1775: Support symbolize only on WINAPI_PARTITION_DESKTOP > Require through an internal presubmit that .h|.cc|.inc files contain either the string ABSL_NAMESPACE_BEGIN or SKIP_ABSL_INLINE_NAMESPACE_CHECK > Xtensa supports mmap, enable it in absl/base/config.h > PR #1777: Avoid std::ldexp in `operator double(int128)`. > Marks absl::Span as view and borrowed_range, like std::span. > Mark inline functions with only a simple comparison in strings/ascii.h as constexpr. > Add missing Abseil inline namespace and fix includes > Fix bug where the high bits of `__int128_t`/`__uint128_t` might go unused in the hash function. This fix increases the hash quality of these types. > Add a test to verify bit casting between signed and unsigned int128 works as expected > Add suggestions to enable sanitizers for asserts when doing so may be helpful. > Add nullability attributes to nullability type aliases. > Refactor swisstable moves. > Improve ABSL_ASSERT performance by guaranteeing it is optimized away under NDEBUG in C++20 > Mark Abseil hardening assert in AssertSameContainer as slow. 
> Add workaround for q++ 8.3.0 (QNX 7.1) compiler by making sure MaskedPointer is trivially copyable and copy constructible. > Small Mutex::Unlock optimization > Optimize `CEscape` and `CEscapeAndAppend` by up to 40%. > Fix the conditional compilation of non_temporal_store_memcpy_avx to verify that AVX can be forced via `gnu::target`. > Delete TODOs to move functors when moving hashtables and add a test that fails when we do so. > Fix benchmarks in `escaping_benchmark.cc` by properly calling `benchmark::DoNotOptimize` on both inputs and outputs and by removing the unnecessary and wrong `ABSL_RAW_CHECK` condition (`check != 0`) of `BM_ByteStringFromAscii_Fail` benchmark. > It seems like commit abc9b916a94ebbf251f0934048295a07ecdbf32a did not work as intended. > Fix a bug in `absl::SetVLogLevel` where a less generic pattern incorrectly removed a more generic one. > Remove the side effects between tests in vlog_is_on_test.cc > Attempt to fix flaky Abseil waiter/sleep tests > Add an explicit tag for non-SOO CommonFields (removing default ctor) and add a small optimization for early return in AssertNotDebugCapacity. > Make moved-from swisstables behave the same as empty tables. Note that we may change this in the future. > Tag tests that currently fail on darwin_arm64 with "no_test_darwin_arm64" > add gmock to cmake defs for no_destructor_test > Optimize raw_hash_set moves by allowing some members of CommonFields to be uninitialized when moved-from. > Add more debug capacity validation checks on iteration/size. > Add more debug capacity validation checks on copies. > constinit -> constexpr for DisplayUnits > LSC: Fix null safety issues diagnosed by Clang’s `-Wnonnull` and `-Wnullability`. > Remove the extraneous variable creation in Match(). > Import of CCTZ from GitHub. > Add more debug capacity validation checks on merge/swap. 
> Add `absl::` namespace to c_linear_search implementation in order to avoid ADL > Distinguish the debug message for the case of self-move-assigned swiss tables. > Update LowLevelHash comment regarding number of hash state variables. > Add an example for the `--vmodule` flag. > Remove first prefetch. > Add moved-from validation for the case of self-move-assignment. > Allow slow and fast abseil hardening checks to be enabled independently. > Update `ABSL_RETIRED_FLAG` comment to reflect `default_value` is no longer used. > Add validation against use of moved-from hash tables. > Provide file-scoped pragma behind macro ABSL_POINTERS_DEFAULT_NONNULL to indicate the default nullability. This is a no-op for now (not understood by checkers), but does communicate intention to human readers. > Add stacktrace config for android using the generic implementation > Fix nullability annotations in ABSL code. > Replace CHECKs with ASSERTs and EXPECTs -- no reason to crash on failure. > Remove ABSL_INTERNAL_ATTRIBUTE_OWNER and ABSL_INTERNAL_ATTRIBUTE_VIEW > Migrate ABSL_INTERNAL_ATTRIBUTE_OWNER and ABSL_INTERNAL_ATTRIBUTE_VIEW to ABSL_ATTRIBUTE_OWNER and ABSL_ATTRIBUTE_VIEW > Disable ABSL_ATTRIBUTE_OWNER and ABSL_ATTRIBUTE_VIEW prior to Clang-13 due to false positives. > Make ABSL_ATTRIBUTE_VIEW and ABSL_ATTRIBUTE_OWNER public > Optimize raw_hash_set::AssertHashEqConsistent a bit to avoid having as much runtime overhead. > PR #1728: Workaround broken compilation against NDK r25 > Add validation against use of destroyed hash tables. > Do not truncate `ABSL_RAW_LOG` output at null bytes > Use several unused cord instances in tests and benchmarks. > Add comments about ThreadIdentity struct allocation behavior. > Refactoring followup for reentrancy validation in swisstable. > Add debug mode checks that element constructors/destructors don't make reentrant calls to raw_hash_set member functions. 
> Add tagging for cc_tests that are incompatible with Fuchsia > Add GetTID() implementation for Fuchsia > PR #1738: Fix shell option group handling in pkgconfig files > Disable weak attribute when absl compiled as windows DLL > Remove `CharIterator::operator->`. > Mark non-modifying container algorithms as constexpr for C++20. > PR #1739: container/internal: Explicitly include <cstdint> > Don't match -Wnon-virtual-dtor in the "flags are needed to suppress warnings in headers". It should fall through to the "don't impose our warnings on others" case. Do this by matching on "-Wno-" instead of "-Wno". > PR #1732: Fix build on NVIDIA Jetson board. Fix #1665 > Update GoogleTest dependency to 1.15.2 > Enable AsciiStrToLower and AsciiStrToUpper overloads for rvalue references. > PR #1735: Avoid `int` to `bool` conversion warning > Add `absl::swap` functions for `_hash_` to avoid calling `std::swap` > Change internal visibility > Remove resolved issue. > Increase test timeouts to support running on Fuchsia emulators > Add tracing annotations to absl::Notification > Suppress compiler optimizations which may break container poisoning. > Disable ABSL_INTERNAL_HAVE_DEBUGGING_STACK_CONSUMPTION for Fuchsia > Add tracing annotations to absl::BlockingCounter > Add absl_vlog_is_on and vlog_is_on to ABSL_INTERNAL_DLL_TARGETS > Update swisstable swap API comments to no longer guarantee that we don't move/swap individual elements. 
> PR #1726: cmake: Fix RUNPATH when using BUILD_WITH_INSTALL_RPATH=True > Avoid unnecessary copying when upper-casing or lower-casing ASCII string_view > Add weak internal tracing API > Fix LINT.IfChange syntax > PR #1720: Fix spelling mistake: occurrance -> occurrence > Add missing include for Windows ASAN configuration in poison.cc > Delete absl/strings/internal/has_absl_stringify.h now that the GoogleTest version we depend on uses the public file > Update versions of dependencies in preparation for release > PR #1699: Add option to build with MSVC static runtime > Remove unneeded 'be' from comment. > PR #1715: Generate options.h using CMake only once > Small type fix in absl/log/internal/log_impl.h > PR #1709: Handle RPATH CMake configuration > PR #1710: fixup! PR #1707: Fixup absl_random compile breakage in Apple ARM64 targets > PR #1695: Fix time library build for Apple platforms > Remove cyclic cmake dependency that breaks in cmake 3.30.0 > Roll forward poisoned pointer API and fix portability issues. > Use GetStatus in IsOkAndHoldsMatcher > PR #1707: Fixup absl_random compile breakage in Apple ARM64 targets > PR #1706: Require CMake version 3.16 > Add an MSVC implementation of ABSL_ATTRIBUTE_LIFETIME_BOUND > Mark c_min_element, c_max_element, and c_minmax_element as constexpr in C++17. > Optimize the absl::GetFlag cost for most non built-in flag types (including string). > Encode some additional metadata when writing protobuf-encoded logs. > Replace signed integer overflow, since that's undefined behavior, with unsigned integer overflow. > Make mutable CompressedTuple::get() constexpr. > vdso_support: support DT_GNU_HASH > Make c_begin, c_end, and c_distance conditionally constexpr. > Add operator<=> comparison to absl::Time and absl::Duration. 
> Deprecate `ABSL_ATTRIBUTE_NORETURN` in favor of the `[[noreturn]]` standardized in C++11 > Rollback new poisoned pointer API > Static cast instead of reinterpret cast raw hash set slots as casting from void* to T* is well defined > Fix absl::NoDestructor documentation about its use as a global > Declare Rust demangling feature-complete. > Split demangle_internal into a tree of smaller libraries. > Decode Rust Punycode when it's not too long. > Add assertions to detect reentrance in `IterateOverFullSlots` and `absl::erase_if`. > Decoder for Rust-style Punycode encodings of bounded length. > Add `c_contains()` and `c_contains_subrange()` to `absl/algorithm/container.h`. > Three-way comparison spaceship <=> operators for Cord. > internal-only change > Remove erroneous preprocessor branch on SGX_SIM. > Add an internal API to get a poisoned pointer. > optimization.h: Add missing <utility> header for C++ > Add a compile test for headers that require C compatibility > Fix comment typo > Expand documentation for SetGlobalVLogLevel and SetVLogLevel. > Roll back 6f972e239f668fa29cab43d7968692cd285997a9 > PR #1692: Add missing `<utility>` include > Remove NOLINT for `#include <new>` for __cpp_lib_launder > Remove not used after all kAllowRemoveReentrance parameter from IterateOverFullSlots. > Create `absl::container_internal::c_for_each_fast` for SwissTable. > Disable flaky test cases in kernel_timeout_internal_test. > Document that swisstable and b-tree containers are not exception-safe. > Add `ABSL_NULLABILITY_COMPATIBLE` attribute. > LSC: Move expensive variables on their last use to avoid copies. > Add ABSL_INTERNAL_ATTRIBUTE_VIEW and ABSL_INTERNAL_ATTRIBUTE_OWNER attributes to more types in Abseil > Drop std:: qualification from integer types like uint64_t. > Increase slop time on MSVC in PerThreadSemTest.Timeouts again due to continued flakiness. 
> Turn on validation for out of bounds MockUniform in MockingBitGen > Use ABSL_UNREACHABLE() instead of equivalent > If so configured, report which part of a C++ mangled name didn't parse. > Sequence of 1-to-4 values with prefix sum to support Punycode decoding. > Add the missing inline namespace to the nullability files > Add ABSL_INTERNAL_ATTRIBUTE_VIEW and ABSL_INTERNAL_ATTRIBUTE_OWNER attributes to types in Abseil > Disallow reentrance removal in `absl::erase_if`. > Fix implicit conversion of temporary bitgen to BitGenRef > Use `IterateOverFullSlots` in `absl::erase_if` for hash table. > UTF-8 encoding library to support Rust Punycode decoding. > Disable negative NaN float ostream format checking on RISC-V > PR #1689: Minor: Add missing quotes in CMake string view library definition > Demangle template parameter object names, TA <template-arg>. > Demangle sr St <simple-id> <simple-id>, a dubious encoding found in the wild. > Try not to lose easy type combinators in S::operator const int() and the like. > Demangle fixed-width floating-point types, DF.... > Demangle _BitInt types DB..., DU.... > Demangle complex floating-point literals. > Demangle <extended-qualifier> in types, e.g., U5AS128 for address_space(128). > Demangle operator co_await (aw). > Demangle fully general vendor extended types (any <template-args>). > Demangle transaction-safety notations GTt and Dx. > Demangle C++11 user-defined literal operator functions. > Demangle C++20 constrained friend names, F (<source-name> | <operator-name>). > Demangle dependent GNU vector extension types, Dv <expression> _ <type>. > Demangle elaborated type names, (Ts | Tu | Te) <name>. > Add validation that hash/eq functors are consistent, meaning that `eq(k1, k2) -> hash(k1) == hash(k2)`. > Demangle delete-expressions with the global-scope operator, gs (dl | da) .... > Demangle new-expressions with braced-init-lists. > Demangle array new-expressions, [gs] na .... > Demangle object new-expressions, [gs] nw .... 
> Demangle preincrement and predecrement, pp_... and mm_.... > Demangle throw and rethrow (tw... and tr). > Remove redundant check of is_soo() while prefetching heap blocks. > Demangle ti... and te... expressions (typeid). > Demangle nx... syntax for noexcept(e) as an expression in a dependent signature. > Demangle alignof expressions, at... and az.... > Demangle C++17 structured bindings, DC...E. > Demangle modern _ZGR..._ symbols. > Remove redundant check of is_soo() while prefetching heap blocks. > Demangle sizeof...(pack captured from an alias template), sP ... E. > Demangle types nested under vendor extended types. > Demangle il ... E syntax (braced list other than direct-list-initialization). > Avoid signed overflow for Ed <number> _ manglings with large <number>s. > Remove redundant check of is_soo() while prefetching heap blocks. > Remove obsolete TODO > Clarify function comment for `erase` by stating that this idiom only works for "some" standard containers. > Move SOVERSION to global CMakeLists, apply SOVERSION to DLL > Set ABSL_HAVE_THREAD_LOCAL to 1 on all platforms > Demangle constrained auto types (Dk <type-constraint>). > Parse <discriminator> more accurately. > Demangle lambdas in class member functions' default arguments. > Demangle unofficial <unresolved-qualifier-level> encodings like S0_IT_E. > Do not make std::filesystem::path hash available for macOS <10.15 > Include flags in DLL build (non-Windows only) > Enable building monolithic shared library on macOS and Linux. > Demangle Clang's last-resort notation _SUBSTPACK_. > Demangle C++ requires-expressions with parameters (rQ ... E). > Demangle Clang's encoding of __attribute__((enable_if(condition, "message"))). > Demangle static_cast and friends. > Demangle decltype(expr)::nested_type (NDT...E). > Optimize GrowIntoSingleGroupShuffleControlBytes. > Demangle C++17 fold-expressions. > Demangle thread_local helper functions. 
> Demangle lambdas with explicit template arguments (UlTy and similar forms). > Demangle &-qualified function types. > Demangle valueless literals LDnE (nullptr) and LA<number>_<type>E ("foo"). > Correctly demangle the <unresolved-name> at the end of dt and pt (x.y, x->y). > Add missing targets to ABSL_INTERNAL_DLL_TARGETS > Build abseil_test_dll with ABSL_BUILD_TESTING > Demangle C++ requires-expressions without parameters (rq ... E). > overload: make the constructor constexpr > Update Abseil CI Docker image to use Clang 19, GCC 14, and CMake 3.29.3 > Workaround symbol resolution bug in Clang 19 > Workaround bogus GCC14 -Wmaybe-uninitialized warning > Silence a bogus GCC14 -Warray-bounds warning > Forbid absl::Uniform<absl::int128>(gen) > Use IN_LIST to replace list(FIND) + > -1 > Recognize C++ vendor extended expressions (e.g., u9__is_same...E). > `overload_test`: Remove a few unnecessary trailing return types > Demangle the C++ this pointer (fpT). > Stop eating an extra E in ParseTemplateArg for some L<type><value>E literals. > Add ABSL_INTERNAL_ATTRIBUTE_VIEW and ABSL_INTERNAL_ATTRIBUTE_OWNER attributes to Abseil. > Demangle C++ direct-list-initialization (T{1, 2, 3}, tl ... E). > Demangle the C++ spaceship operator (ss, operator<=>). > Demangle C++ sZ encodings (sizeof...(pack)). > Demangle C++ so ... E encodings (typically array-to-pointer decay). > Recognize dyn-trait-type in Rust demangling. > Rework casting in raw_hash_set's IsFull(). > Remove test references to absl::SharedBitGen, which was never part of the open source release. This was only used in tests that never ran as part in the open source release. > Recognize fn-type and lifetimes in Rust demangling. > Support int128/uint128 in validated MockingBitGen > Recognize inherent-impl and trait-impl in Rust demangling. > Recognize const and array-type in Rust mangled names. > Remove Asylo from absl. > Recognize generic arguments containing only types in Rust mangled names. 
> Fix missing #include <random> for std::uniform_int_distribution > Move `prepare_insert` out of the line as type erased `PrepareInsertNonSoo`. > Revert: Add -Wdead-code-aggressive to ABSL_LLVM_FLAGS > Add (unused) validation to absl::MockingBitGen > Support `AbslStringify` with `DCHECK_EQ`. > PR #1672: Optimize StrJoin with tuple without user defined formatter > Give ReturnAddresses and N<uppercase> namespaces separate stacks for clarity. > Demangle Rust backrefs. > Use Nt for struct and trait names in Rust demangler test inputs. > Allow __cxa_demangle on MIPS > Add a `string_view` overload to `absl::StrJoin` > Demangle Rust's Y<type><path> production for passably simple <type>s. > `convert_test`: Delete obsolete condition around ASSERT_EQ in TestWithMultipleFormatsHelper > `any_invocable`: Clean up #includes > Resynchronize absl/functional/CMakeLists.txt with BUILD.bazel > `any_invocable`: Add public documentation for undefined behavior when invoking an empty AnyInvocable > `any_invocable`: Delete obsolete reference to proposed standard type > PR #1662: Replace shift with addition in crc multiply > Doc fix. > `convert_test`: Extract loop over tested floats from helper function > Recognize some simple Rust mangled names in Demangle. > Use __builtin_ctzg and __builtin_clzg in the implementations of CountTrailingZeroesNonzero16 and CountLeadingZeroes16 when they are available. > Remove the forked absl::Status matchers implementation in statusor_test > Add comment hack to fix copybara reversibility > Add GoogleTest matchers for absl::Status > [random] LogUniform: Document as a discrete distribution > Enable Cord tests with Crc. > Fix order of qualifiers in `absl::AnyInvocable` documentation. > Guard against null pointer dereference in DumpNode. > Apply ABSL_MUST_USE_RESULT to try lock functions. > Add public aliases for default hash/eq types in hash-based containers > Import of CCTZ from GitHub. 
> Remove the hand-rolled CordLeaker and replace with absl::NoDestructor to test the after-exit behavior > `convert_test`: Delete obsolete `skip_verify` parameter in test helper > overload: allow using the underlying type with CTAD directly. > PR #1653: Remove unnecessary casts when calling CRC32_u64 > PR #1652: Avoid C++23 deprecation warnings from float_denorm_style > Minor cleanup for `absl::Cord` > PR #1651: Implement ABSL_INTERNAL_DISABLE_DEPRECATED_DECLARATION_WARNING for MSVC compiler > Add `operator<=>` support to `absl::int128` and `absl::uint128` > [absl] Re-use the existing `std::type_identity` backfill instead of redefining it again > Add `absl::AppendCordToString` > `str_format/convert_test`: Delete workaround for [glibc bug](https://sourceware.org/bugzilla/show_bug.cgi?id=22142) > `absl/log/internal`: Document conditional ABSL_ATTRIBUTE_UNUSED, add C++17 TODO > `log/internal/check_op`: Add ABSL_ATTRIBUTE_UNUSED to CHECK macros when STRIP_LOG is enabled > log_benchmark: Add VLOG_IS_ON benchmark > Restore string_view detection check > Remove an unnecessary ABSL_ATTRIBUTE_UNUSED from a logging macro < Abseil LTS Branch, Jan 2024, Patch 2 (#1650) > In example code, add missing template parameter. > Optimize crc32 V128_From2x64 on Arm > Annotate that Mutex should warn when unused. > Add ABSL_ATTRIBUTE_LIFETIME_BOUND to Cord::Flatten/TryFlat > Deprecate `absl::exchange`, `absl::forward` and `absl::move`, which were only useful before C++14. > Temporarily revert dangling std::string_view detection until dependent is fixed > Use _decimal_ literals for the CivilDay example. > Fix bug in BM_EraseIf. 
> Add internal traits to absl::string_view for lifetimebound detection > Add internal traits to absl::StatusOr for lifetimebound detection > Add internal traits to absl::Span for lifetimebound detection > Add missing dependency for log test build target > Add internal traits for lifetimebound detection > Use local decoding buffer in HexStringToBytes > Only check if the frame pointer is inside a signal stack with known bounds > Roll forward: enable small object optimization in swisstable. > Optimize LowLevelHash by breaking dependency between final loads and previous len/ptr updates. > Fix the wrong link. > Optimize InsertMiss for tables without kDeleted slots. > Use GrowthInfo without applying any optimizations based on it. > Disable small object optimization while debugging some failing tests. > Adjust conditonal compilation in non_temporal_memcpy.h > Reformat log/internal/BUILD > Remove deprecated errno constants from the absl::Status mapping > Introduce GrowthInfo with tests, but without usage. > Enable small object optimization in swisstable. > Refactor the GCC unintialized memory warning suppression in raw_hash_set.h. > Respect `NDEBUG_SANITIZER` > Revert integer-to-string conversion optimizations pending more thorough analysis > Fix a bug in `Cord::{Append,Prepend}(CordBuffer)`: call `MaybeRemoveEmptyCrcNode()`. Otherwise appending a `CordBuffer` an empty Cord with a CRC node crashes (`RemoveCrcNode()` which increases the refcount of a nullptr child). > Add `BM_EraseIf` benchmark. > Record sizeof(key_type), sizeof(value_type) in hashtable profiles. > Fix ClangTidy warnings in btree.h. > LSC: Move expensive variables on their last use to avoid copies. > PR #1644: unscaledcycleclock: remove RISC-V support > Reland: Make DLOG(FATAL) not understood as [[noreturn]] > Separate out absl::StatusOr constraints into statusor_internal.h > Use Layout::WithStaticSizes in btree. 
> `layout`: Delete outdated comments about ElementType alias not being used because of MSVC > Performance improvement for absl::AsciiStrToUpper() and absl::AsciiStrToLower() > `layout_benchmark`: Replace leftover comment with intended call to MyAlign > Remove absl::aligned_storage_t > Delete ABSL_ANNOTATE_MEMORY_IS_INITIALIZED under Thread Sanitizer > Remove vestigial variables in the DumpNode() helper in absl::Cord > Do hashtablez sampling on the first insertion into an empty SOO hashtable. > Add explicit #include directives for <tuple>, "absl/base/config.h", and "absl/strings/string_view.h". > Add a note about the cost of `VLOG` in non-debug builds. > Fix flaky test failures on MSVC. > Add template keyword to example comment for Layout::WithStaticSizes. > PR #1643: add xcprivacy to all subspecs > Record sampling stride in cord profiling to facilitate unsampling. > Fix a typo in a comment. > [log] Correct SetVLOGLevel to SetVLogLevel in comments > Add a feature to container_internal::Layout that lets you specify some array sizes at compile-time as template parameters. This can make offset and size calculations faster. > `layout`: Mark parameter of Slices with ABSL_ATTRIBUTE_UNUSED, remove old workaround > `layout`: Use auto return type for functions that explicitly instantiate std::tuple in return statements > Remove redundant semicolons introduced by macros > [log] Make :vlog_is_on/:absl_vlog_is_on public in BUILD.bazel > Add additional checks for size_t overflows > Replace //visibility:private with :__pkg__ for certain targets > PR #1603: Disable -Wnon-virtual-dtor warning for CommandLineFlag implementations > Add several missing includes in crc/internal > Roll back extern template instatiations in swisstable due to binary size increases in shared libraries. > Add nodiscard to SpinLockHolder. > Test that rehash(0) reduces capacity to minimum. > Add extern templates for common swisstable types. 
> Disable ubsan for benign unaligned access in crc_memcpy > Make swisstable SOO support GDB pretty printing and still compile in OSS. > Fix OSX support with CocoaPods and Xcode 15 > Fix GCC7 C++17 build > Use UnixEpoch and ZeroDuration > Make flaky failures much less likely in BasicMocking.MocksNotTriggeredForIncorrectTypes test. > Delete a stray comment > Move GCC uninitialized memory warning suppression into MaybeInitializedPtr. > Replace usages of absl::move, absl::forward, and absl::exchange with their std:: equivalents > Fix the move to itself > Work around an implicit conversion signedness compiler warning > Avoid MSan: use-of-uninitialized-value error in find_non_soo. > Fix flaky MSVC test failures by using longer slop time. > Add ABSL_ATTRIBUTE_UNUSED to variables used in an ABSL_ASSUME. > Implement small object optimization in swisstable - disabled for now. > Document and test ability to use absl::Overload with generic lambdas. > Extract `InsertPosition` function to be able to reuse it. > Increase GraphCycles::PointerMap size > PR #1632: inlined_vector: Use trivial relocation for `erase` > Create `BM_GroupPortable_Match`. > [absl] Mark `absl::NoDestructor` methods with `absl::Nonnull` as appropriate > Automated Code Change > Rework casting in raw_hash_set's `IsFull()`. > Adds ABSL_ATTRIBUTE_LIFETIME_BOUND to absl::BitGenRef > Workaround for NVIDIA C++ compiler being unable to parse variadic expansions in range of range-based for loop > Rollback: Make DLOG(FATAL) not understood as [[noreturn]] > Make DLOG(FATAL) not understood as [[noreturn]] > Optimize `absl::Duration` division and modulo: Avoid repeated redundant comparisons in `IDivFastPath`. > Optimize `absl::Duration` division and modulo: Allow the compiler to inline `time_internal::IDivDuration`, by splitting the slow path to a separate function. > Fix typo in example code snippet. > Automated Code Change > Add braces for conditional statements in raw_hash_map functions. 
> Optimize `prepare_insert`, when resize happens. It removes single unnecessary probing before resize that is beneficial for small tables the most. > Add noexcept to move assignment operator and swap function > Import of CCTZ from GitHub. > Minor documentation updates. > Change find_or_prepare_insert to return std::pair<iterator, bool> to match return type of insert. > PR #1618: inlined_vector: Use trivial relocation for `SwapInlinedElements` > Improve raw_hash_set tests. > Performance improvement for absl::AsciiStrToUpper() and absl::AsciiStrToLower() > Use const_cast to avoid duplicating the implementation of raw_hash_set::find(key). > Import of CCTZ from GitHub. > Performance improvement for absl::AsciiStrToUpper() and absl::AsciiStrToLower() > Annotate that SpinLock should warn when unused. > PR #1625: absl::is_trivially_relocatable now respects assignment operators > Introduce `Group::MaskNonFull` without usage. > `demangle`: Parse template template and C++20 lambda template param substitutions > PR #1617: fix MSVC 32-bit build with -arch:AVX > Minor documentation fix for `absl::StrSplit()` > Prevent overflow in `absl::CEscape()` > `demangle`: Parse optional single template argument for built-in types > PR #1412: Filter out `-Xarch_` flags from pkg-config files > `demangle`: Add complexity guard to `ParseQRequiresExpr` < Prepare 20240116.1 patch for Apple Privacy Manifest (#1623) > Remove deprecated symbol absl::kuint128max > Add ABSL_ATTRIBUTE_WARN_UNUSED. > `demangle`: Parse `requires` clauses on template params, before function return type > On Apple, implement absl::is_trivially_relocatable with the fallback. > `demangle`: Parse `requires` clauses on functions > Make `begin()` to return `end()` on empty tables. > `demangle`: Parse C++20-compatible template param declarations, except those with `requires` expressions > Add the ABSL_DEPRECATE_AND_INLINE() macro > Span: Fixed comment referencing std::span as_writable_bytes() as as_mutable_bytes(). 
> Switch rank structs to be consistent with written guidance in go/ranked-overloads > Avoid hash computation and `Group::Match` in small tables copy and use `IterateOverFullSlots` for iterating for all tables. > Optimize `absl::Hash` by making `LowLevelHash` faster. > Add -Wdead-code-aggressive to ABSL_LLVM_FLAGS < Backport Apple Privacy Manifest (#1613) > Stop using `std::basic_string<uint8_t>` which relies on a non-standard generic `char_traits<>` implementation, recently removed from `libc++`. > Add absl_container_hash-based HashEq specialization > `demangle`: Implement parsing for simplest constrained template arguments > Roll forward 9d8588bfc4566531c4053b5001e2952308255f44 (which was rolled back in 146169f9ad357635b9cd988f976b38bcf83476e3) with fix. > Add a version of absl::HexStringToBytes() that returns a bool to validate that the input was actually valid hexadecimal data. > Enable StringLikeTest in hash_function_defaults_test > Fix a typo. > Minor changes to the BUILD file for absl/synchronization > Avoid static initializers in case of ABSL_FLAGS_STRIP_NAMES=1 > Rollback 9d8588bfc4566531c4053b5001e2952308255f44 for breaking the build > No public description > Decrease the precision of absl::Now in x86-64 debug builds > Optimize raw_hash_set destructor. > Add ABSL_ATTRIBUTE_UNINITIALIZED macros for use with clang and GCC's `uninitialized` > Optimize `Cord::Swap()` for missed compiler optimization in clang. > Type erased hash_slot_fn that depends only on key types (and hash function). > Replace `testonly = 1` with `testonly = True` in abseil BUILD files. > Avoid extra `& msbs` on every iteration over the mask for GroupPortableImpl. > Missing parenthesis. > Early return from destroy_slots for trivially destructible types in flat_hash_{}. > Avoid export of testonly target absl::test_allocator in CMake builds > Use absl::NoDestructor for cordz global queue. > Add empty WORKSPACE.bzlmod > Introduce `RawHashSetLayout` helper class. 
> Fix a corner case in SpyHashState for exact boundaries. > Add nullability annotations > Use absl::NoDestructor for global HashtablezSampler. > Always check if the new frame pointer is readable. > PR #1604: Add privacy manifest < Disable ABSL_ATTRIBUTE_TRIVIAL_ABI in open-source builds (#1606) > Remove code pieces for no longer supported GCC versions. > Disable ABSL_ATTRIBUTE_TRIVIAL_ABI in open-source builds > Prevent brace initialization of AlphaNum > Remove code pieces for no longer supported MSVC versions. > Added benchmarks for smaller size copy constructors. > Migrate empty CrcCordState to absl::NoDestructor. > Add protected copy ctor+assign to absl::LogSink, and clarify thread-safety requirements to apply to the interface methods. < Apply LTS transformations for 20240116 LTS branch (#1599) Closes scylladb/scylladb#28756	2026-04-08 12:19:54 +03:00
Liapkovich	4f17cc6d83	docs: add missing rack value for internode_compression parameter The rack option was fully implemented in the code but omitted from both docs/operating-scylla/admin.rst and conf/scylla.yaml comments. Closes scylladb/scylladb#29239	2026-04-08 12:19:54 +03:00
Pavel Emelyanov	0ea76a468f	schema: Avoid copies in column_mapping::operator== In a multi-declarator declaration, the & ref-qualifier is part of each individual declarator, not the shared type specifier. So: const auto& a = x(), b = y(); declares 'a' as a reference but 'b' as a value, silently copying y(). The same applies to: const T& a = v[i], b = v[j]; Both operator== lines had this pattern, causing an unnecessary copy of the column vector and an unnecessary copy of each entry on every call. Fix by repeating & on the second declarator in both lines. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#29213	2026-04-08 12:19:54 +03:00
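Below is a minimal standalone sketch of the declarator pitfall that commit describes. It uses a plain `std::vector<int>` instead of the real `column_mapping` types, so the function names are illustrative only.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// In a multi-declarator declaration, '&' belongs to each declarator,
// not to the shared type specifier.
bool second_declarator_copies(const std::vector<int>& v) {
    const auto& a = v[0], b = v[1];   // 'a' is const int&, 'b' is a const int copy
    static_assert(std::is_reference_v<decltype(a)>);
    static_assert(!std::is_reference_v<decltype(b)>);
    return &a == &v[0] && &b != &v[1];  // 'b' has its own storage
}

// The fix from the commit: repeat '&' on the second declarator.
bool both_declarators_reference(const std::vector<int>& v) {
    const auto& a = v[0], &b = v[1];  // both are const int&
    static_assert(std::is_reference_v<decltype(b)>);
    return &a == &v[0] && &b == &v[1];
}
```

With `int` the extra copy costs almost nothing. With a vector of columns, as in `operator==`, every call copies the whole container.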
Pavel Emelyanov	b7c14c6d29	token_metadata: Clear _topology_change_info gently clear_gently() (introduced in `322aa2f8b5`) clears all token_metadata_impl members using co_await to avoid reactor stalls on large data structures. _topology_change_info (introduced in `10bf8c7901`) was added later and not included in clear_gently(). update_topology_change_info() already uses utils::clear_gently() when replacing the value, so it looks reasonable to apply the same pattern in clear_gently(). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#29210	2026-04-08 12:19:54 +03:00
Pavel Emelyanov	54fbbf0410	locator/tablets: Fix missing selector value in error messages Some on_internal_error() calls pass the selector as an argument to a format string that has no placeholder for it. "While at it", disambiguate the selector type in the message text. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#29208	2026-04-08 12:19:54 +03:00
Botond Dénes	418141ec08	Merge 'Drop create_dataset() helper from object_store tests' from Pavel Emelyanov There's only one test left that uses it, and it can be patched to use standard ks/cf creation helpers from pylib. This patch does so and drops the lengthy create_dataset() helper Tests improvements, no need to backport Closes scylladb/scylladb#29176 * github.com:scylladb/scylladb: test/backup: drop create_dataset helper test/backup: use new_test_keyspace in test_restore_primary_replica	2026-04-08 12:19:54 +03:00
Petr Gusev	1e3c8c5a87	test_mutation_schema_change: use tablets The enable_tablets(false) attribute was added when LWT wasn't supported for tablets. Now it is, so the attribute is no longer needed. The test covers behavior that should work the same way for both vnodes and tablets, so it doesn't seem it would benefit much from running in both enable_tablets(true) and enable_tablets(false) modes. Closes scylladb/scylladb#29167	2026-04-08 12:19:54 +03:00
Pavel Emelyanov	7f854c0255	hints: Use shorter fault-injection overload The inject(duration) overload applies a fault-injected delay directly. Using it results in shorter code. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#29168	2026-04-08 10:51:37 +03:00
Botond Dénes	aeefbda304	Merge 'Simplify and improve API describe_ring code flow' from Pavel Emelyanov The endpoint in question has some places worth fixing, in particular: - the keyspace parameter is not validated - the validated table name is resolved into table_id, but the id is unused - two ugly static helpers to stream obtained token ranges into json Improving the API code flow, not backporting Closes scylladb/scylladb#29154 * github.com:scylladb/scylladb: api: Inline describe_ring JSON handling storage_service: Make describe_ring_for_table() take table_id	2026-04-08 10:50:07 +03:00
Artsiom Mishuta	b1e9c0b867	test/pylib: add typed skip markers plugin Add skip_reason_plugin.py — a framework-agnostic pytest plugin that provides typed skip markers (skip_bug, skip_not_implemented, skip_slow, skip_env) so that the reason a test is skipped is machine-readable in JUnit XML and Allure reports. Bare untyped pytest.mark.skip now triggers a warning (to become an error after full migration). Runtime skips via skip() are also enriched by parsing the [type] prefix from the skip message. The plugin is a class (SkipReasonPlugin) that receives the concrete SkipType enum and an optional report_callback from conftest.py, keeping it decoupled from allure and project-specific types. Extract SkipType enum and convenience runtime skip wrappers (skip_bug, skip_env, etc.) into test/pylib/skip_types.py so callers only need a single import instead of importing both SkipType and skip() separately. conftest.py imports SkipType from the new module and registers the plugin instance unconditionally (for all test runners). 
New files: - test/pylib/skip_reason_plugin.py: core plugin — typed marker processing, bare-skip warnings, JUnit/Allure report enrichment (including runtime skip() parsing via _parse_skip_type helper) - test/pylib/skip_types.py: SkipType enum and convenience wrappers (skip_bug, skip_not_implemented, skip_slow, skip_env) - test/pylib_test/test_skip_reason_plugin.py: 17 pytester-based test functions (51 cases across 3 build modes) covering markers, warnings, reports, callbacks, and skip_mode interaction Infrastructure changes: - test/conftest.py: import SkipType from skip_types, register SkipReasonPlugin with allure report callback - test/pylib/runner.py: set SKIP_TYPE_KEY/SKIP_REASON_KEY stash keys for skip_mode so the report hook can enrich JUnit/Allure with skip_type=mode without longrepr parsing - test/pytest.ini: register typed marker definitions (required for --strict-markers even when plugin is not loaded) Migrated test files (representative samples): - test/cluster/test_tablet_repair_scheduler.py: skip -> skip_bug (#26844), skip -> skip_not_implemented - test/cqlpy/.../timestamp_test.py: skip -> skip_slow - test/cluster/dtest/schema_management_test.py: skip -> skip_not_implemented - test/cluster/test_change_replication_factor_1_to_0.py: skip -> skip_bug (#20282) - test/alternator/conftest.py: skip -> skip_env - test/alternator/test_https.py: use skip_env() wrapper Fixes SCYLLADB-79 Closes scylladb/scylladb#29235	2026-04-08 10:38:56 +03:00
Pavel Emelyanov	e0fa9ee332	Merge 'storage: implement sstable clone for object storage' from Ernest Zaslavsky This patch series implements `object_storage_base::clone`, which was previously a stub that aborted at runtime. Clone creates a copy of an sstable under a new generation and is used during compaction. The implementation uses server-side object copies (S3 CopyObject / GCS Objects: rewrite) and mirrors the filesystem clone semantics: TemporaryTOC is written first to mark the operation as in-progress, component objects are copied, and TemporaryTOC is removed to commit (unless the caller requested the destination be left unsealed). The first two patches fix pre-existing bugs in the underlying storage clients that were exposed by the new clone code path: - GCS `copy_object` used the wrong HTTP method (PUT instead of POST) and sent an invalid empty request body. - S3 `copy_object` silently ignored the abort_source parameter. 1. gcp_client: fix copy_object request method and body — Fix two bugs in the GCS rewrite API call. 2. s3_client: pass through abort_source in copy_object — Stop ignoring the abort_source parameter. 3. object_storage: add copy_object to object_storage_client — New interface method with S3 and GCS implementations. 4. storage: add make_object_name overload with generation — Helper for building destination object names with a different generation. 5. storage: make delete_object const — Needed by the const clone method. 6. storage: implement object_storage_base::clone — The actual clone implementation plus a copy_object wrapper. 7. test/boost: enable sstable clone tests for S3 and GCS — Re-enable the previously skipped tests. A test similar to `sstable_clone_leaving_unsealed_dest_sstable` was added to properly test the sealed/unsealed states for object storage. Works for both S3 and GCS. 
Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1045 Prerequisite: https://github.com/scylladb/scylladb/pull/28790 No need to backport since this code targets future feature Closes scylladb/scylladb#29166 * github.com:scylladb/scylladb: compaction_test: enable sstable clone tests for S3 and GCS storage: implement object_storage_base::clone storage: make delete_object const in object_storage_base storage: add make_object_name overload with generation sstables: add get_format() accessor to sstable object_storage: add copy_object to object_storage_client s3_client: pass through abort_source in copy_object gcp_client: fix copy_object request method and body	2026-04-08 09:35:10 +03:00
Nadav Har'El	4eeb9f4120	lwt, vector: write to CDC when vector index is enabled. The vector-search feature introduced the somewhat confusing feature of enabling CDC without explicitly enabling CDC: When a vector index is enabled on a table, CDC is "enabled" for it even if the user didn't ask to enable CDC. For this, write-path code began to use a new cdc_enabled() function instead of checking schema.cdc_options.enabled() directly. This cdc_enabled() function checks if either this enabled() is true, or has_vector_index() is true. Unfortunately, LWT writes continued to use cdc_options.enabled() instead of the new cdc_enabled(). This means that if a vector index is used and a vector is written using an LWT write, the new value is not indexed. This patch fixes this bug. It also adds a regression test that fails before this patch and passes afterwards - the new test verifies that when a table has a vector index (but no explicit CDC enabled), the CDC log is updated both after regular writes and after successful LWT writes. This patch was also tested in the context of the upcoming vector-search-for-Alternator pull request, which has a test reproducing this bug (Alternator uses LWT frequently, so this is very important there). It will also be tested by the vector-store test suite ("validator"). Fixes SCYLLADB-1342 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#29300	2026-04-08 07:55:05 +03:00
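The rule described above can be sketched as follows. `schema_sketch` and `cdc_options_sketch` are hypothetical stand-ins, not ScyllaDB's real schema classes. Only the shape of the check comes from the commit message: the explicit CDC option OR a vector index.

```cpp
#include <cassert>

// Hypothetical stand-ins for the real schema types; names are illustrative only.
struct cdc_options_sketch {
    bool enabled_flag = false;
    bool enabled() const { return enabled_flag; }
};

struct schema_sketch {
    cdc_options_sketch cdc;
    bool vector_index = false;
    const cdc_options_sketch& cdc_options() const { return cdc; }
    bool has_vector_index() const { return vector_index; }
};

// CDC is effectively on if the user enabled it explicitly
// OR the table has a vector index.
bool cdc_enabled(const schema_sketch& s) {
    return s.cdc_options().enabled() || s.has_vector_index();
}

// The bug: a write path that checks only the explicit option
// misses vector-indexed tables.
bool lwt_should_write_cdc_buggy(const schema_sketch& s) { return s.cdc_options().enabled(); }

// The fix: use the combined predicate.
bool lwt_should_write_cdc_fixed(const schema_sketch& s) { return cdc_enabled(s); }
```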
Marcin Maliszkiewicz	1bf3110adb	Merge 'test: add test_upgrade_preserves_ddl_audit_for_tables' from Andrzej Jackowski Verify that upgrading from 2025.1 to master does not silently drop DDL auditing for table-scoped audit configurations ([SCYLLADB-1155](https://scylladb.atlassian.net/browse/SCYLLADB-1155)). Test time in dev: 4s Refs: SCYLLADB-1155 Fixes: SCYLLADB-1305 No backport, test for bug on master [SCYLLADB-1155]: https://scylladb.atlassian.net/browse/SCYLLADB-1155?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ Closes scylladb/scylladb#29223 * github.com:scylladb/scylladb: test: add test_upgrade_preserves_ddl_audit_for_tables test: audit: split validate helper so callers need not pass audit_settings test: audit: declare manager attribute in AuditTester base class	2026-04-07 17:29:11 +02:00
Marcin Maliszkiewicz	895fdb6d29	Merge 'ldap: fix double-free of LDAPMessage in poll_results()' from Andrzej Jackowski In the unregistered-ID branch, ldap_msgfree() was called on a result already owned by an RAII ldap_msg_ptr, causing a double-free on scope exit. Remove the redundant manual free. Fixes: SCYLLADB-1344 Backport: 2026.1, 2025.4, 2025.1 - it's a memory corruption, with a one-line fix, so better backport it everywhere. Closes scylladb/scylladb#29302 * github.com:scylladb/scylladb: test: ldap: add regression test for double-free on unregistered message ID ldap: fix double-free of LDAPMessage in poll_results()	2026-04-07 17:27:43 +02:00
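Here is a minimal sketch of the ownership bug, assuming an RAII wrapper built on `std::unique_ptr` with a custom deleter. `fake_msg` and `fake_msgfree()` are hypothetical stand-ins for `LDAPMessage` and `ldap_msgfree()`. A counter makes the number of frees observable.

```cpp
#include <cassert>
#include <memory>

// Stand-ins for LDAPMessage and ldap_msgfree(); the counter records frees.
struct fake_msg {};
inline int free_calls = 0;
void fake_msgfree(fake_msg* m) { ++free_calls; delete m; }

struct msg_deleter {
    void operator()(fake_msg* m) const { fake_msgfree(m); }
};
using msg_ptr = std::unique_ptr<fake_msg, msg_deleter>;  // analogous to an RAII ldap_msg_ptr

// Fixed pattern: the RAII owner is the only one that frees the message.
void handle_unregistered_id() {
    msg_ptr res(new fake_msg);
    // ... log and drop the result ...
    // No manual fake_msgfree(res.get()) here: 'res' would free the same
    // pointer again when it goes out of scope (the double-free).
}
```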
Ernest Zaslavsky	422f107122	compaction_test: enable sstable clone tests for S3 and GCS Now that object_storage_base::clone is implemented, remove the early-return skips and re-enable the sstable_clone_leaving_unsealed_dest_sstable tests for both S3 and GCS storage backends.	2026-04-07 18:16:52 +03:00
Ernest Zaslavsky	7cd9bbb010	storage: implement object_storage_base::clone Implement the clone method for object_storage_base, which creates a copy of an sstable with a new generation using server-side object copies. Also add a const copy_object convenience wrapper, similar to the existing put_object and delete_object wrappers. A dedicated test for the new object storage clone path will be added in the following commit. The preexisting local-filesystem clone is already covered by the sstable_clone_leaving_unsealed_dest_sstable test.	2026-04-07 18:16:52 +03:00
Ernest Zaslavsky	8fa82e6b6f	storage: make delete_object const in object_storage_base The method doesn't modify any member state. Making it const is needed for calling it from the const clone method.	2026-04-07 18:16:52 +03:00
Ernest Zaslavsky	47387341bb	storage: add make_object_name overload with generation Add a make_object_name overload that accepts a target generation parameter for constructing object names with a generation different from the source sstable's own. Refactor the original make_object_name to delegate to the new overload, eliminating code duplication. This is needed by clone to build destination object names for the new generation.	2026-04-07 18:16:52 +03:00
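The next sketch shows the overload-plus-delegation refactoring described above. `sstable_sketch` and the `<prefix>/<generation>/<component>` naming scheme are assumptions for illustration and need not match the real object layout.

```cpp
#include <cassert>
#include <string>

// Hypothetical simplified sstable: only the fields the sketch needs.
struct sstable_sketch {
    std::string prefix;
    long generation;
};

// New overload: builds a name for an arbitrary target generation
// (e.g. the destination of a clone).
std::string make_object_name(const sstable_sketch& sst, long gen, const std::string& component) {
    return sst.prefix + "/" + std::to_string(gen) + "/" + component;
}

// Original overload now delegates, so the naming logic lives in one place.
std::string make_object_name(const sstable_sketch& sst, const std::string& component) {
    return make_object_name(sst, sst.generation, component);
}
```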
Ernest Zaslavsky	8bd891c6ed	sstables: add get_format() accessor to sstable Add a public get_format() accessor for the _format member, following the same pattern as the existing get_version(). This allows storage implementations to access the sstable format without reaching into private members, and is needed by the upcoming object_storage_base::clone to construct entry_descriptor for the sstables registry.	2026-04-07 18:16:52 +03:00
Ernest Zaslavsky	3d23490615	object_storage: add copy_object to object_storage_client Add a copy_object method to the object_storage_client interface for server-side object copies, with implementations for both S3 and GCS wrappers. The S3 wrapper delegates to s3::client::copy_object. The GCS wrapper delegates to gcp::storage::client's cross-bucket copy_object overload. This is a prerequisite for implementing sstable clone on object storage.	2026-04-07 18:16:52 +03:00
Ernest Zaslavsky	1702d6e6d4	s3_client: pass through abort_source in copy_object The abort_source parameter in s3::client::copy_object was ignored — the function accepted it but always passed nullptr to the underlying copy_s3_object. Forward it properly so callers can cancel in-progress copies.	2026-04-07 18:16:52 +03:00
Ernest Zaslavsky	bfdc1e5267	gcp_client: fix copy_object request method and body The GCP copy_object (rewrite API) had two bugs: 1. The request body was an empty string, but the GCP rewrite endpoint always parses it as JSON metadata. An empty string is not valid JSON, resulting in 400 "Metadata in the request couldn't decode". Fix: send "{}" (empty JSON object) as the body. 2. The HTTP method was PUT, but the GCP Objects: rewrite API requires POST per the documentation. Fix: use POST. Test coverage in a follow-up patch	2026-04-07 18:16:52 +03:00
Nadav Har'El	a0e79f391f	Merge 'alternator: fix batch write item squashing cdc entries' from Radosław Cybulski When `BatchWriteItem` operates on multiple items sharing the same partition key in `always_use_lwt` write isolation mode, all CDC log entries are emitted under a single timestamp. The previous `get_records` parsing algorithm in `alternator/streams.cc` assumed that all CDC log entries sharing the same timestamp correspond to a single DynamoDB item change. As a result, it would incorrectly squash multiple distinct item changes into a single Streams record, producing wrong event data (e.g., one INSERT instead of four, with mismatched key/attribute values). Note: the bug is specific to `always_use_lwt` mode because only in LWT mode does the entire batch share a single timestamp. In non-LWT modes, each item in the batch receives a separate timestamp, so the entries naturally stay separate. Commit 1: alternator: add BatchWriteItem Streams test - Adds new tests `test_streams_batchwrite_no_clustering_deletes_non_existing_items` and `test_streams_batchwrite_no_clustering_deletes_existing_items` that cover the corner cases of batch-deleting an existing and a non-existing item in a table without a clustering key. CDC tables without clustering keys are handled differently, and this path was previously untested for delete operations. - Adds a new test `test_streams_batchwrite_into_the_same_partition_will_report_wrong_stream_data`, which is a simple way to trigger the bug. - Adds a new test `test_streams_batchwrite_into_the_same_partition_deletes_existing_items`, which validates various combinations of puts and deletes in a single BatchWrite against the same partition. - Adds a new `test_table_ss_new_and_old_images_write_isolation_always` fixture and extends `create_table_ss` to accept `additional_tags`, enabling tests with a specific write isolation mode. 
Commit 2: alternator: fix BatchWriteItem squashed Streams entries The core fix rewrites the CDC log entry parsing in `get_records` to distinguish items by their clustering key: - Introduces `managed_bytes_ptr_hash` and `managed_bytes_ptr_equal` helper structs for pointer-based hash map lookups on `managed_bytes`. - Replaces the single `record`/`dynamodb` pair with a `std::unordered_map<const managed_bytes, Record, ...>` (`records_map`) keyed by the base table's clustering key value from each CDC log row. For tables without a clustering key, all entries map to a single sentinel key. - Adds a validation that Alternator tables have at most one clustering key column (as required by the DynamoDB data model). - On end-of-record (`eor`), flushes all accumulated per-clustering-key records into the output, each with a unique `eventID` (the `event_id` format now includes an index suffix). - Adjusts the limit check: since a single CDC timestamp bucket can now produce multiple output records, the limit may be slightly exceeded to avoid breaking mid-batch. Fixes #28439 Fixes: SCYLLADB-540 Closes scylladb/scylladb#28452 * github.com:scylladb/scylladb: alternator/test: explain why 'always' write isolation mode is used in tests alternator/test: add scylla_only to always write isolation fixture alternator: fix BatchWriteItem squashed Streams entries alternator: add BatchWriteItem test (failing)	2026-04-07 17:49:23 +03:00
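The following is a simplified sketch of the per-clustering-key grouping described in Commit 2. It differs from the real code in three assumed ways. It uses `std::string` keys. It uses an ordered `std::map` so the output is deterministic, instead of the real pointer-keyed `std::unordered_map` over `managed_bytes`. The `<timestamp>/<index>` event ID format is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Simplified CDC log row: one change under a shared timestamp.
struct cdc_row {
    std::optional<std::string> clustering_key;  // empty for tables without a clustering key
    std::string change;                          // e.g. "PUT" or "DELETE"
};

struct stream_record {
    std::string event_id;
    std::string clustering_key;
    std::vector<std::string> changes;
};

// Groups the rows of one timestamp bucket by clustering key instead of
// squashing them into a single record.
std::vector<stream_record> group_bucket(const std::string& ts, const std::vector<cdc_row>& rows) {
    static const std::string sentinel;  // single key for tables without a clustering key
    std::map<std::string, std::vector<std::string>> by_ck;
    for (const auto& r : rows) {
        by_ck[r.clustering_key.value_or(sentinel)].push_back(r.change);
    }
    // Flush one record per clustering key, each with a unique event ID.
    std::vector<stream_record> out;
    std::size_t index = 0;
    for (auto& [ck, changes] : by_ck) {
        out.push_back({ts + "/" + std::to_string(index++), ck, std::move(changes)});
    }
    return out;
}
```

Three changes under one timestamp that touch two distinct clustering keys now yield two records instead of one.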
Nadav Har'El	22e7ef46a7	Merge 'vector_search: fix SELECT on local vector index' from Karol Nowacki Queries against local vector indexes were failing with the error: ```ANN ordering by vector requires the column to be indexed using 'vector_index'``` This was a regression introduced by `15788c3734`, which incorrectly assumed the first column in the targets list is always the vector column. For local vector indexes, the first column is the partition key, causing the failure. Previously, serialization logic for the target index option was shared between vector and secondary indexes. This is no longer viable due to the introduction of local vector indexes and vector indexes with filtering columns, which have a different target format. This commit introduces a dedicated JSON-based serialization format for vector index targets, identifying the target column (tc), filtering columns (fc), and partition key columns (pk). This ensures unambiguous serialization and deserialization for all vector index types. This change is backward compatible for regular vector indexes. However, it breaks compatibility for local vector indexes and vector indexes with filtering columns created in version 2026.1.0. To mitigate this, usage of these specific index types will be blocked in the 2026.1.0 release by failing ANN queries against them in vector-store service. Fixes: SCYLLADB-895 Backport to 2026.1 is required as this issue also occurs on that branch. Closes scylladb/scylladb#28862 * github.com:scylladb/scylladb: index: fix DESC INDEX for vector index vector_search: test: refactor boilerplate setup vector_search: fix SELECT on local vector index index: test: vector index target option serialization test index: test: secondary index target option serialization test	2026-04-07 17:43:35 +03:00
Michał Jadwiszczak	9cf94116c2	db/view/view_building_worker: fix indentation	2026-04-07 16:12:04 +02:00
Michał Jadwiszczak	c9aa5bb09c	db/view/view_building_worker: lock staging sstables mutex for necessary shards when creating tasks To create `process_staging` view building tasks, we first need to collect information about them on shard0, create the necessary mutations, commit them to group0, and move the staging sstable objects to their original shards. However, there is a possible race between committing the group0 command and moving the staging sstables to their shards. Between those two events, the coordinator may schedule the freshly created tasks and dispatch them to the worker, but the worker won't have the sstable objects because they haven't been moved yet. This patch fixes the race by holding the `_staging_sstables_mutex` locks of the necessary shards while executing `create_staging_sstable_tasks()`. With this, even if a task is scheduled and dispatched quickly, the worker waits before executing it until the sstable objects are moved and the locks are released. Fixes SCYLLADB-816	2026-04-07 16:11:45 +02:00
Pavel Emelyanov	58e59e8c0d	Merge 'test: add test_sstable_clone_preserves_staging_state' from Benny Halevy Add a test that verifies filesystem_storage::clone preserves the sstable state: an sstable in staging is cloned to a new generation, the clone is re-loaded from the staging directory, and its state is asserted to still be staging. The change proves that https://scylladb.atlassian.net/browse/SCYLLADB-1205 is invalid, and can be closed. * No functional change and no backport needed Closes scylladb/scylladb#29209 * github.com:scylladb/scylladb: test: add test_sstable_clone_preserves_staging_state test: derive sstable state from directory in test_env::make_sstable sstables: log debug message in filesystem_storage::clone	2026-04-07 17:02:04 +03:00
Botond Dénes	816f2bf163	Merge 'cql3: fix null handling in data_value formatting' from Dario Mirovic `data_value::to_parsable_string()` crashes with a null pointer dereference when called on a `null` data_value. Return `"null"` instead. Added tests after the fix. Manually checked that tests fail without the fix. Fixes SCYLLADB-1350 This is a fix that prevents format crash. No known occurrence in production, but backport is desirable. Closes scylladb/scylladb#29262 * github.com:scylladb/scylladb: test: boost: test null data value to_parsable_string cql3: fix null handling in data_value formatting	2026-04-07 16:35:31 +03:00
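Here is a sketch of the fix using a hypothetical `data_value_sketch` that wraps `std::optional<int>`. The real `data_value` is type-erased, but the guard has the same shape: check for null before touching the payload.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Simplified stand-in for data_value: a null value has no payload.
struct data_value_sketch {
    std::optional<int> value;
    bool is_null() const { return !value.has_value(); }
};

// Dereferencing the payload unconditionally would crash on nulls;
// returning the literal "null" keeps formatting safe.
std::string to_parsable_string(const data_value_sketch& v) {
    if (v.is_null()) {
        return "null";
    }
    return std::to_string(*v.value);
}
```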
Dimitrios Symonidis	701808d7aa	test/object_store: parametrize test_basic over replication factor Extend test_basic to run with both RF=1 and RF=3 to verify that object storage works correctly with multiple replicas. The test now starts one server per replica (each on its own rack), flushes all nodes, validates tablet replica counts for RF>1, and restarts all servers before verifying data is still readable. Fixes: SCYLLADB-546 Closes scylladb/scylladb#28583	2026-04-07 16:27:44 +03:00
Nadav Har'El	f642db0693	test/alternator: tests for missing support of ReturnConsumedCapacity As noted in issue #5027 and issue #29138, Alternator's support for ReturnConsumedCapacity is lacking in two areas: 1. While ReturnConsumedCapacity is supported for most relevant operations, it's not supported in two operations: Query and Scan. 2. While ReturnConsumedCapacity=TOTAL is supported, INDEXES is not supported at all. This patch adds extensive tests for all these cases. All these tests pass on DynamoDB but fail on Alternator, so they are marked with "xfail". The tests for ReturnConsumedCapacity=INDEXES are deliberately split into two: First, we test the case where the table has no indexes, so INDEXES is almost the same as TOTAL and should be very easy to implement. A second test checks the cases where there are indexes, and different operations increment the capacity of the base table and/or indexes differently - it will require significantly more work to make the second test pass. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#29188	2026-04-07 16:07:41 +03:00
Nadav Har'El	f590ee2b7e	cdc, vector: fix CDC result tracker for vector indexes When a table has a vector index, cdc::cdc_enabled() returns true because vector index writes are implemented via the CDC augmentation path. However, register_cdc_operation_result_tracker() was checking only cdc_options().enabled(), which is false for tables that have a vector index but not traditional CDC. As a result, the operation_result_tracker was never attached to write response handlers for vector-indexed tables. This tracker was added in commit `1b92cbe`, and its job is to update metrics of CDC operations, and since vector search really does use CDC under the hood, these metrics could be useful when diagnosing problems. Fix by using cdc::cdc_enabled() instead of cdc_options().enabled(), which covers both traditional CDC and vector-indexed tables. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#29343	2026-04-07 15:54:51 +03:00
Avi Kivity	8c629d55b0	test: vector_search: check [[nodiscard]] return values of expected<> types Clang 22 verifies [[nodiscard]] for co_await, causing compilation failures where return values of expected<> were silently discarded. These call sites were discarding the return value of client::request() and vector_store_client::ann(), both of which return expected<> types marked [[nodiscard]]. Rather than suppressing the warning with (void) casts, properly check the return values using the established test patterns: BOOST_CHECK(result) where the call is expected to succeed, and BOOST_CHECK(!result) where the call is expected to fail. Closes scylladb/scylladb#29297	2026-04-07 15:25:08 +03:00
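The pattern can be sketched as follows. It uses a toy `[[nodiscard]]` result type in place of the real `expected<>` types and plain `assert`-style checks in place of Boost.Test's `BOOST_CHECK`. `result_sketch` and `request()` are hypothetical.

```cpp
#include <cassert>
#include <string>

// Toy expected-like type. Marking the class [[nodiscard]] makes the compiler
// warn whenever a function returning it has its result ignored.
struct [[nodiscard]] result_sketch {
    bool ok;
    std::string error;
    explicit operator bool() const { return ok; }
};

result_sketch request(bool succeed) {
    if (succeed) {
        return {true, {}};
    }
    return {false, "request failed"};
}

// Instead of silencing the warning with (void)request(...), inspect the value,
// in the spirit of BOOST_CHECK(result) and BOOST_CHECK(!result).
bool checks_results() {
    auto good = request(true);   // expected to succeed
    auto bad = request(false);   // expected to fail
    return static_cast<bool>(good) && !bad;
}
```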
Anna Stuchlik	176f6fb59e	doc: add the 2026.x patch release upgrade guide-from-2025 This commit adds the upgrade guide for all patch releases within the 2026.x major release. In addition, it fixes the link to Upgrade Policy in the 2025.x-to-2026.1 upgrade guide. Fixes SCYLLADB-1247 Closes scylladb/scylladb#29307	2026-04-07 13:52:16 +02:00
Anna Stuchlik	d329c91f9e	doc: remove About Upgrade and redirect to Upgrade Policy While fixing https://github.com/scylladb/scylladb/issues/28997, we added a new page about upgrade policy: https://docs.scylladb.com/stable/versioning/upgrade-policy.html This commit removes the old page and adds redirections to the new Upgrade Policy page in the unversioned documentation set. Closes scylladb/scylladb#29251	2026-04-07 13:44:10 +02:00
Andrei Chekun	93583bf193	test.py: use safe_drive_shutdown in the tests These methods for closing the driver were missed during the original fix. Fixes: SCYLLADB-900 Closes scylladb/scylladb#29093	2026-04-07 14:35:18 +03:00
Avi Kivity	00409b61f1	Merge 'Add Vnodes to Tablets Migration Procedure' from Nikos Dragazis This PR introduces the vnodes-to-tablets migration procedure, which enables converting an existing vnode-based keyspace to tablets. The migration is implemented as a manual, operator-driven process executed in several stages. The core idea is to first create tablet maps with the same token boundaries and replica hosts as the vnodes, and then incrementally convert the storage of each node to the tablets layout. At a high level, the procedure is the following: 1. Create tablet maps for all tables in the keyspace. 2. Sequentially upgrade all nodes from vnodes to tablets: 1. Mark a node for upgrade in the topology state. 2. Restart the node. During startup, while the node is offline, it reshards the SSTables on vnode boundaries and switches to a tablet ERM. 3. Wait for the node to return online before proceeding to the next node. 3. Finalize the migration: 1. Update the keyspace schema to mark it as tablet-based. 2. Clear the group0 state related to the migration. From the client's perspective, the migration is online; the cluster can still serve requests on that keyspace, although performance may be temporarily degraded. During the migration, some nodes use vnode ERMs while others use tablet ERMs. Cluster-level algorithms such as load balancing will treat the keyspace's tables as vnode-based. Once migration is finalized, the keyspace is permanently switched to tablets and cannot be reverted back to vnodes. However, a rollback procedure is available before finalization. The patch series consists of: * Load balancer adjustments to ignore tablets belonging to a migrating keyspace. * A new vnode-based resharding mode, where SSTables are segregated on vnode boundaries rather than with the static sharder. * A new per-node `intended_storage_mode` column in `system.topology`. Represents migration intent (whether migration should occur on restart) and direction. 
* Four new REST endpoints for driving the migration (start, node upgrade/downgrade, finalize, status), along with `nodetool` wrappers. The finalization is implemented as a global topology request. * Wiring of the migration process into the startup logic: the `distributed_loader` determines a migrating table's ERM flavor from the `intended_storage_mode` and the ERM flavor determines the `table_populator`'s resharding mode. Token metadata changes have been adjusted to preserve the ERM flavor. * Cluster tests for the migration process. Fixes SCYLLADB-722. Fixes SCYLLADB-723. Fixes SCYLLADB-725. Fixes SCYLLADB-779. Fixes SCYLLADB-948. New feature, no backport is needed. Closes scylladb/scylladb#29065 * github.com:scylladb/scylladb: docs: Add ops guide for vnodes-to-tablets migration test: cluster: Add test for migration of multiple keyspaces test: cluster: Add test for error conditions test: cluster: Add vnodes->tablets migration test (rollback) test: cluster: Add vnodes->tablets migration test (1 table, 3 nodes) test: cluster: Add vnodes->tablets migration test (1 table, 1 node) scylla-nodetool: Add migrate-to-tablets subcommand api: Add REST endpoint for vnode-to-tablet migration status api: Add REST endpoint for migration finalization topology_coordinator: Add `finalize_migration` request database: Construct migrating tables with tablet ERMs api: Add REST endpoint for upgrading nodes to tablets api: Add REST endpoint for starting vnodes-to-tablets migration topology_state_machine: Add intended_storage_mode to system.topology distributed_loader: Wire vnode-based resharding into table populator replica: Pick any compaction group for resharding compaction: resharding_compaction: add vnodes_resharding option storage_service: Preserve ERM flavor of migrating tables tablet_allocator: Exclude migrating tables from load balancing feature_service: Add vnodes_to_tablets_migrations feature	2026-04-07 14:32:22 +03:00
Łukasz Paszkowski	6f364fd3b7	db: fix system.size_estimates to aggregate sstable estimates across all shards The estimate() function in the size_estimates virtual reader only considered sstables local to the shard that happened to own the keyspace's partition key token. Since sstables are distributed across shards, this caused partition count estimates to be approximately 1/smp_count of the actual value. This bug has been present since the virtual reader was introduced in `225648780d`. Use db.container().map_reduce0() to aggregate sstable estimates across all shards. Each shard contributes its local count and estimated_histogram, which are then merged to produce the correct total. Also fix the `test_partitions_estimate_full_overlap` test which becomes flaky (xpassing ~1% of runs) because autocompaction could merge the two overlapping sstables before the size estimate was read. Wrap the test body in nodetool.no_autocompaction_context to prevent this race. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1179 Refs https://github.com/scylladb/scylladb/issues/9083 Closes scylladb/scylladb#29286	2026-04-07 14:13:26 +03:00
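Below is a sequential sketch of the aggregation. Each simulated shard contributes a local count and a histogram, and a reducer merges them, analogous to the map and reduce steps passed to `map_reduce0()`. `shard_estimate` and its fixed bucket layout are simplifications assumed here in place of the real `estimated_histogram`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Per-shard contribution: a partition count plus a toy histogram
// (bucket counts standing in for utils::estimated_histogram).
struct shard_estimate {
    uint64_t count = 0;
    std::vector<uint64_t> histogram;
};

// Reducer: merge two partial results by summing counts and buckets.
shard_estimate merge_estimates(shard_estimate acc, const shard_estimate& s) {
    acc.count += s.count;
    if (acc.histogram.size() < s.histogram.size()) {
        acc.histogram.resize(s.histogram.size(), 0);
    }
    for (std::size_t i = 0; i < s.histogram.size(); ++i) {
        acc.histogram[i] += s.histogram[i];
    }
    return acc;
}

// Sequential analogue of db.container().map_reduce0(map, initial, reduce):
// every shard's local estimate is folded into the total.
shard_estimate aggregate_all_shards(const std::vector<shard_estimate>& per_shard) {
    return std::accumulate(per_shard.begin(), per_shard.end(), shard_estimate{}, merge_estimates);
}
```

Counting only one element of `per_shard`, as the old code effectively did, gives the roughly 1/smp_count underestimate described in the commit.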
Piotr Smaron	7d449a307c	docs: remove old audit design doc As discussed with @ScyllaPiotr in https://github.com/scylladb/scylladb/pull/29232, the doc about to be removed is just: > Looking at history, I think this audit.md is a design doc: scylladb/scylla-enterprise@87a5c19, for which the feature has been implemented differently, eventually, and was created around the time when design docs, apparently, were stored within the repository itself. So for me it's some trash (sorry for strong language) that can be safely removed. Closes scylladb/scylladb#29316	2026-04-07 14:11:53 +03:00
Avi Kivity	8b4a91982b	cmake: add missing rolling_max_tracker_test and symmetric_key_test Added in `5b2a07b408` and `c596ae6eb1` without cmake integration. Closes scylladb/scylladb#29328	2026-04-07 14:09:00 +03:00
Avi Kivity	d01c9a425f	test: test_out_of_storage_prevention: fix invalid escape in regex Python warns that the sequence "\(" is an invalid escape and might be rejected in the future. Protect against that by using a raw string. Closes scylladb/scylladb#29334	2026-04-07 14:06:32 +03:00
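A minimal illustration of the warning and the raw-string fix. The log message in the pattern is a made-up example, not the string the test actually matches, and `invalid_escape_warns` is a hypothetical helper.

```python
import re
import warnings

# Plain literal "\(": Python keeps the backslash today but flags the escape as
# invalid (DeprecationWarning before 3.12, SyntaxWarning from 3.12 on).
# Raw literal r"\(": the backslash reaches the regex engine explicitly.
PATTERN = re.compile(r"out of space \(\d+% used\)")  # hypothetical message


def invalid_escape_warns(source: str) -> bool:
    """Compile `source` and report whether it triggers an invalid-escape warning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<example>", "exec")
    return any(issubclass(w.category, (DeprecationWarning, SyntaxWarning))
               for w in caught)


print(invalid_escape_warns('p = "\\("'))   # True: non-raw literal
print(invalid_escape_warns('p = r"\\("'))  # False: raw literal
print(bool(PATTERN.search("node out of space (97% used)")))  # True
```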
Pavel Emelyanov	0ae781c008	Merge 'test: auth_test: coroutinize' from Avi Kivity Convert auth_test.cc to coroutines for improved readability. Each test is converted in its own commit. Some are trivial. Indentation is left broken in some commits to reduce the diff, then fixed up in the last commit. Code cleanup, so no backport. Closes scylladb/scylladb#29336 * github.com:scylladb/scylladb: auth_test: fix whitespace auth_test: coroutinize test_try_describe_schema_with_internals_and_passwords_as_anonymous_user auth_test: coroutinize test_try_login_after_creating_roles_with_hashed_password auth_test: coroutinize test_create_roles_with_hashed_password_and_log_in auth_test: coroutinize test_try_create_role_with_hashed_password_as_anonymous_user auth_test: coroutinize test_try_to_create_role_with_password_and_hashed_password auth_test: coroutinize test_try_to_create_role_with_hashed_password_and_password auth_test: coroutinize test_alter_with_workload_type auth_test: coroutinize test_alter_with_timeouts auth_test: coroutinize role_permissions_table_is_protected auth_test: coroutinize role_members_table_is_protected auth_test: coroutinize roles_table_is_protected auth_test: coroutinize test_password_authenticator_operations auth_test: coroutinize test_password_authenticator_attributes auth_test: coroutinize test_default_authenticator	2026-04-07 14:05:32 +03:00
Botond Dénes	513af59130	encryption: improve error message when KMS host is not configured When an SSTable was encrypted with a KMS host that is not present in scylla.yaml, the error thrown was: std::invalid_argument (No such host: <host-name>) This message is very obscure in general, and especially confusing when encountered while using the scylla-sstable tool: it gives no indication that the SSTable is encrypted, that a KMS host lookup is involved, or what the user needs to do to fix the problem. Replace it with a message that names the missing host and points directly to the relevant scylla.yaml section: Encryption host "<host-name>" is not defined in scylla.yaml. Make sure it is listed under the "kmip_hosts" section. The wording is intentionally kept neutral (not framed as an SSTable tool problem) because the same code path is exercised by production ScyllaDB when a node's configuration no longer contains a host referenced by an existing data file (e.g. after a config rollback or when restoring data from a different cluster). The production use-case takes precedence, but the message is equally actionable from the tool. Closes scylladb/scylladb#29228	2026-04-07 14:00:27 +03:00
Botond Dénes	7344c05494	scylla-gdb.py: fix small_vector.__len__() start - end will result in negative length, rejected by the python runtime. Use the correct end - start to calculate length. Closes scylladb/scylladb#29249	2026-04-07 13:57:21 +03:00
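A small Python sketch of the corrected `__len__()`. The real scylla-gdb.py wrapper works on gdb pointer values, where `end - start` directly yields an element count. Here plain integer addresses and an explicit element size stand in for that, so `SmallVectorView` and its constructor arguments are hypothetical.

```python
class SmallVectorView:
    """Hypothetical stand-in for a gdb wrapper around a small_vector."""

    def __init__(self, begin_addr: int, end_addr: int, elem_size: int):
        self.begin = begin_addr  # address of the first element
        self.end = end_addr      # one past the last element
        self.elem_size = elem_size

    def __len__(self) -> int:
        # end - begin is non-negative; begin - end would be negative, and
        # len() raises ValueError when __len__ returns a negative number.
        return (self.end - self.begin) // self.elem_size


v = SmallVectorView(0x1000, 0x1010, 4)
print(len(v))  # 4
```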
Botond Dénes	f71d2e78d8	tombstone_gc: don't use real-db for validation and determining default data_dictionary::database was converted to replica::database in two places, just to call find_keyspace(), then call get_replication_strategy() on the returned keyspace. This is not necessary: data_dictionary::database already has find_keyspace() and the returned data_dictionary::keyspace also has get_replication_strategy(). This patch removes a small layering violation, but more importantly, it is needed for the sstable tool to be able to load schemas from disk when said schema has tombstone_gc props. Closes scylladb/scylladb#29279	2026-04-07 13:56:24 +03:00
Pavel Emelyanov	d6df5ef60a	Merge 'compaction_test: Make compaction tests backend‑agnostic and add S3/GCS support' from Ernest Zaslavsky This series updates the storage abstraction and extends the compaction tests to support object‑storage backends (S3 and GCS), while tightening several parts of the test environment. The changes include: - New exists/object_exists helpers across storage backends and clock fixes in the S3 client to make signature generation stable under test conditions. - A new get_storage_for_tests accessor and adjustments to the test environment to avoid premature teardown of the sstable registry. - Refactoring of compaction tests to remove direct sstable access, ensure proper schema setup, and avoid use of moved‑from objects. - Extraction of test_env‑based logic into reusable functions and addition of S3/GCS variants of the compaction tests. Not all tests have been converted to be backend‑agnostic yet, and a few require further investigation before they can run cleanly against S3/GCS backends. These will be addressed in follow‑up work. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-704 (however, follow‑up work is still needed). No backport needed since this change targets a future feature. Closes scylladb/scylladb#28790 * github.com:scylladb/scylladb: compaction_test: fix formatting after previous patches compaction_test: add S3/GCS variations to tests compaction_test: extract test_env-based tests into functions compaction_test: replace file_exists with storage::exists compaction_test: initialize tables with schema via make_table_for_tests compaction_test: use sstable APIs to manipulate component files compaction_test: fix use-after-move issue sstable_utils: add `get_storage` and `open_file` helpers test_env: delay unplugging sstable registry storage: add `exists` method to storage abstraction s3_client: use lowres_system_clock for aws_sigv4 s3_client: add `object_exists` helper gcs_client: add `object_exists` helper	2026-04-07 13:53:48 +03:00
Piotr Dulikowski	4161273b4c	Merge 'view_building_worker: fix race during draining procedure' from Michał Jadwiszczak View building worker was breaking semaphores without holding their locks. This led to races like SCYLLADB-844 and SCYLLADB-543, where a new batch was started after `view_building_worker::state` was cleared in the `drain()` process. This patch fixes the race by: - locking the mutex before breaking it - distinguishing between `state::clear()` (can happen multiple times) and `state::drain()` (can be called only once during shutdown) - asserting that the state is not doing any new work after it was drained Fixes SCYLLADB-844 Fixes SCYLLADB-543 This PR should be backported to all versions containing view building coordinator (2025.4 and newer). Closes scylladb/scylladb#29303 * github.com:scylladb/scylladb: view_building_worker: extract starting a new batch to state's method view_building_worker: distinguish between state's `clear()` and `drain()` view_building_worker: lock mutexes before breaking them in `drain()` view_building_worker: execute drain() once	2026-04-07 12:13:51 +02:00
Avi Kivity	bc10e1a171	test: fix flaky test_login by not retrying authentication failures The fix for SCYLLADB-1373 (`b4f652b7c1`) changed get_session() to use the default timeout=30 for the retry loop in patient_*_cql_connection (previously timeout=0.1). This correctly allowed retrying transient NoHostAvailable errors during node startup, but introduced a new flakiness in test_login and other auth tests. The failure chain: 1. test_login connects with bad credentials (e.g. user="doesntexist") 2. get_session() calls patient_exclusive_cql_connection(), which calls retry_till_success() with bypassed_exception=NoHostAvailable 3. The first attempt correctly fails: the server rejects the credentials with AuthenticationFailed, wrapped in NoHostAvailable 4. retry_till_success() catches NoHostAvailable indiscriminately and retries, not distinguishing between transient errors (node not ready) and permanent errors (bad credentials) 5. A subsequent retry attempt times out (connect_timeout=5), producing OperationTimedOut wrapped in NoHostAvailable 6. After 30 seconds, the last NoHostAvailable is raised -- now wrapping OperationTimedOut instead of the original AuthenticationFailed 7. The assertion `isinstance(..., AuthenticationFailed)` fails With the old timeout=0.1, the deadline was already exceeded after the first attempt, so the original AuthenticationFailed propagated. Fix: Add a `should_retry` predicate parameter to retry_till_success() and use it in patient_cql_connection() and patient_exclusive_cql_connection() to immediately re-raise NoHostAvailable when it wraps AuthenticationFailed. Retrying authentication failures is never useful since the credentials won't change between attempts. Fixes: SCYLLADB-1382 Closes scylladb/scylladb#29348	2026-04-07 10:17:31 +03:00
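A simplified sketch of a retry loop with a `should_retry` predicate. The exception classes are local stand-ins for the Python driver's `NoHostAvailable` (which exposes per-host failures in an `errors` dict) and `AuthenticationFailed`. The signature only approximates the test library's `retry_till_success()`, so treat the names and defaults (including `is_not_auth_failure`) as assumptions.

```python
import time
from typing import Callable, Dict, Optional, Type, TypeVar

T = TypeVar("T")


class AuthenticationFailed(Exception):
    """Stand-in for the driver's AuthenticationFailed."""


class NoHostAvailable(Exception):
    """Stand-in for the driver's NoHostAvailable; `errors` maps host -> error."""

    def __init__(self, message: str, errors: Dict[str, Exception]):
        super().__init__(message, errors)
        self.errors = errors


def retry_till_success(fun: Callable[[], T], *, timeout: float = 30.0,
                       step: float = 0.01,
                       bypassed_exception: Type[Exception] = Exception,
                       should_retry: Optional[Callable[[Exception], bool]] = None) -> T:
    deadline = time.monotonic() + timeout
    while True:
        try:
            return fun()
        except bypassed_exception as e:
            # Permanent failures are re-raised at once instead of being retried
            # until the deadline hides them behind a later, different error.
            if should_retry is not None and not should_retry(e):
                raise
            if time.monotonic() >= deadline:
                raise
            time.sleep(step)


def is_not_auth_failure(e: Exception) -> bool:
    errors = getattr(e, "errors", None) or {}
    return not any(isinstance(err, AuthenticationFailed) for err in errors.values())


attempts = []


def bad_login():
    attempts.append(1)
    raise NoHostAvailable("Unable to connect",
                          {"127.0.0.1": AuthenticationFailed("bad credentials")})


try:
    retry_till_success(bad_login, bypassed_exception=NoHostAvailable,
                       should_retry=is_not_auth_failure)
except NoHostAvailable as e:
    caught = e
print(len(attempts))  # 1: re-raised without retrying
```

Transient errors (anything not wrapping `AuthenticationFailed`) still go through the normal retry-until-deadline path.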
Michał Jadwiszczak	51c164c8d2	view_building_worker: extract starting a new batch to state's method Following the previous commit, a new batch cannot be started if the state was already drained. This commit also adds a check that only one batch is running at a time.	2026-04-07 08:39:05 +02:00
Michał Jadwiszczak	639aa223f3	view_building_worker: distinguish between state's `clear()` and `drain()` While both of these methods do the same thing (abort the current batch, clear data), the state can be cleared multiple times during the view_building_worker lifetime (for instance, when the processed base table changes), but `view_building_worker::state::drain()` should be called only once, and no other work should be done on the state after it.	2026-04-07 08:39:05 +02:00
Michał Jadwiszczak	7aea524f52	view_building_worker: lock mutexes before breaking them in `drain()` Not doing this may lead to races like SCYLLADB-844. If a consumer holds a mutex's lock and `drain()` breaks the mutex without locking it first, the consumer may keep executing code that should have been aborted. An example of the race is SCYLLADB-844, where `work_on_tasks()` is holding `_state._mutex` while it is broken by `drain()`. This causes a new batch to be started after the `_state` is cleared.	2026-04-07 08:39:00 +02:00
Michał Jadwiszczak	91c7ac1fb2	view_building_worker: execute drain() once Future changes will require that the view building worker is drained only once per its lifetime.	2026-04-07 08:35:02 +02:00
Avi Kivity	b4f652b7c1	test: fix flaky test_create_ks_auth by removing bad retry timeout get_session() was passing timeout=0.1 to patient_exclusive_cql_connection and patient_cql_connection, leaving only 0.1 seconds for the retry loop in retry_till_success(). Since each connection attempt can take up to 5 seconds (connect_timeout=5), the retry loop effectively got only one attempt with no chance to retry on transient NoHostAvailable errors. Use the default timeout=30 seconds, consistent with all other callers. Fixes: SCYLLADB-1373 Closes scylladb/scylladb#29332	2026-04-05 19:13:15 +03:00
Avi Kivity	2f0d178510	auth_test: fix whitespace Fix over-indented lines inside do_with_mc lambda bodies introduced during coroutinization.	2026-04-05 18:28:23 +03:00
Avi Kivity	7a24da9e88	auth_test: coroutinize test_try_describe_schema_with_internals_and_passwords_as_anonymous_user Use co_await instead of return for improved readability.	2026-04-05 18:26:30 +03:00
Avi Kivity	e1b52cf337	auth_test: coroutinize test_try_login_after_creating_roles_with_hashed_password Use co_await instead of return for improved readability.	2026-04-05 18:26:30 +03:00
Avi Kivity	24d36ad459	auth_test: coroutinize test_create_roles_with_hashed_password_and_log_in Use co_await instead of return for improved readability.	2026-04-05 18:26:30 +03:00
Avi Kivity	6f20129eec	auth_test: coroutinize test_try_create_role_with_hashed_password_as_anonymous_user Use co_await instead of return for improved readability.	2026-04-05 18:26:30 +03:00
Avi Kivity	cece181113	auth_test: coroutinize test_try_to_create_role_with_password_and_hashed_password Use co_await instead of return for improved readability.	2026-04-05 18:26:30 +03:00
Avi Kivity	752391f757	auth_test: coroutinize test_try_to_create_role_with_hashed_password_and_password Use co_await instead of return for improved readability.	2026-04-05 18:26:30 +03:00
Avi Kivity	287625b297	auth_test: coroutinize test_alter_with_workload_type Use co_await instead of return for improved readability.	2026-04-05 18:26:30 +03:00
Avi Kivity	4eeb5ef54d	auth_test: coroutinize test_alter_with_timeouts Use co_await instead of return for improved readability.	2026-04-05 18:26:30 +03:00
Avi Kivity	170c71b25d	auth_test: coroutinize role_permissions_table_is_protected Use co_await for improved readability.	2026-04-05 18:26:30 +03:00
Avi Kivity	13eccf519f	auth_test: coroutinize role_members_table_is_protected Use co_await for improved readability.	2026-04-05 18:26:30 +03:00
Avi Kivity	43ff3798ad	auth_test: coroutinize roles_table_is_protected Use co_await for improved readability.	2026-04-05 18:26:30 +03:00
Avi Kivity	c586eeb003	auth_test: coroutinize test_password_authenticator_operations Flatten continuation chains (.then()) into linear thread-style code with .get() calls for improved readability. Remove the now-unused require_throws helper template.	2026-04-05 18:26:25 +03:00
Avi Kivity	fbccfe5c9d	auth_test: coroutinize test_password_authenticator_attributes Use co_await instead of return+do_with_cql_env+make_ready_future for improved readability.	2026-04-05 17:28:09 +03:00
Avi Kivity	e3dee64003	auth_test: coroutinize test_default_authenticator Use co_await instead of return+do_with_cql_env+make_ready_future for improved readability.	2026-04-05 17:27:45 +03:00
Jenkins Promoter	ab4a2cdde2	Update pgo profiles - aarch64	2026-04-05 16:58:02 +03:00
Jenkins Promoter	b97cf0083c	Update pgo profiles - x86_64	2026-04-05 16:00:15 +03:00
Nikos Dragazis	6d50e67bd2	scylla_swap_setup: Remove Before=swap.target dependency from swap unit When a Scylla node starts, the scylla-image-setup.service invokes the `scylla_swap_setup` script to provision swap. This script allocates a swap file and creates a swap systemd unit to delegate control to systemd. By default, systemd injects a Before=swap.target dependency into every swap unit, allowing other services to use swap.target to wait for swap to be enabled. On Azure, this doesn't work so well because we store the swap file on the ephemeral disk [1] which has network dependencies (`_netdev` mount option, configured by cloud-init [2]). This makes the swap.target indirectly depend on the network, leading to dependency cycles such as: swap.target -> mnt-swapfile.swap -> mnt.mount -> network-online.target -> network.target -> systemd-resolved.service -> tmp.mount -> swap.target This patch breaks the cycle by removing the swap unit from swap.target using DefaultDependencies=no. The swap unit will still be activated via WantedBy=multi-user.target, just not during early boot. Although this problem is specific to Azure, this patch applies the fix to all clouds to keep the code simple. Fixes #26519. Fixes SCYLLADB-1257 [1] https://github.com/scylladb/scylla-machine-image/pull/426 [2] https://github.com/canonical/cloud-init/pull/1213#issuecomment-1026065501 Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com> Closes scylladb/scylladb#28504	2026-04-05 15:07:50 +03:00
Tomasz Grabiec	74542be5aa	test: pylib: Ignore exceptions in wait_for() ManagerClient::get_ready_cql() calls server_sees_others(), which waits for servers to see each other as alive in gossip. If one of the servers is still early in boot, the RESTful API call to "gossiper/endpoint/live" may fail. It throws an exception, which currently terminates the wait_for() and propagates up, failing the test. Fix this by ignoring errors when polling inside wait_for. In case of timeout, we log the last exception. This should fix the problem not only in this case but for all uses of wait_for(). Example output: ``` pred = <function ManagerClient.server_sees_others.<locals>._sees_min_others at 0x7f022af9a140> deadline = 1775218828.9172852, period = 1.0, before_retry = None backoff_factor = 1.5, max_period = 1.0, label = None async def wait_for( pred: Callable[[], Awaitable[Optional[T]]], deadline: float, period: float = 0.1, before_retry: Optional[Callable[[], Any]] = None, backoff_factor: float = 1.5, max_period: float = 1.0, label: Optional[str] = None) -> T: tag = label or getattr(pred, '__name__', 'unlabeled') start = time.time() retries = 0 last_exception: Exception | None = None while True: elapsed = time.time() - start if time.time() >= deadline: timeout_msg = f"wait_for({tag}) timed out after {elapsed:.2f}s ({retries} retries)" if last_exception is not None: timeout_msg += ( f"; last exception: {type(last_exception).__name__}: {last_exception}" ) raise AssertionError(timeout_msg) from last_exception raise AssertionError(timeout_msg) try: > res = await pred() test/pylib/util.py:80: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ async def _sees_min_others(): > raise Exception("asd") E Exception: asd test/pylib/manager_client.py:802: Exception The above exception was the direct cause of the following exception: manager = <test.pylib.manager_client.ManagerClient object at 0x7f022af7e7b0> @pytest.mark.asyncio async def test_auth_after_reset(manager: 
ManagerClient) -> None: servers = await manager.servers_add(3, config=auth_config, auto_rack_dc="dc1") > cql, _ = await manager.get_ready_cql(servers) test/cluster/auth_cluster/test_auth_after_reset.py:33: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ test/pylib/manager_client.py:137: in get_ready_cql await self.servers_see_each_other(servers) test/pylib/manager_client.py:820: in servers_see_each_other await asyncio.gather(others) test/pylib/manager_client.py:806: in server_sees_others await wait_for(_sees_min_others, time() + interval, period=.5) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ pred = <function ManagerClient.server_sees_others.<locals>._sees_min_others at 0x7f022af9a140> deadline = 1775218828.9172852, period = 1.0, before_retry = None backoff_factor = 1.5, max_period = 1.0, label = None async def wait_for( pred: Callable[[], Awaitable[Optional[T]]], deadline: float, period: float = 0.1, before_retry: Optional[Callable[[], Any]] = None, backoff_factor: float = 1.5, max_period: float = 1.0, label: Optional[str] = None) -> T: tag = label or getattr(pred, '__name__', 'unlabeled') start = time.time() retries = 0 last_exception: Exception | None = None while True: elapsed = time.time() - start if time.time() >= deadline: timeout_msg = f"wait_for({tag}) timed out after {elapsed:.2f}s ({retries} retries)" if last_exception is not None: timeout_msg += ( f"; last exception: {type(last_exception).__name__}: {last_exception}" ) > raise AssertionError(timeout_msg) from last_exception E AssertionError: wait_for(_sees_min_others) timed out after 45.30s (46 retries); last exception: Exception: asd test/pylib/util.py:76: AssertionError ``` Fixes a failure observed in test_auth_after_reset: ``` manager = <test.pylib.manager_client.ManagerClient object at 0x7fb3740e1630> @pytest.mark.asyncio async def test_auth_after_reset(manager: ManagerClient) -> None: servers = await manager.servers_add(3, 
config=auth_config, auto_rack_dc="dc1") cql, _ = await manager.get_ready_cql(servers) await cql.run_async("ALTER ROLE cassandra WITH PASSWORD = 'forgotten_pwd'") logging.info("Stopping cluster") await asyncio.gather([manager.server_stop_gracefully(server.server_id) for server in servers]) logging.info("Deleting sstables") for table in ["roles", "role_members", "role_attributes", "role_permissions"]: await asyncio.gather([manager.server_wipe_sstables(server.server_id, "system", table) for server in servers]) logging.info("Starting cluster") # Don't try connect to the servers yet, with deleted superuser it will be possible only after # quorum is reached. await asyncio.gather([manager.server_start(server.server_id, connect_driver=False) for server in servers]) logging.info("Waiting for CQL connection") await repeat_until_success(lambda: manager.driver_connect(auth_provider=PlainTextAuthProvider(username="cassandra", password="cassandra"))) > await manager.get_ready_cql(servers) test/cluster/auth_cluster/test_auth_after_reset.py:50: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ test/pylib/manager_client.py:137: in get_ready_cql await self.servers_see_each_other(servers) test/pylib/manager_client.py:819: in servers_see_each_other await asyncio.gather(*others) test/pylib/manager_client.py:805: in server_sees_others await wait_for(_sees_min_others, time() + interval, period=.5) test/pylib/util.py:71: in wait_for res = await pred() test/pylib/manager_client.py:802: in _sees_min_others alive_nodes = await self.api.get_alive_endpoints(server_ip) test/pylib/rest_client.py:243: in get_alive_endpoints data = await self.client.get_json(f"/gossiper/endpoint/live", host=node_ip) test/pylib/rest_client.py:99: in get_json ret = await self._fetch("GET", resource_uri, response_type = "json", host = host, _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = <test.pylib.rest_client.TCPRESTClient object at 
0x7fb2404a0650> method = 'GET', resource = '/gossiper/endpoint/live', response_type = 'json' host = '127.15.252.8', port = 10000, params = None, json = None, timeout = None allow_failed = False async def _fetch(self, method: str, resource: str, response_type: Optional[str] = None, host: Optional[str] = None, port: Optional[int] = None, params: Optional[Mapping[str, str]] = None, json: Optional[Mapping] = None, timeout: Optional[float] = None, allow_failed: bool = False) -> Any: # Can raise exception. See https://docs.aiohttp.org/en/latest/web_exceptions.html assert method in ["GET", "POST", "PUT", "DELETE"], f"Invalid HTTP request method {method}" assert response_type is None or response_type in ["text", "json"], \ f"Invalid response type requested {response_type} (expected 'text' or 'json')" # Build the URI port = port if port else self.default_port if hasattr(self, "default_port") else None port_str = f":{port}" if port else "" assert host is not None or hasattr(self, "default_host"), "_fetch: missing host for " \ "{method} {resource}" host_str = host if host is not None else self.default_host uri = self.uri_scheme + "://" + host_str + port_str + resource logging.debug(f"RESTClient fetching {method} {uri}") client_timeout = ClientTimeout(total = timeout if timeout is not None else 300) async with request(method, uri, connector = self.connector if hasattr(self, "connector") else None, params = params, json = json, timeout = client_timeout) as resp: if allow_failed: return await resp.json() if resp.status != 200: text = await resp.text() > raise HTTPError(uri, resp.status, params, json, text) E test.pylib.rest_client.HTTPError: HTTP error 404, uri: http://127.15.252.8:10000/gossiper/endpoint/live, params: None, json: None, body: E {"message": "Not found", "code": 404} test/pylib/rest_client.py:77: HTTPError ``` Fixes: SCYLLADB-1367 Closes scylladb/scylladb#29323	2026-04-05 13:52:26 +03:00
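The polling behavior this fix introduces can be sketched as a trimmed-down asyncio loop. It follows the shape of `wait_for()` visible in the traceback above: a predicate returning `None` means "not yet", and the backoff is capped at `max_period`. It omits `before_retry` and `label`, and the `probe` coroutine is hypothetical, so this is an illustration rather than the exact code in test/pylib/util.py.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def wait_for(pred: Callable[[], Awaitable[Optional[T]]], deadline: float,
                   period: float = 0.1, backoff_factor: float = 1.5,
                   max_period: float = 1.0) -> T:
    start = time.time()
    retries = 0
    last_exception: Optional[Exception] = None
    while True:
        if time.time() >= deadline:
            msg = f"wait_for timed out after {time.time() - start:.2f}s ({retries} retries)"
            if last_exception is not None:
                msg += f"; last exception: {type(last_exception).__name__}: {last_exception}"
                raise AssertionError(msg) from last_exception
            raise AssertionError(msg)
        try:
            res = await pred()
            if res is not None:
                return res
        except Exception as e:
            # A failing probe (e.g. a 404 while the node is still booting)
            # is remembered and retried instead of aborting the wait.
            last_exception = e
        retries += 1
        await asyncio.sleep(period)
        period = min(period * backoff_factor, max_period)


calls = 0


async def probe():
    global calls
    calls += 1
    if calls < 3:
        raise RuntimeError("HTTP 404")
    return "alive"


result = asyncio.run(wait_for(probe, time.time() + 5, period=0.01))
print(result, calls)  # alive 3
```

Chaining the final `AssertionError` to the last exception keeps the underlying cause visible in the test report, as in the example output above.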
Ernest Zaslavsky	c7a74237b3	compaction_test: fix formatting after previous patches	2026-04-05 11:07:17 +03:00
Ernest Zaslavsky	101b4ad7fa	compaction_test: add S3/GCS variations to tests Add S3 and GCS variants of the compaction tests to expand coverage for keyspaces configured to use object_storage backends.	2026-04-05 11:07:17 +03:00
Ernest Zaslavsky	03bd3010bf	compaction_test: extract test_env-based tests into functions Move all test code that relies on test_env into standalone free functions so they can be reused by upcoming S3 and GCS test suites.	2026-04-05 11:07:17 +03:00
Ernest Zaslavsky	b18528e97e	compaction_test: replace file_exists with storage::exists Replace direct filesystem checks (file_exists) with the storage-agnostic exists() method in unsealed_sstable_compaction, sstable_clone_leaving_unsealed_dest_sstable, and failure_when_adding_new_sstable tests, making them compatible with object-storage backends (S3, GCS).	2026-04-05 11:07:17 +03:00
Ernest Zaslavsky	98492e4ea8	compaction_test: initialize tables with schema via make_table_for_tests Start using `table_for_tests::make_default_schema` so test tables are created with a real schema. This is required for object-storage backends, which cannot operate correctly without proper schema initialization.	2026-04-05 11:07:17 +03:00
Ernest Zaslavsky	5ba79e2ed4	compaction_test: use sstable APIs to manipulate component files Switch tests to use sstable member functions for file manipulation instead of opening files directly on the filesystem. This affects the helpers that emulate sstable corruption: we now overwrite the entire component file rather than just the first few kilobytes, which is sufficient for producing a corrupted sstable.	2026-04-05 11:07:17 +03:00
Ernest Zaslavsky	405c032f48	compaction_test: fix use-after-move issue We were moving `compaction_type_options` inside a loop, so on the second iteration the test received an already moved-from instance.	2026-04-05 11:07:17 +03:00
Ernest Zaslavsky	437a581b04	sstable_utils: add `get_storage` and `open_file` helpers Add a non-const `get_storage` accessor to expose underlying storage, and an `open_file` helper to access sstable component files directly. These are needed so compaction tests can read and write sstable components.	2026-04-05 11:07:17 +03:00
Ernest Zaslavsky	2ad2dbae03	test_env: delay unplugging sstable registry Unplugging the mock sstable_registry happened too early in the test environment. During sstable destruction, components may still need access to the registry, so the unplugging is moved to a later stage.	2026-04-05 11:07:17 +03:00
Ernest Zaslavsky	8f6630e9cd	storage: add `exists` method to storage abstraction Add an `exists` method to the storage abstraction to allow S3, GCS, and local storage implementations to check whether an sstable component is present.	2026-04-05 11:07:17 +03:00
Ernest Zaslavsky	ba785f6cab	s3_client: use lowres_system_clock for aws_sigv4 Switch aws_sigv4 to lowres_system_clock since it is not affected by time offsets often introduced in tests, which can skew db_clock. S3 rejects signed requests whose timestamp differs from the server time by more than 15 minutes, so a stable clock is required.	2026-04-05 11:07:17 +03:00
Ernest Zaslavsky	e08d779922	s3_client: add `object_exists` helper Introduce `object_exists` to the S3 client to check whether an object exists. This is primarily useful for test scenarios.	2026-04-05 11:07:16 +03:00
Ernest Zaslavsky	016b344a8a	gcs_client: add `object_exists` helper Introduce `object_exists` to the GCS client to check whether an object exists. This is primarily useful for test scenarios.	2026-04-05 11:07:16 +03:00
Andrzej Jackowski	8c0920202b	test: protect populate_range in row_cache_test from bad_alloc When test_exception_safety_of_update_from_memtable was converted from manual fail_after()/catch to with_allocation_failures() in `74db08165d`, the populate_range() call ended up inside the failure injection scope without a scoped_critical_alloc_section guard. The other two tests converted in the same commit (test_exception_safety_of_transitioning... and test_exception_safety_of_partition_scan) were correctly guarded. Without the guard, the allocation failure injector can sometimes target an allocation point inside the cleanup path of populate_range(). In a rare corner case, this triggers a bad_alloc in a noexcept context (reader_concurrency_semaphore::stop()), causing std::terminate. Fixes SCYLLADB-1346 Closes scylladb/scylladb#29321	2026-04-04 21:13:26 +03:00
Andrzej Jackowski	ec274cf7b6	test: add test_upgrade_preserves_ddl_audit_for_tables Verify that upgrading from 2025.1 to master does not silently drop DDL auditing for table-scoped audit configurations (SCYLLADB-1155). Test time in dev: 4s Refs: SCYLLADB-1155 Fixes: SCYLLADB-1305	2026-04-03 13:53:28 +02:00
Andrzej Jackowski	9c7b7ac3e3	test: audit: split validate helper so callers need not pass audit_settings The old execute_and_validate_audit_entry required every caller to pass audit_settings so it could decide internally whether to expect an entry. A test added later in this series needs to simply assert an entry was produced, without specifying audit_settings at all. Split into two methods: - execute_and_validate_new_audit_entry: unconditionally expects an audit entry. - execute_and_validate_if_category_enabled: checks audit_settings to decide whether to expect an entry or assert absence. Local wrapper functions and **kwargs forwarding are removed in favor of explicit arguments at each call site, and expected-error cases are handled inline with assert_invalid + assert_entries_were_added.	2026-04-03 13:52:47 +02:00
Andrzej Jackowski	189bff1d5c	test: audit: declare manager attribute in AuditTester base class AuditTester uses self.manager throughout but never declares it. The attribute is only assigned in the CQLAuditTester subclass __init__, so the type checker reports 'Attribute "manager" is unknown' on every self.manager reference in the base class. Add an __init__ to AuditTester that accepts and stores the manager instance, and update CQLAuditTester to forward it via super().__init__ instead of assigning self.manager directly.	2026-04-03 13:52:47 +02:00
Botond Dénes	2c22d69793	Merge 'Pytest: fix variable handling in GSServer (mock) and ensure docker service logs go to test log as well' from Calle Wilund Fixes: SCYLLADB-1106 * Small fix in scylla_cluster - remove debug print * Fix GSServer::unpublish so it does not throw if publish was not called beforehand * Improve dockerized_server so mock server logs echo to the test log to help diagnose CI failures (because we don't collect log files from mocks etc, and in any case correlation will be much easier). No backport needed. Closes scylladb/scylladb#29112 * github.com:scylladb/scylladb: dockerized_service: Convert log reader to pipes and push to test log test::cluster::conftest::GSServer: Fix unpublish for when publish was not called scylla_cluster: Use thread safe future signalling scylla_cluster: Remove left-over debug printout	2026-04-03 06:38:05 +03:00
Raphael S. Carvalho	b6ebbbf036	test/cluster/test_tablets2: Fix test_split_stopped_on_shutdown race with stale log messages The test was failing because the call to: await log.wait_for('Stopping.ongoing compactions') was missing the 'from_mark=log_mark' argument. The log mark was updated (line: log_mark = await log.mark()) immediately after detecting 'splitting_mutation_writer_switch_wait: waiting', and just before launching the shutdown task. However, the wait_for call on the following line was scanning from the beginning of the log, not from that mark. As a result, the search immediately matched old 'Stopping N tasks for N ongoing compactions for table system.X due to table removal' messages emitted during initial server bootstrap (for system.large_partitions, system.large_rows, system.large_cells), rather than waiting for the shutdown to actually stop the user-table split compaction. This caused the test to prematurely send the message to the 'splitting_mutation_writer_switch_wait' injection. The split compaction was unblocked before the shutdown had aborted it, so it completed successfully. Since the split succeeded, 'Failed to complete splitting of table' was never logged. Meanwhile, 'storage_service_drain_wait' was blocking do_drain() waiting for a message. With the split already done, the test was stuck waiting for the expected failure log that would never come (600s timeout). At the same time, after 60s the 'storage_service_drain_wait' injection timed out internally, triggering on_internal_error() which -- with --abort-on-internal-error=1 -- crashed the server (exit code -6). Fix: pass from_mark=log_mark to the wait_for('Stopping.ongoing compactions') call so it only matches messages that appear after the shutdown has started, ensuring the test correctly synchronizes with the shutdown aborting the user-table split compaction before releasing the injection. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1319. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#29311	2026-04-03 06:28:51 +03:00
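A minimal Python sketch of why the missing `from_mark` mattered. `LogWatcher`, its `mark()`/`find()` methods and the log lines are hypothetical stand-ins for the test framework's log helper, not its real API. The pattern is written as the regex `Stopping.*ongoing compactions`, which is what is needed to match the 'Stopping N tasks for N ongoing compactions' messages.

```python
import re


class LogWatcher:
    """Hypothetical stand-in for a server-log helper with mark/wait semantics."""

    def __init__(self):
        self.lines = []

    def append(self, line):
        self.lines.append(line)

    def mark(self):
        # A mark is simply the number of lines logged so far.
        return len(self.lines)

    def find(self, pattern, from_mark=0):
        # Return the first line at or after from_mark that matches pattern, else None.
        regex = re.compile(pattern)
        for line in self.lines[from_mark:]:
            if regex.search(line):
                return line
        return None


PATTERN = r"Stopping.*ongoing compactions"

log = LogWatcher()
# Illustrative bootstrap-time message that also matches the pattern.
log.append("Stopping 0 tasks for 0 ongoing compactions for table system.large_rows due to table removal")
log.append("splitting_mutation_writer_switch_wait: waiting")
log_mark = log.mark()

stale = log.find(PATTERN)                      # matches the old bootstrap line
fresh = log.find(PATTERN, from_mark=log_mark)  # None: shutdown has not stopped anything yet

# Illustrative message emitted once shutdown really stops the split compaction.
log.append("Stopping 1 tasks for 1 ongoing compactions for table ks.test due to shutdown")
after_shutdown = log.find(PATTERN, from_mark=log_mark)
```

Scanning from the mark turns the search into "wait for a new event" instead of "has this ever happened", which is the property the test needed.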
Andrei Chekun	6526a78334	test.py: fix nodetool mock server port collision Replace the random port selection with an OS-assigned port. We open a temporary TCP socket, bind it to (ip, 0) with SO_REUSEADDR, read back the port number the OS selected, then close the socket before launching rest_api_mock.py. Add reuse_address=True and reuse_port=True to TCPSite in rest_api_mock.py so the server itself can also reclaim a TIME_WAIT port if needed. Fixes: SCYLLADB-1275 Closes scylladb/scylladb#29314	2026-04-02 16:24:07 +02:00
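A short Python sketch of the OS-assigned port technique described above. `get_free_port` is a hypothetical helper name. A small window remains between closing the probe socket and the mock server binding the port, which is one reason the server side also enables address reuse.

```python
import socket


def get_free_port(ip: str = "127.0.0.1") -> int:
    """Ask the kernel for an unused TCP port on `ip` (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind((ip, 0))            # port 0 lets the OS choose a free port
        return s.getsockname()[1]  # read back the chosen port; socket closes on exit


port = get_free_port()
# `port` is then handed to the mock server process (how it is passed is not shown).
```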
Botond Dénes	eb78498e07	test: fix flaky test_timeout_is_applied_on_lookup by using eventually_true On slow/overloaded CI machines the lowres_clock timer may not have fired after the fixed 2x sleep, causing the assertion on get_abort_exception() to fail. Replace the fixed sleep with sleep(1x) + eventually_true() which retries with exponential backoff, matching the pattern already used in test_time_based_cache_eviction. Fixes: SCYLLADB-1311 Closes scylladb/scylladb#29299	2026-04-01 18:20:11 +03:00
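The retry-with-backoff idea can be sketched in Python as follows. Scylla's `eventually_true()` is a C++ test utility; this `eventually_true` function, its parameters (`max_attempts`, `initial_delay`) and the doubling factor are illustrative assumptions, not its actual signature.

```python
import time
from typing import Callable


def eventually_true(pred: Callable[[], bool], max_attempts: int = 10,
                    initial_delay: float = 0.001) -> bool:
    """Poll `pred` with exponentially growing sleeps; return True as soon as it holds."""
    delay = initial_delay
    for attempt in range(max_attempts):
        if pred():
            return True
        if attempt + 1 < max_attempts:
            time.sleep(delay)
            delay *= 2  # exponential backoff between retries
    return False


# Simulate a timer that "fires" only on the fourth check.
calls = {"n": 0}


def timer_fired() -> bool:
    calls["n"] += 1
    return calls["n"] >= 4


ok = eventually_true(timer_fired)
```

Unlike a single fixed sleep, the total wait adapts to how slow the machine is, while a fast machine still passes after the first check.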
Marcin Maliszkiewicz	a74665b300	transport: add per-service-level pending response memory metric Track the total memory consumed by responses waiting to be written to the socket, exposed as a per-scheduling-group gauge (cql_pending_response_memory). This complements the response memory accounting added in the previous commits by giving visibility into how much memory each service level is holding in unsent response buffers.	2026-04-01 17:15:28 +02:00
Robert Bindar	e7527392c4	test: close clients if cluster teardown throws Make sure the driver is stopped even if cluster teardown throws, to avoid stale driver connections entering infinite reconnect loops that exhaust CPU resources. Fixes: SCYLLADB-1189 Signed-off-by: Robert Bindar <robert.bindar@scylladb.com> Closes scylladb/scylladb#29230	2026-04-01 17:22:19 +03:00
Tomasz Grabiec	2ec47a8a21	tests: address_map_test: Fix flakiness in debug mode due to task reordering Debug mode shuffles task position in the queue. So the following is possible: 1) shard 1 calls manual_clock::advance(). This expires timers on shard 1 and queues a background smp call to shard 0 which will expire timers there 2) the smp::submit_to(0, ...) from shard 1 called by the test submits the call 3) shard 0 creates tasks for both calls, but (2) is run first, and preempts the reactor 4) shard 1 sees the completion, completes m_svc.invoke_on(1, ..) 5) shard 0 inserts the completion from (4) before task from (1) 6) the check on shard 0: m.find(id1) fails because the timer is not expired yet To fix that, wait for timer expiration on shard 0, so that the test doesn't depend on task execution order. Note: I was not able to reproduce the problem locally using test.py --mode debug --repeat 1000. It happens in jenkins very rarely. Which is expected as the scenario which leads to this is quite unlikely. Fixes SCYLLADB-1265 Closes scylladb/scylladb#29290	2026-04-01 17:17:35 +03:00
Aleksandra Martyniuk	4d4ce074bb	test: node_ops_tasks_tree: reconnect driver after topology changes The test exercises all five node operations (bootstrap, replace, rebuild, removenode, decommission) and by the end only one node out of four remains alive. The CQL driver session, however, still holds stale references to the dead hosts in its connection pool and load-balancing policy state. When the new_test_keyspace context manager exits and attempts DROP KEYSPACE, the driver routes the query to the dead hosts first, gets ConnectionShutdown from each, and throws NoHostAvailable before ever trying the single live node. Fix by calling driver_connect() after the decommission step, which closes the old session and creates a fresh one connected only to the servers the test manager reports as running. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1313. Closes scylladb/scylladb#29306	2026-04-01 17:13:11 +03:00
Dario Mirovic	85127fded8	test: boost: test null data value to_parsable_string Add tests for null value in data_type::to_parsable_string(). We now explicitly return "null". Refs SCYLLADB-1350	2026-04-01 14:15:25 +02:00
Dario Mirovic	fc705dfb4b	cql3: fix null handling in data_value formatting data_value::to_parsable_string() crashed with a null pointer dereference when called on a null data_value. Return "null" instead. Fixes SCYLLADB-1350	2026-04-01 14:15:18 +02:00
Andrzej Jackowski	cccb014747	test: ldap: add regression test for double-free on unregistered message ID Sends a search via the raw LDAP handle (bypassing _msgid_to_promise registration), then triggers poll_results() through the public API to exercise the unregistered-ID branch. Refs: SCYLLADB-1344	2026-04-01 12:57:50 +02:00
Botond Dénes	0351756b15	Merge 'test: fix fuzzy_test timeout in release mode' from Piotr Smaron The multishard_query_test/fuzzy_test was timing out (SIGKILL after 15 minutes) in release mode CI. In release mode the test generates up to 64 partitions with up to 1000 clustering rows and 1000 range tombstones each. With deeply nested randomly-generated types (e.g. frozen<map<varint, frozen<map<frozen<tuple<...>>>>>>), this volume of data can exceed the 15-minute CI timeout. Reduce the release-mode clustering-row and range-tombstone distributions from 0-1000 to 0-200. This caps the worst case at ~12,800 rows -- still 2x the devel-mode maximum (0-100) and sufficient to exercise multi-partition paged scanning with many pages. Fixes: SCYLLADB-1270 No need to backport for now, only appeared on master. Closes scylladb/scylladb#29293 * github.com:scylladb/scylladb: test: clean up fuzzy_test_config and add comments test: fix fuzzy_test timeout in release mode	2026-04-01 11:50:15 +03:00
Andrzej Jackowski	f0028c06dc	ldap: fix double-free of LDAPMessage in poll_results() In the unregistered-ID branch, ldap_msgfree() was called on a result already owned by an RAII ldap_msg_ptr, causing a double-free on scope exit. Remove the redundant manual free. Fixes: SCYLLADB-1344	2026-04-01 10:35:13 +02:00
Andrei Chekun	18f41dcd71	test.py: introduce new scheduler for choosing job count This commit improves how test.py chooses the default number of parallel jobs. This update keeps the logic of selecting the number of jobs from memory and CPU limits but simplifies the heuristic so it is smoother and easier to reason about. This avoids discontinuities such as neighboring machine sizes producing unexpectedly different job counts, and behaves more predictably on asymmetric machines where CPU and RAM do not scale together. Compared to the current threshold-based version, this approach: - avoids hard jumps around memory cutoffs - avoids bucketed debug scaling based on CPU count - keeps CPU and memory as separate constraints and combines them in one place - avoids double-penalizing debug mode - is easier to tune later by adjusting a few constants instead of rewriting branching logic Closes scylladb/scylladb#28904	2026-04-01 11:11:15 +03:00
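A rough Python sketch of a heuristic with the properties listed above: CPU and memory are separate limits combined in a single `min()`, and debug mode is penalized exactly once through a larger per-job memory budget. The function name and all constants (`mem_per_job_gib`, `debug_factor`) are hypothetical; the commit does not state test.py's actual values.

```python
def default_job_count(cpu_count: int, mem_gib: float, debug: bool = False,
                      mem_per_job_gib: float = 2.0, debug_factor: float = 2.0) -> int:
    """Hypothetical smooth job-count heuristic: the minimum of CPU and memory limits."""
    # Debug builds use more memory per test, so raise the per-job budget once.
    per_job = mem_per_job_gib * (debug_factor if debug else 1.0)
    jobs_by_mem = int(mem_gib // per_job)  # no cutoffs: grows linearly with RAM
    jobs_by_cpu = cpu_count
    return max(1, min(jobs_by_cpu, jobs_by_mem))
```

Because the memory term is a plain floor division rather than a set of thresholds, machines with similar RAM get nearby job counts, and an asymmetric machine is limited by whichever resource is scarcer.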
Avi Kivity	d438e35cdd	test/cluster: fix race in test_insert_failure_standalone audit log query get_audit_partitions_for_operation() returns None when no audit log rows are found. In _test_insert_failure_doesnt_report_success_assign_nodes, this None is passed to set(), causing TypeError: 'NoneType' object is not iterable. The audit log entry may not yet be visible immediately after executing the INSERT, so use wait_for() from test.pylib.util with exponential backoff to poll until the entry appears. Import it as wait_for_async to avoid shadowing the existing wait_for from test.cluster.dtest.dtest_class, which has a different signature (timeout vs deadline). Fixes SCYLLADB-1330 Closes scylladb/scylladb#29289	2026-04-01 10:59:02 +03:00
Michael Litvak	35547bfb6e	test: logstor: additional logstor tests	2026-03-31 18:45:08 +02:00
Michael Litvak	5b3e2a4ca2	docs/dev: add logstor on-disk format section	2026-03-31 18:45:08 +02:00
Michael Litvak	39baa573d2	logstor: add version and crc to buffer header add basic crc and validation to the buffer header. also add a version field that indicates the version of the on-disk format.	2026-03-31 18:45:08 +02:00
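To illustrate what a versioned, checksummed buffer header buys, here is a Python sketch. The layout (`<HII`: version, payload length, CRC-32 of the payload) and the use of `zlib.crc32` are assumptions for illustration only; the real layout is the one described in the logstor on-disk format section added to docs/dev.

```python
import struct
import zlib

# Hypothetical layout: u16 version, u32 payload length, u32 crc32 of the payload.
HEADER = struct.Struct("<HII")
FORMAT_VERSION = 1


def encode_buffer(payload: bytes) -> bytes:
    return HEADER.pack(FORMAT_VERSION, len(payload), zlib.crc32(payload)) + payload


def decode_buffer(data: bytes) -> bytes:
    """Validate the header and return the payload, or raise ValueError."""
    if len(data) < HEADER.size:
        raise ValueError("truncated header")
    version, length, crc = HEADER.unpack_from(data)
    if version != FORMAT_VERSION:
        raise ValueError(f"unsupported version {version}")
    payload = data[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    if zlib.crc32(payload) != crc:
        raise ValueError("crc mismatch")
    return payload
```

The version field lets a reader reject a format it does not understand instead of misparsing it, and the checksum turns silent corruption into an explicit error.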
Michael Litvak	6ace823ee4	test: logstor: tablet split/merge and migration add basic logstor tests for tablet split/merge and migration to verify it works as expected	2026-03-31 18:45:08 +02:00
Michael Litvak	996d623ab4	logstor: enable tablet balancing enable tablet balancing with the logstor feature now that it works	2026-03-31 18:45:08 +02:00
Michael Litvak	b02349d755	logstor: streaming of logstor segments using stream_blob implement tablet migration for logstor tables by streaming segments using stream_blob, similar to file streaming of sstables. take a snapshot of the logstor segments and create a stream_blob_info vector with an entry for each segment with the input stream that reads the segment and an op of type file_ops::stream_logstor_segments. the stream_blob_handler creates a logstor sink that allocates a segment on the target shard and creates an output stream that writes to it. when the sink is closed it loads the segment.	2026-03-31 18:45:08 +02:00
Michael Litvak	78426ae31b	logstor: add take_logstor_snapshot add the function table::take_logstor_snapshot that is similar to take_storage_snapshot for sstables. given a token range, for each storage group in the range, it flushes the separator buffers and then makes a snapshot of all segments in the sg's compaction groups while disabling compaction. the segment snapshot holds a reference to the segment so that it won't be freed by compaction, and it provides an input stream for reading the segment. this will be used for tablet migration to stream the segments.	2026-03-31 18:45:08 +02:00
Michael Litvak	754c1b83bd	logstor: segment input/output stream add functions for creating segment input and output streams, that will be used for segment streaming. the segment input stream creates a file input stream that reads a given segment. the segment output stream allocates a new local segment and creates an output stream that writes to the segment, and when closed it loads the segment and adds it to the compaction group.	2026-03-31 18:45:08 +02:00
Michael Litvak	17cab4181b	logstor: implement compaction_group::cleanup implement compaction group cleanup by clearing the range in the index and discarding the segments of the compaction group. segments are discarded by overwriting the segment header to indicate the segment is empty while preserving the segment generation number in order to not resurrect old data in the segment.	2026-03-31 18:45:08 +02:00
Michael Litvak	9fd6dace72	logstor: tablet split implement tablet split for logstor. flush the separator and then perform split as a new type of compaction: take a batch of segments from the source compaction group, read them and write all live records into left/right write buffers according to the split classifier, flush them to the compaction group, and free the old segments. segments that fit in a single target compaction group are removed from the source and added to the correct target group.	2026-03-31 18:45:08 +02:00
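The split flow can be sketched in Python. `Record`, `Segment` and `split_segments` are hypothetical, tokens are plain integers, and the classifier is a single split token (records with token <= `split_token` go left); the real code works on logstor segments and write buffers.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Record:
    token: int
    live: bool


@dataclass
class Segment:
    records: List[Record]  # assumed non-empty

    @property
    def first_token(self) -> int:
        return min(r.token for r in self.records)

    @property
    def last_token(self) -> int:
        return max(r.token for r in self.records)


def split_segments(segments: List[Segment],
                   split_token: int) -> Tuple[List[Segment], List[Segment]]:
    left: List[Segment] = []
    right: List[Segment] = []
    left_buf: List[Record] = []   # write buffer for the left group
    right_buf: List[Record] = []  # write buffer for the right group
    for seg in segments:
        if seg.last_token <= split_token:
            left.append(seg)      # fits entirely in the left group: move as-is
        elif seg.first_token > split_token:
            right.append(seg)     # fits entirely in the right group: move as-is
        else:
            # Straddles the split point: rewrite only its live records.
            for r in seg.records:
                if r.live:
                    (left_buf if r.token <= split_token else right_buf).append(r)
            # The old segment is not kept, i.e. it is freed.
    if left_buf:
        left.append(Segment(left_buf))
    if right_buf:
        right.append(Segment(right_buf))
    return left, right
```

Moving whole segments avoids rewriting data that already belongs to a single target group; only segments that straddle the split point pay the read-and-rewrite cost.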
Michael Litvak	5de39afc24	logstor: tablet merge implement tablet merge with logstor. disable compaction for the new compaction group, then merge the merging compaction groups by merging their logstor segments set into the new cg - simply merging the segment histogram.	2026-03-31 18:40:57 +02:00
Michael Litvak	684ce8de71	logstor: add compaction reenabler add a function that stops and disables compaction for a compaction group and returns a compaction reenabler object, similarly to the normal compaction manager. this will be useful for disabling compaction while doing operations on the compaction group's logstor segment set.	2026-03-31 18:40:56 +02:00
Michael Litvak	1d7c2e4f52	logstor: add segment header we have two types of segments. the active segment is "mixed" because multiple write_buffers can be written to it, each write buffer having records from different tables and tablets. in contrast, the separator and compaction write "full" segments - they write a single write_buffer that has records from a single tablet and storage group. for "full" segments, we add a segment header that contains additional useful metadata such as the table and token range in the segment. the write buffer header contains the type of the buffer, mixed or full. if it's full then it has a segment header placed after the write buffer header.	2026-03-31 18:40:56 +02:00
Michael Litvak	8615f68657	logstor: serialize writes to active segment previously when writing to the active segment, the allocation was serialized but multiple writes could proceed concurrently to different offsets. change it instead to serialize the entire write. we prefer to write larger buffers sequentially instead of multiple buffers concurrently. it is also better that we don't have "holes" in the segment. we also change the buffered_writer to send a single flushing buffer at a time. it has a ring of buffers, new writes are written to the head buffer, and a single consumer flushes the tail buffer.	2026-03-31 18:40:56 +02:00
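The ring-of-buffers scheme for `buffered_writer` can be sketched in Python. `BufferedWriter`, `write`, `flush_tail` and `flush_all` are hypothetical names, and `flushed` stands in for the segment on disk. Writes only touch the head buffer while a single consumer flushes the oldest buffer, one at a time, so data reaches the segment sequentially and without holes.

```python
from collections import deque
from typing import List


class BufferedWriter:
    """Hypothetical ring of buffers: writes go to the head, one consumer flushes the tail."""

    def __init__(self, ring_size: int, buffer_capacity: int):
        self.ring_size = ring_size
        self.capacity = buffer_capacity
        self.ring = deque([bytearray()])  # ring[0] is the tail, ring[-1] the head
        self.flushed: List[bytes] = []    # stands in for the segment on disk

    def write(self, data: bytes) -> None:
        if len(data) > self.capacity:
            raise ValueError("record larger than a buffer")
        if len(self.ring[-1]) + len(data) > self.capacity:
            if len(self.ring) == self.ring_size:
                self.flush_tail()  # ring full: wait for the single consumer
            if self.ring[-1]:      # head still holds data: start a new head buffer
                self.ring.append(bytearray())
        self.ring[-1].extend(data)

    def flush_tail(self) -> None:
        """The single consumer: flush only the oldest buffer."""
        buf = self.ring.popleft()
        if buf:
            self.flushed.append(bytes(buf))
        if not self.ring:
            self.ring.append(bytearray())

    def flush_all(self) -> None:
        while any(self.ring):
            self.flush_tail()
```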
Michael Litvak	e791823874	replica: extend compaction_group functions for logstor extend compaction_group functions such as disk size calculation and empty() to account also for the logstor segments that the compaction group owns. reuse the sstable_add_gate when there is a write in process to a compaction group, in order for the compaction group to be considered not empty.	2026-03-31 18:40:56 +02:00
Michael Litvak	d3db967802	replica: add compaction_group_for_logstor_segment add the function table::compaction_group_for_logstor_segment that we use when recovering a segment to find the compaction group for a segment based on its token range, similarly to compaction_group_for_sstable for sstables. extract the common logic from compaction_group_for_sstable to a common function compaction_group_for_token_range that finds a compaction group for a token range.	2026-03-31 18:40:56 +02:00
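A Python sketch of the shared lookup: given sorted, inclusive upper bounds for each compaction group, a token range maps to a group only if both of its ends fall in the same group. The function name mirrors `compaction_group_for_token_range`, but the signature, the integer tokens and the error handling are assumptions.

```python
import bisect
from typing import List


def compaction_group_for_token_range(last_tokens: List[int], first: int, last: int) -> int:
    """Return the index of the group that owns [first, last].

    last_tokens[i] is the inclusive upper bound of group i (sorted ascending);
    group 0 starts at the minimum token.
    """
    i = bisect.bisect_left(last_tokens, first)
    j = bisect.bisect_left(last_tokens, last)
    if i == len(last_tokens) or i != j:
        raise ValueError(f"token range [{first}, {last}] is not owned by a single group")
    return i
```

Sharing one range-based helper lets the sstable path and the logstor-segment path agree on which group owns a given piece of data.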
Michael Litvak	bf7bc5b410	logstor: code cleanup misc code cleanup and small changes	2026-03-31 18:40:56 +02:00
Botond Dénes	2d2ff4fbda	sstables: use chunked_managed_vector for promoted indexes in partition_index_page Switch _promoted_indexes storage in partition_index_page from managed_vector to chunked_managed_vector to avoid large contiguous allocations. Avoid allocation failure (or crashes with --abort-on-internal-error) when large partitions have enough promoted index entries to trigger a large allocation with managed_vector. Fixes: SCYLLADB-1315 Closes scylladb/scylladb#29283	2026-03-31 18:43:57 +03:00
Piotr Smaron	2ce409dca0	test: clean up fuzzy_test_config and add comments Remove the unused timeout field from fuzzy_test_config. It was declared, initialized per build mode, and logged, but never actually enforced anywhere. Document the intentionally small max_size (1024 bytes) passed to read_partitions_with_paged_scan in run_fuzzy_test_scan: it forces many pages per scan to stress the paging and result-merging logic.	2026-03-31 17:13:26 +02:00
Piotr Smaron	df2924b2a3	test: fix fuzzy_test timeout in release mode The multishard_query_test/fuzzy_test was timing out (SIGKILL after 15 minutes) in release mode CI. In release mode the test generates up to 64 partitions with up to 1000 clustering rows and 1000 range tombstones each. With deeply nested randomly-generated types (e.g. frozen<map<varint, frozen<map<frozen<tuple<...>>>>>>), this volume of data can exceed the 15-minute CI timeout. Reduce the release-mode clustering-row and range-tombstone distributions from 0-1000 to 0-200. This caps the worst case at ~12,800 rows -- still 2x the devel-mode maximum (0-100) and sufficient to exercise multi-partition paged scanning with many pages. Fixes: SCYLLADB-1270	2026-03-31 17:13:06 +02:00
Piotr Szymaniak	6d8ec8a0c0	alternator: fix flaky test_update_condition_unused_entries_short_circuit The test was flaky because it stopped dc2_node immediately after an LWT write, before cross-DC replication could complete. The LWT commit uses LOCAL_QUORUM, which only guarantees persistence in the coordinator's DC. Replication to the remote DC is async background work, and CAS mutations don't store hints. Stopping dc2_node could drop in-flight RPCs, leaving DC1 without the mutation. Fix by polling both live DC1 nodes after the write to confirm cross-DC replication completed before stopping dc2_node. Both nodes must have the data so that the later ConsistentRead=True (LOCAL_QUORUM) read on restarted node1 is guaranteed to succeed. Fixes SCYLLADB-1267 Closes scylladb/scylladb#29287	2026-03-31 16:50:51 +03:00
Dawid Mędrek	f040f1b703	Merge 'raft: remake the read barrier optimization' from Patryk Jędrzejczak The approach taken in `1ae2ae50a6` turned out to be incorrect. The Raft member requesting a read barrier could incorrectly advance its commit_idx and break linearizability. We revert that commit in this PR. We also remake the read barrier optimization with a completely new approach. We make the leader replicate to the non-voting requester of a read barrier if its `commit_idx` is behind. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-998 No backport: the issue is present only in master. Closes scylladb/scylladb#29216 * github.com:scylladb/scylladb: raft: speed up read barrier requested by non-voters Revert "raft: read_barrier: update local commit_idx to read_idx when it's safe"	2026-03-31 15:11:56 +02:00
Marcin Maliszkiewicz	a26ca0f5f7	transport: hold memory permit until response write completes Capture the memory permit in the leave lambda's .finally() continuation so that the semaphore units are kept alive until write_response finishes, preventing premature release of memory accounting. This is especially important with slow network and big responses when buffers can accumulate and deplete node's memory.	2026-03-31 14:05:00 +02:00
Avi Kivity	216d39883a	Merge 'test: audit: fix audit test syslog race' from Dario Mirovic Fix two independent race conditions in the syslog audit test that cause intermittent `assert 2 <= 1` failures in `assert_entries_were_added`. Datagram ordering race: `UnixSockerListener` used `ThreadingUnixDatagramServer`, where each datagram spawns a new thread. The notification barrier in `get_lines()` assumes FIFO handling, but the notification thread can win the lock before an audit entry thread, so `clear_audit_logs()` misses entries that arrive moments later. Fix: switch to sequential `UnixDatagramServer`. Config reload race: The live-update path used `wait_for_config` (REST API poll on shard 0) which can return before `broadcast_to_all_shards()` completes. Fix: wait for `"completed re-reading configuration file"` in the server log after each SIGHUP, which guarantees all shards have the new config. Fixes SCYLLADB-1277 This is CI improvement for the latest code. No need for backport. Closes scylladb/scylladb#29282 * github.com:scylladb/scylladb: test: cluster: wait for full config reload in audit live-update path test: cluster: fix syslog listener datagram ordering race	2026-03-31 13:53:01 +03:00
Tomasz Grabiec	b355bb70c2	dtest/alternator: stop concurrent-requests test when workers hit limit `test_limit_concurrent_requests` could create far more tables than intended because worker threads looped indefinitely and only the probe path terminated the test. In practice, workers often hit `RequestLimitExceeded` first, but the test kept running and creating tables, increasing memory pressure and causing flakiness due to bad_alloc errors in logs. Fix by replacing the old probe-driven termination with worker-driven termination. Workers now run until any worker sees `RequestLimitExceeded`. Fixes SCYLLADB-1181 Closes scylladb/scylladb#29270	2026-03-31 13:35:50 +03:00
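A Python sketch of worker-driven termination with a shared `threading.Event`. `FakeServer` and `run_workers` are hypothetical, and to keep the example deterministic the fake limit counts total requests rather than concurrent ones.

```python
import threading


class RequestLimitExceeded(Exception):
    pass


class FakeServer:
    """Hypothetical server that rejects every request after `limit` successful ones."""

    def __init__(self, limit: int):
        self.limit = limit
        self.count = 0
        self.lock = threading.Lock()

    def create_table(self) -> None:
        with self.lock:
            if self.count >= self.limit:
                raise RequestLimitExceeded()
            self.count += 1


def run_workers(server: FakeServer, n_workers: int) -> int:
    stop = threading.Event()

    def worker() -> None:
        # Without the shared event, each worker would loop forever.
        while not stop.is_set():
            try:
                server.create_table()
            except RequestLimitExceeded:
                stop.set()  # any worker hitting the limit ends the whole run

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return server.count
```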
Patryk Jędrzejczak	b9f82f6f23	raft_group0: join_group0: fix join hang when node joins group 0 before post_server_start A joining node hung forever if the topology coordinator added it to the group 0 configuration before the node reached `post_server_start`. In that case, `server->get_configuration().contains(my_id)` returned true and the node broke out of the join loop early, skipping `post_server_start`. `_join_node_group0_started` was therefore never set, so the node's `join_node_response` RPC handler blocked indefinitely. Meanwhile the topology coordinator's `respond_to_joining_node` call (which has no timeout) hung forever waiting for the reply that never came. Fix by only taking the early-break path when not starting as a follower (i.e. when the node is the discovery leader or is restarting). A joining node must always reach `post_server_start`. We also provide a regression test. It takes 6s in dev mode. Fixes SCYLLADB-959 Closes scylladb/scylladb#29266	2026-03-31 12:33:56 +02:00
Marcin Maliszkiewicz	2645b95888	transport: account for response size exceeding initial memory estimate After obtaining the CQL response, check if its actual size exceeds the initially acquired memory permit. If so, take semaphore units and adopt them into the permit (non-blocking). This doesn't fully prevent allocating too much memory, since the size is known only after the buffer has been allocated, but it improves memory accounting for big responses.	2026-03-31 11:57:41 +02:00
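A Python sketch of the accounting step. The permit is sized from the initial estimate; if the finished response turns out larger, the difference is taken from the semaphore without waiting and adopted into the same permit, so all of it is returned together when the permit is released. The class names are hypothetical; the real code uses Seastar semaphores, where a non-blocking consume (Seastar provides `consume_units()` for this) plays the role of `consume()` here.

```python
class MemorySemaphore:
    """Hypothetical memory semaphore; `available` may go negative."""

    def __init__(self, capacity: int):
        self.available = capacity

    def consume(self, units: int) -> int:
        # Non-blocking: take the units immediately, even if that overdraws.
        self.available -= units
        return units

    def signal(self, units: int) -> None:
        self.available += units


class Permit:
    def __init__(self, sem: MemorySemaphore, units: int):
        self.sem = sem
        self.units = sem.consume(units)  # initial estimate (simplified: no waiting)

    def adopt(self, units: int) -> None:
        self.units += units  # extra units are now released together with the permit

    def release(self) -> None:
        self.sem.signal(self.units)
        self.units = 0


def account_response(sem: MemorySemaphore, permit: Permit, response_size: int) -> None:
    """If the real response outgrew the estimate, charge the difference too."""
    if response_size > permit.units:
        permit.adopt(sem.consume(response_size - permit.units))
```

Overdrawing instead of waiting matches the "non-blocking" note in the commit: the memory is already allocated by the time its size is known, so blocking would not help, but recording it makes later requests wait for it.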
Dario Mirovic	0cb63fb669	test: cluster: wait for full config reload in audit live-update path _apply_config_to_running_servers used wait_for_config (REST API poll) to confirm live config updates. The REST API reads from shard 0 only, so it can return before broadcast_to_all_shards() completes — other shards may still have stale audit config, generating unexpected entries. Additionally, server_remove_config_option for absent keys sent separate SIGHUPs before server_update_config, and the single wait_for_config at the end could match a completion from an earlier SIGHUP. Wait for "completed re-reading configuration file" in the server log after each SIGHUP-producing operation. This message is logged only after both read_config() and broadcast_to_all_shards() finish, guaranteeing all shards have the new config. Each operation gets its own mark+wait so no stale completion is matched. Fixes SCYLLADB-1277	2026-03-31 02:27:11 +02:00
Dario Mirovic	1d623196eb	test: cluster: fix syslog listener datagram ordering race UnixSockerListener used ThreadingUnixDatagramServer, which spawns a new thread per datagram. The notification barrier in get_lines() relies on all prior datagrams being handled before the notification. With threading, the notification handler can win the lock before an audit entry handler, so get_lines() returns before the entry is appended. clear_audit_logs() then clears an incomplete buffer, and the late entry leaks into the next test's before/after diff. Switch to sequential UnixDatagramServer. The server thread now handles datagrams in kernel FIFO order, so the notification is always processed after all preceding audit entries. Refs SCYLLADB-1277	2026-03-31 02:27:11 +02:00
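A Python sketch (Unix only) of why sequential handling gives the barrier its meaning. With `socketserver.UnixDatagramServer`, datagrams are processed one at a time in the order the kernel queued them, so once the final notification datagram has been handled, every audit entry sent before it is already in the list. The handler, socket path and message contents are illustrative, not the test's actual listener.

```python
import os
import socket
import socketserver
import tempfile

received = []


class Handler(socketserver.BaseRequestHandler):
    def handle(self):
        data, _sock = self.request  # datagram servers pass (data, socket)
        received.append(data.decode())


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "audit.sock")
    server = socketserver.UnixDatagramServer(path, Handler)
    client = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        for msg in ("audit entry 1", "audit entry 2", "NOTIFY"):
            client.sendto(msg.encode(), path)
        # Sequential server: each call handles exactly one datagram, in FIFO order.
        for _ in range(3):
            server.handle_request()
    finally:
        client.close()
        server.server_close()
```

With `ThreadingUnixDatagramServer`, each datagram would get its own thread, and nothing would stop the "NOTIFY" handler from finishing before an earlier entry's handler.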
Karol Nowacki	493a4433e7	index: fix DESC INDEX for vector index The `DESC INDEX` command returned incorrect results for local vector indexes and for vector indexes that included filtering columns. This patch corrects the implementation to ensure `DESCRIBE INDEX` accurately reflects the index configuration. This was a pre-existing issue, not a regression from recent serialization schema changes for vector index target options.	2026-03-30 16:46:48 +02:00
Karol Nowacki	a32e4bb9f4	vector_search: test: refactor boilerplate setup The test boilerplate setup for some vector store client tests has been extracted to a common function.	2026-03-30 16:46:48 +02:00
Karol Nowacki	6bc88e817f	vector_search: fix SELECT on local vector index Queries against local vector indexes were failing with the error: "ANN ordering by vector requires the column to be indexed using 'vector_index'" This was a regression introduced by `15788c3734`, which incorrectly assumed the first column in the targets list is always the vector column. For local vector indexes, the first column is the partition key, causing the failure. Previously, serialization logic for the target index option was shared between vector and secondary indexes. This is no longer viable due to the introduction of local vector indexes and vector indexes with filtering columns, which have a different target format. This commit introduces a dedicated JSON-based serialization format for vector index targets, identifying the target column (tc), filtering columns (fc), and partition key columns (pk). This ensures unambiguous serialization and deserialization for all vector index types. This change is backward compatible for regular vector indexes. However, it breaks compatibility for local vector indexes and vector indexes with filtering columns created in version 2026.1.0. To mitigate this, usage of these specific index types will be blocked in the 2026.1.0 release by failing ANN queries against them in vector-store service. Fixes: SCYLLADB-895	2026-03-30 16:46:48 +02:00
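A Python sketch of such a format. Only the key names `tc`, `fc` and `pk` come from the description above. The value shapes, the helper names, and the choice to keep a bare column name for a regular vector index (one way the stated backward compatibility could be preserved) are all assumptions.

```python
import json
from typing import List, Optional, Tuple


def serialize_vector_targets(target_column: str,
                             filtering_columns: Optional[List[str]] = None,
                             partition_key_columns: Optional[List[str]] = None) -> str:
    """Hypothetical encoder for the vector index 'target' option."""
    if not filtering_columns and not partition_key_columns:
        return target_column  # assumed legacy form for a regular vector index
    doc = {"tc": target_column}
    if filtering_columns:
        doc["fc"] = list(filtering_columns)
    if partition_key_columns:
        doc["pk"] = list(partition_key_columns)
    return json.dumps(doc, separators=(",", ":"))


def deserialize_vector_targets(option: str) -> Tuple[str, List[str], List[str]]:
    """Return (target column, filtering columns, partition key columns)."""
    if not option.startswith("{"):
        return option, [], []
    doc = json.loads(option)
    return doc["tc"], doc.get("fc", []), doc.get("pk", [])
```

Naming each role explicitly removes the positional assumption ("the first column is the vector column") that broke local vector indexes, where the partition key comes first.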
Karol Nowacki	c0b78477a5	index: test: vector index target option serialization test This test ensures that the serialization format for vector index target options remains stable. Maintaining backward compatibility is critical because the index is restored from this property on startup. Any unintended changes to the serialization schema could break existing indexes after an upgrade. This option is also an interface for the vector-store service, which uses it to identify the indexed column.	2026-03-30 16:46:48 +02:00
Karol Nowacki	4dc28dfa52	index: test: secondary index target option serialization test Target option serialization must remain stable for backward compatibility. The index is restored from this property on startup, so unintentional changes to the serialization schema can break indexes after upgrade.	2026-03-30 16:46:47 +02:00
Patryk Jędrzejczak	ba54b2272b	raft: speed up read barrier requested by non-voters We achieve this by making the leader replicate to the non-voting requester of a read barrier if its commit_idx is behind. There are some corner cases where the new `replicate_to(*opt_progress, true);` call will be a no-op, while the corresponding call in `tick_leader()` would result in sending the AppendEntries RPC to the follower. These cases are: - `progress.state == follower_progress::state::PROBE && progress.probe_sent`, - `progress.state == follower_progress::state::PIPELINE && progress.in_flight == follower_progress::max_in_flight`. We could try to improve the optimization by including some of the cases above, but it would only complicate the code without noticeable benefits (at least for group0). Note: this is the second attempt for this optimization. The first approach turned out to be incorrect and was reverted in the previous commit. The performance improvement is the same as in the previous case.	2026-03-30 15:56:24 +02:00
Patryk Jędrzejczak	4913acd742	Revert "raft: read_barrier: update local commit_idx to read_idx when it's safe" This reverts commit `1ae2ae50a6`. The reverted change turned out to be incorrect. The Raft member requesting a read barrier could incorrectly advance its commit_idx and break linearizability. More details in https://scylladb.atlassian.net/browse/SCYLLADB-998?focusedCommentId=42935	2026-03-30 15:56:24 +02:00
Andrzej Jackowski	ab43420d30	test: use exclusive driver connection in test_limited_concurrency_of_writes Use get_cql_exclusive(node1) so the driver only connects to node1 and never attempts to contact the stopped node2. The test was flaky because the driver received `Host has been marked down or removed` from node2. Fixes: SCYLLADB-1227 Closes scylladb/scylladb#29268	2026-03-30 11:50:44 +02:00
Botond Dénes	068a7894aa	test/cluster: fix flaky test_cleanup_stop by using asyncio.sleep The test was using time.sleep(1) (a blocking call) to wait after scheduling the stop_compaction task, intending to let it register on the server before releasing the sstable_cleanup_wait injection point. However, time.sleep() blocks the asyncio event loop entirely, so the asyncio.create_task(stop_compaction) task never gets to run during the sleep. After the sleep, the directly-awaited message_injection() runs first, releasing the injection point before stop_compaction is even sent. By the time stop_compaction reaches Scylla, the cleanup has already completed successfully -- no exception is raised and the test fails. Fix by replacing time.sleep(1) with await asyncio.sleep(1), which yields control to the event loop and allows the stop_compaction task to actually send its HTTP request before message_injection is called. Fixes: SCYLLADB-834 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#29202	2026-03-30 11:40:47 +03:00
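The effect can be reproduced in a few lines of Python. The coroutine names only mirror the test's steps and simply record the order in which they run.

```python
import asyncio
import time


async def run(use_async_sleep: bool) -> list:
    events = []

    async def stop_compaction():
        events.append("stop_compaction sent")

    async def message_injection():
        events.append("injection released")

    task = asyncio.create_task(stop_compaction())  # scheduled, not yet running
    if use_async_sleep:
        await asyncio.sleep(0.01)  # yields to the loop, so the task runs now
    else:
        time.sleep(0.01)           # blocks the loop; the task cannot run
    await message_injection()      # awaited directly: runs immediately
    await task
    return events


blocking_order = asyncio.run(run(use_async_sleep=False))
yielding_order = asyncio.run(run(use_async_sleep=True))
```

With `time.sleep()` the injection is released before `stop_compaction` is ever sent; with `await asyncio.sleep()` the order is the one the test intended.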
Nikos Dragazis	3b3b02b15a	docs: Add ops guide for vnodes-to-tablets migration The vnodes-to-tablets migration is a manual procedure, so instructions need to be provided to the users. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-29 22:18:46 +03:00
Ernest Zaslavsky	1d779804a0	scripts: remove lua library rename workaround from comparison script Now that cmake/FindLua.cmake uses pkg-config (matching configure.py), both build systems resolve to the same 'lua' library name. Remove the lua/lua-5.4 entries from _KNOWN_LIB_ASYMMETRIES and add 'm' (math library) as a known transitive dependency that configure.py gets via pkg-config for lua.	2026-03-29 16:17:45 +03:00
Ernest Zaslavsky	c32851b102	cmake: add custom FindLua using pkg-config to match configure.py CMake's built-in FindLua resolves to the versioned library file (e.g. liblua-5.4.so) instead of the unversioned symlink (liblua.so), causing a library name mismatch between the two build systems. Add a custom cmake/FindLua.cmake that uses pkg-config — matching configure.py's approach — and find_library(NAMES lua) to find the unversioned symlink. This also mirrors the pattern used by other Find modules in cmake/ (FindxxHash, Findlz4, etc.).	2026-03-29 16:17:45 +03:00
Ernest Zaslavsky	f3a91df0b4	test/cmake: add missing tests to boost test suite Add symmetric_key_test (standalone, links encryption library) and auth_cache_test to the combined_tests binary. These tests already exist in configure.py; this aligns the CMake build.	2026-03-29 16:17:45 +03:00
Ernest Zaslavsky	de606cc17a	test/cmake: remove per-test LTO disable The per-test -fno-lto link option is now redundant since -fno-lto was added globally in mode.common.cmake. LTO-enabled targets (the scylla binary in RelWithDebInfo) override it via enable_lto().	2026-03-29 16:17:45 +03:00
Ernest Zaslavsky	38ba58567a	cmake: add BOOST_ALL_DYN_LINK and strip per-component defines Match configure.py's Boost handling: - Add BOOST_ALL_DYN_LINK when using shared Boost libraries. - Strip per-component defines (BOOST_UNIT_TEST_FRAMEWORK_DYN_LINK, BOOST_REGEX_DYN_LINK, etc.) that CMake's Boost package config adds on imported targets. configure.py only uses the umbrella BOOST_ALL_DYN_LINK define.	2026-03-29 16:17:45 +03:00
Ernest Zaslavsky	7e72898150	cmake: move SEASTAR_TESTING_MAIN after seastar and abseil subdirs Place add_compile_definitions(SEASTAR_TESTING_MAIN) after both add_subdirectory(seastar) and add_subdirectory(abseil) are processed. This matches configure.py's global define without leaking into seastar's subdirectory build (which would cause a duplicate main symbol in seastar_testing). Remove the now-redundant per-test SEASTAR_TESTING_MAIN compile definition from test/CMakeLists.txt.	2026-03-29 16:17:45 +03:00
Ernest Zaslavsky	b0837ead3e	cmake: add -fno-sanitize=vptr for abseil sanitizer flags Match configure.py line 2192: abseil gets sanitizer flags with -fno-sanitize=vptr to exclude vptr checks which are incompatible with abseil's usage of type-punning patterns.	2026-03-29 16:17:45 +03:00
Ernest Zaslavsky	dd829fa69c	cmake: align Seastar build configuration with configure.py - Set BUILD_SHARED_LIBS based on build type to match configure.py's build_seastar_shared_libs: Debug and Dev build Seastar as a shared library, all other modes build it static. - Add sanitizer link options on the seastar target for Coverage mode. Seastar's CMake only activates sanitizer targets for Debug/Sanitize configs, but Coverage mode needs them too since configure.py's seastar_libs_coverage carries -fsanitize flags.	2026-03-29 16:17:45 +03:00
Ernest Zaslavsky	52e4d44a75	cmake: align global compile defines and options with configure.py - Disable CMake's automatic -fcolor-diagnostics injection for Clang+Ninja (CMake 3.24+), matching configure.py which does not add any color diagnostics flags. - Add SEASTAR_NO_EXCEPTION_HACK and XXH_PRIVATE_API as global defines (previously SEASTAR_NO_EXCEPTION_HACK was only on the seastar target as PRIVATE; it needs to be project-wide). - Add -fpch-validate-input-files-content to check precompiled header content when timestamps don't match.	2026-03-29 16:17:44 +03:00
Ernest Zaslavsky	6f2fe3c2fc	cmake: fix Coverage mode in mode.Coverage.cmake Fix multiple deviations from configure.py's coverage mode: - Remove -fprofile-list from CMAKE_CXX_FLAGS_COVERAGE. That flag belongs in COVERAGE_INST_FLAGS applied to other modes, not to coverage mode itself. - Replace incorrect defines (DEBUG, SANITIZE, DEBUG_LSA_SANITIZER, SCYLLA_ENABLE_ERROR_INJECTION) with the correct Seastar debug defines (SEASTAR_DEBUG, SEASTAR_DEFAULT_ALLOCATOR, etc.) that configure.py's pkg-config query produces for coverage mode. - Add sanitizer and stack-clash-protection compile flags for Coverage config, matching the flags that Seastar's pkg-config --cflags output includes for debug builds. - Change CMAKE_STATIC_LINKER_FLAGS_COVERAGE to CMAKE_EXE_LINKER_FLAGS_COVERAGE. Coverage flags need to reach the executable linker, not the static archiver.	2026-03-29 16:17:44 +03:00
Ernest Zaslavsky	7d23ba7dc8	cmake: align mode.common.cmake flags with configure.py Add three flag-alignment changes: - -Wno-error=stack-usage= alongside the stack-usage threshold flag, preventing hard errors from stack-usage warnings (matching configure.py behavior). - -fno-lto global link option. configure.py adds -fno-lto to all binaries; LTO-enabled targets override it via enable_lto(). - Sanitizer link flags (-fsanitize=address, -fsanitize=undefined) for Debug/Sanitize configs, matching configure.py's cxx_ld_flags.	2026-03-29 16:17:44 +03:00
Ernest Zaslavsky	38088a8a94	configure.py: add sstable_tablet_streaming to combined_tests	2026-03-29 16:17:44 +03:00
Ernest Zaslavsky	33bca2428a	docs: add compare-build-systems.md Document the purpose, usage, and examples for scripts/compare_build_systems.py which compares the configure.py and CMake build systems by parsing their ninja build files.	2026-03-29 16:17:44 +03:00
Ernest Zaslavsky	d3972369a0	scripts: add compare_build_systems.py to compare ninja build files Add a script that compares configure.py and CMake build systems by parsing their generated build.ninja files. The script checks: - Per-file compilation flags (defines, warnings, optimization) - Link target sets (detect missing/extra targets) - Per-target linker flags and libraries configure.py is treated as the baseline. CMake should match it. Both systems are always configured into a temporary directory so the user's build tree is never touched. Usage: scripts/compare_build_systems.py -m dev # single mode scripts/compare_build_systems.py # all modes scripts/compare_build_systems.py --ci # CI mode (strict)	2026-03-29 16:17:44 +03:00
Nadav Har'El	d32fe72252	Merge 'alternator: check concurrency limit before memory acquisition' from Łukasz Paszkowski Fix the ordering of the concurrency limit check in the Alternator HTTP server so it happens before memory acquisition, and reduce test pressure to avoid LSA exhaustion on the memory-constrained test node. The patch moves the concurrency check to right after the content-length early-out, before any memory acquisition or I/O. The check was originally placed before memory acquisition but was inadvertently moved after it during a refactoring. This allowed unlimited requests to pile up consuming memory, reading bodies, verifying signatures, and decompressing — all before being rejected. Restores the original ordering and mirrors the CQL transport (`transport/server.cc`). Lowers `concurrent_requests_limit` from 5 to 3 and the thread multiplier from 5 to 2 (6 threads instead of 25). This is still sufficient to reliably trigger RequestLimitExceeded, while keeping flush pressure within what 512MB per shard can sustain. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1248 Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1181 The test started to fail quite recently. It affects master only. No backport is needed. We might want to consider backporting a commit moving the concurrency check earlier. Closes scylladb/scylladb#29272 * github.com:scylladb/scylladb: test: reduce concurrent-request-limit test pressure to avoid LSA exhaustion alternator: check concurrency limit before memory acquisition	2026-03-29 11:08:28 +03:00
Łukasz Paszkowski	b8e3ef0c64	test: reduce concurrent-request-limit test pressure to avoid LSA exhaustion The test_limit_concurrent_requests dtest uses concurrent CreateTable requests to verify Alternator's concurrency limiting. Each admitted CreateTable triggers Raft consensus, schema mutations, and memtable flushes—all of which consume LSA memory. On the 1 GB test node (2 SMP × 512 MB), the original settings (limit=5, 25 threads) created enough flush pressure to exhaust the LSA emergency reserve, producing logalloc::bad_alloc errors in the node log. The test was always marginal under these settings and became flaky as new system tables increased baseline LSA usage over time. Lower concurrent_requests_limit from 5 to 3 and the thread multiplier from 5 to 2 (6 threads total). This is still well above the limit and sufficient to reliably trigger RequestLimitExceeded, while keeping flush pressure within what 512 MB per shard can sustain.	2026-03-28 20:40:33 +01:00
Łukasz Paszkowski	a86928caa1	alternator: check concurrency limit before memory acquisition The concurrency limit check in the Alternator server was positioned after memory acquisition (get_units), request body reading (read_entire_stream), signature verification, and decompression. This allowed unlimited requests to pile up consuming memory before being rejected, exhausting LSA memory and causing logalloc::bad_alloc errors that cascade into Raft applier and topology coordinator failures, breaking subsequent operations. Without this fix, test_limit_concurrent_requests on a 1GB node produces 50 logalloc::bad_alloc errors and cascading failures: reads from system.scylla_local fail, the Raft applier fiber stops, the topology coordinator stops, and all subsequent CreateTable operations fail with InternalServerError (500). With this fix, the cascade is eliminated -- admitted requests may still cause LSA pressure on a memory-constrained node, but the server remains functional. Move the concurrency check to right after the content-length early-out, before any memory acquisition or I/O. This mirrors the CQL transport which correctly checks concurrency before memory acquisition (transport/server.cc). The concurrency check was originally added in `1b8c946ad7` (Sep 2020) before memory acquisition, which at the time lived inside with_gate (after the concurrency gate). The ordering was inverted by `f41dac2a3a` (Mar 2021, "avoid large contiguous allocation for request body"), which moved get_units() earlier in the function to reserve memory before reading the newly-introduced content stream -- but inadvertently also moved it before the concurrency check. `c3593462a4` (Mar 2025) further worsened the situation by adding a 16MB fallback reservation for requests without Content-Length and ungzip/deflate decompression steps -- all before the concurrency check -- greatly increasing the memory consumed by requests that would ultimately be rejected.	2026-03-28 20:40:33 +01:00
Emil Maskovsky	9dad68e58d	raft: abort stale snapshot transfers when term changes The Bug Assertion failure: `SCYLLA_ASSERT(res.second)` in `raft/server.cc` when creating a snapshot transfer for a destination that already had a stale in-flight transfer. Root Cause If a node loses leadership and later becomes leader again before the next `io_fiber` iteration, the old transfer from the previous term can remain in `_snapshot_transfers` while `become_leader()` resets progress state. When the new term emits `install_snapshot(dst)`, `send_snapshot(dst)` tries to create a new entry for the same destination and can hit the assertion. The Fix Abort all in-flight snapshot transfers in `process_fsm_output()` when `term_and_vote` is persisted. A term/vote change marks existing transfers as stale, so we clean them up before dispatching messages from that batch and before any new snapshot transfer is started. With cross-term cleanup moved to the term-change path, `send_snapshot()` now asserts the within-term invariant that there is at most one in-flight transfer per destination. Fixes: SCYLLADB-862 Backport: The issue is reproducible in master, but is present in all active branches. Closes scylladb/scylladb#29092	2026-03-27 10:00:15 +01:00
Andrzej Jackowski	181ad9f476	Revert "audit: disable DDL by default" This reverts commit `c30607d80b`. With the default configuration, enabling DDL has no effect because no `audit_keyspaces` or `audit_tables` are specified. Including DDL in the default categories can be misleading for some customers, and ideally we would like to avoid it. However, DDL has been one of the default audit categories for years, and removing it risks silently breaking existing deployments that depend on it. Therefore, the recent change to disable DDL by default is reverted. Fixes: SCYLLADB-1155 Closes scylladb/scylladb#29169	2026-03-27 09:55:11 +01:00
Botond Dénes	854c374ebf	test/encryption: wait for topology convergence after abrupt restart test_reboot uses a custom restart function that SIGKILLs and restarts nodes sequentially. After all nodes are back up, the test proceeded directly to reads after wait_for_cql_and_get_hosts(), which only confirms CQL reachability. While a node is restarted, other nodes might execute global token metadata barriers, which advance the topology fence version. The restarted node has to learn about the new version before it can send reads/writes to the other nodes. The test issues reads as soon as the CQL port is opened, which might happen before the last restarted node learns of the latest topology version. If this node acts as a coordinator for reads/writes before this happens, these will fail as the other nodes will reject the ops with the outdated topology fence version. Fix this by replacing wait_for_cql_and_get_hosts() on the abrupt-restart path with the more robust get_ready_cql(), which makes sure servers see each other before refreshing the cql connection. This should ensure that nodes have exchanged gossip and converged on topology state before any reads are executed. The rolling_restart() path is unaffected as it handles this internally. Fixes: SCYLLADB-557 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#29211	2026-03-27 09:52:27 +01:00
Avi Kivity	b708e5d7c9	Merge 'test: fix race condition in test_crashed_node_substitution' from Sergey Zolotukhin `test_crashed_node_substitution` intermittently failed: ```python assert len(gossiper_eps) == (len(server_eps) + 1) ``` The test crashed the node right after a single ACK2 handshake (`finished do_send_ack2_msg`), assuming the node state was visible to all peers. However, since gossip is eventually consistent, the update may not have propagated yet, so some nodes did not see the failed node. This change: Wait until the gossiper state is visible on peers before continuing the test and asserting. Fixes: [SCYLLADB-1256](https://scylladb.atlassian.net/browse/SCYLLADB-1256). backport: this issue may affect CI for all branches, so should be backported to all versions. [SCYLLADB-1256]: https://scylladb.atlassian.net/browse/SCYLLADB-1256?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ Closes scylladb/scylladb#29254 * github.com:scylladb/scylladb: test: test_crashed_node_substitution: add docstring and fix whitespace test: fix race condition in test_crashed_node_substitution	2026-03-26 21:40:33 +02:00
Petr Gusev	c38e312321	test_lwt_fencing_upgrade: fix quorum failure due to gossip lag If lwt_workload() sends an update immediately after a rolling restart, the coordinator might still see a replica as down due to gossip lagging behind. Concurrently restarting another node leaves only one available replica, failing the LOCAL_QUORUM requirement for learn or eventually consistent sp::query() in sp::cas() and resulting in a mutation_write_failure_exception. We fix this problem by waiting for the restarted server to see 2 other peers. The server_change_version doesn't do that by default -- it passes wait_others=0 to server_start(). Fixes SCYLLADB-1136 Closes scylladb/scylladb#29234	2026-03-26 21:25:53 +02:00
bitpathfinder	627a8294ed	test: test_crashed_node_substitution: add docstring and fix whitespace Add a description of the test's intent and scenario; remove extra blanks.	2026-03-26 18:40:17 +01:00
bitpathfinder	5a086ae9b7	test: fix race condition in test_crashed_node_substitution `test_crashed_node_substitution` intermittently failed: ``` assert len(gossiper_eps) == (len(server_eps) + 1) ``` The test crashed the node right after a single ACK2 handshake ("finished do_send_ack2_msg"), assuming the node state was visible to all peers. However, since gossip is eventually consistent, the update may not have propagated yet, so some nodes did not see the failed node. This change: Wait until the gossiper state is visible on peers before continuing the test and asserting. Fixes: SCYLLADB-1256.	2026-03-26 18:25:05 +01:00
Robert Bindar	c575bbf1e8	test_refresh_deletes_uploaded_sstables should wait for sstables to get deleted SSTable unlinking is async, so in some cases it may happen that the upload dir is not empty immediately after refresh is done. This patch adjusts test_refresh_deletes_uploaded_sstables so it waits with a timeout till the upload dir becomes empty instead of just assuming the API will sync on sstables being gone. Fixes SCYLLADB-1190 Signed-off-by: Robert Bindar <robert.bindar@scylladb.com> Closes scylladb/scylladb#29215	2026-03-26 08:43:14 +03:00
Nikos Dragazis	8789c95a85	test: cluster: Add test for migration of multiple keyspaces Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-25 19:11:29 +02:00
Nikos Dragazis	25af8bdc24	test: cluster: Add test for error conditions Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-25 19:11:29 +02:00
Nikos Dragazis	01a51817c4	test: cluster: Add vnodes->tablets migration test (rollback) Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-25 19:11:29 +02:00
Nikos Dragazis	56ec33d3e0	test: cluster: Add vnodes->tablets migration test (1 table, 3 nodes) Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-25 19:11:29 +02:00
Nikos Dragazis	58e930c490	test: cluster: Add vnodes->tablets migration test (1 table, 1 node) This test runs the vnodes-to-tablets migration for a single table on a single-node cluster. The node has multiple shards and multiple power-of-two aligned vnodes, so resharding is triggered. More details in the docstring. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-25 19:11:29 +02:00
Nikos Dragazis	8837dac2f9	scylla-nodetool: Add migrate-to-tablets subcommand The vnodes-to-tablets migration is a manual procedure, so orchestration must be done via nodetool. This patch adds the following new commands: * nodetool migrate-to-tablets start {ks} * nodetool migrate-to-tablets upgrade * nodetool migrate-to-tablets downgrade * nodetool migrate-to-tablets status {ks} * nodetool migrate-to-tablets finalize {ks} The commands are just wrappers over the REST API. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-25 19:11:29 +02:00
Nikos Dragazis	2a5e6b832a	api: Add REST endpoint for vnode-to-tablet migration status If the keyspace is migrating, it reports the intended and actual storage mode for each node. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-25 19:11:24 +02:00
Marcin Maliszkiewicz	7fdd650009	Merge 'test: audit: clean up test helper class naming' from Dario Mirovic Remove unused `pytest.mark.single_node` marker from `TestCQLAudit`. Rename `TestCQLAudit` to `CQLAuditTester` to reflect that it is a test helper, not a test class. This avoids accidental pytest collection and subsequent warning about `__init__`. Logs before the fixes: ``` test/cluster/test_audit.py:514: 14 warnings /home/dario/dev/scylladb/test/cluster/test_audit.py:514: PytestCollectionWarning: cannot collect test class 'TestCQLAudit' because it has a __init__ constructor (from: cluster/test_audit.py) @pytest.mark.single_node ``` Fixes SCYLLADB-1237 This is an addition to the latest master code. No backport needed. Closes scylladb/scylladb#29237 * github.com:scylladb/scylladb: test: audit: rename TestCQLAudit to CQLAuditTester test: audit: remove unused pytest.mark.single_node	2026-03-25 15:30:16 +01:00
Radosław Cybulski	1dc20cc8f9	alternator/test: explain why 'always' write isolation mode is used in tests Improve test comments for test_streams_batchwrite_into_the_same_partition_deletes_existing_items and test_streams_batchwrite_into_the_same_partition_will_report_wrong_stream_data to explain why 'always' write isolation mode is required: in always_use_lwt mode all items in a batch get the same CDC timestamp, which triggers the squashing bug. In other modes each item gets a separate timestamp so the bug doesn't manifest. Also fix the example in the second test comment to use cleaner key values and correct event type (INSERT, not MODIFY, since items are inserted into an empty table), and fix the issue reference from #28452 (the PR) to #28439 (the issue).	2026-03-25 15:15:20 +01:00
Dario Mirovic	552a2d0995	test: audit: rename TestCQLAudit to CQLAuditTester pytest tries to collect tests for execution in several ways. One is to pick all classes that start with 'Test'. Those classes must not have custom '__init__' constructor. TestCQLAudit does. TestCQLAudit after migration from test/cluster/dtest is not a test class anymore, but rather a helper class. There are two ways to fix this: 1. Add __init__ = False to the TestCQLAudit class 2. Rename it to not start with 'Test' Option 2 feels better because the new name itself does not convey the wrong message about its role. Fixes SCYLLADB-1237	2026-03-25 13:21:08 +01:00
Dario Mirovic	73de865ca3	test: audit: remove unused pytest.mark.single_node Remove unused pytest.mark.single_node in TestCQLAudit class. This is a leftover from audit tests migration from test/cluster/dtest to test/cluster. Refs SCYLLADB-1237	2026-03-25 13:18:37 +01:00
Radosław Cybulski	ded62b2c5e	alternator/test: add scylla_only to always write isolation fixture Add scylla_only fixture dependency to the test_table_ss_new_and_old_images_write_isolation_always fixture. This ensures all tests using the 'always' write isolation mode are skipped when running against DynamoDB (--aws), since the system:write_isolation tag is a Scylla-only feature.	2026-03-25 12:38:09 +01:00
Radosław Cybulski	7d404cdd51	alternator: fix BatchWriteItem squashed Streams entries BatchWriteItem with items for the same partition (and write isolation set to always) will trigger LWT and run a different cdc code path, which will result in wrong Streams data being returned to the user - changes will be randomly squashed together. For example batch write: batch.put_item(Item={'p': 'p', 'c': 'c0'}) batch.put_item(Item={'p': 'p', 'c': 'c1'}) batch.put_item(Item={'p': 'p', 'c': 'c2'}) instead of producing 3 modify / insert events will produce one: type=INSERT, key={'c': {'S': 'c0'}, 'p': {'S': 'p'}}, old_image=None, new_image={'c': {'S': 'c2'}, 'p': {'S': 'p'}} with `new_image` having a different `c` key from the `key` field. This happens because BatchWriteItem (when using LWT) emits its changes to cdc under the same timestamp. This results in all log entries being put in a single cdc "bucket" (under the same cdc$timestamp key). The previous parsing algorithm would interpret those changes as a change to a single item and squash them together. The patch rewrites the algorithm to use `std::unordered_map` for records based on the value of the clustering key, which is added to every cdc log entry. This allows rebuilding all item modifications. Fixes #28439 Fixes: SCYLLADB-540	2026-03-25 11:40:53 +01:00
Radosław Cybulski	85da03c88d	alternator: add BatchWriteItem test (failing) Add additional BatchWriteItem tests (some failing): - `test_streams_batchwrite_no_clustering_deletes_non_existing_items` `test_streams_batchwrite_no_clustering_deletes_existing_items` - those tests pass; we add them here for completeness, as non-clustering tables trigger different paths. - `test_streams_batchwrite_into_the_same_partition_deletes_existing_items` - failing test that checks combinations of puts and deletes in a single batch write (for example 3 items, 2 puts and 1 delete). - `test_streams_batchwrite_into_the_same_partition_will_report_wrong_stream_data` - failing simple test. Tests fail because the current implementation, when writing cdc log entries, squashes all changes done to the same partition together. The data is still there, but when GetRecords is called and we parse cdc log entries, we don't correctly recover it (see issue #28439 for more details).	2026-03-25 11:40:53 +01:00
Marcin Maliszkiewicz	f988ec18cb	test/lib: fix port in-use detection in start_docker_service Previously, the result of when_all was discarded. when_all stores exceptions in the returned futures rather than throwing, so the outer catch(in_use&) could never trigger. Now we capture the when_all result and inspect each future individually to properly detect in_use from either stream. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1216 Closes scylladb/scylladb#29219	2026-03-25 11:45:53 +02:00
Artsiom Mishuta	cd1679934c	test/pylib: use exponential backoff in wait_for() Change wait_for() defaults from period=1s/no backoff to period=0.1s with 1.5x backoff capped at 1.0s. This catches fast conditions in 100ms instead of 1000ms, benefiting ~100 call sites automatically. Add completion logging with elapsed time and iteration count. Tested local with test/cluster/test_fencing.py::test_fence_hints (dev mode), log output: wait_for(at_least_one_hint_failed) completed in 0.83s (4 iterations) wait_for(exactly_one_hint_sent) completed in 1.34s (5 iterations) Fixes SCYLLADB-738 Closes scylladb/scylladb#29173	2026-03-24 23:49:49 +02:00
Botond Dénes	d52fbf7ada	Merge 'test: cluster: Deflake test_startup_with_keyspaces_violating_rf_rack_valid_keyspaces' from Dawid Mędrek The test was flaky. The scenario looked like this: 1. Stop server 1. 2. Set its rf_rack_valid_keyspaces configuration option to true. 3. Create an RF-rack-invalid keyspace. 4. Start server 1 and expect a failure during start-up. It was wrong. We cannot predict when the Raft mutation corresponding to the newly created keyspace will arrive at the node or when it will be processed. If the check of the RF-rack-valid keyspaces we perform at start-up was done before that, it won't include the keyspace. This will lead to a test failure. Unfortunately, it's not feasible to perform a read barrier during start-up. What's more, although it would help the test, it wouldn't be useful otherwise. Because of that, we simply fix the test, at least for now. The new scenario looks like this: 1. Disable the rf_rack_valid_keyspaces configuration option on server 1. 2. Start the server. 3. Create an RF-rack-invalid keyspace. 4. Perform a read barrier on server 1. This will ensure that it has observed all Raft mutations, and we won't run into the same problem. 5. Stop the node. 6. Set its rf_rack_valid_keyspaces configuration option to true. 7. Try to start the node and observe a failure. This will make the test perform consistently. --- I ran the test (in dev mode, on my local machine) three times before these changes, and three times with them. I include the time results below. Before: ``` real 0m47.570s user 0m41.631s sys 0m8.634s real 0m50.495s user 0m42.499s sys 0m8.607s real 0m50.375s user 0m41.832s sys 0m8.789s ``` After: ``` real 0m50.509s user 0m43.535s sys 0m9.715s real 0m50.857s user 0m44.185s sys 0m9.811s real 0m50.873s user 0m44.289s sys 0m9.737s ``` Fixes SCYLLADB-1137 Backport: The test is present on all supported branches, and so we should backport these changes to them. Closes scylladb/scylladb#29218 * github.com:scylladb/scylladb: test: cluster: Deflake test_startup_with_keyspaces_violating_rf_rack_valid_keyspaces test: cluster: Mark test with @pytest.mark.asyncio in test_multidc.py	2026-03-24 21:09:19 +02:00
Patryk Jędrzejczak	141aa2d696	Merge 'test/cluster/test_incremental_repair.py: fix typo + enable compaction DEBUG logs' from Botond Dénes This PR contains two small improvements to `test_incremental_repair.py` motivated by the sporadic failure of `test_tablet_incremental_repair_and_scrubsstables_abort`. The test fails with `assert 3 == 2` on `len(sst_add)` in the second repair round. The extra SSTable has `repaired_at=0`, meaning scrub unexpectedly produced more unrepaired SSTables than anticipated. Since scrub (and compaction in general) logs at DEBUG level and the test did not enable debug logging, the existing logs do not contain enough information to determine the root cause. Commit 1 fixes a long-standing typo in the helper function name (`preapre` -> `prepare`). Commit 2 enables `compaction=debug` for the Scylla nodes started by `do_tablet_incremental_repair_and_ops`, which covers all `test_tablet_incremental_repair_and_` variants. This will capture full compaction/scrub activity on the next reproduction, making the failure diagnosable. Refs: SCYLLADB-1086 Backport: test improvement, no backport Closes scylladb/scylladb#29175 https://github.com/scylladb/scylladb: test/cluster/test_incremental_repair.py: enable compaction DEBUG logs in do_tablet_incremental_repair_and_ops test/cluster/test_incremental_repair.py: fix typo preapre -> prepare	2026-03-24 16:27:01 +01:00
Pavel Emelyanov	2d8540f1ee	transport: fix process_startup cert-auth path missing connection-ready setup When authenticate() returns a user directly (certificate-based auth, introduced in `20e9619bb1`), process_startup was missing the same post-authentication bookkeeping that the no-auth and SASL paths perform: - update_scheduling_group(): without it, the connection runs under the default scheduling group instead of the one mapped to the user's service level. - _authenticating = false / _ready = true: without them, system.clients reports connection_stage = AUTHENTICATING forever instead of READY. - on_connection_ready(): without it, the connection never releases its slot in the uninitialized-connections concurrency semaphore (acquired at connection creation), leaking one unit per cert-authenticated connection for the lifetime of the connection. The omission was introduced when on_connection_ready() was added to the else and SASL branches in `474e84199c` but the cert-auth branch was missed. Fixes: `20e9619bb1` ("auth: support certificate-based authentication") Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-24 18:02:46 +03:00
Pavel Emelyanov	da6fe14035	transport: test that connection_stage is READY after auth via all process_startup paths The cert-auth path in process_startup (introduced in `20e9619bb1`) was missing _ready = true, _authenticating = false, update_scheduling_group() and on_connection_ready(). The result is that connections authenticated via certificate show connection_stage = AUTHENTICATING in system.clients forever, run under the wrong service-level scheduling group, and hold the uninitialized-connections semaphore slot for the lifetime of the connection. Add a parametrized cluster test that verifies all three process_startup branches result in connection_stage = READY: - allow_all: AllowAllAuthenticator (no-auth path) - password: PasswordAuthenticator (SASL/process_auth_response path) - cert_bypass: CertificateAuthenticator with transport_early_auth_bypass error injection (cert-auth path -- the buggy one) The injection is added to certificate_authenticator::authenticate() so tests can bypass actual TLS certificate parsing while still exercising the cert-auth code path in process_startup. The cert_bypass case is marked xfail until the bug is fixed. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-24 18:01:28 +03:00
Benny Halevy	1a7b013377	test: add test_sstable_clone_preserves_staging_state	2026-03-24 16:48:01 +02:00
Benny Halevy	22f2010477	test: derive sstable state from directory in test_env::make_sstable Instead of always passing sstable_state::normal, infer the state from the last component of the directory path by comparing against the known state subdirectory constants (staging_dir, upload_dir, quarantine_dir). Any unrecognized path component (the common case for normal-state sstables) maps to sstable_state::normal. When a non-normal state is detected, strip the state subdirectory from dir so that the base table directory is passed to storage.	2026-03-24 16:48:01 +02:00
Ernest Zaslavsky	c670183be8	cmake: fix precompiled header (PCH) creation Two issues prevented the precompiled header from compiling successfully when using CMake directly (rather than the configure.py + ninja build system): a) Propagate build flags to Rust binding targets reusing the PCH. The wasmtime_bindings and inc targets reuse the PCH from scylla-precompiled-header, which is compiled with Seastar's flags (including sanitizer flags in Debug/Sanitize modes). Without matching compile options, the compiler rejects the PCH due to flag mismatch (e.g., -fsanitize=address). Link these targets against Seastar::seastar to inherit the required compile options. Closes scylladb/scylladb#28941	2026-03-24 15:53:40 +02:00
Dawid Mędrek	e639dcda0b	test: cluster: Deflake test_startup_with_keyspaces_violating_rf_rack_valid_keyspaces The test was flaky. The scenario looked like this: 1. Stop server 1. 2. Set its rf_rack_valid_keyspaces configuration option to true. 3. Create an RF-rack-invalid keyspace. 4. Start server 1 and expect a failure during start-up. It was wrong. We cannot predict when the Raft mutation corresponding to the newly created keyspace will arrive at the node or when it will be processed. If the check of the RF-rack-valid keyspaces we perform at start-up was done before that, it won't include the keyspace. This will lead to a test failure. Unfortunately, it's not feasible to perform a read barrier during start-up. What's more, although it would help the test, it wouldn't be useful otherwise. Because of that, we simply fix the test, at least for now. The new scenario looks like this: 1. Disable the rf_rack_valid_keyspaces configuration option on server 1. 2. Start the server. 3. Create an RF-rack-invalid keyspace. 4. Perform a read barrier on server 1. This will ensure that it has observed all Raft mutations, and we won't run into the same problem. 5. Stop the node. 6. Set its rf_rack_valid_keyspaces configuration option to true. 7. Try to start the node and observe a failure. This will make the test perform consistently. --- I ran the test (in dev mode, on my local machine) three times before these changes, and three times with them. I include the time results below. Before: ``` real 0m47.570s user 0m41.631s sys 0m8.634s real 0m50.495s user 0m42.499s sys 0m8.607s real 0m50.375s user 0m41.832s sys 0m8.789s ``` After: ``` real 0m50.509s user 0m43.535s sys 0m9.715s real 0m50.857s user 0m44.185s sys 0m9.811s real 0m50.873s user 0m44.289s sys 0m9.737s ``` Fixes SCYLLADB-1137	2026-03-24 14:27:36 +01:00
Patryk Jędrzejczak	503a6e2d7e	locator: everywhere_replication_strategy: fix sanity_check_read_replicas when read_new is true ERMs created in `calculate_vnode_effective_replication_map` have RF computed based on the old token metadata during a topology change. The reading replicas, however, are computed based on the new token metadata (`target_token_metadata`) when `read_new` is true. That can create a mismatch for EverywhereStrategy during some topology changes - RF can be equal to the number of reading replicas +-1. During bootstrap, this can cause the `everywhere_replication_strategy::sanity_check_read_replicas` check to fail in debug mode. We fix the check in this commit by allowing one more reading replica when `read_new` is true. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1147 Closes scylladb/scylladb#29150	2026-03-24 13:43:39 +01:00
Jenkins Promoter	0f02c0d6fa	Update pgo profiles - x86_64	2026-03-24 14:11:38 +02:00
Dawid Mędrek	4fead4baae	test: cluster: Mark test with @pytest.mark.asyncio in test_multidc.py One of the tests, test_startup_with_keyspaces_violating_rf_rack_valid_keyspaces, didn't have the marker. Let's add it now.	2026-03-24 12:52:00 +01:00
Botond Dénes	ffd58ca1f0	Merge 'test: cluster: Deflake test_write_cl_any_to_dead_node_generates_hints' from Dawid Mędrek Before these changes, we would send mutations to the node and immediately query the metrics to see how many hints had been written. However, that could lead to random failures of the test: even if the mutations have finished executing, hints are stored asynchronously, so we don't have a guarantee they have already been processed. To prevent such failures, we rewrite the check: we will perform multiple checks against the metrics until we have confirmed that the hints have indeed been written or we hit the timeout. We're generous with the timeout: we give the test 60 seconds. That should be enough time to avoid flakiness even on super slow machines, and if the test does fail, we will know something is really wrong. As a bonus, we improve the test in general too. We explicitly express the preconditions we rely on, as well as bump the log level. If the test fails in the future, it might be very difficult to debug it without this additional information. Fixes SCYLLADB-1133 Backport: The test is present on all supported branches. To avoid running into more failures, we should backport these changes to them. Closes scylladb/scylladb#29191 * github.com:scylladb/scylladb: test: cluster: Increase log level in test_write_cl_any_to_dead_node_generates_hints test: cluster: Await all mutations concurrently in test_write_cl_any_to_dead_node_generates_hints test: cluster: Specify min_tablet_count in test_write_cl_any_to_dead_node_generates_hints test: cluster: Use new_test_table in test_write_cl_any_to_dead_node_generates_hints test: cluster: Introduce auxiliary function keyspace_has_tablets test: cluster: Deflake test_write_cl_any_to_dead_node_generates_hints	2026-03-24 13:39:56 +02:00
Calle Wilund	f1b3bff4a5	dockerized_service: Convert log reader to pipes and push to test log Refs: SCYLLADB-1106 Ensures that any stderr output from mock services is echoed to the test log, regardless of the log file we write, to help debug CI failures.	2026-03-24 12:35:42 +01:00
Calle Wilund	38aaed1ed4	test::cluster::conftest::GSServer: Fix unpublish for when publish was not called Use checked dict access to check the set vars. Fixes: SCYLLADB-1106	2026-03-24 12:33:56 +01:00
Calle Wilund	b382f3593c	scylla_cluster: Use thread safe future signalling	2026-03-24 12:33:56 +01:00
Nikos Dragazis	d09196068c	api: Add REST endpoint for migration finalization The endpoint is the following: POST /storage_service/vnode_tablet_migrations/keyspaces/{keyspace}/finalization When called, it issues a `finalize_migration` topology request and waits for its completion. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-24 13:21:12 +02:00
Nikos Dragazis	c88ddecfca	topology_coordinator: Add `finalize_migration` request Vnodes-to-tablets migration needs a finalization step to finish or roll back the migration. Finishing the migration involves switching the keyspace schema to tablets and clearing the `intended_storage_mode` from system.topology. Rolling back the migration involves deleting the tablet maps and clearing the `intended_storage_mode`. The finalization needs to be done as a topology request so that it is mutually exclusive with other operations such as repair and TRUNCATE. This patch introduces the `finalize_migration` global topology request for this purpose. The request takes a keyspace name as an argument. The direction of the finalization (i.e., forward path vs rollback) is inferred from the `intended_storage_mode` of all nodes (not ideal, should be made explicit). Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-24 13:20:39 +02:00
Nikos Dragazis	0e1e6ebdc5	database: Construct migrating tables with tablet ERMs Extend `database::add_column_family()` with a `storage_mode` argument. If the table is under vnodes-to-tablets migration and the storage mode is "tablets", create a tablet ERM. Make the distributed loader determine the storage mode from topology (`intended_storage_mode` column in system.topology). Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-24 13:20:39 +02:00
Nikos Dragazis	2f93ab281b	api: Add REST endpoint for upgrading nodes to tablets The endpoint is the following: POST /storage_service/vnode_tablet_migrations/node/storage_mode?intended_mode={tablets,vnodes} This endpoint is part of the vnodes-to-tablets migration process and controls a node's intended_storage_mode in system.topology. The storage mode represents the node-local data distribution model, i.e., how data are organized across shards. The node will apply the intended storage mode to migrating tables upon next restart by resharding their SSTables (either on vnode boundaries if intended_mode=tablets, or with the static sharder if intended_mode=vnodes). Note that this endpoint controls the intended_storage_mode of the local node only. This has the nice benefit that once the API call returns, the change has not only been committed to group0 but also applied to the local node's state machine. This guarantees that the change is part of the node's local copy upon next restart; no additional read barrier is needed. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-24 13:20:35 +02:00
Nikos Dragazis	c4c3a95863	api: Add REST endpoint for starting vnodes-to-tablets migration The endpoint is the following: POST /storage_service/vnode_tablet_migrations/keyspaces/{keyspace} Its purpose is to start the migration of a whole keyspace from vnodes to tablets. When called, Scylla will synchronously create a tablet map for each table in the specified keyspace. The tablet maps of all tables are identical and they mirror the vnode layout; they contain one tablet per vnode and each tablet uses the same replica hosts and token boundaries as the corresponding vnode. The only difference from vnodes lies in the sharding approach. Tablets are assigned to a single shard - using a round-robin strategy in this patch - whereas vnodes are distributed evenly across all shards. If the tablet count per shard is low and tablet sizes are uneven, or some shards have more tablets than others, performance may degrade during the migration process. For example, a cluster with i8g.48xlarge (192 vCPUs), 256 vnodes per node and RF=3 will have 256 * 3 / 192 vCPUs = 4 tablet replicas per shard during the migration. One additional tablet or a double-sized tablet would cause 25% overcommit. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-24 13:19:47 +02:00
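The per-shard figure in this message can be reproduced with a few lines of arithmetic. This is a Python model, not Scylla code: `assign_round_robin`, `shard_loads`, `replicas_per_shard`, and `overcommit` are hypothetical helpers, and the model assumes each node holds replicas for about `vnodes_per_node * RF` vnode ranges, every one of which becomes a tablet replica during the migration.

```python
from collections import Counter
from typing import Dict, Iterable


def assign_round_robin(tablet_ids: Iterable[int], shard_count: int) -> Dict[int, int]:
    """Map each tablet to exactly one shard, cycling through shards in order."""
    return {tablet: i % shard_count for i, tablet in enumerate(tablet_ids)}


def shard_loads(assignment: Dict[int, int]) -> Counter:
    """Count how many tablet replicas land on each shard."""
    return Counter(assignment.values())


def replicas_per_shard(vnodes_per_node: int, rf: int, shards: int) -> float:
    # Each node stores about vnodes_per_node * rf vnode ranges; during the
    # migration each of them becomes one tablet replica on that node.
    return vnodes_per_node * rf / shards


def overcommit(per_shard: float, extra_tablets: int = 1) -> float:
    """Relative extra load on a shard holding `extra_tablets` more tablets."""
    return extra_tablets / per_shard


print(replicas_per_shard(256, 3, 192))              # 4.0
print(overcommit(replicas_per_shard(256, 3, 192)))  # 0.25
```

With 4 replicas per shard, one extra tablet puts a shard at 5/4 of the average load, which is the 25% overcommit mentioned in the message; a double-sized tablet has the same effect.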
Andrei Chekun	f6fd3bbea0	test.py: reduce timeout for one test Reduce the timeout for one test to 60 minutes. The longest test we had so far was ~10-15 minutes. So reducing this timeout is pretty safe and should help with hanging tests. Closes scylladb/scylladb#29212	2026-03-24 12:50:10 +02:00
Benny Halevy	ca9ff134b8	sstables: log debug message in filesystem_storage::clone	2026-03-24 12:26:03 +02:00
Nikos Dragazis	b7f4ae8218	topology_state_machine: Add intended_storage_mode to system.topology Part of the vnodes-to-tablets migration is to reshard the SSTables of each node on vnode boundaries. Resharding is a heavy operation that runs on startup while the node is offline. Since nodes can restart for unexpected reasons, we need a flag to do it in a controllable way. We also need the ability to roll back the migration, which requires resharding in the opposite direction. This means a node must be aware of the intended migration direction. To address both requirements, this patch introduces a new column, intended_storage_mode, in system.topology. A non-null value indicates that a node should perform a migration and specifies the migration direction. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-24 11:06:38 +02:00
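The node-local decision driven by `intended_storage_mode` can be summarized as a small function. This is a hypothetical Python sketch: `StorageMode` and `resharding_plan` are illustrative names, and the mapping from mode to resharding style is taken from the REST endpoint commit in this log (vnode boundaries for `tablets`, the static sharder for `vnodes`).

```python
from enum import Enum
from typing import Optional


class StorageMode(Enum):
    VNODES = "vnodes"
    TABLETS = "tablets"


def resharding_plan(intended_mode: Optional[StorageMode]) -> str:
    """Decide what a node does with migrating tables at its next startup."""
    if intended_mode is None:
        # Null column: this node has not been asked to migrate.
        return "none"
    if intended_mode is StorageMode.TABLETS:
        # Forward direction: segregate SSTables on vnode boundaries.
        return "reshard on vnode boundaries"
    # Rollback direction: go back to the static sharder.
    return "reshard with static sharder"
```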
Nikos Dragazis	bc8109f1a4	distributed_loader: Wire vnode-based resharding into table populator Make the table populator migration-aware. If a table is migrating to tablets, switch from normal resharding to vnode-based resharding. Vnode-based resharding requires passing a vector of "owned ranges" upon which resharding will segregate the SSTables. Compute it from the tablet map. We could also compute them from the vnodes, since tablets are identical to vnodes during the migration, but in the future we may switch to a different model (multiple tablets per vnode). Let the distributed loader decide if a table is migrating or not and communicate that to the table populator. A table is migrating if the keyspace replication strategy uses vnodes but the table replication strategy uses tablets. Currently, tables cannot enter this "migrating" state; support for this will be introduced in the next patches. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-24 11:06:38 +02:00
Nikos Dragazis	63399951df	replica: Pick any compaction group for resharding In the previous patch, reshard compaction was extended with a special operation mode where SSTables from vnode-based tables are segregated on vnode boundaries and not with the static sharder. This will later be wired into vnodes-to-tablets migration. The problem is that resharding requires a compaction group. With a vnode-based table, there is only one compaction group per shard, and this is what the current code utilizes (`try_get_compaction_group_view_with_static_sharding()`). But the new operation mode will apply to migrating tables, which use a `tablet_storage_group_manager`, which creates one compaction group for each tablet. Some compaction group needs to be selected. Pick any compaction group that is available on the current shard. Reshard compaction is an operation that happens early in the startup process; compaction groups do not own any SSTables yet, so all compaction groups are equivalent. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-24 11:06:38 +02:00
Benny Halevy	d1c6141407	compaction: resharding_compaction: add vnodes_resharding option In this mode, the output sstables generated by resharding compaction are segregated by token range, based on the keyspace vnode-based owned token ranges vector. A basic unit test was also added to sstable_directory_test. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2026-03-24 11:06:38 +02:00
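Segregating output by owned token ranges amounts to bucketing each token by the range that contains it. A minimal Python sketch, assuming sorted range end tokens and ranges that are exclusive at the start and inclusive at the end (`(start, end]`), with the last range wrapping around the ring; `segregate_by_ranges` is a hypothetical name, and the real compaction code works on SSTable contents rather than bare token lists.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List


def segregate_by_ranges(tokens: Iterable[int], range_ends: List[int]) -> Dict[int, List[int]]:
    """Group tokens by the index of the owned range that contains them.

    Range i is (range_ends[i-1], range_ends[i]]; range 0 also receives tokens
    above range_ends[-1], because the last range wraps around the ring.
    """
    buckets: Dict[int, List[int]] = defaultdict(list)
    for token in tokens:
        # Index of the first range whose inclusive end is >= token.
        i = bisect.bisect_left(range_ends, token)
        if i == len(range_ends):
            i = 0  # wrap-around range (range_ends[-1], range_ends[0]]
        buckets[i].append(token)
    return dict(buckets)
```

In this model each bucket would become its own output SSTable(s), so no output crosses a vnode boundary.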
Nikos Dragazis	d153a95943	storage_service: Preserve ERM flavor of migrating tables When a table is migrating from vnodes to tablets, the cluster is in a mixed state where some nodes use vnode ERMs and others use tablet ERMs. The ERM flavor is a node-local property that expresses the node's storage organization. Preserve the flavor across token metadata changes. The flavor needs to be on par with storage, but the storage can change only on startup, as it requires resharding all SSTables to conform with the flavor. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-24 11:06:38 +02:00
Nikos Dragazis	4a3e26d5e3	tablet_allocator: Exclude migrating tables from load balancing The tablet load balancer operates on all tablet-based tables that appear in the tablet metadata. With the introduction of the vnodes-to-tablets migration procedure later in this series, migrating tables will also appear in the tablet metadata, but they need to be treated as vnode tables until migration is finished. This patch excludes such tables from load balancing. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-24 11:06:38 +02:00
Nikos Dragazis	3e2dc078c9	feature_service: Add vnodes_to_tablets_migrations feature Vnodes-to-tablets migrations require cluster-level support: the REST API and the group0 state need to be supported by all nodes. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-24 11:06:38 +02:00
Marcin Maliszkiewicz	66be0f4577	Merge 'test: cluster: audit test suite optimization' from Dario Mirovic Migrate audit tests from test/cluster/dtest to test/cluster. Optimize their execution time through cluster reuse. The audit test suite is heavy. There are more than 70 test executions. Environment preparation is a significant part of each test case execution time. This PR: 1. Copies audit tests from test/cluster/dtest to test/cluster, refactoring and enabling them 2. Groups test functions by non-live cluster configuration variations to enable cluster reuse between them - Execution time reduced from 4m 29s to 2m 47s, which is ~38% execution time decrease 3. Removes the old audit tests from test/cluster/dtest Includes two supporting changes: - Allow specifying `AuthProvider` in `ManagerClient.get_cql_exclusive` - Fix server log file handling for clean clusters Refs [SCYLLADB-573](https://scylladb.atlassian.net/browse/SCYLLADB-573) This PR is an improvement and does not require a backport. [SCYLLADB-573]: https://scylladb.atlassian.net/browse/SCYLLADB-573?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ Closes scylladb/scylladb#28650 * github.com:scylladb/scylladb: test: cluster: fix log clear race condition in test_audit.py test: pylib: shut down exclusive cql connections in ManagerClient test: cluster: fix multinode audit entry comparison in test_audit.py test: cluster: dtest: remove old audit tests test: cluster: group migrated audit tests for cluster reuse test: cluster: enable migrated audit tests and make them work test: pylib: manager_client: specify AuthProvider in get_cql_exclusive test: pylib: scylla cluster after_test log fix test: audit: copy audit test from dtest	2026-03-24 09:29:52 +01:00
Dario Mirovic	120f381a9d	pgo: fix maintenance socket path too long Maintenance socket path used for PGO is in the node workdir. When the node workdir path is too long, the maintenance socket path (workdir/cql.m) can exceed the Unix domain socket sun_path limit and fail the PGO training pipeline. To prevent this: - pass an explicit --maintenance-socket override pointing to a short deterministic path in /tmp derived from the MD5 hash of the workdir maintenance socket path - update maintenance_socket_path to return the matching short path so that exec_cql.py connects to the right socket The short path socket files are cleaned up after the cluster stops. The path uses the MD5 hash of the workdir path, so it is deterministic. Fixes SCYLLADB-1070 Closes scylladb/scylladb#29149	2026-03-24 09:17:10 +01:00
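Deriving the short path might look like the sketch below. The `/tmp` location and the MD5 hash come from the message; the `scylla-pgo-` file name prefix, the `.sock` suffix, and `short_socket_path` itself are hypothetical. The 108-byte constant is the usual size of `sun_path` on Linux, including the terminating NUL.

```python
import hashlib
import os

# Typical size of sockaddr_un.sun_path on Linux, including the trailing NUL.
SUN_PATH_MAX = 108


def short_socket_path(workdir_socket_path: str, prefix: str = "/tmp") -> str:
    """Map a possibly very long socket path to a short, deterministic one."""
    # MD5 is used only to derive a stable name, not for security.
    digest = hashlib.md5(workdir_socket_path.encode()).hexdigest()
    return os.path.join(prefix, f"scylla-pgo-{digest}.sock")
```

Because the name depends only on the input path, every caller that knows the workdir socket path (the code starting the node and the code connecting to it) arrives at the same short path.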
Pavel Emelyanov	f112e42ddd	raft: Fix split mutations freeze Commit `faa0ee9844` accidentally broke the way split snapshot mutation was frozen -- instead of appending the sub-mutation `m`, the commit kept the old variable name `mut`, which in the new code refers to the "old", non-split mutation. Fixes #29051 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#29052	2026-03-24 08:53:50 +02:00
Botond Dénes	56c375b1f3	Merge 'table: don't close a disengaged querier in query()' from Pavel Emelyanov There's a flaw in table::query() -- calling querier_opt->close() can dereference a disengaged std::optional. The fix is pretty simple. Once fixed, there are two if-s checking whether querier_opt is engaged that are worth merging. The problem doesn't really show itself because table::query() is not called with a null saved_querier, so de facto the if is always correct. However, it's better to be on the safe side. Since the problem doesn't show itself in practice, it's not worth backporting. Closes scylladb/scylladb#29142 * github.com:scylladb/scylladb: table: merge adjacent querier_opt checks in query() table: don't close a disengaged querier in query()	2026-03-24 08:47:35 +02:00
Yaniv Kaul	e59a21752d	.github/workflows/trigger_jenkins.yaml: add workflow permissions Potential fix for https://github.com/scylladb/scylladb/security/code-scanning/147. To fix the problem, add an explicit `permissions:` block to the workflow (either at the top level or inside the `trigger-jenkins` job) that constrains the `GITHUB_TOKEN` to the minimal necessary privileges. This codifies least-privilege in the workflow itself instead of relying on repository or organization defaults. The best minimal, non‑breaking change is to define a root‑level `permissions:` block with read‑only contents access because the job does not perform any write operations to the repository, nor does it interact with issues, pull requests, or other GitHub resources. A conservative, widely accepted baseline is `contents: read`. If later steps require more permissions, they can be added explicitly, but for this snippet, no such need is visible. Concretely, in `.github/workflows/trigger_jenkins.yaml`, insert: ```yaml permissions: contents: read ``` between the `name:` block and the `on:` block (e.g., after line 2). No additional methods, imports, or definitions are needed since this is a pure YAML configuration change and does not alter runtime behavior of the existing shell steps. Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Closes scylladb/scylladb#27815	2026-03-24 08:40:30 +02:00
Yaniv Kaul	85a531819b	.github/workflows/trigger-scylla-ci.yaml: add permissions to workflow Potential fix for https://github.com/scylladb/scylladb/security/code-scanning/169. In general, the fix is to add an explicit `permissions:` block to the workflow (at the root level or per job) so that the `GITHUB_TOKEN` has only the minimal scopes needed. Since this job only reads event data and uses secrets to talk to Jenkins, we can restrict `GITHUB_TOKEN` to read‑only repository contents. The single best fix here is to add a top‑level `permissions:` block right under the `name:` (and before `on:`) in `.github/workflows/trigger-scylla-ci.yaml`, setting `contents: read`. This applies to all jobs in the workflow, including `trigger-jenkins`, and does not alter any existing steps or logic. No additional imports or methods are needed, as this is purely a YAML configuration change for GitHub Actions. Concretely, edit `.github/workflows/trigger-scylla-ci.yaml` to insert: ```yaml permissions: contents: read ``` after line 1. No other lines in the file need to change. Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Closes scylladb/scylladb#27812	2026-03-24 08:37:49 +02:00
Dawid Mędrek	148217bed6	test: cluster: Increase log level in test_write_cl_any_to_dead_node_generates_hints We increase the log level of `hints_manager` to TRACE in the test. If it fails, it may be incredibly difficult to debug it without any additional information.	2026-03-23 19:19:17 +01:00
Dawid Mędrek	2b472fe7fd	test: cluster: Await all mutations concurrently in test_write_cl_any_to_dead_node_generates_hints	2026-03-23 19:19:17 +01:00
Dawid Mędrek	ae12c712ce	test: cluster: Specify min_tablet_count in test_write_cl_any_to_dead_node_generates_hints The test relies on the assumption that mutations will be distributed more or less uniformly over the nodes. Although it should not happen in practice, it is theoretically possible that only one tablet is allocated for the table. To clearly indicate this precondition, we explicitly set the property `min_tablet_count` when creating the table. This way, we have a guarantee that the table has multiple tablets. The load balancer should now take care of distributing them over the nodes equally. Thanks to that, `servers[1]` will have some tablets, and so it'll be the target for some of the mutations we perform.	2026-03-23 19:19:14 +01:00
Dawid Mędrek	dd446aa442	test: cluster: Use new_test_table in test_write_cl_any_to_dead_node_generates_hints The context manager is the de facto standard in the test suite. It will also give us a prettier way to conditionally enable per-table tablet options in the following commit.	2026-03-23 19:07:01 +01:00
Dawid Mędrek	dea79b09a9	test: cluster: Introduce auxiliary function keyspace_has_tablets The function is adapted from its counterpart in the cqlpy test suite: cqlpy/util.py::keyspace_has_tablets. We will use it in a commit in this series to conditionally set tablet properties when creating a table. It might also be useful in general.	2026-03-23 19:07:01 +01:00
Dawid Mędrek	3d04fd1d13	test: cluster: Deflake test_write_cl_any_to_dead_node_generates_hints Before these changes, we would send mutations to the node and immediately query the metrics to see how many hints had been written. However, that could lead to random failures of the test: even if the mutations have finished executing, hints are stored asynchronously, so we don't have a guarantee they have already been processed. To prevent such failures, we rewrite the check: we will perform multiple checks against the metrics until we have confirmed that the hints have indeed been written or we hit the timeout. We're generous with the timeout: we give the test 60 seconds. That should be enough time to avoid flakiness even on super slow machines, and if the test does fail, we will know something is really wrong. Fixes SCYLLADB-1133	2026-03-23 19:06:57 +01:00
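The poll-until-timeout check described here can be sketched as a generic async helper. This is a hypothetical sketch, not the helper the test suite actually uses: `wait_until`, its `period` parameter, and the predicate (which in the test would read the hints metrics) are assumptions; only the 60-second budget comes from the message.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def wait_until(predicate: Callable[[], Awaitable[bool]],
                     timeout: float = 60.0,
                     period: float = 0.5) -> None:
    """Re-check `predicate` every `period` seconds until it holds or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        if await predicate():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        await asyncio.sleep(period)
```

Polling with a generous deadline keeps fast runs fast (the loop exits as soon as the hints show up) while tolerating slow machines.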
Piotr Dulikowski	63067f594d	strong_consistency: fake taking and dropping snapshots Snapshots are not implemented yet for strong consistency - attempting to take, transfer or drop a snapshot results in an exception. However, the logic of our state machine forces snapshot transfer even if there are no lagging replicas - every raft::server::configuration::snapshot_threshold log entries. We have actually encountered an issue in our benchmarks where snapshots were being taken even though the cluster was not under any disruption, and this is one of the possible causes. It turns out that we can safely allow for taking snapshots right now - we can just implement it as a no-op and return a random UUID. Conversely, dropping a snapshot can also be a no-op. This is safe because snapshot transfer still throws an exception - as long as the taken/recovered snapshots are never attempted to be transferred.	2026-03-23 17:03:36 +01:00
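A no-op snapshot implementation along the lines described might look like this. The class and method names are hypothetical Python stand-ins for the C++ state-machine hooks; the point is that taking a snapshot returns a fresh random UUID, dropping one does nothing, and transferring one still fails.

```python
import uuid


class FakeSnapshotStateMachine:
    """Snapshots are faked: take/drop are no-ops, transfer is still unsupported."""

    def take_snapshot(self) -> uuid.UUID:
        # Nothing is persisted; a random ID stands in for the snapshot.
        return uuid.uuid4()

    def drop_snapshot(self, snapshot_id: uuid.UUID) -> None:
        # Nothing was stored, so there is nothing to drop.
        pass

    def transfer_snapshot(self, snapshot_id: uuid.UUID) -> None:
        # Transfer remains unimplemented, so a fake snapshot never leaves the node.
        raise NotImplementedError("snapshot transfer is not implemented")
```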
Piotr Dulikowski	dd1d3dd1ee	strong_consistency: adjust limits for snapshots Raft snapshots are not implemented yet for strong consistency. Adjust the current raft group config to make them much less likely to occur: - snapshot_threshold config option decides how many log entries need to be applied after the last snapshot before a new one is taken. Set it to the maximum value for size_t in order to effectively disable it. - snapshot_threshold_log_size defines a threshold for the log memory usage over which a snapshot is created. Increase it from the default 2MB to 10MB. - max_log_size defines the threshold for the log memory usage above which new requests stop being admitted until a snapshot shrinks the log back. Set it to 20MB, as this option is recommended to be at least twice as much as snapshot_threshold_log_size. Refs: SCYLLADB-1115	2026-03-23 17:03:36 +01:00
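The three limits and the recommended relation between them can be written down as a small config object. This is a hypothetical Python model: the option names and values come from the message, while `SnapshotLimits`, `validate`, the 64-bit `size_t`, and reading "MB" as 2**20 bytes are assumptions.

```python
from dataclasses import dataclass

SIZE_MAX = 2**64 - 1  # largest value of a 64-bit size_t
MB = 1024 * 1024      # assumption: "MB" in the message means MiB


@dataclass(frozen=True)
class SnapshotLimits:
    # Applied log entries after the last snapshot before a new one is taken.
    snapshot_threshold: int
    # Log memory usage (bytes) above which a snapshot is taken.
    snapshot_threshold_log_size: int
    # Log memory usage (bytes) above which new requests are not admitted.
    max_log_size: int

    def validate(self) -> None:
        # Recommended relation: max_log_size >= 2 * snapshot_threshold_log_size.
        if self.max_log_size < 2 * self.snapshot_threshold_log_size:
            raise ValueError("max_log_size should be at least 2x snapshot_threshold_log_size")


STRONG_CONSISTENCY_LIMITS = SnapshotLimits(
    snapshot_threshold=SIZE_MAX,          # effectively never triggers by entry count
    snapshot_threshold_log_size=10 * MB,  # raised from the 2MB default
    max_log_size=20 * MB,                 # twice the snapshot log-size threshold
)
STRONG_CONSISTENCY_LIMITS.validate()
```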
Botond Dénes	772b32d9f7	test/scylla_gdb: fix flakiness by preparing objects at test time Fixtures previously ran GDB once (module scope) to find live objects (sstables, tasks, schemas) and stored their addresses. Tests then reused those addresses in separate GDB invocations. Sometimes these addresses would become stale and the test would step on use-after-free (e.g. sstables compacted away between invocations). Fix by dropping the fixtures. The helper functions used by the fixtures to obtain the required objects are converted to gdb convenience functions, which can be used in the same expression as the test command invocation. Thus, the object is acquired on demand at the moment it is used, so it is guaranteed to be fresh and relevant. Fixes: SCYLLADB-1020 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#28999	2026-03-23 16:54:03 +02:00
Piotr Dulikowski	60fb5270a9	logstor: fix fmt::format use with std::filesystem::path The version of fmt installed on my machine refuses to work with `std::filesystem::path` directly. Add `.string()` calls in places that attempt to print paths directly in order to make them work. Closes scylladb/scylladb#29148	2026-03-23 15:15:52 +01:00
Pavel Emelyanov	3b9398dfc8	Merge 'encryption: fix deadlock in encrypted_data_source::get()' from Ernest Zaslavsky When encrypted_data_source::get() caches a trailing block in _next, the next call takes it directly — bypassing input_stream::read(), which checks _eof. It then calls input_stream::read_exactly() on the already-drained stream. Unlike read(), read_up_to(), and consume(), read_exactly() does not check _eof when the buffer is empty, so it calls _fd.get() on a source that already returned EOS. In production this manifested as stuck encrypted SSTable component downloads during tablet restore: the underlying chunked_download_source hung forever on the post-EOS get(), causing 4 tablets to never complete. The stuck files were always block-aligned sizes (8k, 12k) where _next gets populated and the source is fully consumed in the same call. Fix by checking _input.eof() before calling read_exactly(). When the stream already reached EOF, buf2 is known to be empty, so the call is skipped entirely. A comprehensive test is added that uses a strict_memory_source which fails on post-EOS get(), reproducing the exact code path that caused the production deadlock. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1128 Backport to 2025.3/4 and 2026.1 is needed since it fixes a bug that may bite us in production, to be on the safe side Closes scylladb/scylladb#29110 * github.com:scylladb/scylladb: encryption: fix deadlock in encrypted_data_source::get() test_lib: mark `limiting_data_source_impl` as not `final` Fix formatting after previous patch Fix indentation after previous patch test_lib: make limiting_data_source_impl available to tests	2026-03-23 17:12:44 +03:00
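The failure mode and the fix can be reproduced with a toy Python model of a buffered stream. This is an analogy, not Seastar code: `StrictSource`, `InputStream`, and `read_next_block` are hypothetical, and the model captures only the two facts stated in the message: `read_exactly()` polls the source without checking EOF, and a strict source must never be polled after it has signalled end of stream.

```python
from typing import Iterable, List


class StrictSource:
    """Yields chunks, then b"" once (end of stream); any later get() is a bug."""

    def __init__(self, chunks: Iterable[bytes]):
        self._chunks: List[bytes] = list(chunks)
        self._eos_returned = False

    def get(self) -> bytes:
        if self._eos_returned:
            raise RuntimeError("get() called after end of stream")
        if self._chunks:
            return self._chunks.pop(0)
        self._eos_returned = True
        return b""


class InputStream:
    """Buffered reader over a source, modelled loosely on an input stream."""

    def __init__(self, source: StrictSource):
        self._source = source
        self._buf = b""
        self._eof = False

    def eof(self) -> bool:
        return self._eof

    def read_exactly(self, n: int) -> bytes:
        # Deliberately does NOT check self._eof first: with an empty buffer it
        # polls the source again, mirroring the behaviour the message describes.
        while len(self._buf) < n:
            chunk = self._source.get()
            if not chunk:
                self._eof = True
                break
            self._buf += chunk
        out, self._buf = self._buf[:n], self._buf[n:]
        return out


def read_next_block(stream: InputStream, n: int) -> bytes:
    """The fix: once EOF is known, skip read_exactly(); its result would be empty."""
    if stream.eof():
        return b""
    return stream.read_exactly(n)
```

With block-aligned input the stream can reach EOF exactly at a block boundary; after that, calling `read_exactly()` again hits the strict source, while `read_next_block()` returns an empty result without touching it.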
Pavel Emelyanov	57ef712243	test/backup: drop create_dataset helper It has no more callers after the previous patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-23 17:01:20 +03:00
Pavel Emelyanov	2353091cbd	test/backup: use new_test_keyspace in test_restore_primary_replica Replace create_dataset + manual DROP/CREATE KEYSPACE with two sequential new_test_keyspace context manager blocks, matching the pattern used by do_test_streaming_scopes. The first block covers backup, the second covers restore. Keyspace lifecycle is now automatic. The streaming directions validation loop is moved outside of the second context block, since it only parses logs and has no dependency on the keyspace being alive. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-23 16:59:47 +03:00
Botond Dénes	f5438e0587	test/cluster/test_incremental_repair.py: enable compaction DEBUG logs in do_tablet_incremental_repair_and_ops The test sporadically fails because scrub produces an unexpected number of SSTables. Compaction logs are needed to diagnose why, but were not captured since scrub runs at DEBUG level. Enable compaction=debug for the servers started by do_tablet_incremental_repair_and_ops so the next reproduction provides enough information to root-cause the issue. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-23 15:48:26 +02:00
Botond Dénes	f6ab576ed9	test/cluster/test_incremental_repair.py: fix typo preapre -> prepare Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-23 15:48:12 +02:00
Pavel Emelyanov	cb329b10bf	code: Add maintenance/maintenance group And move some activities from streaming group into it, namely - tablet_allocator background group - sstables_manager-s components reclaimer - tablet storage group manager merge completion fiber - prometheus All other activity that was in streaming group remains there, but can be moved to this group (or to new maintenance subgroup) later. All but prometheus are patched here, prometheus still uses the maintenance_sched_group variable in main.cc, so it transparently moves into new group Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-23 16:00:03 +03:00
Pavel Emelyanov	de9bfe0f1d	backup: Add maintenance/backup group The snapshot_ctl::backup_task_impl runs in configured scheduling group. Now it's streaming one. This patch introduces the maintenance/backup group and re-configures backup task with it. The group gets its --backup_io_throughput_mb_per_sec option that controls bandwidth limit for this sub-group only. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-23 16:00:02 +03:00
Pavel Emelyanov	6f43e8562e	compaction: Add maintenance/maintenance_compaction group Compaction manager distinguishes compaction_sched_group from maintenance_compaction_sched_group. The latter, however, is set to the "streaming" group. This patch adds a real maintenance_compaction group under the maintenance supergroup and makes the compaction manager use it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-23 16:00:02 +03:00
Pavel Emelyanov	13355d1845	main: Introduce maintenance supergroup And just move streaming group inside it. Next patches will populate this supergroup further. The new supergroup gets its --maintenance-io-throughput-mb-per-sec option that controls supergroup-wide IO bandwidth applied to it. If not configured, the supergroup gets the throughput from streaming to be backward compatible. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-23 16:00:02 +03:00
Pavel Emelyanov	7cb9fa0778	main: Move all maintenance sched group into streaming one The main.cc code uses two variables to reference streaming scheduling. This patch stops using the maintenance_sched_group one, because it's in fact the streaming group, and the real "maintenance" group will appear later in this set. One place is deliberately not patched -- prometheus code starts before dbcfg.streaming_scheduling_group appears, so it still uses the maintenance_sched_group variable. This fact will be used in one of the next patches. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-23 16:00:02 +03:00
Pavel Emelyanov	45ecf15fff	database: Use local variable for current_scheduling_group The classify_request() helper captures current scheduling group into local variable and compares it with groups from db_config to decide which "class" it belongs to. One if uses current_scheduling_group(), while it could use the local variable. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-23 16:00:02 +03:00
Pavel Emelyanov	15c41bfb6c	code: Live-update IO throughputs from main Currently we have two live-updateable IO-throughput options -- one for streaming and one for compaction. Both are observed and the changed value is applied to the corresponding scheduling_group by the relevant service -- respectively, stream_manager and compaction_manager. Both observe/react/apply places use pretty heavy boilerplate code for such a simple task. Next patches will make things worse by adding two more options to control IO throughput of some other groups. That said, the proposal is to hold the updating code in main.cc with the help of a wrapper class. In there all the needed bits are at hand, and classes can get their IO updates applied easily. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-23 16:00:02 +03:00
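Keeping one updater object in main.cc that observes an option and applies it to a scheduling group is essentially the observer pattern. Everything below is a hypothetical Python stand-in: `UpdateableValue`, `SchedulingGroup`, and `IoThroughputUpdater` are not Scylla or Seastar APIs, just the smallest model of the wiring the message describes.

```python
from typing import Callable, List, Optional


class UpdateableValue:
    """Toy live-updateable option: holds a value and notifies observers."""

    def __init__(self, value: int):
        self._value = value
        self._observers: List[Callable[[int], None]] = []

    def get(self) -> int:
        return self._value

    def observe(self, callback: Callable[[int], None]) -> None:
        self._observers.append(callback)

    def set(self, value: int) -> None:
        self._value = value
        for callback in self._observers:
            callback(value)


class SchedulingGroup:
    """Toy group that only records its current IO bandwidth limit."""

    def __init__(self, name: str):
        self.name = name
        self.io_mb_per_sec: Optional[int] = None

    def update_io_bandwidth(self, mb_per_sec: int) -> None:
        self.io_mb_per_sec = mb_per_sec


class IoThroughputUpdater:
    """Binds one option to one group: applies it now and on every change."""

    def __init__(self, option: UpdateableValue, group: SchedulingGroup):
        self._option = option
        self._group = group
        group.update_io_bandwidth(option.get())
        option.observe(group.update_io_bandwidth)
```

With the wiring in one place, each additional throughput option costs one more `IoThroughputUpdater(...)` line rather than another copy of the observe/apply boilerplate inside a service.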
Piotr Dulikowski	df68d0c0f7	directories: add missing seastar/util/closeable.hh include Without this include the file would not compile on its own. The issue was most likely masked by the use of precompiled headers in our CI. Closes scylladb/scylladb#29170	2026-03-23 15:46:56 +03:00
Yaniv Michael Kaul	051107f5bc	scylla-gdb: fix sstable-summary crash on ms-format sstables The 'scylla sstable-summary' GDB command crashes with 'ValueError: Argument "count" should be greater than zero' when inspecting ms-format (trie-based) sstables. This happens because ms-format sstables don't populate the traditional summary structure, leaving all fields zeroed out, which causes gdb.read_memory() to be called with a zero count. Fix by: - Adding zero-length guards to sstring.to_hex() and sstring.as_bytes() to return early when the data length is zero, consistent with the existing guard in managed_bytes.get(). - Adding the same guard to scylla_sstable_summary.to_hex(). - Detecting ms-format sstables (version == 5) early in scylla_sstable_summary.invoke() and printing an informative message instead of attempting to read the unpopulated summary. Fixes: SCYLLADB-1180 Closes scylladb/scylladb#29162	2026-03-23 12:44:47 +02:00
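The zero-length guard follows a simple pattern. This is a hypothetical, GDB-free sketch: `read_memory` is passed in so the function runs outside GDB; inside GDB it would be backed by the inferior's memory read (for example `gdb.selected_inferior().read_memory`), which, per this message, rejects a zero count.

```python
from typing import Callable


def to_hex(read_memory: Callable[[int, int], bytes], address: int, length: int) -> str:
    """Hex-encode `length` bytes at `address`, returning "" for empty buffers."""
    if length == 0:
        # Unpopulated (zeroed) structures report length 0; asking the memory
        # reader for 0 bytes would raise, so return early instead.
        return ""
    return bytes(read_memory(address, length)).hex()
```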
Calle Wilund	b36dc80835	scylla_cluster: Remove left-over debug printout	2026-03-23 11:07:59 +01:00
Piotr Szymaniak	c8e7e20c5c	test/cluster: retry create_table on transient schema agreement timeout In test_index_requires_rf_rack_valid_keyspace, the create_table call for a plain tablet-based table can fail with 'Unable to reach schema agreement' after the server's 10s timeout is exceeded. This happens when schema gossip propagation across the 4-node cluster takes longer than expected after a sequence of rapid schema changes earlier in the test. Add a retry (up to 2 attempts) on schema agreement errors for this specific create_table call rather than increasing the server-side timeout. Fixes: SCYLLADB-1135 Closes scylladb/scylladb#29132	2026-03-23 10:45:30 +02:00
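The targeted retry can be sketched as below. `create_table_with_retry`, the zero-argument `create_table` coroutine function and matching on the error text are assumptions for illustration; the real test's helpers and exception types may differ.

```python
import asyncio


async def create_table_with_retry(create_table, attempts: int = 2):
    """Await create_table(), retrying only on schema-agreement timeouts.

    create_table is a hypothetical zero-argument coroutine function.
    """
    for attempt in range(1, attempts + 1):
        try:
            return await create_table()
        except Exception as e:
            # Only the transient schema-agreement failure is retried;
            # any other error, or a failure on the last attempt, propagates.
            if "Unable to reach schema agreement" not in str(e) or attempt == attempts:
                raise
```

Keeping the retry local to one call, rather than raising the server-side timeout, leaves every other schema change in the test subject to the normal timeout.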
Yaniv Kaul	fb1f995d6b	.github/workflows/backport-pr-fixes-validation.yaml: workflow does not contain permissions (Potential fix for code scanning alert no. 139) Potential fix for https://github.com/scylladb/scylladb/security/code-scanning/139. To fix the problem, explicitly restrict the `GITHUB_TOKEN` permissions for this workflow/job so it has only what is needed. The script reads PR data and repository info (which is covered by `contents: read`/default read scopes) and posts a comment via `github.rest.issues.createComment`, which requires `issues: write`. No other write scopes (e.g., `contents: write`, `pull-requests: write`) are necessary. The best fix without changing functionality is to add a `permissions` block scoped to this job (or at the workflow root). Since we only see a single job here, we’ll add it under `check-fixes-prefix`. Concretely, in `.github/workflows/backport-pr-fixes-validation.yaml`, between the `runs-on: ubuntu-latest` line (line 10) and `steps:` (line 11), add:
```yaml
permissions:
  contents: read
  issues: write
```
This keeps the token minimally privileged while still allowing the script to create issue/PR comments. Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Closes scylladb/scylladb#27810	2026-03-23 10:30:01 +02:00
Piotr Smaron	32225797cd	dtest: fix flaky test_writes_schema_recreated_while_node_down `read_barrier(session2)` was supposed to ensure `node2` has caught up on schema before a CL=ALL write. But `patient_cql_connection(node2)` creates a cluster-aware driver session `(TokenAwarePolicy(DCAwareRoundRobinPolicy()))` that can route the barrier CQL statement to any node — not necessarily `node2`. If the barrier runs on `node1` or `node3` (which already have the new schema), it's a no-op, and `node2` remains stale, which explains the observed `WriteFailure`. The fix is to switch to `patient_exclusive_cql_connection(node2)`, which uses `WhiteListRoundRobinPolicy([node2_ip])` to pin all CQL traffic to `node2`. This is already the established pattern used by other tests in the same file. Fixes: SCYLLADB-1139 No need to backport yet, appeared only on master. Closes scylladb/scylladb#29151	2026-03-23 10:25:54 +02:00
Michał Chojnowski	f29525f3a6	test/boost/cache_algorithm_test: disable sstable compression to avoid giant index pages The test intentionally creates huge index pages. But since `5e7fb08bf3`, the index reader allocates a block of memory for a whole index page, instead of incrementally allocating small pieces during index parsing. This giant allocation causes the test to fail spuriously in CI sometimes. Fix this by disabling sstable compression on the test table, which puts a hard cap of 2000 keys per index page. Fixes: SCYLLADB-1152 Closes scylladb/scylladb#29152	2026-03-23 09:57:11 +02:00
Raphael S. Carvalho	05b11a3b82	sstables_loader: use new sstable add path Use add_new_sstable_and_update_cache() when attaching SSTables downloaded by the node-scoped local loader. This is the correct variant for new SSTables: it can unlink the SSTable on failure to add it, and it can split the SSTable if a tablet split is in progress. The older add_sstable_and_update_cache() helper is intended for preexisting SSTables that are already stable on disk. Additionally, downloaded SSTables are now left unsealed (TemporaryTOC) until they are successfully added to the table's SSTable set. The download path (download_fully_contained_sstables) passes leave_unsealed=true to create_stream_sink, and attach_sstable opens the SSTable with unsealed_sstable=true and seals it only inside the on_add callback — matching the pattern used by stream_blob.cc and storage_service.cc for tablet streaming. This prevents a data-resurrection hazard: previously, if the process crashed between download and attach_sstable, or if attach_sstable failed mid-loop, sealed (TOC) SSTables would remain in the table directory and be reloaded by distributed_loader on restart. With TemporaryTOC, sstable_directory automatically cleans them up on restart instead. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1085. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#29072	2026-03-23 10:33:04 +03:00
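The crash-safety idea can be sketched with plain files: an SSTable stays marked by a temporary TOC until the table has accepted it, and startup cleanup deletes anything still marked temporary. The `TemporaryTOC.txt`/`TOC.txt` names loosely mirror ScyllaDB's SSTable components, but the directory layout and the `download`/`attach`/`cleanup_on_restart` helpers are hypothetical, not the loader's actual code.

```python
import os
import shutil


def download(table_dir: str, name: str) -> str:
    """Write a downloaded SSTable, leaving it unsealed (only a temporary TOC)."""
    path = os.path.join(table_dir, name)
    os.makedirs(path)
    with open(os.path.join(path, "Data.db"), "wb") as f:
        f.write(b"payload")
    # Unsealed marker: a crash from here on leaves a removable leftover.
    open(os.path.join(path, "TemporaryTOC.txt"), "w").close()
    return path


def attach(path: str, add_to_table) -> None:
    """Seal the SSTable only once the table has accepted it."""
    add_to_table(path)  # may raise; the SSTable then stays unsealed
    os.rename(os.path.join(path, "TemporaryTOC.txt"),
              os.path.join(path, "TOC.txt"))


def cleanup_on_restart(table_dir: str) -> list:
    """Delete every SSTable still marked temporary; return the names kept."""
    kept = []
    for name in sorted(os.listdir(table_dir)):
        path = os.path.join(table_dir, name)
        if os.path.exists(os.path.join(path, "TemporaryTOC.txt")):
            shutil.rmtree(path)
        else:
            kept.append(name)
    return kept
```

A failed add therefore never leaves a sealed SSTable behind that a later restart would reload, which is the data-resurrection hazard the commit describes.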
Piotr Szymaniak	f511264831	alternator/test: fix test_ttl_with_load_and_decommission flaky Connection refused error The native Scylla nodetool reports ECONNREFUSED as 'Connection refused', not as 'ConnectException' (which is the Java nodetool format). Add 'Connection refused' to the valid_errors list so that transient connection failures during concurrent decommission/bootstrap topology changes are properly tolerated. Fixes SCYLLADB-1167 Closes scylladb/scylladb#29156	2026-03-22 11:01:45 +02:00
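A sketch of the tolerance check: an error counts as expected if its message contains any known substring. The list holds only the two strings named in the commit; the real test's `valid_errors` list likely contains more entries, and `is_tolerated` is a hypothetical helper.

```python
# Hypothetical excerpt: only the two strings named in the commit message.
valid_errors = [
    "ConnectException",    # Java nodetool wording
    "Connection refused",  # native Scylla nodetool wording (ECONNREFUSED)
]


def is_tolerated(error_message: str) -> bool:
    """Return True if the error matches one of the known transient failures."""
    return any(pattern in error_message for pattern in valid_errors)
```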
Pavel Emelyanov	c114d1b82c	api: Inline describe_ring JSON handling There are two helpers for the describe_ring endpoint. Both can be squashed together for code brevity. Also, while at it, make the endpoint properly validate the "keyspace" parameter, which it previously did not. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-20 19:51:32 +03:00
Pavel Emelyanov	9a2e583f29	storage_service: Make describe_ring_for_table() take table_id All callers already have it. It makes no difference to the method itself which table identifier it works with, but it will help simplify the flow in the API handler (next patch). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-20 19:49:24 +03:00
Pavel Emelyanov	4bc8ec174c	repair: Remove db/config.hh from repair/*.cc files Now all the code uses repair_service::config and no longer needs global config description. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-20 19:36:50 +03:00
Pavel Emelyanov	35f625e5c7	repair: Move repair_multishard_reader options onto repair_service::config This actually uses two interconnected options: repair_multishard_reader_buffer_hint_size and repair_multishard_reader_enable_read_ahead. Both are propagated through repair_service::config and pass their values to repair_reader/make_reader at construction time. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-20 19:36:50 +03:00
Pavel Emelyanov	9bc0d27aae	repair: Move critical_disk_utilization_level onto repair_service::config Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-20 19:23:47 +03:00
Pavel Emelyanov	80aa0fcdc2	repair: Move repair_partition_count_estimation_ratio onto repair_service::config Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-20 19:23:47 +03:00
Pavel Emelyanov	585cb0c718	repair: Move repair_hints_batchlog_flush_cache_time_in_ms onto repair_service::config Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-20 19:23:47 +03:00
Pavel Emelyanov	d8f7f86e10	repair: Move enable_small_table_optimization_for_rbno onto repair_service::config Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-20 19:23:47 +03:00
Pavel Emelyanov	38a23ff927	repair: Introduce repair_service::config Most other services have their own configs, but repair still uses the global db::config. Add an empty config struct to repair_service to carry the db::config options the repair service needs. Subsequent patches will populate the struct with options. The config is created in main.cc as a sharded_parameter because all future options are live-updateable and should capture their source from db::config on the correct shard. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-20 19:23:47 +03:00
Pavel Emelyanov	7dce43363e	table: merge adjacent querier_opt checks in query() After the previous fix both guarding if-s start with 'if (querier_opt &&'. Merge them into a single outer 'if (querier_opt)' block to avoid the redundant check and make the structure easier to follow. No functional change. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-20 14:48:08 +03:00
Piotr Dulikowski	cc695bc3f7	Merge 'vector_search: fix race condition on connection timeout' from Karol Nowacki When a `with_connect` operation timed out, the underlying connection attempt continued to run in the reactor. This could lead to a crash if the connection was established/rejected after the client object had already been destroyed. This issue was observed during the teardown phase of an upcoming high-availability test case. This commit fixes the race condition by ensuring the connection attempt is properly canceled on timeout. Additionally, the explicit TLS handshake previously forced during the connection is now deferred to the first I/O operation, which is the default and preferred behavior. Fixes: SCYLLADB-832 Backports to 2026.1 and 2025.4 are required, as this issue also exists on those branches and is causing CI flakiness. Closes scylladb/scylladb#29031 * github.com:scylladb/scylladb: vector_search: test: fix flaky test vector_search: fix race condition on connection timeout	2026-03-20 11:12:04 +01:00
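The same principle, shown with Python's `asyncio` as an analogy rather than the Seastar fix itself: `asyncio.wait_for()` cancels the awaited operation when the timeout fires, so the connection attempt cannot outlive its caller. `slow_connect` and the `state` dict are hypothetical stand-ins for the connection attempt and the client object.

```python
import asyncio


async def slow_connect(state: dict) -> str:
    """Hypothetical connection attempt that takes longer than the timeout."""
    try:
        await asyncio.sleep(10)
        state["connected"] = True  # would touch a possibly-destroyed client
        return "connection"
    except asyncio.CancelledError:
        state["cancelled"] = True  # the attempt was stopped, not left running
        raise


async def connect_with_timeout(state: dict, timeout: float):
    try:
        # On timeout, wait_for() cancels slow_connect() and waits for the
        # cancellation to finish before raising TimeoutError.
        return await asyncio.wait_for(slow_connect(state), timeout)
    except asyncio.TimeoutError:
        return None
```

The buggy variant would be the equivalent of starting the attempt in the background and merely abandoning it on timeout, leaving it free to complete later against freed state.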
Petr Gusev	4bfcd035ae	test_fencing: add missing await-s Fixes SCYLLADB-1099 Closes scylladb/scylladb#29133	2026-03-20 10:55:35 +01:00
Pavel Emelyanov	9c1c41df03	table: don't close a disengaged querier in query() The condition guarding querier_opt->close() was: When saved_querier is null the short-circuit makes the whole condition true regardless of whether querier_opt is engaged. If partition_ranges is empty, query_state::done() is true before the while-loop body ever runs, so querier_opt is never created. Calling querier_opt->close() then dereferences a disengaged std::optional — undefined behaviour. Fix by checking querier_opt first: This preserves all existing semantics (close when not saving, or when saving wouldn't be useful) while making the no-querier path safe. Why this doesn't surface today: the sole production call site, database::query(), in practice. The API header documents nullptr as valid ("Pass nullptr when queriers are not saved"), so the bug is real but latent. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-20 12:25:13 +03:00
Pavel Emelyanov	c4a0f6f2e6	object_store: Don't leave dangling objects by iterating moved-from names vector The code in upload_file std::move()-s the vector of names into the merge_objects() method, then iterates over this vector to delete the objects. The iteration is effectively a no-op on the moved-from vector. The fix is to make the merge_objects() helper take the vector of names by const reference -- the method doesn't modify the names collection, and the caller keeps it in stable storage. Fixes #29060 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#29061	2026-03-20 10:09:30 +02:00
Pavel Emelyanov	712ba5a31f	utils: Use yielding directory_lister in owner verification Switch directories::do_verify_owner_and_mode() from lister::scan_dir() to utils::directory_lister while preserving the previous hidden-entry behavior. Make do_verify_subpath use lister::filter_type directly so the verification helper can pass it straight into directory_lister, and keep a single yielding iteration loop for directory traversal. This removes one more scan_dir user on the way towards removing scan_dir from the code. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#29064	2026-03-20 10:08:38 +02:00
Pavel Emelyanov	961fc9e041	s3: Don't rearm credential timers when credentials are not refreshed The update_credentials_and_rearm() may get "empty" credentials from _creds_provider_chain.get_aws_credentials() -- it doesn't throw, but returns a default-initialized value. In that case expires_at will be set to time_point::min, and it's probably not a good idea to arm the refresh timer, and an even worse idea to subtract 1h from it. Fixes #29056 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#29057	2026-03-20 10:07:01 +02:00
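A minimal Python sketch of the guard, using `datetime.min` to stand in for a default-initialized `time_point::min`. Python makes the second hazard visible: subtracting an hour from `datetime.min` raises `OverflowError`. `refresh_deadline` and `REFRESH_MARGIN` are hypothetical names, not the S3 client's API.

```python
from datetime import datetime, timedelta
from typing import Optional

REFRESH_MARGIN = timedelta(hours=1)  # refresh one hour before expiry


def refresh_deadline(expires_at: datetime) -> Optional[datetime]:
    """Return when to rearm the refresh timer, or None to leave it disarmed."""
    if expires_at == datetime.min:
        # "Empty" credentials: nothing to refresh, and
        # expires_at - REFRESH_MARGIN would underflow.
        return None
    return expires_at - REFRESH_MARGIN
```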
Pavel Emelyanov	0a8dc4532b	s3: Fix missing upload ID in copy_part trace log The format string had two {} placeholders but three arguments, so the _upload_id argument was silently left out of the output. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#29053	2026-03-20 10:05:44 +02:00
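Python's `str.format()` shows the same failure mode as the one described: surplus positional arguments are silently ignored, so a missing placeholder drops a value without any error. The message text and argument values below are made up for illustration.

```python
# Two placeholders, three arguments: the third value is silently dropped.
buggy = "copy_part: part {} of {}".format(3, "bucket/key", "upload-42")

# One placeholder per argument keeps every value in the message.
fixed = "copy_part: part {} of {} upload_id={}".format(3, "bucket/key", "upload-42")
```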
Botond Dénes	bb5c328a16	Merge 'Squash two primary-replica restoration tests together' from Pavel Emelyanov The test_restore_primary_replica_same_domain and test_restore_primary_replica_different_domain tests have a lot in common. Previously each of them was also split into two, giving four tests; now there are two, and they can be squashed as well, since the lines-of-code savings are still worth it. This is the continuation of #28569. Tests improvement, not backporting. Closes scylladb/scylladb#28994 * github.com:scylladb/scylladb: test: Replace a bunch of ternary operators with an if-else block test: Squash test_restore_primary_replica_same\|different_domain tests test: Use the same regexp in test_restore_primary_replica_different\|same_domain-s	2026-03-20 10:05:16 +02:00
Pavel Emelyanov	ea2a214959	test/backup: Use unique_name() for backup prefix instead of cf_dir The do_test_backup_abort() fetched the node's workdir and resolved cf_dir solely to construct a unique-ish backup prefix: prefix = f'{cf_dir}/backup' The comment already acknowledged this was only "unique(ish)" — relying on the UUID-derived cf_dir name as a uniqueness source is roundabout. unique_name() is already imported and used for exactly this purpose elsewhere in the file. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#29030	2026-03-20 10:04:22 +02:00
Pavel Emelyanov	65032877d4	api: Move /storage_service/toppartitions from storage_service.cc to column_family.cc The endpoint URL remains intact. Having it next to another toppartitions endpoint (the /column_family/toppartitions one) is natural. This endpoint only needs sharded<replica::database>&, grabs it from http_context and doesn't use any other service. In column_family.cc the database reference is already available as a parameter. One more user of http_context.db is gone. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#28996	2026-03-20 09:52:33 +02:00
Botond Dénes	de0bdf1a65	Merge 'Decouple test_refresh_deletes_uploaded_sstables from backup test-suite' from Pavel Emelyanov The test in question uses several helpers from the backup suite, but it doesn't really need them -- the operations it wants to perform can be done with standard pylib methods. "While at it", also clean up some dangling, effectively unused local variables in this test (these were apparently left over from the backup tests this one was copied and reworked from). Enhancing tests, not backporting. Closes scylladb/scylladb#29130 * github.com:scylladb/scylladb: test/refresh: Simplify refresh invocation test/refresh: Remove r_servers alias for servers test/refresh: Replace check_mutation_replicas with a plain CQL SELECT test/refresh: Inline keyspace/table/data setup in test_refresh_deletes_uploaded_sstables test/refresh: Prepare indentation for new_test_keyspace in test_refresh_deletes_uploaded_sstables test/refresh: Decouple test_refresh_deletes_uploaded_sstables from backup tests test/refresh: Remove unused wait_for_cql_and_get_hosts import	2026-03-20 09:29:15 +02:00
Botond Dénes	97430e2df5	Merge 'Fix object storage lister entries walking loop' from Pavel Emelyanov Two issues were found in the lister returned by gs_client_wrapper::make_object_lister(): the lister can report EOF too early when a filter is active, and there is a potential out-of-bounds vector access. Fixes #29058 The code appeared in 2026.1, so it is worth fixing it there as well. Closes scylladb/scylladb#29059 * github.com:scylladb/scylladb: sstables: Fix object storage lister not resetting position in batch vector sstables: Fix object storage lister skipping entries when filter is active	2026-03-20 09:12:42 +02:00
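Both bugs are typical of batched listers. Below is a hedged Python sketch of a correct loop, assuming a hypothetical `fetch_page(token)` that returns a batch of names plus a continuation token (None when the listing is exhausted). EOF is reported only when the remote side is exhausted, not when a filtered batch happens to yield nothing, and the position is reset whenever a new batch is fetched. This is not the `gs_client_wrapper` code.

```python
from typing import Callable, List, Optional, Tuple

Page = Tuple[List[str], Optional[str]]


class Lister:
    def __init__(self, fetch_page: Callable[[Optional[str]], Page],
                 keep: Callable[[str], bool]):
        self._fetch_page = fetch_page  # hypothetical paged listing call
        self._keep = keep              # filter predicate
        self._batch: List[str] = []
        self._pos = 0
        self._token: Optional[str] = None
        self._exhausted = False

    def get(self) -> Optional[str]:
        """Return the next entry that passes the filter, or None at EOF."""
        while True:
            # Scan the current batch, skipping filtered-out entries.
            while self._pos < len(self._batch):
                name = self._batch[self._pos]
                self._pos += 1
                if self._keep(name):
                    return name
            if self._exhausted:
                return None  # real EOF: no more pages on the remote side
            # Fetch the next page and reset the position into it.
            self._batch, self._token = self._fetch_page(self._token)
            self._pos = 0
            self._exhausted = self._token is None
```

Bounding the inner loop by `len(self._batch)` and resetting `_pos` together with `_batch` is what keeps indexing in range when a new, possibly shorter, batch arrives.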
Botond Dénes	5573c3b18e	Merge 'tablets: Fix deadlock in background storage group merge fiber' from Tomasz Grabiec When it deadlocks, groups stop merging and the compaction group merge backlog runs away. Graceful shutdown is also blocked on it. Found by the flaky unit test test_merge_chooses_best_replica_with_odd_count, which timed out in 1 of 100 runs. Reason for the deadlock: when storage groups are merged, the main compaction group of the new storage group takes a compaction lock, which is appended to _compaction_reenablers_for_merging and released when the merge completion fiber is done with the whole batch. If more than one merge cycle accumulates for the fiber, a deadlock occurs. The lock order becomes:
Initial state:
  cg0: main
  cg1: main
  cg2: main
  cg3: main
After 1st merge:
  cg0': main [locked], merging_groups=[cg0.main, cg1.main]
  cg1': main [locked], merging_groups=[cg2.main, cg3.main]
After 2nd merge:
  cg0'': main [locked], merging_groups=[cg0'.main [locked], cg0.main, cg1.main, cg1'.main [locked], cg2.main, cg3.main]
The merge completion fiber will try to stop cg0'.main, which blocks on the compaction lock held by the reenabler in _compaction_reenablers_for_merging, hence the deadlock. The fix is to wait for the background merge to finish before starting the next merge. This is achieved by holding the old erm in the background merge, and doing a topology barrier from the merge-finalizing transition. Background merge is supposed to be a relatively quick operation: it stops compaction groups, so it may wait for active requests, but it shouldn't prolong the barrier indefinitely. Tablet tests which trigger merge need to be adjusted to call the barrier; otherwise they will be vulnerable to the deadlock. Fixes SCYLLADB-928 Backport to >= 2025.4 because it's the earliest vulnerable version due to `f9021777d8`. 
Closes scylladb/scylladb#29007 * github.com:scylladb/scylladb: tablets: Fix deadlock in background storage group merge fiber replica: table: Propagate old erm to storage group merge test: boost: tablets_test: Save tablet metadata when ACKing split resize decision storage_service: Extract local_topology_barrier()	2026-03-20 09:05:52 +02:00
Botond Dénes	34473302b0	Merge 'docs: document existing guardrails' from Andrzej Jackowski This patch series introduces new documentation for existing guardrails. Moreover: - Warning / failure messages of recently added write CL guardrails (SCYLLADB-259) are rephrased, so all guardrails have similar messages. - Some new tests are added, to help verify the correctness of the documentation and avoid situations where the documentation and implementation diverge. Fixes: [SCYLLADB-257](https://scylladb.atlassian.net/browse/SCYLLADB-257) No backport, just new docs and tests. [SCYLLADB-257]: https://scylladb.atlassian.net/browse/SCYLLADB-257?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ Closes scylladb/scylladb#29011 * github.com:scylladb/scylladb: test: add new guardrail tests matching documentation scenarios test: add metric assertions to guardrail replication strategy tests test: use regex matching in guardrail replication strategy tests test: extract ks_opts helper in test_guardrail_replication_strategy docs: document CQL guardrails cql: improve write consistency level guardrail messages	2026-03-20 08:56:00 +02:00
artem.penner	9898e5700b	scylla-node-exporter: Add systemd collector to node exporter This PR enables the node_exporter systemd collector and configures the unit whitelist to include scylla-server.service and systemd-coredump services. Motivation: We currently lack visibility into system-level service states, which is critical for diagnosing stability issues. This configuration enables two specific use cases: - Detecting Coredump Loops: We encounter scenarios where ScyllaDB enters a restart loop. To pinpoint SIGSEGV (coredumps) as the root cause, we need to track when the systemd-coredump service becomes active, indicating a dump is being processed. - Identifying Startup Failures: We need to detect when the scylla-server unit enters a failed state. This is essential for catching unrecoverable errors (e.g., corrupted commitlogs or configuration bugs) that prevent the server from starting. example of promql queries: - `node_systemd_unit_state{name=~"systemd-coredump@.*", state="active"} == 1` - `node_systemd_unit_state{name="scylla-server.service", state="failed"} == 1` Closes #28402	2026-03-20 08:39:56 +02:00
Andrzej Jackowski	10c4b9b5b0	test: verify signal() detects resource negative leak in rcs reader_concurrency_semaphore::signal() guards against available resources exceeding the initial limit after a signal, which would indicate a bug such as double-returning resources. It reports the issue via on_internal_error_noexcept and clamps resources back to the initial values. However, before this commit there were no tests that verified this behavior, so bugs like SCYLLADB-1014 went undetected. Add a test that artificially signals resources that were never consumed and verifies that signal() detects the negative leak and clamps available resources back to the initial limit. Refs: SCYLLADB-1014 Fixes: SCYLLADB-1031 Closes scylladb/scylladb#28993	2026-03-20 09:21:20 +03:00
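A minimal sketch of the invariant the new test exercises, using a toy semaphore rather than `reader_concurrency_semaphore`: `signal()` returns resources, and if that pushes the available amount above the initial limit it reports the problem (recorded in a list here, standing in for `on_internal_error_noexcept`) and clamps back to the limit. `ToySemaphore` and its fields are hypothetical.

```python
class ToySemaphore:
    def __init__(self, count: int, memory: int):
        self.initial = (count, memory)
        self.available = [count, memory]
        self.errors = []  # stands in for on_internal_error_noexcept reports

    def consume(self, count: int, memory: int) -> None:
        self.available[0] -= count
        self.available[1] -= memory

    def signal(self, count: int, memory: int) -> None:
        self.available[0] += count
        self.available[1] += memory
        init_count, init_memory = self.initial
        if self.available[0] > init_count or self.available[1] > init_memory:
            # More was returned than was ever taken: a negative leak.
            self.errors.append(
                f"available {tuple(self.available)} exceeds initial {self.initial}")
            self.available = [min(self.available[0], init_count),
                              min(self.available[1], init_memory)]
```

As in the commit's test, the interesting case is signalling resources that were never consumed.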
Botond Dénes	f9adbc7548	test/cqlpy/test_tombstone_limit.py: disable tombstone-gc for test table Since `7564a56dc8`, all tables default to repair-mode tombstone-gc, which is identical to immediate-mode for RF=1 tables. Consequently, the tombstones written by the tests in this test file are immediately collectible, and with some unlucky timing, some of them can be collected before the end of the test, failing the empty-page prefix check because the empty pages prefix will be smaller than expected based on the number of tombstones written. Disable tombstone-gc to remove this source of flakiness. Fixes: SCYLLADB-1062 Closes scylladb/scylladb#29077	2026-03-20 09:14:29 +03:00
Michał Chojnowski	6b18d95dec	test: add a missing reconnect_driver in test_sstable_compression_dictionaries_upgrade.py Need to work around https://github.com/scylladb/python-driver/issues/295, lest a CQL query fail spuriously after the cluster restart. Fixes: SCYLLADB-1114 Closes scylladb/scylladb#29118	2026-03-20 09:05:14 +03:00
Botond Dénes	89388510a0	test/cluster/test_data_resurrection_in_memtable.py: use explicit CL The test has expectations w.r.t. which write makes it to which nodes: * inserts make it to all nodes * the delete makes it to all-1 (QUORUM) nodes. However, this was not expressed with CL, and the default CL=ONE allowed some nodes to miss the writes, thus violating the test's expectations on what data is present on which nodes. This resulted in the test being flaky and failing on the data checks. Use explicit CL for the ingestion to prevent this. The improvements to the test introduced in `a8dd13731f` were of great help in investigating this: traces are now available and the check happens after the data was dumped to logs. Fixes: SCYLLADB-870 Fixes: SCYLLADB-812 Fixes: SCYLLADB-1102 Closes scylladb/scylladb#29128	2026-03-20 09:02:57 +03:00
Avi Kivity	6b259babeb	Merge 'logstor: initial log-structured storage for key-value tables' from Michael Litvak Introduce an initial and experimental implementation of an alternative log-structured storage engine for key-value tables. Main flows and components: * The storage is composed of 32MB files, each file divided into segments of size 128k. We write to them sequentially records that contain a mutation and additional metadata. Records are written to a buffer first and then written to the active segment sequentially in 4k sized blocks. * The primary index in memory maps keys to their location on disk. It is a B-tree per-table that is ordered by tokens, similar to a memtable. * On reads we calculate the key and look it up in the primary index, then read the mutation from disk with a single disk IO. * On writes we write the record to a buffer, wait for it to be written to disk, then update the index with the new location, and free the previous record. * We track the used space in each segment. When overwriting a record, we increase the free space counter for the segment of the previous record that becomes dead. We store the segments in a histogram by usage. * The compaction process takes segments with low utilization, reads them and writes the live records to new segments, and frees the old segments. * Segments are initially "mixed" - we write to the active segment records from all tables and all tablets. The "separator" process rewrites records from mixed segments into new segments that are organized by compaction groups (tablets), and frees the mixed segments. Each write is written to the active segment and to a separator buffer of the compaction group, which is eventually flushed to a new segment in the compaction group. Currently this mode is experimental and requires an experimental flag to be enabled. Some things that are not supported yet are strong consistency, tablet migration, tablet split/merge, big mutations, tombstone gc, ttl. 
To use it, add to the config:
```
enable_logstor: true
experimental_features:
  - logstor
```
Create a table:
```
CREATE TABLE ks.t(pk int PRIMARY KEY, a int, v text) WITH storage_engine = 'logstor';
```
INSERT, SELECT and DELETE work as expected; UPDATE is not supported yet. No backport - new feature. Closes scylladb/scylladb#28706 * github.com:scylladb/scylladb: logstor: trigger separator flush for buffers that hold old segments docs/dev: add logstor documentation logstor: recover segments into compaction groups logstor: range read logstor: change index to btree by token per table logstor: move segments to replica::compaction_group db: update dirty mem limits dynamically logstor: track memory usage logstor: logstor stats api logstor: compaction buffer pool logstor: separator: flush buffer when full logstor: hold segment until index updates logstor: truncate table logstor: enable/disable compaction per table logstor: separator buffer pool test: logstor: add separator and compaction tests logstor: segment and separator barrier logstor: separator debt controller logstor: compaction controller logstor: recovery: recover mixed segments using separator logstor: wait for pending reads in compaction logstor: separator logstor: compaction groups logstor: cache files for read logstor: recovery: initial logstor: add segment generation logstor: reserve segments for compaction logstor: index: buckets logstor: add buffer header logstor: add group_id logstor: record generation logstor: generation utility logstor: use RIPEMD-160 for index key test: add test_logstor.py api: add logstor compaction trigger endpoint replica: add logstor to db schema: add logstor cf property logstor: initial commit db: disable tablet balancing with logstor db: add logstor experimental feature flag	2026-03-20 00:18:09 +02:00
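The arithmetic implied by the sizes quoted above (assuming binary units, i.e. MB = MiB and k = KiB), plus a hedged sketch of "track live bytes per segment, bucket segments by usage, and compact the emptiest first". The bucket count, bucketing formula and helper names are assumptions; the description does not say how the usage histogram is organized.

```python
from collections import defaultdict

MiB, KiB = 1024 * 1024, 1024
FILE_SIZE = 32 * MiB      # each logstor file
SEGMENT_SIZE = 128 * KiB  # each segment within a file
BLOCK_SIZE = 4 * KiB      # unit of sequential writes into a segment

SEGMENTS_PER_FILE = FILE_SIZE // SEGMENT_SIZE    # 256
BLOCKS_PER_SEGMENT = SEGMENT_SIZE // BLOCK_SIZE  # 32


def on_overwrite(live_bytes: dict, old_segment: int, record_size: int) -> None:
    """The previous copy of a record becomes dead: its segment loses live data."""
    live_bytes[old_segment] -= record_size


def build_histogram(live_bytes: dict, buckets: int = 8) -> dict:
    """Group segment ids by utilization bucket (0 = emptiest). Hypothetical scheme."""
    hist = defaultdict(list)
    for seg_id, live in live_bytes.items():
        hist[min(live * buckets // SEGMENT_SIZE, buckets - 1)].append(seg_id)
    return hist


def pick_for_compaction(live_bytes: dict, max_segments: int) -> list:
    """Take segments from the lowest-utilization buckets first."""
    hist = build_histogram(live_bytes)
    picked = []
    for bucket in sorted(hist):
        for seg_id in sorted(hist[bucket]):
            if len(picked) == max_segments:
                return picked
            picked.append(seg_id)
    return picked
```

A histogram is coarse by design: segments in the same bucket are not ordered by exact utilization, which keeps updates cheap when records are overwritten.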
Avi Kivity	062751fcec	Merge 'db/config: enable ms sstable format by default' from Łukasz Paszkowski Trie-based sstable indexes are supposed to be (hopefully) a better default than the old BIG indexes. Make the new format a new default for new clusters by naming ms in the default scylla.yaml. New functionality. No backport needed. This PR is basically Michał's one https://github.com/scylladb/scylladb/pull/26377, Jakub's https://github.com/scylladb/scylladb/pull/27332 fixing `sstables_manager::get_highest_supported_format()` and one test fix. Closes scylladb/scylladb#28960 * github.com:scylladb/scylladb: db/config: announce ms format as highest supported db/config: enable `ms` sstable format by default cluster/dtest/bypass_cache_test: switch from highest_supported_sstable_format to chosen_sstable_format api/system: add /system/chosen_sstable_version test/cluster/dtest: reduce num_tokens to 16	2026-03-19 18:19:01 +02:00
Pavel Emelyanov	969dddb630	test/refresh: Simplify refresh invocation take_snapshot return values were unused so drop them. do_refresh was a thin wrapper around load_new_sstables that added no logic; inline it directly into the gather expression. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-19 18:42:57 +03:00
Pavel Emelyanov	de21572b31	test/refresh: Remove r_servers alias for servers r_servers = servers was a no-op assignment; use servers directly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-19 18:42:52 +03:00
Pavel Emelyanov	20b1531e6d	test/refresh: Replace check_mutation_replicas with a plain CQL SELECT The goal of test_refresh_deletes_uploaded_sstables is to verify that sstables are removed from the upload directory after refresh. The replica check was just a sanity guard; a simple SELECT of all keys is sufficient and much lighter. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-19 18:42:48 +03:00
Pavel Emelyanov	c591b9ebe2	test/refresh: Inline keyspace/table/data setup in test_refresh_deletes_uploaded_sstables Replace create_dataset() with explicit keyspace creation via new_test_keyspace, inline CREATE TABLE, and direct cql.run_async inserts — matching the pattern used in do_test_streaming_scopes. This removes the last dependency on backup helpers for dataset setup and makes the test self-contained. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-19 18:42:44 +03:00
Pavel Emelyanov	06006a6328	test/refresh: Prepare indentation for new_test_keyspace in test_refresh_deletes_uploaded_sstables Wrap the test body under if True: to pre-indent it, making the subsequent patch that introduces new_test_keyspace a pure content change with no whitespace noise. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-19 18:42:40 +03:00
Pavel Emelyanov	67d8cde42d	test/refresh: Decouple test_refresh_deletes_uploaded_sstables from backup tests Replace create_cluster() from object_store/test_backup.py with a plain manager.servers_add(2) call. The test does not use object storage, so there is no need to pull in the backup helper along with its config and logging knobs. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-19 18:42:36 +03:00
Pavel Emelyanov	04f046d2d8	test/refresh: Remove unused wait_for_cql_and_get_hosts import Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-19 18:42:32 +03:00
Botond Dénes	e8b37d1a89	Merge 'doc: fix the installation section' from Anna Stuchlik This PR fixes the Installation page: - Replaces `http` with `https` in the download command. - Replaces the Open Source example from the Installation section for CentOS (we overlooked this example before). Fixes https://github.com/scylladb/scylladb/issues/29087 This update affects all supported versions and should be backported as a bug fix. Closes scylladb/scylladb#29088 * github.com:scylladb/scylladb: doc: remove the Open Source Example from Installation doc: replace http with https in the installation instructions	2026-03-19 17:13:53 +02:00
Dario Mirovic	d2c44722e1	test: cluster: fix log clear race condition in test_audit.py assert_entries_were_added: - takes a "before" snapshot of the audit log - yields to execute a statement - takes an "after" snapshot of the audit log - computes new rows by diffing "after" minus "before" If an audit entry generated by prepare() arrives between the snapshot and the diff, it inflates the new row count and the test fails with assert 2 <= 1. Fix by: - Adding clear_audit_logs() at the end of prepare(), after all setup - Waiting for the "completed re-reading configuration file" log message after server_update_config - Draining pending syslog lines before clearing the buffer Refs SCYLLADB-573	2026-03-19 16:12:13 +01:00
Dario Mirovic	821f8696a7	test: pylib: shut down exclusive cql connections in ManagerClient get_cql_exclusive() creates a Cluster object per call, but never records it, so driver_close() cannot shut it down. The cluster's internal scheduler thread then tries to submit work to an already shut down executor, which causes the following error: RuntimeError: cannot schedule new futures after shutdown Fix this by tracking every exclusive Cluster in a list and shutting them all down in driver_close(). Refs SCYLLADB-573	2026-03-19 16:12:13 +01:00
Dario Mirovic	d94999f87b	test: cluster: fix multinode audit entry comparison in test_audit.py assert_entries_were_added computes new audit rows by slicing the "after" list at the length of the "before" list: rows_after[len(rows_before):]. This assumes new rows always appear at the tail of the combined sorted list. In a multinode setup, each node generates its own event_time timestamps. A new row from node A can sort before an old row from node B, breaking the tail assumption. The assertion "new rows are not the last rows in the audit table" then fires. Fix this by splitting the before/after lists per node and computing the tail of new rows independently for each node. This respects the per-node ordering, which is monotonic; the combined new rows are sorted afterwards. Refs SCYLLADB-573	2026-03-19 16:12:13 +01:00
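For illustration, a minimal C++ sketch of the per-node diff described above (the real helper is Python in test_audit.py). `audit_row`, `by_node()` and `new_rows()` are hypothetical names; the sketch assumes each node's rows arrive in non-decreasing `event_time` order and that new rows only ever extend a node's tail.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical audit row: only the fields needed for the diff.
struct audit_row {
    std::string node;
    int64_t event_time;
    std::string statement;
};

// Groups rows by originating node, preserving their relative order.
static std::map<std::string, std::vector<audit_row>> by_node(const std::vector<audit_row>& rows) {
    std::map<std::string, std::vector<audit_row>> out;
    for (const auto& r : rows) {
        out[r.node].push_back(r);
    }
    return out;
}

// Returns the rows present in `after` but not in `before`, computing the tail per node.
std::vector<audit_row> new_rows(const std::vector<audit_row>& before,
                                const std::vector<audit_row>& after) {
    auto before_by_node = by_node(before);
    std::vector<audit_row> result;
    for (auto& [node, rows] : by_node(after)) {
        auto it = before_by_node.find(node);
        std::size_t skip = it == before_by_node.end() ? 0 : it->second.size();
        // Per-node tail: everything after the rows this node already had.
        for (std::size_t i = skip; i < rows.size(); ++i) {
            result.push_back(rows[i]);
        }
    }
    // Merge the per-node tails into one list ordered by event_time.
    std::sort(result.begin(), result.end(), [](const audit_row& a, const audit_row& b) {
        return std::tie(a.event_time, a.node) < std::tie(b.event_time, b.node);
    });
    return result;
}
```

With `before = [A@1, B@5]` and `after = [A@1, A@3, B@5]`, the old slice `after[len(before):]` would return `B@5`, while the per-node diff returns only `A@3`.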
Dario Mirovic	249a6cec1b	test: cluster: dtest: remove old audit tests Since audit tests have been migrated to test/cluster/test_audit.py, old tests in test/cluster/dtest/audit_test.py have to be removed. Refs SCYLLADB-573	2026-03-19 16:12:13 +01:00
Dario Mirovic	adc790a8bf	test: cluster: group migrated audit tests for cluster reuse This patch reorganizes the execution flow of the test functions. They are grouped to enable cluster reuse between specific test functions. One of the main contributors to the test execution time is the cluster preparation. This patch significantly reduces the total test execution time by making far fewer cluster preparation calls and reusing clusters more often. On the developer machine, total execution time drops by around 38%: - before: 4m 29s - after: 2m 47s Fixes SCYLLADB-573	2026-03-19 16:11:47 +01:00
Dario Mirovic	967b7ff6bf	test: cluster: enable migrated audit tests and make them work Make the audit tests copied from test/cluster/dtest work in test/cluster. The test/cluster environment has less overhead, and audit tests are heavy, so their execution takes a lot of time. This patch is part of an effort to improve audit test suite performance. This patch refactors the tests so that they execute correctly, and enables them. A follow-up patch will remove the audit tests in test/cluster/dtest. All the tests are confirmed to be running after the change. No dead code present. Test test_audit_categories_invalid is not parametrized anymore. It never used the parametrized helper class, so it just ran the same logic three times. This is why there are now 74, and not 76, test executions. Refs SCYLLADB-573	2026-03-19 16:07:28 +01:00
Dario Mirovic	5d51501a0b	pgo: use maintenance socket for CQL setup in PGO training The default 'cassandra' superuser was removed from ScyllaDB, which broke PGO training. exec_cql.py relied on username/password auth ('cassandra'/'cassandra') to execute setup CQL scripts like auth.cql and counters.cql. Switch exec_cql.py to connect via the Unix domain maintenance socket instead. The maintenance socket bypasses authentication, no credentials are needed. Additionally, create the 'cassandra' superuser via the maintenance socket during the populate phase, so that cassandra-stress keeps working. cassandra-stress hardcodes user=cassandra password=cassandra. Changes: - exec_cql.py: replace host/port/username/password arguments with a single --socket argument; add connect_maintenance_socket() with wait ready logic - pgo.py: add maintenance_socket_path() helper; update populate_auth_conns() and populate_counters() to pass the socket path to exec_cql.py Fixes SCYLLADB-1070 Closes scylladb/scylladb#29081	2026-03-19 16:52:36 +02:00
Dario Mirovic	8367509b3b	test: pylib: manager_client: specify AuthProvider in get_cql_exclusive This patch allows ManagerClient.get_cql_exclusive to accept AuthProvider as parameter. This will be used in a follow up patch which migrates audit test suite to test/cluster and requires this functionality for some tests. Refs SCYLLADB-573	2026-03-19 15:35:24 +01:00
Dario Mirovic	0a7a69345c	test: pylib: scylla cluster after_test log fix Before any test, a pool of ScyllaCluster objects is created. At the beginning of a test suite, a ScyllaClusterManager is created and given a reference to the pool. At the end of a test suite, the ScyllaClusterManager is destroyed. Before each test case: - ManagerClient is constructed and connected to the ScyllaClusterManager of that test suite - A ScyllaCluster object is fetched from the pool - If the pool is empty, a new ScyllaCluster object is created - If the pool is not empty, a cached ScyllaCluster object is returned After each test case: - Return ScyllaCluster object from ManagerClient to the pool - If the cluster is dirty, the pool destroys it - If the cluster is clean, the pool caches it - ManagerClient is destroyed Many actions mark a cluster as dirty, so normal test execution always causes the cluster to be destroyed when it is returned to the pool. ManagerClient.mark_clean is not used in the tests; only when it is used does the cluster reuse flow happen. The bug is that the log file is closed even if the cluster is not dirty. This causes an error when trying to log to a reused cluster server. The solution in this patch is to not close the log file if the cluster is not dirty. Upon cluster reuse the log file will be open and functional. Another approach would be to reopen the log file if closed, but this approach seems cleaner. Refs SCYLLADB-573	2026-03-19 15:35:24 +01:00
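A minimal C++ model of the pool flow above, assuming hypothetical `cluster` and `cluster_pool` types (the real classes are Python: ScyllaCluster and its pool). It shows the fixed behavior: the log is closed only on the path where a dirty cluster is destroyed, so a cached clean cluster keeps a usable log.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a test cluster with an attached log file.
struct cluster {
    bool dirty = false;
    bool log_open = true;
    void close_log() { log_open = false; }
};

// Hypothetical pool mirroring the get/return flow described above.
class cluster_pool {
    std::vector<std::unique_ptr<cluster>> _cached;
public:
    std::unique_ptr<cluster> get() {
        if (_cached.empty()) {
            return std::make_unique<cluster>();  // pool empty: create a new cluster
        }
        auto c = std::move(_cached.back());      // reuse a cached clean cluster
        _cached.pop_back();
        return c;
    }
    void put(std::unique_ptr<cluster> c) {
        if (c->dirty) {
            c->close_log();  // only a cluster about to be destroyed loses its log
            return;          // the unique_ptr destroys the dirty cluster here
        }
        _cached.push_back(std::move(c));  // clean: cache it with its log still open
    }
    std::size_t cached() const { return _cached.size(); }
};
```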
Dario Mirovic	899ae71349	test: audit: copy audit test from dtest This patch just copies the audit test suite from dtest and disables it in the test config file. Later patches will update the code and enable the test suite. Refs SCYLLADB-573	2026-03-19 15:35:24 +01:00
Andrzej Jackowski	4deeb7ebfc	test: add new guardrail tests matching documentation scenarios Add tests for RF guardrails (min/max warn/fail, RF=0 bypass, threshold=-1 disable, ALTER KEYSPACE) and write consistency level guardrails to cover all scenarios described in guardrails.rst. Test runtime (dev): test_guardrail_replication_strategy - 6s test_guardrail_write_consistency_level - 5s Refs: SCYLLADB-257	2026-03-19 15:07:03 +01:00
Andrzej Jackowski	2a03c634c0	test: add metric assertions to guardrail replication strategy tests Verify that guardrail violations increment the corresponding metrics. Refs: SCYLLADB-257	2026-03-19 15:07:03 +01:00
Andrzej Jackowski	81c4e717e2	test: use regex matching in guardrail replication strategy tests Replace loose substring assertions with regex-based matching against the exact server message formats. Add regex constants for all guardrail messages and rewrite create_ks_and_assert_warnings_and_errors() to verify count and content of warnings and failures. Refs: SCYLLADB-257	2026-03-19 15:07:03 +01:00
Anna Stuchlik	6b1df5202c	doc: remove the instructions to install old versions from Web Installer The Web Installer page includes instructions to install the old pre-2025.1 Enterprise versions, which are no longer supported (since we released 2026.1). This commit removes those redundant and misleading instructions. Fixes https://github.com/scylladb/scylladb/issues/29099 Closes scylladb/scylladb#29103	2026-03-19 15:47:00 +02:00
Piotr Dulikowski	171504c84f	Merge 'auth: migrate some standard role manager APIs to use cache' from Marcin Maliszkiewicz This patchset migrates: query_all_directly_granted, query_all, get_attribute, query_attribute_for_all functions to use cache instead of doing CQL queries. It also includes some preparatory work which fixes cache update order and triggering. Main motivation behind this is to make sure that all calls from service_level_controller::auth_integration are cached, which we achieve here. Alternative implementation could move the whole auth_integration data into auth cache but since auth_integration manages also lifetime and contains service levels specific logic such solution would be too complex for little (if any) gain. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-159 Backport: no, not a bug Closes scylladb/scylladb#28791 * github.com:scylladb/scylladb: auth: switch query_attribute_for_all to use cache auth: switch get_attribute to use cache auth: cache: add heterogeneous map lookups auth: switch query_all to use cache auth: switch query_all_directly_granted to use cache auth: cache: add ability to go over all roles raft: service: reload auth cache before service levels service: raft: move update_service_levels_effective_cache check	2026-03-19 14:37:22 +01:00
Avi Kivity	5e7fb08bf3	Merge 'Fix bad performance for densely populated partition index pages' from Tomasz Grabiec This applies to small partition workload where index pages have high partition count, and the index doesn't fit in cache. It was observed that the count can be in the order of hundreds. In such a workload pages undergo constant population, LSA compaction, and LSA eviction, which has severe impact on CPU utilization. Refs https://scylladb.atlassian.net/browse/SCYLLADB-620 This PR reduces the impact by several changes: - reducing memory footprint in the partition index. Assuming partition key size is 16 bytes, the cost dropped from 96 bytes to 36 bytes per partition. - flattening the object graph and amortizing storage. Storing entries directly in the vector. Storing all key values in a single managed_bytes. Making index_entry a trivial struct. - index entries and key storage are now trivially moveable, and batched inside vector storage so LSA migration can use memcpy(), which amortizes the cost per key. This reduces the cost of LSA segment compaction. - LSA eviction is now pretty much constant time for the whole page regardless of the number of entries, because elements are trivial and batched inside vectors. Page eviction cost dropped from 50 us to 1 us. 
Performance evaluated with: scylla perf-simple-query -c1 -m200M --partitions=1000000
Before:
```
7774.96 tps (166.0 allocs/op, 521.7 logallocs/op, 54.0 tasks/op, 802428 insns/op, 430457 cycles/op, 0 errors)
7511.08 tps (166.1 allocs/op, 527.2 logallocs/op, 54.0 tasks/op, 804185 insns/op, 430752 cycles/op, 0 errors)
7740.44 tps (166.3 allocs/op, 526.2 logallocs/op, 54.2 tasks/op, 805347 insns/op, 432117 cycles/op, 0 errors)
7818.72 tps (165.2 allocs/op, 517.6 logallocs/op, 53.7 tasks/op, 794965 insns/op, 427751 cycles/op, 0 errors)
7865.49 tps (165.1 allocs/op, 513.3 logallocs/op, 53.6 tasks/op, 788898 insns/op, 425171 cycles/op, 0 errors)
```
After (+318%):
```
32492.40 tps (130.7 allocs/op, 12.8 logallocs/op, 36.1 tasks/op, 109236 insns/op, 103203 cycles/op, 0 errors)
32591.99 tps (130.4 allocs/op, 12.8 logallocs/op, 36.0 tasks/op, 108947 insns/op, 102889 cycles/op, 0 errors)
32514.52 tps (130.6 allocs/op, 12.8 logallocs/op, 36.0 tasks/op, 109118 insns/op, 103219 cycles/op, 0 errors)
32491.14 tps (130.6 allocs/op, 12.8 logallocs/op, 36.0 tasks/op, 109349 insns/op, 103272 cycles/op, 0 errors)
32582.90 tps (130.5 allocs/op, 12.8 logallocs/op, 36.0 tasks/op, 109269 insns/op, 102872 cycles/op, 0 errors)
32479.43 tps (130.6 allocs/op, 12.8 logallocs/op, 36.0 tasks/op, 109313 insns/op, 103242 cycles/op, 0 errors)
32418.48 tps (130.7 allocs/op, 12.8 logallocs/op, 36.1 tasks/op, 109201 insns/op, 103301 cycles/op, 0 errors)
31394.14 tps (130.7 allocs/op, 12.8 logallocs/op, 36.1 tasks/op, 109267 insns/op, 103301 cycles/op, 0 errors)
32298.55 tps (130.7 allocs/op, 12.8 logallocs/op, 36.1 tasks/op, 109323 insns/op, 103551 cycles/op, 0 errors)
```
When the workload is miss-only, with both row cache and index cache disabled (no cache maintenance cost): perf-simple-query -c1 -m200M --duration 6000 --partitions=100000 --enable-index-cache=0 --enable-cache=0
Before:
```
9124.57 tps (146.2 allocs/op, 789.0 logallocs/op, 45.3 tasks/op, 889320 insns/op, 357937 cycles/op, 0 errors)
9437.23 tps (146.1 allocs/op, 789.3 logallocs/op, 45.3 tasks/op, 889613 insns/op, 357782 cycles/op, 0 errors)
9455.65 tps (146.0 allocs/op, 787.4 logallocs/op, 45.2 tasks/op, 887606 insns/op, 357167 cycles/op, 0 errors)
9451.22 tps (146.0 allocs/op, 787.4 logallocs/op, 45.3 tasks/op, 887627 insns/op, 357357 cycles/op, 0 errors)
9429.50 tps (146.0 allocs/op, 787.4 logallocs/op, 45.3 tasks/op, 887761 insns/op, 358148 cycles/op, 0 errors)
9430.29 tps (146.1 allocs/op, 788.2 logallocs/op, 45.3 tasks/op, 888501 insns/op, 357679 cycles/op, 0 errors)
9454.08 tps (146.0 allocs/op, 787.3 logallocs/op, 45.3 tasks/op, 887545 insns/op, 357132 cycles/op, 0 errors)
```
After (+55%):
```
14484.84 tps (150.7 allocs/op, 6.5 logallocs/op, 44.7 tasks/op, 396164 insns/op, 229490 cycles/op, 0 errors)
14526.21 tps (150.8 allocs/op, 6.5 logallocs/op, 44.8 tasks/op, 396401 insns/op, 228824 cycles/op, 0 errors)
14567.53 tps (150.7 allocs/op, 6.5 logallocs/op, 44.7 tasks/op, 396319 insns/op, 228701 cycles/op, 0 errors)
14545.63 tps (150.6 allocs/op, 6.5 logallocs/op, 44.7 tasks/op, 395889 insns/op, 228493 cycles/op, 0 errors)
14626.06 tps (150.5 allocs/op, 6.5 logallocs/op, 44.7 tasks/op, 395254 insns/op, 227891 cycles/op, 0 errors)
14593.74 tps (150.5 allocs/op, 6.5 logallocs/op, 44.7 tasks/op, 395480 insns/op, 227993 cycles/op, 0 errors)
14538.10 tps (150.8 allocs/op, 6.5 logallocs/op, 44.8 tasks/op, 397035 insns/op, 228831 cycles/op, 0 errors)
14527.18 tps (150.8 allocs/op, 6.5 logallocs/op, 44.8 tasks/op, 396992 insns/op, 228839 cycles/op, 0 errors)
```
Same as above, but with summary ratio increased from 0.0005 to 0.005 (smaller pages):
Before:
```
33906.70 tps (146.1 allocs/op, 83.6 logallocs/op, 45.1 tasks/op, 170553 insns/op, 98104 cycles/op, 0 errors)
32696.16 tps (146.0 allocs/op, 83.5 logallocs/op, 45.1 tasks/op, 170369 insns/op, 98405 cycles/op, 0 errors)
33889.05 tps (146.1 allocs/op, 83.6 logallocs/op, 45.1 tasks/op, 170551 insns/op, 98135 cycles/op, 0 errors)
33893.24 tps (146.1 allocs/op, 83.5 logallocs/op, 45.1 tasks/op, 170488 insns/op, 98168 cycles/op, 0 errors)
33836.73 tps (146.1 allocs/op, 83.6 logallocs/op, 45.1 tasks/op, 170528 insns/op, 98226 cycles/op, 0 errors)
33897.61 tps (146.0 allocs/op, 83.5 logallocs/op, 45.1 tasks/op, 170428 insns/op, 98081 cycles/op, 0 errors)
33834.73 tps (146.1 allocs/op, 83.5 logallocs/op, 45.1 tasks/op, 170438 insns/op, 98178 cycles/op, 0 errors)
33776.31 tps (146.3 allocs/op, 83.9 logallocs/op, 45.2 tasks/op, 170958 insns/op, 98418 cycles/op, 0 errors)
33808.08 tps (146.3 allocs/op, 83.9 logallocs/op, 45.2 tasks/op, 170940 insns/op, 98388 cycles/op, 0 errors)
```
After (+18%):
```
40081.51 tps (148.2 allocs/op, 4.4 logallocs/op, 45.0 tasks/op, 121047 insns/op, 82231 cycles/op, 0 errors)
40005.85 tps (148.6 allocs/op, 4.4 logallocs/op, 45.2 tasks/op, 121327 insns/op, 82545 cycles/op, 0 errors)
39816.75 tps (148.3 allocs/op, 4.4 logallocs/op, 45.1 tasks/op, 121067 insns/op, 82419 cycles/op, 0 errors)
39953.11 tps (148.1 allocs/op, 4.4 logallocs/op, 45.0 tasks/op, 121027 insns/op, 82258 cycles/op, 0 errors)
40073.96 tps (148.2 allocs/op, 4.4 logallocs/op, 45.0 tasks/op, 121006 insns/op, 82313 cycles/op, 0 errors)
39882.25 tps (148.2 allocs/op, 4.4 logallocs/op, 45.0 tasks/op, 120925 insns/op, 82320 cycles/op, 0 errors)
39916.08 tps (148.3 allocs/op, 4.4 logallocs/op, 45.1 tasks/op, 121054 insns/op, 82393 cycles/op, 0 errors)
39786.30 tps (148.2 allocs/op, 4.4 logallocs/op, 45.0 tasks/op, 121027 insns/op, 82465 cycles/op, 0 errors)
38662.45 tps (148.3 allocs/op, 4.4 logallocs/op, 45.0 tasks/op, 121108 insns/op, 82312 cycles/op, 0 errors)
39849.42 tps (148.3 allocs/op, 4.4 logallocs/op, 45.1 tasks/op, 121098 insns/op, 82447 cycles/op, 0 errors)
```
Closes scylladb/scylladb#28603 * github.com:scylladb/scylladb: sstables: mx: index_reader: Optimize parsing for no promoted index case vint: Use std::countl_zero() test: sstable_partition_index_cache_test: Validate scenario of pages with sparse promoted index placement sstables: mx: index_reader: Amortize partition key storage managed_bytes: Hoist write_fragmented() to common header utils: managed_vector: Use std::uninitialized_move() to move objects sstables: mx: index_reader: Keep promoted_index info next to index_entry sstables: mx: index_reader: Extract partition_index_page::clear_gently() sstables: mx: index_reader: Shave-off 16 bytes from index_entry by using raw_token sstables: mx: index_reader: Reduce allocation_section overhead during index page parsing by batching allocation sstables: mx: index_reader: Keep index_entry directly in the vector dht: Introduce raw_token test: perf_simple_query: Add 'sstable-format' command-line option test: perf_simple_query: Add 'sstable-summary-ratio' command-line option test: perf-simple-query: Add option to disable index cache test: cql_test_env: Respect enable-index-cache config	2026-03-19 14:42:50 +02:00
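A rough C++ sketch of the flattened layout described in the list above, assuming hypothetical `index_entry` and `partition_index_page` definitions: entries are trivially copyable and stored inline in one vector, and all keys share one byte buffer addressed by offset. `std::vector` stands in for the LSA-managed containers, and the field set is illustrative, not Scylla's actual one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>
#include <type_traits>
#include <vector>

// Hypothetical flattened entry: fixed size, no pointers, no owned memory.
struct index_entry {
    int64_t raw_token;      // token kept as a plain integer
    uint64_t data_position; // position of the partition in the data file
    uint32_t key_offset;    // where this entry's key starts in key_storage
    uint32_t key_size;      // length of the key in bytes
};
static_assert(std::is_trivially_copyable_v<index_entry>,
              "trivial entries can be relocated with memcpy");

// Hypothetical page: entries stored inline, all keys concatenated in one buffer.
struct partition_index_page {
    std::vector<index_entry> entries;
    std::vector<char> key_storage;

    void add(int64_t token, uint64_t pos, std::string_view k) {
        auto offset = static_cast<uint32_t>(key_storage.size());
        key_storage.insert(key_storage.end(), k.begin(), k.end());
        entries.push_back({token, pos, offset, static_cast<uint32_t>(k.size())});
    }
    std::string_view key_at(std::size_t i) const {
        const auto& e = entries[i];
        return {key_storage.data() + e.key_offset, e.key_size};
    }
};

// Relocating a batch of entries is a single memcpy rather than per-object moves.
void relocate(const index_entry* src, index_entry* dst, std::size_t n) {
    std::memcpy(dst, src, n * sizeof(index_entry));
}
```

Because `index_entry` is trivially copyable, moving a whole page's entries is one `memcpy`, the property the series relies on to make LSA migration and eviction cheap.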
Botond Dénes	4981e72607	Merge 'replica: avoid unnecessary computation on token lookup hot path' from Łukasz Paszkowski `storage_group_of()` sits on the replica-side token lookup hot path, yet it called `tablet_map::get_tablet_id_and_range_side()`, which always computes both the tablet id and the post-split range side — even though most callers only need the storage group id. The range-side computation is only relevant when a storage group is in tablet splitting mode, but we were paying for it unconditionally on every lookup. This series fixes that by: 1. Adding `tablet_map::get_tablet_range_side()` so the range side can be computed independently when needed. 2. Adding lazy `select_compaction_group()` overloads that defer the range-side computation until splitting mode is actually active. 3. Switching `storage_group_of()` to use the cheaper `get_tablet_id()` path, only computing the range side on demand. Improvements. No backport is required. Closes scylladb/scylladb#28963 * github.com:scylladb/scylladb: replica/table: avoid computing token range side in storage_group_of() on hot path replica/compaction_group: add lazy select_compaction_group() overloads locator/tablets: add tablet_map::get_tablet_range_side()	2026-03-19 14:27:12 +02:00
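A simplified C++ sketch of the lazy selection, assuming a hypothetical `tablet_map` where every tablet covers `tablet_width` tokens; only the shape of `get_tablet_id()`, `get_tablet_range_side()` and `select_compaction_group()` mirrors the description above, not their real signatures. The counter makes it visible that the range side is computed only while the storage group is splitting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class range_side { left, right };

// Hypothetical tablet map: tablet i covers tokens [i * tablet_width, (i + 1) * tablet_width).
struct tablet_map {
    uint64_t tablet_width;
    mutable int range_side_calls = 0;  // counts the extra work, for illustration only

    std::size_t get_tablet_id(uint64_t token) const {
        return static_cast<std::size_t>(token / tablet_width);
    }
    range_side get_tablet_range_side(uint64_t token) const {
        ++range_side_calls;
        // After a split each tablet is halved; the side says which half the token is in.
        return token % tablet_width < tablet_width / 2 ? range_side::left : range_side::right;
    }
};

// Hypothetical storage group with one normal and two post-split compaction groups.
struct storage_group {
    bool splitting = false;
    int main_cg = 0;
    int left_cg = 1;
    int right_cg = 2;
};

// Lazy selection: the range side is only computed when the group is splitting.
int select_compaction_group(const tablet_map& tm, const storage_group& sg, uint64_t token) {
    if (!sg.splitting) {
        return sg.main_cg;
    }
    return tm.get_tablet_range_side(token) == range_side::left ? sg.left_cg : sg.right_cg;
}
```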
Ernest Zaslavsky	aa9da87e97	encryption: fix deadlock in encrypted_data_source::get() When encrypted_data_source::get() caches a trailing block in _next, the next call takes it directly — bypassing input_stream::read(), which checks _eof. It then calls input_stream::read_exactly() on the already-drained stream. Unlike read(), read_up_to(), and consume(), read_exactly() does not check _eof when the buffer is empty, so it calls _fd.get() on a source that already returned EOS. In production this manifested as stuck encrypted SSTable component downloads during tablet restore: the underlying chunked_download_source hung forever on the post-EOS get(), causing 4 tablets to never complete. The stuck files were always block-aligned sizes (8k, 12k) where _next gets populated and the source is fully consumed in the same call. Fix by checking _input.eof() before calling read_exactly(). When the stream already reached EOF, buf2 is known to be empty, so the call is skipped entirely. A comprehensive test is added that uses a strict_memory_source which fails on post-EOS get(), reproducing the exact code path that caused the production deadlock.	2026-03-19 13:54:54 +02:00
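A self-contained C++ model of the failure and the fix, assuming hypothetical `strict_source` and `input_stream` classes (not the Seastar types). Like the test's strict_memory_source, the source throws if `get()` is called after it has returned end-of-stream, and `read_exactly()` pulls from the source without checking the EOF flag. The guard in `read_next_block()` is the fix described above: skip `read_exactly()` once the stream has reached EOF.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical strict source: returns its chunks, then one empty chunk (EOS),
// and throws if get() is called again after that.
class strict_source {
    std::vector<std::string> _chunks;
    std::size_t _next = 0;
    bool _eos_returned = false;
public:
    explicit strict_source(std::vector<std::string> chunks) : _chunks(std::move(chunks)) {}
    std::string get() {
        if (_eos_returned) {
            throw std::logic_error("get() after EOS");
        }
        if (_next == _chunks.size()) {
            _eos_returned = true;
            return {};
        }
        return _chunks[_next++];
    }
};

// Hypothetical buffered stream with an EOF flag.
class input_stream {
    strict_source& _src;
    std::string _buf;
    bool _eof = false;
public:
    explicit input_stream(strict_source& src) : _src(src) {}
    bool eof() const { return _eof; }
    // Like the read_exactly() described above: no EOF check before pulling from the source.
    std::string read_exactly(std::size_t n) {
        while (_buf.size() < n) {
            std::string chunk = _src.get();
            if (chunk.empty()) {
                _eof = true;
                break;
            }
            _buf += chunk;
        }
        std::string out = _buf.substr(0, n);
        _buf.erase(0, out.size());
        return out;
    }
};

// The fix: once EOF was seen the buffer is empty, so skip read_exactly() entirely.
std::string read_next_block(input_stream& in, std::size_t block_size) {
    if (in.eof()) {
        return {};
    }
    return in.read_exactly(block_size);
}
```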
Ernest Zaslavsky	f74a54f005	test_lib: mark `limiting_data_source_impl` as not `final`	2026-03-19 13:54:54 +02:00
Ernest Zaslavsky	151e945d9f	Fix formatting after previous patch	2026-03-19 13:54:44 +02:00
Andrzej Jackowski	517bb8655d	test: extract ks_opts helper in test_guardrail_replication_strategy Factor out ks_opts() to build keyspace options with tablets handling and use it across all existing replication strategy guardrail tests. No behavioral changes. This facilitates further modification of the tests later in this patch series. Refs: SCYLLADB-257	2026-03-19 12:49:41 +01:00
Andrzej Jackowski	9b24d9ee7d	docs: document CQL guardrails Add docs/cql/guardrails.rst covering replication factor, replication strategy, write consistency level, and compact storage guardrails. Fixes: SCYLLADB-257	2026-03-19 12:49:41 +01:00
Ernest Zaslavsky	537747cf5d	Fix indentation after previous patch	2026-03-19 13:48:53 +02:00
Ernest Zaslavsky	2535164542	test_lib: make limiting_data_source_impl available to tests Relocate the `limiting_data_source_impl` declaration to the header file so that test code can access it directly.	2026-03-19 13:48:53 +02:00
Botond Dénes	86d7c82993	test/cluster/test_repair.py: use tablets in test_repair_timestamp_difference After repair, the test runs a major compaction to compact all sstables into a single one, so the results can be checked simply with a select from mutation_fragments() query. Sometimes off-strategy compaction runs in parallel with this major compaction, so afterwards there are still 2 sstables, and the test fails when checking that the query returns just a single row. To fix this, use tablets for the test table, since tablets no longer use off-strategy. Fixes: SCYLLADB-940 Closes scylladb/scylladb#29071	2026-03-19 12:42:18 +03:00
Michael Litvak	399260a6c0	test: mv: fix flaky wait for commitlog sync Previously the test test_interrupt_view_build_shard_registration stopped the node ungracefully and relied on commitlog periodic mode to persist the view build progress, which is not very reliable. Due to timing issues, the view build progress may not be persisted, or some of it may be persisted in a different order than expected. To make the test more reliable, we change it to stop the node gracefully, so the commitlog is persisted consistently, without relying on the periodic mode delay. We also need to change the injection for the shutdown so that it does not get stuck. Fixes SCYLLADB-1005 Closes scylladb/scylladb#29008	2026-03-19 10:41:21 +01:00
Pavel Emelyanov	f27dc12b7c	Merge 'Fix directory lister leak in table::get_snapshot_details: ' from Benny Halevy As reported in SCYLLADB-1013, the directory lister must also be closed when an exception is thrown. For example, see the backtrace below:
```
seastar::on_internal_error(seastar::logger&, std::basic_string_view<char, std::char_traits<char>>) at ./build/release/seastar/./seastar/src/core/on_internal_error.cc:57
directory_lister::~directory_lister() at ./utils/lister.cc:77
replica::table::get_snapshot_details(std::filesystem::__cxx11::path, std::filesystem::__cxx11::path) (.resume) at ./replica/table.cc:4081
std::__n4861::coroutine_handle<seastar::internal::coroutine_traits_base<db::snapshot_ctl::table_snapshot_details>::promise_type>::resume() const at /usr/lib/gcc/x86_64-redhat-linux/15/../../../../include/c++/15/coroutine:247
(inlined by) seastar::internal::coroutine_traits_base<db::snapshot_ctl::table_snapshot_details>::promise_type::run_and_dispose() at ././seastar/include/seastar/core/coroutine.hh:129
seastar::reactor::task_queue::run_tasks() at ./build/release/seastar/./seastar/src/core/reactor.cc:2695
(inlined by) seastar::reactor::task_queue_group::run_tasks() at ./build/release/seastar/./seastar/src/core/reactor.cc:3201
seastar::reactor::task_queue_group::run_some_tasks() at ./build/release/seastar/./seastar/src/core/reactor.cc:3185
(inlined by) seastar::reactor::do_run() at ./build/release/seastar/./seastar/src/core/reactor.cc:3353
seastar::reactor::run() at ./build/release/seastar/./seastar/src/core/reactor.cc:3245
seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) at ./build/release/seastar/./seastar/src/core/app-template.cc:266
seastar::app_template::run(int, char**, std::function<seastar::future<int> ()>&&) at ./build/release/seastar/./seastar/src/core/app-template.cc:160
scylla_main(int, char**) at ./main.cc:756
```
Fixes: [SCYLLADB-1013](https://scylladb.atlassian.net/browse/SCYLLADB-1013) Requires backport to 2026.1: the leak was introduced in `004c08f525`. Closes scylladb/scylladb#29084 * github.com:scylladb/scylladb: test/boost/database_test: add test_snapshot_ctl_details_exception_handling table: get_snapshot_details: fix indentation inside try block table: per-snapshot get_snapshot_details: fix typo in comment table: per-snapshot get_snapshot_details: always close lister using try/catch table: get_snapshot_details: always close lister using deferred_close	2026-03-19 12:40:23 +03:00
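A synchronous C++ sketch of the "close on every path" rule, assuming a hypothetical `lister` whose destructor records a leak if `close()` was never called. In Seastar `close()` returns a future, so the series uses `deferred_close` and try/catch around coroutine code; the hypothetical `close_guard` below only mirrors the idea with a plain destructor.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical lister that must be closed before destruction.
struct lister {
    static inline int leaked = 0;
    bool closed = false;
    void close() { closed = true; }
    ~lister() {
        if (!closed) {
            ++leaked;  // the real directory_lister reports an internal error here
        }
    }
};

// Scope guard in the spirit of deferred_close: closes the object on every exit path.
template <typename T>
class close_guard {
    T& _obj;
public:
    explicit close_guard(T& obj) : _obj(obj) {}
    close_guard(const close_guard&) = delete;
    close_guard& operator=(const close_guard&) = delete;
    ~close_guard() { _obj.close(); }
};

std::size_t count_entries(const std::vector<std::string>& names, bool fail) {
    lister l;
    close_guard guard(l);  // declared after l, so it runs before l's destructor
    std::size_t n = 0;
    for (const auto& name : names) {
        if (fail && name == "bad") {
            throw std::runtime_error("stat failed: " + name);
        }
        ++n;
    }
    return n;
}
```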
Raphael S. Carvalho	3143134968	test: avoid split/major compaction deadlock in tablet split test Run keyspace compaction asynchronously in `test_tombstone_gc_correctness_during_tablet_split` and only await it after `split_sstable_rewrite` is disabled. The problem is that `keyspace_compaction()` starts with a flush, and that flush can take around five seconds. During that window the split compaction is stopped before major compaction is retried. The stop aborts the in-flight major compaction attempt, then the split proceeds far enough to enter the `split_sstable_rewrite` injection point. At that point the test used to wait synchronously for major compaction to finish, but major compaction cannot finish yet: when it retries, it needs the same semaphore that is still effectively tied up behind the blocked split rewrite. So the test waits for major compaction, while the split waits for the injection to be released, and the code that would release that injection never runs. Starting major compaction as a task breaks that cycle. The test can first disable `split_sstable_rewrite`, let the split get out of the way, and only then wait for major compaction to complete. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-827. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#29066	2026-03-19 11:12:21 +02:00
Botond Dénes	2e47fd9f56	Merge 'tasks: do not fail the wait request if rpc fails' from Aleksandra Martyniuk During decommission, we first mark a topology request as done, then shut down a node and in the following steps we remove node from the topology. Thus, finished request does not imply that a node is removed from the topology. Due to that, in node_ops_virtual_task::wait, while gathering children from the whole cluster, we may hit the connection exception - because a node is still in topology, even though it is down. Modify the get_children method to ignore the exception and warn about the failure instead. Keep token_metadata_ptr in get_children to prevent topology from changing. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-867 Needs backports to all versions Closes scylladb/scylladb#29035 * github.com:scylladb/scylladb: tasks: fix indentation tasks: do not fail the wait request if rpc fails tasks: pass token_metadata_ptr to task_manager::virtual_task::impl::get_children	2026-03-19 10:03:18 +02:00
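A minimal C++ sketch of the tolerant gathering loop, with a hypothetical `get_children()` that takes the node list (standing in for the topology pinned by token_metadata_ptr) and an RPC callback. A failing node is logged as a warning and skipped instead of failing the whole wait request.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

using children_rpc = std::function<std::vector<std::string>(const std::string&)>;

// Hypothetical: collect child tasks from every node of a fixed topology snapshot.
std::vector<std::string> get_children(const std::vector<std::string>& nodes, const children_rpc& rpc) {
    std::vector<std::string> children;
    for (const auto& node : nodes) {
        try {
            auto part = rpc(node);
            children.insert(children.end(), part.begin(), part.end());
        } catch (const std::exception& e) {
            // A down node that is still in the topology: warn and continue.
            std::cerr << "warning: failed to get children from " << node << ": " << e.what() << '\n';
        }
    }
    return children;
}
```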
Piotr Smaron	a2ad57062f	docs/cql: clarify WHERE clause boolean limitations Document that `SELECT ... WHERE` clause currently accepts only conjunctions of relations joined by `AND` (`OR` is not supported), and that parentheses cannot be used to group boolean subexpressions. Add an unsupported query example and point readers to equivalent `IN` rewrites when applicable. This problem has been raised by one of our users in https://forum.scylladb.com/t/error-parsing-query-or-unsupported-statement/5299, and while one could infer answer to user's question by looking at the syntax of the `SELECT ... WHERE`, it's not immediately obvious to non-advanced users, so clarifying these concepts is justified. Fixes: SCYLLADB-1116 Closes scylladb/scylladb#29100	2026-03-19 09:47:22 +02:00
Michael Litvak	31aefdc07d	logstor: trigger separator flush for buffers that hold old segments A compaction group has a separator buffer that keeps the mixed segments alive until the separator buffer is flushed. A mixed segment can be freed only after all separator buffers that hold writes from the segment are flushed. Typically a separator buffer is flushed when it becomes full. However, it's possible, for example, that one compaction group fills more slowly than the others and holds many segments. To fix this, we periodically trigger a separator flush for separator buffers that hold old segments. We track the active segment sequence number and, for each separator buffer, the oldest sequence number it holds.	2026-03-18 19:24:28 +01:00
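A minimal C++ sketch of the periodic check, assuming hypothetical names (`separator_buffer`, `flush_old_buffers()`) and a hypothetical `max_age` threshold, since the commit does not state the actual policy. Each buffer remembers the oldest segment sequence number it holds, and buffers lagging too far behind the active segment are flushed.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

constexpr uint64_t no_segment = std::numeric_limits<uint64_t>::max();

// Hypothetical separator buffer tracking the oldest segment it keeps alive.
struct separator_buffer {
    uint64_t oldest_seq = no_segment;  // no_segment means the buffer is empty
    bool flushed = false;

    void add_write_from(uint64_t seq) {
        if (seq < oldest_seq) {
            oldest_seq = seq;
        }
    }
    void flush() {
        flushed = true;
        oldest_seq = no_segment;  // the held segments can now be freed
    }
    bool empty() const { return oldest_seq == no_segment; }
};

// Called periodically; assumes every held sequence number is <= active_seq.
int flush_old_buffers(std::vector<separator_buffer>& buffers, uint64_t active_seq, uint64_t max_age) {
    int flushed = 0;
    for (auto& b : buffers) {
        if (!b.empty() && active_seq - b.oldest_seq > max_age) {
            b.flush();
            ++flushed;
        }
    }
    return flushed;
}
```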
Michael Litvak	ad87eda835	docs/dev: add logstor documentation	2026-03-18 19:24:28 +01:00
Michael Litvak	a0da07e5b7	logstor: recover segments into compaction groups Fix the logstor recovery to work with compaction groups. When recovering a segment, find its token range and add it to the appropriate compaction group. If it doesn't fit in a single compaction group, write each record to its compaction group's separator buffer.	2026-03-18 19:24:28 +01:00
Michael Litvak	24379acc76	logstor: range read extend the logstor mutation reader to support range read	2026-03-18 19:24:28 +01:00
Michael Litvak	a9d0211a64	logstor: change index to btree by token per table Change the primary index to be a btree that is ordered by token, similarly to a memtable, and create an index per table instead of a single global index.	2026-03-18 19:24:28 +01:00
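A C++ sketch of the new index shape, with `std::map` standing in for the btree and hypothetical `record_location`/`table_index` types: one index per table, ordered by token, which lets a token range be read as one contiguous scan.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical location of a record inside logstor segments.
struct record_location {
    uint32_t segment;
    uint32_t offset;
};

using index_key = std::pair<int64_t, std::string>;  // (token, partition key)

// Hypothetical per-table index ordered by token, like a memtable.
class table_index {
    std::map<index_key, record_location> _entries;
public:
    void put(int64_t token, std::string key, record_location loc) {
        _entries.insert_or_assign(index_key{token, std::move(key)}, loc);
    }
    // Token order turns a range read into a contiguous scan.
    std::vector<std::string> keys_in_range(int64_t first, int64_t last) const {
        std::vector<std::string> out;
        for (auto it = _entries.lower_bound(index_key{first, std::string()});
             it != _entries.end() && it->first.first <= last; ++it) {
            out.push_back(it->first.second);
        }
        return out;
    }
};

// One index per table (keyed by a table name here) instead of a single global one.
using per_table_indexes = std::unordered_map<std::string, table_index>;
```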
Michael Litvak	e7c3942d43	logstor: move segments to replica::compaction_group Add a segment_set member to replica::compaction_group that manages the logstor segments that belong to the compaction group, similarly to how it manages sstables. Add also a separator buffer in each compaction group. When writing a mutation to a compaction group, the mutation is written to the active segment and to the separator buffer of the compaction group, and when the separator buffer is flushed the segment is added to the compaction_group's segment set.	2026-03-18 19:24:28 +01:00
Michael Litvak	d69f7eb0ee	db: update dirty mem limits dynamically when logstor is enabled, update the db dirty memory limits dynamically. previously the threshold was set to 0.5 of the available memory, so 0.5 went to memtables and 0.5 to others (cache). when logstor is enabled, we calculate the available memory excluding logstor, and divide it evenly between memtables and cache.	2026-03-18 19:24:27 +01:00
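The split described above, as a small C++ helper. The function name and the `logstor_usage` parameter are hypothetical; the sketch only encodes the arithmetic: half of the available memory for memtables without logstor, and half of what remains after excluding logstor when it is enabled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: memtable (dirty memory) share of the available memory.
uint64_t dirty_memory_limit(uint64_t available, uint64_t logstor_usage, bool logstor_enabled) {
    if (!logstor_enabled) {
        return available / 2;  // the other half goes to cache
    }
    // Exclude logstor first, then divide the rest evenly between memtables and cache.
    uint64_t rest = logstor_usage < available ? available - logstor_usage : 0;
    return rest / 2;
}
```

For example, with 8 GiB available and 2 GiB used by logstor, memtables and cache get 3 GiB each.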
Michael Litvak	65cd0b5639	logstor: track memory usage add logstor::get_memory_usage() that returns an estimate of the memory usage by logstor. track how many unique keys are held in the index.	2026-03-18 19:24:27 +01:00
Michael Litvak	b7bdb1010a	logstor: logstor stats api add api to get logstor statistics about segments for a table	2026-03-18 19:24:27 +01:00
Michael Litvak	8bd3bd7e2a	logstor: compaction buffer pool pre-allocate write buffers for compaction	2026-03-18 19:24:27 +01:00
Michael Litvak	caf5aa47c2	logstor: separator: flush buffer when full flush separator buffers when they become full and are switched, instead of aggregating all the buffers and flushing them when the separator is switched.	2026-03-18 19:24:27 +01:00
Michael Litvak	6ddb7a4d13	logstor: hold segment until index updates add a write gate to write_buffer. when writing a record to the write buffer, the gate is held and passed back to the caller, and the caller holds the gate until the write operation is complete, including follow-up operations such as updating the index after the write. in particular, when writing a mutation in logstor::write, the write buffer is held open until the write is completed and updated in the index. when writing the write buffer to the active segment, we write the buffer and then wait for the write buffer gate to close, i.e. we wait for all index updates to complete before proceeding. the segment is held open until all the write operations and index updates are complete. this property is useful for correctness: when a segment is closed we know that all the writes to it are updated in the index. this is needed in compaction for example, where we take closed segments and check which records in them are alive by looking them up in the index. if the index is not updated yet then it will be wrong.	2026-03-18 19:24:27 +01:00
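A synchronous C++ model of the write gate, assuming a hypothetical `write_gate` (Seastar's real `gate` is asynchronous): every write holds the gate until its index update finishes, and closing the gate only completes once all holders have left. This is the ordering that lets compaction trust the index for closed segments.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical synchronous gate: close() completes only after all holders leave.
class write_gate {
    int _holders = 0;
    bool _closing = false;
    std::function<void()> _on_closed;
public:
    // RAII holder returned to the writer; released when the write fully completes.
    class holder {
        write_gate* _g;
    public:
        explicit holder(write_gate& g) : _g(&g) { ++_g->_holders; }
        holder(holder&& o) noexcept : _g(o._g) { o._g = nullptr; }
        holder(const holder&) = delete;
        holder& operator=(const holder&) = delete;
        holder& operator=(holder&&) = delete;
        ~holder() {
            if (_g) {
                _g->leave();
            }
        }
    };
    holder enter() { return holder(*this); }
    void close(std::function<void()> on_closed) {
        _closing = true;
        _on_closed = std::move(on_closed);
        if (_holders == 0) {
            _on_closed();
        }
    }
private:
    void leave() {
        if (--_holders == 0 && _closing && _on_closed) {
            _on_closed();  // e.g. the segment may now be treated as closed
        }
    }
};
```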
Michael Litvak	bd66edee5c	logstor: truncate table implement freeing all segments of a table for table truncate. first do barrier to flush all active and mixed segments and put all the table's data in compaction groups, then stop compaction for the table, then free the table's segments and remove the live entries from the index.	2026-03-18 19:24:27 +01:00
Michael Litvak	489efca47c	logstor: enable/disable compaction per table add functions to enable or disable compaction for a specific compaction group or for all compaction groups of a table.	2026-03-18 19:24:27 +01:00
Michael Litvak	21db4f3ed8	logstor: separator buffer pool pre-allocate write buffers for the separator	2026-03-18 19:24:27 +01:00
Michael Litvak	37c485e3d1	test: logstor: add separator and compaction tests	2026-03-18 19:24:27 +01:00
Michael Litvak	31aefdc07d	logstor: segment and separator barrier add a barrier operation that forces a switch of the active segment and separator, and waits for all existing segments to close and all separators to flush.	2026-03-18 19:24:27 +01:00
Michael Litvak	1231fafb46	logstor: separator debt controller add tracking of the total separator debt (writes that were written to a separator and are waiting to be flushed), and add flow control that keeps the debt under control by delaying normal writes.	2026-03-18 19:24:27 +01:00
Michael Litvak	17cb173e18	logstor: compaction controller adjust compaction shares by the compaction overhead: how many segments compaction writes to generate a single free segment for new writes.	2026-03-18 19:24:27 +01:00
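The commit describes the overhead only in words. Under the classic log-structured cleaning model (an assumption here, not necessarily logstor's exact accounting), compacting k segments whose average live fraction is u rewrites k*u segments of live data and frees k segments, for a net gain of k*(1-u) free segments. A minimal sketch of the resulting write cost per freed segment; the function name is hypothetical:

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical helper: segments compaction must write to produce one net free
// segment, assuming compacted segments have an average live fraction `live`.
// Compacting k segments writes k*live segments and frees k, a net gain of
// k*(1 - live); for a gain of one, k = 1/(1 - live) and writes = live/(1 - live).
double compaction_write_overhead(double live) {
    if (live < 0.0 || live >= 1.0) {
        throw std::invalid_argument("live fraction must be in [0, 1)");
    }
    return live / (1.0 - live);
}
```

At 50% live data this model writes one segment per segment freed, and at 75% it writes three, which suggests why a controller would scale compaction shares with this ratio.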
Michael Litvak	1da1bb9d99	logstor: recovery: recover mixed segments using separator on recovery we may find mixed segments. recover them by adding them to a separator, reading all their records and writing them to the separator, and then flushing the separator.	2026-03-18 19:24:27 +01:00
Michael Litvak	b78cc787a6	logstor: wait for pending reads in compaction we free a segment from compaction after updating all live records in the segment to point to new locations in the index. we need to ensure there are no running operations that use the old locations before we free the segment.	2026-03-18 19:24:27 +01:00
Michael Litvak	600ec82bec	logstor: separator initial implementation of the separator. it converts "mixed" segments (segments that have records from different groups) into per-group segments. every write is written to the active segment and to a buffer in the active separator. the active separator has in-memory buffers by group. at some threshold number of segments we switch the active segment and separator atomically, and start flushing the separator. the separator is flushed by writing the buffers into new non-mixed segments, adding them to a compaction group, and freeing the mixed segments.	2026-03-18 19:24:27 +01:00
Michael Litvak	009fc3757a	logstor: compaction groups divide the segments in the compaction manager into compaction groups. compaction will compact only segments from a single compaction group at a time.	2026-03-18 19:24:27 +01:00
Michael Litvak	b3293f8579	logstor: cache files for read keep the files of all segments open for reading to improve read performance.	2026-03-18 19:24:26 +01:00
Michael Litvak	5a16980845	logstor: recovery: initial initial and basic recovery implementation. * find all files, read their segments and populate the index with the newest record for each key. * find which segments are used and build the usage histogram	2026-03-18 19:24:26 +01:00
Michael Litvak	bc9fc96579	logstor: add segment generation add a segment generation number that is incremented when the segment is reused and is written to every buffer that is written to the segment. this is useful for recovery.	2026-03-18 19:24:26 +01:00
Michael Litvak	719f7cca57	logstor: reserve segments for compaction reserve segments for compaction so it always has enough segments to run and doesn't get stuck. do the compaction writes into full new segments instead of the active segment.	2026-03-18 19:24:26 +01:00
Michael Litvak	521fca5c92	logstor: index: buckets divide the primary index into buckets, each bucket containing a btree. the bucket is determined by using bits from the key hash.	2026-03-18 19:24:26 +01:00
Michael Litvak	99c3b1998a	logstor: add buffer header add a buffer header to each write buffer we write, containing information that can be useful for recovery and reading.	2026-03-18 19:24:26 +01:00
Michael Litvak	ddd72a16b0	logstor: add group_id add group_id value to each log record that is passed with the mutation when writing it. the group_id will be used to group log records in segments, such that a segment will contain records only from a single group. this will be useful for tablet migration. we want each tablet to have its own segments with all its records, so we can migrate them efficiently by copying these segments. the group_id value is set to a value equivalent to the tablet id.	2026-03-18 19:24:26 +01:00
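A minimal sketch of hash-based bucket selection, assuming the bucket is taken from the top bits of the 20-byte key hash (the commit says only "bits from the key hash"). std::map stands in for the per-bucket B-tree; bucketed_index, bucket_of and the uint64 location type are hypothetical:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

// 20-byte index key, e.g. a RIPEMD-160 digest of the partition key.
using index_key = std::array<std::uint8_t, 20>;

// Hypothetical bucketed index: std::map stands in for the per-bucket B-tree.
class bucketed_index {
public:
    // bucket_bits must be in [0, 32]; small values such as 8 are typical.
    explicit bucketed_index(unsigned bucket_bits)
        : _bucket_bits(bucket_bits), _buckets(std::size_t(1) << bucket_bits) {}

    // Use the top `bucket_bits` bits of the (uniformly distributed) hash.
    std::size_t bucket_of(const index_key& key) const {
        if (_bucket_bits == 0) {
            return 0;
        }
        std::uint32_t prefix = (std::uint32_t(key[0]) << 24) | (std::uint32_t(key[1]) << 16)
                             | (std::uint32_t(key[2]) << 8) | std::uint32_t(key[3]);
        return prefix >> (32 - _bucket_bits);
    }

    void insert(const index_key& key, std::uint64_t location) {
        _buckets[bucket_of(key)][key] = location;
    }

    std::optional<std::uint64_t> find(const index_key& key) const {
        const auto& bucket = _buckets[bucket_of(key)];
        auto it = bucket.find(key);
        if (it == bucket.end()) {
            return std::nullopt;
        }
        return it->second;
    }

private:
    unsigned _bucket_bits;
    std::vector<std::map<index_key, std::uint64_t>> _buckets;
};
```

Because the keys are cryptographic hashes, any fixed bit range spreads keys evenly, so each per-bucket tree stays roughly 1/2^bits the size of a single global tree.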
Michael Litvak	08bea860ef	logstor: record generation add a record generation number for each record so we can compare records and find which one is newer.	2026-03-18 19:24:26 +01:00
Michael Litvak	28f820eb1c	logstor: generation utility basic utility for generation numbers that will be useful next. a generation number is an unsigned integer that can be incremented and compared even if it wraps around, assuming the values we compare were written around the same time.	2026-03-18 19:24:26 +01:00
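A minimal sketch of wraparound-safe comparison in the style of serial number arithmetic (RFC 1982), assuming a 16-bit counter; the generation struct and older() are hypothetical names, not logstor's API. It relies on C++20's defined modular conversion from unsigned to signed integers:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 16-bit generation number. Comparisons are valid only while the
// two values were issued fewer than 2^15 increments apart.
struct generation {
    std::uint16_t value = 0;

    generation next() const {
        return generation{static_cast<std::uint16_t>(value + 1)}; // 65535 wraps to 0
    }
};

// `a` is older than `b` if the wrapped difference (a - b) is negative when
// reinterpreted as a signed 16-bit value.
inline bool older(generation a, generation b) {
    auto diff = static_cast<std::uint16_t>(a.value - b.value);
    return static_cast<std::int16_t>(diff) < 0;
}
```

With this rule, generation 65535 is older than its successor 0, which a plain `<` would get backwards.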
Michael Litvak	5f649dd39f	logstor: use RIPEMD-160 for index key use a 20-byte hash function for the index key to make hash collisions very unlikely. we assume there are no hash collisions.	2026-03-18 19:24:26 +01:00
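A birthday-bound estimate illustrates why assuming no collisions is reasonable, treating RIPEMD-160 as an ideal random function over n distinct keys (an assumption; n = 2^40, roughly 10^12 keys, is chosen only for illustration):

```latex
P(\text{collision}) \le \binom{n}{2}\, 2^{-160} = \frac{n(n-1)}{2^{161}} < \frac{n^2}{2^{161}},
\qquad n = 2^{40} \implies P(\text{collision}) < 2^{-81}.
```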
Michael Litvak	a521bcbcee	test: add test_logstor.py add basic tests for key-value tables with logstor storage	2026-03-18 19:24:26 +01:00
Michael Litvak	1ae1f37ec1	api: add logstor compaction trigger endpoint add a new api endpoint that triggers logstor compaction.	2026-03-18 19:24:26 +01:00
Michael Litvak	2128b1b15c	replica: add logstor to db Add a single logstor instance in the database that is used for writing to and reading from tables with kv storage	2026-03-18 19:24:26 +01:00
Michael Litvak	9172cc172e	schema: add logstor cf property add a schema property for tables with logstor storage	2026-03-18 19:24:26 +01:00
Michael Litvak	0b1343747f	logstor: initial commit initial implementation of the logstor storage engine for key-value tables that supports writes, reads and basic compaction. main components: * logstor: this is the main interface to users that supports writing and reading back mutations, and manages the internal components. * index: the primary index in-memory that maps a key to a location on disk. * write buffer: writes go initially to a write buffer. it accumulates multiple records in a buffer and writes them to the segment manager in 4k sized blocks. * segment manager: manages the storage - files, segments, compaction. it manages file and segment allocation, and writes 4k aligned buffers to the active segment sequentially. it tracks the used space in each segment. compaction finds segments with low space usage, rewrites them into new segments, and frees the old segments.	2026-03-18 19:24:26 +01:00
Michael Litvak	27fd0c119f	db: disable tablet balancing with logstor initially logstor tables will not support tablet migrations, so disable tablet balancing if the experimental feature flag is set.	2026-03-18 19:24:26 +01:00
Michael Litvak	ed852a2af2	db: add logstor experimental feature flag add a new experimental feature flag for key-value tables with the new logstor storage engine.	2026-03-18 19:24:26 +01:00
Anna Stuchlik	88b98fac3a	doc: update the warning about shared dictionary training This commit updates the inadequate warning on the Advanced Internode (RPC) Compression page. The warning is replaced with a note about how training data is encrypted. Fixes https://github.com/scylladb/scylladb/issues/29109 Closes scylladb/scylladb#29111	2026-03-18 19:35:18 +02:00
Avi Kivity	46a6f8e1d3	Merge 'auth: add maintenance_socket_authorizer' from Dario Mirovic GRANT/REVOKE fails on the maintenance socket connections, because maintenance_auth_service uses allow_all_authorizer. allow_all_authorizer allows all operations, but not GRANT/REVOKE, because they make no sense in its context. This was observed as a PGO run failure in operations from the ./pgo/conf/auth.cql file. This patch introduces maintenance_socket_authorizer that supports the capabilities of default_authorizer ('CassandraAuthorizer') without needing authorization. Refs SCYLLADB-1070 This is an improvement, no need for backport. Closes scylladb/scylladb#29080 * github.com:scylladb/scylladb: test: use NetworkTopologyStrategy in maintenance socket tests test: use cleanup fixture in maintenance socket auth tests auth: add maintenance_socket_authorizer	2026-03-18 19:29:57 +02:00
Pavel Emelyanov	d6c01be09b	s3/client: Don't reconstruct regex on every parse_content_range call Make the pattern static const so it is compiled once at first call rather than on every Content-Range header parse. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#29054	2026-03-18 17:56:33 +02:00
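A minimal sketch of the pattern, assuming a Content-Range value of the form `bytes FIRST-LAST/TOTAL`; content_range and parse_content_range_sketch are hypothetical, and the real client's regex may differ. A function-local static is initialized once, thread-safely, on the first call:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <regex>
#include <string>

// Hypothetical result type; the real s3 client has its own.
struct content_range {
    std::uint64_t first;
    std::uint64_t last;
    std::uint64_t total;
};

// Parses e.g. "bytes 0-99/1000". The regex is compiled on the first call only,
// instead of being rebuilt for every parsed header.
std::optional<content_range> parse_content_range_sketch(const std::string& header) {
    static const std::regex pattern(R"(^bytes (\d+)-(\d+)/(\d+)$)");
    std::smatch m;
    if (!std::regex_match(header, m, pattern)) {
        return std::nullopt;
    }
    auto field = [&m](int i) { return static_cast<std::uint64_t>(std::stoull(m[i].str())); };
    return content_range{field(1), field(2), field(3)};
}
```

std::stoull throws std::out_of_range for numbers that do not fit in 64 bits, so a caller handling untrusted headers may want to catch it.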
Gleb Natapov	2d8b3e751b	view: drop unused v1 builder code	2026-03-18 17:45:40 +02:00
Gleb Natapov	77d3245e02	view: remove upgrade to raft code Since we no longer support upgrading from versions that do not support v2 of the view building code, we can remove the upgrade code and make sure we do not boot with the old builder version.	2026-03-18 17:45:40 +02:00
Tomasz Grabiec	4410e9c61a	sstables: mx: index_reader: Optimize parsing for no promoted index case It's a common case with small partition workloads.	2026-03-18 16:25:21 +01:00
Tomasz Grabiec	32f8609b89	vint: Use std::countl_zero() It handles 0, and could generate better code for that. On Broadwell architecture, it translates to a single instruction (LZCNT). We're still on Westmere, so it translates to BSR with a conditional move. Also, drop unnecessary casts and bit arithmetic, which saves a few instructions. Move to header so that it's inlined in parsers.	2026-03-18 16:25:21 +01:00
Tomasz Grabiec	6017688445	test: sstable_partition_index_cache_test: Validate scenario of pages with sparse promoted index placement	2026-03-18 16:25:21 +01:00
Tomasz Grabiec	f55bb154ec	sstables: mx: index_reader: Amortize partition key storage This change reduces the cost of partition index page construction and LSA migration. This is achieved by several things working together: - index entries don't store keys as separate small objects (managed_bytes). They are written into one managed_bytes fragmented storage, and entries hold an offset into it. Before, we paid 16 bytes for managed_bytes plus LSA descriptor for the storage (1 byte) plus back-reference in the storage (8 bytes), so 25 bytes. Now we only pay 4 bytes for the offset. If keys are 16 bytes, that's a reduction from 31 bytes to 20 bytes per key. - index entries and key storage are now trivially moveable, so LSA migration can use memcpy(), which amortizes the cost per key. LSA eviction is now trivial and constant time for the whole page regardless of the number of entries. Page eviction dropped from 14 us to 1 us. This improves throughput in a CPU-bound miss-heavy read workload where the partition index doesn't fit in memory.
scylla perf-simple-query -c1 -m200M --partitions=1000000 Before: 15328.25 tps (150.0 allocs/op, 14.1 logallocs/op, 45.4 tasks/op, 286769 insns/op, 218134 cycles/op, 0 errors) 15279.01 tps (149.9 allocs/op, 14.1 logallocs/op, 45.3 tasks/op, 287696 insns/op, 218637 cycles/op, 0 errors) 15347.78 tps (149.7 allocs/op, 14.1 logallocs/op, 45.3 tasks/op, 285851 insns/op, 217795 cycles/op, 0 errors) 15403.68 tps (149.6 allocs/op, 14.1 logallocs/op, 45.2 tasks/op, 285111 insns/op, 216984 cycles/op, 0 errors) 15189.47 tps (150.0 allocs/op, 14.1 logallocs/op, 45.5 tasks/op, 289509 insns/op, 219602 cycles/op, 0 errors) 15295.04 tps (149.8 allocs/op, 14.1 logallocs/op, 45.3 tasks/op, 288021 insns/op, 218545 cycles/op, 0 errors) 15162.01 tps (149.8 allocs/op, 14.1 logallocs/op, 45.4 tasks/op, 291265 insns/op, 220451 cycles/op, 0 errors) After: 21620.18 tps (148.4 allocs/op, 13.4 logallocs/op, 43.7 tasks/op, 176817 insns/op, 153183 cycles/op, 0 errors) 20644.03 tps (149.8 allocs/op, 13.5 logallocs/op, 44.3 tasks/op, 187941 insns/op, 160409 cycles/op, 0 errors) 20588.06 tps (150.1 allocs/op, 13.5 logallocs/op, 44.5 tasks/op, 188090 insns/op, 160818 cycles/op, 0 errors) 20789.29 tps (149.5 allocs/op, 13.5 logallocs/op, 44.2 tasks/op, 186495 insns/op, 159382 cycles/op, 0 errors) 20977.89 tps (149.5 allocs/op, 13.4 logallocs/op, 44.2 tasks/op, 183969 insns/op, 158140 cycles/op, 0 errors) 21125.34 tps (149.1 allocs/op, 13.4 logallocs/op, 44.1 tasks/op, 183204 insns/op, 156925 cycles/op, 0 errors) 21244.42 tps (148.6 allocs/op, 13.4 logallocs/op, 43.8 tasks/op, 181276 insns/op, 155973 cycles/op, 0 errors) Mostly because the index now fits in memory. When it doesn't, the benefits are still visible due to lower LSA overhead.	2026-03-18 16:25:21 +01:00
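The offset-based key layout can be sketched with one contiguous buffer standing in for the fragmented managed_bytes storage; key_arena and its members are hypothetical, and a real page would also need LSA-aware allocation. Each entry pays only a 4-byte end offset, and keys are no longer individually allocated objects:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical sketch: all keys of a page share one buffer; each entry keeps
// only a 4-byte end offset instead of owning a separately allocated key.
class key_arena {
public:
    // Append a key and return its entry index.
    std::size_t append(std::string_view key) {
        _storage.append(key);
        _ends.push_back(static_cast<std::uint32_t>(_storage.size()));
        return _ends.size() - 1;
    }

    // Key i spans [end of key i-1, end of key i). The view is invalidated by
    // the next append, which may reallocate the storage.
    std::string_view key(std::size_t i) const {
        std::uint32_t begin = i == 0 ? 0 : _ends[i - 1];
        return std::string_view(_storage).substr(begin, _ends[i] - begin);
    }

    std::size_t size() const { return _ends.size(); }

private:
    std::string _storage;             // stand-in for fragmented managed_bytes storage
    std::vector<std::uint32_t> _ends; // 4 bytes of bookkeeping per key
};
```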
Tomasz Grabiec	1452e92567	managed_bytes: Hoist write_fragmented() to common header	2026-03-18 16:25:20 +01:00
Tomasz Grabiec	75e6412b1c	utils: managed_vector: Use std::uninitialized_move() to move objects It's shorter, and is supposed to be optimized for trivially-moveable types. Important for managed_vector<index_entry>, which can have lots of elements.	2026-03-18 16:25:20 +01:00
Tomasz Grabiec	50dc7c6dd8	sstables: mx: index_reader: Keep promoted_index info next to index_entry Densely populated pages have no promoted index (small partitions), so we can save space in such workloads by keeping the promoted index in a separate vector. For workloads which do have a promoted index, pages have only one partition. There aren't many such pages and they are long-lived, so the extra allocation of the vector is amortized. The promoted_index class is removed and replaced with the equivalent parsed_promoted_index_entry for simplicity. Because it's removed, make_cursor() is moved into the index_reader class. Reducing the size of index_entry is important for performance if pages are densely populated. It helps to reduce LSA allocator pressure and speeds up compaction/eviction. This change, combined with the earlier change "Shave-off 16 bytes from index_entry by using raw_token", gives a significant improvement in throughput in a perf_simple_query run where the index doesn't fit in memory: scylla perf-simple-query -c1 -m200M --partitions=1000000 Before: 9714.78 tps (170.9 allocs/op, 16.9 logallocs/op, 55.3 tasks/op, 494788 insns/op, 343920 cycles/op, 0 errors) 9603.13 tps (171.6 allocs/op, 17.0 logallocs/op, 55.6 tasks/op, 502358 insns/op, 348344 cycles/op, 0 errors) 9621.43 tps (171.9 allocs/op, 17.0 logallocs/op, 55.8 tasks/op, 500612 insns/op, 347508 cycles/op, 0 errors) 9597.75 tps (171.6 allocs/op, 17.0 logallocs/op, 55.6 tasks/op, 501428 insns/op, 348604 cycles/op, 0 errors) 9615.54 tps (171.6 allocs/op, 16.9 logallocs/op, 55.6 tasks/op, 501313 insns/op, 347935 cycles/op, 0 errors) 9577.03 tps (171.8 allocs/op, 17.0 logallocs/op, 55.7 tasks/op, 503283 insns/op, 349251 cycles/op, 0 errors) After: 15328.25 tps (150.0 allocs/op, 14.1 logallocs/op, 45.4 tasks/op, 286769 insns/op, 218134 cycles/op, 0 errors) 15279.01 tps (149.9 allocs/op, 14.1 logallocs/op, 45.3 tasks/op, 287696 insns/op, 218637 cycles/op, 0 errors) 15347.78 tps (149.7 allocs/op, 14.1 logallocs/op, 45.3 tasks/op, 285851 insns/op, 217795 cycles/op, 0 errors) 15403.68 tps (149.6 allocs/op, 14.1 logallocs/op, 45.2 tasks/op, 285111 insns/op, 216984 cycles/op, 0 errors) 15189.47 tps (150.0 allocs/op, 14.1 logallocs/op, 45.5 tasks/op, 289509 insns/op, 219602 cycles/op, 0 errors) 15295.04 tps (149.8 allocs/op, 14.1 logallocs/op, 45.3 tasks/op, 288021 insns/op, 218545 cycles/op, 0 errors) 15162.01 tps (149.8 allocs/op, 14.1 logallocs/op, 45.4 tasks/op, 291265 insns/op, 220451 cycles/op, 0 errors)	2026-03-18 16:25:20 +01:00
Tomasz Grabiec	5e228a8387	sstables: mx: index_reader: Extract partition_index_page::clear_gently() There will be more elements to clear. And partition_index_page should know how to clear itself.	2026-03-18 16:25:20 +01:00
Tomasz Grabiec	2d77e4fc28	sstables: mx: index_reader: Shave-off 16 bytes from index_entry by using raw_token The std::optional<> adds 8 bytes. And dht::token adds 8 bytes due to _kind, which in this case is always kind::key. The size changed from 56 to 48 bytes.	2026-03-18 16:25:20 +01:00
Tomasz Grabiec	e9c98274b5	sstables: mx: index_reader: Reduce allocation_section overhead during index page parsing by batching allocation If the page has many entries, we continuously enter and leave the allocating section for every key. This can be avoided by batching LSA operations for the whole page, after collecting all the entries. Later optimizations will also build on this, where we will allocate fragmented storage for keys in LSA using a single managed_bytes constructor. This alone brings only a minor improvement, but it does reduce LSA allocations, probably due to less frequent memory reclamation: scylla perf-simple-query -c1 -m200M --duration 6000 --partitions=1000000 Before: 9560.42 tps (172.2 allocs/op, 19.6 logallocs/op, 57.7 tasks/op, 567741 insns/op, 345158 cycles/op, 0 errors) 9445.95 tps (173.1 allocs/op, 19.7 logallocs/op, 58.1 tasks/op, 579075 insns/op, 352173 cycles/op, 0 errors) 9576.75 tps (172.2 allocs/op, 19.6 logallocs/op, 57.6 tasks/op, 572004 insns/op, 347373 cycles/op, 0 errors) 9597.16 tps (172.2 allocs/op, 19.6 logallocs/op, 57.6 tasks/op, 569615 insns/op, 346618 cycles/op, 0 errors) 9454.07 tps (173.5 allocs/op, 19.8 logallocs/op, 58.3 tasks/op, 579213 insns/op, 351569 cycles/op, 0 errors) After: 9562.21 tps (172.0 allocs/op, 17.0 logallocs/op, 55.8 tasks/op, 499225 insns/op, 347832 cycles/op, 0 errors) 9480.20 tps (172.3 allocs/op, 17.0 logallocs/op, 55.9 tasks/op, 507271 insns/op, 350640 cycles/op, 0 errors) 9512.42 tps (172.1 allocs/op, 17.0 logallocs/op, 55.9 tasks/op, 504247 insns/op, 350392 cycles/op, 0 errors) 9498.45 tps (172.4 allocs/op, 17.1 logallocs/op, 55.9 tasks/op, 505765 insns/op, 350320 cycles/op, 0 errors) 9076.30 tps (173.5 allocs/op, 17.1 logallocs/op, 56.5 tasks/op, 512791 insns/op, 354792 cycles/op, 0 errors) 9542.62 tps (171.9 allocs/op, 17.0 logallocs/op, 55.8 tasks/op, 502532 insns/op, 348922 cycles/op, 0 errors)	2026-03-18 16:25:20 +01:00
Tomasz Grabiec	0e0f9f41b3	sstables: mx: index_reader: Keep index_entry directly in the vector Partition index entries are relatively small, and if the workload has small partitions, index pages have a lot of elements. Currently, index entries are indirected via managed_ref, which causes increased cost of LSA eviction and compaction. This patch amortizes this cost by storing them directly in the managed_chunked_vector. This gives about a 23% improvement in throughput in perf-simple-query for a workload where the index doesn't fit in memory: scylla perf-simple-query -c1 -m200M --duration 6000 --partitions=1000000 Before: 7774.96 tps (166.0 allocs/op, 521.7 logallocs/op, 54.0 tasks/op, 802428 insns/op, 430457 cycles/op, 0 errors) 7511.08 tps (166.1 allocs/op, 527.2 logallocs/op, 54.0 tasks/op, 804185 insns/op, 430752 cycles/op, 0 errors) 7740.44 tps (166.3 allocs/op, 526.2 logallocs/op, 54.2 tasks/op, 805347 insns/op, 432117 cycles/op, 0 errors) 7818.72 tps (165.2 allocs/op, 517.6 logallocs/op, 53.7 tasks/op, 794965 insns/op, 427751 cycles/op, 0 errors) 7865.49 tps (165.1 allocs/op, 513.3 logallocs/op, 53.6 tasks/op, 788898 insns/op, 425171 cycles/op, 0 errors) After: 9560.42 tps (172.2 allocs/op, 19.6 logallocs/op, 57.7 tasks/op, 567741 insns/op, 345158 cycles/op, 0 errors) 9445.95 tps (173.1 allocs/op, 19.7 logallocs/op, 58.1 tasks/op, 579075 insns/op, 352173 cycles/op, 0 errors) 9576.75 tps (172.2 allocs/op, 19.6 logallocs/op, 57.6 tasks/op, 572004 insns/op, 347373 cycles/op, 0 errors) 9597.16 tps (172.2 allocs/op, 19.6 logallocs/op, 57.6 tasks/op, 569615 insns/op, 346618 cycles/op, 0 errors) 9454.07 tps (173.5 allocs/op, 19.8 logallocs/op, 58.3 tasks/op, 579213 insns/op, 351569 cycles/op, 0 errors) Disabling the partition index doesn't improve the throughput beyond that.	2026-03-18 16:25:20 +01:00
Tomasz Grabiec	b6bfdeb111	dht: Introduce raw_token Most tokens stored in data structures are for key-scoped tokens, and we don't need to pay for token::kind storage.	2026-03-18 16:25:20 +01:00
Tomasz Grabiec	3775593e53	test: perf_simple_query: Add 'sstable-format' command-line option	2026-03-18 16:25:20 +01:00
Tomasz Grabiec	6ee9bc63eb	test: perf_simple_query: Add 'sstable-summary-ratio' command-line option	2026-03-18 16:25:20 +01:00
Tomasz Grabiec	38d130d9d0	test: perf-simple-query: Add option to disable index cache	2026-03-18 16:25:20 +01:00
Tomasz Grabiec	5ee61f067d	test: cql_test_env: Respect enable-index-cache config Mirrors the code in main.cc	2026-03-18 16:25:20 +01:00
Aleksandra Martyniuk	2d16083ba6	tasks: fix indentation	2026-03-18 15:37:24 +01:00
Aleksandra Martyniuk	1fbf3a4ba1	tasks: do not fail the wait request if rpc fails During decommission, we first mark a topology request as done, then shut down the node, and in the following steps we remove the node from the topology. Thus, a finished request does not imply that the node has been removed from the topology. Due to that, in node_ops_virtual_task::wait, while gathering children from the whole cluster, we may hit a connection exception, because the node is still in the topology even though it is down. Modify the get_children method to ignore the exception and warn about the failure instead.	2026-03-18 15:37:24 +01:00
Aleksandra Martyniuk	d4fdeb4839	tasks: pass token_metadata_ptr to task_manager::virtual_task::impl::get_children In get_children we get the vector of alive nodes with get_nodes. Yet, between this and sending rpcs to those nodes there might be a preemption. Currently, the liveness of a node is checked once again before the rpcs (only with the gossiper, not with the topology, unlike get_nodes). Modify get_children so that it keeps a token_metadata_ptr, preventing the topology from changing between get_nodes and the rpcs. Remove test_get_children, as it checked that the get_children method does not fail if a node goes down after get_nodes, which can no longer happen.	2026-03-18 15:37:24 +01:00
Calle Wilund	0013f22374	memtable_test::memtable_flush_period: Change sleep to use injection signal instead Fixes: SCYLLADB-942 Adds an injection signal _from_ table::seal_active_memtable to allow us to reliably wait for flushing. And does so. Closes scylladb/scylladb#29070	2026-03-18 16:23:13 +02:00
Botond Dénes	ae17596c2a	Merge 'Demote log level on split failure during shutdown' from Raphael Raph Carvalho Since commit `509f2af8db`, gate_closed_exception can be triggered for ongoing split during shutdown. The commit is correct, but it causes split failure on shutdown to log an error, which causes CI instability. Previously, aborted_exception would be triggered instead which is logged as warning. Let's do the same. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-951. Fixes https://github.com/scylladb/scylladb/issues/24850. Only 2026.1 is affected. Closes scylladb/scylladb#29032 * github.com:scylladb/scylladb: replica: Demote log level on split failure during shutdown service: Demote log level on split failure during shutdown	2026-03-18 16:21:05 +02:00
Pavel Emelyanov	8b1ca6dcd6	database: Rate limit all tokens from a range The limiter scans ranges to decide whether or not to rate-limit the query. However, when considering each range only the front one's token is accounted. This looks like a misprint. The limiter was introduced in `cc9a2ad41f` Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#29050	2026-03-18 13:50:48 +01:00
Pavel Emelyanov	d68c92ec04	test: Replace a bunch of ternary operators with an if-else block A followup of the merge of two test cases that happened in the previous patch. Both used `foo = N if domain == bar else M` to evaluate the parameters for topology. Using if-else block makes it immediately obvious which topology and scope apply for each domain value without having to evaluate multiple inline conditionals. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-18 13:08:36 +03:00
Pavel Emelyanov	b1d4fc5e6e	test: Squash test_restore_primary_replica_same\|different_domain tests The two tests differ only in the way they set up the topology for the cluster and the post-restore checks against the resulting streams. The merge happens with the help of a "scope_is_same" boolean parameter and corresponding updates in the topology setup and post-checks. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-18 13:08:36 +03:00
Pavel Emelyanov	21c603a79e	test: Use the same regexp in test_restore_primary_replica_different\|same_domain-s The one in the "different domain" test is simpler because the test performs fewer checks. The next patch will merge both tests, and making the regexp-s look identical makes the merge even smoother. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-18 13:07:09 +03:00
Emil Maskovsky	34f3916e7d	.github: update test instructions for unified pytest runner Update test running instructions to reflect unified pytest-based runner. The test.py now requires full test paths with file extensions for both C++ and Python tests. No backport: The change is only relevant for recent test.py changes in master. Closes scylladb/scylladb#29062	2026-03-18 09:28:28 +01:00
Marcin Maliszkiewicz	04bf631d7f	auth: switch query_attribute_for_all to use cache	2026-03-18 09:06:20 +01:00
Marcin Maliszkiewicz	cf578fd81a	auth: switch get_attribute to use cache	2026-03-18 09:06:20 +01:00
Marcin Maliszkiewicz	06d16b6ea2	auth: cache: add heterogeneous map lookups Some callers have only string_view role name, they shouldn't need to allocate sstring to do the lookup.	2026-03-18 09:06:20 +01:00
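A minimal sketch of heterogeneous lookup with a transparent comparator; std::string stands in for seastar::sstring, and role_map, role_record and find_role are hypothetical. Unordered containers support the same idea since C++20 when both the hash and the key-equality functor are transparent:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <string_view>

// Hypothetical cached role data.
struct role_record {
    bool can_login = false;
};

// std::less<> is transparent, so find() accepts a std::string_view directly
// and no temporary std::string is allocated just to perform the lookup.
using role_map = std::map<std::string, role_record, std::less<>>;

const role_record* find_role(const role_map& roles, std::string_view name) {
    auto it = roles.find(name);
    return it == roles.end() ? nullptr : &it->second;
}
```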
Marcin Maliszkiewicz	7fdb1118f5	auth: switch query_all to use cache	2026-03-18 09:06:20 +01:00
Marcin Maliszkiewicz	fca11c5a21	auth: switch query_all_directly_granted to use cache	2026-03-18 09:06:20 +01:00
Marcin Maliszkiewicz	6f682f7eb1	auth: cache: add ability to go over all roles This is needed to implement auth service api where we list all roles.	2026-03-18 09:06:20 +01:00
Marcin Maliszkiewicz	61952cd985	raft: service: reload auth cache before service levels Since service levels depend on auth data, and not the other way around, we need to ensure a proper loading order.	2026-03-18 09:06:20 +01:00
Marcin Maliszkiewicz	c4cfb278bc	service: raft: move update_service_levels_effective_cache check The auth::cache::includes_table function also covers role_members and role_attributes. The existing check was removed because it blocked these tables from triggering necessary cache updates. While previously non-critical (due to unused attributes and table coupling), maintaining a correct cache is essential for upcoming changes.	2026-03-18 09:06:20 +01:00
Benny Halevy	c2a6d1e930	test/boost/database_test: add test_snapshot_ctl_details_exception_handling Verify that the directory listers opened by get_snapshot_details are properly closed when handling an (injected) exception. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2026-03-18 09:37:44 +02:00
Benny Halevy	6dc4ea766b	table: get_snapshot_details: fix indentation inside try block Whitespace-only change: indent the loop body one level inside the try block added in the previous commit. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2026-03-18 09:28:50 +02:00
Benny Halevy	b09d45b89a	table: per-snapshot get_snapshot_details: fix typo in comment The comment says the snapshot directory may contain a `schema.sql` file, but the code treats `schema.cql` as the special-case schema file. Reported-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2026-03-18 09:27:40 +02:00
Benny Halevy	580cc309d2	table: per-snapshot get_snapshot_details: always close lister using try/catch Since this is a coroutine, we cannot just use deferred_close; rather, we need to catch an error, close the lister, and then return the error, if applicable. Fixes: SCYLLADB-1013 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2026-03-18 09:27:23 +02:00
Benny Halevy	78c817f71e	table: get_snapshot_details: always close lister using deferred_close Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2026-03-18 09:26:26 +02:00
Dario Mirovic	71e6918f28	test: use NetworkTopologyStrategy in maintenance socket tests NetworkTopologyStrategy is the preferred choice. We should not use SimpleStrategy anymore. This patch changes the topology strategy for all the maintenance socket tests. Refs SCYLLADB-1070	2026-03-17 20:20:47 +01:00
Dario Mirovic	278535e4e3	test: use cleanup fixture in maintenance socket auth tests Add a cql_clusters pytest fixture that tracks CQL driver Cluster objects and shuts them down automatically after test completion. This replaces manual shutdown() calls at the end of each test. Also consolidate shutdown() calls in retry helpers into finally blocks for consistent cleanup. Refs SCYLLADB-1070	2026-03-17 20:15:30 +01:00
Dario Mirovic	2e4b72c6b9	auth: add maintenance_socket_authorizer GRANT/REVOKE fails on the maintenance socket connections, because maintenance_auth_service uses allow_all_authorizer. allow_all_authorizer allows all operations, but not GRANT/REVOKE, because they make no sense in its context. This was observed as a PGO run failure in operations from the ./pgo/conf/auth.cql file. This patch introduces maintenance_socket_authorizer that supports the capabilities of default_authorizer ('CassandraAuthorizer') without needing authorization. Refs SCYLLADB-1070	2026-03-17 19:19:41 +01:00
Botond Dénes	172c786079	Merge 'perf-alternator: wait for alternator port before running workload' from Marcin Maliszkiewicz This patch is mostly for the purpose of running the pgo CI job. We may receive a connection error if the asyncio.sleep(5) in pgo.py is not a sufficient wait. In pgo.py we do wait for a port, but only for CQL; in any case, it's better to have a high-level check than to try to wait for the alternator port there. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1071 Backport: 2026.1 - it failed on CI for that build Closes scylladb/scylladb#29063 * github.com:scylladb/scylladb: perf: add abort_source support to wait-for-port loops perf-alternator: wait for alternator port before running workload	2026-03-17 18:38:11 +02:00
Botond Dénes	5d868dcc55	Merge 's3_client: fix s3::range max value for object size' from Ernest Zaslavsky - fix s3::range max value for object size which is 50TiB and not 5. - refactor constants to make it accessible for all interested parties, also reuse these constants in tests No need to backport, doubt we will encounter an object larger than 5TiB Closes scylladb/scylladb#28601 * github.com:scylladb/scylladb: s3_client: reorganize tests in part_size_calculation_test s3_client: switch using s3 limits constants in tests s3_client: fix the s3::range max object size s3_client: remove "aws" prefix from object limits constants s3_client: make s3 object limits accessible	2026-03-17 16:34:42 +02:00
Anna Stuchlik	f4a6bb1885	doc: remove the Open Source Example from Installation This commit replaces the Open Source example in the Installation section for CentOS. We updated the example for Ubuntu, but not for CentOS. We don't want to have any Open Source information in the docs. Fixes https://github.com/scylladb/scylladb/issues/29087	2026-03-17 14:54:32 +01:00
Anna Stuchlik	95bc8911dd	doc: replace http with https in the installation instructions Fixes https://github.com/scylladb/scylladb/issues/17227	2026-03-17 14:46:16 +01:00
Dawid Mędrek	a8dd13731f	Merge 'Improve debuggability of test/cluster/test_data_resurrection_in_memtable.py' from Botond Dénes This test was observed to fail in CI recently but there is not enough information in the logs to figure out what went wrong. This PR makes a few improvements to make the next investigation easier, should it be needed: * storage-service: add table name to mutation write failure error messages. * database: the `database_apply` error injection used to cause trouble, catching writes to bystander tables, making tests flaky. To eliminate this, it gained a filter to apply only to non-system keyspaces. Unfortunately, this still allows it to catch writes to the trace tables. While this should not fail the test, it reduces observability, as some traces disappear. Improve this error injection to apply only to the selected table. Also merge it with the `database_apply_wait` error injection, to streamline the code a bit. * test/test_data_resurrection_in_memtable.py: dump data from the table before the checks for expected data, so if the checks fail, the data in the table is known. Refs: SCYLLADB-812 Refs: SCYLLADB-870 Fixes: SCYLLADB-1050 (by restricting `database_apply` error injection, so it doesn't affect writes to system traces) Backport: test related improvement, no backport Closes scylladb/scylladb#28899 * github.com:scylladb/scylladb: test/cluster/test_data_resurrection_in_memtable.py: dump rows before check replica/database: consolidate the two database_apply error injections service/storage_proxy: add name of table to error message for write errors	2026-03-17 13:35:19 +01:00
Botond Dénes	318aa07158	Merge ' test/alternator: use module-scope fixtures in test_streams.py ' from Nadav Har'El Previously, all stream-table fixtures in test_streams.py used scope="function", forcing a fresh table to be created for every test, slowing down the test a bit (though not much), and discouraging writing small new tests. This was a workaround for a DynamoDB quirk (that Alternator doesn't have): LATEST shard iterators have a time slack and may point slightly before the true stream head, causing leftover events from a previous test to appear in the next test's reads. The first two patches in this series fix small problems that turn up once we start sharing test tables in test_streams.py. The final patch fixes the "LATEST" problem and enables sharing the test table by using "module" scope fixtures instead of "function". After this series, test_streams.py run time went down a bit, from 20.2 seconds to 17.7 seconds. Closes scylladb/scylladb#28972 * github.com:scylladb/scylladb: test/alternator: speed up test_streams.py by using module-scope fixtures test/alternator: test_streams.py don't use fixtures in 4 tests test/alternator: fix do_test() in test_streams.py	2026-03-17 13:56:16 +02:00
Ernest Zaslavsky	7f597aca67	cmake: fix broken build Add raft_util.idl.hh to cmake to generate the code properly Closes scylladb/scylladb#29055	2026-03-17 10:35:34 +01:00
Botond Dénes	dbe70cddca	test/boost/querier_cache_test: make test_time_based_cache_eviction less sensitive to timing This test relies on the cache entry being evicted after 200ms past the TTL. This may not happen on a busy CI machine. Make the test less reliant on timing by using eventually_true(). Simplify the test by dropping the second entry, it doesn't add anything to the test. Fixes: SCYLLADB-811 Closes scylladb/scylladb#28958	2026-03-17 10:32:23 +01:00
Botond Dénes	0fd51c4adb	test/nodetool: rest_api_mock_server: add retry for status code 404 This fixture starts the mock server and immediately connects to it to set up the expected requests. The connection attempt might be too early, so there is a retry loop with a timeout. The loop currently checks only for requests.exceptions.ConnectionError. We've seen a case where the connection is successful but the request fails with 404: the mock started the server but hadn't set up the routes yet. Add a retry for HTTP 404 to handle this. Fixes: SCYLLADB-966 Closes scylladb/scylladb#29003	2026-03-17 10:30:23 +01:00
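The retry policy described above can be sketched in Python as follows. This is a minimal illustration, not the actual `rest_api_mock_server` fixture: `call_with_retry`, the `fetch` callable, and the timeout values are hypothetical, and the built-in `ConnectionError` stands in for `requests.exceptions.ConnectionError`.

```python
import time


def call_with_retry(fetch, timeout=5.0, interval=0.05):
    """Call fetch() until it returns a non-404 status or the timeout expires.

    `fetch` is a hypothetical callable returning an HTTP status code. It may
    raise ConnectionError while the mock server is not listening yet.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            status = fetch()
            if status != 404:
                return status
            # Server is listening, but its routes are not registered yet: retry.
        except ConnectionError:
            # Server socket is not accepting connections yet: retry.
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError("mock server did not become ready in time")
        time.sleep(interval)
```

Treating 404 as retryable is only reasonable here because the mock server is expected to register its routes shortly after it starts listening; against a fully started server, a 404 would be a genuine error.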
Asias He	6cb263bab0	repair: Prevent CPU stall during cross-shard row copy and destruction When handling `repair_stream_cmd::end_of_current_rows`, passing the foreign list directly to `put_row_diff_handler` triggered a massive synchronous deep copy on the destination shard. Additionally, destroying the list triggered a synchronous deallocation on the source shard. This blocked the reactor and triggered the CPU stall detector. This commit fixes the issue by introducing `clone_gently()` to copy the list elements one by one, and leveraging the existing `utils::clear_gently()` to destroy them. Both utilize `seastar::coroutine::maybe_yield()` to allow the reactor to breathe during large cross-shard transfers and cleanups. Fixes SCYLLADB-403 Closes scylladb/scylladb#28979	2026-03-17 11:05:15 +02:00
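A rough Python analogue of the "gentle" copy and destruction pattern, using `asyncio` in place of the Seastar reactor. `clone_gently` and `clear_gently` reuse the names from the commit, but these functions are hypothetical sketches, not the C++ code. In particular, Seastar's `maybe_yield()` yields only when the reactor requests preemption, whereas this sketch simply yields every `yield_every` elements.

```python
import asyncio
import copy


async def clone_gently(rows, yield_every=64):
    """Deep-copy rows one element at a time, yielding to the event loop
    every `yield_every` elements so other tasks are not starved."""
    out = []
    for i, row in enumerate(rows, 1):
        out.append(copy.deepcopy(row))
        if i % yield_every == 0:
            await asyncio.sleep(0)  # cooperative yield point
    return out


async def clear_gently(rows, yield_every=64):
    """Remove rows one at a time from the end, yielding periodically
    instead of releasing the whole list in a single step."""
    removed = 0
    while rows:
        rows.pop()
        removed += 1
        if removed % yield_every == 0:
            await asyncio.sleep(0)  # cooperative yield point
```

The point of the pattern is that the total work is unchanged; it is only split into slices so that a long copy or teardown no longer monopolizes the event loop.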
Pavel Emelyanov	9fe19ec9d9	sstables: Fix object storage lister not resetting position in batch vector The lister loop in get() pre-fetches records in batches and keeps them in the _info vector, iterating over it with the help of the _pos cursor. When the vector is re-read, the cursor must be reset too. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-17 10:32:42 +03:00
Pavel Emelyanov	1a6a7647c6	sstables: Fix object storage lister skipping entries when filter is active The lister loop in the get() method looks weird: it uses a do-while(false) loop and calls continue; inside it when the filter asks to skip an entry. Since continue; in a do-while(false) loop jumps to the false condition and exits the loop, skipping an entry aborts the whole listing and reports EOF, which is not what's supposed to happen. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-17 10:32:40 +03:00
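The two lister fixes above can be illustrated with a small Python sketch of a batched lister. This is a hypothetical analogue of the intended loop shape, not the C++ `get()` implementation: `Lister`, `fetch_batch`, and `keep` are invented names. It shows the two invariants the commits restore: the cursor is reset whenever the batch is refilled, and a filtered-out entry moves on to the next entry instead of ending the listing.

```python
class Lister:
    """Hypothetical batched lister that walks pre-fetched entries with a cursor."""

    def __init__(self, fetch_batch, keep=lambda name: True):
        # fetch_batch(token) -> (entries, next_token); next_token is None at the end.
        self._fetch_batch = fetch_batch
        self._keep = keep
        self._info = []
        self._pos = 0
        self._token = ""  # "" means listing has not started yet
        self._done = False

    def get(self):
        """Return the next entry accepted by the filter, or None at EOF."""
        while True:
            if self._pos == len(self._info):
                if self._done:
                    return None
                self._info, self._token = self._fetch_batch(self._token)
                self._pos = 0  # reset the cursor whenever the batch is refilled
                if self._token is None:
                    self._done = True
                continue
            entry = self._info[self._pos]
            self._pos += 1
            if not self._keep(entry):
                continue  # skip this entry and keep scanning; do not end the listing
            return entry
```

Unlike `continue` in a C++ `do { ... } while (false)` loop, `continue` inside Python's `while True` re-enters the loop body, which is the behavior the filter path needs.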
Botond Dénes	035aa90d4b	Merge 'Alternator: add per-table batch latency metrics and test coverage' from Amnon Heiman This series fixes a metrics visibility gap in Alternator and adds regression coverage. Until now, BatchGetItem and BatchWriteItem updated global latency histograms but did not consistently update per-table latency histograms. As a result, table-level latency dashboards could miss batch traffic. It updates the batch read/write paths to compute request duration once and record it in both global and per-table latency metrics. Add the missing tests, including a metric-agnostic helper and a dedicated per-table latency test that verifies latency counters increase for item and batch operations. This change is metrics-only (no API/behavior change for requests) and improves observability consistency between global and per-table views. Fixes #28721 We assume the alternator per-table metrics exist, but the batch ones are not updated Closes scylladb/scylladb#28732 * github.com:scylladb/scylladb: test(alternator): add per-table latency coverage for item and batch ops alternator: track per-table latency for batch get/write operations	2026-03-16 17:18:00 +02:00
Michał Hudobski	40d180a7ef	docs: update vector search filtering to reflect primary key support only Remove outdated references to filtering on columns provided in the index definition, and remove the note about equal relations (= and IN) being the only supported operations. Vector search filtering currently supports WHERE clauses on primary key columns only. Closes scylladb/scylladb#28949	2026-03-16 17:16:16 +02:00
Botond Dénes	9de8d6798e	Merge 'reader_concurrency_semaphore: skip preemptive abort for permits waiting for memory' from Łukasz Paszkowski Permits in the `waiting_for_memory` state represent already-executing reads that are blocked on memory allocation. Preemptively aborting them is wasteful -- these reads have already consumed resources and made progress, so they should be allowed to complete. Restrict the preemptive abort check in maybe_admit_waiters() to only apply to permits in the `waiting_for_admission` state, and tighten the state validation in `on_preemptive_aborted()` accordingly. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1016 Backport not needed. The commit introducing replica load shedding is not part of 2026.1 Closes scylladb/scylladb#29025 * github.com:scylladb/scylladb: reader_concurrency_semaphore: skip preemptive abort for permits waiting for memory reader_concurrency_semaphore_test: detect memory leak on preemptive abort of waiting_for_memory permit	2026-03-16 17:14:25 +02:00
Marcin Maliszkiewicz	9318c80203	perf: add abort_source support to wait-for-port loops Check abort_source on each retry iteration in wait_for_alternator and wait_for_cql so the wait can be interrupted on shutdown. Didn't use sleep_abortable as the sleep is very short anyway.	2026-03-16 16:14:10 +01:00
Calle Wilund	a5df2e79a7	storage_service: Wait for snapshot/backup before decommission Fixes: SCYLLADB-244 Disables snapshot control such that any active ops finish/fail before proceeding with decommission. Note: snapshot control provided as argument, not member ref due to storage_service being used from both main and cql_test_env. (The latter has no snapshot_ctl to provide). Could do the snapshot lockout on API level, but want to do pre-checks before this. Note: this just disables backup/snapshot fully. Could re-enable after decommission, but this seems somewhat pointless. v2: * Add log message to snapshot shutdown * Make test use log waiting instead of timeouts Closes scylladb/scylladb#28980	2026-03-16 17:12:57 +02:00
Marcin Maliszkiewicz	edf0148bee	perf-alternator: wait for alternator port before running workload This patch is mostly for the purpose of running the pgo CI job. We may receive a connection error if asyncio.sleep(5) in pgo.py is not a sufficient wait. In pgo.py we do wait for a port, but only for CQL; anyway, it's better to have a high-level check here than to try waiting for the alternator port there.	2026-03-16 16:07:52 +01:00
bitpathfinder	85d5073234	test: Fix non-awaited coroutine in test_gossiper_empty_self_id_on_shadow_round The line with the error was not actually needed and has therefore been removed. Fixes: SCYLLADB-906 Closes scylladb/scylladb#28884	2026-03-16 17:07:36 +02:00
Botond Dénes	3e4e0c57b8	Merge 'Relax rf-rack-valid-keyspace option in backup/restore tests' from Pavel Emelyanov Some tests, when creating a cluster, configure nodes with the rf-rack-valid option, because sometimes they want to have it OFF. For that, the option is explicitly carried around, but the cluster-creating helper can infer this option itself from the provided topology and replication factor. Removing this option simplifies the code and (which is a nicer outcome) the test "signature" that's used, e.g., on the command line to run a specific test. Improving tests, not backporting Closes scylladb/scylladb#28860 * github.com:scylladb/scylladb: test: Relax topology_rf_validity parameter for some tests test: Auto detect rf-rack-valid option in create_cluster()	2026-03-16 17:06:46 +02:00
Raphael S. Carvalho	ee87b66033	replica: Demote log level on split failure during shutdown Dtest failed with: table - Failed to load SSTable .../me-3gyn_0qwi_313gw2n2y90v2j4fcv-big-Data.db of origin memtable due to std::runtime_error (Cannot split .../me-3gyn_0qwi_313gw2n2y90v2j4fcv-big-Data.db because manager has compaction disabled, reason might be out of space prevention), it will be unlinked... The reason is that the error above is being triggered when the cause is shutdown, not out of space prevention. Let's distinguish between the two cases and log the error with warning level on shutdown. Fixes https://github.com/scylladb/scylladb/issues/24850. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2026-03-16 12:03:17 -03:00
Patryk Jędrzejczak	526e5986fe	test: test_raft_no_quorum: decrease group0_raft_op_timeout_in_ms after quorum loss `test_raft_no_quorum.py::test_cannot_add_new_node` is currently flaky in dev mode. The bootstrap of the first node can fail due to `add_entry()` timing out (with the 1s timeout set by the test case). Other test cases in this test file could fail in the same way as well, so we need a general fix. We don't want to increase the timeout in dev mode, as it would slow down the test. The solution is to keep the timeout unchanged, but set it only after quorum is lost. This prevents unexpected timeouts of group0 operations with almost no impact on the test running time. A note about the new `update_group0_raft_op_timeout` function: waiting for the log seems to be necessary only for `test_quorum_lost_during_node_join_response_handler`, but let's do it for all test cases just in case (including `test_can_restart` that shouldn't be flaky currently). Fixes https://scylladb.atlassian.net/browse/SCYLLADB-913 Closes scylladb/scylladb#28998	2026-03-16 16:58:15 +02:00
Raphael S. Carvalho	b508f3dd38	service: Demote log level on split failure during shutdown Since commit `509f2af8db`, gate_closed_exception can be triggered for ongoing split during shutdown. The commit is correct, but it causes split failure on shutdown to log an error, which causes CI instability. Previously, aborted_exception would be triggered instead which is logged as warning. Let's do the same. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-951. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2026-03-16 11:52:00 -03:00
Dani Tweig	bc0952781a	Update Jira sync calling workflow to consolidated view Replaced multiple per-action workflow jobs with a single consolidated call to main_pr_events_jira_sync.yml. Added 'edited' event trigger. This makes CI actions in PRs more readable and workflow execution faster. Fixes:PM-253 Closes scylladb/scylladb#29042	2026-03-16 08:25:32 +02:00
Artsiom Mishuta	755d528135	test.py: fix warnings Changes in this commit: 1) rename class from 'TestContext' to 'Context' so pytest will not consider this class as a test 2) extend pytest filterwarnings list to ignore warnings from external libs 3) use datetime.datetime.now(datetime.UTC) instead of datetime.datetime.utcnow() 4) use ResultSet.one() instead of ResultSet[0] Fixes SCYLLADB-904 Fixes SCYLLADB-908 Related SCYLLADB-902 Closes scylladb/scylladb#28956	2026-03-15 12:00:10 +02:00
Karol Nowacki	7659a5b878	vector_search: test: fix flaky test The test assumes that the sleep duration will be at least the value of the sleep parameter. However, the actual sleep time can be slightly less than requested (e.g., a 100ms sleep request might result in a 99ms sleep). This commit adjusts the test's time comparison to be more lenient, preventing test flakiness.	2026-03-13 16:28:22 +01:00
Karol Nowacki	5474cc6cc2	vector_search: fix race condition on connection timeout When a `with_connect` operation timed out, the underlying connection attempt continued to run in the reactor. This could lead to a crash if the connection was established/rejected after the client object had already been destroyed. This issue was observed during the teardown phase of an upcoming high-availability test case. This commit fixes the race condition by ensuring the connection attempt is properly canceled on timeout. Additionally, the explicit TLS handshake previously forced during the connection is now deferred to the first I/O operation, which is the default and preferred behavior. Fixes: SCYLLADB-832	2026-03-13 16:28:22 +01:00
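In Python's `asyncio`, `asyncio.wait_for()` cancels the inner awaitable when the timeout expires and waits for the cancellation to complete, which mirrors the behavior this fix adopts for `with_connect`. The sketch below uses a hypothetical `fake_connect` coroutine in place of a real connection attempt; it is an analogy to the Seastar code, not a port of it.

```python
import asyncio


async def fake_connect(state, delay):
    """Hypothetical connection attempt that takes `delay` seconds."""
    try:
        await asyncio.sleep(delay)
        state["connected"] = True  # in the real bug, this would touch a destroyed client
        return "connection"
    except asyncio.CancelledError:
        state["cancelled"] = True
        raise  # let the cancellation propagate


async def connect_with_timeout(state, delay, timeout):
    """Return a connection, or None if the attempt timed out and was cancelled."""
    try:
        # On timeout, wait_for cancels fake_connect and waits for it to finish
        # cancelling, so no late completion can happen after we return.
        return await asyncio.wait_for(fake_connect(state, delay), timeout)
    except asyncio.TimeoutError:
        return None
```

The key property is that a timed-out attempt is torn down before the caller continues, rather than left running in the background where it may later act on freed state.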
Piotr Dulikowski	d8b283e1fb	Merge 'Add CQL forwarding for strongly consistent tables' from Wojciech Mitros In this series we add support for forwarding strongly consistent CQL requests to suitable replicas, so that clients can issue reads/writes to any node and have the request executed on an appropriate tablet replica (and, for writes, on the Raft leader). We return the same CQL response as what the user would get while sending the request to the correct replica and we perform the same logging/stats updates on the request coordinator as if the coordinator were the appropriate replica. The core mechanism of forwarding a strongly consistent request is sending an RPC containing the user's CQL request frame to the appropriate replica and returning back a ready, serialized `cql_transport::response`. We do this in the CQL server - it is best prepared for handling these types, and forwarding a request containing a CQL frame allows us to reuse near-top-level methods for CQL request handling in the new RPC handler (such as the general `process`). For sending the RPC, the CQL server needs to know which node it should forward the request to. This requires knowledge about the tablet raft group members and leader. We obtain this information during the execution of a `cql3/strong_consistency` statement, and we return this information back to the CQL server using the generalized `bounce_to_shard` `response_message`, where we now store the information about either a shard, or a specific replica to which we should forward. Similarly to `bounce_to_shard`, we need to handle this `result_message` in a loop - a replica may move during statement execution, or the Raft leader can change. We also use it for forwarding strongly consistent writes when we're not a member of the affected tablet raft group - in that case we need to forward the statement twice - once to any replica of the affected tablet, then that replica can find the leader and return this information to the coordinator, which allows the second request to be directed to the leader. This feature also allows passing through exception messages raised on the target replica while executing the statement. For that, many methods of the `cql_transport::cql_server::connection` for creating error responses needed to be moved to `cql_transport::cql_server`. And for final exception handling on the coordinator, we added additional error info to the RPC response, so that the handling can be performed without having the `result_message::exception` or `exception_ptr` itself. Fixes [SCYLLADB-71](https://scylladb.atlassian.net/browse/SCYLLADB-71) Closes scylladb/scylladb#27517 * github.com:scylladb/scylladb: test: add tests for CQL forwarding transport: enable CQL forwarding for strong consistency statements transport: add remote statement preparation for CQL forwarding transport: handle redirect responses in CQL forwarding transport: add exception handling for forwarded CQL requests transport: add basic CQL request forwarding idl: add a representation of client_state for forwarding cql_server: handle query, execute, batch in one case transport: inline process_on_shard in cql_server::process transport: extract process() to cql_server transport: add messaging_service to cql_server transport: add response reconstruction helpers for forwarding transport: generalize the bounce result message for bouncing to other nodes strong consistency: redirect requests to live replicas from the same rack transport: pass foreign_ptr into sleep_until_timeout_passes and move it to cql_server transport: extract the error handling from process_request_one transport: move error response helpers from connection to cql_server	2026-03-13 15:03:10 +01:00
Andrzej Jackowski	60aaea8547	cql: improve write consistency level guardrail messages Update warn and fail messages for the write_consistency_levels_warned and write_consistency_levels_disallowed guardrails to include the configuration option name and actionable guidance. The main motivation is to make the messages follow the conventions of other guardrails. Refs: SCYLLADB-257	2026-03-13 14:40:45 +01:00
Tomasz Grabiec	518470e89e	Merge 'load_stats: improve tablet filtering for load stats' from Ferenc Szili When computing table sizes via load_stats to determine if a split/merge is needed, we are filtering tablets which are being migrated, in order to avoid counting them twice (both on leaving and pending replica) in the total table size. The tablets are filtered so that they are counted on the leaving replica until the streaming stage, and on the pending replica after the streaming stage. Currently, the procedure for collecting tablet sizes for load balancing also uses this same filter. This should be changed, because the load balancer needs to have as much information about tablet sizes as possible, and could ignore a node due to missing tablet sizes for tablets in the `write_both_read_new` and `use_new` stages. For tablet size collection, we should include all the tablets which are currently taking up disk space. This means: - on leaving replica, include all tablets until the `cleanup` stage - on pending replica, include all tablets starting with the `write_both_read_new` and later stages While this is an improvement, it causes problems with some of the tests, and therefore needs to be backported to 2026.1 Fixes: SCYLLADB-829 Closes scylladb/scylladb#28587 * github.com:scylladb/scylladb: load_stats: add filtering for tablet sizes load_stats: move tablet filtering for table size computation load_stats: bring the comment and code in sync	2026-03-13 13:08:11 +01:00
Pavel Emelyanov	d544d8602d	test: Relax topology_rf_validity parameter for some tests Tests that call create_cluster() helper no longer need to carry the rf-validity parameter. This simplifies the code and test signature. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-13 14:30:32 +03:00
Pavel Emelyanov	313985fed7	test: Auto detect rf-rack-valid option in create_cluster() The helper accepts it as a boolean argument, but it can easily infer it from the provided topology. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-13 14:30:32 +03:00
Gleb Natapov	fae5282c82	service level: fix crash during migration to driver server level Before `b59b3d4`, the migration code checked that the service level controller is on version v2 before migrating, and the check also implicitly verified that the _sl_data_accessor field was already initialized. Now that the check is gone, the migration can start before the service level controller is fully initialized. Re-add the check, but in a different place. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1049 Closes scylladb/scylladb#29021	2026-03-13 11:24:26 +01:00
Łukasz Paszkowski	4c4d043a3b	reader_concurrency_semaphore: skip preemptive abort for permits waiting for memory Permits in the `waiting_for_memory` state represent already-executing reads that are blocked on memory allocation. Preemptively aborting them is wasteful -- these reads have already consumed resources and made progress, so they should be allowed to complete. Restrict the preemptive abort check in maybe_admit_waiters() to only apply to permits in the `waiting_for_admission` state, and tighten the state validation in `on_preemptive_aborted()` accordingly. Adjust the following tests: + test_reader_concurrency_semaphore_abort_preemptively_aborted_permit no longer relies on requesting memory + test_reader_concurrency_semaphore_preemptive_abort_requested_memory_leak adjusted to the fix Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1016	2026-03-13 09:50:05 +01:00
Dani Tweig	aa46a0f4e0	Add VECTOR to the list of synced milestones in scylladb.git - Added VECTOR to the comma-separated list of Jira project keys in `call_sync_milestone_to_jira.yml`. - The `jira_project_keys` value changed from `SCYLLADB,CUSTOMER,SMI,RELENG` to `SCYLLADB,CUSTOMER,SMI,RELENG,VECTOR`. - The VECTOR project needs to sync with scylladb.git milestones, so that when a GitHub milestone is created or closed in scylladb/scylladb, the corresponding Jira release is also created or released in the VECTOR project. - Previously only SCYLLADB, CUSTOMER, SMI, and RELENG projects were synced. Fixes:PM-220 Closes scylladb/scylladb#29014	2026-03-13 09:58:41 +02:00
Botond Dénes	fc8cebd671	Merge 'Verify components digests during component load and scrub in validate mode' from Taras Veretilnyk This PR adds integrity verification for SSTable component files during loading. When component digests are present in Scylla metadata, the loader now validates each component's CRC32 digest against the stored expected value, catching silent corruption of component files. Digests of the Index, Rows and Partitions components are also validated during scrub in validate mode. Added corruption tests that write an SSTable, flip a bit in a specific component file, then verify that reloading the SSTable detects the corruption and throws the expected exception. Depends on https://github.com/scylladb/scylladb/pull/28338 Backport is not required, this is a new feature Fixes https://github.com/scylladb/scylladb/issues/20103 Closes scylladb/scylladb#28761 * github.com:scylladb/scylladb: test/cqlpy: test --ignore-component-digest-mismatch flag in scylla sstable upgrade docs: document --ignore-component-digest-mismatch flag for scylla sstable upgrade sstables: propagate ignore_component_digest_mismatch config to all load sites sstables: add option to ignore component digest mismatches sstable_compaction_test: Add scrub validate test for corrupted index sstables: add tests for component digest validation on corrupted SSTables sstables: validate index components digests during SSTable scrub in validate mode sstables: verify component digests on SSTable load sstables: add digest_file_random_access_reader for CRC32 digest computation	2026-03-13 09:55:55 +02:00
Avi Kivity	ae8a418744	Merge 'Await async calls in test tablets migration' from Benny Halevy Fix several test cases that did not await async tasks: - test_restart_leaving_replica_during_cleanup - test_restart_in_cleanup_stage_after_cleanup - test_tablet_back_and_forth_migration - test_staging_backlog_is_preserved_with_file_based_streaming Fixes SCYLLADB-910 * Minor fixes, no backport needed Closes scylladb/scylladb#28908 * github.com:scylladb/scylladb: test_tablets_migration: test_staging_backlog_is_preserved_with_file_based_streaming: convert for loop to asyncio.gather test_tablets_migration: test_tablet_back_and_forth_migration: await move_tablet test_tablets_migration: test_restart_in_cleanup_stage_after_cleanup: await move_task test_tablets_migration: test_restart_leaving_replica_during_cleanup: await move_task test_tablets_migration: drop unused imports from cassandra.query	2026-03-13 00:20:29 +02:00
Avi Kivity	b228eb26e6	Merge 'dbuild: Use slirp4netns network in dbuild nested containers' from Calle Wilund Fixes #25084 Add slirp4netns and use it for nested containers. This will allow nested container port aliasing, helping CI stability. Note: this contains an updated Dockerfile for the dbuild image, but because of a chicken-and-egg problem, for now the dbuild script force-installs slirp4netns before anything else. Updates the mock server handling to use ephemeral ports and query them from the container, ensuring we don't get port collisions (boost as well as pytest). Includes a timeout increase, and a tweak to our scylla_cluster handling, ensuring we don't deadlock when the pipe size is less than required for our sys notify messages. Closes scylladb/scylladb#28727 * github.com:scylladb/scylladb: gcs_fixture: Change to use docker helper aws_kms_fixture: Modify to use docker helper test/lib/proc_util: Add docker helper pytest: use ephemeral port publish for docker mock servers dbuild: Use container network in dbuild nested containers scylla_cluster: Read notify sock in background to prevent deadlock	2026-03-12 23:49:25 +02:00
Nadav Har'El	ad832c263e	test/cluster: mark test_alternator_concurrent_rmw_same_partition_different_server not strictly xfail A few days ago, in commit `7b30a39`, we added the xfail_strict option to pytest.ini. With this option, every time a test XPASSes, i.e., an xfail test actually passes, it is considered an error and the test fails. But some tests demonstrate a timing-related bug and do not reproduce the bug every single time. An example we noticed in one CI run is: test/cluster/test_alternator.py::test_alternator_concurrent_rmw_same_partition_different_server This test reproduces a timing-related bug (if you do LWT writes to one partition through two different coordinators "at the same time", you can get a failure), but only most of the time, not 100% of the time. The solution is to add "strict=False" to the xfail marker on this specific test. This undoes xfail_strict for this specific test, accepting that it can either pass or fail. Note that this does NOT make this test worthless - we still see this test failing most of the time, and when a developer finally fixes this issue, the test will begin to pass all the time. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-941 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#29016	2026-03-12 23:46:23 +02:00
Tomasz Grabiec	1256a9faa7	tablets: Fix deadlock in background storage group merge fiber When it deadlocks, groups stop merging and the compaction group merge backlog will run away. Also, graceful shutdown will be blocked on it. Found by the flaky unit test test_merge_chooses_best_replica_with_odd_count, which timed out in 1 in 100 runs. Reason for deadlock: When storage groups are merged, the main compaction group of the new storage group takes a compaction lock, which is appended to _compaction_reenablers_for_merging, and released when the merge completion fiber is done with the whole batch. If we accumulate more than 1 merge cycle for the fiber, deadlock occurs. The lock order is as follows. Initial state: cg0: main; cg1: main; cg2: main; cg3: main. After 1st merge: cg0': main [locked], merging_groups=[cg0.main, cg1.main]; cg1': main [locked], merging_groups=[cg2.main, cg3.main]. After 2nd merge: cg0'': main [locked], merging_groups=[cg0'.main [locked], cg0.main, cg1.main, cg1'.main [locked], cg2.main, cg3.main]. The merge completion fiber will try to stop cg0'.main, which blocks on the compaction lock, which is held by the reenabler in _compaction_reenablers_for_merging; hence the deadlock. The fix is to wait for the background merge to finish before we start the next merge. This is achieved by holding the old erm in the background merge, and doing a topology barrier from the merge finalizing transition. Background merge is supposed to be a relatively quick operation: it stops compaction groups, so it may wait for active requests. It shouldn't prolong the barrier indefinitely. Tablet boost unit tests which trigger merge need to be adjusted to call the barrier, otherwise they will be vulnerable to the deadlock. Two cluster tests were removed because they assumed that merge happens in the background. Now that it happens as part of merge finalization, and blocks the topology state machine, those tests deadlock because they are unable to make topology changes (node bootstrap) while background merge is blocked. The test "test_tablets_merge_waits_for_lwt" needed to be adjusted. It assumed that merge finalization doesn't wait for the erm held by the LWT operation, and triggered tablet movement afterwards, and assumed that this migration will issue a barrier which will block on the LWT operation. After this commit, it's the barrier in merge finalization which is blocked. The test was adjusted to use an earlier log mark when waiting for "Got raft_topology_cmd::barrier_and_drain", which will catch the barrier in merge finalization. Fixes SCYLLADB-928	2026-03-12 22:45:01 +01:00
Tomasz Grabiec	7706c9e8c4	replica: table: Propagate old erm to storage group merge	2026-03-12 22:45:01 +01:00
Tomasz Grabiec	582a4abeb6	test: boost: tablets_test: Save tablet metadata when ACKing split resize decision Needs to be ordered before split finalization, because storage_group must be in split mode already at finalization time. There must be split-ready compaction groups, otherwise finalization fails with this error: Found 0 split ready compaction groups, but expected 2 instead. Exposed by increased split activity in tests.	2026-03-12 22:45:01 +01:00
Tomasz Grabiec	279fcdd5ff	storage_service: Extract local_topology_barrier() Will be called in tests. It does the local part of the global topology barrier. The comment: // We capture the topology version right after the checks // above, before any yields. This is crucial since _topology_state_machine._topology // might be altered concurrently while this method is running, // which can cause the fence command to apply an invalid fence version. was dropped, because it's no longer true after `fad6c41cee`, and it doesn't make sense in the context of local_topology_barrier(). We'd have to propagate the version to local_topology_barrier(), but it's pointless. The fence version is decided before calling the local barrier, and it will be valid even if local version moves ahead.	2026-03-12 22:44:56 +01:00
Avi Kivity	03186ce60d	Merge 'Cleanup after auth v1 and default superuser code removal' from Marcin Maliszkiewicz This is short cleanup after recent removal of creating default cassandra superuser and auth-v1 code removal. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1036 Backport: no, just code cleanup Closes scylladb/scylladb#29004 * github.com:scylladb/scylladb: auth: remove DEFAULT_SUPERUSER_NAME constant and dead DEFAULT_USER_PASSWORD auth: use configurable default_superuser in describe_roles auth: move default_superuser to common, remove _superuser member auth: use LOCAL_ONE for all auth queries auth: remove get_auth_ks_name indirection	2026-03-12 23:44:32 +02:00
Avi Kivity	e2eeef3e01	Merge 'service level: remove remnants of version 1 service level' from Gleb Natapov can_use_effective_service_level_cache() always returns true now, so the function can be dropped entirely and all the code that assumes it may return false can be dropped as well. Also drop async versions of find_effective_service_level and get_user_scheduling_group since they are unused. No need to backport, code removal, Closes scylladb/scylladb#29002 * github.com:scylladb/scylladb: service level: make maybe_update_per_service_level_params synchronous service level: remove unused get_user_scheduling_group function service level: drop async find_effective_service_level service level: remove remnants of version 1 service level	2026-03-12 23:39:41 +02:00
Botond Dénes	eed3a6d407	sstables/mx/writer: move post-cell write yield to collection write loop Introduced by `54bddeb3b5`, the yield was added to write_cell(), to also help the general case where there is no collection. Arguably this was unnecessary and this patch moves the yield to write_collection(), to the cell write loop instead, so regular cells don't have to poll the preempt flag. Closes scylladb/scylladb#29013	2026-03-12 21:26:35 +02:00
Avi Kivity	e8a6706d6e	Merge 'shorten some sleeps to speed up bootstrap in tests' from Patryk Jędrzejczak This PR shortens two sleeps from 1s to 100ms to speed up bootstrap in tests. The changed sleeps are: - the pause duration in group0 discovery, - the retry period in `wait_for_cql`. Refs: https://scylladb.atlassian.net/browse/SCYLLADB-918 No backport: performance improvements mostly relevant to tests. Closes scylladb/scylladb#29020 * github.com:scylladb/scylladb: test: pylib: util: wait for CQL being ready with a shorter period group0: discovery: shorten the pause duration	2026-03-12 21:17:05 +02:00
Wojciech Mitros	32974770b0	test: add tests for CQL forwarding Add basic cluster tests for CQL forwarding. The test cases include: - basic reads and writes - prepared statements with binds - forwarding from a non-replica - exception passthrough during forwarding (using an injection) - re-preparing a statement on the target node, even if the user query is also an EXECUTE request on a prepared statement - verification of metric updates The existing test_basic_write_read was modified so that a few extra cases could be validated on the same cluster.	2026-03-12 19:43:35 +01:00
Wojciech Mitros	916a9995c1	transport: enable CQL forwarding for strong consistency statements We enable CQL forwarding by starting to return the bounce_to_node result message in redirect_statement() instead of throwing. The forwarding code introduced in the preceding patches reacts to these messages, allowing the requests to be forwarded. With the update, some tests assuming that requests can't be forwarded need to be adjusted, so we do that as well.	2026-03-12 19:43:35 +01:00
Wojciech Mitros	21a7b036a5	transport: add remote statement preparation for CQL forwarding During forwarding of CQL EXECUTE requests, the target node may not have the prepared statement in its cache. If we do have this statement as a coordinator, instead of returning PREPARED NOT FOUND to the client, we want to prepare the statement ourselves on the target node. For that, we add a new FORWARD_CQL_PREPARE RPC. We use the new RPC after getting the prepared_not_found status during forwarding. When we try to forward a request, we always have the query string (we decide whether to forward based on this query), so we can always use the new RPC when getting the prepared_not_found status. After receiving the response, we try forwarding the EXECUTE request again.	2026-03-12 19:43:35 +01:00
Wojciech Mitros	96a5e1c7ce	transport: handle redirect responses in CQL forwarding During CQL forwarding, when the target node can't handle the request, it will find another node which can execute the request or which knows where the request can be executed. We return this information in responses to CQL forwarding, and in this patch, we add handling of this kind of response. After getting a redirect response, we retry forwarding to the returned host/shard until success or timeout. This can happen many times during a single request, when we first forward to a replica and later to the coordinator, or when a replica/coordinator migrated while we were performing the forwarding.	2026-03-12 19:43:31 +01:00
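The redirect handling described above amounts to a retry loop. What follows is a minimal, synchronous Python sketch of that loop; the real code is asynchronous C++ on Seastar. `Redirect`, `send` and `forward_with_redirects` are hypothetical names, not Scylla APIs, and `deadline` stands in for the request timeout.

```python
import time
from dataclasses import dataclass


@dataclass
class Redirect:
    """Hypothetical redirect reply naming the next host/shard to try."""
    host: str
    shard: int


def forward_with_redirects(send, host, shard, deadline, clock=time.monotonic):
    """Forward a request, following redirect replies until success or timeout.

    `send(host, shard)` is a hypothetical transport callback that returns
    either a final result or a Redirect pointing at another target.
    """
    while True:
        if clock() >= deadline:
            raise TimeoutError(f"forwarding timed out (last target {host}/{shard})")
        reply = send(host, shard)
        if not isinstance(reply, Redirect):
            return reply  # the target executed the request
        # Retry against whatever host/shard the redirect reply named.
        host, shard = reply.host, reply.shard
```

In this sketch a request can hop several times, for example from a replica to the coordinator, before one target returns a final result.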
Wojciech Mitros	8816d3038c	transport: add exception handling for forwarded CQL requests When a forwarded request fails on the remote node, we can't use the exception handling that happens in process_request_one because we don't go through this code path. Instead, we use the previously extracted cql_server::handle_exception handler, which performs all accounting on the forwarded-to node, and which prepares the response. For the read_failure_exception_with_timeout exception, we need to perform the sleep on the source node, so we return the timeout in the forwarding response and use it on the source node to know how long to sleep without any extra calculations. The handle_forward_execute() method is extracted from the inline handler lambda to make the error catching wrapper cleaner.	2026-03-12 19:41:37 +01:00
Wojciech Mitros	23bff5dfef	transport: add basic CQL request forwarding Add the infrastructure for forwarding CQL requests to other nodes. When a process() call results in a node bounce (as opposed to a shard bounce), the coordinator serializes the request and sends it via the FORWARD_CQL_EXECUTE RPC verb to the target node. In this patch we omit several features that allow handling more scenarios that can happen when trying to forward a CQL request, but the RPC request and response are already prepared for them. They will be handled in the following commits.	2026-03-12 19:41:35 +01:00
Avi Kivity	76b6784c1a	Merge 'cql3: track CQL parsing memory cost and use it for admission control' from Marcin Maliszkiewicz Use rolling_max_tracker to record gross bytes allocated during each CQL parse. The rolling maximum is then added to the memory estimate for incoming QUERY and PREPARE requests so that the admission control in the CQL transport layer accounts for parsing overhead. The measured memory footprint serves as an upper bound rather than an exact number, but its purpose is to prevent OOMs under a heavy load of unprepared statements. In a benchmark, a node with 1G of memory showed peak non-LSA memory usage drop from 320MB (our coordinator budget is 10% of 1G) to 96MB, while tps dropped from 1.2 kops to 0.8 kops. The drop in tps is expected, as memory admission kicks in to prevent OOM. This is phase 1 of OOM prevention, potential next steps: - add second admission in query_processor::get_statement trying to prevent potential thundering herd problem - decrease cql_server memory pool size - count reads in the memory pool - add per service level memory pool and a shared one Related https://scylladb.atlassian.net/browse/SCYLLADB-740 Fixes https://scylladb.atlassian.net/browse/SCYLLADB-938 Backport: no, new feature, but we may reconsider if some customer needs it Closes scylladb/scylladb#28919 * github.com:scylladb/scylladb: cql3: track CQL parsing memory cost and use it for admission control utils: add rolling max tracker	2026-03-12 19:59:52 +02:00
Wojciech Mitros	170b82ddca	idl: add a representation of client_state for forwarding In the following patches, when we start allowing to forward CQL requests to other nodes, we'll need to use the same client state for executing the request on the destination node as we had on the source. client_state contains many fields and we need to create a new instance of it when we start handling the forwarded request, so to prepare for the forwarding RPC, we add a serializable format of the client_state as an IDL struct. The new class is missing some fields that are not used while executing requests, and some whose value is determined by the fact that the client state is used for a forwarded request. These include: - driver name, driver version, client options - not used for executing requests. Instead, we use these as data sources for the virtual "clients" system table. - auth_state - must be READY - we reached a bounce message, so we were able to try executing the request locally - _control_connection - used for altering a cql_server::connection, which we don't have on the target node - _default_timeout_config - used when updating service levels, also only per-connection - workload_type - used for deciding whether to allow shedding at the start of processing the request, and for getting per-connection service level params (for an API)	2026-03-12 17:48:58 +01:00
Wojciech Mitros	b4a7fefe20	cql_server: handle query, execute, batch in one case Currently we perform the same steps when handling query, execute and batch CQL requests. So instead of creating multiple functions performing these steps, we can handle them all in one fallthrough case in cql_server::connection::process_request_one.	2026-03-12 17:48:58 +01:00
Wojciech Mitros	dadb87047c	transport: inline process_on_shard in cql_server::process The process_on_shard method is relatively short: it's only used in the process() method, and the Process concept that it uses is as long as the function itself. This area will be made more complex by the following patches for cql forwarding, so we simplify it by inlining process_on_shard in cql_server::process.	2026-03-12 17:48:58 +01:00
Wojciech Mitros	24cdc3a10d	transport: extract process() to cql_server Move process() and process_on_shard() from cql_server::connection to cql_server. The process() method is no longer a template - instead, it takes an opcode parameter and uses get_process_fn_for_opcode() to select the appropriate internal processing function. The process_query, process_execute, and process_batch wrappers on connection now delegate to _server.process() with the appropriate opcode. This refactoring is preparation for CQL request forwarding, where process() will need to be called from a context other than connection (the forwarding RPC handler).	2026-03-12 17:48:57 +01:00
Wojciech Mitros	0e3469e89c	transport: add messaging_service to cql_server The messaging service will be used by cql_server to register RPC handlers for forwarding CQL requests between nodes. We pass it through the controller to cql_server.	2026-03-12 17:48:57 +01:00
Wojciech Mitros	1376caf980	transport: add response reconstruction helpers for forwarding Expose response::flags() and response::extract_body(), and a new constructor. It will be needed for creating a cql_transport::response from the response body returned during CQL forwarding.	2026-03-12 17:48:57 +01:00
Wojciech Mitros	e44820ba1f	transport: generalize the bounce result message for bouncing to other nodes In the following patches, we'll start allowing forwarding requests to strongly consistent tables so that they'll get executed on the suitable tablet Raft group members. For that we'll reuse the approach that we already have for bouncing requests to other shards - we'll try to execute a request locally, and the result of that will be a bounce message with another replica as the target. In this patch we generalize the former bounce_to_shard result message so that it will be able to specify the target of the bounce as another shard or a specific replica. We also rename it to result_message::bounce so that it stops implying that only another shard may be its target. Aside from the host_id and the shard, the new message also includes the timeout, because in the service handling the forwarding we won't have access to it, and it's needed for specifying how long we should wait for the forwarded requests. It also includes information on whether this is a write request, so that the correct timeout response is returned in case the deadline is exceeded. We will return other hosts in the new bounce message when executing requests to strongly consistent tables when we can't handle the request because we aren't a suitable replica. We can't handle this message yet, so we don't return it anywhere and we still assume that every bounce message is a bounce to the same host.	2026-03-12 17:48:57 +01:00
Wojciech Mitros	b4d66fda2e	strong consistency: redirect requests to live replicas from the same rack Forwarding CQL requests is not implemented yet, but we're already prepared to return the target to forward to when trying to execute strongly consistent requests. Currently, if we're not a replica of the affected tablet, we redirect the request to the first replica in the list. This is not optimal, because this replica may be down or it may be in another rack, making us perform cross-rack requests during forwarding. Instead, we should forward the request to the replica from the same rack and handle the case where the replica is down. In this patch we change the replica selection for forwarding strongly consistent requests, so that when the coordinator isn't a replica, it redirects the request to the replica from the same rack. If the replica from the same rack is down, or there is no replica in our rack, we choose the next closest replica (preferring same-DC replicas over other DCs). If no replica is alive, the query fails - the driver should retry when some replica comes back up.	2026-03-12 17:48:54 +01:00
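The replica preference order described above can be sketched as follows. This is a simplified illustration. It assumes a flat list of replicas with DC, rack and liveness attributes. `Replica` and `choose_forward_target` are hypothetical names. The real code may rank "closest" replicas using more topology information than the simple same-rack / same-DC / other-DC tiers used here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """Hypothetical replica descriptor."""
    host: str
    dc: str
    rack: str
    alive: bool


def choose_forward_target(replicas, my_dc, my_rack):
    """Pick a live replica to forward to, preferring proximity.

    Order: same rack, then same DC, then any other DC. Returns None when no
    replica is alive; the query then fails and the driver is expected to retry.
    """
    def distance(r):
        if r.dc == my_dc and r.rack == my_rack:
            return 0
        if r.dc == my_dc:
            return 1
        return 2

    alive = [r for r in replicas if r.alive]
    # min() keeps the first replica among equally close ones.
    return min(alive, key=distance, default=None)
```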
Andrzej Jackowski	3b9cd52a95	reader_concurrency_semaphore_test: detect memory leak on preemptive abort of waiting_for_memory permit A permit in `waiting_for_memory` state can be preemptively aborted by maybe_admit_waiters(). This is wrong: such permits have already been admitted and are actively processing a read; they are merely blocked waiting for memory under serialize-limit pressure. When `on_preemptive_aborted()` fires on a `waiting_for_memory` permit, it does not clear `_requested_memory`. A subsequent `request_memory()` call accumulates on top of the stale value, causing `on_granted_memory()` to consume more than resource_units tracks. This commit adds a test that confirms that scenario by counting internal_errors.	2026-03-12 17:09:34 +01:00
Alex	7fd39ba586	test/cluster: strengthen raft voters multi-DC test and tune debug runtime The test_raft_voters_multidc_kill_dc scenario had become weaker after group0 voter count was made always odd. In particular, the old num_nodes == 1 case (dc1=2, dc2=1, dc3=1) could pass even without the intended balancing logic, because with 3 voters total we naturally get one voter per DC. This change restores coverage of the original intent: - Replace num_nodes parametrization with explicit DC triples. - Use (3, 1, 1) to force a meaningful asymmetric topology where voter placement logic is required. - Keep a larger topology case (6, 3, 3) for broader coverage. - Mark (6, 3, 3) as skip_mode(debug) with reason: larger topology case is too slow in debug on minipcs. Also updated comments/docstring to match the new setup. Fixes: SCYLLADB-794 backport: None, it is done to deflake minipcs that will start working only on master Closes scylladb/scylladb#29000	2026-03-12 17:07:45 +01:00
Wojciech Mitros	309abc44d9	transport: pass foreign_ptr into sleep_until_timeout_passes and move it to cql_server Change sleep_until_timeout_passes() to accept a foreign_ptr<std::unique_ptr<response>>. We can easily create the foreign_ptr for the responses created in the CQL server, but we'll need this when we get responses when forwarding CQL statements - the responses may come from other shards. We also move it from cql_server::connection to cql_server, because for forwarded CQL requests, we'll need to handle it at the cql_server level. The method also loses its const qualifier - the abort_source that we pass into sleep_abortable needs to be non-const. Apparently, we could still use it in a const method of cql_server::connection because we passed it as _server._abort_source which caused the const qualifier to be lost.	2026-03-12 16:03:14 +01:00
Marcin Maliszkiewicz	975cd60e05	ldap: fix use-after-move crash in ldap_reuser::reap() After stop() moved _reaper, in-flight with_connection() callbacks could still call reap(), which accessed the moved-from future causing a SIGSEGV in future_base::detach_promise(). Add a seastar::gate so stop() waits for all in-flight operations before moving _reaper. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1043 Closes scylladb/scylladb#29015	2026-03-12 16:48:45 +02:00
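The fix above follows the usual gate pattern. Every in-flight operation holds the gate open. `stop()` closes the gate and waits for it to drain before tearing down shared state. Below is a minimal asyncio analogue of `seastar::gate`, for illustration only. `Gate` and `Reuser` are hypothetical stand-ins, not the C++ classes.

```python
import asyncio


class Gate:
    """Minimal asyncio analogue of seastar::gate (illustrative sketch).

    enter()/leave() bracket an in-flight operation; close() rejects new
    entries and waits until every in-flight operation has left.
    Construct it inside a running event loop.
    """

    def __init__(self):
        self._count = 0
        self._closed = False
        self._drained = asyncio.Event()
        self._drained.set()

    def enter(self):
        if self._closed:
            raise RuntimeError("gate closed")
        self._count += 1
        self._drained.clear()

    def leave(self):
        self._count -= 1
        if self._count == 0:
            self._drained.set()

    async def close(self):
        self._closed = True
        await self._drained.wait()


class Reuser:
    """Hypothetical stand-in for ldap_reuser: stop() must not move state
    that in-flight callbacks (reap()) still use."""

    def __init__(self):
        self.gate = Gate()
        self.reaper = "live"
        self.log = []

    async def with_connection(self):
        self.gate.enter()
        try:
            await asyncio.sleep(0.01)                # simulated LDAP round trip
            self.log.append(("reap", self.reaper))  # still sees live state
        finally:
            self.gate.leave()

    async def stop(self):
        await self.gate.close()   # wait for in-flight operations first
        self.reaper = "moved-from"
```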
Patryk Jędrzejczak	c50cf32793	test: pylib: util: wait for CQL being ready with a shorter period `wait_for_cql` is used in hundreds, if not thousands, of places in tests. We shouldn't waste up to 1s for every call. Also, the 1s period is clearly too long compared to the bootstrap time, which is usually 0-3s in dev mode. The following test speeds up from 50s to 42s with the change: ``` for _ in range(10): servers = await manager.servers_add(3) await manager.get_ready_cql(servers) ```	2026-03-12 15:40:19 +01:00
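The polling period matters because a poller notices readiness up to one period after the condition becomes true. Below is a minimal sketch with a simulated clock. It assumes a generic `wait_for(pred, deadline, period)` helper, which is hypothetical and not the actual `test/pylib` implementation.

```python
import time


def wait_for(pred, deadline, period=0.1, clock=time.monotonic, sleep=time.sleep):
    """Poll pred() every `period` seconds until it returns something truthy.

    Hypothetical helper: readiness may be noticed up to one `period` late,
    so a shorter period wastes less time per call.
    """
    while True:
        result = pred()
        if result:
            return result
        if clock() + period > deadline:
            raise TimeoutError("condition not met before deadline")
        sleep(period)


class FakeClock:
    """Simulated time so the example runs instantly."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def time_to_notice(ready_at, period):
    """Simulated seconds until wait_for() sees a condition that becomes true at ready_at."""
    clk = FakeClock()
    wait_for(lambda: clk.now >= ready_at, deadline=10.0, period=period,
             clock=clk.time, sleep=clk.sleep)
    return clk.now
```

Suppose the condition becomes true at 0.25s. A 1s period notices it at 1.0s, while a 100ms period notices it at about 0.3s.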
Patryk Jędrzejczak	f85628a9a0	group0: discovery: shorten the pause duration Nodes currently pause group0 discovery for 1s. This case is always hit while adding multiple nodes in parallel to an empty cluster by all nodes except the one that becomes the group0 leader. This is fine in production, but in tests, the slowdown is quite significant. Every `manager.servers_add(n)` call for n > 1 becomes 1s slower when the cluster is empty. Many cluster tests are affected. In this commit, we decrease the sleep duration from 1s to 100ms to speed up tests. The consequence of this change is that nodes might perform more steps in group0 discovery, but the increase in CPU usage and network traffic should be negligible.	2026-03-12 15:40:18 +01:00
Gleb Natapov	c67f876893	service level: make maybe_update_per_service_level_params synchronous It does not call async functions any more.	2026-03-12 15:53:08 +02:00
Benny Halevy	b3fec20960	test_tablets_migration: test_staging_backlog_is_preserved_with_file_based_streaming: convert for loop to asyncio.gather Currently the test iterates on all servers and calls manager.api.disable_injection but it doesn't await those calls. Use asyncio.gather to await all calls in parallel. Co-authored-by: Copilot CLI Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2026-03-12 15:26:40 +02:00
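The bug pattern here is that calling a coroutine function without `await` only creates a coroutine object; its body never runs. Below is a minimal sketch of the broken loop and the `asyncio.gather` fix. `disable_injection` is a hypothetical stand-in for `manager.api.disable_injection`.

```python
import asyncio


async def disable_injection(server, name, calls):
    """Hypothetical stand-in for manager.api.disable_injection()."""
    await asyncio.sleep(0)  # simulated API round trip
    calls.append((server, name))


async def disable_all_buggy(servers, calls):
    # BUG (the pattern being fixed): each call only creates a coroutine
    # object. Without `await` its body never runs, and Python later warns
    # "coroutine ... was never awaited".
    for s in servers:
        disable_injection(s, "inj", calls)


async def disable_all(servers, calls):
    # Fix: run all calls concurrently and wait for every one of them.
    await asyncio.gather(*(disable_injection(s, "inj", calls) for s in servers))
```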
Benny Halevy	61d5a2df02	test_tablets_migration: test_tablet_back_and_forth_migration: await move_tablet Co-authored-by: Copilot CLI Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2026-03-12 15:26:40 +02:00
Benny Halevy	b8655748a2	test_tablets_migration: test_restart_in_cleanup_stage_after_cleanup: await move_task Co-authored-by: Copilot CLI Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2026-03-12 15:26:40 +02:00
Benny Halevy	10dccc2c4e	test_tablets_migration: test_restart_leaving_replica_during_cleanup: await move_task Co-authored-by: Copilot CLI Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2026-03-12 15:26:40 +02:00
Benny Halevy	c9d653fb1e	test_tablets_migration: drop unused imports from cassandra.query Co-authored-by: Copilot CLI Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2026-03-12 15:26:40 +02:00
Gleb Natapov	c30907b8f2	service level: remove unused get_user_scheduling_group function	2026-03-12 14:28:26 +02:00
Gleb Natapov	a934d8391d	service level: drop async find_effective_service_level find_cached_effective_service_level does exactly the same thing now and it is synchronous.	2026-03-12 14:28:26 +02:00
Botond Dénes	15cfa5beeb	mutation/collection_mutation: don't copy the serialized collection serialize_collection_mutation() copies the serialized collection into the returned collection_mutation object. Change to move to avoid the copy. Fixes: SCYLLADB-1041 Closes scylladb/scylladb#29010	2026-03-12 13:57:40 +02:00
Gleb Natapov	f888f2dced	service level: remove remnants of version 1 service level can_use_effective_service_level_cache() always returns true now, so the function can be dropped entirely and all the code that assumes it may return false can be dropped as well.	2026-03-12 12:27:52 +02:00
Nadav Har'El	27f0510280	test/alternator: test_gzip_request_oversized now passes on AWS The Alternator test test_compressed_request.py::test_gzip_request_oversized checks that a very large request that compresses to a small size is still rejected. This test passed on Alternator, but used to fail on DynamoDB because DynamoDB didn't reject this case. This was a bug in DynamoDB (a "decompression bomb" vulnerability), and after I reported it, it was fixed. So now this test does pass on DynamoDB (after a small modification to allow for different error codes). So remove its scylla_only marker, and make the comment true to the current state. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#28820	2026-03-12 10:41:56 +01:00
Marcin Maliszkiewicz	b277d9d9aa	cql3: track CQL parsing memory cost and use it for admission control Use rolling_max_tracker to record gross bytes allocated during each CQL parse. The rolling maximum is then added to the memory estimate for incoming QUERY and PREPARE requests so that the admission control in the CQL transport layer accounts for parsing overhead. The measured memory footprint serves as an upper bound rather than an exact number, but its purpose is to prevent OOMs under a heavy load of unprepared statements. In a benchmark, a node with 1G of memory showed peak non-LSA memory usage drop from 320MB (our coordinator budget is 10% of 1G) to 96MB, while tps dropped from 1.2 kops to 0.8 kops. The drop in tps is expected, as memory admission kicks in to prevent OOM.	2026-03-12 10:16:10 +01:00
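The commit adds a rolling maximum of recent parse costs on top of the base request estimate. The sketch below illustrates that idea only. It assumes a count-based window of recent samples. `RollingMaxTracker` and `estimate_request_memory` are hypothetical names, and the real `rolling_max_tracker` in `utils` may define its window and API differently.

```python
from collections import deque


class RollingMaxTracker:
    """Maximum over the most recent `window` samples (hypothetical sketch).

    A monotonic deque keeps candidate maxima in strictly decreasing order,
    so add() and current() are amortized O(1).
    """

    def __init__(self, window):
        self._window = window
        self._seq = 0
        self._dq = deque()  # (sequence number, value)

    def add(self, value):
        # Older samples not larger than the new one can never be the max again.
        while self._dq and self._dq[-1][1] <= value:
            self._dq.pop()
        self._dq.append((self._seq, value))
        self._seq += 1
        # Evict samples that have fallen out of the window.
        while self._dq[0][0] <= self._seq - 1 - self._window:
            self._dq.popleft()

    def current(self):
        return self._dq[0][1] if self._dq else 0


def estimate_request_memory(request_size, parse_tracker):
    """Hypothetical admission estimate: base size plus worst recent parse cost."""
    return request_size + parse_tracker.current()
```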
Botond Dénes	0b19a6de85	tombstone_gc: tombstone_gc_state::for_tests(): remove unused param Closes scylladb/scylladb#28923	2026-03-12 10:01:42 +01:00
Marcin Maliszkiewicz	2d22eea2f9	Merge 'cql3: Replace SCYLLA_ASSERT and abort by throwing_assert' from Nadav Har'El In this patch we replace every single use of SCYLLA_ASSERT(), abort() and assert() in the cql3/ directory by throwing_assert(). The problem with SCYLLA_ASSERT()/abort()/assert() is that when it fails, it crashes Scylla. This is almost always a bad idea (see #7871 discussing why), but it's even riskier in front-end code like cql3/: In front-end code, there is a risk that due to a bug in our code, a specific user request can cause Scylla to crash. A malicious user can send this query to all nodes and crash the entire cluster. When the user is not malicious, it causes a small problem (a failing request) to become a much worse crash - and worse, the user has no idea which request is causing this crash and the crash will repeat if the same request is tried again. All of this is solved by using the new throwing_assert(), which is the same as SCYLLA_ASSERT() but throws an exception (using on_internal_error()) instead of crashing. The exception will prevent the code path with the invalid assumption from continuing, but will result in only the current user request being aborted, with a clear error message reporting the internal server error due to an assertion failure. I reviewed all the changes that I did in these patches to check that (to the best of my understanding) none of the assertions in cql3/ involve the sort of serious corruption that might require crashing the Scylla node entirely. throwing_assert() also improves logging of assertion failures compared to the original SCYLLA_ASSERT()/abort() - SCYLLA_ASSERT() printed a message to stderr which in many installations is lost, and abort() often prints no message at all. But throwing_assert() uses Scylla's standard logger, and also includes a backtrace in the log message. 
Fixes #13970 (Exorcise assertions from CQL code paths) Refs #7871 (Exorcise assertions from Scylla) Closes scylladb/scylladb#28847 * github.com:scylladb/scylladb: cql3: remove unnecessary assert() cql3: replace abort() by throwing_assert() cql3: Replace SCYLLA_ASSERT by throwing_assert	2026-03-12 09:09:24 +01:00
Szymon Malewski	3116db6c2d	test: fix `testJsonOrdering` The `test/cqlpy/cassandra_tests/validation/entities/json_test.py::testJsonOrdering` was failing because of differences between Cassandra and Scylla in printing JSON floating point values - e.g. Cassandra prints 30.0, where Scylla prints 30. Both are valid, so in this patch, instead of comparing strings, we compare parsed JSON using `EquivalentJson`. Fixes #28467 Closes scylladb/scylladb#28924	2026-03-12 09:07:08 +01:00
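Comparing parsed values instead of strings makes `30.0` and `30` equal. Below is a minimal sketch using Python's standard `json` module. `json_equivalent` is a hypothetical helper, not the cqlpy `EquivalentJson` implementation. Parsing also ignores whitespace and object key order, while the order of JSON array elements (e.g. rows) is still compared.

```python
import json


def json_equivalent(a, b):
    """Compare two JSON documents by value rather than by text (illustrative)."""
    return json.loads(a) == json.loads(b)
```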
Marcin Maliszkiewicz	5b2a07b408	utils: add rolling max tracker We will use it later to track parser memory usage via per query samples. Tests runtime in dev: 1.6s	2026-03-12 08:56:41 +01:00
Marcin Maliszkiewicz	54ef8fca57	auth: remove DEFAULT_SUPERUSER_NAME constant and dead DEFAULT_USER_PASSWORD DEFAULT_SUPERUSER_NAME is no longer referenced after removing the role_part special-casing in describe_roles. DEFAULT_USER_PASSWORD was dead code too.	2026-03-12 08:46:00 +01:00
Marcin Maliszkiewicz	029410e159	auth: use configurable default_superuser in describe_roles Replace the hardcoded meta::DEFAULT_SUPERUSER_NAME comparison with default_superuser(_qp) which reads from the auth_superuser_name config option. This makes the IF NOT EXISTS clause in DESCRIBE output correct for clusters with a non-default superuser name.	2026-03-12 08:45:47 +01:00
Nadav Har'El	09a399ae3c	Merge 'Replace estimated_histogram with approx_exponential_histogram - alternator' from Amnon Heiman _"A journey of a thousand miles begins with a single step" Lao Tzu_ ScyllaDB uses estimated_histogram in many places. We already have a more efficient alternative: approx_exponential_histogram. It is both CPU and memory-efficient and can be exported as Prometheus native histograms. Its main limitation (which has its benefits) is that the bucket layout is fixed at compile time, so histograms with different configurations cannot be mixed. The end goal is to replace all uses of estimated_histogram in the codebase. That migration needs a few small API adjustments, so I am splitting the work into steps for easier review. This series is the first step. It introduces a base template for fixed-size estimated histograms, and replaces the Alternator's estimated_histogram with the template. This change is self-contained and valuable on its own, while keeping the scope limited. Minor adjustments were made to the code and tests so that the tests would pass. Follow-up PRs will apply the same pattern to the rest of the code. New feature, no need to backport Closes scylladb/scylladb#28987 * github.com:scylladb/scylladb: alternator: migrate to operation_size_kb histograms test/alternator/test_metrics.py: Update the bucket in the histogram search alternator: Use batch_histogram for batch size histograms estimated_histogram.hh: adds estimated_histogram_with_max	2026-03-12 00:06:16 +02:00
Wojciech Mitros	b1bd206147	transport: extract the error handling from process_request_one When we forward CQL statements, we'll need to handle the errors on the destination node. Only for read_failure_exception_with_timeout exception, we'll still need to wait until timeout passes on the source node. For that we extract the exception handling to a separate method. Additionally, we separate the waiting and all other handling, so that all handling aside from waiting will be reusable after forwarding, and we'll also be able to sleep on the source node if necessary.	2026-03-11 19:40:47 +01:00
Wojciech Mitros	6184b1d5ea	transport: move error response helpers from connection to cql_server These methods are used only in the error handler in the cql server, and outside of 3 cases, they don't need any information from the cql_server::connection. We move them from cql_server::connection to cql_server, so that they can be used in the following patches for methods for CQL request forwarding where we'll have no instance of cql_server::connection on the node forwarded to. After the change the methods require no access to the server's or connection's fields, so we also make them static methods.	2026-03-11 19:40:47 +01:00
Amnon Heiman	1339a44163	alternator: migrate to operation_size_kb histograms Switch Alternator operation-size metrics from the legacy estimated histogram implementation to estimated_histogram_with_max<512> and export them through the native approx-exponential histogram path. Add a dedicated operation-size histogram type alias based on estimated_histogram_with_max<512>. Replace all per-operation size histograms (GetItem/PutItem/DeleteItem/ UpdateItem/BatchGetItem/BatchWriteItem) with the new type. Remove the custom legacy histogram-to-metrics adapter and use to_metrics_histogram() for operation size metrics, aligning export behavior with other approx-exponential histograms. Update Alternator metrics tests to compute expected le bucket boundaries using approx-exponential bucket math (including deduplication of equal bounds), so assertions match the new exported histogram schema. Update bucket helper signatures to use (max, precision) parameters and keep +Inf handling unchanged. Replace byte-to-KB ceiling conversion with plain integer division (bytes / 1024): histogram export already reports each bucket by its upper bound (le), so rounding input values up before bucketing is unnecessary and would over-shift borderline samples into higher buckets.	2026-03-11 17:29:14 +02:00
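The bucket math and the floor-vs-ceiling point can be illustrated with a small sketch. It assumes that `approx_exponential_histogram<1, Max, 4>` splits each power-of-two range into 4 equal-width sub-buckets with integer upper bounds. At the low end those bounds collapse and must be deduplicated. This layout is an assumption for illustration. `bucket_upper_bounds`, `bucket_for`, `size_kb` and `size_kb_ceil` are hypothetical helpers.

```python
import bisect


def bucket_upper_bounds(min_val, max_val, precision):
    """Integer `le` bounds, assuming `precision` sub-buckets per power of two."""
    bounds = []
    base = min_val
    while base < max_val:
        step = base / precision
        for i in range(1, precision + 1):
            b = int(base + step * i)        # integer upper bound
            if not bounds or b != bounds[-1]:
                bounds.append(b)            # deduplicate equal bounds
        base *= 2
    return bounds


def bucket_for(value, bounds):
    """Return the `le` bound of the bucket a sample falls into (+Inf above all)."""
    i = bisect.bisect_left(bounds, value)   # first bound >= value
    return bounds[i] if i < len(bounds) else float("inf")


def size_kb(nbytes):
    return nbytes // 1024                   # plain integer division, as in the commit


def size_kb_ceil(nbytes):
    return -(-nbytes // 1024)               # the previous round-up conversion
```

Under these assumptions, a 1025-byte item lands in the `le=1` bucket with plain division but in the `le=2` bucket if rounded up first. That is the over-shift the commit message describes.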
Marcin Maliszkiewicz	adc840919b	auth: move default_superuser to common, remove _superuser member Move default_superuser() to auth::meta in common.{hh,cc} and remove the cached _superuser member from both standard_role_manager and password_authenticator. The superuser name comes from config which is immutable at runtime, so caching it is unnecessary.	2026-03-11 16:28:38 +01:00
Marcin Maliszkiewicz	993e06c1ae	auth: use LOCAL_ONE for all auth queries Removes auth-v1 hack for cassandra superuser as auth-v1 code no longer exists. Also CL is not really used when querying raft replicated tables (like auth ones), but LOCAL_ONE is the least confusing one.	2026-03-11 16:27:15 +01:00
Marcin Maliszkiewicz	6d1153687a	auth: remove get_auth_ks_name indirection Replace get_auth_ks_name(qp) with db::system_keyspace::NAME directly. The function always returned the constant "system" and its qp parameter was unused.	2026-03-11 16:26:47 +01:00
David	79f9967eaa	docs: update theme 1.9 Motivation Upgrades Sphinx to 9.x, MyST Parser to 5.x, Python to 3.11–3.14, Node.js to 22, and replaces Poetry with uv for dependency management. Changelog: https://github.com/scylladb/sphinx-scylladb-theme/blob/master/docs/source/upgrade/CHANGELOG.md#190---26-february-2026 How to test * Make sure you are using Python 3.11-3.14: * python --version * Install uv: * make setupenv * Build the docs: * make preview * Docs should render without errors at http://127.0.0.1:5500 Closes scylladb/scylladb#28971	2026-03-11 16:56:51 +02:00
Aleksandra Martyniuk	2e68f48068	nodetool: cluster repair: do not fail if a table was dropped nodetool cluster repair without additional params repairs all tablet keyspaces in a cluster. Currently, if a table is dropped while the command is running, all tables are repaired but the command finishes with a failure. Modify nodetool cluster repair. If a table wasn't specified (i.e. all tables are repaired), the command finishes successfully even if a table was dropped. If a table was specified and it does not exist (e.g. because it was dropped before the repair was requested), then the behavior remains unchanged. Fixes: SCYLLADB-568. Closes scylladb/scylladb#28739	2026-03-11 16:35:04 +02:00
Dani Tweig	45d7d9a96c	.github/workflow: also call call_sync_milestone_to_jira.yml for close milestone event What changed * Added closed to milestone event types in call_sync_milestone_to_jira.yml (types: [created] -> types: [created, closed]) * Added VECTOR to the list of Jira project keys being synced (jira_project_keys: SCYLLADB,CUSTOMER,SMI,RELENG -> jira_project_keys: SCYLLADB,CUSTOMER,SMI,RELENG,VECTOR) Why (Requirements Summary) * The call_sync_milestone_to_jira.yml workflow only triggered on milestone creation. When a GitHub milestone is closed, the corresponding Jira versions (in SCYLLADB, CUSTOMER, SMI, RELENG projects) should be marked as released. Adding the closed trigger enables the called workflow (main_sync_milestone_to_jira_release.yml in github-automation) to handle both creating and releasing Jira versions from GitHub milestone events. * Added the VECTOR project so its Jira versions are also created/released when milestones are created or closed in scylladb.git. * This is consistent with the same change already applied to the staging and scylla-machine-image repos. Fixes:PM-216 Update call_sync_milestone_to_jira.yml in scylladb.git - add close trigger and VECTOR project sync Closes scylladb/scylladb#28981	2026-03-11 15:56:55 +02:00
Amnon Heiman	69fbcd32bd	test/alternator/test_metrics.py: Update the bucket in the histogram search	2026-03-11 15:24:05 +02:00
Amnon Heiman	50af1f3671	alternator: Use batch_histogram for batch size histograms Switch batch-related histograms to estimated_histogram_with_max. This results in lower memory consumption and improved efficiency.	2026-03-11 15:21:25 +02:00
Amnon Heiman	b22162c719	estimated_histogram.hh: adds estimated_histogram_with_max This patch adds the estimated_histogram_with_max template, which will be a base for specific estimated_histograms, eventually replacing the current struct implementation. Introduce estimated_histogram_with_max<Max> as a reusable wrapper around approx_exponential_histogram<1, Max, 4>, providing merge support and the same add helpers used by the existing estimated_histogram type. Add estimated_histogram_with_max_merge() Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2026-03-11 15:02:37 +02:00
Radosław Cybulski	fe8117feee	alternator: fix shard's parent calculation for vnodes Fix an invalid condition when searching for a parent shard when the table is based on vnodes. Each shard has an associated `last token`, the token that marks the (inclusive) end of the range of tokens it consumes. Additional assumptions are that the whole token space is used and (for vnodes) that the token space wraps around. Previously the code looked like this: auto pid = std::upper_bound(..., [](const dht::token& t, const cdc::stream_id& id) { return t < id.token(); }); if (pid != pids.begin()) { pid = std::prev(pid); } An `upper_bound` call with `t < id.token()` looks for the first iterator for which `t < id.token()` becomes true, which effectively means the first position whose token is greater than the searched value. Then we move the iterator backward once if possible. Assuming token space <-2, 2> and parents [0, 2], when we search for: - -1 -> we will get 0, it's first, so we can't move backward, so 0 (ok) - 0 -> we will get 2, it's not first, so we go back and we return 0 (ok) - 1 -> we will get 2, it's not first, so we go back and we return 0 (not ok - should be 2) The fix is to replace it with `std::lower_bound` and remove the conditional backward motion. Since we have a guarantee that the whole token space is used, if `std::lower_bound` ends with the `end()` value, then we have a wrap-around case and we need to pick `begin()` as the result. Fixes #28354 Fixes: SCYLLADB-537 Closes scylladb/scylladb#28382	2026-03-11 14:51:42 +02:00
Calle Wilund	bc544eb08e	gcs_fixture: Change to use docker helper	2026-03-11 12:32:02 +01:00
Calle Wilund	eb2dfe04e1	aws_kms_fixture: Modify to use docker helper	2026-03-11 12:32:02 +01:00
Calle Wilund	4a8afd9649	test/lib/proc_util: Add docker helper Adds a boost test equivalent of dockerized_service to handle launching a dockerized mock service using an ephemeral port, querying the port and returning the process.	2026-03-11 12:32:02 +01:00
Calle Wilund	3e8a9a0beb	pytest: use ephemeral port publish for docker mock servers Changes dockerized_service to use ephemeral port publish, and query the published port from podman/docker. Modifies client code to use slightly changed usage syntax.	2026-03-11 12:32:01 +01:00
Piotr Dulikowski	d9a277453e	Merge 'cql3: pin prepared cache entry in prepare() to avoid invalid weak handle race' from Alex Dathskovsky query_processor::prepare() could race with prepared statement invalidation: after loading from the prepared cache, we converted the cached object to a checked weak pointer and then continued asynchronous work (including error-injection waitpoints). If invalidation happened in that window, the weak handle could no longer be promoted and the prepare path could fail nondeterministically. This change keeps a strong cache entry reference alive across the whole critical section in prepare() by using a pinned cache accessor (get_pinned()), and only deriving the weak handle while the entry is pinned. This removes the lifetime gap without adding retry loops. Test coverage was extended in test/cluster/test_prepare_race.py: - reproduces the invalidation-during-prepare window with injection, - verifies prepare completes successfully, - then invalidates again and executes the same stale client prepared object, - confirms the driver transparently re-requests/re-prepares and execution succeeds. This change introduces: - no behavior change for normal prepare flow besides stronger lifetime guarantees, - no new protocol semantics, - preserves existing cache invalidation logic, - adds explicit cluster-level regression coverage for both the race and driver reprepare path. - pushes the re-prepare operation towards the driver: the server will return an unprepared error the first time and the driver will have to re-prepare during the execution stage Fixes: https://github.com/scylladb/scylladb/issues/27657 Backport to active branches recommended: No node crash, but user-visible PREPARE failures under rare schema-invalidation race; low-risk timeout-bounded retry improves robustness. Closes scylladb/scylladb#28952 * github.com:scylladb/scylladb: transport/messages: hold pinned prepared entry in PREPARE result cql3: pin prepared cache entry in prepare() to avoid invalid weak handle race	2026-03-11 12:09:23 +01:00
Calle Wilund	e3e940bc47	dbuild: Use container network in dbuild nested containers Remove the host network setting, ensuring we use private networks (slirp4netns). This will allow nested container port aliasing, helping CI stability (can use ephemeral ports and container introspection). This also makes the nested podman setup non-conditional, since we only run podman containers inside dbuild, and need the setup regardless of whether the host container is docker or not.	2026-03-11 12:05:51 +01:00
Calle Wilund	8a56eafd39	scylla_cluster: Read notify sock in background to prevent deadlock Starts a thread to process scylla notify messages (NOTIFY_SOCKET) instead of just processing them inline, non-blocking. This is because the created pipe can be too small to hold all the messages written before we reach the point where we would otherwise read from it, which is what allows the other end (scylla) to proceed.	2026-03-11 11:59:00 +01:00
Patryk Jędrzejczak	37aeba9c8c	Merge 'raft: add global read barrier to group0_batch::commit and switch auth and service levels' from Marcin Maliszkiewicz This series adds a global read barrier to raft_group0_client, ensuring that Raft group0 mutations are applied on all live nodes before returning to the caller. Currently, after a group0_batch::commit, the mutations are only guaranteed to be applied on the leader. Other nodes may still be catching up, leading to stale reads. This patch introduces a broadcast read barrier mechanism. Calling send_group0_read_barrier_to_live_members after committing will cause the coordinator to send a read barrier RPC to all live nodes (discovered via gossiper) and wait for them to complete. This is a best-effort attempt to get cluster-wide visibility of the committed state before the response is returned to the user. Auth and service levels write paths are switched to use this new mechanism. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-650 Backport: no, new feature Closes scylladb/scylladb#28731 * https://github.com/scylladb/scylladb: test: add tests for global group0_batch barrier feature qos: switch service levels write paths to use global group0_batch barrier auth: switch write paths to use global group0_batch barrier raft: add function to broadcast read barrier request raft: add gossiper dependency to raft_group0_client raft: add read barrier RPC	2026-03-11 10:37:19 +01:00
Botond Dénes	54bddeb3b5	sstables/mx/writer: yield after writing a cell With the goal of avoiding stalls on writing large collections, like below: ++[0#1/1 100%] addr=0x5422d1e total=32 count=1 avg=32: \| seastar::backtrace<seastar::backtrace_buffer::append_backtrace_oneline()::{lambda(seastar::frame)#1}> at ./build/release/seastar/./seastar/include/seastar/util/backtrace.hh:85 ++ - addr=0x541b6d4: \| seastar::backtrace_buffer::append_backtrace_oneline at ./build/release/seastar/./seastar/src/core/reactor.cc:811 \| (inlined by) seastar::print_with_backtrace at ./build/release/seastar/./seastar/src/core/reactor.cc:838 ++ - addr=0x541afb7: \| seastar::internal::cpu_stall_detector::generate_trace at ./build/release/seastar/./seastar/src/core/reactor.cc:1479 ++ - addr=0x541b86c: \| seastar::internal::cpu_stall_detector::maybe_report at ./build/release/seastar/./seastar/src/core/reactor.cc:1214 \| (inlined by) seastar::internal::cpu_stall_detector::on_signal at ./build/release/seastar/./seastar/src/core/reactor.cc:1234 \| (inlined by) seastar::reactor::block_notifier at ./build/release/seastar/./seastar/src/core/reactor.cc:1548 /opt/scylladb/libreloc/libc.so.6: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=f83d43b9b4b0ed5c2bd0a1613bf33e08ee054c93, for GNU/Linux 3.2.0, not stripped ++ - addr=/opt/scylladb/libreloc/libc.so.6+0x1a28f: \| sigpending at ??:0 ++ - addr=0x1760bf6: \| std::basic_string_view<signed char, std::char_traits<signed char> >::remove_prefix at /usr/lib/gcc/x86_64-redhat-linux/15/../../../../include/c++/15/string_view:302 \| (inlined by) managed_bytes_basic_view<(mutable_view)0>::remove_prefix at ././utils/managed_bytes.hh:421 \| (inlined by) _Z11read_simpleIlTk14FragmentedView24managed_bytes_basic_viewIL12mutable_view0EEET_RT0_ at ././utils/fragment_range.hh:365 \| (inlined by) 
_ZL9get_fieldIlTk14FragmentedView24managed_bytes_basic_viewIL12mutable_view0EEQsr3stdE12is_trivial_vIT_EES3_T0_j at ././mutation/atomic_cell.hh:62 \| (inlined by) atomic_cell_type::timestamp at ././mutation/atomic_cell.hh:103 \| (inlined by) basic_atomic_cell_view<(mutable_view)0>::timestamp at ././mutation/atomic_cell.hh:232 \| (inlined by) sstables::mc::writer::write_cell at ./sstables/mx/writer.cc:1101 \| (inlined by) sstables::mc::writer::write_collection(bytes_ostream&, clustering_key_prefix const, column_definition const&, collection_mutation_view, sstables::mc::writer::row_time_properties const&, bool)::$_0::operator() at ./sstables/mx/writer.cc:1233 \| (inlined by) collection_mutation_view::with_deserialized<sstables::mc::writer::write_collection(bytes_ostream&, clustering_key_prefix const, column_definition const&, collection_mutation_view, sstables::mc::writer::row_time_properties const&, bool)::$_0> at ././mutation/collection_mutation.hh:97 \| (inlined by) sstables::mc::writer::write_collection at ./sstables/mx/writer.cc:1221 ++ - addr=0x1677af3: \| sstables::mc::writer::write_cells at ./sstables/mx/writer.cc:1261 \| (inlined by) sstables::mc::writer::write_row_body at ./sstables/mx/writer.cc:1287 \| (inlined by) sstables::mc::writer::write_clustered at ./sstables/mx/writer.cc:1377 \| (inlined by) _ZN8sstables2mc6writer15write_clusteredI14clustering_rowQ9ClusteredIT_EEEvRKS4_9tombstone at ./sstables/mx/writer.cc:766 \| (inlined by) sstables::mc::writer::consume at ./sstables/mx/writer.cc:1425 Putting the yield in write_cell() instead of in write_collection() means that writing any row benefits from the added yield point in the middle. Refs: SCYLLADB-964 Closes scylladb/scylladb#28948	2026-03-11 10:34:55 +01:00
Botond Dénes	475220b9c9	Merge 'Remove the rest of pre raft topology code' from Gleb Natapov Remove the rest of the code that assumes that either group0 does not exist yet or a cluster is still not upgraded to raft topology. Neither of those is supported any more. No need to backport since we remove functionality here. Closes scylladb/scylladb#28841 * github.com:scylladb/scylladb: service level: remove version 1 service level code features: move GROUP0_SCHEMA_VERSIONING to deprecated features list migration_manager: remove unused forward definitions test: remove unused code auth: drop auth_migration_listener since it does nothing now schema: drop schema_registry_entry::maybe_sync() function schema: drop make_table_deleting_mutations since it should not be needed with raft schema: remove calculate_schema_digest function schema: drop recalculate_schema_version function and its uses migration_manager: drop check for group0_schema_versioning feature cdc: drop usage of cdc_local table and v1 generation definition storage_service: no need to add yourself to the topology during reboot since raft state loading already did it storage_service: remove unused functions group0: drop with_raft() function from group0_guard since it always returns true now gossiper: do not gossip TOKENS and CDC_GENERATION_ID any more gossiper: drop tokens from loaded_endpoint_state gossiper: remove unused functions storage_service: do not pass loaded_peer_features to join_topology() storage_service: remove unused fields from replacement_info gossiper: drop is_safe_for_restart() function and its use storage_service: remove unused variables from join_topology gossiper: remove the code that was only used in gossiper topology storage_service: drop the check for raft mode from recovery code cdc: remove legacy code test: remove unused injection points auth: remove legacy auth mode and upgrade code treewide: remove schema pull code since we never pull schema any more raft topology: drop upgrade_state and its type from the topology state machine since it is not used any longer group0: hoist the checks for an illegal upgrade into main.cc api: drop get_topology_upgrade_state and always report upgrade status as done service_level_controller: drop service level upgrade code test: drop run_with_raft_recovery parameter to cql_test_env group0: get rid of group0_upgrade_state storage_service: drop topology_change_kind as it is no longer needed storage_service: drop check_ability_to_perform_topology_operation since no upgrades can happen any more service_storage: remove unused functions storage_service: remove non raft rebuild code storage_service: set topology change kind only once group0: drop in_recovery function and its uses group0: rename use_raft to maintenance_mode and make it sync	2026-03-11 10:24:20 +02:00
Piotr Dulikowski	38a2829f69	Merge 'Return HTTP error description in Vector Store client' from Szymon Wasik The `service_error` struct: `6dc2c42f8b/service/vector_store_client.hh (L64)` currently stores just the error status code. For this reason, whenever an HTTP error occurs, only the error code can be forwarded to the client. For example see here: `6dc2c42f8b/service/vector_store_client.cc (L580)` As a result, the full description of the error is missing from the drivers' output, which forces the user to look into the Scylla server logs. The objective of this PR is to extend the support for HTTP errors in the Vector Store client to handle messages as well. Moreover, it removes the quadratic reallocation in the response_content_to_sstring() helper function that is used for getting the response in case of error. Fixes: VECTOR-189 Closes scylladb/scylladb#26139 * github.com:scylladb/scylladb: vector_search: Avoid quadratic reallocation in response_content_to_sstring vector_store_client: Return HTTP error description, not just code	2026-03-11 09:19:27 +01:00
Calle Wilund	6d8ac23731	test_encryption: Use maximum replication in _smoke_test Refs: SCYLLADB-557 We should use full replication in KS/CF creation and population, for at least two reasons: 1.) Ensure we wait fully for and write to all nodes 2.) Make test more "real", behaving like a proper cluster Closes scylladb/scylladb#28959	2026-03-11 09:54:57 +02:00
Nadav Har'El	00a819bcd8	cql3: remove unnecessary assert() In cql3/, there was one call to assert() (not SCYLLA_ASSERT or throwing_assert), and it was: const auto shard_num = smp::count; assert(shard_num > 0) Rather than converting this assert() to throwing_assert() as I did in previous patches, I decided to outright remove it: Seastar guarantees that smp::count is not zero. Many other places in the code use smp::count assuming that it is correct, no other place bothers to assert it isn't zero. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-03-11 09:43:24 +02:00
Nadav Har'El	34eec020b3	cql3: replace abort() by throwing_assert() After the previous patch replaced all SCYLLA_ASSERT() calls by throwing_assert(), this patch also replaces all calls to abort(). All these abort() calls are supposedly cases that can never happen, but if they ever do happen because of a bug, in none of these places do we absolutely need to crash: an exception that aborts the current operation should be enough. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-03-11 09:43:11 +02:00
Nadav Har'El	c87d6407ed	cql3: Replace SCYLLA_ASSERT by throwing_assert In this patch we replace every single use of SCYLLA_ASSERT() in the cql3/ directory by throwing_assert(). The problem with SCYLLA_ASSERT() is that when it fails, it crashes Scylla. This is almost always a bad idea (see #7871 discussing why), but it's even riskier in front-end code like cql3/: In front-end code, there is a risk that due to a bug in our code, a specific user request can cause Scylla to crash. A malicious user can send this query to all nodes and crash the entire cluster. When the user is not malicious, it causes a small problem (a failing request) to become a much worse crash - and worse, the user has no idea which request is causing this crash and the crash will repeat if the same request is tried again. All of this is solved by using the new throwing_assert(), which is the same as SCYLLA_ASSERT() but throws an exception (using on_internal_error()) instead of crashing. The exception will prevent the code path with the invalid assumption from continuing, but will result in only the current user request being aborted, with a clear error message reporting the internal server error due to an assertion failure. I reviewed all the changes that I did in this patch to check that (to the best of my understanding) none of the assertions in cql3/ involve the sort of serious corruption that might require crashing the Scylla node entirely. throwing_assert() also improves logging of assertion failures compared to the original SCYLLA_ASSERT() - SCYLLA_ASSERT() printed a message to stderr which in many installations is lost, whereas throwing_assert() uses Scylla's standard logger, and also includes a backtrace in the log message. Fixes #13970 (Exorcise assertions from CQL code paths) Refs #7871 (Exorcise assertions from Scylla) Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-03-11 09:41:20 +02:00
Botond Dénes	99fa912f1b	Merge 'Generalize streaming scopes tests' from Pavel Emelyanov To restore how streaming scopes work there are two tests that greatly duplicate each other -- test_restore_with_streaming_scopes from cluster/object_store suite and test_refresh_with_streaming_scopes from cluster suite. This patch generalizes both into a do_test_streaming_scopes() non-test function Closes scylladb/scylladb#28874 * github.com:scylladb/scylladb: test: Re-sort comments around do_test_streaming_scopes() test: Split do_load_sstables() test: Drop load_fn argument from do_load_sstables() test: Re-use do_test_streaming_scopes() in refresh test test: Introduce SSTablesOnLocalStorage test: Introduce SSTablesOnObjectStorage test: Move test_restore_with_streaming_scopes() into do_test_streaming_scopes()	2026-03-11 09:35:21 +02:00
Dmitriy Kruglov	cee44716db	docs: add cluster platform migration procedure Document how to migrate a ScyllaDB cluster to different instance types using the add-and-replace node cycling approach. Closes: QAINFRA-42 Closes scylladb/scylladb#28458	2026-03-11 09:31:35 +02:00
Nadav Har'El	401dc1894c	test/alternator,cqlpy: avoid xfail_strict against DynamoDB/Cassandra Recently, in commit `7b30a39`, we added to pytest.ini the option xfail_strict. This option causes every XPASS, i.e., every XFAIL test that actually passes, to be considered an error and fail the test. While this has some benefits, it's a big problem when running tests against a reference implementation like DynamoDB or Cassandra: We typically mark a test "xfail" if the test shows a known bug - i.e., if the test fails on Scylla but passes on the reference system (DynamoDB or Cassandra). This means that when running "test/cqlpy/run-cassandra" or "test/alternator/run --aws", we expect to see many tests XPASS, and now this will cause these runs to "fail". So in this patch we add xfail_strict=false to cqlpy/run-cassandra and alternator/run --aws. This option is not added to cqlpy/run or to alternator/run without --aws, and also doesn't affect test.py or Jenkins. P.S. This is another nail in the coffin of doing "cd test/alternator; pytest --aws". You should get used to running Alternator tests through test/alternator/run, even if you don't need to run Scylla (the "--aws" option doesn't run Scylla). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#28973	2026-03-11 09:29:30 +02:00
Robert Bindar	29619e48d7	replica/table: calculate manifest tablet_count from tablet map During tests I noticed that if the number of tablets is very small, say 2, and the number of nodes is 3 (2 shards per node), using the number of storage groups on each shard, a shard may end up holding 0 groups, whilst the other holds 1 group. And in some nodes even both shards have 0 groups. Taking the minimum among shards here was showing in manifests a tablet count of 0 for all 3 nodes, which is incorrect. Signed-off-by: Robert Bindar <robert.bindar@scylladb.com> Closes scylladb/scylladb#28978	2026-03-11 09:27:04 +02:00
Botond Dénes	3fed6f9eff	Merge 'service: tasks: scan all tablets in tablet_virtual_task::wait' from Aleksandra Martyniuk Currently, for repair tasks tablet_virtual_task::wait gathers the ids of tablets that are to be repaired. The gathered set is later used to check if the repair is still ongoing. However, if the tablets are resized (split or merged), the gathered set becomes irrelevant. Thus, we may end up with an invalid tablet id error being thrown. Wait until repair is done for all tablets in the table. Fixes: https://github.com/scylladb/scylladb/issues/28202 Backport to 2026.1 needed as it contains the change introducing the issue `d51b1fea94` Closes scylladb/scylladb#28323 * github.com:scylladb/scylladb: service: fix indentation test: add test_tablet_repair_wait service: remove status_helper::tablets service: tasks: scan all tablets in tablet_virtual_task::wait	2026-03-11 09:24:07 +02:00
Raphael S. Carvalho	cc5b1acadf	Improve log when sstable load fails due to missing tablet replica A bug or some bad operator intervention can lead to an sstable existing on a node after the tablet replica was moved to a different node. This will result in sstable loading failing during boot, requiring operator intervention. The log today just dumps the name of the "orphaned" sstable, but someone investigating it might want to know which process (repair, memtable, whatever) generated that sstable, whether the sstable was created locally or remotely, and the current replica set of the underlying tablet. From the original identifier, we can know the exact time the sstable was created on its original node. From the current id, we know the time it was created on the current node. All this info can help the investigator correlate with events on other nodes (including actions from the coordinator) to get closer to the root cause. The new log will look like this: "Unable to load SSTable .../me-3gyg_1fsw_2u0u826b00b71vc46o-big-Data.db (originated from compaction with id 913f41c0-18c2-11f1-8f08-cb8521b3f330 on host e483238c-2287-4022-8bc4-b4f1c4cb2b0d) of tablet 6 (replica set: [e483238c-2287-4022-8bc4-b4f1c4cb2b0d:0])" Refs https://scylladb.atlassian.net/browse/SCYLLADB-788. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#28921	2026-03-11 06:20:34 +02:00
Avi Kivity	b17e1259e3	Merge 'types: optimize vector deserialization for high-dimensional vectors' from Szymon Wasik Vector deserialization is an operation whose performance is critical for the vector similarity search feature because it is frequently executed during rescoring. Some of the identified performance bottlenecks for it include: 1. Per-element virtual dispatch in deserialize(): each of the N elements went through visit() which switches on ~28 type variants. For a 1024-dimension float vector, that's 1024 redundant type switches when the element type is the same for all of them. 2. Redundant work in split_fragmented(): value_length_if_fixed() was called inside the loop (N virtual calls), and no reserve() was done on the output vector causing repeated reallocations. This series fixes both: - Introduce deserialize_vector_visitor that dispatches on the element type once for the entire vector, then loops inside the resolved handler. Simple numeric types (float, int, etc.) call deserialize_value() directly with no virtual dispatch per element. String types (ascii, utf8) get a dedicated handler that skips make_empty() (sstring has no empty_t constructor). Complex types (list, map, tuple, etc.) fall back to per-element dispatch. - In split_fragmented(), reserve the output vector to _dimension and cache value_length_if_fixed() before the loop. Benchmark results (1024-dim float vector, release build, -O3 -flto): deserialize: 15.73 us -> 11.70 us (1.34x, 26% faster) split_fragmented: 10.34 us -> 7.45 us (1.39x, 28% faster) References: SCYLLADB-471 Backport: none, unless we observe some critical performance improvement for quantization. Closes scylladb/scylladb#28618 * github.com:scylladb/scylladb: types: optimize reading vector fragments types: optimize vector deserialization for high-dimensional vectors	2026-03-11 00:39:46 +02:00
Dawid Mędrek	167feabe1a	cql3: Reject user-provided timestamps for strongly consistent tables Similarly to LWTs, we reject queries with user-provided timestamps when they target strongly consistent tables. Such statements could force us to rewrite history, and that contradicts the philosophy of linearizability we aim for. Fixes SCYLLADB-879 Closes scylladb/scylladb#28867	2026-03-10 22:11:39 +02:00
Marcin Maliszkiewicz	8ae80a32c0	Update seastar submodule * seastar d2953d2a...4d268e0e (32): > Merge 'prometheus: support multiple __name__ filters and prefixed names' from Travis Downs doc: update prometheus.md with __name__ filter enhancements prometheus: support prefixed names in __name__ filter prometheus: add benchmarks for name filter performance prometheus: support multiple __name__ query parameters prometheus: move write_body_args to header > fair_queue: Subtract from _queued_capacity on pop_front() > memory: expose cumulative allocated bytes statistic > Merge 'Add ability to configure IO bandwidth limit for supergroup' from Pavel Emelyanov test: IO bandiwdth throttler unit tests code: Add ability to configure IO bandwidth limit for supergroup io_queue: Have more than one throttler par class io_queue: Introduce bandwidth_throttler helper class io_queue: Nest io_group::priotiy_class_data-s io_queue: Update class bandwidth on group's class data io_queue: Make io_group::priority_class_data::tokens() static fair_queue: Introduce group (un)plugging > Fix _shard_to_numa_node_mapping double population > Use exception parameter in log_timer_callback_exception() > Fix wakeup_granularity() fallback debug-fs reading > test_fixture: Fix SEASTAR_FIXTURE_THREAD_TEST_CASE thread not propagated > build: support tuning -ffile-prefix-map > test: Remove unused C::dup() method of testing class > src/core/reactor: introduce reactor::get_backend_name() > util/process: add pid() accessor > Merge 'Add source location to task and tasktrace object' from Radosław Cybulski coroutine.hh: disable source_location for GCC to avoid ICE reactor: improve do_dump_task_queue reporting Use source_location in `do_dump_task_queue` Update backtrace with source locations of resume points Add calls to update resume_point Add a std::source_location (resume_point) to task object. 
> Merge 'Refine posix file .dup() implementation' from Pavel Emelyanov file: Templatize posix_file_handle_impl file: Don't dup() non-read-only files file: Split ..._impl::dup() implementations test: Add a simple test for dup() > Merge 'Deprecate reactor::make_pollable_fd(socket_address, int)' from Pavel Emelyanov reactor: Deprecate make_pollable_fd() net/posix: Create file_desc for sockets in-place reactor,net: Keep sock_need_nonblock boolean on posix_network_stack net/posix: Re-format constructor initializer lists > Merge 'test: add fuzz testing infrastructure and sstring fuzzer' from Travis Downs test: add fuzz tests to CI workflow test: add sstring differential fuzzer test: add fuzz testing infrastructure > Introduce "integrated queue length" metrics and use it for IO classes (#3210) > reactor: Remove get_sg_data(unsigned) overload > memcached: Stop using scattered_message > reactor: Mark uptime() method const > alien: Remove deprecated run_on and submit_to calls > file: make open_flags and access_flags constexpr > scheduling: Unfriend some methods from scheduling_group > reactor: Move _dying bit to epoll backend > file: coroutinize the with_file templates > configure: validate --cook ingredient names > fix trailing whitespace > Merge 'Estimate timing overhead, allow failing if it is too high' from Travis Downs perf_tests: document overhead column and threshold options perf_tests: add measurement overhead tracking and warnings perf_tests: remove inline/hot attributes from time_measurement methods perf_tests: move time_measurement class to implementation file perf_tests: move perf counters into time_measurement singleton > rpc: log handler type > Merge 'Add pre-commit with trailing whitespace hook' from Travis Downs Add GitHub Actions workflow for pre-commit enforcement Add pre-commit setup documentation to HACKING.md Add pre-commit configuration with trailing-whitespace hook Remove trailing whitespace from source files > posix-stack: Make 
internal::posix_connect() resolve exceptions into futures > sstring: fix npos to be size_t for consistency with std::string Closes scylladb/scylladb#28954	2026-03-10 22:06:58 +02:00
Szymon Wasik	7fae78d2b0	types: optimize reading vector fragments There was redundant work in split_fragmented(): value_length_if_fixed() was called inside the loop (N virtual calls), and no reserve() was done on the output vector, causing repeated reallocations. This patch reserves the output vector to _dimension and caches value_length_if_fixed() before the loop. Additionally, split read_vector_element() into two specialized functions: read_vector_element_fixed() and read_vector_element_variable(), and hoist the branch on fixed_len outside the loop in split_fragmented() and deserialize_loop(). This avoids a conditional branch per element in the hot path. Benchmark results (1024-dim float vector, release build, -O3 -flto): 10.34 us -> 7.45 us (1.39x, 28% faster)	2026-03-10 20:17:31 +01:00
Taras Veretilnyk	579269b3c5	test/cqlpy: test --ignore-component-digest-mismatch flag in scylla sstable upgrade Verify that scylla sstable upgrade fails when an sstable has a corrupted Statistics component digest, and succeeds when the --ignore-component-digest-mismatch flag is provided.	2026-03-10 19:24:05 +01:00
Taras Veretilnyk	fc4c82b962	docs: document --ignore-component-digest-mismatch flag for scylla sstable upgrade	2026-03-10 19:24:05 +01:00
Taras Veretilnyk	7214f5a0b6	sstables: propagate ignore_component_digest_mismatch config to all load sites Add ignore_component_digest_mismatch option to db::config (default false). When set, sstable loading logs a warning instead of throwing on component digest mismatches, allowing a node to start up despite corrupted non-vital components or bugs in digest calculation. Propagate the config to all production sstable load paths: - distributed_loader (node startup, upload dir processing) - storage_service (tablet storage cloning) - sstables_loader (load-and-stream, download tasks, attach) - stream_blob (tablet streaming)	2026-03-10 19:24:05 +01:00
Taras Veretilnyk	c123f637ea	sstables: add option to ignore component digest mismatches Add `ignore_component_digest_mismatch` option to `sstable_open_config` that logs a warning instead of throwing `malformed_sstable_exception` on component digest mismatch. This is useful for recovering sstables with corrupted non-vital components or working around bugs in digest calculation. Expose the option in scylla-sstable via the `--ignore-component-digest-mismatch` flag for the upgrade operation.	2026-03-10 19:24:05 +01:00
Taras Veretilnyk	95420014ea	sstable_compaction_test: Add scrub validate test for corrupted index Generalize corrupt_sstable() and scrub_validate_corrupted_file() to accept a component_type parameter, defaulting to Data, so they can be reused for corrupting other components.	2026-03-10 19:24:05 +01:00
Taras Veretilnyk	a3912cf7f1	sstables: add tests for component digest validation on corrupted SSTables Add tests that verify SSTable component digest validation detects corruption on load. Each test writes an SSTable, corrupts a specific component file by flipping a bit, then asserts that reloading the SSTable throws malformed_sstable_exception with the expected digest mismatch message.	2026-03-10 19:24:05 +01:00
Taras Veretilnyk	e78a3d2c44	sstables: validate index components digests during SSTable scrub in validate mode	2026-03-10 19:24:05 +01:00
Taras Veretilnyk	9decbdeab0	sstables: verify component digests on SSTable load Add integrity verification for SSTable component files by validating their CRC32 digests against the expected values stored in Scylla metadata during SSTable loading. The following components are validated on load: TOC, Scylla metadata, CompressionInfo, Statistics, Summary, and Filter.	2026-03-10 19:24:05 +01:00
Taras Veretilnyk	478c1eaec5	sstables: add digest_file_random_access_reader for CRC32 digest computation Add a new reader that wraps file_random_access_reader and computes a running CRC32 digest over the data as it is read. The digest accumulates across sequential read_exactly() calls and is reset on seek(), since a non-sequential read invalidates the running checksum.	2026-03-10 19:24:05 +01:00
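Below is a minimal sketch of the running-digest idea. It assumes a plain in-memory buffer in place of a Seastar file and uses the common IEEE CRC-32 (reflected polynomial 0xEDB88320). `running_crc32` and `digest_reader` are hypothetical names, not the classes this commit adds. Sequential `read_exactly()` calls extend a single digest, and `seek()` starts a fresh one.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Table-driven CRC-32 (IEEE, reflected polynomial 0xEDB88320) that can be
// updated incrementally, chunk by chunk.
class running_crc32 {
    static std::array<std::uint32_t, 256> make_table() {
        std::array<std::uint32_t, 256> t{};
        for (std::uint32_t i = 0; i < 256; ++i) {
            std::uint32_t c = i;
            for (int k = 0; k < 8; ++k) c = (c & 1) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
            t[i] = c;
        }
        return t;
    }
    std::uint32_t _state = 0xFFFFFFFFu;
public:
    void update(std::string_view data) {
        static const auto table = make_table();
        for (unsigned char b : data) _state = table[(_state ^ b) & 0xFFu] ^ (_state >> 8);
    }
    std::uint32_t value() const { return _state ^ 0xFFFFFFFFu; }
    void reset() { _state = 0xFFFFFFFFu; }
};

// Hypothetical reader over an in-memory "file". Sequential reads feed the
// digest; seek() resets it because bytes read after a jump no longer form
// one contiguous run, so the running checksum would be meaningless.
class digest_reader {
    std::string_view _file;
    std::size_t _pos = 0;
    running_crc32 _crc;
public:
    explicit digest_reader(std::string_view file) : _file(file) {}
    std::string_view read_exactly(std::size_t n) {
        std::string_view chunk = _file.substr(_pos, n);
        _pos += chunk.size();
        _crc.update(chunk);
        return chunk;
    }
    void seek(std::size_t pos) {
        _pos = pos;
        _crc.reset();
    }
    std::uint32_t digest() const { return _crc.value(); }
};
```

Reading "1234" and then "56789" yields the standard CRC-32 check value 0xCBF43926, the same as reading "123456789" in one call.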
Szymon Wasik	6c0ef8eb92	types: optimize vector deserialization for high-dimensional vectors One of the performance bottlenecks while deserializing vectors was per-element virtual dispatch in deserialize(): each of the N elements went through visit() which switches on ~28 type variants. For a 1024-dimension float vector, that's 1024 redundant type switches when the element type is the same for all of them. This patch introduces deserialize_vector_visitor that dispatches on the element type once for the entire vector, then loops inside the resolved handler. Simple numeric types (float, int, etc.) call deserialize_value() directly with no virtual dispatch per element. String types (ascii, utf8) get a dedicated handler that skips make_empty() (sstring has no empty_t constructor). Complex types (list, map, tuple, etc.) fall back to per-element dispatch. Benchmark results (1024-dim float vector, release build, -O3 -flto): 15.73 us -> 11.70 us (1.34x, 26% faster)	2026-03-10 18:21:34 +01:00
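The dispatch-once pattern can be sketched in C++ as follows. The two-entry `elem_kind` enum, `decode_all()` and `deserialize_vector()` are hypothetical stand-ins for the real type hierarchy. For brevity, elements are read in host byte order, and the real serialization format is not modeled. The element type is inspected once per vector, and each branch runs a tight loop with no per-element switch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Hypothetical element-type tag standing in for the many type variants.
enum class elem_kind { float32, int32 };

using value = std::variant<float, std::int32_t>;

template <typename T>
static T load_native(const char* p) {
    T v;
    std::memcpy(&v, p, sizeof(T));  // host byte order: a simplification
    return v;
}

// Loop specialised for one element type: no type switch per element.
template <typename T>
static std::vector<value> decode_all(const std::string& buf, std::size_t n) {
    if (buf.size() < n * sizeof(T)) throw std::out_of_range("buffer too short");
    std::vector<value> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        out.emplace_back(load_native<T>(buf.data() + i * sizeof(T)));
    }
    return out;
}

// Switch on the element type once for the whole vector.
std::vector<value> deserialize_vector(elem_kind kind, const std::string& buf, std::size_t n) {
    switch (kind) {
    case elem_kind::float32: return decode_all<float>(buf, n);
    case elem_kind::int32:   return decode_all<std::int32_t>(buf, n);
    }
    throw std::invalid_argument("unknown element kind");
}
```

As the commit notes, complex element types would still need per-element dispatch. Only simple fixed-size types benefit from a branch this tight.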
Andrzej Jackowski	9247dff8c2	reader_concurrency_semaphore: fix leak workaround `e4da0afb8d5491bf995cbd1d7a7efb966c79ac34` added to `reader_concurrency_semaphore` a protection against resources that are "made up" out of thin air. If there are more `_resources` than `_initial_resources`, there is a negative leak, and `on_internal_error_noexcept` is called. In addition, `_resources` is set to `std::max(_resources, _initial_resources)`. However, the commit message of `e4da0afb8d5491bf995cbd1d7a7efb966c79ac34` states the opposite: "The detection also clamps the _resources to _initial_resources, to prevent any damage". Before this commit, the protection mechanism did not clamp `_resources` to `_initial_resources`; instead it kept `_resources` high, possibly growing indefinitely. This commit changes `std::max` to `std::min` to make the code behave as intended. Refs: SCYLLADB-163 Closes scylladb/scylladb#28982	2026-03-10 18:57:31 +02:00
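This hedged illustration shows why the fix is `std::min`. The `resources` struct below is a hypothetical two-field simplification. Clamping field by field is an assumption about how the real type compares. With `std::min`, an inflated counter is pulled back down to the initial amount, whereas `std::max` would keep the inflated value.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical simplification of the semaphore's resource counter.
struct resources {
    long count;
    long memory;
};

// Called when available resources exceed the initial amount (a "negative
// leak"). std::min caps each field at its initial value, so the counter can
// never grow past what the semaphore started with.
resources clamp_to_initial(resources current, resources initial) {
    return {std::min(current.count, initial.count),
            std::min(current.memory, initial.memory)};
}
```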
Szymon Wasik	74d86d3fe9	vector_search: Avoid quadratic reallocation in response_content_to_sstring Pre-compute the total size and allocate a single uninitialized sstring before copying the buffers, following the pattern from Seastar's read_entire_stream_contiguous(). This avoids iterative reallocation which is O(n^2) for large responses.	2026-03-10 17:45:55 +01:00
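The two-pass concatenation can be sketched with standard C++ types. This sketch assumes `std::string` in place of `seastar::sstring` and a plain vector of buffers in place of the HTTP response body, and `concat_buffers()` is a hypothetical name. Repeated concatenation that copies everything accumulated so far copies O(n^2) bytes in total. Summing the sizes first allows a single allocation and copies each byte once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Pass 1: sum the buffer sizes. Pass 2: copy each buffer into its slot in
// an output string sized exactly once.
std::string concat_buffers(const std::vector<std::string>& bufs) {
    std::size_t total = 0;
    for (const auto& b : bufs) total += b.size();
    // std::string zero-fills here; the commit uses an uninitialized sstring.
    std::string out(total, '\0');
    char* dst = out.data();
    for (const auto& b : bufs) {
        std::memcpy(dst, b.data(), b.size());
        dst += b.size();
    }
    return out;
}
```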
Szymon Wasik	d27610f138	vector_store_client: Return HTTP error description, not just code This simple patch adds support for storing the HTTP error description that the Vector Store client receives from Vector Store. Until now it was only printed to the log and never returned, so it was not forwarded to the drivers, which forced users to access ScyllaDB server logs to understand what is wrong with Vector Store. This patch also updates the formatter to print the message next to the error code. Fixes: VECTOR-189	2026-03-10 17:22:30 +01:00
Nadav Har'El	92ee959e9b	test/alternator: speed up test_streams.py by using module-scope fixtures Previously, all stream-table fixtures in this test file used scope="function", forcing a fresh table to be created for every test, slowing down the test a bit (though not much), and discouraging writing small new tests. This was a workaround for a DynamoDB quirk (that Alternator doesn't have): LATEST shard iterators have a time slack and may point slightly before the true stream head, causing leftover events from a previous test to appear in the next test's reads. We fix this by draining the stream inside latest_iterators() and shards_and_latest_iterators() after obtaining the LATEST iterators: fetch records in a loop until two consecutive polling rounds both return empty, guaranteeing the iterators are positioned past all pre-existing events before the caller writes anything. With this guarantee in place, all stream-table fixtures can safely use scope="module". After this patch, test_streams.py continues to pass on DynamoDB. On Alternator, the test file's run time went down a bit, from 20.2 seconds to 17.7 seconds. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-03-10 17:14:04 +02:00
Nadav Har'El	6ac1f1333f	test/alternator: test_streams.py don't use fixtures in 4 tests In the next patch, we plan to make the fixtures in test_streams.py shared between tests. Most tests work well with shared tables, but two (test_streams_trim_horizon and test_streams_starting_sequence_number) were written to expect a new table with an empty history, and two others (test_streams_closed_read and test_streams_disabled_stream) want to disable streaming and would break a shared table. So in this patch we modify these four tests to create their own new table instead of using a fixture. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-03-10 17:12:33 +02:00
Botond Dénes	81e214237f	Merge 'Add digests for all sstable components in scylla metadata' from Taras Veretilnyk This pull request adds support for calculating and storing CRC32 digests for all SSTable components. This change replaces plain file_writer with crc32_digest_file_writer for all SSTable components that should be checksummed. The resulting component digests are stored in the sstable structure and later persisted to disk as part of the Scylla metadata component during writer::consume_end_of_stream. Several test cases were introduced to verify the expected behaviour. Additionally, this PR adds a new rewrite component mechanism for safe sstable component rewriting. Previously, rewriting an sstable component (e.g., via rewrite_statistics) created a temporary file that was renamed to the final name after sealing. This allowed crash recovery by simply removing the temporary file on startup. However, with component digests stored in scylla_metadata (#20100), replacing a component like Statistics requires atomically updating both the component and scylla_metadata with the new digest - impossible with POSIX rename. The new mechanism creates a clone sstable with a fresh generation: - Hard-links all components from the source except the component being rewritten and scylla_metadata - Copies original sstable components pointer and recognized components from the source - Invokes a modifier callback to adjust the new sstable before rewriting - Writes the modified component along with updated scylla_metadata containing the new digest - Seals the new sstable with a temporary TOC - Replaces the old sstable atomically, the same way as it is done in compaction This is built on the rewrite_sstables compaction framework to support batch operations (e.g., following incremental repair). In case of any failure during the whole process, the sstable will be automatically deleted on node startup because the temporary TOC is persisted. 
Backport is not required, it is a new feature Fixes https://github.com/scylladb/scylladb/issues/20100, https://github.com/scylladb/scylladb/issues/27453 Closes scylladb/scylladb#28338 * github.com:scylladb/scylladb: docs: document components_digests subcomponent and trailing digest in Scylla.db sstable_compaction_test: Add tests for perform_component_rewrite sstable_test: add verification testcases of SSTable components digests persistance sstables: store digest of all sstable components in scylla metadata sstables: replace rewrite_statistics with new rewrite component mechanism sstables: add new rewrite component mechanism for safe sstable component rewriting compaction: add compaction_group_view method to specify sstable version sstables: add null_data_sink and serialized_checksum for checksum-only calculation sstables: extract default write open flags into a constant sstables: Add write_simple_with_digest for component checksumming sstables: Extract file writer closing logic into separate methods sstables: Implement CRC32 digest-only writer	2026-03-10 16:02:53 +02:00
Aleksandra Martyniuk	e02b0d763c	service: fix indentation	2026-03-10 14:44:52 +01:00
Aleksandra Martyniuk	02257d1429	test: add test_tablet_repair_wait Add a test that checks if tablet_virtual_task::wait won't fail if tablets are merged.	2026-03-10 14:42:27 +01:00
Aleksandra Martyniuk	0e0070f118	service: remove status_helper::tablets Currently, status_helper::tablets, which keeps a vector of processed tablet ids, is used only in tablet_virtual_task::get_status_helper, so there is no point in returning it. Also, in get_status_helper, it is used only to determine if any tablets are processed. Remove status_helper::tablets. Use a flag instead of the vector in get_status_helper.	2026-03-10 14:42:27 +01:00
Aleksandra Martyniuk	e5928497ce	service: tasks: scan all tablets in tablet_virtual_task::wait Currently, for repair tasks tablet_virtual_task::wait gathers the ids of tablets that are to be repaired. The gathered set is later used to check if the repair is still ongoing. However, if the tablets are resized (split or merged), the gathered set becomes stale. Thus, we may end up with an invalid tablet id error being thrown. Wait until repair is done for all tablets in the table.	2026-03-10 14:42:21 +01:00
Andrei Chekun	c36df5ecf4	test.py: eliminate driver's exception There is a race condition in the driver that raises a RuntimeException. This pollutes the output, so this PR simply silences this exception. Fixes: SCYLLADB-900 Closes scylladb/scylladb#28957	2026-03-10 14:31:36 +02:00
Alex	3ac4e258e8	transport/messages: hold pinned prepared entry in PREPARE result result_message::prepared now owns a strong pinned prepared-cache entry instead of relying only on a weak pointer view. This closes the remaining lifetime gap after query_processor::prepare() returns, so users of the returned PREPARE message cannot observe an invalidated weak handle during subsequent processing. - update result_message::prepared::cql constructor to accept pinned entry - construct weak view from owned pinned entry inside the message - pass pinned cache entry from query_processor::prepare() into the message constructor	2026-03-10 14:17:57 +02:00
Alex	27051d9a7c	cql3: pin prepared cache entry in prepare() to avoid invalid weak handle race query_processor::prepare() could race with prepared statement invalidation: after loading from the prepared cache, we converted the cached object to a checked weak pointer and then continued asynchronous work (including error-injection waitpoints). If invalidation happened in that window, the weak handle could no longer be promoted and the prepare path could fail nondeterministically. This change keeps a strong cache entry reference alive across the whole critical section in prepare() by using a pinned cache accessor (get_pinned()), and only derives the weak handle while the entry is pinned. This removes the lifetime gap without adding retry loops. Test coverage was extended in test/cluster/test_prepare_race.py: - reproduces the invalidation-during-prepare window with injection, - verifies prepare completes successfully, - then invalidates again and executes the same stale client prepared object, - confirms the driver transparently re-requests/re-prepares and execution succeeds. This change introduces: - no behavior change for normal prepare flow besides stronger lifetime guarantees, - no new protocol semantics, - preserves existing cache invalidation logic, - adds explicit cluster-level regression coverage for both the race and the driver re-prepare path. - pushes the re-prepare operation towards the driver: the server returns an unprepared error the first time, and the driver has to re-prepare during the execution stage	2026-03-10 14:17:57 +02:00
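The pinning idea can be illustrated with standard smart pointers. A strong reference held for the whole critical section keeps the entry alive even if invalidation removes it from the cache, while a weak handle on its own would expire. This is only an analogy built on `std::shared_ptr` and `std::weak_ptr`, not Scylla's checked weak pointers. `prepared_cache` and its methods are hypothetical and only borrow the `get_pinned()` name from the commit message.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct prepared_statement {
    std::string query;
};

// Hypothetical cache: it owns entries via shared_ptr; callers may hold
// strong ("pinned") or weak references.
class prepared_cache {
    std::map<std::string, std::shared_ptr<prepared_statement>> _entries;
public:
    void put(const std::string& id, std::string query) {
        _entries[id] = std::make_shared<prepared_statement>(prepared_statement{std::move(query)});
    }
    // Returns a strong reference: the entry stays alive while the caller
    // holds it, even if invalidate() drops it from the cache meanwhile.
    std::shared_ptr<prepared_statement> get_pinned(const std::string& id) const {
        auto it = _entries.find(id);
        if (it == _entries.end()) return nullptr;
        return it->second;
    }
    void invalidate(const std::string& id) { _entries.erase(id); }
};
```

A weak handle derived while the entry is pinned stays promotable until the pin is released.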
Piotr Dulikowski	37f8cdf485	Merge 'test.py: fix unawaited ScyllaLogFile.grep() coroutines' from Andrei Chekun Fixed several places where ScyllaLogFile.grep() was called without await, resulting in checking coroutine objects for truthiness instead of actual log matches. Fixes: SCYLLADB-903 No backport, framework fix and one test fix. Closes scylladb/scylladb#28909 * github.com:scylladb/scylladb: test.py: fix unawaited ScyllaLogFile.grep() coroutines tests: fix test_group0_recovers_after_partial_command_application	2026-03-10 12:29:23 +01:00
Dario Mirovic	f72081194c	db: use prefix tombstones in DROP TABLE schema mutations When dropping a table, make_drop_table_or_view_mutations() creates a point tombstone in system_schema.columns for every column in the table. The clustering key of system_schema.columns is (table_name, column_name). A clustering key with only the table_name component acts as a prefix tombstone. That tombstone covers all columns belonging to that table. This approach is already used by make_table_deleting_mutations() during CREATE TABLE. Apply the same prefix tombstone approach to DROP TABLE for the columns, view_virtual_columns, computed_columns, and dropped_columns schema tables. This reduces tombstone accumulation in schema table sstables. In test_max_cells test case, which repeatedly creates and drops a table with 32768 columns, overall test time improved from ~180s to ~157s, which is ~12.7% improvement. Refs SCYLLADB-815 Closes scylladb/scylladb#28976	2026-03-10 11:59:00 +01:00
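The covering rule for a prefix tombstone can be sketched as a prefix match over clustering-key components. `clustering_key` and `covered_by_prefix()` are hypothetical simplifications. Real range tombstones also carry timestamps and bound inclusiveness, and this sketch ignores both.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A clustering key as a sequence of components, e.g. (table_name, column_name).
using clustering_key = std::vector<std::string>;

// A tombstone keyed by a prefix covers every row whose key starts with the
// same components. A full key, by contrast, covers exactly one row.
bool covered_by_prefix(const clustering_key& prefix, const clustering_key& row) {
    if (prefix.size() > row.size()) return false;
    for (std::size_t i = 0; i < prefix.size(); ++i) {
        if (prefix[i] != row[i]) return false;
    }
    return true;
}
```

In these terms, one tombstone keyed by `{table_name}` replaces one `{table_name, column_name}` tombstone per column.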
Gleb Natapov	b59b3d4f8a	service level: remove version 1 service level code	2026-03-10 10:46:48 +02:00
Gleb Natapov	b633ec1779	features: move GROUP0_SCHEMA_VERSIONING to deprecated features list	2026-03-10 10:46:48 +02:00
Gleb Natapov	40ec0d4942	migration_manager: remove unused forward definitions	2026-03-10 10:46:48 +02:00
Gleb Natapov	aa9eb0ef8c	test: remove unused code	2026-03-10 10:46:48 +02:00
Gleb Natapov	4660f908f9	auth: drop auth_migration_listener since it does nothing now	2026-03-10 10:46:48 +02:00
Gleb Natapov	74b5a8d43d	schema: drop schema_registry_entry::maybe_sync() function Schema is synced through group0 now. Drop all the tests of the function as well.	2026-03-10 10:46:47 +02:00
Gleb Natapov	b9f3281af6	schema: drop make_table_deleting_mutations since it should not be needed with raft Also remove the test since it is no longer relevant	2026-03-10 10:46:47 +02:00
Gleb Natapov	f76199e5c2	schema: remove calculate_schema_digest function It is used by the test only, so remove the test and its data as well.	2026-03-10 10:46:47 +02:00
Gleb Natapov	08e33ad7f7	schema: drop recalculate_schema_version function and its uses There is no need to recalculate schema version any more since it is set by group0.	2026-03-10 10:46:39 +02:00
Gleb Natapov	7bb334a5dd	migration_manager: drop check for group0_schema_versioning feature We do not allow upgrading from a version that does not have it any longer.	2026-03-10 10:39:59 +02:00
Gleb Natapov	4402b030ae	cdc: drop usage of cdc_local table and v1 generation definition	2026-03-10 10:39:59 +02:00
Gleb Natapov	6769615ff1	storage_service: no need to add yourself to the topology during reboot since raft state loading already did it	2026-03-10 10:39:59 +02:00
Gleb Natapov	33fbda9f3b	storage_service: remove unused functions	2026-03-10 10:39:58 +02:00
Gleb Natapov	0e3e7be335	group0: drop with_raft() function from group0_guard since it always returns true now Also drop the code that assumed that the function can return false.	2026-03-10 10:39:58 +02:00
Gleb Natapov	4e56ca3c76	gossiper: do not gossip TOKENS and CDC_GENERATION_ID any more They were used by legacy topology and cdc code only.	2026-03-10 10:39:58 +02:00
Gleb Natapov	77f8f952b2	gossiper: drop tokens from loaded_endpoint_state	2026-03-10 10:39:58 +02:00
Gleb Natapov	706754dc24	gossiper: remove unused functions	2026-03-10 10:39:58 +02:00
Gleb Natapov	8ee4cdd4b7	storage_service: do not pass loaded_peer_features to join_topology() They are not used there any longer.	2026-03-10 10:39:58 +02:00
Gleb Natapov	24c01f2289	storage_service: remove unused fields from replacement_info	2026-03-10 10:39:58 +02:00
Gleb Natapov	2d8722d204	gossiper: drop is_safe_for_restart() function and its use The function checks that the node's state is not left or removed in gossiper during restart, but with raft topology a removed node will not be able to contact the cluster to get this information since it will be banned.	2026-03-10 10:39:58 +02:00
Gleb Natapov	6f739a8ee4	storage_service: remove unused variables from join_topology	2026-03-10 10:39:58 +02:00
Gleb Natapov	d35b83bec8	gossiper: remove the code that was only used in gossiper topology The topology state machine is always present now and can be passed to the gossiper during creation.	2026-03-10 10:39:58 +02:00
Gleb Natapov	390eb46c1a	storage_service: drop the check for raft mode from recovery code In non-raft mode the node will not boot at all, so the check is redundant now.	2026-03-10 10:39:58 +02:00
Gleb Natapov	6a7e850161	cdc: remove legacy code The patch removes test/boost/cdc_generation_test.cc since it unit-tests the cdc::limit_number_of_streams_if_needed function, which is removed here.	2026-03-10 10:38:57 +02:00
Gleb Natapov	0b508c5f96	test: remove unused injection points Also remove test_auth_raft_command_split test which is irrelevant since `5ba7d1b116` because it does not use the function that injects max sized command after the commit.	2026-03-10 10:09:39 +02:00
Gleb Natapov	1d188f0394	auth: remove legacy auth mode and upgrade code A system needs to be upgraded to use v2 auth before moving to this ScyllaDB version; otherwise the boot will fail.	2026-03-10 10:09:39 +02:00
Gleb Natapov	02fc4ad0a9	treewide: remove schema pull code since we never pull schema any more Schema pull was used by the legacy schema code, which has not been supported for a long time, and during legacy recovery, which is no longer supported either. It can be dropped now.	2026-03-10 10:09:39 +02:00
Gleb Natapov	0cf726c81f	raft topology: drop upgrade_state and its type from the topology state machine since it is not used any longer	2026-03-10 10:09:39 +02:00
Gleb Natapov	60a861c518	group0: hoist the checks for an illegal upgrade into main.cc The checks are spread around now, but having them in one place and done as early as possible simplifies the logic.	2026-03-10 10:09:39 +02:00
Gleb Natapov	1ff98c89e3	api: drop get_topology_upgrade_state and always report upgrade status as done A non-upgraded version will no longer boot.	2026-03-10 10:09:38 +02:00
Gleb Natapov	be153a4eb7	service_level_controller: drop service level upgrade code We do not allow upgrade from a version that is not updated yet, so the code is not used any longer.	2026-03-10 10:09:38 +02:00
Gleb Natapov	61cc091364	test: drop run_with_raft_recovery parameter to cql_test_env It is unused.	2026-03-10 10:09:38 +02:00
Gleb Natapov	00083b42a7	group0: get rid of group0_upgrade_state Simplify code by getting rid of group0_upgrade_state: upgrade is no longer supported, so there is no need to track its state. A non-upgraded node will simply not boot; to detect that, the patch checks the state directly in the system table.	2026-03-10 10:09:38 +02:00
Gleb Natapov	d4b55de214	storage_service: drop topology_change_kind as it is no longer needed The mode is always raft, so no need to keep a variable that tracks that.	2026-03-10 10:09:38 +02:00
Gleb Natapov	68ea6aa0a6	storage_service: drop check_ability_to_perform_topology_operation since no upgrades can happen any more	2026-03-10 10:09:38 +02:00
Gleb Natapov	06652948f3	service_storage: remove unused functions raft_topology_change_enabled and upgrade_state_to_topology_op_kind are not used any more. Remove the code.	2026-03-10 10:09:38 +02:00
Gleb Natapov	e8c72b7ba0	storage_service: remove non raft rebuild code Only raft is supported now.	2026-03-10 10:09:38 +02:00
Gleb Natapov	49ebab971d	storage_service: set topology change kind only once The only supported mode is topology_change_kind::raft, so always set it in storage_service::join_cluster during join or regular boot. Drop the check for legacy mode from raft_group0::setup_group0_if_exist since the mode will not be set at this point any longer. The wrong upgrade will still be detected in storage_service::join_cluster where topology.upgrade_state is checked directly.	2026-03-10 10:09:38 +02:00
Gleb Natapov	4e072977d4	group0: drop in_recovery function and its uses Legacy recovery procedure is no longer supported and the code can be dropped.	2026-03-10 10:09:38 +02:00
Gleb Natapov	770762edd8	group0: rename use_raft to maintenance_mode and make it sync group0_upgrade_state::recovery is now used only in maintenance mode so rename the function to indicate it. Also there is no preemption point in the function any more and it can be a regular function, not a co-routine.	2026-03-10 10:09:33 +02:00
Pavel Emelyanov	61af7c8300	test: Re-sort comments around do_test_streaming_scopes() The test description of refreshing test is very elaborated and it's worth having it as the description of the streaming scopes test itself. Callers of the helper can go with smaller descriptions. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-10 10:00:09 +03:00
Pavel Emelyanov	5ce3597c25	test: Split do_load_sstables() This helper does two things -- sorts sstables per server according to scope in use and calls sstables_storage.restore(). The code looks better if the sorting of sstables stays in a helper and the call for .restore() is moved to the caller. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-10 10:00:09 +03:00
Pavel Emelyanov	8c1fb2b39a	test: Drop load_fn argument from do_load_sstables() Now all callers provide the sstables_storage argument and the load_fn is effectively unused. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-10 09:59:08 +03:00
Pavel Emelyanov	59051ccc28	test: Re-use do_test_streaming_scopes() in refresh test Now it's possible to replace the whole body of the test_refresh_with_streaming_scopes() test by calling the corresponding helper function from backup/restore test module. This helper does exactly the same, and the SSTablesOnLocalStorage class provides the necessary save/restore implementations. One more thing to mention -- the refreshing test for some reason only wants to run with restored min-tablet-count equal to the original one. The do_test_streaming_scopes() needs to account for that, as it runs the tests for more options. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-10 09:59:07 +03:00
Pavel Emelyanov	f6f1cb0391	test: Introduce SSTablesOnLocalStorage This class implements some of the sstables manipulations performed by test_refresh_with_streaming_scopes(). It's here to facilitate next patch that will use it to call do_test_streaming_scopes() helper. This patch moves two blocks of code out of the test into this new class. The shutil.rmtree(tmpbackup) is seemingly lost, but it really isn't -- the tmpbackup variable holds a name of a _subdir_ inside servers' workdirs. This path doesn't really exist on disk on its own, so removing it is a no-op. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-10 09:58:40 +03:00
Pavel Emelyanov	dae4da1810	test: Introduce SSTablesOnObjectStorage The class in question performs two operations for do_test_streaming_scopes(): saves sstables and restores them. The current caller of the helper is the test_restore_with_streaming_scopes() test, which needs to back up sstables on object storage and restore them from there with the restoration API. The SSTablesOnObjectStorage class does exactly that. The change in do_load_sstables() that checks for sstables_storage to be non-None is needed to keep test_refresh_with_streaming_scopes() working -- that test doesn't provide sstables_storage (yet) and the function in question will call the load_fn callback. Next patch will eliminate it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-10 09:58:39 +03:00
Pavel Emelyanov	5a033dea47	test: Move test_restore_with_streaming_scopes() into do_test_streaming_scopes() The body of this test is duplicated by test_refresh_with_streaming_scopes() test from other module. Keeping it in a non-test top-level function will help generalizing these two tests. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-10 09:57:53 +03:00
Nadav Har'El	d78ea3d498	test/cqlpy: mark test_unbuilt_index_not_used not strictly xfail A few days ago, in commit `7b30a3981b`, we added the xfail_strict option to pytest.ini. This option causes every XPASS (i.e., an xfail test that actually passes) to be considered an error that fails the test. But some tests demonstrate a timing-related bug and do not reproduce the bug every single time. An example we noticed in one CI run is: test/cqlpy/test_secondary_index.py::test_unbuilt_index_not_used This test reproduces a timing-related bug (if you read from a secondary index "too quickly" you can get wrong results), but only about 90% of the time, not 100% of the time. The solution is to add "strict=False" to the xfail marker on this specific test. This undoes xfail_strict for this test only, accepting that it can either pass or fail. Note that this does NOT make this test worthless - we still see this test failing most of the time, and when a developer finally fixes this issue, the test will begin to pass all the time. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-956 (we'll probably need to follow up this fix with the same fix for other xfail tests that can sometimes pass). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#28942	2026-03-09 22:48:20 +02:00
Avi Kivity	01ddc17ab9	Merge 'mv: allow skipping view updates when a collection is unmodified' from Wojciech Mitros When we generate view updates, we check whether we can skip the entire view update if all columns selected by the view are unmodified. However, for collection columns, we only check if they were unset before and after the update. In this patch we add a check for the actual collection contents. We perform this check for both virtual and non-virtual selections. When the column is only a virtual column in the view, it would be enough to check the liveness of each collection cell, however for that we'd need to deserialize the entire collection anyway, which should be effectively as expensive as comparing all of its bytes. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-808 Closes scylladb/scylladb#28839 * github.com:scylladb/scylladb: mv: allow skipping view updates when a collection is unmodified mv: allow skipping view updates if an empty collection remains unset	2026-03-09 22:46:01 +02:00
Botond Dénes	13ff9c4394	db,compaction: use utils::chunked_vector for cache invalidation ranges Instead of dht::partition_ranges_vector, which is an std::vector<> and has been seen to cause large allocations when calculating ranges to be invalidated after compaction: seastar_memory - oversized allocation: 147456 bytes. This is non-fatal, but could lead to latency and/or fragmentation issues. Please report: at [Backtrace #0] void seastar::backtrace<seastar::current_backtrace_tasklocal()::$_0>(seastar::current_backtrace_tasklocal()::$_0&&, bool) at ./build/release/seastar/./seastar/include/seastar/util/backtrace.hh:89 (inlined by) seastar::current_backtrace_tasklocal() at ./build/release/seastar/./seastar/src/util/backtrace.cc:99 seastar::current_tasktrace() at ./build/release/seastar/./seastar/src/util/backtrace.cc:136 seastar::current_backtrace() at ./build/release/seastar/./seastar/src/util/backtrace.cc:169 seastar::memory::cpu_pages::warn_large_allocation(unsigned long) at ./build/release/seastar/./seastar/src/core/memory.cc:840 seastar::memory::cpu_pages::check_large_allocation(unsigned long) at ./build/release/seastar/./seastar/src/core/memory.cc:903 (inlined by) seastar::memory::cpu_pages::allocate_large(unsigned int, bool) at ./build/release/seastar/./seastar/src/core/memory.cc:910 (inlined by) seastar::memory::allocate_large(unsigned long, bool) at ./build/release/seastar/./seastar/src/core/memory.cc:1533 (inlined by) seastar::memory::allocate_slowpath(unsigned long) at ./build/release/seastar/./seastar/src/core/memory.cc:1679 seastar::memory::allocate(unsigned long) at ././seastar/src/core/memory.cc:1698 (inlined by) operator new(unsigned long) at ././seastar/src/core/memory.cc:2440 (inlined by) std::__new_allocator<interval<dht::ring_position>>::allocate(unsigned long, void const) at /usr/lib/gcc/x86_64-redhat-linux/15/../../../../include/c++/15/bits/new_allocator.h:151 (inlined by) std::allocator<interval<dht::ring_position>>::allocate(unsigned long) at 
/usr/lib/gcc/x86_64-redhat-linux/15/../../../../include/c++/15/bits/allocator.h:203 (inlined by) std::allocator_traits<std::allocator<interval<dht::ring_position>>>::allocate(std::allocator<interval<dht::ring_position>>&, unsigned long) at /usr/lib/gcc/x86_64-redhat-linux/15/../../../../include/c++/15/bits/alloc_traits.h:614 (inlined by) std::_Vector_base<interval<dht::ring_position>, std::allocator<interval<dht::ring_position>>>::_M_allocate(unsigned long) at /usr/lib/gcc/x86_64-redhat-linux/15/../../../../include/c++/15/bits/stl_vector.h:387 (inlined by) std::vector<interval<dht::ring_position>, std::allocator<interval<dht::ring_position>>>::reserve(unsigned long) at /usr/lib/gcc/x86_64-redhat-linux/15/../../../../include/c++/15/bits/vector.tcc:79 dht::to_partition_ranges(utils::chunked_vector<interval<dht::token>, 131072ul> const&, seastar::bool_class<utils::can_yield_tag>) at ./dht/i_partitioner.cc:347 compaction::compaction::get_ranges_for_invalidation(std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable>>> const&) at ./compaction/compaction.cc:619 (inlined by) compaction::compaction::get_compaction_completion_desc(std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable>>>, std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable>>>) at ./compaction/compaction.cc:719 (inlined by) compaction::regular_compaction::replace_remaining_exhausted_sstables() at ./compaction/compaction.cc:1362 compaction::compaction::finish(std::chrono::time_point<db_clock, std::chrono::duration<long, std::ratio<1l, 1000l>>>, std::chrono::time_point<db_clock, std::chrono::duration<long, std::ratio<1l, 1000l>>>) at ./compaction/compaction.cc:1021 compaction::compaction::run(std::unique_ptr<compaction::compaction, std::default_delete<compaction::compaction>>)::$_0::operator()() at ./compaction/compaction.cc:1960 (inlined by) 
compaction::compaction_result std::__invoke_impl<compaction::compaction_result, compaction::compaction::run(std::unique_ptr<compaction::compaction, std::default_delete<compaction::compaction>>)::$_0>(std::__invoke_other, compaction::compaction::run(std::unique_ptr<compaction::compaction, std::default_delete<compaction::compaction>>)::$_0&&) at /usr/lib/gcc/x86_64-redhat-linux/15/../../../../include/c++/15/bits/invoke.h:63 (inlined by) std::__invoke_result<compaction::compaction::run(std::unique_ptr<compaction::compaction, std::default_delete<compaction::compaction>>)::$_0>::type std::__invoke<compaction::compaction::run(std::unique_ptr<compaction::compaction, std::default_delete<compaction::compaction>>)::$_0>(compaction::compaction::run(std::unique_ptr<compaction::compaction, std::default_delete<compaction::compaction>>)::$_0&&) at /usr/lib/gcc/x86_64-redhat-linux/15/../../../../include/c++/15/bits/invoke.h:98 (inlined by) decltype(auto) std::__apply_impl<compaction::compaction::run(std::unique_ptr<compaction::compaction, std::default_delete<compaction::compaction>>)::$_0, std::tuple<>>(compaction::compaction::run(std::unique_ptr<compaction::compaction, std::default_delete<compaction::compaction>>)::$_0&&, std::tuple<>&&, std::integer_sequence<unsigned long, ...>) at /usr/lib/gcc/x86_64-redhat-linux/15/../../../../include/c++/15/tuple:2920 (inlined by) decltype(auto) std::apply<compaction::compaction::run(std::unique_ptr<compaction::compaction, std::default_delete<compaction::compaction>>)::$_0, std::tuple<>>(compaction::compaction::run(std::unique_ptr<compaction::compaction, std::default_delete<compaction::compaction>>)::$_0&&, std::tuple<>&&) at /usr/lib/gcc/x86_64-redhat-linux/15/../../../../include/c++/15/tuple:2935 (inlined by) seastar::future<compaction::compaction_result> seastar::futurize<compaction::compaction_result>::apply<compaction::compaction::run(std::unique_ptr<compaction::compaction, 
std::default_delete<compaction::compaction>>)::$_0>(compaction::compaction::run(std::unique_ptr<compaction::compaction, std::default_delete<compaction::compaction>>)::$_0&&, std::tuple<>&&) at ././seastar/include/seastar/core/future.hh:1930 (inlined by) seastar::futurize<std::invoke_result<compaction::compaction::run(std::unique_ptr<compaction::compaction, std::default_delete<compaction::compaction>>)::$_0>::type>::type seastar::async<compaction::compaction::run(std::unique_ptr<compaction::compaction, std::default_delete<compaction::compaction>>)::$_0>(seastar::thread_attributes, compaction::compaction::run(std::unique_ptr<compaction::compaction, std::default_delete<compaction::compaction>>)::$_0&&)::'lambda'()::operator()() const at ././seastar/include/seastar/core/thread.hh:267 (inlined by) seastar::noncopyable_function<void ()>::direct_vtable_for<seastar::futurize<std::invoke_result<compaction::compaction::run(std::unique_ptr<compaction::compaction, std::default_delete<compaction::compaction>>)::$_0>::type>::type seastar::async<compaction::compaction::run(std::unique_ptr<compaction::compaction, std::default_delete<compaction::compaction>>)::$_0>(seastar::thread_attributes, compaction::compaction::run(std::unique_ptr<compaction::compaction, std::default_delete<compaction::compaction>>)::$_0&&)::'lambda'()>::call(seastar::noncopyable_function<void ()> const) at ././seastar/include/seastar/util/noncopyable_function.hh:138 seastar::noncopyable_function<void ()>::operator()() const at ./build/release/seastar/./seastar/include/seastar/util/noncopyable_function.hh:224 (inlined by) seastar::thread_context::main() at ./build/release/seastar/./seastar/src/core/thread.cc:318 dht::partition_ranges_vector is used on the hot path, so just convert the problematic user -- cache invalidation -- to use utils::chunked_vector<dht::partition_range> instead. Fixes: SCYLLADB-121 Closes scylladb/scylladb#28855	2026-03-09 22:04:54 +02:00
Andrei Chekun	8acba40c84	test.py: fix unawaited ScyllaLogFile.grep() coroutines Fixed several places where ScyllaLogFile.grep() was called without await, resulting in checking coroutine objects for truthiness instead of actual log matches. Fixes: SCYLLADB-903	2026-03-09 19:41:07 +01:00
Andrei Chekun	224a11be65	tests: fix test_group0_recovers_after_partial_command_application Because the log grep was not awaited, this issue was masked. Once the await was added, the test started to fail. This PR fixes the test.	2026-03-09 19:41:07 +01:00
Nadav Har'El	16e7a88a02	test/alternator: fix do_test() in test_streams.py Many tests in test/alternator/test_streams.py use a do_test() function which performs a user-defined function that runs some write requests, and then verifies that the expected output appears on the stream. Because DynamoDB drops do-nothing changes from the stream - such as writing to an item a value that it already has - these tests need to write to a different item each time, so do_test() invents a random key and passes it to the user-defined function to use. But... we had a bug, the random number generation was done only once, instead of every time. The fix is to do the random number generation on every call. We never noticed this bug when each test used a brand new table. But the next patch will make the tests share the test table, and tests start to fail. It's especially visible if you run the same test twice against DynamoDB, e.g., test/alternator/run --count 2 --aws \ test_streams.py::test_streams_putitem_keys_only Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-03-09 19:21:53 +02:00
Łukasz Paszkowski	147b355326	replica/table: avoid computing token range side in storage_group_of() on hot path `storage_group_of()` is on the replica-side token lookup hot path but used `tablet_map::get_tablet_id_and_range_side()`, which computes both tablet id and post-split range side. Most callers only need the storage group id. Switch `storage_group_of()` to use `get_tablet_id()` via `tablet_id_for_token()`, and select the compaction group via new overloads that compute the range side only when splitting mode is active.	2026-03-09 17:59:36 +01:00
Łukasz Paszkowski	419e9aa323	replica/compaction_group: add lazy select_compaction_group() overloads Change `storage_group::select_compaction_group()` to accept a token (and tablet_map) and compute the tablet range side only when splitting_mode() is active. Add an overload for selecting the compaction group for an sstable spanning a token range.	2026-03-09 17:59:36 +01:00
Łukasz Paszkowski	3f70611504	locator/tablets: add tablet_map::get_tablet_range_side() Add `tablet_map::get_tablet_range_side(token)` to compute the post-split range side without computing the tablet id. Pure addition, no behavior change.	2026-03-09 17:59:36 +01:00
Jakub Smolar	7cdd979158	db/config: announce ms format as highest supported Uncomment the feature flag check in get_highest_supported_format() to return MS format when supported, otherwise fall back to ME.	2026-03-09 17:12:09 +01:00
Michał Chojnowski	949fc85217	db/config: enable `ms` sstable format by default Trie-based sstable indexes are supposed to be (hopefully) a better default than the old BIG indexes. Make them the new default. If we change our mind, this change can be reverted later.	2026-03-09 17:12:09 +01:00
Michał Chojnowski	6b413e3959	cluster/dtest/bypass_cache_test: switch from highest_supported_sstable_format to chosen_sstable_format Trie-based indexes and older indexes have a difference in metrics, and the test uses the metrics to check for bypass cache. To choose the right metrics, it uses highest_supported_sstable_format, which is inappropriate, because the sstable format chosen for writes by Scylla might be different than highest_supported_sstable_format. Use chosen_sstable_format instead.	2026-03-09 17:12:09 +01:00
Michał Chojnowski	b89840c4b9	api/system: add /system/chosen_sstable_version Returns the sstable version currently chosen for new sstables. We are adding it because some tests want to know what format they are writing (tests using upgradesstable, tests which check stats that only apply to one of the index types, etc). (Currently they are using `highest_supported_sstable_format` for this purpose, which is inappropriate, and will become invalid if a non-latest format is the default).	2026-03-09 17:12:09 +01:00
Michał Chojnowski	9280a039ee	test/cluster/dtest: reduce num_tokens to 16 cluster.dtest_alternator_tests.test_slow_query_logging performs a bootstrap with 768 token ranges. It works with `me` sstables, which have 2 open file descriptors per open sstable, but with `ms` sstables, which have 3 open file descriptors per open sstable, it fails with EMFILE. To avoid this problem, let's just decrease the number of vnodes in the test suite. It's appropriate anyway, because it avoids some unneeded work without weakening the tests. (Note: pylib-based tests have been setting `num_tokens` to 16 for a long time too). This breaks `bypass_cache_test`, which is written in a way that expects a certain number of token ranges. We adjust the relevant parameter accordingly.	2026-03-09 17:12:09 +01:00
Marcin Maliszkiewicz	96a2b0e634	test: add tests for global group0_batch barrier feature Runtime: 16s in dev mode	2026-03-09 15:15:59 +01:00
Marcin Maliszkiewicz	6723ced684	qos: switch service levels write paths to use global group0_batch barrier This ensures that we return auth functions only after we wait until all live nodes apply our mutations.	2026-03-09 15:15:59 +01:00
Marcin Maliszkiewicz	fe79fdf090	auth: switch write paths to use global group0_batch barrier This ensures that we return auth functions only after we wait until all live nodes apply our mutations.	2026-03-09 15:15:59 +01:00
Marcin Maliszkiewicz	4c8681a927	raft: add function to broadcast read barrier request This function ensures that all alive nodes have executed a read barrier. It will be useful for the following commits, which will eventually delay returning a response to the user until mutations are applied on other nodes, so that the user may perceive better data consistency across nodes.	2026-03-09 15:15:59 +01:00
Marcin Maliszkiewicz	cbae84a926	raft: add gossiper dependency to raft_group0_client In a following commit, raft_group0_client will send the read barrier RPC to all alive nodes; it takes the list of nodes from the gossiper.	2026-03-09 15:15:59 +01:00
Marcin Maliszkiewicz	8422fbca9f	raft: add read barrier RPC The RPC performs a read barrier on the destination node. Following commits will issue it to live nodes to ensure that a command was applied everywhere.	2026-03-09 15:15:59 +01:00
Michał Chojnowski	ff60a5f1e5	cql3: suggest ALTER MATERIALIZED VIEW to users trying to use ALTER TABLE on a view When a user tries to use ALTER TABLE on a materialized view, the resulting error message is `Cannot use ALTER TABLE on Materialized View`. The intention behind this error is that ALTER MATERIALIZED VIEW should be used instead. But we observed that some users interpret this error message as a general "You cannot do any ALTER on this thing". This patch enhances the error message (and others similar to it) to prevent the confusion. Closes scylladb/scylladb#28831	2026-03-09 15:07:21 +01:00
Botond Dénes	1e41db5948	Merge 'service: tasks: return successful status if a table was dropped' from Aleksandra Martyniuk tablet_virtual_task::wait throws if a table on which a tablet operation was working is dropped. Treat the tablet operation as successful if a table is dropped. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-494 Needs backport to all live releases Closes scylladb/scylladb#28933 * github.com:scylladb/scylladb: test: add test_tablet_repair_wait_with_table_drop service: tasks: return successful status if a table was dropped	2026-03-09 16:04:44 +02:00
Piotr Dulikowski	23ed0d4df8	Merge 'vector_search: fix TLS server name with IP' from Karol Nowacki SNI works only with DNS hostnames. Adding an IP address causes warnings on the server side. This change adds SNI only if it is not an IP address. This change has no unit tests, as this behavior is not critical, since it causes a warning on the server side. The critical part, that the server name is verified, is already covered. This PR also adds warning logs to improve future troubleshooting of connections to the vector-store nodes. Fixes: VECTOR-528 Backports to 2025.04 and 2026.01 are required, as these branches are also affected. Closes scylladb/scylladb#28637 * github.com:scylladb/scylladb: vector_search: fix TLS server name with IP vector_search: add warn log for failed ann requests	2026-03-09 15:03:22 +01:00
Asias He	e0483f6001	test: Fix coordinator assumption in do_test_tablet_incremental_repair_merge_error The first node in the cluster is not guaranteed to be the coordinator node. Hardcoding node 0 as the coordinator causes test flakiness. This patch dynamically finds the actual coordinator node and targets it for error injection, log checking, and restarts. Additionally, inject `tablet_force_tablet_count_decrease_once` across all servers to force the tablet merge process to trigger once. Fixes SCYLLADB-865 Closes scylladb/scylladb#28945	2026-03-09 15:27:45 +02:00
Marcin Maliszkiewicz	b6a7484520	docs: note eventual visibility of auth changes Mention that role and permission changes are durable but may not be immediately visible on other nodes due to asynchronous replication. Fixes: SCYLLADB-651 Closes scylladb/scylladb#28900	2026-03-09 14:07:10 +01:00
Piotr Dulikowski	42d70baad3	db: view: mutate_MV: don't hold keyspace ref across preemption Currently, the view_update_generator::mutate_MV function acquires a reference to the keyspace relevant to the operation, then it calls max_concurrent_for_each and uses that reference inside the lambda passed to that function. max_concurrent_for_each can preempt and there is no mechanism that makes sure that the keyspace is alive until the view updates are generated, so it is possible that the keyspace is freed by the time the reference is used. Fix the issue by precomputing the necessary information based on the keyspace reference right away, and then passing that information by value to the other parts of the code. It turns out that we only need to know whether the keyspace uses tablets and whether it uses a network topology strategy. Fixes: scylladb/scylladb#28925 Closes scylladb/scylladb#28928	2026-03-09 15:04:26 +02:00
Łukasz Paszkowski	826fd5d6c3	test/storage: harden out-of-space prevention tests around restart and disk-utilization transitions The tests in test_out_of_space_prevention.py are flaky. Three issues contribute: 1. After creating/removing the blob file that simulates disk pressure, the tests immediately checked derived state (e.g., "compaction_manager - Drained") without first confirming the disk space monitor had detected the utilization change. Fix: explicitly wait for "Reached/Dropped below critical disk utilization level" right after creating/removing the blob file, before checking downstream effects. 2. Several tests called `manager.driver_connect()` or omitted reconnection entirely after `server_restart()` / `server_start()`. The pre-existing driver session can silently reconnect multiple times, causing subsequent CQL queries to fail. Fix: call `reconnect_driver()` after every node restart. Additionally, call `wait_for_cql_and_get_hosts()` where CQL is used afterward, to ensure all connection pools are established. 3. Some log assertions used marks captured before a restart, so they could match pre-restart messages or miss messages emitted in the correct post-restart window. Fix: refresh marks at the right points. Apart from that, the patch fixes a typo: autotoogle -> autotoggle. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-655 Closes scylladb/scylladb#28626	2026-03-09 14:45:09 +02:00
Calle Wilund	ef795eda5b	test_encryption: Fix test_system_auth_encryption Fixes: SCYLLADB-915 Test was quite broken: it did not wait for coroutines, and a bunch of checks were no longer even close to valid (this is a ported dtest, and not a very good one). Closes scylladb/scylladb#28887	2026-03-09 14:38:31 +02:00
Marcin Maliszkiewicz	f177259316	Merge 'vector_search: small improvements' from Karol Nowacki vector_search: small improvements This PR addresses several minor code quality issues and style inconsistencies within the vector_search module. No backport is needed as these improvements are not visible to the end user. Closes scylladb/scylladb#28718 * github.com:scylladb/scylladb: vector_search: fix names of private members vector_search: remove unused global variable	2026-03-09 11:42:35 +01:00
Botond Dénes	6bba4f7ca1	Merge 'test: cluster: util: sleep for 0.01s between writes in do_writes' from Patryk Jędrzejczak Tests use `start_writes` as a simple write workload to test that writes succeed when they should (e.g., there is no availability loss), but not to test performance. There is no reason to overload the CPU, which can lead to test failures. I suspect this function to be the cause of SCYLLADB-929, where the failures of `test_raft_recovery_user_data` (that creates multiple write workloads with `start_writes`) indicated that the machine was overloaded. The relevant observations: - two runs failed at the same time in debug mode, - there were many reactor stalls and RPC timeouts in the logs (leading to unexpected events like servers marking each other down and group0 leader changes). I didn't prove that `start_writes` really caused this, but adding this sleep should be a good change, even if I'm wrong. The number of writes performed by the test decreases 30-50 times with the sleep. Note that some other util functions like `start_writes_to_cdc_table` have such a sleep. This PR also contains some minor updates to `test_raft_recovery_user_data`. Fixes SCYLLADB-929 No backport: - the failures were observed only in master CI, - no proof that the change fixes the issue, so backports could be a waste of time. Closes scylladb/scylladb#28917 * github.com:scylladb/scylladb: test: test_raft_recovery_user_data: replace asyncio.gather with gather_safely test: test_raft_recovery_user_data: use the exclude_node API test: test_raft_recovery_user_data: drop tablet_load_stats_cfg test: cluster: util: sleep for 0.01s between writes in do_writes	2026-03-09 12:12:04 +02:00
Nadav Har'El	47e8206482	test/alternator: test, and document, Alternator's data encoding This patch adds a test file, test/alternator/test_encoding.py, testing how Alternator stores its data in the underlying CQL database. We test how tables are named, and how attributes of different types are encoded into CQL. The test, which begins with a long comment, also doubles as developer-oriented documentation on how Alternator encodes its data in the CQL database. This documentation is not intended for end-users - we do not want to officially support reading or writing Alternator tables through CQL. But once in a while, this information can come in handy for developers. More importantly, this test will also serve as a regression test, verifying that Alternator's encoding doesn't change unintentionally. If we make an unintentional change to the way that Alternator stores its data, this can break upgrades: The new code might not be able to read or write the old table with its old encoding. So it's important to make sure we never make such unintentional changes to the encoding of Alternator's data. If we ever do make intentional changes to Alternator's data encoding, we will need to fix the relevant test, but also not forget to make sure that the new code is able to read the old encoding as well. The new tests use both "dynamodb" (Alternator) and "cql" fixtures, to test how CQL sees the Alternator tables. So naturally these tests are marked "scylla_only" and skipped on DynamoDB. Fixes #19770. Closes scylladb/scylladb#28866	2026-03-09 10:50:09 +01:00
Andrzej Jackowski	6fb5ab78eb	db/config: move guardrails config to one place and reorder The motivations for this patch are as follows: - Guardrails should follow similar conventions, e.g. for config names, metrics names, testing. Keeping guardrails together makes it easier to find and compare existing guardrails when new guardrails are implemented. - The configuration is used to auto-generate the documentation (particularly, the `configuration-parameters` page). Currently, the order of parameters in the documentation is inconsistent (e.g. `minimum_replication_factor_fail_threshold` before `minimum_replication_factor_warn_threshold` but `maximum_replication_factor_fail_threshold` after `maximum_replication_factor_warn_threshold`), which can be confusing to customers. Fixes: SCYLLADB-256 Closes scylladb/scylladb#28932	2026-03-09 10:50:00 +01:00
Patryk Jędrzejczak	46b7170347	Merge 'test/pylib: centralize timeout scaling and propagate build_mode in LWT helpers' from Alex Dathskovsky This series improves timeout handling consistency across the test framework and makes build-mode effects explicit in LWT tests. (starting with LWT test that got flaky) 1. Centralize timeout scaling Introduce scale_timeout(timeout) fixture in runner.py to provide a single, consistent mechanism for scaling test timeouts based on build mode. Previously, timeout adjustments were done in an ad-hoc manner across different helpers and tests. Centralizing the logic: Ensures consistent behavior across the test suite Simplifies maintenance and reasoning about timeout behavior Reduces duplication and per-test scaling logic This becomes increasingly important as tests run on heterogeneous hardware configurations, where different build modes (especially debug) can significantly impact execution time. 2. Make scale_timeout explicit in LWT helpers Propagate scale_timeout explicitly through BaseLWTTester and Worker, validating it at construction time instead of relying on implicit pytest fixture injection inside helper classes. Additionally: Update wait_for_phase_ops() and wait_for_tablet_count() to use scale_timeout_by_mode() for consistent polling behavior across modes Update all LWT test call sites to pass build_mode explicitly Increase default timeout values, as the previous defaults were too short and prone to flakiness, particularly under slower configurations such as debug builds Overall, this series improves determinism, reduces flakiness, and makes the interaction between build mode and test timing explicit and maintainable. 
backport: not required, just an enhancement for test.py infra Closes scylladb/scylladb#28840 * https://github.com/scylladb/scylladb: test/auth_cluster: align service-level timeout expectations with scaled config test/lwt: propagate scale_timeout through LWT helpers; scale resize waits Pass scale_timeout explicitly through BaseLWTTester and Worker, validating it at construction time instead of relying on implicit pytest fixture injection inside helper classes. Update wait_for_phase_ops() and wait_for_tablet_count() to use scale_timeout_by_mode() so polling behavior remains consistent across build modes. Adjust LWT test call sites to pass scale_timeout explicitly. Increase default timeout values, as the previous defaults were too short and prone to flakiness under slower configurations (notably debug/dev builds). test/pylib: introduce scale_timeout fixture helper	2026-03-09 10:28:19 +01:00
Patryk Jędrzejczak	4c8dba15f1	Merge 'strong_consistency/state_machine: ensure and upgrade mutations schema' from Michał Jadwiszczak This patch fixes 2 issues within strong consistency state machine: - it might happen that apply is called before the schema is delivered to the node - on the other hand, the apply may be called after the schema was changed and purged from the schema registry The first problem is fixed by doing `group0.read_barrier()` before applying the mutations. The second one is solved by upgrading the mutations using column mappings in case the version of the mutations' schema is older. Fixes SCYLLADB-428 Strong consistency is in experimental phase, no need to backport. Closes scylladb/scylladb#28546 * https://github.com/scylladb/scylladb: test/cluster/test_strong_consistency: add reproducer for old schema during apply test/cluster/test_strong_consistency: add reproducer for missing schema during apply test/cluster/test_strong_consistency: extract common function raft_group_registry: allow to drop append entries requests for specific raft group strong_consistency/state_machine: find and hold schemas of applying mutations strong_consistency/state_machine: pull necessary dependencies db/schema_tables: add `get_column_mapping_if_exists()`	2026-03-09 09:49:22 +01:00
Marcin Maliszkiewicz	4150c62f29	Merge 'test_proxy_protocol: fix flaky system.clients visibility checks' from Piotr Smaron `test_proxy_protocol_port_preserved_in_system_clients` failed because it didn't see the just created connection in system.clients immediately. The last lines of the stacktrace are: ``` # Complete CQL handshake await do_cql_handshake(reader, writer) # Now query system.clients using the driver to see our connection cql = manager.get_cql() rows = list(cql.execute( f"SELECT address, port FROM system.clients WHERE address = '{fake_src_addr}' ALLOW FILTERING" )) # We should find our connection with the fake source address and port > assert len(rows) > 0, f"Expected to find connection from {fake_src_addr} in system.clients" E AssertionError: Expected to find connection from 203.0.113.200 in system.clients E assert 0 > 0 E + where 0 = len([]) ``` Explanation: we first await for the hand-made connection to be completed, then, via another connection, we're querying system.clients, and we don't get this hand-made connection in the resultset. The solution is to replace the bare cql.execute() calls with await wait_for_results(), a helper that polls via cql.run_async() until the expected row count is reached (30 s timeout, 100 ms period). Fixes: SCYLLADB-819 The flaky test is present on master and in previous release, so backporting only there. Closes scylladb/scylladb#28849 * github.com:scylladb/scylladb: test_proxy_protocol: introduce extra logging to aid debugging test_proxy_protocol: fix flaky system.clients visibility checks	2026-03-09 08:37:57 +01:00
Yaron Kaikov	977bdd6260	.github/workflows/trigger-scylla-ci: fix heredoc injection in trigger-scylla-ci workflow Move all ${{ }} expression interpolations into env: blocks so they are passed as environment variables instead of being expanded directly into shell scripts. This prevents an attacker from escaping the heredoc in the Validate Comment Trigger step and executing arbitrary commands on the runner. The Verify Org Membership step is hardened in the same way for defense-in-depth. Refs: GHSA-9pmq-v59g-8fxp Fixes: SCYLLADB-954 Closes scylladb/scylladb#28935	2026-03-08 21:34:51 +02:00
Wojciech Mitros	0008976e2f	mv: allow skipping view updates when a collection is unmodified When we generate view updates, we check whether we can skip the entire view update if all columns selected by the view are unmodified. However, for collection columns, we only check if they were unset before and after the update. In this patch we add a check for the actual collection contents. We perform this check for both virtual and non-virtual selections. When the column is only a virtual column in the view, it would be enough to check the liveness of each collection cell, however for that we'd need to deserialize the entire collection anyway, which should be effectively as expensive as comparing all of its bytes. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-808	2026-03-08 16:23:22 +01:00
Wojciech Mitros	7d1e0a2e4d	mv: allow skipping view updates if an empty collection remains unset Currently, when we generate view updates, we skip the view update if all columns selected by the view are unchanged in the base table update. However, this does not apply for collection columns - if the base table has a collection regular column, we never allow skipping generating view updates and the reason for that is missing implementation. We can easily relax this for the case where the collection was missing before and after the update - in this commit we move the check for collections after the check for missing cells.	2026-03-08 16:22:27 +01:00
Artsiom Mishuta	fda68811e8	test.py: fix strict-config argument. The ini-level strict_config was removed/never existed as a config key in pytest 8; it's only a command-line flag (and it is back in pytest 9). In pytest 8.3.5, the equivalent is the --strict-config CLI flag, not an ini option. Fixes SCYLLADB-955 Closes scylladb/scylladb#28939	2026-03-08 16:09:29 +02:00
Taras Veretilnyk	739dd59ebc	docs: document components_digests subcomponent and trailing digest in Scylla.db Document the new `components_digests` subcomponent (tag 12) added to the Scylla.db metadata component, which stores CRC32 digests of all checksummed SSTable component files. Also document the trailing CRC32 digest that stores digest of the scylla metadata itself.	2026-03-06 21:58:15 +01:00
Taras Veretilnyk	2b1c37396a	sstable_compaction_test: Add tests for perform_component_rewrite Add two test cases to verify the correctness of the perform_component_rewrite functionality: - test_perform_component_rewrite_single_sstable: Tests rewriting the Statistics component of a single sstable - test_perform_component_rewrite_multiple_sstables: Tests rewriting 5 out of 10 sstables	2026-03-06 21:58:15 +01:00
Taras Veretilnyk	591d13e942	sstable_test: add verification testcases of SSTable components digests persistence Adds a generic test helper that writes a random SSTable, reloads it, and verifies that the persisted CRC32 digest for each component matches the digest computed from disk. These test cases cover all checksummed components.	2026-03-06 21:58:15 +01:00
Taras Veretilnyk	54af4a26ca	sstables: store digest of all sstable components in scylla metadata This change replaces plain file_writer with crc32_digest_file_writer for all SSTable components that should be checksummed. The resulting component digests are stored in the scylla metadata component. This also extends the new component rewrite mechanism to rewrite the metadata with the updated digest together with the component.	2026-03-06 21:58:10 +01:00
Dawid Mędrek	5feed00caa	Merge 'raft: read_barrier: update local commit_idx to read_idx when it's safe' from Patryk Jędrzejczak When the local entry with `read_idx` belongs to the current term, it's safe to update the local `commit_idx` to `read_idx`. The motivation for this change is to speed up read barriers. `wait_for_apply` executed at the end of `read_barrier` is delayed until the follower learns that the entry with `read_idx` is committed. It usually happens quickly in the `read_quorum` message. However, non-voters don't receive this message, so they have to wait for `append_entries`. If no new entries are being added, `append_entries` can come only from `fsm::tick_leader()`. For group0, this happens once every 100ms. The issue above significantly slows down cluster setups in tests. Nodes join group0 as non-voters, and then they are met with several read barriers just after a write to group0. One example is `global_token_metadata_barrier` in `write_both_read_new` performed just after `update_topology_state` in `write_both_read_old`. I tested the performance impact of this change with the following test: ```python for _ in range(10): await manager.servers_add(3) ``` It consistently takes 44-45s with the change and 50-51s without the change in dev mode. No backport: - non-critical performance improvement mostly relevant in tests, - the change requires some soak time in master. Closes scylladb/scylladb#28891 * github.com:scylladb/scylladb: raft: server: fix the repeating typo raft: clarify the comment about read_barrier_reply raft: read_barrier: update local commit_idx to read_idx when it's safe raft: log: clarify the specification of term_for	2026-03-06 18:50:08 +01:00
Aleksandra Martyniuk	40dca578c5	test: add test_tablet_repair_wait_with_table_drop	2026-03-06 15:08:29 +01:00
Piotr Smaron	f12e4ea42b	test_proxy_protocol: introduce extra logging to aid debugging In case of an error, we want to see the contents of the system.clients table to have a better understanding of what happened - whether the row(s) are really missing or maybe they are there, but 1 digit doesn't match or the row is half-written. We'll therefore query for the whole table on the CQL side, and then filter out the rows we want to later proceed with on the python side. This way we can dump the contents of the whole system.clients table if something goes south.	2026-03-06 14:50:12 +01:00
Piotr Smaron	d8cf2c5f23	test_proxy_protocol: fix flaky system.clients visibility checks `test_proxy_protocol_port_preserved_in_system_clients` failed because it didn't see the just created connection in system.clients immediately. The last lines of the stacktrace are: ``` # Complete CQL handshake await do_cql_handshake(reader, writer) # Now query system.clients using the driver to see our connection cql = manager.get_cql() rows = list(cql.execute( f"SELECT address, port FROM system.clients WHERE address = '{fake_src_addr}' ALLOW FILTERING" )) # We should find our connection with the fake source address and port > assert len(rows) > 0, f"Expected to find connection from {fake_src_addr} in system.clients" E AssertionError: Expected to find connection from 203.0.113.200 in system.clients E assert 0 > 0 E + where 0 = len([]) ``` Explanation: we first await for the hand-made connection to be completed, then, via another connection, we're querying system.clients, and we don't get this hand-made connection in the resultset. The solution is to replace the bare cql.execute() calls with await wait_for_results(), a helper that polls via cql.run_async() until the expected row count is reached (30 s timeout, 100 ms period). Fixes: SCYLLADB-819	2026-03-06 14:49:59 +01:00
Aleksandra Martyniuk	dd634c329f	service: tasks: return successful status if a table was dropped tablet_virtual_task::wait throws if a table on which a tablet operation was working is dropped. Treat the tablet operation as successful if a table is dropped.	2026-03-06 14:37:44 +01:00
Botond Dénes	4fdc0a5316	Merge 'Relax test's check_mutation_replicas() argument list' from Pavel Emelyanov The helper accepts a long list of arguments, some of which are not really needed. Also, some callers can be relaxed to not pass values for arguments that already have defaults. Improving tests, not backporting Closes scylladb/scylladb#28861 * github.com:scylladb/scylladb: test: Remove passing default "expected_replicas" to check_mutation_replicas() test: Remove scope and primary-replica-only arguments from check_mutation_replicas() helper	2026-03-06 11:25:00 +02:00
Szymon Malewski	d817e56e87	vector_similarity_fcts.cc: fix strict aliasing violation in extract_float_vector Previous code performed endian conversion by bulk-copying raw bytes into a std::vector<float> and then iterating over it via a reinterpret_cast<uint32_t*> pointer. Accessing float storage through a uint32_t lvalue violates C++ strict aliasing rules and is undefined behavior: the compiler is free to reorder or elide the stores. Replace the two-pass approach with a single-pass loop using seastar::consume_be<uint32_t>() and std::bit_cast<float>(), which is both well-defined and auto-vectorizable. Follow-up #28754 Closes scylladb/scylladb#28912	2026-03-06 09:15:45 +01:00
Artsiom Mishuta	5d7a73cc5b	test.py: add support for non_gating tests Add support for non_gating tests (the opposite of gating, in dtest terminology) in the test.py codebase. These tests will not be run by any current gating job (ci/next/nightly). Closes scylladb/scylladb#28902	2026-03-06 09:39:32 +02:00
Andrei Chekun	01498a00d5	test.py: make HostRegistry singleton HostRegistry is initialized in several places in the framework, which can lead to overlapping IPs; the probability is low, but not zero. This PR makes the host registry initialized only once, for the master thread and pytest. To avoid communication with the workers, each worker gets its own subnet that it uses exclusively. This simplifies the solution while still avoiding overlapping IPs. Closes scylladb/scylladb#28520	2026-03-06 09:25:29 +02:00
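Giving each worker a disjoint subnet removes the need for cross-worker coordination. A minimal sketch using Python's standard `ipaddress` module, assuming a hypothetical base pool `127.1.0.0/16` split into /24 subnets, where worker `i` takes the `i`-th one; the actual ranges and allocation scheme used by test.py are not stated in the commit.

```python
import ipaddress
from itertools import islice


def worker_subnet(base: str, worker_index: int, prefix: int = 24):
    """Return the /prefix subnet of `base` reserved for worker `worker_index`.

    Hypothetical scheme: worker i gets the i-th subnet, so subnets of
    different workers never overlap and no coordination is needed.
    """
    network = ipaddress.ip_network(base)
    candidates = network.subnets(new_prefix=prefix)
    subnet = next(islice(candidates, worker_index, None), None)
    if subnet is None:
        raise ValueError(f"{base} has no subnet #{worker_index} of size /{prefix}")
    return subnet


# Example: two workers sharing the hypothetical 127.1.0.0/16 pool.
w0 = worker_subnet("127.1.0.0/16", 0)
w1 = worker_subnet("127.1.0.0/16", 1)
print(w0, w1, w0.overlaps(w1))  # 127.1.0.0/24 127.1.1.0/24 False
```

Each worker can then hand out addresses from its own subnet (e.g. via `subnet.hosts()`) without consulting any other process.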
Artsiom Mishuta	2be4d8074d	test.py: disable XFail tests on CI run This PR disables running XFAIL tests on CI runs to speed them up. The tests will continue to run on the "nightly" job, where they FAIL on an unexpected pass, and on the "NEXT" job, where they do NOT FAIL on an unexpected pass. Closes scylladb/scylladb#28886	2026-03-06 09:12:06 +02:00
Szymon Malewski	f9d213547f	cql3: selection: fix `add_column_for_post_processing` for ORDER BY The purpose of `add_column_for_post_processing` is to add columns that are required for processing a query but are not part of the SELECT clause and shouldn't be returned. They are added to the final result set, but later are not serialized. It is mainly used for filtering and grouping columns, with a special case of `WHERE primary_key IN ... ORDER BY ...`, when the whole result set needs additional final sorting and ordering columns must be added as well. There was a bug that manifested in #9435 and #8100 and was actually identified in #22061. In case of a selection with processing (e.g. when functions are involved), a result set row is formed in two stages. Initially it is a list of columns fetched from replicas, on which filtering and grouping are performed. After that, the actual selection is resolved and the final number of columns can change. Ordering is performed on this final shape, but the ordering column index returned by `add_column_for_post_processing` referred to the initial shape. If the selection referred to the same column twice (e.g. `v, TTL(v)` as in #9435), the final row was longer than the initial one and ordering referred to the wrong column. If a function in the selection referred to multiple columns (e.g. as_json(.., ..), which #8100 effectively uses), the final row was shorter and ordering tried to use a non-existent column. This patch fixes the problem by making sure that the column index in the final result set is used for ordering. The previously crashing test `cassandra_tests/validation/entities/json_test.py::testJsonOrdering` no longer has to be skipped, but it now fails due to issue #28467. Fixes #9435 Fixes #8100 Fixes #22061 Closes scylladb/scylladb#28472	2026-03-05 19:22:34 +02:00
Patryk Jędrzejczak	c8c57850d9	test: test_raft_recovery_user_data: replace asyncio.gather with gather_safely	2026-03-05 17:13:52 +01:00
Patryk Jędrzejczak	c3aa4ed23c	test: test_raft_recovery_user_data: use the exclude_node API The API is now available.	2026-03-05 17:13:52 +01:00
Patryk Jędrzejczak	dd75687251	test: test_raft_recovery_user_data: drop tablet_load_stats_cfg The issue has been fixed.	2026-03-05 17:13:52 +01:00
Patryk Jędrzejczak	52940c4f31	test: cluster: util: sleep for 0.01s between writes in do_writes Tests use `start_writes` as a simple write workload to test that writes succeed when they should (e.g., there is no availability loss), but not to test performance. There is no reason to overload the CPU, which can lead to test failures. I suspect this function is the cause of SCYLLADB-929, where the failures of `test_raft_recovery_user_data` (which creates multiple write workloads with `start_writes`) indicated that the machine was overloaded. The relevant observations: - two runs failed at the same time in debug mode, - there were many reactor stalls and RPC timeouts in the logs (leading to unexpected events like servers marking each other down and group0 leader changes). I didn't prove that `start_writes` really caused this, but adding this sleep should be a good change, even if I'm wrong. With the sleep, the number of writes performed by the test decreases by a factor of 30-50. Note that some other util functions, like `start_writes_to_cdc_table`, already have such a sleep. Fixes SCYLLADB-929	2026-03-05 17:13:40 +01:00
Calle Wilund	ab3d3d8638	build: add slirp4netns to dependencies Needed for port-forwarded podman-in-podman containers [avi: - move from Dockerfile to install-dependencies.sh so non-container builds also get it - regenerate frozen toolchain with optimized clang from https://devpkg.scylladb.com/clang/clang-21.1.8-Fedora-43-aarch64.tar.gz https://devpkg.scylladb.com/clang/clang-21.1.8-Fedora-43-x86_64.tar.gz ] Closes scylladb/scylladb#28870	2026-03-05 17:44:17 +02:00
Michał Jadwiszczak	37bbbd3a27	test/cluster/test_strong_consistency: add reproducer for old schema during apply	2026-03-05 13:50:20 +01:00
Michał Jadwiszczak	6aef4d3541	test/cluster/test_strong_consistency: add reproducer for missing schema during apply	2026-03-05 13:50:16 +01:00
Michał Jadwiszczak	4795f5840f	test/cluster/test_strong_consistency: extract common function	2026-03-05 13:47:43 +01:00
Michał Jadwiszczak	3548b7ad38	raft_group_registry: allow to drop append entries requests for specific raft group Similar to `raft_drop_incoming_append_entries`, the new error injection `raft_drop_incoming_append_entries_for_specified_group` skips the handler for the `raft_append_entries` RPC, but it allows specifying the ID of the raft group for which the requests should be dropped. The ID of the raft group should be passed in the error injection parameters under the `value` key.	2026-03-05 13:47:43 +01:00
Michał Jadwiszczak	b0cffb2e81	strong_consistency/state_machine: find and hold schemas of applying mutations It might happen that a strong consistency command arrives at a node: - before it knows about the schema - after the schema was changed and the old version was removed from memory To fix the first case, it's enough to perform a read barrier on group0. In the second case, we can use the column mapping to upgrade the mutation to the newer schema. Also, we should hold pointers to schemas until we finish `_db.apply()`, so the schema is valid for the whole time. We may need to hold multiple pointers, because commands passed to `state_machine::apply()` may contain mutations with different schema versions. This commit relies on the fact that the tablet raft group and its state machine are created only after the table is created locally on the node. Fixes SCYLLADB-428	2026-03-05 13:47:40 +01:00
Tomasz Grabiec	b90fe19a42	Merge 'service: assert that tables updated via group0 use schema commitlog' from Aleksandra Martyniuk Set enable_schema_commitlog for each group0 table. Assert that group0 tables use schema commitlog in ensure_group0_schema (for each command). Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-914. Needs backport to all live releases, as all are vulnerable. Closes scylladb/scylladb#28876 * github.com:scylladb/scylladb: test: add test_group0_tables_use_schema_commitlog db: service: remove group0 tables from schema commitlog schema initializer service: ensure that tables updated via group0 use schema commitlog db: schema: remove set_is_group0_table param	2026-03-05 13:28:13 +01:00
Botond Dénes	509f2af8db	Merge 'repair: Fix rwlock in compaction_state and lock holder lifecycle' from Raphael Raph Carvalho Consider this: - repair takes the lock holder - tablet merge fiber destroys the compaction group and the compaction state - repair fails - repair destroys the lock holder This is observed in the test: ``` repair - repair[5d73d094-72ee-4570-a3cc-1cd479b2a036] Repair 1 out of 1 tablets: table=sec_index.users range=(432345564227567615,504403158265495551] replicas=[0e9d51a5-9c99-4d6e-b9db-ad36a148b0ea:15, 498e354c-1254-4d8d-a565-2f5c6523845a:9, 5208598c-84f0-4526-bb7f-573728592172:28] ... repair - repair[5d73d094-72ee-4570-a3cc-1cd479b2a036]: Started to repair 1 out of 1 tables in keyspace=sec_index, table=users, table_id=ea2072d0-ccd9-11f0-8dba-c5ab01bffb77, repair_reason=repair repair - Enable incremental repair for table=sec_index.users range=(432345564227567615,504403158265495551] table - Disabled compaction for range=(432345564227567615,504403158265495551] session_id=a13a72cc-cd2d-11f0-8e9b-76d54580ab09 for incremental repair table - Got unrepaired compaction and repair lock for range=(432345564227567615,504403158265495551] session_id=a13a72cc-cd2d-11f0-8e9b-76d54580ab09 for incremental repair table - Disabled compaction for range=(432345564227567615,504403158265495551] session_id=a13a72cc-cd2d-11f0-8e9b-76d54580ab09 for incremental repair table - Got unrepaired compaction and repair lock for range=(432345564227567615,504403158265495551] session_id=a13a72cc-cd2d-11f0-8e9b-76d54580ab09 for incremental repair repair - repair[5d73d094-72ee-4570-a3cc-1cd479b2a036]: get_sync_boundary: got error from node=0e9d51a5-9c99-4d6e-b9db-ad36a148b0ea, keyspace=sec_index, table=users, range=(432345564227567615,504403158265495551], error=seastar::rpc::remote_verb_error (Compaction state for table [0x60f008fa34c0] not found) compaction_manager - Stopping 1 tasks for 1 ongoing compactions for table sec_index.users compaction_group=238 due to tablet merge 
compaction_manager - Stopping 1 tasks for 1 ongoing compactions for table sec_index.users compaction_group=238 due to tablet merge .... scylla[10793] Segmentation fault on shard 28, in scheduling group streaming ``` The rwlock in compaction_state could be destroyed before the lock holder of the rwlock is destroyed. This causes a use-after-free when the lock holder is destroyed. To fix it, users of the repair lock are now waited for when a compaction group is being stopped. That way, the compaction group - which controls the lifetime of the rwlock - cannot be destroyed while the lock is held. Additionally, the merge completion fiber - which might remove groups - is properly serialized with incremental repair. The issue can be reproduced consistently using the sanitize build and cannot be reproduced after the fix. Fixes #27365 Closes scylladb/scylladb#28823 * github.com:scylladb/scylladb: repair: Fix rwlock in compaction_state and lock holder lifecycle repair: Prevent repair lock holder leakage after table drop	2026-03-05 14:18:25 +02:00
Patryk Jędrzejczak	f1978d8a22	raft: server: fix the repeating typo	2026-03-05 13:06:08 +01:00
Patryk Jędrzejczak	5a43695f6a	raft: clarify the comment about read_barrier_reply The comment could be misleading. It could suggest that the returned index is already safe to read. That's not necessarily true. The entry with the returned index could, for example, be dropped by the leader if the leader's entry with this index had a different term.	2026-03-05 13:06:08 +01:00
Patryk Jędrzejczak	1ae2ae50a6	raft: read_barrier: update local commit_idx to read_idx when it's safe When the local entry with `read_idx` belongs to the current term, it's safe to update the local `commit_idx` to `read_idx`. The argument for safety is in the new comment above `maybe_update_commit_idx_for_read`. The motivation for this change is to speed up read barriers. `wait_for_apply` executed at the end of `read_barrier` is delayed until the follower learns that the entry with `read_idx` is committed. It usually happens quickly in the `read_quorum` message. However, non-voters don't receive this message, so they have to wait for `append_entries`. If no new entries are being added, `append_entries` can come only from `fsm::tick_leader()`. For group0, this happens once every 100ms. The issue above significantly slows down cluster setups in tests. Nodes join group0 as non-voters, and then they are met with several read barriers just after a write to group0. One example is `global_token_metadata_barrier` in `write_both_read_new` performed just after `update_topology_state` in `write_both_read_old`. Writing a test for this change would be difficult, so we trust the nemesis tests to do the job. They have already found consistency issues in read barriers. See #10578.	2026-03-05 13:06:08 +01:00
Patryk Jędrzejczak	1cbd0da519	raft: log: clarify the specification of term_for When `idx > last_idx()`, the function does an out-of-bounds access to `_log`. This may look contradictory to the current specification.	2026-03-05 13:06:07 +01:00
Michał Jadwiszczak	33a16940be	strong_consistency/state_machine: pull necessary dependencies Both the migration manager and the system keyspace will be used in the next commit. The former is needed to execute a group0 read barrier, and the latter is needed to get column mappings.	2026-03-05 12:33:17 +01:00
Alex	b32ef8ecd5	test/auth_cluster: align service-level timeout expectations with scaled config Use scale_timeout_by_mode() in make_scylla_conf() to derive request_timeout_in_ms in test/pylib/scylla_cluster.py. Update test_connections_parameters_auto_update in test/cluster/auth_cluster/test_raft_service_levels.py to expect the mode-specific timeout string returned by the REST endpoint after this scaling change.	2026-03-05 13:32:15 +02:00
Alex	a66565cc42	test/lwt: propagate scale_timeout through LWT helpers; scale resize waits Pass scale_timeout explicitly through BaseLWTTester and Worker, validating it at construction time instead of relying on implicit pytest fixture injection inside helper classes. Update wait_for_phase_ops() and wait_for_tablet_count() to use scale_timeout_by_mode() so polling behavior remains consistent across build modes. Adjust LWT test call sites to pass scale_timeout explicitly. Increase default timeout values, as the previous defaults were too short and prone to flakiness under slower configurations (notably debug/dev builds).	2026-03-05 13:07:09 +02:00
Alex	73f1a65203	test/pylib: introduce scale_timeout fixture helper Introduce scale_timeout(mode) to centralize test timeout scaling logic based on build mode; the function returns a callable that scales a timeout according to the mode. This ensures consistent timeout behavior across test helpers and eliminates ad-hoc per-test scaling adjustments. Centralizing the logic improves maintainability and makes timeout behavior easier to reason about. This becomes increasingly important as we run tests on heterogeneous hardware configurations. Different build modes (especially debug) can significantly affect execution time, and having a single scaling mechanism helps keep test stability predictable across environments. No functional change beyond unifying existing timeout scaling behavior.	2026-03-05 13:07:09 +02:00
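The "function that returns a callable" shape can be sketched as a small closure. This is a minimal Python sketch assuming hypothetical per-mode multipliers (`release` 1x, `dev` 2x, `debug` 3x); the real factors and error handling in test/pylib are not given in the commit.

```python
from typing import Callable

# Hypothetical multipliers; the actual values used by test/pylib may differ.
MODE_TIMEOUT_FACTORS = {"release": 1.0, "dev": 2.0, "debug": 3.0}


def scale_timeout(mode: str) -> Callable[[float], float]:
    """Return a callable that scales a base timeout (in seconds) for `mode`."""
    try:
        factor = MODE_TIMEOUT_FACTORS[mode]
    except KeyError:
        raise ValueError(f"unknown build mode: {mode!r}") from None

    def scale(timeout: float) -> float:
        if timeout <= 0:
            raise ValueError("timeout must be positive")
        return timeout * factor

    return scale


scale = scale_timeout("debug")
print(scale(10))  # 30.0
```

Resolving the mode once and handing helpers a plain callable keeps them unaware of build modes: they only ever call `scale(base_timeout)`.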
Anna Stuchlik	855c503c63	doc: fix the unified installer instructions This commit updates the documentation for the unified installer. - The Open Source example is replaced with version 2025.1 (Source Available, currently supported, LTS). - The info about CentOS 7 is removed (no longer supported). - Java 8 is removed. - The example for cassandra-stress is removed (as it was already removed on other installation pages). Fixes https://github.com/scylladb/scylladb/issues/28150 Closes scylladb/scylladb#28152	2026-03-05 12:57:06 +02:00
Michał Jadwiszczak	d25be9e389	db/schema_tables: add `get_column_mapping_if_exists()` In scenarios where we want to first check whether a column mapping exists, and we don't want to do flow control with exceptions, it is very wasteful to do ``` if (column_mapping_exists()) { get_column_mapping(); } ``` especially in a hot path like `state_machine::apply()`, because this executes 2 internal queries. This commit introduces the `get_column_mapping_if_exists()` function, which simply wraps the result of `get_column_mapping()` in an optional and doesn't throw an exception if the mapping doesn't exist.	2026-03-05 11:55:57 +01:00
Artsiom Mishuta	7b30a3981b	test.py: enable strict_config,xfail_strict,strict-markers This commit enables 3 strict pytest options: strict_config - any warnings encountered while parsing the pytest section of the configuration file raise errors. xfail_strict - tests marked with @pytest.mark.xfail that actually succeed fail the test suite by default. strict-markers - markers not registered in the markers section of the configuration file raise errors. It also fixes the errors that occur after enabling these options. Closes scylladb/scylladb#28859	2026-03-05 12:54:26 +02:00
Dawid Mędrek	7564a56dc8	Merge 'tombstone_gc: allow using repair-mode tombstone gc with RF=1 tables' from Botond Dénes Currently, repair-mode tombstone-gc cannot be used on tables with RF=1. We want to make repair-mode the default for all tablet tables (and more, see https://github.com/scylladb/scylladb/issues/22814), but currently a keyspace created with RF=1 and later altered to RF>1 will end up using timeout-mode tombstone gc. This is because the repair-mode tombstone-gc code relies on repair history to determine the gc-before time for keys/ranges. RF=1 tables cannot run repairs so they will have empty repair history and consequently won't be able to purge tombstones. This PR solves this by keeping a registry of RF=1 tables and consulting this registry when creating `tombstone_gc_state` objects. If the table is RF=1, tombstone-gc will work as if the table used immediate-mode tombstone-gc. The registry is updated on each replication update. As soon as the table is not RF=1 anymore, the tombstone-gc reverts to the natural repair-mode behaviour. After this PR, tombstone-gc defaults to repair-mode for all tables, regardless of RF and tablets/vnodes. Fixes: SCYLLADB-106. New feature, no backport required. 
Closes scylladb/scylladb#22945 * github.com:scylladb/scylladb: test/{boost,cluster}: add test for tombstone gc mode=repair with RF=1 tombstone_gc: allow use of repair-mode for RF=1 tables replica/table: update rf=1 table registry in shared tombstone-gc state tombstone_gc: tombstone_gc_before_getter: consider RF when getting gc before time tombstone_gc: unpack per_table_history_maps tombstone_gc: extract _group0_gc_time from per_table_history_map tombstone_gc: drop tombstone_gc_state(nullptr) ctor and operator bool() test/lib/random_schema: use timeout-mode tombstone_gc tombstone_gc_options: add C++ friendly constructor test: move away from tombstone_gc_state(nullptr) ctor treewide: move away from tombstone_gc_state(nullptr) ctor sstable: move away from tombstone_gc_mode::operator bool() replica/table: add get_tombstone_gc_state() compaction: use tombstone_gc_state with value semantics db/row_cache: use tombstone_gc_state with value semantics tombstone_gc: introduce tombstone_gc_state::for_tests()	2026-03-05 11:50:31 +01:00
Piotr Dulikowski	a2669e9983	test: test_mv_merge_allowed: add mistakenly omitted awaits The test test_mv_merge_allowed asserts in two places that the tablet count is 2. It does so by calling an async function, but the returned coroutine was mistakenly not awaited. A coroutine object is always truthy, so the assertions always passed. Fix the test to properly await the coroutines in the assertions. Fixes: SCYLLADB-905 Closes scylladb/scylladb#28875	2026-03-05 11:29:23 +01:00
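The pitfall is easy to reproduce in plain Python. Calling an `async def` function without `await` only creates a coroutine object, and because a coroutine defines neither `__bool__` nor `__len__`, it is truthy. A minimal sketch with a hypothetical `tablet_count_is()` check that pretends the cluster actually has 4 tablets:

```python
import asyncio


async def tablet_count_is(expected: int) -> bool:
    """Hypothetical async check; pretends the cluster actually has 4 tablets."""
    await asyncio.sleep(0)
    actual = 4
    return actual == expected


def buggy_check() -> bool:
    coro = tablet_count_is(2)  # not awaited: the body never runs
    result = bool(coro)        # a coroutine object is always truthy
    coro.close()               # avoid the "coroutine was never awaited" RuntimeWarning
    return result


async def fixed_check() -> bool:
    return await tablet_count_is(2)  # actually runs the check


print(buggy_check())               # True, so `assert` would pass
print(asyncio.run(fixed_check()))  # False, so `assert` fails as it should
```

The same mistake explains several other fixes in this log (missing awaits in `test_repair.py`, `test_internode_compression` and `test_client_routes_upgrade`). Python's "coroutine ... was never awaited" RuntimeWarning is the usual hint.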
Avi Kivity	5ae40caa6d	dist: tune tcp_mem to 3% of total memory in scylla-kernel-conf package tcp_mem defaults to 9% of total memory. ScyllaDB defaults to using 93%. The sum (102%) is more than 100%. Fix by tuning tcp_mem to 3% of total memory, bringing the sum down to 96%. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-734 Closes scylladb/scylladb#28700	2026-03-05 12:51:04 +03:00
Patryk Jędrzejczak	bb1a798c2c	Merge 'raft: Throw stopped_error if server aborted' from Dawid Mędrek This PR solves a series of similar problems related to executing methods on an already aborted `raft::server`. They materialize in various ways: * For `add_entry` and `modify_config`, a `raft::not_a_leader` with a null ID will be returned IF forwarding is disabled. This wasn't a big problem because forwarding has always been enabled for group0, but it's something that's nice to fix. It might be relevant for strong consistency, which will heavily rely on this code. * For `wait_for_leader` and `wait_for_state_change`, the calls may hang and never resolve. A more detailed scenario is provided in the corresponding commit message. For the last two methods, we also extend their descriptions to indicate the new possible exception type, `raft::stopped_error`. This change is correct since either we enter the functions and throw the exception immediately (if the server has already been aborted), or it will be thrown upon the call to `raft::server::abort`. We fix both issues. A few reproducer tests have been included to verify that the calls finish and throw the appropriate errors. Fixes SCYLLADB-841 Backport: Although the hanging problems haven't been spotted so far (at least to the best of my knowledge), it's best to avoid running into a problem like that, so let's backport the changes to all supported versions. They're small enough. Closes scylladb/scylladb#28822 * https://github.com/scylladb/scylladb: raft: Make methods throw stopped_error if server aborted raft: Throw stopped_error if server aborted test: raft: Introduce get_default_cluster	2026-03-05 10:47:39 +01:00
Botond Dénes	cd13a911cc	test/cluster/test_data_resurrection_in_memtable.py: dump rows before check So that if the check of expected rows fails, we have a dump to look at and see what is different.	2026-03-05 11:44:02 +02:00
Botond Dénes	f375aae257	replica/database: consolidate the two database_apply error injections Into a single database_apply one. Add three parameters: * ks_name and cf_name to filter the tables to be affected * what - what to do: throw or wait This leads to smaller footprint in the code and improved filtering for table names at the cost of some extra error injection params in the tests.	2026-03-05 11:44:02 +02:00
Marcin Maliszkiewicz	c3f59e4fa1	Merge 'cql3: implement write_consistency_levels guardrails' from Andrzej Jackowski This patch series implements `write_consistency_levels_warned` and `write_consistency_levels_disallowed` guardrails, allowing the configuration of which consistency levels are unwanted for writes. The motivation for these guardrails is to forbid writing with consistency levels that don't provide high durability guarantees (like CL=ANY, ONE, or LOCAL_ONE). Neither guardrail is enabled by default, so as not to disrupt clusters that are currently using any of the CLs for writes. The warning guardrail may seem harmless, as it only adds a warning to the CQL response; however, enabling it can significantly increase network traffic (as a warning message is added to each response) and also decrease throughput due to additional allocations required to prepare the warning. Therefore, both guardrails should be enabled with care. The newly added `writes_per_consistency_level` metric, which is incremented unconditionally, can help decide whether a guardrail can be safely enabled in an existing cluster. This commit adds additional `if` instructions on the critical path. However, based on the `perf_simple_query` benchmark for writes, the difference is marginal (~40 additional instructions, which is a relative difference smaller than 0.001). 
BEFORE: ``` 291443.35 tps ( 53.3 allocs/op, 16.0 logallocs/op, 14.2 tasks/op, 48067 insns/op, 18885 cycles/op, 0 errors) throughput: mean= 289743.07 standard-deviation=6075.60 median= 291424.69 median-absolute-deviation=1702.56 maximum=292498.27 minimum=261920.06 instructions_per_op: mean= 48072.30 standard-deviation=21.15 median= 48074.49 median-absolute-deviation=12.07 maximum=48119.87 minimum=48019.89 cpu_cycles_per_op: mean= 18884.09 standard-deviation=56.43 median= 18877.33 median-absolute-deviation=14.71 maximum=19155.48 minimum=18821.57 ``` AFTER: ``` 290108.83 tps ( 53.3 allocs/op, 16.0 logallocs/op, 14.2 tasks/op, 48121 insns/op, 18988 cycles/op, 0 errors) throughput: mean= 289105.08 standard-deviation=3626.58 median= 290018.90 median-absolute-deviation=1072.25 maximum=291110.44 minimum=274669.98 instructions_per_op: mean= 48117.57 standard-deviation=18.58 median= 48114.51 median-absolute-deviation=12.08 maximum=48162.18 minimum=48087.18 cpu_cycles_per_op: mean= 18953.43 standard-deviation=28.76 median= 18945.82 median-absolute-deviation=20.84 maximum=19023.93 minimum=18916.46 ``` Fixes: SCYLLADB-259 Refs: SCYLLADB-739 No backport, it's a new feature Closes scylladb/scylladb#28570 * github.com:scylladb/scylladb: scylla.yaml: add write CL guardrails to scylla.yaml scylla.yaml: reorganize guardrails config to be in one place test: add cluster tests for write CL guardrails test: implement test_guardrail_write_consistency_level cql3: start using write CL guardrails cql3/query_processor: implement metrics to track CL of writes db: cql3/query_processor: add write_consistency_levels enum_sets config: add write_consistency_levels_* guardrails configuration	2026-03-05 09:55:38 +01:00
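The two guardrails plus the unconditional metric can be modelled as two sets and a counter. A minimal Python sketch, assuming hypothetical names (`WriteCLGuardrails`, `GuardrailViolation`) and assuming that rejected writes are still counted, since the metric is described as incremented unconditionally; this is not the actual cql3 implementation. Both sets are empty by default, matching "neither guardrail is enabled by default".

```python
from collections import Counter
from enum import Enum


class CL(Enum):
    ANY = "ANY"
    ONE = "ONE"
    LOCAL_ONE = "LOCAL_ONE"
    QUORUM = "QUORUM"
    LOCAL_QUORUM = "LOCAL_QUORUM"
    ALL = "ALL"


class GuardrailViolation(Exception):
    """Hypothetical error raised for a disallowed write consistency level."""


class WriteCLGuardrails:
    """Sketch of warned/disallowed sets for write consistency levels."""

    def __init__(self, warned=(), disallowed=()):
        self.warned = frozenset(warned)
        self.disallowed = frozenset(disallowed)
        self.writes_per_cl = Counter()  # counted for every write, like the new metric

    def check_write(self, cl: CL) -> list:
        """Return warnings to attach to the response, or raise if `cl` is disallowed."""
        self.writes_per_cl[cl] += 1
        if cl in self.disallowed:
            raise GuardrailViolation(f"writes with CL={cl.value} are disallowed")
        if cl in self.warned:
            return [f"writes with CL={cl.value} are discouraged"]
        return []
```

Inspecting `writes_per_cl` before populating `disallowed` is the kind of check the commit suggests for deciding whether a guardrail can be enabled safely.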
Botond Dénes	44b8cad3df	service/storage_proxy: add name of table to error message for write errors It is useful to know what table the failed write belongs to.	2026-03-05 10:51:12 +02:00
Yauheni Khatsianevich	aa85f5a9c3	test: migrating alternator ttl tests to scylla repo Migrate alternator_ttl_tests.py to the scylla repo as part of deprecating the dtest framework. Migrated tests: - test_ttl_with_load_and_decommission Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-869 Closes scylladb/scylladb#28858	2026-03-05 10:04:14 +02:00
Nadav Har'El	8e32d97be6	test/alternator: fix run script The test/alternator/run script currently fails: Scylla fails to boot, complaining that "--alternator-ttl-period-in-seconds" is specified twice (which is, unfortunately, not allowed). The problem is that we recently started to set this option in test/cqlpy/run.py, for CQL's new row-level TTL, so now it is no longer needed in test/alternator/run - and in fact it is not allowed, so we must remove it. This patch only affects the script test/alternator/run, and has no effect on running tests through test.py or Jenkins. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#28868	2026-03-05 10:06:38 +03:00
Botond Dénes	9b2242c752	test/cluster/test_repair.py: fix test_repair_timtestamp_difference This test forgot to await its check() calls, which is the pass-condition of the test. Once the await was added, the test started failing. Turns out, the test was broken, but this was never discovered, because due to the missing await, the errors were not propagated. This patch adds the missing await and fixes the discovered problems: * Use cql.run_async() instead of cql.execute() * Fix json path for timestamp * Missing flush/compact Fixes: SCYLLADB-911 Closes scylladb/scylladb#28883	2026-03-05 10:04:49 +03:00
Nadav Har'El	af07718fff	test/cqlpy: fix "run --release" for versions 5.4 or older Recently we started to rely on the options "--auth-superuser-name" and "--auth-superuser-salted-password" to ensure that a cassandra/cassandra user exists for tests - without those options a default superuser no longer exists. This broke "test/cqlpy/run --release" for old releases, earlier than 5.4 (in the enterprise stream, 2024.1 or earlier), because those old releases didn't have these options. So in this patch we fix the "--release" logic that removes these options from the command line when running these old versions. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#28894	2026-03-05 09:59:46 +03:00
Botond Dénes	5e7b966d37	Merge 'Remove prepare_snapshot_for_backup() helper from backup/restore tests' from Pavel Emelyanov The helper in question duplicates the functionality of `take_snapshot()` one from the same file. The only difference is that it additionally creates keyspace:table with yet another helper, but that helper is also going to be removed (as continuation of #28600 and #28608) Enhancing tests, not backporting Closes scylladb/scylladb#28834 * github.com:scylladb/scylladb: test_backup: Remove prepare_snapshot_for_backup() test_backup: Patch test_simple_backup_and_restore to use take_snapshot() test_backup: Patch backup tests to use take_snapshot() test_backup: Add helper to take snapshot on a single server	2026-03-05 06:54:07 +02:00
Dani Tweig	25fc8ef14c	Add RELENG to milestone-to-Jira sync project keys Closes scylladb/scylladb#28889	2026-03-05 06:51:21 +02:00
Calle Wilund	35aab75256	test_internode_compression: Add await for "run" coro:s Fixes: SCYLLADB-907 Closes scylladb/scylladb#28885	2026-03-05 06:50:33 +02:00
Patryk Jędrzejczak	2a3476094e	storage_service: raft_topology_cmd_handler: fix use-after-free `8e9c7397c5` made `rs` a reference, which can lead to use-after-free. The `normal_nodes` map containing the referenced value can be destroyed before the last use of `rs` when the topology state is reloaded after a context switch on some `co_await`. The following move assignment in `storage_service::topology_state_load` causes this: ``` _topology_state_machine._topology = co_await _sys_ks.local().load_topology_state(tablet_hosts); ``` This issue has been discovered in next-2026.1 CI after queueing the backport of #28558. `test_truncate_during_topology_change` failed after ASan reported a heap-use-after-free in ``` co_await _repair.local().bootstrap_with_repair(get_token_metadata_ptr(), rs.ring.value().tokens, session); ``` This test enables `delay_bootstrap_120s`, which makes the bug much more likely to reproduce, but it could happen elsewhere. No backport needed, as the only backport of #28558 hasn't been merged yet. The backport PR will cherry-pick this commit. Closes scylladb/scylladb#28772	2026-03-05 01:50:22 +01:00
Aleksandra Martyniuk	156c29f962	test: add test_group0_tables_use_schema_commitlog	2026-03-04 17:25:06 +01:00
Aleksandra Martyniuk	5306e26b83	db: service: remove group0 tables from schema commitlog schema initializer Remove group0 tables from schema commitlog schema initializer. The schema commitlog of group0 tables is ensured by set_is_group0_table.	2026-03-04 17:25:06 +01:00
Aleksandra Martyniuk	690b2c4142	service: ensure that tables updated via group0 use schema commitlog Set enable_schema_commitlog for each group0 table. Assert that group0 tables use schema commitlog in ensure_group0_schema (for each command). Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-914.	2026-03-04 17:25:04 +01:00
Aleksandra Martyniuk	6b3b174704	db: schema: remove set_is_group0_table param set_is_group0_table takes an enabled flag, based on which it decides whether it's a group0 table. The method is called only with enabled = true. Drop the param. For non-group0 tables, nothing should be set.	2026-03-04 17:24:34 +01:00
Aleksandra Martyniuk	57f1e46204	test: cluster: tasks: await set_task_ttl Await set_task_ttl in test_tablet_repair_task_children. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-912 Closes scylladb/scylladb#28882	2026-03-04 17:58:37 +02:00
Dawid Mędrek	d44fc00c4c	raft: Make methods throw stopped_error if server aborted After the previous changes in `raft::server::{add_entry, modify_config}` (cf. SCYLLADB-841), we also go through other methods of `raft::server` and verify that they handle the aborted state properly. I found two methods that do not: (A) `wait_for_leader` (B) `wait_for_state_change` What happened before these changes? In case (A), the dangerous scenario occurred when `_leader_promise` was empty on entering the function. In that case, we would construct the promise and wait on the corresponding future. However, if the server had already been aborted before the call, the future would never resolve and we'd be effectively stuck. Case (B) is fully analogous: instead of `_leader_promise`, we'd work with `_state_change_promise`. There's probably a more proper solution to this problem, but since I'm not familiar with the internal code of Raft, I fix it this way. We can improve it further in the future. We provide two simple validation tests. They verify that after aborting a `raft::server`, the calls: * do not hang (the tests would time out otherwise), * throw raft::stopped_error. Fixes SCYLLADB-841	2026-03-04 16:28:11 +01:00
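The failure mode in case (A) can be sketched with standard C++ futures. This is an illustration under stated assumptions, not the `raft::server` code. `server_sketch`, `leader_waiter` and this `stopped_error` are hypothetical. `std::promise` and `std::shared_future` stand in for Seastar's promise and future.

```cpp
#include <cassert>
#include <exception>
#include <future>
#include <optional>
#include <stdexcept>

// Hypothetical simplification; the real raft::server uses Seastar primitives.
struct stopped_error : std::runtime_error {
    stopped_error() : std::runtime_error("raft server stopped") {}
};

class server_sketch {
    struct leader_waiter {
        std::promise<void> promise;
        std::shared_future<void> future = promise.get_future().share();
    };
    bool _aborted = false;
    std::optional<leader_waiter> _leader_promise;

public:
    std::shared_future<void> wait_for_leader() {
        // The fix: without this check, a promise created after abort() would
        // never be resolved and the caller would wait forever.
        if (_aborted) {
            throw stopped_error();
        }
        if (!_leader_promise) {
            _leader_promise.emplace();
        }
        return _leader_promise->future;
    }

    void abort() {
        _aborted = true;
        // Waiters registered before abort() are failed, not left hanging.
        if (_leader_promise) {
            _leader_promise->promise.set_exception(std::make_exception_ptr(stopped_error()));
            _leader_promise.reset();
        }
    }
};
```

Both orderings are covered: a call made after `abort()` throws immediately, and a future obtained before `abort()` resolves with `stopped_error`.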
Dawid Mędrek	c200d6ab4f	raft: Throw stopped_error if server aborted Before the change, calling `add_entry` or `modify_config` on an already aborted Raft server could result in an error `not_a_leader` containing a null server ID. It was possible precisely when forwarding was disabled in the server configuration. `not_a_leader` is supposed to return the ID of the current leader, so that was wrong. Furthermore, the description of the function specified that if a server is aborted, then it should throw `stopped_error`. We fix that issue. A few small reproducer tests were provided to verify that the functions behave correctly with and without forwarding enabled. Refs SCYLLADB-841	2026-03-04 16:28:08 +01:00
Marcin Maliszkiewicz	9697b6013f	Merge 'test: add missing awaits in test_client_routes_upgrade' from Andrzej Jackowski Two calls in test_client_routes_upgrade were missing `await`, so they were never actually executed. This caused Python to emit RuntimeWarning about unawaited coroutines, and more importantly, the test skipped important verification steps, which could mask real bugs or cause flakiness. Additionally, increase 10s timeouts to 60s to avoid flakiness in slow environments. Although these tests haven't failed so far, similar issues have already been observed in other tests with too-short timeouts. Fixes: [SCYLLADB-909](https://scylladb.atlassian.net/browse/SCYLLADB-909) Backport to 2026.1, as the test is also there. Closes scylladb/scylladb#28877 * github.com:scylladb/scylladb: test: increase timeouts in test_client_routes.py test: add missing awaits in test_client_routes_upgrade	2026-03-04 15:26:34 +01:00
Szymon Malewski	212bd6ae1a	vector: Vectorize loops in similarity functions The main loops iterating over vector components were not vectorized due to: - "cannot prove it is safe to reorder floating-point operations" - "Cannot vectorize early exit loop with more than one early exit" The first issue is fixed by adding `#pragma clang fp contract(fast) reassociate(on)`, which allows the compiler to optimize floating-point operations. The second issue is solved by refactoring the operations in the affected loop. Additionally, using float operations instead of double increases throughput; numerical accuracy is not the main consideration in vector search scenarios. Performance measured: - scylla built using dbuild - using https://github.com/zilliztech/VectorDBBench (modified to call `SELECT id, similarity_cosine({vector<float, 1536>}, {vector<float, 1536>}) ...` without ANN search): - client concurrency 20 before: ~2250 QPS `float` operations: ~2350 QPS `compute_cosine_similarity` vectorization: ~2500 QPS `extract_float_vector` vectorization: ~3000 QPS Follow-up https://github.com/scylladb/scylladb/pull/28615 Ref https://scylladb.atlassian.net/browse/SCYLLADB-764 Closes scylladb/scylladb#28754	2026-03-04 15:14:53 +01:00
Andrzej Jackowski	221b78cb81	test: increase timeouts in test_client_routes.py Increase 10s timeouts to 60s to avoid flakiness in slow environments. Although these tests haven't failed so far, similar issues have already been observed in other tests with too-short timeouts. Test execution time is unaffected; the entire suite in `dev` takes ~30s before and after this change.	2026-03-04 13:40:30 +01:00
Andrzej Jackowski	527c4141da	test: add missing awaits in test_client_routes_upgrade Two calls in test_client_routes_upgrade were missing `await`, so they were never actually executed. This caused Python to emit RuntimeWarning about unawaited coroutines, and more importantly, the test skipped important verification steps, which could mask real bugs or cause flakiness. Fixes: SCYLLADB-909	2026-03-04 13:34:37 +01:00
Piotr Smaron	a31cb18324	db: fix UB in system.clients row sorting The comparator used to sort per-IP client rows was not a strict-weak-ordering (it could return true in both directions for some pairs), which makes `std::ranges::sort` behavior undefined. A concrete pair that breaks it (and is realistic in system.clients): a = (port=9042, client_type="cql") b = (port=10000, client_type="alternator") With the current comparator: cmp(a,b) = (9042 < 10000) || ("cql" < "alternator") = true || false = true cmp(b,a) = (10000 < 9042) || ("alternator" < "cql") = false || true = true So both directions are true, meaning there is no valid ordering that sort can achieve. The fix is to sort lexicographically by (port, client_type) to match the table's clustering key and ensure deterministic ordering. Closes scylladb/scylladb#28844	2026-03-04 14:10:49 +03:00
Avi Kivity	c331796d28	Merge 'Support Min < Precision for approx_exponential_histogram' from Amnon Heiman This series closes a gap in the approx_exponential_histogram implementation to cover integer values starting from small Min values. While the original implementation was focused on durations, where this limitation was not an issue, over time there has been a growing need for histograms that cover smaller values, such as the number of SSTables or the number of items in a batch. The reason for the original limitation is inherent to the exponential histogram math. The previous code required Min to be at least Precision to avoid negative bit shifts in the exponential calculations. After this series, approx_exponential_histogram allows Min to be smaller than Precision by scaling values during indexing. The value is shifted left by the larger of log2(Precision) - log2(Min) and zero, and the existing exponential math is applied. Bucket limits are then scaled back to the original units. This keeps insertion and retrieval O(1) without runtime branching, at the cost of repeated bucket limits for some values in the Min to Precision range. Additional tests cover the new behavior. Relates to #2785 New feature, no need to backport. Closes scylladb/scylladb#28371 * github.com:scylladb/scylladb: estimated_histogram_test.cc: add to_metrics_histogram test histogram_metrics_helper.hh: Support Min < Precision estimated_histogram_test.cc: Add tests for approx_exponential_histogram with Min<Precision estimated_histogram.hh: support Min less than Precision histograms	2026-03-04 12:43:26 +02:00
Szymon Malewski	4c4673e8f9	test: vector_similarity: Fix similarity value checks `isclose` function checks if returned similarity floats are close enough to expected value, but it doesn't `assert` by itself. Several tests missed that `assert`, effectively always passing. With this patch similarity values checks are wrapped in helper function `assert_similarity` with predefined tolerance. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-877 Closes scylladb/scylladb#28748	2026-03-04 09:53:32 +01:00
Marcin Maliszkiewicz	c7d3f80863	Merge 'auth: do not create default 'cassandra:cassandra' superuser' from Dario Mirovic This patch series removes creation of default 'cassandra:cassandra' superuser on system start. Disable creation of a superuser with default 'cassandra:cassandra' credentials to improve security. The current flow requires clients to create another superuser and then drop the default 'cassandra:cassandra' role. For those who do, there is a time window where the default credentials exist. For those who do not, that role stays. We want to improve security by forcing the client to either use config to specify default values for default superuser name and password or use cqlsh over maintenance socket connection to explicitly create/alter a superuser role. The patch series: - Enable role modification over the maintenance socket - Stop using default 'cassandra' value for default superuser, skipping creation instead Design document: https://scylladb.atlassian.net/wiki/spaces/RND/pages/165773327/Drop+default+cassandra+superuser Fixes scylladb/scylla-enterprise#5657 This is an improvement. It does not need a backport. 
Closes scylladb/scylladb#27215 * github.com:scylladb/scylladb: config: enable maintenance socket in workdir by default docs: auth: do not specify password with -p option docs: update documentation related to default superuser test: maintenance socket role management test: cluster: add logs to test_maintenance_socket.py test: pylib: fix connect_driver handling when adding and starting server auth: do not create default 'cassandra:cassandra' superuser auth: remove redundant DEFAULT_USER_NAME from password authenticator auth: enable role management operations via maintenance socket client_state: add has_superuser method client_state: add _bypass_auth_checks flag auth: let maintenance_socket_role_manager know if node is in maintenance mode auth: remove class registrator usage auth: instantiate auth service with factory functors auth: add service constructor with factory functors auth: add transitional.hh file service: qos: handle special scheduling group case for maintenance socket service: qos: use _auth_integration as condition for using _auth_integration	2026-03-04 09:43:57 +01:00
Piotr Dulikowski	85dcbfae9a	Merge 'hint: Don't switch group in database::apply_hint()' from Pavel Emelyanov The method is called from storage_proxy::mutate_hint() which is in turn called from hint_mutation::apply_locally(). The latter is called either directly by the hint sender, which already runs in the streaming group, or via the RPC HINT_MUTATION handler, which uses index 1 that negotiates the streaming group as well. To be sure, add a debugging check that the current group is the expected one. Code cleanup, not backporting Closes scylladb/scylladb#28545 * github.com:scylladb/scylladb: hint: Don't switch group in database::apply_hint() hint_sender: Switch to sender group on stop either	2026-03-04 09:36:38 +01:00
Pavel Emelyanov	5793e305b5	test_backup: Remove prepare_snapshot_for_backup() It's now unused Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-04 11:33:43 +03:00
Pavel Emelyanov	ffbd9a3218	test_backup: Patch test_simple_backup_and_restore to use take_snapshot() This change is a bit more careful, as the test collects files from snapshot directory several times. Before patching it to use the helper, it collected _all_ the files. Now the helper only provides TOC-s, but that's fine -- the only check that relies on that may also re-collect TOC-s and compare new set with old set. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-04 11:33:43 +03:00
Pavel Emelyanov	c1b0ac141b	test_backup: Patch backup tests to use take_snapshot() Some of those tests need to update the hard-coded 'backup' snapshot name to use the one provided by take_snapshot() helper. Other than that, the patching is pretty straightforward. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-04 11:33:43 +03:00
Pavel Emelyanov	ea17c26fd9	test_backup: Add helper to take snapshot on a single server The take_snapshot() helper returns a dict(server: list[string]). When there's only one server to work with, it's more convenient to just get a single list of sstables. Next patches will make use of that helper. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-04 11:31:39 +03:00
Botond Dénes	e7487c21e4	test/{boost,cluster}: add test for tombstone gc mode=repair with RF=1	2026-03-04 09:45:38 +02:00
Botond Dénes	5998a859f7	tombstone_gc: allow use of repair-mode for RF=1 tables Modify the methods which calculate the default gc mode as well as that which validates whether repair-mode can be used at all, so that both accept the use of repair-mode on RF=1 tables. This de-facto changes the default tombstone-gc to repair-mode for all tables. Documentation is updated accordingly. Some tests need adjusting: * cqlpy/test_select_from_mutation_fragments.py: disable GC for some test cases because this patch makes tombstones they write subject to GC when using defaults. * test/cluster/test_mv.py::test_mv_tombstone_gc_not_inherited used repair-mode as a non-default for the base table and expected the MV to revert to default. Another mode has to be used as the non-default (immediate). * test/cqlpy/test_tools.py::test_scylla_sstable_dump_schema: don't compare tombstone_gc schema extension when comparing dumped schema vs. original. The tool's schema loader doesn't have access to the keyspace definition so it will come up with different defaults for tombstone-gc. * test/boost/row_cache_test.cc::test_populating_cache_with_expired_and_nonexpired_tombstones sets tombstone expiry assuming the tombstone-gc timeout-mode default. Change the CREATE TABLE statement to set the expected mode.	2026-03-04 09:44:24 +02:00
Andrzej Jackowski	c0e94828de	scylla.yaml: add write CL guardrails to scylla.yaml Disabled by default. This change is introduced only to document the guardrail. Refs: SCYLLADB-259	2026-03-04 08:00:17 +01:00
Andrzej Jackowski	038f89ede4	scylla.yaml: reorganize guardrails config to be in one place Also change the format of the section header and add "#" to empty lines, so that in the future no one splits the section by adding new configs.	2026-03-04 08:00:17 +01:00
Andrzej Jackowski	ec42fdfd01	test: add cluster tests for write CL guardrails Most of the functionality is tested in cqlpy tests located in `test_guardrail_write_consistency_level.py`. Add two tests that require the cluster framework: - `test_invalid_write_cl_guardrail_config` checks the node startup path when incorrect `write_consistency_levels_warned` and `write_consistency_levels_disallowed` values are used. - `test_write_cl_default` checks the behavior of the default configuration using a multi-node cluster. Tests execution time: - Dev: 10s - Debug: 18s Refs: SCYLLADB-259	2026-03-04 08:00:17 +01:00
Andrzej Jackowski	446539f12f	test: implement test_guardrail_write_consistency_level Implement basic tests for write consistency level guardrails, verifying that they work for each type of write request (inserts, updates, deletes, logged batches, unlogged batches, conditional batches, and counter operations). All tests are marked as Scylla-only because they currently don't pass with Cassandra due to differences in handling superusers (see: SCYLLADB-882). Tests execution time: - Dev: 3s - Debug: 14s Refs: SCYLLADB-259 Refs: SCYLLADB-882	2026-03-04 08:00:13 +01:00
Avi Kivity	85bd6d0114	Merge 'Add multiple-shard persistent metadata storage for strongly consistent tables' from Wojciech Mitros In this series we introduce new system tables and use them for storing the raft metadata for strongly consistent tables. In contrast to the previously used raft group0 tables, the new tables can store data on any shard. The tables also allow specifying the shard where each partition should reside, which enables the tablets of strongly consistent tables to have their raft group metadata co-located on the same shard as the tablet replica. The new tables have almost the same schemas as the raft group0 tables. However, they have an additional column in their partition keys. The additional column is the shard that specifies where the data should be located. While a tablet and its corresponding raft group server reside on some shard, it now writes and reads all requests to the metadata tables using its shard in addition to the group_id. The extra partition key column is used by the new partitioner and sharder which allow this special shard routing. The partitioner encodes the shard in the token and the sharder decodes the shard from the token. This approach for routing avoids any additional lookups (for the tablet mapping) during operations on the new tables and it also doesn't require keeping any state. It also doesn't interact negatively with resharding - as long as tablets (and their corresponding raft metadata) occupy some shard, we do not allow starting the node with a shard count lower than the id of this shard. When increasing the shard count, the routing does not change, similarly to how tablet allocation doesn't change. To use the new tables, a new implementation of `raft::persistence` is added. Currently, it's almost an exact copy of the `raft_sys_table_storage` which just uses the new tables, but in the future we can modify it with changes specific to metadata (or mutation) storage for strongly consistent tables. 
The new storage is used in the `groups_manager`, which combined with the removal of some `this_shard_id() == 0` checks, allows strongly consistent tables to be used on all shards. This approach for making sure that the reads/writes to the new tables end up on the correct shards won in the balance of complexity/usability/performance against a few other approaches we've considered. They include: 1. Making the Raft server read/write directly to the database, skipping the sharder, on its shard, while using the default partitioner/sharder. This approach could let us avoid changing the schema and there should be no problems for reads and writes performed by the Raft server. However, in this approach we would input data in tables conflicting with the placement determined by the sharder. As a result, any read going through the sharder could miss the rows it was supposed to read. Even when reading all shards to find a specific value, there is a risk of polluting the cache - the rows loaded on incorrect shards may persist in the cache for an unknown amount of time. The cache may also mistakenly remember that a row is missing, even though it's actually present, just on an incorrect shard. Some of the issues with this approach could be worked around using another sharder which always returns this_shard_id() when asked about a shard. It's not clear how such a sharder would implement a method like `token_for_next_shard`, and how much simpler it would be compared to the current "identity" sharder. 2. Using a sharder depending on the current allocation of tablets on the node. This approach relies on the knowledge of group_id -> shard mapping at any point in time in the cluster. For this approach we'd also need to either add a custom partitioner which encodes the group_id in the token, or we'd need to track the token(group_id) -> shard mapping. This approach has the benefit over the one used in the series of keeping the partition key as just group_id. 
However, it requires more logic and access to the live state of the node in the sharder, and it's not static - the same token may be sharded differently depending on the state of the node - it shouldn't occur in practice, but if we changed the state of the node before adjusting the table data, we would be unable to access/fix the stale data without artificially also changing the state of the node. 3. Using metadata tables co-located with the strongly consistent tables. This approach could simplify the metadata migrations in the future; however, it would require additional schema management of all co-located metadata tables, and it's not even obvious what could be used as the partition key in these tables - some metadata is per-raft-group, so we couldn't reuse the partition key of the strongly consistent table for it. And finding and remembering a partition key that is routed to a specific shard is not a simple task. Finally, splits and merges will most likely need special handling for metadata anyway, so we wouldn't even make use of co-located table's splits and merges. Fixes [SCYLLADB-361](https://scylladb.atlassian.net/browse/SCYLLADB-361) Closes scylladb/scylladb#28509 * github.com:scylladb/scylladb: docs: add strong consistency doc test/cluster: add tests for strongly-consistent tables' metadata persistence raft: enable multi-shard raft groups for strongly consistent tablets test/raft: add unit tests for raft_groups_storage raft: add raft_groups_storage persistence class db: add system tables for strongly consistent tables' raft groups dht: add fixed_shard_partitioner and fixed_shard_sharder raft: add group_id -> shard mapping to raft_group_registry schema: add with_sharder overload accepting static_sharder reference	2026-03-04 08:55:43 +02:00
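The partitioner/sharder pair can be illustrated with a toy token layout. The bit split below (16 high bits for the shard, 48 low bits for the key hash) and the unsigned token type are assumptions made only for illustration. The actual `fixed_shard_partitioner` and `fixed_shard_sharder` may encode the shard differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical token layout: top 16 bits = shard id, low 48 bits = key hash bits.
constexpr unsigned shard_bits = 16;
constexpr unsigned hash_bits = 64 - shard_bits;
constexpr std::uint64_t hash_mask = (std::uint64_t(1) << hash_bits) - 1;

// "Partitioner": encode the shard from the extra partition-key column into the token.
constexpr std::uint64_t make_token(unsigned shard, std::uint64_t key_hash) {
    return (std::uint64_t(shard) << hash_bits) | (key_hash & hash_mask);
}

// "Sharder": decode the shard from the token; no lookup and no state needed.
constexpr unsigned shard_of(std::uint64_t token) {
    return unsigned(token >> hash_bits);
}
```

Because the shard is read straight from the token, routing in this sketch needs neither a tablet lookup nor any stored state. All tokens of one shard form a contiguous range, and adding shards does not change the shard of existing tokens.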
Piotr Dulikowski	2fb981413a	Merge 'vector_search: test: fix HTTPS client test flakiness' from Karol Nowacki The default 100ms timeout for client readiness in tests is too aggressive. In some test environments, this is not enough time for client creation, which involves address resolution and TLS certificate reading, leading to flaky tests. This commit increases the default client creation timeout to 10 seconds. This makes the tests more robust, especially in slower execution environments, and prevents similar flakiness in other test cases. Fixes: VECTOR-547, SCYLLADB-802, SCYLLADB-825, SCYLLADB-826 Backport to 2025.4 and 2026.1, as the same problem occurs on these branches and can potentially make the CI flaky there as well. Closes scylladb/scylladb#28846 * github.com:scylladb/scylladb: vector_search: test: include ANN error in assertion vector_search: test: fix HTTPS client test flakiness	2026-03-04 08:55:43 +02:00
Wojciech Mitros	38f02b8d76	mv: remove dead code in view_updates::can_skip_view_updates When we create a materialized view, we consider 2 cases: 1. the view's primary key contains a column that is not in the primary key of the base table 2. the view's primary key doesn't contain such a column In the 2nd case, we add all columns from the base table to the schema of the view (as virtual columns). As a result, all of these columns are effectively "selected" in view_updates::can_skip_view_updates. The same thing happens when we add new columns to the base table using ALTER. Because of this, we can never have !column_is_selected and !has_base_non_pk_columns_in_view_pk at the same time. And thus, the check (!column_is_selected && _base_info.has_base_non_pk_columns_in_view_pk) is always the same as (!column_is_selected). Because we immediately return after this check, the tail of this function is also never reached - all checks after the (column_is_selected) are affected by this. Also, the condition (!column_is_selected && base_has_nonexpiring_marker) is always false at the point it is called. And this in turn makes the `base_has_nonexpiring_marker` unused, so we delete it as well. It's worth considering why we even had `base_has_nonexpiring_marker` if it's effectively unused. We initially introduced it in `bd52e05ae2` and we (incorrectly) used it to allow skipping view updates even if the liveness of virtual columns changed. Soon after, in `5f85a7a821`, we started categorizing virtual columns as column_is_selected == true and we moved the liveness checks for virtual columns to the `if (column_is_selected)` clause, before the `base_has_nonexpiring_marker` check. We changed this because even if we have a nonexpiring marker right now, it may be changed in the future, in which case the liveness of the view row will depend on liveness of the virtual columns and we'll need to have the view updates from the time the row marker was nonexpiring. 
Closes scylladb/scylladb#28838	2026-03-04 08:55:43 +02:00
Geoff Montee	0eb5603ebd	Docs: describe the system tables Fixes issue #12818 with the following docs changes: docs/dev/system_keyspace.md: Added missing system tables, added table of contents (TOC), added categories Closes scylladb/scylladb#27789	2026-03-04 08:55:43 +02:00
Botond Dénes	f156bcddab	Merge 'test: decrease strain in test_startup_response' from Marcin Maliszkiewicz For 2025.3 and 2025.4, this test runs an order of magnitude slower in debug mode, potentially due to passwords::check running in an alien thread and overwhelming the CPU (this is fixed in newer versions). Decreasing the number of connections in the test makes it fast again, without breaking reproducibility. As an additional measure, we double the timeout. The fix is now cherry-picked to master, as the test sometimes fails there too. (cherry picked from commit `1f1fc2c2ac`) Fixes https://scylladb.atlassian.net/browse/SCYLLADB-795 backport: 2026.1, already on other stable branches Closes scylladb/scylladb#28848 * github.com:scylladb/scylladb: test: add more logs to test_startup_no_auth_response test: decrease strain in test_startup_response	2026-03-04 08:55:43 +02:00
Andrzej Jackowski	bb359b3b78	cql3: start using write CL guardrails Enable verification of write consistency level guardrails in `modification_statement` and `batch_statement`. Neither guardrail is enabled by default, so as not to disrupt clusters that are currently using any of the CLs for writes. The warning guardrail may seem harmless, as it only adds a warning to the CQL response; however, enabling it can significantly increase network traffic (as a warning message is added to each response) and also decrease throughput due to additional allocations required to prepare the warning. Therefore, both guardrails should be enabled with care. The newly added `writes_per_consistency_level` metric, which is incremented unconditionally, can help decide whether a guardrail can be safely enabled in an existing cluster. This commit adds additional `if` instructions on the critical path. However, based on the `perf_simple_query` benchmark for writes, the difference is marginal (~40 additional instructions, which is a relative difference smaller than 0.001). 
BEFORE: ``` 291443.35 tps ( 53.3 allocs/op, 16.0 logallocs/op, 14.2 tasks/op, 48067 insns/op, 18885 cycles/op, 0 errors) throughput: mean= 289743.07 standard-deviation=6075.60 median= 291424.69 median-absolute-deviation=1702.56 maximum=292498.27 minimum=261920.06 instructions_per_op: mean= 48072.30 standard-deviation=21.15 median= 48074.49 median-absolute-deviation=12.07 maximum=48119.87 minimum=48019.89 cpu_cycles_per_op: mean= 18884.09 standard-deviation=56.43 median= 18877.33 median-absolute-deviation=14.71 maximum=19155.48 minimum=18821.57 ``` AFTER: ``` 290108.83 tps ( 53.3 allocs/op, 16.0 logallocs/op, 14.2 tasks/op, 48121 insns/op, 18988 cycles/op, 0 errors) throughput: mean= 289105.08 standard-deviation=3626.58 median= 290018.90 median-absolute-deviation=1072.25 maximum=291110.44 minimum=274669.98 instructions_per_op: mean= 48117.57 standard-deviation=18.58 median= 48114.51 median-absolute-deviation=12.08 maximum=48162.18 minimum=48087.18 cpu_cycles_per_op: mean= 18953.43 standard-deviation=28.76 median= 18945.82 median-absolute-deviation=20.84 maximum=19023.93 minimum=18916.46 ``` Fixes: SCYLLADB-259	2026-03-04 07:26:00 +01:00
Asias He	225b10b683	repair: Fix rwlock in compaction_state and lock holder lifecycle Consider this: - repair takes the lock holder - tablet merge fiber destroys the compaction group and the compaction state - repair fails - repair destroys the lock holder This is observed in the test: ``` repair - repair[5d73d094-72ee-4570-a3cc-1cd479b2a036] Repair 1 out of 1 tablets: table=sec_index.users range=(432345564227567615,504403158265495551] replicas=[0e9d51a5-9c99-4d6e-b9db-ad36a148b0ea:15, 498e354c-1254-4d8d-a565-2f5c6523845a:9, 5208598c-84f0-4526-bb7f-573728592172:28] ... repair - repair[5d73d094-72ee-4570-a3cc-1cd479b2a036]: Started to repair 1 out of 1 tables in keyspace=sec_index, table=users, table_id=ea2072d0-ccd9-11f0-8dba-c5ab01bffb77, repair_reason=repair repair - Enable incremental repair for table=sec_index.users range=(432345564227567615,504403158265495551] table - Disabled compaction for range=(432345564227567615,504403158265495551] session_id=a13a72cc-cd2d-11f0-8e9b-76d54580ab09 for incremental repair table - Got unrepaired compaction and repair lock for range=(432345564227567615,504403158265495551] session_id=a13a72cc-cd2d-11f0-8e9b-76d54580ab09 for incremental repair table - Disabled compaction for range=(432345564227567615,504403158265495551] session_id=a13a72cc-cd2d-11f0-8e9b-76d54580ab09 for incremental repair table - Got unrepaired compaction and repair lock for range=(432345564227567615,504403158265495551] session_id=a13a72cc-cd2d-11f0-8e9b-76d54580ab09 for incremental repair repair - repair[5d73d094-72ee-4570-a3cc-1cd479b2a036]: get_sync_boundary: got error from node=0e9d51a5-9c99-4d6e-b9db-ad36a148b0ea, keyspace=sec_index, table=users, range=(432345564227567615,504403158265495551], error=seastar::rpc::remote_verb_error (Compaction state for table [0x60f008fa34c0] not found) compaction_manager - Stopping 1 tasks for 1 ongoing compactions for table sec_index.users compaction_group=238 due to tablet merge compaction_manager - Stopping 1 tasks for 1 
ongoing compactions for table sec_index.users compaction_group=238 due to tablet merge .... scylla[10793] Segmentation fault on shard 28, in scheduling group streaming ``` The rwlock in compaction_state could be destroyed before the holder of that rwlock is destroyed. This causes a use-after-free when the lock holder is destroyed. To fix it, users of the repair lock are now waited for when a compaction group is being stopped. That way, the compaction group, which controls the lifetime of the rwlock, cannot be destroyed while the lock is held. Additionally, the merge completion fiber, which might remove groups, is properly serialized with incremental repair. The issue can be reproduced consistently using a sanitize build and cannot be reproduced after the fix. Fixes #27365 Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2026-03-03 21:05:15 -03:00
Raphael S. Carvalho	1d8903d9f7	repair: Prevent repair lock holder leakage after table drop Prevent repair lock holder from being leaked in repair_service when table is dropped midway. The leakage might result in use-after-free later, since the repair lock itself will be gone after table drop. The RPC verb that removes the lock on success path will not be called by coordinator after table was dropped. Refs #27365. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-896. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2026-03-03 21:05:10 -03:00
Dario Mirovic	06af4480ea	config: enable maintenance socket in workdir by default We want to enable maintenance socket by default. This will prevent users from having to reboot a server to enable it. Also, there is little point in having maintenance socket that is turned off, and we want users to use it. After this patch series, they will have to use it. Note that while config seeding exists, we do not encourage it for production deployments. This patch changes default maintenance_socket value from ignore to workdir. This enables maintenance socket without specifying an explicit path. Refs SCYLLADB-409	2026-03-04 00:01:07 +01:00
Dario Mirovic	6e83fb5029	docs: auth: do not specify password with -p option Specifying password with -p option is considered unsafe. The password will be saved in bash history. The preferred approach is to enter the password when prompted. Any approach that passes the password via command line arguments makes that password visible in process options (ps command), no matter if the password is passed directly or as an environment variable. Refs SCYLLADB-409	2026-03-04 00:01:07 +01:00
Dario Mirovic	afafb8a8fa	docs: update documentation related to default superuser Update create superuser procedure: - Remove notes about default `cassandra` superuser - Add create superuser using existing superuser section - Update create superuser by using `scylla.yaml` config - Add create superuser using maintenance socket Update password reset procedure: - Add maintenance socket approach - Remove the old approach with deleting all the roles Update enabling authentication with downtime and during runtime: - Mention creating new superuser over the maintenance socket - Remove default superuser usage Update enable authorization: - Mention creating new superuser over the maintenance socket - Remove mention of default superuser Reasoning for deletion of the old approach: - [old] Needs cluster downtime, removes all roles, needs recreation of roles, needs maintenance socket anyways, if config values are not used for superuser - [new] No cluster downtime, possibly one node restart to enable maintenance socket, faster Refs SCYLLADB-409	2026-03-04 00:01:07 +01:00
Dario Mirovic	3db74aaf5f	test: maintenance socket role management Introduce a test that covers: - Server startup without credentials config seeding, with no roles created - Waiting for maintenance socket role management to be enabled - Successful execution of `CREATE ROLE`, `ALTER ROLE`, and `DROP ROLE` statements All the tests in the test_maintenance_socket.py module take 2-3 seconds to execute. Explicitly shut down Cluster objects to prevent 'RuntimeError: cannot schedule new futures after shutdown'. Refs SCYLLADB-409	2026-03-03 23:57:50 +01:00
Dario Mirovic	f74fe22386	test: cluster: add logs to test_maintenance_socket.py Add logs to the test_maintenance_socket test in test_maintenance_socket.py. This approach offers additional visibility in case of test failure. Such logs will be added to new tests in a follow-up patch in this patch series. Refs SCYLLADB-409	2026-03-03 23:42:25 +01:00
Dario Mirovic	0e5ddec2a8	test: pylib: fix connect_driver handling when adding and starting server When connect_driver=False, the expected server up state should be capped to HOST_ID_QUERIED. This is to avoid waiting for CQL readiness, which requires a superuser to be present. This logic was only in ScyllaCluster.server_start. ManagerClient.server_add with start=True and connect_driver=False would still wait for CQL and hang if no superuser is present. The workaround was to call ManagerClient.server_add(start=False, connect_driver=False) followed by ManagerClient.server_start(connect_driver=False). This patch moves the capping from ScyllaCluster.server_start to ManagerClient.server_add and ManagerClient.server_start, where connect_driver is processed. ScyllaCluster only receives the already resolved expected_server_up_state value. Refs SCYLLADB-409	2026-03-03 23:42:25 +01:00
Dario Mirovic	fd17dcbec8	auth: do not create default 'cassandra:cassandra' superuser Changes the behavior of default superuser creation. Previously, without configuration, 'cassandra:cassandra' credentials were used. Now default superuser creation is skipped if not configured. The two ways to create the default superuser are: - Config file - auth_superuser_name and auth_superuser_salted_password fields - Maintenance socket - connect over maintenance socket and CREATE/ALTER ROLE ... Behavior changes: Old behavior: - No config - 'cassandra:cassandra' created - auth_superuser_name only - <name>:cassandra created - auth_superuser_salted_password only - 'cassandra:<password>' created - Both specified - '<name>:<password>' created New behavior: - No config - no default superuser - Requires maintenance socket setup - auth_superuser_name only - '<name>:' created WITHOUT password - Requires maintenance socket setup - auth_superuser_salted_password only - no default superuser - Both specified - '<name>:<password>' created Fixes SCYLLADB-409	2026-03-03 23:42:25 +01:00
Dario Mirovic	9dc1deccf3	auth: remove redundant DEFAULT_USER_NAME from password authenticator Remove redundant DEFAULT_USER_NAME from password_authenticator.cc file. It is just a copy of meta::DEFAULT_SUPERUSER_NAME. Refs SCYLLADB-409	2026-03-03 23:42:25 +01:00
Dario Mirovic	45628cf041	auth: enable role management operations via maintenance socket Introduce maintenance_socket_authenticator and rework maintenance_socket_role_manager to support role management operations. The maintenance auth service uses allow_all_authenticator. To allow role modification statements over maintenance socket connections, we need to treat maintenance socket connections as superusers and give them proper access rights. Possible approaches are: 1. Modify allow_all_authenticator with the conditional logic that password_authenticator already has 2. Modify password_authenticator with conditional logic specific to maintenance socket connections 3. Extend password_authenticator, overriding the methods that differ Option 3 is chosen: maintenance_socket_authenticator extends password_authenticator with authentication disabled. The maintenance_socket_role_manager is reworked to lazily create a standard_role_manager once the node joins the cluster, delegating role operations to it. In maintenance mode role operations remain disabled. Refs SCYLLADB-409	2026-03-03 23:41:05 +01:00
Dario Mirovic	6a1edab2ac	client_state: add has_superuser method Encapsulate the superuser check in client_state so that it respects _bypass_auth_checks. Connections that bypass auth (internal callers and the maintenance socket) are always considered superusers. Migrate existing call sites from auth::has_superuser(service, user) to client_state.has_superuser(). Also add _bypass_auth_checks handling to ensure_not_anonymous(). Refs SCYLLADB-409	2026-03-03 22:31:35 +01:00
Dario Mirovic	d765b5b309	client_state: add _bypass_auth_checks flag Authorization checks were previously skipped based on the _is_internal flag. This couples two concerns: marking client state as internal and bypassing authorization. Introduce _bypass_auth_checks to handle only the authorization bypass. Internal client state sets it to true, preserving current behavior. External client state accepts it as a constructor parameter, defaulting to false. This will allow maintenance socket connections to skip authorization without being marked as internal. Refs SCYLLADB-409	2026-03-03 22:31:35 +01:00
Dario Mirovic	b68656b59f	auth: let maintenance_socket_role_manager know if node is in maintenance mode This patch is part of preparations for dropping the 'cassandra:cassandra' default superuser. When that is implemented, maintenance_socket_role_manager will have two modes of work: 1. in maintenance mode, where role operations are forbidden 2. in normal mode, where role operations are allowed To execute role operations, the node has to join a cluster. In maintenance mode the node does not join a cluster. This patch lets maintenance_socket_role_manager know whether it works in maintenance mode and return an appropriate error message when execution of role operations is requested. Refs SCYLLADB-409	2026-03-03 22:31:35 +01:00
Dario Mirovic	3bef493a35	auth: remove class registrator usage This patch removes class registrator usage in auth module. It is not used after switching to factory functor initialization of auth service. Several role manager, authenticator, and authorizer name variables are returned as well, and hardcoded inside qualified_java_name method, since that is the only place they are ever used. Refs SCYLLADB-409	2026-03-03 22:31:35 +01:00
Dario Mirovic	eab24ff3b0	auth: instantiate auth service with factory functors Auth service is instantiated with the constructor that accepts service_config, which then uses class registrator to instantiate authorizer, authenticator, and role manager. This patch switches to instantiating auth service via the constructor that accepts factory functors. This is a step towards removing usage of class registrator. Refs SCYLLADB-409	2026-03-03 22:31:35 +01:00
Dario Mirovic	bfff07eacb	auth: add service constructor with factory functors Auth service can be initialized: - [current] by passing instantiated authorizer, authenticator, role manager - [current] by passing service_config, which then uses class registrator to instantiate authorizer, authenticator, role manager - This approach is easy to use with sharded services - [new] by passing factory functors which instantiate authorizer, authenticator, role manager - This approach is also easy to use with sharded services Refs SCYLLADB-409	2026-03-03 22:31:35 +01:00
Dario Mirovic	e8e00c874b	auth: add transitional.hh file In a follow-up patch in this patch series class registrator will be removed. Adding transitional.hh file will be necessary to expose the authenticator and authorizer. Refs SCYLLADB-409	2026-03-03 22:31:35 +01:00
Dario Mirovic	e5218157de	service: qos: handle special scheduling group case for maintenance socket service_level_controller has special handling for maintenance socket connections. If the current user is not a named user, it should use the default scheduling group. The reason is that the maintenance socket can communicate with Scylla before auth_integration is registered. The guard is already present, but it was omitted in get_cached_user_scheduling_group. This also fixes flakiness in test_maintenance_socket.py tests. Refs SCYLLADB-409	2026-03-03 22:31:35 +01:00
Dario Mirovic	dc9a90d7cb	service: qos: use _auth_integration as condition for using _auth_integration Maintenance socket connections can be established before _auth_integration is initialized. The fix introduced in PR scylladb/scylladb#26856 checks the value of the user variable. For maintenance socket connections it will be an anonymous user, so the default scheduling group is used as a fallback. This patch changes the criterion for using the default scheduling group from the user variable to the _auth_integration variable itself: - If _auth_integration is not initialized, use the default scheduling group - If _auth_integration is initialized, let it choose the scheduling group Refs SCYLLADB-409	2026-03-03 22:31:35 +01:00
Andrzej Jackowski	371cdb3c81	cql3/query_processor: implement metrics to track CL of writes Add `write_consistency_levels_disallowed_violations` and `write_consistency_levels_warned_violations` metrics to track violations of write_consistency_levels guardrails. Add `writes_per_consistency_level` to track what CL is used by writes, regardless of the guardrails configuration. Data gathered by this metric can be used to decide whether enabling a particular write consistency level guardrail in a particular existing cluster is safe. Refs: SCYLLADB-259	2026-03-03 21:18:11 +01:00
Andrzej Jackowski	3606934458	db: cql3/query_processor: add write_consistency_levels enum_sets Add enum_sets to query_processor that track the configuration values of `write_consistency_levels_warned` and `write_consistency_levels_disallowed`. Refs: SCYLLADB-259	2026-03-03 20:28:57 +01:00
Dawid Mędrek	7fd083e329	test: raft: Introduce get_default_cluster We introduce a function creating a Raft cluster with parameters usually used by the tests. This will avoid code duplication, especially after introducing new tests in the following commits. Note that the test `test_aborting_wait_for_state_change` has changed: the previous complex configuration was unnecessary for it (I wrote it).	2026-03-03 18:50:21 +01:00
Pavel Emelyanov	b768753c0f	test: Remove passing default "expected_replicas" to check_mutation_replicas() None is the default value, so callers don't need to specify it explicitly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-03 17:28:21 +03:00
Pavel Emelyanov	b8ae9ede63	test: Remove scope and primary-replica-only arguments from check_mutation_replicas() helper These two are only used to print into logs on error. However, their values can be found from previous logs and test execution context. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-03 17:26:25 +03:00
Karol Nowacki	45477d9c6b	vector_search: test: include ANN error in assertion When the test fails, the assertion message does not include the error from the ANN request. This change enhances the assertion to include the specific ANN error, making it easier to diagnose test failures.	2026-03-03 14:19:20 +01:00
Karol Nowacki	ab6c222fc4	vector_search: test: fix HTTPS client test flakiness The default 100ms timeout for client readiness in tests is too aggressive. In some test environments, this is not enough time for client creation, which involves address resolution and TLS certificate reading, leading to flaky tests. This commit increases the default client creation timeout to 10 seconds. This makes the tests more robust, especially in slower execution environments, and prevents similar flakiness in other test cases. Fixes: VECTOR-547, SCYLLADB-802	2026-03-03 14:19:20 +01:00
Botond Dénes	4f5310bc72	replica/table: update rf=1 table registry in shared tombstone-gc state On every update of the ERM, update the state of the current table in the registry of RF=1 tables in shared tombstone gc state. Ensures that tombstone gc stops collection of tombstones in immediate mode as soon as the table starts transitioning away from RF=1.	2026-03-03 14:09:28 +02:00
Botond Dénes	7c2c63ab43	tombstone_gc: tombstone_gc_before_getter: consider RF when getting gc before time Currently we cannot use repair-mode tombstone gc on RF=1 tables, because such tables don't need repair, so there is no repair history from which to produce gc_before times. Introduce shared_tombstone_gc_state::_rf_one_tables, which keeps a registry of RF=1 tables. Keeping it up to date is left to outside code (table.cc). Consult the registry to determine whether a table is RF=1, so the repair history check can be elided for RF=1 tables. Not wired in yet into the table code.	2026-03-03 14:09:28 +02:00
Botond Dénes	074006749c	tombstone_gc: unpack per_table_history_maps It is now a class with a single member, replace usage with that of the member (through an alias to reduce churn).	2026-03-03 14:09:28 +02:00
Botond Dénes	d6e2d44759	tombstone_gc: extract _group0_gc_time from per_table_history_map Doesn't belong there. Also, having it as a separate member of shared_tombstone_gc_state makes updating _group0_gc_time cheaper, as the update doesn't have to do a copy-mutate-swap of the history maps.	2026-03-03 14:09:28 +02:00
Botond Dénes	5fd9fc3056	tombstone_gc: drop tombstone_gc_state(nullptr) ctor and operator bool() Both are ambiguous and all users were migrated away to more meaningful alternatives. They are now unused, drop them.	2026-03-03 14:09:28 +02:00
Botond Dénes	a785c0cf41	test/lib/random_schema: use timeout-mode tombstone_gc This is the current de facto default for all tests using random schema, and some are apparently relying on it. Make this explicit so that the impending change of this default to repair mode does not upset these tests.	2026-03-03 14:09:28 +02:00
Botond Dénes	d10e622a3b	tombstone_gc_options: add C++ friendly constructor So one can create options with strong types, instead of from a map of strings.	2026-03-03 14:09:28 +02:00
Botond Dénes	6004e84f18	test: move away from tombstone_gc_state(nullptr) ctor Use for_tests() instead (or no_gc() where appropriate).	2026-03-03 14:09:28 +02:00
Botond Dénes	3c34598d88	treewide: move away from tombstone_gc_state(nullptr) ctor It is ambiguous; use the no-gc or gc-all factories instead, as appropriate. A special note for mutation::compacted(): according to the comment above it, it doesn't drop expired tombstones, but as currently implemented, it actually does. Change the tombstone gc param for the underlying call to compact_for_compaction() to uphold the comment. This is used mostly in tests, so no fallout is expected. Tests are handled in the next commit, to reduce noise. Two tests in mutation_test.cc have to be updated: * test_compactor_range_tombstone_spanning_many_pages has to be updated in this commit, as it uses mutation_partition::compact_for_query() as well as compact_for_query(). The test passes a default-constructed tombstone_gc() to the latter while the former now uses no-gc, creating a mismatch in tombstone gc behaviour and resulting in a test failure. Update the test to also pass no-gc to compact_for_query(). * test_query_digest similarly uses mutation_partition::query_mutation() and another compaction method, which has to match the no-gc now used in query_mutation().	2026-03-03 14:09:28 +02:00
Botond Dénes	04b001daa6	sstable: move away from tombstone_gc_mode::operator bool() It is ambiguous; use tombstone_gc_mode::is_gc_enabled() instead. Note that the two have slightly different meanings: operator bool() returned true when repair-history related functionality was enabled. This is fine, because the only two users are logs, where the two meanings are close enough. All other users were eliminated or migrated already, taking the change in meaning into account.	2026-03-03 14:09:28 +02:00
Botond Dénes	6364e35403	replica/table: add get_tombstone_gc_state() Shorthand for get_compaction_manager().get_shared_tombstone_gc_state().get_tombstone_gc_state().	2026-03-03 14:09:28 +02:00
Botond Dénes	f3ee6a0bd1	compaction: use tombstone_gc_state with value semantics Instead of passing around references to it, pass around values. This object is now designed to be used as a value-type, after recent refactoring.	2026-03-03 14:09:27 +02:00
Botond Dénes	83e20d920e	db/row_cache: use tombstone_gc_state with value semantics Instead of keeping a pointer to it. Replace nullptr with tombstone_gc_state::no_gc(). This object is now designed to be used as a value-type, after recent refactoring.	2026-03-03 14:09:27 +02:00
Botond Dénes	041ab593c7	tombstone_gc: introduce tombstone_gc_state::for_tests() To replace the usage of tombstone_gc_state(nullptr) in tests specifically. This more verbose factory method will hopefully convey that it is not to be used in production code. The nullptr constructor doesn't convey this, and in fact it was used in production code here and there.	2026-03-03 14:09:27 +02:00
Artsiom Mishuta	5c84a76b28	test.py: setup pytest logger This commit introduces native pytest logging into a file. Previously, test.py used pytest as a script (not a framework): it captured pytest's stdout and logged that data itself. This commit sets up a log file format that additionally displays the Python processName, threadName and taskName, because test.py test cases use them, and it is currently very hard to investigate issues related to parallelism inside the test cases themselves. In addition, the commit splits the logs of different pytest (xdist) workers into separate files. If a pytest worker has no failed tests, its log file is deleted. There is also additional logging for failures: a separate file per failed test that contains the error itself (stack trace) and all logs captured from stdout and stderr during the test run. With --save-log-on-success, a separate file is also written for each passing test. All this new functionality works with the new xdist scheduler (--test-py-init=True) Fixes SCYLLADB-713 Closes scylladb/scylladb#28705	2026-03-03 11:49:01 +01:00
Dimitrios Symonidis	80b74d7df2	tablet options: Add max_tablet_count tablet option to enforce tablet count upper bounds Introduced a new max_tablet_count tablet option that caps the maximum number of tablets a table can have. This feature is designed primarily for backup and restore workflows. During backup, when load balancing is disabled for snapshot consistency, the current tablet count is recorded in the backup manifest. During restore, max_tablet_count is set to this recorded value, ensuring the restored table's tablet count never exceeds the original snapshot's tablet distribution. This guarantee enables efficient file-based SSTable streaming during restore, as each SSTable remains fully contained within a single tablet boundary. Closes scylladb/scylladb#28450	2026-03-03 11:19:24 +03:00
Calle Wilund	69f8e722bf	table::snapshot_table_on_all_shards: Use set to keep track of tablets in manifest Fixes: SCYLLADB-828 Avoid iterating linear set of tablets when building manifest. Reduces complexity. Closes scylladb/scylladb#28851	2026-03-03 08:09:33 +02:00
Karol Nowacki	30487e8854	index: fix vector index with filtering target column The secondary index mechanism is currently used to determine the target column. This mechanism works incorrectly for vector indexes with filtering because it returns the last specified column as the target (vectors) column. However, the syntax for a vector index requires the first column to be the target: ``` CREATE CUSTOM INDEX ON t(vectors, users) USING 'vector_index'; ``` This discrepancy eventually leads to the following exception when performing an ANN search on a vector index with filtering columns: ```` ANN ordering by vector requires the column to be indexed using 'vector_index' ```` This commit fixes the issue by introducing dedicated logic for vector indexes to correctly identify the target (vectors) column. Fixes: SCYLLADB-635 Closes scylladb/scylladb#28740	2026-03-02 18:47:58 +02:00
Sergey Zolotukhin	33923578eb	Update DROP INDEX statement documentation Clarify behavior of DROP INDEX during ongoing builds. Closes scylladb/scylladb#28659	2026-03-02 17:31:23 +02:00
Botond Dénes	ab532882db	tools/scylla-sstable: introduce scylla sstable split Split input sstable(s) into multiple output sstables based on the provided token boundaries. The input sstable(s) are divided according to the specified split tokens, creating one output sstable per token range. Fixes: SCYLLADB-10 Closes scylladb/scylladb#28741	2026-03-02 15:19:17 +01:00
Marcin Maliszkiewicz	d95939d69a	test: add more logs to test_startup_no_auth_response When test fails with assert connections_observed we would like to know if it was unable to connect or execute query in attempt_good_connection	2026-03-02 14:53:46 +01:00
Marcin Maliszkiewicz	91126eb2fb	test: decrease strain in test_startup_response For 2025.3 and 2025.4 this test runs an order of magnitude slower in debug mode, potentially due to passwords::check running in an alien thread and overwhelming the CPU (this is fixed in newer versions). Decreasing the number of connections in the test makes it fast again, without breaking reproducibility. As an additional measure, we double the timeout. The fix is now cherry-picked to master, as the test sometimes fails there too. (cherry picked from commit `1f1fc2c2ac`)	2026-03-02 14:46:51 +01:00
Botond Dénes	bf3edaf220	tools/scylla-sstable: filter_operation(): use deferred_close() to close reader Manual closing is bypassed with exceptions, promoting an exception to a crash due to unclosed reader. Closes scylladb/scylladb#28797	2026-03-02 14:16:08 +01:00
Marcin Maliszkiewicz	6bf706ef1b	Merge 'scylla-sstable: query: handle nested UDTs' from Botond Dénes The query (and in certain modes the write) operations use the virtual table facility inside `cql_test_env`. The schema of the sstable is created as a table in `cql_test_env`. This involves registering all UDTs with the keyspace, so they are available for lookups. This was done with a flat loop over all column types, but this is not enough: UDTs might be nested in other types, like collections. One has to traverse the type tree and register every UDT along the way. This PR changes the flat loop to a recursive traversal of the type tree. The query operation now works with UDTs, no matter how deeply nested they are. Backport: Implements missing functionality of a tool, no backport. Closes scylladb/scylladb#28798 * github.com:scylladb/scylladb: tools/scylla-sstable: create_table_in_cql_env(): register UDTs recursively tools/scylla-sstable: generalize dump_if_user_type tools/scylla-sstable: move dump_if_user_type() definition	2026-03-02 14:14:43 +01:00
dependabot[bot]	f5fa77ac9d	build(deps): bump sphinx-multiversion-scylla in /docs Bumps [sphinx-multiversion-scylla](https://holzhaus.github.io/sphinx-multiversion/) from 0.3.4 to 0.3.7. --- updated-dependencies: - dependency-name: sphinx-multiversion-scylla dependency-version: 0.3.7 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Closes scylladb/scylladb#28833	2026-03-02 14:13:03 +01:00
Karol Nowacki	647172d4b8	vector_search: fix names of private members According to coding style in Scylla, member variables are prefixed with underscore.	2026-03-02 14:08:16 +01:00
Karol Nowacki	f2308b000f	vector_search: remove unused global variable	2026-03-02 14:08:07 +01:00
Marcin Maliszkiewicz	a83ee6cf66	Merge 'db/batchlog_manager: re-add v1 support for mixed clusters' from Botond Dénes `3f7ee3ce5d` introduced system.batchlog_v2, with a schema designed to speed up batchlog replays and make post-replay cleanups much more effective. It did not introduce a cluster feature for the new table, because it is a node-local table, so the cluster can switch to the new table gradually, one node at a time. However, https://github.com/scylladb/scylladb/issues/27886 showed that the switch causes timeouts during upgrades, in mixed clusters. Furthermore, switching to the new table unconditionally on upgraded nodes means that on rollback, the batches saved into the v2 table are lost. This PR re-introduces v1 (`system.batchlog`) support and guards the use of the v2 table with a cluster feature, so mixed clusters keep using v1 and thus remain rollback-compatible. For simplicity, the re-introduced v1 support doesn't support post-replay cleanups. The cleanup in v1 was never particularly effective anyway and we ended up disabling it for heavy batchlog users, so I don't think the lack of support for cleanup is a problem. 
Fixes: https://github.com/scylladb/scylladb/issues/27886 Needs backport to 2026.1, to fix upgrades for clusters using batches Closes scylladb/scylladb#28736 * github.com:scylladb/scylladb: test/boost/batchlog_manager_test: add tests for v1 batchlog test/boost/batchlog_manager_test: make prepare_batches() work with both v1 and v2 test/boost/batchlog_manager_test: fix indentation test/boost/batchlog_manager_test: extract prepare_batches() method test/lib/cql_assertions: is_rows(): add dump parameter tools/scylla-sstable: extract query result printers tools/scylla-sstable: add std::ostream& arg to query result printers repair/row_level: repair_flush_hints_batchlog_handler(): add all_replayed to finish log db/batchlog_manager: re-add v1 support db/batchlog_manager: return all_replayed from process_batch() db/batchlog_manager: process_bath() fix indentation db/batchlog_manager: make batch() a standalone function db/batchlog_manager: make structs stats public db/batchlog_manager: allocate limiter on the stack db/batchlog_manager: add feature_service dependency gms/feature_service: add batchlog_v2 feature	2026-03-02 12:09:10 +01:00
Patryk Jędrzejczak	ba7f314cdc	test: test_full_shutdown_during_replace: retry replace after the replacing node is removed from gossip The test is currently flaky with `reuse_ip = True`. The issue is that the test retries replace before the first replace is rolled back and the first replacing node is removed from gossip. The second replacing node can see the entry of the first replacing node in gossip. This entry has a newer generation than the entry of the node being replaced, and both replacing nodes have the same IP as the node being replaced. Therefore, the second replacing node incorrectly considers this entry as the entry of the node being replaced. This entry is missing rack and DC, so the second replace fails with ``` ERROR 2026-02-24 21:19:03,420 [shard 0:main] init - Startup failed: std::runtime_error (Cannot replace node 8762a9d2-3b30-4e66-83a1-98d16c5dd007/127.61.127.1 with a node on a different data center or rack. Current location=UNKNOWN_DC/UNKNOWN_RACK, new location=dc1/rack2) ``` Fixes SCYLLADB-805 Closes scylladb/scylladb#28829	2026-03-02 10:26:57 +02:00
Yaron Kaikov	ab02486ce8	.github/workflows/trigger-scylla-ci.yaml: fix org membership check in trigger-scylla-ci workflow Following `becb48b586`, it seems we have a regression in the CI trigger logic. The Verify Org Membership step used gh api /orgs/scylladb/members/$AUTHOR with GITHUB_TOKEN to check whether the user is an org member. However, GITHUB_TOKEN does not have the read:org scope, so the API call fails for all users, even actual scylladb org members, causing CI triggers to be silently skipped. Replace the API call with the author_association field from the GitHub event payload, which is set by GitHub itself and requires no special token permissions. This allows any scylladb org member (MEMBER or OWNER) to trigger CI via comment, regardless of whether they authored the PR. Closes scylladb/scylladb#28837	2026-03-02 10:23:14 +02:00
Michael Litvak	8c4bc33e51	test: remove test_view_building_with_tablet_move remove the test since it's not relevant anymore, it's not testing what it's supposed to test and it's unstable. the purpose of the test was to reproduce an issue in the legacy view builder where a view starts to build at token T2 and then all tokens [T1, end) with T1<T2 migrate to another node while it's still building, exposing an issue when the view builder wraparounds the token ring. this is not relevant anymore because now view building with tablets is done via the view building coordinator for tablets, and all views start to build from the first token with no wraparound. besides, the test is unstable due to relying too much on specific timing, which was useful for investigating and fixing the original issue but not anymore. Fixes SCYLLADB-842 Closes scylladb/scylladb#28842	2026-03-02 07:42:08 +01:00
Marcin Maliszkiewicz	8c2da76fde	test/cqlpy: remove xfail from test_constant_function_parameter The issue was fixed by commit `cc03f5c89d` ("cql3: support literals and bind variables in selectors"), so the xfail marker is no longer needed. Closes scylladb/scylladb#28776	2026-03-01 20:03:42 +02:00
Jenkins Promoter	fb6eebc383	Update pgo profiles - aarch64	2026-03-01 05:17:44 +02:00
Jenkins Promoter	8edd532c05	Update pgo profiles - x86_64	2026-03-01 04:31:57 +02:00
Botond Dénes	1f09fcfb26	Merge 'Use standard ks/cf/data creation methods in test_restore_with_streaming_scopes' from Pavel Emelyanov The test uses create_dataset helper duplicating the existing code that does the same. This PR patches basic tests to use standard facilities. Also the PR simplifies the 3-level nested loops used to combine several sets of restoration parameters by using itertools.product facility. Continuation of #28600. Cleaning tests, not backporting Closes scylladb/scylladb#28608 * github.com:scylladb/scylladb: test/object_store: Use itertools.product() for deeply nested loops test/object_store: Replace dataset creation usage with standard methods test/object_store: Shift indentation right for test_restore_with_streaming_scopes	2026-02-27 16:15:55 +02:00
Avi Kivity	450a09b152	test: tools: restrict embedded perf tests from taking over host The perf-simple-query tests were not restricted on CPU count, so on a 96-CPU machine, they would run on 96 CPUs, and time out in debug mode. All restrict memory usage and add --overprovisioned so that pinning is disabled. Apply that to all tests. Closes scylladb/scylladb#28821	2026-02-27 16:06:22 +02:00
Botond Dénes	d3a3921487	Merge 'Re-use and improve the take_snapshot() helper in backup tests' from Pavel Emelyanov The helper is very simple yet generic -- it takes a snapshot of a keyspace on all servers and collects the resulting sstables from workdirs. Re-using it in all test cases saves some lines of code. Also, the method is "sequential", making it "parallel" reduces the waiting time a bit. Will help generalizing existing backup/restore tests to support clustered snapshot/backup/restore API (see #28525) later. Cleaning up tests, not backporting. Closes scylladb/scylladb#28660 * github.com:scylladb/scylladb: test/backup: Run keyspace flush and snapshot taking API in parallel test/backup: Re-use take_snapshot() helper in do_abort_restore() test/backup: Move take_snapshot() helper up	2026-02-27 15:58:18 +02:00
Łukasz Paszkowski	fb40d1e411	compaction_manager: improve readability of maybe_wait_for_sstable_count_reduction() Split the chained inject_parameter().value_or() expression into two separate named variables for clarity. Use condition_variable::when() instead of wait(). when() is the coroutine-native API, designed specifically for co_await contexts — it avoids the heap allocation of a promise_waiter that wait() uses, and instead uses a stack-based awaiter. Closes scylladb/scylladb#28824	2026-02-27 15:51:30 +02:00
Patryk Jędrzejczak	9a9202c909	Merge 'Remove gossiper topology code' from Gleb Natapov The PR removes most of the code that assumes that group0 and raft topology is not enabled. It also makes sure that joining a cluster in no-raft mode, or upgrading to this version a node in a cluster that does not yet use raft topology, will fail. Refs #15422 No backport needed since this removes functionality. Closes scylladb/scylladb#28514 * https://github.com/scylladb/scylladb: group0: fix indentation after previous patch raft_group0: simplify get_group0_upgrade_state function since no upgrade can happen any more raft_group0: move service::group0_upgrade_state to use fmt::formatter instead of iostream raft_group0: remove unused code from raft_group0 node_ops: remove topology over node ops code topology: fix indentation after the previous patch topology: drop topology_change_enabled parameter from raft_group0 code storage_service: remove unused handle_state_* functions gossiper: drop wait_for_gossip_to_settle and deprecate correspondent option storage_service: fix indentation after the last patch storage_service: remove gossiper bootstrapping code storage_service: drop get_group_server_if_raft_topolgy_enabled storage_service: drop is_topology_coordinator_enabled and its uses storage_service: drop run_with_api_lock_in_gossiper_mode_only topology: remove code that assumes raft_topology_change_enabled() may return false test: schema_change_test: make test_schema_digest_does_not_change_with_disabled_features tests run in raft mode test: schema_change_test: drop schema tests relevant for no raft mode only topology: remove upgrade to raft topology code group0: remove upgrade to group0 code group0: refuse to boot if a cluster is still is not in a raft topology mode storage_service: refuse to join a cluster in legacy mode	2026-02-27 14:43:41 +01:00
Anna Stuchlik	dfd46ad3fb	doc: add the upgrade guide from 2025.x to 2026.1 This commit adds the upgrade guide for version 2026.1. According to the new upgrade policy, the user can now upgrade to the major version (2026.1) from any previous minor version. So instead of adding a separate guide from 2025.4 to 2026.1, we need a guide from 2025.x to 2026.1. In addition, this commit: - Updates the upgrade policy to reflect the above change. - Removes the upgrade guides for the previous version. Fixes https://github.com/scylladb/scylladb/issues/28533 Fixes https://github.com/scylladb/scylladb/issues/28532 Closes scylladb/scylladb#28789	2026-02-27 15:36:34 +02:00
Botond Dénes	fcc570c697	Merge 'Exorcise assertions from Alternator, using a new throwing_assert() macro' from Nadav Har'El assert() and SCYLLA_ASSERT() are evil (Refs #7871) because they can cause the entire Scylla cluster to crash mysteriously instead of cleanly failing the specific request that encountered a serious problem or a failed prerequisite. In this two-patch series, in the first patch we introduce a new macro throwing_assert(), a convenient drop-in replacement for SCYLLA_ASSERT() that has all the benefits of on_internal_error() instead of the dangers of SCYLLA_ASSERT(). In the second patch we use the new function to replace every call to SCYLLA_ASSERT() in Alternator with the new throwing_assert(). Here is an example from the second patch to demonstrate the power of this approach: The Alternator code uses the attrs_column() function to retrieve the ":attrs" column of a schema. Since every Alternator table always has an ":attrs" column in its schema, we felt safe using SCYLLA_ASSERT() to check that this column exists. However, imagine that one day because of a bug, one Alternator table is missing this column. Or maybe not a bug - maybe a malicious user on a shared cluster found a way to deliberately delete this column (e.g., with a CQL command!) and this check fails. Before this patch, the entire Scylla node would crash. If the same request is sent to all nodes - the entire cluster would crash. The user might not even know which request caused this crash. In contrast, after this patch, the specific operation - e.g., PutItem - will get an exception. Only this operation, and nothing else, will be aborted, and the user who sent this request will even get an "Internal Server Error" with the assertion-failure message, alerting them that this specific query is causing problems, while other queries might work normally. 
There's no need to backport this patch - unless it becomes annoying that other branches don't have the throwing_assert() function and we want it to ease other backports. Fixes #28308. Closes scylladb/scylladb#28445 * github.com:scylladb/scylladb: alternator: replace SCYLLA_ASSERT with throwing_assert utils: introduce throwing_assert(), a safe replacement for assert	2026-02-27 15:35:36 +02:00
Roy Dahan	822c1597c9	install.sh: fix REST API paths for nonroot installations In nonroot installations, the install.sh script was hardcoding the api_ui_dir and api_doc_dir paths to /opt/scylladb/ in scylla.yaml, even though the actual files were installed to a different location (typically ~/scylladb). This caused REST API endpoints like /api-doc/failure_detector/ to fail with "transfer closed with outstanding read data remaining" error because Scylla couldn't find the API documentation files at the configured paths. Fix this by using the $prefix variable instead of hardcoded /opt/scylladb/ paths. This ensures that: - In regular installations: $prefix = /opt/scylladb (no change) - In nonroot installations: $prefix = ~/scylladb (paths now correct) Fixes: SCYLLADB-721 Backport: The hardcoded paths in install.sh have been present since the nonroot installation feature was introduced, making REST API endpoints non-functional in all nonroot installations across all live versions of Scylla. Closes scylladb/scylladb#28805	2026-02-27 15:32:54 +02:00
Botond Dénes	9521a51e4c	Merge 'generic_server: scale connection concurrency semaphore by listener count' from Marcin Maliszkiewicz The concurrency semaphore gates uninitialized connections across all do_accepts loops, but was initialized to a fixed value regardless of how many listeners exist. With multiple listeners competing for the same units, each effectively gets less than the configured concurrency. Initialize the semaphore to concurrency - 1 and signal 1 per listen() call, so total capacity is concurrency - 1 + nr_listeners. This guarantees each listener's accept loop can have at least one unit available. It mainly fixes a problem where setting the uninitialized_connections_semaphore_cpu_concurrency config value to 1 would leave connections unprocessed, as only 1 out of 2 listeners got the semaphore. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-762 Backport: no, it's a minor problem Closes scylladb/scylladb#28747 * github.com:scylladb/scylladb: test: add test_uninitialized_conns_semaphore generic_server: fix waiters count in shed log generic_server: scale connection concurrency semaphore by listener count	2026-02-27 15:06:50 +02:00
Taras Veretilnyk	5bbc44ed12	sstables: replace rewrite_statistics with new rewrite component mechanism This commit migrates all callers that used rewrite_statistics to the new rewrite component mechanism.	2026-02-26 22:38:55 +01:00
Taras Veretilnyk	51c345aaf6	sstables: add new rewrite component mechanism for safe sstable component rewriting Previously, rewriting an sstable component (e.g., via rewrite_statistics) created a temporary file that was renamed to the final name after sealing. This allows crash recovery by simply removing the temporary file on startup. However, this approach won't work once component digests are stored in scylla_metadata, as replacing a component like Statistics will require atomically updating both the component and scylla_metadata with the new digest, which is impossible with POSIX rename. The new mechanism creates a clone sstable with a fresh generation: - Hard-links all components from the source except the component being rewritten and scylla metadata if update_sstable_id is true - Copies original sstable components pointer and recognized components from the source - Invokes a modifier callback to adjust the new sstable before rewriting - Writes the modified component. If update_sstable_id is true, reads scylla metadata, generates new sstable_id and rewrites it. - Seals the new sstable with a temporary TOC - Replaces the old sstable atomically, the same way as it is done in compaction This is built on the rewrite_sstables compaction framework to support batch operations (e.g., following incremental repair). In case of any failure during the whole process, the new sstable will be automatically deleted on node startup, because its temporary TOC persists. This prepares the infrastructure for component digests. Once digests are introduced in scylla_metadata this mechanism will be extended to also rewrite scylla metadata with the updated digest alongside the modified component, ensuring atomic updates of both.	2026-02-26 22:38:55 +01:00
Taras Veretilnyk	4aa0a3acf9	compaction: add compaction_group_view method to specify sstable version Add make_sstable() overload that accepts sstable_version_types parameter to compaction_group_view interface and all implementations. This will be useful in rewrite component mechanism, as we need to preserve sstable version when creating the new one for the replacement.	2026-02-26 22:38:55 +01:00
Taras Veretilnyk	16ea7a8c1c	sstables: add null_data_sink and serialized_checksum for checksum-only calculation Introduce a null_data_sink and make_digest_calculator implementation that discards all writes, enabling checksum calculation without file I/O. This allows the existing checksummed_file_writer to be used for digest computation without writing data to disk. This will be used in a future commit to calculate the scylla metadata component checksum before writing it to disk, allowing the component to store its own checksum.	2026-02-26 22:38:51 +01:00
Łukasz Paszkowski	bb57b0f3b7	compaction_manager: fix maybe_wait_for_sstable_count_reduction() hanging forever The futurization refactoring in `9d3755f276` ("replica: Futurize retrieval of sstable sets in compaction_group_view") changed maybe_wait_for_sstable_count_reduction() from a single predicated wait: ``` co_await cstate.compaction_done.wait([..] { return num_runs_for_compaction() <= threshold || !can_perform_regular_compaction(t); }); ``` to a while loop with a predicated wait: ``` while (can_perform_regular_compaction(t) && co_await num_runs_for_compaction() > threshold) { co_await cstate.compaction_done.wait([this, &t] { return !can_perform_regular_compaction(t); }); } ``` This was necessary because num_runs_for_compaction() became a coroutine (returns future<size_t>) and can no longer be called inside a condition_variable predicate (which must be synchronous). However, the inner wait's predicate, !can_perform_regular_compaction(t), only returns true when compaction is disabled or the table is being removed. During normal operation, every signal() from compaction_done wakes the waiter, the predicate returns false, and the waiter immediately goes back to sleep without ever re-checking the outer while loop's num_runs_for_compaction() condition. This causes memtable flushes to hang forever in maybe_wait_for_sstable_count_reduction() whenever the sstable run count exceeds the threshold, because completed compactions signal compaction_done but the signal is swallowed by the predicate. Fix by replacing the predicated wait with a bare wait(), so that any signal (including from completed compactions) causes the outer while loop to re-evaluate num_runs_for_compaction(). Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-610 Closes scylladb/scylladb#28801	2026-02-26 20:13:50 +02:00
Marcin Maliszkiewicz	a03ebe1a29	Merge 'cql: implement a new per-row TTL feature' from Nadav Har'El This series implements a new per-row TTL feature for CQL. The per-row TTL feature was requested in issue #13000. It is a feature that does not exist in Cassandra, and was inspired by DynamoDB's TTL feature - and under the hood uses the same implementation that we used in Alternator to implement this DynamoDB feature. The new per-row TTL feature is completely separate from CQL's existing per-write (and per-cell) TTL, and both will be available to users. In the per-row TTL feature, one column in the table is designated as the "TTL" column, and its value for a row is the expiration time for that row. The TTL column can be designated at table creation time, e.g.: ```cql CREATE TABLE tab ( id int PRIMARY KEY, t text, expiration timestamp TTL ); ``` Or after the table already exists with: ```cql ALTER TABLE tab TTL expiration ``` Expiration can also be disabled, with: ```cql ALTER TABLE tab TTL NULL ``` The new per-row TTL feature has two features that users have been asking for: 1. A user can change the value of just the TTL column - without rewriting the entire row - to change the expiration time of the entire row. 2. When an expired row is finally deleted, a CDC event about this deletion appears in the CDC log (if CDC is enabled), including - if a preimage is enabled - the content of the deleted row. To achieve the second goal (CDC events), a row is not guaranteed to disappear at exactly its expiration time (as CQL's original TTL feature guarantees). Rather, the row is deleted some time later, depending on `alternator_ttl_period_in_seconds`; until the actual deletion, the row is still readable (and even writable). But we are guaranteed that when the row is finally deleted, the CDC event will come too. The implementation uses the same background thread used by Alternator to periodically scan for expired items and delete them. 
The expiration thread keeps the same metrics as it did for Alternator: * `scylla_expiration_scan_passes` * `scylla_expiration_scan_table` * `scylla_expiration_items_deleted` * `scylla_expiration_secondary_ranges_scanned` The series begins with a few small preparation patches, followed by the main part of the feature (which isn't big, since we are just enabling the pre-existing Alternator expiration machinery for CQL) and finally 30 tests (single-node and multi-node tests) and documentation. This series is a new feature, so traditionally would not be backported. However, I wouldn't be surprised if we will be requested to backport it so that customers will not need to wait for a new major release. Fixes #13000 Closes scylladb/scylladb#28320 * github.com:scylladb/scylladb: test/cqlpy: verify that a column can't be both STATIC and PRIMARY KEY docs/cql: document the new CQL per-row TTL feature test/cluster: tests for the new CQL per-row TTL feature test/cqlpy: tests for the new CQL per-row TTL feature test: set low alternator_ttl_period_in_seconds in CQL tests cql ttl: fix ALTER TABLE to disable TTL if column is dropped cql ttl: add setting/unsetting of TTL column to ALTER TABLE cql ttl: add TTL column support to CREATE TABLE and DESC TABLE ttl: add CQL support to Alternator's TTL expiration service alternator ttl: move TTL_TAG_KEY to a header file alternator ttl: remove unnecessary check of feature flag cql: add "cql_row_ttl" cluster feature alternator: fix error message if UpdateTimeToLive is not supported	2026-02-26 15:29:12 +01:00
Ferenc Szili	f22d75a57e	load_stats: add filtering for tablet sizes This patch adds filtering for tablet sizes collected in load_stats. This is needed to improve the chances that the balancer will have all the tablet sizes for the node, and that way avoid having the node ignored during balancing.	2026-02-26 15:17:39 +01:00
Marcin Maliszkiewicz	4d0f1bf5c9	conf: improve rf_rack_valid_keyspaces documentation in scylla.yaml Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-761 Closes scylladb/scylladb#28738	2026-02-26 14:34:28 +01:00
Yaniv Michael Kaul	ead9961783	cql: vector: fix vector dimension type Switch vector dimension handling to fixed-width `uint32_t` type, update parsing/validation, and add boundary tests. The dimension is first parsed as `unsigned long`, which is guaranteed to be at least 32 bits wide, and is then safely downcast to `uint32_t`. Move `MAX_VECTOR_DIMENSION` from `cql3_type::raw_vector` to `cql3_type` to ensure public visibility for checks outside the class. Add tests to verify the type boundaries. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-223 Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Co-authored-by: Dawid Pawlik <dawid.pawlik@scylladb.com> Closes scylladb/scylladb#28762	2026-02-26 14:46:53 +02:00
Ferenc Szili	d0a5a1d5d0	load_stats: move tablet filtering for table size computation This patch moves the table size tablet filtering code from a lambda in storage_service::load_stats_for_tablet_based_tables() to the code section where it will be used: tablet_storage_group_manager::table_load_stats() This is needed to better accommodate the next commit which will add code for filtering tablets for tablet sizes.	2026-02-26 11:07:53 +01:00
Ferenc Szili	9cd2a04e79	load_stats: bring the comment and code in sync Table sizes are collected in load stats, and are filtered according to the migration stage, so as to avoid double accounting of tablet sizes. The comment which explains the cut-off migration stage (cleanup) and the cut-off stage in the code (streaming) are not the same. This patch fixes that.	2026-02-26 11:03:33 +01:00
Michael Litvak	4a60ee28a2	test/cqlpy/test_materialized_view.py: increase view build timeout The test test_build_view_with_large_row creates a materialized view and expects the view to be built with a timeout of 5 seconds. It was observed to fail because the timeout is too short on slow machines. Increase the timeout to 60 seconds to make the test less flaky on slow machines. Similarly for the other tests in the file that have a timeout for view build, increase the timeout to 60 seconds to be consistent and safer. Fixes SCYLLADB-769 Closes scylladb/scylladb#28817	2026-02-26 11:27:51 +02:00
Michał Hudobski	579ed6f19f	secondary_index_manager: fix double registration bug We have observed a bug that caused Scylla to crash due to metrics double registration. This bug is really difficult to reproduce and was seen only once in the wild. We think that it may be caused by an in-flight request keeping a reference to the stats object, preventing it from being deregistered when the index is dropped, which causes a double registration when we recreate the index; however, we are not 100% sure. This patch makes it so the metrics always get deregistered when we drop the index, which should fix the double registration bug. Fixes: #27252 Closes scylladb/scylladb#28655	2026-02-26 09:39:53 +01:00
Marcin Maliszkiewicz	30f18a91fd	Merge 'dtest: wait_for speedup' from Dario Mirovic Audit tests have been slow. They rely on wait_for function. This function first sleeps for the duration of the time step specified, and then calls the given function. The audit tests need 0.02-0.03 seconds for the given function, but the operation lasts around 1.02-1.03 seconds, since step is 1 second. This patch modifies wait_for dtest function so it first executes the given function, and afterwards calls time.sleep(step). This reduces time needed for the given function from 1.03 to 0.03 seconds. Total audit tests suite speedup is 3x. On the developer machine the time is reduced from 13+ minutes to 4 minutes. This patch also improves performance of some alternator tests that use the same wait_for dtest function. `wait_for` in dtest framework has default time step reduced to make the environment more responsive and test execution faster. Refs SCYLLADB-573 This is a performance improvement of testing framework. No need to backport. Closes scylladb/scylladb#28590 * github.com:scylladb/scylladb: dtest: shorten default sleep step in wait_for dtest: wait_for speedup	2026-02-26 09:33:38 +01:00
Amnon Heiman	5db971c2f9	estimated_histogram_test.cc: add to_metrics_histogram test Add a test that exercises to_metrics_histogram when Min is smaller than Precision. The test verifies duplicate integer bounds are collapsed, counts remain cumulative, and native histogram metadata is still present with the expected schema and min id. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2026-02-26 09:00:52 +02:00
Amnon Heiman	0b4f28ae21	histogram_metrics_helper.hh: Support Min < Precision to_metrics_histogram now collapses duplicate integer bucket bounds caused by Min less than Precision scaling while always keeping native histogram metadata. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2026-02-26 09:00:38 +02:00
Amnon Heiman	af6371c11f	estimated_histogram_test.cc: Add tests for approx_exponential_histogram with Min<Precision Add four test cases to verify the hybrid linear/exponential bucketing: - test_histogram_min_1_bucket_limits: Validates bucket lower limits - test_histogram_min_1_basic: Tests value insertion and bucket distribution - test_histogram_min_1_statistics: Tests min(), max(), quantile(), and mean() - test_histogram_min_2_precision_4: Tests Min == 2 and Precision 4. These tests cover the new Min<Precision mode with Precision=4, verifying both the linear range and exponential range. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2026-02-26 08:53:06 +02:00
Amnon Heiman	6c21e5f80c	estimated_histogram.hh: support Min less than Precision histograms approx_exponential_histogram is a pseudo-exponential histogram implementation that can insert and retrieve values into and from buckets in O(1) time. The implementation uses power-of-two ranges and splits them linearly into buckets. The number of buckets per power-of-two range is called Precision. The original implementation, aimed at covering large value ranges, had a limitation: the histogram Min value had to be greater than or equal to Precision. As a result, code that needs histograms for small integer values could not use this implementation efficiently. This change addresses that gap by handling the case where Min is less than Precision. For Min smaller than Precision, the value is scaled by a power-of-two factor during indexing, so the existing exponential math can be reused without runtime branching. Bucket limits are scaled back to the original units, which can lead to repeated bucket limits in the Min-to-Precision range for integer values. Example with Min=2 and Precision=4: buckets 2, 2, 3, 3, 4, 5, 6, 7, 8, 10, 12, 14, and so on. Implementation details: - Introduce SHIFT, based on log2(Precision) minus log2(Min), when positive - Scale Min and Max by SHIFT for all exponential calculations - Compute NUM_BUCKETS using the standard log2(Max/Min) formula - Use the scaled value in find_bucket_index to avoid fractional bucket steps - Return bucket limits by scaling back to original units - Relax the constraint from Min >= Precision to allow any Min < Max (still a power of two) This change maintains backward compatibility with existing histograms while enabling efficient tracking of small integer values. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2026-02-26 00:46:14 +02:00
Amnon Heiman	aca5284b13	test(alternator): add per-table latency coverage for item and batch ops Add missing tests for per-table Alternator latency metrics to ensure recent per-table latency accounting is actually validated. Changes in this patch: Refactor latency assertion helper into check_sets_latency_by_metric(), parameterized by metric name. Keep existing behavior by implementing check_sets_latency() as a wrapper over scylla_alternator_op_latency. Add test_item_latency_per_table() to verify scylla_alternator_table_op_latency_count increases for: PutItem, GetItem, DeleteItem, UpdateItem, BatchWriteItem, and BatchGetItem. This closes a test gap where only global latency metrics were checked, while per-table latency metrics were not covered. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2026-02-25 20:51:18 +02:00
Amnon Heiman	29e0b4e08c	alternator: track per-table latency for batch get/write operations Batch operations were updating only global latency histograms, which left table-level latency metrics incomplete. This change computes request duration once at the end of each operation and reuses it to update both global and per-table latency stats: latencies are recorded for each table the operation used. This aligns batch read/write metric behavior with other operations and improves per-table observability. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2026-02-25 20:51:18 +02:00
Yaron Kaikov	b211590bc0	.github/workflows: enable automatic backport PR creation with Jira sub-issue integration This workflow calls the reusable backport-with-jira workflow from scylladb/github-automation to enable automatic backport PR creation with Jira sub-issue integration. The workflow triggers on: - Push to master/next-/branch- branches (for promotion events) - PR labeled with backport/X.X pattern (for manual backport requests) - PR closed/merged on version branches (for chain backport processing) Features enabled by calling the shared workflow: - Creates Jira sub-issues under the main issue for each backport version - Sorts versions descending (highest first: 2025.4 -> 2025.3 -> 2025.2) - Cherry-picks from previous version branch to avoid repeated conflicts - On Jira API failure: adds comment to main issue, applies 'jira-sub-issue-creation-failed' label, continues with PR Closes scylladb/scylladb#28804	2026-02-25 16:39:17 +02:00
Nadav Har'El	1d265e7d6d	test/cqlpy: verify that a column can't be both STATIC and PRIMARY KEY While adding the new syntax "TTL" to CREATE TABLE, I noticed that the parser actually allows a column to be defined as "STATIC PRIMARY KEY". So I add here a small test to verify that this is not really allowed: The syntax "c int STATIC PRIMARY KEY" is accepted, but then rejected by a later check. The syntax "c int PRIMARY KEY STATIC" is rejected as a syntax error. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-02-25 14:59:45 +02:00
Nadav Har'El	34c0c64d9d	docs/cql: document the new CQL per-row TTL feature Add user-facing documentation for the new CQL per-row TTL feature, in docs/cql/cql-extensions.md. Also mention (and link) the new alternative TTL feature in a few relevant documents about the old (per-write) TTL, about CDC, and about the CREATE TABLE and ALTER TABLE commands. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-02-25 14:59:44 +02:00
Nadav Har'El	23ad0be034	test/cluster: tests for the new CQL per-row TTL feature The previous patch added single-node functional tests (in test/cqlpy) for everything which was possible to test on a single node. In this patch we add four tests that we couldn't test on a single node, using the test/cluster test framework: 1. Test that the TTL expiration work - both the scanning threads and the actual deletion work on all nodes - happens on the "streaming" scheduling group. 2. Test that even if one of the cluster's nodes is down, still all the items get expired - another node "takes over" the dead node's work. 3. Test that rolling upgrade works as designed for the CQL per-row TTL feature: Before every single node in the cluster is upgraded to support this feature, a TTL column cannot be enabled on a table. And as soon as the last node of the cluster is upgraded, the TTL feature begins to work completely (you don't need to reboot all the nodes again). 4. Test that expiration works correctly on a multi-DC setup. The test doesn't check the efficiency of this process - i.e., that today each DC scans part of the data, reading with LOCAL_QUORUM, and writing the deletions across the entire cluster. Rather, the test only verifies the correctness - that expired rows do get deleted - for the usual case where the data across the DCs is consistent. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-02-25 14:59:44 +02:00
Nadav Har'El	7a1351c6cf	test/cqlpy: tests for the new CQL per-row TTL feature This patch contains 27 functional tests (in the test/cqlpy framework) for the new CQL per-row TTL feature. The tests cover the TTL column configuration statements (CREATE TABLE, ALTER TABLE) as well as the actual item expiration or non-expiration depending on the value of the expiration-time column - and also CDC events generated on expiration and the metrics generated by the expiration process. These tests were written together with the code, as in "test-driven development", so they aim to cover every corner case considered during the development, and they reproduce every bug and misstep seen during the development process. As a result, they hopefully achieve very high code coverage - but since we don't have a working code-coverage tool, I can't report any specific code coverage numbers. These tests check everything which we can check on a single-node cluster. The next patch will add additional multi-node tests for things we can't check here with a single node - such as the scheduling group used by the distributed work, the effect of dead nodes on the TTL functionality, and the process of rolling upgrade. The tests in this patch do NOT try to stress the background expiration scanning threads, or to check how they handle topology changes, large amounts of data or clusters spanning multiple DCs. These tests also don't test the performance impact of these scanning threads. Because the expiration scanning thread is identical to the one already used by Alternator TTL, we assume that many of these aspects were already tested for Alternator TTL and did not change when the same implementation is used for the new CQL feature. All new tests pass on ScyllaDB. Because the per-row TTL feature is a new ScyllaDB feature that does not exist on Cassandra, all these tests are skipped on Cassandra. Because some of these tests involve waiting for expiration, they can't be very quick. 
Still, because we set alternator_ttl_period_in_seconds to 0.5 seconds in the test framework, all 27 tests running sequentially finish in roughly 6 seconds total, which we consider acceptable. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-02-25 14:59:44 +02:00
Nadav Har'El	154cecda71	test: set low alternator_ttl_period_in_seconds in CQL tests In test/alternator/run we set alternator_ttl_period_in_seconds to a very low number (0.5 seconds) to allow TTL tests to expire items very quickly and finish quickly. Until now, we didn't need to do this for CQL tests, because they weren't using this Alternator-only feature. Now that CQL uses the same expiration feature with its original configuration parameter, we need to set it in CQL tests too. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-02-25 14:59:43 +02:00
Nadav Har'El	eebd7b0fbc	cql ttl: fix ALTER TABLE to disable TTL if column is dropped If "ALTER TABLE tab DROP x" is done to delete column x, and column x was the designated TTL column, then the per-row TTL feature should be disabled on this table. If we don't do this, the expiration scanner will continue to scan the table trying to read the dropped column - which will be wasteful or worse. A test for this case is also included in test/cqlpy/test_ttl_row.py in a later patch. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-02-25 14:59:43 +02:00
Nadav Har'El	acbdf637b6	cql ttl: add setting/unsetting of TTL column to ALTER TABLE The previous patch added the ability in CREATE TABLE to designate one of the regular columns as a "TTL column", to be used by the per-row TTL feature (Refs #13000). In this patch we add to ALTER TABLE the ability to enable per-row TTL on an existing table with a given column as the TTL column: ALTER TABLE tab TTL colname and also the ability to disable per-row TTL with ALTER TABLE tab TTL NULL As in CREATE TABLE, the designated TTL column must be a regular column (it can't be a primary key column or a static column), and must have one of the types timestamp, bigint or int. You can't enable per-row TTL if already enabled, or disable it if already disabled. To change the TTL column on an existing table, you must first disable TTL, and then re-enable it with the new column. A large collection of functional tests (in test/cqlpy), for every detail of this patch, will come in a later patch in this series. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-02-25 14:59:43 +02:00
Nadav Har'El	22c79b6af8	cql ttl: add TTL column support to CREATE TABLE and DESC TABLE This patch enables the per-row TTL feature in CQL (Refs #13000). This patch allows the user to create a new table with one of its columns designated as the TTL column with a syntax like: CREATE TABLE tab ( id int PRIMARY KEY, t text, expiration timestamp TTL ); The column marked "TTL" must have the "timestamp", "bigint" or "int" types (the choice of these types was explained in the previous patch), and there can only be one such column. We decided not to allow a column to be both a primary key column and a TTL column - although it would have worked (it's supported in Alternator), I considered this non-useful and confusing, and decided not to allow it in CQL. A TTL column also can't be a static column. We save the information of which column is the TTL column in a tag which is read by the "expiration service" - originally a part of Alternator's TTL implementation. After the previous patch, the expiration service is running and knows how to understand CQL tables, so the CQL per-row TTL feature will start to work. This patch also implements DESC TABLE, printing the word "TTL" in the right place of the output. This patch doesn't yet implement ALTER TABLE that should allow enabling or disabling the TTL column setting on an existing table - we'll do that in the next patch. A large collection of functional tests (in test/cqlpy), for every detail of this feature, will be added in a later patch. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-02-25 14:59:42 +02:00
Nadav Har'El	e636bc39ad	ttl: add CQL support to Alternator's TTL expiration service The Alternator TTL feature uses an "expiration service", a background thread on each shard which periodically scans for expired items and deletes them. When writing the expiration service, we already anticipated that the day would come when we'd want to use it for CQL too. Well, now that we want to use it for CQL, we only need to make two changes: 1. Before this patch, the expiration service was only started if Alternator was enabled. Now we need to start it unconditionally, as both Alternator and CQL will need to use it. The performance impact of the new background threads, when not needed, should be minimal: These threads will wake up every alternator_ttl_period_in_seconds (by default - once a day) and just check if any table has per-row TTL enabled, and if not, do nothing. 2. Before this patch, the expiration-time column had to be of type "decimal" - a variable-precision floating-point type. This made sense in Alternator - where all numbers are of this type, but CQL offers better and more efficient types for this purpose. In this patch we add support for two additional types for the expiration time column: the "timestamp" type (which has millisecond precision, which our implementation truncates to whole seconds) and the "bigint" type storing a number of seconds since the UNIX epoch. We also support the smaller "int" type for compatibility with existing data, but it is not recommended because a signed 32-bit integer counting seconds from 1970 will overflow in 2038. After this patch, the expiration service supports CQL tables, but there is nothing yet that can enable it on CQL tables - i.e., nothing that sets the appropriate tag on the table to tell the expiration service which column is the expiration-time column. We'll add new syntax to do this in the next patch. At the moment, we leave the expiration service implementation in its existing location - alternator/ttl.cc. 
This is despite the fact that we now start it and use it also for CQL. For better modularity, we should probably later move the expiration service implementation to a separate module (directory). Similarly, the expiration service's period is still configured via alternator_ttl_period_in_seconds, which is now a misnomer because it also affects CQL. Later we can rename this configuration parameter, or alternatively, consider different scan periods for different tables and table types, and have separate configuration for Alternator TTL and CQL per-row TTL. The metrics kept by the expiration service are the same ones that already exist for Alternator TTL, and fortunately do not include "alternator" in their names: * scylla_expiration_scan_passes * scylla_expiration_scan_table * scylla_expiration_items_deleted * scylla_expiration_secondary_ranges_scanned Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-02-25 14:59:42 +02:00
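A minimal C++ sketch of how an expiration scanner could normalize the three CQL expiration-column types described above to whole seconds since the epoch. The cql_* type names and the expiration_value variant are hypothetical stand-ins, and treating an item as expired when its time is at or before "now" is an assumption:

```cpp
#include <cassert>
#include <cstdint>
#include <variant>

// Hypothetical stand-ins for the CQL types of the expiration-time column.
struct cql_timestamp { int64_t millis_since_epoch; };  // "timestamp": milliseconds
using cql_bigint = int64_t;  // "bigint": seconds since the UNIX epoch
using cql_int = int32_t;     // "int": seconds; a signed 32-bit count overflows in 2038

using expiration_value = std::variant<cql_timestamp, cql_bigint, cql_int>;

// Normalize any supported expiration value to whole seconds since the epoch.
int64_t expiration_seconds(const expiration_value& v) {
    if (const auto* ts = std::get_if<cql_timestamp>(&v)) {
        return ts->millis_since_epoch / 1000;  // drop the sub-second part
    }
    if (const auto* secs = std::get_if<cql_bigint>(&v)) {
        return *secs;
    }
    return std::get<cql_int>(v);  // widened from 32 to 64 bits
}

// Assumption: an item is expired once its expiration time is not in the future.
bool is_expired(const expiration_value& v, int64_t now_seconds) {
    return expiration_seconds(v) <= now_seconds;
}
```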
Nadav Har'El	2823780557	alternator ttl: move TTL_TAG_KEY to a header file TTL_TAG_KEY stores the name of the tag in which we store the name of the table's expiration-time column, for Alternator's TTL feature. We already need this name in two source files, and soon we'll need it in more files - as we want to use the same implementation also for a new per-row TTL feature in CQL. So it's time to move the declaration of this variable to a new header file - alternator/ttl_tag.hh. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-02-25 14:59:42 +02:00
Nadav Har'El	5e16c59312	alternator ttl: remove unnecessary check of feature flag Every node that supports the Alternator TTL feature should start its background expiration-checking thread, without checking if other nodes support this feature. This patch removes the unnecessary check. Indeed, until all other nodes enable this feature, the background thread will have nothing to do. But when all nodes finally have this feature, we need this thread to already be running, without requiring another reboot of all nodes to start it. In practice, this change makes no difference on modern installations because this feature is already three years old and always enabled on modern clusters. But I don't want to repeat the same mistake for the new CQL per-row TTL feature, so it's better to fix it in Alternator too. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-02-25 14:59:41 +02:00
Nadav Har'El	4f4e93b695	cql: add "cql_row_ttl" cluster feature This patch adds a new cluster feature "CQL_ROW_TTL", for the new CQL per-row TTL feature. With this patch, this node reports supporting this feature, but the CQL per-row TTL feature can only be used once all the nodes in the cluster support the feature. In other words, user requests to enable per-row TTL on a table should check this feature flag (on the whole cluster) before proceeding. This is needed because the implementation of the per-row-TTL expiration requires all nodes to cooperate in scanning for expired items, so the feature can't be trusted until all nodes participate in it. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-02-25 14:59:41 +02:00
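A simplified C++ model of the gating rule described above: a cluster feature counts as enabled only once every node advertises it. This is a sketch with a hypothetical function name, not the interface of Scylla's feature service:

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical: each element holds the features one node advertises.
using node_features = std::set<std::string>;

// A cluster feature is usable only when every node supports it.
bool cluster_feature_enabled(const std::vector<node_features>& nodes,
                             const std::string& feature) {
    return !nodes.empty() &&
           std::all_of(nodes.begin(), nodes.end(),
                       [&](const node_features& f) { return f.count(feature) > 0; });
}
```

During a rolling upgrade, a single node that has not yet been upgraded keeps the feature disabled for the whole cluster.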
Nadav Har'El	0d6b7a6211	alternator: fix error message if UpdateTimeToLive is not supported Since commit `2dedb5ea75`, the Alternator TTL feature is no longer experimental. It is still a "cluster feature" meaning it cannot be used on a partially-upgraded cluster until the entire cluster supports this feature. The error message we printed when the cluster doesn't support this feature was outdated, referring to the no-longer-existing experimental feature. So this patch fixes the error message. Since this feature is already three years old, nobody is likely to ever see this error message (it can be seen only by someone upgrading an even older cluster, during the rolling upgrade), but it's better not to have wrong error messages in the code, even if users never see them. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-02-25 14:59:41 +02:00
Nadav Har'El	b78bb914d7	alternator: replace SCYLLA_ASSERT with throwing_assert Replace all calls to SCYLLA_ASSERT() in Alternator by the better and safer throwing_assert() introduced in the previous patch. As a result of this patch, if one of the call sites for these asserts is buggy and ever fails, only the involved operation will be killed by an exception, instead of crashing the whole server - and often the entire cluster (as the same buggy request reaches all nodes and crashes them all). Additionally, this patch replaces a few existing uses of on_internal_error() in Alternator, which had uninformative messages, with the more-or-less equivalent, but shorter, throwing_assert(). The idea is to convert the verbose idiom: if (!condition) { on_internal_error(logger, "some error message"); } into the shorter: throwing_assert(condition); Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-02-25 14:58:47 +02:00
Nadav Har'El	d876e7cd0a	utils: introduce throwing_assert(), a safe replacement for assert This patch introduces throwing_assert(cond), a better and safer replacement for assert(cond) or SCYLLA_ASSERT(cond). It aims to eventually replace all assertions in Scylla and provide a real solution to issue #7871 ("exorcise assertions from Scylla"). throwing_assert() is based on the existing on_internal_error() and inherits all its benefits, but brings with it the convenience of assert() and SCYLLA_ASSERT(): no need for a separate if(), new strings, etc. For example, you can write just one line of throwing_assert(): throwing_assert(p != nullptr); instead of the much more verbose on_internal_error: if (p == nullptr) { utils::on_internal_error("assertion failed: p != nullptr"); } Like assert() and SCYLLA_ASSERT(), in our tests throwing_assert() dumps core on failure. But its advantage over the other assertion functions becomes clear in production: * assert() is compiled out in release builds. This means that the condition is not checked, and the code after the failed condition continues to run normally, potentially with disastrous consequences. In contrast, throwing_assert() continues to check the condition even in release builds, and if the condition is false it throws an exception. This ensures that the code following the condition doesn't run. * SCYLLA_ASSERT() in release builds checks the condition and crashes Scylla if the condition is not met. In contrast, throwing_assert() doesn't crash, but throws an exception. This means that only the specific operation that encountered the error is aborted, instead of the entire server. It often also means that the user of this operation will see this error somehow and know which operation failed - instead of encountering a mysterious server crash (or even a whole-cluster crash) without any indication of which operation caused it. Another benefit of throwing_assert() is that it logs the error message (and also a backtrace!) 
to Scylla's usual logging mechanisms - not to stderr, where assert and SCYLLA_ASSERT write and where users sometimes can't see the output. Fixes #28308. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-02-25 14:58:47 +02:00
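A minimal C++ sketch of the two properties the commit describes: a check that is never compiled out (it does not depend on NDEBUG) and that throws instead of aborting. SKETCH_THROWING_ASSERT and report_internal_error() are hypothetical names; the real throwing_assert() builds on on_internal_error(), which also logs a backtrace and dumps core in tests, none of which is modeled here:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for on_internal_error(): this sketch only throws,
// while the real function also logs the message and a backtrace.
[[noreturn]] inline void report_internal_error(const std::string& msg) {
    throw std::runtime_error(msg);
}

// Hypothetical macro: checked in every build mode; on failure it throws,
// aborting only the current operation instead of the whole process.
#define SKETCH_THROWING_ASSERT(cond)                                          \
    do {                                                                      \
        if (!(cond)) {                                                        \
            report_internal_error(std::string("assertion failed: ") + #cond); \
        }                                                                     \
    } while (0)

int checked_deref(const int* p) {
    SKETCH_THROWING_ASSERT(p != nullptr);
    return *p;
}
```

A caller that catches the exception (for example, the code handling one request) can fail just that request while the server keeps running.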
Wojciech Mitros	d1ff8f1db3	docs: add strong consistency doc Add a new docs/dev document for the strongly consistent tables feature. For now, it only contains information about the Raft metadata persistence, but it should be updated as more of the strong-consistency components are added.	2026-02-25 12:34:58 +01:00
Wojciech Mitros	97dc88d6b6	test/cluster: add tests for strongly-consistent tables' metadata persistence In this patch we add various tests for checking how strongly consistent tables work while allowing their tablets to reside on non-0 shards and while using the new persistent storage for their raft metadata. The tests verify that: - strongly consistent tables' tablets can be allocated on different shards and we can write/read from them - the raft metadata is persistent across restarts even with disruptions - the sharder correctly routes metadata queries to specified shards - we can correctly perform multi-shard reads from the metadata tables - we can read using just the group_id (without shard) using ALLOW FILTERING For the tests we add logging to the sharder and partitioner and we add some extra logs for observability.	2026-02-25 12:34:58 +01:00
Wojciech Mitros	f841c0522d	raft: enable multi-shard raft groups for strongly consistent tablets In this patch we allow strongly consistent tables to have tablets on shards other than 0. For that, we remove the checks for shard 0 for the non-group0 raft groups, and we allow the tablet allocator to place tablets of strongly consistent tables on shards other than 0. We also start using the new storage (raft::persistence) for strongly consistent tables, added in the preceding commits.	2026-02-25 12:34:58 +01:00
Wojciech Mitros	ffe32e8e4d	test/raft: add unit tests for raft_groups_storage Most functions of the new storage for raft groups for strongly consistent tables are the same as for the system raft table storage, so we reuse the tests for them to test the new storage. We add additional tests for checking the new raft groups partitioner and sharder, and for verifying that writes using storages for different shards do not affect the data read on different shards. We also add a test for checking the snapshot_descriptor present after the storage bootstrap - for both system and strongly consistent storages we check that the storage contains the initial descriptor.	2026-02-25 12:34:58 +01:00
Wojciech Mitros	16977d7aa0	raft: add raft_groups_storage persistence class Add raft_groups_storage, a raft::persistence implementation for strongly consistent tablet groups. Currently, it's almost an exact copy of the raft_sys_table_storage that uses the new raft tables for strongly consistent tables (raft_groups, raft_groups_snapshots, raft_groups_snapshot_config) which have a (shard, group_id) partition key. In the future, the mutation, term and commit_idx data will be stored differently for strongly consistent tables than for group0, which will differentiate this class from the original raft_sys_table_storage. The storage is created for each raft group server and it takes a shard parameter at construction time to ensure all queries target the correct partition (and thus shard).	2026-02-25 12:34:58 +01:00
Wojciech Mitros	654fe4b1ca	db: add system tables for strongly consistent tables' raft groups Add three new system tables for storing raft state for strongly consistent tablets, corresponding to the tables for group0: - system.raft_groups: Stores the raft log, term/vote, snapshot_id, and commit_idx for each tablet's raft group. - system.raft_groups_snapshots: Stores snapshot descriptors (index, term) for each group. - system.raft_groups_snapshot_config: Stores the raft configuration (current and previous voters) for each snapshot. These tables use a (shard, group_id) composite partition key with the newly added raft_groups_partitioner and raft_groups_sharder, ensuring data is co-located with the tablet replica that owns the raft group. The tables are only created when the STRONGLY_CONSISTENT_TABLES experimental feature is enabled.	2026-02-25 12:34:58 +01:00
Wojciech Mitros	cb0caea8bf	dht: add fixed_shard_partitioner and fixed_shard_sharder Add a custom partitioner and sharder that will be used for Raft tables for strongly consistent tables. These tables will have partition keys of the form (shard, group_id) and the partitioner creates tokens that encode the target shard in the high 16 bits. Token layout: [shard: 16 bits][partition key hash: 48 bits] This encoding guarantees that raft group data will be located on the same shard as the tablet replica corresponding to that raft group as long as we use the tablet replica's shard as the value in the partition key. Storing the shard directly in the partition key avoids additional lookups for request routing to the incoming new raft tables. For even more simplicity, we avoid biasing between uint64_t and int64_t by limiting the acceptable shard ids to at most 32767 (leaving the top bit 0), which results in the same token value whether it is interpreted as uint64_t or int64_t. The sharder decodes the shard by extracting the high bits, which is shard-count independent. This allows the partition key:shard mapping to remain the same even during smp changes (only increases are allowed, the same limitation as for tablets).	2026-02-25 12:34:51 +01:00
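A minimal C++ sketch of the token layout described above, [shard: 16 bits][partition key hash: 48 bits], with shard ids capped at 32767 so the top bit stays 0 and the token reads the same as int64_t or uint64_t. How the 48-bit key hash is computed is left abstract, and make_token() and shard_of() are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

constexpr unsigned shard_bits = 16;
constexpr unsigned hash_bits = 64 - shard_bits;                 // 48
constexpr uint64_t hash_mask = (uint64_t(1) << hash_bits) - 1;  // low 48 bits
constexpr unsigned max_shard = 32767;                           // keeps the top bit 0

// Build a token: shard id in the high 16 bits, key hash in the low 48 bits.
int64_t make_token(unsigned shard, uint64_t key_hash) {
    if (shard > max_shard) {
        throw std::out_of_range("shard id above 32767");
    }
    const uint64_t raw = (uint64_t(shard) << hash_bits) | (key_hash & hash_mask);
    return static_cast<int64_t>(raw);  // top bit is 0, so same value signed or unsigned
}

// Recover the shard from the high bits; independent of the current shard count.
unsigned shard_of(int64_t token) {
    return static_cast<unsigned>(static_cast<uint64_t>(token) >> hash_bits);
}
```

Because shard_of() never looks at the number of shards, adding shards does not move existing keys.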
Botond Dénes	99244179f7	Merge 'CQL transport: Add histogram-based request/response size tracking' from Amnon Heiman This series closes a gap in how CQL request and response sizes are reported. Previously, request_size and response_size were tracked as simple counters, providing only cumulative totals per shard. This made it difficult to understand the distribution of message sizes and identify potential issues with very large or very small requests. After this series, the CQL transport reports detailed histogram metrics showing the distribution of request and response sizes. These histograms are tracked per-instance, per-type (per operation), and per-scheduling-group, providing much better visibility into CQL traffic patterns. The histograms are collected for QUERY, EXECUTE, and BATCH operations, which are the primary data path operations where message size distribution is most relevant. This data can help identify: - Clients sending unexpectedly large requests - Operations with oversized result sets - Scheduling group differences in traffic patterns To support this, the series extends the approx_exponential_histogram template to track an accurate sum and adds a bytes_histogram type alias optimized for byte-range measurements (1KB to 1GB). The existing per-shard counter metrics are maintained for backward compatibility. 
Metrics example: ``` scylla_transport_cql_request_bytes{kind="BATCH",scheduling_group_name="sl:default",shard="0"} 129808 scylla_transport_cql_request_bytes{kind="EXECUTE",scheduling_group_name="sl:default",shard="0"} 227409 scylla_transport_cql_request_bytes{kind="PREPARE",scheduling_group_name="sl:default",shard="0"} 631 scylla_transport_cql_request_bytes{kind="QUERY",scheduling_group_name="sl:default",shard="0"} 2809 scylla_transport_cql_request_bytes{kind="QUERY",scheduling_group_name="sl:driver",shard="0"} 4079 scylla_transport_cql_request_bytes{kind="REGISTER",scheduling_group_name="sl:default",shard="0"} 98 scylla_transport_cql_request_bytes{kind="STARTUP",scheduling_group_name="sl:driver",shard="0"} 432 scylla_transport_cql_request_histogram_bytes_sum{kind="QUERY",scheduling_group_name="sl:driver"} 4079 scylla_transport_cql_request_histogram_bytes_count{kind="QUERY",scheduling_group_name="sl:driver"} 57 scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="1024.000000",scheduling_group_name="sl:driver"} 57 scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="2048.000000",scheduling_group_name="sl:driver"} 57 scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="4096.000000",scheduling_group_name="sl:driver"} 57 scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="8192.000000",scheduling_group_name="sl:driver"} 57 scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="16384.000000",scheduling_group_name="sl:driver"} 57 scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="32768.000000",scheduling_group_name="sl:driver"} 57 scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="65536.000000",scheduling_group_name="sl:driver"} 57 scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="131072.000000",scheduling_group_name="sl:driver"} 57 
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="262144.000000",scheduling_group_name="sl:driver"} 57 scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="524288.000000",scheduling_group_name="sl:driver"} 57 scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="1048576.000000",scheduling_group_name="sl:driver"} 57 scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="2097152.000000",scheduling_group_name="sl:driver"} 57 scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="4194304.000000",scheduling_group_name="sl:driver"} 57 scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="8388608.000000",scheduling_group_name="sl:driver"} 57 scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="16777216.000000",scheduling_group_name="sl:driver"} 57 scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="33554432.000000",scheduling_group_name="sl:driver"} 57 scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="67108864.000000",scheduling_group_name="sl:driver"} 57 scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="134217728.000000",scheduling_group_name="sl:driver"} 57 scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="268435456.000000",scheduling_group_name="sl:driver"} 57 scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="536870912.000000",scheduling_group_name="sl:driver"} 57 scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="1073741824.000000",scheduling_group_name="sl:driver"} 57 ``` The field sees it as an important issue Fixes #14850 Closes scylladb/scylladb#28419 * github.com:scylladb/scylladb: test/boost/estimated_histogram_test.cc: Switch to real Sum transport/server: to bytes_histogram approx_exponential_histogram: Add sum() method for accurate value tracking utils/estimated_histogram.hh: Add bytes_histogram	2026-02-25 13:05:18 +02:00
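A simplified C++ sketch of the bucket layout visible in the metrics above (one bucket per power of two from 1KB to 1GB) plus an exact running sum. This is not approx_exponential_histogram, which is more fine-grained. The class name is hypothetical, and in this sketch values above 1GB are simply counted in the last bucket:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical simplified histogram: one bucket per power of two from
// 2^10 (1KB) to 2^30 (1GB), plus an exact running sum.
class simple_bytes_histogram {
    static constexpr unsigned min_exp = 10;
    static constexpr unsigned max_exp = 30;
    static constexpr unsigned num_buckets = max_exp - min_exp + 1;  // 21 buckets
    std::array<uint64_t, num_buckets> _counts{};
    uint64_t _sum = 0;
    uint64_t _count = 0;

public:
    void add(uint64_t bytes) {
        // Pick the first bucket whose upper bound (2^(min_exp + i)) covers the
        // value; values above 1GB land in the last bucket in this sketch.
        unsigned i = 0;
        while (i + 1 < num_buckets && bytes > (uint64_t(1) << (min_exp + i))) {
            ++i;
        }
        ++_counts[i];
        _sum += bytes;  // exact, not estimated from bucket bounds
        ++_count;
    }

    uint64_t sum() const { return _sum; }
    uint64_t count() const { return _count; }

    // Cumulative count of samples <= 2^exp, like a Prometheus "le" bucket.
    uint64_t cumulative(unsigned exp) const {
        uint64_t total = 0;
        for (unsigned i = 0; i < num_buckets && min_exp + i <= exp; ++i) {
            total += _counts[i];
        }
        return total;
    }
};
```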
Yaron Kaikov	98494e08eb	ci: harden trigger-scylla-ci workflow against credential leaks and untrusted PRs refs: https://github.com/scylladb/scylladb/security/advisories/GHSA-wrqg-xx2q-r3fv - Remove -v and -i flags from curl to prevent credentials from being logged in workflow output - Move PR_NUMBER and PR_REPO_NAME into the env block with proper quoting to prevent shell injection via crafted PR metadata - Add org membership verification step for pull_request_target events so that only PRs from scylladb org members can trigger Jenkins CI Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-796 Closes scylladb/scylladb#28785	2026-02-25 12:42:18 +02:00
Avi Kivity	511fab1f28	gossiper: exit failure detector sleep faster When running unit tests, there's a visible ~1-second sleep when gossip exits the failure detector loop. Improve this by adding a condition variable for exiting the loop and signaling it when any of the exit conditions are satisfied: the abort_source is pulled, the gossiper is shut down, or the sleep is complete. We can't just use the abort_source because gossip can be shut down independently of the rest of the system. To see the improvement, I ran cql_query_test in dev mode: Before: $ time ./build/dev/test/boost/combined_tests -t cql_query_test -- --smp 2 > /dev/null 2>&1 real 2m26.904s user 0m24.307s sys 0m13.402s After: $ time ./build/dev/test/boost/combined_tests -t cql_query_test -- --smp 2 > /dev/null 2>&1 real 0m26.579s user 0m24.671s sys 0m13.636s Two minutes of real-time saved. Real-life improvement in test.py will be lower, because of the overhead of launching pytest for each test case. Closes scylladb/scylladb#28649	2026-02-25 11:41:02 +02:00
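A standard-library C++ analogue of the change described above. Scylla itself uses Seastar primitives, not std::condition_variable, and the class name here is hypothetical. The loop waits on a condition variable with a timeout instead of sleeping unconditionally, so a stop request wakes it at once:

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical interruptible sleep: waits up to `period`, but returns early
// as soon as stop() has been called.
class interruptible_sleeper {
    std::mutex _mu;
    std::condition_variable _cv;
    bool _stopping = false;

public:
    // Returns true if woken by stop(), false if the full period elapsed.
    bool sleep_for(std::chrono::milliseconds period) {
        std::unique_lock<std::mutex> lk(_mu);
        return _cv.wait_for(lk, period, [this] { return _stopping; });
    }

    void stop() {
        {
            std::lock_guard<std::mutex> lk(_mu);
            _stopping = true;
        }
        _cv.notify_all();
    }
};
```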
Andrzej Jackowski	e2c4b0a733	config: add write_consistency_levels_* guardrails configuration Add guardrails configuration that can be used later in this patch series. Refs: SCYLLADB-259	2026-02-25 10:30:03 +01:00
Avi Kivity	5baf16005f	build: install antlr3 from maven + source, not rpm packages Fedora removed the C++ backend from antlr3 [1], citing incompatible license. The license in question (the Unicode license) is fine for us. To be able to continue using antlr3, build it ourselves. The main executable can be used as is from Maven, since we don't need any patches for the parser. The runtime needs to be patched, so we download the source and patch it. Regenerated frozen toolchain with optimized clang from https://devpkg.scylladb.com/clang/clang-21.1.8-Fedora-43-aarch64.tar.gz https://devpkg.scylladb.com/clang/clang-21.1.8-Fedora-43-x86_64.tar.gz Fixes https://scylladb.atlassian.net/browse/SCYLLADB-773 Closes scylladb/scylladb#28765	2026-02-25 11:03:19 +02:00
Andrei Chekun	729bad77b1	test.py: add possibility to run downloaded Scylla binary Add the possibility to run a Scylla binary that is already stored locally, or to download and run the relocatable Scylla package. Closes scylladb/scylladb#28787	2026-02-25 10:23:19 +02:00
Łukasz Paszkowski	9ade0b23da	reader_concurrency_semaphore: set _ex in on_preemptive_abort() When a permit is preemptively aborted, store the corresponding exception in permit's member: `reader_permit::impl::_ex`. This makes preemptively-aborted permits consistently report aborted() and prevents them from being treated as eligible for inactive registration in `register_inactive_read()`, avoiding assertion failures on unexpected permit state. Closes scylladb/scylladb#28591	2026-02-25 10:20:06 +02:00
Grzegorz Burzyński	b4f0eb666f	packaging: add systemctl command to dependencies scylladb/scylla container image doesn't include systemctl binary, while it is used by perftune.py script shipped within the same image. Scylla Operator runs this script to tune Scylla nodes/containers, expecting all of its dependencies to be available in the container's PATH. Without systemctl, the script fails on systems that run irqbalance (e.g., on EKS nodes) as the script tries to reconfigure irqbalance and restart it via systemctl afterwards. Fixes: scylladb/scylla-operator#3080 Closes scylladb/scylladb#28567	2026-02-25 10:19:32 +02:00
Botond Dénes	56cc7bbeec	Merge 'Allow "global" snapshot using topology coordinator + add tablet metadata to manifest' from Calle Wilund Refs: SCYLLADB-193 Adds a "snapshot_table" topology operation and associated data structure/table columns to support dispatching a snapshot operation as a topo coordinator op. The logic is similar to truncation, so it is broken out of the truncation code and partly shared with it. Also adds optional tablet metadata to manifest, listing all tablets present in a given snapshot, as well as tablet sstable ownership, repair status, and token ranges. As per description in SCYLLADB-193, the alternative snapshot mechanism is in a separate namespace under 'tablets', which, while dubious, is the desired destination. The API is accessed via `nodetool cluster snapshot`, which more or less mirrors `nodetool snapshot`, but uses a topology operation. TTL is added to message propagation as a separate patch here, since it is not (yet) used from the API (or nodetool); using it would require a syntax for both the API and the command line. Closes scylladb/scylladb#28525 * github.com:scylladb/scylladb: topology::snapshot: Add expiry (ttl) to RPC/topo op test_snapshot_with_tablets: Extend test to check manifest content table::manifest: Add tablet info to manifest.json test::test_snapshot_with_tablets: Add small test for topo coordinated snapshot scylla-nodetool: Add "cluster snapshot" command api::storage_service: Add tablets/snapshots command for cluster level snapshot db::snapshot-ctl: Add method to do snapshot using topo coordinator storage_proxy: Add snapshot_keyspace method topology_coordinator: Add handler for snapshot_tables storage_proxy: Add handler for SNAPSHOT_WITH_TABLETS messaging_service: Add SNAPSHOT_WITH_TABLETS verb feature_service: Add SNAPSHOT_AS_TOPOLOGY_OPERATION feature topology_mutation: Add setter for snapshot part of row system_keyspace::topology_requests_entry: Add snapshot info to table topology_state_machine: Add snapshot_tables operation topology_coordinator: Break out logic from 
handle_truncate_table storage_proxy: Break out logic from request_truncate_with_tablets test/object_store: Remove create_ks_and_cf() helper test/object_store: Replace create_ks_and_cf() usage with standard methods test/object_store: Shift indentation right for test cases	2026-02-25 10:17:53 +02:00
Botond Dénes	166e245097	Merge 'test.py: Topology test pytest integration' from Andrei Chekun Migrate the cluster tests directory to be handled by pytest. This is the next step in the process of unifying the tests and migrating them to pytest. With this PR, cluster tests will be executed with the full path to the file instead of the `suite/test` paradigm. Backport is not needed because this is a framework enhancement. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-46 Closes scylladb/scylladb#27618 * github.com:scylladb/scylladb: test.py: remove setsid from the framework test.py: rename suite.yaml to test_config.yaml test.py: add cluster tests to be executed by pytest test.py: add random seed for topology tests reproducibility test.py: add explicit default values to pytest options test.py: replace SCYLLA env var with build_mode fixture	2026-02-25 10:17:20 +02:00
Botond Dénes	9dff9752b4	Merge 'Fix regression in Alternator TTL with tablets and node going down' from Nadav Har'El Recently we suffered a regression in how Alternator TTL behaves when a node goes down while tablets are used. Usually, expiration of data in a particular tablet is handled by this tablet's "primary replica". However, if that node is down, we want another node to perform these expirations until the primary replica comes back online. We created a function `tablet_map::get_secondary_replica()` to select that "other node". We don't care too much what the "secondary replica" means, but we do care that it's different from the primary replica - if it's the same, the expiration of that tablet will never be done. It turns out that recently, in commits `817fdad` and `d88036d`, the implementation of get_primary_replica() changed without a corresponding change to get_secondary_replica(). After those changes, the two functions are mismatched, and sometimes return the same node for both primary and secondary replica. Unfortunately, although we had a dtest for the handling of a dead node in Alternator TTL, it failed to reproduce this bug, so this regression was missed - nothing else besides Alternator TTL ever used the get_secondary_replica() function. So in this series, in addition to fixing the bug, we add two tests that reproduce this bug (fail before the fix, pass with the fix): 1. A unit test that checks that get_secondary_replica() always returns a different node from get_primary_replica() 2. A cluster test based on the original dtest, which does reproduce this bug in Alternator TTL where some of the data was never expired (but only failed in release build, for an unknown reason). Fixes SCYLLADB-777. 
Closes scylladb/scylladb#28771 * github.com:scylladb/scylladb: test: add unit test for tablet_map::get_secondary_replica() test, alternator: add test for TTL expiration with a node down locator: fix get_secondary_replica() to match get_primary_replica()	2026-02-25 10:13:55 +02:00
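A minimal C++ sketch of the invariant the new unit test checks, under a hypothetical selection rule: deriving the secondary replica from the primary (here, the next replica in order) makes it impossible for the two to coincide whenever a tablet has at least two replicas. Scylla's actual get_primary_replica() rule is different and is not shown:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical primary rule: pick a replica index from the tablet id.
std::size_t primary_index(std::size_t tablet_id, std::size_t replica_count) {
    if (replica_count == 0) {
        throw std::logic_error("tablet has no replicas");
    }
    return tablet_id % replica_count;
}

// The secondary is derived from the primary rather than computed by a
// separate rule, so a change to primary_index() cannot make them collide.
std::size_t secondary_index(std::size_t tablet_id, std::size_t replica_count) {
    if (replica_count < 2) {
        throw std::logic_error("no secondary replica with fewer than 2 replicas");
    }
    return (primary_index(tablet_id, replica_count) + 1) % replica_count;
}
```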
Gleb Natapov	0f8cdd81f3	group0: fix indentation after previous patch	2026-02-25 10:08:32 +02:00
Gleb Natapov	7d7cbae763	raft_group0: simplify get_group0_upgrade_state function since no upgrade can happen any more No need for locking any more so the function may just return a value and be synchronous.	2026-02-25 10:08:32 +02:00
Gleb Natapov	0689fb5ab2	raft_group0: move service::group0_upgrade_state to use fmt::formatter instead of iostream	2026-02-25 10:08:32 +02:00
Gleb Natapov	cd76604c79	raft_group0: remove unused code from raft_group0 Also do not pass raft_replace_info into setup_group0 since it has not been used there for a long time.	2026-02-25 10:08:32 +02:00
Gleb Natapov	6173ea476b	node_ops: remove topology over node ops code The code is no longer called.	2026-02-25 10:08:32 +02:00
Gleb Natapov	758d1c9c39	topology: fix indentation after the previous patch	2026-02-25 10:08:31 +02:00
Gleb Natapov	67cd5755b2	topology: drop topology_change_enabled parameter from raft_group0 code Since the parameter is always true, there is no point in passing it everywhere. Just assume it is true at the point of use.	2026-02-25 10:08:31 +02:00
Gleb Natapov	50da142e77	storage_service: remove unused handle_state_* functions The handlers are no longer called.	2026-02-25 10:08:31 +02:00
Gleb Natapov	1a57f2b22d	gossiper: drop wait_for_gossip_to_settle and deprecate correspondent option The function is now unused, and the option that allows skipping the wait is no longer needed either.	2026-02-25 10:08:31 +02:00
Gleb Natapov	aa0f103eb9	storage_service: fix indentation after the last patch	2026-02-25 10:08:31 +02:00
Gleb Natapov	be6cced978	storage_service: remove gossiper bootstrapping code Remove code that is responsible for bootstrapping a node in gossiper mode since the mode is no longer supported.	2026-02-25 10:08:31 +02:00
Gleb Natapov	8776d00c44	storage_service: drop get_group_server_if_raft_topolgy_enabled Raft topology is always enabled now so the function does not make sense any longer.	2026-02-25 10:08:30 +02:00
Gleb Natapov	5fa4f5b08f	storage_service: drop is_topology_coordinator_enabled and its uses The code can now assume that the topology coordinator is enabled.	2026-02-25 10:08:30 +02:00
Gleb Natapov	1e8d4436c7	storage_service: drop run_with_api_lock_in_gossiper_mode_only Since gossiper mode no longer exists, the function is no longer needed.	2026-02-25 10:08:30 +02:00
Gleb Natapov	a8a167623a	topology: remove code that assumes raft_topology_change_enabled() may return false The patch removes the code protected by !raft_topology_change_enabled() since it is no longer reachable. Drop test_lwt_for_tablets_is_not_supported_without_raft since non-raft mode is no longer supported.	2026-02-25 10:08:30 +02:00
Botond Dénes	8dbcd8a0b3	tools/scylla-sstable: create_table_in_cql_env(): register UDTs recursively It is not enough to go over all column types and register the UDTs. UDTs might be nested in other types, like collections. One has to do a traversal of the type tree and register every UDT on the way. That is what this patch does. This function is used by the query and write operations, which should now both work with nested UDTs. Add a test which fails before and passes after this patch.	2026-02-25 08:51:25 +02:00
Botond Dénes	cf39a5e610	tools/scylla-sstable: generalize dump_if_user_type Rename to invoke_on_user_type() and make the action taken on user types a function parameter. Enables reuse of the traverse logic by other code.	2026-02-25 08:51:25 +02:00
Botond Dénes	80049c88e9	tools/scylla-sstable: move dump_if_user_type() definition So it can be used by create_table_in_cql_env() code.	2026-02-25 08:51:25 +02:00
Dario Mirovic	3222a1a559	dtest: shorten default sleep step in wait_for Default sleep step of 1s is too long. Reduce it to make the test environment more responsive and faster. Refs SCYLLADB-573	2026-02-25 03:17:47 +01:00
Dario Mirovic	51e7c2f8d9	dtest: wait_for speedup Audit tests have been slow. They rely on the wait_for function. This function first sleeps for the duration of the specified time step, and then calls the given function. The audit tests need 0.02-0.03 seconds for the given function, but the operation lasts around 1.02-1.03 seconds, since the step is 1 second. This patch modifies the wait_for dtest function so that it first executes the given function, and only afterwards calls time.sleep(step). This reduces the time needed for the given function from 1.03 to 0.03 seconds. The total audit test suite speedup is 3x. On the developer machine, the time is reduced from 13+ minutes to 4 minutes. This patch also improves the performance of some alternator tests that use the same wait_for dtest function. Refs SCYLLADB-573	2026-02-25 03:17:46 +01:00
Andrei Chekun	1b92b140ee	test.py: improve stdout output for boost test The current way of checking Boost's stdout has a race condition: pytest may try to read the file before it has actually been flushed. This PR should eliminate that possibility. Closes scylladb/scylladb#28783	2026-02-25 00:50:25 +01:00
Ferenc Szili	f70ca9a406	load_stats: implement the optimized sum of tablet sizes PR #28703 was merged into master but not with the latest version of the changes. This patch is an incremental fix for this. Currently, the elements of the tablet_sizes_per_shard vector are incremented in separate shards. This is prone to false sharing of cache lines, and ping-pong of memory, which leads to reduced performance. In this patch, in order to avoid cache line collisions while updating the sum of tablet sizes per shard, we align the counter to 64 bytes. Fixes: SCYLLADB-678 Closes scylladb/scylladb#28757	2026-02-24 22:19:25 +01:00
Marcin Maliszkiewicz	aa7816882e	test: add test_uninitialized_conns_semaphore Runtime in dev mode: 2s	2026-02-24 17:28:51 +01:00
Alex	5557770b59	test_mv_build_during_shutdown started two async CREATE MATERIALIZED VIEW operations and never awaited them (asyncio.gather(...) without await). This PR awaits each of the tasks, so that the MV schema is added successfully before the server shutdown starts. With this change, we no longer hit the shutdown races. Closes scylladb/scylladb#28774	2026-02-24 17:25:05 +01:00
Anna Stuchlik	64b1798513	doc: remove redundant Java-related information This commit removes: - Instructions to install scylla-jmx (and all references) - The Java 11 requirement for Ubuntu. Fixes https://github.com/scylladb/scylladb/issues/28249 Fixes https://github.com/scylladb/scylladb/issues/28252 Closes scylladb/scylladb#28254	2026-02-24 14:37:39 +01:00
Anna Stuchlik	e2333a57ad	doc: remove the tablets limitation for Alternator This commit removes the information that Alternator doesn't support tablets. The limitation is no longer valid. Fixes SCYLLADB-778 Closes scylladb/scylladb#28781	2026-02-24 14:24:30 +02:00
Andrzej Jackowski	cd4caed3d3	test: fix configuration of test_autoretrain_dict `test_autoretrain_dict` sporadically fails because the default compression algorithm was changed after the test was written. `9ffa62a986815709d0a09c705d2d0caf64776249` was an attempt to fix it by changing the compression configuration during node startup. However, the configuration change had an incorrect YAML format and was ignored by ScyllaDB. This commit fixes it. Fixes: scylladb/scylladb#28204 Closes scylladb/scylladb#28746	2026-02-24 12:08:44 +01:00
Botond Dénes	067bb5f888	test/scylla_gdb: skip coroutine tests if coroutine frame is not found For a while, we have seen coroutine related tests (those that use the coroutine_task fixture) fail occasionally, because no coroutine frame is found. Multiple attempts were made to make this problem self-diagnosing and dump enough information to be able to debug this post-mortem. To no avail so far. A lot of time was invested into this benign issue: See the long discussion at https://github.com/scylladb/scylladb/issues/22501. It is not known if the bug is in gdb, or the gdb script trying to find the coroutine frame. In any case, both are only used for debugging, so we can tolerate occasional failures -- we are forced to do so when working with gdb anyway. Instead of piling on more effort there, just skip these tests when the problem occurs. This solves the CI flakiness. Fixes: #22501 Closes scylladb/scylladb#28745	2026-02-24 10:12:03 +01:00
Marcin Maliszkiewicz	d5684b98c8	test: cluster: add continue-after-error to perf tool tests Add --continue-after-error true to perf-cql-raw and perf-alternator tests, and --stop-on-error false to perf-simple-query test, so that tests don't abort on the first error. The reason for this is that the tests are flaky, with example failure: Perf test failed: std::runtime_error (server returned ERROR to EXECUTE) When the CPU is starved on CI, we can return timeouts and/or other errors. The change should make tests more robust at the expense of a smaller test scope. But those tests were written mostly to test the startup sequence, as it differs from Scylla's startup. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-759 Closes scylladb/scylladb#28767	2026-02-24 11:08:34 +02:00
Andrei Chekun	d9ce2db1a3	test.py: remove setsid from the framework With the previous architecture, scylla servers were handled by test.py, and if pytest failed, test.py was responsible for stopping scylla processes. Now, with only pytest handling them, there is no such mechanism. That's why I'm removing the setsid: when the parent pytest process closes, it will automatically close all children, including any process started during testing. This ensures no scylla process is left behind if pytest is killed.	2026-02-24 09:48:38 +01:00
Andrei Chekun	d3f5f7468c	test.py: rename suite.yaml to test_config.yaml Switch off discovery of the tests by test.py	2026-02-24 09:48:38 +01:00
Andrei Chekun	e439ec9d67	test.py: add cluster tests to be executed by pytest Cluster tests are now also executed by pytest. Run pytest in an executor to avoid blocking the event loop, allowing resource monitoring to run concurrently. The logic for passing arguments to pytest changed because almost all directories are now executed by pytest, and directories that are not handled are excluded in pytest.ini. Modify the thread count for debug mode, because with the default logic CI agents die.	2026-02-24 09:48:38 +01:00
Andrei Chekun	edf7154fee	test.py: add random seed for topology tests reproducibility Set the TOPOLOGY_RANDOM_FAILURES_TEST_SHUFFLE_SEED environment variable during pytest configuration to ensure that all xdist workers discover the same set of tests. This is a known limitation of the xdist plugin, where the discovered tests must be consistent across the master and workers.	2026-02-24 09:48:38 +01:00
Andrei Chekun	4a7d8cd99d	test.py: add explicit default values to pytest options Add explicit default values to pytest command line options to prevent issues when running tests with pytest's parallel execution, where options that are not present in an upper-level conftest are simply not set at all.	2026-02-24 09:48:38 +01:00
Andrei Chekun	99234f0a83	test.py: replace SCYLLA env var with build_mode fixture Replace direct usage of the SCYLLA environment variable with the build_mode pytest fixture and the path_to helper function. This makes tests more flexible and consistent with the test framework. It also allows using the tests with xdist, where the environment variable may be set only in the master process and not in the workers. Use the fixture to get the scylla binary from the suite; this aligns with how the relocatable Scylla executable is obtained.	2026-02-24 09:48:38 +01:00
Avi Kivity	0add130b9d	lua: avoid undefined behavior when converting doubles to integers Lua doesn't have separate integer and floating point numbers, so we check if a number can fit in an integer and if so convert it to an integer. The conversion routine invokes undefined behavior (and even acknowledges it!). More recent compilers changed their behavior when casting infinities, breaking test_user_function_double_return which tests this conversion. Fix by tightening the conversion to not invoke undefined behavior. Closes scylladb/scylladb#28503	2026-02-24 10:41:21 +02:00
Botond Dénes	1d5b8cc562	Merge 'Fix use after free in auth cache' from Marcin Maliszkiewicz This patchset: - ensures the loading semaphore is acquired in cross-shard callbacks - fixes iterator invalidation problem when reloading all cached permissions Fixes https://scylladb.atlassian.net/browse/SCYLLADB-780 Backport: no, affected code not released yet Closes scylladb/scylladb#28766 * github.com:scylladb/scylladb: auth: cache: fix permissions iterator invalidation in reload_all_permissions auth/cache: acquire _loading_sem in cross-shard callbacks	2026-02-24 10:35:46 +02:00
Pavel Emelyanov	5a5eb67144	vector_search/dns: Use newer seastar get_host_by_name API The hostent::addr_list is deprecated in favor of address_entry::addr field that contains the very same addresses. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#28565	2026-02-23 21:28:43 +02:00
Pavel Emelyanov	6b02b50e3d	Merge 'object_storage: add retryable machinery to object storage' from Ernest Zaslavsky - add an overload to the rest http client to accept retry strategy instance as an argument - remove hand rolled error handling from object storage client and replace with common machinery that supports handling and retrying when appropriate No backport needed since it is only refactoring Closes scylladb/scylladb#28161 * github.com:scylladb/scylladb: object_storage: add retryable machinery to object storage rest_client: add `simple_send` overload	2026-02-23 21:28:51 +03:00
Wojciech Mitros	c1b3fec11a	raft: add group_id -> shard mapping to raft_group_registry To handle RPC from other nodes, we need to be able to redirect the requests for each raft group to the shard that owns it. We need to be able to do the redirection on all shards, so to achieve that, on all shards we need to store the information about which shard is occupied by each Raft group server. For that we add a group_id -> shard mapping to the raft_group_registry. The mapping is filled out when starting raft servers, it's emptied when we abort raft servers. We use it when registering RPC verb handlers, so that regardless of the shard handling the RPC, the work on the raft group can be performed on the corresponding shard.	2026-02-23 15:34:56 +01:00
Nadav Har'El	e463d528fe	test: add unit test for tablet_map::get_secondary_replica() This patch adds a unit test for tablet_map::get_secondary_replica(). It was never officially defined how the "primary" and "secondary" replicas were chosen, and their implementation changed over time, but the one invariant that this test verifies is that the secondary replica and the primary replica must be different nodes. This test reproduces issue SCYLLADB-777, where we discovered that get_primary_replica() changed without a corresponding change to get_secondary_replica(). So before the previous patch, this test failed, and after the previous patch - it passes. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-02-23 16:19:43 +02:00
Nadav Har'El	0c7f499750	test, alternator: add test for TTL expiration with a node down We have many single-node functional tests for Alternator TTL in test/alternator/test_ttl.py. This patch adds a multi-node test in test/cluster/test_alternator.py. The new test verifies that: 1. Even though Alternator TTL splits the work of scanning and expiring items between nodes, all the items get correctly expired. 2. When one node is down, all the items still expire because the "secondary" owner of each token range takes over expiring the items in this range while the "primary" owner is down. This new test is actually a port of a test we already had in dtest (alternator_ttl_tests.py::test_multinode_expiration). This port is faster and smaller than the original (fewer nodes, fewer rows), but it still found a regression (SCYLLADB-777) that dtest missed - the new test failed when running with tablets and in release build mode. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-02-23 16:19:43 +02:00
Nadav Har'El	9ab3d5b946	locator: fix get_secondary_replica() to match get_primary_replica() The function tablet_map::get_secondary_replica() is used by Alternator TTL to choose a node different from get_primary_replica(). Unfortunately, recently (commits `817fdad` and `d88037d`) the implementation of the latter function changed, without changing the former. So this patch changes the former to match. The next two patches will have two tests that fail before this patch, and pass with it: 1. A unit test that checks that get_secondary_replica() returns a different node than get_primary_replica(). 2. An Alternator TTL test that checks that when a node is down, expirations still happen because the secondary replica takes over the primary replica's work. Fixes SCYLLADB-777 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-02-23 16:19:30 +02:00
Botond Dénes	dcd8de86ee	Merge 'docs: update a documentation of adding/removing DC and rebuilding a node' from Aleksandra Martyniuk Describe a procedure to convert tablet keyspace replication factor to rack list. Update the procedures of adding and removing a node to consider tablet keyspaces. Fixes: [SCYLLADB-398](https://scylladb.atlassian.net/browse/SCYLLADB-398) Fixes: https://github.com/scylladb/scylladb/issues/28306. Fixes: https://github.com/scylladb/scylladb/issues/28307. Fixes: https://github.com/scylladb/scylladb/issues/28270. Needs backport to all live branches as they all include tablets. [SCYLLADB-398]: https://scylladb.atlassian.net/browse/SCYLLADB-398?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ Closes scylladb/scylladb#28521 * github.com:scylladb/scylladb: docs: update nodetool rebuild docs docs: update a procedure of decommissioning a DC docs: update a procedure of adding a DC docs: describe upgrade to enforce_rack_list option docs: describe conversion to rack-list RF	2026-02-23 16:15:16 +02:00
Andrei Chekun	6ae58c6fa6	test.py: move storage tests to cluster subdirectory Move the storage test suite from test/storage/ to test/cluster/storage/ to consolidate related cluster-based tests. This removes the standalone test/storage/suite.yaml, as the tests will use the cluster's test configuration. Initially these tests were in cluster, but they were moved out so that the first iteration could use unshare. Now that they handle volumes another way, without unshare, they belong in cluster. Closes scylladb/scylladb#28634	2026-02-23 16:14:15 +02:00
Gleb Natapov	e23af998e1	test: schema_change_test: make test_schema_digest_does_not_change_with_disabled_features tests run in raft mode They were running in recovery to reuse existing system tables without group0 id, but since we want to remove recovery mode we need to re-generate the tables.	2026-02-23 14:54:24 +02:00
Gleb Natapov	f589740a39	test: schema_change_test: drop schema tests relevant for no raft mode only They were running in no longer supported recovery mode to force gossip topology.	2026-02-23 14:54:24 +02:00
Gleb Natapov	92049c3205	topology: remove upgrade to raft topology code We no longer need this code since we expect the cluster to be upgraded before moving to this version.	2026-02-23 14:54:24 +02:00
Gleb Natapov	4a9cf687cc	group0: remove upgrade to group0 code This patch removes the ability of a cluster to upgrade from not having group0 to having one. This ability is used in the gossiper-based recovery procedure that is deprecated and removed in this version. Also remove tests that use the procedure.	2026-02-23 14:54:24 +02:00
Gleb Natapov	dcafb5c083	group0: refuse to boot if a cluster is still not in raft topology mode We are going to drop legacy topology mode (gossiper mode) and no longer allow ScyllaDB to start in this mode. This patch refuses to boot if a cluster is not in raft topology mode yet. It may happen if a node of a cluster that is not yet in raft topology mode is upgraded to a newer version. If this happens, the node has to be downgraded, raft topology has to be enabled on the cluster, and then the node can be upgraded again.	2026-02-23 14:54:24 +02:00
Gleb Natapov	ed52d1a292	storage_service: refuse to join a cluster in legacy mode We are going to drop legacy topology mode (gossiper mode) and no longer allow ScyllaDB to start in this mode. This patch disallows a node to join a cluster that is still in legacy mode. A cluster needs to enable raft mode first.	2026-02-23 14:54:24 +02:00
Marcin Maliszkiewicz	c5dc086baf	Merge 'vector_search: return NaN for similarity_cosine with all-zero vectors' from Dawid Pawlik ANN vector queries with all-zero vectors are allowed even on vector indexes whose similarity function is set to cosine. When the rescoring option is enabled, those queries would fail, because rescoring calls the `similarity_cosine` function underneath, which raised an `InvalidRequest` exception since all-zero vectors were not allowed (matching Cassandra's behaviour). To eliminate the discrepancy, we want `similarity_cosine` calls with all-zero vectors to pass, but return NaN, as cosine similarity is mathematically undefined for zero vectors. We decided not to use arbitrary values, contrary to USearch, for which the distance (not to be confused with similarity) is defined as cos(0, 0) = 0, cos(0, x) = 1 while supporting the range of values [0, 2]. If we wanted to convert that to similarity, that would mean sim_cos(0, x) = 0.5, and there is no mathematical reason why a zero vector should be more similar than, for example, vectors forming an obtuse angle. It's safe to assume that all-zero vectors shouldn't have any impact on cosine similarity results, therefore we return NaN and eliminate them from the best results. Adjusted the tests accordingly to check both the proper Cassandra and Scylla behaviour. Fixes: SCYLLADB-456 Backport to 2026.1 needed, as it fixes the bug for ANN vector queries using rescoring introduced there. Closes scylladb/scylladb#28609 * github.com:scylladb/scylladb: test/vector_search: add reproducer for rescoring with zero vectors vector_search: return NaN for similarity_cosine with all-zero vectors	2026-02-23 13:10:44 +01:00
Aleksandra Martyniuk	9ccc95808f	docs: update nodetool rebuild docs Update nodetool rebuild docs to mention that the command does not work for tablet keyspaces. Fixes: https://github.com/scylladb/scylladb/issues/28270.	2026-02-23 12:45:01 +01:00
Aleksandra Martyniuk	e4c42acd8f	docs: update a procedure of decommissioning a DC Update a procedure of decommissioning a DC for tablet keyspaces. Fixes: https://github.com/scylladb/scylladb/issues/28307.	2026-02-23 12:45:01 +01:00
Aleksandra Martyniuk	1c764cf6ea	docs: update a procedure of adding a DC Update a procedure of adding a DC for tablet keyspaces. Fixes: https://github.com/scylladb/scylladb/issues/28306.	2026-02-23 12:45:01 +01:00
Aleksandra Martyniuk	e08ac60161	docs: describe upgrade to enforce_rack_list option	2026-02-23 12:44:57 +01:00
Aleksandra Martyniuk	eefe66b2b2	docs: describe conversion to rack-list RF Fixes: SCYLLADB-398	2026-02-23 12:41:33 +01:00
Marcin Maliszkiewicz	54dca90e8c	Merge 'test: move dtest/guardrails_test.py to test_guardrails.py' from Andrzej Jackowski This patch series moves `test/cluster/dtest/guardrails_test.py` to `test/cluster/test_guardrails.py`, and migrates it from `cluster/dtest/` to `cluster/` framework. There are two motivations for moving the test: - Execution time reduction (from 12s to 9s in 'dev' in my env) - Facilitate adding new tests to the `guardrails_test.py` file No backport, `dtest/guardrails_test.py` is only on master Closes scylladb/scylladb#28737 * github.com:scylladb/scylladb: test: move dtest/guardrails_test.py to test_guardrails.py test: prepare guardrails_test.py to be moved to test/cluster/	2026-02-23 12:34:43 +01:00
Marcin Maliszkiewicz	1293b94039	auth: cache: fix permissions iterator invalidation in reload_all_permissions The inner loops in reload_all_permissions iterate role's permissions and _anonymous_permissions maps across yield points. Concurrent load_permissions calls (which don't hold _loading_sem) can emplace into those same maps during a yield, potentially triggering a rehash that invalidates the active iterator. We want to avoid adding semaphore acquire in load_permissions because it's on a common path (get_permissions). Fixing by snapshotting the keys into a vector before iterating with yields, so no long-lived map iterator is held across suspension points.	2026-02-23 12:14:22 +01:00
Calle Wilund	fec7df7cbb	topology::snapshot: Add expiry (ttl) to RPC/topo op Not set yet, but includes it in messages so it can be properly set in calling code. Will add entry to manifest.	2026-02-23 11:37:17 +01:00
Calle Wilund	cc60d014ed	test_snapshot_with_tablets: Extend test to check manifest content Verifies we have the expected tablet info in manifest.	2026-02-23 11:37:17 +01:00
Calle Wilund	f7aa2aacfc	table::manifest: Add tablet info to manifest.json If using tablets, will add info for each tablet present in snapshot, with repair time and token ranges + map each sstable to its owning tablet	2026-02-23 11:37:17 +01:00
Calle Wilund	ae10b5a897	test::test_snapshot_with_tablets: Add small test for topo coordinated snapshot	2026-02-23 11:37:16 +01:00
Calle Wilund	bac81df20f	scylla-nodetool: Add "cluster snapshot" command Similar to "normal" snapshot, but will use the cluster-wide, topology-coordinated snapshot API and path.	2026-02-23 11:37:16 +01:00
Calle Wilund	b0604d9840	api::storage_service: Add tablets/snapshots command for cluster level snapshot Calls the topology_coordinator path for snapshots.	2026-02-23 11:37:16 +01:00
Piotr Dulikowski	a4c389413c	Merge 'Hardens MV shutdown behavior by fixing lifecycle tracking for detached view-builder callbacks' from Alex Dathskovsky This series hardens MV shutdown behavior by fixing lifecycle tracking for detached view-builder callbacks and aligning update handling with the same async dispatch style used by create/drop. Patch 1 refactors on_update_view to use a dedicated coroutine dispatcher (dispatch_update_view), keeping update logic serialized under the existing view-builder lock and consistent with the callback architecture already used for create/drop paths. Patch 2 adds explicit callback lifetime coordination in view_builder: - introduce a seastar::gate member - acquire _ops_gate.hold() when launching detached create/update/drop dispatch futures - keep the hold alive until each detached future resolves - close the gate during view_builder::drain() so shutdown waits for in-flight callback work before final teardown Together, these changes reduce shutdown race exposure in MV event handling while preserving existing behavior for normal operation. Testing: - pytest --test-py-init test/cluster/mv (47 passed, 7 skipped) backport: not required, started happening in master fixes: SCYLLADB-687 Closes scylladb/scylladb#28648 * github.com:scylladb/scylladb: db/view: gate detached view-builder callbacks during shutdown db:view: refactor on_update_view to use coroutine dispatcher	2026-02-23 11:28:37 +01:00
Calle Wilund	9680541144	db::snapshot-ctl: Add method to do snapshot using topo coordinator Separated from "local" snapshot.	2026-02-23 11:27:15 +01:00
Calle Wilund	425d6b4441	storage_proxy: Add snapshot_keyspace method Takes set of ks->tables tuples and issues snapshot for each. If feature is enabled, keyspace is non-local, and uses tablets, will issue topo coordinator call across cluster. Keyspaces not fitting the above will just go to "normal" (node local) snapshot.	2026-02-23 11:27:15 +01:00
Calle Wilund	012a065364	topology_coordinator: Add handler for snapshot_tables Similar to truncate, translates the topo op into RPC call to relevant replicas and waits.	2026-02-23 11:27:15 +01:00
Calle Wilund	2bc633c3bd	storage_proxy: Add handler for SNAPSHOT_WITH_TABLETS	2026-02-23 10:44:42 +01:00
Calle Wilund	d1b06482f0	messaging_service: Add SNAPSHOT_WITH_TABLETS verb	2026-02-23 10:44:42 +01:00
Calle Wilund	3075311f21	feature_service: Add SNAPSHOT_AS_TOPOLOGY_OPERATION feature To detect if cluster can do coordinated snapshot	2026-02-23 10:44:41 +01:00
Calle Wilund	988c5238cf	topology_mutation: Add setter for snapshot part of row	2026-02-23 10:44:41 +01:00
Calle Wilund	8bb81f00f8	system_keyspace::topology_requests_entry: Add snapshot info to table Adds required info to communicate snapshot requests via topology coordinator.	2026-02-23 10:44:38 +01:00
Calle Wilund	642aa44937	topology_state_machine: Add snapshot_tables operation Note: placed after "noop" op, to not confuse ops in a mixed (upgrading) cluster	2026-02-23 10:43:28 +01:00
Calle Wilund	e3d4493bf6	topology_coordinator: Break out logic from handle_truncate_table Makes handle_truncate_table use shared logic, because we would like to reuse some of it for other, coming ops. I.e. snapshot.	2026-02-23 10:43:28 +01:00
Calle Wilund	6e39c3bb83	storage_proxy: Break out logic from request_truncate_with_tablets Makes request_truncate_with_tablets use a parameterized helper, because eventually we will want to use almost identical logic for other ops, like snapshot.	2026-02-23 10:43:28 +01:00
Pavel Emelyanov	ad0c2de0d1	test/object_store: Remove create_ks_and_cf() helper Now all test cases use standard facilities to create data they test Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-02-23 10:43:28 +01:00
Pavel Emelyanov	6711afd73b	test/object_store: Replace create_ks_and_cf() usage with standard methods To create a keyspace, there's the new_test_keyspace helper. The table is created with a single cql.run_async with an explicit schema. The dataset is populated with a single parallel INSERT as well. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-02-23 10:43:28 +01:00
Pavel Emelyanov	ed3a326637	test/object_store: Shift indentation right for test cases This is a preparatory patch. The next patch will need to wrap the `foo()` / `bar()` body in a `with something() as s:` block, effectively only adding the `with something()` line. To avoid shifting the whole file right together with that future change, do it here. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-02-23 10:43:28 +01:00
Marcin Maliszkiewicz	75d4bc26d3	auth/cache: acquire _loading_sem in cross-shard callbacks distribute_role() modifies _roles on non-zero shards via invoke_on_others() without holding _loading_sem. Similarly, load_all()'s invoke_on_others() callback calls prune_all() without the semaphore. When these run concurrently with reload_all_permissions(), which iterates _roles across yield points, an insertion can trigger absl::flat_hash_map::resize(), freeing the backing storage while an iterator still references it. Fix by acquiring _loading_sem on the target shard in both distribute_role()'s and load_all()'s invoke_on_others callbacks, serializing all _roles mutations with coroutines that iterate the map.	2026-02-23 10:30:03 +01:00
Pavel Emelyanov	3d07633300	test/object_store: Use itertools.product() for deeply nested loops The test_restore_with_streaming_scopes test wants to run some loop body for (almost) all combinations of scope, primary-replica-only and min tablet count. For that, three nested loops are used. Using itertools.product() makes the code shorter, less indented and more explicit. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-02-23 12:28:53 +03:00
Pavel Emelyanov	a9a82f89ac	test/object_store: Replace dataset creation usage with standard methods Two places are fixed: 1. The call to create_dataset() is replaced with three "library" methods. This makes it explicit which options and schema are used for that. Eventually, the large and bulky create_dataset will be removed. 2. The part that restores data into a fresh new table calls some CQLs by hand, and partially re-uses variables obtained from the previous call to create_dataset(). Using the same "library" methods to re-create an empty table makes this part much simpler. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-02-23 12:27:41 +03:00
Pavel Emelyanov	988606ac7f	test/object_store: Shift indentation right for test_restore_with_streaming_scopes This is a preparatory patch. The next patch will need to wrap the `foo()` / `bar()` body in a `with something() as s:` block, effectively only adding the `with something()` line. To avoid shifting the whole file right together with that future change, do it here. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-02-23 12:27:09 +03:00
Pavel Emelyanov	5161aeee95	test/backup: Run keyspace flush and snapshot taking API in parallel The take_snapshot() helper runs these API sequentially for every server. Running them with asyncio.gather() slightly reduces the wait-time thus improving the total runtime. Before: CPU utilization: 2.1% real 0m33,871s user 0m22,500s sys 0m13,207s After: CPU utilization: 2.4% real 0m29,532s user 0m22,351s sys 0m12,890s Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-02-23 12:20:36 +03:00
Pavel Emelyanov	21752a43fe	test/backup: Re-use take_snapshot() helper in do_abort_restore() The test in question does _exactly_ what this helper does, but in a longer way. The only difference is that it uses server_id as the key of the dict with sstable components, but it's easy to tune. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-02-23 12:20:35 +03:00
Pavel Emelyanov	818a99810c	test/backup: Move take_snapshot() helper up So that it's not in the middle of tests themselves, but near other "helper" functions in the .py file Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-02-23 12:20:35 +03:00
Ernest Zaslavsky	321d4caf0c	object_storage: add retryable machinery to object storage remove hand rolled error handling from object storage client and replace with common machinery that supports exception handling and retrying when appropriate	2026-02-22 14:00:44 +02:00
Ernest Zaslavsky	24972da26d	rest_client: add `simple_send` overload add an overload to rest client `simple_send` to accept a retry_strategy for http's make_request	2026-02-22 14:00:44 +02:00
Marcin Maliszkiewicz	aba5a8c37f	generic_server: fix waiters count in shed log Capture semaphore waiters count when blocking starts, not after the wait completes.	2026-02-20 17:04:23 +01:00
Marcin Maliszkiewicz	23bed55170	generic_server: scale connection concurrency semaphore by listener count The concurrency semaphore gates uninitialized connections across all do_accepts loops, but was initialized to a fixed value regardless of how many listeners exist. With multiple listeners competing for the same units, each effectively gets less than the configured concurrency. Initialize the semaphore to concurrency - 1 and signal 1 per listen() call, so total capacity is concurrency - 1 + nr_listeners. This guarantees each listener's accept loop can have at least one unit available.	2026-02-20 16:59:19 +01:00
Patryk Jędrzejczak	e8efcae991	Merge 'Use standard ks/cf/data creation methods in object_store/test_basic.py test' from Pavel Emelyanov The test uses create_ks_and_cf helper duplicating the existing code that does the same. This PR patches basic tests to use standard facilities. Also it prepares the ground for testing keyspace storage options with rf=3 Cleaning tests, not backporting Closes scylladb/scylladb#28600 * https://github.com/scylladb/scylladb: test/object_store: Remove create_ks_and_cf() helper test/object_store: Replace create_ks_and_cf() usage with standard methods test/object_store: Shift indentation right for test cases	2026-02-20 15:53:38 +01:00
Nadav Har'El	d01915131a	test/cqlpy: make test_indexing_paging_and_aggregation much faster Currently, test_secondary_index.py::test_indexing_paging_and_aggregation is very slow, and the slowest test in the test/cqlpy framework: It takes around 13 seconds on dev build, and because it is CPU-bound (doesn't sleep), it is much slower on debug builds. The reason for this slowness is that it needs to set up and read over 10,000 rows which is the default select_internal_page_size. But after the patches in pull request (#25368), we can configure select_internal_page_size, so in this patch we change the test to temporarily reduce this option to just 50, and then the test can reach the same code paths with just 142 rows instead of 20120 rows before this patch. As a result, the test should now be 140 times faster than it was before. In practice, because of some fixed overheads (the test creates several tables and indexes), in dev build mode the test run speedup is "only" 26-fold (to around half a second). I verified that removing the code added in `bb08af7` indeed makes the new shorter test fail - and this is the only test in test_secondary_index.py that starts to fail besides test_index_paging_group_by which is also related (so my revert didn't just break secondary indexing completely). So the shorter test is still a good regression test. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#28268	2026-02-20 15:44:53 +02:00
Avi Kivity	92bc5568c5	tools: toolchain: build sanitizers for future toolchain The future toolchain did not build the sanitizers, so debug executables did not link. Fix by not disabling the sanitizers. Closes scylladb/scylladb#28733	2026-02-20 15:44:24 +02:00
Botond Dénes	6c04e02f66	Merge 'Fix restoration test's validation of streaming directions' from Pavel Emelyanov The test_restore_with_streaming_scopes among other things checks how data streams flow while restoring. Whether or not to check the streams is decided based on the min tablet count value, which is compared with a hardcoded 512. This value of 512 matched the tablet count used by this test until it was "optimized" by #27839, where this number changed to 5 and streaming checks became off. Good news is that the very same checks are still performed by test_refresh_with_streaming_scopes. But it's better to have a working restoration test anyway. Minor test fix, not backporting Closes scylladb/scylladb#28607 * github.com:scylladb/scylladb: test: Fix the condition for streaming directions validation test: Split test_backup.py::check_data_is_back() into two	2026-02-20 15:42:10 +02:00
Botond Dénes	6f88c0dbd3	Merge ' test_tablets_parallel_decommission: Fix flakiness due to delayed task appearance' from Tomasz Grabiec Currently, the test assumes that when 'topology_coordinator_pause_before_processing_backlog: waiting' is logged, the task for decommission must be there. This was based on the assumption that topology coordinator is idle and decommission request wakes it up. But if the server is slow enough, it may still be running the load balancer in reaction to table creation, and block on that injection point before decommission request was added. Fix by waiting for the task to appear rather than the injection. Fixes SCYLLADB-715 Only 2026.1 vulnerable. Closes scylladb/scylladb#28688 * github.com:scylladb/scylladb: test_tablets_parallel_decommission: Fix flakiness due to delayed task appearance test: cluster: task_manager_client: Introduce wait_task_appears() tests: pylib: util: Add exponential backoff to wait_for	2026-02-20 15:05:36 +02:00
Pavel Emelyanov	c96420c015	tests: Re-use manager.get_server_exe() There's a bunch of incremental repair tests that want to call the scylla sstable command. For that, they try to find the scylla binary by scanning the /proc directory (see local_process_id and get_scylla_path helpers). There's a shorter way -- just call manager.get_server_exe(). Same for the backup-restore test. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#28676	2026-02-20 14:59:30 +02:00
Pavel Emelyanov	a4a0d75eee	test/object_store: Parametrize test_simple_backup_and_restore() There are three tests and a function with a pair of boolean parameters called by those. It's less code if the function becomes a test with parameters. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#28677	2026-02-20 14:57:30 +02:00
Pavel Emelyanov	a2e1293f86	test/object_store: Squash two simple-backup tests together The test_backup_simple creates a ks/cf, takes a snapshot, backs it up, then checks that the files were uploaded. The test_backup_move does the same, but also plays with 'move_files' parameter to be true/false. In fact, the "move" test was a copy of the "simple" one that dropped the check for the scheduling group being "streaming" (backup with --move-files can check the same, it's not bad), and the check for the destination bucket to contain the needed files (same here -- checking that files arrived in the bucket after --move-files is good). At the end of the day, after the change the backup test is run two times instead of three, and performs extra checks for the --move-files case. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#28606	2026-02-20 14:49:30 +02:00
Botond Dénes	7e90ed657c	Merge 'Fix `client_options` docs' from Karol Baryła https://github.com/scylladb/scylladb/pull/25746 added a new column to `system.clients`: `client_options frozen<map<text, text>>`. This column stores all options sent by the client in the `STARTUP` message. This PR also added `CLIENT_OPTIONS` to the list of values sent in `SUPPORTED` message, and documented that drivers can send their configuration (as JSON) in `STARTUP` under this key. Documentation for the new column was not added to the description of `system.clients` table, and documentation about the new `STARTUP` key was added in `protocol-extensions.md`, but in the section about shard awareness extension. This PR adds missing `system.clients` column description, moves the documentation of `CLIENT_OPTIONS` into its own section, and expands it a bit. Backport: none, because this fixes internal documentation. Closes scylladb/scylladb#28126 * github.com:scylladb/scylladb: protocol-extensions.md: Fix client_options docs system_keyspace.md: Add client_options column system_keyspace.md: Fix order in system.clients	2026-02-20 14:23:34 +02:00
Pavel Emelyanov	525cb5b3eb	table: Use fmt::to_string() to stringify compaction group ID Doing it with format("{}", foo) is correct, but to_string is a bit more lightweight. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#28630	2026-02-20 14:13:15 +02:00
Patryk Jędrzejczak	d399a197f5	Merge 'raft: Await instead of returning future in wait_for_state_change' from Dawid Mędrek The `try-catch` expression is pretty much useless in its current form. If we return the future, the awaiting will only be performed by the caller, completely circumventing the exception handling. As a result, instead of handling `raft::request_aborted` with a proper error message, the user will face `seastar::abort_requested_exception` whose message is cryptic at best. It doesn't even point to the root of the problem. Fixes SCYLLADB-665 Backport: This is a small improvement and may help when debugging, so let's backport it to all supported versions. Closes scylladb/scylladb#28624 * https://github.com/scylladb/scylladb: test: raft: Add test_aborting_wait_for_state_change raft: Describe exception types for wait_for_state_change and wait_for_leader raft: Await instead of returning future in wait_for_state_change	2026-02-20 12:17:22 +01:00
Andrzej Jackowski	eb5a564df2	test: move dtest/guardrails_test.py to test_guardrails.py This commit moves `guardrails_test.py`, prepared in the previous commit of this patch series, to `test/cluster/test_guardrails.py`. It also cleans up `suite.yaml`.	2026-02-20 11:39:52 +01:00
Andrzej Jackowski	9df426d2ae	test: prepare guardrails_test.py to be moved to test/cluster/ Disable `test/cluster/dtest/guardrails_test.py` in `suite.yaml` and make it compatible with the `test/cluster/` framework. This will allow moving this file from `test/cluster/dtest/` to `test/cluster/` in the next commit of this patch series. There are two motivations for moving the test: - Execution time reduction (from 12s to 9s in 'dev' in my env) - Facilitate adding new tests to the `guardrails_test.py` file	2026-02-20 11:39:43 +01:00
Raphael S. Carvalho	f33f324f77	mutation_compactor: Fix tombstone GC metrics to account for only expired There are 3 metrics (that go into every compaction_history entry): total_tombstone_purge_attempt total_tombstone_purge_failure_due_to_overlapping_with_memtable total_tombstone_purge_failure_due_to_overlapping_with_uncompacting_sstable When a tombstone is not expired (e.g. doesn't satisfy "gc_before" or grace period), it can currently be counted as a failure due to overlapping with either a memtable or an uncompacting sstable. So the last 2 metrics include noise from unexpired tombstones. What we should do is to only account for expired tombstones in all those 3 metrics. We lose the info of knowing the amount of tombstones processed by compaction; now we'll only know about the expired ones. But those metrics were primarily added for explaining why expired tombstones cannot be removed. We could have alternatively added a new field purge_failure_due_to_being_unexpired or something, but it requires adding a new field to compaction_history. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-737. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#28669	2026-02-20 10:43:58 +02:00
Botond Dénes	0bf4c68af5	Merge 'docs: fix link to docker build readme in the README.MD' from Marcin Szopa Links were pointing to the `debian` subdirectory. However, the docker build was refactored to use `redhat`: `1abf981a73`, see https://github.com/scylladb/scylladb/pull/22910 No backport, just README link fixes. Closes scylladb/scylladb#28699 * github.com:scylladb/scylladb: docs: fix path to the build_docker.sh which was moved from debian to redhat subdirectory docs: fix link to docker build README.MD	2026-02-20 08:21:46 +02:00
Botond Dénes	51a25c8af3	test/boost/batchlog_manager_test: add tests for v1 batchlog The v1 table is used while upgrading from a pre-v2 version. We need tests to ensure it still works.	2026-02-20 07:03:46 +02:00
Botond Dénes	83344dacbd	test/boost/batchlog_manager_test: make prepare_batches() work with both v1 and v2 Make the actual table name a parameter and add logic to adapt to the variant used. Also add dump_to_log::yes to is_rows() invocation to help debugging tests.	2026-02-20 07:03:46 +02:00
Botond Dénes	2956714e19	test/boost/batchlog_manager_test: fix indentation	2026-02-20 07:03:46 +02:00
Botond Dénes	23732227fe	test/boost/batchlog_manager_test: extract prepare_batches() method To be shared between multiple tests in future commits. Indentation is left broken.	2026-02-20 07:03:46 +02:00
Botond Dénes	af26956bb4	test/lib/cql_assertions: is_rows(): add dump parameter When set to true, the query results will be logged by the testlog logger with debug level. A huge help when debugging failures around cql assertions: seeing the actual query result is often enough to immediately understand why the test failed.	2026-02-20 07:03:46 +02:00
Botond Dénes	48e9b3d668	tools/scylla-sstable: extract query result printers To cql3/query_result_printer.hh. Allowing for other users, outside of tools.	2026-02-20 07:03:46 +02:00
Botond Dénes	978627c4e1	tools/scylla-sstable: add std::ostream& arg to query result printers Make them more general-purpose, in preparation for extracting them to their own header.	2026-02-20 07:03:46 +02:00
Botond Dénes	0549b61d55	repair/row_level: repair_flush_hints_batchlog_handler(): add all_replayed to finish log Provides visibility into whether batchlog replay was successful or not.	2026-02-20 07:03:46 +02:00
Botond Dénes	dd50bd9bd4	db/batchlog_manager: re-add v1 support system.batchlog will still have to be used while the cluster is upgrading from an older version, which doesn't know v2 yet. Re-add support for replaying v1 batchlogs. The switch to v2 will happen after the BATCHLOG_V2 cluster feature is enabled. The only external user -- storage_proxy -- only needs a minor adjustment: switch between the table names. The rest is handled transparently by the db/batchlog.hh interface and the batchlog_manager.	2026-02-20 07:03:46 +02:00
Botond Dénes	8ffa3d32c0	db/batchlog_manager: return all_replayed from process_batch() process_batch() currently returns stop_iteration::no from all control paths. This is not useful. Return the all_replayed output param instead. This requires making the batch() lambda a coroutine, but considering the amount of work process_batch() does (send multiple writes), this should be inconsequential.	2026-02-20 07:03:46 +02:00
Botond Dénes	091b43f54b	db/batchlog_manager: process_batch() fix indentation	2026-02-20 07:03:46 +02:00
Botond Dénes	ef2b8b4819	db/batchlog_manager: make batch() a standalone function Currently it is a huge lambda. Deserves to be a standalone function, to make the replay_all_failed_batches() easier to read and modify.	2026-02-20 07:03:46 +02:00
Botond Dénes	ca2bbbad97	db/batchlog_manager: make struct stats public Need to rename stats() -> get_stats() because it shadows the now exported type name.	2026-02-20 07:03:46 +02:00
Botond Dénes	f8bfaedb6e	db/batchlog_manager: allocate limiter on the stack Now that replay_all_failed_batches() is a coroutine, there is no need to make it a shared pointer anymore.	2026-02-20 07:03:46 +02:00
Botond Dénes	ac059dadc6	db/batchlog_manager: add feature_service dependency Will be needed to check for batchlog_v2 feature.	2026-02-20 07:03:46 +02:00
Botond Dénes	c901ab53d2	gms/feature_service: add batchlog_v2 feature	2026-02-20 07:03:45 +02:00
Avi Kivity	66bef0ed36	lua, tools: adjust for lua 5.5 lua_newstate seed parameter Lua 5.5 adds a seed parameter to lua_newstate(), provide it with a strong random seed. Closes scylladb/scylladb#28734	2026-02-20 06:52:37 +02:00
Avi Kivity	27a5502f14	Merge 'Reapply "main: test: add future and abort_source to after_init_func"' from Marcin Maliszkiewicz The patchset fixes abort_source implementation for perf-alternator and perf-cql-raw. It moves run_standalone function to common code in perf.hh with necessary templating. We also add extensive testing so that it's more difficult to break the tooling in the future. Fixes SCYLLADB-560 Backport: no, internal tooling improvement Closes scylladb/scylladb#28541 * github.com:scylladb/scylladb: test: cluster: add tests for perf tools test: perf: fix port race condition on startup in connect workload test: perf: prepare benchmarks to bind to custom host test: perf: make perf-alterantor remote port configurable test: perf: fix ASAN leak warnings in perf-alternator Reapply "main: test: add future and abort_source to after_init_func"	2026-02-19 19:12:46 +02:00
Dawid Mędrek	c9d192c684	Merge 'raft topology: prevent crashes of multiple nodes' from Patryk Jędrzejczak Some assertions in the Raft-based topology are likely to cause crashes of multiple nodes due to the consistent nature of the Raft-based code. If the failing assertion is executed in the code run by each follower (e.g., the code reloading the in-memory topology state machine), then all nodes can crash. If the failing assertion is executed only by the leader (e.g., the topology coordinator fiber), then multiple consecutive group0 leaders will chain-crash until there is no group0 majority. Crashing multiple nodes is much more severe than necessary. It's enough to prevent the topology state machine from making more progress. This will naturally happen after throwing a runtime error. The problematic fiber will be killed or will keep failing in a loop. Note that it should be safe to block the topology state machine, but not the whole group0, as the topology state machine is mostly isolated from the rest of group0. We replace some occurrences of `on_fatal_internal_error` and `SCYLLA_ASSERT` with `on_internal_error`. These are not all occurrences, as some fatal assertions make sense, for example, in the bootstrap procedure. We also raise an internal error to prevent a segmentation fault in a few places. Fixes #27987 Backporting this PR is not required, but we can consider it at least for 2026.1 because: - it is LTS, - the changes are low-risk, - there shouldn't be many conflicts. Closes scylladb/scylladb#28558 * github.com:scylladb/scylladb: raft topology: prevent accessing nullptr returned by topology::find raft topology: make some assertions non-crashing	2026-02-19 16:50:03 +01:00
Marcin Maliszkiewicz	22c3d8d609	Merge 'db/config: enable table audit by default' from Piotr Smaron In https://github.com/scylladb/scylladb/pull/27262 table audit has been re-enabled by default in `scylla.yaml`, logging certain categories to a table, which should make new Scylla deployments have audit enabled. Now, in the next release, we also want to enable audit in `db/config.cc`, which should enable audit for all deployments, which don't explicitly configure audit otherwise in `scylla.yaml` (or via cmd line). BTW. Because this commit aligns audit's default config values in `db/config.cc` to those of `scylla.yaml`, `docs/reference/configuration-parameters.rst`, which is based on `db/config.cc` will start showing that table audit is the default. Refs: https://github.com/scylladb/scylladb/issues/28355 Refs: https://scylladb.atlassian.net/browse/SCYLLADB-222 No backport: table audit has been enabled in 2026.1 in `scylla.yaml`, and should be always on starting from the next release, which is the release we're currently merging to (2026.2). Closes scylladb/scylladb#28376 * github.com:scylladb/scylladb: docs: decommission: note audit ks may require ALTERing docs: mention table audit enabled by default audit: disable DDL by default db/config: enable table audit by default test/cluster: fix `test_table_desc_read_barrier` assertion test/cluster: adjust audit in tests involving decommissioning its ks audit_test: fix incorrect config in `test_audit_type_none`	2026-02-19 16:30:11 +01:00
Pavel Emelyanov	b4b9b547ce	replica: Remove unused sched groups from keyspace and table configs Compaction and statement groups are carried over on those configs, but are in fact unused. Drop both. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#28540	2026-02-19 15:47:31 +01:00
Patryk Jędrzejczak	45115415fb	Merge 'Parametrize and merge several restoration test cases' from Pavel Emelyanov There are four tests that check how restore with primary-replica-only option works in various scopes and topologies. Cases that check same-racks and same-datacenters are very very similar, so are those that check different-racks and different-datacenters. Parametrizing them and merging saves lots of code (+30 lines, -116 lines) It's probably worth merging the resulting same-domain with different-domain tests, because the similarity is still large in both, but the result becomes too if-y, so not done here. Maybe later. Improving tests, not backporting Closes scylladb/scylladb#28569 * https://github.com/scylladb/scylladb: test: Merge test_restore_primary_replica_different_... tests test: Merge test_restore_primary_replica_same_... tests test: Don't specify expected_replicas in test_restore_primary_replica_different_dc_scope_all test: Remove local r_servers variable from test_restore_primary_replica_different_dc_scope_all	2026-02-19 15:42:55 +01:00
Pavel Emelyanov	26372e65df	Merge 's3_perf: Fix the s3 perf test' from Ernest Zaslavsky Fix the build of the test and the upload operation flow No need to backport since it is only a test we barely use Closes scylladb/scylladb#28595 * github.com:scylladb/scylladb: s3_perf: fix upload operation flow s3_perf: fix the CMake build	2026-02-19 15:31:43 +02:00
Avi Kivity	7ec710c250	Merge 'tablets: Reduce per-shard migration concurrency to 2' from Tomasz Grabiec Tablet migration keeps sstable snapshot during streaming, which may cause temporary increase in disk utilization if compaction is running concurrently. SSTables compacted away are kept on disk until streaming is done with them. The more tablets we allow to migrate concurrently, the higher disk space can rise. When the target tablet size is configured correctly, every tablet should own about 1% of disk space. So concurrency of 4 shouldn't put us at risk. But target tablet size is not chosen dynamically yet, and it may not be aligned with disk capacity. Also, tablet sizes can temporarily grow above the target, up to 2x before the split starts, and some more because splits take a while to complete. To reduce the impact from this, reduce concurrency of migration. Concurrency of 2 should still be enough to saturate resources on the leaving shard. Also, reducing concurrency means that load balancing is more responsive to preemption. There will be less bandwidth sharing, so scheduled migrations complete faster. This is important for scale-out, where we bootstrap a node and want to start migrations to that new node as soon as possible. Refs scylladb/siren#15317 Closes scylladb/scylladb#28563 * github.com:scylladb/scylladb: tablets, config: Reduce migration concurrency to 2 tablets: load_balancer: Always accept migration if the load is 0 config, tablets: Make tablet migration concurrency configurable	2026-02-19 15:31:43 +02:00
Dawid Mędrek	fae71f79c2	test: raft: Add test_aborting_wait_for_state_change	2026-02-19 14:21:01 +01:00
Karol Nowacki	ca7f9a8baf	vector_search: fix TLS server name with IP SNI works only with DNS hostnames. Adding an IP address causes warnings on the server side. This change adds SNI only if it is not an IP address. This change has no unit tests, as this behavior is not critical, since it causes a warning on the server side. The critical part, that the server name is verified, is already covered. Fixes: VECTOR-528	2026-02-19 13:00:03 +01:00
Karol Nowacki	6205aad601	vector_search: add warn log for failed ann requests In order to simplify troubleshooting connection problems, this patch adds an extra warn log that prints the error for the vector search request whenever it fails.	2026-02-19 13:00:03 +01:00
Dawid Mędrek	e4f2b62019	raft: Describe exception types for wait_for_state_change and wait_for_leader The methods of `raft::server` are abortable and if the passed `abort_source` is triggered, they throw `raft::request_aborted`. We document that. Although `raft::server` is an interface, this is consistent with the descriptions of its other methods.	2026-02-19 12:47:14 +01:00
Dawid Mędrek	c36623baad	raft: Await instead of returning future in wait_for_state_change The `try-catch` expression is pretty much useless in its current form. If we return the future, the awaiting will only be performed by the caller, completely circumventing the exception handling. As a result, instead of handling `raft::request_aborted` with a proper error message, the user will face `seastar::abort_requested_exception` whose message is cryptic at best. It doesn't even point to the root of the problem. Fixes SCYLLADB-665	2026-02-19 12:47:14 +01:00
Marcin Maliszkiewicz	de4e5e10af	test: perf: fix prepared statements logic in perf-simple-query Due to the lack of checks present in process_execute_internal from transport/server.cc, the needs_authorization bool was always set to true, doing some extra work (check_access()) for each request. We mirror the logic in this patch in the test env which perf-simple-query uses. This can also potentially improve runtime of unittests (marginally). Note that the bug is only in the perf tool, not in Scylla itself; the fix decreases insns/op by around 10%: Before: 41065 insns/op After: 37452 insns/op Command: ./build/release/scylla perf-simple-query --duration 5 --smp 1 Fixes https://github.com/scylladb/scylladb/issues/27941 Closes scylladb/scylladb#28704	2026-02-19 12:42:07 +02:00
Avi Kivity	58a662b9db	dist: refresh container base image (ubi9-minimal) Using an outdated image can cause problems when `microdnf update` runs, if the distribution doesn't maintain good update hygiene. Although, I suspect that when update failures happen they're really caused by propagation delay of packages to mirrors. Fix by using --pull=always to get a fresh image. Ref https://scylladb.atlassian.net/browse/SCYLLADB-714 Closes scylladb/scylladb#28680	2026-02-19 12:42:43 +03:00
Ferenc Szili	f1bc17bd4c	load_stats: fix race condition when computing sum_tablet_sizes In storage_service::load_stats_for_tablet_based_tables(), we are passing a reference to sum_tablet_sizes to the lambda which increments this value on each shard via map_reduce0(). This means we could have a race condition because this is executed on separate threads/CPUs. This patch fixes the problem by collecting the sums by shard into a vector, then summing those up. Refs: SCYLLADB-678 Closes scylladb/scylladb#28703	2026-02-19 12:29:25 +03:00
Avi Kivity	dee868b71a	interval: avoid clang 23 warning on throw statement in potentially noexcept function interval_data's move constructor is conditionally noexcept. It contains a throw statement for the case that the underlying type's move constructor can throw; that throw statement is never executed if we're in the noexcept branch. Clang 23 however doesn't understand that, and warns about throwing in a noexcept function. Fix that by rewriting the logic using seastar::defer(). In the noexcept case, the optimizer should eliminate it as dead code. Closes scylladb/scylladb#28710	2026-02-19 12:24:20 +03:00
Ernest Zaslavsky	45d824e0fe	s3_perf: fix upload operation flow Correct the upload operation logic. The previous flow incorrectly checked for the test file on S3 even when performing operations that do not download the file, such as uploads.	2026-02-19 11:14:59 +02:00
Botond Dénes	b637e17b19	db/config: don't use RBNO for scaling Remove bootstrap and decommission from allowed_repair_based_node_ops. Using RBNO over streaming for these operations has no benefits, as they are not exposed to the out-of-date replica problem that replace, removenode and rebuild are. On top of that, RBNO is known to have problems with empty user tables. Using streaming for bootstrap and decommission is safe and faster than RBNO in all conditions, especially when the table is small. One test needs adjustment as it relies on RBNO being used for all node ops. Fixes: SCYLLADB-105 Closes scylladb/scylladb#28080	2026-02-19 09:51:09 +01:00
Calle Wilund	8e71a6f52a	gcp: Add handling of 429 (too many requests) to exponential backoff Fixes: SCYLLADB-611 Adds http error code 429 to codes handled by exponential backoff. Closes scylladb/scylladb#28588	2026-02-19 09:42:39 +01:00
Marcin Maliszkiewicz	3417d50add	test: cluster: add tests for perf tools It checks if all workloads can be properly executed with successful startup and teardown. Especially testing alternator in remote mode is important because it's invoked like this during pgo training in pgo.py. Test runtime: Release - 24s Debug - 1m 15s Test time consists mostly of Scylla startup in various modes.	2026-02-19 09:33:10 +01:00
Marcin Maliszkiewicz	c69534504c	test: perf: fix port race condition on startup in connect workload Other workloads at startup call prepopulate() which connects with retry loop therefore it waits until cql port is open. This commit adds a single place where we will wait for the port for all workloads. Timeout is set to 5 minutes so that even slowest machines are able to start.	2026-02-19 09:33:10 +01:00
Marcin Maliszkiewicz	828f2fbdb1	test: perf: prepare benchmarks to bind to custom host This is useful for tests where we use local networks like 127.5.5.5 to avoid port and host collisions.	2026-02-19 09:33:10 +01:00
Marcin Maliszkiewicz	9f2b97bef4	test: perf: make perf-alterantor remote port configurable It could be a useful option to have.	2026-02-19 09:33:10 +01:00
Marcin Maliszkiewicz	f5a212e91e	test: perf: fix ASAN leak warnings in perf-alternator Those were intentional as test process is short lived but when we add automated tests in the following commits we expect clean exit, with 0 exit code.	2026-02-19 09:33:10 +01:00
Marcin Maliszkiewicz	0c76c73e34	Reapply "main: test: add future and abort_source to after_init_func" This reverts commit `ceec703bb7`. The commit was fixed with abort source handling for alternator standalone path so it's safe to reapply.	2026-02-19 09:33:10 +01:00
Piotr Dulikowski	7d6f734a51	dictionary compression: add missing co_awaits on get_units There is a handful of places in the code related to dictionary compression which calls get_units to acquire semaphore units but the returned future is not awaited, seemingly by mistake. The result of get_units is assigned to a variable - which is reasonable at a glance because the semaphore units need to be assigned to a variable in order to control their scope - but at the same time if co_await is mistakenly omitted, like here, doing so will silence the nodiscard check of seastar::future and, effectively, the get_units call will be nearly useless. Unfortunately, this is an easy mistake to make. Fix the places in the code that acquire semaphore units via get_units but never await the future returned by it. I found them by manual code inspection, so I hope that I didn't miss any. Closes scylladb/scylladb#28581	2026-02-18 16:40:40 +01:00
Ernest Zaslavsky	4026b54a5e	s3_perf: fix the CMake build Fix the CMake build of the perf_s3_client by adding the necessary linkage with the jsoncpp library.	2026-02-18 17:12:08 +02:00
Piotr Smaron	797c5cd401	docs: decommission: note audit ks may require ALTERing With audit feature enabled, it's not immediately obvious that its pseudo-system keyspace `audit` may require adjusting its RF across DCs before decommissioning a node, and this should be documented.	2026-02-18 15:14:57 +01:00
Piotr Smaron	65eec6d8e7	docs: mention table audit enabled by default Also align the documentation with the current audit settings.	2026-02-18 15:14:57 +01:00
Piotr Smaron	c30607d80b	audit: disable DDL by default DDL audit category doesn't make sense if it's enabled by default on its own, as no DDL statements are going to be audited if audit_keyspaces/audit_tables setting is empty. This may be counter-intuitive to our users, who may expect to actually see these statements logged if we're enabling this by default. Also, it doesn't make sense to enable a setting by default if it has no effect. Additionally, list all possible audit categories for the user's convenience.	2026-02-18 15:14:57 +01:00
Piotr Smaron	08dc1008ba	db/config: enable table audit by default In https://github.com/scylladb/scylladb/pull/27262 table audit was re-enabled by default in `scylla.yaml`, logging certain categories to a table, which should make new Scylla deployments have audit enabled. Now, in the next release, we also want to enable audit in `db/config.cc`, which should enable audit for all deployments that don't otherwise explicitly configure audit in `scylla.yaml` (or via the command line). BTW, because this commit aligns audit's default config values in `db/config.cc` with those of `scylla.yaml`, `docs/reference/configuration-parameters.rst`, which is based on `db/config.cc`, will start showing that table audit is the default. Refs: https://github.com/scylladb/scylladb/issues/28355 Refs: https://scylladb.atlassian.net/browse/SCYLLADB-222	2026-02-18 15:14:57 +01:00
Piotr Smaron	95ee4a562c	test/cluster: fix `test_table_desc_read_barrier` assertion The test assertion `desc_schema[0] == desc_schema[1]` does a direct list comparison, which is order-sensitive. Before enabling audit by default, both nodes would return only the test keyspace/table, so the order didn't matter. With audit enabled, there will be multiple keyspaces, and they can be returned in a different order by different nodes.	2026-02-18 15:14:57 +01:00
Piotr Smaron	2e12b83366	test/cluster: adjust audit in tests involving decommissioning its ks When table audit is enabled, Scylla creates the "audit" ks with NetworkTopologyStrategy and RF=3. During node decommission, streaming can fail for the audit ks with "zero replica after the removal" when all nodes from a DC are removed, and so we have to ALTER the audit ks to either zero the number of its replicas, to allow for a clean decommission, or have them in the 2nd DC. BTW. https://github.com/scylladb/scylladb/issues/27395 is the same change, but in the dtests repository.	2026-02-18 15:14:55 +01:00
Piotr Smaron	0cf20fa15a	audit_test: fix incorrect config in `test_audit_type_none` Passing Python `None` to setup is incorrect, because config updates are sent as a dict and `None` is treated as "unset" - meaning: use Scylla's default. Use the explicit string "none" to guarantee that audit is disabled.	2026-02-18 15:12:26 +01:00
Asias He	1be80c9e86	repair: Skip auto repair for tables using RF one There is no point in running repair for tables using RF one. Row level repair will skip it, but the auto repair scheduler will keep scheduling such repairs since repair_time could not be updated. Skip such repairs at the scheduler level for auto repair. If the request is issued by a user, we still have to schedule such a repair; otherwise the user request will never be finished. Fixes SCYLLADB-561 Closes scylladb/scylladb#28640	2026-02-18 14:32:50 +02:00
Andrzej Jackowski	4221d9bbfd	docs: improve examples in `Handling Audit Failures` section This commit introduces four changes: - In the `table` example, singular forms (node, partition) are changed to plural forms (nodes, partitions). Currently, the default `table` audit configuration is RF=3 and writes use CL=ONE. Therefore, a `table` audit log write failure should not be caused by a single node unavailability, and plural forms are more adequate. - In the `table` example, unreachability due to network issues is mentioned because with RF=3, audit failure due to network problems is more likely to happen than a simultaneous failure of three nodes (such network failures happened in SCYLLADB-706). - In the `syslog` example, a slash `/` is changed to `or`, so `table` and `syslog` examples have similar structure. - As the `syslog` line is already being changed, I also change `unix` to `Unix`, as the capitalized form is the correct one. Refs SCYLLADB-706 Closes scylladb/scylladb#28702	2026-02-18 13:10:01 +01:00
Botond Dénes	3bfd47da4b	Merge 'transport: fix connection code to consume only initially taken semaphore units' from Marcin Maliszkiewicz The connection's `cpu_concurrency_t` struct tracks the state of a connection to manage the admission of new requests and prevent CPU overload during connection storms. When a connection holds units (allowed only 0 or 1), it is considered to be in the "CPU state" and contributes to the concurrency limits used when accepting new connections. The bug stems from the fact that `counted_data_source_impl::get` and `counted_data_sink_impl::put` calls can interleave during execution. This occurs because of `should_parallelize` and `_ready_to_respond`, the latter being a future chain that can run in the background while requests are being read. Consequently, while reading request (N), the system may concurrently be writing the response for request (N-1) on the same connection. This interleaving allows `return_all()` to be called twice before the subsequent `consume_units()` is invoked. While the second `return_all()` call correctly returns 0 units, the matching `consume_units()` call would mistakenly take an extra unit from the semaphore. Over time, a connection blocked on a read operation could end up holding an unreturned semaphore unit. If this pattern repeats across multiple connections, the semaphore units are eventually depleted, preventing the server from accepting any new connections. The fix ensures that we always consume the exact number of units that were previously returned. With this change, interleaved operations behave as follows: get() return_all — returns 1 unit put() return_all — returns 0 units get() consume_units — takes back 1 unit put() consume_units — takes back 0 units Logically, the networking phase ends when the first network operation concludes. But more importantly, when a network operation starts, we no longer hold any units. 
Other solutions are possible but the chosen one seems to be the simplest and safest to backport. Fixes SCYLLADB-485 Backport: all supported affected versions, bug introduced with initial feature implementation in: `ed3e4f33fd` Closes scylladb/scylladb#28530 * github.com:scylladb/scylladb: test: auth_cluster: add test for hanged AUTHENTICATING connections transport: fix connection code to consume only initially taken semaphore units	2026-02-18 13:48:49 +02:00
Marcin Szopa	9217f85e99	docs: fix path to the build_docker.sh which was moved from debian to redhat subdirectory	2026-02-18 12:19:27 +01:00
Marcin Szopa	e66bf4a6f5	docs: fix link to docker build README.MD Link was pointing to the old place of the README. It was moved in the `1abf981a73`	2026-02-18 12:12:46 +01:00
Piotr Dulikowski	b9db3c9c75	Merge 'Add consistent permissions cache' from Marcin Maliszkiewicz This patchset replaces the permissions cache based on loading_cache with a new unified (permissions and roles), full, coherent auth cache. The reason for the change is that we want to improve behavior under stress and simplify operation manuals. The new cache doesn't require any tweaking, and it behaves particularly better in scenarios with lots of schema entities (e.g. tables) combined with unprepared queries. The old cache can generate a few thousand extra internal tps due to cache refresh. A benchmark of unprepared statements (just to populate the cache) with 1000 tables shows a 3k tps reduction in internal reads and a 9.1% reduction in median instructions per op. That many tables were used to show the resource impact; the cache could be filled with other resource types to show the same improvement. Backport: no, it's a new feature. Fixes https://github.com/scylladb/scylladb/issues/7397 Fixes https://github.com/scylladb/scylladb/issues/3693 Fixes https://github.com/scylladb/scylladb/issues/2589 Fixes https://scylladb.atlassian.net/browse/SCYLLADB-147 Closes scylladb/scylladb#28078 * github.com:scylladb/scylladb: test: boost: add auth cache tests auth: add cache size metrics docs: conf: update permissions cache documentation auth: remove old permissions cache auth: use unified cache for permissions auth: ldap: add permissions reload to unified cache auth: add permissions cache to auth/cache auth: add service::revoke_all as main entry point auth: explicitly life-extend resource in auth_migration_listener	2026-02-18 12:03:20 +01:00
Tomasz Grabiec	af0b5d0894	Merge 'tablets global barrier: acknowledge barrier_and_drain from all nodes' from Petr Gusev Before this series, the `global_barrier` used during tablet migration did not guarantee that `barrier_and_drain` was acknowledged by tablet replicas. As a result, if a request coordinator was fenced out, stale requests from previous topology versions could still execute on replicas in parallel with new requests from incompatible topology versions. For example, stale requests from `tablet_transition_stage::streaming` could run concurrently with new requests from `tablet_transition_stage::use_new`. This caused several issues, including [#26864](https://github.com/scylladb/scylladb/issues/26864) and [#26375](https://github.com/scylladb/scylladb/issues/26375). This PR fixes the problem in two steps: * Replicas now hold an erm strong pointer while handling RPCs from coordinators. * The tablet barrier is updated to require `barrier_and_drain` acknowledgments from all nodes. A description of alternative solutions and various tradeoffs can be found in [this document](https://docs.google.com/document/d/1tpDtPOsrGaZGBYkdwOKApQv4eMzrBydMM1GaYYmaPgg/edit?pli=1&tab=t.0#heading=h.vidfy0hrz5j7). [A previous attempt at these changes](https://github.com/scylladb/scylladb/pull/27185). Fixes [scylladb/scylladb#26864](https://github.com/scylladb/scylladb/issues/26864) Fixes [scylladb/scylladb#26375](https://github.com/scylladb/scylladb/issues/26375) backport: needs backport to 2025.4 (fixes #26864 for tablets LWT) Closes scylladb/scylladb#27492 * github.com:scylladb/scylladb: tests: extract get_topology_version helper global tablets barrier: require all nodes to ack barrier_and_drain topology_coordinator: pass raft_topology_cmd by value storage_proxy: hold erms in replica handlers token_metadata: improve stale versions diagnostics	2026-02-18 11:45:56 +01:00
Pavel Emelyanov	0c443d5764	gms: Use newer seastar get_host_by_name API The hostent::addr_list is deprecated in favor of address_entry::addr field that contains the very same addresses. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#28566	2026-02-18 12:24:35 +02:00
Pavel Emelyanov	5b740afe9a	database: Remove streaming sched group getter All users of it had been updated to get the streaming group elsewhere, so this getter is no longer needed. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#28527	2026-02-18 12:23:35 +02:00
Avi Kivity	c5a1f44731	tools: toolchain: switch from ccache to sccache sccache combines the functions of ccache and distcc, and promises to support C++20 modules in the future. Switch to sccache in anticipation of modules support. The documentation is adjusted since cache will be persistent for sccache without further work. Closes scylladb/scylladb#28524	2026-02-18 12:23:12 +02:00
Botond Dénes	36167a155e	Merge 'Remove map_to_key_value() helpers from API' from Pavel Emelyanov There are some places that get `map<foo, bar>` and return it to the caller as `"key": string(foo), "value": string(bar)` json. For that there's a `map_to_key_value()` helper in api.hh that re-formats the map into a vector of json elements and returns it, letting seastar json-ize that vector. Recently, a stream_range_as_array() helper appeared in seastar that helps stream any range without converting it into an intermediate collection. Some of the hottest users of `map_to_key_value()` have already been converted; this PR converts the few remaining ones and removes the helper in question to encourage further usage of stream_range_as_array(). Code cleanup, not backporting Closes scylladb/scylladb#28491 * github.com:scylladb/scylladb: api: Remove map_to_key_value() helpers api: Streamify view_build_statuses handler api: Streamify few more storage_service/ handlers api: Add map_to_json() helper api: Coroutinize view_build_statuses handler	2026-02-18 12:22:00 +02:00
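The difference between building an intermediate collection and streaming can be sketched in Python. This only illustrates the idea behind `stream_range_as_array()`. The generator name `stream_map_as_key_value_json` is hypothetical, and the real handlers are C++ code that write through Seastar's JSON facilities.

```python
import json
from typing import Iterator, Mapping


def stream_map_as_key_value_json(mapping: Mapping) -> Iterator[str]:
    """Yield a JSON array of {"key": ..., "value": ...} objects piece by piece.

    No intermediate list of converted elements is built; each entry is
    formatted only when the consumer asks for the next chunk.
    """
    yield "["
    for i, (key, value) in enumerate(mapping.items()):
        if i:
            yield ","
        yield json.dumps({"key": str(key), "value": str(value)})
    yield "]"


ownership = {"10.0.0.1": 0.25, "10.0.0.2": 0.75}
body = "".join(stream_map_as_key_value_json(ownership))
print(body)
```

A real server would write each yielded chunk to the response stream instead of joining them. The join here just makes the result easy to inspect.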
Ernest Zaslavsky	196f7cad93	nodetool: fix handling of "--primary-replica-only" argument The "--primary-replica-only" ("-pro") flag was previously ignored by the `restore` operation. This patch ensures the argument is parsed and applied correctly. Closes scylladb/scylladb#28490	2026-02-18 12:21:27 +02:00
Pavel Emelyanov	bce43c6b20	api: Remove unused (lost) local variable Lost when the get_range_to_endpoint_map handler was implemented for real (`48c3c94aa6`) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#28489	2026-02-18 12:20:30 +02:00
Ernest Zaslavsky	afac984632	s3_client: reorganize tests in part_size_calculation_test just group all BOOST_REQUIRE_EXCEPTION tests in one block and remove artificial scopes	2026-02-18 12:12:04 +02:00
Ernest Zaslavsky	1a20877afe	s3_client: switch using s3 limits constants in tests Instead of using magic numbers, switch to using s3 limit constants to make it clearer what is tested and why	2026-02-18 12:12:04 +02:00
Ernest Zaslavsky	d763bdabc2	s3_client: fix the s3::range max object size In the s3::Range class, start using the s3 global constant for two reasons: 1) uniformity: there is no need to introduce a semantically identical constant in each class; 2) the value was wrong	2026-02-18 12:12:04 +02:00
Ernest Zaslavsky	24e70b30c8	s3_client: remove "aws" prefix from object limits constants remove "aws" prefix from object limits constants since it is irrelevant and unnecessary when sitting under s3 namespace	2026-02-18 12:12:04 +02:00
Ernest Zaslavsky	329c156600	s3_client: make s3 object limits accessible Make the s3 limits constants publicly accessible so they can be reused later	2026-02-18 12:12:04 +02:00
Alex	c44ad31d44	db/view: gate detached view-builder callbacks during shutdown Detached migration callbacks (on_create_view, on_update_view, on_drop_view) can race with view_builder::drain() teardown. Add a lifetime gate to view_builder and wire callback launches through _ops_gate.hold() so each detached dispatch future is tracked until it completes (finally keeps the hold alive). During shutdown, drain() now waits for all tracked callback work with _ops_gate.close(). This ensures drain does not proceed past callback lifetime while shutdown is in progress, and ignores only gate_closed_exception at callback entry as the expected shutdown path.	2026-02-18 11:56:41 +02:00
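The gate pattern described here can be sketched in Python with asyncio. This is a minimal model, not Seastar's `seastar::gate`. The `Gate`, `GateClosed`, and `launch_detached` names are hypothetical, and the callback body is a placeholder.

- Each detached launch takes a hold before it is scheduled and releases it only when the callback finishes.
- `close()` rejects new holds and waits for the outstanding ones.
- A launch that finds the gate closed is treated as the expected shutdown path.

```python
import asyncio


class GateClosed(Exception):
    """Stand-in for seastar::gate_closed_exception."""


class Gate:
    """Minimal lifetime gate: counts in-flight holders, close() waits for them."""

    def __init__(self) -> None:
        self._count = 0
        self._closed = False
        self._drained = asyncio.Event()
        self._drained.set()

    def hold(self) -> "_Holder":
        if self._closed:
            raise GateClosed()
        self._count += 1
        self._drained.clear()
        return _Holder(self)

    def _leave(self) -> None:
        self._count -= 1
        if self._count == 0:
            self._drained.set()

    async def close(self) -> None:
        self._closed = True          # no new holds from now on
        await self._drained.wait()   # wait for every outstanding holder


class _Holder:
    def __init__(self, gate: Gate) -> None:
        self._gate = gate

    def __enter__(self) -> "_Holder":
        return self

    def __exit__(self, *exc) -> bool:
        self._gate._leave()
        return False


async def demo() -> list:
    gate, log, tasks = Gate(), [], []

    async def on_view_event(name: str) -> None:
        await asyncio.sleep(0.01)    # placeholder for the real callback work
        log.append(name)

    def launch_detached(name: str) -> None:
        try:
            holder = gate.hold()     # taken before the work is scheduled
        except GateClosed:
            log.append(f"skipped {name}")  # expected during shutdown
            return

        async def run() -> None:
            with holder:             # released only when the callback completes
                await on_view_event(name)

        tasks.append(asyncio.create_task(run()))

    launch_detached("on_create_view")
    await gate.close()               # like drain(): waits for the detached callback
    log.append("drained")
    launch_detached("on_drop_view")
    return log


print(asyncio.run(demo()))  # ['on_create_view', 'drained', 'skipped on_drop_view']
```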
Tomasz Grabiec	d33d38139f	test_tablets_parallel_decommission: Fix flakiness due to delayed task appearance Currently, the test assumes that when 'topology_coordinator_pause_before_processing_backlog: waiting' is logged, the task for decommission must be there. This was based on the assumption that topology coordinator is idle and decommission request wakes it up. But if the server is slow enough, it may still be running the load balancer in reaction to table creation, and block on that injection point before decommission request was added. Fix by waiting for the task to appear rather than the injection. Fixes SCYLLADB-715	2026-02-18 01:02:50 +01:00
Tomasz Grabiec	2454de4f8f	test: cluster: task_manager_client: Introduce wait_task_appears()	2026-02-18 01:02:44 +01:00
Tomasz Grabiec	e14eca46af	tests: pylib: util: Add exponential backoff to wait_for Allows balancing the trade-off between fast execution in case the condition is satisfied quickly and not adding load when it's not.	2026-02-18 01:02:19 +01:00
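A minimal sketch of polling with exponential backoff, in the spirit of this change to the pylib `util` module. The function name `wait_for_with_backoff` and its parameters (`period`, `backoff_factor`, `max_period`) are hypothetical and are not claimed to match the real `wait_for`. The predicate convention (return None until the condition holds) is also an assumption.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def wait_for_with_backoff(
    pred: Callable[[], Awaitable[Optional[T]]],
    deadline: float,
    period: float = 0.01,
    backoff_factor: float = 2.0,
    max_period: float = 1.0,
) -> T:
    """Poll `pred` until it returns something other than None.

    `deadline` is an absolute time.monotonic() value. The sleep between polls
    starts at `period` and is multiplied by `backoff_factor` after each miss,
    capped at `max_period`.
    """
    while True:
        result = await pred()
        if result is not None:
            return result
        now = time.monotonic()
        if now >= deadline:
            raise TimeoutError("condition not met before the deadline")
        await asyncio.sleep(min(period, deadline - now))  # never sleep past the deadline
        period = min(period * backoff_factor, max_period)


async def demo() -> tuple:
    calls = 0

    async def task_appeared() -> Optional[str]:
        nonlocal calls
        calls += 1
        return "decommission-task" if calls >= 4 else None  # appears on the 4th poll

    found = await wait_for_with_backoff(task_appeared, time.monotonic() + 5)
    return found, calls


print(asyncio.run(demo()))  # ('decommission-task', 4)
```

The short initial period keeps fast cases fast: here the three misses cost about 0.01 + 0.02 + 0.04 s. The cap bounds how rarely a slow condition is re-checked.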
Marcin Maliszkiewicz	741969cf4c	test: boost: add auth cache tests The cache is covered already with general auth dtests but some cases are more tricky and easier to express directly as calls to cache class. For such tests boost test file was added.	2026-02-17 18:18:40 +01:00
Marcin Maliszkiewicz	c11eb73a59	auth: add cache size metrics	2026-02-17 18:18:40 +01:00
Marcin Maliszkiewicz	a059798de9	docs: conf: update permissions cache documentation	2026-02-17 18:18:40 +01:00
Marcin Maliszkiewicz	a23e503e7b	auth: remove old permissions cache	2026-02-17 17:56:27 +01:00
Marcin Maliszkiewicz	9d9184e5b7	auth: use unified cache for permissions	2026-02-17 17:56:27 +01:00
Marcin Maliszkiewicz	7eedf50c12	auth: ldap: add permissions reload to unified cache The LDAP server may change role-chain assignments without notifying Scylla. As a result, effective permissions can change, so some form of polling is required. Currently, this is handled via cache expiration. However, the unified cache is designed to be consistent and does not support expiration. To provide an equivalent mechanism for LDAP, we will periodically reload the permissions portion of the new cache at intervals matching the previously configured expiration time.	2026-02-17 17:56:27 +01:00
Marcin Maliszkiewicz	10996bd0fb	auth: add permissions cache to auth/cache We want to get rid of loading_cache because its periodic refresh logic generates a lot of internal load when there are many entries. Also, our operating procedures involve tweaking the config, while the new unified cache is supposed to work out of the box.	2026-02-17 17:56:27 +01:00
Marcin Maliszkiewicz	03c4e4bb10	auth: add service::revoke_all as main entry point In the following commit we'll need to add some cache related logic (removing resource permissions). This logic doesn't depend on authorizer so it should be managed by the service itself.	2026-02-17 17:56:27 +01:00
Marcin Maliszkiewicz	070d0bfc4c	auth: explicitly life-extend resource in auth_migration_listener Otherwise it's easy to trigger use-after-free when code slightly changes.	2026-02-17 17:56:27 +01:00
Marcin Maliszkiewicz	3b98451776	test: auth_cluster: add test for hanged AUTHENTICATING connections Test runtime: Release - 2s Debug - 5s	2026-02-17 17:55:48 +01:00
Marcin Maliszkiewicz	0376d16ad3	transport: fix connection code to consume only initially taken semaphore units The connection's cpu_concurrency_t struct tracks the state of a connection to manage the admission of new requests and prevent CPU overload during connection storms. When a connection holds units (allowed only 0 or 1), it is considered to be in the "CPU state" and contributes to the concurrency limits used when accepting new connections. The bug stems from the fact that `counted_data_source_impl::get` and `counted_data_sink_impl::put` calls can interleave during execution. This occurs because of `should_parallelize` and `_ready_to_respond`, the latter being a future chain that can run in the background while requests are being read. Consequently, while reading request (N), the system may concurrently be writing the response for request (N-1) on the same connection. This interleaving allows `return_all()` to be called twice before the subsequent `consume_units()` is invoked. While the second `return_all()` call correctly returns 0 units, the matching `consume_units()` call would mistakenly take an extra unit from the semaphore. Over time, a connection blocked on a read operation could end up holding an unreturned semaphore unit. If this pattern repeats across multiple connections, the semaphore units are eventually depleted, preventing the server from accepting any new connections. The fix ensures that we always consume the exact number of units that were previously returned. With this change, interleaved operations behave as follows: get() return_all — returns 1 unit put() return_all — returns 0 units get() consume_units — takes back 1 unit put() consume_units — takes back 0 units Logically, the networking phase ends when the first network operation concludes. But more importantly, when a network operation starts, we no longer hold any units. Other solutions are possible but the chosen one seems to be the simplest and safest to backport. 
Fixes SCYLLADB-485	2026-02-17 17:55:48 +01:00
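The interleaving above can be replayed with a toy model. This Python sketch is not the Scylla code. The `Semaphore` and `Connection` classes are hypothetical, and the old behaviour is simplified to "always take one unit back", which is an assumption based on the description.

With the old rule, one unit leaks from a pool of 10 after a single interleaved get()/put() cycle. With the fixed rule, the pool is whole again.

```python
class Semaphore:
    """Server-wide pool of CPU-concurrency units (simplified)."""

    def __init__(self, units: int) -> None:
        self.available = units


class Connection:
    """Hypothetical model of cpu_concurrency_t: a connection holds 0 or 1 unit.

    `has_units` is a flag, not a counter, so a unit taken while the flag is
    already set is never returned.
    """

    def __init__(self, sem: Semaphore, fixed: bool) -> None:
        self.sem = sem
        self.fixed = fixed
        sem.available -= 1            # the connection starts in the "CPU state"
        self.has_units = True

    def return_all(self) -> int:
        """Called when a network operation starts; returns how many units were given back."""
        if not self.has_units:
            return 0
        self.sem.available += 1
        self.has_units = False
        return 1

    def consume_units(self, returned: int) -> None:
        """Called when a network operation ends."""
        # Old behaviour (simplified assumption): always take one unit back.
        # Fixed behaviour: take back exactly what this operation returned.
        taken = returned if self.fixed else 1
        self.sem.available -= taken
        if taken:
            self.has_units = True


def interleaved_cycle(fixed: bool, total: int = 10) -> int:
    """Run the interleaving from the commit message; return the units left in the pool."""
    sem = Semaphore(total)
    conn = Connection(sem, fixed)
    from_get = conn.return_all()      # get() starts: returns 1 unit
    from_put = conn.return_all()      # put() starts concurrently: returns 0 units
    conn.consume_units(from_get)      # get() ends: takes back 1 unit
    conn.consume_units(from_put)      # put() ends: takes back 1 (old) or 0 (fixed) units
    conn.return_all()                 # connection goes idle and releases what it tracks
    return sem.available


print(interleaved_cycle(fixed=False), interleaved_cycle(fixed=True))  # 9 10
```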
Petr Gusev	c785d242a7	tests: extract get_topology_version helper This is a refactoring commit. We need to load the cluster version for a host in several places, so extract a helper for this.	2026-02-16 08:57:42 +01:00
Petr Gusev	ffe3262e8d	global tablets barrier: require all nodes to ack barrier_and_drain Previously, global_tablet_token_metadata_barrier() could proceed with fencing even if some nodes did not acknowledge the barrier_and_drain. This could cause problems: * In scylladb/scylladb#26864, replica locks did not provide mutual exclusion, because “fenced out” requests from old topology versions could run in parallel with requests using newer versions. * In scylladb/scylladb#26375, the barrier could succeed even though we did not wait for closed sessions to become unused. This could leave aborted repair or streaming tasks running concurrently after a tablet transition was aborted, and thus running concurrently with the next transition. In this commit we add a parameter drain_all_nodes: bool to the global_token_metadata_barrier function. If this parameter is set, the barrier waits for all nodes to acknowledge the barrier_and_drain round of RPCs. If any of the nodes are not accessible or throw an error, such errors are rethrown to the caller. We set this parameter only in global_tablet_token_metadata_barrier since for topology migrations the old behavior should be preserved. In case of errors, the tablet migration is blocked until the problem goes away by itself or the problematic node is added to the ignore_nodes list. The test_fenced_out_on_tablet_migration_while_handling_paxos_verb is removed: with tablets, we now drain all nodes, so after a successful barrier_and_drain round there can be no coordinators with an old topology version. The fence_token check after executing a request on a replica is therefore unnecessary for tablets, but still required for vnodes, where topology changes do not wait for all nodes. Topology fencing is covered by test_fence_lwt_during_bootstrap. Fixes scylladb/scylladb#26864 Fixes scylladb/scylladb#26375	2026-02-16 08:57:42 +01:00
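A rough asyncio sketch of the `drain_all_nodes` behaviour. `global_barrier`, the `send_barrier_and_drain` callback, and the `ignore_nodes` handling are illustrative assumptions, not the topology coordinator's actual code.

- With `drain_all_nodes=False`, the barrier proceeds with whichever nodes acknowledged.
- With `drain_all_nodes=True`, any failure is rethrown to the caller, unless the failing node is ignored.

```python
import asyncio


async def global_barrier(nodes, send_barrier_and_drain, drain_all_nodes, ignore_nodes=()):
    """Hypothetical sketch; returns the nodes that acknowledged barrier_and_drain."""
    targets = [n for n in nodes if n not in ignore_nodes]
    results = await asyncio.gather(
        *(send_barrier_and_drain(n) for n in targets), return_exceptions=True)
    acked = [n for n, r in zip(targets, results) if not isinstance(r, Exception)]
    failed = [(n, r) for n, r in zip(targets, results) if isinstance(r, Exception)]
    if drain_all_nodes and failed:
        node, err = failed[0]
        raise RuntimeError(f"barrier_and_drain not acknowledged by {node}") from err
    return acked


async def demo() -> tuple:
    async def rpc(node: str) -> str:
        if node == "n3":
            raise ConnectionError("unreachable")
        return "ok"

    nodes = ["n1", "n2", "n3"]
    old = await global_barrier(nodes, rpc, drain_all_nodes=False)
    try:
        await global_barrier(nodes, rpc, drain_all_nodes=True)
        strict = "proceeded"
    except RuntimeError:
        strict = "blocked"   # migration stays blocked until the problem goes away
    ignored = await global_barrier(nodes, rpc, drain_all_nodes=True, ignore_nodes={"n3"})
    return old, strict, ignored


print(asyncio.run(demo()))  # (['n1', 'n2'], 'blocked', ['n1', 'n2'])
```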
Petr Gusev	06f88b43e5	topology_coordinator: pass raft_topology_cmd by value It's just a single enum. Passing by reference risks use-after-free if a temporary command is created on the stack in a non-coroutine function.	2026-02-16 08:57:42 +01:00
Petr Gusev	df73f723a6	storage_proxy: hold erms in replica handlers Add explicit erm-holding variables in all replica-side RPC handlers. This is required to ensure that tablet migration waits for in-flight replica requests even if a non-replica coordinator has been fenced out. Holding erms on the replica side may increase the global-barrier wait time, since the barrier must drain these requests. We believe this is acceptable because: * We already hold erms during replica-side request execution, but in an ad-hoc, non-systemic way in lower layers of storage_proxy (e.g. in sp::mutate_locally and do_query_tablets). * Replica requests are bounded by replica-side timeouts, so the global-barrier wait time cannot exceed the maximum of these timeouts. For Paxos verbs, we use token_metadata_guard, which wraps the ERM and automatically refreshes it when tablet migration does not affect the current token; see the token_metadata_guard comments for details. We use this guard only for Paxos verbs because regular reads and writes already hold raw erms in storage_proxy and on the coordinators. The erms must be held in all RPC handlers that support fencing — that is, those with a fencing_token parameter in storage_proxy.idl. Counter updates already hold erms in mutate_counter_on_leader_and_replicate. Fix test_tablets2::test_timed_out_reader_after_cleanup: the tablets barrier now waits for all nodes. As a result, the replica read is expected to finish, rather than fail due to the tablet having moved as it did previously. The test is renamed to test_tablets_barrier_waits_for_replica_erms to better reflect its purpose. Refs scylladb/scylladb#26864	2026-02-16 08:57:42 +01:00
Petr Gusev	e39f4b399c	token_metadata: improve stale versions diagnostics Before waiting on stale_versions_in_use(), we log the stale versions the barrier_and_drain handler will wait for, along with the number of token_metadata references representing each version. To achieve this, we store a pointer to token_metadata in version_tracker, traverse the _trackers list, and output all items with a version smaller than the latest. Since token_metadata contains the version_tracker instance, it is guaranteed to remain alive during traversal. To count references, token_metadata now inherits from enable_lw_shared_from_this. This helps diagnose tablet migration stalls and allows more deterministic tests: when a barrier is expected to block, we can verify that the log contains the expected stale versions rather than checking that the barrier_and_drain is blocked on stale_versions_in_use() for a fixed amount of time.	2026-02-16 08:57:42 +01:00
Alex	75e25493c1	db:view: refactor on_update_view to use coroutine dispatcher on_update_view() currently runs its serialized logic inline via with_semaphore() from a detached callback path, while create/drop already use dedicated async dispatchers. Refactor update handling to follow the same pattern: - add dispatch_update_view(sstring ks_name, sstring view_name) - move update logic into that coroutine - acquire the existing view-builder lock via get_or_adopt_view_builder_lock() - keep existing behavior for missing base/view state - keep background invocation semantics from on_update_view() This aligns the update/create/drop flows, keeps the async lifecycle handling, and is a first step toward fixing the shutdown issue.	2026-02-15 18:50:32 +02:00
Taras Veretilnyk	f140ab0332	sstables: extract default write open flags into a constant Extract the commonly used `open_flags::wo | open_flags::create | open_flags::exclusive` into a reusable constant `sstable_write_open_flags` to reduce duplication.	2026-02-13 14:27:01 +01:00
Taras Veretilnyk	c8281b7b8b	sstables: Add write_simple_with_digest for component checksumming Introduce new methods to write SSTable components while calculating and returning their CRC32 checksums. This adds: - make_digests_component_file_writer(): creates a crc32_digest_file_writer for component writing with checksum tracking - write_simple_with_digest() and do_write_simple_with_digest(): write components and return the full checksum value	2026-02-13 14:27:01 +01:00
Taras Veretilnyk	1bf934c77c	sstables: Extract file writer closing logic into separate methods Refactor the consume_end_of_stream() method by extracting the inline file writer closing logic into dedicated methods: - close_index_writer() - close_partitions_writer() - close_rows_writer()	2026-02-13 14:27:01 +01:00
Taras Veretilnyk	dec5e48666	sstables: Implement CRC32 digest-only writer Introduce a template parameter to the checksummed file writer to support digest-only calculation without storing chunk checksums. This will be needed in the future to calculate the digests of other components.	2026-02-13 14:27:00 +01:00
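The difference between a regular checksummed writer and a digest-only one can be illustrated with Python's `zlib.crc32`. That function accepts a running value and so supports incremental computation. The `ChecksummedWriter` class, its chunking, and the in-memory buffer are hypothetical; the real writer is a C++ template in the sstables code.

```python
import zlib


class ChecksummedWriter:
    """Sketch: track a whole-stream CRC32 and, unless digest_only, per-chunk CRC32s."""

    def __init__(self, chunk_size: int, digest_only: bool = False) -> None:
        self.chunk_size = chunk_size
        self.digest_only = digest_only
        self.digest = 0                  # running CRC32 of every byte written so far
        self.chunk_checksums = []        # stays empty in digest-only mode
        self._pending = b""
        self.data = bytearray()          # stands in for the output file

    def write(self, buf: bytes) -> None:
        self.data += buf
        self.digest = zlib.crc32(buf, self.digest)  # continue the running CRC
        if self.digest_only:
            return
        self._pending += buf
        while len(self._pending) >= self.chunk_size:
            chunk = self._pending[:self.chunk_size]
            self._pending = self._pending[self.chunk_size:]
            self.chunk_checksums.append(zlib.crc32(chunk))

    def close(self) -> int:
        """Flush the last partial chunk (if any) and return the full digest."""
        if not self.digest_only and self._pending:
            self.chunk_checksums.append(zlib.crc32(self._pending))
            self._pending = b""
        return self.digest


payload = b"sstable component bytes " * 10   # 240 bytes
full = ChecksummedWriter(chunk_size=64)
digest_only = ChecksummedWriter(chunk_size=64, digest_only=True)
for writer in (full, digest_only):
    for i in range(0, len(payload), 37):      # arbitrary write sizes
        writer.write(payload[i:i + 37])

assert full.close() == digest_only.close() == zlib.crc32(payload)
print(len(full.chunk_checksums), len(digest_only.chunk_checksums))  # 4 0
```

Both modes produce the same whole-stream digest. Only the regular mode pays for per-chunk bookkeeping.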
Patryk Jędrzejczak	8e9c7397c5	raft topology: prevent accessing nullptr returned by topology::find It's better to raise an internal error than cause a segmentation fault on possibly multiple nodes.	2026-02-12 13:10:04 +01:00
Patryk Jędrzejczak	e21ecf69de	raft topology: make some assertions non-crashing Some assertions in the Raft-based topology are likely to cause crashes of multiple nodes due to the consistent nature of the Raft-based code. If the failing assertion is executed in the code run by each follower (e.g., the code reloading the in-memory topology state machine), then all nodes can crash. If the failing assertion is executed only by the leader (e.g., the topology coordinator fiber), then multiple consecutive group0 leaders will chain-crash until there is no group0 majority. Crashing multiple nodes is much more severe than necessary. It's enough to prevent the topology state machine from making more progress. This will naturally happen after throwing a runtime error. The problematic fiber will be killed or will keep failing in a loop. Note that it should be safe to block the topology state machine, but not the whole group0, as the topology state machine is mostly isolated from the rest of group0. We replace some occurrences of `on_fatal_internal_error` and `SCYLLA_ASSERT` with `on_internal_error`. These are not all occurrences, as some fatal assertions make sense, for example, in the bootstrap procedure.	2026-02-12 13:10:03 +01:00
Dawid Pawlik	4e32502bb3	test/vector_search: add reproducer for rescoring with zero vectors Add a reproducer for the SCYLLADB-456 issue: an exception on ANN vector queries with rescoring and cosine similarity.	2026-02-11 13:41:09 +01:00
Dawid Pawlik	af0889d194	vector_search: return NaN for similarity_cosine with all-zero vectors ANN vector queries with all-zero vectors are allowed even on vector indexes with the similarity function set to cosine. When the rescoring option is enabled, those queries would fail because rescoring calls the `similarity_cosine` function underneath, which raised an `InvalidRequest` exception since all-zero vectors were not allowed, matching Cassandra's behaviour. To eliminate the discrepancy, we want `similarity_cosine` calls with all-zero vectors to pass but return NaN, as the cosine similarity for zero vectors is mathematically undefined. Unlike USearch, we decided not to use arbitrary values; USearch defines the distance (not to be confused with similarity) as cos(0, 0) = 0 and cos(0, x) = 1 over the range of values [0, 2]. Converted to similarity, that would mean sim_cos(0, x) = 0.5, and there is no mathematical reason why a zero vector should be considered more similar than, for example, vectors forming obtuse angles. It's safe to assume that all-zero vectors shouldn't have any impact on cosine similarity results, therefore we return NaN and eliminate them from the best results. Adjusted the tests accordingly to check both the proper Cassandra and Scylla behaviour. Fixes: SCYLLADB-456	2026-02-11 12:31:47 +01:00
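The resulting behaviour can be sketched in Python. Two points are assumptions for illustration:

- The similarity is mapped to [0, 1] as (1 + cos) / 2. This is consistent with the remark above that a USearch distance of 1 would correspond to sim_cos(0, x) = 0.5.
- `best_matches` is a hypothetical stand-in for the rescoring step that drops NaN scores.

```python
import math


def similarity_cosine(a, b):
    """Cosine similarity mapped to [0, 1] as (1 + cos) / 2; NaN if either vector is all-zero."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return math.nan  # the angle to a zero vector is undefined
    return (1.0 + dot / (norm_a * norm_b)) / 2.0


def best_matches(query, candidates, k):
    """Rescore candidates and keep the top k, dropping NaN scores."""
    scored = [(similarity_cosine(query, v), key) for key, v in candidates.items()]
    scored = [(s, key) for s, key in scored if not math.isnan(s)]  # NaN never ranks
    scored.sort(reverse=True)
    return [key for _, key in scored[:k]]


candidates = {
    "same": [2.0, 0.0],
    "orthogonal": [0.0, 1.0],
    "zero": [0.0, 0.0],
    "opposite": [-1.0, 0.0],
}
print(best_matches([1.0, 0.0], candidates, k=3))  # ['same', 'orthogonal', 'opposite']
```

Filtering before sorting matters: NaN compares false with everything, so leaving it in the list would make the ordering unreliable.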
Pavel Emelyanov	2a3a56850c	test: Fix the condition for streaming directions validation Commit `ea8a661119` tried to reduce the dataset for restoration tests. In doing so, it effectively disabled part of the test -- the checks for streaming directions were never run after this change. The thing is that this check only runs if the restored tablet count matches a hardcoded value of 512. This was the real dataset size of the test before the aforementioned commit, but after it the size changed to other values, and the comparison with 512 always evaluated to False. Fix it with a local variable to prevent such mistakes in the future. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-02-11 12:55:27 +03:00
Pavel Emelyanov	f187dceb1a	test: Split test_backup.py::check_data_is_back() into two This method does two things -- checks that the data is indeed back, and validates streaming directions. The latter is not quite about "data is back", so better to have it as explicit dedicated method. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-02-11 12:54:20 +03:00
Pavel Emelyanov	875fd03882	test/object_store: Remove create_ks_and_cf() helper Now all test cases use standard facilities to create data they test Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-02-10 15:59:05 +03:00
Pavel Emelyanov	94176f7477	test/object_store: Replace create_ks_and_cf() usage with standard methods To create a keyspace, there's the new_test_keyspace helper. The table is created with a single cql.run_async with an explicit schema. The dataset is populated with a single parallel INSERT as well. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-02-10 15:58:05 +03:00
Pavel Emelyanov	6665cda23f	test/object_store: Shift indentation right for test cases This is a preparatory patch. The next one will need to replace the statements `foo()` and `bar()` with a `with something() as s:` block whose body is the same `foo()` and `bar()`, indented one level deeper. Effectively, it only adds the `with something()` line. To avoid shifting the whole file right together with that future change, do the shift here. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-02-10 15:56:27 +03:00
Wojciech Mitros	c5a44b0f88	schema: add with_sharder overload accepting static_sharder reference Add a schema_builder::with_sharder() overload that accepts a const reference to dht::static_sharder. This allows schemas to use custom sharder instances instead of only static sharder configurations. This is needed to support tables that use custom partitioning and sharding strategies, such as the incoming raft metadata tables for strongly consistent tables.	2026-02-10 10:52:00 +01:00
Pavel Emelyanov	83e64b516a	hint: Don't switch group in database::apply_hint() The method is called from storage_proxy::mutate_hint(), which is in turn called from hint_mutation::apply_locally(). The latter is called either directly by the hint sender, which already runs in the streaming group, or via the RPC HINT_MUTATION handler, which uses index 1 that negotiates the streaming group as well. To be sure, add a debugging check that the current group is the expected one. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-02-09 08:54:51 +03:00
Pavel Emelyanov	727f1be11c	hint_sender: Switch to sender group on stop either Currently sender only switches group for hints sending on start. It's worth doing the same on stop too for consistency. There's nothing to compete with at this point. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-02-09 08:54:51 +03:00
Pavel Emelyanov	44715a2d45	api: Remove map_to_key_value() helpers All the callers had already been patched to stream their results directly as json. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-02-09 08:52:50 +03:00
Pavel Emelyanov	dcbb5cb45b	api: Streamify view_build_statuses handler Similarly to previous patch, the handler can stream the map of build statuses. Unlike previous patch, it doesn't need to fmt::format() key and value, as these are strings already. It could be a map_to_json<string, string> partial specialization, but there's so far only one caller, so probably not worth it yet. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-02-09 08:52:50 +03:00
Pavel Emelyanov	73512a59ff	api: Streamify few more storage_service/ handlers Like get_token_endpoint one streams the map that it got from storage service, the get_ownership and get_effective_ownership can do the same. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-02-09 08:52:49 +03:00
Pavel Emelyanov	a4bd9037b3	api: Add map_to_json() helper The get_token_endpoint handler converts iterator of std::map into generated maplist_mapper type. Next patch will do the same for more handlers, so it's good to have a helper converter for it. As a nice side effect, it's possible to avoid multiline lambda argument to stream_range_as_array(). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-02-09 08:52:49 +03:00
Pavel Emelyanov	63cafab56c	api: Coroutinize view_build_statuses handler Further patching will be nicer if this handler is a coroutine Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-02-09 08:52:49 +03:00
Pavel Emelyanov	1b1aae8a0d	test: Merge test_restore_primary_replica_different_... tests The difference is very small: @@ -1,18 +1,18 @@ @pytest.mark.asyncio async def test_restore_primary_replica_different_...(manager: ManagerClient, object_storage): ''' Comment ''' - topology = topo(rf = 2, nodes = 2, racks = 2, dcs = 1) - scope = "dc" + topology = topo(rf = 1, nodes = 2, racks = 1, dcs = 2) + scope = "all" ks = 'ks' cf = 'cf' - servers, host_ids = await create_cluster(topology, True, manager, logger, object_storage) + servers, host_ids = await create_cluster(topology, False, manager, logger, object_storage) await manager.disable_tablet_balancing() cql = manager.get_cql() @@ -41,7 +41,6 @@ async def test_restore_primary_replica_d log = await manager.server_open_log(s.server_id) res = await log.grep(r'INFO.sstables_loader - load_and_stream:.target_node=([0-9a-z-]+),.*num_bytes_sent=([0-9]+)') streamed_to = set([ r[1].group(1) for r in res ]) - logger.info(f'{s.ip_addr} {host_ids[s.server_id]} streamed to {streamed_to}') + logger.info(f'{s.ip_addr} {host_ids[s.server_id]} streamed to {streamed_to}, expected {servers}') assert len(streamed_to) == 2 The (removed in the above example) test description comments differ only in their usage of "rack" and "dc" words. Squashing them into one parametrized test makes perfect sense. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-02-06 14:14:43 +03:00
Pavel Emelyanov	70988f9b61	test: Merge test_restore_primary_replica_same_... tests The difference is very tiny: @@ -1,12 +1,12 @@ @pytest.mark.asyncio async def test_restore_primary_replica_same_...(manager: ManagerClient, object_storage): ''' comment ''' - topology = topo(rf = 4, nodes = 8, racks = 2, dcs = 1) - scope = "rack" + topology = topo(rf = 4, nodes = 8, racks = 2, dcs = 2) + scope = "dc" ks = 'ks' cf = 'cf' @@ -42,7 +42,7 @@ async def test_restore_primary_replica_s for r in res: nodes_by_operation[r[1].group(1)].append(r[1].group(2)) - scope_nodes = set([ str(host_ids[s.server_id]) for s in servers if s.rack == servers[i].rack ]) + scope_nodes = set([ str(host_ids[s.server_id]) for s in servers if s.datacenter == servers[i].datacenter ]) for op, nodes in nodes_by_operation.items(): logger.info(f'Operation {op} streamed to nodes {nodes}') assert len(nodes) == 1, "Each streaming operation should stream to exactly one primary replica" The (removed in the above example) test description comments differ only in their usage of "rack" and "dc" words. Squashing them into one parametrized test makes perfect sense. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-02-06 14:12:23 +03:00
Pavel Emelyanov	98b4092153	test: Don't specify expected_replicas in test_restore_primary_replica_different_dc_scope_all If not specified, the call uses the dcs * rf default, which matches this test's parameters perfectly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-02-06 14:11:47 +03:00
Pavel Emelyanov	1ac1e90b16	test: Remove local r_servers variable from test_restore_primary_replica_different_dc_scope_all It merely copies the `servers` one, so there's no need for it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-02-06 14:11:22 +03:00
Tomasz Grabiec	41930c0176	tablets, config: Reduce migration concurrency to 2 Tablet migration keeps an sstable snapshot during streaming, which may cause a temporary increase in disk utilization if compaction is running concurrently. SStables compacted away are kept on disk until streaming is done with them. The more tablets we allow to migrate concurrently, the higher disk usage can rise. When the target tablet size is configured correctly, every tablet should own about 1% of disk space, so a concurrency of 4 shouldn't put us at risk. But the target tablet size is not chosen dynamically yet, and it may not be aligned with disk capacity. Also, tablet sizes can temporarily grow above the target, up to 2x before the split starts, and some more because splits take a while to complete. To reduce the impact of this, reduce migration concurrency. A concurrency of 2 should still be enough to saturate resources on the leaving shard. Also, reducing concurrency means that load balancing is more responsive to preemption. There will be less bandwidth sharing, so scheduled migrations complete faster. This is important for scale-out, where we bootstrap a node and want to start migrations to that new node as soon as possible. Refs scylladb/siren#15317	2026-02-06 00:42:19 +01:00
Tomasz Grabiec	56e40e90c9	tablets: load_balancer: Always accept migration if the load is 0 Different transitions have different weights, and limits are configurable. We don't want a situation where a high-cost migration is cut off by limits and the system can make no progress. For example, repair uses weight 2 for read concurrency. Migrating co-located tablets scales the cost by the number of co-located tablets.	2026-02-06 00:42:18 +01:00
Tomasz Grabiec	39492596c2	config, tablets: Make tablet migration concurrency configurable We're about to reduce it. It's better not to have it hard-coded in case we change our minds again.	2026-02-06 00:42:18 +01:00
Amnon Heiman	f2e142ac6e	test/boost/estimated_histogram_test.cc: Switch to real Sum Now that the sum function in the histogram uses true values instead of an estimate, the test should reflect that. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2026-01-28 23:19:00 +02:00
Amnon Heiman	3175540e87	transport/server: to bytes_histogram This patch replaces simple counters with bytes_histogram for tracking CQL request and response sizes, enabling better visibility into message size distribution. Changes: - Replace request_size and response_size metrics with bytes_histogram in cql_sg_stats::request_kind_stats - Per-shard metrics continue to be reported as before - QUERY, EXECUTE, and BATCH operations now report per-node, per-scheduling-group histograms of bytes sent and received, providing detailed insight into these operations Other CQL operations (e.g., PREPARE, OPTIONS) are not included in per-node histogram reporting as they are less performance-critical, but can be added in the future if proven useful. Metrics example:
```
# HELP scylla_transport_cql_request_bytes Counts the total number of received bytes in CQL messages of a specific kind.
# TYPE scylla_transport_cql_request_bytes counter
scylla_transport_cql_request_bytes{kind="BATCH",scheduling_group_name="sl:default",shard="0"} 129808
scylla_transport_cql_request_bytes{kind="EXECUTE",scheduling_group_name="sl:default",shard="0"} 227409
scylla_transport_cql_request_bytes{kind="PREPARE",scheduling_group_name="sl:default",shard="0"} 631
scylla_transport_cql_request_bytes{kind="QUERY",scheduling_group_name="sl:default",shard="0"} 2809
scylla_transport_cql_request_bytes{kind="QUERY",scheduling_group_name="sl:driver",shard="0"} 4079
scylla_transport_cql_request_bytes{kind="REGISTER",scheduling_group_name="sl:default",shard="0"} 98
scylla_transport_cql_request_bytes{kind="STARTUP",scheduling_group_name="sl:driver",shard="0"} 432
# HELP scylla_transport_cql_request_histogram_bytes A histogram of received bytes in CQL messages of a specific kind and specific scheduling group.
# TYPE scylla_transport_cql_request_histogram_bytes histogram
scylla_transport_cql_request_histogram_bytes_sum{kind="QUERY",scheduling_group_name="sl:driver"} 4079
scylla_transport_cql_request_histogram_bytes_count{kind="QUERY",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="1024.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="2048.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="4096.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="8192.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="16384.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="32768.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="65536.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="131072.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="262144.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="524288.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="1048576.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="2097152.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="4194304.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="8388608.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="16777216.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="33554432.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="67108864.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="134217728.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="268435456.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="536870912.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="1073741824.000000",scheduling_group_name="sl:driver"} 57
```	2026-01-28 13:53:47 +02:00
Amnon Heiman	5875bcca23	approx_exponential_histogram: Add sum() method for accurate value tracking Previously, histogram sums were estimated by multiplying bucket offsets by their counts, which produces inaccurate results - typically too high when using upper limits or too low when using lower limits. This patch adds accurate sum tracking to approx_exponential_histogram: - Adds a _sum member variable to track the actual sum of all values - Implements sum() method to return the accumulated total - Updates add() to increment _sum for each value - Modifies to_metrics_histogram() helper to use the new sum() method This change is important as histograms will be used instead of counters for byte statistics, where accurate totals are essential for metrics reporting. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2026-01-28 13:39:46 +02:00
Amnon Heiman	2fd453f4ec	utils/estimated_histogram.hh: Add bytes_histogram For various use cases, we need to report byte histograms, such as for request and reply message sizes. This patch introduces bytes_histogram as a type alias for approx_exponential_histogram configured to track byte values from 1KB to 1GB with power-of-2 buckets (Precision=1). This provides a convenient, performance-efficient histogram for measuring message sizes, payload sizes, and other byte-based metrics. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2026-01-28 13:31:39 +02:00
Karol Baryła	2c471ec57a	protocol-extensions.md: Fix client_options docs When this column and the relevant SUPPORTED key were added, the documentation was mistakenly put in the section about the shard awareness extension. This commit moves the documentation into a dedicated section. I also expanded it to describe both the new column and the new SUPPORTED key.	2026-01-13 11:49:00 +01:00
Karol Baryła	30d4d3248d	system_keyspace.md: Add client_options column It was recently introduced, but the documentation was not updated.	2026-01-13 11:35:52 +01:00
Karol Baryła	a0a6140436	system_keyspace.md: Fix order in system.clients The scheduling_group column is placed after protocol_version in the current version.	2026-01-13 11:33:34 +01:00
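Commits 2fd453f4ec and 5875bcca23 above describe `bytes_histogram` as an `approx_exponential_histogram` with power-of-2 buckets from 1KB to 1GB. They also replace the bucket-based sum estimate with an exact `_sum`. The Python sketch below models only that idea. The class name, the "first bucket whose upper bound covers the value" rule, and the clamping of oversized values into the last bucket are assumptions for illustration. It is not the C++ code in `utils/estimated_histogram.hh`.

```python
class BytesHistogramSketch:
    """Hypothetical model: power-of-2 buckets plus an exact running sum."""

    def __init__(self, min_value=1024, max_value=1024 ** 3):
        # One bucket per power of two, labelled by its upper bound:
        # 1024, 2048, ..., 1073741824 (21 buckets for 1KB..1GB).
        n = (max_value // min_value).bit_length()
        self.upper_bounds = [min_value << i for i in range(n)]
        self.counts = [0] * n
        self._sum = 0  # exact total of every value passed to add()

    def add(self, value):
        self._sum += value
        # Count the value in the first bucket whose upper bound covers it;
        # anything above the top bound is clamped into the last bucket.
        for i, bound in enumerate(self.upper_bounds):
            if value <= bound:
                self.counts[i] += 1
                return
        self.counts[-1] += 1

    def sum(self):
        return self._sum

    def estimated_sum(self):
        # The old style of estimate: bucket count times bucket upper bound.
        return sum(c * b for c, b in zip(self.counts, self.upper_bounds))


h = BytesHistogramSketch()
for size in (100, 1500, 3000):
    h.add(size)
print(h.sum())            # 4600
print(h.estimated_sum())  # 7168 (1024 + 2048 + 4096), an overestimate
```

Small messages make the gap large. Take the 57 QUERY requests that all landed in the 1024 bucket of the metrics example above. An upper-bound estimate would report 57 × 1024 = 58368 bytes, while the exact sum is 4079.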

4291 changed files with 48520 additions and 30746 deletions
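Commit 41930c0176 in the log above reasons about disk headroom from three facts:

- a tablet should own about 1% of disk;
- a tablet may grow to about 2x the target before splitting;
- sstables compacted away stay on disk until streaming finishes.

Consider a deliberately crude worst case: every sstable of each migrating tablet is compacted away while its snapshot is still held. In that case the extra space is roughly concurrency × tablet share × growth. The function below only illustrates that argument. It is an assumption, not how Scylla accounts for disk usage.

```python
def worst_case_extra_disk(concurrency, tablet_share=0.01, growth=2.0):
    """Crude upper bound on the disk fraction pinned by migration snapshots.

    Assumes each in-flight migration holds a full duplicate of a tablet
    that has grown to `growth` times its target share of the disk.
    """
    return concurrency * tablet_share * growth


for concurrency in (4, 2):
    print(concurrency, f"{worst_case_extra_disk(concurrency):.0%}")
# 4 8%
# 2 4%
```

The commit also notes that splits take a while, so real growth can exceed 2x. In this linear model, halving concurrency halves the bound whatever the growth factor is.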

									
										18

.github/copilot-instructions.md
									
												
				@@ -55,22 +55,26 @@ ninja build/<mode>/test/boost/<test_name>

				ninja build/<mode>/scylla

				# Run all tests in a file

				./test.py --mode=<mode> <test_path>

				./test.py --mode=<mode> test/<suite>/<test_name>.py

				# Run a single test case from a file

				./test.py --mode=<mode> <test_path>::<test_function_name>

				./test.py --mode=<mode> test/<suite>/<test_name>.py::<test_function_name>

				# Run all tests in a directory

				./test.py --mode=<mode> test/<suite>/

				# Examples

				./test.py --mode=dev alternator/

				./test.py --mode=dev cluster/test_raft_voters::test_raft_limited_voters_retain_coordinator

				./test.py --mode=dev test/alternator/

				./test.py --mode=dev test/cluster/test_raft_voters.py::test_raft_limited_voters_retain_coordinator

				./test.py --mode=dev test/cqlpy/test_json.py

				# Optional flags

				./test.py --mode=dev cluster/test_raft_no_quorum -v  # Verbose output

				./test.py --mode=dev cluster/test_raft_no_quorum --repeat 5  # Repeat test 5 times

				./test.py --mode=dev test/cluster/test_raft_no_quorum.py -v  # Verbose output

				./test.py --mode=dev test/cluster/test_raft_no_quorum.py --repeat 5  # Repeat test 5 times

				```

				**Important:**

				- Use path without `.py` extension (e.g., `cluster/test_raft_no_quorum`, not `cluster/test_raft_no_quorum.py`)

				- Use full path with `.py` extension (e.g., `test/cluster/test_raft_no_quorum.py`, not `cluster/test_raft_no_quorum`)

				- To run a single test case, append `::<test_function_name>` to the file path

				- Add `-v` for verbose output

				- Add `--repeat <num>` to repeat a test multiple times
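Under the updated convention, a test is addressed by its full path including `.py`, optionally followed by `::<test_function_name>`. The helper below is hypothetical and not part of `test.py`. It only assembles an argument list matching the documented examples, which could then be passed to `subprocess.run` from the repository root.

```python
import shlex


def build_test_py_argv(mode, suite, test_name, function=None, verbose=False, repeat=None):
    """Assemble ./test.py arguments using the full-path convention (hypothetical helper)."""
    target = f"test/{suite}/{test_name}.py"   # full path, including .py
    if function:
        target += f"::{function}"             # select a single test case
    argv = ["./test.py", f"--mode={mode}", target]
    if verbose:
        argv.append("-v")
    if repeat is not None:
        argv += ["--repeat", str(repeat)]
    return argv


print(shlex.join(build_test_py_argv("dev", "cluster", "test_raft_no_quorum", repeat=5)))
# ./test.py --mode=dev test/cluster/test_raft_no_quorum.py --repeat 5
```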

									
										2

.github/dependabot.yml
									
												
				@@ -1,6 +1,6 @@

				version: 2

				updates:

				- package-ecosystem: "pip"

				- package-ecosystem: "uv"

				  directory: "/docs"

				  schedule:

				    interval: "daily"

									
										2

.github/scripts/check-license.py
									
												
				@@ -4,7 +4,7 @@

				# Copyright (C) 2024-present ScyllaDB

				#

				#

				# SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				# SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				#

				import argparse

									
										3

.github/workflows/backport-pr-fixes-validation.yaml
									
												
				@@ -8,6 +8,9 @@ on:

				jobs:

				  check-fixes-prefix:

				    runs-on: ubuntu-latest

				    permissions:

				      contents: read

				      issues: write

				    steps:

				      - name: Check PR body for "Fixes" prefix patterns

				        uses: actions/github-script@v7

									
										53

.github/workflows/call_backport_with_jira.yaml
									
												
				@@ -0,0 +1,53 @@

				name: Backport with Jira Integration

				on:

				  push:

				    branches:

				      - master

				      - next-*.*

				      - branch-*.*

				  pull_request_target:

				    types: [labeled, closed]

				    branches: 

				      - master

				      - next

				      - next-*.*

				      - branch-*.*

				jobs:

				  backport-on-push:

				    if: github.event_name == 'push'

				    uses: scylladb/github-automation/.github/workflows/backport-with-jira.yaml@main

				    with:

				      event_type: 'push'

				      base_branch: ${{ github.ref }}

				      commits: ${{ github.event.before }}..${{ github.sha }}

				    secrets:

				      gh_token: ${{ secrets.AUTO_BACKPORT_TOKEN }}

				      jira_auth: ${{ secrets.USER_AND_KEY_FOR_JIRA_AUTOMATION }}

				  backport-on-label:

				    if: github.event_name == 'pull_request_target' && github.event.action == 'labeled'

				    uses: scylladb/github-automation/.github/workflows/backport-with-jira.yaml@main

				    with:

				      event_type: 'labeled'

				      base_branch: refs/heads/${{ github.event.pull_request.base.ref }}

				      pull_request_number: ${{ github.event.pull_request.number }}

				      head_commit: ${{ github.event.pull_request.base.sha }}

				      label_name: ${{ github.event.label.name }}

				      pr_state: ${{ github.event.pull_request.state }}

				    secrets:

				      gh_token: ${{ secrets.AUTO_BACKPORT_TOKEN }}

				      jira_auth: ${{ secrets.USER_AND_KEY_FOR_JIRA_AUTOMATION }}

				  backport-chain:

				    if: github.event_name == 'pull_request_target' && github.event.action == 'closed' && github.event.pull_request.merged == true

				    uses: scylladb/github-automation/.github/workflows/backport-with-jira.yaml@main

				    with:

				      event_type: 'chain'

				      base_branch: refs/heads/${{ github.event.pull_request.base.ref }}

				      pull_request_number: ${{ github.event.pull_request.number }}

				      pr_body: ${{ github.event.pull_request.body }}

				    secrets:

				      gh_token: ${{ secrets.AUTO_BACKPORT_TOKEN }}

				      jira_auth: ${{ secrets.USER_AND_KEY_FOR_JIRA_AUTOMATION }}
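The three jobs in the new `call_backport_with_jira.yaml` are selected by mutually exclusive `if:` conditions. The function below is a hypothetical restatement of those conditions, useful for reasoning about which reusable-workflow call a given event produces. It ignores the `branches:` filters, which decide whether the workflow runs at all.

```python
def backport_job(event_name, action=None, merged=False):
    """Return the job that would run for an event, or None (hypothetical helper)."""
    if event_name == "push":
        return "backport-on-push"       # event_type: 'push'
    if event_name == "pull_request_target":
        if action == "labeled":
            return "backport-on-label"  # event_type: 'labeled'
        if action == "closed" and merged:
            return "backport-chain"     # event_type: 'chain'
    return None


print(backport_job("pull_request_target", "closed", merged=False))  # None
```

A PR closed without merging triggers the workflow (the `closed` type is subscribed) but runs none of the jobs.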

									
										35

.github/workflows/call_jira_sync.yml
									
												
				@@ -1,8 +1,8 @@

				name: Sync Jira Based on PR Events

				name: Sync Jira Based on PR Events

				on:

				  pull_request_target:

				    types: [opened, ready_for_review, review_requested, labeled, unlabeled, closed]

				    types: [opened, edited, ready_for_review, review_requested, labeled, unlabeled, closed]

				permissions:

				  contents: read

				@@ -10,32 +10,9 @@ permissions:

				  issues: write

				jobs:

				  jira-sync-pr-opened:

				    if: github.event.action == 'opened'

				    uses: scylladb/github-automation/.github/workflows/main_jira_sync_pr_opened.yml@main

				    secrets:

				      caller_jira_auth: ${{ secrets.USER_AND_KEY_FOR_JIRA_AUTOMATION }}

				  jira-sync-in-review:

				    if: github.event.action == 'ready_for_review' || github.event.action == 'review_requested'

				    uses: scylladb/github-automation/.github/workflows/main_jira_sync_in_review.yml@main

				    secrets:

				      caller_jira_auth: ${{ secrets.USER_AND_KEY_FOR_JIRA_AUTOMATION }}

				  jira-sync-add-label:

				    if: github.event.action == 'labeled'

				    uses: scylladb/github-automation/.github/workflows/main_jira_sync_add_label.yml@main

				    secrets:

				      caller_jira_auth: ${{ secrets.USER_AND_KEY_FOR_JIRA_AUTOMATION }}

				  jira-status-remove-label:

				    if: github.event.action == 'unlabeled'

				    uses: scylladb/github-automation/.github/workflows/main_jira_sync_remove_label.yml@main

				    secrets:

				      caller_jira_auth: ${{ secrets.USER_AND_KEY_FOR_JIRA_AUTOMATION }}

				  jira-status-pr-closed:

				    if: github.event.action == 'closed' 

				    uses: scylladb/github-automation/.github/workflows/main_jira_sync_pr_closed.yml@main

				  jira-sync:

				    uses: scylladb/github-automation/.github/workflows/main_pr_events_jira_sync.yml@main

				    with:

				      caller_action: ${{ github.event.action }}

				    secrets:

				      caller_jira_auth: ${{ secrets.USER_AND_KEY_FOR_JIRA_AUTOMATION }}

									
										6

.github/workflows/call_sync_milestone_to_jira.yml
									
												
				@@ -1,14 +1,14 @@

				name: Call Jira release creation for new milestone

				name: Call Jira release creation for new milestone

				on:

				  milestone:

				    types: [created]

				    types: [created, closed]

				jobs:

				  sync-milestone-to-jira:

				    uses: scylladb/github-automation/.github/workflows/main_sync_milestone_to_jira_release.yml@main

				    with:

				      # Comma-separated list of Jira project keys

				      jira_project_keys: "SCYLLADB,CUSTOMER,SMI"

				      jira_project_keys: "SCYLLADB,CUSTOMER,SMI,RELENG,VECTOR"

				    secrets:

				      caller_jira_auth: ${{ secrets.USER_AND_KEY_FOR_JIRA_AUTOMATION }}

									
										5

.github/workflows/call_validate_pr_author_email.yml
									
												
				@@ -7,6 +7,11 @@ on:

				      - synchronize

				      - reopened

				permissions:

				  contents: read

				  pull-requests: write

				  statuses: write

				jobs:

				  validate_pr_author_email:

				    uses: scylladb/github-automation/.github/workflows/validate_pr_author_email.yml@main

									
										2

.github/workflows/check-license-header.yaml
									
												
				@@ -7,7 +7,7 @@ on:

				env:

				  HEADER_CHECK_LINES: 10

				  LICENSE: "LicenseRef-ScyllaDB-Source-Available-1.0"

				  LICENSE: "LicenseRef-ScyllaDB-Source-Available-1.1"

				  CHECKED_EXTENSIONS: ".cc .hh .py"

				jobs:
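The workflow passes three settings to the license check:

- `HEADER_CHECK_LINES`;
- `LICENSE` (now `LicenseRef-ScyllaDB-Source-Available-1.1`);
- `CHECKED_EXTENSIONS`.

This diff does not show how `check-license.py` consumes them. The following is therefore only a sketch of the kind of check those values suggest: look for the SPDX identifier within the first N lines of files with the listed extensions. The function names are hypothetical.

```python
import os

LICENSE = "LicenseRef-ScyllaDB-Source-Available-1.1"
HEADER_CHECK_LINES = 10
CHECKED_EXTENSIONS = ".cc .hh .py".split()


def is_checked(path):
    """Only files with one of the configured extensions are checked."""
    return os.path.splitext(path)[1] in CHECKED_EXTENSIONS


def has_license_header(text, license_id=LICENSE, max_lines=HEADER_CHECK_LINES):
    """True if one of the first `max_lines` lines ends with the SPDX tag."""
    tag = f"SPDX-License-Identifier: {license_id}"
    # endswith() rather than `in`, so an identifier like "...-1.10" does not match "...-1.1".
    return any(line.rstrip().endswith(tag) for line in text.splitlines()[:max_lines])
```

With these values, a header still carrying the old `-1.0` identifier would be rejected.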

									
										6

.github/workflows/docs-pages.yaml
									
												
				@@ -19,6 +19,8 @@ on:

				jobs:

				  release:

				    permissions:

				      pages: write

				      id-token: write

				      contents: write

				    runs-on: ubuntu-latest

				    steps:

				@@ -31,7 +33,9 @@ jobs:

				      - name: Set up Python

				        uses: actions/setup-python@v5

				        with:

				          python-version: "3.10"

				          python-version: "3.12"

				      - name: Install uv

				        uses: astral-sh/setup-uv@v6

				      - name: Set up env

				        run: make -C docs FLAG="${{ env.FLAG }}" setupenv

				      - name: Build docs

									
										4

.github/workflows/docs-pr.yaml
									
												
				@@ -29,7 +29,9 @@ jobs:

				      - name: Set up Python

				        uses: actions/setup-python@v5

				        with:

				          python-version: "3.10"

				          python-version: "3.12"

				      - name: Install uv

				        uses: astral-sh/setup-uv@v6

				      - name: Set up env

				        run: make -C docs FLAG="${{ env.FLAG }}" setupenv

				      - name: Build docs

									
										42

.github/workflows/trigger-scylla-ci.yaml
									
												
				@@ -1,4 +1,6 @@

				name: Trigger Scylla CI Route

				permissions:

				  contents: read

				on:

				  issue_comment:

				@@ -12,16 +14,38 @@ jobs:

				    if: (github.event_name == 'issue_comment' && github.event.comment.user.login != 'scylladbbot') || github.event.label.name == 'conflicts'

				    runs-on: ubuntu-latest

				    steps:

				      - name: Verify Org Membership

				        id: verify_author

				        env:

				          EVENT_NAME: ${{ github.event_name }}

				          PR_AUTHOR: ${{ github.event.pull_request.user.login }}

				          PR_ASSOCIATION: ${{ github.event.pull_request.author_association }}

				          COMMENT_AUTHOR: ${{ github.event.comment.user.login }}

				          COMMENT_ASSOCIATION: ${{ github.event.comment.author_association }}

				        shell: bash

				        run: |

				          if [[ "$EVENT_NAME" == "pull_request_target" ]]; then

				            AUTHOR="$PR_AUTHOR"

				            ASSOCIATION="$PR_ASSOCIATION"

				          else

				            AUTHOR="$COMMENT_AUTHOR"

				            ASSOCIATION="$COMMENT_ASSOCIATION"

				          fi

				          if [[ "$ASSOCIATION" == "MEMBER" || "$ASSOCIATION" == "OWNER" ]]; then

				            echo "member=true" >> $GITHUB_OUTPUT

				          else

				            echo "::warning::${AUTHOR} is not a member of scylladb (association: ${ASSOCIATION}); skipping CI trigger."

				            echo "member=false" >> $GITHUB_OUTPUT

				          fi

				      - name: Validate Comment Trigger

				        if: github.event_name == 'issue_comment'

				        id: verify_comment

				        env:

				          COMMENT_BODY: ${{ github.event.comment.body }}

				        shell: bash

				        run: |

				          BODY=$(cat << 'EOF'

				          ${{ github.event.comment.body }}

				          EOF

				          )

				          CLEAN_BODY=$(echo "$BODY" | grep -v '^[[:space:]]*>')

				          CLEAN_BODY=$(echo "$COMMENT_BODY" | grep -v '^[[:space:]]*>')

				          if echo "$CLEAN_BODY" | grep -qi '@scylladbbot' && echo "$CLEAN_BODY" | grep -qi 'trigger-ci'; then

				            echo "trigger=true" >> $GITHUB_OUTPUT

				@@ -30,13 +54,13 @@ jobs:

				          fi

				      - name: Trigger Scylla-CI-Route Jenkins Job

				        if: github.event_name == 'pull_request_target' || steps.verify_comment.outputs.trigger == 'true'

				        if: steps.verify_author.outputs.member == 'true' && (github.event_name == 'pull_request_target' || steps.verify_comment.outputs.trigger == 'true')

				        env:

				          JENKINS_USER: ${{ secrets.JENKINS_USERNAME }}

				          JENKINS_API_TOKEN: ${{ secrets.JENKINS_TOKEN }}

				          JENKINS_URL: "https://jenkins.scylladb.com"

				          PR_NUMBER: "${{ github.event.issue.number || github.event.pull_request.number }}"

				          PR_REPO_NAME: "${{ github.event.repository.full_name }}"

				        run: |

				          PR_NUMBER=${{ github.event.issue.number || github.event.pull_request.number }}

				          PR_REPO_NAME=${{ github.event.repository.full_name }}

				          curl -X POST "$JENKINS_URL/job/releng/job/Scylla-CI-Route/buildWithParameters?PR_NUMBER=$PR_NUMBER&PR_REPO_NAME=$PR_REPO_NAME" \

				          --user "$JENKINS_USER:$JENKINS_API_TOKEN" --fail -i -v

				            --user "$JENKINS_USER:$JENKINS_API_TOKEN" --fail
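After this change, the Jenkins job fires only when both of the following hold:

- the author's association is `MEMBER` or `OWNER`;
- the event is `pull_request_target`, or it is a comment that mentions both `@scylladbbot` and `trigger-ci` case-insensitively once quoted (`>`) lines are removed.

The comment body is also read from the `COMMENT_BODY` environment variable rather than expanded into the script with `${{ }}`. GitHub's security hardening guidance recommends this to avoid script injection. Below is a hypothetical Python restatement of the gating steps, convenient for testing the rules. The job-level `if:` (the `scylladbbot` exclusion and the `conflicts` label) is not modelled.

```python
import re

TRUSTED_ASSOCIATIONS = {"MEMBER", "OWNER"}


def comment_requests_ci(body):
    """Mirror grep -v '^[[:space:]]*>' followed by two case-insensitive grep -qi checks."""
    unquoted = [line for line in body.splitlines() if not re.match(r"\s*>", line)]
    text = "\n".join(unquoted).lower()
    return "@scylladbbot" in text and "trigger-ci" in text


def should_trigger_ci(event_name, association, comment_body=""):
    if association not in TRUSTED_ASSOCIATIONS:
        return False  # the verify_author step outputs member=false
    if event_name == "pull_request_target":
        return True
    return event_name == "issue_comment" and comment_requests_ci(comment_body)
```

As with the two separate `grep` calls, the two markers may appear on different lines of the comment.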

									
										3

.github/workflows/trigger_jenkins.yaml
									
												
				@@ -1,5 +1,8 @@

				name: Trigger next gating

				permissions:

				  contents: read

				on:

				  push:

				    branches:

									
										86

CMakeLists.txt
									
												
				@@ -2,6 +2,12 @@ cmake_minimum_required(VERSION 3.27)

				project(scylla)

				# Disable CMake's automatic -fcolor-diagnostics injection (CMake 3.24+ adds

				# it for Clang+Ninja). configure.py does not add any color diagnostics flags,

				# so we clear the internal CMake variable to prevent injection.

				set(CMAKE_CXX_COMPILE_OPTIONS_COLOR_DIAGNOSTICS "")

				set(CMAKE_C_COMPILE_OPTIONS_COLOR_DIAGNOSTICS "")

				list(APPEND CMAKE_MODULE_PATH

				  ${CMAKE_CURRENT_SOURCE_DIR}/cmake

				  ${CMAKE_CURRENT_SOURCE_DIR}/seastar/cmake)

				@@ -51,6 +57,16 @@ set(CMAKE_CXX_EXTENSIONS ON CACHE INTERNAL "")

				set(CMAKE_CXX_SCAN_FOR_MODULES OFF CACHE INTERNAL "")

				set(CMAKE_VISIBILITY_INLINES_HIDDEN ON)

				# Global defines matching configure.py

				# Since gcc 13, libgcc doesn't need the exception workaround

				add_compile_definitions(SEASTAR_NO_EXCEPTION_HACK)

				# Hacks needed to expose internal APIs for xxhash dependencies

				add_compile_definitions(XXH_PRIVATE_API)

				# SEASTAR_TESTING_MAIN is added later (after add_subdirectory(seastar) and

				# add_subdirectory(abseil)) to avoid leaking into the seastar subdirectory.

				# If SEASTAR_TESTING_MAIN is defined globally before seastar, it causes a

				# duplicate 'main' symbol in seastar_testing.

				if(is_multi_config)

				    find_package(Seastar)

				    # this is atypical compared to standard ExternalProject usage:

				@@ -96,12 +112,33 @@ else()

				    set(Seastar_EXCLUDE_APPS_FROM_ALL ON CACHE BOOL "" FORCE)

				    set(Seastar_EXCLUDE_TESTS_FROM_ALL ON CACHE BOOL "" FORCE)

				    set(Seastar_IO_URING ON CACHE BOOL "" FORCE)

				    set(Seastar_SCHEDULING_GROUPS_COUNT 21 CACHE STRING "" FORCE)

				    set(Seastar_SCHEDULING_GROUPS_COUNT 24 CACHE STRING "" FORCE)

				    set(Seastar_UNUSED_RESULT_ERROR ON CACHE BOOL "" FORCE)

				    # Match configure.py's build_seastar_shared_libs: Debug and Dev

				    # build Seastar as a shared library, others build it static.

				    if(CMAKE_BUILD_TYPE STREQUAL "Debug" OR CMAKE_BUILD_TYPE STREQUAL "Dev")

				        set(BUILD_SHARED_LIBS ON CACHE BOOL "" FORCE)

				    else()

				        set(BUILD_SHARED_LIBS OFF CACHE BOOL "" FORCE)

				    endif()

				    add_subdirectory(seastar)

				    target_compile_definitions (seastar

				      PRIVATE

				        SEASTAR_NO_EXCEPTION_HACK)

				    # Coverage mode sets cmake_build_type='Debug' for Seastar

				    # (configure.py:515), so Seastar's pkg-config output includes sanitizer

				    # link flags in seastar_libs_coverage (configure.py:2514,2649).

				    # Seastar's own CMake only activates sanitizer targets for Debug/Sanitize

				    # configs, so we inject link options on the seastar target for Coverage.

				    # Using PUBLIC ensures they propagate to all targets linking Seastar

				    # (but not standalone tools like patchelf), matching configure.py's

				    # behavior.  Compile-time flags and defines are handled globally in

				    # cmake/mode.Coverage.cmake.

				    if(CMAKE_BUILD_TYPE STREQUAL "Coverage")

				        target_link_options(seastar

				            PUBLIC

				                -fsanitize=address

				                -fsanitize=undefined

				                -fsanitize=vptr)

				    endif()

				endif()

				set(ABSL_PROPAGATE_CXX_STD ON CACHE BOOL "" FORCE)

				@@ -111,8 +148,10 @@ if(Scylla_ENABLE_LTO)

				endif()

				find_package(Sanitizers QUIET)

				# Match configure.py:2192 — abseil gets sanitizer flags with -fno-sanitize=vptr

				# to exclude vptr checks which are incompatible with abseil's usage.

				list(APPEND absl_cxx_flags

				    $<$<CONFIG:Debug,Sanitize>:$<TARGET_PROPERTY:Sanitizers::address,INTERFACE_COMPILE_OPTIONS>;$<TARGET_PROPERTY:Sanitizers::undefined_behavior,INTERFACE_COMPILE_OPTIONS>>)

				    $<$<CONFIG:Debug,Sanitize>:$<TARGET_PROPERTY:Sanitizers::address,INTERFACE_COMPILE_OPTIONS>;$<TARGET_PROPERTY:Sanitizers::undefined_behavior,INTERFACE_COMPILE_OPTIONS>;-fno-sanitize=vptr>)

				if(CMAKE_CXX_COMPILER_ID STREQUAL "GNU")

				    list(APPEND ABSL_GCC_FLAGS ${absl_cxx_flags})

				elseif(CMAKE_CXX_COMPILER_ID STREQUAL "Clang")

				@@ -137,9 +176,38 @@ add_library(absl::headers ALIAS absl-headers)

				# unfortunately.

				set_target_properties(absl_strerror PROPERTIES EXCLUDE_FROM_ALL TRUE)

				# Now that seastar and abseil subdirectories are fully processed, add

				# SEASTAR_TESTING_MAIN globally. This matches configure.py's global define

				# without leaking into seastar (which would cause duplicate main symbols).

				add_compile_definitions(SEASTAR_TESTING_MAIN)

				# System libraries dependencies

				find_package(Boost REQUIRED

				    COMPONENTS filesystem program_options system thread regex unit_test_framework)

				# When using shared Boost libraries, define BOOST_ALL_DYN_LINK (matching configure.py)

				if(NOT Boost_USE_STATIC_LIBS)

				    add_compile_definitions(BOOST_ALL_DYN_LINK)

				endif()

				# CMake's Boost package config adds per-component defines like

				# BOOST_UNIT_TEST_FRAMEWORK_DYN_LINK, BOOST_REGEX_DYN_LINK, etc. on the

				# imported targets. configure.py only uses BOOST_ALL_DYN_LINK (which covers

				# all components), so strip the per-component defines to align the two build

				# systems.

				foreach(_boost_target

				    Boost::unit_test_framework

				    Boost::regex

				    Boost::filesystem

				    Boost::program_options

				    Boost::system

				    Boost::thread)

				  if(TARGET ${_boost_target})

				    # Completely remove all INTERFACE_COMPILE_DEFINITIONS from the Boost target.

				    # This prevents per-component *_DYN_LINK and *_NO_LIB defines from

				    # propagating. BOOST_ALL_DYN_LINK (set globally) covers all components.

				    set_property(TARGET ${_boost_target} PROPERTY INTERFACE_COMPILE_DEFINITIONS)

				  endif()

				endforeach()

				target_link_libraries(Boost::regex

				  INTERFACE

				    ICU::i18n

				@@ -166,6 +234,9 @@ generate_scylla_version()

				option(Scylla_USE_PRECOMPILED_HEADER "Use precompiled header for Scylla" ON)

				add_library(scylla-precompiled-header STATIC exported_templates.cc)

				target_include_directories(scylla-precompiled-header PRIVATE

				    "${CMAKE_CURRENT_SOURCE_DIR}"

				    "${scylla_gen_build_dir}")

				target_link_libraries(scylla-precompiled-header PRIVATE

				    absl::headers

				    absl::btree

				@@ -196,6 +267,10 @@ if (Scylla_USE_PRECOMPILED_HEADER)

				    message(STATUS "Using precompiled header for Scylla - remember to add `sloppiness = pch_defines,time_macros` to ccache.conf, if you're using ccache.")

				    target_precompile_headers(scylla-precompiled-header PRIVATE "stdafx.hh")

				    target_compile_definitions(scylla-precompiled-header PRIVATE SCYLLA_USE_PRECOMPILED_HEADER)

				    # Match configure.py: -fpch-validate-input-files-content tells the compiler

				    # to check content of stdafx.hh if timestamps don't match (important for

				    # ccache/git workflows where timestamps may not be preserved).

				    add_compile_options(-fpch-validate-input-files-content)

				  endif()

				else()

				  set(Scylla_USE_PRECOMPILED_HEADER_USE OFF)

				@@ -300,7 +375,6 @@ add_subdirectory(locator)

				add_subdirectory(message)

				add_subdirectory(mutation)

				add_subdirectory(mutation_writer)

				add_subdirectory(node_ops)

				add_subdirectory(readers)

				add_subdirectory(replica)

				add_subdirectory(raft)

									
IMPLEMENTATION_SUMMARY.md (197 changes)
				@@ -1,197 +0,0 @@

				# Implementation Summary: Error Injection Event Stream

				## Problem Statement

				Tests using error injections had to rely on log parsing to detect when injection points were hit:

				```python

				mark, _ = await log.wait_for('topology_coordinator_pause_before_processing_backlog: waiting', from_mark=mark)

				```

				This approach was:

				- **Slow**: Required waiting for log flushes and buffer processing

				- **Unreliable**: Regex matching could fail or match wrong lines

				- **Fragile**: Changes to log messages broke tests

				## Solution

				Implemented a Server-Sent Events (SSE) API that sends real-time notifications when error injection points are triggered.

				## Implementation

				### 1. Backend Event System (`utils/error_injection.hh`)

				**Added**:

				- `error_injection_event_callback` type for event notifications

				- `_event_callbacks` vector to store registered callbacks

				- `notify_event()` method called by all `inject()` methods

				- `register_event_callback()` / `clear_event_callbacks()` methods

				- Cross-shard registration via `register_event_callback_on_all()`

				**Modified**:

				- All `inject()` methods now call `notify_event()` after logging

				- Changed log level from DEBUG to INFO for better visibility

				- Both enabled/disabled template specializations updated

				### 2. SSE API Endpoint (`api/error_injection.cc`)

				**Added**:

				- `GET /v2/error_injection/events` endpoint

				- Streams events in SSE format: `data: {"injection":"name","type":"handler","shard":0}\n\n`

				- Cross-shard event collection using `foreign_ptr` and `smp::submit_to()`

				- Automatic cleanup on client disconnect
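The sketch below shows how a client might decode this stream. It assumes that each event is a single `data:` line carrying the JSON object shown above, terminated by a blank line. `parse_sse_events` is a hypothetical helper written for illustration and is not taken from `test/pylib/rest_client.py`.

```python
import json
from typing import Iterable, Iterator


def parse_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON object per SSE event (hypothetical helper).

    Basic SSE framing: `data:` lines accumulate, and a blank line ends the
    event. Comment lines (starting with ':') and other fields are ignored.
    """
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the accumulated event, if there is one.
            if data_lines:
                yield json.loads("\n".join(data_lines))
                data_lines = []
        elif line.startswith("data:"):
            # SSE strips a single leading space after the colon.
            value = line[len("data:"):]
            if value.startswith(" "):
                value = value[1:]
            data_lines.append(value)
        # Other lines (": keepalive", "event:", "id:") are skipped here.


stream = 'data: {"injection":"my_injection","type":"handler","shard":0}\n\n'
events = list(parse_sse_events(stream.splitlines(keepends=True)))
print(events)  # [{'injection': 'my_injection', 'type': 'handler', 'shard': 0}]
```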

				**Architecture**:

				1. Client connects → queue created on handler shard

				2. Callbacks registered on ALL shards

				3. When injection fires → event sent via `smp::submit_to()` to queue

				4. Queue → SSE stream → client

				5. Client disconnect → callbacks cleared on all shards
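These five steps amount to a fan-in: many producers feed one queue, which has one consumer. The sketch below models that data flow with Python `asyncio`. Tasks stand in for Seastar shards, and `queue.put()` stands in for `smp::submit_to()`. It illustrates the pattern under those assumptions and is not the C++ code in `api/error_injection.cc`. The function names are hypothetical.

```python
import asyncio


async def shard(shard_id: int, injections: list[str], queue: asyncio.Queue) -> None:
    """Stand-in for one shard. When an injection "fires", forward an event to
    the queue owned by the handler shard (the role smp::submit_to() plays)."""
    for name in injections:
        await asyncio.sleep(0)  # yield control, as a shard would between tasks
        await queue.put({"injection": name, "type": "handler", "shard": shard_id})


async def collect(num_shards: int) -> list[dict]:
    # Step 1: a single queue lives on the handler shard.
    queue: asyncio.Queue = asyncio.Queue()
    # Steps 2-3: every shard gets a producer that forwards events into that queue.
    producers = [
        asyncio.create_task(shard(i, [f"inj_{i}"], queue)) for i in range(num_shards)
    ]
    await asyncio.gather(*producers)
    # Step 4: drain the queue. A real endpoint would write each event as SSE.
    events = []
    while not queue.empty():
        events.append(queue.get_nowait())
    # Step 5: on disconnect the callbacks would be cleared. Here the tasks have finished.
    return events


events = asyncio.run(collect(num_shards=3))
print(sorted(e["shard"] for e in events))  # [0, 1, 2]
```

For simplicity, the sketch drains the queue after all producers finish rather than streaming concurrently. It also sorts by shard before printing. A consumer of such a stream should probably not rely on any ordering across shards.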

				### 3. Python Client (`test/pylib/rest_client.py`)

				**Added**:

				- `InjectionEventStream` class:

				  - `wait_for_injection(name, timeout)` - wait for specific injection

				  - Background task reads SSE stream

				  - Queue-based event delivery

				- `injection_event_stream()` context manager for lifecycle

				- Full async/await support

				**Usage**:

				```python

				async with injection_event_stream(server_ip) as stream:

				    await api.enable_injection(server_ip, "my_injection", one_shot=True)

				    # ... trigger operation ...

				    event = await stream.wait_for_injection("my_injection", timeout=30)

				```
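The `wait_for_injection(name, timeout)` semantics could be built on the queue-based delivery described above, as sketched below. The sketch assumes that events are plain dicts with an `"injection"` key. `EventWaiter` is a hypothetical class and not the actual `InjectionEventStream`. It keeps non-matching events in a pending list, so waiting for one injection does not discard another. It relies on `asyncio.wait_for` for the timeout.

```python
import asyncio


class EventWaiter:
    """Hypothetical stand-in for the waiting logic of a stream client."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()
        self._pending: list[dict] = []  # events received but not yet requested

    def deliver(self, event: dict) -> None:
        # Called by the background task that reads the SSE stream.
        self._queue.put_nowait(event)

    async def wait_for_injection(self, name: str, timeout: float) -> dict:
        # First check events that arrived while waiting for something else.
        for i, event in enumerate(self._pending):
            if event["injection"] == name:
                return self._pending.pop(i)

        async def next_match() -> dict:
            while True:
                event = await self._queue.get()
                if event["injection"] == name:
                    return event
                self._pending.append(event)  # keep it for a later waiter

        # Raises asyncio.TimeoutError if no matching event arrives in time.
        return await asyncio.wait_for(next_match(), timeout)


async def demo() -> tuple[dict, dict]:
    waiter = EventWaiter()
    waiter.deliver({"injection": "other", "type": "handler", "shard": 1})
    waiter.deliver({"injection": "my_injection", "type": "handler", "shard": 0})
    event = await waiter.wait_for_injection("my_injection", timeout=1.0)
    other = await waiter.wait_for_injection("other", timeout=1.0)
    return event, other


event, other = asyncio.run(demo())
print(event["shard"], other["injection"])  # 0 other
```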

				### 4. Tests (`test/cluster/test_error_injection_events.py`)

				**Added**:

				- `test_injection_event_stream_basic` - basic functionality

				- `test_injection_event_stream_multiple_injections` - multiple tracking

				- `test_injection_event_vs_log_parsing_comparison` - old vs new

				### 5. Documentation (`docs/dev/error_injection_events.md`)

				Complete documentation covering:

				- Architecture and design

				- Usage examples

				- Migration guide from log parsing

				- Thread safety and cleanup

				## Key Design Decisions

				### Why SSE instead of WebSocket?

				- **Unidirectional**: We only need server → client events

				- **Simpler**: Built on HTTP, easier to implement

				- **Standard**: Well-supported in Python (aiohttp)

				- **Sufficient**: No need for bidirectional communication

				### Why Thread-Local Callbacks?

				- **Performance**: No cross-shard synchronization overhead

				- **Simplicity**: Each shard independent

				- **Safety**: No shared mutable state

				- Event delivery handled by `smp::submit_to()`

				### Why Info Level Logging?

				- **Visibility**: Events should be visible in logs AND via SSE

				- **Debugging**: Easier to correlate events with log context

				- **Consistency**: Matches importance of injection triggers

				## Benefits

				### Performance

				- **Instant notification**: No waiting for log flushes

				- **No regex matching**: Direct event delivery

				- **Parallel processing**: Events from all shards

				### Reliability

				- **Type-safe**: Structured JSON events

				- **No missed events**: Queue-based delivery

				- **Automatic cleanup**: RAII ensures no leaks

				### Developer Experience

				- **Clean API**: Simple async/await pattern

				- **Better errors**: Timeout on specific injection name

				- **Metadata**: Event includes type and shard ID

				- **Backward compatible**: Existing tests unchanged

				## Testing

				### Security

				✅ CodeQL scan: **0 alerts** (Python)

				### Validation Needed

				Due to build environment limitations, the following validations are recommended:

				- [ ] Build C++ code in dev mode

				- [ ] Run example tests: `./test.py --mode=dev test/cluster/test_error_injection_events.py`

				- [ ] Verify SSE connection lifecycle (connect, disconnect, reconnect)

				- [ ] Test with multiple concurrent clients

				- [ ] Verify cross-shard event delivery

				- [ ] Performance comparison with log parsing

				## Files Changed

				```

				api/api-doc/error_injection.json            |  15 +++

				api/error_injection.cc                      |  82 ++++++++++++++

				docs/dev/error_injection_events.md          | 132 +++++++++++++++++++++

				test/cluster/test_error_injection_events.py | 140 ++++++++++++++++++++++

				test/pylib/rest_client.py                   | 144 ++++++++++++++++++++++

				utils/error_injection.hh                    |  81 +++++++++++++

				6 files changed, 587 insertions(+), 7 deletions(-)

				```

				## Migration Guide

				### Old Approach

				```python

				log = await manager.server_open_log(server.server_id)

				mark = await log.mark()

				await manager.api.enable_injection(server.ip_addr, "my_injection", one_shot=True)

				# ... trigger operation ...

				mark, _ = await log.wait_for('my_injection: waiting', from_mark=mark)

				```

				### New Approach

				```python

				async with injection_event_stream(server.ip_addr) as stream:

				    await manager.api.enable_injection(server.ip_addr, "my_injection", one_shot=True)

				    # ... trigger operation ...

				    event = await stream.wait_for_injection("my_injection", timeout=30)

				```

				### Backward Compatibility

				- ✅ All existing log-based tests continue to work

				- ✅ Logging still happens (now at INFO level)

				- ✅ No breaking changes to existing APIs

				- ✅ SSE is opt-in for new tests

				## Future Enhancements

				Possible improvements:

				1. Server-side filtering by injection name (query parameter)

				2. Include injection parameters in events

				3. Add event timestamps

				4. Event history/replay support

				5. Multiple concurrent SSE clients per server

				6. WebSocket support if bidirectional communication needed

				## Conclusion

				This implementation successfully addresses the problem statement:

				- ✅ Eliminates log parsing

				- ✅ Faster tests

				- ✅ More reliable detection

				- ✅ Clean API

				- ✅ Backward compatible

				- ✅ Well documented

				- ✅ Security validated

				The solution follows ScyllaDB best practices:

				- RAII for resource management

				- Seastar async patterns (coroutines, futures)

				- Cross-shard communication via `smp::submit_to()`

				- Thread-local state, no locks

				- Comprehensive error handling

									
LICENSE-ScyllaDB-Source-Available.md (46 changes)
				@@ -1,8 +1,8 @@

				## **SCYLLADB SOFTWARE LICENSE AGREEMENT**

				| Version: | 1.0 |

				| Version: | 1.1 |

				| :---- | :---- |

				| Last updated: | December 18, 2024 |

				| Last updated: | April 12, 2026 |

				**Your Acceptance**

				@@ -12,20 +12,48 @@ The terms "**You**" or "**Licensee**" refer to any individual accessing or using

				**Grant of License**

				* **Software Definitions:** Software means the ScyllaDB software provided by Licensor, including the source code, object code, and any accompanying documentation or tools, or any part thereof, as made available under this Agreement.

				* **Grant of License:** Subject to the terms and conditions of this Agreement, Licensor grants You a limited, non-exclusive, revocable, non-sublicensable, non-transferable, royalty free license to Use the Software, in each case solely for the purposes of:

				* **Definitions:**

				  1. **Software:** Software means the ScyllaDB software provided by Licensor, including the source code, object code, and any accompanying documentation or tools, or any part thereof, as made available under this Agreement.

				  2. **Commercial Customer**: means any legal entity (including its Affiliates) that has entered into a transaction with Licensor, or an authorized reseller/distributor, for the provision of any ScyllaDB products or services. This includes, without limitation:  (a) Scope of Service: Any paid subscription, enterprise license, "BYOA" or Database-as-a-Service (DBaaS) offering, technical support, professional services, consulting, or training. (b) Scale and Volume: Any deployment regardless of size, capacity, or performance metrics (c) Payment Method: Any compensation model, including but not limited to, fixed-fee, consumption-based (On-Demand), committed spend, third-party marketplace credits (e.g., AWS, GCP, Azure), or promotional credits and discounts.

				* **Grant of License:** Subject to the terms and conditions of this Agreement, including the Eligibility and Exclusive Use Restrictions clause, Licensor grants You a limited, non-exclusive, revocable, non-sublicensable, non-transferable, royalty free license to Use the Software, in each case solely for the purposes of:

				  1) Copying, distributing, evaluating (including performing benchmarking or comparative tests or evaluations , subject to the limitations below) and improving the Software and ScyllaDB; and

				  2) create a modified version of the Software (each, a "**Licensed Work**"); provided however, that each such Licensed Work keeps all or substantially all of the functions and features of the Software, and/or using all or substantially all of the source code of the Software. You hereby agree that all the Licensed Work are, upon creation, considered Licensed Work of the Licensor, shall be the sole property of the Licensor and its assignees, and the Licensor and its assignees shall be the sole owner of all rights of any kind or nature, in connection with such Licensed Work. You hereby irrevocably and unconditionally assign to the Licensor all the Licensed Work and any part thereof.  This License applies separately for each version of the Licensed Work, which shall be considered "Software" for the purpose of this Agreement.

				* **Eligibility and Exclusive Use Restrictions**

				**License Limitations, Restrictions and Obligations:** The license grant above is subject to the following limitations, restrictions, and obligations. If Licensee’s Use of the Software does not comply with the above license grant or the terms of this section (including exceeding the Usage Limit set forth below), Licensee must: (i) refrain from any Use of the Software; and (ii) purchase a [commercial paid license](https://www.scylladb.com/scylladb-proprietary-software-license-agreement/) from the Licensor.

				i. 	Restricted to "Never Customers" Only. The license granted under this Agreement is strictly limited to Never Customers. For purposes of this Agreement, a "Never Customer" is an entity (including its Affiliates) that does not have, and has not had within the previous twelve (12) months, a paid commercial subscription, professional services agreement, or any other commercial relationship with Licensor. Satisfaction of the Never Customer criteria is a strict condition precedent to the effectiveness of this License. 

				ii. 	Total Prohibition for Existing Commercial Customers. If You (or any of Your Affiliates) are an existing Commercial Customer of Licensor within the last twelve (12) months, no license is deemed to have been offered or extended to You, and any download or installation of the Software by You is unauthorized. This prohibition applies to all deployments, including but not limited to:

				(a) existing commercial workloads;

				(b) any new use cases, new applications, or new workloads

				iii. **No "Hybrid" Usage**. Licensee is expressly prohibited from combining free tier usage under this Agreement with paid commercial units. 

				If You are a Commercial Customer, all use of the Software across Your entire organization (and any of your Affiliates) must be governed by a valid, paid commercial agreement. Use of the Software under this license by a Commercial Customer (which is not a "Never Customer") shall:

				(a) Void this license *ab initio*;

				(b) Be deemed a material breach of both this Agreement and any existing commercial terms; and

				(c) Entitle Licensor to invoice Licensee for such unauthorized usage at Licensor's standard list prices, retroactive to the date of first use.

				Notwithstanding anything to the contrary in the Eligibility or License Limitations sections above a Commercial Customer may use the Software exclusively for non-production purposes, including Continuous Integration (CI), automated testing, and quality assurance environments, provided that such use at all times remains compliant with the Usage Limit.

				iv. **Verification**. Licensor reserves the right to audit Licensee's environment and corporate identity to ensure compliance with these eligibility criteria.

				For the purposes of this Agreement an "**Affiliate**" means any entity that directly or indirectly controls, is controlled by, or is under common control with a party, where "control" means ownership of more than 50% of the voting stock or decision-making authority

				**License Limitations, Restrictions and Obligations:** The license grant above is subject to the following limitations, restrictions, and obligations. If Licensee’s Use of the Software does not comply with the above license grant or the terms of this section (including exceeding the Usage Limit set forth below), Licensee must: (i) refrain from any Use of the Software; and (ii) unless Licensee is a Never Customer, purchase a [commercial paid license](https://www.scylladb.com/scylladb-proprietary-software-license-agreement/) from the Licensor.

				* **Updates:** You shall be solely responsible for providing all equipment, systems, assets, access, and ancillary goods and services needed to access and Use the Software.  Licensor may modify or update the Software at any time, without notification, in its sole and absolute discretion.  After the effective date of each such update, Licensor shall bear no obligation to run, provide or support legacy versions of the Software.

				* **"Usage Limit":** Licensee's total overall available storage across all deployments and clusters of the Software and the Licensed Work under this License shall not exceed 10TB and/or an upper limit of 50 VCPUs (hyper threads).

				* **IP Markings:** Licensee must retain all copyright, trademark, and other proprietary notices contained in the Software. You will not modify, delete, alter, remove, or obscure any intellectual property, including without limitations licensing, copyright, trademark, or any other notices of Licensor in the Software.

				* **License Reproduction:** You must conspicuously display this Agreement on each copy of the Software. If You receive the Software from a third party, this Agreement still applies to Your Use of the Software. You will be responsible for any breach of this Agreement by any such third-party.

				* Distribution of any Licensed Works is permitted, provided that: (i) You must include in any Licensed Work prominent notices stating that You have modified the Software, (ii) You include a copy of this Agreement with the Licensed Work, and (iii) You clearly identify all modifications made in the Licensed Work and provides attribution to the Licensor as the original author(s) of the Software.

				* **Commercial Use Restrictions:** Licensee may not offer the Software as a software-as-a-service (SaaS) or commercial database-as-as-service (dBaaS) offering.  Licensee may not use the Software to compete with Licensor's existing or future products or services. If your Use of the Software does not comply with the requirements currently in effect as described in this License, you must purchase a commercial license from the Licensor, its affiliated entities, or you must refrain from using the Software and all Licensed Work. Furthermore, if You make any written claim of patent infringement relating to the Software, Your patent license for the Software granted under this Agreement terminates immediately.

				* **Commercial Use Restrictions:** Licensee may not offer the Software as a software-as-a-service (SaaS) or commercial database-as-as-service (dBaaS) offering.  Licensee may not use the Software to compete with Licensor's existing or future products or services. If your Use of the Software does not comply with the requirements currently in effect as described in this License, you must purchase a commercial license from the Licensor, its Affiliated entities, or you must refrain from using the Software and all Licensed Work. Furthermore, if You make any written claim of patent infringement relating to the Software, Your patent license for the Software granted under this Agreement terminates immediately.

				* Notwithstanding anything to the contrary, under the License granted hereunder, You shall not and shall not permit others to: (i) transfer the Software or any portions thereof to any other party except as expressly permitted herein; (ii) attempt to circumvent or overcome any technological protection measures incorporated into the Software; (iii) incorporate the Software into the structure, machinery or controls of any aircraft, other aerial device, military vehicle, hovercraft, waterborne craft or any medical equipment of any kind; or (iv) use the Software or any part thereof in any unlawful, harmful or illegal manner, or in a manner which infringes third parties’ rights in any way, including intellectual property rights.

				**Monitoring; Audit**

				@@ -41,14 +69,14 @@ The terms "**You**" or "**Licensee**" refer to any individual accessing or using

				**Indemnity; Disclaimer; Limitation of Liability**

				* **Indemnity:** Licensee hereby agrees to indemnify, defend and hold harmless Licensor and its affiliates from any losses or damages incurred due to a third party claim arising out of: (i) Licensee’s breach of this Agreement; (ii) Licensee’s negligence, willful misconduct or violation of law, or (iii) Licensee’s products or services.

				* **Indemnity:** Licensee hereby agrees to indemnify, defend and hold harmless Licensor and its Affiliates from any losses or damages incurred due to a third party claim arising out of: (i) Licensee’s breach of this Agreement; (ii) Licensee’s negligence, willful misconduct or violation of law, or (iii) Licensee’s products or services.

				* DISCLAIMER OF WARRANTIES:  LICENSEE AGREES THAT LICENSOR HAS MADE NO EXPRESS WARRANTIES REGARDING THE SOFTWARE AND THAT THE SOFTWARE IS BEING PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND. LICENSOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THE SOFTWARE, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION, ANY IMPLIED WARRANTIES OF FITNESS FOR A PARTICULAR PURPOSE; TITLE; MERCHANTABILITY;  OR NON-INFRINGEMENT OF THIRD PARTY RIGHTS. LICENSOR DOES NOT WARRANT THAT THE SOFTWARE WILL OPERATE UNINTERRUPTED OR ERROR FREE, OR THAT ALL ERRORS WILL BE CORRECTED.  LICENSOR DOES NOT GUARANTEE ANY PARTICULAR RESULTS FROM THE USE OF THE SOFTWARE, AND DOES NOT WARRANT THAT THE SOFTWARE IS FIT FOR ANY PARTICULAR PURPOSE.

				* LIMITATION OF LIABILITY:  TO THE FULLEST EXTENT PERMISSIBLE UNDER APPLICABLE LAW, IN NO EVENT WILL LICENSOR AND/OR ITS AFFILIATES, EMPLOYEES, OFFICERS AND DIRECTORS BE LIABLE TO LICENSEE FOR (I) ANY LOSS OF USE OR DATA; INTERRUPTION OF BUSINESS; OR ANY INDIRECT; SPECIAL; INCIDENTAL; OR CONSEQUENTIAL DAMAGES OF ANY KIND (INCLUDING LOST PROFITS); AND (II) ANY DIRECT DAMAGES EXCEEDING THE TOTAL AMOUNT OF ONE THOUSAND US DOLLARS ($1,000).  THE FOREGOING PROVISIONS LIMITING THE LIABILITY OF LICENSOR SHALL APPLY REGARDLESS OF THE FORM OR CAUSE OF ACTION, WHETHER IN STRICT LIABILITY, CONTRACT OR TORT.

				**Proprietary Rights; No Other Rights**

				* **Ownership:** Licensor retains sole and exclusive ownership of all rights, interests and title in the Software and any scripts, processes, techniques, methodologies, inventions, know-how, concepts, formatting, arrangements, visual attributes, ideas, database rights, copyrights, patents, trade secrets, and other intellectual property related thereto, and all derivatives, enhancements, modifications and improvements thereof. Except for the limited license rights granted herein, Licensee has no rights in or to the Software and/ or Licensor’s trademarks, logo, or branding and You acknowledge that such Software, trademarks, logo, or branding is the sole property of Licensor.

				* **Feedback:** Licensee is not required to provide any suggestions, enhancement requests, recommendations or other feedback regarding the Software ("Feedback").  If, notwithstanding this policy, Licensee submits Feedback, Licensee understands and acknowledges that such Feedback is not submitted in confidence and Licensor assumes no obligation, expressed or implied, by considering it.  All right in any trademark or logo of Licensor or its affiliates and You shall make no claim of right to the Software or any part thereof to be supplied by Licensor hereunder and acknowledges that as between Licensor and You, such Software is the sole proprietary, title and interest in and to Licensor.such Feedback shall be assigned to, and shall become the sole and exclusive property of, Licensor upon its creation.

				* **Feedback:** Licensee is not required to provide any suggestions, enhancement requests, recommendations or other feedback regarding the Software ("Feedback").  If, notwithstanding this policy, Licensee submits Feedback, Licensee understands and acknowledges that such Feedback is not submitted in confidence and Licensor assumes no obligation, expressed or implied, by considering it.  All right in any trademark or logo of Licensor or its Affiliates and You shall make no claim of right to the Software or any part thereof to be supplied by Licensor hereunder and acknowledges that as between Licensor and You, such Software is the sole proprietary, title and interest in and to Licensor.such Feedback shall be assigned to, and shall become the sole and exclusive property of, Licensor upon its creation.

				* Except for the rights expressly granted to You under this Agreement, You are not granted any other licenses or rights in the Software or otherwise. This Agreement constitutes the entire agreement between You and the Licensor with respect to the subject matter hereof and supersedes all prior or contemporaneous communications, representations, or agreements, whether oral or written.

				* **Third-Party Software:** Customer acknowledges that the Software may contain open and closed source components (“OSS Components”) that are governed separately by certain licenses, in each case as further provided by Company upon request. Any applicable OSS Component license is solely between Licensee and the applicable licensor of the OSS Component and Licensee shall comply with the applicable OSS Component license.

				* If any provision of this Agreement is held to be invalid or unenforceable, such provision shall be struck and the remaining provisions shall remain in full force and effect.

				@@ -56,7 +84,7 @@ The terms "**You**" or "**Licensee**" refer to any individual accessing or using

				**Miscellaneous**

				* **Miscellaneous:** This Agreement may be modified at any time by Licensor, and constitutes the entire agreement between the parties with respect to the subject matter hereof. Licensee may not assign or subcontract its rights or obligations under this Agreement.  This Agreement does not, and shall not be construed to create any relationship, partnership, joint venture, employer-employee, agency, or franchisor-franchisee relationship between the parties.

				* **Modifications**: Licensor reserves the right to modify this Agreement at any time. Changes will be effective upon posting to the Website or within the Software repository. Continued use of the Software after such changes constitutes acceptance.

				* **Governing Law & Jurisdiction:** This Agreement shall be governed and construed in accordance with the laws of Israel, without giving effect to their respective conflicts of laws provisions, and the competent courts situated in Tel Aviv, Israel, shall have sole and exclusive jurisdiction over the parties and any conflict and/or dispute arising out of, or in connection to, this Agreement

				\[*End of ScyllaDB Software License Agreement*\]

README.md (2 changes)

@@ -43,7 +43,7 @@ For further information, please see:
 [developer documentation]: HACKING.md
 [build documentation]: docs/dev/building.md
 [docker image build documentation]: dist/docker/debian/README.md
 [docker image build documentation]: dist/docker/redhat/README.md
 ## Running Scylla

abseil (2 changes)

Submodule abseil updated: d7aaad83b4...255c84dadd

									
absl-flat_hash_map.cc (2 changes)
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#include "absl-flat_hash_map.hh"

									
absl-flat_hash_map.hh (2 changes)
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

									
alternator/auth.cc (11 changes)
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#include "alternator/error.hh"

				@@ -13,7 +13,8 @@

				#include <string_view>

				#include "alternator/auth.hh"

				#include <fmt/format.h>

				#include "auth/password_authenticator.hh"

				#include "db/consistency_level_type.hh"

				#include "db/system_keyspace.hh"

				#include "service/storage_proxy.hh"

				#include "alternator/executor.hh"

				#include "cql3/selection/selection.hh"

				@@ -25,8 +26,8 @@ namespace alternator {

				static logging::logger alogger("alternator-auth");

				future<std::string> get_key_from_roles(service::storage_proxy& proxy, auth::service& as, std::string username) {

				    schema_ptr schema = proxy.data_dictionary().find_schema(auth::get_auth_ks_name(as.query_processor()), "roles");

				future<std::string> get_key_from_roles(service::storage_proxy& proxy, std::string username) {

				    schema_ptr schema = proxy.data_dictionary().find_schema(db::system_keyspace::NAME, "roles");

				    partition_key pk = partition_key::from_single_value(*schema, utf8_type->decompose(username));

				    dht::partition_range_vector partition_ranges{dht::partition_range(dht::decorate_key(*schema, pk))};

				    std::vector<query::clustering_range> bounds{query::clustering_range::make_open_ended_both_sides()};

				@@ -39,7 +40,7 @@ future<std::string> get_key_from_roles(service::storage_proxy& proxy, auth::serv

				    auto partition_slice = query::partition_slice(std::move(bounds), {}, query::column_id_vector{salted_hash_col->id, can_login_col->id}, selection->get_query_options());

				    auto command = ::make_lw_shared<query::read_command>(schema->id(), schema->version(), partition_slice,

				            proxy.get_max_result_size(partition_slice), query::tombstone_limit(proxy.get_tombstone_limit()));

				    auto cl = auth::password_authenticator::consistency_for_user(username);

				    auto cl = db::consistency_level::LOCAL_ONE;

				    service::client_state client_state{service::client_state::internal_tag()};

				    service::storage_proxy::coordinator_query_result qr = co_await proxy.query(schema, std::move(command), std::move(partition_ranges), cl,

									
										4

alternator/auth.hh
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

				@@ -20,6 +20,6 @@ namespace alternator {

				using key_cache = utils::loading_cache<std::string, std::string, 1>;

				future<std::string> get_key_from_roles(service::storage_proxy& proxy, auth::service& as, std::string username);

				future<std::string> get_key_from_roles(service::storage_proxy& proxy, std::string username);

				}

									
										2

alternator/conditions.cc
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#include <string_view>

									
										2

alternator/conditions.hh
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				/*

									
										2

alternator/consumed_capacity.cc
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#include "consumed_capacity.hh"

									
										2

alternator/consumed_capacity.hh
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

									
										2

alternator/controller.cc
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#include <seastar/core/with_scheduling_group.hh>

									
										2

alternator/controller.hh
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

									
										2

alternator/error.hh
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

									
										28

alternator/executor.cc
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#include <fmt/ranges.h>

				@@ -63,6 +63,7 @@

				#include "types/types.hh"

				#include "db/system_keyspace.hh"

				#include "cql3/statements/ks_prop_defs.hh"

				#include "alternator/ttl_tag.hh"

				using namespace std::chrono_literals;

				@@ -164,7 +165,7 @@ static map_type attrs_type() {

				static const column_definition& attrs_column(const schema& schema) {

				    const column_definition* cdef = schema.get_column_definition(bytes(executor::ATTRS_COLUMN_NAME));

				    SCYLLA_ASSERT(cdef);

				    throwing_assert(cdef);

				    return *cdef;

				}

				@@ -1649,7 +1650,7 @@ static future<> mark_view_schemas_as_built(utils::chunked_vector<mutation>& out,

				}

				future<executor::request_return_type> executor::create_table_on_shard0(service::client_state&& client_state, tracing::trace_state_ptr trace_state, rjson::value request, bool enforce_authorization, bool warn_authorization, const db::tablets_mode_t::mode tablets_mode) {

				    SCYLLA_ASSERT(this_shard_id() == 0);

				    throwing_assert(this_shard_id() == 0);

				    // We begin by parsing and validating the content of the CreateTable

				    // command. We can't inspect the current database schema at this point

				@@ -2837,14 +2838,12 @@ future<executor::request_return_type> rmw_operation::execute(service::storage_pr

				        }

				    } else if (_write_isolation != write_isolation::LWT_ALWAYS) {

				        std::optional<mutation> m = apply(nullptr, api::new_timestamp(), cdc_opts);

				        SCYLLA_ASSERT(m); // !needs_read_before_write, so apply() did not check a condition

				        throwing_assert(m); // !needs_read_before_write, so apply() did not check a condition

				        return proxy.mutate(utils::chunked_vector<mutation>{std::move(*m)}, db::consistency_level::LOCAL_QUORUM, executor::default_timeout(), trace_state, std::move(permit), db::allow_per_partition_rate_limit::yes, false, std::move(cdc_opts)).then([this, &wcu_total] () mutable {

				            return rmw_operation_return(std::move(_return_attributes), _consumed_capacity, wcu_total);

				        });

				    }

				    if (!cas_shard) {

				        on_internal_error(elogger, "cas_shard is not set");

				    }

				    throwing_assert(cas_shard);

				    // If we're still here, we need to do this write using LWT:

				    global_stats.write_using_lwt++;

				    per_table_stats.write_using_lwt++;

				@@ -3464,7 +3463,11 @@ future<executor::request_return_type> executor::batch_write_item(client_state& c

				    if (should_add_wcu) {

				        rjson::add(ret, "ConsumedCapacity", std::move(consumed_capacity));

				    }

				    _stats.api_operations.batch_write_item_latency.mark(std::chrono::steady_clock::now() - start_time);

				    auto duration = std::chrono::steady_clock::now() - start_time;

				    _stats.api_operations.batch_write_item_latency.mark(duration);

				    for (const auto& w : per_table_wcu) {

				        w.first->api_operations.batch_write_item_latency.mark(duration);

				    }

				    co_return rjson::print(std::move(ret));

				}

				@@ -4975,7 +4978,12 @@ future<executor::request_return_type> executor::batch_get_item(client_state& cli

				    if (!some_succeeded && eptr) {

				        co_await coroutine::return_exception_ptr(std::move(eptr));

				    }

				    _stats.api_operations.batch_get_item_latency.mark(std::chrono::steady_clock::now() - start_time);

				    auto duration = std::chrono::steady_clock::now() - start_time;

				    _stats.api_operations.batch_get_item_latency.mark(duration);

				    for (const table_requests& rs : requests) {

				        lw_shared_ptr<stats> per_table_stats = get_stats_from_schema(_proxy, *rs.schema);

				        per_table_stats->api_operations.batch_get_item_latency.mark(duration);

				    }

				    if (is_big(response)) {

				        co_return make_streamed(std::move(response));

				    } else {

				@@ -5413,7 +5421,7 @@ static future<executor::request_return_type> do_query(service::storage_proxy& pr

				}

				static dht::token token_for_segment(int segment, int total_segments) {

				    SCYLLA_ASSERT(total_segments > 1 && segment >= 0 && segment < total_segments);

				    throwing_assert(total_segments > 1 && segment >= 0 && segment < total_segments);

				    uint64_t delta = std::numeric_limits<uint64_t>::max() / total_segments;

				    return dht::token::from_int64(std::numeric_limits<int64_t>::min() + delta * segment);

				}
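The arithmetic in `token_for_segment()` splits the signed 64-bit token range into `total_segments` equal slices and returns the first token of slice `segment`. Below is a minimal standalone sketch of the same computation. It makes three assumptions: raw `int64_t` values stand in for `dht::token`, the helper name `first_token_of_segment` is hypothetical, and the precondition failure is modeled as a thrown exception in the spirit of `throwing_assert`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical standalone version of the computation in token_for_segment():
// split the signed 64-bit token range into `total_segments` equal slices and
// return the first raw token value of slice `segment`.
int64_t first_token_of_segment(int segment, int total_segments) {
    // Report a bad argument by throwing instead of aborting the process.
    if (!(total_segments > 1 && segment >= 0 && segment < total_segments)) {
        throw std::invalid_argument("segment out of range");
    }
    // Width of each slice; the last slice also absorbs the division remainder.
    const uint64_t delta = std::numeric_limits<uint64_t>::max() / static_cast<uint64_t>(total_segments);
    // Add in unsigned arithmetic (wraps modulo 2^64), starting from the
    // unsigned image of INT64_MIN.
    const uint64_t start = static_cast<uint64_t>(std::numeric_limits<int64_t>::min())
                         + delta * static_cast<uint64_t>(segment);
    // Converting back to int64_t is two's complement (guaranteed since C++20).
    return static_cast<int64_t>(start);
}
```

For two segments the boundary falls at -1 rather than 0, because `UINT64_MAX / 2` is 2^63 - 1.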

									
										2

alternator/executor.hh
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

									
										2

alternator/expressions.cc
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#include "expressions.hh"

2

alternator/expressions.g

@@ -3,7 +3,7 @@
  */
 /*
  * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
  * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 /*

									
										2

alternator/expressions.hh
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

									
										2

alternator/expressions_types.hh
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

									
										2

alternator/extract_from_attrs.hh
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

									
										2

alternator/http_compression.cc
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#include "alternator/http_compression.hh"

									
										2

alternator/http_compression.hh
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

									
										2

alternator/parsed_expression_cache.cc
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#include "expressions.hh"

									
										2

alternator/rmw_operation.hh
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

									
										2

alternator/serialization.cc
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#include "utils/base64.hh"

									
										2

alternator/serialization.hh
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

									
										27

alternator/server.cc
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#include "alternator/server.hh"

				@@ -411,8 +411,8 @@ future<std::string> server::verify_signature(const request& req, const chunked_c

				        }

				    }

				    auto cache_getter = [&proxy = _proxy, &as = _auth_service] (std::string username) {

				        return get_key_from_roles(proxy, as, std::move(username));

				    auto cache_getter = [&proxy = _proxy] (std::string username) {

				        return get_key_from_roles(proxy, std::move(username));

				    };

				    return _key_cache.get_ptr(user, cache_getter).then_wrapped([this, &req, &content,

				                                                    user = std::move(user),

				@@ -699,6 +699,17 @@ future<executor::request_return_type> server::handle_api_request(std::unique_ptr

				        // for such a size.

				        co_return api_error::payload_too_large(fmt::format("Request content length limit of {} bytes exceeded", request_content_length_limit));

				    }

				    // Check the concurrency limit early, before acquiring memory and

				    // reading the request body, to avoid piling up memory from excess

				    // requests that will be rejected anyway. This mirrors the CQL

				    // transport which also checks concurrency before memory acquisition

				    // (transport/server.cc).

				    if (_pending_requests.get_count() >= _max_concurrent_requests) {

				        _executor._stats.requests_shed++;

				        co_return api_error::request_limit_exceeded(format("too many in-flight requests (configured via max_concurrent_requests_per_shard): {}", _pending_requests.get_count()));

				    }

				    _pending_requests.enter();

				    auto leave = defer([this] () noexcept { _pending_requests.leave(); });

				    // JSON parsing can allocate up to roughly 2x the size of the raw

				    // document, + a couple of bytes for maintenance.

				    // If the Content-Length of the request is not available, we assume

				@@ -710,7 +721,7 @@ future<executor::request_return_type> server::handle_api_request(std::unique_ptr

				        ++_executor._stats.requests_blocked_memory;

				    }

				    auto units = co_await std::move(units_fut);

				    SCYLLA_ASSERT(req->content_stream);

				    throwing_assert(req->content_stream);

				    chunked_content content = co_await read_entire_stream(*req->content_stream, request_content_length_limit);

				    // If the request had no Content-Length, we reserved too many units

				    // so need to return some

				@@ -760,18 +771,12 @@ future<executor::request_return_type> server::handle_api_request(std::unique_ptr

				        _executor._stats.unsupported_operations++;

				        co_return api_error::unknown_operation(fmt::format("Unsupported operation {}", op));

				    }

				    if (_pending_requests.get_count() >= _max_concurrent_requests) {

				        _executor._stats.requests_shed++;

				        co_return api_error::request_limit_exceeded(format("too many in-flight requests (configured via max_concurrent_requests_per_shard): {}", _pending_requests.get_count()));

				    }

				    _pending_requests.enter();

				    auto leave = defer([this] () noexcept { _pending_requests.leave(); });

				    executor::client_state client_state(service::client_state::external_tag(),

				        _auth_service, &_sl_controller, _timeout_config.current_values(), req->get_client_address());

				    if (!username.empty()) {

				        client_state.set_login(auth::authenticated_user(username));

				    }

				    co_await client_state.maybe_update_per_service_level_params();

				    client_state.maybe_update_per_service_level_params();

				    tracing::trace_state_ptr trace_state = maybe_trace_query(client_state, username, op, content, _max_users_query_size_in_trace_output.get());

				    tracing::trace(trace_state, "{}", op);
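The early check in `handle_api_request()` sheds a request as soon as the in-flight count reaches `_max_concurrent_requests`. This happens before any memory units are acquired or the body is read. Below is a minimal single-threaded sketch of that admission pattern. The class `admission_gate` and its RAII `permit` are hypothetical stand-ins for `_pending_requests.enter()` paired with `defer(... leave() ...)`. A plain counter is only adequate here because each Seastar shard runs on a single thread.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical admission gate sketching the pattern in handle_api_request():
// compare the in-flight count with the limit before doing expensive work.
// Not thread-safe by design (one shard, one thread).
class admission_gate {
public:
    // Move-only RAII token; holding one counts as one in-flight request.
    class permit {
    public:
        explicit permit(admission_gate& g) : _g(&g) { ++_g->_pending; }
        permit(permit&& other) noexcept : _g(other._g) { other._g = nullptr; }
        permit(const permit&) = delete;
        permit& operator=(const permit&) = delete;
        permit& operator=(permit&&) = delete;
        // Decrement exactly once on every exit path, like the deferred leave().
        ~permit() { if (_g) { --_g->_pending; } }
    private:
        admission_gate* _g;
    };

    explicit admission_gate(std::size_t max_concurrent) : _max(max_concurrent) {}

    // Shed the request (return nullopt) when the limit is already reached;
    // otherwise admit it and hand back a permit.
    std::optional<permit> try_enter() {
        if (_pending >= _max) {
            return std::nullopt;
        }
        return permit(*this);
    }

    std::size_t pending() const { return _pending; }

private:
    std::size_t _pending = 0;
    std::size_t _max;
};
```

Rejecting before reading the body means an overloaded shard spends almost nothing on requests it will refuse anyway. The move-only permit plays the same role as `defer()` in the diff.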

									
										2

alternator/server.hh
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

									
										32

alternator/stats.cc
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#include "stats.hh"

				@@ -14,20 +14,6 @@

				namespace alternator {

				const char* ALTERNATOR_METRICS = "alternator";

				static seastar::metrics::histogram estimated_histogram_to_metrics(const utils::estimated_histogram& histogram) {

				    seastar::metrics::histogram res;

				    res.buckets.resize(histogram.bucket_offsets.size());

				    uint64_t cumulative_count = 0;

				    res.sample_count = histogram._count;

				    res.sample_sum = histogram._sample_sum;

				    for (size_t i = 0; i < res.buckets.size(); i++) {

				        auto& v = res.buckets[i];

				        v.upper_bound = histogram.bucket_offsets[i];

				        cumulative_count += histogram.buckets[i];

				        v.count = cumulative_count;

				    }

				    return res;

				}

				static seastar::metrics::label column_family_label("cf");

				static seastar::metrics::label keyspace_label("ks");

				@@ -151,21 +137,21 @@ static void register_metrics_with_optional_table(seastar::metrics::metric_groups

				            seastar::metrics::make_counter("batch_item_count", seastar::metrics::description("The total number of items processed across all batches"), labels,

				                    stats.api_operations.batch_get_item_batch_total)(op("BatchGetItem")).aggregate(aggregate_labels).set_skip_when_empty(),

				            seastar::metrics::make_histogram("batch_item_count_histogram", seastar::metrics::description("Histogram of the number of items in a batch request"), labels,

				                    [&stats]{ return estimated_histogram_to_metrics(stats.api_operations.batch_get_item_histogram);})(op("BatchGetItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),

				                    [&stats]{ return to_metrics_histogram(stats.api_operations.batch_get_item_histogram);})(op("BatchGetItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),

				            seastar::metrics::make_histogram("batch_item_count_histogram", seastar::metrics::description("Histogram of the number of items in a batch request"), labels,

				                    [&stats]{ return estimated_histogram_to_metrics(stats.api_operations.batch_write_item_histogram);})(op("BatchWriteItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),

				                    [&stats]{ return to_metrics_histogram(stats.api_operations.batch_write_item_histogram);})(op("BatchWriteItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),

				            seastar::metrics::make_histogram("operation_size_kb", seastar::metrics::description("Histogram of item sizes involved in a request"), labels,

				                    [&stats]{ return estimated_histogram_to_metrics(stats.operation_sizes.get_item_op_size_kb);})(op("GetItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),

				                    [&stats]{ return to_metrics_histogram(stats.operation_sizes.get_item_op_size_kb);})(op("GetItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),

				            seastar::metrics::make_histogram("operation_size_kb", seastar::metrics::description("Histogram of item sizes involved in a request"), labels,

				                    [&stats]{ return estimated_histogram_to_metrics(stats.operation_sizes.put_item_op_size_kb);})(op("PutItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),

				                    [&stats]{ return to_metrics_histogram(stats.operation_sizes.put_item_op_size_kb);})(op("PutItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),

				            seastar::metrics::make_histogram("operation_size_kb", seastar::metrics::description("Histogram of item sizes involved in a request"), labels,

				                    [&stats]{ return estimated_histogram_to_metrics(stats.operation_sizes.delete_item_op_size_kb);})(op("DeleteItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),

				                    [&stats]{ return to_metrics_histogram(stats.operation_sizes.delete_item_op_size_kb);})(op("DeleteItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),

				            seastar::metrics::make_histogram("operation_size_kb", seastar::metrics::description("Histogram of item sizes involved in a request"), labels,

				                    [&stats]{ return estimated_histogram_to_metrics(stats.operation_sizes.update_item_op_size_kb);})(op("UpdateItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),

				                    [&stats]{ return to_metrics_histogram(stats.operation_sizes.update_item_op_size_kb);})(op("UpdateItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),

				            seastar::metrics::make_histogram("operation_size_kb", seastar::metrics::description("Histogram of item sizes involved in a request"), labels,

				                    [&stats]{ return estimated_histogram_to_metrics(stats.operation_sizes.batch_get_item_op_size_kb);})(op("BatchGetItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),

				                    [&stats]{ return to_metrics_histogram(stats.operation_sizes.batch_get_item_op_size_kb);})(op("BatchGetItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),

				            seastar::metrics::make_histogram("operation_size_kb", seastar::metrics::description("Histogram of item sizes involved in a request"), labels,

				                    [&stats]{ return estimated_histogram_to_metrics(stats.operation_sizes.batch_write_item_op_size_kb);})(op("BatchWriteItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),

				                    [&stats]{ return to_metrics_histogram(stats.operation_sizes.batch_write_item_op_size_kb);})(op("BatchWriteItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),

				    });

				    seastar::metrics::label expression_label("expression");
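The `estimated_histogram_to_metrics()` helper shown in stats.cc turns per-bucket counts into running totals. Each exported bucket's `count` covers every sample up to its `upper_bound`, in the cumulative style used by Prometheus histograms. Below is a standalone sketch of that conversion. It assumes plain vectors in place of `utils::estimated_histogram` and uses a hypothetical `cumulative_bucket` struct in place of the Seastar bucket type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one bucket of an exported histogram.
struct cumulative_bucket {
    int64_t upper_bound;  // upper edge of the bucket
    uint64_t count;       // samples in this bucket and all buckets below it
};

// Mirrors the loop in estimated_histogram_to_metrics(): counts[i] holds the
// samples that fell into bucket i alone, and the output count for bucket i is
// the running total of counts[0..i]. Assumes counts.size() >= offsets.size().
std::vector<cumulative_bucket> to_cumulative(const std::vector<int64_t>& offsets,
                                             const std::vector<uint64_t>& counts) {
    std::vector<cumulative_bucket> out(offsets.size());
    uint64_t running = 0;
    for (std::size_t i = 0; i < out.size(); ++i) {
        running += counts[i];
        out[i] = {offsets[i], running};
    }
    return out;
}
```

With per-bucket counts `{3, 0, 5}`, the cumulative counts are `3, 3, 8`. An empty bucket repeats the previous total.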

									
										26

alternator/stats.hh
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

				@@ -16,6 +16,8 @@

				#include "cql3/stats.hh"

				namespace alternator {

				using batch_histogram = utils::estimated_histogram_with_max<128>;

				using op_size_histogram = utils::estimated_histogram_with_max<512>;

				// Object holding per-shard statistics related to Alternator.

				// While this object is alive, these metrics are also registered to be

				@@ -76,34 +78,34 @@ public:

				        utils::timed_rate_moving_average_summary_and_histogram batch_get_item_latency;

				        utils::timed_rate_moving_average_summary_and_histogram get_records_latency;

				        utils::estimated_histogram batch_get_item_histogram{22}; // a histogram that covers the range 1 - 100

				        utils::estimated_histogram batch_write_item_histogram{22}; // a histogram that covers the range 1 - 100

				        batch_histogram batch_get_item_histogram;

				        batch_histogram batch_write_item_histogram;

				    } api_operations;

				    // Operation size metrics

				    struct {

				        // Item size statistics collected per table and aggregated per node.

				        // Each histogram covers the range 0 - 446. Resolves #25143.

				        // Each histogram covers the range 0 - 512. Resolves #25143.

				        // A size is the retrieved item's size.

				        utils::estimated_histogram get_item_op_size_kb{30};

				        op_size_histogram get_item_op_size_kb;

				        // A size is the maximum of the new item's size and the old item's size.

				        utils::estimated_histogram put_item_op_size_kb{30};

				        op_size_histogram put_item_op_size_kb;

				        // A size is the deleted item's size. If the deleted item's size is

				        // unknown (i.e. read-before-write wasn't necessary and it wasn't

				        // forced by a configuration option), it won't be recorded on the

				        // histogram.

				        utils::estimated_histogram delete_item_op_size_kb{30};

				        op_size_histogram delete_item_op_size_kb;

				        // A size is the maximum of existing item's size and the estimated size

				        // of the update. This will be changed to the maximum of the existing item's

				        // size and the new item's size in a subsequent PR.

				        utils::estimated_histogram update_item_op_size_kb{30};

				        op_size_histogram update_item_op_size_kb;

				        // A size is the sum of the sizes of all items per table. This means

				        // that a single BatchGetItem / BatchWriteItem updates the histogram

				        // for each table that it has items in.

				        // The sizes are the retrieved items' sizes grouped per table.

				        utils::estimated_histogram batch_get_item_op_size_kb{30};

				        op_size_histogram batch_get_item_op_size_kb;

				        // The sizes are the the written items' sizes grouped per table.

				        utils::estimated_histogram batch_write_item_op_size_kb{30};

				        op_size_histogram batch_write_item_op_size_kb;

				    } operation_sizes;

				    // Count of authentication and authorization failures, counted if either

				    // alternator_enforce_authorization or alternator_warn_authorization are

				@@ -140,7 +142,7 @@ public:

				    cql3::cql_stats cql_stats;

				    // Enumeration of expression types only for stats

				    // if needed it can be extended e.g. per operation 

				    // if needed it can be extended e.g. per operation

				    enum expression_types {

				        UPDATE_EXPRESSION,

				        CONDITION_EXPRESSION,

				@@ -164,7 +166,7 @@ struct table_stats {

				void register_metrics(seastar::metrics::metric_groups& metrics, const stats& stats);

				inline uint64_t bytes_to_kb_ceil(uint64_t bytes) {

				    return (bytes + 1023) / 1024;

				    return (bytes) / 1024;

				}

				}
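The two bodies shown for `bytes_to_kb_ceil()` round differently:

- `(bytes + 1023) / 1024` rounds up to the next whole kilobyte.
- `bytes / 1024` truncates, so a 1-byte item is recorded as 0 KB rather than 1 KB.

Below is a small compile-time check of both forms. The function names are hypothetical. It also includes an overflow-free ceiling variant, which is an alternative and not part of the diff.

```cpp
#include <cstdint>

// Ceiling division, as in `(bytes + 1023) / 1024`.
// Note: bytes + 1023 wraps for values within 1023 of UINT64_MAX.
constexpr uint64_t kb_ceil(uint64_t bytes) { return (bytes + 1023) / 1024; }

// Ceiling division without the intermediate overflow (not from the diff).
constexpr uint64_t kb_ceil_safe(uint64_t bytes) { return bytes / 1024 + (bytes % 1024 != 0); }

// Truncating division, as in `bytes / 1024`.
constexpr uint64_t kb_floor(uint64_t bytes) { return bytes / 1024; }

// Rounding behavior checked at compile time.
static_assert(kb_ceil(0) == 0);
static_assert(kb_ceil(1) == 1);
static_assert(kb_ceil(1024) == 1);
static_assert(kb_ceil(1025) == 2);
static_assert(kb_floor(1) == 0);
static_assert(kb_floor(1025) == 1);
static_assert(kb_ceil_safe(1025) == 2);
```

Because the checks are `static_assert`s, the file compiles only if the stated values hold.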

									
										143

alternator/streams.cc
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#include <type_traits>

				@@ -33,6 +33,8 @@

				#include "data_dictionary/data_dictionary.hh"

				#include "utils/rjson.hh"

				static logging::logger slogger("alternator-streams");

				/**

				 * Base template type to implement  rapidjson::internal::TypeHelper<...>:s

				 * for types that are ostreamable/string constructible/castable.

				@@ -428,6 +430,25 @@ using namespace std::chrono_literals;

				// Dynamo docs says no data shall live longer than 24h.

				static constexpr auto dynamodb_streams_max_window = 24h;

				// find the parent shard in previous generation for the given child shard

				// takes care of wrap-around case in vnodes

				// prev_streams must be sorted by token

				const cdc::stream_id& find_parent_shard_in_previous_generation(db_clock::time_point prev_timestamp, const utils::chunked_vector<cdc::stream_id> &prev_streams, const cdc::stream_id &child) {

				    if (prev_streams.empty()) {

				        // something is really wrong - streams are empty

				        // let's try internal_error in hope it will be notified and fixed

				        on_internal_error(slogger, fmt::format("streams are empty for cdc generation at {} ({})", prev_timestamp, prev_timestamp.time_since_epoch().count()));

				    }

				    auto it = std::lower_bound(prev_streams.begin(), prev_streams.end(), child.token(), [](const cdc::stream_id& id, const dht::token& t) {

				        return id.token() < t;

				    });

				    if (it == prev_streams.end()) {

				        // wrap around case - take first

				        it = prev_streams.begin();

				    }

				    return *it;

				}

				future<executor::request_return_type> executor::describe_stream(client_state& client_state, service_permit permit, rjson::value request) {

				    _stats.api_operations.describe_stream++;

				@@ -578,16 +599,8 @@ future<executor::request_return_type> executor::describe_stream(client_state& cl

				            auto shard = rjson::empty_object();

				            if (prev != e) {

				                auto& pids = prev->second.streams;

				                auto pid = std::upper_bound(pids.begin(), pids.end(), id.token(), [](const dht::token& t, const cdc::stream_id& id) {

				                    return t < id.token();

				                });

				                if (pid != pids.begin()) {

				                    pid = std::prev(pid);

				                }

				                if (pid != pids.end()) {

				                    rjson::add(shard, "ParentShardId", shard_id(prev->first, *pid));

				                }

				                auto &pid = find_parent_shard_in_previous_generation(prev->first, prev->second.streams, id);

				                rjson::add(shard, "ParentShardId", shard_id(prev->first, pid));

				            }

				            last.emplace(ts, id);

				@@ -774,16 +787,18 @@ future<executor::request_return_type> executor::get_shard_iterator(client_state&

				struct event_id {

				    cdc::stream_id stream;

				    utils::UUID timestamp;

				    size_t index = 0;

				    static constexpr auto marker = 'E';

				    event_id(cdc::stream_id s, utils::UUID ts)

				    event_id(cdc::stream_id s, utils::UUID ts, size_t index)

				        : stream(s)

				        , timestamp(ts)

				        , index(index)

				    {}

				    friend std::ostream& operator<<(std::ostream& os, const event_id& id) {

				        fmt::print(os, "{}{}:{}", marker, id.stream.to_bytes(), id.timestamp);

				        fmt::print(os, "{}{}:{}:{}", marker, id.stream.to_bytes(), id.timestamp, id.index);

				        return os;

				    }

				};

				@@ -795,7 +810,19 @@ struct rapidjson::internal::TypeHelper<ValueType, alternator::event_id>

				{};

				namespace alternator {

				    namespace {

				        struct managed_bytes_ptr_hash {

				            size_t operator()(const managed_bytes *k) const noexcept {

				                return std::hash<managed_bytes>{}(*k);

				            }

				        };

				        struct managed_bytes_ptr_equal {

				            bool operator()(const managed_bytes *a, const managed_bytes *b) const noexcept {

				                return *a == *b;

				            }

				        };

				    }

				future<executor::request_return_type> executor::get_records(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request) {

				    _stats.api_operations.get_records++;

				    auto start_time = std::chrono::steady_clock::now();

				@@ -866,6 +893,12 @@ future<executor::request_return_type> executor::get_records(client_state& client

				    auto pks = schema->partition_key_columns();

				    auto cks = schema->clustering_key_columns();

				    auto base_cks = base->clustering_key_columns();

				    if (base_cks.size() > 1) {

				        throw api_error::internal(fmt::format("invalid alternator table, clustering key count ({}) is bigger than one", base_cks.size()));

				    }

				    const bytes *clustering_key_column_name = !base_cks.empty() ? &base_cks.front().name() : nullptr;

				    std::transform(pks.begin(), pks.end(), std::back_inserter(columns), [](auto& c) { return &c; });

				    std::transform(cks.begin(), cks.end(), std::back_inserter(columns), [](auto& c) { return &c; });

				@@ -920,42 +953,40 @@ future<executor::request_return_type> executor::get_records(client_state& client

				            return cdef->name->name() == eor_column_name;

				        })

				    );

				    auto clustering_key_index = clustering_key_column_name ? std::distance(metadata.get_names().begin(), 

				        std::find_if(metadata.get_names().begin(), metadata.get_names().end(), [&](const lw_shared_ptr<cql3::column_specification>& cdef) {

				            return cdef->name->name() == *clustering_key_column_name;

				        })

				    ) : 0;

				    std::optional<utils::UUID> timestamp;

				    auto dynamodb = rjson::empty_object();

				    auto record = rjson::empty_object();

				    struct Record {

				        rjson::value record;

				        rjson::value dynamodb;

				    };

				    const managed_bytes empty_managed_bytes;

				    std::unordered_map<const managed_bytes*, Record, managed_bytes_ptr_hash, managed_bytes_ptr_equal> records_map;

				    const auto dc_name = _proxy.get_token_metadata_ptr()->get_topology().get_datacenter();

				    using op_utype = std::underlying_type_t<cdc::operation>;

				    auto maybe_add_record = [&] {

				        if (!dynamodb.ObjectEmpty()) {

				            rjson::add(record, "dynamodb", std::move(dynamodb));

				            dynamodb = rjson::empty_object();

				        }

				        if (!record.ObjectEmpty()) {

				            rjson::add(record, "awsRegion", rjson::from_string(dc_name));

				            rjson::add(record, "eventID", event_id(iter.shard.id, *timestamp));

				            rjson::add(record, "eventSource", "scylladb:alternator");

				            rjson::add(record, "eventVersion", "1.1");

				            rjson::push_back(records, std::move(record));

				            record = rjson::empty_object();

				            --limit;

				        }

				    };

				    for (auto& row : result_set->rows()) {

				        auto op = static_cast<cdc::operation>(value_cast<op_utype>(data_type_for<op_utype>()->deserialize(*row[op_index])));

				        auto ts = value_cast<utils::UUID>(data_type_for<utils::UUID>()->deserialize(*row[ts_index]));

				        auto eor = row[eor_index].has_value() ? value_cast<bool>(boolean_type->deserialize(*row[eor_index])) : false;

				        const managed_bytes* cs_ptr = clustering_key_column_name ? &*row[clustering_key_index] : &empty_managed_bytes;

				        auto records_it = records_map.emplace(cs_ptr, Record{});

				        auto &record = records_it.first->second;

				        if (!dynamodb.HasMember("Keys")) {

				        if (records_it.second) {

				            record.dynamodb = rjson::empty_object();

				            record.record = rjson::empty_object();

				            auto keys = rjson::empty_object();

				            describe_single_item(*selection, row, key_names, keys);

				            rjson::add(dynamodb, "Keys", std::move(keys));

				            rjson::add(dynamodb, "ApproximateCreationDateTime", utils::UUID_gen::unix_timestamp_in_sec(ts).count());

				            rjson::add(dynamodb, "SequenceNumber", sequence_number(ts));

				            rjson::add(dynamodb, "StreamViewType", type);

				            rjson::add(record.dynamodb, "Keys", std::move(keys));

				            rjson::add(record.dynamodb, "ApproximateCreationDateTime", utils::UUID_gen::unix_timestamp_in_sec(ts).count());

				            rjson::add(record.dynamodb, "SequenceNumber", sequence_number(ts));

				            rjson::add(record.dynamodb, "StreamViewType", type);

				            // TODO: SizeBytes

				        }

				@@ -979,6 +1010,10 @@ future<executor::request_return_type> executor::get_records(client_state& client

				         * flags on CDC log, instead we use data to 

				         * drive what is returned. This is (afaict)

				         * consistent with dynamo streams

				         * 

				         * Note: BatchWriteItem will generate multiple records with

				         * the same timestamp, when write isolation is set to always

				         * (which triggers lwt), so we need to unpack them based on clustering key.

				         */

				        switch (op) {

				        case cdc::operation::pre_image:

				@@ -987,14 +1022,14 @@ future<executor::request_return_type> executor::get_records(client_state& client

				            auto item = rjson::empty_object();

				            describe_single_item(*selection, row, attr_names, item, nullptr, true);

				            describe_single_item(*selection, row, key_names, item);

				            rjson::add(dynamodb, op == cdc::operation::pre_image ? "OldImage" : "NewImage", std::move(item));

				            rjson::add(record.dynamodb, op == cdc::operation::pre_image ? "OldImage" : "NewImage", std::move(item));

				            break;

				        }

				        case cdc::operation::update:

				            rjson::add(record, "eventName", "MODIFY");

				            rjson::add(record.record, "eventName", "MODIFY");

				            break;

				        case cdc::operation::insert:

				            rjson::add(record, "eventName", "INSERT");

				            rjson::add(record.record, "eventName", "INSERT");

				            break;

				        case cdc::operation::service_row_delete:

				        case cdc::operation::service_partition_delete:

				@@ -1002,28 +1037,41 @@ future<executor::request_return_type> executor::get_records(client_state& client

				            auto user_identity = rjson::empty_object();

				            rjson::add(user_identity, "Type", "Service");

				            rjson::add(user_identity, "PrincipalId", "dynamodb.amazonaws.com");

				            rjson::add(record, "userIdentity", std::move(user_identity));

				            rjson::add(record, "eventName", "REMOVE");

				            rjson::add(record.record, "userIdentity", std::move(user_identity));

				            rjson::add(record.record, "eventName", "REMOVE");

				            break;

				        }

				        default:

				            rjson::add(record, "eventName", "REMOVE");

				            rjson::add(record.record, "eventName", "REMOVE");

				            break;

				        }

				        if (eor) {

				            maybe_add_record();

				            size_t index = 0;

				            for (auto& [_, rec] : records_map) {

				                rjson::add(rec.record, "awsRegion", rjson::from_string(dc_name));

				                rjson::add(rec.record, "eventID", event_id(iter.shard.id, *timestamp, index++));

				                rjson::add(rec.record, "eventSource", "scylladb:alternator");

				                rjson::add(rec.record, "eventVersion", "1.1");

				                rjson::add(rec.record, "dynamodb", std::move(rec.dynamodb));

				                rjson::push_back(records, std::move(rec.record));

				            }

				            records_map.clear();

				            timestamp = ts;

				            if (limit == 0) {

				            if (records.Size() >= limit) {

				                // Note: we might have more than limit rows here - BatchWriteItem will emit multiple items

				                // with the same timestamp and we have no way of resume iteration midway through those,

				                // so we return all of them here.

				                break;

				            }

				        }

				    }

				    auto ret = rjson::empty_object();

				    auto nrecords = records.Size();

				    rjson::add(ret, "Records", std::move(records));

				    if (nrecords != 0) {

				    if (timestamp) {

				        // #9642. Set next iterators threshold to > last

				        shard_iterator next_iter(iter.table, iter.shard, *timestamp, false);

				        // Note that here we unconditionally return NextShardIterator,

				@@ -1074,6 +1122,7 @@ bool executor::add_stream_options(const rjson::value& stream_specification, sche

				        cdc::options opts;

				        opts.enabled(true);

				        // cdc::delta_mode is ignored by Alternator, so aim for the least overhead.

				        opts.set_delta_mode(cdc::delta_mode::keys);

				        opts.ttl(std::chrono::duration_cast<std::chrono::seconds>(dynamodb_streams_max_window).count());
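The reworked GetRecords loop above does three things with CDC log rows that share one timestamp, which can happen for BatchWriteItem under LWT write isolation:

- it buckets them by clustering key;
- it flushes the buckets only when it sees the end-of-record row;
- it numbers each emitted record, so event IDs of the form `E<stream>:<timestamp>:<index>` stay unique.

The sketch below shows that grouping idea using standard containers only. It is a simplification, not the Scylla code:

- `cdc_row`, `stream_record`, `make_event_id` and `group_batch` are hypothetical stand-ins.
- The key is a plain `std::string` rather than `managed_bytes`.
- It uses `std::map` so output order is deterministic, whereas the original uses an `std::unordered_map` with unspecified iteration order.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified CDC log row.
struct cdc_row {
    std::string clustering_key;  // empty when the table has no sort key
    std::string event_name;      // "INSERT", "MODIFY", "REMOVE", or empty for image rows
    bool end_of_record;          // set on the last row of a batch
};

// Hypothetical output record, standing in for the rjson objects.
struct stream_record {
    std::string key;
    std::string event_name;
    std::string event_id;
};

// Mirrors the "E<stream>:<timestamp>:<index>" layout printed by event_id.
std::string make_event_id(const std::string& stream, const std::string& ts, std::size_t index) {
    return "E" + stream + ":" + ts + ":" + std::to_string(index);
}

// Groups rows sharing one timestamp by clustering key and emits one record
// per key once the end-of-record row is seen. Rows after the last
// end-of-record marker stay unflushed in this sketch.
std::vector<stream_record> group_batch(const std::vector<cdc_row>& rows,
                                       const std::string& stream,
                                       const std::string& ts) {
    std::vector<stream_record> out;
    std::map<std::string, std::string> pending;  // clustering key -> event name
    for (const auto& row : rows) {
        auto& name = pending[row.clustering_key];  // creates the group on first sight
        if (!row.event_name.empty()) {
            name = row.event_name;  // image rows carry no event name
        }
        if (row.end_of_record) {
            std::size_t index = 0;
            for (const auto& [key, ev] : pending) {
                out.push_back({key, ev, make_event_id(stream, ts, index++)});
            }
            pending.clear();
        }
    }
    return out;
}
```

With rows `{"b","MODIFY",false}`, `{"a","INSERT",false}`, `{"a","",true}`, this yields two records. The first is key `a` with ID `Es1:t1:0`, and the second is key `b` with ID `Es1:t1:1`.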

									
										98

alternator/ttl.cc
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#include <chrono>

				@@ -46,6 +46,7 @@

				#include "alternator/executor.hh"

				#include "alternator/controller.hh"

				#include "alternator/serialization.hh"

				#include "alternator/ttl_tag.hh"

				#include "dht/sharder.hh"

				#include "db/config.hh"

				#include "db/tags/utils.hh"

				@@ -57,19 +58,10 @@ static logging::logger tlogger("alternator_ttl");

				namespace alternator {

				// We write the expiration-time attribute enabled on a table in a

				// tag TTL_TAG_KEY.

				// Currently, the *value* of this tag is simply the name of the attribute,

				// and the expiration scanner interprets it as an Alternator attribute name -

				// It can refer to a real column or if that doesn't exist, to a member of

				// the ":attrs" map column. Although this is designed for Alternator, it may

				// be good enough for CQL as well (there, the ":attrs" column won't exist).

				extern const sstring TTL_TAG_KEY;

				future<executor::request_return_type> executor::update_time_to_live(client_state& client_state, service_permit permit, rjson::value request) {

				    _stats.api_operations.update_time_to_live++;

				    if (!_proxy.features().alternator_ttl) {

				        co_return api_error::unknown_operation("UpdateTimeToLive not yet supported. Experimental support is available if the 'alternator-ttl' experimental feature is enabled on all nodes.");

				        co_return api_error::unknown_operation("UpdateTimeToLive not yet supported. Upgrade all nodes to a version that supports it.");

				    }

				    schema_ptr schema = get_table(_proxy, request);

				@@ -324,9 +316,7 @@ static future<std::vector<std::pair<dht::token_range, locator::host_id>>> get_se

				    const auto& tm = *erm->get_token_metadata_ptr();

				    const auto& sorted_tokens = tm.sorted_tokens();

				    std::vector<std::pair<dht::token_range, locator::host_id>> ret;

				    if (sorted_tokens.empty()) {

				        on_internal_error(tlogger, "Token metadata is empty");

				    }

				    throwing_assert(!sorted_tokens.empty());

				    auto prev_tok = sorted_tokens.back();

				    for (const auto& tok : sorted_tokens) {

				        co_await coroutine::maybe_yield();

				@@ -563,7 +553,7 @@ static future<> scan_table_ranges(

				        expiration_service::stats& expiration_stats)

				{

				    const schema_ptr& s = scan_ctx.s;

				    SCYLLA_ASSERT (partition_ranges.size() == 1); // otherwise issue #9167 will cause incorrect results.

				    throwing_assert(partition_ranges.size() == 1); // otherwise issue #9167 will cause incorrect results.

				    auto p = service::pager::query_pagers::pager(proxy, s, scan_ctx.selection, *scan_ctx.query_state_ptr,

				            *scan_ctx.query_options, scan_ctx.command, std::move(partition_ranges), nullptr);

				    while (!p->is_exhausted()) {

				@@ -640,13 +630,38 @@ static future<> scan_table_ranges(

				                }

				            } else {

				                // For a real column to contain an expiration time, it

				                // must be a numeric type.

				                // FIXME: Currently we only support decimal_type (which is

				                // what Alternator uses), but other numeric types can be

				                // supported as well to make this feature more useful in CQL.

				                // Note that kind::decimal is also checked above.

				                big_decimal n = value_cast<big_decimal>(v);

				                expired = is_expired(n, now);

				                // must be a numeric type. We currently support decimal

				                // (used by Alternator TTL) as well as bigint, int and

				                // timestamp (used by CQL per-row TTL).

				                switch (meta[*expiration_column]->type->get_kind()) {

				                    case abstract_type::kind::decimal:

				                        // Used by Alternator TTL for key columns not stored

				                        // in the map. The value is in seconds, fractional

				                        // part is ignored.

				                        expired = is_expired(value_cast<big_decimal>(v), now);

				                        break;

				                    case abstract_type::kind::long_kind:

				                        // Used by CQL per-row TTL. The value is in seconds.

				                        expired = is_expired(gc_clock::time_point(std::chrono::seconds(value_cast<int64_t>(v))), now);

				                        break;

				                    case abstract_type::kind::int32:

				                        // Used by CQL per-row TTL. The value is in seconds.

				                        // Using int type is not recommended because it will

				                        // overflow in 2038, but we support it to allow users

				                        // to use existing int columns for expiration.

				                        expired = is_expired(gc_clock::time_point(std::chrono::seconds(value_cast<int32_t>(v))), now);

				                        break;

				                    case abstract_type::kind::timestamp:

				                        // Used by CQL per-row TTL. The value is in milliseconds

				                        // but we truncate it to gc_clock's precision (whole seconds).

				                        expired = is_expired(gc_clock::time_point(std::chrono::duration_cast<gc_clock::duration>(value_cast<db_clock::time_point>(v).time_since_epoch())), now);

				                        break;

				                    default:

				                        // Should never happen - we verified the column's type

				                        // before starting the scan.

				                        [[unlikely]]

				                        on_internal_error(tlogger, format("expiration scanner value of unsupported type {} in column {}", meta[*expiration_column]->type->cql3_type_name(), scan_ctx.column_name) );

				                }

				            }

				            if (expired) {

				                expiration_stats.items_deleted++;

				@@ -708,16 +723,12 @@ static future<bool> scan_table(

				        co_return false;

				    }

				    // attribute_name may be one of the schema's columns (in Alternator, this

				    // means it's a key column), or an element in Alternator's attrs map

				    // encoded in Alternator's JSON encoding.

				    // FIXME: To make this less Alternators-specific, we should encode in the

				    // single key's value three things:

				    // 1. The name of a column

				    // 2. Optionally if column is a map, a member in the map

				    // 3. The deserializer for the value: CQL or Alternator (JSON).

				    // The deserializer can be guessed: If the given column or map item is

				    // numeric, it can be used directly. If it is a "bytes" type, it needs to

				    // be deserialized using Alternator's deserializer.

				    // means a key column, in CQL it's a regular column), or an element in

				    // Alternator's attrs map encoded in Alternator's JSON encoding (which we

				    // decode). If attribute_name is a real column, in Alternator it will have

				    // the type decimal, counting seconds since the UNIX epoch, while in CQL

				    // it will be one of the types bigint or int (counting seconds) or timestamp

				    // (counting milliseconds).

				    bytes column_name = to_bytes(*attribute_name);

				    const column_definition *cd = s->get_column_definition(column_name);

				    std::optional<std::string> member;

				@@ -736,11 +747,14 @@ static future<bool> scan_table(

				    data_type column_type = cd->type;

				    // Verify that the column has the right type: If "member" exists

				    // the column must be a map, and if it doesn't, the column must

				    // (currently) be a decimal_type. If the column has the wrong type

				    // nothing can get expired in this table, and it's pointless to

				    // scan it.

				    // be decimal_type (Alternator), bigint, int or timestamp (CQL).

				    // If the column has the wrong type nothing can get expired in

				    // this table, and it's pointless to scan it.

				    if ((member && column_type->get_kind() != abstract_type::kind::map) ||

				        (!member && column_type->get_kind() != abstract_type::kind::decimal)) {

				        (!member && column_type->get_kind() != abstract_type::kind::decimal &&

				         column_type->get_kind() != abstract_type::kind::long_kind &&

				         column_type->get_kind() != abstract_type::kind::int32 &&

				         column_type->get_kind() != abstract_type::kind::timestamp)) {

				        tlogger.info("table {} TTL column has unsupported type, not scanning", s->cf_name());

				        co_return false;

				    }

				@@ -767,7 +781,7 @@ static future<bool> scan_table(

				                // by tasking another node to take over scanning of the dead node's primary

				                // ranges. What we do here is that this node will also check expiration

				                // on its *secondary* ranges - but only those whose primary owner is down.

				                auto tablet_secondary_replica = tablet_map.get_secondary_replica(*tablet); // throws if no secondary replica

				                auto tablet_secondary_replica = tablet_map.get_secondary_replica(*tablet, erm->get_topology()); // throws if no secondary replica

				                if (tablet_secondary_replica.host == my_host_id && tablet_secondary_replica.shard == this_shard_id()) {

				                    if (!gossiper.is_alive(tablet_primary_replica.host)) {

				                        co_await scan_tablet(*tablet, proxy, abort_source, page_sem, expiration_stats, scan_ctx, tablet_map);

				@@ -878,12 +892,10 @@ future<> expiration_service::run() {

				future<> expiration_service::start() {

				    // Called by main() on each shard to start the expiration-service

				    // thread. Just runs run() in the background and allows stop().

				    if (_db.features().alternator_ttl) {

				        if (!shutting_down()) {

				            _end = run().handle_exception([] (std::exception_ptr ep) {

				                tlogger.error("expiration_service failed: {}", ep);

				            });

				        }

				    if (!shutting_down()) {

				        _end = run().handle_exception([] (std::exception_ptr ep) {

				            tlogger.error("expiration_service failed: {}", ep);

				        });

				    }

				    return make_ready_future<>();

				}
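The new switch in scan_table_ranges() normalizes four column types to whole seconds before comparing against the scan time:

- decimal (Alternator): seconds, with the fractional part ignored;
- bigint and int (CQL per-row TTL): seconds since the epoch;
- timestamp (CQL per-row TTL): milliseconds, truncated to seconds.

Below is a minimal standalone sketch of that normalization using std::chrono. It makes several assumptions and substitutions:

- The `expiration_value` variant, `epoch_ms` wrapper and both helper functions are hypothetical.
- `double` stands in for `big_decimal`.
- The comparison ("expired once the time is not in the future") is an assumed simplification. It ignores any extra rules Scylla's real is_expired() may apply.

```cpp
#include <chrono>
#include <cmath>
#include <cstdint>
#include <variant>

// Whole-second time point, playing the role of gc_clock::time_point.
using seconds_clock_point =
    std::chrono::time_point<std::chrono::system_clock, std::chrono::seconds>;

// Hypothetical wrapper for a CQL timestamp: milliseconds since the UNIX epoch.
struct epoch_ms { std::int64_t ms; };

// Hypothetical stand-in for the supported expiration column kinds:
// double ~ decimal, int64_t ~ bigint, int32_t ~ int, epoch_ms ~ timestamp.
using expiration_value = std::variant<double, std::int64_t, std::int32_t, epoch_ms>;

// Converts any supported value to whole seconds since the UNIX epoch.
seconds_clock_point to_expiration_time(const expiration_value& v) {
    using std::chrono::seconds;
    if (auto d = std::get_if<double>(&v)) {
        // decimal: seconds, fractional part ignored
        return seconds_clock_point(seconds(static_cast<std::int64_t>(std::trunc(*d))));
    }
    if (auto l = std::get_if<std::int64_t>(&v)) {
        return seconds_clock_point(seconds(*l));  // bigint: seconds
    }
    if (auto i = std::get_if<std::int32_t>(&v)) {
        return seconds_clock_point(seconds(*i));  // int: seconds (overflows in 2038)
    }
    // timestamp: milliseconds, truncated to whole seconds
    auto ms = std::chrono::milliseconds(std::get<epoch_ms>(v).ms);
    return seconds_clock_point(std::chrono::duration_cast<seconds>(ms));
}

// Simplified check: expired once the expiration time is not in the future.
bool is_expired(const expiration_value& v, seconds_clock_point now) {
    return to_expiration_time(v) <= now;
}
```

As in the diffed code, a timestamp of 1000999 ms and a decimal of 1000.7 both map to second 1000.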

									
										2

alternator/ttl.hh
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

									
										26

alternator/ttl_tag.hh
				@@ -0,0 +1,26 @@

				/*

				 * Copyright 2026-present ScyllaDB

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

				#include "seastarx.hh"

				#include <seastar/core/sstring.hh>

				namespace alternator {

				// We use the table tag TTL_TAG_KEY ("system:ttl_attribute") to remember

				// which attribute was chosen as the expiration-time attribute for

				// Alternator's TTL and CQL's per-row TTL features.

				// Currently, the *value* of this tag is simply the name of the attribute:

				// It can refer to a real column or if that doesn't exist, to a member of

				// the ":attrs" map column (which Alternator uses).

				extern const sstring TTL_TAG_KEY;

				} // namespace alternator

				// let users use TTL_TAG_KEY without the "alternator::" prefix,

				// to make it easier to move it to a different namespace later.

				using alternator::TTL_TAG_KEY;
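ttl_tag.hh describes the tag value as just an attribute name. It names a real column if one exists, and otherwise a member of the ":attrs" map. scan_table() then refuses to scan unless one of these holds:

- the target is a map member;
- the target is a decimal, bigint, int or timestamp column.

A minimal sketch of that resolution and type check follows. It assumes a simplified schema represented as a name-to-kind map. `column_kind`, `ttl_target` and `resolve_ttl_attribute` are hypothetical names, not part of Scylla. Returning nothing when neither the column nor ":attrs" exists is an assumption, since that part of scan_table() is elided above.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical, reduced set of column kinds relevant to the TTL scanner.
enum class column_kind { decimal, bigint, int32, timestamp, map, text };

// Resolved target: a column, plus a member name if the attribute lives
// inside the ":attrs" map.
struct ttl_target {
    std::string column;
    std::optional<std::string> member;
};

// Resolves the TTL tag value against a simplified schema (column name -> kind).
// Returns std::nullopt when the scanner would have nothing to expire.
std::optional<ttl_target> resolve_ttl_attribute(const std::map<std::string, column_kind>& schema,
                                                const std::string& attribute_name) {
    ttl_target t;
    auto it = schema.find(attribute_name);
    if (it == schema.end()) {
        // Not a real column: treat it as a member of the ":attrs" map.
        it = schema.find(":attrs");
        if (it == schema.end()) {
            return std::nullopt;  // e.g. a CQL table naming a missing column
        }
        t.member = attribute_name;
    }
    t.column = it->first;
    const column_kind k = it->second;
    const bool ok = t.member ? k == column_kind::map
                             : (k == column_kind::decimal || k == column_kind::bigint ||
                                k == column_kind::int32 || k == column_kind::timestamp);
    if (!ok) {
        return std::nullopt;  // wrong type: scanning the table would be pointless
    }
    return t;
}
```

For an Alternator-style table, an attribute such as "expires" resolves to the ":attrs" map with member "expires". In a CQL table, a timestamp column resolves directly to that column.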

									
										2

api/api-doc/authorization_cache.json
				@@ -12,7 +12,7 @@

				      "operations":[

				        {

				          "method":"POST",

				          "summary":"Reset cache",

				          "summary":"Resets authorized prepared statements cache",

				          "type":"void",

				          "nickname":"authorization_cache_reset",

				          "produces":[

									
										15

api/api-doc/error_injection.json
				@@ -112,21 +112,6 @@

				            }

				         ]

				      },

				      {

				         "path":"/v2/error_injection/events",

				         "operations":[

				            {

				               "method":"GET",

				               "summary":"Subscribe to Server-Sent Events stream of error injection events",

				               "type":"void",

				               "nickname":"injection_events",

				               "produces":[

				                  "text/event-stream"

				               ],

				               "parameters":[]

				            }

				         ]

				      },

				      {

				         "path":"/v2/error_injection/disconnect/{ip}",

				         "operations":[

									
										2

api/api-doc/messaging_service.json
				@@ -243,7 +243,7 @@

				                 "GOSSIP_DIGEST_SYN",

				                 "GOSSIP_DIGEST_ACK2",

				                 "GOSSIP_SHUTDOWN",

				                 "DEFINITIONS_UPDATE",

				                 "UNUSED__DEFINITIONS_UPDATE",

				                 "TRUNCATE",

				                 "UNUSED__REPLICATION_FINISHED",

				                 "MIGRATION_REQUEST",

									
										276

api/api-doc/storage_service.json
				@@ -743,7 +743,7 @@

				               "parameters":[

				                  {

				                     "name":"tag",

				                     "description":"the tag given to the snapshot",

				                     "description":"The snapshot tag to delete. If omitted, all snapshots are removed.",

				                     "required":false,

				                     "allowMultiple":false,

				                     "type":"string",

				@@ -751,7 +751,7 @@

				                  },

				                  {

				                     "name":"kn",

				                     "description":"Comma-separated keyspaces name that their snapshot will be deleted",

				                     "description":"Comma-separated list of keyspace names to delete snapshots from. If omitted, snapshots are deleted from all keyspaces.",

				                     "required":false,

				                     "allowMultiple":false,

				                     "type":"string",

				@@ -759,7 +759,7 @@

				                  },

				                  {

				                     "name":"cf",

				                     "description":"an optional table name that its snapshot will be deleted",

				                     "description":"A table name used to filter which table's snapshots are deleted. If omitted or empty, snapshots for all tables are eligible. When provided together with 'kn', the table is looked up in each listed keyspace independently. For secondary indexes, the logical index name (e.g. 'myindex') can be used and is resolved automatically.",

				                     "required":false,

				                     "allowMultiple":false,

				                     "type":"string",

				@@ -1295,6 +1295,45 @@

				            }

				         ]

				      },

				      {

				         "path":"/storage_service/logstor_compaction",

				         "operations":[

				            {

				               "method":"POST",

				               "summary":"Trigger compaction of the key-value storage",

				               "type":"void",

				               "nickname":"logstor_compaction",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[

				                  {

				                     "name":"major",

				                     "description":"When true, perform a major compaction",

				                     "required":false,

				                     "allowMultiple":false,

				                     "type":"boolean",

				                     "paramType":"query"

				                  }

				               ]

				            }

				         ]

				      },

				      {

				         "path":"/storage_service/logstor_flush",

				         "operations":[

				            {

				               "method":"POST",

				               "summary":"Trigger flush of logstor storage",

				               "type":"void",

				               "nickname":"logstor_flush",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[]

				            }

				         ]

				      },

				      {

				         "path":"/storage_service/active_repair/",

				         "operations":[

				@@ -3085,6 +3124,125 @@

				            }

				         ]

				      },

				      {

				         "path":"/storage_service/tablets/snapshots",

				         "operations":[

				            {

				               "method":"POST",

				               "summary":"Takes the snapshot for the given keyspaces/tables. A snapshot name must be specified.",

				               "type":"void",

				               "nickname":"take_cluster_snapshot",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[

				                  {

				                     "name":"tag",

				                     "description":"the tag given to the snapshot",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"string",

				                     "paramType":"query"

				                  },

				                  {

				                     "name":"keyspace",

				                     "description":"Keyspace(s) to snapshot. Multiple keyspaces can be provided using a comma-separated list. If omitted, snapshot all keyspaces.",

				                     "required":false,

				                     "allowMultiple":false,

				                     "type":"string",

				                     "paramType":"query"

				                  },

				                  {

				                     "name":"table",

				                     "description":"Table(s) to snapshot. Multiple tables (in a single keyspace) can be provided using a comma-separated list. If omitted, snapshot all tables in the given keyspace(s).",

				                     "required":false,

				                     "allowMultiple":false,

				                     "type":"string",

				                     "paramType":"query"

				                  }

				               ]

				            }

				         ]

				      },

				      {

				         "path":"/storage_service/vnode_tablet_migrations/keyspaces/{keyspace}",

				         "operations":[{

				             "method":"POST",

				             "summary":"Start vnodes-to-tablets migration for all tables in a keyspace",

				             "type":"void",

				             "nickname":"create_vnode_tablet_migration",

				             "produces":["application/json"],

				             "parameters":[

				                 {

				                     "name":"keyspace",

				                     "description":"Keyspace name",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"string",

				                     "paramType":"path"

				                 }

				             ]

				         },

				         {

				             "method":"GET",

				             "summary":"Get a keyspace's vnodes-to-tablets migration status",

				             "type":"vnode_tablet_migration_status",

				             "nickname":"get_vnode_tablet_migration",

				             "produces":["application/json"],

				             "parameters":[

				                 {

				                     "name":"keyspace",

				                     "description":"Keyspace name",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"string",

				                     "paramType":"path"

				                 }

				             ]

				         }]

				      },

				      {

				         "path":"/storage_service/vnode_tablet_migrations/node/storage_mode",

				         "operations":[{

				             "method":"PUT",

				             "summary":"Set the intended storage mode for this node during vnodes-to-tablets migration",

				             "type":"void",

				             "nickname":"set_vnode_tablet_migration_node_storage_mode",

				             "produces":["application/json"],

				             "parameters":[

				                 {

				                     "name":"intended_mode",

				                     "description":"Intended storage mode (tablets or vnodes)",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"string",

				                     "paramType":"query"

				                 }

				             ]

				         }]

				      },

				      {

				         "path":"/storage_service/vnode_tablet_migrations/keyspaces/{keyspace}/finalization",

				         "operations":[{

				             "method":"POST",

				             "summary":"Finalize vnodes-to-tablets migration for all tables in a keyspace",

				             "type":"void",

				             "nickname":"finalize_vnode_tablet_migration",

				             "produces":["application/json"],

				             "parameters":[

				                 {

				                     "name":"keyspace",

				                     "description":"Keyspace name",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"string",

				                     "paramType":"path"

				                 }

				             ]

				         }]

				      },

				      {

				         "path":"/storage_service/quiesce_topology",

				         "operations":[

				@@ -3187,6 +3345,38 @@

				            }

				         ]

				      },

				      {

				         "path":"/storage_service/logstor_info",

				         "operations":[

				            {

				               "method":"GET",

				               "summary":"Logstor segment information for one table",

				               "type":"table_logstor_info",

				               "nickname":"logstor_info",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[

				                  {

				                     "name":"keyspace",

				                     "description":"The keyspace",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"string",

				                     "paramType":"query"

				                  },

				                  {

				                     "name":"table",

				                     "description":"table name",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"string",

				                     "paramType":"query"

				                  }

				               ]

				            }

				         ]

				      },

				      {

				         "path":"/storage_service/retrain_dict",

				         "operations":[

				@@ -3595,6 +3785,47 @@

				            }

				        }

				      },

				        "logstor_hist_bucket":{

				         "id":"logstor_hist_bucket",

				         "properties":{

				            "bucket":{

				               "type":"long"

				            },

				            "count":{

				               "type":"long"

				            },

				            "min_data_size":{

				               "type":"long"

				            },

				            "max_data_size":{

				               "type":"long"

				            }

				         }

				        },

				        "table_logstor_info":{

				         "id":"table_logstor_info",

				         "description":"Per-table logstor segment distribution",

				         "properties":{

				            "keyspace":{

				               "type":"string"

				            },

				            "table":{

				               "type":"string"

				            },

				            "compaction_groups":{

				               "type":"long"

				            },

				            "segments":{

				               "type":"long"

				            },

				            "data_size_histogram":{

				               "type":"array",

				               "items":{

				                  "$ref":"logstor_hist_bucket"

				               }

				            }

				         }

				        },

				      "tablet_repair_result":{

				        "id":"tablet_repair_result",

				        "description":"Tablet repair result",

				@@ -3629,6 +3860,45 @@

				               "description":"The resulting compression ratio (estimated on a random sample of files)"

				            }

				         }

				      },

				      "vnode_tablet_migration_node_status":{

				         "id":"vnode_tablet_migration_node_status",

				         "description":"Node storage mode info during vnodes-to-tablets migration",

				         "properties":{

				            "host_id":{

				               "type":"string",

				               "description":"The host ID"

				            },

				            "current_mode":{

				               "type":"string",

				               "description":"The current storage mode: `vnodes` or `tablets`"

				            },

				            "intended_mode":{

				               "type":"string",

				               "description":"The intended storage mode: `vnodes` or `tablets`"

				            }

				         }

				      },

				      "vnode_tablet_migration_status":{

				         "id":"vnode_tablet_migration_status",

				         "description":"Vnodes-to-tablets migration status for a keyspace",

				         "properties":{

				            "keyspace":{

				               "type":"string",

				               "description":"The keyspace name"

				            },

				            "status":{

				               "type":"string",

				               "description":"The migration status: `vnodes` (not started), `migrating_to_tablets` (in progress), or `tablets` (complete)"

				            },

				            "nodes":{

				               "type":"array",

				               "items":{

				                  "$ref":"vnode_tablet_migration_node_status"

				               },

				               "description":"Per-node storage mode information. Empty if the keyspace is not being migrated."

				            }

				         }

				      }

				   }

				}
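The `status` field of `vnode_tablet_migration_status` is documented as one of three string values: `vnodes`, `migrating_to_tablets`, or `tablets`. A client of this endpoint might map them to an enum, as in the minimal C++ sketch below. The names `migration_status` and `parse_migration_status` are hypothetical and are not part of the ScyllaDB API. The sketch assumes the JSON response has already been parsed and the `status` string extracted.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical client-side mirror of the documented `status` values of
// vnode_tablet_migration_status; not part of the ScyllaDB code base.
enum class migration_status {
    vnodes,               // migration not started
    migrating_to_tablets, // migration in progress
    tablets,              // migration complete
};

// Map the documented string to the enum. Unknown values yield std::nullopt,
// so the caller can reject them instead of guessing.
std::optional<migration_status> parse_migration_status(std::string_view s) {
    if (s == "vnodes") {
        return migration_status::vnodes;
    }
    if (s == "migrating_to_tablets") {
        return migration_status::migrating_to_tablets;
    }
    if (s == "tablets") {
        return migration_status::tablets;
    }
    return std::nullopt;
}
```

Matching is exact and case-sensitive, which keeps the client strict about the documented spellings.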

									
										15

api/api-doc/system.json
									
				@@ -209,6 +209,21 @@

				               "parameters":[]

				            }

				         ]

				      },

				      {

				         "path":"/system/chosen_sstable_version",

				         "operations":[

				            {

				               "method":"GET",

				               "summary":"Get sstable version currently chosen for use in new sstables",

				               "type":"string",

				               "nickname":"get_chosen_sstable_version",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[]

				            }

				         ]

				      }

				   ]

				}

									
										8

api/api.cc
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#include "api.hh"

				@@ -122,9 +122,9 @@ future<> unset_thrift_controller(http_context& ctx) {

				    return ctx.http_server.set_routes([&ctx] (routes& r) { unset_thrift_controller(ctx, r); });

				}

				future<> set_server_storage_service(http_context& ctx, sharded<service::storage_service>& ss, service::raft_group0_client& group0_client) {

				    return ctx.http_server.set_routes([&ctx, &ss, &group0_client] (routes& r) {

				            set_storage_service(ctx, r, ss, group0_client);

				future<> set_server_storage_service(http_context& ctx, sharded<service::storage_service>& ss, sharded<db::snapshot_ctl>& ssc, service::raft_group0_client& group0_client) {

				    return ctx.http_server.set_routes([&ctx, &ss, &ssc, &group0_client] (routes& r) {

				            set_storage_service(ctx, r, ss, ssc, group0_client);

				        });

				}

									
										27

api/api.hh
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

				@@ -23,31 +23,6 @@

				namespace api {

				template<class T>

				std::vector<T> map_to_key_value(const std::map<sstring, sstring>& map) {

				    std::vector<T> res;

				    res.reserve(map.size());

				    for (const auto& [key, value] : map) {

				        res.push_back(T());

				        res.back().key = key;

				        res.back().value = value;

				    }

				    return res;

				}

				template<class T, class MAP>

				std::vector<T>& map_to_key_value(const MAP& map, std::vector<T>& res) {

				    res.reserve(res.size() + std::size(map));

				    for (const auto& [key, value] : map) {

				        T val;

				        val.key = fmt::to_string(key);

				        val.value = fmt::to_string(value);

				        res.push_back(val);

				    }

				    return res;

				}

				template <typename T, typename S = T>

				T map_sum(T&& dest, const S& src) {

				    for (const auto& i : src) {
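The `map_to_key_value()` helpers in `api.hh` convert an associative container into a vector of JSON key/value objects by stringifying each entry. Below is a self-contained sketch of the same technique. `kv` is a hypothetical stand-in for a generated JSON type such as `storage_service_json::mapper`. `std::ostringstream` replaces `fmt::to_string` only so the example does not depend on {fmt}.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a generated JSON struct with `key` and `value`
// members (the real code uses types like storage_service_json::mapper).
struct kv {
    std::string key;
    std::string value;
};

// Stringify any streamable value (the original helpers use fmt::to_string).
template <typename T>
std::string to_str(const T& v) {
    std::ostringstream os;
    os << v;
    return os.str();
}

// Append one key/value object per map entry to `res` and return it,
// following the shape of the two-argument map_to_key_value() above.
template <typename T, typename Map>
std::vector<T>& map_to_key_value(const Map& map, std::vector<T>& res) {
    res.reserve(res.size() + std::size(map));
    for (const auto& [key, value] : map) {
        T val;
        val.key = to_str(key);
        val.value = to_str(value);
        res.push_back(std::move(val));
    }
    return res;
}
```

Because it returns the output vector by reference, the helper can be used inline, for example as the argument to a JSON return type.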

									
										4

api/api_init.hh
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

				@@ -98,7 +98,7 @@ future<> set_server_config(http_context& ctx, db::config& cfg);

				future<> unset_server_config(http_context& ctx);

				future<> set_server_snitch(http_context& ctx, sharded<locator::snitch_ptr>& snitch);

				future<> unset_server_snitch(http_context& ctx);

				future<> set_server_storage_service(http_context& ctx, sharded<service::storage_service>& ss, service::raft_group0_client&);

				future<> set_server_storage_service(http_context& ctx, sharded<service::storage_service>& ss, sharded<db::snapshot_ctl>&, service::raft_group0_client&);

				future<> unset_server_storage_service(http_context& ctx);

				future<> set_server_client_routes(http_context& ctx, sharded<service::client_routes_service>& cr);

				future<> unset_server_client_routes(http_context& ctx);

									
										2

api/authorization_cache.cc
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#include "api/api-doc/authorization_cache.json.hh"

									
										2

api/authorization_cache.hh
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

									
										2

api/cache_service.cc
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#include "cache_service.hh"

									
										2

api/cache_service.hh
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

									
										2

api/client_routes.cc
									
				@@ -4,7 +4,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				 #include <seastar/http/short_streams.hh>

									
										2

api/client_routes.hh
									
				@@ -4,7 +4,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

									
										2

api/collectd.cc
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#include "collectd.hh"

									
										2

api/collectd.hh
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

									
										59

api/column_family.cc
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#include <fmt/ranges.h>

				@@ -18,7 +18,9 @@

				#include "utils/assert.hh"

				#include "utils/estimated_histogram.hh"

				#include <algorithm>

				#include <sstream>

				#include "db/data_listeners.hh"

				#include "utils/hash.hh"

				#include "storage_service.hh"

				#include "compaction/compaction_manager.hh"

				#include "unimplemented.hh"

				@@ -342,6 +344,56 @@ uint64_t accumulate_on_active_memtables(replica::table& t, noncopyable_function<

				    return ret;

				}

				static

				future<json::json_return_type>

				rest_toppartitions_generic(sharded<replica::database>& db, std::unique_ptr<http::request> req) {

				        bool filters_provided = false;

				        std::unordered_set<std::tuple<sstring, sstring>, utils::tuple_hash> table_filters {};

				        if (auto filters = req->get_query_param("table_filters"); !filters.empty()) {

				            filters_provided = true;

				            std::stringstream ss { filters };

				            std::string filter;

				            while (!filters.empty() && ss.good()) {

				                std::getline(ss, filter, ',');

				                table_filters.emplace(parse_fully_qualified_cf_name(filter));

				            }

				        }

				        std::unordered_set<sstring> keyspace_filters {};

				        if (auto filters = req->get_query_param("keyspace_filters"); !filters.empty()) {

				            filters_provided = true;

				            std::stringstream ss { filters };

				            std::string filter;

				            while (!filters.empty() && ss.good()) {

				                std::getline(ss, filter, ',');

				                keyspace_filters.emplace(std::move(filter));

				            }

				        }

				        // when the query is empty return immediately

				        if (filters_provided && table_filters.empty() && keyspace_filters.empty()) {

				            apilog.debug("toppartitions query: processing results");

				            cf::toppartitions_query_results results;

				            results.read_cardinality = 0;

				            results.write_cardinality = 0;

				            return make_ready_future<json::json_return_type>(results);

				        }

				        api::req_param<std::chrono::milliseconds, unsigned> duration{*req, "duration", 1000ms};

				        api::req_param<unsigned> capacity(*req, "capacity", 256);

				        api::req_param<unsigned> list_size(*req, "list_size", 10);

				        apilog.info("toppartitions query: #table_filters={} #keyspace_filters={} duration={} list_size={} capacity={}",

				            !table_filters.empty() ? std::to_string(table_filters.size()) : "all", !keyspace_filters.empty() ? std::to_string(keyspace_filters.size()) : "all", duration.value, list_size.value, capacity.value);

				        return seastar::do_with(db::toppartitions_query(db, std::move(table_filters), std::move(keyspace_filters), duration.value, list_size, capacity), [] (db::toppartitions_query& q) {

				            return run_toppartitions_query(q);

				        });

				}

				void set_column_family(http_context& ctx, routes& r, sharded<replica::database>& db) {

				    cf::get_column_family_name.set(r, [&db] (const_req req){

				        std::vector<sstring> res;

				@@ -1047,6 +1099,10 @@ void set_column_family(http_context& ctx, routes& r, sharded<replica::database>&

				        });

				    });

				    ss::toppartitions_generic.set(r, [&db] (std::unique_ptr<http::request> req) {

				        return rest_toppartitions_generic(db, std::move(req));

				    });

				    cf::force_major_compaction.set(r, [&ctx, &db](std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        if (!req->get_query_param("split_output").empty()) {

				            fail(unimplemented::cause::API);

				@@ -1213,6 +1269,7 @@ void unset_column_family(http_context& ctx, routes& r) {

				    cf::get_sstable_count_per_level.unset(r);

				    cf::get_sstables_for_key.unset(r);

				    cf::toppartitions.unset(r);

				    ss::toppartitions_generic.unset(r);

				    cf::force_major_compaction.unset(r);

				    ss::get_load.unset(r);

				    ss::get_metrics_load.unset(r);
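`rest_toppartitions_generic()` reads `table_filters` and `keyspace_filters` as comma-separated lists, and each table filter is a fully qualified `keyspace.table` name. The sketch below isolates that splitting step. `split_csv` and `split_qualified_name` are hypothetical helpers. The second one stands in for `parse_fully_qualified_cf_name`, whose exact behavior is not shown in this diff. The sketch drives its loop with `std::getline` itself and skips empty tokens.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Split a comma-separated query parameter into its non-empty tokens.
std::vector<std::string> split_csv(const std::string& param) {
    std::vector<std::string> out;
    std::istringstream ss{param};
    std::string token;
    // getline() returns the stream, which converts to false once no further
    // token can be extracted, so each token is handled exactly once.
    while (std::getline(ss, token, ',')) {
        if (!token.empty()) {
            out.push_back(token);
        }
    }
    return out;
}

// Hypothetical stand-in for parse_fully_qualified_cf_name():
// split "ks.table" at the first '.' and reject malformed input.
std::pair<std::string, std::string> split_qualified_name(const std::string& name) {
    auto dot = name.find('.');
    if (dot == std::string::npos || dot == 0 || dot + 1 == name.size()) {
        throw std::invalid_argument("expected keyspace.table: " + name);
    }
    return {name.substr(0, dot), name.substr(dot + 1)};
}
```

For example, `split_csv("ks1,,ks2,")` yields `ks1` and `ks2`. Each table token can then be passed through `split_qualified_name` before it is inserted into a set of `(keyspace, table)` pairs.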

									
										2

api/column_family.hh
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

									
										2

api/commitlog.cc
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#include "commitlog.hh"

									
										2

api/commitlog.hh
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

									
										2

api/compaction_manager.cc
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#include <seastar/core/coroutine.hh>

									
										2

api/compaction_manager.hh
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

									
										2

api/config.cc
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#include "api/api.hh"

									
										2

api/config.hh
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

									
										2

api/cql_server_test.cc
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#include "build_mode.hh"

									
										2

api/cql_server_test.hh
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#ifndef SCYLLA_BUILD_MODE_RELEASE

									
										2

api/endpoint_snitch.cc
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#include "locator/snitch_base.hh"

									
										2

api/endpoint_snitch.hh
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

									
										87

api/error_injection.cc
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#include "api/api-doc/error_injection.json.hh"

				@@ -13,27 +13,17 @@

				#include "utils/rjson.hh"

				#include <seastar/core/future-util.hh>

				#include <seastar/util/short_streams.hh>

				#include <seastar/core/queue.hh>

				#include <seastar/core/when_all.hh>

				#include <seastar/core/sharded.hh>

				namespace api {

				using namespace seastar::httpd;

				namespace hf = httpd::error_injection_json;

				// Structure to hold error injection event data

				struct injection_event {

				    sstring injection_name;

				    sstring injection_type;

				    unsigned shard_id;

				};

				void set_error_injection(http_context& ctx, routes& r) {

				    hf::enable_injection.set(r, [](std::unique_ptr<request> req) -> future<json::json_return_type> {

				        sstring injection = req->get_path_param("injection");

				        bool one_shot = req->get_query_param("one_shot") == "True";

				        bool one_shot = strcasecmp(req->get_query_param("one_shot").c_str(), "true") == 0;

				        auto params = co_await util::read_entire_stream_contiguous(*req->content_stream);

				        const size_t max_params_size = 1024 * 1024;

				@@ -111,79 +101,6 @@ void set_error_injection(http_context& ctx, routes& r) {

				            return make_ready_future<json::json_return_type>(json::json_void());

				        });

				    });

				    // Server-Sent Events endpoint for injection events

				    // This allows clients to subscribe to real-time injection events instead of log parsing

				    r.add(operation_type::GET, url("/v2/error_injection/events"), [](std::unique_ptr<request> req) -> future<json::json_return_type> {

				        // Create a shared foreign_ptr to a queue that will receive events from all shards

				        // Using a queue on the current shard to collect events

				        using event_queue_t = seastar::queue<injection_event>;

				        auto event_queue = make_lw_shared<event_queue_t>();

				        auto queue_ptr = make_foreign(event_queue);

				        // Register callback on all shards to send events to our queue

				        auto& errinj = utils::get_local_injector();

				        // Capture the current shard ID for event delivery

				        auto target_shard = this_shard_id();

				        // Setup event callback that forwards events to the queue on the target shard

				        // Note: We use shared_ptr wrapper for foreign_ptr to make it copyable

				        auto callback = [queue_ptr = queue_ptr.copy(), target_shard] (std::string_view name, std::string_view type) {

				            injection_event evt{

				                .injection_name = sstring(name),

				                .injection_type = sstring(type),

				                .shard_id = this_shard_id()

				            };

				            // Send event to the target shard's queue (discard future, fire-and-forget)

				            (void)smp::submit_to(target_shard, [queue_ptr = queue_ptr.copy(), evt = std::move(evt)] () mutable {

				                return queue_ptr->push_eventually(std::move(evt));

				            });

				        };

				        // Register the callback on all shards

				        co_await errinj.register_event_callback_on_all(callback);

				        // Return a streaming function that sends SSE events

				        noncopyable_function<future<>(output_stream<char>&&)> stream_func = 

				            [event_queue](output_stream<char>&& os) -> future<> {

				            auto s = std::move(os);

				            std::exception_ptr ex;

				            try {

				                // Send initial SSE comment to establish connection

				                co_await s.write(": connected\n\n");

				                co_await s.flush();

				                // Stream events as they arrive from any shard

				                while (true) {

				                    auto evt = co_await event_queue->pop_eventually();

				                    // Format as SSE event

				                    // data: {"injection":"name","type":"handler","shard":0}

				                    auto json_data = format("{{\"injection\":\"{}\",\"type\":\"{}\",\"shard\":{}}}",

				                        evt.injection_name, evt.injection_type, evt.shard_id);

				                    co_await s.write(format("data: {}\n\n", json_data));

				                    co_await s.flush();

				                }

				            } catch (...) {

				                ex = std::current_exception();

				            }

				            // Cleanup: clear callbacks on all shards

				            co_await utils::get_local_injector().clear_event_callbacks_on_all();

				            co_await s.close();

				            if (ex) {

				                co_await coroutine::return_exception_ptr(std::move(ex));

				            }

				        };

				        co_return json::json_return_type(std::move(stream_func));

				    });

				}

				} // namespace api
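The Server-Sent Events handler for `/v2/error_injection/events` in this hunk writes each injection as one SSE frame: a `data:` line carrying a small JSON object, terminated by a blank line. The framing step can be sketched in isolation as below. `injection_event` mirrors the struct in the hunk, with `std::string` in place of `sstring`. `format_sse_event` is a hypothetical name. Like the code above, the sketch does not JSON-escape the names, so it assumes they contain no quotes or backslashes.

```cpp
#include <cassert>
#include <string>

// Mirrors the injection_event struct above (std::string instead of sstring).
struct injection_event {
    std::string injection_name;
    std::string injection_type;
    unsigned shard_id;
};

// Build one SSE frame: "data: <json>" followed by a blank line.
// Names are inserted verbatim (no JSON escaping), as in the handler above.
std::string format_sse_event(const injection_event& evt) {
    std::string json = "{\"injection\":\"" + evt.injection_name +
                       "\",\"type\":\"" + evt.injection_type +
                       "\",\"shard\":" + std::to_string(evt.shard_id) + "}";
    return "data: " + json + "\n\n";
}

// The handler first sends an SSE comment line (leading ':'), which clients
// ignore but which confirms that the stream is open.
const std::string sse_connected_comment = ": connected\n\n";
```

The blank line after each `data:` line is what tells an SSE client that the event is complete.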

									
										2

api/error_injection.hh
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

									
										2

api/failure_detector.cc
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#include "failure_detector.hh"

									
										2

api/failure_detector.hh
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

									
										2

api/gossiper.cc
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#include <seastar/core/coroutine.hh>

									
										2

api/gossiper.hh
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

									
										2

api/hinted_handoff.cc
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#include <vector>

									
										2

api/hinted_handoff.hh
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

									
										2

api/lsa.cc
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#include "api/api-doc/lsa.json.hh"

									
										2

api/lsa.hh
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

									
										2

api/messaging_service.cc
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#include "messaging_service.hh"

									
										2

api/messaging_service.hh
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

									
										2

api/raft.cc
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#include <seastar/core/coroutine.hh>

									
										2

api/raft.hh
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

									
										2

api/scrub_status.hh
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

									
										2

api/service_levels.cc
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#include "service_levels.hh"

									
										2

api/service_levels.hh
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

									
										2

api/storage_proxy.cc
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#include "storage_proxy.hh"

									
										2

api/storage_proxy.hh
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

									
										334

api/storage_service.cc
									
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#include "storage_service.hh"

				@@ -17,9 +17,7 @@

				#include "gms/feature_service.hh"

				#include "schema/schema_builder.hh"

				#include "sstables/sstables_manager.hh"

				#include "utils/hash.hh"

				#include <optional>

				#include <sstream>

				#include <stdexcept>

				#include <time.h>

				#include <algorithm>

				@@ -32,6 +30,7 @@

				#include <fmt/ranges.h>

				#include "service/raft/raft_group0_client.hh"

				#include "service/storage_service.hh"

				#include "service/topology_state_machine.hh"

				#include "service/load_meter.hh"

				#include "gms/feature_service.hh"

				#include "gms/gossiper.hh"

				@@ -536,13 +535,15 @@ void unset_sstables_loader(http_context& ctx, routes& r) {

				}

				void set_view_builder(http_context& ctx, routes& r, sharded<db::view::view_builder>& vb, sharded<gms::gossiper>& g) {

				    ss::view_build_statuses.set(r, [&ctx, &vb, &g] (std::unique_ptr<http::request> req) {

				    ss::view_build_statuses.set(r, [&ctx, &vb, &g] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        auto keyspace = validate_keyspace(ctx, req);

				        auto view = req->get_path_param("view");

				        return vb.local().view_build_statuses(std::move(keyspace), std::move(view), g.local()).then([] (std::unordered_map<sstring, sstring> status) {

				            std::vector<storage_service_json::mapper> res;

				            return make_ready_future<json::json_return_type>(map_to_key_value(std::move(status), res));

				        });

				        co_return json::json_return_type(stream_range_as_array(co_await vb.local().view_build_statuses(std::move(keyspace), std::move(view), g.local()), [] (const auto& i) {

				            storage_service_json::mapper res;

				            res.key = i.first;

				            res.value = i.second;

				            return res;

				        }));

				    });

				    cf::get_built_indexes.set(r, [&vb](std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				@@ -572,12 +573,14 @@ void unset_view_builder(http_context& ctx, routes& r) {

				    cf::get_built_indexes.unset(r);

				}

				static future<json::json_return_type> describe_ring_as_json(sharded<service::storage_service>& ss, sstring keyspace) {

				    co_return json::json_return_type(stream_range_as_array(co_await ss.local().describe_ring(keyspace), token_range_endpoints_to_json));

				namespace {

				template <typename Key, typename Value>

				storage_service_json::mapper map_to_json(const std::pair<Key, Value>& i) {

				    storage_service_json::mapper val;

				    val.key = fmt::to_string(i.first);

				    val.value = fmt::to_string(i.second);

				    return val;

				}

				static future<json::json_return_type> describe_ring_as_json_for_table(const sharded<service::storage_service>& ss, sstring keyspace, sstring table) {

				    co_return json::json_return_type(stream_range_as_array(co_await ss.local().describe_ring_for_table(keyspace, table), token_range_endpoints_to_json));

				}

				static

				@@ -597,62 +600,7 @@ rest_get_token_endpoint(http_context& ctx, sharded<service::storage_service>& ss

				            throw bad_param_exception("Either provide both keyspace and table (for tablet table) or neither (for vnodes)");

				        }

				        co_return json::json_return_type(stream_range_as_array(token_endpoints, [](const auto& i) {

				            storage_service_json::mapper val;

				            val.key = fmt::to_string(i.first);

				            val.value = fmt::to_string(i.second);

				            return val;

				        }));

				}

				static

				future<json::json_return_type>

				rest_toppartitions_generic(http_context& ctx, std::unique_ptr<http::request> req) {

				        bool filters_provided = false;

				        std::unordered_set<std::tuple<sstring, sstring>, utils::tuple_hash> table_filters {};

				        if (auto filters = req->get_query_param("table_filters"); !filters.empty()) {

				            filters_provided = true;

				            std::stringstream ss { filters };

				            std::string filter;

				            while (!filters.empty() && ss.good()) {

				                std::getline(ss, filter, ',');

				                table_filters.emplace(parse_fully_qualified_cf_name(filter));

				            }

				        }

				        std::unordered_set<sstring> keyspace_filters {};

				        if (auto filters = req->get_query_param("keyspace_filters"); !filters.empty()) {

				            filters_provided = true;

				            std::stringstream ss { filters };

				            std::string filter;

				            while (!filters.empty() && ss.good()) {

				                std::getline(ss, filter, ',');

				                keyspace_filters.emplace(std::move(filter));

				            }

				        }

				        // when the query is empty return immediately

				        if (filters_provided && table_filters.empty() && keyspace_filters.empty()) {

				            apilog.debug("toppartitions query: processing results");

				            httpd::column_family_json::toppartitions_query_results results;

				            results.read_cardinality = 0;

				            results.write_cardinality = 0;

				            return make_ready_future<json::json_return_type>(results);

				        }

				        api::req_param<std::chrono::milliseconds, unsigned> duration{*req, "duration", 1000ms};

				        api::req_param<unsigned> capacity(*req, "capacity", 256);

				        api::req_param<unsigned> list_size(*req, "list_size", 10);

				        apilog.info("toppartitions query: #table_filters={} #keyspace_filters={} duration={} list_size={} capacity={}",

				            !table_filters.empty() ? std::to_string(table_filters.size()) : "all", !keyspace_filters.empty() ? std::to_string(keyspace_filters.size()) : "all", duration.value, list_size.value, capacity.value);

				        return seastar::do_with(db::toppartitions_query(ctx.db, std::move(table_filters), std::move(keyspace_filters), duration.value, list_size, capacity), [] (db::toppartitions_query& q) {

				            return run_toppartitions_query(q);

				        });

				        co_return json::json_return_type(stream_range_as_array(token_endpoints, &map_to_json<dht::token, gms::inet_address>));

				}

				static

				@@ -686,7 +634,6 @@ rest_get_range_to_endpoint_map(http_context& ctx, sharded<service::storage_servi

				            table_id = validate_table(ctx.db.local(), keyspace, table);

				        }

				        std::vector<ss::maplist_mapper> res;

				        co_return stream_range_as_array(co_await ss.local().get_range_to_address_map(keyspace, table_id),

				                [](const std::pair<dht::token_range, inet_address_vector_replica_set>& entry){

				            ss::maplist_mapper m;

				@@ -723,13 +670,16 @@ rest_describe_ring(http_context& ctx, sharded<service::storage_service>& ss, std

				        if (!req->param.exists("keyspace")) {

				            throw bad_param_exception("The keyspace param is not provided");

				        }

				        auto keyspace = req->get_path_param("keyspace");

				        auto keyspace = validate_keyspace(ctx, req);

				        auto table = req->get_query_param("table");

				        utils::chunked_vector<dht::token_range_endpoints> ranges;

				        if (!table.empty()) {

				            validate_table(ctx.db.local(), keyspace, table);

				            return describe_ring_as_json_for_table(ss, keyspace, table);

				            auto table_id = validate_table(ctx.db.local(), keyspace, table);

				            ranges = co_await ss.local().describe_ring_for_table(table_id);

				        } else {

				            ranges = co_await ss.local().describe_ring(keyspace);

				        }

				        return describe_ring_as_json(ss, validate_keyspace(ctx, req));

				        co_return json::json_return_type(stream_range_as_array(std::move(ranges), token_range_endpoints_to_json));

				}

				static

				@@ -777,17 +727,13 @@ rest_cleanup_all(http_context& ctx, sharded<service::storage_service>& ss, std::

				        apilog.info("cleanup_all global={}", global);

				        auto done = !global ? false : co_await ss.invoke_on(0, [] (service::storage_service& ss) -> future<bool> {

				            if (!ss.is_topology_coordinator_enabled()) {

				                co_return false;

				            }

				            co_await ss.do_clusterwide_vnodes_cleanup();

				            co_return true;

				        });

				        if (done) {

				        if (global) {

				            co_await ss.invoke_on(0, [] (service::storage_service& ss) -> future<> {

				                co_return co_await ss.do_clusterwide_vnodes_cleanup();

				            });

				            co_return json::json_return_type(0);

				        }

				        // fall back to the local cleanup if topology coordinator is not enabled or local cleanup is requested

				        // fall back to the local cleanup if local cleanup is requested

				        auto& db = ctx.db;

				        auto& compaction_module = db.local().get_compaction_manager().get_task_manager_module();

				        auto task = co_await compaction_module.make_and_start_task<compaction::global_cleanup_compaction_task_impl>({}, db);

				@@ -795,9 +741,7 @@ rest_cleanup_all(http_context& ctx, sharded<service::storage_service>& ss, std::

				        // Mark this node as clean

				        co_await ss.invoke_on(0, [] (service::storage_service& ss) -> future<> {

				            if (ss.is_topology_coordinator_enabled()) {

				                co_await ss.reset_cleanup_needed();

				            }

				            co_await ss.reset_cleanup_needed();

				        });

				        co_return json::json_return_type(0);

				@@ -808,9 +752,6 @@ future<json::json_return_type>

				rest_reset_cleanup_needed(http_context& ctx, sharded<service::storage_service>& ss, std::unique_ptr<http::request> req) {

				        apilog.info("reset_cleanup_needed");

				        co_await ss.invoke_on(0, [] (service::storage_service& ss) {

				            if (!ss.is_topology_coordinator_enabled()) {

				                throw std::runtime_error("mark_node_as_clean is only supported when topology over raft is enabled");

				            }

				            return ss.reset_cleanup_needed();

				        });

				        co_return json_void();

				@@ -838,9 +779,31 @@ rest_force_keyspace_flush(http_context& ctx, std::unique_ptr<http::request> req)

				static

				future<json::json_return_type>

				rest_decommission(sharded<service::storage_service>& ss, std::unique_ptr<http::request> req) {

				rest_logstor_compaction(http_context& ctx, std::unique_ptr<http::request> req) {

				        bool major = false;

				        if (auto major_param = req->get_query_param("major"); !major_param.empty()) {

				            major = validate_bool(major_param);

				        }

				        apilog.info("logstor_compaction: major={}", major);

				        auto& db = ctx.db;

				        co_await replica::database::trigger_logstor_compaction_on_all_shards(db, major);

				        co_return json_void();

				}

				static

				future<json::json_return_type>

				rest_logstor_flush(http_context& ctx, std::unique_ptr<http::request> req) {

				        apilog.info("logstor_flush");

				        auto& db = ctx.db;

				        co_await replica::database::flush_logstor_separator_on_all_shards(db);

				        co_return json_void();

				}

				static

				future<json::json_return_type>

				rest_decommission(sharded<service::storage_service>& ss, sharded<db::snapshot_ctl>& ssc, std::unique_ptr<http::request> req) {

				        apilog.info("decommission");

				        return ss.local().decommission().then([] {

				        return ss.local().decommission(ssc).then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				}

				@@ -1317,10 +1280,7 @@ rest_get_ownership(http_context& ctx, sharded<service::storage_service>& ss, std

				            throw httpd::bad_param_exception("storage_service/ownership cannot be used when a keyspace uses tablets");

				        }

				        return ss.local().get_ownership().then([] (auto&& ownership) {

				            std::vector<storage_service_json::mapper> res;

				            return make_ready_future<json::json_return_type>(map_to_key_value(ownership, res));

				        });

				        co_return json::json_return_type(stream_range_as_array(co_await ss.local().get_ownership(), &map_to_json<gms::inet_address, float>));

				}

				static

				@@ -1337,10 +1297,7 @@ rest_get_effective_ownership(http_context& ctx, sharded<service::storage_service

				            }

				        }

				        return ss.local().effective_ownership(keyspace_name, table_name).then([] (auto&& ownership) {

				            std::vector<storage_service_json::mapper> res;

				            return make_ready_future<json::json_return_type>(map_to_key_value(ownership, res));

				        });

				        co_return json::json_return_type(stream_range_as_array(co_await ss.local().effective_ownership(keyspace_name, table_name), &map_to_json<gms::inet_address, float>));

				}

				static

				@@ -1350,7 +1307,7 @@ rest_estimate_compression_ratios(http_context& ctx, sharded<service::storage_ser

				        apilog.warn("estimate_compression_ratios: called before the cluster feature was enabled");

				        throw std::runtime_error("estimate_compression_ratios requires all nodes to support the SSTABLE_COMPRESSION_DICTS cluster feature");

				    }

				    auto ticket = get_units(ss.local().get_do_sample_sstables_concurrency_limiter(), 1);

				    auto ticket = co_await get_units(ss.local().get_do_sample_sstables_concurrency_limiter(), 1);

				    auto ks = api::req_param<sstring>(*req, "keyspace", {}).value;

				    auto cf = api::req_param<sstring>(*req, "cf", {}).value;

				    apilog.debug("estimate_compression_ratios: called with ks={} cf={}", ks, cf);

				@@ -1416,7 +1373,7 @@ rest_retrain_dict(http_context& ctx, sharded<service::storage_service>& ss, serv

				        apilog.warn("retrain_dict: called before the cluster feature was enabled");

				        throw std::runtime_error("retrain_dict requires all nodes to support the SSTABLE_COMPRESSION_DICTS cluster feature");

				    }

				    auto ticket = get_units(ss.local().get_do_sample_sstables_concurrency_limiter(), 1);

				    auto ticket = co_await get_units(ss.local().get_do_sample_sstables_concurrency_limiter(), 1);

				    auto ks = api::req_param<sstring>(*req, "keyspace", {}).value;

				    auto cf = api::req_param<sstring>(*req, "cf", {}).value;

				    apilog.debug("retrain_dict: called with ks={} cf={}", ks, cf);

				@@ -1562,6 +1519,54 @@ rest_sstable_info(http_context& ctx, std::unique_ptr<http::request> req) {

				        });

				}

				static

				future<json::json_return_type>

				rest_logstor_info(http_context& ctx, std::unique_ptr<http::request> req) {

				        auto keyspace = api::req_param<sstring>(*req, "keyspace", {}).value;

				        auto table = api::req_param<sstring>(*req, "table", {}).value;

				        if (table.empty()) {

				            table = api::req_param<sstring>(*req, "cf", {}).value;

				        }

				        if (keyspace.empty()) {

				            throw bad_param_exception("The query parameter 'keyspace' is required");

				        }

				        if (table.empty()) {

				            throw bad_param_exception("The query parameter 'table' is required");

				        }

				        keyspace = validate_keyspace(ctx, keyspace);

				        auto tid = validate_table(ctx.db.local(), keyspace, table);

				        auto& cf = ctx.db.local().find_column_family(tid);

				        if (!cf.uses_logstor()) {

				            throw bad_param_exception(fmt::format("Table {}.{} does not use logstor", keyspace, table));

				        }

				        return do_with(replica::logstor::table_segment_stats{}, [keyspace = std::move(keyspace), table = std::move(table), tid, &ctx] (replica::logstor::table_segment_stats& merged_stats) {

				            return ctx.db.map_reduce([&merged_stats](replica::logstor::table_segment_stats&& shard_stats) {

				                merged_stats += shard_stats;

				            }, [tid](const replica::database& db) {

				                return db.get_logstor_table_segment_stats(tid);

				            }).then([&merged_stats, keyspace = std::move(keyspace), table = std::move(table)] {

				                ss::table_logstor_info result;

				                result.keyspace = keyspace;

				                result.table = table;

				                result.compaction_groups = merged_stats.compaction_group_count;

				                result.segments = merged_stats.segment_count;

				                for (const auto& bucket : merged_stats.histogram) {

				                    ss::logstor_hist_bucket hist;

				                    hist.count = bucket.count;

				                    hist.max_data_size = bucket.max_data_size;

				                    result.data_size_histogram.push(std::move(hist));

				                }

				                return make_ready_future<json::json_return_type>(stream_object(result));

				            });

				        });

				}

				static

				future<json::json_return_type>

				rest_reload_raft_topology_state(sharded<service::storage_service>& ss, service::raft_group0_client& group0_client, std::unique_ptr<http::request> req) {

				@@ -1574,26 +1579,14 @@ rest_reload_raft_topology_state(sharded<service::storage_service>& ss, service::

				static

				future<json::json_return_type>

				rest_upgrade_to_raft_topology(sharded<service::storage_service>& ss, std::unique_ptr<http::request> req) {

				        apilog.info("Requested to schedule upgrade to raft topology");

				        try {

				            co_await ss.invoke_on(0, [] (auto& ss) {

				                return ss.start_upgrade_to_raft_topology();

				            });

				        } catch (...) {

				            auto ex = std::current_exception();

				            apilog.error("Failed to schedule upgrade to raft topology: {}", ex);

				            std::rethrow_exception(std::move(ex));

				        }

				        apilog.info("Requested to schedule upgrade to raft topology, but this version does not need it since it uses raft topology by default.");

				        co_return json_void();

				}

				static

				future<json::json_return_type>

				rest_raft_topology_upgrade_status(sharded<service::storage_service>& ss, std::unique_ptr<http::request> req) {

				        const auto ustate = co_await ss.invoke_on(0, [] (auto& ss) {

				            return ss.get_topology_upgrade_state();

				        });

				        co_return sstring(format("{}", ustate));

				        co_return sstring("done");

				}

				static

				@@ -1730,6 +1723,69 @@ rest_tablet_balancing_enable(sharded<service::storage_service>& ss, std::unique_

				        co_return json_void();

				}

				static

				future<json::json_return_type>

				rest_create_vnode_tablet_migration(http_context& ctx, sharded<service::storage_service>& ss, std::unique_ptr<http::request> req) {

				    if (!ss.local().get_feature_service().vnodes_to_tablets_migrations) {

				        apilog.warn("create_vnode_tablet_migration: called before the cluster feature was enabled");

				        throw std::runtime_error("vnodes-to-tablets migration requires all nodes to support the VNODES_TO_TABLETS_MIGRATIONS cluster feature");

				    }

				    auto keyspace = validate_keyspace(ctx, req);

				    co_await ss.local().prepare_for_tablets_migration(keyspace);

				    co_return json_void();

				}

				static

				future<json::json_return_type>

				rest_get_vnode_tablet_migration(http_context& ctx, sharded<service::storage_service>& ss, std::unique_ptr<http::request> req) {

				    if (!ss.local().get_feature_service().vnodes_to_tablets_migrations) {

				        apilog.warn("get_vnode_tablet_migration: called before the cluster feature was enabled");

				        throw std::runtime_error("vnodes-to-tablets migration requires all nodes to support the VNODES_TO_TABLETS_MIGRATIONS cluster feature");

				    }

				    auto keyspace = validate_keyspace(ctx, req);

				    auto status = co_await ss.local().get_tablets_migration_status(keyspace);

				    ss::vnode_tablet_migration_status result;

				    result.keyspace = status.keyspace;

				    result.status = status.status;

				    result.nodes._set = true;

				    for (const auto& node : status.nodes) {

				        ss::vnode_tablet_migration_node_status n;

				        n.host_id = fmt::to_string(node.host_id);

				        n.current_mode = node.current_mode;

				        n.intended_mode = node.intended_mode;

				        result.nodes.push(n);

				    }

				    co_return result;

				}

				static

				future<json::json_return_type>

				rest_set_vnode_tablet_migration_node_storage_mode(http_context& ctx, sharded<service::storage_service>& ss, std::unique_ptr<http::request> req) {

				    if (!ss.local().get_feature_service().vnodes_to_tablets_migrations) {

				        apilog.warn("set_vnode_tablet_migration_node_storage_mode: called before the cluster feature was enabled");

				        throw std::runtime_error("vnodes-to-tablets migration requires all nodes to support the VNODES_TO_TABLETS_MIGRATIONS cluster feature");

				    }

				    auto mode_str = req->get_query_param("intended_mode");

				    auto mode = service::intended_storage_mode_from_string(mode_str);

				    co_await ss.local().set_node_intended_storage_mode(mode);

				    co_return json_void();

				}

				static

				future<json::json_return_type>

				rest_finalize_vnode_tablet_migration(http_context& ctx, sharded<service::storage_service>& ss, std::unique_ptr<http::request> req) {

				    if (!ss.local().get_feature_service().vnodes_to_tablets_migrations) {

				        apilog.warn("finalize_vnode_tablet_migration: called before the cluster feature was enabled");

				        throw std::runtime_error("vnodes-to-tablets migration requires all nodes to support the VNODES_TO_TABLETS_MIGRATIONS cluster feature");

				    }

				    auto keyspace = validate_keyspace(ctx, req);

				    validate_keyspace(ctx, keyspace);

				    co_await ss.local().finalize_tablets_migration(keyspace);

				    co_return json_void();

				}

				static

				future<json::json_return_type>

				rest_quiesce_topology(sharded<service::storage_service>& ss, std::unique_ptr<http::request> req) {

				@@ -1803,9 +1859,8 @@ rest_bind(FuncType func, BindArgs&... args) {

				    return std::bind_front(func, std::ref(args)...);

				}

				void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_service>& ss, service::raft_group0_client& group0_client) {

				void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_service>& ss, sharded<db::snapshot_ctl>& ssc, service::raft_group0_client& group0_client) {

				    ss::get_token_endpoint.set(r, rest_bind(rest_get_token_endpoint, ctx, ss));

				    ss::toppartitions_generic.set(r, rest_bind(rest_toppartitions_generic, ctx));

				    ss::get_release_version.set(r, rest_bind(rest_get_release_version, ss));

				    ss::get_scylla_release_version.set(r, rest_bind(rest_get_scylla_release_version, ss));

				    ss::get_schema_version.set(r, rest_bind(rest_get_schema_version, ss));

				@@ -1820,7 +1875,9 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_

				    ss::reset_cleanup_needed.set(r, rest_bind(rest_reset_cleanup_needed, ctx, ss));

				    ss::force_flush.set(r, rest_bind(rest_force_flush, ctx));

				    ss::force_keyspace_flush.set(r, rest_bind(rest_force_keyspace_flush, ctx));

				    ss::decommission.set(r, rest_bind(rest_decommission, ss));

				    ss::decommission.set(r, rest_bind(rest_decommission, ss, ssc));

				    ss::logstor_compaction.set(r, rest_bind(rest_logstor_compaction, ctx));

				    ss::logstor_flush.set(r, rest_bind(rest_logstor_flush, ctx));

				    ss::move.set(r, rest_bind(rest_move, ss));

				    ss::remove_node.set(r, rest_bind(rest_remove_node, ss));

				    ss::exclude_node.set(r, rest_bind(rest_exclude_node, ss));

				@@ -1869,6 +1926,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_

				    ss::retrain_dict.set(r, rest_bind(rest_retrain_dict, ctx, ss, group0_client));

				    ss::estimate_compression_ratios.set(r, rest_bind(rest_estimate_compression_ratios, ctx, ss));

				    ss::sstable_info.set(r, rest_bind(rest_sstable_info, ctx));

				    ss::logstor_info.set(r, rest_bind(rest_logstor_info, ctx));

				    ss::reload_raft_topology_state.set(r, rest_bind(rest_reload_raft_topology_state, ss, group0_client));

				    ss::upgrade_to_raft_topology.set(r, rest_bind(rest_upgrade_to_raft_topology, ss));

				    ss::raft_topology_upgrade_status.set(r, rest_bind(rest_raft_topology_upgrade_status, ss));

				@@ -1878,6 +1936,10 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_

				    ss::del_tablet_replica.set(r, rest_bind(rest_del_tablet_replica, ctx, ss));

				    ss::repair_tablet.set(r, rest_bind(rest_repair_tablet, ctx, ss));

				    ss::tablet_balancing_enable.set(r, rest_bind(rest_tablet_balancing_enable, ss));

				    ss::create_vnode_tablet_migration.set(r, rest_bind(rest_create_vnode_tablet_migration, ctx, ss));

				    ss::get_vnode_tablet_migration.set(r, rest_bind(rest_get_vnode_tablet_migration, ctx, ss));

				    ss::set_vnode_tablet_migration_node_storage_mode.set(r, rest_bind(rest_set_vnode_tablet_migration_node_storage_mode, ctx, ss));

				    ss::finalize_vnode_tablet_migration.set(r, rest_bind(rest_finalize_vnode_tablet_migration, ctx, ss));

				    ss::quiesce_topology.set(r, rest_bind(rest_quiesce_topology, ss));

				    sp::get_schema_versions.set(r, rest_bind(rest_get_schema_versions, ss));

				    ss::drop_quarantined_sstables.set(r, rest_bind(rest_drop_quarantined_sstables, ctx, ss));

				@@ -1885,7 +1947,6 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_

				void unset_storage_service(http_context& ctx, routes& r) {

				    ss::get_token_endpoint.unset(r);

				    ss::toppartitions_generic.unset(r);

				    ss::get_release_version.unset(r);

				    ss::get_scylla_release_version.unset(r);

				    ss::get_schema_version.unset(r);

				@@ -1899,6 +1960,8 @@ void unset_storage_service(http_context& ctx, routes& r) {

				    ss::reset_cleanup_needed.unset(r);

				    ss::force_flush.unset(r);

				    ss::force_keyspace_flush.unset(r);

				    ss::logstor_compaction.unset(r);

				    ss::logstor_flush.unset(r);

				    ss::decommission.unset(r);

				    ss::move.unset(r);

				    ss::remove_node.unset(r);

				@@ -1946,6 +2009,7 @@ void unset_storage_service(http_context& ctx, routes& r) {

				    ss::get_ownership.unset(r);

				    ss::get_effective_ownership.unset(r);

				    ss::sstable_info.unset(r);

				    ss::logstor_info.unset(r);

				    ss::reload_raft_topology_state.unset(r);

				    ss::upgrade_to_raft_topology.unset(r);

				    ss::raft_topology_upgrade_status.unset(r);

				@@ -1955,6 +2019,10 @@ void unset_storage_service(http_context& ctx, routes& r) {

				    ss::del_tablet_replica.unset(r);

				    ss::repair_tablet.unset(r);

				    ss::tablet_balancing_enable.unset(r);

				    ss::create_vnode_tablet_migration.unset(r);

				    ss::get_vnode_tablet_migration.unset(r);

				    ss::set_vnode_tablet_migration_node_storage_mode.unset(r);

				    ss::finalize_vnode_tablet_migration.unset(r);

				    ss::quiesce_topology.unset(r);

				    sp::get_schema_versions.unset(r);

				    ss::drop_quarantined_sstables.unset(r);

				@@ -2025,6 +2093,8 @@ void set_snapshot(http_context& ctx, routes& r, sharded<db::snapshot_ctl>& snap_

				        auto tag = req->get_query_param("tag");

				        auto column_families = split(req->get_query_param("cf"), ",");

				        auto sfopt = req->get_query_param("sf");

				        auto tcopt = req->get_query_param("tc");

				        db::snapshot_options opts = {

				            .skip_flush = strcasecmp(sfopt.c_str(), "true") == 0,

				        };

				@@ -2043,12 +2113,35 @@ void set_snapshot(http_context& ctx, routes& r, sharded<db::snapshot_ctl>& snap_

				                co_await snap_ctl.local().take_column_family_snapshot(keynames[0], column_families, tag, opts);

				            }

				            co_return json_void();

				        } catch (const data_dictionary::no_such_column_family& e) {

				            throw httpd::bad_param_exception(e.what());

				        } catch (...) {

				            apilog.error("take_snapshot failed: {}", std::current_exception());

				            throw;

				        }

				    });

				    ss::take_cluster_snapshot.set(r, [&snap_ctl](std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        apilog.info("take_cluster_snapshot: {}", req->get_query_params());

				        auto tag = req->get_query_param("tag");

				        auto column_families = split(req->get_query_param("table"), ",");

				        // Note: not published/active. Retain as internal option, but...

				        auto sfopt = req->get_query_param("skip_flush");

				        db::snapshot_options opts = {

				            .skip_flush = strcasecmp(sfopt.c_str(), "true") == 0,

				        };

				        std::vector<sstring> keynames = split(req->get_query_param("keyspace"), ",");

				        try {

				            co_await snap_ctl.local().take_cluster_column_family_snapshot(keynames, column_families, tag, opts);

				            co_return json_void();

				        } catch (...) {

				            apilog.error("take_cluster_snapshot failed: {}", std::current_exception());

				            throw;

				        }

				    });

				    ss::del_snapshot.set(r, [&snap_ctl](std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        apilog.info("del_snapshot: {}", req->get_query_params());

				        auto tag = req->get_query_param("tag");

				@@ -2058,6 +2151,8 @@ void set_snapshot(http_context& ctx, routes& r, sharded<db::snapshot_ctl>& snap_

				        try {

				            co_await snap_ctl.local().clear_snapshot(tag, keynames, column_family);

				            co_return json_void();

				        } catch (const data_dictionary::no_such_column_family& e) {

				            throw httpd::bad_param_exception(e.what());

				        } catch (...) {

				            apilog.error("del_snapshot failed: {}", std::current_exception());

				            throw;

				@@ -2139,6 +2234,7 @@ void unset_snapshot(http_context& ctx, routes& r) {

				    ss::start_backup.unset(r);

				    cf::get_true_snapshots_size.unset(r);

				    cf::get_all_true_snapshots_size.unset(r);

				    ss::decommission.unset(r);

				}

				}
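The `api/storage_service.cc` changes above replace several per-call lambdas that built `storage_service_json::mapper` objects with a single `map_to_json<Key, Value>` function template. Its instantiations, for example `&map_to_json<gms::inet_address, float>`, are passed directly as the element transform to `stream_range_as_array`. The sketch below shows that pattern in standalone C++. It assumes a simplified `mapper` struct with string `key`/`value` fields and uses `std::ostringstream` in place of `fmt::to_string`. `mapper`, `to_string_any`, and `to_mappers` are hypothetical names, not Scylla APIs.

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for storage_service_json::mapper: a key/value pair of strings.
struct mapper {
    std::string key;
    std::string value;
};

// Stringify any streamable value (stands in for fmt::to_string in this sketch).
template <typename T>
std::string to_string_any(const T& v) {
    std::ostringstream os;
    os << v;
    return os.str();
}

// One generic converter instead of an identical lambda repeated at every call site.
template <typename Key, typename Value>
mapper map_to_json(const std::pair<Key, Value>& i) {
    mapper val;
    val.key = to_string_any(i.first);
    val.value = to_string_any(i.second);
    return val;
}

// Apply a converter to every element of a range (stands in for stream_range_as_array,
// which streams the converted elements instead of materializing a vector).
template <typename Range, typename Func>
std::vector<mapper> to_mappers(const Range& range, Func f) {
    std::vector<mapper> out;
    std::transform(std::begin(range), std::end(range), std::back_inserter(out), f);
    return out;
}
```

A call such as `to_mappers(ownership, &map_to_json<std::string, float>)` then plays the role of `stream_range_as_array(..., &map_to_json<gms::inet_address, float>)`. The explicit template arguments are needed because a function template has no address until it is instantiated.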

									
4 api/storage_service.hh
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#pragma once

				@@ -66,7 +66,7 @@ struct scrub_info {

				scrub_info parse_scrub_options(const http_context& ctx, std::unique_ptr<http::request> req);

				void set_storage_service(http_context& ctx, httpd::routes& r, sharded<service::storage_service>& ss, service::raft_group0_client&);

				void set_storage_service(http_context& ctx, httpd::routes& r, sharded<service::storage_service>& ss, sharded<db::snapshot_ctl>&, service::raft_group0_client&);

				void unset_storage_service(http_context& ctx, httpd::routes& r);

				void set_sstables_loader(http_context& ctx, httpd::routes& r, sharded<sstables_loader>& sst_loader);

				void unset_sstables_loader(http_context& ctx, httpd::routes& r);
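`rest_toppartitions_generic` in the `api/storage_service.cc` diff above splits the comma-separated `table_filters` and `keyspace_filters` query parameters by calling `std::getline` with a `','` delimiter on a `std::stringstream`, and inserts each token into an unordered set. The sketch below isolates that splitting step. It assumes plain `std::string` input in place of Seastar's `sstring` and skips the `parse_fully_qualified_cf_name` step applied to table filters. `split_filters` is a hypothetical helper, not part of the Scylla API.

```cpp
#include <sstream>
#include <string>
#include <unordered_set>
#include <utility>

// Split a comma-separated query-parameter value into a de-duplicated set,
// mirroring the std::getline loop used for "keyspace_filters".
std::unordered_set<std::string> split_filters(const std::string& filters) {
    std::unordered_set<std::string> result;
    std::stringstream ss{filters};
    std::string filter;
    // An empty parameter yields no filters; otherwise read tokens until the stream is exhausted.
    while (!filters.empty() && ss.good()) {
        std::getline(ss, filter, ',');   // getline clears `filter` before extracting
        result.emplace(std::move(filter));
    }
    return result;
}
```

Because the result is an `unordered_set`, a repeated name such as `ks1,ks1` collapses to a single filter entry.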

									
2 api/stream_manager.cc
				@@ -3,7 +3,7 @@

				 */

				/*

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

				 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

				 */

				#include "stream_manager.hh"
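`rest_logstor_info` in the `api/storage_service.cc` diff above collects per-shard `replica::logstor::table_segment_stats` with `ctx.db.map_reduce` and folds them into one object with `operator+=` before filling `ss::table_logstor_info`. The real `table_segment_stats` definition is not part of this diff, so the sketch below is only an illustration under stated assumptions. It uses a simplified struct containing the fields the handler reads (`compaction_group_count`, `segment_count`, and a `histogram` of buckets with `count` and `max_data_size`). It assumes that every shard shares the same bucket boundaries, so buckets merge by index, and it models shards as a plain `std::vector` instead of Seastar's `sharded<>`. `segment_stats`, `hist_bucket`, and `merge_shards` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical histogram bucket: number of segments whose data size is <= max_data_size.
struct hist_bucket {
    uint64_t count = 0;
    uint64_t max_data_size = 0;
};

// Hypothetical per-shard statistics, modelled on the fields rest_logstor_info reads.
struct segment_stats {
    uint64_t compaction_group_count = 0;
    uint64_t segment_count = 0;
    std::vector<hist_bucket> histogram;

    // Assumption: all shards use the same bucket boundaries, so buckets merge by index.
    segment_stats& operator+=(const segment_stats& o) {
        compaction_group_count += o.compaction_group_count;
        segment_count += o.segment_count;
        if (histogram.size() < o.histogram.size()) {
            histogram.resize(o.histogram.size());
        }
        for (std::size_t i = 0; i < o.histogram.size(); ++i) {
            histogram[i].count += o.histogram[i].count;
            histogram[i].max_data_size = std::max(histogram[i].max_data_size, o.histogram[i].max_data_size);
        }
        return *this;
    }
};

// Reduce step: fold each shard's stats into one accumulator, as the map_reduce reducer does.
segment_stats merge_shards(const std::vector<segment_stats>& per_shard) {
    segment_stats merged;
    for (const auto& s : per_shard) {
        merged += s;
    }
    return merged;
}
```

Keeping the reduction in `operator+=` lets the handler's reducer stay a one-liner (`merged_stats += shard_stats`), whatever the stats type grows to contain.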

Compare commits

1185 Commits copilot/in ... ykaul/comp

18 .github/copilot-instructions.md vendored
2 .github/dependabot.yml vendored
2 .github/scripts/check-license.py vendored
3 .github/workflows/backport-pr-fixes-validation.yaml vendored
53 .github/workflows/call_backport_with_jira.yaml vendored Normal file
35 .github/workflows/call_jira_sync.yml vendored
6 .github/workflows/call_sync_milestone_to_jira.yml vendored
5 .github/workflows/call_validate_pr_author_email.yml vendored
2 .github/workflows/check-license-header.yaml vendored
6 .github/workflows/docs-pages.yaml vendored
4 .github/workflows/docs-pr.yaml vendored
42 .github/workflows/trigger-scylla-ci.yaml vendored
3 .github/workflows/trigger_jenkins.yaml vendored
86 CMakeLists.txt
197 IMPLEMENTATION_SUMMARY.md
46 LICENSE-ScyllaDB-Source-Available.md
2 README.md
2 abseil
2 absl-flat_hash_map.cc
2 absl-flat_hash_map.hh
11 alternator/auth.cc
4 alternator/auth.hh
2 alternator/conditions.cc
2 alternator/conditions.hh
2 alternator/consumed_capacity.cc
2 alternator/consumed_capacity.hh
2 alternator/controller.cc
2 alternator/controller.hh
2 alternator/error.hh
28 alternator/executor.cc
2 alternator/executor.hh
2 alternator/expressions.cc
2 alternator/expressions.g
2 alternator/expressions.hh
2 alternator/expressions_types.hh
2 alternator/extract_from_attrs.hh
2 alternator/http_compression.cc
2 alternator/http_compression.hh
2 alternator/parsed_expression_cache.cc
2 alternator/rmw_operation.hh
2 alternator/serialization.cc
2 alternator/serialization.hh
27 alternator/server.cc
2 alternator/server.hh
32 alternator/stats.cc
26 alternator/stats.hh
143 alternator/streams.cc
98 alternator/ttl.cc
2 alternator/ttl.hh
26 alternator/ttl_tag.hh Normal file
2 api/api-doc/authorization_cache.json
15 api/api-doc/error_injection.json
2 api/api-doc/messaging_service.json
276 api/api-doc/storage_service.json
15 api/api-doc/system.json
8 api/api.cc
27 api/api.hh
4 api/api_init.hh
2 api/authorization_cache.cc
2 api/authorization_cache.hh
2 api/cache_service.cc
2 api/cache_service.hh
2 api/client_routes.cc
2 api/client_routes.hh
2 api/collectd.cc
2 api/collectd.hh
59 api/column_family.cc
2 api/column_family.hh
2 api/commitlog.cc
2 api/commitlog.hh
2 api/compaction_manager.cc
2 api/compaction_manager.hh
2 api/config.cc
2 api/config.hh
2 api/cql_server_test.cc
2 api/cql_server_test.hh

2 api/endpoint_snitch.cc Unescape Escape View File

2 api/endpoint_snitch.hh Unescape Escape View File

1185 Commits

copilot/in ... ykaul/comp


87

api/error_injection.cc
