scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-31 03:56:42 +00:00

Author	SHA1	Message	Date
Botond Dénes	555cfbcd38	Merge 'treewide: replace deprecated smp::count and smp::all_cpus() with new APIs' from Avi Kivity Replace all uses of the deprecated seastar::smp::count with this_smp_shard_count() and smp::all_cpus() with this_smp_all_shards() across the ScyllaDB codebase (seastar submodule untouched). Both replacement functions require a reactor thread context. All call sites were verified to run on reactor threads. Notable cases: - dht/token-sharding.hh: this_smp_shard_count() is used as a default parameter value. This is safe since all callers are on reactor threads, but the expression is now evaluated at each call site rather than being a reference to a global variable. - service/storage_service.hh, locator/abstract_replication_strategy.hh, ent/encryption/encryption.cc: used in default member initializers and constructor member-init-lists. Objects are always constructed on reactor threads. - schema_builder: sometimes called from BOOST_AUTO_TEST_CASE without a reactor. Added a pre-patch that makes the implicit shard count parameter explicit and passes 1 in those cases. Not changed: - scylla-gdb.py: reads smp::count as a GDB symbol (no reactor context). - Python test files: only reference smp::count in comments/strings. No backport: the Seastar commit that deprecated these functions hasn't made (and won't make) its way into any release branches (and the warnings are cosmetic anyway) Closes scylladb/scylladb#29990 * github.com:scylladb/scylladb: treewide: replace deprecated smp::count and smp::all_cpus() with new APIs scylla-gdb: read shard count from smp::_this_smp instead of smp::count schema_builder: make shard_count an explicit constructor parameter	2026-05-27 09:42:06 +03:00
Avi Kivity	8010e408a2	treewide: replace deprecated smp::count and smp::all_cpus() with new APIs Replace all uses of the deprecated seastar::smp::count with this_smp_shard_count() and smp::all_cpus() with this_smp_all_shards() across the ScyllaDB codebase (seastar submodule untouched). Both replacement functions require a reactor thread context. All call sites were verified to run on reactor threads. Notable cases: - dht/token-sharding.hh: this_smp_shard_count() is used as a default parameter value. This is safe since all callers are on reactor threads, but the expression is now evaluated at each call site rather than being a reference to a global variable. - service/storage_service.hh, locator/abstract_replication_strategy.hh, ent/encryption/encryption.cc: used in default member initializers and constructor member-init-lists. Objects are always constructed on reactor threads. Not changed: - scylla-gdb.py: reads smp::count as a GDB symbol (no reactor context). - Python test files: only reference smp::count in comments/strings.	2026-05-26 17:35:20 +03:00
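The note about dht/token-sharding.hh rests on a general C++ rule: a default argument is an expression that is evaluated at every call that omits the argument, not a value captured once. A minimal sketch of that behaviour follows; `current_shard_count()`, `shard_of()` and the two globals are hypothetical stand-ins invented for illustration and are not Seastar or ScyllaDB APIs.

```cpp
#include <cassert>

// Hypothetical stand-in for a function such as this_smp_shard_count().
// It is NOT the Seastar API; it only counts how often it is evaluated.
static unsigned g_shards = 4;
static unsigned g_calls = 0;

unsigned current_shard_count() {
    ++g_calls;  // one increment per evaluation of the default argument
    return g_shards;
}

// The default argument is re-evaluated at each call site that omits it.
unsigned shard_of(unsigned long token, unsigned shard_count = current_shard_count()) {
    return static_cast<unsigned>(token % shard_count);
}
```

Calling `shard_of(10)` twice evaluates `current_shard_count()` twice, while `shard_of(10, 3)` does not evaluate it at all. This is why the replacement is only safe when every caller that relies on the default runs where the function may legally be called.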
Avi Kivity	c59985c38b	Merge 'cql3: limit large allocations when parsing queries' from Botond Dénes Queries are stored and passed around as sstring/std::string_view. While normally they are small enough to not cause problems, as `test_cdc_large_values.TestLargeColumnsWithCDC.test_single_column_blob_max_size_with_cdc_preimage_full_postimage[unprepared_statements]` demonstrates, queries can be arbitrarily large, putting heavy strain on Scylla internals via large allocations, in the extreme case causing denial of service. This PR attempts to alleviate this by using fragmented storage for queries: read the query as a fragmented string from the input stream in `transport/server.cc`, propagate it as such to `query_processor::prepare()` and also store it as such in `cql3::cql_statement::raw_cql_statement`. Also avoid linearizing raw values in the CQL expression tree: switch `cql3::expr::untyped_constant::raw_text` to fragmented storage. For this to be possible, some infrastructure code had to be made fragmented-storage friendly: ascii/utf8 validation, hashers, from_hex and importantly: `abstract_type::from_string()`. Unfortunately, the query still has to be linearized for parsing itself, as ANTLR, although it allows custom InputStream implementations, plays pointer-arithmetic games with the pointers obtained from them, so fragmented input cannot be used. Still, this PR limits the places where the query is linearized to the following: * Parsing * Audit * Logs and error messages So the normal query paths for queries that actually can get arbitrarily large (UPDATE and INSERT) should only linearize the query temporarily for parsing. 
Fixes #10779 Improvement, no backport Closes scylladb/scylladb#28619 * github.com:scylladb/scylladb: tracing: add_query(): change query param to utils::chunked_string cql3: store raw query string in utils::chunked_string serializer: add serializer<utils::chunked_string> utils/reusable_buffer: add get_linearized_view(managed_bytes_view) cql3/expr: use utils::chunked_string for untyped_constant::raw_text types: abstract_type::from_string() switch to fragmented buffers (implementation) types: abstract_type::from_string() switch to fragmented buffers (interface) types: use write_fragmented from utils/fragment_range.hh types: timestamp_from_string(): don't assume std::string_view is null-terminated types/duration: don't assume std::string_view is null-terminated utils/hashers: add calculate(managed_bytes_view) overload utils/ascii: add validate(managed_bytes_view) overload utils: add managed_bytes_fwd.hh utils: add chunked_string utils: add managed_bytes_basic_view::byte_iterator	2026-05-26 15:00:53 +03:00
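Two commits in the list above stop assuming that a std::string_view is null-terminated. The underlying issue is that C-style parsers such as strtol() read until a terminating '\0', which a view into a larger buffer does not guarantee. A minimal sketch of a bounds-respecting alternative, using std::from_chars; `parse_long()` is a hypothetical helper written for this illustration and is not the code from those commits.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Parse a decimal integer from a view without relying on a trailing '\0'.
// Returns std::nullopt if the view is empty, malformed, or has trailing junk.
std::optional<long> parse_long(std::string_view s) {
    long value = 0;
    const char* first = s.data();
    const char* last = s.data() + s.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc() || ptr != last) {
        return std::nullopt;
    }
    return value;
}
```

With this shape, `parse_long(std::string_view("12345").substr(0, 3))` only ever looks at the three characters inside the view, whereas a strtol()-based version would keep reading into the rest of the buffer.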
Avi Kivity	f165b396fd	schema_builder: make shard_count an explicit constructor parameter A recent Seastar update deprecated smp::count and introduced this_smp_shard_count() as a replacement. One difference is that this_smp_shard_count() wants to run on a reactor thread. This poses a problem for non-reactor tests (BOOST_AUTO_TEST_CASE) that nevertheless use a schema, as the schema_builder constructor references smp::count. If we replace it with this_smp_shard_count() then it will crash when running without a reactor. To fix, remove the implicit this_smp_shard_count() call from raw_schema's constructor and require callers to pass shard_count explicitly to schema_builder. This allows tests that don't run on a reactor thread to construct schemas without crashing. Production code and reactor-based tests pass this_smp_shard_count(). Non-reactor test files (expr_test, keys_test, nonwrapping_interval_test, wrapping_interval_test, bti_key_translation_test, range_tombstone_list_test) pass a fixed shard count of 1. Note: sstable_test.cc is a Seastar test file (SEASTAR_THREAD_TEST_CASE) but also contains one plain BOOST_AUTO_TEST_CASE (test_empty_key_view_comparison) that constructs a schema_builder without a reactor context. This test also receives a fixed shard count of 1.	2026-05-26 11:55:56 +03:00
Botond Dénes	0fd25dc47c	Merge 'Replace get_injection_parameters() with inject_parameter() where appropriate' from Pavel Emelyanov Several error injection sites use the low-level get_injection_parameters() API to fetch the entire parameters map and then manually look up a single key. The inject_parameter() API is better suited for these cases — it combines the enabled check and typed single-parameter extraction in one call, returning std::optional. Cleaning error injection usage, not backporting Closes scylladb/scylladb#29970 * github.com:scylladb/scylladb: test: Use inject_parameter() in row_cache_test sstables: Use inject_parameter() for mx reader fill buffer timeout streaming: Use inject_parameter() for order_sstables_for_streaming	2026-05-26 10:32:44 +03:00
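The difference between the two access styles described above can be sketched with a toy model. Everything here is an assumption made for illustration: the class `toy_injections`, its storage, and the exact signatures are hypothetical, and only the method names and the "enabled check plus typed extraction, returning std::optional" behaviour come from the commit message.

```cpp
#include <cassert>
#include <charconv>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>
#include <utility>

// Toy model only: names mirror the commit message, signatures are assumed.
using parameters = std::map<std::string, std::string, std::less<>>;

class toy_injections {
    std::map<std::string, parameters, std::less<>> _enabled;
public:
    void enable(std::string name, parameters p) {
        _enabled[std::move(name)] = std::move(p);
    }

    // Low-level accessor: the whole parameter map (empty if not enabled).
    parameters get_injection_parameters(std::string_view name) const {
        auto it = _enabled.find(name);
        return it == _enabled.end() ? parameters{} : it->second;
    }

    // Single call: enabled check, key lookup and typed conversion.
    std::optional<int> inject_parameter(std::string_view name, std::string_view key) const {
        auto it = _enabled.find(name);
        if (it == _enabled.end()) {
            return std::nullopt;  // injection not enabled
        }
        auto p = it->second.find(key);
        if (p == it->second.end()) {
            return std::nullopt;  // parameter not supplied
        }
        const std::string& s = p->second;
        int value = 0;
        auto [ptr, ec] = std::from_chars(s.data(), s.data() + s.size(), value);
        if (ec != std::errc() || ptr != s.data() + s.size()) {
            return std::nullopt;  // not a valid integer
        }
        return value;
    }
};
```

In this model a caller that wants one value writes a single `if (auto v = inj.inject_parameter(...))` instead of fetching the map, checking whether it is empty, looking up the key and converting the string by hand.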
Botond Dénes	2c9a5f9634	types: abstract_type::from_string() switch to fragmented buffers (implementation) The previous patch changed the interface and callers, this one updates the implementation to actually work with fragmented buffers. Most types just use with_linearized() to linearize the fragmented input buffer for parsing. This is fine, as most types have a fixed or bounded-size string representation that is small. Importantly, the input is not linearized for the 3 types which have unbounded values: ascii, bytes and text. The tuple type can contain any of these types itself, so it is also converted to avoid linearization.	2026-05-26 09:08:06 +03:00
Botond Dénes	597d4252dc	types: abstract_type::from_string() switch to fragmented buffers (interface) Change input: std::string_view -> utils::chunked_string_view. Change return value: bytes -> managed_bytes. This patch only changes the interface, with some to_bytes() sprinkled in the internals to deal with recursive calls. Internals will be updated in the next patch, to keep the churn of updating callers separate from the actually important changes.	2026-05-26 09:08:06 +03:00
Botond Dénes	a9028d88b2	utils/hashers: add calculate(managed_bytes_view) overload Uses update() for each fragment, then finalize. Yields identical hash to calling calculate(std::string_view) with linearized buffer. This is checked by new tests.	2026-05-26 09:08:05 +03:00
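The equivalence claimed above (feeding fragments one by one gives the same hash as hashing the linearized buffer) holds for any streaming hash whose state depends only on the byte sequence. A minimal sketch using 64-bit FNV-1a as a stand-in; `fnv1a_hasher`, `calculate_linear()` and `calculate_fragmented()` are hypothetical names for this illustration, and a vector of std::string_view stands in for managed_bytes_view.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>
#include <vector>

// Streaming 64-bit FNV-1a: the state depends only on the bytes seen so far.
class fnv1a_hasher {
    std::uint64_t _state = 14695981039346656037ull;  // FNV offset basis
public:
    void update(std::string_view chunk) {
        for (unsigned char c : chunk) {
            _state ^= c;
            _state *= 1099511628211ull;  // FNV prime
        }
    }
    std::uint64_t finalize() const { return _state; }
};

// Hash a contiguous buffer in one go.
std::uint64_t calculate_linear(std::string_view linear) {
    fnv1a_hasher h;
    h.update(linear);
    return h.finalize();
}

// Hash a fragmented buffer: update() per fragment, then finalize.
std::uint64_t calculate_fragmented(const std::vector<std::string_view>& fragments) {
    fnv1a_hasher h;
    for (std::string_view frag : fragments) {
        h.update(frag);
    }
    return h.finalize();
}
```

Because fragment boundaries never enter the state, any split of the same bytes (including empty fragments) produces the same result, which is the property the new tests in the commit are said to check.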
Botond Dénes	a2fff12bcd	utils: add chunked_string A thin facade over managed_bytes[_view], offering some extra convenience for working with strings, as well as a strong type communicating the purpose (storing text instead of a blob). Also introduces utils::from_hex(chunked_string_view), a fragmented hex-decode that operates directly on a chunked_string_view without requiring linearization. Hex pairs straddling fragment boundaries are handled via a carry-over nibble.	2026-05-26 09:08:05 +03:00
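The carry-over nibble technique mentioned above can be sketched as follows. This is an illustration under assumptions, not the utils::from_hex implementation: a vector of std::string_view stands in for chunked_string_view, the output is a plain std::string, and `hex_value()` and `from_hex_fragments()` are hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

// Value of one hex digit, or an exception for anything else.
int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    throw std::invalid_argument("non-hex character");
}

// Decode hex text spread over fragments without concatenating them first.
std::string from_hex_fragments(const std::vector<std::string_view>& fragments) {
    std::string out;
    std::optional<int> carry;  // high nibble still waiting for its low nibble
    for (std::string_view frag : fragments) {
        for (char c : frag) {
            const int v = hex_value(c);
            if (carry) {
                out.push_back(static_cast<char>((*carry << 4) | v));
                carry.reset();
            } else {
                carry = v;  // may survive across a fragment boundary
            }
        }
    }
    if (carry) {
        throw std::invalid_argument("odd number of hex digits");
    }
    return out;
}
```

The `carry` variable lives outside the per-fragment loop, so a pair such as `4` at the end of one fragment and `8` at the start of the next still decodes to the single byte 0x48.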
Botond Dénes	09743aed36	utils: add managed_bytes_basic_view::byte_iterator Byte-wise iterator which works both as a bidirectional iterator and as an output iterator (for mutable views). Allows using managed_bytes_view in iterator-based algorithms. Added unit tests covering the iterator functionality.	2026-05-26 09:08:05 +03:00
Nadav Har'El	b026aea6f7	cql3/expr: add NEG unary operator for numeric negation This patch adds a new expression type, unary_operator, analogous to the existing binary_operator but taking just one operand instead of two. This patch also implements the first and only unary operator type, unary_oper_t::NEG, implementing negation (unary minus) for all numeric types. For fixed-width integer types, overflow or underflow results in an error. If the operand is NULL, the result is a NULL as well. The new operator is not yet used by the CQL syntax - our parser doesn't parse arithmetic expressions yet. We also do not plan to use it in the following patch which uses the separate SUB (subtraction) operation, not the new NEG. But since I already implemented a unary minus operator, and we'll surely need it in the future for general arithmetic operations, I thought I might as well include this patch as well. Refs #22918 ("Support arithmetic operators")	2026-05-25 10:08:11 +03:00
Nadav Har'El	f27d1f08fc	cql3/expr: add SUB binary operator for numeric subtraction In this patch we add to our expressions oper_t::SUB, for subtraction, analogous to the ADD from the previous patch. The only reason why we need a separate SUB operation and can't just combine ADD with a unary minus (NEG) operator is the minimum integer in fixed-sized integer. For example, 8-bit integers have the range -128...127. A subtraction like -1 - (-128) is valid (its value is 127) but the negation of (-128) would be invalid (128). One of the tests we add in this patch validates this fact. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-05-25 10:06:28 +03:00
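The 8-bit argument above can be checked directly. The sketch below is not the cql3/expr implementation: it models overflow as std::nullopt rather than an error, and `narrow()`, `checked_add()`, `checked_sub()`, `checked_neg()` and `sub_via_neg()` are hypothetical helpers that compute in a wider type and then range-check.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

using i8 = std::int8_t;

// Narrow a wide intermediate result back to int8, or report overflow.
std::optional<i8> narrow(int wide) {
    if (wide < std::numeric_limits<i8>::min() || wide > std::numeric_limits<i8>::max()) {
        return std::nullopt;
    }
    return static_cast<i8>(wide);
}

std::optional<i8> checked_add(i8 a, i8 b) { return narrow(int(a) + int(b)); }
std::optional<i8> checked_sub(i8 a, i8 b) { return narrow(int(a) - int(b)); }
std::optional<i8> checked_neg(i8 a)       { return narrow(-int(a)); }

// Subtraction emulated as a + (-b): fails whenever -b alone overflows.
std::optional<i8> sub_via_neg(i8 a, i8 b) {
    auto negated = checked_neg(b);
    if (!negated) {
        return std::nullopt;
    }
    return checked_add(a, *negated);
}
```

Here `checked_sub(-1, -128)` yields 127, while `sub_via_neg(-1, -128)` fails because negating -128 has no int8 representation, even though the final result would have fit.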
Nadav Har'El	083adf84ab	cql3/expr: add ADD binary operator for numeric addition Extend oper_t with a new ADD operator, to represent addition between two numeric expressions. Supports all numeric types - tinyint, smallint, int, bigint, float, double, varint, and decimal. For fixed-width integer types, overflow or underflow results in an error. If one of the operands is NULL, the result is also a NULL. The new operator is not yet used by the CQL syntax - our parser doesn't parse arithmetic expressions yet. We plan to start using this new operator in a following patch which implements counter syntax ("SET r = r + 1") for LWT, but in the future we can use it for more general cases. At the moment, ADD requires that both operands have the same type. This is all we need for the first use case, and this limitation can be relaxed later. Interestingly, ADD is our first binary operator implementation that does not return a boolean. Until now all our binary operators have been comparison operators, and all returned boolean. In contrast, ADD's return type is the type of its operands. This implementation is susceptible to the pre-existing bug SCYLLADB-1576, where adding 1e1000000 and 1 in "decimal" or "varint" types will happily allocate a million-digit number and run out of memory. A reproducing test is included, and this issue will be solved in one place for all operations that have additions (including aggregations and arithmetic expressions) in a followup pull-request. Refs #22918 ("Support arithmetic operators")	2026-05-25 10:05:09 +03:00
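The two rules stated above for fixed-width integers (a NULL operand gives NULL, and overflow is an error rather than a wrapped value) can be sketched for the CQL "int" case. This is an illustration under assumptions, not the cql3/expr code: std::nullopt models NULL, the error is modelled as std::overflow_error, and `cql_int` and `add_int()` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <stdexcept>

// std::nullopt models a CQL NULL in this sketch.
using cql_int = std::optional<std::int32_t>;

cql_int add_int(cql_int a, cql_int b) {
    if (!a || !b) {
        return std::nullopt;  // NULL in, NULL out
    }
    // Compute in 64 bits so the 32-bit range check itself cannot overflow.
    const std::int64_t wide = std::int64_t(*a) + std::int64_t(*b);
    if (wide < std::numeric_limits<std::int32_t>::min() ||
        wide > std::numeric_limits<std::int32_t>::max()) {
        throw std::overflow_error("int addition overflow");
    }
    return static_cast<std::int32_t>(wide);
}
```

Note that the result type is the operand type, which matches the remark that ADD is the first binary operator not returning a boolean. The widening trick does not extend to varint or decimal, which are unbounded and are the subject of the SCYLLADB-1576 caveat.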
Gleb Natapov	0bf050d175	storage_proxy: hold shared pointer to a table object during entire query_partition_key_range_concurrent execution Otherwise if a table is dropped in the middle of a scan the object may disappear. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-2137 Closes scylladb/scylladb#29988	2026-05-24 21:54:08 +03:00
Yaniv Michael Kaul	acd3115645	sstables: include SSTable filename in Stats metadata error messages When Stats metadata is not available or malformed, include the SSTable filename in the error message to help operators identify which SSTable files need attention during startup failures. Fixes: https://github.com/scylladb/scylla-enterprise/issues/5439 Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> AI-assisted: yes Backport: no, benign improvement Closes scylladb/scylladb#29950	2026-05-22 16:49:37 +03:00
Avi Kivity	305346a3ec	Merge 'Don't materialize collections into intermediate representations' from Botond Dénes

Collections have an age-old problem in ScyllaDB: they had to be unserialized into an intermediate representation for any access or manipulation. The intermediate representation needs effort to produce and also requires additional memory to store. Both can be significant for large collections. This intermediate representation is then either discarded immediately after use, or re-serialized again. This problem was significant enough for us to consider the use of collections as somewhat of an anti-pattern. But our customers keep using it. Alternator is also a heavy user of collections.

This PR aims to solve this problem once and for all. The plan is as follows:
* Promote direct use of the serialized collection format:
  - Add accessor methods to `collection_mutation_view` which read from the serialized format directly: `tomb()`, `size()` and `begin()`/`end()`.
  - Add a `collection_mutation_writer` which provides container semantics for generating a serialized `collection_mutation` directly on the go (`push_back()`).
* Replace all usage of `collection_mutation_description`, `collection_mutation_view_description` and friends with use of the new infrastructure.
* Drop the old infrastructure, to avoid accidental regressions.

Continues the work started by https://github.com/scylladb/scylladb/pull/29033 and takes it to its conclusion.

To help focus review, here is a summary of the patches:
* [1, 2] preparatory refactoring: drop some unused abstract_type params
* [3, 6] introduce new infrastructure to write and read serialized collections directly; this is the meat of the PR
* [6, -1) replace all usage of old materializing infrastructure with usage of the new one
* [-1] drop old infrastructure

Command:
```
dbuild -it -- build/release/scylla perf-simple-query --collection=16 -c1 -m2G --default-log-level=error
```

| Metric                   | Before  | After   | Change |
|--------------------------|--------:|--------:|--------|
| Throughput (median tps)  | 315,760 | 332,021 | +5.1%  |
| Instructions/op (median) | 53,776  | 48,681  | -9.5%  |
| CPU cycles/op (median)   | 17,365  | 16,471  | -5.1%  |
| Allocations/op           | 85.1    | 82.1    | -3.5%  |

Significant improvement. Throughput is up ~5%, and both instruction count and cycle count are meaningfully reduced.

---

Command:
```
dbuild -it -- build/release/scylla perf-simple-query --collection=16 -c1 -m2G --default-log-level=error --write
```

| Metric                   | Before   | After    | Change |
|--------------------------|---------:|---------:|--------|
| Throughput (median tps)  | 150,823  | 149,678  | -0.8%  |
| Instructions/op (median) | 108,388  | 103,858  | -4.2%  |
| CPU cycles/op (median)   | 34,860   | 35,371   | +1.5%  |
| Allocations/op           | ~105–108 | ~102–103 | -3.0%  |

Mixed, mostly neutral. Throughput is essentially flat (within noise). Instructions/op improved by ~4%, allocations dropped slightly, but cycles/op edged up marginally.

---

Command:
```
dbuild -it -- build/release/scylla perf-alternator --workload write --developer-mode=1 --alternator-port=8000 --alternator-write-isolation=unsafe -c1 -m2G --default-log-level=error
```

| Metric                   | Before  | After   | Change |
|--------------------------|--------:|--------:|--------|
| Throughput (median tps)  | 55,777  | 56,051  | +0.5%  |
| Instructions/op (median) | 246,215 | 246,610 | +0.2%  |
| CPU cycles/op (median)   | 77,641  | 77,020  | -0.8%  |
| Allocations/op           | 340.4   | 335.4   | -1.5%  |

Essentially neutral. All metrics are within noise margins. Slight reduction in allocations and cycles, negligible otherwise.

---

The change has a clear, substantial positive effect on reads (~5% throughput gain, ~9.5% fewer instructions per op). The write and alternator paths are unaffected in practice: changes there are within measurement noise. No regressions are apparent. This is expected: https://github.com/scylladb/scylladb/pull/29033 did the heavy lifting when it comes to the write path, this PR finishes the job, mostly improving reads.

Fixes: #3602

Improvement, no backport.
Closes scylladb/scylladb#29127 * github.com:scylladb/scylladb: mutation/collection_mutation: make collection_mutation::_data private mutation_collection: drop collection_mutation_description and friends test: move away from collection_mutation_description tree: move away from collection_mutation_description test: move away from collection_mutation_view::with_deserialized() tree: move away from collection_mutation_view::with_deserialized() types: fix indendation, left broken by previous commit types: move away from collection_mutation_view::with_deserialized() types: serialize_for_cql(): use throwing_assert() instead of SCYLLA_ASSERT() schema: column_computation: move away from collection_mutation_view::with_deserialized() mutation: move away from collection_mutation_view::with_deserialized() alternator: move away from collection_mutation_view::with_deserialized() cdc: move away from collection_mutation_view::with_deserialized() mutation/collection_mutation: printer: don't deserialize collections mutation/collection_mutation: difference(): don't deserialize collections mutation/collection_mutation: merge(): don't deserialize collections mutation/collection_mutation: extract compact_and_expire() to free function mutation/collection_mutation: refactor empty(), is_any_live() and last_update() compaction_garbage_collector: pass collection_mutation to collect() test/boost/mutation_test: add tests for collection_mutation_{view,writer} mutation/collaction_mutation: collection_mutation_view: add methods to inspect content mutation/collection_mutation: add collection_mutation_writer mutation/collection_mutation: collection_mutation(): generate valid collection mutation/collection_mutation: collection_mutation(): remove unused abstract_type param mutation/atomic_cell: drop unused type param from from_bytes()	2026-05-21 17:10:40 +03:00
Andrzej Jackowski	f8156702de	tree: add missing -present to copyright headers ~2076 files used "Copyright (C) YYYY-present ScyllaDB" while ~88 files used "Copyright (C) YYYY ScyllaDB". This inconsistency leads to unnecessary code review discussions and gradual spread of the less common format. Standardize all ScyllaDB copyright headers to use -present. Fixes SCYLLADB-1984 Closes scylladb/scylladb#29876	2026-05-21 10:57:42 +02:00
Botond Dénes	da7903de79	test: move away from collection_mutation_description Use collection_mutation_writer instead.	2026-05-21 10:23:29 +03:00
Botond Dénes	c76ab90fb2	test: move away from collection_mutation_view::with_deserialized() Use the collection_mutation_view directly.	2026-05-21 10:23:29 +03:00
Botond Dénes	7c8b5681f4	mutation/collection_mutation: extract compact_and_expire() to free function The new free-function variant operates on a collection_mutation_view directly, instead of on collection_mutation_description.	2026-05-21 10:23:15 +03:00
Botond Dénes	c5d12d44c6	test/boost/mutation_test: add tests for collection_mutation_{view,writer} Test the new facilities for producing and inspecting serialized collection mutations directly, without intermediate formats.	2026-05-21 08:34:21 +03:00
Botond Dénes	24fdfa34dd	mutation/collection_mutation: collection_mutation(): remove unused abstract_type param	2026-05-21 08:34:21 +03:00
Marcin Maliszkiewicz	83823149e9	Merge 'audit: implement audit_rules config' from Andrzej Jackowski This patch series adds `audit_rules`, a new audit configuration option for fine-grained, role-aware audit filtering with per-rule sink routing. Rules can be configured in `scylla.yaml` or updated live through `system.config` without restarting the node. Each rule specifies target sinks (`table`, `syslog`), statement categories, qualified table name patterns, and role patterns. Table and role patterns use POSIX `fnmatch` with extended glob syntax. For table-scoped categories (`DML`, `DDL`, `QUERY`), a rule matches only when the category, role, and qualified table name all match. For table-independent categories (`AUTH`, `ADMIN`, `DCL`), the table filter is ignored. Empty category or role lists match nothing; an empty table list matches nothing only for table-scoped categories. The new rules are additive with the existing `audit_categories`, `audit_keyspaces`, and `audit_tables` settings: both mechanisms are evaluated for each audit event, and the final sink set is the union of all matches. To avoid evaluating glob patterns on every audit event, audit rules use a preprocessed cache of known roles and tables. The cache is kept in sync through group0 role/table snapshots, role-change notifications, and schema migration notifications. For known entities, rule matching uses precomputed role/table rule sets; unknown entities fall back to direct rule evaluation. When `audit_rules` is empty, per-event rule matching returns immediately and does not evaluate glob patterns. Audit still keeps known role/table metadata in sync while audit is enabled, so rules can be enabled later through live configuration updates without restarting the node. Performance Measured with `perf-simple-query --smp 1 --duration 100` against a null syslog socket. 
Results show no regression when audit is disabled, and audit-rules performance has at most 1% more instructions than legacy config for equivalent workloads:
```
=======================================================================================================================================
Configuration                                | Binary   | throughput (tps) | insns/op        | cpu_cycles/op   | alloc/op | logal/op | task/op
=======================================================================================================================================
audit=none [1]                               | baseline | 206922.4         | 36591.6         | 15348.3         | 58.1     | 0.0      | 14.1
audit=none [1]                               | this PR  | 207856.4 (+0.5%) | 36544.9 (-0.1%) | 15274.0 (-0.5%) | 58.1     | 0.0      | 14.1
---------------------------------------------------------------------------------------------------------------------------------------
audit=syslog keyspaces=ks [2]                | baseline | 94871.8          | 54163.0         | 27172.4         | 72.0     | 0.0      | 24.0
audit=syslog keyspaces=ks [2]                | this PR  | 96138.4 (+1.3%)  | 54072.3 (-0.2%) | 26699.3 (-1.7%) | 72.0     | 0.0      | 24.0
audit=syslog audit-rules=ks [3]              | this PR  | 95142.1 (+0.3%)  | 54457.8 (+0.5%) | 26953.8 (-0.8%) | 72.0     | 0.0      | 24.0
---------------------------------------------------------------------------------------------------------------------------------------
audit=syslog keyspaces=ks-non-existent [4]   | baseline | 213997.8         | 36735.6         | 14848.1         | 58.1     | 0.0      | 14.1
audit=syslog keyspaces=ks-non-existent [4]   | this PR  | 219297.2 (+2.5%) | 36667.3 (-0.2%) | 14500.1 (-2.3%) | 58.1     | 0.0      | 14.1
audit=syslog audit-rules=ks-non-existent [5] | this PR  | 211038.7 (-1.4%) | 36999.7 (+0.7%) | 15048.6 (+1.4%) | 58.1     | 0.0      | 14.1
=======================================================================================================================================
[1] ./scylla perf-simple-query --smp 1 --duration 100 --audit "none"
[2] ./scylla perf-simple-query --smp 1 --duration 100 --audit "syslog" --audit-keyspaces "ks" --audit-categories "DCL,DDL,AUTH,DML,QUERY" --audit-unix-socket-path "/tmp/audit-null.sock"
[3] ./scylla perf-simple-query --smp 1 --duration 100 --audit "syslog" --audit-rules '[{"sinks":["syslog"],"categories":["DCL","DDL","AUTH","DML","QUERY"],"qualified_table_names":["ks."],"roles":[""]}]' --audit-unix-socket-path "/tmp/audit-null.sock"
[4] ./scylla perf-simple-query --smp 1 --duration 100 --audit "syslog" --audit-keyspaces "ks-non-existent" --audit-categories "DCL,DDL,AUTH,DML,QUERY" --audit-unix-socket-path "/tmp/audit-null.sock"
[5] ./scylla perf-simple-query --smp 1 --duration 100 --audit "syslog" --audit-rules '[{"sinks":["syslog"],"categories":["DCL","DDL","AUTH","DML","QUERY"],"qualified_table_names":["ks-non-existent."],"roles":[""]}]' --audit-unix-socket-path "/tmp/audit-null.sock"

audit-null.sock was created with `socat -u UNIX-RECV:/tmp/audit-null.sock,type=2 OPEN:/dev/null`
```
Fixes: SCYLLADB-1430 No backport: new feature Closes scylladb/scylladb#29267 * github.com:scylladb/scylladb: test: alternator: audit: rules filtering and batch bypass test: perf: add --audit-rules option to perf-simple-query docs: add audit rules section to the auditing guide test: audit: cover role and schema cache notifications test: audit: cover audit rules cluster behavior audit: rebuild rule caches on group0 snapshot and role changes audit: refresh rule caches on schema, role, and config changes audit: route matching rules to configured sinks test: cover preprocessed audit rule cache audit: add preprocessed rule matching cache audit: pass sink targets to storage helpers test: audit: cover rule matching semantics audit: add rule matching 
and sink helpers test: audit: cover audit_rules configuration config: add live audit_rules option test: cover audit rule parsing and validation audit: define audit_rule type with parsing and validation	2026-05-20 14:10:45 +02:00
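The matching semantics described in the merge message above (table-scoped versus table-independent categories, and empty lists matching nothing) can be sketched as a direct, uncached evaluation. This is a simplified illustration, not the audit module's code: the `rule` struct, `category` enum and function names are hypothetical, sink routing and the preprocessed cache are left out, and POSIX `fnmatch()` is called with flags set to 0 rather than with any extended-glob option.

```cpp
#include <fnmatch.h>  // POSIX glob matching

#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

enum class category { DML, DDL, QUERY, AUTH, ADMIN, DCL };

// Hypothetical, simplified rule: sinks are omitted in this sketch.
struct rule {
    std::vector<category> categories;
    std::vector<std::string> qualified_table_names;  // glob patterns
    std::vector<std::string> roles;                  // glob patterns
};

bool is_table_scoped(category c) {
    return c == category::DML || c == category::DDL || c == category::QUERY;
}

// An empty pattern list matches nothing.
bool any_glob_matches(const std::vector<std::string>& patterns, const std::string& value) {
    return std::any_of(patterns.begin(), patterns.end(), [&](const std::string& p) {
        return fnmatch(p.c_str(), value.c_str(), 0) == 0;
    });
}

bool rule_matches(const rule& r, category c, const std::string& role, const std::string& table) {
    if (std::find(r.categories.begin(), r.categories.end(), c) == r.categories.end()) {
        return false;  // category not listed (or list empty)
    }
    if (!any_glob_matches(r.roles, role)) {
        return false;  // role not matched (or list empty)
    }
    if (!is_table_scoped(c)) {
        return true;   // AUTH, ADMIN, DCL: the table filter is ignored
    }
    return any_glob_matches(r.qualified_table_names, table);
}
```

Evaluating globs like this on every event is exactly the cost the merge message says the preprocessed role/table cache avoids; in that design a function of this shape would only serve as the fallback for unknown entities.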
Avi Kivity	6df04c9e5b	Update seastar submodule Changed seastar::http::experimental to seastar::http to reflect graduation of the seastar http API. Changed call to seastar::rename_file() (in sstables/storage.cc, sstables/sstable_directory.cc, sstable/sstables.cc and db/hints/internal/hint_storage.cc) to reflect new default parameter. Updated scylla_gdb test helper get_task() to work with updated accept loop in Seastar. This is just test code (attempts to find a task to operate on), not used in real scylla-gdb.py work, but nevertheless the adjustment keeps backward compatibility. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1798 Fixes https://scylladb.atlassian.net/browse/SCYLLADB-2043 * seastar 485a62b2...510f3148 (43): > reactor_backend: fix iocb double-free and shutdown hang during AIO teardown > file: fix default DMA alignment > http: add to_reply() to redirect_exception with extra-header support > core: propagate syscall errors via `coroutine::exception` > file: assert dma alignments are powers of two > doc: Document undocumented io_tester features and fix output example > backtrace: print the build_id along with the backtrace > reactor: default to oneline backtraces > Merge 'json: formatter: support types with user-defined conversion to sstring' from Benny Halevy tests: json_formatter: test formatter::write with string types json: formatter: support types with user-defined conversion to sstring > httpd_test: fix build failure with Seastar_SSTRING=OFF > net/tls: introduce ssl_call wrapper for SSL I/O > build: disable unused command line argument error for C++ module > coroutine/generator: fix setup of generator's waiting task > tests/tls: set 1000-day validity for self-signed CA cert > net: tls: openssl: disable certificate compression > reactor: reduce steady_clock::now() calls per scheduling quantum > fair_queue: remove notify_request_finished() > loop: use small_vector for parallel_for_each_state incomplete futures > dodge false sharing in spinlock > 
Merge 'Handle nowait support for reads and writes independently' from Pavel Emelyanov file: Change nowait_works mode detection file: Introduce read-only nowait_mode filesystem: Make nowait_works bit a enum class too file: Make nowait_works bit a enum class > Merge 'net/tls: improve OpenSSL error queue hygiene' from Gellért Peresztegi-Nagy net/tls: assert clean error queue before SSL operations net/tls: clear error queue after successful SSL operations net/tls: clear error queue after successful SSL_CTX_new net/tls: drain error queue on unexpected error codes net/tls: use make_openssl_error for BIO creation failure > vla.hh: add missing includes > Merge 'smp: make smp::count non-static' from Avi Kivity smp: convert all smp::count usages to instance-aware alternatives smp: add per-instance shard_count and this_smp() infrastructure disk_params: document pre-init smp::count access with explicit 0 reactor_backend: document pre-init smp::count access with explicit 0 tests: alien_test: pass shard count to alien thread explicitly > build: fix cmake missing ninja on Ubuntu 26.04 > rpc: Fix uint64 wraparound of expired timeout in send_entry() > Merge 'Generalize some RPC tests' from Pavel Emelyanov tests: Generalize async connection-based scheduling RPC tests tests: Generalize sync connection-based scheduling RPC tests tests: Remove redundant variadic/nonvariadic RPC tuple tests tests: Generalize max timeout RPC tests > net: tls: openssl: Share BIO ptrs across shards > http: fix compilation on clang 22 with c++26 > build: openssl tools needed for test cert generation > reactor: support rename2 > future: fix forwarding of reference types > Merge 'Zero-copy http chunked data sink' from Pavel Emelyanov http: Make chunked data sink zero-copy tests/prometheus_http: Rewrite on top of http::client tests/httpd: Rewrite content_length_limit on top of http::client > tests: Replace ad-hoc http_consumer with production HTTP parser > Merge 'co_return to accept same expressions and types 
as return' from Alexey Bashtanov tests/unit/{coroutines,futures}: strict types on co_return and set_value api: introduce version 10: core/{coroutine,future}: make `co_return` more strict with types core/{coroutine,future}: preparations to fix `co_return` type semantics > Merge 'Perftune.py: add special handling for mlx5 rss queues number calculation' from Vladislav Zolotarov perftune.py: NetPerfTuner: enhance RSS (a.k.a. "Rx") queues accounting for mlx5 devices perftune.py: update docstring of NetPerfTuner.__get_rps_cpus() method perftune.py: add a method that parses and models the output of the 'ethtool -l' command for a given interface > httpd: rewrite do_accepts/do_accept_one as coroutines > file: add mmap support to file > http: Move client code out of experimental namespace > file: add hugetlbfs support to file system detection > tests: Replace test_source_impl with util::as_input_stream > tests: Replace buf_source_impl with util::as_input_stream > Merge 'rpc_tester: expose throuput for rpc tester' from Marcin Szopa rpc_tester: remove unused payload size variable from job_rpc_streaming class rpc_tester: add start time tracking for throughput calculation, print throughput and msg/s for job_rpc rpc_tester: refactor result emission to use dedicated functions for messages and throughput > iostream: cast first argument of `std::min` to `size_t` Closes scylladb/scylladb#29952	2026-05-20 13:47:12 +03:00
Andrzej Jackowski	7afb90aa6f	test: cover preprocessed audit rule cache The rule cache is the fast path for matching, so its hit, fallback, refresh, and category-bypass behavior needs focused unit coverage. Test transparent hash consistency, cached and uncached lookup paths, incremental entity add/remove, rule refresh, and empty-rules short circuit. Refs SCYLLADB-1430	2026-05-20 06:55:15 +02:00
Andrzej Jackowski	67ecdba456	test: audit: cover rule matching semantics Rule matching is reused by both the preprocessed cache and the fallback path -- unit-test it separately so coupling failures do not mask matching bugs. Cover category bitmask, glob patterns for tables and roles, AUTH/ADMIN/DCL table bypass, empty-keyspace batch bypass, and sink bitmask conversion. Refs SCYLLADB-1430	2026-05-20 06:55:15 +02:00
Andrzej Jackowski	762fd5d455	test: audit: cover audit_rules configuration Audit rules enter through three paths (YAML, CQL, CLI), each with its own parsing and tracking -- cover all entry points before routing can depend on them. Test loading from YAML, live update via CQL and server API, CLI parsing, invalid value rejection at each path, and observer notification on live update. Refs SCYLLADB-1430	2026-05-20 06:55:14 +02:00
Andrzej Jackowski	3cc55dd6eb	test: cover audit rule parsing and validation Parsing and validation are the first consumer-visible surface of audit rules -- cover them before building higher layers. Test JSON parsing (valid, malformed, missing fields), rule validation (unknown sinks, invalid categories), and JSON round-trip serialization. Refs SCYLLADB-1430	2026-05-20 06:55:14 +02:00
Pavel Emelyanov	c23b086400	test: Use inject_parameter() in row_cache_test Replace get_injection_parameters().contains() with inject_parameter() for polling the "suspended" signal. The inject_parameter() API is more appropriate for checking a single parameter and reduces the usage of the lower-level get_injection_parameters() bulk accessor. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-19 18:23:11 +03:00
Szymon Malewski	15493872b2	vector_search: fix decimal/varint precision loss in filter value_to_json() value_to_json() converts CQL values to JSON for vector search filters. For decimal and varint types, it used rjson::parse() on the JSON string, which parses through a double and silently loses precision for values exceeding ~15 significant digits — producing wrong filter results. Additionally, for decimal type we need an exact string representation that preserves the original (unscaled, scale) pair, because partition keys use byte-level identity: different serialized representations of the same numeric value are distinct rows, so the filter must reproduce the exact representation stored in the key. Add big_decimal::to_string_canonical() which follows the Java BigDecimal toString() spec (JDK 8+), producing a bijective string representation that uses exponential notation for extreme scales instead of expanding trailing zeros (which could cause OOM). This could replace to_string(), but doing so has wider consequences (e.g. hash/equality contract for decimal_type) described in SCYLLADB-1574. Use it in value_to_json() for decimal_type, and use rjson::from_string() for varint_type, both bypassing the lossy double parse path. Tests cover the new to_string_canonical() and the filter fix, as well as existing decimal type behavior (key representation, clustering order, toJson) that we rely on and must not break. The CQL decimal type tests (test_type_decimal.py) also pass against Cassandra. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1583 Refs: https://scylladb.atlassian.net/browse/SCYLLADB-1574 Closes scylladb/scylladb#29505	2026-05-18 17:07:26 +03:00
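The canonical form described in the commit above can be illustrated with a minimal Python sketch of the Java `BigDecimal.toString()` rules, applied to an (unscaled, scale) pair. This is not the ScyllaDB C++ implementation; the function below only borrows the name `to_string_canonical` for readability, and its signature is an assumption made for this example.

```python
def to_string_canonical(unscaled: int, scale: int) -> str:
    """Format (unscaled, scale) following the Java BigDecimal.toString() rules.

    Illustrative sketch only; the name mirrors the commit message but the
    signature is hypothetical.
    """
    sign = "-" if unscaled < 0 else ""
    digits = str(abs(unscaled))
    n = len(digits)
    # Exponent the value would have with a single digit before the point.
    adjusted = -scale + (n - 1)
    if scale >= 0 and adjusted >= -6:
        # Plain notation, keeping every digit implied by the scale.
        if scale == 0:
            return sign + digits
        if n > scale:
            return sign + digits[:n - scale] + "." + digits[n - scale:]
        return sign + "0." + "0" * (scale - n) + digits
    # Exponential notation avoids expanding long runs of zeros.
    mantissa = digits[0] + ("." + digits[1:] if n > 1 else "")
    return f"{sign}{mantissa}E{adjusted:+d}"
```

Because the scale is encoded in the output, `(10, 1)` and `(1, 0)` yield different strings (`1.0` and `1`) even though they are numerically equal, which is the property the filter relies on for byte-level key identity.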
Marcin Maliszkiewicz	628e1ef2de	Merge 'Introduce auth::config to decouple auth modules from db::config' from Pavel Emelyanov Auth modules (authenticators, role managers, and auth::service) access their configuration options by reaching into db::config through the query processor. This abuses the database as a proxy object for obtaining configuration. This series introduces a dedicated auth::config struct that carries the configuration options used by auth modules. The config is populated in main.cc and delivered to each shard via sharded_parameter. This makes the auth service conform to the overall design, where db::config is split into smaller per-service configs on start, thus decoupling individual components/services from the global configuration. This cleans up component dependencies; not backporting. Closes scylladb/scylladb#29870 * github.com:scylladb/scylladb: auth: Remove unused default_superuser() function auth: Switch role managers to use auth::config auth: Switch authenticators to use auth::config auth: Introduce auth::config and wire it through service	2026-05-18 11:32:11 +02:00
Pavel Emelyanov	9b58d2213b	auth: Switch role managers to use auth::config Convert all role manager implementations to receive their configuration from auth::config instead of accessing db::config through the query processor: - standard_role_manager: reads superuser name from config - ldap_role_manager: reads LDAP URL template, attribute, bind credentials, and permissions update interval from config; passes config to inner standard_role_manager - maintenance_socket_role_manager: keeps a const reference to service's config and passes it directly when lazily constructing standard_role_manager Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-05-15 18:55:02 +03:00
Petr Gusev	8a76ec7e65	test/boost: add regression test for missing tablet routing after CAS bounce Add test_tablet_routing_info_after_cas_shard_bounce that verifies TABLETS_ROUTING_V1 payload is returned after an internal CAS shard bounce. The test simulates the transport-layer bounce: it creates a table whose single tablet replica lands on a shard different from the test thread, executes an LWT (which bounces), then transfers client_state via client_state_for_another_shard (preserving _original_shard) and re-executes on the tablet shard. The test asserts that check_locality() correctly detects the misrouting and returns tablet routing info. Refs SCYLLADB-2041	2026-05-15 11:56:14 +02:00
Avi Kivity	6db152afbb	Update seastar submodule Drop local formatter for seastar::http::reply, which should have been added to Seastar in the first place, and now conflicts. Also drop local formatters for types that are aliases for Seastar types which have gained formatters. Disable recently-gained TLS use of OpenSSL instead of gnutls. We don't need it, and it causes link errors with LTO. Fix incorrect skipping in encrypted_file_test, which computed the remaining stream length but did not account for already consumed size_to_compare. Change utils::gcp::storage::client::object_data_source::skip() to match new Seastar behavior (rejecting skip-past-eof with an exception). This is needed since `30f1075544` switched the test's data source to a Seastar implementation. It is also more correct - if we're asked to skip n bytes but the stream doesn't have n bytes, this is a protocol violation. Contains test fix from Pavel, exposed by [1]: test: Handle premature EOF in test_gcp_storage_skip_read The test intentionally uses file_size larger than the actual object to exercise EOF behavior. When input_stream::skip() is called after EOF, it throws std::runtime_error("premature end of stream"). Catch this specific exception from both streams, verify they agree, and exit the loop gracefully. 
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> [1] `cbd1e17d2f`, included in this Seastar submodule update * seastar 4d268e0e...485a62b2 (50): > reactor: open_directory(): honor bypass_fsync > http: Add formatters for http::request and http::reply > Merge 'Assorted set of io-tester cleanups' from Pavel Emelyanov io_tester: Remove unused and internal-only accessor io_tester: Move think-time machinery into thinker_state io_tester: Move _file to io_class_data io_tester: Replace class_data::_start member with a local variable io_tester: Move _alignment from class_data to io_class_data io_tester: Remove buffer allocation from top-level request issuing io_tester: Cleanup context::stop() invocation io_tester: Allocate write buffer once to fill a file io_tester: Declare quantiles arrays as static constexpr io_tester: Drop class_data::type_str() io_tester: Replace != "" comparisons with .empty() io_tester: Replace gen_class_data() if/else chain with a switch io_tester: Deduplicate vectorized I/O classes > io_tester: fix crash from missing metric during startup > net: tls: adjust openssl integration to new module support > http/client: Count and export integrated queue length > Merge 'Introduce pipe_data_source_impl and pipe_data_sink_impl' from Pavel Emelyanov fstream: add pipe_data_source_impl and pipe_data_sink_impl pollable_fd: add write_some/write_all backed by writev pollable_fd: rename write_some/write_all(iovec) to send_some/send_all > reactor: Make pollable_fd_state helper methods private > module: extend seastar.cppm with comprehensive public API exports > Merge 'Add exhaustive input_stream invariant test + fixes' from Pavel Emelyanov tests: add exhaustive input_stream read/skip invariant test iostream: make skip() reject premature end of stream with exception > Merge 'Allow runtime selectability of GnuTLS or OpenSSL' from Noah Watkins net/tls: avoid potential read-past-buffer net/tls: move 
credential methods to generic tls layer net/tls: rename credentials_impl::dh_params to set_dh_params test/tls: enable openssl tls unit test test/tls: fix CA cert generation to use v3_ca extensions github: disable parallel test execution in alpine workflow crypto: support compiling seastar without gnutls net/tcp: use crypto provider for md5 calculation tls: fix test_peer_certificate_chain_handling for OpenSSL net/tls: fix test for self-signed server cert opoenssl compat net/tls: disable priority strings test for openssl provider core/crypto: expose crypto backend name for introspection test/tls: remove gnutls version guard net/tls: add openssl tls backend http: use backend agnostic tls error code net/tls: make error codes configurable by each tls backend net/tls: move reloadable_credentials to generic tls layer net/tls: move build_certificate to generic tls layer net/tls: move apply_to() to generic tls layer net/tls: move credential methods to generic tls layer net/tls: add OpenSSL-specific methods to public API with no-op defaults net/tls: introduce dh_params and credentials abstraction layer net/tls: add credentials_impl abstract base class net/tls: dispatch tls::error_category() through crypto_provider net/tls: dispatch wrap_client/wrap_server through crypto_provider net/tls: add tls_backend interface to crypto_provider net/tls: move public tls API methods to generic tls layer net/tls: move formatting utilities to generic tls layer net/tls: move credentials_builder blob methods to generic tls layer net/tls: move dh_params::from_file to generic tls layer net/tls: move abstract_credentials file methods to generic tls layer net/tls: move tls_socket_impl to generic tls layer net/tls: move server_session to general tls layer net/tls: move tls_connected_socket_impl to generic tls layer net/tls: move net::get_impl to generic tls layer net/tls: move session_ref to generic tls layer net/tls: add session_impl abstract interface for tls pluggability net/tls: rename tls.cc 
to be gnutls specific crypto: introduce crypto provider abstraction http: remove unused include > tls: test_send_two_large > rpc: include exception type for remote errors > GHA: increase timeout to 60 minutes > apps/httpd: replace deprecated reply::done() with write_body() > missing header(s) > net: Fix missing throw for runtime_error in create_native_net_device > tests/io_queue: account for token bucket refill granularity in bandwidth checks > Merge 'iovec: fix iovec_trim_front infinite loop on zero-length iovecs' from Travis Downs tests: add regression tests for zero-length iovec handling iovec: fix iovec_trim_front infinite loop on zero-length iovecs > util/process: graduate process management API from experimental > cooking: don't register ready.txt as a build output > sstring: make make_sstring not static > Add SparkyLinux to debian list in install-dependencies.sh > http: allow control over default response headers > Merge 'chunked_fifo: make cached chunk retention configurable' from Brandon Allard tests/perf: add chunked_fifo microbenchmarks chunked_fifo: set the default free chunk retention to 0 chunked_fifo: make free chunk retention configurable > Merge 'reactor_backend: fix pollable_fd_state_completion reuse in io_uring' from Kefu Chai tests: add regression test for pollable_fd_state_completion reuse reactor_backend: use reset() in AIO and epoll poll paths reactor_backend: fix pollable_fd_state_completion reuse after co_await in io_uring > Merge 'coroutine: Generator cleanups' from Kefu Chai coroutine/generator: extract schedule_or_resume helper coroutine/generator: remove unused next_awaiter classes coroutine/generator: remove write-only _started field coroutine/generator: assert on unreachable path in buffered await_resume coroutine/generator: add elements_of tag and #include <ranges> coroutine/generator: add empty() to bounded_container concept > cmake: bump minimum Boost version to 1.79.0 > seastar_test: remove unnecessary headers > cmake: bump 
minimum GnuTLS version to 3.7.4 > Merge 'reactor: add get_all_io_queues() method' from Travis Downs tests: add unit test for reactor::get_all_io_queues() reactor: add get_all_io_queues() method reactor: move get_io_queue and try_get_io_queue to .cc file > http: deprecate reply::done(), remove _response_line dead field > core: Deprecate scattered_message > ci: add workflow dispatch to tests workflow > perf_tests: exit non-zero when -t pattern matches no tests > Replace duplicate SEGV_MAPERR check in sigsegv_action() with SEGV_ACCERR. > perf_tests: add total runtime to json output > Merge 'Relax large allocation error originating from json_list_template' from Robert Bindar implement move assignment operator for json_list_template json_list_template copy assignment operator reserves capacity upfront > perf_tests: add --no-perf-counters option > Merge 'Fix to_human_readable_value() ability to work with large values' from Pavel Emelyanov memory: Add compile-time test for value-to-human-readable conversion memory: Extend list of suffixes to have peta-s memory: Fix off-by-one in suffix calculation memory: Mark to_human_readable_value() and others constexpr > http: Improve writing of response_line() into the output > Merge 'websocket: add template parameter for text/binary frame mode and implement client-side WebSocket' from wangyuwei websocket: add template parameter for text/binary frame mode websocket: impl client side websocket function > file: Fix checks for file being read-only > reactor: Make do_dump_task_queue a task_queue method > Merge 'Implement fully mixed mode for output_stream-s' from Pavel Emelyanov tests/output_stream: sample type patterns in sanitizer builds tests/output_stream: extend invariant test to cover mixed write modes iostream: allow unrestricted mixing of buffered and zero-copy writes tests/output_stream: remove obsolete ad-hoc splitting tests tests/output_stream: add invariant-based splitting tests iostream: rename output_stream::_size to 
::_buffer_size > reactor_backend: replace virtual bool methods with const bool_class members > resource: Avoid copying CPU vector to break it into groups > perf_tests: increase overhead column precision to 3 decimal places > Merge 'Move reactor::fdatasync() into posix_file_impl' from Pavel Emelyanov reactor: Deprecate fdatasync() method file: Do fdatasync() right in the posix_file_impl::flush() file: Propagate aio_fdatasync to posix_file_impl reactor: Move reactor::fdatasync() code to file.cc reactor,file: Make full use of file_open_options::durable bit file: Add file_open_options::durable boolean file: Account io_stats::fsyncs in posix_file_impl::flush() reactor: Move _fsyncs counter onto io_stats > http: Remove connection::write_body() Closes scylladb/scylladb#29553	2026-05-14 10:45:39 +03:00
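The skip-past-EOF behavior mentioned in the submodule update above (rejecting the skip with an exception instead of silently stopping) can be sketched in Python as follows. This is an analogy, not Seastar code: `skip_exact` is a hypothetical helper, and `EOFError` stands in for the `std::runtime_error("premature end of stream")` quoted in the commit message.

```python
import io


def skip_exact(stream, n: int) -> None:
    """Skip exactly n bytes, or raise if the stream ends first.

    Hypothetical helper for illustration; being asked to skip more bytes
    than the stream holds is treated as a protocol violation.
    """
    remaining = n
    while remaining > 0:
        chunk = stream.read(min(remaining, 4096))
        if not chunk:
            raise EOFError("premature end of stream")
        remaining -= len(chunk)


stream = io.BytesIO(b"abcdef")
skip_exact(stream, 4)      # consumes b"abcd"
rest = stream.read()       # the two bytes left after the skip
```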
Tomasz Grabiec	66439bb753	Merge 'load_balancer: apply balance threshold to intranode shard balancing' from Ferenc Szili - Fix intranode shard balancing to respect the size-based balance threshold, preventing unnecessary migrations when load difference between shards is negligible - Add a regression test that verifies the threshold is respected for intranode balancing The intranode shard balancing loop only stopped when the algorithm exhausted the migration candidates or when a migration would go against convergence (it would increase imbalance instead of decrease it). This caused unnecessary tablet migrations for negligible imbalances (e.g., 0.78% difference between shards). The inter-node balancer already uses `is_balanced()` to stop when the relative load difference is within the configured `size_based_balance_threshold`, but this check was missing from the intranode path. Apply the same `is_balanced()` threshold check that is already used for inter-node balancing to the intranode convergence loop. When the relative load difference between the most-loaded and least-loaded shards on a node is within the threshold, the balancer now stops without issuing further migrations. The test creates a single node with 2 shards and 512 tablets: 1. Balanced scenario (257 vs 255 tablets, same size): relative diff = 0.78% < 1% threshold → verifies no intranode migration is emitted 2. Unbalanced scenario (307 vs 205 tablets, same size): relative diff = 33% >> 1% threshold → verifies intranode migration IS emitted Fixes: SCYLLADB-1775 This is a performance improvement which reduces the number of intranode migrations issued, and needs to be backported to versions with size-based load balancing: 2026.1 and 2026.2 Closes scylladb/scylladb#29756 * github.com:scylladb/scylladb: test: add test for intranode balance threshold in size-based mode tablet_allocator: apply balance threshold to intranode shard balancing	2026-05-13 13:09:52 +02:00
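The percentages quoted in the test description above can be reproduced with a short Python sketch. It assumes the relative difference is computed as (max - min) / max, which matches both quoted figures (0.78% and 33%); the actual formula inside `is_balanced()` is not shown in the commit message, so treat this as an assumption rather than the allocator's code.

```python
def relative_load_diff(loads):
    """Relative gap between the most- and least-loaded shard.

    Assumed formula: (max - min) / max.
    """
    hi, lo = max(loads), min(loads)
    return 0.0 if hi == 0 else (hi - lo) / hi


def is_balanced(loads, threshold=0.01):
    # The 1% threshold is taken from the test description.
    return relative_load_diff(loads) <= threshold


balanced_pct = round(relative_load_diff([257, 255]) * 100, 2)   # 2 / 257
unbalanced_pct = round(relative_load_diff([307, 205]) * 100)    # 102 / 307
```

With equal tablet sizes, tablet counts are a stand-in for load, so the 257 vs 255 split falls under the threshold and the 307 vs 205 split does not.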
Botond Dénes	e95eb21a16	Merge 'Tablet-aware restore' from Pavel Emelyanov The mechanics of the restore is like this - A /storage_service/tablets/restore API is called with (keyspace, table, endpoint, bucket, manifests) parameters - First, it populates the system_distributed.snapshot_sstables table with the data read from the manifests - Then it emplaces a bunch of tablet transitions (of a new "restore" kind), one for each tablet - The topology coordinator handles the "restore" transition by calling a new RESTORE_TABLET RPC against all the current tablet replicas - Each replica handles the RPC verb by - Reading the snapshot_sstables table - Filtering the read sstable infos against current node and tablet being handled - Downloading and attaching the filtered sstables This PR includes system_distributed.snapshot_sstables table from @robertbindar and preparation work from @kreuzerkrieg that extracts raw sstables downloading and attaching from existing generic sstables loading code. This is first step towards SCYLLADB-197 and lacks many things. 
In particular - the API only works for a single-DC cluster - the caller needs to "lock" tablet boundaries with min/max tablet count - not abortable - no progress tracking - sub-optimal (re-kicking the API on restore will re-download everything again) - not re-attachable (if the API node dies, restoration proceeds, but the caller cannot "wait" for it to complete via another node) - nodes download sstables in the maintenance/streaming sched group (should be moved to maintenance/backup) Other follow-up items: - have an actual swagger object specification for `backup_location` Closes #28436 Closes #28657 Closes #28773 Closes scylladb/scylladb#28763 * github.com:scylladb/scylladb: docs: Update topology_over_raft.md with `restore` transition kind test: Add test for backup vs migration race test: Restore resilience test sstables_loader: Fail tablet-restore task if not all sstables were downloaded sstables_loader: mark sstables as downloaded after attaching sstables_loader: return shared_sstable from attach_sstable db: add update_sstable_download_status method db: add downloaded column to snapshot_sstables db: extract snapshot_sstables TTL into class constant test: Add a test for tablet-aware restore tablets: Implement tablet-aware cluster-wide restore messaging: Add RESTORE_TABLET RPC verb sstables_loader: Add method to download and attach sstables for a tablet tablets: Add restore_config to tablet_transition_info sstables_loader: Add restore_tablets task skeleton test: Add rest_client helper to kick newly introduced API endpoint api: Add /storage_service/tablets/restore endpoint skeleton sstables_loader: Add keyspace and table arguments to manfiest loading helper sstables_loader_helpers: just reformat the code sstables_loader_helpers: generalize argument and variable names sstables_loader_helpers: generalize get_sstables_for_tablet sstables_loader_helpers: add token getters for tablet filtering sstables_loader_helpers: remove underscores from struct members sstables_loader: move 
download_sstable and get_sstables_for_tablet sstables_loader: extract single-tablet SST filtering sstables_loader: make download_sstable static sstables_loader: fix formating of the new `download_sstable` function sstables_loader: extract single SST download into a function sstables_loader: add shard_id to minimal_sst_info sstables_loader: add function for parsing backup manifests split utility functions for creating test data from database_test export make_storage_options_config from lib/test_services rjson: Add helpers for conversions to dht::token and sstable_id Add system_distributed_keyspace.snapshot_sstables add get_system_distributed_keyspace to cql_test_env code: Add system_distributed_keyspace dependency to sstables_loader storage_service: Export export handle_raft_rpc() helper storage_service: Export do_tablet_operation() storage_service: Split transit_tablet() into two tablets: Add braces around tablet_transition_kind::repair switch	2026-05-12 16:24:13 +03:00
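The replica-side filtering step of the restore flow described above can be sketched in Python. The field names follow the `snapshot_sstables` columns, but the filtering rule is an assumption: this sketch keeps rows that match the node's datacenter and rack and whose token range lies entirely inside the tablet's range. The class and function names are hypothetical and do not come from the ScyllaDB sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnapshotSSTable:
    # Subset of the snapshot_sstables columns relevant to filtering.
    datacenter: str
    rack: str
    first_token: int
    last_token: int
    toc_name: str


def sstables_for_tablet(rows, datacenter, rack, tablet_first, tablet_last):
    """Select sstables for one tablet on one node (assumed rule: containment)."""
    return [
        r for r in rows
        if r.datacenter == datacenter
        and r.rack == rack
        and tablet_first <= r.first_token
        and r.last_token <= tablet_last
    ]


rows = [
    SnapshotSSTable("dc1", "r1", -100, -1, "a-TOC.txt"),
    SnapshotSSTable("dc1", "r1", 0, 99, "b-TOC.txt"),
    SnapshotSSTable("dc1", "r2", 0, 99, "c-TOC.txt"),
]
selected = [r.toc_name for r in sstables_for_tablet(rows, "dc1", "r1", 0, 127)]
```

Whether the real code requires full containment or only overlap with the tablet's token range is not stated in the commit message; the stricter rule is used here only to keep the example unambiguous.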
Avi Kivity	ddb1181103	Merge 'load_balance: fix drain with forced capacity-based balancing' from Ferenc Szili When `force_capacity_based_balancing` is enabled and a node is being drained/excluded, the tablet allocator incorrectly aborts balancing due to incomplete tablet stats - even though capacity-based balancing doesn't depend on tablet sizes. The tablet allocator normally waits for complete load stats before balancing. An exception exists for drained+excluded nodes (they're unreachable and won't return stats). However, when forced capacity-based balancing is active, this exception was not being applied, causing the balancer to reject the drain plan. Adjust the condition in `tablet_allocator.cc` so that the "ignore missing data for drained nodes" logic applies regardless of whether capacity-based balancing is forced. Added a Boost unit test that forces capacity-based balancing and verifies a drained/excluded node gets its tablets migrated even when tablet size stats are missing. This bug was introduced in 2026.1, so this needs to be backported to 2026.1 and 2026.2 Fixes: SCYLLADB-1803 Closes scylladb/scylladb#29791 * github.com:scylladb/scylladb: test: boost: add drain test for forced capacity-based balancing service: allow draining with forced capacity-based balancing	2026-05-12 12:38:25 +03:00
Pavel Emelyanov	1c0f8ab66e	Merge 'sstables: introduce --abort-on-malformed-sstable-error' from Botond Dénes When a malformed sstable error occurs, it is usually caused by actual sstable corruption — a cosmic ray, a bad disk write, etc. However, it can also be caused by memory corruption, where a data structure in memory happens to be read as sstable data. In the latter case, having a coredump of the process at the moment of the error is invaluable for post-mortem debugging, since the exception throwing/catching machinery destroys the stack frames that would point to the corruption site. This patch series introduces `--abort-on-malformed-sstable-error`, a new command-line option (with `LiveUpdate` support) that, when set, causes the server to call `std::abort()` instead of throwing an exception whenever any sstable parse error is detected. This covers all code paths: - Direct `throw malformed_sstable_exception(...)` sites (migrated to `throw_malformed_sstable_exception()`) - Direct `throw bufsize_mismatch_exception(...)` sites (migrated to `throw_bufsize_mismatch_exception()`) - `parse_assert()` failures (via `on_parse_error()`) - BTI parse errors (via `on_bti_parse_error()`) The implementation places the flag and helper functions in `sstables/sstables.cc`, next to the existing `on_parse_error()` / `on_bti_parse_error()` infrastructure. The flag defaults to `false`, preserving current behaviour. It is intended to be enabled temporarily when investigating suspected memory corruption. Commit breakdown: 1. Infrastructure: flag, getter/setter, and throw helpers in `sstables/sstables.cc`; config option wired up in `main.cc` 2. `on_parse_error()` and `on_bti_parse_error()` check the new flag 3. All ~50 `throw malformed_sstable_exception(...)` sites migrated 4. 
Both `throw bufsize_mismatch_exception(...)` sites migrated Refs: SCYLLADB-1087 Backport: new feature, no backport Closes scylladb/scylladb#29324 * github.com:scylladb/scylladb: sstables: migrate all bufsize_mismatch_exception throw sites to throw_bufsize_mismatch_exception() sstables: migrate all malformed_sstable_exception throw sites to throw_malformed_sstable_exception() sstables: make on_parse_error() and on_bti_parse_error() respect --abort-on-malformed-sstable-error sstables: disable abort-on-malformed-sstable-error in tests that corrupt sstables on purpose sstables: introduce --abort-on-malformed-sstable-error infrastructure sstables: refactor parse_path() to return std::expected<> instead of throwing	2026-05-12 12:38:25 +03:00
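The abort-instead-of-throw pattern described in this series can be sketched in Python. This is an analogy to the C++ helpers named in the commit message, not their implementation: the module-level flag, the setter, and the injectable `abort_fn` parameter are assumptions introduced so the behavior can be exercised without killing the process.

```python
import os


class MalformedSSTableError(Exception):
    """Stand-in for malformed_sstable_exception."""


_abort_on_malformed = False  # defaults to off, preserving current behaviour


def set_abort_on_malformed_sstable_error(value: bool) -> None:
    global _abort_on_malformed
    _abort_on_malformed = value


def throw_malformed_sstable_exception(msg: str, abort_fn=os.abort) -> None:
    """Abort at the error site when the flag is set, otherwise raise.

    Aborting before any unwinding keeps the failing stack intact in the
    core dump. abort_fn is a hypothetical hook used only for testing.
    """
    if _abort_on_malformed:
        abort_fn()
    raise MalformedSSTableError(msg)
```

Routing every throw site through one helper is what lets a single live-updatable flag cover all parse-error paths.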
Pavel Emelyanov	150345cc52	Merge 'test: per-bucket isolation for S3/GCS object storage tests' from Ernest Zaslavsky This series adds per-test bucket isolation to all S3 and GCS object storage tests. Previously, every test shared a single pre-created bucket, which meant tests could interfere with each other through leftover objects and could not run concurrently across multiple `test.py` processes without risking collisions. New `create_bucket`, `delete_bucket`, and `delete_bucket_with_objects` methods on `s3::client`, following the existing `make_request` pattern. `create_bucket` handles the `BUCKET_ALREADY_OWNED_BY_YOU` error gracefully. A new `s3_test_fixture` RAII class for C++ Boost tests that creates a uniquely-named bucket on construction (derived from the Boost test name and pid) and tears down everything — objects, bucket, client — on destruction. All S3 tests in `s3_test.cc` are migrated to use it, removing manual `deferred_delete_object` and `deferred_close` boilerplate. The minio server policy is broadened to allow dynamic bucket creation/deletion. A `client::make` overload that accepts a custom `retry_strategy`, used in tests with a fast 1ms retry delay instead of exponential backoff, significantly reducing test runtime for transient errors during bucket lifecycle operations. Python-side (`test/cluster/object_store`): each pytest fixture (`object_storage`, `s3_storage`, `s3_server`) now creates a unique bucket per test function via `create_test_bucket()` and destroys it on teardown. Bucket names are sanitized from the pytest node name with a short UUID suffix for uniqueness. Object storage helpers (`S3Server`, `MinioWrapper`, `GSFront`, `GSServerImpl`, factory functions, CQL helpers, `s3_server` fixture) are extracted from `test/cluster/object_store/conftest.py` into a shared `test/pylib/object_storage.py` module, eliminating duplication across test suites. The conftest becomes a thin re-export wrapper. 
Old class names are preserved as aliases for backward compatibility.

| Test Name | New test-specific retry strategy execution time (ms) | Original execution time (ms) | Δ (ms) | Speedup |
|---|---:|---:|---:|---:|
| test_client_upload_file_multi_part_with_remainder_proxy | 19,261 | 61,395 | −42,134 | 3.2× |
| test_client_upload_file_multi_part_without_remainder_proxy | 16,901 | 53,688 | −36,787 | 3.2× |
| test_client_upload_file_single_part_proxy | 3,478 | 6,789 | −3,311 | 2.0× |
| test_client_multipart_copy_upload_proxy | 1,303 | 1,619 | −316 | 1.2× |
| test_client_put_get_object_proxy | 150 | 365 | −215 | 2.4× |
| test_client_readable_file_stream_proxy | 125 | 327 | −202 | 2.6× |
| test_small_object_copy_proxy | 205 | 389 | −184 | 1.9× |
| test_client_put_get_tagging_proxy | 181 | 350 | −169 | 1.9× |
| test_client_multipart_upload_proxy | 1,252 | 1,416 | −164 | 1.1× |
| test_client_list_objects_proxy | 729 | 881 | −152 | 1.2× |
| test_chunked_download_data_source_with_delays_proxy | 830 | 960 | −130 | 1.2× |
| test_client_readable_file_proxy | 148 | 279 | −131 | 1.9× |
| test_client_upload_file_multi_part_with_remainder_minio | 3,358 | 3,170 | +188 | 0.9× |
| test_client_upload_file_multi_part_without_remainder_minio | 3,131 | 2,929 | +202 | 0.9× |
| test_client_upload_file_single_part_minio | 519 | 421 | +98 | 0.8× |
| test_download_data_source_proxy | 180 | 237 | −57 | 1.3× |
| test_client_list_objects_incomplete_proxy | 590 | 641 | −51 | 1.1× |
| test_large_object_copy_proxy | 952 | 991 | −39 | 1.0× |
| test_client_multipart_upload_fallback_proxy | 148 | 185 | −37 | 1.3× |
| test_client_multipart_copy_upload_minio | 641 | 674 | −33 | 1.1× |

No backport needed: this is a test infrastructure improvement with no production code impact beyond 
the new `s3::client` methods. Closes scylladb/scylladb#29508 * github.com:scylladb/scylladb: test: extract object storage helpers to test/pylib/object_storage.py test: add per-test bucket isolation to object_store fixtures s3: add client::make overload with custom retry strategy test: add s3_test_fixture and migrate tests to per-bucket isolation s3: add create_bucket and delete_bucket to client	2026-05-12 12:38:24 +03:00
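The per-test bucket naming described above (a name sanitized from the pytest node name plus a short UUID suffix) might look like the following Python sketch. The function name `make_test_bucket_name` and the exact sanitization rules are assumptions, not the code in `test/pylib/object_storage.py`; the sketch only aims to satisfy the general S3 naming constraints (3 to 63 characters, lowercase letters, digits and hyphens, alphanumeric at both ends).

```python
import re
import uuid
from typing import Optional


def make_test_bucket_name(node_name: str, suffix: Optional[str] = None) -> str:
    """Derive a unique, S3-compatible bucket name from a pytest node name.

    Hypothetical helper for illustration only.
    """
    suffix = suffix or uuid.uuid4().hex[:8]
    # Collapse every run of disallowed characters into a single hyphen.
    base = re.sub(r"[^a-z0-9]+", "-", node_name.lower()).strip("-")
    # Leave room for "-<suffix>" within the 63-character limit.
    max_base = 63 - len(suffix) - 1
    base = base[:max_base].rstrip("-") or "test"
    return f"{base}-{suffix}"


example = make_test_bucket_name("test_backup[s3-True]", "abcd1234")
```

Putting the random part at the end keeps the test name readable in bucket listings while still letting concurrent `test.py` processes run the same test without colliding.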
Ferenc Szili	6856f51097	test: add test for intranode balance threshold in size-based mode Verify that the load balancer does not issue intranode migrations when the load difference between shards is within the size_based_balance_threshold, and that it does issue migrations when the difference exceeds the threshold.	2026-05-12 10:34:25 +02:00
Pavel Emelyanov	d280987f2c	sstables_loader: Add keyspace and table arguments to manfiest loading helper When restoring a backup into a keyspace whose name differs from the one it had at backup time, the snapshot_sstables table must be populated with the _new_ keyspace name, not the one taken from the manifest. The same is true for the table name. This patch makes it possible to override the keyspace/table loaded from the manifest file with the provided values. In the future it would also be good to check that, if those values are not provided by the user, the values read from different manifest files are the same. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-05-12 10:40:23 +03:00
Robert Bindar	c97232bb7b	sstables_loader: add function for parsing backup manifests This change adds functionality for parsing backup manifests and populating system_distributed.snapshot_sstables with the content of the manifests. This change is useful for tablet-aware restore. The function introduced here will be called by the coordinator node when restore starts to populate the snapshot_sstables table with the data that workers need to execute the restore process. Signed-off-by: Robert Bindar <robert.bindar@scylladb.com> Co-authored-by: Pavel Emelyanov <xemul@scylladb.com>	2026-05-12 10:40:22 +03:00
Robert Bindar	f0e8d6c9dd	split utility functions for creating test data from database_test Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>	2026-05-12 10:40:21 +03:00
Robert Bindar	2f19d84ad7	Add system_distributed_keyspace.snapshot_sstables This patch adds the snapshot_sstables table with the following schema: ```cql CREATE TABLE system_distributed.snapshot_sstables ( snapshot_name text, keyspace text, table text, datacenter text, rack text, id uuid, first_token bigint, last_token bigint, toc_name text, prefix text) PRIMARY KEY ((snapshot_name, keyspace, table, datacenter, rack), first_token, id); ``` The table will be populated by the coordinator node during the restore phase (and later on during the backup phase to accommodate live-restore). The content of this table is meant to be consumed by the restore worker nodes which will use this data to filter and file-based download sstables. Fixes SCYLLADB-263 Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>	2026-05-12 10:40:21 +03:00
Taras Veretilnyk	47b4fa920d	test/boost: add tests for large data stats counters Add test_large_data_stats_large_rows, test_large_data_stats_large_cells, and test_large_data_stats_large_collections to verify that the large_data_handler stats counters are correctly incremented during SSTable writes and that unrelated counters remain at zero.	2026-05-11 23:42:14 +02:00
Botond Dénes	ad7ac62835	Merge 'Add a node_owner column (locator::host_id) to system.sstables and make it part of the partition key' from Dimitrios Symonidis Add a node_owner column (locator::host_id) to system.sstables and make it part of the partition key, so the primary key becomes PRIMARY KEY ((table_id, node_owner), generation). This is the first step toward moving the sstables registry into system_distributed: once distributed, each node's startup scan must read only the rows it owns, which requires the owning node to be part of the partition key. Partitioning by (table_id, node_owner) turns that scan into a single-partition read of exactly the local node's rows. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1562 No need to backport this, keyspace over object storage is an experimental feature Closes scylladb/scylladb#29659 * github.com:scylladb/scylladb: db, sstables: add node_owner to sstables registry primary key db, sstables: rename sstables registry column owner to table_id	2026-05-11 14:08:19 +03:00
Botond Dénes	4ebcc002d6	sstables: disable abort-on-malformed-sstable-error in tests that corrupt sstables on purpose Add scoped_no_abort_on_malformed_sstable_error RAII guard (modeled after seastar::testing::scoped_no_abort_on_internal_error) and use it in all tests that intentionally corrupt sstables and expect malformed_sstable_exception to be thrown rather than the process aborting.	2026-05-11 11:58:14 +03:00
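The RAII guard described in the commit above can be illustrated with a minimal, self-contained sketch. It is not the ScyllaDB implementation: the flag `g_abort_on_malformed`, the class `scoped_no_abort`, and the helper `report_malformed` are hypothetical names, and the real guard may store its state differently.

```cpp
#include <atomic>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a process-wide "abort on malformed sstable" switch.
static std::atomic<bool> g_abort_on_malformed{true};

// RAII guard: turns the switch off on construction and restores the
// previous value on destruction, so the override cannot leak out of scope.
class scoped_no_abort {
    bool _prev;
public:
    scoped_no_abort() noexcept : _prev(g_abort_on_malformed.exchange(false)) {}
    ~scoped_no_abort() { g_abort_on_malformed.store(_prev); }
    scoped_no_abort(const scoped_no_abort&) = delete;
    scoped_no_abort& operator=(const scoped_no_abort&) = delete;
};

// Hypothetical error path: aborts when the switch is on, throws otherwise.
[[noreturn]] void report_malformed(const std::string& msg) {
    if (g_abort_on_malformed.load()) {
        std::abort();
    }
    throw std::runtime_error(msg);
}
```

A test that corrupts data on purpose would declare a `scoped_no_abort` at the top of its body and then expect the exception; once the guard goes out of scope the abort behaviour is back in effect.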
Botond Dénes	c3daa6379c	sstables: refactor parse_path() to return std::expected<> instead of throwing make_entry_descriptor() and the two overloads of parse_path() used to signal parse failures by throwing malformed_sstable_exception, which made parse_path() expensive to use as a probe (e.g. to classify directory entries). Change make_entry_descriptor() and both parse_path() overloads to return std::expected<T, sstring>, where the sstring carries the error message on failure, eliminating the exception overhead at probe call sites. Call sites that previously caught malformed_sstable_exception to treat the path as a non-SSTable file (utils/directories.cc, db/snapshot/backup_task.cc, tools/scylla-sstable.cc) now check the expected result directly. Call sites where a parse failure is a genuine error (sstable_directory.cc, sstables.cc, tools/schema_loader.cc, tools/scylla-sstable.cc) re-throw explicitly as malformed_sstable_exception using the error string, preserving the existing error propagation behaviour.	2026-05-11 11:58:14 +03:00
Nadav Har'El	df8c9b17b8	Merge 'alternator: Graduate Alternator Streams from experimental' from Piotr Szymaniak As a final step for https://scylladb.atlassian.net/browse/SCYLLADB-461 we need to graduate Alternator Streams from experimental. So let's remove `--experimental-features=alternator-streams` and map the obsolete config string to `UNUSED` for backward compatibility. Also, remove the related gating of the feature. Finally, stop providing the config flag in test configs. Fixes SCYLLADB-1680 Fixes #16367 The documentation work tracked by https://scylladb.atlassian.net/browse/SCYLLADB-462 still remains. This PR needs to hit 2026.2, so (only) if it branches before the PR is merged to `master`, we'd need to backport. Closes scylladb/scylladb#29604 * github.com:scylladb/scylladb: test: Stop providing alternator-streams experimental flag alternator: Graduate Alternator Streams from experimental	2026-05-10 22:10:03 +03:00
Nadav Har'El	d4aa528834	Merge 'load_balancer: fix tablet allocator dropped table' from Ferenc Szili - Handle dropped tables gracefully in the tablet load balancer's `get_schema_and_rs()` instead of aborting with `on_internal_error` - The load balancer operates on a token metadata snapshot but accesses the live schema for table lookups. A DROP TABLE applied by another fiber between coroutine yield points can remove a table from the live schema while it still exists in the snapshot, causing an abort. `get_schema_and_rs()` now returns `std::optional` and logs a warning in debug log level instead of aborting when a table is missing. All callers skip dropped tables: - `make_sizing_plan`: skips to next table - `make_resize_plan`: skips to next table (merge suppression is moot) - `check_constraints`: returns `skip_info{}` with empty viable targets - `get_rs`: returns `nullptr`, checked by `check_constraints` The call chain is: `make_plan` → `make_internode_plan` → `check_constraints` → `get_rs` → `get_schema_and_rs`. The `make_internode_plan` coroutine has multiple `co_await` yield points (`maybe_yield`, `pick_candidate`) between building the candidate tablet list and checking replication constraints. A DROP TABLE schema mutation applied during any of these yields removes the table from `_db.get_tables_metadata()` while the candidate list still references it. Added `test_load_balancing_with_dropped_table` which simulates the race by capturing a token metadata snapshot, dropping the table, then calling `balance_tablets` with the stale snapshot. Fixes: SCYLLADB-1664 This fix needs to be backported to versions: 2025.4, 2026.1 Closes scylladb/scylladb#29585 * github.com:scylladb/scylladb: test: verify load balancer handles dropped tables gracefully tablet_allocator: handle dropped tables gracefully in get_schema_and_rs	2026-05-10 22:07:51 +03:00
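The snapshot-versus-live-schema race described in the commit above comes down to a lookup that may legitimately find nothing. Below is a minimal sketch of the "return `std::optional` and skip" pattern. It is not the tablet allocator code: `table_id`, `schema_sketch`, `live_schema`, `find_schema`, and `plan_tables` are hypothetical names, and plain containers stand in for the token metadata snapshot and the live schema.

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

using table_id = int;  // hypothetical; real table IDs are UUID-based

struct schema_sketch {
    std::string name;
};

// Stand-in for the live schema, which a concurrent DROP TABLE may shrink.
using live_schema = std::unordered_map<table_id, schema_sketch>;

// Returns std::nullopt for a table that no longer exists instead of
// treating the miss as an internal error.
std::optional<schema_sketch> find_schema(const live_schema& live, table_id id) {
    auto it = live.find(id);
    if (it == live.end()) {
        return std::nullopt;
    }
    return it->second;
}

// Walks the (possibly stale) snapshot and skips tables dropped since it was taken.
std::vector<std::string> plan_tables(const std::vector<table_id>& snapshot,
                                     const live_schema& live) {
    std::vector<std::string> planned;
    for (table_id id : snapshot) {
        auto schema = find_schema(live, id);
        if (!schema) {
            continue;  // table was dropped after the snapshot: skip it
        }
        planned.push_back(schema->name);
    }
    return planned;
}
```

With a snapshot that still lists a dropped table, `plan_tables` simply omits it, which mirrors how the callers listed in the commit skip to the next table.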

4757 Commits