scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-05 06:23:03 +00:00

Author	SHA1	Message	Date
Marcin Maliszkiewicz	1dc975c491	Merge 'table_helper: observe detached setup_table() future' from Andrzej Jackowski During shutdown, group0 may be torn down while cache_table_info() has a detached setup_table() future in flight. This causes raft_group_not_found to propagate as an abandoned failed future. Add .handle_exception() to log the failure at debug level instead of leaving the future unobserved. Fixes: SCYLLADB-2224 Backport to 2026.2 and 2026.1, because the test failed on 2026.1 Closes scylladb/scylladb#30093 * github.com:scylladb/scylladb: test: table_helper: verify detached setup failure is consumed table_helper: observe detached setup_table() future	2026-06-01 19:32:34 +02:00
Botond Dénes	bb81dbf65e	Merge 'guardrails: Add replica-side large data guardrails' from Taras Veretilnyk Adds write-path guardrails that reject or warn on mutations targeting partitions, rows, or collections that already exceed configured size thresholds, based on SSTable `large_data_record` metadata. ScyllaDB already detects and records large partitions/rows/cells in `system.large_data_records` after compaction, but takes no preventive action on the write path. Once a partition grows past operational limits it causes latency spikes, OOM, and repair failures. These guardrails let operators set hard and soft thresholds so that writes to already-oversized data are rejected (hard) or logged as warnings (soft) before they make the problem worse. - Intrusive index over SSTable metadata: A per-table `large_data_record_index` maintains three `boost::intrusive::multiset`s (partitions, rows, cells) using `auto_unlink` hooks directly on `large_data_record`. SSTable destruction automatically removes records from the index — no explicit deregistration needed. - Virtual dispatch for zero-cost disabled path: `large_data_guardrail_base` → `noop_large_data_guardrail` / `large_data_guardrail`. Tables without guardrails enabled pay only a virtual call to a no-op. No index is built or maintained for disabled tables. - Schema storage: The per-table flag is stored as a scylla_tables column, following the tablets pattern: only write a live cell when enabled, omit entirely when disabled. The CQL feature gate prevents enabling until all nodes are upgraded. - Write-path integration: The guardrail check runs in `do_apply` after the frozen mutation is deserialized but before it is applied to the memtable. Hint replay and Paxos learn skip the check via `skip_large_data_guardrails`. Uses existing `large__warn_threshold` config options as soft limits and new `large__fail_threshold` options as hard limits. 
Checked dimensions: - Partition size (bytes) - Partition row count - Row size (bytes) - Collection element count Backport is not required Fixes https://scylladb.atlassian.net/browse/SCYLLADB-180 Closes scylladb/scylladb#29733 * github.com:scylladb/scylladb: test/cqlpy: add per-table toggle, LWT exemption, and multi-category tests test/cqlpy: add large collection guardrail tests test/cqlpy: add large row guardrail tests test/cqlpy: add large partition guardrail tests test/boost: add large_data_guardrail unit tests test/cluster: add large data guardrails rolling upgrade test replica: wire large_data_guardrail into the write path schema: add per-table large_data_guardrails_enabled flag db: implement large_data_guardrail db: implement large_data_record_index sstables: add intrusive index hook to large_data_record db: add large_collection_elements_fail_threshold config option db: add large_row_fail_threshold_mb config option db: add rows_count_fail_threshold config option db: add large_partition_fail_threshold_mb config option replica: introduce large_data_exception	2026-06-01 13:26:00 +03:00
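The soft/hard threshold behaviour described in this merge can be sketched in a few lines of Python. This is only an illustration of the decision logic, not ScyllaDB code: `Verdict`, `Thresholds`, `check_dimension`, and `check_write` are hypothetical names, and whether the real comparison is strict (`>`) or inclusive is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    OK = "ok"
    WARN = "warn"      # soft threshold exceeded: log a warning, accept the write
    REJECT = "reject"  # hard threshold exceeded: refuse the write


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical pair of limits for one checked dimension."""
    warn: int  # soft limit
    fail: int  # hard limit


def check_dimension(recorded: int, limits: Thresholds) -> Verdict:
    """Classify a write against the size already recorded for its target.

    Assumption: strict '>' comparison; the real code may differ.
    """
    if recorded > limits.fail:
        return Verdict.REJECT
    if recorded > limits.warn:
        return Verdict.WARN
    return Verdict.OK


def check_write(recorded: dict[str, int],
                limits: dict[str, Thresholds],
                skip_guardrails: bool = False) -> Verdict:
    """Return the most severe verdict across all checked dimensions."""
    if skip_guardrails:
        # Models the exemption for hint replay and Paxos learn.
        return Verdict.OK
    verdicts = [check_dimension(recorded.get(name, 0), lim)
                for name, lim in limits.items()]
    if Verdict.REJECT in verdicts:
        return Verdict.REJECT
    if Verdict.WARN in verdicts:
        return Verdict.WARN
    return Verdict.OK
```

For example, with limits of `Thresholds(warn=100, fail=1000)` on partition size, a recorded size of 500 yields a warning and 5000 yields a rejection, unless the caller sets `skip_guardrails`.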
Nadav Har'El	33dce2b7fc	Merge 'cql3: statement_restrictions: continue exploitation of predicate work' from Avi Kivity In `6165124fcc`, we changed analysis of expressions in the WHERE clause to use predicates, an annotated form of an expression that constrains a column when the expression is set to true. Here, we exploit this work to simplify the analysis further, reusing already computed attributes rather than re-analyzing the expression. Not backporting, this is a refactor with no functional change and no bugs fixed. Closes scylladb/scylladb#30049 * github.com:scylladb/scylladb: cql3: statement_restrictions: simplify find_idx to return only the index cql3: statement_restrictions: replace has_only_eq_binops with tracked booleans cql3: statement_restrictions: use index-selection predicates for value_for_index_partition_key cql3: statement_restrictions: replace find_clustering_order with predicate order field cql3: statement_restrictions: replace has_partition_token with variant check cql3: statement_restrictions: replace has_slice with predicate is_slice check cql3: statement_restrictions: replace contains_multi_column_restriction filter with _has_multi_column cql3: statement_restrictions: remove unused find_needs_filtering and has_slice_or_needs_filtering cql3: statement_restrictions: replace has_slice_or_needs_filtering with tracked bool cql3: statement_restrictions: replace contains_multi_column_restriction with _has_multi_column cql3: statement_restrictions: replace find_needs_filtering with predicate op check cql3: statement_restrictions: replace find_binop is_on_collection with tracked bool cql3: statement_restrictions: replace find_binop column extraction with predicate on field cql3: statement_restrictions: set op on all binary-operator-derived predicates	2026-05-31 23:22:43 +03:00
Avi Kivity	503add224d	cql3: statement_restrictions: simplify find_idx to return only the index The expression returned as the second element of find_idx()'s pair was stored in view_indexed_table_select_statement::_used_index_restrictions but never read — dead code. Simplify find_idx() to return just the optional<index>, and remove the dead member and constructor parameter from view_indexed_table_select_statement. The now unused _idx_restrictions is also removed.	2026-05-29 17:18:21 +03:00
Taras Veretilnyk	ff84b1dbc4	test/boost: add large_data_guardrail unit tests 8 tests covering the record_compare template comparator, intrusive multiset equal_range grouping with heterogeneous lookup_key, and auto_unlink on record destruction.	2026-05-29 12:51:42 +02:00
Pavel Emelyanov	8b2ff16cae	schema: Move grace_period from schema_ctxt to schema_registry The schema_registry_grace_period field on schema_ctxt was only used by schema_registry itself for eviction timing. Move it to be a direct member of schema_registry, passed at init() time. This removes one db::config dependency from schema_ctxt. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#30038	2026-05-29 13:42:23 +03:00
Taras Veretilnyk	5a0974e781	schema: add per-table large_data_guardrails_enabled flag Add a per-table large_data_guardrails_enabled flag controlled via the CQL table property WITH large_data_guardrails_enabled = true\|false. Store the flag as a boolean column in system_schema_ext.scylla_tables. Only write a live cell when enabled; when disabled (the default), omit the cell entirely so that old nodes that don't know this column can still read the SSTable during rolling upgrade or rollback. When the property transitions from true to false via ALTER TABLE, a tombstone is written in make_update_table_mutations to override the previous live cell — this is safe because the CQL feature gate ensures all nodes are upgraded before the property can be set to true. Gate the CQL property behind the LARGE_DATA_GUARDRAILS cluster feature: attempting to set large_data_guardrails_enabled = true before all nodes advertise the feature raises a ConfigurationException.	2026-05-29 12:18:33 +02:00
Botond Dénes	46631692cd	mutation_fragment_stream_validator: use legacy byte order for same-token partition key comparison When two partition keys share the same token, their relative order is determined by their raw serialized bytes (legacy_tri_compare), which matches the physical on-disk order in SSTables. The validator was using partition_key::tri_compare instead — a type-aware comparator that can disagree with byte order for types like timeuuid. The result was a false-positive "out-of-order partition key" error for any two same-token partitions whose timeuuid (or other type-aware) order is the reverse of their byte order. In scrub mode this caused the second partition to be silently dropped. Fixes: SCYLLADB-2304 Closes scylladb/scylladb#30120	2026-05-29 11:54:20 +02:00
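The disagreement between type-aware order and raw byte order for timeuuid can be reproduced with Python's standard `uuid` module. This is a simplified sketch: `make_timeuuid`, `type_aware_cmp`, and `byte_order_cmp` are hypothetical helpers, the type-aware comparison here looks only at the 60-bit timestamp, and ScyllaDB's actual comparators are more involved.

```python
import uuid


def make_timeuuid(timestamp: int) -> uuid.UUID:
    """Build a version-1 UUID from a 60-bit timestamp.

    Clock sequence and node are fixed so only the timestamp differs.
    """
    time_low = timestamp & 0xFFFFFFFF
    time_mid = (timestamp >> 32) & 0xFFFF
    time_hi_version = ((timestamp >> 48) & 0x0FFF) | 0x1000  # version 1
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version, 0x80, 0x00, 1))


def type_aware_cmp(a: uuid.UUID, b: uuid.UUID) -> int:
    """Three-way compare by embedded timestamp (simplified type-aware order)."""
    return (a.time > b.time) - (a.time < b.time)


def byte_order_cmp(a: uuid.UUID, b: uuid.UUID) -> int:
    """Three-way compare by raw serialized bytes."""
    return (a.bytes > b.bytes) - (a.bytes < b.bytes)


# time_low is serialized first, so a carry into time_mid flips the byte order.
older = make_timeuuid(0x0_FFFF_FFFF)
newer = make_timeuuid(0x1_0000_0000)

print(type_aware_cmp(older, newer))  # older sorts first by timestamp
print(byte_order_cmp(older, newer))  # older sorts last by raw bytes
```

A validator that expects one order while the data is laid out in the other would flag such a pair as out of order, which is the false positive the commit describes.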
Tomasz Grabiec	5ceabcbcc5	Merge 'tablets: fix update_tablet_metadata failures during bootstrap' from Aleksandra Martyniuk When partition_split_builder splits a tablet metadata partition into multiple mutations, the first mutation gets the partition tombstone and/or static row while subsequent mutations contain only clustered rows. The hint logic would correctly clear tokens (marking a full partition read) upon seeing the tombstone in the first mutation, but then re-add tokens when processing the subsequent row-only mutations. This caused update_tablet_metadata to attempt a point update via mutate_tablet_map_async on a tablet map that doesn't exist yet during bootstrap, throwing no_such_tablet_map and failing the snapshot transfer. Fix by adding a full_read flag to table_hint. Once a full partition read is decided (due to partition tombstone, range tombstone, static row, or row deletion), the flag prevents subsequent mutations for the same table from re-adding tokens. Additionally, fall back to a full partition read when the tablet map is missing locally, which happens when the joining node receives tablet metadata for a table it has never seen before. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-2303. Needs backports to 2026.1+. 2026.1 introduces the regression with `b17a36c071` Closes scylladb/scylladb#30115 * github.com:scylladb/scylladb: tablets: fall back to full partition read when tablet map is missing tablets: fix hint re-adding tokens after full partition read decision	2026-05-29 11:53:36 +02:00
Aleksandra Martyniuk	d6c1707a04	tablets: fix hint re-adding tokens after full partition read decision When partition_split_builder splits a tablet metadata partition into multiple mutations, the first mutation gets the partition tombstone and/or static row while subsequent mutations contain only clustered rows. The tablet metadata change hint logic would correctly clear tokens (marking a full partition read) upon seeing the tombstone in the first mutation, but then re-add tokens when processing the subsequent row-only mutations. This caused update_tablet_metadata to attempt a point update via mutate_tablet_map_async on a tablet map that doesn't exist yet during bootstrap, throwing no_such_tablet_map and failing the snapshot transfer. Fix by adding a full_read flag to table_hint. Once a full partition read is decided (due to partition tombstone, range tombstone, static row, or row deletion), the flag prevents subsequent mutations for the same table from re-adding tokens.	2026-05-27 15:36:16 +02:00
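The sticky `full_read` decision can be illustrated with a small Python model. This is a sketch under stated assumptions: `Mutation`, `TableHint`, and `update_hint` are simplified, hypothetical stand-ins for the real structures, and only the partition tombstone and static row triggers are modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Mutation:
    """Hypothetical, simplified view of one mutation produced by the split."""
    row_tokens: list[int] = field(default_factory=list)
    has_partition_tombstone: bool = False
    has_static_row: bool = False


@dataclass
class TableHint:
    """Tokens to update point-wise, or a decision to re-read the whole partition."""
    tokens: set[int] = field(default_factory=set)
    full_read: bool = False


def update_hint(hint: TableHint, m: Mutation) -> None:
    """Fold one mutation into the hint for its table."""
    if hint.full_read:
        # The decision is sticky: later row-only mutations must not re-add tokens.
        return
    if m.has_partition_tombstone or m.has_static_row:
        hint.tokens.clear()
        hint.full_read = True
        return
    hint.tokens.update(m.row_tokens)


hint = TableHint()
update_hint(hint, Mutation(has_partition_tombstone=True))  # first split mutation
update_hint(hint, Mutation(row_tokens=[10, 20]))           # row-only follow-up
print(hint)  # tokens stay empty, full_read stays True
```

Without the early return on `full_read`, the second call would repopulate `tokens`, which corresponds to the bug the commit fixes.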
Botond Dénes	555cfbcd38	Merge 'treewide: replace deprecated smp::count and smp::all_cpus() with new APIs' from Avi Kivity Replace all uses of the deprecated seastar::smp::count with this_smp_shard_count() and smp::all_cpus() with this_smp_all_shards() across the ScyllaDB codebase (seastar submodule untouched). Both replacement functions require a reactor thread context. All call sites were verified to run on reactor threads. Notable cases: - dht/token-sharding.hh: this_smp_shard_count() is used as a default parameter value. This is safe since all callers are on reactor threads, but the expression is now evaluated at each call site rather than being a reference to a global variable. - service/storage_service.hh, locator/abstract_replication_strategy.hh, ent/encryption/encryption.cc: used in default member initializers and constructor member-init-lists. Objects are always constructed on reactor threads. - schema_builder: sometimes called from BOOST_AUTO_TEST_CASE without a reactor. Added a pre-patch that makes the implicit shard count parameter explicit and passes 1 in those cases. Not changed: - scylla-gdb.py: reads smp::count as a GDB symbol (no reactor context). - Python test files: only reference smp::count in comments/strings. No backport: the Seastar commit that deprecated these functions hasn't made (and won't make) its way into any release branches (and the warnings are cosmetic anyway) Closes scylladb/scylladb#29990 * github.com:scylladb/scylladb: treewide: replace deprecated smp::count and smp::all_cpus() with new APIs scylla-gdb: read shard count from smp::_this_smp instead of smp::count schema_builder: make shard_count an explicit constructor parameter	2026-05-27 09:42:06 +03:00
Avi Kivity	8010e408a2	treewide: replace deprecated smp::count and smp::all_cpus() with new APIs Replace all uses of the deprecated seastar::smp::count with this_smp_shard_count() and smp::all_cpus() with this_smp_all_shards() across the ScyllaDB codebase (seastar submodule untouched). Both replacement functions require a reactor thread context. All call sites were verified to run on reactor threads. Notable cases: - dht/token-sharding.hh: this_smp_shard_count() is used as a default parameter value. This is safe since all callers are on reactor threads, but the expression is now evaluated at each call site rather than being a reference to a global variable. - service/storage_service.hh, locator/abstract_replication_strategy.hh, ent/encryption/encryption.cc: used in default member initializers and constructor member-init-lists. Objects are always constructed on reactor threads. Not changed: - scylla-gdb.py: reads smp::count as a GDB symbol (no reactor context). - Python test files: only reference smp::count in comments/strings.	2026-05-26 17:35:20 +03:00
Avi Kivity	c59985c38b	Merge 'cql3: limit large allocations when parsing queries' from Botond Dénes Queries are stored and passed around as sstring/std::string_view. While normally they are small enough to not cause problems, as the `test_cdc_large_values.TestLargeColumnsWithCDC.test_single_column_blob_max_size_with_cdc_preimage_full_postimage[unprepared_statements]` test demonstrates, queries can be arbitrarily large, putting heavy strain on Scylla internals via large allocations, in the extreme case causing denial of service. This PR attempts to alleviate this by using fragmented storage for queries: read the query as a fragmented string from the input stream in `transport/server.cc`, propagate it as such to `query_processor::prepare()` and also store it as such in `cql3::cql_statement::raw_cql_statement`. Also avoid linearizing raw values in the CQL expression tree: switch `cql3::expr::untyped_constant::raw_text` to fragmented storage. For this to be possible, some infrastructure code had to be made fragmented-storage friendly: ascii/utf8 validation, hashers, from_hex and importantly: `abstract_type::from_string()`. Unfortunately, the query still has to be linearized for parsing itself, as ANTLR -- although it allows for a custom InputStream implementation -- plays pointer-arithmetic games with the pointers obtained from it, so fragmented input cannot be used. Still, this PR limits the places where the query is linearized to the following: * Parsing * Audit * Logs and error messages So the normal query paths for queries that actually can get arbitrarily large (UPDATE and INSERT) should only linearize the query temporarily for parsing. 
Fixes #10779 Improvement, no backport Closes scylladb/scylladb#28619 * github.com:scylladb/scylladb: tracing: add_query(): change query param to utils::chunked_string cql3: store raw query string in utils::chunked_string serializer: add serializer<utils::chunked_string> utils/reusable_buffer: add get_linearized_view(managed_bytes_view) cql3/expr: use utils::chunked_string for untyped_constant::raw_text types: abstract_type::from_string() switch to fragmented buffers (implementation) types: abstract_type::from_string() switch to fragmented buffers (interface) types: use write_fragmented from utils/fragment_range.hh types: timestamp_from_string(): don't assume std::string_view is null-terminated types/duration: don't assume std::string_view is null-terminated utils/hashers: add calculate(managed_bytes_view) overload utils/ascii: add validate(managed_bytes_view) overload utils: add managed_bytes_fwd.hh utils: add chunked_string utils: add managed_bytes_basic_view::byte_iterator	2026-05-26 15:00:53 +03:00
Andrzej Jackowski	45ff773466	test: table_helper: verify detached setup failure is consumed Add test_best_effort_setup_table_failure_is_consumed which triggers a setup_table() failure via a missing keyspace and asserts no abandoned future escapes. This guards against regressions where the detached future loses its exception handler. Remove the test_skipped_no_error_injection placeholder since the new test runs unconditionally keeping the suite non-empty in all build modes.	2026-05-26 13:32:56 +02:00
Avi Kivity	f165b396fd	schema_builder: make shard_count an explicit constructor parameter A recent Seastar update deprecated smp::count and introduced this_smp_shard_count() as a replacement. One difference is that this_smp_shard_count() wants to run on a reactor thread. This poses a problem for non-reactor tests (BOOST_AUTO_TEST_CASE) that nevertheless use a schema, as the schema_builder constructor references smp::count. If we replace it with this_smp_shard_count() then it will crash when running without a reactor. To fix, remove the implicit this_smp_shard_count() call from raw_schema's constructor and require callers to pass shard_count explicitly to schema_builder. This allows tests that don't run on a reactor thread to construct schemas without crashing. Production code and reactor-based tests pass this_smp_shard_count(). Non-reactor test files (expr_test, keys_test, nonwrapping_interval_test, wrapping_interval_test, bti_key_translation_test, range_tombstone_list_test) pass a fixed shard count of 1. Note: sstable_test.cc is a Seastar test file (SEASTAR_THREAD_TEST_CASE) but also contains one plain BOOST_AUTO_TEST_CASE (test_empty_key_view_comparison) that constructs a schema_builder without a reactor context. This test also receives a fixed shard count of 1.	2026-05-26 11:55:56 +03:00
Botond Dénes	0fd25dc47c	Merge 'Replace get_injection_parameters() with inject_parameter() where appropriate' from Pavel Emelyanov Several error injection sites use the low-level get_injection_parameters() API to fetch the entire parameters map and then manually look up a single key. The inject_parameter() API is better suited for these cases — it combines the enabled check and typed single-parameter extraction in one call, returning std::optional. Cleaning error injection usage, not backporting Closes scylladb/scylladb#29970 * github.com:scylladb/scylladb: test: Use inject_parameter() in row_cache_test sstables: Use inject_parameter() for mx reader fill buffer timeout streaming: Use inject_parameter() for order_sstables_for_streaming	2026-05-26 10:32:44 +03:00
Botond Dénes	2c9a5f9634	types: abstract_type::from_string() switch to fragmented buffers (implementation) The previous patch changed the interface and callers, this one updates the implementation to actually work with fragmented buffers. Most types just use with_linearized() to linearize the fragmented input buffer for parsing. This is fine, as most types have a fixed or bounded-size string representation that is small. Importantly, the input is not linearized for the 3 types which have unbounded values: ascii, bytes and text. The tuple type can contain any of these types itself, so it is also converted to avoid linearization.	2026-05-26 09:08:06 +03:00
Botond Dénes	597d4252dc	types: abstract_type::from_string() switch to fragmented buffers (interface) Change input: std::string_view -> utils::chunked_string_view. Change return value: bytes -> managed_bytes. This patch only changes the interface, with some to_bytes() sprinkled in the internals to deal with recursive calls. Internals will be updated in the next patch, to keep the churn of updating callers separate from the actually important changes.	2026-05-26 09:08:06 +03:00
Botond Dénes	a9028d88b2	utils/hashers: add calculate(managed_bytes_view) overload Uses update() for each fragment, then finalize. Yields identical hash to calling calculate(std::string_view) with linearized buffer. This is checked by new tests.	2026-05-26 09:08:05 +03:00
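The property this commit relies on, that feeding a hasher one fragment at a time gives the same digest as hashing the linearized buffer, can be checked with Python's `hashlib`. This is an analogy rather than the ScyllaDB hasher: `hash_fragments` is a hypothetical helper and SHA-256 is chosen only for illustration.

```python
import hashlib
from typing import Iterable


def hash_fragments(fragments: Iterable[bytes]) -> bytes:
    """Hash a fragmented buffer without joining the fragments first."""
    h = hashlib.sha256()
    for frag in fragments:
        h.update(frag)  # one update() call per fragment
    return h.digest()   # finalize


fragments = [b"SELECT ", b"* FROM ", b"ks.t"]
linearized = b"".join(fragments)

# Both paths produce the same digest.
print(hash_fragments(fragments) == hashlib.sha256(linearized).digest())
```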
Botond Dénes	a2fff12bcd	utils: add chunked_string A thin facade over managed_bytes[_view], offering some extra convenience for working with strings, as well as a strong type communicating the purpose (storing text instead of a blob). Also introduces utils::from_hex(chunked_string_view), a fragmented hex-decode that operates directly on a chunked_string_view without requiring linearization. Hex pairs straddling fragment boundaries are handled via a carry-over nibble.	2026-05-26 09:08:05 +03:00
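The carry-over nibble technique for hex pairs that straddle fragment boundaries can be sketched as follows. This is a Python illustration of the idea, not the `utils::from_hex` implementation: `from_hex_fragments` is a hypothetical name and the fragments are modelled as plain strings.

```python
from typing import Iterable, Optional

_HEX = {c: i for i, c in enumerate("0123456789abcdef")}


def from_hex_fragments(fragments: Iterable[str]) -> bytes:
    """Decode hex text split across fragments without joining them."""
    out = bytearray()
    carry: Optional[int] = None  # high nibble waiting for its partner
    for frag in fragments:
        for ch in frag:
            try:
                nibble = _HEX[ch.lower()]
            except KeyError:
                raise ValueError(f"invalid hex digit: {ch!r}") from None
            if carry is None:
                carry = nibble  # may stay pending across a fragment boundary
            else:
                out.append((carry << 4) | nibble)
                carry = None
    if carry is not None:
        raise ValueError("odd number of hex digits")
    return bytes(out)


# The pairs "ad" and "be" each straddle a fragment boundary here.
print(from_hex_fragments(["dea", "dbe", "ef"]).hex())
```

Because the pending nibble is the only state kept between fragments, memory use stays constant apart from the output buffer.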
Botond Dénes	09743aed36	utils: add managed_bytes_basic_view::byte_iterator A byte-wise iterator that works both as a bidirectional iterator and as an output iterator (for mutable views). Allows using managed_bytes_view in iterator-based algorithms. Added unit tests covering the iterator functionality.	2026-05-26 09:08:05 +03:00
Nadav Har'El	b026aea6f7	cql3/expr: add NEG unary operator for numeric negation This patch adds a new expression type, unary_operator, analogous to the existing binary_operator but taking just one operand instead of two. This patch also implements the first and only unary operator type, unary_oper_t::NEG, implementing negation (unary minus) for all numeric types. For fixed-width integer types, overflow or underflow results in an error. If the operand is NULL, the result is NULL as well. The new operator is not yet used by the CQL syntax - our parser doesn't parse arithmetic expressions yet. We also do not plan to use it in the following patch, which uses the separate SUB (subtraction) operation, not the new NEG. But since I already implemented a unary minus operator, and we'll surely need it in the future for general arithmetic operations, I thought I might as well include this patch as well. Refs #22918 ("Support arithmetic operators")	2026-05-25 10:08:11 +03:00
Nadav Har'El	f27d1f08fc	cql3/expr: add SUB binary operator for numeric subtraction In this patch we add to our expressions oper_t::SUB, for subtraction, analogous to the ADD from the previous patch. The only reason why we need a separate SUB operation and can't just combine ADD with a unary minus (NEG) operator is the minimum integer in fixed-sized integer. For example, 8-bit integers have the range -128...127. A subtraction like -1 - (-128) is valid (its value is 127) but the negation of (-128) would be invalid (128). One of the tests we add in this patch validates this fact. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-05-25 10:06:28 +03:00
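The argument for a dedicated SUB operator can be checked with overflow-checked 8-bit arithmetic. The helpers below (`checked_sub_i8`, `checked_neg_i8`, `checked_add_i8`) are hypothetical names for this sketch and only model the range check, not the CQL expression evaluator.

```python
INT8_MIN, INT8_MAX = -128, 127


def _checked(value: int) -> int:
    """Return value if it fits in a signed 8-bit integer, else raise."""
    if not INT8_MIN <= value <= INT8_MAX:
        raise OverflowError(f"{value} does not fit in int8")
    return value


def checked_add_i8(a: int, b: int) -> int:
    return _checked(a + b)


def checked_sub_i8(a: int, b: int) -> int:
    return _checked(a - b)


def checked_neg_i8(a: int) -> int:
    return _checked(-a)


# Direct subtraction is in range: -1 - (-128) == 127.
print(checked_sub_i8(-1, -128))

# Rewriting it as -1 + NEG(-128) fails, because 128 is not an int8 value.
try:
    checked_add_i8(-1, checked_neg_i8(-128))
except OverflowError as e:
    print("overflow:", e)
```

The intermediate negation overflows even though the final result would fit, so ADD combined with NEG cannot replace SUB at the minimum value.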
Nadav Har'El	083adf84ab	cql3/expr: add ADD binary operator for numeric addition Extend oper_t with a new ADD operator, to represent addition between two numeric expressions. Supports all numeric types - tinyint, smallint, int, bigint, float, double, varint, and decimal. For fixed-width integer types, overflow or underflow results in an error. If one of the operands is NULL, the result is also NULL. The new operator is not yet used by the CQL syntax - our parser doesn't parse arithmetic expressions yet. We plan to start using this new operator in a following patch which implements counter syntax ("SET r = r + 1") for LWT, but in the future we can use it for more general cases. At the moment, ADD requires that both operands have the same type. This is all we need for the first use case, and this limitation can be relaxed later. Interestingly, ADD is our first binary operator implementation that does not return a boolean. Until now all our binary operators have been comparison operators, and all returned boolean. In contrast, ADD's return type is the type of its operands. This implementation is susceptible to the pre-existing bug SCYLLADB-1576, where adding 1e1000000 and 1 in "decimal" or "varint" types will happily allocate a million-digit number and run out of memory. A reproducing test is included, and this issue will be solved in one place for all operations that have additions (including aggregations and arithmetic expressions) in a followup pull-request. Refs #22918 ("Support arithmetic operators")	2026-05-25 10:05:09 +03:00
Gleb Natapov	0bf050d175	storage_proxy: hold shared pointer to a table object during entire query_partition_key_range_concurrent execution Otherwise if a table is dropped in the middle of a scan the object may disappear. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-2137 Closes scylladb/scylladb#29988	2026-05-24 21:54:08 +03:00
Yaniv Michael Kaul	acd3115645	sstables: include SSTable filename in Stats metadata error messages When Stats metadata is not available or malformed, include the SSTable filename in the error message to help operators identify which SSTable files need attention during startup failures. Fixes: https://github.com/scylladb/scylla-enterprise/issues/5439 Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> AI-assisted: yes Backport: no, benign improvement Closes scylladb/scylladb#29950	2026-05-22 16:49:37 +03:00
Avi Kivity	305346a3ec	Merge 'Don't materialize collections into intermediate representations' from Botond Dénes Collections have an age-old problem in ScyllaDB: they had to be unserialized into an intermediate representation for any access or manipulation. The intermediate representation needs effort to produce and also requires additional memory to store. Both can be significant for large collections. This intermediate representation is then either discarded immediately after use, or re-serialized again. This problem was significant enough for us to consider the use of collections as somewhat of an anti-pattern. But our customers keep using it. Alternator is also a heavy user of collections. This PR aims to solve this problem once and for all. The plan is as follows: * Promote direct use of the serialized collection format: - Add accessor methods to `collection_mutation_view` which read from the serialized format directly: `tomb()`, `size()` and `begin()`/`end()`. - Add a `collection_mutation_writer` which provides container semantics for generating a serialized `collection_mutation` directly on the go (`push_back()`). * Replace all usage of `collection_mutation_description`, `collection_mutation_view_description` and friends with use of the new infrastructure. * Drop the old infrastructure, to avoid accidental regressions. Continues the work started by https://github.com/scylladb/scylladb/pull/29033 and takes it to its conclusion. 
To help focus review, here is a summary of the patches: * [1, 2] preparatory refactoring: drop some unused abstract_type params * [3, 6] introduce new infrastructure to write and read serialized collections directly; this is the meat of the PR * [6, -1) replace all usage of old materializing infrastructure with usage of the new one * [-1] drop old infrastructure Command: ``` dbuild -it -- build/release/scylla perf-simple-query --collection=16 -c1 -m2G --default-log-level=error ``` \| Metric \| Before \| After \| Change \| \|--------------------------\|--------:\|--------:\|------------\| \| Throughput (median tps) \| 315,760 \| 332,021 \| +5.1% \| \| Instructions/op (median) \| 53,776 \| 48,681 \| -9.5% \| \| CPU cycles/op (median) \| 17,365 \| 16,471 \| -5.1% \| \| Allocations/op \| 85.1 \| 82.1 \| -3.5% \| Significant improvement. Throughput is up ~5%, and both instruction count and cycle count are meaningfully reduced. --- Command: ``` dbuild -it -- build/release/scylla perf-simple-query --collection=16 -c1 -m2G --default-log-level=error --write ``` \| Metric \| Before \| After \| Change \| \|--------------------------\|----------:\|---------:\|-----------\| \| Throughput (median tps) \| 150,823 \| 149,678 \| -0.8% \| \| Instructions/op (median) \| 108,388 \| 103,858 \| -4.2% \| \| CPU cycles/op (median) \| 34,860 \| 35,371 \| +1.5% \| \| Allocations/op \| ~105–108 \| ~102–103 \| -3.0% \| Mixed, mostly neutral. Throughput is essentially flat (within noise). Instructions/op improved by ~4%, allocations dropped slightly, but cycles/op edged up marginally. 
--- Command: ``` dbuild -it -- build/release/scylla perf-alternator --workload write --developer-mode=1 --alternator-port=8000 --alternator-write-isolation=unsafe -c1 -m2G --default-log-level=error ``` \| Metric \| Before \| After \| Change \| \|--------------------------\|--------:\|-------:\|-----------\| \| Throughput (median tps) \| 55,777 \| 56,051 \| +0.5% \| \| Instructions/op (median) \| 246,215 \|246,610 \| +0.2% \| \| CPU cycles/op (median) \| 77,641 \| 77,020 \| -0.8% \| \| Allocations/op \| 340.4 \| 335.4 \| -1.5% \| Essentially neutral. All metrics are within noise margins. Slight reduction in allocations and cycles, negligible otherwise. --- The change has a clear, substantial positive effect on reads (~5% throughput gain, ~9.5% fewer instructions per op). The write and alternator paths are unaffected in practice — changes there are within measurement noise. No regressions are apparent. This is expected: https://github.com/scylladb/scylladb/pull/29033 did the heavy lifting when it comes to the write path, this PR finishes the job, mostly improving reads. Fixes: #3602 Improvement, no backport. 
Closes scylladb/scylladb#29127 * github.com:scylladb/scylladb: mutation/collection_mutation: make collection_mutation::_data private mutation_collection: drop collection_mutation_description and friends test: move away from collection_mutation_description tree: move away from collection_mutation_description test: move away from collection_mutation_view::with_deserialized() tree: move away from collection_mutation_view::with_deserialized() types: fix indendation, left broken by previous commit types: move away from collection_mutation_view::with_deserialized() types: serialize_for_cql(): use throwing_assert() instead of SCYLLA_ASSERT() schema: column_computation: move away from collection_mutation_view::with_deserialized() mutation: move away from collection_mutation_view::with_deserialized() alternator: move away from collection_mutation_view::with_deserialized() cdc: move away from collection_mutation_view::with_deserialized() mutation/collection_mutation: printer: don't deserialize collections mutation/collection_mutation: difference(): don't deserialize collections mutation/collection_mutation: merge(): don't deserialize collections mutation/collection_mutation: extract compact_and_expire() to free function mutation/collection_mutation: refactor empty(), is_any_live() and last_update() compaction_garbage_collector: pass collection_mutation to collect() test/boost/mutation_test: add tests for collection_mutation_{view,writer} mutation/collaction_mutation: collection_mutation_view: add methods to inspect content mutation/collection_mutation: add collection_mutation_writer mutation/collection_mutation: collection_mutation(): generate valid collection mutation/collection_mutation: collection_mutation(): remove unused abstract_type param mutation/atomic_cell: drop unused type param from from_bytes()	2026-05-21 17:10:40 +03:00
Andrzej Jackowski	f8156702de	tree: add missing -present to copyright headers ~2076 files used "Copyright (C) YYYY-present ScyllaDB" while ~88 files used "Copyright (C) YYYY ScyllaDB". This inconsistency leads to unnecessary code review discussions and gradual spread of the less common format. Standardize all ScyllaDB copyright headers to use -present. Fixes SCYLLADB-1984 Closes scylladb/scylladb#29876	2026-05-21 10:57:42 +02:00
Botond Dénes	da7903de79	test: move away from collection_mutation_description Use collection_mutation_writer instead.	2026-05-21 10:23:29 +03:00
Botond Dénes	c76ab90fb2	test: move away from collection_mutation_view::with_deserialized() Use the collection_mutation_view directly.	2026-05-21 10:23:29 +03:00
Botond Dénes	7c8b5681f4	mutation/collection_mutation: extract compact_and_expire() to free function The new free-function variant operates on a collection_mutation_view directly, instead of on collection_mutation_description.	2026-05-21 10:23:15 +03:00
Botond Dénes	c5d12d44c6	test/boost/mutation_test: add tests for collection_mutation_{view,writer} Test the new facilities for producing and inspecting serialized collection mutations directly, without intermediate formats.	2026-05-21 08:34:21 +03:00
Botond Dénes	24fdfa34dd	mutation/collection_mutation: collection_mutation(): remove unused abstract_type param	2026-05-21 08:34:21 +03:00
Marcin Maliszkiewicz	83823149e9	Merge 'audit: implement audit_rules config' from Andrzej Jackowski This patch series adds `audit_rules`, a new audit configuration option for fine-grained, role-aware audit filtering with per-rule sink routing. Rules can be configured in `scylla.yaml` or updated live through `system.config` without restarting the node. Each rule specifies target sinks (`table`, `syslog`), statement categories, qualified table name patterns, and role patterns. Table and role patterns use POSIX `fnmatch` with extended glob syntax. For table-scoped categories (`DML`, `DDL`, `QUERY`), a rule matches only when the category, role, and qualified table name all match. For table-independent categories (`AUTH`, `ADMIN`, `DCL`), the table filter is ignored. Empty category or role lists match nothing; an empty table list matches nothing only for table-scoped categories. The new rules are additive with the existing `audit_categories`, `audit_keyspaces`, and `audit_tables` settings: both mechanisms are evaluated for each audit event, and the final sink set is the union of all matches. To avoid evaluating glob patterns on every audit event, audit rules use a preprocessed cache of known roles and tables. The cache is kept in sync through group0 role/table snapshots, role-change notifications, and schema migration notifications. For known entities, rule matching uses precomputed role/table rule sets; unknown entities fall back to direct rule evaluation. When `audit_rules` is empty, per-event rule matching returns immediately and does not evaluate glob patterns. Audit still keeps known role/table metadata in sync while audit is enabled, so rules can be enabled later through live configuration updates without restarting the node. Performance Measured with `perf-simple-query --smp 1 --duration 100` against a null syslog socket. 
Results show no regression when audit is disabled, and audit-rules performance has at most 1% more instructions than legacy config for equivalent workloads:
```
==============================================================================================================================================
Configuration                                | Binary   | throughput (tps) | insns/op        | cpu_cycles/op   | alloc/op | logal/op | task/op
==============================================================================================================================================
audit=none [1]                               | baseline | 206922.4         | 36591.6         | 15348.3         | 58.1     | 0.0      | 14.1
audit=none [1]                               | this PR  | 207856.4 (+0.5%) | 36544.9 (-0.1%) | 15274.0 (-0.5%) | 58.1     | 0.0      | 14.1
----------------------------------------------------------------------------------------------------------------------------------------------
audit=syslog keyspaces=ks [2]                | baseline | 94871.8          | 54163.0         | 27172.4         | 72.0     | 0.0      | 24.0
audit=syslog keyspaces=ks [2]                | this PR  | 96138.4 (+1.3%)  | 54072.3 (-0.2%) | 26699.3 (-1.7%) | 72.0     | 0.0      | 24.0
audit=syslog audit-rules=ks [3]              | this PR  | 95142.1 (+0.3%)  | 54457.8 (+0.5%) | 26953.8 (-0.8%) | 72.0     | 0.0      | 24.0
----------------------------------------------------------------------------------------------------------------------------------------------
audit=syslog keyspaces=ks-non-existent [4]   | baseline | 213997.8         | 36735.6         | 14848.1         | 58.1     | 0.0      | 14.1
audit=syslog keyspaces=ks-non-existent [4]   | this PR  | 219297.2 (+2.5%) | 36667.3 (-0.2%) | 14500.1 (-2.3%) | 58.1     | 0.0      | 14.1
audit=syslog audit-rules=ks-non-existent [5] | this PR  | 211038.7 (-1.4%) | 36999.7 (+0.7%) | 15048.6 (+1.4%) | 58.1     | 0.0      | 14.1
==============================================================================================================================================
[1] ./scylla perf-simple-query --smp 1 --duration 100 --audit "none"
[2] ./scylla perf-simple-query --smp 1 --duration 100 --audit "syslog" --audit-keyspaces "ks" --audit-categories "DCL,DDL,AUTH,DML,QUERY" --audit-unix-socket-path "/tmp/audit-null.sock"
[3] ./scylla perf-simple-query --smp 1 --duration 100 --audit "syslog" --audit-rules '[{"sinks":["syslog"],"categories":["DCL","DDL","AUTH","DML","QUERY"],"qualified_table_names":["ks."],"roles":[""]}]' --audit-unix-socket-path "/tmp/audit-null.sock"
[4] ./scylla perf-simple-query --smp 1 --duration 100 --audit "syslog" --audit-keyspaces "ks-non-existent" --audit-categories "DCL,DDL,AUTH,DML,QUERY" --audit-unix-socket-path "/tmp/audit-null.sock"
[5] ./scylla perf-simple-query --smp 1 --duration 100 --audit "syslog" --audit-rules '[{"sinks":["syslog"],"categories":["DCL","DDL","AUTH","DML","QUERY"],"qualified_table_names":["ks-non-existent."],"roles":[""]}]' --audit-unix-socket-path "/tmp/audit-null.sock"
audit-null.sock was created with `socat -u UNIX-RECV:/tmp/audit-null.sock,type=2 OPEN:/dev/null`
```
Fixes: SCYLLADB-1430 No backport: new feature Closes scylladb/scylladb#29267 * github.com:scylladb/scylladb: test: alternator: audit: rules filtering and batch bypass test: perf: add --audit-rules option to perf-simple-query docs: add audit rules section to the auditing guide test: audit: cover role and schema cache notifications test: audit: cover audit rules cluster behavior audit: rebuild rule caches on group0 snapshot and role changes audit: refresh rule caches on schema, role, and config changes audit: route matching rules to configured sinks test: cover preprocessed audit rule cache audit: add preprocessed rule matching cache audit: pass sink targets to storage helpers test: audit: cover rule matching semantics audit: add rule matching
and sink helpers test: audit: cover audit_rules configuration config: add live audit_rules option test: cover audit rule parsing and validation audit: define audit_rule type with parsing and validation	2026-05-20 14:10:45 +02:00
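The matching semantics described in the `audit_rules` merge above can be sketched roughly as follows. This is an illustrative Python model, not the ScyllaDB implementation: the field names `sinks`, `categories`, `qualified_table_names`, and `roles` come from the JSON shown in the commit message, while `AuditRule`, `rule_matches`, and `sinks_for` are hypothetical names, and Python's `fnmatch.fnmatchcase` only approximates POSIX `fnmatch` with extended glob syntax. The union with the legacy `audit_categories`/`audit_keyspaces`/`audit_tables` settings and the preprocessed cache are not modeled.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase

# Categories for which a rule must also match the qualified table name
# (taken from the commit message; AUTH, ADMIN and DCL ignore the table filter).
TABLE_SCOPED = {"DML", "DDL", "QUERY"}


@dataclass
class AuditRule:  # hypothetical container mirroring the JSON rule fields
    sinks: list
    categories: list
    qualified_table_names: list
    roles: list


def rule_matches(rule, category, role, table):
    """Return True if the rule selects an event with this category, role and table."""
    if category not in rule.categories:
        return False  # an empty category list matches nothing
    if not any(fnmatchcase(role, pattern) for pattern in rule.roles):
        return False  # an empty role list matches nothing
    if category in TABLE_SCOPED:
        # An empty table list matches nothing, but only for table-scoped categories.
        return any(fnmatchcase(table, pattern) for pattern in rule.qualified_table_names)
    return True  # table-independent category: table filter ignored


def sinks_for(rules, category, role, table):
    """Union of the sinks of all rules that match the event."""
    selected = set()
    for rule in rules:
        if rule_matches(rule, category, role, table):
            selected.update(rule.sinks)
    return selected
```

For example, with a single rule `AuditRule(["syslog"], ["DML", "AUTH"], ["ks.*"], ["admin*"])`, a `DML` event by role `admin1` on `ks.t` is routed to `syslog`, the same event on `other.t` is not audited by this rule, and an `AUTH` event by `admin1` is routed to `syslog` regardless of the table.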
Avi Kivity	6df04c9e5b	Update seastar submodule Changed seastar::http::experimental to seastar::http to reflect graduation of the seastar http API. Changed call to seastar::rename_file() (in sstables/storage.cc, sstables/sstable_directory.cc, sstables/sstables.cc and db/hints/internal/hint_storage.cc) to reflect new default parameter. Updated scylla_gdb test helper get_task() to work with updated accept loop in Seastar. This is just test code (attempts to find a task to operate on), not used in real scylla-gdb.py work, but nevertheless the adjustment keeps backward compatibility. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1798 Fixes https://scylladb.atlassian.net/browse/SCYLLADB-2043 * seastar 485a62b2...510f3148 (43): > reactor_backend: fix iocb double-free and shutdown hang during AIO teardown > file: fix default DMA alignment > http: add to_reply() to redirect_exception with extra-header support > core: propagate syscall errors via `coroutine::exception` > file: assert dma alignments are powers of two > doc: Document undocumented io_tester features and fix output example > backtrace: print the build_id along with the backtrace > reactor: default to oneline backtraces > Merge 'json: formatter: support types with user-defined conversion to sstring' from Benny Halevy tests: json_formatter: test formatter::write with string types json: formatter: support types with user-defined conversion to sstring > httpd_test: fix build failure with Seastar_SSTRING=OFF > net/tls: introduce ssl_call wrapper for SSL I/O > build: disable unused command line argument error for C++ module > coroutine/generator: fix setup of generator's waiting task > tests/tls: set 1000-day validity for self-signed CA cert > net: tls: openssl: disable certificate compression > reactor: reduce steady_clock::now() calls per scheduling quantum > fair_queue: remove notify_request_finished() > loop: use small_vector for parallel_for_each_state incomplete futures > dodge false sharing in spinlock > 
Merge 'Handle nowait support for reads and writes independently' from Pavel Emelyanov file: Change nowait_works mode detection file: Introduce read-only nowait_mode filesystem: Make nowait_works bit a enum class too file: Make nowait_works bit a enum class > Merge 'net/tls: improve OpenSSL error queue hygiene' from Gellért Peresztegi-Nagy net/tls: assert clean error queue before SSL operations net/tls: clear error queue after successful SSL operations net/tls: clear error queue after successful SSL_CTX_new net/tls: drain error queue on unexpected error codes net/tls: use make_openssl_error for BIO creation failure > vla.hh: add missing includes > Merge 'smp: make smp::count non-static' from Avi Kivity smp: convert all smp::count usages to instance-aware alternatives smp: add per-instance shard_count and this_smp() infrastructure disk_params: document pre-init smp::count access with explicit 0 reactor_backend: document pre-init smp::count access with explicit 0 tests: alien_test: pass shard count to alien thread explicitly > build: fix cmake missing ninja on Ubuntu 26.04 > rpc: Fix uint64 wraparound of expired timeout in send_entry() > Merge 'Generalize some RPC tests' from Pavel Emelyanov tests: Generalize async connection-based scheduling RPC tests tests: Generalize sync connection-based scheduling RPC tests tests: Remove redundant variadic/nonvariadic RPC tuple tests tests: Generalize max timeout RPC tests > net: tls: openssl: Share BIO ptrs across shards > http: fix compilation on clang 22 with c++26 > build: openssl tools needed for test cert generation > reactor: support rename2 > future: fix forwarding of reference types > Merge 'Zero-copy http chunked data sink' from Pavel Emelyanov http: Make chunked data sink zero-copy tests/prometheus_http: Rewrite on top of http::client tests/httpd: Rewrite content_length_limit on top of http::client > tests: Replace ad-hoc http_consumer with production HTTP parser > Merge 'co_return to accept same expressions and types 
as return' from Alexey Bashtanov tests/unit/{coroutines,futures}: strict types on co_return and set_value api: introduce version 10: core/{coroutine,future}: make `co_return` more strict with types core/{coroutine,future}: preparations to fix `co_return` type semantics > Merge 'Perftune.py: add special handling for mlx5 rss queues number calculation' from Vladislav Zolotarov perftune.py: NetPerfTuner: enhance RSS (a.k.a. "Rx") queues accounting for mlx5 devices perftune.py: update docstring of NetPerfTuner.__get_rps_cpus() method perftune.py: add a method that parses and models the output of the 'ethtool -l' command for a given interface > httpd: rewrite do_accepts/do_accept_one as coroutines > file: add mmap support to file > http: Move client code out of experimental namespace > file: add hugetlbfs support to file system detection > tests: Replace test_source_impl with util::as_input_stream > tests: Replace buf_source_impl with util::as_input_stream > Merge 'rpc_tester: expose throuput for rpc tester' from Marcin Szopa rpc_tester: remove unused payload size variable from job_rpc_streaming class rpc_tester: add start time tracking for throughput calculation, print throughput and msg/s for job_rpc rpc_tester: refactor result emission to use dedicated functions for messages and throughput > iostream: cast first argument of `std::min` to `size_t` Closes scylladb/scylladb#29952	2026-05-20 13:47:12 +03:00
Andrzej Jackowski	7afb90aa6f	test: cover preprocessed audit rule cache The rule cache is the fast path for matching, so its hit, fallback, refresh, and category-bypass behavior needs focused unit coverage. Test transparent hash consistency, cached and uncached lookup paths, incremental entity add/remove, rule refresh, and empty-rules short circuit. Refs SCYLLADB-1430	2026-05-20 06:55:15 +02:00
Andrzej Jackowski	67ecdba456	test: audit: cover rule matching semantics Rule matching is reused by both the preprocessed cache and the fallback path -- unit-test it separately so coupling failures do not mask matching bugs. Cover category bitmask, glob patterns for tables and roles, AUTH/ADMIN/DCL table bypass, empty-keyspace batch bypass, and sink bitmask conversion. Refs SCYLLADB-1430	2026-05-20 06:55:15 +02:00
Andrzej Jackowski	762fd5d455	test: audit: cover audit_rules configuration Audit rules enter through three paths (YAML, CQL, CLI), each with its own parsing and tracking -- cover all entry points before routing can depend on them. Test loading from YAML, live update via CQL and server API, CLI parsing, invalid value rejection at each path, and observer notification on live update. Refs SCYLLADB-1430	2026-05-20 06:55:14 +02:00
Andrzej Jackowski	3cc55dd6eb	test: cover audit rule parsing and validation Parsing and validation are the first consumer-visible surface of audit rules -- cover them before building higher layers. Test JSON parsing (valid, malformed, missing fields), rule validation (unknown sinks, invalid categories), and JSON round-trip serialization. Refs SCYLLADB-1430	2026-05-20 06:55:14 +02:00
Pavel Emelyanov	c23b086400	test: Use inject_parameter() in row_cache_test Replace get_injection_parameters().contains() with inject_parameter() for polling the "suspended" signal. The inject_parameter() API is more appropriate for checking a single parameter and reduces the usage of the lower-level get_injection_parameters() bulk accessor. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-19 18:23:11 +03:00
Szymon Malewski	15493872b2	vector_search: fix decimal/varint precision loss in filter value_to_json() value_to_json() converts CQL values to JSON for vector search filters. For decimal and varint types, it used rjson::parse() on the JSON string, which parses through a double and silently loses precision for values exceeding ~15 significant digits — producing wrong filter results. Additionally, for decimal type we need an exact string representation that preserves the original (unscaled, scale) pair, because partition keys use byte-level identity: different serialized representations of the same numeric value are distinct rows, so the filter must reproduce the exact representation stored in the key. Add big_decimal::to_string_canonical() which follows the Java BigDecimal toString() spec (JDK 8+), producing a bijective string representation that uses exponential notation for extreme scales instead of expanding trailing zeros (which could cause OOM). This could replace to_string(), but doing so has wider consequences (e.g. hash/equality contract for decimal_type) described in SCYLLADB-1574. Use it in value_to_json() for decimal_type, and use rjson::from_string() for varint_type, both bypassing the lossy double parse path. Tests cover the new to_string_canonical() and the filter fix, as well as existing decimal type behavior (key representation, clustering order, toJson) that we rely on and must not break. The CQL decimal type tests (test_type_decimal.py) also pass against Cassandra. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1583 Refs: https://scylladb.atlassian.net/browse/SCYLLADB-1574 Closes scylladb/scylladb#29505	2026-05-18 17:07:26 +03:00
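The two problems named in the `value_to_json()` fix above can be illustrated with a short Python sketch. It does not use ScyllaDB or rjson code; it only shows, under the assumption that Python's `float` behaves like the double used by the lossy parse path, why routing a large decimal or varint through a double loses digits, and why two decimals that compare equal can still have different (unscaled, scale) pairs. The helper name `unscaled_and_scale` is hypothetical.

```python
from decimal import Decimal

# 1. Precision loss: a double keeps only about 15-17 significant digits.
text = "12345678901234567890.123456789"
exact = Decimal(text)                 # exact value taken from the string
through_double = Decimal(float(text)) # value after a round trip through a double
lossy_decimal = through_double != exact

big_int_text = "123456789012345678901234567890"
lossy_varint = int(float(big_int_text)) != int(big_int_text)


# 2. Representation identity: equal values, different (unscaled, scale) pairs.
def unscaled_and_scale(d):
    """Split a finite Decimal into (unscaled integer, scale), as in value = unscaled * 10**-scale."""
    sign, digits, exponent = d.as_tuple()
    unscaled = int("".join(map(str, digits)))
    return (-unscaled if sign else unscaled, -exponent)


a, b = Decimal("1.0"), Decimal("1.00")
same_value = a == b
same_representation = unscaled_and_scale(a) == unscaled_and_scale(b)
```

Here `lossy_decimal` and `lossy_varint` are both true, and `a` and `b` are numerically equal while their pairs are `(10, 1)` and `(100, 2)`. This is the distinction a byte-level partition key comparison would observe, which is why the filter needs a string form that preserves the pair.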
Marcin Maliszkiewicz	628e1ef2de	Merge 'Introduce auth::config to decouple auth modules from db::config' from Pavel Emelyanov Auth modules (authenticators, role managers, and auth::service) access their configuration options by reaching into db::config through the query processor. This abuses the database as a proxy object to get configuration. This series introduces a dedicated auth::config struct that carries the configuration options used by auth modules. The config is populated in main.cc and delivered to each shard via sharded_parameter. This makes the auth service conform to the overall design, where db::config is split into smaller per-service configs on start, thus decoupling individual components/services from global configuration. Cleaning up component dependencies, not backporting. Closes scylladb/scylladb#29870 * github.com:scylladb/scylladb: auth: Remove unused default_superuser() function auth: Switch role managers to use auth::config auth: Switch authenticators to use auth::config auth: Introduce auth::config and wire it through service	2026-05-18 11:32:11 +02:00
Pavel Emelyanov	9b58d2213b	auth: Switch role managers to use auth::config Convert all role manager implementations to receive their configuration from auth::config instead of accessing db::config through the query processor: - standard_role_manager: reads superuser name from config - ldap_role_manager: reads LDAP URL template, attribute, bind credentials, and permissions update interval from config; passes config to inner standard_role_manager - maintenance_socket_role_manager: keeps a const reference to service's config and passes it directly when lazily constructing standard_role_manager Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-05-15 18:55:02 +03:00
Petr Gusev	8a76ec7e65	test/boost: add regression test for missing tablet routing after CAS bounce Add test_tablet_routing_info_after_cas_shard_bounce that verifies TABLETS_ROUTING_V1 payload is returned after an internal CAS shard bounce. The test simulates the transport-layer bounce: it creates a table whose single tablet replica lands on a shard different from the test thread, executes an LWT (which bounces), then transfers client_state via client_state_for_another_shard (preserving _original_shard) and re-executes on the tablet shard. The test asserts that check_locality() correctly detects the misrouting and returns tablet routing info. Refs SCYLLADB-2041	2026-05-15 11:56:14 +02:00
Avi Kivity	6db152afbb	Update seastar submodule Drop local formatter for seastar::http::reply, which should have been added to Seastar in the first place, and now conflicts. Also drop local formatters for types that are aliases for Seastar types which have gained formatters. Disable recently-gained TLS use of OpenSSL instead of gnutls. We don't need it, and it causes link errors with LTO. Fix incorrect skipping in encrypted_file_test, which computed the remaining stream length but did not account for already consumed size_to_compare. Change utils::gcp::storage::client::object_data_source::skip() to match new Seastar behavior (rejecting skip-past-eof with an exception). This is needed since `30f1075544` switched the test's data source to a Seastar implementation. It is also more correct - if we're asked to skip n bytes but the stream doesn't have n bytes, this is a protocol violation. Contains test fix from Pavel, exposed by [1]: test: Handle premature EOF in test_gcp_storage_skip_read The test intentionally uses file_size larger than the actual object to exercise EOF behavior. When input_stream::skip() is called after EOF, it throws std::runtime_error("premature end of stream"). Catch this specific exception from both streams, verify they agree, and exit the loop gracefully. 
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> [1] `cbd1e17d2f`, included in this Seastar submodule update * seastar 4d268e0e...485a62b2 (50): > reactor: open_directory(): honor bypass_fsync > http: Add formatters for http::request and http::reply > Merge 'Assorted set of io-tester cleanups' from Pavel Emelyanov io_tester: Remove unused and internal-only accessor io_tester: Move think-time machinery into thinker_state io_tester: Move _file to io_class_data io_tester: Replace class_data::_start member with a local variable io_tester: Move _alignment from class_data to io_class_data io_tester: Remove buffer allocation from top-level request issuing io_tester: Cleanup context::stop() invocation io_tester: Allocate write buffer once to fill a file io_tester: Declare quantiles arrays as static constexpr io_tester: Drop class_data::type_str() io_tester: Replace != "" comparisons with .empty() io_tester: Replace gen_class_data() if/else chain with a switch io_tester: Deduplicate vectorized I/O classes > io_tester: fix crash from missing metric during startup > net: tls: adjust openssl integration to new module support > http/client: Count and export integrated queue length > Merge 'Introduce pipe_data_source_impl and pipe_data_sink_impl' from Pavel Emelyanov fstream: add pipe_data_source_impl and pipe_data_sink_impl pollable_fd: add write_some/write_all backed by writev pollable_fd: rename write_some/write_all(iovec) to send_some/send_all > reactor: Make pollable_fd_state helper methods private > module: extend seastar.cppm with comprehensive public API exports > Merge 'Add exhaustive input_stream invariant test + fixes' from Pavel Emelyanov tests: add exhaustive input_stream read/skip invariant test iostream: make skip() reject premature end of stream with exception > Merge 'Allow runtime selectability of GnuTLS or OpenSSL' from Noah Watkins net/tls: avoid potential read-past-buffer net/tls: move 
credential methods to generic tls layer net/tls: rename credentials_impl::dh_params to set_dh_params test/tls: enable openssl tls unit test test/tls: fix CA cert generation to use v3_ca extensions github: disable parallel test execution in alpine workflow crypto: support compiling seastar without gnutls net/tcp: use crypto provider for md5 calculation tls: fix test_peer_certificate_chain_handling for OpenSSL net/tls: fix test for self-signed server cert opoenssl compat net/tls: disable priority strings test for openssl provider core/crypto: expose crypto backend name for introspection test/tls: remove gnutls version guard net/tls: add openssl tls backend http: use backend agnostic tls error code net/tls: make error codes configurable by each tls backend net/tls: move reloadable_credentials to generic tls layer net/tls: move build_certificate to generic tls layer net/tls: move apply_to() to generic tls layer net/tls: move credential methods to generic tls layer net/tls: add OpenSSL-specific methods to public API with no-op defaults net/tls: introduce dh_params and credentials abstraction layer net/tls: add credentials_impl abstract base class net/tls: dispatch tls::error_category() through crypto_provider net/tls: dispatch wrap_client/wrap_server through crypto_provider net/tls: add tls_backend interface to crypto_provider net/tls: move public tls API methods to generic tls layer net/tls: move formatting utilities to generic tls layer net/tls: move credentials_builder blob methods to generic tls layer net/tls: move dh_params::from_file to generic tls layer net/tls: move abstract_credentials file methods to generic tls layer net/tls: move tls_socket_impl to generic tls layer net/tls: move server_session to general tls layer net/tls: move tls_connected_socket_impl to generic tls layer net/tls: move net::get_impl to generic tls layer net/tls: move session_ref to generic tls layer net/tls: add session_impl abstract interface for tls pluggability net/tls: rename tls.cc 
to be gnutls specific crypto: introduce crypto provider abstraction http: remove unused include > tls: test_send_two_large > rpc: include exception type for remote errors > GHA: increase timeout to 60 minutes > apps/httpd: replace deprecated reply::done() with write_body() > missing header(s) > net: Fix missing throw for runtime_error in create_native_net_device > tests/io_queue: account for token bucket refill granularity in bandwidth checks > Merge 'iovec: fix iovec_trim_front infinite loop on zero-length iovecs' from Travis Downs tests: add regression tests for zero-length iovec handling iovec: fix iovec_trim_front infinite loop on zero-length iovecs > util/process: graduate process management API from experimental > cooking: don't register ready.txt as a build output > sstring: make make_sstring not static > Add SparkyLinux to debian list in install-dependencies.sh > http: allow control over default response headers > Merge 'chunked_fifo: make cached chunk retention configurable' from Brandon Allard tests/perf: add chunked_fifo microbenchmarks chunked_fifo: set the default free chunk retention to 0 chunked_fifo: make free chunk retention configurable > Merge 'reactor_backend: fix pollable_fd_state_completion reuse in io_uring' from Kefu Chai tests: add regression test for pollable_fd_state_completion reuse reactor_backend: use reset() in AIO and epoll poll paths reactor_backend: fix pollable_fd_state_completion reuse after co_await in io_uring > Merge 'coroutine: Generator cleanups' from Kefu Chai coroutine/generator: extract schedule_or_resume helper coroutine/generator: remove unused next_awaiter classes coroutine/generator: remove write-only _started field coroutine/generator: assert on unreachable path in buffered await_resume coroutine/generator: add elements_of tag and #include <ranges> coroutine/generator: add empty() to bounded_container concept > cmake: bump minimum Boost version to 1.79.0 > seastar_test: remove unnecessary headers > cmake: bump 
minimum GnuTLS version to 3.7.4 > Merge 'reactor: add get_all_io_queues() method' from Travis Downs tests: add unit test for reactor::get_all_io_queues() reactor: add get_all_io_queues() method reactor: move get_io_queue and try_get_io_queue to .cc file > http: deprecate reply::done(), remove _response_line dead field > core: Deprecate scattered_message > ci: add workflow dispatch to tests workflow > perf_tests: exit non-zero when -t pattern matches no tests > Replace duplicate SEGV_MAPERR check in sigsegv_action() with SEGV_ACCERR. > perf_tests: add total runtime to json output > Merge 'Relax large allocation error originating from json_list_template' from Robert Bindar implement move assignment operator for json_list_template json_list_template copy assignment operator reserves capacity upfront > perf_tests: add --no-perf-counters option > Merge 'Fix to_human_readable_value() ability to work with large values' from Pavel Emelyanov memory: Add compile-time test for value-to-human-readable conversion memory: Extend list of suffixes to have peta-s memory: Fix off-by-one in suffix calculation memory: Mark to_human_readable_value() and others constexpr > http: Improve writing of response_line() into the output > Merge 'websocket: add template parameter for text/binary frame mode and implement client-side WebSocket' from wangyuwei websocket: add template parameter for text/binary frame mode websocket: impl client side websocket function > file: Fix checks for file being read-only > reactor: Make do_dump_task_queue a task_queue method > Merge 'Implement fully mixed mode for output_stream-s' from Pavel Emelyanov tests/output_stream: sample type patterns in sanitizer builds tests/output_stream: extend invariant test to cover mixed write modes iostream: allow unrestricted mixing of buffered and zero-copy writes tests/output_stream: remove obsolete ad-hoc splitting tests tests/output_stream: add invariant-based splitting tests iostream: rename output_stream::_size to 
::_buffer_size > reactor_backend: replace virtual bool methods with const bool_class members > resource: Avoid copying CPU vector to break it into groups > perf_tests: increase overhead column precision to 3 decimal places > Merge 'Move reactor::fdatasync() into posix_file_impl' from Pavel Emelyanov reactor: Deprecate fdatasync() method file: Do fdatasync() right in the posix_file_impl::flush() file: Propagate aio_fdatasync to posix_file_impl reactor: Move reactor::fdatasync() code to file.cc reactor,file: Make full use of file_open_options::durable bit file: Add file_open_options::durable boolean file: Account io_stats::fsyncs in posix_file_impl::flush() reactor: Move _fsyncs counter onto io_stats > http: Remove connection::write_body() Closes scylladb/scylladb#29553	2026-05-14 10:45:39 +03:00
Tomasz Grabiec	66439bb753	Merge 'load_balancer: apply balance threshold to intranode shard balancing' from Ferenc Szili - Fix intranode shard balancing to respect the size-based balance threshold, preventing unnecessary migrations when load difference between shards is negligible - Add a regression test that verifies the threshold is respected for intranode balancing The intranode shard balancing loop only stopped when the algorithm exhausted the migration candidates or when a migration would go against convergence (it would increase imbalance instead of decrease it). This caused unnecessary tablet migrations for negligible imbalances (e.g., 0.78% difference between shards). The inter-node balancer already uses `is_balanced()` to stop when the relative load difference is within the configured `size_based_balance_threshold`, but this check was missing from the intranode path. Apply the same `is_balanced()` threshold check that is already used for inter-node balancing to the intranode convergence loop. When the relative load difference between the most-loaded and least-loaded shards on a node is within the threshold, the balancer now stops without issuing further migrations. The test creates a single node with 2 shards and 512 tablets: 1. Balanced scenario (257 vs 255 tablets, same size): relative diff = 0.78% < 1% threshold → verifies no intranode migration is emitted 2. Unbalanced scenario (307 vs 205 tablets, same size): relative diff = 33% >> 1% threshold → verifies intranode migration IS emitted Fixes: SCYLLADB-1775 This is a performance improvement which reduces the number of intranode migrations issued, and needs to be backported to versions with size-based load balancing: 2026.1 and 2026.2 Closes scylladb/scylladb#29756 * github.com:scylladb/scylladb: test: add test for intranode balance threshold in size-based mode tablet_allocator: apply balance threshold to intranode shard balancing	2026-05-13 13:09:52 +02:00
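The threshold arithmetic quoted in the intranode balancing fix above can be reproduced with a small sketch. The formula `(max - min) / max` is an assumption chosen because it yields the 0.78% and 33% figures from the commit message; the real `is_balanced()` in `tablet_allocator.cc` may be defined differently, and the function names and the inclusive comparison here are illustrative only.

```python
def relative_load_diff(max_load, min_load):
    """Relative difference between the most- and least-loaded shard (assumed formula)."""
    if max_load == 0:
        return 0.0
    return (max_load - min_load) / max_load


def is_balanced(max_load, min_load, threshold=0.01):
    """True when the relative difference is within the threshold (1% in the test scenario)."""
    return relative_load_diff(max_load, min_load) <= threshold


# Scenarios from the commit message: 2 shards, 512 equally sized tablets.
balanced_diff = relative_load_diff(257, 255)     # about 0.78%
unbalanced_diff = relative_load_diff(307, 205)   # about 33%
```

With equally sized tablets the tablet counts stand in for shard load, so `is_balanced(257, 255)` holds and no intranode migration would be emitted, while `is_balanced(307, 205)` does not hold and the balancer continues.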
Botond Dénes	e95eb21a16	Merge 'Tablet-aware restore' from Pavel Emelyanov The mechanics of the restore are as follows: - A /storage_service/tablets/restore API is called with (keyspace, table, endpoint, bucket, manifests) parameters - First, it populates the system_distributed.snapshot_sstables table with the data read from the manifests - Then it emplaces a bunch of tablet transitions (of a new "restore" kind), one for each tablet - The topology coordinator handles the "restore" transition by calling a new RESTORE_TABLET RPC against all the current tablet replicas - Each replica handles the RPC verb by - Reading the snapshot_sstables table - Filtering the read sstable infos against current node and tablet being handled - Downloading and attaching the filtered sstables This PR includes system_distributed.snapshot_sstables table from @robertbindar and preparation work from @kreuzerkrieg that extracts raw sstables downloading and attaching from existing generic sstables loading code. This is the first step towards SCYLLADB-197 and lacks many things. 
In particular - the API only works for a single-DC cluster - the caller needs to "lock" tablet boundaries with min/max tablet count - not abortable - no progress tracking - sub-optimal (re-kicking API on restore will re-download everything again) - not re-attachable (if API node dies, restoration proceeds, but the caller cannot "wait" for it to complete via another node) - nodes download sstables in maintenance/streaming scheduling group (should be moved to maintenance/backup) Other follow-up items: - have an actual swagger object specification for `backup_location` Closes #28436 Closes #28657 Closes #28773 Closes scylladb/scylladb#28763 * github.com:scylladb/scylladb: docs: Update topology_over_raft.md with `restore` transition kind test: Add test for backup vs migration race test: Restore resilience test sstables_loader: Fail tablet-restore task if not all sstables were downloaded sstables_loader: mark sstables as downloaded after attaching sstables_loader: return shared_sstable from attach_sstable db: add update_sstable_download_status method db: add downloaded column to snapshot_sstables db: extract snapshot_sstables TTL into class constant test: Add a test for tablet-aware restore tablets: Implement tablet-aware cluster-wide restore messaging: Add RESTORE_TABLET RPC verb sstables_loader: Add method to download and attach sstables for a tablet tablets: Add restore_config to tablet_transition_info sstables_loader: Add restore_tablets task skeleton test: Add rest_client helper to kick newly introduced API endpoint api: Add /storage_service/tablets/restore endpoint skeleton sstables_loader: Add keyspace and table arguments to manfiest loading helper sstables_loader_helpers: just reformat the code sstables_loader_helpers: generalize argument and variable names sstables_loader_helpers: generalize get_sstables_for_tablet sstables_loader_helpers: add token getters for tablet filtering sstables_loader_helpers: remove underscores from struct members sstables_loader: move 
download_sstable and get_sstables_for_tablet sstables_loader: extract single-tablet SST filtering sstables_loader: make download_sstable static sstables_loader: fix formating of the new `download_sstable` function sstables_loader: extract single SST download into a function sstables_loader: add shard_id to minimal_sst_info sstables_loader: add function for parsing backup manifests split utility functions for creating test data from database_test export make_storage_options_config from lib/test_services rjson: Add helpers for conversions to dht::token and sstable_id Add system_distributed_keyspace.snapshot_sstables add get_system_distributed_keyspace to cql_test_env code: Add system_distributed_keyspace dependency to sstables_loader storage_service: Export export handle_raft_rpc() helper storage_service: Export do_tablet_operation() storage_service: Split transit_tablet() into two tablets: Add braces around tablet_transition_kind::repair switch	2026-05-12 16:24:13 +03:00
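The replica-side filtering step of the tablet-aware restore ("filtering the read sstable infos against current node and tablet being handled") can be pictured with a small Python sketch. It is an illustration only, not ScyllaDB's C++ code: `SSTableInfo`, its fields, `filter_for_tablet`, the inclusive token bounds, and the "fully contained" rule are all assumptions, and the per-node part of the filter is omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SSTableInfo:
    """Hypothetical row read from the snapshot_sstables table."""
    sstable_id: str
    first_token: int
    last_token: int


def filter_for_tablet(infos: List[SSTableInfo],
                      tablet_first: int,
                      tablet_last: int) -> List[SSTableInfo]:
    """Keep only sstables whose token range lies inside the tablet's range.

    Assumption: both bounds are inclusive and an sstable must be fully
    contained in the tablet to be selected.
    """
    return [
        info for info in infos
        if tablet_first <= info.first_token and info.last_token <= tablet_last
    ]


infos = [
    SSTableInfo("a", 0, 99),
    SSTableInfo("b", 100, 199),
    SSTableInfo("c", 150, 250),  # straddles the tablet boundary
]
selected = filter_for_tablet(infos, tablet_first=100, tablet_last=199)
print([info.sstable_id for info in selected])
```

Under these assumptions only sstable `b` is selected for the tablet covering tokens 100 to 199; the real filter may treat boundaries and partially overlapping sstables differently.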
Avi Kivity	ddb1181103	Merge 'load_balance: fix drain with forced capacity-based balancing' from Ferenc Szili When `force_capacity_based_balancing` is enabled and a node is being drained/excluded, the tablet allocator incorrectly aborts balancing due to incomplete tablet stats - even though capacity-based balancing doesn't depend on tablet sizes. The tablet allocator normally waits for complete load stats before balancing. An exception exists for drained+excluded nodes (they're unreachable and won't return stats). However, when forced capacity-based balancing is active, this exception was not being applied, causing the balancer to reject the drain plan. Adjust the condition in `tablet_allocator.cc` so that the "ignore missing data for drained nodes" logic applies regardless of whether capacity-based balancing is forced. Added a Boost unit test that forces capacity-based balancing and verifies a drained/excluded node gets its tablets migrated even when tablet size stats are missing. This bug was introduced in 2026.1, so this needs to be backported to 2026.1 and 2026.2 Fixes: SCYLLADB-1803 Closes scylladb/scylladb#29791 * github.com:scylladb/scylladb: test: boost: add drain test for forced capacity-based balancing service: allow draining with forced capacity-based balancing	2026-05-12 12:38:25 +03:00
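The condition change described in the load_balance fix can be sketched in Python as a before/after pair. This is a hypothetical model, not the code in `tablet_allocator.cc`: the `Node` fields and both function names are invented for illustration, and the "before" version is only one plausible shape of the original condition.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    drained: bool
    excluded: bool
    has_tablet_stats: bool


def missing_stats_block_balancing_before(nodes: List[Node],
                                         force_capacity_based: bool) -> bool:
    """Assumed pre-fix behaviour: the exemption for drained+excluded nodes
    is skipped when capacity-based balancing is forced."""
    for node in nodes:
        if node.has_tablet_stats:
            continue
        exempt = node.drained and node.excluded and not force_capacity_based
        if not exempt:
            return True  # incomplete stats, balancing is aborted
    return False


def missing_stats_block_balancing_after(nodes: List[Node],
                                        force_capacity_based: bool) -> bool:
    """Post-fix behaviour as described: drained+excluded nodes are exempt
    regardless of force_capacity_based (the flag is intentionally unused)."""
    for node in nodes:
        if node.has_tablet_stats:
            continue
        if not (node.drained and node.excluded):
            return True
    return False


cluster = [
    Node(drained=True, excluded=True, has_tablet_stats=False),
    Node(drained=False, excluded=False, has_tablet_stats=True),
]
print(missing_stats_block_balancing_before(cluster, force_capacity_based=True))
print(missing_stats_block_balancing_after(cluster, force_capacity_based=True))
```

In this model the drained, excluded node without stats blocks balancing before the fix when capacity-based balancing is forced, and no longer blocks it afterwards. A live node with missing stats still blocks balancing in both versions.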
Pavel Emelyanov	1c0f8ab66e	Merge 'sstables: introduce --abort-on-malformed-sstable-error' from Botond Dénes When a malformed sstable error occurs, it is usually caused by actual sstable corruption — a cosmic ray, a bad disk write, etc. However, it can also be caused by memory corruption, where a data structure in memory happens to be read as sstable data. In the latter case, having a coredump of the process at the moment of the error is invaluable for post-mortem debugging, since the exception throwing/catching machinery destroys the stack frames that would point to the corruption site. This patch series introduces `--abort-on-malformed-sstable-error`, a new command-line option (with `LiveUpdate` support) that, when set, causes the server to call `std::abort()` instead of throwing an exception whenever any sstable parse error is detected. This covers all code paths: - Direct `throw malformed_sstable_exception(...)` sites (migrated to `throw_malformed_sstable_exception()`) - Direct `throw bufsize_mismatch_exception(...)` sites (migrated to `throw_bufsize_mismatch_exception()`) - `parse_assert()` failures (via `on_parse_error()`) - BTI parse errors (via `on_bti_parse_error()`) The implementation places the flag and helper functions in `sstables/sstables.cc`, next to the existing `on_parse_error()` / `on_bti_parse_error()` infrastructure. The flag defaults to `false`, preserving current behaviour. It is intended to be enabled temporarily when investigating suspected memory corruption. Commit breakdown: 1. Infrastructure: flag, getter/setter, and throw helpers in `sstables/sstables.cc`; config option wired up in `main.cc` 2. `on_parse_error()` and `on_bti_parse_error()` check the new flag 3. All ~50 `throw malformed_sstable_exception(...)` sites migrated 4. 
Both `throw bufsize_mismatch_exception(...)` sites migrated Refs: SCYLLADB-1087 Backport: new feature, no backport Closes scylladb/scylladb#29324 * github.com:scylladb/scylladb: sstables: migrate all bufsize_mismatch_exception throw sites to throw_bufsize_mismatch_exception() sstables: migrate all malformed_sstable_exception throw sites to throw_malformed_sstable_exception() sstables: make on_parse_error() and on_bti_parse_error() respect --abort-on-malformed-sstable-error sstables: disable abort-on-malformed-sstable-error in tests that corrupt sstables on purpose sstables: introduce --abort-on-malformed-sstable-error infrastructure sstables: refactor parse_path() to return std::expected<> instead of throwing	2026-05-12 12:38:25 +03:00
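The abort-or-throw pattern behind `--abort-on-malformed-sstable-error` can be illustrated with a short Python analogue. The real helpers are C++ functions in `sstables/sstables.cc` that call `std::abort()`; everything below (the exception class, the module-level flag, the setter name, and the injectable `abort` parameter) is a hypothetical stand-in chosen so the sketch can be run safely.

```python
import os
from typing import Callable


class MalformedSSTableError(Exception):
    """Stand-in for malformed_sstable_exception."""


# Defaults to False, mirroring the described default of the option.
_abort_on_malformed = False


def set_abort_on_malformed_sstable_error(enabled: bool) -> None:
    """Hypothetical setter, analogous to a live-updatable config hook."""
    global _abort_on_malformed
    _abort_on_malformed = enabled


def throw_malformed_sstable_exception(msg: str,
                                      abort: Callable[[], None] = os.abort) -> None:
    """Abort the process if the flag is set, otherwise raise.

    Aborting before any exception is raised keeps the failing stack
    intact in the core dump, which is the motivation given in the series.
    The `abort` parameter exists only so the sketch can be exercised
    without killing the interpreter.
    """
    if _abort_on_malformed:
        abort()
    raise MalformedSSTableError(msg)


try:
    throw_malformed_sstable_exception("bad index entry")
except MalformedSSTableError as exc:
    print(f"parse error reported as exception: {exc}")
```

Routing every throw site through one helper is what lets a single flag change the behaviour of all of them at once.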
Pavel Emelyanov	150345cc52	Merge 'test: per-bucket isolation for S3/GCS object storage tests' from Ernest Zaslavsky This series adds per-test bucket isolation to all S3 and GCS object storage tests. Previously, every test shared a single pre-created bucket, which meant tests could interfere with each other through leftover objects and could not run concurrently across multiple `test.py` processes without risking collisions. New `create_bucket`, `delete_bucket`, and `delete_bucket_with_objects` methods on `s3::client`, following the existing `make_request` pattern. `create_bucket` handles the `BUCKET_ALREADY_OWNED_BY_YOU` error gracefully. A new `s3_test_fixture` RAII class for C++ Boost tests that creates a uniquely-named bucket on construction (derived from the Boost test name and pid) and tears down everything — objects, bucket, client — on destruction. All S3 tests in `s3_test.cc` are migrated to use it, removing manual `deferred_delete_object` and `deferred_close` boilerplate. The minio server policy is broadened to allow dynamic bucket creation/deletion. A `client::make` overload that accepts a custom `retry_strategy`, used in tests with a fast 1ms retry delay instead of exponential backoff, significantly reducing test runtime for transient errors during bucket lifecycle operations. Python-side (`test/cluster/object_store`): each pytest fixture (`object_storage`, `s3_storage`, `s3_server`) now creates a unique bucket per test function via `create_test_bucket()` and destroys it on teardown. Bucket names are sanitized from the pytest node name with a short UUID suffix for uniqueness. Object storage helpers (`S3Server`, `MinioWrapper`, `GSFront`, `GSServerImpl`, factory functions, CQL helpers, `s3_server` fixture) are extracted from `test/cluster/object_store/conftest.py` into a shared `test/pylib/object_storage.py` module, eliminating duplication across test suites. The conftest becomes a thin re-export wrapper. 
Old class names are preserved as aliases for backward compatibility.

| Test name | Execution time with test-specific retry strategy (ms) | Original execution time (ms) | Δ (ms) | Speedup |
|---|---:|---:|---:|---:|
| test_client_upload_file_multi_part_with_remainder_proxy | 19,261 | 61,395 | −42,134 | 3.2× |
| test_client_upload_file_multi_part_without_remainder_proxy | 16,901 | 53,688 | −36,787 | 3.2× |
| test_client_upload_file_single_part_proxy | 3,478 | 6,789 | −3,311 | 2.0× |
| test_client_multipart_copy_upload_proxy | 1,303 | 1,619 | −316 | 1.2× |
| test_client_put_get_object_proxy | 150 | 365 | −215 | 2.4× |
| test_client_readable_file_stream_proxy | 125 | 327 | −202 | 2.6× |
| test_small_object_copy_proxy | 205 | 389 | −184 | 1.9× |
| test_client_put_get_tagging_proxy | 181 | 350 | −169 | 1.9× |
| test_client_multipart_upload_proxy | 1,252 | 1,416 | −164 | 1.1× |
| test_client_list_objects_proxy | 729 | 881 | −152 | 1.2× |
| test_chunked_download_data_source_with_delays_proxy | 830 | 960 | −130 | 1.2× |
| test_client_readable_file_proxy | 148 | 279 | −131 | 1.9× |
| test_client_upload_file_multi_part_with_remainder_minio | 3,358 | 3,170 | +188 | 0.9× |
| test_client_upload_file_multi_part_without_remainder_minio | 3,131 | 2,929 | +202 | 0.9× |
| test_client_upload_file_single_part_minio | 519 | 421 | +98 | 0.8× |
| test_download_data_source_proxy | 180 | 237 | −57 | 1.3× |
| test_client_list_objects_incomplete_proxy | 590 | 641 | −51 | 1.1× |
| test_large_object_copy_proxy | 952 | 991 | −39 | 1.0× |
| test_client_multipart_upload_fallback_proxy | 148 | 185 | −37 | 1.3× |
| test_client_multipart_copy_upload_minio | 641 | 674 | −33 | 1.1× |

No backport needed: this is a test infrastructure improvement with no production code impact beyond 
the new `s3::client` methods. Closes scylladb/scylladb#29508 * github.com:scylladb/scylladb: test: extract object storage helpers to test/pylib/object_storage.py test: add per-test bucket isolation to object_store fixtures s3: add client::make overload with custom retry strategy test: add s3_test_fixture and migrate tests to per-bucket isolation s3: add create_bucket and delete_bucket to client	2026-05-12 12:38:24 +03:00
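The per-test bucket naming described for the Python fixtures (a name sanitized from the pytest node name plus a short UUID suffix) might look roughly like the sketch below. The function name `make_bucket_name`, the 8-character suffix, and the `test` fallback are assumptions, not the contents of `test/pylib/object_storage.py`. The character and length limits follow the general S3 bucket naming rules (3 to 63 characters, lowercase letters, digits, and hyphens, starting and ending with a letter or digit).

```python
import re
import uuid

MAX_BUCKET_NAME_LEN = 63  # S3 limit on bucket name length


def make_bucket_name(node_name: str, suffix_len: int = 8) -> str:
    """Derive a unique, S3-compatible bucket name from a pytest node name."""
    # Lowercase and collapse every run of disallowed characters to one hyphen.
    base = re.sub(r"[^a-z0-9-]+", "-", node_name.lower()).strip("-")
    suffix = uuid.uuid4().hex[:suffix_len]
    # Leave room for the separating hyphen and the suffix.
    max_base_len = MAX_BUCKET_NAME_LEN - len(suffix) - 1
    base = base[:max_base_len].strip("-") or "test"
    return f"{base}-{suffix}"


name = make_bucket_name("test_foo[param-1]::Bar")
print(name)  # starts with "test-foo-param-1-bar-", followed by 8 hex digits
```

The random suffix is what allows several `test.py` processes to run the same test concurrently without colliding on a bucket name.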


4768 Commits