scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-03 05:26:58 +00:00

Author	SHA1	Message	Date
Avi Kivity	e10c124cd3	cql3: statement_restrictions: replace has_slice with predicate is_slice check In the clustering prefix construction loop, replace the has_slice() call (which uses find_binop to search the merged predicate's expression tree for slice operators) with a direct check on the individual predicate vector's is_slice field.	2026-05-29 16:50:02 +03:00
Avi Kivity	4c282f588a	cql3: statement_restrictions: replace contains_multi_column_restriction filter with _has_multi_column In calculate_column_defs_for_filtering_and_erase_restrictions_used_for_index(), the code extracted multi-column boolean factors from _clustering_columns_restrictions. Since multi-column and single-column CK restrictions cannot be mixed (the constructor enforces this), when _has_multi_column is true, ALL factors are multi-column. Simplify to just adding _clustering_columns_restrictions directly when _has_multi_column is set. This removes the last caller of contains_multi_column_restriction(), allowing the function (and its find_binop call) to be removed.	2026-05-29 16:50:02 +03:00
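The invariant this commit relies on can be shown with a small sketch. All types below (`ck_restriction`, `ck_restrictions`) are hypothetical stand-ins, not the real `statement_restrictions` code. Once mixing is rejected when a restriction is added, filtering for multi-column factors returns either everything or nothing, so the scan can be replaced by a check of the tracked flag.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical, simplified stand-in for a clustering-key restriction.
struct ck_restriction {
    bool multi_column; // true for (c1, c2) > (1, 2) style restrictions
};

class ck_restrictions {
    std::vector<ck_restriction> _factors;
    bool _has_multi_column = false;
public:
    void add(ck_restriction r) {
        // The invariant: multi- and single-column restrictions never mix.
        if (!_factors.empty() && r.multi_column != _has_multi_column) {
            throw std::invalid_argument("mixing multi- and single-column restrictions");
        }
        _has_multi_column = r.multi_column;
        _factors.push_back(r);
    }
    // Old style: scan and keep only the multi-column factors.
    std::vector<ck_restriction> multi_column_factors_by_scan() const {
        std::vector<ck_restriction> out;
        for (const auto& f : _factors) {
            if (f.multi_column) {
                out.push_back(f);
            }
        }
        return out;
    }
    // New style: the invariant makes the scan unnecessary.
    std::vector<ck_restriction> multi_column_factors() const {
        return _has_multi_column ? _factors : std::vector<ck_restriction>{};
    }
};
```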
Avi Kivity	dca2cc512e	cql3: statement_restrictions: remove unused find_needs_filtering and has_slice_or_needs_filtering These helper functions were wrappers around find_binop that are no longer called, since their call sites have been replaced by predicate-based checks.	2026-05-29 16:50:02 +03:00
Avi Kivity	eb98aea466	cql3: statement_restrictions: replace has_slice_or_needs_filtering with tracked bool Replace the has_slice_or_needs_filtering() call on _partition_key_restrictions (which uses find_binop to walk the expression tree) with a precomputed _pk_has_slice_or_needs_filtering boolean tracked incrementally during predicate construction in the partition key branch.	2026-05-29 16:50:02 +03:00
Avi Kivity	6e27c3a185	cql3: statement_restrictions: replace contains_multi_column_restriction with _has_multi_column In clustering_key_restrictions_need_filtering(), replace the contains_multi_column_restriction() call (which uses find_binop to search for a tuple_constructor LHS in the expression tree) with the precomputed _has_multi_column boolean that is already tracked incrementally during predicate construction.	2026-05-29 16:50:01 +03:00
Avi Kivity	ae7eb860a5	cql3: statement_restrictions: replace find_needs_filtering with predicate op check In the clustering prefix construction loop, replace the find_needs_filtering() call (which walks the merged predicate's expression tree looking for needs-filtering binary operators) with a check on the individual predicate vector. This uses the per-predicate op field directly instead of searching the expression tree.	2026-05-29 16:50:01 +03:00
Avi Kivity	556262a165	cql3: statement_restrictions: replace find_binop is_on_collection with tracked bool Replace the two find_binop(_clustering_columns_restrictions, is_on_collection) calls with a precomputed _ck_is_on_collection boolean that is tracked incrementally during predicate construction. This avoids walking the expression tree at each call site. The is_on_collection check detects CONTAINS/CONTAINS_KEY operators, which indicate collection restrictions on a clustering key column.	2026-05-29 16:50:01 +03:00
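Several commits in this series (the tracked `_ck_is_on_collection`, `_has_multi_column` and `_pk_has_slice_or_needs_filtering` booleans) follow one pattern: compute a property once while restrictions are added instead of walking the expression tree at every call site. A minimal sketch of that pattern, assuming a heavily simplified expression type; `expr`, `oper`, `restrictions` and this `find_binop` are illustrative stand-ins, not Scylla's definitions.

```cpp
#include <cassert>
#include <optional>
#include <vector>

enum class oper { eq, lt, gt, contains, contains_key };

// Hypothetical, simplified expression tree: a leaf is a single binary
// operator, an inner node is a conjunction of its children.
struct expr {
    std::optional<oper> op;     // set for leaves
    std::vector<expr> children; // used by conjunction nodes
};

// Old style: search the whole tree every time the question is asked.
template <typename Pred>
bool find_binop(const expr& e, Pred pred) {
    if (e.op) {
        return pred(*e.op);
    }
    for (const auto& c : e.children) {
        if (find_binop(c, pred)) {
            return true;
        }
    }
    return false;
}

bool is_on_collection(oper o) { return o == oper::contains || o == oper::contains_key; }

// New style: the answer is updated incrementally as restrictions are added.
class restrictions {
    expr _root;
    bool _is_on_collection = false;
public:
    void add(oper o) {
        _root.children.push_back(expr{o, {}});
        _is_on_collection = _is_on_collection || is_on_collection(o);
    }
    bool is_on_collection_by_walk() const { return find_binop(_root, is_on_collection); }
    bool is_on_collection_tracked() const { return _is_on_collection; }
};
```

Both accessors always agree; the tracked one simply moves the cost from every query to construction time.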
Avi Kivity	569c85032e	cql3: statement_restrictions: replace find_binop column extraction with predicate on field In add_clustering_restrictions_to_idx_ck_prefix(), find_binop was used to locate any binary_operator in the predicate's filter just to extract the column from its LHS. Since the predicate already stores this information in its 'on' field (as on_column for single-column predicates), use it directly instead of searching the expression tree.	2026-05-29 16:50:01 +03:00
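A sketch of reading the column from predicate metadata rather than searching the filter expression, assuming a simplified `predicate` whose `on` field is a `std::variant`. `on_multiple_columns` and `restricted_column()` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <variant>

// Hypothetical stand-ins; the real predicate type carries much more.
struct on_column { std::string column; };
struct on_multiple_columns {}; // e.g. a tuple restriction such as (c1, c2) > (...)

struct predicate {
    std::variant<on_column, on_multiple_columns> on;
};

// Take the column straight from the metadata recorded at construction time.
std::optional<std::string> restricted_column(const predicate& p) {
    if (const auto* col = std::get_if<on_column>(&p.on)) {
        return col->column;
    }
    return std::nullopt;
}
```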
Avi Kivity	240d9be5e2	cql3: statement_restrictions: set op on all binary-operator-derived predicates The to_predicates() function had fallthrough paths for operators like LIKE and NOT_IN that created predicates without setting the op field. This meant predicate-based checks like 'p.op && needs_filtering(*p.op)' would miss these operators. Fix by inlining the predicate construction at the fallthrough points (instead of using cannot_solve_on_column) and setting .op = oper.op. This ensures all predicates derived from binary operators carry their operator type, enabling reliable predicate-based analysis. The cannot_solve_on_column helper is now unused and removed.	2026-05-29 16:50:01 +03:00
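Why a missing `op` matters: a check of the form `p.op && needs_filtering(*p.op)` silently evaluates to false for a predicate that never recorded its operator. The sketch below assumes, purely for illustration, that LIKE and NOT IN are the operators needing filtering; the enum, the `predicate` fields and the `to_predicate_*` functions are hypothetical.

```cpp
#include <cassert>
#include <optional>

enum class oper_t { eq, lt, in, like, not_in };

// Hypothetical rule, for illustration only.
bool needs_filtering(oper_t op) { return op == oper_t::like || op == oper_t::not_in; }

struct predicate {
    std::optional<oper_t> op; // operator the predicate was derived from, if any
    bool solvable;            // hypothetical: usable to build key ranges
};

// Before: the fallthrough path built a predicate without recording the op.
predicate to_predicate_old(oper_t op) {
    switch (op) {
    case oper_t::eq: case oper_t::lt: case oper_t::in:
        return predicate{op, true};
    default:
        return predicate{std::nullopt, false}; // operator information lost
    }
}

// After: every predicate derived from a binary operator carries its op.
predicate to_predicate_new(oper_t op) {
    switch (op) {
    case oper_t::eq: case oper_t::lt: case oper_t::in:
        return predicate{op, true};
    default:
        return predicate{op, false};
    }
}

bool check(const predicate& p) { return p.op && needs_filtering(*p.op); }
```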
Botond Dénes	091e3f5191	Merge 'test.py: reduce resource metrics gathering overhead' from Evgeniy Naydanov Only enable the memory controller in cgroup subtree_control instead of all available controllers. cpu.stat is available in cgroup v2 without enabling the cpu controller (base accounting), and enabling io/pids/cpu controllers adds unnecessary per-operation kernel overhead to Scylla processes - particularly the memory controller's per-page-cache-operation accounting combined with io controller overhead during heavy I/O. Additionally, restrict SystemResourceMonitor to the master process only. System-wide metrics (CPU%, memory) are identical from any process, so running a monitoring thread in each xdist worker was redundant and added unnecessary SQLite write contention and thread scheduling noise. Replace cpu_percent(interval=0.1) with a non-blocking cpu_percent() that returns CPU% since the previous call. Use stop_event.wait(timeout=2.0) as the loop control to both space out iterations and allow immediate shutdown responsiveness. Fixes SCYLLADB-2141 Closes scylladb/scylladb#29987 * github.com:scylladb/scylladb: test: use non-blocking cpu_percent in SystemResourceMonitor test.py: reduce cgroup overhead in resource metrics gathering	2026-05-29 10:52:17 +03:00
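The loop-control idea here (sample without blocking, then wait on a stop event with a timeout so that shutdown is immediate) is not specific to Python. Below is a C++ analogue of `threading.Event.wait(timeout)`, offered as a sketch: `stop_event` and `monitor_loop` are hypothetical names, and `sample` stands in for a non-blocking `cpu_percent()`-style call.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// A minimal stop event, analogous to Python's threading.Event.
class stop_event {
    std::mutex _m;
    std::condition_variable _cv;
    bool _set = false;
public:
    void set() {
        {
            std::lock_guard lk(_m);
            _set = true;
        }
        _cv.notify_all();
    }
    // Returns true if the event was set, false on timeout (like Event.wait()).
    bool wait(std::chrono::milliseconds timeout) {
        std::unique_lock lk(_m);
        return _cv.wait_for(lk, timeout, [this] { return _set; });
    }
};

// Take one sample per period; the wait both spaces out iterations and
// returns as soon as a stop is requested.
int monitor_loop(stop_event& stop, std::chrono::milliseconds period,
                 const std::function<void()>& sample) {
    int samples = 0;
    do {
        sample(); // must not block; reports usage since the previous call
        ++samples;
    } while (!stop.wait(period));
    return samples;
}
```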
Yaniv Michael Kaul	f90b066405	cql3: lazily allocate _idx_opt behind unique_ptr Motivation: The secondary_index::index object stored in statement_restrictions is approximately 128 bytes (containing index_metadata with its sstring name, UUID id, and unordered_map options, plus a target_column sstring). This field is only populated for queries that use secondary indexing, yet every prepared statement's restrictions object pays the full inline cost. Replace std::optional<secondary_index::index> with std::unique_ptr<secondary_index::index>. This reduces the inline size from 136 bytes to 8 bytes, saving 128 bytes per non-index-using prepared statement cached in the prepared statement cache. The semantics are preserved: null unique_ptr is equivalent to std::nullopt, and the dereference patterns (-> and *) work identically. The find_idx() method that returns a copy constructs an optional from the dereferenced pointer when non-null. Tests: - statement_restrictions_test builds and passes - Full release build compiles cleanly Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> AI-assisted: Yes Backport: no, improvement Closes scylladb/scylladb#30046	2026-05-28 21:35:25 +03:00
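The size trade-off can be checked with `static_assert`. This sketch assumes a 128-byte payload standing in for `secondary_index::index`. Exact sizes depend on alignment and the standard library (the commit reports 136 versus 8 bytes), but a `std::unique_ptr` with the default deleter is pointer-sized on mainstream implementations. `find_idx` here only mirrors the described copy-into-optional behaviour.

```cpp
#include <cassert>
#include <memory>
#include <optional>

// Hypothetical stand-in for secondary_index::index.
struct index_like {
    char payload[128];
};

struct restrictions_inline { std::optional<index_like> idx; };
struct restrictions_lazy   { std::unique_ptr<index_like> idx; };

// Accessor that still hands out a copy wrapped in an optional.
std::optional<index_like> find_idx(const restrictions_lazy& r) {
    if (r.idx) {
        return *r.idx;
    }
    return std::nullopt;
}

static_assert(sizeof(restrictions_lazy) == sizeof(void*));
static_assert(sizeof(restrictions_inline) > sizeof(index_like));
```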
Dawid Mędrek	ffd9f7ebbd	Merge 'treewide: update method accessibility when checked by concepts' from Avi Kivity Clang 22 and below ignore method accessibility when checking concepts. Clang 23 now [1] checks accessibility. Make relevant methods public so concepts that check them have access. The problem was that the concepts were evaluated at the use-site, which was a friend, but should have been evaluated in some friendless global context. After the clang fix, the problems in our code were exposed. [1] `ac3c588739` Preparing for a new toolchain, so not backporting. Closes scylladb/scylladb#30053 * github.com:scylladb/scylladb: compacting_reader: make consume() methods public mutation_fragment_v1_stream: make consume() methods public	2026-05-28 11:19:29 +02:00
Nadav Har'El	7a387a499f	Merge 'cql3: extract vector search select statement into cql3/statements/external_search/' from Szymon Malewski Extract vector_indexed_table_select_statement and its filter logic out of the monolithic select_statement.cc and vector_search/ module into a dedicated directory cql3/statements/external_search/. This improves modularity and eliminates a circular dependency between cql3 and vector_search: the filter code depends heavily on cql3 types (expressions, query_options, statement_restrictions) and belongs in the cql3 layer. Follow-up to VECTOR-250, which originally addressed the same dependency but has since regressed. This is also a preparatory refactoring for full-text search select statements, which can share some implementation with vector search. Pure refactoring, no semantic changes - no need for backporting. Closes scylladb/scylladb#30100 * github.com:scylladb/scylladb: vector_index: move filter into cql3/statements/external_search cql3: extract vector_indexed_table_select_statement into own compilation unit vector_index: split query_base_table to return raw coordinator_result	2026-05-28 11:26:49 +03:00
Piotr Dulikowski	f44d57c7c7	Merge 'Deprecate HOST_ID_BASED_HINTED_HANDOFF feature and drop migration code' from Gleb Natapov The feature was included in 2024.2 and is present in all supported versions. Upgrading to HEAD from a version that does not have it is not possible. This means the feature can be moved to the deprecated features list and all the migration code can be dropped. No need to backport since this is code removal. Closes scylladb/scylladb#30087 * github.com:scylladb/scylladb: hints: remove hint_directory_manager and IP-based hint directory infrastructure hints: remove migration infrastructure hints: deprecate HOST_ID_BASED_HINTED_HANDOFF feature	2026-05-28 10:09:02 +02:00
Piotr Dulikowski	8dfd455001	Merge 'strong consistency: fix drop table blocking on stuck writes and handle timeout in update()' from Petr Gusev - Fix table drop blocking for the full client timeout when in-flight writes can't reach quorum - Handle unhandled timeout exception in the wait-for-leader loop during group startup When a strongly consistent table is dropped, `schedule_raft_group_deletion()` calls `g->close()`, which waits for all in-flight operations to release their gate holders. But other nodes may have already destroyed their raft servers for this group, so an in-flight write on the leader cannot reach quorum and hangs until the client timeout expires (~seconds), unnecessarily delaying group deletion. Additionally, the wait-for-leader loop in groups_manager::update() uses abort_on_expiry with a 60-second timeout but never catches the exception if it fires, leaving the group in an indeterminate state. SCYLLADB-2080 fix: - Reorder `schedule_raft_group_deletion`: initiate gate close (prevents new operations), then abort the raft server (unblocks stuck writes by causing `raft::stopped_error`), then await the gate future (resolves immediately since holders are released). - Handle `raft::stopped_error` in the coordinator's top-level catch blocks (both write and read paths): if the table no longer exists, return `no_such_column_family` (the CQL layer converts it to InvalidRequest: unconfigured table). Otherwise fall through to the default timeout handling. - Replace gate->hold() with try_hold() + on_internal_error in acquire_server, with a comment explaining why the gate can never be closed at that point (table removal in `schema_applier::commit_on_shard` precedes gate closure, with no scheduling point in between). Timeout handling fix: - Use `coroutine::as_future` in the wait-for-leader loop to catch timeout exceptions gracefully: log a warning and break out instead of propagating unhandled. 
Includes a cluster test reproducer (test_drop_table_unblocks_stuck_write) that: 1. Pauses a write on the leader before add_entry 2. Drops the table (follower destroys its group immediately) 3. Resumes the write — verifies it fails promptly with InvalidRequest ("unconfigured table") instead of hanging for 15 seconds backport: no need, strong consistency is not released yet Fixes: SCYLLADB-2080 Closes scylladb/scylladb#30105 * github.com:scylladb/scylladb: strong consistency/groups_manager: handle timeout in update() wait-for-leader loop strong consistency: abort raft server before gate close when dropping a table test/cluster: rewrite test_queries_while_dropping_table for SCYLLADB-2080	2026-05-28 09:59:20 +02:00
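The reordered shutdown sequence can be modelled with standard threads. In this sketch `gate`, `server` and `stopped_error` are simplified, hypothetical stand-ins for the Seastar gate, the raft server and `raft::stopped_error`, and the in-flight write never reaches quorum, so only the abort can release it. Closing first stops new entries (`try_hold()` fails), aborting unblocks the stuck write, and only then does waiting for the gate to drain return promptly.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <stdexcept>
#include <thread>

// Hypothetical error type modelling raft::stopped_error.
struct stopped_error : std::runtime_error {
    stopped_error() : std::runtime_error("server stopped") {}
};

// Minimal gate: counts holders and refuses new ones once closing starts.
class gate {
    std::mutex _m;
    std::condition_variable _cv;
    int _holders = 0;
    bool _closed = false;
public:
    bool try_hold() {
        std::lock_guard lk(_m);
        if (_closed) {
            return false;
        }
        ++_holders;
        return true;
    }
    void leave() {
        {
            std::lock_guard lk(_m);
            --_holders;
        }
        _cv.notify_all();
    }
    void start_close() { std::lock_guard lk(_m); _closed = true; }
    void wait_drained() {
        std::unique_lock lk(_m);
        _cv.wait(lk, [this] { return _holders == 0; });
    }
};

// Minimal server: a write blocks until aborted (quorum is never reached).
class server {
    std::mutex _m;
    std::condition_variable _cv;
    bool _aborted = false;
public:
    void add_entry() {
        std::unique_lock lk(_m);
        _cv.wait(lk, [this] { return _aborted; });
        throw stopped_error();
    }
    void abort() {
        {
            std::lock_guard lk(_m);
            _aborted = true;
        }
        _cv.notify_all();
    }
};

// Fixed ordering: stop new entries, unblock in-flight ones, then wait.
void delete_group(gate& g, server& s) {
    g.start_close();
    s.abort();
    g.wait_drained();
}
```

With the old ordering (waiting for the gate before aborting), `wait_drained()` would block for as long as the stuck write does.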
Szymon Malewski	ed1006928f	vector_index: move filter into cql3/statements/external_search Move prepared_filter, prepared_restriction, prepared_rhs types and prepare_filter() from vector_search/filter.{hh,cc} into new files cql3/statements/external_search/filter.{hh,cc} under namespace cql3::statements::external_search. This eliminates a circular dependency between the cql3 and vector_search modules: the filter code depends heavily on cql3 types (expressions, query_options, statement_restrictions) and belongs in the cql3 layer. This is a follow-up to VECTOR-250 which originally addressed the same circular dependency but has since regressed.	2026-05-27 21:43:56 +02:00
Szymon Malewski	5e94abe3bc	cql3: extract vector_indexed_table_select_statement into own compilation unit Move vector_indexed_table_select_statement and its associated helpers (ann_ordering_info, get_ann_ordering_info, add_similarity_function_to_selectors, get_similarity_ordering_comparator) from select_statement.hh/.cc into new files cql3/statements/external_search/vector_indexed_table_select_statement.hh/.cc.	2026-05-27 21:43:52 +02:00
Ferenc Szili	76dac2fd8e	test: fix format string typo in error logging in ldap_server.py This change fixes a typo in the error logging format string: s% -> %s Fixes: SCYLLADB-2244 Closes scylladb/scylladb#30088	2026-05-27 17:22:21 +03:00
Anna Stuchlik	c54d7329d4	docs: add Configuration Parameters link to system-configuration index The system-configuration index page listed the System Configuration Guide, scylla.yaml, and Snitches, but omitted the Configuration Parameters reference page. Fixes https://github.com/scylladb/scylladb/issues/23110 Closes scylladb/scylladb#30117	2026-05-27 17:18:55 +03:00
Emil Maskovsky	f845918861	raft: don't block replace when group0 leader is unknown The join_node_request_handler rejects replace requests when the node being replaced is still seen as the group0 leader. It loops for up to 10s waiting for the leader to change. However, the loop condition also blocked when current_leader() returned empty (no leader known): `while (!g0_server.current_leader() || *params.replaced_id == g0_server.current_leader())` This is incorrect: if current_leader() is empty, it means the old leader is already gone (election in progress). The replaced node is no longer the leader, so the safety check is satisfied and the replace should be allowed to proceed. Remove the !current_leader() check so the loop only continues while the replaced node is positively identified as the current leader. No backport needed: the failure rate is 2/17K in CI (dev mode only, caused by reactor stalls under extreme resource contention) and the code path only affects replace-after-kill scenarios where the replaced node was the group0 leader. Refs: SCYLLADB-2125 Closes scylladb/scylladb#30098	2026-05-27 14:56:30 +02:00
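The loop-condition change in isolation, with `std::optional` modelling `current_leader()` and a hypothetical `server_id` alias. Comparing a value with an empty `std::optional` yields false, so once the `!leader` disjunct is dropped, an unknown leader no longer keeps the loop spinning.

```cpp
#include <cassert>
#include <optional>

using server_id = int; // hypothetical stand-in for a raft server id

// Before: also kept waiting when no leader was known at all.
bool keep_waiting_before(std::optional<server_id> leader, server_id replaced) {
    return !leader || replaced == leader;
}

// After: keep waiting only while the replaced node is positively the leader.
bool keep_waiting_after(std::optional<server_id> leader, server_id replaced) {
    return replaced == leader; // false when leader is empty
}
```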
Nadav Har'El	21ecc12fc6	Merge 'index: fix local vector index locality detection after schema reload' from Michał Hudobski After schema reload, `target_parser::is_local()` did not recognize the vector-index local target format `{"pk": [...], "tc": "..."}`, causing local vector indexes to be treated as global. This broke duplicate detection when both a global and a local vector index existed on the same column. Fix by introducing `vector_index::is_local()` and dispatching to it from `create_index_from_index_row()` based on the index class. Also adds tests for local/global vector index coexistence. Fixes: SCYLLADB-987 backport reasoning: we added local vector index support in 2026.1 Closes scylladb/scylladb#29492 * github.com:scylladb/scylladb: test/cqlpy: add tests for global and local vector index coexistence index: fix local vector index locality detection after schema reload	2026-05-27 15:34:57 +03:00
Petr Gusev	f2b1cbe998	strong consistency/groups_manager: handle timeout in update() wait-for-leader loop The wait-for-leader loop in groups_manager::update() uses abort_on_expiry with a 60-second timeout. If the timeout fires, co_await w->future throws an exception that propagates unhandled out of the server_control_op coroutine, leaving the group in an indeterminate state. Use coroutine::as_future to catch the exception, log a warning, and break out of the loop gracefully. The group will still be reported as started (allowing other operations to proceed) even if the leader wasn't found within the timeout.	2026-05-27 12:06:46 +02:00
Petr Gusev	d922c43358	strong consistency: abort raft server before gate close when dropping a table When a strongly consistent table is dropped, schedule_raft_group_deletion() used to call g->close() first, which waits for all in-flight operations to release their gate holders. But other nodes may have already destroyed their raft servers for this group, so an in-flight write on the leader cannot reach quorum and hangs until the client timeout expires, unnecessarily delaying group deletion. Fix: initiate gate close (prevents new operations from entering), then abort the raft server (causes in-flight add_entry/read_barrier to throw raft::stopped_error, releasing their gate holders), then await the gate future (resolves immediately since holders are now released). Handle raft::stopped_error in the coordinator's top-level catch blocks (both write and read paths): if the table no longer exists, return no_such_column_family (which the CQL layer converts to InvalidRequest 'unconfigured table'). Otherwise fall through to the default timeout handling. Also replace gate->hold() with try_hold() + on_internal_error in acquire_server, and handle the timeout exception in the wait-for-leader loop in update() gracefully (log + break instead of propagating). Fixes: SCYLLADB-2080	2026-05-27 12:06:46 +02:00
Petr Gusev	89307064b5	test/cluster: rewrite test_queries_while_dropping_table for SCYLLADB-2080 Rewrite the test to use 2 nodes (RF=2) instead of 1 (RF=1), which exposes the quorum-loss scenario: when a table is dropped, the follower destroys its raft group immediately while the leader's in-flight operations are still holding the gate. The test pauses both a read and a write on the leader, drops the table, then resumes them. Both are expected to fail with 'no such column family' since the raft server is aborted as part of group deletion. A 15-second timeout guard detects the old buggy behavior (write stuck forever). Marked xfail until the fix is applied in the next commit.	2026-05-27 12:06:46 +02:00
Avi Kivity	668ad55c69	build: degrade -Wpass-failed from error to warning -Wpass-failed warns when an explicitly requested optimization (e.g. `#pragma GCC unroll`) cannot be performed. Since the standard library contains those pragmas, this is more or less out of a developer's control. We can play with inlining, but cannot guarantee it will work. Since the condition isn't fatal in any way, degrade it back to its default disposition, a warning (it was upgraded to an error via -Werror). Don't suppress it entirely since in hot paths we do want to address it. Closes scylladb/scylladb#29980	2026-05-27 11:38:15 +03:00
Gleb Natapov	54a423986e	hints: remove hint_directory_manager and IP-based hint directory infrastructure Now that HOST_ID_BASED_HINTED_HANDOFF is always enabled, remove the hint_directory_manager class and all code paths that dealt with IP-named hint directories and IP-to-host-ID mappings. - Remove hint_directory_manager class from hint_storage.hh/.cc - Simplify drain_for to take only host_id (no IP parameter) - Simplify initialize_endpoint_managers to only scan host-ID directories - Simplify with_file_update_mutex_for to take host_id directly - Simplify resource_manager's space_watchdog to use host_id only - Make storage_proxy::on_leave_cluster empty (draining via on_released) - Remove uses_host_id() checks from storage_proxy::on_released	2026-05-27 11:13:28 +03:00
Wojciech Mitros	515faaf1d0	strong_consistency: cleanup forwarding reads to leader When forwarding reads to the raft group leader was introduced, we didn't use the methods allowing us to cache the leader after completing requests - we fix it in this commit by using the redirect_to_leader method prepared for this case. Also remove a duplicated consecutive 'if' Closes scylladb/scylladb#30102	2026-05-27 09:49:06 +02:00
Yaron Kaikov	ec36f0f7e1	build: fix collect-dist target failing with missing RPM/DEB rules The collect_pkgs ninja rules for building collect-dist-{mode} listed individual RPM and DEB file paths as order-only dependencies. However, the rpmbuild/debbuild rules only declare the output directory (e.g. $builddir/dist/{mode}/redhat), not the individual files within it. This caused ninja to fail with: ninja: error: '...scylla-....rpm', needed by 'build/.../dist/rpm', missing and no known rule to make it. Fix by removing the individual package file paths from the order-only dependency list. The directory targets ($builddir/dist/{mode}/redhat, dist-cqlsh-rpm, dist-python3-rpm, etc.) already ensure the packages are built before collect_pkgs copies them via the $pkgs variable. `5694c93c12` ("build: add collect-dist target to organize build artifacts") introduced this regression. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-2215 Closes scylladb/scylladb#30079	2026-05-27 10:17:28 +03:00
Botond Dénes	555cfbcd38	Merge 'treewide: replace deprecated smp::count and smp::all_cpus() with new APIs' from Avi Kivity Replace all uses of the deprecated seastar::smp::count with this_smp_shard_count() and smp::all_cpus() with this_smp_all_shards() across the ScyllaDB codebase (seastar submodule untouched). Both replacement functions require a reactor thread context. All call sites were verified to run on reactor threads. Notable cases: - dht/token-sharding.hh: this_smp_shard_count() is used as a default parameter value. This is safe since all callers are on reactor threads, but the expression is now evaluated at each call site rather than being a reference to a global variable. - service/storage_service.hh, locator/abstract_replication_strategy.hh, ent/encryption/encryption.cc: used in default member initializers and constructor member-init-lists. Objects are always constructed on reactor threads. - schema_builder: sometimes called from BOOST_AUTO_TEST_CASE without a reactor. Added a preparatory patch that makes the implicit shard count parameter explicit, with those cases passing 1. Not changed: - scylla-gdb.py: reads smp::count as a GDB symbol (no reactor context). - Python test files: only reference smp::count in comments/strings. No backport: the Seastar commit that deprecated these functions hasn't made (and won't make) its way into any release branch (and the warnings are cosmetic anyway). Closes scylladb/scylladb#29990 * github.com:scylladb/scylladb: treewide: replace deprecated smp::count and smp::all_cpus() with new APIs scylla-gdb: read shard count from smp::_this_smp instead of smp::count schema_builder: make shard_count an explicit constructor parameter	2026-05-27 09:42:06 +03:00
Szymon Malewski	aa17c7739e	vector_index: split query_base_table to return raw coordinator_result The inner query_base_table overloads previously called process_results() themselves, duplicating row_limit setup and making it impossible to thread per-execution context (e.g. a similarity provider) into result processing. Lift process_results() to the top-level overload and change the two inner overloads to return coordinator_result<foreign_ptr<query::result>> directly. This cleanly separates query dispatch from result processing, and opens the door to passing execution-time context at the single process_results() call site. No functional change.	2026-05-26 21:37:13 +02:00
Avi Kivity	35d7cc7c3e	compacting_reader: make consume() methods public The CompactedFragmentsConsumer concept checks that these methods exist. Clang 23 tightened [1] the rules to verify that the methods are publicly accessible. Make them public so we don't fail the build. [1] `ac3c588739`	2026-05-26 20:10:10 +03:00
Avi Kivity	e26d983453	mutation_fragment_v1_stream: make consume() methods public The MutationFragmentConsumerV2 concept checks that these methods exist. Clang 23 tightened [1] the rules to verify that the methods are publicly accessible. Make them public so we don't fail the build. [1] `ac3c588739`	2026-05-26 20:09:57 +03:00
Avi Kivity	8010e408a2	treewide: replace deprecated smp::count and smp::all_cpus() with new APIs Replace all uses of the deprecated seastar::smp::count with this_smp_shard_count() and smp::all_cpus() with this_smp_all_shards() across the ScyllaDB codebase (seastar submodule untouched). Both replacement functions require a reactor thread context. All call sites were verified to run on reactor threads. Notable cases: - dht/token-sharding.hh: this_smp_shard_count() is used as a default parameter value. This is safe since all callers are on reactor threads, but the expression is now evaluated at each call site rather than being a reference to a global variable. - service/storage_service.hh, locator/abstract_replication_strategy.hh, ent/encryption/encryption.cc: used in default member initializers and constructor member-init-lists. Objects are always constructed on reactor threads. Not changed: - scylla-gdb.py: reads smp::count as a GDB symbol (no reactor context). - Python test files: only reference smp::count in comments/strings.	2026-05-26 17:35:20 +03:00
Avi Kivity	3fe64681e9	scylla-gdb: read shard count from smp::_this_smp instead of smp::count After removing all references to smp::count from ScyllaDB code, in the next patch, the linker may strip the symbol in release builds (LTO/gc-sections). The GDB script then fails with 'Missing ELF symbol _ZN7seastar3smp5countE'. Read the shard count from the thread-local smp instance pointer (smp::_this_smp->_shard_count) instead. This pointer is always set on reactor threads and is guaranteed to survive the linker since it's used by this_smp_shard_count(). If the current GDB thread is not a reactor thread (e.g., an alien thread), iterate all threads to find the first reactor thread. If none have _this_smp set, try the deprecated smp::count global as a last resort (available in debug builds).	2026-05-26 17:35:19 +03:00
Wojciech Mitros	ae0d77257f	mv: fix view_update_builder losing fragments across batch boundaries When a mutation generates more view updates than max_rows_for_view_updates (100), view_update_builder::build_some() splits the work into multiple batches. There was a bug in how fragments were read between batches: When should_stop_updates() returned true, the old code called stop() which returned stop_iteration::yes without reading the next fragments. On the next build_some() call, read_both_next_fragments() was called at the start, which advanced BOTH readers - skipping any fragment that was already read but not yet consumed. A row could be left unconsumed if either: - the 100th (last in the batch) update was a row insertion and we still had insertions/updates remaining - the 100th (last in the batch) update was a row deletion and we still had deletions/updates remaining For the most common case where work is split into batches, i.e. range deletions, we couldn't hit this because a range deletion generates only view row deletions. On tables with a single materialized view, we also couldn't hit this for batches with fewer than 50 statements (unless the batch also contained range deletions), because one non-range-delete update can generate up to 2 view updates. However, in scenarios outside these two, we could lose view updates, resulting in persistent inconsistencies. The fix: - read_*_next_fragment() now accepts a stop_iteration parameter, so the next fragments are always read after consuming (even when stopping), but stop_iteration::yes is correctly propagated to break the loop. - build_some() no longer re-reads fragments at the start. Instead, an initialize() method performs the initial read once at construction. - because we now only advance readers after consuming, we won't advance readers after end_of_partition, so we extend the break condition to accept either readers evaluating to `false` or them being at end_of_partition. 
We also handle the _skip_row_updates optimization. Fixes: scylladb/scylladb#29155 Closes scylladb/scylladb#29498	2026-05-26 14:15:12 +02:00
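A reduced model of the batching bug. It assumes two sorted streams of integer keys instead of mutation fragments and one view update per consumed key; `reader` and `builder` are hypothetical. With `fixed == false` the builder re-reads both streams at the start of each call, which drops the fragment that was read but not consumed. With `fixed == true` it reads once at construction and always advances whatever it consumed, even when stopping.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reader over a sorted key stream; pos == -1 means nothing read yet.
struct reader {
    const std::vector<int>* rows;
    std::ptrdiff_t pos = -1;
    void read_next() { ++pos; }
    bool has() const { return pos >= 0 && pos < static_cast<std::ptrdiff_t>(rows->size()); }
    int cur() const { return (*rows)[static_cast<std::size_t>(pos)]; }
};

// Merges two streams, emitting one "view update" per consumed key.
// The vectors must outlive the builder.
class builder {
    reader _existing, _updates;
    bool _fixed;
    std::vector<int> _out;
public:
    builder(const std::vector<int>& e, const std::vector<int>& u, bool fixed)
        : _existing{&e}, _updates{&u}, _fixed(fixed) {
        if (_fixed) { // initialize(): the initial read happens exactly once
            _existing.read_next();
            _updates.read_next();
        }
    }
    // Emits at most `limit` updates; returns false once both streams are done.
    bool build_some(std::size_t limit) {
        if (!_fixed) { // old behaviour: advance BOTH readers on every call
            _existing.read_next();
            _updates.read_next();
        }
        std::size_t emitted = 0;
        while (_existing.has() || _updates.has()) {
            bool take_e = _existing.has() && (!_updates.has() || _existing.cur() <= _updates.cur());
            bool take_u = _updates.has() && (!_existing.has() || _updates.cur() <= _existing.cur());
            _out.push_back(take_u ? _updates.cur() : _existing.cur());
            bool stop = ++emitted == limit;
            if (stop && !_fixed) {
                return true; // old: stop without advancing the consumed reader
            }
            if (take_e) { _existing.read_next(); } // fixed: advance what was consumed
            if (take_u) { _updates.read_next(); }
            if (stop) {
                return true;
            }
        }
        return false;
    }
    const std::vector<int>& output() const { return _out; }
};
```

With a batch limit of 1 and streams `{1, 3}` and `{2, 4}`, the old variant emits only `1` and `3`, while the fixed variant emits all four keys. Without a batch boundary, both variants behave identically.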
Avi Kivity	c59985c38b	Merge 'cql3: limit large allocations when parsing queries' from Botond Dénes Queries are stored and passed around as sstring/std::string_view. While they are normally small enough not to cause problems, as `test_cdc_large_values.TestLargeColumnsWithCDC.test_single_column_blob_max_size_with_cdc_preimage_full_postimage[unprepared_statements]` demonstrates, queries can be arbitrarily large, putting heavy strain on Scylla internals via large allocations and, in the extreme case, causing denial of service. This PR attempts to alleviate this by using fragmented storage for queries: read the query as a fragmented string from the input stream in `transport/server.cc`, propagate it as such to `query_processor::prepare()` and also store it as such in `cql3::cql_statement::raw_cql_statement`. Also avoid linearizing raw values in the CQL expression tree: switch `cql3::expr::untyped_constant::raw_text` to fragmented storage. For this to be possible, some infrastructure code had to be made friendly to fragmented storage: ascii/utf8 validation, hashers, from_hex and, importantly, `abstract_type::from_string()`. Unfortunately, the query still has to be linearized for parsing itself: although ANTLR allows a custom InputStream implementation, it plays pointer arithmetic games with the pointers obtained from it, so fragmented input cannot be used. Still, this PR limits the places where the query is linearized to the following: * Parsing * Audit * Logs and error messages So the normal query paths for queries that actually can get arbitrarily large (UPDATE and INSERT) should only linearize the query temporarily for parsing. 
Fixes #10779 Improvement, no backport Closes scylladb/scylladb#28619 * github.com:scylladb/scylladb: tracing: add_query(): change query param to utils::chunked_string cql3: store raw query string in utils::chunked_string serializer: add serializer<utils::chunked_string> utils/reusable_buffer: add get_linearized_view(managed_bytes_view) cql3/expr: use utils::chunked_string for untyped_constant::raw_text types: abstract_type::from_string() switch to fragmented buffers (implementation) types: abstract_type::from_string() switch to fragmented buffers (interface) types: use write_fragmented from utils/fragment_range.hh types: timestamp_from_string(): don't assume std::string_view is null-terminated types/duration: don't assume std::string_view is null-terminated utils/hashers: add calculate(managed_bytes_view) overload utils/ascii: add validate(managed_bytes_view) overload utils: add managed_bytes_fwd.hh utils: add chunked_string utils: add managed_bytes_basic_view::byte_iterator	2026-05-26 15:00:53 +03:00
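A minimal sketch of the fragmented-storage idea, assuming fixed-size fragments. `chunked_text` is a hypothetical illustration, not the interface of `utils::chunked_string`. Appending never grows a single buffer past `max_chunk` bytes, and `linearize()` is the one place that builds a contiguous copy, as the parser still requires.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical fragmented string: no single allocation exceeds max_chunk bytes.
class chunked_text {
    std::vector<std::string> _chunks;
    std::size_t _size = 0;
    std::size_t _max_chunk;
public:
    explicit chunked_text(std::size_t max_chunk) : _max_chunk(max_chunk) {
        assert(max_chunk > 0);
    }
    void append(std::string_view s) {
        while (!s.empty()) {
            if (_chunks.empty() || _chunks.back().size() == _max_chunk) {
                _chunks.emplace_back();
                _chunks.back().reserve(_max_chunk);
            }
            auto& last = _chunks.back();
            std::size_t n = std::min(s.size(), _max_chunk - last.size());
            last.append(s.substr(0, n));
            s.remove_prefix(n);
            _size += n;
        }
    }
    std::size_t size() const { return _size; }
    std::size_t fragment_count() const { return _chunks.size(); }
    // Contiguous copy; call only where it is unavoidable (e.g. the parser).
    std::string linearize() const {
        std::string out;
        out.reserve(_size);
        for (const auto& c : _chunks) {
            out += c;
        }
        return out;
    }
};
```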
Avi Kivity	f165b396fd	schema_builder: make shard_count an explicit constructor parameter A recent Seastar update deprecated smp::count and introduced this_smp_shard_count() as a replacement. One difference is that this_smp_shard_count() wants to run on a reactor thread. This poses a problem for non-reactor tests (BOOST_AUTO_TEST_CASE) that nevertheless use a schema, as the schema_builder constructor references smp::count. If we replace it with this_smp_shard_count() then it will crash when running without a reactor. To fix, remove the implicit this_smp_shard_count() call from raw_schema's constructor and require callers to pass shard_count explicitly to schema_builder. This allows tests that don't run on a reactor thread to construct schemas without crashing. Production code and reactor-based tests pass this_smp_shard_count(). Non-reactor test files (expr_test, keys_test, nonwrapping_interval_test, wrapping_interval_test, bti_key_translation_test, range_tombstone_list_test) pass a fixed shard count of 1. Note: sstable_test.cc is a Seastar test file (SEASTAR_THREAD_TEST_CASE) but also contains one plain BOOST_AUTO_TEST_CASE (test_empty_key_view_comparison) that constructs a schema_builder without a reactor context. This test also receives a fixed shard count of 1.	2026-05-26 11:55:56 +03:00
Gleb Natapov	d48b8fd1f0	hints: remove migration infrastructure Remove migrate_ip_directories(), perform_migration(), and all associated state: _migration_callback, _migrating_done, _migration_mutex, state::migrating. Make _uses_host_id a static constexpr true — the dead IP-based branches still compile but will be removed in the next commit.	2026-05-26 11:44:57 +03:00
Gleb Natapov	10d37494ca	hints: deprecate HOST_ID_BASED_HINTED_HANDOFF feature The host_id_based_hinted_handoff feature is now guaranteed to be enabled on all supported upgrade paths. Move it to the deprecated features list (still advertised via gossip for compatibility) and remove the feature checks from the hint manager startup.	2026-05-26 11:44:57 +03:00
Nikos Dragazis	54cb6d4608	test: Order task-wait before finalization in test_migration_wait_task The purpose of this test is to verify that the task manager's "wait" API works correctly for vnodes-to-tablets migration virtual tasks. It starts a `wait_task` HTTP request concurrently with a finalize (or rollback) operation, and asserts that the wait returns the correct final state ("done" or "suspended"). The test uses `asyncio.create_task()` to wrap the wait request into a task, and then immediately calls finalize. With asyncio's lazy task scheduling, the wait coroutine does not start until the event loop yields, so the finalization request reaches the server before the wait request, and may therefore also complete before it. Once finalization completes, the virtual migration task is no longer discoverable, causing a "task not found" error. Add a log message in Scylla's wait handler and a synchronization point in the test to ensure that the wait request reaches the server before finalization. This follows the same pattern used in `test_tablet_tasks.py::check_and_abort_repair_task`. Fixes SCYLLADB-2077 Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com> Closes scylladb/scylladb#29973	2026-05-26 10:43:22 +03:00
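The race fixed here is general: starting a request in the background does not guarantee it has reached the server. A minimal C++ analogue, assuming a thread stands in for the asyncio task and a `std::promise` stands in for the new log message that the test waits on; `fake_server` and `run_ordered()` are hypothetical.

```cpp
#include <cassert>
#include <future>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical server stub that records the order in which requests arrive.
struct fake_server {
    std::mutex m;
    std::vector<std::string> arrivals;
    void record(const std::string& what) {
        std::lock_guard<std::mutex> guard(m);
        arrivals.push_back(what);
    }
};

// Start the "wait" request concurrently, but block until the server has
// signalled that the wait was registered (the analogue of the new log
// message) before issuing "finalize".
inline std::vector<std::string> run_ordered(fake_server& srv) {
    std::promise<void> wait_registered;
    std::future<void> registered = wait_registered.get_future();
    std::thread waiter([&] {
        srv.record("wait");
        wait_registered.set_value();  // synchronization point
    });
    registered.wait();  // do not finalize until the wait has landed
    srv.record("finalize");
    waiter.join();
    return srv.arrivals;
}
```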
Botond Dénes	0fd25dc47c	Merge 'Replace get_injection_parameters() with inject_parameter() where appropriate' from Pavel Emelyanov Several error injection sites use the low-level get_injection_parameters() API to fetch the entire parameters map and then manually look up a single key. The inject_parameter() API is better suited for these cases — it combines the enabled check and typed single-parameter extraction in one call, returning std::optional. Cleaning error injection usage, not backporting Closes scylladb/scylladb#29970 * github.com:scylladb/scylladb: test: Use inject_parameter() in row_cache_test sstables: Use inject_parameter() for mx reader fill buffer timeout streaming: Use inject_parameter() for order_sstables_for_streaming	2026-05-26 10:32:44 +03:00
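A hedged sketch of the difference between the two APIs, using a toy registry. The method names echo the commit message, but the signatures, the string-to-value conversion via `std::istringstream`, and the `fill_buffer_timeout` injection name are illustrative assumptions, not Scylla's actual error-injection interface.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical, simplified error-injection registry (illustrative only).
using params_map = std::map<std::string, std::string>;

struct injection_registry {
    std::map<std::string, params_map> enabled;  // injection name -> parameters

    // Low-level style: the caller receives the whole map and must check
    // whether the injection is enabled and look up the key itself.
    params_map get_injection_parameters(const std::string& name) const {
        auto it = enabled.find(name);
        return it == enabled.end() ? params_map{} : it->second;
    }

    // Higher-level style: one call performs the enabled check, the key lookup
    // and the typed conversion. An empty optional means "not enabled, key
    // missing, or value not convertible to T".
    template <typename T>
    std::optional<T> inject_parameter(const std::string& name, const std::string& key) const {
        auto it = enabled.find(name);
        if (it == enabled.end()) {
            return std::nullopt;
        }
        auto p = it->second.find(key);
        if (p == it->second.end()) {
            return std::nullopt;
        }
        std::istringstream in(p->second);
        T value{};
        if (!(in >> value)) {
            return std::nullopt;
        }
        return value;
    }
};
```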
Anna Stuchlik	bd089ebcaa	doc: remove broken References section from sstables-3-index Remove the References section containing broken links to Cassandra source files that no longer exist. Fixes https://github.com/scylladb/scylladb/issues/30080 Closes scylladb/scylladb#30081	2026-05-26 09:50:08 +03:00
Nadav Har'El	f65a52f3ec	Merge 'vector_search: test: migrate rescoring tests from C++/Boost to pytest' from Szymon Malewski Migrate mock-based rescoring and oversampling tests from test/vector_search/rescoring_test.cc to pytest and delete the C++ file. Index option validation tests go to test_vector_index.py; rescoring tests go to a new test_vector_search_rescoring.py which introduces shared infrastructure (EmbeddingRow dataclass, TEST_DATA dict, reversed_ann_response() helper, rescoring_test_table() context manager). Two tests have updated assertions (semantic change): filters_invalid_similarity_scores now uses per-function expected result sets including a zero-vector row, and rescoring_with_zerovector_query asserts empty results after NaN filtering (cosine only). Both are marked xfail pending SCYLLADB-924. Follow-up to #29593. Does not require backport - simple refactoring of tests Closes scylladb/scylladb#29906 * github.com:scylladb/scylladb: test/vector_search: migrate zero-vector query rescoring test to pytest; delete rescoring_test.cc test/vector_search: migrate invalid similarity score filtering test to pytest test/vector_search: migrate non-ANN similarity argument rescoring test to pytest test/vector_search: migrate wildcard select rescoring test to pytest test/vector_search: migrate similarity_function rescoring test to pytest test/vector_search: migrate rescoring and f32 quantization tests to pytest test/vector_search: migrate oversampling tests to pytest test/vector_search: migrate vector_index option validation tests to pytest	2026-05-26 09:45:40 +03:00
Botond Dénes	853edcbf75	tracing: add_query(): change query param to utils::chunked_string Having to unconditionally linearize the chunked query string when passing it to tracing undoes the work put into reducing large allocations on the query path. The add_query() call is evaluated eagerly on every query, even if tracing is disabled. Defer the linearization to build_parameters_map(), which is only called if tracing is enabled.	2026-05-26 09:08:06 +03:00
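A minimal sketch of the deferral, assuming a toy fragmented string and trace state; `chunked_string_sketch` and `trace_state_sketch` are hypothetical. `add_query()` only stores the fragments, and the contiguous copy is produced in `build_parameters_map()`, which returns early when tracing is off.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for utils::chunked_string: a string kept as fragments.
struct chunked_string_sketch {
    std::vector<std::string> fragments;

    std::string linearize() const {
        std::string out;
        for (const auto& f : fragments) {
            out += f;
        }
        return out;
    }
};

// Hypothetical trace state. add_query() runs on every query, so it only
// stores the fragmented query; the (potentially large) contiguous copy is
// made in build_parameters_map(), and only when tracing is enabled.
class trace_state_sketch {
public:
    explicit trace_state_sketch(bool enabled) : _enabled(enabled) {}

    void add_query(chunked_string_sketch q) { _query = std::move(q); }

    std::optional<std::string> build_parameters_map() {
        if (!_enabled || !_query) {
            return std::nullopt;
        }
        ++linearizations;
        return _query->linearize();
    }

    int linearizations = 0;  // counts how often a contiguous copy was made

private:
    bool _enabled;
    std::optional<chunked_string_sketch> _query;
};
```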
Botond Dénes	6c3f104b67	cql3: store raw query string in utils::chunked_string Read the query as a fragmented string from the input stream in transport/server.cc, propagate it as such to query_processor::prepare(), and also store it as such in cql3::cql_statement::raw_cql_statement. Unfortunately, the query still has to be linearized for parsing: although ANTLR allows a custom InputStream implementation, it plays pointer-arithmetic games with the pointers obtained from it, so fragmented input cannot be used. To amortize the cost of this linearization, the query string is linearized through utils::reusable_buffer. The parser can be invoked recursively; nested invocations linearize directly. Still, this patch limits the places where the query is linearized to parsing, audit, and logs and error messages. So the normal query paths for queries that actually can get arbitrarily large (UPDATE and INSERT) should only linearize the query temporarily for parsing.	2026-05-26 09:08:06 +03:00
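A sketch of how a reusable buffer amortizes linearization, assuming a simple grow-only `std::vector<char>`; `reusable_buffer_sketch` is hypothetical and does not mirror the real `utils::reusable_buffer` interface. Once the buffer has grown to fit the largest query seen so far, linearizing smaller queries no longer allocates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical reusable linearization buffer: fragments are copied into one
// buffer that is kept between calls and only ever grows.
class reusable_buffer_sketch {
public:
    // The returned view aliases the internal buffer and is only valid until
    // the next call to linearize().
    std::string_view linearize(const std::vector<std::string>& fragments) {
        std::size_t total = 0;
        for (const auto& f : fragments) {
            total += f.size();
        }
        if (total > _buf.size()) {
            _buf.resize(total);
            ++grow_count;  // the only point where memory is (re)allocated
        }
        std::size_t pos = 0;
        for (const auto& f : fragments) {
            if (f.empty()) {
                continue;
            }
            std::memcpy(_buf.data() + pos, f.data(), f.size());
            pos += f.size();
        }
        return std::string_view(_buf.data(), total);
    }

    int grow_count = 0;

private:
    std::vector<char> _buf;
};
```

Because the returned view aliases shared storage, a second linearization started while the first view is still in use would overwrite it; that constraint would be consistent with nested parser invocations linearizing directly instead of sharing the buffer.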
Botond Dénes	bf1a775fe4	serializer: add serializer<utils::chunked_string> Also add a normalizer which maps to sstring. utils::chunked_string's wire representation is binary compatible with that of sstring, which allows for seamless migration of RPCs from sstring to utils::chunked_string where needed. Will be used in the next commit for forwarding the CQL prepare request (query string).	2026-05-26 09:08:06 +03:00
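Why binary compatibility is possible: if a string is encoded as a length prefix followed by its bytes, the encoding does not depend on how those bytes are laid out in memory. The sketch below assumes a little-endian `uint32_t` length prefix purely for illustration; it is not taken from Scylla's IDL serializer, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Illustrative wire format: little-endian uint32 length, then raw bytes.
inline void put_u32(std::vector<uint8_t>& out, uint32_t v) {
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<uint8_t>(v >> (8 * i)));
    }
}

// Contiguous (sstring-like) string.
inline std::vector<uint8_t> serialize_contiguous(const std::string& s) {
    std::vector<uint8_t> out;
    put_u32(out, static_cast<uint32_t>(s.size()));
    out.insert(out.end(), s.begin(), s.end());
    return out;
}

// Fragmented string: write the total length once, then each fragment in
// order. The resulting bytes are identical to the contiguous encoding.
inline std::vector<uint8_t> serialize_fragmented(const std::vector<std::string>& frags) {
    uint32_t total = 0;
    for (const auto& f : frags) {
        total += static_cast<uint32_t>(f.size());
    }
    std::vector<uint8_t> out;
    put_u32(out, total);
    for (const auto& f : frags) {
        out.insert(out.end(), f.begin(), f.end());
    }
    return out;
}
```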
Botond Dénes	05cfd7ac5e	utils/reusable_buffer: add get_linearized_view(managed_bytes_view) Allow using reusable buffer with managed bytes too. To be used soon to amortize linearizing query strings before passing them to ANTLR for parsing.	2026-05-26 09:08:06 +03:00
Botond Dénes	4af3359744	cql3/expr: use utils::chunked_string for untyped_constant::raw_text This value can be a string or bytes literal, which can get very large in rare cases. Use chunked storage to avoid large allocations.	2026-05-26 09:08:06 +03:00
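A minimal sketch of chunked storage, assuming an illustrative maximum fragment size of 128 KiB (a value chosen for the example, not taken from the Scylla source): a large literal is split into bounded fragments, so no single allocation grows with the literal's length. `to_chunks()` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: split a string into fragments of at most max_frag
// bytes each, so each individual allocation stays bounded.
inline std::vector<std::string> to_chunks(std::string_view s, std::size_t max_frag) {
    assert(max_frag > 0);
    std::vector<std::string> out;
    for (std::size_t pos = 0; pos < s.size(); pos += max_frag) {
        out.emplace_back(s.substr(pos, max_frag));
    }
    return out;
}
```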
Botond Dénes	2c9a5f9634	types: abstract_type::from_string() switch to fragmented buffers (implementation) The previous patch changed the interface and callers, this one updates the implementation to actually work with fragmented buffers. Most types just use with_linearized() to linearize the fragmented input buffer for parsing. This is fine, as most types have a fixed or bounded-size string representation that is small. Importantly, the input is not linearized for the 3 types which have unbounded values: ascii, bytes and text. The tuple type can contain any of these types itself, so it is also converted to avoid linearization.	2026-05-26 09:08:06 +03:00
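A hedged sketch of the split described above: small, bounded types go through a linearizing helper, while text-like types are processed fragment by fragment. `with_linearized_sketch()`, `int32_from_string()` and `text_from_string()` are hypothetical, and range checking and validation are omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

using fragments = std::vector<std::string_view>;

// Hypothetical helper in the spirit of with_linearized(): hands the callback
// a contiguous view, copying only when the input has more than one fragment.
template <typename Func>
auto with_linearized_sketch(const fragments& in, Func&& f) {
    if (in.size() == 1) {
        return f(in.front());
    }
    std::string tmp;
    for (std::string_view frag : in) {
        tmp.append(frag);
    }
    return f(std::string_view(tmp));
}

// Bounded-size type: its textual form is short, so linearizing is cheap.
inline int32_t int32_from_string(const fragments& in) {
    return with_linearized_sketch(in, [](std::string_view s) {
        return static_cast<int32_t>(std::stol(std::string(s)));  // no range check
    });
}

// Unbounded text-like type: each fragment is copied on its own, so an
// arbitrarily large value never needs one contiguous buffer.
inline std::vector<std::string> text_from_string(const fragments& in) {
    std::vector<std::string> out;
    out.reserve(in.size());
    for (std::string_view frag : in) {
        out.emplace_back(frag);
    }
    return out;
}
```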
Botond Dénes	597d4252dc	types: abstract_type::from_string() switch to fragmented buffers (interface) Change input: std::string_view -> utils::chunked_string_view. Change return value: bytes -> managed_bytes. This patch only changes the interface, with some to_bytes() sprinkled in the internals to deal with recursive calls. Internals will be updated in the next patch, to keep the churn of updating callers separate from the actually important changes.	2026-05-26 09:08:06 +03:00

54172 Commits