Commit Graph

54178 Commits

Author SHA1 Message Date
Pavel Emelyanov
8b2ff16cae schema: Move grace_period from schema_ctxt to schema_registry
The schema_registry_grace_period field on schema_ctxt was only used by
schema_registry itself for eviction timing. Move it to be a direct member
of schema_registry, passed at init() time. This removes one db::config
dependency from schema_ctxt.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Closes scylladb/scylladb#30038
2026-05-29 13:42:23 +03:00
Botond Dénes
1384c9523e Merge 'Simplify handler injection call sites to use appropriate existing API' from Pavel Emelyanov
Several error injection call sites use the verbose handler-lambda API when simpler alternatives already exist in the framework. This series converts them to use the appropriate overloads, reducing boilerplate and making the injection intent immediately obvious from the call site.

Cleaning up in-code debugging facilities, no need to backport

Closes scylladb/scylladb#29962

* github.com:scylladb/scylladb:
  error_injection: Convert handler-style breakpoints to wait_for_message sugar
  error_injection: Convert no-op handler injections to enter()/is_enabled()
  error_injection: Convert handler-throw injections to lambda-throw style
  utils: Add share_messages parameter to breakpoint injection API
2026-05-29 13:41:09 +03:00
Botond Dénes
3ae88e31bd Merge 'test/pylib: stop using random ports for MinIO and JMX' from Piotr Smaron
Replace random port selection in MinIO and JMX test helpers with fixed
ports on unique per-test loopback IPs, eliminating TOCTOU races.

Commits:
- kmip_wrapper: default hostname to 127.0.0.1
- nodetool: bind JMX to the per-module loopback IP with fixed port 7199
- minio: use fixed service and console ports on a unique HostRegistry IP
  instead of probing the ephemeral range; raise on start failure

Fixes: SCYLLADB-1817

Minor improvement, no need to backport.

Closes scylladb/scylladb#29741

* github.com:scylladb/scylladb:
  test/pylib: use fixed MinIO ports on unique loopback IPs
  test/nodetool: bind JMX to per-module loopback IP
  test/pylib: default KMIP wrapper to loopback
2026-05-29 13:40:24 +03:00
Botond Dénes
46631692cd mutation_fragment_stream_validator: use legacy byte order for same-token partition key comparison
When two partition keys share the same token, their relative order is
determined by their raw serialized bytes (legacy_tri_compare), which
matches the physical on-disk order in SSTables.  The validator was
using partition_key::tri_compare instead — a type-aware comparator
that can disagree with byte order for types like timeuuid.

The result was a false-positive "out-of-order partition key" error
for any two same-token partitions whose timeuuid (or other type-aware)
order is the reverse of their byte order.  In scrub mode this caused
the second partition to be silently dropped.
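
The disagreement can be modelled in a few lines of Python. This is only a sketch, not ScyllaDB's comparators: it assumes a single-component timeuuid partition key whose serialized form is the 16 raw UUID bytes, and the helper names `byte_order` and `timeuuid_order` are made up for illustration.

```python
import uuid

def cmp(x, y):
    """Three-way compare returning -1, 0 or 1."""
    return (x > y) - (x < y)

def byte_order(a: uuid.UUID, b: uuid.UUID) -> int:
    # Stand-in for a raw serialized-bytes comparison.
    return cmp(a.bytes, b.bytes)

def timeuuid_order(a: uuid.UUID, b: uuid.UUID) -> int:
    # Stand-in for a type-aware comparison: order by the embedded timestamp.
    return cmp(a.time, b.time)

# Version-1 UUIDs keep the low 32 bits of the timestamp in their first bytes.
# fields = (time_low, time_mid, time_hi_version,
#           clock_seq_hi_variant, clock_seq_low, node)
older = uuid.UUID(fields=(0xFFFFFFFF, 0x0000, 0x1000, 0x80, 0x00, 1))
newer = uuid.UUID(fields=(0x00000000, 0x0001, 0x1000, 0x80, 0x00, 1))

print(timeuuid_order(older, newer))  # negative: older sorts first by timestamp
print(byte_order(older, newer))      # positive: older sorts last by raw bytes
```

A validator that uses the timestamp order would flag this pair as out of order even though the byte order is the one the data was written in.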

Fixes: SCYLLADB-2304

Closes scylladb/scylladb#30120
2026-05-29 11:54:20 +02:00
Tomasz Grabiec
5ceabcbcc5 Merge 'tablets: fix update_tablet_metadata failures during bootstrap' from Aleksandra Martyniuk
When partition_split_builder splits a tablet metadata partition into
multiple mutations, the first mutation gets the partition tombstone
and/or static row while subsequent mutations contain only clustered
rows. The hint logic would correctly clear tokens (marking a full
partition read) upon seeing the tombstone in the first mutation, but
then re-add tokens when processing the subsequent row-only mutations.
This caused update_tablet_metadata to attempt a point update via
mutate_tablet_map_async on a tablet map that doesn't exist yet during
bootstrap, throwing no_such_tablet_map and failing the snapshot transfer.

Fix by adding a full_read flag to table_hint. Once a full partition read
is decided (due to partition tombstone, range tombstone, static row, or
row deletion), the flag prevents subsequent mutations for the same table
from re-adding tokens. Additionally, fall back to a full partition read
when the tablet map is missing locally, which happens when the joining
node receives tablet metadata for a table it has never seen before.

Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-2303.

Needs backports to 2026.1+. 2026.1 introduces the regression with b17a36c071

Closes scylladb/scylladb#30115

* github.com:scylladb/scylladb:
  tablets: fall back to full partition read when tablet map is missing
  tablets: fix hint re-adding tokens after full partition read decision
2026-05-29 11:53:36 +02:00
Raphael S. Carvalho
ea3615de1e compaction: fail resharding when out of space prevention is activated
When out-of-space prevention is activated, the compaction manager is
drained and disabled. This caused resharding to silently succeed without
actually processing any SSTables, because:

1. run_custom_job() calls start_compaction() which returns nullopt when
   is_disabled() is true, and run_custom_job() would just return
   immediately — appearing as a successful no-op.

2. reshard() used throw_if_stopping::no, so even within the compaction
   task executor, stopping would be silently swallowed rather than
   propagated as an exception.

The SSTable loader interprets a successful return from resharding as
"all SSTables processed", so it proceeds without error, leaving
the unprocessed SSTables orphaned and their data missing from the table.

Fix this with two changes:

- run_custom_job(): when start_compaction() returns nullopt, check
  is_disabled() and throw via make_disabled_exception() rather than
  returning silently. This ensures callers are always informed when
  a job was skipped because compaction is disabled (e.g. due to disk
  space pressure), as opposed to a benign skip (e.g. table removed).

- reshard(): change throw_if_stopping::no to throw_if_stopping::yes.
  Resharding is mandatory for correct SSTable loading — unlike reshape
  which is optional and can be safely skipped, resharding failure must
  be propagated to the caller so the loader does not proceed with
  incomplete data.
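
The run_custom_job() change can be sketched as a small Python model. The class, the exception name and the boolean return value are hypothetical stand-ins; only the control flow (raise when disabled, stay silent on a benign skip) follows the description above.

```python
class CompactionDisabledError(Exception):
    """Hypothetical stand-in for what make_disabled_exception() builds."""

class CompactionManager:
    def __init__(self, disabled=False):
        self.disabled = disabled

    def start_compaction(self, table_exists=True):
        # Returns None both when compaction is disabled and on a benign skip.
        if self.disabled or not table_exists:
            return None
        return object()  # placeholder for a compaction state handle

    def run_custom_job(self, job, table_exists=True):
        state = self.start_compaction(table_exists)
        if state is None:
            if self.disabled:
                # Skipped because compaction is disabled: inform the caller.
                raise CompactionDisabledError("compaction is disabled")
            return False  # benign skip, e.g. the table was removed
        job()
        return True

processed = []
CompactionManager().run_custom_job(lambda: processed.append("sstable-1"))
try:
    CompactionManager(disabled=True).run_custom_job(
        lambda: processed.append("sstable-2"))
except CompactionDisabledError as exc:
    print("resharding failed:", exc)
```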

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-2085.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#30041
2026-05-29 12:48:16 +03:00
Botond Dénes
091e3f5191 Merge 'test.py: reduce resource metrics gathering overhead' from Evgeniy Naydanov
Only enable the memory controller in cgroup subtree_control instead of all available controllers. cpu.stat is available in cgroup v2 without enabling the cpu controller (base accounting), and enabling io/pids/cpu controllers adds unnecessary per-operation kernel overhead to Scylla processes - particularly the memory controller's per-page-cache-operation accounting combined with io controller overhead during heavy I/O.

Additionally, restrict SystemResourceMonitor to the master process only. System-wide metrics (CPU%, memory) are identical from any process, so running a monitoring thread in each xdist worker was redundant and added unnecessary SQLite write contention and thread scheduling noise.

Replace cpu_percent(interval=0.1) with a non-blocking cpu_percent()
that returns CPU% since the previous call. Use stop_event.wait(timeout=2.0) as the
loop control to both space out iterations and allow immediate shutdown responsiveness.
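
A minimal sketch of that loop shape, assuming cpu_percent here refers to psutil.cpu_percent(). The `monitor` function and its parameters are illustrative, and a fake sampler replaces psutil so the snippet stays self-contained.

```python
import threading

def monitor(sample, record, stop_event, period=2.0):
    """Sample until stop_event is set, waiting `period` seconds in between."""
    sample()  # prime the counter: a first non-blocking reading has no baseline
    # Event.wait() returns False on timeout and True once the event is set,
    # so it both spaces out iterations and ends the loop promptly.
    while not stop_event.wait(timeout=period):
        record(sample())

samples = []
stop_event = threading.Event()
fake_cpu = iter(range(1000))  # stand-in for psutil.cpu_percent()
worker = threading.Thread(
    target=monitor,
    args=(lambda: next(fake_cpu), samples.append, stop_event, 0.01),
)
worker.start()
stop_event.set()  # shutdown does not have to wait for a full period
worker.join(timeout=1.0)
```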

Fixes SCYLLADB-2141

Closes scylladb/scylladb#29987

* github.com:scylladb/scylladb:
  test: use non-blocking cpu_percent in SystemResourceMonitor
  test.py: reduce cgroup overhead in resource metrics gathering
2026-05-29 10:52:17 +03:00
Yaniv Michael Kaul
f90b066405 cql3: lazily allocate _idx_opt behind unique_ptr
Motivation:
The secondary_index::index object stored in statement_restrictions is
approximately 128 bytes (containing index_metadata with its sstring name,
UUID id, and unordered_map options, plus a target_column sstring). This
field is only populated for queries that use secondary indexing, yet every
prepared statement's restrictions object pays the full inline cost.

Replace std::optional<secondary_index::index> with
std::unique_ptr<secondary_index::index>. This reduces the inline size
from 136 bytes to 8 bytes, saving 128 bytes per non-index-using
prepared statement cached in the prepared statement cache.

The semantics are preserved: null unique_ptr is equivalent to
std::nullopt, and the dereference patterns (-> and *) work identically.
The find_idx() method that returns a copy constructs an optional from
the dereferenced pointer when non-null.

Tests:
- statement_restrictions_test builds and passes
- Full release build compiles cleanly

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
AI-assisted: Yes
Backport: no, improvement

Closes scylladb/scylladb#30046
2026-05-28 21:35:25 +03:00
Aleksandra Martyniuk
491db28fbf tablets: fall back to full partition read when tablet map is missing
When update_tablet_metadata receives a hint with non-empty tokens for a
table whose tablet map doesn't exist locally yet, it would call
mutate_tablet_map_async which throws no_such_tablet_map. This happens
during bootstrap when the joining node receives tablet metadata for a
table it has never seen before.

Fix by checking has_tablet_map() before attempting the point update. If
the map is missing, fall back to do_update_tablet_metadata_partition
which reads the full partition from system.tablets and creates the map.
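
A Python model of the fallback. The `TabletMetadata` class and the `source` dict (standing in for system.tablets) are hypothetical; only the has_tablet_map() check before the point update mirrors the fix.

```python
class NoSuchTabletMap(Exception):
    pass

class TabletMetadata:
    def __init__(self):
        self.maps = {}  # table -> {token: tablet info}

    def has_tablet_map(self, table):
        return table in self.maps

    def mutate_tablet_map(self, table, tokens, source):
        # Point update: only valid when the map already exists.
        if table not in self.maps:
            raise NoSuchTabletMap(table)
        for token in tokens:
            self.maps[table][token] = source[table][token]

    def read_full_partition(self, table, source):
        # Full read: (re)creates the map from the stored partition.
        self.maps[table] = dict(source[table])

def update_tablet_metadata(meta, table, hint_tokens, source):
    """Point update when the map exists, full partition read otherwise."""
    if hint_tokens and meta.has_tablet_map(table):
        meta.mutate_tablet_map(table, hint_tokens, source)
    else:
        meta.read_full_partition(table, source)

source = {"t": {10: "a", 20: "b"}}
meta = TabletMetadata()  # a joining node: no tablet map for "t" yet
update_tablet_metadata(meta, "t", {10}, source)
print(meta.maps)
```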
2026-05-28 16:23:45 +02:00
Dawid Mędrek
ffd9f7ebbd Merge 'treewide: update method accessibility when checked by concepts' from Avi Kivity
Clang 22 and below ignore method accessibility when checking concepts. Clang 23 now [1] checks
accessibility. Make relevant methods public so concepts that check them have access.

The problem was that the concepts were evaluated at the use-site, which was a friend, but should have
been evaluated in some friendless global context. After the clang fix, the problems in our code were exposed.

[1] ac3c588739

Preparing for a new toolchain, so not backporting.

Closes scylladb/scylladb#30053

* github.com:scylladb/scylladb:
  compacting_reader: make consume() methods public
  mutation_fragment_v1_stream: make consume() methods public
2026-05-28 11:19:29 +02:00
Nadav Har'El
7a387a499f Merge 'cql3: extract vector search select statement into cql3/statements/external_search/' from Szymon Malewski
Extract vector_indexed_table_select_statement and its filter logic out of
the monolithic select_statement.cc and vector_search/ module into a
dedicated directory cql3/statements/external_search/.

This improves modularity and eliminates a circular dependency between cql3
and vector_search: the filter code depends heavily on cql3 types
(expressions, query_options, statement_restrictions) and belongs in the cql3
layer. Follow-up to VECTOR-250 which originally addressed the same
dependency but has since regressed.

This is also a preparatory refactoring for full-text search select statements,
which can share some implementation with the vector search.

Pure refactoring, no semantic changes - no need for backporting.

Closes scylladb/scylladb#30100

* github.com:scylladb/scylladb:
  vector_index: move filter into cql3/statements/external_search
  cql3: extract vector_indexed_table_select_statement into own compilation unit
  vector_index: split query_base_table to return raw coordinator_result
2026-05-28 11:26:49 +03:00
Piotr Dulikowski
f44d57c7c7 Merge 'Deprecate HOST_ID_BASED_HINTED_HANDOFF feature and drop migration code' from Gleb Natapov
The feature was included in 2024.2 and is present on all supported versions. No upgrade to HEAD is possible from a version that lacks it, so the feature can be added to the deprecated features list and all the migration code can be dropped.

No need to backport since this is code removal.

Closes scylladb/scylladb#30087

* github.com:scylladb/scylladb:
  hints: remove hint_directory_manager and IP-based hint directory infrastructure
  hints: remove migration infrastructure
  hints: deprecate HOST_ID_BASED_HINTED_HANDOFF feature
2026-05-28 10:09:02 +02:00
Piotr Dulikowski
8dfd455001 Merge 'strong consistency: fix drop table blocking on stuck writes and handle timeout in update()' from Petr Gusev
- Fix table drop blocking for the full client timeout when in-flight writes can't reach quorum
- Handle unhandled timeout exception in the wait-for-leader loop during group startup

When a strongly consistent table is dropped, `schedule_raft_group_deletion()` calls `g->close()`, which waits for all in-flight operations to release their gate holders. But other nodes may have already destroyed their raft servers for this group, so an in-flight write on the leader cannot reach quorum and hangs until the client timeout expires (~seconds), unnecessarily delaying group deletion.

Additionally, the wait-for-leader loop in groups_manager::update() uses abort_on_expiry with a 60-second timeout but never catches the exception if it fires, leaving the group in an indeterminate state.

SCYLLADB-2080 fix:
- Reorder `schedule_raft_group_deletion`: initiate gate close (prevents new operations), then abort the raft server (unblocks stuck writes by causing `raft::stopped_error`), then await the gate future (resolves immediately since holders are released).
- Handle `raft::stopped_error` in the coordinator's top-level catch blocks (both write and read paths): if the table no longer exists, return `no_such_column_family` (CQL layer converts to InvalidRequest: unconfigured table). Otherwise fall through to the default timeout handling.
- Replace gate->hold() with try_hold() + on_internal_error in acquire_server, with a comment explaining why the gate can never be closed at that point (table removal in `schema_applier::commit_on_shard` precedes gate closure, with no scheduling point in between).

Timeout handling fix:
- Use `coroutine::as_future` in the wait-for-leader loop to catch timeout exceptions gracefully — log a warning and break out instead of propagating unhandled.

Includes a cluster test reproducer (test_drop_table_unblocks_stuck_write) that:
1. Pauses a write on the leader before add_entry
2. Drops the table (follower destroys its group immediately)
3. Resumes the write — verifies it fails promptly with InvalidRequest ("unconfigured table") instead of hanging for 15 seconds

backport: no need, strong consistency is not released yet

Fixes: SCYLLADB-2080

Closes scylladb/scylladb#30105

* github.com:scylladb/scylladb:
  strong consistency/groups_manager: handle timeout in update() wait-for-leader loop
  strong consistency: abort raft server before gate close when dropping a table
  test/cluster: rewrite test_queries_while_dropping_table for SCYLLADB-2080
2026-05-28 09:59:20 +02:00
Szymon Malewski
ed1006928f vector_index: move filter into cql3/statements/external_search
Move prepared_filter, prepared_restriction, prepared_rhs types and
prepare_filter() from vector_search/filter.{hh,cc} into new files
cql3/statements/external_search/filter.{hh,cc} under namespace
cql3::statements::external_search.

This eliminates a circular dependency between the cql3 and vector_search
modules: the filter code depends heavily on cql3 types (expressions,
query_options, statement_restrictions) and belongs in the cql3 layer.

This is a follow-up to VECTOR-250 which originally addressed the same
circular dependency but has since regressed.
2026-05-27 21:43:56 +02:00
Szymon Malewski
5e94abe3bc cql3: extract vector_indexed_table_select_statement into own compilation unit
Move vector_indexed_table_select_statement and its associated helpers
(ann_ordering_info, get_ann_ordering_info, add_similarity_function_to_selectors,
get_similarity_ordering_comparator) from select_statement.hh/.cc into new files
cql3/statements/external_search/vector_indexed_table_select_statement.hh/.cc.
2026-05-27 21:43:52 +02:00
Ferenc Szili
76dac2fd8e test: fix format string typo in error logging in ldap_server.py
This change fixes a typo in the error logging format string: s% -> %s
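
For illustration only, here is how the two spellings behave under printf-style formatting, which is what Python's logging applies to its arguments. The message text is made up; the actual string in ldap_server.py is not shown in this commit.

```python
def render(fmt, arg):
    """Apply printf-style formatting and report a failure instead of raising."""
    try:
        return fmt % (arg,)
    except (TypeError, ValueError) as exc:
        return f"formatting failed: {exc}"

print(render("LDAP server error: %s", "timeout"))  # the argument is substituted
print(render("LDAP server error: s%", "timeout"))  # "s%" is not a conversion
```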

Fixes: SCYLLADB-2244

Closes scylladb/scylladb#30088
2026-05-27 17:22:21 +03:00
Anna Stuchlik
c54d7329d4 docs: add Configuration Parameters link to system-configuration index
The system-configuration index page listed the System Configuration
Guide, scylla.yaml, and Snitches, but omitted the Configuration
Parameters reference page.

Fixes https://github.com/scylladb/scylladb/issues/23110

Closes scylladb/scylladb#30117
2026-05-27 17:18:55 +03:00
Aleksandra Martyniuk
d6c1707a04 tablets: fix hint re-adding tokens after full partition read decision
When partition_split_builder splits a tablet metadata partition into
multiple mutations, the first mutation gets the partition tombstone and/or
static row while subsequent mutations contain only clustered rows.

The tablet metadata change hint logic would correctly clear tokens (marking
a full partition read) upon seeing the tombstone in the first mutation,
but then re-add tokens when processing the subsequent row-only mutations.
This caused update_tablet_metadata to attempt a point update via
mutate_tablet_map_async on a tablet map that doesn't exist yet during
bootstrap, throwing no_such_tablet_map and failing the snapshot transfer.

Fix by adding a full_read flag to table_hint. Once a full partition read
is decided (due to partition tombstone, range tombstone, static row, or
row deletion), the flag prevents subsequent mutations for the same table
from re-adding tokens.
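
A simplified Python model of the flag. The dict-shaped mutations and the `update_hint` helper are hypothetical, and only the partition tombstone and static row triggers are modelled.

```python
from dataclasses import dataclass, field

@dataclass
class TableHint:
    tokens: set = field(default_factory=set)
    full_read: bool = False

def update_hint(hint, mutation):
    """Fold one mutation of a split partition into the table's hint."""
    if hint.full_read:
        return  # already decided: later row-only mutations must not undo it
    if mutation.get("partition_tombstone") or mutation.get("static_row"):
        hint.tokens.clear()  # empty tokens mean "read the full partition"
        hint.full_read = True
        return
    hint.tokens.update(mutation.get("row_tokens", ()))

hint = TableHint()
split_partition = [{"partition_tombstone": True}, {"row_tokens": [10, 20]}]
for mutation in split_partition:
    update_hint(hint, mutation)
print(hint)
```

Without the `full_read` check, the second mutation would put tokens 10 and 20 back and turn the full read into a point update again.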
2026-05-27 15:36:16 +02:00
Emil Maskovsky
f845918861 raft: don't block replace when group0 leader is unknown
The join_node_request_handler rejects replace requests when the node
being replaced is still seen as the group0 leader. It loops for up to
10s waiting for the leader to change. However, the loop condition also
blocked when current_leader() returned empty (no leader known):

    while (!g0_server.current_leader() || *params.replaced_id == g0_server.current_leader())

This is incorrect: if current_leader() is empty, it means the old
leader is already gone (election in progress). The replaced node is
no longer the leader, so the safety check is satisfied and the replace
should be allowed to proceed.

Remove the !current_leader() check so the loop only continues while the
replaced node is positively identified as the current leader.
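
The two loop conditions side by side, as a Python model. `None` stands for an empty current_leader(), and the node names are placeholders.

```python
def must_wait_old(current_leader, replaced_id):
    # Old condition: also keeps waiting while no leader is known.
    return current_leader is None or current_leader == replaced_id

def must_wait_new(current_leader, replaced_id):
    # New condition: wait only while the replaced node is the known leader.
    return current_leader == replaced_id

for leader in (None, "node-a", "node-b"):
    print(leader,
          must_wait_old(leader, "node-a"),
          must_wait_new(leader, "node-a"))
```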

No backport needed: the failure rate is 2/17K in CI (dev mode only,
caused by reactor stalls under extreme resource contention) and the
code path only affects replace-after-kill scenarios where the replaced
node was the group0 leader.

Refs: SCYLLADB-2125

Closes scylladb/scylladb#30098
2026-05-27 14:56:30 +02:00
Nadav Har'El
21ecc12fc6 Merge 'index: fix local vector index locality detection after schema reload' from Michał Hudobski
After schema reload, `target_parser::is_local()` did not recognize the
vector-index local target format `{"pk": [...], "tc": "..."}`, causing
local vector indexes to be treated as global. This broke duplicate
detection when both a global and a local vector index existed on the same
column. Fix by introducing `vector_index::is_local()` and dispatching
to it from `create_index_from_index_row()` based on the index class.
Also adds tests for local/global vector index coexistence.

Fixes: SCYLLADB-987

backport reasoning: we added local vector index support in 2026.1

Closes scylladb/scylladb#29492

* github.com:scylladb/scylladb:
  test/cqlpy: add tests for global and local vector index coexistence
  index: fix local vector index locality detection after schema reload
2026-05-27 15:34:57 +03:00
Petr Gusev
f2b1cbe998 strong consistency/groups_manager: handle timeout in update() wait-for-leader loop
The wait-for-leader loop in groups_manager::update() uses abort_on_expiry
with a 60-second timeout. If the timeout fires, co_await w->future throws
an exception that propagates unhandled out of the server_control_op
coroutine, leaving the group in an indeterminate state.

Use coroutine::as_future to catch the exception, log a warning, and break
out of the loop gracefully. The group will still be reported as started
(allowing other operations to proceed) even if the leader wasn't found
within the timeout.
2026-05-27 12:06:46 +02:00
Petr Gusev
d922c43358 strong consistency: abort raft server before gate close when dropping a table
When a strongly consistent table is dropped, schedule_raft_group_deletion()
used to call g->close() first, which waits for all in-flight operations to
release their gate holders. But other nodes may have already destroyed their
raft servers for this group, so an in-flight write on the leader cannot
reach quorum and hangs until the client timeout expires, unnecessarily
delaying group deletion.

Fix: initiate gate close (prevents new operations from entering), then
abort the raft server (causes in-flight add_entry/read_barrier to throw
raft::stopped_error, releasing their gate holders), then await the gate
future (resolves immediately since holders are now released).
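
The ordering can be sketched with asyncio. `Gate`, `RaftServer` and `StoppedError` are toy stand-ins for the Seastar gate, the raft server and raft::stopped_error, written only to show the close, abort, await sequence.

```python
import asyncio

class StoppedError(Exception):
    """Stand-in for raft::stopped_error."""

class Gate:
    def __init__(self):
        self._holders, self._closed = 0, False
        self._drained = asyncio.Event()

    def enter(self):
        if self._closed:
            raise RuntimeError("gate closed")
        self._holders += 1

    def leave(self):
        self._holders -= 1
        if self._closed and self._holders == 0:
            self._drained.set()

    def close(self):
        """Refuse new holders; the returned awaitable completes once drained."""
        self._closed = True
        if self._holders == 0:
            self._drained.set()
        return self._drained.wait()

class RaftServer:
    def __init__(self):
        self._aborted = asyncio.Event()

    async def add_entry(self):
        await self._aborted.wait()  # quorum is unreachable in this model
        raise StoppedError()

    def abort(self):
        self._aborted.set()

async def write(gate, server):
    gate.enter()
    try:
        await server.add_entry()
        return "ok"
    except StoppedError:
        return "stopped"
    finally:
        gate.leave()

async def drop_group(gate, server):
    closed = gate.close()  # 1. no new operations may enter
    server.abort()         # 2. stuck writes fail and release their holders
    await closed           # 3. completes as soon as the holders are gone

async def demo():
    gate, server = Gate(), RaftServer()
    writer = asyncio.create_task(write(gate, server))
    await asyncio.sleep(0)  # let the write enter the gate and block
    await asyncio.wait_for(drop_group(gate, server), timeout=1.0)
    return await writer

result = asyncio.run(demo())
print(result)
```

Awaiting the gate before aborting the server would leave `drop_group` waiting on a write that can never complete.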

Handle raft::stopped_error in the coordinator's top-level catch blocks
(both write and read paths): if the table no longer exists, return
no_such_column_family (which the CQL layer converts to InvalidRequest
'unconfigured table'). Otherwise fall through to the default timeout
handling.

Also replace gate->hold() with try_hold() + on_internal_error in
acquire_server, and handle the timeout exception in the wait-for-leader
loop in update() gracefully (log + break instead of propagating).

Fixes: SCYLLADB-2080
2026-05-27 12:06:46 +02:00
Petr Gusev
89307064b5 test/cluster: rewrite test_queries_while_dropping_table for SCYLLADB-2080
Rewrite the test to use 2 nodes (RF=2) instead of 1 (RF=1), which exposes
the quorum-loss scenario: when a table is dropped, the follower destroys
its raft group immediately while the leader's in-flight operations are
still holding the gate.

The test pauses both a read and a write on the leader, drops the table,
then resumes them. Both are expected to fail with 'no such column family'
since the raft server is aborted as part of group deletion. A 15-second
timeout guard detects the old buggy behavior (write stuck forever).

Marked xfail until the fix is applied in the next commit.
2026-05-27 12:06:46 +02:00
Avi Kivity
668ad55c69 build: degrade -Wpass-failed from error to warning
-Wpass-failed warns when an explicitly requested optimization
(e.g. `#pragma GCC unroll`) cannot be performed. Since the standard
library contains those pragmas, this is more or less out of a
developer's control. We can play with inlining, but cannot guarantee
it will work.

Since the condition isn't fatal in any way, degrade it back to its
default disposition, a warning (it was upgraded to an error via
-Werror). Don't suppress it entirely since in hot paths we do want
to address it.

Closes scylladb/scylladb#29980
2026-05-27 11:38:15 +03:00
Gleb Natapov
54a423986e hints: remove hint_directory_manager and IP-based hint directory infrastructure
Now that HOST_ID_BASED_HINTED_HANDOFF is always enabled, remove the
hint_directory_manager class and all code paths that dealt with
IP-named hint directories and IP-to-host-ID mappings.

- Remove hint_directory_manager class from hint_storage.hh/.cc
- Simplify drain_for to take only host_id (no IP parameter)
- Simplify initialize_endpoint_managers to only scan host-ID directories
- Simplify with_file_update_mutex_for to take host_id directly
- Simplify resource_manager's space_watchdog to use host_id only
- Make storage_proxy::on_leave_cluster empty (draining via on_released)
- Remove uses_host_id() checks from storage_proxy::on_released
2026-05-27 11:13:28 +03:00
Wojciech Mitros
515faaf1d0 strong_consistency: cleanup forwarding reads to leader
When forwarding reads to the raft group leader was introduced, we
didn't use the methods allowing us to cache the leader after
completing requests - we fix it in this commit by using the
redirect_to_leader method prepared for this case.
Also remove a duplicated consecutive 'if'

Closes scylladb/scylladb#30102
2026-05-27 09:49:06 +02:00
Yaron Kaikov
ec36f0f7e1 build: fix collect-dist target failing with missing RPM/DEB rules
The collect_pkgs ninja rules for building collect-dist-{mode} listed
individual RPM and DEB file paths as order-only dependencies. However,
the rpmbuild/debbuild rules only declare the output *directory*
(e.g. $builddir/dist/{mode}/redhat), not the individual files within it.
This caused ninja to fail with:

  ninja: error: '...scylla-....rpm', needed by 'build/.../dist/rpm',
  missing and no known rule to make it

Fix by removing the individual package file paths from the order-only
dependency list. The directory targets ($builddir/dist/{mode}/redhat,
dist-cqlsh-rpm, dist-python3-rpm, etc.) already ensure the packages are
built before collect_pkgs copies them via the $pkgs variable.

5694c93c12 ("build: add collect-dist target to organize build artifacts") introduced this regression.

Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-2215

Closes scylladb/scylladb#30079
2026-05-27 10:17:28 +03:00
Botond Dénes
555cfbcd38 Merge 'treewide: replace deprecated smp::count and smp::all_cpus() with new APIs' from Avi Kivity
Replace all uses of the deprecated seastar::smp::count with this_smp_shard_count() and smp::all_cpus() with this_smp_all_shards() across the ScyllaDB codebase (seastar submodule untouched).

Both replacement functions require a reactor thread context. All call sites were verified to run on reactor threads.

Notable cases:
- dht/token-sharding.hh: this_smp_shard_count() is used as a default parameter value. This is safe since all callers are on reactor threads, but the expression is now evaluated at each call site rather than being a reference to a global variable.
- service/storage_service.hh, locator/abstract_replication_strategy.hh, ent/encryption/encryption.cc: used in default member initializers and constructor member-init-lists. Objects are always constructed on reactor threads.
- schema_builder: sometimes called from BOOST_AUTO_TEST_CASE without a reactor. Added a pre-patch that makes the implicit shard count parameter explicit and passes 1 in those cases.

Not changed:
- scylla-gdb.py: reads smp::count as a GDB symbol (no reactor context).
- Python test files: only reference smp::count in comments/strings.

No backport: the Seastar commit that deprecated these functions hasn't made (and won't make) its way into any release branches (and the warnings are cosmetic anyway).

Closes scylladb/scylladb#29990

* github.com:scylladb/scylladb:
  treewide: replace deprecated smp::count and smp::all_cpus() with new APIs
  scylla-gdb: read shard count from smp::_this_smp instead of smp::count
  schema_builder: make shard_count an explicit constructor parameter
2026-05-27 09:42:06 +03:00
Szymon Malewski
aa17c7739e vector_index: split query_base_table to return raw coordinator_result
The inner query_base_table overloads previously called process_results()
themselves, duplicating row_limit setup and making it impossible to thread
per-execution context (e.g. a similarity provider) into result processing.

Lift process_results() to the top-level overload and change the two inner
overloads to return coordinator_result<foreign_ptr<query::result>> directly.
This cleanly separates query dispatch from result processing, and opens the
door to passing execution-time context at the single process_results() call site.

No functional change.
2026-05-26 21:37:13 +02:00
Avi Kivity
35d7cc7c3e compacting_reader: make consume() methods public
The CompactedFragmentsConsumer concept checks that these methods
exist. Clang 23 tightened [1] the rules to verify that the methods are
publicly accessible. Make them public so we don't fail the build.

[1] ac3c588739
2026-05-26 20:10:10 +03:00
Avi Kivity
e26d983453 mutation_fragment_v1_stream: make consume() methods public
The MutationFragmentConsumerV2 concept checks that these methods
exist. Clang 23 tightened [1] the rules to verify that the methods are
publicly accessible. Make them public so we don't fail the build.

[1] ac3c588739
2026-05-26 20:09:57 +03:00
Avi Kivity
8010e408a2 treewide: replace deprecated smp::count and smp::all_cpus() with new APIs
Replace all uses of the deprecated seastar::smp::count with
this_smp_shard_count() and smp::all_cpus() with this_smp_all_shards()
across the ScyllaDB codebase (seastar submodule untouched).

Both replacement functions require a reactor thread context. All call
sites were verified to run on reactor threads.

Notable cases:
- dht/token-sharding.hh: this_smp_shard_count() is used as a default
  parameter value. This is safe since all callers are on reactor threads,
  but the expression is now evaluated at each call site rather than being
  a reference to a global variable.
- service/storage_service.hh, locator/abstract_replication_strategy.hh,
  ent/encryption/encryption.cc: used in default member initializers and
  constructor member-init-lists. Objects are always constructed on reactor
  threads.

Not changed:
- scylla-gdb.py: reads smp::count as a GDB symbol (no reactor context).
- Python test files: only reference smp::count in comments/strings.
2026-05-26 17:35:20 +03:00
Avi Kivity
3fe64681e9 scylla-gdb: read shard count from smp::_this_smp instead of smp::count
After removing all references to smp::count from ScyllaDB code, in
the next patch, the linker may strip the symbol in release builds
(LTO/gc-sections). The GDB script then fails with 'Missing ELF symbol
_ZN7seastar3smp5countE'.

Read the shard count from the thread-local smp instance pointer
(smp::_this_smp->_shard_count) instead. This pointer is always set on
reactor threads and is guaranteed to survive the linker since it's used
by this_smp_shard_count().

If the current GDB thread is not a reactor thread (e.g., an alien
thread), iterate all threads to find the first reactor thread. If none
have _this_smp set, try the deprecated smp::count global as a last
resort (available in debug builds).
2026-05-26 17:35:19 +03:00
Wojciech Mitros
ae0d77257f mv: fix view_update_builder losing fragments across batch boundaries
When a mutation generates more view updates than max_rows_for_view_updates
(100), view_update_builder::build_some() splits the work into multiple
batches. There was a bug in how fragments were read between batches:

When should_stop_updates() returned true, the old code called stop()
which returned stop_iteration::yes without reading the next fragments.
On the next build_some() call, read_both_next_fragments() was called
at the start, which advanced BOTH readers - skipping any fragment that
was already read but not yet consumed. A row could be left unconsumed if
either:
- the 100th (last in the batch) update was a row insertion and we still
  had insertions/updates remaining
- the 100th (last in the batch) update was a row deletion and we still
  had deletions/updates remaining
For the most common case where work is split in batches, i.e. range
deletions, we couldn't hit this because range delete generates only
view row deletions.
On tables with a single materialized view, we also couldn't get this
for any batches with fewer than 50 statements (unless the batch also
contained range deletions), because one non-range-delete update can
generate up to 2 view updates.
However, for a range of scenarios outside these 2, we could lose
view updates, resulting in persistent inconsistencies.

The fix:
- read_*_next_fragment() now accept a stop_iteration parameter, so the
  next fragments are always read after consuming (even when stopping),
  but stop_iteration::yes is correctly propagated to break the loop.
- build_some() no longer re-reads fragments at the start. Instead, an
  initialize() method performs the initial read once at construction.
- because now we only advance readers after consuming, we won't advance
  readers after end_of_partition, so we extend the break condition to
  accept either readers evaluating to `false` or them being at the
  end_of_partition. We also handle the optimization with
  _skip_row_updates.
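
The batch-boundary bug described above can be reproduced with a toy model. This is only a sketch under stated assumptions, not the view_update_builder code: two sorted key lists stand in for the update and existing readers, and build_batches() and its buggy flag are hypothetical names used for illustration.

```python
def build_batches(updates, existings, batch_size, buggy):
    """Merge two sorted key streams into batches of view updates.

    buggy=True models the old flow: stop without reading the next
    fragments, then advance BOTH readers at the start of the next batch.
    buggy=False models the fix: read once up front, and advance only the
    reader(s) that were just consumed, even when stopping.
    """
    u_it, e_it = iter(updates), iter(existings)
    u = e = None
    if not buggy:
        # Fixed flow: the initial read happens exactly once.
        u, e = next(u_it, None), next(e_it, None)
    batches = []
    while True:
        if buggy:
            # Old flow: unconditionally advance both readers, dropping any
            # fragment that was read but not consumed by the previous batch.
            u, e = next(u_it, None), next(e_it, None)
        batch = []
        while u is not None or e is not None:
            take_u = e is None or (u is not None and u <= e)
            take_e = u is None or (e is not None and e <= u)
            if take_u and take_e:
                batch.append(("update", u))
            elif take_u:
                batch.append(("insert", u))
            else:
                batch.append(("delete", e))
            stop = len(batch) >= batch_size
            if buggy and stop:
                break  # old flow: return without reading the next fragments
            if take_u:
                u = next(u_it, None)
            if take_e:
                e = next(e_it, None)
            if stop:
                break
        if not batch:
            return batches
        batches.append(batch)


old = build_batches([1, 3, 5], [2, 4], batch_size=2, buggy=True)
new = build_batches([1, 3, 5], [2, 4], batch_size=2, buggy=False)
print(old)
print(new)
```

In this model the old flow never emits the insertion for key 3: it was already read when the first batch stopped, and the next batch advanced past it. The fixed flow emits all five updates.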

Fixes: scylladb/scylladb#29155

Closes scylladb/scylladb#29498
2026-05-26 14:15:12 +02:00
Pavel Emelyanov
cd7d9a63bc error_injection: Convert handler-style breakpoints to wait_for_message sugar
Replace verbose handler lambdas that only log and call
wait_for_message() with the equivalent one-liner breakpoint sugar.
The behavior is identical -- the sugar produces the same log messages
in the format "{name}: waiting for message" / "{name}: message received".

Update Python tests that waited for the old ad-hoc log messages to
match the new standardized format.
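
For tests that wait on these breakpoints, both standardized lines can be derived from the injection name alone. A minimal sketch, assuming only the format quoted above; the helper name breakpoint_log_lines is hypothetical and is not part of the test library:

```python
def breakpoint_log_lines(name: str) -> tuple[str, str]:
    """Return the (waiting, received) log lines for a breakpoint injection."""
    return (f"{name}: waiting for message", f"{name}: message received")


waiting, received = breakpoint_log_lines("split_storage_groups_wait")
print(waiting)
print(received)
```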

Converted injections:
 - topology_state_load_before_update_cdc (storage_service.cc)
 - migration_streaming_wait x2 (storage_service.cc)
 - pause_after_streaming_tablet (storage_service.cc)
 - cdc_generation_publisher_fiber (topology_coordinator.cc)
 - wait_after_tablet_cleanup (topology_coordinator.cc)
 - fast_orphan_removal_fiber (topology_coordinator.cc)
 - split_storage_groups_wait (table.cc)
 - wait_before_stop_compaction_groups (table.cc)
 - tasks_vt_get_children (task_manager.cc)
 - truncate_compaction_disabled_wait (database.cc)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-26 15:01:01 +03:00
Pavel Emelyanov
985d63c37d error_injection: Convert no-op handler injections to enter()/is_enabled()
Replace handler lambdas that only set a local variable (without
waiting) with the simpler enter() or is_enabled() check.

Converted injections:
 - speedup_orphan_removal: handler that sets timeout=0 -> enter() gate
 - fast_orphan_removal_fiber: handler that sets a flag then waits ->
   is_enabled() for the flag + breakpoint sugar for the wait

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-26 15:01:01 +03:00
Pavel Emelyanov
a16ae07617 error_injection: Convert handler-throw injections to lambda-throw style
Replace inject(name, handler_lambda_that_throws).get() with the
simpler inject(name, [] { throw ...; }) pattern used consistently
across the codebase.  The handler overload is unnecessary when the
injection just throws immediately without waiting.

Converted injections:
 - get_snapshot_details (table.cc)
 - per-snapshot-get_snapshot_details (table.cc)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-26 15:01:01 +03:00
Avi Kivity
c59985c38b Merge 'cql3: limit large allocations when parsing queries' from Botond Dénes
Queries are stored and passed around as sstring/std::string_view. While normally they are small enough to not cause problems, as the `test_cdc_large_values.TestLargeColumnsWithCDC.test_single_column_blob_max_size_with_cdc_preimage_full_postimage[unprepared_statements]` demonstrates, queries can be arbitrarily large, putting heavy strain on Scylla internals via large allocations, in the extreme case causing denial of service.

This PR attempts to alleviate this by using fragmented storage for queries: read query as fragmented string from the input stream in `transport/server.cc`, propagate it as such to `query_processor::prepare()` and also store it as such in `cql3::cql_statement::raw_cql_statement`. Also avoid linearizing raw values in the CQL expression tree: switch `cql3::expr::untyped_constant::raw_text` to fragmented storage.

For this to be possible, some infrastructure code had to be made fragmented storage friendly: ascii/utf8 validation, hashers, from_hex and importantly: `abstract_type::from_string()`.

Unfortunately, the query still has to be linearized for parsing itself, as ANTLR -- although it allows for a custom InputStream implementation -- plays pointer arithmetic games with the pointers obtained from it, so fragmented input cannot be used.

Still, this PR limits the places where the query is linearized to the
following:
* Parsing
* Audit
* Logs and error messages

So the normal query paths for queries that actually can get arbitrarily large (UPDATE and INSERT) should only linearize the query temporarily for parsing.

Fixes #10779

Improvement, no backport

Closes scylladb/scylladb#28619

* github.com:scylladb/scylladb:
  tracing: add_query(): change query param to utils::chunked_string
  cql3: store raw query string in utils::chunked_string
  serializer: add serializer<utils::chunked_string>
  utils/reusable_buffer: add get_linearized_view(managed_bytes_view)
  cql3/expr: use utils::chunked_string for untyped_constant::raw_text
  types: abstract_type::from_string() switch to fragmented buffers (implementation)
  types: abstract_type::from_string() switch to fragmented buffers (interface)
  types: use write_fragmented from utils/fragment_range.hh
  types: timestamp_from_string(): don't assume std::string_view is null-terminated
  types/duration: don't assume std::string_view is null-terminated
  utils/hashers: add calculate(managed_bytes_view) overload
  utils/ascii: add validate(managed_bytes_view) overload
  utils: add managed_bytes_fwd.hh
  utils: add chunked_string
  utils: add managed_bytes_basic_view::byte_iterator
2026-05-26 15:00:53 +03:00
Pavel Emelyanov
60aa3e2879 utils: Add share_messages parameter to breakpoint injection API
The inject(name, wait_for_message) overload now accepts an optional
third parameter 'share_messages' (defaulting to true) that is passed
through to the underlying handler injection. This allows callers to
use the breakpoint sugar even when they need non-shared message
semantics, avoiding the need for a verbose handler lambda.

This patch partially reverts 324a0829, which added the same ability
but via the wait_for_message struct itself. The explicit argument for
inject() is symmetrical to other inject overloads and is thus
preferable.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-26 14:59:56 +03:00
Avi Kivity
f165b396fd schema_builder: make shard_count an explicit constructor parameter
A recent Seastar update deprecated smp::count and introduced
this_smp_shard_count() as a replacement. One difference is that
this_smp_shard_count() wants to run on a reactor thread.

This poses a problem for non-reactor tests (BOOST_AUTO_TEST_CASE)
that nevertheless use a schema, as the schema_builder constructor
references smp::count. If we replace it with this_smp_shard_count()
then it will crash when running without a reactor.

To fix, remove the implicit this_smp_shard_count() call from raw_schema's
constructor and require callers to pass shard_count explicitly to
schema_builder. This allows tests that don't run on a reactor thread
to construct schemas without crashing.

Production code and reactor-based tests pass this_smp_shard_count().
Non-reactor test files (expr_test, keys_test, nonwrapping_interval_test,
wrapping_interval_test, bti_key_translation_test, range_tombstone_list_test)
pass a fixed shard count of 1.

Note: sstable_test.cc is a Seastar test file (SEASTAR_THREAD_TEST_CASE)
but also contains one plain BOOST_AUTO_TEST_CASE
(test_empty_key_view_comparison) that constructs a schema_builder without
a reactor context. This test also receives a fixed shard count of 1.
2026-05-26 11:55:56 +03:00
Gleb Natapov
d48b8fd1f0 hints: remove migration infrastructure
Remove migrate_ip_directories(), perform_migration(), and all
associated state: _migration_callback, _migrating_done,
_migration_mutex, state::migrating.

Make _uses_host_id a static constexpr true — the dead IP-based
branches still compile but will be removed in the next commit.
2026-05-26 11:44:57 +03:00
Gleb Natapov
10d37494ca hints: deprecate HOST_ID_BASED_HINTED_HANDOFF feature
The host_id_based_hinted_handoff feature is now guaranteed to be
enabled on all supported upgrade paths. Move it to the deprecated
features list (still advertised via gossip for compatibility) and
remove the feature checks from the hint manager startup.
2026-05-26 11:44:57 +03:00
Nikos Dragazis
54cb6d4608 test: Order task-wait before finalization in test_migration_wait_task
The purpose of this test is to verify that the task manager's "wait" API
works correctly for vnodes-to-tablets migration virtual tasks. It starts
a `wait_task` HTTP request concurrently with a finalize (or rollback)
operation, and asserts that the wait returns the correct final state
("done" or "suspended").

The test uses `asyncio.create_task()` to wrap the wait request into a
task, and then immediately calls finalize. With asyncio's lazy task
scheduling, the wait coroutine does not start until the event loop
yields, so the finalization request reaches the server before wait, and
therefore may also complete before it. Once finalization completes, the
virtual migration task is no longer discoverable, causing a
"task not found" error.

Add a log message in Scylla's wait handler and a synchronization point
in the test to ensure that the wait request reaches the server before
finalization. This follows the same pattern used in
`test_tablet_tasks.py::check_and_abort_repair_task`.
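
The ordering problem is easy to reproduce in isolation. The sketch below is not the test itself: wait_task() and finalize() are hypothetical stand-ins for the two HTTP requests, and an asyncio.Event stands in for the log-message synchronization point.

```python
import asyncio


async def demo(sync_before_finalize: bool) -> list[str]:
    events = []
    wait_started = asyncio.Event()

    async def wait_task():
        events.append("wait sent")
        wait_started.set()

    async def finalize():
        events.append("finalize sent")

    # create_task() only schedules the coroutine; it does not run until
    # the current coroutine yields to the event loop.
    task = asyncio.create_task(wait_task())
    if sync_before_finalize:
        # Synchronization point: suspend until the wait request is on its way.
        await wait_started.wait()
    await finalize()
    await task
    return events


print(asyncio.run(demo(False)))  # ['finalize sent', 'wait sent']
print(asyncio.run(demo(True)))   # ['wait sent', 'finalize sent']
```

Without the synchronization point, finalize() runs first even though the wait task was created earlier, which is the race described above.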

Fixes SCYLLADB-2077

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>

Closes scylladb/scylladb#29973
2026-05-26 10:43:22 +03:00
Botond Dénes
0fd25dc47c Merge 'Replace get_injection_parameters() with inject_parameter() where appropriate' from Pavel Emelyanov
Several error injection sites use the low-level get_injection_parameters() API to fetch the entire parameters map and then manually look up a single key. The inject_parameter() API is better suited for these cases — it combines the enabled check and typed single-parameter extraction in one call, returning std::optional.

Cleaning error injection usage, not backporting

Closes scylladb/scylladb#29970

* github.com:scylladb/scylladb:
  test: Use inject_parameter() in row_cache_test
  sstables: Use inject_parameter() for mx reader fill buffer timeout
  streaming: Use inject_parameter() for order_sstables_for_streaming
2026-05-26 10:32:44 +03:00
Anna Stuchlik
bd089ebcaa doc: remove broken References section from sstables-3-index
Remove the References section containing broken links to Cassandra
source files that no longer exist.

Fixes https://github.com/scylladb/scylladb/issues/30080

Closes scylladb/scylladb#30081
2026-05-26 09:50:08 +03:00
Nadav Har'El
f65a52f3ec Merge 'vector_search: test: migrate rescoring tests from C++/Boost to pytest' from Szymon Malewski
Migrate mock-based rescoring and oversampling tests from
test/vector_search/rescoring_test.cc to pytest and delete the C++ file.
Index option validation tests go to test_vector_index.py; rescoring tests
go to a new test_vector_search_rescoring.py which introduces shared
infrastructure (EmbeddingRow dataclass, TEST_DATA dict,
reversed_ann_response() helper, rescoring_test_table() context manager).

Two tests have updated assertions (semantic change):
filters_invalid_similarity_scores now uses per-function expected result
sets including a zero-vector row, and rescoring_with_zerovector_query
asserts empty results after NaN filtering (cosine only). Both are marked
xfail pending SCYLLADB-924.

Follow-up to #29593.

Does not require backport - simple refactoring of tests

Closes scylladb/scylladb#29906

* github.com:scylladb/scylladb:
  test/vector_search: migrate zero-vector query rescoring test to pytest; delete rescoring_test.cc
  test/vector_search: migrate invalid similarity score filtering test to pytest
  test/vector_search: migrate non-ANN similarity argument rescoring test to pytest
  test/vector_search: migrate wildcard select rescoring test to pytest
  test/vector_search: migrate similarity_function rescoring test to pytest
  test/vector_search: migrate rescoring and f32 quantization tests to pytest
  test/vector_search: migrate oversampling tests to pytest
  test/vector_search: migrate vector_index option validation tests to pytest
2026-05-26 09:45:40 +03:00
Botond Dénes
853edcbf75 tracing: add_query(): change query param to utils::chunked_string
Having to unconditionally linearize the chunked query string when
passing it to tracing undoes the work put into reducing large
allocations on the query path. add_query() is evaluated eagerly on
every query, even if tracing is disabled. Defer the linearization to
build_parameters_map(), which is only called if tracing is enabled.
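
The effect of deferring the copy can be sketched with a toy model. This is an illustration only, assuming a query held as a list of byte fragments; ToyTraceState and its members are hypothetical and do not mirror the C++ tracing classes.

```python
class ToyTraceState:
    """Keeps the query fragmented until a parameters map is actually built."""

    def __init__(self, enabled: bool):
        self.enabled = enabled
        self._query_fragments = None
        self.linearizations = 0  # counts large contiguous copies

    def add_query(self, fragments):
        # Runs on every query: store a reference, copy nothing.
        self._query_fragments = fragments

    def build_parameters_map(self):
        # Only reached when tracing is enabled, so the large contiguous
        # allocation is paid for only when it is needed.
        self.linearizations += 1
        return {"query": b"".join(self._query_fragments).decode()}

    def finish(self):
        return self.build_parameters_map() if self.enabled else None


query = [b"SELECT * ", b"FROM ks.t"]
for enabled in (False, True):
    state = ToyTraceState(enabled)
    state.add_query(query)
    print(enabled, state.finish(), state.linearizations)
```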
2026-05-26 09:08:06 +03:00
Botond Dénes
6c3f104b67 cql3: store raw query string in utils::chunked_string
Read query as fragmented string from the input stream in
transport/server.cc, propagate it as such to query_processor::prepare()
and also store it as such in cql3::cql_statement::raw_cql_statement.

Unfortunately, the query still has to be linearized for parsing, as
ANTLR -- although it allows for a custom InputStream implementation --
plays pointer arithmetic games with the pointers obtained from it, so
fragmented input cannot be used.
To amortize the cost of this linearization, the query string is
linearized through utils::reusable_buffer. The parser can be
invoked recursively; nested invocations linearize directly.

Still, this patch limits the places where the query is linearized to the
following:
* Parsing
* Audit
* Logs and error messages

So the normal query paths for queries that actually can get arbitrarily
large (UPDATE and INSERT) should only linearize the query temporarily
for parsing.
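
The amortization idea can be sketched as follows. This is a toy model rather than utils::reusable_buffer: the class below and its growths counter are hypothetical, and a list of byte fragments stands in for the chunked query string.

```python
class ReusableBuffer:
    """Toy buffer that grows on demand and is reused across calls."""

    def __init__(self):
        self._buf = bytearray()
        self.growths = 0  # how many times the backing storage had to grow

    def get_linearized_view(self, fragments):
        """Copy the fragments back to back and return a contiguous view."""
        total = sum(len(f) for f in fragments)
        if total > len(self._buf):
            self._buf = bytearray(total)
            self.growths += 1
        pos = 0
        for f in fragments:
            self._buf[pos:pos + len(f)] = f
            pos += len(f)
        # The view is only valid until the next call reuses the buffer.
        return memoryview(self._buf)[:total]


buf = ReusableBuffer()
big = [b"INSERT INTO ks.t ", b"(pk, v) ", b"VALUES (1, 2)"]
small = [b"SELECT * ", b"FROM ks.t"]
first = bytes(buf.get_linearized_view(big)).decode()
second = bytes(buf.get_linearized_view(small)).decode()
print(first)
print(second)
print(buf.growths)
```

Once the buffer has grown to fit the largest query seen, later and smaller queries are linearized without a new allocation, so the temporary copy needed by the parser does not cost a fresh large allocation each time.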
2026-05-26 09:08:06 +03:00
Botond Dénes
bf1a775fe4 serializer: add serializer<utils::chunked_string>
Also add normalizer which maps to sstring. utils::chunked_string's wire
representation is binary compatible with that of sstring, which allows
for seamless migration of RPCs from sstring to utils::chunked_string
where needed. Will be used in the next commit for forward CQL prepare
request (query string).
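
Binary compatibility here means that a receiver cannot tell whether the sender held the string contiguously or in fragments. The sketch below illustrates the idea with a 4-byte little-endian length prefix; that framing is an assumption chosen for illustration, not a statement about the actual wire format, and all function names are hypothetical.

```python
import struct


def serialize_contiguous(s: bytes) -> bytes:
    """Length prefix followed by the bytes of a contiguous string."""
    return struct.pack("<I", len(s)) + s


def serialize_chunked(fragments) -> bytes:
    """Same framing, written fragment by fragment."""
    total = sum(len(f) for f in fragments)
    return b"".join([struct.pack("<I", total), *fragments])


def deserialize_contiguous(wire: bytes) -> bytes:
    """Read the string back as one contiguous value."""
    (n,) = struct.unpack_from("<I", wire)
    return wire[4:4 + n]


fragments = [b"SELECT * ", b"FROM ks.t ", b"WHERE pk = 1"]
wire = serialize_chunked(fragments)
print(wire == serialize_contiguous(b"".join(fragments)))
print(deserialize_contiguous(wire))
```

Because both writers produce identical bytes, one side of an RPC could switch to the chunked type without the other side noticing, which is the migration property the commit message relies on.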
2026-05-26 09:08:06 +03:00
Botond Dénes
05cfd7ac5e utils/reusable_buffer: add get_linearized_view(managed_bytes_view)
Allow using reusable buffer with managed bytes too. To be used soon to
amortize linearizing query strings before passing them to ANTLR for
parsing.
2026-05-26 09:08:06 +03:00