Commit Graph

53141 Commits

Author SHA1 Message Date
Yaniv Michael Kaul
28e59bae5a utils, db: qualify seastar::coroutine:: to avoid shadowing by utils::coroutine class
Inside namespace utils, unqualified coroutine:: resolves to the
utils::coroutine class (utils/coroutine.hh) rather than the
seastar::coroutine namespace. This causes build failures when
replica/database.hh is added to the precompiled header, because
utils/coroutine.hh becomes transitively visible in all TUs.

Qualify all coroutine:: references with seastar:: in affected files
under utils/ and db/.
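Below is a minimal, self-contained sketch of the lookup problem. The names are stand-ins, not the real Seastar or Scylla declarations: `helper()` is hypothetical, and the global `using namespace seastar;` is an assumption that mimics seastar names being visible unqualified. Inside `namespace utils`, lookup of `coroutine` stops at the class `utils::coroutine`, so only the fully qualified `seastar::coroutine::` form reaches the namespace.

```cpp
// Stand-in for the seastar::coroutine namespace; helper() is a hypothetical name.
namespace seastar {
namespace coroutine {
inline int helper() { return 42; }
}  // namespace coroutine
}  // namespace seastar

// Assumption for this sketch: seastar names are made visible at global scope.
using namespace seastar;

// At global scope, unqualified coroutine:: finds seastar::coroutine
// through the using-directive.
inline int use_helper_at_global_scope() { return coroutine::helper(); }

namespace utils {

// Stand-in for the utils::coroutine class from utils/coroutine.hh.
class coroutine {};

inline int use_helper() {
    // return coroutine::helper();  // ill-formed here: lookup of `coroutine`
    //                              // stops at the class utils::coroutine,
    //                              // which has no member helper()
    return seastar::coroutine::helper();  // qualified: always the namespace
}

}  // namespace utils
```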
2026-04-19 10:54:19 +03:00
Yaniv Michael Kaul
66618cd869 pch: expand precompiled header with more high-impact Scylla headers
Add to stdafx.hh: locator/token_metadata.hh, gms/gossiper.hh,
db/system_keyspace.hh, service/topology_state_machine.hh,
cql3/query_options.hh, service/client_state.hh, cql3/query_processor.hh,
db/config.hh, service/storage_proxy.hh, schema/schema_builder.hh,
exceptions/exceptions.hh, gms/feature_service.hh,
service/migration_manager.hh, sstables/sstables.hh,
service/storage_service.hh, transport/messages/result_message.hh.

These headers are included by 40-140 translation units each. Adding them
to the PCH avoids redundant parsing across the build. Combined with the
previous PCH commit, clean dev build time drops from 22m33s to ~14m23s
(-36.2%).
2026-04-19 10:54:19 +03:00
Yaniv Michael Kaul
b4586f0789 utils: fix PCH compatibility in config_file and object_storage
Convert config_file.cc read_from_file() from continuation-style to
coroutines, avoiding a template instantiation conflict with
-fpch-instantiate-templates when heavy Scylla headers are in the PCH.

Qualify input_stream<char> in object_storage.cc lambda parameter with
seastar:: to resolve the same PCH template parsing issue.
2026-04-19 10:54:19 +03:00
Yaniv Michael Kaul
37280265ef pch: add commonly-used Scylla internal headers to precompiled header
Add schema/schema.hh, types/types.hh, mutation/mutation_partition.hh,
mutation/mutation_fragment.hh and their dependencies (bytes.hh, keys.hh,
dht/token.hh, locator types, etc.) to the PCH. These are included by
the vast majority of translation units and benefit greatly from being
precompiled once rather than parsed ~400 times.

Reduces clean dev build time from ~22m to ~18m (~19% faster).
2026-04-19 10:54:18 +03:00
Yaniv Michael Kaul
2fbba4a071 raft, service, locator: create raft_fwd.hh and reduce heavy header includes
Create raft/raft_fwd.hh with lightweight type aliases (server_id, group_id,
term_t, index_t) backed only by raft/internal.hh, avoiding the heavy
raft/raft.hh (832 lines with futures, abort_source, bytes_ostream).
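A `_fwd.hh` header only needs to declare cheap vocabulary types. The sketch below shows what such aliases might look like, assuming a tag-based wrapper. `tagged_id` and `tagged_uint64` are hypothetical stand-ins, not necessarily the definitions in raft/internal.hh. Distinct tag types make the IDs distinct, non-interconvertible types at no runtime cost.

```cpp
#include <cstdint>
#include <type_traits>

namespace raft {
namespace internal {

// Hypothetical strongly-typed wrappers; each Tag yields a separate type.
template <typename Tag>
struct tagged_id {
    std::uint64_t value = 0;  // a real ID type would likely wrap a UUID
    bool operator==(const tagged_id& o) const { return value == o.value; }
};

template <typename Tag>
struct tagged_uint64 {
    std::uint64_t value = 0;
    bool operator<(const tagged_uint64& o) const { return value < o.value; }
};

}  // namespace internal

// What a lightweight raft_fwd.hh could expose: aliases only, with no
// futures, abort_source or bytes_ostream pulled in.
using server_id = internal::tagged_id<struct server_id_tag>;
using group_id = internal::tagged_id<struct group_id_tag>;
using term_t = internal::tagged_uint64<struct term_tag>;
using index_t = internal::tagged_uint64<struct index_tag>;

}  // namespace raft
```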

Replace raft/raft.hh with raft/raft_fwd.hh in headers that only need the
basic ID types: tablets.hh, topology_state_machine.hh,
topology_coordinator.hh, storage_service.hh, group0_fwd.hh,
view_building_coordinator.hh, view_building_worker.hh.

Also remove gossiper.hh and tablet_allocator.hh from storage_service.hh
(forward declarations suffice), and remove unused reactor.hh from
tablets.hh. Add explicit includes in .cc files that lost transitive
availability.
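A forward declaration suffices in a header like storage_service.hh when the class only stores references or pointers to `gossiper` or declares functions that take it by reference. Here is a minimal single-file sketch with hypothetical member names. In the real tree, the definition would live in gms/gossiper.hh and would be included only by the `.cc` files that use it.

```cpp
// --- what the header needs: just a declaration ---
namespace gms {
class gossiper;  // forward declaration; no #include "gms/gossiper.hh"
}

class storage_service_sketch {  // hypothetical stand-in for storage_service
    gms::gossiper& _gossiper;  // references to incomplete types are fine
public:
    explicit storage_service_sketch(gms::gossiper& g) : _gossiper(g) {}
    int live_nodes() const;  // declared here, defined where gossiper is complete
};

// --- what the .cc file adds: the full definition ---
namespace gms {
class gossiper {
public:
    int live_nodes() const { return 3; }  // hypothetical member
};
}  // namespace gms

int storage_service_sketch::live_nodes() const { return _gossiper.live_nodes(); }
```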
2026-04-17 01:08:04 +03:00
Yaniv Michael Kaul
be5fa64d36 db: break gossiper.hh include from system_keyspace.hh
Extract loaded_endpoint_state into a standalone lightweight header to
avoid pulling the heavy gossiper.hh (and transitively query-result-set.hh)
into every includer of system_keyspace.hh. Add explicit includes where
the full definitions are actually needed.

Reduces clean dev build time by ~2 minutes (-8%).
2026-04-16 23:27:55 +03:00
Yaniv Michael Kaul
5c918d29cc service: remove unused storage_service.hh include from storage_proxy.hh
storage_proxy.hh included storage_service.hh but never referenced any
symbol from it. storage_service.hh costs 3.7s to parse per file, and
storage_proxy.hh has 75 direct includers. While most of those also
include database.hh (which shares transitive deps), removing this
unnecessary include still reduces total parse work.

Speedup: part of a series measured at -5.8% wall-clock improvement
(same-session A/B: 16m14s -> 15m17s at -j16, 16 cores).
2026-04-16 18:22:56 +03:00
Yaniv Michael Kaul
43e337a663 db, test: add explicit includes for storage_service.hh and system_keyspace.hh
Add explicit includes that were previously available transitively through
service/storage_proxy.hh -> service/storage_service.hh.

This prepares for removing the unused storage_service.hh include from
storage_proxy.hh in a follow-up commit.

Speedup: prerequisite for storage_proxy.hh include chain reduction
(measured -5.8% wall-clock combined with all changes in this series,
same-session A/B: 16m14s -> 15m17s at -j16).
2026-04-16 18:22:41 +03:00
Yaniv Michael Kaul
a67efb031c db: break heavy include chain from config.hh by extracting replication_strategy_type
Extract replication_strategy_type enum from locator/abstract_replication_strategy.hh
into a new lightweight header locator/replication_strategy_type.hh, and use it in
db/config.hh instead of the full abstract_replication_strategy.hh.

abstract_replication_strategy.hh pulls in a large transitive dependency tree
(schema.hh, mutation serializers, etc.) costing ~1.7s per file. With this change,
config.hh's incremental parse cost drops from 1.7s to 0.6s. Since ~85 files
include config.hh without also including database.hh (which would bring in these
deps anyway), this saves ~93s total CPU.
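The CPU saving follows from the per-file numbers above:

```latex
\[
85 \times (1.7\,\text{s} - 0.6\,\text{s}) = 85 \times 1.1\,\text{s} = 93.5\,\text{s} \approx 93\,\text{s of CPU time}
\]
```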

Speedup: part of a series measured at -5.8% wall-clock improvement
(same-session A/B: 16m14s -> 15m17s at -j16, 16 cores).
2026-04-16 18:19:19 +03:00
Yaniv Michael Kaul
5b0933c453 utils: add explicit include for exceptions.hh in s3/client.cc
Add explicit #include for utils/exceptions.hh which was previously
available transitively through db/config.hh -> abstract_replication_strategy.hh.

This prepares for removing the heavy abstract_replication_strategy.hh
include from db/config.hh in a follow-up commit.

Speedup: prerequisite for config.hh include chain reduction
(measured -5.8% wall-clock combined with all changes in this series,
same-session A/B: 16m14s -> 15m17s at -j16).
2026-04-16 18:19:04 +03:00
Yaniv Michael Kaul
2ac834d797 pch: remove seastar/http/api_docs.hh from precompiled header
The api_docs.hh header contains inline method bodies (api_registry::handle)
that call seastar::json::formatter::to_json(), forcing the compiler to
instantiate seastar::json template specializations (json_list_template,
formatter::write, do_with, etc.) in every compilation unit — even files
that never use any HTTP/JSON API types.

Measured ~6s of wasted template instantiation per file × ~620 files =
~3,700s total CPU. Only 2 files outside the PCH include api_docs.hh
directly, so removing it has no impact on code that actually uses these
types.

Wall-clock build time (-j16, Seastar/Abseil cached):
  Before (with loading_cache fix): avg 23m29s
  After:                           avg 23m04s  (-1.8%)
  vs original baseline:            avg 24m01s  (-4.0%)
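For reference, here are the relative improvements with the times converted to seconds (23m29s = 1409 s, 23m04s = 1384 s, 24m01s = 1441 s):

```latex
\[
\frac{1409 - 1384}{1409} = \frac{25}{1409} \approx 1.8\%,
\qquad
\frac{1441 - 1384}{1441} = \frac{57}{1441} \approx 4.0\%
\]
```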
2026-04-15 09:29:25 +03:00
Yaniv Michael Kaul
b324c84a04 cql3: break loading_cache include chain from query_processor.hh
utils/loading_cache.hh is an expensive template header that costs
~2,494 seconds of aggregate CPU time across 133 files that include it.
88 of those files include it only transitively via query_processor.hh
through the chain: query_processor.hh -> prepared_statements_cache.hh
-> loading_cache.hh, costing ~1,690s of template instantiation.

Break the chain by:
- Replacing #include of prepared_statements_cache.hh and
  authorized_prepared_statements_cache.hh in query_processor.hh with
  forward declarations and the lightweight prepared_cache_key_type.hh
- Replacing #include of result_message.hh with result_message_base.hh
  (which doesn't pull in prepared_statements_cache.hh)
- Changing prepared_statements_cache and authorized_prepared_statements_cache
  members to std::unique_ptr (PImpl) since forward-declared types
  cannot be held by value
- Moving get_prepared(), execute_prepared(), execute_direct(), and
  execute_batch() method bodies from the header to query_processor.cc
- Updating transport/server.cc to use the concrete type instead of the
  no-longer-visible authorized_prepared_statements_cache::value_type
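The `std::unique_ptr` change works because a `std::unique_ptr<T>` member may be declared while `T` is incomplete, provided every function that destroys it (including the destructor) is defined where `T` is complete. Below is a condensed single-file sketch with hypothetical names (`query_processor_sketch`, `prepared_cache`, `prepare`). In the real split, the first half corresponds to the header and the second half to query_processor.cc.

```cpp
#include <memory>
#include <string>
#include <unordered_map>

// ---- header part: the cache type is only forward-declared ----
class prepared_cache;  // hypothetical stand-in for prepared_statements_cache

class query_processor_sketch {
    std::unique_ptr<prepared_cache> _cache;  // OK with an incomplete type...
public:
    query_processor_sketch();
    ~query_processor_sketch();  // ...because the destructor is defined out of line
    void prepare(const std::string& key, int statement_id);
    int get_prepared(const std::string& key) const;  // body lives in the .cc part
};

// ---- .cc part: the full (expensive) definition is visible only here ----
class prepared_cache {
public:
    std::unordered_map<std::string, int> entries;
};

query_processor_sketch::query_processor_sketch()
    : _cache(std::make_unique<prepared_cache>()) {}

// Defined after prepared_cache is complete, so unique_ptr can delete it.
query_processor_sketch::~query_processor_sketch() = default;

void query_processor_sketch::prepare(const std::string& key, int statement_id) {
    _cache->entries[key] = statement_id;
}

// Returns -1 when the key is not cached (a convention of this sketch).
int query_processor_sketch::get_prepared(const std::string& key) const {
    auto it = _cache->entries.find(key);
    return it == _cache->entries.end() ? -1 : it->second;
}
```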

Per-file measurement: files including query_processor.hh now show zero
loading_cache template instantiation events (previously 20-32s each).

Wall-clock measurement (clean build, -j16, 16 cores, Seastar cached):
  Baseline (origin/master):           avg 24m01s (24m03s, 23m59s)
  With loading_cache chain break:     avg 23m29s (23m32s, 23m29s, 23m27s)
  Improvement:                        ~32s, ~2.2%
2026-04-15 04:21:15 +03:00
Yaniv Michael Kaul
b499dc8e9d cql3: extract prepared_cache_key_type into standalone lightweight header
Move prepared_cache_key_type class and its std::hash / fmt::formatter
specializations from prepared_statements_cache.hh into a new header
cql3/prepared_cache_key_type.hh.

The new header only depends on bytes.hh, utils/hash.hh, and
cql3/dialect.hh -- it does NOT include utils/loading_cache.hh.
This allows code that needs the cache key type (e.g. for function
signatures) without pulling in the expensive loading_cache template
machinery.
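The sketch below shows the kind of standalone key header described here: a small value type plus its `std::hash` specialization, with no dependency on the cache template. The member layout (an id string plus a boolean standing in for the dialect) is a hypothetical simplification, not the real `prepared_cache_key_type`. The fmt::formatter specialization is omitted.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

namespace cql3 {

// Hypothetical simplification of a prepared-statement cache key.
class prepared_cache_key_type {
    std::string _id;     // e.g. a digest of the query text
    bool _dialect_flag;  // stand-in for the dialect component
public:
    prepared_cache_key_type(std::string id, bool dialect_flag)
        : _id(std::move(id)), _dialect_flag(dialect_flag) {}
    const std::string& id() const { return _id; }
    bool dialect_flag() const { return _dialect_flag; }
    bool operator==(const prepared_cache_key_type& o) const {
        return _id == o._id && _dialect_flag == o._dialect_flag;
    }
};

}  // namespace cql3

// Lets the key be used in std::unordered_map without any cache headers.
namespace std {
template <>
struct hash<cql3::prepared_cache_key_type> {
    size_t operator()(const cql3::prepared_cache_key_type& k) const noexcept {
        size_t h = hash<string>{}(k.id());
        // Mix in the flag using a boost::hash_combine-style step.
        h ^= hash<bool>{}(k.dialect_flag()) + 0x9e3779b9 + (h << 6) + (h >> 2);
        return h;
    }
};
}  // namespace std
```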

prepared_statements_cache.hh now includes prepared_cache_key_type.hh,
so existing includers are unaffected.

No functional change. Prepares for breaking the loading_cache include
chain from query_processor.hh.
2026-04-15 04:20:57 +03:00
Yaniv Michael Kaul
8ad8e76c3b cql3, service, test: add explicit includes for headers losing transitive availability
Add explicit #include directives for headers that are currently
available transitively through cql3/query_processor.hh but will stop
being available after a subsequent refactoring that removes the
loading_cache include chain.

Files changed:
- cql3/statements/drop_keyspace_statement.cc: add unimplemented.hh
- cql3/statements/truncate_statement.cc: add unimplemented.hh
- cql3/statements/batch_statement.cc: add result_message.hh
- cql3/statements/broadcast_modification_statement.cc: add result_message.hh
- service/paxos/paxos_state.cc: add result_message.hh
- test/lib/cql_test_env.cc: add result_message.hh
- table_helper.cc: add result_message.hh

No functional change. Prepares for subsequent query_processor.hh cleanup.
2026-04-15 04:20:49 +03:00
Avi Kivity
0ae22a09d4 LICENSE: Update to version 1.1
Updated terms of non-commercial use (must be a never-customer).
2026-04-12 19:46:33 +03:00
Avi Kivity
22949bae52 Merge 'logstor: implement tablet split/merge and migration' from Michael Litvak
implement tablet split, tablet merge and tablet migration for tables that use the experimental logstor storage engine.

* tablet merge simply merges the histograms of segments of one compaction group with another.
* for tablet split we take the segments from the source compaction group, read them and write all live records to separate segments according to the split classifier, and move separated segments to the target compaction groups.
* for tablet migration we use stream_blob, similarly to file streaming of sstables. we add a new op type for streaming a logstor segment. on the source we take a snapshot of the segments with an input stream that reads the segment, and on the target we create a sink that allocates a new segment on the target shard and writes to it.
* we also make some improvements to recovery and loading of segments. we add a segment header that contains useful information for non-mixed segments, such as the table and token range.

Refs SCYLLADB-770

no backport - still a new and experimental feature

Closes scylladb/scylladb#29207

* github.com:scylladb/scylladb:
  test: logstor: additional logstor tests
  docs/dev: add logstor on-disk format section
  logstor: add version and crc to buffer header
  test: logstor: tablet split/merge and migration
  logstor: enable tablet balancing
  logstor: streaming of logstor segments using stream_blob
  logstor: add take_logstor_snapshot
  logstor: segment input/output stream
  logstor: implement compaction_group::cleanup
  logstor: tablet split
  logstor: tablet merge
  logstor: add compaction reenabler
  logstor: add segment header
  logstor: serialize writes to active segment
  replica: extend compaction_group functions for logstor
  replica: add compaction_group_for_logstor_segment
  logstor: code cleanup
2026-04-12 16:11:12 +03:00
Israel Fruchter
79c736455e cqlsh: update to v6.0.34-scylla
Update cqlsh to version v6.0.34-scylla.

Notable fix:
- Fix vector type formatting error (scylladb/scylla-cqlsh#165)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Closes scylladb/scylladb#29401
2026-04-12 14:54:50 +03:00
Avi Kivity
8ccee6803e Merge 'Remove upgrade view builder' from Gleb Natapov
Since we no longer support upgrading from versions that do not support
v2 of the "view building status" code (where building status is managed by raft), we can remove the v1 code and the upgrade code, and make sure we do not boot with the old "builder status" version.

The v2 version was introduced by 8d25a4d678, which is included in scylla-2025.1.0.

No backport needed since this is code removal.

Closes scylladb/scylladb#29105

* github.com:scylladb/scylladb:
  view: drop unused v1 builder code
  view: remove upgrade to raft code
2026-04-12 00:39:26 +03:00
Botond Dénes
9770a4c081 test/cluster/test_encryption.py: use single-partition reads in read_verify_workload()
Replace the range scan in read_verify_workload() with individual
single-partition queries, using the keys returned by
prepare_write_workload() instead of hard-coding them.

The range scan was previously observed to time out in debug mode after
a hard cluster restart. Single-partition reads are lighter on the
cluster and less likely to time out under load.

The new verification is also stricter: instead of merely checking that
the expected number of rows is returned, it verifies that each written
key is individually readable, catching any data-loss or key-identity
mismatch that the old count-only check would have missed.
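The following sketch shows why the per-key check is stricter than a row count. It uses in-memory sets rather than CQL results, and `count_matches()` and `keys_missing()` are hypothetical helpers, not functions from the test file. A count-only check passes whenever the numbers match, even when a different key comes back.

```cpp
#include <set>
#include <string>
#include <vector>

// Old-style check: only compares the number of rows.
bool count_matches(const std::vector<std::string>& written,
                   const std::set<std::string>& read_back) {
    return written.size() == read_back.size();
}

// New-style check: every written key must be individually readable.
// Returns the keys that could not be read back.
std::vector<std::string> keys_missing(const std::vector<std::string>& written,
                                      const std::set<std::string>& readable) {
    std::vector<std::string> missing;
    for (const auto& key : written) {
        if (readable.count(key) == 0) {  // models one single-partition read per key
            missing.push_back(key);
        }
    }
    return missing;
}
```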

This is the second attempt at stabilizing this test, after the recent
854c374ebf. That fix made sure that the
cluster has converged on topology and nodes see each other before running
the verify workload.

Fixes: SCYLLADB-1331

Closes scylladb/scylladb#29313
2026-04-12 00:38:20 +03:00
Avi Kivity
ca80ee8586 Merge 'Introduce maintenance scheduling supergroup and do initial population' from Pavel Emelyanov
The supergroup replaces the streaming group (a.k.a. maintenance as well), inherits its 200 shares, and consists of four sub-groups (each with equal shares of 200 within the new supergroup):
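Assume the usual proportional-share semantics of Seastar scheduling groups and that all four sub-groups are busy at the same time. Each sub-group's slice of the supergroup's CPU time then works out as:

```latex
\[
\text{fraction}_i = \frac{s_i}{\sum_{j=1}^{4} s_j} = \frac{200}{4 \times 200} = \frac{1}{4}
\]
```

Under the same assumption, a sub-group that is the only busy one should be able to use the supergroup's whole allotment.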

* maintenance_compaction. This group configures `compaction_manager::maintenance_sg()` group. User-triggered compaction runs in it
* backup. This group configures `snapshot_ctl::config::backup_sched_group`. Native backup activity runs there
* maintenance. It's a new "visible" name: everything that was called "maintenance" in the code used to run in the "streaming" group, and now it will run in "maintenance". The activities include those that don't communicate over RPC (see below for why)
  * `tablet_allocator::balance_tablets()`
  * `sstables_manager::components_reclaim_reload_fiber()`
  * `tablet_storage_group_manager::merge_completion_fiber()`
  * metrics exporting http server altogether
* streaming. This is the existing streaming group, which simply moves under the new supergroup. Everything else that ran there continues to do so, including
  * hints sender
  * all view building related components (update generator, builder, workers)
  * repair
  * stream_manager
  * messaging service (except for verb handlers that switch groups)
  * join_cluster() activity
  * REST API
  * ... something else I forgot

The `--maintenance_io_throughput_mb_per_sec` option is introduced. It controls the IO throughput limit applied to the maintenance supergroup. If not set, the `--stream_io_throughput_mb_per_sec` option is used to preserve backward compatibility.
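The fallback rule for the new option maps naturally onto `std::optional::value_or`. This is an illustrative sketch only: `effective_maintenance_io_throughput` is a hypothetical name and not the actual configuration code in main.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical: returns the IO throughput limit (MB/s) applied to the
// maintenance supergroup. An unset --maintenance_io_throughput_mb_per_sec
// falls back to --stream_io_throughput_mb_per_sec for backward compatibility.
std::uint32_t effective_maintenance_io_throughput(
        std::optional<std::uint32_t> maintenance_io_throughput_mb_per_sec,
        std::uint32_t stream_io_throughput_mb_per_sec) {
    return maintenance_io_throughput_mb_per_sec.value_or(stream_io_throughput_mb_per_sec);
}
```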

All new sched groups inherit `request_class::maintenance` (however, "backup" seems not to make any requests yet).

Moving more activities from "streaming" into "maintenance" (or into their own group) is possible, but one needs to take care of RPC group switching. When a client makes an RPC call, the server may switch to one of the pre-negotiated scheduling groups. Verbs for existing activities that run in the "streaming" group are routed through an RPC index that negotiates the "streaming" group on the server side. If any of that client code moves to some other group, the server will still run the handlers in "streaming", which is not what one would expect. That's one of the main reasons why only selected fibers were moved to their own "maintenance" group. The same applies to backup: this code doesn't use RPC, so it can be moved. The restore code uses load-and-stream and the corresponding RPCs, so it cannot simply be moved into its own new group.

Fixes SCYLLADB-351

New feature, not backporting

Closes scylladb/scylladb#28542

* github.com:scylladb/scylladb:
  code: Add maintenance/maintenance group
  backup: Add maintenance/backup group
  compaction: Add maintenance/maintenance_compaction group
  main: Introduce maintenance supergroup
  main: Move all maintenance sched group into streaming one
  database: Use local variable for current_scheduling_group
  code: Live-update IO throughputs from main
2026-04-12 00:34:48 +03:00
Botond Dénes
3289928679 repair: fix quadratic complexity when loading repair history
shared_tombstone_gc_state::update_repair_time() uses copy-on-write
semantics: each call copies the entire per_table_history_maps and the
per-table repair_history_map.  repair_service::load_history() called
this once per history entry, making the load O(N²) in both time and
memory.

Introduce batch_update_repair_time() which performs a single
copy-on-write for any number of entries belonging to the same table.
Restructure load_history() to collect entries into batches of up to
1000 and flush each batch in one call, keeping peak memory bounded.
The batch size limit is intentional: the repair history table currently
has no bound on the number of entries and can grow large.  Note that
this does not cause a problem in the in-memory history map itself:
entries are coalesced internally and only the latest repair time is
kept per range.  The unbounded entry count only makes the batched
update during load expensive.
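A small model makes the complexity argument concrete. It is a sketch with hypothetical names (`cow_history`, `load_one_by_one`, `load_in_batches`). It uses a single table instead of per-table maps, with `std::shared_ptr` standing in for the real copy-on-write state. `elements_copied` counts how many map elements the copies touch.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <memory>
#include <utility>
#include <vector>

using entry = std::pair<int, long>;       // (range id, repair time)
using history_map = std::map<int, long>;  // range id -> latest repair time

struct cow_history {
    std::shared_ptr<const history_map> current = std::make_shared<history_map>();
    std::size_t elements_copied = 0;

    // Applies entries to a fresh copy, then publishes it. The copy costs
    // O(size of the map) per call, however many entries are applied.
    void apply(const std::vector<entry>& entries) {
        auto next = std::make_shared<history_map>(*current);
        elements_copied += current->size();
        for (const auto& [range, time] : entries) {
            long& slot = (*next)[range];
            slot = std::max(slot, time);  // keep only the latest repair time
        }
        current = std::move(next);
    }
};

// One copy per entry: total copying is 0 + 1 + ... + (N-1), i.e. O(N^2).
void load_one_by_one(cow_history& h, const std::vector<entry>& all) {
    for (const auto& e : all) {
        h.apply({e});
    }
}

// One copy per batch of up to batch_size entries; the batch buffer stays bounded.
void load_in_batches(cow_history& h, const std::vector<entry>& all,
                     std::size_t batch_size = 1000) {
    std::vector<entry> batch;
    batch.reserve(batch_size);
    for (const auto& e : all) {
        batch.push_back(e);
        if (batch.size() == batch_size) {
            h.apply(batch);
            batch.clear();
        }
    }
    if (!batch.empty()) {
        h.apply(batch);
    }
}
```

With 3000 entries, the one-by-one load copies 4,498,500 elements in total. The batched load copies 0 + 1000 + 2000 = 3000.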

Fixes: SCYLLADB-104

Closes scylladb/scylladb#29326
2026-04-11 23:54:26 +03:00
Michał Hudobski
7d648961ed vector_search: forward non-primary key restrictions to Vector Store service
Include non-primary key restrictions (e.g. regular column filters) in
the filter JSON sent to the Vector Store service. Previously only
partition key and clustering column restrictions were forwarded, so
filtering on regular columns was silently ignored.

Add get_nonprimary_key_restrictions() getter to statement_restrictions.

Add unit tests for non-primary key equality, range, and bind marker
restrictions in filter_test.

Fixes: SCYLLADB-970

Closes scylladb/scylladb#29019
2026-04-10 17:16:29 +02:00
Piotr Dulikowski
3bd770d4d9 Merge 'counters: reuse counter IDs by rack' from Michael Litvak
For counter updates, use a counter ID that is constructed from the
node's rack instead of the node's host ID.

A rack can have at most two active tablet replicas at a time: a single
normal tablet replica, and during tablet migration there are two active
replicas, the normal and pending replica. Therefore we can have two
unique counter IDs per rack that are reused by all replicas in the rack.

We construct the counter ID from the rack UUID, which is constructed
from the name "dc:rack". The pending replica uses a deterministic
variation of the rack's counter ID by negating it.
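The toy model below shows why this bounds the number of counter shards. Everything in it is hypothetical. The real counter ID is a UUID derived from the name "dc:rack", and the exact negation scheme may differ. Here `std::hash` stands in for a deterministic name-based ID, and negation is modeled as arithmetic negation of a positive 64-bit value.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Stand-in for the rack UUID built from "dc:rack": a positive, non-zero value.
std::int64_t rack_counter_id(const std::string& dc, const std::string& rack) {
    std::uint64_t h = std::hash<std::string>{}(dc + ":" + rack);
    return static_cast<std::int64_t>(h >> 2) + 1;  // in [1, 2^62]
}

// The pending replica uses a deterministic variation: here, the negated id.
std::int64_t counter_id_for(const std::string& dc, const std::string& rack, bool pending) {
    std::int64_t id = rack_counter_id(dc, rack);
    return pending ? -id : id;
}

struct replica_update {
    std::string dc, rack;
    bool pending;  // true for the pending replica during tablet migration
};

// Number of distinct counter shards produced by a sequence of updates.
std::size_t distinct_counter_ids(const std::vector<replica_update>& updates) {
    std::set<std::int64_t> ids;
    for (const auto& u : updates) {
        ids.insert(counter_id_for(u.dc, u.rack, u.pending));
    }
    return ids.size();
}
```

However many nodes in a rack update the counter over time, they all map to at most two IDs per rack.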

This improves the performance and size of counter cells by having fewer
unique counter IDs and fewer counter shards in a counter cell.

Previously the number of counter shards was the number of different
host_id's that updated the counter, which is typically the number of
nodes in the cluster and can keep growing indefinitely as nodes are
replaced. With the rack-based counter ID, the number of counter shards
will be at most twice the number of different racks (including removed
racks, which should not be significant).

Fixes SCYLLADB-356

backport not needed - an enhancement

Closes scylladb/scylladb#28901

* github.com:scylladb/scylladb:
  docs/dev: add counters doc
  counters: reuse counter IDs by rack
2026-04-10 12:24:18 +02:00
Wojciech Mitros
163c6f71d6 transport: refactor result_message bounce interface
Replace move_to_shard()/move_to_host() with as_bounce()/target_shard()/
target_host() to clarify the interface after bounce was extended to
support cross-node bouncing.

- Add virtual as_bounce() returning const bounce* to the base class
  (nullptr by default, overridden in bounce to return this), replacing
  the virtual move_to_shard() which conflated bounce detection with
  shard access
- Rename move_to_shard() -> target_shard() (now non-virtual, returns
  unsigned directly) and move_to_host() -> target_host() on bounce
- Replace dynamic_pointer_cast with static_pointer_cast at call sites
  that already checked as_bounce()
- Move forward declarations of message types before the virtual
  methods so as_bounce() can reference bounce
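Here is a condensed sketch of the resulting interface shape. The member types (an `unsigned` shard and a `std::string` host), the constructors, and `void_message` are assumptions for illustration. Only the names `as_bounce()`, `target_shard()`, `target_host()` and `bounce` come from the commit message.

```cpp
#include <memory>
#include <string>
#include <utility>

namespace result_message_sketch {

class bounce;  // forward-declared before the virtual method that mentions it

class result_message {
public:
    virtual ~result_message() = default;
    // nullptr unless this message is a bounce.
    virtual const bounce* as_bounce() const { return nullptr; }
};

class bounce : public result_message {
    unsigned _shard;
    std::string _host;  // hypothetical representation of the target host
public:
    bounce(unsigned shard, std::string host) : _shard(shard), _host(std::move(host)) {}
    const bounce* as_bounce() const override { return this; }
    unsigned target_shard() const { return _shard; }          // non-virtual
    const std::string& target_host() const { return _host; }  // non-virtual
};

class void_message : public result_message {};  // hypothetical non-bounce message

// Call-site pattern: after as_bounce() succeeds, static_pointer_cast is safe.
int shard_to_bounce_to(const std::shared_ptr<result_message>& msg) {
    if (!msg->as_bounce()) {
        return -1;  // not a bounce: handle the result locally
    }
    auto b = std::static_pointer_cast<bounce>(msg);
    return static_cast<int>(b->target_shard());
}

}  // namespace result_message_sketch
```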

Fixes: SCYLLADB-1066

Closes scylladb/scylladb#29367
2026-04-10 12:17:43 +02:00
Piotr Dulikowski
32e3a01718 Merge 'service: strong_consistency: Allow for aborting operations' from Dawid Mędrek
Motivation
----------

Since strongly consistent tables are based on the concept of Raft
groups, operations on them can get stuck for indefinite amounts of
time. That may be problematic, and so we'd like to implement a way
to cancel those operations at suitable times.

Description of solution
-----------------------

The situations we focus on are the following:

* Timed-out queries
* Leader changes
* Tablet migrations
* Table drops
* Node shutdowns

We handle each of them and provide validation tests.

Implementation strategy
-----------------------

1. Auxiliary commits.
2. Abort operations on timeout.
3. Abort operations on tablet removal.
4. Extend `client_state`.
5. Abort operation on shutdown.
6. Help `state_machine` be aborted as soon as possible.

Tests
-----

We provide tests that validate the correctness of the solution.

The total time spent on `test_strong_consistency.py`
(measured on my local machine, dev mode):

Before:
```
real    0m31.809s
user    1m3.048s
sys     0m21.812s
```

After:
```
real    0m34.523s
user    1m10.307s
sys     0m27.223s
```

The incremental differences in time can be found in the commit messages.

Fixes SCYLLADB-429

Backport: not needed. This is an enhancement to an experimental feature.

Closes scylladb/scylladb#28526

* github.com:scylladb/scylladb:
  service: strong_consistency: Abort state_machine::apply when aborting server
  service: strong_consistency: Abort ongoing operations when shutting down
  service: client_state: Extend with abort_source
  service: strong_consistency: Handle abort when removing Raft group
  service: strong_consistency: Abort Raft operations on timeout
  service: strong_consistency: Use timeout when mutating
  service: strong_consistency: Fix indentation
  service: strong_consistency: Enclose coordinator methods with try-catch
  service: strong_consistency: Crash at unexpected exception
  test: cluster: Extract default config & cmdline in test_strong_consistency.py
2026-04-10 11:11:21 +02:00
Pavel Emelyanov
0b336da89d Revert "cmake: add missing rolling_max_tracker_test and symmetric_key_test"
This reverts commit 8b4a91982b.

Two commits independently added rolling_max_tracker_test to test/boost/CMakeLists.txt:
8b4a919 cmake: add missing rolling_max_tracker_test and symmetric_key_test
f3a91df test/cmake: add missing tests to boost test suite

The second was merged two days after the first. They didn't conflict at
the code level and applied cleanly, resulting in duplicate add_scylla_test()
entries that break the CMake build:

    CMake Error: add_executable cannot create target
    "test_boost_rolling_max_tracker_test" because another target
    with the same name already exists.

Remove the duplicate.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Reported-by: Łukasz Paszkowski <lukasz.paszkowski@scylladb.com>
2026-04-10 11:19:43 +03:00
Patryk Jędrzejczak
751bf31273 Merge 'More gossiper cleanups' from Gleb Natapov
This PR contains more code cleanups, mostly in the gossiper. It drops more gossiper states, leaving only NORMAL and SHUTDOWN; all other states are checked against the topology state. Those two remain because the SHUTDOWN state is propagated only through the gossiper, and a node that is not in SHUTDOWN must be in some other state.

No need to backport. Cleanups.

Closes scylladb/scylladb#29129

* https://github.com/scylladb/scylladb:
  storage_service: cleanup unused code
  storage_service: simplify get_peer_info_for_update
  gossiper: send shutdown notifications in parallel
  gms: remove unused code
  virtual_tables: no need to call gossiper if we already know that the node is in shutdown
  gossiper: print node state from raft topology in the logs
  gossiper: use is_shutdown instead of code it manually
  gossiper: mark endpoint_state(inet_address ip) constructor as explicit
  gossiper: remove unused code
  gossiper: drop last use of LEFT state and drop the state
  gossiper: drop unused STATUS_BOOTSTRAPPING state
  gossiper: rename is_dead_state to is_left since this is all that the function checks now.
  gossiper: use raft topology state instead of gossiper one when checking node's state
  storage_service: drop check_for_endpoint_collision function
  storage_service: drop is_first_node function
  gossiper: remove unused REMOVED_TOKEN state
  gossiper: remove unused advertise_token_removed function
2026-04-10 09:56:20 +02:00
Nadav Har'El
6674aa29ca Merge 'Add Cassandra SAI (StorageAttachedIndex) compatibility' from Szymon Wasik
Cassandra's native vector index type is StorageAttachedIndex (SAI). Libraries such as CassIO, LangChain, and LlamaIndex generate `CREATE CUSTOM INDEX` statements using the SAI class name. Previously, ScyllaDB rejected these with "Non-supported custom class".

This PR adds compatibility so that SAI-style CQL statements work on ScyllaDB without modification.

1. **test: enable SAI_VECTOR_ALLOW_CUSTOM_PARAMETERS for Cassandra tests**
   Enables the `SAI_VECTOR_ALLOW_CUSTOM_PARAMETERS` Cassandra system property so that `search_beam_width` tests pass against Cassandra 5.0.7.

2. **test: modernize vector index test comments and fix xfail**
   Updates test comments from "Reproduces" to "Validates fix for" for clarity, and converts the `test_ann_query_with_pk_restriction` xfail into a stripped-down CREATE INDEX syntax test (removing unused INSERT/SELECT lines). Removes the redundant `test_ann_query_with_non_pk_restriction` test.

3. **cql: add Cassandra SAI (StorageAttachedIndex) compatibility**
   Core implementation: the SAI class name is detected and translated to ScyllaDB's native `vector_index`. The fully-qualified class name (`org.apache.cassandra.index.sai.StorageAttachedIndex`) requires exact case; short names (`StorageAttachedIndex`, `sai`) are matched case-insensitively — matching Cassandra's behavior. Non-vector and multi-column SAI targets are rejected with clear errors. Adds `skip_on_scylla_vnodes` fixture, SAI compatibility docs, and the Cassandra compatibility table entry (split into "SAI general" vs "SAI for vector search").

4. **cql: accept source_model option for Cassandra SAI compatibility**
   The `source_model` option is a Cassandra SAI property used by Cassandra libraries (e.g., CassIO) to tag vector indexes with the name of the embedding model. ScyllaDB accepts it for compatibility but does not use it — the validator is a no-op lambda. The option is preserved in index metadata and returned in DESCRIBE INDEX output.
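The matching rule from item 3 fits in a few lines: exact case for the fully-qualified name, case-insensitive matching for the short names. The helper below is hypothetical. It follows the rule as stated in this merge description, not the code in create_index_statement.cc.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <string_view>

// Returns an ASCII-lowercased copy of s.
std::string ascii_lower(std::string_view s) {
    std::string out(s);
    std::transform(out.begin(), out.end(), out.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return out;
}

// True if `class_name` names Cassandra's SAI and should be rewritten to vector_index.
bool is_sai_class_name(std::string_view class_name) {
    // The fully-qualified name must match exactly, including case.
    if (class_name == "org.apache.cassandra.index.sai.StorageAttachedIndex") {
        return true;
    }
    // Short names are matched case-insensitively.
    const std::string lower = ascii_lower(class_name);
    return lower == "storageattachedindex" || lower == "sai";
}
```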

- `cql3/statements/create_index_statement.cc`: SAI class detection and rewriting logic
- `index/secondary_index_manager.cc`: case-insensitive class name lookup (lowercasing restored before `classes.find()`)
- `index/vector_index.cc`: `source_model` accepted as a valid option with no-op validator
- `docs/cql/secondary-indexes.rst`: SAI compatibility documentation with `source_model` table row
- `docs/using-scylla/cassandra-compatibility.rst`: SAI entry split into general (not supported) and vector search (supported)
- `test/cqlpy/conftest.py`: `scylla_with_tablets` renamed to `skip_on_scylla_vnodes`
- `test/cqlpy/test_vector_index.py`: SAI tests inlined (no constants), `check_bad_option()` helper for numeric validation, uppercase class name test, merged `source_model` tests with DESCRIBE check

| Backend            | Passed | Skipped | Failed |
|--------------------|--------|---------|--------|
| ScyllaDB (dev)     | 42     | 0       | 0      |
| Cassandra 5.0.7    | 16     | 26      | 0      |

None: new feature.

Fixes: SCYLLADB-239

Closes scylladb/scylladb#28645

* github.com:scylladb/scylladb:
  cql: accept source_model option and show options in DESCRIBE
  cql: add Cassandra SAI (StorageAttachedIndex) compatibility
  test: modernize vector index test comments and fix xfail
  test: enable SAI_VECTOR_ALLOW_CUSTOM_PARAMETERS for Cassandra tests
2026-04-10 10:21:20 +03:00
Avi Kivity
f67d0739d0 test: user_function_test: adjust Lua error message tests
Lua 5.5 changed the error message slightly ("?:-1" -> "?:?"). Relax
the error message tests to avoid this unimportant fragment.

Closes scylladb/scylladb#29414
2026-04-10 01:09:35 +03:00
Piotr Szymaniak
98d6edaa88 alternator: add comment explaining delta_mode::keys in add_stream_options()
Clarify that cdc::delta_mode is ignored by Alternator, so we use the
least expensive mode (keys) to reduce overhead.

Fixes scylladb/scylladb#24812

Closes scylladb/scylladb#29408
2026-04-10 01:07:21 +03:00
Michał Hudobski
c8b9fde828 auth: allow VECTOR_SEARCH_INDEXING permission to access system.tablets
Add system.tablets to the set of system resources that can be
accessed with the VECTOR_SEARCH_INDEXING permission.

Fixes: VECTOR-605

Closes scylladb/scylladb#29397
2026-04-09 21:53:07 +03:00
Szymon Wasik
573def7cd8 cql: accept source_model option and show options in DESCRIBE
Accept the Cassandra SAI 'source_model' option for vector indexes.
This option is used by Cassandra libraries (e.g., CassIO, LangChain)
to tag vector indexes with the name of the embedding model that
produced the vectors.

ScyllaDB does not use the source_model value but stores it and
includes it in the DESCRIBE INDEX output for Cassandra compatibility.

Additionally, extend vector_index::describe() to emit a
WITH OPTIONS = {...} clause containing all user-provided index options
(filtering out system keys: target, class_name, index_version).
This makes options like similarity_function, source_model, etc.
visible in DESCRIBE output.
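The DESCRIBE change amounts to emitting every user-provided option except the system keys as a CQL map literal. In the sketch below, the helper names and exact output formatting are assumptions. Only the filtered keys (target, class_name, index_version) come from the commit message.

```cpp
#include <map>
#include <set>
#include <string>

// Quotes a CQL string literal, doubling embedded single quotes.
std::string cql_quote(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        out += c;
        if (c == '\'') out += '\'';
    }
    out += '\'';
    return out;
}

// Builds "WITH OPTIONS = {...}" from index options, skipping system keys.
// Returns an empty string when no user-provided options remain.
std::string describe_options(const std::map<std::string, std::string>& options) {
    static const std::set<std::string> system_keys = {"target", "class_name", "index_version"};
    std::string body;
    for (const auto& [key, value] : options) {
        if (system_keys.count(key)) {
            continue;
        }
        if (!body.empty()) body += ", ";
        body += cql_quote(key) + ": " + cql_quote(value);
    }
    return body.empty() ? std::string() : "WITH OPTIONS = {" + body + "}";
}
```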
2026-04-09 17:20:03 +02:00
Szymon Wasik
80a2e4a0ab cql: add Cassandra SAI (StorageAttachedIndex) compatibility
Libraries such as CassIO, LangChain, and LlamaIndex create vector
indexes using Cassandra's StorageAttachedIndex (SAI) class name.
This commit lets ScyllaDB accept these statements without modification.

When a CREATE CUSTOM INDEX statement specifies an SAI class name on a
vector column, ScyllaDB automatically rewrites it to the native
vector_index implementation. Accepted class names (case-insensitive):
  - org.apache.cassandra.index.sai.StorageAttachedIndex
  - StorageAttachedIndex
  - sai

SAI on non-vector columns is rejected with a clear error directing
users to a secondary index instead.

The SAI detection and rewriting logic is extracted into a dedicated
static function (maybe_rewrite_sai_to_vector_index) to keep the
already-long validate_while_executing method manageable.

Multi-column (local index) targets and nonexistent columns are
skipped with continue — the former are treated as filtering columns
by vector_index::check_target(), and the latter are caught later by
vector_index::validate().
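
The rewrite decision can be sketched in Python. This is a hypothetical model of maybe_rewrite_sai_to_vector_index, not its C++ source: targets are tuples of column names (a tuple with more than one name stands for a multi-column local-index target), column types are plain strings, and a column is assumed to be a vector if its type string starts with "vector<".

```python
# Accepted SAI class names, stored lowercased for case-insensitive matching.
SAI_CLASS_NAMES = {
    "org.apache.cassandra.index.sai.storageattachedindex",
    "storageattachedindex",
    "sai",
}

def maybe_rewrite_sai(class_name: str,
                      targets: list[tuple[str, ...]],
                      column_types: dict[str, str]) -> str:
    """Return the index class to use; raise ValueError for SAI on a non-vector column."""
    if class_name.lower() not in SAI_CLASS_NAMES:
        return class_name  # not an SAI class name: leave the statement untouched
    for target in targets:
        if len(target) > 1:
            continue  # multi-column (local index) target: handled by check_target()
        column = target[0]
        if column not in column_types:
            continue  # nonexistent column: reported later by validate()
        if not column_types[column].startswith("vector<"):
            raise ValueError(
                f"SAI is only supported on vector columns; "
                f"use a secondary index on '{column}' instead")
    return "vector_index"
```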

Tests that exercise features common to both backends (basic creation,
similarity_function, IF NOT EXISTS, bad options, etc.) now use the
SAI class name with the skip_on_scylla_vnodes fixture so they run
against both ScyllaDB and Cassandra. ScyllaDB-specific tests continue
to use USING 'vector_index' with scylla_only.
2026-04-09 17:20:03 +02:00
Szymon Wasik
fa7edc627c test: modernize vector index test comments and fix xfail
- Change 'Reproduces' to 'Validates fix for' in test comments to
  reflect that the referenced issues are already fixed.
- Condense the VECTOR-179 comment to two lines.
- Replace the xfailed test_ann_query_with_restriction_works_only_on_pk
  with a focused test (test_ann_query_with_pk_restriction) that creates
  a vector index on a table with a PK column restriction, validating
  the VECTOR-374 fix.
2026-04-09 17:20:02 +02:00
Szymon Wasik
4eab050be4 test: enable SAI_VECTOR_ALLOW_CUSTOM_PARAMETERS for Cassandra tests 2026-04-09 17:20:02 +02:00
Andrzej Jackowski
23c386a27f test: perf: add audit-unix-socket-path to perf-simple-query
To allow performance benchmarking with custom syslog sinks.

Example use case:

-- Audit + default syslog: ~100k tps
taskset -c 0,2,4 ./build/release/scylla perf-simple-query --smp 3 --write --duration 30 --audit "syslog" --audit-keyspace "ks" --audit-categories "DCL,DDL,AUTH,DML,QUERY"

```
110263.72 tps ( 66.1 allocs/op,  16.0 logallocs/op,  25.7 tasks/op,  254900 insns/op,  144796 cycles/op,        0 errors)
throughput:
	mean=   107137.48 standard-deviation=3142.98
	median= 106665.00 median-absolute-deviation=1786.03
	maximum=111435.19 minimum=97620.79
instructions_per_op:
	mean=   256311.36 standard-deviation=5037.13
	median= 256288.09 median-absolute-deviation=2223.08
	maximum=274220.89 minimum=248141.40
cpu_cycles_per_op:
	mean=   146443.47 standard-deviation=2844.19
	median= 146001.85 median-absolute-deviation=1514.82
	maximum=157177.54 minimum=142981.03
```

-- Audit + custom syslog: ~400k tps
socat -u UNIX-RECV:/tmp/audit-null.sock,type=2 OPEN:/dev/null
taskset -c 0,2,4 ./build/release/scylla perf-simple-query --smp 3 --write --duration 30 --audit "syslog" --audit-keyspace "ks" --audit-categories "DCL,DDL,AUTH,DML,QUERY" --audit-unix-socket-path /tmp/audit-null.sock

```
404929.62 tps ( 65.9 allocs/op,  16.0 logallocs/op,  25.5 tasks/op,   77406 insns/op,   35559 cycles/op,        0 errors)
throughput:
	mean=   399868.39 standard-deviation=6232.88
	median= 401770.65 median-absolute-deviation=3859.09
	maximum=406126.79 minimum=383434.84
instructions_per_op:
	mean=   77481.26 standard-deviation=168.31
	median= 77405.54 median-absolute-deviation=84.33
	maximum=78081.46 minimum=77332.84
cpu_cycles_per_op:
	mean=   35871.32 standard-deviation=516.83
	median= 35699.70 median-absolute-deviation=251.15
	maximum=37454.86 minimum=35432.60
```

-- No audit: ~800k tps
taskset -c 0,2,4 ./build/release/scylla perf-simple-query --smp 3 --write --duration 30

```
808970.95 tps ( 53.3 allocs/op,  16.0 logallocs/op,  14.9 tasks/op,   49904 insns/op,   20471 cycles/op,        0 errors)
throughput:
	mean=   809065.31 standard-deviation=6222.39
	median= 810507.10 median-absolute-deviation=1827.99
	maximum=815213.41 minimum=782104.84
instructions_per_op:
	mean=   49905.50 standard-deviation=21.81
	median= 49900.12 median-absolute-deviation=7.72
	maximum=50010.97 minimum=49892.57
cpu_cycles_per_op:
	mean=   20429.00 standard-deviation=41.40
	median= 20425.18 median-absolute-deviation=29.11
	maximum=20530.74 minimum=20355.42
```
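
To compare the three runs, the per-operation audit overhead can be computed from the reported means. The sketch below only reuses the mean throughput and mean cpu_cycles_per_op values printed above; the helper name overhead_vs is illustrative.

```python
# Mean values copied from the three perf-simple-query runs above.
RUNS = {
    "audit + default syslog": {"tps": 107137.48, "cycles_per_op": 146443.47},
    "audit + /dev/null socket": {"tps": 399868.39, "cycles_per_op": 35871.32},
    "no audit": {"tps": 809065.31, "cycles_per_op": 20429.00},
}

def overhead_vs(baseline: dict, run: dict) -> tuple[float, float]:
    """Return (extra CPU cycles per op, throughput as a fraction of the baseline)."""
    return (run["cycles_per_op"] - baseline["cycles_per_op"],
            run["tps"] / baseline["tps"])

base = RUNS["no audit"]
for name, run in RUNS.items():
    extra, share = overhead_vs(base, run)
    print(f"{name:26} +{extra:>9.0f} cycles/op  {share:6.1%} of no-audit tps")
```

On these numbers the default syslog sink adds roughly 126k cycles per operation and keeps about 13% of the no-audit throughput, while the /dev/null socket adds roughly 15k cycles and keeps about 49%, which suggests most of the default-syslog cost lies in the sink rather than in the audit path itself.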

Closes scylladb/scylladb#29396
2026-04-09 16:00:41 +03:00
Anna Stuchlik
c6587c6a70 doc: Fix malformed markdown link in alternator network docs
Fixes https://github.com/scylladb/scylladb/issues/29400

Closes scylladb/scylladb#29402
2026-04-09 15:54:43 +03:00
Botond Dénes
5886d1841a Merge 'cmake: align CMake build system with configure.py and add comparison script' from Ernest Zaslavsky
Every time someone modifies the build system — adding a source file, changing a compilation flag, or wiring a new test — the change tends to land in only one of our two build systems (configure.py or CMake). Over time this causes three classes of problems:

1. **CMake stops compiling entirely.** Missing defines, wrong sanitizer flags, or misplaced subdirectory ordering cause hard build failures that are only discovered when someone tries to use CMake (e.g. for IDE integration).

2. **Missing build targets.** Tests or binaries present in configure.py are never added to CMake, so `cmake --build` silently skips them. This PR fixes several such cases (e.g. `symmetric_key_test`, `auth_cache_test`, `sstable_tablet_streaming`).

3. **Missing compilation units in targets.** A `.cc` file is added to a test binary in one system but not the other, causing link errors or silently omitted test coverage.

To fix the existing drift and prevent future divergence, this series:

**Adds a build-system comparison script**
(`scripts/compare_build_systems.py`) that configures both systems into a temporary directory, parses their generated `build.ninja` files, and compares per-file compilation flags, link target sets, and per-target libraries. configure.py is treated as the baseline; CMake must match it. The script supports a `--ci` mode suitable for gating PRs that touch build files.
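
The per-file flag comparison can be sketched in Python. This is not the actual scripts/compare_build_systems.py: it assumes a simplified build.ninja syntax (no `$` line continuations or escapes), that both files name sources by the same paths, and a per-edge flags variable whose name is passed in as the hypothetical `flag_var` argument.

```python
import shlex

def parse_ninja_builds(text: str) -> list[dict]:
    """Parse 'build' statements and their indented 'key = value' bindings."""
    builds, current = [], None
    for line in text.splitlines():
        if line.startswith("build "):
            outputs, _, rest = line[len("build "):].partition(":")
            # Drop implicit ('|') and order-only ('||') dependencies.
            parts = rest.split("||")[0].split("|")[0].split()
            current = {"outputs": outputs.split(), "rule": parts[0] if parts else "",
                       "inputs": parts[1:], "vars": {}}
            builds.append(current)
        elif current is not None and line.startswith((" ", "\t")) and "=" in line:
            key, _, value = line.strip().partition("=")
            current["vars"][key.strip()] = value.strip()
        else:
            current = None  # any other statement ends the current build's bindings
    return builds

def flags_by_source(builds: list[dict], flag_var: str) -> dict[str, set[str]]:
    """Map each .cc input to the set of flags bound to flag_var on its build edge."""
    result = {}
    for b in builds:
        for src in b["inputs"]:
            if src.endswith(".cc") and flag_var in b["vars"]:
                result[src] = set(shlex.split(b["vars"][flag_var]))
    return result

def compare(baseline: dict, candidate: dict):
    """Yield (source, flags missing from candidate, extra flags in candidate)."""
    for src in sorted(baseline.keys() & candidate.keys()):
        missing, extra = baseline[src] - candidate[src], candidate[src] - baseline[src]
        if missing or extra:
            yield src, missing, extra
```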

**Fixes all current mismatches** found by the script:
- Mode flag alignment in `mode.common.cmake` and `mode.Coverage.cmake`
  (sanitizer flags, `-fno-lto`, stack-usage warnings, coverage defines).
- Global define alignment (`SEASTAR_NO_EXCEPTION_HACK`, `XXH_PRIVATE_API`,
  `BOOST_ALL_DYN_LINK`, `SEASTAR_TESTING_MAIN` placement).
- Seastar build configuration (shared vs static per mode, coverage
  sanitizer link options).
- Abseil sanitizer flags (`-fno-sanitize=vptr`).
- Missing test targets in `test/boost/CMakeLists.txt`.
- Redundant per-test flags now covered by global settings.
- Lua library resolution via a custom `cmake/FindLua.cmake` using
  pkg-config, matching configure.py's approach.

**Adds documentation** (`docs/dev/compare-build-systems.md`) describing how to run the script and interpret its output.

No backport needed: this is a build-infrastructure improvement only.

Closes scylladb/scylladb#29273

* github.com:scylladb/scylladb:
  scripts: remove lua library rename workaround from comparison script
  cmake: add custom FindLua using pkg-config to match configure.py
  test/cmake: add missing tests to boost test suite
  test/cmake: remove per-test LTO disable
  cmake: add BOOST_ALL_DYN_LINK and strip per-component defines
  cmake: move SEASTAR_TESTING_MAIN after seastar and abseil subdirs
  cmake: add -fno-sanitize=vptr for abseil sanitizer flags
  cmake: align Seastar build configuration with configure.py
  cmake: align global compile defines and options with configure.py
  cmake: fix Coverage mode in mode.Coverage.cmake
  cmake: align mode.common.cmake flags with configure.py
  configure.py: add sstable_tablet_streaming to combined_tests
  docs: add compare-build-systems.md
  scripts: add compare_build_systems.py to compare ninja build files
2026-04-09 15:46:09 +03:00
Yaniv Michael Kaul
13879b023f tracing: set_skip_when_empty() for error-path metrics
Add .set_skip_when_empty() to all error-path metrics in the tracing
module. Tracing itself is not a commonly used feature, making all of
these metrics almost always zero:

Tier 1 (very rare - corruption/schema issues):
- tracing_keyspace_helper::bad_column_family_errors: tracing schema
  missing or incompatible, should never happen post-bootstrap
- tracing::trace_errors: internal error building trace parameters

Tier 2 (overload - tracing backend saturated):
- tracing::dropped_sessions: too many pending sessions
- tracing::dropped_records: too many pending records

Tier 3 (general tracing write errors):
- tracing_keyspace_helper::tracing_errors: errors during writes to
  system_traces keyspace

Since tracing is an opt-in feature that most deployments rarely use,
all five metrics are almost always zero and create unnecessary
reporting overhead.

AI-Assisted: yes
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#29346
2026-04-09 14:28:16 +03:00
Michael Litvak
3964040008 docs/dev: add counters doc
Add documentation of the counters feature implementation in
docs/dev/counters.md.

The documentation is taken from the wiki and updated according to the
current state of the code - legacy details are removed, and a section
about the counter id is added.
2026-04-09 13:08:02 +02:00
Michael Litvak
b71762d5da counters: reuse counter IDs by rack
For counter updates, use a counter ID that is constructed from the
node's rack instead of the node's host ID.

A rack can have at most two active replicas of a tablet at a time:
normally a single replica, and during tablet migration two active
replicas, the normal replica and the pending replica. Therefore we need
only two unique counter IDs per rack, reused by all replicas in the rack.

We construct the counter ID from the rack UUID, which is constructed
from the name "dc:rack". The pending replica uses a deterministic
variation of the rack's counter ID by negating it.

This improves the performance and reduces the size of counter cells by
having fewer unique counter IDs and fewer counter shards in a counter cell.

Previously the number of counter shards was the number of distinct
host_ids that updated the counter, which is typically the number of
nodes in the cluster and can keep growing indefinitely as nodes are
replaced. With the rack-based counter ID, the number of counter shards
will be at most twice the number of distinct racks (including removed
racks, which should not be significant).
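
A Python sketch of the idea, under stated assumptions: the commit does not specify how the rack UUID is hashed from "dc:rack" or how negation is applied, so uuid.uuid5 with an arbitrary namespace and two's-complement negation of the 128-bit value are hypothetical stand-ins. It shows that replicas in one rack share an ID, that the pending replica gets a distinct but deterministic one, and that shards scale with racks rather than hosts.

```python
import uuid

TWO_128 = 1 << 128

def rack_counter_id(dc: str, rack: str, pending: bool = False) -> uuid.UUID:
    """Counter ID shared by all replicas in a rack (hypothetical derivation)."""
    # Name-based UUID from "dc:rack"; uuid5 with NAMESPACE_OID is an arbitrary stand-in.
    base = uuid.uuid5(uuid.NAMESPACE_OID, f"{dc}:{rack}")
    if not pending:
        return base
    # Pending replica: deterministic variant obtained by negating the 128-bit value.
    return uuid.UUID(int=(-base.int) % TWO_128)

# Six hosts that updated a counter over time, spread over three racks.
updaters = [("dc1", "r1", "host-a"), ("dc1", "r1", "host-a2"),
            ("dc1", "r2", "host-b"), ("dc1", "r2", "host-b2"),
            ("dc1", "r3", "host-c"), ("dc1", "r3", "host-d")]
host_shards = {host for _, _, host in updaters}
rack_shards = {rack_counter_id(dc, rack) for dc, rack, _ in updaters}
print(len(host_shards), len(rack_shards))  # 6 3
```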

Fixes SCYLLADB-356
2026-04-09 13:08:02 +02:00
Yaniv Michael Kaul
2c0076d3ef replica: set_skip_when_empty() for rare error-path metrics
Add .set_skip_when_empty() to four metrics in replica/database.cc that
are only incremented on very rare error paths and are almost always zero:

- database::dropped_view_updates: view updates dropped due to overload.
  NOTE: this metric appears to never be incremented in the current
  codebase and may be a candidate for removal.
- database::multishard_query_failed_reader_stops: documented as a 'hard
  badness counter' that should always be zero. NOTE: no increment site
  was found in the current codebase; may be a candidate for removal.
- database::multishard_query_failed_reader_saves: documented as a 'hard
  badness counter' that should always be zero.
- database::total_writes_rejected_due_to_out_of_space_prevention: only
  fires when disk utilization is critical and user table writes are
  disabled, a very rare operational state.

These metrics create unnecessary reporting overhead when they are
perpetually zero. set_skip_when_empty() suppresses them from metrics
output until they become non-zero.
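
A toy Python model of the exposition behavior described above. The Counter class and render() function are hypothetical and are not Seastar's metrics API; they only illustrate that a skip-when-empty metric is omitted while zero and reappears once incremented.

```python
from dataclasses import dataclass

@dataclass
class Counter:
    name: str
    value: int = 0
    skip_when_empty: bool = False

    def set_skip_when_empty(self) -> "Counter":
        # Builder style, echoing the chained C++ call; returns self.
        self.skip_when_empty = True
        return self

def render(metrics: list[Counter]) -> list[str]:
    """Return exposition lines, omitting skip-when-empty counters that are still zero."""
    return [f"{m.name} {m.value}" for m in metrics
            if not (m.skip_when_empty and m.value == 0)]

metrics = [Counter("database_dropped_view_updates").set_skip_when_empty(),
           Counter("database_total_writes", 5)]
print(render(metrics))  # ['database_total_writes 5']
metrics[0].value += 1
print(render(metrics))  # ['database_dropped_view_updates 1', 'database_total_writes 5']
```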

AI-Assisted: yes
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#29345
2026-04-09 14:07:28 +03:00
Botond Dénes
86417d49de Merge 'transport: improve memory accounting for big responses and slow network' from Marcin Maliszkiewicz
After obtaining the CQL response, check if its actual size exceeds the initially acquired memory permit. If so, acquire additional semaphore units and adopt them into the permit, ensuring accurate memory accounting for large responses.

Additionally, move the permit into a .then() continuation so that the semaphore units are kept alive until write_message finishes, preventing premature release of the memory permit. This is especially important with slow networks and big responses, when buffers can accumulate and deplete a node's memory.
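
A synchronous Python sketch of both fixes, using hypothetical MemorySemaphore and Permit classes in place of Seastar's semaphore units: top up the permit when the serialized response is larger than the initial estimate, and release it only after the write has completed. Unlike the real code, acquire() here raises instead of waiting for memory.

```python
class MemorySemaphore:
    """Toy byte-counting semaphore; raises instead of waiting when memory is short."""
    def __init__(self, capacity: int):
        self.available = capacity

    def acquire(self, n: int) -> "Permit":
        if n > self.available:
            raise MemoryError(f"need {n} bytes, only {self.available} available")
        self.available -= n
        return Permit(self, n)

class Permit:
    def __init__(self, sem: MemorySemaphore, units: int):
        self.sem, self.units = sem, units

    def adopt(self, other: "Permit") -> None:
        # Take over another permit's units so both are released together.
        self.units += other.units
        other.units = 0

    def release(self) -> None:
        self.sem.available += self.units
        self.units = 0

def send_response(sem: MemorySemaphore, permit: Permit, response: bytes, write) -> None:
    try:
        extra = len(response) - permit.units
        if extra > 0:
            # The response outgrew the initial estimate: account for the difference.
            permit.adopt(sem.acquire(extra))
        write(response)  # the permit stays held until the write has finished
    finally:
        permit.release()
```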

Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1306
Related https://scylladb.atlassian.net/browse/SCYLLADB-740

Backport: all supported versions

Closes scylladb/scylladb#29288

* github.com:scylladb/scylladb:
  transport: add per-service-level pending response memory metric
  transport: hold memory permit until response write completes
  transport: account for response size exceeding initial memory estimate
2026-04-09 13:36:31 +03:00
Yaniv Michael Kaul
5c8b4a003e db: set_skip_when_empty() for rare error-path metrics
Add .set_skip_when_empty() to four metrics in the db module that are
only incremented on very rare error paths and are almost always zero:

- cache::pinned_dirty_memory_overload: described as 'should sit
  constantly at 0, nonzero is indicative of a bug'
- corrupt_data::entries_reported: only fires on actual data corruption
- hints::corrupted_files: only fires on on-disk hint file corruption
- rate_limiter::failed_allocations: only fires when the rate limiter
  hash table is completely full and gives up allocating, requiring
  extreme cardinality pressure

These metrics create unnecessary reporting overhead when they are
perpetually zero. set_skip_when_empty() suppresses them from metrics
output until they become non-zero.

AI-Assisted: yes
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#29344
2026-04-09 13:32:09 +03:00
Gleb Natapov
dbaba7ab8a storage_service: cleanup unused code
Remove an unused definition and duplicate includes.
2026-04-09 13:31:41 +03:00
Gleb Natapov
b050b593b3 storage_service: simplify get_peer_info_for_update
The function does nothing for fields managed by Raft, so drop their processing.
2026-04-09 13:31:41 +03:00
Gleb Natapov
d0576c109f gossiper: send shutdown notifications in parallel 2026-04-09 13:31:40 +03:00
Gleb Natapov
1586fa65af gms: remove unused code
Also move version_string(...) and make_token_string(...) to the private: section; they are internal helpers used only by normal() and are not part of the public API.
2026-04-09 13:31:40 +03:00
Gleb Natapov
b2e35c538f virtual_tables: no need to call gossiper if we already know that the node is in shutdown 2026-04-09 13:31:40 +03:00
Gleb Natapov
e17fc180a0 gossiper: print node state from raft topology in the logs
Raft topology now holds the real node state. Gossiper states are now
set only to NORMAL and SHUTDOWN.
2026-04-09 13:31:40 +03:00