Commit Graph

11911 Commits

Author SHA1 Message Date
Piotr Dulikowski
8dfd455001 Merge 'strong consistency: fix drop table blocking on stuck writes and handle timeout in update()' from Petr Gusev
- Fix table drop blocking for the full client timeout when in-flight writes can't reach quorum
- Handle unhandled timeout exception in the wait-for-leader loop during group startup

When a strongly consistent table is dropped, `schedule_raft_group_deletion()` calls `g->close()` which waits for all in-flight operations to release their gate holders. But other nodes may have already destroyed their raft servers for this group, so an in-flight write on the leader cannot reach quorum and hangs until the client timeout expires (~seconds), unnecessarily delaying group deletion.

Additionally, the wait-for-leader loop in groups_manager::update() uses abort_on_expiry with a 60-second timeout but never catches the exception if it fires, leaving the group in an indeterminate state.

SCYLLADB-2080 fix:
- Reorder `schedule_raft_group_deletion`: initiate gate close (prevents new operations), then abort the raft server (unblocks stuck writes by causing `raft::stopped_error`), then await the gate future (resolves immediately since holders are released).
- Handle `raft::stopped_error` in the coordinator's top-level catch blocks (both write and read paths): if the table no longer exists, return `no_such_column_family` (CQL layer converts to InvalidRequest: unconfigured table). Otherwise fall through to the default timeout handling.
- Replace gate->hold() with try_hold() + on_internal_error in acquire_server, with a comment explaining why the gate can never be closed at that point (table removal in `schema_applier::commit_on_shard` precedes gate closure, with no scheduling point in between).

Timeout handling fix:
- Use `coroutine::as_future` in the wait-for-leader loop to catch timeout exceptions gracefully — log a warning and break out instead of propagating unhandled.
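
The "warn and break instead of propagating" behaviour can be modelled outside Seastar. The following is only a Python asyncio analogy, not the C++ change itself: `wait_for_leader`, `has_leader` and the timeout values are hypothetical names chosen for illustration.

```python
import asyncio
import logging

log = logging.getLogger("groups_manager_sketch")  # hypothetical logger name


async def wait_for_leader(has_leader, poll_interval=0.01, timeout=0.05):
    """Poll until has_leader() is true; on timeout, warn and give up instead of raising."""

    async def poll():
        while not has_leader():
            await asyncio.sleep(poll_interval)

    try:
        await asyncio.wait_for(poll(), timeout)
    except asyncio.TimeoutError:
        # The timeout is handled here, so the caller never sees an exception.
        log.warning("timed out waiting for leader; continuing without one")
        return False
    return True
```

A caller can then branch on the boolean result rather than relying on an exception escaping the loop.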

Includes a cluster test reproducer (test_drop_table_unblocks_stuck_write) that:
1. Pauses a write on the leader before add_entry
2. Drops the table (follower destroys its group immediately)
3. Resumes the write — verifies it fails promptly with InvalidRequest ("unconfigured table") instead of hanging for 15 seconds

backport: no need, strong consistency is not released yet

Fixes: SCYLLADB-2080

Closes scylladb/scylladb#30105

* github.com:scylladb/scylladb:
  strong consistency/groups_manager: handle timeout in update() wait-for-leader loop
  strong consistency: abort raft server before gate close when dropping a table
  test/cluster: rewrite test_queries_while_dropping_table for SCYLLADB-2080
2026-05-28 09:59:20 +02:00
Ferenc Szili
76dac2fd8e test: fix format string typo in error logging in ldap_server.py
This change fixes a typo in the error logging format string: s% -> %s
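
A small illustration of why the typo matters, using Python's `%` formatting directly. The message text is made up and is not the actual string from ldap_server.py.

```python
# Correct placeholder: the argument is substituted into the message.
msg_ok = "LDAP server failed: %s" % ("connection refused",)


def format_with_typo():
    # "s%" leaves a dangling "%" at the end of the format string,
    # which Python rejects as an incomplete format specifier.
    return "LDAP server failed: s%" % ("connection refused",)
```

Calling `format_with_typo()` raises `ValueError`, so the intended error text is never produced.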

Fixes: SCYLLADB-2244

Closes scylladb/scylladb#30088
2026-05-27 17:22:21 +03:00
Nadav Har'El
21ecc12fc6 Merge 'index: fix local vector index locality detection after schema reload' from Michał Hudobski
After schema reload, `target_parser::is_local()` did not recognize the
vector-index local target format `{"pk": [...], "tc": "..."}`, causing
local vector indexes to be treated as global. This broke duplicate
detection when both a global and a local vector index existed on the same
column. Fix by introducing `vector_index::is_local()` and dispatching
to it from `create_index_from_index_row()` based on the index class.
Also adds tests for local/global vector index coexistence.

Fixes: SCYLLADB-987

backport reasoning: we added local vector index support in 2026.1

Closes scylladb/scylladb#29492

* github.com:scylladb/scylladb:
  test/cqlpy: add tests for global and local vector index coexistence
  index: fix local vector index locality detection after schema reload
2026-05-27 15:34:57 +03:00
Petr Gusev
d922c43358 strong consistency: abort raft server before gate close when dropping a table
When a strongly consistent table is dropped, schedule_raft_group_deletion()
used to call g->close() first, which waits for all in-flight operations to
release their gate holders. But other nodes may have already destroyed their
raft servers for this group, so an in-flight write on the leader cannot
reach quorum and hangs until the client timeout expires, unnecessarily
delaying group deletion.

Fix: initiate gate close (prevents new operations from entering), then
abort the raft server (causes in-flight add_entry/read_barrier to throw
raft::stopped_error, releasing their gate holders), then await the gate
future (resolves immediately since holders are now released).

Handle raft::stopped_error in the coordinator's top-level catch blocks
(both write and read paths): if the table no longer exists, return
no_such_column_family (which the CQL layer converts to InvalidRequest
'unconfigured table'). Otherwise fall through to the default timeout
handling.
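
The ordering described above (initiate close, abort, then await) can be sketched with a toy model. This is a Python asyncio analogy under stated assumptions, not the Seastar code: `Gate`, `RaftServer`, `StoppedError`, `write` and `drop_table` are hypothetical stand-ins, and the table-existence check in the coordinator is omitted.

```python
import asyncio


class StoppedError(Exception):
    """Stands in for raft::stopped_error."""


class Gate:
    """Counts in-flight holders; close() refuses new ones and waits for the rest."""

    def __init__(self):
        self._holders = 0
        self._closed = False
        self._drained = asyncio.Event()
        self._drained.set()

    def hold(self):
        if self._closed:
            raise RuntimeError("gate closed")
        self._holders += 1
        self._drained.clear()

    def release(self):
        self._holders -= 1
        if self._holders == 0:
            self._drained.set()

    def close(self):
        self._closed = True  # from now on, hold() fails
        return asyncio.ensure_future(self._drained.wait())


class RaftServer:
    def __init__(self):
        self._aborted = asyncio.Event()

    async def add_entry(self):
        # Quorum is unreachable in this scenario, so only abort() ends the wait.
        await self._aborted.wait()
        raise StoppedError()

    def abort(self):
        self._aborted.set()


async def write(gate, server):
    gate.hold()
    try:
        await server.add_entry()
        return "applied"
    except StoppedError:
        return "no_such_column_family"  # simplified: assumes the table is gone
    finally:
        gate.release()


async def drop_table():
    gate, server = Gate(), RaftServer()
    writer = asyncio.ensure_future(write(gate, server))
    await asyncio.sleep(0)  # let the write enter the gate and block
    closed = gate.close()   # 1. initiate close: no new operations
    server.abort()          # 2. abort: the stuck write fails with StoppedError
    await asyncio.wait_for(closed, timeout=1)  # 3. resolves once the holder is released
    return await writer
```

In this model, awaiting `closed` before calling `abort()` would block until the one-second guard fires, which mirrors the old ordering's stall.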

Also replace gate->hold() with try_hold() + on_internal_error in
acquire_server, and handle the timeout exception in the wait-for-leader
loop in update() gracefully (log + break instead of propagating).

Fixes: SCYLLADB-2080
2026-05-27 12:06:46 +02:00
Petr Gusev
89307064b5 test/cluster: rewrite test_queries_while_dropping_table for SCYLLADB-2080
Rewrite the test to use 2 nodes (RF=2) instead of 1 (RF=1), which exposes
the quorum-loss scenario: when a table is dropped, the follower destroys
its raft group immediately while the leader's in-flight operations are
still holding the gate.

The test pauses both a read and a write on the leader, drops the table,
then resumes them. Both are expected to fail with 'no such column family'
since the raft server is aborted as part of group deletion. A 15-second
timeout guard detects the old buggy behavior (write stuck forever).

Marked xfail until the fix is applied in the next commit.
2026-05-27 12:06:46 +02:00
Botond Dénes
555cfbcd38 Merge 'treewide: replace deprecated smp::count and smp::all_cpus() with new APIs' from Avi Kivity
Replace all uses of the deprecated seastar::smp::count with this_smp_shard_count() and smp::all_cpus() with this_smp_all_shards() across the ScyllaDB codebase (seastar submodule untouched).

Both replacement functions require a reactor thread context. All call sites were verified to run on reactor threads.

Notable cases:
- dht/token-sharding.hh: this_smp_shard_count() is used as a default parameter value. This is safe since all callers are on reactor threads, but the expression is now evaluated at each call site rather than being a reference to a global variable.
- service/storage_service.hh, locator/abstract_replication_strategy.hh, ent/encryption/encryption.cc: used in default member initializers and constructor member-init-lists. Objects are always constructed on reactor threads.
- schema_builder: sometimes called from BOOST_AUTO_TEST_CASE without a reactor. Added a pre-patch that makes the implicit shard count parameter explicit and passes 1 in those cases.

Not changed:
- scylla-gdb.py: reads smp::count as a GDB symbol (no reactor context).
- Python test files: only reference smp::count in comments/strings.

No backport: the Seastar commit that deprecated these functions hasn't (and won't) make its way into any release branches (and the warnings are cosmetic anyway)

Closes scylladb/scylladb#29990

* github.com:scylladb/scylladb:
  treewide: replace deprecated smp::count and smp::all_cpus() with new APIs
  scylla-gdb: read shard count from smp::_this_smp instead of smp::count
  schema_builder: make shard_count an explicit constructor parameter
2026-05-27 09:42:06 +03:00
Avi Kivity
8010e408a2 treewide: replace deprecated smp::count and smp::all_cpus() with new APIs
Replace all uses of the deprecated seastar::smp::count with
this_smp_shard_count() and smp::all_cpus() with this_smp_all_shards()
across the ScyllaDB codebase (seastar submodule untouched).

Both replacement functions require a reactor thread context. All call
sites were verified to run on reactor threads.

Notable cases:
- dht/token-sharding.hh: this_smp_shard_count() is used as a default
  parameter value. This is safe since all callers are on reactor threads,
  but the expression is now evaluated at each call site rather than being
  a reference to a global variable.
- service/storage_service.hh, locator/abstract_replication_strategy.hh,
  ent/encryption/encryption.cc: used in default member initializers and
  constructor member-init-lists. Objects are always constructed on reactor
  threads.

Not changed:
- scylla-gdb.py: reads smp::count as a GDB symbol (no reactor context).
- Python test files: only reference smp::count in comments/strings.
2026-05-26 17:35:20 +03:00
Wojciech Mitros
ae0d77257f mv: fix view_update_builder losing fragments across batch boundaries
When a mutation generates more view updates than max_rows_for_view_updates
(100), view_update_builder::build_some() splits the work into multiple
batches. There was a bug in how fragments were read between batches:

When should_stop_updates() returned true, the old code called stop()
which returned stop_iteration::yes without reading the next fragments.
On the next build_some() call, read_both_next_fragments() was called
at the start, which advanced BOTH readers - skipping any fragment that
was already read but not yet consumed. A row could be left unconsumed if
either:
- the 100th (last in the batch) update was a row insertion and we still
  had insertions/updates remaining
- the 100th (last in the batch) update was a row deletion and we still
  had deletions/updates remaining
For the most common case where work is split in batches, i.e. range
deletions, we couldn't hit this because range delete generates only
view row deletions.
On tables with a single materialized view, we also couldn't get this
for any batches with less than 50 statements (unless the batch also
contained range deletions), because one non-range-delete update can
generate up to 2 view updates.
However, for a range of scenarios outside these 2, we could lose
view updates, resulting in persistent inconsistencies.

The fix:
- read_*_next_fragment() now accept a stop_iteration parameter, so the
  next fragments are always read after consuming (even when stopping),
  but stop_iteration::yes is correctly propagated to break the loop.
- build_some() no longer re-reads fragments at the start. Instead, an
  initialize() method performs the initial read once at construction.
- because now we only advance readers after consuming, we won't advance
  readers after end_of_partition, so we extend the break condition to
  accept either readers evaluating to `false` or them being at the
  end_of_partition. We also handle the optimization with
  _skip_row_updates
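
A heavily simplified model of the batching bug, written in Python for illustration. It merges two sorted key streams (standing in for the update and existing readers) and emits one entry per key. The class and method names are hypothetical and the real view_update_builder handles much more (partitions, tombstones, per-view logic).

```python
class Builder:
    def __init__(self, updates, existings, limit, fixed):
        self._u, self._e = iter(updates), iter(existings)
        self.limit, self.fixed = limit, fixed
        self.u = self.e = None
        self.done = False
        if fixed:
            self._advance(True, True)  # the initial read happens once

    def _advance(self, u, e):
        if u:
            self.u = next(self._u, None)
        if e:
            self.e = next(self._e, None)

    def build_some(self):
        if not self.fixed:
            # Old behaviour: advance both readers at the start of every batch,
            # dropping a fragment that was read but not consumed.
            self._advance(True, True)
        out = []
        while self.u is not None or self.e is not None:
            take_u = self.e is None or (self.u is not None and self.u <= self.e)
            take_e = self.u is None or (self.e is not None and self.e <= self.u)
            kind = "update" if take_u and take_e else "insert" if take_u else "delete"
            out.append((kind, self.u if take_u else self.e))
            stop = len(out) >= self.limit
            if self.fixed or not stop:
                self._advance(take_u, take_e)  # fixed: always read after consuming
            if stop:
                return out
        self.done = True
        return out


def run(updates, existings, limit, fixed):
    builder = Builder(updates, existings, limit, fixed)
    out = []
    while not builder.done:
        out += builder.build_some()
    return out
```

With `updates=[1, 3]`, `existings=[2, 4]` and a batch limit of 2, the fixed variant emits all four entries, while the old variant silently drops key 3 at the batch boundary.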

Fixes: scylladb/scylladb#29155

Closes scylladb/scylladb#29498
2026-05-26 14:15:12 +02:00
Avi Kivity
c59985c38b Merge 'cql3: limit large allocations when parsing queries' from Botond Dénes
Queries are stored and passed around as sstring/std::string_view. While normally they are small enough to not cause problems, as `test_cdc_large_values.TestLargeColumnsWithCDC.test_single_column_blob_max_size_with_cdc_preimage_full_postimage[unprepared_statements]` demonstrates, queries can be arbitrarily large, putting heavy strain on Scylla internals via large allocations, in the extreme case causing denial of service.

This PR attempts to alleviate this by using fragmented storage for queries: read the query as a fragmented string from the input stream in `transport/server.cc`, propagate it as such to `query_processor::prepare()` and also store it as such in `cql3::cql_statement::raw_cql_statement`. Also avoid linearizing raw values in the CQL expression tree: switch `cql3::expr::untyped_constant::raw_text` to fragmented storage.

For this to be possible, some infrastructure code had to be made fragmented storage friendly: ascii/utf8 validation, hashers, from_hex and importantly: `abstract_type::from_string()`.

Unfortunately, the query still has to be linearized for parsing itself, as ANTLR, although it allows custom InputStream implementations, plays pointer-arithmetic games with the pointers obtained from them, so fragmented input cannot be used.

Still, this PR limits the places where the query is linearized to the
following:
* Parsing
* Audit
* Logs and error messages

So the normal query paths for queries that actually can get arbitrarily large (UPDATE and INSERT) should only linearize the query temporarily for parsing.

Fixes #10779

Improvement, no backport

Closes scylladb/scylladb#28619

* github.com:scylladb/scylladb:
  tracing: add_query(): change query param to utils::chunked_string
  cql3: store raw query string in utils::chunked_string
  serializer: add serializer<utils::chunked_string>
  utils/reusable_buffer: add get_linearized_view(managed_bytes_view)
  cql3/expr: use utils::chunked_string for untyped_constant::raw_text
  types: abstract_type::from_string() switch to fragmented buffers (implementation)
  types: abstract_type::from_string() switch to fragmented buffers (interface)
  types: use write_fragmented from utils/fragment_range.hh
  types: timestamp_from_string(): don't assume std::string_view is null-terminated
  types/duration: don't assume std::string_view is null-terminated
  utils/hashers: add calculate(managed_bytes_view) overload
  utils/ascii: add validate(managed_bytes_view) overload
  utils: add managed_bytes_fwd.hh
  utils: add chunked_string
  utils: add managed_bytes_basic_view::byte_iterator
2026-05-26 15:00:53 +03:00
Avi Kivity
f165b396fd schema_builder: make shard_count an explicit constructor parameter
A recent Seastar update deprecated smp::count and introduced
this_smp_shard_count() as a replacement. One difference is that
this_smp_shard_count() wants to run on a reactor thread.

This poses a problem for non-reactor tests (BOOST_AUTO_TEST_CASE)
that nevertheless use a schema, as the schema_builder constructor
references smp::count. If we replace it with this_smp_shard_count()
then it will crash when running without a reactor.

To fix, remove the implicit this_smp_shard_count() call from raw_schema's
constructor and require callers to pass shard_count explicitly to
schema_builder. This allows tests that don't run on a reactor thread
to construct schemas without crashing.

Production code and reactor-based tests pass this_smp_shard_count().
Non-reactor test files (expr_test, keys_test, nonwrapping_interval_test,
wrapping_interval_test, bti_key_translation_test, range_tombstone_list_test)
pass a fixed shard count of 1.

Note: sstable_test.cc is a Seastar test file (SEASTAR_THREAD_TEST_CASE)
but also contains one plain BOOST_AUTO_TEST_CASE
(test_empty_key_view_comparison) that constructs a schema_builder without
a reactor context. This test also receives a fixed shard count of 1.
2026-05-26 11:55:56 +03:00
Nikos Dragazis
54cb6d4608 test: Order task-wait before finalization in test_migration_wait_task
The purpose of this test is to verify that the task manager's "wait" API
works correctly for vnodes-to-tablets migration virtual tasks. It starts
a `wait_task` HTTP request concurrently with a finalize (or rollback)
operation, and asserts that the wait returns the correct final state
("done" or "suspended").

The test uses `asyncio.create_task()` to wrap the wait request into a
task, and then immediately calls finalize. With asyncio's lazy task
scheduling, the wait coroutine does not start until the event loop
yields, so the finalization request reaches the server before wait, and
therefore may also complete before it. Once finalization completes, the
virtual migration task is no longer discoverable, causing a
"task not found" error.

Add a log message in Scylla's wait handler and a synchronization point
in the test to ensure that the wait request reaches the server before
finalization. This follows the same pattern used in
`test_tablet_tasks.py::check_and_abort_repair_task`.
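
The lazy scheduling can be reproduced in isolation. In this sketch the two coroutines only record the order in which they run, and an `asyncio.Event` stands in for the log-based synchronization point. All names are hypothetical and no HTTP requests are made.

```python
import asyncio


async def main():
    events = []
    wait_started = asyncio.Event()

    async def wait_task():
        events.append("wait request sent")
        wait_started.set()

    async def finalize():
        events.append("finalize request sent")

    # Unsynchronized: create_task() only schedules the coroutine, so
    # finalize() runs before wait_task() has had a chance to start.
    task = asyncio.create_task(wait_task())
    await finalize()
    await task
    racy = list(events)

    # Synchronized: wait until the wait request is known to have started.
    events.clear()
    wait_started.clear()
    task = asyncio.create_task(wait_task())
    await wait_started.wait()
    await finalize()
    await task
    return racy, list(events)
```

The first ordering has finalize ahead of wait; the second has wait ahead of finalize, which is what the test needs.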

Fixes SCYLLADB-2077

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>

Closes scylladb/scylladb#29973
2026-05-26 10:43:22 +03:00
Botond Dénes
0fd25dc47c Merge 'Replace get_injection_parameters() with inject_parameter() where appropriate' from Pavel Emelyanov
Several error injection sites use the low-level get_injection_parameters() API to fetch the entire parameters map and then manually look up a single key. The inject_parameter() API is better suited for these cases — it combines the enabled check and typed single-parameter extraction in one call, returning std::optional.

Cleaning error injection usage, not backporting

Closes scylladb/scylladb#29970

* github.com:scylladb/scylladb:
  test: Use inject_parameter() in row_cache_test
  sstables: Use inject_parameter() for mx reader fill buffer timeout
  streaming: Use inject_parameter() for order_sstables_for_streaming
2026-05-26 10:32:44 +03:00
Nadav Har'El
f65a52f3ec Merge 'vector_search: test: migrate rescoring tests from C++/Boost to pytest' from Szymon Malewski
Migrate mock-based rescoring and oversampling tests from
test/vector_search/rescoring_test.cc to pytest and delete the C++ file.
Index option validation tests go to test_vector_index.py; rescoring tests
go to a new test_vector_search_rescoring.py which introduces shared
infrastructure (EmbeddingRow dataclass, TEST_DATA dict,
reversed_ann_response() helper, rescoring_test_table() context manager).

Two tests have updated assertions (semantic change):
filters_invalid_similarity_scores now uses per-function expected result
sets including a zero-vector row, and rescoring_with_zerovector_query
asserts empty results after NaN filtering (cosine only). Both are marked
xfail pending SCYLLADB-924.

Follow-up to #29593.

Does not require backport - simple refactoring of tests

Closes scylladb/scylladb#29906

* github.com:scylladb/scylladb:
  test/vector_search: migrate zero-vector query rescoring test to pytest; delete rescoring_test.cc
  test/vector_search: migrate invalid similarity score filtering test to pytest
  test/vector_search: migrate non-ANN similarity argument rescoring test to pytest
  test/vector_search: migrate wildcard select rescoring test to pytest
  test/vector_search: migrate similarity_function rescoring test to pytest
  test/vector_search: migrate rescoring and f32 quantization tests to pytest
  test/vector_search: migrate oversampling tests to pytest
  test/vector_search: migrate vector_index option validation tests to pytest
2026-05-26 09:45:40 +03:00
Botond Dénes
2c9a5f9634 types: abstract_type::from_string() switch to fragmented buffers (implementation)
The previous patch changed the interface and callers, this one updates
the implementation to actually work with fragmented buffers. Most types
just use with_linearized() to linearize the fragmented input buffer for
parsing. This is fine, as most types have a fixed or bounded-size string
representation that is small.
Importantly, the input is not linearized for the 3 types which have
unbounded values: ascii, bytes and text. The tuple type can contain any
of these types itself, so it is also converted to avoid linearization.
2026-05-26 09:08:06 +03:00
Botond Dénes
597d4252dc types: abstract_type::from_string() switch to fragmented buffers (interface)
Change input: std::string_view -> utils::chunked_string_view.
Change return value: bytes -> managed_bytes.

This patch only changes the interface, with some to_bytes() sprinkled in
the internals to deal with recursive calls.
Internals will be updated in the next patch, to keep the churn of
updating callers separate from the actually important changes.
2026-05-26 09:08:06 +03:00
Botond Dénes
a9028d88b2 utils/hashers: add calculate(managed_bytes_view) overload
Uses update() for each fragment, then finalize. Yields identical hash to
calling calculate(std::string_view) with linearized buffer. This is
checked by new tests.
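
The equivalence being tested can be shown with any incremental hasher. The sketch below uses Python's `hashlib` with SHA-256 purely as an example; the source does not say which hash algorithms are involved, and `calculate_fragmented` is a hypothetical name.

```python
import hashlib


def calculate_fragmented(fragments):
    """Hash a sequence of byte fragments without joining them first."""
    hasher = hashlib.sha256()
    for fragment in fragments:
        hasher.update(fragment)  # feed each fragment in order
    return hasher.digest()       # finalize once at the end
```

Feeding `[b"frag", b"mented ", b"input"]` gives the same digest as hashing `b"fragmented input"` in one call.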
2026-05-26 09:08:05 +03:00
Botond Dénes
a2fff12bcd utils: add chunked_string
A thin facade over managed_bytes[_view], offering some extra
convenience for working with strings, as well as a strong type
communicating the purpose (storing text instead of a blob).

Also introduces utils::from_hex(chunked_string_view), a fragmented
hex-decode that operates directly on a chunked_string_view without
requiring linearization. Hex pairs straddling fragment boundaries
are handled via a carry-over nibble.
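
The carry-over idea can be sketched as follows. This is a Python illustration of the technique, not the C++ implementation: `from_hex_fragmented` is a hypothetical name and the fragments are plain strings.

```python
def from_hex_fragmented(fragments):
    """Decode hex text split across fragments, without joining the fragments."""
    out = bytearray()
    carry = None  # high nibble waiting for its partner, possibly in the next fragment
    for fragment in fragments:
        for ch in fragment:
            nibble = int(ch, 16)  # raises ValueError on a non-hex character
            if carry is None:
                carry = nibble
            else:
                out.append((carry << 4) | nibble)
                carry = None
    if carry is not None:
        raise ValueError("odd number of hex digits")
    return bytes(out)
```

Here `["dea", "dbe", "ef"]` decodes to the same bytes as `"deadbeef"`, even though two of the pairs straddle fragment boundaries.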
2026-05-26 09:08:05 +03:00
Botond Dénes
09743aed36 utils: add managed_bytes_basic_view::byte_iterator
Byte-wise iterator which works both as a bidirectional iterator and as
an output iterator (for mutable views). Allows using managed_bytes_view in
algorithms which are iterator based.

Added unit tests covering the iterator functionality.
2026-05-26 09:08:05 +03:00
Szymon Malewski
2151a4fac3 test/vector_search: migrate zero-vector query rescoring test to pytest; delete rescoring_test.cc
Migrate rescoring_with_zerovector_query from rescoring_test.cc to pytest
as test_rescoring_with_zerovector_query. Tested with cosine similarity only
because zero vectors produce NaN only for cosine; other functions yield
valid scores.

The test is marked xfail: similarity_cosine now returns NaN for zero vectors
(SCYLLADB-456 fix) and rescoring should filter out NaN scores, yielding an
empty result set.

Semantic change: the test now asserts the desired empty-result behavior
instead of asserting that the query does not throw.

Delete rescoring_test.cc now that all tests have been migrated and remove
its entries from configure.py and test/vector_search/CMakeLists.txt.
2026-05-26 00:37:54 +02:00
Szymon Malewski
533a8e65fe test/vector_search: migrate invalid similarity score filtering test to pytest
Migrate no_nulls_in_rescored_results from rescoring_test.cc to pytest,
renamed to test_filters_invalid_similarity_scores_in_rescored_results.

The test now also inserts a zero-vector row (id=14) to cover the case
introduced when similarity_cosine was changed to return NaN for zero
vectors instead of throwing (SCYLLADB-456). The expected surviving set
of rows is refined per similarity function based on which inputs produce
valid (non-NaN, non-Infinity) similarity scores. Marked xfail because
rescoring does not yet filter rows with invalid scores.

Semantic change: the expected surviving row set is updated per the
behavior described above.
2026-05-26 00:37:54 +02:00
Szymon Malewski
63d9b7445f test/vector_search: migrate non-ANN similarity argument rescoring test to pytest
Migrate select_similarity_function_other_than_ann_ordering from
rescoring_test.cc to pytest. The test verifies that similarity scores in
SELECT are computed against the explicitly supplied argument vector rather
than the ANN ordering vector. No semantic change.
2026-05-26 00:37:54 +02:00
Szymon Malewski
0cb557695a test/vector_search: migrate wildcard select rescoring test to pytest
Migrate wildcard_select_is_correctly_rescored from rescoring_test.cc to
pytest. The test verifies that SELECT * with rescoring returns rows in
the correct similarity order with correct embedding values, covering a
slightly different processing path from the explicit-column SELECT test.
No semantic change.
2026-05-26 00:37:53 +02:00
Szymon Malewski
cae816a8c6 test/vector_search: migrate similarity_function rescoring test to pytest
Migrate similarity_function_returns_correctly_rescored_results from
rescoring_test.cc to pytest. The test verifies that similarity scores
in the SELECT clause are computed correctly after rescoring, for both
argument orderings of the similarity function. No semantic change.
2026-05-26 00:37:53 +02:00
Szymon Malewski
78d72309b8 test/vector_search: migrate rescoring and f32 quantization tests to pytest
Introduce shared test infrastructure in test_vector_search_rescoring.py:
EmbeddingRow dataclass, TEST_DATA dict keyed by similarity function name,
ANN_QUERY_VECTOR, reversed_ann_response() helper, and rescoring_test_table()
context manager.

Migrate result_returned_by_vector_store_is_rescored and
f32_quantization_disables_rescoring from rescoring_test.cc. No semantic change.
2026-05-26 00:37:53 +02:00
Szymon Malewski
400c0dbb22 test/vector_search: migrate oversampling tests to pytest
Migrate oversampling_multiplies_limit_for_vector_store_query and
oversampled_vector_store_results_are_limited_to_cql_limit from
rescoring_test.cc to test_vector_search_rescoring_with_mock.py.
No semantic change.
2026-05-26 00:37:53 +02:00
Szymon Malewski
9f632182fb test/vector_search: migrate vector_index option validation tests to pytest
CREATE INDEX option tests for quantization, oversampling, and rescoring
are moved from rescoring_test.cc to test_vector_index.py alongside the
existing index option tests. These tests exercise only option parsing and
validation - no vector store mock needed. No semantic change.
2026-05-26 00:37:52 +02:00
Nadav Har'El
96dd3121e7 Merge 'cql: rewrite CassIO SAI metadata index to regular secondary index' from Szymon Wasik
CassIO (the library backing LangChain's `langchain_community.vectorstores.Cassandra` integration) issues the following DDL during schema setup to create a metadata index:

```sql
CREATE CUSTOM INDEX IF NOT EXISTS eidx_metadata_s_<table>
ON <keyspace>.<table> (ENTRIES(metadata_s))
USING 'org.apache.cassandra.index.sai.StorageAttachedIndex';
```

ScyllaDB does not support Cassandra's StorageAttachedIndex (SAI) for non-vector columns and previously rejected this statement with:

```
StorageAttachedIndex (SAI) is only supported on vector columns; use a secondary index for non-vector columns
```

This blocks seamless migration of existing LangChain/CassIO applications from Cassandra to ScyllaDB — applications fail during initialization before any application-level workaround can run, even when metadata filtering is not used (`metadata_indexing="none"`).

CassIO is no longer actively maintained but remains the only official LangChain integration path for Apache Cassandra over CQL, meaning existing applications will continue using this setup pattern.

Instead of rejecting the CassIO metadata-map SAI DDL, detect the pattern and rewrite it to a standard ScyllaDB secondary index on collection entries:

- **Detection**: SAI class name + single `ENTRIES` target on a non-frozen `map` column
- **Rewrite**: Clear the custom class so the index is created through the standard secondary index path (which already fully supports indexing map entries)
- **Warning**: Emit a CQL warning informing the user that SAI is not supported by ScyllaDB, a regular secondary index was created instead, and metadata filtering behavior may differ from Cassandra SAI

The rewrite is placed early in `validate_while_executing()`, before the rf-rack-validity check, so the standard secondary index code path handles all subsequent validation naturally — no code duplication.

After this change, the CassIO schema setup succeeds on ScyllaDB:
- `CREATE CUSTOM INDEX ... USING 'sai'` on `ENTRIES(metadata_s)` creates a real secondary index
- The index is functional and can accelerate metadata filtering queries
- A CQL warning makes the rewrite transparent to operators
- SAI on non-vector, non-map-entries columns is still rejected as before
- Vector SAI indexes continue to be rewritten to `vector_index` as before

- `test_sai_entries_on_map_creates_regular_index` — verifies the index is created and the warning is emitted (fully-qualified SAI class name)
- `test_sai_entries_on_map_short_name` — same with the `'sai'` short alias
- `test_sai_on_regular_column_rejected` — confirms SAI on regular scalar columns is still rejected

All 148 tests in `test_vector_index.py` and `test_secondary_index.py` pass with no regressions (125 passed, 22 xfailed, 1 skipped).

Fixes: SCYLLADB-2113
Backport: 2026.2 as this is the version where the support for SAI class needed by LangChain was added.

Closes scylladb/scylladb#29981

* github.com:scylladb/scylladb:
  cql: rewrite CassIO SAI metadata index to regular secondary index
  db/config: add enable_cassio_compatibility flag
2026-05-26 00:19:03 +03:00
Michał Hudobski
1d17d2144f index, vector_index: limit primary key columns to 255
The vector-store's InvariantKey type supports at most 255 key
components. Reject vector index creation when the base table's
primary key (partition + clustering columns) exceeds this limit.

Fixes: VECTOR-553

Closes scylladb/scylladb#29317
2026-05-25 19:24:17 +03:00
Szymon Wasik
5ee339b11d cql: rewrite CassIO SAI metadata index to regular secondary index
When CassIO creates a SAI ENTRIES index on a map column,
ScyllaDB now rewrites it to a regular secondary index and emits
a CQL warning. This allows LangChain/CassIO applications to work
without DDL errors.

The rewrite is gated behind the enable_cassio_compatibility flag
(disabled by default).

Refs: SCYLLADB-2113
2026-05-25 15:11:43 +02:00
Botond Dénes
db89f3f095 Merge 'compaction_manager: unregister compaction module on early shutdown' from Patryk Jędrzejczak
The compaction module is registered with task_manager in the compaction_manager
constructor, and unregistered in compaction_manager::really_do_stop(), which
was gated behind `_state != state::none` in compaction_manager::do_stop().
Since enable() -- which transitions _state from none to running -- is called
later during startup (from database::start() or the disk space monitor callback)
than the compaction_manager constructor, an early shutdown could leave the
compaction module registered after compaction_manager::do_stop() returned.
task_manager::stop() then aborted with 'Tried to stop task manager while
some modules were not unregistered'.

Fix compaction_manager::do_stop() to call _task_manager_module->stop() even
when `_state == state::none`, so that the compaction module is always properly
unregistered.
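
A toy model of the lifecycle described above, in Python for brevity. `TaskManager` and `CompactionManager` here are hypothetical miniatures that only track module registration and the `none`/`running` state.

```python
class TaskManager:
    def __init__(self):
        self.modules = set()

    def stop(self):
        if self.modules:
            raise RuntimeError(
                "Tried to stop task manager while some modules were not unregistered")


class CompactionManager:
    def __init__(self, task_manager, fixed):
        self.task_manager, self.fixed = task_manager, fixed
        self.state = "none"
        task_manager.modules.add("compaction")  # registered in the constructor

    def enable(self):
        self.state = "running"  # happens later during startup

    def do_stop(self):
        if self.state == "none":
            if self.fixed:
                # Fix: unregister even if enable() was never called.
                self.task_manager.modules.discard("compaction")
            return
        self.state = "stopped"
        self.task_manager.modules.discard("compaction")
```

Stopping before `enable()` leaves the module registered in the unfixed variant, so `TaskManager.stop()` fails; with the fix it succeeds.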

Fixes: SCYLLADB-2106

Backport to all supported branches, as the bug is there and it has
already caused a failure in 2026.1 CI.

Closes scylladb/scylladb#30015

* github.com:scylladb/scylladb:
  test: add test_stop_before_starting_compaction_manager
  compaction_manager: unregister compaction module on early shutdown
2026-05-25 16:08:20 +03:00
Dmitry Kropachev
74fa423271 transport: report host id in SUPPORTED
Currently the driver builds the network layout (node IP addresses and ports)
from `system.local`, `system.peers`, `system.client_routes` and then
assumes that this layout is correct.
It does not verify it.
If, for example, a node's IP/port (say, on a proxy) does not
match what the driver calculated, the mismatch goes unnoticed.

The goal of this feature is to give the driver the host id in the SUPPORTED frame,
so that it knows which node it connected to and can decide
whether to keep the connection or drop it.

- add `SCYLLA_HOST_ID` to the CQL `SUPPORTED` response
- add a regression test that hooks the Python driver handshake and
  verifies the reported host id

- `python3.12 -m py_compile test/cqlpy/test_protocol_exceptions.py`
- syntax-only compile of `transport/server.cc` with the repo toolchain
  flags inside `dbuild`

Refs #27452
Refs https://scylladb.atlassian.net/browse/DRIVER-610

Closes scylladb/scylladb#29809
2026-05-25 14:36:53 +03:00
Avi Kivity
892f22f49c Merge 'cql: atomic add/subtract operations with LWT' from Nadav Har'El
ScyllaDB has special counter columns for which atomic add/subtract operations like `SET a = a + 1` are allowed. Such operations have not been allowed on ordinary non-counter columns, as they would not be properly atomic - the read and the write are separate, and concurrent operations can have incorrect results.

This patch allows such atomic add/subtract operations in **LWT** statements. For example

	UPDATE ... SET a = a - 7 IF a > 0

or

	UPDATE ... SET a = a + 1 IF a != NULL

The row updated in the operation, and the updated column (`a`), should be initialized before the update. The example `SET a = a + 1 IF a != NULL` will fail the condition if `a` is not set. A different request `SET a = a + 1 IF EXISTS` will just leave `a` unset if it's unset (NULL + 1 is NULL, following SQL's null propagation rules).
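
The null-propagation behaviour can be modelled in a few lines. This is a Python sketch of the semantics described above, not of the CQL implementation: rows are plain dicts, `None` plays the role of NULL, and `add` and `lwt_set_add` are hypothetical names.

```python
def add(a, b):
    """NULL-propagating addition: any NULL operand yields NULL."""
    return None if a is None or b is None else a + b


def lwt_set_add(row, column, delta, condition):
    """Model `SET column = column + delta IF condition`; returns whether it applied."""
    if row is None or not condition(row):
        return False
    row[column] = add(row[column], delta)
    return True
```

With `row = {"a": None}`, a condition equivalent to `a != NULL` is not applied, while a condition that only requires the row to exist is applied but leaves `a` as `None`.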

These add/subtract operations are allowed on any numeric (integer or floating point) column.

The ability of LWT to fetch the old values of a column and use it to calculate the new value has long been available in our internal CAS implementation - and has been in use for years in Alternator - but until this patch it was not exposed in CQL's LWT.

This series does not add new syntax to CQL - the "SET a = a + b" and "SET a = a - b" syntax already existed for counters, and we just allow the same syntax for non-counters. However, the series does add a bit of machinery that will allow us to easily support more general expressions in the future. In particular, this series implements the addition, subtraction, and unary-minus operators for expressions, and adds the machinery needed to run **any** expression in "SET a = expr()", using existing row values fetched by LWT.

This is a new Scylla-only feature that does not exist in Cassandra.

Fixes #10568
Refs #22918 ("Support arithmetic operators"), SCYLLADB-1576 ("Decimal arithmetic operations OOM")

This is a new feature, so normally would not be backported.

Closes scylladb/scylladb#29939

* github.com:scylladb/scylladb:
  cql: atomic add/subtract operations with LWT
  cql3: let constants::setter evaluate expressions using prefetched row data
  cql3/expr: add NEG unary operator for numeric negation
  cql3/expr: add SUB binary operator for numeric subtraction
  cql3/expr: add ADD binary operator for numeric addition
  types: add is_arithmetic() method for types
2026-05-25 14:27:33 +03:00
Dmitry Kropachev
06eeaf48ff tests: avoid CQL_ALTERNATOR_QUERIED on zero-token nodes
The keyspace RF test starts zero-token nodes as part of its topology setup.

The python driver 3.29.9 can't schedule queries on zero-token nodes, so waiting for `CQL_ALTERNATOR_QUERIED` on those nodes is the wrong readiness gate.
This change makes the zero-token `server_add()` calls stop at `CQL_ALTERNATOR_CONNECTED`.
The test still exercises the keyspace replication assertions through a normal token-owning contact point.

Verified by running all 4 variations of `cluster.test_keyspace_rf::test_create_keyspace_with_default_replication_factor` on this branch.

Closes scylladb/scylladb#29779
2026-05-25 14:22:04 +03:00
Piotr Dulikowski
3a5dd2e5be Merge 'strong_consistency: forward reads to the raft leader' from Wojciech Mitros
Strongly consistent reads currently call read_barrier() on whichever
replica happens to process the request. When a follower runs
read_barrier(), it sends an RPC to the leader to get the current read
index, then waits for its local apply index to catch up. If the follower
is behind, this wait can be significant.

By forwarding linearizable reads to the leader, we don't need an RPC from replica to leader to get the index to wait for apply -- it's available locally.

Note that read_barrier() is still required on the leader to confirm it
is still the leader and guarantee linearizability. A future optimization
would be to implement leases in the raft library, which could eliminate
read_barrier() on the leader entirely.

The CL-to-behavior mapping is isolated in a single parse_consistency_level()
function:
- CL=(LOCAL_)QUORUM -> linearizable: forwarded to the raft leader
- CL=(LOCAL_)ONE -> non-linearizable: existing behavior (no read_barrier()/forwarding, may return stale results)
- All other CLs -> invalid request
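
As an illustration of the mapping listed above, here is a Python sketch. The real `parse_consistency_level()` lives in the C++ coordinator; the names `ReadMode`, `InvalidRequest`, and `classify_read` below are hypothetical and only mirror the three cases from this description.

```python
from enum import Enum


class ReadMode(Enum):
    LINEARIZABLE = "linearizable"          # forwarded to the raft leader
    NON_LINEARIZABLE = "non_linearizable"  # served locally, may be stale


class InvalidRequest(Exception):
    """Raised for consistency levels that strongly consistent reads reject."""


def classify_read(consistency_level: str) -> ReadMode:
    cl = consistency_level.upper()
    if cl in ("QUORUM", "LOCAL_QUORUM"):
        return ReadMode.LINEARIZABLE
    if cl in ("ONE", "LOCAL_ONE"):
        return ReadMode.NON_LINEARIZABLE
    raise InvalidRequest(f"unsupported consistency level: {consistency_level}")
```

Keeping the decision in one function means the forwarding code only has to branch on the returned mode.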

Read forwarding reuses the same CQL-layer bounce_to_node() mechanism
that write forwarding already uses. The transport layer's existing
requests_forwarded_* metrics automatically count forwarded reads.
Coordinator-level metrics (linearizable_reads, non_linearizable_reads,
writes) are added for visibility into the strong consistency workload.

Fixes: SCYLLADB-1157

Closes scylladb/scylladb#29575

* github.com:scylladb/scylladb:
  strong_consistency: test read forwarding to leader
  strong_consistency: skip read_barrier() for non-linearizable reads
  strong_consistency: split coordinator-level read latency metrics
  strong_consistency: forward linearizable reads to raft leader
  strong_consistency: classify reads by consistency level
  strong_consistency: add begin_read() to raft_server
2026-05-25 10:55:00 +02:00
Nadav Har'El
f8aaeb5e87 cql: atomic add/subtract operations with LWT
ScyllaDB has special counter columns for which atomic add/subtract
operations like `SET a = a + 1` are allowed. Such operations have not
been allowed on ordinary non-counter columns, as they would not be
properly atomic - the read and the write are separate, and concurrent
operations can have incorrect results.

This patch allows such atomic add/subtract operations
in *LWT* statements. Some examples:

        UPDATE ... SET a = a - 1 IF a > 0

        UPDATE ... SET a = a + 1 IF EXISTS

        UPDATE ... SET a = a + 1 IF a != NULL

The row updated in the operation, and the updated column (a) should
be initialized before the update - arithmetic operations on missing
column values silently leave the column null (no error is generated).

These add/subtract operations are allowed on any numeric column -
integer or floating point of any size.

The ability of LWT to fetch the old values of a column and use it to
calculate the new value has long been available in our internal CAS
implementation - and has been in use for years in Alternator - but until
this patch it was not exposed in CQL's LWT.

This patch does not add new syntax to CQL - the "SET a = a + b"
and "SET a = a - b" syntax that already existed for counters is now
allowed for non-counters.

This is a new Scylla-only feature that does not exist in Cassandra.

Fixes #10568

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2026-05-25 10:09:11 +03:00
Nadav Har'El
b026aea6f7 cql3/expr: add NEG unary operator for numeric negation
This patch adds a new expression type, unary_operator, analogous to
the existing binary_operator but takes just one operand instead of
two.

This patch also implements the first and only unary operator type,
unary_oper_t::NEG, implementing negation (unary minus) for all numeric
types.

For fixed-width integer types, overflow or underflow results in an error.
If the operand is NULL, the result is a NULL as well.

The new operator is not yet used by the CQL syntax - our parser doesn't
parse arithmetic expressions yet. We also do not plan to use it in the
following patch which uses the separate SUB (subtraction) operation,
not the new NEG. But since I already implemented a unary minus operator,
and we'll surely need it in the future for general arithmetic operations,
I thought I might as well include this patch as well.

Refs #22918 ("Support arithmetic operators")
2026-05-25 10:08:11 +03:00
Nadav Har'El
f27d1f08fc cql3/expr: add SUB binary operator for numeric subtraction
In this patch we add to our expressions oper_t::SUB, for subtraction,
analogous to the ADD from the previous patch.

The only reason why we need a separate SUB operation and can't just
combine ADD with a unary minus (NEG) operator is the minimum value
of fixed-size integer types. For example, 8-bit integers have the range
-128...127. A subtraction like -1 - (-128) is valid (its value is 127)
but the negation of (-128) would be invalid (128). One of the tests
we add in this patch validates this fact.
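
The 8-bit case can be checked with a short Python sketch. It models checked `tinyint` arithmetic with plain Python integers; `neg8`, `add8`, and `sub8` are hypothetical helpers for illustration, not the functions added by this patch, and NULL propagation for subtraction is assumed to match ADD.

```python
INT8_MIN, INT8_MAX = -128, 127


def _checked(value: int) -> int:
    """Reject results that do not fit in a signed 8-bit integer."""
    if not INT8_MIN <= value <= INT8_MAX:
        raise OverflowError(f"{value} does not fit in tinyint")
    return value


def neg8(x):
    return None if x is None else _checked(-x)


def add8(a, b):
    # A NULL operand yields NULL.
    return None if a is None or b is None else _checked(a + b)


def sub8(a, b):
    return None if a is None or b is None else _checked(a - b)
```

Here `sub8(-1, -128)` yields 127, while rewriting it as `add8(-1, neg8(-128))` fails inside `neg8`, because 128 is outside the tinyint range.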

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2026-05-25 10:06:28 +03:00
Nadav Har'El
083adf84ab cql3/expr: add ADD binary operator for numeric addition
Extend oper_t with a new ADD operator, to represent addition between two
numeric expressions. Supports all numeric types - tinyint, smallint,
int, bigint, float, double, varint, and decimal.

For fixed-width integer types, overflow or underflow results in an error.
If one of the operands is NULL, the result is also a NULL.

The new operator is not yet used by the CQL syntax - our parser doesn't
parse arithmetic expressions yet. We plan to start using this new operator
in a following patch which implements counter syntax ("SET r = r + 1" )
for LWT, but in the future we can use it for more general cases.

At the moment, ADD requires that both operands have the same type.
This is all we need for the first use case, and this limitation can
be relaxed later.

Interestingly, ADD is our first binary operator implementation that
does not return a boolean. Until now all our binary operators have been
comparison operators, and all returned boolean. In contrast, ADD's
return type is the type of its operands.

This implementation is susceptible to the pre-existing bug SCYLLADB-1576,
where adding 1e1000000 and 1 in "decimal" or "varint" types will
happily allocate a million-digit number and run out of memory. A
reproducing test is included, and this issue will be solved in one
place for all operations that have additions (including aggregations
and arithmetic expressions) in a followup pull-request.

Refs #22918 ("Support arithmetic operators")
2026-05-25 10:05:09 +03:00
Gleb Natapov
0bf050d175 storage_proxy: hold shared pointer to a table object during entire query_partition_key_range_concurrent execution
Otherwise if a table is dropped in the middle of a scan the object may
disappear.

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-2137

Closes scylladb/scylladb#29988
2026-05-24 21:54:08 +03:00
Michael Litvak
73470150a0 logstor: disable logstor compaction in table truncate
in database::truncate_table_on_all_shards disable logstor compaction
before the table data is truncated, similarly to how non-logstor
compaction is disabled, to avoid race conditions between logstor
compaction and segments discarding.

Fixes SCYLLADB-2186
2026-05-24 10:25:08 +02:00
Wojciech Mitros
45f5df14e5 strong_consistency: test read forwarding to leader
Test the linearizable read forwarding behavior in a single test that
exercises all scenarios on one cluster:
- CL=QUORUM reads on leader, follower, and non-replica nodes
- CL=ONE reads (non-linearizable, no forwarding)
- Linearizability: write + CL=QUORUM read from follower (10 iterations)
- Coordinator latency histogram metrics for both read types

Refs: SCYLLADB-1157
2026-05-23 11:35:37 +02:00
Wojciech Mitros
d07692a7ff strong_consistency: split coordinator-level read latency metrics
Split the latency metrics for strongly consistent reads into two
categories: linearizable and non-linearizable. They replace the
existing metrics for both types combined - this shouldn't cause
issues because the feature is still experimental and both the
initial introduction of latency metrics and the split will be
a part of the same release.

Also fix a test that was using the old metric.
2026-05-23 11:35:37 +02:00
Yaniv Michael Kaul
acd3115645 sstables: include SSTable filename in Stats metadata error messages
When Stats metadata is not available or malformed, include the SSTable
filename in the error message to help operators identify which SSTable
files need attention during startup failures.

Fixes: https://github.com/scylladb/scylla-enterprise/issues/5439
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
AI-assisted: yes
Backport: no, benign improvement

Closes scylladb/scylladb#29950
2026-05-22 16:49:37 +03:00
Łukasz Paszkowski
96a992002c tasks: fix busy-spin and shutdown hang in tablet_virtual_task::wait() for repair tasks
The condition variable predicate for repair tasks unconditionally
returned true (introduced in e5928497ce), which meant event.wait(pred)
never actually suspended: do_until checks the predicate first, and if
it's already satisfied, returns immediately without calling the inner
wait(). This caused two problems:
1. The while(true) loop busy-spun, polling without blocking between
   topology changes.
2. During shutdown, event.broken() had no effect because no waiter was
   registered on the CV. The loop kept spinning, holding the HTTP
   server's task gate open and preventing http_server::stop() from
   completing. After ~15 minutes, systemd killed the process with
   SIGABRT.

The fix replaces the synchronous predicate with an async task_finished()
helper that dispatches on the task type. Since the repair check is async
(for_each_tablet scans every tablet), we cannot use event.wait(Pred).
Instead, we register a waiter via event.wait() *before* running the async
check, ensuring no broadcast is missed during the check. event.broken()
during shutdown propagates broken_condition_variable to the registered
waiter and unblocks the loop promptly.
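
The "register the waiter before the async check" pattern can be sketched with Python's asyncio. This is a toy model, not Seastar's condition variable: `BroadcastEvent`, `BrokenEvent`, and `wait_until_finished` are hypothetical names, and the `task_finished` coroutine stands in for the async helper described above.

```python
import asyncio


class BrokenEvent(Exception):
    """Delivered to waiters when the event is broken (e.g. on shutdown)."""


class BroadcastEvent:
    """Toy condition variable: broadcast() wakes every registered waiter."""

    def __init__(self):
        self._waiters = []
        self._broken = False

    def wait(self) -> asyncio.Future:
        # The waiter is registered immediately, before the caller awaits it.
        fut = asyncio.get_running_loop().create_future()
        if self._broken:
            fut.set_exception(BrokenEvent())
        else:
            self._waiters.append(fut)
        return fut

    def _drain(self):
        waiters, self._waiters = self._waiters, []
        return [f for f in waiters if not f.done()]

    def broadcast(self):
        for fut in self._drain():
            fut.set_result(None)

    def broken(self):
        self._broken = True
        for fut in self._drain():
            fut.set_exception(BrokenEvent())


async def wait_until_finished(event: BroadcastEvent, task_finished):
    while True:
        waiter = event.wait()      # register first, so no broadcast is missed
        if await task_finished():  # async check; broadcasts may arrive here
            waiter.cancel()
            return
        await waiter               # suspends; raises BrokenEvent on broken()
```

Because the waiter exists before the check starts, a broadcast that arrives while the check is running resolves it, and the loop re-checks instead of sleeping through the change.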

Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1532

Closes scylladb/scylladb#29485
2026-05-22 16:47:48 +03:00
Raphael S. Carvalho
3ba6184462 repair, test: fix split-repair synchronization test timeout in debug mode
The test_split_and_incremental_repair_synchronization[True] test was
timing out waiting for 'Finalizing resize decision for table' in
debug mode.

The root cause is a timing race: the incremental_repair_prepare_wait
error injection has a hardcoded 60s auto-expiry timeout
(wait_for_message(60s)), but split compactions in debug mode take ~58s
per SSTable due to -O0 compilation and scheduler starvation (the
maintenance_compaction group gets ~10% of wall-clock time). When the
injection auto-expires before split finalization, the repair fails,
leaving tablets stuck in transition=repair state. This prevents the
topology coordinator from finalizing the split, causing the 600s test
timeout.

Fix both contributing factors:

- Increase the injection timeout from 60s to 10min, giving split
  compactions ample time to complete before the injection auto-expires.
  The test explicitly messages the injection to release it (line 2200),
  so the longer timeout is just a safety net.

- Reduce data volume from 256 to 64 rows (and repair data from 256 to
  64 rows), producing smaller SSTables that split much faster in debug
  mode.

Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-2123

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#30004
2026-05-22 15:03:47 +03:00
Patryk Jędrzejczak
082936ce43 Merge 'test: pylib: Convict the node on server_stop()' from Tomasz Grabiec
This is about ungraceful stop, where the node is killed.

Test cases typically need to wait for other nodes to notice that the
node is down before proceeding. By default, that takes about 20s. It can
be reduced via config by lowering the failure detector threshold, but
that is not the best solution:

 - cannot set the threshold too low, or we'll introduce flakiness due to false
   positives
 - so it's still slow (a couple of seconds)
 - developers forget about it and the test still works
 - developers forget about it and the test still works

This patch speeds this up by adding a way to convict the node immediately after stopping the node, controlled by the "convict" parameter.

At the end of the series the "convict" parameter is required, and each test decides what it wants. Commits are split into steps:

- the series starts with defaulting to convict=False
- each test case sets "convict" explicitly, and changes are split into 3 commits depending on whether convict=True is: useless, beneficial, undesirable
- finally, the "convict" parameter is made mandatory

There is also a dedicated test for natural failure detection (test_natural_failure_detection in test_gossiper.py) to ensure FD coverage is not lost.

Tested on dev-mode
cluster/test_tablets_parallel_decommission.py::test_node_lost_during_decommission_drain:
Wall clock time reduced from 41s to 16s

No backport: enhancement

Closes scylladb/scylladb#28495

* https://github.com/scylladb/scylladb:
  test: gossiper: Add test for natural failure detection
  test: pylib: Make convict a required parameter in server_stop()
  test: Annotate server_stop() calls where conviction is harmful
  test: Annotate server_stop() calls where conviction is beneficial
  test: Annotate server_stop() calls where conviction is useless
  test: pylib: Add convict option to server_stop()
  api: failure_detector: Introduce convict-node API
  gms: gossiper: Make convict() public and safe to call from any scheduling group
  api: Extract validate functions to common header
2026-05-22 13:39:50 +02:00
Yaniv Michael Kaul
bb69ae5a02 test: assert ALTER TYPE RENAME rejected on frozen PK UDTs
Add assertion that ALTER TYPE RENAME is rejected when the UDT is used
as a frozen partition key column. The existing test only covered ALTER
TYPE ADD. This closes the coverage gap from dtest
udtencoding_test.py::test_udt_change_in_partition_key, enabling its
removal.

Refs: SCYLLADB-1929

Closes scylladb/scylladb#29840
2026-05-22 12:29:43 +02:00
Marcin Maliszkiewicz
dcff319221 Merge 'cql: request-side custom payload parsing' from Dario Mirovic
When a CQL client sends a request with the `CUSTOM_PAYLOAD` flag (`0x04`) set, the frame body starts with a [bytes map] before the message. Scylla never implemented parsing of this map on the request side. This caused it to fail parsing with protocol errors such as `"truncated frame: expected 65546 bytes"`.

This was discovered through DataStax Java Driver 4.19.x tests that attach a `request-id` to queries via custom payload. The same issue affects any CQL client that sets the `CUSTOM_PAYLOAD` flag.

Fix this by skipping over the custom payload [bytes map] from the frame body before dispatching to opcode-specific handlers. The payload contents are discarded since Scylla has no pluggable `QueryHandler`. Cassandra's default `QueryHandler` also discards them.
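
A minimal Python sketch of skipping a [bytes map], following the CQL native protocol encoding: a [short] entry count, then for each entry a [string] key ([short] length plus bytes) and a [bytes] value ([int] length plus bytes, where a negative length means null). The function and exception names are hypothetical; the actual fix is in Scylla's C++ transport code.

```python
import struct


class TruncatedFrame(ValueError):
    """Raised when the body ends before the [bytes map] does."""


def skip_custom_payload(body: bytes, offset: int = 0) -> int:
    """Return the offset of the first byte after the [bytes map] at `offset`."""

    def skip(size: int) -> None:
        nonlocal offset
        if offset + size > len(body):
            raise TruncatedFrame(f"need {size} bytes at offset {offset}")
        offset += size

    def read(fmt: str, size: int) -> int:
        start = offset
        skip(size)                       # bounds check, then advance
        (value,) = struct.unpack_from(fmt, body, start)
        return value

    for _ in range(read(">H", 2)):       # [short] number of entries
        skip(read(">H", 2))              # key: [string]
        value_len = read(">i", 4)        # value: [bytes]
        if value_len > 0:                # negative length encodes null
            skip(value_len)
    return offset
```

The opcode-specific handler would then parse the message starting at the returned offset, for example `body[skip_custom_payload(body):]`.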

Fixes SCYLLADB-745

Reported on 2026.2, backport.

Closes scylladb/scylladb#30005

* github.com:scylladb/scylladb:
  cql: fix request-side custom payload parsing
  test/cqlpy: add tests for request-side custom payload handling
2026-05-22 12:18:26 +02:00
Patryk Jędrzejczak
b7400d20dd test: add test_stop_before_starting_compaction_manager
2026-05-22 11:58:37 +02:00
Marcin Maliszkiewicz
18dd281e72 Merge 'test: audit: pin empty-keyspace DDL audit behavior' from Andrzej Jackowski
9646ee05bd changed behavior of empty keyspace handling and this code path was never tested for CQL audit. Test CREATE/DROP FUNCTION and CREATE/DROP AGGREGATE targeting both an existing keyspace and a nonexistent one to verify both are audited with empty keyspace.

No backport, just a missing test case.

Closes scylladb/scylladb#29542

* github.com:scylladb/scylladb:
  test: audit: pin empty-keyspace DDL audit behavior
  test: audit: restart server when any non-live config key changes
  test: audit: rename 'needed' to 'target_config' for clarity
2026-05-22 09:42:34 +02:00