After schema reload, `target_parser::is_local()` did not recognize the
vector-index local target format `{"pk": [...], "tc": "..."}`, causing
local vector indexes to be treated as global. This broke duplicate
detection when both a global and a local vector index existed on the same
column. Fix by introducing `vector_index::is_local()` and dispatching
to it from `create_index_from_index_row()` based on the index class.
Also adds tests for local/global vector index coexistence.
Fixes: SCYLLADB-987
backport reasoning: we added local vector index support in 2026.1
Closes scylladb/scylladb#29492
* github.com:scylladb/scylladb:
test/cqlpy: add tests for global and local vector index coexistence
index: fix local vector index locality detection after schema reload
-Wpass-failed warns when an explicitly requested optimization
(e.g. `#pragma GCC unroll`) cannot be performed. Since the standard
library contains those pragmas, this is more or less out of a
developer's control. We can play with inlining, but cannot guarantee
it will work.
Since the condition isn't fatal in any way, degrade it back to its
default disposition, a warning (it was upgraded to an error via
-Werror). Don't suppress it entirely since in hot paths we do want
to address it.
Closes scylladb/scylladb#29980
When forwarding reads to the raft group leader was introduced, we
didn't use the methods that allow us to cache the leader after
completing requests. Fix this by using the redirect_to_leader
method prepared for this case.
Also remove a duplicated consecutive 'if'.
Closes scylladb/scylladb#30102
The collect_pkgs ninja rules for building collect-dist-{mode} listed
individual RPM and DEB file paths as order-only dependencies. However,
the rpmbuild/debbuild rules only declare the output *directory*
(e.g. $builddir/dist/{mode}/redhat), not the individual files within it.
This caused ninja to fail with:
ninja: error: '...scylla-....rpm', needed by 'build/.../dist/rpm',
missing and no known rule to make it
Fix by removing the individual package file paths from the order-only
dependency list. The directory targets ($builddir/dist/{mode}/redhat,
dist-cqlsh-rpm, dist-python3-rpm, etc.) already ensure the packages are
built before collect_pkgs copies them via the $pkgs variable.
This regression was introduced by 5694c93c12 ("build: add collect-dist target to organize build artifacts").
Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-2215
Closes scylladb/scylladb#30079
Replace all uses of the deprecated seastar::smp::count with this_smp_shard_count() and smp::all_cpus() with this_smp_all_shards() across the ScyllaDB codebase (seastar submodule untouched).
Both replacement functions require a reactor thread context. All call sites were verified to run on reactor threads.
Notable cases:
- dht/token-sharding.hh: this_smp_shard_count() is used as a default parameter value. This is safe since all callers are on reactor threads, but the expression is now evaluated at each call site rather than being a reference to a global variable.
- service/storage_service.hh, locator/abstract_replication_strategy.hh, ent/encryption/encryption.cc: used in default member initializers and constructor member-init-lists. Objects are always constructed on reactor threads.
- schema_builder: sometimes called from BOOST_AUTO_TEST_CASE without a reactor. Added a pre-patch that makes the implicit shard count parameter explicit and passes 1 in those cases.
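The default-parameter case can be illustrated with a minimal sketch. `fake_shard_count()`, `g_shards`, `g_calls` and `shard_of()` are hypothetical stand-ins invented for this example (the real `this_smp_shard_count()` queries the reactor); the point is only that a default argument is an expression evaluated at every call that omits the parameter, not a reference bound once.

```cpp
#include <cassert>

// Hypothetical stand-in for this_smp_shard_count(): reads a variable and
// counts how many times it was evaluated.
unsigned g_shards = 4;
int g_calls = 0;

unsigned fake_shard_count() {
    ++g_calls;
    return g_shards;
}

// The default argument is re-evaluated at each call site that omits it.
unsigned shard_of(unsigned long token, unsigned shard_count = fake_shard_count()) {
    assert(shard_count > 0);
    return static_cast<unsigned>(token % shard_count);
}
```

Calling `shard_of(10)` twice evaluates `fake_shard_count()` twice, while `shard_of(10, 5)` does not evaluate it at all. This is why every caller relying on the default must already be in a context where the call is valid.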
Not changed:
- scylla-gdb.py: reads smp::count as a GDB symbol (no reactor context).
- Python test files: only reference smp::count in comments/strings.
No backport: the Seastar commit that deprecated these functions hasn't made (and won't make) its way into any release branch, and the warnings are cosmetic anyway.
Closes scylladb/scylladb#29990
* github.com:scylladb/scylladb:
treewide: replace deprecated smp::count and smp::all_cpus() with new APIs
scylla-gdb: read shard count from smp::_this_smp instead of smp::count
schema_builder: make shard_count an explicit constructor parameter
Replace all uses of the deprecated seastar::smp::count with
this_smp_shard_count() and smp::all_cpus() with this_smp_all_shards()
across the ScyllaDB codebase (seastar submodule untouched).
Both replacement functions require a reactor thread context. All call
sites were verified to run on reactor threads.
Notable cases:
- dht/token-sharding.hh: this_smp_shard_count() is used as a default
parameter value. This is safe since all callers are on reactor threads,
but the expression is now evaluated at each call site rather than being
a reference to a global variable.
- service/storage_service.hh, locator/abstract_replication_strategy.hh,
ent/encryption/encryption.cc: used in default member initializers and
constructor member-init-lists. Objects are always constructed on reactor
threads.
Not changed:
- scylla-gdb.py: reads smp::count as a GDB symbol (no reactor context).
- Python test files: only reference smp::count in comments/strings.
After removing all references to smp::count from ScyllaDB code, in
the next patch, the linker may strip the symbol in release builds
(LTO/gc-sections). The GDB script then fails with 'Missing ELF symbol
_ZN7seastar3smp5countE'.
Read the shard count from the thread-local smp instance pointer
(smp::_this_smp->_shard_count) instead. This pointer is always set on
reactor threads and is guaranteed to survive the linker since it's used
by this_smp_shard_count().
If the current GDB thread is not a reactor thread (e.g., an alien
thread), iterate all threads to find the first reactor thread. If none
have _this_smp set, try the deprecated smp::count global as a last
resort (available in debug builds).
When a mutation generates more view updates than max_rows_for_view_updates
(100), view_update_builder::build_some() splits the work into multiple
batches. There was a bug in how fragments were read between batches:
When should_stop_updates() returned true, the old code called stop()
which returned stop_iteration::yes without reading the next fragments.
On the next build_some() call, read_both_next_fragments() was called
at the start, which advanced BOTH readers - skipping any fragment that
was already read but not yet consumed. A row could be not consumed if
either:
- the 100th (last in the batch) update was a row insertion and we still
had insertions/updates remaining
- the 100th (last in the batch) update was a row deletion and we still
had deletions/updates remaining
For the most common case where work is split in batches, i.e. range
deletions, we couldn't hit this because range delete generates only
view row deletions.
On tables with a single materialized view, we also couldn't get this
for any batches with less than 50 statements (unless the batch also
contained range deletions), because one non-range-delete update can
generate up to 2 view updates.
However, for a range of scenarios outside these two, we could lose
view updates, resulting in persistent inconsistencies.
The fix:
- read_*_next_fragment() now accept a stop_iteration parameter, so the
next fragments are always read after consuming (even when stopping),
but stop_iteration::yes is correctly propagated to break the loop.
- build_some() no longer re-reads fragments at the start. Instead, an
initialize() method performs the initial read once at construction.
- because now we only advance readers after consuming, we won't advance
readers after end_of_partition, so we extend the break condition to
accept either readers evaluating to `false` or them being at the
end_of_partition. We also handle the optimization with
_skip_row_updates
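The failure mode and the fix can be sketched with a toy model. This is not the view_update_builder code: `toy_reader`, `toy_builder` and `run()` are hypothetical names, fragments are plain integers, and the two readers are merged by key. The `fixed` flag switches between the old behaviour (re-read both readers at the start of each batch, skip the read when stopping) and the new one (read once at construction, always advance what was consumed).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Toy reader: `head` is the fragment that was read but not yet consumed.
struct toy_reader {
    std::vector<int> keys;
    std::size_t pos = 0;
    std::optional<int> head;

    explicit toy_reader(std::vector<int> k) : keys(std::move(k)) {}

    void advance() {
        if (pos < keys.size()) {
            head = keys[pos++];
        } else {
            head.reset();
        }
    }
};

struct toy_builder {
    toy_reader updates;
    toy_reader existings;
    bool fixed;
    std::vector<int> consumed;

    toy_builder(std::vector<int> u, std::vector<int> e, bool fixed_)
        : updates(std::move(u)), existings(std::move(e)), fixed(fixed_) {
        if (fixed) {
            // Fixed variant: a single initial read at construction.
            updates.advance();
            existings.advance();
        }
    }

    // Returns true once both readers are exhausted.
    bool build_some(std::size_t batch_size) {
        assert(batch_size > 0);
        if (!fixed) {
            // Old variant: advance BOTH readers at the start of every batch.
            updates.advance();
            existings.advance();
        }
        std::size_t produced = 0;
        while (updates.head || existings.head) {
            const bool take_u = updates.head && (!existings.head || *updates.head <= *existings.head);
            const bool take_e = existings.head && (!updates.head || *existings.head <= *updates.head);
            consumed.push_back(take_u ? *updates.head : *existings.head);
            const bool stop = ++produced >= batch_size;
            if (fixed || !stop) {
                // Fixed variant advances what it consumed even when stopping.
                if (take_u) updates.advance();
                if (take_e) existings.advance();
            }
            if (stop) {
                return false;
            }
        }
        return true;
    }
};

std::vector<int> run(std::vector<int> u, std::vector<int> e, std::size_t batch_size, bool fixed) {
    toy_builder b(std::move(u), std::move(e), fixed);
    while (!b.build_some(batch_size)) {}
    return b.consumed;
}
```

With updates `{1, 2, 4}`, existing rows `{3}` and a batch size of 2, the old variant stops after consuming `2` while `3` is still pending in the other reader. The next batch then advances both readers and `3` is never consumed. The fixed variant consumes all four keys.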
Fixes: scylladb/scylladb#29155
Closes scylladb/scylladb#29498
Queries are stored and passed around as sstring/std::string_view. While normally they are small enough to not cause problems, as the `test_cdc_large_values.TestLargeColumnsWithCDC.test_single_column_blob_max_size_with_cdc_preimage_full_postimage[unprepared_statements]` demonstrates, queries can be arbitrarily large, putting heavy strain on Scylla internals via large allocations, in the extreme case causing denial of service.
This PR attempts to alleviate this by using fragmented storage for queries: read the query as a fragmented string from the input stream in `transport/server.cc`, propagate it as such to `query_processor::prepare()` and also store it as such in `cql3::cql_statement::raw_cql_statement`. Also avoid linearizing raw values in the CQL expression tree: switch `cql3::expr::untyped_constant::raw_text` to fragmented storage.
For this to be possible, some infrastructure code had to be made fragmented storage friendly: ascii/utf8 validation, hashers, from_hex and importantly: `abstract_type::from_string()`.
Unfortunately, the query still has to be linearized for parsing itself, as ANTLR -- although it allows for a custom InputStream implementation -- plays pointer arithmetic games with the pointers obtained from it, so fragmented input cannot be used.
Still, this PR limits the places where the query is linearized to the
following:
* Parsing
* Audit
* Logs and error messages
So the normal query paths for queries that actually can get arbitrarily large (UPDATE and INSERT) should only linearize the query temporarily for parsing.
Fixes #10779
Improvement, no backport
Closes scylladb/scylladb#28619
* github.com:scylladb/scylladb:
tracing: add_query(): change query param to utils::chunked_string
cql3: store raw query string in utils::chunked_string
serializer: add serializer<utils::chunked_string>
utils/reusable_buffer: add get_linearized_view(managed_bytes_view)
cql3/expr: use utils::chunked_string for untyped_constant::raw_text
types: abstract_type::from_string() switch to fragmented buffers (implementation)
types: abstract_type::from_string() switch to fragmented buffers (interface)
types: use write_fragmented from utils/fragment_range.hh
types: timestamp_from_string(): don't assume std::string_view is null-terminated
types/duration: don't assume std::string_view is null-terminated
utils/hashers: add calculate(managed_bytes_view) overload
utils/ascii: add validate(managed_bytes_view) overload
utils: add managed_bytes_fwd.hh
utils: add chunked_string
utils: add managed_bytes_basic_view::byte_iterator
A recent Seastar update deprecated smp::count and introduced
this_smp_shard_count() as a replacement. One difference is that
this_smp_shard_count() wants to run on a reactor thread.
This poses a problem for non-reactor tests (BOOST_AUTO_TEST_CASE)
that nevertheless use a schema, as the schema_builder constructor
references smp::count. If we replace it with this_smp_shard_count()
then it will crash when running without a reactor.
To fix, remove the implicit this_smp_shard_count() call from raw_schema's
constructor and require callers to pass shard_count explicitly to
schema_builder. This allows tests that don't run on a reactor thread
to construct schemas without crashing.
Production code and reactor-based tests pass this_smp_shard_count().
Non-reactor test files (expr_test, keys_test, nonwrapping_interval_test,
wrapping_interval_test, bti_key_translation_test, range_tombstone_list_test)
pass a fixed shard count of 1.
Note: sstable_test.cc is a Seastar test file (SEASTAR_THREAD_TEST_CASE)
but also contains one plain BOOST_AUTO_TEST_CASE
(test_empty_key_view_comparison) that constructs a schema_builder without
a reactor context. This test also receives a fixed shard count of 1.
The purpose of this test is to verify that the task manager's "wait" API
works correctly for vnodes-to-tablets migration virtual tasks. It starts
a `wait_task` HTTP request concurrently with a finalize (or rollback)
operation, and asserts that the wait returns the correct final state
("done" or "suspended").
The test uses `asyncio.create_task()` to wrap the wait request into a
task, and then immediately calls finalize. With asyncio's lazy task
scheduling, the wait coroutine does not start until the event loop
yields, so the finalization request reaches the server before wait, and
therefore may also complete before it. Once finalization completes, the
virtual migration task is no longer discoverable, causing a
"task not found" error.
Add a log message in Scylla's wait handler and a synchronization point
in the test to ensure that the wait request reaches the server before
finalization. This follows the same pattern used in
`test_tablet_tasks.py::check_and_abort_repair_task`.
Fixes SCYLLADB-2077
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
Closes scylladb/scylladb#29973
Several error injection sites use the low-level get_injection_parameters() API to fetch the entire parameters map and then manually look up a single key. The inject_parameter() API is better suited for these cases — it combines the enabled check and typed single-parameter extraction in one call, returning std::optional.
Cleanup of error injection usage, no backport
Closes scylladb/scylladb#29970
* github.com:scylladb/scylladb:
test: Use inject_parameter() in row_cache_test
sstables: Use inject_parameter() for mx reader fill buffer timeout
streaming: Use inject_parameter() for order_sstables_for_streaming
Migrate mock-based rescoring and oversampling tests from
test/vector_search/rescoring_test.cc to pytest and delete the C++ file.
Index option validation tests go to test_vector_index.py; rescoring tests
go to a new test_vector_search_rescoring.py which introduces shared
infrastructure (EmbeddingRow dataclass, TEST_DATA dict,
reversed_ann_response() helper, rescoring_test_table() context manager).
Two tests have updated assertions (semantic change):
filters_invalid_similarity_scores now uses per-function expected result
sets including a zero-vector row, and rescoring_with_zerovector_query
asserts empty results after NaN filtering (cosine only). Both are marked
xfail pending SCYLLADB-924.
Follow-up to #29593.
Does not require backport - simple refactoring of tests
Closes scylladb/scylladb#29906
* github.com:scylladb/scylladb:
test/vector_search: migrate zero-vector query rescoring test to pytest; delete rescoring_test.cc
test/vector_search: migrate invalid similarity score filtering test to pytest
test/vector_search: migrate non-ANN similarity argument rescoring test to pytest
test/vector_search: migrate wildcard select rescoring test to pytest
test/vector_search: migrate similarity_function rescoring test to pytest
test/vector_search: migrate rescoring and f32 quantization tests to pytest
test/vector_search: migrate oversampling tests to pytest
test/vector_search: migrate vector_index option validation tests to pytest
Having to unconditionally linearize the chunked query string when
passing it to tracing undoes the work put into reducing large
allocations on the query path. add_query() is evaluated eagerly on
every query, even if tracing is disabled. Defer the linearization to
build_parameters_map(), which is only called if tracing is enabled.
Read the query as a fragmented string from the input stream in
transport/server.cc, propagate it as such to query_processor::prepare()
and also store it as such in cql3::cql_statement::raw_cql_statement.
Unfortunately, the query still has to be linearized for parsing, as
ANTLR -- although it allows for a custom InputStream implementation --
plays pointer arithmetic games with the pointers obtained from it, so
fragmented input cannot be used.
To amortize the cost of this linearization, the query string is
linearized through utils::reusable_buffer. The parser can be
invoked recursively; nested invocations linearize directly.
Still, this patch limits the places where the query is linearized to the
following:
* Parsing
* Audit
* Logs and error messages
So the normal query paths for queries that actually can get arbitrarily
large (UPDATE and INSERT) should only linearize the query temporarily
for parsing.
Also add normalizer which maps to sstring. utils::chunked_string's wire
representation is binary compatible with that of sstring, which allows
for seamless migration of RPCs from sstring to utils::chunked_string
where needed. Will be used in the next commit for forward CQL prepare
request (query string).
The previous patch changed the interface and callers, this one updates
the implementation to actually work with fragmented buffers. Most types
just use with_linearized() to linearize the fragmented input buffer for
parsing. This is fine, as most types have a fixed or bounded-size string
representation that is small.
Importantly, the input is not linearized for the 3 types which have
unbounded values: ascii, bytes and text. The tuple type can contain any
of these types itself, so it is also converted to avoid linearization.
Change input: std::string_view -> utils::chunked_string_view.
Change return value: bytes -> managed_bytes.
This patch only changes the interface, with some to_bytes() sprinkled in
the internals to deal with recursive calls.
Internals will be updated in the next patch, to keep the churn of
updating callers separate from the actually important changes.
std::string_view is not guaranteed to point to a null-terminated string
literal; it may point to a substring of such a string or to a string that
is not null-terminated.
std::strtoll() assumes a null-terminated string and triggers a heap buffer
overflow if this is not true.
Use std::from_chars() -- which doesn't assume or require null-terminated
strings -- to parse numbers from strings instead of std::strtoll().
While at it: fix a small mistake in error reporting. When reporting
failure to parse the number, include the original string in the error
report, instead of the (failed-to-parse) number.
Not a problem on current master, as all callers pass null-terminated
strings.
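As a minimal sketch of the approach (not the actual timestamp_from_string() code), the helper below parses a number out of a std::string_view with std::from_chars(), bounded by the view's own length. `parse_int()` is a hypothetical name used only for this illustration.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Parse the whole view as a base-10 integer. Never reads past
// s.data() + s.size(), so no terminating '\0' is needed.
// Returns std::nullopt on a parse error or on trailing characters.
std::optional<long long> parse_int(std::string_view s) {
    long long value = 0;
    const char* first = s.data();
    const char* last = s.data() + s.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{} || ptr != last) {
        return std::nullopt;
    }
    return value;
}
```

For example, `parse_int(std::string_view("2024-05").substr(0, 4))` parses only the four characters of the substring, even though the underlying buffer continues with `-05`.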
std::string_view is not guaranteed to point to a null-terminated string
literal; it may point to a substring of such a string or to a string that
is not null-terminated.
The cql_duration() constructor obtains the data() pointer from a
std::string_view and creates another std::string_view from it, after some
conditional pointer arithmetic. Constructing a new std::string_view from a
raw pointer, without specifying its length, leads to strlen() being
called on the pointer, resulting in undefined behaviour if the string
is not null-terminated. Use substr() instead of pointer arithmetic to
avoid this problem altogether.
boost::regex_match() invocations also use std::string_view::data().
This leads to strlen() and a heap-buffer-overflow if the string is not
null-terminated. Invoke the overload which takes an iterator pair
instead.
Not a problem on current master, as all callers pass null-terminated
strings.
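Both techniques can be sketched as follows. This is an illustration only: it uses std::regex from the standard library instead of boost::regex to stay self-contained, and `strip_sign()`, `looks_like_simple_duration()` and the pattern are hypothetical, not the real duration grammar.

```cpp
#include <cassert>
#include <regex>
#include <string_view>

// Strip an optional leading '-' with substr(), which keeps the length,
// instead of building a new view from a bare pointer (which would call
// strlen() on possibly unterminated data).
std::string_view strip_sign(std::string_view s, bool& negative) {
    negative = !s.empty() && s.front() == '-';
    return negative ? s.substr(1) : s;
}

// Match through the iterator-pair overload, so the regex engine never
// looks past s.end().
bool looks_like_simple_duration(std::string_view s) {
    static const std::regex pattern("[0-9]+(h|m|s)");
    return std::regex_match(s.begin(), s.end(), pattern);
}
```

Given a view over the first four characters of `"-15m30s"`, `strip_sign()` yields `"15m"`, and the match succeeds on exactly those three characters although the underlying buffer continues.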
Uses update() for each fragment, then finalize. Yields identical hash to
calling calculate(std::string_view) with linearized buffer. This is
checked by new tests.
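The equivalence claimed here holds for any streaming hasher, because feeding the fragments in order is the same as feeding their concatenation. A minimal sketch, using 64-bit FNV-1a as a hypothetical stand-in for the real hashers and std::string_view fragments in place of managed_bytes_view:

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical stand-in hasher (64-bit FNV-1a) with an
// update()/finalize() interface.
class fnv1a_hasher {
    std::uint64_t _state = 0xcbf29ce484222325ull;  // FNV offset basis
public:
    void update(std::string_view data) {
        for (unsigned char c : data) {
            _state ^= c;
            _state *= 0x100000001b3ull;  // FNV prime
        }
    }
    std::uint64_t finalize() const { return _state; }
};

std::uint64_t hash_linearized(std::string_view buf) {
    fnv1a_hasher h;
    h.update(buf);
    return h.finalize();
}

// One update() per fragment, then finalize().
std::uint64_t hash_fragments(const std::vector<std::string_view>& fragments) {
    fnv1a_hasher h;
    for (std::string_view f : fragments) {
        h.update(f);
    }
    return h.finalize();
}
```

Splitting `"SELECT * FROM ks.tbl"` into any sequence of fragments, including empty ones, gives the same value from `hash_fragments()` as `hash_linearized()` gives for the whole string.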
Forward declaration of managed_bytes[_view].
enum class mutable_view was moved from utils/managed_bytes.hh to
utils/mutable_view.hh, because it is needed in the forward declaration.
A thin facade over managed_bytes[_view], offering some extra
convenience for working with strings, as well as a strong type
communicating the purpose (storing text instead of a blob).
Also introduces utils::from_hex(chunked_string_view), a fragmented
hex-decode that operates directly on a chunked_string_view without
requiring linearization. Hex pairs straddling fragment boundaries
are handled via a carry-over nibble.
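The carry-over idea can be sketched like this. It is a simplified illustration, not the utils::from_hex() implementation: fragments are modelled as a vector of std::string_view, and `hex_digit()` and `from_hex_fragments()` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string_view>
#include <vector>

int hex_digit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    throw std::invalid_argument("not a hex digit");
}

// Decode hex text spread over several fragments without concatenating
// them. A high nibble left over at the end of one fragment is carried
// into the next one.
std::vector<std::uint8_t> from_hex_fragments(const std::vector<std::string_view>& fragments) {
    std::vector<std::uint8_t> out;
    std::optional<int> carry;  // high nibble waiting for its partner
    for (std::string_view frag : fragments) {
        for (char c : frag) {
            const int nibble = hex_digit(c);
            if (carry) {
                out.push_back(static_cast<std::uint8_t>((*carry << 4) | nibble));
                carry.reset();
            } else {
                carry = nibble;
            }
        }
    }
    if (carry) {
        throw std::invalid_argument("odd number of hex digits");
    }
    return out;
}
```

With fragments `"ca"`, `"f"`, `"eba"`, `"be"`, the pair `fe` straddles a fragment boundary and is still decoded as a single byte.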
A byte-wise iterator that works both as a bidirectional iterator and as an
output iterator (for mutable views). Allows using managed_bytes_view in
iterator-based algorithms.
Added unit tests covering the iterator functionality.
_compacting_table is now a compaction::compaction_group_view* (virtual
interface) which doesn't have a _schema member directly. Use
downcast_vptr() to get the concrete type and navigate through _t._schema.
Fallback to the old path for compatibility with older builds.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Closes scylladb/scylladb#30078
Migrate rescoring_with_zerovector_query from rescoring_test.cc to pytest
as test_rescoring_with_zerovector_query. Tested with cosine similarity only
because zero vectors produce NaN only for cosine; other functions yield
valid scores.
The test is marked xfail: similarity_cosine now returns NaN for zero vectors
(SCYLLADB-456 fix) and rescoring should filter out NaN scores, yielding an
empty result set.
Semantic change: the test now asserts the desired empty-result behavior
instead of asserting that the query does not throw.
Delete rescoring_test.cc now that all tests have been migrated and remove
its entries from configure.py and test/vector_search/CMakeLists.txt.
Migrate no_nulls_in_rescored_results from rescoring_test.cc to pytest,
renamed to test_filters_invalid_similarity_scores_in_rescored_results.
The test now also inserts a zero-vector row (id=14) to cover the case
introduced when similarity_cosine was changed to return NaN for zero
vectors instead of throwing (SCYLLADB-456). The expected surviving set
of rows is refined per similarity function based on which inputs produce
valid (non-NaN, non-Infinity) similarity scores. Marked xfail because
rescoring does not yet filter rows with invalid scores.
Semantic change: the expected surviving row set is updated per the
behavior described above.
Migrate select_similarity_function_other_than_ann_ordering from
rescoring_test.cc to pytest. The test verifies that similarity scores in
SELECT are computed against the explicitly supplied argument vector rather
than the ANN ordering vector. No semantic change.
Migrate wildcard_select_is_correctly_rescored from rescoring_test.cc to
pytest. The test verifies that SELECT * with rescoring returns rows in
the correct similarity order with correct embedding values, covering a
slightly different processing path from the explicit-column SELECT test.
No semantic change.
Migrate similarity_function_returns_correctly_rescored_results from
rescoring_test.cc to pytest. The test verifies that similarity scores
in the SELECT clause are computed correctly after rescoring, for both
argument orderings of the similarity function. No semantic change.
Introduce shared test infrastructure in test_vector_search_rescoring.py:
EmbeddingRow dataclass, TEST_DATA dict keyed by similarity function name,
ANN_QUERY_VECTOR, reversed_ann_response() helper, and rescoring_test_table()
context manager.
Migrate result_returned_by_vector_store_is_rescored and
f32_quantization_disables_rescoring from rescoring_test.cc. No semantic change.
Migrate oversampling_multiplies_limit_for_vector_store_query and
oversampled_vector_store_results_are_limited_to_cql_limit from
rescoring_test.cc to test_vector_search_rescoring_with_mock.py.
No semantic change.
CREATE INDEX option tests for quantization, oversampling, and rescoring
are moved from rescoring_test.cc to test_vector_index.py alongside the
existing index option tests. These tests exercise only option parsing and
validation - no vector store mock needed. No semantic change.
CassIO (the library backing LangChain's `langchain_community.vectorstores.Cassandra` integration) issues the following DDL during schema setup to create a metadata index:
```sql
CREATE CUSTOM INDEX IF NOT EXISTS eidx_metadata_s_<table>
ON <keyspace>.<table> (ENTRIES(metadata_s))
USING 'org.apache.cassandra.index.sai.StorageAttachedIndex';
```
ScyllaDB does not support Cassandra's StorageAttachedIndex (SAI) for non-vector columns and previously rejected this statement with:
```
StorageAttachedIndex (SAI) is only supported on vector columns; use a secondary index for non-vector columns
```
This blocks seamless migration of existing LangChain/CassIO applications from Cassandra to ScyllaDB — applications fail during initialization before any application-level workaround can run, even when metadata filtering is not used (`metadata_indexing="none"`).
CassIO is no longer actively maintained but remains the only official LangChain integration path for Apache Cassandra over CQL, meaning existing applications will continue using this setup pattern.
Instead of rejecting the CassIO metadata-map SAI DDL, detect the pattern and rewrite it to a standard ScyllaDB secondary index on collection entries:
- **Detection**: SAI class name + single `ENTRIES` target on a non-frozen `map` column
- **Rewrite**: Clear the custom class so the index is created through the standard secondary index path (which already fully supports indexing map entries)
- **Warning**: Emit a CQL warning informing the user that SAI is not supported by ScyllaDB, a regular secondary index was created instead, and metadata filtering behavior may differ from Cassandra SAI
The rewrite is placed early in `validate_while_executing()`, before the rf-rack-validity check, so the standard secondary index code path handles all subsequent validation naturally — no code duplication.
After this change, the CassIO schema setup succeeds on ScyllaDB:
- `CREATE CUSTOM INDEX ... USING 'sai'` on `ENTRIES(metadata_s)` creates a real secondary index
- The index is functional and can accelerate metadata filtering queries
- A CQL warning makes the rewrite transparent to operators
- SAI on non-vector, non-map-entries columns is still rejected as before
- Vector SAI indexes continue to be rewritten to `vector_index` as before
- `test_sai_entries_on_map_creates_regular_index` — verifies the index is created and the warning is emitted (fully-qualified SAI class name)
- `test_sai_entries_on_map_short_name` — same with the `'sai'` short alias
- `test_sai_on_regular_column_rejected` — confirms SAI on regular scalar columns is still rejected
All 148 tests in `test_vector_index.py` and `test_secondary_index.py` pass with no regressions (125 passed, 22 xfailed, 1 skipped).
Fixes: SCYLLADB-2113
Backport: 2026.2 as this is the version where the support for SAI class needed by LangChain was added.
Closes scylladb/scylladb#29981
* github.com:scylladb/scylladb:
cql: rewrite CassIO SAI metadata index to regular secondary index
db/config: add enable_cassio_compatibility flag
The versions collection in data_read_resolver::resolve() is a
std::vector<std::vector<version>>. This contains one entry per unique
partition in the union of all results from each replica.
The vector's capacity is reserved to the number of partitions in the first
replica's response. Later, new entries are added via `emplace_back()`
for partitions found only in other replicas' responses.
This can become really large if there are a lot of small partitions, and
especially when there are big differences between the partition sets
returned by individual replicas.
With small partitions (e.g. Alternator items with TTL, typically 150-200
bytes each), a single 1 MB read page can carry thousands of partitions,
easily pushing this vector past 2730 entries -- the point at which a
std::vector doubling reallocation exceeds the 128 KB seastar
large-allocation warning threshold:
2 * 2731 * sizeof(std::vector<version>=24) > 131072
Switching to utils::chunked_vector caps every individual allocation at
128 KB by design, regardless of the number of partitions or how much
the replicas diverge. The four internal helper functions that receive
this container (find_short_partitions, get_last_row,
got_incomplete_information_across_partitions, got_incomplete_information)
are updated to accept the new type; their logic is unchanged.
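The threshold arithmetic above can be checked at compile time. The sketch assumes, as the message does, that sizeof(std::vector<version>) is 24 bytes; `element_size`, `large_alloc_threshold` and `doubled_allocation()` are names made up for this check.

```cpp
#include <cassert>
#include <cstddef>

// Assumption taken from the text: 24 bytes per std::vector<version>
// (three pointers on a typical 64-bit ABI).
constexpr std::size_t element_size = 24;
constexpr std::size_t large_alloc_threshold = 128 * 1024;  // 131072 bytes

// Bytes requested when a vector holding `entries` elements doubles.
constexpr std::size_t doubled_allocation(std::size_t entries) {
    return 2 * entries * element_size;
}

static_assert(doubled_allocation(2730) == 131040);
static_assert(doubled_allocation(2730) <= large_alloc_threshold);
static_assert(doubled_allocation(2731) == 131088);
static_assert(doubled_allocation(2731) > large_alloc_threshold);
```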
Fixes: SCYLLADB-460
Closes scylladb/scylladb#29325
The vector-store's InvariantKey type supports at most 255 key
components. Reject vector index creation when the base table's
primary key (partition + clustering columns) exceeds this limit.
Fixes: VECTOR-553
Closes scylladb/scylladb#29317
When CassIO creates a SAI ENTRIES index on a map column,
ScyllaDB now rewrites it to a regular secondary index and emits
a CQL warning. This allows LangChain/CassIO applications to work
without DDL errors.
The rewrite is gated behind the enable_cassio_compatibility flag
(disabled by default).
Refs: SCYLLADB-2113
The compaction module is registered with task_manager in the compaction_manager
constructor, and unregistered in compaction_manager::really_do_stop(), which
was gated behind `_state != state::none` in compaction_manager::do_stop().
Since enable() -- which transitions _state from none to running -- is called
later during startup (from database::start() or the disk space monitor callback)
than the compaction_manager constructor, an early shutdown could leave the
compaction module registered after compaction_manager::do_stop() returned.
task_manager::stop() then aborted with 'Tried to stop task manager while
some modules were not unregistered'.
Fix compaction_manager::do_stop() to call _task_manager_module->stop() even
when `_state == state::none`, so that the compaction module is always properly
unregistered.
Fixes: SCYLLADB-2106
Backport to all supported branches, as the bug is there and it has
already caused a failure in 2026.1 CI.
Closes scylladb/scylladb#30015
* github.com:scylladb/scylladb:
test: add test_stop_before_starting_compaction_manager
compaction_manager: unregister compaction module on early shutdown
Currently, the driver builds the network layout (node IP addresses and
ports) from `system.local`, `system.peers` and `system.client_routes`, and
then assumes that this layout is correct.
It does not verify it.
If, for example, a node's IP/port (say, on a proxy) does not
match what the driver calculated, the mismatch goes unnoticed.
The goal of this feature is to provide the host ID to the driver in the
SUPPORTED frame, so that it knows which node it connected to and can decide
whether to keep the connection or drop it.
- add `SCYLLA_HOST_ID` to the CQL `SUPPORTED` response
- add a regression test that hooks the Python driver handshake and
verifies the reported host id
- `python3.12 -m py_compile test/cqlpy/test_protocol_exceptions.py`
- syntax-only compile of `transport/server.cc` with the repo toolchain
flags inside `dbuild`
Refs #27452
Refs https://scylladb.atlassian.net/browse/DRIVER-610
Closes scylladb/scylladb#29809
ScyllaDB has special counter columns for which atomic add/subtract operations like `SET a = a + 1` are allowed. Such operations have not been allowed on ordinary non-counter columns, as they would not be properly atomic - the read and the write are separate, and concurrent operations can have incorrect results.
This patch makes it allowed to use such atomic add/subtract operations in **LWT** statements. For example
UPDATE ... SET a = a - 7 IF a > 0
or
UPDATE ... SET a = a + 1 IF a != NULL
The row updated in the operation, and the updated column (`a`) should be initialized before the update. The example `SET a = a + 1 IF a != NULL` will fail the condition if `a` is not set. A different request `SET a = a + 1 IF EXISTS` will just leave `a` unset if it's unset (NULL + 1 is NULL, this is SQL's null propagation rules).
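The two conditions behave differently for an unset column, which a toy model makes concrete. This is only an illustration of the semantics described above, not Scylla's CAS implementation: std::nullopt models an unset (NULL) column, and `cell`, `add()`, `update_if_not_null()` and `update_if_exists()` are hypothetical names. Atomicity of the read and the write is not modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

using cell = std::optional<std::int64_t>;  // nullopt models an unset column

// Null propagation: NULL + x is NULL.
cell add(const cell& a, std::int64_t delta) {
    if (!a) {
        return std::nullopt;
    }
    return *a + delta;
}

// Models `SET a = a + delta IF a != NULL`: the condition fails on an
// unset column and nothing is written. Returns whether it was applied.
bool update_if_not_null(cell& a, std::int64_t delta) {
    if (!a) {
        return false;
    }
    a = add(a, delta);
    return true;
}

// Models `SET a = a + delta IF EXISTS`: the condition depends only on
// the row, so an unset column is "updated" to NULL, i.e. stays unset.
bool update_if_exists(bool row_exists, cell& a, std::int64_t delta) {
    if (!row_exists) {
        return false;
    }
    a = add(a, delta);
    return true;
}
```

On an unset column, `update_if_not_null()` reports that the condition failed, while `update_if_exists()` reports success on an existing row but leaves the column unset.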
These add/subtract operations are allowed on any numeric (integer or floating point) column.
The ability of LWT to fetch the old values of a column and use it to calculate the new value has long been available in our internal CAS implementation - and has been in use for years in Alternator - but until this patch it was not exposed in CQL's LWT.
This series does not add new syntax to CQL - the "SET a = a + b" and "SET a = a - b" syntax already existed for counters, and we just allow the same syntax for non-counters. However, the series does add a bit of machinery that will allow us to easily support more general expressions in the future. In particular, this series implements the addition, subtraction, and unary-minus operators for expressions, and adds the machinery needed to run **any** expression in "SET a = expr()", using existing row values fetched by LWT.
This is a new Scylla-only feature that does not exist in Cassandra.
Fixes #10568
Refs #22918 ("Support arithmetic operators"), SCYLLADB-1576 ("Decimal arithmetic operations OOM")
This is a new feature, so normally it would not be backported.
Closes scylladb/scylladb#29939
* github.com:scylladb/scylladb:
cql: atomic add/subtract operations with LWT
cql3: let constants::setter evaluate expressions using prefetched row data
cql3/expr: add NEG unary operator for numeric negation
cql3/expr: add SUB binary operator for numeric subtraction
cql3/expr: add ADD binary operator for numeric addition
types: add is_arithmetic() method for types
The keyspace RF test starts zero-token nodes as part of its topology setup.
The Python driver 3.29.9 can't schedule queries on zero-token nodes, so waiting for `CQL_ALTERNATOR_QUERIED` on those nodes is the wrong readiness gate.
This change makes the zero-token `server_add()` calls stop at `CQL_ALTERNATOR_CONNECTED`.
The test still exercises the keyspace replication assertions through a normal token-owning contact point.
Verified by running all 4 variations of `cluster.test_keyspace_rf::test_create_keyspace_with_default_replication_factor` on this branch.
Closes scylladb/scylladb#29779
previously the logstor compaction state for a compaction group could be
removed by a compaction reenabler guard. this caused an invalid access
in stop_ongoing_compactions, because it holds an iterator to the
compaction state across a yield point, so the iterator can be
invalidated if erased by another source concurrently.
we change the compaction state removal to be done only in a remove()
function that is called when the compaction group is stopped, after
waiting for ongoing compaction to stop and after the gates are closed.
this is safer because we keep the compaction state while the compaction
group exists, and remove it only when it's stopped and there are no
compactions in progress. this is similar to compaction state removal for
non-logstor tables in compaction_group::stop.
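The lifetime rule described above can be illustrated with a small asyncio model. This is not the logstor code: `CompactionStates` and its methods are hypothetical, `await asyncio.sleep(0)` stands in for a yield point, and closing the gate before waiting is an ordering chosen for this sketch.

```python
import asyncio


class CompactionStates:
    """Toy registry of per-group compaction state (hypothetical model)."""

    def __init__(self) -> None:
        self._states: dict = {}

    def register(self, group: str) -> None:
        self._states[group] = {"running": 0, "closed": False}

    def has(self, group: str) -> bool:
        return group in self._states

    async def run_compaction(self, group: str, ticks: int) -> None:
        state = self._states[group]
        if state["closed"]:
            raise RuntimeError("compaction group is stopped")
        state["running"] += 1
        try:
            for _ in range(ticks):
                await asyncio.sleep(0)  # yield point
        finally:
            state["running"] -= 1
            # The state is deliberately NOT erased here, even at zero.

    async def stop_ongoing_compactions(self, group: str) -> None:
        state = self._states[group]  # reference held across yield points
        while state["running"] > 0:
            await asyncio.sleep(0)

    async def remove(self, group: str) -> None:
        state = self._states[group]
        state["closed"] = True  # close the gate: no new compactions start
        await self.stop_ongoing_compactions(group)
        del self._states[group]  # the only place where state is erased
```

Because `remove()` is the only eraser and it runs after the wait completes, nothing can drop the entry while `stop_ongoing_compactions()` is suspended on it.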
Fixes SCYLLADB-2199
Closes scylladb/scylladb#30068
Strongly consistent reads currently call read_barrier() on whichever
replica happens to process the request. When a follower runs
read_barrier(), it sends an RPC to the leader to get the current read
index, then waits for its local apply index to catch up. If the follower
is behind, this wait can be significant.
By forwarding linearizable reads to the leader, we avoid the replica-to-leader RPC that fetches the read index to wait for: on the leader, that index is available locally.
Note that read_barrier() is still required on the leader to confirm it
is still the leader and guarantee linearizability. A future optimization
would be to implement leases in the raft library, which could eliminate
read_barrier() on the leader entirely.
The CL-to-behavior mapping is isolated in a single parse_consistency_level()
function:
- CL=(LOCAL_)QUORUM -> linearizable: forwarded to the raft leader
- CL=(LOCAL_)ONE -> non-linearizable: existing behavior (no read_barrier()/forwarding, may return stale results)
- All other CLs -> invalid request
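The mapping above is small enough to sketch directly. The real parse_consistency_level() is C++ and its signature is not shown here, so the Python version below is only an illustration: the `ReadKind` enum, the string-typed argument, and the use of `ValueError` for an invalid request are assumptions.

```python
from enum import Enum


class ReadKind(Enum):
    LINEARIZABLE = "linearizable"          # forwarded to the raft leader
    NON_LINEARIZABLE = "non_linearizable"  # served locally, may be stale


def parse_consistency_level(cl: str) -> ReadKind:
    """Classify a read by its consistency level, per the table above."""
    if cl in ("QUORUM", "LOCAL_QUORUM"):
        return ReadKind.LINEARIZABLE
    if cl in ("ONE", "LOCAL_ONE"):
        return ReadKind.NON_LINEARIZABLE
    raise ValueError(f"consistency level {cl} is not supported for this read")
```

Keeping the classification in one function means the forwarding and read_barrier() decisions can branch on the result rather than re-inspecting the consistency level.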
Read forwarding reuses the same CQL-layer bounce_to_node() mechanism
that write forwarding already uses. The transport layer's existing
requests_forwarded_* metrics automatically count forwarded reads.
Coordinator-level metrics (linearizable_reads, non_linearizable_reads,
writes) are added for visibility into the strong consistency workload.
Fixes: SCYLLADB-1157
Closes scylladb/scylladb#29575
* github.com:scylladb/scylladb:
strong_consistency: test read forwarding to leader
strong_consistency: skip read_barrier() for non-linearizable reads
strong_consistency: split coordinator-level read latency metrics
strong_consistency: forward linearizable reads to raft leader
strong_consistency: classify reads by consistency level
strong_consistency: add begin_read() to raft_server
ScyllaDB has special counter columns for which atomic add/subtract
operations like `SET a = a + 1` are allowed. Such operations have not
been allowed on ordinary non-counter columns, as they would not be
properly atomic - the read and the write are separate, and concurrent
operations can produce incorrect results.
This patch allows such atomic add/subtract operations
in *LWT* statements. Some examples:
UPDATE ... SET a = a - 1 IF a > 0
UPDATE ... SET a = a + 1 IF EXISTS
UPDATE ... SET a = a + 1 IF a != NULL
Both the row being updated and the updated column (a) should
be initialized before the update - arithmetic operations on missing
column values silently leave the column null (no error is generated).
These add/subtract operations are allowed on any numeric column -
integer or floating point of any size.
The ability of LWT to fetch the old value of a column and use it to
calculate the new value has long been available in our internal CAS
implementation - and has been in use for years in Alternator - but until
this patch it was not exposed in CQL's LWT.
This patch does not add new syntax to CQL - the "SET a = a + b"
and "SET a = a - b" syntax that already existed for counters is now
allowed for non-counters.
This is a new Scylla-only feature that does not exist in Cassandra.
Fixes #10568
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Previously, constants::setter evaluated its expression using only the query
options, which means expressions referencing row columns (column_value nodes)
would crash or return incorrect results.
Add evaluate_on_prefetched_row() to update_parameters: it evaluates an
expression in the context of the prefetched row for a given (pkey, ckey),
falling back to options-only evaluate() when no selection is available
(non-LWT context) or no column values are needed, and treating absent
columns needed by the expression as null.
Extend constants::setter to use this method:
- setter::execute() now calls evaluate_on_prefetched_row() or evaluate()
as needed.
- setter::requires_read() returns true when the expression contains a
column_value node, triggering a prefetch read.
- setter::requires_lwt() mirrors requires_read(), enforcing that column-
referencing arithmetic is only allowed inside a conditional (IF) statement.
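The decision logic in this patch can be sketched as follows. It is a simplified Python model, not the C++ implementation: expressions are nested tuples, `prefetched` is a dict keyed by (pkey, ckey), and `None` stands for "no selection available"; all of these representations are assumptions made for illustration.

```python
# Expression nodes (hypothetical encoding):
#   ("const", value), ("col", name), ("neg", e), ("add", l, r), ("sub", l, r)


def contains_column_value(expr) -> bool:
    """True if the expression references any row column."""
    kind = expr[0]
    if kind == "col":
        return True
    if kind == "const":
        return False
    return any(contains_column_value(sub) for sub in expr[1:])


# In this model, a column reference requires both a read and LWT.
requires_read = contains_column_value
requires_lwt = contains_column_value


def evaluate(expr, row=None):
    """Evaluate expr; row=None models options-only evaluation."""
    kind = expr[0]
    if kind == "const":
        return expr[1]
    if kind == "col":
        if row is None:
            raise ValueError("column reference needs a prefetched row")
        return row.get(expr[1])  # absent column is treated as null
    if kind == "neg":
        value = evaluate(expr[1], row)
        return None if value is None else -value
    left, right = evaluate(expr[1], row), evaluate(expr[2], row)
    if left is None or right is None:
        return None
    return left + right if kind == "add" else left - right


def evaluate_on_prefetched_row(expr, prefetched, key):
    """Use the prefetched row for key when needed, else options only."""
    if prefetched is None or not contains_column_value(expr):
        return evaluate(expr)
    return evaluate(expr, prefetched.get(key, {}))
```

For example, `("add", ("col", "r"), ("const", 1))` requires a read, while `("const", 5)` does not and is evaluated without touching the prefetched data.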
We'll use this new feature to implement "SET r = r + 1" and similar
expressions in the next patch.