Currently the suite generates config in old format, and only a single
test validates that using new format "works".
This change updates the suite (mainly the MinioServer::create_conf()
method) to generate endpoint confit in new format.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#28113
The datagram_channel::send() method that sends net::packet-s is
deprecated in favor of using span<temporary_buffer> one. Auditing code
still uses the former one -- it constructs a packet by using formatted
string by copying the string into the packet's fragment, then sends it.
This patch releases string into temporary_buffer and then passes
one-element span to send().
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#28198
do_query() is a coroutine but uses some continuations to take
advantage of exceptions being propagated via future::then() without
being thrown. We can accomplish the same thing with a nested coroutine
and coroutine::try_future(), simplifying the code.
While this area isn't performance intensive, we're not adding allocations.
The coroutine frame may add an allocation, but since read_page()
certainly does not return immediately, the following then() will allocate
as well. Since we eliminated that then(), the change is at least neutral
allocation-wise.
Closesscylladb/scylladb#28258
Consider the following scenario:
1. Let nodes A,B,C form a cluster with RF=3
2. Write query with CL=QUORUM is submitted and is acknowledged by
nodes B,C
3. Follow-up read query with CL=QUORUM is sent to verify the write
from the previous step
4. Coordinator sends data/digest requests to the nodes A,B. Since the
node A is missing data, digest mismatches and data reconciliation
is triggered
5. The node A or B fails, becomes unavailable, etc
6. During reconciliation, data requests are sent to node A,B and fail
failing the entire read query
When the above scenario happens, the tests using `start_writes()` fail
with the following stacktrace:
```
...
> await finish_writes()
test/cluster/test_tablets_migration.py:259:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test/pylib/util.py:241: in finish
await asyncio.gather(*tasks)
test/pylib/util.py:227: in do_writes
raise e
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
worker_id = 1
...
> rows = await cql.run_async(rd_stmt, [pk])
E cassandra.ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] message="Operation failed for test_1767777001181_bmsvk.test - received 1 responses and 1 failures from 2 CL=QUORUM." info={'consistency': 'QUORUM', 'required_responses': 2, 'received_responses': 1, 'failures': 1}
```
Note that when a node failure happens before/during a read query,
there is no test failure as the speculative retries are enabled
by default. Hence an additional data/digest read is sent to the third
remaining node.
However, the same speculative read is cancelled the moment, the read
query reaches CL which may trigger a read-repair.
This change:
- Retries the verification read in start_writes() on failure to mitigate
races between reads and node failures
- Adds additional logging to correlate Python exceptions with Scylla logs
Fixes https://github.com/scylladb/scylladb/issues/27478
Fixes https://github.com/scylladb/scylladb/issues/27974
Fixes https://github.com/scylladb/scylladb/issues/27494
Fixes https://github.com/scylladb/scylladb/issues/23529
Note that this change test flakiness observed during tablet transitions.
However, it serves as a workaround for a higher-level issue
https://github.com/scylladb/scylladb/issues/28125Closesscylladb/scylladb#28140
This patch adds vector index options allowing to enable quantization and oversampling.
Specific quantization value will be used internally by vector store.
In the current implementation, get_oversampling allows us to decide how many times more candidates
to retrieve from vector store - final response is still trimmed to the given limit.
It is a first step to allow rescoring - recalculation of similarity metric and re-ranking.
Without rescoring oversampling will be also further optimized to happen internally in vector store.
`test/vector_search/rescoring_test.cc` implements basic tests of added functionality.
New options are documented in `docs/cql/secondary-indexes.rst`.
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-82
Ref https://scylladb.atlassian.net/browse/SCYLLADB-83
New feature - no backporting
Closesscylladb/scylladb#27677
* github.com:scylladb/scylladb:
vector_search: doc: Document new index options
vector_search: test: Test oversampling
vector_search: test: Add rescoring index options test
vector_search: test: Extract Configure utility to shared header
vector_index: introduce `quantization` and `oversampling` options
Cassandra changed their system tables in 3.0. We migrated to the new system table layout in 2017, in ScyllaDB 2.0.
System tables introduced in Cassandra 3.0, as well as the 3.0 variant of pre-existing system tables were added to the db::system_table::v3 namespace.
We ended up adding some new ScyllaDB-only system tables to this namespace as well.
As the dust settled, most of the v3 system tables ended up being either simple aliases to non-v3 tables, or new tables.
Either way, the codebase uses just one variant of each table for a long time now the v3:: distinction is pointless.
Remove the v3 namespace and unify the table listing under the top-level db::system_keyspace scope.
Code cleanup, no backport
Closesscylladb/scylladb#28146
* github.com:scylladb/scylladb:
db/system_keyspace: move remining tables out of v3 keyspace
db/system_keyspace: relocate truncated() and commitlog_cleanups()
db/system_keyspace: drop v3::local()
db/system_keyspace: remove duplicate table names from v3
The loops in `ongoing_rf_change()` perform explicit yields, but they
also perform coroutine operations which can yield implicitly. The
explicit yields are redundant.
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
`effective_capacity` is a value used in size based load balancing. It contains the sum of available disk space of a node and all the tablet sizes.
This change adds this value to the virtual table `system.load_per_node`. This can be useful for debugging size based load balancing.
Size based load balancing is currently only on master, so no backport is needed.
Closesscylladb/scylladb#28220
* github.com:scylladb/scylladb:
docs: add effective_capacity to system keyspace docs
virtual_table: add effective_capacity to load_per_node
This test has to be adjusted in lock-step with scylladb.git, due to changes in https://github.com/scylladb/scylladb/pull/27836. It is simpler to just take the time and import it, so https://github.com/scylladb/scylladb/pull/27836 can patch all the affected tests, including this one.
All code is imported verbatim, then patched later, such that the series remains bisectable.
dtest import, no backport needed
Closesscylladb/scylladb#28085
* github.com:scylladb/scylladb:
test/cluster/dtest: remove is_win() and users
test/cluster/dtest/scrub_test.py: add license blurb
test/cluster/dtest: import scrub_test.py
test/cluster/dtest/ccmlib: scylla_node.py: adapt run_scylla_sstable() at al
test/cluster/dtest/ccmlib: scylla_node.py: import run_scylla_sstable()
The original scrub test was done by the Cassandra project, hence there
is two Licenses notices: one for the original work by Cassandra
(2015) and one for our modifications on top (2021).
And dependencies: get_sstables() and __gather_sstables().
Code is importend verbatim, but doesn't work yet (no users yet either).
Will be patched to work in the next commit.
The last remining tables in the v3 keyspace are those that are genuinely
distinct -- added by Cassandra 3.0 or >= ScyllaDB 2.0.
Move these out of the v3 keyspace too, with this the v3 keyspace is
defunct and removed.
The name variables of these tables is outside the v3 namespace but the
method defining their schema is in the v3 namespace. Relocate the
methods out from the v3 namespace, to the scope where the name variables
live.
The methods are moved to the private: part of system_keyspace, as they
don't have external users currently.
Those table names that are effectively just an alias of the their
counterpart outside of the v3 namespace (struct).
scylla_local() is made public. Currently it is private, but it has
external users, working around the private designation by using the
public v3::scylla_local() alias. This change just makes the existing
status clear.
This patch adds vector index options allowing to enable quantization and oversampling.
Specific quantization value will be used internally by vector store.
In the current implementation, `get_oversampling` allows us to decide how many times more candidates
to retrieve from vector store - final response is still trimmed to the given limit.
It is a first step to allow rescoring - recalculation of similarity metric and re-ranking.
Without rescoring oversampling will be also further optimized to happen internally in vector store.
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-82
Ref https://scylladb.atlassian.net/browse/SCYLLADB-83
Currently, we only know about long reclaims from lsa-timing stall
reports. Shorter reclaims can go under the radar.
Those metrics will help to asses increase in LSA activity, which
translates to higher CPU cost of a workload.
reclaim tracks memory which goes to the standard allocator, e.g. when
entering and allocating_section or in the background reclaimer.
evict/compact count activity towrads building LSA reserve, in
allocating_section entry, or naked LSA allocation.
Closesscylladb/scylladb#27774
The way that test.py runs test/cqlpy tests requires that tests end their
session with all keyspaces deleted. If we forget to delete a keyspace,
test.py suspects some test fails and reports a failure. As reported in
issue #26291, the test file test/cqlpy/test_describe.py caused this check
to trigger, so this file was added to the blacklist "dirties_cluster"
in suite.yaml to force test.py to ignore this problem.
I believe the cause of the problem was as follows: test_describe.py
didn't really leave any undeleted keyspace. Rather, test_describe.py had
one test which used "USE" and this broke DESC KEYSPACES (Refs #26334) -
which test.py used to see which keyspaces remained.
We solved this problem not just once, but twice:
1. In pull request #26345, I fixed the test not to use "USE" on the main
CQL session.
2. In pull request #27971, I fixed DESC KEYSPACES implementation so even
if "USE" was in effect, it will return the correct results.
I checked manually, and after removing test_describe.py from the
dirties_cluster blacklist, all cqlpy tests now pass, without
spurious failures in the test following test_describe.py. So it's time
to remove it from the blacklist.
Fixes#26291
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#27973
Commit d54d409 (audit: write out to both table and syslog) unified
create_audit and start_audit, which moved the audit service creation later
in the startup sequence. This broke startup when audit is enabled because
view_builder prepares CQL queries before start_audit runs, and
query preparation calls audit_instance().local_is_initialized()
which crashes on the non-existent sharded service.
Move start_audit to run before view_builder::start() and other components
that may prepare CQL queries during their initialization.
Fixes SCYLLADB-252
Closesscylladb/scylladb#28139
Our glossary is stuck in the past, still discussing token ownership in
terms of vnodes and cluster synchronization in terms of gossip.
This patch tries to improve this a bit, although much more work needs to
be done.
The term `Tablet` is added and the definition of `Token` and `Token
Range` is rephrased to be tablet inclusive.
The term `Cluster` is changed to mention raft as the synchronization
mechanism instead of gossip.
One oustanding problem is that our general architecture page describing
the ring acrhitecture is still Vnode only. We have a seprate Tablets
page, but the two don't link to each other and most documentation refer
only to the former. A casual reader might be able to spend a a lot of
time on our documentation page, without even seeing the word: tablets.
Closesscylladb/scylladb#28170
The loop that unwraps nested exception, rethrows nested exception and saves pointer to the temporary std::exception& inner on stack, then continues. This pointer is, thus, pointing to a released temporary
Closesscylladb/scylladb#28143
reader_permit::release_base_resources() is a soft evict for the permit:
it releases the resources aquired during admission. This is used in
cases where a single process owns multiple permits, creating a risk for
deadlock, like it is the case for repair. In this case,
release_base_resources() acts as a manual eviction mechanism to prevent
permits blockings each other from admission.
Recently we found a bad interaction between release_base_resources() and
permit eviction. Repair uses both mechanism: it marks its permits as
inactive and later it also uses release_base_resources(). This partice
might be worth reconsidering, but the fact remains that there is a bug
in the reader permit which causes the base resources to be released
twice when release_base_resources() is called on an already evicted
permit. This is incorrect and is fixed in this patch.
Improve release_base_resources():
* make _base_resources const
* move signal call into the if (_base_resources_consumed()) { }
* use reader_permit::impl::signal() instead of
reader_concurrency_semaphore::signal()
* all places where base resources are released now call
release_base_resources()
A reproducer unit test is added, which fails before and passes after the
fix.
Fixes: #28083Closesscylladb/scylladb#28155
This is a translation of Cassandra's CQL unit test source file
validation/operations/InsertUpdateIfConditionTest.java into our cqlpy
framework.
This test file checks various LWT conditional updates. After that
file became too big, the Cassandra developers split parts from it -
moving tests for LWT with collections, UDTs, and static columns to
separate test files - which I already translated (pull request #13663).
This patch translates the remaining, main, LWT tests.
Strangely, this test file also has, in the middle of the file, several
tests for conditional schema changes, like CREATE KEYSPACE IF NOT EXISTS,
a feature which has *nothing* to do with LWT so really didn't belong in
this file. But I translated those as well.
These new tests all pass on both ScyllaDB and Cassandra, and have not
uncovered any new bug.
However these tests do demonstrate yet again something that users and
developers of ScyllaDB's LWT must be aware of: Whereas usually
ScyllaDB's goal has been compatiblity with Cassandra's CQL, in LWT
this has *not* been the case: ScyllaDB deviated from Cassandra's
behavior in its LWT implementation in several places. These intentional
deviations were documented in docs/kb/lwt-differences.rst.
Accordingly, the tests here include almost a hundred (!) modificatons
(search for "if is_scylla") to allow the same test to pass on both
ScyllaDB and Cassandra, as well as many comments explaining the types
of differences we're seeing.
Although these deviations from Cassandra compatibility are known and
intentional, it's worth listing here the ones re-discovered by these
new tests:
1. On a successful conditional write, Cassandra returns just true, Scylla
also returns the old contents of the row.
2. Similarly, in an IF EXISTS write that failed (the row did not exist),
Cassandra returns just false, Scylla also returns extra null values for
each and every column of the row.
3. Cassandra allows in "IF v IN (?, ?)" to bind individual values to
UNSET_VALUE and skips them, Scylla treats this as an error. Refs #13659.
4. When there are static columns, Scylla's LWT response returns the static
column first, Cassandra returns the modified column first. Since both
also say which columns they return, neither is more correct than the other,
a normally users will address specific columns by name, not by position.
5. docs/kb/lwt-differences.rst explains that "the returned result set
contains an old row for every conditional statement in the batch".
Beyond this different, actually non-conditional updates in the batch will
also get a row in Scylla's result. Refs #27955.
6. For batch statement, ScyllaDB allows mixing `IF EXISTS`, `IF NOT EXISTS`,
and other conditions for the same row. Cassandra doesn't, so checks that
these combinations are not allowed were commented out.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#27961
This PR marks system_replicated_keys as a system keyspace.
It was missing when the keyspace was added.
A side effect of that is that metrics that are not supposed to be reported are.
Fixes#27903Closesscylladb/scylladb#27954
* github.com:scylladb/scylladb:
distributed_loader: system_replicated_keys as system keyspace
replicated_key_provider: make KSNAME public
In storage_service::raft_topology_cmd_handler we pass a lambda
wrapped in coroutine::lambda to a function that creates streaming_task_impl.
The lambda is kept in streaming_task_impl that invokes it in its run
method.
The lambda captures may be destroyed before the lambda is called, leading
to use after free.
Do not wrap a lambda passed to streaming_task_impl into coroutine::lambda.
Use this auto dissociate the lambda lifetime from the calling statement.
Fixes: https://github.com/scylladb/scylladb/issues/28200.
Closesscylladb/scylladb#28201
Currently, if a rf change request is paused, it immediately changes
the system_schema.keyspaces to use rack list for this keyspace.
If the request is aborted, the co-location might not be finished.
Hence, we can end up with inconsistent schema and tablet replica state.
Update the system_schema.keyspaces only after the co-location is done (and
not when it's started).
Fixes: https://github.com/scylladb/scylladb/issues/28167
No backport needed; changes that introduced a bug are only on master
Closesscylladb/scylladb#28168
* github.com:scylladb/scylladb:
service: fin indentation
test: add test_numeric_rf_to_rack_list_conversion_abort
service: tasks: fix type of global_topology_request_virtual_task
service: do not change the schema while pausing the rf change
Refs #27429
Re-implement the dtest with same name as a scylla pytest, using a python level network proxy instead of tcpdump etc.
Both to avoid sudo and also to ensure we don't race.
Juggles different listen_address and broadcast_address values to insert a proxy measuring RPC traffic.
Note: the measuring relies on python network IO not splitting data chunks, since we don't really have packet-level view of the connections.
Note that a scylla change is required to make the ip address magic work, otherwise topology mechanism gets
confused. This should maybe at some point be looked into more, since we should be more resilient against various services in scylla binding to different addresses.
When this test is merged, we can drop the flaky test from dtest. And hope no new flakiness comes from this one...
Closesscylladb/scylladb#28133
* github.com:scylladb/scylladb:
test/cluster/test_internode_compression: Transpose test from dtest
gossiper/main: Extend special treatment of node ID resolve for rpc_address
All the tests under test/cqlpy/cassandra_tests/ were translated from
Cassandra's unit tests originally written in Java into our own test
framework, and accordinly carry a clear mention of their origin and
original license.
However, we did modify these original tests - even if the modification
was slight and mostly straightforward. Therefore I was asked to also
mention our own copyright (and license) for these modifications.
So this patch adds to every file in test/cqlpy/cassandra_tests/ text like:
# Modifications: Copyright 2026-present ScyllaDB
# SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
with the appropriate year instead of 2026.
Fixes#28215
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#28216
Since Vector Store service filtering API has been implemented (scylladb/vector-store#334), there is a need for the implementation of Scylla side part.
This patch should implement a `statement_restrictions` parsing into Vector Store filtering API compatible JSON objects.
Those objects should be added to ANN query vector POST requests as `filter` object.
After this patch, the subset of all operations ([Vector Search Filtering Milestone 1](https://scylladb.atlassian.net/wiki/spaces/RND/pages/156729450/Vector+Search+Filtering+Design+Document#Milestone-1)) happy path should be completed, allowing users to filter on primary key columns with single column `=` and `IN` or multiple column `()=()` and `() IN ()`.
The restrictions for other operations should be implemented in a PR on Vector Store service side.
---
This PR implements parsing the `statement_restrictions` into Vector Store filtering API compatible JSON objects.
The JSON objects are created and used in ANN vector queries with filtering.
It closes the Scylla side implementation of Vector Search filtering milestone 1.
Unit tests for `statement_restrictions` parsing are added. Integration tests will be added on Vector Store service side PR.
---
Fixes: SCYLLADB-249
New feature, should land into 2026.1
Closesscylladb/scylladb#28109
* github.com:scylladb/scylladb:
docs: update documentation on filtering with vector queries
test/vector_search: add test for filtered ANN with VS mock
test/vector_search: add restriction to JSON conversion unit tests
vector_search: cql: construct and use filter in ANN vector queries
select_statement: do not require post query ordering for vector queries
vector_search: add `statement_restrictions` to JSON parsing
seastar dd46b6f..e00f1513
```
e00f1513 Merge 'net: Add DNS TTL to the net::hostent' from Ernest Zaslavsky
8a69e1f4 net: extract common implementation of inet_address::find_all
cb469fd1 net: deprecate the addr_list in hostent
1d59c0ca net: expose DNS TTL via net::hostent
3c6d919f http: add virtual close() to connection_factory
bbd0001a Revert "net: expose DNS TTL via net::hostent"
```
Closesscylladb/scylladb#28147
Workload prioritization was added in scylladb/scylladb#22031.
The functionality of updating service levels was implemented as
a lambda coroutine, leaving room for the lambda coroutine fiasco.
The problem was noticed and addressed in scylladb/scylladb#26404.
There are currently three functions that call switch_tenant:
- update_user_scheduling_group_v1 and update_user_scheduling_group_v2
use the deducing this (this auto self) to ensure the proper
lifecycle of the lambda capture.
- update_control_connection_scheduling_group doesn’t use the deducing
this, but the lambda captures only `this`, which is used before
the first possible coroutine preemption. Therefore, it doesn’t seem
that any memory corruption or undefined behavior is possible here.
Nevertheless, it seems better to start using the deducing this in
update_control_connection_scheduling_group as well, to avoid problems
in the future if someone modifies the code and forgets to add it.
Fixes: SCYLLADB-284
Closesscylladb/scylladb#28158
The `make_key` lambda erroneously allocates a fixed 8-byte buffer
(`sizeof(s.size())`) for variable-length strings, potentially causing
uninitialized bytes to be included. If such bytes exist and they are
not valid UTF-8 characters, deserialization fails:
```
ERROR 2026-01-16 08:18:26,062 [shard 0:main] testlog - snapshot_list_contains_dropped_tables: cql env callback failed, error: exceptions::invalid_request_exception (Exception while binding column p1: marshaling error: Validation failed - non-UTF8 character in a UTF8 string, at byte offset 7)
```
Fixes#28195.
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
Closesscylladb/scylladb#28197
Continuing the read once it is aborted (e.g. due to timeout) is a waste
of resources, as the produced results will be discarded.
Poll the permit's abort exception in the memtable and cache reader's
fill_buffer(). This results in one poll per buffer filled (8KB of data).
We already have similar poll for sstable readers, as disk reads are
usually much heavier and therefore it is more important to stop them
ASAP after abort. Cache and memtable reads are usually quick but not
always, hence it is important to also have polling in the cache and
memtable readers.
Refs: #11469Fixes: #28148Closesscylladb/scylladb#28149
0ede8d154b introduced the dev doc for size
based load balancing, but also added spelling errors.
This PR fixed these errors.
Closesscylladb/scylladb#28196
Currently, the type of global_topology_request_virtual_task isn't
taken out of std::variant before printing, which results with
a task of type variant(actual_type).
Retrieve the type from the variant before passing it to task type.
Currently, if a rf change request is paused, it immediately changes
the system_schema.keyspaces to use rack list for this keyspace.
If the request is aborted, the co-location might not be finished.
Hence, we can end up with inconsistent schema and tablet replica state.
Update the system_schema.keyspaces only after the co-location is done (and
not when it's started).
We switched to the size-based load balancing, which now has more
strict requirements for load stats. We no longer need only per-node
stats, but also per-tablet stats.
Bootstrapping a node triggers stats refresh, but allocating tablets on
table creation didn't. So after creating a table, load balancer
couldn't make progress for up to 60s (stats refresh period).
This makes tests take longer, and can even cause failures if tests are
using a low-enough timeout.
Fixes https://github.com/scylladb/scylladb/issues/27921
No backport becuse only master is vulnerable (size-based load balancing).
Closesscylladb/scylladb#27926
* https://github.com/scylladb/scylladb:
test: cluster: Add reproducer for missed notification in topology coordinator
topology_coordinator: Wake up the state machine after stats refresh
topology_coordinator: Move tablet_load_stats_refresh_before_rebalancing injection earlier
topology_coordinator: Fix potential missed notification
topology_coordinator: Refresh load stats after table is created or altered
tablets: Do a group0 read barrier on tablet load stats refresh
topology_coordinator: Ensure stats are refreshed in the gossip scheduling group
test: Use ManagerClient.{disable,enable}_tablet_balancing()
test: Add missing calls to disable_tablet_balancing() in tests which use move_tablet() API
test: pylib: Introduce ManagerClient.{disable,enable}_tablet_balancing()
Add a description of available filtering options with ANN vector queries.
Provide an example of such query and a reference to `WHERE` clause restrictions.