Add a basic test that creates a strongly consistent keyspace and table,
writes some data, and verifies that the same data can be read back.
Since Scylla-side request proxying is not yet implemented, writes are
handled only on the leader node. The test uses the existing
`/raft/leader_host` REST endpoint to determine the leader of the tablets
Raft group.
Add the `coordinator` class, which will be responsible for coordinating
reads and writes to strongly consistent tables. This commit includes
only the boilerplate; the methods will be implemented in separate
commits.
This patch changes the layout of user-facing scheduling groups from
/
`- statement
`- sl:default
`- sl:*
`- other groups (compaction, streaming, etc.)
into
/
`- user (supergroup)
`- statement
`- sl:default
`- sl:*
`- other groups (compaction, streaming, etc.)
The new supergroup has 1000 static shares and is name-less, in a sense
that it only have a variable in the code to refer to and is not exported
via metrics (should be fixed in seastar if we want to).
The moved groups don't change their names or shares, only move inside
the scheduling hierarchy.
The goal of the change is to improve resource consumption of sl:*
groups. Right now activities in low-shares service levels are scheduled
on-par with e.g. streaming activity, which is considered to be low-prio
one. By moving all sl:* groups into their own supergroup with 1000
shares changes the meaning of sl:* shares. From now on these shares
values describe preirities of service level between each-other, and the
user activities compete with the rest of the system with 1000 shares,
regardless of how many service levels are there.
Unit tests keep their user groups under root supergroup (for simplicity)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#28235
backup and restore tests. This made the testing times explode
with both cluster/object_store/test_backup.py and
cluster/test_refresh.py taking more than an hour each to complete
under test.py and around 14min under pytest directly.
This was painful especially in CI because it runs tests under test.py which
suffers from the issue of not being able to run test cases from within
the same file in parallel (a fix is attempted in 27618).
This patch reduces the dataset of these tests to the minimum and
gets rid of one of the tested topology as it was redundant.
The test times are reduced to 2min under pytest and 14 mins under
test.py.
Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
Closesscylladb/scylladb#28280
This series introduces `rescoring` index option.
There is no rescoring algorithm implementation yet.
This series prepares it by:
- adding new index option
- adding documentation
- adding tests for option handling
- adding tests for rescoring implementation - at this point they report errors and are marked that this is expected, because rescoring is not implemented.
Follow-up https://github.com/scylladb/scylladb/pull/27677
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-293
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-294
No backporting - it is a new feature.
Closesscylladb/scylladb#28165
* github.com:scylladb/scylladb:
vector_search: Add more rescoring validation tests
vector_search: Add rescoring validation test
vector_search: doc: Document new index option
vector_search: test: Add `rescoring` index option test
vector_index: introduce rescoring option
vector_index: improve options validation
This reverts commit c8cff94a5a.
Re-enabling incremental repair on master with "Aborting on shard 0 during
scaleout + repair #26041" and "Failure to attach sstables in streaming consumer
leaves sealed sstables on disk #27414" fixed.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#28120
Since #28109 was merged, those tests started to pass as we allow
the filtering on primary key columns within ANN vector queries.
Closesscylladb/scylladb#28231
Adding tests for specific cases of rescoring processing:
- wildcard selection - "SELECT * ..." is a case with slightly different path of rescoring processing. We want to confirm that it is handled correctly.
- calculating similarity with other vectors in SELECT clause should not influence ANN ordering.
- NULL handling - results that for any reason have NULL in a score should be filtered out.
As rescoring is not implemented yet, the tests use boost::unit_test::expected_failures
to indicate that the test reports errors.
Verify that vector store results will be correctly rescored and reordered
according to the rescoring algorithm.
As rescoring is not implemented yet, the tests use `boost::unit_test::expected_failures`
to indicate that they report errors.
First test checks rescoring with a simple selection list.
Second makes sure that rescoring is not triggered for quantization=f32 - full representation of vectors.
Third repeats the first one, but adds to it returning of similarity score value.
Add `prepared_filter` class which handles the preparation, construction
and caching of Vector Search filtering compatible JSON object.
If no bind markers found in SELECT statement, the JSON object will be built
once at prepare time and cached for use during execution calls.
Adjust tests accordingly to use prepared filters.
Follow-up: #28109
Fixes: SCYLLADB-299
During rewrite --extra-scylla-cmdline-options was missed and it was not
passed to the tests that are using pytest. The issue that there were no
possibility to pass these parameters via cmd to the Scylla, while tests
were not affected because they were using the parameters from the yaml
file. This PR fixes this issue so it will be easier to modify the Scylla
start parameters without modifying code.
Move Vector Search filter functions from `cql3::restrictions` to
`vector_search` namespace as it's a better place according to
it's purpose.
The effective name has now changed from `cql3::restrictions::to_json`
to `vector_search::to_json` which clearly mentions that the JSON
object will be used for Vector Search.
Rename the auxilary functions to use `to_json` suffix instead of
variety of verbs as those functions logic focuses on building JSON
object from different structures. The late naming emphasized too
much distinction between those functions, while they do pretty much
the same thing.
Follow-up: #28109
The API contract in partition_version.hh states that when dealing with
evictable entries, a real cache tracker pointer has to be passed to all
methods that ask for it. The nonpopulating reader violates this, passing
a nullptr to the snapshot. This was observed to cause a crash when a
concurrent cache read accessed the snapshot with the null tracker.
A reproducer is included which fails before and passes after the fix.
Fixes: #26847Closesscylladb/scylladb#28163
Currently the suite generates config in old format, and only a single
test validates that using new format "works".
This change updates the suite (mainly the MinioServer::create_conf()
method) to generate endpoint confit in new format.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#28113
Consider the following scenario:
1. Let nodes A,B,C form a cluster with RF=3
2. Write query with CL=QUORUM is submitted and is acknowledged by
nodes B,C
3. Follow-up read query with CL=QUORUM is sent to verify the write
from the previous step
4. Coordinator sends data/digest requests to the nodes A,B. Since the
node A is missing data, digest mismatches and data reconciliation
is triggered
5. The node A or B fails, becomes unavailable, etc
6. During reconciliation, data requests are sent to node A,B and fail
failing the entire read query
When the above scenario happens, the tests using `start_writes()` fail
with the following stacktrace:
```
...
> await finish_writes()
test/cluster/test_tablets_migration.py:259:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test/pylib/util.py:241: in finish
await asyncio.gather(*tasks)
test/pylib/util.py:227: in do_writes
raise e
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
worker_id = 1
...
> rows = await cql.run_async(rd_stmt, [pk])
E cassandra.ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] message="Operation failed for test_1767777001181_bmsvk.test - received 1 responses and 1 failures from 2 CL=QUORUM." info={'consistency': 'QUORUM', 'required_responses': 2, 'received_responses': 1, 'failures': 1}
```
Note that when a node failure happens before/during a read query,
there is no test failure as the speculative retries are enabled
by default. Hence an additional data/digest read is sent to the third
remaining node.
However, the same speculative read is cancelled the moment, the read
query reaches CL which may trigger a read-repair.
This change:
- Retries the verification read in start_writes() on failure to mitigate
races between reads and node failures
- Adds additional logging to correlate Python exceptions with Scylla logs
Fixes https://github.com/scylladb/scylladb/issues/27478
Fixes https://github.com/scylladb/scylladb/issues/27974
Fixes https://github.com/scylladb/scylladb/issues/27494
Fixes https://github.com/scylladb/scylladb/issues/23529
Note that this change test flakiness observed during tablet transitions.
However, it serves as a workaround for a higher-level issue
https://github.com/scylladb/scylladb/issues/28125Closesscylladb/scylladb#28140
This patch adds vector index options allowing to enable quantization and oversampling.
Specific quantization value will be used internally by vector store.
In the current implementation, get_oversampling allows us to decide how many times more candidates
to retrieve from vector store - final response is still trimmed to the given limit.
It is a first step to allow rescoring - recalculation of similarity metric and re-ranking.
Without rescoring oversampling will be also further optimized to happen internally in vector store.
`test/vector_search/rescoring_test.cc` implements basic tests of added functionality.
New options are documented in `docs/cql/secondary-indexes.rst`.
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-82
Ref https://scylladb.atlassian.net/browse/SCYLLADB-83
New feature - no backporting
Closesscylladb/scylladb#27677
* github.com:scylladb/scylladb:
vector_search: doc: Document new index options
vector_search: test: Test oversampling
vector_search: test: Add rescoring index options test
vector_search: test: Extract Configure utility to shared header
vector_index: introduce `quantization` and `oversampling` options
Cassandra changed their system tables in 3.0. We migrated to the new system table layout in 2017, in ScyllaDB 2.0.
System tables introduced in Cassandra 3.0, as well as the 3.0 variant of pre-existing system tables were added to the db::system_table::v3 namespace.
We ended up adding some new ScyllaDB-only system tables to this namespace as well.
As the dust settled, most of the v3 system tables ended up being either simple aliases to non-v3 tables, or new tables.
Either way, the codebase uses just one variant of each table for a long time now the v3:: distinction is pointless.
Remove the v3 namespace and unify the table listing under the top-level db::system_keyspace scope.
Code cleanup, no backport
Closesscylladb/scylladb#28146
* github.com:scylladb/scylladb:
db/system_keyspace: move remining tables out of v3 keyspace
db/system_keyspace: relocate truncated() and commitlog_cleanups()
db/system_keyspace: drop v3::local()
db/system_keyspace: remove duplicate table names from v3
Before this commit, if a test file or a test suite didn't include
any actual test cases, it was ignored by `boost_test_tree_lister`.
However, this information is useful; for example, it allows us to tell
if the test file the user wants to run doesn't exist or simply doesn't
contain any tests. The kind of error we would return to them should be
different depending on which situation we're dealing with.
We start including those empty suites and files in the output of
`--list_json_content`.
---
Examples (with additional formatting):
* Consider the following test file, `test/boost/dummy_test.cc` [1]:
```
BOOST_AUTO_TEST_SUITE(dummy_suite1)
BOOST_AUTO_TEST_SUITE(dummy_suite2)
BOOST_AUTO_TEST_SUITE_END()
BOOST_AUTO_TEST_SUITE_END()
BOOST_AUTO_TEST_SUITE(dummy_suite3)
BOOST_AUTO_TEST_SUITE_END()
```
Before this commit:
```
$ ./build/debug/test/boost/dummy_test -- --list_json_content
[{"file": "test/boost/dummy_test.cc", "content": {"suites": [], "tests": []}}]
```
After this commit:
```
$ ./build/debug/test/boost/dummy_test -- --list_json_content
[{"file":"test/boost/dummy_test.cc", "content": {"suites": [
{"name": "dummy_suite1", "suites": [
{"name": "dummy_suite2", "suites": [], "tests": []}
], "tests": []},
{"name": "dummy_suite3", "suites": [], "tests": []}
], "tests": []}}]
```
* Consider the same test file as in Example 1, but also assume it's compiled
into `test/boost/combined_tests`.
Before this commit:
```
$ ./build/debug/test/boost/combined_tests -- --list_json_content | grep dummy
$
```
After this commit:
```
$ ./build/debug/test/boost/combined_tests -- --list_json_content
[..., {"file": "test/boost/dummy_test.cc", "content": {"suites": [
{"name": "dummy_suite1", "suites":
[{"name": "dummy_suite2", "suites": [], "tests": []}],
"tests": []},
{"name": "dummy_suite3", "suites": [], "tests": []}],
"tests":[]}}, ...]
```
[1] Note that the example is simplified. As of now, it's not possible to use
`--list_json_content` with a file without any Boost tests. That will
result in the following error: `Test setup error: test tree is empty`.
Refs scylladb/scylladb#25415
In scylladb/scylladb@afde5f668a, we
implemented custom collection of information about Boost tests
in the repository. The solution boiled down to traversing through
the test tree via callbacks provided by Boost.Test and calling that
code from a global fixture. This way, the code is called automatically
by the framework.
Unfortunately, for an unknown reason, this leads to labels of test units
being duplicated. We haven't found the root cause yet and so we
deduplicate the labels manually.
---
Example (with additional formatting):
Consider the following test in the file `test/boost/dummy_test.cc`:
```
SEASTAR_TEST_CASE(dummy_case, *boost::unit_test::label("mylabel1")) {
return make_ready_future();
}
```
Before this commit:
```
$ ./build/dev/test/boost/dummy_test -- --list_json_content
[{"file": "test/boost/dummy_test.cc", "content": {"suites": [],
"tests": [{"name": "dummy_case", "labels": "mylabel1,mylabel1"}]}
}]
```
After this commit:
```
$ ./build/dev/test/boost/dummy_test -- --list_json_content
[{"file": "test/boost/dummy_test.cc", "content": {"suites": [],
"tests": [{"name": "dummy_case", "labels": "mylabel1"}]}
}]
```
Refs scylladb/scylladb#25415
This test has to be adjusted in lock-step with scylladb.git, due to changes in https://github.com/scylladb/scylladb/pull/27836. It is simpler to just take the time and import it, so https://github.com/scylladb/scylladb/pull/27836 can patch all the affected tests, including this one.
All code is imported verbatim, then patched later, such that the series remains bisectable.
dtest import, no backport needed
Closesscylladb/scylladb#28085
* github.com:scylladb/scylladb:
test/cluster/dtest: remove is_win() and users
test/cluster/dtest/scrub_test.py: add license blurb
test/cluster/dtest: import scrub_test.py
test/cluster/dtest/ccmlib: scylla_node.py: adapt run_scylla_sstable() at al
test/cluster/dtest/ccmlib: scylla_node.py: import run_scylla_sstable()
The original scrub test was done by the Cassandra project, hence there
is two Licenses notices: one for the original work by Cassandra
(2015) and one for our modifications on top (2021).
And dependencies: get_sstables() and __gather_sstables().
Code is importend verbatim, but doesn't work yet (no users yet either).
Will be patched to work in the next commit.
The last remining tables in the v3 keyspace are those that are genuinely
distinct -- added by Cassandra 3.0 or >= ScyllaDB 2.0.
Move these out of the v3 keyspace too, with this the v3 keyspace is
defunct and removed.
The way that test.py runs test/cqlpy tests requires that tests end their
session with all keyspaces deleted. If we forget to delete a keyspace,
test.py suspects some test fails and reports a failure. As reported in
issue #26291, the test file test/cqlpy/test_describe.py caused this check
to trigger, so this file was added to the blacklist "dirties_cluster"
in suite.yaml to force test.py to ignore this problem.
I believe the cause of the problem was as follows: test_describe.py
didn't really leave any undeleted keyspace. Rather, test_describe.py had
one test which used "USE" and this broke DESC KEYSPACES (Refs #26334) -
which test.py used to see which keyspaces remained.
We solved this problem not just once, but twice:
1. In pull request #26345, I fixed the test not to use "USE" on the main
CQL session.
2. In pull request #27971, I fixed DESC KEYSPACES implementation so even
if "USE" was in effect, it will return the correct results.
I checked manually, and after removing test_describe.py from the
dirties_cluster blacklist, all cqlpy tests now pass, without
spurious failures in the test following test_describe.py. So it's time
to remove it from the blacklist.
Fixes#26291
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#27973
The loop that unwraps nested exception, rethrows nested exception and saves pointer to the temporary std::exception& inner on stack, then continues. This pointer is, thus, pointing to a released temporary
Closesscylladb/scylladb#28143
reader_permit::release_base_resources() is a soft evict for the permit:
it releases the resources aquired during admission. This is used in
cases where a single process owns multiple permits, creating a risk for
deadlock, like it is the case for repair. In this case,
release_base_resources() acts as a manual eviction mechanism to prevent
permits blockings each other from admission.
Recently we found a bad interaction between release_base_resources() and
permit eviction. Repair uses both mechanism: it marks its permits as
inactive and later it also uses release_base_resources(). This partice
might be worth reconsidering, but the fact remains that there is a bug
in the reader permit which causes the base resources to be released
twice when release_base_resources() is called on an already evicted
permit. This is incorrect and is fixed in this patch.
Improve release_base_resources():
* make _base_resources const
* move signal call into the if (_base_resources_consumed()) { }
* use reader_permit::impl::signal() instead of
reader_concurrency_semaphore::signal()
* all places where base resources are released now call
release_base_resources()
A reproducer unit test is added, which fails before and passes after the
fix.
Fixes: #28083Closesscylladb/scylladb#28155
This is a translation of Cassandra's CQL unit test source file
validation/operations/InsertUpdateIfConditionTest.java into our cqlpy
framework.
This test file checks various LWT conditional updates. After that
file became too big, the Cassandra developers split parts from it -
moving tests for LWT with collections, UDTs, and static columns to
separate test files - which I already translated (pull request #13663).
This patch translates the remaining, main, LWT tests.
Strangely, this test file also has, in the middle of the file, several
tests for conditional schema changes, like CREATE KEYSPACE IF NOT EXISTS,
a feature which has *nothing* to do with LWT so really didn't belong in
this file. But I translated those as well.
These new tests all pass on both ScyllaDB and Cassandra, and have not
uncovered any new bug.
However these tests do demonstrate yet again something that users and
developers of ScyllaDB's LWT must be aware of: Whereas usually
ScyllaDB's goal has been compatiblity with Cassandra's CQL, in LWT
this has *not* been the case: ScyllaDB deviated from Cassandra's
behavior in its LWT implementation in several places. These intentional
deviations were documented in docs/kb/lwt-differences.rst.
Accordingly, the tests here include almost a hundred (!) modificatons
(search for "if is_scylla") to allow the same test to pass on both
ScyllaDB and Cassandra, as well as many comments explaining the types
of differences we're seeing.
Although these deviations from Cassandra compatibility are known and
intentional, it's worth listing here the ones re-discovered by these
new tests:
1. On a successful conditional write, Cassandra returns just true, Scylla
also returns the old contents of the row.
2. Similarly, in an IF EXISTS write that failed (the row did not exist),
Cassandra returns just false, Scylla also returns extra null values for
each and every column of the row.
3. Cassandra allows in "IF v IN (?, ?)" to bind individual values to
UNSET_VALUE and skips them, Scylla treats this as an error. Refs #13659.
4. When there are static columns, Scylla's LWT response returns the static
column first, Cassandra returns the modified column first. Since both
also say which columns they return, neither is more correct than the other,
a normally users will address specific columns by name, not by position.
5. docs/kb/lwt-differences.rst explains that "the returned result set
contains an old row for every conditional statement in the batch".
Beyond this different, actually non-conditional updates in the batch will
also get a row in Scylla's result. Refs #27955.
6. For batch statement, ScyllaDB allows mixing `IF EXISTS`, `IF NOT EXISTS`,
and other conditions for the same row. Cassandra doesn't, so checks that
these combinations are not allowed were commented out.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#27961
Currently, if a rf change request is paused, it immediately changes
the system_schema.keyspaces to use rack list for this keyspace.
If the request is aborted, the co-location might not be finished.
Hence, we can end up with inconsistent schema and tablet replica state.
Update the system_schema.keyspaces only after the co-location is done (and
not when it's started).
Fixes: https://github.com/scylladb/scylladb/issues/28167
No backport needed; changes that introduced a bug are only on master
Closesscylladb/scylladb#28168
* github.com:scylladb/scylladb:
service: fin indentation
test: add test_numeric_rf_to_rack_list_conversion_abort
service: tasks: fix type of global_topology_request_virtual_task
service: do not change the schema while pausing the rf change
Refs #27429
Re-implement the dtest with same name as a scylla pytest, using a python level network proxy instead of tcpdump etc.
Both to avoid sudo and also to ensure we don't race.
Juggles different listen_address and broadcast_address values to insert a proxy measuring RPC traffic.
Note: the measuring relies on python network IO not splitting data chunks, since we don't really have packet-level view of the connections.
Note that a scylla change is required to make the ip address magic work, otherwise topology mechanism gets
confused. This should maybe at some point be looked into more, since we should be more resilient against various services in scylla binding to different addresses.
When this test is merged, we can drop the flaky test from dtest. And hope no new flakiness comes from this one...
Closesscylladb/scylladb#28133
* github.com:scylladb/scylladb:
test/cluster/test_internode_compression: Transpose test from dtest
gossiper/main: Extend special treatment of node ID resolve for rpc_address
All the tests under test/cqlpy/cassandra_tests/ were translated from
Cassandra's unit tests originally written in Java into our own test
framework, and accordinly carry a clear mention of their origin and
original license.
However, we did modify these original tests - even if the modification
was slight and mostly straightforward. Therefore I was asked to also
mention our own copyright (and license) for these modifications.
So this patch adds to every file in test/cqlpy/cassandra_tests/ text like:
# Modifications: Copyright 2026-present ScyllaDB
# SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
with the appropriate year instead of 2026.
Fixes#28215
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#28216
Many tests want to assume that group0 leader runs on a particualr
server, typically the first server in the list.
And they cannot be easily made to work with arbitrary leader, becuase
they setup a particular topology and then stop particular nodes, and
want to assume the leader is stable. They open leader's log and
expect things to appear in that log.
It's much easier to ensure the leader, than to prepare tests to
handle failovers.
In case of decommission, it's not desirable because it's less urgent.
In case of removenode, it leads to failure of removenode operation
because scheduled co-locating migration will fail if the destination
is on the excluded node, and this failure will be interpreted as drain
failure and coordinator will cancel the request.
Not a problem before "parallel decommission" because this failure is
only a streaming failure, not a barrier failure, so exception doesn't
escape into the catch clause in transition stage handler, and the
migration is simply rolled back. Once draining happens in the tablet
migration track, streaming failure will be interpreted as drain
failure and cancel the request.
Reaching critical disk utilization on destination means the draining
either caused it, or at least works against reliveing it. So it's
better to cancel those requests. In case of decommission, if critical
disk utilization was caused by it due to not enough capacity, aborting
decomission will bring capacity back to the system and rebalancing
will relieve critical disk utlization.
We want to be able to cancel decommission when it's still in the
tablet draining phase. Such a request is in a pending and paused
state, and can be safely canceled. We set the node's "draining" flag
back to false.