This patch series introduces a new documentation for exiting guardrails.
Moreover:
- Warning / failure messages of recently added write CL guardrails (SCYLLADB-259) are rephrased, so all guardrails have similar messages.
- Some new tests are added, to help verify the correctness of the documentation and avoid situations where the documentation and implementation diverge.
Fixes: [SCYLLADB-257](https://scylladb.atlassian.net/browse/SCYLLADB-257)
No backport, just new docs and tests.
[SCYLLADB-257]: https://scylladb.atlassian.net/browse/SCYLLADB-257?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQClosesscylladb/scylladb#29011
* github.com:scylladb/scylladb:
test: add new guardrail tests matching documentation scenarios
test: add metric assertions to guardrail replication strategy tests
test: use regex matching in guardrail replication strategy tests
test: extract ks_opts helper in test_guardrail_replication_strategy
docs: document CQL guardrails
cql: improve write consistency level guardrail messages
Document that `SELECT ... WHERE` clause currently accepts only conjunctions
of relations joined by `AND` (`OR` is not supported), and that
parentheses cannot be used to group boolean subexpressions.
Add an unsupported query example and point readers to equivalent `IN`
rewrites when applicable.
This problem has been raised by one of our users in
https://forum.scylladb.com/t/error-parsing-query-or-unsupported-statement/5299,
and while one could infer answer to user's question by looking at the
syntax of the `SELECT ... WHERE`, it's not immediately obvious to
non-advanced users, so clarifying these concepts is justified.
Fixes: SCYLLADB-1116
Closesscylladb/scylladb#29100
Remove outdated references to filtering on columns provided in the
index definition, and remove the note about equal relations (= and IN)
being the only supported operations. Vector search filtering currently
supports WHERE clauses on primary key columns only.
Closesscylladb/scylladb#28949
Currently, repair-mode tombstone-gc cannot be used on tables with RF=1. We want to make repair-mode the default for all tablet tables (and more, see https://github.com/scylladb/scylladb/issues/22814), but currently a keyspace created with RF=1 and later altered to RF>1 will end up using timeout-mode tombstone gc. This is because the repair-mode tombstone-gc code relies on repair history to determine the gc-before time for keys/ranges. RF=1 tables cannot run repairs so they will have empty repair history and consequently won't be able to purge tombstones.
This PR solves this by keeping a registry of RF=1 tables and consulting this registry when creating `tombstone_gc_state` objects. If the table is RF=1, tombstone-gc will work as if the table used immediate-mode tombstone-gc. The registry is updated on each replication update. As soon as the table is not RF=1 anymore, the tombstone-gc reverts to the natural repair-mode behaviour.
After this PR, tombstone-gc defaults to repair-mode for all tables, regardless of RF and tablets/vnodes.
Fixes: SCYLLADB-106.
New feature, no backport required.
Closesscylladb/scylladb#22945
* github.com:scylladb/scylladb:
test/{boost,cluster}: add test for tombstone gc mode=repair with RF=1
tombstone_gc: allow use of repair-mode for RF=1 tables
replica/table: update rf=1 table registry in shared tombstone-gc state
tombstone_gc: tombstone_gc_before_getter: consider RF when getting gc before time
tombstone_gc: unpack per_table_history_maps
tombstone_gc: extract _group0_gc_time from per_table_history_map
tombstone_gc: drop tombstone_gc_state(nullptr) ctor and operator bool()
test/lib/random_schema: use timeout-mode tombstone_gc
tombstone_gc_options: add C++ friendly constructor
test: move away from tombstone_gc_state(nullptr) ctor
treewide: move away from tombstone_gc_state(nullptr) ctor
sstable: move away from tombstone_gc_mode::operator bool()
replica/table: add get_tombstone_gc_state()
compaction: use tombstone_gc_state with value semantics
db/row_cache: use tombstone_gc_state with value semantics
tombstone_gc: introduce tombstone_gc_state::for_tests()
Modify the methods which calculate the default gc mode as well as that
which validates whether repair-mode can be used at all, so both accepts
use of repair-mode on RF=1 tables.
This de-facto changes the default tombstone-gc to repair-mode for all
tables. Documentation is updated accordingly.
Some tests need adjusting:
* cqlpy/test_select_from_mutation_fragments.py: disable GC for some test
cases because this patch makes tombstones they write subject to GC
when using defaults.
* test/cluster/test_mv.py::test_mv_tombstone_gc_not_inherited used
repair-mode as a non-default for the base table and expected the MV to
revert to default. Another mode has to be used as the non-default
(immediate).
* test/cqlpy/test_tools.py::test_scylla_sstable_dump_schema: don't
compare tombstone_gc schema extension when comparing dumped schema vs.
original. The tool's schema loader doesn't have access to the keyspace
definition so it will come up with different defaults for
tombstone-gc.
* test/boost/row_cache_test.cc::test_populating_cache_with_expired_and_nonexpired_tombstones
sets tombstone expiry assuming the tombstone-gc timeout-mode default.
Change the CREATE TABLE statement to set the expected mode.
Introduced a new max_tablet_count tablet option that caps the maximum number of tablets a table can have. This feature is designed primarily for backup and restore workflows.
During backup, when load balancing is disabled for snapshot consistency, the current tablet count is recorded in the backup manifest.
During restore, max_tablet_count is set to this recorded value, ensuring the restored table's tablet count never exceeds the original snapshot's tablet distribution.
This guarantee enables efficient file-based SSTable streaming during restore, as each SSTable remains fully contained within a single tablet boundary.
Closesscylladb/scylladb#28450
Add user-facing documentation for the new CQL per-row TTL feature,
in docs/cql/cql-extensions.md.
Also mention (and link) the new alternative TTL feature in a few
relevant documents about the old (per-write) TTL, about CDC,
and about the CREATE TABLE and ALTER TABLE commands.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Add support for literals in the SELECT clause. This allows
SELECT fn(column, 4) or SELECT fn(column, ?).
Note, "SELECT 7 FROM tab" becomes valid in the grammar, but is still
not accepted because of failed type inference - we cannot infer the
type of 7, and don't have a favored type for literals (like C favors
int). We might relax this later.
In the WHERE clause, and Cassandra in the SELECT clause, type hints
can also resolve type ambiguity: (bigint)7 or (text)?. But this is
deferred to a later patch.
A few changes to the grammar are needed on top of adding a `value`
alternative to `unaliasedSelector`:
- vectorSimilarityArg gained access to `value` via `unaliasedSelector`,
so it loses that alternate to avoid ambiguity. We may drop
`vectorSimilarityArg` later.
- COUNT(1) became ambiguous via the general function path (since
function arguments can now be literals), so we remove this case
from the COUNT special cases, remaining with count(*).
- SELECT JSON and SELECT DISTINCT became "ambiguous enough" for
ANTLR to complain, though as far as I can tell `value` does not
add real ambiguity. The solution is to commit early (via "=>") to
a parsing path.
Due to the loss of count(1) recognition in the parser, we have to
special-case it in prepare. We may relax it to count any expression
later, like modern Cassandra and SQL.
Testing is awkward because of the type inference problem in top-level.
We test via the set_intersection() function and via lua functions.
Example:
```
cqlsh> CREATE FUNCTION ks.sum(a int, b int) RETURNS NULL ON NULL INPUT RETURNS int LANGUAGE LUA AS 'return a + b';
cqlsh> SELECT ks.sum(1, 2) FROM system.local;
ks.sum(1, 2)
--------------
3
(1 rows)
cqlsh>
```
(There are no suitable system functions!)
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-296Closesscylladb/scylladb#28256
Vector Search feature needs to support creating vector indexes with additional
filtering column. There will be two types of indexes: global which indexes
vectors per table, and local which indexes vectors per partition key. The new
syntaxes are based on ScyllaDB's Global Secondary Index and Local Secondary
Index. Vector indexes don't use secondary indexes functionalities in any way -
all indexing, filtering and processing data will be done on Vector Store side.
This patch allows creating vector indexes using this CQL syntax:
```
CREATE TABLE IF NOT EXISTS cycling.comments_vs (
commenter text,
comment text,
comment_vector VECTOR <FLOAT, 5>,
created_at timestamp,
discussion_board_id int,
country text,
lang text,
PRIMARY KEY ((commenter, discussion_board_id), created_at)
);
CREATE CUSTOM INDEX IF NOT EXISTS global_ann_index
ON cycling.comments_vs(comment_vector, country, lang) USING 'vector_index'
WITH OPTIONS = { 'similarity_function': 'DOT_PRODUCT' };
CREATE CUSTOM INDEX IF NOT EXISTS local_ann_index
ON cycling.comments_vs((commenter, discussion_board_id), comment_vector, country, lang)
USING 'vector_index'
WITH OPTIONS = { 'similarity_function': 'DOT_PRODUCT' };
```
Currently, if we run these queries to create indexes we will receive such errors:
```
InvalidRequest: Error from server: code=2200 [Invalid query] message="Vector index can only be created on a single column"
InvalidRequest: Error from server: code=2200 [Invalid query] message="Local index definition must contain full partition key only. Redundant column: XYZ"
```
This commit refactors `vector_index::check_target` to correctly validate
columns building the index. Vector-store currently support filtering by native
types, so the type of columns is checked. The first column from the list must
be a vector (to build index based on these vectors), so it is also checked.
Allowed types for columns are native types without counter (it is not possible
to create a table with counter and vector) and without duration (it is not
possible to correctly compare durations, this type is even not allowed in
secondary indexes).
This commits adds cqlpy test to check errors while creating indexes.
Fixes: SCYLLADB-298
This needs to be backported to version 2026.1 as this is a fix for filtering support.
Closesscylladb/scylladb#28366
This parameter was not mentioned at all anywhere in the documentation.
Add an explanation of this parameter: why we need it, what is the
default and how it can be changed.
Closesscylladb/scylladb#28132
This patch adds links to the Vector Search documentation that is hosted
together with Scylla Cloud docs to the CQL documentation.
It also make the note about supported capabilities consistent and
removes the experimental label as the feature is GAed.
Fixes: SCYLLADB-371
Closesscylladb/scylladb#28312
Add enforce_rack_list option. When the option is set to true,
all tablet keyspaces have rack list replication factor.
When the option is on:
- CREATE STATEMENT always auto-extends rf to rack lists;
- ALTER STATEMENT fails when there is numeric rf in any DC.
The flag is set to false by default and a node needs to be restarted
in order to change its value. Starting a node with enforce_rack_list
option will fail, if there are any tablet keyspaces with numeric rf
in any DC.
enforce_rack_list is a per-node option and a user needs to ensure
that no tablet keyspace is altered or created while nodes in
the cluster don't have the consistent value.
Mark rf_rack_valid_keyspaces as deprecated.
Fixes: https://github.com/scylladb/scylladb/issues/26399.
New feature; no backport needed
Closesscylladb/scylladb#28084
* github.com:scylladb/scylladb:
test: add test for enforce_rack_list option
db: mark rf_rack_valid_keyspaces as deprecated
config: add enforce_rack_list option
Revert "alternator: require rf_rack_valid_keyspaces when creating index"
Mark rf_rack_valid_keyspaces option as deprecated. User should
use enforce_rack_list option instead.
The option can still be used and it does not change it's behavior.
Docs is updated accordingly.
Since Vector Store service filtering API has been implemented (scylladb/vector-store#334), there is a need for the implementation of Scylla side part.
This patch should implement a `statement_restrictions` parsing into Vector Store filtering API compatible JSON objects.
Those objects should be added to ANN query vector POST requests as `filter` object.
After this patch, the subset of all operations ([Vector Search Filtering Milestone 1](https://scylladb.atlassian.net/wiki/spaces/RND/pages/156729450/Vector+Search+Filtering+Design+Document#Milestone-1)) happy path should be completed, allowing users to filter on primary key columns with single column `=` and `IN` or multiple column `()=()` and `() IN ()`.
The restrictions for other operations should be implemented in a PR on Vector Store service side.
---
This PR implements parsing the `statement_restrictions` into Vector Store filtering API compatible JSON objects.
The JSON objects are created and used in ANN vector queries with filtering.
It closes the Scylla side implementation of Vector Search filtering milestone 1.
Unit tests for `statement_restrictions` parsing are added. Integration tests will be added on Vector Store service side PR.
---
Fixes: SCYLLADB-249
New feature, should land into 2026.1
Closesscylladb/scylladb#28109
* github.com:scylladb/scylladb:
docs: update documentation on filtering with vector queries
test/vector_search: add test for filtered ANN with VS mock
test/vector_search: add restriction to JSON conversion unit tests
vector_search: cql: construct and use filter in ANN vector queries
select_statement: do not require post query ordering for vector queries
vector_search: add `statement_restrictions` to JSON parsing
Add a description of available filtering options with ANN vector queries.
Provide an example of such query and a reference to `WHERE` clause restrictions.
Problem
-------
Secondary indexes are implemented via materialized views under the
hood. The way an index behaves is determined by the configuration
of the view. Currently, it can be modified by performing the CQL
statement `ALTER MATERIALIZED VIEW` on it. However, that raises some
concerns.
Consider, for instance, the following scenario:
1. The user creates a secondary index on a table.
2. In parallel, the user performs writes to the base table.
3. The user modifies the underlying materialized view, e.g. by setting
the `synchronous_updates` to `true` [1].
Some of the writes that happened before step 3 used the default value
of the property (which is `false`). That had an actual consequence
on what happened later on: the view updates were performed
asynchronously. Only after step 3 had finished did it change.
Unfortunately, as of now, there is no way to avoid a situation like
that. Whenever the user wants to configure a secondary index they're
creating, they need to do it in another schema change. Since it's
not always possible to control how the database is manipulated in
the meantime, it leads to problems like the one described.
That's not all, though. The fact that it's not possible to configure
secondary indexes is inconsistent with other schema entities. When
it comes to tables or materialized views, the user always have a means
to set some or even all of the properties during their creation.
Solution
--------
The solution to this problem is extending the `CREATE INDEX` CQL
statement by view properties. The syntax is of form:
```
> CREATE INDEX <index name>
> .. ON <keyspace>.<table> (<columns>)
> .. WITH <properties>
```
where `<properties>` corresponds to both index-specific and view
properties [2, 3]. View properties can only be used with indexes
implemented with materialized views; for example, it will be impossible
to create a vector index when specifying any view property (see
examples below).
When a view property is provided, it will be applied when creating the
underlying materialized view. The behavior should be similar to how
other CQL statements responsible for creating schema entities work.
High-level implementation strategy
----------------------------------
1. Make auxiliary changes.
2. Introduce data structures representing the new set of index
properties: both index-specific and those corresponding to the
underlying view.
3. Extend `CREATE INDEX` to accept view properties.
4. Extend `DESCRIBE INDEX` and other `DESCRIBE` statements to include
view properties in their output.
User documentation is also updated at the steps to reflect the
corresponding changes.
Implementation considerations
-----------------------------
There are a number of schema properties that are now obsolete. They're
accepted by other CQL statements, but they have no effect. They
include:
* `index_interval`
* `replicate_on_write`
* `populate_io_cache_on_flush`
* `read_repair_chance`
* `dclocal_read_repair_chance`
If the user tries to create a secondary index specifying any of those
keywords, the statement will fail with an appropriate error (see
examples below).
Unlike materialized views, we forbid specifying the clustering order
when creating a secondary index [4]. This limitation may be lifted
later on, but it's a detail that may or may not prove troublesome. It's
better to postpone covering it to when we have a better perspective on
the consequences it would bring.
Examples
--------
Good examples
```
> CREATE INDEX idx ON ks.t (v);
> CREATE INDEX idx ON ks.t (v) WITH comment = 'ok view property';
> CREATE INDEX idx ON ks.t (v)
.. WITH comment = 'multiple view properties are ok'
.. AND synchronous_updates = true;
> CREATE INDEX idx ON ks.t (v)
.. WITH comment = 'default value ok'
.. AND synchronous_updates = false;
```
Bad examples
```
> CREATE INDEX idx ON ks.t (v) WITH replicate_on_write = true;
SyntaxException: Unknown property 'replicate_on_write'
> CREATE INDEX idx ON ks.t (v)
.. WITH OPTIONS = {'option1': 'value1'}
.. AND comment = 'some text';
InvalidRequest: Error from server: code=2200 [Invalid query]
message="Cannot specify options for a non-CUSTOM index"
> CREATE CUSTOM INDEX idx ON ks.t (v)
.. WITH OPTIONS = {'option1': 'value1'}
.. AND comment = 'some text';
InvalidRequest: Error from server: code=2200 [Invalid query]
message="CUSTOM index requires specifying the index class"
> CREATE CUSTOM INDEX idx ON ks.t (v)
.. USING 'vector_index'
.. WITH OPTIONS = {'option1': 'value1'}
.. AND comment = 'some text';
InvalidRequest: Error from server: code=2200 [Invalid query]
message="You cannot use view properties with a vector index"
> CREATE INDEX idx ON ks.t (v) WITH CLUSTERING ORDER BY (v ASC);
InvalidRequest: Error from server: code=2200 [Invalid query]
message="Indexes do not allow for specifying the clustering order"
```
and so on. For more examples, see the relevant tests.
References:
[1] https://docs.scylladb.com/manual/branch-2025.4/cql/cql-extensions.html#synchronous-materialized-views
[2] https://docs.scylladb.com/manual/branch-2025.4/cql/secondary-indexes.html#create-index
[3] https://docs.scylladb.com/manual/branch-2025.4/cql/mv.html#mv-options
[4] https://docs.scylladb.com/manual/branch-2025.4/cql/dml/select.html#ordering-clauseFixesscylladb/scylladb#16454
Backport: not needed. This is an enhancement.
Closesscylladb/scylladb#24977
* github.com:scylladb/scylladb:
cql3: Extend DESC INDEX by view properties
cql3: Forbid using CLUSTERING ORDER BY when creating index
cql3: Extend CREATE INDEX by MV properties
cql3/statements/create_index_statement: Allow for view options
cql3/statements/create_index_statement: Rename member
cql3/statements/index_prop_defs: Re-introduce index_prop_defs
cql3/statements/property_definitions: Add extract_property()
cql3/statements/index_prop_defs.cc: Add namespace
cql3/statements/index_prop_defs.hh: Rename type
cql3/statements/view_prop_defs.cc: Move validation logic into file
cql3/statements: Introduce view_prop_defs.{hh,cc}
cql3/statements/create_view_statement.cc: Move validation of ID
schema/schema.hh: Do not include index_prop_defs.hh
Allow creating materialized views and secondary indexes in a tablets keyspace only if it's RF-rack-valid, and enforce RF-rack-validity while the keyspace has views by restricting some operations:
* Altering a keyspace's RF if it would make the keyspace RF-rack-invalid
* Adding a node in a new rack
* Removing / Decommissioning the last node in a rack
Previously the config option `rf_rack_valid_keyspaces` was required for creating views. We now remove this restriction - it's not needed because we always maintain RF-rack-validity for keyspaces with views.
The restrictions are relevant only for keyspaces with numerical RF. Keyspace with rack-list-based RF are always RF-rack-valid.
Fixesscylladb/scylladb#23345
Fixes https://github.com/scylladb/scylladb/issues/26820
backport to relevant versions for materialized views with tablets since it depends on rf-rack validity
Closesscylladb/scylladb#26354
* github.com:scylladb/scylladb:
docs: update RF-rack restrictions
cql3: don't apply RF-rack restrictions on vector indexes
cql3: add warning when creating mv/index with tablets about rf-rack
service/tablet_allocator: always allow tablet merge of tables with views
locator: extend rf-rack validation for rack lists
test: test rf-rack validity when creating keyspace during node ops
locator: fix rf-rack validation during node join/remove
test: test topology restrictions for views with tablets
test: add test_topology_ops_with_rf_rack_valid
topology coordinator: restrict node join/remove to preserve RF-rack validity
topology coordinator: add validation to node remove
locator: extend rf-rack validation functions
view: change validate_view_keyspace to allow MVs if RF=Racks
db: enforce rf-rack-validity for keyspaces with views
replica/db: add enforce_rf_rack_validity_for_keyspace helper
db: remove enforce parameter from check_rf_rack_validity
test: adjust test to not break rf-rack validity
It should be possible to return the similarity of vectors in CQL statements following the [Cassandra compatible syntax](https://cassandra.apache.org/doc/latest/cassandra/getting-started/vector-search-quickstart.html#query-vector-data-with-cql):
```
SELECT comment, similarity_cosine(comment_vector, [0.1, 0.15, 0.3, 0.12, 0.05])
FROM cycling.comments_vs;
```
Although the calculations are slow, and we already have calculated results returned via Vector Store API,
we need the functionality as it allows us to calculate similarity of vectors not stored in vector indexes.
It will be needed for [quantization and rescoring](https://scylladb.atlassian.net/wiki/spaces/RND/pages/195985800/Quantization+and+Rescoring).
The feature is also a nice-to-have in testing as requested many times by testing and CX teams.
The optimized version utilizing already calculated distances from Vector Store without a need of rescoring will be coming soon after via https://github.com/scylladb/scylladb/pull/27991.
---
The patch adds functions:
- `similarity_cosine(<vector>, <vector>)`,
- `similarity_euclidean(<vector>, <vector>)`,
- `similarity_dot_product(<vector>, <vector>)`
Where `<vector>` is either a column of type `VECTOR<FLOAT, N>` or a vector of floats literal.
These functions can be called with every `SELECT` query, not only ANN vector queries as opposed to https://github.com/scylladb/scylladb/pull/25993.
The similarity calculations are implemented inspired by [USearch's implementation](
a2f1759910/include/usearch/index_plugins.hpp (L1304-L1385)) and made compatible with [Cassandra's documentation](https://cassandra.apache.org/doc/5.0/cassandra/developing/cql/functions.html#vector-similarity-functions).
That would guarantee the results in ScyllaDB are calculated using the exact same algorithms as used in Vector Store indexes.
---
Fixes: SCYLLADB-88
Fixes: SCYLLADB-89
New feature, should land into 2026.1
Closesscylladb/scylladb#27524
* github.com:scylladb/scylladb:
docs: add vector similarity functions documentation
test/cqlpy: add similarity functions correctness tests
test/cqlpy: add similarity functions invalid call tests
cql3: introduce similarity functions syntax
vector_similarity_fcts: introduce similarity functions
vector_similarity_fcts: retrieve similarity function argument types
vector_similarity_fcts: add calculating similarity between vectors
Add documentation in `functions.rst` as the CQL reference
for a vector similarity functions.
This includes the syntax, example usage, and prerequisites
for the parameters.
The doc about DDL statements claims that an `ALTER KEYSPACE` will fail
in the presence of an ongoing global topology operation.
This limitation was specifically referring to RF changes, which Scylla
implements as global topology requests (`keyspace_rf_change`), and it
was true when it was first introduced (1b913dd880) because there was
no global topology request queue at that time, so only one ongoing
global request was allowed in the cluster.
This limitation was lifted with the introduction of the global topology
request queue (6489308ebc), and it was re-introduced again very
recently (2e7ba1f8ce) in a slightly different form; it now applies only
to RF changes (not to any request type) and only those that affect the
same keyspace. None of these two changes were ever reflected in the doc.
Synchronize the doc with the current state.
Fixes#27776.
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
Closesscylladb/scylladb#27786
Update the documentation about restrictions to tablets keyspaces related
to RF-rack.
* MV/SI require the keyspace to be RF-rack-valid
* topology operations are restricted if a keyspace has views to preserve
RF-rack-validity
Currently some things are not supported for colocated tables: it's not
possible to repair a colocated table, and due to this it's also not
possible to use the tombstone_gc=repair mode on a colocated table.
Extend the documentation to explain what colocated tables are and
document these restrictions.
Fixesscylladb/scylladb#27261Closesscylladb/scylladb#27516
This is a temporary solution as handling this property may require
a bit more attention or at least a bit more focus. For now, let's
forbid using it so it's clear it won't get applied. A simple test
is provided to cover it.
We document the restriction.
After the previous patch that extended the grammar and provided
basic functionalities to accommodate properties of materialized views
in indexes, this commit takes another step and actually applies them
to the underlying view when it's being created.
We're providing validation tests for each property, with the single
exception of CLUSTERING ORDER BY. That one will be handled separately
in an upcoming commit.
We also update the user documentation.
The PRUNE MATERALIZED VIEW statement is performed as follows:
1. Perform a range scan of the view table from the view replicas based
on the ranges specified in the statement.
2. While reading the paged scan above, for each view row perform a read
from all base replicas at the corresponding primary key. If a discrepancy
is detected, delete the row in the view table.
When reading multiple rows, this is very slow because for each view row
we need to performe a single row query on multiple replicas.
In this patch we add an option to speed this up by performing many of the
single base row reads concurrently, at the concurrency specified in the
USING CONCURRENCY clause.
Aside from the unit test, I checked manually on a 3-node cluster with 10M rows, using vnodes. There were actually no ghost rows in the test, but we still had to iterate over all view rows and read the corresponding base rows. And actual ghost rows, if there are any, should be a tiny fraction of all rows. I compared concurrencies 1,2,10,100 and the results were:
* Pruning with concurrency 1 took total 1416 seconds
* Pruning with concurrency 2 took total 731 seconds
* Pruning with concurrency 10 took total 234 seconds
* Pruning with concurrency 100 took total 171 seconds
So after a concurrency of 10 or so we're hitting diminishing returns (at least in this setup). At that point we may be no longer bottlenecked by the reads, but by CPU on the shard that's handling the PRUNE
Fixes https://github.com/scylladb/scylladb/issues/27070Closesscylladb/scylladb#27097
* github.com:scylladb/scylladb:
mv: allow setting concurrency in PRUNE MATERIALIZED VIEW
cql: add CONCURRENCY to the USING clause
The PRUNE MATERALIZED VIEW statement is performed as follows:
1. Perform a range scan of the view table from the view replicas based
on the ranges specified in the statement.
2. While reading the paged scan above, for each view row perform a read
from all base replicas at the corresponding primary key. If a discrepancy
is detected, delete the row in the view table.
When reading multiple rows, this is very slow because for each view row
we need to performe a single row query on multiple replicas.
In this patch we add an option to speed this up by performing many of the
single base row reads concurrently, at the concurrency specified in the
USING CONCURRENCY clause.
Fixes https://github.com/scylladb/scylladb/issues/27070
This commit fixes the information about object storage:
- Object storage configuration is no longer marked as experimental.
- Redundant information has been removed from the description.
- Information related to object storage for SStabels has been removed
as the feature is not working.
Fixes https://github.com/scylladb/scylladb/issues/26985Closesscylladb/scylladb#26987
This patch adds the missing warning about the lack of possibility
to return the similarity distance. This will be added in the next
iteration.
Fixes#27086
It has to be backported to 2025.4 as this is the limitation in 2025.4.
Closesscylladb/scylladb#27096
This patch series re-enables support for speculative retry values `0` and `100`. These values have been supported some time ago, before [schema: fix issue 21825: add validation for PERCENTILE values in speculative_retry configuration. #21879
](https://github.com/scylladb/scylladb/pull/21879). When that PR prevented using invalid `101PERCENTILE` values, valid `100PERCENTILE` and `0PERCENTILE` value were prevented too.
Reproduction steps from [[Bug]: drop schema and all tables after apply speculative_retry = '99.99PERCENTILE' #26369](https://github.com/scylladb/scylladb/issues/26369) are unable to reproduce the issue after the fix. A test is added to make sure the inclusive border values `0` and `100` are supported.
Documentation is updated to give more information to the users. It now states that these border values are inclusive, and also that the precision, with automatic rounding, is 1 decimal digit.
Fixes#26369
This is a bug fix. If at any time a client tries to use value >= 99.5 and < 100, the raft error will happen. Backport is needed. The code which introduced inconsistency is introduced in 2025.2, so no backporting to 2025.1.
Closesscylladb/scylladb#26909
* github.com:scylladb/scylladb:
test: cqlpy: add test case for non-numeric PERCENTILE value
schema: speculative_retry: update exception type for sstring ops
docs: cql: ddl.rst: update speculative-retry-options
test: cqlpy: add test for valid speculative_retry values
schema: speculative_retry: allow 0 and 100 PERCENTILE values
Clarify how the value of `XPERCENTILE` is handled:
- Values 0 and 100 are supported
- The percentile value is rounded to the nearest 0.1 (1 decimal place)
Refs #26369
Auto-exands numeric RF in CREATE/ALTER KEYSPACE statements for
new DCs specified in the statement.
Doesn't auto-expand existing options, as the rack choice may not be in
line with current replica placement. This requires co-locating tablet
replicas, and tracking of co-location state, which is not implemented yet.
Signed-off-by: Tomasz Grabiec <tgrabiec@scylladb.com>
We want to add strongly consistent tables as an option. We will have
two kind of strongly consistent tables: globally consistent and locally
consistent. The former means that requests from all DCs will be globally
linearisable while the later - only requests to the same DCs will be
linearisable. To allow configuring all the possibilities the patch
adds new parameter to a keyspace definition "consistency" that can be
configured to be `eventual`, `global` or `local`. Non eventual setting
is supported for tablets enabled keyspaces only. Since we want to start
with implementing local consistency configuring global consistency will
result in an error for now.
This patch adds the missing documentation for the SELECT ... ANN
statement that allows performing vector queries. This is just the
basic explanation of the grammar and how to use it. More
comprehensive documentation about vector search will be added
separately in Scylla Cloud documentation and features description.
Links to this additional documentation will be added as part of
VECTOR-244.
Fixes: VECTOR-247.
This patch adds CQL documentation about creating vector search
indexes. It includes the syntax and description of parameters.
It does not cover VECTOR type that is already supported and
documented and it does not cover querying vectors which will be
covered by a separate PR.
Fixes: VECTOR-217
Closesscylladb/scylladb#26233
This patch updates the create keyspace statement docs. It explains how
the `replication` option in the create keyspace statement is now optional,
and behaves the same as if we specified an empty set as following:
`WITH replication = {}`.
An example with no `replication` option specified has also been added.
Refs #25145
Update create-keyspace-statement section of ddl.rst since replication factor is no longer mandatory.
Add an example for keyspace creation without specifying replication factor.
Add an example for keyspace creation without specifying both `class` and replication factor.
Refs: #16028
Update create-keyspace-statement section of ddl.rst since `class` is no longer mandatory.
Add an example for keyspace creation without specifying `class`.
Refs: #16029
ScyllaDB supports non-frozen UDTs since 3.2, no need to keep referencing
this limitation in the current docs. Replace the description of the
limitation with general description of frozen semantics for UDTs.
Fixes: #22929Closesscylladb/scylladb#24763
This commit removes the Non-Reserved CQL Keywords and Reserved CQL Keywords pages-keyword
as that content is already covered on the Appendices page.
Redirections are added to avoid 404s for the removed pages.
In addition, the Appendices page title is extended with "Reserved CQL Keywords and Types"
to help users understand what those appendices are about.
Fixes https://github.com/scylladb/scylladb/issues/24319Closesscylladb/scylladb#24320