Document that `SELECT ... WHERE` clause currently accepts only conjunctions
of relations joined by `AND` (`OR` is not supported), and that
parentheses cannot be used to group boolean subexpressions.
Add an unsupported query example and point readers to equivalent `IN`
rewrites when applicable.
This problem has been raised by one of our users in
https://forum.scylladb.com/t/error-parsing-query-or-unsupported-statement/5299,
and while one could infer answer to user's question by looking at the
syntax of the `SELECT ... WHERE`, it's not immediately obvious to
non-advanced users, so clarifying these concepts is justified.
Fixes: SCYLLADB-1116
Closesscylladb/scylladb#29100
Remove outdated references to filtering on columns provided in the
index definition, and remove the note about equal relations (= and IN)
being the only supported operations. Vector search filtering currently
supports WHERE clauses on primary key columns only.
Closesscylladb/scylladb#28949
Add support for literals in the SELECT clause. This allows
SELECT fn(column, 4) or SELECT fn(column, ?).
Note, "SELECT 7 FROM tab" becomes valid in the grammar, but is still
not accepted because of failed type inference - we cannot infer the
type of 7, and don't have a favored type for literals (like C favors
int). We might relax this later.
In the WHERE clause, and Cassandra in the SELECT clause, type hints
can also resolve type ambiguity: (bigint)7 or (text)?. But this is
deferred to a later patch.
A few changes to the grammar are needed on top of adding a `value`
alternative to `unaliasedSelector`:
- vectorSimilarityArg gained access to `value` via `unaliasedSelector`,
so it loses that alternate to avoid ambiguity. We may drop
`vectorSimilarityArg` later.
- COUNT(1) became ambiguous via the general function path (since
function arguments can now be literals), so we remove this case
from the COUNT special cases, remaining with count(*).
- SELECT JSON and SELECT DISTINCT became "ambiguous enough" for
ANTLR to complain, though as far as I can tell `value` does not
add real ambiguity. The solution is to commit early (via "=>") to
a parsing path.
Due to the loss of count(1) recognition in the parser, we have to
special-case it in prepare. We may relax it to count any expression
later, like modern Cassandra and SQL.
Testing is awkward because of the type inference problem in top-level.
We test via the set_intersection() function and via lua functions.
Example:
```
cqlsh> CREATE FUNCTION ks.sum(a int, b int) RETURNS NULL ON NULL INPUT RETURNS int LANGUAGE LUA AS 'return a + b';
cqlsh> SELECT ks.sum(1, 2) FROM system.local;
ks.sum(1, 2)
--------------
3
(1 rows)
cqlsh>
```
(There are no suitable system functions!)
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-296Closesscylladb/scylladb#28256
Vector Search feature needs to support creating vector indexes with additional
filtering column. There will be two types of indexes: global which indexes
vectors per table, and local which indexes vectors per partition key. The new
syntaxes are based on ScyllaDB's Global Secondary Index and Local Secondary
Index. Vector indexes don't use secondary indexes functionalities in any way -
all indexing, filtering and processing data will be done on Vector Store side.
This patch allows creating vector indexes using this CQL syntax:
```
CREATE TABLE IF NOT EXISTS cycling.comments_vs (
commenter text,
comment text,
comment_vector VECTOR <FLOAT, 5>,
created_at timestamp,
discussion_board_id int,
country text,
lang text,
PRIMARY KEY ((commenter, discussion_board_id), created_at)
);
CREATE CUSTOM INDEX IF NOT EXISTS global_ann_index
ON cycling.comments_vs(comment_vector, country, lang) USING 'vector_index'
WITH OPTIONS = { 'similarity_function': 'DOT_PRODUCT' };
CREATE CUSTOM INDEX IF NOT EXISTS local_ann_index
ON cycling.comments_vs((commenter, discussion_board_id), comment_vector, country, lang)
USING 'vector_index'
WITH OPTIONS = { 'similarity_function': 'DOT_PRODUCT' };
```
Currently, if we run these queries to create indexes we will receive such errors:
```
InvalidRequest: Error from server: code=2200 [Invalid query] message="Vector index can only be created on a single column"
InvalidRequest: Error from server: code=2200 [Invalid query] message="Local index definition must contain full partition key only. Redundant column: XYZ"
```
This commit refactors `vector_index::check_target` to correctly validate
columns building the index. Vector-store currently support filtering by native
types, so the type of columns is checked. The first column from the list must
be a vector (to build index based on these vectors), so it is also checked.
Allowed types for columns are native types without counter (it is not possible
to create a table with counter and vector) and without duration (it is not
possible to correctly compare durations, this type is even not allowed in
secondary indexes).
This commits adds cqlpy test to check errors while creating indexes.
Fixes: SCYLLADB-298
This needs to be backported to version 2026.1 as this is a fix for filtering support.
Closesscylladb/scylladb#28366
This patch adds links to the Vector Search documentation that is hosted
together with Scylla Cloud docs to the CQL documentation.
It also make the note about supported capabilities consistent and
removes the experimental label as the feature is GAed.
Fixes: SCYLLADB-371
Closesscylladb/scylladb#28312
Add a description of available filtering options with ANN vector queries.
Provide an example of such query and a reference to `WHERE` clause restrictions.
Add documentation in `functions.rst` as the CQL reference
for a vector similarity functions.
This includes the syntax, example usage, and prerequisites
for the parameters.
This patch adds the missing warning about the lack of possibility
to return the similarity distance. This will be added in the next
iteration.
Fixes#27086
It has to be backported to 2025.4 as this is the limitation in 2025.4.
Closesscylladb/scylladb#27096
This patch adds the missing documentation for the SELECT ... ANN
statement that allows performing vector queries. This is just the
basic explanation of the grammar and how to use it. More
comprehensive documentation about vector search will be added
separately in Scylla Cloud documentation and features description.
Links to this additional documentation will be added as part of
VECTOR-244.
Fixes: VECTOR-247.
Before this patch we silently allowed and ignored PER PARTITION LIMIT.
While using aggregate functions in conjunction with PER PARTITION LIMIT
can make sense, we want to disable it until we can offer proper
implementation, see #9879 for discussion.
We want to match Cassandra, and for queries with aggregate functions it
behaves as follows:
- it silently ignores PER PARTITION LIMIT if GROUP BY is present, which
matches our previous implementation.
- rejects PER PARTITION LIMIT when GROUP BY is *not* present.
This patch adds rejection of the second group.
Fixes#9879Closesscylladb/scylladb#23086
This adds to the grammar the option to SELECT a specific key in a
collection column using subscript syntax.
For example:
SELECT map['key'] FROM table
SELECT map['key1']['key2'] FROM table
The key can also be parameterized in a prepared query. For this we need
to pass the query options to result_set_builder where we process the
selectors.
Fixesscylladb/scylladb#7751
Where the grammar supports IN, we add NOT IN. This includes the WHERE
clause and LWT IF clause.
Evaluation of NOT IN follows from IN.
In statement_restrictions analysis, they are different, as NOT IN
doesn't enable any clever query plan and must filter.
Some tests are added. An error message was changed ('in' changed to 'IN'),
so some tests are adjusted.
Closesscylladb/scylladb#21992
This PR removes information about outdated versions, including disclaimers and information when a given feature was added.
Now that the documentation is versioned, information about outdated versions is unnecessary (and makes the docs harder to read).
Fixes https://github.com/scylladb/scylladb/issues/12110Closesscylladb/scylladb#17430
The goal is to make the available defaults safe for future use, as they
are often taken from existing config files or documentation verbatim.
Referenced issue: #14290Closesscylladb/scylladb#15947
The DML page is quite long (21 screenfuls on my monitor); split
it into one page per statement to make it more digestible.
The sections that are common to multiple statement are kept
in the main DML page, and references to them are added.
Closes#15053