Commit Graph

27 Commits

Author SHA1 Message Date
Pawel Pery
7883f161bb vector-store: fix creating local vector search indexes with a part of the partition key
Users ought to have possibility to create the local index for Vector Search
based only on a part of the partition key. This commits provides this by
removing requirements of 'full partition key only' for custom local index.

The commit updates docs to explain that local vector index can use only a part
of the partition key.

The commit implements cqlpy test to check fixed functionality.

Fixes: SCYLLADB-953

Needs to be backported to 2026.1 as it is a fix for local vector indexes.

Closes scylladb/scylladb#28931
2026-04-17 11:44:15 +02:00
Nadav Har'El
1eb8d170dd Merge 'vector_index: allow recreating vector indexes on the same column' from Dawid Pawlik
This series allows creating multiple vector indexes on the same column so users can rebuild an index without losing query availability.

The intended flow is:
1. Create a new vector index on a column that already has one.
2. Keep serving ANN queries from the old index while the new one is being built.
3. Verify the new index is ready.
4. Automatically switch to the remaining index.
5. Drop the old index.

To make that deterministic, `index_version` is changed from the base table schema version to a real creation timeuuid. When multiple vector indexes exist on the same column, ANN query planning now picks the index according to the routing implemented in Vector Store (newest serving index). This keeps queries on the old index until it the new one is up and ready.

This patch also removes the create-time restriction that rejected a second vector index on the same column. Name collisions are still rejected as before.

Test coverage is updated accordingly:
- Scylla now verifies that two vector indexes can coexist on the same column.
- Cassandra/SAI behavior is still covered and is still expected to reject duplicate indexes on the same column.

Fixes: VECTOR-610

Closes scylladb/scylladb#29407

* github.com:scylladb/scylladb:
  docs: document vector index metadata and duplicate handling
  test/cqlpy: cover vector index duplicate creation rules
  vector_index: allow multiple named indexes on one column
  vector_index: store `index_version` as creation timeuuid
2026-04-15 14:40:15 +03:00
Dawid Pawlik
800dec2180 test/cqlpy: cover vector index duplicate creation rules
Add cqlpy tests for the current CREATE INDEX behavior of vector indexes.

Cover named and unnamed duplicates, IF NOT EXISTS, coexistence of
multiple named vector indexes on the same column, interactions between
named and unnamed indexes, and the same-name-on-different-table case.
2026-04-14 12:21:38 +02:00
Avi Kivity
0ae22a09d4 LICENSE: Update to version 1.1
Updated terms of non-commercial use (must be a never-customer).
2026-04-12 19:46:33 +03:00
Dawid Pawlik
2dd8eef38c vector_index: store index_version as creation timeuuid
Vector indexes currently store the base table schema version in
`index_version`. That value is name-based, not time-based,
so it does not represent when the index was created.

Store a timeuuid instead and change the relevant interfaces from
`table_schema_version` to `utils::UUID`. This is a prerequisite
for supporting multiple vector indexes on the same column where
the oldest index must be selected deterministically via routing
implemented in Vector Store.

Update the cqlpy tests to check the new semantics directly:
recreating the index changes `index_version`, while ALTER TABLE does not.
2026-04-10 13:05:21 +02:00
Szymon Wasik
573def7cd8 cql: accept source_model option and show options in DESCRIBE
Accept the Cassandra SAI 'source_model' option for vector indexes.
This option is used by Cassandra libraries (e.g., CassIO, LangChain)
to tag vector indexes with the name of the embedding model that
produced the vectors.

ScyllaDB does not use the source_model value but stores it and
includes it in the DESCRIBE INDEX output for Cassandra compatibility.

Additionally, extend vector_index::describe() to emit a
WITH OPTIONS = {...} clause containing all user-provided index options
(filtering out system keys: target, class_name, index_version).
This makes options like similarity_function, source_model, etc.
visible in DESCRIBE output.
2026-04-09 17:20:03 +02:00
Szymon Wasik
80a2e4a0ab cql: add Cassandra SAI (StorageAttachedIndex) compatibility
Libraries such as CassIO, LangChain, and LlamaIndex create vector
indexes using Cassandra's StorageAttachedIndex (SAI) class name.
This commit lets ScyllaDB accept these statements without modification.

When a CREATE CUSTOM INDEX statement specifies an SAI class name on a
vector column, ScyllaDB automatically rewrites it to the native
vector_index implementation. Accepted class names (case-insensitive):
  - org.apache.cassandra.index.sai.StorageAttachedIndex
  - StorageAttachedIndex
  - sai

SAI on non-vector columns is rejected with a clear error directing
users to a secondary index instead.

The SAI detection and rewriting logic is extracted into a dedicated
static function (maybe_rewrite_sai_to_vector_index) to keep the
already-long validate_while_executing method manageable.

Multi-column (local index) targets and nonexistent columns are
skipped with continue — the former are treated as filtering columns
by vector_index::check_target(), and the latter are caught later by
vector_index::validate().

Tests that exercise features common to both backends (basic creation,
similarity_function, IF NOT EXISTS, bad options, etc.) now use the
SAI class name with the skip_on_scylla_vnodes fixture so they run
against both ScyllaDB and Cassandra. ScyllaDB-specific tests continue
to use USING 'vector_index' with scylla_only.
2026-04-09 17:20:03 +02:00
Szymon Wasik
fa7edc627c test: modernize vector index test comments and fix xfail
- Change 'Reproduces' to 'Validates fix for' in test comments to
  reflect that the referenced issues are already fixed.
- Condense the VECTOR-179 comment to two lines.
- Replace the xfailed test_ann_query_with_restriction_works_only_on_pk
  with a focused test (test_ann_query_with_pk_restriction) that creates
  a vector index on a table with a PK column restriction, validating
  the VECTOR-374 fix.
2026-04-09 17:20:02 +02:00
Nadav Har'El
4eeb9f4120 lwt, vector: write to CDC when vector index is enabled.
The vector-search feature introduced the somewhat confusing feature of
enabling CDC without explicitly enabling CDC: When a vector index is
enabled on a table, CDC is "enabled" for it even if the user didn't
ask to enable CDC.

For this, write-path code began to use a new cdc_enabled() function
instead of checking schema.cdc_options.enabled() directly.  This
cdc_enabled() function checks if either this enabled() is true, or
has_vector_index() is true.

Unfortunately, LWT writes continued to use cdc_options.enabled() instead
of the new cdc_enabled(). This means that if a vector index is used and
a vector is written using an LWT write, the new value is not indexed.

This patch fixes this bug. It also adds a regression test that fails
before this patch and passes afterwards - the new test verifies that
when a table has a vector index (but no explicit CDC enabled), the CDC
log is updated both after regular writes and after successful LWT writes.

This patch was also tested in the context of the upcoming vector-search-
for-Alternator pull request, which has a test reproducing this bug
(Alternator uses LWT frequently, so this is very important there).
It will also be tested by the vector-store test suite ("validator").

Fixes SCYLLADB-1342

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#29300
2026-04-08 07:55:05 +03:00
Karol Nowacki
493a4433e7 index: fix DESC INDEX for vector index
The `DESC INDEX` command returned incorrect results for local vector
indexes and for vector indexes that included filtering columns.

This patch corrects the implementation to ensure `DESCRIBE INDEX`
accurately reflects the index configuration.

This was a pre-existing issue, not a regression from recent
serialization schema changes for vector index target options.
2026-03-30 16:46:48 +02:00
Karol Nowacki
6bc88e817f vector_search: fix SELECT on local vector index
Queries against local vector indexes were failing with the error:
"ANN ordering by vector requires the column to be indexed using 'vector_index'"

This was a regression introduced by 15788c3734, which incorrectly
assumed the first column in the targets list is always the vector column.
For local vector indexes, the first column is the partition key, causing
the failure.

Previously, serialization logic for the target index option was shared
between vector and secondary indexes. This is no longer viable due to
the introduction of local vector indexes and vector indexes with filtering
columns, which have  different target format.

This commit introduces a dedicated JSON-based serialization format for
vector index targets, identifying the target column (tc), filtering
columns (fc), and partition key columns (pk). This ensures unambiguous
serialization and deserialization for all vector index types.

This change is backward compatible for regular vector indexes. However,
it breaks compatibility for local vector indexes and vector indexes with
filtering columns created in version 2026.1.0. To mitigate this, usage
of these specific index types will be blocked in the 2026.1.0 release
by failing ANN queries against them in vector-store service.

Fixes: SCYLLADB-895
2026-03-30 16:46:48 +02:00
Karol Nowacki
c0b78477a5 index: test: vector index target option serialization test
This test ensures that the serialization format for vector index target
options remains stable. Maintaining backward compatibility is critical
because the index is restored from this property on startup.
Any unintended changes to the serialization schema could break existing
indexes after an upgrade.

This option is also an interface for the vector-store service,
which uses it to identify the indexed column.
2026-03-30 16:46:48 +02:00
Pawel Pery
f49c9e896a vector_search: allow full secondary indexes syntax while creating the vector index
Vector Search feature needs to support creating vector indexes with additional
filtering column. There will be two types of indexes: global which indexes
vectors per table, and local which indexes vectors per partition key. The new
syntaxes are based on ScyllaDB's Global Secondary Index and Local Secondary
Index. Vector indexes don't use secondary indexes functionalities in any way -
all indexing, filtering and processing data will be done on Vector Store side.

This patch allows creating vector indexes using this CQL syntax:

```
CREATE TABLE IF NOT EXISTS cycling.comments_vs (
  commenter text,
  comment text,
  comment_vector VECTOR <FLOAT, 5>,
  created_at timestamp,
  discussion_board_id int,
  country text,
  lang text,
  PRIMARY KEY ((commenter, discussion_board_id), created_at)
);

CREATE CUSTOM INDEX IF NOT EXISTS global_ann_index
  ON cycling.comments_vs(comment_vector, country, lang) USING 'vector_index'
  WITH OPTIONS = { 'similarity_function': 'DOT_PRODUCT' };

CREATE CUSTOM INDEX IF NOT EXISTS local_ann_index
  ON cycling.comments_vs((commenter, discussion_board_id), comment_vector, country, lang)
  USING 'vector_index'
  WITH OPTIONS = { 'similarity_function': 'DOT_PRODUCT' };
```

Currently, if we run these queries to create indexes we will receive such errors:

```
InvalidRequest: Error from server: code=2200 [Invalid query] message="Vector index can only be created on a single column"
InvalidRequest: Error from server: code=2200 [Invalid query] message="Local index definition must contain full partition key only. Redundant column: XYZ"
```

This commit refactors `vector_index::check_target` to correctly validate
columns building the index. Vector-store currently support filtering by native
types, so the type of columns is checked. The first column from the list must
be a vector (to build index based on these vectors), so it is also checked.

Allowed types for columns are native types without counter (it is not possible
to create a table with counter and vector) and without duration (it is not
possible to correctly compare durations, this type is even not allowed in
secondary indexes).

This commits adds cqlpy test to check errors while creating indexes.

Fixes: SCYLLADB-298

This needs to be backported to version 2026.1 as this is a fix for filtering support.

Closes scylladb/scylladb#28366
2026-01-30 01:14:31 +02:00
Szymon Malewski
262a8cef0b vector_index: improve options validation
In this patch we enhance validation of option by:
- giving context (option name) in error messages
- listing supported values in error messages of enumerated options
- avoiding using templates

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-293
Follow-up: https://github.com/scylladb/scylladb/pull/27677
2026-01-20 21:01:41 +01:00
Michał Hudobski
12483d8c3c vector_search: throw an error when we restrict primary in vector search
We currently allow restrictions on single column primary key,
but we ignore the restriction and return all results.
This can confuse the users. We change it so such a restriction
will throw an error and add a test to validate it.

Fixes: VECTOR-331

Closes scylladb/scylladb#27143
2025-12-15 09:45:56 +02:00
Karol Nowacki
ca62effdd2 vector_search: Restrict vector index tests to tablets only
Vector indexes are going to be supported only for tablets (see VECTOR-322).
As a result, tests using vector indexes will be failing when run with vnodes.

This change ensures tests using vector indexes run exclusively with tablets.

Fixes: VECTOR-49

Closes scylladb/scylladb#26843
2025-11-25 09:26:16 +02:00
Michał Hudobski
fd521cee6f test: fix typo in vector_index test
Unfortunately in https://github.com/scylladb/scylladb/pull/26508
a typo than changes a behavior of a test was introduced.
This patch fixes the typo.

Closes scylladb/scylladb#26803
2025-10-31 18:35:02 +03:00
Michał Hudobski
46589bc64c secondary_index: disallow multiple vector indexes on the same column
We currently allow creating multiple vector indexes on one column.
This doesn't make much sense as we do not support picking one when
making ann queries.

To make this less confusing and to make our behavior similar
to Cassandra we disallow the creation of multiple vector indexes
on one column.

We also add a test that checks this behavior.

Fixes: VECTOR-254
Fixes: #26672

Closes scylladb/scylladb#26508
2025-10-29 11:55:38 +02:00
Michael Litvak
778dec2630 test/cqlpy: adjust cdc tests for tablets
update cdc-related tests in test/cqlpy for cdc with tablets.

* test_cdc_log_entries_use_cdc_streams: this test depends on the
  implementation of the cdc tables, which is different for tablets, so
  it's changed to run for both vnodes and tablets keyspaces, and we add
  the implementation for tablets.

* some cdc-related are unskipped for tablets so they will be run with
  both tablets and vnodes keyspaces. these are tests where the
  implementation may be different between tablets and vnodes and we want
  to have converage of both.

* other cdc-related tests do not depend on the implementation
  differences between tablets and vnodes, so we can just enable them to
  run with the default configuration. previously they were disabled for
  tablets keyspaces because it wasn't supported, so now we remove this.
2025-09-17 14:47:13 +02:00
Nadav Har'El
5307d1b9a8 Merge 'vector_index: add version to index options' from Dawid Pawlik
Since creating the vector index does not lead to creation of a view table [#24438] (whose version info had been logged in `system_schema.scylla_tables`) we lacked the information about the version of the index.

The solution we arrived at is to add the version as a field in options column of `system_schema.indexes`.
It requires few changes and seems unintruitive for existing infrastructure.

This patch implements the solution described above.

Refs: VECTOR-142

Closes scylladb/scylladb#25614

* github.com:scylladb/scylladb:
  cqlpy/test_vector_index: add vector index version test
  vector_index, index_prop_defs: add version to index options
  create_index_statement: rename `validator` to `custom_index_factory`
  custom index: rename `custom_index_option_name`
  vector_index: rename `supported_options` to `vector_index_options`
2025-09-14 15:35:53 +03:00
Dawid Pawlik
1ce76a6ca2 cqlpy/test_vector_index: add vector index version test
Test if the index version is the same as the base table version before
the index was created.
Test if recreating the index with the same parameters changes the version.
Test if altering the base table does not change the version.
Test if the user cannot specify the index version option by themself.
2025-09-10 15:19:36 +02:00
Karol Nowacki
3086d15999 cql3: Fix crash on ANN OF query when TRACING ON is enabled
Executing a vector search (SELECT with ANN OF ordering) query with `TRACING ON` enabled
caused a node to crash due to a null pointer dereference.

This occurred because a vector index does not have an associated view
table, making its `_view_schema` member null. The implementation
attempted to enable tracing on this null view schema, leading to the
crash.

The fix adds a null check for `_view_schema` before attempting to
enable tracing on the view (index) table.

A regression test is included to prevent this from happening again.

Fixes: VECTOR-179

Closes scylladb/scylladb#25500
2025-09-01 17:26:54 +03:00
Dawid Pawlik
9463ac10e2 test/cqlpy: ensure Vector Search CDC options
Add test to check if minimal options for Vector Search
are ensured and if it is disallowed to create CDC
unrespectfully to the minimal requirements.
2025-08-20 17:20:38 +02:00
Dawid Pawlik
461d820fbb test/cqlpy: run vector_index tests only on vnodes
When creating an index on vector column using 'vector_index' class
the CDC log is being created as it is required for Vector Search.

Due to the fact that CDC does not yet work with tablets (Refs #16317)
enabled we have to mark the tests failing on tablets and run them on vnodes
to make sure the vector index tests continue to pass.
2025-08-20 12:38:52 +02:00
Michał Hudobski
919cca576f custom_index: do not create view when creating a custom index
Currently we create a view for every index, however
for currently supported custom index classes (vector_index)
that work is redundant, as we store the index in the external
service.

This patch adds a way for custom indexes to choose whether to
create a view when creating the index and makes it so that
for vector indexes the view is not created.
2025-07-07 13:47:07 +02:00
Michał Hudobski
195e6a82de test/cqlpy: remove xfail and add more vector tests
We have added validation for options and type of column for vector
indexes. This commit adds tests for that validation.
2025-05-27 21:04:50 +02:00
Michał Hudobski
8ea862f1e8 test/cqlpy: add custom index tests
Unit tests checking the behavior
of the added support for create
custom index statement
2025-05-14 09:32:01 +02:00