157 Commits

Author SHA1 Message Date
Karol Nowacki
30487e8854 index: fix vector index with filtering target column
The secondary index mechanism is currently used to determine the target column.
This mechanism works incorrectly for vector indexes with filtering because
it returns the last specified column as the target (vectors) column.
However, the syntax for a vector index requires the first column to be the target:
```
CREATE CUSTOM INDEX ON t(vectors, users) USING 'vector_index';
```

This discrepancy eventually leads to the following exception when performing an
ANN search on a vector index with filtering columns:
````
ANN ordering by vector requires the column to be indexed using 'vector_index'
````

This commit fixes the issue by introducing dedicated logic for vector indexes
to correctly identify the target(vectors) column.

Fixes: SCYLLADB-635

Closes scylladb/scylladb#28740
2026-03-02 18:47:58 +02:00
Michał Hudobski
579ed6f19f secondary_index_manager: fix double registration bug
We have observed a bug that caused Scylla to crash due to metrics double
registration. This bug is really difficult to reproduce and was seen
only once in the wild. We think that it may be caused by a request
in-flight keeping a reference to the stats object, making it not
deregister when the index is dropped, which casues a double registration
when we recreate the index, however we are not 100% sure.

This patch makes it so the metrics always get deregistered when we drop the index, which should fix the double registration bug.

Fixes: #27252

Closes scylladb/scylladb#28655
2026-02-26 09:39:53 +01:00
Pawel Pery
f49c9e896a vector_search: allow full secondary indexes syntax while creating the vector index
Vector Search feature needs to support creating vector indexes with additional
filtering column. There will be two types of indexes: global which indexes
vectors per table, and local which indexes vectors per partition key. The new
syntaxes are based on ScyllaDB's Global Secondary Index and Local Secondary
Index. Vector indexes don't use secondary indexes functionalities in any way -
all indexing, filtering and processing data will be done on Vector Store side.

This patch allows creating vector indexes using this CQL syntax:

```
CREATE TABLE IF NOT EXISTS cycling.comments_vs (
  commenter text,
  comment text,
  comment_vector VECTOR <FLOAT, 5>,
  created_at timestamp,
  discussion_board_id int,
  country text,
  lang text,
  PRIMARY KEY ((commenter, discussion_board_id), created_at)
);

CREATE CUSTOM INDEX IF NOT EXISTS global_ann_index
  ON cycling.comments_vs(comment_vector, country, lang) USING 'vector_index'
  WITH OPTIONS = { 'similarity_function': 'DOT_PRODUCT' };

CREATE CUSTOM INDEX IF NOT EXISTS local_ann_index
  ON cycling.comments_vs((commenter, discussion_board_id), comment_vector, country, lang)
  USING 'vector_index'
  WITH OPTIONS = { 'similarity_function': 'DOT_PRODUCT' };
```

Currently, if we run these queries to create indexes we will receive such errors:

```
InvalidRequest: Error from server: code=2200 [Invalid query] message="Vector index can only be created on a single column"
InvalidRequest: Error from server: code=2200 [Invalid query] message="Local index definition must contain full partition key only. Redundant column: XYZ"
```

This commit refactors `vector_index::check_target` to correctly validate
columns building the index. Vector-store currently support filtering by native
types, so the type of columns is checked. The first column from the list must
be a vector (to build index based on these vectors), so it is also checked.

Allowed types for columns are native types without counter (it is not possible
to create a table with counter and vector) and without duration (it is not
possible to correctly compare durations, this type is even not allowed in
secondary indexes).

This commits adds cqlpy test to check errors while creating indexes.

Fixes: SCYLLADB-298

This needs to be backported to version 2026.1 as this is a fix for filtering support.

Closes scylladb/scylladb#28366
2026-01-30 01:14:31 +02:00
Szymon Malewski
c89957b725 vector_index: rescoring: Add hidden similarity score column
Rescoring consist of recalculating similarity score and reordering results based on it.
In this patch we add calculation of similarity score as a hidden (non-serialized) column and following patch will add reordering.
Normal ordering uses `add_column_for_post_processing`, however this works only for regular columns, not function.
So we create it together with user requested columns (this also forces the use of `selection_with_processing`) and hide the column later.
This also requires special handling for 'SELECT *' case - we need to manually add all columns before adding similarity column.

In case user already asks for similarity score in the SELECT clause, this value will be calculated twice - is should be optimized in future patches.
2026-01-22 15:38:40 +01:00
Szymon Malewski
c5945b1ef4 vector_index: introduce rescoring option
This patch adds vector index option allowing to enable rescoring - recalculation of similarity metric and re-ranking of quantized VS candidates.
Quantization is a necessary condition to run rescoring - checked in convenience function `is_rescoring_enabled`.
Rescoring itself is not implemented - it will come in following patches.

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-294
2026-01-20 21:01:45 +01:00
Szymon Malewski
262a8cef0b vector_index: improve options validation
In this patch we enhance validation of option by:
- giving context (option name) in error messages
- listing supported values in error messages of enumerated options
- avoiding using templates

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-293
Follow-up: https://github.com/scylladb/scylladb/pull/27677
2026-01-20 21:01:41 +01:00
Nadav Har'El
70b3cd0540 Merge 'vector_index: introduce quantization and oversampling options' from Szymon Malewski
This patch adds vector index options allowing to enable quantization and oversampling.
Specific quantization value will be used internally by vector store.

In the current implementation, get_oversampling allows us to decide how many times more candidates
to retrieve from vector store - final response is still trimmed to the given limit.
It is a first step to allow rescoring - recalculation of similarity metric and re-ranking.
Without rescoring oversampling will be also further optimized to happen internally in vector store.

`test/vector_search/rescoring_test.cc` implements basic tests of added functionality.
New options are documented in `docs/cql/secondary-indexes.rst`.

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-82
Ref https://scylladb.atlassian.net/browse/SCYLLADB-83

New feature - no backporting

Closes scylladb/scylladb#27677

* github.com:scylladb/scylladb:
  vector_search: doc: Document new index options
  vector_search: test: Test oversampling
  vector_search: test: Add rescoring index options test
  vector_search: test: Extract Configure utility to shared header
  vector_index: introduce `quantization` and `oversampling` options
2026-01-20 08:50:46 +02:00
Botond Dénes
e01041d3ee db/system_keyspace: move remining tables out of v3 keyspace
The last remining tables in the v3 keyspace are those that are genuinely
distinct -- added by Cassandra 3.0 or >= ScyllaDB 2.0.
Move these out of the v3 keyspace too, with this the v3 keyspace is
defunct and removed.
2026-01-19 12:32:21 +02:00
Szymon Malewski
b8e91ee6ae vector_index: introduce quantization and oversampling options
This patch adds vector index options allowing to enable quantization and oversampling.
Specific quantization value will be used internally by vector store.

In the current implementation, `get_oversampling` allows us to decide how many times more candidates
to retrieve from vector store - final response is still trimmed to the given limit.
It is a first step to allow rescoring - recalculation of similarity metric and re-ranking.
Without rescoring oversampling will be also further optimized to happen internally in vector store.

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-82
Ref https://scylladb.atlassian.net/browse/SCYLLADB-83
2026-01-19 10:21:43 +01:00
Botond Dénes
122b7847e5 Merge 'index: Accept view properties in CREATE INDEX' from Dawid Mędrek
Problem
-------
Secondary indexes are implemented via materialized views under the
hood. The way an index behaves is determined by the configuration
of the view. Currently, it can be modified by performing the CQL
statement `ALTER MATERIALIZED VIEW` on it. However, that raises some
concerns.

Consider, for instance, the following scenario:

1. The user creates a secondary index on a table.
2. In parallel, the user performs writes to the base table.
3. The user modifies the underlying materialized view, e.g. by setting
   the `synchronous_updates` to `true` [1].

Some of the writes that happened before step 3 used the default value
of the property (which is `false`). That had an actual consequence
on what happened later on: the view updates were performed
asynchronously. Only after step 3 had finished did it change.

Unfortunately, as of now, there is no way to avoid a situation like
that. Whenever the user wants to configure a secondary index they're
creating, they need to do it in another schema change. Since it's
not always possible to control how the database is manipulated in
the meantime, it leads to problems like the one described.

That's not all, though. The fact that it's not possible to configure
secondary indexes is inconsistent with other schema entities. When
it comes to tables or materialized views, the user always have a means
to set some or even all of the properties during their creation.

Solution
--------
The solution to this problem is extending the `CREATE INDEX` CQL
statement by view properties. The syntax is of form:

```
> CREATE INDEX <index name>
> .. ON <keyspace>.<table> (<columns>)
> .. WITH <properties>
```

where `<properties>` corresponds to both index-specific and view
properties [2, 3]. View properties can only be used with indexes
implemented with materialized views; for example, it will be impossible
to create a vector index when specifying any view property (see
examples below).

When a view property is provided, it will be applied when creating the
underlying materialized view. The behavior should be similar to how
other CQL statements responsible for creating schema entities work.

High-level implementation strategy
----------------------------------
1. Make auxiliary changes.
2. Introduce data structures representing the new set of index
   properties: both index-specific and those corresponding to the
   underlying view.
3. Extend `CREATE INDEX` to accept view properties.
4. Extend `DESCRIBE INDEX` and other `DESCRIBE` statements to include
   view properties in their output.

User documentation is also updated at the steps to reflect the
corresponding changes.

Implementation considerations
-----------------------------
There are a number of schema properties that are now obsolete. They're
accepted by other CQL statements, but they have no effect. They
include:

* `index_interval`
* `replicate_on_write`
* `populate_io_cache_on_flush`
* `read_repair_chance`
* `dclocal_read_repair_chance`

If the user tries to create a secondary index specifying any of those
keywords, the statement will fail with an appropriate error (see
examples below).

Unlike materialized views, we forbid specifying the clustering order
when creating a secondary index [4]. This limitation may be lifted
later on, but it's a detail that may or may not prove troublesome. It's
better to postpone covering it to when we have a better perspective on
the consequences it would bring.

Examples
--------
Good examples
```
> CREATE INDEX idx ON ks.t (v);
> CREATE INDEX idx ON ks.t (v) WITH comment = 'ok view property';
> CREATE INDEX idx ON ks.t (v)
  .. WITH comment = 'multiple view properties are ok'
  .. AND synchronous_updates = true;
> CREATE INDEX idx ON ks.t (v)
  .. WITH comment = 'default value ok'
  .. AND synchronous_updates = false;
```

Bad examples
```
> CREATE INDEX idx ON ks.t (v) WITH replicate_on_write = true;

SyntaxException: Unknown property 'replicate_on_write'

> CREATE INDEX idx ON ks.t (v)
  .. WITH OPTIONS = {'option1': 'value1'}
  .. AND comment = 'some text';

InvalidRequest: Error from server: code=2200 [Invalid query]
  message="Cannot specify options for a non-CUSTOM index"

> CREATE CUSTOM INDEX idx ON ks.t (v)
  .. WITH OPTIONS = {'option1': 'value1'}
  .. AND comment = 'some text';

InvalidRequest: Error from server: code=2200 [Invalid query]
  message="CUSTOM index requires specifying the index class"

> CREATE CUSTOM INDEX idx ON ks.t (v)
  .. USING 'vector_index'
  .. WITH OPTIONS = {'option1': 'value1'}
  .. AND comment = 'some text';

InvalidRequest: Error from server: code=2200 [Invalid query]
  message="You cannot use view properties with a vector index"

> CREATE INDEX idx ON ks.t (v) WITH CLUSTERING ORDER BY (v ASC);

InvalidRequest: Error from server: code=2200 [Invalid query]
  message="Indexes do not allow for specifying the clustering order"
```

and so on. For more examples, see the relevant tests.

References:
[1] https://docs.scylladb.com/manual/branch-2025.4/cql/cql-extensions.html#synchronous-materialized-views
[2] https://docs.scylladb.com/manual/branch-2025.4/cql/secondary-indexes.html#create-index
[3] https://docs.scylladb.com/manual/branch-2025.4/cql/mv.html#mv-options
[4] https://docs.scylladb.com/manual/branch-2025.4/cql/dml/select.html#ordering-clause

Fixes scylladb/scylladb#16454

Backport: not needed. This is an enhancement.

Closes scylladb/scylladb#24977

* github.com:scylladb/scylladb:
  cql3: Extend DESC INDEX by view properties
  cql3: Forbid using CLUSTERING ORDER BY when creating index
  cql3: Extend CREATE INDEX by MV properties
  cql3/statements/create_index_statement: Allow for view options
  cql3/statements/create_index_statement: Rename member
  cql3/statements/index_prop_defs: Re-introduce index_prop_defs
  cql3/statements/property_definitions: Add extract_property()
  cql3/statements/index_prop_defs.cc: Add namespace
  cql3/statements/index_prop_defs.hh: Rename type
  cql3/statements/view_prop_defs.cc: Move validation logic into file
  cql3/statements: Introduce view_prop_defs.{hh,cc}
  cql3/statements/create_view_statement.cc: Move validation of ID
  schema/schema.hh: Do not include index_prop_defs.hh
2026-01-14 09:54:27 +02:00
Avi Kivity
c6dfae5661 treewide: #include Seastar headers with angle brackets
Seastar is an external library from the point of view of
ScyllaDB, so should be included with angle brackets.

Closes scylladb/scylladb#27947
2026-01-13 14:56:15 +02:00
Dawid Mędrek
dcf2c71204 cql3/statements/index_prop_defs.hh: Rename type
We rename the type `index_prop_defs` to `index_specific_prop_defs`.
The rationale for the change is to distinguish between properties
related directly to a index and properties related to the underlying
view (if applicable).

The type `index_prop_defs` will be re-introduced in an upcomming commit
where it'll encompass both index-related and view-related properties.
This is a prerequisite for it.
2025-12-16 11:43:37 +01:00
Amnon Heiman
68c7236acb vector_index: require tablets for vector indexes
This patch enforces that vector indexes can only be created on keyspaces
that use tablets. During index validation, `check_uses_tablets()` verifies
the base keyspace configuration and rejects creation otherwise.

To support this, the `custom_index::validate()` API now receives a
`const data_dictionary::database&` parameter, allowing index
implementations to access keyspace-level settings during DDL validation.

Fixes https://scylladb.atlassian.net/browse/VECTOR-322

Closes scylladb/scylladb#26786
2025-11-26 13:30:43 +02:00
Amnon Heiman
b2c2a99741 index/vector_index.cc: Don't allow zero as an index option
This patch forces vector_index option value to be real-positive numbers
as zero would make no senese.

Fixes https://scylladb.atlassian.net/browse/VECTOR-249

Signed-off-by: Amnon Heiman <amnon@scylladb.com>

Closes scylladb/scylladb#27191
2025-11-25 10:05:44 +02:00
Radosław Cybulski
d589e68642 Add precompiled headers to CMakeLists.txt
Add precompiled header support to CMakeLists.txt and configure.py -
it improves compilation time by approximately 10%.

New header `stdafx.hh` is added, don't include it manually -
the compiler will include it for you. The header contains includes from
external libraries used by Scylla - seastar, standard library,
linux headers and zlib.

The feature is enabled by default, use CMake option `Scylla_USE_PRECOMPILED_HEADER`
or configure.py --disable-precompiled-header to disable.

The feature should be disabled, when trying to check headers - otherwise
you might get false negatives on missing includes from seastar / abseil and so on.

Note: following configuration needs to be added to ccache.conf:

    sloppiness = pch_defines,time_macros,include_file_mtime,include_file_ctime

Closes scylladb/scylladb#26617
2025-11-21 12:27:41 +02:00
Michał Hudobski
46589bc64c secondary_index: disallow multiple vector indexes on the same column
We currently allow creating multiple vector indexes on one column.
This doesn't make much sense as we do not support picking one when
making ann queries.

To make this less confusing and to make our behavior similar
to Cassandra we disallow the creation of multiple vector indexes
on one column.

We also add a test that checks this behavior.

Fixes: VECTOR-254
Fixes: #26672

Closes scylladb/scylladb#26508
2025-10-29 11:55:38 +02:00
Dawid Mędrek
e294b80615 index: Make create_view_for_index method of create_index_statement 2025-10-20 14:04:16 +02:00
Dawid Mędrek
fe00485491 index: Move code for creating MV of secondary index to cql3
We move the code responsible for creating the schema for the underlying
materialized view of a secondary index from `index/` to `cql3/` so that
it's close to that responsible for performing `CREATE INDEX`. That's in
line with how other CQL statements are designed.

Note that the moved method is still a method of `secondary_index_manager`.
We'll make it a method of `create_index_statement` in the following
commit.
2025-10-20 14:04:11 +02:00
Dawid Mędrek
ecc955fbe0 index/secondary_index_manager: Take std::span instead of std::vector 2025-10-09 16:17:07 +02:00
Dawid Mędrek
074f0f2e4c index/secondary_index_manager: Add missing const qualifier 2025-10-09 16:06:50 +02:00
Dawid Mędrek
7baf95bc4b index/vector_index: Add missing const qualifiers 2025-10-09 16:06:24 +02:00
Ernest Zaslavsky
54aa552af7 treewide: Move type related files to a type directory As requested in #22110, moved the files and fixed other includes and build system.
Moved files:
- duration.hh
- duration.cc
- concrete_types.hh

Fixes: #22110

This is a cleanup, no need to backport

Closes scylladb/scylladb#25088
2025-09-17 17:32:19 +03:00
Nadav Har'El
e322902506 Merge 'index, metrics: add per-index metrics' from Michał Hudobski
This patch adds the possibility to track metrics
per secondary index. Currently, only a histogram
of query latencies is tracked, but more metrics
can be added in the future. To add a new metric,
it needs to be added to the index_metrics struct
in index/secondary_index_manager.hh and then
initialized in index/secondary_index_manager.cc
in the constructor of the index_metrics struct.
The metrics are created when the index is created
and removed when the index is dropped.

First lines of the new metric:
\# HELP scylla_index_query_latencies Index query latencies
\# TYPE scylla_index_query_latencies histogram
scylla_index_query_latencies_sum{idx="test_i_idx",ks="test"} 640
scylla_index_query_latencies_count{idx="test_i_idx",ks="test"} 1
scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="640.000000"} 1
scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="768.000000"} 1

Fixes: https://github.com/scylladb/scylladb/issues/25970

Closes scylladb/scylladb#25995

* github.com:scylladb/scylladb:
  test: verify that the index metric is added
  index, metrics: add per-index metrics
2025-09-17 14:54:12 +03:00
Ernest Zaslavsky
d624413ddd treewide: Move query related files to a new query directory
As requested in #22120, moved the files and fixed other includes and build system.

Moved files:
- query.cc
- query-request.hh
- query-result.hh
- query-result-reader.hh
- query-result-set.cc
- query-result-set.hh
- query-result-writer.hh
- query_id.hh
- query_result_merger.hh

Fixes: #22120

This is a cleanup, no need to backport

Closes scylladb/scylladb#25105
2025-09-16 23:40:47 +03:00
Michał Hudobski
b09d1f0a98 index, metrics: add per-index metrics
This patch adds the possibility to track metrics
per secondary index. Currently, only a histogram
of query latencies is tracked, but more metrics
can be added in the future. To add a new metric,
it needs to be added to the index_metrics struct
in index/secondary_index_manager.hh and then
initialized in index/secondary_index_manager.cc
in the constructor of the index_metrics struct.
The metrics are created when the index is created
and removed when the index is dropped.

First lines of the new metric:
\# HELP scylla_index_query_latencies Index query latencies
\# TYPE scylla_index_query_latencies histogram
scylla_index_query_latencies_sum{idx="test_i_idx",ks="test"} 640
scylla_index_query_latencies_count{idx="test_i_idx",ks="test"} 1
scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="640.000000"} 1
scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="768.000000"} 1
2025-09-16 14:03:43 +02:00
Dawid Pawlik
909a51e524 vector_index, index_prop_defs: add version to index options
Since creating the vector index does not lead to creation
of a view table [#24438] (whose version info had been logged in
`system_schema.scylla_tables`) we lack the information about
the version of the index.

The mentioned version is used to recognize the quick-drop-create
index with the same parameters that needs to be rebuild.

The case is mainly experienced while testing, benchmarking
or experimenting with Vector Search.
Nevertheless it is important to have it considered, as it is really
weird having seen that DROP and CREATE commands did not change
anything.
Although being nice "optimization" to use the same old index,
the rebuild feels more natural for the get-to-know-VS-users.
Should not change anything in a real production environment.

The solution we arrived at is to add the version as a field in
options column of `system_schema.indexes`.
The version of vector index is a base table's schema version
on which the index was created.
The table's schema version changes everytime a table is changed
meaning that CREATE INDEX or DROP INDEX statement also change it.
Every index has a different index version, so it allows to identify
them easily.

This patch implements the solution described above.
2025-09-10 15:16:54 +02:00
Radosław Cybulski
c242234552 Revert "build: add precompiled headers to CMakeLists.txt"
This reverts commit 01bb7b629a.

Closes scylladb/scylladb#25735
2025-09-03 09:46:00 +03:00
Dawid Pawlik
873d7dba5c custom index: rename custom_index_option_name
Renamed `custom_index_option_name` to `custom_class_option_name`
as the late was a bit misleading since we refactored our model
of custom indexes to be index class reliant.
2025-08-29 10:49:15 +02:00
Dawid Pawlik
18e4b9d989 vector_index: rename supported_options to vector_index_options
There are a few types of index options abstraction in a code.
One is `raw_options` which indicates the options provided by the user
via CQL. Another is `options` which includes the real index options
after correction checks and addition of system-set options.
I believe we do not need another abstraction with undescriptive name.

This patch adds a little neatness, describing what should the developer
understand by looking at the `supported_options`.

This options are only provided for the vector index to setup the external
index properly with parameters strongly related to Vector Search.
2025-08-29 10:47:02 +02:00
Radosław Cybulski
01bb7b629a build: add precompiled headers to CMakeLists.txt
Add precompiled header support to CMakeLists.txt and configure.py -
it improves compilation time by approximately 10%.

New header `stdafx.hh` is added, don't include it manually -
the compiler will include it for you. The header contains includes from
external libraries used by Scylla - seastar, standard library,
linux headers and zlib.

The feature is enabled by default, use CMake option `Scylla_USE_PRECOMPILED_HEADER`
or configure.py --disable-precompiled-header to disable.

The feature should be disabled, when trying to check headers - otherwise
you might get false negatives on missing includes from seastar / abseil and so on.

Note: following configuration needs to be added to ccache.conf:

    sloppiness = pch_defines,time_macros

Closes #25182
2025-08-27 21:37:54 +03:00
Nadav Har'El
e2c99436cf Merge 'cdc, vector_search: enable CDC when the index is created' from Dawid Pawlik
When a vector index is created in Scylla, it is initially built using a full scan of the database. After that, it stays up to date by tracking changes through CDC, which should be automatically enabled when the vector index is created.

When a user attempts to enable Vector Search (VS), the system checks whether Change Data Capture (CDC) is enabled and properly configured:

1. CDC is not enabled

    - CDC is automatically enabled with the minimum required TTL (Time-to-Live) for VS (24 hours) and the delta mode set to 'full' or post-image is enabled.

    - If the user later tries to reduce the CDC TTL below 24 hours or set delta mode to 'keys' with post-image disabled, the action fails.

    - Error message: Clearly states that CDC TTL must be at least 24 hours and delta mode must be set to 'full' or post-image must be enabled for VS to function.

2. CDC is already enabled

    - If CDC TTL is ≥ 24 hours and delta mode is set to 'full' or post-image is enabled: VS is enabled successfully.

    - If CDC TTL is < 24 hours or delta mode is set to 'keys' with post-image disabled: The VS enabling process fails.

    - Error message: Informs the user that CDC TTL must be at least 24 hours, delta mode must be set to 'full' or post-image must be enabled, and provides a link to documentation on how to update the TTL, delta mode, and post-image.

When a user attempts to disable CDC when VS is enabled, the action will fail and the user will be informed by error message that clearly states that VS needs to be disabled (vector indexes have to be dropped) first.

Full setup requirements and steps will be detailed in the documentation of Vector Search.

Co-authored-by: @smoczy123

Fixes: VECTOR-27
Fixes: VECTOR-25

Closes scylladb/scylladb#25179

* github.com:scylladb/scylladb:
  test/cqlpy: ensure Vector Search CDC options
  test/boost: adjust CDC boost tests for Vector Search
  test/cql: add Vector Search CDC enable/disable test
  cdc, vector_index: provide minimal option setup for Vector Search
  test/cqlpy: adjust describe table tests with CDC for Vector Search
  describe, cdc: adjust describe for cdc log tables
  cdc: enable CDC log when vector index is created
  test/cqlpy: run vector_index tests only on vnodes
  vector_index: check if vector index exists in schema
2025-08-26 23:01:32 +03:00
Dawid Pawlik
a27eef9f18 cdc, vector_index: provide minimal option setup for Vector Search
Ensure that the CDC used by Vector Search has at least 24h TTL
and delta mode is set to 'full' or postimage is enabled.

This setup is required by the Vector Store to work as intended.
The TTL of at least 24h is a rough estimate of the maximal time
needed for the full scan conducted by Vector Store to finish.
The delta mode set to 'full' or postimage enabled is needed
to read the values of vectors being written to the table,
so Vector Store can save them in the desired external index.

As the default we set TTL = 24h, delta = 'full', postimage = false.
Full delta is preffered option to log the vector values as it is less
costly and does not require additional read on write.
2025-08-20 17:20:20 +02:00
Dawid Pawlik
27ceb85508 vector_index: check if vector index exists in schema
Add `has_vector_index` function to check if an index
on vector column using 'vector_index' custom index
class exists in the schema.

Co-authored-by: Michał Hudobski <michal.hudobski@scylladb.com>
2025-08-20 10:35:55 +02:00
Avi Kivity
352cda4467 treewide: avoid including gms/feature_service.hh from headers
To avoid dependency proliferation, switch to forward declarations.

In one case, we introduce indirection via std::unique_ptr and
deinline the constructor and destructor.

Ref #1

Closes scylladb/scylladb#25584
2025-08-20 10:30:27 +03:00
Dawid Pawlik
b29e6870fa vector_index: make parameter names case insensitive
The custom index class name 'vector_index' and it's similarity function
options should be case insensitive.

Before the patch the similarity functions had to be written in
SCREAMING_SNAKE_CASE which was not commonly and intuitively used.
Furthermore the Cassandra translated tests used the options written in
snake_case and as we wanted to translate them exactly, we had to be able
to use lower case option.
2025-08-01 12:08:50 +02:00
Jan Łakomy
5fecad0ec8 cql3/statements: add ANN OF queries support to select statements
Add parsing of `ANN OF` queries to the `select_statement` and
`indexed_table_select_statement` classes.
Add a placeholder for the implementation of external ANN queries.

Rename `should_create_view` to `view_should_exist` as it is used
not only to check if the view should be created but also if
the view has been created.

Co-authored-by: Dawid Pawlik <dawid.pawlik@scylladb.com>
2025-08-01 12:08:50 +02:00
Michał Hudobski
919cca576f custom_index: do not create view when creating a custom index
Currently we create a view for every index, however
for currently supported custom index classes (vector_index)
that work is redundant, as we store the index in the external
service.

This patch adds a way for custom indexes to choose whether to
create a view when creating the index and makes it so that
for vector indexes the view is not created.
2025-07-07 13:47:07 +02:00
Michał Hudobski
d4002b61dd custom_index: refactor describe for custom indexes
Currently, to describe an index we look at
a corresponding view. However for custom indexes
the view may not exist (as we are removing the views
from vector indexes). This commit adds a way for a custom
index class to override the default describing logic
and provides such an override for the vector_index
class.
2025-07-07 13:47:07 +02:00
Michał Hudobski
5de3adb536 custom_index: remove unneeded duplicate of a static string
We have got a duplicate of the same static string and
the only usage of one of the copies can be easily replaced
2025-07-07 13:47:06 +02:00
Avi Kivity
3363bc41e2 interval: rename start() to start_ref() (and end() etc).
We are about to change start() to return a proxy object rather
than a `const interval_bound<T>&`. This is generally transparent,
except in one case: `auto x = i.start()`. With the current implementation,
we'll copy object referred to and assign it to x. With the planned
implementation, the proxy object will be assigned to `x`, but it
will keep referring to `i`.

To prevent such problems, rename start() to start_ref() and end()
to end_ref(). This forces us to audit all calls, and redirect calls
that will break to new start_copy() and end_copy() methods.
2025-06-14 21:26:16 +03:00
Michał Hudobski
3ab643a5de vector_index: add custom index and vector index classes
In this patch we add an abstract class, "custom_index", with a validate() method.
Each CUSTOM INDEX class needs to implement a concrete subclass of custom_index
which is used to validate if this type of custom index class may be used,
and whether the optional parameters passed to it are valid.

We change the existing CUSTOM INDEX validation code to use this new mechanism.

Finally this patch implements one concrete subclass for vector index.
Before this patch, the custom index type "vector_index" was allowed,
but after this patch it gains more validation of its optional parameters
(we support 4 specific parameters, with some rules on their values).
Of course, the vector index isn't actually implemented in this patch,
we are just improving the validation of the index creation statement.
2025-05-27 21:04:50 +02:00
Nadav Har'El
d2844055ad Merge 'index: implement schema management layer for vector search indexes' from null
This pull request adds support for creating custom indexes (at a metadata level) as long as a supported custom class is provided (currently only vector search).

The patch contains:

- a change in CREATE INDEX statement that allows for the USING keyword to be present as long as one of the supported classes is used
-  support for describing custom indexes in the DESCRIBE statement
- unit tests

Co-authored by: @Balwancia

Closes scylladb/scylladb#23720

* github.com:scylladb/scylladb:
  test/cqlpy: add custom index tests
  index: support storing metadata for custom indices
2025-05-22 12:19:36 +03:00
Michał Hudobski
05daa8dded index: support storing metadata for custom indices
Added function returning custom index class name.
Added printing custom index class name when using DESCRIBE.
Changed validation to reflect current support of indices.
2025-05-14 09:32:00 +02:00
Botond Dénes
b5170e27d0 replica/table: s/make_reader_v2/make_mutation_reader/ 2025-05-09 07:53:29 -04:00
Wojciech Mitros
900687c818 view_info: set base info on construction
Currently, the base_info may or may not be set in view schemas.
Even when it's set, it may be modified. This necessitates extra
checks when handling view schemas, as well as potentially causing
errors when we forget to set it at some point.

Instead, we want to make the base info an immutable member of view
schemas (inside view_info). The first step towards that is making
sure that all newly created schemas have the base info set.
We achieve that by requiring a base schema when constructing a view
schema. Unfortunately, this adds complexity each time we're making
a view schema - we need to get the base schema as well.
In most cases, the base schema is already available. The most
problematic scenario is when we create a schema from mutations:
- when parsing system tables we can get the schema from the
database, as regular tables are parsed before views
- when loading a view schema using the schema loader tool, we need
to load the base additionally to the view schema, effectively
doubling the work
- when pulling the schema from another node - in this case we can
only get the current version of the base schema from the local
database

Additionally, we need to consider the base schema version - when
we generate view updates the version of the base schema used for
reads should match the version of the base schema in view's base
info.
This is achieved by selecting the correct (old or new) schema in
`db::schema_tables::merge_tables_and_views` and using the stored
base schema in the schema_registry.
2025-04-24 01:08:39 +02:00
Kefu Chai
7ff0d7ba98 tree: Remove unused boost headers
This commit eliminates unused boost header includes from the tree.

Removing these unnecessary includes reduces dependencies on the
external Boost.Adapters library, leading to faster compile times
and a slightly cleaner codebase.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22857
2025-02-15 20:32:22 +02:00
Kefu Chai
e218a62a7a cdc,index: replace boost::ends_with() with .ends_with()
since C++20, std::string and std::string_view started providing
`ends_with()` member function, the same applies to `seastar::sstring`,
so there is no need to use `boost::ends_with()` anymore.

in this change, we switch from `boost::ends_with()` to the member
functions variant to

- improve the readability
- reduce the header dependency

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22502
2025-01-29 11:52:55 +03:00
Kefu Chai
e4463b11af treewide: replace boost::algorithm::join() with fmt::join()
Replace usages of `boost::algorithm::join()` with `fmt::join()` to improve
performance and reduce dependency on Boost. `fmt::join()` allows direct
formatting of ranges and tuples with custom separators without creating
intermediate strings.

When formatting comma-separated values into another string, fmt::join()
avoids the overhead of temporary string creation that
`boost::algorithm::join()` requires. This change also helps streamline
our dependencies by leveraging the existing fmt library instead of
Boost.Algorithm.

To avoid the ambiguity, some caller sites were updated to call
`seastar::format()` explicitly.

See also

- boost::algorithm::join():
  https://www.boost.org/doc/libs/1_87_0/doc/html/string_algo/reference.html#doxygen.join_8hpp
- fmt::join():
  https://fmt.dev/11.0/api/#ranges-api

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22082
2025-01-07 12:45:05 +02:00
Pavel Emelyanov
5eb3278d9e api: Use built_views table in get_built_indexes API
Somehow system."IndexInfo" table and column_family/built_indexes REST
API endpoint declare an index "built" at slightly different times:

The former a virtual table which declares an index completely built
when it appears on the system.built_views table.

The latter uses different data -- it takes the list of indexes in
the schema and eliminates indexes which are still listed in the
system.scylla_views_builds_in_progress table.

The mentioned system. tables are updated at different times, so API
notices the change a bit later. It's worth improving the consistency
of these two APIs by making the REST API endpoint piggy-back the
load_built_views() instead of load_view_build_progress(). With that
change the filtering of indexes should be negated.

Fixes #21587

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-12-24 16:18:00 +03:00
Avi Kivity
f3eade2f62 treewide: relicense to ScyllaDB-Source-Available-1.0
Drop the AGPL license in favor of a source-available license.
See the blog post [1] for details.

[1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/
2024-12-18 17:45:13 +02:00