Commit Graph

177 Commits

Author SHA1 Message Date
Nadav Har'El
21ecc12fc6 Merge 'index: fix local vector index locality detection after schema reload' from Michał Hudobski
After schema reload, `target_parser::is_local()` did not recognize the
vector-index local target format `{"pk": [...], "tc": "..."}`, causing
local vector indexes to be treated as global. This broke duplicate
detection when both a global and a local vector index existed on the same
column. Fix by introducing `vector_index::is_local()` and dispatching
to it from `create_index_from_index_row()` based on the index class.
Also adds tests for local/global vector index coexistence.

Fixes: SCYLLADB-987

backport reasoning: we added local vector index support in 2026.1

Closes scylladb/scylladb#29492

* github.com:scylladb/scylladb:
  test/cqlpy: add tests for global and local vector index coexistence
  index: fix local vector index locality detection after schema reload
2026-05-27 15:34:57 +03:00
Nadav Har'El
96dd3121e7 Merge 'cql: rewrite CassIO SAI metadata index to regular secondary index' from Szymon Wasik
CassIO (the library backing LangChain's `langchain_community.vectorstores.Cassandra` integration) issues the following DDL during schema setup to create a metadata index:

```sql
CREATE CUSTOM INDEX IF NOT EXISTS eidx_metadata_s_<table>
ON <keyspace>.<table> (ENTRIES(metadata_s))
USING 'org.apache.cassandra.index.sai.StorageAttachedIndex';
```

ScyllaDB does not support Cassandra's StorageAttachedIndex (SAI) for non-vector columns and previously rejected this statement with:

```
StorageAttachedIndex (SAI) is only supported on vector columns; use a secondary index for non-vector columns
```

This blocks seamless migration of existing LangChain/CassIO applications from Cassandra to ScyllaDB — applications fail during initialization before any application-level workaround can run, even when metadata filtering is not used (`metadata_indexing="none"`).

CassIO is no longer actively maintained but remains the only official LangChain integration path for Apache Cassandra over CQL, meaning existing applications will continue using this setup pattern.

Instead of rejecting the CassIO metadata-map SAI DDL, detect the pattern and rewrite it to a standard ScyllaDB secondary index on collection entries:

- **Detection**: SAI class name + single `ENTRIES` target on a non-frozen `map` column
- **Rewrite**: Clear the custom class so the index is created through the standard secondary index path (which already fully supports indexing map entries)
- **Warning**: Emit a CQL warning informing the user that SAI is not supported by ScyllaDB, a regular secondary index was created instead, and metadata filtering behavior may differ from Cassandra SAI

The rewrite is placed early in `validate_while_executing()`, before the rf-rack-validity check, so the standard secondary index code path handles all subsequent validation naturally — no code duplication.

After this change, the CassIO schema setup succeeds on ScyllaDB:
- `CREATE CUSTOM INDEX ... USING 'sai'` on `ENTRIES(metadata_s)` creates a real secondary index
- The index is functional and can accelerate metadata filtering queries
- A CQL warning makes the rewrite transparent to operators
- SAI on non-vector, non-map-entries columns is still rejected as before
- Vector SAI indexes continue to be rewritten to `vector_index` as before

- `test_sai_entries_on_map_creates_regular_index` — verifies the index is created and the warning is emitted (fully-qualified SAI class name)
- `test_sai_entries_on_map_short_name` — same with the `'sai'` short alias
- `test_sai_on_regular_column_rejected` — confirms SAI on regular scalar columns is still rejected

All 148 tests in `test_vector_index.py` and `test_secondary_index.py` pass with no regressions (125 passed, 22 xfailed, 1 skipped).

Fixes: SCYLLADB-2113
Backport: 2026.2 as this is the version where the support for SAI class needed by LangChain was added.

Closes scylladb/scylladb#29981

* github.com:scylladb/scylladb:
  cql: rewrite CassIO SAI metadata index to regular secondary index
  db/config: add enable_cassio_compatibility flag
2026-05-26 00:19:03 +03:00
Michał Hudobski
1d17d2144f index, vector_index: limit primary key columns to 255
The vector-store's InvariantKey type supports at most 255 key
components. Reject vector index creation when the base table's
primary key (partition + clustering columns) exceeds this limit.

Fixes: VECTOR-553

Closes scylladb/scylladb#29317
2026-05-25 19:24:17 +03:00
Szymon Wasik
5ee339b11d cql: rewrite CassIO SAI metadata index to regular secondary index
When CassIO creates a SAI ENTRIES index on a map column,
ScyllaDB now rewrites it to a regular secondary index and emits
a CQL warning. This allows LangChain/CassIO applications to work
without DDL errors.

The rewrite is gated behind the enable_cassio_compatibility flag
(disabled by default).

Refs: SCYLLADB-2113
2026-05-25 15:11:43 +02:00
Michał Hudobski
cf372ba87b index: fix local vector index locality detection after schema reload
When index metadata was deserialized from system tables during schema
reload, target_parser::is_local() failed to recognize local vector
indexes. It only handled the non-vector JSON format {"pk": [...],
"ck": [...]}, but vector indexes serialize their targets as
{"pk": [...], "tc": "..."}. As a result, every local vector index was
incorrectly marked as global after a schema reload.

Fix this by introducing vector_index::is_local() that recognizes the
vector-specific target format, and dispatching to it from the schema
deserialization code based on the index class name. This keeps
target_parser as secondary-index-specific and follows the same dispatch
pattern already used for target serialization.

Also remove the now-unused has_vector_index_on_column() helper (its
callers were removed by #29407).
2026-05-21 10:35:48 +02:00
Dawid Pawlik
a631123c06 external_index: fix require CDC options for disabled CDC
Since we want to remove the requirement of disallowing "explicitly disabled"
CDC table when creating external index (#29894), we still need to check other
CDC required parameters to be set properly.

Before this commit, once we auto-enable CDC which was "explicitly disabled",
we would never run the `check_cdc_options()`.
This patch adjusts the check to happen not only when the CDC enabled is true.
2026-05-19 08:53:15 +02:00
Dawid Pawlik
9e02e11ea8 fulltext_index: enforce CDC requirements for fulltext indexes
Fulltext indexes rely on CDC to track changes for asynchronous index
building. Enforce the following CDC constraints during CREATE INDEX:
- CDC TTL must be at least 86400 seconds (24 hours)
- CDC delta mode must be 'full' or postimage must be enabled

Add `has_fulltext_index()` and `check_cdc_options()` so that other
modules can detect fulltext indexes and validate CDC settings:
- include fulltext indexes in `cdc_enabled()` so the CDC log
  is auto-created, and validate CDC options in
  `on_before_update_column_family()`
- block `ALTER TABLE ... WITH cdc = {'enabled': false}`
  when a fulltext index exists on the table
2026-05-19 08:52:47 +02:00
Dawid Pawlik
69dc62c373 fulltext_index: require tablet storage for fulltext indexes
Fulltext indexes, like vector indexes, require the base table's
keyspace to use tablets. Add `check_uses_tablets()` validation to
`fulltext_index::validate()` that rejects index creation when the
keyspace does not use tablet storage.

Also add `skip_without_tablets` fixture to all existing fulltext index
tests so they are skipped in environments where tablets are not
available.
2026-05-19 08:52:47 +02:00
Dawid Pawlik
61d658106a index: introduce external_index base class for VS/FTS indexes
Add `external_index` as a common base for `vector_index` and `fulltext_index`,
both of which are backed by an external Vector Store engine and share CDC
requirements.
2026-05-19 08:52:47 +02:00
Dawid Pawlik
c2d27d1a50 index: remove Chinese, Japanese, and Korean language analyzers
Remove "chinese", "japanese", and "korean" from the list of accepted
full-text search analyzer options. Exposing these options commits
ScyllaDB to supporting them long-term — if we ever switch from one
backend search engine to another, CJK analyzers are the most likely
to lose out-of-the-box support, unlike the popular European languages
that are broadly available across text analysis libraries.

Restrict the accepted set now, while FTS is still new, to avoid a
future compatibility burden.

Add a test to check if the CJK language analyzer options are rejected.

Fixes: VECTOR-672

Closes scylladb/scylladb#29877
2026-05-18 18:20:47 +03:00
Dawid Pawlik
2076164af9 index: unify custom index description
Move common description logic into a protected helper
`describe_with_target` on `custom_index`, so subclasses can delegate
to it when implementing the `describe()` virtual method.
2026-05-08 11:30:08 +02:00
Dawid Pawlik
fcd15b5cd4 index: add fulltext_index custom index implementation
Introduce `fulltext_index`, a new `custom_index` subclass
for full-text search (FTS).

The index validates that the target column is a text type
(text, varchar, or ascii) and supports two WITH OPTIONS keys:
- 'analyzer': one of standard, english, german, french, spanish,
  italian, portuguese, russian, chinese, japanese, korean, simple,
  whitespace
- 'positions': boolean controlling whether term positions are stored

`view_should_exist()` returns false — no backing materialized view is
created, matching the CDC-backed pattern used by `vector_index`.

Fixes: SCYLLADB-1517
2026-05-08 11:30:08 +02:00
Dawid Pawlik
a396129e5c index: extract option validation helpers
Move `validate_enumerated_option`, `validate_positive_option`,
and `validate_factor_option` into shared index option utilities
under the `secondary_index::util` namespace. These functions were
previously defined as file-local statics in `vector_index.cc` with
hardcoded index names in error messages.

The shared versions take `index_type_name` as a parameter, allowing
each `custom_index` subclass to pass its own name via the virtual
`index_type_name()` method at the call site. The options maps use
`std::bind_front` to bind config params (supported values, limits),
leaving `index_name` as the first unbound argument passed by
`check_index_options()`.

Add `index_type_name()` as a pure virtual method on `custom_index`.
Move the shared utility implementations into `index_option_utils.cc`
and update `vector_index.cc` to use them.
2026-05-08 11:28:39 +02:00
Nadav Har'El
1eb8d170dd Merge 'vector_index: allow recreating vector indexes on the same column' from Dawid Pawlik
This series allows creating multiple vector indexes on the same column so users can rebuild an index without losing query availability.

The intended flow is:
1. Create a new vector index on a column that already has one.
2. Keep serving ANN queries from the old index while the new one is being built.
3. Verify the new index is ready.
4. Automatically switch to the remaining index.
5. Drop the old index.

To make that deterministic, `index_version` is changed from the base table schema version to a real creation timeuuid. When multiple vector indexes exist on the same column, ANN query planning now picks the index according to the routing implemented in Vector Store (newest serving index). This keeps queries on the old index until it the new one is up and ready.

This patch also removes the create-time restriction that rejected a second vector index on the same column. Name collisions are still rejected as before.

Test coverage is updated accordingly:
- Scylla now verifies that two vector indexes can coexist on the same column.
- Cassandra/SAI behavior is still covered and is still expected to reject duplicate indexes on the same column.

Fixes: VECTOR-610

Closes scylladb/scylladb#29407

* github.com:scylladb/scylladb:
  docs: document vector index metadata and duplicate handling
  test/cqlpy: cover vector index duplicate creation rules
  vector_index: allow multiple named indexes on one column
  vector_index: store `index_version` as creation timeuuid
2026-04-15 14:40:15 +03:00
Avi Kivity
0ae22a09d4 LICENSE: Update to version 1.1
Updated terms of non-commercial use (must be a never-customer).
2026-04-12 19:46:33 +03:00
Dawid Pawlik
2dd8eef38c vector_index: store index_version as creation timeuuid
Vector indexes currently store the base table schema version in
`index_version`. That value is name-based, not time-based,
so it does not represent when the index was created.

Store a timeuuid instead and change the relevant interfaces from
`table_schema_version` to `utils::UUID`. This is a prerequisite
for supporting multiple vector indexes on the same column where
the oldest index must be selected deterministically via routing
implemented in Vector Store.

Update the cqlpy tests to check the new semantics directly:
recreating the index changes `index_version`, while ALTER TABLE does not.
2026-04-10 13:05:21 +02:00
Szymon Wasik
573def7cd8 cql: accept source_model option and show options in DESCRIBE
Accept the Cassandra SAI 'source_model' option for vector indexes.
This option is used by Cassandra libraries (e.g., CassIO, LangChain)
to tag vector indexes with the name of the embedding model that
produced the vectors.

ScyllaDB does not use the source_model value but stores it and
includes it in the DESCRIBE INDEX output for Cassandra compatibility.

Additionally, extend vector_index::describe() to emit a
WITH OPTIONS = {...} clause containing all user-provided index options
(filtering out system keys: target, class_name, index_version).
This makes options like similarity_function, source_model, etc.
visible in DESCRIBE output.
2026-04-09 17:20:03 +02:00
Szymon Wasik
80a2e4a0ab cql: add Cassandra SAI (StorageAttachedIndex) compatibility
Libraries such as CassIO, LangChain, and LlamaIndex create vector
indexes using Cassandra's StorageAttachedIndex (SAI) class name.
This commit lets ScyllaDB accept these statements without modification.

When a CREATE CUSTOM INDEX statement specifies an SAI class name on a
vector column, ScyllaDB automatically rewrites it to the native
vector_index implementation. Accepted class names (case-insensitive):
  - org.apache.cassandra.index.sai.StorageAttachedIndex
  - StorageAttachedIndex
  - sai

SAI on non-vector columns is rejected with a clear error directing
users to a secondary index instead.

The SAI detection and rewriting logic is extracted into a dedicated
static function (maybe_rewrite_sai_to_vector_index) to keep the
already-long validate_while_executing method manageable.

Multi-column (local index) targets and nonexistent columns are
skipped with continue — the former are treated as filtering columns
by vector_index::check_target(), and the latter are caught later by
vector_index::validate().

Tests that exercise features common to both backends (basic creation,
similarity_function, IF NOT EXISTS, bad options, etc.) now use the
SAI class name with the skip_on_scylla_vnodes fixture so they run
against both ScyllaDB and Cassandra. ScyllaDB-specific tests continue
to use USING 'vector_index' with scylla_only.
2026-04-09 17:20:03 +02:00
Karol Nowacki
493a4433e7 index: fix DESC INDEX for vector index
The `DESC INDEX` command returned incorrect results for local vector
indexes and for vector indexes that included filtering columns.

This patch corrects the implementation to ensure `DESCRIBE INDEX`
accurately reflects the index configuration.

This was a pre-existing issue, not a regression from recent
serialization schema changes for vector index target options.
2026-03-30 16:46:48 +02:00
Karol Nowacki
6bc88e817f vector_search: fix SELECT on local vector index
Queries against local vector indexes were failing with the error:
"ANN ordering by vector requires the column to be indexed using 'vector_index'"

This was a regression introduced by 15788c3734, which incorrectly
assumed the first column in the targets list is always the vector column.
For local vector indexes, the first column is the partition key, causing
the failure.

Previously, serialization logic for the target index option was shared
between vector and secondary indexes. This is no longer viable due to
the introduction of local vector indexes and vector indexes with filtering
columns, which have  different target format.

This commit introduces a dedicated JSON-based serialization format for
vector index targets, identifying the target column (tc), filtering
columns (fc), and partition key columns (pk). This ensures unambiguous
serialization and deserialization for all vector index types.

This change is backward compatible for regular vector indexes. However,
it breaks compatibility for local vector indexes and vector indexes with
filtering columns created in version 2026.1.0. To mitigate this, usage
of these specific index types will be blocked in the 2026.1.0 release
by failing ANN queries against them in vector-store service.

Fixes: SCYLLADB-895
2026-03-30 16:46:48 +02:00
Karol Nowacki
30487e8854 index: fix vector index with filtering target column
The secondary index mechanism is currently used to determine the target column.
This mechanism works incorrectly for vector indexes with filtering because
it returns the last specified column as the target (vectors) column.
However, the syntax for a vector index requires the first column to be the target:
```
CREATE CUSTOM INDEX ON t(vectors, users) USING 'vector_index';
```

This discrepancy eventually leads to the following exception when performing an
ANN search on a vector index with filtering columns:
````
ANN ordering by vector requires the column to be indexed using 'vector_index'
````

This commit fixes the issue by introducing dedicated logic for vector indexes
to correctly identify the target(vectors) column.

Fixes: SCYLLADB-635

Closes scylladb/scylladb#28740
2026-03-02 18:47:58 +02:00
Michał Hudobski
579ed6f19f secondary_index_manager: fix double registration bug
We have observed a bug that caused Scylla to crash due to metrics double
registration. This bug is really difficult to reproduce and was seen
only once in the wild. We think that it may be caused by a request
in-flight keeping a reference to the stats object, making it not
deregister when the index is dropped, which casues a double registration
when we recreate the index, however we are not 100% sure.

This patch makes it so the metrics always get deregistered when we drop the index, which should fix the double registration bug.

Fixes: #27252

Closes scylladb/scylladb#28655
2026-02-26 09:39:53 +01:00
Pawel Pery
f49c9e896a vector_search: allow full secondary indexes syntax while creating the vector index
Vector Search feature needs to support creating vector indexes with additional
filtering column. There will be two types of indexes: global which indexes
vectors per table, and local which indexes vectors per partition key. The new
syntaxes are based on ScyllaDB's Global Secondary Index and Local Secondary
Index. Vector indexes don't use secondary indexes functionalities in any way -
all indexing, filtering and processing data will be done on Vector Store side.

This patch allows creating vector indexes using this CQL syntax:

```
CREATE TABLE IF NOT EXISTS cycling.comments_vs (
  commenter text,
  comment text,
  comment_vector VECTOR <FLOAT, 5>,
  created_at timestamp,
  discussion_board_id int,
  country text,
  lang text,
  PRIMARY KEY ((commenter, discussion_board_id), created_at)
);

CREATE CUSTOM INDEX IF NOT EXISTS global_ann_index
  ON cycling.comments_vs(comment_vector, country, lang) USING 'vector_index'
  WITH OPTIONS = { 'similarity_function': 'DOT_PRODUCT' };

CREATE CUSTOM INDEX IF NOT EXISTS local_ann_index
  ON cycling.comments_vs((commenter, discussion_board_id), comment_vector, country, lang)
  USING 'vector_index'
  WITH OPTIONS = { 'similarity_function': 'DOT_PRODUCT' };
```

Currently, if we run these queries to create indexes we will receive such errors:

```
InvalidRequest: Error from server: code=2200 [Invalid query] message="Vector index can only be created on a single column"
InvalidRequest: Error from server: code=2200 [Invalid query] message="Local index definition must contain full partition key only. Redundant column: XYZ"
```

This commit refactors `vector_index::check_target` to correctly validate
columns building the index. Vector-store currently support filtering by native
types, so the type of columns is checked. The first column from the list must
be a vector (to build index based on these vectors), so it is also checked.

Allowed types for columns are native types without counter (it is not possible
to create a table with counter and vector) and without duration (it is not
possible to correctly compare durations, this type is even not allowed in
secondary indexes).

This commits adds cqlpy test to check errors while creating indexes.

Fixes: SCYLLADB-298

This needs to be backported to version 2026.1 as this is a fix for filtering support.

Closes scylladb/scylladb#28366
2026-01-30 01:14:31 +02:00
Szymon Malewski
c89957b725 vector_index: rescoring: Add hidden similarity score column
Rescoring consist of recalculating similarity score and reordering results based on it.
In this patch we add calculation of similarity score as a hidden (non-serialized) column and following patch will add reordering.
Normal ordering uses `add_column_for_post_processing`, however this works only for regular columns, not function.
So we create it together with user requested columns (this also forces the use of `selection_with_processing`) and hide the column later.
This also requires special handling for 'SELECT *' case - we need to manually add all columns before adding similarity column.

In case user already asks for similarity score in the SELECT clause, this value will be calculated twice - is should be optimized in future patches.
2026-01-22 15:38:40 +01:00
Szymon Malewski
c5945b1ef4 vector_index: introduce rescoring option
This patch adds vector index option allowing to enable rescoring - recalculation of similarity metric and re-ranking of quantized VS candidates.
Quantization is a necessary condition to run rescoring - checked in convenience function `is_rescoring_enabled`.
Rescoring itself is not implemented - it will come in following patches.

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-294
2026-01-20 21:01:45 +01:00
Szymon Malewski
262a8cef0b vector_index: improve options validation
In this patch we enhance validation of option by:
- giving context (option name) in error messages
- listing supported values in error messages of enumerated options
- avoiding using templates

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-293
Follow-up: https://github.com/scylladb/scylladb/pull/27677
2026-01-20 21:01:41 +01:00
Nadav Har'El
70b3cd0540 Merge 'vector_index: introduce quantization and oversampling options' from Szymon Malewski
This patch adds vector index options allowing to enable quantization and oversampling.
Specific quantization value will be used internally by vector store.

In the current implementation, get_oversampling allows us to decide how many times more candidates
to retrieve from vector store - final response is still trimmed to the given limit.
It is a first step to allow rescoring - recalculation of similarity metric and re-ranking.
Without rescoring oversampling will be also further optimized to happen internally in vector store.

`test/vector_search/rescoring_test.cc` implements basic tests of added functionality.
New options are documented in `docs/cql/secondary-indexes.rst`.

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-82
Ref https://scylladb.atlassian.net/browse/SCYLLADB-83

New feature - no backporting

Closes scylladb/scylladb#27677

* github.com:scylladb/scylladb:
  vector_search: doc: Document new index options
  vector_search: test: Test oversampling
  vector_search: test: Add rescoring index options test
  vector_search: test: Extract Configure utility to shared header
  vector_index: introduce `quantization` and `oversampling` options
2026-01-20 08:50:46 +02:00
Botond Dénes
e01041d3ee db/system_keyspace: move remining tables out of v3 keyspace
The last remining tables in the v3 keyspace are those that are genuinely
distinct -- added by Cassandra 3.0 or >= ScyllaDB 2.0.
Move these out of the v3 keyspace too, with this the v3 keyspace is
defunct and removed.
2026-01-19 12:32:21 +02:00
Szymon Malewski
b8e91ee6ae vector_index: introduce quantization and oversampling options
This patch adds vector index options allowing to enable quantization and oversampling.
Specific quantization value will be used internally by vector store.

In the current implementation, `get_oversampling` allows us to decide how many times more candidates
to retrieve from vector store - final response is still trimmed to the given limit.
It is a first step to allow rescoring - recalculation of similarity metric and re-ranking.
Without rescoring oversampling will be also further optimized to happen internally in vector store.

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-82
Ref https://scylladb.atlassian.net/browse/SCYLLADB-83
2026-01-19 10:21:43 +01:00
Botond Dénes
122b7847e5 Merge 'index: Accept view properties in CREATE INDEX' from Dawid Mędrek
Problem
-------
Secondary indexes are implemented via materialized views under the
hood. The way an index behaves is determined by the configuration
of the view. Currently, it can be modified by performing the CQL
statement `ALTER MATERIALIZED VIEW` on it. However, that raises some
concerns.

Consider, for instance, the following scenario:

1. The user creates a secondary index on a table.
2. In parallel, the user performs writes to the base table.
3. The user modifies the underlying materialized view, e.g. by setting
   the `synchronous_updates` to `true` [1].

Some of the writes that happened before step 3 used the default value
of the property (which is `false`). That had an actual consequence
on what happened later on: the view updates were performed
asynchronously. Only after step 3 had finished did it change.

Unfortunately, as of now, there is no way to avoid a situation like
that. Whenever the user wants to configure a secondary index they're
creating, they need to do it in another schema change. Since it's
not always possible to control how the database is manipulated in
the meantime, it leads to problems like the one described.

That's not all, though. The fact that it's not possible to configure
secondary indexes is inconsistent with other schema entities. When
it comes to tables or materialized views, the user always have a means
to set some or even all of the properties during their creation.

Solution
--------
The solution to this problem is extending the `CREATE INDEX` CQL
statement by view properties. The syntax is of form:

```
> CREATE INDEX <index name>
> .. ON <keyspace>.<table> (<columns>)
> .. WITH <properties>
```

where `<properties>` corresponds to both index-specific and view
properties [2, 3]. View properties can only be used with indexes
implemented with materialized views; for example, it will be impossible
to create a vector index when specifying any view property (see
examples below).

When a view property is provided, it will be applied when creating the
underlying materialized view. The behavior should be similar to how
other CQL statements responsible for creating schema entities work.

High-level implementation strategy
----------------------------------
1. Make auxiliary changes.
2. Introduce data structures representing the new set of index
   properties: both index-specific and those corresponding to the
   underlying view.
3. Extend `CREATE INDEX` to accept view properties.
4. Extend `DESCRIBE INDEX` and other `DESCRIBE` statements to include
   view properties in their output.

User documentation is also updated at the steps to reflect the
corresponding changes.

Implementation considerations
-----------------------------
There are a number of schema properties that are now obsolete. They're
accepted by other CQL statements, but they have no effect. They
include:

* `index_interval`
* `replicate_on_write`
* `populate_io_cache_on_flush`
* `read_repair_chance`
* `dclocal_read_repair_chance`

If the user tries to create a secondary index specifying any of those
keywords, the statement will fail with an appropriate error (see
examples below).

Unlike materialized views, we forbid specifying the clustering order
when creating a secondary index [4]. This limitation may be lifted
later on, but it's a detail that may or may not prove troublesome. It's
better to postpone covering it to when we have a better perspective on
the consequences it would bring.

Examples
--------
Good examples
```
> CREATE INDEX idx ON ks.t (v);
> CREATE INDEX idx ON ks.t (v) WITH comment = 'ok view property';
> CREATE INDEX idx ON ks.t (v)
  .. WITH comment = 'multiple view properties are ok'
  .. AND synchronous_updates = true;
> CREATE INDEX idx ON ks.t (v)
  .. WITH comment = 'default value ok'
  .. AND synchronous_updates = false;
```

Bad examples
```
> CREATE INDEX idx ON ks.t (v) WITH replicate_on_write = true;

SyntaxException: Unknown property 'replicate_on_write'

> CREATE INDEX idx ON ks.t (v)
  .. WITH OPTIONS = {'option1': 'value1'}
  .. AND comment = 'some text';

InvalidRequest: Error from server: code=2200 [Invalid query]
  message="Cannot specify options for a non-CUSTOM index"

> CREATE CUSTOM INDEX idx ON ks.t (v)
  .. WITH OPTIONS = {'option1': 'value1'}
  .. AND comment = 'some text';

InvalidRequest: Error from server: code=2200 [Invalid query]
  message="CUSTOM index requires specifying the index class"

> CREATE CUSTOM INDEX idx ON ks.t (v)
  .. USING 'vector_index'
  .. WITH OPTIONS = {'option1': 'value1'}
  .. AND comment = 'some text';

InvalidRequest: Error from server: code=2200 [Invalid query]
  message="You cannot use view properties with a vector index"

> CREATE INDEX idx ON ks.t (v) WITH CLUSTERING ORDER BY (v ASC);

InvalidRequest: Error from server: code=2200 [Invalid query]
  message="Indexes do not allow for specifying the clustering order"
```

and so on. For more examples, see the relevant tests.

References:
[1] https://docs.scylladb.com/manual/branch-2025.4/cql/cql-extensions.html#synchronous-materialized-views
[2] https://docs.scylladb.com/manual/branch-2025.4/cql/secondary-indexes.html#create-index
[3] https://docs.scylladb.com/manual/branch-2025.4/cql/mv.html#mv-options
[4] https://docs.scylladb.com/manual/branch-2025.4/cql/dml/select.html#ordering-clause

Fixes scylladb/scylladb#16454

Backport: not needed. This is an enhancement.

Closes scylladb/scylladb#24977

* github.com:scylladb/scylladb:
  cql3: Extend DESC INDEX by view properties
  cql3: Forbid using CLUSTERING ORDER BY when creating index
  cql3: Extend CREATE INDEX by MV properties
  cql3/statements/create_index_statement: Allow for view options
  cql3/statements/create_index_statement: Rename member
  cql3/statements/index_prop_defs: Re-introduce index_prop_defs
  cql3/statements/property_definitions: Add extract_property()
  cql3/statements/index_prop_defs.cc: Add namespace
  cql3/statements/index_prop_defs.hh: Rename type
  cql3/statements/view_prop_defs.cc: Move validation logic into file
  cql3/statements: Introduce view_prop_defs.{hh,cc}
  cql3/statements/create_view_statement.cc: Move validation of ID
  schema/schema.hh: Do not include index_prop_defs.hh
2026-01-14 09:54:27 +02:00
Avi Kivity
c6dfae5661 treewide: #include Seastar headers with angle brackets
Seastar is an external library from the point of view of
ScyllaDB, so should be included with angle brackets.

Closes scylladb/scylladb#27947
2026-01-13 14:56:15 +02:00
Dawid Mędrek
dcf2c71204 cql3/statements/index_prop_defs.hh: Rename type
We rename the type `index_prop_defs` to `index_specific_prop_defs`.
The rationale for the change is to distinguish between properties
related directly to a index and properties related to the underlying
view (if applicable).

The type `index_prop_defs` will be re-introduced in an upcomming commit
where it'll encompass both index-related and view-related properties.
This is a prerequisite for it.
2025-12-16 11:43:37 +01:00
Amnon Heiman
68c7236acb vector_index: require tablets for vector indexes
This patch enforces that vector indexes can only be created on keyspaces
that use tablets. During index validation, `check_uses_tablets()` verifies
the base keyspace configuration and rejects creation otherwise.

To support this, the `custom_index::validate()` API now receives a
`const data_dictionary::database&` parameter, allowing index
implementations to access keyspace-level settings during DDL validation.

Fixes https://scylladb.atlassian.net/browse/VECTOR-322

Closes scylladb/scylladb#26786
2025-11-26 13:30:43 +02:00
Amnon Heiman
b2c2a99741 index/vector_index.cc: Don't allow zero as an index option
This patch forces vector_index option value to be real-positive numbers
as zero would make no senese.

Fixes https://scylladb.atlassian.net/browse/VECTOR-249

Signed-off-by: Amnon Heiman <amnon@scylladb.com>

Closes scylladb/scylladb#27191
2025-11-25 10:05:44 +02:00
Radosław Cybulski
d589e68642 Add precompiled headers to CMakeLists.txt
Add precompiled header support to CMakeLists.txt and configure.py -
it improves compilation time by approximately 10%.

New header `stdafx.hh` is added, don't include it manually -
the compiler will include it for you. The header contains includes from
external libraries used by Scylla - seastar, standard library,
linux headers and zlib.

The feature is enabled by default, use CMake option `Scylla_USE_PRECOMPILED_HEADER`
or configure.py --disable-precompiled-header to disable.

The feature should be disabled, when trying to check headers - otherwise
you might get false negatives on missing includes from seastar / abseil and so on.

Note: following configuration needs to be added to ccache.conf:

    sloppiness = pch_defines,time_macros,include_file_mtime,include_file_ctime

Closes scylladb/scylladb#26617
2025-11-21 12:27:41 +02:00
Michał Hudobski
46589bc64c secondary_index: disallow multiple vector indexes on the same column
We currently allow creating multiple vector indexes on one column.
This doesn't make much sense as we do not support picking one when
making ann queries.

To make this less confusing and to make our behavior similar
to Cassandra we disallow the creation of multiple vector indexes
on one column.

We also add a test that checks this behavior.

Fixes: VECTOR-254
Fixes: #26672

Closes scylladb/scylladb#26508
2025-10-29 11:55:38 +02:00
Dawid Mędrek
e294b80615 index: Make create_view_for_index method of create_index_statement 2025-10-20 14:04:16 +02:00
Dawid Mędrek
fe00485491 index: Move code for creating MV of secondary index to cql3
We move the code responsible for creating the schema for the underlying
materialized view of a secondary index from `index/` to `cql3/` so that
it's close to that responsible for performing `CREATE INDEX`. That's in
line with how other CQL statements are designed.

Note that the moved method is still a method of `secondary_index_manager`.
We'll make it a method of `create_index_statement` in the following
commit.
2025-10-20 14:04:11 +02:00
Dawid Mędrek
ecc955fbe0 index/secondary_index_manager: Take std::span instead of std::vector 2025-10-09 16:17:07 +02:00
Dawid Mędrek
074f0f2e4c index/secondary_index_manager: Add missing const qualifier 2025-10-09 16:06:50 +02:00
Dawid Mędrek
7baf95bc4b index/vector_index: Add missing const qualifiers 2025-10-09 16:06:24 +02:00
Ernest Zaslavsky
54aa552af7 treewide: Move type related files to a type directory As requested in #22110, moved the files and fixed other includes and build system.
Moved files:
- duration.hh
- duration.cc
- concrete_types.hh

Fixes: #22110

This is a cleanup, no need to backport

Closes scylladb/scylladb#25088
2025-09-17 17:32:19 +03:00
Nadav Har'El
e322902506 Merge 'index, metrics: add per-index metrics' from Michał Hudobski
This patch adds the possibility to track metrics
per secondary index. Currently, only a histogram
of query latencies is tracked, but more metrics
can be added in the future. To add a new metric,
it needs to be added to the index_metrics struct
in index/secondary_index_manager.hh and then
initialized in index/secondary_index_manager.cc
in the constructor of the index_metrics struct.
The metrics are created when the index is created
and removed when the index is dropped.

First lines of the new metric:
\# HELP scylla_index_query_latencies Index query latencies
\# TYPE scylla_index_query_latencies histogram
scylla_index_query_latencies_sum{idx="test_i_idx",ks="test"} 640
scylla_index_query_latencies_count{idx="test_i_idx",ks="test"} 1
scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="640.000000"} 1
scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="768.000000"} 1

Fixes: https://github.com/scylladb/scylladb/issues/25970

Closes scylladb/scylladb#25995

* github.com:scylladb/scylladb:
  test: verify that the index metric is added
  index, metrics: add per-index metrics
2025-09-17 14:54:12 +03:00
Ernest Zaslavsky
d624413ddd treewide: Move query related files to a new query directory
As requested in #22120, moved the files and fixed other includes and build system.

Moved files:
- query.cc
- query-request.hh
- query-result.hh
- query-result-reader.hh
- query-result-set.cc
- query-result-set.hh
- query-result-writer.hh
- query_id.hh
- query_result_merger.hh

Fixes: #22120

This is a cleanup, no need to backport

Closes scylladb/scylladb#25105
2025-09-16 23:40:47 +03:00
Michał Hudobski
b09d1f0a98 index, metrics: add per-index metrics
This patch adds the possibility to track metrics
per secondary index. Currently, only a histogram
of query latencies is tracked, but more metrics
can be added in the future. To add a new metric,
it needs to be added to the index_metrics struct
in index/secondary_index_manager.hh and then
initialized in index/secondary_index_manager.cc
in the constructor of the index_metrics struct.
The metrics are created when the index is created
and removed when the index is dropped.

First lines of the new metric:
\# HELP scylla_index_query_latencies Index query latencies
\# TYPE scylla_index_query_latencies histogram
scylla_index_query_latencies_sum{idx="test_i_idx",ks="test"} 640
scylla_index_query_latencies_count{idx="test_i_idx",ks="test"} 1
scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="640.000000"} 1
scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="768.000000"} 1
2025-09-16 14:03:43 +02:00
Dawid Pawlik
909a51e524 vector_index, index_prop_defs: add version to index options
Since creating the vector index does not lead to creation
of a view table [#24438] (whose version info had been logged in
`system_schema.scylla_tables`) we lack the information about
the version of the index.

The mentioned version is used to recognize the quick-drop-create
index with the same parameters that needs to be rebuild.

The case is mainly experienced while testing, benchmarking
or experimenting with Vector Search.
Nevertheless it is important to have it considered, as it is really
weird having seen that DROP and CREATE commands did not change
anything.
Although being nice "optimization" to use the same old index,
the rebuild feels more natural for the get-to-know-VS-users.
Should not change anything in a real production environment.

The solution we arrived at is to add the version as a field in
options column of `system_schema.indexes`.
The version of vector index is a base table's schema version
on which the index was created.
The table's schema version changes everytime a table is changed
meaning that CREATE INDEX or DROP INDEX statement also change it.
Every index has a different index version, so it allows to identify
them easily.

This patch implements the solution described above.
2025-09-10 15:16:54 +02:00
Radosław Cybulski
c242234552 Revert "build: add precompiled headers to CMakeLists.txt"
This reverts commit 01bb7b629a.

Closes scylladb/scylladb#25735
2025-09-03 09:46:00 +03:00
Dawid Pawlik
873d7dba5c custom index: rename custom_index_option_name
Renamed `custom_index_option_name` to `custom_class_option_name`
as the late was a bit misleading since we refactored our model
of custom indexes to be index class reliant.
2025-08-29 10:49:15 +02:00
Dawid Pawlik
18e4b9d989 vector_index: rename supported_options to vector_index_options
There are a few types of index options abstraction in a code.
One is `raw_options` which indicates the options provided by the user
via CQL. Another is `options` which includes the real index options
after correction checks and addition of system-set options.
I believe we do not need another abstraction with undescriptive name.

This patch adds a little neatness, describing what should the developer
understand by looking at the `supported_options`.

This options are only provided for the vector index to setup the external
index properly with parameters strongly related to Vector Search.
2025-08-29 10:47:02 +02:00
Radosław Cybulski
01bb7b629a build: add precompiled headers to CMakeLists.txt
Add precompiled header support to CMakeLists.txt and configure.py -
it improves compilation time by approximately 10%.

New header `stdafx.hh` is added, don't include it manually -
the compiler will include it for you. The header contains includes from
external libraries used by Scylla - seastar, standard library,
linux headers and zlib.

The feature is enabled by default, use CMake option `Scylla_USE_PRECOMPILED_HEADER`
or configure.py --disable-precompiled-header to disable.

The feature should be disabled, when trying to check headers - otherwise
you might get false negatives on missing includes from seastar / abseil and so on.

Note: following configuration needs to be added to ccache.conf:

    sloppiness = pch_defines,time_macros

Closes #25182
2025-08-27 21:37:54 +03:00