scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-30 19:46:48 +00:00

Author	SHA1	Message	Date
Nadav Har'El	21ecc12fc6	Merge 'index: fix local vector index locality detection after schema reload' from Michał Hudobski After schema reload, `target_parser::is_local()` did not recognize the vector-index local target format `{"pk": [...], "tc": "..."}`, causing local vector indexes to be treated as global. This broke duplicate detection when both a global and a local vector index existed on the same column. Fix by introducing `vector_index::is_local()` and dispatching to it from `create_index_from_index_row()` based on the index class. Also adds tests for local/global vector index coexistence. Fixes: SCYLLADB-987 backport reasoning: we added local vector index support in 2026.1 Closes scylladb/scylladb#29492 * github.com:scylladb/scylladb: test/cqlpy: add tests for global and local vector index coexistence index: fix local vector index locality detection after schema reload	2026-05-27 15:34:57 +03:00
Nadav Har'El	96dd3121e7	Merge 'cql: rewrite CassIO SAI metadata index to regular secondary index' from Szymon Wasik CassIO (the library backing LangChain's `langchain_community.vectorstores.Cassandra` integration) issues the following DDL during schema setup to create a metadata index: ```sql CREATE CUSTOM INDEX IF NOT EXISTS eidx_metadata_s_<table> ON <keyspace>.<table> (ENTRIES(metadata_s)) USING 'org.apache.cassandra.index.sai.StorageAttachedIndex'; ``` ScyllaDB does not support Cassandra's StorageAttachedIndex (SAI) for non-vector columns and previously rejected this statement with: ``` StorageAttachedIndex (SAI) is only supported on vector columns; use a secondary index for non-vector columns ``` This blocks seamless migration of existing LangChain/CassIO applications from Cassandra to ScyllaDB — applications fail during initialization before any application-level workaround can run, even when metadata filtering is not used (`metadata_indexing="none"`). CassIO is no longer actively maintained but remains the only official LangChain integration path for Apache Cassandra over CQL, meaning existing applications will continue using this setup pattern. Instead of rejecting the CassIO metadata-map SAI DDL, detect the pattern and rewrite it to a standard ScyllaDB secondary index on collection entries: - Detection: SAI class name + single `ENTRIES` target on a non-frozen `map` column - Rewrite: Clear the custom class so the index is created through the standard secondary index path (which already fully supports indexing map entries) - Warning: Emit a CQL warning informing the user that SAI is not supported by ScyllaDB, a regular secondary index was created instead, and metadata filtering behavior may differ from Cassandra SAI The rewrite is placed early in `validate_while_executing()`, before the rf-rack-validity check, so the standard secondary index code path handles all subsequent validation naturally — no code duplication. 
After this change, the CassIO schema setup succeeds on ScyllaDB: - `CREATE CUSTOM INDEX ... USING 'sai'` on `ENTRIES(metadata_s)` creates a real secondary index - The index is functional and can accelerate metadata filtering queries - A CQL warning makes the rewrite transparent to operators - SAI on non-vector, non-map-entries columns is still rejected as before - Vector SAI indexes continue to be rewritten to `vector_index` as before - `test_sai_entries_on_map_creates_regular_index` — verifies the index is created and the warning is emitted (fully-qualified SAI class name) - `test_sai_entries_on_map_short_name` — same with the `'sai'` short alias - `test_sai_on_regular_column_rejected` — confirms SAI on regular scalar columns is still rejected All 148 tests in `test_vector_index.py` and `test_secondary_index.py` pass with no regressions (125 passed, 22 xfailed, 1 skipped). Fixes: SCYLLADB-2113 Backport: 2026.2 as this is the version where the support for SAI class needed by LangChain was added. Closes scylladb/scylladb#29981 * github.com:scylladb/scylladb: cql: rewrite CassIO SAI metadata index to regular secondary index db/config: add enable_cassio_compatibility flag	2026-05-26 00:19:03 +03:00
Michał Hudobski	1d17d2144f	index, vector_index: limit primary key columns to 255 The vector-store's InvariantKey type supports at most 255 key components. Reject vector index creation when the base table's primary key (partition + clustering columns) exceeds this limit. Fixes: VECTOR-553 Closes scylladb/scylladb#29317	2026-05-25 19:24:17 +03:00
Szymon Wasik	5ee339b11d	cql: rewrite CassIO SAI metadata index to regular secondary index When CassIO creates a SAI ENTRIES index on a map column, ScyllaDB now rewrites it to a regular secondary index and emits a CQL warning. This allows LangChain/CassIO applications to work without DDL errors. The rewrite is gated behind the enable_cassio_compatibility flag (disabled by default). Refs: SCYLLADB-2113	2026-05-25 15:11:43 +02:00
Michał Hudobski	cf372ba87b	index: fix local vector index locality detection after schema reload When index metadata was deserialized from system tables during schema reload, target_parser::is_local() failed to recognize local vector indexes. It only handled the non-vector JSON format {"pk": [...], "ck": [...]}, but vector indexes serialize their targets as {"pk": [...], "tc": "..."}. As a result, every local vector index was incorrectly marked as global after a schema reload. Fix this by introducing vector_index::is_local() that recognizes the vector-specific target format, and dispatching to it from the schema deserialization code based on the index class name. This keeps target_parser as secondary-index-specific and follows the same dispatch pattern already used for target serialization. Also remove the now-unused has_vector_index_on_column() helper (its callers were removed by #29407).	2026-05-21 10:35:48 +02:00
Dawid Pawlik	a631123c06	external_index: fix require CDC options for disabled CDC Although we want to stop rejecting tables with "explicitly disabled" CDC when creating an external index (#29894), we still need to check that the other required CDC parameters are set properly. Before this commit, once we auto-enabled CDC that was "explicitly disabled", `check_cdc_options()` would never run. This patch makes the check run not only when CDC enabled is true.	2026-05-19 08:53:15 +02:00
Dawid Pawlik	9e02e11ea8	fulltext_index: enforce CDC requirements for fulltext indexes Fulltext indexes rely on CDC to track changes for asynchronous index building. Enforce the following CDC constraints during CREATE INDEX: - CDC TTL must be at least 86400 seconds (24 hours) - CDC delta mode must be 'full' or postimage must be enabled Add `has_fulltext_index()` and `check_cdc_options()` so that other modules can detect fulltext indexes and validate CDC settings: - include fulltext indexes in `cdc_enabled()` so the CDC log is auto-created, and validate CDC options in `on_before_update_column_family()` - block `ALTER TABLE ... WITH cdc = {'enabled': false}` when a fulltext index exists on the table	2026-05-19 08:52:47 +02:00
Dawid Pawlik	69dc62c373	fulltext_index: require tablet storage for fulltext indexes Fulltext indexes, like vector indexes, require the base table's keyspace to use tablets. Add `check_uses_tablets()` validation to `fulltext_index::validate()` that rejects index creation when the keyspace does not use tablet storage. Also add `skip_without_tablets` fixture to all existing fulltext index tests so they are skipped in environments where tablets are not available.	2026-05-19 08:52:47 +02:00
Dawid Pawlik	61d658106a	index: introduce `external_index` base class for VS/FTS indexes Add `external_index` as a common base for `vector_index` and `fulltext_index`, both of which are backed by an external Vector Store engine and share CDC requirements.	2026-05-19 08:52:47 +02:00
Dawid Pawlik	c2d27d1a50	index: remove Chinese, Japanese, and Korean language analyzers Remove "chinese", "japanese", and "korean" from the list of accepted full-text search analyzer options. Exposing these options commits ScyllaDB to supporting them long-term — if we ever switch from one backend search engine to another, CJK analyzers are the most likely to lose out-of-the-box support, unlike the popular European languages that are broadly available across text analysis libraries. Restrict the accepted set now, while FTS is still new, to avoid a future compatibility burden. Add a test to check if the CJK language analyzer options are rejected. Fixes: VECTOR-672 Closes scylladb/scylladb#29877	2026-05-18 18:20:47 +03:00
Dawid Pawlik	2076164af9	index: unify custom index description Move common description logic into a protected helper `describe_with_target` on `custom_index`, so subclasses can delegate to it when implementing the `describe()` virtual method.	2026-05-08 11:30:08 +02:00
Dawid Pawlik	fcd15b5cd4	index: add `fulltext_index` custom index implementation Introduce `fulltext_index`, a new `custom_index` subclass for full-text search (FTS). The index validates that the target column is a text type (text, varchar, or ascii) and supports two WITH OPTIONS keys: - 'analyzer': one of standard, english, german, french, spanish, italian, portuguese, russian, chinese, japanese, korean, simple, whitespace - 'positions': boolean controlling whether term positions are stored `view_should_exist()` returns false — no backing materialized view is created, matching the CDC-backed pattern used by `vector_index`. Fixes: SCYLLADB-1517	2026-05-08 11:30:08 +02:00
Dawid Pawlik	a396129e5c	index: extract option validation helpers Move `validate_enumerated_option`, `validate_positive_option`, and `validate_factor_option` into shared index option utilities under the `secondary_index::util` namespace. These functions were previously defined as file-local statics in `vector_index.cc` with hardcoded index names in error messages. The shared versions take `index_type_name` as a parameter, allowing each `custom_index` subclass to pass its own name via the virtual `index_type_name()` method at the call site. The options maps use `std::bind_front` to bind config params (supported values, limits), leaving `index_name` as the first unbound argument passed by `check_index_options()`. Add `index_type_name()` as a pure virtual method on `custom_index`. Move the shared utility implementations into `index_option_utils.cc` and update `vector_index.cc` to use them.	2026-05-08 11:28:39 +02:00
Nadav Har'El	1eb8d170dd	Merge 'vector_index: allow recreating vector indexes on the same column' from Dawid Pawlik This series allows creating multiple vector indexes on the same column so users can rebuild an index without losing query availability. The intended flow is: 1. Create a new vector index on a column that already has one. 2. Keep serving ANN queries from the old index while the new one is being built. 3. Verify the new index is ready. 4. Automatically switch to the remaining index. 5. Drop the old index. To make that deterministic, `index_version` is changed from the base table schema version to a real creation timeuuid. When multiple vector indexes exist on the same column, ANN query planning now picks the index according to the routing implemented in Vector Store (newest serving index). This keeps queries on the old index until the new one is up and ready. This patch also removes the create-time restriction that rejected a second vector index on the same column. Name collisions are still rejected as before. Test coverage is updated accordingly: - Scylla now verifies that two vector indexes can coexist on the same column. - Cassandra/SAI behavior is still covered and is still expected to reject duplicate indexes on the same column. Fixes: VECTOR-610 Closes scylladb/scylladb#29407 * github.com:scylladb/scylladb: docs: document vector index metadata and duplicate handling test/cqlpy: cover vector index duplicate creation rules vector_index: allow multiple named indexes on one column vector_index: store `index_version` as creation timeuuid	2026-04-15 14:40:15 +03:00
Avi Kivity	0ae22a09d4	LICENSE: Update to version 1.1 Updated terms of non-commercial use (must be a never-customer).	2026-04-12 19:46:33 +03:00
Dawid Pawlik	2dd8eef38c	vector_index: store `index_version` as creation timeuuid Vector indexes currently store the base table schema version in `index_version`. That value is name-based, not time-based, so it does not represent when the index was created. Store a timeuuid instead and change the relevant interfaces from `table_schema_version` to `utils::UUID`. This is a prerequisite for supporting multiple vector indexes on the same column where the oldest index must be selected deterministically via routing implemented in Vector Store. Update the cqlpy tests to check the new semantics directly: recreating the index changes `index_version`, while ALTER TABLE does not.	2026-04-10 13:05:21 +02:00
Szymon Wasik	573def7cd8	cql: accept source_model option and show options in DESCRIBE Accept the Cassandra SAI 'source_model' option for vector indexes. This option is used by Cassandra libraries (e.g., CassIO, LangChain) to tag vector indexes with the name of the embedding model that produced the vectors. ScyllaDB does not use the source_model value but stores it and includes it in the DESCRIBE INDEX output for Cassandra compatibility. Additionally, extend vector_index::describe() to emit a WITH OPTIONS = {...} clause containing all user-provided index options (filtering out system keys: target, class_name, index_version). This makes options like similarity_function, source_model, etc. visible in DESCRIBE output.	2026-04-09 17:20:03 +02:00
Szymon Wasik	80a2e4a0ab	cql: add Cassandra SAI (StorageAttachedIndex) compatibility Libraries such as CassIO, LangChain, and LlamaIndex create vector indexes using Cassandra's StorageAttachedIndex (SAI) class name. This commit lets ScyllaDB accept these statements without modification. When a CREATE CUSTOM INDEX statement specifies an SAI class name on a vector column, ScyllaDB automatically rewrites it to the native vector_index implementation. Accepted class names (case-insensitive): - org.apache.cassandra.index.sai.StorageAttachedIndex - StorageAttachedIndex - sai SAI on non-vector columns is rejected with a clear error directing users to a secondary index instead. The SAI detection and rewriting logic is extracted into a dedicated static function (maybe_rewrite_sai_to_vector_index) to keep the already-long validate_while_executing method manageable. Multi-column (local index) targets and nonexistent columns are skipped with continue — the former are treated as filtering columns by vector_index::check_target(), and the latter are caught later by vector_index::validate(). Tests that exercise features common to both backends (basic creation, similarity_function, IF NOT EXISTS, bad options, etc.) now use the SAI class name with the skip_on_scylla_vnodes fixture so they run against both ScyllaDB and Cassandra. ScyllaDB-specific tests continue to use USING 'vector_index' with scylla_only.	2026-04-09 17:20:03 +02:00
Karol Nowacki	493a4433e7	index: fix DESC INDEX for vector index The `DESC INDEX` command returned incorrect results for local vector indexes and for vector indexes that included filtering columns. This patch corrects the implementation to ensure `DESCRIBE INDEX` accurately reflects the index configuration. This was a pre-existing issue, not a regression from recent serialization schema changes for vector index target options.	2026-03-30 16:46:48 +02:00
Karol Nowacki	6bc88e817f	vector_search: fix SELECT on local vector index Queries against local vector indexes were failing with the error: "ANN ordering by vector requires the column to be indexed using 'vector_index'" This was a regression introduced by `15788c3734`, which incorrectly assumed the first column in the targets list is always the vector column. For local vector indexes, the first column is the partition key, causing the failure. Previously, serialization logic for the target index option was shared between vector and secondary indexes. This is no longer viable due to the introduction of local vector indexes and vector indexes with filtering columns, which have different target format. This commit introduces a dedicated JSON-based serialization format for vector index targets, identifying the target column (tc), filtering columns (fc), and partition key columns (pk). This ensures unambiguous serialization and deserialization for all vector index types. This change is backward compatible for regular vector indexes. However, it breaks compatibility for local vector indexes and vector indexes with filtering columns created in version 2026.1.0. To mitigate this, usage of these specific index types will be blocked in the 2026.1.0 release by failing ANN queries against them in vector-store service. Fixes: SCYLLADB-895	2026-03-30 16:46:48 +02:00
Karol Nowacki	30487e8854	index: fix vector index with filtering target column The secondary index mechanism is currently used to determine the target column. This mechanism works incorrectly for vector indexes with filtering because it returns the last specified column as the target (vectors) column. However, the syntax for a vector index requires the first column to be the target: ``` CREATE CUSTOM INDEX ON t(vectors, users) USING 'vector_index'; ``` This discrepancy eventually leads to the following exception when performing an ANN search on a vector index with filtering columns: ```` ANN ordering by vector requires the column to be indexed using 'vector_index' ```` This commit fixes the issue by introducing dedicated logic for vector indexes to correctly identify the target(vectors) column. Fixes: SCYLLADB-635 Closes scylladb/scylladb#28740	2026-03-02 18:47:58 +02:00
Michał Hudobski	579ed6f19f	secondary_index_manager: fix double registration bug We have observed a bug that caused Scylla to crash due to metrics double registration. This bug is really difficult to reproduce and was seen only once in the wild. We think that it may be caused by an in-flight request keeping a reference to the stats object, so that it is not deregistered when the index is dropped, which causes a double registration when we recreate the index; however, we are not 100% sure. This patch makes it so the metrics always get deregistered when we drop the index, which should fix the double registration bug. Fixes: #27252 Closes scylladb/scylladb#28655	2026-02-26 09:39:53 +01:00
Pawel Pery	f49c9e896a	vector_search: allow full secondary indexes syntax while creating the vector index Vector Search feature needs to support creating vector indexes with additional filtering column. There will be two types of indexes: global which indexes vectors per table, and local which indexes vectors per partition key. The new syntaxes are based on ScyllaDB's Global Secondary Index and Local Secondary Index. Vector indexes don't use secondary indexes functionalities in any way - all indexing, filtering and processing data will be done on Vector Store side. This patch allows creating vector indexes using this CQL syntax: ``` CREATE TABLE IF NOT EXISTS cycling.comments_vs ( commenter text, comment text, comment_vector VECTOR <FLOAT, 5>, created_at timestamp, discussion_board_id int, country text, lang text, PRIMARY KEY ((commenter, discussion_board_id), created_at) ); CREATE CUSTOM INDEX IF NOT EXISTS global_ann_index ON cycling.comments_vs(comment_vector, country, lang) USING 'vector_index' WITH OPTIONS = { 'similarity_function': 'DOT_PRODUCT' }; CREATE CUSTOM INDEX IF NOT EXISTS local_ann_index ON cycling.comments_vs((commenter, discussion_board_id), comment_vector, country, lang) USING 'vector_index' WITH OPTIONS = { 'similarity_function': 'DOT_PRODUCT' }; ``` Currently, if we run these queries to create indexes we will receive such errors: ``` InvalidRequest: Error from server: code=2200 [Invalid query] message="Vector index can only be created on a single column" InvalidRequest: Error from server: code=2200 [Invalid query] message="Local index definition must contain full partition key only. Redundant column: XYZ" ``` This commit refactors `vector_index::check_target` to correctly validate columns building the index. Vector-store currently support filtering by native types, so the type of columns is checked. The first column from the list must be a vector (to build index based on these vectors), so it is also checked. 
Allowed types for columns are native types without counter (it is not possible to create a table with counter and vector) and without duration (it is not possible to correctly compare durations; this type is not even allowed in secondary indexes). This commit adds a cqlpy test to check errors while creating indexes. Fixes: SCYLLADB-298 This needs to be backported to version 2026.1 as this is a fix for filtering support. Closes scylladb/scylladb#28366	2026-01-30 01:14:31 +02:00
Szymon Malewski	c89957b725	vector_index: rescoring: Add hidden similarity score column Rescoring consists of recalculating the similarity score and reordering results based on it. In this patch we add calculation of the similarity score as a hidden (non-serialized) column; a following patch will add reordering. Normal ordering uses `add_column_for_post_processing`, but this works only for regular columns, not functions. So we create it together with the user-requested columns (this also forces the use of `selection_with_processing`) and hide the column later. This also requires special handling for the 'SELECT *' case - we need to manually add all columns before adding the similarity column. If the user already asks for the similarity score in the SELECT clause, this value will be calculated twice - it should be optimized in future patches.	2026-01-22 15:38:40 +01:00
Szymon Malewski	c5945b1ef4	vector_index: introduce rescoring option This patch adds vector index option allowing to enable rescoring - recalculation of similarity metric and re-ranking of quantized VS candidates. Quantization is a necessary condition to run rescoring - checked in convenience function `is_rescoring_enabled`. Rescoring itself is not implemented - it will come in following patches. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-294	2026-01-20 21:01:45 +01:00
Szymon Malewski	262a8cef0b	vector_index: improve options validation In this patch we enhance validation of option by: - giving context (option name) in error messages - listing supported values in error messages of enumerated options - avoiding using templates Fixes https://scylladb.atlassian.net/browse/SCYLLADB-293 Follow-up: https://github.com/scylladb/scylladb/pull/27677	2026-01-20 21:01:41 +01:00
Nadav Har'El	70b3cd0540	Merge 'vector_index: introduce `quantization` and `oversampling` options' from Szymon Malewski This patch adds vector index options allowing to enable quantization and oversampling. Specific quantization value will be used internally by vector store. In the current implementation, get_oversampling allows us to decide how many times more candidates to retrieve from vector store - final response is still trimmed to the given limit. It is a first step to allow rescoring - recalculation of similarity metric and re-ranking. Without rescoring oversampling will be also further optimized to happen internally in vector store. `test/vector_search/rescoring_test.cc` implements basic tests of added functionality. New options are documented in `docs/cql/secondary-indexes.rst`. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-82 Ref https://scylladb.atlassian.net/browse/SCYLLADB-83 New feature - no backporting Closes scylladb/scylladb#27677 * github.com:scylladb/scylladb: vector_search: doc: Document new index options vector_search: test: Test oversampling vector_search: test: Add rescoring index options test vector_search: test: Extract Configure utility to shared header vector_index: introduce `quantization` and `oversampling` options	2026-01-20 08:50:46 +02:00
Botond Dénes	e01041d3ee	db/system_keyspace: move remaining tables out of v3 keyspace The last remaining tables in the v3 keyspace are those that are genuinely distinct -- added by Cassandra 3.0 or >= ScyllaDB 2.0. Move these out of the v3 keyspace too; with this, the v3 keyspace is defunct and removed.	2026-01-19 12:32:21 +02:00
Szymon Malewski	b8e91ee6ae	vector_index: introduce `quantization` and `oversampling` options This patch adds vector index options allowing to enable quantization and oversampling. Specific quantization value will be used internally by vector store. In the current implementation, `get_oversampling` allows us to decide how many times more candidates to retrieve from vector store - final response is still trimmed to the given limit. It is a first step to allow rescoring - recalculation of similarity metric and re-ranking. Without rescoring oversampling will be also further optimized to happen internally in vector store. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-82 Ref https://scylladb.atlassian.net/browse/SCYLLADB-83	2026-01-19 10:21:43 +01:00
Botond Dénes	122b7847e5	Merge 'index: Accept view properties in CREATE INDEX' from Dawid Mędrek Problem ------- Secondary indexes are implemented via materialized views under the hood. The way an index behaves is determined by the configuration of the view. Currently, it can be modified by performing the CQL statement `ALTER MATERIALIZED VIEW` on it. However, that raises some concerns. Consider, for instance, the following scenario: 1. The user creates a secondary index on a table. 2. In parallel, the user performs writes to the base table. 3. The user modifies the underlying materialized view, e.g. by setting the `synchronous_updates` to `true` [1]. Some of the writes that happened before step 3 used the default value of the property (which is `false`). That had an actual consequence on what happened later on: the view updates were performed asynchronously. Only after step 3 had finished did it change. Unfortunately, as of now, there is no way to avoid a situation like that. Whenever the user wants to configure a secondary index they're creating, they need to do it in another schema change. Since it's not always possible to control how the database is manipulated in the meantime, it leads to problems like the one described. That's not all, though. The fact that it's not possible to configure secondary indexes is inconsistent with other schema entities. When it comes to tables or materialized views, the user always have a means to set some or even all of the properties during their creation. Solution -------- The solution to this problem is extending the `CREATE INDEX` CQL statement by view properties. The syntax is of form: ``` > CREATE INDEX <index name> > .. ON <keyspace>.<table> (<columns>) > .. WITH <properties> ``` where `<properties>` corresponds to both index-specific and view properties [2, 3]. 
View properties can only be used with indexes implemented with materialized views; for example, it will be impossible to create a vector index when specifying any view property (see examples below). When a view property is provided, it will be applied when creating the underlying materialized view. The behavior should be similar to how other CQL statements responsible for creating schema entities work. High-level implementation strategy ---------------------------------- 1. Make auxiliary changes. 2. Introduce data structures representing the new set of index properties: both index-specific and those corresponding to the underlying view. 3. Extend `CREATE INDEX` to accept view properties. 4. Extend `DESCRIBE INDEX` and other `DESCRIBE` statements to include view properties in their output. User documentation is also updated at the steps to reflect the corresponding changes. Implementation considerations ----------------------------- There are a number of schema properties that are now obsolete. They're accepted by other CQL statements, but they have no effect. They include: * `index_interval` * `replicate_on_write` * `populate_io_cache_on_flush` * `read_repair_chance` * `dclocal_read_repair_chance` If the user tries to create a secondary index specifying any of those keywords, the statement will fail with an appropriate error (see examples below). Unlike materialized views, we forbid specifying the clustering order when creating a secondary index [4]. This limitation may be lifted later on, but it's a detail that may or may not prove troublesome. It's better to postpone covering it to when we have a better perspective on the consequences it would bring. Examples -------- Good examples ``` > CREATE INDEX idx ON ks.t (v); > CREATE INDEX idx ON ks.t (v) WITH comment = 'ok view property'; > CREATE INDEX idx ON ks.t (v) .. WITH comment = 'multiple view properties are ok' .. AND synchronous_updates = true; > CREATE INDEX idx ON ks.t (v) .. 
WITH comment = 'default value ok' .. AND synchronous_updates = false; ``` Bad examples ``` > CREATE INDEX idx ON ks.t (v) WITH replicate_on_write = true; SyntaxException: Unknown property 'replicate_on_write' > CREATE INDEX idx ON ks.t (v) .. WITH OPTIONS = {'option1': 'value1'} .. AND comment = 'some text'; InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot specify options for a non-CUSTOM index" > CREATE CUSTOM INDEX idx ON ks.t (v) .. WITH OPTIONS = {'option1': 'value1'} .. AND comment = 'some text'; InvalidRequest: Error from server: code=2200 [Invalid query] message="CUSTOM index requires specifying the index class" > CREATE CUSTOM INDEX idx ON ks.t (v) .. USING 'vector_index' .. WITH OPTIONS = {'option1': 'value1'} .. AND comment = 'some text'; InvalidRequest: Error from server: code=2200 [Invalid query] message="You cannot use view properties with a vector index" > CREATE INDEX idx ON ks.t (v) WITH CLUSTERING ORDER BY (v ASC); InvalidRequest: Error from server: code=2200 [Invalid query] message="Indexes do not allow for specifying the clustering order" ``` and so on. For more examples, see the relevant tests. References: [1] https://docs.scylladb.com/manual/branch-2025.4/cql/cql-extensions.html#synchronous-materialized-views [2] https://docs.scylladb.com/manual/branch-2025.4/cql/secondary-indexes.html#create-index [3] https://docs.scylladb.com/manual/branch-2025.4/cql/mv.html#mv-options [4] https://docs.scylladb.com/manual/branch-2025.4/cql/dml/select.html#ordering-clause Fixes scylladb/scylladb#16454 Backport: not needed. This is an enhancement. 
Closes scylladb/scylladb#24977 * github.com:scylladb/scylladb: cql3: Extend DESC INDEX by view properties cql3: Forbid using CLUSTERING ORDER BY when creating index cql3: Extend CREATE INDEX by MV properties cql3/statements/create_index_statement: Allow for view options cql3/statements/create_index_statement: Rename member cql3/statements/index_prop_defs: Re-introduce index_prop_defs cql3/statements/property_definitions: Add extract_property() cql3/statements/index_prop_defs.cc: Add namespace cql3/statements/index_prop_defs.hh: Rename type cql3/statements/view_prop_defs.cc: Move validation logic into file cql3/statements: Introduce view_prop_defs.{hh,cc} cql3/statements/create_view_statement.cc: Move validation of ID schema/schema.hh: Do not include index_prop_defs.hh	2026-01-14 09:54:27 +02:00
Avi Kivity	c6dfae5661	treewide: #include Seastar headers with angle brackets Seastar is an external library from the point of view of ScyllaDB, so should be included with angle brackets. Closes scylladb/scylladb#27947	2026-01-13 14:56:15 +02:00
Dawid Mędrek	dcf2c71204	cql3/statements/index_prop_defs.hh: Rename type We rename the type `index_prop_defs` to `index_specific_prop_defs`. The rationale for the change is to distinguish between properties related directly to an index and properties related to the underlying view (if applicable). The type `index_prop_defs` will be re-introduced in an upcoming commit where it'll encompass both index-related and view-related properties. This is a prerequisite for it.	2025-12-16 11:43:37 +01:00
Amnon Heiman	68c7236acb	vector_index: require tablets for vector indexes This patch enforces that vector indexes can only be created on keyspaces that use tablets. During index validation, `check_uses_tablets()` verifies the base keyspace configuration and rejects creation otherwise. To support this, the `custom_index::validate()` API now receives a `const data_dictionary::database&` parameter, allowing index implementations to access keyspace-level settings during DDL validation. Fixes https://scylladb.atlassian.net/browse/VECTOR-322 Closes scylladb/scylladb#26786	2025-11-26 13:30:43 +02:00
Amnon Heiman	b2c2a99741	index/vector_index.cc: Don't allow zero as an index option This patch forces vector_index option values to be strictly positive numbers, as zero would make no sense. Fixes https://scylladb.atlassian.net/browse/VECTOR-249 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Closes scylladb/scylladb#27191	2025-11-25 10:05:44 +02:00
Radosław Cybulski	d589e68642	Add precompiled headers to CMakeLists.txt Add precompiled header support to CMakeLists.txt and configure.py - it improves compilation time by approximately 10%. New header `stdafx.hh` is added, don't include it manually - the compiler will include it for you. The header contains includes from external libraries used by Scylla - seastar, standard library, linux headers and zlib. The feature is enabled by default, use CMake option `Scylla_USE_PRECOMPILED_HEADER` or configure.py --disable-precompiled-header to disable. The feature should be disabled, when trying to check headers - otherwise you might get false negatives on missing includes from seastar / abseil and so on. Note: following configuration needs to be added to ccache.conf: sloppiness = pch_defines,time_macros,include_file_mtime,include_file_ctime Closes scylladb/scylladb#26617	2025-11-21 12:27:41 +02:00
Michał Hudobski	46589bc64c	secondary_index: disallow multiple vector indexes on the same column We currently allow creating multiple vector indexes on one column. This doesn't make much sense as we do not support picking one when making ann queries. To make this less confusing and to make our behavior similar to Cassandra we disallow the creation of multiple vector indexes on one column. We also add a test that checks this behavior. Fixes: VECTOR-254 Fixes: #26672 Closes scylladb/scylladb#26508	2025-10-29 11:55:38 +02:00
Dawid Mędrek	e294b80615	index: Make `create_view_for_index` method of `create_index_statement`	2025-10-20 14:04:16 +02:00
Dawid Mędrek	fe00485491	index: Move code for creating MV of secondary index to cql3 We move the code responsible for creating the schema for the underlying materialized view of a secondary index from `index/` to `cql3/` so that it's close to that responsible for performing `CREATE INDEX`. That's in line with how other CQL statements are designed. Note that the moved method is still a method of `secondary_index_manager`. We'll make it a method of `create_index_statement` in the following commit.	2025-10-20 14:04:11 +02:00
Dawid Mędrek	ecc955fbe0	index/secondary_index_manager: Take std::span instead of std::vector	2025-10-09 16:17:07 +02:00
Dawid Mędrek	074f0f2e4c	index/secondary_index_manager: Add missing const qualifier	2025-10-09 16:06:50 +02:00
Dawid Mędrek	7baf95bc4b	index/vector_index: Add missing const qualifiers	2025-10-09 16:06:24 +02:00
Ernest Zaslavsky	54aa552af7	treewide: Move type related files to a `type` directory As requested in #22110 , moved the files and fixed other includes and build system. Moved files: - duration.hh - duration.cc - concrete_types.hh Fixes: #22110 This is a cleanup, no need to backport Closes scylladb/scylladb#25088	2025-09-17 17:32:19 +03:00
Nadav Har'El	e322902506	Merge 'index, metrics: add per-index metrics' from Michał Hudobski This patch adds the possibility to track metrics per secondary index. Currently, only a histogram of query latencies is tracked, but more metrics can be added in the future. To add a new metric, it needs to be added to the index_metrics struct in index/secondary_index_manager.hh and then initialized in index/secondary_index_manager.cc in the constructor of the index_metrics struct. The metrics are created when the index is created and removed when the index is dropped. First lines of the new metric: \# HELP scylla_index_query_latencies Index query latencies \# TYPE scylla_index_query_latencies histogram scylla_index_query_latencies_sum{idx="test_i_idx",ks="test"} 640 scylla_index_query_latencies_count{idx="test_i_idx",ks="test"} 1 scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="640.000000"} 1 scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="768.000000"} 1 Fixes: https://github.com/scylladb/scylladb/issues/25970 Closes scylladb/scylladb#25995 * github.com:scylladb/scylladb: test: verify that the index metric is added index, metrics: add per-index metrics	2025-09-17 14:54:12 +03:00
Ernest Zaslavsky	d624413ddd	treewide: Move query related files to a new `query` directory As requested in #22120, moved the files and fixed other includes and build system. Moved files: - query.cc - query-request.hh - query-result.hh - query-result-reader.hh - query-result-set.cc - query-result-set.hh - query-result-writer.hh - query_id.hh - query_result_merger.hh Fixes: #22120 This is a cleanup, no need to backport Closes scylladb/scylladb#25105	2025-09-16 23:40:47 +03:00
Michał Hudobski	b09d1f0a98	index, metrics: add per-index metrics This patch adds the possibility to track metrics per secondary index. Currently, only a histogram of query latencies is tracked, but more metrics can be added in the future. To add a new metric, it needs to be added to the index_metrics struct in index/secondary_index_manager.hh and then initialized in index/secondary_index_manager.cc in the constructor of the index_metrics struct. The metrics are created when the index is created and removed when the index is dropped. First lines of the new metric: \# HELP scylla_index_query_latencies Index query latencies \# TYPE scylla_index_query_latencies histogram scylla_index_query_latencies_sum{idx="test_i_idx",ks="test"} 640 scylla_index_query_latencies_count{idx="test_i_idx",ks="test"} 1 scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="640.000000"} 1 scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="768.000000"} 1	2025-09-16 14:03:43 +02:00
Dawid Pawlik	909a51e524	vector_index, index_prop_defs: add version to index options Since creating the vector index does not lead to creation of a view table [#24438] (whose version info had been logged in `system_schema.scylla_tables`) we lack the information about the version of the index. The mentioned version is used to recognize a quickly dropped and re-created index with the same parameters that needs to be rebuilt. The case is mainly experienced while testing, benchmarking or experimenting with Vector Search. Nevertheless it is important to have it considered, as it is really weird to see that DROP and CREATE commands did not change anything. Although reusing the same old index is a nice "optimization", the rebuild feels more natural for users getting to know Vector Search. Should not change anything in a real production environment. The solution we arrived at is to add the version as a field in the options column of `system_schema.indexes`. The version of a vector index is the base table's schema version on which the index was created. The table's schema version changes every time the table is changed, meaning that CREATE INDEX or DROP INDEX statements also change it. Every index has a different index version, so it allows to identify them easily. This patch implements the solution described above.	2025-09-10 15:16:54 +02:00
Radosław Cybulski	c242234552	Revert "build: add precompiled headers to CMakeLists.txt" This reverts commit `01bb7b629a`. Closes scylladb/scylladb#25735	2025-09-03 09:46:00 +03:00
Dawid Pawlik	873d7dba5c	custom index: rename `custom_index_option_name` Renamed `custom_index_option_name` to `custom_class_option_name` as the old name was a bit misleading since we refactored our model of custom indexes to be index-class reliant.	2025-08-29 10:49:15 +02:00
Dawid Pawlik	18e4b9d989	vector_index: rename `supported_options` to `vector_index_options` There are a few types of index options abstraction in the code. One is `raw_options` which indicates the options provided by the user via CQL. Another is `options` which includes the real index options after correction checks and addition of system-set options. I believe we do not need another abstraction with an undescriptive name. This patch adds a little neatness, describing what the developer should understand when looking at `supported_options`. These options are only provided for the vector index to set up the external index properly with parameters strongly related to Vector Search.	2025-08-29 10:47:02 +02:00
Radosław Cybulski	01bb7b629a	build: add precompiled headers to CMakeLists.txt Add precompiled header support to CMakeLists.txt and configure.py - it improves compilation time by approximately 10%. New header `stdafx.hh` is added, don't include it manually - the compiler will include it for you. The header contains includes from external libraries used by Scylla - seastar, standard library, linux headers and zlib. The feature is enabled by default, use CMake option `Scylla_USE_PRECOMPILED_HEADER` or configure.py --disable-precompiled-header to disable. The feature should be disabled, when trying to check headers - otherwise you might get false negatives on missing includes from seastar / abseil and so on. Note: following configuration needs to be added to ccache.conf: sloppiness = pch_defines,time_macros Closes #25182	2025-08-27 21:37:54 +03:00


177 Commits