scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-01 12:36:56 +00:00

Author	SHA1	Message	Date
Lakshmi Narayanan Sreethar	7b97928152	cmake: link `vector_search` to `test-lib` instead of `cql3` PR #26237 fixed linker errors by linking `cql3` to `vector_search`, but this introduced a circular dependency between these two static libraries, sometimes causing failures during compilation: ``` ninja: error: dependency cycle: /home/user/Development/scylladb/build/debug/cql3/CqlParser.hpp -> data_dictionary/libdata_dictionary.a -> data_dictionary/CMakeFiles/data_dictionary.dir/data_dictionary.cc.o -> /home/user/Development/scylladb/build/debug/cql3/CqlParser.hpp ``` So, instead of linking the `vector_search` library to the `cql3` library, link it directly to the executable where the `cql3` library is also to be linked. For the test cases, this means linking `vector_search` to the `test-lib` library. Since both `vector_search` and `cql3` are static libraries, the linker will resolve them correctly regardless of the order in which they are linked. Refs #26235 Refs #26237 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#26318	2025-09-29 17:46:58 +03:00
Piotr Dulikowski	3abe6eadce	Merge 'Add CQL documentation for vector queries using SELECT ANN' from Szymon Wasik This PR adds the missing documentation for the SELECT ... ANN statement that allows performing vector queries. This is just the basic explanation of the grammar and how to use it. More comprehensive documentation about vector search will be added separately in Scylla Cloud documentation and features description. Links to this additional documentation will be added as part of VECTOR-244. Fixes: VECTOR-247. No backport is needed as this is the new feature. Closes scylladb/scylladb#26282 * github.com:scylladb/scylladb: cql3: Update error messages to be in line with documentation. docs: Add CQL documentation for vector queries using SELECT ANN	2025-09-29 12:46:55 +02:00
Avi Kivity	5b6570be52	Merge 'db/config: Add SSTable compression options for user tables' from Nikos Dragazis ScyllaDB offers the `compression` DDL property for configuring compression per user table (compression algorithm and chunk size). If not specified, the default compression algorithm is the LZ4Compressor with a 4KiB chunk size. The same default applies to system tables as well. This series introduces a new configuration option to allow customizing the default for user tables. It also adds some tests for the new functionality. Fixes #25195. Closes scylladb/scylladb#26003 * github.com:scylladb/scylladb: test/cluster: Add tests for invalid SSTable compression options test/boost: Add tests for SSTable compression config options main: Validate SSTable compression options from config db/config: Add SSTable compression options for user tables db/config: Prepare compression_parameters for config system compressor: Validate presence of sstable_compression in parameters compressor: Add missing space in exception message	2025-09-28 20:23:23 +03:00
Szymon Wasik	ccfe80ab97	cql3: Update error messages to be in line with documentation. ANN (approximate nearest neighbor) is just the name of the type of algorithm used to perform vector search. For this reason the error messages should refer to vector queries rather than ANN queries.	2025-09-26 17:01:10 +02:00
Nikos Dragazis	e1d9c83406	db/config: Add SSTable compression options for user tables ScyllaDB offers the `compression` DDL property for configuring compression per user table (compression algorithm and chunk size). If not specified, the default compression algorithm is the LZ4Compressor with a 4KiB chunk size (refer to the default constructor for `compression_parameters`). The same default applies to system tables as well. Add a new configuration option to allow customizing the default for user tables. Use the previously hardcoded default as the new option's default value. Note that the option has no effect on ALTER TABLE statements. An altered table either inherits explicit compression options from the CQL statement, or maintains its existing options. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2025-09-26 12:02:00 +03:00
Botond Dénes	86ed627fc4	compaction: move code to namespace compaction The namespace usage in this directory is very inconsistent, with files and classes scattered in: * global namespace * namespace compaction * namespace sstables There are cases where all three are used in the same file. This code used to live in sstables/ and some of it still retains namespace sstables as a heritage of that time. The mismatch between the dir (future module) and the namespace used is confusing, so finish the migration and move all code in compaction/ to namespace compaction too. This patch, although large, is mechanical and only the following kinds of changes are made: * replace namespace sstables {} with namespace compaction {} * add namespace compaction {} * drop/add sstables:: * drop/add compaction:: * move around forward-declarations so they are in the correct namespace context This refactoring revealed some awkward leftover coupling between sstables and compaction, in sstables/sstable_set.cc, where the make_sstable_set() methods of compaction strategies are implemented.	2025-09-25 15:03:56 +03:00
Lakshmi Narayanan Sreethar	690546fa40	cmake: link vector_search library to cql3 library The `indexed_table_select_statement::actually_do_execute()` method references `vector_search::vector_store_client::ann()`, but the `vector_search` library, which provides its definition, is not linked with the `cql3` library. This causes linker errors when other targets are built. For example, linking `comparable_bytes_test`, which links the `types` library that in turn links `cql3`, fails with the following error: ``` ...error: undefined symbol: vector_search::vector_store_client::ann... ``` Fix by adding `vector_search` to the private link libraries of `cql3`. Fixes #26235 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#26237	2025-09-25 11:05:51 +03:00
Ernest Zaslavsky	5ba5aec1f8	treewide: Move mutation related files to a `mutation` directory As requested in #22104, moved the files and fixed other includes and build system. Moved files: - combine.hh - collection_mutation.hh - collection_mutation.cc - converting_mutation_partition_applier.hh - converting_mutation_partition_applier.cc - counters.hh - counters.cc - timestamp.hh Fixes: #22104 This is a cleanup, no need to backport Closes scylladb/scylladb#25085	2025-09-24 13:23:38 +03:00
Karol Nowacki	eae71d3e91	vector_store_client: Move to vector_search module Vector search related implementation moved to a new module vector_search. As the vector search functionality is going to be extended, it is better to keep it in a separate module.	2025-09-22 08:01:47 +02:00
Michał Hudobski	1690e5265a	vector search: correct column name formatting This patch corrects the column name formatting whenever an "Undefined column name" exception is thrown. Until now we used the `name()` function which returns a bytes object. This resulted in a message with a garbled ascii bytes column name instead of a proper string. We switch to the `text()` function that returns a sstring instead, making the message readable. Tests are adjusted to confirm this behavior. Fixes: VECTOR-228 Closes scylladb/scylladb#26120	2025-09-20 07:02:53 +02:00
Piotr Dulikowski	5f55787e50	Merge 'CDC with tablets' from Michael Litvak initial implementation to support CDC in tablets-enabled keyspaces. The design is described in https://docs.google.com/document/d/1qO5f2q5QoN5z1-rYOQFu6tqVLD3Ha6pphXKEqbtSNiU/edit?usp=sharing It is followed closely for the most part except "Deciding when to change streams" - instead, streams are changed synchronously with tablet split / merge. Instead of the stream switching algorithm with the double writes, we use a scheme similar to the previous method for vnodes - we add the new streams with timestamp that is sufficiently far into the future. In this PR we: * add new group0-based internal system tables for tablet stream metadata and loading it into in-memory CDC metadata * add virtual tables for CDC consumers * the write coordinator chooses a stream by looking up the appropriate stream in the CDC metadata * enable creating tables with CDC enabled in tablets-enabled keyspaces. tablets are allocated for the CDC table, and a stream is created per each tablet. * on tablet resize (split / merge), the topology coordinator creates a new stream set with a new stream for each new tablet. 
* the cdc tablets are co-located with the base tablets Fixes https://github.com/scylladb/scylladb/issues/22576 backport not needed - new feature update dtests: https://github.com/scylladb/scylla-dtest/pull/5897 update java cdc library: https://github.com/scylladb/scylla-cdc-java/pull/102 update rust cdc library: https://github.com/scylladb/scylla-cdc-rust/pull/136 Closes scylladb/scylladb#23795 * github.com:scylladb/scylladb: docs/dev: update CDC dev docs for tablets doc: update CDC docs for tablets test: cluster_events: enable add_cdc and drop_cdc test/cql: enable cql cdc tests to run with tablets test: test_cdc_with_alter: adjust for cdc with tablets test/cqlpy: adjust cdc tests for tablets test/cluster/test_cdc_with_tablets: introduce cdc with tablets tests cdc: enable cdc with tablets topology coordinator: change streams on tablet split/merge cdc: virtual tables for cdc with tablets cdc: generate_stream_diff helper function cdc: choose stream in tablets enabled keyspaces cdc: rename get_stream to get_vnode_stream cdc: load tablet streams metadata from tables cdc: helper functions for reading metadata from tables cdc: colocate cdc table with base cdc: remove streams when dropping CDC table cdc: create streams when allocating tablets migration_listener: add on_before_allocate_tablet_map notification cdc: notify when creating or dropping cdc table cdc: move cdc table creation to pre_create cdc: add internal tables for cdc with tablets cdc: add cdc_with_tablets feature flag cdc: add is_log_schema helper	2025-09-18 13:39:37 +02:00
Ernest Zaslavsky	54aa552af7	treewide: Move type related files to a `type` directory As requested in #22110 , moved the files and fixed other includes and build system. Moved files: - duration.hh - duration.cc - concrete_types.hh Fixes: #22110 This is a cleanup, no need to backport Closes scylladb/scylladb#25088	2025-09-17 17:32:19 +03:00
Michael Litvak	1fc3273b27	cdc: enable cdc with tablets Allow to create CDC tables in a tablets-enabled keyspace when all nodes in the cluster support the cdc_with_tablets feature. Fixes scylladb/scylladb#22576	2025-09-17 14:47:12 +02:00
Nadav Har'El	e322902506	Merge 'index, metrics: add per-index metrics' from Michał Hudobski This patch adds the possibility to track metrics per secondary index. Currently, only a histogram of query latencies is tracked, but more metrics can be added in the future. To add a new metric, it needs to be added to the index_metrics struct in index/secondary_index_manager.hh and then initialized in index/secondary_index_manager.cc in the constructor of the index_metrics struct. The metrics are created when the index is created and removed when the index is dropped. First lines of the new metric: \# HELP scylla_index_query_latencies Index query latencies \# TYPE scylla_index_query_latencies histogram scylla_index_query_latencies_sum{idx="test_i_idx",ks="test"} 640 scylla_index_query_latencies_count{idx="test_i_idx",ks="test"} 1 scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="640.000000"} 1 scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="768.000000"} 1 Fixes: https://github.com/scylladb/scylladb/issues/25970 Closes scylladb/scylladb#25995 * github.com:scylladb/scylladb: test: verify that the index metric is added index, metrics: add per-index metrics	2025-09-17 14:54:12 +03:00
Ernest Zaslavsky	d624413ddd	treewide: Move query related files to a new `query` directory As requested in #22120, moved the files and fixed other includes and build system. Moved files: - query.cc - query-request.hh - query-result.hh - query-result-reader.hh - query-result-set.cc - query-result-set.hh - query-result-writer.hh - query_id.hh - query_result_merger.hh Fixes: #22120 This is a cleanup, no need to backport Closes scylladb/scylladb#25105	2025-09-16 23:40:47 +03:00
Botond Dénes	0cf6a648bb	Merge 'Default create keyspace syntax' from Dario Mirovic Allow for the following CQL syntax: ``` CREATE KEYSPACE [IF NOT EXISTS] <name>; ``` for example: ``` CREATE KEYSPACE test_keyspace; ``` With this syntax all the keyspace's parameters would be defaulted to: replication strategy = `NetworkTopologyStrategy`, replication factor = number of racks , but excluding racks that only have arbiter nodes storage options, durable writes = defaults we normally would use, tablets enabled if they are enabled in the db configuration, e.g. scylla.yaml or db/config.cc by default. Options besides `replication` already have defaults. `replication` had to be specified, but it could be an empty set, where defaults for sub-options (replication strategy and replication factor) would be used - `replication = {}`. Now there is no need for specifying an empty set - omitting `replication = {}` has the same effect as `replication = {}`. Since all the options now have defaults, `WITH` is optional for `CREATE KEYSPACE` statement. Fixes #25145 This is an improvement, no backport needed. Closes scylladb/scylladb#25872 * github.com:scylladb/scylladb: docs: cql: default create keyspace syntax test: cqlpy: add test for create keyspace with no options specified cql: default `CREATE KEYSPACE` syntax	2025-09-16 23:40:47 +03:00
Michał Hudobski	b09d1f0a98	index, metrics: add per-index metrics This patch adds the possibility to track metrics per secondary index. Currently, only a histogram of query latencies is tracked, but more metrics can be added in the future. To add a new metric, it needs to be added to the index_metrics struct in index/secondary_index_manager.hh and then initialized in index/secondary_index_manager.cc in the constructor of the index_metrics struct. The metrics are created when the index is created and removed when the index is dropped. First lines of the new metric: \# HELP scylla_index_query_latencies Index query latencies \# TYPE scylla_index_query_latencies histogram scylla_index_query_latencies_sum{idx="test_i_idx",ks="test"} 640 scylla_index_query_latencies_count{idx="test_i_idx",ks="test"} 1 scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="640.000000"} 1 scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="768.000000"} 1	2025-09-16 14:03:43 +02:00
Nadav Har'El	5307d1b9a8	Merge 'vector_index: add version to index options' from Dawid Pawlik Since creating the vector index does not lead to creation of a view table [#24438] (whose version info had been logged in `system_schema.scylla_tables`) we lacked the information about the version of the index. The solution we arrived at is to add the version as a field in the options column of `system_schema.indexes`. It requires few changes and seems unintrusive for existing infrastructure. This patch implements the solution described above. Refs: VECTOR-142 Closes scylladb/scylladb#25614 * github.com:scylladb/scylladb: cqlpy/test_vector_index: add vector index version test vector_index, index_prop_defs: add version to index options create_index_statement: rename `validator` to `custom_index_factory` custom index: rename `custom_index_option_name` vector_index: rename `supported_options` to `vector_index_options`	2025-09-14 15:35:53 +03:00
Avi Kivity	c91b326d5a	Merge 'transport: replace throwing protocol_exception with returns' from Dario Mirovic Replace throwing `protocol_exception` with returning it as a result or an exceptional future in the transport server module. The goal is to improve performance. Most of the `protocol_exception` throws were made from `fragmented_temporary_buffer` module, by passing `exception_thrower()` to its `read` methods. `fragmented_temporary_buffer` is changed so that it now accepts an exception creator, not exception thrower. `fragmented_temporary_buffer_concepts::ExceptionCreator` concept replaced `fragmented_temporary_buffer_concepts::ExceptionThrower` and all methods that have been throwing now return failed result of type `utils::result_with_eptr`. This change is then propagated to the callers. The scope of this patch is `protocol_exception`, so commitlog just calls `.value()` method on the result. If the result failed, that will throw the exception from the result, as defined by `utils::result_with_eptr_throw_policy`. This means that the behavior of commitlog module stays the same. transport server module handles results gracefully. All the caller functions that return non-future value `T` now return `utils::result_with_eptr<T>`. When the caller is a function that returns a future, and it receives failed result, `make_exception_future(std::move(failed_result).value())` is returned. The rest of the callstack up to the transport server `handle_error` function is already working without throwing, and that's how zero throws is achieved. cql3 module changes do the same as transport server module. Benchmark that is not yet merged has commit `67fbe35833e2d23a8e9c2dcb5e04580231d8ec96`, [GitHub diff view](https://github.com/scylladb/scylladb/compare/master...nuivall:scylladb:perf_cql_raw). It uses either read or write query. 
Command line used: ``` ./build/release/scylla perf-cql-raw --workdir ~/tmp/scylladir --smp 1 --developer-mode 1 --workload write --duration 300 --concurrency 1000 --username cassandra --password cassandra 2>/dev/null ``` The only thing changed across runs is `--workload write`/`--workload read`. Built and run on `release` target. <details> ``` throughput: mean= 36946.04 standard-deviation=1831.28 median= 37515.49 median-absolute-deviation=1544.52 maximum=39748.41 minimum=28443.36 instructions_per_op: mean= 108105.70 standard-deviation=965.19 median= 108052.56 median-absolute-deviation=53.47 maximum=124735.92 minimum=107899.00 cpu_cycles_per_op: mean= 70065.73 standard-deviation=2328.50 median= 69755.89 median-absolute-deviation=1250.85 maximum=92631.48 minimum=66479.36 ⏱ real=5:11.08 user=2:00.20 sys=2:25.55 cpu=85% ``` ``` throughput: mean= 40718.30 standard-deviation=2237.16 median= 41194.39 median-absolute-deviation=1723.72 maximum=43974.56 minimum=34738.16 instructions_per_op: mean= 117083.62 standard-deviation=40.74 median= 117087.54 median-absolute-deviation=31.95 maximum=117215.34 minimum=116874.30 cpu_cycles_per_op: mean= 58777.43 standard-deviation=1225.70 median= 58724.65 median-absolute-deviation=776.03 maximum=64740.54 minimum=55922.58 ⏱ real=5:12.37 user=27.461 sys=3:54.53 cpu=83% ``` ``` throughput: mean= 37107.91 standard-deviation=1698.58 median= 37185.53 median-absolute-deviation=1300.99 maximum=40459.85 minimum=29224.83 instructions_per_op: mean= 108345.12 standard-deviation=931.33 median= 108289.82 median-absolute-deviation=55.97 maximum=124394.65 minimum=108188.37 cpu_cycles_per_op: mean= 70333.79 standard-deviation=2247.71 median= 69985.47 median-absolute-deviation=1212.65 maximum=92219.10 minimum=65881.72 ⏱ real=5:10.98 user=2:40.01 sys=1:45.84 cpu=85% ``` ``` throughput: mean= 38353.12 standard-deviation=1806.46 median= 38971.17 median-absolute-deviation=1365.79 maximum=41143.64 minimum=32967.57 instructions_per_op: mean= 117270.60 
standard-deviation=35.50 median= 117268.07 median-absolute-deviation=16.81 maximum=117475.89 minimum=117073.74 cpu_cycles_per_op: mean= 57256.00 standard-deviation=1039.17 median= 57341.93 median-absolute-deviation=634.50 maximum=61993.62 minimum=54670.77 ⏱ real=5:12.82 user=4:10.79 sys=11.530 cpu=83% ``` This shows ~240 instructions per op increase for reads and ~180 instructions per op increase for writes. Tests have been run multiple times, with almost identical results. Each run lasted 300 seconds. The number of operations executed is roughly 38k per second * 300 seconds = 11.4m ops. Update: I repeated the benchmark from a clean state: rebooted the computer, put it in performance mode, rebuilt, and closed other apps that might affect CPU and disk usage. Run count: 5 times before and 5 times after the patch. Duration: 300 seconds. Average write throughput median before patch: 41155.99 Average write throughput median after patch: 42193.22 Median absolute deviation is also lower now, with values in range 350-550, while the previous runs' values were in range 750-1350. </details> Built and run on `release` target. 
<details> ./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache false --bypass-cache 2>/dev/null ``` throughput: mean= 14910.90 standard-deviation=477.72 median= 14956.73 median-absolute-deviation=294.16 maximum=16061.18 minimum=13198.68 instructions_per_op: mean= 659591.63 standard-deviation=495.85 median= 659595.46 median-absolute-deviation=324.91 maximum=661184.94 minimum=658001.49 cpu_cycles_per_op: mean= 213301.49 standard-deviation=2724.27 median= 212768.64 median-absolute-deviation=1403.85 maximum=225837.15 minimum=208110.12 ⏱ real=5:19.26 user=5:00.22 sys=15.827 cpu=98% ``` ./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache false 2>/dev/null ``` throughput: mean= 93345.45 standard-deviation=4499.00 median= 93915.52 median-absolute-deviation=2764.41 maximum=104343.64 minimum=79816.66 instructions_per_op: mean= 65556.11 standard-deviation=97.42 median= 65545.11 median-absolute-deviation=71.51 maximum=65806.75 minimum=65346.25 cpu_cycles_per_op: mean= 34160.75 standard-deviation=803.02 median= 33927.16 median-absolute-deviation=453.08 maximum=39285.19 minimum=32547.13 ⏱ real=5:03.23 user=4:29.46 sys=29.255 cpu=98% ``` ./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache true 2>/dev/null ``` throughput: mean= 206982.18 standard-deviation=15894.64 median= 208893.79 median-absolute-deviation=9923.41 maximum=232630.14 minimum=127393.34 instructions_per_op: mean= 35983.27 standard-deviation=6.12 median= 35982.75 median-absolute-deviation=3.75 maximum=36008.24 minimum=35952.14 cpu_cycles_per_op: mean= 17374.87 standard-deviation=985.06 median= 17140.81 median-absolute-deviation=368.86 maximum=26125.38 minimum=16421.99 ⏱ real=5:01.23 user=4:57.88 sys=0.124 cpu=98% ``` ./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache false --bypass-cache 2>/dev/null ``` throughput: mean= 16198.26 
standard-deviation=902.41 median= 16094.02 median-absolute-deviation=588.58 maximum=17890.10 minimum=13458.74 instructions_per_op: mean= 659752.73 standard-deviation=488.08 median= 659789.16 median-absolute-deviation=334.35 maximum=660881.69 minimum=658460.82 cpu_cycles_per_op: mean= 216070.70 standard-deviation=3491.26 median= 215320.37 median-absolute-deviation=1678.06 maximum=232396.48 minimum=209839.86 ⏱ real=5:17.33 user=4:55.87 sys=18.425 cpu=99% ``` ./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache false 2>/dev/null ``` throughput: mean= 97067.79 standard-deviation=2637.79 median= 97058.93 median-absolute-deviation=1477.30 maximum=106338.97 minimum=87457.60 instructions_per_op: mean= 65695.66 standard-deviation=58.43 median= 65695.93 median-absolute-deviation=37.67 maximum=65947.76 minimum=65547.05 cpu_cycles_per_op: mean= 34300.20 standard-deviation=704.66 median= 34143.92 median-absolute-deviation=321.72 maximum=38203.68 minimum=33427.46 ⏱ real=5:03.22 user=4:31.56 sys=29.164 cpu=99% ``` ./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache true 2>/dev/null ``` throughput: mean= 223495.91 standard-deviation=6134.95 median= 224825.90 median-absolute-deviation=3302.09 maximum=234859.90 minimum=193209.69 instructions_per_op: mean= 35981.41 standard-deviation=3.16 median= 35981.13 median-absolute-deviation=2.12 maximum=35991.46 minimum=35972.55 cpu_cycles_per_op: mean= 17482.26 standard-deviation=281.82 median= 17424.08 median-absolute-deviation=143.91 maximum=19120.68 minimum=16937.43 ⏱ real=5:01.23 user=4:58.54 sys=0.136 cpu=99% ``` </details> Fixes: #24567 This PR is a continuation of #24738 [transport: remove throwing protocol_exception on connection start](https://github.com/scylladb/scylladb/pull/24738). This PR does not solve a burning issue, but is rather an improvement in the same direction. As it is just an enhancement, it should not be backported. 
Closes scylladb/scylladb#25408 * github.com:scylladb/scylladb: test/cqlpy: add protocol exception tests test/cqlpy: `test_protocol_exceptions.py` refactor message frame building test/cqlpy: `test_protocol_exceptions.py` refactor duplicate code transport: replace `make_frame` throw with return result cql3: remove throwing `protocol_exception` transport: replace throw in validate_utf8 with result_with_exception_ptr return transport: replace throwing protocol_exception with returns utils: add result_with_exception_ptr test/cqlpy: add unknown compression algorithm test case	2025-09-10 21:54:15 +03:00
Dawid Pawlik	909a51e524	vector_index, index_prop_defs: add version to index options Since creating the vector index does not lead to creation of a view table [#24438] (whose version info had been logged in `system_schema.scylla_tables`) we lack the information about the version of the index. The mentioned version is used to recognize a quickly dropped and re-created index with the same parameters that needs to be rebuilt. This case is mainly encountered while testing, benchmarking or experimenting with Vector Search. Nevertheless it is important to consider it, as it is really weird to see that DROP and CREATE commands did not change anything. Although reusing the same old index is a nice "optimization", the rebuild feels more natural for users getting to know Vector Search. It should not change anything in a real production environment. The solution we arrived at is to add the version as a field in the options column of `system_schema.indexes`. The version of a vector index is the schema version of the base table on which the index was created. The table's schema version changes every time the table is changed, meaning that a CREATE INDEX or DROP INDEX statement also changes it. Every index has a different index version, which makes them easy to identify. This patch implements the solution described above.	2025-09-10 15:16:54 +02:00
Dario Mirovic	20c173e958	cql: default `CREATE KEYSPACE` syntax Since all the options except `REPLICATION` already have defaults, and `REPLICATION` has defaults for all the fields inside, this patch makes `REPLICATION` optional. More specifically, there is no need for `WITH` in create keyspace statement anymore. This allows for the following syntax: `CREATE KEYSPACE [IF NOT EXISTS] <name>;` For example: `CREATE KEYSPACE test_keyspace;` Fixes #25145	2025-09-08 10:07:40 +02:00
Avi Kivity	03ee862b50	cql3: statement_restrictions: forbid querying a single-column or token restriction on a multi-column restriction In `41880bc893` ("cql3: statement_restrictions: forbid querying a single-column inequality restriction on a multi-column restriction"), we removed the ability to constrain a single column on a tuple inequality, on the grounds that it isn't used and can't be used. Here, we extend this to remove the ability to constrain a single column on a tuple equality, on the grounds that it isn't used and hampers further refactoring. CQL supports multi-column equality restrictions in the form (ck1, ck2, ck3) = (:v1, :v2, :v3) This restriction shape is only allowed on clustering keys, and is translated into a partition_slice allowing the primary index to efficiently select the part of the partition that satisfies the restriction. The possible_lhs_values() function allows extracting single-column restrictions from this and similar tuple restrictions. For example, the multi-column restriction (ck1, ck2, ck3) = (:v1, :v2, :v3) implies that ck2 = :v2. If we have an index on ck2, and if we don't further have a restriction on the partition key, then it is advantageous to use the index to select rows, and then filter on ck1 and ck3 to satisfy the full restriction. However, we never actually do that. The following sequence ```cql CREATE TABLE ks.t1 ( pk int, ck1 int, ck2 int, PRIMARY KEY (pk, ck1, ck2) ); CREATE INDEX ON ks.t1(ck1); SELECT * FROM ks.t1 WHERE (ck1, ck2) = (1, 2); ``` could have been used to query a single partition via the index, but instead is used for a full table scan, using the partition slice to skip through unselected rows. We can't easily start using a new query plan via the index, since switching plans mid-query (due to paging and moving from one coordinator to another during upgrade) would cause the sort order to change, therefore causing some rows to be omitted and some rows to be returned twice. 
Similarly, we cannot extract a token restriction from a tuple, since the grammar doesn't allow for ```cql WHERE (token(pk)) = (:var1) ``` Since it's not used, remove it. This code was first introduced in `d33053b841` ("cql3/restrictions: Add free functions over new classes") It does not directly correspond to pre-expression code. Closes scylladb/scylladb#25757 Closes scylladb/scylladb#25821	2025-09-07 18:36:05 +03:00
Nadav Har'El	a1ed2c9d4b	Merge 'Allow users to SELECT from CDC log tables they created.' from Dawid Pawlik Before the patch, a user with CREATE access could create a table with CDC or alter a table to enable CDC, but could not run a SELECT on the CDC log table they created. This was because the SELECT permission was checked on the CDC log, and later on its "parent" - the keyspace, but not on the base table, on which the user had SELECT permission automatically granted on CREATE. This patch matches the behavior of querying the CDC log to the one implemented for Materialized Views: 1. No new permissions are granted on CREATE. 2. When querying SELECT, the permissions on base table SELECT are checked. Fixes: https://github.com/scylladb/scylladb/issues/19798 Fixes: VECTOR-151 Closes scylladb/scylladb#25797 * github.com:scylladb/scylladb: cqlpy/test_permissions: run the reproducer tests for #19798 select_statement: check for access to CDC base table	2025-09-04 16:56:52 +03:00
Dawid Mędrek	d2c5268196	cql3: Produce CREATE MATERIALIZED VIEW statement when describing MV of index Before this change, executing `DESCRIBE MATERIALIZED VIEW` on the underlying materialized view of a secondary index would produce a `CREATE INDEX` statement. It was not only confusing, but it also prevented users from learning about the definition of the view. The only way to do so was to query system tables. We change that behavior and produce a `CREATE MATERIALIZED VIEW` statement instead. The statement is printed as a comment to implicitly convey that the user should not attempt to execute it to restore the view. A short comment is provided to make it clearer. Before this commit: ``` cqlsh> CREATE TABLE ks.t(p int PRIMARY KEY, v int); cqlsh> CREATE INDEX i ON ks.t(v); cqlsh> DESCRIBE MATERIALIZED VIEW ks.i; CREATE INDEX i ON ks.t(v); ``` After this commit: ``` cqlsh> CREATE TABLE ks.t(p int PRIMARY KEY, v int); cqlsh> CREATE INDEX i ON ks.t(v); cqlsh> DESCRIBE MATERIALIZED VIEW ks.i; /* Do NOT execute this statement! It's only for informational purposes. This materialized view is the underlying materialized view of a secondary index. It can be restored via restoring the index. 
CREATE MATERIALIZED VIEW ks.i_index [...]; */ ``` Note that describing the base table has not been affected and still works as follows: ``` cqlsh> CREATE TABLE ks.t(p int PRIMARY KEY, v int); cqlsh> CREATE INDEX i ON ks.t(v); cqlsh> DESCRIBE TABLE ks.t; CREATE TABLE ks.t ( p int, v int, PRIMARY KEY (p) ) WITH bloom_filter_fp_chance = 0.01 AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'} AND comment = '' AND compaction = {'class': 'IncrementalCompactionStrategy'} AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'} AND crc_check_chance = 1 AND default_time_to_live = 0 AND gc_grace_seconds = 864000 AND max_index_interval = 2048 AND memtable_flush_period_in_ms = 0 AND min_index_interval = 128 AND speculative_retry = '99.0PERCENTILE' AND tombstone_gc = {'mode': 'timeout', 'propagation_delay_in_seconds': '3600'}; CREATE INDEX i ON ks.t(v); ``` We also provide two reproducers of scylladb/scylladb#24610. Fixes scylladb/scylladb#24610 Closes scylladb/scylladb#25697	2025-09-03 15:21:37 +02:00
Dawid Pawlik	be54346846	select_statement: check for access to CDC base table Before the patch, a user with CREATE access could create a table with CDC or alter a table to enable CDC, but could not run a SELECT on the CDC table they created. This was because the SELECT permission was checked on the CDC log, and later on its "parent" - the keyspace, but not on the base table, on which the user had SELECT permission automatically granted on CREATE. This patch matches the behaviour of querying the CDC log to the one implemented for Materialized Views: 1. No new permissions are granted on CREATE. 2. When querying SELECT, the permissions on base table SELECT are checked. Fixes: #19798	2025-09-03 13:20:39 +02:00
Pavel Emelyanov	b0aa2d61d9	Merge 'cql3: add default replication factor to `create_keyspace_statement`' from Dario Mirovic When creating a new keyspace, replication factor must be stated. For example: `CREATE KEYSPACE ks WITH REPLICATION { 'class': 'NetworkTopologyStrategy', 'replication_factor': 3 };` This patch changes it in the following way - if there is no replication factor specified, use default replication factor. Default replication factor is equal to the number of racks that are not arbiter-only, i.e. racks that have at least one non-arbiter node. The following syntax is now valid: `CREATE KEYSPACE ks WITH REPLICATION { 'class': 'NetworkTopologyStrategy' };` `CREATE KEYSPACE ks WITH REPLICATION { };` Fixes #16028 Backport is not needed. This is an enhancement for future releases. Closes scylladb/scylladb#25570 * github.com:scylladb/scylladb: docs/cql: update documentation for default replication factor test/cqlpy: add keyspace creation default replication factor tests cql3: add default replication factor to `create_keyspace_statement`	2025-09-03 12:31:53 +03:00
Radosław Cybulski	c242234552	Revert "build: add precompiled headers to CMakeLists.txt" This reverts commit `01bb7b629a`. Closes scylladb/scylladb#25735	2025-09-03 09:46:00 +03:00
Piotr Dulikowski	762d9ef68f	Merge 'cdc: Set tombstone_gc when creating log table' from Dawid Mędrek Normally, when we create a table, MV, etc., we apply `cf_prop_defs` to the schema builder via the function `cf_prop_defs::apply_to_builder`. Unfortunately, that didn't happen when creating CDC log tables, and so we might have missed some of the properties that would normally be set to some value, even if the default one. One particular example of that phenomenon was `tombstone_gc`. For better or worse, it's not a "standalone property" of a table, but rather part of `extensions`. [Somewhat related issue: scylladb/scylladb#9722] That may have and did cause trouble. Consider this scenario: 1. A CDC log table is created. 2. The table does NOT have any value of `tombstone_gc` set. 3. The user edits the table via `ALTER TABLE`. That statement treats the log table just like any other one (at least as far as the relevant portion of the logic is concerned). Among other things, it uses `cf_prop_defs::apply_to_builder`, and as a result, the `tombstone_gc` property is set to some value: * the default one if the user doesn't specify it in the statement, * a custom one if they do. Why is that a problem? First of all, it's confusing. When we perform a schema backup and a table uses CDC, we include an ALTER statement for its corresponding CDC log table (for more context, see issue scylladb/scylladb#18467 or commit scylladb/scylladb@f12edbdd95). There are two consequences for the user here: 1. If the log table had NOT been altered ever since it was created, the statement will miss the `tombstone_gc` property as if it couldn't be set for it at all. That's confusing! 2. If the log table HAD in fact been altered after its creation, the statement will include the `tombstone_gc` property. That's even more confusing (why was it not present the first time, but it is now?). 
The `tombstone_gc` property should always be set to avoid confusion and problematic edge cases in tests and to simply be consistent with how other schema entities work. The solution we employ is that we always set the property to the default value. That includes the case when we reattach the log table to the base; consider the following scenario: 1. Create a table with CDC enabled. 2. Detach the log table by performing `ALTER TABLE ... WITH cdc = {'enabled': false}`. 3. Change the `tombstone_gc` property of the log table. 4. Reattach the log table to the base in the same way as in step 2. The expected result would be that the new value of `tombstone_gc` would be preserved after reattaching the log table. However, that's not what will happen. We decide to stay consistent with how other properties of a log table behave, and we reset them after every reattachment. We might change that in the future: see issue scylladb/scylladb#25523. Two reproducer tests of scylladb/scylladb#25187 are included in the changes. Backport: The problem is not critical, so it may not be necessary to backport the changes. That's to be discussed. Closes scylladb/scylladb#25521 * github.com:scylladb/scylladb: cdc: Set tombstone_gc when creating log table tombstone_gc: Add overload of get_default_tombstone_gc_mode tombstone_gc: Rename get_default_tombstonesonte_gc_mode	2025-09-02 10:20:11 +02:00
Karol Nowacki	3086d15999	cql3: Fix crash on ANN OF query when TRACING ON is enabled Executing a vector search (SELECT with ANN OF ordering) query with `TRACING ON` enabled caused a node to crash due to a null pointer dereference. This occurred because a vector index does not have an associated view table, making its `_view_schema` member null. The implementation attempted to enable tracing on this null view schema, leading to the crash. The fix adds a null check for `_view_schema` before attempting to enable tracing on the view (index) table. A regression test is included to prevent this from happening again. Fixes: VECTOR-179 Closes scylladb/scylladb#25500	2025-09-01 17:26:54 +03:00
Avi Kivity	41880bc893	cql3: statement_restrictions: forbid querying a single-column inequality restriction on a multi-column restriction CQL supports multi-column inequality restrictions in the form (ck1, ck2, ck3) >= (:v1, :v2, :v3) This restriction shape is only allowed on clustering keys, and is translated into a partition_slice allowing the primary index to efficiently select the part of the partition that satisfies the restriction. The possible_lhs_values() function allows extracting single-column restrictions from this and similar tuple restrictions. For example, the multi-column restriction (ck1, ck2, ck3) = (:v1, :v2, :v3) implies that ck2 = :v2. If we have an index on ck2, and if we don't further have a restriction on the partition key, then it is advantageous to use the index to select rows, and then filter on ck1 and ck3 to satisfy the full restriction. For the inequality restriction, we can only infer a restriction on the first column due to lexicographical comparison. We can see that, given (ck1, ck2, ck3) >= (:v1, :v2, :v3) then ck1 >= :v1 ck2 = unbounded ck3 = unbounded and possible_lhs_values() indeed computes this. However, this is never used in practice, and it makes further refactoring difficult. If we want to convert a boolean factor of the where clause to a predicate on a column or tuple of columns, we cannot do so because we can actually generate two predicates: one on the tuple and one on the first column. Since it's not used, remove it. This code was first introduced in `d33053b841` ("cql3/restrictions: Add free functions over new classes") (search for "if (column_index_on_lhs > 0) {"). It does not directly correspond to pre-expression code. Closes scylladb/scylladb#25757	2025-09-01 17:21:26 +03:00
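The lexicographical argument in commit `41880bc893` can be checked with a small brute-force sketch. It is only an illustration in Python, not code from the repository: the helper `satisfies`, the bound `(2, 5, 5)` and the value domain `0..9` are all assumptions made for the example. It enumerates rows matching `(ck1, ck2, ck3) >= (:v1, :v2, :v3)` and shows that only the first column ends up bounded.

```python
from itertools import product


def satisfies(row, bound):
    """Hypothetical stand-in for (ck1, ck2, ck3) >= (:v1, :v2, :v3).

    Python compares tuples lexicographically, which is the comparison
    the commit message refers to.
    """
    return tuple(row) >= tuple(bound)


bound = (2, 5, 5)      # assumed values for :v1, :v2, :v3
domain = range(10)     # assumed small value domain for each column

matching = [row for row in product(domain, repeat=3) if satisfies(row, bound)]

# The first column is bounded from below by :v1 in every matching row.
first_column_bounded = all(row[0] >= bound[0] for row in matching)

# The other columns are unbounded: any row with ck1 > :v1 matches,
# whatever values ck2 and ck3 hold.
ck2_values = {row[1] for row in matching}
ck3_values = {row[2] for row in matching}
```

Here `first_column_bounded` holds, while `ck2_values` and `ck3_values` cover the whole domain, which matches the "ck2 = unbounded, ck3 = unbounded" statement in the message.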
Piotr Dulikowski	7ccb50514d	Merge 'Introduce view building coordinator' from Michał Jadwiszczak This patch introduces `view_building_coordinator`, a single entity within the whole cluster responsible for building tablet-based views. The view building coordinator takes a slightly different approach than the existing node-local view builder. The whole process is split into smaller view building tasks, one per tablet replica of the base table. The coordinator builds one base table at a time and it can choose another when all views of the base table currently being processed are built. The tasks are started by setting the `STARTED` state and they are executed by the node-local view building worker. The tasks are scheduled in such a way that each shard processes only one tablet at a time (multiple tasks can be started for a shard on a node because a table can have multiple views but then all tasks have the same base table and tablet (last_token)). Once the coordinator starts the tasks, it sends the `work_on_view_building_tasks` RPC to start the tasks and receive their results. This RPC is resilient to RPC failure or raft leader change, meaning that if one RPC call started a batch of tasks but then failed (for instance the raft leader was changed and the caller aborted waiting for the response), the next RPC call will attach itself to the already started batch. The coordinator plugs into handling tablet operations (migration/resize/RF change) and adjusts its tasks accordingly. At the start of each tablet operation, the coordinator aborts the necessary view building tasks to prevent https://github.com/scylladb/scylladb/issues/21564. Then, new adjusted tasks are created at the end of the operation. If the operation fails at any moment, the aborted tasks are rolled back. The view building coordinator can also handle staging sstables using process_staging view building tasks. 
We do this because we don't want to start generating view updates from a staging sstable prematurely, before the writes are directed to the new replica (https://github.com/scylladb/scylladb/issues/19149). For detailed description check: `docs/dev/view-building-coordinator.md` Fixes https://github.com/scylladb/scylladb/issues/22288 Fixes https://github.com/scylladb/scylladb/issues/19149 Fixes https://github.com/scylladb/scylladb/issues/21564 Fixes https://github.com/scylladb/scylladb/issues/17603 Fixes https://github.com/scylladb/scylladb/issues/22586 Fixes https://github.com/scylladb/scylladb/issues/18826 Fixes https://github.com/scylladb/scylladb/issues/23930 --- This PR is reimplementation of https://github.com/scylladb/scylladb/pull/21942 Closes scylladb/scylladb#23760 * github.com:scylladb/scylladb: test/cluster: add view build status tests test/cluster: add view building coordinator tests utils/error_injection: allow to abort `injection_handler::wait_for_message()` test: adjust existing tests utils/error_injection: add injection with `sleep_abortable()` db/view/view_builder: ignore `no_such_keyspace` exception docs/dev: add view building coordinator documentation db/view/view_building_worker: work on `process_staging` tasks db/view/view_building_worker: register staging sstable to view building coordinator when needed db/view/view_building_worker: discover staging sstables db/view/view_building_worker: add method to register staging sstable db/view/view_update_generator: add method to process staging sstables instantly db/view/view_update_generator: extract generating updates from staging sstables to a method db/view/view_update_generator: ignore tablet-based sstables db/view/view_building_coordinator: update view build status on node join/left db/view/view_building_coordinator: handle tablet operations db/view: add view building task mutation builder service/topology_coordinator: run view building coordinator db/view: introduce `view_building_coordinator` 
db/view/view_building_worker: update built views locally db/view: introduce `view_building_worker` db/view: extract common view building functionalities db/view: prepare to create abstract `view_consumer` message/messaging_service: add `work_on_view_building_tasks` RPC service/topology_coordinator: make `term_changed_error` public db/schema_tables: create/cleanup tasks when an index is created/dropped service/migration_manager: cleanup view building state on drop keyspace service/migration_manager: cleanup view building state on drop view service/migration_manager: create view building tasks on create view test/boost: enable proxy remote in some tests service/migration_manager: pass `storage_proxy` to `prepare_keyspace_drop_announcement()` service/migration_manager: coroutinize `prepare_new_view_announcement()` service/storage_proxy: expose references to `system_keyspace` and `view_building_state_machine` service: reload `view_building_state_machine` on group0 apply() service/vb_coordinator: add currently processing base db/system_keyspace: move `get_scylla_local_mutation()` up db/system_keyspace: add `view_building_tasks` table db/view: add view_building_state and views_state db/system_keyspace: add method to get view build status map db/view: extract `system.view_build_status_v2` cql statements to system_keyspace db/system_keyspace: move `internal_system_query_state()` function earlier db/view: ignore tablet-based views in `view_builder` gms/feature_service: add VIEW_BUILDING_COORDINATOR feature	2025-08-29 17:28:44 +02:00
Dawid Pawlik	a70086c781	create_index_statement: rename `validator` to `custom_index_factory` The change is motivated by the fact that indeed the result of `get_custom_class_factory` is a `custom_index_factory`. The name `validator` was a bit misleading as it does not validate anything by itself. Furthermore if we wanted to use the custom index produced by the factory in other operations than validate, the name feels really off.	2025-08-29 10:49:15 +02:00
Dawid Pawlik	873d7dba5c	custom index: rename `custom_index_option_name` Renamed `custom_index_option_name` to `custom_class_option_name` as the former was a bit misleading since we refactored our model of custom indexes to be index class reliant.	2025-08-29 10:49:15 +02:00
Dario Mirovic	ba178f4c85	cql3: remove throwing `protocol_exception` Remove throwing `protocol_exception` in `cql3/query_options.cc` in function `cql3::query_options::check_serial_consistency` as part of an ongoing effort to remove throwing `protocol_exception`. This change only affects code local to the `cql3` module. Refs: #24567	2025-08-28 23:33:15 +02:00
Dario Mirovic	ca5adf2ac1	cql3: add default replication factor to `create_keyspace_statement` When creating a new keyspace, replication factor must be stated. For example: `CREATE KEYSPACE ks WITH REPLICATION { 'class': 'NetworkTopologyStrategy', 'replication_factor': 3 };` This patch changes it in the following way - if there is no replication factor specified, use default replication factor. Default replication factor is equal to the number of racks that are not comprised of only zero-token nodes, i.e. racks that have at least one non-zero-token node. The following syntax is now valid: `CREATE KEYSPACE ks WITH REPLICATION { 'class': 'NetworkTopologyStrategy' };` `CREATE KEYSPACE ks WITH REPLICATION { };` Fixes: #16028	2025-08-28 01:42:29 +02:00
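The default replication factor rule from commit `ca5adf2ac1` (count the racks that have at least one non-zero-token node) can be sketched as follows. This is a minimal Python illustration, not the C++ implementation: the function name `default_replication_factor`, the `racks` mapping and the `zero_token` field are hypothetical, and the sketch assumes the racks passed in are exactly the ones the rule should consider.

```python
def default_replication_factor(racks):
    """Count racks that contain at least one non-zero-token node.

    `racks` is a hypothetical mapping: rack name -> list of nodes,
    where each node is a dict with a boolean 'zero_token' field.
    """
    return sum(
        1
        for nodes in racks.values()
        if any(not node["zero_token"] for node in nodes)
    )


# Assumed example topology: rack r2 holds only a zero-token node,
# so it does not contribute to the default replication factor.
example_racks = {
    "r1": [{"zero_token": False}, {"zero_token": True}],
    "r2": [{"zero_token": True}],
    "r3": [{"zero_token": False}],
}

example_rf = default_replication_factor(example_racks)
```

With this assumed topology the sketch yields a default of 2, since only `r1` and `r3` contain a node that owns tokens.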
Radosław Cybulski	01bb7b629a	build: add precompiled headers to CMakeLists.txt Add precompiled header support to CMakeLists.txt and configure.py - it improves compilation time by approximately 10%. New header `stdafx.hh` is added, don't include it manually - the compiler will include it for you. The header contains includes from external libraries used by Scylla - seastar, standard library, linux headers and zlib. The feature is enabled by default, use CMake option `Scylla_USE_PRECOMPILED_HEADER` or configure.py --disable-precompiled-header to disable. The feature should be disabled, when trying to check headers - otherwise you might get false negatives on missing includes from seastar / abseil and so on. Note: following configuration needs to be added to ccache.conf: sloppiness = pch_defines,time_macros Closes #25182	2025-08-27 21:37:54 +03:00
Dawid Mędrek	fd4e577db0	tombstone_gc: Rename get_default_tombstonesonte_gc_mode The previous identifier was probably a typo that was missed.	2025-08-27 13:00:10 +02:00
Nadav Har'El	0a990d2a48	config: split tri_mode_restriction to a separate header Today, any source file or header file that wants to use the tri_mode_restriction type needs to include db/config.hh, which is a large and frequently-changing header file. In this patch we split this type into a separate header file, db/tri_mode_restriction.hh, and avoid a few unnecessary inclusions of db/config.hh. However, a few source files now need to explicitly include db/config.hh, after its transitive inclusion is gone. Note that the overwhelmingly common inclusion of db/config.hh continues to be a problem after this patch - 128 source files include it directly. So this patch is just the first step in a long journey. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#25692	2025-08-27 13:47:04 +03:00
Michał Jadwiszczak	204f61ffe1	service/migration_manager: pass `storage_proxy` to `prepare_keyspace_drop_announcement()` The reference is needed to get `view_building_state_machine`.	2025-08-27 08:55:47 +02:00
Nadav Har'El	e2c99436cf	Merge 'cdc, vector_search: enable CDC when the index is created' from Dawid Pawlik When a vector index is created in Scylla, it is initially built using a full scan of the database. After that, it stays up to date by tracking changes through CDC, which should be automatically enabled when the vector index is created. When a user attempts to enable Vector Search (VS), the system checks whether Change Data Capture (CDC) is enabled and properly configured: 1. CDC is not enabled - CDC is automatically enabled with the minimum required TTL (Time-to-Live) for VS (24 hours) and the delta mode set to 'full' or post-image is enabled. - If the user later tries to reduce the CDC TTL below 24 hours or set delta mode to 'keys' with post-image disabled, the action fails. - Error message: Clearly states that CDC TTL must be at least 24 hours and delta mode must be set to 'full' or post-image must be enabled for VS to function. 2. CDC is already enabled - If CDC TTL is ≥ 24 hours and delta mode is set to 'full' or post-image is enabled: VS is enabled successfully. - If CDC TTL is < 24 hours or delta mode is set to 'keys' with post-image disabled: The VS enabling process fails. - Error message: Informs the user that CDC TTL must be at least 24 hours, delta mode must be set to 'full' or post-image must be enabled, and provides a link to documentation on how to update the TTL, delta mode, and post-image. When a user attempts to disable CDC when VS is enabled, the action will fail and the user will be informed by error message that clearly states that VS needs to be disabled (vector indexes have to be dropped) first. Full setup requirements and steps will be detailed in the documentation of Vector Search. 
Co-authored-by: @smoczy123 Fixes: VECTOR-27 Fixes: VECTOR-25 Closes scylladb/scylladb#25179 * github.com:scylladb/scylladb: test/cqlpy: ensure Vector Search CDC options test/boost: adjust CDC boost tests for Vector Search test/cql: add Vector Search CDC enable/disable test cdc, vector_index: provide minimal option setup for Vector Search test/cqlpy: adjust describe table tests with CDC for Vector Search describe, cdc: adjust describe for cdc log tables cdc: enable CDC log when vector index is created test/cqlpy: run vector_index tests only on vnodes vector_index: check if vector index exists in schema	2025-08-26 23:01:32 +03:00
Dawid Mędrek	af8a3dd17b	cql3/statements: Fix indentation	2025-08-21 19:29:36 +02:00
Dawid Mędrek	60ea22d887	cql3: Warn when creating RF-rack-invalid keyspace Although RF-rack-valid keyspaces are not universally enforced yet (they're governed by the configuration option `rf_rack_valid_keyspaces`), we'd like to encourage the user to abide by the restriction. To that end, we're introducing a warning when creating or altering a keyspace. If the configuration option is disabled, but the user is trying to create an RF-rack-invalid keyspace, they'll receive a warning. We provide a validation test.	2025-08-21 19:29:33 +02:00
Dawid Pawlik	a27eef9f18	cdc, vector_index: provide minimal option setup for Vector Search Ensure that the CDC used by Vector Search has at least 24h TTL and delta mode is set to 'full' or postimage is enabled. This setup is required by the Vector Store to work as intended. The TTL of at least 24h is a rough estimate of the maximal time needed for the full scan conducted by Vector Store to finish. The delta mode set to 'full' or postimage enabled is needed to read the values of vectors being written to the table, so Vector Store can save them in the desired external index. As the default we set TTL = 24h, delta = 'full', postimage = false. Full delta is the preferred option to log the vector values as it is less costly and does not require an additional read on write.	2025-08-20 17:20:20 +02:00
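The option constraints from commit `a27eef9f18` (TTL of at least 24h, and either delta mode 'full' or postimage enabled) can be written down as a small predicate. The Python sketch below is only an illustration of that rule: the `CdcOptions` class, its field names and `valid_for_vector_search` are hypothetical and do not mirror the actual C++ types, and the sketch does not model any special TTL values.

```python
from dataclasses import dataclass

MIN_TTL_SECONDS = 24 * 60 * 60  # the 24h minimum stated in the commit message


@dataclass
class CdcOptions:
    """Hypothetical container for the CDC options discussed above."""
    enabled: bool
    ttl_seconds: int
    delta: str        # 'full' or 'keys'
    postimage: bool


def valid_for_vector_search(opts: CdcOptions) -> bool:
    """CDC must be on, TTL >= 24h, and delta 'full' or postimage enabled."""
    return (
        opts.enabled
        and opts.ttl_seconds >= MIN_TTL_SECONDS
        and (opts.delta == "full" or opts.postimage)
    )


# Defaults named in the commit: TTL = 24h, delta = 'full', postimage = false.
DEFAULT_FOR_VECTOR_SEARCH = CdcOptions(
    enabled=True, ttl_seconds=MIN_TTL_SECONDS, delta="full", postimage=False
)
```

Under this sketch, `delta='keys'` is accepted only together with `postimage=True`, and any TTL below 24h is rejected regardless of the other options.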
Dawid Pawlik	35b82e6d2f	describe, cdc: adjust describe for cdc log tables Make CDC log table describe mention that it can be created by creating the vector index on base table's vector column.	2025-08-20 12:38:52 +02:00
Dario Mirovic	bc8bb0873d	cql3: add default replication strategy to `create_keyspace_statement` When creating a new keyspace, both replication strategy and replication factor must be stated. For example: `CREATE KEYSPACE ks WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'replication_factor' : 3 };` This syntax is verbose, and in all but some testing scenarios `NetworkTopologyStrategy` is used. This patch allows skipping replication strategy name, filling it with `NetworkTopologyStrategy` when that happens. The following syntax is now valid: `CREATE KEYSPACE ks WITH REPLICATION = { 'replication_factor' : 3 };` and will give the same result as the previous, more explicit one. Fixes #16029	2025-08-13 01:51:53 +02:00
Avi Kivity	8164f72f6e	Merge 'Separate local_effective_replication_map from vnode_effective_replication_map' from Benny Halevy Derive both vnode_effective_replication_map and local_effective_replication_map from static_effective_replication_map as both are static and per-keyspace. However, local_effective_replication_map does not need vnodes for the mapping of all tokens to the local node. Refs #22733 * No backport required Closes scylladb/scylladb#25222 * github.com:scylladb/scylladb: locator: abstract_replication_strategy: implement local_replication_strategy locator: vnode_effective_replication_map: convert clone_data_gently to clone_gently locator: abstract_replication_map: rename make_effective_replication_map locator: abstract_replication_map: rename calculate_effective_replication_map replica: database: keyspace: rename {create,update}_effective_replication_map locator: effective_replication_map_factory: rename create_effective_replication_map locator: abstract_replication_strategy: rename vnode_effective_replication_map_ptr et. al locator: abstract_replication_strategy: rename global_vnode_effective_replication_map keyspace: rename get_vnode_effective_replication_map dht: range_streamer: use naked e_r_m pointers storage_service: use naked e_r_m pointers alternator: ttl: use naked e_r_m pointers locator: abstract_replication_strategy: define is_local	2025-08-07 12:51:43 +03:00
Benny Halevy	ec85678de1	locator: abstract_replication_strategy: define is_local Prefer specializing the local replication strategy, local effective replication map, et al. by defining an is_local() predicate, similar to uses_tablets(). Note that is_vnode_based() still applies to local replication strategy. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-08-06 13:34:23 +03:00
Nadav Har'El	d46dda0840	Merge 'cql, vector_search: implement read path' from null This pull request is an addition of ANN OF queries. The patch contains: - CQL syntax for ORDER BY `vector_column_name` ANN OF `vector_literal` clause of SELECT statements. - implementation of external ANN queries (using vector-store service) - tests Example syntax: ``` SELECT comment FROM cycling.comments_vs ORDER BY comment_vector ANN OF [0.1, 0.15, 0.3, 0.12, 0.05] LIMIT 3; ``` Limit can be between 1 and 1000 - same as for Cassandra. Co-authored-by: @janpiotrlakomy @smoczy123 Fixes: VECTOR-48 Fixes: VECTOR-46 Closes scylladb/scylladb#24444 * github.com:scylladb/scylladb: cql3/statements: implement external `ANN OF` queries vector_store_client: implement ann_error_visitor test/cqlpy: check ANN queries disallow filtering properly cassandra_tests: translate vector_invalid_query_test cassandra_tests: copy vector_invalid_query_test from Cassandra vector_index: make parameter names case insensitive cql3/statements: add `ANN OF` queries support to select statements cql/Cql.g: extend the grammar to allow for `ANN OF` queries cql3/raw: add ANN ordering to the raw statement layer	2025-08-06 09:53:38 +03:00
Jan Łakomy	447c66f4ec	cql3/statements: implement external `ANN OF` queries Implement execution of `ANN OF` queries using the vector_store service. Throw invalid_request_exception with specific message using the ann_error_visitor when ANN request returns no result. Co-authored-by: Dawid Pawlik <dawid.pawlik@scylladb.com> Co-authored-by: Michał Hudobski <michal.hudobski@scylladb.com>	2025-08-05 12:34:48 +02:00
Jan Łakomy	5fecad0ec8	cql3/statements: add `ANN OF` queries support to select statements Add parsing of `ANN OF` queries to the `select_statement` and `indexed_table_select_statement` classes. Add a placeholder for the implementation of external ANN queries. Rename `should_create_view` to `view_should_exist` as it is used not only to check if the view should be created but also if the view has been created. Co-authored-by: Dawid Pawlik <dawid.pawlik@scylladb.com>	2025-08-01 12:08:50 +02:00


3876 Commits