This PR adds the missing documentation for the SELECT ... ANN statement that allows performing vector queries. This is just the basic explanation of the grammar and how to use it. More comprehensive documentation about vector search will be added separately in Scylla Cloud documentation and features description. Links to this additional documentation will be added as part of VECTOR-244.
Fixes: VECTOR-247.
No backport is needed as this is the new feature.
Closesscylladb/scylladb#26282
* github.com:scylladb/scylladb:
cql3: Update error messages to be in line with documentation.
docs: Add CQL documentation for vector queries using SELECT ANN
ScyllaDB offers the `compression` DDL property for configuring compression per user table (compression algorithm and chunk size). If not specified, the default compression algorithm is the LZ4Compressor with a 4KiB chunk size. The same default applies to system tables as well.
This series introduces a new configuration option to allow customizing the default for user tables. It also adds some tests for the new functionality.
Fixes#25195.
Closesscylladb/scylladb#26003
* github.com:scylladb/scylladb:
test/cluster: Add tests for invalid SSTable compression options
test/boost: Add tests for SSTable compression config options
main: Validate SSTable compression options from config
db/config: Add SSTable compression options for user tables
db/config: Prepare compression_parameters for config system
compressor: Validate presence of sstable_compression in parameters
compressor: Add missing space in exception message
ANN (aproximate nearest neighborhood) is just the name of the type
of algorithm used to perform vector search. For this reason the error
messages should refer to vector queries rather than ANN queries.
ScyllaDB offers the `compression` DDL property for configuring
compression per user table (compression algorithm and chunk size). If
not specified, the default compression algorithm is the LZ4Compressor
with a 4KiB chunk size (refer to the default constructor for
`compression_parameters`). The same default applies to system tables as
well.
Add a new configuration option to allow customizing the default for user
tables. Use the previously hardcoded default as the new option's default
value.
Note that the option has no effect on ALTER TABLE statements. An altered
table either inherits explicit compression options from the CQL
statement, or maintains its existing options.
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
The namespace usage in this directory is very inconsistent, with files
and classes scattered in:
* global namespace
* namespace compaction
* namespace sstables
With cases, where all three used in the same file. This code used to
live in sstables/ and some of it still retains namespace sstables as a
heritage of that time. The mismatch between the dir (future module) and
the namespace used is confusing, so finish the migration and move all
code in compaction/ to namespace compaction too.
This patch, although large, is mechanic and only the following kind of
changes are made:
* replace namespace sstable {} with namespace compaction {}
* add namespace compaction {}
* drop/add sstables::
* drop/add compaction::
* move around forward-declarations so they are in the correct namespace
context
This refactoring revealed some awkward leftover coupling between
sstables and compaction, in sstables/sstable_set.cc, where the
make_sstable_set() methods of compaction strategies are implemented.
As requested in #22104, moved the files and fixed other includes and build system.
Moved files:
- combine.hh
- collection_mutation.hh
- collection_mutation.cc
- converting_mutation_partition_applier.hh
- converting_mutation_partition_applier.cc
- counters.hh
- counters.cc
- timestamp.hh
Fixes: #22104
This is a cleanup, no need to backport
Closesscylladb/scylladb#25085
Vector search related implementation moved to a new module vector_search.
As the vector search functionality is going to be extended, it is
better to keep it in a separate module.
This patch corrects the column name formatting whenever
an "Undefined column name" exception is thrown.
Until now we used the `name()` function which
returns a bytes object. This resulted in a message
with a garbled ascii bytes column name instead of
a proper string. We switch to the `text()` function
that returns a sstring instead, making the message
readable.
Tests are adjusted to confirm this behavior.
Fixes: VECTOR-228
Closesscylladb/scylladb#26120
initial implementation to support CDC in tablets-enabled keyspaces.
The design is described in https://docs.google.com/document/d/1qO5f2q5QoN5z1-rYOQFu6tqVLD3Ha6pphXKEqbtSNiU/edit?usp=sharing
It is followed closely for the most part except "Deciding when to change streams" - instead, streams are changed synchronously with tablet split / merge.
Instead of the stream switching algorithm with the double writes, we use a scheme similar to the previous method for vnodes - we add the new streams with timestamp that is sufficiently far into the future.
In this PR we:
* add new group0-based internal system tables for tablet stream metadata and loading it into in-memory CDC metadata
* add virtual tables for CDC consumers
* the write coordinator chooses a stream by looking up the appropriate stream in the CDC metadata
* enable creating tables with CDC enabled in tablets-enabled keyspaces. tablets are allocated for the CDC table, and a stream is created per each tablet.
* on tablet resize (split / merge), the topology coordinator creates a new stream set with a new stream for each new tablet.
* the cdc tablets are co-located with the base tablets
Fixes https://github.com/scylladb/scylladb/issues/22576
backport not needed - new feature
update dtests: https://github.com/scylladb/scylla-dtest/pull/5897
update java cdc library: https://github.com/scylladb/scylla-cdc-java/pull/102
update rust cdc library: https://github.com/scylladb/scylla-cdc-rust/pull/136Closesscylladb/scylladb#23795
* github.com:scylladb/scylladb:
docs/dev: update CDC dev docs for tablets
doc: update CDC docs for tablets
test: cluster_events: enable add_cdc and drop_cdc
test/cql: enable cql cdc tests to run with tablets
test: test_cdc_with_alter: adjust for cdc with tablets
test/cqlpy: adjust cdc tests for tablets
test/cluster/test_cdc_with_tablets: introduce cdc with tablets tests
cdc: enable cdc with tablets
topology coordinator: change streams on tablet split/merge
cdc: virtual tables for cdc with tablets
cdc: generate_stream_diff helper function
cdc: choose stream in tablets enabled keyspaces
cdc: rename get_stream to get_vnode_stream
cdc: load tablet streams metadata from tables
cdc: helper functions for reading metadata from tables
cdc: colocate cdc table with base
cdc: remove streams when dropping CDC table
cdc: create streams when allocating tablets
migration_listener: add on_before_allocate_tablet_map notification
cdc: notify when creating or dropping cdc table
cdc: move cdc table creation to pre_create
cdc: add internal tables for cdc with tablets
cdc: add cdc_with_tablets feature flag
cdc: add is_log_schema helper
Allow to create CDC tables in a tablets-enabled keyspace when all nodes
in the cluster support the cdc_with_tablets feature.
Fixesscylladb/scylladb#22576
This patch adds the possibility to track metrics
per secondary index. Currently, only a histogram
of query latencies is tracked, but more metrics
can be added in the future. To add a new metric,
it needs to be added to the index_metrics struct
in index/secondary_index_manager.hh and then
initialized in index/secondary_index_manager.cc
in the constructor of the index_metrics struct.
The metrics are created when the index is created
and removed when the index is dropped.
First lines of the new metric:
\# HELP scylla_index_query_latencies Index query latencies
\# TYPE scylla_index_query_latencies histogram
scylla_index_query_latencies_sum{idx="test_i_idx",ks="test"} 640
scylla_index_query_latencies_count{idx="test_i_idx",ks="test"} 1
scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="640.000000"} 1
scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="768.000000"} 1
Fixes: https://github.com/scylladb/scylladb/issues/25970Closesscylladb/scylladb#25995
* github.com:scylladb/scylladb:
test: verify that the index metric is added
index, metrics: add per-index metrics
As requested in #22120, moved the files and fixed other includes and build system.
Moved files:
- query.cc
- query-request.hh
- query-result.hh
- query-result-reader.hh
- query-result-set.cc
- query-result-set.hh
- query-result-writer.hh
- query_id.hh
- query_result_merger.hh
Fixes: #22120
This is a cleanup, no need to backport
Closesscylladb/scylladb#25105
This patch adds the possibility to track metrics
per secondary index. Currently, only a histogram
of query latencies is tracked, but more metrics
can be added in the future. To add a new metric,
it needs to be added to the index_metrics struct
in index/secondary_index_manager.hh and then
initialized in index/secondary_index_manager.cc
in the constructor of the index_metrics struct.
The metrics are created when the index is created
and removed when the index is dropped.
First lines of the new metric:
\# HELP scylla_index_query_latencies Index query latencies
\# TYPE scylla_index_query_latencies histogram
scylla_index_query_latencies_sum{idx="test_i_idx",ks="test"} 640
scylla_index_query_latencies_count{idx="test_i_idx",ks="test"} 1
scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="640.000000"} 1
scylla_index_query_latencies_bucket{idx="test_i_idx",ks="test",le="768.000000"} 1
Since creating the vector index does not lead to creation of a view table [#24438] (whose version info had been logged in `system_schema.scylla_tables`) we lacked the information about the version of the index.
The solution we arrived at is to add the version as a field in options column of `system_schema.indexes`.
It requires few changes and seems unintruitive for existing infrastructure.
This patch implements the solution described above.
Refs: VECTOR-142
Closesscylladb/scylladb#25614
* github.com:scylladb/scylladb:
cqlpy/test_vector_index: add vector index version test
vector_index, index_prop_defs: add version to index options
create_index_statement: rename `validator` to `custom_index_factory`
custom index: rename `custom_index_option_name`
vector_index: rename `supported_options` to `vector_index_options`
Replace throwing `protocol_exception` with returning it as a result or an exceptional future in the transport server module. The goal is to improve performance.
Most of the `protocol_exception` throws were made from `fragmented_temporary_buffer` module, by passing `exception_thrower()` to its `read*` methods. `fragmented_temporary_buffer` is changed so that it now accepts an exception creator, not exception thrower. `fragmented_temporary_buffer_concepts::ExceptionCreator` concept replaced `fragmented_temporary_buffer_concepts::ExceptionThrower` and all methods that have been throwing now return failed result of type `utils::result_with_eptr`. This change is then propagated to the callers.
The scope of this patch is `protocol_exception`, so commitlog just calls `.value()` method on the result. If the result failed, that will throw the exception from the result, as defined by `utils::result_with_eptr_throw_policy`. This means that the behavior of commitlog module stays the same.
transport server module handles results gracefully. All the caller functions that return non-future value `T` now return `utils::result_with_eptr<T>`. When the caller is a function that returns a future, and it receives failed result, `make_exception_future(std::move(failed_result).value())` is returned. The rest of the callstack up to the transport server `handle_error` function is already working without throwing, and that's how zero throws is achieved.
cql3 module changes do the same as transport server module.
Benchmark that is not yet merged has commit `67fbe35833e2d23a8e9c2dcb5e04580231d8ec96`, [GitHub diff view](https://github.com/scylladb/scylladb/compare/master...nuivall:scylladb:perf_cql_raw). It uses either read or write query.
Command line used:
```
./build/release/scylla perf-cql-raw --workdir ~/tmp/scylladir --smp 1 --developer-mode 1 --workload write --duration 300 --concurrency 1000 --username cassandra --password cassandra 2>/dev/null
```
The only thing changed across runs is `--workload write`/`--workload read`.
Built and run on `release` target.
<details>
```
throughput:
mean= 36946.04 standard-deviation=1831.28
median= 37515.49 median-absolute-deviation=1544.52
maximum=39748.41 minimum=28443.36
instructions_per_op:
mean= 108105.70 standard-deviation=965.19
median= 108052.56 median-absolute-deviation=53.47
maximum=124735.92 minimum=107899.00
cpu_cycles_per_op:
mean= 70065.73 standard-deviation=2328.50
median= 69755.89 median-absolute-deviation=1250.85
maximum=92631.48 minimum=66479.36
⏱ real=5:11.08 user=2:00.20 sys=2:25.55 cpu=85%
```
```
throughput:
mean= 40718.30 standard-deviation=2237.16
median= 41194.39 median-absolute-deviation=1723.72
maximum=43974.56 minimum=34738.16
instructions_per_op:
mean= 117083.62 standard-deviation=40.74
median= 117087.54 median-absolute-deviation=31.95
maximum=117215.34 minimum=116874.30
cpu_cycles_per_op:
mean= 58777.43 standard-deviation=1225.70
median= 58724.65 median-absolute-deviation=776.03
maximum=64740.54 minimum=55922.58
⏱ real=5:12.37 user=27.461 sys=3:54.53 cpu=83%
```
```
throughput:
mean= 37107.91 standard-deviation=1698.58
median= 37185.53 median-absolute-deviation=1300.99
maximum=40459.85 minimum=29224.83
instructions_per_op:
mean= 108345.12 standard-deviation=931.33
median= 108289.82 median-absolute-deviation=55.97
maximum=124394.65 minimum=108188.37
cpu_cycles_per_op:
mean= 70333.79 standard-deviation=2247.71
median= 69985.47 median-absolute-deviation=1212.65
maximum=92219.10 minimum=65881.72
⏱ real=5:10.98 user=2:40.01 sys=1:45.84 cpu=85%
```
```
throughput:
mean= 38353.12 standard-deviation=1806.46
median= 38971.17 median-absolute-deviation=1365.79
maximum=41143.64 minimum=32967.57
instructions_per_op:
mean= 117270.60 standard-deviation=35.50
median= 117268.07 median-absolute-deviation=16.81
maximum=117475.89 minimum=117073.74
cpu_cycles_per_op:
mean= 57256.00 standard-deviation=1039.17
median= 57341.93 median-absolute-deviation=634.50
maximum=61993.62 minimum=54670.77
⏱ real=5:12.82 user=4:10.79 sys=11.530 cpu=83%
```
This shows ~240 instructions per op increase for reads and ~180 instructions per op increase for writes.
Tests have been run multiple times, with almost identical results. Each run lasted 300 seconds. Number of operations executed is roughly 38k per second * 300 seconds = 11.4m ops.
Update:
I have repeated the benchmark with clean state - reboot computer, put in performance mode, rebuild, closed other apps that might affect CPU and disk usage.
run count: 5 times before and 5 times after the patch
duration: 300 seconds
Average write throughput median before patch: 41155.99
Average write throughput median after patch: 42193.22
Median absolute deviation is also lower now, with values in range 350-550, while the previous runs' values were in range 750-1350.
</details>
Built and run on `release` target.
<details>
./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache false --bypass-cache 2>/dev/null
```
throughput:
mean= 14910.90 standard-deviation=477.72
median= 14956.73 median-absolute-deviation=294.16
maximum=16061.18 minimum=13198.68
instructions_per_op:
mean= 659591.63 standard-deviation=495.85
median= 659595.46 median-absolute-deviation=324.91
maximum=661184.94 minimum=658001.49
cpu_cycles_per_op:
mean= 213301.49 standard-deviation=2724.27
median= 212768.64 median-absolute-deviation=1403.85
maximum=225837.15 minimum=208110.12
⏱ real=5:19.26 user=5:00.22 sys=15.827 cpu=98%
```
./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache false 2>/dev/null
```
throughput:
mean= 93345.45 standard-deviation=4499.00
median= 93915.52 median-absolute-deviation=2764.41
maximum=104343.64 minimum=79816.66
instructions_per_op:
mean= 65556.11 standard-deviation=97.42
median= 65545.11 median-absolute-deviation=71.51
maximum=65806.75 minimum=65346.25
cpu_cycles_per_op:
mean= 34160.75 standard-deviation=803.02
median= 33927.16 median-absolute-deviation=453.08
maximum=39285.19 minimum=32547.13
⏱ real=5:03.23 user=4:29.46 sys=29.255 cpu=98%
```
./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache true 2>/dev/null
```
throughput:
mean= 206982.18 standard-deviation=15894.64
median= 208893.79 median-absolute-deviation=9923.41
maximum=232630.14 minimum=127393.34
instructions_per_op:
mean= 35983.27 standard-deviation=6.12
median= 35982.75 median-absolute-deviation=3.75
maximum=36008.24 minimum=35952.14
cpu_cycles_per_op:
mean= 17374.87 standard-deviation=985.06
median= 17140.81 median-absolute-deviation=368.86
maximum=26125.38 minimum=16421.99
⏱ real=5:01.23 user=4:57.88 sys=0.124 cpu=98%
```
./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache false --bypass-cache 2>/dev/null
```
throughput:
mean= 16198.26 standard-deviation=902.41
median= 16094.02 median-absolute-deviation=588.58
maximum=17890.10 minimum=13458.74
instructions_per_op:
mean= 659752.73 standard-deviation=488.08
median= 659789.16 median-absolute-deviation=334.35
maximum=660881.69 minimum=658460.82
cpu_cycles_per_op:
mean= 216070.70 standard-deviation=3491.26
median= 215320.37 median-absolute-deviation=1678.06
maximum=232396.48 minimum=209839.86
⏱ real=5:17.33 user=4:55.87 sys=18.425 cpu=99%
```
./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache false 2>/dev/null
```
throughput:
mean= 97067.79 standard-deviation=2637.79
median= 97058.93 median-absolute-deviation=1477.30
maximum=106338.97 minimum=87457.60
instructions_per_op:
mean= 65695.66 standard-deviation=58.43
median= 65695.93 median-absolute-deviation=37.67
maximum=65947.76 minimum=65547.05
cpu_cycles_per_op:
mean= 34300.20 standard-deviation=704.66
median= 34143.92 median-absolute-deviation=321.72
maximum=38203.68 minimum=33427.46
⏱ real=5:03.22 user=4:31.56 sys=29.164 cpu=99%
```
./build/release/scylla perf-simple-query --smp 1 --duration 300 --concurrency 1000 --enable-cache true 2>/dev/null
```
throughput:
mean= 223495.91 standard-deviation=6134.95
median= 224825.90 median-absolute-deviation=3302.09
maximum=234859.90 minimum=193209.69
instructions_per_op:
mean= 35981.41 standard-deviation=3.16
median= 35981.13 median-absolute-deviation=2.12
maximum=35991.46 minimum=35972.55
cpu_cycles_per_op:
mean= 17482.26 standard-deviation=281.82
median= 17424.08 median-absolute-deviation=143.91
maximum=19120.68 minimum=16937.43
⏱ real=5:01.23 user=4:58.54 sys=0.136 cpu=99%
```
</details>
Fixes: #24567
This PR is a continuation of #24738 [transport: remove throwing protocol_exception on connection start](https://github.com/scylladb/scylladb/pull/24738). This PR does not solve a burning issue, but is rather an improvement in the same direction. As it is just an enhancement, it should not be backported.
Closesscylladb/scylladb#25408
* github.com:scylladb/scylladb:
test/cqlpy: add protocol exception tests
test/cqlpy: `test_protocol_exceptions.py` refactor message frame building
test/cqlpy: `test_protocol_exceptions.py` refactor duplicate code
transport: replace `make_frame` throw with return result
cql3: remove throwing `protocol_exception`
transport: replace throw in validate_utf8 with result_with_exception_ptr return
transport: replace throwing protocol_exception with returns
utils: add result_with_exception_ptr
test/cqlpy: add unknown compression algorithm test case
Since creating the vector index does not lead to creation
of a view table [#24438] (whose version info had been logged in
`system_schema.scylla_tables`) we lack the information about
the version of the index.
The mentioned version is used to recognize the quick-drop-create
index with the same parameters that needs to be rebuild.
The case is mainly experienced while testing, benchmarking
or experimenting with Vector Search.
Nevertheless it is important to have it considered, as it is really
weird having seen that DROP and CREATE commands did not change
anything.
Although being nice "optimization" to use the same old index,
the rebuild feels more natural for the get-to-know-VS-users.
Should not change anything in a real production environment.
The solution we arrived at is to add the version as a field in
options column of `system_schema.indexes`.
The version of vector index is a base table's schema version
on which the index was created.
The table's schema version changes everytime a table is changed
meaning that CREATE INDEX or DROP INDEX statement also change it.
Every index has a different index version, so it allows to identify
them easily.
This patch implements the solution described above.
Before the patch, user with CREATE access could create a table with CDC or alter the table enabling CDC, but could not query a SELECT on the CDC table they created.
It was due to the fact, the SELECT permission was checked on the CDC log, and later it's "parent" - the keyspace, but not the base table, on which the user had SELECT permission automatically granted on CREATE.
This patch matches the behavior of querying the CDC log to the one implemented for Materialized Views:
1. No new permissions are granted on CREATE.
2. When querying SELECT, the permissions on base table SELECT are checked.
Fixes: https://github.com/scylladb/scylladb/issues/19798
Fixes: VECTOR-151
Closes scylladb/scylladb#25797
* github.com:scylladb/scylladb:
cqlpy/test_permissions: run the reproducer tests for #19798
select_statement: check for access to CDC base table
Before this change, executing `DESCRIBE MATERIALIZED VIEW` on the underlying
materialized view of a secondary index would produce a `CREATE INDEX` statement.
It was not only confusing, but it also prevented from learning about
the definition of the view. The only way to do so was to query system tables.
We change that behavior and produce a `CREATE MATERIALIZED VIEW` statement
instead. The statement is printed as a comment to implicitly convey that
the user should not attempt to execute it to restore the view. A short comment
is provided to make it clearer.
Before this commit:
```
cqlsh> CREATE TABLE ks.t(p int PRIMARY KEY, v int);
cqlsh> CREATE INDEX i ON ks.t(v);
cqlsh> DESCRIBE MATERIALIZED VIEW ks.i;
CREATE INDEX i ON ks.t(v);
```
After this commit:
```
cqlsh> CREATE TABLE ks.t(p int PRIMARY KEY, v int);
cqlsh> CREATE INDEX i ON ks.t(v);
cqlsh> DESCRIBE MATERIALIZED VIEW ks.i;
/* Do NOT execute this statement! It's only for informational purposes.
This materialized view is the underlying materialized view of a secondary
index. It can be restored via restoring the index.
CREATE MATERIALIZED VIEW ks.i_index [...];
*/
```
Note that describing the base table has not been affected and still works
as follows:
```
cqlsh> CREATE TABLE ks.t(p int PRIMARY KEY, v int);
cqlsh> CREATE INDEX i ON ks.t(v);
cqlsh> DESCRIBE TABLE ks.t;
CREATE TABLE ks.t (
p int,
v int,
PRIMARY KEY (p)
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
AND comment = ''
AND compaction = {'class': 'IncrementalCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND speculative_retry = '99.0PERCENTILE'
AND tombstone_gc = {'mode': 'timeout', 'propagation_delay_in_seconds': '3600'};
CREATE INDEX i ON ks.t(v);
```
We also provide two reproducers of scylladb/scylladb#24610.
Fixesscylladb/scylladb#24610Closesscylladb/scylladb#25697
Before the patch, user with CREATE access could create a table
with CDC or alter the table enabling CDC, but could not query
a SELECT on the CDC table they created.
It was due to the fact, the SELECT permission was checked on
the CDC log, and later it's "parent" - the keyspace,
but not thebase table, on which the user had SELECT permission
automatically granted on CREATE.
This patch matches the behaviour of querying the CDC log
to the one implemented for Materialized Views:
1. No new permissions are granted on CREATE.
2. When querying SELECT, the permissions on base table
SELECT are checked.
Fixes: #19798
When creating a new keyspace, replication factor must be stated.
For example:
`CREATE KEYSPACE ks WITH REPLICATION { 'class': 'NetworkTopologyStrategy', 'replication_factor': 3 };`
This patch changes it in the following way - if there is no
replication factor specified, use default replication factor.
Default replication factor is equal to the number of racks that
are not arbiter-only, i.e. racks that have at least one non-arbiter node.
The following syntax is now valid:
`CREATE KEYSPACE ks WITH REPLICATION { 'class': 'NetworkTopologyStrategy' };`
`CREATE KEYSPACE ks WITH REPLICATION { };`
Fixes#16028
Backport is not needed. This is an enhancement for future releases.
Closesscylladb/scylladb#25570
* github.com:scylladb/scylladb:
docs/cql: update documentation for default replication factor
test/cqlpy: add keyspace creation default replication factor tests
cql3: add default replication factor to `create_keyspace_statement`
Normally, when we create a table, MV, etc., we apply `cf_prop_defs` to the
schema builder via the function `cf_prop_defs::apply_to_builder`. Unfortunately,
that didn't happen when creating CDC log tables, and so we might have missed
some of the properties that would normally be set to some value, even if the
default one.
One particular example of that phenomenon was `tombstone_gc`. For better or
worse, it's not a "standalone property" of a table, but rather part of
`extensions`. [Somewhat related issue: scylladb/scylladb#9722]
That may have and did cause trouble. Consider this scenario:
1. A CDC log table is created.
2. The table does NOT have any value of `tombstone_gc` set.
3. The user edits the table via `ALTER TABLE`. That statement treats the log
table just like any other one (at least as far as the relevant portion of the
logic is concerned). Among other things, it uses
`cf_prop_defs::apply_to_builder`, and as a result, the `tombstone_gc`
property is set to some value:
* the default one if the user doesn't specify it in the statement,
* a custom one if they do.
Why is that a problem?
First of all, it's confusing. When we perform a schema backup and a table uses
CDC, we include an ALTER statement for its corresponding CDC log table (for more
context, see issue scylladb/scylladb#18467 or commit
scylladb/scylladb@f12edbdd95).
There are two consequences for the user here:
1. If the log table had NOT been altered ever since it was created, the
statement will miss the `tombstone_gc` property as if it couldn't be set for
it at all. That's confusing!
2. If the log table HAD in fact been altered after its creation, the statement
will include the `tombstone_gc` property. That's even more confusing (why was
it not present the first time, but it is now?).
The `tombstone_gc` property should always be set to avoid confusion and
problematic edge cases in tests and to simply be consistent with how other
schema entities work.
The solution we employ is that we always set the property to the default
value. That includes the case when we reattach the log table to the base;
consider the following scenario:
1. Create a table with CDC enabled.
2. Detach the log table by performing `ALTER TABLE ... WITH cdc = {'enabled': false}`.
3. Change the `tombstone_gc` property of the log table.
4. Reattach the log table to the base in the same way as in step 2.
The expected result would be that the new value of `tombstone_gc` would be
preserved after reattaching the log table. However, that's not what will
happen. We decide to stay consistent with how other properties of a log
table behave, and we reset them after every reattachment. We might change that
in the future: see issue scylladb/scylladb#25523.
Two reproducer tests of scylladb/scylladb#25187 are included in the changes.
Backport: The problem is not critical, so it may not be necessary to backport the changes.
That's to be discussed.
Closesscylladb/scylladb#25521
* github.com:scylladb/scylladb:
cdc: Set tombstone_gc when creating log table
tombstone_gc: Add overload of get_default_tombstone_gc_mode
tombstone_gc: Rename get_default_tombstonesonte_gc_mode
Executing a vector search (SELECT with ANN OF ordering) query with `TRACING ON` enabled
caused a node to crash due to a null pointer dereference.
This occurred because a vector index does not have an associated view
table, making its `_view_schema` member null. The implementation
attempted to enable tracing on this null view schema, leading to the
crash.
The fix adds a null check for `_view_schema` before attempting to
enable tracing on the view (index) table.
A regression test is included to prevent this from happening again.
Fixes: VECTOR-179
Closesscylladb/scylladb#25500
This patch introduces `view_building_coordinator`, a single entity within whole cluster responsible for building tablet-based views.
The view building coordinator takes slightly different approach than the existing node-local view builder. The whole process is split into smaller view building tasks, one per each tablet replica of the base table.
The coordinator builds one base table at a time and it can choose another when all views of currently processing base table are built.
The tasks are started by setting `STARTED` state and they are executed by node-local view building worker. The tasks are scheduled in a way, that each shard processes only one tablet at a time (multiple tasks can be started for a shard on a node because a table can have multiple views but then all tasks have the same base table and tablet (last_token)). Once the coordinator starts the tasks, it sends `work_on_view_building_tasks` RPC to start the tasks and receive their results.
This RPC is resilient to RPC failure or raft leader change, meaning if one RPC call started a batch of tasks but then failed (for instance the raft leader was changed and caller aborted waiting for the response), next RPC call will attach itself to the already started batch.
The coordinator plugs into handling tablet operations (migration/resize/RF change) and adjusts its tasks accordingly. At the start of each tablet operation, the coordinator aborts necessary view building tasks to prevent https://github.com/scylladb/scylladb/issues/21564. Then, new adjusted tasks are created at the end of the operation.
If the operation fails at any moment, aborted tasks are rollback.
The view building coordinator can also handle staging sstables using process_staging view building tasks. We do this because we don't want to start generating view updates from a staging sstable prematurely, before the writes are directed to the new replica (https://github.com/scylladb/scylladb/issues/19149).
For detailed description check: `docs/dev/view-building-coordinator.md`
Fixes https://github.com/scylladb/scylladb/issues/22288
Fixes https://github.com/scylladb/scylladb/issues/19149
Fixes https://github.com/scylladb/scylladb/issues/21564
Fixes https://github.com/scylladb/scylladb/issues/17603
Fixes https://github.com/scylladb/scylladb/issues/22586
Fixes https://github.com/scylladb/scylladb/issues/18826
Fixes https://github.com/scylladb/scylladb/issues/23930
---
This PR is reimplementation of https://github.com/scylladb/scylladb/pull/21942Closesscylladb/scylladb#23760
* github.com:scylladb/scylladb:
test/cluster: add view build status tests
test/cluster: add view building coordinator tests
utils/error_injection: allow to abort `injection_handler::wait_for_message()`
test: adjust existing tests
utils/error_injection: add injection with `sleep_abortable()`
db/view/view_builder: ignore `no_such_keyspace` exception
docs/dev: add view building coordinator documentation
db/view/view_building_worker: work on `process_staging` tasks
db/view/view_building_worker: register staging sstable to view building coordinator when needed
db/view/view_building_worker: discover staging sstables
db/view/view_building_worker: add method to register staging sstable
db/view/view_update_generator: add method to process staging sstables instantly
db/view/view_update_generator: extract generating updates from staging sstables to a method
db/view/view_update_generator: ignore tablet-based sstables
db/view/view_building_coordinator: update view build status on node join/left
db/view/view_building_coordinator: handle tablet operations
db/view: add view building task mutation builder
service/topology_coordinator: run view building coordinator
db/view: introduce `view_building_coordinator`
db/view/view_building_worker: update built views locally
db/view: introduce `view_building_worker`
db/view: extract common view building functionalities
db/view: prepare to create abstract `view_consumer`
message/messaging_service: add `work_on_view_building_tasks` RPC
service/topology_coordinator: make `term_changed_error` public
db/schema_tables: create/cleanup tasks when an index is created/dropped
service/migration_manager: cleanup view building state on drop keyspace
service/migration_manager: cleanup view building state on drop view
service/migration_manager: create view building tasks on create view
test/boost: enable proxy remote in some tests
service/migration_manager: pass `storage_proxy` to `prepare_keyspace_drop_announcement()`
service/migration_manager: coroutinize `prepare_new_view_announcement()`
service/storage_proxy: expose references to `system_keyspace` and `view_building_state_machine`
service: reload `view_building_state_machine` on group0 apply()
service/vb_coordinator: add currently processing base
db/system_keyspace: move `get_scylla_local_mutation()` up
db/system_keyspace: add `view_building_tasks` table
db/view: add view_building_state and views_state
db/system_keyspace: add method to get view build status map
db/view: extract `system.view_build_status_v2` cql statements to system_keyspace
db/system_keyspace: move `internal_system_query_state()` function earlier
db/view: ignore tablet-based views in `view_builder`
gms/feature_service: add VIEW_BUILDING_COORDINATOR feature
The change is motivated by the fact that indeed the result of
`get_custom_class_factory` is a `custom_index_factory`.
The name `validator` was a bit misleading as it does not validate
anything by itself. Furthermore if we wanted to use the custom index
produced by the factory in other operations than validate, the name
feels really off.
Renamed `custom_index_option_name` to `custom_class_option_name`
as the late was a bit misleading since we refactored our model
of custom indexes to be index class reliant.
Remove throwing `protocol_exception` in cql3/query_options.cc` in
function `cql3::query_options::check_serial_consistency` as part of
an ongoing effort to remove throwing `protocol_exception`.
This change only affects code local to the `cql3` module.
Refs: #24567
When creating a new keyspace, replication factor must be stated.
For example:
`CREATE KEYSPACE ks WITH REPLICATION { 'class': 'NetworkTopologyStrategy', 'replication_factor': 3 };`
This patch changes it in the following way - if there is no
replication factor specified, use default replication factor.
Default replication factor is equal to the number of racks that
are not comprised of only zero-token nodes, i.e. racks that have
at least one non-zero-token node.
The following syntax is now valid:
`CREATE KEYSPACE ks WITH REPLICATION { 'class': 'NetworkTopologyStrategy' };`
`CREATE KEYSPACE ks WITH REPLICATION { };`
Fixes: #16028
Today, any source file or header file that wants to use the
tri_mode_restriction type needs to include db/config.hh, which is a
large and frequently-changing header file. In this patch we split this
type into a separate header file, db/tri_mode_restriction.hh, and avoid
a few unnecessary inclusions of db/config.hh. However, a few source
files now need to explicitly include db/config.hh, after its
transitive inclusion is gone.
Note that the overwhelmingly common inclusion of db/config.hh continues
to be a problem after this patch - 128 source files include it directly.
So this patch is just the first step in long journey.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#25692
When a vector index is created in Scylla, it is initially built using a full scan of the database. After that, it stays up to date by tracking changes through CDC, which should be automatically enabled when the vector index is created.
When a user attempts to enable Vector Search (VS), the system checks whether Change Data Capture (CDC) is enabled and properly configured:
1. CDC is not enabled
- CDC is automatically enabled with the minimum required TTL (Time-to-Live) for VS (24 hours) and the delta mode set to 'full' or post-image is enabled.
- If the user later tries to reduce the CDC TTL below 24 hours or set delta mode to 'keys' with post-image disabled, the action fails.
- Error message: Clearly states that CDC TTL must be at least 24 hours and delta mode must be set to 'full' or post-image must be enabled for VS to function.
2. CDC is already enabled
- If CDC TTL is ≥ 24 hours and delta mode is set to 'full' or post-image is enabled: VS is enabled successfully.
- If CDC TTL is < 24 hours or delta mode is set to 'keys' with post-image disabled: The VS enabling process fails.
- Error message: Informs the user that CDC TTL must be at least 24 hours, delta mode must be set to 'full' or post-image must be enabled, and provides a link to documentation on how to update the TTL, delta mode, and post-image.
When a user attempts to disable CDC when VS is enabled, the action will fail and the user will be informed by error message that clearly states that VS needs to be disabled (vector indexes have to be dropped) first.
Full setup requirements and steps will be detailed in the documentation of Vector Search.
Co-authored-by: @smoczy123
Fixes: VECTOR-27
Fixes: VECTOR-25
Closesscylladb/scylladb#25179
* github.com:scylladb/scylladb:
test/cqlpy: ensure Vector Search CDC options
test/boost: adjust CDC boost tests for Vector Search
test/cql: add Vector Search CDC enable/disable test
cdc, vector_index: provide minimal option setup for Vector Search
test/cqlpy: adjust describe table tests with CDC for Vector Search
describe, cdc: adjust describe for cdc log tables
cdc: enable CDC log when vector index is created
test/cqlpy: run vector_index tests only on vnodes
vector_index: check if vector index exists in schema
Although RF-rack-valid keyspaces are not universally enforced
yet (they're governed by the configuration option
`rf_rack_valid_keyspaces`), we'd like to encourage the user to
abide by the restriction.
To that end, we're introducing a warning when creating or
altering a keyspace. If the configuration option is disabled,
but the user is trying to create an RF-rack-invalid keyspace,
they'll receive a warning.
We provide a validation test.
Ensure that the CDC used by Vector Search has at least 24h TTL
and delta mode is set to 'full' or postimage is enabled.
This setup is required by the Vector Store to work as intended.
The TTL of at least 24h is a rough estimate of the maximal time
needed for the full scan conducted by Vector Store to finish.
The delta mode set to 'full' or postimage enabled is needed
to read the values of vectors being written to the table,
so Vector Store can save them in the desired external index.
As the default we set TTL = 24h, delta = 'full', postimage = false.
Full delta is preffered option to log the vector values as it is less
costly and does not require additional read on write.
When creating a new keyspace, both replication strategy and replication
factor must be stated. For example:
`CREATE KEYSPACE ks WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'replication_factor' : 3 };`
This syntax is verbose, and in all but some testing scenarios
`NetworkTopologyStrategy` is used.
This patch allows skipping replication strategy name, filling it with
`NetworkTopologyStrategy` when that happens. The following syntax is now
valid:
`CREATE KEYSPACE ks WITH REPLICATION = { 'replication_factor' : 3 };`
and will give the same result as the previous, more explicit one.
Fixes#16029
Derive both vnode_effective_replication_map
and local_effective_replication_map from
static_effective_replication_map as both are static and per-keyspace.
However, local_effective_replication_map does not need vnodes
for the mapping of all tokens to the local node.
Refs #22733
* No backport required
Closesscylladb/scylladb#25222
* github.com:scylladb/scylladb:
locator: abstract_replication_strategy: implement local_replication_strategy
locator: vnode_effective_replication_map: convert clone_data_gently to clone_gently
locator: abstract_replication_map: rename make_effective_replication_map
locator: abstract_replication_map: rename calculate_effective_replication_map
replica: database: keyspace: rename {create,update}_effective_replication_map
locator: effective_replication_map_factory: rename create_effective_replication_map
locator: abstract_replication_strategy: rename vnode_effective_replication_map_ptr et. al
locator: abstract_replication_strategy: rename global_vnode_effective_replication_map
keyspace: rename get_vnode_effective_replication_map
dht: range_streamer: use naked e_r_m pointers
storage_service: use naked e_r_m pointers
alternator: ttl: use naked e_r_m pointers
locator: abstract_replication_strategy: define is_local
Prefer for specializing the local replication strategy,
local effective replication map, et. al byt defining
an is_local() predicate, similar to uses_tablets().
Note that is_vnode_based() still applies to local replication
strategy.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Implement execution of `ANN OF` queries using the vector_store service.
Throw invalid_request_exception with specific message using
the ann_error_visitor when ANN request returns no result.
Co-authored-by: Dawid Pawlik <dawid.pawlik@scylladb.com>
Co-authored-by: Michał Hudobski <michal.hudobski@scylladb.com>
Add parsing of `ANN OF` queries to the `select_statement` and
`indexed_table_select_statement` classes.
Add a placeholder for the implementation of external ANN queries.
Rename `should_create_view` to `view_should_exist` as it is used
not only to check if the view should be created but also if
the view has been created.
Co-authored-by: Dawid Pawlik <dawid.pawlik@scylladb.com>
We want to access the paxos state table only on the local node and
shard (or shards in case of intranode_migration). In this commit we
add a node_local_only flag to query_options, which allows to do that.
This flag can be set for a query via make_internal_options.
We handle this flag on the statements layer by forwarding it to
either coordinator_query_options or coordinator_mutate_options.
The set of columns of a CDC log table should be managed automatically
by Scylla, and the user should not have the ability to manipulate them
directly. That could lead to disastrous consequences such as a
segmentation fault.
In this commit, we're restricting those operations. We also provide two
validation tests.
One of the existing tests had to be adjusted as it modified the type
of a column in a CDC log table. Since the test simply verifies that
the user has sufficient permissions to perform `ALTER TABLE` on the log
table, the test is still valid.
Fixesscylladb/scylladb#24643
Backport: we should backport the change to all affected
branches to prevent the consequences that may affect the user.
Closesscylladb/scylladb#25008
* github.com:scylladb/scylladb:
cdc: Forbid altering columns of inactive CDC log table
cdc: Forbid altering columns of CDC log tables directly
When CDC becomes disabled on the base table, the CDC log table
still exsits (cf. scylladb/scylladb@adda43edc7).
If it continues to exist up to the point when CDC is re-enabled
on the base table, no new log table will be created -- instead,
the old olg table will be *re-attached*.
Since we want to avoid situations when the definition of the log
table has become misaligned with the definition of the base table
due to actions of the user, we forbid modifying the set of columns
or renaming them in CDC log tables, even when they're inactive.
Validation tests are provided.
This PR adds a way for custom indexes to decide whether a view should be created for them, as for the vector_index the view is not needed, because we store it in the external service. To allow this, custom logic for describing indexes using custom classes was added (as it used to depend on the view corresponding to an index).
Fixes: VECTOR-10
Closesscylladb/scylladb#24438
* github.com:scylladb/scylladb:
custom_index: do not create view when creating a custom index
custom_index: refactor describe for custom indexes
custom_index: remove unneeded duplicate of a static string
The set of columns of a CDC log table should be managed automatically
by Scylla, and the user should not have the ability to manipulate them
directly. That could lead to disastrous consequences such as a
segmentation fault.
In this commit, we're restricting those operations. We also provide two
validation tests.
One of the existing tests had to be adjusted as it modified the type
of a column in a CDC log table. Since the test simply verifies that
the user has sufficient permissions to perform `ALTER TABLE` on the log
table, the test is still valid.
Fixesscylladb/scylladb#24643
This change is preparing ground for state update unification for raft bound subsystems. It introduces schema_applier which in the future will become generic interface for applying mutations in raft.
Pulling database::apply() out of schema merging code will allow to batch changes to subsystems. Future generic code will first call prepare() on all implementations, then single database::apply() and then update() on all implementations, then on each shard it will call commit() for all implementations, without preemption so that the change is observed as atomic across all subsystems, and then post_commit().
Backport: no, it's a new feature
Fixes: https://github.com/scylladb/scylladb/issues/19649
Fixes https://github.com/scylladb/scylladb/issues/24531Closesscylladb/scylladb#24886
[avi: adjust for std::vector<mutations> -> utils::chunked_vector<mutations>]
* github.com:scylladb/scylladb:
test: add type creation to test_snapshot
storage_service: always wake up load balancer on update tablet metadata
db: schema_applier: call destroy also when exception occurs
db: replica: simplify seeding ERM during shema change
db: remove cleanup from add_column_family
db: abort on exception during schema commit phase
db: make user defined types changes atomic
replica: db: make keyspace schema changes atomic
db: atomically apply changes to tables and views
replica: make truncate_table_on_all_shards get whole schema from table_shards
service: split update_tablet_metadata into two phases
service: pull out update_tablet_metadata from migration_listener
db: service: add store_service dependency to schema_applier
service: simplify load_tablet_metadata and update_tablet_metadata
db: don't perform move on tablet_hint reference
replica: split add_column_family_and_make_directory into steps
replica: db: split drop_table into steps
db: don't move map references in merge_tables_and_views()
db: introduce commit_on_shard function
db: access types during schema merge via special storage
replica: make non-preemptive keyspace create/update/delete functions public
replica: split update keyspace into two phases
replica: split creating keyspace into two functions
db: rename create_keyspace_from_schema_partition
db: decouple functions and aggregates schema change notification from merging code
db: store functions and aggregates change batch in schema_applier
db: decouple tables and views schema change notifications from merging code
db: store tables and views schema diff in schema_applier
db: decouple user type schema change notifications from types merging code
service: unify keyspace notification functions arguments
db: replica: decouple keyspace schema change notifications to a separate function
db: add class encapsulating schema merging