Commit Graph

1785 Commits

Author SHA1 Message Date
Dario Mirovic
bc8bb0873d cql3: add default replication strategy to create_keyspace_statement
When creating a new keyspace, both replication strategy and replication
factor must be stated. For example:
`CREATE KEYSPACE ks WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'replication_factor' : 3 };`

This syntax is verbose, and in all but some testing scenarios
`NetworkTopologyStrategy` is used.

This patch allows skipping replication strategy name, filling it with
`NetworkTopologyStrategy` when that happens. The following syntax is now
valid:
`CREATE KEYSPACE ks WITH REPLICATION = { 'replication_factor' : 3 };`
and will give the same result as the previous, more explicit one.

Fixes #16029
2025-08-13 01:51:53 +02:00
Avi Kivity
8164f72f6e Merge 'Separate local_effective_replication_map from vnode_effective_replication_map' from Benny Halevy
Derive both vnode_effective_replication_map
and local_effective_replication_map from
static_effective_replication_map as both are static and per-keyspace.

However, local_effective_replication_map does not need vnodes
for the mapping of all tokens to the local node.

Refs #22733

* No backport required

Closes scylladb/scylladb#25222

* github.com:scylladb/scylladb:
  locator: abstract_replication_strategy: implement local_replication_strategy
  locator: vnode_effective_replication_map: convert clone_data_gently to clone_gently
  locator: abstract_replication_map: rename make_effective_replication_map
  locator: abstract_replication_map: rename calculate_effective_replication_map
  replica: database: keyspace: rename {create,update}_effective_replication_map
  locator: effective_replication_map_factory: rename create_effective_replication_map
  locator: abstract_replication_strategy: rename vnode_effective_replication_map_ptr et. al
  locator: abstract_replication_strategy: rename global_vnode_effective_replication_map
  keyspace: rename get_vnode_effective_replication_map
  dht: range_streamer: use naked e_r_m pointers
  storage_service: use naked e_r_m pointers
  alternator: ttl: use naked e_r_m pointers
  locator: abstract_replication_strategy: define is_local
2025-08-07 12:51:43 +03:00
Benny Halevy
ec85678de1 locator: abstract_replication_strategy: define is_local
Prefer for specializing the local replication strategy,
local effective replication map, et. al byt defining
an is_local() predicate, similar to uses_tablets().

Note that is_vnode_based() still applies to local replication
strategy.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-08-06 13:34:23 +03:00
Jan Łakomy
447c66f4ec cql3/statements: implement external ANN OF queries
Implement execution of `ANN OF` queries using the vector_store service.

Throw invalid_request_exception with specific message using
the ann_error_visitor when ANN request returns no result.

Co-authored-by: Dawid Pawlik <dawid.pawlik@scylladb.com>
Co-authored-by: Michał Hudobski <michal.hudobski@scylladb.com>
2025-08-05 12:34:48 +02:00
Jan Łakomy
5fecad0ec8 cql3/statements: add ANN OF queries support to select statements
Add parsing of `ANN OF` queries to the `select_statement` and
`indexed_table_select_statement` classes.
Add a placeholder for the implementation of external ANN queries.

Rename `should_create_view` to `view_should_exist` as it is used
not only to check if the view should be created but also if
the view has been created.

Co-authored-by: Dawid Pawlik <dawid.pawlik@scylladb.com>
2025-08-01 12:08:50 +02:00
Jan Łakomy
d073a4c1fa cql3/raw: add ANN ordering to the raw statement layer
Extend `orderings_type` to include ANN ordering.

Co-authored-by: Dawid Pawlik <dawid.pawlik@scylladb.com>
2025-07-31 11:11:24 +02:00
Petr Gusev
aae5260147 create_keyspace: fix warning for tablets
Remove LWT from the list of unsupported features.
2025-07-24 20:04:43 +02:00
Petr Gusev
ff1caa9798 query_options: add node_local_only mode
We want to access the paxos state table only on the local node and
shard (or shards in case of intranode_migration). In this commit we
add a node_local_only flag to query_options, which allows to do that.
This flag can be set for a query via make_internal_options.

We handle this flag on the statements layer by forwarding it to
either coordinator_query_options or coordinator_mutate_options.
2025-07-24 19:48:08 +02:00
Piotr Dulikowski
7fd97e6a93 Merge 'cdc: Forbid altering columns of CDC log tables directly' from Dawid Mędrek
The set of columns of a CDC log table should be managed automatically
by Scylla, and the user should not have the ability to manipulate them
directly. That could lead to disastrous consequences such as a
segmentation fault.

In this commit, we're restricting those operations. We also provide two
validation tests.

One of the existing tests had to be adjusted as it modified the type
of a column in a CDC log table. Since the test simply verifies that
the user has sufficient permissions to perform `ALTER TABLE` on the log
table, the test is still valid.

Fixes scylladb/scylladb#24643

Backport: we should backport the change to all affected
branches to prevent the consequences that may affect the user.

Closes scylladb/scylladb#25008

* github.com:scylladb/scylladb:
  cdc: Forbid altering columns of inactive CDC log table
  cdc: Forbid altering columns of CDC log tables directly
2025-07-21 09:31:00 +02:00
Dawid Mędrek
59800b1d66 cdc: Forbid altering columns of inactive CDC log table
When CDC becomes disabled on the base table, the CDC log table
still exsits (cf. scylladb/scylladb@adda43edc7).
If it continues to exist up to the point when CDC is re-enabled
on the base table, no new log table will be created -- instead,
the old olg table will be *re-attached*.

Since we want to avoid situations when the definition of the log
table has become misaligned with the definition of the base table
due to actions of the user, we forbid modifying the set of columns
or renaming them in CDC log tables, even when they're inactive.

Validation tests are provided.
2025-07-18 15:03:08 +02:00
Nadav Har'El
04b263b51a Merge 'vector_index: do not create a view when creating a vector index' from Michał Hudobski
This PR adds a way for custom indexes to decide whether a view should be created for them, as for the vector_index the view is not needed, because we store it in the external service. To allow this, custom logic for describing indexes using custom classes was added (as it used to depend on the view corresponding to an index).

Fixes: VECTOR-10

Closes scylladb/scylladb#24438

* github.com:scylladb/scylladb:
  custom_index: do not create view when creating a custom index
  custom_index: refactor describe for custom indexes
  custom_index: remove unneeded duplicate of a static string
2025-07-17 13:48:49 +03:00
Dawid Mędrek
20d0050f4e cdc: Forbid altering columns of CDC log tables directly
The set of columns of a CDC log table should be managed automatically
by Scylla, and the user should not have the ability to manipulate them
directly. That could lead to disastrous consequences such as a
segmentation fault.

In this commit, we're restricting those operations. We also provide two
validation tests.

One of the existing tests had to be adjusted as it modified the type
of a column in a CDC log table. Since the test simply verifies that
the user has sufficient permissions to perform `ALTER TABLE` on the log
table, the test is still valid.

Fixes scylladb/scylladb#24643
2025-07-16 15:35:48 +02:00
Avi Kivity
6fce817aa8 Merge 'Atomic in-memory schema changes application' from Marcin Maliszkiewicz
This change is preparing ground for state update unification for raft bound subsystems. It introduces schema_applier which in the future will become generic interface for applying mutations in raft.

Pulling database::apply() out of schema merging code will allow to batch changes to subsystems. Future generic code will first call prepare() on all implementations, then single database::apply() and then update() on all implementations, then on each shard it will call commit() for all implementations, without preemption so that the change is observed as atomic across all subsystems, and then post_commit().

Backport: no, it's a new feature

Fixes: https://github.com/scylladb/scylladb/issues/19649
Fixes https://github.com/scylladb/scylladb/issues/24531

Closes scylladb/scylladb#24886

[avi: adjust for std::vector<mutations> -> utils::chunked_vector<mutations>]

* github.com:scylladb/scylladb:
  test: add type creation to test_snapshot
  storage_service: always wake up load balancer on update tablet metadata
  db: schema_applier: call destroy also when exception occurs
  db: replica: simplify seeding ERM during shema change
  db: remove cleanup from add_column_family
  db: abort on exception during schema commit phase
  db: make user defined types changes atomic
  replica: db: make keyspace schema changes atomic
  db: atomically apply changes to tables and views
  replica: make truncate_table_on_all_shards get whole schema from table_shards
  service: split update_tablet_metadata into two phases
  service: pull out update_tablet_metadata from migration_listener
  db: service: add store_service dependency to schema_applier
  service: simplify load_tablet_metadata and update_tablet_metadata
  db: don't perform move on tablet_hint reference
  replica: split add_column_family_and_make_directory into steps
  replica: db: split drop_table into steps
  db: don't move map references in merge_tables_and_views()
  db: introduce commit_on_shard function
  db: access types during schema merge via special storage
  replica: make non-preemptive keyspace create/update/delete functions public
  replica: split update keyspace into two phases
  replica: split creating keyspace into two functions
  db: rename create_keyspace_from_schema_partition
  db: decouple functions and aggregates schema change notification from merging code
  db: store functions and aggregates change batch in schema_applier
  db: decouple tables and views schema change notifications from merging code
  db: store tables and views schema diff in schema_applier
  db: decouple user type schema change notifications from types merging code
  service: unify keyspace notification functions arguments
  db: replica: decouple keyspace schema change notifications to a separate function
  db: add class encapsulating schema merging
2025-07-13 20:47:55 +03:00
Benny Halevy
3feb759943 everywhere: use utils::chunked_vector for list of mutations
Currently, we use std::vector<*mutation> to keep
a list of mutations for processing.
This can lead to large allocation, e.g. when the vector
size is a function of the number of tables.

Use a chunked vector instead to prevent oversized allocations.

`perf-simple-query --smp 1` results obtained for fixed 400MHz frequency
and PGO disabled:

Before (read path):
```
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...

89055.97 tps ( 66.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   39417 insns/op,   18003 cycles/op,        0 errors)
103372.72 tps ( 66.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   39380 insns/op,   17300 cycles/op,        0 errors)
98942.27 tps ( 66.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   39413 insns/op,   17336 cycles/op,        0 errors)
103752.93 tps ( 66.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   39407 insns/op,   17252 cycles/op,        0 errors)
102516.77 tps ( 66.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   39403 insns/op,   17288 cycles/op,        0 errors)
throughput:
	mean=   99528.13 standard-deviation=6155.71
	median= 102516.77 median-absolute-deviation=3844.59
	maximum=103752.93 minimum=89055.97
instructions_per_op:
	mean=   39403.99 standard-deviation=14.25
	median= 39406.75 median-absolute-deviation=9.30
	maximum=39416.63 minimum=39380.39
cpu_cycles_per_op:
	mean=   17435.81 standard-deviation=318.24
	median= 17300.40 median-absolute-deviation=147.59
	maximum=18002.53 minimum=17251.75
```

After (read path)
```
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...
59755.04 tps ( 66.2 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   39466 insns/op,   22834 cycles/op,        0 errors)
71854.16 tps ( 66.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   39417 insns/op,   17883 cycles/op,        0 errors)
82149.45 tps ( 66.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   39411 insns/op,   17409 cycles/op,        0 errors)
49640.04 tps ( 66.1 allocs/op,   0.0 logallocs/op,  14.3 tasks/op,   39474 insns/op,   19975 cycles/op,        0 errors)
54963.22 tps ( 66.1 allocs/op,   0.0 logallocs/op,  14.3 tasks/op,   39474 insns/op,   18235 cycles/op,        0 errors)
throughput:
	mean=   63672.38 standard-deviation=13195.12
	median= 59755.04 median-absolute-deviation=8709.16
	maximum=82149.45 minimum=49640.04
instructions_per_op:
	mean=   39448.38 standard-deviation=31.60
	median= 39466.17 median-absolute-deviation=25.75
	maximum=39474.12 minimum=39411.42
cpu_cycles_per_op:
	mean=   19267.01 standard-deviation=2217.03
	median= 18234.80 median-absolute-deviation=1384.25
	maximum=22834.26 minimum=17408.67
```

`perf-simple-query --smp 1 --write` results obtained for fixed 400MHz frequency
and PGO disabled:

Before (write path):
```
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=write, query_single_key=no, counters=no}
Disabling auto compaction
63736.96 tps ( 59.4 allocs/op,  16.4 logallocs/op,  14.3 tasks/op,   49667 insns/op,   19924 cycles/op,        0 errors)
64109.41 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   49992 insns/op,   20084 cycles/op,        0 errors)
56950.47 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   50005 insns/op,   20501 cycles/op,        0 errors)
44858.42 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   50014 insns/op,   21947 cycles/op,        0 errors)
28592.87 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   50027 insns/op,   27659 cycles/op,        0 errors)
throughput:
	mean=   51649.63 standard-deviation=15059.74
	median= 56950.47 median-absolute-deviation=12087.33
	maximum=64109.41 minimum=28592.87
instructions_per_op:
	mean=   49941.18 standard-deviation=153.76
	median= 50005.24 median-absolute-deviation=73.01
	maximum=50027.07 minimum=49667.05
cpu_cycles_per_op:
	mean=   22023.01 standard-deviation=3249.92
	median= 20500.74 median-absolute-deviation=1938.76
	maximum=27658.75 minimum=19924.32
```

After (write path)
```
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=write, query_single_key=no, counters=no}
Disabling auto compaction
53395.93 tps ( 59.4 allocs/op,  16.5 logallocs/op,  14.3 tasks/op,   50326 insns/op,   21252 cycles/op,        0 errors)
46527.83 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   50704 insns/op,   21555 cycles/op,        0 errors)
55846.30 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   50731 insns/op,   21060 cycles/op,        0 errors)
55669.30 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   50735 insns/op,   21521 cycles/op,        0 errors)
52130.17 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   50757 insns/op,   21334 cycles/op,        0 errors)
throughput:
	mean=   52713.91 standard-deviation=3795.38
	median= 53395.93 median-absolute-deviation=2955.40
	maximum=55846.30 minimum=46527.83
instructions_per_op:
	mean=   50650.57 standard-deviation=182.46
	median= 50731.38 median-absolute-deviation=84.09
	maximum=50756.62 minimum=50325.87
cpu_cycles_per_op:
	mean=   21344.42 standard-deviation=202.86
	median= 21334.00 median-absolute-deviation=176.37
	maximum=21554.61 minimum=21060.24
```

Fixes #24815

Improvement for rare corner cases. No backport required

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#24919
2025-07-13 19:13:11 +03:00
Marcin Maliszkiewicz
19bc6ffcb0 replica: make truncate_table_on_all_shards get whole schema from table_shards
Before for views and indexes it was fetching base schema from db (and
couple other properties). This is a problem once we introduce atomic
tables and views deletion (in the following commit).
Because once we delete table it can no longer be fetched from db object,
and truncation is performed after atomically deleting all relevant
tables/views/indexes.

Now the whole relevant schema will be fetched via global_table_ptr
(table_shards) object.
2025-07-10 10:40:43 +02:00
Pawel Pery
5bfce5290e cql3: refactor primary_key as a top-level class
This patch is a part of vector_store_client sharded service
implementation for a communication with vector-store service.

There is a need for forward declaration of primary_key class. This patch
moves a nested definition of select_statement::primary_key (from a
cql3::statements namespace) into a standalone class in a
cql3::statements namespace.

Reference: VS-47
2025-07-09 11:54:51 +02:00
Michał Hudobski
d4002b61dd custom_index: refactor describe for custom indexes
Currently, to describe an index we look at
a corresponding view. However for custom indexes
the view may not exist (as we are removing the views
from vector indexes). This commit adds a way for a custom
index class to override the default describing logic
and provides such an override for the vector_index
class.
2025-07-07 13:47:07 +02:00
Michał Hudobski
5de3adb536 custom_index: remove unneeded duplicate of a static string
We have got a duplicate of the same static string and
the only usage of one of the copies can be easily replaced
2025-07-07 13:47:06 +02:00
Gleb Natapov
ca7837550d topology coordinator: do not set request_type field for truncation command if topology_global_request_queue feature is not enabled yet
Old nodes do not expect global topology request names to be in
request_type field, so set it only if a cluster is fully upgraded
already.

Closes scylladb/scylladb#24731
2025-07-02 17:09:29 +02:00
Avi Kivity
1e0b015c8b Merge 'cql3: Represent create_statement using managed_bytes' from Dawid Mędrek
When describing a table, we need to do it carefully: if some
columns were dropped, we must specify that explicitly by

```
ALTER TABLE {table} DROP {column} USING TIMESTAMP ...
```

in the result of the DESCRIBE statement. Failing to do so
could lead to data resurrection.

However, if a table has been altered many, many times,
we might end up with a huge create statement. Constructing
it could, in turn, trigger an oversized allocation.
Some tests ran into that very problem in fact.

In this commit, we want to mitigate the problem: instead of
allocating a contiguous chunk of memory for the create
statement, we use `bytes_ostream` and `managed_bytes` to
possibly keep data scattered in memory. It makes handling
`cql3::description` less convenient in the code, but since
the struct is pretty much immediately serialized after
creating it, it's a very good trade-off.

A reproducer is intentionally not provided by this commit:
it's easy to test the change, but adding and dropping
a huge number of columns would take a really long amount
of time, so we need to omit it.

Fixes scylladb/scylladb#24018

Backport: all of the supported versions are affected, so we want to backport the changes there.

Closes scylladb/scylladb#24151

* github.com:scylladb/scylladb:
  cql3/description: Serialize only rvalues of description
  cql3: Represent create_statement using managed_string
  cql3/statements/describe_statement.cc: Don't copy descriptions
  cql3: Use managed_bytes instead of bytes in DESCRIBE
  utils/managed_string.hh: Introduce managed_string and fragmented_ostringstream
2025-07-01 21:59:38 +03:00
Dawid Mędrek
9d03dcd28e cql3/description: Serialize only rvalues of description
We discard instances of `cql3::description` right after serializing them,
so let's change the signature of the function to save some work.
2025-07-01 12:58:11 +02:00
Dawid Mędrek
ac9062644f cql3: Represent create_statement using managed_string
When describing a table, we need to do it carefully: if some
columns were dropped, we must specify that explicitly by

```
ALTER TABLE {table} DROP {column} USING TIMESTAMP ...
```

in the result of the DESCRIBE statement. Failing to do so
could lead to data resurrection.

However, if a table has been altered many, many times,
we might end up with a huge create statement. Constructing
it could, in turn, trigger an oversized allocation.
Some tests ran into that very problem in fact.

In this commit, we want to mitigate the problem: instead of
allocating a contiguous chunk of memory for the create
statement, we use `fragmented_ostringstream` and `managed_string`
to possibly keep data scattered in memory. It makes handling
`cql3::description` less convenient in the code, but since
the struct is pretty much immediately serialized after
creating it, it's a very good trade-off.

We provide a reproducer. It consistently passes with this commit,
while having about 50% chance of failure before it (based on my
own experiments). Playing with the parameters of the test
doesn't seem to improve that chance, so let's keep it as-is.

Fixes scylladb/scylladb#24018
2025-07-01 12:58:02 +02:00
Nadav Har'El
e12ff4d3ab Merge 'LWT: use tablet_metadata_guard' from Petr Gusev
This PR is a step towards enabling LWT for tablet-based tables.

It pursues several goals:
* Make it explicit that the tablet can't migrate after the `cas_shard` check in `selec_statement/modification_statement`. Currently, `storage_proxy::cas` expects that the client calls it on a correct shard -- the one which owns the partition key the LWT is running on. There reasons for that are explained in [this commit](f16e3b0491 (diff-1073ea9ce4c5e00bb6eb614154f523ba7962403a4fe6c8cd877d1c8b73b3f649)) message. The statements check the current shard and invokes `bounce_to_shard` if it's not the right one. However , the erm strong pointer is only captured in `storage_proxy::cas` and until that moment there is no explicit structure in the code which would prevent the ongoing migrations. In this PR we introduce such stucture -- `erm_handle`. We create it before the `cas_check` and pass it down to `storage_proxy::cas` and `paxos_response_handler`.
* Another goal of this PR is an optimization -- we don't want to hold erm for the duration of entire LWT, unless it directly affects the current tablet. The is a `tablet_metadata_guard` class which is used for long running tablet operations. It automatically switches to a new erm if the topology change represented by the new erm doesn't affect the current tablet. We use this class in `erm_handle` if the table uses tablets. Otherwise, `erm_handle` just stores erm directly.
* Fixes [shard bouncing issue in alternator](https://github.com/scylladb/scylladb/issues/17399)

Backport: not needed (new feature).

Closes scylladb/scylladb#24495

* github.com:scylladb/scylladb:
  LWT: make cas_shard non-optional in sp::cas
  LWT: create cas_shard in select_statement
  LWT: create cas_shard in modification and batch statements
  LWT: create cas_shard in alternator
  LWT: use cas_shard in storage_proxy::cas
  do_query_with_paxos: remove redundant cas_shard check
  storage_proxy: add cas_shard class
  sp::cas_shard: rename to get_cas_shard
  token_metadata_guard: a topology guard for a token
  tablet_metadata_guard: mark as noncopyable and nonmoveable
2025-07-01 11:33:20 +03:00
Dawid Mędrek
d4315e4fae cql3/statements/describe_statement.cc: Don't copy descriptions
In an upcoming commit, `cql3::description` is going to become
a move-only type. These changes are a prerequisite for it:
we get rid of all places in the file where we copy its instances
and start moving them instead.
2025-06-30 19:12:14 +02:00
Dawid Mędrek
9472da3220 cql3: Use managed_bytes instead of bytes in DESCRIBE
This is a prerequiste for a following commit. We want
to move towards using non-contiguous memory chunks
to avoid making large allocations.

This commit does NOT change the behavior of Scylla
at all. The rows corresponding to the result of a DESCRIBE
statement are represented by an instance of `result_set`.
Before these changes, we encoded descriptions using `bytes`
and then passed them into a `result_set` using its method
`add_row`. What it does is turn the instances of `bytes`
into instances of `managed_bytes` and append them at the
end of its internal vector. In these changes, we do it
on our own and use another overload of the method.
2025-06-30 19:12:14 +02:00
Petr Gusev
3d262d2be8 LWT: create cas_shard in select_statement
In this commit we create cas_shard in select_statement
and pass it to the sp::query_result function.
2025-06-30 10:37:33 +02:00
Petr Gusev
736fa05b17 LWT: create cas_shard in modification and batch statements
We create cas_shard before the shard check to protect
against concurrent tablet migrations.
2025-06-30 10:37:33 +02:00
Petr Gusev
deb7afbc87 LWT: use cas_shard in storage_proxy::cas
Take cas_shard parameter in sp::cas and pass token_metadata_guard down to paxos_response_handler.

We make cas_shard parameter optional in storage_proxy methods
to make the refactoring easier. The sp::cas method constructs a new
token_metadata_guard if it's not set. All call sites pass null
in this commit, we will add the proper implementation in the next
commits.
2025-06-30 10:33:17 +02:00
Patryk Jędrzejczak
6489308ebc Merge 'Introduce a queue of global topology requests.' from Gleb Natapov
Currently only one global topology request (such as truncate, cdc repair, cleanup and alter table) can be pending. If one is already pending others will be rejected with an error. This is not very user friendly, so this series introduces a queue of global requests which allows queuing many global topology requests simultaneously.

Fixes: #16822

No need to backport since this is a new feature.

Closes scylladb/scylladb#24293

* https://github.com/scylladb/scylladb:
  topology coordinator: simplify truncate handling in case request queue feature is disable
  topology coordinator: fix indentation after the previous patch
  topology coordinator: allow running multiple global commands in parallel
  topology coordinator: Implement global topology request queue
  topology coordinator: Do not cancel global requests in cancel_all_requests
  topology coordinator: store request type for each global command
  topology request: make it possible to hold global request types in request_type field
  topology coordinator: move alter table global request parameters into topology_request table
  topology coordinator: move cleanup global command to report completion through topology_request table
  topology coordinator: no need to create updates vector explicitly
  topology coordinator: use topology_request_tracking_mutation_builder::done() instead of open code it
  topology coordinator: handle error during new_cdc_generation command processing
  topology coordinator: remove unneeded semicolon
  topology coordinator: fix indentation after the last commit
  topology coordinator: move new_cdc_generation topology request to use topology_request table for completion
  gms/feature_service: add TOPOLOGY_GLOBAL_REQUEST_QUEUE feature flag
2025-06-23 16:08:09 +03:00
Petr Gusev
aa970bf2e4 sp::cas_shard: rename to get_cas_shard
We intend to introduce a separate cas_shard
class in the next commits. We rename the existing
function here to avoid conflicts.
2025-06-18 11:51:48 +02:00
Avi Kivity
cd79a8fc25 Revert "Merge 'Atomic in-memory schema changes application' from Marcin Maliszkiewicz"
This reverts commit 0b516da95b, reversing
changes made to 30199552ac. It breaks
cluster.random_failures.test_random_failures.test_random_failures
in debug mode (at least).

Fixes #24513
2025-06-16 22:38:12 +03:00
Gleb Natapov
a9e99d1d3c topology coordinator: allow running multiple global commands in parallel
Now that we have a global request queue do not check that there is
global request before adding another one. Amend truncation test that
expects it explicitly and add another one that checks that two truncates
can be submitted in parallel.
2025-06-11 11:29:33 +03:00
Gleb Natapov
a0a3a034e0 topology coordinator: Implement global topology request queue
Requests, together with their parameters, are added to the
topology_request tables and the queue of active global requests is
kept in topology state. Thy are processed one by one by the topology
state machine.

Fixes: #16822
2025-06-11 11:29:33 +03:00
Tomasz Grabiec
0b516da95b Merge 'Atomic in-memory schema changes application' from Marcin Maliszkiewicz
This change is preparing ground for state update unification for raft bound subsystems. It introduces schema_applier which in the future will become generic interface for applying mutations in raft.

Pulling `database::apply()` out of schema merging code will allow to batch changes to subsystems. Future generic code will first call `prepare()` on all implementations, then single  `database::apply()` and then `update()` on all implementations, then on each shard it will call `commit()` for all implementations, without preemption so that the change is observed as atomic across all subsystems, and then `post_commit()`.

Backport: no, it's a new feature

Fixes: https://github.com/scylladb/scylladb/issues/19649

Closes scylladb/scylladb#20853

* github.com:scylladb/scylladb:
  storage_service: always wake up load balancer on update tablet metadata
  db: schema_applier: call destroy also when exception occurs
  db: replica: simplify seeding ERM during shema change
  db: remove cleanup from add_column_family
  db: abort on exception during schema commit phase
  db: make user defined types changes atomic
  replica: db: make keyspace schema changes atomic
  db: atomically apply changes to tables and views
  replica: make truncate_table_on_all_shards get whole schema from table_shards
  service: split update_tablet_metadata into two phases
  service: pull out update_tablet_metadata from migration_listener
  db: service: add store_service dependency to schema_applier
  service: simplify load_tablet_metadata and update_tablet_metadata
  db: don't perform move on tablet_hint reference
  replica: split add_column_family_and_make_directory into steps
  replica: db: split drop_table into steps
  db: don't move map references in merge_tables_and_views()
  db: introduce commit_on_shard function
  db: access types during schema merge via special storage
  replica: make non-preemptive keyspace create/update/delete functions public
  replica: split update keyspace into two phases
  replica: split creating keyspace into two functions
  db: rename create_keyspace_from_schema_partition
  db: decouple functions and aggregates schema change notification from merging code
  db: store functions and aggregates change batch in schema_applier
  db: decouple tables and views schema change notifications from merging code
  db: store tables and views schema diff in schema_applier
  db: decouple user type schema change notifications from types merging code
  service: unify keyspace notification functions arguments
  db: replica: decouple keyspace schema change notifications to a separate function
  db: add class encapsulating schema merging
2025-06-10 13:45:32 +02:00
Gleb Natapov
be0b328b19 topology coordinator: store request type for each global command 2025-06-09 13:38:49 +03:00
Gleb Natapov
3a496067c6 topology coordinator: move alter table global request parameters into topology_request table
Currently parameters to alter table global topology command are stored
in static column in the topology table, but this way there can be only one
outstanding alter table request. This patch moves the parameters to
the topology_request table where parameters are stored per request.
2025-06-09 13:38:49 +03:00
Michał Chojnowski
7d26d3c7cb db/config: add an option that disables dict-aware sstable compressors in DDL statements
For reasons, we want to be able to disallow dictionary-aware compressors
in chosen deployments.

This patch adds a knob for that. When the knob is disabled,
dictionary-aware compressors will be rejected in the validation
stage of CREATE and ALTER statements.

Closes scylladb/scylladb#24355
2025-06-09 13:30:40 +03:00
Marcin Maliszkiewicz
a27776b4ff replica: make truncate_table_on_all_shards get whole schema from table_shards
Before for views and indexes it was fetching base schema from db (and
couple other properties). This is a problem once we introduce atomic
tables and views deletion (in the following commit).
Because once we delete table it can no longer be fetched from db object,
and truncation is performed after atomically deleting all relevant
tables/views/indexes.

Now the whole relevant schema will be fetched via global_table_ptr
(table_shards) object.
2025-06-06 08:50:33 +02:00
Avi Kivity
c00824c7df Merge 'transport: Implement SCYLLA_USE_METADATA_ID support' from Andrzej Jackowski
Metadata id was introduced in CQLv5 to make metadata of prepared
statement metadata consistent between driver and database.
This commit introduces a protocol extension that allows to use the same
mechanism in CQLv4. As CQLv5 is currently unsupported in ScyllaDb (as well
as in some of the drivers), the motivation is to allow fixing https://github.com/scylladb/scylladb/issues/20860.

This change:
     - Implement metadata::calculate_metadata_id()
     - Implement SCYLLA_USE_METADATA_ID protocol extension for CQLv4
     - Added description of SCYLLA_USE_METADATA_ID in documentation
     - Add boost tests to confirm correctness of the function
     - Add python tests for table metadata change corner-cases

Fixes scylladb/scylladb#20860

Also see related https://scylladb.atlassian.net/wiki/spaces/RND/pages/42238631/MetadataId+extension+in+CQLv4+Requirement+Document

No backport needed (unless specifically requested by a customer), because there are existing workarounds for the issue

Closes scylladb/scylladb#23292

* github.com:scylladb/scylladb:
  test: add tests for prepared statement metadata consistency corner cases
  transport: implement SCYLLA_USE_METADATA_ID support
  cql3: implement metadata::calculate_metadata_id()
2025-05-29 12:27:31 +03:00
Nadav Har'El
61581d458e Merge 'vector_index: add custom index class from Michał Hudobski
This PR adds a class that allows for validation (and in the future creating and querying) of custom indexes and implements it for vector indexes. Currently custom vector_index creation runs a usual index creation process. This PR does not change that, however it adds validation of the parameters that need to have certain values for the actual creation of the vector index in the future. The only thing left for the vector_index feature to work as intended should be the integration with the Vector Store service.

This is a continuation of https://github.com/scylladb/scylladb/pull/23720
Refs:  [VS-55
](https://scylladb.atlassian.net/browse/VS-55) (Support setting index parametrs and similarity function in CREATE INDEX)
Fixes: [VS-13](https://scylladb.atlassian.net/browse/VS-13) (Validate that the base type is numeric when creating the vector index)

[VS-13]: https://scylladb.atlassian.net/browse/VS-13?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

Closes scylladb/scylladb#24212

* github.com:scylladb/scylladb:
  test/cqlpy: remove xfail and add more vector tests
  vector_index: allow options when custom class is provided
  vector_index: add custom index and vector index classes
2025-05-28 10:42:29 +03:00
Michał Hudobski
7a2b0179e8 vector_index: allow options when custom class is provided
We have changed the validation for the custom index to not
require the CUSTOM keyword when creating the index,
only the custom class now we change the validation for options
so that they match.
2025-05-27 21:04:50 +02:00
Michał Hudobski
3ab643a5de vector_index: add custom index and vector index classes
In this patch we add an abstract class, "custom_index", with a validate() method.
Each CUSTOM INDEX class needs to implement a concrete subclass of custom_index
which is used to validate if this type of custom index class may be used,
and whether the optional parameters passed to it are valid.

We change the existing CUSTOM INDEX validation code to use this new mechanism.

Finally this patch implements one concrete subclass for vector index.
Before this patch, the custom index type "vector_index" was allowed,
but after this patch it gains more validation of its optional parameters
(we support 4 specific parameters, with some rules on their values).
Of course, the vector index isn't actually implemented in this patch,
we are just improving the validation of the index creation statement.
2025-05-27 21:04:50 +02:00
Avi Kivity
f195c05b0d untyped_result_set: mark get_blob() as returning unfragmented data
Blobs can be large, and unfragmented blobs can easily exceed 128k
(as seen in #23903). Rename get_blob() to get_blob_unfragmented()
to warn users.

Note that most uses are fine as the blobs are really short strings.

Closes scylladb/scylladb#24102
2025-05-26 09:40:34 +02:00
Nadav Har'El
d2844055ad Merge 'index: implement schema management layer for vector search indexes' from null
This pull request adds support for creating custom indexes (at a metadata level) as long as a supported custom class is provided (currently only vector search).

The patch contains:

- a change in CREATE INDEX statement that allows for the USING keyword to be present as long as one of the supported classes is used
-  support for describing custom indexes in the DESCRIBE statement
- unit tests

Co-authored by: @Balwancia

Closes scylladb/scylladb#23720

* github.com:scylladb/scylladb:
  test/cqlpy: add custom index tests
  index: support storing metadata for custom indices
2025-05-22 12:19:36 +03:00
Andrzej Jackowski
086df24555 transport: implement SCYLLA_USE_METADATA_ID support
Metadata id was introduced in CQLv5 to make metadata of prepared
statement consistent between driver and database. This commit introduces
a protocol extension that allows to use the same mechanism in CQLv4.

This change:
 - Introduce SCYLLA_USE_METADATA_ID protocol extension for CQLv4
 - Introduce METADATA_CHANGED flag in RESULT. The flag cames directly
   from CQLv5 binary protocol. In CQLv4, the bit was never used, so we
   assume it is safe to reuse it.
 - Implement handling of metadata_id and METADATA_CHANGED in RESULT rows
 - Implement returning metadata_id in RESULT prepared
 - Implement reading metadata_id from EXECUTE
 - Added description of SCYLLA_USE_METADATA_ID in documentation

Metadata_id is wrapped in cql_metadata_id_wrapper because we need to
distinguish the following situations:
 - Metadata_id is not supported by the protocol (e.g. CQLv4 without the
   extension is used)
 - Metadata_id is supported by the protocol but not set - e.g. PREPARE
   query is being handled: it doesn't contain metadata_id in the
   request but the reply (RESULT prepared) must contain metadata_id
 - Metadata_id is supported by the protocol and set, any number of
   bytes >= 0 is allowed, according to the CQLv5 protocol specification

Fixes scylladb/scylladb#20860
2025-05-14 09:59:16 +02:00
Andrzej Jackowski
c32aba93b4 cql3: implement metadata::calculate_metadata_id()
CQLv5 introduced metadata_id, which is a checksum computed from column
names and types, to track schema changes in prepared statements. This
commit introduces calculate_metadata_id to compute such id for given
metadata.

Please note that calculate_metadata_id() produces different hashes
than Cassandra's computeResultMetadataId(). We use SHA256 truncated to
128 bits instead of MD5. There are also two smaller technical
differences: calculate_metadata_id() doesn't add unneeded zeros and it
adds a length of a string when an sstring is being fed to the hasher.
The difference is intentional because MD5 has known vulnerabilities,
moreover we don't want to introduce any dependency between our
metadata_id and Cassandra's.

This change:
 - Add cql_metadata_id_type
 - Implement metadata::calculate_metadata_id()
 - Add boost tests to confirm correctness of the function
2025-05-14 09:33:16 +02:00
Michał Hudobski
05daa8dded index: support storing metadata for custom indices
Added function returning custom index class name.
Added printing custom index class name when using DESCRIBE.
Changed validation to reflect current support of indices.
2025-05-14 09:32:00 +02:00
Piotr Smaron
f740f9f0e1 cql: fix CREATE tablets KS warning msg
Materialized Views and Secondary Indexes are yet another features that
keyspaces with tablets do not support, but these were not listed in a
warning message returned to the user on CREATE KEYSPACE statement. This
commit adds the 2 missing features.

Fixes: #24006

Closes scylladb/scylladb#23902
2025-05-08 17:18:43 +02:00
Wojciech Mitros
900687c818 view_info: set base info on construction
Currently, the base_info may or may not be set in view schemas.
Even when it's set, it may be modified. This necessitates extra
checks when handling view schemas, as well as potentially causing
errors when we forget to set it at some point.

Instead, we want to make the base info an immutable member of view
schemas (inside view_info). The first step towards that is making
sure that all newly created schemas have the base info set.
We achieve that by requiring a base schema when constructing a view
schema. Unfortunately, this adds complexity each time we're making
a view schema - we need to get the base schema as well.
In most cases, the base schema is already available. The most
problematic scenario is when we create a schema from mutations:
- when parsing system tables we can get the schema from the
database, as regular tables are parsed before views
- when loading a view schema using the schema loader tool, we need
to load the base additionally to the view schema, effectively
doubling the work
- when pulling the schema from another node - in this case we can
only get the current version of the base schema from the local
database

Additionally, we need to consider the base schema version - when
we generate view updates the version of the base schema used for
reads should match the version of the base schema in view's base
info.
This is achieved by selecting the correct (old or new) schema in
`db::schema_tables::merge_tables_and_views` and using the stored
base schema in the schema_registry.
2025-04-24 01:08:39 +02:00
Pavel Emelyanov
d4f3a3ee4f cql: Remove unused "initial_tablets" mention from guardrails
All tablets configuration was moved into its own "with tablets" section,
this option name cannot be met among replication factors.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#23555
2025-04-06 16:52:07 +03:00