Commit Graph

9612 Commits

Author SHA1 Message Date
Andrzej Jackowski
c8f45dbbb2 test: speed up test_long_query_timeout_erm
`test_long_query_timeout_erm` is slow because it has many parameterized
variants, and it verifies timeout behavior during ERM operations, which
are slow by nature.

This change speeds the test up by roughly 3× (319s -> 114s) by:
 - Removing two of the five scenarios that were near duplicates.
 - Shortening timeout values to reduce waiting time.
 - Parallelizing waiting on server_log with asyncio.TaskGroup().

The two removed scenarios (`("SELECT", True, False)`,
`("SELECT_WHERE", True, False)`) were near duplicates to
`("SELECT_COUNT_WHERE", True, False)` scenario, because all three
scenarios use non-mapreduce query and triggers basically the same
system behavior. It is sufficient to keep only one of them, so the test
verifies three cases:
 - One with nodes shutdown
 - One with mapreduce query
 - One with non-mapreduce query

Fixes: scylladb/scylla#24127

Closes scylladb/scylladb#25987
2025-09-23 10:28:07 +03:00
Piotr Dulikowski
482ddfb3b4 Merge 'mv: handle mismatched base/view replica count caused by RF change' from Wojciech Mitros
During an ALTER KEYSPACE statement execution where a table with a view
is present, we need to perform tablet migrations for both tables.
These migrations are not synchronized, so at some point the base may
have a different number of non-pending replicas than the view. Because
of that, we can't pair them correctly. If there is more non-pending
base replicas than view replicas, we don't need to do anything because
the view replica that didn't finish migrating is a pending replica
and will get view updates from all base replicas. But if there is more
non-pending view replicas than base replicas, we may currently lose
view updates to the new view replica.

This patch adds a workaround for this scenario. If after one migration
we have too more non-pending view replicas than base replicas, we add
it to the pending replica list so that it gets an update anyway.

This patch will also take effect if the base and view replica counts
differ due to some other bug. To track that, a new metric is added
to count such occurrences.

This patch also includes a test for this exact scenario, which is enforced by an injection.

Fixes https://github.com/scylladb/scylladb/issues/21492

Closes scylladb/scylladb#24396

* github.com:scylladb/scylladb:
  mv: handle mismatched base/view replica count caused by RF change
  mv: save the nodes used for pairing calculations for later reuse
  mv: move the decision about simple rack-aware pairing later
2025-09-23 08:10:08 +02:00
Dawid Mędrek
35f7d2aec6 db/batchlog: Drop batch if table has been dropped
If there are pending mutations in the batchlog for a table that
has been dropped, we'll keep attempting to replay them but with
no success -- `db::no_such_column_family` exceptions will be thrown,
and we'll keep trying again and again.

To prevent that, we drop the batch in that case just like we do
in the case of a non-existing keyspace.

A reproducer test has been included in the commit. It fails without
the changes in `db/batchlog_manager.cc`, and it succeeds with them.

Fixes scylladb/scylladb#24806

Closes scylladb/scylladb#26057
2025-09-23 07:48:59 +02:00
Patryk Jędrzejczak
a56115f77b test: deflake driver reconnections in the recovery procedure tests
All three tests could hit
https://github.com/scylladb/python-driver/issues/295. We use the
standard workaround for this issue: reconnecting the driver after
the rolling restart, and before sending any requests to local tables
(that can fail if the driver closes a connection to the node that
restarted last).

All three tests perform two rolling restarts, but the latter ones
already have the workaround.

Fixes #26005

Closes scylladb/scylladb#26056
2025-09-22 17:21:06 +02:00
Andrzej Jackowski
15e71ee083 test: audit: stop using datetime.datetime.now() in syslog converter
`line_to_row` is a test function that converts `syslog` audit log to
the format of `table` audit log so tests can use the same checks
for both types of audit. Because `syslog` audit doesn't have `date`
information, the field was filled with the current date. This behavior
broke the tests running at 23:59:59 because `line_to_row` returned
different results on different days.

Fixes: scylladb/scylladb#25509

Closes scylladb/scylladb#26101
2025-09-22 15:31:33 +03:00
Pavel Emelyanov
b23aab882a Merge 'test/alternator: multiple fixes for tests so they would pass on DynamoDB' from Nadav Har'El
Issue #26079 noted that multiple Alternator tests are failing when run against DynamoDB. This pull request fixes many of them, in several small patches. In one case we need to avoid a DynamoDB bug that wasn't even the point of the original test (and we create a new test specifically for that DynamoDB bug). Another test exposed a real incompatibility with Alternator (#26103) but didn't need to be exposed in this specific test so again we split the test to one that passes, and another one that xfails on Alternator (not on DynamoDB). A bigger changed had to be done to the tags feature test - since August 2024, the TagResource operation became asynchronous which broke our tests, so we fix this.

Each of these changes are described in more detail in the individual patches.

Refs #26079. It doesn't fix it completely because there are some tests which remain flaky, and some tests which, surprisingly, pass on us-east-1 but fail on eu-north-1. We'll need to address the rest later.

No backports needed, we only run tests against DynamDB from master (when we rarely do...), not on old branches.

Closes scylladb/scylladb#26114

* github.com:scylladb/scylladb:
  test/alternator: fix test_list_tables_paginated on DynamoDB
  test/alternator: fix tests in test_tag.py on DynamoDB
  test/alternator: fix test_health_only_works_for_root_path on DynamoDB
  test/alternator: reproducer tests for faux GSI range key problem
  test/alternator: fix test "test_17119a" to pass on DynamoDB
  test/alternator: fix test to pass on DynamoDB
2025-09-22 15:30:40 +03:00
Avi Kivity
29032213c8 test: avoid #include <boost/test/included/...>
The boost/test/included/... directory is apparently internal and not
intended for user consumption.

Including it caused a One-Definition-Rule violation, due to
boost/test/impl/unit_test_parameters.ipp containing code like this:

```c++
namespace runtime_config {

// UTF parameters
std::string btrt_auto_start_dbg    = "auto_start_dbg";
std::string btrt_break_exec_path   = "break_exec_path";
std::string btrt_build_info        = "build_info";
std::string btrt_catch_sys_errors  = "catch_system_errors";
std::string btrt_color_output      = "color_output";
std::string btrt_detect_fp_except  = "detect_fp_exceptions";
std::string btrt_detect_mem_leaks  = "detect_memory_leaks";
std::string btrt_list_content      = "list_content";
```

This is defining variables in a header, and so can (and in fact does)
create duplicate variable definitions, which later cause trouble.

So far, we were protected from this trouble by -fvisibility=hidden, which
caused the duplicate definitions to be in fact not duplicate.

Fix this by correcting the include path away from <boost/test/included/>.

Closes scylladb/scylladb#26161
2025-09-22 15:26:06 +03:00
Wojciech Mitros
d9b8278178 mv: handle mismatched base/view replica count caused by RF change
During an ALTER KEYSPACE statement execution where a table with a view
is present, we need to perform tablet migrations for both tables.
These migrations are not synchronized, so at some point the base may
have a different number of non-pending replicas than the view. Because
of that, we can't pair them correctly. If there is more non-pending
base replicas than view replicas, we don't need to do anything because
the view replica that didn't finish migrating is a pending replica
and will get view updates from all base replicas. But if there is more
non-pending view replicas than base replicas, we may currently lose
view updates to the new view replica.

This patch adds a workaround for this scenario. If after one migration
we have too more non-pending view replicas than base replicas, we add
it to the pending replica list so that it gets an update anyway.

This patch will also take effect if the base and view replica counts
differ due to some other bug. To track that, a new metric is added
to count such occurrences.

This patch also includes a test for this exact scenario, which is enforced by an injection.

Fixes https://github.com/scylladb/scylladb/issues/21492
2025-09-22 12:50:16 +02:00
Nadav Har'El
b205e1a3da Merge 'vector_store_client: Extract DNS logic into a dedicated class' from Karol Nowacki
Vector search related implementation moved to a new module vector_search.
As the vector search functionality is going to be extended, it is better to keep it in a separate module.

The DNS resolution logic and its background task are moved out of the `vector_store_client` and into a new, dedicated class `vector_search::dns`.

This refactoring is the first step towards supporting DNS hostnames that resolve to multiple IP addresses.

References: VECTOR-187

No backport needed as this is refactoring.

Closes scylladb/scylladb#26052

* github.com:scylladb/scylladb:
  vector_store_client_test: Verify DNS is not refreshed when disabled
  vector_store_client: Extract DNS logic into a dedicated class
  vector_search: Apply clang-format
  vector_store_client: Move to vector_search module
2025-09-22 13:24:34 +03:00
Avi Kivity
1258e7c165 Revert "Merge 'transport: service_level_controller: create and use driver service level' from Andrzej Jackowski"
This reverts commit fe7e63f109, reversing
changes made to b5f3f2f4c5. It is causing
test.py failures around cqlpy.

Fixes #26163

Closes scylladb/scylladb#26174
2025-09-22 09:32:46 +03:00
Piotr Dulikowski
b382531d99 Merge 'cdc: fix create table with cdc if not exists' from Michael Litvak
Fix an issue where executing a CREATE TABLE IF NOT EXISTS statement with
CDC enabled fails with an error if the table already exists. Instead,
the query should succeed and be a no-op.

This regression was introduced by commit fed1048059. Previously, when
executing the query, we would first check if the table exists in
do_prepare_new_column_families_announcement. If it did, we would throw
an already_exists_exception, which was handled correctly; otherwise, we
would continue and create the CDC table in the
before_create_column_families notification.

The order of operations was changed in fed1048059, causing the
regression. Now, we first create the CDC schema and add it to the schema
list for creation, and then check for each of them if they already
exist. The problem is that when we create the CDC schema in
on_pre_create_column_families, it also checks if the CDC table already
exists. If it does, it throws an invalid_request_exception, which is not
caught and handled as expected.

This patch restores the previous order of operations: we first check if
the tables exist, and only then add the CDC schema in pre_create.

Fixes https://github.com/scylladb/scylladb/issues/26142

no backport - recent regression, not released yet

Closes scylladb/scylladb#26155

* github.com:scylladb/scylladb:
  test: add test for creating table with CDC enabled if not exists
  cdc: fix create table with cdc if not exists
2025-09-22 08:18:26 +02:00
Piotr Dulikowski
591a67c7e7 Merge 'view_builder: register view on all shards atomically' from Michael Litvak
When the view builder starts to build a new view, each shard registers
itself by writing the shard id and current token to the
scylla_views_builds_in_progress table.

Previously, this happened independently by each shard. We change it now
to register all shards "atomically" - when a shard registers itself, it
also registers all other shards with an empty status, if they aren't
registered yet. This ensures that we don't have a partial state in the
table where only some of the shards are registered, but we always have a
status for all shards.

The reason we want to register all shards atomically is that if it
happens that only some of the shards were registered, then we restart
and load the status from table, this doesn't work well for multiple
reasons.

One example is that to know how many shards we had previously, we take
the maximum shard id we see in the table. If it's different than the
current shard count, we will execute the reshard code. But of course, if
the last shard is missing from the table because it didn't register
itself, this calculation will be wrong, and we can't know the previous
number of shards.

This is a problem because suppose we have two shards, and shard 0
finished building the view but shard 1 didn't start. When we come up, we
will think that previously we had only a single shard and it completed
building everything, when in fact we built only half the view
approximately. The problem is that we don't have enough information in
the tables to know that.

There are additional problems related to reshard. In the reshard
function, whether it is executed because we actually do node reshard or
because we calculated the wrong number of previous shards, if the status
of some shard is missing then the calculation of new ranges will be
wrong. When some shard didn't make progress we should start building the
view from scratch. However, this doesn't happen if we don't have a
status for the shard, because the code looks only for shards that have a
status. In effect, this shard is considered complete even though it
didn't start. This could cause the view building to get stuck or
complete without building all tokens ranges.

By registering all shards atomically, this should solve the above
problems because we will always have statuses for all shards.

Fixes https://github.com/scylladb/scylladb/issues/22989

backport not needed - the issue is probably not common and there's a workaround

Closes scylladb/scylladb#25790

* github.com:scylladb/scylladb:
  test: mv: add a test for view build interrupt during registration
  view_builder: register view on all shards atomically
2025-09-22 08:03:44 +02:00
Karol Nowacki
6bd1d7db49 vector_store_client_test: Verify DNS is not refreshed when disabled
Extend the `vector_store_client_uri_update_to_empty` test case to
verify that the DNS resolver stops refreshing when the vector store is
disabled.
2025-09-22 08:02:59 +02:00
Karol Nowacki
7cc7b95681 vector_search: Apply clang-format
Run clang-format on the vector_search module to fix minor formatting
inconsistencies.
2025-09-22 08:01:50 +02:00
Karol Nowacki
eae71d3e91 vector_store_client: Move to vector_search module
Vector search related implementation moved to a new module vector_search.
As the vector search functionality is going to be extended, it is
better to keep it in a separate module.
2025-09-22 08:01:47 +02:00
Dawid Mędrek
0d2560c07f test/perf/tablet_load_balancing.cc: Create nodes within one DC
In 789a4a1ce7, we adjusted the test file
to work with the configuration option `rf_rack_valid_keyspaces`. Part of
the commit was making the two tables used in the test replicate in
separate data centers.

Unfortunately, that destroyed the point of the test because the tables
no longer competed for resources. We fix that by enforcing the same
replication factor for both tables.

We still accept different values of replication factor when provided
manually by the user (by `--rf1` and `--rf2` commandline options). Scylla
won't allow for creating RF-rack-invalid keyspaces, but there's no reason
to take away the flexibility the user of the test already has.

Fixes scylladb/scylladb#26026

Closes scylladb/scylladb#26115
2025-09-21 21:36:43 +02:00
Tomasz Grabiec
4a83b4eef3 Merge 'topology_coordinator: abort view building a bit later in case of tablet migration' from Piotr Dulikowski
In multi DC setup, tablet load balancer might generate multiple migrations of the same tablet_id but only one is actually commited to the `system.tablets` table.

This PR moved abortion of view building tasks from the same start of the migration (`<no tablet transition> -> allow_write_both_read_old`) to the next step (`allow_write_both_read_old -> write_both_read_old`). This way, we'll abort only tasks for which the tablet migration was actually started.

The PR also includes a reproducer test.

Fixes scylladb/scylladb#25912

View building coordinator hasn't been released yet, so no backport is needed.

Closes scylladb/scylladb#26144

* github.com:scylladb/scylladb:
  test/test_view_building_coordinator: add reproducer
  topology_coordinator: abort view building a bit later in case of tablet migration
2025-09-21 15:41:53 +02:00
Karol Nowacki
eedf506be5 vector_store_client: Rename vector_store_uri to vector_store_primary_uri
The configuration setting vector_store_uri is renamed to
vector_store_primary_uri according to the final design.
In the future, the vector_store_secondary_uri setting will
be introduced.

This setting now also accepts a comma-separated list of URIs to prepare
for future support for redundancy and load balancing. Currently, only the
first URI in the list is used.

This change must be included before the next release.
Otherwise, users will be affected by a breaking change.

References: VECTOR-187

Closes scylladb/scylladb#26033
2025-09-21 16:33:10 +03:00
Michael Litvak
3dffb8e0dc test: mv: add a test for view build interrupt during registration
Add a new test that reproduces issue #22989. The test starts view
building and interrupts it by restarting the node while some shards
registered their status and some didn't.
2025-09-21 10:39:30 +02:00
Michael Litvak
6043409c31 view_builder: register view on all shards atomically
When the view builder starts to build a new view, each shard registers
itself by writing the shard id and current token to the
scylla_views_builds_in_progress table.

Previously, this happened independently by each shard. We change it now
to register all shards "atomically" - when a shard registers itself, it
also registers all other shards with an empty status, if they aren't
registered yet. This ensures that we don't have a partial state in the
table where only some of the shards are registered, but we always have a
status for all shards.

The reason we want to register all shards atomically is that if it
happens that only some of the shards were registered, then we restart
and load the status from table, this doesn't work well for multiple
reasons.

One example is that to know how many shards we had previously, we take
the maximum shard id we see in the table. If it's different than the
current shard count, we will execute the reshard code. But of course, if
the last shard is missing from the table because it didn't register
itself, this calculation will be wrong, and we can't know the previous
number of shards.

This is a problem because suppose we have two shards, and shard 0
finished building the view but shard 1 didn't start. When we come up, we
will think that previously we had only a single shard and it completed
building everything, when in fact we built only half the view
approximately. The problem is that we don't have enough information in
the tables to know that.

There are additional problems related to reshard. In the reshard
function, whether it is executed because we actually do node reshard or
because we calculated the wrong number of previous shards, if the status
of some shard is missing then the calculation of new ranges will be
wrong. When some shard didn't make progress we should start building the
view from scratch. However, this doesn't happen if we don't have a
status for the shard, because the code looks only for shards that have a
status. In effect, this shard is considered complete even though it
didn't start. This could cause the view building to get stuck or
complete without building all tokens ranges.

By registering all shards atomically, this should solve the above
problems because we will always have statuses for all shards.

Fixes scylladb/scylladb#22989
2025-09-21 10:39:05 +02:00
Evgeniy Naydanov
85cbe7a8d4 test: add test for creating table with CDC enabled if not exists
Check if there are no errors on the second attempt of executing
"create table if not exists" query if CDC is enabled.
2025-09-21 09:38:36 +02:00
Michał Hudobski
1690e5265a vector search: correct column name formatting
This patch corrects the column name formatting whenever
an "Undefined column name" exception is thrown.
Until now we used the `name()` function which
returns a bytes object. This resulted in a message
with a garbled ascii bytes column name instead of
a proper string. We switch to the `text()` function
that returns a sstring instead, making the message
readable.
Tests are adjusted to confirm this behavior.

Fixes: VECTOR-228

Closes scylladb/scylladb#26120
2025-09-20 07:02:53 +02:00
Michał Jadwiszczak
2aabf8ee3f test/test_view_building_coordinator: add reproducer
Adds a test which reproduces the issue described
in scylladb/scylladb#25912.

The test creates a situation where a single tablet is replicated across
multiple DCs / racks, and all those tablet replicas are eligible for
migration. The tablet load balancer is unpaused at that moment which
currently causes it to attempt to generate multiple migrations for
different tablet replicas of the same tablet. Before the fix for #25912,
this used to confuse the view build coordinator which would react to
each migration attempt, pausing view building work for each tablet
replica for which there was an attempt to migrate but only unpausing it
for the tablet replica that was actually migrated. After the fix, the
view build coordinator only reacts to the migration that has "won" so
the test successfully passes.
2025-09-19 19:08:34 +02:00
Michał Chojnowski
9e70df83ab db: get rid of sstables-format-selector
Our sstable format selection logic is weird, and hard to follow.

If I'm not misunderstanding, the pieces are:
1. There's the `sstable_format` config entry, which currently
   doesn't do anything, but in the past it used to disable
   cluster features for versions newer than the specified one.
2. There are deprecated and unused config entries for individual
   versions (`enable_sstables_mc_format`, `enable_sstables_md_format`,
   etc).
3. There is a cluster feature for each version:
   ME_SSTABLE_FORMAT, MD_SSTABLE_FORMAT, etc.
   (Currently all sstable version features have been grandfathered,
   and aren't checked by the code anymore).
4. There's an entry in `system.scylla_local` which contains the
   latest enabled sstable version. (Why? Isn't this directly derived
   from cluster features anyway)?
5. There's `sstable_manager::_format` which contains the
   sstable version to be used for new writes.
   This field is updated by `sstables_format_selector`
   based on cluster features and the `system.scylla_local` entry.

I don't see why those pieces are needed. Version selection has the
following constraints:
1. New sstables must be written with a format that supports existing
   data. For example, range tombstones with an infinite bound are only
   supported by sstables since version "mc". So if a range tombstone
   with an infinite bound exists somewhere in the dataset,
   the format chosen for new sstables has to be at least as new as "mc".
2. A new format might only be used after a corresponding cluster feature
   is enabled. (Otherwise new sstables might become unreadable if they
   are sent to another node, or if a node is downgraded).
3. The user should have a way to inhibit format ugprades if he wishes.

So far, constraint (1) has been fulfilled by never using formats older
than the newest format ever enabled on the node. (With an exception
for resharding and reshaping system tables).
Constraint (2) has been fulfilled by calling `sstable_manager::set_format`
only after the corresponsing cluster feature is enabled.
Constraint (3) has been fulfilled by the ability to inhibit cluster
features by setting `sstable_format` by some fixed value.

The main thing I don't like about this whole setup is that it doesn't
let me downgrade the preferred sstable format. After a format is
enabled, there is no way to go back to writing the old format again.
That is no good -- after I make some performance-sensitive changes
in a new format, it might turn out to be a pessimization for the
particular workload, and I want to be able to go back.

This patch aims to give a way to downgrade formats without violating
the constraints. What it does is:
1. The entry in `system.scylla_local` becomes obsolete.
   After the patch we no longer update or read it.
   As far as I understand, the purpose of this entry is to prevent
   unwanted format downgrades (which is something cluster features
   are designed for) and it's updated if and only if relevant
   cluster features are updated. So there's no reason to have it,
   we can just directly use cluster features.
2. `sstable_format_selector` gets deleted.
   Without the `system.scylla_local` around, it's just a glorified
   feature listener.
3. The format selection logic is moved into `sstable_manager`.
   It already sees the `db::config` and the `gms::feature_service`.
   For the foreseeable future, the knowledge of enabled cluster features
   and current config should be enough information to pick the right formats.
4. The `sstable_format` entry in `db::config` is no longer intended to
   inhibit cluster features. Instead, it is intended to select the
   format for new sstables, and it becomes live-updatable.
5. Instead of writing new sstables with "highest supported" format,
   (which used to be set by `sstables_format_selector`) we write
   them with the "preferred" format, which is determined by
   `sstable_manager` based on the combination of enabled features
   and the current value of `sstable_format`.

Closes scylladb/scylladb#26092

[avi: Pavel found the reason for the scylla_local entry -
      it predates stable storage for cluster features]
2025-09-19 16:17:56 +03:00
Pavel Emelyanov
a1ea553fe1 code: Replace distributed<> with sharded<>
The latter is recommended in seastar, and the former was left as
compatibility alias. Latest seastar explicitly marks it as deprecated so
once the submodule is updated, compilation logs will explode.

Most of the patch is generated with

    for f in $(git grep -l '\<distributed<[A-Za-z0-9:_]*>') ; do sed -e 's/\<distributed<\([A-Za-z0-9:_]*\)>/sharded<\1>/g' -i $f; done
    for f in $(git grep -l distributed.hh); do sed -e 's/distributed.hh/sharded.hh/' -i $f ; done

and a small manual change in test/perf/perf.hh

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#26136
2025-09-19 12:22:51 +02:00
Aleksandra Martyniuk
5235e3cf67 test: limit test_streaming_deadlock_removenode concurrency
test_streaming_deadlock_removenode starts 10240 inserts at once,
overloading a node. Due to that test fails with timeout.

Limit inserts concurrency.

Fixes: #25945.

Closes scylladb/scylladb#26102
2025-09-19 12:50:20 +03:00
Botond Dénes
37e46f674d Merge 'nodetool: ignore repair request error of colocated tables' from Michael Litvak
when cluster repair is run for an entire keyspace, nodetool makes a
repair api request for each table.

if the keyspace contains colocated tables, then the api request for the
colocated tables will fail, because currently scylla doesn't allow making
repair requests for specific colocated tables, but only for base tables.

if the request is to repair an entire keyspace then we can ignore this,
because we will make a repair request for all base tables, and this in
turn will repair also all the colocated tables in the keyspace.

however if specific tables are requested and some of them are colocated
then we should propagate the error to let the user know the request is
invalid.

Refs https://github.com/scylladb/scylladb/issues/24816

no backport - no colocated tablets in previous releases

Closes scylladb/scylladb#26051

* github.com:scylladb/scylladb:
  nodetool: ignore repair request error of colocated tables
  storage_service: improve error message on repair of colocated tables
2025-09-19 06:44:23 +03:00
Nadav Har'El
7be5454db1 Merge 'alternator: Store LSI keys in :attrs for newly created tables' from Piotr Wieczorek
Previously, LSI keys were stored as separate, top-level columns in the base table. This patch changes this behavior for newly created tables, so that the key columns are stored inside the `:attrs` map. Then, we use top-level computed columns instead of regular ones.

This makes LSI storage consistent with GSIs and allows the use of a collection tombstone on `:attrs` to delete all attributes in a row except for keys in new tables.

Refs https://github.com/scylladb/scylladb/pull/24991
Refs https://github.com/scylladb/scylladb/issues/6930

Closes scylladb/scylladb#25796

* github.com:scylladb/scylladb:
  alternator: Store LSI keys in :attrs for newly created tables
  alternator/test: Add LSI tests based mostly on the existing GSI tests
2025-09-18 21:48:43 +03:00
Avi Kivity
fe7e63f109 Merge 'transport: service_level_controller: create and use driver service level' from Andrzej Jackowski
This patch series:
 - Increases the number of allowed scheduling groups to allow creation of `sl:driver`
 - Implements `create_driver_service_level` that creates `sl:driver` with shares=200 if it wasn't already created
 - Implements creation of `sl:driver` for new systems and tests in `raft_initialize_discovery_leader`
 - Modifies `topology_coordinator` to use  create `sl:driver` after upgrades.
 - Implements using `sl:driver` for new connections in `transport/server`
 - Adds to `transport/server` recognition of driver's control connections and forcing them to keep using `sl:driver`.
 - Adds tests to verify the new functionality
 - Modifies existing tests to let them pass after `sl:driver` is added
 - Modifies the documentation to contain new `sl:driver`

The changes were evaluated by a test with the following scenario ([test_connections-sl-driver.py](https://github.com/user-attachments/files/22021273/test_connections-sl-driver.py)):
 - Start ScyllaDB with one node
 - Create 1000 keyspaces, 1 table in each keyspace
 - Start `cassandra-stress` (`-rate threads=50  -mode native cql3`)
 - Run connection storm with 1000 session (100 python processes, 10 sessions each)

The maximum latency during connection storm dropped **from 224.94ms to 41.43ms** (those numbers are average from 20 test executions, were max latency was in [140ms, 361ms] before change and [31.4ms, 61.5ms] after).

The snippet of cassandra-stress output from the moment of connection storm:
Before:
```
type       total ops,    op/s,    pk/s,   row/s,    mean,     med,     .95,     .99,    .999,     max,   time,   stderr, errors,  gc: #,  max ms,  sum ms,  sdv ms,      mb
...
total,        789206,   85887,   85887,   85887,     0.6,     0.3,     2.0,     2.0,     2.5,     5.0,    9.0,  0.09679,      0,      0,       0,       0,       0,       0
total,        909322,  120116,  120116,  120116,     0.4,     0.2,     1.9,     2.0,     2.1,     3.1,   10.0,  0.09053,      0,      0,       0,       0,       0,       0
total,        964392,   55070,   55070,   55070,     0.9,     0.4,     2.0,     4.5,     7.7,    18.9,   11.0,  0.09203,      0,      0,       0,       0,       0,       0
total,        975705,   11313,   11313,   11313,     4.4,     3.5,     6.5,    24.5,    82.7,    83.0,   12.0,  0.11713,      0,      0,       0,       0,       0,       0
total,        987548,   11843,   11843,   11843,     4.2,     3.5,     6.5,    33.7,    48.6,    51.5,   13.0,  0.13366,      0,      0,       0,       0,       0,       0
total,        995422,    7874,    7874,    7874,     6.3,     4.0,     7.7,    85.6,   112.9,   113.5,   14.0,  0.14753,      0,      0,       0,       0,       0,       0
total,       1007228,   11806,   11806,   11806,     4.3,     3.5,     6.5,    29.1,    43.8,    87.1,   15.0,  0.15598,      0,      0,       0,       0,       0,       0
total,       1012840,    5612,    5612,    5612,     8.2,     5.0,    11.5,   121.8,   166.6,   170.1,   16.0,  0.16535,      0,      0,       0,       0,       0,       0
total,       1016186,    3346,    3346,    3346,    13.4,     7.4,    20.1,   204.9,   207.6,   210.4,   17.0,  0.17405,      0,      0,       0,       0,       0,       0
total,       1025462,    9276,    9276,    9276,     6.3,     3.9,     9.6,    74.6,   206.8,   210.0,   18.0,  0.17800,      0,      0,       0,       0,       0,       0
total,       1035979,   10517,   10517,   10517,     4.8,     3.5,     6.7,    38.5,    82.6,    83.0,   19.0,  0.18120,      0,      0,       0,       0,       0,       0
total,       1047488,   11509,   11509,   11509,     4.3,     3.5,     6.0,    32.6,    72.3,    74.0,   20.0,  0.18334,      0,      0,       0,       0,       0,       0
total,       1077456,   29968,   29968,   29968,     1.7,     1.6,     2.9,     3.6,     7.0,     8.2,   21.0,  0.17943,      0,      0,       0,       0,       0,       0
total,       1105490,   28034,   28034,   28034,     1.8,     1.8,     3.5,     4.6,     5.3,    13.8,   22.0,  0.17609,      0,      0,       0,       0,       0,       0
total,       1132221,   26731,   26731,   26731,     1.9,     1.8,     3.8,     5.2,     8.4,    11.1,   23.0,  0.17314,      0,      0,       0,       0,       0,       0
total,       1162149,   29928,   29928,   29928,     1.7,     1.7,     3.0,     4.5,     8.0,     9.1,   24.0,  0.16950,      0,      0,       0,       0,       0,       0
...
```

After:
```
type       total ops,    op/s,    pk/s,   row/s,    mean,     med,     .95,     .99,    .999,     max,   time,   stderr, errors,  gc: #,  max ms,  sum ms,  sdv ms,      mb
...
total,        822863,   94379,   94379,   94379,     0.5,     0.3,     2.0,     2.0,     2.1,     3.7,    9.0,  0.06669,      0,      0,       0,       0,       0,       0
total,        937337,  114474,  114474,  114474,     0.4,     0.2,     2.0,     2.0,     2.1,     3.4,   10.0,  0.06301,      0,      0,       0,       0,       0,       0
total,        986630,   49293,   49293,   49293,     1.0,     1.0,     2.0,     2.1,    17.9,    19.0,   11.0,  0.07318,      0,      0,       0,       0,       0,       0
total,       1026734,   40104,   40104,   40104,     1.2,     1.0,     2.0,     2.2,     6.3,     7.1,   12.0,  0.08410,      0,      0,       0,       0,       0,       0
total,       1066124,   39390,   39390,   39390,     1.3,     1.0,     2.0,     2.2,     2.6,     3.4,   13.0,  0.09108,      0,      0,       0,       0,       0,       0
total,       1103082,   36958,   36958,   36958,     1.3,     1.1,     2.1,     2.5,     3.1,     4.2,   14.0,  0.09643,      0,      0,       0,       0,       0,       0
total,       1141987,   38905,   38905,   38905,     1.3,     1.0,     2.0,     2.4,    11.4,    12.7,   15.0,  0.09894,      0,      0,       0,       0,       0,       0
total,       1180023,   38036,   38036,   38036,     1.3,     1.0,     2.0,     3.7,     5.6,     7.1,   16.0,  0.10070,      0,      0,       0,       0,       0,       0
total,       1216481,   36458,   36458,   36458,     1.4,     1.0,     2.1,     3.6,     4.7,     5.0,   17.0,  0.10210,      0,      0,       0,       0,       0,       0
total,       1256819,   40338,   40338,   40338,     1.2,     1.0,     2.0,     2.2,     3.5,     5.4,   18.0,  0.10173,      0,      0,       0,       0,       0,       0
total,       1295122,   38303,   38303,   38303,     1.3,     1.0,     2.0,     2.4,    21.0,    21.1,   19.0,  0.10136,      0,      0,       0,       0,       0,       0
total,       1334743,   39621,   39621,   39621,     1.3,     1.0,     2.0,     2.3,     3.3,     4.0,   20.0,  0.10055,      0,      0,       0,       0,       0,       0
total,       1375579,   40836,   40836,   40836,     1.2,     1.0,     2.0,     2.1,     3.4,     5.7,   21.0,  0.09927,      0,      0,       0,       0,       0,       0
total,       1415576,   39997,   39997,   39997,     1.2,     1.0,     2.0,     2.3,     3.2,     4.1,   22.0,  0.09807,      0,      0,       0,       0,       0,       0
total,       1449268,   33692,   33692,   33692,     1.5,     1.4,     2.5,     3.2,     4.2,     5.6,   23.0,  0.09800,      0,      0,       0,       0,       0,       0
total,       1471873,   22605,   22605,   22605,     2.2,     2.0,     4.8,     5.9,     7.0,     7.9,   24.0,  0.10015,      0,      0,       0,       0,       0,       0
...
```

Fixes: https://github.com/scylladb/scylladb/issues/24411

This is a new feature, so no backport needed.

Closes scylladb/scylladb#25412

* github.com:scylladb/scylladb:
  docs: workload-prioritization: add driver service level
  test: add test to verify use of `sl:driver`
  transport: use `sl:driver` to handle driver's control connections
  transport: whitespace only change in update_scheduling_group
  transport: call update_scheduling_group for non-auth connections
  generic_server: transport: start using `sl:driver` for new connections
  test: add test_desc_* for driver service level
  test: service_levels: add tests for sl:driver creation and removal
  test: add reload_raft_topology_state() to ScyllaRESTAPIClient
  service_level_controller: automatically create `sl:driver`
  service_level_controller: methods to create driver service level
  service_level_controller: handle special sl:driver in DESC output
  topology_coordinator: add service_level_controller reference
  system_keyspace: add service_level_driver_created
  test: add MAX_USER_SERVICE_LEVELS
2025-09-18 19:45:17 +03:00
Radosław Cybulski
c0db278c03 Don't report spurious keys in DescribeTable
Alternator, when creating gsi, adds artificially columns, that user
had not ask for. This patch prevents those columns from showing up in
DescribeTable's output.

Fixes #5320

Closes scylladb/scylladb#25978
2025-09-18 19:34:39 +03:00
Patryk Jędrzejczak
5efc46152c Merge 'raft_topology: Modify the conditional logic in remove node operation …' from Abhinav Kumar Jha
In the current scenario, the shard receiving the remove node REST api request performs condional lock depending on whether raft is enabled or not. Since non-zero shard returns false for `raft_topology_change_enabled()`, the requests routed to non zero shards are prone to this lock which is unnecessary and hampers the ability to perform concurrent operations, which is possible for raft enabled nodes.

This pr modifies the conditional lock logic and orchestrates the remove node execution logic directly to the shard0, hence the `raft_topology_change_enabled()` is now checked on the shard0 and execution is performed accordingly.

Earlier, `storage_service::find_raft_nodes_from_hoeps` code threw error upon observing any non topology member present in ignore_nodes. Since we are performing concurrent remove node operations, the timing can lead to one node being fully removed before the other node remove op begins processing which can lead to runtime error in storage_service::find_raft_nodes_from_hoeps. This error throw was added to prevent users from putting random non existent nodes in ignore_nodes list. Hence made changes in that function to account for already removed nodes and ignore those nodes instead of throwing error.

A test is also added to confirm the new behaviour, where concurrent remove node operations are now being performed seamlessly.

This pr doesn't fix a critical bug. No need to backport it.

Fixes: scylladb/scylladb#24737

Closes scylladb/scylladb#25713

* https://github.com/scylladb/scylladb:
  raft_topology: Modify the conditional logic in remove node operation to enhance concurrency for raft enabled clusters.
  storage_service: remove assumptions and checks for ignore_nodes to be normal.
2025-09-18 17:27:59 +02:00
Nadav Har'El
27c1545340 test/alternator: fix test_list_tables_paginated on DynamoDB
Our list_tables() utility function, used by the test
test_table.py::test_list_tables_paginated, asserts that empty pages
cannot be returned by ListTables - and in fact neither DynamoDB nor
Alternator returns them. But it turns out this is only true on
DynamoDB's us-east-1 region, and in the eu-north-1 region, ListTables
when using Limit=1 can actually return an empty last page.

So let's just drop that unnecessary assertion as being wrong. In any
case, this assert was in a utility function, not a test, which probably
wasn't a great idea in the first place.
2025-09-18 17:46:34 +03:00
Nadav Har'El
284284bf83 test/alternator: fix tests in test_tag.py on DynamoDB
Until August 2024, DynamoDB's "TagResource" operation was synchronous -
when it returned the tags were available for read. This is no longer
true, as the new documentation says and we see in practice with many
test_tag.py failing on DynamoDB. Not only do we can't read the new tags
without waiting, we're not allowed to change other tags or even to
delete the table without waiting.

We don't need to fix Alternator for this new behavior - there is
(surprisingly!) no new API to check if the tag change took affect,
and it's perfectly fine that in Alternator the tags take affect
immediately (when TagResource returns) and not a few seconds later.
But we don't need to fix most test_tag.py tests to work with the
new asynchronous API.

The fix introduces convenience functions tag_resource() and
untag_resource() which performs the TagResource or UntagResource
operation, but also waits until the change took affect by trying
ListTagsOfResources until the desired change took affect. This
will make failed tests wait until the timeout (60 seconds), but
that's fine - we don't expect to have failed test.

After this test all tests in test/altrnator/test_tag.py pass on
DynamoDB (and continue passing on Alternator).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-09-18 16:38:09 +03:00
Łukasz Paszkowski
d42d4a05fb disk_space_monitor_test.cc: Start a monitor after fake space source function is registered
When the monitor is started, the first disk utilization value is
obtained from the actual host filesystem and not from the fake
space source function.

Thus, register a fake space source function before the monitor
is started.

Fixes: https://github.com/scylladb/scylladb/issues/26036

Backport is not required. The test has been added recently.

Closes scylladb/scylladb#26054
2025-09-18 15:03:34 +03:00
Piotr Dulikowski
5f55787e50 Merge 'CDC with tablets' from Michael Litvak
initial implementation to support CDC in tablets-enabled keyspaces.

The design is described in https://docs.google.com/document/d/1qO5f2q5QoN5z1-rYOQFu6tqVLD3Ha6pphXKEqbtSNiU/edit?usp=sharing
It is followed closely for the most part except "Deciding when to change streams" - instead, streams are changed synchronously with tablet split / merge.
Instead of the stream switching algorithm with the double writes, we use a scheme similar to the previous method for vnodes - we add the new streams with timestamp that is sufficiently far into the future.

In this PR we:
* add new group0-based internal system tables for tablet stream metadata and loading it into in-memory CDC metadata
* add virtual tables for CDC consumers
* the write coordinator chooses a stream by looking up the appropriate stream in the CDC metadata
* enable creating tables with CDC enabled in tablets-enabled keyspaces. tablets are allocated for the CDC table, and a stream is created per each tablet.
* on tablet resize (split / merge), the topology coordinator creates a new stream set with a new stream for each new tablet.
* the cdc tablets are co-located with the base tablets

Fixes https://github.com/scylladb/scylladb/issues/22576

backport not needed - new feature

update dtests: https://github.com/scylladb/scylla-dtest/pull/5897
update java cdc library: https://github.com/scylladb/scylla-cdc-java/pull/102
update rust cdc library: https://github.com/scylladb/scylla-cdc-rust/pull/136

Closes scylladb/scylladb#23795

* github.com:scylladb/scylladb:
  docs/dev: update CDC dev docs for tablets
  doc: update CDC docs for tablets
  test: cluster_events: enable add_cdc and drop_cdc
  test/cql: enable cql cdc tests to run with tablets
  test: test_cdc_with_alter: adjust for cdc with tablets
  test/cqlpy: adjust cdc tests for tablets
  test/cluster/test_cdc_with_tablets: introduce cdc with tablets tests
  cdc: enable cdc with tablets
  topology coordinator: change streams on tablet split/merge
  cdc: virtual tables for cdc with tablets
  cdc: generate_stream_diff helper function
  cdc: choose stream in tablets enabled keyspaces
  cdc: rename get_stream to get_vnode_stream
  cdc: load tablet streams metadata from tables
  cdc: helper functions for reading metadata from tables
  cdc: colocate cdc table with base
  cdc: remove streams when dropping CDC table
  cdc: create streams when allocating tablets
  migration_listener: add on_before_allocate_tablet_map notification
  cdc: notify when creating or dropping cdc table
  cdc: move cdc table creation to pre_create
  cdc: add internal tables for cdc with tablets
  cdc: add cdc_with_tablets feature flag
  cdc: add is_log_schema helper
2025-09-18 13:39:37 +02:00
Nadav Har'El
3afe078d24 test/alternator: fix test_health_only_works_for_root_path on DynamoDB
test_health.py::test_health_only_works_for_root_path checks that while
http://ourserver/ is a valid health-check URL, taking other silly
strings at the end, like http://ourserver/abc - is NOT valid and results
in an error.

It turns out that for one of the silly strings we chose to test,
"/health", DynamoDB started recently NOT to return an error, and
instead return an empty but successful response. In fact, it does this
for every string starting with /health - including "/healthz". Perhaps
they did this for some sort of Kubernetes compatibility, but in any
case this behavior isn't documented and we don't need to emulate it.
For now, let's just remove the string "/health" from our test so the
test doesn't fail on DynamoDB.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-09-18 12:48:42 +03:00
Nadav Har'El
5bd503ad43 test/alternator: reproducer tests for faux GSI range key problem
In issue #5320 we noticed that when we have a GSI with a hash key only
(no range key) but the implementation's MV needs to add a clustering
key for the original base key columns, the output of DescribeTable
wrongly lists that exta "range key" - which isn't a real range key of
the GSI.

It turns out that the fact that the extra attribute is not a real GSI
range key has another implication: It should not be allowed in
KeyConditions or KeyConditionExpression - which should allow only real
key columns of the GSI.

This patch adds two reproducing tests for this issue (issue #2601),
both pass on DynamoDB but xfail on Alternator.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-09-18 12:17:21 +03:00
Avi Kivity
f6b6312cf4 Merge 'sstables/trie: prepare for integrating BTI indexes with sstable readers and writers' from Michał Chojnowski
This is yet another part in the BTI index project.

Overarching issue: https://github.com/scylladb/scylladb/issues/19191
Previous part: https://github.com/scylladb/scylladb/pull/25626
Next parts: introducing the new components, Partitions.db and Rows.db

This is the preparatory, uncontroversial part of https://github.com/scylladb/scylladb/pull/26039, which has been split out to a separate PR to make the main part (which, after a revision, will be posted later) smaller.

This series contains several small fixes and changes to BTI-related code added earlier, which either have to be done (i.e. propagating `reader_permit` to IO calls in index reads) or just deserved to be done. There's no single theme for the changes in this PR, refer to the individual commits for details.

The changes are for the sake of new and unreleased code. No backporting should be done.

Closes scylladb/scylladb#26075

* github.com:scylladb/scylladb:
  sstables/mx/reader: remove mx::make_reader_with_index_reader
  test/boost/bti_index_test: fix indentation
  sstables/trie/bti_index_reader: in last_block_offset(), return offset from the beginning of partition, not file
  sstables/trie: support reader_permit and trace_state properly
  sstables/trie/bti_node_reader: avoid calling into `cached_file` if the target position is already cached
  sstables/trie/bti_index_reader: get rid of the seastar::file wrapper in read_row_index_header
  sstables/trie/bti_index_reader: support BYPASS CACHE
  test/boost/bti_index_test: use read_bti_partitions_db_footer where appropriate
  sstables/trie: change the signature of bti_partition_index_writer::finish
  sstables/bti_index: improve signatures of special member functions in index writers
  streaming/stream_transfer_task: coroutinize `estimate_partitions()`
  types/comparable_bytes: add a missing implementation for date_type_impl
  sstables: remove an outdated FIXME
  storage_service: delete `get_splits()`
  sstables/trie: fix some comment typos in bti_index_reader.cc
  sstables/mx/writer: rename _pi_write_m.tomb to partition_tombstone
2025-09-18 12:10:27 +03:00
Nadav Har'El
0b30688641 test/alternator: fix test "test_17119a" to pass on DynamoDB
As noticed in issue #26079 the Alternator test test_gsi.py::test_17119a
fails on DynamoDB. The problem was that the test added to KeyConditions
reading from a GSI an unnecessary attribute - one which was accidentally
allowed by Alternator (Refs #26103) but not allowed by DynamoDB.

This is easy to fix - just remove the unnecessary attribute from
KeyConditions, and the test still works properly and passes on both
DynamoDB and Alternator.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-09-18 11:38:35 +03:00
Michael Litvak
aae91330b0 nodetool: ignore repair request error of colocated tables
when cluster repair is run for an entire keyspace, nodetool makes a
repair api request for each table.

if the keyspace contains colocated tables, then the api request for the
colocated tables will fail, because currently scylla doesn't allow making
repair requests for specific colocated tables, but only for base tables.

if the request is to repair an entire keyspace then we can ignore this,
because we will make a repair request for all base tables, and this in
turn will repair also all the colocated tables in the keyspace.

however if specific tables are requested and some of them are colocated
then we should propagate the error to let the user know the request is
invalid.

Refs scylladb/scylladb#24816
2025-09-18 09:35:53 +02:00
Andrzej Jackowski
452313f5a5 test: add test to verify use of sl:driver
`sl:driver` is expected to be used for new and control connections,
but other connections that run user load should not use it after
the user is authenticated.

Refs: scylladb/scylladb#24411
2025-09-18 09:29:37 +02:00
Andrzej Jackowski
1ad483749a generic_server: transport: start using sl:driver for new connections
Before this change, new connections were handled in a default
scheduling group (`main`), because before the user is authenticated
we do not know which service level should be used. With the new
`sl:driver` service level, creation of new connections can be moved to
`sl:driver`.

We switch the service level as early as possible, in `do_accepts`.
There is a possibility, that `sl:driver` will not exist yet, for
instance, in specific upgrade cases, or if it was removed. Therefore,
we also switch to `sl:driver` after a connection is accepted.

Refs: scylladb/scylladb#24411
2025-09-18 09:29:29 +02:00
Andrzej Jackowski
e1b4a338ba test: add test_desc_* for driver service level
Driver service level is a special service level that is created
automatically by the system. Therefore, it requires special handling
in DESC SCHEMA WITH INTERNALS and those test verifies the special
behavior.

Refs: scylladb/scylladb#24411
2025-09-18 09:28:32 +02:00
Andrzej Jackowski
43a0eb7b0b test: service_levels: add tests for sl:driver creation and removal
Refs: scylladb/scylladb#24411
2025-09-18 09:28:32 +02:00
Andrzej Jackowski
4af270a271 test: add reload_raft_topology_state() to ScyllaRESTAPIClient
To encapsulate `/storage_service/raft_topology/reload` API call
2025-09-18 09:28:32 +02:00
Andrzej Jackowski
6f678a2d1f service_level_controller: automatically create sl:driver
This commit:
  - Increases the number of allowed scheduling groups to allow the
    creation of `sl:driver`.
  - Adds the `DRIVER_SERVICE_LEVEL` feature, which prevents creating
    `sl:driver` until all nodes have increased the number of
    scheduling groups.
  - Starts using `get_create_driver_service_level_mutations`
    to unconditionally create `sl:driver` on
    `raft_initialize_discovery_leader`. The purpose of this code
    path is ensuring existence of `sl:driver` in new system and tests.
  - Starts using `migrate_to_driver_service_level` to create `sl:driver`
    if it is not already present. The creation of `sl:driver` is
    managed by `topology_coordinator`, similar to other system keyspace
    updates, such as the `view_builder` migration. The purpose of this
    code path is handling upgrades.
  - Modifies related tests to pass after `sl:driver` is added.

Later in this patch series, `sl:driver` will be used by
`transport/server` to handle selected traffic, such as the driver's
schema and topology fetches.

Refs: scylladb/scylladb#24411
2025-09-18 09:28:32 +02:00
Andrzej Jackowski
d30590c1d0 test: add MAX_USER_SERVICE_LEVELS
Previously, tests used the hardcoded value 7 for the maximum number of
user service levels. This commit introduces a named variable that can
be shared across tests to avoid cases where this magic number goes
out of sync.
2025-09-18 09:28:32 +02:00
Nadav Har'El
22f88bff30 test/alternator: fix test to pass on DynamoDB
As noticed in issue #26079, the Alternator test
test_number.py::test_invalid_numbers failed on DynamoDB, because one of
the things it did, as a "sanity check", was to check that the number
0e1000 was a valid number. But it turns out it isn't allowed by
DynamoDB.

So this patch removes 0e1000 from the list of *valid* numbers in
test_invalid_numbers, and instead creates a whole new test for the
case of 0e1000.

It turns out that DynamoDB has a bug (it appears to be a regression,
because test_invalid_numbers used to pass on DynamoDB!) where it
allows 0.0e1000 (since it's just zero, really!) but forbids 0e1000
which is incorrectly considered to have a too-large magnitude.

So we introduce a test that confirms that Alternator correctly allows
both 0.0e1000 and 0e1000. DynamoDB fails this test (it allows the
first, forbidding the second), making it the first Alternator test
tagged as a "dynamodb_bug".

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-09-18 10:28:01 +03:00
Pawel Pery
12f04edf22 vector_store_client: rename embedding into vs_vector
According to the changes in Vector Store API (VECTOR-148) the `embedding` term
should be changed to `vector`. As `vector` term is used for STL class the
internal type or variable names would be changed to `vs_vector` (aka vector
store vector). This patch changes also the HTTP ann json request payload
according to the Vector Store API changes.

Fixes: VECTOR-229

Closes scylladb/scylladb#26050
2025-09-18 08:45:46 +03:00
Yauheni Khatsianevich
adc4a4e15a test: new test for LWT testing during tablets migration
E2E test runs multi-column CAS workload (LOCAL_QUORUM/LOCAL_SERIAL) while
tablets are repeatedly migrated between nodes. Uncertainty timeouts are
resolved via LOCAL_SERIAL reads; guards use max(row, lower_bound). Final
assertion: s{i} per (pk,i) equals the count of confirmed CAS by worker i
(no lost/phantom updates) despite tablet moves.

Closes scylladb/scylladb#25402
2025-09-18 08:11:12 +03:00