Commit Graph

10163 Commits

Author SHA1 Message Date
Botond Dénes
384bffb8da Merge 'compaction: limit the maximum shares allocated to a compaction scheduling class' from Raphael Raph Carvalho
This PR adds support for limiting the maximum shares allocated to a
compaction scheduling class by the compaction controller. It introduces
a new configuration parameter, compaction_max_shares, which, when set
to a non zero value, will cap the shares allocated to compaction jobs.
This PR also exposes the shares computed by the compaction controller
via metrics, for observability purposes.

Fixes https://github.com/scylladb/scylladb/issues/9431

Enhancement. No need to backport.

NOTE: Replaces PR https://github.com/scylladb/scylladb/pull/26696

Ran a test in which the backlog raised the need for max shares (normalized backlog above normalization_factor), and played with different values for new option compaction_max_shares to see it works (500, 1000, 2000, 250, 50)

Closes scylladb/scylladb#27024

* github.com:scylladb/scylladb:
  db/config: introduce new config parameter `compaction_max_shares`
  compaction_manager:config: introduce max_shares
  compaction_controller: add configurable maximum shares
  compaction_controller: introduce `set_max_shares()`
2025-11-26 06:51:30 +02:00
Botond Dénes
584f4e467e tools/scylla-sstable: introduce the dump-schema command
There is a limited number of ways to obtain the schema of a table:
1) Use DESCRIBE TABLE in cqlsh
2) Find the schema definition in the code (for system tables)
3) Ask support/user to provide schema
4) Piece together the schema definition from the system tables

Option (1) is the most convenient but requires access to live cluster.
(2) is limited to system tables only.
When investigating issues for customers, we have to rely on (3) and this
often adds communication round-trips and delays. (4) requires knowledge
of ScyllaDB internals and access to system tables.

The new dump-schema commands provides a convenient way to obtain the
schema of tables, given that there is access to either an sstable or the
system tables. It can dump the schema of system tables without either.

Closes scylladb/scylladb#26433
2025-11-25 20:32:36 +03:00
Piotr Dulikowski
ff5c7bd960 Merge 'topology_coordinator: don't repair colocated tablets' from Michael Litvak
With the introduction of colocated tables, all the tablet transitions
now operate on groups of colocated tablets instead of individual
tablets. such is tablet migration, and also tablet repair.

The tablet repair currently doesn't work on individual tablets due to
the limitations in the tablet map being shared. The way it was
implemented to work on a group of colocated tablets is by repairing all
the colocated tablets together, using a dedicated rpc, and setting a
shared repair_time in the shared tablet map.  It was implemented this
way because we wanted to have some way to repair the tablets of a
colocated table.

However, we want to change this in the next release so that it will be
possible to repair the tablets of a colocated table individually. In
order to simplify and prepare for the future change, we prefer until
then to not repair colocated tables at all. otherwise, we will need to
support both the shared repair and individual repair together for a long
time, and the upgrade will be more complicated.

We change the handling of the tablet 'repair' transition to repair only
the base table's tablets. It means it will not be possible to request
tablet repair for a non-base colocated table such as local MV, CDC and
paxos table. This restriction will be temporary until a later release
where we will suuport repairing colocated tablets.

This is a reasonable restriction because repair for these kind of tables
is not required or as important as for normal tables.

Fixes https://github.com/scylladb/scylladb/issues/27119

backport to 2025.4 since we must change it in the same version it's introduced before it's released

Closes scylladb/scylladb#27120

* github.com:scylladb/scylladb:
  tombstone_gc: don't use 'repair' mode for colocated tables
  Revert "storage service: add repair colocated tablets rpc"
  topology_coordinator: don't repair colocated tablets
2025-11-25 14:58:06 +01:00
Nadav Har'El
bcd1758911 Merge 'vector_search: add validator tests' from Pawel Pery
The vector-search-validator is a binary tool which do functional and integration tests between scylla and vector-store. It is build in Rust mainly in vector-store repository. This patch adds possibility to write tests on scylladb repository side, compile them together with vector-store tests and run them in `test.py` environment.

There are three parts of the change:

- add sources of validator to the `test/vector_search_validator` directory
- add support for building validator and vector-store in `build/vector-search-validator/bin` directory with or without cmake
- add support for `pytest` and `test.py` to run validator test locally and in the CI environment; this part adds also README to the `test/vector_search_validator` directory

Design for validator integration tests: https://scylladb.atlassian.net/wiki/spaces/RND/pages/39518215/Vector+Search+Core+Test+Plan+Document

References: VECTOR-50

No backport needed as this is  a new functionality.

Closes scylladb/scylladb#26653

* github.com:scylladb/scylladb:
  vector_search: add vector-search-validator tests
  vector_search: implement building vector-search-validator
  vector_search: add vector-search-validator sources
2025-11-25 10:34:33 +02:00
Michael Litvak
868ac42a8b tombstone_gc: don't use 'repair' mode for colocated tables
For tables of special types that can be located: MV, CDC, and paxos
table, we should not use tombstone_gc=repair mode because colocated
tablets are never repaired, hence they will not have repair_time set and
will never be GC'd using 'repair' mode.
2025-11-25 09:15:46 +01:00
Michael Litvak
273f664496 topology_coordinator: don't repair colocated tablets
With the introduction of colocated tables, all the tablet transitions
now operate on groups of colocated tablets instead of individual
tablets. such is tablet migration, and also tablet repair.

The tablet repair currently doesn't work on individual tablets due to
the limitations in the tablet map being shared. The way it was
implemented to work on a group of colocated tablets is by repairing all
the colocated tablets together, using a dedicated rpc, and setting a
shared repair_time in the shared tablet map.  It was implemented this
way because we wanted to have some way to repair the tablets of a
colocated table.

However, we want to change this in the next release so that it will be
possible to repair the tablets of a colocated table individually. In
order to simplify and prepare for the future change, we prefer until
then to not repair colocated tables at all. otherwise, we will need to
support both the shared repair and individual repair together for a long
time, and the upgrade will be more complicated.

We change the handling of the tablet 'repair' transition to repair only
the base table's tablets. It means it will not be possible to request
tablet repair for a non-base colocated table such as local MV, CDC and
paxos table. This restriction will be temporary until a later release
where we will suuport repairing colocated tablets.

This is a reasonable restriction because repair for these kind of tables
is not required or as important as for normal tables.

Fixes scylladb/scylladb#27119
2025-11-25 09:05:59 +01:00
Karol Nowacki
ca62effdd2 vector_search: Restrict vector index tests to tablets only
Vector indexes are going to be supported only for tablets (see VECTOR-322).
As a result, tests using vector indexes will be failing when run with vnodes.

This change ensures tests using vector indexes run exclusively with tablets.

Fixes: VECTOR-49

Closes scylladb/scylladb#26843
2025-11-25 09:26:16 +02:00
Pawel Pery
9f10aebc66 vector_search: add vector-search-validator tests
The commit adds a functionality for `pytest` and `test.py` to run
`vector-search-validator` in `sudo unshare` environment. There are already two
tests - first parametrized `test_validator.py::test_validator[test-case-name]`
(run validator) and second `test_cargo_toml.py::test_cargo_toml` (check if the
current `Cargo.toml` for validator is correct).

Documentation for these tests are provided in `README.md`.
2025-11-24 17:26:04 +01:00
Pawel Pery
3702e982b9 vector_search: implement building vector-search-validator
The commit adds targets building
`build/vector-search-validator/bin/{vector-store,vector-search-validator}. The
targets must be build for tests. They don't depend on build mode.

The commit adds target in `configure.py` and also in `cmake`.
2025-11-24 17:26:04 +01:00
Pawel Pery
e569a04785 vector_search: add vector-search-validator sources
The commit adds validator sources uses combination of local files and
vector-store's files. In `build-env` there are definition of vector-store git
repository and revision on which validator will be built. `cargo-toml-template`
is script for printing current `Cargo.toml` to the stdout. After updating
`build-env` developer needs to update new configuration with
`./cargo-toml-template > Cargo.toml`. Git revision is used in several places in
`Cargo.toml` and will be used for building `vector-store`, so for better
handling git revision it should be setup only in one place.

The validator is divided into several crates to be able to built it within
scylladb and vector-store repositories. Here we need to create a new validator
crate with simple `main` function and call `validator_engine::main` there. We
provide tests written in scylladb repo in `validator-scylla` crate. The commit
provides empty `cql` test case, which should be filled in the future.
2025-11-24 17:26:04 +01:00
Gleb Natapov
39cec4ae45 topology: let banned node know that it is banned
Currently if a banned node tries to connect to a cluster it fails to
create connections, but has no idea why, so from inside the node it
looks like it has communication problems. This patch adds new rpc
NOTIFY_BANNED which is sent back to the node when its connection is
dropped. On receiving the rpc the node isolates itself and print an
informative message about why it did so.

Closes scylladb/scylladb#26943
2025-11-24 17:12:13 +01:00
Lakshmi Narayanan Sreethar
9cb766f929 db/config: introduce new config parameter compaction_max_shares
Add support for the new configuration parameter `compaction_max_shares`,
and update the compaction manager to pass it down to the compaction
controller when it changes. The shares allocated to compaction jobs will
be limited by this new parameter.

Fixes #9431

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-11-24 12:52:29 -03:00
Lakshmi Narayanan Sreethar
468b800e89 compaction_manager:config: introduce max_shares
Introduce an updateable value `max_shares` to compaction manager's
config. Also add a method `update_max_shares()` that applies the latest
`max_shares` value to the compaction controller’s `max_shares`. This new
variable will be connected to a config parameter in the next patch.

Refs #9431

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-11-24 11:43:38 -03:00
Tomasz Grabiec
d4b77c422f Merge 'load_stats: leaving replica could be std::nullopt' from Ferenc Szili
When migrating tablet size during the end_migration tablet transition stage, we need the pending and leaving replica hosts. The leaving and pending replicas are gathered in objects of type std::optional<tablet_replica> and are not checked if they contain a value before dereferencing which could cause an exception in the topology coordinator.

This patch adds a check for leaving and pending replicas, and only performs the tablet size migration if neither are empty.

This bug was introduced in 10f07fb95a

This change also adds the ability to create a tablet size in load_stats during end_migration stage of a tablet rebuild. We compute the new tablet size from by averaging the tablet sizes of the existing replicas.

This change also adds the virtual table tablet_sizes which contains tablet sizes of all the replicas of all the tablets in the cluster.

A version containing this bug has not yet been released, so a backport is not needed.

Closes scylladb/scylladb#27118

* github.com:scylladb/scylladb:
  test: add tests for tablet size migration during end_migration
  virtual_table: add tablet_sizes virtual table
  load_stats: update tablet sizes after migration or rebuild
2025-11-24 15:31:30 +01:00
Botond Dénes
296d7b8595 Merge 'Enable digest+checksum verification for file based streaming' from Taras Veretilnyk
This patch enables integrity check in  'create_stream_sources()' by introducing a new 'sstable_data_stream_source_impl' class for handling the Data component of SSTables. The new implementation uses 'sstable::data_stream()' with 'integrity_check::yes' instead of the raw input_stream.

These additional checks require reading the digest and CRC components from disk, which may introduce some I/O overhead. For uncompressed SSTables, this involves loading and computing checksums and digest from the data.
For compressed SSTables - where checksums are already embedded  - the cost comes from reading, calculating and verifying the diges.

New test cases were added to verify that the integrity checks work correctly, detecting both data and digest mismatches.

Backport is not required, since it is a new feature

Fixes #21776

Closes scylladb/scylladb#26702

* github.com:scylladb/scylladb:
  file_stream_test: add sstable file streaming integrity verification test cases
  streaming: prioritize sender-side errors in tablet_stream_files
  sstables: enable integrity check for data file streaming
  sstables: Add compressed raw streaming support
  sstables: Allow to read digest and checksum from user provided file instance
  sstables: add overload of data_stream() to accept custom file_input_stream_options
2025-11-24 06:37:27 +02:00
Aleksandra Martyniuk
76174d1f7a cql3: reject ALTER KEYSPACE if rf of datacenter with tablets is omitted
In ALTER KEYSPACE, when a datacenter name is omitted, its replication
factor is implicitly set to zero with vnodes, while with tablets,
it remains unchanged.

ALTER KEYSPACE should behave the same way for tablets as it does
for vnodes. However, this can be dangerous as we may mistakenly
drop the whole datacenter.

Reject ALTER KEYSPACE if it changes replication factor, but omits
a datacenter that currently contains tablet replicas.

Fixes: https://github.com/scylladb/scylladb/issues/25549.

Closes scylladb/scylladb#25731
2025-11-24 06:36:51 +02:00
Avi Kivity
85db7b1caf Merge 'address_map: Use more efficient and reliable replication method' from Tomasz Grabiec
Primary issue with the old method is that each update is a separate
cross-shard call, and all later updates queue behind it. If one of the
shards has high latency for such calls, the queue may accumulate and
system will appear unresponsive for mapping changes on non-zero shards.

This happened in the field when one of the shards was overloaded with
sstables and compaction work, which caused frequent stalls which
delayed polling for ~100ms. A queue of 3k address updates
accumulated, because we update mapping on each change of gossip
states. This made bootstrap impossible because nodes couldn't
learn about the IP mapping for the bootstrapping node and streaming
failed.

To protect against that, use a more efficient method of replication
which requires a single cross-shard call to replicate all prior
updates.

It is also more reliable, if replication fails transiently for some
reason, we don't give up and fail all later updates.

Fixes #26865

Closes scylladb/scylladb#26941

* github.com:scylladb/scylladb:
  address_map: Use barrier() to wait for replication
  address_map: Use more efficient and reliable replication method
  utils: Introduce helper for replicated data structures
2025-11-23 19:15:12 +02:00
Avi Kivity
b0643f8959 Merge 'db/config: enable ms sstable format by default' from Michał Chojnowski
Trie-based sstable indexes are supposed to be (hopefully) a better default than the old BIG indexes.
Make them the new default.

If we change our mind, this change can be reverted later.

New functionality, and this is a drastic change. No backport needed.

Closes scylladb/scylladb#26377

* github.com:scylladb/scylladb:
  db/config: enable `ms` sstable format by default
  cluster/dtest/bypass_cache_test: switch from highest_supported_sstable_format to chosen_sstable_format
  api/system: add /system/chosen_sstable_version
  test/cluster/dtest: reduce num_tokens to 16
2025-11-23 13:52:57 +02:00
Karol Nowacki
c40b3ba4b3 vector_search: Add HTTPS support for vector store connections
This commit introduces TLS encryption support for vector store connections.
A new configuration option is added:
- vector_store_encryption_options.truststore: path to the trust store file

To enable secure connections, use the https:// scheme in the
vector_store_primary_uri/vector_store_secondary_uri configuration options.

Fixes: VECTOR-327
2025-11-22 08:18:45 +01:00
Ferenc Szili
39711920eb test: add tests for tablet size migration during end_migration
This change adds tests for the correctness of tablet size migration
during the end_migrations stage. This size migration can happend for
tablet migrations and for tablet rebuild.
2025-11-21 16:58:11 +01:00
Taras Veretilnyk
3003669c96 file_stream_test: add sstable file streaming integrity verification test cases
Add 'test_sstable_stream' to verify SSTable file streaming integrity check.
The new tests cover both compressed and uncompressed SSTables and includes:
- Checksum mismatch detection verification
- Digest mismatch detection verifivation
2025-11-21 12:52:35 +01:00
Michał Chojnowski
da51a30780 db/config: enable ms sstable format by default
Trie-based sstable indexes are supposed to be (hopefully)
a better default than the old BIG indexes.
Make them the new default.

If we change our mind, this change can be reverted later.
2025-11-21 12:39:46 +01:00
Michał Chojnowski
73090c0d27 cluster/dtest/bypass_cache_test: switch from highest_supported_sstable_format to chosen_sstable_format
Trie-based indexes and older indexes have a difference in metrics,
and the test uses the metrics to check for bypass cache.
To choose the right metrics, it uses highest_supported_sstable_format,
which is inappropriate, because the sstable format chosen for writes
by Scylla might be different than highest_supported_sstable_format.

Use chosen_sstable_format instead.
2025-11-21 12:39:46 +01:00
Michał Chojnowski
38e14d9cd5 api/system: add /system/chosen_sstable_version
Returns the sstable version currently chosen for use in for new sstables.

We are adding it because some tests want to know what format they are
writing (tests using upgradesstable, tests which check stats that only
apply to one of the index types, etc).

(Currently they are using `highest_supported_sstable_format` for this
purpose, which is inappropriate, and will become invalid if a non-latest
format is the default).
2025-11-21 12:39:46 +01:00
Botond Dénes
5c6813ccd0 test/cluster/test_repair.py: add test_repair_timestamp_difference
Add a test which verifies that if two nodes have the same data, with
different timestamps, repair will detect and fix the diverging
timestamps.

All our repair tests focus on difference in data and I remember writing
this test multiple times in the past to quickly verify whether this
works. Time to upstream this test.

Closes scylladb/scylladb#26900
2025-11-21 14:19:51 +03:00
Nadav Har'El
66bd3dc22c test/alternator: tests for request compression
DynamoDB's documentation
https://docs.aws.amazon.com/sdkref/latest/guide/feature-compression.html
suggests that DynamoDB allows request bodies to be compressed (currently
only by gzip).

The purpose of patch is to have a test reproducing this feature. The
test shows us that indeed DynamoDB understands compressed requests using
the "gzip" encoding, but Alternator does *not*, so the new test is xfail.

As you can see in the test code, although the low-level SDK (botocore)
can send compress requests, this is not actually enabled for DynamoDB
and we need to resort to some trickery to send compressed requests.
But the point is that once we do manage to send compressed requests,
the test shows us that they work properly on AWS, but fail on
Alternator.

The failure of the compressed requests on Alternator is reported like:

    An error occurred (ValidationException) when calling the PutItem
    operation: Parsing JSON failed: Invalid value. at 70459088

This error message should probably be improved (what is that high
number?!) but of course even better would be to make it really work.

By enabling tracing on alternator-server (e.g., edit test/cqlpy/run.py
and add `'--logger-log-level', 'alternator-server=trace',`) we can
see exactly what request the SDK sends Alternator. What we can see in
the request is:

1. The request headers are uncompressed (this is expected in HTTP)
2. There is a header "Content-Encoding: gzip"
3. The request's body is binary, a full-fleged gzip output complete with
   a gzip magic in the beginning.

Refs #5041

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#27049
2025-11-21 10:48:33 +02:00
Botond Dénes
0cc5208f8e Merge 'Add sstables_manager::config' from Pavel Emelyanov
Currently sstables_manager keeps a reference on global db::config to configure itself. Most of other services use their own specific configs with much less data on-board for the same purposes (e.g. #24841, #19051 and #23705 did same for other services) This PR applies this approach to sstables_manager as well.

Mostly it moves various values from db::config onto newly introduced struct sstables_manager::config, but it also adds specific tracking of sstable_file_io_extensions and patches tools/scylla-sstable not to use sstables_manager as "proxy" object to get db::config from along its calls.

Shuffling components dependencies, no need to backport

Closes scylladb/scylladb#27021

* github.com:scylladb/scylladb:
  sstables_manager: Drop db::config from sstables_manager
  tools/sstable: Make shard_of_with_tablets use db::config argument
  tools/sstable: Add db::config& to all operations
  tools/sstable: Get endpoints from storage manager
  sstables_manager: Hold sstable IO extensions on it
  sstables: Manager helper to grab file io extensions
  sstables_manager: Move default format on config
  sstables_manager: Move enable_sstable_data_integrity_check on config
  sstables_manager: Move data_file_directories on config
  sstables_manager: Move components_memory_reclaim_threshold on config
  sstables_manager: Move column_index_auto_scale_threshold on config
  sstables_manager: Move column_index_size on config
  sstables_manager: Move sstable_summary_ratio on config
  sstables_manager: Move enable_sstable_key_validation on config
  sstables_manager: Move available_memory on config
  code: Introduce sstables_manager::config
  sstables: Patch get_local_directories() to work on vector of paths
  code: Rename sstables_manager::config() into db_config()
2025-11-21 10:21:41 +02:00
Botond Dénes
f89bb68fe2 Merge 'cdc: Preserve properties when reattaching log table' from Dawid Mędrek
When we enable CDC on a table, Scylla creates a log table for it.
It has default properties, but the user may change them later on.
Furthermore, it's possible to detach that log table by simply
disabling CDC on the base table:

```cql
/* Create a table with CDC enabled. The log table is created. */
CREATE TABLE ks.t (pk int PRIMARY KEY) WITH cdc = {'enabled': true};

/* Detach the log table. */
ALTER TABLE ks.t WITH cdc = {'enabled': false};

/* Modify a property of the log table. */
ALTER TABLE ks.t_scylla_cdc_log WITH bloom_filter_fp_chance = 0.13;
```

The log table can also be reattached by enabling CDC on the base table
again:

```cql
/* Reattach the log table */
ALTER TABLE ks.t WITH cdc = {'enabled': true};
```

However, because the process of reattachment goes through the same
code that created it in the first place, the properties of the log
table are rolled back to their default values. This may be confusing
to the user and, if unnoticed, also have other consequences, e.g.
affecting performance.

To prevent that, we ensure that the properties are preserved.

A reproducer test,
`test_log_table_preserves_properties_after_reattachment`, has been
provided to verify that the changes are correct.

Another test, `test_log_table_preserves_id_after_reattachment`, has
also been added because the current implementation sets properties
and the ID separately.

Fixes scylladb/scylladb#25523

Backport: not necessary. Although the behavior may be unexpected,
it's not a bug per se.

Closes scylladb/scylladb#26443

* github.com:scylladb/scylladb:
  cdc: Preserve properties when reattaching log table
  cdc: Extract creating columns in CDC log table to dedicated function
  cdc: Extract default properties of CDC log tables to dedicated function
  schema/schema_builder.hh: Add set_properties
  schema: Add getter for schema::user_properties
  schema: Remove underscores in fields of schema::user_properties
  schema: Extract user properties out of raw_schema
2025-11-21 10:06:05 +02:00
Calle Wilund
03408b185e utils::gcp::object_storage: Fix buffer alignment reordering trailing data
Fixes #26874

Due to certain people (me) not being able to tell forward from backward,
the data alignment to ensure partial uploads adhere to the 256k-align
rule would potentially _reorder_ trailing buffers generated iff the
source buffers input into the sink are small enough. Which, as a fun fact,
they are in backup upload.

Change the unit test to use raw sink IO and add two unit tests (of which
the smaller size provokes the bug) that checks the same 64k buf segmented
upload backup uses.

Closes scylladb/scylladb#26938
2025-11-21 09:36:13 +02:00
Radosław Cybulski
ce8db6e19e Add table name to tracing in alternator
Add a table name to Alternator's tracing output, as some clients would
like to consistently receive this information.

- add missing `tracing::add_table_name` in `executor::scan`
- add emiting tables' names in `trace_state::build_parameters_map`
- update tests, so when tracing is looked for it is filtered by table's
  name, which confirms table is being outputed.
- change `struct one_session_records` declaration to `class one_session_records`,
  as `one_session_records` is later defined as class.

Refs #26618
Fixes #24031

Closes scylladb/scylladb#26634
2025-11-21 09:33:40 +02:00
Michał Chojnowski
3f11a5ed8c test/cluster/dtest: reduce num_tokens to 16
cluster.dtest_alternator_tests.test_slow_query_logging performs
a bootstrap with 768 token ranges.

It works with `me` sstables, which have 2 open file descriptors
per open sstable, but with `ms` sstables, which have 3 open
file descriptors per open sstable, it fails with EMFILE.

To avoid this problem, let's just decrease the number of vnodes
for in the test suite. It's appropriate anyway, because it avoids some
unneeded work without weakening the tests.
(Note: pylib-based have been setting `num_tokens` to 16 for a long time too).

This breaks `bypass_cache_test`, which is written in a way that expects
a certain number of token ranges. We adjust the relevant parameter
accordingly.
2025-11-21 00:38:50 +01:00
Raphael S. Carvalho
74ecedfb5c replica: Fail timed-out single-key read on cleaned up tablet replica
Consider the following:
1) single-key read starts, blocks on replica e.g. waiting for memory.
2) the same replica is migrated away
3) single-key read expires, coordinator abandons it, releases erm.
4) migration advances to cleanup stage, barrier doesn't wait on
   timed-out read
5) compaction group of the replica is deallocated on cleanup
6) that single-key resumes, but doesn't find sstable set (post cleanup)
7) with abort-on-internal-error turned on, node crashes

It's fine for abandoned (= timed out) reads to fail, since the
coordinator is gone.
For active reads (non timed out), the barrier will wait for them
since their coordinator holds erm.
This solution consists of failing reads which underlying tablet
replica has been cleaned up, by just converting internal error
to plain exception.

Fixes #26229.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#27078
2025-11-20 11:44:03 +02:00
Gleb Natapov
ad3cf2c174 utils: fix get_random_time_UUID_from_micros to generate correct time uuid
According to the IETF spec uuid variant bits should be set to '10'. All
others are either invalid or reserved. The patch change the code to
follow the spec.

Closes scylladb/scylladb#27073
2025-11-20 10:27:29 +02:00
Nadav Har'El
a9cf7d08da test/cqlpy: remove USE from test, and test a USE-related bug
One of the tests in test_describe.py used "USE {test_keyspace}" which
affects the CQL session shared by all tests in an unrecoverable way (there
is no "UNUSE" statement).

As an example of what might happen if the shared CQL session is "polluted"
by a USE, issue #26334 is about a bug we have in DESC KEYSPACES when USE
is active.

So in this patch, we:

1. Fix the test to not use USE on the shared CQL session - it's easy to
   create a separate session to use the "USE" on. With this fix, the test
   no longer leaves the shared CQL session in a "USE" state.

2. Add a new xfailing test to reproduce the DESC KEYSPACES bug.
   Refs #26334

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#26345
2025-11-20 10:25:03 +02:00
Botond Dénes
a084094c18 Merge 'alternator and cql: tests for default sstable compression' from Nadav Har'El
The purpose of this two-patch series is to reproduce a previously unknown bug, Refs #26914.

Recently we saw a lot of patches that change how we create new schemas (keyspaces and tables), sometimes changing various long-time defaults. We started to worry that perhaps some of these defaults were applied only to CQL base tables and perhaps not to Alternator or to CQL's auxiliary tables (materialized views, secondary indexes, or CDC logs). For example, in Refs #26307 we wondered if perhaps the default "speculative_retry" option is different in Alternator than in CQL.

The first patch includes Alternator tests, and the second CQL tests. In both tests we discover that although recently (commit adf9c42, Refs #26610) we changed the default sstable compressor from LZ4Compressor to LZ4WithDictsCompressor, actually this change was **only** applied to CQL base tables. All Alternator tables and all CQL auxiliary tables (views, indexes, CDC) still use the old LZ4Compressor. This is issue #26914.

Closes scylladb/scylladb#26915

* github.com:scylladb/scylladb:
  test/cqlpy: test compression setting for auxiliary table
  test/alternator: tests for schema of Alternator table
2025-11-20 10:24:31 +02:00
Karol Nowacki
104de44a8d vector_search: Add support for secondary vector store clients
This change adds support for secondary vector store clients, typically
located in different availability zones. Secondary clients serve as
fallback targets when all primary clients are unavailable.
New configuration option allows specifying secondary client addresses
and ports.

Fixes: VECTOR-187

Closes scylladb/scylladb#26484
2025-11-20 08:37:18 +01:00
Pavel Emelyanov
1cabc8d9b0 Merge 'streaming: fix loop break condition in tablet_sstable_streamer::stream' from Ernest Zaslavsky
When streaming SSTables by tablet range, the original implementation of tablet_sstable_streamer::stream may break out of the loop too early when encountering a non-overlaping SSTable. As a result, subsequent SSTables that should be classified as partially contained are skipped entirely.

Tablet range: [4, 5]
SSTable ranges:
[0,5]
[0, 3] <--- is considered exhausted, and causes skip to next tablet
[2, 5] <--- is missed for range [4, 5]

The loop uses if (!overlaps) break; semantics, which conflated “no overlap” with “done scanning.” This caused premature termination when an SSTable did not overlapped but the following one did.

Correct logic should be:

before(sst_last) → skip and continue.

after(sst_first) → break (no further SSTables can overlap).

Otherwise → `contains` to classify as full or partial.

Missing SSTables in streaming and potential data loss or incomplete streaming in repair/streaming operations.

1. Correct the loop termination logic that previously caused certain SSTables to be prematurely excluded, resulting in lost mutations. This change ensures all relevant SSTables are properly streamed and their mutations preserved.
2. Refactor the loop to use before() and after() checks explicitly, and only break when the SSTable is entirely after the tablet range
3. Add pytest to cover this case, full streaming flow by means of `restore`
4. Add boost tests to test the new refactored function

This data corruption fix should be ported back to 2024.2, 2025.1, 2025.2, 2025.3 and 2025.4

Fixes: https://github.com/scylladb/scylladb/issues/26979

Closes scylladb/scylladb#26980

* github.com:scylladb/scylladb:
  streaming: fix loop break condition in tablet_sstable_streamer::stream
  streaming: add pytest case to reproduce mutation loss issue
2025-11-20 10:16:17 +03:00
Piotr Dulikowski
dc7944ce5c Merge 'vector_search: Fix error handling and status parsing' from Karol Nowacki
vector_search: Fix error handling and status parsing

This change addresses two issues in the vector search client that caused
validator test failures: incorrect handling of 5xx server errors and
faulty status response parsing.

1.  5xx Error Handling:
    Previously, a 5xx response (e.g., 503 Service Unavailable) from the
    underlying vector store for an `/ann` search request was incorrectly
    interpreted as a node failure. This would cause the node to be marked
    as down, even for transient issues like an index scan being in progress.

    This change ensures that 5xx errors are treated as transient search
    failures, not node failures, preventing nodes from being incorrectly
    marked as down.

2.  Status Response Parsing:
    The logic for parsing status responses from the vector store was
    flawed. This has been corrected to ensure proper parsing.

Fixes: SCYLLADB-50

Backport to 2025.4 as this problem is present on this branch.

Closes scylladb/scylladb#27111

* github.com:scylladb/scylladb:
  vector_search: Don't mark nodes as down on 5xx server errors
  test: vector_search: Move unavailable_server to dedicated file
  vector_search: Fix status response parsing
2025-11-20 08:14:28 +01:00
Botond Dénes
6ee0f1f3a7 Merge 'replica/table: add a metric for hypothetical total file size without compression' from Michał Chojnowski
This patch adds a metric for pre-compression size of sstable files.

This patch adds a per-table metric
`scylla_column_family_total_disk_space_before_compression`,
which measures the hypothetical total size of sstables on disk,
if Data.db was replaced with an uncompressed equivalent.

As for the implementation:
Before the patch, tables and sstable sets are already tracking their total physical file size.
Whenever sstables are added or removed, the size delta is propagated from the sstable up through sstable sets into table_stats.
To implement the new metric, we turn the size delta that is getting passed around from a one-dimensional to a two-dimensional value, which includes both the physical and the pre-compression size.

New functionality, no backport needed.

Closes scylladb/scylladb#26996

* github.com:scylladb/scylladb:
  replica/table: add a metric for hypothetical total file size without compression
  replica/table: keep track of total pre-compression file size
2025-11-20 09:10:38 +02:00
Karol Nowacki
9563d87f74 vector_search: Don't mark nodes as down on 5xx server errors
For an `/ann` search request, a 5xx server response does not
indicate that the node is down. It can signify a transient state, such
as the index full scan being in progress.

Previously, treating a 503 error as a node fault would cause the node
to be incorrectly marked as down, for example, when a new index was
being created. This commit ensures that such errors are treated as
transient search failures, not node failures.
2025-11-20 08:10:20 +01:00
Karol Nowacki
366ecef1b9 test: vector_search: Move unavailable_server to dedicated file
The unavailable_server code will be reused in upcoming client unit tests.
2025-11-20 08:09:21 +01:00
Pavel Emelyanov
53b71018e8 Merge 'Alternator: additional tests for ExclusiveStartKey' from Nadav Har'El
In pull request #26960 there was some discussion about what is the valid form of ExclusiveStartKey, and whether we need to allow some "non-standard" uses of it for scan over system tables (which aren't real Alternator tables and may have multiple key columns, something not possible in normal Altenrator tables). This made me realize our tests for what is allowed - and what is not allowed - in ExclusiveStartKey - are very sparse and don't cover all the cases that are possible in Scan and Query, in base tables and in GSIs.

So this small series attempts to increase the coverage of the tests for ExclusiveStartKey to make sure we are compatible with DynamoDB and also that we don't regress in #26960.

The new tests reproduce a previously unknown error-path issues, #26988, where in some cases DynamoDB considers ExclusiveStartKey to be invalid but Alternator erronously accepts. Fortunately, we didn't find any success-path (correctness) bugs.

Closes scylladb/scylladb#26994

* github.com:scylladb/scylladb:
  test/alternator: tests for ExclusiveStartKey in GSI
  test/alternator: more tests for ExclusiveStartKey in Scan
  test/alternator: more tests for ExclusiveStartKey in Query
2025-11-20 07:21:39 +03:00
Benny Halevy
fd81333181 test/pylib/cpp: increase max-networking-io-control-blocks value
Increase the value of the max-networking-io-control-blocks option
for the cpp tests as it is too low and causes flakiness
as seen in vector_search.vector_store_client_test.vector_store_client_single_status_check_after_concurrent_failures:
```
seastar/src/core/reactor_backend.cc:342: void seastar::aio_general_context::queue(linux_abi::iocb *): Assertion `last < end` failed.
```

See also https://github.com/scylladb/seastar/issues/976

Fixes #27056

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#27117
2025-11-20 04:31:36 +01:00
Ernest Zaslavsky
dedc8bdf71 streaming: fix loop break condition in tablet_sstable_streamer::stream
Correct the loop termination logic that previously caused
certain SSTables to be prematurely excluded, resulting in
lost mutations. This change ensures all relevant SSTables
are properly streamed and their mutations preserved.
2025-11-19 17:32:49 +02:00
Tomasz Grabiec
f83c4ffc68 address_map: Use barrier() to wait for replication
More efficient than 100 pings.

There was one ping in test which was done "so this shard notices the
clock advance". It's not necessary, since obsering completed SMP
call implies that local shard sees the clock advancement done within in.
2025-11-19 15:21:02 +01:00
Tomasz Grabiec
4a85ea8eb2 address_map: Use more efficient and reliable replication method
Primary issue with the old method is that each update is a separate
cross-shard call, and all later updated queue behind it. If one of the
shards has high latency for such calls, the queue may accumulate and
system will appear unresponsive for mapping changes on non-zero shards.

This happened in the field when one of the shards was overloaded with
sstables and compaction work, which caused frequent stalls which
delayed polling for ~100ms. A queue of 3k address updates
accumulated. This made bootstrap impossible, since nodes couldn't
learn about the IP mapping for the bootstrapping node and streaming
failed.

To protect against that, use a more efficient method of replication
which requires a single cross-shard call to replicate all prior
updates.

It is also more reliable, if replication fails transiently for some
reason, we don't give up and fail all later updates.

Fixes #26865
Fixes #26835
2025-11-19 15:21:02 +01:00
Tomasz Grabiec
ed8d127457 utils: Introduce helper for replicated data structures
Key goals:
  - efficient (batching updates)
  - reliable (no lost updates)

Will be used in data structures maintained on one designed owning
shard and replicated to other shards.
2025-11-19 15:21:02 +01:00
Michał Chojnowski
d8e299dbb2 sstables/trie/trie_writer: free nodes after they are flushed
Somehow, the line of code responsible for freeing flushed nodes
in `trie_writer` is missing from the implementation.

This effectively means that `trie_writer` keeps the whole index in
memory until the index writer is closed, which for many dataset
is a guaranteed OOM.

Fix that, and add some test that catches this.

Fixes scylladb/scylladb#27082

Closes scylladb/scylladb#27083
2025-11-19 14:54:16 +02:00
Karol Nowacki
05b9cafb57 vector_search: Fix status response parsing
The response was incorrectly parsed as a plain string and compared
directly with C++ string. However, the body contains a JSON string,
which includes escaped quotes that caused comparison failures.
2025-11-19 10:02:05 +01:00
Nadav Har'El
7b9428d8d7 test/cqlpy: test compression setting for auxiliary table
In the previous patch we noticed that although recently (commit adf9c42,
Refs #26610) we changed the default sstable compressor from LZ4Compressor
to LZ4WithDictsCompressor, this change was only applied to CQL, not to
Alternator.

In this patch we add tests that demonstrate that it's even worse - the
new compression only applies to CQL's *base* table - all the "auxiliary"
tables -
        * Materialized views
        * Secondary index's materialized views
        * CDC log tables

all still have the old LZ4Compressor, different from the base table's
default compressor.

The new test fails on Scylla, reproducing #26914, and passes on
Cassandra (on Cassandra, we only compare the materialized view table,
because SI and CDC is implemented differently).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-11-19 09:18:37 +02:00