Commit Graph

49825 Commits

Author SHA1 Message Date
Dawid Mędrek
f90ca413a0 test/cluster: Adjust MV tests to RF-rack-validity
Some of the new tests covering materialized views explicitly disabled
the configuration option `rf_rack_valid_keyspaces`. It's going to become
a new requirement for views with tablets, so we adjust those tests and
enable the option. There is one exception, the test:

`cluster/mv/test_mv_topology_change.py::test_mv_rf_change`

We handle it separately in the following commit.

(cherry picked from commit 994f09530f)
2025-10-06 13:19:54 +00:00
Dawid Mędrek
5e0f5f4b44 test/boost/schema_loader_test.cc: Explicitly enable rf_rack_valid_keyspaces
The test cases in the file aren't run via an existing interface like
`do_with_cql_env`, but they rely on a more direct approach -- calling
one of the schema loader tools. Because of that, they manage the
`db::config` object on their own and don't enable the configuration
option `rf_rack_valid_keyspaces`.

That hasn't been a problem so far since the test doesn't attempt to
create RF-rack-invalid keyspaces anyway. However, in an upcoming commit,
we're going to further restrict views with tablets and require that the
option is enabled.

To prepare for that, we enable the option in all test cases. It's only
necessary in a small subset of them, but it won't hurt the enforce it
everywhere, so let's do that.

Refs scylladb/scylladb#23958

(cherry picked from commit d6fcd18540)
2025-10-06 13:19:54 +00:00
Dawid Mędrek
5d32fef3ae db/view: Name requirement for views with tablets
We add a named requirement, a function, for materialized views with tablets.
It decides whether we can create views and secondary indexes in a given
keyspace. It's a stepping stone towards modifying the requirements for it.

This way, we keep the code in one place, so it's not possible to forget
to modify it somewhere. It also makes it more organized and concise.

(cherry picked from commit a1254fb6f3)
2025-10-06 13:19:53 +00:00
Jenkins Promoter
f2c5874fa9 Update ScyllaDB version to: 2025.4.0-rc1 scylla-2025.4.0-rc1 scylla-2025.4.0-rc1-candidate-20251005063348 2025-10-03 21:26:12 +03:00
Jenkins Promoter
a9f4024c1b Update pgo profiles - aarch64 2025-10-01 04:42:23 +03:00
Jenkins Promoter
6969918d31 Update pgo profiles - x86_64 scylla-2025.4.0-rc0-candidate-20251001110541 scylla-2025.4.0-rc0-candidate-20251001105036 scylla-2025.4.0-rc0 2025-10-01 04:20:49 +03:00
Luis Freitas
d69edfcd34 Update ScyllaDB version to: 2025.4.0-rc0 2025-09-30 18:51:59 +03:00
Avi Kivity
72609b5f69 Merge 'mv: generate view updates on pending replica' from Michael Litvak
Generate view updates from a pending base replica if it's a reading
replica, i.e. it's in the last stage of transition write_both_read_new
before becoming the new base replica.

Previously we didn't generate view updates on a pending replica. The
problem with that is that when a base token is migrated from one replica
B1 to another B2, at one stage we generate view updates only from B1,
then at the next stage we generate view updates only from B2. During
this transition, it can happen that for some write neither B1 nor B2
generate view update, because each one sees the other as the base
replica.

We fix this by generating view updates from both base replicas in the
phase before the transition. We can generate view updates on the pending
replica in this case, even if it requires read-before-write, because
it's in a stage where it contains all data and serves reads.

Fixes https://github.com/scylladb/scylladb/issues/24292

backport not needed - the issue mostly affects MV with tablets which is still experimental

Closes scylladb/scylladb#25904

* github.com:scylladb/scylladb:
  test: mv: test view update during topology operations
  mv: generate view updates on both shards in intranode migration
  mv: generate view updates on pending replica
2025-09-30 13:17:16 +03:00
Piotr Wieczorek
4be0bdbc07 alternator: Don't emit a redundant REMOVE event in Alternator Streams for PutItem calls
Until now, every PutItem operation appeared in the Alternator Streams as
two events - a REMOVE and a MODIFY. DynamoDB Streams emits only INSERT
or MODIFY, depending on whether a row was replaced, or created anew. A
related issue scylladb#6918 concerns distinguishing the mutation type properly.

This was because each call to PutItem emitted the two CDC rows, returned
by GetRecords. Since this patch, we use a collection tombstone for the
`:attrs` column, and a separate tombstone for each regular column in the
table's schema. We don't expect that new tables would have any other
regular column, except for the `:attrs` and keys, but we may encounter
them in in upgraded tables which had old GSIs or LSIs.

Fixes: scylladb#6930.

Closes scylladb/scylladb#24991
2025-09-30 13:12:16 +03:00
Michał Hudobski
ae4d4908ba configure: increase SCHEDULING_GROUPS_COUNT to 20
We would like to have an additional service level
available for users of the Vector Store service,
which would allow us to de/prioritize vector
operations as needed. To allow that, we increase
the number of scheduling groups from 19 to 20
and adjust the related test accordingly.

Closes scylladb/scylladb#26316
2025-09-30 12:41:28 +03:00
Nadav Har'El
38002718a9 cqlpy: improve testing for "duration" column type
We had very rudimentary tests for the "duration" CQL type in the cqlpy
framework - just for reproducing issue #8001. But we left two
alternative formats, and a lot of corner cases, untested.
So this patch aims to add the missing tests - to exhaustively cover
the "duration" literal formats and their capabilities.

Some of the examples tested in the new test are inspired by Cassandra's
unit test test/unit/org/apache/cassandra/cql3/DurationTest.java and the
corner cases that this file covers. However, the new tests are not direct
translation of that file because DurationTest.java was not a CQL test -
it was a unit test of Cassandra's internal "Duration" type, so could not
be directly translated into a CQL-based test.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#25092
2025-09-30 12:25:02 +03:00
Botond Dénes
efd99bb0af Merge 'Return tablet ranges from range_to_endpoint_map API' from Pavel Emelyanov
The handler in question when called for tablets-enabled keyspace, returns ranges that are inconsistent with those from system.tablets. Like this:

system.tablets:
```
    TabletReplicas(last_token=-4611686018427387905, replicas=[('e43ce450-2834-4137-92b7-379bb37684d1', 0), ('67c82fc2-8ef9-4dd9-8cf6-c7f9372ce207', 0)])
    TabletReplicas(last_token=-1, replicas=[('22c84cba-d8d0-4d20-8d46-eb90865bb612', 0), ('67c82fc2-8ef9-4dd9-8cf6-c7f9372ce207', 1)])
    TabletReplicas(last_token=4611686018427387903, replicas=[('22c84cba-d8d0-4d20-8d46-eb90865bb612', 1), ('67c82fc2-8ef9-4dd9-8cf6-c7f9372ce207', 1)])
    TabletReplicas(last_token=9223372036854775807, replicas=[('e43ce450-2834-4137-92b7-379bb37684d1', 1), ('22c84cba-d8d0-4d20-8d46-eb90865bb612', 0)])
```

range_to_endpoint_map:
```
    {'key': ['-9069053676502949657', '-8925522303269734226'], 'value': ['127.110.40.2', '127.110.40.3']}
    {'key': ['-8925522303269734226', '-8868737574445419305'], 'value': ['127.110.40.2', '127.110.40.3']}
    ...
    {'key': ['-337928553869203886', '-288500562444694340'], 'value': ['127.110.40.1', '127.110.40.3']}
    {'key': ['-288500562444694340', '105026475358661740'], 'value': ['127.110.40.1', '127.110.40.3']}
    {'key': ['105026475358661740', '611365860935890281'], 'value': ['127.110.40.1', '127.110.40.3']}
    ...
    {'key': ['8307064440200319556', '9117218379311179096'], 'value': ['127.110.40.2', '127.110.40.1']}
    {'key': ['9117218379311179096', '9125431458286674075'], 'value': ['127.110.40.2', '127.110.40.1']}
```

Not only the number of ranges differs, but also separating tokens do not match (e.g. tokens -2 and 0 belong to different tablets according to system.tablets, but fall into the same "range" in the API result).

The source of confusion is that despite storage_service::get_range_to_address_map() is given correct e.r.m. pointer from the table, it still uses token_metadata::sorted_token() to work with. The fix is -- when the e.r.m. is per-table, the tokens should be get from token_metadata's tablet_map (e.g. compare this to storage_service::effective_ownership() -- it grabs tokens differently for vnodes/tables cases).

This PR fixes the mentioned problem and adds validation test. The test also checks /storage_service/describe_ring endpoint that happens to return correct set of values.

The API is very ancient, so the bug is present in all versions with tablets

Fixes #26331

Closes scylladb/scylladb#26231

* github.com:scylladb/scylladb:
  test: Add validation of data returned by /storage_service endpoints
  test,lib: Add range_to_endpoint_map() method to rest client
  api: Indentation fix after previous patches
  storage_service: Get tablet tokens if e.r.m. is per-table
  storage_service,api: Get e.r.m. inside get_range_to_address_map()
  storage_service: Calculate tokens on stack
2025-09-30 11:20:35 +03:00
Michał Jadwiszczak
3bbbbf419b test/cluster/test_view_building_coordinator: add reproducer for staging sstables with tablet merge
The test verifies if staging sstables are processed correctly after
tablet merge.

Refs scylladb/scylladb#26244

Closes scylladb/scylladb#26286
2025-09-30 09:05:31 +02:00
Avi Kivity
4d9271df98 Merge 'sstables: introduce sstable version ms' from Michał Chojnowski
This is yet another part in the BTI index project.

Overarching issue: https://github.com/scylladb/scylladb/issues/19191
Previous part: https://github.com/scylladb/scylladb/pull/25626
Next parts: make `ms` the default. Then, general tweaks and improvements. Later, potentially a full `da` format implementation.

This patch series introduces a new, Scylla-only sstable format version `ms`, which is like `me`, but with the index components (Summary.db and Index.db) replaced with BTI index components (Partitions.db and Rows.db), as they are in Cassandra 5.0's `da` format version.

(Eventually we want to just implement `da`, but there are several other changes (unrelated to the index files) between `me` and `da`. By adding this `ms` as an intermediate step we can adapt the new index formats without dragging all the other changes into the mix (and raising the risk of regressions, which is already high)).

The high-level structure of the PR is:
1. Introduce new component types — `Partitions` and `Rows`.
2. Teach `class sstable` to open them when they exist.
3. Teach the sstable writer how to write index data to them.
4. Teach `class sstable` and unit tests how to deal with sstables that have no `Index` or `Summary` (but have `Partitions` and `Rows` instead).
5. Introduce the new sstable version `ms`, specify that it has `Partitions` and `Rows` instead of `Index` and `Summary`.
6. Prepare unit tests for the appearance of `ms`.
7. Enable `ms` in unit tests.
8. Make `ms` enablable via db::config (with a silent fall back to `me` until the new `MS_SSTABLE_FORMAT` cluster feature is enabled).
9. Prepare integration tests for the appearance of `ms`.
10. Enable both `ms` and `me` in tests where we want both versions to be tested.

This series doesn't make `ms` the default yet, because that requires teaching Scylla Manager and a few dtests about the new format first. It can be enabled by setting `sstable_format: ms` in the config.

Per a review request, here is an example from `perf_fast_forward`, demonstrating some motivation for a new format. (Although not the main one. The main motivations are getting rid of restrictions on the RAM:disk ratio, and index read throughput for datasets with tiny partitions). The dataset was populated with `build/release/scylla perf-fast-forward --smp=1 --sstable-format=$VERSION --data-directory=data.$VERSION --column-index-size-in-kb=1 --populate --random-seed=0`.
This test involves a partition with 1000000 clustering rows (with 32-bit keys and 100-byte values) and ~500 index blocks, and queries a few particular rows from the partition. Since the branching factor for the BIG promoted index is 2 (it's a binary search), the lookup involves ~11.2 sequential page reads per row. The BTI format has a more reasonable branching factor, so it involves ~2.3 page reads per row.

`build/release/scylla perf-fast-forward --smp=1 --data-directory=perf_fast_forward_data/me --run-tests=large-partition-select-few-rows`:
```
offset  stride  rows     iterations    avg aio    aio      (KiB)
500000  1       1                70       18.0     18        128
500001  1       1               647       19.0     19        132
0       1000000 1               748       15.0     15        116
0       500000  2               372       29.0     29        284
0       250000  4               227       56.0     56        504
0       125000  8               116      106.0    106        928
0       62500   16               67      195.0    195       1732
```
`build/release/scylla perf-fast-forward --smp=1 --data-directory=perf_fast_forward_data/ms --run-tests=large-partition-select-few-rows`:
```
offset  stride  rows     iterations    avg aio    aio      (KiB)
500000  1       1                51        5.1      5         20
500001  1       1                64        5.3      5         20
0       1000000 1               679        4.0      4         16
0       500000  2               492        8.0      8         88
0       250000  4               804       16.0     16        232
0       125000  8               409       31.0     31        516
0       62500   16               97       54.0     54       1056
```

Index file size comparison for the default `perf_fast_forward` tables with `--random-seed=0`:
Large partition table (dominated by intra-partition index): 2.4 MB with `me`, 732 kB with `ms`.
For the small partitions table (dominated by inter-partition index): 11 MB with `me`, 8.4 MB with `ms`.

External tests:
I ran SCT test `longevity-mv-si-4days-streaming-test` test on 6 nodes with 30 shards each for 8 hours. No anomalies were observed.

New functionality, no backport needed.

Closes scylladb/scylladb#26215

* github.com:scylladb/scylladb:
  test/boost/bloom_filter_test: add test_rebuild_from_temporary_hashes
  test/cluster: add test_bti_index.py
  test: prepare bypass_cache_test.py for `ms` sstables
  sstables/trie/bti_index_reader: add a failure injection in advance_lower_and_check_if_present
  test/cqlpy/test_sstable_validation.py: prepare the test for `ms` sstables
  tools/scylla-sstable: add `--sstable-version=?` to `scylla sstable write`
  db/config: expose "ms" format to the users via database config
  test: in Python tests, prepare some sstable filename regexes for `ms`
  sstables: add `ms` to `all_sstable_versions`
  test/boost/sstable_3_x_test: add `ms` sstables to multi-version tests
  test/lib/index_reader_assertions: skip some row index checks for BTI indexes
  test/boost/sstable_inexact_index_test: explicitly use a `me` sstable
  test/boost/sstable_datafile_test: skip test_broken_promoted_index_is_skipped for `ms` sstables
  test/resource: add `ms` sample sstable files for relevant tests
  test/boost/sstable_compaction_test: prepare for `ms` sstables.
  test/boost/index_reader_test: prepare for `ms` sstables
  test/boost/bloom_filter_tests: prepare for `ms` sstables
  test/boost/sstable_datafile_test: prepare for `ms` sstables
  test/boost/sstable_test: prepare for `ms` sstables.
  sstables: introduce `ms` sstable format version
  tools/scylla-sstable: default to "preferred" sstable version, not "highest"
  sstables/mx/reader: use the same hashed_key for the bloom filter and the index reader
  sstables/trie/bti_index_reader: allow the caller to passing a precalculated murmur hash
  sstables/trie/bti_partition_index_writer: in add(), get the key hash from the caller
  sstables/mx: make Index and Summary components optional
  sstables: open Partitions.db early when it's needed to populate key range for sharding metadata
  sstables: adapt sstable::set_first_and_last_keys to sstables without Summary
  sstables: implement an alternative way to rebuild bloom filters for sstables without Index
  utils/bloom_filter: add `add(const hashed_key&)`
  sstables: adapt estimated_keys_for_range to sstables without Summary
  sstables: make `sstable::estimated_keys_for_range` asynchronous
  sstables/sstable: compute get_estimated_key_count() from Statistics instead of Summary
  replica/database: add table::estimated_partitions_in_range()
  sstables/mx: implement sstable::has_partition_key using a regular read
  sstables: use BTI index for queries, when present and enabled
  sstables/mx/writer: populate BTI index files
  sstables: create and open BTI index files, when enabled
  sstables: introduce Partition and Rows component types
  sstables/mx/writer: make `_pi_write_m.partition_tombstone` a `sstables::deletion_time`
2025-09-30 09:40:02 +03:00
Piotr Dulikowski
4581c72430 Merge 'lwt: prohibit for tablet-based views and cdc logs' from Petr Gusev
`SELECT` commands with SERIAL consistency level are historically allowed for vnode-based views, even though they don't provide linearizability guarantees and in general don't make much sense. In this PR we prohibit LWTs for tablet-based views, but preserve old behavior for vnode-based views for compatibility. Similar logic is applied to CDC log tables.

We also add a general check that disallows colocating a table with another colocated table, since this is not needed for now.

Fixes https://github.com/scylladb/scylladb/issues/26258

backports: not needed (a new feature)

Closes scylladb/scylladb#26284

* github.com:scylladb/scylladb:
  cql_test_env.cc: log exception when callback throws
  lwt: prohibit for tablet-based views and cdc logs
  tablets: disallow chains of colocated tables
  database: get_base_table_for_tablet_colocation: extract table_id_by_name lambda
2025-09-30 07:15:16 +02:00
Michał Chojnowski
771a82969e test/boost/bloom_filter_test: add test_rebuild_from_temporary_hashes
Adds a test for the bloom filter rebuild mechanism in `ms` sstables.
2025-09-29 22:15:26 +02:00
Michał Chojnowski
c1e6cd58fa test/cluster: add test_bti_index.py
Add a test which checks that `ms` sstables can be enabled and disabled.
2025-09-29 22:15:25 +02:00
Michał Chojnowski
7a2dc9cbfa test: prepare bypass_cache_test.py for ms sstables
The test looks at metrics to confirm whether queries
hit the row cache, the index cache or the disk,
depending on various settings.

BIG index readers use a two level, read-through index cache,
where the higher layer stores parsed "index pages"
of Index.db, while the lower layer is a cache of
raw 4kiB file pages of Index.db.

Therefore, if we want to count index cache hits,
the appropriate metric to check in this case is
`scylla_sstables_index_page_hits",
which counts hits in the higher layer.
This is done by the test.

However, BTI index readers don't have an equivalent of the higher cache
layer. Their cache only stores the raw 4 kiB pages, and the hits are
counted in `scylla_sstables_index_page_cache_hits`. (The same metric
is incremented by the lower layer of the BIG index cache).

Before this commit, the test would fail with `ms` sstables,
because their reads don't increment `scylla_sstables_index_page_hits`.
In this commit we adapt the test so that it instead checks
`scylla_sstables_index_page_cache_hits` for `ms` sstables.
2025-09-29 22:15:25 +02:00
Michał Chojnowski
621bfbe6d9 sstables/trie/bti_index_reader: add a failure injection in advance_lower_and_check_if_present
test_column_family.py::test_sstables_by_key_reader_closed
injects a failure into `index_reader::advance_lower_and_check_if_present`.
To preserve this tests when BTI indexes are made the default,
we have to add a corresponding error injection to
`bti_index_reader::advance_lower_and_check_if_present`.
2025-09-29 22:15:25 +02:00
Michał Chojnowski
182c8ce87b test/cqlpy/test_sstable_validation.py: prepare the test for ms sstables
BIG sstables and BTI sstables use different code paths for
validating the Data file against the index. So we want to test
both types of indexes, not just the default one.

This patch changes the test so that it explicitly tests both
`me` and `ms` instead of only testing the default format.

Note that we disable some tests for BTI indexes:
the tests which check that validation detects mismatches
between the row index ("promoted index") and the Data file.
This is because currently iteration over the row
index in BTI isn't implemented at the moment,
so for BTI the validation behaves as if there was no row indexes.
2025-09-29 22:15:25 +02:00
Michał Chojnowski
aed1cb6f65 tools/scylla-sstable: add --sstable-version=? to scylla sstable write
A useful option in general, and I'll need it to test multiple versions
in `test_sstable_validation.py`.
2025-09-29 22:15:25 +02:00
Michał Chojnowski
ef11dc57c1 db/config: expose "ms" format to the users via database config
Extend the `sstable_format` config enum with a "ms" value,
and, if it's enabled (in the config and in cluster features),
use it for new sstables on the node.

(Before this commit, writing `ms` sstables should only be possible
in unit tests, via internal APIs. After this commit, the format
can be enabled in the config and the database will write it during
normal operation).

As of this commit, the new format is not the default yet.
(But it will become the default in a later commit in the same series).
2025-09-29 22:15:25 +02:00
Michał Chojnowski
2ed2033224 test: in Python tests, prepare some sstable filename regexes for ms 2025-09-29 22:15:25 +02:00
Michał Chojnowski
fe9f5f4da2 sstables: add ms to all_sstable_versions
Add `ms` to the lists of sstable formats.
This will cause it to be included in various unit tests.
2025-09-29 22:15:25 +02:00
Michał Chojnowski
9155eeed10 test/boost/sstable_3_x_test: add ms sstables to multi-version tests
Add `ms` to tests which already test many format versions.

The tests check that sstable files in newer verisons are
the same as in `mc`.
Arbitrarily, for `ms`, we only check the files common
between `mc` and `ms`.

If we want to extend this test more, so that it checks
that `Partitions.db` and `Rows.db` don't change over time,
we have to add `ms` versions of all the sstables under
`test/resources` which are used in this test. We won't do that
in this patch series. And I'm not sure if we want to do that at all.
2025-09-29 22:15:25 +02:00
Michał Chojnowski
70a6c9481b test/lib/index_reader_assertions: skip some row index checks for BTI indexes
Block monotonicity checks can't be implemented for BTI row indexes
because they don't store full clustering positions, only some encoded
prefixes.

The emptiness check could be implemented with some effort,
but we currently don't bother.
The two tests which use this `is_empty()` method aren't very
useful anyway. (They check that the promoted index is empty when
there are no clustering keys. That doesn't really need a dedicated
test).
2025-09-29 22:15:25 +02:00
Michał Chojnowski
d53f362328 test/boost/sstable_inexact_index_test: explicitly use a me sstable
The test currently implicitly uses the default sstable format.
But it assumes that the index reader type is `sstables::index_reader`,
and it wants some methods specific to that type (and absent from
the base `abstract_index_reader`).

If we switch the default format from `me` to `ms`,
without doing something about this,
this test will start failing on the downcast to `sstables::index_reader`.

We deal with this by explicitly specifying `me`.
`me` and `ms` data readers are identical.
And this is a test of the data reader, not the index reader.
So it's perfectly fine to just use `me`.
2025-09-29 22:15:25 +02:00
Michał Chojnowski
fca56cb458 test/boost/sstable_datafile_test: skip test_broken_promoted_index_is_skipped for ms sstables
This is an old test for some workaround for incorrectly-generated
promoted indexes. It doesn't make sense to port this test to newer
sstable formats. So just skip it for the new sstable versions.
2025-09-29 22:15:25 +02:00
Michał Chojnowski
1f7882526b test/resource: add ms sample sstable files for relevant tests
There are some tests which want sstables of all format versions
in `test/resource`. This tests adds `ms` files for those tests.

I didn't think much about this change, I just mechanically
generated the `ms` from the existing `me` sstables in the same directories
(using `scylla sstable upgrade`) for the tests which were complaining
about the lack of `ms` files.
2025-09-29 22:15:25 +02:00
Michał Chojnowski
6143dce3db test/boost/sstable_compaction_test: prepare for ms sstables.
Fix incompatibilites between the test's assumptions
and the upcoming addition of `ms` sstables.
Refer to individual tests for comments.
2025-09-29 22:15:25 +02:00
Michał Chojnowski
622149a183 test/boost/index_reader_test: prepare for ms sstables
Adjust the incompatibilities between the test and the upcoming
`ms` sstables. Refer to individual test for comments.
2025-09-29 22:15:25 +02:00
Michał Chojnowski
a67d10d15d test/boost/bloom_filter_tests: prepare for ms sstables
The test for the bloom filter rebuild mechanism has to be adjusted,
because `ms` sstables won't use this mechanism.
2025-09-29 22:15:25 +02:00
Michał Chojnowski
312423fe53 test/boost/sstable_datafile_test: prepare for ms sstables
The tests touched in this commit are concerned specifically with
Summary. They are not applicable to sstables with BTI indexes.
2025-09-29 22:15:24 +02:00
Michał Chojnowski
924b8eec11 test/boost/sstable_test: prepare for ms sstables.
Skip `ms` sstables in an uninteresting test which
relies on `sstables::index_reader`.
2025-09-29 22:15:24 +02:00
Michał Chojnowski
db4283b542 sstables: introduce ms sstable format version
Introduce `ms` -- a new sstable format version which
is a hybrid of Cassandra's `me` and `da`.

It is based on `me`, but with the index components
(Summary.db and Index.db) replaced with the index
components of `da` (Partitions.db and Rows.db).

As of this patch, the version is never chosen
anywhere for writing sstables yet. It is only introduced.
We will add it to unit tests in a later commit,
and expose it to users in yet later commit.
2025-09-29 22:15:24 +02:00
Michał Chojnowski
17085dc1e4 tools/scylla-sstable: default to "preferred" sstable version, not "highest"
Later in this patch series we will introduce `ms` as the new highest
format, but we won't be able to make it the default within the same
series due to some dtest incompatibilities.

Until `ms` is the default, we don't `scylla sstable` to default to
it, even though it's the highest. Let's choose the default
version in `scylla sstable` using the same method which is
used by Scylla in general: by letting the `sstable_manager` choose.
2025-09-29 22:13:59 +02:00
Nadav Har'El
3a5475afb7 Merge 'metrics, vector search: add metrics to the vector store client' from Michał Hudobski
This PR adds metrics to the vector store client, as described in https://scylladb.atlassian.net/wiki/spaces/RND/pages/86245395/Vector+Store+Core+APIs#Metrics:

- number of the dns refreshes

We would like the dns refreshes to see if the network client is working properly.

Here is the added metric:

\# HELP scylla_vector_store_dns_refreshes Number of DNS refreshes
\# TYPE scylla_vector_store_dns_refreshes gauge
scylla_vector_store_dns_refreshes{shard="0"} 1.000000

Fixes: VECTOR-68

Closes scylladb/scylladb#25288

* github.com:scylladb/scylladb:
  metrics, test: added a test case for vs metrics
  metrics, vector_search: add a dns refresh metric
  vector_search: move the ann implementation to impl
2025-09-29 22:31:03 +03:00
Petr Gusev
29f9c355ab cql_test_env.cc: log exception when callback throws
When a test fails inside a do_with_cql_env callback, the logs don’t
make it clear where the failure happened. This is because cql_env
immediately begins shutting down services, which obscures the
original failure.
2025-09-29 17:53:36 +02:00
Lakshmi Narayanan Sreethar
7b97928152 cmake: link vector_search to test-lib instead of cql3
PR #26237 fixed linker errors by linking `cql3` to `vector_search` but
this introduced a circular dependency between these two static
libraries, sometimes causing failures during compilation :

```
ninja: error: dependency cycle:
/home/user/Development/scylladb/build/debug/cql3/CqlParser.hpp ->
data_dictionary/libdata_dictionary.a ->
data_dictionary/CMakeFiles/data_dictionary.dir/data_dictionary.cc.o ->
/home/user/Development/scylladb/build/debug/cql3/CqlParser.hpp
```

So, instead of linking the `vector_search` library to the `cql3`
library, link it directly to the executable where the `cql3` library is
also to be linked. For the test cases, this means linking
`vector_search` to the `test-lib` library. Since both `vector_search`
and `cql3` are static libraries, the linker will resolve them correctly
regardless of the order in which they are linked.

Refs #26235
Refs #26237

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#26318
2025-09-29 17:46:58 +03:00
Benny Halevy
b81c6a339b test_tablets_merge: test_tablet_split_merge_with_many_tables: reduce number of tables in debug mode
As the test hits timeouts in debug mode on aarch64.

Fixes #26252

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#26303
2025-09-29 15:30:13 +03:00
Michael Litvak
d94c1f6674 test: mv: test view update during topology operations
add new test cases checking view consistency when writing to a table
with MV and generating view updates while data is migrated.

one case has tablet migrations while writing to the table. The other
case does the equivalent for vnode keyspaces - it adds a new node.

The tests reproduce issue scylladb/scylladb#24292
2025-09-29 13:44:04 +02:00
Michael Litvak
c9237bf5f6 mv: generate view updates on both shards in intranode migration
Similarly to the issue of tokens migrating from one host to another,
where we need to generate view updates on both replicas before
transitioning in order to not lose view updates, we need to do the same
in case of intranode migration.

In intranode migration we migrate tokens from one shard to another.
Previously we checked shard_for_reads in order to generate view updates
only on the single shard that is selected for reads, and not on a
pending shard that is not ready yet. The problem is that shard_for_reads
switches from the source shard to the destination shard in a single
transition, and during that switch we can lose view updates because
neither shard sees itself as the shard for reads.

We fix this by having a phase before the transition when both shards are
ready for reads and both will generate view updates.
2025-09-29 13:44:04 +02:00
Michael Litvak
d842ea2dc9 mv: generate view updates on pending replica
Generate view updates from a pending base replica if it's a reading
replica, i.e. it's in the last stage of transition write_both_read_new
before becoming the new base replica.

Previously we didn't generate view updates on a pending replica. The
problem with that is that when a base token is migrated from one replica
B1 to another B2, at one stage we generate view updates only from B1,
then at the next stage we generate view updates only from B2. During
this transition, it can happen that for some write neither B1 nor B2
generate view update, because each one sees the other as the base
replica.

We fix this by generating view updates from both base replicas in the
phase before the transition. We can generate view updates on the pending
replica in this case, even if it requires read-before-write, because
it's in a stage where it contains all data and serves reads.

Fixes scylladb/scylladb#24292
2025-09-29 13:44:04 +02:00
Botond Dénes
7a773da425 Merge 'Speed up test cluster/test_alternator::test_localnodes_joining_nodes' from Nadav Har'El
Before this patch, the test `cluster/test_alternator::test_localnodes_joining_nodes` was one of the slowest tests in the test/cluster framework, taking over two minutes to run.

As comments in the test already acknowledged, there was no good reason why this test had to be so slow. The test needed to, intentionally, boot a server which took a long time (2 minutes) to fail its boot. But it didn't really need to wait for this failure - the right thing to do was to just kill the server at the end of the test. But we just didn't have the test-framework API to do it. So in this series, the first patch introduces the missing API, and the second patch uses it to fix test_localnodes_joining_nodes to kill the (unsuccessfully) booting server.

After this patch, the test takes just 7 seconds to run.

This is a test speedup only, so no real need to backport it - old release anyway get fewer test runs and the latency of these runs is less important.

Closes scylladb/scylladb#25312

* github.com:scylladb/scylladb:
  test/cluster: greatly speed up test_localnodes_joining_nodes
  test/pylib: add the ability to stop currently-starting servers
2025-09-29 14:34:34 +03:00
Michał Chojnowski
4ca215abbc sstables/mx/reader: use the same hashed_key for the bloom filter and the index reader
Partitions.db uses a piece of the murmur hash of the partition key
internally. The same hash is used to query the bloom filter.
So to avoid computing the hash twice (which involves converting the
key into a hashable linearized form) it would make sense to use
the same `hashed_key` for both purposes.

This is what we do in this patch. We extract the computation
of the `hashed_key` from `make_pk_filter` up to its parent
`sstable_set_impl::create_single_key_sstable_reader`,
and we pass this hash down both to `make_pk_filter` and
to the sstable reader. (And we add a pointer to the `hashed_key`
as a parameter to all functions along the way, to propagate it).

The number of parameters to `mx::make_reader` is getting uncomfortable.
Maybe they should be packed into some structs.
2025-09-29 13:01:22 +02:00
Michał Chojnowski
420e215873 sstables/trie/bti_index_reader: allow the caller to passing a precalculated murmur hash
Partitions.db internally uses a piece of the partition key murmur
hash (the same hash which is used to compute the token and the
relevant bits in the bloom filter). Before this patch,
the Partitions.db reader computes the hash internally from the
`sstables::partition_key`.

That's a waste, because this hash is usually also computed
for bloom filter purposes just before that.

So in this patch we let the caller pass that hash instead.

The old index interface, without the hash, is kept for convenience.

In this patch we only add a new interface, we don't switch the callers
to it yet. That will happen in the next commit.
2025-09-29 13:01:21 +02:00
Michał Chojnowski
cee4011e7a sstables/trie/bti_partition_index_writer: in add(), get the key hash from the caller
Partitions.db internally uses a piece of the partition key murmur
hash (the same hash which is used to compute the token and the
relevant bits in the bloom filter). Before this patch,
the Partitions.db writer computes the hash internally from the
`sstables::partition_key`.

That's a waste, because this hash is also computed for bloom filter
purposes just before that, in the owning sstable writer.
So in this patch we let the caller pass that hash here instead.
2025-09-29 13:01:21 +02:00
Michał Chojnowski
f8e3d5e7c2 sstables/mx: make Index and Summary components optional
In previous patches we (hopefully) modified all users of
Index and Summary components so that they don't longer
need those components to exist. (And can use Partitions and
Rows components instead).
2025-09-29 13:01:21 +02:00
Michał Chojnowski
f003cbce6d sstables: open Partitions.db early when it's needed to populate key range for sharding metadata
If there's no metadata file with sharding metadata,
the owning shards of an sstable are computed based on the partition key
range within the sstable.

This range is set in `set_first_and_last_keys()`, which
(since another commit in this commit series) reads it
either from the Summary component or from the footer of the Partitions
component, whichever is available.

But in some code paths `set_first_and_last_keys()` is called
before the footer of Partitions is loaded. If the sstable
doesn't have Summary, only Partitions, then the
`set_first_and_last_keys()` will fail. To prevent that,
in those cases we have to open the file and read its footer
early, before the `set_first_and_last_keys()` calls.

Note: the changes in this commit shouldn't matter during
normal operation, in which a Scylla component with sharding
metadata is available. But it might be used when
old and/or incomplete sstables are read.
2025-09-29 13:01:21 +02:00
Michał Chojnowski
4bdf5ca0cf sstables: adapt sstable::set_first_and_last_keys to sstables without Summary
`sstable::set_first_and_last_keys` currently takes the first and last
key from the Summary component. But if only BTI indexes are used,
this component will be nonexistent. In this case, we can use the first
and last keys written in the footer of Partitions.db.
2025-09-29 13:01:21 +02:00