Commit Graph

49745 Commits

Author SHA1 Message Date
Michał Chojnowski
6143dce3db test/boost/sstable_compaction_test: prepare for ms sstables.
Fix incompatibilities between the test's assumptions
and the upcoming addition of `ms` sstables.
Refer to individual tests for comments.
2025-09-29 22:15:25 +02:00
Michał Chojnowski
622149a183 test/boost/index_reader_test: prepare for ms sstables
Adjust the incompatibilities between the test and the upcoming
`ms` sstables. Refer to individual tests for comments.
2025-09-29 22:15:25 +02:00
Michał Chojnowski
a67d10d15d test/boost/bloom_filter_tests: prepare for ms sstables
The test for the bloom filter rebuild mechanism has to be adjusted,
because `ms` sstables won't use this mechanism.
2025-09-29 22:15:25 +02:00
Michał Chojnowski
312423fe53 test/boost/sstable_datafile_test: prepare for ms sstables
The tests touched in this commit are concerned specifically with
Summary. They are not applicable to sstables with BTI indexes.
2025-09-29 22:15:24 +02:00
Michał Chojnowski
924b8eec11 test/boost/sstable_test: prepare for ms sstables.
Skip `ms` sstables in an uninteresting test which
relies on `sstables::index_reader`.
2025-09-29 22:15:24 +02:00
Michał Chojnowski
db4283b542 sstables: introduce ms sstable format version
Introduce `ms` -- a new sstable format version which
is a hybrid of Cassandra's `me` and `da`.

It is based on `me`, but with the index components
(Summary.db and Index.db) replaced with the index
components of `da` (Partitions.db and Rows.db).

As of this patch, the version is never chosen
anywhere for writing sstables yet. It is only introduced.
We will add it to unit tests in a later commit,
and expose it to users in a yet later commit.
2025-09-29 22:15:24 +02:00
Michał Chojnowski
17085dc1e4 tools/scylla-sstable: default to "preferred" sstable version, not "highest"
Later in this patch series we will introduce `ms` as the new highest
format, but we won't be able to make it the default within the same
series due to some dtest incompatibilities.

Until `ms` is the default, we don't want `scylla sstable` to default to
it, even though it's the highest. Let's choose the default
version in `scylla sstable` using the same method which is
used by Scylla in general: by letting the `sstable_manager` choose.
2025-09-29 22:13:59 +02:00
Michał Chojnowski
4ca215abbc sstables/mx/reader: use the same hashed_key for the bloom filter and the index reader
Partitions.db uses a piece of the murmur hash of the partition key
internally. The same hash is used to query the bloom filter.
So to avoid computing the hash twice (which involves converting the
key into a hashable linearized form) it would make sense to use
the same `hashed_key` for both purposes.

This is what we do in this patch. We extract the computation
of the `hashed_key` from `make_pk_filter` up to its parent
`sstable_set_impl::create_single_key_sstable_reader`,
and we pass this hash down both to `make_pk_filter` and
to the sstable reader. (And we add a pointer to the `hashed_key`
as a parameter to all functions along the way, to propagate it).

The number of parameters to `mx::make_reader` is getting uncomfortable.
Maybe they should be packed into some structs.
2025-09-29 13:01:22 +02:00
Michał Chojnowski
420e215873 sstables/trie/bti_index_reader: allow the caller to pass a precalculated murmur hash
Partitions.db internally uses a piece of the partition key murmur
hash (the same hash which is used to compute the token and the
relevant bits in the bloom filter). Before this patch,
the Partitions.db reader computes the hash internally from the
`sstables::partition_key`.

That's a waste, because this hash is usually also computed
for bloom filter purposes just before that.

So in this patch we let the caller pass that hash instead.

The old index interface, without the hash, is kept for convenience.

In this patch we only add a new interface, we don't switch the callers
to it yet. That will happen in the next commit.
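
For illustration only, the shape of such an interface can be sketched in Python (this is not the code from this patch). `CountingHasher`, `PartitionIndexReader`, `lookup` and `lookup_hashed` are hypothetical names, and `blake2b` stands in for the murmur hash, which is not in the Python standard library.

```python
import hashlib


class CountingHasher:
    """Stand-in for the partition key hash; counts how often it is computed."""

    def __init__(self):
        self.calls = 0

    def __call__(self, key: bytes) -> bytes:
        self.calls += 1
        # Assumption: blake2b replaces the 16-byte murmur hash for this sketch.
        return hashlib.blake2b(key, digest_size=16).digest()


class PartitionIndexReader:
    """Hypothetical reader that keeps one hash byte per stored key."""

    def __init__(self, keys, hasher):
        self._hasher = hasher
        self._hash_bytes = {k: hasher(k)[0] for k in keys}

    def lookup_hashed(self, key: bytes, hashed: bytes) -> bool:
        # New interface: the caller passes a hash it has already computed.
        return self._hash_bytes.get(key) == hashed[0]

    def lookup(self, key: bytes) -> bool:
        # Old interface, kept for convenience: hash internally, then delegate.
        return self.lookup_hashed(key, self._hasher(key))
```

A caller that already hashed the key for the bloom filter would call `lookup_hashed` and pay for the hash once; `lookup` hashes again on every call.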
2025-09-29 13:01:21 +02:00
Michał Chojnowski
cee4011e7a sstables/trie/bti_partition_index_writer: in add(), get the key hash from the caller
Partitions.db internally uses a piece of the partition key murmur
hash (the same hash which is used to compute the token and the
relevant bits in the bloom filter). Before this patch,
the Partitions.db writer computes the hash internally from the
`sstables::partition_key`.

That's a waste, because this hash is also computed for bloom filter
purposes just before that, in the owning sstable writer.
So in this patch we let the caller pass that hash here instead.
2025-09-29 13:01:21 +02:00
Michał Chojnowski
f8e3d5e7c2 sstables/mx: make Index and Summary components optional
In previous patches we (hopefully) modified all users of
Index and Summary components so that they no longer
need those components to exist. (And can use Partitions and
Rows components instead).
2025-09-29 13:01:21 +02:00
Michał Chojnowski
f003cbce6d sstables: open Partitions.db early when it's needed to populate key range for sharding metadata
If there's no metadata file with sharding metadata,
the owning shards of an sstable are computed based on the partition key
range within the sstable.

This range is set in `set_first_and_last_keys()`, which
(since another commit in this commit series) reads it
either from the Summary component or from the footer of the Partitions
component, whichever is available.

But in some code paths `set_first_and_last_keys()` is called
before the footer of Partitions is loaded. If the sstable
doesn't have Summary, only Partitions, then the
`set_first_and_last_keys()` will fail. To prevent that,
in those cases we have to open the file and read its footer
early, before the `set_first_and_last_keys()` calls.

Note: the changes in this commit shouldn't matter during
normal operation, in which a Scylla component with sharding
metadata is available. But it might be used when
old and/or incomplete sstables are read.
2025-09-29 13:01:21 +02:00
Michał Chojnowski
4bdf5ca0cf sstables: adapt sstable::set_first_and_last_keys to sstables without Summary
`sstable::set_first_and_last_keys` currently takes the first and last
key from the Summary component. But if only BTI indexes are used,
this component will be nonexistent. In this case, we can use the first
and last keys written in the footer of Partitions.db.
2025-09-29 13:01:21 +02:00
Michał Chojnowski
b1984d6798 sstables: implement an alternative way to rebuild bloom filters for sstables without Index
For efficiency, the cardinality of the bloom filter
(i.e. the number of partition keys which will be written into the sstable)
has to be known before elements are inserted into the filter.

In some cases (e.g. memtables flush) this number is known exactly.
But in others (e.g. repair) it can only be estimated,
and the estimation might be very wrong, leading to an oversized filter.

Because of that, some time ago we added a piece of logic
(run after the sstable is written, but before it's sealed)
which looks at the actual number of written partitions,
compares it to the initial estimate (on which the size of the bloom
filter was based), and if the difference is unacceptably large,
it rewrites the bloom filter from partition keys contained in Index.db.

But the idea to rebuild the bloom filters from index files
isn't going to work with BTI indexes, because they don't store
whole partition keys. If we want sstables which don't have Index.db
files, we need some other way to deal with oversized filters.
Partition keys can be recovered from Data.db,
but that would often be way too expensive.

This patch adds another way. We introduce a new component file,
TemporaryHashes. This component, if written at all,
contains the 16-byte murmur hash for every partition key, in order,
and can be used in place of Index to reconstruct the bloom filter.

(Our bloom filters are actually built from the set of murmur hashes of
partition keys. The first step of inserting a partition key into a
filter is hashing the key. Remembering the hashes is sufficient
to build the filter later, without looking at partition keys again.)

As of this patch, if the Index component is not being written,
we don't allocate and populate a bloom filter during the Data.db write.
Instead, we write the murmur hashes to TemporaryHashes, and only
later, after the Data write finishes, we allocate the optimal-size
bloom filter, we read the hashes back from TemporaryHashes,
and we populate the filter with them.

That is suboptimal.
Writing the hashes to disk (or worse, to S3) and reading
them back is more expensive than building the bloom filter
during the main Data pass.
So ideally it should be avoided in cases where we know
in advance that the partition key count estimate is good enough.
(Which should be the case in flushes and compactions).
But we defer that to a future patch.
(Such a change would involve passing some flag to the sstable writer
if the cardinality estimate is trustworthy, and not creating
TemporaryHashes if the estimate is trustworthy).
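
The two-pass scheme described above can be sketched in Python as follows. This is an illustration under stated assumptions, not Scylla's implementation: `blake2b` stands in for murmur, the filter sizing (10 bits per key) and double-hashing probe scheme are generic choices, and `write_pass` / `rebuild_filter` are hypothetical names. An in-memory buffer plays the role of the TemporaryHashes file.

```python
import hashlib
import io
import math


def key_hash(key: bytes) -> bytes:
    """Stand-in for the 16-byte murmur hash of a partition key."""
    return hashlib.blake2b(key, digest_size=16).digest()


class BloomFilter:
    def __init__(self, expected_keys: int, bits_per_key: int = 10):
        self.nbits = max(8, expected_keys * bits_per_key)
        self.nhashes = max(1, round(bits_per_key * math.log(2)))
        self.bits = bytearray((self.nbits + 7) // 8)

    def _positions(self, hashed: bytes):
        # Double hashing: derive every probe from the two 64-bit halves.
        h1 = int.from_bytes(hashed[:8], "little")
        h2 = int.from_bytes(hashed[8:], "little")
        return [(h1 + i * h2) % self.nbits for i in range(self.nhashes)]

    def add_hashed(self, hashed: bytes) -> None:
        for pos in self._positions(hashed):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def may_contain_hashed(self, hashed: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(hashed))


def write_pass(keys, temporary_hashes) -> int:
    """First pass: append each key's hash while data is written; return the count."""
    count = 0
    for key in keys:
        temporary_hashes.write(key_hash(key))
        count += 1
    return count


def rebuild_filter(temporary_hashes, count: int) -> BloomFilter:
    """Second pass: size the filter from the real count, then replay the hashes."""
    bloom = BloomFilter(count)
    temporary_hashes.seek(0)
    for _ in range(count):
        bloom.add_hashed(temporary_hashes.read(16))
    return bloom
```

The partition keys are never consulted in the second pass; the stored hashes alone are enough to populate the filter.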
2025-09-29 13:01:21 +02:00
Michał Chojnowski
c549afa1a9 utils/bloom_filter: add add(const hashed_key&)
In one of the next patches, we will want to use (in BTI partition
index writer) the same hash as used by the bloom filter,
and we'll also want to allow rebuilding the filter in a second
pass (after the whole sstable is written) from hashes (as opposed
to rebuilding from partition keys saved in Index.db, which is
something we sometimes do today) saved to a temporary file.

For those, we need an interface that allows us to compute the hash
externally, and only pass the hash to `add()`.
2025-09-29 13:01:21 +02:00
Michał Chojnowski
3c83914814 sstables: adapt estimated_keys_for_range to sstables without Summary
Before this patch, `estimated_keys_for_range` assumes the presence
of the Summary component. But we want to make this component optional
in this series.

This patch adds a second branch to this function, for sstables
which don't have a BIG index (in particular, Summary component),
but have a BTI index (Partitions component).

In this case, instead of calculating the estimate as
"fraction of summary overlapping with given range,
multiplied by the total key estimate", we calculate
it as "fraction of Data file overlapping with given range,
multiplied by the total key estimate".

(With an extra conditional for the special case when the given range
doesn't overlap with the sstable's range at all. In this case, if the
ranges are adjacent, the main path could easily return "1 partition"
instead of "0 partitions", due to the inexactness of BTI indexes for
range queries. Returning something non-zero in this case would
be unfortunate, so the extra conditional makes sure that
we return 0).
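
The arithmetic for the BTI branch can be sketched in Python. The function and parameter names are hypothetical, and rounding up is an assumption made here to show how a tiny overlap could yield "1 partition"; the patch may round differently.

```python
def estimated_keys_for_range(range_data_bytes: int, data_file_bytes: int,
                             total_keys: int, ranges_overlap: bool) -> int:
    """Estimate keys as (fraction of Data file in range) * (total key estimate)."""
    # Special case: a range that does not overlap the sstable's key range
    # must report 0, even if the inexact index attributes a few bytes to it.
    if not ranges_overlap or data_file_bytes == 0:
        return 0
    # Integer ceiling of range_data_bytes / data_file_bytes * total_keys.
    return (range_data_bytes * total_keys + data_file_bytes - 1) // data_file_bytes
```

For example, a range covering 256 of 1024 Data bytes in an sstable with an estimated 1000 keys yields 250, while an adjacent, non-overlapping range yields 0 regardless of the byte count attributed to it.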
2025-09-29 13:01:21 +02:00
Michał Chojnowski
55c4b89b88 sstables: make sstable::estimated_keys_for_range asynchronous
Currently, `sstable::estimated_keys_for_range` works by
checking what fraction of Summary is covered by the given
range, and multiplying this fraction by the number of all keys.
Since computing things on Summary doesn't involve I/O (because Summary
is always kept in RAM), this is synchronous.

In a later patch, we will modify `sstable::estimated_keys_for_range`
so that it can deal with sstables that don't have a Summary
(because they use BTI indexes instead of BIG indexes).
In that case, the function is going to compute the relevant fraction
by using the index instead of Summary. This will require making
the function asynchronous. This is what we do in this patch.

(The actual change to the logic of `sstable::estimated_keys_for_range`
will come in the next patch. In this one, we only make it asynchronous).
2025-09-29 13:01:21 +02:00
Michał Chojnowski
70994170e2 sstables/sstable: compute get_estimated_key_count() from Statistics instead of Summary
`sstable::get_estimated_key_count()` estimates the partition count from the
size of Summary, and the interval between Summary entries.
But we want to allow writing sstables without a Summary
(i.e. sstables that use BTI indexes instead of BIG indexes),
so we want a way to get the key count without involving Summary.

For that, we can use the `estimated_partition_size` histogram in
Statistics. By counting the histogram entries, we get the exact
number of partitions in the sstable.
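
A small Python sketch of why this works: each written partition adds exactly one entry to the size histogram, so the bucket counts sum to the partition count. The power-of-two bucketing and the function names are assumptions for illustration; the real histogram uses its own bucket boundaries.

```python
from collections import Counter


def bucket_for(size_bytes: int) -> int:
    """Hypothetical bucketing: round the size up to the next power of two."""
    if size_bytes <= 1:
        return 1
    return 1 << (size_bytes - 1).bit_length()


def build_histogram(partition_sizes) -> Counter:
    """Record one histogram entry per written partition."""
    histogram = Counter()
    for size in partition_sizes:
        histogram[bucket_for(size)] += 1
    return histogram


def partition_count(histogram: Counter) -> int:
    """The bucket counts sum to the exact number of recorded partitions."""
    return sum(histogram.values())
```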
2025-09-29 13:01:21 +02:00
Michał Chojnowski
68c33c0173 replica/database: add table::estimated_partitions_in_range()
Add a function which computes an estimated number of partitions
in the given token range. We will use this helper in a later patch
to replace a few places in the code which de facto do the same
thing "manually".
2025-09-29 13:01:21 +02:00
Michał Chojnowski
5f4b9a03d1 sstables/mx: implement sstable::has_partition_key using a regular read
A BTI index isn't able to determine if a given key is present in
the sstable, because it doesn't store full keys.
(It only stores prefixes of decorated keys, so it might give false positives).

If the sstable only has BTI index, and no BIG index, then
`sstable::has_partition_key()` will have to be implemented
with something other than just the index reader.

We might as well ignore the index in all cases and just check
that a regular data read for the given partition returns a non-empty result.
`sstable::has_partition_key` is only used in the
`column_family/sstables/by_key` REST API call that nobody
uses anyway, no point in trying to make special optimizations for it.
2025-09-29 13:01:21 +02:00
Michał Chojnowski
893eb4ca1f sstables: use BTI index for queries, when present and enabled
This patch teaches `sstable::make_index_reader` how to create
a BTI index reader, from the `Partitions.db` and `Rows.db`
components, if they exist (in which case they are opened by this point).
2025-09-29 13:01:21 +02:00
Michał Chojnowski
e0fda9ae6f sstables/mx/writer: populate BTI index files
In the previous patch we added code responsible
for creating and opening Partitions.db and Rows.db,
but we left those files empty.

In this patch, we populate the files using
`trie::bti_row_index_writer` and `trie::bti_partition_index_writer`.

Note: for the row index, we insert the same clustering blocks to
both indexes. The logic for choosing the size of the blocks
hasn't been changed in any way.

Much of this patch has to do with propagating the current range
tombstone down to all places which can start a new clustering block.

The reason we need that is that, for each clustering block,
BIG indexes store the range tombstone succeeding the block
(i.e. the range tombstone in between the given block and its successor)
BTI indexes store the range tombstone preceding the block,
(i.e. the range tombstone in between the given block and its predecessor).
So before the patch there's no code which looks at the current tombstone
when *starting* the block, only when *ending* the block.
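
The difference between the two conventions can be sketched in Python. The event model is hypothetical: each block is a list of `("row", name)` or `("rt", timestamp_or_None)` events, where an `rt` event changes the currently open range tombstone.

```python
def index_blocks(blocks):
    """Return (big, bti): the tombstone each index style records per block."""
    current = None  # currently open range tombstone, if any
    big, bti = [], []
    for block in blocks:
        # BTI style: the tombstone open when the block *starts*.
        bti.append(current)
        for kind, value in block:
            if kind == "rt":
                current = value
        # BIG style: the tombstone open when the block *ends*.
        big.append(current)
    return big, bti
```

With blocks `[[row a, rt 10], [row b], [rt None, row c]]` this gives `big == [10, 10, None]` and `bti == [None, 10, 10]`, which is why the writer needs the current tombstone at every place where a block can start.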

This patch adds an extra copy for each `decorated_key`.
This is mostly unavoidable -- the BTI partition writer just
has to remember the key until its successor appears, to find the
common prefix. (We could avoid the key copy if the BTI isn't used, though.
We don't do that in this patch, we just let the copy happen).
2025-09-29 13:01:21 +02:00
Michał Chojnowski
cdcf34b3a0 sstables: create and open BTI index files, when enabled
This patch adds code responsible for creation and opening
of BTI index components (Rows.db, Partitions.db) when
BTI index writing is enabled.

(It is enabled if the cluster feature is enabled and the relevant
config entry permits it).

The files are empty for now, and are never read.
We will populate and use them in following patches.
2025-09-29 13:01:21 +02:00
Michał Chojnowski
18875621e8 sstables: introduce Partition and Rows component types
BTI indexes are made up of Partitions.db and Rows.db files.
In this patch we introduce the corresponding component types.

In Cassandra, BTI is a separate "sstable format", with a new set
of versions. (I.e. `bti-da`, as opposed to `big-me`).

In this patch series, we are doing something different:
we are introducing version `ms`, which is like `me`, except with
`Index.db` and `Summary.db` replaced with `Partitions.db` and `Rows.db`.

With a setup like that, Scylla won't yet be able to read Cassandra's
BTI (`da`) files, because this patch doesn't teach Scylla
about `da`.
(But the way to that is open. It would just require first implementing
several other things which changed between `me` and `da`).

(And, naturally Cassandra will reject `ms` sstables.
But this isn't the first time we are breaking file
compatibility with Cassandra to some degree.
Other examples include encryption and dictionary compression).

Note: Partitions.db and Rows.db contain prefixes of keys,
which is sensitive information, so they have to be encrypted.
2025-09-29 13:01:21 +02:00
Michał Chojnowski
e04ee6d5f6 sstables/mx/writer: make _pi_write_m.partition_tombstone a sstables::deletion_time
There's a test (boost/sstable_compaction_test.cc::tombstone_purge_test)
which tests the value of `_stats.capped_tombstone_deletion_time`.

Before this patch, for "ms" sstables, `to_deletion_time` would have
been called twice for each written partition tombstone, which would fail
the test.

Since `_pi_write_m.partition_tombstone` always ends up being
converted from `tombstone` to `sstables::deletion_time` anyway,
let's just make it a `sstables::deletion_time` to begin with.
This ensures that `to_deletion_time` is called
only once per partition tombstone.
2025-09-29 13:01:20 +02:00
Michael Litvak
6bc41926e2 view_builder: reduce log level for expected aborts during view creation
When draining the view builder, we abort ongoing operations using the
view builder's abort source, which may cause them to fail with
abort_requested_exception or raft::request_aborted exceptions.

Since these failures are expected during shutdown, reduce the log level
in add_new_view from 'error' to 'debug' for these specific exceptions
while keeping 'error' level for unexpected failures.
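
The level selection can be sketched in Python with the standard `logging` module. The exception classes are hypothetical stand-ins for `abort_requested_exception` and `raft::request_aborted`, and `log_view_failure` is an illustrative helper, not the code in `add_new_view`.

```python
import logging


class AbortRequested(Exception):
    """Hypothetical stand-in for abort_requested_exception."""


class RequestAborted(Exception):
    """Hypothetical stand-in for raft::request_aborted."""


# Failures that are expected while the view builder is draining.
EXPECTED_DURING_DRAIN = (AbortRequested, RequestAborted)


def failure_log_level(exc: BaseException) -> int:
    """Expected aborts are logged at debug level, everything else at error."""
    if isinstance(exc, EXPECTED_DURING_DRAIN):
        return logging.DEBUG
    return logging.ERROR


def log_view_failure(log: logging.Logger, view_name: str, exc: BaseException) -> None:
    log.log(failure_log_level(exc), "Failed to add view %s: %r", view_name, exc)
```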

Closes scylladb/scylladb#26297
2025-09-28 22:55:07 +03:00
Avi Kivity
5b40d4d52b Merge 'root,replica: mv multishard_mutation_query -> replica/multishard_query' from Botond Dénes
The code in `multishard_mutation_query.cc` implements the replica-side of range scans and as such it belongs in the replica module. Take the opportunity to also rename it to `multishard_query`, since the code has implemented both data and mutation queries for a long time now.

Code cleanup, no backport required.

Closes scylladb/scylladb#26279

* github.com:scylladb/scylladb:
  test/boost: rename multishard_mutation_query_test to multishard_query_test
  replica/multishard_query: move code into namespace replica
  replica/multishard_query.cc: update logger name
  docs/paged-queries.md: update references to readers
  root,replica: move multishard_mutation_query to replica/
2025-09-28 20:24:46 +03:00
Avi Kivity
5b6570be52 Merge 'db/config: Add SSTable compression options for user tables' from Nikos Dragazis
ScyllaDB offers the `compression` DDL property for configuring compression per user table (compression algorithm and chunk size). If not specified, the default compression algorithm is the LZ4Compressor with a 4KiB chunk size. The same default applies to system tables as well.

This series introduces a new configuration option to allow customizing the default for user tables. It also adds some tests for the new functionality.

Fixes #25195.

Closes scylladb/scylladb#26003

* github.com:scylladb/scylladb:
  test/cluster: Add tests for invalid SSTable compression options
  test/boost: Add tests for SSTable compression config options
  main: Validate SSTable compression options from config
  db/config: Add SSTable compression options for user tables
  db/config: Prepare compression_parameters for config system
  compressor: Validate presence of sstable_compression in parameters
  compressor: Add missing space in exception message
2025-09-28 20:23:23 +03:00
Artsiom Mishuta
eedd61f43f test.py: remove 'sudo' from resource_gather.py
The container now runs as root (4c1f4c419c), so sudo is not needed
anymore

Closes scylladb/scylladb#26294
2025-09-28 16:51:19 +03:00
Avi Kivity
1c1e8802d5 Merge 'Fix lifetime problems between group0 and sstable dictionary trainings' from Michał Chojnowski
Apparently the group0 server object dies (and is freed) during drain/shutdown, and I didn't take that into account in my https://github.com/scylladb/scylladb/pull/23025, which still attempts to use it afterwards.

The patch fixes two problems.
The problem with `is_raft_leader` has been observed in tests.
The problem with `publish_new_sstable_dict` has not been observed, but AFAIU (based on code inspection) it exists. I didn't attempt to prove its existence with a test.

Should be backported to 2025.3.

Closes scylladb/scylladb#25115

* github.com:scylladb/scylladb:
  storage_service: in publish_new_sstable_dict, use _group0_as instead of the main abort source
  storage_service: hold group0 gate in `publish_new_sstable_dict`
2025-09-28 14:27:37 +03:00
Botond Dénes
34cc7aafae tools/scylla-sstable: introduce the upgrade command
An offline, scylla-sstable variant of nodetool upgradesstables command.
Applies latest (or selected) sstable version and latest schema.

Closes scylladb/scylladb#26109
2025-09-27 16:53:14 +03:00
Avi Kivity
24b5d08731 Merge 'Remove table::for_all_partitions_slow()' from Pavel Emelyanov
This method was once implemented by calling table::for_all_partitions(), which was supposed to be non-slow version. Then callers of "non-slow" method were updated and the method itself was renamed into "_slow()" one. Nowadays only one test still uses it.

At the same time the method itself mostly consists of boilerplate code that moves bits around to call a lambda on the partitions read from the reader. Open-coding the method into the calling test results in much shorter and simpler-to-follow code.

Code cleanup, no backport needed

Closes scylladb/scylladb#26283

* github.com:scylladb/scylladb:
  test: Fix indentation after previous patch
  test: Opencode for_all_partitions_slow()
  test: Coroutinize test_multiple_memtables_multiple_partitions inner lambda
  table: Move for_all_partitions_slow() to test
2025-09-27 16:26:18 +03:00
Piotr Dulikowski
39145ff1d0 Merge 'vector_store_client: Add support for load balancing' from Karol Nowacki
This change introduces a load balancing mechanism for the vector store client.
The client can now distribute requests across multiple vector store nodes.
The distribution mechanism performs random selection of nodes for each request.

References: VECTOR-187

No backport is needed as this is a new feature.

Closes scylladb/scylladb#26205

* github.com:scylladb/scylladb:
  vector_store_client: Add support for load balancing
  vector_store_client_test: Introduce vs_mock_server
  vector_store_client_test: Relocate to a dedicated directory
2025-09-26 18:55:14 +02:00
Pavel Emelyanov
04a40b08f7 test: Fix indentation after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-09-26 16:39:09 +03:00
Pavel Emelyanov
813619f939 test: Opencode for_all_partitions_slow()
The method is a large boilerplate that moves stuff around to do a simple
thing -- read mutations from the reader in a row and "check" them with a
lambda, optionally breaking the loop if the lambda wants it.

The whole thing is much shorter if the caller kicks the reader on its own.

One thing to note -- the reader is not closed if something throws in
between, but that's a test anyway; if something throws, the test fails and
an unclosed reader is not a big deal.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-09-26 16:36:58 +03:00
Pavel Emelyanov
c1ebf987a9 test: Coroutinize test_multiple_memtables_multiple_partitions inner lambda

The only place where it needs futures is to call the
for_all_partitions_slow() from a table

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-09-26 16:35:59 +03:00
Pavel Emelyanov
f3c57f7dd0 table: Move for_all_partitions_slow() to test
It's now only used by a single test, so move it there and remove from
public table API.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-09-26 16:33:25 +03:00
Karol Nowacki
a0e62ef8de vector_store_client: Add support for load balancing
This change introduces a load balancing mechanism for the vector store client.
The client can now distribute requests across multiple vector store nodes.
The distribution mechanism performs random selection of nodes for each request.
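
Per-request random selection can be sketched in Python as below. `pick_node` and the node addresses are hypothetical; this shows uniform random choice only and says nothing about how the client discovers or health-checks nodes.

```python
import random


def pick_node(nodes, rng=random):
    """Choose one vector store node uniformly at random for a single request."""
    if not nodes:
        raise ValueError("no vector store nodes configured")
    return rng.choice(nodes)
```

Called once per request, e.g. `pick_node(["vs-1:6080", "vs-2:6080", "vs-3:6080"])`, this spreads requests roughly evenly over time without keeping any per-node state.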
2025-09-26 13:44:28 +02:00
Karol Nowacki
ee90530c31 vector_store_client_test: Introduce vs_mock_server
Introduce the `vs_mock_server` test class, which is capable of counting
incoming requests. This will be used in subsequent tests to verify
load balancing logic.
2025-09-26 12:27:06 +02:00
Nikos Dragazis
8410532fa0 test/cluster: Add tests for invalid SSTable compression options
Complementary to the previous patch. It triggers semantic validation
checks in `compression_parameters::validate()` and expects the server to
exit. The tests examine both command line and YAML options.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2025-09-26 12:02:42 +03:00
Nikos Dragazis
6ba0fa20ee test/boost: Add tests for SSTable compression config options
Since patch 03461d6a54, all boost unit tests depending on `cql_test_env`
are compiled into a single executable (`combined_tests`). Add the new
test in there.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2025-09-26 12:02:42 +03:00
Nikos Dragazis
8d5bd212ca main: Validate SSTable compression options from config
`compression_parameters` provides two levels of validation:

* syntactic checks - implemented in the constructor
* semantic checks - implemented by `compression_parameters::validate()`

The former are applied implicitly when parsing the options from the
command line or from scylla.yaml. The latter are currently not applied,
but they should be.

For lack of a better place, apply them in main, right after joining the
cluster, to make sure that the cluster features have been negotiated.
The feature needed here is the `SSTABLE_COMPRESSION_DICTS`. Validation
will fail if the feature is disabled and a dictionary compression
algorithm has been selected.

Also, mark `validate()` as const so that it can be called from a config
object.
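
The two validation levels can be sketched in Python. This is an illustration, not the `compression_parameters` class: the option keys, the set of dictionary algorithms, and the `dicts_feature_enabled` argument are assumptions standing in for the real options and the `SSTABLE_COMPRESSION_DICTS` cluster feature.

```python
# Assumption: algorithms that need the dictionary-compression cluster feature.
DICT_ALGORITHMS = {"ZstdWithDictsCompressor", "LZ4WithDictsCompressor"}


class CompressionParameters:
    def __init__(self, options: dict):
        # Syntactic checks: applied whenever options are parsed.
        if "sstable_compression" not in options:
            raise ValueError("missing sstable_compression")
        chunk = str(options.get("chunk_length_in_kb", "4"))
        if not chunk.isdigit() or int(chunk) <= 0:
            raise ValueError("chunk_length_in_kb must be a positive integer")
        self.algorithm = options["sstable_compression"]
        self.chunk_length_kb = int(chunk)

    def validate(self, dicts_feature_enabled: bool) -> None:
        # Semantic check: depends on cluster state, so it can only run
        # after the cluster features have been negotiated.
        if self.algorithm in DICT_ALGORITHMS and not dicts_feature_enabled:
            raise ValueError("dictionary compression is not enabled in the cluster")
```

Constructing the object can succeed while `validate(False)` still fails, which is why the second check has to be invoked explicitly at startup.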

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2025-09-26 12:02:42 +03:00
Nikos Dragazis
e1d9c83406 db/config: Add SSTable compression options for user tables
ScyllaDB offers the `compression` DDL property for configuring
compression per user table (compression algorithm and chunk size). If
not specified, the default compression algorithm is the LZ4Compressor
with a 4KiB chunk size (refer to the default constructor for
`compression_parameters`). The same default applies to system tables as
well.

Add a new configuration option to allow customizing the default for user
tables. Use the previously hardcoded default as the new option's default
value.

Note that the option has no effect on ALTER TABLE statements. An altered
table either inherits explicit compression options from the CQL
statement, or maintains its existing options.
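
The resolution rules above can be sketched in Python. The function names and the dictionary representation are hypothetical; only the LZ4Compressor / 4KiB default comes from the text.

```python
# Previously hardcoded default, now the default value of the config option.
HARDCODED_DEFAULT = {"sstable_compression": "LZ4Compressor", "chunk_length_in_kb": 4}


def options_for_create(explicit, config_default=HARDCODED_DEFAULT):
    """CREATE TABLE: explicit options win, otherwise the configured default."""
    return dict(explicit) if explicit is not None else dict(config_default)


def options_for_alter(existing, explicit):
    """ALTER TABLE: explicit options win, otherwise keep the table's own options.

    The configured default is deliberately not consulted here.
    """
    return dict(explicit) if explicit is not None else dict(existing)
```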

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2025-09-26 12:02:00 +03:00
Botond Dénes
52c05d89aa test/boost: rename multishard_mutation_query_test to multishard_query_test
2025-09-26 11:15:38 +03:00
Botond Dénes
3be4f0698f replica/multishard_query: move code into namespace replica
Complete the migration, add code to the replica namespace too.
2025-09-26 11:15:38 +03:00
Botond Dénes
ed50a307db replica/multishard_query.cc: update logger name
To reflect the new file name.
2025-09-26 11:15:38 +03:00
Botond Dénes
8f90137e87 docs/paged-queries.md: update references to readers
Both links to reader code are outdated, update them.
2025-09-26 11:15:38 +03:00
Botond Dénes
fb16c0a6d4 root,replica: move multishard_mutation_query to replica/
It belongs there, it is a completely replica-side thing. Also take the
opportunity to rename it to multishard_query.{hh,cc}, it is not just
mutation anymore (data query is also implemented).
2025-09-26 11:15:38 +03:00
Piotr Dulikowski
68d5dcfa23 Merge 'Coroutinize gossipring property file snitch' from Pavel Emelyanov
Most of its then-chains are quite hairy and look much nicer as coroutines.
Last patch restores indentation.

Code cleanup, no backport required.

Closes scylladb/scylladb#26271

* github.com:scylladb/scylladb:
  snitch: Reindent after previous changes
  snitch: Make periodic_reader_callback() a coroutine
  snitch: Coroutinize pause_io()
  snitch: Coroutinize stop()
  snitch: Coroutinize reload_configuration()
  snitch: Coroutinize read_property_file()
  snitch: Coroutinize start()
  snitch: Coroutinize property_file_was_modified()
2025-09-26 08:32:19 +02:00
Avi Kivity
0f4363cc8d Merge 'sstable: add more complete schema to scylla component' from Botond Dénes
Sstables store a basic schema in the statistics component. The scylla-sstable tool uses this to be able to read and dump sstables in a self-contained manner, without requiring an external schema source.
The problem is that the schema stored in the statistics component is incomplete: it doesn't store column names for key columns, so these have placeholder names in dump outputs where column names are visible.
This is not a disaster but it is confusing and it can cause errors in scripts which want to check the content of sstables, while also knowing the schema and expecting the proper names for key columns.

To make sstables truly self-contained w.r.t. the schema, add a complete schema to the scylla component. This schema contains the names and types of all columns, as well as some basic information about the schema: keyspace name, table name, id and version.
When available, scylla-sstable's schema loader will use this new more complete schema and fall back to the old method of loading the (incomplete) schema from the statistics component otherwise.
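
The fallback order can be sketched in Python. The dictionary layout, the field names and the placeholder naming scheme (`$pk0`, `$pk1`, ...) are assumptions for illustration and do not reflect the on-disk format or the tool's actual placeholder names.

```python
def schema_from_statistics(statistics: dict) -> dict:
    """Old method: types are known, but key columns only get placeholder names."""
    return {
        "complete": False,
        "partition_key": [("$pk%d" % i, t)
                          for i, t in enumerate(statistics["partition_key_types"])],
        "regular": list(statistics["regular_columns"]),
    }


def load_schema(scylla_metadata: dict, statistics: dict) -> dict:
    """Prefer the complete schema from the scylla component, else fall back."""
    schema = scylla_metadata.get("schema")
    if schema is not None:
        return schema
    return schema_from_statistics(statistics)
```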

New feature, no backport required.

Closes scylladb/scylladb#24187

* github.com:scylladb/scylladb:
  test/boost/schema_loader_test: add specific test with interesting types
  test/lib/random_schema: add random_schema(schema_ptr) constructor
  test/boost/schema_loader_test: test_load_schema_from_sstable: add fall-back test
  tools/schema_loader: add support for loading from scylla-metadata
  tools/schema_loader: extract code which loads schema from statistics
  sstables: scylla_metadata: add schema member
2025-09-26 00:21:17 +03:00