Commit Graph

47265 Commits

Author SHA1 Message Date
Michał Chojnowski
f851efd4fa test: add test_sstable_compression_dictionaries_autotrain.py
Adds a test which checks that sstable compression dict autotraining
does its job.
2025-04-01 00:07:31 +02:00
Michał Chojnowski
62da3d8363 test: add test_sstable_compression_dictionaries_basic.py
Add a basic integration test for SSTable compression with shared dictionaries.
2025-04-01 00:07:30 +02:00
Michał Chojnowski
7b0eeefd79 test/pylib/rest_client: add keyspace_upgrade_sstables helper 2025-04-01 00:07:30 +02:00
Michał Chojnowski
3f7969313f main: run a sstable_dict_autotrainer
Create an instance of `sstable_dict_autotrainer` in `scylla_main`
and run it.
2025-04-01 00:07:30 +02:00
Michał Chojnowski
a19d6d95f7 api: add the estimate_compression_ratios API call
Add an API call which estimates the effectiveness of possible
compression config changes.

This can be used to make an informed decision about whether to
change the compression method, without actually recompressing
any SSTables.
2025-04-01 00:07:30 +02:00
Michał Chojnowski
4f0d453acf dict_autotrainer: introduce sstable_dict_autotrainer
Add a fiber responsible for periodic re-training of compression dictionaries
(for tables which opted into dict-aware compression).

As of this patch, it works like this:
every `$tick_period` (15 minutes), if we are the current Raft leader,
we check for dict-aware tables which have no dict, or a dict older
than `$retrain_period`.

For those tables, if they have enough data (>1GiB) for a training,
we train a new dict and check if it's significantly better
than the current one (provides ratio smaller than 95% of current ratio),
and if so, we update the dict.
2025-04-01 00:07:30 +02:00
Michał Chojnowski
9d02e2c005 db/system_keyspace: add query_dict_timestamp
Adds a helper method which queries the creation timestamp
of a given dict in `system.dicts`.

We will later use the age of the current SSTable compression dict
to decide if another training should be done already.
2025-04-01 00:07:30 +02:00
Michał Chojnowski
cb1b291051 compress: add ZstdWithDictsCompressor and LZ4WithDictsCompressor
Add new compressor names to `sstable_compression`.
When those names are configured in the schema,
new SSTables will be compressed with dict-aware Zstd or LZ4
respectively.
2025-04-01 00:07:30 +02:00
Michał Chojnowski
bea866a46f main: clean up sstable compression dicts after table drops
When a table is dropped, its corresponding dictionary in `system.dicts`
-- if any -- should be deleted, otherwise it will remain forever as
garbage.

This commit implements such cleanup.
2025-04-01 00:07:30 +02:00
Michał Chojnowski
cee504f66f sstables/compress: discard hidden compression options after the decompressor is created
Dictionary contents are kept in the list of "compression options" in the
header of `CompressionInfo.db`, and they are loaded from disk into
memory when the `sstable::compression` object is populated.

After the decompressor for the SSTable is created based on those
dict contents, they are not needed in RAM anymore. And since
they take up a sizeable amount of memory, we would like to free them.

In this patch, we discard all "hidden compression options"
(currently: only the dictionary contents) from the
`sstable::compression` object right after the decompressor is created.
(Those options are not supposed to be used for anything else anyway).
2025-04-01 00:07:30 +02:00
Michał Chojnowski
10fa4abde7 compress: change compressor_ptr from shared_ptr to unique_ptr
Cleanup patch. After we moved the ownership of compressors
to sstables, compressor objects never have shared lifetime.
`unique_ptr` is more appropriate for them than `shared_ptr` now.
(And besides expressing the intent better, using `unique_ptr`
prevents an accidental cross-shard `shared_ptr` copy).
2025-04-01 00:07:29 +02:00
Michał Chojnowski
58ae278d10 api: add the retrain_dict API call
Add an API call which will retrain the SSTable compression dictionary
for a given table.

Currently, it needs all nodes to be alive to succeed. We can relax this later.
2025-04-01 00:07:29 +02:00
Michał Chojnowski
4115a6fece storage_service: add some dict-related routines
storage_service will be the interface between the API layer
(or the automatic training loop) and the dict machinery.
This commit implements the relevant interface for that.

It adds methods that:
1. Take SSTable samples from the cluster, using the new RPC verbs.
2. Train a dict on the sample. (The trainer will be plugged in from `main`).
3. Publishes the trained dictionary. (By adding mutations to Raft group 0).

Perhaps this should be moved to a separate "service".
But it's not like `storage_service` has a clear purpose anyway.
2025-04-01 00:07:29 +02:00
Michał Chojnowski
94d244ab49 main: in compression_dict_updated_callback, recognize and use SSTable compression dicts
Currently, there is at most one dictionary in `system.dicts`:
named "general", used by RPC compression. So the callback called
on `system.dicts` just always refreshes the RPC compression dict.

In a follow-up commit, we will publish SSTable compression dicts to
`system.dicts` rows with a name in the "sstables/{table_uuid}" format.
We want modification to such rows to be passed as new dictionary
recommendations to the SSTable compressor factory. This commit teaches
the `system.dicts` modification callback to recognize such modifications
and forward them to the compressor factory.
2025-04-01 00:07:29 +02:00
Michał Chojnowski
380f409c46 storage_service: add do_sample_sstables()
Adds a helper which uses ESTIMATE_SSTABLE_VOLUME and SAMPLE_SSTABLES
RPC calls to gather a combined sample of SSTable Data files for the given table
from the entire cluster.
2025-04-01 00:07:29 +02:00
Michał Chojnowski
94c33b6760 messaging_service: add SAMPLE_SSTABLES and ESTIMATE_SSTABLE_VOLUME verbs
Add two verbs needed to implement dictionary training for SSTable
compression.

SAMPLE_SSTABLES returns a list of randomly-selected chunks of Data files
with a given cardinality and using a given chunk size,
for the given table.

ESTIMATE_SSTABLE_VOLUME returns the total uncompressed size of all Data
files the given table.
2025-04-01 00:07:29 +02:00
Michał Chojnowski
4856f4acca db/system_keyspace: let system.dicts helpers be used for dicts other than the RPC compression dict
Extend the `system.dicts` helper for querying and modifying
`system.dicts` with an ability to use names other than "general".
We will use that in later commits to publish dictionaries for SSTable compression.
2025-04-01 00:07:29 +02:00
Michał Chojnowski
b77c611c00 raft/group0_state_machine: on system.dicts mutations, pass the affected partitition keys to the callback
Before this patch, `system.dicts` contains only one dictionary, for RPC
compression, with the fixed name "general".

In later parts of this series, we will add more dictionaries to
system.dicts, one per table, for SSTable compression.

To enable that, this patch adjusts the callback mechanism for group0's `write_mutations`
command, so that the mutation callbacks for group0-managed tables can see which
partition keys were affected. This way, the callbacks can query only the
modified partitions instead of doing a full scan. (This is necessary to
prevent quadratic behaviours.)

For now, only the `system.dicts` callback uses the partition keys.
2025-04-01 00:07:29 +02:00
Michał Chojnowski
d920ab5366 database: add sample_data_files()
Add a helper for sampling the Data files for a given table.
We will use it to take samples for dictionary training.
2025-04-01 00:07:29 +02:00
Michał Chojnowski
48c06c7e4b database: add take_sstable_set_snapshot()
We want a method that will allow us to take a stable snapshot of
SSTables, to asynchronously compute some stats on them.
But `take_storage_snapshot` is overly invasive for that, because
it flushes memtables on each call.
(If `take_storage_snapshot` was, for example, called repetitively,
it could create a ton of small memtables and lead to trouble).

This commit adds a weaker version which only takes a snapshot of
*existing SSTables*, and doesn't flush memtables by itself.

This will be useful for dictionary training, which doesn't
care about the semantics of SSTables, only their rough statistical
properties.
2025-04-01 00:07:28 +02:00
Michał Chojnowski
64f3d7e364 compress: teach lz4_processor about dictionaries
Extend `lz4_processor` with the ability to use dictionaries.
We won't use this ability yet. It will be used when new
compressor names are added.
2025-04-01 00:07:28 +02:00
Michał Chojnowski
b65101b371 compress: teach zstd_processor about dictionaries
Extend `zstd_processor` with the ability to use dictionaries.
We won't use this ability yet. It will be used when new
compressor names are added.
2025-04-01 00:07:28 +02:00
Michał Chojnowski
b18ddcb92e sstables: delegate compressor creation to the compressor factory
Remove `compressor::create()`. This enforces that compressors
are only created through the `sstable_compressor_factory`.

Unlike the synchronous `compressor::create()`, the factory will be able
to create dict-aware compressors.
2025-04-01 00:07:28 +02:00
Michał Chojnowski
30a9d471fa sstables: plug an sstable_compressor_factory into sstables_manager
Create a `sstable_compressor_factory_impl` in `scylla_main`,
and pipe it through constructors into `sstables_manager`.

In next commits, the factory available through the `sstables_manager`
will be used to create compressors for SSTable readers and writers.
2025-04-01 00:07:28 +02:00
Michał Chojnowski
ebf02913a2 sstables: introduce sstable_compressor_factory
Before this commit, `compressor` objects are synchronously
created, during the creation or opening of SSTables,
from `compression_parameters` objects.

But we want to add compression dictionaries to SSTables and we want
to share dictionary contents across shards.
To do that, we need to make the creation of `compressor` objects asynchronous,
and give it access to a global dictionary registry.

We encapsulate that in a `sstable_compression_factory`. Instead of
calling `compressor::create()` on SSTable opening or creation, we will
ask the factory, asynchronously, for a new compressor, and it will return
a compressor with a deduplicated, up-to-date dictionary.

This commit introduces such a factory. It's not used anywhere yet,
and the compressors it produces don't use the provided dictionaries yet.
2025-04-01 00:07:28 +02:00
Michał Chojnowski
2bd393849c utils/hashers: add get_sha256()
Add a helper function which computes the SHA256 for a blob.
We will use it to compute identifiers for SSTable compression
dictionaries later.
2025-04-01 00:07:28 +02:00
Michał Chojnowski
61316e29df gms/feature_service: add the SSTABLE_COMPRESSION_DICTS cluster feature
This feature will guard against writing SSTables containing compression
dictionaries before the entire cluster is able to understand them.
2025-04-01 00:07:28 +02:00
Michał Chojnowski
dd932ebb2f compress: add hidden dictionary options
Before this commit, "compression options" written into
CompressionInfo.db (and used to construct a decompressor)
have a 1:1 correspondence to "compression options" specified
in the schema.

But we want to add a new "compression option" -- the compression
dictionary -- which will be written into CompressionInfo.db
and used to construct decompressors, but won't be specified in the
schema.

To reconcile that, in this commit we introduce the notion of a "hidden
option". If an option name in `CompressionInfo.db` begins with a dot,
then this option will be used to construct decompressors, but won't
be visible for other uses. (I.e. for the `sstable_info` API call
and for recovering a fake `schema` from `CompressionInfo.db` in the
`scylla sstable` tool).

Then, we introduce the hidden `.dictionary.{0,1,2,..}` options,
which hold the contents of the dictionary blob for this SSTable.

(The dictionary is split into several parts because the SSTable
format limits the length of a single option value to 16 bits,
and dictionaries usually have a length greater than that).

This commit only introduces helpers which translate dictionary blobs
into "options" for CompressionInfo.db, and vice-versa, but it doesn't
use those helpers yet. They will be used in later commits.
2025-04-01 00:07:28 +02:00
Michał Chojnowski
11be7c0704 compress: remove compression_parameters::get_compressor()
Following up on the previous commits, we avoid constructing
compressors where not necessary,
by checking things directly on `compression_parameters` instead.
2025-04-01 00:07:28 +02:00
Michał Chojnowski
006c631642 sstables/compress: remove get_sstable_compressor()
Following up on the previous commit, we avoid constructing
a compressor in the `sstable_info` API call, and we instead
read the compression options from the `sstable::compression`.
2025-04-01 00:07:28 +02:00
Michał Chojnowski
8e611536b0 sstables/compress: move ownership of compressor to sstable::compression
SSTable readers and writers use `compressor` objects to compress and
decompress chunks of SSTable data files.

`compressor` objects are read-only, so only one of them is needed
for each SSTable. Before this commit, each reader and writer has
its own `compressor` object. This isn't necessary, but it's okay.

But later in this series it will stop being okay, because the creation
of a `compressor` will become an expensive cross-shard
operation (because it might require sharing a compression dictionary
from another shard). So we have to adjust the code so that there is
only once `compressor` per sstable, not one per reader/writer.

We stuff the ownership of this compressor into `sstable::compression`.

To make the ownership clear, we remove `compression_ptr` shared
pointers from readers and writers, and make them access the
compressor via the `sstable::compression` instead.
2025-04-01 00:07:27 +02:00
Michał Chojnowski
7bdcd5e8c1 compress: remove compressor::option_names()
It used to be used by `compression_parameters` validation logic
to ask the created `compressor` for compressor-specific option names.

Since we no longer delegate this to `compressor`, but we just
put the knowledge of those options directly into
`compressor_parameters`, it's dead code now.
2025-04-01 00:07:27 +02:00
Michał Chojnowski
3b0ab8e1ee compress: clean up the constructor of zstd_processor
Since we now parse and validate the compression level during the
construction of `compression_parameters`, we can just pass the
structured params to `zstd_processor` instead of passing
a raw string map.
2025-04-01 00:07:27 +02:00
Michał Chojnowski
6470035a74 compress: squash zstd.cc into compress.cc
Unlike all other implementations of `compressor`, `zstd_processor`
has its own special object file and its own special
late binding mechanism (via the `class_registry`).
It doesn't need either.

Let's squash it into `compress.cc`. Keeping `zstd_processor` a separate "module"
would require adding even more headers and source files later in the
series (when adding dictionaries), and there's no benefit in being
so granular. All `compressor` logic can be in `compress.cc` and it will
still be small enough.

This commit also gets rid of the pointless `class_registry` late binding
mechanism and just constructs the `zstd_processor` in
`compressor::create()` with a regular constructor call.
2025-04-01 00:07:27 +02:00
Michał Chojnowski
cfe69e057f sstables/compress: break the dependency of compression_parameters on compressor
Note: this commit is meant to be a code refactoring only and is not intended
to change the observable behaviour.

Today `schema` contains a `compression_parameters`.
`compression_parameters` contains an instance of
`compressor`, and SSTable writers just share that instance.

This is fine because `compressor` is a stateless object,
functionally dependent on the schema.

But in later parts of the series, we will break this functional
dependency by adding dictionaries to compressors. Two writers
for the same schema might have different dictionaries, so they won't
be able to just share a single instance contained in the schema.

And when that happens, having a `compressor` instance
in the `schema`/`compression_parameters` will become awkward,
since it won't be actually used. It will be only a container for options.

In addition, for performance reasons, we will want to share some pieces
of compressors across shards, which will require -- in the general case --
a construction of a compressor to be asynchronous, and therefore not
possible inside the constructor of `compression_parameters`.

This commit modifies `compression_parameters` so that it doesn't hold or
construct instances of `compressor`.

Before this patch, the `compressor` instance constructed in
`compression_parameters` has an additional role of validating and
holding compressor-specific options.
(Today the only such option is the zstd compression level).

This means that the pieces of logic responsible for compressor-specific
options have to be rewritten. That ends up being the bulk of this commit.
2025-04-01 00:07:27 +02:00
Michał Chojnowski
f4ca94d13b compress.hh: switch compressor::name() from an instance member to a virtual call
Before this patch, `compressor` is designed to be a proper abstract
class, where the creator of a compressor doesn't even know
what he's creating -- he passes a name, and it gets turned into a
`compressor` behind a scenes.

But later, when creation of compressors will involve looking up
dictionaries, this abstraction will only get in the way.
So we give up on keeping `compressor` abstract, and instead of
using "opaque" names we turn to an explicit enum of possible compressor types.

The main point of this patch is to add the `algorithm` enum and the `algorithm_to_name()`
function. The rest of the patch switches the `compressor::name()` function
to use `algorithm_to_name()` instead of the passed-by-constructor
`compressor::_name`, to keep a single source of truth for the names.
2025-04-01 00:07:27 +02:00
Michał Chojnowski
4f634de2e9 bytes: adapt fmt_hex to std::span<const std::byte>
This allows us to hexdump things other than `bytes_view`.
(That is, without reinterpret_casting them to `bytes_view`,
which -- aside from the inconvenience -- isn't quite legal.
In contrast, any span can be legally casted to `std::span<const std::byte>`).
2025-04-01 00:07:27 +02:00
Tomasz Grabiec
29d1c2adc6 Merge 'Finalize tablet splits earlier' from Lakshmi Narayanan Sreethar
Resize finalization is executed in a separate topology transition state,
`tablet_resize_finalization`, to ensure it does not overlap with tablet
transitions. The topology transitions into the
`tablet_resize_finalization` state only when no tablet migrations are
scheduled or being executed. If there is a large load-balancing backlog,
split finalization might be delayed indefinitely, leaving the tables
with large tablets.

This PR fixes the issue by updating the load balancer to no schedule any
migrations and to not make any repair plans when there a resize
finalization is pending in any table.

Also added a testcase to verify the fix.

Fixes #21762

Improvement : No need to backport.

Closes scylladb/scylladb#22148

* github.com:scylladb/scylladb:
  topology_coordinator: fix indentation in generate_migration_updates
  topology_coordinator: do not schedule migrations when there are pending resize finalizations
  load_balancer: make repair plans only when there is no pending resize finalization
2025-03-31 14:42:34 +02:00
Botond Dénes
90c20858ed Merge 'test/database: Remove most of take_snapshot() helper overloads and re-use them more' from Pavel Emelyanov
This helper facilitate snapshot creation by various test cases in database_test.cc. This PR generalizes all overloads into one that suits all callers and patches one more test case to use it as well.

Closes scylladb/scylladb#23482

* github.com:scylladb/scylladb:
  test/database: Re-use take_snapshot() helper once more
  test/database: Remove most of take_snapshot() helper overloads
2025-03-31 15:20:51 +03:00
Benny Halevy
5f2ce0b022 loading_cache_test: test_loading_cache_reload_during_eviction: use manual_clock
Rather than lowres_clock, as since
32b7cab917,
loading_cache_for_test uses manual_clock for timing
and relying on lowres_clock to time the test might
run out of memory on fast test machines.

Fixes #23497

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#23498
2025-03-31 14:53:06 +03:00
Pavel Emelyanov
ac582efb44 test/database: Re-use take_snapshot() helper once more
There's a test case that can call the recently patched take_snapshot()
helper as well. This changes nothing, but makes further patching a bit
simpler (not in this branch).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-31 13:18:06 +03:00
Pavel Emelyanov
7e6380b6bd test/database: Remove most of take_snapshot() helper overloads
There are 3 of those that help tests (re)shuffle cql_test_env/database,
skip_flush == true/false options and keyspace/table/snapshot names.
There's little sense in having that many of those, just one overload
with default arguments suits most of the callers.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-31 13:18:06 +03:00
Botond Dénes
ea55eed037 Merge 'Snapshot several tables at once in scrub API handler' from Pavel Emelyanov
The scrub API handler may want to snapshot several tables. For that, it calls snapshot-ctl method to snapshot a single table for each table in the list. That's excessive, snapshot-ctl has a method to snapshot a bunch of tables at once, just what the scrub handler needs.

It's an improvement, so no need to backport

Closes scylladb/scylladb#23472

* github.com:scylladb/scylladb:
  snapshot-ctl: Remove unused snapshot-single-table method
  api: Snapshot all tables at once in scrub handler
2025-03-31 13:00:32 +03:00
Piotr Smaron
aff8cbc6f3 CODEOWNERS: remove expired owners
Removing krzaq, who's no longer with the company.
Removing core-frontend team members from Alternator areas, as it's no
longer the domain of this team.

Closes scylladb/scylladb#23500
2025-03-31 11:37:51 +03:00
Pavel Emelyanov
0077acd1bb api: Properly validate table in tablet add|del replica handlers
The handlers in question just go and call database.find_column_family,
in case the table in question doesn't exist, the no_such_column_family
exception would be thrown, which is not nice. Proper behavior is to
throw bad_param one and there's a helper that does it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#23389
2025-03-31 10:03:17 +02:00
Andrzej Jackowski
c89d8c6566 cql3: prevent from empty option use in cf_statement::column_family()
Implementation of cf_statement::column_family() dereferences _cf_name
option without checking if the option is non-empty. On enterprise
branch, there is a safeguard that prevents from such an empty option
dereferencing. Although the current code on master seems to not call
columny_family() when _cf_name is empty, it is safer to introduce the
same workaround on master, to avoid any regression.

This change:
 - Prevent from empty option use in cf_statement::column_family()

Fixes: scylla-enterprise#5273

Closes scylladb/scylladb#23366
2025-03-31 09:43:22 +03:00
Michał Chojnowski
e23fdc0799 table: fix a race in table::take_storage_snapshot()
`safe_foreach_sstable` doesn't do its job correctly.

It iterates over an sstable set under the sstable deletion
lock in an attempt to ensure that SSTables aren't deleted during the iteration.

The thing is, it takes the deletion lock after the SSTable set is
already obtained, so SSTables might get unlinked *before* we take the lock.

Remove this function and fix its usages to obtain the set and iterate
over it under the lock.

Closes scylladb/scylladb#23397
2025-03-31 09:40:32 +03:00
Avi Kivity
2b9e1e61d0 docs: reader_concurrency_semaphore: document CPU concurrency limit
Document the CPU concurrency implemented in 3d816b7c16
and adjusted in 3d12451d1f.

Closes scylladb/scylladb#23404
2025-03-31 09:39:55 +03:00
Dawid Mędrek
b0b0c5905e test/cluster/test_multidc: Clean up RF-rack-valid keyspaces tests
There are some minor things we should fix that are a remnant
of the original changes (scylladb/scylladb@7646e14).

Closes scylladb/scylladb#23429
2025-03-31 09:38:42 +03:00
David Garcia
1a7be07b8c docs: renders os-support from json file
docs: renders os-support from json file

Closes scylladb/scylladb#23436
2025-03-31 09:36:49 +03:00