pylib_test contains one pure Python test. This test does not test Scylla.
The test is kept because it can be useful to run during pre-commit, for example,
but it definitely should not be run in CI in every mode with 3 repeats each;
that makes no sense, since it is a unit test for the test.py framework itself.
Note: the test can still be run easily with pytest via the command:
./tools/toolchain/dbuild pytest test/pylib_test
Closes scylladb/scylladb#23181
Move `object_storage.yaml` endpoints to `scylla.yaml`
This change also removes the `object_storage.yaml` file
altogether and adds tests for fetching the endpoints
via the `v2/config/object_storage_endpoints` REST API.
Also, the `object_storage_config_file` option is moved to a deprecated state as it's no longer needed.
This PR depends on #22951; reviewers should review patch 393e1ac0ec066475ca94094265a5f88dbbdb1a1f.
Refs https://github.com/scylladb/scylladb/issues/22428
Closes scylladb/scylladb#22952
* github.com:scylladb/scylladb:
Remove db::config::object_storage_config
Move `object_storage.yaml` endpoints to `scylla.yaml`
This PR extends Scylla's SSTable compression with the ability to use compression dictionaries shared across compression chunks. This involves several changes:
- We refactor `compression_parameters` and friends (`compressor`, `sstables::local_compression`, `sstables::compression`) to prepare for making the construction of `compressor`s asynchronous, to enable sharing pieces of compressors (the dictionaries) across shards.
- We introduce the notion of "hidden compression options" which are written to `CompressionInfo.db` and used to construct decompressors, like regular options, but don't appear in the schema. (We later stuff the SSTable's dictionary into `CompressionInfo.db` using a sequence of such options).
- We add a cluster feature which guards the creation of dictionary-compressed SSTables.
- We introduce a central "compressor factory" (one instance shared by all shards), which from this point onward is used to construct all `compressor` objects (one per SSTable) used to process the SSTables. When constructing a compressor for writing, it uses the "current"/"recommended" dictionary (which is passed to the factory from the actively-observed contents of the group0-managed `system.dicts`). When constructing a compressor for reading, it uses the dictionary written in the hidden compression options in CompressionInfo.db. And it keeps dictionaries deduplicated, so that each unique live dictionary blob has only one instance in memory, shared across shards.
- We teach the relevant `lz4` and `zstd` compressor wrappers about the dictionaries.
- We add a HTTP API call which samples pieces of the given table (i.e. the Data.db files) from across the cluster, trains a dictionary on it, and publishes it via `system.dicts` as the new current dictionary for that table. (And we add some RPC verbs to support that).
- We add a HTTP API call which estimates the impact of various available compression configurations on the compression ratio.
- We add an autotrainer fiber which periodically retrains dicts for dict-aware tables and publishes them if they seem to be a significant improvement.
Known imperfections:
- The factory currently keeps one dictionary instance on the entire node, but we probably want one copy per NUMA node. I didn't do that because exposing NUMA knowledge to Scylla seems to require some changes in Seastar first.
New feature, no backporting involved.
Closes scylladb/scylladb#23025
* github.com:scylladb/scylladb:
docs: add user-facing documentation for SSTable compression with shared dicts
docs/dev: add sstable-compression-dicts.md
test: add test_sstable_compression_dictionaries_autotrain.py
test: add test_sstable_compression_dictionaries_basic.py
test/pylib/rest_client: add `keyspace_upgrade_sstables` helper
main: run a sstable_dict_autotrainer
api: add the estimate_compression_ratios API call
dict_autotrainer: introduce sstable_dict_autotrainer
db/system_keyspace: add query_dict_timestamp
compress: add ZstdWithDictsCompressor and LZ4WithDictsCompressor
main: clean up sstable compression dicts after table drops
sstables/compress: discard hidden compression options after the decompressor is created
compress: change compressor_ptr from shared_ptr to unique_ptr
api: add the retrain_dict API call
storage_service: add some dict-related routines
main: in compression_dict_updated_callback, recognize and use SSTable compression dicts
storage_service: add do_sample_sstables()
messaging_service: add SAMPLE_SSTABLES and ESTIMATE_SSTABLE_VOLUME verbs
db/system_keyspace: let `system.dicts` helpers be used for dicts other than the RPC compression dict
raft/group0_state_machine: on `system.dicts` mutations, pass the affected partition keys to the callback
database: add sample_data_files()
database: add take_sstable_set_snapshot()
compress: teach `lz4_processor` about dictionaries
compress: teach `zstd_processor` about dictionaries
sstables: delegate compressor creation to the compressor factory
sstables: plug an `sstable_compressor_factory` into `sstables_manager`
sstables: introduce sstable_compressor_factory
utils/hashers: add get_sha256()
gms/feature_service: add the SSTABLE_COMPRESSION_DICTS cluster feature
compress: add hidden dictionary options
compress: remove `compression_parameters::get_compressor()`
sstables/compress: remove get_sstable_compressor()
sstables/compress: move ownership of `compressor` to `sstable::compression`
compress: remove compressor::option_names()
compress: clean up the constructor of zstd_processor
compress: squash zstd.cc into compress.cc
sstables/compress: break the dependency of `compression_parameters` on `compressor`
compress.hh: switch compressor::name() from an instance member to a virtual call
bytes: adapt fmt_hex to std::span<const std::byte>
This patch restores a timeout that seems to have been accidentally removed in
7081215552 (r2005352424).
Without it, `raft_server_with_timeouts::run_with_timeout` will get
`std::nullopt` as a value of the `timeout` parameter and perform an
operation without any timeout, whereas previously it would have waited
for the default timeout specified in
`raft_server_for_group::default_op_timeout`.
Closes scylladb/scylladb#23380
A default timestamp (not to be confused with the timestamp passed via the 'USING TIMESTAMP' query clause) can be set using the 0x20 flag and the <timestamp> field in the binary CQL frame payload of QUERY, EXECUTE and BATCH ops. Setting it also happens to be the default behavior of the Java CQL driver.
However, we were only setting the corresponding info in the CQL Tracing context of a QUERY operation. For an unknown reason we were not setting it for EXECUTE and BATCH traces (I guess I simply forgot to set it back then).
This patch fixes this.
Fixes #23173
The issue fixed by this PR is not critical, but the fix is simple and safe enough that we should backport it to all live releases.
Closes scylladb/scylladb#23174
* github.com:scylladb/scylladb:
CQL Tracing: set common query parameters in a single function
transport/server.cc: set default timestamp info in EXECUTE and BATCH tracing
This method is only used by the loader code (and tests). Also, there's the
highest_version_seen() peer that sits in the loader code as well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#23324
In its operations the fs storage carefully generates the full filename from
all sstable parameters -- version, format, generation, keyspace and
table names, and component type or name. However, in all of these cases the
format, version and keyspace:table names are inherited from the sstable
being operated on. This calls for a filename generation helper that
wraps most of the arguments, thus making the lines shorter.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#23384
So that a multi-dc/multi-rack cluster can be populated
in a single call.
* Enhancement, no backport required
Closes scylladb/scylladb#23341
* github.com:scylladb/scylladb:
test/pylib: servers_add: add auto_rack_dc parameter
test/pylib: servers_add: support list of property_files
Add an API call which estimates the effectiveness of possible
compression config changes.
This can be used to make an informed decision about whether to
change the compression method, without actually recompressing
any SSTables.
Add a fiber responsible for periodic re-training of compression dictionaries
(for tables which opted into dict-aware compression).
As of this patch, it works like this:
every `$tick_period` (15 minutes), if we are the current Raft leader,
we check for dict-aware tables which have no dict, or a dict older
than `$retrain_period`.
For those tables, if they have enough data (>1 GiB) for training,
we train a new dict and check whether it's significantly better
than the current one (i.e. it yields a compression ratio smaller than 95% of the current ratio),
and if so, we update the dict.
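A minimal sketch of that decision logic (all names and the `retrain_period` value here are illustrative; only the >1 GiB and 95% thresholds come from the description above):

```cpp
#include <chrono>
#include <cstdint>

// Placeholder value; the real $retrain_period is a separate knob.
constexpr auto retrain_period = std::chrono::hours(24 * 7);
// More than 1 GiB of data is required before a training run makes sense.
constexpr uint64_t min_training_volume = uint64_t(1) << 30;

// Should we attempt a training run for this table on this tick?
bool should_train(bool has_dict, std::chrono::seconds dict_age, uint64_t table_size) {
    if (table_size <= min_training_volume) {
        return false;
    }
    return !has_dict || dict_age > retrain_period;
}

// Should the freshly trained dict replace the current one?
// Only if it compresses to less than 95% of the current ratio.
bool should_publish(double current_ratio, double candidate_ratio) {
    return candidate_ratio < 0.95 * current_ratio;
}
```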
Adds a helper method which queries the creation timestamp
of a given dict in `system.dicts`.
We will later use the age of the current SSTable compression dict
to decide whether another training run is due.
Add new compressor names to `sstable_compression`.
When those names are configured in the schema,
new SSTables will be compressed with dict-aware Zstd or LZ4
respectively.
When a table is dropped, its corresponding dictionary in `system.dicts`
-- if any -- should be deleted, otherwise it will remain forever as
garbage.
This commit implements such cleanup.
Dictionary contents are kept in the list of "compression options" in the
header of `CompressionInfo.db`, and they are loaded from disk into
memory when the `sstable::compression` object is populated.
After the decompressor for the SSTable is created based on those
dict contents, they are not needed in RAM anymore. And since
they take up a sizeable amount of memory, we would like to free them.
In this patch, we discard all "hidden compression options"
(currently: only the dictionary contents) from the
`sstable::compression` object right after the decompressor is created.
(Those options are not supposed to be used for anything else anyway).
Cleanup patch. After we moved the ownership of compressors
to sstables, compressor objects never have shared lifetime.
`unique_ptr` is more appropriate for them than `shared_ptr` now.
(And besides expressing the intent better, using `unique_ptr`
prevents an accidental cross-shard `shared_ptr` copy).
Add an API call which will retrain the SSTable compression dictionary
for a given table.
Currently, it needs all nodes to be alive to succeed. We can relax this later.
storage_service will be the interface between the API layer
(or the automatic training loop) and the dict machinery.
This commit implements the relevant interface for that.
It adds methods that:
1. Take SSTable samples from the cluster, using the new RPC verbs.
2. Train a dict on the sample. (The trainer will be plugged in from `main`).
3. Publish the trained dictionary. (By adding mutations to Raft group 0).
Perhaps this should be moved to a separate "service".
But it's not like `storage_service` has a clear purpose anyway.
Currently, there is at most one dictionary in `system.dicts`:
named "general", used by RPC compression. So the callback called
on `system.dicts` just always refreshes the RPC compression dict.
In a follow-up commit, we will publish SSTable compression dicts to
`system.dicts` rows with a name in the "sstables/{table_uuid}" format.
We want modifications to such rows to be passed as new dictionary
recommendations to the SSTable compressor factory. This commit teaches
the `system.dicts` modification callback to recognize such modifications
and forward them to the compressor factory.
Adds a helper which uses ESTIMATE_SSTABLE_VOLUME and SAMPLE_SSTABLES
RPC calls to gather a combined sample of SSTable Data files for the given table
from the entire cluster.
Add two verbs needed to implement dictionary training for SSTable
compression.
SAMPLE_SSTABLES returns a list of randomly selected chunks of the given
table's Data files, with a given cardinality and a given chunk size.
ESTIMATE_SSTABLE_VOLUME returns the total uncompressed size of all Data
files of the given table.
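Conceptually, the sampling side of SAMPLE_SSTABLES picks chunk positions like the sketch below (a hypothetical helper, not the real RPC handler):

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Pick `n_chunks` random chunk offsets from a Data file of `file_size` bytes,
// each chunk `chunk_size` bytes long. Chunks may overlap; that is acceptable
// for dictionary-training purposes.
std::vector<uint64_t> pick_sample_offsets(uint64_t file_size, uint64_t chunk_size, size_t n_chunks) {
    std::vector<uint64_t> offsets;
    if (file_size < chunk_size) {
        return offsets;
    }
    std::mt19937_64 rng{std::random_device{}()};
    std::uniform_int_distribution<uint64_t> dist(0, file_size - chunk_size);
    offsets.reserve(n_chunks);
    for (size_t i = 0; i < n_chunks; ++i) {
        offsets.push_back(dist(rng));
    }
    return offsets;
}
```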
Extend the `system.dicts` helper for querying and modifying
`system.dicts` with an ability to use names other than "general".
We will use that in later commits to publish dictionaries for SSTable compression.
Before this patch, `system.dicts` contains only one dictionary, for RPC
compression, with the fixed name "general".
In later parts of this series, we will add more dictionaries to
system.dicts, one per table, for SSTable compression.
To enable that, this patch adjusts the callback mechanism for group0's `write_mutations`
command, so that the mutation callbacks for group0-managed tables can see which
partition keys were affected. This way, the callbacks can query only the
modified partitions instead of doing a full scan. (This is necessary to
prevent quadratic behaviours.)
For now, only the `system.dicts` callback uses the partition keys.
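In effect, the callback shape changes roughly as follows (hypothetical type and alias names, standing in for the real group0/mutation types):

```cpp
#include <functional>
#include <string>
#include <vector>

struct partition_key { std::string raw; };  // placeholder for the real key type

// Before: the callback only learns that the table changed somehow,
// so it has to re-read the whole table.
using table_callback = std::function<void()>;

// After: the callback also receives the affected partition keys, so it can
// query only the modified partitions (avoiding quadratic behaviour as the
// number of dictionaries grows).
using keyed_table_callback =
        std::function<void(const std::vector<partition_key>& affected_keys)>;
```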
We want a method that will allow us to take a stable snapshot of
SSTables, to asynchronously compute some stats on them.
But `take_storage_snapshot` is overly invasive for that, because
it flushes memtables on each call.
(If `take_storage_snapshot` were, for example, called repeatedly,
it could create a ton of small memtables and lead to trouble).
This commit adds a weaker version which only takes a snapshot of
*existing SSTables*, and doesn't flush memtables by itself.
This will be useful for dictionary training, which doesn't
care about the semantics of SSTables, only their rough statistical
properties.
Remove `compressor::create()`. This enforces that compressors
are only created through the `sstable_compressor_factory`.
Unlike the synchronous `compressor::create()`, the factory will be able
to create dict-aware compressors.
Create an `sstable_compressor_factory_impl` in `scylla_main`,
and pipe it through constructors into `sstables_manager`.
In the next commits, the factory available through the `sstables_manager`
will be used to create compressors for SSTable readers and writers.
Before this commit, `compressor` objects are synchronously
created, during the creation or opening of SSTables,
from `compression_parameters` objects.
But we want to add compression dictionaries to SSTables and we want
to share dictionary contents across shards.
To do that, we need to make the creation of `compressor` objects asynchronous,
and give it access to a global dictionary registry.
We encapsulate that in an `sstable_compressor_factory`. Instead of
calling `compressor::create()` on SSTable opening or creation, we will
ask the factory, asynchronously, for a new compressor, and it will return
a compressor with a deduplicated, up-to-date dictionary.
This commit introduces such a factory. It's not used anywhere yet,
and the compressors it produces don't use the provided dictionaries yet.
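The shape of the factory is roughly the following (a simplified, synchronous, single-shard sketch with hypothetical member names; the real factory returns futures and shares dictionaries across shards):

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

using dict_blob = std::vector<char>;

struct compressor {
    std::shared_ptr<const dict_blob> dict;  // null for dict-less compressors
};

class compressor_factory_sketch {
    // Deduplication table: one live in-memory instance per unique blob.
    // (The real code would key this by a content digest, e.g. sha256.)
    std::map<dict_blob, std::weak_ptr<const dict_blob>> _live_dicts;
    // The "current"/"recommended" dictionary per table, fed from system.dicts.
    std::map<std::string, std::shared_ptr<const dict_blob>> _recommended;

    std::shared_ptr<const dict_blob> dedup(dict_blob blob) {
        auto& slot = _live_dicts[blob];
        if (auto existing = slot.lock()) {
            return existing;
        }
        auto fresh = std::make_shared<const dict_blob>(std::move(blob));
        slot = fresh;
        return fresh;
    }

public:
    // Called when system.dicts publishes a new dict for a table.
    void set_recommended_dict(std::string table, dict_blob blob) {
        _recommended[std::move(table)] = dedup(std::move(blob));
    }
    // Writer path: use the currently recommended dict for the table, if any.
    compressor make_compressor_for_writing(const std::string& table) const {
        auto it = _recommended.find(table);
        return compressor{it == _recommended.end() ? nullptr : it->second};
    }
    // Reader path: use the dict recorded in the SSTable's CompressionInfo.db.
    compressor make_compressor_for_reading(dict_blob dict_from_sstable) {
        return compressor{dedup(std::move(dict_from_sstable))};
    }
};
```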
Before this commit, "compression options" written into
CompressionInfo.db (and used to construct a decompressor)
have a 1:1 correspondence to "compression options" specified
in the schema.
But we want to add a new "compression option" -- the compression
dictionary -- which will be written into CompressionInfo.db
and used to construct decompressors, but won't be specified in the
schema.
To reconcile that, in this commit we introduce the notion of a "hidden
option". If an option name in `CompressionInfo.db` begins with a dot,
then this option will be used to construct decompressors, but won't
be visible for other uses. (I.e. for the `sstable_info` API call
and for recovering a fake `schema` from `CompressionInfo.db` in the
`scylla sstable` tool).
Then, we introduce the hidden `.dictionary.{0,1,2,..}` options,
which hold the contents of the dictionary blob for this SSTable.
(The dictionary is split into several parts because the SSTable
format encodes the length of a single option value in 16 bits (at most 64 KiB),
and dictionaries usually have a length greater than that).
This commit only introduces helpers which translate dictionary blobs
into "options" for CompressionInfo.db, and vice-versa, but it doesn't
use those helpers yet. They will be used in later commits.
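The translation helpers conceptually do something like this (illustrative signatures; the 64 KiB chunk limit follows from the 16-bit length mentioned above):

```cpp
#include <cstddef>
#include <map>
#include <string>

// A single option value can hold at most what a 16-bit length can describe.
constexpr size_t max_option_value_len = 65535;

// Split the dictionary blob into ".dictionary.0", ".dictionary.1", ... options.
std::map<std::string, std::string> dict_to_hidden_options(const std::string& dict) {
    std::map<std::string, std::string> opts;
    size_t n = 0;
    for (size_t i = 0; i < dict.size(); i += max_option_value_len) {
        opts[".dictionary." + std::to_string(n++)] = dict.substr(i, max_option_value_len);
    }
    return opts;
}

// Reassemble the blob from the hidden options, in chunk order.
std::string hidden_options_to_dict(const std::map<std::string, std::string>& opts) {
    std::string dict;
    for (size_t n = 0; ; ++n) {
        auto it = opts.find(".dictionary." + std::to_string(n));
        if (it == opts.end()) {
            break;
        }
        dict += it->second;
    }
    return dict;
}
```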
Following up on the previous commits, we avoid constructing
compressors where not necessary,
by checking things directly on `compression_parameters` instead.
Following up on the previous commit, we avoid constructing
a compressor in the `sstable_info` API call, and we instead
read the compression options from the `sstable::compression`.
SSTable readers and writers use `compressor` objects to compress and
decompress chunks of SSTable data files.
`compressor` objects are read-only, so only one of them is needed
for each SSTable. Before this commit, each reader and writer has
its own `compressor` object. This isn't necessary, but it's okay.
But later in this series it will stop being okay, because the creation
of a `compressor` will become an expensive cross-shard
operation (because it might require sharing a compression dictionary
from another shard). So we have to adjust the code so that there is
only one `compressor` per sstable, not one per reader/writer.
We stuff the ownership of this compressor into `sstable::compression`.
To make the ownership clear, we remove `compression_ptr` shared
pointers from readers and writers, and make them access the
compressor via the `sstable::compression` instead.
It used to be used by `compression_parameters` validation logic
to ask the created `compressor` for compressor-specific option names.
Since we no longer delegate this to `compressor`, and instead
put the knowledge of those options directly into
`compression_parameters`, it's dead code now.
Since we now parse and validate the compression level during the
construction of `compression_parameters`, we can just pass the
structured params to `zstd_processor` instead of passing
a raw string map.
Unlike all other implementations of `compressor`, `zstd_processor`
has its own special object file and its own special
late binding mechanism (via the `class_registry`).
It doesn't need either.
Let's squash it into `compress.cc`. Keeping `zstd_processor` a separate "module"
would require adding even more headers and source files later in the
series (when adding dictionaries), and there's no benefit in being
so granular. All `compressor` logic can be in `compress.cc` and it will
still be small enough.
This commit also gets rid of the pointless `class_registry` late binding
mechanism and just constructs the `zstd_processor` in
`compressor::create()` with a regular constructor call.
Note: this commit is meant to be a code refactoring only and is not intended
to change the observable behaviour.
Today `schema` contains a `compression_parameters`.
`compression_parameters` contains an instance of
`compressor`, and SSTable writers just share that instance.
This is fine because `compressor` is a stateless object,
functionally dependent on the schema.
But in later parts of the series, we will break this functional
dependency by adding dictionaries to compressors. Two writers
for the same schema might have different dictionaries, so they won't
be able to just share a single instance contained in the schema.
And when that happens, having a `compressor` instance
in the `schema`/`compression_parameters` will become awkward,
since it won't be actually used. It will be only a container for options.
In addition, for performance reasons, we will want to share some pieces
of compressors across shards, which will require -- in the general case --
a construction of a compressor to be asynchronous, and therefore not
possible inside the constructor of `compression_parameters`.
This commit modifies `compression_parameters` so that it doesn't hold or
construct instances of `compressor`.
Before this patch, the `compressor` instance constructed in
`compression_parameters` has an additional role of validating and
holding compressor-specific options.
(Today the only such option is the zstd compression level).
This means that the pieces of logic responsible for compressor-specific
options have to be rewritten. That ends up being the bulk of this commit.
Before this patch, `compressor` is designed to be a proper abstract
class, where the creator of a compressor doesn't even know
what he's creating -- he passes a name, and it gets turned into a
`compressor` behind the scenes.
But later, when creation of compressors will involve looking up
dictionaries, this abstraction will only get in the way.
So we give up on keeping `compressor` abstract, and instead of
using "opaque" names we turn to an explicit enum of possible compressor types.
The main point of this patch is to add the `algorithm` enum and the `algorithm_to_name()`
function. The rest of the patch switches the `compressor::name()` function
to use `algorithm_to_name()` instead of the passed-by-constructor
`compressor::_name`, to keep a single source of truth for the names.
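In outline, the change looks roughly like this (a sketch; the concrete enum members and name strings are illustrative):

```cpp
#include <string_view>

struct compressor {
    // Explicit, closed set of supported compressors instead of opaque names.
    enum class algorithm { lz4, snappy, deflate, zstd, none };

    // Single source of truth for the user-visible names.
    static std::string_view algorithm_to_name(algorithm a) {
        switch (a) {
        case algorithm::lz4:     return "LZ4Compressor";
        case algorithm::snappy:  return "SnappyCompressor";
        case algorithm::deflate: return "DeflateCompressor";
        case algorithm::zstd:    return "ZstdCompressor";
        case algorithm::none:    return "";
        }
        return "";
    }

    virtual algorithm get_algorithm() const = 0;
    virtual ~compressor() = default;

    // name() is now backed by a virtual call and the enum,
    // not by a string passed to the constructor.
    std::string_view name() const { return algorithm_to_name(get_algorithm()); }
};

struct lz4_processor : compressor {
    algorithm get_algorithm() const override { return algorithm::lz4; }
};
```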
This allows us to hexdump things other than `bytes_view`.
(That is, without reinterpret_casting them to `bytes_view`,
which -- aside from the inconvenience -- isn't quite legal.
In contrast, any span can be legally cast to `std::span<const std::byte>`).
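For illustration: any trivially-copyable contiguous range can be viewed as bytes via `std::as_bytes` without UB (the `fmt_hex` signature below is a stand-in for the real helper):

```cpp
#include <cstddef>
#include <cstdio>
#include <span>
#include <vector>

// Stand-in for the adapted fmt_hex: consumes any byte view.
void fmt_hex(std::span<const std::byte> data) {
    for (std::byte b : data) {
        std::printf("%02x", std::to_integer<unsigned>(b));
    }
    std::printf("\n");
}

int main() {
    std::vector<unsigned> v{0xdeadbeef, 0xcafebabe};
    // Legal for any contiguous range: no reinterpret_cast needed.
    fmt_hex(std::as_bytes(std::span(v)));
}
```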