Commit Graph

3761 Commits

Author SHA1 Message Date
Pavel Emelyanov
5bd3df507e sstables: Lazily access statistics for trace-level logging
There's a message in sstable::get_gc_before_for_fully_expire() method
that is trace-level and one of its argument finds a value in sstable
statisitics. Finding the value is not quite cheap (makes a lookup in
std::unordered_map) and for mostly-off trace messages is just a waste of
cycles.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#23910
2025-05-12 11:22:31 +03:00
Botond Dénes
4a802baccb Merge 'compress: make sstable compression dictionaries NUMA-aware ' from Michał Chojnowski
compress: distribute compression dictionaries over shards
We don't want each shard to have its own copy of each dictionary.
It would unnecessary pressure on cache and memory.
Instead, we want to share dictionaries between shards.

Before this commit, all dictionaries live on shard 0.
All other shards borrow foreign shared pointers from shard 0.

There's a problem with this setup: dictionary blobs receive many random
accesses. If shard 0 is on a remote NUMA node, this could pose
a performance problem.

Therefore, for each dictionary, we would like to have one copy per NUMA node,
not one copy per the entire machine. And each shard should use the copy
belonging to its own NUMA node. This is the main goal of this patch.

There is another issue with putting all dicts on shard 0: it eats
an assymetric amount of memory from shard 0.
This commit spreads the ownership of dicts over all shards within
the NUMA group, to make the situation more symmetric.
(Dict owner is decided based on the hash of dict contents).

It should be noted that the last part isn't necessarily a good thing,
though.
While it makes the situation more symmetric within each node,
it makes it less symmetric across the cluster, if different node
sizes are present.

If dicts occupy 1% of memory on each shard of a 100-shard node,
then the same dicts would occupy 100% of memory on a 1-shard node.

So for the sake of cluster-wide symmetry, we might later want to consider
e.g. making the memory limit for dictionaries inversely proportional
to the number of shards.

New functionality, added to a feature which isn't in any stable branch yet. No backporting.

Closes scylladb/scylladb#23590

* github.com:scylladb/scylladb:
  test: add test/boost/sstable_compressor_factory_test
  compress: add some test-only APIs
  compress: rename sstable_compressor_factory_impl to dictionary_holder
  compress: fix indentation
  compress: remove sstable_compressor_factory_impl::_owner_shard
  compress: distribute compression dictionaries over shards
  test: switch uses of make_sstable_compressor_factory() to a seastar::thread-dependent version
  test: remove sstables::test_env::do_with()
2025-05-08 09:52:46 +03:00
Botond Dénes
e5d944f986 Merge 'replica: Fix use-after-free with concurrent schema change and sstable set update' from Raphael Raph Carvalho
When schema is changed, sstable set is updated according to the compaction strategy of the new schema (no changes to set are actually made, just the underlying set type is updated), but the problem is that it happens without a lock, causing a use-after-free when running concurrently to another set update.

Example:

1) A: sstable set is being updated on compaction completion
2) B: schema change updates the set (it's non deferring, so it happens in one go) and frees the set used by A.
3) when A resumes, system will likely crash since the set is freed already.

ASAN screams about it:
SUMMARY: AddressSanitizer: heap-use-after-free sstables/sstable_set.cc ...

Fix is about deferring update of the set on schema change to compaction, which is triggered after new schema is set. Only strategy state and backlog tracker are updated immediately, which is fine since strategy doesn't depend on any particular implementation of sstable set.

Fixes #22040.

Closes scylladb/scylladb#23680

* github.com:scylladb/scylladb:
  replica: Fix use-after-free with concurrent schema change and sstable set update
  sstables: Implement sstable_set_impl::all_sstable_runs()
2025-05-08 06:56:16 +03:00
Pavel Emelyanov
0a9675de01 sstable: Use fmt::to_string(sstable::filename()) to get component file path
The stream sink abort() method wants to remove component file by its
path. For that the path is calculated from storage prefix and component
basename, but there's a filename() method for it already.

SStable filenames shouldn't be considered as on-disk paths (see #23194),
but places that want it should be explicit and format the filename to
string by hand.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#24039
2025-05-07 22:25:58 +03:00
Pavel Emelyanov
36baeaeb57 sstable: Move update_info_for_opened_data() method to private: block
The method is internally called by ssatble itself to refresh its state
after opening or assigning (from foreign info) data and index files.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#24041
2025-05-07 20:58:34 +03:00
Pavel Emelyanov
c2ecc45db8 sstable: Remove validate argument from sstable::load_metadata()
There are only two callers of the method and the one that wants
validation (the sstable::load()) can do it on its own. This helps the
other caller (schema loader) being simpler and shorter.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#24038
2025-05-07 20:57:37 +03:00
Michał Chojnowski
518f04f1c4 compress: add some test-only APIs
Will be needed by the test added in the next patch.
2025-05-07 14:43:20 +02:00
Michał Chojnowski
66a454f61d compress: rename sstable_compressor_factory_impl to dictionary_holder
Since sstable_compressor_factory_impl no longer
implements sstable_compressor_factory, the name can be
misleading. Rename it to something closer to its new role.
2025-05-07 14:43:20 +02:00
Michał Chojnowski
1bcf77951c compress: distribute compression dictionaries over shards
We don't want each shard to have its own copy of each dictionary.
It would unnecessary pressure on cache and memory.
Instead, we want to share dictionaries between shards.

Before this commit, all dictionaries live on shard 0.
All other shards borrow foreign shared pointers from shard 0.

There's a problem with this setup: dictionary blobs receive many random
accesses. If shard 0 is on a remote NUMA node, this could pose
a performance problem.

Therefore, for each dictionary, we would like to have one copy per NUMA node,
not one copy per the entire machine. And each shard should use the copy
belonging to its own NUMA node. This is the main goal of this patch.

There is another issue with putting all dicts on shard 0: it eats
an assymetric amount of memory from shard 0.
This commit spreads the ownership of dicts over all shards within
the NUMA group, to make the situation more symmetric.
(Dict owner is decided based on the hash of dict contents).

It should be noted that the last part isn't necessarily a good thing,
though.
While it makes the situation more symmetric within each node,
it makes it less symmetric across the cluster, if different node
sizes are present.

If dicts occupy 1% of memory on each shard of a 100-shard node,
then the same dicts would occupy 100% of memory on a 1-shard node.

So for the sake of cluster-wide symmetry, we might later want to consider
e.g. making the memory limit for dictionaries inversely proportional
to the number of shards.
2025-05-07 14:43:18 +02:00
Michał Chojnowski
8649adafa8 test: switch uses of make_sstable_compressor_factory() to a seastar::thread-dependent version
In next patches, make_sstable_compressor_factory() will have to
disappear.
In preparation for that, we switch to a seastar::thread-dependent
replacement.
2025-05-07 14:43:04 +02:00
Raphael S. Carvalho
628bec4dbd sstables: Implement sstable_set_impl::all_sstable_runs()
With upcoming change where table::set_compaction_strategy() might delay
update of sstable set, ICS might temporarily work with sstable set
implementations other than partitioned_sstable_set. ICS relies on
all_sstable_runs() during regular compaction, and today it triggers
bad_function_call exception if not overriden by set implementation.
To remove this strong dependency between compaction strategy and
a particular set implementation, let's provide a default implementation
of all_sstable_runs(), such that ICS will still work until the set
is updated eventually through a process that adds or remove a
sstable.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2025-05-06 10:06:06 -03:00
Pavel Emelyanov
b56d6fbb84 Merge 'sstables: Fix quadratic space complexity in partitioned_sstable_set' from Raphael Raph Carvalho
Interval map is very susceptible to quadratic space behavior when it's flooded with many entries overlapping all (or most of) intervals, since each such entry will have presence on all intervals it overlaps with.

A trigger we observed was memtable flush storm, which creates many small "L0" sstables that spans roughly the entire token range.

Since we cannot rely on insertion order, solution will be about storing sstables with such wide ranges in a vector (unleveled).

There should be no consequence for single-key reads, since upper layer applies an additional filtering based on token of key being queried.
And for range scans, there can be an increase in memory usage, but not significant because the sstables span an wide range and would have been selected in the combined reader if the range of scan overlaps with them.

Anyway, this is a protection against storm of memtable flushes and shouldn't be the common scenario.

It works both with tablets and vnodes, by adjusting the token range spanned by compaction group accordingly.

Fixes #23634.

We can backport this into 2024.2, 2025.1, but we should let this cook in master for 1 month or so.

Closes scylladb/scylladb#23806

* github.com:scylladb/scylladb:
  test: Verify partitioned set store split and unsplit correctly
  sstables: Fix quadratic space complexity in partitioned_sstable_set
  compaction: Wire table_state into make_sstable_set()
  compaction: Introduce token_range() to table_state
  dht: Add overlap_ratio() for token range
2025-05-05 11:28:38 +03:00
Pavel Emelyanov
d40d6801b0 sstable_directory: Print ks.cf when moving unshared remove sstables
When an sstable is identified by sstable_directory as remote-unshared,
it will at some point be moved to the target shard. When it happens a
log-message appears:

    sstable_directory - Moving 1 unshared SSTables to shard 1

Processing of tables by sstable_directory often happens in parallel, and
messages from sstable_directory are intermixed. Having a message like
above is not very informative, as it tells nothing about sstables that
are being moved.

Equip the message with ks:cf pair to make it more informative.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#23912
2025-05-05 09:45:44 +03:00
Pavel Emelyanov
e0f30a30a7 sstable_directory: Print unshared remote sstable when sorting
When collecting sstables, the sstable_directory may sort the collected
descriptors into one of three buckets -- unshared local and remote, and
shared ones. Unshared local and shared sstables' paths are loggerd (with
trace level) while unshared remote is silently collected for further
processing. Add log message for that case too, there's enough data to
print the sstable path as well.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#23913
2025-05-05 09:33:06 +03:00
Raphael S. Carvalho
d5bee4c814 test: Verify partitioned set store split and unsplit correctly
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2025-04-29 15:47:33 -03:00
Raphael S. Carvalho
c77f710a0c sstables: Fix quadratic space complexity in partitioned_sstable_set
Interval map is very susceptible to quadratic space behavior when
it's flooded with many entries overlapping all (or most of)
intervals, since each such entry will have presence on all
intervals it overlaps with.

A trigger we observed was memtable flush storm, which creates many
small "L0" sstables that spans roughly the entire token range.

Since we cannot rely on insertion order, solution will be about
storing sstables with such wide ranges in a vector (unleveled).

There should be no consequence for single-key reads, since upper
layer applies an additional filtering based on token of key being
queried.
And for range scans, there can be an increase in memory usage,
but not significant because the sstables span an wide range and
would have been selected in the combined reader if the range of
scan overlaps with them.

Anyway, this is a protection against storm of memtable flushes
and shouldn't be the common scenario.

It works both with tablets and vnodes, by adjusting the token
range spanned by compaction group accordingly.

Fixes #23634.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2025-04-29 15:47:33 -03:00
Raphael S. Carvalho
21d1e78457 compaction: Wire table_state into make_sstable_set()
This will be useful for feeding token range owned by compaction group
into sstable set.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2025-04-29 15:47:33 -03:00
Botond Dénes
6172ff501f readers: mv reversing_v2.hh reversing.hh
Completely mechanical change.
2025-04-16 04:46:08 -04:00
Botond Dénes
c29c696780 readers: mv from_mutations_v2.hh from_mutations.hh
Completely mechanical change.
2025-04-16 04:46:08 -04:00
Botond Dénes
b104862702 tree: s/make_mutation_reader_from_mutations_v2/make_mutation_reader_from_mutations/s
Completely mechanical change.
2025-04-16 04:46:07 -04:00
Botond Dénes
f1bd2553ed readers: mv forwardable_v2.hh forwardable.hh
Completely mechanical change.
2025-04-16 04:33:50 -04:00
Botond Dénes
a9d75c4f9d readers: mv empty_v2.hh empty.hh
Completely mechanical change.
2025-04-16 04:32:56 -04:00
Botond Dénes
05829f98f3 tree: s/make_empty_flat_reader_v2/make_empty_mutation_reader/
Completely mechanical change.
2025-04-16 04:32:56 -04:00
Pavel Emelyanov
b25cb5af0c Merge 'Use named gates' from Benny Halevy
Name the gates and phased barriers we use
to make it easy to debug gate_closed_exception

Refs https://github.com/scylladb/seastar/pull/2688

* Enhancement only, no backport needed

Closes scylladb/scylladb#23329

* github.com:scylladb/scylladb:
  utils: loading_cache: use named_gate
  utils: flush_queue: use named_gate
  sstables_manager: use named gate
  sstables_loader: use named gate
  utils: phased_barrier, pluggable: use named gate
  utils: s3::client::multipart_upload: use named gate
  utils: s3::client: use named_gate
  transport: controller: use named gate
  tracing: trace_keyspace_helper: use named gate
  task_manager: module: use named gate
  topology_coordinator: use named gate
  storage_service: use named gate
  storage_proxy: wait_for_hint_sync_point: use named gate
  storage_proxy: remote: use named gate
  service: session: use named gate
  service: raft: raft_rpc: use named gate
  service: raft: raft_group0: use named gate
  service: raft: persistent_discovery: use named gate
  service: raft: group0_state_machine: use named gate
  service: migration_manager: use named gate
  replica: table: use named gate
  replica: compaction_group, storage_group: use named gate
  redis: query_processor: use named gate
  repair: repair_meta: use named gate
  reader_concurrency_semaphore: use named gate
  raft: server_impl: use named gate
  querier_cache: use named gate
  gms: gossiper: use named gate
  generic_server: use named gate
  db: sstables_format_listener: use named gate
  db: snapshot: backup_task: use named gate
  db: snapshot_ctl: use named gate
  hints: hints_sender: use named gate
  hints: manager: use named gate
  hints: hint_endpoint_manager: use named gate
  commitlog: segment_manager: use named gate
  db: batchlog_manager: use named gate
  query_processor: remote: use named gate
  compaction: compaction_state: use named gate
  alternator/server: use named_gate
2025-04-14 20:56:32 +03:00
Pavel Emelyanov
1bd991a111 test: Inherit sstable_assertions from sstables::test
The latter class is invented to let tests access private fields of an
sstable (mostly methods). The former is in fact an extended version of
that also does some checks. Howerver, they don't inherit from each
other, and the sstable_assertions partially duplicates some funtionality
of the test one.

Add the inheritance, remove the duplicated methods from the child class,
update the callers (the test class returns future<>s, the assertions one
"knows" it runs in seastar thread) and marm sstable::read_toc() private.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#23697
2025-04-14 13:45:14 +03:00
Benny Halevy
d665bb4f8b sstables_manager: use named gate
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-12 11:47:00 +03:00
Benny Halevy
d8b0c661e4 sstables_manager: add subscriptions
Allow other submodules to subscribe for added/deleted
notifications.  This will be used in a later to
patch to prioritize unlinked sstables for backup.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-09 08:54:07 +03:00
Benny Halevy
e60fcc58b7 sstables: directory_semaphore: expose get_units
To be used by a following patch for
backup concurrency control.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-09 08:54:07 +03:00
Benny Halevy
63bc1d4626 db: snapshot: backup_task: do_backup: organize components by sstable generation
Do not rely on the snapshot directory listing order.
This will become useful for prioritizing unlinked
sstables in a following patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-09 08:54:06 +03:00
Avi Kivity
ac3d25eb44 sstable_set: incremental_reader_selector: be more careful when filtering out already engaged sstables
The incremental reader selector maintains an unordered_set of
sstables that are already engaged, and uses std::views::filter
to filter those out. It adds the sstable under consideration to the
set, and if addition failed (because it's already in) then it
filters it out.

This breaks if the filter view is executed twice - the first pass
will add every sstable to the set, and the second will consider
every sstable already filtered. This is what happens with
libstdc++ 15 (due to the addition of vector(from_range_t) constructor),
which uses the first pass to calculate the vector size
and the second pass to insert the elements into a correctly-sized
vector.

Fix by open-coding the loop.

Closes scylladb/scylladb#23597
2025-04-07 12:49:04 +03:00
Pavel Emelyanov
2ee9cec1d3 Merge 'Remove object_storage.yaml and move the endpoints to scylla.yaml' from Robert Bindar
Move `object_storage.yaml` endpoints to `scylla.yaml`

This change also removes the `object_storage.yaml` file
altogether and adds tests for fetching the endpoints
via the `v2/config/object_storage_endpoints` REST api.

Also, `object_storage_config_file` options is moved to a deprecated state as it's no longer needed.

This PR depends on #22951, the reviewers should review patch 393e1ac0ec066475ca94094265a5f88dbbdb1a1f

Refs https://github.com/scylladb/scylladb/issues/22428

Closes scylladb/scylladb#22952

* github.com:scylladb/scylladb:
  Remove db::config::object_storage_config
  Move `object_storage.yaml` endpoints to `scylla.yaml`
2025-04-01 16:01:44 +03:00
Avi Kivity
69684e16d8 Merge 'sstables: add SSTable compression with shared dictionaries ' from Michał Chojnowski
This PR extends Scylla's SSTable compression with the ability to use compression dictionaries shared across compression chunks. This involves several changes:

- We refactor `compression_parameters` and friends (`compressor`, `sstables::local_compression`, `sstables::compression`) to prepare for making the construction of `compressor`s asynchronous, to enable sharing pieces of compressors (the dictionaries) across shards.
- We introduce the notion of "hidden compression options" which are written to `CompressionInfo.db` and used to construct decompressors, like regular options, but don't appear in the schema. (We later stuff the SSTable's dictionary into `CompressionInfo.db` using a sequence of such options).
- We add a cluster feature which guards the creation of dictionary-compressed SSTables.
- We introduce a central "compressor factory" (one instance shared by all shards), which from this point onward is used to construct all `compressor` objects (one per SSTable) used to process the SSTables. When constructing a compressor for writing, it uses the "current"/"recommended" dictionary (which is passed to the factory from the actively-observed contents of the group0-managed `system.dicts`). When constructing a compressor for reading, it uses the dictionary written in the hidden compression options in CompressionInfo.db. And it keeps dictionaries deduplicated, so that each unique live dictionary blob has only one instance in memory, shared across shards.
- We teach the relevant `lz4` and `zstd` compressor wrappers about the dictionaries.
- We add a HTTP API call which samples pieces of the given table (i.e. the Data.db files) from across the cluster, trains a dictionary on it, and publishes it via `system.dicts` as the new current dictionary for that table. (And we add some RPC verbs to support that).
- We add a HTTP API call which estimates the impact of various available compression configurations on the compression ratio.
- We add an autotrainer fiber which periodically retrains dicts for dict-aware tables and publishes them if they seem to be a significant improvement.

Known imperfections:
- The factory currently keeps one dictionary instance on the entire node, but we probably want one copy per NUMA node. I didn't do that because exposing NUMA knowledge to Scylla seems to require some changes in Seastar first.

New feature, no backporting involved.

Closes scylladb/scylladb#23025

* github.com:scylladb/scylladb:
  docs: add user-facing documentation for SSTable compression with shared dicts
  docs/dev: add sstable-compression-dicts.md
  test: add test_sstable_compression_dictionaries_autotrain.py
  test: add test_sstable_compression_dictionaries_basic.py
  test/pylib/rest_client: add `keyspace_upgrade_sstables` helper
  main: run a sstable_dict_autotrainer
  api: add the estimate_compression_ratios API call
  dict_autotrainer: introduce sstable_dict_autotrainer
  db/system_keyspace: add query_dict_timestamp
  compress: add ZstdWithDictsCompressor and LZ4WithDictsCompressor
  main: clean up sstable compression dicts after table drops
  sstables/compress: discard hidden compression options after the decompressor is created
  compress: change compressor_ptr from shared_ptr to unique_ptr
  api: add the retrain_dict API call
  storage_service: add some dict-related routines
  main: in compression_dict_updated_callback, recognize and use SSTable compression dicts
  storage_service: add do_sample_sstables()
  messaging_service: add SAMPLE_SSTABLES and ESTIMATE_SSTABLE_VOLUME verbs
  db/system_keyspace: let `system.dicts` helpers be used for dicts other than the RPC compression dict
  raft/group0_state_machine: on `system.dicts` mutations, pass the affected partitition keys to the callback
  database: add sample_data_files()
  database: add take_sstable_set_snapshot()
  compress: teach `lz4_processor` about dictionaries
  compress: teach `zstd_processor` about dictionaries
  sstables: delegate compressor creation to the compressor factory
  sstables: plug an `sstable_compressor_factory` into `sstables_manager`
  sstables: introduce sstable_compressor_factory
  utils/hashers: add get_sha256()
  gms/feature_service: add the SSTABLE_COMPRESSION_DICTS cluster feature
  compress: add hidden dictionary options
  compress: remove `compression_parameters::get_compressor()`
  sstables/compress: remove get_sstable_compressor()
  sstables/compress: move ownership of `compressor` to `sstable::compression`
  compress: remove compressor::option_names()
  compress: clean up the constructor of zstd_processor
  compress: squash zstd.cc into compress.cc
  sstables/compress: break the dependency of `compression_parameters` on `compressor`
  compress.hh: switch compressor::name() from an instance member to a virtual call
  bytes: adapt fmt_hex to std::span<const std::byte>
2025-04-01 12:47:34 +03:00
Pavel Emelyanov
b5a124f60c sstable_directory: Move highest_generation_seen() to distributed_loader.cc
This method is only used by the loader code (and tests). Also, There's the
highest_version_seen() peer that sits in the loader code either.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#23324
2025-04-01 09:15:14 +03:00
Pavel Emelyanov
eafc767cc6 sstable/filesystem: Add convenience helper to generate filename
In its operations the fs storage carefully generates full filename from
all sstable parameters -- version, format, generation, keyspace and
table names and component type or name. However, in all of the cases
format, version and keyspace:table names are inherited from the sstable
being operated on. This calls for a filename generation helper that
wraps most of the arguments thus making the lines shorter.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#23384
2025-04-01 09:14:44 +03:00
Michał Chojnowski
cee504f66f sstables/compress: discard hidden compression options after the decompressor is created
Dictionary contents are kept in the list of "compression options" in the
header of `CompressionInfo.db`, and they are loaded from disk into
memory when the `sstable::compression` object is populated.

After the decompressor for the SSTable is created based on those
dict contents, they are not needed in RAM anymore. And since
they take up a sizeable amount of memory, we would like to free them.

In this patch, we discard all "hidden compression options"
(currently: only the dictionary contents) from the
`sstable::compression` object right after the decompressor is created.
(Those options are not supposed to be used for anything else anyway).
2025-04-01 00:07:30 +02:00
Michał Chojnowski
10fa4abde7 compress: change compressor_ptr from shared_ptr to unique_ptr
Cleanup patch. After we moved the ownership of compressors
to sstables, compressor objects never have shared lifetime.
`unique_ptr` is more appropriate for them than `shared_ptr` now.
(And besides expressing the intent better, using `unique_ptr`
prevents an accidental cross-shard `shared_ptr` copy).
2025-04-01 00:07:29 +02:00
Michał Chojnowski
b18ddcb92e sstables: delegate compressor creation to the compressor factory
Remove `compressor::create()`. This enforces that compressors
are only created through the `sstable_compressor_factory`.

Unlike the synchronous `compressor::create()`, the factory will be able
to create dict-aware compressors.
2025-04-01 00:07:28 +02:00
Michał Chojnowski
30a9d471fa sstables: plug an sstable_compressor_factory into sstables_manager
Create a `sstable_compressor_factory_impl` in `scylla_main`,
and pipe it through constructors into `sstables_manager`.

In next commits, the factory available through the `sstables_manager`
will be used to create compressors for SSTable readers and writers.
2025-04-01 00:07:28 +02:00
Michał Chojnowski
ebf02913a2 sstables: introduce sstable_compressor_factory
Before this commit, `compressor` objects are synchronously
created, during the creation or opening of SSTables,
from `compression_parameters` objects.

But we want to add compression dictionaries to SSTables and we want
to share dictionary contents across shards.
To do that, we need to make the creation of `compressor` objects asynchronous,
and give it access to a global dictionary registry.

We encapsulate that in a `sstable_compression_factory`. Instead of
calling `compressor::create()` on SSTable opening or creation, we will
ask the factory, asynchronously, for a new compressor, and it will return
a compressor with a deduplicated, up-to-date dictionary.

This commit introduces such a factory. It's not used anywhere yet,
and the compressors it produces don't use the provided dictionaries yet.
2025-04-01 00:07:28 +02:00
Michał Chojnowski
dd932ebb2f compress: add hidden dictionary options
Before this commit, "compression options" written into
CompressionInfo.db (and used to construct a decompressor)
have a 1:1 correspondence to "compression options" specified
in the schema.

But we want to add a new "compression option" -- the compression
dictionary -- which will be written into CompressionInfo.db
and used to construct decompressors, but won't be specified in the
schema.

To reconcile that, in this commit we introduce the notion of a "hidden
option". If an option name in `CompressionInfo.db` begins with a dot,
then this option will be used to construct decompressors, but won't
be visible for other uses. (I.e. for the `sstable_info` API call
and for recovering a fake `schema` from `CompressionInfo.db` in the
`scylla sstable` tool).

Then, we introduce the hidden `.dictionary.{0,1,2,..}` options,
which hold the contents of the dictionary blob for this SSTable.

(The dictionary is split into several parts because the SSTable
format limits the length of a single option value to 16 bits,
and dictionaries usually have a length greater than that).

This commit only introduces helpers which translate dictionary blobs
into "options" for CompressionInfo.db, and vice-versa, but it doesn't
use those helpers yet. They will be used in later commits.
2025-04-01 00:07:28 +02:00
Michał Chojnowski
11be7c0704 compress: remove compression_parameters::get_compressor()
Following up on the previous commits, we avoid constructing
compressors where not necessary,
by checking things directly on `compression_parameters` instead.
2025-04-01 00:07:28 +02:00
Michał Chojnowski
006c631642 sstables/compress: remove get_sstable_compressor()
Following up on the previous commit, we avoid constructing
a compressor in the `sstable_info` API call, and we instead
read the compression options from the `sstable::compression`.
2025-04-01 00:07:28 +02:00
Michał Chojnowski
8e611536b0 sstables/compress: move ownership of compressor to sstable::compression
SSTable readers and writers use `compressor` objects to compress and
decompress chunks of SSTable data files.

`compressor` objects are read-only, so only one of them is needed
for each SSTable. Before this commit, each reader and writer has
its own `compressor` object. This isn't necessary, but it's okay.

But later in this series it will stop being okay, because the creation
of a `compressor` will become an expensive cross-shard
operation (because it might require sharing a compression dictionary
from another shard). So we have to adjust the code so that there is
only once `compressor` per sstable, not one per reader/writer.

We stuff the ownership of this compressor into `sstable::compression`.

To make the ownership clear, we remove `compression_ptr` shared
pointers from readers and writers, and make them access the
compressor via the `sstable::compression` instead.
2025-04-01 00:07:27 +02:00
Michał Chojnowski
7bdcd5e8c1 compress: remove compressor::option_names()
It used to be used by `compression_parameters` validation logic
to ask the created `compressor` for compressor-specific option names.

Since we no longer delegate this to `compressor`, but we just
put the knowledge of those options directly into
`compressor_parameters`, it's dead code now.
2025-04-01 00:07:27 +02:00
Michał Chojnowski
cfe69e057f sstables/compress: break the dependency of compression_parameters on compressor
Note: this commit is meant to be a code refactoring only and is not intended
to change the observable behaviour.

Today `schema` contains a `compression_parameters`.
`compression_parameters` contains an instance of
`compressor`, and SSTable writers just share that instance.

This is fine because `compressor` is a stateless object,
functionally dependent on the schema.

But in later parts of the series, we will break this functional
dependency by adding dictionaries to compressors. Two writers
for the same schema might have different dictionaries, so they won't
be able to just share a single instance contained in the schema.

And when that happens, having a `compressor` instance
in the `schema`/`compression_parameters` will become awkward,
since it won't be actually used. It will be only a container for options.

In addition, for performance reasons, we will want to share some pieces
of compressors across shards, which will require -- in the general case --
a construction of a compressor to be asynchronous, and therefore not
possible inside the constructor of `compression_parameters`.

This commit modifies `compression_parameters` so that it doesn't hold or
construct instances of `compressor`.

Before this patch, the `compressor` instance constructed in
`compression_parameters` has an additional role of validating and
holding compressor-specific options.
(Today the only such option is the zstd compression level).

This means that the pieces of logic responsible for compressor-specific
options have to be rewritten. That ends up being the bulk of this commit.
2025-04-01 00:07:27 +02:00
Michał Chojnowski
f4ca94d13b compress.hh: switch compressor::name() from an instance member to a virtual call
Before this patch, `compressor` is designed to be a proper abstract
class, where the creator of a compressor doesn't even know
what he's creating -- he passes a name, and it gets turned into a
`compressor` behind a scenes.

But later, when creation of compressors will involve looking up
dictionaries, this abstraction will only get in the way.
So we give up on keeping `compressor` abstract, and instead of
using "opaque" names we turn to an explicit enum of possible compressor types.

The main point of this patch is to add the `algorithm` enum and the `algorithm_to_name()`
function. The rest of the patch switches the `compressor::name()` function
to use `algorithm_to_name()` instead of the passed-by-constructor
`compressor::_name`, to keep a single source of truth for the names.
2025-04-01 00:07:27 +02:00
Robert Bindar
b647196121 Remove db::config::object_storage_config
That map became redundant once we added
object_storage_endpoints in the config, this patch removes
it and switches all the user code to use the new option.

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
2025-03-31 17:15:12 +03:00
Avi Kivity
73e4a3c581 sstables: store features early in write path
sstable features indicate that an sstable has some extension, or that
some bug was fixed. They allow us to know if we can rely on certain
properties in a read sstables.

Currently, sstable features are set early in the read path (when we
read the scylla metadata file) and very late in the write path
(when we write the scylla metadata file just before sealing the sstable).

However, we happen to read features before we set them in the write path -
when we resize the bloom filter for a newly written sstable we instantiate
an index reader, and that depends on some features. As a result,
we read a disengaged optional (for the scylla metadata component) as if
it was engaged. This somehow worked so far, but fails with libstdc++
hash table implementation.

Fix it by moving storage of the features to the sstable itself, and
setting it early in the write path.

Fixes #23484

Closes scylladb/scylladb#23485
2025-03-31 09:33:56 +03:00
Calle Wilund
e02be77af7 sstables::storage: Move wrapping sstable components to storage provider
Fixes #23225
Fixes #23185

Moved wrapping component files/sinks to storage provider. Also ensures
to wrap data_sinks as well as actual files. This ensures that we actually
write encryption if active.
2025-03-20 14:54:24 +00:00
Calle Wilund
d46dcbb769 sstables::file_io_extension: Add a "wrap_sink" method.
Similar to wrap file, should wrap a data_sink (used for
sstable writers), in obvious write-only, simple stream
mode.

Default impl will detect if we wrap files for this component,
and if so, generate a file wrapper for the input sink, wrap
this, and the wrap it in a file_data_sink_impl.

This is obviously not efficient, so extensions used in actual
non-test code should implement the method.
2025-03-20 14:54:22 +00:00