scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-23 18:10:39 +00:00

Author	SHA1	Message	Date
Pavel Emelyanov	5bd3df507e	sstables: Lazily access statistics for trace-level logging There's a message in sstable::get_gc_before_for_fully_expire() method that is trace-level and one of its arguments finds a value in sstable statistics. Finding the value is not quite cheap (makes a lookup in std::unordered_map) and for mostly-off trace messages is just a waste of cycles. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#23910	2025-05-12 11:22:31 +03:00
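The idea in the commit above can be sketched as follows: the expensive argument is wrapped in a callable, and the logger invokes it only when the trace level is enabled. This is a minimal illustration, not the ScyllaDB or Seastar logger API; `toy_logger`, `toy_stats`, `lookups_for_one_trace` and the key name are hypothetical.

```cpp
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins; not the ScyllaDB or Seastar logging API.
enum class log_level { error, info, trace };

struct toy_logger {
    log_level enabled = log_level::info;
    int formatted = 0; // number of messages actually built

    bool is_enabled(log_level l) const { return l <= enabled; }

    // The argument is a callable that is invoked only if trace is enabled.
    template <typename MakeArg>
    void trace(MakeArg&& make_arg) {
        if (!is_enabled(log_level::trace)) {
            return; // the lookup behind make_arg is skipped entirely
        }
        auto value = std::forward<MakeArg>(make_arg)();
        (void)value; // a real logger would format and emit the value here
        ++formatted;
    }
};

struct toy_stats {
    std::unordered_map<std::string, long> values;
    mutable int lookups = 0; // counts how often the map was searched

    long get(const std::string& key) const {
        ++lookups;
        auto it = values.find(key);
        return it == values.end() ? 0 : it->second;
    }
};

// Returns how many map lookups one trace call costs at the given level.
inline int lookups_for_one_trace(log_level enabled) {
    toy_logger log;
    log.enabled = enabled;
    toy_stats stats;
    stats.values["example_stat"] = 42; // illustrative key only
    log.trace([&] { return stats.get("example_stat"); });
    return stats.lookups;
}
```

With the level set to `info`, the lambda is never called, so the map is not searched at all; with `trace` enabled, it is searched exactly once.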
Botond Dénes	4a802baccb	Merge 'compress: make sstable compression dictionaries NUMA-aware ' from Michał Chojnowski compress: distribute compression dictionaries over shards We don't want each shard to have its own copy of each dictionary. It would put unnecessary pressure on cache and memory. Instead, we want to share dictionaries between shards. Before this commit, all dictionaries live on shard 0. All other shards borrow foreign shared pointers from shard 0. There's a problem with this setup: dictionary blobs receive many random accesses. If shard 0 is on a remote NUMA node, this could pose a performance problem. Therefore, for each dictionary, we would like to have one copy per NUMA node, not one copy per the entire machine. And each shard should use the copy belonging to its own NUMA node. This is the main goal of this patch. There is another issue with putting all dicts on shard 0: it eats an asymmetric amount of memory from shard 0. This commit spreads the ownership of dicts over all shards within the NUMA group, to make the situation more symmetric. (Dict owner is decided based on the hash of dict contents). It should be noted that the last part isn't necessarily a good thing, though. While it makes the situation more symmetric within each node, it makes it less symmetric across the cluster, if different node sizes are present. If dicts occupy 1% of memory on each shard of a 100-shard node, then the same dicts would occupy 100% of memory on a 1-shard node. So for the sake of cluster-wide symmetry, we might later want to consider e.g. making the memory limit for dictionaries inversely proportional to the number of shards. New functionality, added to a feature which isn't in any stable branch yet. No backporting.
Closes scylladb/scylladb#23590 * github.com:scylladb/scylladb: test: add test/boost/sstable_compressor_factory_test compress: add some test-only APIs compress: rename sstable_compressor_factory_impl to dictionary_holder compress: fix indentation compress: remove sstable_compressor_factory_impl::_owner_shard compress: distribute compression dictionaries over shards test: switch uses of make_sstable_compressor_factory() to a seastar::thread-dependent version test: remove sstables::test_env::do_with()	2025-05-08 09:52:46 +03:00
Botond Dénes	e5d944f986	Merge 'replica: Fix use-after-free with concurrent schema change and sstable set update' from Raphael Raph Carvalho When schema is changed, sstable set is updated according to the compaction strategy of the new schema (no changes to set are actually made, just the underlying set type is updated), but the problem is that it happens without a lock, causing a use-after-free when running concurrently to another set update. Example: 1) A: sstable set is being updated on compaction completion 2) B: schema change updates the set (it's non deferring, so it happens in one go) and frees the set used by A. 3) when A resumes, system will likely crash since the set is freed already. ASAN screams about it: SUMMARY: AddressSanitizer: heap-use-after-free sstables/sstable_set.cc ... Fix is about deferring update of the set on schema change to compaction, which is triggered after new schema is set. Only strategy state and backlog tracker are updated immediately, which is fine since strategy doesn't depend on any particular implementation of sstable set. Fixes #22040. Closes scylladb/scylladb#23680 * github.com:scylladb/scylladb: replica: Fix use-after-free with concurrent schema change and sstable set update sstables: Implement sstable_set_impl::all_sstable_runs()	2025-05-08 06:56:16 +03:00
Pavel Emelyanov	0a9675de01	sstable: Use fmt::to_string(sstable::filename()) to get component file path The stream sink abort() method wants to remove component file by its path. For that the path is calculated from storage prefix and component basename, but there's a filename() method for it already. SStable filenames shouldn't be considered as on-disk paths (see #23194), but places that want it should be explicit and format the filename to string by hand. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#24039	2025-05-07 22:25:58 +03:00
Pavel Emelyanov	36baeaeb57	sstable: Move update_info_for_opened_data() method to private: block The method is internally called by sstable itself to refresh its state after opening or assigning (from foreign info) data and index files. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#24041	2025-05-07 20:58:34 +03:00
Pavel Emelyanov	c2ecc45db8	sstable: Remove validate argument from sstable::load_metadata() There are only two callers of the method and the one that wants validation (the sstable::load()) can do it on its own. This helps the other caller (schema loader) being simpler and shorter. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#24038	2025-05-07 20:57:37 +03:00
Michał Chojnowski	518f04f1c4	compress: add some test-only APIs Will be needed by the test added in the next patch.	2025-05-07 14:43:20 +02:00
Michał Chojnowski	66a454f61d	compress: rename sstable_compressor_factory_impl to dictionary_holder Since sstable_compressor_factory_impl no longer implements sstable_compressor_factory, the name can be misleading. Rename it to something closer to its new role.	2025-05-07 14:43:20 +02:00
Michał Chojnowski	1bcf77951c	compress: distribute compression dictionaries over shards We don't want each shard to have its own copy of each dictionary. It would put unnecessary pressure on cache and memory. Instead, we want to share dictionaries between shards. Before this commit, all dictionaries live on shard 0. All other shards borrow foreign shared pointers from shard 0. There's a problem with this setup: dictionary blobs receive many random accesses. If shard 0 is on a remote NUMA node, this could pose a performance problem. Therefore, for each dictionary, we would like to have one copy per NUMA node, not one copy per the entire machine. And each shard should use the copy belonging to its own NUMA node. This is the main goal of this patch. There is another issue with putting all dicts on shard 0: it eats an asymmetric amount of memory from shard 0. This commit spreads the ownership of dicts over all shards within the NUMA group, to make the situation more symmetric. (Dict owner is decided based on the hash of dict contents). It should be noted that the last part isn't necessarily a good thing, though. While it makes the situation more symmetric within each node, it makes it less symmetric across the cluster, if different node sizes are present. If dicts occupy 1% of memory on each shard of a 100-shard node, then the same dicts would occupy 100% of memory on a 1-shard node. So for the sake of cluster-wide symmetry, we might later want to consider e.g. making the memory limit for dictionaries inversely proportional to the number of shards.	2025-05-07 14:43:18 +02:00
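The ownership rule described above (one copy per NUMA node, with the owning shard picked by a hash of the dictionary contents) could look roughly like this. It is a sketch under assumptions: the shard-to-NUMA mapping is passed in as a plain vector, `std::hash` stands in for whatever hash the patch actually uses, and `owner_shard` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// numa_of_shard[i] is the NUMA node that shard i runs on (hypothetical input).
// Returns the shard inside `numa_node` that owns that node's copy of the dictionary.
inline std::size_t owner_shard(const std::vector<unsigned>& numa_of_shard,
                               unsigned numa_node,
                               const std::string& dict_contents) {
    // Collect the shards that belong to the requested NUMA group.
    std::vector<std::size_t> group;
    for (std::size_t shard = 0; shard < numa_of_shard.size(); ++shard) {
        if (numa_of_shard[shard] == numa_node) {
            group.push_back(shard);
        }
    }
    assert(!group.empty() && "the NUMA node must host at least one shard");

    // Assumption: std::hash stands in for the real content hash.
    const std::size_t h = std::hash<std::string>{}(dict_contents);
    return group[h % group.size()];
}
```

Because the owner depends only on the contents and the group, every shard in a NUMA group computes the same owner without coordination, and different dictionaries tend to spread over the group's shards.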
Michał Chojnowski	8649adafa8	test: switch uses of make_sstable_compressor_factory() to a seastar::thread-dependent version In next patches, make_sstable_compressor_factory() will have to disappear. In preparation for that, we switch to a seastar::thread-dependent replacement.	2025-05-07 14:43:04 +02:00
Raphael S. Carvalho	628bec4dbd	sstables: Implement sstable_set_impl::all_sstable_runs() With upcoming change where table::set_compaction_strategy() might delay update of sstable set, ICS might temporarily work with sstable set implementations other than partitioned_sstable_set. ICS relies on all_sstable_runs() during regular compaction, and today it triggers bad_function_call exception if not overridden by set implementation. To remove this strong dependency between compaction strategy and a particular set implementation, let's provide a default implementation of all_sstable_runs(), such that ICS will still work until the set is updated eventually through a process that adds or removes an sstable. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2025-05-06 10:06:06 -03:00
Pavel Emelyanov	b56d6fbb84	Merge 'sstables: Fix quadratic space complexity in partitioned_sstable_set' from Raphael Raph Carvalho Interval map is very susceptible to quadratic space behavior when it's flooded with many entries overlapping all (or most of) intervals, since each such entry will have presence on all intervals it overlaps with. A trigger we observed was memtable flush storm, which creates many small "L0" sstables that span roughly the entire token range. Since we cannot rely on insertion order, solution will be about storing sstables with such wide ranges in a vector (unleveled). There should be no consequence for single-key reads, since upper layer applies an additional filtering based on token of key being queried. And for range scans, there can be an increase in memory usage, but not significant because the sstables span a wide range and would have been selected in the combined reader if the range of scan overlaps with them. Anyway, this is a protection against storm of memtable flushes and shouldn't be the common scenario. It works both with tablets and vnodes, by adjusting the token range spanned by compaction group accordingly. Fixes #23634. We can backport this into 2024.2, 2025.1, but we should let this cook in master for 1 month or so. Closes scylladb/scylladb#23806 * github.com:scylladb/scylladb: test: Verify partitioned set store split and unsplit correctly sstables: Fix quadratic space complexity in partitioned_sstable_set compaction: Wire table_state into make_sstable_set() compaction: Introduce token_range() to table_state dht: Add overlap_ratio() for token range	2025-05-05 11:28:38 +03:00
Pavel Emelyanov	d40d6801b0	sstable_directory: Print ks.cf when moving unshared remote sstables When an sstable is identified by sstable_directory as remote-unshared, it will at some point be moved to the target shard. When it happens a log-message appears: sstable_directory - Moving 1 unshared SSTables to shard 1 Processing of tables by sstable_directory often happens in parallel, and messages from sstable_directory are intermixed. Having a message like above is not very informative, as it tells nothing about sstables that are being moved. Equip the message with ks:cf pair to make it more informative. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#23912	2025-05-05 09:45:44 +03:00
Pavel Emelyanov	e0f30a30a7	sstable_directory: Print unshared remote sstable when sorting When collecting sstables, the sstable_directory may sort the collected descriptors into one of three buckets -- unshared local and remote, and shared ones. Unshared local and shared sstables' paths are logged (with trace level) while unshared remote is silently collected for further processing. Add log message for that case too, there's enough data to print the sstable path as well. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#23913	2025-05-05 09:33:06 +03:00
Raphael S. Carvalho	d5bee4c814	test: Verify partitioned set store split and unsplit correctly Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2025-04-29 15:47:33 -03:00
Raphael S. Carvalho	c77f710a0c	sstables: Fix quadratic space complexity in partitioned_sstable_set Interval map is very susceptible to quadratic space behavior when it's flooded with many entries overlapping all (or most of) intervals, since each such entry will have presence on all intervals it overlaps with. A trigger we observed was memtable flush storm, which creates many small "L0" sstables that span roughly the entire token range. Since we cannot rely on insertion order, solution will be about storing sstables with such wide ranges in a vector (unleveled). There should be no consequence for single-key reads, since upper layer applies an additional filtering based on token of key being queried. And for range scans, there can be an increase in memory usage, but not significant because the sstables span a wide range and would have been selected in the combined reader if the range of scan overlaps with them. Anyway, this is a protection against storm of memtable flushes and shouldn't be the common scenario. It works both with tablets and vnodes, by adjusting the token range spanned by compaction group accordingly. Fixes #23634. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2025-04-29 15:47:33 -03:00
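A toy model of the fix described above: an sstable whose range covers most of the group's token range is kept once in a flat "unleveled" vector instead of being registered in every interval it overlaps. Everything here is an assumption made for illustration: integer tokens, the `toy_range` and `toy_sstable_set` types, the threshold value, and this signature of `overlap_ratio` (the series adds a function of that name, but its real interface is not shown in the log).

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Inclusive integer token range; a stand-in for the real token range type.
struct toy_range {
    long first;
    long last;
};

// Fraction of `whole` that is covered by `r` (hypothetical signature).
inline double overlap_ratio(toy_range whole, toy_range r) {
    assert(whole.first <= whole.last);
    const long lo = std::max(whole.first, r.first);
    const long hi = std::min(whole.last, r.last);
    if (hi < lo) {
        return 0.0; // disjoint ranges
    }
    return double(hi - lo + 1) / double(whole.last - whole.first + 1);
}

struct toy_sstable_set {
    toy_range whole;       // token range owned by the compaction group
    double wide_threshold; // hypothetical cut-off for "wide" sstables
    std::vector<int> unleveled;                     // wide sstables, stored once
    std::vector<std::pair<toy_range, int>> leveled; // stand-in for the interval map

    void insert(int sstable_id, toy_range r) {
        if (overlap_ratio(whole, r) >= wide_threshold) {
            unleveled.push_back(sstable_id);
        } else {
            leveled.emplace_back(r, sstable_id);
        }
    }
};
```

In this model a flush storm of full-range sstables grows `unleveled` linearly, while narrow sstables still go to the interval-indexed side.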
Raphael S. Carvalho	21d1e78457	compaction: Wire table_state into make_sstable_set() This will be useful for feeding token range owned by compaction group into sstable set. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2025-04-29 15:47:33 -03:00
Botond Dénes	6172ff501f	readers: mv reversing_v2.hh reversing.hh Completely mechanical change.	2025-04-16 04:46:08 -04:00
Botond Dénes	c29c696780	readers: mv from_mutations_v2.hh from_mutations.hh Completely mechanical change.	2025-04-16 04:46:08 -04:00
Botond Dénes	b104862702	tree: s/make_mutation_reader_from_mutations_v2/make_mutation_reader_from_mutations/s Completely mechanical change.	2025-04-16 04:46:07 -04:00
Botond Dénes	f1bd2553ed	readers: mv forwardable_v2.hh forwardable.hh Completely mechanical change.	2025-04-16 04:33:50 -04:00
Botond Dénes	a9d75c4f9d	readers: mv empty_v2.hh empty.hh Completely mechanical change.	2025-04-16 04:32:56 -04:00
Botond Dénes	05829f98f3	tree: s/make_empty_flat_reader_v2/make_empty_mutation_reader/ Completely mechanical change.	2025-04-16 04:32:56 -04:00
Pavel Emelyanov	b25cb5af0c	Merge 'Use named gates' from Benny Halevy Name the gates and phased barriers we use to make it easy to debug gate_closed_exception Refs https://github.com/scylladb/seastar/pull/2688 * Enhancement only, no backport needed Closes scylladb/scylladb#23329 * github.com:scylladb/scylladb: utils: loading_cache: use named_gate utils: flush_queue: use named_gate sstables_manager: use named gate sstables_loader: use named gate utils: phased_barrier, pluggable: use named gate utils: s3::client::multipart_upload: use named gate utils: s3::client: use named_gate transport: controller: use named gate tracing: trace_keyspace_helper: use named gate task_manager: module: use named gate topology_coordinator: use named gate storage_service: use named gate storage_proxy: wait_for_hint_sync_point: use named gate storage_proxy: remote: use named gate service: session: use named gate service: raft: raft_rpc: use named gate service: raft: raft_group0: use named gate service: raft: persistent_discovery: use named gate service: raft: group0_state_machine: use named gate service: migration_manager: use named gate replica: table: use named gate replica: compaction_group, storage_group: use named gate redis: query_processor: use named gate repair: repair_meta: use named gate reader_concurrency_semaphore: use named gate raft: server_impl: use named gate querier_cache: use named gate gms: gossiper: use named gate generic_server: use named gate db: sstables_format_listener: use named gate db: snapshot: backup_task: use named gate db: snapshot_ctl: use named gate hints: hints_sender: use named gate hints: manager: use named gate hints: hint_endpoint_manager: use named gate commitlog: segment_manager: use named gate db: batchlog_manager: use named gate query_processor: remote: use named gate compaction: compaction_state: use named gate alternator/server: use named_gate	2025-04-14 20:56:32 +03:00
Pavel Emelyanov	1bd991a111	test: Inherit sstable_assertions from sstables::test The latter class is invented to let tests access private fields of an sstable (mostly methods). The former is in fact an extended version of it that also does some checks. However, they don't inherit from each other, and the sstable_assertions partially duplicates some functionality of the test one. Add the inheritance, remove the duplicated methods from the child class, update the callers (the test class returns future<>s, the assertions one "knows" it runs in seastar thread) and mark sstable::read_toc() private. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#23697	2025-04-14 13:45:14 +03:00
Benny Halevy	d665bb4f8b	sstables_manager: use named gate Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-04-12 11:47:00 +03:00
Benny Halevy	d8b0c661e4	sstables_manager: add subscriptions Allow other submodules to subscribe for added/deleted notifications. This will be used in a later patch to prioritize unlinked sstables for backup. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-04-09 08:54:07 +03:00
Benny Halevy	e60fcc58b7	sstables: directory_semaphore: expose get_units To be used by a following patch for backup concurrency control. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-04-09 08:54:07 +03:00
Benny Halevy	63bc1d4626	db: snapshot: backup_task: do_backup: organize components by sstable generation Do not rely on the snapshot directory listing order. This will become useful for prioritizing unlinked sstables in a following patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-04-09 08:54:06 +03:00
Avi Kivity	ac3d25eb44	sstable_set: incremental_reader_selector: be more careful when filtering out already engaged sstables The incremental reader selector maintains an unordered_set of sstables that are already engaged, and uses std::views::filter to filter those out. It adds the sstable under consideration to the set, and if addition failed (because it's already in) then it filters it out. This breaks if the filter view is executed twice - the first pass will add every sstable to the set, and the second will consider every sstable already filtered. This is what happens with libstdc++ 15 (due to the addition of vector(from_range_t) constructor), which uses the first pass to calculate the vector size and the second pass to insert the elements into a correctly-sized vector. Fix by open-coding the loop. Closes scylladb/scylladb#23597	2025-04-07 12:49:04 +03:00
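The failure mode described above can be reproduced in miniature without any ranges machinery. The sketch below models a consumer that walks the filtered input twice (once to count, once to copy), which is how the commit describes the libstdc++ 15 behavior; it is not the library's actual code, and `collect_two_pass`, `select_buggy` and `select_fixed` are hypothetical names.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Models a consumer that evaluates the predicate in two passes:
// pass 1 counts matches to size the vector, pass 2 copies the matches.
template <typename Pred>
std::vector<int> collect_two_pass(const std::vector<int>& in, Pred pred) {
    std::size_t n = 0;
    for (int x : in) {
        if (pred(x)) {
            ++n;
        }
    }
    std::vector<int> out;
    out.reserve(n);
    for (int x : in) {
        if (pred(x)) {
            out.push_back(x);
        }
    }
    return out;
}

// Buggy: the predicate mutates `engaged`, so the first pass marks every
// candidate as engaged and the second pass filters everything out.
inline std::vector<int> select_buggy(const std::vector<int>& candidates,
                                     std::unordered_set<int>& engaged) {
    return collect_two_pass(candidates, [&](int x) {
        return engaged.insert(x).second;
    });
}

// Fixed: an open-coded loop evaluates the side effect exactly once per item.
inline std::vector<int> select_fixed(const std::vector<int>& candidates,
                                     std::unordered_set<int>& engaged) {
    std::vector<int> out;
    for (int x : candidates) {
        if (engaged.insert(x).second) {
            out.push_back(x);
        }
    }
    return out;
}
```

Starting from `engaged = {1}` and candidates `{1, 2, 3}`, the buggy variant returns nothing while the open-coded loop returns `2` and `3`; in both cases all three ids end up in `engaged`.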
Pavel Emelyanov	2ee9cec1d3	Merge 'Remove object_storage.yaml and move the endpoints to scylla.yaml' from Robert Bindar Move `object_storage.yaml` endpoints to `scylla.yaml` This change also removes the `object_storage.yaml` file altogether and adds tests for fetching the endpoints via the `v2/config/object_storage_endpoints` REST api. Also, `object_storage_config_file` option is moved to a deprecated state as it's no longer needed. This PR depends on #22951, the reviewers should review patch 393e1ac0ec066475ca94094265a5f88dbbdb1a1f Refs https://github.com/scylladb/scylladb/issues/22428 Closes scylladb/scylladb#22952 * github.com:scylladb/scylladb: Remove db::config::object_storage_config Move `object_storage.yaml` endpoints to `scylla.yaml`	2025-04-01 16:01:44 +03:00
Avi Kivity	69684e16d8	Merge 'sstables: add SSTable compression with shared dictionaries ' from Michał Chojnowski This PR extends Scylla's SSTable compression with the ability to use compression dictionaries shared across compression chunks. This involves several changes: - We refactor `compression_parameters` and friends (`compressor`, `sstables::local_compression`, `sstables::compression`) to prepare for making the construction of `compressor`s asynchronous, to enable sharing pieces of compressors (the dictionaries) across shards. - We introduce the notion of "hidden compression options" which are written to `CompressionInfo.db` and used to construct decompressors, like regular options, but don't appear in the schema. (We later stuff the SSTable's dictionary into `CompressionInfo.db` using a sequence of such options). - We add a cluster feature which guards the creation of dictionary-compressed SSTables. - We introduce a central "compressor factory" (one instance shared by all shards), which from this point onward is used to construct all `compressor` objects (one per SSTable) used to process the SSTables. When constructing a compressor for writing, it uses the "current"/"recommended" dictionary (which is passed to the factory from the actively-observed contents of the group0-managed `system.dicts`). When constructing a compressor for reading, it uses the dictionary written in the hidden compression options in CompressionInfo.db. And it keeps dictionaries deduplicated, so that each unique live dictionary blob has only one instance in memory, shared across shards. - We teach the relevant `lz4` and `zstd` compressor wrappers about the dictionaries. - We add a HTTP API call which samples pieces of the given table (i.e. the Data.db files) from across the cluster, trains a dictionary on it, and publishes it via `system.dicts` as the new current dictionary for that table. (And we add some RPC verbs to support that). 
- We add a HTTP API call which estimates the impact of various available compression configurations on the compression ratio. - We add an autotrainer fiber which periodically retrains dicts for dict-aware tables and publishes them if they seem to be a significant improvement. Known imperfections: - The factory currently keeps one dictionary instance on the entire node, but we probably want one copy per NUMA node. I didn't do that because exposing NUMA knowledge to Scylla seems to require some changes in Seastar first. New feature, no backporting involved. Closes scylladb/scylladb#23025 * github.com:scylladb/scylladb: docs: add user-facing documentation for SSTable compression with shared dicts docs/dev: add sstable-compression-dicts.md test: add test_sstable_compression_dictionaries_autotrain.py test: add test_sstable_compression_dictionaries_basic.py test/pylib/rest_client: add `keyspace_upgrade_sstables` helper main: run a sstable_dict_autotrainer api: add the estimate_compression_ratios API call dict_autotrainer: introduce sstable_dict_autotrainer db/system_keyspace: add query_dict_timestamp compress: add ZstdWithDictsCompressor and LZ4WithDictsCompressor main: clean up sstable compression dicts after table drops sstables/compress: discard hidden compression options after the decompressor is created compress: change compressor_ptr from shared_ptr to unique_ptr api: add the retrain_dict API call storage_service: add some dict-related routines main: in compression_dict_updated_callback, recognize and use SSTable compression dicts storage_service: add do_sample_sstables() messaging_service: add SAMPLE_SSTABLES and ESTIMATE_SSTABLE_VOLUME verbs db/system_keyspace: let `system.dicts` helpers be used for dicts other than the RPC compression dict raft/group0_state_machine: on `system.dicts` mutations, pass the affected partition keys to the callback database: add sample_data_files() database: add take_sstable_set_snapshot() compress: teach `lz4_processor` about 
dictionaries compress: teach `zstd_processor` about dictionaries sstables: delegate compressor creation to the compressor factory sstables: plug an `sstable_compressor_factory` into `sstables_manager` sstables: introduce sstable_compressor_factory utils/hashers: add get_sha256() gms/feature_service: add the SSTABLE_COMPRESSION_DICTS cluster feature compress: add hidden dictionary options compress: remove `compression_parameters::get_compressor()` sstables/compress: remove get_sstable_compressor() sstables/compress: move ownership of `compressor` to `sstable::compression` compress: remove compressor::option_names() compress: clean up the constructor of zstd_processor compress: squash zstd.cc into compress.cc sstables/compress: break the dependency of `compression_parameters` on `compressor` compress.hh: switch compressor::name() from an instance member to a virtual call bytes: adapt fmt_hex to std::span<const std::byte>	2025-04-01 12:47:34 +03:00
Pavel Emelyanov	b5a124f60c	sstable_directory: Move highest_generation_seen() to distributed_loader.cc This method is only used by the loader code (and tests). Also, there's the highest_version_seen() peer that sits in the loader code too. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#23324	2025-04-01 09:15:14 +03:00
Pavel Emelyanov	eafc767cc6	sstable/filesystem: Add convenience helper to generate filename In its operations the fs storage carefully generates full filename from all sstable parameters -- version, format, generation, keyspace and table names and component type or name. However, in all of the cases format, version and keyspace:table names are inherited from the sstable being operated on. This calls for a filename generation helper that wraps most of the arguments thus making the lines shorter. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#23384	2025-04-01 09:14:44 +03:00
Michał Chojnowski	cee504f66f	sstables/compress: discard hidden compression options after the decompressor is created Dictionary contents are kept in the list of "compression options" in the header of `CompressionInfo.db`, and they are loaded from disk into memory when the `sstable::compression` object is populated. After the decompressor for the SSTable is created based on those dict contents, they are not needed in RAM anymore. And since they take up a sizeable amount of memory, we would like to free them. In this patch, we discard all "hidden compression options" (currently: only the dictionary contents) from the `sstable::compression` object right after the decompressor is created. (Those options are not supposed to be used for anything else anyway).	2025-04-01 00:07:30 +02:00
Michał Chojnowski	10fa4abde7	compress: change compressor_ptr from shared_ptr to unique_ptr Cleanup patch. After we moved the ownership of compressors to sstables, compressor objects never have shared lifetime. `unique_ptr` is more appropriate for them than `shared_ptr` now. (And besides expressing the intent better, using `unique_ptr` prevents an accidental cross-shard `shared_ptr` copy).	2025-04-01 00:07:29 +02:00
Michał Chojnowski	b18ddcb92e	sstables: delegate compressor creation to the compressor factory Remove `compressor::create()`. This enforces that compressors are only created through the `sstable_compressor_factory`. Unlike the synchronous `compressor::create()`, the factory will be able to create dict-aware compressors.	2025-04-01 00:07:28 +02:00
Michał Chojnowski	30a9d471fa	sstables: plug an `sstable_compressor_factory` into `sstables_manager` Create a `sstable_compressor_factory_impl` in `scylla_main`, and pipe it through constructors into `sstables_manager`. In next commits, the factory available through the `sstables_manager` will be used to create compressors for SSTable readers and writers.	2025-04-01 00:07:28 +02:00
Michał Chojnowski	ebf02913a2	sstables: introduce sstable_compressor_factory Before this commit, `compressor` objects are synchronously created, during the creation or opening of SSTables, from `compression_parameters` objects. But we want to add compression dictionaries to SSTables and we want to share dictionary contents across shards. To do that, we need to make the creation of `compressor` objects asynchronous, and give it access to a global dictionary registry. We encapsulate that in a `sstable_compression_factory`. Instead of calling `compressor::create()` on SSTable opening or creation, we will ask the factory, asynchronously, for a new compressor, and it will return a compressor with a deduplicated, up-to-date dictionary. This commit introduces such a factory. It's not used anywhere yet, and the compressors it produces don't use the provided dictionaries yet.	2025-04-01 00:07:28 +02:00
Michał Chojnowski	dd932ebb2f	compress: add hidden dictionary options Before this commit, "compression options" written into CompressionInfo.db (and used to construct a decompressor) have a 1:1 correspondence to "compression options" specified in the schema. But we want to add a new "compression option" -- the compression dictionary -- which will be written into CompressionInfo.db and used to construct decompressors, but won't be specified in the schema. To reconcile that, in this commit we introduce the notion of a "hidden option". If an option name in `CompressionInfo.db` begins with a dot, then this option will be used to construct decompressors, but won't be visible for other uses. (I.e. for the `sstable_info` API call and for recovering a fake `schema` from `CompressionInfo.db` in the `scylla sstable` tool). Then, we introduce the hidden `.dictionary.{0,1,2,..}` options, which hold the contents of the dictionary blob for this SSTable. (The dictionary is split into several parts because the SSTable format limits the length of a single option value to 16 bits, and dictionaries usually have a length greater than that). This commit only introduces helpers which translate dictionary blobs into "options" for CompressionInfo.db, and vice-versa, but it doesn't use those helpers yet. They will be used in later commits.	2025-04-01 00:07:28 +02:00
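The helpers this commit describes (dictionary blob to hidden options and back) might be sketched as below. The `.dictionary.N` naming and the leading-dot rule come from the commit message; the container type, the 65535-byte chunk size derived from the 16-bit length limit, and the function names are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

using toy_options = std::map<std::string, std::string>;

// Assumption: a 16-bit length field allows at most 65535 bytes per value.
constexpr std::size_t max_option_value_len = 0xFFFF;

// An option whose name begins with a dot is "hidden".
inline bool is_hidden_option(const std::string& name) {
    return !name.empty() && name[0] == '.';
}

// Splits the dictionary blob into `.dictionary.0`, `.dictionary.1`, ...
inline toy_options dict_to_options(const std::string& dict) {
    toy_options out;
    std::size_t part = 0;
    for (std::size_t off = 0; off < dict.size(); off += max_option_value_len, ++part) {
        out[".dictionary." + std::to_string(part)] = dict.substr(off, max_option_value_len);
    }
    return out;
}

// Concatenates the parts back in numeric order until one is missing.
inline std::string options_to_dict(const toy_options& opts) {
    std::string dict;
    for (std::size_t part = 0;; ++part) {
        auto it = opts.find(".dictionary." + std::to_string(part));
        if (it == opts.end()) {
            break;
        }
        assert(it->second.size() <= max_option_value_len);
        dict += it->second;
    }
    return dict;
}
```

Looking the parts up by index rather than iterating the map avoids relying on string ordering, where `.dictionary.10` would sort before `.dictionary.2`.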
Michał Chojnowski	11be7c0704	compress: remove `compression_parameters::get_compressor()` Following up on the previous commits, we avoid constructing compressors where not necessary, by checking things directly on `compression_parameters` instead.	2025-04-01 00:07:28 +02:00
Michał Chojnowski	006c631642	sstables/compress: remove get_sstable_compressor() Following up on the previous commit, we avoid constructing a compressor in the `sstable_info` API call, and we instead read the compression options from the `sstable::compression`.	2025-04-01 00:07:28 +02:00
Michał Chojnowski	8e611536b0	sstables/compress: move ownership of `compressor` to `sstable::compression` SSTable readers and writers use `compressor` objects to compress and decompress chunks of SSTable data files. `compressor` objects are read-only, so only one of them is needed for each SSTable. Before this commit, each reader and writer has its own `compressor` object. This isn't necessary, but it's okay. But later in this series it will stop being okay, because the creation of a `compressor` will become an expensive cross-shard operation (because it might require sharing a compression dictionary from another shard). So we have to adjust the code so that there is only one `compressor` per sstable, not one per reader/writer. We stuff the ownership of this compressor into `sstable::compression`. To make the ownership clear, we remove `compression_ptr` shared pointers from readers and writers, and make them access the compressor via the `sstable::compression` instead.	2025-04-01 00:07:27 +02:00
Michał Chojnowski	7bdcd5e8c1	compress: remove compressor::option_names() It used to be used by `compression_parameters` validation logic to ask the created `compressor` for compressor-specific option names. Since we no longer delegate this to `compressor`, but we just put the knowledge of those options directly into `compressor_parameters`, it's dead code now.	2025-04-01 00:07:27 +02:00
Michał Chojnowski	cfe69e057f	sstables/compress: break the dependency of `compression_parameters` on `compressor` Note: this commit is meant to be a code refactoring only and is not intended to change the observable behaviour. Today `schema` contains a `compression_parameters`. `compression_parameters` contains an instance of `compressor`, and SSTable writers just share that instance. This is fine because `compressor` is a stateless object, functionally dependent on the schema. But in later parts of the series, we will break this functional dependency by adding dictionaries to compressors. Two writers for the same schema might have different dictionaries, so they won't be able to just share a single instance contained in the schema. And when that happens, having a `compressor` instance in the `schema`/`compression_parameters` will become awkward, since it won't be actually used. It will be only a container for options. In addition, for performance reasons, we will want to share some pieces of compressors across shards, which will require -- in the general case -- a construction of a compressor to be asynchronous, and therefore not possible inside the constructor of `compression_parameters`. This commit modifies `compression_parameters` so that it doesn't hold or construct instances of `compressor`. Before this patch, the `compressor` instance constructed in `compression_parameters` has an additional role of validating and holding compressor-specific options. (Today the only such option is the zstd compression level). This means that the pieces of logic responsible for compressor-specific options have to be rewritten. That ends up being the bulk of this commit.	2025-04-01 00:07:27 +02:00
Michał Chojnowski	f4ca94d13b	compress.hh: switch compressor::name() from an instance member to a virtual call Before this patch, `compressor` is designed to be a proper abstract class, where the creator of a compressor doesn't even know what he's creating -- he passes a name, and it gets turned into a `compressor` behind the scenes. But later, when creation of compressors will involve looking up dictionaries, this abstraction will only get in the way. So we give up on keeping `compressor` abstract, and instead of using "opaque" names we turn to an explicit enum of possible compressor types. The main point of this patch is to add the `algorithm` enum and the `algorithm_to_name()` function. The rest of the patch switches the `compressor::name()` function to use `algorithm_to_name()` instead of the passed-by-constructor `compressor::_name`, to keep a single source of truth for the names.	2025-04-01 00:07:27 +02:00
Robert Bindar	b647196121	Remove db::config::object_storage_config That map became redundant once we added object_storage_endpoints in the config, this patch removes it and switches all the user code to use the new option. Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>	2025-03-31 17:15:12 +03:00
Avi Kivity	73e4a3c581	sstables: store features early in write path sstable features indicate that an sstable has some extension, or that some bug was fixed. They allow us to know if we can rely on certain properties in a read sstable. Currently, sstable features are set early in the read path (when we read the scylla metadata file) and very late in the write path (when we write the scylla metadata file just before sealing the sstable). However, we happen to read features before we set them in the write path - when we resize the bloom filter for a newly written sstable we instantiate an index reader, and that depends on some features. As a result, we read a disengaged optional (for the scylla metadata component) as if it was engaged. This somehow worked so far, but fails with libstdc++ hash table implementation. Fix it by moving storage of the features to the sstable itself, and setting it early in the write path. Fixes #23484 Closes scylladb/scylladb#23485	2025-03-31 09:33:56 +03:00
Calle Wilund	e02be77af7	sstables::storage: Move wrapping sstable components to storage provider Fixes #23225 Fixes #23185 Moved wrapping component files/sinks to storage provider. Also ensures to wrap data_sinks as well as actual files. This ensures that we actually write encryption if active.	2025-03-20 14:54:24 +00:00
Calle Wilund	d46dcbb769	sstables::file_io_extension: Add a "wrap_sink" method. Similar to wrap file, should wrap a data_sink (used for sstable writers), in obvious write-only, simple stream mode. Default impl will detect if we wrap files for this component, and if so, generate a file wrapper for the input sink, wrap this, and then wrap it in a file_data_sink_impl. This is obviously not efficient, so extensions used in actual non-test code should implement the method.	2025-03-20 14:54:22 +00:00

3761 Commits