scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-26 03:20:37 +00:00

Author	SHA1	Message	Date
Michał Chojnowski	518f04f1c4	compress: add some test-only APIs Will be needed by the test added in the next patch.	2025-05-07 14:43:20 +02:00
Michał Chojnowski	cb1b291051	compress: add ZstdWithDictsCompressor and LZ4WithDictsCompressor Add new compressor names to `sstable_compression`. When those names are configured in the schema, new SSTables will be compressed with dict-aware Zstd or LZ4 respectively.	2025-04-01 00:07:30 +02:00
Michał Chojnowski	10fa4abde7	compress: change compressor_ptr from shared_ptr to unique_ptr Cleanup patch. After we moved the ownership of compressors to sstables, compressor objects never have shared lifetime. `unique_ptr` is more appropriate for them than `shared_ptr` now. (And besides expressing the intent better, using `unique_ptr` prevents an accidental cross-shard `shared_ptr` copy).	2025-04-01 00:07:29 +02:00
Michał Chojnowski	b18ddcb92e	sstables: delegate compressor creation to the compressor factory Remove `compressor::create()`. This enforces that compressors are only created through the `sstable_compressor_factory`. Unlike the synchronous `compressor::create()`, the factory will be able to create dict-aware compressors.	2025-04-01 00:07:28 +02:00
Michał Chojnowski	dd932ebb2f	compress: add hidden dictionary options Before this commit, "compression options" written into CompressionInfo.db (and used to construct a decompressor) have a 1:1 correspondence to "compression options" specified in the schema. But we want to add a new "compression option" -- the compression dictionary -- which will be written into CompressionInfo.db and used to construct decompressors, but won't be specified in the schema. To reconcile that, in this commit we introduce the notion of a "hidden option". If an option name in `CompressionInfo.db` begins with a dot, then this option will be used to construct decompressors, but won't be visible for other uses. (I.e. for the `sstable_info` API call and for recovering a fake `schema` from `CompressionInfo.db` in the `scylla sstable` tool). Then, we introduce the hidden `.dictionary.{0,1,2,..}` options, which hold the contents of the dictionary blob for this SSTable. (The dictionary is split into several parts because the SSTable format limits the length of a single option value to 16 bits, and dictionaries usually have a length greater than that). This commit only introduces helpers which translate dictionary blobs into "options" for CompressionInfo.db, and vice-versa, but it doesn't use those helpers yet. They will be used in later commits.	2025-04-01 00:07:28 +02:00
Michał Chojnowski	11be7c0704	compress: remove `compression_parameters::get_compressor()` Following up on the previous commits, we avoid constructing compressors where not necessary, by checking things directly on `compression_parameters` instead.	2025-04-01 00:07:28 +02:00
Michał Chojnowski	7bdcd5e8c1	compress: remove compressor::option_names() It used to be used by `compression_parameters` validation logic to ask the created `compressor` for compressor-specific option names. Since we no longer delegate this to `compressor`, but we just put the knowledge of those options directly into `compressor_parameters`, it's dead code now.	2025-04-01 00:07:27 +02:00
Michał Chojnowski	3b0ab8e1ee	compress: clean up the constructor of zstd_processor Since we now parse and validate the compression level during the construction of `compression_parameters`, we can just pass the structured params to `zstd_processor` instead of passing a raw string map.	2025-04-01 00:07:27 +02:00
Michał Chojnowski	6470035a74	compress: squash zstd.cc into compress.cc Unlike all other implementations of `compressor`, `zstd_processor` has its own special object file and its own special late binding mechanism (via the `class_registry`). It doesn't need either. Let's squash it into `compress.cc`. Keeping `zstd_processor` a separate "module" would require adding even more headers and source files later in the series (when adding dictionaries), and there's no benefit in being so granular. All `compressor` logic can be in `compress.cc` and it will still be small enough. This commit also gets rid of the pointless `class_registry` late binding mechanism and just constructs the `zstd_processor` in `compressor::create()` with a regular constructor call.	2025-04-01 00:07:27 +02:00
Michał Chojnowski	cfe69e057f	sstables/compress: break the dependency of `compression_parameters` on `compressor` Note: this commit is meant to be a code refactoring only and is not intended to change the observable behaviour. Today `schema` contains a `compression_parameters`. `compression_parameters` contains an instance of `compressor`, and SSTable writers just share that instance. This is fine because `compressor` is a stateless object, functionally dependent on the schema. But in later parts of the series, we will break this functional dependency by adding dictionaries to compressors. Two writers for the same schema might have different dictionaries, so they won't be able to just share a single instance contained in the schema. And when that happens, having a `compressor` instance in the `schema`/`compression_parameters` will become awkward, since it won't be actually used. It will be only a container for options. In addition, for performance reasons, we will want to share some pieces of compressors across shards, which will require -- in the general case -- a construction of a compressor to be asynchronous, and therefore not possible inside the constructor of `compression_parameters`. This commit modifies `compression_parameters` so that it doesn't hold or construct instances of `compressor`. Before this patch, the `compressor` instance constructed in `compression_parameters` has an additional role of validating and holding compressor-specific options. (Today the only such option is the zstd compression level). This means that the pieces of logic responsible for compressor-specific options have to be rewritten. That ends up being the bulk of this commit.	2025-04-01 00:07:27 +02:00
Michał Chojnowski	f4ca94d13b	compress.hh: switch compressor::name() from an instance member to a virtual call Before this patch, `compressor` is designed to be a proper abstract class, where the creator of a compressor doesn't even know what he's creating -- he passes a name, and it gets turned into a `compressor` behind the scenes. But later, when creation of compressors will involve looking up dictionaries, this abstraction will only get in the way. So we give up on keeping `compressor` abstract, and instead of using "opaque" names we turn to an explicit enum of possible compressor types. The main point of this patch is to add the `algorithm` enum and the `algorithm_to_name()` function. The rest of the patch switches the `compressor::name()` function to use `algorithm_to_name()` instead of the passed-by-constructor `compressor::_name`, to keep a single source of truth for the names.	2025-04-01 00:07:27 +02:00
Kefu Chai	4a268362b9	compress: fix compressor initialization order by making namespace_prefix a function Fixes a race condition where COMPRESSOR_NAME in zstd.cc could be initialized before compressor::namespace_prefix due to undefined global variable initialization order across translation units. This was causing ZstdCompressor to be unregistered in release builds, making it impossible to create tables with Zstd compression. Replace the global namespace_prefix variable with a function that returns the fully qualified compressor name. This ensures proper initialization order and fixes the registration of the ZstdCompressor. Fixes scylladb/scylladb#22444 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#22451	2025-01-26 13:43:02 +02:00
Avi Kivity	f3eade2f62	treewide: relicense to ScyllaDB-Source-Available-1.0 Drop the AGPL license in favor of a source-available license. See the blog post [1] for details. [1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/	2024-12-18 17:45:13 +02:00
Kefu Chai	fb87ab1c75	compress, auth: include used headers before this change, we rely on `seastar/util/std-compat.hh` to include the used headers provided by standard library. this was necessary before we moved to a C++20 compliant standard library implementation. but since Seastar has dropped C++17 support, its `seastar/util/std-compat.hh` is not responsible for providing these headers anymore. so, in this change, we include the used header directly instead of relying on `seastar/util/std-compat.hh`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18986	2024-05-30 09:16:23 +03:00
Kefu Chai	0ae81446ef	./: not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16766	2024-01-17 16:30:14 +02:00
Kefu Chai	f5b05cf981	treewide: use defaulted operator!=() and operator==() in C++20, the compiler generates operator!=() if the corresponding operator==() is already defined, the language now understands that the comparison is symmetric in the new standard. fortunately, our operator!=() is always equivalent to `! operator==()`, this matches the behavior of the default generated operator!=(). so, in this change, all `operator!=` are removed. in addition to the defaulted operator!=, C++20 also brings to us the defaulted operator==() -- it is able to generate operator==() using member-wise lexicographical comparison. under some circumstances, this is exactly what we need. so, in this change, if the operator==() is also implemented as a lexicographical comparison of all member variables of the class/struct in question, it is implemented using the default generated one by removing its body and marking the function as `default`. moreover, if the class happens to have other comparison operators which are implemented using lexicographical comparison, the default generated `operator<=>` is used in place of the defaulted `operator==`. sometimes, we fail to mark the operator== with the `const` specifier, in this change, to fulfil the need of C++ standard, and to be more correct, the `const` specifier is added. also, to generate the defaulted operator==, the operand should be `const class_name&`, but it is not always the case, in the class of `version`, we use `version` as the parameter type, to fulfill the need of the C++ standard, the parameter type is changed to `const version&` instead. this does not change the semantic of the comparison operator. and is a more idiomatic way to pass non-trivial struct as function parameters. please note, because in C++20, both operator== and operator<=> are symmetric, some of the operators in `multiprecision` are removed. they are the symmetric form of another variant. if they were not removed, compiler would, for instance, find ambiguous overloaded operator '=='. this change is a cleanup to modernize the code base with C++20 features. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13687	2023-04-27 10:24:46 +03:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes we applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Benny Halevy	d96a67eb57	abstract_replication_strategy: use shared_ptr in registry Enable creating shared_ptr<BaseClass> in nonstatic_class_registry using BaseClass::ptr_type and use that for abstract_replication_strategy. While at it, also clean up compressor with that respect to define compressor::ptr_type as shared_ptr<compressor> thus simplifying compressor_registry. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-10-13 12:39:36 +03:00
Avi Kivity	331eb57e17	Revert "compression: define 'class' attribute for compression and deprecate 'sstable_compression'" This reverts commit `5571ef0d6d`. It causes rolling upgrade failures. Fixes #9055. Reopens #8948.	2021-07-28 14:14:22 +03:00
Avi Kivity	5571ef0d6d	compression: define 'class' attribute for compression and deprecate 'sstable_compression' Cassandra 3.0 deprecated the 'sstable_compression' attribute and added 'class' as a replacement. Follow by supporting both. The SSTABLE_COMPRESSION variable is renamed to SSTABLE_COMPRESSION_DEPRECATED to detect all uses and prevent future misuse. To prevent old-version nodes from seeing the new name, the compression_parameters class preserves the key name when it is constructed from an options map, and emits the same key name when asked to generate an options map. Existing unit tests are modified to use the new name, and a test is added to ensure the old name is still supported. Fixes #8948. Closes #8949	2021-07-07 19:15:20 +02:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Duarte Nunes	fa2b0384d2	Replace std::experimental types with C++17 std version. Replace stdx::optional and stdx::string_view with the C++ std counterparts. Some instances of boost::variant were also replaced with std::variant, namely those that called seastar::visit. Scylla now requires GCC 8 to compile. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190108111141.5369-1-duarte@scylladb.com>	2019-01-08 13:16:36 +02:00
Juliana Oliveira	5eb76c9bc6	compress: add support for Cassandra's compression parameter This patch adds compatibility for Cassandra's "chunk_size_in_kb", and keeps Scylla's "chunk_size_kb" compression parameter. Fixes #3669 Tests: unit (release) v2: use variable instead of array v3: fix committed files Signed-off-by: Juliana Oliveira <juliana@scylladb.com> Message-Id: <20181211215840.GA7379@shenzou.localdomain>	2018-12-11 23:33:27 +00:00
Duarte Nunes	5f64e34fcc	tests: Be explicit about absence of compression Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-11-20 22:47:26 +00:00
Calle Wilund	74758c87cd	sstables::compress/compress: Make compression a virtual object Make a "compressor" an actual class, that can be implemented and registered via class registry. For "common" compressors, the objects will be shared, but complex implementors can be semi-stateful. sstable compression is split into two parts: The "static" config which is shared across shards, and a "local" one, which holds a compressor pointer. The latter is encapsulated, along with actual compressed data writers, in sstables/compress.cc. For compression (write), compression writer is instantiated with the settings active in table metadata. For decompression (read), compression reader is instantiated with the settings stored in sstable metadata, which can differ from the currently active table metadata. v2: * Structured patch sets differently (dependencies) * Added more comments/api descs * Added patch to move all sstable compression into compress.cc, effectively separating top-level virtual compressor object from sstable io knowledge v3: * Rebased v4: * Moved all sstable compression logic/knowledge into compress.cc (local compression). Merged the two patches (separation just confuses reader).	2018-02-07 10:11:45 +00:00
Duarte Nunes	e33c02aa60	cql3: Disable compression on empty properties The CQL 3.1 documentation specifies that for disabling compression, users should use an empty string: ALTER TABLE mytable WITH COMPRESSION = {'sstable_compression': ''}; However, Cassandra also accepts the absence of the sstable_compression option to disable compression. The patch 7c28ed prevented this behavior in Scylla, which this patch aims to fix. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1478639499-4183-1-git-send-email-duarte@scylladb.com>	2016-11-09 10:03:59 +02:00
Duarte Nunes	7c28ed3dfc	schema: Extract default compressor This patch extracts the definition of the default compressor into the compression_parameters class, so that the table and view creation statements don't have to explicitly deal with it. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-10-18 01:18:52 +00:00
Nadav Har'El	164c760324	Switch compression chunk default from 64 KB to 4 KB Following Cassandra, our default sstable compression chunk size is 64 KB. The big downside of this default size is that small reads need to read and uncompress a large chunk, around 32 KB (if compression halves the data size). In this patch we switch the default chunk size to 4 KB, which allows faster small reads (the report in issue #1337 was of a 60-fold speedup...). Since commit `2f56577`, large reads will not be significantly slowed down by changing to a small chunk size. The remaining potential downside of this change is lowering of the compression ratio because of the smaller chunks individually compressed. However, experimentation shows that the compression ratio is hurt somewhat, but not dramatically, by lowering the chunk size: A recent survey of Cassandra compression in https://www.percona.com/blog/2016/03/09/evaluating-database-compression-methods/ reports a compression ratio of 2 for 64 KB chunks, vs. 1.75 for 4 KB chunks. My own test on a cassandra-stress workload (whose data is relatively hard to compress), showed compression ratio 1.25 for 64 KB chunk, vs. 1.23 for 4 KB chunks. Also remember that if a user wants to control the chunk length for a particular table, he can - the 64 KB or 4 KB sizes are just the default. Fixes #1337 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <1467063335-12096-1-git-send-email-nyh@scylladb.com>	2016-06-28 08:50:24 +03:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Tomasz Grabiec	f9d6c7b026	compress: Add equality operators	2015-12-16 18:06:55 +01:00
Avi Kivity	d5cf0fb2b1	Add license notices	2015-09-20 10:43:39 +03:00
Paweł Dziepak	148d6b9db2	compress: allow empty sstable_compression Fixes #13. Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>	2015-07-08 10:47:11 +02:00
Paweł Dziepak	a0424d5d27	compressor: allow an empty map of options Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>	2015-06-29 09:36:14 +02:00
Paweł Dziepak	b520ef6172	compress: generate a std::map<sstring, sstring> of options Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>	2015-06-29 09:35:01 +02:00
Paweł Dziepak	f4ce125422	compress: use std::optional for chunk length and crc check chance Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>	2015-06-29 09:30:31 +02:00
Paweł Dziepak	9134381638	compress: accept both qualified and unqualified class names Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>	2015-06-29 09:05:50 +02:00
Paweł Dziepak	c51a430020	compress: "DeflateCompressor" is not compressor::none Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>	2015-06-29 08:25:13 +02:00
Paweł Dziepak	4899100877	compress: use '= default' for default constructor Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>	2015-06-25 16:32:58 +02:00
Paweł Dziepak	28242489c3	compress: fix include quotes Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>	2015-06-25 16:32:30 +02:00
Paweł Dziepak	53640c73fd	compress: add compression_parameters class Passing a single enum specifying a compressor type around is not enough, since there are other compression options user may want to specify. Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>	2015-06-25 13:32:13 +02:00
Raphael S. Carvalho	d1ed0744f0	schema: add sstable compressor property The field compressor is about saying which compressor algorithm must be used in compression of sstable data file. This is a small step towards compressed sstable data file. Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>	2015-06-09 11:18:56 +03:00

41 Commits
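
Commit dd932ebb2f in the log above describes two ideas: option names beginning with a dot are "hidden", and the dictionary blob is split across `.dictionary.{0,1,2,..}` options because a single option value is limited to a 16-bit length. The following is only a minimal sketch of that scheme, not the code from the commit: the function names `dict_to_options`, `options_to_dict` and `is_hidden_option`, the use of `std::map`/`std::string`, and the exact per-part limit of 65535 bytes are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <map>
#include <string>

// Assumed limit: the largest length that fits in a 16-bit field.
constexpr std::size_t max_option_len = std::numeric_limits<std::uint16_t>::max();

// Hypothetical helper: split a dictionary blob into
// ".dictionary.0", ".dictionary.1", ... option values.
std::map<std::string, std::string> dict_to_options(const std::string& dict) {
    std::map<std::string, std::string> opts;
    std::size_t part = 0;
    for (std::size_t off = 0; off < dict.size(); off += max_option_len, ++part) {
        opts[".dictionary." + std::to_string(part)] = dict.substr(off, max_option_len);
    }
    return opts;
}

// Hypothetical helper: reassemble the blob by reading parts in numeric
// order until the next part is missing.
std::string options_to_dict(const std::map<std::string, std::string>& opts) {
    std::string dict;
    for (std::size_t part = 0;; ++part) {
        auto it = opts.find(".dictionary." + std::to_string(part));
        if (it == opts.end()) {
            break;
        }
        dict += it->second;
    }
    return dict;
}

// Hypothetical helper: a leading dot marks an option as hidden, i.e. used
// to construct decompressors but not shown for other uses.
bool is_hidden_option(const std::string& name) {
    return !name.empty() && name.front() == '.';
}
```

Looking parts up by index rather than iterating the map avoids depending on string ordering, where `.dictionary.10` would sort before `.dictionary.2`.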