scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-25 02:50:33 +00:00

Author	SHA1	Message	Date
Taras Veretilnyk	d287b054b9	sstables: Add TemporaryScylla metadata component type Add TemporaryScylla component type to make atomic updates of SSTable Scylla metadata using temporary files and atomic rename operations possible. This will be needed in further commit to rewrite metadata together with the statistics component.	2025-12-03 23:40:10 +01:00
Michał Chojnowski	db4283b542	sstables: introduce `ms` sstable format version Introduce `ms` -- a new sstable format version which is a hybrid of Cassandra's `me` and `da`. It is based on `me`, but with the index components (Summary.db and Index.db) replaced with the index components of `da` (Partitions.db and Rows.db). As of this patch, the version is never chosen anywhere for writing sstables yet. It is only introduced. We will add it to unit tests in a later commit, and expose it to users in yet later commit.	2025-09-29 22:15:24 +02:00
Michał Chojnowski	b1984d6798	sstables: implement an alternative way to rebuild bloom filters for sstables without Index For efficiency, the cardinality of the bloom filter (i.e. the number of partition keys which will be written into the sstable) has to be known before elements are inserted into the filter. In some cases (e.g. memtables flush) this number is known exactly. But in others (e.g. repair) it can only be estimated, and the estimation might be very wrong, leading to an oversized filter. Because of that, some time ago we added a piece of logic (run after the sstable is written, but before it's sealed) which looks at the actual number of written partitions, compares it to the initial estimate (on which the size of the bloom filter was based), and if the difference is unacceptably large, it rewrites the bloom filter from partition keys contained in Index.db. But the idea to rebuild the bloom filters from index files isn't going to work with BTI indexes, because they don't store whole partition keys. If we want sstables which don't have Index.db files, we need some other way to deal with oversized filters. Partition keys can be recovered from Data.db, but that would often be way too expensive. This patch adds another way. We introduce a new component file, TemporaryHashes. This component, if written at all, contains the 16-byte murmur hash for every partition key, in order, and can be used in place of Index to reconstruct the bloom filter. (Our bloom filters are actually built from the set of murmur hashes of partition keys. The first step of inserting a partition key into a filter is hashing the key. Remembering the hashes is sufficient to build the filter later, without looking at partition keys again.) As of this patch, if the Index component is not being written, we don't allocate and populate a bloom filter during the Data.db write.
Instead, we write the murmur hashes to TemporaryHashes, and only later, after the Data write finishes, we allocate the optimal-size bloom filter, read the hashes back from TemporaryHashes, and populate the filter with them. That is suboptimal. Writing the hashes to disk (or worse, to S3) and reading them back is more expensive than building the bloom filter during the main Data pass. So ideally it should be avoided in cases where we know in advance that the partition key count estimate is good enough (which should be the case in flushes and compactions). But we defer that to a future patch. (Such a change would involve passing some flag to the sstable writer if the cardinality estimate is trustworthy, and not creating TemporaryHashes if the estimate is trustworthy).	2025-09-29 13:01:21 +02:00
Michał Chojnowski	18875621e8	sstables: introduce Partition and Rows component types BTI indexes are made up of Partitions.db and Rows.db files. In this patch we introduce the corresponding component types. In Cassandra, BTI is a separate "sstable format", with a new set of versions. (I.e. `bti-da`, as opposed to `big-me`). In this patch series, we are doing something different: we are introducing version `ms`, which is like `me`, except with `Index.db` and `Summary.db` replaced with `Partitions.db` and `Rows.db`. With a setup like that, Scylla won't yet be able to read Cassandra's BTI (`da`) files, because this patch doesn't teach Scylla about `da`. (But the way to that is open. It would just require first implementing several other things which changed between `me` and `da`). (And, naturally, Cassandra will reject `ms` sstables. But this isn't the first time we are breaking file compatibility with Cassandra to some degree. Other examples include encryption and dictionary compression). Note: Partitions.db and Rows.db contain prefixes of keys, which is sensitive information, so they have to be encrypted.	2025-09-29 13:01:21 +02:00
Pavel Emelyanov	84e1ac5248	sstables: Move versions static-assertion check to .cc file This check validates that static values of supported versions are "in sync" with each other. It's enough to do it once when compiling sstable_version.cc, not every time the header is included. refs: #1 (not that it helps noticeably, but technically it fits) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#24839	2025-07-07 13:16:21 +03:00
Avi Kivity	f3eade2f62	treewide: relicense to ScyllaDB-Source-Available-1.0 Drop the AGPL license in favor of a source-available license. See the blog post [1] for details. [1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/	2024-12-18 17:45:13 +02:00
Michael Livshin	c96708d262	add support for the ME sstable format The ME format has been introduced in Cassandra 3.11.11: `11952fae77/src/java/org/apache/cassandra/io/sstable/format/big/BigFormat.java (L123)` `d84c6e9810` It adds originating host id to sstable metadata in support of fixing loss of commit log data when moving sstables between nodes: https://issues.apache.org/jira/browse/CASSANDRA-16619 In Scylla: * The supported way to ingest sstables is via upload/, where stored commit log replay position should be disregarded (but see https://github.com/scylladb/scylla/issues/10080). * A later commit in this series implements originating host id validation for native ME sstables. Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-02-16 18:21:24 +02:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes were applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Pekka Enberg	a37eaaa022	sstables: Add support for the "md" format enum value Add the sstable_version_types::md enum value and logically extend sstable_version_types comparisons to cover also the > sstable_version_types::mc cases. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-10 18:53:04 +03:00
Piotr Jastrzebski	561ca34ec2	sstable: Make component_map version dependent Introduce sstable_version_constants that will be a proxy serving correct constants depending on the format version. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-24 11:30:26 +02:00
Avi Kivity	28be4ff5da	Revert "Merge "Implement loading sstables in 3.x format" from Piotr" This reverts commit `513479f624`, reversing changes made to `01c36556bf`. It breaks booting. Fixes #3376.	2018-04-23 06:47:00 +03:00
Piotr Jastrzebski	00756582ca	sstable: Make component_map version dependent Introduce sstable_version_constants that will be a proxy serving correct constants depending on the format version. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-22 13:46:12 +02:00
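
The message of commit b1984d6798 in the table above describes rebuilding a bloom filter from stored partition-key hashes instead of from Index.db. The following is a minimal Python sketch of that idea, not Scylla's implementation. Everything named here is hypothetical: `key_hash` uses MD5 purely as a 16-byte stand-in for the murmur hash, an in-memory buffer stands in for the TemporaryHashes component, and the `BloomFilter` sizing formula and double-hashing scheme are textbook choices rather than anything taken from the source.

```python
import hashlib
import io
import math
import struct

HASH_SIZE = 16  # the commit message states 16 bytes per partition key


def key_hash(key: bytes) -> bytes:
    # Assumption: MD5 is only a convenient 16-byte stand-in for the murmur hash.
    return hashlib.md5(key).digest()


class BloomFilter:
    """Textbook bloom filter sized from a known key count (illustrative only)."""

    def __init__(self, key_count: int, fp_rate: float = 0.01):
        n = max(1, key_count)
        self.nbits = max(8, math.ceil(-n * math.log(fp_rate) / math.log(2) ** 2))
        self.nhashes = max(1, round(self.nbits / n * math.log(2)))
        self.bits = bytearray((self.nbits + 7) // 8)

    def _positions(self, h: bytes):
        # Derive all bit positions from the two 64-bit halves of the stored hash,
        # so the original key is never needed again.
        h1, h2 = struct.unpack("<QQ", h)
        for i in range(self.nhashes):
            yield (h1 + i * h2) % self.nbits

    def add_hash(self, h: bytes) -> None:
        for p in self._positions(h):
            self.bits[p >> 3] |= 1 << (p & 7)

    def may_contain_hash(self, h: bytes) -> bool:
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(h))


def write_hashes(keys, out) -> int:
    """Data pass: append one hash per partition key, return the exact key count."""
    count = 0
    for key in keys:
        out.write(key_hash(key))
        count += 1
    return count


def build_filter_from_hashes(inp, key_count: int) -> BloomFilter:
    """After the data pass: size the filter from the real count, then replay hashes."""
    bf = BloomFilter(key_count)
    while True:
        h = inp.read(HASH_SIZE)
        if not h:
            break
        if len(h) != HASH_SIZE:
            raise ValueError("truncated hash record")
        bf.add_hash(h)
    return bf
```

In this sketch the filter is allocated only once the exact key count is known, which is the property the commit message is after; the cost is the extra write and read of the hash stream, which the message itself calls suboptimal.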

13 Commits
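
Several of the commits listed above (a37eaaa022, 561ca34ec2, db4283b542) concern ordered comparisons between format versions and a component map that depends on the version. A rough Python sketch of how those two ideas fit together follows. The names `SSTableVersion` and `index_components` are hypothetical, and the ordering `mc < md < me < ms` is an assumption drawn from the order in which the commit messages introduce the versions; the actual code is Scylla's C++ `sstable_version_types` and `sstable_version_constants`.

```python
from enum import IntEnum


class SSTableVersion(IntEnum):
    # Assumed ordering; only versions named in the commit messages are listed.
    mc = 0
    md = 1
    me = 2
    ms = 3


def index_components(version: SSTableVersion) -> tuple:
    """Return the index component file names used by a given format version."""
    if version >= SSTableVersion.ms:
        # `ms` replaces the `me` index components with the BTI ones.
        return ("Partitions.db", "Rows.db")
    return ("Index.db", "Summary.db")
```

Using an ordered enum lets checks such as "newer than `mc`" be written as plain comparisons, which is the kind of extension commit a37eaaa022 describes for `md`.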