scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-25 19:10:42 +00:00

Author	SHA1	Message	Date
Vladimir Krivopalov	759d36a26e	sstables: Support Scylla-specific extension for writing shadowable tombstones. The original SSTables 'mc' format, as defined in Cassandra, does not provide a way to store shadowable deletion in addition to regular row deletion for materialized views. It is essential to store it because of known corner-case issues that otherwise appear. For this to work, we introduce a Scylla-specific extended flag to be set in SSTables in 'mc' format that indicates a shadowable tombstone is written after the regular row tombstone. This is deemed to be safe because shadowable tombstones are specific to materialized views and MV tables are not supposed to be imported or exported. Note that a shadowable tombstone can be written without a regular tombstone as well as along with it. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-23 16:30:42 -07:00
Vladimir Krivopalov	e168433945	sstables: Introduce a feature for shadowable tombstones in Scylla.db. This is used to indicate that the SSTables being read may contain a Scylla-specific HAS_SCYLLA_SHADOWABLE_TOMBSTONE extended flag set. If the feature is disabled, we should not honour this flag. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-23 16:30:42 -07:00
Vladimir Krivopalov	8f79f76116	sstables: Support checking row extension flags for Cassandra shadowable deletion. This flag can be only used in MV tables that are not supposed to be imported to Scylla. Since Scylla representation of shadowable tombstones differs from that of Cassandra, such SSTables are rejected on read and Scylla never sets this flag on writing. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-23 16:30:42 -07:00
Vladimir Krivopalov	e71cc5ab20	sstables: Introduce TTL limitation and special 'expired TTL' value. This allows storing expired liveness info in SSTables 3.x format without introducing a possible conflict with real TTL values. As per Cassandra, TTL cannot exceed 20 years so taking the maximum value as a special value for indicating expired liveness info is safe. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-10-10 11:44:14 -07:00
Vladimir Krivopalov	bdca27ae41	sstables: Always store only min bases in serialization_header. There previously was an inconsistency in treating min values stored in a serialization_header. They are written to or read from a Statistics.db as deltas against fixed bases, but when we parse timestamps from the data file, we need the full bases, not just deltas. This inconsistency causes wrong timestamp values if we write an sstable and then read from it using one and the same sstable object because we turn min values into bases on write and then don't adjust them back because we already have them in memory. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-25 17:23:40 -07:00
Vladimir Krivopalov	48fa088ec6	sstables: Do not parse ancestors from compaction metadata for SSTables 3.x Ancestors array has been removed starting from 'ma' format (CASSANDRA-7066). Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-09-19 17:11:43 -07:00
Vladimir Krivopalov	4bf1e9de3f	sstables: Support resetting data_consume_rows_context_m to indexable_element::cell. Set the proper parsing state when resetting to indexable_element::cell. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-08-17 10:09:19 -07:00
Vladimir Krivopalov	a497edcbda	sstables: Move promoted_index_block from types.hh to index_entry.hh. It is only being used by index_reader internally and never exposed so should not be listed in commonly used types. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-06-28 12:28:59 -07:00
Piotr Jastrzebski	a3683d6e0f	sstables 3: add serialization_header::adjust In SSTables 3, min timestamp and min deletion time in serialization header are not stored normally but instead the difference between their value and the cassandra "epoch" is stored. This is supposed to make SSTables smaller. As a consequence, we have to add the "epoch" after reading the values to obtain the actual values of min timestamp and min deletion time. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-15 09:10:48 +02:00
Piotr Jastrzebski	2b8ff15f9f	column_flags_m: add HAS_COMPLEX_DELETION Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-07 22:47:19 +02:00
Piotr Jastrzebski	f6e1c38486	Introduce column_flags_m This will be used for reading columns from data file. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-05-23 19:54:16 +02:00
Piotr Jastrzebski	d8cd8e04ed	Add unfiltered_flags_m::has_all_columns Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-05-23 16:39:52 +02:00
Piotr Jastrzebski	b849eefc8c	Use disk_string_vint_size for bytes_array_vint_size Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-05-23 16:39:52 +02:00
Piotr Jastrzebski	5ca4bfd69a	disk_array_vint_size: Remove unused Size template parameter Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-05-23 16:15:44 +02:00
Vladimir Krivopalov	56ac941a2e	Fix the order of items in stats_metadata. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-05-04 15:45:10 -07:00
Vladimir Krivopalov	5db6002720	Write serialization header to Statistics.db for SSTables 3.x. Serialization header is a new component in Statistics.db introduced in SSTables 3.0 ('ma') format. It is essential for reading the data file as it contains the base values used for delta-encoded values (timestamps, TTLs, local deletion times) and description of column types. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-05-04 15:43:17 -07:00
Vladimir Krivopalov	3e471116b4	Separate statistics for count of cells, columns and rows in column_stats. SSTables 3.0 format makes a distinction between count of cells and count of columns. In that sense, a column of a collection type counts as one column but every atomic cell in it counts as a separate cell. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-05-03 17:05:06 -07:00
Piotr Jastrzebski	2ee3d8b87b	Introduce consumer_m and data_consume_rows_context_m Those classes can handle SSTables in MC format. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-26 12:49:38 +02:00
Piotr Jastrzebski	df457166b0	Add support for 3_x stats metadata Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-24 11:30:26 +02:00
Piotr Jastrzebski	e1e23ec555	Pass sstable version to describe_type Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-24 11:30:26 +02:00
Piotr Jastrzebski	1cc1f9af5f	Pass sstable version to write methods This will allow writing different versions differently Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-24 11:30:26 +02:00
Piotr Jastrzebski	08da518dae	metadata_type: add Serialization type Ignore it while reading sstable 3_x and throw if it's present when reading 2_x. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-24 11:30:26 +02:00
Avi Kivity	28be4ff5da	Revert "Merge "Implement loading sstables in 3.x format" from Piotr" This reverts commit `513479f624`, reversing changes made to `01c36556bf`. It breaks booting. Fixes #3376.	2018-04-23 06:47:00 +03:00
Piotr Jastrzebski	b683870644	Add support for 3_x stats metadata Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-22 15:06:51 +02:00
Piotr Jastrzebski	26ab3056ae	Pass sstable version to describe_type Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-22 14:41:11 +02:00
Piotr Jastrzebski	0022c309ee	Pass sstable version to write methods This will allow writing different versions differently Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-22 14:41:10 +02:00
Piotr Jastrzebski	65fe564cd2	metadata_type: add Serialization type Ignore it while reading sstable 3_x and throw if it's present when reading 2_x. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-22 14:40:04 +02:00
Glauber Costa	b2f9958071	large_bitset: use a chunked_vector internally and simplify API save and load functions for the large_bitset were introduced by Avi with `d590e327c0`. In that commit, Avi says: "... providing iterator-based load() and save() methods. The methods support partial load/save so that access to very large bitmaps can be split over multiple tasks." The only user of this interface is SSTables. And turns out we don't really split the access like that. What we do instead is to create a chunked vector and then pass its begin() method with position = 0 and let it write everything. The problem here is that this requires the chunked vector to be fully initialized, not just reserved. If the bitmap is large enough that in itself can take a long time without yielding (up to 16ms seen in my setup). We can simplify things considerably by moving the large_bitset to use a chunked vector internally: it already uses a poor man's version of it by allocating chunks internally (it predates the chunked_vector). By doing that, we can turn save() into a simple copy operation, and do away with load altogether by adding a new constructor that will just copy an existing chunked_vector. Fixes #3341 Tests: unit (release) Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20180409234726.28219-1-glauber@scylladb.com>	2018-04-10 10:25:06 +03:00
Glauber Costa	f5c32423b8	summary: don't go through all entries when computing memory size. Summary has a function, memory_size(), that estimates the amount of memory the summary takes. It is my understanding that this is called to serve information to tooling. First, this function is inaccurate because it doesn't take into account the tokens per each entry, just the keys. But more importantly, it has to iterate over all keys which can be pretty expensive if the entries list is long. We are now keeping that in a memory area, with just pointers in the entry. So instead of iterating through the entries, we can iterate through the memory areas, which is much cheaper. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20180316120915.16809-1-glauber@scylladb.com>	2018-03-16 12:57:19 +00:00
Glauber Costa	e680c7c8cc	abstract summary entry version of the token with a token view dht::token doesn't have a trivial destructor, so destroying an array full of those can be quite expensive. If we use the same trick as we used for the summary - storing the token data in a stable memory location - we can leave the entries with a trivial destructor and destroy the chunks themselves. Those being larger, they will be more efficient to delete. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-03-15 12:24:15 -04:00
Glauber Costa	091b0f9d41	summary_entry: do not store key bytes in each summary entry If we store a bytes_view instead of bytes, that has a trivial destructor and then we don't need to destroy each element individually. To do that, we allocate the data in a couple of large arrays which can be disposed of easily and point to it. We still can't destroy trivially because of the token. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-03-14 10:46:20 -04:00
Calle Wilund	b0c0c3c0ad	sstables::types: Add optional extensions attribute to scylla metadata Allowing storing key:value pairs.	2018-02-07 10:11:46 +00:00
Vladimir Krivopalov	7e15e436de	Parse promoted index entries lazily upon request rather than immediately. Now promoted index is converted into an input_stream and skipped over instead of being consumed immediately and stored as a single buffer. The only part that is read right away is the deletion time as it is likely to be there in the already read buffer and reading it should both be cheap and avoid reading the whole promoted index if only the deletion time mark is needed. When accessed, promoted index is parsed in chunks, buffer by buffer, to limit memory consumption. Fixes #2981 Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-01-29 11:57:15 -08:00
Duarte Nunes	eeacef3089	sstables/sstables: Correctly deserialize range tombstones This patch changes the range tombstone read path to deal with correctly written non-compound range tombstones, while also maintaining backward compatibility and reading old Scylla-generated range tombstones. The fix for the write path will activate an sstable feature which will connect with this patch. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-11-23 16:45:53 +00:00
Duarte Nunes	f217dcc0ce	sstables/sstables: Don't use incorrectly serialized promoted index Promoted indexes generated before this patch by Scylla are considered incorrect if they belong to a non-compound schema, due to #2993. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-11-23 16:45:53 +00:00
Duarte Nunes	8cdd8e2431	sstables/sstables: Add supported feature list to sstables This patch adds additional metadata to the scylla sstable component. Namely, it adds a list of features that the current sstable supports. The upcoming usages of the feature list are meant for backward compatibility, but the implementation makes no such assumptions. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-11-23 16:45:53 +00:00
Avi Kivity	beffe469af	index_entry: add move constructor, assignment operators As can be seen in one of the traces in #2958, the copy constructor of index_entry is called in response to std::vector<index_entry>::push_back(index_entry&&). This is wasteful. Fix by providing the full suite of constructors/assignment operators. Message-Id: <20171116121608.5580-1-avi@scylladb.com>	2017-11-16 13:54:05 +01:00
Avi Kivity	1f66940134	sstables: switch std::deque to chunked_vector Reduce susceptibility to memory fragmentation.	2017-08-26 16:44:47 +03:00
Raphael S. Carvalho	8726ee937d	sstables: introduce size-based sampling for sstable summary Currently, a summary entry is added after min_index_interval index entries were written. Not taking into account size of index entries becomes a problem with large partitions which may create big index entries due to promoted indexes. Read performance is affected as a consequence because index entries spanned by summary are all read from disk to serve request. What we want to do is to also add a summary entry after index reaches a boundary. To deal with oversampling, we want to write 1 byte to summary for every 2000 bytes written to data file (this will be eventually made into an option in the config file). Both conditions must be met to avoid under or oversampling. That way, the amount of data needed from index file to satisfy the request is drastically reduced. Fixes #1842. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-08-11 00:30:12 -03:00
Paweł Dziepak	dc7bad9a50	sstables: cache token in index entries When a sstable reader is fast forwarded some index entries may be read (and compared) multiple times. This patch makes sure that once a token is computed we keep it around and reuse if the entry is accessed again.	2017-07-26 14:36:37 +01:00
Paweł Dziepak	bfb7b56c74	sstable: keep a pre-computed token in summary_entry Each sstable index lookup involves a binary search in the summary and each time a partition key of summary entry is compared with anything its token needs to be calculated. Since we keep summary in the memory all the time it is better to also keep the tokens around.	2017-07-26 14:36:36 +01:00
Raphael S. Carvalho	d90f46000d	streaming_histogram: move it to utils It's not specific to sstables. May be needed somewhere else in the future. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-28 01:07:13 -03:00
Duarte Nunes	d45596ae8e	sstables: Read and write shadowable tombstones This patch serializes shadowable tombstones to sstables by adding a new, incompatible atom's mask. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-04-25 11:46:33 +02:00
Tomasz Grabiec	ae72c159b1	sstables: index_reader: Introduce promoted_index_view So that we have a nice way of extracting tombstone out of it. We not always need fully parsed index.	2017-04-20 10:54:37 +02:00
Tomasz Grabiec	5b36976bf0	sstables: Store parsed promoted index in index_entry	2017-03-28 18:34:55 +02:00
Tomasz Grabiec	5af815bf20	sstables: Define deletion_time earlier	2017-03-28 18:34:55 +02:00
Tomasz Grabiec	27d86dfe18	sstables: Enable skipping to cells at data_consume_context level	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	0635d74e17	sstables: Make index_entry copyable Needed to make the index_list copyable, which is going to be needed to implement legacy get_index_entries() which returns by value, after index sharing is implemented.	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	d5e704ca1e	sstables: Make key_view constructor from bytes_view explicit	2017-03-28 18:10:39 +02:00
Raphael S. Carvalho	e28537b56f	sstables: fix calculation of memory footprint for summary size of keys weren't taken into account, so value reported via collectd is much smaller than actual footprint. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <3ca24612e4e84d1cbdea4f2d79e431a4f4479291.1482255327.git.raphaelsc@scylladb.com>	2016-12-20 18:28:47 +00:00

118 Commits