scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-28 20:27:03 +00:00

Author	SHA1	Message	Date
Piotr Jastrzebski	a3683d6e0f	sstables 3: add serialization_header::adjust In SSTables 3, min timestamp and min deletion time in serialization header are not stored normally but instead the difference between their value and the cassandra "epoch" is stored. This is supposed to make SSTables smaller. As a consequence, we have to add the "epoch" after reading the values to obtain the actual values of min timestamp and min deletion time. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-15 09:10:48 +02:00
Piotr Jastrzebski	2b8ff15f9f	column_flags_m: add HAS_COMPLEX_DELETION Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-06-07 22:47:19 +02:00
Piotr Jastrzebski	f6e1c38486	Introduce column_flags_m This will be used for reading columns from data file. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-05-23 19:54:16 +02:00
Piotr Jastrzebski	d8cd8e04ed	Add unfiltered_flags_m::has_all_columns Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-05-23 16:39:52 +02:00
Piotr Jastrzebski	b849eefc8c	Use disk_string_vint_size for bytes_array_vint_size Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-05-23 16:39:52 +02:00
Piotr Jastrzebski	5ca4bfd69a	disk_array_vint_size: Remove unused Size template parameter Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-05-23 16:15:44 +02:00
Vladimir Krivopalov	56ac941a2e	Fix the order of items in stats_metadata. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-05-04 15:45:10 -07:00
Vladimir Krivopalov	5db6002720	Write serialization header to Statistics.db for SSTables 3.x. Serialization header is a new components in Statistics.db introduced in SSTables 3.0 ('ma') format. It is essential for reading data file as it contains the base values used for delta-encoded values (timestamps, TTLs, local deletion times) and description of column types. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-05-04 15:43:17 -07:00
Vladimir Krivopalov	3e471116b4	Separate statistics for count of cells, columns and rows in column_stats. SSTables 3.0 format makes a distinction between count of cells and count of columns. In that sense, a column of a collection type counts as one column but every atomic cell in it counts as a separate cell. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-05-03 17:05:06 -07:00
Piotr Jastrzebski	2ee3d8b87b	Introduce consumer_m and data_consume_rows_context_m Those classes can handle SSTables in MC format. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-26 12:49:38 +02:00
Piotr Jastrzebski	df457166b0	Add support for 3_x stats metadata Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-24 11:30:26 +02:00
Piotr Jastrzebski	e1e23ec555	Pass sstable version to describe_type Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-24 11:30:26 +02:00
Piotr Jastrzebski	1cc1f9af5f	Pass sstable version to write methods This will allow writing different versions differently Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-24 11:30:26 +02:00
Piotr Jastrzebski	08da518dae	metadata_type: add Serialization type Ignore it while reading sstable 3_x and throw if it's present when reading 2_x. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-24 11:30:26 +02:00
Avi Kivity	28be4ff5da	Revert "Merge "Implement loading sstables in 3.x format" from Piotr" This reverts commit `513479f624`, reversing changes made to `01c36556bf`. It breaks booting. Fixes #3376.	2018-04-23 06:47:00 +03:00
Piotr Jastrzebski	b683870644	Add support for 3_x stats metadata Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-22 15:06:51 +02:00
Piotr Jastrzebski	26ab3056ae	Pass sstable version to describe_type Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-22 14:41:11 +02:00
Piotr Jastrzebski	0022c309ee	Pass sstable version to write methods This will allow writing different versions differently Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-22 14:41:10 +02:00
Piotr Jastrzebski	65fe564cd2	metadata_type: add Serialization type Ignore it while reading sstable 3_x and throw if it's present when reading 2_x. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-22 14:40:04 +02:00
Glauber Costa	b2f9958071	large_bitset: use a chunked_vector internally and simplify API save and load functions for the large_bitset were introduced by Avi with `d590e327c0`. In that commit, Avi says: "... providing iterator-based load() and save() methods. The methods support partial load/save so that access to very large bitmaps can be split over multiple tasks." The only user of this interface is SSTables. And turns out we don't really split the access like that. What we do instead is to create a chunked vector and then pass its begin() method with position = 0 and let it write everything. The problem here is that this require the chunked vector to be fully initialized, not just reserved. If the bitmap is large enough that in itself can take a long time without yielding (up to 16ms seen in my setup). We can simplify things considerably by moving the large_bitset to use a chunked vector internally: it already uses a poor man's version of it by allocating chunks internally (it predates the chunked_vector). By doing that, we can turn save() into a simple copy operation, and do away with load altogether by adding a new constructor that will just copy an existing chunked_vector. Fixes #3341 Tests: unit (release) Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20180409234726.28219-1-glauber@scylladb.com>	2018-04-10 10:25:06 +03:00
Glauber Costa	f5c32423b8	summary: don't go through all entries when computing memory size. Summary has a function, memory_size(), that estimates the amount of memory the summary takes. It is my understanding that this is called to serve information to tooling. First, this function is innacurate because it doesn't take into account the tokens per each entry, just the keys. But more importantly, it has to iterate over all keys which can be pretty expensive if the entries list is long. We are now keeping that in a memory area, with just pointers in the entry. So instead of iterating through the entries, we can iterate through the memory areas, which is much cheaper. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20180316120915.16809-1-glauber@scylladb.com>	2018-03-16 12:57:19 +00:00
Glauber Costa	e680c7c8cc	abstract summary entry version of the token with a token view dht::token doesn't have a trivial destructor, so destroying an array full of those can be quite expensive. If we use the same trick as we used for the summary - storing the token data in a stable memory location - we can leave the entries with a trivial destructor and destroy the chunks themselves. Those being larger, they will be more efficient to delete. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-03-15 12:24:15 -04:00
Glauber Costa	091b0f9d41	summary_entry: do not store key bytes in each summary entry If we store a bytes_view instead of bytes, that has a trivial destructor and then we don't need to destroy each element individually. To do that, we allocate the data in a couple of large arrays which can be disposed of easily and point to it. We still can't destroy trivially because of the token. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-03-14 10:46:20 -04:00
Calle Wilund	b0c0c3c0ad	sstables::types: Add optional extensions attribute to scylla metadata Allowing storing key:value pairs.	2018-02-07 10:11:46 +00:00
Vladimir Krivopalov	7e15e436de	Parse promoted index entries lazily upon request rather than immediately. Now promoted index is converted into an input_stream and skipped over instead of being consumed immediately and stored as a single buffer. The only part that is read right away is the deletion time as it is likely to be there in the already read buffer and reading it should both be cheap and prevent from reading the whole promoted index if only deletion time mark is needed. When accessed, promoted index is parsed in chunks, buffer by buffer, to limit memory consumption. Fixes #2981 Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-01-29 11:57:15 -08:00
Duarte Nunes	eeacef3089	sstables/sstables: Correctly deserialize range tombstones This patch changes the range tombstone read path to deal with correctly written non-compound range tombstones, while also maintaining backward compatibility and reading old Scylla-generated range tombstones. The fix for the write path will activate an sstable feature which will connect with this patch. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-11-23 16:45:53 +00:00
Duarte Nunes	f217dcc0ce	sstables/sstables: Don't use incorrectly serialized promoted index Promoted indexes generated before this patch by Scylla are considered incorrect if they belong to a non-compound schema, due to #2993. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-11-23 16:45:53 +00:00
Duarte Nunes	8cdd8e2431	sstables/sstables: Add supported feature list to sstables This patch adds additional metadata to the scylla sstable component. Namely, it adds a list of features that the current sstable supports. The upcoming usages of the feature list are meant for backward compatibility, but the implementation makes no such assumptions. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-11-23 16:45:53 +00:00
Avi Kivity	beffe469af	index_entry: add move constructor, assigment operators As can be seen in one of the traces in #2958, the copy constructor of index_entry is called in response to std::vector<index_entry>::push_back(index_entry&&). This is wasteful. Fix by providing the full suite of constructors/assignment operators. Message-Id: <20171116121608.5580-1-avi@scylladb.com>	2017-11-16 13:54:05 +01:00
Avi Kivity	1f66940134	sstables: switch std::deque to chunked_vector Reduce susceptibility to memory fragmentation.	2017-08-26 16:44:47 +03:00
Raphael S. Carvalho	8726ee937d	sstables: introduce size-based sampling for sstable summary Currently, a summary entry is added after min_index_interval index entries were written. Not taking into account size of index entries becomes a problem with large partitions which may create big index entries due to promoted indexes. Read performance is affected as a consequence because index entries spanned by summary are all read from disk to serve request. What we wanna do is to also add a summary entry after index reaches a boundary. To deal with oversampling, we want to write 1 byte to summary for every 2000 bytes written to data file (this will be eventually made into an option in the config file). Both conditions must be met to avoid under or oversampling. That way, the amount of data needed from index file to satify the request is drastically reduced. Fixes #1842. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-08-11 00:30:12 -03:00
Paweł Dziepak	dc7bad9a50	sstables: cache token in index entries When a sstable reader is fast forwarded some index entries may be read (and compared) multiple times. This patch makes sure that once a token is computed we keep it around and reuse if the entry is accessed again.	2017-07-26 14:36:37 +01:00
Paweł Dziepak	bfb7b56c74	sstable: keep a pre-computed token in summary_entry Each sstable index lookup involves a binary search in the summary and each time a partition key of summary entry is compared with anything its token needs to be calculated. Since we keep summary in the memory all the time it is better to also keep the tokens around.	2017-07-26 14:36:36 +01:00
Raphael S. Carvalho	d90f46000d	streaming_histogram: move it to utils It's not specific to sstables. May be needed somewhere else in the future. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-28 01:07:13 -03:00
Duarte Nunes	d45596ae8e	sstables: Read and write shadowable tombstones This patch serializes shadowable tombstones to sstables by adding a new, incompatible atom's mask. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-04-25 11:46:33 +02:00
Tomasz Grabiec	ae72c159b1	sstables: index_reader: Introduce promoted_index_view So that we have a nice way of extracting tombstone out of it. We not always need fully parsed index.	2017-04-20 10:54:37 +02:00
Tomasz Grabiec	5b36976bf0	sstables: Store parsed promoted index in index_entry	2017-03-28 18:34:55 +02:00
Tomasz Grabiec	5af815bf20	sstables: Define deletion_time earlier	2017-03-28 18:34:55 +02:00
Tomasz Grabiec	27d86dfe18	sstables: Enable skipping to cells at data_consume_context level	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	0635d74e17	sstables: Make index_entry copyable Needed to make the index_list copyable, which is going to be needed to implement legacy get_index_entries() which returns by value, after index sharing is implemented.	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	d5e704ca1e	sstables: Make key_view constructor from bytes_view explicit	2017-03-28 18:10:39 +02:00
Raphael S. Carvalho	e28537b56f	sstables: fix calculation of memory footprint for summary size of keys weren't taken into account, so value reported via collectd is much smaller than actual footprint. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <3ca24612e4e84d1cbdea4f2d79e431a4f4479291.1482255327.git.raphaelsc@scylladb.com>	2016-12-20 18:28:47 +00:00
Avi Kivity	3c3a18f222	sstables: move sharding metadata from Statistics component to a new Scylla component The Cassandra derived sstable tools (and likely Cassandra itself) object to a new sub-component in the Statistics component; create a new Scylla component instead to host this data.	2016-12-07 15:20:13 +02:00
Avi Kivity	bdd11648ac	sstables: add intra-node sharding metadata Add a metadata component that describes token ranges that are spanned by this sstable. With the current sharding algorithm, where each shard owns a single token range, the first/last partition key is sufficient to describing sharding information, but for multi-range algorithms, this is not sufficient.	2016-11-22 21:44:25 +02:00
Avi Kivity	316ef1d70a	sstables: automate writing statistics components Add a virtual funnction to metadata_base so we can loop over statistics components when writing them.	2016-11-22 21:05:06 +02:00
Avi Kivity	7c5e6525ef	sstables: switch statistics components to generic serialized_size() implementation	2016-11-22 20:20:38 +02:00
Glauber Costa	4310635bae	move estimated histogram to utils Nothing sstable-specific in it, really. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-08-31 15:13:23 -04:00
Glauber Costa	ffc2131c51	decouple estimated_histogram from sstables There is nothing really that fundamentally ties the estimated histogram to sstables. This patch gets rid of the few incidental ties. They are: - the namespace name, which is now moved to utils. Users inside sstables/ now need to add a namespace prefix, while the ones outside have to change it to the right one - sstables::merge, which has a very non-descriptive name to begin with, is changed to a more descriptive name that can live inside utils/ - the disk_types.hh include has to be removed - but it had no reason to be here in the first place. Todo, is to actually move the file outside sstables/. That is done in a separate step for clarity. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-08-31 15:13:23 -04:00
Nadav Har'El	1d38a69e49	sstables: expose promoted index in index entry Our index_entry type, holding one partition's entry that we read from the index file, already contained the "_promoted_index" which we read from disk - as an unparsed byte buffer. But there wasn't any API to access this buffer after it was read. This patch adds a trivial getter, to get a read-only view of this buffer. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2016-08-07 17:47:06 +03:00
Nadav Har'El	c647d917e0	sstables: move to_bytes_view to header file Move the to_bytes_view(temporary_buffer<char>) function from source file to header file where is can be used in more places. This saves one use of reinterpret_cast (which we are no re-evaluating), and moreover, we want to use this function also in the promoted index code (to return a bytes_view from the promoted index which was saved as a temporary_buffer). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <1468761437-27046-1-git-send-email-nyh@scylladb.com>	2016-07-17 16:29:26 +03:00

1 2 3

110 Commits