scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-23 01:50:35 +00:00

Author	SHA1	Message	Date
Tomasz Grabiec	d523c60629	sstables: Push fragments from mp_row_consumer so that parser is interrupted less Currently we return proceed::no after every mutation_fragment which is to be consumed. This forces the parser to save and reload its state often. This can be avoided if we push the fragments directly from mp_row_consumer; then we return proceed::no only when the buffer fills up. tests/perf/perf_fast_forward shows a 15% increase in throughput of a large partition scan, from 1.34M frag/s to 1.55M frag/s. Message-Id: <1490882700-22684-1-git-send-email-tgrabiec@scylladb.com>	2017-04-05 18:10:54 +03:00
Tomasz Grabiec	3fbc0bed6e	sstables: sstable_streamed_mutation: use index in fast_forward_to()	2017-03-28 18:34:55 +02:00
Tomasz Grabiec	5b36976bf0	sstables: Store parsed promoted index in index_entry	2017-03-28 18:34:55 +02:00
Tomasz Grabiec	a2a8312c78	sstables: Add trace-level logging for sstable consumption	2017-03-28 18:34:55 +02:00
Tomasz Grabiec	5e34743882	sstables: Make parsing throw exception on malformed promoted index Will be easier to propagate failure to upper layers once parsing is reused in the index_reader. The old behavior of ignoring parsing failures is preserved, but the error is logged now.	2017-03-28 18:34:55 +02:00
Tomasz Grabiec	123b102dd6	sstables: Skip to next partition using index Slicing front of a very large partition: Before: offset=0, read=1, time [s]=0.110960, frags=1, frag/s=9, aio=992, [KiB]=126956, blocked=924, dropped=0, cpu=92.4%. After: offset=0, read=1, time [s]=0.000784, frags=1, frag/s=1276, aio=3, [KiB]=344, blocked=2, dropped=1, cpu=37.3%.	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	a9252dfc58	sstables: Use separate index readers for lower and upper bounds So that lower bound can be advanced within the range.	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	27d86dfe18	sstables: Enable skipping to cells at data_consume_context level	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	e36979da47	sstables: index_reader: Use sstable's schema Makes for a simpler interface.	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	1f1b516b31	sstables: Remove use of forwarding wrapper	2017-03-10 14:42:22 +01:00
Tomasz Grabiec	d7afab21e7	sstables: Implement sstable_streamed_mutation::fast_forward_to() Handling of forwarding is done inside mp_row_consumer, because it allows us to filter out irrelevant data sooner and thus more efficiently. Because the static row can now be skipped as well, _skip_clustering_row was renamed to the more generic _skip_in_progress.	2017-03-10 14:42:22 +01:00
Tomasz Grabiec	4750216387	sstables: Extract and use clustering_ranges_walker Extracted from mp_row_consumer.	2017-03-10 14:42:22 +01:00
Tomasz Grabiec	01374c41f2	sstables: Move workaround for out-of-order range tombstones to mp_row_consumer This is a preliminary step before adding support for fast-forwarding to mp_row_consumer, so that range handling can be solely in mp_row_consumer rather than split between it and sstable_streamed_mutation. This also alleviates #2080 by reading all tombstones only up to the first row, after that range tombstones are treated like other fragments.	2017-03-10 14:42:22 +01:00
Tomasz Grabiec	d41a7c5eb4	sstables: Drop default mp_row_consumer constructor	2017-03-10 14:42:22 +01:00
Tomasz Grabiec	084747b1ee	sstables: streamed_mutation: Stop reading when end of slice reached As part of this change, skip detection is refactored. This simplifies reasoning about mp_row_consumer's state a bit because now is_mutation() is not reset externally and only depends on the current position of the reader. It will prove useful when we extend the mutation reader to decide if it should skip to the next partition up front before calling _context.read(), so that we can for instance skip using the index instead. Fixes #2088.	2017-03-10 14:42:19 +01:00
Tomasz Grabiec	55358cacc5	sstables: Switch is_in_range() to position_in_partition Makes it immune to #1446 and is a prerequisite for implementing forwarding in mp_row_consumer.	2017-03-09 21:15:11 +01:00
Tomasz Grabiec	892d4a2165	db: Enable creating forwardable readers via mutation_source Right now all mutation source implementations will use make_forwardable() wrapper.	2017-02-23 18:50:44 +01:00
Tomasz Grabiec	33457cc9a9	sstables: Fix detection of repeated tombstones The check was not catching range tombstone repeated immediately after itself. Message-Id: <1487596098-17409-1-git-send-email-tgrabiec@scylladb.com>	2017-02-20 15:35:15 +00:00
Tomasz Grabiec	cc439df542	Revert "sstables: Simplify sstable_streamed_mutation::read_next()" This reverts commit `1e2c01ff49`. We do not detect repeated tombstone if it follows an in-range tombstone following a skipped clustering row, because _in_progress will be disengaged after such tombstone is emitted. Message-Id: <1487596080-21480-1-git-send-email-tgrabiec@scylladb.com>	2017-02-20 15:34:58 +00:00
Tomasz Grabiec	1e2c01ff49	sstables: Simplify sstable_streamed_mutation::read_next() mp_row_consumer doesn't split row fragments on repeated range tombstones any more.	2017-02-13 16:12:16 +01:00
Tomasz Grabiec	6324876f24	sstables: Emit only relevant range tombstones	2017-02-13 16:12:16 +01:00
Paweł Dziepak	354ce0b2c7	mutation_fragment: make write access more explicit mutation_fragments are going to be caching their size in memory. In order to be able to invalidate that correctly, they need to know when that size may change (but avoid invalidation when it is not necessary).	2017-02-09 10:49:46 +00:00
Paweł Dziepak	5905729c4a	sstables: read counter cells	2017-02-02 10:35:14 +00:00
Paweł Dziepak	19ad35610b	sstables: do not discard future returned by fast_forward_to() continuous_data_consumer::fast_forward_to() returns a future which was later ignored by data_consume_context::fast_forward_to(). With the current implementation, the future in question is always ready and that's why the problem didn't manifest itself in the form of crashes or invalid results. Message-Id: <20170120105746.7300-1-pdziepak@scylladb.com>	2017-01-20 12:22:17 +01:00
Benoît Canet	bcc826cc34	mutation_reader: Short circuit the read path on empty range Add a boolean to short circuit the read path on empty range hoping for some speedup. tested in read write with cs using: cl=QUORUM duration=1m -mode native cql3 -rate threads=700 -node localhost Will do some additional benchmark. Fixes #1056 Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <20170118194451.16836-1-benoit@scylladb.com>	2017-01-20 10:05:40 +00:00
Raphael S. Carvalho	eed2a7d065	sstables: group sstable components that can be shared among shards We intend to share immutable sstable components among shards to reduce excessive memory usage when resharding shared sstables. This change is about grouping those components into a structure, and using foreign ptr to make sure that the structure will be deleted by whichever shard created it. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-01-06 15:16:19 -02:00
Asias He	e5485f3ea6	Get rid of query::partition_range Use dht::partition_range instead	2016-12-19 08:09:25 +08:00
Piotr Jastrzebski	4fe989d58e	Cleanup sstables::mutation_reader::impl Pointer to sstable seems unnecessary. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <a45e8853af2b5f896ec44144fbc26d3325a5ec0c.1479123740.git.piotr@scylladb.com>	2016-11-14 11:52:52 +00:00
Avi Kivity	a35136533d	Convert ring_position and token ranges to be nonwrapping Wrapping ranges are a pain, so we are moving wrap handling to the edges. Since cql can't generate wrapping ranges, this means thrift and the ring maintenance code; also range->ring transformations need to merge the first and last ranges. Message-Id: <1478105905-31613-1-git-send-email-avi@scylladb.com>	2016-11-02 21:04:11 +02:00
Piotr Jastrzebski	27726cecff	Clean up position_in_partition. Introduce position_in_partition_view and use it in position() method in mutation_fragment, range_tombstone, static_row and clustering_row. Clean up comparators in position_in_partition. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <c65293c71a6aa23cf930ed317fb63df1fdc34fd1.1477399763.git.piotr@scylladb.com>	2016-10-25 15:13:20 +01:00
Paweł Dziepak	20bfa1fa52	sstables: drop sstable::{lower, upper}_bound() Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	6755a679f6	drop key readers key_readers weren't used since introduction of continuity flag to cache entries. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	c63e88d556	sstables: implement mutation_reader::impl::fast_forward_to() This patch allows sstable readers to be fast forwarded without making it necessary to recreate the reader (and dropping all buffers in the process). It is built on top of index_reader and ability of data_consume_context to be fast forwarded. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	a530762277	sstables: introduce index_reader index_reader is a helper that implements index lookups. Its goal is to avoid dropping read buffers if they still may be needed (for example to get end bound of the range or after fast forwarding the reader). Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	f49a9e0d64	sstables: drop unused read_range_rows() overload That overload was used only by unit test and violated guarantee that partition range lives until mutation reader is done. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	25b91c51e2	sstables: add data_consume_rows_context::reset() reset() is going to be used to restore valid state after fast forwarding the reader. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Raphael S. Carvalho	004617839d	database: check bloom filter of all sstables earlier All sstables will now have their bloom filter checked in a single pass before the reader iterates through all candidates. It's possible that we will need to futurize the procedure if it holds the cpu for too long. This change is also a step towards the optimization that will rule out sstables based on clustering filter. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-09-02 10:50:08 -03:00
Piotr Jastrzebski	3607d99269	Remove clustering_key_filtering_context. Remove clustering_key_filter_factory and clustering_key_filtering_context. Use partition_slice directly with a static get_ranges method. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-08-30 20:31:55 +02:00
Piotr Jastrzebski	b05b90b3a5	Introduce clustering_key_filter_ranges. This fixes the problem of multiple concurrent get_ranges calls. Previously each call was invalidating the result of the previous call. Now they don't step on each other's toes. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-08-30 19:46:38 +02:00
Piotr Jastrzebski	7c9de37ef9	Remove clustering_key_filtering_context::want_static_columns It's always true and clustering_key_filtering_context is going away so the first step is to get rid of this method. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-08-25 08:53:31 +02:00
Paweł Dziepak	e60bb83688	sstables: optimise clustering rows filtering Clustering rows in the sstables are sorted in the ascending order so we can use that to minimise number of comparisons when checking if a row is in the requested range. Refs #1544. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Reviewed-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <1471608921-30818-1-git-send-email-pdziepak@scylladb.com>	2016-08-19 18:11:11 +03:00
Nadav Har'El	0d00da7f7f	sstables: don't forget to read static row [v2: fix check for static column (don't check if the schema is not compound) and move want-static-columns flag inside the filtering context to avoid changing all the callers.] When a CQL request asks to read only a range of clustering keys inside a partition, we actually need to read not just these clustering rows, but also the static columns and add them to the response (as explained by Tomek in issue #1568). With the current code, that CQL request is translated into an sstable::read_row() with a clustering-key filter. But this currently only reads the requested clustering keys - NOT the static columns. We don't want sstable::read_row() to unconditionally read the static columns from disk because if, for example, they are already cached, we might not want to read them from disk. We don't have such a partial-partition cache yet, but we are likely to have one in the future. This patch adds to the clustering key filter object a flag of whether we need to read the static columns (actually, it's a function, returning this flag per partition, to match the API for the clustering-key filtering). When sstable::read_row() sees the flag for this partition is true, it also requests to read the static columns. Currently, the code always passes "true" for this flag - because we don't have the logic to cache partially-read partitions. The current find_disk_ranges() code does not yet support returning a non-contiguous byte range, so this patch, if it notices that this partition really has static columns in addition to the range it needs to read, falls back to reading the entire partition. This is a correct solution (and fixes #1568) but not the most efficient solution. Because static columns are relatively rare, let's start with this solution (correct but less efficient when there are static columns); providing the non-contiguous reading support is left as a FIXME.
Fixes #1568 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <1471124536-19471-1-git-send-email-nyh@scylladb.com>	2016-08-15 12:30:19 +03:00
Nadav Har'El	c2e4f5ba16	Avoid some warnings in debug build The sanitizer of the debug build warns when a "bool" variable is read when containing a value not 0 or 1. In particular, if a class has an uninitialized bool field, which class logic allows to only be set later, then "move"ing such an object will read the uninitialized value and produce this warning. This patch fixes four of these warnings seen in sstable_test by initializing some bool fields to false, even though the code doesn't strictly need this initialization. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <1470744318-10230-1-git-send-email-nyh@scylladb.com>	2016-08-09 13:21:45 +01:00
Nadav Har'El	e005762271	sstable: avoid copying non-existent value The promoted-index reading code contained a bug where it copied the value of a disengaged optional (this non-value was never used, but it was still copied). Fix it by keeping the optional<> as such longer. This bug caused tests/sstable_test in the debug build to crash (the release build somehow worked). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <1470742418-8813-1-git-send-email-nyh@scylladb.com>	2016-08-09 14:35:18 +03:00
Nadav Har'El	022a69caea	sstables: promoted index read support This patch adds support for more efficiently reading small parts of a large partition, without reading the entire partition as we had to do so far. This is done using the "promoted index". The "promoted index" is stored in the sstable index file, and provides for each large sstable row ("partition" in CQL nomenclature) a sample of the column names at (for example) 64KB intervals. This means that when we read a slice of columns (e.g., cql rows), or page through a large partition, we do not have to read the entire partition from disk. This patch only implements the read side of promoted index - a later patch will add the write-side support (i.e., writing the promoted index to the index file while saving the sstable). Nevertheless this patch can already be tested by reading existing sstables from Cassandra which include a promoted index - such as the one included in the test in the previous patch. The use of the promoted index currently has two limitations: 1. It is only used when reading a single partition with sstable::read_row(), not when scanning through many partitions with sstable::read_range_rows() or sstable::read_rows(). 2. It is only used when filtering a single clustering-key range, rather than a list of disjoint ranges. A single range is the common case. These two issues will be improved later. In the meantime, in those unsupported cases we simply continue to read entire partitions, so we're not worse off than before. Also note that this patch only helps when sstable::read_row() is used with a clustering-key prefix (i.e., a slice). Our higher-level request handling code may decide to read an entire partition into the cache, and not use a clustering-key prefix at all when reading. We will need to independently improve the high-level code to use read_row()'s slicing capabilities when paging through large partitions, for example. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2016-08-07 17:47:09 +03:00
Nadav Har'El	6cdf5684f5	sstables: introduce find_disk_ranges() Our sstable reading code is currently hard-coded to read entire partitions, even if we know that only a subset of the columns are requested. This patch introduces find_disk_ranges(), a function to find the ranges of bytes we need to read from the sstable data file to guarantee that the desired columns from the desired partition are read. The returned range may be the entire byte range of the given partition - as found using the summary and index files - but if the index contains a "promoted index" (basically a sample of column positions for each key) we may return a smaller range. The "disk_read_range" type introduced in the previous patch is extended here to support reading a partial partition - by including additional information which would be missed when reading only part of a partition (viz., the partition key and the partition's tombstone). This function isn't used in this patch - we will wire its use in the next patch, which will complete the read-side support for the promoted index. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2016-08-07 17:47:08 +03:00
Nadav Har'El	4e5a09538d	make column-name parsing code public The "struct column" code in partition.cc is generally useful code for parsing serialized column names from the sstable. It is currently private inside the "mp_row_consumer" class. But in a next patch we'll also want to use it in the "sstable" class, for the promoted-index parsing code, which among other things also needs to deserialize column names. The trivial fix, in this patch, is to make this code "public". However, for now it is still available only in partition.cc. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2016-08-07 17:47:04 +03:00
Nadav Har'El	1975abbfd6	sstables: disk_read_range Currently, the main sstable data parsing entry point data_consume_rows() takes a contiguous range of bytes to read from disk and parse. This range is supposed to be an entire partition or contiguous group of partitions, and is self-contained (can be parsed without extra information about the identity of these partitions). For the promoted index feature (which we will add in a following patch) we will want the range to span only a part of a partition, and will need the caller to provide some information not available to the parser (such as the partition's key). In the future, we will also want to support a vector of byte ranges, instead of just one. So in preparation for this, this patch simply replaces the start/end pair by a new class disk_read_range, which can be easily extended in later patches. No new functionality is introduced in this patch. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2016-08-07 17:47:02 +03:00
Paweł Dziepak	3a27582cd4	sstables: allow streamed_mutation to outlive mutation_reader This patch makes sstable_streamed_mutation keep a reference to the sstable_data_source object, which contains the full state necessary to read the sstable. That state is also shared with the parent mutation_reader (only for range queries), but now its lifetime is appropriately extended if the mutation_reader is destroyed before the streamed_mutation. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-29 15:53:09 +01:00
Duarte Nunes	25a44ee6cf	sstables: Validate static cell is on static column This patch enforces compatibility between a cell and the corresponding column definition with regards to them being static. [tgrabiec: Fixed typo in "definition"] Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1469699532-26984-1-git-send-email-duarte@scylladb.com>	2016-07-28 12:01:31 +02:00
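
Commit e60bb83688 above notes that clustering rows in an sstable are sorted in ascending order, which lets the reader minimise comparisons when checking whether a row falls in a requested range. The following is a minimal sketch of that idea only, not the code from the commit: `range_walker` and `filter_sorted` are hypothetical names, keys are plain integers rather than clustering keys, and ranges are assumed to be closed, sorted and disjoint.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for illustration; NOT ScyllaDB's clustering_ranges_walker.
// Ranges are closed [first, second], sorted ascending and non-overlapping.
class range_walker {
    std::vector<std::pair<int, int>> _ranges;
    std::size_t _current = 0; // first range that may still contain a future key
public:
    explicit range_walker(std::vector<std::pair<int, int>> ranges)
        : _ranges(std::move(ranges)) {}

    // Must be called with non-decreasing keys. Because keys only grow,
    // ranges that end before the key can be dropped for good, so each
    // range boundary is passed at most once over the whole scan.
    bool contains(int key) {
        while (_current < _ranges.size() && key > _ranges[_current].second) {
            ++_current;
        }
        return _current < _ranges.size() && key >= _ranges[_current].first;
    }

    // True once every range lies behind the last key seen.
    bool out_of_ranges() const { return _current == _ranges.size(); }
};

// Keeps only the keys that fall inside one of the ranges; stops early
// once no range can match any later key.
std::vector<int> filter_sorted(const std::vector<int>& keys,
                               std::vector<std::pair<int, int>> ranges) {
    range_walker walker(std::move(ranges));
    std::vector<int> out;
    for (int key : keys) {
        if (walker.contains(key)) {
            out.push_back(key);
        }
        if (walker.out_of_ranges()) {
            break;
        }
    }
    return out;
}
```

The cursor never moves backwards, so a scan over n keys and m ranges costs O(n + m) comparisons instead of checking every key against every range. The early exit mirrors, in spirit, the "stop reading when end of slice reached" change in 084747b1ee.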


143 Commits