scylladb

Author	SHA1	Message	Date
Botond Dénes	47e07b787e	restricted_mutation_reader: restrict based on memory consumption Restrict readers based on their memory consumption instead of the count of top-level readers. To do this, an interposer is installed at the input_stream level which tracks the buffers emitted by the stream. This way we can have an accurate picture of the readers' actual memory consumption. New readers will consume 16k units from the semaphore up-front, to account for their own memory consumption apart from the buffers they will allocate. Creating the reader will be deferred until there are enough resources to create it. As before, only new readers will be blocked on an exhausted semaphore; existing readers can continue to work.	2017-10-03 12:44:12 +03:00
Botond Dénes	0a07e9e7c7	mutation_reader.hh: Move restricted_reader related code In preparation for make_restricted_reader taking a mutation_source as its argument.	2017-10-03 12:39:22 +03:00
Avi Kivity	78eae8bf48	Revert "Merge "Make restricting_mutation_reader more accurate" from Botond" This reverts commit `c6e5dcc556`, reversing changes made to `19b21a0ab2`. Fails to build, and the author has more changes.	2017-10-03 11:58:59 +03:00
Botond Dénes	43dba8f173	Update reader restriction related metrics Update the description of the existing reader count metrics and add memory consumption metrics.	2017-09-20 11:16:21 +03:00
Botond Dénes	33e97e7457	restricted_mutation_reader: restrict based on memory consumption Restrict readers based on their memory consumption instead of the count of top-level readers. To do this, an interposer is installed at the input_stream level which tracks the buffers emitted by the stream. This way we can have an accurate picture of the readers' actual memory consumption. New readers will consume 16k units from the semaphore up-front, to account for their own memory consumption apart from the buffers they will allocate. Creating the reader will be deferred until there are enough resources to create it. As before, only new readers will be blocked on an exhausted semaphore; existing readers can continue to work.	2017-09-20 11:14:35 +03:00
Botond Dénes	e4a9e55e0d	mutation_reader.hh: Move restricted_reader related code In preparation for make_restricted_reader taking a mutation_source as its argument.	2017-09-20 11:12:57 +03:00
Tomasz Grabiec	8a9f0f86e7	mutation_source: Introduce mutation_source::make_partition_presence_checker() Every mutation source can have a presence checker. By default, all of them answer "maybe contains". Having this at the mutation_source level will be useful for simplifying the cache update flow. The cache can ask the right snapshot for a presence checker rather than relying on the database to know when and how to make the right one, i.e. the one that preserves all invariants. This will be especially useful once all updates of the cache's underlying mutation source (e.g. the sstable list) have to go through the cache for safety reasons.	2017-09-04 10:04:29 +02:00
Tomasz Grabiec	065feb1b7b	mutation_reader: Move definitions up in the header	2017-09-04 10:04:29 +02:00
Tomasz Grabiec	4e4839082b	mutation_reader: Use constructor delegation to reduce code duplication	2017-09-04 10:04:29 +02:00
Duarte Nunes	7fb6a74302	combined_mutation_reader: Drop exhausted readers if not in FF mode Exhausted readers can be fast forwarded, so we have to keep them around. However, if the current reader is not fast forwardable, then we can drop those readers and their buffers. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-08-14 14:37:27 +02:00
Duarte Nunes	0b53f88a42	combined_mutation_reader: Remove superfluous mutation_readers list The _all_readers variable can do the same job. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-08-14 14:37:27 +02:00
Botond Dénes	3e97a5cd6b	Remove range_sstable_reader range_sstable_reader is replaced with combined_mutation_reader, using the incremental_reader_selector.	2017-08-10 12:38:10 +03:00
Botond Dénes	a6b9186cab	Add reader_selector to combined_mutation_reader combined_mutation_reader now accepts as a constructor argument a reader_selector instance whose task is to create new readers on each call to operator()(), if needed and possible. This way it is possible to control how readers are created through different specializations of reader_selector. The previous logic is refactored into list_reader_selector, which uses a pre-provided list of mutation_readers and forwards all of them to combined_mutation_reader at once.	2017-08-10 12:37:40 +03:00
Tomasz Grabiec	ddfcf64966	mutation_source: Make copying cheaper Cache readers will need to take snapshots by copying the mutation_source. That's going to happen quite often, so make copying cheaper.	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	58d5e1393b	mutation_reader: Introduce make_combined_mutation_source()	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	1e2463a382	mutation_reader: Introduce make_empty_*_source()	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	289d01c2cc	mutation_reader: Introduce concept of snapshot_source	2017-06-24 18:06:11 +02:00
Piotr Jastrzebski	9380dd1ee3	mutation_source: make sure we never ignore fast forwarding Mutation sources sometimes ignore the fast forwarding parameter, so this change adds an assertion to check that the parameter can be safely ignored. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-06-24 18:06:11 +02:00
Piotr Jastrzebski	ab72241e22	mutation_reader: Accept forwarding flag in make_reader_returning() By default, make_reader_returning() creates a reader that does not support fast forwarding, but the second parameter can be used to make it support fast forwarding. [tgrabiec: Improve title] Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-06-24 18:06:11 +02:00
Nadav Har'El	3018df11b5	Allow reading exactly desired byte ranges and fast_forward_to In commit `c63e88d556`, support was added for fast_forward_to() in data_consume_rows(). Because an input stream's end cannot be changed after creation, that patch ignores the specified end byte and uses the end of file as the end position of the stream. As a result, even when we want to read a specific byte range (e.g., in the repair code to checksum the partitions in a given range), the code reads an entire 128K buffer around the end byte, or significantly more when read-ahead is enabled. This causes repair to do more than 10 times the amount of I/O it really has to do in the checksumming phase (which, in the current implementation, reads small ranges of partitions at a time). This patch has two levels: 1. In the lower level, sstable::data_consume_rows(), which reads all partitions in a given disk byte range, now gets another byte position, "last_end". That can be the range's end, the end of the file, or anything in between the two. It opens the disk stream until last_end, which means (a) we will never read ahead beyond last_end, and (b) fast_forward_to() is not allowed beyond last_end. 2. In the upper level, we add to the various layers of sstable readers, mutation readers, etc., a boolean flag, mutation_reader::forwarding, which says whether fast_forward_to() is allowed on the stream of mutations to move the stream to a different partition range. Note that this flag is separate from the existing boolean flag streamed_mutation::forwarding: that one is about skipping inside a single partition, while the flag we are adding is about switching the partition range being read. Most of the functions that previously accepted streamed_mutation::forwarding now also accept the option mutation_reader::forwarding. The exceptions are functions which are known to read only a single partition and do not support fast_forward_to() to a different partition range. We note that if mutation_reader::forwarding::no is requested, and fast_forward_to() is forbidden, there is no point in reading anything beyond the range's end, so data_consume_rows() is called with last_end as the range's end. But if forwarding::yes is requested, we use the end of the file as last_end, exactly like the code before this patch did. Importantly, the repair's partition reading code, column_family::make_streaming_reader, uses mutation_reader::forwarding::no, while the other existing reading code will use the default forwarding::yes. In the future, we can further optimize the amount of bytes read from disk by replacing forwarding::yes with the actual last partition that may ever be read, and using its byte position as the last_end passed to data_consume_rows. But we don't do this yet, and it's not a regression from the existing code, which also opened the file input stream until the end of the file, not until the end of the range query. Moreover, such an improvement will not help if the overall range is always very large, in which case not over-reading at its end will not noticeably improve performance. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20170619152629.11703-1-nyh@scylladb.com>	2017-06-19 18:31:32 +03:00
Avi Kivity	6e2c9ef9fb	Revert "Allow reading exactly desired byte ranges and fast_forward_to" This reverts commit `317d7fc253` (and also the related `2c57ab84b2`). It causes crashes during range scans, reported by Gleb: "To reproduce I run SELECT * FROM keyspace1.standard1; on typical c-s dataset and 3 node cluster. Backtrace: at /home/gleb/work/seastar/seastar/core/apply.hh:36 rvalue=<unknown type in /home/gleb/work/seastar/build/release/scylla, CU 0x54cf307, DIE 0x55ebf2a>) at /home/gleb/work/seastar/seastar/core/do_with.hh:57 range=std::vector of length 6, capacity 8 = {...}) at /home/gleb/work/seastar/seastar/core/future-util.hh:142 at ./seastar/core/future.hh:890 at /home/gleb/work/seastar/seastar/core/future-util.hh:119 at /home/gleb/work/seastar/seastar/core/future-util.hh:142	2017-06-18 16:10:21 +03:00
Avi Kivity	2c57ab84b2	mutation_reader: fix typo in forwarding_tag The typo went unnoticed since the compiler picked up the global scope's forwarding_tag. The bug made streamed_mutation::forwarding and mutation_reader::forwarding the same type, but fortunately there were no type mixups due to this.	2017-06-15 20:13:01 +03:00
Nadav Har'El	317d7fc253	Allow reading exactly desired byte ranges and fast_forward_to In commit `c63e88d556`, support was added for fast_forward_to() in data_consume_rows(). Because an input stream's end cannot be changed after creation, that patch ignores the specified end byte and uses the end of file as the end position of the stream. As a result, even when we want to read a specific byte range (e.g., in the repair code to checksum the partitions in a given range), the code reads an entire 128K buffer around the end byte, or significantly more when read-ahead is enabled. This causes repair to do more than 10 times the amount of I/O it really has to do in the checksumming phase (which, in the current implementation, reads small ranges of partitions at a time). This patch has two levels: 1. In the lower level, sstable::data_consume_rows(), which reads all partitions in a given disk byte range, now gets another byte position, "last_end". That can be the range's end, the end of the file, or anything in between the two. It opens the disk stream until last_end, which means (a) we will never read ahead beyond last_end, and (b) fast_forward_to() is not allowed beyond last_end. 2. In the upper level, we add to the various layers of sstable readers, mutation readers, etc., a boolean flag, mutation_reader::forwarding, which says whether fast_forward_to() is allowed on the stream of mutations to move the stream to a different partition range. Note that this flag is separate from the existing boolean flag streamed_mutation::forwarding: that one is about skipping inside a single partition, while the flag we are adding is about switching the partition range being read. Most of the functions that previously accepted streamed_mutation::forwarding now also accept the option mutation_reader::forwarding. The exceptions are functions which are known to read only a single partition and do not support fast_forward_to() to a different partition range. We note that if mutation_reader::forwarding::no is requested, and fast_forward_to() is forbidden, there is no point in reading anything beyond the range's end, so data_consume_rows() is called with last_end as the range's end. But if forwarding::yes is requested, we use the end of the file as last_end, exactly like the code before this patch did. Importantly, the repair's partition reading code, column_family::make_streaming_reader, uses mutation_reader::forwarding::no, while the other existing reading code will use the default forwarding::yes. In the future, we can further optimize the amount of bytes read from disk by replacing forwarding::yes with the actual last partition that may ever be read, and using its byte position as the last_end passed to data_consume_rows. But we don't do this yet, and it's not a regression from the existing code, which also opened the file input stream until the end of the file, not until the end of the range query. Moreover, such an improvement will not help if the overall range is always very large, in which case not over-reading at its end will not noticeably improve performance. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20170614072122.13473-1-nyh@scylladb.com>	2017-06-15 13:22:46 +01:00
Paweł Dziepak	12135dbe21	mutation_reader: make mutation_source nothrow movable	2017-03-09 09:27:43 +00:00
Tomasz Grabiec	892d4a2165	db: Enable creating forwardable readers via mutation_source Right now, all mutation source implementations will use the make_forwardable() wrapper.	2017-02-23 18:50:44 +01:00
Tomasz Grabiec	b1d1091906	mutation_source: Document liveness requirements	2017-02-23 18:23:52 +01:00
Tomasz Grabiec	15db80188b	mutation_source: Cleanup - Combine telescopic overloads into one method with default parameters. - Introduce func_type for a full handler to avoid some duplication.	2017-02-23 18:23:52 +01:00
Tomasz Grabiec	586dbaa8d3	db: Replace virtual_reader_type with mutation_source_opt Virtual reader is a mutation_source.	2017-02-23 18:23:52 +01:00
Tomasz Grabiec	78844fa2e5	db: Use incremental selector in partition_presence_checker This reduces the number of sstables we need to check to only those whose token range overlaps with the key, which reduces cache update time. It is especially effective with the leveled compaction strategy. Refs #1943. The incremental selector works with an immutable sstable set, so cache updates need to be serialized; otherwise we could mispopulate the cache due to stale presence information. The presence checker interface was changed to accept a decorated key in order to gain easy access to the token, which the incremental selector requires.	2016-12-19 14:20:58 +01:00
Asias He	937f28d2f1	Convert to use dht::partition_range_vector and dht::token_range_vector	2016-12-19 14:08:50 +08:00
Asias He	e5485f3ea6	Get rid of query::partition_range Use dht::partition_range instead	2016-12-19 08:09:25 +08:00
Asias He	85034c1b57	Convert to use dht::partition_range	2016-12-19 08:04:30 +08:00
Paweł Dziepak	52a4e79210	mutation_reader: add multi_range_reader So far, the only way to combine the outputs of multiple readers was to use the combining reader. It is very general and, in particular, supports the case when the readers emit mutations from overlapping ranges. However, we have cases (e.g. streaming) when we need to read from several disjoint ranges. The combining reader is a suboptimal solution there, as it requires creating a reader for each range and ignores the fact that the ranges do not overlap. This patch introduces multi_range_mutation_reader, which takes a mutation_source and a sorted set of disjoint ranges. Internally, it uses mutation_reader::fast_forward_to() to move to the next range once the current one is completed.	2016-12-15 13:07:31 +00:00
Paweł Dziepak	bcd374c05d	mutation_reader: forward fast_forward_to() calls Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	b7b7b2bd63	combined_mutation_reader: implement fast_forward_to() Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	2c0cdd55fc	mutation_reader: make combined_reader public We want to be able to fast forward sstable readers. However, just implementing fast_forward_to() for combined_reader is not enough, as the sstables we are reading from may need to change. Following patches are going to introduce a combined sstable reader that derives from combined_reader. To make that possible, we first need to make combined_reader public. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Paweł Dziepak	62c9492d33	mutation_reader: introduce fast_forward_to() This patch introduces the interface for fast forwarding mutation readers. The main user of this feature is going to be the cache, which, while serving a range query, may need to read multiple small ranges from the sstables to populate itself with the missing entries. Fast forwarding is an alternative to recreating a reader with a different range. Its main advantage is that it avoids dropping data that has already been read. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-10-19 15:29:08 +01:00
Duarte Nunes	5fd66f00c2	mutation_reader: Accept trace_state_ptr This patch changes the mutation_reader so it optionally accepts a trace_state_ptr. This will allow us to trace, for example, which sstables are accessed during a request. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-09-01 12:00:31 +02:00
Piotr Jastrzebski	3607d99269	Remove clustering_key_filtering_context. Remove clustering_key_filter_factory and clustering_key_filtering_context. Use partition_slice directly with a static get_ranges method. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-08-30 20:31:55 +02:00
Piotr Jastrzebski	b05b90b3a5	Introduce clustering_key_filter_ranges. This fixes the problem of multiple concurrent get_ranges calls. Previously, each call invalidated the result of the previous call. Now they don't step on each other's toes. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-08-30 19:46:38 +02:00
Paweł Dziepak	7b479d8b41	clarify relations between mutation_reader and streamed_mutation Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-29 15:58:42 +01:00
Paweł Dziepak	93cc4454a6	streamed_mutation: emit range_tombstones directly Originally, streamed_mutations guaranteed that emitted tombstones are disjoint. In order to achieve that, two separate objects were produced for each range tombstone: range_tombstone_begin and range_tombstone_end. Unfortunately, this forced the sstable writer to accumulate all clustering rows between range_tombstone_begin and range_tombstone_end. However, since there is no need to write disjoint tombstones to sstables (see #1153 "Write range tombstones to sstables like Cassandra does"), it is also not necessary for streamed_mutations to produce disjoint range tombstones. This patch changes that by making streamed_mutation produce range_tombstone objects directly. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-13 09:51:18 +01:00
Tomasz Grabiec	74ff30a31a	mutation_reader: Introduce stable_flattened_mutations_consumer adaptor Needed to make the compact_mutation class non-movable later. It is used in do_with, so it needs to be movable; this will be solved by using this adaptor.	2016-07-09 22:31:28 +02:00
Tomasz Grabiec	fb44f895b2	mutation_reader: Name template parameters after concepts With so many consumer concepts out there, it is confusing to name parameters using the generic "Consumer" name, so let's name them after the (already defined) concepts: CompactedMutationsConsumer, FlattenedConsumer.	2016-07-09 22:31:27 +02:00
Paweł Dziepak	0287e0c9ac	mutation_reader: add consume_flattened_in_thread() This is a version of consume_flattened() intended to be run inside a thread. All consumer code is going to be invoked in the same thread context. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 11:39:01 +01:00
Paweł Dziepak	4133cc7a53	mutation_reader: make consume_flattened() produce decorated keys Since decorated keys are already computed, it is better to pass more information rather than less. Consumers interested only in the partition key can just drop the token, and the ones requiring the full decorated key don't need to recompute it. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-30 11:39:00 +01:00
Avi Kivity	9ac730dcc9	mutation_reader: make restricting_mutation_reader even more restricting While limiting the number of concurrently executing sstable readers reduces our memory load, the queue of waiting readers, although each consumes only a small amount of memory, can still grow without bound. To limit the damage, add two limits on the queue: - a timeout, which is equal to the read timeout - a queue length limit, which is equal to 2% of the shard memory divided by an estimate of the queued request size (1kb) Together, these limits bound the amount of memory needed by queued disk requests in case the disk can't keep up. Message-Id: <1467206055-30769-1-git-send-email-avi@scylladb.com>	2016-06-29 15:17:35 +02:00
Avi Kivity	bea7d7ee94	mutation_reader: introduce restricting_reader A restricting_reader wraps a mutation_reader and restricts its concurrency using a provided semaphore; this allows controlling read concurrency, which is important since reads can consume a lot of resources ((number of participating sstables) * 128k once we have streaming mutations, and a lot more before).	2016-06-27 17:17:52 +03:00
Paweł Dziepak	2b7e62599d	mutation_reader: add consume_flattened() A mutation reader produces a stream of streamed_mutations. Each streamed_mutation is itself a stream, so basically we are dealing with a stream of streams. consume_flattened() flattens such a stream of streams, making all its elements consumable by a single consumer. It also allows reversing the mutations before consumption using reverse_streamed_mutation().	2016-06-20 21:29:52 +01:00
Paweł Dziepak	8dfabf2790	mutation_reader: support slicing in make_reader_returning_many() Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:52 +01:00


75 Commits