scylladb

Author	SHA1	Message	Date
Botond Dénes	07fb2e9c4d	make_foreign_reader: don't wrap local readers If the to-be-wrapped reader is local (lives on the same shard where make_foreign_reader() is called) there is no need to wrap it with foreign_reader. Just return it as is. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <886ed883b707f163603a40b56b8823f2bb6c47c6.1523873224.git.bdenes@scylladb.com>	2018-04-16 15:11:20 +03:00
Botond Dénes	3a6f397fd0	Add multishard_combined_reader Takes care of reading a range from all shards that own a subrange in the range. The read happens sequentially, reading from one shard at a time. Behind the scenes it uses combined_mutation_reader and foreign_reader, the former providing the merging logic and the latter taking care of transferring the output of the remote readers to the local shard. Readers are created on-demand by a reader-selector implementation that creates readers for yet unvisited shards as the read progresses. The read starts with a concurrency of one, that is, the reader reads from a single shard at a time. The concurrency is exponentially increased (to a maximum of the number of shards) when a reader's buffer is empty after moving to the next shard. This condition is important as we only want to increase concurrency for sparse tables that have little data and the reader has to move between shards often. When concurrency is > 1, the reader issues background read-aheads to the next shards so that by the time it needs to move to them they have the data ready. For dense tables (where we rarely cross shards) we rely on the foreign_reader to issue sufficient read-aheads on its own to avoid blocking.	2018-04-11 10:03:47 +03:00
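The concurrency rule in the multishard_combined_reader commit can be modeled in isolation. The sketch below is hypothetical and is not the actual Scylla code. `concurrency_controller` and its members are illustrative names, and `shard_count` stands in for the number of shards. Concurrency starts at one. It doubles, capped at the shard count, only when the buffer is still empty after moving to the next shard.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical model of the concurrency-growth rule only; not Scylla's
// multishard_combined_reader implementation.
class concurrency_controller {
    std::size_t _shard_count;      // stand-in for the number of shards
    std::size_t _concurrency = 1;  // the read starts on a single shard

public:
    explicit concurrency_controller(std::size_t shard_count)
        : _shard_count(shard_count) {
        assert(shard_count >= 1);
    }

    std::size_t concurrency() const { return _concurrency; }

    // Called after the reader moves to the next shard. Growth happens only
    // when the buffer is still empty, i.e. the table looks sparse and the
    // reader keeps crossing shards without finding much data.
    void on_moved_to_next_shard(bool buffer_empty) {
        if (buffer_empty) {
            _concurrency = std::min(_concurrency * 2, _shard_count);
        }
    }

    // Shards beyond the current one that would get a background read-ahead.
    std::size_t read_ahead_shards() const { return _concurrency - 1; }
};
```

With 6 shards and repeatedly empty buffers, concurrency goes 1, 2, 4, 6 and then stays at 6. Doubling reaches full fan-out in O(log shards) shard crossings on sparse tables. Dense tables never trigger growth.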
Botond Dénes	2c0f8d0586	Add foreign_reader Local representative of a reader located on a remote shard. Manages the lifecycle and takes care of seamlessly transferring fragments produced by the remote reader. Fragments are copied between the shards in batches, a bufferful at a time. To maximize throughput read-ahead is used. After each fill_buffer() or fast_forward_to() a read-ahead (a fill_buffer() on the remote reader) is issued. This read-ahead runs in the background and is brought back to foreground on the next fill_buffer() or fast_forward_to() call.	2018-04-11 09:22:45 +03:00
Botond Dénes	f488ae3917	Add buffer_size() to flat_mutation_reader buffer_size() exposes the collective size of the external memory consumed by the mutation fragments in the flat reader's buffer. This provides a basis to build basic memory accounting on. Although this is not the entire memory consumption of any given reader, it is the most volatile component and usually by far the largest one too.	2018-03-13 10:34:34 +02:00
Botond Dénes	212b2dabc4	Resource-based cache eviction Readers serving user-reads need to obtain a permit to start reading. There exists a restriction on how many active readers can be admitted based on their count and their memory consumption. Since the saved readers of cached queriers are technically active (they hold a permit) they can block new readers from obtaining a permit. New readers have a higher priority because a cached reader might be abandoned, or at best used later, so in the face of memory pressure we evict cached readers to free up permits for new readers. Cached queriers are evicted in LRU order as the oldest queriers are the most likely to be evicted based on their TTL anyway.	2018-03-13 10:34:34 +02:00
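The LRU eviction order from the resource-based cache eviction commit can be sketched with a list plus an index. This is a hypothetical model, not Scylla's querier cache API. `querier_lru`, `take` and `evict_oldest` are illustrative names, and plain integer keys stand in for cached queriers. In the commit's scheme, evicting an entry would release the reader permit it holds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <list>
#include <optional>
#include <unordered_map>

// Hypothetical LRU bookkeeping for cached queriers (keys stand in for them).
class querier_lru {
    std::list<std::uint64_t> _lru;  // front = least recently cached
    std::unordered_map<std::uint64_t, std::list<std::uint64_t>::iterator> _index;

public:
    // Cache a querier; re-inserting a key moves it to the most recent end.
    void insert(std::uint64_t key) {
        take(key);
        _lru.push_back(key);
        _index[key] = std::prev(_lru.end());
    }

    // Remove a querier when its paged read resumes; false if not cached.
    bool take(std::uint64_t key) {
        auto it = _index.find(key);
        if (it == _index.end()) {
            return false;
        }
        _lru.erase(it->second);
        _index.erase(it);
        return true;
    }

    // Under permit pressure, evict the oldest querier (freeing its permit).
    std::optional<std::uint64_t> evict_oldest() {
        if (_lru.empty()) {
            return std::nullopt;
        }
        std::uint64_t key = _lru.front();
        _lru.pop_front();
        _index.erase(key);
        return key;
    }

    std::size_t size() const { return _lru.size(); }
};
```

The list gives O(1) access to the oldest entry, and the map gives O(1) removal when a querier is taken back out for a resumed read.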
Botond Dénes	1259031af3	Use the reader_concurrency_semaphore to limit reader concurrency	2018-03-08 14:12:12 +02:00
Botond Dénes	dfa04c3fea	Add reader_concurrency_semaphore This semaphore implements the new dual, count and memory based active reader limiting. As purely memory-based limiting proved to cause problems on big boxes admitting a large number of readers (more than any disk could handle) the previous count-based limit is reintroduced in addition to the existing memory-based limit. When creating new readers, first the count-based limit is checked. If that clears, the memory limit is checked before admitting the reader. reader_concurrency_semaphore wraps the two semaphores that implement these limits and enforces the correct order of limit checking. This class also completely replaces the restricted_reader_config struct. It encapsulates all data and related functionality of the latter, making client code simpler.	2018-03-08 14:12:12 +02:00
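The admission order described for reader_concurrency_semaphore (count first, then memory) is shown below as a non-blocking sketch. It is hypothetical. The real class wraps two Seastar semaphores and queues waiting readers, while `dual_limit` and `try_admit` are illustrative names that simply reject.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, non-blocking model of dual count + memory admission.
class dual_limit {
    std::size_t _count_available;
    std::size_t _memory_available;

public:
    dual_limit(std::size_t max_count, std::size_t max_memory)
        : _count_available(max_count), _memory_available(max_memory) {}

    // Check the count limit first; only if it clears, check memory.
    bool try_admit(std::size_t memory) {
        if (_count_available == 0) {
            return false;  // count limit hit; memory is not consulted
        }
        if (_memory_available < memory) {
            return false;  // count cleared, but not enough memory
        }
        --_count_available;
        _memory_available -= memory;
        return true;
    }

    // Return both resources when a reader is destroyed.
    void release(std::size_t memory) {
        ++_count_available;
        _memory_available += memory;
    }

    std::size_t count_available() const { return _count_available; }
    std::size_t memory_available() const { return _memory_available; }
};
```

For example, with limits of 2 readers and 100 memory units, two 40-unit readers are admitted. A third reader is then rejected by the count limit even though 20 units remain.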
Botond Dénes	d5bb8a47fc	mv reader_resource_tracker.hh -> reader_concurrency_semaphore.hh In preparation for reader_concurrency_semaphore being added to the file. The reader_resource_tracker is really only a helper class for reader_concurrency_semaphore so the latter is better suited to provide the name of the file.	2018-03-08 10:29:16 +02:00
Botond Dénes	206e7d40d4	restricted_mutation_reader: switch to std::variant Tests: unit-tests(release) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <a8930b764171db131d9d8d5fe4035014ecb452f4.1519391304.git.bdenes@scylladb.com>	2018-02-25 14:35:57 +02:00
Piotr Jastrzebski	37285ad7fa	Delete unused make_reader_returning Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:56:48 +01:00
Piotr Jastrzebski	864db78fcf	Delete unused make_reader_returning_many Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:56:48 +01:00
Piotr Jastrzebski	ff4ffc1c64	Delete unused make_empty_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:56:48 +01:00
Piotr Jastrzebski	0b8aedcc59	Delete unused mutation_reader_from_flat_mutation_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:56:48 +01:00
Piotr Jastrzebski	8aaf5dc900	Delete unused streamed_mutation_from_flat_mutation_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:56:48 +01:00
Piotr Jastrzebski	d266eaa01e	mutation_source: rename make_flat_mutation_reader to make_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-19 09:30:12 +01:00
Avi Kivity	93076d25b6	Merge "mutation_source: remove support for creation with mutation_reader" from Piotr "After this patchset it's only possible to create a mutation_source with a function that produces flat_mutation_reader." * 'haaawk/mutation_source_v1' of ssh://github.com/scylladb/seastar-dev: Merge flat_mutation_reader_mutation_source into mutation_source Remove unused mutation_reader_mutation_source Remove unused mutation_source constructor. Migrate make_source to flat reader Migrate run_conversion_to_mutation_reader_tests to flat reader flat_mutation_reader_from_mutations: add support for slicing Remove unused mutation_source constructor. Migrate partition_counting_reader to flat reader Migrate throttled_mutation_source to flat reader Extract delegating_reader from make_delegating_reader row_cache_test: call row_cache::make_flat_reader in mutation_sources Remove unused friend declaration in flat_mutation_reader::impl Migrate make_source_with to flat reader Migrate make_empty_mutation_source to flat reader Remove unused mutation_source constructor Migrate test_multi_range_reader to flat reader Remove unused mutation_source constructors	2018-01-15 18:15:53 +02:00
Avi Kivity	fe788e0a5d	mutation_reader: adjust FragmentProducer concept for timeout forward_to() now accepts a timeout parameter, and the concept should reflect it, or it breaks the build when concepts are enabled.	2018-01-14 18:09:37 +02:00
Glauber Costa	3c9eeea4cf	restricted_mutation_reader: don't pass timeouts through the config structure This patch enables passing a timeout to the restricted_mutation_reader through the read path interface -- using fill_buffer and friends. This will serve as a basis for having per-request timeouts. The config structure still has a timeout, but that is so far only used to actually pass the value to the query interface. Once that starts coming from the storage proxy layer (next patch) we will remove it. The query callers are patched so that we pass the timeout down. We patch the callers in database.cc, but leave the streaming ones alone. That can be safely done because the default for the query path is now no_timeout, and that is what the streaming code wants. So there is no need to complicate the interface to allow for passing a timeout that we intend to disable. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-12 07:43:21 -05:00
Glauber Costa	5140aaea00	add a timeout to fast forward to In the last patch, as part of enabling per-request timeouts, we enabled timeouts in fill_buffer. There are many places, though, in which we fast_forward_to before we fill_buffer, so in order to make that effective we need to propagate the timeouts to fast_forward_to as well. In the same way as fill_buffer, we make the argument optional wherever possible in the high level callers, making it mandatory in the implementations. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-12 07:43:19 -05:00
Glauber Costa	d965af42b0	add a timeout to fill_buffer As part of the work to enable per-request timeouts, we enable timeouts in fill_buffer. The argument is made optional at the main classes, but mandatory in all the ::impl versions. This way we'll make sure we didn't forget anything. At this point we're still mostly passing that information around and don't have any entity that will act on those timeouts. In the next patch we will wire that up. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-11 12:07:41 -05:00
Paweł Dziepak	b4a4c04bab	combined_reader: optimise for disjoint partition streams The legacy mutation_reader/streamed_mutation design made it very easy to skip the partition merging logic if there was only one underlying reader that had emitted the partition. That optimisation was lost after the conversion to flat mutation readers, which has impacted performance. This patch mostly recovers it by bypassing most of the mutation_reader_merger logic if there is only a single active reader for a given partition. The performance regression was introduced in `8731c1bc66` "Flatten the implementation of combined_mutation_reader". perf_simple_query -c4 read results (medians of 60): original regression: 326241.02 before 8731c1, 300244.09 after 8731c1 (-8.0%); this patch: 313882.59 before, 325148.05 after (+3.6%). Message-Id: <20180103121019.764-1-pdziepak@scylladb.com>	2018-01-11 10:21:17 +01:00
Raphael S. Carvalho	818830715f	Fix potential infinite recursion when combining mutations for leveled compaction The issue is triggered by compaction of sstables of level higher than 0. The problem happens when the interval map of the partitioned sstable set stores intervals such as the following: [-9223362900961284625 : -3695961740249769322 ] (-3695961740249769322 : -3695961103022958562 ] When the selector is called for the first interval above, the exclusive lower bound of the second interval is returned as the next token, but the inclusiveness info is not returned. So reader_selector was reporting that there were new readers when the current token was -3695961740249769322, because it was stored in the selector position field as inclusive, but it's actually exclusive. This false positive was leading to infinite recursion in the combined reader because the sstable set's incremental selector itself knew that there were actually no new readers, and therefore no progress could be made. The fix is to use ring_position in reader_selector, such that inclusiveness is respected. So reader_selector::has_new_readers() won't return a false positive under the conditions described above. Fixes #2908. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-01-03 16:23:01 -02:00
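The false positive in the leveled-compaction fix comes from dropping bound inclusiveness. The hypothetical sketch below uses the tokens from the commit message to show the difference. `bound_position` and `has_new_readers` are illustrative stand-ins for the ring_position-based check, not Scylla's types.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: a selector position that keeps inclusiveness with the token.
struct bound_position {
    std::int64_t token;
    bool inclusive;  // false: the next interval starts strictly after `token`
};

// True once a reader at `current` has reached the start of the next interval.
bool has_new_readers(std::int64_t current, const bound_position& next) {
    return next.inclusive ? current >= next.token : current > next.token;
}
```

Take the token -3695961740249769322, the exclusive lower bound of the second interval. If that bound is treated as inclusive, the check reports new readers at that token (the old false positive). With exclusivity respected, it does not.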
Avi Kivity	8795238869	Merge "Fix handling of range tombstones starting at same position" from Tomasz "When we get two range tombstones with the same lower bound from different data sources (e.g. two sstables), which need to be combined into a single stream, they need to be de-overlapped, because each mutation fragment in the stream must have a different position. If we have range tombstones [1, 10) and [1, 20), the result of that de-overlapping will be [1, 10) and [10, 20]. The problem is that if the stream corresponds to a clustering slice with upper bound greater than 1, but lower than 10, the second range tombstone would appear as being out of the query range. This is currently violating assumptions made by some consumers, like the cache populator. One effect of this may be that a reader will miss rows which are in the range (1, 10) (after the start of the first range tombstone, and before the start of the second range tombstone), if the second range tombstone happens to be the last fragment which was read for a discontinuous range in cache and we stopped reading at that point because of a full buffer and cache was evicted before we resumed reading, so we went to reading from the sstable reader again. There could be more cases in which this violation may resurface. There is also a related bug in mutation_fragment_merger. If the reader is in forwarding mode, and the current range is [1, 5], the reader would still emit range_tombstone([10, 20]). If that reader is later fast forwarded to another range, say [6, 8], it may produce fragments with smaller positions which were emitted before, violating monotonicity of fragment positions in the stream. A similar bug was also present in partition_snapshot_flat_reader.
Possible solutions: 1) relax the assumption (in cache) that streams contain only relevant range tombstones, and only require that they contain at least all relevant tombstones 2) allow subsequent range tombstones in a stream to share the same starting position (position is weakly monotonic), then we don't need to de-overlap the tombstones in readers. 3) teach combining readers about query restrictions so that they can drop fragments which fall outside the range 4) force leaf readers to trim all range tombstones to query restrictions This patch implements solution no 2. It simplifies combining readers, which don't need to accumulate and trim range tombstones. I don't like solution 3, because it makes combining readers more complicated, slower, and harder to properly construct (currently combining readers don't need to know restrictions of the leaf streams). Solution 4 is confined to implementations of leaf readers, but also has disadvantage of making those more complicated and slower. There is only one consumer which needs the tombstones with monotonic positions, and that is the sstable writer. Fixes #3093." 
* tag 'tgrabiec/fix-out-of-range-tombstones-v1' of github.com:scylladb/seastar-dev: tests: row_cache: Introduce test for concurrent read, population and eviction tests: sstables: Add test for writing combined stream with range tombstones at same position tests: memtable: Test that combined mutation source is a mutation source tests: memtable: Test that memtable with many versions is a mutation source tests: mutation_source: Add test for stream invariants with overlapping tombstones tests: mutation_reader: Test fast forwarding of combined reader with overlapping range tombstones tests: mutation_reader: Test combined reader slicing on random mutations tests: mutation_source_test: Extract random_mutation_generator::make_partition_keys() mutation_fragment: Introduce range() clustering_interval_set: Introduce overlaps() clustering_interval_set: Extract private make_interval() mutation_reader: Allow range tombstones with same position in the fragment stream sstables: Handle consecutive range_tombstone fragments with same position tests: streamed_mutation_assertions: Merge range_tombstones with the same position in produces_range_tombstone() streamed_mutation: Introduce peek() mutation_fragment: Extract mergeable_with() mutation_reader: Move definition of combining mutation reader to source file mutation_reader: Use make_combined_reader() to create combined reader	2018-01-02 18:32:09 +02:00
Duarte Nunes	2618209c2d	Remove obsolete includes and fix build move.hh was deleted, but files weren't updated to reflect that. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-12-28 12:03:44 +00:00
Tomasz Grabiec	41ede08a1d	mutation_reader: Allow range tombstones with same position in the fragment stream When we get two range tombstones with the same lower bound from different data sources (e.g. two sstables), which need to be combined into a single stream, they need to be de-overlapped, because each mutation fragment in the stream must have a different position. If we have range tombstones [1, 10) and [1, 20), the result of that de-overlapping will be [1, 10) and [10, 20]. The problem is that if the stream corresponds to a clustering slice with upper bound greater than 1, but lower than 10, the second range tombstone would appear as being out of the query range. This is currently violating assumptions made by some consumers, like the cache populator. One effect of this may be that a reader will miss rows which are in the range (1, 10) (after the start of the first range tombstone, and before the start of the second range tombstone), if the second range tombstone happens to be the last fragment which was read for a discontinuous range in cache and we stopped reading at that point because of a full buffer and cache was evicted before we resumed reading, so we went to reading from the sstable reader again. There could be more cases in which this violation may resurface. There is also a related bug in mutation_fragment_merger. If the reader is in forwarding mode, and the current range is [1, 5], the reader would still emit range_tombstone([10, 20]). If that reader is later fast forwarded to another range, say [6, 8], it may produce fragments with smaller positions which were emitted before, violating monotonicity of fragment positions in the stream. A similar bug was also present in partition_snapshot_flat_reader.
Possible solutions: 1) relax the assumption (in cache) that streams contain only relevant range tombstones, and only require that they contain at least all relevant tombstones 2) allow subsequent range tombstones in a stream to share the same starting position (position is weakly monotonic), then we don't need to de-overlap the tombstones in readers. 3) teach combining readers about query restrictions so that they can drop fragments which fall outside the range 4) force leaf readers to trim all range tombstones to query restrictions This patch implements solution no 2. It simplifies combining readers, which don't need to accumulate and trim range tombstones. I don't like solution 3, because it makes combining readers more complicated, slower, and harder to properly construct (currently combining readers don't need to know restrictions of the leaf streams). Solution 4 is confined to implementations of leaf readers, but also has disadvantage of making those more complicated and slower. Fixes #3093.	2017-12-22 11:06:20 +01:00
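Solution no 2 relaxes the stream invariant for range tombstones. The hypothetical checker below illustrates the relaxed rule. Positions are plain ints instead of position_in_partition, and `fragment`, `fragment_kind` and `is_valid_stream` are illustrative names. Positions must never decrease, and only two consecutive range tombstones may share a start position.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified, hypothetical model of a fragment stream.
enum class fragment_kind { clustering_row, range_tombstone };

struct fragment {
    fragment_kind kind;
    int position;  // for a range tombstone: its start bound
};

// Weak monotonicity: equal positions are allowed only between consecutive
// range tombstones; everything else must strictly increase.
bool is_valid_stream(const std::vector<fragment>& stream) {
    for (std::size_t i = 1; i < stream.size(); ++i) {
        const fragment& prev = stream[i - 1];
        const fragment& cur = stream[i];
        if (cur.position < prev.position) {
            return false;  // positions may never go backwards
        }
        bool both_tombstones = prev.kind == fragment_kind::range_tombstone
                            && cur.kind == fragment_kind::range_tombstone;
        if (cur.position == prev.position && !both_tombstones) {
            return false;
        }
    }
    return true;
}
```

Under this rule, [1, 10) and [1, 20) can both be emitted at position 1 without de-overlapping. A fragment at 6 following a tombstone emitted at 10, as in the fast-forwarding bug described above, is still rejected.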
Tomasz Grabiec	60ed5d29c0	mutation_reader: Move definition of combining mutation reader to source file So that the whole world doesn't recompile when it changes.	2017-12-21 21:24:11 +01:00
Tomasz Grabiec	52285a9e73	mutation_reader: Use make_combined_reader() to create combined reader So that we can hide the definition of combined_mutation_reader. It's also less verbose.	2017-12-21 21:24:11 +01:00
Piotr Jastrzebski	2c1f0250c2	Migrate make_empty_mutation_source to flat reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-21 21:17:46 +01:00
Piotr Jastrzebski	04ce7dfb84	Remove unused make_combined_reader overload. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-21 17:00:43 +01:00
Piotr Jastrzebski	b3b6db4f50	Remove unused make_combined_reader overload. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-21 17:00:43 +01:00
Piotr Jastrzebski	b1c1709127	Migrate make_combined_mutation_source to flat reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-21 17:00:42 +01:00
Paweł Dziepak	d8dad04564	mutation_reader: drop make_restricted_reader() make_restricted_reader() has been replaced by make_restricted_flat_reader().	2017-12-13 11:57:22 +00:00
Paweł Dziepak	3839bc5d60	mutation_reader: convert restricted reader to flat streams	2017-12-13 10:46:41 +00:00
Botond Dénes	1ff65f41fd	mutation_reader_merger: don't query the kind of moved-from fragment Call mutation_fragment_kind() on the fragment before it's moved as there are no guarantees for the state of a moved-from object (apart from it being in a valid one). Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <c47b1e22877bb9499f1fbb9d513093c29ef1901b.1512635422.git.bdenes@scylladb.com>	2017-12-07 10:40:31 +02:00
Botond Dénes	9661769313	combined_mutation_reader: fix fast-forwarding related row-skipping bug When fast forwarding is enabled and all readers positioned inside the current partition return EOS, return EOS from the combined-reader too, instead of skipping to the next partition if there are idle readers (positioned at some later partition) available. Skipping to the next partition in that case causes rows to be skipped. The fix is to distinguish EOS'd readers that are only halted (waiting for a fast-forward) from those really out of data. To achieve this we track the last fragment-kind the reader emitted. If that was a partition-end then the reader is out of data, otherwise it might emit more fragments after a fast-forward. Without this additional information it is impossible to determine why a reader reached EOS, and the code later may make the wrong decision about whether the combined-reader as a whole is at EOS or not. Also when fast-forwarding between partition-ranges or calling next_partition() we set the last fragment-kind of forwarded readers, because they should emit a partition-start, otherwise they are out of data. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <6f0b21b1ec62e1197de6b46510d5508cdb4a6977.1512569218.git.bdenes@scylladb.com>	2017-12-06 16:09:05 +02:00
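The bookkeeping in the row-skipping fix can be modeled as below. This is a hypothetical sketch, not the mutation_reader_merger code, and `tracked_reader` and its members are illustrative names. Each reader remembers the kind of its last emitted fragment. At EOS, a last fragment of partition_end means the reader is out of data, and anything else means it is only halted. The sketch makes one assumption: after a partition-range fast-forward or next_partition(), the last kind is reset to partition_end. The reader should then emit a partition_start, so hitting EOS first counts as out of data.

```cpp
#include <cassert>

// Hypothetical model of per-reader EOS classification.
enum class fragment_kind { partition_start, clustering_row, partition_end };

struct tracked_reader {
    fragment_kind last_kind = fragment_kind::partition_end;
    bool is_end_of_stream = false;

    void on_fragment(fragment_kind k) { last_kind = k; }

    // Assumption: forwarding to a new partition (range) resets the state so
    // that an EOS before the next partition_start means "out of data".
    void on_forwarded_to_new_partition() {
        last_kind = fragment_kind::partition_end;
        is_end_of_stream = false;
    }

    bool is_out_of_data() const {
        return is_end_of_stream && last_kind == fragment_kind::partition_end;
    }

    // EOS inside a partition: waiting for a fast-forward, may produce more.
    bool is_halted() const {
        return is_end_of_stream && last_kind != fragment_kind::partition_end;
    }
};
```

A combined reader using this classification would return EOS as a whole while all readers inside the current partition are halted. It would move on to idle readers at later partitions only after those readers are out of data.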
Botond Dénes	e7535f5e88	Add flat_mutation_reader overload of make_combined_reader	2017-12-04 07:57:43 +02:00
Botond Dénes	8731c1bc66	Flatten the implementation of combined_mutation_reader In fact flatten mutation_reader_merger and adjust combined_mutation_reader accordingly.	2017-12-04 07:57:43 +02:00
Botond Dénes	3f8110b5b6	Make combined_mutation_reader a flat_mutation_reader For now only the interface is converted, behind the scenes the previous implementation remains, its output is simply converted by flat_mutation_reader_from_mutation_reader. The implementation will be converted in the following patches.	2017-12-04 07:57:43 +02:00
Botond Dénes	c011747c30	Move the mutation merging logic to combined_mutation_reader This is the second step in splitting the combined reader's logic into two parts as outlined in the previous patch.	2017-12-04 07:57:43 +02:00
Botond Dénes	3681e17555	Remove the unnecessary indirection of mutation_reader_merger::next()	2017-12-04 07:57:43 +02:00
Botond Dénes	c5e57e0961	Move the implementation of combined_mutation_reader into mutation_reader_merger This simple code-movement patch lays the groundwork for splitting the logic in combined_mutation_reader into two blocks: * one that takes care of moving the readers in lockstep and emits their output as a non-decreasing stream of streamed_mutations and * one that takes care of merging the above stream into a strictly-increasing stream of streamed_mutations. This in turn is preparation-work to the transformation of combined_mutation_reader into a flat_mutation_reader::impl.	2017-12-04 07:57:43 +02:00
Piotr Jastrzebski	3f70dfc939	Introduce conversion from flat_mutation_reader to streamed_mutation Allows splitting migration into small steps. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-11-15 15:33:23 +01:00
Paweł Dziepak	97767963a0	mutation_reader: drop multi_range_reader	2017-11-13 16:49:52 +00:00
Piotr Jastrzebski	acfc6fef55	Simplify flat_mutation_reader wrappers If a wrapper takes a flat_mutation_reader in a constructor then it does not have to take schema_ptr because it can obtain it from the inner flat_mutation_reader. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <88c3672df08d2ac465711e9138d426e43ae9c62b.1510331382.git.piotr@scylladb.com>	2017-11-13 08:53:34 +01:00
Tomasz Grabiec	92e3449d59	mutation_reader: Do not call fast_forward_to() on a reference to a capture The range reference is supposed to be valid as long as the reader is used, not just around fast_forward_to(). Introduced in `a6b9186cab` Message-Id: <1509710642-12713-1-git-send-email-tgrabiec@scylladb.com>	2017-11-03 12:09:42 +00:00
Paweł Dziepak	8c3b7fea81	Merge "Introduce new API and converters from/to old mutation_reader" from Piotr "This changeset is the first step to flatten mutation_reader. Then it introduces new mutation_fragment types for partition header and end of partition. Using those a new flat_mutation_reader is defined. Finally it introduces converters between new flat_mutation_reader and old mutation_reader." * 'haaawk/flattened_mutation_reader_v12' of github.com:scylladb/seastar-dev: Add tests for flat_mutation_reader Introduce conversion from flat_mutation_reader to mutation_reader Introduce conversion from mutation_reader to flat_mutation_reader Introduce flat_mutation_reader Extract FlattenedConsumer concept using GCC6_CONCEPT Introduce partition_end mutation_fragment Introduce a position for end of partition Introduce partition_start mutation_fragment Introduce FragmentConsumer Introduce a position for partition start streamed_mutation: Extract concepts using GCC6_CONCEPT macro	2017-10-16 12:14:23 +01:00
Piotr Jastrzebski	31733a7eeb	Introduce conversion from flat_mutation_reader to mutation_reader This will be used in transition from mutation_reader to flat_mutation_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-10-13 16:08:59 +02:00
Botond Dénes	a43901f842	row_consumer: de-virtualize io_priority() and resource_tracker() Fixes #2830 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <448a1f739ab8c88a7a5562bce8dce5ae6efdf934.1507302530.git.bdenes@scylladb.com>	2017-10-06 18:50:12 +01:00
Botond Dénes	fea6214a0a	Update reader restriction related metrics Update description of existing reader count metrics, add memory consumption metrics. Use labels to distinguish between system, user and streaming reads related metrics.	2017-10-03 12:44:17 +03:00
Botond Dénes	47e07b787e	restricted_mutation_reader: restrict based on memory consumption Restrict readers based on their memory consumption, instead of the count of the top-level readers. To do this an interposer is installed at the input_stream level which tracks buffers emitted by the stream. This way we can have an accurate picture of the readers' actual memory consumption. New readers will consume 16k units from the semaphore up-front. This is to account for their own memory consumption, apart from the buffers they will allocate. Creating the reader will be deferred to when there are enough resources to create it. As before only new readers will be blocked on an exhausted semaphore, existing readers can continue to work.	2017-10-03 12:44:12 +03:00


98 Commits