scylladb

Author	SHA1	Message	Date
Botond Dénes	a011a9ebf2	mutation_reader: multishard_combining_reader: support custom dismantler Add a dismantler functor parameter. When the multishard reader is destroyed this functor will be called for each shard reader, passing a future to a `stopped_foreign_reader`. This future becomes available when the shard reader is stopped, that is, when it finished all in-progress read-aheads and/or pending next partition calls. The intended use case for the dismantler functor is a client that needs to be notified when readers are destroyed and/or has to have access to any unconsumed fragments from the foreign readers wrapping the shard readers.	2018-09-03 10:31:44 +03:00
Botond Dénes	f13b878a94	mutation_reader: pass all standard reader params to `remote_reader_factory` Extend `remote_reader_factory` interface so that it accepts all standard mutation reader creation parameters. This allows factory lambdas to be truly stateless, not having to capture any standard parameters that is needed for creating the reader. Standard parameters are those accepted by `mutation_source::make_reader()`.	2018-09-03 10:31:44 +03:00
Botond Dénes	81a03db955	mutation_reader: reader_selector: use ring_position instead of token sstable_set::incremental selector was migrated to ring position, follow suit and migrate the reader_selector to use ring_position as well. Beyond correctness, this also improves efficiency in case of dense tables, avoiding prematurely selecting sstables that share the token but start at different keys, although one could argue that this is a niche case.	2018-07-04 17:42:37 +03:00
Botond Dénes	a8e795a16e	sstables_set::incremental_selector: use ring_position instead of token Currently `sstable_set::incremental_selector` works in terms of tokens. Sstables can be selected with tokens and internally the token-space is partitioned (in `partitioned_sstable_set`, used for LCS) with tokens as well. This is problematic for several reasons. The sub-range sstables cover from the token-space is defined in terms of decorated keys. It is even possible that multiple sstables cover multiple non-overlapping sub-ranges of a single token. The current system is unable to model this and will at best result in selecting unnecessary sstables. The usage of token for providing the next position where the intersecting sstables change [1] causes further problems. Attempting to walk over the token-space by repeatedly calling `select()` with the `next_position` returned from the previous call will quite possibly lead to an infinite loop as a token cannot express inclusiveness/exclusiveness and thus the incremental selector will not be able to make progress when the upper and lower bounds of two neighbouring intervals share the same token with different inclusiveness e.g. [t1, t2](t2, t3]. To solve these problems update incremental_selector to work in terms of ring position. This makes it possible to partition the token-space among sstables at decorated key granularity. It also makes it possible for select() to return a next_position that is guaranteed to make progress. partitioned_sstable_set now builds the internal interval map using the decorated key of the sstables, not just the tokens. incremental_selector::select() now uses `dht::ring_position_view` as both the selector and the next_position. ring_position_view can express positions between keys so it can also include information about inclusiveness/exclusiveness of the next interval guaranteeing forward progress. [1] `sstable_set::incremental_selector::selection::next_position`	2018-07-04 17:42:33 +03:00
Botond Dénes	07fb2e9c4d	make_foreign_reader: don't wrap local readers If the to-be-wrapped reader is local (lives on the same shard where make_foreign_reader() is called) there is no need to wrap it with foreign_reader. Just return it as is. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <886ed883b707f163603a40b56b8823f2bb6c47c6.1523873224.git.bdenes@scylladb.com>	2018-04-16 15:11:20 +03:00
Botond Dénes	3a6f397fd0	Add multishard_combined_reader Takes care of reading a range from all shards that own a subrange in the range. The read happens sequentially, reading from one shard at a time. Under the hood it uses combined_mutation_reader and foreign_reader, the former providing the merging logic and the latter taking care of transferring the output of the remote readers to the local shard. Readers are created on-demand by a reader-selector implementation that creates readers for yet unvisited shards as the read progresses. The read starts with a concurrency of one, that is the reader reads from a single shard at a time. The concurrency is exponentially increased (to a maximum of the number of shards) when a reader's buffer is empty after moving to the next shard. This condition is important as we only want to increase concurrency for sparse tables that have little data and the reader has to move between shards often. When concurrency is > 1, the reader issues background read-aheads to the next shards so that by the time it needs to move to them they have the data ready. For dense tables (where we rarely cross shards) we rely on the foreign_reader to issue sufficient read-aheads on its own to avoid blocking.	2018-04-11 10:03:47 +03:00
Botond Dénes	2c0f8d0586	Add foreign_reader Local representative of a reader located on a remote shard. Manages the lifecycle and takes care of seamlessly transferring fragments produced by the remote reader. Fragments are copied between the shards in batches, a bufferful at a time. To maximize throughput, read-ahead is used. After each fill_buffer() or fast_forward_to() a read-ahead (a fill_buffer() on the remote reader) is issued. This read-ahead runs in the background and is brought back to the foreground on the next fill_buffer() or fast_forward_to() call.	2018-04-11 09:22:45 +03:00
Botond Dénes	f488ae3917	Add buffer_size() to flat_mutation_reader buffer_size() exposes the collective size of the external memory consumed by the mutation-fragments in the flat reader's buffer. This provides a basis to build basic memory accounting on. Although this is not the entire memory consumption of any given reader, it is the most volatile component and usually by far the largest one too.	2018-03-13 10:34:34 +02:00
Botond Dénes	1259031af3	Use the reader_concurrency_semaphore to limit reader concurrency	2018-03-08 14:12:12 +02:00
Botond Dénes	872fd369ba	Add reader_resource_tracker param to mutation_source Soon, reader_resource_tracker will only be constructible after the reader has been admitted. This means that the resource tracker cannot be preconstructed and just captured by the lambda stored in the mutation source and instead has to be passed in along the other parameters.	2018-03-08 14:12:09 +02:00
Piotr Jastrzebski	28c36d8884	Delete unused do_consume_streamed_mutation_flattened Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:56:49 +01:00
Piotr Jastrzebski	61f0ac257f	Delete unused mutation_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:56:48 +01:00
Piotr Jastrzebski	9ce48bc5fc	Delete unused consume(mutation_reader&, Consumer) Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:56:48 +01:00
Piotr Jastrzebski	37285ad7fa	Delete unused make_reader_returning Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:56:48 +01:00
Piotr Jastrzebski	864db78fcf	Delete unused make_reader_returning_many Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:56:48 +01:00
Piotr Jastrzebski	ff4ffc1c64	Delete unused make_empty_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:56:48 +01:00
Piotr Jastrzebski	0b8aedcc59	Delete unused mutation_reader_from_flat_mutation_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:56:48 +01:00
Piotr Jastrzebski	8aaf5dc900	Delete unused streamed_mutation_from_flat_mutation_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:56:48 +01:00
Piotr Jastrzebski	d266eaa01e	mutation_source: rename make_flat_mutation_reader to make_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-19 09:30:12 +01:00
Piotr Jastrzebski	380d5c3402	Remove unused mutation_source::operator() Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-19 08:56:37 +01:00
Avi Kivity	93076d25b6	Merge "mutation_source: remove support for creation with mutation_reader" from Piotr "After this patchset it's only possible to create a mutation_source with a function that produces flat_mutation_reader." * 'haaawk/mutation_source_v1' of ssh://github.com/scylladb/seastar-dev: Merge flat_mutation_reader_mutation_source into mutation_source Remove unused mutation_reader_mutation_source Remove unused mutation_source constructor. Migrate make_source to flat reader Migrate run_conversion_to_mutation_reader_tests to flat reader flat_mutation_reader_from_mutations: add support for slicing Remove unused mutation_source constructor. Migrate partition_counting_reader to flat reader Migrate throttled_mutation_source to flat reader Extract delegating_reader from make_delegating_reader row_cache_test: call row_cache::make_flat_reader in mutation_sources Remove unused friend declaration in flat_mutation_reader::impl Migrate make_source_with to flat reader Migrate make_empty_mutation_source to flat reader Remove unused mutation_source constructor Migrate test_multi_range_reader to flat reader Remove unused mutation_source constructors	2018-01-15 18:15:53 +02:00
Glauber Costa	08a0c3714c	allow request-specific read timeouts in storage proxy reads Timeouts are a global property. However, for tables in keyspaces like the system keyspace, we don't want to uphold that timeout--in fact, we want no timeout there at all. We already apply such configuration for requests waiting in the queued sstable queue: system keyspace requests won't be removed. However, the storage proxy will insert its own timeouts in those requests, causing them to fail. This patch changes the storage proxy read layer so that the timeout is applied based on the column family configuration, which is in turn inherited from the keyspace configuration. This matches our usual way of passing db parameters down. In terms of implementation, we can either move the timeout inside the abstract read executor or keep it external. The former is a bit cleaner, the latter has the nice property that all executors generated will share the exact same timeout point. In this patch, we chose the latter. We are also careful to propagate the timeout information to the replica. So even if we are talking about the local replica, when we add the request to the concurrency queue, we will do it in accordance with the timeout specified by the storage proxy layer. After this patch, Scylla is able to start just fine with very low timeouts--since read timeouts in the system keyspace are now ignored. Fixes #2462 Implementation notes, and general comments about open discussion in 2462: * Because we are not bypassing the timeout, just setting it high enough, I consider the concerns about the batchlog moot: if we fail for any other reason that will be propagated. Last case, because the timeout is per-CF, we could do what we do for the dirty memory manager and move the batchlog alone to use a different timeout setting. * Storage proxy likes specifying its timeouts as a time_point, whereas when we get low enough as to deal with the read_concurrency_config, we are talking about deltas.
So at some point we need to convert time_points to durations. We do that in the database query functions. v2: - use per-request instead of per-table timeouts. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-12 07:43:21 -05:00
Glauber Costa	3c9eeea4cf	restricted_mutation_reader: don't pass timeouts through the config structure This patch enables passing a timeout to the restricted_mutation_reader through the read path interface -- using fill_buffer and friends. This will serve as a basis for having per-request timeouts. The config structure still has a timeout, but that is so far only used to actually pass the value to the query interface. Once that starts coming from the storage proxy layer (next patch) we will remove it. The query callers are patched so that we pass the timeout down. We patch the callers in database.cc, but leave the streaming ones alone. That can be safely done because the default for the query path is now no_timeout, and that is what the streaming code wants. So there is no need to complicate the interface to allow for passing a timeout that we intend to disable. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-12 07:43:21 -05:00
Glauber Costa	5140aaea00	add a timeout to fast forward to In the last patch, as part of enabling per-request timeouts, we enabled timeouts in fill_buffer. There are many places, though, in which we fast_forward_to before we fill_buffer, so in order to make that effective we need to propagate the timeouts to fast_forward_to as well. In the same way as fill_buffer, we make the argument optional wherever possible in the high level callers, making them mandatory in the implementations. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-12 07:43:19 -05:00
Glauber Costa	d965af42b0	add a timeout to fill_buffer As part of the work to enable per-request timeouts, we enable timeouts in fill_buffer. The argument is made optional at the main classes, but mandatory in all the ::impl versions. This way we'll make sure we didn't forget anything. At this point we're still mostly passing that information around and don't have any entity that will act on those timeouts. In the next patch we will wire that up. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-11 12:07:41 -05:00
Raphael S. Carvalho	818830715f	Fix potential infinite recursion when combining mutations for leveled compaction The issue is triggered by compaction of sstables of level higher than 0. The problem happens when interval map of partitioned sstable set stores intervals such as the following: [-9223362900961284625 : -3695961740249769322 ] (-3695961740249769322 : -3695961103022958562 ] When selector is called for first interval above, the exclusive lower bound of the second interval is returned as next token, but the inclusiveness info is not returned. So reader_selector was returning that there were new readers when the current token was -3695961740249769322 because it was stored in selector position field as inclusive, but it's actually exclusive. This false positive was leading to infinite recursion in combined reader because sstable set's incremental selector itself knew that there were actually no new readers, and therefore no progress could be made. Fix is to use ring_position in reader_selector, such that inclusiveness would be respected. So reader_selector::has_new_readers() won't return a false positive under the conditions described above. Fixes #2908. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-01-03 16:23:01 -02:00
Avi Kivity	8795238869	Merge "Fix handling of range tombstones starting at same position" from Tomasz "When we get two range tombstones with the same lower bound from different data sources (e.g. two sstables), which need to be combined into a single stream, they need to be de-overlapped, because each mutation fragment in the stream must have a different position. If we have range tombstones [1, 10) and [1, 20), the result of that de-overlapping will be [1, 10) and [10, 20). The problem is that if the stream corresponds to a clustering slice with upper bound greater than 1, but lower than 10, the second range tombstone would appear as being out of the query range. This is currently violating assumptions made by some consumers, like cache populator. One effect of this may be that a reader will miss rows which are in the range (1, 10) (after the start of the first range tombstone, and before the start of the second range tombstone), if the second range tombstone happens to be the last fragment which was read for a discontinuous range in cache and we stopped reading at that point because of a full buffer and cache was evicted before we resumed reading, so we went to reading from the sstable reader again. There could be more cases in which this violation may resurface. There is also a related bug in mutation_fragment_merger. If the reader is in forwarding mode, and the current range is [1, 5], the reader would still emit range_tombstone([10, 20]). If that reader is later fast forwarded to another range, say [6, 8], it may produce fragments with smaller positions which were emitted before, violating monotonicity of fragment positions in the stream. A similar bug was also present in partition_snapshot_flat_reader.
Possible solutions: 1) relax the assumption (in cache) that streams contain only relevant range tombstones, and only require that they contain at least all relevant tombstones 2) allow subsequent range tombstones in a stream to share the same starting position (position is weakly monotonic), then we don't need to de-overlap the tombstones in readers. 3) teach combining readers about query restrictions so that they can drop fragments which fall outside the range 4) force leaf readers to trim all range tombstones to query restrictions This patch implements solution no 2. It simplifies combining readers, which don't need to accumulate and trim range tombstones. I don't like solution 3, because it makes combining readers more complicated, slower, and harder to properly construct (currently combining readers don't need to know restrictions of the leaf streams). Solution 4 is confined to implementations of leaf readers, but also has disadvantage of making those more complicated and slower. There is only one consumer which needs the tombstones with monotonic positions, and that is the sstable writer. Fixes #3093." 
* tag 'tgrabiec/fix-out-of-range-tombstones-v1' of github.com:scylladb/seastar-dev: tests: row_cache: Introduce test for concurrent read, population and eviction tests: sstables: Add test for writing combined stream with range tombstones at same position tests: memtable: Test that combined mutation source is a mutation source tests: memtable: Test that memtable with many versions is a mutation source tests: mutation_source: Add test for stream invariants with overlapping tombstones tests: mutation_reader: Test fast forwarding of combined reader with overlapping range tombstones tests: mutation_reader: Test combined reader slicing on random mutations tests: mutation_source_test: Extract random_mutation_generator::make_partition_keys() mutation_fragment: Introduce range() clustering_interval_set: Introduce overlaps() clustering_interval_set: Extract private make_interval() mutation_reader: Allow range tombstones with same position in the fragment stream sstables: Handle consecutive range_tombstone fragments with same position tests: streamed_mutation_assertions: Merge range_tombstones with the same position in produces_range_tombstone() streamed_mutation: Introduce peek() mutation_fragment: Extract mergeable_with() mutation_reader: Move definition of combining mutation reader to source file mutation_reader: Use make_combined_reader() to create combined reader	2018-01-02 18:32:09 +02:00
Duarte Nunes	1374f898b9	Merge seastar upstream Class optimized_optional was moved into seastar, and its usage simplified so move_and_disengage() is replaced in favour of std::exchange(_, { }). * seastar adaca37...b0f5591 (9): > Merge "core: Introduce cancellation mechanism" from Duarte > Fix Seastar build that no longer builds with --enable-dpdk after the recent commit fd87ea2 > noncopyable_function: support function objects whose move constructors throw > Adding new hardware options to new config format, using new config format for dpdk device > Fix check for Boost version during pre-build configuration. > variant_utils: add variant_visitor constructor for C++17 mode > Merge "Allows json object to be stream to an" from Amnon > Merge 'Default to C++17' from Avi > Add const version of subscript operator to circular_buffer Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20171228112126.18142-1-duarte@scylladb.com>	2017-12-28 13:24:18 +02:00
Piotr Jastrzebski	0430968426	Merge flat_mutation_reader_mutation_source into mutation_source Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-22 22:32:38 +01:00
Piotr Jastrzebski	3817519844	Remove unused mutation_reader_mutation_source Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-22 21:42:50 +01:00
Piotr Jastrzebski	e0e2fcc013	Remove unused mutation_source constructor. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-22 21:27:43 +01:00
Piotr Jastrzebski	093d6f06f0	Remove unused mutation_source constructor. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-22 16:10:41 +01:00
Tomasz Grabiec	60ed5d29c0	mutation_reader: Move definition of combining mutation reader to source file So that the whole world doesn't recompile when it changes.	2017-12-21 21:24:11 +01:00
Tomasz Grabiec	52285a9e73	mutation_reader: Use make_combined_reader() to create combined reader So that we can hide the definition of combined_mutation_reader. It's also less verbose.	2017-12-21 21:24:11 +01:00
Piotr Jastrzebski	b5ad96c9ca	Remove unused mutation_source constructor Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-21 21:01:35 +01:00
Piotr Jastrzebski	b583ef7c8b	Remove unused mutation_source constructors Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-21 20:34:20 +01:00
Piotr Jastrzebski	04ce7dfb84	Remove unused make_combined_reader overload. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-21 17:00:43 +01:00
Piotr Jastrzebski	b3b6db4f50	Remove unused make_combined_reader overload. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-21 17:00:43 +01:00
Piotr Jastrzebski	024e01ad9e	mutation_source: Add constructors for sources that ignore forwarding Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-21 16:59:57 +01:00
Piotr Jastrzebski	ff718d6573	Add default parameter values in make_combined_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-12-21 11:47:07 +01:00
Piotr Jastrzebski	ac1d2f98e4	Fix build by removing semicolon after concept Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <4504cf47be0a451c58052476bc8cc4f9cba59472.1513248094.git.piotr@scylladb.com>	2017-12-14 10:46:13 +00:00
Paweł Dziepak	a0a13ceb46	filtering_reader: switch to flat mutation fragment streams	2017-12-13 12:01:03 +00:00
Paweł Dziepak	3bbb3b300d	filtering_reader: pass a const dht::decorated_key& to the callback All users of the filtering reader need only the decorated key of a partition, but currently the predicate is given a reference to streamed_mutations which are obsolete now.	2017-12-13 11:57:27 +00:00
Paweł Dziepak	d8dad04564	mutation_reader: drop make_restricted_reader() make_restricted_reader() has been replaced by make_restricted_flat_reader().	2017-12-13 11:57:22 +00:00
Paweł Dziepak	3839bc5d60	mutation_reader: convert restricted reader to flat streams	2017-12-13 10:46:41 +00:00
Botond Dénes	9661769313	combined_mutation_reader: fix fast-forwarding related row-skipping bug When fast forwarding is enabled and all readers positioned inside the current partition return EOS, return EOS from the combined-reader too, instead of skipping to the next partition if there are idle readers (positioned at some later partition) available. Skipping ahead causes rows to be skipped in some cases. The fix is to distinguish EOS'd readers that are only halted (waiting for a fast-forward) from those really out of data. To achieve this we track the last fragment-kind the reader emitted. If that was a partition-end then the reader is out of data, otherwise it might emit more fragments after a fast-forward. Without this additional information it is impossible to determine why a reader reached EOS and the code later may make the wrong decision about whether the combined-reader as a whole is at EOS or not. Also when fast-forwarding between partition-ranges or calling next_partition() we set the last fragment-kind of forwarded readers because they should emit a partition-start, otherwise they are out of data. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <6f0b21b1ec62e1197de6b46510d5508cdb4a6977.1512569218.git.bdenes@scylladb.com>	2017-12-06 16:09:05 +02:00
Botond Dénes	e7535f5e88	Add flat_mutation_reader overload of make_combined_reader	2017-12-04 07:57:43 +02:00
Botond Dénes	8731c1bc66	Flatten the implementation of combined_mutation_reader In fact flatten mutation_reader_merger and adjust combined_mutation_reader accordingly.	2017-12-04 07:57:43 +02:00
Botond Dénes	217740c608	Add mutation_fragment_merger This is the mutation fragment level equivalent of mutation_merger. It merges fragments produced by different sources. Mutation fragments are not as self-contained as streamed mutations, they have external context, e.g. the partition they belong to. To support this mutation_fragment_merger operates on a producer instead of a vector of fragments. Producer can have internal state and can do side-actions as fragments are consumed.	2017-12-04 07:57:43 +02:00
Botond Dénes	3f8110b5b6	Make combined_mutation_reader a flat_mutation_reader For now only the interface is converted, behind the scenes the previous implementation remains, its output is simply converted by flat_mutation_reader_from_mutation_reader. The implementation will be converted in the following patches.	2017-12-04 07:57:43 +02:00
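
The forward-progress argument made in a8e795a16e and 818830715f can be illustrated with a small standalone sketch. It is not ScyllaDB code: `position`, `interval`, `selection`, `select_at` and `walk` are hypothetical names, and the token-plus-weight pair is only a simplified stand-in for what the commits describe `dht::ring_position_view` as able to express. The point it tries to show is that once a position can sit just before or just after a token, the neighbouring intervals [t1, t2](t2, t3] become half-open ranges in position space, so every `next_position` is strictly greater than the position that produced it.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <tuple>
#include <vector>

// Hypothetical stand-in for a ring position: a token plus a weight that says
// whether the position sits before (-1) or after (+1) all keys of that token.
struct position {
    std::int64_t token;
    int weight;
};

inline bool operator<(const position& a, const position& b) {
    return std::tie(a.token, a.weight) < std::tie(b.token, b.weight);
}

// Half-open in position space: [start, end).
// The token interval [t1, t2] becomes {t1,-1}..{t2,+1},
// and (t2, t3] becomes {t2,+1}..{t3,+1}.
struct interval {
    position start;
    position end;
    int sstable_id;
};

struct selection {
    std::optional<int> sstable_id;          // empty if nothing covers the position
    std::optional<position> next_position;  // where the covering interval ends
};

// Linear lookup of the interval covering pos (illustrative only).
inline selection select_at(const std::vector<interval>& intervals, const position& pos) {
    for (const auto& i : intervals) {
        if (!(pos < i.start) && pos < i.end) {
            return {i.sstable_id, i.end};
        }
    }
    return {};
}

// Walks the intervals by feeding each next_position back into select_at().
inline std::vector<int> walk(const std::vector<interval>& intervals, position pos) {
    std::vector<int> visited;
    for (;;) {
        const selection s = select_at(intervals, pos);
        if (!s.sstable_id) {
            break;
        }
        // Progress check: the next position is strictly after the current one.
        assert(pos < *s.next_position);
        visited.push_back(*s.sstable_id);
        pos = *s.next_position;
    }
    return visited;
}
```

With intervals `{{10,-1},{20,+1},1}` and `{{20,+1},{30,+1},2}`, calling `walk` from `{10,-1}` visits sstable 1 and then sstable 2 and stops. A bare token 20 could not say which side of the shared bound it refers to, which is the ambiguity both commit messages blame for the lack of progress.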


147 Commits
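
The concurrency ramp-up described in 3a6f397fd0 can be modelled in a few lines. This is a sketch under assumptions, not the reader's implementation: the class name `shard_read_concurrency` and its methods are hypothetical, "exponentially increased" is taken to mean doubling, and the number of background read-aheads is assumed to be the concurrency minus the shard currently being read.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical model of the ramp-up; names are illustrative, not ScyllaDB's.
class shard_read_concurrency {
    std::size_t _shard_count;
    std::size_t _concurrency = 1;  // the read starts with a concurrency of one
public:
    explicit shard_read_concurrency(std::size_t shard_count)
        : _shard_count(shard_count) {
        assert(shard_count > 0);
    }

    // Called after the reader moved to the next shard. Concurrency grows only
    // if the buffer is empty at that point, which signals a sparse table.
    void on_moved_to_next_shard(bool buffer_empty) {
        if (buffer_empty) {
            // Assumption: "exponential" increase means doubling, capped at
            // the number of shards.
            _concurrency = std::min(_concurrency * 2, _shard_count);
        }
    }

    std::size_t concurrency() const { return _concurrency; }

    // Assumption: one slot is the shard being read, the rest are read-aheads.
    std::size_t read_ahead_shards() const { return _concurrency - 1; }
};
```

For a node with 6 shards, three consecutive empty-buffer shard switches take the concurrency from 1 to 2, 4 and then 6, while a switch with a non-empty buffer (the dense-table case) leaves it unchanged.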