scylladb

Author	SHA1	Message	Date
Botond Dénes	c3b4c3f451	evictable_reader: reset _range_override after fast-forwarding `_range_override` is used to store the modified range the reader reads after it has to be recreated (when recreating a reader its read range is reduced to account for partitions it already read). When engaged, this field overrides the `_pr` field as the definitive range the reader is supposed to be currently reading. Fast forwarding conceptually overrides the range the reader is currently reading, however currently it doesn't reset the `_range_override` field. This resulted in `_range_override` (containing the modified pre-fast-forward range) incorrectly overriding the fast-forwarded-to range in `_pr` when validating the first partition produced by the just recreated reader, resulting in a false-positive validation failure. Fixes: #8059 Tests: unit(release) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210217164744.420100-1-bdenes@scylladb.com>	2021-02-17 19:11:00 +02:00
Benny Halevy	d565e3fb57	reader_lifecycle_policy: retire low level try_resume method The caller can now just call sem.unregister_inactive_read(irh) directly. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-02-08 20:32:40 +02:00
Benny Halevy	4e8f29ef14	reader_concurrency_semaphore: inactive_read: keep a flat_mutation_reader There's no need to hold a unique_ptr<flat_mutation_reader> as flat_mutation_reader itself holds a unique_ptr<flat_mutation_reader::impl> and functions as a unique ptr via flat_mutation_reader_opt. With that, unregister_inactive_read was modified to return a flat_mutation_reader_opt rather than a std::unique_ptr<flat_mutation_reader>, keeping exactly the same semantics. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-02-08 20:32:40 +02:00
Avi Kivity	913d970c64	Merge "Unify inactive readers" from Botond " Currently inactive readers are stored in two different places: * reader concurrency semaphore * querier cache With the latter registering its inactive readers with the former. This is an unnecessarily complex (and possibly surprising) setup that we want to move away from. This series solves this by moving the responsibility of storing inactive reads solely to the reader concurrency semaphore, including all supported eviction policies. The querier cache is now only responsible for indexing queriers and maintaining relevant stats. This makes the ownership of the inactive readers much more clear, hopefully making Benny's work on introducing close() and abort() a little bit easier. Tests: unit(release, debug:v1) " * 'unify-inactive-readers/v2' of https://github.com/denesb/scylla: reader_concurrency_semaphore: store inactive readers directly querier_cache: store readers in the reader concurrency semaphore directly querier_cache: retire memory based cache eviction querier_cache: delegate expiry to the reader_concurrency_semaphore reader_concurrency_semaphore: introduce ttl for inactive reads querier_cache: use new eviction notify mechanism to maintain stats reader_concurrency_semaphore: add eviction notification facility reader_concurrency_semaphore: extract evict code into method evict()	2021-02-03 10:59:04 +02:00
Botond Dénes	226088d12e	mutation_reader: reader_lifecycle_policy::stopped_reader: drop pending_next_partition flag It's not used anymore.	2021-01-22 16:18:59 +02:00
Botond Dénes	4eb65b12a0	mutation_reader: evictable_reader: remove next_partition() workaround `next_partition()` now returns a future<>, so we can forward it to the remote shard in the scope of the next_partition() call. Remove the now-obsolete workaround for the synchronous next_partition().	2021-01-22 16:18:30 +02:00
Botond Dénes	febd2feb4c	mutation_reader: shard_reader: remove next_partition() workaround `next_partition()` now returns a future<>, so we can forward it to the remote shard in the scope of the next_partition() call. Remove the now-obsolete workaround for the synchronous next_partition().	2021-01-22 15:53:05 +02:00
Botond Dénes	81da6b756f	mutation_reader: foreign_reader: remove next_partition() workaround `next_partition()` now returns a future<>, so we can forward it to the remote shard in the scope of the next_partition() call. Remove the now-obsolete workaround for the synchronous next_partition().	2021-01-22 15:30:36 +02:00
Kamil Braun	570d15c7bc	multishard_combining_reader: do not use `smp::count` `multishard_combining_reader` currently only works under the assumption that every table uses the same sharder configured using the node's number of shards. But we could potentially specify a different sharder for a chosen table, e.g. one that puts everything on shard 0. Then this assumption will be broken and the reader causes a segfault. Fixes #7945.	2021-01-21 18:28:18 +02:00
Benny Halevy	29002e3b48	flat_mutation_reader: return future from next_partition To allow it to asynchronously close underlying readers on next_partition(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-01-13 17:35:07 +02:00
Kamil Braun	5e846b33b8	clustering_order_reader_merger: fix the 0 readers case With 0 readers the merger would produce a `partition_end` fragment when it should immediately return `end_of_stream` instead.	2020-12-18 12:30:40 +01:00
Kamil Braun	0b36c5e116	mutation_reader: introduce clustering_order_reader_merger This abstraction is used to merge the output of multiple readers, each opened for a single partition query, into a non-decreasing stream of mutation_fragments. It is similar to `mutation_reader_merger`; an important difference is that the new merger may select new readers in the middle of a partition after it already returned some fragments from that partition. It uses the new `position_reader_queue` abstraction to select new readers. It doesn't support multi-partition (ring range) queries. The new merger will be later used when reading from sstable sets created by TimeWindowCompactionStrategy. This strategy creates many sstables that are mostly disjoint w.r.t. the contained clustering keys, so we can delay opening sstable readers when querying a partition until after we have processed all mutation fragments with positions before the keys contained by these sstables.	2020-11-30 11:55:44 +01:00
Kamil Braun	857911d353	mutation_reader: generalize `combined_mutation_reader` It is now called `merging_reader`, and is used to turn a `FragmentProducer` that produces a non-decreasing stream of mutation fragment batches into a `flat_mutation_reader` producing a non-decreasing stream of fragments. The resulting stream of fragments is increasing except for places where we encounter range tombstones (multiple range tombstones may be produced with the same position_in_partition). `merging_reader` is a simple adapter over `mutation_fragment_merger`. The old `combined_mutation_reader` is simply a specialization of `merging_reader` where the used `FragmentProducer` is `mutation_reader_merger`, an abstraction that merges the output of multiple readers into one non-decreasing stream of fragment batches. There is no separate class for `combined_mutation_reader` now. Instead, `make_combined_reader` works directly with `merging_reader`.	2020-11-19 14:35:11 +01:00
Kamil Braun	60adee6900	mutation_reader: fix description of mutation_fragment_merger The resulting sequence is not necessarily strictly increasing (e.g. if there are range tombstones).	2020-11-19 14:29:04 +01:00
Botond Dénes	f5323b29d9	mutation_reader: queue_reader: don't set EOS flag on abort If the consumer happens to check the EOS flag before it hits the exception injected by the abort (by calling fill_buffer()), they can think the stream ended normally and expect it to be valid. However this is not guaranteed when the reader is aborted. To avoid consumers falsely thinking the stream ended normally, don't set the EOS flag on abort at all. Additionally make sure the producer is aborted too on abort. In theory this is not needed as they are the one initiating the abort, but better to be safe than sorry. Fixes: #7411 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20201102100732.35132-1-bdenes@scylladb.com>	2020-11-11 13:44:25 +02:00
Pavel Emelyanov	3da3d448c8	range_tombstone: Remove unused schema arg from .set_start Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-11-06 15:13:05 +03:00
Botond Dénes	ff623e70b3	reader_concurrency_semaphore: name permits Require a schema and an operation name to be given to each permit when created. The schema is that of the table the read is executed against, and the operation name is a name identifying the operation the permit is part of. Ideally this should be different for each site the permit is created at, to be able to discern not only different kinds of reads, but different code paths the read took. As not all reads can be associated with one schema, the schema is allowed to be null. The name will be used for debugging purposes, both for coredump debugging and runtime logging of permit-related diagnostics.	2020-10-13 12:32:13 +03:00
Botond Dénes	307cdf1e0d	multishard_combining_reader: reader_lifecycle_policy: add permit param to create_reader() Allow the evictable reader managing the underlying reader to pass its own permit to it when creating it, making sure they share the same permit. Note that the two parts can still end up using different permits, when the underlying reader is kept alive between two pages of a paged read and thus keeps using the permit received on the previous page. Also adjust the `reader_context` in multishard_mutation_query.cc to use the passed-in permit instead of creating a new one when creating a new reader.	2020-10-12 15:56:56 +03:00
Botond Dénes	e09ab09fff	multishard_combining_reader: add permit parameter Don't create its own permit, take one as a parameter, like all other readers do, so the permit can be provided by the higher layer, making sure all parts of the logical read use the same permit.	2020-10-12 15:56:56 +03:00
Botond Dénes	600f1c7853	multishard_combining_reader: shard_reader: use multishard reader's permit Don't create a new permit per shard reader, pass down the multishard reader's one to be used by each shard reader. They all belong to the same read, they should use the same permit. Note that despite their name, the shard readers are the local representation of a reader living on a remote shard and as such they live on the same shard the multishard combining reader lives on.	2020-10-12 15:56:56 +03:00
Botond Dénes	dd372c8457	flat_mutation_reader: de-virtualize buffer_size() The main user of this method, the one which required this method to return the collective buffer size of the entire reader tree, is now gone. The remaining two users just use it to check the size of the reader instance they are working with. So de-virtualize this method and reduce its responsibility to just returning the buffer size of the current reader instance.	2020-10-06 08:22:56 +03:00
Botond Dénes	6ca0464af5	mutation_fragment: add schema and permit We want to start tracking the memory consumption of mutation fragments. For this we need schema and permit during construction, and on each modification, so the memory consumption can be recalculated and passed to the permit. In this patch we just add the new parameters and go through the insane churn of updating all call sites. They will be used in the next patch.	2020-09-28 11:27:23 +03:00
Botond Dénes	72a88e0257	mutation_fragment: s/as_mutable_range_tombstone/mutate_as_range_tombstone/ We will soon want to update the memory consumption of mutation fragment after each modification done to it, to do that safely we have to forbid direct access to the underlying data and instead have callers pass a lambda doing their modifications. Uses where this method was just used to move the fragment away are converted to use `as_range_tombstone() &&`.	2020-09-28 10:53:56 +03:00
Botond Dénes	0518571e56	flat_mutation_reader: make _buffer a tracked buffer Via a tracked_allocator. Although the memory allocations made by the _buffer shouldn't dominate the memory consumption of the read itself, they can still be a significant portion that scales with the number of readers in the read.	2020-09-28 10:53:56 +03:00
Botond Dénes	77ea44cb73	mutation_reader: extract the two fill_buffer_result into a single one Currently we have two, nearly identical definitions of said struct. Extract it to a common definition and rename it to `remote_fill_buffer_result`.	2020-09-28 10:53:56 +03:00
Botond Dénes	3fab83b3a1	flat_mutation_reader: impl: add reader_permit parameter Not used yet, this patch does all the churn of propagating a permit to each impl. In the next patch we will use it to track the memory consumption of `_buffer`.	2020-09-28 10:53:48 +03:00
Botond Dénes	0b0ae18a14	evictable_reader: validate buffer after recreating the underlying The reader recreation mechanism is a very delicate and error-prone one, as proven by the countless bugs it had. Most of these bugs were related to the recreated reader not continuing the read from the expected position, inserting out-of-order fragments into the stream. This patch adds a defense mechanism against such bugs by validating the start position of the recreated reader. Several things are checked: * The partition is the expected one -- the one we were in the middle of or the next if we stopped at partition boundaries. * The partition is in the read range. * The first fragment in the partition is the expected one -- has an equal or larger position than the next expected fragment. * The fragment is in the clustering range as defined by the slice. As these validations are only done on the slow-path of recreating an evicted reader, no performance impact is expected.	2020-09-25 12:09:00 +03:00
Botond Dénes	91020eef73	evictable_reader: update_next_position(): only use peek'd position on partition boundary `evictable_reader::update_next_position()` is used to record the position the reader will continue from, in the next buffer fill. This position is used to create the partition slice when the underlying reader is evicted and has to be recreated. There is an optimization in this method -- if the underlying's buffer is not empty we peek at the first fragment in it and use it as the next position. This is however problematic for buffer validation on reader recreation (introduced in the next patch), because using the next row's position as the next pos will allow for range tombstones to be emitted with before_key(next_pos.key()), which will trigger the validation. Instead of working around this, just drop this optimization for mid-partition positions, it is inconsequential anyway. We keep it for where it is important, when we detect that we are at a partition boundary. In this case we can avoid reading the current partition altogether when recreating the reader.	2020-09-25 12:09:00 +03:00
Botond Dénes	4f2e7a18e2	evictable_reader: trim range tombstones to the read clustering range Currently mutation sources are allowed to emit range tombstones that are outside the clustering read range if they are relevant to it. For example a read of a clustering range [ck100, +inf), might start with: range_tombstone{start={ck1, -1}, end={ck200, 1}}, clustering_row{ck100} The range tombstone is relevant to the range and the first row of the range so it is emitted first, but its position (start) is outside the read range. This is normally fine, but it poses a problem for evictable reader. When the underlying reader is evicted and has to be recreated from a certain clustering position, this results in out-of-order mutation fragments being inserted into the middle of the stream. This is not fine anymore as the monotonicity guarantee of the stream is violated. The real solution would be to require all mutation sources to trim range tombstones to their read range, but this is a lot of work. Until that is done, as a workaround we do this trimming in the evictable reader itself.	2020-09-25 12:09:00 +03:00
Botond Dénes	4944e050e3	mutation_reader: make_combined_reader(): return empty reader when combining 0 readers Avoid creating all the combining machinery when we know there is no data to be had. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200821045602.13096-1-bdenes@scylladb.com>	2020-08-22 20:47:49 +03:00
Botond Dénes	a9013030cf	multishard_mutation_reader: add a trace message for each shard reader created So we can see in the trace output the shards that actually participated in the reads. There is a single message for each shard reader. Fixes: #6888 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200803132338.95013-1-bdenes@scylladb.com>	2020-08-03 16:24:46 +03:00
Botond Dénes	5de0afdab7	mutation_reader: expose new_reader_base_cost So that test code can use it.	2020-07-20 11:23:39 +03:00
Botond Dénes	627054c3d7	mutation_reader: shard_reader: fix fast-forwarding with read-ahead The current `fast_forward_to(const dht::partition_range&)` implementation has two problems: * If the reader was not created yet, but there is an ongoing read-ahead (which is going to create it), the function bails out. This will result in this shard reader not being fast-forwarded to the new range at all. * If the reader was already created and there is an ongoing read-ahead, the function will wait for this to complete, then fast-forward the reader, as it should. However, the buffer is cleared before the read-ahead is waited for. So if the read-ahead brings in new data, this will land in the buffer. This data will be outside of the fast-forwarded-to range and worse, as we just cleared the buffer, it might violate mutation fragment stream monotonicity requirements. This patch fixes both of these bugs. Targeted reproducer unit tests are coming in the next patches.	2020-07-01 09:51:02 +03:00
Botond Dénes	e31f7316c0	mutation_reader: evictable_reader: add assert against pause handle leak We are currently investigating a segmentation fault, which is suspected to be caused by a leaked pause handle. Although according to the latest theory the handle leak is not the root cause of the issue, just a symptom, it's better to catch any bugs that would cause a handle leak in the act, and not later when some side-effect causes a segfault. Refs: #6613 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200625153729.522811-1-bdenes@scylladb.com>	2020-06-28 12:08:25 +03:00
Botond Dénes	542d9c3711	mutation_reader: expose evictable_reader Expose functions for the outside world to create evictable readers. We expose two functions, which create an evictable reader with `auto_pause::yes` and `auto_pause::no` respectively. The function creating the latter also returns a handle in addition to the reader, which can be used to pause the reader.	2020-06-23 21:08:21 +03:00
Botond Dénes	1cc31deff9	mutation_reader: evictable_reader: add auto_pause flag Currently the evictable reader unconditionally pauses the underlying reader after each use (`fill_buffer()` or `fast_forward_to()` call). This is fine for current users (the multishard reader), but the future user we are doing all this refactoring for -- repair -- will want to control when the underlying reader is paused "manually". Both these behaviours can easily be supported in a single implementation, so we add an `auto_pause` flag to allow the creator of the evictable reader to control this.	2020-06-23 21:08:21 +03:00
Botond Dénes	af9e1c23e1	mutation_reader: make evictable_reader a flat_mutation_reader The `evictable_reader` class is almost a proper flat mutation reader already, it roughly offers the same interface. This patch makes this formal: changing the class to inherit from `flat_mutation_reader::impl`, and implement all virtual methods. This also entails a departure from using the lifecycle policy to pause/resume and create readers, instead using more general building blocks like the reader concurrency semaphore and a mutation source.	2020-06-23 21:08:21 +03:00
Botond Dénes	4485864ada	mutation_reader: s/inactive_shard_read/inactive_evictable_reader/ Rename `inactive_shard_read` to `inactive_evictable_reader` to reflect the fact that the evictable reader is going to be of general use, not specific to the multishard reader.	2020-06-23 10:01:38 +03:00
Botond Dénes	b6ed054c08	mutation_reader: move inactive_shard_reader code up It will be used by the `evictable_reader` code too in the next patches.	2020-06-23 10:01:38 +03:00
Botond Dénes	e3ea1c9080	mutation_reader: fix indentation Deferred from the previous patch.	2020-06-23 10:01:38 +03:00
Botond Dénes	f9d1916499	mutation_reader: shard_reader: extract remote_reader as evictable_reader We want to make the evictable reader mechanism used in the multishard reader pipeline available for general (re)use, as a standalone flat mutation reader implementation. The first step is extracting `shard_reader::remote_reader`, the class implementing this logic, into a top-level class, also renamed to `evictable_reader`.	2020-06-23 10:01:38 +03:00
Avi Kivity	a4c44cab88	treewide: update concepts language from the Concepts TS to C++20 Seastar recently lost support for the experimental Concepts Technical Specification (TS) and gained support for C++20 concepts. Re-enable concepts in Scylla by updating our use of concepts to the C++20 standard. This change: - peels off uses of the GCC6_CONCEPT macro - removes inclusions of <seastar/gcc6-concepts.hh> - replaces function-style concepts (no longer supported) with equation-style concepts - semicolons added and removed as needed - deprecated std::is_pod replaced by recommended replacement - updates return type constraints to use concepts instead of type names (either std::same_as or std::convertible_to, with std::same_as chosen when possible) No attempt is made to improve the concepts; this is a specification update only. Message-Id: <20200531110254.2555854-1-avi@scylladb.com>	2020-06-02 09:12:21 +03:00
Botond Dénes	4409579352	mutation_reader: restricted_reader: work in terms of reader_permit We want to refactor all read resource tracking code to work through the read_permit, so refactor the restricted reader to also do so.	2020-05-28 11:34:35 +03:00
Botond Dénes	06dd3d9077	queue_reader: push(): eliminate unneeded continuation on full buffer case Currently, push() attaches a continuation to the _not_full future, if push() is called when the buffer is already full. This is not needed as we can safely push the fragment even if the buffer is already full. Furthermore we can eliminate the possibility of push() being called when the buffer is full, by checking whether it is full after pushing the fragment, not before. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200521055840.376019-1-bdenes@scylladb.com>	2020-05-21 09:34:44 +03:00
Botond Dénes	54a0d8536e	restricting_mutation_reader: include own buffer in buffer size calculation Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200519102902.231042-1-bdenes@scylladb.com>	2020-05-19 18:23:15 +03:00
Glauber Costa	e44b2826ab	compaction: avoid abandoned futures when using interposers When using interposers, cancelling compactions can leave futures that are not waited for (resharding, twcs). The reason is when consume_end_of_stream gets called, it tries to push end_of_stream into the queue_reader_handle. Because cancelling a compaction is done through an exception, the queue_reader_handle is terminated already at this time. Trying to push to it generates another exception and prevents us from returning the future right below it. This patch adds a new method is_terminated() and if we detect that the queue_reader_handle is already terminated by this point, we don't try to push. We call it is_terminated() because the check is to see if the queue_reader_handle has a _reader. The reader is also set to null on successful destruction. Signed-off-by: Glauber Costa <glauber@scylladb.com> Reviewed-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200430175839.8292-1-glauber@scylladb.com>	2020-05-01 16:30:23 +03:00
Botond Dénes	196dd5fa9b	treewide: throw std::bad_function_call with backtraces We typically use `std::bad_function_call` to throw from mandatory-to-implement virtual functions, that cannot have a meaningful implementation in the derived class. The problem with `std::bad_function_call` is that it carries absolutely no information w.r.t. where it was thrown from. I originally wanted to replace `std::bad_function_call` in our codebase with a custom exception type that would allow passing in the name of the function it is thrown from to be included in the exception message. However after I ended up also including a backtrace, Benny Halevy pointed out that I might as well just throw `std::bad_function_call` with a backtrace instead. So this is what this patch does. All users are various unimplemented methods of the `flat_mutation_reader::impl` interface. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200408075801.701416-1-bdenes@scylladb.com>	2020-04-08 13:54:06 +02:00
Piotr Jastrzebski	e72696a8e6	sharding_info: rename the class to sharder Also rename all variables that were named si or sinfo to sharder. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:33 +02:00
Piotr Jastrzebski	031f589dba	multishard_combining_reader: use token_for_next_shard from sharding info not partitioner Previously this function was accessing sharding logic through partitioner obtained from the schema. While converting tests, dummy_partitioner is turned into dummy_sharding_info. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:25 +02:00
Rafael Ávila de Espíndola	c5795e8199	everywhere: Replace engine().cpu_id() with this_shard_id() This is a bit simpler and might allow removing a few includes of reactor.hh. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200326194656.74041-1-espindola@scylladb.com>	2020-03-27 11:40:03 +03:00
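Commit 4f2e7a18e2 above works around out-of-range range tombstones by trimming them to the clustering read range inside the evictable reader. The clipping step can be illustrated with a minimal sketch, assuming clustering positions are plain integers and intervals are half-open. `range_tombstone`, `clustering_range` and `trim_to_range` here are hypothetical, simplified stand-ins, not ScyllaDB's actual types or API.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

// Hypothetical, simplified model: clustering positions are plain integers.
// Real ScyllaDB code works with position_in_partition and its own
// range_tombstone type; nothing here is that API.
struct range_tombstone {
    int start; // inclusive
    int end;   // exclusive
};

struct clustering_range {
    int start;              // inclusive
    std::optional<int> end; // exclusive; std::nullopt means +inf
};

// Clip a tombstone to the read range so it never starts before the range
// start (or extends past its end). Returns std::nullopt when the tombstone
// does not overlap the range at all and should not be emitted.
std::optional<range_tombstone> trim_to_range(const range_tombstone& rt,
                                             const clustering_range& range) {
    int start = std::max(rt.start, range.start);
    int end = range.end ? std::min(rt.end, *range.end) : rt.end;
    if (start >= end) {
        return std::nullopt; // entirely outside the read range
    }
    return range_tombstone{start, end};
}
```

Loosely mirroring the commit's example, a read of `[100, +inf)` with a tombstone covering `[1, 200)` yields `[100, 200)`, so the first emitted fragment no longer sits before the range start and the stream stays monotonic when the reader is recreated mid-partition.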


218 Commits