scylladb

Author	SHA1	Message	Date
Avi Kivity	342c967b6a	Merge "Introduce compacting reader" from Botond " Allow adding compacting to any reader pipeline. The intended users are streaming and repair, with the goal to prevent wasting transfer bandwidth with data that is purgeable. No current user in the tree. Tests: unit(dev), mutation_reader_test.compacting_reader_(debug) " 'compacting-reader/v3' of https://github.com/denesb/scylla: test: boost/mutation_reader_test: add unit test for compacting_reader test: lib/flat_mutation_reader_assertions: be more lenient about empty mutations test: lib/mutation_source_test: make data compaction friendly test: random_mutation_generator: add generate_uncompactable mode mutation_reader: introduce compacting_reader	2020-03-16 16:41:50 +02:00
Botond Dénes	8286a0b1bd	mutation_reader: introduce compacting_reader Compacting reader compacts the output of another reader on-the-fly. Performs compaction-type compaction (`compact_for_sstables::yes`). It will be used in streaming and repair to eliminate purgeable data from the stream, thus preventing wasted transfer bandwidth.	2020-03-16 13:58:13 +02:00
Piotr Jastrzebski	924ed7bb1c	make_multishard_combining_reader: stop taking partitioner The function already takes schema so there's no need for it to take partitioner. It can be obtained using schema::get_partitioner Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-15 10:25:20 +01:00
Botond Dénes	dfc8b2fc45	treewide: replace reader_resource_tracker with reader_permit The former was never really more than a reader_permit with one additional method. Currently using it doesn't even save one from any includes. Now that readers will be using reader_permit we would have to pass down both to mutation_source. Instead get rid of reader_resource_tracker and just use reader_permit. Instead of making it a last and optional parameter that is easy to ignore, make it a first class parameter, right after schema, to signify that permits are now a prominent part of the reader API. This -- mostly mechanical -- patch essentially refactors mutation_source to ask for the reader_permit instead of reader_resource_tracker and updates all usage sites.	2020-01-28 08:13:16 +02:00
Botond Dénes	c0f96db2d9	reader_concurrency_semaphore: mv reader_resources and reader_permit to reader_permit.hh In the next patches we will replace `reader_resource_tracker` and have code use the `reader_permit` directly. In subsequent patches, the `reader_permit` will get even more usages as we attempt to make the tracking of reader resource more accurate by tracking more parts of it. So the grand plan is that the current `reader_concurrency_semaphore.hh` is split into two headers: * `reader_concurrency_semaphore.hh` - containing the semaphore proper. * `reader_permit.hh` - a very lightweight header, to be used by components which only want to track various parts of the resource consumption of reads.	2020-01-28 08:13:16 +02:00
Botond Dénes	2005495857	reader_concurrency_semaphore: reader_permit: make it a value type Currently `reader_permit` is passed around as `lw_shared_ptr<reader_permit>`, which is clunky to write and use and is also an unnecessary leak of details on how permit ownership is managed. Make `reader_permit` a simple value type, making it a little bit easier and safer to use. In the next patches we will get rid of `reader_resource_tracker` and instead have code use the permit instance directly, so this small improvement in usability will go a long way towards preventing an eyesore.	2020-01-28 08:13:16 +02:00
Piotr Dulikowski	2b4ca0c562	mutation_reader: gallop mode for combined reader When a single reader contributes a stream of fragments and keeps winning over other readers, mutation_reader_merger will enter gallop mode, in which it is assumed that the reader will keep winning over other readers. Currently, a reader needs to contribute 3 fragments to enter that mode. In gallop mode, fragments returned by the galloping reader will be compared with the best fragment from _fragment_heap. If it wins, the fragment is directly returned. Otherwise, gallop mode ends and merging is performed as in the general case, which involves heap operations. In the current implementation, when the end of partition is encountered while in gallop mode, the gallop mode is ended unconditionally. Fixes #3593.	2019-10-30 09:51:18 +01:00
Piotr Dulikowski	2a46a09e7c	mutation_reader: refactor prepare_next Move out logic responsible for adding readers at partition boundary into `maybe_add_readers_at_partition_boundary`, and advancing one reader into `prepare_one`. This will allow reusing this logic outside `prepare_next`.	2019-10-30 09:49:12 +01:00
Botond Dénes	6bfe468a17	multishard_combining_reader: remote_reader::recreate_reader(): restore indentation	2019-08-13 09:47:55 +03:00
Botond Dénes	68353acc1c	multishard_combining_reader: remote_reader: use next instead of last pos Currently the remote reader uses the last seen fragment's position to calculate the position the reader should continue from when the reader is recreated after having been evicted. Recently it was discovered that this logic breaks down badly when this last position is a non-full clustering prefix (a range tombstone start bound). In this case, if only the last position is available, there is no good way of computing the starting position. Starting after this position will potentially miss any rows that fall into the prefix (the current behaviour). Starting from before it will cause all range tombstones with said prefix to be re-emitted, causing other problems. A better solution is to exploit the fact that sometimes we also know what the next fragment is. These "some" times are the exact times that are problematic with the current approach -- when the last fragment is a range tombstone. Exploiting this extra knowledge allows for a much better way of calculating the starting position: instead of maintaining the last position, we maintain the next position, which is always safe to start from. This is not always possible, but in many cases we can know for sure what the next position is, for example if the last position was a static row we can be sure the next position is the first clustering position (or partition end). In the few cases where we cannot calculate the next position we fall back to the previous logic and start from after the last position. The good news is that in these remaining cases (the last fragment is a clustering row) it is safe to do so. This patch also does some refactoring of the remote-reader internals, all fill-buffer related logic is grouped together in a single `fill_buffer()` method.	2019-08-13 09:47:55 +03:00
Botond Dénes	3949189918	multishard_combining_reader: remote_reader::do_fill_buffer(): reorganize drop logic To make it more readable.	2019-08-13 09:47:55 +03:00
Botond Dénes	339be3853d	foreign_reader: silence warning about discarded future And add a comment explaining why this is fine. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190801062234.69081-1-bdenes@scylladb.com>	2019-08-01 10:11:24 +03:00
Botond Dénes	0f30bc0004	mutation_reader: move away from variadic futures Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190724102246.20450-1-bdenes@scylladb.com>	2019-07-27 13:21:24 +03:00
Botond Dénes	2ccd8ee47c	queue_reader: use the reader's buffer as the queue The queue reader currently uses two buffers, a `_queue` that the producer pushes fragments into and its internal `_buffer` where these fragments eventually end up being served to the consumer from. This double buffering is not necessary. Change the reader to allow the producer to push fragments directly into the internal `_buffer`. This complicates the code a little bit, as the producer logic of `seastar::queue` has to be folded into the queue reader. On the other hand this introduces proper memory consumption management, as well as reduces the amount of consumed memory and eliminates the possibility of outside code mangling with the queue. Another big advantage of the change is that there is now an explicit way to communicate the EOS condition, no need to push a disengaged `mutation_fragment_opt`. The producer of the queue reader now pushes the fragments into the reader via an opaque `queue_reader_handle` object, which has the producer methods of `seastar::queue`. Existing users of queue readers are refactored to use the new interface. Since the code is more complex now, unit tests are added as well.	2019-06-04 13:39:26 +03:00
Botond Dénes	a597e46792	Make queue_reader public Extract it from `multishard_writer.cc` and move it to `mutation_reader.{hh,cc}` so other code can start using it too.	2019-06-03 12:08:37 +03:00
Botond Dénes	eba310163d	multishard_combining_reader: fix handling of non-strictly monotonous positions The shard readers under a multishard reader are paused after every operation executed on them. When paused they can be evicted at any time. When this happens, they will be re-created lazily on the next operation, with a start position such that they continue reading from where the evicted reader left off. This start position is determined from the last fragment seen by the previous reader. When this position is a clustering position, the reader will be recreated such that it reads the clustering range (from the half-read partition): (last-ckey, +inf). This can cause problems if the last fragment seen by the evicted reader was a range-tombstone. Range tombstones can share the same clustering position with other range tombstones and potentially one clustering row. This means that when the reader is recreated, it will start from the next clustering position, ignoring any unread fragments that share the same position as the last seen range tombstone. To fix, ensure that on each fill-buffer call, the buffer contains all fragments for the last position. To this end, when the last fragment in the buffer is a range tombstone (with pos x), we continue reading until we see a fragment with a position y that is greater. This way it is ensured that we have seen all fragments for pos x and it is safe to resume the read, starting from after position x.	2019-04-26 11:38:12 +03:00
Botond Dénes	a3f79bfe5e	multishard_combining_reader: reorder shard_reader::remote_reader::do_fill_buffer() code Reduce the number of indentations - use early return for the short path.	2019-04-24 10:55:16 +03:00
Botond Dénes	bbd3f0acc3	multishard_combining_reader: shard_reader::remote_reader extract fill-buffer logic into do_fill_buffer()	2019-04-24 10:55:16 +03:00
Avi Kivity	88322086cb	Merge "Add fuzzer-type unit test for range scans" from Botond " This series adds a fuzzer-type unit test for range scans, which generates a semi-random dataset and executes semi-random range scans against it, validating the result. This test aims to cover a wide range of corner cases with the help of randomness. Data and queries against it are generated in such a way that various corner cases and their combinations are likely to be covered. The infrastructure under range-scans has undergone massive changes in the last year, growing in complexity and scope. The correctness of range scans is critical for the correct functioning of any Scylla cluster, and while the current unit tests served well in detecting any major problems (mostly while developing), they are too simplistic and can only be relied on to check the correctness of the basic functionality. This test aims to extend coverage drastically, testing cases that the author of the range-scan code or that of the existing unit tests didn't even think existed, by relying on some randomness. Fixes: #3954 (deprecates really) " * 'more-extensive-range-scan-unit-tests/v2' of https://github.com/denesb/scylla: tests/multishard_mutation_query_test: add fuzzy test tests/multishard_mutation_query_test: refactor read_all_partitions_with_paged_scan() tests/test_table: add advanced `create_test_table()` overload tests/test_table: make `create_test_table()` customizable query: add trim_clustering_row_ranges_to() tests/test_table: add keyspace and table name params tests/test_table: s/create_test_cf/create_test_table/ tests: move create_test_cf() to tests/test_table.{hh,cc} tests/multishard_mutation_query_test: drop many partition test tests/multishard_mutation_query_test: drop range tombstone test	2019-02-27 17:26:53 +02:00
Paweł Dziepak	b524f96a74	mutation_reader_merger: drop unneeded readers in small batches It was observed that destroying readers as soon as they are not needed negatively affects performance of relatively small reads. We don't want to keep them alive for too long either, since they may own a lot of memory, but deferring the destruction slightly and removing them in batches of 4 seems to solve the problem for the small reads.	2019-02-22 14:43:38 +00:00
Paweł Dziepak	435e24f509	mutation_reader_merger: track readers by iterators and not pointers mutation_reader_merger uses a std::list of mutation_reader to keep them alive while the rest of the logic operates on non-owning pointers. This means that when it is time to drop some of the readers that are no longer needed, the merger needs to scan the list looking for them. That's not ideal. The solution is to make the logic use iterators to elements in that list, which allows for O(1) removal of an unneeded reader. Iterators to list are just pointers to the node and are not invalidated by unrelated additions and removals.	2019-02-22 14:33:10 +00:00
Botond Dénes	9000626647	shard_reader: auto pause readers after being used Previously it was the responsibility of the layer above (multishard combining reader) to pause readers, which happened via an explicit `pause()` call. This proved to be a very bad design as we kept finding spots where the multishard reader should have paused the reader to avoid potential deadlocks (due to starved reader concurrency semaphores), but didn't. This commit moves the responsibility of pausing the reader into the shard reader. The reader is now kept in a paused state, except when it is actually used (a `fill_buffer()` or `fast_forward_to()` call is executing). This is fully transparent to the layer above. As a side note, the shard reader now also hides when the reader is created. This also used to be the responsibility of the multishard reader, and although it caused no problems so far, it can be considered a leak of internal details. The shard reader now automatically creates the remote reader on the first time it is attempted to be used. The code has been reorganized, such that there is now a clear separation of responsibilities. The multishard combining reader handles the combining of the output of the shard readers, as well as issuing read-aheads. The shard reader handles read-ahead and creating the remote reader when needed, as well as transferring the results of remote reads to the "home" shard. The remote reader (`shard_reader::remote_reader`, new in this patch) handles pausing-resuming as well as recreating the reader after it was evicted. Layers don't access each other's internals (like they used to). After this commit, the reader passed to `destroy_reader()` will always be in paused state.	2019-02-12 16:20:51 +02:00
Botond Dénes	37006135dc	shard_reader: make reader creation sync Reader creation happens through the `reader_lifecycle_policy` interface, which offers a `create_reader()` method. This method accepts a shard parameter (among others) and returns a future. Its implementation is expected to go to the specified shard and then return with the created reader. The method is expected to be called from the shard where the shard reader (and consequently the multishard reader) lives. This API, while reasonable enough, has a serious flaw. It doesn't make batching possible. For example, if the shard reader issues a call to the remote shard to fill the remote reader's buffer, but finds that it was evicted while paused, it has to come back to the local shard just to issue the recreate call. This makes the code both convoluted and slow. Change the reader creation API to be synchronous, that is, callable from the shard where the reader has to be created, allowing for simple call sites and batching. This change requires that implementations of the lifecycle policy update any per-reader data-structure they have from the remote shard. This is not a problem however, as these data-structures are usually partitioned, such that they can be accessed safely from a remote shard. Another, very pleasant, consequence of this change is that now all methods of the lifecycle interface are sync and thus calls to them cannot overlap anymore. This patch also removes the `test_multishard_combining_reader_destroyed_with_pending_create_reader` unit test, which is not useful anymore. For now just emulate the old interface inside shard reader. We will overhaul the shard reader after some further changes to minimize noise.	2019-02-12 16:20:51 +02:00
Botond Dénes	57d1f6589c	shard_reader: use semaphore directly to pause-resume The shard reader relies on the `reader_lifecycle_policy` for pausing and resuming the remote reader. The lifecycle policy's API was designed to be as general as possible, allowing for any implementation of pause/resume. However, in practice, we have a single implementation of pause/resume: registering/unregistering the reader with the relevant `reader_concurrency_semaphore`, and we don't expect any new implementations to appear in the future. Thus, the generic API of the lifecycle policy, is needlessly abstract making its implementations needlessly complex. We can instead make this very concrete and have the lifecycle policy just return the relevant semaphore, removing the need for every implementor of the lifecycle policy interface to have a duplicate implementation of the very same logic. For now just emulate the old interface inside shard reader. We will overhaul the shard reader after some further changes to minimize noise.	2019-02-12 16:20:51 +02:00
Botond Dénes	fae5a2a8c8	shard_reader: recreate_reader(): fix empty range case If the shard reader is created for a singular range (has a single partition), and then it is evicted after reaching EOS, when recreated we would have to create a reader that reads an empty range, since the only partition the range has was already read. Since it is not possible to create a reader with an empty range, we just didn't recreate the reader in this case. This is incorrect however, as the code might still attempt to read from this reader, if only due to a bug, and would trigger a crash. The correct fix is to create an empty reader that will immediately be at EOS.	2019-02-12 16:20:51 +02:00
Botond Dénes	cd807586f6	foreign_reader: rip out the now unused private API Drop all the glue code, needed in the past so the shard reader can be implemented on top of foreign reader. As the shard reader moved away from foreign reader, this glue code is not needed anymore.	2019-02-12 16:20:51 +02:00
Botond Dénes	d80bc3c0a5	shard_reader: move away from foreign_reader In the past, shard reader wrapped a foreign reader instance, adding functionality required by the multishard reader on top. This has worked well to a certain degree, but after the addition of pause-resume of shard reader, the cooperation with foreign reader became more-and-more a struggle. It has now gotten to a point, where it feels like shard reader is fighting foreign reader as much as it reuses it. This manifested itself in the ever growing amount of glue code, and hacks baked into foreign reader (which is supposed to be of general use), specific to the usage in the multishard reader. It is time we don't force this code-reuse anymore and instead implement all the required functionality in shard reader directly.	2019-02-12 16:20:51 +02:00
Botond Dénes	da0c01c68b	multishard_combining_reader: make shard_reader a shared pointer Some members of shard reader have to be accessed even after it is destroyed. This is required by background work that might still be pending when the reader is destroyed. This was solved by creating a special `state` struct, which contained all the members of the shard readers that had to be accessed even after it was destroyed. This state struct was managed through a shared pointer, that each continuation that was expected to outlive the reader, held a copy of. This however created a minefield, where each line of the code had to be carefully audited to access only fields that will be guaranteed to remain valid. Fix this mess by making the whole class a shared pointer, with `enable_shared_from_this`. Now each continuation just has to make sure to keep `this` alive and code can now access all members freely (well, almost).	2019-02-12 16:20:51 +02:00
Botond Dénes	f1c3421eb4	multishard_combining_reader: move the shard reader definition out Shard reader started its life as a very thin layer above foreign reader, with just some convenience methods added. As usual, by now it has grown into a hairy monster, its class definition out-growing even that of the multishard reader itself. It is time shard reader is moved into the top-level scope, improving the readability of both classes.	2019-02-12 16:20:51 +02:00
Botond Dénes	7114b59309	multishard_combining_reader: disentangle shard_reader Currently shard reader has a reference to the owning multishard reader and it freely accesses its members. This resulted in a mess, where it's not clear what exactly shard reader depends on. Disentangle this mess, by making the shard reader self-sufficient, passing all it depends on into its constructor.	2019-02-12 16:20:51 +02:00
Botond Dénes	181bf64858	query: add trim_clustering_row_ranges_to() This algorithm was already duplicated in two places (service/pager/query_pagers.cc and mutation_reader.cc). Soon it will be used in a third place. Instead of triplicating, move it into a function that everybody can use.	2019-02-08 16:30:17 +02:00
Botond Dénes	2a67355ded	multishard_combining_reader: better shard selection algorithm The multishard reader has to combine the output of all shards into a single fragment stream. To do that, each time a `partition_start` is read it has to check if there is another partition, from another shard, that has to be emitted before this partition. Currently for this it uses the partitioner. At every partition start fragment it checks if the token falls into the current shard sub-range. The shard sub-range is the continuous range of tokens, where each token belongs to the same shard. If the partition doesn't belong to the current shard sub-range the multishard reader assumes the following shard sub-range of the next shard will have data and moves over to it. This assumption will however only hold on very dense tables, and will fail miserably on less dense tables, resulting in the multishard reader effectively iterating over the shard sub-ranges (4096 in the worst case), only to find data in just a few of them. This resulted in high user-perceived latency when scanning a sparse table. This patch replaces this algorithm with one based on a shard heap. The shards are now organized into a min-heap, by the next token they have data for. When a partition start fragment is read from the current shard, its token is compared to the smallest token in the shard heap. If smaller, we continue to read from the current shard. Otherwise we move to the shard with the smallest token. When constructing the reader, or after fast-forwarding we don't know what first token each reader will produce. To avoid reading in a partition from each reader, we assume each reader will produce the first token from the first shard sub-range that overlaps with the query range. This algorithm performs much better on sparse tables, while also being slightly better on dense tables. I did only a very rough measurement using CQL tracing. I populated a table with four rows on a 64-shard machine, then scanned the entire table. Time to scan the table (microseconds): before 27'846 after 5'248 Fixes: #4125 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <d559f887b650ab8caa79ad4d45fa2b7adc39462d.1548846019.git.bdenes@scylladb.com>	2019-02-04 14:10:23 +02:00
Duarte Nunes	fa2b0384d2	Replace std::experimental types with C++17 std version. Replace stdx::optional and stdx::string_view with the C++ std counterparts. Some instances of boost::variant were also replaced with std::variant, namely those that called seastar::visit. Scylla now requires GCC 8 to compile. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190108111141.5369-1-duarte@scylladb.com>	2019-01-08 13:16:36 +02:00
Avi Kivity	475b151c97	Merge "Use utils::small_vector more in read path" from Paweł " This series optimises the read path by replacing some usages of std::vector by utils::small_vector. The motivation for this change was an observation that memory allocation functions are pointed out by the profiler as the ones where we spent most time and while they have a large number of callers storage allocation for some vectors was close to the top. The gains are not huge, since the problem is a lot of things adding up and not a single slow thing, but we need to start with something. Unfortunately, the performance of boost::container::small_vector is quite disappointing so a new implementation of a small_vector was introduced. perf_simple_query -c4 --duration 60, medians: ./perf_before ./perf_after diff read 343086.80 360720.53 5.1% Tests: unit(release, small_vector in debug) " * tag 'small_vector/v2.1' of https://github.com/pdziepak/scylla: partition_slice: use small_vector for column_ids mutation_fragment_merger: use small_vector auth: use small_vector in resource auth: avoid list-initialisation of vectors idl: serialiser: add serialiser for utils::small_vector idl: serialiser: deduplicate vector serialisers utils: introduce small_vector intrusive_set_external_comparator: make iterator nothrow move constructible mutation_fragment_merger: value-initialise iterator	2018-12-10 13:50:59 +02:00
Paweł Dziepak	a014367c5b	mutation_fragment_merger: use small_vector	2018-12-06 14:21:04 +00:00
Botond Dénes	ee193f1ab4	multishard_combining_reader: pause readers after reading ahead Readers created or resumed just to read ahead should be paused right after, to avoid consuming all available permits on the shards they operate on, causing a deadlock.	2018-12-06 13:20:30 +02:00
Botond Dénes	170fa382fa	multishard_combining_reader: pause all EOS'd readers Previously the last shard reader to reach EOS wasn't paused. This is a mistake and can contribute to causing deadlocks when the number of concurrently active readers on any shard is limited.	2018-12-06 10:30:43 +02:00
Paweł Dziepak	402902ac78	mutation_fragment_merger: value-initialise iterator ForwardIterators are default constructible, but they have to be value-initialised to compare equal to other value-initialised instances of that iterator.	2018-12-05 20:07:29 +00:00
Botond Dénes	22b14d593b	multishard_combining_reader: use pause-resume API Refactor the multishard combining reader to make use of the new pause-resume API to pause inactive shard readers. Make the pause-resume API mandatory to implement, as by now all existing clients have adopted it.	2018-12-04 08:51:05 +02:00
Botond Dénes	9601d23e0d	foreign_reader: add pause-resume API Allowing for pausing the reader and later resuming it. Pausing the reader waits on the ongoing read ahead (if any), executes any pending `next_partition()` calls and then detaches the shard reader's buffer. The paused shard reader is returned to the client. Resuming the reader consists of getting the previously detached reader back, or one that has the same position as the old reader had. This API allows for making the inactive shard readers of the `multishard_combining_reader` evictable. The API is private, it's only accessible for classes knowing the full definition of the `foreign_reader` (which resides in a .cc file).	2018-12-04 08:51:05 +02:00
Botond Dénes	5f67a065c6	reader_lifecycle_policy: extend with a pause-resume API This API provides a way for the multishard reader to pause inactive shard readers and later resume them when they are needed again. This allows for these paused shard readers to be evicted when the node is under pressure. How the readers are made evictable while paused is up to the clients. Using this API in the `multishard_combining_reader` and implementing it in the clients will be done in the next patches. Provide default implementation for the new virtual methods to facilitate gradual adoption.	2018-12-04 08:51:05 +02:00
Botond Dénes	007619de4c	multishard_combining_reader: use the reader lifecycle policy Refactor the multishard combining reader and its clients to use the reader lifecycle policy introduced in the previous patch.	2018-12-04 08:51:05 +02:00
Botond Dénes	301abaca07	multishard_combining_reader: drop unnecessary `reader_promise` member The `reader_promise` member of the `shard_reader` was used to synchronize a foreground request to create the underlying reader with an ongoing background request with the same goal. This is however unnecessary. The underlying reader is created in the background only as part of a read ahead. In this case there is no need for an extra synchronization point, the foreground reader create request can just wait for the read ahead to finish, for which there already exists a means. Furthermore, foreground reader create requests are always followed by a `fill_buffer()` request, so by waiting on the read ahead we ensure that the following `fill_buffer()` call will not block.	2018-12-04 08:51:05 +02:00
Botond Dénes	a73175fdbc	multishard_combining_reader: drop tracking of pending next_partition calls Shard readers used to track pending `next_partition()` calls that they couldn't execute, because their underlying reader wasn't created yet. These pending calls were then executed after the reader was created. However the only situation where a shard reader can receive a `next_partition()` call before its underlying reader was created is when `next_partition()` is called on the multishard reader before a single fragment is read. In this case we know we are at a partition boundary and thus this call has no effect, therefore it is safe to ignore it.	2018-12-04 08:51:05 +02:00
Botond Dénes	ab3e639c3b	foreign_reader: use bool for pending_next_partition Foreign reader doesn't execute `next_partition()` calls straight away, when this would require interaction with the remote reader. Instead these calls are "remembered" and executed on the next occasion the foreign reader has to interact with the remote reader. This was implemented with a counter that counts the number of pending `next_partition()` calls. However when `next_partition()` is called multiple times, without interleaving calls to `operator()()` or `fast_forward_to()`, only the first such call has effect. Thus it doesn't make sense to count these calls, it is enough to just set a flag if there was at least one such call.	2018-12-04 08:51:05 +02:00
Botond Dénes	5a4fd1abab	multishard_combining_reader: drop support for streamed_mutation fast-forwarding It doesn't make sense for the multishard reader anyway, as it's only used by the row-cache. We are about to introduce the pausing of inactive shard readers, and it would require complex data structures and code to maintain support for this feature that is not even used. So drop it.	2018-12-04 08:51:05 +02:00
Botond Dénes	0cb7c43fb5	reader_concurrency_semaphore: add dedicated .cc file As we are about to extend the functionality of the reader concurrency semaphore, adding more method implementations that need to go to a .cc file, it's time we create a dedicated file, instead of continuing to shove them into unrelated .cc files (mutation_reader.cc).	2018-12-03 13:37:02 +02:00
Avi Kivity	775b7e41f4	Update seastar submodule * seastar d59fcef...b924495 (2): > build: Fix protobuf generation rules > Merge "Restructure files" from Jesse Includes fixup patch from Jesse: " Update Seastar `#include`s to reflect restructure All Seastar header files are now prefixed with "seastar" and the configure script reflects the new locations of files. Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com> "	2018-11-21 00:01:44 +02:00
George Kollias	c2343dc841	Make restricting reader fill_buffer more efficient Currently, restricting_mutation_reader::fill_buffer just reads lower-layer reader's fragments one by one without doing any further transformations. This change just swaps the parent-child buffers in a single step, as suggested in #3604, hence removing any possible per-fragment overhead. I couldn't find any test that exercises restricting_mutation_reader as a mutation source, so I added test_restricted_reader_as_mutation_source in mutation_reader_test. Tests: unit (release), though these 4 tests are failing regardless of my changes (they fail on master for me as well): snitch_reset_test, sstable_mutation_test, sstable_test, sstable_3_x_test. Fixes: #3604 Signed-off-by: George Kollias <georgioskollias@gmail.com> Message-Id: <1540052861-621-1-git-send-email-georgioskollias@gmail.com>	2018-10-22 11:36:54 +03:00
Botond Dénes	dfad223ea2	multishard_mutation_reader: shard_reader: don't do concurrent read-aheads multishard_mutation_reader starts read-aheads on the shards-to-be-read-soon. When doing this it didn't check whether the respective shards had an ongoing read-ahead already. This led to a single shard executing multiple concurrent read-aheads. This is damaging for multiple reasons: * Can lead to concurrent access of the remote reader's data members. * The `shard_reader` was designed around a single read-ahead and thus will synchronise foreground reads with only the last one. The practical implication of this seen so far was that queries reading a large number of rows (large enough to reliably trigger the bug) would stop the read early, due to the `combined_mutation_reader`'s internal accounting being messed up by concurrent access. Also add a unit test. Instead of coming up with a very specific, and very contrived unit test, use the test-case that detected this bug in the first place: count(*) on a table with lots of rows (>1000). This unit-test should serve well for detecting any similar bugs in the future. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <ff1c49be64e2fb443f9aa8c5c8d235e682442248.1536746388.git.bdenes@scylladb.com>	2018-09-12 11:43:18 +01:00
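
Commit 435e24f509 in the log above describes tracking readers by `std::list` iterators instead of non-owning pointers, so that an unneeded reader can be dropped in O(1) without scanning the list. The following is only a minimal sketch of that idea, not the code from the commit: `reader` and `reader_pool` are hypothetical stand-ins and do not correspond to Scylla's actual types.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>

// Hypothetical stand-in for a reader; not Scylla's mutation_reader.
struct reader {
    std::string name;
};

// Hypothetical owner that keeps readers alive in a std::list and
// hands out list iterators as handles.
class reader_pool {
    std::list<reader> _readers;
public:
    using handle = std::list<reader>::iterator;

    // Inserting into a std::list never invalidates iterators to
    // other elements, so existing handles stay usable.
    handle add(std::string name) {
        return _readers.insert(_readers.end(), reader{std::move(name)});
    }

    // O(1): the iterator already identifies the node, so no scan of
    // the list is needed. Only the erased handle becomes invalid.
    void drop(handle h) {
        assert(!_readers.empty());
        _readers.erase(h);
    }

    std::size_t size() const { return _readers.size(); }
};
```

With raw pointers, `drop` would first have to walk the list to find the matching node; holding the iterator removes that step, which is the property the commit message relies on.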

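
Commit 2a67355ded in the log above replaces sub-range iteration with a min-heap of shards keyed by the next token each shard has data for. The sketch below illustrates only the selection rule stated in that message (keep reading the current shard while its token is smaller than the heap's smallest, otherwise switch). It is an illustration under assumptions: tokens are modelled as plain `std::int64_t`, and `shard_entry`, `shard_heap` and `select_shard` are hypothetical names, not Scylla's API.

```cpp
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical record: a shard and the next token it has data for.
struct shard_entry {
    std::int64_t next_token;
    unsigned shard;
};

// Orders the priority queue as a min-heap on next_token.
struct by_token_greater {
    bool operator()(const shard_entry& a, const shard_entry& b) const {
        return a.next_token > b.next_token;
    }
};

using shard_heap =
    std::priority_queue<shard_entry, std::vector<shard_entry>, by_token_greater>;

// `current` is the shard being read together with the token of the
// partition start just read from it; `others` holds the remaining shards.
// Returns the shard to read from next.
unsigned select_shard(shard_entry current, shard_heap& others) {
    // Current token is smaller than every other shard's next token:
    // keep reading from the current shard.
    if (others.empty() || current.next_token < others.top().next_token) {
        return current.shard;
    }
    // Otherwise move to the shard with the smallest token and park the
    // current shard in the heap at its own position.
    shard_entry next = others.top();
    others.pop();
    others.push(current);
    return next.shard;
}
```

Each decision costs at most one heap pop and push, independent of how many empty shard sub-ranges lie between two partitions, which is consistent with the improvement the commit reports for sparse tables.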

168 Commits