Commit Graph

150 Commits

Author SHA1 Message Date
Avi Kivity
88322086cb Merge "Add fuzzer-type unit test for range scans" from Botond
"
This series adds a fuzzer-type unit test for range scans, which
generates a semi-random dataset and executes semi-random range scans
against it, validating the result.
This test aims to cover a wide range of corner cases with the help of
randomness. Data and queries against it are generated in such a way that
various corner cases and their combinations are likely to be covered.

The infrastructure under range-scans have gone under massive changes in
the last year, growing in complexity and scope. The correctness of range
scans is critical for the correct functioning of any Scylla cluster, and
while the current unit tests served well in detecting any major problems
(mostly while developing), they are too simplistic and can only be
relied on to check the correctness of the basic functionality. This test
aims to extend coverage drastically, testing cases that the author of
the range-scan code or that of the existing unit tests didn't even think
exists, by relying on some randomness.

Fixes: #3954 (deprecates really)
"

* 'more-extensive-range-scan-unit-tests/v2' of https://github.com/denesb/scylla:
  tests/multishard_mutation_query_test: add fuzzy test
  tests/multishard_mutation_query_test: refactor read_all_partitions_with_paged_scan()
  tests/test_table: add advanced `create_test_table()` overload
  tests/test_table: make `create_test_table()` customizable
  query: add trim_clustering_row_ranges_to()
  tests/test_table: add keyspace and table name params
  tests/test_table: s/create_test_cf/create_test_table/
  tests: move create_test_cf() to tests/test_table.{hh,cc}
  tests/multishard_mutation_query_test: drop many partition test
  tests/multishard_mutation_query_test: drop range tombstone test
2019-02-27 17:26:53 +02:00
Paweł Dziepak
b524f96a74 mutation_reader_merger: drop unneded readers in small batches
It was observed that destroying readers as soon as they are not needed
negatively affects performance of relatively small reads. We don't want
to keep them alive for too long either, since they may own a lot of
memory, but deferring the destruction slightly and removing them in
batches of 4 seems to solve the problem for the small reads.
2019-02-22 14:43:38 +00:00
Paweł Dziepak
435e24f509 mutation_reader_merger: track readers by iterators and not pointers
mutation_reader_merger uses a std::list of mutation_reader to keep them
alive while the rest of the logic operates on non-owning pointers.

This means that when it is a time to drop some of the readers that are
no longer needed, the merger needs to scan the list looking for them.
That's not ideal.

The solution is to make the logic use iterators to elements in that
list, which allows for O(1) removal of an unneeded reader. Iterators to
list are just pointers to the node and are not invalidated by unrelated
additions and removals.
2019-02-22 14:33:10 +00:00
Botond Dénes
9000626647 shard_reader: auto pause readers after being used
Previously it was the responsibility of the layer above (multishard
combining reader) to pause readers, which happened via an explicit
`pause()` call. This proved to be a very bad design as we kept finding
spots where the multishard reader should have paused the reader to avoid
potential deadlocks (due to starved reader concurrency semaphores), but
didn't.

This commit moves the responsibility of pausing the reader into the
shard reader. The reader is now kept in a paused state, except when it
is actually used (a `fill_buffer()` or `fast_forward_to()` call is
executing). This is fully transparent to the layer above.
As a side note, the shard reader now also hides when the reader is
created. This also used to be the responsibility of the multishard
reader, and although it caused no problems so far, it can be considered
a leak of internal details. The shard reader now automatically creates
the remote reader on the first time it is attempted to be used.

The code has been reorganized, such that there is now a clear separation
of responsibilities. The multishard combining reader handles the
combining of the output of the shard readers, as well as issuing
read-aheads. The shard reader handles read-ahead and creating the
remote reader when needed, as well as transferring the results of remote
reads to the "home" shard. The remote reader
(`shard_reader::remote_reader`, new in this patch) handles
pausing-resuming as well as recreating the reader after it was evicted.
Layers don't access each other's internals (like they used to).

After this commit, the reader passed to `destroy_reader()` will always
be in paused state.
2019-02-12 16:20:51 +02:00
Botond Dénes
37006135dc shard_reader: make reader creation sync
Reader creation happens through the `reader_lifecycle_policy` interface,
which offers a `create_reader()` method. This method accepts a shard
parameter (among others) and returns a future. Its implementation is
expected to go to the specified shard and then return with the created
reader. The method is expected to be called from the shard where the
shard reader (and consequently the multishard reader) lives. This API,
while reasonable enough, has a serious flaw. It doesn't make batching
possible. For example, if the shard reader issues a call to the remote
shard to fill the remote reader's buffer, but finds that it was evicted
while paused, it has to come back to the local shard just to issue the
recreate call. This makes the code both convoluted and slow.
Change the reader creation API to be synchronous, that is, callable from
the shard where the reader has to be created, allowing for simple call
sites and batching.
This change requires that implementations of the lifecycle policy update
any per-reader data-structure they have from the remote shard. This is
not a problem however, as these data-structures are usually partitioned,
such that they can be accessed safely from a remote shard.
Another, very pleasant, consequence of this change is that now all
methods of the lifecycle interface are sync and thus calls to them
cannot overlap anymore.

This patch also removes the
`test_multishard_combining_reader_destroyed_with_pending_create_reader`
unit test, which is not useful anymore.

For now just emulate the old interface inside shard reader. We will
overhaul the shard reader after some further changes to minimize
noise.
2019-02-12 16:20:51 +02:00
Botond Dénes
57d1f6589c shard_reader: use semaphore directly to pause-resume
The shard reader relies on the `reader_lifecycle_policy` for pausing and
resuming the remote reader. The lifecycle policy's API was designed to
be as general as possible, allowing for any implementation of
pause/resume. However, in practice, we have a single implementation of
pause/resume: registering/unregistering the reader with the relevant
`reader_concurrency_semaphore`, and we don't expect any new
implementations to appear in the future.
Thus, the generic API of the lifecycle policy, is needlessly abstract
making its implementations needlessly complex. We can instead make this
very concrete and have the lifecycle policy just return the relevant
semaphore, removing the need for every implementor of the lifecycle
policy interface to have a duplicate implementation of the very same
logic.

For now just emulate the old interface inside shard reader. We will
overhaul the shard reader after some further changes to minimize noise.
2019-02-12 16:20:51 +02:00
Botond Dénes
fae5a2a8c8 shard_reader: recreate_reader(): fix empty range case
If the shard reader is created for a singular range (has a single
partition), and then it is evicted after reaching EOS, when recreated we
would have to create a reader that reads an empty range, since the only
partition the range has was already read. Since it is not possible to
create a reader with an empty range, we just didn't recreate the reader
in this case. This is incorrect however, as the code might still attempt
to read from this reader, if only due to a bug, and would trigger a
crash. The correct fix is to create an empty reader that will
immediately be at EOS.
2019-02-12 16:20:51 +02:00
Botond Dénes
cd807586f6 foreign_reader: rip out the now unused private API
Drop all the glue code, needed in the past so the shard reader can be
implemented on top of foreign reader. As the shard reader moved away
from foreign reader, this glue code is not needed anymore.
2019-02-12 16:20:51 +02:00
Botond Dénes
d80bc3c0a5 shard_reader: move away from foreign_reader
In the past, shard reader wrapped a foreign reader instance, adding
functionality required by the multishard reader on top. This has worked
well to a certain degree, but after the addition of pause-resume of
shard reader, the cooperation with foreign reader became more-and-more a
struggle. It has now gotten to a point, where it feels like shard reader
is fighting foreign reader as much as it reuses it. This manifested
itself in the ever growing amount of glue code, and hacks baked into
foreign reader (which is supposed to be of general use), specific to
the usage in the multishard reader.
It is time we don't force this code-reuse anymore and instead implement
all the required functionality in shard reader directly.
2019-02-12 16:20:51 +02:00
Botond Dénes
da0c01c68b multishard_combining_reader: make shard_reader a shared pointer
Some members of shard reader have to be accessed even after it is
destroyed. This is required by background work that might still be
pending when the reader is destroyed. This was solved by creating a
special `state` struct, which contained all the members of the shard
readers that had to be accessed even after it was destroyed. This state
struct was managed through a shared pointer, that each continuation that
was expected to outlive the reader, held a copy of. This however created
a minefield, where each line of the code had to be carefully audited to
access only fields that will be guaranteed to remain valid.
Fix this mess by making the whole class a shared pointer, with
`enable_shared_from_this`. Now each continuation just has to make sure
to keep `this` alive and code can now access all members freely (well,
almost).
2019-02-12 16:20:51 +02:00
Botond Dénes
f1c3421eb4 multishard_combining_reader: move the shard reader definition out
Shard reader started its life as a very thin layer above foreign reader,
with just some convenience methods added. As usual, by now it has grown
into a hairy monster, its class definition out-growing even that of the
multishard reader itself. It is time shard reader is moved into the
top-level scope, improving the readability of both classes.
2019-02-12 16:20:51 +02:00
Botond Dénes
7114b59309 multishard_combining_reader: disentangle shard_reader
Currently shard reader has a reference to the owning multishard reader
and it freely accesses its members. This resulted in a mess, where it's
not clear what exactly shard reader depends on. Disentangle this mess,
by making the shard reader self-sufficient, passing all it depends on
into its constructor.
2019-02-12 16:20:51 +02:00
Botond Dénes
181bf64858 query: add trim_clustering_row_ranges_to()
This algorithm was already duplicated in two places
(service/pager/query_pagers.cc and mutation_reader.cc). Soon it will be
used in a third place. Instead of triplicating, move it into a function
that everybody can use.
2019-02-08 16:30:17 +02:00
Botond Dénes
2a67355ded multishard_combining_reader: better shard selection algorithm
The multishard reader has to combine the output of all shards into a
single fragment stream. To do that, each time a `partition_start` is
read it has to check if there is another partition, from another shard,
that has to be emitted before this partition. Currently for this it
uses the partitioner. At every partition start fragment it checks if the
token falls into the current shard sub-range. The shard sub-range is the
continuous range of tokens, where each token belongs to the same shard.
If the partition doesn't belong to the current shard sub-range the
multishard reader assumes the following shard sub-range of the next shard
will have data and move over to it. This assumption will however only
stand on very dense tables, and will fail miserably on less dense
tables, resulting in the multishard reader effectively iterating over
the shard sub-ranges (4096 in the worst case), only to find data in just
a few of them. This resulted in high user-perceived latency when
scanning a sparse table.

This patch replaces this algorithm with one based on a shard heap. The
shards are now organized into a min-heap, by the next token they have
data for. When a partition start fragment is read from the current
shard, its token is compared to the smallest token in the shard heap. If
smaller, we continue to read from the current shard. Otherwise we move
to the shard with the smallest token. When constructing the reader, or
after fast-forwarding we don't know what first token each reader will
produce. To avoid reading in a partition from each reader, we assume
each reader will produce the first token from the first shard sub-range
that overlaps with the query range. This algorithm performs much better
on sparse tables, while also being slightly better on dense tables.

I did only a very rough measurement using CQL tracing. I populated a
table with four rows on a 64 shards machine, then scanned the entire
table.
Time to scan the table (microseconds):
before 27'846
after   5'248

Fixes: #4125

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <d559f887b650ab8caa79ad4d45fa2b7adc39462d.1548846019.git.bdenes@scylladb.com>
2019-02-04 14:10:23 +02:00
Duarte Nunes
fa2b0384d2 Replace std::experimental types with C++17 std version.
Replace stdx::optional and stdx::string_view with the C++ std
counterparts.

Some instances of boost::variant were also replaced with std::variant,
namely those that called seastar::visit.

Scylla now requires GCC 8 to compile.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20190108111141.5369-1-duarte@scylladb.com>
2019-01-08 13:16:36 +02:00
Avi Kivity
475b151c97 Merge "Use utils::small_vector more in read path" from Paweł
"
This series optimises the read path by replacing some usages of
std::vector by utils::small_vector. The motivation for this change was
an observation that memory allocation functions are pointed out by the
profiler as the ones where we spent most time and while they have a
large number of callers storage allocation for some vectors was close to
the top. The gains are not huge, since the problem is a lot of things
adding up and not a single slow thing, but we need to start with
something.

Unfortunately, the performance of boost::container::small_vector is
quite disappointing so a new implementation of a small_vector was
introduced.

perf_simple_query -c4 --duration 60, medians:

       ./perf_before  ./perf_after  diff
 read      343086.80     360720.53  5.1%

Tests: unit(release, small_vector in debug)
"

* tag 'small_vector/v2.1' of https://github.com/pdziepak/scylla:
  partition_slice: use small_vector for column_ids
  mutation_fragment_merger: use small_vector
  auth: use small_vector in resource
  auth: avoid list-initialisation of vectors
  idl: serialiser: add serialiser for utils::small_vector
  idl: serialiser: deduplicate vector serialisers
  utils: introduce small_vector
  intrusive_set_external_comparator: make iterator nothrow move constructible
  mutation_fragment_merger: value-initialise iterator
2018-12-10 13:50:59 +02:00
Paweł Dziepak
a014367c5b mutation_fragment_merger: use small_vector 2018-12-06 14:21:04 +00:00
Botond Dénes
ee193f1ab4 multishard_combining_reader: pause readers after reading ahead
Readers created or resumed just to read ahead should be paused right
after, to avoid consuming all available permits on the shards they
operate on, causing a deadlock.
2018-12-06 13:20:30 +02:00
Botond Dénes
170fa382fa multishard_combining_reader: pause *all* EOS'd readers
Previously the last shard reader to reach EOS wasn't paused. This is a
mistake and can contribute to causing deadlocks when the number of
concurrently active readers on any shard is limited.
2018-12-06 10:30:43 +02:00
Paweł Dziepak
402902ac78 mutation_fragment_merger: value-initialise iterator
ForwardIterators are default constructible, but they have to be
value-initialised to compare equal to other value-initialised instances
of that iterator.
2018-12-05 20:07:29 +00:00
Botond Dénes
22b14d593b multishard_combining_reader: use pause-resume API
Refactor the multishard combining reader to make use of the new
pause-resume API to pause inactive shard readers.

Make the pause-resume API mandatory to implement, as by now all existing
clients have adapted it.
2018-12-04 08:51:05 +02:00
Botond Dénes
9601d23e0d foreign_reader: add pause-resume API
Allowing for pausing the reader and later resume it. Pausing the reader
waits on the ongoing read ahead (if any), executes any pending
`next_partition()` calls and than detaches the shard reader's buffer.
The paused shard reader is returned to the client.
Resuming the reader consists of getting the previously detached reader
back, or one that has the same position as the old reader had.
This API allows for making the inactive shard readers of the
`multishard_combining_reader` evictable.
The API is private, it's only accessible for classes knowing the full
definition of the `foreign_reader` (which resides in a .cc file).
2018-12-04 08:51:05 +02:00
Botond Dénes
5f67a065c6 reader_lifecycle_policy: extend with a pause-resume API
This API provides a way for the mulishard reader to pause inactive shard
readers and later resume them when they are needed again. This allows
for these paused shard readers to be evicted when the node is under
pressure.
How the readers are made evictable while paused is up to the clients.

Using this API in the `multishard_combining_reader` and implementing it
in the clients will be done in the next patches.

Provide default implementation for the new virtual methods to facilitate
gradual adoption.
2018-12-04 08:51:05 +02:00
Botond Dénes
007619de4c multishard_combining_reader: use the reader lifecycle policy
Refactor the multishard combining reader and its clients to use the
reader lifecycle policy introduced in the previous patch.
2018-12-04 08:51:05 +02:00
Botond Dénes
301abaca07 multishard_combining_reader: drop unnecessary reader_promise member
The `reader_promise` member of the `shard_reader` was used to
synchronize a foreground request to create the underlying reader with an
ongoing background request with the same goal. This is however
unnecessary. The underlying reader is created in the background only as
part of a read ahead. In this case there is no need for extra
synchronization point, the foreground reader create request can just
wait for the read ahead to finish, for which there already exists a
mean. Furthermore, foreground reader create requests are always followed
by a `fill_buffer()` request, so by waiting on the read ahead we ensure
that the following `fill_buffer()` call will not block.
2018-12-04 08:51:05 +02:00
Botond Dénes
a73175fdbc multishard_combining_reader: drop tracking of pending next_partition calls
Shard readers used to track pending `next_partition()` calls that they
couldn't execute, because their underlying reader wasn't created yet.
These pending calls were then executed after the reader was created.
However the only situation where a shard reader can receive a
`next_partition()` call, before its underlying reader wasn't created is
when `next_partition()` is called on the multishard reader before a
single fragment is read. In this case we know we are at a partition
boundary and thus this call has no effect, therefore it is safe to
ignore it.
2018-12-04 08:51:05 +02:00
Botond Dénes
ab3e639c3b foreign_reader: use bool for pending_next_partition
Foreign reader doesn't execute `next_partition()` calls straight away,
when this would require interaction with the remote reader. Instead
these calls are "remembered" and executed on the next occasion the
foreign reader has to interact with the remote reader. This was
implemented with a counter that counts the number of pending
`next_partition()` calls.
However when `next_partition()` is called multiple times, without
interleaving calls to `operator()()` or `fast_forward_to()`, only the
first such call has effect. Thus it doesn't make sense to count these
calls, it is enough to just set a flag if there was at least one such
call.
2018-12-04 08:51:05 +02:00
Botond Dénes
5a4fd1abab multishard_combining_reader: drop support for streamed_mutation fast-forwarding
It doesn't make sense for the multishard reader anyway, as it's only
used by the row-cache. We are about to introduce the pausing of inactive
shard readers, and it would require complex data structures and code
to maintain support for this feature that is not even used. So drop it.
2018-12-04 08:51:05 +02:00
Botond Dénes
0cb7c43fb5 reader_concurrency_semaphore: add dedicated .cc file
As we are about to extend the functionality of the reader concurrency
semaphore, adding more method implementations that need to go to a .cc
file, it's time we create a dedicated file, instead of keep shoving them
into unrelated .cc files (mutation_reader.cc).
2018-12-03 13:37:02 +02:00
Avi Kivity
775b7e41f4 Update seastar submodule
* seastar d59fcef...b924495 (2):
  > build: Fix protobuf generation rules
  > Merge "Restructure files" from Jesse

Includes fixup patch from Jesse:

"
Update Seastar `#include`s to reflect restructure

All Seastar header files are now prefixed with "seastar" and the
configure script reflects the new locations of files.

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com>
"
2018-11-21 00:01:44 +02:00
George Kollias
c2343dc841 Make restricting reader fill_buffer more efficient
Currently, restricting_mutation_reader::fill_buffer justs reads
lower-layer reader's fragments one by one without doing any further
transformations. This change just swaps the parent-child buffers in a
single step, as suggested in #3604, and, hence, removing any possible
per-fragment overhead.

I couldn't find any test that exercises restricting_mutation_reader as
a mutation source, so I added test_restricted_reader_as_mutation_source
in mutation_reader_test.

Tests: unit (release), though these 4 tests are failing regardless of
my changes (they fail on master for me as well): snitch_reset_test,
sstable_mutation_test, sstable_test, sstable_3_x_test.

Fixes: #3604

Signed-off-by: George Kollias <georgioskollias@gmail.com>
Message-Id: <1540052861-621-1-git-send-email-georgioskollias@gmail.com>
2018-10-22 11:36:54 +03:00
Botond Dénes
dfad223ea2 multishard_mutation_reader: shard_reader: don't do concurrent read-aheads
multishard_mutation_reader starts read-aheads on the
shards-to-be-read-soon. When doing this it didn't check whether the
respective shards had an ongoing read-ahead already. This lead to a
single shard executing multiple concurrent read-aheads. This is damaging
for multiple reasons:
    * Can lead to concurrent access of the remote reader's data members.
    * The `shard_reader` was designed around a single read-ahead and
    thus will synchronise foreground reads with only the last one.

The practical implications of this seen so far was that queries reading
a large number of rows (large enough to reliably trigger the
bug) would stop the read early, due the `combined_mutation_reader`'s
internal accounting being messed up by concurrent access.

Also add a unit test. Instead of coming up with a very specific, and
very contrived unit test, use the test-case that detected this bug in
the first place: count(*) on a table with lots of rows (>1000). This
unit-test should serve well for detecting any similar bugs in the
future.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <ff1c49be64e2fb443f9aa8c5c8d235e682442248.1536746388.git.bdenes@scylladb.com>
2018-09-12 11:43:18 +01:00
Botond Dénes
6a07b8ae83 multishard_mutation_reader: update shard_reader's comment
The `adandoned` member was renamed to `stopped`. Update the comment
accordingly.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <1d655785f28fe1e5fa041f2f49852f0ad88be53e.1536743950.git.bdenes@scylladb.com>
2018-09-12 11:32:08 +02:00
Botond Dénes
49704755b0 combined_mutation_reader: propagate timeout in fill_buffer()
All user reads go through the combined reader. Not propagating the
timeout down from there means that the storage layer's timeout
functionality is effectively disabled. Spotted while reading the code.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <7fc10eca1c231dd04ac433913d9e6a51b6b17139.1536657041.git.bdenes@scylladb.com>
2018-09-11 15:44:28 +02:00
Botond Dénes
a011a9ebf2 mutation_reader: multishard_combining_reader: support custom dismantler
Add a dismantler functor parameter. When the multishard reader is
destroyed this functor will be called for each shard reader, passing a
future to a `stopped_foreign_reader`. This future becomes available when
the shard reader is stopped, that is, when it finished all in-progress
read-aheads and/or pending next partition calls.

The intended use case for the dismantler functor is a client that needs
to be notified when readers are destroyed and/or has to have access to
any unconsumed fragments from the foreign readers wrapping the shard
readers.
2018-09-03 10:31:44 +03:00
Botond Dénes
f13b878a94 mutation_reader: pass all standard reader params to remote_reader_factory
Extend `remote_reader_factory` interface so that it accepts all standard
mutation reader creation parameters. This allows factory lambdas to be
truly stateless, not having to capture any standard parameters that is
needed for creating the reader.
Standard parameters are those accepted by
`mutation_source::make_reader()`.
2018-09-03 10:31:44 +03:00
Botond Dénes
8915293257 multishard_combining_reader: fix incorrect comment 2018-09-03 10:31:44 +03:00
Botond Dénes
81a03db955 mutation_reader: reader_selector: use ring_position instead of token
sstable_set::incremental selector was migrated to ring position, follow
suit and migrate the reader_selector to use ring_position as well. Above
correctness this also improves efficiency in case of dense tables,
avoiding prematurely selecting sstables that share the token but start
at different keys, altough one could argue that this is a niche case.
2018-07-04 17:42:37 +03:00
Botond Dénes
a8e795a16e sstables_set::incremental_selector: use ring_position instead of token
Currently `sstable_set::incremental_selector` works in terms of tokens.
Sstables can be selected with tokens and internally the token-space is
partitioned (in `partitioned_sstable_set`, used for LCS) with tokens as
well. This is problematic for severeal reasons.
The sub-range sstables cover from the token-space is defined in terms of
decorated keys. It is even possible that multiple sstables cover
multiple non-overlapping sub-ranges of a single token. The current
system is unable to model this and will at best result in selecting
unnecessary sstables.
The usage of token for providing the next position where the
intersecting sstables change [1] causes further problems. Attempting to
walk over the token-space by repeatedly calling `select()` with the
`next_position` returned from the previous call will quite possibly lead
to an infinite loop as a token cannot express inclusiveness/exclusiveness
and thus the incremental selector will not be able to make progress when
the upper and lower bounds of two neighbouring intervals share the same
token with different inclusiveness e.g. [t1, t2](t2, t3].

To solve these problems update incremental_selector to work in terms of
ring position. This makes it possible to partition the token-space
amoing sstables at decorated key granularity. It also makes it possible
for select() to return a next_position that is guaranteed to make
progress.

partitioned_sstable_set now builds the internal interval map using the
decorated key of the sstables, not just the tokens.
incremental_selector::select() now uses `dht::ring_position_view` as
both the selector and the next_position. ring_position_view can express
positions between keys so it can also include information about
inclusiveness/exclusiveness of the next interval guaranteeing forward
progress.

[1] `sstable_set::incremental_selector::selection::next_position`
2018-07-04 17:42:33 +03:00
Botond Dénes
78ecf2740a mutation_reader_merger::maybe_add_readers(): remove early return
It's unnecessary (doesn't prevent anything). The code without it
expresses intent better (and is shorter by two lines).
2018-07-02 11:41:09 +03:00
Botond Dénes
d26b35b058 mutation_reader_merger: get rid of _key
`_key` is only used in a single place and this does not warrant storing
it in a member. Also get rid of current_position() which was used to
query `_key`.
2018-07-02 11:40:43 +03:00
Paweł Dziepak
27014a23d7 treewide: require type info for copying atomic_cell_or_collection 2018-05-31 15:51:11 +01:00
Tomasz Grabiec
bb96518cc5 mutation_reader: Make empty mutation source advertize no partitions
So that perf_row_cache_update will always populate cache.
2018-05-30 12:18:56 +02:00
Avi Kivity
7d29addb1f mutation_reader: optimize make_combined_reader for the single-reader case
If we're given a single reader (can be common in a low-write-rate table,
where most of the data will be in a single large sstable, or in leveled
tables) then we can avoid the overhead of the combining reader by returning
the single input.

Tests: unit (release)
Message-Id: <20180513130333.15424-1-avi@scylladb.com>
2018-05-13 20:07:10 +02:00
Botond Dénes
7a3eab90c8 multishard_combining_reader: use optimized optional for the shard reader
Use flat_mutation_reader_opt instead of
std::optional<flat_mutation_reader>.
2018-05-10 13:06:47 +03:00
Botond Dénes
04643fb223 multishard_combining_reader: prepare for read-ahead otliving the reader
When the multishard reader is destroyed there might be severeal pending
read-aheads running in the background. These read-aheads need their
associated reader to stay alive until after the read-ahead completes.
To solve this move the flat_mutation_reader into a struct and manage
this struct's lifetime through a shared pointer. Fibers associated with
read-aheads that might outlive the multishard reader will hold on to a
copy of the shard pointer keeping the underlying reader alive until they
complete. To avoid doing any extra work a flag is added to this state
which is set when the multishard reader is destroyed. When this flag is
set, pending continuations will return early.  All this is encapsulated
in multishard_combining_reader::shard_reader the multishard reader code
itself need not be changed.
2018-04-30 17:16:21 +03:00
Botond Dénes
a05d398be7 foreign_reader: prepare for read-ahead outliving the reader
The foreign reader keeps track of ongoing read-aheads via a
foreign_ptr to the read-ahead's future on the remote shard. This pointer
is overwritten after each "remote call" to the remote reader with a
pointer to the future of the new read-ahead's future.
There are severeal problems with the current implementation:
1) There is a new read-ahead launched after each "remote call"
  unconditionally, even if the remote reader is at EOS. This will start
  unecessary read-ahead when the reader is already finished and may be
  soon destroyed (legally) by the client.
2) The pointer to the remote read-ahead future is not set to nullptr
  when a remote call is issued. Thus in the destructor, where we
  attach a continuation to the read-ahead's future to extend the
  reader's lifetime until after the read-ahead finishes, we migh attach
  a continuation to a future that already has one and run into a failed
  assert().

To fix this issues reset the read-ahead pointer to nullptr each time a
remote call is issued and don't start a new read-ahead if the remote
reader is at EOS. This way we can ensure that when the reader is
destroyed we either have a valid and non-stale read-aead future or none
at all and can reliably make a decision about whether we need to extend
the lifetime of the remote reader or not.
2018-04-30 14:34:43 +03:00
Botond Dénes
704d3d8421 multishard_combining_reader: avoid creating the shard reader twice
The multishard reader creates its shard readers on demand when they are
first attempted to be used. However at this time the reader migh already
be in the progress of being created, initiated by a previous read-ahead.
To avoid creating the shard reader twice, before creating the reader
check whether there are any read-aheads in progress. If there is, it
already created (is creating or will create) the reader and hence
synchronise with the read ahead. Synchronisation happens via a promise,
the read ahead creates a promise which will be fulfilled when the reader
is created. A concurrent create_reader() call will wait on this promise
instead of attempting to create a new reader.
2018-04-30 14:34:43 +03:00
Botond Dénes
f9464cfcd7 multishard_combining_reader: read_ahead: don't assume reader is created
Currently it is assumed that when read_ahead is called the reader is
already created. Under most circumstances this will not be true. It was
blind (bad) luck that we didn't hit this before (during testing).
2018-04-30 14:34:43 +03:00
Botond Dénes
d9fceb398a multishard_combining_reader: move read-ahead related methods
To the group of methods that do not assume the reader is already
created. A patch will follow that will update read_ahead() to not assume
that the reader is created.
2018-04-30 14:34:43 +03:00