Commit Graph

265 Commits

Benny Halevy
4476800493 flat_mutation_reader: get rid of timeout parameter
Now that the timeout is taken from the reader_permit.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-08-24 16:30:51 +03:00
Benny Halevy
4e3dcfd7d6 reader_concurrency_semaphore: use permit timeout for admission
Now that the timeout is stored in the reader
permit, use it for admission rather than a
timeout parameter.

Note that evictable_reader::next_partition
currently passes db::no_timeout to
resume_or_create_reader, which propagates to
maybe_wait_readmission; this seems to be
an oversight in the f_m_r API, which doesn't
pass a timeout to next_partition().

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-08-24 16:30:51 +03:00
Benny Halevy
9b0b13c450 reader_concurrency_semaphore: adjust reactivated reader timeout
Update the reader's timeout where needed
after unregistering inactive_read.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-08-24 16:30:51 +03:00
Juliusz Stasiewicz
2b802711c2 queue_reader: implement next_partition() 2021-07-20 14:00:54 +02:00
Botond Dénes
16d3cb4777 mutation_reader: remove now unused restricting_reader
Move the now orphaned new_reader_base_cost constant to
database.hh/table.cc, as its main user is now
`table::estimate_read_memory_cost()`.
2021-07-14 17:19:02 +03:00
Botond Dénes
1b7eea0f52 reader_concurrency_semaphore: admission: flip the switch
This patch flips two "switches":
1) It switches admission to be up-front.
2) It changes the admission algorithm.

(1) By now all permits are obtained up-front, so this patch just yanks
the restricted reader out of all reader stacks and simultaneously
switches all `obtain_permit_nowait()` calls to `obtain_permit()`. With
this, admission is waited on when the permit is created.

(2) We switch to an admission algorithm that adds a new aspect to the
existing resource-availability criterion: the number of used/blocked
reads. Namely, it only admits new reads if, in addition to the necessary
amount of resources being available, all currently used readers are
blocked. In other words, we only admit new reads if all currently
admitted reads require something other than CPU to progress: they are
waiting on I/O, on a remote shard, or on attention from their consumers
(not used currently).

We flip these two switches at the same time because up-front admission
means cache reads now need to obtain a permit too. For cache reads the
optimal concurrency is 1: anything above that just increases latency
without increasing throughput. So we want to make sure that if a cache
read hits, it doesn't get any competition for CPU and can run to
completion. We admit new reads only if the read misses and has to go to
disk.

Another change made to accommodate this switch is replacing the replica-side
read execution stages with the reader concurrency semaphore acting
as an execution stage. This replacement is needed because, with
the introduction of up-front admission, reads are no longer independent of
each other: one executed read can influence whether later reads
will be admitted or not, and execution stages require
independent operations to work well. By moving the execution stage into
the semaphore, we have an execution stage that is in control of both
admission and running the operations in batches, avoiding the bad
interaction between the two.
2021-07-14 17:19:02 +03:00
Botond Dénes
0ced9c83b7 mutation_reader: evictable_reader: futurize resume_or_recreate_reader()
In preparation for waiting for readmission after eviction in a later
patch.
2021-07-14 16:48:43 +03:00
Botond Dénes
5291494a50 mutation_reader: shard reader: use reader_lifecycle_policy::obtain_reader_permit()
Co-routinize the reader creation lambda in the process.
2021-07-14 16:48:43 +03:00
Botond Dénes
b5cbd19383 mutation_reader: shard_reader: mark permit as blocked when waiting on remote shard 2021-07-14 16:48:43 +03:00
Botond Dénes
6f6a8f5cf8 mutation_reader: shard_reader: coroutinize fill_buffer() and fast_forward_to()
To facilitate further patching (and make the code look nicer too).
2021-07-14 16:48:43 +03:00
Botond Dénes
26e83bdde8 mutation_reader: foreign_reader: mark permit as blocked when waiting on remote shard 2021-07-14 16:48:43 +03:00
Botond Dénes
47342ae8a8 mutation_reader: shard_reader: mark underlying permit as used 2021-07-14 16:48:43 +03:00
Botond Dénes
852bf6befd evictable_reader: relax partition key check on reader recreation
When recreating the underlying reader, the evictable reader validates
that the first partition key it emits is what it expects it to be. If the
read stopped at the end of a partition, it expects the first partition
to be a larger one. If the read stopped in the middle of a certain
partition, it expects the first partition to be the same one it stopped in
the middle of. This latter assumption doesn't hold in all circumstances,
however: the partition it stopped in the middle of might get
compacted away while the read was paused, in which case the read
will resume from a greater partition. This perfectly valid case
currently triggers the evictable reader's self-validation, aborting
the read and logging a scary error. Relax the
check to accept any partition that is >= the one the read
stopped in the middle of.
2021-06-30 11:21:53 +03:00
Botond Dénes
2740dd2ae4 evictable_reader: always reset static row drop flag
When the evictable reader recreates its underlying reader, it does so
such that it continues from the exact mutation fragment the read
left off at. There are however two special mutation fragments, the
partition start and the static row, that are unconditionally re-emitted at
the start of a new read. To work around this, when stopping at either of
these fragments the evictable reader sets two flags,
_drop_partition_start and _drop_static_row, to drop the unneeded
fragments (which it already emitted) from the underlying reader.
These flags are then reset when the respective fragment is dropped.
_drop_static_row has a vulnerability though: the partition doesn't
necessarily have a static row, and if it doesn't, the flag is currently
not cleared and can stay set until the next fill-buffer call, causing the
static row to be dropped from another partition.
To fix, always reset the _drop_static_row flag, even if no static row
was dropped (because it doesn't exist).
2021-06-30 10:05:35 +03:00
Avi Kivity
d27e88e785 Merge "compaction: prevent broken_promise or dangling reader errors" from Benny
"
This series prevents broken_promise or dangling reader errors
when (resharding) compaction is stopped, e.g. during shutdown.

At the moment compaction just closes the reader unilaterally
and this yanks the reader from under the queue_reader_handle feet,
causing dangling queue reader and broken_promise errors
as seen in #8755.

Instead, fix queue_reader::close to set value on the
_full/_not_full promises and detach from the handle,
and return _consume_fut from bucket_writer::consume
if handle is terminated.

Fixes #8755

Test: unit(dev)
DTest: materialized_views_test.py:TestMaterializedViews.interrupt_build_process_and_resharding_half_to_max_test(debug)
"

* tag 'propagate-reader-abort-v3' of github.com:bhalevy/scylla:
  mutation_writer: bucket_writer: consume: propagate _consume_fut if queue_reader_handle is_terminated
  queue_reader_handle: add get_exception method
  queue_reader: close: set value on promises on detach from handle
2021-06-22 18:52:11 +03:00
Avi Kivity
0948908502 Merge "mutation_reader: multishard_combining_reader clean-up close path" from Botond
"
The close path of the multishard combining reader is riddled with
workarounds for the fact that the flat mutation reader couldn't wait on
futures when destroyed. Now that we have a close() method that can do
just that, all these workarounds can be removed.
Even more workarounds can be found in tests, where resources like the
reader concurrency semaphore are created separately for each tested
multishard reader and then destroyed once no longer needed, so we had
to come up with all sorts of creative and ugly workarounds to keep
these alive until background cleanup is finished.
This series fixes all this. Now, after calling close() on the multishard
reader, all resources it used, including the life-cycle policy and the
semaphores created by it, can be safely destroyed. This greatly
simplifies the handling of the multishard reader, and makes it much
easier to reason about life-cycle dependencies.

Tests: unit(dev, release:v2, debug:v2,
    mutation_reader_test:debug -t test_multishard,
    multishard_mutation_query_test:debug,
    multishard_combining_reader_as_mutation_source:debug)
"

* 'multishard-combining-reader-close-cleanup/v3' of https://github.com/denesb/scylla:
  mutation_reader: reader_lifecycle_policy: remove convenience methods
  mutation_reader: multishard_combining_reader: store shard_reader via unique ptr
  test/lib/reader_lifecycle_policy: destroy_reader: cleanup context
  test/lib/reader_lifecycle_policy: get rid of lifecycle workarounds
  test/lib/reader_lifecycle_policy: destroy_reader(): stop the semaphore
  test/lib/reader_lifecycle_policy: use a more robust eviction mechanism
  reader_concurrency_semaphore: wait for all permits to be destroyed in stop()
  test/lib/reader_lifcecycle_policy: fix indentation
  mutation_reader: reader_lifecycle_policy::destroy_reader(): require to be called on native shard
  reader_lifecycle_policy implementations: fix indentation
  mutation_reader: reader_lifecycle_policy::destroy_reader(): de-futurize reader parameter
  mutation_reader: shard_reader::close(): wait on the remote reader
  multishard_mutation_query: destroy remote parts in the foreground
  mutation_reader: shard_reader::close(): close _reader
  mutation_reader: reader_lifcecycle_policy::destroy_reader(): remove out-of-date comment
2021-06-16 17:25:50 +03:00
Benny Halevy
b5efc3ceac queue_reader_handle: add get_exception method
To be used by the mutation_writer in the following patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-06-16 17:25:16 +03:00
Benny Halevy
4830b6647c queue_reader: close: set value on promises on detach from handle
To prevent broken_promise exception.

Since close() is mandatory, the queue_reader destructor,
which just detached the reader from the handle, is not
needed anymore, so remove it.

Adjust the test_queue_reader unit test accordingly.

Test: test_queue_reader(dev)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-06-16 17:25:14 +03:00
Botond Dénes
28c2b54875 mutation_reader: reader_lifecycle_policy: remove convenience methods
These convenience methods are not used much anymore, and they are not
even really necessary, as the register/unregister inactive read API got
streamlined to the point where all of these "convenience methods"
are just one-liners, which we can simply inline into their few callers
without losing readability.
2021-06-16 11:29:37 +03:00
Botond Dénes
63f0839164 mutation_reader: multishard_combining_reader: store shard_reader via unique ptr
No need for a shared pointer anymore: we don't have to potentially
keep the shard reader alive after the multishard reader is destroyed,
since we now do proper cleanup in close().
We still need a pointer, as the shard reader is un-movable but is stored
in a vector, which requires movable values.
2021-06-16 11:29:37 +03:00
Botond Dénes
8c7447effd mutation_reader: reader_lifecycle_policy::destroy_reader(): require to be called on native shard
Currently shard_reader::close() (its caller) goes to the remote shard,
copies all fragments left there back to the local shard, then calls
`destroy_reader()`, which in the case of the multishard mutation query
copies it all back to the native shard. This was required before because
`shard_reader::stop()` (`close()`'s predecessor) couldn't wait on
`smp::submit_to()`. But close() can, so we can get rid of all this
back-and-forth and just call `destroy_reader()` on the shard the reader
lives on, just like we do with `create_reader()`.
2021-06-16 11:29:35 +03:00
Botond Dénes
a7e59d3e2c mutation_reader: reader_lifecycle_policy::destroy_reader(): de-futurize reader parameter
The shard reader is now able to wait on the stopped reader and pass the
already stopped reader to `destroy_reader()`, so we can de-futurize the
reader parameter of said method. The shard reader was already patched to
pass a ready future so adjusting the call-site is trivial.
The most prominent implementation, the multishard mutation query, can
now also drop its `_dismantling_gate`, which was put in place so it
could wait on the background stopping of readers.

A consequence of this move is that handling errors that might happen
during the stopping of the reader is now handled in the shard reader,
not all lifecycle policy implementations.
2021-06-16 11:21:38 +03:00
Botond Dénes
13d7806b62 mutation_reader: shard_reader::close(): wait on the remote reader
We now have a future<> returning close() method so we don't need to
do the cleanup of the remote reader in the background, detaching it
from the shard-reader under destruction. We can now wait for the
cleanup properly before the shard reader is destroyed and just pass the
stopped reader to reader_lifecycle_policy::destroy_reader(). This patch
does the first part -- moving the cleanup to the foreground, the API
change of said method will come in the next patch.
2021-06-16 11:21:38 +03:00
Botond Dénes
7552cc73cf mutation_reader: shard_reader::close(): close _reader
The reason we got away without closing _reader so far is that it is an
`std::unique_ptr<evictable_reader>` which is a
`flat_mutation_reader::impl` instance, without the
`flat_mutation_reader` wrapper, which contains the validations for
close.
2021-06-16 11:21:33 +03:00
Botond Dénes
114459684b mutation_reader: foreign_reader::close() use on_internal_error_noexcept()
Instead of the throwing on_internal_error(). `close()` is noexcept so we
can't throw exceptions here.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20210615133130.786048-1-bdenes@scylladb.com>
2021-06-16 09:34:49 +02:00
Benny Halevy
8ecc626c15 queue_reader_handle: mark copy constructor noexcept
It is trivially so, as std::exception_ptr is nothrow default
constructible.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210609135925.270883-2-bhalevy@scylladb.com>
2021-06-09 20:09:01 +03:00
Benny Halevy
3100cdcc65 queue_reader_handle: move-construct also _ex
We were only moving the other's reader, without the
other's exception (which may already be set if the read
was abandoned or aborted).

While at it, mark the constructor noexcept.

Fixes #8833

Test: unit(dev)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210609135925.270883-1-bhalevy@scylladb.com>
2021-06-09 20:09:01 +03:00
Avi Kivity
a55b434a2b treewide: extend copyright statements to present day 2021-06-06 19:18:49 +03:00
Botond Dénes
5e39cedbe3 evictable_reader: remove _reader_created flag
This flag is not really needed, because we can just attempt a resume on
first use, which will fail with the default-constructed inactive read
handle, and the reader will then be created via the recreate-after-eviction
path.
This allows the same path to be used for all reader creation cases,
simplifying the logic and more importantly making further patching
easier without the special case.
To make the recreate path (almost) as cheap for the first reader
creation as the special path was, `_trim_range_tombstones` and
`_validate_partition_key` are only set when really needed.

Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20210514141511.127735-1-bdenes@scylladb.com>
2021-05-16 14:45:46 +03:00
Botond Dénes
3b57106627 evictable_reader: remove destructor
We now have close(), which is expected to do the cleanup, so there is no
need for cleanup in the destructor, and consequently no need for a
destructor at all.

Message-Id: <20210514112349.75867-1-bdenes@scylladb.com>
2021-05-16 12:19:41 +03:00
Benny Halevy
6e62ec8c24 mutation_reader: shard_reader: get rid of stop
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-04-25 11:35:07 +03:00
Benny Halevy
fc5e4688db mutation_reader: multishard_combining_reader: get rid of destructor
Now that multishard_combining_reader::close() is guaranteed to be called,
there is no longer a need for stopping the shard readers
in the destructor.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-04-25 11:35:07 +03:00
Benny Halevy
5b22731f9a flat_mutation_reader: require close
Make flat_mutation_reader::impl::close() pure virtual
so that all implementations are required to implement it.

With that, provide a trivial close() implementation for
all implementations that currently rely on the default,
trivial one.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-04-25 11:35:07 +03:00
Benny Halevy
51c96d405d mutation_reader: evictable_reader: fill_buffer: make sure to close the reader
If reader.fill_buffer() fails, `maybe_pause` will not be called and the
reader will be destroyed, so make sure to close it.

Otherwise, the reader is std::move'd into `maybe_pause`, which either
pauses it using register_inactive_read or std::move's it further into
_reader; in both cases it doesn't need to be closed.  `with_closeable`
can safely try to close the moved-from reader and do nothing in this
case, as the f_m_r::impl was already moved away.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-04-25 11:35:07 +03:00
Benny Halevy
a3f9dc6e0b mutation_reader: multishard_combining_reader: implement close
Close all underlying shard readers.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-04-25 11:35:07 +03:00
Benny Halevy
58b1da8cf5 mutation_reader: shard_reader: implement close
Return the reader lifecycle policy's destroy_reader() future
so the caller (multishard_combining_reader::close()) can wait on it.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-04-25 11:35:07 +03:00
Benny Halevy
2c1edb1a94 mutation_reader: reader_lifecycle_policy: return future from destroy_reader
So we can wait on it from the to-be-introduced shard_reader::close().

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-04-25 11:35:07 +03:00
Benny Halevy
bfe56fd99c mutation_reader: shard_reader: get rid of _stopped
It's unused.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-04-25 11:35:07 +03:00
Benny Halevy
e1ec401bb6 mutation_reader: evictable_reader: implement close
If there's an active reader, close it; otherwise,
try to resume the paused reader and close it.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-04-25 11:35:07 +03:00
Benny Halevy
84206501ae mutation_reader: foreign_reader: wait for readahead and close underlying reader
Move the logic in ~foreign_reader to close(),
to wait on the read-ahead future and close the underlying
reader on the remote shard.  Still call close() in the background
in ~foreign_reader if destroyed without closing, to keep the current
behavior, but warn about it until it's proven to be unneeded.

Also, add an on_internal_error in close() for the case where
_read_ahead_future is engaged but _reader is not, since this must
never happen and we wait on the _read_ahead_future without the _reader.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-04-25 11:35:07 +03:00
Benny Halevy
ea3f2a6536 mutation_reader: restricting_mutation_reader: close underlying reader
If a reader was admitted, close it in close().

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-04-25 11:35:07 +03:00
Benny Halevy
f9ae50483f mutation_reader: merging_reader: close underlying merger
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-04-25 11:35:07 +03:00
Benny Halevy
dccdbdff95 mutation_reader: mutation_fragment_merger: close underlying producer
This will be needed by the merging_reader.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-04-25 11:35:07 +03:00
Benny Halevy
761a38ce21 mutation_reader: mutation_reader_merger: make sure to close underlying readers
These will be called by merging_reader::close via
mutation_fragment_merger::close in the following patches.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-04-25 11:35:07 +03:00
Benny Halevy
b140ea6df2 mutation_reader: compacting_reader: implement close
Close underlying reader.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-04-25 11:35:07 +03:00
Kamil Braun
7ffb0d826b clustering_order_reader_merger: handle empty readers
The merger could return end-of-stream if some (but not all) of the
underlying readers were empty (i.e. not even returning a
`partition_start`). This could happen in places where it was used
(`time_series_sstable_set::create_single_key_sstable_reader`) if we
opened an sstable which did not have the queried partition but passed
all the filters (specifically, the bloom filter returned a false
positive for this sstable).

The commit also extends the random tests for the merger to include empty
readers and adds an explicit test case that catches this bug (in a
limited scope: when we merge a single empty reader).

It also modifies `test_twcs_single_key_reader_filtering` (the regression
test for #8432) because the point at which the clustering key filter is
invoked changes (some invocations move from the merger's constructor to
operator()). I checked manually that it still catches the bug when I
reintroduce it.

Fixes #8445.

Closes #8446
2021-04-12 10:34:52 +03:00
Botond Dénes
bc1fcd3db2 multishard_combining_reader: only read from needed shards
The multishard combining reader currently assumes that all shards have
data for the read range. This however is not always true and in extreme
cases (like reading a single token) it can lead to huge read
amplification. Avoid this by not pushing shards to
`_shard_selection_min_heap` if the first token they are expected to
produce falls outside of the read range. Also change the read ahead
algorithm to select the shards from `_shard_selection_min_heap`, instead
of walking them in shard order. This was wrong in two ways:
* Shards may be ordered differently with respect to the first partition
  they will produce; reading ahead on the next shard in shard order
  might not bring in data on the next shard the read will continue on.
  Shard order is only correct when starting a new range and shards are
  iterated over in the order they own tokens according to the sharding
  algorithm.
* Shards that may not have data relevant to the read range are also
  considered for read ahead.

After this patch, the multishard reader will only read from shards that
have data relevant to the read range, both in the case of normal reads
and also for read-ahead.

Fixes: #8161

Tests: unit(release)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20210226132536.85438-1-bdenes@scylladb.com>
2021-02-26 23:29:20 +02:00
Botond Dénes
c3b4c3f451 evictable_reader: reset _range_override after fast-forwarding
`_range_override` is used to store the modified range the reader reads
after it has to be recreated (when recreating a reader, its read range
is reduced to account for partitions it already read). When engaged,
this field overrides the `_pr` field as the definitive range the reader
is supposed to be currently reading. Fast forwarding conceptually
overrides the range the reader is currently reading, however currently
it doesn't reset the `_range_override` field. This resulted in
`_range_override` (containing the modified pre-fast-forward range)
incorrectly overriding the fast-forwarded-to range in `_pr` when
validating the first partition produced by the just recreated reader,
resulting in a false-positive validation failure.

Fixes: #8059

Tests: unit(release)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20210217164744.420100-1-bdenes@scylladb.com>
2021-02-17 19:11:00 +02:00
Benny Halevy
d565e3fb57 reader_lifecycle_policy: retire low level try_resume method
The caller can now just call sem.unregister_inactive_read(irh) directly.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-02-08 20:32:40 +02:00
Benny Halevy
4e8f29ef14 reader_concurrency_semaphore: inactive_read: keep a flat_mutation_reader
There's no need to hold a unique_ptr<flat_mutation_reader>, as
flat_mutation_reader itself holds a unique_ptr<flat_mutation_reader::impl>
and can express an empty state via flat_mutation_reader_opt.

With that, unregister_inactive_read was modified to return a
flat_mutation_reader_opt rather than a std::unique_ptr<flat_mutation_reader>,
keeping exactly the same semantics.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-02-08 20:32:40 +02:00