Commit Graph

218 Commits

Author SHA1 Message Date
Botond Dénes
c3b4c3f451 evictable_reader: reset _range_override after fast-forwarding
`_range_override` is used to store the modified range the reader reads
after it has to be recreated (when recreating a reader it's read range
is reduced to account for partitions it already read). When engaged,
this field overrides the `_pr` field as the definitive range the reader
is supposed to be currently reading. Fast forwarding conceptually
overrides the range the reader is currently reading, however currently
it doesn't reset the `_range_override` field. This resulted in
`_range_override` (containing the modified pre-fast-forward range)
incorrectly overriding the fast-forwarded-to range in `_pr` when
validating the first partition produced by the just recreated reader,
resulting in a false-positive validation failure.

Fixes: #8059

Tests: unit(release)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20210217164744.420100-1-bdenes@scylladb.com>
2021-02-17 19:11:00 +02:00
Benny Halevy
d565e3fb57 reader_lifecycle_policy: retire low level try_resume method
The caller can now just call sem.unregister_inactive_read(irh) directly.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-02-08 20:32:40 +02:00
Benny Halevy
4e8f29ef14 reader_concurrency_semaphore: inactive_read: keep a flat_mutation_reader
There's no need to hold a unique_ptr<flat_mutation_reader> as
flat_mutation_reader itself holds a unique_ptr<flat_mutation_reader::impl>
and functions as a unique ptr via flat_mutation_reader_opt.

With that, unregister_inactive_read was modified to return a
flat_mutation_reader_opt rather than a std::unique_ptr<flat_mutation_reader>,
keeping exactly the same semantics.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-02-08 20:32:40 +02:00
Avi Kivity
913d970c64 Merge "Unify inactive readers" from Botond
"
Currently inactive readers are stored in two different places:
* reader concurrency semaphore
* querier cache
With the latter registering its inactive readers with the former. This
is an unnecessarily complex (and possibly surprising) setup that we want
to move away from. This series solves this by moving the responsibility
if storing of inactive reads solely to the reader concurrency semaphore,
including all supported eviction policies. The querier cache is now only
responsible for indexing queriers and maintaining relevant stats.
This makes the ownership of the inactive readers much more clear,
hopefully making Benny's work on introducing close() and abort() a
little bit easier.

Tests: unit(release, debug:v1)
"

* 'unify-inactive-readers/v2' of https://github.com/denesb/scylla:
  reader_concurrency_semaphore: store inactive readers directly
  querier_cache: store readers in the reader concurrency semaphore directly
  querier_cache: retire memory based cache eviction
  querier_cache: delegate expiry to the reader_concurrency_semaphore
  reader_concurrency_semaphore: introduce ttl for inactive reads
  querier_cache: use new eviction notify mechanism to maintain stats
  reader_concurrency_semaphore: add eviction notification facility
  reader_concurrency_semaphore: extract evict code into method evict()
2021-02-03 10:59:04 +02:00
Botond Dénes
226088d12e mutation_reader: reader_lifecycle_policy::stopped_reader: drop pending_next_partition flag
Its not used anymore.
2021-01-22 16:18:59 +02:00
Botond Dénes
4eb65b12a0 mutation_reader: evictable_reader: remove next_partition() workaround
`next_partition()` now returns a future<>, so we can forward it to the
remote shard in the scope of the next partition call, remove the
now obsolete workaround for the synchronous next partition.
2021-01-22 16:18:30 +02:00
Botond Dénes
febd2feb4c mutation_reader: shard_reader: remove next_partition() workaround
`next_partition()` now returns a future<>, so we can forward it to the
remote shard in the scope of the next partition call, remove the
now obsolete workaround for the synchronous next partition.
2021-01-22 15:53:05 +02:00
Botond Dénes
81da6b756f mutation_reader: foreign_reader: remove next_partition() workaround
`next_partition()` now returns a future<>, so we can forward it to the
remote shard in the scope of the next partition call, remove the
now obsolete workaround for the synchronous next partition.
2021-01-22 15:30:36 +02:00
Kamil Braun
570d15c7bc multishard_combining_reader: do not use smp::count
`multishard_combining_reader` currently only works under the assumption
that every table uses the same sharder configured using the node's number
of shards. But we could potentially specify a different sharder for a chosen table,
e.g. one that puts everything on shard 0.
Then this assumption will be broken and the reader causes a segfault.

Fixes #7945.
2021-01-21 18:28:18 +02:00
Benny Halevy
29002e3b48 flat_mutation_reader: return future from next_partition
To allow it to asynchronously close underlying readers
on next_partition().

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-01-13 17:35:07 +02:00
Kamil Braun
5e846b33b8 clustering_order_reader_merger: fix the 0 readers case
With 0 readers the merger would produce a `partition_end` fragment
when it should immediately return `end_of_stream` instead.
2020-12-18 12:30:40 +01:00
Kamil Braun
0b36c5e116 mutation_reader: introduce clustering_order_reader_merger
This abstraction is used to merge the output of multiple readers, each
opened for a single partition query, into a non-decreasing stream
of mutation_fragments.

It is similar to `mutation_reader_merger`,
an important difference is that the new merger may select new readers
in the middle of a partition after it already returned some fragments
from that partition.  It uses the new `position_reader_queue` abstraction
to select new readers. It doesn't support multi-partition (ring range) queries.

The new merger will be later used when reading from sstable sets created
by TimeWindowCompactionStrategy. This strategy creates many sstables
that are mostly disjoint w.r.t the contained clustering keys, so we can
delay opening sstable readers when querying a partition until after we have
processed all mutation fragments with positions before the keys
contained by these sstables.
2020-11-30 11:55:44 +01:00
Kamil Braun
857911d353 mutation_reader: generalize combined_mutation_reader
It is now called `merging_reader`, and is used to change a `FragmentProducer`
that produces a non-decreasing stream of mutation fragments batches into
a `flat_mutation_reader` producing a non-decreasing stream of fragments.

The resulting stream of fragments is increasing except for places where
we encounter range tombstones (multiple range tombstones may be produced
with the same position_in_partition)

`merging_reader` is a simple adapter over `mutation_fragment_merger`.

The old `combined_mutation_reader` is simply a specialization of `merging_reader`
where the used `FragmentProducer` is `mutation_reader_merger`, an abstraction that
merges the output of multiple readers into one non-decreasing stream of fragment
batches.

There is no separate class for `combined_mutation_reader` now. Instead,
`make_combined_reader` works directly with `merging_reader`.
2020-11-19 14:35:11 +01:00
Kamil Braun
60adee6900 mutation_reader: fix description of mutation_fragment_merger
The resulting sequence is not necessarily strictly increasing
(e.g. if there are range tombstones).
2020-11-19 14:29:04 +01:00
Botond Dénes
f5323b29d9 mutation_reader: queue_reader: don't set EOS flag on abort
If the consumer happens to check the EOS flag before it hits the
exception injected by the abort (by calling fill_buffer()), they can
think the stream ended normally and expect it to be valid. However this
is not guaranteed when the reader is aborted. To avoid consumers falsely
thinking the stream ended normally, don't set the EOS flag on abort at
all.

Additionally make sure the producer is aborted too on abort. In theory
this is not needed as they are the one initiating the abort, but better
to be safe then sorry.

Fixes: #7411
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20201102100732.35132-1-bdenes@scylladb.com>
2020-11-11 13:44:25 +02:00
Pavel Emelyanov
3da3d448c8 range_tombstone: Remove unused schema arg from .set_start
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-11-06 15:13:05 +03:00
Botond Dénes
ff623e70b3 reader_concurrency_semaphore: name permits
Require a schema and an operation name to be given to each permit when
created. The schema is of the table the read is executed against, and
the operation name, which is some name identifying the operation the
permit is part of. Ideally this should be different for each site the
permit is created at, to be able to discern not only different kind of
reads, but different code paths the read took.

As not all read can be associated with one schema, the schema is allowed
to be null.

The name will be used for debugging purposes, both for coredump
debugging and runtime logging of permit-related diagnostics.
2020-10-13 12:32:13 +03:00
Botond Dénes
307cdf1e0d multishard_combining_reader: reader_lifecycle_policy: add permit param to create_reader()
Allow the evictable reader managing the underlying reader to pass its
own permit to it when creating it, making sure they share the same
permit. Note that the two parts can still end up using different
permits, when the underlying reader is kept alive between two pages of a
paged read and thus keeps using the permit received on the previous
page.

Also adjust the `reader_context` in multishard_mutation_query.cc to use
the passed-in permit instead of creating a new one when creating a new
reader.
2020-10-12 15:56:56 +03:00
Botond Dénes
e09ab09fff multishard_combining_reader: add permit parameter
Don't create an own permit, take one as a parameter, like all other
readers do, so the permit can be provided by the higher layer, making
sure all parts of the logical read use the same permit.
2020-10-12 15:56:56 +03:00
Botond Dénes
600f1c7853 multishard_combining_reader: shard_reader: use multishard reader's permit
Don't create a new permit per shard reader, pass down the multishard
reader's one to be used by each shard reader. They all belong to the
same read, they should use the same permit. Note that despite its name
the shard readers are the local representation of a reader living on a
remote shard and as such they live on the same shard the multishard
combining reader lives on.
2020-10-12 15:56:56 +03:00
Botond Dénes
dd372c8457 flat_mutation_reader: de-virtualize buffer_size()
The main user of this method, the one which required this method to
return the collective buffer size of the entire reader tree, is now
gone. The remaining two users just use it to check the size of the
reader instance they are working with.
So de-virtualize this method and reduce its responsibility to just
returning the buffer size of the current reader instance.
2020-10-06 08:22:56 +03:00
Botond Dénes
6ca0464af5 mutation_fragment: add schema and permit
We want to start tracking the memory consumption of mutation fragments.
For this we need schema and permit during construction, and on each
modification, so the memory consumption can be recalculated and pass to
the permit.

In this patch we just add the new parameters and go through the insane
churn of updating all call sites. They will be used in the next patch.
2020-09-28 11:27:23 +03:00
Botond Dénes
72a88e0257 mutation_fragment: s/as_mutable_range_tombstone/mutate_as_range_tombstone/
We will soon want to update the memory consumption of mutation fragment
after each modification done to it, to do that safely we have to forbid
direct access to the underlying data and instead have callers pass a
lambda doing their modifications.

Uses where this method was just used to move the fragment away are
converted to use `as_range_tombstone() &&`.
2020-09-28 10:53:56 +03:00
Botond Dénes
0518571e56 flat_mutation_reader: make _buffer a tracked buffer
Via a tracked_allocator. Although the memory allocations made by the
_buffer shouldn't dominate the memory consumption of the read itself,
they can still be a significant portion that scales with the number of
readers in the read.
2020-09-28 10:53:56 +03:00
Botond Dénes
77ea44cb73 mutation_reader: extract the two fill_buffer_result into a single one
Currently we have two, nearly identical definitions of said struct.
Extract it to a common definition and rename it to
`remote_fill_buffer_result`.
2020-09-28 10:53:56 +03:00
Botond Dénes
3fab83b3a1 flat_mutation_reader: impl: add reader_permit parameter
Not used yet, this patch does all the churn of propagating a permit
to each impl.

In the next patch we will use it to track to track the memory
consumption of `_buffer`.
2020-09-28 10:53:48 +03:00
Botond Dénes
0b0ae18a14 evictable_reader: validate buffer after recreation the underlying
The reader recreation mechanism is a very delicate and error-prone one,
as proven by the countless bugs it had. Most of these bugs were related
to the recreated reader not continuing the read from the expected
position, inserting out-of-order fragments into the stream.
This patch adds a defense mechanism against such bugs by validating the
start position of the recreated reader. Several things are checked:
* The partition is the expected one -- the one we were in the middle of
  or the next if we stopped at partition boundaries.
* The partition is in the read range.
* The first fragment in the partition is the expected one -- has a
  an equal or larger position than the next expected fragment.
* The fragment is in the clustering range as defined by the slice.

As these validations are only done on the slow-path of recreating an
evicted reader, no performance impact is expected.
2020-09-25 12:09:00 +03:00
Botond Dénes
91020eef73 evictable_reader: update_next_position(): only use peek'd position on partition boundary
`evictable_reader::update_next_position()` is used to record the position the
reader will continue from, in the next buffer fill. This position is used to
create the partition slice when the underlying reader is evicted and has
to be recreated. There is an optimization in this method -- if the
underlying's buffer is not empty we peek at the first fragment in it and
use it as the next position. This is however problematic for buffer
validation on reader recreation (introduced in the next patch), because
using the next row's position as the next pos will allow for range
tombstones to be emitted with before_key(next_pos.key()), which will
trigger the validation. Instead of working around this, just drop this
optimization for mid-partition positions, it is inconsequential anyway.
We keep it for where it is important, when we detect that we are at a
partition boundary. In this case we can avoid reading the current
partition altogether when recreating the reader.
2020-09-25 12:09:00 +03:00
Botond Dénes
4f2e7a18e2 evictable_reader: trim range tombstones to the read clustering range
Currently mutation sources are allowed to emit range tombstones that are
out-of the clustering read range if they are relevant to it. For example
a read of a clustering range [ck100, +inf), might start with:

    range_tombstone{start={ck1, -1}, end={ck200, 1}},
    clustering_row{ck100}

The range tombstone is relevant to the range and the first row of the
range so it is emitted as first, but its position (start) is outside the
read range. This is normally fine, but it poses a problem for evictable
reader. When the underlying reader is evicted and has to be recreated
from a certain clustering position, this results in out-of-order
mutation fragments being inserted into the middle of the stream. This is
not fine anymore as the monotonicity guarantee of the stream is
violated. The real solution would be to require all mutation sources to
trim range tombstones to their read range, but this is a lot of work.
Until that is done, as a workaround we do this trimming in the evictable
reader itself.
2020-09-25 12:09:00 +03:00
Botond Dénes
4944e050e3 mutation_reader: make_combined_reader(): return empty reader when combining 0 readers
Avoid creating all the combining machinery when we know there is no data
to be had.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200821045602.13096-1-bdenes@scylladb.com>
2020-08-22 20:47:49 +03:00
Botond Dénes
a9013030cf multishard_mutation_reader: add a trace message for each shard reader created
So we can see in the trace output, the shards that actually participated
in the reads. There is a single message for each shard reader.

Fixes: #6888
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200803132338.95013-1-bdenes@scylladb.com>
2020-08-03 16:24:46 +03:00
Botond Dénes
5de0afdab7 mutation_reader: expose new_reader_base_cost
So that test code can use it.
2020-07-20 11:23:39 +03:00
Botond Dénes
627054c3d7 mutation_reader: shard_reader: fix fast-forwarding with read-ahead
The current `fast_forward_to(const dht::partition_range&)`
implementation has two problems:
* If the reader was not created yet, but there is an ongoing read-ahead
  (which is going to create it), the function bails out. This will
  result in this shard reader not being fast-forwarded to the new range
  at all.
* If the reader was already created and there is an ongoing read-ahead,
  the function will wait for this to complete, then fast-forward the
  reader, as it should. However, the buffer is cleared *before* the
  read-ahead is waited for. So if the read-ahead brings in new data,
  this will land in the buffer. This data will be outside of the
  fast-forwarded-to range and worse, as we just cleared the buffer, it
  might violate mutation fragment stream monotonicity requirements.

This patch fixes both of these bugs. Targeted reproducer unit tests are
coming in the next patches.
2020-07-01 09:51:02 +03:00
Botond Dénes
e31f7316c0 mutation_reader: evictable_reader: add assert against pause handle leak
We are currently investigating a segmentation fault, which is suspected
to be caused by a leaked pause handle. Although according to the latest
theory the handle leak is not the root cause of the issue, just a
symptom, its better to catch any bugs that would cause a handle leaking
at the act, and not later when some side-effect causes a segfault.

Refs: #6613
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200625153729.522811-1-bdenes@scylladb.com>
2020-06-28 12:08:25 +03:00
Botond Dénes
542d9c3711 mutation_reader: expose evictable_reader
Expose functions for the outside world to create evictable readers. We
expose two functions, which create an evictable reader with
`auto_pause::yes` and `auto_pause::no` respectively. The function
creating the latter also returns a handle in addition to the reader,
which can be used to pause the reader.
2020-06-23 21:08:21 +03:00
Botond Dénes
1cc31deff9 mutation_reader: evictable_reader: add auto_pause flag
Currently the evictable reader unconditionally pauses the underlying
reader after each use (`fill_buffer()` or `fast_forward_to()` call).
This is fine for current users (the multishard reader), but the future
user we are doing all this refactoring for -- repair -- will want to
control when the underlying reader is paused "manually". Both these
behaviours can easily be supported in a single implementation, so we
add an `auto_pause` flag to allow the creator of the evictable reader
to control this.
2020-06-23 21:08:21 +03:00
Botond Dénes
af9e1c23e1 mutation_reader: make evictable_reader a flat_mutation_reader
The `evictable_reader` class is almost a proper flat mutation reader
already, it roughly offers the same interface. This patch makes this
formal: changing the class to inherit from `flat_mutation_reader::impl`,
and implement all virtual methods. This also entails a departure from
using the lifecycle policy to pause/resume and create readers, instead
using more general building blocks like the reader concurrency semaphore
and a mutation source.
2020-06-23 21:08:21 +03:00
Botond Dénes
4485864ada mutation_reader: s/inactive_shard_read/inactive_evictable_reader/
Rename `inactive_shard_read` to `inactive_evictable_reader` to reflect
that the fact that the evictable reader is going to be of general use,
not specific to the multishard reader.
2020-06-23 10:01:38 +03:00
Botond Dénes
b6ed054c08 mutation_reader: move inactive_shard_reader code up
It will be used by the `evictable_reader` code too in the next patches.
2020-06-23 10:01:38 +03:00
Botond Dénes
e3ea1c9080 mutation_reader: fix indentation
Deferred from the previous patch.
2020-06-23 10:01:38 +03:00
Botond Dénes
f9d1916499 mutation_reader: shard_reader: extract remote_reader as evictable_reader
We want to make the evictable reader mechanism used in the multishard
reader pipeline available for general (re)use, as a standalone
flat mutation reader implementation. The first step is extracting
`shard_reader::remote_reader` the class implementing this logic into a
top-level class, also renamed to `evictable_reader`.
2020-06-23 10:01:38 +03:00
Avi Kivity
a4c44cab88 treewide: update concepts language from the Concepts TS to C++20
Seastar recently lost support for the experimental Concepts Technical
Specification (TS) and gained support for C++20 concepts. Re-enable
concepts in Scylla by updating our use of concepts to the C++20
standard.

This change:
 - peels off uses of the GCC6_CONCEPT macro
 - removes inclusions of <seastar/gcc6-concepts.hh>
 - replaces function-style concepts (no longer supported) with
   equation-style concepts
 - semicolons added and removed as needed
 - deprecated std::is_pod replaced by recommended replacement
 - updates return type constraints to use concepts instead of
   type names (either std::same_as or std::convertible_to, with
   std::same_as chosen when possible)

No attempt is made to improve the concepts; this is a specification
update only.
Message-Id: <20200531110254.2555854-1-avi@scylladb.com>
2020-06-02 09:12:21 +03:00
Botond Dénes
4409579352 mutation_reader: restricted_reader: work in terms of reader_permit
We want to refactor all read resource tracking code to work through the
read_permit, so refactor the restricted reader to also do so.
2020-05-28 11:34:35 +03:00
Botond Dénes
06dd3d9077 queue_reader: push(): eliminate unneeded continuation on full buffer case
Currently, push() attaches a continuation to the _not_full future, if
push() is called when the buffer is already full. This is not needed as
we can safely push the fragment even if the buffer is already full.
Furthermore we can eliminate the possibility of push() being called when
the buffer is full, by checking whether it is full *after* pushing the
fragment, not before.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200521055840.376019-1-bdenes@scylladb.com>
2020-05-21 09:34:44 +03:00
Botond Dénes
54a0d8536e restricting_mutation_reader: include own buffer in buffer size calculation
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200519102902.231042-1-bdenes@scylladb.com>
2020-05-19 18:23:15 +03:00
Glauber Costa
e44b2826ab compaction: avoid abandoned futures when using interposers
When using interposers, cancelling compactions can leave futures
that are not waited for (resharding, twcs)

The reason is when consume_end_of_stream gets called, it tries to
push end_of_stream into the queue_reader_handle. Because cancelling
a compaction is done through an exception, the queue_reader_handle
is terminated already at this time. Trying to push to it generates
another exception and prevents us from returning the future right
below it.

This patch adds a new method is_terminated() and if we detect
that the queue_reader_handle is already terminated by this point,
we don't try to push. We call it is_terminated() because the check
is to see if the queue_reader_handle has a _reader. The reader is
also set to null on successful destruction.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Reviewed-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200430175839.8292-1-glauber@scylladb.com>
2020-05-01 16:30:23 +03:00
Botond Dénes
196dd5fa9b treewide: throw std::bad_function_call with backtraces
We typically use `std::bad_function_call` to throw from
mandatory-to-implement virtual functions, that cannot have a meaningful
implementation in the derived class. The problem with
`std::bad_function_call` is that it carries absolutely no information
w.r.t. where was it thrown from.

I originally wanted to replace `std::bad_function_call` in our codebase
with a custom exception type that would allow passing in the name of the
function it is thrown from to be included in the exception message.
However after I ended up also including a backtrace, Benny Halevy
pointed out that I might as well just throw `std:bad_function_call` with
a backtrace instead. So this is what this patch does.

All users are various unimplemented methods of the
`flat_mutation_reader::impl` interface.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200408075801.701416-1-bdenes@scylladb.com>
2020-04-08 13:54:06 +02:00
Piotr Jastrzebski
e72696a8e6 sharding_info: rename the class to sharder
Also rename all variables that were named si or sinfo
to sharder.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-30 18:42:33 +02:00
Piotr Jastrzebski
031f589dba multishard_combining_reader: use token_for_next_shard from sharding info not partitioner
Previously this function was accessing sharding logic
through partitioner obtained from the schema.

While converting tests, dummy_partitioner is turned into
dummy_sharding_info.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-30 18:42:25 +02:00
Rafael Ávila de Espíndola
c5795e8199 everywhere: Replace engine().cpu_id() with this_shard_id()
This is a bit simpler and might allow removing a few includes of
reactor.hh.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200326194656.74041-1-espindola@scylladb.com>
2020-03-27 11:40:03 +03:00