Commit Graph

54 Commits

Author SHA1 Message Date
Benny Halevy
6e92f07630 reader_concurrency_semaphore: register_inactive_read: make noexcept
Catch error to allocate an inactive_read and just log them.
Return an empty inactive_read_handle in
this case, as if the inactive reader was evicted due to
lack of resources.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-02-08 22:31:01 +02:00
Benny Halevy
46c2229b78 reader_concurrency_semaphore: separate set_notify_handler from register_inactive_reader
Register the inactive reader first with no
evict_notify_handler and ttl.

Those can be set later, only if registration succeeded.
Otherwise, as in the querier example, there is no need
to to place the querier in the index and erase it
on eviction.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-02-08 22:31:01 +02:00
Benny Halevy
d752ea7e91 reader_concurrency_semaphore: inactive_read: make ttl_timer non-optional
By default it will be unarmed and with no callback
so there's no need to wrap it in a std::optional.

This saves an allocation and another potential
error case.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-02-08 22:31:01 +02:00
Benny Halevy
a12c9638b6 reader_concurrency_semaphore: inactive_read: use intrusive list
To simplify insertion and eviction into the inactive_reads container,
use an intrusive list thta requires a single allocation for the
inactive_read object itself.

This allows passing a reference to the inactive_read
to evict it.

Note that the reader will be unlinked automatically from
the inactive_readers list if the inactive_read_handle is destroyed.
This is okay since there is no need to track the inactive_read
if the caller loses the i_r_h (e.g. if an error is thrown).

It is also safe to evict the inactive_reader while the
i_r_h is alive.  In this case the i_r will be unlinked
after the flat_mutation_reader it holds is moved out of it.

bi::auto_unlink will detect that it's alredy unlinked
when destroyed and do nothing.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-02-08 22:31:01 +02:00
Benny Halevy
f751e42bf9 reader_concurrency_semaphore: do_wait_admission: use try_evict_one_inactive_read
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-02-08 20:52:16 +02:00
Benny Halevy
81cd3d0c51 reader_concurrency_semaphore: try_evict_one_inactive_read: pass evict_reason
So try_evict_one_inactive_read could be used also in do_wait_admission
in the next patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-02-08 20:32:40 +02:00
Benny Halevy
e072199b8d reader_concurrency_semaphore: unregister_inactive_read: calling on wrong semaphore is an internal error
Calling unregister_inactive_read on the wrong semaphore is a blatant
bug so better call on_internal_error so it'd be easier to catch and fix.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-02-08 20:32:40 +02:00
Benny Halevy
9c9b4c85ae reader_concurrency_semaphore: unregister_inactive_read: do nothing if disengaged
There is no need to lookup the inactive_read if the i_r_h
is disengaged, it should not be registered so just return
quickly.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-02-08 20:32:40 +02:00
Benny Halevy
4e8f29ef14 reader_concurrency_semaphore: inactive_read: keep a flat_mutation_reader
There's no need to hold a unique_ptr<flat_mutation_reader> as
flat_mutation_reader itself holds a unique_ptr<flat_mutation_reader::impl>
and functions as a unique ptr via flat_mutation_reader_opt.

With that, unregister_inactive_read was modified to return a
flat_mutation_reader_opt rather than a std::unique_ptr<flat_mutation_reader>,
keeping exactly the same semantics.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-02-08 20:32:40 +02:00
Avi Kivity
913d970c64 Merge "Unify inactive readers" from Botond
"
Currently inactive readers are stored in two different places:
* reader concurrency semaphore
* querier cache
With the latter registering its inactive readers with the former. This
is an unnecessarily complex (and possibly surprising) setup that we want
to move away from. This series solves this by moving the responsibility
if storing of inactive reads solely to the reader concurrency semaphore,
including all supported eviction policies. The querier cache is now only
responsible for indexing queriers and maintaining relevant stats.
This makes the ownership of the inactive readers much more clear,
hopefully making Benny's work on introducing close() and abort() a
little bit easier.

Tests: unit(release, debug:v1)
"

* 'unify-inactive-readers/v2' of https://github.com/denesb/scylla:
  reader_concurrency_semaphore: store inactive readers directly
  querier_cache: store readers in the reader concurrency semaphore directly
  querier_cache: retire memory based cache eviction
  querier_cache: delegate expiry to the reader_concurrency_semaphore
  reader_concurrency_semaphore: introduce ttl for inactive reads
  querier_cache: use new eviction notify mechanism to maintain stats
  reader_concurrency_semaphore: add eviction notification facility
  reader_concurrency_semaphore: extract evict code into method evict()
2021-02-03 10:59:04 +02:00
Botond Dénes
318b0ef259 reader_concurrency_semaphore: rate-limit diagnostics messages
And since now there is no danger of them filling the logs, the log-level
is promoted to info, so users can see the diagnostics messages by
default.

The rate-limit chosen is 1/30s.

Refs: #7398

Tests: manual

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20201117091253.238739-1-bdenes@scylladb.com>
2020-11-17 11:57:51 +02:00
Piotr Sarna
3ce7848bdf reader_concurrency_semaphore: add metrics for shed reads
When the admission queue capacity reaches its limits, excessive
reads are shed in order to avoid overload. Each such operation
now bumps the metrics, which can help the user judge if a replica
is overloaded.
2020-11-11 19:01:38 +01:00
Avi Kivity
cfada6e04d reader_concurrency_semaphore: adjust permit_summary construction for clang
Clang does not implement P0960R3, parenthesized initialization of
aggregates, so we have to use brace initialization in
permit_summary.

As the parenthesized constructor call is done by emplace_back(),
we have to do the braced call ourselves.
2020-10-19 14:57:51 +03:00
Botond Dénes
18454e4a80 reader_concurrency_semaphore: dump permit diagnostics on timeout or queue overflow
The reader concurrency semaphore timing out or its queue being overflown
are fairly common events both in production and in testing. At the same
time it is a hard to diagnose problem that often has a benign cause
(especially during testing), but it is equally possible that it points
to something serious. So when this error starts to appear in logs,
usually we want to investigate and the investigation is lengthy...
either involves looking at metrics or coredumps or both.

This patch intends to jumpstart this process by dumping a diagnostics on
semaphore timeout or queue overflow. The diagnostics is printed to the
log with debug level to avoid excessive spamming. It contains a
histogram of all the permits associated with the problematic semaphore
organized by table, operation and state.

Example:

DEBUG 2020-10-08 17:05:26,115 [shard 0] reader_concurrency_semaphore -
Semaphore _read_concurrency_sem: timed out, dumping permit diagnostics:
Permits with state admitted, sorted by memory
memory  count   name
3499M   27      ks.test:data-query

3499M   27      total

Permits with state waiting, sorted by count
count   memory  name
1       0B      ks.test:drain
7650    0B      ks.test:data-query

7651    0B      total

Permits with state registered, sorted by count
count   memory  name

0       0B      total

Total: permits: 7678, memory: 3499M

This allows determining several things at glance:
* What are the tables involved
* What are the operations involved
* Where is the memory

This can speed up a follow-up investigation greatly, or it can even be
enough on its own to determine that the issue is benign.
2020-10-13 12:32:14 +03:00
Botond Dénes
27bbf5566d reader_concurrency_semaphore: link permits into an intrusive list 2020-10-13 12:32:14 +03:00
Botond Dénes
fdb93ae0fd reader_concurrency_semaphore: move expiry_handler::operator()() out-of-line
Soon we will want to add more logic to this now simple handler, move it
out-of-line in preparation.
2020-10-13 12:32:14 +03:00
Botond Dénes
85bfd28f4e reader_concurrency_semaphore: move constructors out-of-line
Soon, the semaphore will have a field that will not have a publicly
available definition. Move the constructor out-of-line in preparation.
2020-10-13 12:32:13 +03:00
Botond Dénes
70fa543c31 reader_concurrency_semaphore: add state to permits
Instead of a simple boolean, designating whether the permit was already
admitted or not, add a proper state field with a value for all the
different states the permit can be in. Currently there are three such
states:
* registered - the permit was created and started accounting resource
  consumption.
* waiting - the permit was queued to wait for admission.
* admitted - the permit was successfully admitted.

The state will be used for debugging purposes, both during coredump
debugging as well as for dumping diagnostics data about permits.
2020-10-13 12:32:13 +03:00
Botond Dénes
ff623e70b3 reader_concurrency_semaphore: name permits
Require a schema and an operation name to be given to each permit when
created. The schema is of the table the read is executed against, and
the operation name, which is some name identifying the operation the
permit is part of. Ideally this should be different for each site the
permit is created at, to be able to discern not only different kind of
reads, but different code paths the read took.

As not all read can be associated with one schema, the schema is allowed
to be null.

The name will be used for debugging purposes, both for coredump
debugging and runtime logging of permit-related diagnostics.
2020-10-13 12:32:13 +03:00
Botond Dénes
73a6b97c75 reader_permit: add consumed_resources() accessor
That allows querying he amount of resources accounted though this
permit, and by extension by this logical read.
2020-10-06 08:18:42 +03:00
Botond Dénes
4c8ab10563 reader_permit: only forward resource consumption to semaphore after admission
In the next patches we plan to start tracking the memory consumption of
the actual allocations made by the circular_buffer<mutation_fragment>,
as well as the memory consumed by the mutation fragments.
This means that readers will start consuming memory off the permit right
after being constructed. Ironically this can prevent the reader from
being admitted, due to its own pre-admission memory consumption. To
prevent this hold on forwarding the memory consumption to the semaphore,
until the permit is actually admitted.
2020-09-28 08:46:22 +03:00
Botond Dénes
e1eee0dc34 reader_permit: track resource consumed through permit
Track all resources consumed through the permit inside the permit. This
allows querying how much memory each read is consuming (as there should
be one read per permit). Although this might be interesting, especially
when debugging OOM cores, the real reason we are doing this is to be
able forward resource consumption to the semaphore only post-admission.
More on this in the patch introducing this.

Another advantage of tracking resources consumed through the permit is
that now we can detect resource leaks in the permit destructor and
report them. Even if it is just a case of the holder of the resources
wanting to release the resources later, with the permit destroyed it
will cause use-after-free.
2020-09-28 08:46:22 +03:00
Botond Dénes
cd953a36fd reader_permit: move internals to impl
In the next patches the reader permit will gain members that are shared
across all instances of the same permit. To facilitate this move all
internals into an impl class, of which the permit stores a shared
pointer. We use a shared_ptr to avoid defining `impl` in the header.

This is how the reader permit started in the beginning. We've done a
full circle. :)
2020-09-28 08:46:22 +03:00
Botond Dénes
12372731cb reader_permit: add consume()/signal()
And do all consuming and signalling through these methods. These
operations will soon be more involved than the simple forwarding they do
today, so we want to centralize them to a single method pair.
2020-09-28 08:46:22 +03:00
Botond Dénes
375815e650 reader_permit::resource_units: store permit instead of semaphore
In the next patches we want to introduce per-permit resource tracking --
that is, have each permit track the amount of resource consumed through
it. For this, we need all consumption to happen through a permit, and
not directly with the semaphore.
2020-09-28 08:46:22 +03:00
Botond Dénes
0fe75571d9 reader_concurrency_semaphore: admit one read if no reader is active
To ensure progress at all times. This is due to evictable readers, who
still hold on to a buffer even when their underlying reader is evicted.
As we are introducing buffer and mutation fragment tracking in the next
patches, these readers will hold on to memory even in this state, so it
may theoretically happen that even though no readers are admitted (all
count resources all available) no reader can be admitted due to lack of
memory. To prevent such deadlocks we now always admit one reader if all
count resource are available.
2020-09-28 08:46:22 +03:00
Botond Dénes
ef0b279c80 reader_concurrency_semaphore: move may_proceed() out-of-line
They are only used in the .cc anyway.
2020-09-28 08:46:22 +03:00
Botond Dénes
c18756ce9a reader_concurrency_semaphore: s/inactive_read_stats/stats/
In preparations of non-inactive read stats being added to the semaphore,
rename its existing stats struct and member to a more generic name.
Fields, whose name only made sense in the context of the old name are
adjusted accordingly.
2020-09-23 13:11:55 +03:00
Botond Dénes
a0107ba1c6 reader_permit: reader_resources: make true RAII class
Currently in all cases we first deduct the to-be-consumed resources,
then construct the `reader_resources` class to protect it (release it on
destruction). This is error prone as it relies on no exception being
thrown while constructing the `reader_resources`. Albeit the
`reader_resources` constructor is `noexcept` right now this might change
in the future and as the call sites relying on this are disconnected
from the declaration, the one modifying them might not notice.
To make this safe going forward, make the `reader_resources` a true RAII
class, consuming the units in its constructor and releasing them in its
destructor.

Fixes: #7256

Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200922150625.1253798-1-bdenes@scylladb.com>
2020-09-22 18:13:35 +03:00
Botond Dénes
11105cbb78 reader_concurrency_semaphore: make inactive read handles unique across semaphores
Currently inactive read handles are only unique within the same
semaphore, allowing for an unregister against another semaphore to
potentially succeed. This can lead to disasters ranging from crashes to
data corruption. While a handle should never be used with another
semaphore in the first place, we have recently seen a bug (#6613)
causing exactly that, so in this patch we prevent such unregister
operations from ever succeeding by making handles unique across all
semaphores. This is achieved by adding a pointer to the semaphore to the
handle.
2020-07-23 16:43:33 +03:00
Avi Kivity
4e79296090 tracked_file_impl: inherit disk and memory alignment from underlying file
tracked_file_impl is a wrapper around another file, that tracks
memory allocated for buffers in order to control memory consumption.

However, it neglects to inherit the disk and memory alignment settings
from the wrapped file, which can cause unnecessarily-large buffers
to be read from disk, reducing throughput.

Fix by copying the alignment parameters.

Fixes #6290.
2020-06-11 17:43:50 +03:00
Botond Dénes
3cd2598ab3 reader_permit: forbid empty permits
Remove `no_reader_permit()` and all ways to create empty (invalid)
permits. All permits are guaranteed to be valid now and are only
obtainable from a semaphore.

`reader_permit::semaphore()` now returns a reference, as it is
guaranteed to always have a valid semaphore reference.
2020-05-28 11:34:35 +03:00
Botond Dénes
f417b9a3ea reader_concurrency_semaphore: remove wait_admission and consume_resources()
Permits are now created with `make_permit()` and code is using the
permit to do all resource consumption tracking and admission waiting, so
we can remove these from the semaphore. This allows us to remove some
now unused code from the permit as well, namely the `base_cost` which
was used to track the resource amount the permit was created with. Now
this amount is also tracked with a `resource_units` RAII object, returned
from `reader_permit::wait_admission()`, so it can be removed. Curiously,
this reduces the reader permit to be glorified semaphore pointer. Still,
the permit abstraction is worth keeping, because it allows us to make
changes to how the resource tracking part of the semaphore works,
without having to change the huge amount of code sites passing around
the permit.
2020-05-28 11:34:35 +03:00
Botond Dénes
bf4ade8917 reader_permit: resource_units: introduce add()
Allows merging two resource_units into one.
2020-05-28 11:34:35 +03:00
Botond Dénes
4d7250d12b reader_permit: add wait_admission
We want to make `read_permit` the single interface through which reads
interact with the concurrency limiting mechanism. So far it was only
usable to track memory consumption. Add the missing `wait_admission()`
and `consume_resources()` to the permit API. As opposed to
`reader_concurrency_semaphore::` equivalents which returned a
permit, the `reader_permit::` variants jut return
`reader_permit::resource_units` which is an RAII holder for the acquired
units. This also allows for the permit to be created earlier, before the
reader is admitted, allowing for tracking pre-admission memory usage as
well. In fact this is what we are going to do in the next patches.

This patch also introduces a `broken()` method on the reader concurrency
semaphore which resolves waiters with an exception. This method is also
called internally from the semaphore's destructor. This is needed
because the semaphore can now have external waiters, who has to be
resolved before the semaphore itself is destroyed.
2020-05-28 11:34:35 +03:00
Botond Dénes
bd793d6e19 reader_permit: resource_units: work in terms of reader_resources
Refactor resource_units semantically as well to work in terms of
reader_resources, instead of just memory.
2020-05-28 11:34:35 +03:00
Botond Dénes
0f9c24631a reader_permit: s/memory_units/resource_units/
We want to refactor reader_permit::memory_units to work in terms of
reader_resources, as we are planning to use it for guarding count
resources as well. This patch makes the first step: renames it from
memory_units to resources_units. Since this is a very noisy change, we
do it in a separate patch, the semantic change is in the next patch.
2020-05-28 11:34:35 +03:00
Avi Kivity
88ade3110f treewide: replace calls to engine().some_api() with some_api()
This removes the need to include reactor.hh, a source of compile
time bloat.

In some places, the call is qualified with seastar:: in order
to resolve ambiguities with a local name.

Includes are adjusted to make everything compile. We end up
having 14 translation units including reactor.hh, primarily for
deprecated things like reactor::at_exit().

Ref #1
2020-04-05 12:46:04 +03:00
Botond Dénes
05116ba963 reader_concurrency_semaphore: make signal() noexcept
Currently reader_concurrency_semaphore::signal() can fail. This is
dangerous in two ways:
* It is called from constructors, so the exception can bring down the
  node. This will convert an `std::bad_alloc` to a crash.
* Reads in the queue will be blocked until they either time-out, or
  another `signal()` succeeds.

To solve this, wrap the `reader_permit` constructor, the only code that
can throw, with try-catch and forward the exception to the reader
admission promise. In practice this will result in the flushing of the
reader queue, when we fail to admit a read.

Fixes #5741
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200206154238.707031-1-bdenes@scylladb.com>
2020-02-06 17:51:03 +02:00
Botond Dénes
434d32befe reader_permit: tidy up reader_permit::memory_units
This patch is a bag of fixes/cleanups that were omitted from the reader
memory tracking series due to contributor error. It contains the
following changes:
* Get rid of unused `increase()` and `decrease()` methods.
* Make all constructors and assignment operators `noexcept`.
* Make move assignment operator safe w.r.t. self assignment.
* `reset()`: consume the new amount before releasing the old amount,
  to prevent a transient window where new readers might be admitted.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200206143007.633069-1-bdenes@scylladb.com>
2020-02-06 16:35:07 +02:00
Botond Dénes
92fffe51d5 reader_concurrency_semaphore: tracking_file_impl: consume memory speculatively
Consume the memory before even submitting the I/O to the underlying
`file` object. This is in line with the underlying `file` object
allocating the buffer before it forwards the I/O request to the kernel.
This extends the "visibility" over the memory consumed by I/O greatly,
as it turns out buffers spend most time alive waiting for the I/O to
complete and are parsed shortly afterwards.
2020-01-28 08:13:16 +02:00
Botond Dénes
4bb3c7b1f0 reader_concurrency_semaphore: bye reader_resource_tracker
Replaced by `reader_permit`, of which it was a mere wrapper of in the
first place.
2020-01-28 08:13:16 +02:00
Botond Dénes
dea24ca859 reader_permit: expose make_tracked_temporary_buffer()
Previously `tracking_file_impl::make_tracked_buf()`. In the next patches
we plan on using this outside `tracking_file_impl`, so make it public
and templatize on the char type.
2020-01-28 08:13:16 +02:00
Botond Dénes
16cea36a94 reader_permit: introduce make_tracked_file()
Free function equivalent of `reader_resource_tracker::track_file()`,
using a `reader_permit` directly.
2020-01-28 08:13:16 +02:00
Botond Dénes
1859a03629 reader_permit: introduce memory_units
Similar to `seastar::semaphore_units`, this allows consuming and
releasing memory via an RAII object. In addition to that, it also allows
tracking changing values. This feature was designed to be used for
tracking the ever changing memory consumption of the buffers of
`flat_mutation_reader`:s.
This is now the only supported way of consuming memory from a permit.
2020-01-28 08:13:16 +02:00
Botond Dénes
c0f96db2d9 reader_concurrency_semaphore: mv reader_resources and reader_permit to reader_permit.hh
In the next patches we will replace `reader_resource_tracker` and have
code use the `reader_permit` directly. In subsequent patches, the
`reader_permit` will get even more usages as we attempt to make the
tracking of reader resource more accurate by tracking more parts of it.
So the grand plan is that the current `reader_concurrency_semaphore.hh`
is split into two headers:
* `reader_concurrency_semaphore.hh` - containing the semaphore proper.
* `reader_permit.hh` - a very lightweight header, to be used by
  components which only want to track various parts of the resource
  consumption of reads.
2020-01-28 08:13:16 +02:00
Botond Dénes
2005495857 reader_concurrency_semaphore: reader_permit: make it a value type
Currently `reader_permit` is passed around as
`lw_shared_ptr<reader_permit>`, which is clunky to write and use and is
also an unnecessary leak of details on how permit ownership is managed.
Make `reader_permit` a simple value type, making it a little bit easier
and safer to use.
In the next patches we will get rid of `reader_resource_tracker` and
instead have code use the permit instance directly, so this small
improvement in usability will go a long way towards preventing eye sore.
2020-01-28 08:13:16 +02:00
Botond Dénes
89c5fd0c25 reader_concurrency_semaphore::reader_permit: move methods out-of-line
In preparation for making the reader_permit a top-level class, and
moving it to another file. It is also good practice to define
non-performance critical methods out-of-line to reduce header bloat.
2020-01-28 08:13:16 +02:00
Juliusz Stasiewicz
d043393f52 db+semaphores+tests: mandatory `name' param in reader_concurrency_semaphore
Exception messages contain semaphore's name (provided in ctor).
This affects the queue overflow exception as well as timeout
exception. Also, custom throwing function in ctor was changed
to `prethrow_action', i.e. metrics can still be updated there but
now callers have no control over the type of the exception being
thrown. This affected `restricted_reader_max_queue_length' test.
`reader_concurrency_semaphore'-s docs are updated accordingly.
2019-12-03 15:41:34 +01:00
Botond Dénes
e56c26205f reader_concurrency_semaphore: add counters for inactive reads
Add counters that give insight into inactive read related events.
Two counters are added:
* permit_based_evictions
* population
2019-01-07 16:45:49 +02:00