36 Commits

Author SHA1 Message Date
Botond Dénes
72b8a2d147 querier: move common stuff into querier_base
The querier cache expects all querier objects it stores to have certain
methods. To avoid accessing these via `std::visit()` (the querier object
is stored in an `std::variant`), we move all the stuff that is common to
all querier types into a base class. The querier cache now accesses the
members via a reference to this common base. Additionally the variant is
eliminated completely and the cache entry stores an
`std::unique_ptr<querier_base>` instead.

Tests: unit(dev)

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200603152544.83704-1-bdenes@scylladb.com>
2020-06-03 18:45:33 +03:00
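A minimal sketch of the `querier_base` idea from 72b8a2d147, assuming a single common accessor. The class bodies, `schema_name()`, `cache_entry` and `entry_schema()` are hypothetical stand-ins for illustration, not the actual Scylla definitions:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical common base: holds what every querier type shares.
class querier_base {
    std::string _schema_name;
public:
    explicit querier_base(std::string schema_name) : _schema_name(std::move(schema_name)) {}
    virtual ~querier_base() = default;
    // Reachable through a base reference, no std::visit() needed.
    const std::string& schema_name() const { return _schema_name; }
};

class data_querier : public querier_base {
public:
    using querier_base::querier_base;
};

class mutation_querier : public querier_base {
public:
    using querier_base::querier_base;
};

// The cache entry owns any querier type through the common base.
struct cache_entry {
    std::unique_ptr<querier_base> value;
};

inline std::string entry_schema(const cache_entry& e) {
    return e.value->schema_name();
}
```

With a variant, each such access would need a visitor; here the cache code only depends on the base class interface.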
Botond Dénes
e678f06a5e querier_cache: get semaphore from querier
Currently the `querier_cache` is passed a semaphore during its
construction and it uses this semaphore to do all the inactive reader
registering/unregistering. This is inaccurate as in theory cached reads
could belong to different semaphores (although currently this is not yet
the case). As all queriers store a valid permit now, use this
permit to obtain the semaphore the querier is associated with, and
register the inactive read with this semaphore.
2020-05-28 11:34:35 +03:00
Botond Dénes
e778b072b1 read_command: use bool_class for is_first_page parameter
The constructor of `read_command` is used both by IDL and clients in the
code. However, this constructor has a parameter that is not used by IDL:
`read_timestamp`. This requires that this parameter is the very last in
the list and that new parameters that are used by IDL are added before
it. One such new parameter was `bool is_first_page`. Adding this
parameter right before the read timestamp one created a situation where
the last parameter (read_timestamp) implicitly converts to the one
before it (is_first_page). This means that some call sites passing
`read_timestamp` were now silently converting this to `is_first_page`,
effectively dropping the timestamp.

This patch aims to rectify this, while also avoiding similar accidents
in the future, by making `is_first_page` a `bool_class` which doesn't
have any implicit conversions defined. This change does not break the
ABI as `bool_class` is also sent as a `bool` on the wire.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Tests: unit(dev)
Message-Id: <20200422073657.87241-1-bdenes@scylladb.com>
2020-04-22 11:01:22 +03:00
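The failure mode fixed by e778b072b1 can be illustrated with a reduced tagged boolean. This is a sketch only: the `bool_class` below is a simplified stand-in, not seastar's implementation, and the `read_command` shown here is a hypothetical two-member reduction of the real class:

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Simplified stand-in for a tagged boolean with no implicit conversions.
template <typename Tag>
class bool_class {
    bool _value;
public:
    static const bool_class yes;
    static const bool_class no;
    constexpr explicit bool_class(bool v) noexcept : _value(v) {}
    explicit operator bool() const noexcept { return _value; }
};
template <typename Tag> const bool_class<Tag> bool_class<Tag>::yes{true};
template <typename Tag> const bool_class<Tag> bool_class<Tag>::no{false};

using is_first_page = bool_class<struct is_first_page_tag>;
using timestamp_type = std::int64_t;

// A timestamp can no longer silently land in the is_first_page slot.
static_assert(!std::is_convertible_v<timestamp_type, is_first_page>,
              "timestamps must not convert to is_first_page");

// Hypothetical, heavily reduced read_command.
struct read_command {
    is_first_page first_page;
    timestamp_type read_timestamp;
    explicit read_command(is_first_page fp = is_first_page::no, timestamp_type ts = 0)
        : first_page(fp), read_timestamp(ts) {}
};
```

With a plain `bool` parameter, `read_command(1234)` would compile and treat the timestamp as `is_first_page`; with the tagged type the same call is a compile error.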
Botond Dénes
091d80e8c3 flat_mutation_reader: expose reverse reader as a standalone reader
Currently reverse reads just pass a flag to
`flat_mutation_reader::consume()` to make the read happen in reverse.
This is deceptively simple and streamlined -- while in fact behind the
scenes a reversing reader is created to wrap the reader in question to
reverse partitions, one-by-one.

This patch makes this apparent by exposing the reversing reader via
`make_reversing_reader()`. It also allows for more configuration to be
passed to the reversing reader (in the next patches).

This change is forward compatible, as in time we plan to add reversing
support to the sstable layer, in which case the reversing reader will
go.
2020-02-27 18:11:54 +02:00
Botond Dénes
00b432b61d querier_cache: correctly account entries evicted on insertion in the population
Currently, the population stat is not increased for entries that are
evicted immediately on insert, however the code that does the eviction
still decreases the population stat, leading to an imbalance and in some
cases the underflow of the population stat. To fix, unconditionally
increase the population stat upon inserting an entry, regardless of
whether it is immediately evicted or not.

Fixes: #5123

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191001153215.82997-1-bdenes@scylladb.com>
2019-10-03 11:49:44 +03:00
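The accounting fix in 00b432b61d can be sketched as follows. The `stats` struct and both functions are hypothetical reductions; only the rule "always increment on insert, because eviction always decrements" is taken from the commit message:

```cpp
#include <cassert>
#include <cstdint>

struct stats {
    std::uint64_t population = 0;
    std::uint64_t evictions = 0;
};

// Eviction path: always decrements the population.
inline void evict_one(stats& s) {
    ++s.evictions;
    --s.population;
}

// Insert path after the fix: the entry is counted unconditionally,
// so an immediate eviction cannot underflow the unsigned counter.
inline void insert(stats& s, bool evicted_immediately) {
    ++s.population;
    if (evicted_immediately) {
        evict_one(s);
    }
}
```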
Botond Dénes
d57ab83bc8 querier_cache: add inserted stat
Recently we have seen a case where the population stat of the cache was
corrupt, either due to misaccounting or some more serious corruption.
When debugging something like that it would have been useful to know how
many items have been inserted to the cache. I also believe that such a
counter could be useful generally as well.

Refs: #4918

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190924083429.43038-1-bdenes@scylladb.com>
2019-09-24 10:52:49 +02:00
Botond Dénes
ab5d717052 reader_concurrency_semaphore::inactive_read_handle: fix handle semantics
That is:
* make it move only;
* make moved-from handles null handles;
* add (public) default constructor, which constructs a null handle;
2019-02-12 16:20:51 +02:00
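The handle semantics listed in ab5d717052 can be sketched like this. The integer `_id` representation (with 0 meaning null) is an assumption made for the example; the real handle may store something else entirely:

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>

class inactive_read_handle {
    std::uint64_t _id = 0;  // 0 means "null handle" in this sketch
public:
    // Public default constructor: constructs a null handle.
    inactive_read_handle() = default;
    explicit inactive_read_handle(std::uint64_t id) noexcept : _id(id) {}

    // Move only.
    inactive_read_handle(const inactive_read_handle&) = delete;
    inactive_read_handle& operator=(const inactive_read_handle&) = delete;

    // Moved-from handles become null handles.
    inactive_read_handle(inactive_read_handle&& o) noexcept
        : _id(std::exchange(o._id, 0)) {}
    inactive_read_handle& operator=(inactive_read_handle&& o) noexcept {
        if (this != &o) {
            _id = std::exchange(o._id, 0);
        }
        return *this;
    }

    explicit operator bool() const noexcept { return _id != 0; }
};

static_assert(!std::is_copy_constructible_v<inactive_read_handle>,
              "handles must be move only");
```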
Duarte Nunes
fa2b0384d2 Replace std::experimental types with C++17 std version.
Replace stdx::optional and stdx::string_view with the C++ std
counterparts.

Some instances of boost::variant were also replaced with std::variant,
namely those that called seastar::visit.

Scylla now requires GCC 8 to compile.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20190108111141.5369-1-duarte@scylladb.com>
2019-01-08 13:16:36 +02:00
Botond Dénes
021feef513 querier_cache: simplify memory eviction use-after-free fix, add tests
Simplify the fix for memory based eviction, introduced by 918d255 so
there is no need to massage the counters.

Also add a check to `test_memory_based_cache_eviction` which checks for
the bug fixed. While at it also add a check to
`test_time_based_cache_eviction` for the fix to time based eviction
(e5a0ea3).

Tests: tests/querier_cache:debug
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <c89e2788a88c2a701a2c39f377328e77ac01e3ef.1546515465.git.bdenes@scylladb.com>
2019-01-03 13:44:08 +02:00
Botond Dénes
e5a0ea390a querier_cache: unregister queriers evicted due to expired TTL
Currently queriers evicted due to their TTL expiring are not
unregistered from the `reader_concurrency_semaphore`. This can cause a
use-after-free when the semaphore tries to evict the same querier at
some later point in time, as the querier entry it has a pointer to is
now invalid.

Fix by unregistering the querier from the semaphore before destroying
the entry.

Refs: #4018
Refs: #4031

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <4adfd09f5af8a12d73c29d59407a789324cd3d01.1546504034.git.bdenes@scylladb.com>
2019-01-03 10:29:26 +02:00
Avi Kivity
918d255168 querier_cache: unregister querier from reader_concurrency_semaphore during eviction
In insert_querier(), we may evict older queriers to make room for the new one.
However, we forgot to unregister the evicted queriers from
reader_concurrency_semaphore. As a result, when reader_concurrency_semaphore
eventually wanted to evict something, it saw an inactive_read_handle that was
not connected to a querier_cache::entry, and crashed on use-after-free.

Fix by evicting through the inactive_read_handle associated with the querier
to be evicted. This removes traces of the querier from both
reader_concurrency_semaphore and querier_cache. We also have to massage the
statistics since querier_inactive_read::evict() updates different counters.

Fixes #4018.

Tests: unit(release)
Reviewed-by: Botond Denes <bdenes@scylladb.com>
Message-Id: <20190102175023.26093-1-avi@scylladb.com>
2019-01-03 09:15:07 +02:00
Botond Dénes
5780f2ce7a querier_cache: check that the query wasn't evicted during registering
The reader concurrency semaphore can evict the querier when it is
registered as an inactive read. Make the `querier_cache` aware of this
so that it doesn't continue to process the inserted querier when this
happens.
Also add a unit test for this.
2018-12-17 13:18:08 +02:00
Botond Dénes
77dbc7d09a querier: fix evict_one() and evict_all_for_table()
Both of these have the same problem. They remove the to-be-evicted
entries from `_entries` but they don't unregister the `entry` from the
`reader_concurrency_semaphore`. This results in the
`reader_concurrency_semaphore` being left with dangling pointers to the
entries, which will trigger a segfault when it tries to evict the
associated inactive reads.

Also add a unit test for `evict_all_for_table()` to check that it works
properly (`evict_one()` is only used in tests, so no dedicated test for
it).

Fixes: #3962

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <57001857e3791c6385721b624d33b667ccda2e7d.1544010868.git.bdenes@scylladb.com>
2018-12-05 21:51:01 +02:00
Botond Dénes
37f0117747 reader_concurrency_semaphore: refactor eviction mechanism
As we are about to add multiple sources of evictable readers, we need a
more scalable solution than a single functor being passed that opaquely
evicts a reader when called.
Add a generic way to register and unregister evictable (inactive)
readers to the semaphore. The readers are expected to be registered when
they become evictable and are expected to be unregistered when they
cease to be evictable. The semaphore might evict any reader that is
registered to it, when it sees fit.

This also solves the problem of notifying the semaphore when new readers
become evictable. Previously there was no such mechanism, and the
semaphore would only evict any such new readers when a new permit was
requested from it.
2018-12-04 08:51:00 +02:00
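A rough sketch of the register/unregister mechanism described in 37f0117747. The class name `semaphore_sketch`, the integer ids and the callback type are all assumptions for illustration and do not reflect the real `reader_concurrency_semaphore` interface:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

class semaphore_sketch {
    std::uint64_t _next_id = 1;
    // id -> callback invoked when the semaphore evicts the read
    std::map<std::uint64_t, std::function<void()>> _inactive_reads;
public:
    // Called when a reader becomes evictable.
    std::uint64_t register_inactive_read(std::function<void()> on_evict) {
        const auto id = _next_id++;
        _inactive_reads.emplace(id, std::move(on_evict));
        return id;
    }
    // Called when a reader ceases to be evictable.
    // Returns false if the read was already evicted in the meantime.
    bool unregister_inactive_read(std::uint64_t id) {
        return _inactive_reads.erase(id) == 1;
    }
    // The semaphore may evict any registered read when it sees fit.
    bool try_evict_one() {
        if (_inactive_reads.empty()) {
            return false;
        }
        auto it = _inactive_reads.begin();
        auto on_evict = std::move(it->second);
        _inactive_reads.erase(it);
        on_evict();
        return true;
    }
};
```

The `false` return from `unregister_inactive_read()` models the case where the owner must notice that its reader is already gone.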
Botond Dénes
ecb1e79bcc querier: add shard_mutation_querier
The querier to be used for saving shard readers belonging to a
multishard range scan. This querier doesn't provide a `consume_page`
method as it doesn't support reading from it directly. It is more
of a storage to allow caching the reader and any objects it depends on.
2018-09-03 10:31:44 +03:00
Botond Dénes
07cdf766c5 querier: prepare for multi-ranges
In the next patch a querier will be added that reads multiple ranges as
opposed to a single range that data and mutation queriers read.
To keep `querier_cache` code seamless regarding this difference, change all
range-matching logic to work in terms of `dht::partition_ranges_view`.
This allows for a cheap and seamless way of having a single code-base for
the insert/lookup logic. Code actually matching ranges is updated to be
able to handle both singular and multi-ranges while maintaining backward
compatibility.
2018-09-03 10:31:44 +03:00
Botond Dénes
c12008b8cb querier: split querier into separate data and mutation querier types
Instead of hiding what compaction method the querier uses (and only
exposing it via rejecting `can_be_used_for_page()`) make it very explicit
that these are really two different queriers. This allows using
different indexes for the two queriers in `querier_cache` and
eliminating the possibility of picking up a querier with the wrong
compaction method (read kind).
This also makes it possible to add new querier type(s) that suit the
multishard-query's needs without making a confusing mess of `querier` by
making it a union of all querying logic.

Splitting the queriers this way changes what happens when a lookup finds
a querier of the wrong kind (e.g. emit_only_live::yes for an
emit_only_live::no command). As opposed to dropping the found (but
wrong) querier the querier will now simply not be found by the lookup.
This is a result of using separate search indexes for the different
mutation kinds. This change should have no practical implications.

Splitting is done by making querier templated on `emit_only_live_rows`.
It doesn't make sense to duplicate the entire querier as the two share
99% of the code.
2018-09-03 10:31:44 +03:00
Botond Dénes
c53f17ddb8 querier: move all matching related logic into free functions
So that they can be used for multiple querier classes easily, without
inheritance. The functions are not visible from the header.
Also update the comments on `querier` w.r.t. the disappeared
checking functions. Change the language to be more general. In practice
these checks are never done by client code, instead they are done by the
`querier_cache`.
2018-09-03 10:31:44 +03:00
Botond Dénes
43f464c52d querier: inline querier::current_position() and make it public
2018-09-03 10:31:44 +03:00
Botond Dénes
86a61ded7d querier: s/position/position_view/
Also treat it as a view, that is take it by value in functions,
instead of reference.
2018-09-03 10:31:44 +03:00
Botond Dénes
6e4ec53679 querier: move position outside of querier
In preparation for having multiple querier types that can share code
without inheritance.
2018-09-03 10:31:44 +03:00
Botond Dénes
7bd955e993 querier_cache: move insert/lookup related logic into free functions
In preparation for introducing support for multiple entry types in the
querier_cache move all insert/lookup related logic into free functions.
Later these functions will be templated so they can handle multiple
entry types with the same code.
2018-09-03 10:31:44 +03:00
Botond Dénes
cded477b94 querier: return std::optional<querier> instead of using create_fun()
Requiring the caller of lookup() to pass in a `create_fun()` was not
such a good idea in hindsight. It leads to awkward call sites and even
more awkward code when trying to find out whether the lookup was
successful or not.
Returning an optional gives calling code much more flexibility and makes
the code cleaner.
2018-09-03 10:31:44 +03:00
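The interface change in cded477b94 might look roughly like the sketch below. The reduced `querier`, the integer key and the string range are hypothetical; only the idea of `lookup()` returning an optional is taken from the commit message:

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

struct querier {
    std::string range;  // stand-in for the real query range
};

class querier_cache {
    std::multimap<int, querier> _entries;  // several queriers may share a key
public:
    void insert(int key, querier q) {
        _entries.emplace(key, std::move(q));
    }
    // Returns the matching querier and removes it from the cache,
    // or std::nullopt when nothing matches.
    std::optional<querier> lookup(int key, const std::string& range) {
        auto [first, last] = _entries.equal_range(key);
        for (auto it = first; it != last; ++it) {
            if (it->second.range == range) {
                std::optional<querier> found(std::move(it->second));
                _entries.erase(it);
                return found;
            }
        }
        return std::nullopt;
    }
};
```

The caller can then test the result directly, for example `if (auto q = cache.lookup(key, range)) { ... } else { /* create a new querier */ }`.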
Botond Dénes
5f726e9a89 querier: move all to query namespace
To avoid name clashes.
2018-09-03 10:31:44 +03:00
Botond Dénes
2609a17a23 querier: find_querier(): return end() when no querier matches the range
When none of the queriers found for the lookup key match the lookup
range `_entries.end()` should be returned as the search failed. Instead
the iterator returned from the failed `std::find_if()` is returned
which, if the find failed, will be the end iterator returned by the
previous call to `_entries.equal_range()`. This is incorrect because as
long as `equal_range()`'s end iterator is not also `_entries.end()` the
search will always return an iterator to a querier regardless of whether
any of them actually matches the read range.
Fix by returning `_entries.end()` when it is detected that no queriers
match the range.

Fixes: #3530
2018-06-19 13:20:43 +03:00
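The bug fixed in 2609a17a23 can be reproduced with a standard container. This sketch assumes a `std::multimap` with integer keys and string ranges in place of the real `_entries` container; `find_entry()` is a hypothetical name:

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

using entries_t = std::multimap<int, std::string>;  // key -> range label

inline entries_t::const_iterator find_entry(const entries_t& entries, int key,
                                            const std::string& range) {
    const auto [first, last] = entries.equal_range(key);
    const auto it = std::find_if(first, last, [&range] (const entries_t::value_type& e) {
        return e.second == range;
    });
    // On failure find_if() returns `last`, which only ends this key's
    // sub-range and may point at a valid entry of another key.
    // Translate it to the container's end() to signal "not found".
    return it == last ? entries.end() : it;
}
```

Returning `it` unconditionally is the pre-fix behaviour: callers comparing against `entries.end()` would treat an unrelated entry as a hit.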
Botond Dénes
7ce7f3f0cc querier_cache: restructure entries storage
Currently querier_cache uses a `std::unordered_map<utils::UUID, querier>`
to store cache entries and an `std::list<meta_entry>` to store meta
information about the querier entries, like insertion order, expiry
time, etc.

All cache eviction algorithms use the meta-entry list to evict entries
in reverse insertion order (LRU order). To make this possible
meta-entries keep an iterator into the entry map so that given a
meta-entry one can easily erase the querier entry. This however poses a
problem as std::unordered_map can possibly invalidate all its iterators
when new items are inserted. This is a use-after-free waiting to happen.

Another disadvantage of the current solution is that it requires the
meta-entry to use a weak pointer to the querier entry so that in case
that is removed (as a result of a successful lookup) it doesn't try to
access it. This has an impact on all cache eviction algorithms as they
have to be prepared to deal with stale meta-entries. Stale meta-entries
also unnecessarily consume memory.

To solve these problems redesign how querier_cache stores entries
completely. Instead of storing the entries in an `std::unordered_map`
and storing the meta-entries in an `std::list`, store the entries in an
`std::list` and an intrusive-map (index) for lookups. This new design
has several advantages over the old one:
* The entries will now be in insert order, so eviction strategies can
  work on the entry list itself, no need to involve additional data
  structures for this.
* All data related to an entry is stored in one place, no data
  duplication.
* Removing an entry automatically removes it from the index as intrusive
  containers support auto unlink. This means there is no need to store
  iterators for long terms, risking use-after-free when the container
  invalidates its iterators.

Additional changes:
* Modify eviction strategies so that they work with the `entry`
  interface rather than the stored value directly.

Ref #3424
2018-06-19 13:20:40 +03:00
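A standard-library approximation of the layout introduced by 7ce7f3f0cc, for illustration only. The commit uses an intrusive map with auto-unlink; this sketch swaps in a `std::multimap` of list iterators, so the index has to be unlinked by hand. `entry_store` and its members are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <map>
#include <string>
#include <utility>

struct entry {
    int key;
    std::string value;
};

class entry_store {
    using entries_t = std::list<entry>;
    entries_t _entries;                              // kept in insertion order
    std::multimap<int, entries_t::iterator> _index;  // lookup index

    void erase(entries_t::iterator it) {
        // Manual unlink; an intrusive auto-unlink hook would do this implicitly.
        auto [first, last] = _index.equal_range(it->key);
        for (auto i = first; i != last; ++i) {
            if (i->second == it) {
                _index.erase(i);
                break;
            }
        }
        _entries.erase(it);
    }
public:
    void insert(int key, std::string value) {
        _entries.push_back(entry{key, std::move(value)});
        // std::list iterators stay valid across later insertions.
        _index.emplace(key, std::prev(_entries.end()));
    }
    // Eviction works on the entry list itself: the oldest entry is at the front.
    bool evict_oldest() {
        if (_entries.empty()) {
            return false;
        }
        erase(_entries.begin());
        return true;
    }
    bool contains(int key) const { return _index.count(key) != 0; }
    std::size_t size() const { return _entries.size(); }
};
```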
Gleb Natapov
04727acee9 Configure querier_cache size limit during object creation
2018-06-11 15:34:13 +03:00
Botond Dénes
3b6f4e4901 querier: check only the end bound of ranges when matching them
The querier provides a `matches(const nonwrapping_range&)` member to
allow for checking whether a range matches that with which the querier
was originally created. The check for a match is more lax than a strict
equality check as ranges are shrunk as the query progresses.
Because of this the above member only checked that one of the bounds of
the examined ranges matches. This is adequate for this purpose
because, in the context of a single query, it is guaranteed that no
two read requests to the same replica will have overlapping ranges.
However Avi pointed out in a recent, related review, that this check can
be made a little more strict by requiring that the end-bounds of the
two ranges *always* match, instead of allowing any of the bounds to
match.
2018-05-10 06:22:39 +03:00
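The stricter check from 3b6f4e4901 reduces to comparing end bounds. In this sketch the `range` struct with optional integer bounds is an assumed simplification of the real range type:

```cpp
#include <cassert>
#include <optional>

struct range {
    std::optional<int> start;  // std::nullopt means unbounded
    std::optional<int> end;
};

// The start bound advances as pages are consumed, so only the end bound
// is required to be identical between the original and the resumed range.
inline bool matches(const range& original, const range& next_page) {
    return original.end == next_page.end;
}
```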
Botond Dénes
6f7d919470 database: when dropping a table evict all relevant queriers
Queriers shouldn't outlive the table they read from as that could lead
to use-after-free problems when they are destroyed.

Fixes: #3414

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <3d7172cef79bb52b7097596e1d4ebba3a6ff757e.1525716986.git.bdenes@scylladb.com>
2018-05-07 21:20:25 +03:00
Botond Dénes
b2f75a6c53 Add counters to monitor querier-cache efficiency
Add the following counters:
(1) querier_cache_lookups
(2) querier_cache_misses
(3) querier_cache_drops
(4) querier_cache_time_based_evictions
(5) querier_cache_resource_based_evictions
(6) querier_cache_memory_based_evictions
(7) querier_cache_population

(1) counts the total number of querier cache lookups. Not all
page-fetches will result in a querier lookup. For example the first page
of a query will not do a lookup as there was no previous page to reuse
the querier from. The second, and all subsequent pages however should
attempt to reuse the querier from the previous page.
(2) counts the subset of (1) where the read has missed the querier
cache (failed to find a matching saved querier).
(3) counts the subset of (1) where the querier was recalled and dropped
immediately. This can happen for example if the querier was at the wrong
position.
(4) counts the cached queriers that were evicted due to their TTL
expiring.
(5) counts the cached queriers that were evicted due to reader-resource
(those limited by reader-concurrency limits) shortage.
(6) counts the cached queriers that were evicted due to reaching the
cache's memory limits (currently set to 4% of the shard's memory).
(7) is the current number of entries in the cache.

Note:
* The count of cache hits can be derived from these counters as
(1) - (2).
* cache_drop (3) also implies a cache hit (see above). This means that
the number of actually reused queriers is:
(1) - (2) - (3)
2018-03-13 10:34:34 +02:00
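The derived quantities from the note in b2f75a6c53 can be written out as code. The struct and function names are hypothetical; the formulas are the ones stated in the commit message:

```cpp
#include <cassert>
#include <cstdint>

struct querier_cache_counters {
    std::uint64_t lookups = 0;  // (1)
    std::uint64_t misses = 0;   // (2)
    std::uint64_t drops = 0;    // (3)
};

// Cache hits: (1) - (2).
inline std::uint64_t hits(const querier_cache_counters& c) {
    return c.lookups - c.misses;
}

// Queriers that were actually reused: (1) - (2) - (3).
inline std::uint64_t reused(const querier_cache_counters& c) {
    return c.lookups - c.misses - c.drops;
}
```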
Botond Dénes
8513549b55 Memory based cache eviction
To bound the memory consumption of the querier-cache the total memory
consumption of the cached queriers is limited to 4% of the shard's total
memory.
When inserting a new querier it is first checked whether its insertion
would cause the limit to be crossed. If this is the case existing
entries are evicted until the memory consumption is sufficiently reduced
so that after inserting the querier it stays below the limit.
Cached queriers are evicted in LRU order as the oldest queriers are the
most likely to be evicted based on their TTL anyway.
To calculate the memory consumption of the cached queriers
flat_mutation_reader::buffer_size() is used. While this is not very
precise as it doesn't include object sizes and member containers it
gives a good picture of the memory consumption of the queriers.

Memory based cache eviction overlaps with resource-based cache eviction
but only to some degree as that only accounts for the memory consumption
of sstable readers.
2018-03-13 10:34:34 +02:00
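The insertion rule from 8513549b55 can be sketched as below. `cached_querier`, `insert_with_memory_limit()` and the explicit `max_memory` parameter are assumptions for the example; the real code derives the usage from `flat_mutation_reader::buffer_size()` and the limit from the shard's memory:

```cpp
#include <cassert>
#include <cstddef>
#include <list>

struct cached_querier {
    std::size_t memory_usage;
};

// Evicts the oldest entries (front of the list) until the new querier
// fits under max_memory, then inserts it. Returns the number evicted.
inline std::size_t insert_with_memory_limit(std::list<cached_querier>& entries,
                                            cached_querier q,
                                            std::size_t max_memory) {
    std::size_t used = 0;
    for (const auto& e : entries) {
        used += e.memory_usage;
    }
    std::size_t evicted = 0;
    while (!entries.empty() && used + q.memory_usage > max_memory) {
        used -= entries.front().memory_usage;
        entries.pop_front();
        ++evicted;
    }
    entries.push_back(q);
    return evicted;
}
```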
Botond Dénes
212b2dabc4 Resource-based cache eviction
Readers serving user-reads need to obtain a permit to start reading.
There exists a restriction on how many active readers can be admitted
based on their count and their memory consumption.
Since the saved readers of cached queriers are technically active (they
hold a permit) they can block new readers from obtaining a permit.
New readers have a higher priority because a cached reader might be
abandoned or used later at best so in the face of memory pressure we
evict cached readers to free up permits for new readers.
Cached queriers are evicted in LRU order as the oldest queriers are the
most likely to be evicted based on their TTL anyway.
2018-03-13 10:34:34 +02:00
Botond Dénes
d5bcadcfda Time-based cache eviction
Cached queriers should not sit in the cache indefinitely otherwise
abandoned reads would cause excess and unnecessary resource-usage. Attach
an expiry timer to each cache-entry which evicts it after the TTL
passes.
2018-03-13 10:34:34 +02:00
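A simplified view of the TTL expiry from d5bcadcfda. The commit attaches a timer to each entry; this sketch swaps that for an explicit sweep with a caller-supplied `now`, and assumes all entries share one TTL so that the oldest entry expires first. All names are hypothetical:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <list>

using clock_type = std::chrono::steady_clock;

struct timed_entry {
    int id;
    clock_type::time_point expires;  // insertion time + TTL
};

// Entries are in insertion order, so expired ones sit at the front.
// Returns the number of entries evicted.
inline std::size_t evict_expired(std::list<timed_entry>& entries,
                                 clock_type::time_point now) {
    std::size_t evicted = 0;
    while (!entries.empty() && entries.front().expires <= now) {
        entries.pop_front();
        ++evicted;
    }
    return evicted;
}
```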
Botond Dénes
cab38c9f81 Add the querier_cache_context helper
querier_cache_context is supposed to make propagating the cache and the
key down the layers easier. It comes bundled with some of the required
parameters (the lookup and save state) and also hides all of the
boiler-plate of dealing with the cache (checking whether the key is
non-empty, etc.). It also makes it possible to not use the cache and
hide this from the lower layers.
2018-03-13 10:34:34 +02:00
Botond Dénes
bbfe17437e Add querier_cache
This is the cache where suspended queriers are going to be saved between
pages. This is not a general purpose cache. It caters to the specific
needs of the querier recall mechanism. More specifically:
(1) Cache entries are single-use: they are inserted once and the first
lookup removes them. Multiple items may be stored under a single key.
Identifying the correct one happens based on additional information like
the query range. Lookup knows to drop queriers when they cannot be used
to serve the next page.
(2) Cache entries are evicted after a certain time to avoid the
depletion of resources due to abandoned reads.
(3) Cache entries are evicted when facing reader-permit shortage, until
either enough permits are freed up or all entries are evicted.
(4) A memory limiter is set up which keeps the total memory consumption
of the cache under a limit (4% of memory) by evicting the oldest entries
when inserting a new one would cause the total memory consumption to go
above the limit.
(5) It updates the relevant counters of the db_stats.

This patch only implements (1), the other features will be implemented
in their own patches.
2018-03-13 10:34:34 +02:00
Botond Dénes
7a5143a670 Add querier
The querier encapsulates all objects needed to serve queries, except
result builders. It is designed to be suspendable, savable and
resumable. It contains all logic needed to suspend, resume and determine
whether the querier can be resumed or not.
It is the foundation upon which the "reader-reuse" mechanism is built.
2018-03-13 10:34:34 +02:00