Commit Graph

57 Commits

Author SHA1 Message Date
Botond Dénes
256140a033 mutation_fragment: memory_usage(): remove unused schema parameter
The memory usage is now maintained and updated on each change to the
mutation fragment, so it needs not be recalculated on a call to
`memory_usage()`, hence the schema parameter is unused and can be
removed.
2020-09-28 11:27:47 +03:00
Botond Dénes
6ca0464af5 mutation_fragment: add schema and permit
We want to start tracking the memory consumption of mutation fragments.
For this we need schema and permit during construction, and on each
modification, so the memory consumption can be recalculated and pass to
the permit.

In this patch we just add the new parameters and go through the insane
churn of updating all call sites. They will be used in the next patch.
2020-09-28 11:27:23 +03:00
Botond Dénes
0518571e56 flat_mutation_reader: make _buffer a tracked buffer
Via a tracked_allocator. Although the memory allocations made by the
_buffer shouldn't dominate the memory consumption of the read itself,
they can still be a significant portion that scales with the number of
readers in the read.
2020-09-28 10:53:56 +03:00
Piotr Sarna
71bb277cbc multishard_mutation_query: fix a typo in variable name
s/allwoed/allowed
Message-Id: <eedb62b1f13ebf4ab1e6e92642a77fab32379d73.1596799589.git.sarna@scylladb.com>
2020-08-09 12:52:40 +03:00
Wojciech Mitros
45215746fe increase the maximum size of query results to 2^64
Currently, we cannot select more than 2^32 rows from a table because we are limited by types of
variables containing the numbers of rows. This patch changes these types and sets new limits.

The new limits take effect while selecting all rows from a table - custom limits of rows in a result
stay the same (2^32-1).

In classes which are being serialized and used in messaging, in order to be able to process queries
originating from older nodes, the top 32 bits of new integers are optional and stay at the end
of the class - if they're absent we assume they equal 0.

The backward compatibility was tested by querying an older node for a paged selection, using the
received paging_state with the same select statement on an upgraded node, and comparing the returned
rows with the result generated for the same query by the older node, additionally checking if the
paging_state returned by the upgraded node contained new fields with correct values. Also verified
if the older node simply ignores the top 32 bits of the remaining rows number when handling a query
with a paging_state originating from an upgraded node by generating and sending such a query to
an older node and checking the paging_state in the reply(using python driver).

Fixes #5101.
2020-08-03 17:32:49 +02:00
Botond Dénes
f7a4d19fb1 mutation_partition: abort read when hard limit is exceeded for non-paged reads
If the read is not paged (short read is not allowed) abort the query if
the hard memory limit is reached. On reaching the soft memory limit a
warning is logged. This should allow users to adjust their application
code while at the same time protecting the database from the really bad
queries.
The enforcement happens inside the memory accounter and doesn't require
cooperation from the result builders. This ensures memory limit set for
the query is respected for all kind of reads. Previously non-paged reads
simply ignored the memory accounter requesting the read to stop and
consumed all the memory they wanted.
2020-07-29 08:32:31 +03:00
Botond Dénes
9eab5bca27 query_*(): use the coordinator specified memory limit for unlimited queries
It is important that all replicas participating in a read use the same
memory limits to avoid artificial differences due to different amount of
results. The coordinator now passes down its own memory limit for reads,
in the form of max_result_size (or max_size). For unpaged or reverse
queries this has to be used now instead of the locally set
max_memory_unlimited_query configuration item.

To avoid the replicas accidentally using the local limit contained in
the `query_class_config` returned from
`database::make_query_class_config()`, we refactor the latter into
`database::get_reader_concurrency_semaphore()`. Most of its callers were
only interested in the semaphore only anyway and those that were
interested in the limit as well should get it from the coordinator
instead, so this refactoring is a win-win.
2020-07-28 18:00:29 +03:00
Botond Dénes
159d37053d storage_proxy: use read_command::max_result_size to pass max result size around
Use the recently added `max_result_size` field of `query::read_command`
to pass the max result size around, including passing it to remote
nodes. This means that the max result size will be sent along each read,
instead of once per connection.
As we want to select the appropriate `max_result_size` based on the type
of the query as well as based on the query class (user or internal) the
previous method won't do anymore. If the remote doesn't fill this
field, the old per-connection value is used.
2020-07-28 18:00:29 +03:00
Botond Dénes
fbbbc3e05c query: result_memory_limiter: use the new max_result_size type 2020-07-28 18:00:29 +03:00
Botond Dénes
eeeef0a0f1 multishard_mutation_query: use cached semaphore
Instead of requesting the query class config from the database every
time the semaphore is needed, use the cached one by calling
`semaphore()`.
2020-07-27 12:17:22 +03:00
Botond Dénes
b7cfa4ea97 multishard_mutation_query: validate the semaphore of the looked-up reader
To make sure it belongs to the same semaphore that the database thinks
is appropriate for the current query. Since a semaphore mismatch points
to a serious bug, we use `on_internal_error()` to allow generating
coredumps on-demand.
2020-07-23 16:43:37 +03:00
Botond Dénes
63309f925c mutation_reader: reader_lifecycle_policy: make semaphore() available early
Currently all reader lifecycle policy implementations assume that
`semaphore()` will only be called after at least one call to
`make_reader()`. This assumption will soon not hold, so make sure
`semaphore()` can be called at any time, including before any calls are
made to `make_reader()`.
2020-06-23 10:01:38 +03:00
Botond Dénes
3cd2598ab3 reader_permit: forbid empty permits
Remove `no_reader_permit()` and all ways to create empty (invalid)
permits. All permits are guaranteed to be valid now and are only
obtainable from a semaphore.

`reader_permit::semaphore()` now returns a reference, as it is
guaranteed to always have a valid semaphore reference.
2020-05-28 11:34:35 +03:00
Botond Dénes
e4c591aa67 database: introduce make_query_class_config()
And use it to obtain any query-class specific configuration that was
obtained from `table::config` before, such as the read concurrency
semaphore and the max memory limit for unlimited queries. As all users
of these items get these from the query class config now, we can remove
them from `table::config`.
2020-05-28 11:34:35 +03:00
Botond Dénes
d5ebd763ff multishard_mutation_query: pass a valid permit to shard mutation sources
In preparation of a valid permit being required to be passed to all
mutation sources, create a permit before creating the shard readers and
pass it to the mutation source when doing so. The permit is also
persisted in the `shard_mutation_querier` object when saving the reader,
which is another forward looking change, to allow the querier-cache to
use it to obtain the semaphore the read is actually registered with.
2020-05-28 11:34:35 +03:00
Botond Dénes
0b4ec62332 flat_mutation_reader: flat_multi_range_reader: add reader_permit parameter
Mutation sources will soon require a valid permit so make sure we have
one and pass it to the mutation sources when creating the underlying
readers.
For now, pass no_reader_permit() on call sites, deferring the obtaining
of a valid permit to later patches.
2020-05-28 11:34:35 +03:00
Piotr Jastrzebski
e72696a8e6 sharding_info: rename the class to sharder
Also rename all variables that were named si or sinfo
to sharder.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-30 18:42:33 +02:00
Piotr Jastrzebski
d8ac8fd6e8 multishard_mutation_query: use sharding_info::shard_of
This patch replaces all the uses of i_partitioner:shard_of
with sharding_info::shard_of in read_context.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-30 18:42:33 +02:00
Piotr Jastrzebski
031f589dba multishard_combining_reader: use token_for_next_shard from sharding info not partitioner
Previously this function was accessing sharding logic
through partitioner obtained from the schema.

While converting tests, dummy_partitioner is turned into
dummy_sharding_info.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-30 18:42:25 +02:00
Rafael Ávila de Espíndola
c5795e8199 everywhere: Replace engine().cpu_id() with this_shard_id()
This is a bit simpler and might allow removing a few includes of
reactor.hh.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200326194656.74041-1-espindola@scylladb.com>
2020-03-27 11:40:03 +03:00
Piotr Jastrzebski
924ed7bb1c make_multishard_combining_reader: stop taking partitioner
The function already takes schema so there's no need
for it to take partitioner. It can be obtained using
schema::get_partitioner

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-15 10:25:20 +01:00
Botond Dénes
7bdeec4b00 flat_mutation_reader: make_reversing_reader(): add memory limit
If the reversing requires more memory than the limit, the read is
aborted. All users are updated to get a meaningful limit, from the
respective table object, with the exception of tests of course.
2020-02-27 18:11:54 +02:00
Piotr Jastrzebski
0677bafd16 multishard_mutation_query: stop calling global_partitioner()
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-02-17 10:59:15 +01:00
Botond Dénes
dfc8b2fc45 treewide: replace reader_resource_tracer with reader_permit
The former was never really more than a reader_permit with one
additional method. Currently using it doesn't even save one from any
includes. Now that readers will be using reader_permit we would have to
pass down both to mutation_source. Instead get rid of
reader_resource_tracker and just use reader_permit. Instead of making it
a last and optional parameter that is easy to ignore, make it a
first class parameter, right after schema, to signify that permits are
now a prominent part of the reader API.

This -- mostly mechanical -- patch essentially refactors mutation_source
to ask for the reader_permit instead of reader_resource_tracking and
updates all usage sites.
2020-01-28 08:13:16 +02:00
Avi Kivity
ba64ec78cf messaging_service: use rpc::tuple instead of variadic futures for rpc
Since variadic future<> is deprecated, switch to rpc::tuple for multiple
return values in rpc calls. This is more or less mechanical translation.
2019-09-26 12:09:31 +02:00
Botond Dénes
fddd9a88dd treewide: silence discarded future warnings for legit discards
This patch silences those future discard warnings where it is clear that
discarding the future was actually the intent of the original author,
*and* they did the necessary precautions (handling errors). The patch
also adds some trivial error handling (logging the error) in some
places, which were lacking this, but otherwise look ok. No functional
changes.
2019-08-26 18:54:44 +03:00
Botond Dénes
18581cfb76 multishard_mutation_query: create_readr(): use the caller's priority class
The priority class the shard reader was created with was hardcoded to be
`service::get_local_sstable_query_read_priority()`. At the time this
code was written, priority classes could not be passed to other shards,
so this method, receiving its priority class parameters from another
shard, could not use it. This is now fixed, so we can just use whatever
the caller wants us to use.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190823115111.68711-1-bdenes@scylladb.com>
2019-08-23 16:10:43 +02:00
Botond Dénes
a41e8f0bcf query::consume_page: move away from variadic future
Require the `consumer` to return 0 or 1 value in its future. Update all
downstream code.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190731140440.57295-1-bdenes@scylladb.com>
2019-07-31 18:49:47 +03:00
Botond Dénes
db106a32c8 query_mutations_on_all_shards(): make states light-weight
Previously the different states a reader can be in were all separate
structs, and were joined together by a variant. When this was designed
this made sense as states were numerous and quite different. By this
point however the number of states has been reduced to 4, with 3 of them
being almost the same. Thus it makes sense to merge these states into
single struct and keep track of the current state with an enum field.
This can theoretically increase the chances of mistakes, but in practice
I expect the opposite, due to the simpler (and less) code. Also, all the
important checks that verify that a reader is in the state expected by
the code are all left in place.
A byproduct of this change is that the amount of cross-shard writes is
greatly reduced. Whereas previously the whole state object had to be
rewritten on state change, now a single enum value has to be updated.
Cross shard reads are reduced as well to the read of a few foreign
pointers, all state-related data is now kept on the shard where the
associated reader lives.
2019-02-12 16:20:51 +02:00
Botond Dénes
65b2eb0939 query_mutations_on_all_shards(): get rid of read_context::paused_reader 2019-02-12 16:20:51 +02:00
Botond Dénes
ec44a4dbb1 query_mutations_on_all_shards(): merge the dismantling and ready_to_save states into saving state
These two states are now the same, with the artificial distinction that
all readers are promoted to readey_to_save state after the compaction
state and the combined buffer is dismantled. From a practical
perspective this distinction is meaningless so merge the two states into
a single `saving` state.
2019-02-12 16:20:51 +02:00
Botond Dénes
9a1bd24d82 query_mutations_on_all_shards(): pause looked-up readers
On the beginning of each page, all saved readers from the previous pages
(if any) are looked up, so they can be reused. Some of these saved
readers can end up not being used at all for the current page, in which
case they will needlessly sit on their permit for the duration of
filling the page. Avoid this by immediately pausing all looked-up
readers. This also allows a nice unifying of the reader saving logic, as
now *all* readers will be in a paused state when `save_reader()` is
called. Previously, looked-up, but not used readers were an exception to
this, requiring extra logic to handle both cases. This logic can now be
removed.
2019-02-12 16:20:51 +02:00
Botond Dénes
61b9ed7faf query_mutation_on_all_shards(): remove unecessary indirection 2019-02-12 16:20:51 +02:00
Botond Dénes
9000626647 shard_reader: auto pause readers after being used
Previously it was the responsibility of the layer above (multishard
combining reader) to pause readers, which happened via an explicit
`pause()` call. This proved to be a very bad design as we kept finding
spots where the multishard reader should have paused the reader to avoid
potential deadlocks (due to starved reader concurrency semaphores), but
didn't.

This commit moves the responsibility of pausing the reader into the
shard reader. The reader is now kept in a paused state, except when it
is actually used (a `fill_buffer()` or `fast_forward_to()` call is
executing). This is fully transparent to the layer above.
As a side note, the shard reader now also hides when the reader is
created. This also used to be the responsibility of the multishard
reader, and although it caused no problems so far, it can be considered
a leak of internal details. The shard reader now automatically creates
the remote reader on the first time it is attempted to be used.

The code has been reorganized, such that there is now a clear separation
of responsibilities. The multishard combining reader handles the
combining of the output of the shard readers, as well as issuing
read-aheads. The shard reader handles read-ahead and creating the
remote reader when needed, as well as transferring the results of remote
reads to the "home" shard. The remote reader
(`shard_reader::remote_reader`, new in this patch) handles
pausing-resuming as well as recreating the reader after it was evicted.
Layers don't access each other's internals (like they used to).

After this commit, the reader passed to `destroy_reader()` will always
be in paused state.
2019-02-12 16:20:51 +02:00
Botond Dénes
37006135dc shard_reader: make reader creation sync
Reader creation happens through the `reader_lifecycle_policy` interface,
which offers a `create_reader()` method. This method accepts a shard
parameter (among others) and returns a future. Its implementation is
expected to go to the specified shard and then return with the created
reader. The method is expected to be called from the shard where the
shard reader (and consequently the multishard reader) lives. This API,
while reasonable enough, has a serious flaw. It doesn't make batching
possible. For example, if the shard reader issues a call to the remote
shard to fill the remote reader's buffer, but finds that it was evicted
while paused, it has to come back to the local shard just to issue the
recreate call. This makes the code both convoluted and slow.
Change the reader creation API to be synchronous, that is, callable from
the shard where the reader has to be created, allowing for simple call
sites and batching.
This change requires that implementations of the lifecycle policy update
any per-reader data-structure they have from the remote shard. This is
not a problem however, as these data-structures are usually partitioned,
such that they can be accessed safely from a remote shard.
Another, very pleasant, consequence of this change is that now all
methods of the lifecycle interface are sync and thus calls to them
cannot overlap anymore.

This patch also removes the
`test_multishard_combining_reader_destroyed_with_pending_create_reader`
unit test, which is not useful anymore.

For now just emulate the old interface inside shard reader. We will
overhaul the shard reader after some further changes to minimize
noise.
2019-02-12 16:20:51 +02:00
Botond Dénes
57d1f6589c shard_reader: use semaphore directly to pause-resume
The shard reader relies on the `reader_lifecycle_policy` for pausing and
resuming the remote reader. The lifecycle policy's API was designed to
be as general as possible, allowing for any implementation of
pause/resume. However, in practice, we have a single implementation of
pause/resume: registering/unregistering the reader with the relevant
`reader_concurrency_semaphore`, and we don't expect any new
implementations to appear in the future.
Thus, the generic API of the lifecycle policy, is needlessly abstract
making its implementations needlessly complex. We can instead make this
very concrete and have the lifecycle policy just return the relevant
semaphore, removing the need for every implementor of the lifecycle
policy interface to have a duplicate implementation of the very same
logic.

For now just emulate the old interface inside shard reader. We will
overhaul the shard reader after some further changes to minimize noise.
2019-02-12 16:20:51 +02:00
Botond Dénes
4537ec7426 mutlishard_mutation_query(): use correct reader concurrency semaphore
The multishard mutation query used the semaphore obtained from
`database::user_read_concurrency_sem()` to pause-resume shard readers.
This presented a problem when `multishard_mutation_query()` was reading
from system tables. In this case the readers themselves would obtain
their permits from the system read concurrency semaphore. Since the
pausing of shard readers used the user read semaphore, pausing failed to
fulfill its objective of alleviating pressure on the semaphore the reads
obtained their permits from. In some cases this lead to a deadlock
during system reads.
To ensure the correct semaphore is used for pausing-resuming readers,
obtain the semaphore from the `table` object. To avoid looking up the
table on every pause or resume call, cache the semaphores when readers
are created.

Fixes: #4096

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <c784a3cd525ce29642d7216fbe92638fa7884e88.1547729119.git.bdenes@scylladb.com>
2019-01-17 15:19:59 +02:00
Botond Dénes
b4c3aab4a7 multishard_mutation_query: reset failed readers to inexistent state
When attempting to dismantling readers, some of the to-be-dismantled
readers might be in a failed state. The code waiting on the reader to
stop is expecting failures, however it didn't do anything besides
logging the failure and bumping a counter. Code in the lower layers did
not know how to deal with a failed reader and would trigger
`std::bad_variant_access` when trying to process (save or cleanup) it.
To prevent this, reset the state of failed readers to `inexistent_state`
so code in the lower layers doesn't attempt to further process them.
2018-12-17 13:18:08 +02:00
Botond Dénes
9cef043841 multishard_mutation_query: handle missing readers when dismantling
When dismantling the combined buffer and the compaction state we are no
longer guaranteed to have the reader each partition originated from. The
reader might have been evicted and not resumed, or resuming it might
have failed. In any case we can no longer assume the originating reader
of each partition will be present. If a reader no longer exists,
discard the partitions that it emitted.
2018-12-17 13:18:08 +02:00
Botond Dénes
438bef333b multishard_mutation_query: add support for keeping stats for discarded partitions
In the next patches we will add code that will have to discard some of
the dismantled partitions/fragments/bytes. Prepare the
`dismantle_buffer_stats` struct for being able to track the discarded
partitions/fragments/bytes in addition to those that were successfully
dismantled.
2018-12-17 13:18:08 +02:00
Botond Dénes
ce52436af4 multishard_mutation_query: expect evicted reader state when creating reader
Previously readers were created once, so `make_remote_reader()` had a
validation to ensure readers were not attempted at being created more
than once. This validation was done by checking that the reader-state is
either `inexistent` or `successful_lookup`. However with the
introduction of pausing shard readers, it is now possible that a reader
will have to be created and then re-created several times, however this
validation was not updated to expect this.
Update the validation so it also expects the reader-state to be
`evicted`, the state the reader will be if it was evicted while paused.
2018-12-17 13:18:08 +02:00
Botond Dénes
1effb1995b multishard_mutation_query: pretty-print the reader state in log messages 2018-12-17 13:18:08 +02:00
Botond Dénes
1865e5da41 treewide: remove include database.hh from headers where possible
Many headers don't really need to include database.hh, the include can
be replaced by forward declarations and/or including the actually needed
headers directly. Some headers don't need this include at all.

Each header was verified to be compilable on its own after the change,
by including it into an empty `.cc` file and compiling it. `.cc` files
that used to get `database.hh` through headers that no longer include it
were changed to include it themselves.
2018-12-14 08:03:57 +02:00
Botond Dénes
f334d3717f query_mutations_on_all_shards(): implement pause-resume API 2018-12-04 08:51:05 +02:00
Botond Dénes
5f67a065c6 reader_lifecycle_policy: extend with a pause-resume API
This API provides a way for the mulishard reader to pause inactive shard
readers and later resume them when they are needed again. This allows
for these paused shard readers to be evicted when the node is under
pressure.
How the readers are made evictable while paused is up to the clients.

Using this API in the `multishard_combining_reader` and implementing it
in the clients will be done in the next patches.

Provide default implementation for the new virtual methods to facilitate
gradual adoption.
2018-12-04 08:51:05 +02:00
Botond Dénes
6f0e0c4ed7 query_mutations_on_all_shards(): restore indentation
The previous patch added half-aligned lines to improve readability of
that patch.
2018-12-04 08:51:05 +02:00
Botond Dénes
aa6083a75b query_mutations_on_all_shards(): simplify the state-machine
The `read_context` which handles creating, saving and looking-up the
shard readers has to deal with its `destroy_reader()` method called any
time, even before some other method finished its work. For example it is
valid for a reader to be requested to be destroyed, even before the
contexts finishes creating it.
This means that state transitions that take time can be interleaved with
another state transition request. To deal with this the read context
uses `future_` states, states that mark an ongoing state transitions.
This allows for state transition request that arrive in the middle of
another state transition to be attached as a continuation to the ongoing
transition, and to be executed after that finishes. This however
resulted in complex code, that has to handle readers being in all sorts
of different states, when the `save_readers()` method is called.
To avoid all this complexity, exploit the fact that `destroy_reader()`
receives a future<> as its argument, which resolves when all previous
state transitions have finished. Use a gate to wait on all these futures
to resolve. This way we don't need all those transitional states,
instead in `save_readers()` we only need to wait on the gate to close.
Thus the number of states `save_readers()` has to consider drops
drastically.

This has the theoretical drawback of the process of saving the readers
having to wait on each of the readers to stop, but in practice the
process finishes when the last reader is saved anyway, so I don't expect
this to result in any slowdown.
2018-12-04 08:51:05 +02:00
Botond Dénes
007619de4c multishard_combining_reader: use the reader lifecycle policy
Refactor the multishard combining reader and its clients to use the
reader lifecycle policy introduced in the previous patch.
2018-12-04 08:51:05 +02:00
Botond Dénes
5a4fd1abab multishard_combining_reader: drop support for streamed_mutation fast-forwarding
It doesn't make sense for the multishard reader anyway, as it's only
used by the row-cache. We are about to introduce the pausing of inactive
shard readers, and it would require complex data structures and code
to maintain support for this feature that is not even used. So drop it.
2018-12-04 08:51:05 +02:00
Avi Kivity
bceff1550c tests: fix bad format string syntax
Some sprint() calls use the fmt language instead of the printf syntax. Convert
them all the way to format().
2018-11-01 13:16:17 +00:00