Since fce124bd90 ("Merge "Introduce flat_mutation_reader_v2" from
Tomasz") tests involving mutation_reader are a lot slower due to
the new API testing. On slower machines it's enough to time out.
Work underway to improve the situation, and it will also revert back
to the original timing once the flat_mutation_reader_v2 work is done,
but meanwhile, increase the timeout.
Closes#9046
This patch applies the same changes to both kl and mx sstable readers, but because the kl reader is old, we'll focus on the newer one.
This patch makes the main sstable reader process a coroutine,
allowing to simplify it, by:
- using the state saved in the coroutine instead of most of the states saved in the _state variable
- removing the switch statement and moving the code of former switch cases, resulting in reduced number of jumps in code
- removing repetitive ifs for read statuses, by adding them to the coroutine implementation
The coroutine is saved in a new class ```processing_result_generator```, which works like a generator: using its ```generate()``` method, one can order the coroutine to continue until it yields a data_consumer::processing_result value, which was achieved previously by calling the function that is now the coroutine(```do_process_state()```).
Before the patch, the main processing method had 558 lines. The patch reduces this number to 345 lines.
However, usage of c++ coroutines has a non-negligible effect on the performance of the sstable reader.
In the test cases from ```perf_fast_forward``` the new sstable reader performs up to 2% more instructions (per fragment) than the former implementation, and this loss is achieved for cases where we're reading many subsequent rows, without any skips.
Thanks to finding an optimization during the development of the patch, the loss is mitigated when we do skip rows, and for some cases, we can even observe an improvement.
You can see the full results in attached files: [old_results.txt](https://github.com/scylladb/scylla/files/6793139/old_results.txt), [new_results.txt](https://github.com/scylladb/scylla/files/6793140/new_results.txt)
Test: unit(dev)
Refs: #7952Closes#9002
* github.com:scylladb/scylla:
mx sstable reader: reduce code blocks
mx sstable reader: make ifs consistent
sstable readers: make awaiter for read status
mx sstable reader: don't yield if the data buffer is not empty
mx sstable reader: combine FLAGS and FLAGS_2 states
mx sstable reader: reduce placeholder state usage
mx sstable reader: replace non_consuming states with a bool
mx sstable reader: reduce placeholder state usage
mx sstable reader: replace unnecessary states with a placeholder
mx sstable reader: remove false if case
mx sstable reader: remove row_body_missing_columns_label
mx sstable reader: remove row_body_deletion_label
mx sstable reader: remove column_end_label
mx sstable reader: remove column_cell_path_label
mx sstable reader: remove column_ttl_label
mx sstable reader: remove column_deletion_time_label
mx sstable reader: remove complex_column_2_label
mx sstable reader: remove row_body_missing_columns_read_columns_label
mx sstable reader: remove row_body_marker_label
mx sstable reader: remove row_body_shadowable_deletion_label
mx sstable reader: remove row_body_prev_size_label
mx sstable reader: remove ck_block_label
mx sstable reader: remove ck_block2_label
mx sstable reader: remove clustering_row_label and complex_column_label
mx sstable reader: remove labels with only one goto
mx sstable reader: replace the switch cases with gotos and a new label
mx sstable reader: remove states only reached consecutively or from goto
mx sstable reader: remove switch breaks for consecutive states
mx sstable reader: convert readers main method into a coroutine
kl sstable reader: replace states for ending with one state, simplify non_consuming
kl sstable reader: remove unnecessary states
kl sstable reader: remove unnecessary yield
kl sstable reader: remove unnecessary blocks
kl sstable reader: fix indentation
kl sstable reader: replace switch with standard flow control
kl sstable reader: remove state::CELL case
kl sstable reader: move states code only reachable from one place
kl sstable reader: remove states only reached consecutively
kl sstable reader: remove switch breaks for consecutive states
kl sstable reader: remove unreachable case
kl sstable reader: move testing hack for fragmented buffers outside the coroutine
kl sstable reader: convert readers main method into a coroutine
sstable readers: create a generator class for coroutines
Some blocks of code were surrounded by curly braces, because
a variable was declared inside a switch case. After changes,
some of the variable declarations are in if/else/while cases,
and no longer need to be in separate code blocks, while other
blocks can be extended to entire labels for simplicity.
In several places we're checking the return value of our
consumers' consume_* calls. Because the behaviour in all cases
is the same, let us use the same notation as well.
After each read* call of the primitive_consumer we need to check
if the entire primitive was in our current buffer. We can check it
in the proceed_generator object by yielding the returned read status:
if the yielded status is ready, the yield_value method returns
a structure whose await_ready() method returns true. Otherwise it
returns false.
The returned structure is co_awaited by the coroutine (due to co_yield),
and if await_ready() returns true, the coroutine isn't stopped,
conversely, if it returns false, (technical: and because its await_suspend
methods returns void) the coroutine stops, and a proceed::yes value
is saved, indicating that we need more buffers.
The skip() method returns a skip_bytes object if we want to
skip the entire buffer, otherwise it returns a proceed::yes
and trims the buffer.
If the buffer is only trimmed we don't need to interrupt
the coroutine, we simply continue instead.
The non_consuming() method is only used after assuring that
primitive_consumer::active() (in continuous_data_consumer::process())
so we don't need states where primitive_consumer::active(), which
is most of them.
We still need to make sure that the states change when they need to,
so we replace all the concerned states with the placeholder state,
and for the few states from the non_consuming() OR, where the
primitive_consumer::active() returns true, we set the value of
_consuming to false, changing it back when the state is no longer
non_consuming.
We can remove state assignments that we know are
changing a state to itself.
Similarily, if a state is changed in the same way
in an if and an else, it can be changed before the
if/else instead.
After removing the switch, the state is only used for
verify_end_state() and non_consuming(), so we can
replace states that are not used there with a single
one, so that the state still stops being one of the
appearing states when it needs to.
consume_row_marker_and_tombstone does not return proceed::no in the
mp_row_consumer_m implementation, and even if it did, we would most
likely want to yield proceed::no in that case as well.
column_cell_path_label is only reached from two goto, both
at the end of an if/else block, or consecutively, so the code
after the if/else block can be ommited by an if instead (or else).
column_ttl_label is only reached from two goto, both
at the end of an if/else block, or consecutively, so the code
after the if/else block can be ommited by an if instead (or else).
row_body_missing_columns_read_columns_label is only reached
consecutively, or from a goto after the label. This is changed to a
while loop starting at the label and ending at the goto.
The code executed in the only case we do not reach the goto (so
when exiting the loop) is moved after the while.
row_body_marker_label is only reached from one goto inside an else
case, or consecutively, so the code omitted by goto can be moved
inside the corresponding if case.
row_body_shadowable_deletion_label is only reached from one
goto, or consecutively, so the code omitted by goto can be
ommited by an if instead (or else).
row_body_prev_size_label is only reached consecutively, or from
a goto not far after the label. This is changed to a while loop
starting at the label and ending at the goto.
ck_block_label is only reached consecutively, or from
a few gotos not far after the label. This is changed
to a while loop with gotos replaced with continue's.
clustering_row_label is only reached from one goto, or consecutively,
so the code omitted by goto can be ommited by an if instead (or else).
Also remove complex_column_label because it is next to
its only goto.
Because the number of remaining cases is moderately low, and
after finishing a case we always enter another one, the switch
is removed completely, and the last remaining cases are handled
by 3 additional gotos and 1 new label.
If a state is never reached from the top of the switch, but only
by continuing from the previous case, we don't need to have a case:
for it.
Similarily, if there is a label that we goto, we don't need the
switch case.
(same as in kl sstable reader)
The function is converted to a coroutine simply by adding an
infinite loop around the switch, and starting another iteration
after yielding a value, instead of returning.
Because the coroutine resume() function does not take any arguments,
a new member is introduced to remember the "data" buffer, that was
previously an argument to the method.
After removing the switch, the only use for states in the sstable reader
are methods non_consuming() and verify_end_state().
The non_consuming() method is only used after assuring that
!primitive_consumer::active() (in continuous_data_consumer::process())
so we don't need states where primitive_consumer::active() for this
method, and is actually all of them.
We don't differentiate between ATOM_START and ATOM_START_2 in
verify_end_state(), so we can just merge them into one.
While we need tho remember times when we enter states used in verify_end_state(),
we also need to remember when we exit them. For that reason we introduce a new
state "NOT_CLOSING", that fails all comparisons in verify_end_state(), and
replaces all states that aren't used in verify_end_state()
After removing the switch, the state is only used for
verify_end_state() and non_consuming(), so we can
remove states that are not used there (and which do
not change them).
Some blocks of code were surrounded by curly braces, because
a variable was declared inside a switch case. With standard
flow control, it's no longer needed.
We get rid of the switch by using the infinite loop around the
switch for jumping to the first case, adding an infinite loop
around the second case (one break from the switch with the
state of the first case becomes a break of the new while),
and adding an if around the first case (because we never break
in the first case).
The function is converted to a coroutine simply by adding an
infinite loop around the switch, and starting another iteration
after yielding a value, instead of returning.
Because the coroutine resume() function does not take any arguments,
a new member is introduced to remember the "data" buffer, that was
previously an argument to the method.
The data_consume_rows_context and data_consume_rows_context_m are
classes, that use primitive_consumer read* methods to get primitives
from a streamed sstable, and using their corresponding consumers' (
mp_row_consumer_k_l and mp_row_consumer_m) consume* methods, they
fill the buffer of the corresponding flat_mutation_reader.
The main procedure where we decide which read* and consume* methods
to call, is do_process_state. We save the current state of the
procedure in the _state variable, to remember where to continue in
the next call. For each call, the do_process_state method returns
an information about whether we can keep filling the buffer using
more buffers from the stream (proceed::yes), or not (proceed::no).
The saved state can be (mostly) removed by using a generator
coroutine, whose state is saved when its execution is halted,
and which yields the values, that do_process_state would return
before.
The processing_result_generator is a class for managing a generator
coroutine. When the coroutine halts, the proceed_generator saves the
value yielded by the coroutine, and returns it to the caller.
This patchset combines two important changes to the way reader permits
are created and admitted:
1) It switches admission to be up-front.
2) It changes the admission algorithm.
(1) Currently permits are created before the read is started, but they
only wait for admission when going to the disk. This leaves the
resources consumption of cache and memtables reads unbounded, possibly
leading to OOM (rare but happens). This series changes this that permits
are admitted at the moment they are creating making admission up-front
-- at least those reads that pass admission at all (some don't).
(2) Admission currently is based on availability of resources. We have a
certain amount of memory available, which derived from the memory
available to the shard, as well a hardcoded count resource. Reads are
admitted when a count and a certain amount (base cost) of memory is
available. This patchset adds a new aspect to this admission process
beyond the existing resource availability: the number of used/blocked
reads. Namely it only admits new reads if in addition to the necessary
amount of resources being available, all currently used readers are
blocked. In other words we only admit new reads if all currently
admitted reads requires something other than CPU to progress. They are
either waiting on I/O, a remote shard, or attention from their consumers
(not used currently).
The reason for making these two changes at the same time is that
up-front admission means cache reads now need to obtain a permit too.
For cache reads the optimal concurrency is 1. Anything above that just
increases latency (without increasing throughput). So we want to make sure
that if a cache reader hits it doesn't get any competition for CPU and
it can run to completion. We admit new reads only if the read misses and
has to go to disk.
A side effect of these changes is that the execution stages from the
replica-side read path are replaced with the reader concurrency
semaphore as an execution stage. This is necessary due to bad
interaction between said execution stages and up-front admission. This
has an important consequence: read timeouts are more strictly enforced
because the execution stage doesn't have a timeout so it can execute
already timed-out reads too. This is not the case with the semaphore's
queue which will drop timed-out reads. Another consequence is that, now
data and mutation reads share the same execution stage, which increases
its effectiveness, on the other hand system and user reads don't
anymore.
Fixes: #4758Fixes: #5718
Tests: unit(dev, release, debug)
* 'reader-concurrency-semaphore-in-front-of-the-cache/v5.3' of https://github.com/denesb/scylla: (54 commits)
test/boost/reader_concurrency_semaphore_test: add used/blocked test
test/boost/reader_concurrency_semaphore_test: add admission test
reader_permit: add operator<< for reader_resources
reader_concurrency_semaphore: add reads_{admitted,enqueued} stats
table: make_sstable_reader(): fix indentation
table: clean up make_sstable_reader()
database: remove now unused query execution stages
mutation_reader: remove now unused restricting_reader
sstables: sstable_set: remove now unused make_restricted_range_sstable_reader()
reader_permit: remove now unused wait_admission()
reader_concurrency_semaphore: remove now unused obtain_permit_nowait()
reader_concurrency_semaphore: admission: flip the switch
database: increase semaphore max queue size
test: index_with_paging_test: increase semaphore's queue size
reader_concurrency_semaphore: add set_max_queue_size()
test: mutation_reader_test: remove restricted reader tests
reader_concurrency_semaphore: remove now unused make_permit()
test: reader_concurrency_semaphore_test: move away from make_permit()
test: move away from make_permit()
treewide: use make_tracking_only_permit()
...