27412 Commits

Author SHA1 Message Date
Avi Kivity
ed6c01a9fa test: increase timeout to account for flat_mutation_reader_v2 tests
Since fce124bd90 ("Merge "Introduce flat_mutation_reader_v2" from
Tomasz") tests involving mutation_reader are a lot slower due to
the new API testing. On slower machines it's enough to time out.

Work is underway to improve the situation, and the timeout will revert
to its original value once the flat_mutation_reader_v2 work is done,
but meanwhile, increase the timeout.

Closes #9046
2021-07-15 12:33:43 +03:00
Avi Kivity
1643549d08 Merge 'Coroutinize the sstable reader' from Wojciech Mitros
This patch applies the same changes to both kl and mx sstable readers, but because the kl reader is old, we'll focus on the newer one.

This patch makes the main sstable reader process a coroutine,
which allows simplifying it by:

- using the state saved in the coroutine instead of most of the states saved in the _state variable
- removing the switch statement and moving the code of former switch cases, resulting in reduced number of jumps in code
- removing repetitive ifs for read statuses, by adding them to the coroutine implementation

The coroutine is saved in a new class ```processing_result_generator```, which works like a generator: calling its ```generate()``` method resumes the coroutine until it yields a data_consumer::processing_result value. Previously this was achieved by calling the function that is now the coroutine (```do_process_state()```).

Before the patch, the main processing method had 558 lines. The patch reduces this number to 345 lines.

However, usage of C++ coroutines has a non-negligible effect on the performance of the sstable reader.
In the test cases from ```perf_fast_forward``` the new sstable reader executes up to 2% more instructions (per fragment) than the former implementation, and this loss occurs in cases where we're reading many consecutive rows, without any skips.
Thanks to finding an optimization during the development of the patch, the loss is mitigated when we do skip rows, and for some cases, we can even observe an improvement.
You can see the full results in attached files: [old_results.txt](https://github.com/scylladb/scylla/files/6793139/old_results.txt), [new_results.txt](https://github.com/scylladb/scylla/files/6793140/new_results.txt)

Test: unit(dev)
Refs: #7952

Closes #9002

* github.com:scylladb/scylla:
  mx sstable reader: reduce code blocks
  mx sstable reader: make ifs consistent
  sstable readers: make awaiter for read status
  mx sstable reader: don't yield if the data buffer is not empty
  mx sstable reader: combine FLAGS and FLAGS_2 states
  mx sstable reader: reduce placeholder state usage
  mx sstable reader: replace non_consuming states with a bool
  mx sstable reader: reduce placeholder state usage
  mx sstable reader: replace unnecessary states with a placeholder
  mx sstable reader: remove false if case
  mx sstable reader: remove row_body_missing_columns_label
  mx sstable reader: remove row_body_deletion_label
  mx sstable reader: remove column_end_label
  mx sstable reader: remove column_cell_path_label
  mx sstable reader: remove column_ttl_label
  mx sstable reader: remove column_deletion_time_label
  mx sstable reader: remove complex_column_2_label
  mx sstable reader: remove row_body_missing_columns_read_columns_label
  mx sstable reader: remove row_body_marker_label
  mx sstable reader: remove row_body_shadowable_deletion_label
  mx sstable reader: remove row_body_prev_size_label
  mx sstable reader: remove ck_block_label
  mx sstable reader: remove ck_block2_label
  mx sstable reader: remove clustering_row_label and complex_column_label
  mx sstable reader: remove labels with only one goto
  mx sstable reader: replace the switch cases with gotos and a new label
  mx sstable reader: remove states only reached consecutively or from goto
  mx sstable reader: remove switch breaks for consecutive states
  mx sstable reader: convert readers main method into a coroutine
  kl sstable reader: replace states for ending with one state, simplify non_consuming
  kl sstable reader: remove unnecessary states
  kl sstable reader: remove unnecessary yield
  kl sstable reader: remove unnecessary blocks
  kl sstable reader: fix indentation
  kl sstable reader: replace switch with standard flow control
  kl sstable reader: remove state::CELL case
  kl sstable reader: move states code only reachable from one place
  kl sstable reader: remove states only reached consecutively
  kl sstable reader: remove switch breaks for consecutive states
  kl sstable reader: remove unreachable case
  kl sstable reader: move testing hack for fragmented buffers outside the coroutine
  kl sstable reader: convert readers main method into a coroutine
  sstable readers: create a generator class for coroutines
2021-07-15 12:06:14 +03:00
Wojciech Mitros
45058776c2 mx sstable reader: reduce code blocks
Some blocks of code were surrounded by curly braces, because
a variable was declared inside a switch case. After changes,
some of the variable declarations are in if/else/while cases,
and no longer need to be in separate code blocks, while other
blocks can be extended to entire labels for simplicity.
2021-07-14 20:50:30 +02:00
Wojciech Mitros
9b333908e4 mx sstable reader: make ifs consistent
In several places we're checking the return value of our
consumers' consume_* calls. Because the behaviour in all cases
is the same, let us use the same notation as well.
2021-07-14 20:50:30 +02:00
Wojciech Mitros
dc38605f75 sstable readers: make awaiter for read status
After each read* call of the primitive_consumer we need to check
if the entire primitive was in our current buffer. We can check it
in the proceed_generator object by yielding the returned read status:
if the yielded status is ready, the yield_value method returns
a structure whose await_ready() method returns true. Otherwise,
await_ready() returns false.
The returned structure is co_awaited by the coroutine (due to co_yield).
If await_ready() returns true, the coroutine isn't stopped.
Conversely, if it returns false (and, technically, because its await_suspend
method returns void), the coroutine stops, and a proceed::yes value
is saved, indicating that we need more buffers.
2021-07-14 20:50:30 +02:00
Wojciech Mitros
09a0cd7c05 mx sstable reader: don't yield if the data buffer is not empty
The skip() method returns a skip_bytes object if we want to
skip the entire buffer, otherwise it returns a proceed::yes
and trims the buffer.

If the buffer is only trimmed we don't need to interrupt
the coroutine, we simply continue instead.
2021-07-14 20:50:30 +02:00
Wojciech Mitros
5dc64532bd mx sstable reader: combine FLAGS and FLAGS_2 states
We don't differentiate between FLAGS and FLAGS_2 in
verify_end_state(), so we can merge them into one state.
2021-07-14 20:50:30 +02:00
Wojciech Mitros
ab1e6f4211 mx sstable reader: reduce placeholder state usage
After the changes to non_consuming states, we can
remove some state::OTHER assignments again.
2021-07-14 20:50:30 +02:00
Wojciech Mitros
c904ab12c8 mx sstable reader: replace non_consuming states with a bool
The non_consuming() method is only used after ensuring that
primitive_consumer::active() (in continuous_data_consumer::process()),
so we don't need states where primitive_consumer::active() holds, which
is most of them.

We still need to make sure that the states change when they need to,
so we replace all the concerned states with the placeholder state,
and for the few states from the non_consuming() OR, where the
primitive_consumer::active() returns true, we set the value of
_consuming to false, changing it back when the state is no longer
non_consuming.
2021-07-14 20:50:30 +02:00
Wojciech Mitros
b05d3eefed mx sstable reader: reduce placeholder state usage
We can remove state assignments that we know are
changing a state to itself.

Similarly, if a state is changed in the same way
in an if and an else, it can be changed before the
if/else instead.
2021-07-14 20:50:30 +02:00
Wojciech Mitros
b2e3fbffd0 mx sstable reader: replace unnecessary states with a placeholder
After removing the switch, the state is only used for
verify_end_state() and non_consuming(), so we can
replace states that are not used there with a single
one, so that the state still stops being one of the
appearing states when it needs to.
2021-07-14 20:50:30 +02:00
Wojciech Mitros
9a7a8fa86c mx sstable reader: remove false if case
consume_row_marker_and_tombstone does not return proceed::no in the
mp_row_consumer_m implementation, and even if it did, we would most
likely want to yield proceed::no in that case as well.
2021-07-14 20:50:30 +02:00
Wojciech Mitros
2262aac11a mx sstable reader: remove row_body_missing_columns_label
row_body_missing_columns_label is only reached from one
goto, or consecutively, so the code omitted by goto can be
omitted by an if instead (or else).
2021-07-14 20:50:30 +02:00
Wojciech Mitros
99b5a332db mx sstable reader: remove row_body_deletion_label
row_body_deletion_label is only reached from one
goto, or consecutively, so the code omitted by goto can be
omitted by an if instead (or else).
2021-07-14 20:50:30 +02:00
Wojciech Mitros
cbce22a88b mx sstable reader: remove column_end_label
column_end_label is only reached from one goto, or consecutively,
so the code omitted by goto can be omitted by an if instead (or else).
2021-07-14 20:50:30 +02:00
Wojciech Mitros
925d921cb4 mx sstable reader: remove column_cell_path_label
column_cell_path_label is only reached from two gotos, both
at the end of an if/else block, or consecutively, so the code
after the if/else block can be omitted by an if instead (or else).
2021-07-14 20:50:30 +02:00
Wojciech Mitros
e85987a439 mx sstable reader: remove column_ttl_label
column_ttl_label is only reached from two gotos, both
at the end of an if/else block, or consecutively, so the code
after the if/else block can be omitted by an if instead (or else).
2021-07-14 20:50:30 +02:00
Wojciech Mitros
4b3607e97b mx sstable reader: remove column_deletion_time_label
column_deletion_time_label is only reached from one
goto, or consecutively, so the code omitted by goto can be
omitted by an if instead (or else).
2021-07-14 20:50:30 +02:00
Wojciech Mitros
8cf23c3b01 mx sstable reader: remove complex_column_2_label
complex_column_2_label is only reached from one goto, or consecutively,
so the code omitted by goto can be omitted by an if instead (or else).
2021-07-14 20:50:30 +02:00
Wojciech Mitros
fbe28d18f3 mx sstable reader: remove row_body_missing_columns_read_columns_label
row_body_missing_columns_read_columns_label is only reached
consecutively, or from a goto after the label. This is changed to a
while loop starting at the label and ending at the goto.

The code executed in the only case we do not reach the goto (so
when exiting the loop) is moved after the while.
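
As a rough illustration of this kind of rewrite (a sketch only; `sum_goto`, `sum_while` and the `next_column` label are hypothetical and do not come from the reader's code), a label that is reached only consecutively or from a goto placed after it is equivalent to a while loop, with the exit code moved after the loop:

```cpp
#include <cstddef>
#include <vector>

// Before: the label is reached consecutively, or from the goto below it.
inline int sum_goto(const std::vector<int>& cols) {
    std::size_t i = 0;
    int sum = 0;
next_column:
    if (i < cols.size()) {
        sum += cols[i++];
        goto next_column;
    }
    // Only reached when the goto is not taken.
    return sum;
}

// After: the loop starts at the former label and ends at the former goto;
// the code for the "goto not taken" case follows the loop.
inline int sum_while(const std::vector<int>& cols) {
    std::size_t i = 0;
    int sum = 0;
    while (i < cols.size()) {
        sum += cols[i++];
    }
    return sum;
}
```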
2021-07-14 20:50:30 +02:00
Wojciech Mitros
3b512ea2c2 mx sstable reader: remove row_body_marker_label
row_body_marker_label is only reached from one goto inside an else
case, or consecutively, so the code omitted by goto can be moved
inside the corresponding if case.
2021-07-14 20:50:30 +02:00
Wojciech Mitros
0bcde69319 mx sstable reader: remove row_body_shadowable_deletion_label
row_body_shadowable_deletion_label is only reached from one
goto, or consecutively, so the code omitted by goto can be
omitted by an if instead (or else).
2021-07-14 20:50:30 +02:00
Wojciech Mitros
3d0fdf9f3b mx sstable reader: remove row_body_prev_size_label
row_body_prev_size_label is only reached consecutively, or from
a goto not far after the label. This is changed to a while loop
starting at the label and ending at the goto.
2021-07-14 20:50:30 +02:00
Wojciech Mitros
b27166c36f mx sstable reader: remove ck_block_label
ck_block_label is only reached consecutively, or from
a few gotos not far after the label. This is changed
to a while loop with gotos replaced with continue's.
2021-07-14 20:50:30 +02:00
Wojciech Mitros
ec6c2f0e07 mx sstable reader: remove ck_block2_label
ck_block2_label is only reached from one goto, or consecutively,
so the code omitted by goto can be omitted by an if instead (or else).
2021-07-14 20:50:30 +02:00
Wojciech Mitros
1e59e249ec mx sstable reader: remove clustering_row_label and complex_column_label
clustering_row_label is only reached from one goto, or consecutively,
so the code omitted by goto can be omitted by an if instead (or else).

Also remove complex_column_label because it is next to
its only goto.
2021-07-14 20:50:30 +02:00
Wojciech Mitros
440aba61a9 mx sstable reader: remove labels with only one goto
If a case is reached only after jumping with a single
goto, that goto may be replaced with the target code.
2021-07-14 20:50:30 +02:00
Wojciech Mitros
65f7eb5ada mx sstable reader: replace the switch cases with gotos and a new label
Because the number of remaining cases is moderately low, and
after finishing a case we always enter another one, the switch
is removed completely, and the last remaining cases are handled
by 3 additional gotos and 1 new label.
2021-07-14 20:50:30 +02:00
Wojciech Mitros
0398c68797 mx sstable reader: remove states only reached consecutively or from goto
If a state is never reached from the top of the switch, but only
by continuing from the previous case, we don't need to have a case:
for it.

Similarly, if there is a label that we goto, we don't need the
switch case.
2021-07-14 20:50:30 +02:00
Wojciech Mitros
f87b27b9e4 mx sstable reader: remove switch breaks for consecutive states
If _state at the end of a switch case has the same value as the
next case, instead of breaking the switch, we can just fall through.
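
A small sketch of the idea, with a hypothetical two-step `parser` (the state names and counters are invented for illustration and are not the reader's real states): when a case always sets `_state` to the value of the next case, falling through saves one trip around the dispatch loop.

```cpp
enum class state { HEADER, BODY, DONE };

struct parser {
    state _state = state::HEADER;
    int dispatches = 0;  // how many times the switch was entered
    int work = 0;        // how many cases actually ran

    // Before: every case breaks, so the loop re-dispatches on _state.
    void run_with_breaks() {
        while (_state != state::DONE) {
            ++dispatches;
            switch (_state) {
            case state::HEADER:
                ++work;
                _state = state::BODY;
                break;
            case state::BODY:
                ++work;
                _state = state::DONE;
                break;
            case state::DONE:
                break;
            }
        }
    }

    // After: HEADER always leads to BODY, the next case, so fall through.
    void run_with_fallthrough() {
        while (_state != state::DONE) {
            ++dispatches;
            switch (_state) {
            case state::HEADER:
                ++work;
                _state = state::BODY;
                [[fallthrough]];
            case state::BODY:
                ++work;
                _state = state::DONE;
                break;
            case state::DONE:
                break;
            }
        }
    }
};
```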
2021-07-14 20:50:30 +02:00
Wojciech Mitros
32b996aca5 mx sstable reader: convert readers main method into a coroutine
(same as in kl sstable reader)
The function is converted to a coroutine simply by adding an
infinite loop around the switch, and starting another iteration
after yielding a value, instead of returning.

Because the coroutine resume() function does not take any arguments,
a new member is introduced to remember the "data" buffer, which was
previously an argument to the method.
2021-07-14 20:50:30 +02:00
Wojciech Mitros
4816e8120b kl sstable reader: replace states for ending with one state, simplify non_consuming
After removing the switch, the only use for states in the sstable reader
are methods non_consuming() and verify_end_state().

The non_consuming() method is only used after ensuring that
!primitive_consumer::active() (in continuous_data_consumer::process()),
so for this method we don't need states where primitive_consumer::active()
is true, which is actually all of them.

We don't differentiate between ATOM_START and ATOM_START_2 in
verify_end_state(), so we can just merge them into one.

While we need to remember when we enter states used in verify_end_state(),
we also need to remember when we exit them. For that reason we introduce a new
state "NOT_CLOSING", which fails all comparisons in verify_end_state() and
replaces all states that aren't used in verify_end_state().
2021-07-14 20:50:30 +02:00
Wojciech Mitros
0c284a8b5e kl sstable reader: remove unnecessary states
After removing the switch, the state is only used for
verify_end_state() and non_consuming(), so we can
remove states that are not used there (and which do
not change them).
2021-07-14 20:50:30 +02:00
Wojciech Mitros
35c30e6178 kl sstable reader: remove unnecessary yield
We don't need to yield row_consumer::proceed::yes if we are
not parsing a primitive using primitive_consumer, we can just
continue execution.
2021-07-14 20:50:30 +02:00
Wojciech Mitros
97c7b5fe76 kl sstable reader: remove unnecessary blocks
Some blocks of code were surrounded by curly braces, because
a variable was declared inside a switch case. With standard
flow control, it's no longer needed.
2021-07-14 20:50:30 +02:00
Wojciech Mitros
914e4f27e9 kl sstable reader: fix indentation
To simplify review, the code moved in previous commits
didn't change its indentation. This commit fixes it.
2021-07-14 20:50:30 +02:00
Wojciech Mitros
7a6729159f kl sstable reader: replace switch with standard flow control
We get rid of the switch by using the infinite loop around the
switch for jumping to the first case, adding an infinite loop
around the second case (one break from the switch with the
state of the first case becomes a break of the new while),
and adding an if around the first case (because we never break
in the first case).
2021-07-14 20:50:30 +02:00
Wojciech Mitros
cfe6a46a60 kl sstable reader: remove state::CELL case
The CELL state is only set in the if/else block immediately
before the CELL case, so we don't need to have a case for it.
2021-07-14 20:50:30 +02:00
Wojciech Mitros
c41f49d2e5 kl sstable reader: move states code only reachable from one place
If a case is reached only after exiting a certain other case (or goto)
its code may as well be moved to that place.
2021-07-14 20:50:30 +02:00
Wojciech Mitros
5f27413c1f kl sstable reader: remove states only reached consecutively
If a state is never reached from the top of the switch, but only
by continuing from the previous case, we don't need to have a case:
for it.
2021-07-14 20:50:30 +02:00
Wojciech Mitros
e226fc12c9 kl sstable reader: remove switch breaks for consecutive states
If _state at the end of a switch case has the same value as the
next case, instead of breaking the switch, we can just fall through.
2021-07-14 20:50:30 +02:00
Wojciech Mitros
bc7ed3f596 kl sstable reader: remove unreachable case
The STOP_THEN_ATOM_START is never reached, so it can be
removed altogether.
2021-07-14 20:50:30 +02:00
Wojciech Mitros
63d1a44d12 kl sstable reader: move testing hack for fragmented buffers outside the coroutine
The testing hack can't be done inside the coroutine, because
we don't have the original "data" buffer
2021-07-14 20:50:30 +02:00
Wojciech Mitros
6fff9aed3c kl sstable reader: convert readers main method into a coroutine
The function is converted to a coroutine simply by adding an
infinite loop around the switch, and starting another iteration
after yielding a value, instead of returning.

Because the coroutine resume() function does not take any arguments,
a new member is introduced to remember the "data" buffer, which was
previously an argument to the method.
2021-07-14 20:50:30 +02:00
Wojciech Mitros
01c2f406df sstable readers: create a generator class for coroutines
The data_consume_rows_context and data_consume_rows_context_m are
classes that use primitive_consumer read* methods to get primitives
from a streamed sstable and, using their corresponding consumers'
(mp_row_consumer_k_l and mp_row_consumer_m) consume* methods,
fill the buffer of the corresponding flat_mutation_reader.

The main procedure where we decide which read* and consume* methods
to call, is do_process_state. We save the current state of the
procedure in the _state variable, to remember where to continue in
the next call. For each call, the do_process_state method returns
information about whether we can keep filling the buffer using
more buffers from the stream (proceed::yes), or not (proceed::no).

The saved state can be (mostly) removed by using a generator
coroutine, whose state is saved when its execution is halted,
and which yields the values that do_process_state would previously
have returned.

The processing_result_generator is a class for managing a generator
coroutine. When the coroutine halts, the proceed_generator saves the
value yielded by the coroutine, and returns it to the caller.
2021-07-14 20:50:27 +02:00
Piotr Sarna
3d816b7c16 Merge 'Move the reader concurrency semaphore in front of the cache' from Botond
This patchset combines two important changes to the way reader permits
are created and admitted:
1) It switches admission to be up-front.
2) It changes the admission algorithm.

(1) Currently permits are created before the read is started, but they
only wait for admission when going to the disk. This leaves the
resource consumption of cache and memtable reads unbounded, possibly
leading to OOM (rare, but it happens). This series changes this so that permits
are admitted at the moment they are created, making admission up-front
-- at least for those reads that pass admission at all (some don't).

(2) Admission is currently based on the availability of resources. We have a
certain amount of memory available, which is derived from the memory
available to the shard, as well as a hardcoded count resource. Reads are
admitted when a count and a certain amount (base cost) of memory is
available. This patchset adds a new aspect to this admission process
beyond the existing resource availability: the number of used/blocked
reads. Namely it only admits new reads if in addition to the necessary
amount of resources being available, all currently used readers are
blocked. In other words, we only admit new reads if all currently
admitted reads require something other than CPU to progress. They are
either waiting on I/O, a remote shard, or attention from their consumers
(not used currently).
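
The admission rule in (2) can be sketched as a simple predicate. This is an illustration under assumptions, not the semaphore's actual code: `resources`, `semaphore_state`, `can_admit` and the field names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical view of the semaphore's bookkeeping.
struct resources {
    int count;            // remaining count resource
    std::int64_t memory;  // remaining memory, in bytes
};

struct semaphore_state {
    resources available;
    int used_permits;     // admitted reads that are currently in use
    int blocked_permits;  // used reads waiting on something other than CPU
};

// Admit a new read only if the base cost fits into the available
// resources AND every currently used read is blocked.
inline bool can_admit(const semaphore_state& s, std::int64_t base_cost) {
    const bool has_resources =
        s.available.count >= 1 && s.available.memory >= base_cost;
    const bool all_used_blocked = s.used_permits == s.blocked_permits;
    return has_resources && all_used_blocked;
}
```

With this rule, a read that is being served from cache (used, but not blocked) keeps new reads queued even when memory and count are plentiful.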

The reason for making these two changes at the same time is that
up-front admission means cache reads now need to obtain a permit too.
For cache reads the optimal concurrency is 1. Anything above that just
increases latency (without increasing throughput). So we want to make sure
that if a cache reader hits, it doesn't get any competition for CPU and
it can run to completion. We admit new reads only if the read misses and
has to go to disk.

A side effect of these changes is that the execution stages from the
replica-side read path are replaced with the reader concurrency
semaphore as an execution stage. This is necessary due to bad
interaction between said execution stages and up-front admission. This
has an important consequence: read timeouts are more strictly enforced
because the execution stage doesn't have a timeout so it can execute
already timed-out reads too. This is not the case with the semaphore's
queue, which will drop timed-out reads. Another consequence is that
data and mutation reads now share the same execution stage, which increases
its effectiveness. On the other hand, system and user reads no longer
do.

Fixes: #4758
Fixes: #5718

Tests: unit(dev, release, debug)

* 'reader-concurrency-semaphore-in-front-of-the-cache/v5.3' of https://github.com/denesb/scylla: (54 commits)
  test/boost/reader_concurrency_semaphore_test: add used/blocked test
  test/boost/reader_concurrency_semaphore_test: add admission test
  reader_permit: add operator<< for reader_resources
  reader_concurrency_semaphore: add reads_{admitted,enqueued} stats
  table: make_sstable_reader(): fix indentation
  table: clean up make_sstable_reader()
  database: remove now unused query execution stages
  mutation_reader: remove now unused restricting_reader
  sstables: sstable_set: remove now unused make_restricted_range_sstable_reader()
  reader_permit: remove now unused wait_admission()
  reader_concurrency_semaphore: remove now unused obtain_permit_nowait()
  reader_concurrency_semaphore: admission: flip the switch
  database: increase semaphore max queue size
  test: index_with_paging_test: increase semaphore's queue size
  reader_concurrency_semaphore: add set_max_queue_size()
  test: mutation_reader_test: remove restricted reader tests
  reader_concurrency_semaphore: remove now unused make_permit()
  test: reader_concurrency_semaphore_test: move away from make_permit()
  test: move away from make_permit()
  treewide: use make_tracking_only_permit()
  ...
2021-07-14 16:22:56 +02:00
Botond Dénes
e2dfb2df71 test/boost/reader_concurrency_semaphore_test: add used/blocked test
Make sure that releasing a bunch of used/blocked guards in random order
doesn't break the permit state.
2021-07-14 17:19:02 +03:00
Botond Dénes
0337d3ea4a test/boost/reader_concurrency_semaphore_test: add admission test
Checking every conceivable admission scenario (hopefully).
2021-07-14 17:19:02 +03:00
Botond Dénes
b81f39cec9 reader_permit: add operator<< for reader_resources
And use it in tests; it results in actually useful error messages.
2021-07-14 17:19:02 +03:00
Botond Dénes
1666ad078a reader_concurrency_semaphore: add reads_{admitted,enqueued} stats
Primarily for tests, but we could also export these, should we want to.
2021-07-14 17:19:02 +03:00