Commit Graph

112 Commits

Author SHA1 Message Date
Piotr Dulikowski
ee9bfb583c combined: mergers: remove recursion in operator()()
In mutation_reader_merger and clustering_order_reader_merger, the
operator()() is responsible for producing mutation fragments that will
be merged and pushed to the combined reader's buffer. Sometimes, it
might have to advance existing readers, open new and / or close some
existing ones, which requires calling a helper method and then calling
operator()() recursively.

In some unlucky circumstances, a stack overflow can occur:

- Readers have to be opened incrementally,
- Most or all readers must not produce any fragments and need to report
  end of stream without preemption,
- There has to be enough readers opened within the lifetime of the
  combined reader (~500),
- All of the above needs to happen within a single task quota.

In order to prevent such a situation, the code of both reader merger
classes were modified not to perform recursion at all. Most of the code
of the operator()() was moved to maybe_produce_batch which does not
recur if it is not possible for it to produce a fragment, instead it
returns std::nullopt and operator()() calls this method in a loop via
seastar::repeat_until_value.

A regression test is added.

Fixes: scylladb/scylladb#14415

Closes #14452
2023-06-30 12:07:13 +03:00
Kamil Braun
96bc78905d readers: evictable_reader: don't accidentally consume the entire partition
The evictable reader must ensure that each buffer fill makes forward
progress, i.e. the last fragment in the buffer has a position larger
than the last fragment from the previous buffer-fill. Otherwise, the
reader could get stuck in an infinite loop between buffer fills, if the
reader is evicted in-between.

The code guranteeing this forward progress had a bug: the comparison
between the position after the last buffer-fill and the current
last fragment position was done in the wrong direction.

So if the condition that we wanted to achieve was already true, we would
continue filling the buffer until partition end which may lead to OOMs
such as in #13491.

There was already a fix in this area to handle `partition_start`
fragments correctly - #13563 - but it missed that the position
comparison was done in the wrong order.

Fix the comparison and adjust one of the tests (added in #13563) to
detect this case.

Fixes #13491
2023-06-27 14:37:29 +02:00
Tomasz Grabiec
d92287f997 db: multishard: Obtain sharder from erm
This is not strictly necessary, as the multishard reader will be later
avoided altogether for tablet-based tables, but it is a step towards
converting all code to use the erm->get_sharder() instead of
schema::get_sharder().
2023-06-21 00:58:24 +02:00
Pavel Emelyanov
66e43912d6 code: Switch to seastar API level 7
In that level no io_priority_class-es exist. Instead, all the IO happens
in the context of current sched-group. File API no longer accepts prio
class argument (and makes io_intent arg mandatory to impls).

So the change consists of
- removing all usage of io_priority_class
- patching file_impl's inheritants to updated API
- priority manager goes away altogether
- IO bandwidth update is performed on respective sched group
- tune-up scylla-gdb.py io_queues command

The first change is huge and was made semi-autimatically by:
- grep io_priority_class | default_priority_class
- remove all calls, found methods' args and class' fields

Patching file_impl-s is smaller, but also mechanical:
- replace io_priority_class& argument with io_intent* one
- pass intent to lower file (if applicatble)

Dropping the priority manager is:
- git-rm .cc and .hh
- sed out all the #include-s
- fix configure.py and cmakefile

The scylla-gdb.py update is a bit hairry -- it needs to use task queues
list for IO classes names and shares, but to detect it should it checks
for the "commitlog" group is present.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13963
2023-06-06 13:29:16 +03:00
Botond Dénes
8681f3e997 readers,mutation: move mutation_fragment_stream_validator to mutation/
The validator classes have their definition in a header located in mutation/,
while their implementation is located in a .cc in readers/mutation_reader.cc.
This patch fixes this inconsistency by moving the implementation into
mutation/mutation_fragment_stream_validator.cc. The only change is that
the validator code gets a new logger instance (but the logger variable itself
is left unchanged for now).
2023-05-09 07:55:13 -04:00
Nadav Har'El
5f37d43ee6 Merge 'compaction: validate: validate the index too' from Botond Dénes
In addition to the data file itself. Currently validation avoids the
index altogether, using the crawling reader which only relies on the
data file and ignores the index+summary. This is because a corrupt
sstable usually has a corrupt index too and using both at the same time
might hide the corruption. This patch adds targeted validation of the
index, independent of and in addition to the already existing data
validation: it validates the order of index entries as well as whether
the entry points to a complete partition in the data file.
This will usually result in duplicate errors for out-of-order
partitions: one for the data file and one for the index file.

Fixes: #9611

Closes #11405

* github.com:scylladb/scylladb:
  test/cql-pytest: add test_sstable_validation.py
  test/cql-pytest: extract scylla_path,temp_workdir fixtures to conftest.py
  tools/scylla-sstables: write validation result to stdout
  sstables/sstable: validate(): delegate to mx validator for mx sstables
  sstables/mx/reader: add mx specific validator
  mutation/mutation_fragment_stream_validator: add validator() accessor to validating filter
  sstables/mx/reader: template data_consume_rows_context_m on the consumer
  sstables/mx/reader: move row_processing_result to namespace scope
  sstables/mx/reader: use data_consumer::proceed directly
  sstables/mx/reader.cc: extend namespace to end-of-file (cosmetic)
  compaction/compaction: remove now unused scrub_validate_mode_validate_reader()
  compaction/compaction: move away from scrub_validate_mode_validate_reader()
  tools/scylla-sstable: move away from scrub_validate_mode_validate_reader()
  test/boost/sstable_compaction_test: move away from scrub_validate_mode_validate_reader()
  sstables/sstable: add validate() method
  compaction/compaction: scrub_sstables_validate_mode(): validate sstables one-by-one
  compaction: scrub: use error messages from validator
  mutation_fragment_stream_validator: produce error messages in low-level validator
2023-05-08 17:14:26 +03:00
Avi Kivity
f125a3e315 Merge 'tree: finish the reader_permit state renames' from Botond Dénes
In https://github.com/scylladb/scylladb/pull/13482 we renamed the reader permit states to more descriptive names. That PR however only covered only the states themselves and their usages, as well as the documentation in `docs/dev`.
This PR is a followup to said PR, completing the name changes: renaming all symbols, names, comments etc, so all is consistent and up-to-date.

Closes #13573

* github.com:scylladb/scylladb:
  reader_concurrency_semaphore: misc updates w.r.t. recent permit state name changes
  reader_concurrency_semaphore: update permit members w.r.t. recent permit state name changes
  reader_concurrency_semaphore: update RAII state guard classes w.r.t. recent permit state name changes
  reader_concurrency_semaphore: update API w.r.t. recent permit state name changes
  reader_concurrency_semaphore: update stats w.r.t. recent permit state name changes
2023-05-04 18:29:04 +03:00
Botond Dénes
d3749b810a mutation_fragment_stream_validator: produce error messages in low-level validator
Currently, error messages for validation errors are produced in several
places:
* the high-level validator (which is built on the low-level one)
* scrub compaction and validation compaction (scrub in validate mode)
* scylla-sstable's validate operation

We plan to introduce yet another place which would use the low-level
validator and hence would have to produce its own error messages. To cut
down all this duplication, centralize the production of error messages
in the low-level validator, which now returns a `validation_result`
object instead of bool from its validate methods. This object can be
converted to bool (so its backwards compatible) and also contains an
error message if validation failed. In the next patches we will migrate
all users of the low level validator (be that direct or indirect) to use
the error messages provided in this result object instead of coming up
with one themselves.
2023-05-02 09:42:41 -04:00
Botond Dénes
72003dc35c readers: evictable_reader: skip progress guarantee when next pos is partition start
The evictable reader must ensure that each buffer fill makes forward
progress, i.e. the last fragment in the buffer has a position larger
than the last fragment from the last buffer-fill. Otherwise, the reader
could get stuck in an infinite loop between buffer fills, if the reader
is evicted in-between.
The code guranteeing this forward change has a bug: when the next
expected position is a partition-start (another partition), the code
would loop forever, effectively reading all there is from the underlying
reader.
To avoid this, add a special case to ignore the progress guarantee loop
altogether when the next expected position is a partition start. In this
case, progress is garanteed anyway, because there is exactly one
partition-start fragment in each partition.

Fixes: #13491

Closes #13563
2023-05-02 16:19:32 +03:00
Botond Dénes
804403f618 reader_concurrency_semaphore: update RAII state guard classes w.r.t. recent permit state name changes
They is still using the old terminology for permit state names, bring
them up to date with the recent state name changes.
2023-04-19 05:20:42 -04:00
Botond Dénes
14bff955e2 readers/multishard: shard_reader: fast-forward created reader to current range
When creating the reader, the lifecycle policy might return one that was
saved on the last page and survived in the cache. This reader might have
skipped some fast-forwarding ranges while sitting in the cache. To avoid
using a reader reading a stale range (from the read's POV), check its
read range and fast forward it if necessary.
2023-03-24 08:43:03 -04:00
Botond Dénes
0aa03f85a3 readers/multishard: reader_lifecycle_policy: add get_read_range()
Allows retrieving the current read-range for the reader on the given
shard (where the method is called).
2023-03-24 08:40:11 -04:00
Botond Dénes
156e5d346d reader_permit: keep trace_state pointer on permit
And propagate it down to where it is created. This will be used to add
trace points for semaphore related events, but this will come in the
next patches.
2023-03-22 04:58:01 -04:00
Avi Kivity
29a2788b2e Merge 'reader_concurrency_semaphore: handle read blocked on memory being registered as inactive' from Botond Dénes
A read that requested memory and has to wait for it can be registered as inactive. This can happen for example if the memory request originated from a background I/O operation (a read-ahead maybe).
Handling this case is currently very difficult. What we want to do is evict such a read on-the-spot: the fact that there is a read waiting on memory means memory is in demand and so inactive reads should be evicted. To evict this reader, we'd first have to remove it from the memory wait list, which is almost impossible currently, because `expiring_fifo<>`, the type used for the wait list, doesn't allow for that. So in this PR we set out to make this possible first, by transforming all current queues to be intrusive lists of permits. Permits are already linked into an intrusive list, to allow for enumerating all existing permits. We use these existing hooks to link the permits into the appropriate queue, and back to `_permit_list` when they are not in any special queue. To make this possible we first have to make all lists store naked permits, moving all auxiliary data fields currently stored in wrappers like `entry` into the permit itself. With this, all queues and lists in the semaphore are intrusive lists, storing permits directly, which has the following implications:
* queues no longer take extra memory, as all of them are intrusive
* permits are completely self-sufficient w.r.t to queuing: code can queue or dequeue permits just with a reference to a permit at hand, no other wrapper, iterator, pointer, etc. is necessary.
* queues don't keep permits alive anymore; destroying a permit will automatically unlink it from the respective queue, although this might lead to use-after-free. Not a problem in practice, only one code-path (`reader_concurrenc_semaphore::with_permit()`) had to be adjusted.

After all that extensive preparations, we can now handle the case of evicting a reader which is queued on memory.

Fixes: #12700

Closes #12777

* github.com:scylladb/scylladb:
  reader_concurrency_semaphore: handle reader blocked on memory becoming inactive
  reader_concurrency_semaphore: move _permit_list next to the other lists
  reader_permit: evict inactive read on timeout
  reader_concurrency_semaphore: move inactive_read to .cc
  reader_concurrency_semaphore: store permits in _inactive_reads
  reader_concurrency_semaphore: inactive_read: de-inline more methods
  reader_concurrency_semaphore: make _ready_list intrusive
  reader_permit: add wait_for_execution state
  reader_concurrency_semaphore: make wait lists intrusive
  reader_concurrency_semaphore: move most wait_queue methods out-of-line
  reader_concurrency_semaphore: store permits directly in queues
  reader_permit: introduce (private) operator * and ->
  reader_concurrency_semaphore: remove redundant waiters() member
  reader_concurrency_semaphore: add waiters counter
  reader_permit: use check_abort() for timeout
  reader_concurrency_semaphore: maybe_dump_permit_diagnostics(): remove permit list param
  reader_concurrency_semaphroe: make foreach_permit() const
  reader_permit: add get_schema() and get_op_name() accessors
  reader_concurrency_semaphore: mark maybe_dump_permit_diagnostics as noexcept
2023-03-15 20:10:19 +02:00
Kefu Chai
e21926f602 flat_mutation_reader_v2: use maybe_yield() when appropriate
just came across this part of code, as `maybe_yield()` is a wrapper
around "if should_yield(): yield()", so better off using it for more
concise code.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13107
2023-03-15 15:58:55 +02:00
Botond Dénes
6181c08191 reader_concurrency_semaphore: move inactive_read to .cc
It is not used in the header anymore and moving it to the .cc allows us
to remove the dependency on flat_mutation_reader_v2.hh.
2023-03-13 08:07:53 -04:00
Botond Dénes
bcfb8715f9 reader_permit: introduce (private) operator * and ->
Currently the reader_permit has some private methods that only the
semaphore's internal calls. But this method of communication is not
consistent, other times the semaphore accesses the permit impl directly,
calling methods on that.
This commit introduces operator * and -> for reader_permit. With this,
the semaphore internals always call the reader_permit::impl methods
direcly, either via a direct reference, or via the above operators.
This makes the permit internface a little narrower and reduces
boilerplate code.
2023-03-09 06:53:11 -05:00
Botond Dénes
2694aa1078 reader_permit: use check_abort() for timeout
Instead of having callers use get_timeout(), then compare it against the
current time, set up a timeout timer in the permit, which assigned a new
`_ex` member (a `std::exception_ptr`) to the appropriate exception type
when it fires.
Callers can now just poll check_abort() which will throw when `_ex`
is not null. This is more natural and allows for more general reasons
for aborting reads in the future.
This prepares the ground for timeouts being managed inside the permit,
instead of by the semaphore. Including timing out while in a wait queue.
2023-03-09 06:53:09 -05:00
Kefu Chai
563fbb2d11 build: cmake: extract more subsystem out into its own CMakeLists.txt
namely, cdc, compaction, dht, gms, lang, locator, mutation_writer, raft, readers, replica,
service, tools, tracing and transport.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-02 10:15:25 +08:00
Botond Dénes
46efdfa1a1 Merge 'readers/nonforwarding: don't emit partition_end on next_partition,fast_forward_to' from Gusev Petr
The series fixes the `make_nonforwardable` reader, it shouldn't emit `partition_end` for previous partition after `next_partition()` and `fast_forward_to()`

Fixes: #12249

Closes #12978

* github.com:scylladb/scylladb:
  flat_mutation_reader_test: cleanup, seastar::async -> SEASTAR_THREAD_TEST_CASE
  make_nonforwardable: test through run_mutation_source_tests
  make_nonforwardable: next_partition and fast_forward_to when single_partition is true
  make_forwardable: fix next_partition
  flat_mutation_reader_v2: drop forward_buffer_to
  nonforwardable reader: fix indentation
  nonforwardable reader: refactor, extract reset_partition
  nonforwardable reader: add more tests
  nonforwardable reader: no partition_end after fast_forward_to()
  nonforwardable reader: no partition_end after next_partition()
  nonforwardable reader: no partition_end for empty reader
  row_cache: pass partition_start though nonforwardable reader
2023-03-01 09:58:14 +02:00
Petr Gusev
989ef9d358 make_nonforwardable: next_partition and fast_forward_to when single_partition is true
This flag designates that we should consume only one
partition from the underlying reader. This means that
attempts to move to another partition should cause an EOS.
2023-02-28 23:42:34 +04:00
Petr Gusev
a67776b750 make_forwardable: fix next_partition
When next_partition is called, the buffer could
contain partition_start and possibly static_row.
In this case clear_buffer_to_next_partition will
not remove anything from the buffer and the
reader position should not change. Before this patch,
however, we used to set _end_of_stream=false,
which violated the forwardable-reader
contract - the data of the next partition
was emitted after the data of the first partition
without intermediate EOS.

This bug was found when debugging
test_make_nonforwardable_from_mutations_as_mutation_source flakiness.
A corresponding focused test_make_forwardable_next_partition
has been added to exercise this problem.
2023-02-28 23:11:45 +04:00
Petr Gusev
64427b9164 flat_mutation_reader_v2: drop forward_buffer_to
This is just a strange method I came across.
It effectively does nothing but clear_buffer().
2023-02-28 23:00:02 +04:00
Petr Gusev
a517e1d6ad nonforwardable reader: fix indentation 2023-02-28 23:00:02 +04:00
Petr Gusev
beeffb899f nonforwardable reader: refactor, extract reset_partition
No observable behaviour changes, just refactor
the code.
2023-02-28 23:00:02 +04:00
Petr Gusev
88cd1c3700 nonforwardable reader: no partition_end after fast_forward_to()
This patch fixes the problem with method fast_forward_to
which is similar to the one with next_partition, no
partition_end should be injected for the partition if
fast_forward_to was called inside it.
2023-02-28 23:00:02 +04:00
Petr Gusev
8ff96e1bce nonforwardable reader: no partition_end after next_partition()
Before the patch, nonforwardable reader injected
partition_end unconditionally. This caused problems
in case next_partition() was called, the downstream
reader might have already injected its own
partition_end marker, and the one from nonforwardable
reader was a duplicate.

Fixes: #12249
2023-02-28 23:00:02 +04:00
Petr Gusev
9c5c380b0b nonforwardable reader: no partition_end for empty reader
The patch introduces the _partition_is_open flag,
inject partition_end only if there was some data
in the input reader.

A simple unit test has been added for
the nonforwardable reader which checks this
new behaviour.
2023-02-28 22:59:56 +04:00
Kefu Chai
3ae11de204 treewide: do not define/capture unused variables
these warnings are found by Clang-17 after removing
`-Wno-unused-lambda-capture` and '-Wno-unused-variable' from
the list of disabled warnings in `configure.py`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-28 21:56:53 +08:00
Kefu Chai
0cb842797a treewide: do not define/capture unused variables
these warnings are found by Clang-17 after removing
`-Wno-unused-lambda-capture` and '-Wno-unused-variable' from
the list of disabled warnings in `configure.py`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-15 22:57:18 +02:00
Avi Kivity
69a385fd9d Introduce schema/ module
Schema related files are moved there. This excludes schema files that
also interact with mutations, because the mutation module depends on
the schema. Those files will have to go into a separate module.

Closes #12858
2023-02-15 11:01:50 +02:00
Avi Kivity
c5e4bf51bd Introduce mutation/ module
Move mutation-related files to a new mutation/ directory. The names
are kept in the global namespace to reduce churn; the names are
unambiguous in any case.

mutation_reader remains in the readers/ module.

mutation_partition_v2.cc was missing from CMakeLists.txt; it's added in this
patch.

This is a step forward towards librarization or modularization of the
source base.

Closes #12788
2023-02-14 11:19:03 +02:00
Tomasz Grabiec
ccf3a13648 range_tombstone_change_merger: Introduce peek()
Returns the current tombstone without affecting state.
2023-01-27 19:15:39 +01:00
Tomasz Grabiec
42f5a7189d readers: Extract range_tombstone_change_merger 2023-01-27 19:15:39 +01:00
Botond Dénes
1d273a98b9 readers/multishard: shard_reader::close() silence read-ahead timeouts
Timouts are benign, especially on a read-ahead that turned out to be not
needed at all. They just introduce noise in the logs, so silence them.

Fixes: #12435

Closes #12441
2023-01-04 16:10:09 +02:00
Tomasz Grabiec
23e4c83155 position_in_partition: Make after_key() work with non-full keys
This fixes a long standing bug related to handling of non-full
clustering keys, issue #1446.

after_key() was creating a position which is after all keys prefixed
by a non-full key, rather than a position which is right after that
key.

This will issue will be caught by cql_query_test::test_compact_storage
in debug mode when mutation_partition_v2 merging starts inserting
sentinels at position after_key() on preemption.

It probably already causes problems for such keys.
2022-12-14 14:47:33 +01:00
Botond Dénes
49ec7caf27 mutation_fragment_stream_validator: avoid allocation when stream is correct
Currently the ctor of said class always allocates as it copies the
provided name string and it creates a new name via format().
We want to avoid this, now that the validator is used on the read path.
So defer creating the formatted name to when we actually want to log
something, which is either when log level is debug or when an error is
found. We don't care about performance in either case, but we do care
about it on the happy path.
Further to the above, provide a constructor for string literal names and
when this is used, don't copy the name string, just save a view to it.

Refs: #11174

Closes #12042
2022-11-22 19:19:18 +02:00
Botond Dénes
5c245b4a5e mutation_fragment_stream_validator: add a 'none' validation level
Which, as its name suggests, makes the validating filter not validate
anything at all. This validation level can be used effectively to make
it so as if the validator was not there at all.
2022-11-11 09:58:44 +02:00
Botond Dénes
f1a039fc2b treewide: use ::for_partition_start() instead of ::partition_start_tag_t{}
We just added a convenience static factory method for partition start,
change the present users of the clunky constructor+tag to use it
instead.
2022-11-11 09:58:18 +02:00
Tomasz Grabiec
687df05e28 db: make_forwardable::reader: Do not emit range_tombstone_change with position past the range
Since the end bound is exclusive, the end position should be
before_key(), not after_key().

Affects only tests, as far as I know, only there we can get an end
bound which is a clustering row position.

Would cause failures once row cache is switched to v2 representation
because of violated assumptions about positions.

Introduced in 76ee3f029c

Closes #11823
2022-10-24 17:06:52 +03:00
Tomasz Grabiec
9dae2b9c02 Merge 'mutation_fragment_stream_validator: various API improvements' from Botond Dénes
The low-level `mutation_fragment_stream_validator` gets `reset()` methods that until now only the high-level `mutation_fragment_stream_validating_filter` had.
Active tombstone validation is pushed down to the low level validator.
The low level validator, which was a pain to use until now due to being very fussy on which subset of its API one used, is made much more robust, not requiring the user to stick to a subset of its API anymore.

Closes #11614

* github.com:scylladb/scylladb:
  mutation_fragment_stream_validator: make interface more robust
  mutation_fragment_stream_validator: add reset() to validating filter
  mutation_fragment_stream_validator: move active tomsbtone validation into low level validator
2022-10-03 16:23:46 +02:00
Avi Kivity
aca96c4103 readers/multishard: restore shard_reader_v2::do_fill_buffer() indentation 2022-09-30 19:19:51 +03:00
Avi Kivity
b08196f3b3 readers/multishard: convert shard_reader_v2::do_fill_buffer() to a pure coroutine
do_full_buffer() is an eclectic mix of coroutines and continuations.
That makes it hard to follow what is running sequentially and
concurrently.

Convert it into a pure coroutine by changing internal continuations
to lambda coroutines. These lambda coroutines are guarded with
seastar::coroutine::lambda. Furthermore, a future that is co_awaited
is converted to immediate co_await (without an intermediate future),
since seastar::coroutine::lambda only works if the coroutine is awaited
in the same statement it is defined on.
2022-09-30 19:19:48 +03:00
Botond Dénes
895522db23 mutation_fragment_stream_validator: make interface more robust
The validator has several API families with increasing amount of detail.
E.g. there is an `operator()(mutation_fragment_v2::kind)` and an
overload also taking a position. These different API families
currently cannot be mixed. If one uses one overload-set, one has to
stick with it, not doing so will generate false-positive failures.
This is hard to explain in documentation to users (provided they even
read it). Instead, just make the validator robust enough such that the
different API subsets can be mixed in any order. The validator will try
to make most of the situation and validate as much as possible.
Behind the scenes all the different validation methods are consolidated
into just two: one for the partition level, the other for the
intra-partition level. All the different overloads just call these
methods passing as much information as they have.
A test is also added to make sure this works.
2022-09-26 13:26:26 +03:00
Botond Dénes
4d017b6d7e mutation_fragment_stream_validator: add reset() to validating filter
Allow the high level filtering validator to be reset() to a certain
position, so it can be used in situations where the consumption is not
continuous (fast-forwarding or paging).
2022-09-26 10:17:28 +03:00
Botond Dénes
a8cbf66573 mutation_fragment_stream_validator: move active tomsbtone validation into low level validator
Currently the active range tombstone change is validated in the high
level `mutation_fragment_stream_validating_stream`, meaning that users of
the low-level `mutation_fragment_stream_validator` don't benefit from
checking that tombstones are properly closed.
This patch moves the validation down to the low-level validator (which
is what the high-level one uses under the hood too), and requires all
users to pass information about changes to the active tombstone for each
fragment.
2022-09-26 10:17:27 +03:00
Tomasz Grabiec
ccbfe2ef0d Merge 'Fix invalid mutation fragment stream issues' from Botond Dénes
Found by a fragment stream validator added to the mutation-compactor (https://github.com/scylladb/scylladb/pull/11532). As that PR moves very slowly, the fixes for the issues found are split out into a PR of their own.
The first two of these issues seems benign, but it is important to remember that how benign an invalid fragment stream is depends entirely on the consumer of said stream. The present consumer of said streams may swallow the invalid stream without problem now but any future change may cause it to enter into a corrupt state.
The last one is a non-benign problem (again because the consumer reacts badly already) causing problems when building query results for range scans.

Closes #11604

* github.com:scylladb/scylladb:
  shard_reader: do_fill_buffer(): only update _end_of_stream after buffer is copied
  readers/mutation_readers: compacting_reader: remember injected partition-end
  db/view: view_builder::execute(): only inject partition-start if needed
2022-09-22 17:57:27 +02:00
Botond Dénes
0ccb23d02b shard_reader: do_fill_buffer(): only update _end_of_stream after buffer is copied
Commit 8ab57aa added a yield to the buffer-copy loop, which means that
the copy can yield before done and the multishard reader might see the
half-copied buffer and consider the reader done (because
`_end_of_stream` is already set) resulting in the dropping the remaining
part of the buffer and in an invalid stream if the last copied fragment
wasn't a partition-end.

Fixes: #11561
2022-09-22 13:54:36 +03:00
Botond Dénes
16a0025dc3 readers/mutation_readers: compacting_reader: remember injected partition-end
Currently injecting a partition-end doesn't update
`_last_uncompacted_kind`, which will allow for a subsequent
`next_partition()` call to trigger injecting a partition-end, leading to
an invalid mutation fragment stream (partition-end after partition-end).
Fix by changing `_last_uncompacted_kind` to `partition_end` when
injecting a partition-end, making subsequent injection attempts noop.

Fixes: #11608
2022-09-22 13:54:36 +03:00
Botond Dénes
ef7471c460 readers/mutation_reader: stream validator: fix log level detection logic
The mutation fragment stream validator filter has a detailed debug log
in its constructor. To avoid putting together this message when the log
level is above debug, it is enclosed in an if, activated when log level
is debug or trace... at least that was intended. Actually the if is
activated when the log level is debug or above (info, warn or error) but
is only actually logged if the log level is exactly debug. Fix the logic
to work as intended.

Closes #11603
2022-09-22 09:41:45 +03:00