Currently the ctor of said class always allocates, as it copies the
provided name string and creates a new name via format().
We want to avoid this, now that the validator is used on the read path.
So defer creating the formatted name to when we actually want to log
something, which is either when the log level is debug or when an error
is found. We don't care about performance in either of those cases, but
we do care about it on the happy path.
Further to the above, provide a constructor for string-literal names;
when it is used, don't copy the name string, just save a view to it.
Refs: #11174
Closes #12042
Which, as its name suggests, makes the validating filter not validate
anything at all. This validation level can be used to effectively
disable the validator, as if it were not there at all.
Since the end bound is exclusive, the end position should be
before_key(), not after_key().
Affects only tests; as far as I know, only there can we get an end
bound which is a clustering row position.
Would cause failures once row cache is switched to the v2
representation, because of violated assumptions about positions.
Introduced in 76ee3f029c
Closes #11823
The low-level `mutation_fragment_stream_validator` gets `reset()` methods that until now only the high-level `mutation_fragment_stream_validating_filter` had.
Active tombstone validation is pushed down to the low level validator.
The low-level validator, which was a pain to use until now due to being very fussy about which subset of its API one used, is made much more robust, no longer requiring the user to stick to a subset of its API.
Closes #11614
* github.com:scylladb/scylladb:
mutation_fragment_stream_validator: make interface more robust
mutation_fragment_stream_validator: add reset() to validating filter
mutation_fragment_stream_validator: move active tombstone validation into low level validator
do_fill_buffer() is an eclectic mix of coroutines and continuations.
That makes it hard to follow what runs sequentially and what runs
concurrently.
Convert it into a pure coroutine by changing internal continuations
to lambda coroutines. These lambda coroutines are guarded with
seastar::coroutine::lambda. Furthermore, a future that is co_awaited
is converted to an immediate co_await (without an intermediate future),
since seastar::coroutine::lambda only works if the coroutine is awaited
in the same statement it is defined in.
The validator has several API families with increasing amount of detail.
E.g. there is an `operator()(mutation_fragment_v2::kind)` and an
overload also taking a position. These different API families
currently cannot be mixed: if one uses one overload-set, one has to
stick with it, otherwise false-positive failures are generated.
This is hard to explain in documentation to users (provided they even
read it). Instead, just make the validator robust enough that the
different API subsets can be mixed in any order. The validator will
make the most of the situation and validate as much as possible.
Behind the scenes all the different validation methods are consolidated
into just two: one for the partition level, the other for the
intra-partition level. All the different overloads just call these
methods passing as much information as they have.
A test is also added to make sure this works.
Allow the high level filtering validator to be reset() to a certain
position, so it can be used in situations where the consumption is not
continuous (fast-forwarding or paging).
Currently the active range tombstone change is validated in the high-level
`mutation_fragment_stream_validating_filter`, meaning that users of
the low-level `mutation_fragment_stream_validator` don't benefit from
checking that tombstones are properly closed.
This patch moves the validation down to the low-level validator (which
is what the high-level one uses under the hood too), and requires all
users to pass information about changes to the active tombstone for each
fragment.
Found by a fragment stream validator added to the mutation-compactor (https://github.com/scylladb/scylladb/pull/11532). As that PR moves very slowly, the fixes for the issues found are split out into a PR of their own.
The first two of these issues seem benign, but it is important to remember that how benign an invalid fragment stream is depends entirely on the consumer of said stream. The present consumer of said streams may swallow the invalid stream without problems now, but any future change may cause it to enter a corrupt state.
The last one is a non-benign problem (the consumer already reacts badly), causing problems when building query results for range scans.
Closes #11604
* github.com:scylladb/scylladb:
shard_reader: do_fill_buffer(): only update _end_of_stream after buffer is copied
readers/mutation_readers: compacting_reader: remember injected partition-end
db/view: view_builder::execute(): only inject partition-start if needed
Commit 8ab57aa added a yield to the buffer-copy loop, which means that
the copy can yield before it is done, and the multishard reader might
see the half-copied buffer and consider the reader done (because
`_end_of_stream` is already set), resulting in dropping the remaining
part of the buffer and in an invalid stream if the last copied fragment
wasn't a partition-end.
Fixes: #11561
Currently injecting a partition-end doesn't update
`_last_uncompacted_kind`, which allows a subsequent
`next_partition()` call to inject another partition-end, leading to
an invalid mutation fragment stream (partition-end after partition-end).
Fix by setting `_last_uncompacted_kind` to `partition_end` when
injecting a partition-end, making subsequent injection attempts a noop.
Fixes: #11608
The mutation fragment stream validating filter has a detailed debug log
in its constructor. To avoid putting together this message when the log
level is above debug, it is enclosed in an if, meant to activate when
the log level is debug or trace... at least that was the intention. In
reality the if activates when the log level is debug or above (info,
warn or error), yet the message is only actually logged if the log
level is exactly debug. Fix the logic to work as intended.
Closes #11603
Today, mutation_reader_merger drops unneeded readers in batches of 4,
meaning that the merger has to keep the memory used by up to 3
unneeded readers in addition to the ones being currently read from.
As each may own a lot of memory, the combined effect of this waste,
coming from parallel reads, can potentially cause memory pressure.
This batching behavior was introduced in b524f96a74,
when readers had to be destroyed synchronously, as flat_mutation_reader
lacked an async close interface. But we have gone a long way since
then. Readers can be closed asynchronously and outstanding I/O
requests will be cancelled on close.
Now, we close readers as soon as they're unneeded, one at a time,
using a continuation chain. If we're submitting close calls faster
than we can retire them, we wait for their completion,
preventing memory usage from growing unbounded.
This new approach is most beneficial when combining
disjoint readers, where only one is active at a time producing
fragments. As soon as we're done with the current one, it is
closed, allowing its memory to be released before we move on
to the reader that follows.
Refs #11040.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes #11167
Now that we use emit_only_live_rows::no everywhere we can remove this
template parameter. Only the template parameter is removed; the
internal logic around it is left in place (to be removed in a later
patch) by hard-wiring `only_live()`.
At this point, none of the remaining uses of
`flat_mutation_reader` (all of which are results of calling
`downgrade_to_v1()` anyway) actually need a full-featured flat
mutation reader with its own separate buffer etc.
`mutation_fragment_v1_stream` can only be constructed by wrapping a
`flat_mutation_reader_v2`. It contains enough functionality for the
remaining consumers of `mutation_fragment_v1` sources and unit tests,
and no more, and it does not buffer.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
The handle must not point at this reader implementation
after it's destroyed.
This fixes a use-after-free when the queue_reader_v2
is destroyed first, as repair_writer_impl::_queue_reader,
before repair_writer_impl::_mq is destroyed.
The issue was introduced in 39205917a8
in the definition of `repair_writer_impl`.
Fixes #10528
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This series gets rid of the remaining usage of flat_mutation_reader v1 in compaction
Test: sstable_compaction_test
Closes #10454
* github.com:scylladb/scylla:
compaction: sanitize headers from flat_mutation_reader v1
flat_mutation_reader: get rid of class filter
compaction: cleanup_compaction: make_partition_filter: return flat_mutation_reader_v2::filter
The only user is the tests of downgrade_to_v1(), which use it through
mutation source. To avoid any new users popping up, we make it a private
method of the latter. In the process the pass-through optimization is
dropped; it is not needed for tests anyway.
The only user is row level repair: it is replaced with
downgrade_to_v1(make_empty_flat_reader_v2()). The row level reader has
lots of downgrade_to_v1() calls; we will deal with all of these later,
at once.
Another use is the empty mutation source, this is trivially converted to
use the v2 variant.
"
cache_flat_mutation_reader gets a native v2 implementation. The
underlying mutation representation is not changed: range deletions are
still stored as v1 range_tombstones in mutation_partition. These are
converted to range tombstone changes during reading.
This allows for separating the change of a native v2 reader
implementation and a native v2 in-memory storage format, enabling the
two to be done at separate times and incrementally.
This means there is still conversion going on when reading from cache
and when populating, but when reading from underlying, the stream can
now be passed through as-is, without conversions.
Also, any future v2 related changes to the in-memory storage will now be
limited to the cache reader implementation itself.
In the process, the non-forwarding reader, whose only user is the cache,
is also converted to v2.
"
Performance results reported by Botond:
"
build/release/test/perf/perf_simple_query -c1 -m2G --flush --
duration=20
BEFORE
median 130421.76 tps ( 71.1 allocs/op, 12.1 tasks/op, 47462
insns/op)
median absolute deviation: 319.64
maximum: 131028.33
minimum: 127502.55
AFTER
median 133297.41 tps ( 64.1 allocs/op, 12.2 tasks/op, 45406
insns/op)
median absolute deviation: 2964.24
maximum: 137581.56
minimum: 123739.4
Getting rid of those upgrades/downgrades was good for allocs and instructions.
Curiously, there is a 0.1 rise in the number of tasks per op though.
"
* 'row-cache-readers-v2/v1' of https://github.com/denesb/scylla:
row_cache: update reader implementations to v2
range_tombstone_change_generator: flush(): add end_of_range
readers/nonforwardable: convert to v2
read_context: fix indentation
read_context: coroutinize move_to_next_partition()
row_cache: cache_entry::read(): return v2 reader
row_cache: return v2 readers from make_reader*()
readers/delegating_v2: s/make_delegating_reader_v2/make_delegating_reader/
It has a single user, the row cache, which for now has to
upgrade/downgrade around the nonforwardable reader, but this will go
away in the next patches when the row cache readers are converted to v2
proper.
The patchset embeds the mutation_fragment upgrading logic from v1 to v2 into the mutation_fragment_queue. This way the mutation fragments coming into the mutation_fragment_queue can be v1, while the underlying queue_reader receives mutation_fragment_v2, eliminating the last usage of queue_reader (v1). The last commit removes queue_reader, queue_reader_handle and the associated factory functions.
tests: unit(dev), dtest(incremental_repair_test, read_repair_test, repair_additional_test, repair_test)
Closes #10371
* github.com:scylladb/scylla:
readers: Remove queue_reader v1 and associated code.
repair: Make mutation_fragment_queue internally upgrade fragments to v2
repair: Make mutation_fragment_queue::impl a seastar::shared_ptr
It is typical in switch statements to select on an enum type and
rely on the compiler to complain if an enum value was missed. But
gcc isn't satisfied, since the enum could hold a value outside the
declared list. Call abort() in this impossible situation to pacify
it.
Extract the common parts of the single mutation reader
and the vector-based variant into mutation_reader_base
and reuse from both readers.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
slice_mutations() is currently used only by readers/mutation_readers.cc
so there's no need to expose it.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When flushing range tombstones up to
position_in_partition::after_all_clustered_rows(),
the range_tombstone_change_generator now emits
the closing range_tombstone_change, so there's
no need for the upgrading_consumer to do so too.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
There are at least one actual and one potential user for it; this
change converts the existing one.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
"
First migrate all users to the v2 variant, all of which are tests.
However, to be able to properly migrate all tests off it, a v2 variant
of the restricted reader is also needed. All restricted reader users are
then migrated to the freshly introduced v2 variant and the v1 variant is
removed.
Users include:
* replica::table::make_reader_v2()
* streaming_virtual_table::as_mutation_source()
* sstables::make_reader()
* tests
This allows us to get rid of a bunch of conversions on the query path,
which was mostly v2 already.
With a few tests we did kick the can down the road by wrapping the v2
reader in `downgrade_to_v1()`, but this series is long enough already.
Tests: unit(dev), unit(boost/flat_mutation_reader_test:debug)
"
* 'remove-reader-from-mutations-v1/v3' of https://github.com/denesb/scylla:
readers: remove now unused v1 reader from mutations
test: move away from v1 reader from mutations
test/boost/mutation_reader_test: use fragment_scatterer
test/boost/mutation_fragment_test: extract fragment_scatterer into a separate hh
test/boost: mutation_fragment_test: refactor fragment_scatterer
readers: remove now unused v1 reversing reader
test/boost/flat_mutation_reader_test: convert to v2
frozen_mutation: fragment_and_freeze(): convert to v2
frozen_mutation: coroutinize fragment_and_freeze()
readers: migrate away from v1 reversing reader
db/virtual_table: use v2 variant of reversing and forwardable readers
replica/table: use v2 variant of reversing reader
sstables/sstable: remove unused make_crawling_reader_v1()
sstables/sstable: remove make_reader_v1()
readers: add v2 variant of reversing reader
readers/reversing: remove FIXME
readers: reader from mutations: use mutation's own schema when slicing
The only internal user is the v1 reader-from-mutations; we use a
downgrade/upgrade pair to be able to use the v2 reversing reader there.
This is ugly, but the v1 reader from mutations is going away soon too,
so it is not a real problem.