"
Also convert the foreign_reader used by it in the process.
Tests: unit(dev)
"
* 'multishard-writer-v2/v1' of https://github.com/denesb/scylla:
mutation_writer/multishard_writer: remove now unused v1 factory overloads
test/boost/mutation_writer_test: test the v2 variant of distribute_reader_and_consume_on_shards()
flat_mutation_reader: add v2 variant of make_generating_reader()
mutation_reader: multishard_writer: migrate implementation to v2
mutation_reader: convert foreign_reader to v2
streaming/consumer: convert to v2
mutation_writer/multishard_writer: add v2 variant of distribute_reader_and_consume_on_shards()
In functions such as upgrade_to_v2 (excerpt below), if the constructor
of transforming_reader throws, r needs to be destroyed, however it
hasn't been closed. However, if a reader didn't start any operations, it
is safe to destruct such a reader. This issue can potentially manifest
itself in many more readers and might be hard to track down. This commit
adds a bool indicating whether a close is anticipated, thus avoiding
errors in the destructor.
Code excerpt:
flat_mutation_reader_v2 upgrade_to_v2(flat_mutation_reader r) {
class transforming_reader : public flat_mutation_reader_v2::impl {
// ...
};
return make_flat_mutation_reader_v2<transforming_reader>(std::move(r));
}
Fixes#9065.
Currently all concepts are in mutation_fragment_v2.hh and
flat_mutation_reader_v2.hh. Organize concepts similar to how the v1 ones
are: move high-level consume concepts into
mutation_consumer_concepts.hh.
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes we applied mechanically with a script, except to
licenses/README.md.
Closes#9937
"
With this series the mutation compactor can now consume a v2 stream. On
the output side it still uses v1, so it can now act as an online
v2->v1 converter. This allows us to push out v2->v1 conversion to as far
as the compactor, usually the next to last component in a read pipeline,
just before the final consumer. For reads this is as far as we can go,
as the intra-node ABI and hence the result-sets built are v1. For
compaction we could go further and eliminate conversion altogether, but
this requires some further work on both the compactor and the sstable
writer and so it is left to be done later.
To summarize, this patchset enables a v2 input for the compactor and it
updates compaction and single partition reads to use it.
"
* 'mutation-compactor-consume-v2/v1' of https://github.com/denesb/scylla:
table: add make_reader_v2()
querier: convert querier_cache and {data,mutation}_querier to v2
compaction: upgrade compaction::make_interposer_consumer() to v2
mutation_reader: remove unecessary stable_flattened_mutations_consumer
compaction/compaction_strategy: convert make_interposer_consumer() to v2
mutation_writer: migrate timestamp_based_splitting_writer to v2
mutation_writer: migrate shard_based_splitting_writer to v2
mutation_writer: add v2 clone of feed_writer and bucket_writer
flat_mutation_reader_v2: add reader_consumer_v2 typedef
mutation_reader: add v2 clone of queue_reader
compact_mutation: make start_new_page() independent of mutation_fragment version
compact_mutation: add support for consuming a v2 stream
compact_mutation: extract range tombstone consumption into own method
range_tombstone_assembler: add get_range_tombstone_change()
range_tombstone_assembler: add get_current_tombstone()
Mechanical changes and a resulting downgrade in one caller (which is
itself converted later).
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
Since this reader is also heavily used by tests to compose a specific
workload for readers above it, we just add a v2 variant, instead of
changing the existing v1 one.
The v2 variant was written from scratch to have built-in support for
reading in reverse. It is built-on `mutation::consume()` to avoid
duplicating the logic of consuming the contents of the mutation.
A v2 native unit test is also added.
When asked to upgrade a reader that itself is a downgrade, try to
return the original v2 reader instead, and likewise when downgrading
upgraded v1 readers.
This is desirable because version transformations can result from,
say, entering/leaving a reader concurrency semaphore, and the amount
of such transformations is practically unbounded.
Such short-circuiting is only done if it is safe, that is: the
transforming reader's buffer is empty and its internal range tombstone
tracking state is discardable.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
The main difference compared to v1 (apart from having _v2 suffix at
relevant places) is how slicing and reversing works. The v2 variant has
native reverse support built-in because the reversing reader is not
something we want to convert to v2.
A native v2 mutation-source test is also added.
Not a completely straightforward conversion as the v2 version has to
make sure to emit the current range tombstone change after
fast_forward_to() (if it changes compared to the current one before fast
forwarding).
Changes are around the two new members `_tombstone_to_emit` and
`maybe_emit_tombstone()`.
The timeout needs to be propagated to the reader's permit.
Reset it to db::no_timeout in repair_reader::pause().
Warn if set_timeout asks to change the timeout too far into the
past (100ms). It is possible that it will be passed a
past timeout from the rcp path, where the message timeout
is applied (as duration) over the local lowres_clock time
and parallel read_data messages that share the query may end
up having close, but different timeout values.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Since detach_buffer is used before closing and
destroying the reader, we want to mark it as noexcept
to simply the caller error handling.
Currently, although it does construct a new circular_buffer,
none of the constructors used may throw.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210617114240.1294501-2-bhalevy@scylladb.com>
detach_buffer exchanges the current _buffer with
a new buffer constructed using the circular_buffer(Alloc)
constructor. The compiler implicitly constructs a
tracking_allocator(reader_permit) and passes it
to the circular_buffer constructor.
This patch just makes that explicit so it would be
clearer to the reader what's going on here.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210617114240.1294501-1-bhalevy@scylladb.com>
Currently, flat_mutation_reader_v2::is_end_of_stream() returns
flat_mutation_reader_v2::impl::_end_of_stream, which means the producer
is done. The stream may be still not yet fully consumed even if
producer is done, due to internal buffering. So consumers need to make
a more elaborate check:
rd.is_end_of_stream() && rd.is_buffer_empty()
It would be cleaner if flat_mutation_reader_v2::is_end_of_stream()
returned the state of the consumer-side of the stream, since it
belongs to the consumer-side of the API. The consumption will be as
simple as:
while (!rd.is_end_of_stream()) {
consume_fragment(rd());
}
This patch makes the change on the v2 of the reader interface. v1 is
not changed to avoid problems which could happen when backporting code
which assumes new semantics into a version with the old semantics. v2
is not in any old branch yet so it doesn't have this problem and it's
a good time to make the API change.
Note that it's always safe to use the new semantics in the context
which assumes the old semantics, so v1 users can be safely converted
to v2 even if they are unware of the change.
Fixes#3067
Message-Id: <20210715102833.146914-1-tgrabiec@scylladb.com>
This warning prevents using std::move() where it can hurt
- on an unnamed temporary or a named automatic variable being
returned from a function. In both cases the value could be
constructed directly in its final destination, but std::move()
prevents it.
Fix the handful of cases (all trivial), and enable the warning.
Closes#8992
Check if the timeout has expired before issuing I/O.
Note that the sstable reader input_stream is not closed
when the timeout is detected. The reader must be closed anyhow after
the error bubbles up the chain of readers and before the
reader is destroyed. This might already happen if the reader
times out while waiting for reader_concurrency_semaphore admission.
Test: unit(dev), auth_test.test_alter_with_timeouts(debug)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210624073232.551735-1-bhalevy@scylladb.com>
The transition to v2 will be incremental. To support that, we need
adaptors between v1 and v2 which will be inserted at places which are
boundaries of conversion.
The v1 -> v2 converter needs to accumulate range tombstones, so has unbounded worst case memory footprint.
The v2 -> v1 converter trims range tombstones around clustering rows,
so generates more fragments than necessary.
Because of that, adpators are a temporary solution and we should not
release with them on the produciton code paths.
When compacting a mutation fragment stream (e.g. for sstable
compaction, data query, repair), the compactor needs to accumulate
range tombstones which are relevant for the yet-to-be-processed range.
See range_tombstone_accumulator. One problem is that it has unbounded
memory footprint because the accumulator needs to keep track of all
the tombstoned ranges which are still active.
Another, although more benign, problem is computational complexity
needed to maintain that data structure.
The fix is to get rid of the overlap of range tombstones in the
mutation fragment stream. In v2 of the stream, there is no longer a
range_tombstone fragment. Deletions of ranges of rows within a given
partition are represented with range_tombstone_change fragments. At
any point in the stream there is a single active clustered
tombstone. It is initially equal to the neutral tombstone when the
stream of each partition starts. The range_tombstone_change fragment
type signify changes of the active clustered tombstone. All fragments
emitted while a given clustered tombstone is active are affected by
that tombstone. Like with the old range_tombstone fragments, the
clustered tombstone is independent from the partition tombstone
carried in partition_start.
The v2 stream is strict about range tombstone trimming. It emits range
tombstone changes which reflect range tombstones trimmed to query
restrictions, and fast-forwarding ranges. This makes the stream more
canonical, meaning that for a given set of writes, querying the
database should produce the same stream of fragments for a given
restrictions. There is less ambiguity in how the writes are
represented in the fragment stream. It wasn't the case with v1. For
example, A given set of deletions could be produced either as one
range_tombstone, or may, split and/or deoverlapped with other
fragments. Making a stream canonical is easier for diff-calculating.
The classes related to mutation fragment streams were cloned:
flat_mutation_reader_v2, mutation_fragment_v2, and related concepts.
Refs #8625.