The only user is row level repair: it is replaced with
downgrade_to_v1(make_empty_flat_reader_v2()). The row level reader has
lots of downgrade_to_v1() calls, we will deal with these later all at
once.
Another use is the empty mutation source, this is trivially converted to
use the v2 variant.
"
First migrate all users to the v2 variant, all of which are tests.
However, to be able to properly migrate all tests off it, a v2 variant
of the restricted reader is also needed. All restricted reader users are
then migrated to the freshly introduced v2 variant and the v1 variant is
removed.
Users include:
* replica::table::make_reader_v2()
* streaming_virtual_table::as_mutation_source()
* sstables::make_reader()
* tests
This allows us to get rid of a bunch of conversions on the query path,
which was mostly v2 already.
With a few tests we did kick the can down the road by wrapping the v2
reader in `downgrade_to_v1()`, but this series is long enough already.
Tests: unit(dev), unit(boost/flat_mutation_reader_test:debug)
"
* 'remove-reader-from-mutations-v1/v3' of https://github.com/denesb/scylla:
readers: remove now unused v1 reader from mutations
test: move away from v1 reader from mutations
test/boost/mutation_reader_test: use fragment_scatterer
test/boost/mutation_fragment_test: extract fragment_scatterer into a separate hh
test/boost: mutation_fragment_test: refactor fragment_scatterer
readers: remove now unused v1 reversing reader
test/boost/flat_mutation_reader_test: convert to v2
frozen_mutation: fragment_and_freeze(): convert to v2
frozen_mutation: coroutinize fragment_and_freeze()
readers: migrate away from v1 reversing reader
db/virtual_table: use v2 variant of reversing and forwardable readers
replica/table: use v2 variant of reversing reader
sstables/sstable: remove unused make_crawling_reader_v1()
sstables/sstable: remove make_reader_v1()
readers: add v2 variant of reversing reader
readers/reversing: remove FIXME
readers: reader from mutations: use mutation's own schema when slicing
In most files it was unused. We should move these to the patch which
moved out the last interesting reader from mutation_reader.hh (and added
the corresponding new header include) but its probably not worth the
effort.
Some other files still relied on mutation_reader.hh to provide reader
concurrency semaphore and some other misc reader related definitions.
The flat_mutation_reader files were conflated and contained multiple
readers, which were not strictly necessary. Splitting optimizes both
iterative compilation times, as touching rarely used readers doesn't
recompile large chunks of codebase. Total compilation times are also
improved, as the size of flat_mutation_reader.hh and
flat_mutation_reader_v2.hh have been reduced and those files are
included by many file in the codebase.
With changes
real 29m14.051s
user 168m39.071s
sys 5m13.443s
Without changes
real 30m36.203s
user 175m43.354s
sys 5m26.376s
Closes#10194
"
Also convert the foreign_reader used by it in the process.
Tests: unit(dev)
"
* 'multishard-writer-v2/v1' of https://github.com/denesb/scylla:
mutation_writer/multishard_writer: remove now unused v1 factory overloads
test/boost/mutation_writer_test: test the v2 variant of distribute_reader_and_consume_on_shards()
flat_mutation_reader: add v2 variant of make_generating_reader()
mutation_reader: multishard_writer: migrate implementation to v2
mutation_reader: convert foreign_reader to v2
streaming/consumer: convert to v2
mutation_writer/multishard_writer: add v2 variant of distribute_reader_and_consume_on_shards()
Memtables are a replica-side entity, and so are moved to the
replica module and namespace.
Memtables are also used outside the replica, in two places:
- in some virtual tables; this is also in some way inside the replica,
(virtual readers are installed at the replica level, not the
cooordinator), so I don't consider it a layering violation
- in many sstable unit tests, as a convenient way to create sstables
with known input. This is a layering violation.
We could make memtables their own module, but I think this is wrong.
Memtables are deeply tied into replica memory management, and trying
to make them a low-level primitive (at a lower level than sstables) will
be difficult. Not least because memtables use sstables. Instead, we
should have a memtable-like thing that doesn't support merging and
doesn't have all other funky memtable stuff, and instead replace
the uses of memtables in sstable tests with some kind of
make_flat_mutation_reader_from_unsorted_mutations() that does
the sorting that is the reason for the use of memtables in tests (and
live with the layering violation meanwhile).
Test: unit (dev)
Closes#10120
One is a reincarnation of the recently removed
test_multishard_combining_reader_non_strictly_monotonic_positions. The
latter was actually targeting the evictable reader but through the
multishard reader, probably for historic reasons (evictable reader was
part of the multishard reader family).
The other one checks that active range tombstones changes are properly
terminated when the partition ends abruptly after recreating the reader.
Before we add a v2 output option to the compactor, we want to get rid of
all the v1 inputs to make it simpler. This means that for a while the
compacting reader will be in a strange place of having a v2 input and a
v1 output. Hopefully, not for long.
Fast forwarding is delegated to the underlying reader and assumes the
it's supported. The only corner case requiring special handling that has
shown up in the tests is producing partition start mutation in the
forwarding case if there are no other fragments.
compacting state keeps track of uncompacted partition start, but doesn't
emit it by default. If end of stream is reached without producing a
mutation fragment, partition start is not emitted. This is invalid
behaviour in the forwarding case, so I've added a public method to
compacting state to force marking partition as non-empty. I don't like
this solution, as it feels like breaking an abstraction, but I didn't
come across a better idea.
Tests: unit(dev, debug, release)
Message-Id: <20220128131021.93743-1-mikolaj.sieluzycki@scylladb.com>
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes we applied mechanically with a script, except to
licenses/README.md.
Closes#9937
Move replica-oriented classes to the replica namespace. The main
classes moved are ::database, ::keyspace, and ::table, but a few
ancillary classes are also moved. There are certainly classes that
should be moved but aren't (like distributed_loader) but we have
to start somewhere.
References are adjusted treewide. In many cases, it is obvious that
a call site should not access the replica (but the data_dictionary
instead), but that is left for separate work.
scylla-gdb.py is adjusted to look for both the new and old names.
The database, keyspace, and table classes represent the replica-only
part of the objects after which they are named. Reading from a table
doesn't give you the full data, just the replica's view, and it is not
consistent since reconciliation is applied on the coordinator.
As a first step in acknowledging this, move the related files to
a replica/ subdirectory.
The gc_grace_seconds is a very fragile and broken design inherited from
Cassandra. Deleted data can be resurrected if cluster wide repair is not
performed within gc_grace_seconds. This design pushes the job of making
the database consistency to the user. In practice, it is very hard to
guarantee repair is performed within gc_grace_seconds all the time. For
example, repair workload has the lowest priority in the system which can
be slowed down by the higher priority workload, so that there is no
guarantee when a repair can finish. A gc_grace_seconds value that is
used to work might not work after data volume grows in a cluster. Users
might want to avoid running repair during a specific period where
latency is the top priority for their business.
To solve this problem, an automatic mechanism to protect data
resurrection is proposed and implemented. The main idea is to remove the
tombstone only after the range that covers the tombstone is repaired.
In this patch, a new table option tombstone_gc is added. The option is
used to configure tombstone gc mode. For example:
1) GC a tombstone after gc_grace_seconds
cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'timeout'} ;
This is the default mode. If no tombstone_gc option is specified by the
user. The old gc_grace_seconds based gc will be used.
2) Never GC a tombstone
cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'disabled'};
3) GC a tombstone immediately
cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'immediate'};
4) GC a tombstone after repair
cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'repair'};
In addition to the 'mode' option, another option 'propagation_delay_in_seconds'
is added. It defines the max time a write could possibly delay before it
eventually arrives at a node.
A new gossip feature TOMBSTONE_GC_OPTIONS is added. The new tombstone_gc
option can only be used after the whole cluster supports the new
feature. A mixed cluster works with no problem.
Tests: compaction_test.py, ninja test
Fixes#3560
[avi: resolve conflicts vs data_dictionary]
Currently the test assumes that fragments represent weakly monotonic
upper bounds and therefore unconditionally overwrites the upper-bound on
receiving each fragment. Range tombstones however violate this as a
range tombstone with a smaller position (lower bound) may have a higher
upper bound than some or all fragments that follow it in the stream.
This causes test failures after the converting the combined reader to
v2, but not before, no idea why.
As part of the drive to move over to flat_mutation_reader_v2, update
make_filtering_reader(). Since it doesn't examine range tombstones
(only the partition_start, to filter the key) the entire patch
is just glue code upgrading and downgrading users in the pipeline
(or removing a conversion, in one case).
Test: unit (dev)
Closes#9723
Add schema parameter so that:
* Caller has better control over schema -- especially relevant for
reverse reads where it is not possible to follow the convention of
passing the query schema which is reversed compared to that of the
mutations.
* Now that we don't depend on the mutations for the schema, we can lift
the restriction on mutations not being empty: this leads to safer
code. When the mutations parameter is empty, an empty reader is
created.
Add "make_" prefix to follow convention of similar reader factory
functions.
Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20211115155614.363663-1-bdenes@scylladb.com>
For reversed reads we must adjust the lower/upper bounds used by the
`position_reader_queue` and `clustering_combined_reader`. The bounds are
calculated using the mutation schema, but we need bounds calculated
using the query schema which is reversed.
The two might not be the same in case the schema was upgraded or if we
are reading in reverse. It is important to use the passed-in query
schema consistently during a read.
`time_series_sstable_set` uses `clustering_combined_reader` to implement
efficient single-partition reads. It provides a `position_reader_queue`
to the reader. This queue returns readers to the sstables from the set
in order of the sstables' lower bounds, and with each reader it provides
an upper bound for the positions-in-partition returned by the reader.
Until now we would assume non-reversed queries only. Reversed queries
were implemented by performing forward query in the lower layers
and reversing the results at the upper-most layer of the reader stack.
Before pushing the reversing down to the sources (in particular,
to sstable readers), we need to support the reverse mode in
`time_series_sstable_set` and the queue it provides to
`clustering_combined_reader`.
This requires using different lower and upper bounds in the queue.
For non-reversed reads we used `sstable::min_position()` as the lower
bound and `sstable::max_position()` as the upper bound. For reversed
reads all comparisons performed by `clustering_combined_reader` will be
reversed, as it will use a reversed schema. We can then use
`sstable::max_position().reversed()` for the lower bound and
`sstable::min_position().reversed()` for the upper bound.
It's easy to forget about supplying the correct value for a parameter
when it has a default value specified. It's safer if 'production code'
is forced to always supply these parameters manually.
The default values were mostly useful in tests, where some parameters
didn't matter that much and where the majority of uses of the class are.
Without default values adding a new parameter is a pain, forcing one to
modify every usage in the tests - and there are a bunch of them. To
solve this, we introduce a new constructor which requires passing the
`for_tests` tag, marking that the constructor is only supposed to be
used in tests (and the constructor has an appropriate comment). This
constructor uses default values, but the other constructors - used in
'production code' - do not.
The SEASTAR_THREAD_TEST_CASE runs the provided lambda in async
context. The sstables::test_env::run_with_async does the same.
This (script-generated) patch makes all of the found cases be
SEASTAR_TEST_CASE and, respectively, return the async future
from the run_with_async().
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Rename the old version to `sstables::make_reader_v1()`, to have a
nicely searcheable eradication target.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
gcc 10.3.1 spews the following error:
```
_test_generator::generate_scenario(std::mt19937&) const’:
test/boost/mutation_reader_test.cc:3731:28: error: comparison of integer expressions of different signedness: ‘int’ and ‘long unsigned int’ [-Werror=sign-compare]
3731 | for (auto i = 0; i < num_ranges; ++i) {
| ~~^~~~~~~~~~~~
```
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210728073538.2467040-1-bhalevy@scylladb.com>
Soon we will switch to up-front admission which will break these tests.
No point in trying to fix them as once the switch is done we'll retire
the restricted reader too. Remove these tests now so they are not in the
way of progress.