Fast forwarding is delegated to the underlying reader and assumes the
it's supported. The only corner case requiring special handling that has
shown up in the tests is producing partition start mutation in the
forwarding case if there are no other fragments.
compacting state keeps track of uncompacted partition start, but doesn't
emit it by default. If end of stream is reached without producing a
mutation fragment, partition start is not emitted. This is invalid
behaviour in the forwarding case, so I've added a public method to
compacting state to force marking partition as non-empty. I don't like
this solution, as it feels like breaking an abstraction, but I didn't
come across a better idea.
Tests: unit(dev, debug, release)
Message-Id: <20220128131021.93743-1-mikolaj.sieluzycki@scylladb.com>
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes we applied mechanically with a script, except to
licenses/README.md.
Closes#9937
Move replica-oriented classes to the replica namespace. The main
classes moved are ::database, ::keyspace, and ::table, but a few
ancillary classes are also moved. There are certainly classes that
should be moved but aren't (like distributed_loader) but we have
to start somewhere.
References are adjusted treewide. In many cases, it is obvious that
a call site should not access the replica (but the data_dictionary
instead), but that is left for separate work.
scylla-gdb.py is adjusted to look for both the new and old names.
The database, keyspace, and table classes represent the replica-only
part of the objects after which they are named. Reading from a table
doesn't give you the full data, just the replica's view, and it is not
consistent since reconciliation is applied on the coordinator.
As a first step in acknowledging this, move the related files to
a replica/ subdirectory.
The gc_grace_seconds is a very fragile and broken design inherited from
Cassandra. Deleted data can be resurrected if cluster wide repair is not
performed within gc_grace_seconds. This design pushes the job of making
the database consistency to the user. In practice, it is very hard to
guarantee repair is performed within gc_grace_seconds all the time. For
example, repair workload has the lowest priority in the system which can
be slowed down by the higher priority workload, so that there is no
guarantee when a repair can finish. A gc_grace_seconds value that is
used to work might not work after data volume grows in a cluster. Users
might want to avoid running repair during a specific period where
latency is the top priority for their business.
To solve this problem, an automatic mechanism to protect data
resurrection is proposed and implemented. The main idea is to remove the
tombstone only after the range that covers the tombstone is repaired.
In this patch, a new table option tombstone_gc is added. The option is
used to configure tombstone gc mode. For example:
1) GC a tombstone after gc_grace_seconds
cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'timeout'} ;
This is the default mode. If no tombstone_gc option is specified by the
user. The old gc_grace_seconds based gc will be used.
2) Never GC a tombstone
cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'disabled'};
3) GC a tombstone immediately
cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'immediate'};
4) GC a tombstone after repair
cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'repair'};
In addition to the 'mode' option, another option 'propagation_delay_in_seconds'
is added. It defines the max time a write could possibly delay before it
eventually arrives at a node.
A new gossip feature TOMBSTONE_GC_OPTIONS is added. The new tombstone_gc
option can only be used after the whole cluster supports the new
feature. A mixed cluster works with no problem.
Tests: compaction_test.py, ninja test
Fixes#3560
[avi: resolve conflicts vs data_dictionary]
Currently the test assumes that fragments represent weakly monotonic
upper bounds and therefore unconditionally overwrites the upper-bound on
receiving each fragment. Range tombstones however violate this as a
range tombstone with a smaller position (lower bound) may have a higher
upper bound than some or all fragments that follow it in the stream.
This causes test failures after the converting the combined reader to
v2, but not before, no idea why.
As part of the drive to move over to flat_mutation_reader_v2, update
make_filtering_reader(). Since it doesn't examine range tombstones
(only the partition_start, to filter the key) the entire patch
is just glue code upgrading and downgrading users in the pipeline
(or removing a conversion, in one case).
Test: unit (dev)
Closes#9723
Add schema parameter so that:
* Caller has better control over schema -- especially relevant for
reverse reads where it is not possible to follow the convention of
passing the query schema which is reversed compared to that of the
mutations.
* Now that we don't depend on the mutations for the schema, we can lift
the restriction on mutations not being empty: this leads to safer
code. When the mutations parameter is empty, an empty reader is
created.
Add "make_" prefix to follow convention of similar reader factory
functions.
Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20211115155614.363663-1-bdenes@scylladb.com>
For reversed reads we must adjust the lower/upper bounds used by the
`position_reader_queue` and `clustering_combined_reader`. The bounds are
calculated using the mutation schema, but we need bounds calculated
using the query schema which is reversed.
The two might not be the same in case the schema was upgraded or if we
are reading in reverse. It is important to use the passed-in query
schema consistently during a read.
`time_series_sstable_set` uses `clustering_combined_reader` to implement
efficient single-partition reads. It provides a `position_reader_queue`
to the reader. This queue returns readers to the sstables from the set
in order of the sstables' lower bounds, and with each reader it provides
an upper bound for the positions-in-partition returned by the reader.
Until now we would assume non-reversed queries only. Reversed queries
were implemented by performing forward query in the lower layers
and reversing the results at the upper-most layer of the reader stack.
Before pushing the reversing down to the sources (in particular,
to sstable readers), we need to support the reverse mode in
`time_series_sstable_set` and the queue it provides to
`clustering_combined_reader`.
This requires using different lower and upper bounds in the queue.
For non-reversed reads we used `sstable::min_position()` as the lower
bound and `sstable::max_position()` as the upper bound. For reversed
reads all comparisons performed by `clustering_combined_reader` will be
reversed, as it will use a reversed schema. We can then use
`sstable::max_position().reversed()` for the lower bound and
`sstable::min_position().reversed()` for the upper bound.
It's easy to forget about supplying the correct value for a parameter
when it has a default value specified. It's safer if 'production code'
is forced to always supply these parameters manually.
The default values were mostly useful in tests, where some parameters
didn't matter that much and where the majority of uses of the class are.
Without default values adding a new parameter is a pain, forcing one to
modify every usage in the tests - and there are a bunch of them. To
solve this, we introduce a new constructor which requires passing the
`for_tests` tag, marking that the constructor is only supposed to be
used in tests (and the constructor has an appropriate comment). This
constructor uses default values, but the other constructors - used in
'production code' - do not.
The SEASTAR_THREAD_TEST_CASE runs the provided lambda in async
context. The sstables::test_env::run_with_async does the same.
This (script-generated) patch makes all of the found cases be
SEASTAR_TEST_CASE and, respectively, return the async future
from the run_with_async().
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Rename the old version to `sstables::make_reader_v1()`, to have a
nicely searcheable eradication target.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
gcc 10.3.1 spews the following error:
```
_test_generator::generate_scenario(std::mt19937&) const’:
test/boost/mutation_reader_test.cc:3731:28: error: comparison of integer expressions of different signedness: ‘int’ and ‘long unsigned int’ [-Werror=sign-compare]
3731 | for (auto i = 0; i < num_ranges; ++i) {
| ~~^~~~~~~~~~~~
```
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210728073538.2467040-1-bhalevy@scylladb.com>
Soon we will switch to up-front admission which will break these tests.
No point in trying to fix them as once the switch is done we'll retire
the restricted reader too. Remove these tests now so they are not in the
way of progress.
As a preparation for up-front admission, add a permit parameter to
`make_multishard_streaming_reader()`, which will be the admitted permit
once we switch to up-front admission. For now it has to be a
non-admitted permit.
A nice side-effect of this patch is that now permits will have a
use-case specific description, instead of the generic
"multishard-streaming-reader" one
This warning prevents using std::move() where it can hurt
- on an unnamed temporary or a named automatic variable being
returned from a function. In both cases the value could be
constructed directly in its final destination, but std::move()
prevents it.
Fix the handful of cases (all trivial), and enable the warning.
Closes#8992
The factory method doesn't match the signature of
`reader_lifecycle_policy::make_reader()`, notably the permit is missing.
Add it as it is important that the wrapping evictable reader and
underlying reader share the permits.
Permits were designed such that there is one permit per read, being
shared by all readers in that read. Make sure readers created by tests
adhere to this.
When recreating the underlying reader, the evictable reader validates
that the first partition key it emits is what it expects to be. If the
read stopped at the end of a partition, it expects the first partition
to be a larger one. If the read stopped in the middle of a certain
partition it expects the first partition to be the same it stopped in
the middle of. This latter assumption doesn't hold in all circumstances
however. Namely, the partition it stopped in the middle of might get
compacted away in the time the read was paused, in which case the read
will resume from a greater partition. This perfectly valid cases however
currently triggers the evictable reader's self validation, leading to
the abortion of the read and a scary error to be logged. Relax this
check to accept any partition that is >= compared to the one the read
stopped in the middle of.
"
This series prevents broken_promise or dangling reader errors
when (resharding) compaction is stopped, e.g. during shutdown.
At the moment compaction just closes the reader unilaterally
and this yanks the reader from under the queue_reader_handle feet,
causing dangling queue reader and broken_promise errors
as seen in #8755.
Instead, fix queue_reader::close to set value on the
_full/_not_full promises and detach from the handle,
and return _consume_fut from bucket_writer::consume
if handle is terminated.
Fixes#8755
Test: unit(dev)
DTest: materialized_views_test.py:TestMaterializedViews.interrupt_build_process_and_resharding_half_to_max_test(debug)
"
* tag 'propagate-reader-abort-v3' of github.com:bhalevy/scylla:
mutation_writer: bucket_writer: consume: propagate _consume_fut if queue_reader_handle is_terminated
queue_reader_handle: add get_exception method
queue_reader: close: set value on promises on detach from handle
To prevent broken_promise exception.
Since close() is manadatory the queue_reader destructor,
that just detaches the reader from the handle, is not
needed anymore, so remove it.
Adjust the test_queue_reader unit test accordingly.
Test: test_queue_reader(dev)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The lifecycle of the reader lifecycle policy and all the resources the
reads use is now enclosed in that of the multishard reader thanks to its
close() method. We can now remove all the workarounds we had in place to
keep different resources as long as background reader cleanup finishes.
The purpose of the class in question is to start sharded storage
service to make its global instance alive. I don't know when exactly
it happened but no code that instantiates this wrapper really needs
the global storage service.
Ref: #2795
tests: unit(dev), perf_sstable(dev)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210526170454.15795-1-xemul@scylladb.com>
Currently `with_cql_test_env()` is equivalent to
`with_cql_test_env_thread()`, which resulted in many tests using the
former while really needing the latter and getting away with it. This
equivalence is incidental and will go away soon, so make sure all cql
test env using tests that expect to be run in a thread use the
appropriate variant.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20210514141614.128213-1-bdenes@scylladb.com>
The mutation_reader_test is already one of our largest test files.
Move the reader concurrency semaphore related tests to a new file,
making them easier to find making the mutation reader test a little bit
smaller too.
These two tests (restricted_reader_timeout and
restricted_reader_max_queue_length) are testing the semaphore in
reality, but through the restricted reader, which is distracting as it
needlessly brings in an additional layer into the picture. Rewrite them
to test the semaphore directly, getting much lighter in the process.
This unit test checks that the semaphore doesn't get into a deadlock
when contended, in the presence of many memory-only reads (that don't
wait for admission). This is tested by simulating the 3 kind of reads we
currently have in the system:
* memory-only: reads that don't pass admission and only own memory.
* admitted: reads that pass admission.
* evictable: admitted reads that are furthermore evictable.
The test creates and runs a large number of these reads in parallel,
read kinds being selected randomly, then creates a watchdog which
kills the test if no progress is being made.
This unit test passes a read through admission again-and-again, just
like an evictable reader would be during its lifetime. When readmitted
the read sometimes has to wait and sometimes not. This is to check that
the readmitting a previously admitted reader doesn't leak any units.
fa43d7680 recently introduced mandatory closing of readers before they
are destroyed. One reader destroy path that was left not closing the
reader before destruction is `inactive_reader_handle::abandon()`. This
path is executed when the handle is destroyed while still referring to a
non-evicted inactive read. This patch fixes it up to close the reader
and adds a small unit test which checks that this happens.
Make flat_mutation_reader::impl::close pure virtual
so that all implementations are required to implemnt it.
With that, provide a trivial implementation to
all implementations that currently use the default,
trivial close implementation.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
In addition to clear_inactive_reads, that's currently called when
the database object is destroyed, introduce a stop() method that will:
1. wait on all background closes of inactive_reads.
2. close all present inactive_reads and waits on their close.
3. signal waiters on the wait_list via broken() with a proper
exception indicating that the semaphore was closed.
In addition, assert in the semaphore's destructor
that it has no remaining inactive reads.
Stop must be called from whoever owns the r_c_s.
Mainly, from database::stop.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>