It is in very bad taste to sleep anywhere in the code. The test should be
fixed to explicitly test various orderings between concurrent
operations, but before that happens let's at least reduce how much
those sleeps slow it down by changing it from milliseconds to
microseconds.
This reverts commit b36733971b. That commit made
run_mutation_reader_tests() support mutation_sources that do not implement
streamed_mutation::forwarding::yes. This is wrong since mutation_sources
are not allowed to ignore or otherwise not support that mode. Moreover,
there is absolutely no reason for them to do so since there is a
make_forwardable() adapter that can make any mutation_reader a
forwardable one (at the cost of performance, but that's not always
important).
It is wrong to silently ignore the streamed_mutation::forwarding option,
which completely changes how the reader is supposed to operate. The best
solution is to use the make_forwardable() adapter, which turns a
non-forwardable reader into a forwardable one.
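As a rough illustration of what a make_forwardable()-style adapter does, here is a heavily simplified, hypothetical sketch (plain ints stand in for positions and ranges; none of these names are the real Seastar/Scylla APIs): the adapter consumes the underlying reader's sorted output and only emits what falls inside the current fast-forward window, holding back anything beyond it for later windows.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Toy stand-in for a non-forwardable reader: yields sorted positions.
struct simple_reader {
    std::vector<int> positions;
    std::size_t idx = 0;
    std::optional<int> next() {
        if (idx == positions.size()) return std::nullopt;
        return positions[idx++];
    }
};

// Hypothetical make_forwardable()-style adapter: filters the underlying
// stream to the current [begin, end) window set by fast_forward_to().
struct forwardable_adapter {
    simple_reader underlying;
    int window_begin = 0, window_end = 0; // current window, [begin, end)
    std::optional<int> peeked;            // item read but not yet emitted

    void fast_forward_to(int begin, int end) {
        window_begin = begin;
        window_end = end;
    }
    std::optional<int> next() {
        while (true) {
            if (!peeked) peeked = underlying.next();
            if (!peeked) return std::nullopt;            // stream exhausted
            if (*peeked >= window_end) return std::nullopt; // keep for a later window
            int p = *peeked;
            peeked.reset();
            if (p >= window_begin) return p;             // inside the window
            // else: before the window, discard and keep pulling
        }
    }
};
```

The performance cost mentioned above is visible here: positions before the window are read and thrown away rather than skipped.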
Test the interaction of the multishard reader with the foreign reader
w.r.t. next_partition(). next_partition() is a special operation, as its
execution is deferred until the next cross-shard operation. Give it
some extra stress-testing.
This API provides a way for the multishard reader to pause inactive shard
readers and later resume them when they are needed again. This allows
for these paused shard readers to be evicted when the node is under
pressure.
How the readers are made evictable while paused is up to the clients.
Using this API in the `multishard_combining_reader` and implementing it
in the clients will be done in the next patches.
Provide default implementation for the new virtual methods to facilitate
gradual adoption.
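The gradual-adoption pattern described above can be sketched as follows; the names here are illustrative, not the actual Scylla interface:

```cpp
// Sketch of the gradual-adoption pattern: new pause()/resume() virtuals
// get default no-op implementations, so existing implementations keep
// compiling unchanged while clients opt in incrementally.
struct reader_lifecycle_policy {
    virtual ~reader_lifecycle_policy() = default;
    virtual void pause() {}   // default: nothing to pause
    virtual void resume() {}  // default: nothing to resume
};

// Pre-existing implementation: compiles unchanged, inherits the no-ops.
struct legacy_policy : reader_lifecycle_policy {};

// New-style implementation: makes the paused reader evictable.
struct evicting_policy : reader_lifecycle_policy {
    bool paused = false;
    void pause() override { paused = true; }
    void resume() override { paused = false; }
};
```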
It doesn't make sense for the multishard reader anyway, as it's only
used by the row-cache. We are about to introduce the pausing of inactive
shard readers, and it would require complex data structures and code
to maintain support for this feature that is not even used. So drop it.
* seastar d59fcef...b924495 (2):
> build: Fix protobuf generation rules
> Merge "Restructure files" from Jesse
Includes fixup patch from Jesse:
"
Update Seastar `#include`s to reflect restructure
All Seastar header files are now prefixed with "seastar" and the
configure script reflects the new locations of files.
Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com>
"
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().
Mechanically converted with https://github.com/avikivity/unsprint.
After the new in-memory representation of cells was introduced there was
a regression in atomic_cell_or_collection::operator<< which stopped
printing the content of the cell. This makes debugging more inconvenient
and time-consuming. This patch fixes the problem. Schema is propagated
to the atomic_cell_or_collection printer and the full content of the
cell is printed.
Fixes #3571.
Message-Id: <20181024095413.10736-1-pdziepak@scylladb.com>
Currently, restricting_mutation_reader::fill_buffer just reads
lower-layer reader's fragments one by one without doing any further
transformations. This change just swaps the parent-child buffers in a
single step, as suggested in #3604, hence removing any possible
per-fragment overhead.
I couldn't find any test that exercises restricting_mutation_reader as
a mutation source, so I added test_restricted_reader_as_mutation_source
in mutation_reader_test.
Tests: unit (release), though these 4 tests are failing regardless of
my changes (they fail on master for me as well): snitch_reset_test,
sstable_mutation_test, sstable_test, sstable_3_x_test.
Fixes: #3604
Signed-off-by: George Kollias <georgioskollias@gmail.com>
Message-Id: <1540052861-621-1-git-send-email-georgioskollias@gmail.com>
The single-range overload, when used by
make_multishard_streaming_reader(), has to create a reader that is
forwardable. Otherwise the multishard streaming reader will not produce
any output as it cannot fast-forward its shard readers to the ranges
produced by the generator.
Also add a unit test that is based on the real-life purpose the
multishard streaming reader was designed for: serving partitions
from a shard according to a sharding configuration that is different
from the local one. This is also the scenario that found the bug in the
first place.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <bf799961bfd535882ede6a54cd6c4b6f92e4e1c1.1539235034.git.bdenes@scylladb.com>
Currently timeout is opt-in, that is, all methods that even have it
default it to `db::no_timeout`. This means that ensuring timeout is used
where it should be is completely up to the author and the reviewers of
the code. As humans are notoriously prone to mistakes this has resulted
in a very inconsistent usage of timeout, many clients of
`flat_mutation_reader` passing the timeout only to some members and only
on certain call sites. This is small wonder considering that some core
operations like `operator()()` only recently received a timeout
parameter and others like `peek()` didn't even have one until this
patch. Both of these methods call `fill_buffer()` which potentially
talks to the lower layers and is supposed to propagate the timeout.
All this makes the `flat_mutation_reader`'s timeout effectively useless.
To bring order to this chaos, make the timeout parameter a mandatory one
on all `flat_mutation_reader` methods that need it. This ensures that
humans now get a reminder from the compiler when they forget to pass the
timeout. Clients can still opt-out from passing a timeout by passing
`db::no_timeout` (the previous default value) but this will be now
explicit and developers should think before typing it.
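The shape of the change can be sketched roughly as below; the type names and the sentinel are illustrative stand-ins (the real code uses `db::timeout_clock` and `db::no_timeout`), not the actual signatures:

```cpp
#include <chrono>

using timeout_clock = std::chrono::steady_clock;

// Stand-in for db::no_timeout: a deadline that never expires. Callers
// that really want "no timeout" must now say so explicitly.
const timeout_clock::time_point no_timeout = timeout_clock::time_point::max();

struct reader {
    int filled = 0;
    // Timeout is mandatory (no default argument), so the compiler flags
    // any call site that forgets to pass one.
    void fill_buffer(timeout_clock::time_point timeout) {
        (void)timeout; // would bound the wait on lower layers here
        ++filled;
    }
    // peek() may need data, so it must propagate the timeout downward
    // to fill_buffer() rather than silently dropping it.
    int peek(timeout_clock::time_point timeout) {
        fill_buffer(timeout);
        return filled;
    }
};
```

The point is purely the signature: `reader{}.peek()` no longer compiles, while `peek(no_timeout)` documents the opt-out at the call site.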
There were surprisingly few core call sites to fix up. Where a timeout
was available nearby I propagated it to be able to pass it to the
reader, where I couldn't I passed `db::no_timeout`. Authors of the
latter kind of code (view, streaming and repair are some of the notable
examples) should maybe consider propagating down a timeout if needed.
In the test code (the vast majority of the changes) I just used
`db::no_timeout` everywhere.
Tests: unit(release, debug)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <1edc10802d5eb23de8af28c9f48b8d3be0f1a468.1536744563.git.bdenes@scylladb.com>
The test in question is `restricted_reader_timeout`.
Use `eventually_true()` instead of `sleep()` to wait on the timeout
expiring, making the test more robust on overloaded machines.
Also make failing graceful, another longstanding issue with this test.
The readers created for the test need different destruction logic
depending whether the test failed or succeeded. Previously this was
dealt with by using the logic that worked in case of success and using
asserts to abort when the test failed, thus sparing developers from
investigating the invalid memory accesses happening due to the wrong
destruction logic.
The solution is to use the BOOST_CHECK() macro in the check that validates
whether timeout works as expected. This allows for execution to continue
even if the test failed, and thus allows for running the proper cleanup
code even when the test failed.
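A minimal sketch of why a non-fatal, CHECK-style assertion matters here (the helper names are made up; BOOST_CHECK/BOOST_REQUIRE are the real macros): a recorded-but-non-aborting check lets the function return normally, so RAII cleanup still runs even when the condition fails.

```cpp
// CHECK-style assertion: records the failure and continues, unlike an
// abort-style REQUIRE which would skip all cleanup below it.
static int failures = 0;
static bool cleaned_up = false;

void check(bool ok) { if (!ok) ++failures; }

struct resource {
    ~resource() { cleaned_up = true; } // needs to run in both outcomes
};

bool run_test(bool condition_holds) {
    resource r;              // e.g. a reader needing proper destruction
    check(condition_holds);  // test may "fail" here without aborting
    return failures == 0;    // normal return: r's destructor still runs
}
```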
Fixes: #3719
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <911921dffc924f1b0a3e86408757467e9be2b65b.1537169933.git.bdenes@scylladb.com>
Extend `remote_reader_factory` interface so that it accepts all standard
mutation reader creation parameters. This allows factory lambdas to be
truly stateless, not having to capture any of the standard parameters
that are needed for creating the reader.
Standard parameters are those accepted by
`mutation_source::make_reader()`.
Make combined_mutation_reader_test more interesting:
* Set the levels on the sstables
* Arrange the sstables so that they test for the "jump over sstables"
bug.
* Arrange the sstables so that they test for the "gap between sstables"
  bug.
While at it also make the code more compact.
sstable_set::incremental_selector was migrated to ring position; follow
suit and migrate the reader_selector to use ring_position as well. Beyond
correctness this also improves efficiency in the case of dense tables,
avoiding prematurely selecting sstables that share the token but start
at different keys, although one could argue that this is a niche case.
Currently `sstable_set::incremental_selector` works in terms of tokens.
Sstables can be selected with tokens and internally the token-space is
partitioned (in `partitioned_sstable_set`, used for LCS) with tokens as
well. This is problematic for several reasons.
The sub-range of the token-space that sstables cover is defined in
terms of decorated keys. It is even possible that multiple sstables cover
multiple non-overlapping sub-ranges of a single token. The current
system is unable to model this and will at best result in selecting
unnecessary sstables.
The usage of token for providing the next position where the
intersecting sstables change [1] causes further problems. Attempting to
walk over the token-space by repeatedly calling `select()` with the
`next_position` returned from the previous call will quite possibly lead
to an infinite loop as a token cannot express inclusiveness/exclusiveness
and thus the incremental selector will not be able to make progress when
the upper and lower bounds of two neighbouring intervals share the same
token with different inclusiveness e.g. [t1, t2](t2, t3].
To solve these problems update incremental_selector to work in terms of
ring position. This makes it possible to partition the token-space
among sstables at decorated key granularity. It also makes it possible
for select() to return a next_position that is guaranteed to make
progress.
partitioned_sstable_set now builds the internal interval map using the
decorated key of the sstables, not just the tokens.
incremental_selector::select() now uses `dht::ring_position_view` as
both the selector and the next_position. ring_position_view can express
positions between keys so it can also include information about
inclusiveness/exclusiveness of the next interval guaranteeing forward
progress.
[1] `sstable_set::incremental_selector::selection::next_position`
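The progress problem can be demonstrated with a tiny model (purely illustrative; the real types are `dht::token` and `dht::ring_position_view`): a bare token cannot express "after t2", so re-selecting at t2 keeps landing in the same interval, while a position carrying a before/at/after weight can step past it.

```cpp
// Toy model of a ring position: a token plus a weight that encodes
// whether the position is before (-1), at (0), or after (+1) the token,
// the same trick dht::ring_position_view uses to express exclusiveness.
struct position {
    int token;
    int weight;
};

bool less(position a, position b) {
    if (a.token != b.token) return a.token < b.token;
    return a.weight < b.weight;
}

// Two adjacent intervals sharing token 2: [1, 2] and (2, 3].
// A next_position of {2, +1} ("after token 2") makes progress into the
// second interval; a bare token 2 would select the first one again.
int select_interval(position p) {
    if (less(p, position{2, 1})) return 0; // covers [1, 2]
    return 1;                              // covers (2, 3]
}
```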
Tests testing different aspects of `foreign_reader` and
`multishard_combining_reader` are designed to run with a certain minimum
shard count. Running them with any shard count below this minimum makes
them useless at best but can even fail them.
Refuse to run these tests when the shard count is below the required
minimum to avoid an accidental and unnecessary investigation into a
false-positive test failure.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <d24159415b6a9d74eafb8355b6e3fba98c1ff7ff.1530274392.git.bdenes@scylladb.com>
Scylla uses shared-nothing architecture and communication between the
shards is supposed to be very restricted. Applying mutations created on
another shard to a memtable is far too complex an operation to be
allowed. Using frozen mutations is a much safer option.
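The frozen-mutation idea, in a heavily simplified sketch (the types and the toy byte format below are made up, not Scylla's `mutation`/`frozen_mutation` serialization): instead of sharing a live, shard-local object across shards, serialize it into a plain byte buffer, which is safe to hand to any shard, and rebuild it on the owning shard.

```cpp
#include <string>
#include <vector>

// Toy stand-in for a mutation.
struct mutation {
    std::string key, value;
};

// "Freeze": flatten into self-contained bytes with no shard-local
// pointers. Toy format: [key length byte][key bytes][value bytes]
// (assumes the key is shorter than 128 bytes).
std::vector<char> freeze(const mutation& m) {
    std::vector<char> buf;
    buf.push_back(static_cast<char>(m.key.size()));
    buf.insert(buf.end(), m.key.begin(), m.key.end());
    buf.insert(buf.end(), m.value.begin(), m.value.end());
    return buf;
}

// "Unfreeze" on the receiving shard: rebuild a local object.
mutation unfreeze(const std::vector<char>& buf) {
    std::size_t klen = static_cast<unsigned char>(buf[0]);
    return {std::string(buf.begin() + 1, buf.begin() + 1 + klen),
            std::string(buf.begin() + 1 + klen, buf.end())};
}
```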
test_multishard_combining_reader_destroyed_with_pending_create_reader
was failing because it relied on smp == 3 and thus the shard on which
the reader creation is blocked being shard 2. Since the test requires
smp >= 3 to run, we can hardcode this shard to 2 because if the test
runs at all we are guaranteed to have smp >= 3.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <38883a1f4c18ca0cd065aa13826a4f1858353289.1525328233.git.bdenes@scylladb.com>
These tests are quite complicated and require intimate knowledge of how
foreign_reader and multishard_combining_reader operate. Knowing these
two objects is still required to understand the tests but make it that
much easier by explaining how they were designed to test what they test.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <8de580131a8652924de920c2bc68a98e579398ee.1525328226.git.bdenes@scylladb.com>
'shard' is a short-lived on-stack variable that gets captured by
reference by a continuation that gets executed on another shard.
Fixes a race condition that leads to a heap-use-after-free.
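The bug class, sketched without Seastar (the deferred continuation is simulated by simply storing the callback; names are illustrative): a stack variable captured by reference dangles once its frame is gone, while capturing by value, as this fix does for `shard`, keeps a copy alive inside the closure.

```cpp
#include <functional>

// A short-lived stack variable captured into a continuation that runs
// later (here: after make_continuation() has returned).
std::function<int()> make_continuation() {
    int shard = 2;                    // on-stack, dies when we return
    return [shard] { return shard; }; // by value: closure owns a copy
    // return [&shard] { return shard; }; // by reference: dangling read
}
```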
Message-Id: <20180502150507.2776-1-pdziepak@scylladb.com>
The test_foreign_reader_destroyed_with_pending_read_ahead test currently
doesn't ensure that the objects in its scope are destroyed in the
correct order. This is necessary as there are several foreign pointers
to objects that live on remote shards and use each other. Since
foreign pointers destroy their managed object in the background we
cannot rely on them to destroy objects in order, nor can we be
sure when the object they manage is actually destroyed.
So to work around that, ensure that the puppet_reader is destroyed before
the remote_control it references even has a chance of being destroyed.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <232eaa899878b03fb2a765c2916e4f05841472a3.1525269726.git.bdenes@scylladb.com>
Add tests for foreign_reader and multishard_combining_reader that check
that readers destroyed while there is a pending read-ahead will not result
in use-after-free.
Specifically check that:
* multishard_combining_reader destroyed with pending reader creation
* foreign_reader destroyed with pending read-ahead
* multishard_combining_reader destroyed with pending read-ahead
does not result in use-after-free or SEGFAULT.
These tests try to do their best to check for correct behaviour with
various BOOST_REQUIRE* checks but they still heavily rely on ASAN to
detect any use-after-free, SEGFAULT or similar errors.
This concerns the test_multishard_combining_reader_reading_empty_table
test. Running it with smp=3 instead of smp=2 helps detect additional
read-ahead related memory problems.
reader_wrapper's _timeout defaults to now(), which means it times out
immediately rather than never.
Fix by switching to a time_point defaulting to no_timeout, and
providing a compatible constructor (with a duration parameter) for
callers that do want a duration-based timeout.
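A minimal sketch of the before/after (the class name comes from the commit; the clock and sentinel are illustrative stand-ins for the real `db::timeout_clock`/`db::no_timeout`): a duration member defaulting to zero means "deadline = now()", i.e. immediate timeout, whereas an absolute time_point can default to a never-expiring sentinel while a duration-taking constructor keeps old callers working.

```cpp
#include <chrono>

using clk = std::chrono::steady_clock;

struct reader_wrapper {
    // Absolute deadline; defaults to "never expires" instead of the old
    // duration member whose zero default meant "expire at now()".
    clk::time_point _timeout = clk::time_point::max();

    reader_wrapper() = default;
    // Compatibility constructor for callers that think in durations.
    explicit reader_wrapper(clk::duration d) : _timeout(clk::now() + d) {}

    bool timed_out() const { return clk::now() >= _timeout; }
};
```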
Tests: mutation_reader_test (debug, release)
Message-Id: <20180305111739.31972-1-avi@scylladb.com>
Shared pointers don't like being shared across shards.
Fixes assertion failure in build/debug/tests/mutation_reader_test.
Message-Id: <20180201125017.30259-1-pdziepak@scylladb.com>
Raphael recently caught this test failing. I can't really reproduce it,
but it seems to me that it is a timing issue: we execute two different
statements, each one should timeout after 10ms. After 20ms, we make sure
that they both timed out.
They don't (in his system), which is explained by the fact that we are
no longer using high resolution clocks for the timeouts. Expirations for
lowres clocks will only happen every 10ms, and in the worst case we
will miss two.
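A back-of-the-envelope sketch of the race (the numbers are the ones from the commit; the helper is illustrative): a timer on a clock that ticks every 10ms only fires at the first tick at or after its deadline, so it can fire up to one full tick period late, and checking two 10ms timeouts at exactly 20ms is therefore right on the edge.

```cpp
// Tick period of the low-resolution clock, in milliseconds.
constexpr int tick_ms = 10;

// Upper bound on elapsed time before a timer fires: the deadline plus
// up to one full tick, since expirations only happen on tick boundaries.
constexpr int worst_case_elapsed_ms(int timeout_ms) {
    return timeout_ms + tick_ms;
}
```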
So the fix I am proposing here is to just account for potential
inaccuracies in the clocks and calculations by waiting a bit longer.
Ideally, we would use the manual clock for this. But in this case, this
would mean adding template parameters to pretty much all of the
mutation_reader path.
Currently, not only did the test fail, it also had a use-after-free
SIGSEGV. That happens because we give up on the reader while the
timeout is still pending.
It is the caller's responsibility to ensure the lifetime of the reader
is correct. Dealing with that cleanly would require a cancellation
mechanism that we don't have, so we'll just add an assertion that will
fail more gracefully than the SIGSEGV.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
We are not propagating timeouts to fast_forward_to in the
mutation_reader_test. This is not currently causing any issue, but I
noticed it while chasing one - so let's fix it.
Signed-off-by: Glauber Costa <glauber@scylladb.com>