No longer requiring these to be extracted from the sstable dir path.
In practice this allows opening la/mx sstables located at
non-standard paths. This will be used by the `scylla-sstable` tool,
which needs to be flexible about where the sstables it opens are
located.
sstables_manager supersedes the previous sstables_tracker
implementation for tracking the lifetime of tables. Update
scylla-gdb.py to use sstables_manager in a backwards-compatible way,
as sstables_manager is not available in Scylla Enterprise 2020.1.
Add an explicit test for the "scylla sstables" command, as previously
only "scylla active-sstables" was tested.
Closes #9439
pytest supports - if the "repeat" extension is installed - a convenient
and efficient way to repeat the same test (or all of them) multiple times.
Since it's very useful, let's document it in cql-pytest/README.md.
By the way, our test.py also has a "--repeat" option, but it can only run
all cql-pytest tests, not just repeat a single small test, and it is also
slower (and arguably, different) because it restarts Scylla instead of
running a test 100 times on the same Scylla.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211007122146.624210-1-nyh@scylladb.com>
The test generates random mutations and eliminates mutations whose keys
tokenize to 0, in particular it eliminates mutations with empty
partition keys (which should not end up in sstables).
However it would do that after using the randomly generated mutations to
create their reversed versions. So the reversed versions of mutations
with empty partition keys would stay.
Fix by placing the workaround earlier in the test.
Closes #9447
rewrite_sstables() can be asked to stop either on shutdown or on a
user-triggered compaction which forces all ongoing compactions to
stop, like scrub.
It turns out we weren't actually bailing out of do_until() when the
task cannot proceed, so rewrite_sstables() could run into an infinite
loop, which in turn causes shutdown (or anything else waiting on it)
to hang forever.
Found this while auditing the code.
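The failure mode can be modeled in a few lines of Python (a toy
sketch with made-up names, not Scylla's actual code):

```python
class StopRequested(Exception):
    """Raised when an ongoing rewrite is asked to stop."""

def do_until(done, action):
    """Toy model of seastar::do_until: call action() until done()."""
    while not done():
        action()

def rewrite_sstables(sstables, task):
    """Rewrite sstables one by one, bailing out on a stop request."""
    rewritten = []
    def step():
        if task["stopping"]:
            # Without this explicit bail-out, do_until() spins forever
            # once the task can no longer make progress.
            raise StopRequested()
        rewritten.append(sstables.pop())
    do_until(lambda: not sstables, step)
    return rewritten
```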
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211005233601.155442-1-raphaelsc@scylladb.com>
I stumbled upon this failure in dev mode:
```
test/boost/sstable_conforms_to_mutation_source_test.cc(0): Entering test case "test_sstable_reversing_reader_random_schema"
sstable_conforms_to_mutation_source_test: ./seastar/src/core/fstream.cc:205: virtual seastar::file_data_source_impl::~file_data_source_impl(): Assertion `_reads_in_progress == 0' failed.
Aborting on shard 0.
```
Since dev mode has no debug symbols, I can't decode the stack trace,
so I'm not 100% sure about the root cause, and I couldn't reproduce
it in release or debug modes yet.
One vulnerability in the current code is that r1 won't be closed if
an exception is thrown before r1 and r2 are moved to
`compare_readers`, so this change adds a deferred close of r1 for
this case.
Test: sstable_conforms_to_mutation_source_test(dev)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211006144009.696412-1-bhalevy@scylladb.com>
This is a translation of Cassandra's CQL unit test source file
validation/operations/SelectOrderByTest.java into our cql-pytest
framework.
This test file includes 17 tests for various features and corners of
SELECT's "ORDER BY" feature. All these tests pass on Cassandra, but
three fail on Scylla and are marked as xfail:
One previously-unknown Scylla bug:
Refs #9435: SELECT with IN, ORDER BY and function call does not obey
the ORDER BY
And two new reproducers for already known bugs:
Refs #2247: ORDER BY should allow skipping equality-restricted
clustering columns
Refs #7751: Allow selecting map values and set elements, like in
Cassandra 4.0
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211005174140.571056-1-nyh@scylladb.com>
Introduce new syntax in IDL compiler to allow generating
registration/sending code for RPC verbs:
```
verb [[attr1, attr2, ...]] my_verb(args...) -> return_type;
```
`my_verb` RPC verb declaration corresponds to the
`netw::messaging_verb::MY_VERB` enumeration value to identify the
new RPC verb.
For a given `idl_module.idl.hh` file, a registrator class named
`idl_module_rpc_verbs` will be created if there are any RPC verbs
registered within the IDL module file.
These are the methods being created for each RPC verb:
```
static void register_my_verb(netw::messaging_service* ms, std::function<return_type(args...)>&&);
static future<> unregister_my_verb(netw::messaging_service* ms);
static future<> send_my_verb(netw::messaging_service* ms, netw::msg_addr id, args...);
```
Each method accepts a pointer to an instance of `messaging_service`
object, which contains the underlying seastar RPC protocol
implementation, that is used to register verbs and pass messages.
There is also a method to unregister all verbs at once:
```
static future<> unregister(netw::messaging_service* ms);
```
The following attributes are supported when declaring an RPC verb
in the IDL:
* `[[with_client_info]]` - the handler will contain a const reference to
an `rpc::client_info` as the first argument.
* `[[with_timeout]]` - an additional `time_point` parameter is supplied
to the handler function and `send*` method uses `send_message_*_timeout`
variant of internal function to actually send the message.
* `[[one_way]]` - the handler function is annotated by
`future<rpc::no_wait_type>` return type to designate that a client
doesn't need to wait for an answer.
The `-> return_type` clause is optional for two-way messages. If omitted,
the return type is set to be `future<>`.
For one-way verbs, the use of a return clause is prohibited, and the
generated `send*` function always returns `future<>`.
No existing code is affected.
Ref: #1456
Closes #9359
* github.com:scylladb/scylla:
idl: support generating boilerplate code for RPC verbs
idl: allow specifying multiple attributes in the grammar
message: messaging_service: extract RPC protocol details and helpers into a separate header
Let's major compact the smallest tables first, increasing chances of
success if low on disk space.
parallel_for_each() didn't have any effect on space requirement as
compaction_manager serializes major compaction in a shard.
As parallel_for_each() is no longer used, find_column_family() is now
used before each compact_all_sstables() to avoid a race with table drop.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211005135257.31931-1-raphaelsc@scylladb.com>
Ever since we started testing Alternator with tests written in Python
and using Amazon's "boto3" library, one limitation kept annoying us:
Boto3 verifies the validity of the request parameters before passing
them on to the server. It verifies that mandatory parameters are not
missing, that parameters have the right types, and sometimes even the
right ranges - all in the library before ever sending the request.
This meant that in many cases, we couldn't get good test coverage for
Alternator's server-side handling of *wrong* parameters.
As it turns out, it is trivial to tell boto3 to *not* do its client-side
request validation, with the `parameter_validation=False` config flag.
We just never noticed that such a flag existed :-)
So this patch adds this flag. It then fixes a few tests which expected
ParameterValidationError - this error is the client-side validation
failure, but should now be replaced by checking the server-side error.
The patch also adds a couple of invalid parameter checks that we
couldn't do before because of boto3's eagerness to check them on the
client side. We can add a lot more of these error tests in the future,
now that we got rid of client-side validation.
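For reference, a sketch of the flag in use (the endpoint URL, region,
and credentials here are placeholders for a local Alternator/DynamoDB
endpoint):

```python
import boto3
from botocore.config import Config

# parameter_validation=False makes boto3 send requests as-is, letting
# the server - not the client library - reject invalid parameters.
no_validation = Config(parameter_validation=False)
dynamodb = boto3.resource('dynamodb',
                          endpoint_url='http://localhost:8000',  # placeholder
                          region_name='us-east-1',               # placeholder
                          aws_access_key_id='fake',              # placeholder
                          aws_secret_access_key='fake',          # placeholder
                          config=no_validation)
```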
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211005095514.537226-1-nyh@scylladb.com>
This patch adds yet another test for Alternator's unimplemented feature
of adding a GSI to an already existing table (issue #5022), but this
test is for a very specific corner case - tables which contain string
attributes with an empty value - the corner case described in
issue #9424:
DynamoDB used to forbid any string attributes from being set to an empty
string, but this changed in May 2020, and since then empty strings are
allowed - but NOT as keys. So although it is legal to set a string
attribute to an empty string, if this table has a GSI whose key is that
specific attribute, the update command is refused. We already had a
test for this - test_gsi_empty_value.
However, the case in this patch is the case where a GSI is added to a
table *after* the table already has data. In this case (as this test
demonstrates), we are supposed to drop the items which have the empty
string key from the GSI.
Even when #5022 (the ability to add GSIs to existing tables) is done,
this test will continue to fail. The unique problem of this test is
that Scylla's materialized views *do* allow empty strings as
clustering keys (right now) and even partition keys (once #9375 is
solved), while we don't want them to enter the GSI. We will probably
need to add to the view's filter, which right now contains (as
required) "x IS NOT NULL", also the filter "x != ''" (when x's type
is string or binary) so that items with empty-string keys will be
dropped.
Refs #5022
Refs #9375
Refs #9424
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211003170636.477582-1-nyh@scylladb.com>
DynamoDB has a rather baroque definition of numbers, and in particular
it does *not* allow numeric attributes to be set to infinity or NaN.
Although I did manually check invalid numbers in the past, I was
never able to write a unit test for this - because the boto3 library
catches such errors on the client side, and prevents the test from
sending broken requests to the server. So in this patch, I finally came up
with a solution - a context manager client_no_transform() which
yields a client which does NOT do any transformation or validation
on the request's parameters, allowing us to use boto3 to create
improper requests - and test the server's handling of them.
The test in this patch passes - it did not discover a new bug, but
it is a useful regression test and the client_no_transform() trick
can be used in more error-case tests which until now we were unable
to write.
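The general shape of such a context manager is the standard
save/replace/restore idiom shown below (a generic sketch; the real
client_no_transform targets boto3's internal hooks, which are not
reproduced here):

```python
from contextlib import contextmanager

@contextmanager
def patched(obj, attr, replacement):
    """Temporarily replace obj.attr, restoring the original on exit."""
    saved = getattr(obj, attr)
    setattr(obj, attr, replacement)
    try:
        yield obj
    finally:
        setattr(obj, attr, saved)
```

client_no_transform() applies this idea to the machinery boto3 uses
to transform and validate request parameters, so that within the
`with` block requests reach the server untouched.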
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211004161809.520236-1-nyh@scylladb.com>
This allows us to forward-declare raw_selector, which in turn reduces
indirect inclusions of expression.hh from 147 to 58, reducing rebuilds
when anything in that area changes.
Includes that were lost due to the change are restored in individual
translation units.
Closes #9434
base64.hh pulls in the huge rjson.hh, so if someone just wants
a base64 codec they have to pull in the entire rapidjson library.
Move the json related parts of base64.hh to rjson.hh and adjust
includes and namespaces.
In practice it doesn't make much difference, as all users of base64
appear to want json too. But it's cleaner not to mix the two.
Closes #9433
"
There's a run_mutation_source_tests lib helper that runs a bunch of
tests sequentially. The problem is that it applies 4 different
flavours of them, each being a certain decoration over the provided
reader. This amplification makes some test cases run for an enormous
amount of time without any chance of parallelism.
The simplest way to help running those cases in parallel is to teach
the slowest ones to run the different flavours of mutation source
tests in dedicated cases. This patch makes it so.
The resulting timings are:
                                    dev    debug
  sequential run:                   2m1s   53m50s
  --parallel-cases (+ this patch):  1m3s   31m15s
tests: unit(dev, debug)
"
* 'br-parallel-mutation-source-tests' of https://github.com/xemul/scylla:
test: Split multishard combining reader case
test: Split database test case
test: Split run_mutation_source_tests
It's possible that the server drops the snapshot in the same iteration
of `io_fiber` loop as it tries to send it (the sending of messages
happens after snapshot dropping). Handle this case by throwing an
exception.
As a preparation we also fix the code in `server_impl::send_snapshot` so
it works correctly when `rpc::send_snapshot` throws or returns a ready
future.
Refs #9407.
* kbr/snapshot-handle-errors:
test: raft: randomized_nemesis_test: remove an obsolete comment
test: raft: randomized_nemesis_test: handle missing snapshot in `rpc::send_snapshot`
raft: server: handle `rpc::send_snapshot` returning instantly
It's possible that the server drops the snapshot in the same iteration
of `io_fiber` loop as it tries to send it (the sending of messages
happens after snapshot dropping). Handle this case.
Refs #9407.
If `rpc::send_snapshot` returned immediately with a ready future, or if
it threw, the code in `server_impl::send_snapshot` would not update
`_snapshot_transfers` correctly.
The code assumed that the continuation attached to `rpc::send_snapshot`
(with `then_wrapped`) was executed after `_snapshot_transfer` below the
`rpc::send_snapshot` call was updated. That would not necessarily be true
(the continuation may even not have been executed at all if
`rpc::send_snapshot` threw).
Fix that by wrapping the `rpc::send_snapshot` call into a continuation
attached to `later()`.
Originally authored by Gleb <gleb@scylladb.com>, I added a comment.
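The hazard can be modeled with a tiny future type in which
continuations attached to an already-ready future run inline, as
Seastar continuations can (Python toy code with made-up names; the
'fixed' variant here simply registers the transfer before attaching
the continuation, while the actual patch achieves the same ordering
by attaching the continuation to `later()`):

```python
class ToyFuture:
    """Continuations attached after resolution run inline."""
    def __init__(self):
        self._done = False
        self._callbacks = []

    def resolve(self):
        self._done = True
        for cb in self._callbacks:
            cb()

    def then(self, cb):
        if self._done:
            cb()   # ready future: runs before then() even returns
        else:
            self._callbacks.append(cb)
        return self

def send_snapshot(transfers, server_id, rpc_send, fixed):
    fut = rpc_send()
    cleanup = lambda: transfers.pop(server_id, None)
    if fixed:
        # register first, then attach the continuation
        transfers[server_id] = fut
        fut.then(cleanup)
    else:
        # buggy: if rpc_send() returned a ready future, cleanup() runs
        # right here, before the transfer is registered below, so the
        # stale entry is never removed
        fut.then(cleanup)
        transfers[server_id] = fut
```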
All the cases in this test also run mutation source tests, and the
case with a single-fragment buffer takes several times longer to
execute than the others.
Splitting this single case so that it runs the mutation source test
flavours in different cases improves the test parallelism.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The test_database_with_data_in_sstables_is_a_mutation_source case
runs the mutation source tests in one go. The problem is that on each
step a whole new ks:cf is created, which takes the majority of the
test's time. At the end of the day this case is the slowest one in
the suite, being up to two times longer (depending on mode) than the
#2 on this list.
This patch splits the case into 4 so that each mutation source
flavour runs in a separate case.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are 4 flavours of mutation source tests that are all run
sequentially -- plain, reversed, and the upgrade/downgrade ones that
check v1<->v2 conversions.
This patch splits them all into individual calls so that tests may
have dedicated cases for each flavour. "By default" they are all run
as before.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
set_local_host_id() accepts a UUID reference and saves it in the
local keyspace and in all shards' local caches. Before the function
was coroutinized, the UUID was copied into the captures and survived;
after coroutinization it remains a reference. The problem is that
callers pass local variables as arguments, and those go away "really
soon".
Fix it by accepting the UUID by value; it's short enough to copy
safely and cheaply.
fixes: #9425
tests: dtest.ReplaceAddress_rbo_enabled.replace_node_diff_ip(dev)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20211004145421.32137-1-xemul@scylladb.com>
The previous check would find a leader once and assume that it does not
change, and that the first attempt at sending a request to this leader
succeeds. In reality the leader may change at the end of the test (e.g.
it may be in the middle of stepping down when we find it) and in general
it may take some time for the cluster to stabilize. The new check tries
a few times to find a leader and perform a request - until a time limit
is reached.
* kbr/nemesis-liveness-check:
test: raft: randomized_nemesis_test: better liveness check at the end of generator test
test: raft: randomized_nemesis_test: take `time_point` instead of `duration` in `wait_for_leader`
The previous check would find a leader once and assume that it does not
change, and that the first attempt at sending a request to this leader
succeeds. In reality the leader may change at the end of the test (e.g.
it may be in the middle of stepping down when we find it) and in general
it may take some time for the cluster to stabilize. The new check tries
a few times to find a leader and perform a request - until a time limit
is reached.
The commit also removes an incorrect assertion inside `wait_for_leader`.
Until now reversed queries were implemented inside
`querier::consume_page` (more precisely, inside the free function
`consume_page` used by `querier::consume_page`) by wrapping the
passed-in reader into `make_reversing_reader` and then consuming
fragments from the resulting reversed reader.
The first couple of commits change that by pushing the reversing down below
the `make_combined_reader` call in `table::query`. This allows
working on improving reversing for memtables independently from
reversing for sstables.
We then extend the `index_reader` with functions that allow
reading the promoted index in reverse.
We introduce `partition_reversing_data_source`, which wraps an sstable data
file and returns data buffers with contents of a single chosen partition
as if the rows were stored in reverse order.
We use the reversing source and the extended index reader in
`mx_sstable_mutation_reader` to implement efficient (at least in theory)
reversed single-partition reads.
The patchset disables cache for reversed reads. Fast-forwarding
is not supported in the mx reader for reversed queries at this point.
Details in commit messages. Read the commits in topological order
for best review experience.
Refs: #9134
(not saying "Fixes" because it's only for single-partition queries
without forwarding)
Closes #9281
* github.com:scylladb/scylla:
table: add option to automatically bypass cache for reversed queries
test: reverse sstable reader with random schema and random mutations
sstables: mx: implement reversed single-partition reads
sstables: mx: introduce partition_reversing_data_source
sstables: index_reader: add support for iterating over clustering ranges in reverse
clustering_key_filter: clustering_key_filter_ranges owning constructor
flat_mutation_reader: mention reversed schema in make_reversing_reader docstring
clustering_key_filter: document clustering_key_filter_ranges::get_ranges
Currently the new reversing sstable algorithms do not support fast
forwarding and the cache does not yet handle reversed results. This
forced us to disable the cache for reversed queries if we want to
guarantee bounded memory. We introduce an option that does this
automatically (without specifying `bypass cache` in the query) and turn
it on by default.
If the user decides that they prefer to keep the cache at the
cost of fetching entire partitions into memory (which may be viable
if their partitions are small) during reversed queries, the option can
be turned off. It is live-updateable.
The test generates a random set of mutations and creates two readers:
- one by reversing the mutations, creating an sstable out of the result,
and querying it in reverse,
- one by creating an sstable directly from the mutations and querying it
in forward mode.
It checks that the readers give equal results.
The test already managed to find a bug where offsets returned by the
sstable index were interpreted incorrectly as absolute instead of
relative. It also helped find another bug unrelated to reversing (#9352).
Surprisingly few tests use the random schema and random mutation
utilities, which seem to be quite powerful.
We use partition_reversing_data_source and the new `index_reader` methods
to implement single-partition reads in `mx_sstable_mutation_reader`.
The parsing logic does not need to change: the buffers returned by the
source already contain rows in reversed clustering order.
Some changes were required in `mp_row_consumer_m` which processes the
parsed rows and emits appropriate mutation fragments. The consumer uses
`mutation_fragment_filter` underneath to decide whether a fragment
should be ignored or not (e.g. the parsed fragment may come from outside
the requested clustering range), among other things. Previously
`mutation_fragment_filter` was provided a `partition_slice`. If the
slice was reversed, the filter would use
`clustering_key_filter_ranges::get_ranges` to obtain the clustering
ranges from the slice in unreversed order (they were reversed in the
slice) since we didn't perform any reversing in the reader. Now the
reader provides the ranges directly instead of the slice; furthermore,
the ranges are provided in native-reversed format (the order of ranges
is reversed and the ranges themselves are also reversed), and the schema
provided to the filter is also reversed. Thus to the filter everything
appears as if it was used during a non-reversed query but on a table
with reversed schema, which works correctly given the fact that the
reader is feeding parsed rows into the consumer in reversed order.
During reversed queries the reader uses alternative logic for skipping
to a later range (or, speaking in non-reversed terms, to an earlier range),
which happens in `advance_context`. It asks the index to advance its
upper bound in reverse so that the reversing_data_source notices the
change of the index end position and returns following buffers with rows
from the new range.
There is a slight difference in behavior of the reader from
`mp_row_consumer_m`'s point of view. For non-reversed reads, after
the consumer obtains the beginning of a row (`consume_row_start`)
- which contains the row's position but not the columns - and tells the
reader that the row won't be emitted because we need to skip to a later
range, the reader would tell the data source (the 'context') immediately
to skip to a later range by calling `skip_to`. This caused the source
not to return the rest of the row, and the rest of the row would not
be fed to the consumer (`consume_row_end`). However, for reversed reads,
the data source performs skipping 'on its own', after it notices that
the index end position has changed. This may happen 'too late', causing
the rest of the row to be returned anyway. We are prepared for this
situation inside `mp_row_consumer` by consulting the mutation fragment
filter again when the rest of the row arrives.
Fast forwarding is not supported at this point, which is fine given that
the cache is disabled for reversed queries for now (and the cache is the
only user of fast forwarding).
The `partition_slice` provided by callers is provided in 'half-reversed'
format for reversed queries, where the order of clustering ranges is
reversed, but the ranges themselves are not. This means we need to modify
the slice sometimes: for non-single-partition queries the mx reader must
use a non-reversed slice, and for single-partition queries the mx reader
must use a native-reversed slice (where the clustering ranges themselves
are reversed as well). The modified slice must be stored somewhere; we
store it inside the mx reader itself so we don't need to allocate more
intermediate readers at the call sites. This causes the interface of
`mx::make_reader` to be a bit weird: for non-single-partition queries
where the provided slice is reversed the reader will actually return a
non-reversed stream of fragments, telling the user to reverse the stream
on their own. The interface has been documented in detail with
appropriate comments.
This patch adds an implementation of a data source that wraps an sstable
data file and returns data buffers with contents of one partition in the
sstable as if the rows of the partition were present in a reversed
order. In other words, to the user of the source the partition appears
to be reversed. We shall call this an 'intermediary' data source.
As part of the interface of the intermediary source the user is also
given read access to the source's current position over the data file,
and the constructor of the source takes a reference to `index_reader`.
This is necessary because the index operates directly on data file
offsets and we want the user to be able to use the index to skip
sequences of rows.
In order to ask the source to skip a sequence of rows - e.g. when jumping
between clustering ranges - the user must advance the index's upper bound
in reverse (to an earlier position). The source will then notice that
the end position of the index has changed and take appropriate action.
An alternative would be to translate the data positions of
`index_reader` to 'reversed positions' of the intermediary and then use
`skip_to` for skipping, as we do for forward reads. However this
solution would introduce more complexity to `index_reader` and the
intermediary source. One reason for the complexity is that we would
have two kinds of skips: a single-row skip and a skip to a clustering
range. We know the offset of the next row, so we could check that to
differentiate the two. We would also need to add information about
the position of the first clustering row and the end of the last one
to the index_reader. Skipping by checking the index seems simpler
overall.
For simplicity, the intermediary stream always starts by parsing the
partition header and (if present) the static row, returning the
corresponding bytes as the result of the first read.
After partition header and static row we must find the last row entry of
the requested range. If the range ends before the partition end (i.e.
there are more row entries after the range) we can use the 'previous
unfiltered size' of the row following the range; otherwise we must scan
the last promoted index block and take its last row.
After finding the data range of the last row, we parse rows
consecutively in reversed order. We must parse the rows partially to
learn their lengths and the positions of previous rows. We use
constructs similar to those in the sstable parser, but ours contains
only a small part of the parsing coroutine and doesn't perform any
correctness checks. The row parser still turned out rather big,
mostly because we can't always deduce the size of the clustering
blocks without reading the block header.
The parser allows reading rows while skipping their bodies also in
non-reversed order, which we are making use of while reading the
last promoted index block.
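The core idea -- walking rows backwards through each row's recorded
'previous row size' -- can be illustrated with a toy serialization
(this is not the real sstable format; in the real reader the offset
of the last row comes from the promoted index):

```python
import struct

def encode_partition(payloads):
    """Toy row format: [u32 prev_row_size][u32 payload_len][payload].
    Returns the blob plus the offset of the last row (which the real
    reader would obtain from the promoted index)."""
    blob = b""
    prev_size = 0
    last_row_start = 0
    for p in payloads:
        last_row_start = len(blob)
        row = struct.pack("<II", prev_size, len(p)) + p
        blob += row
        prev_size = len(row)
    return blob, last_row_start

def rows_reversed(blob, last_row_start):
    """Yield payloads in reverse order by following prev_row_size."""
    off = last_row_start
    while True:
        prev_size, plen = struct.unpack_from("<II", blob, off)
        yield blob[off + 8 : off + 8 + plen]
        if off == 0:
            break          # reached the first row of the partition
        off -= prev_size
```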
The intermediary data source has one more utility: reversing range
tombstones. When we read a tombstone bound/boundary, we modify
the data buffer so that the resulting bound/boundary has the reversed
kind (so we don't read ends before starts) and the boundaries have their
before/after timestamps swapped.
In the sstable reader, we iterate over clustering ranges using the
index_reader, which normally only accepts advancing to increasing
positions. In this patch we add methods for advancing the index
reader in reverse.
To simplify our job we restrict our attention to a single implementation
of the promoted index block cursor: `bsearch_clustered_cursor`. The
`index_reader` methods for advancing in reverse will thus assume that
this implementation is used. The assumption is correct given that we're
working only with sstables of versions >= mc, which is indeed the
intended use case. We add some documentation in appropriate places to
make this obvious.
We extend `bsearch_clustered_cursor` with two methods:
`advance_past(pos)`, which advances the cursor to the first block after
`pos` (or to the end if there is no such block), and
`last_block_offset()`, which returns the data file offset of the first
row from the last promoted index block.
To efficiently find the position in the data file of the last row
of the partition (which we need when performing a reversed query)
the sstable reader may need to read the span of the entire last promoted
index block in the data file. To learn where the block starts it can use
`index_reader::last_block_offset()`, which is implemented in terms of
`bsearch_clustered_cursor::last_block_offset()`.
When performing a single partition read in forward order, the reader
asks the index to position its lower bound at the start of the partition
and its upper bound after the end of the slice. It starts by reading the
first range. After exhausting a range it jumps to the next one by asking
the index to advance the lower bound.
For reverse single partition reads we'll take a similar approach: the
initial bound positions are as in the forward case. However, we start
with the last range and after exhausting a range we want to jump to a
previous one; we will do it by advancing the upper bound in reverse
(i.e. moving it closer to the beginning of the partition). For this
we introduce the `index_reader::advance_reverse` function.
"
The storage_service is involved in the cdc_generation_service guts
more than needed:
- the bool _for_testing bit is cdc-only
- there's an API-only cdc_generation_service getter
- part of the cdc::generation_service startup code sits in the
  storage_service one
This patch cleans up most of the above, leaving only the startup
_cdc_gen_id on board.
tests: unit(dev)
refs: #2795
"
* 'br-storage-service-vs-cdc-2' of https://github.com/xemul/scylla:
api: Use local sharded<cdc::generation_service> reference
main: Push cdc::generation_service via API
storage_service: Ditch for_testing boolean
cdc: Replace db::config with generation_service::config
cdc: Drop db::config from description_generator
cdc: Remove all arguments from maybe_rewrite_streams_descriptions
cdc: Move maybe_rewrite_streams_descriptions into after_join
cdc: Squash two methods into one
cdc: Turn make_new_cdc_generation a service method
cdc: Remove ring-delay arg from make_new_cdc_generation
cdc: Keep database reference on generation_service
The end_point_hints_manager's field _last_written_rp is initialized in
its regular constructor, but is not copied in the move constructor.
Because the move constructor is always invoked when creating a new
endpoint manager, the _last_written_rp field is effectively always
initialized by the zero-argument constructor, and is thus set to
zero.
This can cause the following erroneous situation to occur:
- Node A accumulates hints towards B.
- Sync point is created at A.
It will be used later to wait for currently accumulated hints.
- Node A is restarted.
The endpoint manager A->B is created which has bogus value in the
_last_written_rp (it is set to zero).
- Node A replays its hints but does not write any new ones.
- A hint flush occurs.
If there are no hint segments on disk after flush, the endpoint
manager sets its last sent position to the last written position,
which is by design. However, the last written position has incorrect
value, so the last sent position also becomes incorrect and too low.
- Try to wait for the sync point created earlier.
The sync point waiting mechanism waits until last sent hint position
reaches or goes past the position encoded in the sync point, but it
will not happen because the last sent position is incorrect.
The above bug can be (sometimes) reproduced in
hintedhandoff_sync_point_api_test dtest.
Now, the _last_written_rp field is properly initialized in the move
constructor, which prevents the bug described above.
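The chain of events above can be condensed into a toy model
(illustrative names, not the hinted-handoff code):

```python
class EndpointManager:
    def __init__(self, last_written_rp):
        # The bug: the move constructor dropped this field, so after a
        # restart it was zero instead of the real replay position.
        self.last_written_rp = last_written_rp
        self.last_sent_rp = 0

    def flush(self, segments_left_on_disk):
        if not segments_left_on_disk:
            # no segments on disk: everything written counts as sent
            self.last_sent_rp = self.last_written_rp

def sync_point_reached(manager, sync_point_rp):
    """A sync point waiter wakes up only once last_sent_rp reaches it."""
    return manager.last_sent_rp >= sync_point_rp
```

With the field preserved, the waiter wakes up once the flush runs;
with it zero-initialized, the condition never becomes true and the
waiter hangs.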
Fixes: #9320
Closes #9426
We must abort the environment before the ticker as the environment may
require time to keep advancing during abort in order for all operations
to finish, e.g. operations that can finish only due to timeout.
Currently such operations may cause the test to hang indefinitely
at the end.
The test requires a small modification to ensure that
`delivery_queue::push` is not called after the queue was aborted.
Message-Id: <20210930143539.157727-1-kbraun@scylladb.com>
"This series removes layer violation in compaction, and also
simplifies compaction manager and how it interacts with compaction
procedure."
* 'compaction_manager_layer_violation_fix/v4' of github.com:raphaelsc/scylla:
compaction: split compaction info and data for control
compaction_manager: use task when stopping a given compaction type
compaction: remove start_size and end_size from compaction_info
compaction_manager: introduce helpers for task
compaction_manager: introduce explicit ctor for task
compaction: kill sstables field in compaction_info
compaction: kill table pointer in compaction_info
compaction: simplify procedure to stop ongoing compactions
compaction: move management of compaction_info to compaction_manager
compaction: move output run id from compaction_info into task
The script assumes the remote name is "origin", a fair
assumption, but not universally true. Read it from configuration
instead of guessing it.
Closes #9423
Since May 2020 empty strings are allowed in DynamoDB as attribute
values (see the announcement in [1]). However, they are still not
allowed as keys. We had tests that they are not allowed in keys of an
LSI or GSI, but missed tests that they are not allowed as keys
(partition or sort key) of base tables. This patch adds these missing
tests.
These tests pass - we already had code that checked for empty keys and
generated an appropriate error.
Note that for compatibility with DynamoDB, Alternator will forbid empty
strings as keys even though Scylla *does* support this possibility
(Scylla always supported empty strings as clustering key, and empty
partition keys will become possible with issue #9352).
[1] https://aws.amazon.com/about-aws/whats-new/2020/05/amazon-dynamodb-now-supports-empty-values-for-non-key-string-and-binary-attributes-in-dynamodb-tables/
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211003122842.471001-1-nyh@scylladb.com>
The gist of this series is splitting `compaction_abort_exception`
from `compaction_stop_exception` and their respective error messages,
to differentiate between a compaction being stopped due to e.g.
shutdown or an API event, and a compaction aborting due to a scrub
validation error.
While at it, clean up the existing retry logic related to
`compaction_stop_exception`.
Test: unit(dev)
Dtest: nodetool_additional_test.py:TestNodetool.{{scrub,validate}_sstable_with_invalid_fragment_test,{scrub,validate}_ks_sstable_with_invalid_fragment_test,{scrub,validate}_with_one_node_expect_data_loss_test} (dev, w/ https://github.com/scylladb/scylla-dtest/pull/2267)
Closes #9321
* github.com:scylladb/scylla:
compaction: split compaction_aborted_exception from compaction_stopped_exception
compaction_manager: maybe_stop_on_error: rely on retry=false default
compaction_manager: maybe_stop_on_error: sync return value with error message.
compaction: drop retry parameter from compaction_stop_exception
compaction_manager: move errors stats accounting to maybe_stop_on_error
Just like EBS disks for EC2, we want to use persistent disk on GCE.
We won't recommend using it, but we still need to support it.
Related scylladb/scylla-machine-image#215
Closes #9395
* seastar e6db0cd587...0ba6c36cc3 (6):
> semaphore: add try_get_units
> build: adjust compilation for libfmt 8+
> alloc_failure_injector: add explicit zero-initialization
> Change wakeup() from private to public in reactor.hh
> app-template: separate seastar options into --seastar-help
> files: Don't ignore FS info for read-only files
The pull_github_pr.sh script is preferred over colorful github
buttons, because it's designed to always assign proper authors.
It also works for both single- and multi-patch series, which
makes the merging process more universal.
Message-Id: <b982b650442456b988e1cea59aa5ad221207b825.1633101849.git.sarna@scylladb.com>
by setting _alloc_count initially to 0.
The _alloc_count wasn't explicitly initialized. As the allocator has
usually been an automatic variable, _alloc_count initially held some
unspecified contents. This probably means that cases where the first
few allocations passed and a later one failed may never have been
tested. The good thing is that most of the users have been
transferred to the Seastar failure injector, which (by accident) has
been correct.
Closes #9420