Commit Graph

28523 Commits

Author SHA1 Message Date
Tzach Livyatan
fb095702b0 Update docker-hub text
Mention aarch64 support
2021-10-05 17:31:14 +03:00
Pavel Emelyanov
4b4ce015aa system-keyspace: Keep UUID value when saving
The set_local_host_id() accepts UUID references and starts
to save it in local keyspace and in all shards' local cache.
Before it was coroutinized the UUID was copied on captures
and survived, after it it remains references. The problem is
that callers pass local variables as arguments that go away
"really soon".

Fix it to accept UUID as value, it's short enough for safe
and painless copy.

fixes: #9425
tests: dtest.ReplaceAddress_rbo_enabled.replace_node_diff_ip(dev)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20211004145421.32137-1-xemul@scylladb.com>
2021-10-04 18:21:44 +03:00
Tomasz Grabiec
cd9b4d95fc Merge "test: raft: randomized_nemesis_test: better liveness check at the end of generator test" from Kamil
The previous check would find a leader once and assume that it does not
change, and that the first attempt at sending a request to this leader
succeeds. In reality the leader may change at the end of the test (e.g.
it may be in the middle of stepping down when we find it) and in general
it may take some time for the cluster to stabilize. The new check tries
a few times to find a leader and perform a request - until a time limit
is reached.

* kbr/nemesis-liveness-check:
  test: raft: randomized_nemesis_test: better liveness check at the end of generator test
  test: raft: randomized_nemesis_test: take `time_point` instead of `duration` in `wait_for_leader`
2021-10-04 16:05:37 +02:00
Kamil Braun
17e771c5f5 test: raft: randomized_nemesis_test: better liveness check at the end of generator test
The previous check would find a leader once and assume that it does not
change, and that the first attempt at sending a request to this leader
succeeds. In reality the leader may change at the end of the test (e.g.
it may be in the middle of stepping down when we find it) and in general
it may take some time for the cluster to stabilize. The new check tries
a few times to find a leader and perform a request - until a time limit
is reached.

The commit also removes an incorrect assertion inside in `wait_for_leader`.
2021-10-04 15:57:54 +02:00
Kamil Braun
478a58e86d test: raft: randomized_nemesis_test: take time_point instead of duration in wait_for_leader
To be used in the next commit, where we call `wait_for_leader` in a loop
with the same deadline `time_point`.
2021-10-04 15:56:54 +02:00
Tomasz Grabiec
e89b9799b8 Merge 'sstable mx reader: implement reverse single-partition reads' from Kamil Braun
Until now reversed queries were implemented inside
`querier::consume_page` (more precisely, inside the free function
`consume_page` used by `querier::consume_page`) by wrapping the
passed-in reader into `make_reversing_reader` and then consuming
fragments from the resulting reversed reader.

The first couple of commits change that by pushing the reversing down below
the `make_combined_reader` call in `table::query`. This allows
working on improving reversing for memtables independently from
reversing for sstables.

We then extend the `index_reader` with functions that allow
reading the promoted index in reverse.

We introduce `partition_reversing_data_source`, which wraps an sstable data
file and returns data buffers with contents of a single chosen partition
as if the rows were stored in reverse order.

We use the reversing source and the extended index reader in
`mx_sstable_mutation_reader` to implement efficient (at least in theory)
reversed single-partition reads.

The patchset disables cache for reversed reads. Fast-forwarding
is not supported in the mx reader for reversed queries at this point.

Details in commit messages. Read the commits in topological order
for best review experience.

Refs: #9134
(not saying "Fixes" because it's only for single-partition queries
without forwarding)

Closes #9281

* github.com:scylladb/scylla:
  table: add option to automatically bypass cache for reversed queries
  test: reverse sstable reader with random schema and random mutations
  sstables: mx: implement reversed single-partition reads
  sstables: mx: introduce partition_reversing_data_source
  sstables: index_reader: add support for iterating over clustering ranges in reverse
  clustering_key_filter: clustering_key_filter_ranges owning constructor
  flat_mutation_reader: mention reversed schema in make_reversing_reader docstring
  clustering_key_filter: document clustering_key_filter_ranges::get_ranges
2021-10-04 15:37:34 +02:00
Kamil Braun
703aed3277 table: add option to automatically bypass cache for reversed queries
Currently the new reversing sstable algorithms do not support fast
forwarding and the cache does not yet handle reversed results. This
forced us to disable the cache for reversed queries if we want to
guarantee bounded memory. We introduce an option that does this
automatically (without specifying `bypass cache` in the query) and turn
it on by default.

If the user decides that they prefer to keep the cache at the
cost of fetching entire partitions into memory (which may be viable
if their partitions are small) during reversed queries, the option can
be turned off. It is live-updateable.
2021-10-04 15:24:12 +02:00
Kamil Braun
9bf6be5509 test: reverse sstable reader with random schema and random mutations
The test generates a random set of mutations and creates two readers:
- one by reversing the mutations, creating an sstable out of the result,
  and querying it in reverse,
- one by creating an sstable directly from the mutations and querying it
  in forward mode.

It checks that the readers give equal results.

The test already managed to find a bug where offsets returned by the
sstable index were interpreted incorrectly as absolute instead of
relative. It also helped find another bug unrelated to reversing (#9352).

Surprisingly few tests use the random schema and random mutation
utilities which seem to be quite powerful.
2021-10-04 15:24:12 +02:00
Kamil Braun
27238eaa0f sstables: mx: implement reversed single-partition reads
We use partition_reversing_data_source and the new `index_reader` methods
to implement single-partition reads in `mx_sstable_mutation_reader`.

The parsing logic does not need to change: the buffers returned by the
source already contain rows in reversed clustering order.

Some changes were required in `mp_row_consumer_m` which processes the
parsed rows and emits appropriate mutation fragments. The consumer uses
`mutation_fragment_filter` underneath to decide whether a fragment
should be ignored or not (e.g. the parsed fragment may come from outside
the requested clustering range), among other things. Previously
`mutation_fragment_filter` was provided a `partition_slice`. If the
slice was reversed, the filter would use
`clustering_key_filter_ranges::get_ranges` to obtain the clustering
ranges from the slice in unreversed order (they were reversed in the
slice) since we didn't perform any reversing in the reader. Now the
reader provides the ranges directly instead of the slice; furthermore,
the ranges are provided in native-reversed format (the order of ranges
is reversed and the ranges themselves are also reversed), and the schema
provided to the filter is also reversed. Thus to the filter everything
appears as if it was used during a non-reversed query but on a table
with reversed schema, which works correctly given the fact that the
reader is feeding parsed rows into the consumer in reversed order.

During reversed queries the reader uses alternative logic for skipping
to a later range (or, speaking in non-reversed terms, to an earlier range),
which happens in `advance_context`. It asks the index to advance its
upper bound in reverse so that the reversing_data_source notices the
change of the index end position and returns following buffers with rows
from the new range.

There is a slight difference in behavior of the reader from
`mp_row_consumer_m`'s point of view. For non-reversed reads, after
the consumer obtains the beginning of a row (`consume_row_start`)
- which contains the row's position but not the columns - and tells the
reader that the row won't be emitted because we need to skip to a later
range, the reader would tell the data source (the 'context') immediately
to skip to a later range by calling `skip_to`. This caused the source
not to return the rest of the row, and the rest of the row would not
be fed to the consumer (`consume_row_end`). However, for reversed reads,
the data source performs skipping 'on its own', after it notices that
the index end position has changed. This may happen 'too late', causing
the rest of the row to be returned anyway. We are prepared for this
situation inside `mp_row_consumer` by consulting the mutation fragment
filter again when the rest of the row arrives.

Fast forwarding is not supported at this point, which is fine given that
the cache is disabled for reversed queries for now (and the cache is the
only user of fast forwarding).

The `partition_slice` provided by callers is provided in 'half-reversed'
format for reversed queries, where the order of clustering ranges is
reversed, but the ranges themselves are not. This means we need to modify
the slice sometimes: for non-single-partition queries the mx reader must
use a non-reversed slice, and for single-partition queries the mx reader
must use a native-reversed slice (where the clustering ranges themselves
are reversed as well). The modified slice must be stored somewhere; we
store it inside the mx reader itself so we don't need to allocate more
intermediate readers at the call sites.  This causes the interface of
`mx::make_reader` to be a bit weird: for non-single-partition queries
where the provided slice is reversed the reader will actually return a
non-reversed stream of fragments, telling the user to reverse the stream
on their own. The interface has been documented in detail with
appropriate comments.
2021-10-04 15:24:12 +02:00
Wojciech Mitros
64e703bb54 sstables: mx: introduce partition_reversing_data_source
This patch adds an implementation of a data source that wraps an sstable
data file and returns data buffers with contents of one partition in the
sstable as if the rows of the partition were present in a reversed
order. In other words, to the user of the source the partition appears
to be reversed. We shall call this an 'intermediary' data source.

As part of the interface of the intermediary source the user is also
given read access to the source's current position over the data file,
and the constructor of the source takes a reference to `index_reader`.
This is necessary because the index operates directly on data file
offsets and we want the user to be able to use the index to skip
sequences of rows.

In order to ask the source to skip a sequence of rows - e.g. when jumping
between clustering ranges - the user must advance the index' upper bound
in reverse (to an earlier position). The source will then notice that
the end position of the index has changed and take appropriate action.

An alternative would be to translate the data positions of
`index_reader` to 'reversed positions' of the intermediary and then use
`skip_to` for skipping, as we do for forward reads. However this
solution would introduce more complexity to `index_reader` and the
intermediary source. One reason for the complexity in the input stream
is that we would have two kinds of skips: a single row skip,
and a skip to a clustering range. We know the offset of the next row,
so we could check that to differentiate them. We would also need to add
an information about the position of first clustering row and end of
the last one in the index_reader. Skipping by checking the index seems
to be overall simpler.

For simplicity, the intermediary stream always starts with
parsing the partition header and (if present) the static row,
and returning the corresponding bytes as a result of the first
read.

After partition header and static row we must find the last row entry of
the requested range. If the range ends before the partition end (i.e.
there are more row entries after the range) we can use the 'previous
unfiltered size' of the row following the range; otherwise we must scan
the last promoted index block and take its last row.

After finding the data range of the last row, we parse rows
consecutively in reversed order.  We must parse the rows partially
to learn their lengths and the positions of previous rows. We're
using similar constructs as in the sstable parser, but it only
contains a small part of the parsing coroutine and doesn't perform
any correctness checks.  The parser for rows still turned out rather
big mostly because we can't always deduce the size of the clustering
blocks without reading the block header.

The parser allows reading rows while skipping their bodies also in
non-reversed order, which we are making use of while reading the
last promoted index block.

The intermediary data source has one more utility: reversing range
tombstones.  When we read a tombstone bound/boundary, we modify
the data buffer so that the resulting bound/boundary has the reversed
kind (so we don't read ends before starts) and the boundaries have their
before/after timestamps swapped.
2021-10-04 15:24:12 +02:00
Wojciech Mitros
8385f3eb21 sstables: index_reader: add support for iterating over clustering ranges in reverse
In the sstable reader, we iterate over clustering ranges using the
index_reader, which normally only accepts advancing to increasing
positions. In this patch we add methods for advancing the index
reader in reverse.

To simplify our job we restrict our attention to a single implementation
of the promoted index block cursor: `bsearch_clustered_cursor`. The
`index_reader` methods for advancing in reverse will thus assume that
this implementation is used. The assumption is correct given that we're
working only with sstables of versions >= mc, which is indeed the
intended use case. We add some documentation in appropriate places to
make this obvious.

We extend `bsearch_clustered_cursor` with two methods:
`advance_past(pos)`, which advances the cursor to the first block after
`pos` (or to the end if there is no such block), and
`last_block_offset()`, which returns the data file offset of the first
row from the last promoted index block.

To efficiently find the position in the data file of the last row
of the partition (which we need when performing a reversed query)
the sstable reader may need to read the span of the entire last promoted
index block in the data file. To learn where the block starts it can use
`index_reader::last_block_offset()`, which is implemented in terms of
`bsearch_clustered_cursor::last_block_offset()`.

When performing a single partition read in forward order, the reader
asks the index to position its lower bound at the start of the partition
and its upper bound after the end of the slice. It starts by reading the
first range. After exhausting a range it jumps to the next one by asking
the index to advance the lower bound.

For reverse single partition reads we'll take a similar approach: the
initial bound positions are as in the forward case. However, we start
with the last range and after exhausting a range we want to jump to a
previous one; we will do it by advancing the upper bound in reverse
(i.e. moving it closer to the beginning of the partition).  For this
we introduce the `index_reader::advance_reverse` function.
2021-10-04 15:24:12 +02:00
Avi Kivity
3cb865103d Update seastar submodule
* seastar 0ba6c36cc3...994b4b5a0c (2):
  > Merge "Improve smp::invoke_on_others" from Pavel E
  > file: keep hint pointer alive when calling fcntl()
2021-10-04 15:36:45 +03:00
Avi Kivity
148a12f3da Merge "Keep storage_service less aware of cdc internals" from Pavel E
"
The storage_service is involved in the cdc_generation_service guts
more than needed.

 - the bool _for_testing bit is cdc-only
 - there's API-only cdc_generation_service getter
 - cdc_g._s. startup code partially sits in s._s. one

This patch cleans most of the above leaving only the startup
_cdc_gen_id on board.

tests: unit(dev)
refs: #2795

"

* 'br-storage-service-vs-cdc-2' of https://github.com/xemul/scylla:
  api: Use local sharded<cdc::generation_service> reference
  main: Push cdc::generation_service via API
  storage_service: Ditch for_testing boolean
  cdc: Replace db::config with generation_service::config
  cdc: Drop db::config from description_generator
  cdc: Remove all arguments from maybe_rewrite_streams_descriptions
  cdc: Move maybe_rewrite_streams_descriptions into after_join
  cdc: Squash two methods into one
  cdc: Turn make_new_cdc_generation a service method
  cdc: Remove ring-delay arg from make_new_cdc_generation
  cdc: Keep database reference on generation_service
2021-10-04 14:56:05 +03:00
Piotr Dulikowski
6093c2378b hints: assign _last_written_rp in ep manager's move constructor
The end_point_hints_manager's field _last_written_rp is initialized in
its regular constructor, but is not copied in the move constructor.
Because the move constructor is always involved when creating a new
endpoint manager, the _last_written_rp field is effectively always
initialized with the zero-argument constructor, and is set to the zero
value.

This can cause the following erroneous situation to occur:

- Node A accumulates hints towards B.
- Sync point is created at A.
  It will be used later to wait for currently accumulated hints.
- Node A is restarted.
  The endpoint manager A->B is created which has bogus value in the
  _last_written_rp (it is set to zero).
- Node A replays its hints but does not write any new ones.
- A hint flush occurs.
  If there are no hint segments on disk after flush, the endpoint
  manager sets its last sent position to the last written position,
  which is by design. However, the last written position has incorrect
  value, so the last sent position also becomes incorrect and too low.
- Try to wait for the sync point created earlier.
  The sync point waiting mechanism waits until last sent hint position
  reaches or goes past the position encoded in the sync point, but it
  will not happen because the last sent position is incorrect.

The above bug can be (sometimes) reproduced in
hintedhandoff_sync_point_api_test dtest.

Now, the _last_written_rp field is properly initialized in the move
constructor, which prevents the bug described above.

Fixes: #9320
Closes #9426
2021-10-04 13:21:34 +02:00
Kamil Braun
b2f33b3e0b test: raft: randomized_nemesis_test: abort environment before ticker
We must abort the environment before the ticker as the environment may
require time to keep advancing during abort in order for all operations
to finish, e.g. operations that can finish only due to timeout.
Currently such operations may cause the test to hang indefinitely
at the end.

The test requires a small modification to ensure that
`delivery_queue::push` is not called after the queue was aborted.
Message-Id: <20210930143539.157727-1-kbraun@scylladb.com>
2021-10-04 12:31:26 +02:00
Avi Kivity
1bac93e075 Merge "simplifications and layer violation fix for compaction manager" from Raphael
"This series removes layer violation in compaction, and also
simplifies compaction manager and how it interacts with compaction
procedure."

* 'compaction_manager_layer_violation_fix/v4' of github.com:raphaelsc/scylla:
  compaction: split compaction info and data for control
  compaction_manager: use task when stopping a given compaction type
  compaction: remove start_size and end_size from compaction_info
  compaction_manager: introduce helpers for task
  compaction_manager: introduce explicit ctor for task
  compaction: kill sstables field in compaction_info
  compaction: kill table pointer in compaction_info
  compaction: simplify procedure to stop ongoing compactions
  compaction: move management of compaction_info to compaction_manager
  compaction: move output run id from compaction_info into task
2021-10-04 13:09:31 +03:00
Avi Kivity
93b765f655 scripts/pull_github_pr.sh: don't guess git remote name
The script assumes the remote name is "origin", a fair
assumption, but not universally true. Read it from configuration
instead of guessing it.

Closes #9423
2021-10-04 12:32:39 +03:00
Nadav Har'El
414b672e22 test/alternator: verify that empty-string keys are NOT allowed
Since May 2020 empty strings are allowed in DynamoDB as attribute values
(see announcment in [1]). However, they are still not allowed as keys.

We had tests that they are not allowed in keys of LSI or GSI, but missed
tests that they are not allowed as keys (partition or sort key) of base
tables. This patch add these missing tests.

These tests pass - we already had code that checked for empty keys and
generated an appropriate error.

Note that for compatibility with DynamoDB, Alternator will forbid empty
strings as keys even though Scylla *does* support this possibility
(Scylla always supported empty strings as clustering key, and empty
partition keys will become possible with issue #9352).

[1] https://aws.amazon.com/about-aws/whats-new/2020/05/amazon-dynamodb-now-supports-empty-values-for-non-key-string-and-binary-attributes-in-dynamodb-tables/

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211003122842.471001-1-nyh@scylladb.com>
2021-10-04 08:40:43 +02:00
Botond Dénes
61e7d3de90 Merge 'Cleanup compaction_stop_exception' from Benny Halevy
The gist of this series is splitting `compaction_abort_exception` from `compaction_stop_exception`
and their respective error messages to differentiate between compaction being stopped due to e.g. shutdown
or api event vs. compaction aborting due to scrub validation error.

While at it, cleanup the existing retry logic related to `compaction_stop_exception`.

Test: unit(dev)
Dtest: nodetool_additional_test.py:TestNodetool.{{scrub,validate}_sstable_with_invalid_fragment_test,{scrub,validate}_ks_sstable_with_invalid_fragment_test,{scrub,validate}_with_one_node_expect_data_loss_test} (dev, w/ https://github.com/scylladb/scylla-dtest/pull/2267)

Closes #9321

* github.com:scylladb/scylla:
  compaction: split compaction_aborted_exception from compaction_stopped_exception
  compaction_manager: maybe_stop_on_error: rely on retry=false default
  compaction_manager: maybe_stop_on_error: sync return value with error message.
  compaction: drop retry parameter from compaction_stop_exception
  compaction_manager: move errors stats accounting to maybe_stop_on_error
2021-10-04 07:27:11 +03:00
Takuya ASADA
9c830297ac scylla_util.py: add persistent disk support for GCE
Just like EBS disks for EC2, we want to use persistent disk on GCE.
We won't recommend to use it, but still need to support it.

Related scylladb/scylla-machine-image#215

Closes #9395
2021-10-03 17:58:18 +03:00
Takuya ASADA
d87b80ad14 scylla_util.py: add persistent disk support for Azure Just like EBS disks for EC2, we want to use persistent disk on Azure. We won't recommend to use it, but still need to support it.
Related https://github.com/scylladb/scylla-machine-image/issues/218

Closes #9417
2021-10-03 17:56:31 +03:00
Avi Kivity
adcd5a69d6 Update seastar submodule
* seastar e6db0cd587...0ba6c36cc3 (6):
  > semaphore: add try_get_units
  > build: adjust compilation for libfmt 8+
  > alloc_failure_injector: add explicit zero-initialization
  > Change wakeup() from private to public in reactor.hh
  > app-template: separate seastar options into --seastar-help
  > files: Don't ignore FS info for read-only files
2021-10-03 13:14:43 +03:00
Piotr Sarna
1d353bd6e7 docs: mention scripts/pull_github_pr.sh
The pull_github_pr.sh script is preferred over colorful github
buttons, because it's designed to always assign proper authors.
It also works for both single- and multi-patch series, which
makes the merging process more universal.
Message-Id: <b982b650442456b988e1cea59aa5ad221207b825.1633101849.git.sarna@scylladb.com>
2021-10-03 10:19:26 +03:00
Michał Radwański
0d5a2067ad test/lib/failure_injecting_allocation_strategy: remove UB...
by setting _alloc_count initially to 0.

The _alloc_count hasn't been explicitely specified. As the allocator has
been usually an automatic variable, _alloc_count had initially some
unspecified contents. This probalby means that cases where the first few
allocations passed and the later one failed, might haven't ever been
tested. Good thing is that most of the users have been transferred to
the Seastar failure injector, which (by accident) has been correct.

Closes #9420
2021-10-01 13:25:05 +02:00
Pavel Emelyanov
85d86cc85f scripts: Fix origin repo URL parsing
It assumes that the origin URL is git@ one while it can be
the https:// one as well.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20211001082116.7214-1-xemul@scylladb.com>
2021-10-01 13:22:06 +02:00
Piotr Sarna
ec52e05eab tracing: unify prepared statement info into a single struct
The tracing code assumes that query_option_names and query_option_values
vectors always have the same length as the prepared_statements vector,
but it's not true. E.g. if one of the statements in a batch is
incorrect, it will create a discrepancy between the number of prepared
statements and the number of bound names and values, which currently
leads to a segmentation fault.
To overcome the problem, all three vectors are integrated into a single
vector, which makes size mismatches impossible.

Tested manually with code that triggers a failure while executing
a batch statement, because the Python driver performs driver-side
validation and thus it's hard to create a test case which triggers
the problem.

closes: #9221
2021-10-01 10:57:38 +03:00
Eliran Sinvani
c38ceafdcf Service Level Controller: Add an extention point to the API (#9374)
In order to ease future extensions to the information being sent
by the service level configuration change API, we pack the additional
parameters (other the the service level options) to the interface in a
structure. This will allow an easy expansion in the future if more
parameters needs to be sent to the observer.i

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2021-10-01 10:20:28 +03:00
Raphael S. Carvalho
9067a13eac compaction: split compaction info and data for control
compaction_info must only contain info data to be exported to the
outside world, whereas compaction_data will contain data for
controlling compaction behavior and stats which change as
compaction progresses.
This separation makes the interface clearer, also allowing for
future improvements like removing direct references to table
in compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-30 13:16:57 -03:00
Raphael S. Carvalho
87ce0c5d43 compaction_manager: use task when stopping a given compaction type
compaction_info will eventually only be used for exporting data about
ongoing compactions, so task must be used instead.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-30 13:16:52 -03:00
Raphael S. Carvalho
cbd78be2dd compaction: remove start_size and end_size from compaction_info
those stats aren't used in compaction stats API and therefore they
can be removed. end_size is added to compaction_result (needed for
updating history) and start_size can be calculated in advance.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-30 13:16:45 -03:00
Raphael S. Carvalho
18f703e94b compaction_manager: introduce helpers for task
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-30 13:16:41 -03:00
Raphael S. Carvalho
d4572a1bb5 compaction_manager: introduce explicit ctor for task
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-30 13:16:37 -03:00
Raphael S. Carvalho
38df9c68f8 compaction: kill sstables field in compaction_info
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-30 13:16:33 -03:00
Raphael S. Carvalho
90cfe895d4 compaction: kill table pointer in compaction_info
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-30 13:16:29 -03:00
Raphael S. Carvalho
4ce745e0b6 compaction: simplify procedure to stop ongoing compactions
Today, compactions are tracked by both _compactions and _tasks,
where _compactions refer to actual ongoing compaction tasks,
whereas _tasks refer to manager tasks which is responsible for
spawning new compactions, retry them on failure, etc.
As each task can only have one ongoing compaction at a time,
let's move compaction into task, such that manager won't have to
look at both when deciding to do something like stopping a task.

So stopping a task becomes simpler, and duplication is naturally
gone.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-30 13:16:21 -03:00
Raphael S. Carvalho
efed06e2e4 compaction: move management of compaction_info to compaction_manager
Today, compaction is calling compaction manager to register / deregister
the compaction_info created by it.

This is a layer violation because manager sits one layer above
compaction, so manager should be responsible for managing compaction
info.

From now on, compaction_info will be created and managed by
compaction_manager. compaction will only have a reference to info,
which it can use to update the world about compaction progress.

This will allow compaction_manager to be simplified as info can be
coupled with its respective task, allowing duplication to be removed
and layer violation to be fixed.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-30 13:15:00 -03:00
Raphael S. Carvalho
1f5b17fdc5 compaction: move output run id from compaction_info into task
this run id is used to track partial runs that are being written to.
let's move it from info into task, as this is not an external info,
but rather one that belongs to compaction_manager.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-30 13:13:20 -03:00
Raphael S. Carvalho
52302c3238 compaction_manager: prevent unbounded growth of pending tasks
There will be unbounded growth of pending tasks if they are submitted
faster than retiring them. That can potentially happen if memtables
are frequently flushed too early. It was observed that this unbounded
growth caused task queue violations as the queue will be filled
with tons of tasks being reevaluated. By avoiding duplication in
pending task list for a given table T, growth is no longer unbounded
and consequently reevaluation is no longer aggressive.

Refs #9331.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210930125718.41243-1-raphaelsc@scylladb.com>
2021-09-30 16:49:52 +03:00
Pavel Emelyanov
037135316e api: Use local sharded<cdc::generation_service> reference
And remove the getter from storage_service.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-30 16:04:12 +03:00
Pavel Emelyanov
5d8e05e7ae main: Push cdc::generation_service via API
This is not to mess with storage service in this API call. Next
patch will make use of the passed reference.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-30 16:04:12 +03:00
Pavel Emelyanov
f669fbd230 storage_service: Ditch for_testing boolean
Nowadays it purely controls whether or not to inject delays into
timestamps generation by cdc. The same effect can be achieved by
configuring the cdc::generation_service directly.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-30 16:04:12 +03:00
Pavel Emelyanov
db623c5f64 cdc: Replace db::config with generation_service::config
This is to push the service towards general idea that each
component should have its own config and db::config to stay
in main.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-30 16:04:12 +03:00
Pavel Emelyanov
b879d3f3a5 cdc: Drop db::config from description_generator
It only needs one for murmur3_partitioner_ignore_msb_bits value,
provide it directly.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-30 16:04:12 +03:00
Pavel Emelyanov
2e7364b94f cdc: Remove all arguments from maybe_rewrite_streams_descriptions
All of them are references taken from 'this', since the function is
the generation_service method it can use 'this' directly.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-30 16:04:12 +03:00
Pavel Emelyanov
6fe31d8eac cdc: Move maybe_rewrite_streams_descriptions into after_join
The generation service already has all it needs to do it. This
keeps storage_service smaller and less aware about cdc internals.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-30 15:34:03 +03:00
Pavel Emelyanov
3b51c5c96a cdc: Squash two methods into one
The recently introduced make_new_generation() method just calls
another one by passing more this->... stuff as arguments. Relax
the flow by teaching the latter to use 'this' directly.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-30 15:34:03 +03:00
Pavel Emelyanov
7a7a87f24a cdc: Turn make_new_cdc_generation a service method
It has everything needed onboard. Only two arguments are required -- the
booststrap tokens and whether or not to inject a delay.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-30 15:34:03 +03:00
Pavel Emelyanov
b867a19da1 cdc: Remove ring-delay arg from make_new_cdc_generation
It already has the db::config from where to get one (and even
this will change soon).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-30 15:34:03 +03:00
Pavel Emelyanov
5e2a049266 cdc: Keep database reference on generation_service
The service effectively depends on it when rewrites streams
descriptions.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-09-30 15:34:03 +03:00
Piotr Sarna
e2fe8559ca configure: temporarily disable wasm support for aarch64
There seems to be a problem with libwasmtime.a dependency on aarch64,
causing occasional segfaults during tests - specifically, tests
which exercise the path for halting wasm execution due to fuel
exhaustion. As a temporary measure, wasm is disabled on this
architecture to unblock the flow.

Refs #9387

Closes #9414
2021-09-30 14:57:04 +03:00