Commit Graph

30482 Commits

Author SHA1 Message Date
Michał Chojnowski
f211ef9d71 replica: Prefer memtable::make_flat_reader_opt() to memtable::make_flat_reader()
The former is significantly cheaper when there is nothing to be read.
2022-03-14 12:02:49 +01:00
Michał Chojnowski
218f2b6e98 memtable: Add memtable::make_flat_reader_opt()
When there is nothing to read, make_flat_reader() returns an empty (no-op)
reader. But it turns out that constructing, combining and destroying that
empty reader is quite costly.
As an optimization, add an alternative version which returns an empty optional
instead.
2022-03-14 12:02:49 +01:00
Piotr Sarna
83ec505fab cql3: add tracing indexed aggregate queries
Commit 1c99ed6ced added tracing logs
about the index chosen for the query, but aggregate queries have
a separate code path, which wasn't taken into account.
After this patch, tracing for aggregate queries also includes
this additional information.

Closes #10195
2022-03-11 15:27:03 +02:00
Raphael S. Carvalho
67a7b7a3f4 compaction: rename interrupt() to a descriptive name
interrupt() makes it sound like it's interrupting the compaction, but it's
actually called *on* interrupt, to handle the interrupt scenario.
Let's rename it to on_interrupt().

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220311000128.189840-1-raphaelsc@scylladb.com>
2022-03-11 10:16:34 +02:00
Michał Sala
c8413631af forward_service: change implicit lambda capture list to explicit one
Changing the capture list of a lambda in
forward_service::execute_on_this_shard from [&] to an explicit one
enables grater readability and prevents potential bugs.

Closes #10191
2022-03-10 17:30:06 +02:00
Botond Dénes
e9ba8ad43a Merge "Configure gossiper the "classical" way" from Pavel Emelyanov
The services' configuration should be performed with the help of
service-specific config that's filled by the service creator. This
is not the case for gossiper that grabs the db::config and keeps
reference on it throughout its lifetime.

This set brings the gossiper configuration to the described form
by putting the needed config bits onto gossip_config (that already
exists and is partially used for gossiper configuration). And two
live-updateable options need extra care.

tests: unit(dev), dtest.simple_boot_shutdown(dev)

* 'br-gossiper-no-db-config' of https://github.com/xemul/scylla:
  gossiper: Remove db::config reference from gossiper
  gossiper: Keep live-updateable options on gossiper
  gossiper: Keep immutable options on gossip_config
2022-03-10 16:35:41 +02:00
Botond Dénes
ab440e1a07 mutation_writer: drop now unused v1 variants of bucket_writer feed_writer()
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20220302145945.189607-2-bdenes@scylladb.com>
2022-03-10 15:20:07 +02:00
Botond Dénes
108d921fc9 mutation_writer: partition_based_splitting_writer: convert implementation to v2
Although its API was long converted to v2, its implementation stayed v1
because the memtable and mutation API were still v1. Now that the
memtable flush returns a v2 reader we can have a second look at
converting this. While the mutation API still uses v1, this can easily
be worked around by using going through `mutation_rebuilder_v2`.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20220302145945.189607-1-bdenes@scylladb.com>
2022-03-10 15:20:07 +02:00
Botond Dénes
7e0b51ff23 Merge 'Overhaul compaction_manager::task' from Benny Halevy
The series overhauls the compaction_manager::task design and implementation
by properly layering the functionality between the compaction_manager
that deals with generic task execution, and the per-task business logic that is defined
in a set of classes derived from the generic task class.

While at it, the series introduces `task::state` and a set of helper functions to manage it
to prevent leaks in the statistics, fixing #9974.

Two more stats counter were exposed: `completed_tasks` and a new `postponed_tasks`.

Test: sstable_compaction_test
Dtest: compaction_test.py compaction_additional_test.py

Fixes #9974

Closes #10122

* github.com:scylladb/scylla:
  compaction_manager: use coroutine::switch_to
  compaction_manager::task: drop _compaction_running
  compaction_manager: move per-type logic to derived task
  compaction_manager: task: add state enum
  compaction_manager: task: add maybe_retry
  compaction_manager: reevaluate_postponed_compactions: mark as noexcept
  compaction_manager: define derived task types
  compaction_manager: register_metrics: expose postponed_compactions
  compaction_manager: register_metrics: expose failed_compactions
  compaction_manager: register_metrics: expose _stats.completed_tasks
  compaction: add documentation for compaction_type to string conversions
  compaction: expose to_string(compaction_type)
  compaction_manager: task: standardize task description in log messages
  compaction_manager: refactor can_proceed
  compaction_manager: pass compaction_manager& to task ctor
  compaction_manager: use shared_ptr<task> rather than lw_shared_ptr
  compaction_manager: rewrite_sstables: acquire _maintenance_ops_sem once
  compaction_manager: use compaction_state::lock only to synchronize major and regular compaction
2022-03-10 13:33:56 +02:00
Benny Halevy
5e1fda7e1d compaction_manager: use coroutine::switch_to
Saving an allocation for running the functor
as a task in the switched-to scheduling group.

Also, switch to the desired scheduling group at
the beginning of the task so that the higher level logic,
like getting the list of sstables to compact
will be performed under the desired scheduling group,
not only the compaction code itself.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-03-10 12:20:01 +02:00
Benny Halevy
8c66916652 compaction_manager::task: drop _compaction_running
Replace the _compaction_running boolean member
by calculating _state == state::active
now that setup_new_compaction switches state to
`active`

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-03-10 12:20:01 +02:00
Benny Halevy
a2a5e530f0 compaction_manager: move per-type logic to derived task
Move the business logic into the task specific classes.
Separating initialization during task construction,
from the compaction_done task, moved into
a do_run() method, and in some cases moving
a lambda function that was called per table (as in
rewrite_sstables) into a private method of the
derived class.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-03-10 12:20:01 +02:00
Benny Halevy
2e6ce43a97 compaction_manager: task: add state enum
Add an enum class representing the task state machine
and a switch_state function to transition between the states
and update the corresponding compaction_manager stats counters.

Refs #9974

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-03-10 12:19:59 +02:00
Mikołaj Sielużycki
5920349357 row_cache: Make row_cache reader from sstables compacting.
Reading data from sstables without compacting first puts
unnecessary pressure on the cache. The mutation streams
need to be resolved anyway before passing to subsequent
consumers, so it's better to do it as close to the
source as possible.

Fixes: #3568

Closes #10188
2022-03-10 11:40:10 +02:00
Benny Halevy
9c59d66b7e compaction_manager: task: add maybe_retry
Replacing and combining compaction_manager methods:
maybe_stop_on_error and put_task_to_sleep.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-03-10 11:35:37 +02:00
Benny Halevy
ee32be3aa5 compaction_manager: reevaluate_postponed_compactions: mark as noexcept
To simplify error handling in following patches
that will coroutinize task logic.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-03-10 11:35:37 +02:00
Benny Halevy
72162ed653 compaction_manager: define derived task types
Turn task into a class, defining a clear hierarchy
of private, protected, and public methods.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-03-10 11:35:35 +02:00
Avi Kivity
2f967f84d4 Merge "Migrate sstable writer to v2" from Botond
"
This patch-set converts the sstable writer to v2, then prepares the
ground for users actually being able to use the v2 variant. Finally it
converts all users to do so and then decommissions the v1 variant.
For users to be able to use the v2 writer API, we first have to add a v2
output to the compactor first, as some users write to sstables via the
compactor.

Tests: unit(dev, release)
"

* 'sstable-writer-v2/v2' of https://github.com/denesb/scylla:
  sstables/sstable: remove now unused v1 write_components() variant
  mutation_compactor: remove now unused compact_for_compaction
  test/boost/mutation_test: migrate to compact_for_mutation_v2
  streaming: migrate to v2 variant of sstable writer API
  memtable-sstable: migrate to v2 variant of sstable writer API
  test: migrate to the v2 variant of the sstable writer API
  sstables/sstable: expose v2 variant of write_components()
  sstables: convert mx writer to v2
  sstables/metadata_collector: use position_in_partition for min/max keys
  test/boost/mutation_test: test_compactor_range_tombstone_spanning_many_pages extend to check v2 output too
  mutation_reader: convert compacting reader v2
  mutation_compactor: add v2 output
  mutation_compactor: make _last_clustering_pos track last input
  range_tombstone_change: add set_tombstone()
  test/lib/mutation_source_test: log name of each run_mutation_source()
2022-03-10 09:45:57 +02:00
Botond Dénes
2e0610e459 sstables/sstable: remove now unused v1 write_components() variant
Supplanted by the v2 variant.
2022-03-10 09:16:33 +02:00
Botond Dénes
4e97477281 mutation_compactor: remove now unused compact_for_compaction 2022-03-10 09:16:33 +02:00
Botond Dénes
32e9809e9c test/boost/mutation_test: migrate to compact_for_mutation_v2 2022-03-10 09:16:33 +02:00
Botond Dénes
06e6bb6ec9 streaming: migrate to v2 variant of sstable writer API 2022-03-10 09:16:33 +02:00
Botond Dénes
d8fec08468 memtable-sstable: migrate to v2 variant of sstable writer API 2022-03-10 09:16:33 +02:00
Botond Dénes
959483a2dc test: migrate to the v2 variant of the sstable writer API 2022-03-10 09:16:33 +02:00
Benny Halevy
37694422dc compaction_manager: register_metrics: expose postponed_compactions
Provide a metric counting the number of tables
with postponed compaction.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-03-10 08:39:18 +02:00
Benny Halevy
089d4442d8 compaction_manager: register_metrics: expose failed_compactions
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-03-10 08:39:18 +02:00
Benny Halevy
8081f951d0 compaction_manager: register_metrics: expose _stats.completed_tasks
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-03-10 08:39:18 +02:00
Benny Halevy
ffc314d506 compaction: add documentation for compaction_type to string conversions
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-03-10 08:39:18 +02:00
Benny Halevy
28a74a2e90 compaction: expose to_string(compaction_type)
To be used in the next patch to generate
a string dscription from the compaction_type.

In theory, we could use compaction_name()
btu the latter returns the compaction type
in all-upper case and that is very different from
what we print to the log today.  The all-upper
strings are used for the api layer, e.g. to
stop tasks of a particular compaction type.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-03-10 08:39:18 +02:00
Benny Halevy
20a8609392 compaction_manager: task: standardize task description in log messages
Define task::describe and use it via operator<<
to print the task metadata to the log in a standard way.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-03-10 08:39:18 +02:00
Benny Halevy
59863b317f compaction_manager: refactor can_proceed
Move the task-internal parts of can_proceed
to a respective compaction_manager::task method,
preparing for turning it into a class with
a proper hierarchy of access to private members.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-03-10 08:39:17 +02:00
Benny Halevy
33b2731a4a compaction_manager: pass compaction_manager& to task ctor
And use it to get the compaction state of the
table to compact.

It will be used in a later patch
to manage the task state from task
methods.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-03-10 08:39:17 +02:00
Benny Halevy
20067b1050 compaction_manager: use shared_ptr<task> rather than lw_shared_ptr
Prepare for defining per compaction type tasks
derived from compaction_manager::task.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-03-10 08:39:17 +02:00
Benny Halevy
cb2403e917 compaction_manager: rewrite_sstables: acquire _maintenance_ops_sem once
Like all other maintenance operations, acquire the _maintenance_ops_sem
once for the whole task, rather than for each sstable.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-03-10 08:39:17 +02:00
Benny Halevy
d0f693a517 compaction_manager: use compaction_state::lock only to synchronize major and regular compaction
Maintenance operations like cleanup, upgrade, reshape, and reshard
are serialized serialized with major compaction using the _maintenance_ops_sem
and they need no further synchronization with regular compaction
by acquiring the per-table read lock..

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-03-10 08:39:17 +02:00
Botond Dénes
fed5b73147 sstables/sstable: expose v2 variant of write_components()
In parallel to the existing v1 one. In the next patches we start
migrating users to the v2 variant incrementally and finally remove the
v1 variant.
2022-03-10 07:03:49 +02:00
Botond Dénes
105bf8888a sstables: convert mx writer to v2
The sstables::sstable class has two methods for writing sstables:
1) sstable_writer get_writer(...);
2) future<> write_components(flat_mutation_reader, ...);

(1) directly exposes the writer type, so we have to update all users of
it (there is not that many) in this same patch. We defer updating
users of (2) to a follow-up commits.
2022-03-10 07:03:49 +02:00
Botond Dénes
11adb404c6 sstables/metadata_collector: use position_in_partition for min/max keys
Instead of naked clustering keys. Working with the latter is dangerous
because it cannot accurately represent the entire clustering domain: it
cannot represent positions between (before/after) keys. For this reason
the metadata collector had a separate update_min_max_components()
overload for range tombstones because the positions of these cannot be
represented by clustering keys alone.
Moving to position_in_partition solves this problem and it is now enough
to have a single overload with position_in_partition_view. This is also
more future proof as it will work with range tombstone changes without
any additional changes.
2022-03-10 07:03:49 +02:00
Botond Dénes
2057db54ab test/boost/mutation_test: test_compactor_range_tombstone_spanning_many_pages extend to check v2 output too 2022-03-10 07:03:49 +02:00
Botond Dénes
7a37e30310 mutation_reader: convert compacting reader v2
Its input was already a v2 reader, now itself is also a v2 reader.
With this commit, compaction.cc is finally v2 all-the-way.
2022-03-10 07:03:46 +02:00
Botond Dénes
ad435dcf57 mutation_compactor: add v2 output
The output version is selected via compactor_output_format, which is a
template parameter of `compact_mutation_state` and all downstream types.
This is to ensure a compaction state created to emit a v2 stream will
not be accidentally used with a v1 consumer.
When using a v2 output, the current active tombstone has to be tracked
separately for the regular and for the gc consumer (if any), so that
each can be closed properly on EOS. The current effective tombstone is
tracked separately from these two. The reason is that purged tombstones
are still applied to data, but are not emitted to the regular consumer.
2022-03-10 06:46:46 +02:00
Botond Dénes
1ccaeb2a1a mutation_compactor: make _last_clustering_pos track last input
Instead of updating _last_clustering_pos whenever a clustering fragment
is pushed to the consumers, we now update it whenever a clustering
fragment enters the compactor. Not only is this much more robust, but it
also makes more sense. Just because a range tombstone is purged (and
therefore the consumer doesn't see it), it still moves the logical
clustering position in the stream. Also, tracking the input side avoids
any ambiguity related to cases where we have two consumers (regular + gc
consumer).
2022-03-10 06:46:46 +02:00
Botond Dénes
b2b6f03a5d range_tombstone_change: add set_tombstone() 2022-03-10 06:46:46 +02:00
Botond Dénes
6544da342a test/lib/mutation_source_test: log name of each run_mutation_source()
Although we have a log in run_mutation_reader_tests(), it is useful to
know where it was called from, when trying to find the test scenario
that failed.
2022-03-10 06:46:46 +02:00
Avi Kivity
a4756334ce Merge "tools/scylla-types: improve documentation" from Botond
"
Add per-action help content for each action. Main description now points
to these for more details.
"

* 'scylla-types-improvements/v1' of https://github.com/denesb/scylla:
  tools/types: update main description
  tools/scylla-types: per-action help content
  tools/scylla-types: description: remove -- from action listing
  tools/scylla-types: use fmt::print() instead of std::cout <<
2022-03-09 19:37:00 +02:00
Avi Kivity
e1c326a5ba Merge "Convert multishard writer to v2" from Botond
"
Also convert the foreign_reader used by it in the process.

Tests: unit(dev)
"

* 'multishard-writer-v2/v1' of https://github.com/denesb/scylla:
  mutation_writer/multishard_writer: remove now unused v1 factory overloads
  test/boost/mutation_writer_test: test the v2 variant of distribute_reader_and_consume_on_shards()
  flat_mutation_reader: add v2 variant of make_generating_reader()
  mutation_reader: multishard_writer: migrate implementation to v2
  mutation_reader: convert foreign_reader to v2
  streaming/consumer: convert to v2
  mutation_writer/multishard_writer: add v2 variant of distribute_reader_and_consume_on_shards()
2022-03-09 19:28:05 +02:00
Avi Kivity
fbb3904b67 Merge 'query: transform asserts into on_internal_error in forward_result::merge' from Michał Sala
It was a suggestion from @psarna, done to get more info about the abort from #10174.

Closes #10185

* github.com:scylladb/scylla:
  query: do not assert in `operator<<(ostream&, const forward_result::printer&)`
  query: transform asserts into on_internal_error in forward_result::merge
2022-03-09 16:30:32 +02:00
Tomasz Grabiec
8fa704972f loading_cache: Make invalidation take immediate effect
There are two issues with current implementation of remove/remove_if:

  1) If it happens concurrently with get_ptr(), the latter may still
  populate the cache using value obtained from before remove() was
  called. remove() is used to invalidate caches, e.g. the prepared
  statements cache, and the expected semantic is that values
  calculated from before remove() should not be present in the cache
  after invalidation.

  2) As long as there is any active pointer to the cached value
  (obtained by get_ptr()), the old value from before remove() will be
  still accessible and returned by get_ptr(). This can make remove()
  have no effect indefinitely if there is persistent use of the cache.

One of the user-perceived effects of this bug is that some prepared
statements may not get invalidated after a schema change and still use
the old schema (until next invalidation). If the schema change was
modifying UDT, this can cause statement execution failures. CQL
coordinator will try to interpret bound values using old set of
fields. If the driver uses the new schema, the coordinaotr will fail
to process the value with the following exception:

  User Defined Type value contained too many fields (expected 5, got 6)

The patch fixes the problem by making remove()/remove_if() erase old
entries from _loading_values immediately.

The predicate-based remove_if() variant has to also invalidate values
which are concurrently loading to be safe. The predicate cannot be
avaluated on values which are not ready. This may invalidate some
values unnecessarily, but I think it's fine.

Fixes #10117

Message-Id: <20220309135902.261734-1-tgrabiec@scylladb.com>
2022-03-09 16:13:07 +02:00
Michał Sala
538cff651e query: do not assert in operator<<(ostream&, const forward_result::printer&)
Printing invalid forward_result should not cause Scylla to stop.
2022-03-09 14:58:11 +01:00
Michał Sala
51362e4e5e query: transform asserts into on_internal_error in forward_result::merge
It was done to show more context in case of forward_result::merge
arguments size mismatch and also to prevent aborts caused by another
nodes sending malformed data.
2022-03-09 14:58:11 +01:00