Commit Graph

109 Commits

Author SHA1 Message Date
Botond Dénes
0f60cc84f4 Merge 'replica: create a replica module' from Avi Kivity
Move the ::database, ::keyspace, and ::table classes to a new replica
namespace and replica/ directory. This designates objects that only
have meaning on a replica and should not be used on a coordinator
(but note that not all replica-only classes should be in this module,
for example compaction and sstables are lower-level objects that
deserve their own modules).

The module is imperfect - some additional classes like distributed_loader
should also be moved, but there is only one way to untie Gordian knots.

Closes #9872

* github.com:scylladb/scylla:
  replica: move ::database, ::keyspace, and ::table to replica namespace
  database: Move database, keyspace, table classes to replica/ directory
2022-01-07 13:37:40 +02:00
Avi Kivity
bbad8f4677 replica: move ::database, ::keyspace, and ::table to replica namespace
Move replica-oriented classes to the replica namespace. The main
classes moved are ::database, ::keyspace, and ::table, but a few
ancillary classes are also moved. There are certainly classes that
should be moved but aren't (like distributed_loader) but we have
to start somewhere.

References are adjusted treewide. In many cases, it is obvious that
a call site should not access the replica (but the data_dictionary
instead), but that is left for separate work.

scylla-gdb.py is adjusted to look for both the new and old names.
2022-01-07 12:04:38 +02:00
Raphael S. Carvalho
07fba4ab5d compaction_manager: Abort reshape for tables waiting for a chance to run
Tables waiting for a chance to run reshape wouldn't trigger stop
exception, as the exception was only being triggered for ongoing
compactions. Given that stop reshape API must abort all ongoing
tasks and all pending ones, let's change run_custom_job() to
trigger the exception if it found that the pending task was
asked to stop.
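
The behavior described above can be sketched as follows. This is a minimal Python/asyncio model, not the Seastar code: `CompactionStopped`, the `Task` class and its `stopping` flag are hypothetical stand-ins, and only the name run_custom_job() comes from the commit message. The point is the re-check after the wait, so a job that was still pending when the stop request arrived is aborted as well.

```python
import asyncio

class CompactionStopped(Exception):
    """Hypothetical stand-in for the stop exception."""

class Task:
    def __init__(self, name):
        self.name = name
        self.stopping = False  # set when a stop is requested

async def run_custom_job(sem, task, job):
    # The task may sit here for a while, waiting for a chance to run.
    async with sem:
        # Re-check after the wait: a stop requested while pending aborts too.
        if task.stopping:
            raise CompactionStopped(f"{task.name} was asked to stop")
        await job()

async def demo():
    sem = asyncio.Semaphore(1)
    ran = []

    async def slow_job():
        await asyncio.sleep(0.01)
        ran.append("first")

    async def never_runs():
        ran.append("second")

    first, second = Task("reshape-1"), Task("reshape-2")
    t1 = asyncio.create_task(run_custom_job(sem, first, slow_job))
    t2 = asyncio.create_task(run_custom_job(sem, second, never_runs))
    await asyncio.sleep(0)   # let both start; t2 is now pending on the semaphore
    second.stopping = True   # stop request arrives while t2 is still waiting
    results = await asyncio.gather(t1, t2, return_exceptions=True)
    return ran, results

ran, results = asyncio.run(demo())
```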

Tests:
dtest: compaction_additional_test.py::TestCompactionAdditional::test_stop_reshape_with_multiple_keyspaces
unit: dev

Fixes #9836.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211223002157.215571-1-raphaelsc@scylladb.com>
2022-01-06 18:04:16 +02:00
Avi Kivity
ae3a360725 database: Move database, keyspace, table classes to replica/ directory
The database, keyspace, and table classes represent the replica-only
part of the objects after which they are named. Reading from a table
doesn't give you the full data, just the replica's view, and it is not
consistent since reconciliation is applied on the coordinator.

As a first step in acknowledging this, move the related files to
a replica/ subdirectory.
2022-01-06 17:07:30 +02:00
Raphael S. Carvalho
4c28c49bc7 compaction_manager: make return of maybe_stop_on_error less confusing
maybe_stop_on_error() is confusing because it returns true if the task
can be retried, which is the opposite of what its name suggests.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220106143233.459903-1-raphaelsc@scylladb.com>
2022-01-06 16:39:15 +02:00
Avi Kivity
2e958b3555 Merge "Coroutinization of compaction sstable rewrite procedure" from Raphael
"
Completes coroutinization of rewrite_sstables().

tests: UNIT(debug)
"

* 'rewrite_sstable_coroutinization' of https://github.com/raphaelsc/scylla:
  compaction_manager: coroutinize main loop in sstable rewrite procedure
  compaction_manager: coroutinize exception handling in sstable rewrite procedure
  compaction_manager: mark task::finish_compaction() as noexcept
  compaction_manager: make maybe_stop_on_error() more flexible
2022-01-05 10:15:19 +02:00
Benny Halevy
e0a351e0c6 compaction_manager: stop_compaction: disallow specific types
We can stop only specific compaction types.

Reshard should be excluded since it mustn't be stopped.

And other types of compaction types like "VALIDATION" or "INDEX_BUILD"
are valid in terms of their syntax but unsupported by scylla so we better
return an error rather than appear to support them.
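
A rough sketch of the check described above, in Python rather than the actual C++ code. The set of stoppable types is an assumption made for illustration; the commit message only states that reshard must not be stopped and that "VALIDATION" and "INDEX_BUILD" are syntactically valid but unsupported. `check_stoppable` is a hypothetical name.

```python
# Assumed set of stoppable types; the commit message does not list them.
STOPPABLE = {"COMPACTION", "CLEANUP", "SCRUB", "UPGRADE", "RESHAPE"}
# Named in the commit message as valid syntax but unsupported.
KNOWN_BUT_UNSUPPORTED = {"VALIDATION", "INDEX_BUILD"}

def check_stoppable(type_name: str) -> str:
    """Return the normalized type name, or raise instead of silently accepting it."""
    name = type_name.upper()
    if name == "RESHARD":
        raise ValueError("RESHARD must not be stopped")
    if name in KNOWN_BUT_UNSUPPORTED:
        raise ValueError(f"{name} is valid syntax but not supported")
    if name not in STOPPABLE:
        raise ValueError(f"unknown compaction type: {type_name}")
    return name
```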

Test: unit(dev)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211222133449.2177746-1-bhalevy@scylladb.com>
2022-01-05 09:32:20 +02:00
Raphael S. Carvalho
f0b816d8e8 compaction_manager: coroutinize main loop in sstable rewrite procedure
with this patch, rewrite_sstables() is now fully coroutinized.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-01-04 16:03:23 -03:00
Raphael S. Carvalho
c85ba1e694 compaction_manager: coroutinize exception handling in sstable rewrite procedure
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-01-04 15:39:54 -03:00
Raphael S. Carvalho
59a65742f9 compaction_manager: mark task::finish_compaction() as noexcept
As it's intended to be used in a deferred action.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-01-04 15:30:04 -03:00
Raphael S. Carvalho
3fe4c2e517 compaction_manager: make maybe_stop_on_error() more flexible
It's hard to integrate maybe_stop_on_error() with coroutines as it
accepts a resolved future, not an exception pointer. Let's adjust
its interface, making it more flexible to work with.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-01-04 15:28:30 -03:00
Raphael S. Carvalho
ad82ede5f3 compaction: simplify rewrite_sstables() with coroutine
rewrite_sstables() is terribly nested, making it hard to read.
as usual, can be nicely simplified with coroutines.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211223135012.56277-1-raphaelsc@scylladb.com>
2021-12-26 14:10:52 +02:00
Botond Dénes
55bb70a878 Merge "Make sure TWCS per-window major includes all files" from Raphael
"
TWCS performs STCS on a window as long as it's the most recent one.
From there on, TWCS will compact all files in the past window into
a single file. With some moderate write load, it could happen that
there's still some compaction activity in that past window, meaning
that per-window major may miss some files being currently compacted.
As a result, a past window may contain more than 1 file after all
compaction activity is done on its behalf, which may increase read
amplification. To avoid that, TWCS will now make sure that per-window
major is serialized, to make sure no files are missed.

Fixes #9553.

tests: unit(dev).
"

* 'fix_twcs_per_window_major_v3' of https://github.com/raphaelsc/scylla:
  TWCS: Make sure major on past window is done on all its sstables
  TWCS: remove needless param for STCS options
  TWCS: kill unused param in newest_bucket()
  compaction: Implement strategy control and wire it
  compaction: Add interface to control strategy behavior.
2021-12-20 17:12:50 +02:00
Raphael S. Carvalho
49f40c8791 compaction: Implement strategy control and wire it
This implements the strategy control interface for both the manager and
tests, and wires it up.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-12-13 16:05:23 -03:00
Benny Halevy
fed7319698 compaction_manager: stop_compaction: expose optional table*
To be used by api layer.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-12-09 14:14:49 +02:00
Raphael S. Carvalho
6737c88045 compaction_manager: use single semaphore for serialization of maintenance compactions
We have three semaphores for serialization of maintenance ops.
1) _rewrite_sstables_sem: for scrub, cleanup and upgrade.
2) _major_compaction_sem: for major
3) _custom_job_sem: for reshape, resharding and offstrategy

scrub, cleanup and upgrade should be serialized with major,
so rewrite sem should be merged into major one.

offstrategy is also a maintenance op that should be serialized
with others, to reduce compaction aggressiveness and space
requirement.

resharding is one-off operation, so can be merged there too.
the same applies for reshape, which can take long and not
serializing it with other maintenance activity can lead to
exhaustion of resources and high space requirement.

let's have a single semaphore to guarantee their serialization.

deadlock isn't an issue because locks are always taken in same
order.
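
To illustrate the effect of the merge, here is a small Python/asyncio model (not the Seastar implementation): every maintenance operation takes the same semaphore, so at most one runs at a time. The variable names are hypothetical and the lock-ordering argument is not modeled.

```python
import asyncio

async def demo():
    maintenance_sem = asyncio.Semaphore(1)  # one semaphore for every maintenance op
    running = 0
    peak = 0
    order = []

    async def maintenance(kind):
        nonlocal running, peak
        async with maintenance_sem:
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0)  # yield point while "compacting"
            order.append(kind)
            running -= 1

    kinds = ["major", "cleanup", "scrub", "upgrade", "reshape", "offstrategy"]
    await asyncio.gather(*(maintenance(k) for k in kinds))
    return peak, order

peak, order = asyncio.run(demo())
```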

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211201182046.100942-1-raphaelsc@scylladb.com>
2021-12-07 12:18:07 +02:00
Benny Halevy
cc122984d6 compaction: scrub: add quarantine_mode option
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-12-05 18:29:04 +02:00
Benny Halevy
60ff28932c compaction_manager: perform_sstable_scrub: get the whole compaction_type_options::scrub
So we can pass additional options on top of the scrub mode.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-12-05 18:21:37 +02:00
Benny Halevy
bbe275f37d compaction: scrub_sstables_validate_mode: quarantine invalid sstables
When invalid sstables are detected, move them
to the quarantine subdirectory so they won't be
selected for regular compaction.

Refs #7658

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-12-05 18:14:16 +02:00
Benny Halevy
07c5ddf182 sstables: add is_eligible_for_compaction
Currently compaction_manager tracks sstables
based on !requires_view_building() and similarly,
table::in_strategy_sstables picks up only sstables
that are not in staging.

is_eligible_for_compaction() generalizes this condition
in preparation for adding a quarantine subdirectory for
invalid sstables that should not be compacted as well.
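
A minimal Python sketch of the generalized condition, assuming a hypothetical `quarantined` flag for the planned quarantine subdirectory. The real classes are C++ and may differ in detail.

```python
from dataclasses import dataclass

@dataclass
class SSTable:
    name: str
    requires_view_building: bool = False  # e.g. still in staging
    quarantined: bool = False             # hypothetical flag for the quarantine dir

    def is_eligible_for_compaction(self) -> bool:
        # One predicate instead of scattered !requires_view_building() checks.
        return not self.requires_view_building and not self.quarantined

def in_strategy_sstables(sstables):
    """Keep only the sstables a compaction strategy is allowed to pick."""
    return [s for s in sstables if s.is_eligible_for_compaction()]
```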

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-12-05 18:00:44 +02:00
Raphael S. Carvalho
9725e5efa9 compaction_strategy: kill unused can_compact_partial_runs()
This strategy method was introduced unnecessarily. We assumed it was
going to be needed, but it turns out it was never needed, not even
for ICS. Also it's built on a wrong assumption as an output
sstable run being generated can never be compacted in parallel
as the non-overlapping requirement can be easily broken.
LCS for example can allow parallel compaction on different runs
(levels) but correctness cannot be guaranteed when the same runs
are compacted in parallel.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-12-03 12:20:51 -03:00
Raphael S. Carvalho
6d750d4f59 compaction_manager: move check_for_cleanup into perform_cleanup()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-12-02 14:39:31 -03:00
Raphael S. Carvalho
9aed7e9d67 compaction_manager: replace get_total_size by one liner
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-12-02 14:39:31 -03:00
Raphael S. Carvalho
760cfd93fb compaction_manager: make consistent usage of type and name table
new code in manager adopted name and type table, whereas historical
code still uses name and type column family. let's make it consistent
for newcomers to not get confused.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-12-02 14:39:27 -03:00
Raphael S. Carvalho
e460f72250 compaction_manager: simplify rewrite_sstables()
as rewrite_sstables() switched to coroutine, it can be simplified
by not using smart pointers to handle lifetime issues.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-12-02 08:15:41 -03:00
Raphael S. Carvalho
48124fc15a compaction_manager: restore indentation
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-12-02 08:15:38 -03:00
Raphael S. Carvalho
f23e0d7f2d compaction_manager: Disconsider inactive tasks when filtering sstables
After commit 1f5b17f, overlapping can be introduced in level 1 because
procedure that filters out sstables from partial runs is considering
inactive tasks, so L1 sstables can be incorrectly filtered out from
next compaction attempt. When L0 is merged into L1, overlapping is
then introduced in L1 because old L1 sstables weren't considered in
L0 -> L1 compaction.

From now on, compaction_manager::get_candidates() will only consider
active tasks, to make sure actual partial runs are filtered out.
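
The filtering change can be modeled roughly as below. This is a Python sketch under assumptions: the `output_run_id` and `compaction_running` fields are hypothetical simplifications of whatever the manager tracks per task; only get_candidates() is named in the commit message.

```python
from dataclasses import dataclass

@dataclass
class Task:
    output_run_id: str        # run the task writes (or last wrote) its output to
    compaction_running: bool  # True only while the task is actually compacting

@dataclass
class SSTable:
    name: str
    run_id: str

def get_candidates(sstables, tasks):
    # Only active tasks can own a partial (still being written) run.
    partial_runs = {t.output_run_id for t in tasks if t.compaction_running}
    return [s for s in sstables if s.run_id not in partial_runs]
```

With an inactive task that still remembers run "r1", sstables of "r1" stay in the candidate list instead of being filtered out.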

Fixes #9693.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211129180459.125847-1-raphaelsc@scylladb.com>
2021-12-01 16:11:44 +02:00
Benny Halevy
957003e73f compaction_manager: stop_compaction: wait for ongoing compactions to stop
Similar to #9313, stop_compaction should also reuse the
stop_ongoing_compactions() infrastructure and wait on ongoing
compactions of the given type to stop.

Fixes #9695

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-30 16:09:11 +02:00
Benny Halevy
b9ba181d3c compaction_manager: stop_ongoing_compactions: log Stopping 0 tasks at debug level
Normally, "Stopping 0 tasks for 0 ongoing compactions for table ..."
is not very interesting so demote its log_level to debug.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-30 16:09:11 +02:00
Benny Halevy
03e969dbef compaction_manager: unify stop_ongoing_compactions implementations
Now stop_ongoing_compactions(reason) is equivalent to
stop_ongoing_compactions(reason, nullptr, std::nullopt)
so share the code of the latter for the former entry point.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-30 16:09:07 +02:00
Benny Halevy
94011bdcca compaction_manager: stop_ongoing_compactions: add compaction_type option
And make the table optional as well, so it can be used
by stop_compaction() to stop a particular compaction type on all tables.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-30 16:07:47 +02:00
Benny Halevy
a419759835 compaction_manager: get_compactions: get a table* parameter
Optionally get running compaction on the provided table.
This is required for stop_ongoing_compactions on a given table.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-30 16:06:34 +02:00
Raphael S. Carvalho
80a1ebf0f3 compaction_manager: Fix race when selecting sstables for rewrite operations
Rewrite operations are scrub, cleanup and upgrade.

Race can happen because 'selection of sstables' and 'mark sstables as
compacting' are decoupled. So any deferring point in between can lead
to a parallel compaction picking the same files. After commit 2cf0c4bbf,
files are marked as compacting before rewrite starts, but it didn't
take into account the commit c84217ad which moved retrieval of
candidates to a deferring thread, before rewrite_sstables() is even
called.

Scrub isn't affected by this because it uses a coarse grained approach
where whole operation is run with compaction disabled, which isn't good
because regular compaction cannot run until its completion.

From now on, selection of files and marking them as compacting will
be serialized by running them with compaction disabled.

Now cleanup will also retrieve sstables with compaction disabled,
meaning it will no longer leave uncleaned files behind, which is
important to avoid data resurrection if node regains ownership of
data in uncleaned files.

Fixes #8168.
Refs #8155.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211129133107.53011-1-raphaelsc@scylladb.com>
2021-11-29 16:27:29 +02:00
Benny Halevy
0a33762fb1 compaction_manager: add compaction_state when table is constructed
With that, it is always expected that _compaction_state[cf]
exists when compaction jobs are submitted.

Otherwise, throw std::out_of_range exception.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-23 09:40:06 +02:00
Benny Halevy
29dd24ab46 compaction_manager: remove: fixup indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-23 09:40:06 +02:00
Benny Halevy
46ac139490 compaction_manager: remove: detach compaction_state before stopping ongoing compactions
So that the compaction_state won't be found from this point on,
while stopping the ongoing compaction.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-23 09:40:06 +02:00
Benny Halevy
75a2509b07 compaction_manager: remove: serialize stop_ongoing_compactions and gate.close
Now that compaction tasks enter the compaction_state gate there is
no point in stopping ongoing compaction in parallel to closing the gate.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-23 09:40:06 +02:00
Benny Halevy
3940ffb085 compaction_manager: task: keep a reference on compaction_state
And hold its gate to make sure the compaction_state outlives
the task and can be used to wait on all tasks and functions
using it.

With that, don't access _compaction_state[cf] to acquire
shared/exclusive locks but rather get to it via
task->compaction_state so it can be detached from
_compaction_state while task is running, if needed.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-23 09:40:06 +02:00
Benny Halevy
e7ab1f8581 compaction_manager: compaction_state: use counter for compaction_disabled
We'd like to use compaction_state::gate both for functions
running with compaction disabled and for tasks referring
to the compaction_state so that stop_ongoing_compactions
could wait on all functions referring to the state structure.

This is also cleaner with respect to not relying on
gate::use_count() when re-submitting regular compaction
when compaction is re-enabled.
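
A toy Python model of the counter approach, for illustration only. The class and attribute names are hypothetical; the idea taken from the message is that "disabled" becomes an explicit count, so re-enabling does not depend on gate::use_count().

```python
from contextlib import contextmanager

class CompactionState:
    def __init__(self):
        self.compaction_disabled_counter = 0
        self.resubmitted = 0  # times regular compaction was re-submitted

    def compaction_disabled(self) -> bool:
        return self.compaction_disabled_counter > 0

    @contextmanager
    def run_with_compaction_disabled(self):
        self.compaction_disabled_counter += 1
        try:
            yield
        finally:
            self.compaction_disabled_counter -= 1
            if not self.compaction_disabled():
                # The last disabler has left: compaction is re-enabled.
                self.resubmitted += 1
```

Nested users each bump the counter, and regular compaction is re-submitted only when the outermost one leaves.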

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-22 22:08:42 +02:00
Benny Halevy
0cc6060552 compaction_manager: add per-task debug log messages
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-22 22:00:18 +02:00
Benny Halevy
1d8d472028 compaction_manager: stop_ongoing_compactions: log number of tasks to stop
get_compactions().size() may return 0 while there are
non-zero tasks to stop.

Some tasks may not be marked as `compaction_running` since
they are either:
- postponed (due to compaction manager throttling of regular compaction)
- sleeping before retry.

In both cases we still want to stop them so the log message
should reflect both the number of ongoing compactions
and the actual number of tasks we're stopping.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-22 22:00:18 +02:00
Raphael S. Carvalho
d89edad9fb compaction: switch to table_state
Make compaction procedure switch to table_state. Only function in
compaction.cc still directly using table is
get_fully_expired_sstables(T,...), but subsequently we'll make it
switch to table_state and then we can finally stop including database.hh
in the compaction code.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-19 22:06:01 -03:00
Benny Halevy
9548220b70 compaction_manager: submit_offstrategy: remove task in finally clause
Now, when the offstrategy task is stopped, it exits the repeat
loop if (!can_proceed(task)) without going through
_tasks.remove(task) - causing the assert in compaction_manager::remove
to trip, as stop_ongoing_compactions will be resolved
while the task is still listed in _tasks.
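
The shape of the fix, sketched in Python (the real code is a Seastar repeat loop; `Manager`, `run_offstrategy` and the dict-based task are hypothetical). Moving the removal into a finally clause means the early exit on !can_proceed(task) still unregisters the task.

```python
class Manager:
    def __init__(self):
        self.tasks = []

    def can_proceed(self, task):
        return not task["stopping"]

    def run_offstrategy(self, task, steps):
        self.tasks.append(task)
        try:
            for step in steps:
                if not self.can_proceed(task):
                    return "stopped"  # early exit no longer skips the cleanup
                step()
            return "done"
        finally:
            # Runs on every exit path, so the task never stays listed.
            self.tasks.remove(task)
```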

Fixes #9634

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-17 09:53:59 +02:00
Botond Dénes
b58403fb63 Merge "Flatten database drain" from Pavel E
"
Draining the database is now scattered across the do_drain()
method of the storage_service. Also it tells shutdown drain
from API drain.

This set packs this logic into the database::drain() method.

tests: unit(dev), start-stop-drain(dev)
"

* 'br-database-drain' of https://github.com/xemul/scylla:
  database, storage_service: Pack database::drain() method
  storage_service: Shuffle drain sequence
  storage_service, database: Move flush-on-drain code
  storage_service: Remove bool from do_drain
2021-11-11 08:19:35 +02:00
Avi Kivity
d2e02ea7aa Merge " Abstract table for compaction layer with table_state" from Raphael
"
table_state is being introduced for compaction subsystem, to remove table dependency
from compaction interface, fix layer violations, and also make unit testing
easier as table_state is an abstraction that can be implemented even with no
actual table backing it.

In this series, compaction strategy interfaces are switching to table_state,
and eventually, we'll make compact_sstables() switch to it too. The idea is
that no compaction code will directly reference a table object, but only work
with the abstraction instead. So compaction subdirectory can stop
including database.hh altogether, which is a great step forward.
"

* 'table_state_v5' of https://github.com/raphaelsc/scylla:
  sstable_compaction_test: switch to table_state
  compaction: stop including database.hh for compaction_strategy
  compaction: switch to table_state in estimated_pending_compactions()
  compaction: switch to table_state in compaction_strategy::get_major_compaction_job()
  compaction: switch to table_state in compaction_strategy::get_sstables_for_compaction()
  DTCS: reduce table dependency for task estimation
  LCS: reduce table dependency for task estimation
  table: Implement table_state
  compaction: make table param of get_fully_expired_sstables() const
  compaction_manager: make table param of has_table_ongoing_compaction() const
  Introduce table_state
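
The abstraction this series describes can be pictured with a short Python sketch. The method names and the estimate formula are hypothetical; the real table_state is a C++ interface. The point is that strategy code depends only on the interface, so a unit test can supply an implementation with no actual table behind it.

```python
from typing import List, Protocol

class TableState(Protocol):
    """What a strategy may ask of a table (method names are hypothetical)."""
    def sstable_sizes(self) -> List[int]: ...
    def min_compaction_threshold(self) -> int: ...

class FakeTableState:
    """Test double: no real table behind it."""
    def __init__(self, sizes: List[int], threshold: int):
        self._sizes = sizes
        self._threshold = threshold

    def sstable_sizes(self) -> List[int]:
        return list(self._sizes)

    def min_compaction_threshold(self) -> int:
        return self._threshold

def estimated_pending_compactions(t: TableState) -> int:
    # Toy estimate, not any real strategy's formula.
    return len(t.sstable_sizes()) // t.min_compaction_threshold()
```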
2021-11-09 19:21:57 +02:00
Pavel Emelyanov
aba475fe1d storage_service: Remove bool from do_drain
The do_drain() today tells shutdown drain from API drain. The reason
is that compaction manager subscribes on the main's abort signal and
drains itself early. Thus, on regular drain it needs this extra kick
that would crash if called from shutdown drain.

This differentiation should sit in the compaction manager itself.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-09 19:10:13 +03:00
Raphael S. Carvalho
93ae9225f7 compaction: switch to table_state in compaction_strategy::get_major_compaction_job()
From now on, get_major_compaction_job() will use table_state instead of
a plain reference to table.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-09 11:25:22 -03:00
Raphael S. Carvalho
d881310b52 compaction: switch to table_state in compaction_strategy::get_sstables_for_compaction()
From now on, get_sstables_for_compaction() will use table_state.
With table_state, we avoid layer violations like strategy using
manager and also make testing easier.

Compaction unit tests were temporarily disabled to avoid a giant
commit which is hard to parse.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-09 10:52:14 -03:00
Raphael S. Carvalho
33b39a2bfc compaction: move run_with_compaction_disabled() from table into compaction_manager
That's intended to fix a bad layer violation as table was given the
responsibility of disabling compaction for a given table T, but that
logic clearly belongs to compaction_manager instead.

Additionally, a gate will be used instead of a counter, as the former
provides the manager with a way to synchronize with functions running
under run_with_compaction_disabled, so remove() can wait for their
termination.
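
Why a gate helps here can be shown with a toy Python/asyncio model. This `Gate` class is a simplified illustration, not Seastar's gate API: functions enter the gate while they run with compaction disabled, and remove() closes the gate, which waits until all of them have left.

```python
import asyncio

class Gate:
    """Toy gate: counts holders; close() waits until all have left."""
    def __init__(self):
        self._count = 0
        self._closed = False
        self._drained = asyncio.Event()
        self._drained.set()

    def enter(self):
        if self._closed:
            raise RuntimeError("gate closed")
        self._count += 1
        self._drained.clear()

    def leave(self):
        self._count -= 1
        if self._count == 0:
            self._drained.set()

    async def close(self):
        self._closed = True
        await self._drained.wait()

async def demo():
    gate = Gate()
    events = []

    async def run_with_compaction_disabled(func):
        gate.enter()
        try:
            await func()
        finally:
            gate.leave()

    async def work():
        await asyncio.sleep(0.01)
        events.append("work finished")

    async def remove():
        await gate.close()  # waits for functions still inside the gate
        events.append("table removed")

    w = asyncio.create_task(run_with_compaction_disabled(work))
    await asyncio.sleep(0)  # let the worker enter the gate first
    await remove()
    await w
    return events

events = asyncio.run(demo())
```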

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-08 15:12:46 -03:00
Raphael S. Carvalho
52feb41468 compaction_manager: switch to coroutine in compaction_manager::remove()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-08 14:24:39 -03:00