Move the ::database, ::keyspace, and ::table classes to a new replica
namespace and replica/ directory. This designates objects that only
have meaning on a replica and should not be used on a coordinator
(but note that not all replica-only classes should be in this module,
for example compaction and sstables are lower-level objects that
deserve their own modules).
The module is imperfect - some additional classes like distributed_loader
should also be moved, but there is only one way to untie Gordian knots.
Closes#9872
* github.com:scylladb/scylla:
replica: move ::database, ::keyspace, and ::table to replica namespace
database: Move database, keyspace, table classes to replica/ directory
Move replica-oriented classes to the replica namespace. The main
classes moved are ::database, ::keyspace, and ::table, but a few
ancillary classes are also moved. There are certainly classes that
should be moved but aren't (like distributed_loader) but we have
to start somewhere.
References are adjusted treewide. In many cases, it is obvious that
a call site should not access the replica (but the data_dictionary
instead), but that is left for separate work.
scylla-gdb.py is adjusted to look for both the new and old names.
Tables waiting for a chance to run reshape wouldn't trigger stop
exception, as the exception was only being triggered for ongoing
compactions. Given that stop reshape API must abort all ongoing
tasks and all pending ones, let's change run_custom_job() to
trigger the exception if it found that the pending task was
asked to stop.
Tests:
dtest: compaction_additional_test.py::TestCompactionAdditional::test_stop_reshape_with_multiple_keyspaces
unit: dev
Fixes#9836.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211223002157.215571-1-raphaelsc@scylladb.com>
The database, keyspace, and table classes represent the replica-only
part of the objects after which they are named. Reading from a table
doesn't give you the full data, just the replica's view, and it is not
consistent since reconciliation is applied on the coordinator.
As a first step in acknowledging this, move the related files to
a replica/ subdirectory.
"
Completes coroutinization of rewrite_sstables().
tests: UNIT(debug)
"
* 'rewrite_sstable_coroutinization' of https://github.com/raphaelsc/scylla:
compaction_manager: coroutinize main loop in sstable rewrite procedure
compaction_manager: coroutinize exception handling in sstable rewrite procedure
compaction_manager: mark task::finish_compaction() as noexcept
compaction_manager: make maybe_stop_on_error() more flexible
We can stop only specific compaction types.
Reshard should be excluded since it mustn't be stopped.
And other types of compaction types like "VALIDATION" or "INDEX_BUILD"
are valid in terms of their syntax but unsupported by scylla so we better
return an error rather than appear to support them.
Test: unit(dev)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211222133449.2177746-1-bhalevy@scylladb.com>
It's hard to integrate maybe_stop_on_error() with coroutines as it
accepts a resolved future, not an exception pointer. Let's adjust
its interface, making it more flexible to work with.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
"
TWCS perform STCS on a window as long as it's the most recent one.
From there on, TWCS will compact all files in the past window into
a single file. With some moderate write load, it could happen that
there's still some compaction activity in that past window, meaning
that per-window major may miss some files being currently compacted.
As a result, a past window may contain more than 1 file after all
compaction activity is done on its behalf, which may increase read
amplification. To avoid that, TWCS will now make sure that per-window
major is serialized, to make sure no files are missed.
Fixes#9553.
tests: unit(dev).
"
* 'fix_twcs_per_window_major_v3' of https://github.com/raphaelsc/scylla:
TWCS: Make sure major on past window is done on all its sstables
TWCS: remove needless param for STCS options
TWCS: kill unused param in newest_bucket()
compaction: Implement strategy control and wire it
compaction: Add interface to control strategy behavior.
We have three semaphores for serialization of maintenance ops.
1) _rewrite_sstables_sem: for scrub, cleanup and upgrade.
2) _major_compaction_sem: for major
3) _custom_job_sem: for reshape, resharding and offstrategy
scrub, cleanup and upgrade should be serialized with major,
so rewrite sem should be merged into major one.
offstrategy is also a maintenance op that should be serialized
with others, to reduce compaction aggressiveness and space
requirement.
resharding is one-off operation, so can be merged there too.
the same applies for reshape, which can take long and not
serializing it with other maintenance activity can lead to
exhaustion of resources and high space requirement.
let's have a single semaphore to guarantee their serialization.
deadlock isn't an issue because locks are always taken in same
order.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211201182046.100942-1-raphaelsc@scylladb.com>
When invalid sstables are detected, move them
to the quarantine subdirectory so they won't be
selected for regular compaction.
Refs #7658
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently compaction_manager tracks sstables
based on !requires_view_building() and similarly,
table::in_strategy_sstables picks up only sstables
that are not in staging.
is_eligible_for_compaction() generalizes this condition
in preparation for adding a quarantine subdirectory for
invalid sstables that should not be compacted as well.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This strategy method was introduced unnecessarily. We assume it was
going to be needed, but turns out it was never needed, not even
for ICS. Also it's built on a wrong assumption as an output
sstable run being generated can never be compacted in parallel
as the non-overlapping requirement can be easily broken.
LCS for example can allow parallel compaction on different runs
(levels) but correctness cannto be guaranteed with same runs
are compacted in parallel.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
new code in manager adopted name and type table, whereas historical
code still uses name and type column family. let's make it consistent
for newcomers to not get confused.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
as rewrite_sstables() switched to coroutine, it can be simplified
by not using smart pointers to handle lifetime issues.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
After commit 1f5b17f, overlapping can be introduced in level 1 because
procedure that filters out sstables from partial runs is considering
inactive tasks, so L1 sstables can be incorrectly filtered out from
next compaction attempt. When L0 is merged into L1, overlapping is
then introduced in L1 because old L1 sstables weren't considered in
L0 -> L1 compaction.
From now on, compaction_manager::get_candidates() will only consider
active tasks, to make sure actual partial runs are filtered out.
Fixes#9693.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211129180459.125847-1-raphaelsc@scylladb.com>
Similar to #9313, stop_compaction should also reuse the
stop_ongoing_comapctions() infrastructure and wait on ongoing
compactions of the given type to stop.
Fixes#9695
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Normally, "Stopping 0 tasks for 0 ongoing compactions for table ..."
is not very interesting so demote its log_level to debug.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Now stop_ongoing_compactions(reason) is equivalent to
to stop_ongoing_compactions(reason, nullptr, std::nullopt)
so share the code of the latter for the former entry point.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
And make the table optional as well, so it can be used
by stop_compaction() to a particular compaction type on all tables.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Optionally get running compaction on the provided table.
This is required for stop_ongoing_compactions on a given table.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Rewrite operations are scrub, cleanup and upgrade.
Race can happen because 'selection of sstables' and 'mark sstables as
compacting' are decoupled. So any deferring point in between can lead
to a parallel compaction picking the same files. After commit 2cf0c4bbf,
files are marked as compacting before rewrite starts, but it didn't
take into account the commit c84217ad which moved retrieval of
candidates to a deferring thread, before rewrite_sstables() is even
called.
Scrub isn't affected by this because it uses a coarse grained approach
where whole operation is run with compaction disabled, which isn't good
because regular compaction cannot run until its completion.
From now on, selection of files and marking them as compacting will
be serialized by running them with compaction disabled.
Now cleanup will also retrieve sstables with compaction disabled,
meaning it will no longer leave uncleaned files behind, which is
important to avoid data resurrection if node regains ownership of
data in uncleaned files.
Fixes#8168.
Refs #8155.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211129133107.53011-1-raphaelsc@scylladb.com>
With that, it is always expected that _compaction_state[cf]
exists when compaction jobs are submnitted.
Otherwise, throw std::out_of_range exception.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
So that the compaction_state won't be found from this point on,
while stopping the ongoing compaction.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Now that compaction tasks enter the compaction_state gate there is
no point in stopping ongoing compaction in parallel to closing the gate.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
And hold its gate to make sure the compaction_state outlives
the task and can be used to wait on all tasks and functions
using it.
With that, doing access _compaction_state[cf] to acquire
shared/exclusive locks but rather get to it via
task->compaction_state so it can be detached from
_compaction_state while task is running, if needed.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
We'd like to use compaction_state::gate both for functions
running with compaction disabled and for and tasks referring
to the compaction_state so that stop_ongoing_compactions
could wait on all functions referring to the state structure.
This is also cleaner with respect to not relying on
gate::use_count() when re-submitting regular compaction
when compaction is re-enabled.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
get_compactions().size() may return 0 while there are
non-zero tasks to stop.
Some tasks may not be marked as `compaction_running` since
they are either:
- postponed (due to compaction manger throttling of regular compaction)
- sleeping before retry.
In both cases we still want to stop them so the log message
should reflect both the number of ongoing compactions
and the actual number of tasks we're stopping.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Make compaction procedure switch to table_state. Only function in
compaction.cc still directly using table is
get_fully_expired_sstables(T,...), but subsequently we'll make it
switch to table_state and then we can finally stop including database.hh
in the compaction code.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Now, when the offstrategy task is stopped, it exits the repeat
loop if (!can_proceed(task)) without going through
_tasks.remove(task) - causing the assert in compaction_manger::remove
to trip, as stop_ongoing_compactions will be resolved
while the task is still listed in _tasks.
Fixes#9634
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
"
Draining the database is now scattered across the do_drain()
method of the storage_service. Also it tells shutdown drain
from API drain.
This set packs this logic into the database::drain() method.
tests: unit(dev), start-stop-drain(dev)
"
* 'br-database-drain' of https://github.com/xemul/scylla:
database, storage_service: Pack database::drain() method
storage_service: Shuffle drain sequence
storage_service, database: Move flush-on-drain code
storage_service: Remove bool from do_drain
"
table_state is being introduced for compaction subsystem, to remove table dependency
from compaction interface, fix layer violations, and also make unit testing
easier as table_state is an abstraction that can be implemented even with no
actual table backing it.
In this series, compaction strategy interfaces are switching to table_state,
and eventually, we'll make compact_sstables() switch to it too. The idea is
that no compaction code will directly reference a table object, but only work
with the abstraction instead. So compaction subdirectory can stop
including database.hh altogether, which is a great step forward.
"
* 'table_state_v5' of https://github.com/raphaelsc/scylla:
sstable_compaction_test: switch to table_state
compaction: stop including database.hh for compaction_strategy
compaction: switch to table_state in estimated_pending_compactions()
compaction: switch to table_state in compaction_strategy::get_major_compaction_job()
compaction: switch to table_state in compaction_strategy::get_sstables_for_compaction()
DTCS: reduce table dependency for task estimation
LCS: reduce table dependency for task estimation
table: Implement table_state
compaction: make table param of get_fully_expired_sstables() const
compaction_manager: make table param of has_table_ongoing_compaction() const
Introduce table_state
The do_drain() today tells shutdown drain from API drain. The reason
is that compaction manager subscribes on the main's abort signal and
drains itself early. Thus, on regular drain it needs this extra kick
that would crash if called from shutdown drain.
This differentiation should sit in the compaction manager itself.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
From now on, get_major_compaction_job() will use table_state instead of
a plain reference to table.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
From now on, get_sstables_for_compaction() will use table_state.
With table_state, we avoid layer violations like strategy using
manager and also makes testing easier.
Compaction unit tests were temporarily disabled to avoid a giant
commit which is hard to parse.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
That's intended to fix a bad layer violation as table was given the
responsibility of disabling compaction for a given table T, but that
logic clearly belongs to compaction_manager instead.
Additionally, gate will be used instead of counter, as former provides
manager with a way to synchronize with functions running under
run_with_compaction_disabled. so remove() can wait for their
termination.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>