Commit Graph

194 Commits

Author SHA1 Message Date
Avi Kivity
bfc521ee9c Merge "Activate compaction_throughput_mb_per_sec option" from Pavel E
"
The option controls the IO bandwidth of the compaction sched class.
It's now set to be 16MB/s, but is unused. This set makes it 0 by
default (which means unlimited), live-updateable and plugs it to the
seastar sched group IO throttling.

branch: https://github.com/xemul/scylla/tree/br-compaction-throttling-3
tests: unit(dev),
       v2: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/1010/ ,
       v2: manual config update
"

* 'br-compaction-throttling-3-a' of https://github.com/xemul/scylla:
  compaction_manager: Add compaction throughput limit
  updateable_value: Support dummy observing
  serialized_action: Allow being observer for updateable_value
  config: Tune the config option
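
A minimal sketch of the behaviour described in the cover letter above: an option whose value 0 means "unlimited" and whose updates are pushed to a consumer at runtime. `live_value`, `throttle` and `throughput_to_bytes_per_sec` are hypothetical stand-ins in plain standard C++, not Scylla's `updateable_value` or the seastar throttling API, and treating 1 MB as 2^20 bytes is an assumption.

```cpp
#include <cstdint>
#include <functional>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a live-updateable config value.
template <typename T>
class live_value {
    T _value;
    std::vector<std::function<void(const T&)>> _observers;
public:
    explicit live_value(T v) : _value(std::move(v)) {}
    const T& get() const { return _value; }
    // Register a callback that runs on every update.
    void observe(std::function<void(const T&)> cb) { _observers.push_back(std::move(cb)); }
    void set(T v) {
        _value = std::move(v);
        for (auto& cb : _observers) {
            cb(_value);
        }
    }
};

// Translate the option (MB/s, 0 = unlimited) into a bytes-per-second cap.
// Assumption: 1 MB is taken as 2^20 bytes.
inline std::uint64_t throughput_to_bytes_per_sec(std::uint32_t mb_per_sec) {
    if (mb_per_sec == 0) {
        return std::numeric_limits<std::uint64_t>::max();
    }
    return static_cast<std::uint64_t>(mb_per_sec) << 20;
}

// Hypothetical consumer that re-applies the cap whenever the option changes.
struct throttle {
    std::uint64_t bytes_per_sec;
    explicit throttle(live_value<std::uint32_t>& opt)
        : bytes_per_sec(throughput_to_bytes_per_sec(opt.get())) {
        opt.observe([this](const std::uint32_t& v) {
            bytes_per_sec = throughput_to_bytes_per_sec(v);
        });
    }
    // The observer captures `this`, so the object must not be copied.
    throttle(const throttle&) = delete;
    throttle& operator=(const throttle&) = delete;
};
```

In this sketch a `throttle` built from `live_value<std::uint32_t> opt(0)` starts out unlimited, and `opt.set(16)` switches it to a 16 MB/s cap without reconstructing anything.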
2022-07-07 13:14:07 +03:00
Pavel Emelyanov
b112a98318 compaction_manager: Add compaction throughput limit
Re-use existing compaction_throughput_mb_per_sec option, push it down to
compaction manager via config and update the underlying compaction sched
class when the option is (live)updated.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-07-06 08:17:08 +03:00
Pavel Emelyanov
af026e423e compaction_manager: Add logging around drain
Now we know when it starts and whe^w if it finishes

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-07-01 17:17:53 +03:00
Pavel Emelyanov
a9d6e5cfb6 compaction_manager: Coroutinize drain
It's short enough to fix indentation right at once

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-07-01 17:17:53 +03:00
Benny Halevy
8bccd5e9c5 compaction_manager: task: acquire_semaphore: handle abort_requested_exception
Change 8f39547d89 added
`handle_exception_type([] (const semaphore_aborted& e) {})`,
but it turned out that `named_semaphore_aborted` isn't
derived from `semaphore_aborted`, but rather from
`abort_requested_exception` so handle the base exception
instead.
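
The mismatch can be illustrated with a small standard C++ sketch. The three exception types below are hypothetical stand-ins that reproduce only the relationship stated in this message (`named_semaphore_aborted` derives from `abort_requested_exception`, not from `semaphore_aborted`). Deriving the stand-in `semaphore_aborted` from `std::exception` is an assumption made for the illustration.

```cpp
#include <exception>

// Hypothetical stand-ins; only the inheritance relationship matters here.
struct abort_requested_exception : std::exception {
    const char* what() const noexcept override { return "abort requested"; }
};
struct semaphore_aborted : std::exception {  // base class is an assumption
    const char* what() const noexcept override { return "semaphore aborted"; }
};
struct named_semaphore_aborted : abort_requested_exception {
    const char* what() const noexcept override { return "named semaphore aborted"; }
};

// Returns true if a handler written for `Caught` matches a thrown `Thrown`.
template <typename Caught, typename Thrown>
bool is_caught_as() {
    try {
        throw Thrown{};
    } catch (const Caught&) {
        return true;
    } catch (...) {
        return false;
    }
}
```

With these stand-ins, `is_caught_as<semaphore_aborted, named_semaphore_aborted>()` is false while `is_caught_as<abort_requested_exception, named_semaphore_aborted>()` is true, which is why the handler has to name the base exception.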

Fixes #10666

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #10881
2022-06-27 09:47:48 +03:00
Benny Halevy
a65ed19edc table: perform_offstrategy_compaction: move off-strategy logic to compaction_manager
compaction_manager needs to decide about running off-strategy
compaction or not based on the maintenance_set, not partly
in table::trigger_offstrategy_compaction and partly in
the compaction_manager layer as it is done today.

So move the logic down to perform_offstrategy
that now returns future<bool> to return true
iff it performed offstrategy compaction.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-06-23 08:18:17 +03:00
Benny Halevy
9079c98db0 compaction_manager: offstrategy_compaction_task: refactor log printouts
Move logging from run_offstrategy_compaction to do_run
so that in the next patch we can skip run_offstrategy_compaction
if the maintenance set is empty (but still log it,
for the sake of dtests).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-06-23 08:02:44 +03:00
Pavel Emelyanov
0c8abca75e compaction_manager: Introduce compaction_manager::config
This is to make it constructible in a way most other services are -- all
the "scalar" parameters are passed via a config.

With this it will be much shorter to add compaction bandwidth throttling
option by just extending the config itself, not the list of constructor
arguments (and all its callers).
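
A rough sketch of the pattern, assuming hypothetical field names (`available_memory`, `throughput_mb_per_sec`) that are not taken from the actual `compaction_manager::config`:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of a service constructed from a nested config struct.
class compaction_manager_sketch {
public:
    struct config {
        std::size_t available_memory = 0;         // hypothetical field
        std::uint32_t throughput_mb_per_sec = 0;  // hypothetical field, 0 = unlimited
    };

    explicit compaction_manager_sketch(config cfg) : _cfg(cfg) {}

    const config& cfg() const { return _cfg; }

private:
    config _cfg;
};
```

Adding another scalar parameter then means adding one defaulted field to `config`. The constructor signature and every call site that does not care about the new field stay unchanged.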

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-06-16 17:40:19 +03:00
Pavel Emelyanov
997a34bf8c backlog_controller: Generalize scheduling groups
Make struct scheduling_group a sub-class of the backlog controller. Its
new meaning is now -- the group under controller maintenance. Both
database and compaction manager derive their sched groups from this one.

This makes backlog controller construction simpler, prepares the ground
for sched groups unification in seastar and facilitates next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-06-16 17:40:19 +03:00
Pavel Emelyanov
0fef2e0273 compaction_manager: Swap groups and controller
To have groups initialized before controller. Makes next patch shorter

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-06-16 17:40:19 +03:00
Pavel Emelyanov
fbb59fc920 compaction_manager: Keep compaction_sg on board
This is mainly to make next patch simpler. Also this makes the backlog
controller API smaller by removing its sg() method.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-06-16 17:40:19 +03:00
Pavel Emelyanov
0662036d27 compaction_manager: Unify scheduling_group structures
There are two of them with identical content and meaning

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-06-16 17:40:19 +03:00
Pavel Emelyanov
41f1044d3c compaction_manager: Merge static/dynamic constructors
The only difference between those two is in the way backlog controller
is created. It's much simpler to have the controller construction logic
in compaction manager instead. Similar "trick" is used to construct
flush controller for the database.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-06-16 17:40:19 +03:00
Pavel Emelyanov
2dbf0b5248 compaction_manager: Coroutinize really_do_stop()
This way it's more compact and easier to extend.
Also it's small enough to fix indentation right at once.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-06-16 17:40:19 +03:00
Pavel Emelyanov
bbd9fc26cd compaction_manager: Shuffle really_do_stop()
Make it a future-returning method and set up the _stop_future in its
only caller. Makes next patch much simpler

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-06-16 17:40:19 +03:00
Pavel Emelyanov
b19b8c9e5b compaction_manager: Remove try-catch around logger
Logging functions are all noexcept already

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-06-16 17:40:19 +03:00
Mikołaj Sielużycki
db5b05948b compaction: Clarify comment.
Closes #10799
2022-06-15 15:09:44 +03:00
Benny Halevy
8f39547d89 compaction_manager: task: convert semaphore_aborted to compaction_stopped exception
Fixes #10666

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #10686
2022-06-13 16:20:39 +03:00
Mikołaj Sielużycki
4143878558 compaction: Release compaction weight before updating history.
update_history can take a long time compared to compaction, as a call
issued on shard S1 can be handled on shard S2. If the other shard is
under heavy load, we may unnecessarily block kicking off a new
compaction. Normally it isn't a problem, as compactions aren't super
frequent, but there were edge cases where the described behaviour caused
compaction to fail to keep up with excessive flushing, leading to too
many sstables on disk and OOM during a read.

There is no need to wait with next compaction until history is updated,
so release the weight earlier to remove unnecessary serialization.
Compaction is marked as finished as soon as sstables are compacted
(without waiting for history update).
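
The reordering can be sketched in synchronous standard C++. `weight_registration` and `run_compaction` are hypothetical names for this illustration, and the real code is asynchronous. The point is only that the weight is released between the compaction step and the history update instead of at the end of the scope.

```cpp
#include <string>
#include <vector>

// Hypothetical RAII token counting compactions that currently hold a weight.
class weight_registration {
    int* _in_use;
public:
    explicit weight_registration(int& in_use) : _in_use(&in_use) { ++*_in_use; }
    weight_registration(const weight_registration&) = delete;
    weight_registration& operator=(const weight_registration&) = delete;
    ~weight_registration() { release(); }
    // Idempotent early release; the destructor becomes a no-op afterwards.
    void release() {
        if (_in_use) {
            --*_in_use;
            _in_use = nullptr;
        }
    }
};

// Runs the two steps and records how many weights were held during each.
inline std::vector<std::string> run_compaction(int& weights_in_use) {
    std::vector<std::string> events;
    weight_registration weight(weights_in_use);
    events.push_back("compact sstables (weights in use: " + std::to_string(weights_in_use) + ")");
    // Release before the potentially slow history update so that the next
    // compaction is not serialized behind it.
    weight.release();
    events.push_back("update history (weights in use: " + std::to_string(weights_in_use) + ")");
    return events;
}
```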
2022-06-07 12:55:28 +02:00
Mikołaj Sielużycki
5ce1fd1574 compaction: Inline compact_sstables_and_update_history call.
This commit introduces no functional changes and exists solely for
clarity of the change in the subsequent commit.
2022-06-07 12:55:28 +02:00
Mikołaj Sielużycki
533552273a compaction: Extract compact_sstables function
2022-06-07 12:55:28 +02:00
Mikołaj Sielużycki
33c5802957 compaction: Rename compact_sstables to compact_sstables_and_update_history
2022-06-07 12:55:28 +02:00
Mikołaj Sielużycki
9572520d0d compaction: Extract update_history function
2022-06-07 12:55:28 +02:00
Mikołaj Sielużycki
537819b7f8 compaction: Extract should_update_history function.
2022-06-07 12:55:28 +02:00
Mikołaj Sielużycki
447bd8a2e0 compaction: Fetch start_size from compaction_result
The start size is calculated during compaction and returned from
sstables::compact_sstables, so there is no need to do it twice.
2022-06-07 12:55:28 +02:00
Avi Kivity
4b53af0bd5 treewide: replace parallel_for_each with coroutine::parallel_for_each in coroutines
coroutine::parallel_for_each avoids an allocation and is therefore preferred. The lifetime
of the function object is less ambiguous, and so it is safer. Replace all eligible
occurrences (i.e. caller is a coroutine).

One case (storage_service::node_ops_cmd_heartbeat_updater()) needed a little extra
attention since there was a handle_exception() continuation attached. It is converted
to a try/catch.

Closes #10699
2022-05-31 09:06:24 +03:00
Raphael S. Carvalho
b120cacdd1 compaction_manager: Allow off-strategy to proceed in parallel to in-strategy compactions
Off-strategy works on maintenance sstable set using maintenance
scheduling group, whereas "in-strategy" works on main sstable set
and uses compaction group.

Today, it can happen that off-strategy has to wait for an "in-strategy"
maintenance compaction, e.g. cleanup, to complete before getting
a chance to run. But that's not desired behavior as off-strategy uses
maintenance group, and its candidates don't add to the backlog that
influences "in-strategy" bandwidth. Therefore, "in-strategy" and
off-strategy should be decoupled, with off-strategy having its own
semaphore for guaranteeing serialization across tables.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #10595
2022-05-19 17:37:11 +03:00
Raphael S. Carvalho
ca322fb7c2 compaction_manager: Quickly abort maintenance compaction waiting for its turn
Today, aborting a maintenance compaction like major, which is waiting for
its turn to run, can take lots of time because compaction manager will
only be able to bail out the task once it gets the "permit" from the
serialization mechanism, i.e. semaphore. Meaning that the command that
started the task will only complete after all this time waiting for
the "permit".

To allow a pending maintenance compaction to be quickly aborted, we
can use the abortable variant of get_units(). So when a user submits an
abort request, get_units() will be able to return earlier through
the abort exception.

Refs #10485.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #10581
2022-05-17 13:14:51 +03:00
Avi Kivity
528ab5a502 treewide: change metric calls from make_derive to make_counter
make_derive was recently deprecated in favor of make_counter, so
make the change throughout the codebase.

Closes #10564
2022-05-14 12:53:55 +02:00
Raphael S. Carvalho
5682393693 compaction: Fix use-after-move when retrying maintenance compaction
SSTable was moved into descriptor, so on failure, it couldn't be used
without resulting in a segfault. Fix it by not moving sst, and changing
signature to make it explicit we don't want to move the content.
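
A minimal sketch of the fixed shape, using hypothetical stand-ins (`shared_sstable` as a `std::shared_ptr<std::string>`, a one-field `descriptor`, and a retry loop with a fixed number of simulated failures):

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

using shared_sstable = std::shared_ptr<std::string>;  // hypothetical stand-in

struct descriptor {
    std::vector<shared_sstable> sstables;
};

// Takes the sstable by const reference and copies the shared pointer into the
// descriptor, so the caller's handle stays valid for a retry.
inline descriptor make_descriptor(const shared_sstable& sst) {
    return descriptor{{sst}};
}

// Retries until an attempt succeeds; returns the number of attempts made.
inline int rewrite_with_retry(const shared_sstable& sst, int failures_before_success) {
    assert(sst != nullptr);
    int attempts = 0;
    while (true) {
        ++attempts;
        descriptor d = make_descriptor(sst);
        (void)d;  // a real task would run the compaction described by d here
        if (attempts > failures_before_success) {
            return attempts;
        }
        // On failure the loop retries; `sst` was never moved from, so it is
        // still safe to build a new descriptor from it.
    }
}
```

Had `make_descriptor` taken its argument by value and the caller used `std::move(sst)`, the second iteration would have built a descriptor from a null pointer.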

Fixes #10505.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #10506
2022-05-08 11:16:55 +03:00
Raphael S. Carvalho
20a1ef3bee compaction_backlog_tracker: Raise logging level to error when disabling tracker on exception
If exception is caught while updating backlog tracker, the backlog
tracker will be disabled for the underlying table, potentially
causing compaction to fall behind.
That being said, let's raise the log level to error, to give it
its due importance and allow tests to detect the problem.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220330151421.49054-1-raphaelsc@scylladb.com>
2022-03-31 07:04:00 +03:00
Botond Dénes
0c3d4091a4 Merge "Make TWCS' cleanup bucket aware" from Raphael S. Carvalho
"
Quoting patch 3/4:
"This continues the work in a69d98c3d0,
by implementing the cleanup method in TWCS to make it bucket aware.
Until now, the default implementation was used, which cleans up one file
at a time, starting from the smallest.

The cleanup strategy for TWCS is simple. It's simply calling the
size tiered cleanup method for each bucket, so there will be
one job for each tier in each window.

The next strategies to receive this improvement are LCS and ICS
(the latter one being only available in enterprise).

Refs #10097."

** Simply put, the goal is to reduce writeamp when performing cleanup
on a TWCS table, therefore reducing the operation time. **

tests: unit(dev).
"

* 'twcs_cleanup_bucket_aware/v1' of https://github.com/raphaelsc/scylla:
  tests: sstable_compaction_test: Add test for TWCS' bucket-aware cleanup
  compaction: TWCS: Implement cleanup method for bucket awareness
  compaction: TWCS: change get_buckets() signature to work with const qualified functions
  compaction_strategy: get_cleanup_compaction_jobs: accept candidates by value
2022-03-30 11:45:28 +03:00
Raphael S. Carvalho
177a8e8259 compaction_manager: allow sstable to be moved into rewrite_sstable()
Caller was already trying to move sstable, but rewrite_sstable() signature
was incorrect.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220329022149.250655-1-raphaelsc@scylladb.com>
2022-03-30 11:42:52 +03:00
Raphael S. Carvalho
2a9bfa3e3f compaction_strategy: get_cleanup_compaction_jobs: accept candidates by value
Then caller can decide whether to copy or move candidate set into the
function. cleanup_sstables_compaction_task can move candidates as
it's no longer needed once it retrieves all descriptors.
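
A sketch of the by-value signature, with sstables modelled as plain strings and `get_cleanup_jobs` as a hypothetical simplification of the real method:

```cpp
#include <string>
#include <utility>
#include <vector>

// Accepts the candidate set by value: the caller chooses between copying
// (pass an lvalue) and moving (pass std::move(candidates)).
inline std::vector<std::vector<std::string>> get_cleanup_jobs(std::vector<std::string> candidates) {
    std::vector<std::vector<std::string>> jobs;
    jobs.reserve(candidates.size());
    for (auto& c : candidates) {
        // One job per candidate, mirroring the default one-sstable-per-job behaviour.
        std::vector<std::string> job;
        job.push_back(std::move(c));
        jobs.push_back(std::move(job));
    }
    return jobs;
}
```

A caller that still needs its set writes `get_cleanup_jobs(candidates)`. A caller that is done with it writes `get_cleanup_jobs(std::move(candidates))` and avoids the copy.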

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-03-29 09:49:13 -03:00
Raphael S. Carvalho
c7826aa910 compaction_manager: Wire cleanup task into the strategy cleanup method
As the cleanup process can now be driven by the compaction strategy,
let's move cleanup into a new task type that uses the new
compaction_strategy::get_cleanup_compaction_jobs().

For the time being, all strategies are using the default method that
returns one descriptor for each sstable that needs cleanup.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-03-25 11:23:26 -03:00
Raphael S. Carvalho
25be958ab9 compaction: Introduce compaction_descriptor::sstables_size
This method can be reused in manager, and will be useful for upcoming
cleanup task.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-03-21 12:55:10 -03:00
Raphael S. Carvalho
c25d8f6770 compaction: Move decision of garbage collection from strategy to task type
For compaction to be able to purge expired data, like tombstones, a
sstable set snapshot is set in the compaction descriptor.

That's a decision that belongs to the task type. For example, all regular
compactions enable GC, whereas scrub, for example, doesn't for safety
reasons.

The problem is that the decision is being made by every instantiation
of compaction_descriptor in the strategies, which is both unnecessary
and also adds lots of boilerplate to the code, making it hard to
understand and work with.

As sstable set snapshot is an implementation detail, a new method
is being added to compaction_descriptor to make the intention
clearer, making the interface easier to understand.

can_purge_tombstones, used previously by rewrite task only, is being
reused for communicating GC intention into task::compact_sstables().

The boilerplate was a pain when adding a new strategy method for
the ongoing work on cleanup, described by issue #10097.
Another benefit is that we'll now only create a set snapshot when
compaction will really run. Before, it could happen that the snapshot
would be discarded if the compaction attempt had to be postponed,
which is a waste of cpu cycles.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-03-21 12:14:04 -03:00
Avi Kivity
aab052c0d5 Merge 'replica/database: truncate: temporarily disable compaction on table and views before flush' from Benny Halevy
Flushing the base table triggers view building
and corresponding compactions on the view tables.

Temporarily disable compaction on both the base
table and all its views before flush and snapshot
since those flushed sstables are about to be truncated
anyway right after the snapshot is taken.

This should make truncate go faster.

In the process, this series also embeds `database::truncate_views`
into `truncate` and coroutinizes both.

Refs #6309

Test: unit(dev)

Closes #10203

* github.com:scylladb/scylla:
  replica/database: truncate: fixup indentation
  replica/database: truncate: temporarily disable compaction on table and views before flush
  replica/database: truncate: coroutinize per-view logic
  replica/database: open-code truncate_view in truncate
  replica/database: truncate: coroutinize run_with_compaction_disabled lambda
  replica/database: coroutinize truncate
  compaction_manager: add disable_compaction method
2022-03-17 17:24:20 +02:00
Raphael S. Carvalho
0cc717ee86 compaction_manager: Retrieve and register files in rewrite_sstables() atomically
The atomicity was lost in commit a2a5e530f0.

Registration of compacting SSTables now happens in rewrite_sstables_compaction_task
ctor, but that's risky because a regular compaction could pick those
same files if run_with_compaction_disabled() defers after the callback
passed to it returns, and before run__w__c__d() caller has a chance to
run. The deferring point is very much possible, because submit()
(submits a regular job) is called when run__w__c__d() reenables compaction
internally.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220315182857.121479-1-raphaelsc@scylladb.com>
2022-03-16 09:58:16 +02:00
Raphael S. Carvalho
58e520ab1d compaction: Move run_off_strategy_compaction() into compaction manager
Compaction manager is calling back the table to run off-strategy compaction,
but the logic clearly belongs to manager which should perform the
operation independently and only call table to update its state with the
result.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220315174504.107926-2-raphaelsc@scylladb.com>
2022-03-16 09:55:52 +02:00
Benny Halevy
297a37f640 compaction_manager: add disable_compaction method
Returns a RAII class compaction_reenabler
that conditionally reenables compaction
for the given table when destroyed.
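
A sketch of such a guard in standard C++. `table_state_sketch` and its counter are hypothetical. The counter models one plausible reading of "conditionally": compaction becomes enabled again only when the last outstanding guard is destroyed, and a moved-from guard does nothing.

```cpp
#include <cassert>
#include <utility>

// Hypothetical per-table state: compaction is enabled when no guard is alive.
struct table_state_sketch {
    int disabled_count = 0;
    bool compaction_enabled() const { return disabled_count == 0; }
};

class compaction_reenabler {
    table_state_sketch* _table;
public:
    explicit compaction_reenabler(table_state_sketch& t) : _table(&t) {
        ++_table->disabled_count;
    }
    // Moving transfers responsibility; the moved-from guard becomes inert.
    compaction_reenabler(compaction_reenabler&& o) noexcept
        : _table(std::exchange(o._table, nullptr)) {}
    compaction_reenabler(const compaction_reenabler&) = delete;
    compaction_reenabler& operator=(const compaction_reenabler&) = delete;
    compaction_reenabler& operator=(compaction_reenabler&&) = delete;
    ~compaction_reenabler() {
        if (_table) {
            assert(_table->disabled_count > 0);
            --_table->disabled_count;
        }
    }
};

// Disables compaction for the table and returns the guard that undoes it.
inline compaction_reenabler disable_compaction(table_state_sketch& t) {
    return compaction_reenabler(t);
}
```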

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-03-15 11:00:49 +02:00
Raphael S. Carvalho
1a2332a0ba compaction: Move release_exhausted out of the compaction descriptor
With compact_sstables() now living in compaction_manager::task,
release_exhausted no longer has to live inside compaction_descriptor,
which is a good direction because implementation detail is being
removed from the interface.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220311023410.250149-2-raphaelsc@scylladb.com>
2022-03-14 15:39:23 +02:00
Raphael S. Carvalho
fce9d869b4 compaction: Move table::compact_sstables() into compaction manager
Table submits compaction request into manager, which in turn calls
back table to run the compaction when the time has come, i.e.:
table -> compaction manager -> table -> execute compaction

But manager should not rely on table to run compaction, as compaction
execution procedure sits one layer below the manager and should be
accessed directly by it, i.e:

table -> compaction manager -> execute compaction

This makes code easier to understand and update_compaction_history()
can now be noop for unit tests using table_state.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220311023410.250149-1-raphaelsc@scylladb.com>
2022-03-14 15:39:23 +02:00
Benny Halevy
5e1fda7e1d compaction_manager: use coroutine::switch_to
Saving an allocation for running the functor
as a task in the switched-to scheduling group.

Also, switch to the desired scheduling group at
the beginning of the task so that the higher level logic,
like getting the list of sstables to compact
will be performed under the desired scheduling group,
not only the compaction code itself.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-03-10 12:20:01 +02:00
Benny Halevy
8c66916652 compaction_manager::task: drop _compaction_running
Replace the _compaction_running boolean member
by calculating _state == state::active
now that setup_new_compaction switches state to
`active`

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-03-10 12:20:01 +02:00
Benny Halevy
a2a5e530f0 compaction_manager: move per-type logic to derived task
Move the business logic into the task specific classes.
Separating initialization during task construction,
from the compaction_done task, moved into
a do_run() method, and in some cases moving
a lambda function that was called per table (as in
rewrite_sstables) into a private method of the
derived class.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-03-10 12:20:01 +02:00
Benny Halevy
2e6ce43a97 compaction_manager: task: add state enum
Add an enum class representing the task state machine
and a switch_state function to transition between the states
and update the corresponding compaction_manager stats counters.
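
A compact sketch of the idea, assuming hypothetical state names and counters (the actual enum values and stats fields are not listed in this message):

```cpp
// Hypothetical task states; the real enum may differ.
enum class state { none, pending, active, done, failed };

// Hypothetical manager-wide counters kept in sync with task states.
struct stats {
    long pending = 0;
    long active = 0;
    long completed = 0;
    long errors = 0;
};

class task_sketch {
    state _state = state::none;
    stats& _stats;
public:
    explicit task_sketch(stats& s) : _stats(s) {}
    state current() const { return _state; }

    // Moves to `next`, adjusts the counters for the state being left and the
    // state being entered, and returns the previous state.
    state switch_state(state next) {
        const state prev = _state;
        switch (prev) {
        case state::pending: --_stats.pending; break;
        case state::active:  --_stats.active;  break;
        default: break;
        }
        switch (next) {
        case state::pending: ++_stats.pending;   break;
        case state::active:  ++_stats.active;    break;
        case state::done:    ++_stats.completed; break;
        case state::failed:  ++_stats.errors;    break;
        default: break;
        }
        _state = next;
        return prev;
    }
};
```

Keeping all counter updates inside `switch_state` means no call site can change the state and forget the matching stats adjustment.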

Refs #9974

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-03-10 12:19:59 +02:00
Benny Halevy
9c59d66b7e compaction_manager: task: add maybe_retry
Replacing and combining compaction_manager methods:
maybe_stop_on_error and put_task_to_sleep.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-03-10 11:35:37 +02:00
Benny Halevy
ee32be3aa5 compaction_manager: reevaluate_postponed_compactions: mark as noexcept
To simplify error handling in following patches
that will coroutinize task logic.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-03-10 11:35:37 +02:00
Benny Halevy
72162ed653 compaction_manager: define derived task types
Turn task into a class, defining a clear hierarchy
of private, protected, and public methods.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-03-10 11:35:35 +02:00