In many cases we trigger offstrategy compaction opportunistically
also when there's nothing to do. In this case we still print
to the log lots of info-level message and call
`run_offstrategy_compaction` that wastes more cpu cycles
on learning that it has nothing to do.
This change bails out early if the maintenance set is empty
and prints a "Skipping off-strategy compaction" message in debug
level instead.
Fixes#13466
Also, add an group_id class and return it from compaction_group and table_state.
Use that to identify the compaction_group / table_state by "ks_name.cf_name compaction_group=idx/total" in log messages.
Fixes#13467Closes#13520
* github.com:scylladb/scylladb:
compaction_manager: print compaction_group id
compaction_group, table_state: add group_id member
compaction_manager: offstrategy compaction: skip compaction if no candidates are found
Add a formatter to compaction::table_state that
prints the table ks_name.cf_name and compaction group id.
Fixes#13467
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
In many cases we trigger offstrategy compaction opportunistically
also when there's nothing to do. In this case we still print
to the log lots of info-level message and call
`run_offstrategy_compaction` that wastes more cpu cycles
on learning that it has nothing to do.
This change bails out early if the maintenance set is empty
and prints a "Skipping off-strategy compaction" message in debug
level instead.
Fixes#13466
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Seen after f1bbf705f9 in debug mode
distributed_loader collect_all_shared_sstables copies
compaction::owned_ranges_ptr (lw_shared_ptr<const
dht::token_range_vector>)
across shards.
Since update_sstable_cleanup_state is synchronous, it can
be passed a const refrence to the token_range_vector instead.
It is ok to access the memory read-only across shards
and since this happens on start-up, there are no special
performance requirements.
Fixes#13631
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Commit 49892a0, back in 2018, bumps the compaction shares by 200 to
guarantee a minimum base line.
However, after commit e3f561d, major compaction runs in maintenance
group meaning that bumping shares became completely irrelevant and
only causes regular compaction to be unnecessarily more aggressive.
Fixes#13487.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes#13488
If any of the sstables to-be-compacted requires cleanup,
retrive the owned_ranges_ptr from the table_state.
With that, staging sstables will eventually be cleaned up
via regular compaction.
Refs #9559
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When perform_cleanup adds sstables to sstables_requiring_cleanup,
also save the owned_ranges_ptr in the compaction_state so
it could be used by other compaction types like
regular, reshape, or major compaction.
When the exhausted sstables are released, check
if sstables_requiring_cleanup is empty, and if it is,
clear also the owned_ranges_ptr.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
As a first step towards parallel cleanup by
(regular) compaction and cleanup compaction,
filter all sstables in perform_cleanup
and keep the set of sstables in the compaction_state.
Erase from that set when the sstables are unregistered
from compaction.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Allow getting candidates for compaction
from an arbitrary range of sstable, not only
the in_strategy_sstables.
To be used by perform_cleanup to mark all sstables
that require cleanup, even if they can't be
compacted at this time.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
update_sstable_cleanup_state calls needs_cleanup and
inserts (or erases) the sstable into the respective
compaction_state.sstables_requiring_cleanup set.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
I'm not sure why this was originally supported,
maybe for upgrade sstables where we may want to
rewrite the sstables without filtering any tokens,
but perform_sstable_upgrade is now following a
different code path and uses `rewrite_sstables`
directly, without pigybacking on cleanup.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Move the owned_ranges_ptr, currently used only by
cleanup and upgrade compactions, to the generic
compaction descriptor so we apply cleanup in other
compaction types.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
A problem in compaction reevaluation can cause the SSTable set to be left uncompacted for indefinite amount of time, potentially causing space and read amplification to be suboptimal.
Two revaluation problems are being fixed, one after off-strategy compaction ended, and another in compaction manager which intends to periodically reevaluate a need for compaction.
Fixes https://github.com/scylladb/scylladb/issues/13429.
Fixes https://github.com/scylladb/scylladb/issues/13430.
Closes#13431
* github.com:scylladb/scylladb:
compaction: Make compaction reevaluation actually periodic
replica: Reevaluate regular compaction on off-strategy completion
Compaction group is responsible for deleting SSTables of "in-strategy"
compactions, i.e. regular, major, cleanup, etc.
Both in-strategy and off-strategy compaction have their completion
handled using the same compaction group interface, which is
compaction_group::table_state::on_compaction_completion(...,
sstables::offstrategy offstrategy)
So it's important to bring symmetry there, by moving the responsibility
of deleting off-strategy input, from manager to group.
Another important advantage is that off-strategy deletion is now throttled
and gated, allowing for better control, e.g. table waiting for deletion
on shutdown.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes#13432
The manager intended to periodically reevaluate compaction need for
each registered table. But it's not working as intended.
The reevaluation is one-off.
This means that compaction was not kicking in later for a table, with
low to none write activity, that had expired data 1 hour from now.
Also make sure that reevaluation happens within the compaction
scheduling group.
Fixes#13430.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Task manager compaction tasks that cover compaction group
compaction need access to compaction_manager::tasks.
To avoid circular dependency and be able to rely on forward
declaration, task needs to be moved out of compaction manager.
To avoid naming confusion compaction_manager::task is renamed.
Closes#13226
* github.com:scylladb/scylladb:
compaction: use compaction namespace in compaction_manager.cc
compaction: rename compaction::task
compaction: move compaction_manager::task out of compaction manager
compaction: move sstable_task definition to source file
To avoid confusion with task manager tasks compaction::task is renamed
to compaction::compaction_task_exector. All inheriting classes are
modified similarly.
compaction_manager::task needs to be accessed from task manager compaction
tasks. Thus, compaction_manager::task and all inheriting classes are moved
from compaction manager to compaction namespace.
once compaction_strategy is made staless, the state must be retrieved
in notify_completion() through table_state.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
There are two methods to mess with compaction history -- update and get. The former had been patched to use local system-keyspace instance by 907fd2d3 (system_keyspace: De-static compaction history update) now it's time for the latter (spoiler: it's only used by the API handler)
Closes#12889
* github.com:scylladb/scylladb:
system_keyspace; Make get_compaction_history non static and drop qctx
api, compaction_manager: Get compaction history via manager
system_keyspace: Move compaction_history_entry to namespace scope
Right now the API handler directly calls static method from system
keyspace. Patching it to call compaction manager instead will let the
latter use on-board plugged system keyspace for that. If the system
keyspace is not plugged, it means early boot or late shutdown, not a
good time to get compaction history anyway.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
these warnings are found by Clang-17 after removing
`-Wno-unused-lambda-capture` and '-Wno-unused-variable' from
the list of disabled warnings in `configure.py`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
* use `defer_verbose_shutdown()` to shutdown compaction manager
`EDQUOT` is quite similar as `ENOSPC`, in the sense that both of them
are caused by environmental issues.
before this change, `compaction_manager` filters the
ENOSPC exceptions thrown by `compaction_manager::really_do_stop()`,
so they are not propagated to caller when calling
`compaction_manager::stop()` -- only a warning message is printed
in the log. but `EDQUOT` is not handled.
after this change, the exception raised by compaction manager's
stop process is not filtered anymore and is handled by
`defer_verbose_shutdown()` instead, which is able to check the
type of exception, and print out error message in the log. so
the `ENOSPC` and `EDQUOT` errors are taken care of, and more
visible from user's perspective as they are printed as errors
instead of warning. but they are not printed using the
`compaction_manager` logger anymore. so if our testing or user's
workflow depends on this behavior, the related setting should be
updated accordingly.
Fixes#12626
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Introduces task manager's compaction module. That's an initial
part of integration of compaction with task manager.
When fully integrated, task manager will allow user to track compaction
operations, check status and progress of each individual one. It will help
with creating an asynchronous version of rest api that forces any compaction.
Currently, users can see with /task_manager/list_modules api call that
compaction is one of the modules accessible through task manager.
They won't get any additional information though, since compaction
tasks are not created yet.
A shared_ptr to compaction module is kept in compaction manager.
Closes#12635
* github.com:scylladb/scylladb:
compaction: test: pass task_manager to compaction_manager in test environment
compaction: create and register task manager's module for compaction
tasks: add task_manager constructor without arguments
Each instance of compaction manager should have compaction module pointer
initialized. All contructors get task_manager reference with which
the module is created.
As an initial part of integration of compaction with task manager, compaction
module is added. Compaction module inherits from tasks::task_manager::module
and shared_ptr to it is kept in compaction manager. No compaction tasks are
created yet.
Every 1 hour, compaction manager will submit all registered table_state
for a regular compaction attempt, all without yielding.
This can potentially cause a reactor stall if there are 1000s of table
states, as compaction strategy heuristics will run on behalf of each,
and processing all buckets and picking the best one is not cheap.
This problem can be magnified with compaction groups, as each group
is represented by a table state.
This might appear in dashboard as periodic stalls, every 1h, misleading
the investigator into believing that the problem is caused by a
chronological job.
This is fixed by piggybacking on compaction reevaluation loop which
can yield between each submission attempt if needed.
Fixes#12390.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes#12391
After commit a57724e711, off-strategy no longer races with view
building, therefore deletion code can be simplified and piggyback
on mechanism for deleting all sstables atomically, meaning a crash
midway won't result in some of the files coming back to life,
which leads to unnecessary work on restart.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes#12245
postponed_compactions_reevaluation() runs until compaction_manager is
stopped, checking if it needs to launch new compactions.
Make it return a future instead of stashing its completion somewhere.
This makes is easier to convert it to a coroutine.
compaction_state shouldn't be moved once emplaced. moving it could
theoretically cause task's gate holder to have a dangling pointer to
compaction_state's gate, but turns out gate's move ctor will actually
fail under this assertion:
assert(!_count && "gate reassigned with outstanding requests");
Cannot happen today, but let's make it more future proof.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes#12167
Currently, the function is inefficient in two ways:
1. unnecessary copy of first/last keys to automatic variables
2. redecorating the partition keys with the schema passed to
needs_cleanup.
We canjust use the tokens from the sstable first/last decorated keys.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Today, compaction_backlog_tracker is managed in each compaction_strategy
implementation. So every compaction strategy is managing its own
tracker and providing a reference to it through get_backlog_tracker().
But this prevents each group from having its own tracker, because
there's only a single compaction_strategy instance per table.
To remove this limitation, compaction_strategy impl will no longer
manage trackers but will instead provide an interface for trackers
to be created, such that each compaction group will be allowed to
have its own tracker, which will be managed by compaction manager.
On compaction strategy change, table will update each group with
the new tracker, which is created using the previously introduced
ompaction_group_sstable_set_updater.
Now table's backlog will be the sum of all compaction_group backlogs.
The normalization factor is applied on the sum, so we don't have
to adjust each individual backlog to any factor.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
compaction_backlog_tracker will be managed by compaction_manager, in the
per table state. As compaction tasks can access the tracker throughout
its lifetime, remove() can only deregister the state once we're done
stopping all tasks which map to that state.
remove() extracted the state upfront, then performed the stop, to
prevent new tasks from being registered and left behind. But we can
avoid the leak of new tasks by only closing the gate, which waits
for all tasks (which are stopped a step earlier) and once closed,
prevents new tasks from being registered.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Luckily it's not used anywhere. Default move ctor was picked but
it won't clear _manager of old object, meaning that its destructor
will incorrectly deregister the tracker from
compaction_backlog_manager.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
When setting a new strategy, the charges of old tracker is transferred
to the new one.
The problem is that we're not reverting changes if exception is
triggered before the new strategy is successfully set.
To fix this exception safety issue, let's copy the charges instead
of moving them. If exception is triggered, the old tracker is still
the one used and remain intact.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This interface will be helpful for allowing replica::table, unit
tests and sstables::compaction to access the compaction group's tracker
which will be managed by the compaction manager, once we complete
the decoupling work.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Prior to off-strategy compaction, streaming / repair would place
staging files into main sstable set, and wait for view building
completion before they could be selected for regular compaction.
The reason for that is that view building relies on table providing
a mutation source without data in staging files. Had regular compaction
mixed staging data with non-staging one, table would have a hard time
providing the required mutation source.
After off-strategy compaction, staging files can be compacted
in parallel to view building. If off-strategy completes first, it
will place the output into the main sstable set. So a parallel view
building (on sstables used for off-strategy) may potentially get a
mutation source containing staging data from the off-strategy output.
That will mislead view builder as it won't be able to detect
changes to data in main directory.
To fix it, we'll do what we did before. Filter out staging files
from compaction, and trigger the operation only after we're done
with view building. We're piggybacking on off-strategy timer for
still allowing the off-strategy to only run at the end of the
node operation, to reduce the amount of compaction rounds on
the data introduced by repair / streaming.
Fixes#11882.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes#11919