Currently, if a compaction function enters the table
or compaction_group async_gate, we can't stop it
on the table/compaction_group stop path as they co_await
their respective async_gate.close().
This series introduces a table_ptr smart pointer to guards
the table object by entering its async_gate, and
it also defers awaiting the gate.close future
till after stopping ongoing compaction so that
closing the gate will prevent starting new compactions
while ongoing compaction can be stopped and finally
awaiting the close() future will wait for them to
unwind and exit the gate after being stopped.
Fixes#16305Closesscylladb/scylladb#16351
* github.com:scylladb/scylladb:
compaction: run_on_table: skip compaction also on gate_closed_exception
compaction: run_on_table: hold table
table: add table_holder and hold method
table: stop: allow compactions to be stopped while closing async_gate
Similar to the no_such_column_family error,
gate_closed_exception indicates that the table
is stopped and we should skip compaction on it
gracefully.
Fixes#16305
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
As a preparation for asynchronous compaction api, from which we
cannot take values by reference, top level compaction tasks get
pointers which need to be set to nullptr when they are not needed
(like in async api).
run_on_existing_tables() is not used at all. and we have two of them.
in this change, let's drop them.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16304
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.
Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
For major compacting all tables in the database.
The advantage of this api is that `commitlog->force_new_active_segment`
happens only once in `database::flush_all_tables` rather than
once per keyspace (when `nodetool compact` translates to
a sequence of `/storage_service/keyspace_compaction` calls).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Major compaction already flushes each table to make
sure it considers any mutations that are present in the
memtable for the purpose of tombstone purging.
See 64ec1c6ec6
However, tombstone purging may be inhibited by data
in commitlog segments based on `gc_time_min` in the
`tombstone_gc_state` (See f42eb4d1ce).
Flushing all sstables in the database release
all references to commitlog segments and there
it maximizes the potential for tombstone purging,
which is typically the reason for running major compaction.
However, flushing all tables too frequently might
result in tiny sstables. Since when flushing all
keyspaces using `nodetool flush` the `force_keyspace_compaction`
api is invoked for keyspace successively, we need a mechanism
to prevent too frequent flushes by major compaction.
Hence a `compaction_flush_all_tables_before_major_seconds` interval
configuration option is added (defaults to 24 hours).
In the case that not all tables are flushed prior
to major compaction, we revert to the old behavior of
flushing each table in the keyspace before major-compacting it.
Fixesscylladb/scylladb#15777
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When flushing is done externally, e.g. by running
`nodetool flush` prior to `nodetool compact`,
flush_memtables=false can be passed to skip flushing
of tables right before they are major-compacted.
This is useful to prevent creation of small sstables
due to excessive memtable flushing.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Set top level compaction tasks as abortable.
Compaction tasks which have no children, i.e. compaction task
executors, have abort method overriden to stop compaction data.
This reverts commit 11cafd2fc8, reversing
changes made to 2bae14f743.
Reverting because this series causes frequent CI failures, and the
proposed quickfix causes other failures of its own.
Fixes: #16113
Set top level compaction tasks as abortable.
Compaction tasks which have no children, i.e. compaction task
executors, have abort method overriden to stop compaction data.
This reverts commit 2860d43309, reversing
changes made to a3621dbd3e.
Reverting because rest_api.test_compaction_task started failing after
this was merged.
Fixes: #16005
Set top level compaction tasks as abortable.
Compaction tasks which have no children, i.e. compaction task
executors, have abort method overriden to stop compaction data.
Keep compaction_progress_monitor in compaction_task_executor and pass a reference
to it further, so that the compaction progress could be retrieved out of it.
Most of the time only the roots of tasks tree should be non internal.
Change default implementation of is_internal and delete overrides
consistent with it.
Closesscylladb/scylladb#15353
Loop in shard_reshaping_compaction_task_impl::run relies on whether
sstables::compaction_stopped_exception is thrown from run_custom_job.
The exception is swallowed for each type of compaction
in compaction_manager::perform_task.
Rethrow an exception in perfrom task for reshape compaction.
Fixes: #15058.
Closes#15067
cleanup_compaction_task_executor inherits both from compaction_task_executor
and cleanup_compaction_task_impl.
Add a new version of compaction_manager::perform_task_on_all_files
which accepts only the tasks that are derived from compaction_task_impl.
After all task executors' conversions are done, the new version replaces
the original one.
In reshard_sstables_compaction_task_impl::run() we call
sharded<sstables::sstable_directory>::invoke_on_all. In lambda passed
to that method, we use both sharded sstable_directory service
and its local instance.
To make it straightforward that sharded and local instances are
dependend, we call sharded<replica::database>::invoke_on_all
instead and access local directory through the sharded one.
Add task manager's task covering resharding compaction.
A struct and some functions are moved from replica/distributed_loader.cc
to compaction/task_manager_module.cc.
This reverts commit 2a58b4a39a, reversing
changes made to dd63169077.
After patch 87c8d63b7a,
table_resharding_compaction_task_impl::run() performs the forbidden
action of copying a lw_shared_ptr (_owned_ranges_ptr) on a remote shard,
which is a data race that can cause a use-after-free, typically manifesting
as allocator corruption.
Note: before the bad patch, this was avoided by copying the _contents_ of the
lw_shared_ptr into a new, local lw_shared_ptr.
Fixes#14475Fixes#14618Closes#14641
In reshard_sstables_compaction_task_impl::run() we call
sharded<sstables::sstable_directory>::invoke_on_all. In lambda passed
to that method, we use both sharded sstable_directory service
and its local instance.
To make it straightforward that sharded and local instances are
dependend, we call sharded<replica::database>::invoke_on_all
instead and access local directory through the sharded one.
As a preparation for integrating resharding compaction with task manager
a struct and some functions are copied from replica/distributed_loader.cc
to compaction/task_manager_module.cc.
Extend a signature of table::compact_all_sstables and
compaction_manager::perform_major_compaction so that they get
the info of a covering task.
This allows to easily create child tasks that cover compaction group
compaction.
In shard compaction tasks per table tasks will be created all at once
and then they will wait for their turn to run.
A function that allows waking up tasks one after another and a function
that makes the task wait for its turn are added.
Task manager compaction tasks need table names for logs.
Thus, compaction tasks store table infos instead of table ids.
get_table_ids function is deleted as it isn't used anywhere.