for faster build times and clear inter-module dependencies, we
should not #includes headers not directly used. instead, we should
only #include the headers directly used by a certain compilation
unit.
in this change, the source files under "/compaction" directories
are checked using clangd, which identifies the cases where we have
an #include which is not directly used. all the #includes identified
by clangd are removed. because some source files rely on the incorrectly
included header file, those ones are updated to #include the header
file they directly use.
if a forward declaration suffice, the declaration is added instead.
see also https://clangd.llvm.org/guides/include-cleaner#unused-include-warningCloses#14740
* github.com:scylladb/scylladb:
treewide: remove #includes not use directly
size_tiered_backlog_tracker: do not include remove header
for faster build times and clear inter-module dependencies, we
should not #includes headers not directly used. instead, we should
only #include the headers directly used by a certain compilation
unit.
in this change, the source files under "/compaction" directories
are checked using clangd, which identifies the cases where we have
an #include which is not directly used. all the #includes identified
by clangd are removed. because some source files rely on the incorrectly
included header file, those ones are updated to #include the header
file they directly use.
if a forward declaration suffice, the declaration is added instead.
see also https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we sort sstables with compaction disabled, when we
are about to perform the compaction. but the idea of of guarding the
getting and registering as a transaction is to prevent other compaction
to mutate the sstables' state and cause the inconsistency.
but since the state is tracked on per-sstable basis, and is not related
to the order in which they are processed by a certain compaction task.
we don't need to guard the "sort()" with this mutual exclusive lock.
for better readability, and probably better performance, let's move the
sort out of the lock. and take this opportunity to use
`std::ranges::sort()` for more concise code.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#14699
Compaction task executors which inherit from compaction_task_impl
may stay in memory after the compaction is finished.
Thus, state switch cannot happen in destructor.
Switch state to none in perform_task defer.
This is the first phase of providing strong exception safety guarantees by the generic `compaction_backlog_tracker::replace_sstables`.
Once all compaction strategies backlog trackers' replace_sstables provide strong exception safety guarantees (i.e. they may throw an exception but must revert on error any intermediate changes they made to restore the tracker to the pre-update state).
Once this series is merged and ICS replace_sstables is also made strongly exception safe (using infrastructure from size_tiered_backlog_tracker introduced here), `compaction_backlog_tracker::replace_sstables` may allow exceptions to propagate back to the caller rather than disabling the backlog tracker on errors.
Closes#14104
* github.com:scylladb/scylladb:
leveled_compaction_backlog_tracker: replace_sstables: provide strong exception safety guarantees
time_window_backlog_tracker: replace_sstables: provide strong exception safety guarantees
size_tiered_backlog_tracker: replace_sstables: provide strong exception safety guarantees
size_tiered_backlog_tracker: provide static calculate_sstables_backlog_contribution
size_tiered_backlog_tracker: make log4 helper static
size_tiered_backlog_tracker: define struct sstables_backlog_contribution
size_tiered_backlog_tracker: update_sstables: update total_bytes only if set changed
compaction_backlog_tracker: replace_sstables: pass old and new sstables vectors by ref
compaction_backlog_tracker: replace_sstables: add FIXME comments about strong exception safety
This is the last step of deprecation dance of DTCS.
In Scylla 5.1, users were warned that DTCS was deprecated.
In 5.2, altering or creation of tables with DTCS was forbidden.
5.3 branch was already created, so this is targetting 5.4.
Users that refused to move away from DTCS will have Scylla
falling back to the default strategy, either STCS or ICS.
See:
WARN 2023-07-14 09:49:11,857 [shard 0] schema_tables - Falling back to size-tiered compaction strategy after the problem: Unable to find compaction strategy class 'DateTieredCompactionStrategy
Then user can later switch to a supported strategy with
alter table.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes#14559
also, remove unnecessary forward declarations.
* compaction_manager_test_task_executor is only referenced
in the friend declaration. but this declaration does not need
a forward declaration of the friend class
* compaction_manager_test_task_executor is not used anywhere.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#14680
the callers of the constructor does not move variable into this
parameter, and the constructor itself is not able to consume it.
as the parameter is a vector while `compaction_sstable_registration`
use an `unordered_set` for tracking the sstables being compacted.
so, to avoid creating a temporary copy of the vector, let's just
pass by reference.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#14661
Due to wrong order of stopping of compaction services, shutdown needs
to wait until all compactions are complete, which may take really long.
Moreover, test version of compaction manager does not abort task manager,
which is strictly bounded to it, but stops its compaction module. This results
in tests waiting for compaction task manager's tasks to be unregistered,
which never happens.
Stopping and aborting of compaction manager and task manager's compaction
module are performed in a proper order.
Closes#14461
* github.com:scylladb/scylladb:
tasks: test: abort task manager when wrapped_compaction_manager is destructed
compaction: swap compaction manager stopping order
compaction: modify compaction_manager::stop()
Today, SSTable cleanup skips to the next partition, one at a time, when it finds that the current partition is no longer owned by this node.
That's very inefficient because when a cluster is growing in size, existing nodes lose multiple sequential tokens in its owned ranges. Another inefficiency comes from fetching index pages spanning all unowned tokens, which was described in https://github.com/scylladb/scylladb/issues/14317.
To solve both problems, cleanup will now use multi range reader, to guarantee that it will only process the owned data and as a result skip unowned data. This results in cleanup scanning an owned range and then fast forwarding to the next one, until it's done with them all. This reduces significantly the amount of data in the index caching, as index will only be invoked at each range boundary instead.
Without further ado,
before:
`INFO 2023-07-01 07:10:26,281 [shard 0] compaction - [Cleanup keyspace2.standard1 701af580-17f7-11ee-8b85-a479a1a77573] Cleaned 1 sstables to [./tmp/1/keyspace2/standard1-b490ee20179f11ee9134afb16b3e10fd/me-3g7a_0s8o_06uww24drzrroaodpv-big-Data.db:level=0]. 2GB to 1GB (~50% of original) in 26248ms = 81MB/s. ~9443072 total partitions merged to 4750028.`
after:
`INFO 2023-07-01 07:07:52,354 [shard 0] compaction - [Cleanup keyspace2.standard1 199dff90-17f7-11ee-b592-b4f5d81717b9] Cleaned 1 sstables to [./tmp/1/keyspace2/standard1-b490ee20179f11ee9134afb16b3e10fd/me-3g7a_0s4m_5hehd2rejj8w15d2nt-big-Data.db:level=0]. 2GB to 1GB (~50% of original) in 17424ms = 123MB/s. ~9443072 total partitions merged to 4750028.`
Fixes#12998.
Fixes#14317.
Closes#14469
* github.com:scylladb/scylladb:
test: Extend cleanup correctness test to cover more cases
compaction: Make SSTable cleanup more efficient by fast forwarding to next owned range
sstables: Close SSTable reader if index exhaustion is detected in fast forward call
sstables: Simplify sstable reader initialization
compaction: Extend make_sstable_reader() interface to work with mutation_source
test: Extend sstable partition skipping test to cover fast forward using token
Today, SSTable cleanup skips to the next partition, one at a time, when it finds
that the current partition is no longer owned by this node.
That's very inefficient because when a cluster is growing in size, existing
nodes lose multiple sequential tokens in its owned ranges. Another inefficiency
comes from fetching index pages spanning all unowned tokens, which was described
in #14317.
To solve both problems, cleanup will now use multi range reader, to guarantee
that it will only process the owned data and as a result skip unowned data.
This results in cleanup scanning an owned range and then fast forwarding to the
next one, until it's done with them all. This reduces significantly the amount
of data in the index caching, as index will only be invoked at each range
boundary instead.
Without further ado,
before:
... 2GB to 1GB (~50% of original) in 26248ms = 81MB/s. ~9443072 total partitions merged to 4750028.
after:
... 2GB to 1GB (~50% of original) in 17424ms = 123MB/s. ~9443072 total partitions merged to 4750028.
Fixes#12998.
Fixes#14317.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This reverts commit 2a58b4a39a, reversing
changes made to dd63169077.
After patch 87c8d63b7a,
table_resharding_compaction_task_impl::run() performs the forbidden
action of copying a lw_shared_ptr (_owned_ranges_ptr) on a remote shard,
which is a data race that can cause a use-after-free, typically manifesting
as allocator corruption.
Note: before the bad patch, this was avoided by copying the _contents_ of the
lw_shared_ptr into a new, local lw_shared_ptr.
Fixes#14475Fixes#14618Closes#14641
Today, we base compaction throughput on the amount of data written,
but it should be based on the amount of input data compacted
instead, to show the amount of data compaction had to process
during its execution.
A good example is a compaction which expire 99% of data, and
today throughput would be calculated on the 1% written, which
will mislead the reader to think that compaction was terribly
slow.
Fixes#14533.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes#14615
As the goal is to make compaction filter to the next owned range,
make_sstable_reader() should be extended to create a reader with
parameters forwarded from mutation_source interface, which will
be used when wiring cleanup with multi range reader.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Prevent switch case statements from falling through without annotation
([[fallthrough]]) proving that this was intended.
Existing intended cases were annotated.
Closes#14607
task_manager::module::stop() waits till all compactions are complete.
Thus, ongoing compactions should be aborted before stop() is called
not to prolong shutdown process.
Task manager's compaction module is stopped after
compaction_manager::do_stop(), which aborts ongoing compactions,
is called.
In reshard_sstables_compaction_task_impl::run() we call
sharded<sstables::sstable_directory>::invoke_on_all. In lambda passed
to that method, we use both sharded sstable_directory service
and its local instance.
To make it straightforward that sharded and local instances are
dependend, we call sharded<replica::database>::invoke_on_all
instead and access local directory through the sharded one.
As a preparation for integrating resharding compaction with task manager
a struct and some functions are copied from replica/distributed_loader.cc
to compaction/task_manager_module.cc.
Otherwise regular compaction can sneak in and
see !cs.sstables_requiring_cleanup.empty() with
cs.owned_ranges_ptr == nullptr and trigger
the internal error in `compaction_task_executor::compact_sstables`.
Fixes scylladb/scylladb#14296
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes#14297
By making all changes on temporary variables
and eventually moving them back into the tracker members
in a noexcept block the function can safely throw
until the changes are committed.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Instead of providing refresh_sstables_backlog_contribution
that updates the tracker in place, provide a static function
calculate_sstables_backlog_contribution that doesn't change
the tracker state to facilitate exception safety in the next patch.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Encapsulate the contribution-related members in
struct contribution, to be used for strong exception safety.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Although replace_sstables is supposed to be called
only once per {old_ssts, new_ssts} it is safer
to update `_total_bytes` with `sst->data_size()`
only if the sst was inserted/erased successfully.
Otherwise _total_bytes may go out of sync with the
contents of _all.
That said, the next step should be to refer to the
compaction_group's main sstable set directly rather
than maintaining a "shadow" set in the tracker.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Task manager's tasks covering resharding compaction
on top and shard level.
Closes#14112
* github.com:scylladb/scylladb:
test: extend test_compaction_task.py to test reshaping compaction
compaction: move reshape function to shard_reshaping_table_compaction_task_impl::run()
compaction: add shard_reshaping_compaction_task_impl
replica: delete unused function
compaction: add table_reshaping_compaction_task_impl
compaction: copy reshape to task_manager_module.cc
compaction: add reshaping_compaction_task_impl
Compaction tasks covering table major, cleanup, offstrategy,
and upgrade sstables compaction inherit sequence number from their
parents. Thus they do not need to have a new sequence number
generated as it will be overwritten anyway.
Closes#14379
Task manager task covering compaction group major
compaction.
Uses multiple inheritance on already existing
major_compaction_task_executor to keep track of
the operation with task manager.
Closes#14271
* github.com:scylladb/scylladb:
test: extend test_compaction_task.py
test: use named variable for task tree depth
compaction: turn major_compaction_task_executor into major_compaction_task_impl
compaction: take gate holder out of task executor
compaction: extend signature of some methods
tasks: keep shared_ptr to impl in task
compaction: rename compaction_task_executor methods
major_compaction_task_executor inherits both from compaction_task_executor
and major_compaction_task_impl.
Thanks to that an executed operation is represented in task manager.
In the following commits, classes deriving from compaction_task_executor
will be alive longer than they are kept in compaction_manager::_tasks.
Thus, the compaction_task_executor::_gate_holder would be held,
blocking other compactions.
compaction_task_executor::_gate_holder is moved outside of
compaction_task_executor object.
Extend a signature of table::compact_all_sstables and
compaction_manager::perform_major_compaction so that they get
the info of a covering task.
This allows to easily create child tasks that cover compaction group
compaction.