Loop in shard_reshaping_compaction_task_impl::run relies on whether
sstables::compaction_stopped_exception is thrown from run_custom_job.
The exception is swallowed for each type of compaction
in compaction_manager::perform_task.
Rethrow an exception in perfrom task for reshape compaction.
Fixes: #15058.
Closes#15067
Compaction group is the data plane for tablets, so this integration
allows each tablet to have its own storage (memtable + sstables).
A crucial step for dynamic tablets, where each tablet can be worked
on independently.
There are still some inefficiencies to be worked on, but as it is,
it already unlocks further development.
```
INFO 2023-07-27 22:43:38,331 [shard 0] init - loading tablet metadata
INFO 2023-07-27 22:43:38,333 [shard 0] init - loading non-system sstables
INFO 2023-07-27 22:43:38,354 [shard 0] table - Tablet with id 0 present for ks.cf
INFO 2023-07-27 22:43:38,354 [shard 0] table - Tablet with id 2 present for ks.cf
INFO 2023-07-27 22:43:38,354 [shard 0] table - Tablet with id 4 present for ks.cf
INFO 2023-07-27 22:43:38,354 [shard 0] table - Tablet with id 6 present for ks.cf
INFO 2023-07-27 22:43:38,428 [shard 1] table - Tablet with id 1 present for ks.cf
INFO 2023-07-27 22:43:38,428 [shard 1] table - Tablet with id 3 present for ks.cf
INFO 2023-07-27 22:43:38,428 [shard 1] table - Tablet with id 5 present for ks.cf
INFO 2023-07-27 22:43:38,428 [shard 1] table - Tablet with id 7 present for ks.cf
```
Closes#14863
* github.com:scylladb/scylladb:
Kill scylla option to configure number of compaction groups
replica: Wire tablet into compaction group
token_metadata: Add this_host_id to topology config
replica: Switch to chunked_vector for storing compaction groups
replica: Generate group_id for compaction_group on demand
Before compaction task executors started inheriting from
compaction_task_impl, they were destructed immediately after
compaction finished. Destructors of executors and their
fields performed actions that affected global structures and
statistics and had impact on compaction process.
Currently, task executors are kept in memory much longer, as their
are tracked by task manager. Thus, destructors are not called just
after the compaction, which results in compaction stats not being
updated, which causes e.g. infinite cleanup loop.
Add release_resources() method which is called at the end
of compaction process and does what destructors used to.
Fixes: #14966.
Fixes: #15030.
Closes#15005
An sstable can be in one of several states -- normal, quarantined, staging, uploading. Right now this "state" is hard-wired into sstable's path, e.g. quarantined sstable would sit in e.g. /var/lib/data/ks-cf-012345/quarantine/ directory. Respectively, there's a bunch of directory names constexprs in sstables.hh defining each "state". Other than being confusing, this approach doesn't work well with S3 backend. Additionally, there's snapshot subdir that adds to the confusion, because snapshot is not quite a state.
This PR converts "state" from constexpr char* directories names into a enum class and patches the sstable creation, opening and state-changing API to use that enum instead of parsing the path.
refs: #13017
refs: #12707Closes#14152
* github.com:scylladb/scylladb:
sstable/storage: Make filesystem storage with initial state
sstable: Maintain state
sstable: Make .change_state() accept state, not directory string
sstable: Construct it with state
sstables_manager: Remove state-less make_sstable()
table: Make sstables with required state
test: Make sstables with upload state in some cases
tools: Make sstables with normal state
table: Open-code sstables making streaming helpers
tests: Make sstables with normal state by default
sstable_directory: Make sstable with required state
sstable_directory: Construct with state
distributed_loader: Make sstable with desired state when populating
distributed_loader: Make sstable with upload state when uploading
sstable: Introduce state enum
sstable_directory: Merge verify and g.c. calls
distributed_loader: Merge verify and gc invocations
sstable/filesystem: Put underscores to dir members
sstable/s3: Mark make_s3_object_name() const
sstable: Remove filename(dir, ...) method
There are a few good reasons for this change.
1) compaction_group doesn't have to be aware of # of groups
2) thinking forward to dynamic tablets, # of groups cannot be
statically embedded in group id, otherwise it gets stale.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
perform_offstrategy is called from try_perform_cleanup
when there are sstables in the maintenance set that require
cleanup.
The input sstables are inserted into the compaction_state
`sstables_requiring_cleanup` and `try_perform_cleanup`
expects offstrategy compaction to clean them up along
with reshape compaction.
Otherwise, the maintenance sstables that require cleanup
are not cleaned up by cleanup compaction, since
the reshape output sstable(s) are not analyzed again
after reshape compaction, where that would insert
the output sstable(s) into `sstables_requiring_cleanup`
and trigger their cleanup in the subsequent cleanup compaction.
The latter method is viable too, but it is less effficient
since we can do reshape+cleanup in one pass, vs.
reshape first and cleanup later.
Fixes scylladb/scylladb#15041
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes#15043
discard_result ignores only successful futures. Thus, if
perform_compaction<regular_compaction_task_executor> call fails,
a failure is considered abandoned, causing tests to fail.
Explicitly ignore failed future.
Fixes: #14971.
Closes#15000
Pretty cosmetic change, but it will allow S3 to finally support moving
sstables between states (after this patch it still doesn't)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
If make_task call in compaction_manager::perform_compaction yields,
compaction_task_executor::_compaction_state may be gone and gate
won't be held.
Hold gate immediately after compaction_task_executor is created.
Add comment not to call prepare_task without preparation.
Refs: #14971.
Fixes: #14977.
Closes#14999
Currently `sstable_requiring_cleanup` is updated using `compacting_sstable_registration`, but that mechanism is not used by offstrategy compaction, leading to #14304.
This series introduces `compaction_manager::on_compaction_completion` that intercepts the call
to the table::on_compaction_completion. This allows us to update `sstable_requiring_cleanup` right before the compacted sstables are deleted, making sure they are no leaked to `sstable_requiring_cleanup`, which would hold a reference to them until cleanup attempts to clean them up.
`cleanup_incremental_compaction_test` was adjusted to observe the sstables `on_delete` (by adding a new observer event) to detect the case where cleanup attempts to delete the leaked sstables and fails since they were already deleted from the file system by offstrategy compaction. The test fails with the fix and passes with it.
Fixes#14304Closes#14858
* github.com:scylladb/scylladb:
compaction_manager: on_compaction_completion: erase sstables from sstables_requiring_cleanup
compaction/leveled_compaction_strategy: ideal_level_for_input: special case max_sstable_size==0
sstable: add on_delete observer
compaction_manager: add on_compaction_completion
sstable_compaction_test: cleanup_incremental_compaction_test: verify sstables_requiring_cleanup is empty
when it comes to `regular_compaction_task_executor`, we repeat the
compaction until the compaction can not proceed, so after an iteration
of compaction completes successfully, the task can still continue with
yet another round of the compaction as it sees appropriate. so let's
update the comment to reflect this fact.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#14891
quite a few member variables serves as the configuration for
a given compaction, they are immutable in the life cycle of it,
so for better readability, let's mark them `const`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#14981
Erase retired sstable from compaction_state::sstables_requiring_cleanup
also on_compaction_completion (in addition to
compacting_sstable_registration::release_compacting
for offstrategy compaction with piggybacked cleanup
or any other compaction type that doesn't use
compacting_sstable_registration.
Add cleanup_during_offstrategy_incremental_compaction_test
that is modeled after cleanup_incremental_compaction_test to check
that cleanup doesn't attempt to cleanup already-deleted
sstables that were left over by offstrategy compaction
in sstables_requiring_cleanup.
Fixes#14304
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Prevent div-by-zero byt returning const level 1
if max_sstable_size is zero, as configured by
cleanup_incremental_compaction_test, before it's
extended to cover also offstrategy compaction.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Pass the call to the table on_compaction_completion
so we can manage the sstables requiring cleanup state
along the way.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
All compaction task executors, except for regular compaction one,
become task manager compaction tasks.
Creating and starting of major_compaction_task_executor is modified
to be consistent with other compaction task executors.
Closes#14505
* github.com:scylladb/scylladb:
test: extend test_compaction_task.py to cover compaction group tasks
compaction: turn custom_task_executor into compaction_task_impl
compaction: turn sstables_task_executor into sstables_compaction_task_impl
compaction: change sstables compaction tasks type
compaction: move table_upgrade_sstables_compaction_task_impl
compaction: pass task_info through sstables compaction
compaction: turn offstrategy_compaction_task_executor into offstrategy_compaction_task_impl
compaction: turn cleanup_compaction_task_executor into cleanup_compaction_task_impl
comapction: use optional task info in major compaction
compaction: use perform_compaction in compaction_manager::perform_major_compaction
get_compacted_fragments_writer() returns a instance of
`compacted_fragments_writer`, there is no need to cast it again.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#14919
add fmt formatter for `compaction_task_executor::state` and
`compaction_task_executor` and its derived classes.
this is a part of a series to migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `compaction_task_executor`, its derived classes and
`compaction_task_executor::state` without the help of `operator<<`.
since all of the callers of 'operator<<' of these types now use
formatters, the operator<< are removed in this change. the helpers
like `to_string()` and `describe()` are removed as well, as it'd
be more consistent if we always use fmtlib for formatting instead
of inventing APIs with different names.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#14906
cleanup_compaction_task_executor inherits both from compaction_task_executor
and cleanup_compaction_task_impl.
Add a new version of compaction_manager::perform_task_on_all_files
which accepts only the tasks that are derived from compaction_task_impl.
After all task executors' conversions are done, the new version replaces
the original one.
To make it consistent with the upcoming methods, methods triggering
major compaction get std::optional<tasks::task_info> as an argument.
Thanks to that we can distinguish between a task that has no parent
and the task which won't be registered in task manager.
simpler than the "begin, end" iterator pair. and also tighten the
type constraints, now require the value type to be
sstables::shared_sstable. this matches what we are expecting in
the implementation.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#14678
Task manager tasks covering reshard compaction.
Reattempt on https://github.com/scylladb/scylladb/pull/14044. Bugfix for https://github.com/scylladb/scylladb/issues/14618 is squashed with 95191f4.
Regression test added.
Closes#14739
* github.com:scylladb/scylladb:
test: add test for resharding with non-empty owned_ranges_ptr
test: extend test_compaction_task.py to test resharding compaction
compaction: add shard_reshard_sstables_compaction_task_impl
compaction: invoke resharding on sharded database
compaction: move run_resharding_jobs into reshard_sstables_compaction_task_impl::run()
compaction: add reshard_sstables_compaction_task_impl
compaction: create resharding_compaction_task_impl
before this change, there are chances that the temporary sstables
created for collecting the GC-able data create by a certain
compaction can be picked up by another compaction job. this
wastes the CPU cycles, adds write amplification, and causes
inefficiency.
in general, these GC-only SSTables are created with the same run id
as those non-GC SSTables, but when a new sstable exhausts input
sstable(s), we proactively replace the old main set with a new one
so that we can free up the space as soon as possible. so the
GC-only SSTables are added to the new main set along with
the non-GC SSTables, but since the former have good chance to
overlap the latter. these GC-only SSTables are assigned with
different run ids. but we fail to register them to the
`compaction_manager` when replacing the main sstable set.
that's why future compactions pick them up when performing compaction,
when the compaction which created them is not yet completed.
so, in this change,
* to prevent sstables in the transient stage from being picked
up by regular compactions, a new interface class is introduced
so that the sstable is always added to registration before
it is added to sstable set, and removed from registration after
it is removed from sstable set. the struct helps to consolidate
the regitration related logic in a single place, and helps to
make it more obvious that the timespan of an sstable in
the registration should cover that in the sstable set.
* use a different run_id for the gc sstable run, as it can
overlap with the output sstable run. the run_id for the
gc sstable run is created only when the gc sstable writer
is created. because the gc sstables is not always created
for all compactions.
please note, all (indirect) callers of
`compaction_task_executor::compact_sstables()` passes a non-empty
`std::function` to this function, so there is no need to check for
empty before calling it. so in this change, the check is dropped.
Fixes#14560
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#14725
In reshard_sstables_compaction_task_impl::run() we call
sharded<sstables::sstable_directory>::invoke_on_all. In lambda passed
to that method, we use both sharded sstable_directory service
and its local instance.
To make it straightforward that sharded and local instances are
dependend, we call sharded<replica::database>::invoke_on_all
instead and access local directory through the sharded one.
Add task manager's task covering resharding compaction.
A struct and some functions are moved from replica/distributed_loader.cc
to compaction/task_manager_module.cc.
for faster build times and clear inter-module dependencies, we
should not #includes headers not directly used. instead, we should
only #include the headers directly used by a certain compilation
unit.
in this change, the source files under "/compaction" directories
are checked using clangd, which identifies the cases where we have
an #include which is not directly used. all the #includes identified
by clangd are removed. because some source files rely on the incorrectly
included header file, those ones are updated to #include the header
file they directly use.
if a forward declaration suffice, the declaration is added instead.
see also https://clangd.llvm.org/guides/include-cleaner#unused-include-warningCloses#14740
* github.com:scylladb/scylladb:
treewide: remove #includes not use directly
size_tiered_backlog_tracker: do not include remove header
for faster build times and clear inter-module dependencies, we
should not #includes headers not directly used. instead, we should
only #include the headers directly used by a certain compilation
unit.
in this change, the source files under "/compaction" directories
are checked using clangd, which identifies the cases where we have
an #include which is not directly used. all the #includes identified
by clangd are removed. because some source files rely on the incorrectly
included header file, those ones are updated to #include the header
file they directly use.
if a forward declaration suffice, the declaration is added instead.
see also https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we sort sstables with compaction disabled, when we
are about to perform the compaction. but the idea of of guarding the
getting and registering as a transaction is to prevent other compaction
to mutate the sstables' state and cause the inconsistency.
but since the state is tracked on per-sstable basis, and is not related
to the order in which they are processed by a certain compaction task.
we don't need to guard the "sort()" with this mutual exclusive lock.
for better readability, and probably better performance, let's move the
sort out of the lock. and take this opportunity to use
`std::ranges::sort()` for more concise code.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#14699
Compaction task executors which inherit from compaction_task_impl
may stay in memory after the compaction is finished.
Thus, state switch cannot happen in destructor.
Switch state to none in perform_task defer.
This is the first phase of providing strong exception safety guarantees by the generic `compaction_backlog_tracker::replace_sstables`.
Once all compaction strategies backlog trackers' replace_sstables provide strong exception safety guarantees (i.e. they may throw an exception but must revert on error any intermediate changes they made to restore the tracker to the pre-update state).
Once this series is merged and ICS replace_sstables is also made strongly exception safe (using infrastructure from size_tiered_backlog_tracker introduced here), `compaction_backlog_tracker::replace_sstables` may allow exceptions to propagate back to the caller rather than disabling the backlog tracker on errors.
Closes#14104
* github.com:scylladb/scylladb:
leveled_compaction_backlog_tracker: replace_sstables: provide strong exception safety guarantees
time_window_backlog_tracker: replace_sstables: provide strong exception safety guarantees
size_tiered_backlog_tracker: replace_sstables: provide strong exception safety guarantees
size_tiered_backlog_tracker: provide static calculate_sstables_backlog_contribution
size_tiered_backlog_tracker: make log4 helper static
size_tiered_backlog_tracker: define struct sstables_backlog_contribution
size_tiered_backlog_tracker: update_sstables: update total_bytes only if set changed
compaction_backlog_tracker: replace_sstables: pass old and new sstables vectors by ref
compaction_backlog_tracker: replace_sstables: add FIXME comments about strong exception safety
This is the last step of deprecation dance of DTCS.
In Scylla 5.1, users were warned that DTCS was deprecated.
In 5.2, altering or creation of tables with DTCS was forbidden.
5.3 branch was already created, so this is targetting 5.4.
Users that refused to move away from DTCS will have Scylla
falling back to the default strategy, either STCS or ICS.
See:
WARN 2023-07-14 09:49:11,857 [shard 0] schema_tables - Falling back to size-tiered compaction strategy after the problem: Unable to find compaction strategy class 'DateTieredCompactionStrategy
Then user can later switch to a supported strategy with
alter table.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes#14559
also, remove unnecessary forward declarations.
* compaction_manager_test_task_executor is only referenced
in the friend declaration. but this declaration does not need
a forward declaration of the friend class
* compaction_manager_test_task_executor is not used anywhere.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#14680
the callers of the constructor does not move variable into this
parameter, and the constructor itself is not able to consume it.
as the parameter is a vector while `compaction_sstable_registration`
use an `unordered_set` for tracking the sstables being compacted.
so, to avoid creating a temporary copy of the vector, let's just
pass by reference.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#14661