default_compaction_progress_monitor returns a reference to a static
object, so the object should be read-only, yet its users need to
modify it.
Delete default_compaction_progress_monitor and use a dedicated
compaction_progress_monitor instance wherever one is needed.
Closes scylladb/scylladb#15800
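As an illustration of the problem, here is a minimal sketch. It is not
the actual Scylla code: `progress_monitor`, `shared_default_monitor` and
`executor` are hypothetical names. Writers that go through the shared
static object all mutate the same state, while each executor that owns
its monitor keeps independent progress.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for compaction_progress_monitor.
struct progress_monitor {
    std::uint64_t bytes_read = 0;
};

// The removed pattern: every caller receives the same mutable static object.
inline progress_monitor& shared_default_monitor() {
    static progress_monitor monitor;
    return monitor;
}

// The replacement pattern: each user owns its own monitor instance.
struct executor {
    progress_monitor monitor;

    void read(std::uint64_t bytes) {
        monitor.bytes_read += bytes;
    }
};
```

Two unrelated users adding 10 and 5 bytes through `shared_default_monitor()`
both see a total of 15, whereas two `executor` objects report 10 and 5.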
After "repair: Get rid of the gc_grace_seconds", the sstable's schema (mode,
gc period if applicable, etc) is used to estimate the amount of droppable
data (or determine full expiration = max_deletion_time < gc_before).
For example, the user may have switched from timeout to repair mode, but
sstables will still use the old mode even though the user asked for a new one.
Another example is adjusting the grace period value to prevent data
resurrection when repair cannot run in a timely manner.
The problem persists until all sstables using the old GC settings are
recompacted or the node is restarted.
To fix this, we have to feed the latest schema into the sstable procedures
used for expiration purposes.
Fixes #15643.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#15746
compaction_read_monitor_generator is an existing mechanism
for monitoring progress of sstables reading during compaction.
In this change information gathered by compaction_read_monitor_generator
is utilized by task manager compaction tasks of the lowest level,
i.e. compaction executors, to calculate task progress.
compaction_read_monitor_generator has a flag, which decides whether
monitored changes will be registered by compaction_backlog_tracker.
This allows us to pass the generator to all compaction readers without
impacting the backlog.
Task executors have access to compaction_read_monitor_generator_wrapper,
which protects the internals of compaction_read_monitor_generator
and provides only the necessary functionality.
Closes scylladb/scylladb#14878
* github.com:scylladb/scylladb:
compaction: add get_progress method to compaction_task_impl
compaction: find total compaction size
compaction: sstables: monitor validation scrub with compaction_read_generator
compaction: keep compaction_progress_monitor in compaction_task_executor
compaction: use read monitor generator for all compactions
compaction: add compaction_progress_monitor
compaction: add flag to compaction_read_monitor_generator
The estimation assumes that the size of other components is irrelevant
when estimating the number of partitions for each output sstable.
The sstables are split according to the data file size, therefore
the size of other files is irrelevant for the estimation.
With certain data models, like single-row partitions containing small
values, the index could be even larger than the data.
For example, if the index is as large as the data, the estimation
would say that 2x more sstables will be generated, and as a result,
each sstable is estimated to have 2x fewer keys than it really has.
Fix it by accounting only for the size of the data file.
Fixes #15726.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#15727
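The effect can be checked with a small worked example. The helper below
is a hypothetical sketch, not the estimator from compaction.cc; the
function name, its parameters and the sizes in the usage note are
assumptions. It divides the partition count by the number of output
sstables implied by a given input size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: estimated partitions per output sstable when the
// output is split every max_data_size bytes of data file.
inline std::uint64_t estimated_partitions_per_sstable(std::uint64_t total_partitions,
                                                      std::uint64_t input_size,
                                                      std::uint64_t max_data_size) {
    assert(max_data_size > 0);
    // Ceiling division, with at least one output sstable.
    const std::uint64_t sstables =
        std::max<std::uint64_t>(1, (input_size + max_data_size - 1) / max_data_size);
    // Ceiling division again, so that no partition is left unaccounted for.
    return (total_partitions + sstables - 1) / sstables;
}
```

With 8,000,000 partitions, 8 GiB of data, an 8 GiB index and a 1 GiB
split size, passing data plus index as `input_size` yields 16 sstables of
500,000 partitions each, while passing the data size alone yields
8 sstables of 1,000,000 partitions each.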
Validation scrub bypasses the usual compaction machinery, but it
still needs to be tracked with compaction_progress_monitor so that
its progress can be retrieved from the compaction task executor.
Track sstable scrub in validate mode with read monitors.
Keep compaction_progress_monitor in compaction_task_executor and pass a reference
to it further, so that the compaction progress could be retrieved out of it.
Compaction read monitor generators are used in all compaction types.
Classes that did not use _monitor_generator so far create it with
_use_backlog_tracker set to no, so as not to affect the backlog tracker.
In the following patches compaction_read_monitor_generator will be used
to find the progress of compaction_task_executors. To avoid prolonging
the generator's lifetime unnecessarily and exposing internals of the class
outside of compaction.cc, compaction_progress_monitor is created.
The compaction class keeps a reference to the compaction_progress_monitor.
Inheriting classes that actually use compaction_read_monitor_generator
need to set it with the set_generator method.
Following patches will use compaction_read_monitor_generator
to track progress of all types of compaction. Some of them should
not be registered in compaction_backlog_tracker.
A _use_backlog_tracker flag, set to true by default, is
added to compaction_read_monitor_generator and passed to all
compaction_read_monitors created by this generator.
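The flag propagation described above can be sketched as follows. This is
a simplified model under stated assumptions, not the code in
compaction.cc: `backlog_tracker`, `read_monitor`, `read_monitor_generator`
and all of their members are hypothetical stand-ins. Every monitor always
records read progress, but it reports to the backlog tracker only when
the generator that created it was built with the flag set.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for compaction_backlog_tracker.
struct backlog_tracker {
    std::uint64_t registered_bytes = 0;
};

// Hypothetical stand-in for a compaction read monitor.
class read_monitor {
    backlog_tracker& _tracker;
    bool _use_backlog_tracker;
    std::uint64_t _position = 0;
public:
    read_monitor(backlog_tracker& tracker, bool use_backlog_tracker)
        : _tracker(tracker), _use_backlog_tracker(use_backlog_tracker) {}

    void on_read(std::uint64_t bytes) {
        _position += bytes;                      // progress is always recorded
        if (_use_backlog_tracker) {
            _tracker.registered_bytes += bytes;  // backlog only when enabled
        }
    }

    std::uint64_t position() const { return _position; }
};

// Hypothetical stand-in for compaction_read_monitor_generator.
class read_monitor_generator {
    backlog_tracker& _tracker;
    bool _use_backlog_tracker;
    std::vector<std::unique_ptr<read_monitor>> _monitors;
public:
    explicit read_monitor_generator(backlog_tracker& tracker, bool use_backlog_tracker = true)
        : _tracker(tracker), _use_backlog_tracker(use_backlog_tracker) {}

    // The generator's flag is handed to every monitor it creates.
    read_monitor& make_monitor() {
        _monitors.push_back(std::make_unique<read_monitor>(_tracker, _use_backlog_tracker));
        return *_monitors.back();
    }

    // Sum of bytes read through all monitors; usable for progress reporting.
    std::uint64_t total_read() const {
        std::uint64_t sum = 0;
        for (const auto& monitor : _monitors) {
            sum += monitor->position();
        }
        return sum;
    }
};
```

A generator constructed with `false` still reports `total_read()`, so
progress stays available for compaction types that must not affect the
backlog.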
Before the integration with task manager, the state of one shard repair
was kept in repair_info. The repair_info object was destroyed immediately
after the shard repair finished.
During the integration, repair_info's fields were moved to
shard_repair_task_impl, as the two served similar purposes.
However, shard_repair_task_impl isn't destroyed immediately; it is
kept in task manager for task_ttl seconds after it completes.
Thus, some of repair_info's fields have their lifetime prolonged,
which delays the repair state change.
Release shard_repair_task_impl resources immediately after shard
repair is finished.
Fixes: #15505.
Closes scylladb/scylladb#15506
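A minimal sketch of the idea, assuming a task object that outlives its
work: `shard_repair_task`, `repair_resources` and their members are
hypothetical names, not the real shard_repair_task_impl. The task drops
its heavy state as soon as the work finishes, even though the task
object itself is kept around for status queries.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical heavy per-repair state.
struct repair_resources {
    std::vector<int> ranges;
};

// Hypothetical stand-in for a task that stays registered after completion.
class shard_repair_task {
    std::unique_ptr<repair_resources> _resources = std::make_unique<repair_resources>();
    bool _done = false;
public:
    void run() {
        assert(_resources);
        // ... repair work using *_resources would go here ...
        _done = true;
        release_resources();  // do not wait for the task object to be destroyed
    }

    void release_resources() noexcept { _resources.reset(); }

    bool done() const { return _done; }
    bool holds_resources() const { return _resources != nullptr; }
};
```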
SSTable runs work hard to keep the disjointness invariant, therefore they're
expensive to build from scratch.
On every insertion, a run keeps its elements sorted by their first key in
order to reject the insertion of an element that would introduce an overlap.
Additionally, an sstable run can grow to dozens (or hundreds) of elements.
Therefore, we can make interaction with compaction strategies more
efficient by not copying runs when building a list of candidates in the
compaction manager, and less fragile by filtering out any sstable runs
that are not completely eligible for compaction.
Previously, ICS had to give up on using runs managed by sstable set due to
fragility of the interface (meaning runs are being built from scratch
on every call to the strategy, which is very inefficient, but that had to
be done for correctness), but now we can restore that.
Closes scylladb/scylladb#15440
* github.com:scylladb/scylladb:
compaction: Switch to strategy_control::candidates() for regular compaction
tests: Prepare sstable_compaction_test for change in compaction_strategy interface
compaction: Allow strategy to retrieve candidates either as sstables or runs
compaction: Make get_candidates() work with frozen_sstable_run too
sstables: add sstable_run::run_identifier()
sstables: tag sstable_run::insert() with nodiscard
sstables: Make all_sstable_runs() more efficient by exposing frozen shared runs
sstables: Simplify sstable_set interface to retrieve runs
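The disjointness invariant mentioned above can be sketched like this.
It is a simplified model, not the real sstable_run: `fragment`, `run`
and string keys are assumptions made for illustration. Elements are kept
sorted by first key, so an insertion only has to compare the new element
with its two neighbours, and the `[[nodiscard]]` result tells the caller
whether the element was accepted.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <string>
#include <utility>

// Hypothetical element of a run: a closed key range [first, last].
struct fragment {
    std::string first;
    std::string last;
};

// Hypothetical stand-in for sstable_run.
class run {
    std::map<std::string, fragment> _by_first_key;  // sorted by first key
public:
    // Returns false, leaving the run unchanged, if f overlaps an existing element.
    [[nodiscard]] bool insert(fragment f) {
        assert(f.first <= f.last);
        // First existing element whose first key is >= f.first.
        auto next = _by_first_key.lower_bound(f.first);
        if (next != _by_first_key.end() && next->second.first <= f.last) {
            return false;  // overlaps its successor
        }
        if (next != _by_first_key.begin()) {
            auto prev = std::prev(next);
            if (prev->second.last >= f.first) {
                return false;  // overlaps its predecessor
            }
        }
        std::string key = f.first;
        _by_first_key.emplace(std::move(key), std::move(f));
        return true;
    }

    std::size_t size() const { return _by_first_key.size(); }
};
```

Because existing elements are already disjoint and sorted, checking the
predecessor and the successor is enough; rebuilding such a structure from
scratch repeats this work for every element.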
Most of the time only the roots of the task tree should be non-internal.
Change the default implementation of is_internal and delete the overrides
that are consistent with it.
Closes scylladb/scylladb#15353
Now everything is prepared for the switch, let's do it.
Now let's wait for ICS to enjoy the set of changes.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
That's needed for upcoming changes that will allow ICS to efficiently
retrieve sstable runs.
Next patch will remove candidates from compaction_strategy's interface
to retrieve candidates using this one instead.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Off-strategy suffers from a 100% space overhead, as it adopted
an all-or-nothing approach: all input sstables,
living in the maintenance set, are kept alive until they are all
reshaped according to the strategy criteria.
Input sstables in off-strategy are very likely to be mostly disjoint,
so it can greatly benefit from incremental compaction.
The incremental compaction approach is not only good for
decreasing disk usage, but also memory usage (as metadata of
input and output live in memory), and file desc count, which
takes memory away from OS.
It turns out that this approach also greatly simplifies the
off-strategy implementation in compaction manager, as it no longer has
to maintain new unused sstables, mark them for
deletion on failure, or unlink intermediary sstables
used between reshape rounds.
Fixes https://github.com/scylladb/scylladb/issues/14992.
Closes scylladb/scylladb#15400
* github.com:scylladb/scylladb:
test: Verify that off-strategy can do incremental compaction
compaction: Clear pending_replacement list when tombstone GC is disabled
compaction: Enable incremental compaction on off-strategy
compaction: Extend reshape type to allow for incremental compaction
compaction: Move reshape_compaction in the source
compaction: Enable incremental compaction only if replacer callback is engaged
pending_replacement list is used by incremental compaction to
communicate to other ongoing compactions about exhausted sstables
that must be replaced in the sstable set they keep for tombstone
GC purposes.
Reshape doesn't enable tombstone GC, so that list will not
be cleared, which prevents incremental compaction from releasing
sstables referenced by that list. That was not a problem until now,
when we want reshape to do incremental compaction.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Off-strategy suffers from a 100% space overhead, as it adopted
an all-or-nothing approach: all input sstables,
living in the maintenance set, are kept alive until they are all
reshaped according to the strategy criteria.
Input sstables in off-strategy are very likely to be mostly disjoint,
so it can greatly benefit from incremental compaction.
The incremental compaction approach is not only good for
decreasing disk usage, but also memory usage (as metadata of
input and output live in memory), and file desc count, which
takes memory away from OS.
It turns out that this approach also greatly simplifies the
off-strategy implementation in compaction manager, as it no longer has
to maintain new unused sstables, mark them for
deletion on failure, or unlink intermediary sstables
used between reshape rounds.
Fixes #14992.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
That's done by inheriting from regular_compaction, which implements
incremental compaction. But reshape still implements its own
methods for creating the writer and reader. One reason is that
reshape is not driven by the controller, as its input sstables
live in the maintenance set. Another reason is customization
of things like sstable origin, etc.
stop_sstable_writer() is extended because that's used by
regular_compaction to check for possibility of removing
exhausted sstables earlier whenever an output sstable
is sealed.
Also, incremental compaction will be unconditionally
enabled for ICS/LCS during off-strategy.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
That's in preparation for the next change, which will make reshape
inherit from regular compaction.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Currently, the moved-object's manager pointer is moved into the
constructed object, but without fixing the registration to
point to the moved-to object, causing #15248.
Although we could properly move the registration from
the moved-from object to the moved-to one, it is simpler
to just disallow moving a registered tracker, since it's
not needed anywhere. This way we just don't need to mess
with the trackers' registration.
The move-assignment operator has a similar problem,
therefore it is deleted in this series and replaced by a function
named `transfer_backlog`, which just doesn't deal with the
moved-from registration. This is safe since it's only used internally
by the compaction manager.
Fixes #15248
Closes scylladb/scylladb#15445
* github.com:scylladb/scylladb:
compaction_state: store backlog_track in std::optional
compaction_backlog_tracker: do not allow moving registered trackers
Compaction task executors serve two different purposes: as a compaction
manager related entity they execute the compaction operation, and as a task
manager related entity they track compaction status.
When one role depends on the other, as it currently is for
compaction_task_impl::done() and compaction_task_executor::compaction_done(),
requirements of both roles need to be satisfied at the same time in each
corner case. Such complexity leads to bugs.
To prevent it, compaction_task_impl::done() of executors no longer depends
on compaction_task_executor::compaction_done().
Fixes: #14912.
Closes scylladb/scylladb#15140
* github.com:scylladb/scylladb:
compaction: warn about compaction_done()
compaction: do not run stopped compaction
compaction: modify lowest compaction tasks' run method
compaction: pass do_throw_if_stopping to compaction_task_executor
So that replacing it will destroy the previous tracker
and unregister it before assigning the new one and
then registering it.
This is safer than assigning it in place.
With that, the move assignment operator is no longer
used and can be deleted.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
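The ordering guaranteed by std::optional can be sketched as follows.
This is a simplified model, not the real compaction_backlog_tracker or
compaction_state: `manager`, `tracker`, `state_with_tracker` and their
members are hypothetical. `std::optional::emplace` destroys the held
tracker first (which unregisters it), then constructs the new one, which
is registered afterwards; the move constructor accepts only unregistered
trackers, so no registration ever has to be fixed up.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <utility>

class tracker;

// Hypothetical manager that keeps raw pointers to registered trackers.
struct manager {
    std::set<tracker*> registered;
};

// Hypothetical stand-in for a backlog tracker.
class tracker {
    manager* _manager = nullptr;
public:
    tracker() = default;
    // Only unregistered trackers may be moved.
    tracker(tracker&& other) noexcept { assert(other._manager == nullptr); }
    tracker& operator=(tracker&&) = delete;
    ~tracker() { unregister(); }

    void register_with(manager& m) {
        _manager = &m;
        m.registered.insert(this);
    }

    void unregister() noexcept {
        if (_manager) {
            _manager->registered.erase(this);
            _manager = nullptr;
        }
    }
};

// Hypothetical stand-in for the per-table state that owns the tracker.
struct state_with_tracker {
    std::optional<tracker> backlog_tracker;

    void replace(tracker&& fresh, manager& m) {
        // emplace() destroys the old tracker (unregistering it) before
        // move-constructing the new one in place.
        backlog_tracker.emplace(std::move(fresh));
        backlog_tracker->register_with(m);
    }
};
```

After two consecutive `replace()` calls the manager holds exactly one
pointer, and it refers to the tracker currently stored in the optional.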
Currently, the moved-object's manager pointer is moved into the
constructed object, but without fixing the registration to
point to the moved-to object, causing #15248.
Although we could properly move the registration from
the moved-from object to the moved-to one, it is simpler
to just disallow moving a registered tracker, since it's
not needed anywhere. This way we just don't need to mess
with the trackers' registration.
With that in mind, when move-assigning a compaction_backlog_tracker
the existing tracker can remain registered.
Fixes scylladb/scylladb#15248
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
That's needed for enabling incremental compaction to operate, and
needed for subsequent work that enables incremental compaction
for off-strategy, which in turn uses reshape compaction type.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Split compaction_strategy_impl constructor into methods that will
be reused for validation.
Add additional checks ensuring that the options' values are legal.
Add compaction_strategy_impl::validate_min_max_threshold method
that will be used to validate min and max threshold values
for different compaction methods.
Split size_tiered_compaction_strategy_options constructor into
methods that will be reused for validation.
Add additional checks ensuring that the options' values are legal.
To be consistent with other compaction_strategy_options,
time_window_compaction_strategy_options uses compaction_strategy_impl::get_value
and cql3::statements::property_definitions::to_long helpers for
parsing.
Add a temporarily empty validate method to compaction_strategy_options.
The method will validate the options and help determine whether
only the allowed options were set.
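One way such validation could look is sketched below. Everything here
is an assumption made for illustration: the helper names, the defaults
of 4 and 32, the rule that min_threshold must be at least 2 and not
exceed max_threshold, and the idea of erasing recognized keys from a map
of unchecked options so that leftovers reveal unknown options. It is not
the actual compaction_strategy_impl code.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

using options_map = std::map<std::string, std::string>;

// Hypothetical parser: the whole value must be a valid integer.
inline long to_long(const std::string& name, const std::string& value) {
    try {
        std::size_t pos = 0;
        const long parsed = std::stol(value, &pos);
        if (pos == value.size()) {
            return parsed;
        }
    } catch (const std::logic_error&) {
        // std::stol failed; fall through to the common error below.
    }
    throw std::invalid_argument(name + " must be an integer, got '" + value + "'");
}

// Reads an option if present and marks it as checked.
inline long take_option(const options_map& options, options_map& unchecked,
                        const std::string& name, long default_value) {
    auto it = options.find(name);
    if (it == options.end()) {
        return default_value;
    }
    unchecked.erase(name);
    return to_long(name, it->second);
}

// Hypothetical threshold validation; the limits are assumptions.
inline void validate_min_max_threshold(const options_map& options, options_map& unchecked) {
    const long min_threshold = take_option(options, unchecked, "min_threshold", 4);
    const long max_threshold = take_option(options, unchecked, "max_threshold", 32);
    if (min_threshold < 2) {
        throw std::invalid_argument("min_threshold must be at least 2");
    }
    if (max_threshold < min_threshold) {
        throw std::invalid_argument("max_threshold must not be smaller than min_threshold");
    }
}
```

A caller would start with `unchecked` as a copy of `options`, run every
validator, and treat any key still left in `unchecked` as an option that
is not allowed.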
compaction_done() returns a ready future before
compaction_task_executor::run_compaction() is called, even though the
compaction has not started.
Make compaction_done() private and add a comment to warn against
incorrect usage.
Before compaction_task_executor::do_run is called, the executor may
already be aborted. Check if compaction was stopped and set
_compaction_done to an exceptional future.
For compaction_task_executors, unlike for all other task manager
tasks, the run method does not encompass the operations performed in the
scope of the task, but only waits until the shared_future connected with
the operations is resolved.
Apart from breaking task manager task conventions, such a run method
must consider all corner cases so as not to break task manager or
compaction manager functionality.
To fix existing bugs and prevent further ones related to the coexistence
of task manager and compaction manager, call perform_task inside the
run method and wait for it in the standard way.
Executors that are not going to be reflected in task manager
call perform_task the old way.
Make sure the compaction_state objects are idle before
they are destroyed. Although all tasks are stopped
in stop_ongoing_compactions, make sure there is
no fiber holding the compaction_state gate.
compaction_manager::remove now needs to close the
compaction_state gate and to call stop_ongoing_compactions
only if the gate is not closed yet.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Check if the compaction_state gate is closed
along with _state != state::enabled and return early
in this case.
At this point entering the gate is guaranteed to succeed.
So enter the gate before calling `perform_compaction`
keeping the std::optional<gate_holder> throughout
the compaction task.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
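The check-then-enter pattern can be sketched with a simplified,
synchronous gate. `simple_gate` is a hypothetical class loosely modelled
on the idea of a gate and is not the Seastar API; `manager_state` and
`run_if_allowed` are likewise assumptions. Because the closed state is
checked first, entering the gate afterwards cannot fail, and the holder
stays alive for the duration of the task.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>

// Hypothetical, synchronous gate used only for this sketch.
class simple_gate {
    unsigned _count = 0;
    bool _closed = false;
public:
    void enter() {
        if (_closed) {
            throw std::runtime_error("gate closed");
        }
        ++_count;
    }
    void leave() {
        assert(_count > 0);
        --_count;
    }
    void close() { _closed = true; }
    bool is_closed() const { return _closed; }
    unsigned count() const { return _count; }

    // RAII holder: enters on construction, leaves on destruction.
    class holder {
        simple_gate* _gate;
    public:
        explicit holder(simple_gate& g) : _gate(&g) { g.enter(); }
        holder(holder&& other) noexcept : _gate(other._gate) { other._gate = nullptr; }
        holder(const holder&) = delete;
        holder& operator=(const holder&) = delete;
        holder& operator=(holder&&) = delete;
        ~holder() {
            if (_gate) {
                _gate->leave();
            }
        }
    };
};

enum class manager_state { enabled, disabled };

// Runs task while holding the gate; returns false if it bailed out early.
template <typename Task>
bool run_if_allowed(manager_state s, simple_gate& gate, Task&& task) {
    if (s != manager_state::enabled || gate.is_closed()) {
        return false;  // return early, nothing was entered
    }
    std::optional<simple_gate::holder> holder;
    holder.emplace(gate);  // cannot throw: the gate was just seen open
    task();                // the holder is alive throughout the task
    return true;
}
```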
Scylla sstable promises to *never* mutate its input sstables. This
promise was broken by `scylla sstable scrub --scrub-mode=validate`,
because validate moves invalid input sstables into quarantine. This is
unexpected and caused occasional failures in the scrub tests in
test_tools.py. Fix by propagating a flag down to
`scrub_sstables_validate_mode()` in `compaction.cc`, specifying whether
validate should quarantine invalid sstables, then set this flag to false
in `scylla-sstable.cc`. The existing test for validate-mode scrub is
amended to check that the sstable is not mutated. The test now fails
before the fix and passes afterwards.
Fixes: #14309
Closes #15139
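The shape of the fix can be sketched as below. The names
`quarantine_invalid_sstables`, `sstable_stub` and `validate_all` are
hypothetical and do not reflect the real signature of
`scrub_sstables_validate_mode()`; the sketch only shows a caller-supplied
flag deciding whether validation may mutate its inputs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flag type passed down by the caller.
enum class quarantine_invalid_sstables { no, yes };

// Hypothetical stand-in for an input sstable.
struct sstable_stub {
    bool valid = true;
    bool quarantined = false;
};

// Counts invalid sstables; quarantines them only when asked to.
inline std::size_t validate_all(std::vector<sstable_stub>& sstables,
                                quarantine_invalid_sstables quarantine) {
    std::size_t invalid = 0;
    for (auto& sst : sstables) {
        if (!sst.valid) {
            ++invalid;
            if (quarantine == quarantine_invalid_sstables::yes) {
                sst.quarantined = true;  // the only mutation of the input
            }
        }
    }
    return invalid;
}
```

A tool that promises not to touch its inputs would pass
`quarantine_invalid_sstables::no` and still learn how many sstables
failed validation.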