Now that submit() switched to table_state, compaction_reenabler
and friends can switch to table_state too.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
auto_compaction_disabled_by_user is a configuration that can be enabled
or disabled on a particular table. We're adding this interface to
avoid having to push the configuration for every compaction_state,
which would result in redundant information as the configuration
value is the same for all table states.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
The idea is that we'll have a single on-completion interface for both
"in-strategy" and off-strategy compactions, so not to pollute table_state
with one interface for each.
replica::table::on_compaction_completion is being moved into private namespace.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
compaction_manager needs this interface when setting the sstable
creation lambda in compaction_descriptor, which is then forwarded
into the actual compaction procedure.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
in_strategy_sstables() doesn't have to be implemented in table, as it's
simply about main set with maintenance and staging files filtered out.
Also, let's make it switch to table_state as part of ongoing work.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
With compaction_manager switching to table_state, we'll need to
introduce a method in table_state to return maintenance set.
So better to have a descriptive name for main set.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
If the compaction_descriptor returned by time_window_compaction_strategy::get_sstables_for_compaction
is marked with has_only_fully_expired::yes it should always be compacted
since time_window_compaction_strategy::get_sstables_for_compaction is not idempotent.
It sets _last_expired_check and if compaction is postponed and retried before
expired_sstable_check_frequency has passed, it will not look for those fully-expired
sstables again. Plus, compacting them is the cheapest possible as it does not require
reading anything, just deleting the input sstables, so there's no reason not postpone it.
Fixes#10989
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
To test both expired and non-expired sstables scenarios
we need to pass this helper function the expected number
of sstables before compaction and after compaction.
When compaction a set of fully-expired sstables,
we expect none to remain, while when the set of sstables
is not fully expired, we'll expect 1 output sstable
after compaction.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
So we can use the lower-level build blocks to
test compaction serialization of both fully-expired
and non-fully-expired sstables scenarios in the following patches.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
We should separate the scheduling groups used for major compaction
from the the regular compaction scheduling group so that
the latter can be affected by the backlog tracker in case
backlog accumulates during a long running major compaction.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
- Use `sstables::generation_type` in more places
- Enforce conceptual separation of `sstables::generation_type` and `int64_t`
- Fix `extremum_tracker` so that `sstables::generation_type` can be non-default-constructible
Fixes#10796.
Closes#10844
* github.com:scylladb/scylla:
sstables: make generation_type an actual separate type
sstables: use generation_type more soundly
extremum_tracker: do not require default-constructible value types
Make the compaction manager for testing using
this class.
Makes sure to enable the compaction manager
and to stop it before it's destroyed.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
"
In order to wire-in the compaction_throughput_mb_per_sec the compaction
creation and stopping will need to be patched. Right now both places are
quite hairy, this set coroutinizes stop() for simpler adding of stopping
bits, unifies all the compaction manager constructors and adds the
compaction_manager::config for simpler future extending.
As a side effect the backlog_controller class gets an "abstract" sched
group it controlls which in turn will facilitate seastar sched groups
unification some day.
"
* 'br-compaction-manager-start-stop-cleanup' of https://github.com/xemul/scylla:
compaction_manager: Introduce compaction_manager::config
backlog_controller: Generalize scheduling groups
database: Keep compound flushing sched group
compaction_manager: Swap groups and controller
compaction_manager: Keep compaction_sg on board
compaction_manager: Unify scheduling_group structures
compaction_manager: Merge static/dynamic constructors
compaction_manager: Coroutinuze really_do_stop()
compaction_manager: Shuffle really_do_stop()
compaction_manager: Remove try-catch around logger
The sstable set param isn't being used anywhere, and it's also buggy
as sstable run list isn't being updated accordingly. so it could happen
that set contains sstables but run list is empty, introducing
inconsistency.
we're fortunate that the bug wasn't activated as it would've been
a hard one to catch. found this while auditting the code.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220617203438.74336-1-raphaelsc@scylladb.com>
`generation_type` is (supposed to be) conceptually different from
`int64_t` (even if physically they are the same), but at present
Scylla code still largely treats them interchangeably.
In addition to using `generation_type` in more places, we
provide (no-op) `generation_value()` and `generation_from_value()`
operations to make the smoke-and-mirrors more believable.
The churn is considerable, but all mechanical. To avoid even
more (way, way more) churn, unit test code is left untreated for
now, except where it uses the affected core APIs directly.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
This is to make it constructible in a way most other services are -- all
the "scalar" parameters are passed via a config.
With this it will be much shorter to add compaction bandwidth throttling
option by just extending the config itself, not the list of constructor
arguments (and all its callers).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The only difference between those two are in the way backlog controller
is created. It's much simpler to have the controller construction logic
in compaction manager instead. Similar "trick" is used to construct
flush controller for the database.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The controller unit test for LCS was only creating level 0 SSTables.
As level 0 falls back to STCS controller, it means that we weren't actually
testing LCS controller.
So let's adjust the unit test to account for LCS fan_out, which is 10
instead of 4, and also allow creation of SSTables on higher levels.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
In core code there's only one place that constructs table -- in
database.cc -- and this place currently has the sstables_manager pointer
sitting on table config (despite it's a pointer, it's always non-null).
All the tests always use the manager from one of _env's out there.
For now the new contructor arg is unused.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The compaction manager's empty constructor is supposed to be invoked
only in testing environment, however, it is easy to invoke it by mistake
from production code.
Here we add a more verbose constructor and making the default compaction
private, the verbose compiler need to be invoked with a tag
for_testing_tag, this will ensure that this constructor will be invoked
only when intended.
The unit tests were changed according to this new paradigm.
Tests: unit (dev)
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
propagate_replacement() is an internal function that shouldn't be in
the public interface. No one besides an unit test for incremental
compaction needs it. In the future, I want to revisit incremental
compaction unit test to stop using it and only rely on public
interfaces
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220506171647.81063-1-raphaelsc@scylladb.com>
That's done by picking the ideal level for the input, such
that LCS won't have to either promote or demote data, because
the output level is not the best candidate for having the
size of the output data.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
There's a public call on replica::table to get back the compaction
manager reference. It's not needed, actually. The users of the call are
distributed loader which already has database at hand, and a test that
creates itw own instance of compaction manager for its testing tables
and thus also has it available.
tests: unit(dev)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20220406171351.3050-1-xemul@scylladb.com>
Bucket awareness in cleanup was introduced in a69d98c3d0.
STCS and TWCS already support it, and now LCS will receive it.
The goal of bucket awareness is to reduce writeamp in cleanup,
therefore reducing operation time. Additionally, garbage collection
becomes more efficient as shadowed data can now be potentially
compacted with the data that shadows it, assuming they're on
the same level.
The implementation for LCS is simple. Will reuse the procedure
for STCS for returning jobs in level 0. And one job will be
returned for each non-empty level > 0. What allows us to do it
is our incremental selection approach used in compaction,
that sets a limit on memory usage and disk space requirement.
Fixes#10097.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220331173417.211257-1-raphaelsc@scylladb.com>
For compaction to be able to purge expired data, like tombstones, a
sstable set snapshot is set in the compaction descriptor.
That's a decision that belongs to task type. For example, all regular
compaction enable GC, whereas scrub for example doesn't for safety
reasons.
The problem is that the decision is being made by every instantiation
of compaction_descriptor in the strategies, which is both unnecessary
and also adds lots of boilerplate to the code, making it hard to
understand and work with.
As sstable set snapshot is an implementation detail, a new method
is being added to compaction_descriptor to make the intention
clearer, making the interface easier to understand.
can_purge_tombstones, used previously by rewrite task only, is being
reused for communicating GC intention into task::compact_sstables().
The boilerplate was a pain when adding a new strategy method for
the ongoing work on cleanup, described by issue #10097.
Another benefit is that we'll now only create a set snapshot when
compaction will really run. Before, it could happen that the snapshot
would be discarded if the compaction attempt had to be postponed,
which is a waste of cpu cycles.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Compaction manager is calling back the table to run off-strategy compaction,
but the logic clearly belongs to manager which should perform the
operation independently and only call table to update its state with the
result.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220315174504.107926-2-raphaelsc@scylladb.com>
Table submits compaction request into manager, which in turn calls
back table to run the compaction when the time has come, i.e.:
table -> compaction manager -> table -> execute compaction
But manager should not rely on table to run compaction, as compaction
execution procedure sits one layer below the manager and should be
accessed directly by it, i.e:
table -> compaction manager -> execute compaction
This makes code easier to understand and update_compaction_history()
can now be noop for unit tests using table_state.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220311023410.250149-1-raphaelsc@scylladb.com>
The flat_mutation_reader files were conflated and contained multiple
readers, which were not strictly necessary. Splitting optimizes both
iterative compilation times, as touching rarely used readers doesn't
recompile large chunks of codebase. Total compilation times are also
improved, as the size of flat_mutation_reader.hh and
flat_mutation_reader_v2.hh have been reduced and those files are
included by many file in the codebase.
With changes
real 29m14.051s
user 168m39.071s
sys 5m13.443s
Without changes
real 30m36.203s
user 175m43.354s
sys 5m26.376s
Closes#10194
The series overhauls the compaction_manager::task design and implementation
by properly layering the functionality between the compaction_manager
that deals with generic task execution, and the per-task business logic that is defined
in a set of classes derived from the generic task class.
While at it, the series introduces `task::state` and a set of helper functions to manage it
to prevent leaks in the statistics, fixing #9974.
Two more stats counter were exposed: `completed_tasks` and a new `postponed_tasks`.
Test: sstable_compaction_test
Dtest: compaction_test.py compaction_additional_test.py
Fixes#9974Closes#10122
* github.com:scylladb/scylla:
compaction_manager: use coroutine::switch_to
compaction_manager::task: drop _compaction_running
compaction_manager: move per-type logic to derived task
compaction_manager: task: add state enum
compaction_manager: task: add maybe_retry
compaction_manager: reevaluate_postponed_compactions: mark as noexcept
compaction_manager: define derived task types
compaction_manager: register_metrics: expose postponed_compactions
compaction_manager: register_metrics: expose failed_compactions
compaction_manager: register_metrics: expose _stats.completed_tasks
compaction: add documentation for compaction_type to string conversions
compaction: expose to_string(compaction_type)
compaction_manager: task: standardize task description in log messages
compaction_manager: refactor can_proceed
compaction_manager: pass compaction_manager& to task ctor
compaction_manager: use shared_ptr<task> rather than lw_shared_ptr
compaction_manager: rewrite_sstables: acquire _maintenance_ops_sem once
compaction_manager: use compaction_state::lock only to synchronize major and regular compaction
Move the business logic into the task specific classes.
Separating initialization during task construction,
from the compaction_done task, moved into
a do_run() method, and in some cases moving
a lambda function that was called per table (as in
rewrite_sstables) into a private method of the
derived class.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The sstables::sstable class has two methods for writing sstables:
1) sstable_writer get_writer(...);
2) future<> write_components(flat_mutation_reader, ...);
(1) directly exposes the writer type, so we have to update all users of
it (there is not that many) in this same patch. We defer updating
users of (2) to a follow-up commits.
Introduced in commit: ddd693c6d7
We're not emplacing newer windows in the tracker, causing
std::out_of_range when replacing sstables for windows.
Let's fix the logic and add an unit test to cover this.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220301194944.95096-1-raphaelsc@scylladb.com>
There's no automated test for controller, it's time to have one.
Let's start with a basic one that verifies the assumption that
perfectly compacted tiers should produce 0 backlog.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This new interface allows table to communicate multiple changes in the
SSTable set with a single call, which is useful on compaction completion
for example.
With this new interface, the size tiered backlog tracker will be able to
know when compaction completed, which will allow it to recompute tiers
and their backlog contribution, if any. Without it, tiered tracker
would have to recompute tiers for every change, which would be terribly
expensive.
The old remove/add interface are being removed in favor of the new one.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Memtables are a replica-side entity, and so are moved to the
replica module and namespace.
Memtables are also used outside the replica, in two places:
- in some virtual tables; this is also in some way inside the replica,
(virtual readers are installed at the replica level, not the
cooordinator), so I don't consider it a layering violation
- in many sstable unit tests, as a convenient way to create sstables
with known input. This is a layering violation.
We could make memtables their own module, but I think this is wrong.
Memtables are deeply tied into replica memory management, and trying
to make them a low-level primitive (at a lower level than sstables) will
be difficult. Not least because memtables use sstables. Instead, we
should have a memtable-like thing that doesn't support merging and
doesn't have all other funky memtable stuff, and instead replace
the uses of memtables in sstable tests with some kind of
make_flat_mutation_reader_from_unsorted_mutations() that does
the sorting that is the reason for the use of memtables in tests (and
live with the layering violation meanwhile).
Test: unit (dev)
Closes#10120
"
Currently the mutation compactor supports v1 and v2 output and has a v1
output. The next step is to add a v2 output but this would lead to a
full conversion matrix which we want to avoid. So in preparation we drop
the v1 input support. Most inputs were already v2, but there were some
notable exceptions: tests, the compacting reader and the multishard
query code. The former two was a simple mechanical update but the latter
required some further work because it turned out the v2 version of
evictable reader wasn't used yet and thus it managed to hide some bugs
and dropped features. While at it, we migrate all evictable and
multishard reader users to the v2 variant of the respective readers and
drop the v1 variant completely.
With this the road is open to a v2 compactor output and therefore to a
v2 sstable writer.
Tests: unit(dev, release), dtest(paging_additional_test.py)
"
* 'compact-mutation-v2-only-input/v5' of https://github.com/denesb/scylla:
test/lib/test_utils: return OK from check() variants
repair/row_level: use evictable reader v2
db/view/view_updating_consumer: migrate to v2
test/boost/mutation_reader_test: add v2 specific evictable reader tests
test: migrate to evictable reader v2 and multishard combining reader v2
compact_mutation: drop support for v1 input
test: pass v2 input to mutation_compaction
test/boost/mutation_test: simplify test_compaction_data_stream_split test
mutation_partition: do_compact(): do drop row tombstones covered by higher order tombstones
multishard_mutation_query: migrate to v2
mutation_fragment_v2: range_tombstone_change: add memory_usage()
evictable_reader_v2: terminate active range tombstones on reader recreation
evictable_reader_v2: restore handling of non-monotonically increasing positions
evictable_reader_v2: simplify handling of reader recreation
mutation: counter_write_query: use v2 reader
mutation: migrate consume() to v2
mutation_fragment_v2,flat_mutation_reader_v2: mirror v1 concept organization
mutation_reader: compacting_reader: require a v2 input reader
db/view/view_builder: use v2 reader
test/lib/flat_mutation_reader_assertions: adjust has_monotonic_positions() to v2 spec