It is possible that a node will have no owned token ranges
in some keyspaces based on their replication strategy,
if the strategy is configured to have no replicas in
this node's data center.
In this case we should go ahead with cleanup that will
effectively delete all data.
Note that this is current very inefficient as we need
to filter every partition and drop it as unowned.
It can be optimized by either special casing this case
or, better, use skip forward to the next owned range.
This will skip to end-of-stream since there are no
owned ranges.
Fixes#13634
Also, add a respective rest_api unit test
Closes#13849
* github.com:scylladb/scylladb:
test: rest_api: test_storage_service: add test_storage_service_keyspace_cleanup_with_no_owned_ranges
compaction_manager: perform_cleanup: handle empty owned ranges
Updates to the compaction_group sstable sets are
never done in place. Instead, the update is done
on a mutable copy of the sstable set, and the lw_shared
result is set back in the compaction_group.
(see for example compaction_group::set_main_sstables)
Therefore, there's currently a risk in perform_cleanup
`get_sstables` lambda that if it yield while in
set.for_each_sstable, the sstable_set might be replaced
and the copy it is traversing may be destroyed.
This was introduced in c2bf0e0b72.
To prevent that, hold on to set.shared_from_this()
around set.for_each_sstable.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes#13852
It is possible that a node will have no owned token ranges
in some keyspaces based on their replication strategy,
if the strategy is configured to have no replicas in
this node's data center.
In this case we should go ahead with cleanup that will
effectively delete all data.
Note that this is current very inefficient as we need
to filter every partition and drop it as unowned.
It can be optimized by either special casing this case
ot, better, use skip forward to the next owned range.
This will skip to end-of-stream since there are no
owned ranges.
Fixes#13634
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Task manager's tasks covering scrub compaction on top,
shard and table level.
For this levels we have common scrub tasks for each scrub
mode since they share code. Scrub modes will be differentiated
on compaction group level.
Closes#13694
* github.com:scylladb/scylladb:
test: extend test_compaction_task.py to test scrub compaction
compaction: add table_scrub_sstables_compaction_task_impl
compaction: add shard_scrub_sstables_compaction_task_impl
compaction: add scrub_sstables_compaction_task_impl
api: get rid of unnecessary std::optional in scrub
compaction: rename rewrite_sstables_compaction_task_impl
Rename rewrite_sstables_compaction_task_impl to
sstables_compaction_task_impl as a new name describes the class
of tasks better. Rewriting sstables is a slightly more fine-grained
type of sstable compaction task then the one needed here.
In addition to the data file itself. Currently validation avoids the
index altogether, using the crawling reader which only relies on the
data file and ignores the index+summary. This is because a corrupt
sstable usually has a corrupt index too and using both at the same time
might hide the corruption. This patch adds targeted validation of the
index, independent of and in addition to the already existing data
validation: it validates the order of index entries as well as whether
the entry points to a complete partition in the data file.
This will usually result in duplicate errors for out-of-order
partitions: one for the data file and one for the index file.
Fixes: #9611Closes#11405
* github.com:scylladb/scylladb:
test/cql-pytest: add test_sstable_validation.py
test/cql-pytest: extract scylla_path,temp_workdir fixtures to conftest.py
tools/scylla-sstables: write validation result to stdout
sstables/sstable: validate(): delegate to mx validator for mx sstables
sstables/mx/reader: add mx specific validator
mutation/mutation_fragment_stream_validator: add validator() accessor to validating filter
sstables/mx/reader: template data_consume_rows_context_m on the consumer
sstables/mx/reader: move row_processing_result to namespace scope
sstables/mx/reader: use data_consumer::proceed directly
sstables/mx/reader.cc: extend namespace to end-of-file (cosmetic)
compaction/compaction: remove now unused scrub_validate_mode_validate_reader()
compaction/compaction: move away from scrub_validate_mode_validate_reader()
tools/scylla-sstable: move away from scrub_validate_mode_validate_reader()
test/boost/sstable_compaction_test: move away from scrub_validate_mode_validate_reader()
sstables/sstable: add validate() method
compaction/compaction: scrub_sstables_validate_mode(): validate sstables one-by-one
compaction: scrub: use error messages from validator
mutation_fragment_stream_validator: produce error messages in low-level validator
we should always qualify `format_to` with its namespace. otherwise
we'd have following failure when compiling with libstdc++ from GCC-13:
```
/home/kefu/dev/scylladb/compaction/table_state.hh:65:16: error: call to 'format_to' is ambiguous
return format_to(ctx.out(), "{}.{} compaction_group={}", s->ks_name(), s->cf_name(), t.get_group_id());
^~~~~~~~~
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#13760
Currently said method creates a combined reader from all the sstables
passed to it then validates this combined reader.
Change it to validate each sstable (reader) individually in preparation
of the new validate method which can handle a single sstable at a time.
Note that this is not going to make much impact in practice, all callers
pass a single sstable to this method already.
In many cases we trigger offstrategy compaction opportunistically
also when there's nothing to do. In this case we still print
to the log lots of info-level message and call
`run_offstrategy_compaction` that wastes more cpu cycles
on learning that it has nothing to do.
This change bails out early if the maintenance set is empty
and prints a "Skipping off-strategy compaction" message in debug
level instead.
Fixes#13466
Also, add an group_id class and return it from compaction_group and table_state.
Use that to identify the compaction_group / table_state by "ks_name.cf_name compaction_group=idx/total" in log messages.
Fixes#13467Closes#13520
* github.com:scylladb/scylladb:
compaction_manager: print compaction_group id
compaction_group, table_state: add group_id member
compaction_manager: offstrategy compaction: skip compaction if no candidates are found
This series fixes a few issues caused by f1bbf705f9
(f1bbf705f9):
- table, compaction_manager: prevent cross shard access to owned_ranges_ptr
- Fixes#13631
- distributed_loader: distribute_reshard_jobs: pick one of the sstable shard owners
- compaction: make_partition_filter: do not assert shard ownership
- allow the filtering reader now used during resharding to process tokens owned by other shards
Closes#13635
* github.com:scylladb/scylladb:
compaction: make_partition_filter: do not assert shard ownership
distributed_loader: distribute_reshard_jobs: pick one of the sstable shard owners
table, compaction_manager: prevent cross shard access to owned_ranges_ptr
Add a formatter to compaction::table_state that
prints the table ks_name.cf_name and compaction group id.
Fixes#13467
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
In many cases we trigger offstrategy compaction opportunistically
also when there's nothing to do. In this case we still print
to the log lots of info-level message and call
`run_offstrategy_compaction` that wastes more cpu cycles
on learning that it has nothing to do.
This change bails out early if the maintenance set is empty
and prints a "Skipping off-strategy compaction" message in debug
level instead.
Fixes#13466
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The only reason why it's there (right next to compaction_fwd.hh) is
because the database::table_truncate_state subclass needs the definition
of compaction_manager::compaction_reenabler subclass.
However, the former sub is not used outside of database.cc and can be
defined in .cc. Keeping it outside of the header allows dropping the
compaction_manager.hh from database.hh thus greatly reducing its fanout
over the code (from ~180 indirect inclusions down to ~20).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#13622
Now, with f1bbf705f9
(Cleanup sstables in resharding and other compaction types),
we may filter sstables as part of resharding compaction
and the assertion that all tokens are owned by the current
shard when filtering is no longer true.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Seen after f1bbf705f9 in debug mode
distributed_loader collect_all_shared_sstables copies
compaction::owned_ranges_ptr (lw_shared_ptr<const
dht::token_range_vector>)
across shards.
Since update_sstable_cleanup_state is synchronous, it can
be passed a const refrence to the token_range_vector instead.
It is ok to access the memory read-only across shards
and since this happens on start-up, there are no special
performance requirements.
Fixes#13631
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
enable_optimized_twcs_queries is specific to TWCS, therefore it
belongs to TWCS, not replica::table.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes#13489
Commit 49892a0, back in 2018, bumps the compaction shares by 200 to
guarantee a minimum base line.
However, after commit e3f561d, major compaction runs in maintenance
group meaning that bumping shares became completely irrelevant and
only causes regular compaction to be unnecessarily more aggressive.
Fixes#13487.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes#13488
Task manager task implementations of classes that cover
rewrite sstables keyspace compaction which can be start
through /storage_service/keyspace_compaction/ api.
Top level task covers the whole compaction and creates child
tasks on each shard.
Closes#12714
* github.com:scylladb/scylladb:
test: extend test_compaction_task.py to test rewrite sstables compaction
compaction: create task manager's task for rewrite sstables keyspace compaction on one shard
compaction: create task manager's task for rewrite sstables keyspace compaction
compaction: create rewrite_sstables_compaction_task_impl
If any of the sstables to-be-compacted requires cleanup,
retrive the owned_ranges_ptr from the table_state.
With that, staging sstables will eventually be cleaned up
via regular compaction.
Refs #9559
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Refactor the printing logic in compaction::formatted_sstables_list
out to sstables::to_string(const shared_sstable&, bool include_origin)
and operator<<(const shared_sstable) on top of it.
So that we can easily print std::vector<shared_sstable>
from compaction_manager in the next patch.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When perform_cleanup adds sstables to sstables_requiring_cleanup,
also save the owned_ranges_ptr in the compaction_state so
it could be used by other compaction types like
regular, reshape, or major compaction.
When the exhausted sstables are released, check
if sstables_requiring_cleanup is empty, and if it is,
clear also the owned_ranges_ptr.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
As a first step towards parallel cleanup by
(regular) compaction and cleanup compaction,
filter all sstables in perform_cleanup
and keep the set of sstables in the compaction_state.
Erase from that set when the sstables are unregistered
from compaction.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
So it can be used in the next patch that will refactor
compaction_state out of class compaction_manager.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Allow getting candidates for compaction
from an arbitrary range of sstable, not only
the in_strategy_sstables.
To be used by perform_cleanup to mark all sstables
that require cleanup, even if they can't be
compacted at this time.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
update_sstable_cleanup_state calls needs_cleanup and
inserts (or erases) the sstable into the respective
compaction_state.sstables_requiring_cleanup set.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
I'm not sure why this was originally supported,
maybe for upgrade sstables where we may want to
rewrite the sstables without filtering any tokens,
but perform_sstable_upgrade is now following a
different code path and uses `rewrite_sstables`
directly, without pigybacking on cleanup.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Move the token filtering logic down from cleanup_compaction
to regular_compaction and class compaction so it can be
reused by other compaction types.
Create a _owned_ranges_checker in class compaction
when _owned_ranges is engaged, and use it in
compaction::setup to filter partitions based on the owned ranges.
Ref scylladb/scylladb#12998
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Move the owned_ranges_ptr, currently used only by
cleanup and upgrade compactions, to the generic
compaction descriptor so we apply cleanup in other
compaction types.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
A problem in compaction reevaluation can cause the SSTable set to be left uncompacted for indefinite amount of time, potentially causing space and read amplification to be suboptimal.
Two revaluation problems are being fixed, one after off-strategy compaction ended, and another in compaction manager which intends to periodically reevaluate a need for compaction.
Fixes https://github.com/scylladb/scylladb/issues/13429.
Fixes https://github.com/scylladb/scylladb/issues/13430.
Closes#13431
* github.com:scylladb/scylladb:
compaction: Make compaction reevaluation actually periodic
replica: Reevaluate regular compaction on off-strategy completion
Compaction group is responsible for deleting SSTables of "in-strategy"
compactions, i.e. regular, major, cleanup, etc.
Both in-strategy and off-strategy compaction have their completion
handled using the same compaction group interface, which is
compaction_group::table_state::on_compaction_completion(...,
sstables::offstrategy offstrategy)
So it's important to bring symmetry there, by moving the responsibility
of deleting off-strategy input, from manager to group.
Another important advantage is that off-strategy deletion is now throttled
and gated, allowing for better control, e.g. table waiting for deletion
on shutdown.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes#13432
The manager intended to periodically reevaluate compaction need for
each registered table. But it's not working as intended.
The reevaluation is one-off.
This means that compaction was not kicking in later for a table, with
low to none write activity, that had expired data 1 hour from now.
Also make sure that reevaluation happens within the compaction
scheduling group.
Fixes#13430.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Task manager compaction tasks that cover compaction group
compaction need access to compaction_manager::tasks.
To avoid circular dependency and be able to rely on forward
declaration, task needs to be moved out of compaction manager.
To avoid naming confusion compaction_manager::task is renamed.
Closes#13226
* github.com:scylladb/scylladb:
compaction: use compaction namespace in compaction_manager.cc
compaction: rename compaction::task
compaction: move compaction_manager::task out of compaction manager
compaction: move sstable_task definition to source file