Make off-strategy compaction wait for view building completion

Prior to off-strategy compaction, streaming / repair would place
staging files into main sstable set, and wait for view building
completion before they could be selected for regular compaction.

The reason for that is that view building relies on table providing
a mutation source without data in staging files. Had regular compaction
mixed staging data with non-staging one, table would have a hard time
providing the required mutation source.

After off-strategy compaction, staging files can be compacted
in parallel to view building. If off-strategy completes first, it
will place the output into the main sstable set. So a parallel view
building (on sstables used for off-strategy) may potentially get a
mutation source containing staging data from the off-strategy output.
That will mislead view builder as it won't be able to detect
changes to data in main directory.

To fix it, we'll do what we did before. Filter out staging files
from compaction, and trigger the operation only after we're done
with view building. We're piggybacking on off-strategy timer for
still allowing the off-strategy to only run at the end of the
node operation, to reduce the amount of compaction rounds on
the data introduced by repair / streaming.

Fixes #11882.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #11919
This commit is contained in:
Raphael S. Carvalho
2022-11-07 17:30:19 -03:00
committed by Botond Dénes
parent 243fcb96f0
commit a57724e711
2 changed files with 11 additions and 1 deletions

View File

@@ -1097,7 +1097,12 @@ private:
compaction::table_state& t = *_compacting_table;
const auto& maintenance_sstables = t.maintenance_sstable_set();
const auto old_sstables = boost::copy_range<std::vector<sstables::shared_sstable>>(*maintenance_sstables.all());
// Filter out sstables that require view building, to avoid a race between off-strategy
// and view building. Refs: #11882
const auto old_sstables = boost::copy_range<std::vector<sstables::shared_sstable>>(*maintenance_sstables.all()
| boost::adaptors::filtered([] (const sstables::shared_sstable& sst) {
return !sst->requires_view_building();
}));
std::vector<sstables::shared_sstable> reshape_candidates = old_sstables;
std::vector<sstables::shared_sstable> sstables_to_remove;
std::unordered_set<sstables::shared_sstable> new_unused_sstables;

View File

@@ -2380,6 +2380,11 @@ future<> table::move_sstables_from_staging(std::vector<sstables::shared_sstable>
co_await coroutine::parallel_for_each(dirs_to_sync, [] (sstring dir) {
return sync_directory(dir);
});
// Off-strategy timer will be rearmed, so if there's more incoming data through repair / streaming,
// the timer can be updated once again. In practice, it allows off-strategy compaction to kick off
// at the end of the node operation on behalf of this table, which brings more efficiency in terms
// of write amplification.
do_update_off_strategy_trigger();
}
/**