From a57724e711849f472a57d968fe07520ba6fac06e Mon Sep 17 00:00:00 2001 From: "Raphael S. Carvalho" Date: Mon, 7 Nov 2022 17:30:19 -0300 Subject: [PATCH] Make off-strategy compaction wait for view building completion Prior to off-strategy compaction, streaming / repair would place staging files into main sstable set, and wait for view building completion before they could be selected for regular compaction. The reason for that is that view building relies on table providing a mutation source without data in staging files. Had regular compaction mixed staging data with non-staging one, table would have a hard time providing the required mutation source. After off-strategy compaction, staging files can be compacted in parallel to view building. If off-strategy completes first, it will place the output into the main sstable set. So a parallel view building (on sstables used for off-strategy) may potentially get a mutation source containing staging data from the off-strategy output. That will mislead view builder as it won't be able to detect changes to data in main directory. To fix it, we'll do what we did before. Filter out staging files from compaction, and trigger the operation only after we're done with view building. We're piggybacking on off-strategy timer for still allowing the off-strategy to only run at the end of the node operation, to reduce the amount of compaction rounds on the data introduced by repair / streaming. Fixes #11882. Signed-off-by: Raphael S. Carvalho Closes #11919 --- compaction/compaction_manager.cc | 7 ++++++- replica/table.cc | 5 +++++ 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/compaction/compaction_manager.cc b/compaction/compaction_manager.cc index 2f576aa96d..4108a25915 100644 --- a/compaction/compaction_manager.cc +++ b/compaction/compaction_manager.cc @@ -1097,7 +1097,12 @@ private: compaction::table_state& t = *_compacting_table; const auto& maintenance_sstables = t.maintenance_sstable_set(); - const auto old_sstables = boost::copy_range>(*maintenance_sstables.all()); + // Filter out sstables that require view building, to avoid a race between off-strategy + // and view building. Refs: #11882 + const auto old_sstables = boost::copy_range>(*maintenance_sstables.all() + | boost::adaptors::filtered([] (const sstables::shared_sstable& sst) { + return !sst->requires_view_building(); + })); std::vector reshape_candidates = old_sstables; std::vector sstables_to_remove; std::unordered_set new_unused_sstables; diff --git a/replica/table.cc b/replica/table.cc index b49a856f25..b7f248117b 100644 --- a/replica/table.cc +++ b/replica/table.cc @@ -2380,6 +2380,11 @@ future<> table::move_sstables_from_staging(std::vector co_await coroutine::parallel_for_each(dirs_to_sync, [] (sstring dir) { return sync_directory(dir); }); + // Off-strategy timer will be rearmed, so if there's more incoming data through repair / streaming, + // the timer can be updated once again. In practice, it allows off-strategy compaction to kick off + // at the end of the node operation on behalf of this table, which brings more efficiency in terms + // of write amplification. + do_update_off_strategy_trigger(); } /**