scylladb/sstables/compaction_descriptor.hh
Raphael S. Carvalho a7cdd846da compaction: Prevent tons of compaction of fully expired sstable from happening in parallel
The compaction manager can start a large number of compactions of fully
expired sstables in parallel, which may consume a significant amount of
resources. The cause is that the compaction weight is released too early:
after all data has been compacted, but before the table is called to update
its state (replacing sstables and so on).
Fully expired sstables aren't actually compacted, so the following can happen:
- compaction 1 starts for expired sst A with weight W, but there's nothing to
be compacted, so weight W is released, then calls table to update state.
- compaction 2 starts for expired sst B with weight W, but there's nothing to
be compacted, so weight W is released, then calls table to update state.
- compaction 3 starts for expired sst C with weight W, but there's nothing to
be compacted, so weight W is released, then calls table to update state.
- compaction 1 is done updating table state, so it finally completes and
releases all the resources.
- compaction 2 is done updating table state, so it finally completes and
releases all the resources.
- compaction 3 is done updating table state, so it finally completes and
releases all the resources.

This happens because, with a fully expired sstable, there's nothing to be
compacted, so compaction releases its weight sooner than it finishes updating
table state.

With my reproducer, it's very easy to reach 50 parallel compactions on a single
shard, and that number can easily be worse depending on the number of sstables
with fully expired data, across all tables. This high parallelism can happen
even with only a couple of tables, if there are many time windows with expired
data, as they can be compacted in parallel.

Prior to 55a8b6e3c9, weight was released earlier in compaction, before
last sstable was sealed, but right now, there's no need to release weight
earlier. Weight can be released in a much simpler way, after the compaction is
actually done. So such compactions will be serialized from now on.
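
The interleaving described above can be reproduced with a small toy model. This is only a sketch and not Scylla code: `release_policy`, `job`, and `peak_parallelism` are hypothetical names, time is modeled as discrete ticks, and the "compaction" phase of a fully expired sstable is assumed to take no time at all. The manager is assumed to start a new job whenever the weight is not registered.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy model (hypothetical names, not Scylla's API).
enum class release_policy { before_table_update, after_table_update };

struct job {
    int weight;
    int update_ticks_left;  // remaining ticks of the "update table state" step
    bool holds_weight;      // true while the job still has its weight registered
};

// Returns the peak number of jobs in flight at once when `pending` fully
// expired sstables, all mapping to the same `weight`, are compacted.
std::size_t peak_parallelism(release_policy policy, std::size_t pending,
                             int weight, int update_ticks) {
    assert(update_ticks > 0);
    std::set<int> registered;   // weights currently in use
    std::vector<job> running;
    std::size_t peak = 0;
    while (pending > 0 || !running.empty()) {
        // The manager keeps starting jobs for as long as the weight is free.
        while (pending > 0 && registered.count(weight) == 0) {
            registered.insert(weight);
            --pending;
            job j{weight, update_ticks, true};
            // Nothing to compact, so the compaction phase ends immediately.
            if (policy == release_policy::before_table_update) {
                registered.erase(weight);
                j.holds_weight = false;
            }
            running.push_back(j);
        }
        peak = std::max(peak, running.size());
        // Advance time by one tick of table-state updating.
        for (auto& j : running) {
            --j.update_ticks_left;
            if (j.update_ticks_left == 0 && j.holds_weight) {
                registered.erase(j.weight);  // late release: job is fully done
            }
        }
        running.erase(std::remove_if(running.begin(), running.end(),
                          [](const job& j) { return j.update_ticks_left == 0; }),
                      running.end());
    }
    return peak;
}
```

In this model, 50 pending jobs with the same weight and a 3-tick state update all run at once under `before_table_update`, while `after_table_update` never has more than one in flight, which is the serialization the fix aims for.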

Fixes #8710.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210527165443.165198-1-raphaelsc@scylladb.com>

[avi: drop now unneeded storage_service_for_tests]
2021-05-30 23:22:51 +03:00


/*
 * Copyright (C) 2020 ScyllaDB
 *
 */

/*
 * This file is part of Scylla.
 *
 * Scylla is free software: you can redistribute it and/or modify
 * it under the terms of the GNU Affero General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Scylla is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Scylla. If not, see <http://www.gnu.org/licenses/>.
 */
#pragma once
#include <functional>
#include <iosfwd>
#include <limits>
#include <optional>
#include <string_view>
#include <variant>
#include <vector>
#include <seastar/core/smp.hh>
#include <seastar/core/file.hh>
#include "shared_sstable.hh"
#include "sstable_set.hh"
#include "utils/UUID.hh"
#include "dht/i_partitioner.hh"
#include "compaction_weight_registration.hh"
namespace sstables {
enum class compaction_type {
    Compaction = 0,
    Cleanup = 1,
    Validation = 2,
    Scrub = 3,
    Index_build = 4,
    Reshard = 5,
    Upgrade = 6,
    Reshape = 7,
};
std::ostream& operator<<(std::ostream& os, compaction_type type);
struct compaction_completion_desc {
    // Old, existing SSTables that should be deleted and removed from the SSTable set.
    std::vector<shared_sstable> old_sstables;
    // New, fresh SSTables that should be added to SSTable set, replacing the old ones.
    std::vector<shared_sstable> new_sstables;
    // Set of compacted partition ranges that should be invalidated in the cache.
    dht::partition_range_vector ranges_for_cache_invalidation;
};
// creates a new SSTable for a given shard
using compaction_sstable_creator_fn = std::function<shared_sstable(shard_id shard)>;
// Replaces old sstable(s) by new one(s) which contain all non-expired data.
using compaction_sstable_replacer_fn = std::function<void(compaction_completion_desc)>;
class compaction_options {
public:
    struct regular {
    };
    struct cleanup {
        std::reference_wrapper<database> db;
    };
    struct upgrade {
        std::reference_wrapper<database> db;
    };
    struct scrub {
        enum class mode {
            abort, // abort scrub on the first sign of corruption
            skip, // skip corrupt data, including range of rows and/or partitions that are out-of-order
            segregate, // segregate out-of-order data into streams that all contain data with correct order
        };
        mode operation_mode = mode::abort;
    };
    struct reshard {
    };
    struct reshape {
    };

private:
    using options_variant = std::variant<regular, cleanup, upgrade, scrub, reshard, reshape>;

private:
    options_variant _options;

private:
    explicit compaction_options(options_variant options) : _options(std::move(options)) {
    }

public:
    static compaction_options make_reshape() {
        return compaction_options(reshape{});
    }

    static compaction_options make_reshard() {
        return compaction_options(reshard{});
    }

    static compaction_options make_regular() {
        return compaction_options(regular{});
    }

    static compaction_options make_cleanup(database& db) {
        return compaction_options(cleanup{db});
    }

    static compaction_options make_upgrade(database& db) {
        return compaction_options(upgrade{db});
    }

    static compaction_options make_scrub(scrub::mode mode) {
        return compaction_options(scrub{mode});
    }

    template <typename... Visitor>
    auto visit(Visitor&&... visitor) const {
        return std::visit(std::forward<Visitor>(visitor)..., _options);
    }

    compaction_type type() const;
};
std::string_view to_string(compaction_options::scrub::mode);
std::ostream& operator<<(std::ostream& os, compaction_options::scrub::mode scrub_mode);
struct compaction_descriptor {
    // List of sstables to be compacted.
    std::vector<sstables::shared_sstable> sstables;
    // This is a snapshot of the table's sstable set, used only for the purpose of expiring tombstones.
    // If this sstable set cannot be provided, expiration will be disabled to prevent data from being resurrected.
    std::optional<sstables::sstable_set> all_sstables_snapshot;
    // Level of sstable(s) created by compaction procedure.
    int level;
    // Threshold size for sstable(s) to be created.
    uint64_t max_sstable_bytes;
    // Run identifier of output sstables.
    utils::UUID run_identifier;
    // Calls compaction manager's task for this compaction to release reference to exhausted sstables.
    std::function<void(const std::vector<shared_sstable>& exhausted_sstables)> release_exhausted;
    // The options passed down to the compaction code.
    // This also selects the kind of compaction to do.
    compaction_options options = compaction_options::make_regular();

    compaction_sstable_creator_fn creator;
    compaction_sstable_replacer_fn replacer;

    ::io_priority_class io_priority = default_priority_class();

    compaction_descriptor() = default;

    static constexpr int default_level = 0;
    static constexpr uint64_t default_max_sstable_bytes = std::numeric_limits<uint64_t>::max();

    explicit compaction_descriptor(std::vector<sstables::shared_sstable> sstables,
                                   std::optional<sstables::sstable_set> all_sstables_snapshot,
                                   ::io_priority_class io_priority,
                                   int level = default_level,
                                   uint64_t max_sstable_bytes = default_max_sstable_bytes,
                                   utils::UUID run_identifier = utils::make_random_uuid(),
                                   compaction_options options = compaction_options::make_regular())
        : sstables(std::move(sstables))
        , all_sstables_snapshot(std::move(all_sstables_snapshot))
        , level(level)
        , max_sstable_bytes(max_sstable_bytes)
        , run_identifier(run_identifier)
        , options(options)
        , io_priority(io_priority)
    {}
};
}
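
The header's `compaction_options` combines a private `std::variant`, named factory functions, and a `visit` member. The following standalone sketch shows how a caller might dispatch on such a type. It is a trimmed-down stand-in, not the Scylla class: `toy_options`, `overloaded`, and `describe` are hypothetical names introduced for illustration, and the variants that hold a `database` reference are left out so the example compiles on its own.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>

// Hypothetical, reduced stand-in for compaction_options.
class toy_options {
public:
    struct regular {};
    struct reshape {};
    struct scrub {
        enum class mode { abort, skip, segregate };
        mode operation_mode = mode::abort;
    };

private:
    using options_variant = std::variant<regular, reshape, scrub>;
    options_variant _options;

    // Private constructor: instances are only created through make_*().
    explicit toy_options(options_variant options) : _options(std::move(options)) {}

public:
    static toy_options make_regular() { return toy_options(regular{}); }
    static toy_options make_reshape() { return toy_options(reshape{}); }
    static toy_options make_scrub(scrub::mode m) { return toy_options(scrub{m}); }

    template <typename Visitor>
    auto visit(Visitor&& visitor) const {
        return std::visit(std::forward<Visitor>(visitor), _options);
    }
};

// Common helper that merges several lambdas into a single visitor.
template <typename... Fs>
struct overloaded : Fs... { using Fs::operator()...; };
template <typename... Fs>
overloaded(Fs...) -> overloaded<Fs...>;

// Maps each alternative to a label; every lambda returns std::string,
// as std::visit requires one common return type.
std::string describe(const toy_options& opts) {
    return opts.visit(overloaded{
        [](const toy_options::regular&) { return std::string("regular"); },
        [](const toy_options::reshape&) { return std::string("reshape"); },
        [](const toy_options::scrub& s) {
            switch (s.operation_mode) {
            case toy_options::scrub::mode::abort: return std::string("scrub/abort");
            case toy_options::scrub::mode::skip: return std::string("scrub/skip");
            case toy_options::scrub::mode::segregate: return std::string("scrub/segregate");
            }
            assert(false && "unhandled scrub mode");
            return std::string("scrub/unknown");
        },
    });
}
```

Keeping the constructor private and routing construction through `make_*()` means the variant alternative and the kind of compaction are always chosen together, and adding a new alternative makes any visitor that does not handle it fail to compile.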