scylladb/sstables/compaction.hh
Glauber Costa 4f01ec0910 restrict background writers to 50 % of CPU.
In Scylla, we have foreground processes, which are latency sensitive and
need to be responded to as fast as possible in order to maintain good
latency profiles, and background processes, which are less so.

The most important background processes we have during normal write
workload operations are memtable writes and sstable compactions. Those
processes are quite CPU-intensive, and left unchecked will easily
dominate the CPU. Lower values of task-quota usually help, as they
force those processes to preempt more often, but they aren't enough to
guarantee good isolation. We have seen boxes with good NVMe storage have
their throughput drop to less than half of the original baseline for
the duration of a compaction.

In the long run, our goal is to leverage the CPU scheduler to make sure
that those processes are balanced with respect to all the others.
However, the current state of affairs is causing grievances at this very
moment. Thankfully, those processes live in a seastar::thread, which
ships with its own rudimentary bandwidth control mechanism: the
scheduling group.

The goal of this patch is to wrap background processes together in a
scheduling group, and assign to such group 50 % of our CPU power; the
remainder being left to foreground processes.

While we pride ourselves on dynamically adjusting things to the
workload, we won't be able to do this properly before the CPU scheduler
lands - and let's face it, letting background processes run wild is not
adaptive either. Every workload would benefit most from a different
value for such shares, but 50 % is as fair as it gets if we really need
static partitioning in the meantime.

As a defense against unforeseen consequences, we'll leave the actual
value as an option, but will do our best to hide it - as this is not a
tunable that we want to be part of a normal Scylla setup. The most
convenient place for this tunable is still db::config, so we can easily
pass it down to the database layer - but we will not document it in the
yaml, and will clearly note in the help string that it is not supposed
to be tuned.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2017-07-18 23:35:33 -04:00
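
The static split described in the commit message can be pictured with a
small toy model. This is only a sketch under stated assumptions: the class
name background_budget and its methods are hypothetical, time is passed in
explicitly instead of being read from a clock, and none of this is Seastar's
actual scheduling-group implementation. It shows the idea of granting
background work a fixed fraction (here 50 %) of each accounting period.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical toy model of a per-period CPU budget for background work.
// It is NOT Seastar's thread_scheduling_group; it only illustrates the idea.
class background_budget {
    using ns = std::chrono::nanoseconds;
    ns _period;
    ns _quota;            // CPU time background work may use per period
    ns _used{0};          // CPU time consumed in the current period
    ns _period_start{0};  // start of the current accounting period

    // Start a fresh period (and reset usage) once the old one has elapsed.
    void roll_period(ns now) {
        if (now - _period_start >= _period) {
            _period_start = now - (now - _period_start) % _period;
            _used = ns(0);
        }
    }
public:
    // usage is the fraction of each period granted to background work.
    background_budget(ns period, double usage)
        : _period(period)
        , _quota(std::chrono::duration_cast<ns>(period * usage)) {
        assert(usage > 0.0 && usage <= 1.0);
    }

    // Record that background work ran for `ran`, finishing at time `now`.
    void account(ns now, ns ran) {
        roll_period(now);
        _used += ran;
    }

    // True while background work still has budget left at time `now`.
    bool may_run(ns now) {
        roll_period(now);
        return _used < _quota;
    }
};
```

With a 10 ms period and a usage of 0.5, background work that has already
consumed 5 ms must yield to foreground work until the next period begins.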


/*
 * Copyright (C) 2015 ScyllaDB
 *
 */

/*
 * This file is part of Scylla.
 *
 * Scylla is free software: you can redistribute it and/or modify
 * it under the terms of the GNU Affero General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Scylla is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Scylla. If not, see <http://www.gnu.org/licenses/>.
 */
#pragma once
#include "sstables.hh"
#include <cstdint>
#include <functional>
#include <limits>
#include <stdexcept>
#include <utility>
#include <vector>
namespace sstables {
struct compaction_descriptor {
    // List of sstables to be compacted.
    std::vector<sstables::shared_sstable> sstables;
    // Level of sstable(s) created by compaction procedure.
    // Initialized so that a default-constructed descriptor is well defined.
    int level = 0;
    // Threshold size for sstable(s) to be created; unlimited by default.
    uint64_t max_sstable_bytes = std::numeric_limits<uint64_t>::max();

    compaction_descriptor() = default;

    compaction_descriptor(std::vector<sstables::shared_sstable> sstables, int level = 0, uint64_t max_sstable_bytes = std::numeric_limits<uint64_t>::max())
        : sstables(std::move(sstables))
        , level(level)
        , max_sstable_bytes(max_sstable_bytes) {}
};
struct resharding_descriptor {
    std::vector<sstables::shared_sstable> sstables;
    uint64_t max_sstable_bytes;
    shard_id reshard_at;
    uint32_t level;
};
enum class compaction_type {
    Compaction = 0,
    Cleanup = 1,
    Validation = 2,
    Scrub = 3,
    Index_build = 4,
    Reshard = 5,
};
static inline sstring compaction_name(compaction_type type) {
    switch (type) {
    case compaction_type::Compaction:
        return "COMPACTION";
    case compaction_type::Cleanup:
        return "CLEANUP";
    case compaction_type::Validation:
        return "VALIDATION";
    case compaction_type::Scrub:
        return "SCRUB";
    case compaction_type::Index_build:
        return "INDEX_BUILD";
    case compaction_type::Reshard:
        return "RESHARD";
    default:
        throw std::runtime_error("Invalid Compaction Type");
    }
}
struct compaction_info {
    compaction_type type = compaction_type::Compaction;
    sstring ks;
    sstring cf;
    size_t sstables = 0;
    uint64_t start_size = 0;
    uint64_t end_size = 0;
    uint64_t total_partitions = 0;
    uint64_t total_keys_written = 0;
    std::vector<shared_sstable> new_sstables;
    sstring stop_requested;

    bool is_stop_requested() const {
        return !stop_requested.empty();
    }

    void stop(sstring reason) {
        stop_requested = std::move(reason);
    }
};
// Compact a list of N sstables into M sstables.
// Returns a vector with the newly created sstable(s).
//
// creator is used to get a sstable object for a new sstable that will be written.
// max_sstable_size is a relaxed size limit for a sstable to be generated.
// Example: It's okay for the size of a new sstable to go beyond max_sstable_size
// when writing its last partition.
// sstable_level will be the level of the sstable(s) to be created by this function.
// If cleanup is true, mutations that don't belong to the current node will be
// cleaned up, log messages will inform the user that compact_sstables runs for
// a cleanup operation, and compaction history will not be updated.
// tsg, if not null, is the thread scheduling group to run the compaction in.
future<std::vector<shared_sstable>> compact_sstables(std::vector<shared_sstable> sstables,
        column_family& cf, std::function<shared_sstable()> creator,
        uint64_t max_sstable_size, uint32_t sstable_level, bool cleanup = false,
        seastar::thread_scheduling_group* tsg = nullptr);
// Compacts a set of N shared sstables into M sstables. For every shard involved,
// i.e. which owns any of the sstables, a new unshared sstable is created.
future<std::vector<shared_sstable>> reshard_sstables(std::vector<shared_sstable> sstables,
        column_family& cf, std::function<shared_sstable(shard_id)> creator,
        uint64_t max_sstable_size, uint32_t sstable_level,
        seastar::thread_scheduling_group* tsg = nullptr);
// Return the most interesting bucket applying the size-tiered strategy.
std::vector<sstables::shared_sstable>
size_tiered_most_interesting_bucket(const std::vector<sstables::shared_sstable>& candidates);
// Return the list of fully expired sstables for column family cf.
// A sstable is fully expired *iff* its max_local_deletion_time precedes gc_before and its
// max timestamp is lower than that of any other relevant sstable.
// In simpler words, a sstable is fully expired if all of its live cells with a TTL have
// expired and it contains no tombstone that could cover cells in other sstables.
std::vector<sstables::shared_sstable>
get_fully_expired_sstables(column_family& cf, std::vector<sstables::shared_sstable>& compacting, int32_t gc_before);
}
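
The "relaxed limit" wording in the compact_sstables comment above can be
illustrated with a standalone sketch. It assumes (this is not confirmed by
the header) that a partition is never split across output sstables and that
an output is closed only after the partition that reaches the limit has been
written in full. The helper name split_with_relaxed_limit is hypothetical and
is not part of Scylla.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of Scylla: given the sizes of the partitions
// being written in order, return the resulting size of each output sstable
// under a relaxed size limit.
std::vector<uint64_t> split_with_relaxed_limit(const std::vector<uint64_t>& partition_sizes,
                                               uint64_t max_sstable_size) {
    assert(max_sstable_size > 0);
    std::vector<uint64_t> outputs;
    uint64_t current = 0;  // bytes written to the currently open output
    bool open = false;     // whether an output currently holds any partition
    for (uint64_t size : partition_sizes) {
        // A partition is always written whole, even if that crosses the limit.
        current += size;
        open = true;
        // The limit is only checked after the partition is complete.
        if (current >= max_sstable_size) {
            outputs.push_back(current);
            current = 0;
            open = false;
        }
    }
    if (open) {
        outputs.push_back(current);  // flush the final, partially filled output
    }
    return outputs;
}
```

In this model, partitions of 40, 40, 40 and 10 bytes with a limit of 100
produce a first output of 120 bytes, which exceeds the limit because its last
partition is written in full, followed by a second output of 10 bytes.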