Files
scylladb/sstables/compaction_manager.hh
Nadav Har'El 721f7d1d4f Rewrite shared sstables soon after startup
Several shards may share the same sstable - e.g., when re-starting scylla
with a different number of shards, or when importing sstables from an
external source. Sharing an sstable is fine, but it can result in excessive
disk space use because the shared sstable cannot be deleted until all
the shards using it have finished compacting it. Normally, we have no idea
when the shards will decide to compact these sstables - e.g., with size-
tiered-compaction a large sstable will take a long time until we decide
to compact it. So what this patch does is to initiate compaction of the
shared sstables - on each shard using it - so that a soon as possible after
the restart, we will have the original sstable is split into separate
sstables per shard, and the original sstable can be deleted. If several
sstables are shared, we serialize this compaction process so that each
shard only rewrites one sstable at a time. Regular compactions may happen
in parallel, but they will not not be able to choose any of the shared
sstables because those are already marked as being compacted.

Commit 3f2286d0 increased the need for this patch, because since that
commit, if we don't delete the shared sstable, we also cannot delete
additional sstables which the different shards compacted with it. For one
scylla user, this resulted in so much excessive disk space use, that it
literally filled the whole disk.

After this patch commit 3f2286d0, or the discussion in issue #1318 on how
to improve it, is no longer necessary, because we will never compact a shared
sstable together with any other sstable - as explained above, the shared
sstables are marked as "being compacted" so the regular compactions will
avoid them.

Fixes #1314.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1465406235-15378-1-git-send-email-nyh@scylladb.com>
Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-06-08 15:44:29 -04:00

143 lines
4.9 KiB
C++

/*
* Copyright (C) 2015 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "core/semaphore.hh"
#include "core/sstring.hh"
#include "core/shared_ptr.hh"
#include "core/gate.hh"
#include "core/shared_future.hh"
#include "log.hh"
#include "utils/exponential_backoff_retry.hh"
#include <vector>
#include <list>
#include <functional>
#include "sstables/compaction.hh"
class column_family;
// Compaction manager is a feature used to manage compaction jobs from multiple
// column families pertaining to the same database.
class compaction_manager {
public:
struct stats {
int64_t pending_tasks = 0;
int64_t completed_tasks = 0;
uint64_t active_tasks = 0; // Number of compaction going on.
int64_t errors = 0;
};
private:
struct task {
column_family* compacting_cf = nullptr;
shared_future<> compaction_done = make_ready_future<>();
seastar::gate compaction_gate;
exponential_backoff_retry compaction_retry = exponential_backoff_retry(std::chrono::seconds(5), std::chrono::seconds(300));
bool stopping = false;
bool cleanup = false;
};
// compaction manager may have N fibers to allow parallel compaction per shard.
std::list<lw_shared_ptr<task>> _tasks;
// Used to assert that compaction_manager was explicitly stopped, if started.
bool _stopped = true;
stats _stats;
std::vector<scollectd::registration> _registrations;
std::list<lw_shared_ptr<sstables::compaction_info>> _compactions;
// Store sstables that are being compacted at the moment. That's needed to prevent
// a sstable from being compacted twice.
std::unordered_set<sstables::shared_sstable> _compacting_sstables;
// Keep track of weight of ongoing compaction for each column family.
// That's used to allow parallel compaction on the same column family.
std::unordered_map<column_family*, std::unordered_set<int>> _weight_tracker;
private:
lw_shared_ptr<task> task_start(column_family* cf, bool cleanup);
future<> task_stop(lw_shared_ptr<task> task);
// Returns if this compaction manager is accepting new requests.
// It will not accept new requests in case the manager was stopped.
bool can_submit();
// Return true if weight is not registered. If parallel_compaction is not
// true, only one weight is allowed to be registered.
bool try_to_register_weight(column_family* cf, int weight, bool parallel_compaction);
// Deregister weight for a column family.
void deregister_weight(column_family* cf, int weight);
// If weight of compaction job is taken, it will be trimmed until its new
// weight is not taken or its size is equal to minimum threshold.
// Return weight of compaction job.
int trim_to_compact(column_family* cf, sstables::compaction_descriptor& descriptor);
public:
compaction_manager();
~compaction_manager();
void register_collectd_metrics();
// Start compaction manager.
void start();
// Stop all fibers. Ongoing compactions will be waited.
future<> stop();
// Submit a column family to be compacted.
void submit(column_family* cf);
// Submit a column family to be cleaned up and wait for its termination.
future<> perform_cleanup(column_family* cf);
// Submit a specific sstable to be rewritten, while dropping data which
// does not belong to this shard. Meant to be used on startup when an
// sstable is shared by multiple shards, and we want to split it to a
// separate sstable for each shard.
void submit_sstable_rewrite(column_family* cf,
sstables::shared_sstable s);
// Remove a column family from the compaction manager.
// Cancel requests on cf and wait for a possible ongoing compaction on cf.
future<> remove(column_family* cf);
const stats& get_stats() const {
return _stats;
}
void register_compaction(lw_shared_ptr<sstables::compaction_info> c) {
_compactions.push_back(c);
}
void deregister_compaction(lw_shared_ptr<sstables::compaction_info> c) {
_compactions.remove(c);
}
const std::list<lw_shared_ptr<sstables::compaction_info>>& get_compactions() const {
return _compactions;
}
// Stops ongoing compaction of a given type.
void stop_compaction(sstring type);
};