mirror of
https://github.com/scylladb/scylladb.git
synced 2026-05-31 20:16:43 +00:00
ICS is a compaction strategy that inherits size tiered properties -- therefore it's write optimized too -- but fixes its space overhead of 100% due to input files being only released on completion. That's achieved with the concept of sstable run (similar in concept to LCS levels) which breaks a large sstable into fixed-size chunks (1G by default), known as run fragments. ICS picks similar-sized runs for compaction, and fragments of those runs can be released incrementally as they're compacted, reducing the space overhead to about (number_of_input_runs * 1G). This allows user to increase storage density of nodes (from 50% to ~80%), reducing the cost of ownership. NOTE: test_system_schema_version_is_stable adjusted to account for batchlog using IncrementalCompactionStrategy contains: compaction/: added incremental_compaction_strategy.cc (.hh), incremental_backlog_tracker.cc (.hh) compaction/CMakeLists.txt: include ICS cc files configure.py: changes for ICS files, includes test db/legacy_schema_migrator.cc / db/schema_tables.cc: fallback to ICS when strategy is not supported db/system_keyspace: pick ICS for some system tables schema/schema.hh: ICS becomes default test/boost: Add incremental_compaction_test.cc test/boost/sstable_compaction_test.cc: ICS related changes test/cqlpy/test_compaction_strategy_validation.py: ICS related changes docs/architecture/compaction/compaction-strategies.rst: changes to ICS section docs/cql/compaction.rst: changes to ICS section docs/cql/ddl.rst: adds reference to ICS options docs/getting-started/system-requirements.rst: updates sentence mentioning ICS docs/kb/compaction.rst: changes to ICS section docs/kb/garbage-collection-ics.rst: add file docs/kb/index.rst: add reference to <garbage-collection-ics> docs/operating-scylla/procedures/tips/production-readiness.rst: add ICS section some relevant commits throughout the ICS history: commit 434b97699b39c570d0d849d372bf64f418e5c692 Merge: 105586f747 30250749b8 Author: Paweł Dziepak <pdziepak@scylladb.com> Date: Tue Mar 12 12:14:23 2019 +0000 Merge "Introduce Incremental Compaction Strategy (ICS)" from Raphael " Introduce new compaction strategy which is essentially like size tiered but will work with the existing incremental compaction. Thus incremental compaction strategy. It works like size tiered, but each element composing a tier is a sstable run, meaning that the compaction strategy will look for N similar-sized sstable runs to compact, not just individual sstables. Parameters: * "sstable_size_in_mb": defines the maximum sstable (fragment) size composing a sstable run, which impacts directly the disk space requirement which is improved with incremental compaction. The lower the value the lower the space requirement for compaction because fragments involved will be released more frequently. * all others available in size tiered compaction strategy HOWTO ===== To change an existing table to use it, do: ALTER TABLE mykeyspace.mytable WITH compaction = {'class' : 'IncrementalCompactionStrategy'}; Set fragment size: ALTER TABLE mykeyspace.mytable WITH compaction = {'class' : 'IncrementalCompactionStrategy', 'sstable_size_in_mb' : 1000 } " commit 94ef3cd29a196bedbbeb8707e20fe78a197f30a1 Merge: dca89ce7a5 e08ef3e1a3 Author: Avi Kivity <avi@scylladb.com> Date: Tue Sep 8 11:31:52 2020 +0300 Merge "Add feature to limit space amplification in Incremental Compaction" from Raphael " A new option, space_amplification_goal (SAG), is being added to ICS. This option will allow ICS user to set a goal on the space amplification (SA). It's not supposed to be an upper bound on the space amplification, but rather, a goal. This new option will be disabled by default as it doesn't benefit write-only (no overwrites) workloads and could hurt severely the write performance. The strategy is free to delay triggering this new behavior, in order to increase overall compaction efficiency. The graph below shows how this feature works in practice for different values of space_amplification_goal: https://user-images.githubusercontent.com/1409139/89347544-60b7b980-d681-11ea-87ab-e2fdc3ecb9f0.png When strategy finds space amplification crossed space_amplification_goal, it will work on reducing the SA by doing a cross-tier compaction on the two largest tiers. This feature works only on the two largest tiers, because taking into account others, could hurt the compaction efficiency which is based on the fact that the more similar-sized sstables are compacted together the higher the compaction efficiency will be. With SAG enabled, min_threshold only plays an important role on the smallest tiers, given that the second-largest tier could be compacted into the largest tier for a space_amplification_goal value < 2. By making the options space_amplification_goal and min_threshold independent, user will be able to tune write amplification and space amplification, based on the needs. The lower the space_amplification_goal the higher the write amplification, but by increasing the min threshold, the write amplification can be decreased to a desired amount. " commit 7d90911c5fb3fa891ad64a62147c3a6ca26d61b1 Author: Raphael S. Carvalho <raphaelsc@scylladb.com> Date: Sat Oct 16 13:41:46 2021 -0300 compaction: ICS: Add garbage collection Today, ICS lacks an approach to persist expired tombstones in a timely manner, which is a problem because accumulation of tombstones are known to affecting latency considerably. For an expired tombstone to be purged, it has to reach the top of the LSM tree and hope that older overlapping data wasn't introduced at the bottom. The condition are there and must be satisfied to avoid data resurrection. STCS, today, has an inefficient garbage collection approach because it only picks a single sstable, which satisfies the tombstone density threshold and file staleness. That's a problem because overlapping data either on same tier or smaller tiers will prevent tombstones from being purged. Also, nothing is done to push the tombstones to the top of the tree, for the conditions to be eventually satisfied. Due to incremental compaction, ICS can more easily have an effecient GC by doing cross-tier compaction of relevant tiers. The trigger will be file staleness and tombstone density, which threshold values can be configured by tombstone_compaction_interval and tombstone_threshold, respectively. If ICS finds a tier which meets both conditions, then that tier and the larger[1] *and* closest-in-size[2] tier will be compacted together. [1]: A larger tier is picked because we want tombstones to eventually reach the top of the tree. [2]: It also has to be the closest-in-size tier as the smaller the size difference the higher the efficiency of the compaction. We want to minimize write amplification as much as possible. The staleness condition is there to prevent the same file from being picked over and over again in a short interval. With this approach, ICS will be continuously working to purge garbage while not hurting overall efficiency on a steady state, as same-tier compactions are prioritized. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20211016164146.38010-1-raphaelsc@scylladb.com> Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#22063
136 lines
5.4 KiB
C++
136 lines
5.4 KiB
C++
/*
|
|
* Copyright (C) 2015-present ScyllaDB
|
|
*/
|
|
|
|
/*
|
|
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
|
|
*/
|
|
|
|
#pragma once
|
|
|
|
#include "schema/schema_fwd.hh"
|
|
#include "sstables/shared_sstable.hh"
|
|
#include "exceptions/exceptions.hh"
|
|
#include "compaction_strategy_type.hh"
|
|
#include "table_state.hh"
|
|
#include "strategy_control.hh"
|
|
|
|
struct mutation_source_metadata;
|
|
class compaction_backlog_tracker;
|
|
extern logging::logger compaction_strategy_logger;
|
|
|
|
using namespace compaction;
|
|
|
|
namespace sstables {
|
|
|
|
class compaction_strategy_impl;
|
|
class sstable;
|
|
class sstable_set;
|
|
struct compaction_descriptor;
|
|
class storage;
|
|
|
|
class compaction_strategy {
|
|
::shared_ptr<compaction_strategy_impl> _compaction_strategy_impl;
|
|
public:
|
|
compaction_strategy(::shared_ptr<compaction_strategy_impl> impl);
|
|
|
|
compaction_strategy();
|
|
~compaction_strategy();
|
|
compaction_strategy(const compaction_strategy&);
|
|
compaction_strategy(compaction_strategy&&);
|
|
compaction_strategy& operator=(compaction_strategy&&);
|
|
|
|
// Return a list of sstables to be compacted after applying the strategy.
|
|
compaction_descriptor get_sstables_for_compaction(table_state& table_s, strategy_control& control);
|
|
|
|
compaction_descriptor get_major_compaction_job(table_state& table_s, std::vector<shared_sstable> candidates);
|
|
|
|
std::vector<compaction_descriptor> get_cleanup_compaction_jobs(table_state& table_s, std::vector<shared_sstable> candidates) const;
|
|
|
|
// Some strategies may look at the compacted and resulting sstables to
|
|
// get some useful information for subsequent compactions.
|
|
void notify_completion(table_state& table_s, const std::vector<shared_sstable>& removed, const std::vector<shared_sstable>& added);
|
|
|
|
// Return if parallel compaction is allowed by strategy.
|
|
bool parallel_compaction() const;
|
|
|
|
// Return if optimization to rule out sstables based on clustering key filter should be applied.
|
|
bool use_clustering_key_filter() const;
|
|
|
|
// An estimation of number of compaction for strategy to be satisfied.
|
|
int64_t estimated_pending_compactions(table_state& table_s) const;
|
|
|
|
static sstring name(compaction_strategy_type type) {
|
|
switch (type) {
|
|
case compaction_strategy_type::null:
|
|
return "NullCompactionStrategy";
|
|
case compaction_strategy_type::size_tiered:
|
|
return "SizeTieredCompactionStrategy";
|
|
case compaction_strategy_type::leveled:
|
|
return "LeveledCompactionStrategy";
|
|
case compaction_strategy_type::time_window:
|
|
return "TimeWindowCompactionStrategy";
|
|
case compaction_strategy_type::incremental:
|
|
return "IncrementalCompactionStrategy";
|
|
default:
|
|
throw std::runtime_error("Invalid Compaction Strategy");
|
|
}
|
|
}
|
|
|
|
static compaction_strategy_type type(const sstring& name) {
|
|
auto pos = name.find("org.apache.cassandra.db.compaction.");
|
|
sstring short_name = (pos == sstring::npos) ? name : name.substr(pos + 35);
|
|
if (short_name == "NullCompactionStrategy") {
|
|
return compaction_strategy_type::null;
|
|
} else if (short_name == "SizeTieredCompactionStrategy") {
|
|
return compaction_strategy_type::size_tiered;
|
|
} else if (short_name == "LeveledCompactionStrategy") {
|
|
return compaction_strategy_type::leveled;
|
|
} else if (short_name == "TimeWindowCompactionStrategy") {
|
|
return compaction_strategy_type::time_window;
|
|
} else if (short_name == "IncrementalCompactionStrategy") {
|
|
return compaction_strategy_type::incremental;
|
|
} else {
|
|
throw exceptions::configuration_exception(format("Unable to find compaction strategy class '{}'", name));
|
|
}
|
|
}
|
|
|
|
compaction_strategy_type type() const;
|
|
|
|
sstring name() const {
|
|
return name(type());
|
|
}
|
|
|
|
sstable_set make_sstable_set(schema_ptr schema) const;
|
|
|
|
compaction_backlog_tracker make_backlog_tracker() const;
|
|
|
|
uint64_t adjust_partition_estimate(const mutation_source_metadata& ms_meta, uint64_t partition_estimate, schema_ptr) const;
|
|
|
|
reader_consumer_v2 make_interposer_consumer(const mutation_source_metadata& ms_meta, reader_consumer_v2 end_consumer) const;
|
|
|
|
// Returns whether or not interposer consumer is used by a given strategy.
|
|
bool use_interposer_consumer() const;
|
|
|
|
// Informs the caller (usually the compaction manager) about what would it take for this set of
|
|
// SSTables closer to becoming in-strategy. If this returns an empty compaction descriptor, this
|
|
// means that the sstable set is already in-strategy.
|
|
//
|
|
// The caller can specify one of two modes: strict or relaxed. In relaxed mode the tolerance for
|
|
// what is considered offstrategy is higher. It can be used, for instance, for when the system
|
|
// is restarting and previous compactions were likely in-flight. In strict mode, we are less
|
|
// tolerant to invariant breakages.
|
|
//
|
|
// The caller should also pass a maximum number of SSTables which is the maximum amount of
|
|
// SSTables that can be added into a single job.
|
|
compaction_descriptor get_reshaping_job(std::vector<shared_sstable> input, schema_ptr schema, reshape_config cfg) const;
|
|
|
|
};
|
|
|
|
// Creates a compaction_strategy object from one of the strategies available.
|
|
compaction_strategy make_compaction_strategy(compaction_strategy_type strategy, const std::map<sstring, sstring>& options);
|
|
|
|
future<reshape_config> make_reshape_config(const sstables::storage& storage, reshape_mode mode);
|
|
|
|
}
|