scylladb/compaction/compaction_descriptor.hh
Botond Dénes 81e214237f Merge 'Add digests for all sstable components in scylla metadata' from Taras Veretilnyk
This pull request adds support for calculating and storing CRC32 digests for all SSTable components.
This change replaces the plain file_writer with crc32_digest_file_writer for all SSTable components that should be checksummed. The resulting component digests are stored in the sstable structure
and later persisted to disk as part of the Scylla metadata component during writer::consume_end_of_stream.
Several test cases were introduced to verify the expected behaviour.
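
To illustrate the idea of computing a digest while writing, here is a minimal standalone sketch. It is not the crc32_digest_file_writer from this PR: `crc32_update` and `digest_tracking_writer` are hypothetical names, the sink is a plain `std::ostream`, and the CRC is the standard reflected CRC-32 (polynomial 0xEDB88320) computed bit by bit for clarity rather than speed.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string_view>

// Standard reflected CRC-32 (polynomial 0xEDB88320), bitwise for clarity.
// Takes and returns a finalized CRC, so calls can be chained chunk by chunk.
inline uint32_t crc32_update(uint32_t crc, std::string_view data) {
    crc = ~crc;
    for (unsigned char byte : data) {
        crc ^= byte;
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right; XOR in the polynomial when the low bit was set.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

// Hypothetical wrapper: forwards every write to the underlying stream and
// folds the same bytes into a running digest.
class digest_tracking_writer {
    std::ostream& _out;
    uint32_t _digest = 0;
public:
    explicit digest_tracking_writer(std::ostream& out) : _out(out) {}

    void write(std::string_view data) {
        _out.write(data.data(), static_cast<std::streamsize>(data.size()));
        _digest = crc32_update(_digest, data);
    }

    // Digest of everything written so far; a caller could store it
    // alongside the component once writing finishes.
    uint32_t digest() const { return _digest; }
};
```

Because the digest is updated on each write, the final value does not depend on how the data was split into chunks.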

Additionally, this PR adds a new component-rewrite mechanism for safely rewriting sstable components.
Previously, rewriting an sstable component (e.g., via rewrite_statistics) created a temporary file that was renamed to the final name after sealing. This allowed crash recovery by simply removing the temporary file on startup.

However, with component digests stored in scylla_metadata (#20100),
replacing a component like Statistics requires atomically updating both the component
and scylla_metadata with the new digest. This is impossible with POSIX rename, which replaces only one path at a time.
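
For context, the previous single-file approach can be sketched roughly as follows. This is a simplified illustration using std::filesystem, not the actual rewrite_statistics code: the `.tmp` suffix and both function names are assumptions, and a real implementation would also flush the file and directory to disk.

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Write the new contents to a temporary sibling file, then rename it over
// the final name. The rename replaces exactly one path.
inline void rewrite_via_rename(const fs::path& final_path, const std::string& contents) {
    fs::path tmp = final_path;
    tmp += ".tmp"; // hypothetical temporary suffix
    {
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        out << contents;
    } // stream closed here
    fs::rename(tmp, final_path);
}

// Startup recovery: any leftover temporary file belongs to an interrupted
// rewrite and can simply be removed.
inline std::size_t remove_leftover_temporaries(const fs::path& dir) {
    std::vector<fs::path> leftovers;
    for (const auto& entry : fs::directory_iterator(dir)) {
        if (entry.path().extension() == ".tmp") {
            leftovers.push_back(entry.path());
        }
    }
    for (const auto& p : leftovers) {
        fs::remove(p);
    }
    return leftovers.size();
}
```

Calling `rewrite_via_rename` twice, once for the component and once for the metadata, leaves a window in which a crash produces a component whose digest no longer matches the stored one.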

The new mechanism creates a clone sstable with a fresh generation:

- Hard-links all components from the source except the component being rewritten and scylla_metadata
- Copies the original sstable's components pointer and recognized components from the source
- Invokes a modifier callback to adjust the new sstable before rewriting
- Writes the modified component along with updated scylla_metadata containing the new digest
- Seals the new sstable with a temporary TOC
- Replaces the old sstable atomically, the same way as it is done in compaction

This is built on the rewrite_sstables compaction framework to support batch operations (e.g., following incremental repair).
If any step of the process fails, the new sstable will be automatically deleted on node startup, because its
temporary TOC is still present on disk.
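
The clone steps listed above can be sketched with plain files. This is a toy model under stated assumptions, not the code in this PR: the `<generation>-<Component>` naming, the `TOC.txt.tmp` marker name, the ordering of the steps, and all function names are hypothetical, and the final replacement of the old sstable is left out.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical naming scheme: "<generation>-<Component>", e.g. "1-Data.db".
inline fs::path component_path(const fs::path& dir, int generation, const std::string& component) {
    return dir / (std::to_string(generation) + "-" + component);
}

inline void write_file(const fs::path& p, const std::string& contents) {
    std::ofstream out(p, std::ios::binary | std::ios::trunc);
    out << contents;
}

inline void clone_with_rewritten_component(const fs::path& dir,
                                           const std::vector<std::string>& components,
                                           int old_gen, int new_gen,
                                           const std::string& rewritten,
                                           const std::string& new_contents,
                                           const std::string& new_metadata) {
    const std::string metadata = "Scylla.db";
    const fs::path tmp_toc = component_path(dir, new_gen, "TOC.txt.tmp");

    // Write the temporary TOC first (an assumption of this sketch), so an
    // interrupted clone always leaves a marker that startup can act on.
    std::string toc_listing;
    for (const auto& c : components) {
        toc_listing += c + "\n";
    }
    write_file(tmp_toc, toc_listing);

    // Hard-link every component that is not being changed.
    for (const auto& c : components) {
        if (c == rewritten || c == metadata) {
            continue;
        }
        fs::create_hard_link(component_path(dir, old_gen, c), component_path(dir, new_gen, c));
    }

    // Write the modified component and the metadata carrying its new digest.
    write_file(component_path(dir, new_gen, rewritten), new_contents);
    write_file(component_path(dir, new_gen, metadata), new_metadata);

    // Seal: only now does the clone become a complete sstable.
    fs::rename(tmp_toc, component_path(dir, new_gen, "TOC.txt"));
}
```

Until the final rename, the clone is identifiable as incomplete by its temporary TOC, while the source sstable stays untouched and fully usable.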

Backport is not required; this is a new feature.

Fixes https://github.com/scylladb/scylladb/issues/20100, https://github.com/scylladb/scylladb/issues/27453

Closes scylladb/scylladb#28338

* github.com:scylladb/scylladb:
  docs: document components_digests subcomponent and trailing digest in Scylla.db
  sstable_compaction_test: Add tests for perform_component_rewrite
  sstable_test: add verification testcases of SSTable components digests persistance
  sstables: store digest of all sstable components in scylla metadata
  sstables: replace rewrite_statistics with new rewrite component mechanism
  sstables: add new rewrite component mechanism for safe sstable component rewriting
  compaction: add compaction_group_view method to specify sstable version
  sstables: add null_data_sink and serialized_checksum for checksum-only calculation
  sstables: extract default write open flags into a constant
  sstables: Add write_simple_with_digest for component checksumming
  sstables: Extract file writer closing logic into separate methods
  sstables: Implement CRC32 digest-only writer
2026-03-10 16:02:53 +02:00

/*
* Copyright (C) 2020-present ScyllaDB
*
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#pragma once
#include <cstdint>
#include <functional>
#include <limits>
#include <optional>
#include <string_view>
#include <variant>
#include <vector>
#include "sstables/component_type.hh"
#include "sstables/types_fwd.hh"
#include "sstables/sstable_set.hh"
#include "compaction_fwd.hh"
#include "mutation_writer/token_group_based_splitting_writer.hh"
#include "utils/chunked_vector.hh"
namespace compaction {
enum class compaction_type {
Compaction = 0, // Used only for regular compactions
Cleanup = 1,
Validation = 2, // Origin uses this for a compaction that is used exclusively for repair
Scrub = 3,
Index_build = 4,
Reshard = 5,
Upgrade = 6,
Reshape = 7,
Split = 8,
Major = 9,
RewriteComponent = 10,
};
struct compaction_completion_desc {
// Old, existing SSTables that should be deleted and removed from the SSTable set.
std::vector<sstables::shared_sstable> old_sstables;
// New, fresh SSTables that should be added to SSTable set, replacing the old ones.
std::vector<sstables::shared_sstable> new_sstables;
// Set of compacted partition ranges that should be invalidated in the cache.
utils::chunked_vector<dht::partition_range> ranges_for_cache_invalidation;
};
// creates a new SSTable for a given shard
using compaction_sstable_creator_fn = std::function<sstables::shared_sstable(shard_id shard)>;
// Replaces old sstable(s) by new one(s) which contain all non-expired data.
using compaction_sstable_replacer_fn = std::function<void(compaction_completion_desc)>;
class compaction_type_options {
public:
struct regular {
};
struct major {
};
struct cleanup {
};
struct upgrade {
};
struct scrub {
enum class mode {
abort, // abort scrub on the first sign of corruption
skip, // skip corrupt data, including range of rows and/or partitions that are out-of-order
segregate, // segregate out-of-order data into streams that all contain data with correct order
validate, // validate data, printing all errors found (sstables are only read, not rewritten)
};
mode operation_mode = mode::abort;
enum class quarantine_mode {
include, // scrub all sstables, including quarantined
exclude, // scrub only non-quarantined sstables
only, // scrub only quarantined sstables
};
quarantine_mode quarantine_operation_mode = quarantine_mode::include;
using quarantine_invalid_sstables = bool_class<class quarantine_invalid_sstables_tag>;
// Should invalid sstables be moved into quarantine.
// Only applies to validate-mode.
quarantine_invalid_sstables quarantine_sstables = quarantine_invalid_sstables::yes;
using drop_unfixable_sstables = bool_class<class drop_unfixable_sstables_tag>;
// Drop sstables that cannot be fixed.
// Only applies to segregate-mode.
drop_unfixable_sstables drop_unfixable = drop_unfixable_sstables::no;
};
struct reshard {
};
struct reshape {
};
struct split {
mutation_writer::classify_by_token_group classifier;
};
struct component_rewrite {
sstables::component_type component_to_rewrite;
std::function<void(sstables::sstable&)> modifier;
using update_sstable_id = bool_class<class update_sstable_id_tag>;
update_sstable_id update_id = update_sstable_id::yes;
};
private:
using options_variant = std::variant<regular, cleanup, upgrade, scrub, reshard, reshape, split, major, component_rewrite>;
private:
options_variant _options;
private:
explicit compaction_type_options(options_variant options) : _options(std::move(options)) {
}
public:
static compaction_type_options make_reshape() {
return compaction_type_options(reshape{});
}
static compaction_type_options make_reshard() {
return compaction_type_options(reshard{});
}
static compaction_type_options make_regular() {
return compaction_type_options(regular{});
}
static compaction_type_options make_major() {
return compaction_type_options(major{});
}
static compaction_type_options make_cleanup() {
return compaction_type_options(cleanup{});
}
static compaction_type_options make_upgrade() {
return compaction_type_options(upgrade{});
}
static compaction_type_options make_scrub(scrub::mode mode, scrub::quarantine_invalid_sstables quarantine_sstables = scrub::quarantine_invalid_sstables::yes, scrub::drop_unfixable_sstables drop_unfixable_sstables = scrub::drop_unfixable_sstables::no) {
return compaction_type_options(scrub{.operation_mode = mode, .quarantine_sstables = quarantine_sstables, .drop_unfixable = drop_unfixable_sstables});
}
static compaction_type_options make_component_rewrite(component_type component, std::function<void(sstables::sstable&)> modifier, component_rewrite::update_sstable_id update_id = component_rewrite::update_sstable_id::yes) {
return compaction_type_options(component_rewrite{.component_to_rewrite = component, .modifier = std::move(modifier), .update_id = update_id});
}
static compaction_type_options make_split(mutation_writer::classify_by_token_group classifier) {
return compaction_type_options(split{std::move(classifier)});
}
template <typename... Visitor>
auto visit(Visitor&&... visitor) const {
return std::visit(std::forward<Visitor>(visitor)..., _options);
}
template <typename OptionType>
const auto& as() const {
return std::get<OptionType>(_options);
}
const options_variant& options() const { return _options; }
compaction_type type() const;
};
std::string_view to_string(compaction_type_options::scrub::mode);
std::string_view to_string(compaction_type_options::scrub::quarantine_mode);
class dummy_tag {};
using has_only_fully_expired = seastar::bool_class<dummy_tag>;
struct compaction_descriptor {
// List of sstables to be compacted.
std::vector<sstables::shared_sstable> sstables;
// This is a snapshot of the table's sstable set, used only for the purpose of expiring tombstones.
// If this sstable set cannot be provided, expiration will be disabled to prevent data from being resurrected.
std::optional<sstables::sstable_set> all_sstables_snapshot;
// Level of sstable(s) created by compaction procedure.
int level;
// Threshold size for sstable(s) to be created.
uint64_t max_sstable_bytes;
// Can split large partitions at clustering boundary.
bool can_split_large_partition = false;
// Run identifier of output sstables.
sstables::run_id run_identifier;
// The options passed down to the compaction code.
// This also selects the kind of compaction to do.
compaction_type_options options = compaction_type_options::make_regular();
// If engaged, compaction will cleanup the input sstables by skipping non-owned ranges.
compaction::owned_ranges_ptr owned_ranges;
// Required for reshard compaction.
const dht::sharder* sharder;
compaction_sstable_creator_fn creator;
compaction_sstable_replacer_fn replacer;
// Denotes whether this compaction task consists solely of fully expired SSTables.
has_only_fully_expired has_only_fully_expired = has_only_fully_expired::no;
// If set to true, gc will check only the compacting sstables to collect tombstones.
// If set to false, gc will check the memtables, commit log and other uncompacting
// sstables to decide if a tombstone can be collected. Note that these checks are
// not perfect. With respect to memtables and uncompacted SSTables, if their minimum timestamp
// is less than that of the tombstone and they contain the key, the tombstone will
// not be collected. No row-level or cell-level check takes place. With respect to the commit
// log, there is currently no way to check if the key exists; only the minimum
// timestamp comparison, similar to memtables, is performed.
bool gc_check_only_compacting_sstables = false;
compaction_descriptor() = default;
static constexpr int default_level = 0;
static constexpr uint64_t default_max_sstable_bytes = std::numeric_limits<uint64_t>::max();
explicit compaction_descriptor(std::vector<sstables::shared_sstable> sstables,
int level = default_level,
uint64_t max_sstable_bytes = default_max_sstable_bytes,
sstables::run_id run_identifier = sstables::run_id::create_random_id(),
compaction_type_options options = compaction_type_options::make_regular(),
compaction::owned_ranges_ptr owned_ranges_ = {})
: sstables(std::move(sstables))
, level(level)
, max_sstable_bytes(max_sstable_bytes)
, run_identifier(run_identifier)
, options(options)
, owned_ranges(std::move(owned_ranges_))
{}
explicit compaction_descriptor(::compaction::has_only_fully_expired has_only_fully_expired,
std::vector<sstables::shared_sstable> sstables)
: sstables(std::move(sstables))
, level(default_level)
, max_sstable_bytes(default_max_sstable_bytes)
, run_identifier(sstables::run_id::create_random_id())
, options(compaction_type_options::make_regular())
, has_only_fully_expired(has_only_fully_expired)
{}
// Return fan-in of this job, which is equal to its number of runs.
unsigned fan_in() const;
// Enables garbage collection for this descriptor, meaning that compaction will be able to purge expired data
void enable_garbage_collection(sstables::sstable_set snapshot) { all_sstables_snapshot = std::move(snapshot); }
// Returns total size of all sstables contained in this descriptor
uint64_t sstables_size() const;
};
}
template <>
struct fmt::formatter<compaction::compaction_type> : fmt::formatter<string_view> {
auto format(compaction::compaction_type, fmt::format_context& ctx) const -> decltype(ctx.out());
};
template <>
struct fmt::formatter<compaction::compaction_type_options::scrub::mode> : fmt::formatter<string_view> {
auto format(compaction::compaction_type_options::scrub::mode, fmt::format_context& ctx) const -> decltype(ctx.out());
};
template <>
struct fmt::formatter<compaction::compaction_type_options::scrub::quarantine_mode> : fmt::formatter<string_view> {
auto format(compaction::compaction_type_options::scrub::quarantine_mode, fmt::format_context& ctx) const -> decltype(ctx.out());
};