scylladb/data_dictionary/keyspace_metadata.hh
Benny Halevy c5668d99c9 schema: add per-table tablet options
Unlike with vnodes, each tablet is served by only a single
shard, and it is associated with a memtable that, when
flushed, creates sstables whose token range is confined
to the tablet owning them.

On one hand, this allows for far better agility and elasticity
since migration of tablets between nodes or shards does not
require rewriting most if not all of the sstables, as required
with vnodes (at the cleanup phase).

On the other hand, having too few tablets might limit performance
due to not being served by all shards, or due to imbalance between
shards caused by quantization.  The number of tablets per table has
to be a power of 2 with the current design, and when divided by the
number of shards, some shards will serve N tablets, while others
may serve N+1, and when N is small (N+1)/N may be significantly
larger than 1. For example, with N=1, some shards will serve
2 tablet replicas and some will serve only 1, causing an imbalance
of 100%.
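
The quantization arithmetic above can be sketched as a small standalone
function. This is only an illustration: `quantization_imbalance` is a
hypothetical helper, not part of ScyllaDB, and it assumes that replicas are
spread as evenly as possible, so that every shard serves either N or N+1.

```cpp
#include <limits>

// Hypothetical helper, not part of ScyllaDB: relative load difference between
// the busiest and the least busy shard when `replicas` tablet replicas are
// spread as evenly as possible over `shards` shards (shards must be > 0).
constexpr double quantization_imbalance(unsigned replicas, unsigned shards) {
    const unsigned n = replicas / shards;                        // least busy shard serves N
    const unsigned max = n + (replicas % shards != 0 ? 1u : 0u); // busiest serves N or N+1
    if (n == 0) {
        // Some shards serve nothing, so the ratio is unbounded
        // (or 0 when there are no replicas at all).
        return max == 0 ? 0.0 : std::numeric_limits<double>::infinity();
    }
    return static_cast<double>(max - n) / n;                     // (N+1)/N - 1, or 0
}
```

With 24 replicas on 16 shards the function returns 1.0, which is the 100%
imbalance from the example above; with 100 replicas on the same 16 shards
N is 6 and the imbalance drops to about 17%.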

Now, simply allocating a lot more tablets for each table may
theoretically address this problem, but practically:
a. Each tablet has memory overhead, and having too many tablets
in a system with many tables and many tablets for each of them
may overwhelm the system and cause out-of-memory errors.
b. Too-small tablets cause a proliferation of small sstables
that are less efficient to access, have higher metadata overhead
(due to per-sstable overhead), and might exhaust the system's
open file descriptor limit.

The options introduced in this change can help the user tune
the system in two ways:
1. Sizing the table to prevent unnecessary tablet splits
and migrations.  This can be done when the table is created,
or later on, using ALTER TABLE.
2. Controlling min_per_shard_tablet_count to improve
tablet balancing, for hot tables.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-06 08:55:51 +02:00


/*
 * Copyright (C) 2021-present ScyllaDB
 */

/*
 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
 */
#pragma once
#include <optional>
#include <string_view>
#include <unordered_map>
#include <vector>
#include <seastar/core/sstring.hh>
#include "cql3/description.hh"
#include "schema/schema.hh"
#include "locator/abstract_replication_strategy.hh"
#include "data_dictionary/user_types_metadata.hh"
#include "data_dictionary/storage_options.hh"
namespace gms {
class feature_service;
}
namespace data_dictionary {
class keyspace_metadata final {
    sstring _name;
    sstring _strategy_name;
    locator::replication_strategy_config_options _strategy_options;
    std::optional<unsigned> _initial_tablets;
    std::unordered_map<sstring, schema_ptr> _cf_meta_data;
    bool _durable_writes;
    user_types_metadata _user_types;
    lw_shared_ptr<const storage_options> _storage_options;
public:
    keyspace_metadata(std::string_view name,
                 std::string_view strategy_name,
                 locator::replication_strategy_config_options strategy_options,
                 std::optional<unsigned> initial_tablets,
                 bool durable_writes,
                 std::vector<schema_ptr> cf_defs = std::vector<schema_ptr>{},
                 user_types_metadata user_types = user_types_metadata{},
                 storage_options storage_opts = storage_options{});
    static lw_shared_ptr<keyspace_metadata>
    new_keyspace(std::string_view name,
                 std::string_view strategy_name,
                 locator::replication_strategy_config_options options,
                 std::optional<unsigned> initial_tablets,
                 bool durables_writes = true,
                 storage_options storage_opts = {},
                 std::vector<schema_ptr> cf_defs = {});
    static lw_shared_ptr<keyspace_metadata>
    new_keyspace(const keyspace_metadata& ksm);
    void validate(const gms::feature_service&, const locator::topology&) const;
    const sstring& name() const {
        return _name;
    }
    const sstring& strategy_name() const {
        return _strategy_name;
    }
    const locator::replication_strategy_config_options& strategy_options() const {
        return _strategy_options;
    }
    std::optional<unsigned> initial_tablets() const {
        return _initial_tablets;
    }
    bool uses_tablets() const noexcept {
        return _initial_tablets.has_value();
    }
    const std::unordered_map<sstring, schema_ptr>& cf_meta_data() const {
        return _cf_meta_data;
    }
    bool durable_writes() const {
        return _durable_writes;
    }
    user_types_metadata& user_types() {
        return _user_types;
    }
    const user_types_metadata& user_types() const {
        return _user_types;
    }
    const storage_options& get_storage_options() const {
        return *_storage_options;
    }
    lw_shared_ptr<const storage_options> get_storage_options_ptr() {
        return _storage_options;
    }
    void add_or_update_column_family(const schema_ptr& s) {
        _cf_meta_data[s->cf_name()] = s;
    }
    void remove_column_family(const schema_ptr& s) {
        _cf_meta_data.erase(s->cf_name());
    }
    void add_user_type(const user_type ut);
    void remove_user_type(const user_type ut);
    std::vector<schema_ptr> tables() const;
    std::vector<view_ptr> views() const;
    cql3::description describe(const replica::database& db, cql3::with_create_statement) const;
};
}
template <>
struct fmt::formatter<data_dictionary::keyspace_metadata> {
    constexpr auto parse(format_parse_context& ctx) { return ctx.begin(); }
    auto format(const data_dictionary::keyspace_metadata& ksm, fmt::format_context& ctx) const -> decltype(ctx.out());
};
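
The header derives `uses_tablets()` purely from whether `_initial_tablets`
holds a value. The following standalone sketch illustrates that behaviour
without any ScyllaDB or Seastar headers. `keyspace_metadata_sketch` is a
hypothetical, heavily reduced stand-in for the real class and keeps only the
`initial_tablets` member.

```cpp
#include <optional>

// Hypothetical stand-in for data_dictionary::keyspace_metadata, reduced to
// the single member that decides whether the keyspace uses tablets.
class keyspace_metadata_sketch {
    std::optional<unsigned> _initial_tablets;
public:
    constexpr explicit keyspace_metadata_sketch(std::optional<unsigned> initial_tablets)
        : _initial_tablets(initial_tablets) {}

    constexpr std::optional<unsigned> initial_tablets() const {
        return _initial_tablets;
    }

    // Same test as in the header: presence of a value, not its magnitude.
    constexpr bool uses_tablets() const noexcept {
        return _initial_tablets.has_value();
    }
};
```

Under this model a keyspace constructed with `std::nullopt` reports
`uses_tablets() == false`, while one constructed with any engaged value,
including 0, reports `true`, because only the presence of the value is
checked.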