scylladb/data_dictionary/data_dictionary.hh
Benny Halevy c5668d99c9 schema: add per-table tablet options
Unlike with vnodes, each tablet is served by only a single
shard, and it is associated with a memtable that, when
flushed, creates sstables whose token range is confined
to the tablet that owns them.

On one hand, this allows for far better agility and elasticity,
since migrating tablets between nodes or shards does not
require rewriting most, if not all, of the sstables, as is
required with vnodes (during the cleanup phase).

On the other hand, having too few tablets might limit performance,
either because the table is not served by all shards or because of
imbalance between shards caused by quantization. The number of
tablets per table has to be a power of 2 in the current design, and
when the tablets are divided among the shards, some shards will
serve N tablets while others may serve N+1. When N is small,
(N+1)/N may be significantly larger than 1. For example, with N=1,
some shards will serve 2 tablet replicas and some will serve only
1, causing an imbalance of 100%.

Now, simply allocating many more tablets for each table may
theoretically address this problem, but in practice:
a. Each tablet has memory overhead, and having too many tablets
in a system with many tables, each with many tablets,
may overwhelm the system's memory and cause out-of-memory errors.
b. Too-small tablets cause a proliferation of small sstables
that are less efficient to access, have higher metadata overhead
(due to per-sstable overhead), and might exhaust the system's
open file-descriptor limit.

The options introduced in this change can help the user tune
the system in two ways:
1. Sizing the table to prevent unnecessary tablet splits
and migrations. This can be done when the table is created,
or later on, using ALTER TABLE.
2. Controlling min_per_shard_tablet_count to improve
tablet balancing for hot tables.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-02-06 08:55:51 +02:00


/*
* Copyright (C) 2021-present ScyllaDB
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#pragma once
#include <optional>
#include <set>
#include <string_view>
#include <vector>
#include <seastar/core/shared_ptr.hh>
#include "seastarx.hh"
#include "schema/schema_fwd.hh"
namespace replica {
class database; // For transition; remove
}
class schema;
using schema_ptr = lw_shared_ptr<const schema>;
class view_ptr;
namespace db {
class config;
class extensions;
}
namespace secondary_index {
class secondary_index_manager;
}
namespace gms {
class feature_service;
}
namespace locator {
class abstract_replication_strategy;
}
// Classes representing the overall schema, but without access to data.
// Useful on the coordinator side (where access to data is via storage_proxy).
//
// Everything here is type-erased to reduce dependencies. No references are
// kept, so lower-level objects like keyspaces and tables must not be held
// across continuations.
namespace data_dictionary {
// Used to forward all operations to the underlying backing store.
class impl;
class user_types_metadata;
class keyspace_metadata;
class no_such_keyspace : public std::runtime_error {
public:
    no_such_keyspace(std::string_view ks_name);
};
class no_such_column_family : public std::runtime_error {
public:
    no_such_column_family(const table_id& uuid);
    no_such_column_family(std::string_view ks_name, std::string_view cf_name);
    no_such_column_family(std::string_view ks_name, const table_id& uuid);
};
class table {
    const impl* _ops;
    const void* _table;
private:
    friend class impl;
    table(const impl* ops, const void* table);
public:
    schema_ptr schema() const;
    const std::vector<view_ptr>& views() const;
    const secondary_index::secondary_index_manager& get_index_manager() const;
};
class keyspace {
    const impl* _ops;
    const void* _keyspace;
private:
    friend class impl;
    keyspace(const impl* ops, const void* keyspace);
public:
    bool is_internal() const;
    bool uses_tablets() const;
    lw_shared_ptr<keyspace_metadata> metadata() const;
    const user_types_metadata& user_types() const;
    const locator::abstract_replication_strategy& get_replication_strategy() const;
};
class database {
    const impl* _ops;
    const void* _database;
private:
    friend class impl;
    database(const impl* ops, const void* database);
public:
    const table_schema_version& get_version() const;
    keyspace find_keyspace(std::string_view name) const;
    std::optional<keyspace> try_find_keyspace(std::string_view name) const;
    bool has_keyspace(std::string_view name) const; // throws no_keyspace
    std::vector<keyspace> get_keyspaces() const;
    std::vector<sstring> get_user_keyspaces() const;
    std::vector<sstring> get_all_keyspaces() const;
    std::vector<table> get_tables() const;
    table find_table(std::string_view ks, std::string_view table) const; // throws no_such_column_family
    table find_column_family(table_id uuid) const; // throws no_such_column_family
    schema_ptr find_schema(std::string_view ks, std::string_view table) const; // throws no_such_column_family
    schema_ptr find_schema(table_id uuid) const; // throws no_such_column_family
    table find_column_family(schema_ptr s) const;
    bool has_schema(std::string_view ks_name, std::string_view cf_name) const;
    std::optional<table> try_find_table(std::string_view ks, std::string_view table) const;
    std::optional<table> try_find_table(table_id id) const;
    const db::config& get_config() const;
    std::set<sstring> existing_index_names(std::string_view ks_name, std::string_view cf_to_exclude = sstring()) const;
    schema_ptr find_indexed_table(std::string_view ks_name, std::string_view index_name) const;
    sstring get_available_index_name(std::string_view ks_name, std::string_view table_name,
            std::optional<sstring> index_name_root) const;
    schema_ptr get_cdc_base_table(std::string_view ks_name, std::string_view table_name) const;
    schema_ptr get_cdc_base_table(const schema&) const;
    const db::extensions& extensions() const;
    const gms::feature_service& features() const;
    replica::database& real_database() const; // For transition; remove
    replica::database* real_database_ptr() const;
};
}