Files
scylladb/api/system.cc
Michał Chojnowski 9e70df83ab db: get rid of sstables-format-selector
Our sstable format selection logic is weird, and hard to follow.

If I'm not misunderstanding, the pieces are:
1. There's the `sstable_format` config entry, which currently
   doesn't do anything, but in the past it used to disable
   cluster features for versions newer than the specified one.
2. There are deprecated and unused config entries for individual
   versions (`enable_sstables_mc_format`, `enable_sstables_md_format`,
   etc).
3. There is a cluster feature for each version:
   ME_SSTABLE_FORMAT, MD_SSTABLE_FORMAT, etc.
   (Currently all sstable version features have been grandfathered,
   and aren't checked by the code anymore).
4. There's an entry in `system.scylla_local` which contains the
   latest enabled sstable version. (Why? Isn't this directly derived
   from cluster features anyway)?
5. There's `sstable_manager::_format` which contains the
   sstable version to be used for new writes.
   This field is updated by `sstables_format_selector`
   based on cluster features and the `system.scylla_local` entry.

I don't see why those pieces are needed. Version selection has the
following constraints:
1. New sstables must be written with a format that supports existing
   data. For example, range tombstones with an infinite bound are only
   supported by sstables since version "mc". So if a range tombstone
   with an infinite bound exists somewhere in the dataset,
   the format chosen for new sstables has to be at least as new as "mc".
2. A new format might only be used after a corresponding cluster feature
   is enabled. (Otherwise new sstables might become unreadable if they
   are sent to another node, or if a node is downgraded).
3. The user should have a way to inhibit format ugprades if he wishes.

So far, constraint (1) has been fulfilled by never using formats older
than the newest format ever enabled on the node. (With an exception
for resharding and reshaping system tables).
Constraint (2) has been fulfilled by calling `sstable_manager::set_format`
only after the corresponsing cluster feature is enabled.
Constraint (3) has been fulfilled by the ability to inhibit cluster
features by setting `sstable_format` by some fixed value.

The main thing I don't like about this whole setup is that it doesn't
let me downgrade the preferred sstable format. After a format is
enabled, there is no way to go back to writing the old format again.
That is no good -- after I make some performance-sensitive changes
in a new format, it might turn out to be a pessimization for the
particular workload, and I want to be able to go back.

This patch aims to give a way to downgrade formats without violating
the constraints. What it does is:
1. The entry in `system.scylla_local` becomes obsolete.
   After the patch we no longer update or read it.
   As far as I understand, the purpose of this entry is to prevent
   unwanted format downgrades (which is something cluster features
   are designed for) and it's updated if and only if relevant
   cluster features are updated. So there's no reason to have it,
   we can just directly use cluster features.
2. `sstable_format_selector` gets deleted.
   Without the `system.scylla_local` around, it's just a glorified
   feature listener.
3. The format selection logic is moved into `sstable_manager`.
   It already sees the `db::config` and the `gms::feature_service`.
   For the foreseeable future, the knowledge of enabled cluster features
   and current config should be enough information to pick the right formats.
4. The `sstable_format` entry in `db::config` is no longer intended to
   inhibit cluster features. Instead, it is intended to select the
   format for new sstables, and it becomes live-updatable.
5. Instead of writing new sstables with "highest supported" format,
   (which used to be set by `sstables_format_selector`) we write
   them with the "preferred" format, which is determined by
   `sstable_manager` based on the combination of enabled features
   and the current value of `sstable_format`.

Closes scylladb/scylladb#26092

[avi: Pavel found the reason for the scylla_local entry -
      it predates stable storage for cluster features]
2025-09-19 16:17:56 +03:00

197 lines
8.1 KiB
C++

/*
* Copyright (C) 2015-present ScyllaDB
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#include "api/api_init.hh"
#include "api/api-doc/system.json.hh"
#include "api/api-doc/metrics.json.hh"
#include "replica/database.hh"
#include "sstables/sstables_manager.hh"
#include <rapidjson/document.h>
#include <boost/lexical_cast.hpp>
#include <seastar/core/reactor.hh>
#include <seastar/core/metrics_api.hh>
#include <seastar/core/relabel_config.hh>
#include <seastar/http/exception.hh>
#include <seastar/util/short_streams.hh>
#include <seastar/http/short_streams.hh>
#include "utils/log.hh"
extern logging::logger apilog;
namespace api {
using namespace seastar::httpd;
namespace hs = httpd::system_json;
namespace hm = httpd::metrics_json;
extern "C" void __attribute__((weak)) __llvm_profile_dump();
extern "C" const char * __attribute__((weak)) __llvm_profile_get_filename();
extern "C" void __attribute__((weak)) __llvm_profile_reset_counters();
void set_system(http_context& ctx, routes& r) {
hm::get_metrics_config.set(r, [](const_req req) {
std::vector<hm::metrics_config> res;
res.resize(seastar::metrics::get_relabel_configs().size());
size_t i = 0;
for (auto&& r : seastar::metrics::get_relabel_configs()) {
res[i].action = r.action;
res[i].target_label = r.target_label;
res[i].replacement = r.replacement;
res[i].separator = r.separator;
res[i].source_labels = r.source_labels;
res[i].regex = r.expr.str();
i++;
}
return res;
});
hm::set_metrics_config.set(r, [](std::unique_ptr<http::request> req) -> future<json::json_return_type> {
rapidjson::Document doc;
doc.Parse(req->content.c_str());
if (!doc.IsArray()) {
throw bad_param_exception("Expected a json array");
}
std::vector<seastar::metrics::relabel_config> relabels;
relabels.resize(doc.Size());
for (rapidjson::SizeType i = 0; i < doc.Size(); i++) {
const auto& element = doc[i];
if (element.HasMember("source_labels")) {
std::vector<std::string> source_labels;
source_labels.resize(element["source_labels"].Size());
for (size_t j = 0; j < element["source_labels"].Size(); j++) {
source_labels[j] = element["source_labels"][j].GetString();
}
relabels[i].source_labels = source_labels;
}
if (element.HasMember("action")) {
relabels[i].action = seastar::metrics::relabel_config_action(element["action"].GetString());
}
if (element.HasMember("replacement")) {
relabels[i].replacement = element["replacement"].GetString();
}
if (element.HasMember("separator")) {
relabels[i].separator = element["separator"].GetString();
}
if (element.HasMember("target_label")) {
relabels[i].target_label = element["target_label"].GetString();
}
if (element.HasMember("regex")) {
relabels[i].expr = element["regex"].GetString();
}
}
return do_with(std::move(relabels), false, [](const std::vector<seastar::metrics::relabel_config>& relabels, bool& failed) {
return smp::invoke_on_all([&relabels, &failed] {
return metrics::set_relabel_configs(relabels).then([&failed](const metrics::metric_relabeling_result& result) {
if (result.metrics_relabeled_due_to_collision > 0) {
failed = true;
}
return;
});
}).then([&failed](){
if (failed) {
throw bad_param_exception("conflicts found during relabeling");
}
return make_ready_future<json::json_return_type>(seastar::json::json_void());
});
});
});
hs::get_system_uptime.set(r, [](const_req req) {
return std::chrono::duration_cast<std::chrono::milliseconds>(engine().uptime()).count();
});
hs::get_all_logger_names.set(r, [](const_req req) {
return logging::logger_registry().get_all_logger_names();
});
hs::set_all_logger_level.set(r, [](const_req req) {
try {
logging::log_level level = boost::lexical_cast<logging::log_level>(std::string(req.get_query_param("level")));
logging::logger_registry().set_all_loggers_level(level);
} catch (boost::bad_lexical_cast& e) {
throw bad_param_exception("Unknown logging level " + req.get_query_param("level"));
}
return json::json_void();
});
hs::get_logger_level.set(r, [](const_req req) {
try {
return logging::level_name(logging::logger_registry().get_logger_level(req.get_path_param("name")));
} catch (std::out_of_range& e) {
throw bad_param_exception("Unknown logger name " + req.get_path_param("name"));
}
// just to keep the compiler happy
return sstring();
});
hs::set_logger_level.set(r, [](const_req req) {
try {
logging::log_level level = boost::lexical_cast<logging::log_level>(std::string(req.get_query_param("level")));
logging::logger_registry().set_logger_level(req.get_path_param("name"), level);
} catch (std::out_of_range& e) {
throw bad_param_exception("Unknown logger name " + req.get_path_param("name"));
} catch (boost::bad_lexical_cast& e) {
throw bad_param_exception("Unknown logging level " + req.get_query_param("level"));
}
return json::json_void();
});
hs::write_log_message.set(r, [](const_req req) {
try {
logging::log_level level = boost::lexical_cast<logging::log_level>(std::string(req.get_query_param("level")));
apilog.log(level, "/system/log: {}", std::string(req.get_query_param("message")));
} catch (boost::bad_lexical_cast& e) {
throw bad_param_exception("Unknown logging level " + req.get_query_param("level"));
}
return json::json_void();
});
hs::drop_sstable_caches.set(r, [&ctx](std::unique_ptr<request> req) {
apilog.info("Dropping sstable caches");
return ctx.db.invoke_on_all([] (replica::database& db) {
return db.drop_caches();
}).then([] {
apilog.info("Caches dropped");
return json::json_return_type(json::json_void());
});
});
hs::dump_profile.set(r, [](std::unique_ptr<request> req) {
if (!__llvm_profile_dump) {
apilog.info("Profile will not be dumped, executable is not instrumented with profile dumping.");
return make_ready_future<json::json_return_type>(json::json_return_type(json::json_void()));
}
sstring profile_dest(__llvm_profile_get_filename ? __llvm_profile_get_filename() : "disk");
apilog.info("Dumping profile to {}", profile_dest);
__llvm_profile_dump();
if (__llvm_profile_reset_counters) {
// If counters are not reset the profile dumping mechanism will issue a warning and exit
// next time it is attempted. If the counters are reset, profiles can be accumulated
// (if %m is present in LLVM_PROFILE_FILE pattern) so it can be dumped in stages or
// multiple times during runtime.
__llvm_profile_reset_counters();
} else {
apilog.warn("Could not reset profile counters, profile dumping will be skipped next time it is attempted");
}
apilog.info("Profile dumped to {}", profile_dest);
return make_ready_future<json::json_return_type>(json::json_return_type(json::json_void()));
}) ;
hs::get_highest_supported_sstable_version.set(r, [&ctx] (std::unique_ptr<request> req) {
return smp::submit_to(0, [&ctx] {
auto format = ctx.db.local().get_user_sstables_manager().get_highest_supported_format();
return make_ready_future<json::json_return_type>(seastar::to_sstring(format));
});
});
}
}