mirror of
https://github.com/scylladb/scylladb.git
synced 2026-05-13 03:12:13 +00:00
Reopening #8286 since the token metadata fix that allows `Everywhere` strategy tables to work with RBO (#8536) has been merged.

---

Currently, when a node wants to create and broadcast a new CDC generation, it performs the following steps:

1. choose the generation's stream IDs and mapping (how this is done is irrelevant for the current discussion)
2. choose the generation's timestamp by taking the current time (according to its local clock) and adding 2 * ring_delay
3. insert the generation's data (mapping and stream IDs) into system_distributed.cdc_generation_descriptions, using the generation's timestamp as the partition key (we call this table the "old internal table" below)
4. insert the generation's timestamp into the "CDC_STREAMS_TIMESTAMP" application state.

The timestamp spreads epidemically through the gossip protocol. When nodes see the timestamp, they retrieve the generation data from the old internal table.

Unfortunately, the old internal table's schema stores the entire generation data in a single cell. As a result, step 3 may fail for sufficiently large generations (there is a size threshold above which step 3 will always fail; retrying the operation won't help). The old internal table also lies in the system_distributed keyspace, which uses SimpleStrategy with replication factor 3. This is problematic too. For example, when nodes restart, they must reach at least 2 out of these 3 specific replicas in order to retrieve the current generation (we write and read the generation data with QUORUM, unless we're a single-node cluster, where we use ONE). Until this happens, a restarting node can't coordinate writes to CDC-enabled tables. It would be better if the node could access the last known generation locally.
The commit introduces a new table for broadcasting generation data with the following properties:

- it uses a better schema that stores the data in multiple rows, each of manageable size
- it resides in a new keyspace that uses EverywhereStrategy, so the data will be written to every node in the cluster that has a token in the token ring
- the data will be written using CL=ALL and read using CL=ONE; thanks to this, a restarting node won't have to communicate with other nodes to retrieve the data of the last known generation. Note that writing with CL=ALL does not reduce availability: creating a new generation *requires* all nodes to be available anyway, because they must learn about the generation before their clocks go past the generation's timestamp; if they don't, partitions won't be mapped to stream IDs consistently across the cluster
- the partition key is no longer the generation's timestamp. In the old internal table it was, which forced the algorithm to choose the timestamp *before* the generation data was inserted into the table. If the insertion took a long time, nodes were more likely to learn about the generation too late (after their clocks had moved past its timestamp). With the new schema we first insert the generation data using a randomly generated UUID as the partition key, *then* choose the timestamp, then gossip both the timestamp and the UUID.

Observe that after a node learns through gossip about a generation broadcast with this new method, it retrieves the generation's data very quickly: the node is one of the replicas, and it can read with CL=ONE because the data was written with CL=ALL.

The generation's timestamp and the UUID mentioned in the last point together form a "generation identifier" for this new generation. For passing these new identifiers around, we introduce the cdc::generation_id_v2 type.

Fixes #7961.
---

For optimal review experience it is best to first read the updated design notes (you can read them rendered here: https://github.com/kbr-/scylla/blob/cdc-gen-table/docs/design-notes/cdc.md), specifically the ["Generation switching"](https://github.com/kbr-/scylla/blob/cdc-gen-table/docs/design-notes/cdc.md#generation-switching) section followed by the ["Internal generation descriptions table V1 and upgrade procedure"](https://github.com/kbr-/scylla/blob/cdc-gen-table/docs/design-notes/cdc.md#internal-generation-descriptions-table-v1-and-upgrade-procedure) section, then read the commits in topological order.

dtest gating run (dev): https://jenkins.scylladb.com/job/scylla-master/job/byo/job/byo_build_tests_dtest/1160/
unit tests (dev) passed locally

Closes #8643

* github.com:scylladb/scylla:
  docs: update cdc.md with info about the new internal table
  sys_dist_ks: don't create old CDC generations table on service initialization
  sys_dist_ks: rename all_tables() to ensured_tables()
  cdc: when creating new generations, use format v2 if possible
  main: pass feature_service to cdc::generation_service
  gms: introduce CDC_GENERATIONS_V2 feature
  cdc: introduce retrieve_generation_data
  test: cdc: include new generations table in permissions test
  sys_dist_ks: increase timeout for create_cdc_desc
  sys_dist_ks: new table for exchanging CDC generations
  tree-wide: introduce cdc::generation_id_v2
308 lines
12 KiB
C++
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/*
 * Copyright (C) 2016 ScyllaDB
 *
 * Modified by ScyllaDB
 */

/*
 * This file is part of Scylla.
 *
 * Scylla is free software: you can redistribute it and/or modify
 * it under the terms of the GNU Affero General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Scylla is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Scylla. If not, see <http://www.gnu.org/licenses/>.
 */

#include "client_state.hh"
#include "auth/authorizer.hh"
#include "auth/authenticator.hh"
#include "auth/common.hh"
#include "auth/resource.hh"
#include "exceptions/exceptions.hh"
#include "validation.hh"
#include "db/system_keyspace.hh"
#include "db/schema_tables.hh"
#include "tracing/trace_keyspace_helper.hh"
#include "db/system_distributed_keyspace.hh"
#include "database.hh"
#include "cdc/log.hh"
#include <seastar/core/coroutine.hh>

thread_local api::timestamp_type service::client_state::_last_timestamp_micros = 0;

void service::client_state::set_login(auth::authenticated_user user) {
    _user = std::move(user);
}

future<> service::client_state::check_user_can_login() {
    if (auth::is_anonymous(*_user)) {
        return make_ready_future();
    }

    auto& role_manager = _auth_service->underlying_role_manager();

    return role_manager.exists(*_user->name).then([this](bool exists) mutable {
        if (!exists) {
            throw exceptions::authentication_exception(
                    format("User {} doesn't exist - create it with CREATE USER query first",
                            *_user->name));
        }
        return make_ready_future();
    }).then([this, &role_manager] {
        return role_manager.can_login(*_user->name).then([this](bool can_login) {
            if (!can_login) {
                throw exceptions::authentication_exception(format("{} is not permitted to log in", *_user->name));
            }

            return make_ready_future();
        });
    });
}

void service::client_state::validate_login() const {
    if (!_user) {
        throw exceptions::unauthorized_exception("You have not logged in");
    }
}

void service::client_state::ensure_not_anonymous() const {
    validate_login();
    if (auth::is_anonymous(*_user)) {
        throw exceptions::unauthorized_exception("You have to be logged in and not anonymous to perform this request");
    }
}

future<> service::client_state::has_all_keyspaces_access(
        auth::permission p) const {
    if (_is_internal) {
        return make_ready_future();
    }
    validate_login();

    return do_with(auth::resource(auth::resource_kind::data), [this, p](const auto& r) {
        return ensure_has_permission({p, r});
    });
}

future<> service::client_state::has_keyspace_access(const sstring& ks,
        auth::permission p) const {
    return do_with(ks, auth::make_data_resource(ks), [this, p](auto const& ks, auto const& r) {
        return has_access(ks, {p, r});
    });
}

future<> service::client_state::has_column_family_access(const database& db, const sstring& ks,
        const sstring& cf, auth::permission p, auth::command_desc::type t) const {
    validation::validate_column_family(db, ks, cf);

    return do_with(ks, auth::make_data_resource(ks, cf), [this, p, t](const auto& ks, const auto& r) {
        return has_access(ks, {p, r, t});
    });
}

future<> service::client_state::has_schema_access(const schema& s, auth::permission p) const {
    return do_with(
            s.ks_name(),
            auth::make_data_resource(s.ks_name(),s.cf_name()),
            [this, p](auto const& ks, auto const& r) {
        return has_access(ks, {p, r});
    });
}

future<> service::client_state::has_access(const sstring& ks, auth::command_desc cmd) const {
    if (ks.empty()) {
        throw exceptions::invalid_request_exception("You have not set a keyspace for this session");
    }
    if (_is_internal) {
        return make_ready_future();
    }

    validate_login();

    static const auto alteration_permissions = auth::permission_set::of<
            auth::permission::CREATE, auth::permission::ALTER, auth::permission::DROP>();

    // we only care about schema modification.
    if (alteration_permissions.contains(cmd.permission)) {
        // prevent system keyspace modification
        auto name = ks;
        std::transform(name.begin(), name.end(), name.begin(), ::tolower);
        if (is_system_keyspace(name)) {
            throw exceptions::unauthorized_exception(ks + " keyspace is not user-modifiable.");
        }

        //
        // we want to disallow dropping any contents of TRACING_KS and disallow dropping the `auth::meta::AUTH_KS`
        // keyspace.
        //

        const bool dropping_anything_in_tracing = (name == tracing::trace_keyspace_helper::KEYSPACE_NAME)
                && (cmd.permission == auth::permission::DROP);

        const bool dropping_auth_keyspace = (cmd.resource == auth::make_data_resource(auth::meta::AUTH_KS))
                && (cmd.permission == auth::permission::DROP);

        if (dropping_anything_in_tracing || dropping_auth_keyspace) {
            throw exceptions::unauthorized_exception(
                    format("Cannot {} {}", auth::permissions::to_string(cmd.permission), cmd.resource));
        }
    }

    static thread_local std::unordered_set<auth::resource> readable_system_resources = [] {
        std::unordered_set<auth::resource> tmp;
        for (auto cf : { db::system_keyspace::LOCAL, db::system_keyspace::PEERS }) {
            tmp.insert(auth::make_data_resource(db::system_keyspace::NAME, cf));
        }
        for (auto cf : db::schema_tables::all_table_names(db::schema_features::full())) {
            tmp.insert(auth::make_data_resource(db::schema_tables::NAME, cf));
        }
        return tmp;
    }();

    if (cmd.permission == auth::permission::SELECT && readable_system_resources.contains(cmd.resource)) {
        return make_ready_future();
    }
    if (alteration_permissions.contains(cmd.permission)) {
        if (auth::is_protected(*_auth_service, cmd)) {
            throw exceptions::unauthorized_exception(format("{} is protected", cmd.resource));
        }
    }

    if (cmd.resource.kind() == auth::resource_kind::data) {
        const auto resource_view = auth::data_resource_view(cmd.resource);
        if (resource_view.table()) {
            if (cmd.permission == auth::permission::DROP) {
                if (cdc::is_log_for_some_table(ks, *resource_view.table())) {
                    throw exceptions::unauthorized_exception(
                            format("Cannot {} cdc log table {}", auth::permissions::to_string(cmd.permission), cmd.resource));
                }
            }

            static constexpr auto cdc_topology_description_forbidden_permissions = auth::permission_set::of<
                    auth::permission::ALTER, auth::permission::DROP>();

            if (cdc_topology_description_forbidden_permissions.contains(cmd.permission)) {
                if ((ks == db::system_distributed_keyspace::NAME || ks == db::system_distributed_keyspace::NAME_EVERYWHERE)
                        && (resource_view.table() == db::system_distributed_keyspace::CDC_DESC_V2
                        || resource_view.table() == db::system_distributed_keyspace::CDC_TOPOLOGY_DESCRIPTION
                        || resource_view.table() == db::system_distributed_keyspace::CDC_TIMESTAMPS
                        || resource_view.table() == db::system_distributed_keyspace::CDC_GENERATIONS_V2)) {
                    throw exceptions::unauthorized_exception(
                            format("Cannot {} {}", auth::permissions::to_string(cmd.permission), cmd.resource));
                }
            }
        }
    }

    return ensure_has_permission(cmd);
}

future<bool> service::client_state::check_has_permission(auth::command_desc cmd) const {
    if (_is_internal) {
        return make_ready_future<bool>(true);
    }

    return do_with(cmd.resource.parent(), [this, cmd](const std::optional<auth::resource>& parent_r) {
        return auth::get_permissions(*_auth_service, *_user, cmd.resource).then(
                [this, p = cmd.permission, &parent_r](auth::permission_set set) {
            if (set.contains(p)) {
                return make_ready_future<bool>(true);
            }
            if (parent_r) {
                return check_has_permission({p, *parent_r});
            }
            return make_ready_future<bool>(false);
        });
    });
}

future<> service::client_state::ensure_has_permission(auth::command_desc cmd) const {
    return check_has_permission(cmd).then([this, cmd](bool ok) {
        if (!ok) {
            throw exceptions::unauthorized_exception(
                format("User {} has no {} permission on {} or any of its parents",
                        *_user,
                        auth::permissions::to_string(cmd.permission),
                        cmd.resource));
        }
    });
}

void service::client_state::set_keyspace(database& db, std::string_view keyspace) {
    // Skip keyspace validation for non-authenticated users. Apparently, some client libraries
    // call set_keyspace() before calling login(), and we have to handle that.
    if (_user && !db.has_keyspace(keyspace)) {
        throw exceptions::invalid_request_exception(format("Keyspace '{}' does not exist", keyspace));
    }
    _keyspace = sstring(keyspace);
}

future<> service::client_state::ensure_exists(const auth::resource& r) const {
    return _auth_service->exists(r).then([&r](bool exists) {
        if (!exists) {
            throw exceptions::invalid_request_exception(format("{} doesn't exist.", r));
        }

        return make_ready_future<>();
    });
}

future<> service::client_state::maybe_update_per_service_level_params() {
    if (_sl_controller && _user && _user->name) {
        auto& role_manager = _auth_service->underlying_role_manager();
        auto role_set = co_await role_manager.query_granted(_user->name.value(), auth::recursive_role_query::yes);
        auto slo_opt = co_await _sl_controller->find_service_level(role_set);
        if (!slo_opt) {
            co_return;
        }
        auto slo_timeout_or = [&] (const lowres_clock::duration& default_timeout) {
            return std::visit(overloaded_functor{
                [&] (const qos::service_level_options::unset_marker&) -> lowres_clock::duration {
                    return default_timeout;
                },
                [&] (const qos::service_level_options::delete_marker&) -> lowres_clock::duration {
                    return default_timeout;
                },
                [&] (const lowres_clock::duration& d) -> lowres_clock::duration {
                    return d;
                },
            }, slo_opt->timeout);
        };
        _timeout_config.read_timeout = slo_timeout_or(_default_timeout_config.read_timeout);
        _timeout_config.write_timeout = slo_timeout_or(_default_timeout_config.write_timeout);
        _timeout_config.range_read_timeout = slo_timeout_or(_default_timeout_config.range_read_timeout);
        _timeout_config.counter_write_timeout = slo_timeout_or(_default_timeout_config.counter_write_timeout);
        _timeout_config.truncate_timeout = slo_timeout_or(_default_timeout_config.truncate_timeout);
        _timeout_config.cas_timeout = slo_timeout_or(_default_timeout_config.cas_timeout);
        _timeout_config.other_timeout = slo_timeout_or(_default_timeout_config.other_timeout);

        _workload_type = slo_opt->workload;
    }
}