scylladb/api/api_init.hh
Kamil Braun 26cbd28883 Merge 'token_metadata: switch to host_id' from Petr Gusev
In this PR we refactor `token_metadata` to use `locator::host_id` instead of `gms::inet_address` for node identification in its internal data structures. The main motivation for these changes is to make the raft state machine deterministic. The use of IPs is a problem since they are distributed through the gossiper and can't be relied upon. One specific scenario is outlined [in this comment](https://github.com/scylladb/scylladb/pull/13655#issuecomment-1521389804): `storage_service::topology_state_load` can't resolve a host_id to an IP when we are applying old raft log entries containing the host_ids of long-gone nodes.

The refactoring is structured as follows:
  * Turn `token_metadata` into a template so that it can be used with host_id or inet_address as the node key. The version with inet_address (the current one) provides a `get_new()` method, which can be used to access the new version.
  * Go over all the places which write to the old version and make the corresponding writes to the new version through `get_new()`. When this stage is finished, we can use either version of the `token_metadata` for reading.
  * Go over all the places which read `token_metadata` and switch them to the new version.
  * Make `host_id`-based `token_metadata` default, drop `inet_address`-based version, change `token_metadata` back to non-template.

This series [depends](1745a1551a) on the RPC sender's `host_id` being present in RPC `client_info` for the `bootstrap` and `replace` node_ops commands. This feature was added in [this commit](95c726a8df) and released in `5.4`. It is generally recommended not to skip versions when upgrading, so users who upgrade sequentially, first to `5.4` (or the corresponding Enterprise version) and then to the version with these changes (`5.5` or `6.0`), should be fine. If for some reason they upgrade from a version without `host_id` in RPC `client_info` directly to the version with these changes, and they run bootstrap or replace commands during the upgrade procedure itself, these commands may fail with the error `Coordinator host_id not found` if some nodes are already upgraded but the node which started the node_ops command is not yet upgraded. In this case the user can either finish the upgrade to version 5.4 or later first, or start bootstrap/replace from an already-upgraded node. Note that removenode and decommission do not depend on the coordinator `host_id`, so they can be started in the middle of an upgrade from any node.

Closes scylladb/scylladb#15903

* github.com:scylladb/scylladb:
  topology: remove_endpoint: remove inet_address overload
  token_metadata: topology: cleanup add_or_update_endpoint
  token_metadata: add_replacing_endpoint: forbid replacing node with itself
  topology: drop key_kind, host_id is now the primary key
  dc_rack_fn: make it non-template
  token_metadata: drop the template
  shared_token_metadata: switch to the new token_metadata
  gossiper: use new token_metadata
  database: get_token_metadata -> new token_metadata
  erm: switch to the new token_metadata
  storage_service: get_token_metadata -> token_metadata2
  storage_service: get_token_to_endpoint_map: use new token_metadata
  api/token_metadata: switch to new version
  storage_service::on_change: switch to new token_metadata
  cdc: switch to token_metadata2
  calculate_natural_endpoints: fix indentation
  calculate_natural_endpoints: switch to token_metadata2
  storage_service: get_changed_ranges_for_leaving: use new token_metadata
  decommission_with_repair, removenode_with_repair -> new token_metadata
  rebuild_with_repair, replace_with_repair: use new token_metadata
  bootstrap: use new token_metadata
  tablets: switch to token_metadata2
  calculate_effective_replication_map: use new token_metadata
  calculate_natural_endpoints: fix formatting
  abstract_replication_strategy: calculate_natural_endpoints: make it work with both versions of token_metadata
  network_topology_strategy_test: update new token_metadata
  storage_service: on_alive: update new token_metadata
  storage_service: handle_state_bootstrap: update new token_metadata
  storage_service: snitch_reconfigured: update new token_metadata
  storage_service: leave_ring: update new token_metadata
  storage_service: node_ops_cmd_handler: update new token_metadata
  storage_service: node_ops_cmd_handler: add coordinator_host_id
  storage_service: bootstrap: update new token_metadata
  storage_service: join_token_ring: update new token_metadata
  storage_service: excise: update new token_metadata
  storage_service: join_cluster: update new token_metadata
  storage_service: on_remove: update new token_metadata
  storage_service: handle_state_normal: fill new token_metadata
  storage_service: topology_state_load: fill new token_metadata
  storage_service: adjust update_topology_change_info to update new token_metadata
  topology: set self host_id on the new topology
  locator::topology: allow being_replaced and replacing nodes to have the same IP
  token_metadata: get_endpoint_for_host_id -> get_endpoint_for_host_id_if_known
  token_metadata: get_host_id: exception -> on_internal_error
  token_metadata: add get_all_ips method
  token_metadata: support host_id-based version
  token_metadata: make it a template with NodeId=inet_address/host_id
    NodeId is used in all internal token_metadata data structures that previously used inet_address. We choose topology::key_kind based on the value of the template parameter.
  locator: make dc_rack_fn a template
  locator/topology: add key_kind parameter
  token_metadata: topology_change_info: change field types to token_metadata_ptr
  token_metadata: drop unused method get_endpoint_to_token_map_for_reading
2023-12-13 16:35:52 +01:00

134 lines
4.5 KiB
C++

/*
 * Copyright 2016 ScyllaDB
 */
/*
 * SPDX-License-Identifier: AGPL-3.0-or-later
 */
#pragma once
#include <seastar/http/httpd.hh>
#include <seastar/core/future.hh>
#include "locator/host_id.hh"
#include "replica/database_fwd.hh"
#include "tasks/task_manager.hh"
#include "seastarx.hh"
using request = http::request;
using reply = http::reply;
namespace service {
class load_meter;
class storage_proxy;
class storage_service;
class raft_group0_client;
} // namespace service
class sstables_loader;
namespace streaming {
class stream_manager;
}
namespace gms {
class inet_address;
}
namespace locator {
class token_metadata;
class shared_token_metadata;
class snitch_ptr;
} // namespace locator
namespace cql_transport { class controller; }
class thrift_controller;
namespace db {
class snapshot_ctl;
class config;
namespace view {
class view_builder;
}
class system_keyspace;
}
namespace netw { class messaging_service; }
class repair_service;
namespace gms {
class gossiper;
}
namespace auth { class service; }
namespace tasks {
class task_manager;
}
namespace api {
struct http_context {
    sstring api_dir;
    sstring api_doc;
    httpd::http_server_control http_server;
    distributed<replica::database>& db;
    service::load_meter& lmeter;

    http_context(distributed<replica::database>& _db,
            service::load_meter& _lm)
        : db(_db), lmeter(_lm)
    {
    }
};
future<> set_server_init(http_context& ctx);
future<> set_server_config(http_context& ctx, const db::config& cfg);
future<> set_server_snitch(http_context& ctx, sharded<locator::snitch_ptr>& snitch);
future<> unset_server_snitch(http_context& ctx);
future<> set_server_storage_service(http_context& ctx, sharded<service::storage_service>& ss, service::raft_group0_client&);
future<> unset_server_storage_service(http_context& ctx);
future<> set_server_sstables_loader(http_context& ctx, sharded<sstables_loader>& sst_loader);
future<> unset_server_sstables_loader(http_context& ctx);
future<> set_server_view_builder(http_context& ctx, sharded<db::view::view_builder>& vb);
future<> unset_server_view_builder(http_context& ctx);
future<> set_server_repair(http_context& ctx, sharded<repair_service>& repair);
future<> unset_server_repair(http_context& ctx);
future<> set_transport_controller(http_context& ctx, cql_transport::controller& ctl);
future<> unset_transport_controller(http_context& ctx);
future<> set_rpc_controller(http_context& ctx, thrift_controller& ctl);
future<> unset_rpc_controller(http_context& ctx);
future<> set_server_authorization_cache(http_context& ctx, sharded<auth::service> &auth_service);
future<> unset_server_authorization_cache(http_context& ctx);
future<> set_server_snapshot(http_context& ctx, sharded<db::snapshot_ctl>& snap_ctl);
future<> unset_server_snapshot(http_context& ctx);
future<> set_server_token_metadata(http_context& ctx, sharded<locator::shared_token_metadata>& tm);
future<> unset_server_token_metadata(http_context& ctx);
future<> set_server_gossip(http_context& ctx, sharded<gms::gossiper>& g);
future<> set_server_load_sstable(http_context& ctx, sharded<db::system_keyspace>& sys_ks);
future<> unset_server_load_sstable(http_context& ctx);
future<> set_server_messaging_service(http_context& ctx, sharded<netw::messaging_service>& ms);
future<> unset_server_messaging_service(http_context& ctx);
future<> set_server_storage_proxy(http_context& ctx, sharded<service::storage_proxy>& proxy);
future<> unset_server_storage_proxy(http_context& ctx);
future<> set_server_stream_manager(http_context& ctx, sharded<streaming::stream_manager>& sm);
future<> unset_server_stream_manager(http_context& ctx);
future<> set_hinted_handoff(http_context& ctx, sharded<service::storage_proxy>& p);
future<> unset_hinted_handoff(http_context& ctx);
future<> set_server_gossip_settle(http_context& ctx, sharded<gms::gossiper>& g);
future<> set_server_cache(http_context& ctx);
future<> set_server_compaction_manager(http_context& ctx);
future<> set_server_done(http_context& ctx);
future<> set_server_task_manager(http_context& ctx, sharded<tasks::task_manager>& tm, lw_shared_ptr<db::config> cfg);
future<> unset_server_task_manager(http_context& ctx);
future<> set_server_task_manager_test(http_context& ctx, sharded<tasks::task_manager>& tm);
future<> unset_server_task_manager_test(http_context& ctx);
future<> set_server_tasks_compaction_module(http_context& ctx, sharded<service::storage_service>& ss, sharded<db::snapshot_ctl>& snap_ctl);
future<> unset_server_tasks_compaction_module(http_context& ctx);
} // namespace api