mirror of
https://github.com/scylladb/scylladb.git
synced 2026-04-30 13:17:01 +00:00
load_balancer: fix std::out_of_bounds when decommissioning with empty nodes
Consider the following:
The tablet load balancer is working on:
- node1: an empty node (no tablets) with a large disk capacity
- node2: an empty node (no tablets) with a lower disk capacity then node1
- node3: is being decommissioned and contains tablet replicas
In load_balancer::make_internode_plan() the initial destination
node/shard is selected like this:
// Pick best target shard.
auto dst = global_shard_id {target, _load_sketch->get_least_loaded_shard(target)};
load_sketch::get_least_loaded_shard(host_id) calls ensure_node() which
adds the host to load_sketch's internal hash maps in case the node was
not yet seen by load_sketch.
Let's assume dst is a shard on node1.
Later in load_balancer::make_internode_plan() we will call
pick_candidate() to try to find a better destination node than the
initial one:
// May choose a different source shard than src.shard or different destination host/shard than dst.
auto candidate = co_await pick_candidate(nodes, src_node_info, target_info, src, dst, nodes_by_load_dst,
drain_skipped);
auto source_tablets = candidate.tablets;
src = candidate.src;
dst = candidate.dst;
If pick_candidate() selects some other empty destination (due to larger
capacity: node1) node, and that node has not yet been seen by
load_sketch (because it was empty), a subsequent call to
load_sketch::pick() will search for the node using
std::unordered_map::at(), and because the node is not found it will
throw a std::out_of_bounds() exception crashing the load balancer.
This problem is fixed by changing load_sketch::populate() to initialize
its internal maps with all the nodes which populate()'s arguments
filter for.
Fixes: #26203
Closes scylladb/scylladb#26207
(cherry picked from commit c6c9c316a7)
Closes scylladb/scylladb#26240
This commit is contained in:
committed by
Botond Dénes
parent
c7091b61e4
commit
99b69092ef
@@ -3726,6 +3726,43 @@ static void execute_tablet_for_new_rf_test(calculate_tablet_replicas_for_new_rf_
|
||||
}
|
||||
}
|
||||
|
||||
SEASTAR_THREAD_TEST_CASE(test_ensure_node_for_load_sketch) {
|
||||
// This tests reproduces the balancer crash when a node is drained and there are more then one
|
||||
// empty destination nodes. If one of these destination nodes has a lower capacity then the other,
|
||||
// and the initial target node selected is the one with lower capacity, pick_candidate() will then
|
||||
// change the target node to the one with higher capacity. The problem is that
|
||||
// load_sketch::get_least_loaded_shard() and consequently load_sketch::ensure_node() have not yet
|
||||
// been called for the new, larger target (only for the initial, smaller one), and load_sketch will
|
||||
// not have the larger node in its _nodes member hash map. This will cause an std::out_of_bounds
|
||||
// exception when load_sketch::pick() is called with the host_id of the larger node.
|
||||
do_with_cql_env_thread([] (auto& e) {
|
||||
logging::logger_registry().set_logger_level("load_balancer", logging::log_level::debug);
|
||||
|
||||
topology_builder topo(e);
|
||||
|
||||
auto host1 = topo.add_node(node_state::normal, 2);
|
||||
const uint64_t node1_capacity = 50UL * 1024UL * 1024UL * 1024UL;
|
||||
topo.get_shared_load_stats().set_capacity(host1, node1_capacity);
|
||||
|
||||
auto ks_name = add_keyspace(e, {{topo.dc(), 1}}, 4);
|
||||
add_table(e, ks_name).get();
|
||||
|
||||
auto host2 = topo.add_node(node_state::normal, 2);
|
||||
const uint64_t node2_capacity = 70UL * 1024UL * 1024UL * 1024UL;
|
||||
topo.get_shared_load_stats().set_capacity(host2, node2_capacity);
|
||||
|
||||
auto host3 = topo.add_node(node_state::normal, 2);
|
||||
const uint64_t node3_capacity = 60UL * 1024UL * 1024UL * 1024UL;
|
||||
topo.get_shared_load_stats().set_capacity(host3, node3_capacity);
|
||||
|
||||
topo.set_node_state(host1, node_state::removing);
|
||||
|
||||
auto& talloc = e.get_tablet_allocator().local();
|
||||
auto& stm = e.shared_token_metadata().local();
|
||||
talloc.balance_tablets(stm.get(), topo.get_shared_load_stats().get()).get();
|
||||
}).get();
|
||||
}
|
||||
|
||||
SEASTAR_THREAD_TEST_CASE(test_calculate_tablet_replicas_for_new_rf_upsize_one_dc) {
|
||||
calculate_tablet_replicas_for_new_rf_config config;
|
||||
config.ring_points = {
|
||||
|
||||
Reference in New Issue
Block a user