Consider the following:
The tablet load balancer is working on:
- node1: an empty node (no tablets) with a large disk capacity
- node2: an empty node (no tablets) with a lower disk capacity then node1
- node3: is being decommissioned and contains tablet replicas
In load_balancer::make_internode_plan() the initial destination
node/shard is selected like this:
// Pick best target shard.
auto dst = global_shard_id {target, _load_sketch->get_least_loaded_shard(target)};
load_sketch::get_least_loaded_shard(host_id) calls ensure_node() which
adds the host to load_sketch's internal hash maps in case the node was
not yet seen by load_sketch.
Let's assume dst is a shard on node1.
Later in load_balancer::make_internode_plan() we will call
pick_candidate() to try to find a better destination node than the
initial one:
// May choose a different source shard than src.shard or different destination host/shard than dst.
auto candidate = co_await pick_candidate(nodes, src_node_info, target_info, src, dst, nodes_by_load_dst,
drain_skipped);
auto source_tablets = candidate.tablets;
src = candidate.src;
dst = candidate.dst;
If pick_candidate() selects some other empty destination (due to larger
capacity: node1) node, and that node has not yet been seen by
load_sketch (because it was empty), a subsequent call to
load_sketch::pick() will search for the node using
std::unordered_map::at(), and because the node is not found it will
throw a std::out_of_bounds() exception crashing the load balancer.
This problem is fixed by changing load_sketch::populate() to initialize
its internal maps with all the nodes which populate()'s arguments
filter for.
Fixes: #26203
Closes scylladb/scylladb#26207
(cherry picked from commit c6c9c316a7)
Closes scylladb/scylladb#26240
Scylla in-source tests.
For details on how to run the tests, see docs/dev/testing.md
Shared C++ utils, libraries are in lib/, for Python - pylib/
alternator - Python tests which connect to a single server and use the DynamoDB API unit, boost, raft - unit tests in C++ cqlpy - Python tests which connect to a single server and use CQL topology* - tests that set up clusters and add/remove nodes cql - approval tests that use CQL and pre-recorded output rest_api - tests for Scylla REST API Port 9000 scylla-gdb - tests for scylla-gdb.py helper script nodetool - tests for C++ implementation of nodetool
If you can use an existing folder, consider adding your test to it. New folders should be used for new large categories/subsystems, or when the test environment is significantly different from some existing suite, e.g. you plan to start scylladb with different configuration, and you intend to add many tests and would like them to reuse an existing Scylla cluster (clusters can be reused for tests within the same folder).
To add a new folder, create a new directory, and then
copy & edit its suite.ini.