Mirror of https://github.com/scylladb/scylladb.git (synced 2026-06-01 12:36:56 +00:00)
raft topology: join_node_request_handler: wait until first node becomes normal
We need to wait until the first node becomes normal in `join_node_request_handler` to ensure that joining nodes are not handled as the first node in the cluster. If we placed a join request before the first node becomes normal, the topology coordinator would incorrectly skip the join node handshake in `handle_node_transition` (`case node_state::none`). This would happen because the topology coordinator decides whether a node is the first in the cluster by checking if there are no normal nodes. Therefore, we must ensure there is at least one normal node when the topology coordinator handles a join request for a non-first node.

We change the previous check because it can return true even when there are no normal nodes: `topology::is_empty` also returns false while the first node is still new or in transition.

Additionally, calling `join_node_request_handler` before the first node sets itself as normal is frequent during concurrent bootstrap, so we remove "unlikely" from the comment.

Fixes: scylladb/scylladb#15807
Closes scylladb/scylladb#15775
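The difference between the two predicates can be illustrated with a toy model. This is a minimal sketch, not the ScyllaDB implementation: `toy_topology`, its three sets, and the helper functions are hypothetical names, and the `is_empty` semantics (empty only when no node is known in any state) is an assumption based on the description above.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy stand-in for the topology state. NOT the real ScyllaDB `topology` class;
// the member names and the three-state split are assumptions for illustration.
struct toy_topology {
    std::set<std::string> new_nodes;        // nodes that have only inserted their entry
    std::set<std::string> transition_nodes; // nodes currently being bootstrapped
    std::set<std::string> normal_nodes;     // nodes that have become normal

    // Assumed semantics: empty only when no node is known in any state.
    bool is_empty() const {
        return new_nodes.empty() && transition_nodes.empty() && normal_nodes.empty();
    }
};

// Wait condition before the change: any entry at all ends the wait.
bool old_wait_done(const toy_topology& t) { return !t.is_empty(); }

// Wait condition after the change: at least one normal node is required.
bool new_wait_done(const toy_topology& t) { return !t.normal_nodes.empty(); }

// Per the commit message, the coordinator treats a joining node as the first
// node when there are no normal nodes.
bool treated_as_first_node(const toy_topology& t) { return t.normal_nodes.empty(); }

// Walks a hypothetical first node "n1" through its states and checks both
// predicates at every step.
void walk_first_node_states() {
    toy_topology t;

    // No entries yet: both conditions keep waiting.
    assert(!old_wait_done(t));
    assert(!new_wait_done(t));

    // First node inserted its entry but is not normal yet. The old condition
    // ends the wait while a joiner would still be treated as the first node.
    t.new_nodes.insert("n1");
    assert(old_wait_done(t));
    assert(treated_as_first_node(t));
    assert(!new_wait_done(t));

    // First node becomes normal: the new condition ends the wait, and a joiner
    // is no longer mistaken for the first node.
    t.new_nodes.erase("n1");
    t.normal_nodes.insert("n1");
    assert(new_wait_done(t));
    assert(!treated_as_first_node(t));
}
```

Calling `walk_first_node_states()` exercises the window in which the old condition lets a join request through too early; under these assumptions, the new condition closes that window.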
Committed by: Avi Kivity
Parent: 227136ddf5
Commit: a6236072ee
@@ -6345,11 +6345,13 @@ future<join_node_request_result> storage_service::join_node_request_handler(join
 
     co_await _topology_state_machine.event.when([this] {
         // The first node defines the cluster and inserts its entry to the
-        // `system.topology` without checking anything. It is unlikely but
-        // possible that the `join_node_request_handler` fires before the first
-        // node inserts its entry, therefore we might need to wait
-        // until that happens, here.
-        return !_topology_state_machine._topology.is_empty();
+        // `system.topology` without checking anything. It is possible that the
+        // `join_node_request_handler` fires before the first node sets itself
+        // as a normal node, therefore we might need to wait until that happens,
+        // here. If we didn't do it, the topology coordinator could handle the
+        // joining node as the first one and skip the necessary join node
+        // handshake.
+        return !_topology_state_machine._topology.normal_nodes.empty();
     });
 
     auto& g0_server = _group0->group0_server();