Previously, the prev_ip check caused problems for bootstrapping nodes.

Suppose a bootstrapping node A appears in the system.peers table of some other node B. Its record contains only the ID and IP of node A, due to the special handling of bootstrapping nodes in raft_topology_update_ip. Suppose node B gets temporarily isolated from the topology coordinator. The topology coordinator fences out node B and successfully finishes bootstrapping node A. Later, when connectivity is restored, topology_state_load runs on node B. Node A is already in the normal state, but the gossiper on B might not have any state for it yet. In this case, raft_topology_update_ip would not update system.peers because the gossiper state is missing. Subsequently, the on_join/on_restart/on_alive events would skip the update because the IP in the gossiper matches the IP for that node in system.peers, so the record for node A would stay incomplete.

Removing the check avoids this issue, with negligible overhead:
* on_join/on_restart/on_alive happen only once in a node’s lifetime
* topology_state_load already updates all nodes each time it runs.

This problem was found by a fencing test, which crashed a node while another node was going through the bootstrapping process. After the restart, the crashed node saw that the other node was already in the normal state, since the topology coordinator had fenced out the crashed node and managed to finish the bootstrapping process successfully. This test will be provided in a separate fencing-for-paxos PR.

Closes scylladb/scylladb#25596
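
The scenario described above can be illustrated with a toy model. This is only a sketch, not the actual ScyllaDB code (which is C++): the `Node` class, the `simulate` helper, the `check_prev_ip` flag, the `tokens` field and the IP address are all hypothetical, and the functions merely borrow the names of the events discussed in this message.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Toy stand-ins (assumptions): host_id -> record dict.
    peers: dict = field(default_factory=dict)    # models system.peers
    gossip: dict = field(default_factory=dict)   # models gossiper state


def topology_state_load(node, host_id):
    """Refresh the peers record, but only if gossiper state exists."""
    state = node.gossip.get(host_id)
    if state is None:
        return False  # gossiper state missing: nothing is written
    node.peers[host_id] = dict(state)
    return True


def on_alive(node, host_id, check_prev_ip):
    """Models on_join/on_restart/on_alive with an optional prev_ip check."""
    state = node.gossip[host_id]
    prev_ip = node.peers.get(host_id, {}).get("ip")
    if check_prev_ip and prev_ip == state["ip"]:
        return False  # IP unchanged: the update is skipped
    node.peers[host_id] = dict(state)
    return True


def simulate(check_prev_ip):
    b = Node()
    # While A bootstraps, B stores only A's ID and IP.
    b.peers["A"] = {"ip": "10.0.0.1"}
    # B was isolated; A finished bootstrapping. Connectivity returns,
    # but the gossiper on B has no state for A yet.
    topology_state_load(b, "A")
    # Gossip state for A arrives later and triggers on_alive.
    b.gossip["A"] = {"ip": "10.0.0.1", "tokens": "t1"}
    on_alive(b, "A", check_prev_ip)
    return b.peers["A"]
```

In this model, `simulate(True)` leaves the record for A with only its IP, while `simulate(False)` ends with the full record, which mirrors why the check is removed.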