mirror of
https://github.com/scylladb/scylladb.git
synced 2026-04-22 17:40:34 +00:00
Operations of adding or removing a node to Raft configuration are made idempotent: they do nothing if already done, and they are safe to resume after a failure. However, since topology changes are not transactional, if a bootstrap or removal procedure fails midway, Raft group 0 configuration may go out of sync with topology state as seen by gossip. In future we must change gossip to avoid making any persistent changes to the cluster: all changes to persistent topology state will be done exclusively through Raft Group 0. Specifically, instead of persisting the tokens by advertising them through gossip, the bootstrap will commit a change to a system table using Raft group 0. nodetool will switch from looking at gossip-managed tables to consulting with Raft Group 0 configuration or Raft-managed tables. Once this transformation is done, naturally, adding a node to Raft configuration (perhaps as a non-voting member at first) will become the first persistent change to ring state applied when a node joins; removing a node from the Raft Group 0 configuration will become the last action when removing a node. Until this is done, do our best to avoid a cluster state when a removed node or a node which addition failed is stuck in Raft configuration, but the node is no longer present in gossip-managed system tables. In other words, keep the gossip the primary source of truth. For this purpose, carefully chose the timing when we join and leave Raft group 0: Join the Raft group 0 only after we've advertised our tokens, so the cluster is aware of this node, it's visible in nodetool status, but before node state jumps to "normal", i.e. before it accepts queries. Since the operation is idempotent, invoke it on each restart. Remove the node from Group 0 *before* its tokens are removed from gossip-managed system tables. This guarantees that if removal from Raft group 0 fails for whatever reason, the node stays in the ring, so nodetool removenode and friends are re-tried. Add tracing.