service/raft: raft_group0: rewrite leave_group0

One of the following cases is true: 1. RAFT local feature is disabled. Then we don't do anything related to group 0. 2. RAFT local feature is enabled and when we bootstrapped, we joined group 0. Then `raft_group0::_group0` variant holds the `raft::group_id` alternative. 3. RAFT local feature is enabled and when we bootstrapped we didn't join group 0. This means the RAFT local feature was disabled when we bootstrapped and we're in the (unimplemented yet) upgrade scenario. `raft_group0::_group0` variant holds the `std::monostate` alternative. The problem with the previous implementation was that it checked for the conditions of the third case above - that RAFT local feature is enabled but `_group0` does not hold `raft::group_id` - and if those conditions were true, it executed some logic that didn't really make sense: it ran the discovery algorithm and called `send_group0_modify_config` RPC. In this rewrite I state some assumptions that `leave_group0` makes: - we've finished the startup procedure. - we're being run during decommission - after the node entered LEFT status. In the new implementation, if `_group0` does not hold `raft::group_id` (checked by the internal `joined_group0()` helper), we simply return. This is the yet-unimplemented upgrade case left for a follow-up PR. Otherwise we fetch our Raft server ID (at this point it must be present - otherwise it's a fatal error) and simply call `modify_config` from the `raft::server` API. Remove unnecessary call to `_shutdown_gate.hold()` (this is not a background task).
2026-05-29 11:10:40 +00:00 · 2022-07-12 12:47:24 +02:00
parent 75608bcd2f
commit eeeef0bc50
2 changed files with 28 additions and 24 deletions
--- a/service/raft/raft_group0.cc
+++ b/service/raft/raft_group0.cc
@@ -347,33 +347,28 @@ future<> raft_group0::become_voter() {
 }

 future<> raft_group0::leave_group0() {
-    if (!_raft_gr.is_enabled()) {
-        co_return;
-    }
    assert(this_shard_id() == 0);
-    raft::server_id remove_addr;
+
+    if (!_raft_gr.is_enabled()) {
+        rslog.info("leave_group0: local RAFT feature disabled, skipping.");
+        co_return;
+    }
+
+    if (!joined_group0()) {
+        // TODO: unimplemented upgrade case.
+        co_return;
+    }
+
    auto my_id = raft::server_id{co_await db::system_keyspace::get_raft_server_id()};
-    if (my_id) {
-        remove_addr = my_id;
-    } else {
-        // Nothing to do
-        co_return;
+    if (!my_id) {
+        on_internal_error(rslog,
+            "leave_group0: we're fully upgraded to use Raft and group 0 ID is present but Raft server ID is not."
+            " Please report a bug.");
    }
-    auto pause_shutdown = _shutdown_gate.hold();
-    if (std::holds_alternative<raft::group_id>(_group0)) {
-        co_return co_await _raft_gr.group0().modify_config({}, {remove_addr}, &_abort_source);
-    }
-    auto g0_info = co_await discover_group0(raft::server_address{my_id, raft::server_info{}});
-    if (g0_info.addr.id == my_id) {
-        co_return;
-    }
-    netw::msg_addr peer(raft_addr_to_inet_addr(g0_info.addr));
-    // During removenode, the client itself will retry or abort
-    // the operation if necessary, it's move important it's not
-    // "flaky" on slow network or CPU.
-    auto timeout = db::timeout_clock::now() + std::chrono::minutes{20};
-    // TODO: aborts?
-    co_return co_await ser::group0_rpc_verbs::send_group0_modify_config(&_ms, peer, timeout, g0_info.group0_id, {}, {remove_addr});
+
+    // Note: if this gets stuck due to a failure, the DB admin can abort.
+    // FIXME: this gets stuck without failures if we're the leader (#10833)
+    co_return co_await _raft_gr.group0().modify_config({}, {my_id}, &_abort_source);
 }

 future<> raft_group0::remove_from_group0(gms::inet_address node) {
--- a/service/raft/raft_group0.hh
+++ b/service/raft/raft_group0.hh
@@ -106,6 +106,15 @@ public:
    future<> become_voter();

    // Remove ourselves from group 0.
+    //
+    // Assumes we've finished the startup procedure (`setup_group0()` finished earlier).
+    // Assumes to run during decommission, after the node entered LEFT status.
+    //
+    // FIXME: make it retryable and do nothing if we're not a member.
+    // Currently if we call leave_group0 twice, it will get stuck the second time
+    // (it will try to forward an entry to a leader but never find the leader).
+    // Not sure how easy or hard it is and whether it's a problem worth solving; if decommission crashes,
+    // one can simply call `removenode` on another node to make sure we areremoved (from group 0 too).
    future<> leave_group0();

    // Remove `host` from group 0.