storage_service: topology coordinator: do not retry the metadata barrier forever in write_both_read_new state

Handle a barrier failure by sleeping for the "ring delay" and then continuing. The purpose of the barrier is to wait for all reads from the old replica set to complete and to fence the remaining requests. If the barrier fails, we give the fence some time to propagate and continue with the topology change. If the fence did not propagate we may get stale reads, but this is no worse than what we have with gossiper.
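
For illustration only, the control flow described above can be sketched with the standard library alone. The function name `barrier_or_delay`, the `std::function` barrier and the blocking `sleep_for` are hypothetical stand-ins, not the Seastar/Scylla code in the patch, which awaits `sleep_abortable(_ring_delay, _as)` inside a coroutine. The failure is recorded in a flag and the wait happens after the handler, mirroring the patch; C++ does not allow `co_await` inside a catch handler, which is presumably why the patch is shaped this way.

```cpp
#include <chrono>
#include <functional>
#include <iostream>
#include <stdexcept>
#include <thread>

// Hypothetical stand-in for the write_both_read_new step (assumption, not the Scylla API).
// Runs the barrier once; if it throws, waits for ring_delay instead of retrying.
// Returns true when the barrier failed and the fallback delay was used.
bool barrier_or_delay(const std::function<void()>& barrier,
                      std::chrono::milliseconds ring_delay) {
    bool barrier_failed = false;
    try {
        barrier();  // wait for old reads to drain and fence the remaining requests
    } catch (...) {
        std::cerr << "barrier failed, falling back to ring delay\n";
        barrier_failed = true;  // remember the failure; do the waiting outside the handler
    }
    if (barrier_failed) {
        // Give the fence time to propagate before the topology change continues.
        std::this_thread::sleep_for(ring_delay);
    }
    return barrier_failed;
}
```

In either case the caller proceeds to the next step of the topology change; the return value only tells it whether the weaker, time-based guarantee was used.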
2026-05-30 11:36:54 +00:00 · 2023-11-06 13:47:23 +02:00
parent 7ea8fa459c
commit 7267376eac
1 changed files with 10 additions and 2 deletions
--- a/service/storage_service.cc
+++ b/service/storage_service.cc
@@ -2063,7 +2063,7 @@ class topology_coordinator {
                break;
            case topology::transition_state::write_both_read_new: {
                auto node = get_node_to_work_on(std::move(guard));
-
+                bool barrier_failed = false;
                // In this state writes goes to old and new replicas but reads start to be done from new replicas
                // Before we stop writing to old replicas we need to wait for all previous reads to complete
                try {
@@ -2076,7 +2076,15 @@ class topology_coordinator {
                    slogger.error("raft topology: transition_state::write_both_read_new, "
                                    "global_token_metadata_barrier failed, error {}",
                                    std::current_exception());
-                    break;
+                    barrier_failed = true;
+                }
+                if (barrier_failed) {
+                    // If barrier above failed it means there may be unfenced reads from old replicas.
+                    // Lets wait for the ring delay for those writes to complete or fence to propagate
+                    // before continuing.
+                    // FIXME: nodes that cannot be reached need to be isolated either automatically or
+                    // by an administrator
+                    co_await sleep_abortable(_ring_delay, _as);
                }
                switch(node.rs->state) {
                case node_state::bootstrapping: {