storage_service: topology coordinator: fix accessing outdated node in case of barrier failure

When a metadata barrier fails, the guard is released and the node
becomes outdated. The failure handling path needs to re-take the guard
and re-create the node before continuing.
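
The hazard can be illustrated with a minimal, self-contained sketch. It
is not the coordinator code: `topology`, `guard`, `node_view`,
`failed_barrier`, `is_current` and `handle_step` are hypothetical
stand-ins, and the real `start_operation()` and `retake_node()` are
coroutine-based and have different signatures. The sketch only assumes
that a node view is a snapshot taken under a guard, and that the
topology can change while the guard is not held.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-ins for illustration only; not ScyllaDB types.
enum class node_state { bootstrapping, normal };

struct topology {
    std::uint64_t version = 0;
    std::map<int, node_state> nodes;
};

// Token recording which topology version the holder has read.
struct guard {
    std::uint64_t version;
};

// Snapshot of one node, valid only for the guard it was built under.
struct node_view {
    int id;
    node_state state;
    guard g;
};

guard start_operation(const topology& t) {
    return guard{t.version};
}

// Build a fresh snapshot of node `id` under a freshly taken guard.
node_view retake_node(const topology& t, guard g, int id) {
    assert(g.version == t.version);
    return node_view{id, t.nodes.at(id), g};
}

bool is_current(const topology& t, const node_view& n) {
    return n.g.version == t.version;
}

// Simulates a failed barrier: the guard is no longer held, and the
// topology moves on while the caller sleeps.
void failed_barrier(topology& t, int id) {
    t.nodes[id] = node_state::normal;
    ++t.version;
}

// Returns the state the caller would switch on after the failure path.
node_state handle_step(topology& t, int id, bool retake) {
    node_view node = retake_node(t, start_operation(t), id);
    failed_barrier(t, node.id);
    if (retake) {
        // Re-take the guard and rebuild the snapshot before using it.
        node = retake_node(t, start_operation(t), node.id);
    }
    return node.state;
}
```

Without the re-take, `handle_step` returns the state captured before the
failure, which no longer matches the topology; with it, the returned
state reflects the current version.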

Fixes: #16568

Message-ID: <ZYxEm+SaBeFcRT8E@scylladb.com>
Gleb Natapov
2023-12-27 17:36:59 +02:00
committed by Avi Kivity
parent 3ce0576a31
commit e31f6893af

@@ -2156,6 +2156,7 @@ class topology_coordinator {
// FIXME: nodes that cannot be reached need to be isolated either automatically or
// by an administrator
co_await sleep_abortable(_ring_delay, _as);
node = retake_node(co_await start_operation(), node.id);
}
switch(node.rs->state) {
case node_state::bootstrapping: {
@@ -2360,6 +2361,7 @@ class topology_coordinator {
// Let's wait for the ring delay for those writes to complete and new topology to propagate
// before continuing.
co_await sleep_abortable(_ring_delay, _as);
node = retake_node(co_await start_operation(), node.id);
}
// Tell the node to shut down.