strong consistency/groups_manager: handle timeout in update() wait-for-leader loop

The wait-for-leader loop in groups_manager::update() uses abort_on_expiry
with a 60-second timeout. If the timeout fires, co_await w->future throws
an exception that propagates unhandled out of the server_control_op
coroutine, leaving the group in an indeterminate state.

Use coroutine::as_future so that the failure is returned as a failed
future instead of being thrown, then log a warning and break out of the
loop. The group is still reported as started (allowing other operations
to proceed) even if no leader was found within the timeout.
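A minimal, synchronous sketch of the control flow this fix introduces, written in standard C++ without Seastar. It assumes a blocking wait that throws on timeout. `as_outcome` is a hypothetical stand-in for `coroutine::as_future`: it captures the wait's exception as a value instead of letting it propagate. `update_loop`, `mutate_status`, `wait_outcome` and `describe` are likewise hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for seastar::coroutine::as_future: the wait's
// failure is captured as a value instead of being thrown to the caller.
struct wait_outcome {
    std::exception_ptr error; // null when the wait succeeded
    bool failed() const { return error != nullptr; }
};

wait_outcome as_outcome(const std::function<void()>& wait) {
    try {
        wait();
        return {};
    } catch (...) {
        return {std::current_exception()};
    }
}

// Hypothetical helper: render a captured (non-null) exception for logging.
std::string describe(const std::exception_ptr& p) {
    try {
        std::rethrow_exception(p);
    } catch (const std::exception& e) {
        return e.what();
    } catch (...) {
        return "unknown exception";
    }
}

// Hypothetical model of begin_mutate()'s result.
enum class mutate_status { ready, need_wait_for_leader };

// Retry begin_mutate() until no leader wait is needed. If a wait fails
// (e.g. its timeout fires), log it and leave the loop instead of letting
// the exception escape. Returns the number of waits attempted.
int update_loop(const std::function<mutate_status()>& begin_mutate,
                const std::function<void()>& wait_for_leader,
                std::ostream& log) {
    int waits = 0;
    while (true) {
        if (begin_mutate() != mutate_status::need_wait_for_leader) {
            break; // leader known: proceed with the mutation
        }
        ++waits;
        wait_outcome outcome = as_outcome(wait_for_leader);
        if (outcome.failed()) {
            log << "waiting for leader failed: " << describe(outcome.error) << '\n';
            break; // give up waiting rather than propagate the error
        }
    }
    return waits;
}
```

With a wait that throws `std::runtime_error("timed out")`, `update_loop` writes one log line and returns after a single wait. A successful wait simply loops back to retry `begin_mutate()`. In the actual Seastar code, the captured value is a failed `future` whose `failed()` and `get_exception()` are inspected instead.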
Petr Gusev
2026-05-27 12:05:24 +02:00
parent d922c43358
commit f2b1cbe998

@@ -409,7 +409,12 @@ void groups_manager::update(token_metadata_ptr new_tm) {
 auto srv = raft_server(state, state.gate->hold());
 auto res = srv.begin_mutate(aoe.abort_source());
 if (auto w = get_if<raft_server::need_wait_for_leader>(&res)) {
-    co_await std::move(w->future);
+    auto f = co_await coroutine::as_future(std::move(w->future));
+    if (f.failed()) {
+        logger.warn("update(): waiting for leader timed out for tablet {}, "
+                "group id {}: {}", tablet, id, f.get_exception());
+        break;
+    }
 } else {
     break;
 }