Merge 'tasks: do not fail the wait request if rpc fails' from Aleksandra Martyniuk

During decommission, we first mark the topology request as done, then shut
down the node, and only in later steps remove the node from the topology.
Thus, a finished request does not imply that the node has been removed from
the topology.
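A minimal sketch of this ordering, assuming a toy single-process model.
cluster_model, mark_request_done, shut_down, remove_from_topology and
in_unsafe_window are hypothetical names, not Scylla code. The model only
shows that there is a window in which the request is done while the node is
still listed in the topology but already down.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical, heavily simplified model of the decommission ordering; the
// real topology coordinator has more states than this.
struct cluster_model {
    std::set<std::string> topology;  // nodes still listed in the topology
    std::set<std::string> alive;     // nodes currently up
    bool request_done = false;       // status of the topology request
};

// The three steps, in the order described above.
void mark_request_done(cluster_model& c) { c.request_done = true; }
void shut_down(cluster_model& c, const std::string& node) { c.alive.erase(node); }
void remove_from_topology(cluster_model& c, const std::string& node) { c.topology.erase(node); }

// True in the window where a waiter already sees the request as finished,
// but `node` is still a topology member that cannot answer RPCs.
bool in_unsafe_window(const cluster_model& c, const std::string& node) {
    return c.request_done && c.topology.count(node) == 1 && c.alive.count(node) == 0;
}
```

Calling mark_request_done and then shut_down on "n3" makes
in_unsafe_window(c, "n3") true; it becomes false again once
remove_from_topology runs.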

Because of this, node_ops_virtual_task::wait may hit a connection exception
while gathering children from the whole cluster: the node is still in the
topology even though it is down.

Modify the get_children method to ignore the exception and warn
about the failure instead.
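A sketch of the fix in plain C++, assuming the per-node RPC can be modeled
as a callable that throws when the node is unreachable. gather_children,
fetch_children_rpc and the string ids are hypothetical; the real code is
Seastar-based, and its exact exception handling and log format may differ.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins: node addresses and child-task ids are plain strings,
// and the RPC is a callable that throws when the node cannot be reached.
using node_id = std::string;
using task_id = std::string;
using fetch_children_rpc = std::function<std::vector<task_id>(const node_id&)>;

// Ask every node in `topology` for the children of a task. A node whose RPC
// throws (e.g. one already shut down but not yet removed from the topology)
// is logged and skipped instead of failing the whole request.
std::vector<task_id> gather_children(const std::vector<node_id>& topology,
                                     const fetch_children_rpc& rpc) {
    std::vector<task_id> children;
    for (const auto& node : topology) {
        try {
            auto part = rpc(node);
            children.insert(children.end(), part.begin(), part.end());
        } catch (const std::exception& e) {
            std::cerr << "warn: failed to get children from " << node
                      << ": " << e.what() << '\n';
        }
    }
    return children;
}
```

The try/catch sits inside the loop rather than around it, so an unreachable
node costs only its own children, and results already collected from the
other nodes are kept.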

Keep a token_metadata_ptr in get_children to prevent the topology from changing.
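A sketch of why holding the pointer helps, assuming token_metadata_ptr
behaves like a shared pointer to an immutable topology version.
topology_snapshot, topology_holder and make_topology are hypothetical, and
std::shared_ptr stands in for Scylla's own pointer type. A reader copies the
pointer once and iterates that version, so a newly published topology cannot
change the node list underneath it; how the real topology coordinator treats
versions that are still referenced is not modeled here.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical immutable topology version.
struct topology_snapshot {
    long version;
    std::vector<std::string> nodes;
};
// Shared pointer to const data, mirroring the idea behind token_metadata_ptr.
using topology_ptr = std::shared_ptr<const topology_snapshot>;

topology_ptr make_topology(long version, std::vector<std::string> nodes) {
    return std::make_shared<topology_snapshot>(topology_snapshot{version, std::move(nodes)});
}

class topology_holder {
    topology_ptr _current;
public:
    explicit topology_holder(topology_ptr initial) : _current(std::move(initial)) {}
    // Readers copy the pointer; the version they hold is never mutated.
    topology_ptr get() const { return _current; }
    // Writers publish a new version instead of modifying the old one.
    void publish(topology_ptr next) { _current = std::move(next); }
};
```

For example, a reader that called get() before a publish() keeps seeing the
old node list, while later get() calls return the new version.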

Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-867

Needs backports to all versions

Closes scylladb/scylladb#29035

* github.com:scylladb/scylladb:
  tasks: fix indentation
  tasks: do not fail the wait request if rpc fails
  tasks: pass token_metadata_ptr to task_manager::virtual_task::impl::get_children

(cherry picked from commit 2e47fd9f56)

Closes scylladb/scylladb#29531
Botond Dénes
2026-03-19 10:03:18 +02:00
parent 42edeee977
commit 2eda427b96
5 changed files with 18 additions and 42 deletions


@@ -91,7 +91,7 @@ future<std::optional<tasks::task_status>> node_ops_virtual_task::get_status_help
 .entity = "",
 .progress_units = "",
 .progress = tasks::task_manager::task::progress{},
-.children = started ? co_await get_children(get_module(), id, [&gossiper = _ss.gossiper()] (gms::inet_address addr) { return gossiper.is_alive(addr); }) : std::vector<tasks::task_identity>{}
+.children = started ? co_await get_children(get_module(), id, _ss.get_token_metadata_ptr()) : std::vector<tasks::task_identity>{}
 };
 }