From 7ea6e1ec0ae4d651fefb888df5f0fff542d4a47c Mon Sep 17 00:00:00 2001
From: Piotr Dulikowski
Date: Wed, 27 Mar 2024 08:20:10 +0100
Subject: [PATCH] storage_service: keep subscription to raft topology feature alive

The storage_service::track_upgrade_progress_to_topology_coordinator
function is supposed to wait on the SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES
cluster feature (among other things) before starting the
raft_state_monitor_fiber. The wait is realized by passing a callback to
feature::when_enabled which sets a shared_promise that is waited on by
the tracking fiber. If the feature is already enabled, when_enabled will
call the callback immediately. However, if it's not, then it will return
a non-null listener_registration object - as long as it is alive, the
callback is registered.

The listener_registration object was not assigned to a variable which
caused it to be destroyed shortly after the when_enabled function
returns. Due to that, if upgrade was requested but the current group0
leader didn't have the SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES feature
enabled right after boot, the upgrade would not start until the leader
is changed to a node which has that cluster feature already enabled on
boot. Moreover, the topology coordinator would not start on such a node
until the node were rebooted.

Fix the issue by assigning the subscription to a variable.
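
The lifetime problem described above can be reproduced outside of
Scylla. The following is a minimal sketch using only the standard
library; the feature and registration classes and both helper
functions are hypothetical stand-ins written for illustration, not the
actual gms::feature or listener_registration code. It assumes only the
behaviour stated in this message: the callback runs immediately if the
feature is already enabled, and otherwise stays registered only while
the returned handle is alive.

```cpp
#include <cassert>
#include <functional>
#include <iterator>
#include <list>
#include <memory>
#include <utility>

// Hypothetical stand-in for a cluster feature (not the real Scylla class).
class feature {
    using callback = std::function<void()>;
    std::list<callback> _listeners;
    bool _enabled = false;
public:
    // RAII handle: the callback is registered only while this object lives.
    class registration {
        feature* _f;
        std::list<callback>::iterator _it;
    public:
        registration(feature& f, std::list<callback>::iterator it)
            : _f(&f), _it(it) {}
        registration(const registration&) = delete;
        registration& operator=(const registration&) = delete;
        // Unregisters the callback; the feature must outlive the handle.
        ~registration() { _f->_listeners.erase(_it); }
    };

    // Calls cb at once and returns null if the feature is already enabled;
    // otherwise registers cb and returns a handle that keeps it registered.
    std::unique_ptr<registration> when_enabled(callback cb) {
        if (_enabled) {
            cb();
            return nullptr;
        }
        _listeners.push_back(std::move(cb));
        return std::make_unique<registration>(*this, std::prev(_listeners.end()));
    }

    // Marks the feature as enabled and notifies the remaining listeners.
    void enable() {
        _enabled = true;
        for (auto& cb : _listeners) {
            cb();
        }
    }
};

// Pattern before the fix: the handle is a temporary that dies at the end
// of the statement, so the callback is unregistered before enable() runs.
bool notified_when_discarded() {
    feature f;
    bool fired = false;
    f.when_enabled([&] { fired = true; });
    f.enable();
    return fired;
}

// Pattern after the fix: the handle is kept in a local variable, so the
// callback is still registered when enable() runs.
bool notified_when_kept() {
    feature f;
    bool fired = false;
    auto sub = f.when_enabled([&] { fired = true; });
    f.enable();
    return fired;
}

// Checks both patterns side by side.
void run_demo() {
    assert(!notified_when_discarded());
    assert(notified_when_kept());
}
```

In this sketch only the second helper observes the notification. In an
API of one's own, marking such a function [[nodiscard]] (C++17) would
make the compiler warn when the handle is dropped; whether that suits
the real when_enabled is not something this sketch claims.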
---
 service/storage_service.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/service/storage_service.cc b/service/storage_service.cc
index 6f912c74db..69e2f19686 100644
--- a/service/storage_service.cc
+++ b/service/storage_service.cc
@@ -1917,7 +1917,7 @@ future<> storage_service::track_upgrade_progress_to_topology_coordinator(sharded
     // First, wait for the feature to become enabled
     shared_promise<> p;
-    _feature_service.supports_consistent_topology_changes.when_enabled([&] () noexcept { p.set_value(); });
+    auto sub = _feature_service.supports_consistent_topology_changes.when_enabled([&] () noexcept { p.set_value(); });
     co_await p.get_shared_future(_group0_as);
     rtlogger.info("The cluster is ready to start upgrade to the raft topology. The procedure needs to be manually triggered. Refer to the documentation");