storage_service: keep subscription to raft topology feature alive

The storage_service::track_upgrade_progress_to_topology_coordinator
function is supposed to wait on the SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES
cluster feature (among other things) before starting the
raft_state_monitor_fiber. The wait is realized by passing a callback to
feature::when_enabled which sets a shared_promise that is waited on by
the tracking fiber. If the feature is already enabled, when_enabled will
call the callback immediately. However, if it's not, then it will return
a non-null listener_registration object - as long as it is alive, the
callback is registered. The listener_registration object was not
assigned to a variable, so it was destroyed right after the
when_enabled call returned. That unregistered the callback, which was
therefore never invoked.

Due to that, if upgrade was requested but the current group0 leader
didn't have the SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES feature enabled
right after boot, the upgrade would not start until leadership moved
to a node which had that cluster feature enabled already on boot.
Moreover, the topology coordinator would not start on such a node
until the node was rebooted.

Fix the issue by assigning the subscription to a variable.
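
The lifetime problem described above can be reproduced outside of
Scylla with a small toy model. The sketch below is only an
illustration: toy_feature and its nested registration class are
hypothetical stand-ins, not the real feature and listener_registration
types, and a plain bool replaces the shared_promise. It assumes only
what the message states: the callback runs immediately if the feature
is already enabled, and otherwise stays registered only while the
returned handle is alive.

```cpp
#include <functional>
#include <iterator>
#include <list>
#include <utility>

// Hypothetical stand-in for a cluster feature (not the real Scylla type).
class toy_feature {
    using callback = std::function<void()>;
    std::list<callback> _listeners;
    bool _enabled = false;
public:
    // RAII handle: destroying it unregisters the callback it refers to.
    class registration {
        toy_feature* _f = nullptr;
        std::list<callback>::iterator _it{};
    public:
        registration() = default;  // "null" registration, owns nothing
        registration(toy_feature& f, std::list<callback>::iterator it)
            : _f(&f), _it(it) {}
        registration(registration&& o) noexcept
            : _f(std::exchange(o._f, nullptr)), _it(o._it) {}
        ~registration() {
            if (_f) {
                _f->_listeners.erase(_it);
            }
        }
    };

    // Calls cb right away if already enabled; otherwise registers it
    // for as long as the returned handle lives.
    registration when_enabled(callback cb) {
        if (_enabled) {
            cb();
            return {};
        }
        _listeners.push_back(std::move(cb));
        return registration(*this, std::prev(_listeners.end()));
    }

    void enable() {
        _enabled = true;
        for (auto& cb : _listeners) {
            cb();
        }
    }
};

// Pre-fix pattern: the handle is a temporary and dies at the end of the
// statement, so the callback is gone before enable() runs.
bool fired_when_discarded() {
    toy_feature f;
    bool fired = false;
    f.when_enabled([&] { fired = true; });
    f.enable();
    return fired;
}

// Post-fix pattern: the handle is kept in a local variable, so the
// callback is still registered when enable() runs.
bool fired_when_kept() {
    toy_feature f;
    bool fired = false;
    [[maybe_unused]] auto sub = f.when_enabled([&] { fired = true; });
    f.enable();
    return fired;
}
```

In this model fired_when_discarded() returns false and
fired_when_kept() returns true, which mirrors the tracking fiber
waiting forever before the fix and being woken up after it.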
Piotr Dulikowski
2024-03-27 08:20:10 +01:00
parent 04370dc8a4
commit 7ea6e1ec0a

@@ -1917,7 +1917,7 @@ future<> storage_service::track_upgrade_progress_to_topology_coordinator(sharded
     // First, wait for the feature to become enabled
     shared_promise<> p;
-    _feature_service.supports_consistent_topology_changes.when_enabled([&] () noexcept { p.set_value(); });
+    auto sub = _feature_service.supports_consistent_topology_changes.when_enabled([&] () noexcept { p.set_value(); });
     co_await p.get_shared_future(_group0_as);
     rtlogger.info("The cluster is ready to start upgrade to the raft topology. The procedure needs to be manually triggered. Refer to the documentation");