From ebdf5f9e55ce5cdf4fdc501eaddd221f1f4e62a4 Mon Sep 17 00:00:00 2001
From: Asias He
Date: Thu, 13 Aug 2020 10:16:46 +0800
Subject: [PATCH] gossip: Fix race between shutdown message handler and apply_state_locally

1. node1 is shut down
2. node1 sends a shutdown message to node2
3. node2 receives the gossip shutdown message, but the handler yields
4. node1 is restarted
5. node1 sends its new gossip endpoint_state to node2. node2 applies the
   state in apply_state_locally, which calls
   gossiper::handle_major_state_change and then gossiper::mark_alive
6. The shutdown message handler from step 3 resumes and sets the status
   of node1 to SHUTDOWN
7. The gossiper::mark_alive fiber from step 5 resumes and calls
   gossiper::real_mark_alive. node2 skips marking node1 as alive because
   the status of node1 is SHUTDOWN.

As a result, node1 is alive but node2 does not mark it as UP.

To fix this, serialize the two operations.

Fixes #7032

(cherry picked from commit e6ceec1685de2c08bd5220bb7cae52fcc012aa5c)
---
 gms/gossiper.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gms/gossiper.cc b/gms/gossiper.cc
index db69bafb0b..4fde83d22a 100644
--- a/gms/gossiper.cc
+++ b/gms/gossiper.cc
@@ -428,6 +428,7 @@ future<> gossiper::handle_shutdown_msg(inet_address from) {
         return make_ready_future<>();
     }
     return seastar::async([this, from] {
+        auto permit = this->lock_endpoint(from).get0();
         this->mark_as_shutdown(from);
     });
 }
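
The serialization described in the commit message can be illustrated with a minimal, standalone sketch. It is not the Scylla implementation: it assumes plain `std::thread` and `std::mutex` in place of Seastar fibers and `lock_endpoint`, and `ToyGossiper`, `EndpointState`, and `consistent` are hypothetical names introduced only for this example. The method names `handle_shutdown_msg` and `apply_state_locally` are borrowed from the patch, but their bodies here are simplified stand-ins.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical stand-ins for illustration; not the Scylla/Seastar types.
enum class Status { NORMAL, SHUTDOWN };

struct EndpointState {
    Status status = Status::NORMAL;
    bool alive = true;
};

class ToyGossiper {
    std::mutex map_mutex_;                        // protects the two maps below
    std::map<std::string, std::mutex> locks_;     // one lock per endpoint
    std::map<std::string, EndpointState> states_; // per-endpoint gossip state

    // std::map never relocates its elements, so the returned references
    // stay valid after map_mutex_ is released.
    std::mutex& lock_for(const std::string& ep) {
        std::lock_guard<std::mutex> g(map_mutex_);
        return locks_[ep];
    }
    EndpointState& state_for(const std::string& ep) {
        std::lock_guard<std::mutex> g(map_mutex_);
        return states_[ep];
    }

public:
    // Analogue of the patched handler: take the per-endpoint lock first,
    // so the yield cannot interleave with apply_state_locally.
    void handle_shutdown_msg(const std::string& from) {
        std::lock_guard<std::mutex> permit(lock_for(from));
        EndpointState& s = state_for(from);
        std::this_thread::yield();   // stands in for the handler yielding (step 3)
        s.status = Status::SHUTDOWN;
        s.alive = false;
    }

    // Analogue of applying a restarted node's state and then marking it alive.
    void apply_state_locally(const std::string& from) {
        std::lock_guard<std::mutex> permit(lock_for(from));
        EndpointState& s = state_for(from);
        s.status = Status::NORMAL;
        std::this_thread::yield();   // stands in for the mark_alive fiber yielding
        if (s.status != Status::SHUTDOWN) {
            s.alive = true;          // skipped if the status became SHUTDOWN
        }
    }

    // The invariant the race violates: a node is alive exactly when
    // its status is NORMAL.
    bool consistent(const std::string& ep) {
        std::lock_guard<std::mutex> permit(lock_for(ep));
        EndpointState& s = state_for(ep);
        return s.alive == (s.status == Status::NORMAL);
    }
};
```

Running `handle_shutdown_msg("node1")` and `apply_state_locally("node1")` on two threads can finish in either order, but with both holding the per-endpoint lock the result is always consistent: either SHUTDOWN and not alive, or NORMAL and alive. Removing the lock from either method would allow the interleaving in steps 5 to 7, where the status is overwritten between the two halves of `apply_state_locally`.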