storage_service: Update peer table only if the peer is part of the ring

Consider the following procedure:

- The cluster has three nodes: n1, n2, n3

- n3 is network partitioned from the cluster

- n4 replaces n3

- n3 has the network partition fixed

- n1 learns that n3 is in NORMAL status and calls
  storage_service::handle_state_normal, which in turn calls
  update_peer_info; all columns in system.peers except the tokens
  column are written

- n1 restarts before it can figure out that n4 is the new owner and
  delete the entry for n3 in system.peers

- n3 is removed from gossip automatically by all the nodes in the
  cluster because they detect the collision and remove n3

- n1 restarts, leaving the entry in system.peers for n3 forever

To fix this, update the peer table only if the node is part of the
ring, that is, only when it still owns tokens and is not scheduled for
removal.
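
The effect of moving the call can be illustrated with a toy model. This
is only a sketch, not the Scylla code: peer_row, peers_table, the
simplified handle_state_normal signature and leaves_stale_row are all
hypothetical names, and the model assumes that n3 ends up with no owned
tokens once n4 has replaced it.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for one row of system.peers.
struct peer_row {
    bool has_info = false;    // non-token columns (written by update_peer_info)
    bool has_tokens = false;  // tokens column (written by update_tokens)
};

// Hypothetical stand-in for the system.peers table: endpoint -> row.
using peers_table = std::map<std::string, peer_row>;

// Simplified model of handle_state_normal for a single endpoint.
// owned_tokens: tokens the endpoint still owns after collision resolution.
// marked_for_removal: models endpoints_to_remove.count(endpoint) != 0.
// guarded: false models the code before this patch, true the code after it.
void handle_state_normal(peers_table& peers, const std::string& endpoint,
                         const std::set<long>& owned_tokens,
                         bool marked_for_removal, bool guarded) {
    if (!guarded) {
        // Before the patch: peer info is written unconditionally.
        peers[endpoint].has_info = true;
    }
    if (!owned_tokens.empty() && !marked_for_removal) {
        if (guarded) {
            // After the patch: peer info is written only for ring members.
            peers[endpoint].has_info = true;
        }
        peers[endpoint].has_tokens = true;
    }
}

// Replays the scenario from the commit message on n1 and reports whether
// a row for n3 is left behind in the (model) peers table.
bool leaves_stale_row(bool guarded) {
    peers_table peers;
    // Assumption: n4 has replaced n3, so n3 owns no tokens when it shows
    // up again in NORMAL status.
    handle_state_normal(peers, "n3", {}, false, guarded);
    // n1 restarts at this point, before any cleanup could delete the row.
    return peers.count("n3") > 0;
}
```

In this model, leaves_stale_row(false) reports the leftover row for n3,
while leaves_stale_row(true) reports none, because no write happens for
an endpoint that owns no tokens.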

Fixes #8729

Closes #8742
Asias He
2021-05-28 09:35:04 +08:00
committed by Tomasz Grabiec
parent b6c49fd320
commit e86d39faf0

@@ -911,8 +911,6 @@ void storage_service::handle_state_normal(inet_address endpoint) {
if (tmptr->is_member(endpoint)) {
slogger.info("Node {} state jump to normal", endpoint);
}
-update_peer_info(endpoint);
std::unordered_set<inet_address> endpoints_to_remove;
auto do_remove_node = [&] (gms::inet_address node) {
@@ -996,6 +994,7 @@ void storage_service::handle_state_normal(inet_address endpoint) {
}
slogger.debug("handle_state_normal: endpoint={} owned_tokens = {}", endpoint, owned_tokens);
if (!owned_tokens.empty() && !endpoints_to_remove.count(endpoint)) {
+update_peer_info(endpoint);
db::system_keyspace::update_tokens(endpoint, owned_tokens).then_wrapped([endpoint] (auto&& f) {
try {
f.get();