Files
scylladb/service
Asias He 9a873bf4b3 storage_service: Wait for normal state handler to finish in bootstrap
In storage_service::handle_state_normal, storage_service::notify_joined
will be called which drops the rpc connections to the node becomes
normal. This causes rpc calls with that node fail with
seastar::rpc::closed_error error.

Consider this:

- n1 in the cluster
- n2 is added to join the cluster
- n2 sees n1 is in normal status
- n2 starts bootstrap process
- notify_joined on n2 closes rpc connection to n1 in the middle of
  bootstrap
- n2 fails to bootstrap

For example, during bootstrap with RBNO, we saw repair failed in a
test that sets ring_delay to zero and does not wait for gossip to
settle.

repair - repair[9cd0dbf8-4bca-48fc-9b1c-d9e80d0313a2]: sync data for
keyspace=system_distributed_everywhere, status=failed:
std::runtime_error ({shard 0: seastar::rpc::closed_error (connection is
closed)})

This patch fixes the race by waiting for the handle_state_normal handler
to finish before the bootstrap process.

Fixes #12764
Fixes #12956

(cherry picked from commit 53636167ca)
2023-04-30 18:58:28 +03:00
..