mirror of
https://github.com/scylladb/scylladb.git
synced 2026-05-13 11:22:01 +00:00
In storage_service::handle_state_normal, storage_service::notify_joined
will be called which drops the rpc connections to the node becomes
normal. This causes rpc calls with that node fail with
seastar::rpc::closed_error error.
Consider this:
- n1 in the cluster
- n2 is added to join the cluster
- n2 sees n1 is in normal status
- n2 starts bootstrap process
- notify_joined on n2 closes rpc connection to n1 in the middle of
bootstrap
- n2 fails to bootstrap
For example, during bootstrap with RBNO, we saw repair failed in a
test that sets ring_delay to zero and does not wait for gossip to
settle.
repair - repair[9cd0dbf8-4bca-48fc-9b1c-d9e80d0313a2]: sync data for
keyspace=system_distributed_everywhere, status=failed:
std::runtime_error ({shard 0: seastar::rpc::closed_error (connection is
closed)})
This patch fixes the race by waiting for the handle_state_normal handler
to finish before the bootstrap process.
Fixes #12764
Fixes #12956
(cherry picked from commit 53636167ca)