When a node is restarted, there is a race between gossip starts (other
nodes will mark this node up again and send requests) and the tokens are
replicated to other shards. Here is an example:
- n1, n2
- n2 is down, n1 think n2 is down
- n2 starts again, n2 starts gossip service, n1 thinks n2 is up and sends
reads/writes to n2, but n2 hasn't replicated the token_metadata to all
the shards.
- n2 complains:
token_metadata - sorted_tokens is empty in first_token_index!
token_metadata - sorted_tokens is empty in first_token_index!
token_metadata - sorted_tokens is empty in first_token_index!
token_metadata - sorted_tokens is empty in first_token_index!
token_metadata - sorted_tokens is empty in first_token_index!
token_metadata - sorted_tokens is empty in first_token_index!
storage_proxy - Failed to apply mutation from $ip#4: std::runtime_error
(sorted_tokens is empty in first_token_index!)
The code path looks like below:
0 stoarge_service::init_server
1 prepare_to_join()
2 add gossip application state of NET_VERSION, SCHEMA and so on.
3 _gossiper.start_gossiping().get()
4 join_token_ring()
5 _token_metadata.update_normal_tokens(tokens, get_broadcast_address());
6 replicate_to_all_cores().get()
7 storage_service::set_gossip_tokens() which adds the gossip application state of TOKENS and STATUS
The race talked above is at line 3 and line 6.
To fix, we can replicate the token_metadata early after it is filled
with the tokens read from system table before gossip starts. So that
when other nodes think this restarting node is up, the tokens are
already replicated to all the shards.
In addition, this patch also fixes the issue that other nodes might see
a node miss the TOKENS and STATUS application state in gossip if that
node failed in the middle of a restarting process, i.e., it is killed
after line 3 and before line 7. As a result we could not replace the
node.
Tests: update_cluster_layout_tests.py
Fixes: #4709Fixes: #4723
(cherry picked from commit 3b39a59135)