storage_service: check if we know other nodes' tokens when joining ring
If we are a seed node (but not the only one) or we set auto_bootstrap=off, it might happen due to misconfiguration or a network partition that we don't know other nodes' tokens at the end of the join_token_ring function, when we go into the NORMAL status, finishing the joining process. CDC however requires that we know other nodes' tokens at this point: we need them to correctly create a new CDC generation. This commit adds a check which prevents the node from starting if that's not the case. If the check fails, the node first tries waiting a bit until it learns about the tokens or timeouts.
This commit is contained in:
2
seastar
2
seastar
Submodule seastar updated: 44cf127ee9...65980a9b30
@@ -640,6 +640,34 @@ void storage_service::maybe_start_sys_dist_ks() {
|
||||
_sys_dist_ks.invoke_on_all(&db::system_distributed_keyspace::start).get();
|
||||
}
|
||||
|
||||
/* If we're not a seed node or there are seeds other than us,
|
||||
* wait until we contact any of them and learn what tokens are used by other nodes in the cluster,
|
||||
* with a timeout.
|
||||
*
|
||||
* Returns true if we're the only seed or we've managed to learn the other nodes' tokens before timeouting.
|
||||
*/
|
||||
// Run in seastar::async context.
|
||||
static bool wait_for_other_tokens(const locator::token_metadata& tm,
|
||||
const std::set<inet_address>& seeds, inet_address my_ep, abort_source& as) {
|
||||
if (seeds.size() == 1 && seeds.count(my_ep)) {
|
||||
return true;
|
||||
}
|
||||
|
||||
int retries = 30;
|
||||
while (retries--) {
|
||||
auto& tok_to_ep = tm.get_token_to_endpoint();
|
||||
if (std::any_of(tok_to_ep.begin(), tok_to_ep.end(),
|
||||
[my_ep] (const std::pair<token, inet_address>& p) { return p.second != my_ep; })) {
|
||||
return true;
|
||||
}
|
||||
|
||||
slogger.warn("Couldn't retrieve other nodes' tokens. Retrying again in one second...");
|
||||
sleep_abortable(std::chrono::seconds(1), as).get();
|
||||
}
|
||||
|
||||
return false;
|
||||
}
|
||||
|
||||
// Runs inside seastar::async context
|
||||
void storage_service::join_token_ring(int delay) {
|
||||
// This function only gets called on shard 0, but we want to set _joined
|
||||
@@ -816,6 +844,21 @@ void storage_service::join_token_ring(int delay) {
|
||||
// Finally, if we're simply a new node joining the ring but skipping bootstrapping,
|
||||
// we'll propose a new generation just as normally bootstrapping nodes do.
|
||||
|
||||
if (!db::system_keyspace::bootstrap_complete()) {
|
||||
// Check if we managed to learn about other nodes' tokens (if there are other nodes).
|
||||
// We might have not due to misconfiguration, e.g. skipping the "wait for gossip to settle" phase,
|
||||
// or due to a network partition which didn't allow us to contact other seeds.
|
||||
// But CDC requires that we know other nodes' tokens - they are needed to make a new generation of streams.
|
||||
if (!wait_for_other_tokens(_token_metadata, _gossiper.get_seeds(), get_broadcast_address(), _abort_source)) {
|
||||
throw std::runtime_error(format(
|
||||
"Can't boot this node because we weren't able to retrieve other nodes' tokens,"
|
||||
" which is required for CDC to work. Make sure that:\n"
|
||||
" - you don't use the \"--skip-wait-for-gossip-to-settle\" flag,\n"
|
||||
" - it is possible to contact other nodes (fix any network issues).\n"
|
||||
"Configured seeds: {}", _gossiper.get_seeds()));
|
||||
}
|
||||
}
|
||||
|
||||
if (!db().local().is_replacing()
|
||||
&& (!db::system_keyspace::bootstrap_complete()
|
||||
|| cdc::should_propose_first_generation(get_broadcast_address(), _gossiper))) {
|
||||
|
||||
Reference in New Issue
Block a user