storage_service: check if we know other nodes' tokens when joining ring

If we are a seed node (but not the only one) or we set
auto_bootstrap=off, it might happen due to misconfiguration or a network
partition that we don't know other nodes' tokens at the end of the
join_token_ring function, when we go into the NORMAL status, finishing
the joining process.

CDC however requires that we know other nodes' tokens at this point:
we need them to correctly create a new CDC generation.

This commit adds a check which prevents the node from starting if that's
not the case. If the check fails, the node first tries waiting a bit until
it learns about the tokens or timeouts.
This commit is contained in:
Kamil Braun
2020-01-10 17:57:47 +01:00
parent bd42b10df1
commit 7cb6ac33f5
2 changed files with 44 additions and 1 deletions

Submodule seastar updated: 44cf127ee9...65980a9b30

View File

@@ -640,6 +640,34 @@ void storage_service::maybe_start_sys_dist_ks() {
_sys_dist_ks.invoke_on_all(&db::system_distributed_keyspace::start).get();
}
/* If we're not a seed node or there are seeds other than us,
* wait until we contact any of them and learn what tokens are used by other nodes in the cluster,
* with a timeout.
*
* Returns true if we're the only seed or we've managed to learn the other nodes' tokens before timeouting.
*/
// Run in seastar::async context.
static bool wait_for_other_tokens(const locator::token_metadata& tm,
const std::set<inet_address>& seeds, inet_address my_ep, abort_source& as) {
if (seeds.size() == 1 && seeds.count(my_ep)) {
return true;
}
int retries = 30;
while (retries--) {
auto& tok_to_ep = tm.get_token_to_endpoint();
if (std::any_of(tok_to_ep.begin(), tok_to_ep.end(),
[my_ep] (const std::pair<token, inet_address>& p) { return p.second != my_ep; })) {
return true;
}
slogger.warn("Couldn't retrieve other nodes' tokens. Retrying again in one second...");
sleep_abortable(std::chrono::seconds(1), as).get();
}
return false;
}
// Runs inside seastar::async context
void storage_service::join_token_ring(int delay) {
// This function only gets called on shard 0, but we want to set _joined
@@ -816,6 +844,21 @@ void storage_service::join_token_ring(int delay) {
// Finally, if we're simply a new node joining the ring but skipping bootstrapping,
// we'll propose a new generation just as normally bootstrapping nodes do.
if (!db::system_keyspace::bootstrap_complete()) {
// Check if we managed to learn about other nodes' tokens (if there are other nodes).
// We might have not due to misconfiguration, e.g. skipping the "wait for gossip to settle" phase,
// or due to a network partition which didn't allow us to contact other seeds.
// But CDC requires that we know other nodes' tokens - they are needed to make a new generation of streams.
if (!wait_for_other_tokens(_token_metadata, _gossiper.get_seeds(), get_broadcast_address(), _abort_source)) {
throw std::runtime_error(format(
"Can't boot this node because we weren't able to retrieve other nodes' tokens,"
" which is required for CDC to work. Make sure that:\n"
" - you don't use the \"--skip-wait-for-gossip-to-settle\" flag,\n"
" - it is possible to contact other nodes (fix any network issues).\n"
"Configured seeds: {}", _gossiper.get_seeds()));
}
}
if (!db().local().is_replacing()
&& (!db::system_keyspace::bootstrap_complete()
|| cdc::should_propose_first_generation(get_broadcast_address(), _gossiper))) {