mirror of
https://github.com/scylladb/scylladb.git
synced 2026-05-13 03:12:13 +00:00
This commit is the first part of the fix for #14675. The issue is about the test test_joining_old_node_fails failing occasionally with experimental_features: [consistent-topology-changes]. The next commit contains the fix for it; here we solve a pre-existing gossiper problem which we stumble upon after that fix.

The local generation for addr may have been increased since the current node sent its initial SYN. Comparing versions across different generations in get_state_for_version_bigger_than could result in losing some app states with smaller versions.

More specifically, consider a cluster with nodes .1, .2, .3, where .3 has .1 and .2 as seeds and .2 has .1 as a seed (states below are written as generation.version). Suppose .2 receives a SYN from .3 before its gossiper starts, and it has 0.24 for .1 in endpoint_states. The digest from .3 contains 0.25 as the version for .1, so examine_gossiper produces .1->0.24 as a digest, and this digest is sent to .3 as part of the ack. Before processing this ack, .3 processes an ack from .1 (Scylla sends SYN to many nodes) and updates its endpoint_states accordingly, so now it has 100500.32 for .1. Then we get to do_send_ack2_msg and call get_state_for_version_bigger_than(.1, 24). This returns only the properties which have version > 24, ignoring many of them with smaller versions which have been received from .1. Also, get_state_for_version_bigger_than updates the generation (it copies get_heart_beat_state from .3), so when we apply the ack in handle_ack2_msg at .2 we update the generation, and from then on the skipped app states will only be updated on .2 if somebody changes them and increments their version.

Cassandra's behaviour is the same in this case (see https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/gms/GossipDigestAckVerbHandler.java#L86). This is probably less of a problem for them since most of the time they send only one SYN in one gossiper round (save for unreachable nodes), so there is less room for conflicts.
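
The scenario above can be reproduced with a small standalone model. This is only a sketch: the types endpoint_state and versioned_value, the helpers naive_delta and generation_aware_delta, and the STATUS/LOAD entries with versions 10 and 30 are hypothetical simplifications made up for illustration, not the gossiper's real classes or data. The generation-aware variant shows one possible way to avoid the loss (send everything when the local generation is newer than the one in the digest); it is an assumption, not necessarily the exact change made here.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical, simplified stand-ins for the gossiper's per-endpoint state.
struct versioned_value {
    std::string value;
    int32_t version;
};

struct endpoint_state {
    int32_t generation = 0;
    int32_t heartbeat_version = 0;
    std::map<std::string, versioned_value> app_states;
};

// Models the problematic behaviour: only versions are compared, and the
// local generation is copied into the result regardless.
endpoint_state naive_delta(const endpoint_state& local, int32_t digest_version) {
    endpoint_state delta;
    delta.generation = local.generation;
    delta.heartbeat_version = local.heartbeat_version;
    for (const auto& [key, vv] : local.app_states) {
        if (vv.version > digest_version) {
            delta.app_states.emplace(key, vv);
        }
    }
    return delta;
}

// Assumed mitigation: if the local generation is newer than the generation
// the digest refers to, versions are not comparable, so send every app state.
endpoint_state generation_aware_delta(const endpoint_state& local,
                                      int32_t digest_generation,
                                      int32_t digest_version) {
    if (local.generation > digest_generation) {
        return local;
    }
    return naive_delta(local, digest_version);
}

// State that .3 holds for .1 after processing the ack from .1 (100500.32).
endpoint_state make_example_state() {
    endpoint_state s;
    s.generation = 100500;
    s.heartbeat_version = 32;
    s.app_states["STATUS"] = {"NORMAL", 10}; // version below 24
    s.app_states["LOAD"] = {"1.0", 30};      // version above 24
    return s;
}

// Replays the digest .1->0.24 received from .2 against both variants.
void check_scenario() {
    const endpoint_state local = make_example_state();

    // Versions from generation 0 and generation 100500 get compared:
    // STATUS (version 10) is silently dropped.
    const endpoint_state naive = naive_delta(local, 24);
    assert(naive.app_states.count("STATUS") == 0);
    assert(naive.app_states.count("LOAD") == 1);
    assert(naive.generation == 100500);

    // The generation changed, so all app states are sent.
    const endpoint_state aware = generation_aware_delta(local, 0, 24);
    assert(aware.app_states.size() == 2);
}
```

Calling check_scenario() from a main function exercises both variants. In the naive case .2 receives generation 100500 together with an incomplete set of app states, which is exactly the state from which it cannot recover until the skipped entries change again.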