In legacy topology mode, on startup, a node will attempt to insert data
of the newest CDC generation into the legacy distributed tables. In case
of any errors, the operation will be retried until success in 60s
intervals. While the node waits for the operation to be retried, it
keeps a token_metadata_ptr instance. This is a problem for two reasons:
- The tmptr instance is used in a lambda which determines the cluster
size. This lambda is used to determine the consistency level when
inserting the generation to the distributed tables - if there is only
one node, CL=ONE should be used instead of CL=QUORUM. The tmptr is
immutable so it can technically happen the the cluster is shrinked
while the code waits for the generation to be inserted.
- Token metadata instance keeps a version tracker that which prevents
topology operations from proceeding while the tracker exists. This is
a very niche problem, but it might happen that a leftover instance of
token metadata held by update_streams_description might delay a
topology operation which happens after upgrade to raft topology
happens. This actually slows down the test which simulates upgrade to
raft topology getting stuck (to be introduced in later commits).
Instead of capturing a token_metadata_ptr instance, capture a reference
to shared_token_metadata and use a freshly issued token_metadata_ptr
when computing the cluster size in order to choose the consistency
level.