test/encryption: wait for topology convergence after abrupt restart

test_reboot uses a custom restart function that SIGKILLs and restarts nodes sequentially. After all nodes were back up, the test proceeded directly to reads after wait_for_cql_and_get_hosts(), which only confirms CQL reachability.

While a node is being restarted, other nodes might execute global token metadata barriers, which advance the topology fence version. The restarted node has to learn about the new version before it can send reads/writes to the other nodes. The test issues reads as soon as the CQL port is open, which might happen before the last restarted node learns of the latest topology version. If this node acts as a coordinator for reads/writes before this happens, they will fail, because the other nodes reject operations that carry an outdated topology fence version.

Fix this by replacing wait_for_cql_and_get_hosts() on the abrupt-restart path with the more robust get_ready_cql(), which makes sure servers see each other before refreshing the CQL connection. This should ensure that nodes have exchanged gossip and converged on topology state before any reads are executed. The rolling_restart() path is unaffected as it handles this internally.

Fixes: SCYLLADB-557

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Closes scylladb/scylladb#29211

(cherry picked from commit 854c374ebf)

Closes scylladb/scylladb#29260
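The fencing failure described in the commit message can be illustrated with a toy model. This is only a sketch under stated assumptions: `Node`, `StaleTopologyError`, `handle_read`, `coordinate_read` and `simulate` are hypothetical names invented for illustration and are not part of ScyllaDB or its test framework. The real fencing logic is more involved.

```python
# Toy model only: all names here are hypothetical, not ScyllaDB's API.
from dataclasses import dataclass


class StaleTopologyError(Exception):
    """Raised when a request carries an outdated fence version."""


@dataclass
class Node:
    name: str
    fence_version: int = 0

    def handle_read(self, coordinator_version: int) -> str:
        # A replica rejects requests stamped with a version older than its fence.
        if coordinator_version < self.fence_version:
            raise StaleTopologyError(
                f"{self.name}: got version {coordinator_version}, "
                f"fence is {self.fence_version}"
            )
        return "ok"

    def coordinate_read(self, replica: "Node") -> str:
        # The coordinator stamps the request with the version it currently knows.
        return replica.handle_read(self.fence_version)


def simulate() -> tuple[bool, str]:
    a, b = Node("a"), Node("b")
    # While b is down, a barrier advances the fence version on a.
    a.fence_version = 1
    # b is back and accepts CQL, but has not yet learned version 1.
    try:
        b.coordinate_read(a)
        rejected_before = False
    except StaleTopologyError:
        rejected_before = True
    # Once b has converged on the latest version, the same read succeeds.
    b.fence_version = a.fence_version
    return rejected_before, b.coordinate_read(a)
```

In this model the read coordinated by the freshly restarted node is rejected until it catches up, which is the window the test previously hit by reading as soon as CQL was reachable.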
2026-04-19 16:15:07 +00:00 · 2026-03-24 12:13:20 +02:00
parent 666d0440f1
commit 253fa9519f
1 changed file with 1 addition and 1 deletion
--- a/test/cluster/test_encryption.py
+++ b/test/cluster/test_encryption.py
@@ -177,7 +177,7 @@ async def _smoke_test(manager: ManagerClient, key_provider: KeyProviderFactory,
            # restart the cluster
            if restart:
                await restart(manager, servers, cfs)
-                await wait_for_cql_and_get_hosts(cql, servers, time.time() + 60)
+                cql, _ = await manager.get_ready_cql(servers)
            else:
                await manager.rolling_restart(servers)
            for table_name in cfs: