test: fix topology_custom/test_raft_recovery_stuck flakiness

The test performs consecutive schema changes in RECOVERY mode. The
second change relies on the first. However, the driver might route the
changes to different servers, and we don't have group 0 to guarantee
linearizability. We must rely on the coordinator of the first change to
push the schema mutations to the other servers before returning, but
that only happens if it sees the other servers as alive when it performs
the schema change. The test didn't guarantee that. Fix this.

Fixes scylladb/scylladb#20791

Should be backported to all branches containing this test to reduce
flakiness.

Closes scylladb/scylladb#20792
2026-05-13 11:22:01 +00:00 · 2024-09-24 13:13:33 +02:00
parent d16ea0afd6
commit 69b4769418
1 changed file with 4 additions and 0 deletions
--- a/test/topology_custom/test_raft_recovery_stuck.py
+++ b/test/topology_custom/test_raft_recovery_stuck.py
@@ -79,6 +79,10 @@ async def test_recover_stuck_raft_recovery(request, manager: ManagerClient):
    logging.info(f"Restarting {others}")
    await manager.rolling_restart(others)

+    # Prevent scylladb/scylladb#20791
+    logging.info(f"Wait until {srv1} sees {others} as alive")
+    await manager.server_sees_others(srv1.server_id, len(others))
+
    logging.info(f"{others} restarted, waiting until driver reconnects to them")
    hosts = await wait_for_cql_and_get_hosts(cql, others, time.time() + 60)
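
The wait added in this patch blocks until srv1 considers the restarted servers alive. As a rough illustration of that pattern only, the sketch below polls an async condition until it holds or a deadline passes. `wait_until`, `FakeNode`, and `FakeNode.sees_others` are hypothetical names invented for this example; this is not how `ManagerClient.server_sees_others` is implemented, only a minimal model of "wait for liveness before issuing the dependent schema change".

```python
import asyncio
import time
from typing import Awaitable, Callable


async def wait_until(predicate: Callable[[], Awaitable[bool]],
                     deadline: float,
                     period: float = 0.1) -> None:
    """Poll `predicate` until it returns True or `deadline` (a time.time() value) passes."""
    while True:
        if await predicate():
            return
        if time.time() >= deadline:
            raise TimeoutError("condition not met before deadline")
        await asyncio.sleep(period)


class FakeNode:
    """Hypothetical stand-in for one server's view of which peers are alive."""

    def __init__(self) -> None:
        self.alive_peers: set[str] = set()

    async def sees_others(self, count: int) -> bool:
        # True once this node considers at least `count` peers alive.
        return len(self.alive_peers) >= count


async def demo() -> int:
    node = FakeNode()

    async def peers_come_up() -> None:
        # Simulate restarted peers being marked alive one by one.
        for peer in ("srv2", "srv3"):
            await asyncio.sleep(0.05)
            node.alive_peers.add(peer)

    task = asyncio.create_task(peers_come_up())
    # Only proceed (e.g. to a schema change) once both peers are seen as alive.
    await wait_until(lambda: node.sees_others(2), time.time() + 5)
    await task
    return len(node.alive_peers)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Without such a wait, the dependent step could run while the node still regards its peers as down, which is the race the commit message describes.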