test: topology: wait for token ring/group 0 consistency after decommission

There was a check for immediate consistency after a decommission
operation has finished in one of the tests, but it turns out that also
after decommission it might take some time for token ring to be updated
on other nodes. Replace the check with a wait.

Also do the wait in another test that performs a sequence of
decommissions. We won't attempt to start another decommission until
every node learns that the previously decommissioned node has left.

Closes #12686
This commit is contained in:
Kamil Braun
2023-01-31 17:03:32 +01:00
committed by Nadav Har'El
parent 1b2140e416
commit 40142a51d0

View File

@@ -68,9 +68,9 @@ async def check_token_ring_and_group0_consistency(manager: ManagerClient) -> Non
async def wait_for_token_ring_and_group0_consistency(manager: ManagerClient, deadline: float) -> None:
"""Weaker version of the above check; the token ring is not immediately updated
after bootstrap/replace - the normal tokens of the new node propagate through gossip.
Take this into account and wait for the equality condition to hold, with a timeout.
"""Weaker version of the above check; the token ring is not immediately updated after
bootstrap/replace/decommission - the normal tokens of the new node propagate through gossip.
Take this into account and wait for the equality condition to hold, with a timeout.
"""
servers = await manager.running_servers()
for srv in servers:
@@ -159,7 +159,7 @@ async def test_decommission_node_add_column(manager, random_tables):
await manager.api.enable_injection(
bootstrapped_server.ip_addr, 'storage_service_decommission_prepare_handler_sleep', one_shot=True)
await manager.decommission_node(decommission_target.server_id)
await check_token_ring_and_group0_consistency(manager)
await wait_for_token_ring_and_group0_consistency(manager, time.time() + 30)
await table.add_column()
await random_tables.verify_schema()
@@ -226,6 +226,7 @@ async def test_nodes_with_different_smp(manager: ManagerClient, random_tables: R
servers = await manager.running_servers()
for s in servers[:-1]:
await manager.decommission_node(s.server_id)
await wait_for_token_ring_and_group0_consistency(manager, time.time() + 30)
logger.info(f'Adding --smp=4 server')
await manager.server_add(cmdline=['--smp', '4'])