Compare commits

...

1 Commit

Author SHA1 Message Date
Alex
3a901a1bf3 test/cluster/test_view_building_coordinator: start view-building nodes one by one to make the tests stronger
The test_node_operation_during_view_building() setup used servers_add() to
bring up all initial nodes concurrently. That is more aggressive than this test
needs, and it makes the setup sensitive to bootstrap/topology races and to
single-node startup failures; server_add() has notes about this case.
In the decommission case in particular, the test starts with 4 nodes and only
later exercises the node operation under test. When all 4 nodes are started
concurrently, a failure in one node during initial bootstrap can cause the whole
batch add to fail before the test even reaches the decommission step. This
showed up as "Failed to add servers", with later nodes timing out while waiting
for the topology/IP mapping after one of the early nodes shut down.
Switch the initial cluster setup to repeated server_add() calls. This keeps
the topology changes serialized, lets each node fully join before the next
one starts, and matches the actual needs of the test. The change does not alter
the scenario being tested; it only makes the test setup less fragile and easier
to diagnose when a node fails to start.
2026-03-22 12:08:34 +02:00
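The batch-add failure mode described above can be sketched with plain asyncio (the `bootstrap` coroutine is a hypothetical stand-in for a node's startup, not part of the test harness): when nodes are started concurrently with `asyncio.gather`, one node failing during bootstrap aborts the whole batch, just as a single startup failure made the entire `servers_add()` call fail.

```python
import asyncio

async def bootstrap(name):
    # Hypothetical node bootstrap; "n2" fails, like the one flaky node
    # described in the commit message.
    if name == "n2":
        raise RuntimeError(f"{name} failed during bootstrap")
    await asyncio.sleep(0)
    return name

async def concurrent_add(names):
    # Concurrent batch add (servers_add-style): gather propagates the
    # first exception, so one bad node fails the whole batch.
    return await asyncio.gather(*(bootstrap(n) for n in names))

try:
    asyncio.run(concurrent_add(["n1", "n2", "n3"]))
except RuntimeError as exc:
    print(exc)  # n2 failed during bootstrap
```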


@@ -352,9 +352,14 @@ async def test_node_operation_during_view_building(manager: ManagerClient, opera
     rack_layout = ["rack1", "rack2", "rack3"]
     property_file = [{"dc": "dc1", "rack": rack} for rack in rack_layout]
-    servers = await manager.servers_add(node_count, config={"enable_tablets": "true"},
-                                        cmdline=cmdline_loggers,
-                                        property_file=property_file)
+    servers = [
+        await manager.server_add(
+            config={"enable_tablets": "true"},
+            cmdline=cmdline_loggers,
+            property_file=server_property_file,
+        )
+        for server_property_file in property_file
+    ]
     cql, _ = await manager.get_ready_cql(servers)
     await manager.disable_tablet_balancing()
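The serialized pattern the diff switches to can be sketched in isolation (the `start_node` coroutine below is a hypothetical stand-in for `manager.server_add()`): a list comprehension with `await` inside runs the coroutines strictly one at a time, so each node finishes joining before the next one starts.

```python
import asyncio

async def start_node(name, order):
    # Hypothetical stand-in for manager.server_add(): record when this
    # node starts, then simulate its bootstrap work.
    order.append(name)
    await asyncio.sleep(0)
    return name

async def serialized_add(names):
    order = []
    # Each await completes before the next coroutine is even created,
    # mirroring the repeated server_add() calls in the diff.
    servers = [await start_node(n, order) for n in names]
    assert order == names  # nodes started strictly in sequence
    return servers

print(asyncio.run(serialized_add(["n1", "n2", "n3"])))  # ['n1', 'n2', 'n3']
```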