scylladb/test/cluster
Avi Kivity f2ab911a46 Merge 'test/cluster: fix server-starting functions to wait for all ports' from Nadav Har'El
This series fixes a recurring source of flaky tests in the cluster test suite.

When a test configures Scylla to listen on non-default ports (e.g. a custom Alternator port, proxy-protocol port or shard-aware port), server_add() and server_start() would declare the server ready by polling the hardcoded standard CQL and Alternator ports. Those ports can become available slightly before the custom ports finish binding, so the test could start using the custom port before it was open — causing intermittent failures.

Until now, the fix for each affected test was to pass `expected_server_up_state=ServerUpState.SERVING` explicitly, which makes the start functions wait for Scylla's sd_notify("STATUS=serving") signal instead of polling ports. That signal is sent only after all configured listeners are fully open, so it is always the right readiness signal regardless of the port configuration. This workaround was applied again in PR #29737 and would keep being needed for every new test that uses a non-default port.
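The race and the signal-based alternative can be illustrated with a small simulation. This is only a sketch: `FakeServer`, `wait_by_polling_standard_port()` and `wait_for_serving()` are hypothetical stand-ins, not the real test/pylib code, and the asyncio events stand in for real listening sockets and for the sd_notify message.

```python
import asyncio


class FakeServer:
    """Hypothetical stand-in for a Scylla node; events model open listeners."""

    def __init__(self) -> None:
        self.standard_port_open = asyncio.Event()
        self.custom_port_open = asyncio.Event()
        self.serving = asyncio.Event()  # models sd_notify("STATUS=serving")

    async def start(self) -> None:
        self.standard_port_open.set()   # the standard port binds first
        await asyncio.sleep(0.05)       # the custom listener binds a bit later
        self.custom_port_open.set()
        self.serving.set()              # sent only after all listeners are open


async def wait_by_polling_standard_port(server: FakeServer) -> bool:
    """Old behaviour (simplified): ready as soon as the standard port answers."""
    await server.standard_port_open.wait()
    return server.custom_port_open.is_set()  # is the custom port usable yet?


async def wait_for_serving(server: FakeServer) -> bool:
    """New default (simplified): ready only once the serving signal arrives."""
    await server.serving.wait()
    return server.custom_port_open.is_set()


async def demo():
    old_server, new_server = FakeServer(), FakeServer()
    old_task = asyncio.create_task(old_server.start())
    new_task = asyncio.create_task(new_server.start())
    old_ready = await wait_by_polling_standard_port(old_server)
    new_ready = await wait_for_serving(new_server)
    await asyncio.gather(old_task, new_task)
    return old_ready, new_ready
```

In this model, `asyncio.run(demo())` shows the custom port still closed when the polling strategy declares the server ready, and open when the signal-based strategy does.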

This series makes ServerUpState.SERVING the default at every level of the server start/add call stack so no test needs to remember it:

* Make server_add(), servers_add(), server_start() et al. all default to ServerUpState.SERVING.
* Document that server_add/server_start wait for all ports to be ready, so future test authors understand what the functions guarantee.
* Remove now-redundant expected_server_up_state=SERVING from existing tests.
* A small optimization: Fix check_serving_notification() returning False on first completion. When the sd_notify future completed, the function correctly updated _received_serving but still returned False, wasting one 100ms polling interval. Return self._received_serving directly.
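
The last item can be sketched as follows. This is a simplified, hypothetical reconstruction rather than the actual code: only `check_serving_notification()` and `_received_serving` come from the description above, while the class name `ServingWatcher`, the `_notify_future` attribute and the payload check are assumptions made for illustration.

```python
import asyncio


class ServingWatcher:
    """Hypothetical holder of the sd_notify future polled during startup."""

    def __init__(self, notify_future: asyncio.Future) -> None:
        self._notify_future = notify_future  # resolves with the notify payload
        self._received_serving = False

    def check_serving_notification_old(self) -> bool:
        """Behaviour before the fix: the state is updated, the result is lost."""
        if self._received_serving:
            return True
        if self._notify_future.done():
            self._received_serving = "STATUS=serving" in self._notify_future.result()
        return False  # the caller has to poll once more to see True

    def check_serving_notification(self) -> bool:
        """Behaviour after the fix: report the freshly updated state."""
        if not self._received_serving and self._notify_future.done():
            self._received_serving = "STATUS=serving" in self._notify_future.result()
        return self._received_serving


async def demo():
    loop = asyncio.get_running_loop()
    old_future, new_future = loop.create_future(), loop.create_future()
    old, new = ServingWatcher(old_future), ServingWatcher(new_future)
    old_future.set_result("STATUS=serving")
    new_future.set_result("STATUS=serving")
    # First poll after completion, plus a second poll for the old variant.
    return (old.check_serving_notification_old(),
            old.check_serving_notification_old(),
            new.check_serving_notification())
```

In the sketch, the old variant needs a second poll to report readiness, which corresponds to the extra 100ms polling interval mentioned above; the fixed variant reports it on the first poll after the future completes.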

Closes scylladb/scylladb#29758

* github.com:scylladb/scylladb:
  test/pylib: fix missing protocol_version=4 on control_cluster
  scylla_cluster: guard poll_status() set_result() calls against cancelled future
  test/cluster: avoid repeated CQL checks and leaks while waiting for SERVING
  test/cluster: fix check_serving_notification() inefficiency
  test/cluster: remove now-redundant expected_server_up_state=SERVING
  test/cluster: document that add/start waits for all ports to be ready
  test/cluster: update remaining CQL_ALTERNATOR_QUERIED defaults to SERVING
  test/cluster: fix server_add/server_start hanging when starting in maintenance mode
  main: notify "entering maintenance mode" after the maintenance CQL server is ready
  test/cluster: make server_start() default to ServerUpState.SERVING
  test/cluster: make server_add() default to ServerUpState.SERVING
2026-05-13 21:23:18 +03:00