mirror of
https://github.com/scylladb/scylladb.git
synced 2026-05-21 23:32:15 +00:00
test/cluster: fix server_add/server_start hanging when starting in maintenance mode

When Scylla starts in maintenance mode it sends sd_notify("STATUS=entering
maintenance mode") instead of sd_notify("STATUS=serving"), and does not
open the standard CQL port. This caused two independent bugs after the
default was changed to ServerUpState.SERVING:

1. poll_status() resolved serving_signal to False on the maintenance
   notification, so check_serving_notification() would never return True,
   and start() would time out waiting for SERVING.

2. The readiness check in start() was guarded by
   `server_up_state >= CQL_ALTERNATOR_QUERIED`, which is never reached in
   maintenance mode (the standard CQL port is not open). Even if bug 1
   were fixed, SERVING would never be recognized.

Fix both:

- Treat STATUS=entering maintenance mode as a successful readiness signal
  in poll_status(), resolving serving_signal to True just like
  STATUS=serving. Both mean "all configured ports are now open".

- Remove the CQL_ALTERNATOR_QUERIED precondition from the
  check_serving_notification() call in start(). The sd_notify signal is
  authoritative: Scylla sends it only when fully ready, regardless of
  which ports it opened. No CQL precondition is needed.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
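
The first fix amounts to treating two different sd_notify payloads as the same readiness signal. A minimal sketch of that classification follows. The helper name `is_ready_notification` and the `READY_STATUSES` tuple are hypothetical and do not appear in test/cluster; only the two STATUS strings come from the commit message.

```python
# Hypothetical helper, not the actual poll_status() code from test/cluster.
# Only the two STATUS strings below are taken from the commit message.
READY_STATUSES = (
    "STATUS=serving",                    # normal startup finished
    "STATUS=entering maintenance mode",  # maintenance-mode startup finished
)


def is_ready_notification(message: str) -> bool:
    """Return True if an sd_notify payload signals that startup has finished."""
    return any(status in message for status in READY_STATUSES)
```

With a check of this shape, the maintenance-mode notification resolves the readiness future to True instead of leaving the caller to time out.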
@@ -835,8 +835,9 @@ class ScyllaServer:
             loop.call_soon_threadsafe(f.set_result, True)
             return
         if 'STATUS=entering maintenance mode' in message:
-            logger.debug("Receive sd_notify 'entering maintenance mode'")
-            break
+            logger.debug("Received sd_notify 'entering maintenance mode' message")
+            loop.call_soon_threadsafe(f.set_result, True)
+            return
     except socket.timeout:
         pass
     except Exception as e:
@@ -972,8 +973,11 @@ class ScyllaServer:
     if server_up_state == ServerUpState.PROCESS_STARTED:
         server_up_state = ServerUpState.HOST_ID_QUERIED
     server_up_state = await self.get_cql_alternator_up_state() or server_up_state
-    # Check for SERVING state (sd_notify "serving" message)
-    if server_up_state >= ServerUpState.CQL_ALTERNATOR_QUERIED and self.check_serving_notification():
+    # Check for SERVING state via sd_notify. This is authoritative: Scylla sends
+    # STATUS=serving once all configured listeners are ready, and
+    # STATUS=entering maintenance mode once the maintenance socket is ready.
+    # Both mean the server is fully started and we don't need to wait further.
+    if self.check_serving_notification():
         server_up_state = ServerUpState.SERVING
     if server_up_state >= expected_server_up_state:
         if expected_error is not None: