Łukasz Paszkowski 96a992002c tasks: fix busy-spin and shutdown hang in tablet_virtual_task::wait() for repair tasks
The condition variable predicate for repair tasks unconditionally
returned true (introduced in e5928497ce), which meant event.wait(pred)
never actually suspended: do_until checks the predicate first, and if
it's already satisfied, returns immediately without calling the inner
wait(). This caused two problems:
1. The while(true) loop busy-spun, polling without blocking between
   topology changes.
2. During shutdown, event.broken() had no effect because no waiter was
   registered on the CV. The loop kept spinning, holding the HTTP
   server's task gate open and preventing http_server::stop() from
   completing. After ~15 minutes, systemd killed the process with
   SIGABRT.
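
The busy-spin can be reproduced in miniature with the C++ standard library. This is an analogy, not Seastar code: std::condition_variable::wait(lock, pred) also evaluates the predicate before blocking, so a predicate that always returns true makes the call return at once. The function name spin_iterations and the iteration limit are hypothetical and exist only to make the effect observable.

```cpp
#include <condition_variable>
#include <mutex>

// Standard-library analogy for the faulty loop (not the Seastar code).
// No other thread ever notifies `cv`, yet every wait() returns at once,
// because wait(lock, pred) checks `pred` before it would block.
int spin_iterations(int limit) {
    std::mutex m;
    std::condition_variable cv;
    auto always_true = [] { return true; };  // mirrors the faulty predicate

    int iterations = 0;
    while (iterations < limit) {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, always_true);  // never suspends: predicate already holds
        ++iterations;                // so the loop spins at full speed
    }
    return iterations;
}
```

Calling spin_iterations(1000) completes all 1000 iterations without a single notification. Since no waiter is ever parked on the condition variable, a wake-up or break signal would have nobody to reach, which matches the shutdown hang described in point 2.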

The fix replaces the synchronous predicate with an async task_finished()
helper that dispatches on the task type. Since the repair check is async
(for_each_tablet scans every tablet), we cannot use event.wait(Pred).
Instead, we register a waiter via event.wait() *before* running the async
check, ensuring no broadcast is missed during the check. event.broken()
during shutdown propagates broken_condition_variable to the registered
waiter and unblocks the loop promptly.
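
The ordering used by the fix (register first, then check, then block) can be sketched with the standard library as well. This is a simplified model under stated assumptions, not the patch itself: the class event, its subscribe() method, the broken_event exception, and wait_for_task() are all hypothetical stand-ins, and a generation counter plays the role of the registered waiter.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <stdexcept>

// Hypothetical stand-in for broken_condition_variable.
struct broken_event : std::runtime_error {
    broken_event() : std::runtime_error("event broken") {}
};

// Hypothetical model of a breakable condition variable.
class event {
    std::mutex _m;
    std::condition_variable _cv;
    unsigned long _generation = 0;  // bumped on every broadcast
    bool _broken = false;
public:
    // "Registers" a waiter by recording which broadcasts it has already seen.
    unsigned long subscribe() {
        std::lock_guard<std::mutex> g(_m);
        return _generation;
    }
    void broadcast() {
        { std::lock_guard<std::mutex> g(_m); ++_generation; }
        _cv.notify_all();
    }
    void broken() {
        { std::lock_guard<std::mutex> g(_m); _broken = true; }
        _cv.notify_all();
    }
    // Blocks until a broadcast newer than `seen` arrives; throws once broken.
    void wait(unsigned long seen) {
        std::unique_lock<std::mutex> lock(_m);
        _cv.wait(lock, [&] { return _broken || _generation != seen; });
        if (_broken) {
            throw broken_event();
        }
    }
};

// Returns true when the task finished, false when the event was broken.
bool wait_for_task(event& ev, const std::function<bool()>& task_finished) {
    try {
        while (true) {
            auto ticket = ev.subscribe();  // register before the slow check
            if (task_finished()) {
                return true;
            }
            ev.wait(ticket);  // a broadcast sent during the check still wakes us
        }
    } catch (const broken_event&) {
        return false;  // shutdown: leave the loop promptly
    }
}
```

Because the ticket is taken before task_finished() runs, a broadcast that arrives while the check is in progress makes the following wait() return immediately, so the loop re-checks instead of sleeping through the change. Breaking the event ends the loop through the exception path instead of leaving it spinning.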

Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1532

Closes scylladb/scylladb#29485
2026-05-22 16:47:48 +03:00