Mirror of https://github.com/scylladb/scylladb.git (synced 2026-06-04 14:03:06 +00:00)
The condition variable predicate for repair tasks unconditionally returned true (introduced in e5928497ce), which meant event.wait(pred) never actually suspended: do_until checks the predicate first, and if it's already satisfied, returns immediately without calling the inner wait().

This caused two problems:

1. The while(true) loop busy-spun, polling without blocking between topology changes.
2. During shutdown, event.broken() had no effect because no waiter was registered on the CV. The loop kept spinning, holding the HTTP server's task gate open and preventing http_server::stop() from completing. After ~15 minutes, systemd killed the process with SIGABRT.

The fix replaces the synchronous predicate with an async task_finished() helper that dispatches on the task type. Since the repair check is async (for_each_tablet scans every tablet), we cannot use event.wait(Pred). Instead, we register a waiter via event.wait() *before* running the async check, ensuring no broadcast is missed during the check. event.broken() during shutdown propagates broken_condition_variable to the registered waiter and unblocks the loop promptly.

Fixes: SCYLLADB-2181

Closes scylladb/scylladb#29485

(cherry picked from commit 96a992002c)

Closes scylladb/scylladb#30042
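To make the failure mode concrete, here is a small, single-threaded C++ model of the two loop shapes. It is a sketch under stated assumptions, not Seastar or ScyllaDB code: model_cv, waiter, wait_state and fixed_step are hypothetical names, futures are replaced by a plain state flag, and the asynchronous task_finished() check is modeled as a synchronous callback. The model's wait(pred) mirrors the do_until behavior described in the message: the predicate is evaluated first, and a waiter is registered only if it is false.

```cpp
#include <cassert>  // used by the usage checks
#include <cstddef>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical single-threaded model of a cooperative condition variable.
// NOT Seastar's condition_variable; it only mirrors the behavior described
// in the commit message.
enum class wait_state { ready, pending, signaled, broken };

struct waiter {
    wait_state state = wait_state::pending;
};

class model_cv {
    std::vector<std::shared_ptr<waiter>> _waiters;
    bool _broken = false;

public:
    // Models event.wait(): registers a waiter unless the CV is already broken.
    std::shared_ptr<waiter> wait() {
        auto w = std::make_shared<waiter>();
        if (_broken) {
            w->state = wait_state::broken;
        } else {
            _waiters.push_back(w);
        }
        return w;
    }

    // Models event.wait(pred) built on do_until: the predicate is checked
    // first, and if it already holds, no waiter is registered at all.
    // (Re-checking the predicate after a wakeup is left to the caller here.)
    std::shared_ptr<waiter> wait(const std::function<bool()>& pred) {
        if (pred()) {
            auto w = std::make_shared<waiter>();
            w->state = wait_state::ready;
            return w;
        }
        return wait();
    }

    // Models event.broadcast(): wakes every registered waiter.
    void broadcast() {
        for (auto& w : _waiters) {
            w->state = wait_state::signaled;
        }
        _waiters.clear();
    }

    // Models event.broken(): fails every registered waiter and every later wait().
    void broken() {
        _broken = true;
        for (auto& w : _waiters) {
            w->state = wait_state::broken;
        }
        _waiters.clear();
    }

    std::size_t waiter_count() const { return _waiters.size(); }
};

// Fixed pattern: register the waiter *before* running the check.
// `task_finished` is a hypothetical synchronous stand-in for the async helper.
// Returns nullptr when the task is done, otherwise the registered waiter.
std::shared_ptr<waiter> fixed_step(model_cv& event,
                                   const std::function<bool()>& task_finished) {
    auto w = event.wait();  // registered first, so a broadcast during the check is kept
    if (task_finished()) {
        return nullptr;     // nothing left to wait for; the loop can exit
    }
    return w;               // resolves on the next broadcast() or broken()
}
```

With an always-true predicate, `model_cv cv; auto w = cv.wait([] { return true; });` leaves `cv.waiter_count() == 0`, so a later `cv.broken()` has nothing to fail and the loop just spins again. With fixed_step, a broadcast that arrives while the check runs resolves the already-registered waiter instead of being lost, and broken() at shutdown reaches that waiter.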