seaweedfs/weed/util/log_buffer
Chris Lu eb5624233d [filer] fix log buffer idle polling (#9012)
* fix log buffer idle polling

* log_buffer: document notificationHealthCheckInterval tradeoffs

Explain that notifyChan is the primary wakeup path and this interval only
bounds the fallback / state-recheck cadence, so future maintainers don't
tune it without understanding the implications for client-disconnect
detection latency.

* log_buffer: rename waitForNotification to awaitNotificationOrTimeout

The helper returns after either a notification or the health-check
timeout; the old name read like it blocked indefinitely. No behavior
change.

* log_buffer: wake blocked subscribers on shutdown

awaitNotificationOrTimeout previously only returned on notifyChan or the
health-check timeout, so ShutdownLogBuffer on an idle buffer (where
copyToFlush returns nil and loopFlush never fires the post-flush
notification) would leave subscribers parked for up to 250ms before they
noticed IsStopping.

Add an internal shutdownCh closed by ShutdownLogBuffer and select on it
from awaitNotificationOrTimeout, which is now a method on *LogBuffer.
Subscribers wake immediately, re-check IsStopping, and exit. No change
to LoopProcessLogData signatures or any caller (filer metadata
subscribers, MQ broker, local partition subscribe).

* log_buffer: regression tests for flush-notify wake-up

TestLoopFlush_NotifiesSubscribersAfterFlush directly verifies that
loopFlush calls notifySubscribers after processing a flush, so a reader
parked on notifyChan wakes promptly when a flush lands. Verified to fail
if that notification is removed.

TestLoopProcessLogDataWithOffset_WakesOnDataArrival is the end-to-end
counterpart: a real LoopProcessLogDataWithOffset reader parks on
notifyChan via the ResumeFromDiskError branch, then wakes and processes
the entry well under the 250ms fallback once data arrives.
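For context, a flush-side notification of this kind is commonly written
as a non-blocking send into a buffered channel. The sketch below assumes
that shape; it is not taken from the notifySubscribers implementation, and
notifyNonBlocking is a hypothetical name.

```go
package main

// notifyNonBlocking posts a wake-up token without ever blocking the caller.
// If a token is already pending, the new one is dropped: one pending token
// is enough to make the reader re-check the buffer.
func notifyNonBlocking(ch chan struct{}) (delivered bool) {
	select {
	case ch <- struct{}{}:
		return true
	default:
		return false
	}
}
```

With a channel of capacity 1, two back-to-back calls deliver one token and
drop the second, so the flush loop never stalls on a slow or absent reader.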

* log_buffer: keep notification-timeout logs at V(4)

Revert the V(4)->V(5) demotion. With the shutdown wake-up path in place
and (with the follow-up fix) idle-polling CPU churn bounded by the 250ms
health check, these timeout logs no longer flood at V=4 as they did on
the 10ms fallback. The previous verbosity is appropriate again.

* log_buffer: exit reader loops cleanly on shutdown

awaitNotificationOrTimeout returns true on both data notifications and
shutdown (shutdownCh closed). Without an explicit IsStopping() guard,
the ResumeFromDiskError, offset-based no-data, empty-buffer, and
timestamp-wait paths would either tight-spin against a closed shutdownCh
or, in the offset-based case, return ResumeFromDiskError to the caller
instead of exiting.

Add an IsStopping() check after each awaitNotificationOrTimeout call
that previously continued or returned ResumeFromDiskError, so subscribers
exit promptly with isDone=true and err=nil when ShutdownLogBuffer is
called.
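A minimal sketch of why the guard matters, under stated assumptions: the
logBuffer type, readLoop, and the maxIterations cap are hypothetical and
exist only to make the tight spin observable. The real reader loops have
more branches than this.

```go
package main

import (
	"sync/atomic"
	"time"
)

// logBuffer is a hypothetical stand-in for *LogBuffer.
type logBuffer struct {
	notifyCh   chan struct{}
	shutdownCh chan struct{}
	stopping   atomic.Bool
}

func newLogBuffer() *logBuffer {
	return &logBuffer{
		notifyCh:   make(chan struct{}, 1),
		shutdownCh: make(chan struct{}),
	}
}

func (b *logBuffer) isStopping() bool { return b.stopping.Load() }

// shutdown marks the buffer as stopping, then closes shutdownCh once.
func (b *logBuffer) shutdown() {
	if b.stopping.CompareAndSwap(false, true) {
		close(b.shutdownCh)
	}
}

// awaitNotificationOrTimeout returns true on notification or shutdown.
func (b *logBuffer) awaitNotificationOrTimeout(timeout time.Duration) bool {
	timer := time.NewTimer(timeout)
	defer timer.Stop()
	select {
	case <-b.notifyCh:
		return true
	case <-b.shutdownCh:
		return true
	case <-timer.C:
		return false
	}
}

// readLoop models a reader that never finds data. With guard=true it exits
// as soon as it wakes during shutdown; with guard=false it keeps looping,
// because a closed shutdownCh makes every wait return immediately.
// maxIterations only bounds the demonstration.
func (b *logBuffer) readLoop(guard bool, maxIterations int) (isDone bool, iterations int) {
	for iterations < maxIterations {
		iterations++
		b.awaitNotificationOrTimeout(time.Second)
		if guard && b.isStopping() {
			return true, iterations
		}
		// No data available in this sketch: go around again.
	}
	return false, iterations
}
```

After shutdown, the guarded loop returns isDone=true on its first
iteration, while the unguarded loop runs until the iteration cap without
ever blocking.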

* log_buffer: regression test for shutdown wake-up

Park a real LoopProcessLogDataWithOffset reader on notifyChan via the
ResumeFromDiskError branch, call ShutdownLogBuffer, and assert the
reader exits with isDone=true and err=nil well under the 250ms
fallback. Verified to fail (timeout) if the IsStopping() guards added
in the prior commit are removed.

* log_buffer: bump reader-park sleep to 50ms with rationale

Both wake-path tests use a sleep to give the goroutine time to reach
awaitNotificationOrTimeout before the test triggers the wake-up.
Bump from 20ms to 50ms and document the timing assumption to reduce
flakiness on slow CI. Both paths are race-free either way (a buffered
notification or a closed shutdownCh stays valid until consumed), so
this is purely about exercising the park-then-wake path rather than
the already-pending fast path.
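The race-freedom claim rests on two Go channel properties, sketched below:
a value sent into a buffered channel stays queued until received, and a
closed channel is readable forever. The channel names and the capacity of
1 are assumptions for illustration.

```go
package main

// pendingWake triggers both wake-up signals before any receiver exists,
// then checks that a late receiver still observes each of them.
func pendingWake() (gotNotify, gotShutdown bool) {
	notifyCh := make(chan struct{}, 1)
	shutdownCh := make(chan struct{})

	// Signals fire first, as if the test ran ahead of the reader goroutine.
	notifyCh <- struct{}{}
	close(shutdownCh)

	// The reader arrives late; non-blocking receives show nothing was lost.
	select {
	case <-notifyCh:
		gotNotify = true
	default:
	}
	select {
	case <-shutdownCh:
		gotShutdown = true
	default:
	}
	return gotNotify, gotShutdown
}
```

Both results are true, which is why the sleep in the tests affects only
which code path is exercised, not whether the reader eventually wakes.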
2026-04-09 18:09:57 -07:00