reclaim_timer uses a coarse clock, but does not account for the measurement error introduced by that -- it can falsely report reclaims as stalls, even if they are shorter by a full coarse clock tick from the requested threshold (blocked-reactor-notify-ms). Notably, if the stall threshold happens to be smaller or equal to coarse clock resolution, Scylla's log gets spammed with false stall reports. The resolution of coarse clocks in Linux is 1/CONFIG_HZ. This is typically equal to 1 ms or 4 ms, and stall thresholds of this order can occur in practice. Eliminate false positives by requiring the measured reclaim duration to be at least 1 clock tick longer than the configured threshold for it to be considered a stall. Fixes #10981 Closes #11680
108 KiB
108 KiB