Only start new quorum election after a receive failure

It's possible for the quorum worker to be preempted for a long period,
especially on debug kernels. Since we only check for how much time
has passed, it's possible for a clean receive to inadvertently
trigger an election. This can cause the quorum-heartbeat-timeout
test to fail due to observed delays outside of the expected bounds.

Instead, make sure we had a receive failure before comparing timestamps.

Signed-off-by: Chris Kirby <ckirby@versity.com>
This commit is contained in:
Chris Kirby
2025-07-16 14:09:07 -05:00
parent 35bcad91a6
commit a896984f59

View File

@@ -822,6 +822,7 @@ static void scoutfs_quorum_worker(struct work_struct *work)
/* followers and candidates start new election on timeout */
if (qst.role != LEADER &&
msg.type == SCOUTFS_QUORUM_MSG_INVALID &&
ktime_after(ktime_get(), qst.timeout)) {
/* .. but only if their server has stopped */
if (!scoutfs_server_is_down(sb)) {