Only start new quorum election after a receive failure

It's possible for the quorum worker to be preempted for a long period, especially on debug kernels. Since we only check for how much time has passed, it's possible for a clean receive to inadvertently trigger an election. This can cause the quorum-heartbeat-timeout test to fail due to observed delays outside of the expected bounds. Instead, make sure we had a receive failure before comparing timestamps.

Signed-off-by: Chris Kirby <ckirby@versity.com>
2026-01-08 04:55:21 +00:00 · 2025-07-16 14:09:07 -05:00
parent 35bcad91a6
commit a896984f59
1 changed file with 1 addition and 0 deletions
--- a/kmod/src/quorum.c
+++ b/kmod/src/quorum.c
@@ -822,6 +822,7 @@ static void scoutfs_quorum_worker(struct work_struct *work)

 		/* followers and candidates start new election on timeout */
 		if (qst.role != LEADER &&
+		    msg.type == SCOUTFS_QUORUM_MSG_INVALID &&
 		    ktime_after(ktime_get(), qst.timeout)) {
 			/* .. but only if their server has stopped */
 			if (!scoutfs_server_is_down(sb)) {
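
The effect of the added check can be modeled outside the kernel. The sketch below is a user-space approximation, not the scoutfs code: the `model_*` enums, both `should_elect_*` helpers, and the integer nanosecond timestamps are hypothetical stand-ins. It assumes, as the commit message implies, that `SCOUTFS_QUORUM_MSG_INVALID` marks a receive that produced no valid message, and it leaves out the `scoutfs_server_is_down()` check that follows in the worker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; not the scoutfs definitions. */
enum model_role { MODEL_FOLLOWER, MODEL_CANDIDATE, MODEL_LEADER };
enum model_msg_type { MODEL_MSG_INVALID, MODEL_MSG_HEARTBEAT };

/* Condition before the change: elapsed time alone can start an election. */
static bool should_elect_old(enum model_role role, enum model_msg_type type,
			     int64_t now_ns, int64_t timeout_ns)
{
	(void)type;	/* the message type was not consulted */
	return role != MODEL_LEADER && now_ns > timeout_ns;
}

/* Condition after the change: the receive must also have failed. */
static bool should_elect_new(enum model_role role, enum model_msg_type type,
			     int64_t now_ns, int64_t timeout_ns)
{
	return role != MODEL_LEADER &&
	       type == MODEL_MSG_INVALID &&
	       now_ns > timeout_ns;
}

/* Walk through the cases described in the commit message. */
static void election_model_selfcheck(void)
{
	/*
	 * Preempted worker: a valid heartbeat was received, but by the
	 * time the worker runs again the clock is already past the timeout.
	 */
	assert(should_elect_old(MODEL_FOLLOWER, MODEL_MSG_HEARTBEAT, 200, 100));
	assert(!should_elect_new(MODEL_FOLLOWER, MODEL_MSG_HEARTBEAT, 200, 100));

	/* Genuine timeout: nothing valid received and the deadline passed. */
	assert(should_elect_new(MODEL_FOLLOWER, MODEL_MSG_INVALID, 200, 100));
	assert(should_elect_new(MODEL_CANDIDATE, MODEL_MSG_INVALID, 200, 100));

	/* Failed receive before the deadline: keep waiting. */
	assert(!should_elect_new(MODEL_FOLLOWER, MODEL_MSG_INVALID, 50, 100));

	/* Leaders never start an election from this path. */
	assert(!should_elect_new(MODEL_LEADER, MODEL_MSG_INVALID, 200, 100));
}
```

Calling `election_model_selfcheck()` from a test harness exercises the case the patch targets: with the old condition a late but clean receive starts an election, while with the new condition it does not. The strict `>` comparison mirrors `ktime_after()`, which is true only when the first time is later than the second.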