From 9ee7f7b9dc8aa2fb336edbcff56395043d70e3b2 Mon Sep 17 00:00:00 2001
From: Zach Brown <zab@versity.com>
Date: Wed, 7 Apr 2021 12:29:26 -0700
Subject: [PATCH] Block cache shrink restart waits for rcu callbacks

We're seeing CPU livelocks in block shrinking where counters show that a
single block cache shrink call is only getting EAGAIN from repeated
rhashtable walk attempts.  It occurred to me that the running task might
be preventing an RCU grace period from ending by never blocking.

The hope of this commit is that by calling synchronize_rcu() to wait for
an RCU grace period we'll let any pending rebalance callback run before
we retry the rhashtable walk.  I haven't been able to reproduce this
easily so this is a stab in the dark.

Signed-off-by: Zach Brown <zab@versity.com>
---
 kmod/src/block.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
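
For illustration only, here is a minimal userspace sketch of the retry
pattern this patch changes.  Every name in it (fake_table,
fake_walk_next, fake_synchronize_rcu, fake_shrink) is a hypothetical
stand-in, not the rhashtable or RCU API.  It also assumes that waiting
for a grace period is enough to let the pending rebalance finish, which
is exactly the unverified guess in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
enum walk_result { WALK_ITEM, WALK_DONE, WALK_AGAIN };

struct fake_table {
	bool rebalance_pending;	/* walks return WALK_AGAIN while set */
	int nr_items;
};

/* Stand-in for synchronize_rcu(): ASSUMES the rebalance then completes. */
static void fake_synchronize_rcu(struct fake_table *t)
{
	t->rebalance_pending = false;
}

/* Stand-in for rhashtable_walk_next(). */
static enum walk_result fake_walk_next(struct fake_table *t, int *pos)
{
	if (t->rebalance_pending)
		return WALK_AGAIN;
	if (*pos >= t->nr_items)
		return WALK_DONE;
	(*pos)++;
	return WALK_ITEM;
}

/*
 * Walk the table, restarting from scratch on WALK_AGAIN.  Returns the
 * number of items visited by a complete walk, or -1 if max_restarts
 * restarts were used up first.  *restarts reports the restart count.
 */
static int fake_shrink(struct fake_table *t, bool wait_on_restart,
		       int max_restarts, int *restarts)
{
	int visited;
	int pos;

	assert(max_restarts > 0);
	*restarts = 0;
restart:
	visited = 0;
	pos = 0;
	for (;;) {
		enum walk_result res = fake_walk_next(t, &pos);

		if (res == WALK_DONE)
			break;
		if (res == WALK_AGAIN) {
			if (++(*restarts) >= max_restarts)
				return -1;	/* bound the simulated livelock */
			if (wait_on_restart)
				fake_synchronize_rcu(t);
			goto restart;
		}
		visited++;
	}
	return visited;
}
```

With wait_on_restart false, the simulated walk only ever sees
WALK_AGAIN and burns through its restart budget, which mirrors the
counters described above.  With it true, the walk completes after a
single restart.  The real kernel may not behave this simply.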

diff --git a/kmod/src/block.c b/kmod/src/block.c
index 60a1f030..1d80a667 100644
--- a/kmod/src/block.c
+++ b/kmod/src/block.c
@@ -1074,10 +1074,11 @@ restart:
 		if (bp == NULL)
 			break;
 		if (bp == ERR_PTR(-EAGAIN)) {
-			/* hard reset to not hold rcu grace period across retries */
+			/* hard exit to wait for rcu rebalance to finish */
 			rhashtable_walk_stop(&iter);
 			rhashtable_walk_exit(&iter);
 			scoutfs_inc_counter(sb, block_cache_shrink_restart);
+			synchronize_rcu();
 			goto restart;
 		}