Block cache shrink restart waits for rcu callbacks

We're seeing CPU livelocks in block shrinking where counters show that a single block cache shrink call is only getting EAGAIN from repeated rhashtable walk attempts.

It occurred to me that the running task might be preventing an RCU grace period from ending by never blocking. The hope of this commit is that by waiting for RCU callbacks to run we'll ensure that any pending rebalance callback runs before we retry the rhashtable walk.

I haven't been able to reproduce this easily, so this is a stab in the dark.

Signed-off-by: Zach Brown <zab@versity.com>
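The suspected failure mode can be sketched as a userspace toy model. This is not kernel or scoutfs code: every `toy_` name is hypothetical, and the "wait" step assumes, as the commit hopes, that blocking lets the pending rebalance finish. In the kernel, `synchronize_rcu()` only guarantees that a grace period has elapsed, so whether that is enough here remains unverified.

```c
/* Userspace toy model only; all toy_* names are hypothetical. */
#include <assert.h>
#include <errno.h>

/* Stand-in for a hash table with deferred rebalance work pending. */
struct toy_table {
	int pending_callbacks;	/* deferred callbacks that have not run yet */
	int nr_items;		/* entries a successful walk would visit */
};

/* One walk attempt: fails with -EAGAIN while deferred work is pending. */
static int toy_walk(const struct toy_table *t)
{
	return t->pending_callbacks > 0 ? -EAGAIN : t->nr_items;
}

/*
 * Models the blocking wait added by the patch.  ASSUMPTION: blocking
 * lets the pending deferred work run to completion.
 */
static void toy_wait_for_callbacks(struct toy_table *t)
{
	t->pending_callbacks = 0;
}

/*
 * Retry loop in the shape of the patched code: on -EAGAIN, optionally
 * wait, then restart the walk.  Gives up after max_restarts so the
 * never-blocking case terminates instead of spinning forever.
 * Returns the walk result and stores the restart count in *restarts.
 */
static int toy_shrink(struct toy_table *t, int wait_on_eagain,
		      int max_restarts, int *restarts)
{
	int ret;

	assert(max_restarts >= 0);
	*restarts = 0;

	for (;;) {
		ret = toy_walk(t);
		if (ret != -EAGAIN)
			return ret;
		if (*restarts == max_restarts)
			return -EAGAIN;	/* models the livelock */
		(*restarts)++;
		if (wait_on_eagain)
			toy_wait_for_callbacks(t);
	}
}
```

With `wait_on_eagain` set, one pending callback costs a single restart. Without it, the loop only sees -EAGAIN until it hits `max_restarts`, which is the behavior the `block_cache_shrink_restart` counter would be expected to show if the hypothesis holds.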
2026-01-10 05:37:25 +00:00 · 2021-04-07 12:29:26 -07:00
parent 300791ecfa
commit 9ee7f7b9dc
1 changed file with 2 additions and 1 deletion
--- a/kmod/src/block.c
+++ b/kmod/src/block.c
@@ -1074,10 +1074,11 @@ restart:
 		if (bp == NULL)
 			break;
 		if (bp == ERR_PTR(-EAGAIN)) {
-			/* hard reset to not hold rcu grace period across retries */
+			/* hard exit to wait for rcu rebalance to finish */
 			rhashtable_walk_stop(&iter);
 			rhashtable_walk_exit(&iter);
 			scoutfs_inc_counter(sb, block_cache_shrink_restart);
+			synchronize_rcu();
 			goto restart;
 		}