From 9ee7f7b9dc8aa2fb336edbcff56395043d70e3b2 Mon Sep 17 00:00:00 2001
From: Zach Brown
Date: Wed, 7 Apr 2021 12:29:26 -0700
Subject: [PATCH] Block cache shrink restart waits for rcu callbacks

We're seeing CPU livelocks in block shrinking, where counters show that
a single block cache shrink call only ever gets EAGAIN from repeated
rhashtable walk attempts.

It occurred to me that the running task might be preventing an RCU
grace period from ending by never blocking.

The hope of this commit is that by blocking in synchronize_rcu() until
a grace period ends, we give any pending rebalance callback a chance to
run before we retry the rhashtable walk.

I haven't been able to reproduce this easily, so this is a stab in the
dark.

Signed-off-by: Zach Brown
---
 kmod/src/block.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kmod/src/block.c b/kmod/src/block.c
index 60a1f030..1d80a667 100644
--- a/kmod/src/block.c
+++ b/kmod/src/block.c
@@ -1074,10 +1074,11 @@ restart:
 		if (bp == NULL)
 			break;
 		if (bp == ERR_PTR(-EAGAIN)) {
-			/* hard reset to not hold rcu grace period across retries */
+			/* hard exit to wait for rcu rebalance to finish */
 			rhashtable_walk_stop(&iter);
 			rhashtable_walk_exit(&iter);
 			scoutfs_inc_counter(sb, block_cache_shrink_restart);
+			synchronize_rcu();
 			goto restart;
 		}
 
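
The livelock hypothesis above can be illustrated with a small userspace model. This is a hypothetical toy, not scoutfs or kernel code, and it uses no real RCU: `toy_table`, `toy_walk`, `toy_shrink` and `toy_block_and_let_deferred_work_run` are invented names. A pending "rehash" stands in for the deferred rhashtable rebalance, and it is assumed to complete only once the walking task blocks, roughly as an RCU grace period cannot end while a task on a non-preemptible CPU never schedules. The restart loop mirrors the shape of the patched shrinker: on EAGAIN it counts a restart and, optionally, blocks before retrying.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical single-threaded model. The pending rehash can only
 * finish when the walker blocks, standing in for an RCU grace period
 * that cannot end while a task never schedules.
 */
struct toy_table {
	bool rehash_pending;	/* a resize is queued but not yet finished */
	int nr_entries;		/* entries a completed walk would visit */
};

/* Stand-in for blocking in synchronize_rcu(): deferred work gets to run. */
static void toy_block_and_let_deferred_work_run(struct toy_table *t)
{
	t->rehash_pending = false;
}

/* Returns entries walked, or -EAGAIN if a rehash interrupted the walk. */
static int toy_walk(const struct toy_table *t)
{
	if (t->rehash_pending)
		return -EAGAIN;
	return t->nr_entries;
}

/*
 * Restart loop shaped like the patched shrinker. Gives up after
 * max_restarts so the no-wait livelock is observable instead of
 * spinning forever; *restarts reports how many restarts happened.
 */
static int toy_shrink(struct toy_table *t, bool wait_on_restart,
		      int max_restarts, int *restarts)
{
	int ret;

	assert(max_restarts >= 0);
	*restarts = 0;
	for (;;) {
		ret = toy_walk(t);
		if (ret != -EAGAIN)
			return ret;
		if (++(*restarts) > max_restarts)
			return -EAGAIN;	/* never made progress */
		if (wait_on_restart)
			toy_block_and_let_deferred_work_run(t);
	}
}
```

With `wait_on_restart` set, the walk succeeds after a single restart; without it, the loop uses up every allowed restart and only ever sees EAGAIN, which resembles the counter pattern described in the commit message. The model only shows why blocking on restart can break this kind of livelock; whether synchronize_rcu() actually lets the rhashtable rebalance finish in this case is the untested hypothesis of the patch.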