Block cache shrink restart waits for rcu callbacks

We're seeing cpu livelocks in block shrinking where counters show that a
single block cache shrink call is only getting EAGAIN from repeated
rhashtable walk attempts.  It occurred to me that the running task might
be preventing an RCU grace period from ending by never blocking.

The hope of this commit is that by waiting for rcu callbacks to run
we'll ensure that any pending rebalance callback runs before we retry
the rhashtable walk again.  I haven't been able to reproduce this easily
so this is a stab in the dark.

Signed-off-by: Zach Brown <zab@versity.com>
This commit is contained in:
Zach Brown
2021-04-07 12:29:26 -07:00
parent 300791ecfa
commit 9ee7f7b9dc

View File

@@ -1074,10 +1074,11 @@ restart:
if (bp == NULL)
break;
if (bp == ERR_PTR(-EAGAIN)) {
/* hard reset to not hold rcu grace period across retries */
/* hard exit to wait for rcu rebalance to finish */
rhashtable_walk_stop(&iter);
rhashtable_walk_exit(&iter);
scoutfs_inc_counter(sb, block_cache_shrink_restart);
synchronize_rcu();
goto restart;
}