Zach Brown 55dde87bb1 scoutfs: fix lock invalidation work deadlock
The client lock network message processing callbacks were built to
simply perform the processing work for each message in the networking
work context that they were called in.  This particularly makes sense
for invalidation because it has to interact with other components that
require blocking contexts (syncing commits, invalidating inodes,
truncating pages, etc.).

The problem is that these messages are per-lock.  With the right
workloads, lock invalidation work alone can consume all the capacity
for executing work, leaving none for other network processing.
Critically, the blocked invalidation work is waiting for the commit
thread to get its network responses before invalidation can make
forward progress, and those responses can't be processed while every
work context is blocked.  I was easily reproducing deadlocks by leaving
behind a lot of locks and then triggering a flood of invalidation
requests on behalf of shrinking due to memory pressure.

The fix is to put locks on lists and have a small fixed number of work
contexts process all the locks pending for each message type.  The
network callbacks don't block; they just put the lock on the list and
queue the work that will walk the lists.  Invalidation now blocks one
work context instead of one per incoming request.
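
As a rough illustration of the list-plus-single-worker pattern described
above, here is a minimal single-threaded userspace sketch.  It is not
the scoutfs code: every name in it (struct lock, struct inv_state,
inv_request, inv_worker) is hypothetical, the work_queued flag only
stands in for queueing work on a kernel workqueue, and the locking that
a real implementation needs around the list is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock; not the scoutfs structure. */
struct lock {
	int id;
	bool inv_pending;	/* already on the invalidation list */
	bool invalidated;
	struct lock *next;
};

/* Hypothetical per-message-type state: one list, one work item. */
struct inv_state {
	struct lock *head;
	struct lock *tail;
	bool work_queued;	/* stand-in for queueing work on a workqueue */
	unsigned int work_runs;
};

/* Network callback: never blocks, only lists the lock and queues work. */
static void inv_request(struct inv_state *st, struct lock *lck)
{
	assert(st != NULL && lck != NULL);

	if (!lck->inv_pending) {
		lck->inv_pending = true;
		lck->next = NULL;
		if (st->tail)
			st->tail->next = lck;
		else
			st->head = lck;
		st->tail = lck;
	}
	/* Queueing already-queued work is a no-op. */
	st->work_queued = true;
}

/* The single work context: walks every lock pending on the list. */
static unsigned int inv_worker(struct inv_state *st)
{
	unsigned int done = 0;

	if (!st->work_queued)
		return 0;
	st->work_queued = false;
	st->work_runs++;

	while (st->head) {
		struct lock *lck = st->head;

		st->head = lck->next;
		if (!st->head)
			st->tail = NULL;
		lck->next = NULL;
		lck->inv_pending = false;
		/* The blocking invalidation steps would happen here. */
		lck->invalidated = true;
		done++;
	}
	return done;
}
```

In this sketch any number of calls to inv_request() leads to a single
run of inv_worker(), so a flood of requests occupies one work context
no matter how many locks are pending.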

Some wait conditions were waited on in work that used to run on the
lock workq.  Other paths that change those conditions now have to know
to queue the work specifically; it is no longer enough to wake waiting
tasks, which used to include blocked work executors.
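
A small sketch of that requeue requirement, again single-threaded
userspace C with hypothetical names (struct busy_lock, struct
retry_state, retry_request, retry_worker, busy_lock_put).  It assumes,
purely for illustration, that the condition is "the lock has no
remaining users"; the actual conditions in scoutfs may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 8

/* Hypothetical lock: invalidation must wait until users drops to 0. */
struct busy_lock {
	int users;
	bool inv_pending;
	bool invalidated;
};

struct retry_state {
	struct busy_lock *pending[MAX_PENDING];
	size_t nr;
	bool work_queued;	/* stand-in for queueing work on a workqueue */
};

/* Non-blocking request path: remember the lock and queue the work. */
static void retry_request(struct retry_state *st, struct busy_lock *lck)
{
	assert(st->nr < MAX_PENDING);
	if (!lck->inv_pending) {
		lck->inv_pending = true;
		st->pending[st->nr++] = lck;
	}
	st->work_queued = true;
}

/*
 * The work never sleeps on the condition.  Locks that are still in use
 * stay on the list and are retried the next time the work is queued.
 */
static void retry_worker(struct retry_state *st)
{
	size_t i, kept = 0;

	st->work_queued = false;
	for (i = 0; i < st->nr; i++) {
		struct busy_lock *lck = st->pending[i];

		if (lck->users > 0) {
			st->pending[kept++] = lck;
			continue;
		}
		lck->inv_pending = false;
		lck->invalidated = true;
	}
	st->nr = kept;
}

/* The path that changes the condition has to queue the work itself. */
static void busy_lock_put(struct retry_state *st, struct busy_lock *lck)
{
	assert(lck->users > 0);
	if (--lck->users == 0 && lck->inv_pending)
		st->work_queued = true;
}
```

Here nothing is asleep for a wakeup to reach: if busy_lock_put() did
not queue the work, the lock would sit on the pending list
indefinitely.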

The other subtle impact of the change is that we can no longer rely on
networking to shut down message processing work that was happening in
its callbacks.  We have to specifically stop our work queues in
_shutdown.

Signed-off-by: Zach Brown <zab@versity.com>
2020-08-26 14:39:12 -07:00