mirror of
https://github.com/versity/scoutfs.git
synced 2026-02-07 19:20:44 +00:00
fsx-mpi spins creating contention between EX holders of locks across
nodes. It was tripping assertions in item invalidation as it tried to
invalidate dirty items.

Tracing showed that we were allowing holders of locks while we were
invalidating. Our invalidation function would commit the current
transaction, another task would hold the lock and dirty an item, and
then invalidation would continue on and try to invalidate the dirty
item. The invalidation code has always assumed that it's not running
concurrently with item dirtying.

The recursive locking change allowed acquiring blocked locks if the
recursive flag was set. It would then check holders after calling
downconvert_worker (invalidation for us) and retry the downconvert if a
holder appeared.

Allowing recursive holders regardless of who was already holding the
lock is what let holders arrive once downconvert had started on the
blocked lock. Not only did this create our problem with invalidation,
it could also leave items behind if the holder dirtied an item and
dropped the lock after invalidation but before downconvert checked the
holders again.

The fix is to only allow recursive holders on blocked locks that
already have holders. This ensures that the holder count on a blocked
lock can never rise from zero. Once the downconvert sees the holders
drain it will call invalidation, which won't have racing dirtiers. We
can remove the holder check after invalidation entirely.

With this fixed, fsx-mpi no longer tries to invalidate dirty items as it
bounces locks back and forth.

Signed-off-by: Zach Brown <zab@versity.com>
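
The admission rule described in the fix can be sketched roughly as
follows. This is a minimal illustration, not the actual scoutfs code:
the struct, its fields, and every sketch_* name are hypothetical, and
the real lock state carries far more than a holder count and a blocked
flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified lock state; not the real scoutfs structures. */
struct sketch_lock {
	int holders;	/* tasks currently holding the lock */
	bool blocked;	/* another node is waiting on a downconvert */
};

/*
 * Decide whether a new holder may be admitted.  A blocked lock only
 * admits a recursive holder when it is already held, so the holder
 * count on a blocked lock can never rise from zero.
 */
static bool sketch_may_hold(const struct sketch_lock *lck, bool recursive)
{
	if (!lck->blocked)
		return true;
	return recursive && lck->holders > 0;
}

/* Take a hold if admission allows it; returns false if the caller must wait. */
static bool sketch_hold(struct sketch_lock *lck, bool recursive)
{
	if (!sketch_may_hold(lck, recursive))
		return false;
	lck->holders++;
	return true;
}

/* Drop a hold that was previously taken. */
static void sketch_unhold(struct sketch_lock *lck)
{
	assert(lck->holders > 0);
	lck->holders--;
}

/*
 * Downconvert (and so invalidation) may start only once a blocked
 * lock's holders have drained.  Because no holder can arrive after
 * that point, no re-check is needed after invalidation.
 */
static bool sketch_can_downconvert(const struct sketch_lock *lck)
{
	return lck->blocked && lck->holders == 0;
}
```

In this sketch, once sketch_can_downconvert() returns true every later
sketch_hold() on the blocked lock fails, which is the property that
keeps dirtiers from racing with invalidation.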