scoutfs: only allow recursive blocked hold

fsx-mpi spins, creating contention between exclusive holders of locks on
different nodes.  It was tripping assertions in item invalidation as it
tried to invalidate dirty items.  Tracing showed that we were allowing
holders of locks while we were invalidating.  Our invalidation function
would commit the current transaction, another task would hold the lock
and dirty an item, and then invalidation would continue on and try to
invalidate the now-dirty item.  The invalidation code has always assumed
that it's not running concurrently with item dirtying.

The recursive locking change allowed acquiring blocked locks if the
recursive flag was set.  It'd then check holders after calling
downconvert_worker (invalidation for us) and retry the downconvert if a
holder had appeared.  Because it allowed recursive holders regardless of
who was already holding the lock, holders could arrive once downconvert
started on the blocked lock.  Not only did this create our problem with
invalidation, it could also leave items behind if the holder dirtied an
item and dropped the lock after invalidation but before downconvert
checked the holders again.

The fix is to only allow recursive holders on blocked locks that already
have holders.  This ensures that the holder count will never rise past
zero on a blocked lock.  Once the downconvert sees the holders drain it
will call invalidation, which won't have racing dirtiers.  We can remove
the holder check after invalidation entirely.

With this fixed fsx-mpi no longer tries to invalidate dirty items as it
bounces locks back and forth.

Signed-off-by: Zach Brown <zab@versity.com>
Author: Zach Brown <zab@versity.com>
Date:   2017-10-19 13:01:47 -07:00
Committed-by: Mark Fasheh
Parent: 0712ca6b9b
Commit: 95d8f4bf20

2 changed files with 10 additions and 21 deletions


@@ -918,11 +918,14 @@ static inline int ocfs2_may_continue_on_blocked_lock(struct ocfs2_lock_res *lock
 	return wanted <= ocfs2_highest_compat_lock_level(lockres->l_blocking);
 }
 
+/* the caller doesn't have to wait on a blocked lock if their wanted level
+ * is compatible with it and there are already holders of the lock */
 static inline int lockres_allow_recursion(struct ocfs2_lock_res *lockres,
 					  int wanted)
 {
-	return (lockres->l_ops->flags & LOCK_TYPE_RECURSIVE)
-		&& wanted <= lockres->l_level;
+	return (lockres->l_ops->flags & LOCK_TYPE_RECURSIVE) &&
+	       wanted <= lockres->l_level &&
+	       (lockres->l_ex_holders || lockres->l_ro_holders);
 }
 
 static void ocfs2_init_mask_waiter(struct ocfs2_mask_waiter *mw)
@@ -2393,21 +2396,6 @@ recheck:
 		goto recheck;
 	}
 
-	if ((lockres->l_ops->flags & LOCK_TYPE_RECURSIVE) &&
-	    ((lockres->l_blocking == DLM_LOCK_PR && lockres->l_ex_holders) ||
-	     (lockres->l_blocking == DLM_LOCK_EX &&
-	      (lockres->l_ex_holders || lockres->l_ro_holders)))) {
-		/*
-		 * Recursive locks may have had their holder count
-		 * incremented while we were sleeping in
-		 * ->downconvert_worker. Recheck here.
-		 */
-		mlog(ML_BASTS, "lockres %s, block=%d:%d, level=%d:%d, ro=%d "
-		     "ex=%d, Recheck\n", lockres->l_name, blocking,
-		     lockres->l_blocking, level, lockres->l_level,
-		     lockres->l_ro_holders, lockres->l_ex_holders);
-		goto recheck;
-	}
 
 downconvert:
 	ctl->requeue = 0;


@@ -282,10 +282,11 @@ struct ocfs2_lock_res_ops {
 	/*
 	 * Tells dlmglue to override fairness considerations when locking this
 	 * lock type - the blocking flag will be ignored when a lock is
-	 * requested and we already have it at the appropriate level. This
-	 * allows a process to acquire a dlmglue lock on the same resource
-	 * multiple times in a row without deadlocking, even if another node has
-	 * asked for a competing lock on the resource.
+	 * requested and we already have it at the appropriate level and the
+	 * resource is currently held. This allows a process to acquire a
+	 * dlmglue lock on the same resource multiple times in a row without
+	 * deadlocking, even if another node has asked for a competing lock on
+	 * the resource.
 	 *
 	 * Note that lock/unlock calls must always be balanced (1 unlock for
 	 * every lock), even when this flag is set.