Close lock server use-after-free race

Lock object lifetimes in the lock server are protected by reference
counts.  References are acquired while holding a lock on an rbtree.

Unfortunately, the decision to free lock objects wasn't tested while
also holding that lock on the rbtree.  A caller putting their object
would test the refcount, then wait to get the rbtree lock to remove it
from the tree.

There's a possible race where the decision is made to remove the object
but another reference is added before the object is removed.  This was
seen in testing and manifest as an incoming request handling path adding
a request message to the object before it is freed, losing the message.
Clients would then hang on a lock that never saw a response because
their request was freed with the lock object.

The fix is to hold the rbtree lock when testing the refcount and
deciding to free.  It adds a bit more contention but not significantly
so, given the wild existing contention on a per-fs spinlocked rbtree.

Signed-off-by: Zach Brown <zab@versity.com>
This commit is contained in:
Zach Brown
2024-10-30 13:04:13 -07:00
parent d94ec29ffa
commit 69d9040e68

View File

@@ -317,16 +317,18 @@ static void put_server_lock(struct lock_server_info *inf,
BUG_ON(!mutex_is_locked(&snode->mutex));
spin_lock(&inf->lock);
if (atomic_dec_and_test(&snode->refcount) &&
list_empty(&snode->granted) &&
list_empty(&snode->requested) &&
list_empty(&snode->invalidated)) {
spin_lock(&inf->lock);
rb_erase(&snode->node, &inf->locks_root);
spin_unlock(&inf->lock);
should_free = true;
}
spin_unlock(&inf->lock);
mutex_unlock(&snode->mutex);
if (should_free) {