No i_mutex in aio_read for data_wait_check

A regular file reader first acquires a cluster lock and then checks the
file's extents to see if the region it is reading contains offline
extents.

When this was written the extent operations didn't have their own
internal locking.  It was up to callers to use vfs mechanisms to
serialize readers and writers.  The aio_read data_wait_check extent
caller tried to use dio_count and an i_mutex acquisition to ensure that
our ioctls wouldn't modify extents.

This creates a bad inversion between the vfs i_mutex and our cluster
inode lock.  There are lots of fs methods which are called by the vfs
with i_mutex held and which then acquire a cluster lock.  This read
path did the opposite: it held a cluster lock and then acquired
i_mutex.

Since the data waiting code was written, the file data extent
operations have added their own extent_sem to protect the extent items
from concurrent callers.  We can rely on that internal locking and drop
the bad i_mutex use which causes the inversion.

Not surprisingly, this deadlock was trivial to trigger by racing simple
utilities like cat and touch.

Signed-off-by: Zach Brown <zab@versity.com>
Zach Brown
2021-04-26 16:09:13 -07:00
parent df90b3eb90
commit 8efb30afbc


@@ -29,7 +29,12 @@
 #include "per_task.h"
 #include "omap.h"
 
-/* TODO: Direct I/O, AIO */
+/*
+ * Start a high level file read.  We check for offline extents in the
+ * read region here so that we only check the extents once.  We use the
+ * dio count to prevent releasing while we're reading after we've
+ * checked the extents.
+ */
 ssize_t scoutfs_file_aio_read(struct kiocb *iocb, const struct iovec *iov,
 			      unsigned long nr_segs, loff_t pos)
 {
@@ -43,30 +48,32 @@ ssize_t scoutfs_file_aio_read(struct kiocb *iocb, const struct iovec *iov,
 	int ret;
 
 retry:
+	/* protect checked extents from release */
+	mutex_lock(&inode->i_mutex);
+	atomic_inc(&inode->i_dio_count);
+	mutex_unlock(&inode->i_mutex);
+
 	ret = scoutfs_lock_inode(sb, SCOUTFS_LOCK_READ,
 				 SCOUTFS_LKF_REFRESH_INODE, inode, &inode_lock);
 	if (ret)
 		goto out;
 
 	if (scoutfs_per_task_add_excl(&si->pt_data_lock, &pt_ent, inode_lock)) {
-		/* protect checked extents from stage/release */
-		mutex_lock(&inode->i_mutex);
-		atomic_inc(&inode->i_dio_count);
-		mutex_unlock(&inode->i_mutex);
 		ret = scoutfs_data_wait_check_iov(inode, iov, nr_segs, pos,
 						  SEF_OFFLINE,
 						  SCOUTFS_IOC_DWO_READ,
 						  &dw, inode_lock);
 		if (ret != 0)
 			goto out;
 	} else {
 		WARN_ON_ONCE(true);
 	}
 
 	ret = generic_file_aio_read(iocb, iov, nr_segs, pos);
 out:
-	if (scoutfs_per_task_del(&si->pt_data_lock, &pt_ent))
-		inode_dio_done(inode);
+	inode_dio_done(inode);
+	scoutfs_per_task_del(&si->pt_data_lock, &pt_ent);
 	scoutfs_unlock(sb, inode_lock, SCOUTFS_LOCK_READ);
 
 	if (scoutfs_data_wait_found(&dw)) {