Start server commits when holds wait for alloc

Server code that wants to dirty blocks by holding a commit won't be allowed to until the current allocators for the server transaction have enough space for the holder. As an active holder applies the commit, the allocators are refilled and the waiting holders proceed.

But the current allocators can have no resources as the server starts up. Then there are never active holders to apply the commit and refill the allocators, and all the holders block indefinitely.

The fix is to trigger a server commit when a holder doesn't have room. Previously, commits were only triggered when apply callers were waiting. We move some of that logic into a new 'committing' field so that we can have commits in flight without apply callers waiting, and we add it to the server commit tracing.

While we're at it, we clean up the logic that tests if a hold can proceed. It used to be confusingly split across two functions that could both sample the current allocator space remaining. This could lead to weird cases where the first holder recorded the values from the second alloc remaining call rather than the values that were tested to see if the holder could fit. Now each hold check samples the allocators only once.

And finally, we fix a subtle case where the budget exceeded message could spuriously trigger when dirtying the freed list created a new empty block after the holder recorded the amount of space in the freed block.

Signed-off-by: Zach Brown <zab@versity.com>
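The hold admission rule and the freed-list accounting described above can be modeled in userspace. This is a sketch under stated assumptions, not the server code: `HOLD_BUDGET`, `struct cusers_model`, `try_hold()` and `freed_used()` are hypothetical names, the budget value is made up, and locking, wait queues and per-hold bookkeeping are dropped. `max_blocks` stands in for `SCOUTFS_ALLOC_LIST_MAX_BLOCKS`, whose value isn't shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-holder budget standing in for COMMIT_HOLD_ALLOC_BUDGET. */
#define HOLD_BUDGET 10u

/* Simplified model of struct commit_users; the lock and lists are omitted. */
struct cusers_model {
	unsigned int nr_holders;
	uint32_t avail_before;	/* allocator space recorded by the first holder */
	uint32_t freed_before;
	bool committing;	/* a commit has been queued or is in flight */
	bool applying;		/* stands in for !list_empty(&cusers->applying) */
};

enum hold_result { HOLD_GRANTED, HOLD_WAIT, HOLD_WAIT_COMMIT_QUEUED };

/*
 * av_now/fr_now are what scoutfs_alloc_meta_remaining() would report.
 * The model only consults them when there are no holders, so each
 * check uses a single sample of the allocators.
 */
static enum hold_result try_hold(struct cusers_model *c, uint32_t av_now, uint32_t fr_now)
{
	uint32_t av = c->nr_holders == 0 ? av_now : c->avail_before;
	uint32_t fr = c->nr_holders == 0 ? fr_now : c->freed_before;
	/* +2 for this hold and for the server's final commit work */
	uint32_t budget = (c->nr_holders + 2) * HOLD_BUDGET;
	bool has_room = av >= budget && fr >= budget;

	if (!c->committing && has_room && !c->applying) {
		if (c->nr_holders == 0) {
			c->avail_before = av;
			c->freed_before = fr;
		}
		c->nr_holders++;
		return HOLD_GRANTED;
	}

	/* no holders means no apply will ever refill the allocators: commit now */
	if (!has_room && c->nr_holders == 0 && !c->committing) {
		c->committing = true;	/* the kernel also queues commit_work here */
		return HOLD_WAIT_COMMIT_QUEUED;
	}

	return HOLD_WAIT;
}

/* Freed-list consumption as computed in check_holder_budget(). */
static uint32_t freed_used(uint32_t freed_before, uint32_t freed_now, uint32_t max_blocks)
{
	assert(freed_now <= max_blocks);
	if (freed_now < freed_before)
		return freed_before - freed_now;
	/* a new empty freed block may have been started after freed_before was sampled */
	return max_blocks - freed_now;
}
```

Starting from empty allocators, the first `try_hold()` marks a commit as started instead of leaving every holder blocked, and later waiters don't start a second one. Once the commit ends and clears `committing`, holds are granted against the refilled space.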
Merge pull request #132 from versity/zab/v1.15
2026-01-08 13:01:23 +00:00 · 2023-10-03 13:32:09 -07:00 · 2023-07-17 13:02:10 -07:00 · 2023-07-17 12:07:13 -07:00 · 2023-07-17 10:21:18 -07:00 · 2023-07-17 09:36:09 -07:00
16 changed files with 1250 additions and 141 deletions
--- a/ReleaseNotes.md
+++ b/ReleaseNotes.md
@@ -1,6 +1,39 @@
 Versity ScoutFS Release Notes
 =============================

+---
+v1.15
+\
+*Jul 17, 2023*
+
+Process log btree merge splicing in multiple commits.  This prevents a
+rare case where pending log merge completions contain more work than can
+be done in a single server commit, causing the server to trigger an
+assert shortly after starting.
+
+Fix spurious EINVAL from data writes when data\_prealloc\_contig\_only was
+set to 0.
+
+---
+v1.14
+\
+*Jun 29, 2023*
+
+Add get\_referring\_entries ioctl for getting directory entries that
+refer to an inode.
+
+Fix excessive CPU use in the move\_blocks interface when moving a large
+number of extents.
+
+Reduce fragmented data allocation when contig\_only prealloc is not in
+use by more consistently allocating multi-block extents within each
+aligned prealloc region.
+
+Avoid rare deadlock in metadata block cache reclaim under both heavy
+load and memory pressure.
+
+Fix crash when using quorum\_heartbeat\_timeout\_ms mount option.
+
 ---
 v1.13
 \
--- a/kmod/src/data.c
+++ b/kmod/src/data.c
@@ -456,11 +456,11 @@ static int alloc_block(struct super_block *sb, struct inode *inode,

 	} else {
 		/*
-		 * Preallocation of aligned regions only preallocates if
-		 * the aligned region contains no extents at all.  This
-		 * could be fooled by offline sparse extents but we
-		 * don't want to iterate over all offline extents in the
-		 * aligned region.
+		 * Preallocation within aligned regions tries to
+		 * allocate an extent to fill the hole in the region
+		 * that contains iblock.  We'd have to add a bit of plumbing
+		 * to find previous extents so we only search for a next
+		 * extent from the front of the region and from iblock.
 		 */
 		div64_u64_rem(iblock, opts.data_prealloc_blocks, &rem);
 		start = iblock - rem;
@@ -468,8 +468,20 @@ static int alloc_block(struct super_block *sb, struct inode *inode,
 		ret = scoutfs_ext_next(sb, &data_ext_ops, &args, start, 1, &found);
 		if (ret < 0 && ret != -ENOENT)
 			goto out;
-		if (found.len && found.start < start + count)
-			count = 1;
+
+		/* trim count if there's an extent in the region before iblock */
+		if (found.len && found.start < iblock) {
+			count -= iblock - start;
+			start = iblock;
+			/* see if there's also an extent after iblock */
+			ret = scoutfs_ext_next(sb, &data_ext_ops, &args, iblock, 1, &found);
+			if (ret < 0 && ret != -ENOENT)
+				goto out;
+		}
+
+		/* trim count by next extent after iblock */
+		if (found.len && found.start > start && found.start < start + count)
+			count = (found.start - start);
 	}

 	/* overall prealloc limit */
@@ -1253,6 +1265,7 @@ int scoutfs_data_move_blocks(struct inode *from, u64 from_off,
 	from_iblock = from_off >> SCOUTFS_BLOCK_SM_SHIFT;
 	count = (byte_len + SCOUTFS_BLOCK_SM_MASK) >> SCOUTFS_BLOCK_SM_SHIFT;
 	to_iblock = to_off >> SCOUTFS_BLOCK_SM_SHIFT;
+	from_start = from_iblock;

 	/* only move extent blocks inside i_size, careful not to wrap */
 	from_size = i_size_read(from);
@@ -1329,7 +1342,7 @@ int scoutfs_data_move_blocks(struct inode *from, u64 from_off,

 			/* find the next extent to move */
 			ret = scoutfs_ext_next(sb, &data_ext_ops, &from_args,
-					       from_iblock, 1, &ext);
+					       from_start, 1, &ext);
 			if (ret < 0) {
 				if (ret == -ENOENT) {
 					done = true;
@@ -1417,6 +1430,12 @@ int scoutfs_data_move_blocks(struct inode *from, u64 from_off,
 							i_size_read(from);
 				i_size_write(to, to_size);
 			}
+
+			/* find next after moved extent, avoiding wrapping */
+			if (from_start + len < from_start)
+				from_start = from_iblock + count + 1;
+			else
+				from_start += len;
 		}


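The prealloc trimming in `alloc_block()` can be checked in isolation. This sketch assumes extents are a sorted array, uses a hypothetical `next_ext()` in place of `scoutfs_ext_next()` (returning the first extent that ends after `pos`), and assumes `count` starts as the whole aligned region, since the line that initializes `count` isn't shown in this hunk. `prealloc_span()` is also a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

struct ext {
	uint64_t start;
	uint64_t len;	/* 0 means "no extent found" */
};

/* Hypothetical stand-in for scoutfs_ext_next(): first extent ending after pos. */
static struct ext next_ext(const struct ext *exts, int nr, uint64_t pos)
{
	struct ext none = { 0, 0 };

	for (int i = 0; i < nr; i++) {
		if (exts[i].start + exts[i].len > pos)
			return exts[i];
	}
	return none;
}

/* Returns the number of blocks to preallocate and sets *start_ret; iblock must be a hole. */
static uint64_t prealloc_span(const struct ext *exts, int nr, uint64_t iblock,
			      uint64_t region_blocks, uint64_t *start_ret)
{
	uint64_t start;
	uint64_t count;
	struct ext found;

	assert(region_blocks > 0);
	start = iblock - (iblock % region_blocks);
	count = region_blocks;		/* assumption: start with the whole aligned region */
	found = next_ext(exts, nr, start);

	/* an extent before iblock: only fill from iblock to the end of the region */
	if (found.len && found.start < iblock) {
		count -= iblock - start;
		start = iblock;
		found = next_ext(exts, nr, iblock);
	}

	/* stop short of the next extent within the span */
	if (found.len && found.start > start && found.start < start + count)
		count = found.start - start;

	*start_ret = start;
	return count;
}
```

With an 8-block region and `iblock` 10, an existing extent at blocks 8-9 moves the start to 10, and another at block 13 trims the allocation to blocks 10-12. The previous code dropped `count` to 1 whenever any extent was found in the region.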
--- a/kmod/src/dir.c
+++ b/kmod/src/dir.c
@@ -1253,75 +1253,93 @@ int scoutfs_symlink_drop(struct super_block *sb, u64 ino,
 }

 /*
- * Find the next link backref key for the given ino starting from the
- * given dir inode and final entry position.  If we find a backref item
- * we add an allocated copy of it to the head of the caller's list.
+ * Find the next link backref items for the given ino starting from the
+ * given dir inode and final entry position.  For each backref item we
+ * add an allocated copy of it to the head of the caller's list.
 *
- * Returns 0 if we added an entry, -ENOENT if we didn't, and -errno for
- * search errors.
+ * Callers who are building a path can add one entry for each parent.
+ * They're left with a list of entries from the root down in list order.
+ *
+ * Callers who are gathering multiple entries for one inode get the
+ * entries in the opposite order that their items are found.
+ *
+ * Returns +ve for number of entries added, -ENOENT if no entries were
+ * found, or -errno on error.  It weirdly won't return 0, but early
+ * callers preferred -ENOENT so we use that for the case of no entries.
 *
 * Callers are comfortable with the race inherent to incrementally
- * building up a path with individual locked backref item lookups.
+ * gathering backrefs across multiple lock acquisitions.
 */
-int scoutfs_dir_add_next_linkref(struct super_block *sb, u64 ino,
-				 u64 dir_ino, u64 dir_pos,
-				 struct list_head *list)
+int scoutfs_dir_add_next_linkrefs(struct super_block *sb, u64 ino, u64 dir_ino, u64 dir_pos,
+				  int count, struct list_head *list)
 {
+	struct scoutfs_link_backref_entry *prev_ent = NULL;
 	struct scoutfs_link_backref_entry *ent = NULL;
 	struct scoutfs_lock *lock = NULL;
 	struct scoutfs_key last_key;
 	struct scoutfs_key key;
+	int nr = 0;
 	int len;
 	int ret;

-	ent = kmalloc(offsetof(struct scoutfs_link_backref_entry,
-			       dent.name[SCOUTFS_NAME_LEN]), GFP_KERNEL);
-	if (!ent) {
-		ret = -ENOMEM;
-		goto out;
-	}
-
-	INIT_LIST_HEAD(&ent->head);
-
 	init_dirent_key(&key, SCOUTFS_LINK_BACKREF_TYPE, ino, dir_ino, dir_pos);
-	init_dirent_key(&last_key, SCOUTFS_LINK_BACKREF_TYPE, ino, U64_MAX,
-			U64_MAX);
+	init_dirent_key(&last_key, SCOUTFS_LINK_BACKREF_TYPE, ino, U64_MAX, U64_MAX);

 	ret = scoutfs_lock_ino(sb, SCOUTFS_LOCK_READ, 0, ino, &lock);
 	if (ret)
 		goto out;

-	ret = scoutfs_item_next(sb, &key, &last_key, &ent->dent,
-				dirent_bytes(SCOUTFS_NAME_LEN), lock);
-	scoutfs_unlock(sb, lock, SCOUTFS_LOCK_READ);
-	lock = NULL;
-	if (ret < 0)
-		goto out;
+	while (nr < count) {
+		ent = kmalloc(offsetof(struct scoutfs_link_backref_entry,
+				       dent.name[SCOUTFS_NAME_LEN]), GFP_NOFS);
+		if (!ent) {
+			ret = -ENOMEM;
+			goto out;
+		}

-	len = ret - sizeof(struct scoutfs_dirent);
-	if (len < 1 || len > SCOUTFS_NAME_LEN) {
-		scoutfs_corruption(sb, SC_DIRENT_BACKREF_NAME_LEN,
-				   corrupt_dirent_backref_name_len,
-				   "ino %llu dir_ino %llu pos %llu key "SK_FMT" len %d",
-				   ino, dir_ino, dir_pos, SK_ARG(&key), len);
-		ret = -EIO;
-		goto out;
+		INIT_LIST_HEAD(&ent->head);
+
+		ret = scoutfs_item_next(sb, &key, &last_key, &ent->dent,
+					dirent_bytes(SCOUTFS_NAME_LEN), lock);
+		if (ret < 0) {
+			if (ret == -ENOENT && prev_ent)
+				prev_ent->last = true;
+			goto out;
+		}
+
+		len = ret - sizeof(struct scoutfs_dirent);
+		if (len < 1 || len > SCOUTFS_NAME_LEN) {
+			scoutfs_corruption(sb, SC_DIRENT_BACKREF_NAME_LEN,
+					   corrupt_dirent_backref_name_len,
+					   "ino %llu dir_ino %llu pos %llu key "SK_FMT" len %d",
+					   ino, dir_ino, dir_pos, SK_ARG(&key), len);
+			ret = -EIO;
+			goto out;
+		}
+
+		ent->dir_ino = le64_to_cpu(key.skd_major);
+		ent->dir_pos = le64_to_cpu(key.skd_minor);
+		ent->name_len = len;
+		ent->d_type = dentry_type(ent->dent.type);
+		ent->last = false;
+
+		trace_scoutfs_dir_add_next_linkref_found(sb, ino, ent->dir_ino, ent->dir_pos,
+							 ent->name_len);
+
+		list_add(&ent->head, list);
+		prev_ent = ent;
+		ent = NULL;
+		nr++;
+		scoutfs_key_inc(&key);
 	}

-	list_add(&ent->head, list);
-	ent->dir_ino = le64_to_cpu(key.skd_major);
-	ent->dir_pos = le64_to_cpu(key.skd_minor);
-	ent->name_len = len;
 	ret = 0;
 out:
-	trace_scoutfs_dir_add_next_linkref(sb, ino, dir_ino, dir_pos, ret,
-					   ent ? ent->dir_ino : 0,
-					   ent ? ent->dir_pos : 0,
-					   ent ? ent->name_len : 0);
+	scoutfs_unlock(sb, lock, SCOUTFS_LOCK_READ);
+	trace_scoutfs_dir_add_next_linkrefs(sb, ino, dir_ino, dir_pos, count, nr, ret);

-	if (ent && list_empty(&ent->head))
-		kfree(ent);
-	return ret;
+	kfree(ent);
+	return nr ?: ret;
 }

 static u64 first_backref_dir_ino(struct list_head *list)
@@ -1396,7 +1414,7 @@ retry:
 	}

 	/* get the next link name to the given inode */
-	ret = scoutfs_dir_add_next_linkref(sb, ino, dir_ino, dir_pos, list);
+	ret = scoutfs_dir_add_next_linkrefs(sb, ino, dir_ino, dir_pos, 1, list);
 	if (ret < 0)
 		goto out;

@@ -1404,7 +1422,7 @@ retry:
 	par_ino = first_backref_dir_ino(list);
 	while (par_ino != SCOUTFS_ROOT_INO) {

-		ret = scoutfs_dir_add_next_linkref(sb, par_ino, 0, 0, list);
+		ret = scoutfs_dir_add_next_linkrefs(sb, par_ino, 0, 0, 1, list);
 		if (ret < 0) {
 			if (ret == -ENOENT) {
 				/* restart if there was no parent component */
@@ -1416,6 +1434,8 @@ retry:

 		par_ino = first_backref_dir_ino(list);
 	}
+
+	ret = 0;
 out:
 	if (ret < 0)
 		scoutfs_dir_free_backref_path(sb, list);
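The ordering and return rules in the `scoutfs_dir_add_next_linkrefs()` comment can be seen in a small model. `struct ent` and `add_next()` are hypothetical: a sorted array stands in for the backref items, a singly linked push-to-head list stands in for `list_add()`, the caller supplies entry storage, and errors other than `-ENOENT` are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct ent {
	unsigned long long pos;
	bool last;
	struct ent *next;
};

/*
 * Add up to count items at or after from, pushing each onto the head of
 * *list.  pool must have room for count entries.
 */
static int add_next(const unsigned long long *items, int nr_items, unsigned long long from,
		    int count, struct ent *pool, struct ent **list)
{
	struct ent *prev = NULL;
	int nr = 0;
	int i = 0;

	assert(count > 0);

	/* skip to the first item at or after the cursor */
	while (i < nr_items && items[i] < from)
		i++;

	while (nr < count) {
		struct ent *e;

		if (i == nr_items) {
			/* like -ENOENT from scoutfs_item_next(): flag the final entry */
			if (prev)
				prev->last = true;
			break;
		}

		e = &pool[nr];
		e->pos = items[i++];
		e->last = false;
		e->next = *list;	/* add to the head, as list_add() does */
		*list = e;
		prev = e;
		nr++;
	}

	/* never 0: a positive count, or -ENOENT when nothing was found */
	return nr > 0 ? nr : -ENOENT;
}
```

Because each entry lands at the head, the list ends up in reverse key order, which is why `scoutfs_ioc_get_referring_entries()` walks it with `list_for_each_entry_safe_reverse()`. The last flag is only set when the search runs out of items before reaching `count`.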
--- a/kmod/src/dir.h
+++ b/kmod/src/dir.h
@@ -15,6 +15,8 @@ struct scoutfs_link_backref_entry {
 	u64 dir_ino;
 	u64 dir_pos;
 	u16 name_len;
+	u8 d_type;
+	bool last;
 	struct scoutfs_dirent dent;
 	/* the full name is allocated and stored in dent.name[] */
 };
@@ -24,9 +26,8 @@ int scoutfs_dir_get_backref_path(struct super_block *sb, u64 ino, u64 dir_ino,
 void scoutfs_dir_free_backref_path(struct super_block *sb,
 				   struct list_head *list);

-int scoutfs_dir_add_next_linkref(struct super_block *sb, u64 ino,
-				 u64 dir_ino, u64 dir_pos,
-				 struct list_head *list);
+int scoutfs_dir_add_next_linkrefs(struct super_block *sb, u64 ino, u64 dir_ino, u64 dir_pos,
+				  int count, struct list_head *list);

 int scoutfs_symlink_drop(struct super_block *sb, u64 ino,
 			 struct scoutfs_lock *lock, u64 i_size);
--- a/kmod/src/export.c
+++ b/kmod/src/export.c
@@ -114,8 +114,8 @@ static struct dentry *scoutfs_get_parent(struct dentry *child)
 	int ret;
 	u64 ino;

-	ret = scoutfs_dir_add_next_linkref(sb, scoutfs_ino(inode), 0, 0, &list);
-	if (ret)
+	ret = scoutfs_dir_add_next_linkrefs(sb, scoutfs_ino(inode), 0, 0, 1, &list);
+	if (ret < 0)
 		return ERR_PTR(ret);

 	ent = list_first_entry(&list, struct scoutfs_link_backref_entry, head);
@@ -138,9 +138,9 @@ static int scoutfs_get_name(struct dentry *parent, char *name,
 	LIST_HEAD(list);
 	int ret;

-	ret = scoutfs_dir_add_next_linkref(sb, scoutfs_ino(inode), dir_ino,
-					   0, &list);
-	if (ret)
+	ret = scoutfs_dir_add_next_linkrefs(sb, scoutfs_ino(inode), dir_ino,
+					    0, 1, &list);
+	if (ret < 0)
 		return ret;

 	ret = -ENOENT;
--- a/kmod/src/ioctl.c
+++ b/kmod/src/ioctl.c
@@ -1398,6 +1398,110 @@ out:
 	return ret ?: nr;
 }

+/*
+ * Copy entries that point to an inode to the user's buffer.  We copy to
+ * userspace from copies of the entries that are acquired under a lock
+ * so that we don't fault while holding cluster locks.  It also gives us
+ * a chance to limit the amount of work under each lock hold.
+ */
+static long scoutfs_ioc_get_referring_entries(struct file *file, unsigned long arg)
+{
+	struct super_block *sb = file_inode(file)->i_sb;
+	struct scoutfs_ioctl_get_referring_entries gre;
+	struct scoutfs_link_backref_entry *bref = NULL;
+	struct scoutfs_link_backref_entry *bref_tmp;
+	struct scoutfs_ioctl_dirent __user *uent;
+	struct scoutfs_ioctl_dirent ent;
+	LIST_HEAD(list);
+	u64 copied;
+	int name_len;
+	int bytes;
+	long nr;
+	int ret;
+
+	if (!capable(CAP_DAC_READ_SEARCH))
+		return -EPERM;
+
+	if (copy_from_user(&gre, (void __user *)arg, sizeof(gre)))
+		return -EFAULT;
+
+	uent = (void __user *)(unsigned long)gre.entries_ptr;
+	copied = 0;
+	nr = 0;
+
+	/* use entry as cursor between calls */
+	ent.dir_ino = gre.dir_ino;
+	ent.dir_pos = gre.dir_pos;
+
+	for (;;) {
+		ret = scoutfs_dir_add_next_linkrefs(sb, gre.ino, ent.dir_ino, ent.dir_pos, 1024,
+						    &list);
+		if (ret < 0) {
+			if (ret == -ENOENT)
+				ret = 0;
+			goto out;
+		}
+
+		/* _add_next adds each entry to the head, _reverse for key order */
+		list_for_each_entry_safe_reverse(bref, bref_tmp, &list, head) {
+			list_del_init(&bref->head);
+
+			name_len = bref->name_len;
+			bytes = ALIGN(offsetof(struct scoutfs_ioctl_dirent, name[name_len + 1]),
+				      16);
+			if (copied + bytes > gre.entries_bytes) {
+				ret = -EINVAL;
+				goto out;
+			}
+
+			ent.dir_ino = bref->dir_ino;
+			ent.dir_pos = bref->dir_pos;
+			ent.ino = gre.ino;
+			ent.entry_bytes = bytes;
+			ent.flags = bref->last ? SCOUTFS_IOCTL_DIRENT_FLAG_LAST : 0;
+			ent.d_type = bref->d_type;
+			ent.name_len = name_len;
+
+			if (copy_to_user(uent, &ent, sizeof(struct scoutfs_ioctl_dirent)) ||
+			    copy_to_user(&uent->name[0], bref->dent.name, name_len) ||
+			    put_user('\0', &uent->name[name_len])) {
+				ret = -EFAULT;
+				goto out;
+			}
+
+			kfree(bref);
+			bref = NULL;
+
+			uent = (void __user *)uent + bytes;
+			copied += bytes;
+			nr++;
+
+			if (nr == LONG_MAX || (ent.flags & SCOUTFS_IOCTL_DIRENT_FLAG_LAST)) {
+				ret = 0;
+				goto out;
+			}
+		}
+
+		/* advance cursor pos from last copied entry */
+		if (++ent.dir_pos == 0) {
+			if (++ent.dir_ino == 0) {
+				ret = 0;
+				goto out;
+			}
+		}
+	}
+
+	ret = 0;
+out:
+	kfree(bref);
+	list_for_each_entry_safe(bref, bref_tmp, &list, head) {
+		list_del_init(&bref->head);
+		kfree(bref);
+	}
+
+	return nr ?: ret;
+}
+
 long scoutfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 {
 	switch (cmd) {
@@ -1433,6 +1537,8 @@ long scoutfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 		return scoutfs_ioc_read_xattr_totals(file, arg);
 	case SCOUTFS_IOC_GET_ALLOCATED_INOS:
 		return scoutfs_ioc_get_allocated_inos(file, arg);
+	case SCOUTFS_IOC_GET_REFERRING_ENTRIES:
+		return scoutfs_ioc_get_referring_entries(file, arg);
 	}

 	return -ENOTTY;
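The cursor advance at the bottom of the ioctl loop, which is also the resume rule documented for callers in ioctl.h, is an increment with carry from `dir_pos` into `dir_ino`. `advance_cursor()` is a hypothetical helper that a userspace caller might write between calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Step (dir_ino, dir_pos) to one past the last entry seen.  Returns
 * false when both wrap, meaning there is nothing left to search.
 */
static bool advance_cursor(uint64_t *dir_ino, uint64_t *dir_pos)
{
	assert(dir_ino != NULL && dir_pos != NULL);

	if (++*dir_pos == 0) {
		/* dir_pos wrapped: continue from position 0 of the next dir inode */
		if (++*dir_ino == 0)
			return false;
	}
	return true;
}
```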
--- a/kmod/src/ioctl.h
+++ b/kmod/src/ioctl.h
@@ -559,4 +559,118 @@ struct scoutfs_ioctl_get_allocated_inos {
 #define SCOUTFS_IOC_GET_ALLOCATED_INOS \
 	_IOW(SCOUTFS_IOCTL_MAGIC, 16, struct scoutfs_ioctl_get_allocated_inos)

+/*
+ * Get directory entries that refer to a specific inode.
+ *
+ * @ino: The target ino that we're finding referring entries to.
+ * Constant across all the calls that make up an iteration over all the
+ * inode's entries.
+ *
+ * @dir_ino: The inode number of a directory containing the entry to our
+ * inode to search from.  If this parent directory contains no more
+ * entries to our inode then we'll search through other parent directory
+ * inodes in inode order.
+ *
+ * @dir_pos: The position in the dir_ino parent directory of the entry
+ * to our inode to search from.  If there is no entry at this position
+ * then we'll search through other entry positions in increasing order.
+ * If we exhaust the parent directory then we'll search through
+ * additional parent directories in inode order.
+ *
+ * @entries_ptr: A pointer to the buffer where found entries will be
+ * stored.  The pointer must be aligned to 16 bytes.
+ *
+ * @entries_bytes: The size of the buffer that will contain entries.
+ *
+ * To start iterating set the desired target ino, dir_ino to 0, dir_pos
+ * to 0, and set entries_ptr and entries_bytes to a sufficiently large buffer.
+ * Each entry struct that's stored in the buffer adds some overhead so a
+ * large multiple of the largest possible name is a reasonable choice.
+ * (A few multiples of PATH_MAX perhaps.)
+ *
+ * Each call returns the total number of entries that were stored in the
+ * entries buffer.  Zero is returned when the search was successful and
+ * no referring entries were found.  The entries can be iterated over by
+ * advancing each starting struct offset by the total number of bytes in
+ * each entry.  If the _LAST flag is set on an entry then there were no
+ * more entries referring to the inode at the time of the call and
+ * iteration can be stopped.
+ *
+ * To resume iteration set the next call's starting dir_ino and dir_pos
+ * to one past the last entry seen.  Increment the last entry's dir_pos,
+ * and if it wrapped to 0, increment its dir_ino.
+ *
+ * This does not check that the caller has permission to read the
+ * entries found in each containing directory.  It requires
+ * CAP_DAC_READ_SEARCH which bypasses path traversal permissions
+ * checking.
+ *
+ * Entries returned by a single call can reflect any combination of
+ * racing creation and removal of entries.  Each entry existed at the
+ * time it was read though it may have changed in the time it took to
+ * return from the call.  The set of entries returned may no longer
+ * reflect the current set of entries and may not have existed at the
+ * same time.
+ *
+ * This has no knowledge of the life cycle of the inode.  It returns 0
+ * when there are no referring entries, whether because the target inode
+ * doesn't exist, because it is in the process of being deleted, or
+ * because it is still open after being unlinked.
+ *
+ * On success this returns the number of entries filled in the buffer.
+ * A return of 0 indicates that no entries referred to the inode.
+ *
+ * EINVAL is returned when there is a problem with the buffer.  Either
+ * it was not aligned or it was not large enough for the first entry.
+ *
+ * Many other errnos indicate hard failure to find the next entry.
+ */
+struct scoutfs_ioctl_get_referring_entries {
+	__u64 ino;
+	__u64 dir_ino;
+	__u64 dir_pos;
+	__u64 entries_ptr;
+	__u64 entries_bytes;
+};
+
+/*
+ * @dir_ino: The inode of the directory containing the entry.
+ *
+ * @dir_pos: The readdir f_pos position of the entry within the
+ * directory.
+ *
+ * @ino: The inode number of the target of the entry.
+ *
+ * @flags: Flags associated with this entry.
+ *
+ * @d_type: Inode type as specified with DT_ enum values in readdir(3).
+ *
+ * @entry_bytes: The total bytes taken by the entry in memory, including
+ * the name and any alignment padding.  The start of a following entry
+ * will be found after this number of bytes.
+ *
+ * @name_len: The number of bytes in the name not including the trailing
+ * null, ala strlen(3).
+ *
+ * @name: The null terminated name of the referring entry.  In the
+ * struct definition this array is sized to naturally align the struct.
+ * That number of padded bytes are not necessarily found in the buffer
+ * returned by _get_referring_entries.
+ */
+struct scoutfs_ioctl_dirent {
+	__u64 dir_ino;
+	__u64 dir_pos;
+	__u64 ino;
+	__u16 entry_bytes;
+	__u8  flags;
+	__u8  d_type;
+	__u8  name_len;
+	__u8  name[3];
+};
+
+#define SCOUTFS_IOCTL_DIRENT_FLAG_LAST (1 << 0)
+
+#define SCOUTFS_IOC_GET_REFERRING_ENTRIES \
+	_IOW(SCOUTFS_IOCTL_MAGIC, 17, struct scoutfs_ioctl_get_referring_entries)
+
 #endif
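A caller walks the returned buffer by `entry_bytes` and stops at the `_LAST` flag. The sketch below copies the dirent struct with `<stdint.h>` types and fills a buffer by hand rather than calling `ioctl(fd, SCOUTFS_IOC_GET_REFERRING_ENTRIES, &gre)`, so it runs without a scoutfs mount. `dirent_bytes()`, `put_entry()` and `walk_entries()` are hypothetical helpers; `dirent_bytes()` mirrors the `ALIGN(offsetof(..., name[name_len + 1]), 16)` sizing in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace copy of struct scoutfs_ioctl_dirent from the header above. */
struct scoutfs_ioctl_dirent {
	uint64_t dir_ino;
	uint64_t dir_pos;
	uint64_t ino;
	uint16_t entry_bytes;
	uint8_t  flags;
	uint8_t  d_type;
	uint8_t  name_len;
	uint8_t  name[3];
};

#define SCOUTFS_IOCTL_DIRENT_FLAG_LAST (1 << 0)

/* Header plus name plus trailing null, rounded up to 16 bytes. */
static size_t dirent_bytes(size_t name_len)
{
	size_t bytes = offsetof(struct scoutfs_ioctl_dirent, name) + name_len + 1;

	return (bytes + 15) & ~(size_t)15;
}

/* Lay out one entry like the kernel does; only used here to build a test buffer. */
static size_t put_entry(void *at, uint64_t dir_ino, uint64_t dir_pos, uint64_t ino,
			const char *name, uint8_t flags)
{
	struct scoutfs_ioctl_dirent *ent = at;
	size_t len = strlen(name);
	size_t bytes = dirent_bytes(len);

	memset(at, 0, bytes);
	ent->dir_ino = dir_ino;
	ent->dir_pos = dir_pos;
	ent->ino = ino;
	ent->entry_bytes = (uint16_t)bytes;
	ent->flags = flags;
	ent->name_len = (uint8_t)len;
	memcpy((unsigned char *)at + offsetof(struct scoutfs_ioctl_dirent, name), name, len + 1);
	return bytes;
}

/* Visit nr entries from one call, recording the cursor of the last one seen. */
static long walk_entries(const void *buf, long nr, int *saw_last,
			 uint64_t *last_dir_ino, uint64_t *last_dir_pos)
{
	const unsigned char *p = buf;
	long i;

	*saw_last = 0;
	for (i = 0; i < nr; i++) {
		const struct scoutfs_ioctl_dirent *ent = (const void *)p;

		*last_dir_ino = ent->dir_ino;
		*last_dir_pos = ent->dir_pos;
		if (ent->flags & SCOUTFS_IOCTL_DIRENT_FLAG_LAST) {
			*saw_last = 1;
			return i + 1;
		}
		/* the next entry starts entry_bytes later */
		p += ent->entry_bytes;
	}
	return i;
}
```

If no entry carried `_LAST`, a caller would advance the recorded `dir_ino`/`dir_pos` by one with carry and call the ioctl again.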
--- a/kmod/src/scoutfs_trace.h
+++ b/kmod/src/scoutfs_trace.h
@@ -817,22 +817,17 @@ TRACE_EVENT(scoutfs_advance_dirty_super,
 	TP_printk(SCSBF" super seq now %llu", SCSB_TRACE_ARGS, __entry->seq)
 );

-TRACE_EVENT(scoutfs_dir_add_next_linkref,
+TRACE_EVENT(scoutfs_dir_add_next_linkref_found,
 	TP_PROTO(struct super_block *sb, __u64 ino, __u64 dir_ino,
-		 __u64 dir_pos, int ret, __u64 found_dir_ino,
-		 __u64 found_dir_pos, unsigned int name_len),
+		 __u64 dir_pos, unsigned int name_len),

-	TP_ARGS(sb, ino, dir_ino, dir_pos, ret, found_dir_pos, found_dir_ino,
-		name_len),
+	TP_ARGS(sb, ino, dir_ino, dir_pos, name_len),

 	TP_STRUCT__entry(
 		SCSB_TRACE_FIELDS
 		__field(__u64, ino)
 		__field(__u64, dir_ino)
 		__field(__u64, dir_pos)
-		__field(int, ret)
-		__field(__u64, found_dir_ino)
-		__field(__u64, found_dir_pos)
 		__field(unsigned int, name_len)
 	),

@@ -841,16 +836,43 @@ TRACE_EVENT(scoutfs_dir_add_next_linkref,
 		__entry->ino = ino;
 		__entry->dir_ino = dir_ino;
 		__entry->dir_pos = dir_pos;
-		__entry->ret = ret;
-		__entry->found_dir_ino = dir_ino;
-		__entry->found_dir_pos = dir_pos;
 		__entry->name_len = name_len;
 	),

-	TP_printk(SCSBF" ino %llu dir_ino %llu dir_pos %llu ret %d found_dir_ino %llu found_dir_pos %llu name_len %u",
-		  SCSB_TRACE_ARGS, __entry->ino, __entry->dir_pos,
-		  __entry->dir_ino, __entry->ret, __entry->found_dir_pos,
-		  __entry->found_dir_ino, __entry->name_len)
+	TP_printk(SCSBF" ino %llu dir_ino %llu dir_pos %llu name_len %u",
+		  SCSB_TRACE_ARGS, __entry->ino, __entry->dir_ino,
+		  __entry->dir_pos, __entry->name_len)
+);
+
+TRACE_EVENT(scoutfs_dir_add_next_linkrefs,
+	TP_PROTO(struct super_block *sb, __u64 ino, __u64 dir_ino,
+		 __u64 dir_pos, int count, int nr, int ret),
+
+	TP_ARGS(sb, ino, dir_ino, dir_pos, count, nr, ret),
+
+	TP_STRUCT__entry(
+		SCSB_TRACE_FIELDS
+		__field(__u64, ino)
+		__field(__u64, dir_ino)
+		__field(__u64, dir_pos)
+		__field(int, count)
+		__field(int, nr)
+		__field(int, ret)
+	),
+
+	TP_fast_assign(
+		SCSB_TRACE_ASSIGN(sb);
+		__entry->ino = ino;
+		__entry->dir_ino = dir_ino;
+		__entry->dir_pos = dir_pos;
+		__entry->count = count;
+		__entry->nr = nr;
+		__entry->ret = ret;
+	),
+
+	TP_printk(SCSBF" ino %llu dir_ino %llu dir_pos %llu count %d nr %d ret %d",
+		  SCSB_TRACE_ARGS, __entry->ino, __entry->dir_ino,
+		  __entry->dir_pos, __entry->count, __entry->nr, __entry->ret)
 );

 TRACE_EVENT(scoutfs_write_begin,
@@ -1874,8 +1896,9 @@ DEFINE_EVENT(scoutfs_server_client_count_class, scoutfs_server_client_down,

 DECLARE_EVENT_CLASS(scoutfs_server_commit_users_class,
        TP_PROTO(struct super_block *sb, int holding, int applying, int nr_holders,
-		 u32 avail_before, u32 freed_before, int exceeded),
-        TP_ARGS(sb, holding, applying, nr_holders, avail_before, freed_before, exceeded),
+		 u32 avail_before, u32 freed_before, int committing, int exceeded),
+        TP_ARGS(sb, holding, applying, nr_holders, avail_before, freed_before, committing,
+		exceeded),
        TP_STRUCT__entry(
 		SCSB_TRACE_FIELDS
 		__field(int, holding)
@@ -1883,6 +1906,7 @@ DECLARE_EVENT_CLASS(scoutfs_server_commit_users_class,
 		__field(int, nr_holders)
 		__field(__u32, avail_before)
 		__field(__u32, freed_before)
+		__field(int, committing)
 		__field(int, exceeded)
        ),
        TP_fast_assign(
@@ -1892,31 +1916,33 @@ DECLARE_EVENT_CLASS(scoutfs_server_commit_users_class,
 		__entry->nr_holders = nr_holders;
 		__entry->avail_before = avail_before;
 		__entry->freed_before = freed_before;
+		__entry->committing = !!committing;
 		__entry->exceeded = !!exceeded;
        ),
-	TP_printk(SCSBF" holding %u applying %u nr %u avail_before %u freed_before %u exceeded %u",
+	TP_printk(SCSBF" holding %u applying %u nr %u avail_before %u freed_before %u committing %u exceeded %u",
 		  SCSB_TRACE_ARGS, __entry->holding, __entry->applying, __entry->nr_holders,
-		  __entry->avail_before, __entry->freed_before, __entry->exceeded)
+		  __entry->avail_before, __entry->freed_before, __entry->committing,
+		  __entry->exceeded)
 );
 DEFINE_EVENT(scoutfs_server_commit_users_class, scoutfs_server_commit_hold,
        TP_PROTO(struct super_block *sb, int holding, int applying, int nr_holders,
-		 u32 avail_before, u32 freed_before, int exceeded),
-        TP_ARGS(sb, holding, applying, nr_holders, avail_before, freed_before, exceeded)
+		 u32 avail_before, u32 freed_before, int committing, int exceeded),
+        TP_ARGS(sb, holding, applying, nr_holders, avail_before, freed_before, committing, exceeded)
 );
 DEFINE_EVENT(scoutfs_server_commit_users_class, scoutfs_server_commit_apply,
        TP_PROTO(struct super_block *sb, int holding, int applying, int nr_holders,
-		 u32 avail_before, u32 freed_before, int exceeded),
-        TP_ARGS(sb, holding, applying, nr_holders, avail_before, freed_before, exceeded)
+		 u32 avail_before, u32 freed_before, int committing, int exceeded),
+        TP_ARGS(sb, holding, applying, nr_holders, avail_before, freed_before, committing, exceeded)
 );
 DEFINE_EVENT(scoutfs_server_commit_users_class, scoutfs_server_commit_start,
        TP_PROTO(struct super_block *sb, int holding, int applying, int nr_holders,
-		 u32 avail_before, u32 freed_before, int exceeded),
-        TP_ARGS(sb, holding, applying, nr_holders, avail_before, freed_before, exceeded)
+		 u32 avail_before, u32 freed_before, int committing, int exceeded),
+        TP_ARGS(sb, holding, applying, nr_holders, avail_before, freed_before, committing, exceeded)
 );
 DEFINE_EVENT(scoutfs_server_commit_users_class, scoutfs_server_commit_end,
        TP_PROTO(struct super_block *sb, int holding, int applying, int nr_holders,
-		 u32 avail_before, u32 freed_before, int exceeded),
-        TP_ARGS(sb, holding, applying, nr_holders, avail_before, freed_before, exceeded)
+		 u32 avail_before, u32 freed_before, int committing, int exceeded),
+        TP_ARGS(sb, holding, applying, nr_holders, avail_before, freed_before, committing, exceeded)
 );

 #define slt_symbolic(mode)						\
--- a/kmod/src/server.c
+++ b/kmod/src/server.c
@@ -67,6 +67,7 @@ struct commit_users {
 	unsigned int nr_holders;
 	u32 avail_before;
 	u32 freed_before;
+	bool committing;
 	bool exceeded;
 };

@@ -84,7 +85,7 @@ do {												\
 	__typeof__(cusers) _cusers = (cusers);							\
 	trace_scoutfs_server_commit_##which(sb, !list_empty(&_cusers->holding),			\
 		!list_empty(&_cusers->applying), _cusers->nr_holders, _cusers->avail_before,	\
-		_cusers->freed_before, _cusers->exceeded);					\
+		_cusers->freed_before, _cusers->committing, _cusers->exceeded);			\
 } while (0)

 struct server_info {
@@ -282,6 +283,14 @@ struct commit_hold {
 * per-holder allocation consumption tracking.   The best we can do is
 * flag all the current holders so that as they release we can see
 * everyone involved in crossing the limit.
+ *
+ * The consumption of space to record freed blocks is tricky.  The
+ * freed_before value was the space available as the holder started.
+ * But that happens before we actually dirty the first block in the
+ * freed list.  If that block is too full then we just allocate a new
+ * empty first block.  In that case the current remaining here can be a
+ * lot more than the initial freed_before.  We account for that by
+ * treating freed_before as the maximum capacity of an empty block,
+ * SCOUTFS_ALLOC_LIST_MAX_BLOCKS.
 */
 static void check_holder_budget(struct super_block *sb, struct server_info *server,
 				struct commit_users *cusers)
@@ -301,8 +310,13 @@ static void check_holder_budget(struct super_block *sb, struct server_info *serv
 		return;

 	scoutfs_alloc_meta_remaining(&server->alloc, &avail_now, &freed_now);
+
 	avail_used = cusers->avail_before - avail_now;
-	freed_used = cusers->freed_before - freed_now;
+	if (freed_now < cusers->freed_before)
+		freed_used = cusers->freed_before - freed_now;
+	else
+		freed_used = SCOUTFS_ALLOC_LIST_MAX_BLOCKS - freed_now;
+
 	budget = cusers->nr_holders * COMMIT_HOLD_ALLOC_BUDGET;
 	if (avail_used <= budget && freed_used <= budget)
 		return;
@@ -325,31 +339,18 @@ static void check_holder_budget(struct super_block *sb, struct server_info *serv
 /*
 * We don't have per-holder consumption.   We allow commit holders as
 * long as the total budget of all the holders doesn't exceed the alloc
- * resources that were available
+ * resources that were available.  If a hold is waiting for budget
+ * availability in the allocators then we try to kick off a commit to
+ * fill and use the next allocators after the current transaction.
 */
-static bool commit_alloc_has_room(struct server_info *server, struct commit_users *cusers,
-				  unsigned int more_holders)
-{
-	u32 avail_before;
-	u32 freed_before;
-	u32 budget;
-
-	if (cusers->nr_holders > 0) {
-		avail_before = cusers->avail_before;
-		freed_before = cusers->freed_before;
-	} else {
-		scoutfs_alloc_meta_remaining(&server->alloc, &avail_before, &freed_before);
-	}
-
-	budget = (cusers->nr_holders + more_holders) * COMMIT_HOLD_ALLOC_BUDGET;
-
-	return avail_before >= budget && freed_before >= budget;
-}
-
 static bool hold_commit(struct super_block *sb, struct server_info *server,
 			struct commit_users *cusers, struct commit_hold *hold)
 {
-	bool held = false;
+	bool has_room;
+	bool held;
+	u32 budget;
+	u32 av;
+	u32 fr;

 	spin_lock(&cusers->lock);

@@ -357,19 +358,39 @@ static bool hold_commit(struct super_block *sb, struct server_info *server,

 	check_holder_budget(sb, server, cusers);

+	if (cusers->nr_holders == 0) {
+		scoutfs_alloc_meta_remaining(&server->alloc, &av, &fr);
+	} else {
+		av = cusers->avail_before;
+		fr = cusers->freed_before;
+	}
+
 	/* +2 for our additional hold and then for the final commit work the server does */
-	if (list_empty(&cusers->applying) && commit_alloc_has_room(server, cusers, 2)) {
-		scoutfs_alloc_meta_remaining(&server->alloc, &hold->avail, &hold->freed);
+	budget = (cusers->nr_holders + 2) * COMMIT_HOLD_ALLOC_BUDGET;
+	has_room = av >= budget && fr >= budget;
+	/* checking applying so holders drain once an apply caller starts waiting */
+	held = !cusers->committing && has_room && list_empty(&cusers->applying);
+
+	if (held) {
 		if (cusers->nr_holders == 0) {
-			cusers->avail_before = hold->avail;
-			cusers->freed_before = hold->freed;
+			cusers->avail_before = av;
+			cusers->freed_before = fr;
+			hold->avail = av;
+			hold->freed = fr;
 			cusers->exceeded = false;
+		} else {
+			scoutfs_alloc_meta_remaining(&server->alloc, &hold->avail, &hold->freed);
 		}
+
 		hold->exceeded = false;
 		hold->start = ktime_get();
 		list_add_tail(&hold->entry, &cusers->holding);
+
 		cusers->nr_holders++;
-		held = true;
+
+	} else if (!has_room && cusers->nr_holders == 0 && !cusers->committing) {
+		cusers->committing = true;
+		queue_work(server->wq, &server->commit_work);
 	}

 	spin_unlock(&cusers->lock);
@@ -403,7 +424,6 @@ static int server_apply_commit(struct super_block *sb, struct commit_hold *hold,
 	DECLARE_SERVER_INFO(sb, server);
 	struct commit_users *cusers = &server->cusers;
 	struct timespec ts;
-	bool start_commit;

 	spin_lock(&cusers->lock);

@@ -424,12 +444,14 @@ static int server_apply_commit(struct super_block *sb, struct commit_hold *hold,
 		list_del_init(&hold->entry);
 		hold->ret = err;
 	}
-	cusers->nr_holders--;
-	start_commit = cusers->nr_holders == 0 && !list_empty(&cusers->applying);
-	spin_unlock(&cusers->lock);

-	if (start_commit)
+	cusers->nr_holders--;
+	if (cusers->nr_holders == 0 && !cusers->committing && !list_empty(&cusers->applying)) {
+		cusers->committing = true;
 		queue_work(server->wq, &server->commit_work);
+	}
+
+	spin_unlock(&cusers->lock);

 	wait_event(cusers->waitq, list_empty_careful(&hold->entry));
 	smp_rmb(); /* entry load before ret */
@@ -438,8 +460,8 @@ static int server_apply_commit(struct super_block *sb, struct commit_hold *hold,

 /*
 * Start a commit from the commit work.  We should only have been queued
- * while a holder is waiting to apply after all active holders have
- * finished.
+ * while there are no active holders and someone started the commit.
+ * There may or may not be blocked apply callers waiting for the result.
 */
 static int commit_start(struct super_block *sb, struct commit_users *cusers)
 {
@@ -448,7 +470,7 @@ static int commit_start(struct super_block *sb, struct commit_users *cusers)
 	/* make sure holders held off once commit started */
 	spin_lock(&cusers->lock);
 	TRACE_COMMIT_USERS(sb, cusers, start);
-	if (WARN_ON_ONCE(list_empty(&cusers->applying) || cusers->nr_holders != 0))
+	if (WARN_ON_ONCE(!cusers->committing || cusers->nr_holders != 0))
 		ret = -EINVAL;
 	spin_unlock(&cusers->lock);

@@ -471,6 +493,7 @@ static void commit_end(struct super_block *sb, struct commit_users *cusers, int
 	smp_wmb(); /* ret stores before list updates */
 	list_for_each_entry_safe(hold, tmp, &cusers->applying, entry)
 		list_del_init(&hold->entry);
+	cusers->committing = false;
 	spin_unlock(&cusers->lock);

 	wake_up(&cusers->waitq);
@@ -543,7 +566,7 @@ static void set_stable_super(struct server_info *server, struct scoutfs_super_bl
 * implement commits with a single pending work func.
 *
 * Processing paths hold the commit while they're making multiple
- * dependent changes.  When they're done and want it persistent they add
+ * dependent changes.  When they're done and want it persistent they
 * queue the commit work.  This work runs, performs the commit, and
 * wakes all the applying waiters with the result.  Readers can run
 * concurrently with these commits.
@@ -2058,6 +2081,13 @@ out:
 * reset the next range key if there's still work to do.  If the
 * operation is complete then we tear down the input log_trees items and
 * delete the status.
+ *
+ * Processing all the completions can take more than one transaction.
+ * We return -EINPROGRESS if we have to commit a transaction and the
+ * caller will apply the commit and immediately call back in so we can
+ * perform another commit.  We need to be very careful to leave the
+ * status in a state where requests won't be issued at the wrong time
+ * (by forcing nr_complete to a batch while we delete them).
 */
 static int splice_log_merge_completions(struct super_block *sb,
 					struct scoutfs_log_merge_status *stat,
@@ -2070,15 +2100,29 @@ static int splice_log_merge_completions(struct super_block *sb,
 	struct scoutfs_log_merge_range rng;
 	struct scoutfs_log_trees lt = {{{0,}}};
 	SCOUTFS_BTREE_ITEM_REF(iref);
+	bool upd_stat = true;
+	int einprogress = 0;
 	struct scoutfs_key key;
 	char *err_str = NULL;
+	u32 alloc_low;
+	u32 tmp;
 	u64 seq;
 	int ret;
+	int err;

 	/* musn't rebalance fs tree parents while reqs rely on their key bounds */
 	if (WARN_ON_ONCE(le64_to_cpu(stat->nr_requests) > 0))
 		return -EIO;

+	/*
+	 * Be overly conservative about how low the allocator can get
+	 * before we commit.  This gives us a lot of work to do in a
+	 * commit while also allowing a pretty big smallest allocator to
+	 * work with the theoretically unbounded alloc list splicing.
+	 */
+	scoutfs_alloc_meta_remaining(&server->alloc, &alloc_low, &tmp);
+	alloc_low = min(alloc_low, tmp) / 4;
+
 	/*
 	 * Splice in all the completed subtrees at the initial parent
 	 * blocks in the main fs_tree before rebalancing any of them.
@@ -2100,6 +2144,22 @@ static int splice_log_merge_completions(struct super_block *sb,

 		seq = le64_to_cpu(comp.seq);

+		/*
+		 * Use having cleared the lists as an indication that
+		 * we've already set the parents and don't need to dirty
+		 * the btree blocks to do it all over again.  This is
+		 * safe because there is always an fs block that the
+		 * merge dirties and frees into the meta_freed list.
+		 */
+		if (comp.meta_avail.ref.blkno == 0 && comp.meta_freed.ref.blkno == 0)
+			continue;
+
+		if (scoutfs_alloc_meta_low(sb, &server->alloc, alloc_low)) {
+			einprogress = -EINPROGRESS;
+			ret = 0;
+			goto out;
+		}
+
 		ret = scoutfs_btree_set_parent(sb, &server->alloc, &server->wri,
 					       &super->fs_root, &comp.start,
 					       &comp.root);
@@ -2134,6 +2194,14 @@ static int splice_log_merge_completions(struct super_block *sb,
 		}
 	}

+	/*
+	 * Once we start rebalancing we force the number of completions
+	 * to a batch so that requests won't be issued.  Once we're done
+	 * we clear the completion count and requests can flow again.
+	 */
+	if (le64_to_cpu(stat->nr_complete) < LOG_MERGE_SPLICE_BATCH)
+		stat->nr_complete = cpu_to_le64(LOG_MERGE_SPLICE_BATCH);
+
 	/*
 	 * Now with all the parent blocks spliced in, rebalance items
 	 * amongst parents that needed to split/join and delete the
@@ -2155,6 +2223,12 @@ static int splice_log_merge_completions(struct super_block *sb,

 		seq = le64_to_cpu(comp.seq);

+		if (scoutfs_alloc_meta_low(sb, &server->alloc, alloc_low)) {
+			einprogress = -EINPROGRESS;
+			ret = 0;
+			goto out;
+		}
+
 		/* balance when there was a remaining key range */
 		if (le64_to_cpu(comp.flags) & SCOUTFS_LOG_MERGE_COMP_REMAIN) {
 			ret = scoutfs_btree_rebalance(sb, &server->alloc,
@@ -2194,18 +2268,11 @@ static int splice_log_merge_completions(struct super_block *sb,
 		}
 	}

-	/* update the status once all completes are processed */
-	scoutfs_key_set_zeros(&stat->next_range_key);
-	stat->nr_complete = 0;
-
 	/* update counts and done if there's still ranges to process */
 	if (!no_ranges) {
-		init_log_merge_key(&key, SCOUTFS_LOG_MERGE_STATUS_ZONE, 0, 0);
-		ret = scoutfs_btree_update(sb, &server->alloc, &server->wri,
-					   &super->log_merge, &key,
-					   stat, sizeof(*stat));
-		if (ret < 0)
-			err_str = "update status";
+		scoutfs_key_set_zeros(&stat->next_range_key);
+		stat->nr_complete = 0;
+		ret = 0;
 		goto out;
 	}

@@ -2241,6 +2308,12 @@ static int splice_log_merge_completions(struct super_block *sb,
 		      (le64_to_cpu(lt.finalize_seq) < le64_to_cpu(stat->seq))))
 			continue;

+		if (scoutfs_alloc_meta_low(sb, &server->alloc, alloc_low)) {
+			einprogress = -EINPROGRESS;
+			ret = 0;
+			goto out;
+		}
+
 		fr.root = lt.item_root;
 		scoutfs_key_set_zeros(&fr.key);
 		fr.seq = cpu_to_le64(scoutfs_server_next_seq(sb));
@@ -2274,9 +2347,10 @@ static int splice_log_merge_completions(struct super_block *sb,
 		}

 		le64_add_cpu(&super->inode_count, le64_to_cpu(lt.inode_count_delta));
-
 	}

+	/* everything's done, remove the merge operation */
+	upd_stat = false;
 	init_log_merge_key(&key, SCOUTFS_LOG_MERGE_STATUS_ZONE, 0, 0);
 	ret = scoutfs_btree_delete(sb, &server->alloc, &server->wri,
 				   &super->log_merge, &key);
@@ -2285,12 +2359,23 @@ static int splice_log_merge_completions(struct super_block *sb,
 	else
 		err_str = "deleting merge status item";
 out:
+	if (upd_stat) {
+		init_log_merge_key(&key, SCOUTFS_LOG_MERGE_STATUS_ZONE, 0, 0);
+		err = scoutfs_btree_update(sb, &server->alloc, &server->wri,
+					   &super->log_merge, &key,
+					   stat, sizeof(struct scoutfs_log_merge_status));
+		if (err && !ret) {
+			err_str = "updating merge status item";
+			ret = err;
+		}
+	}
+
 	if (ret < 0)
 		scoutfs_err(sb, "server error %d splicing log merge completion: %s", ret, err_str);

 	BUG_ON(ret); /* inconsistent */

-	return ret;
+	return ret ?: einprogress;
 }

 /*
@@ -2465,6 +2550,12 @@ static void server_log_merge_free_work(struct work_struct *work)
 }

 /*
+ * Clients regularly ask if there is log merge work to do.  We process
+ * completions inline before responding so that we don't create large
+ * delays between completion processing and the next request.  We don't
+ * mind if the client's get_log_merge request sees high latency; the
+ * blocked caller has nothing else to do.
+ *
 * This will return ENOENT to the client if there is no work to do.
 */
 static int server_get_log_merge(struct super_block *sb,
@@ -2532,14 +2623,22 @@ restart:
 			goto out;
 		}

-		/* maybe splice now that we know if there's ranges */
+		/* splice if we have a batch or ran out of ranges */
 		no_next = ret == -ENOENT;
 		no_ranges = scoutfs_key_is_zeros(&stat.next_range_key) && ret == -ENOENT;
 		if (le64_to_cpu(stat.nr_requests) == 0 &&
 		    (no_next || le64_to_cpu(stat.nr_complete) >= LOG_MERGE_SPLICE_BATCH)) {
 			ret = splice_log_merge_completions(sb, &stat, no_ranges);
-			if (ret < 0)
+			if (ret == -EINPROGRESS) {
+				mutex_unlock(&server->logs_mutex);
+				ret = server_apply_commit(sb, &hold, 0);
+				if (ret < 0)
+					goto respond;
+				server_hold_commit(sb, &hold);
+				mutex_lock(&server->logs_mutex);
+			} else if (ret < 0) {
 				goto out;
+			}
 			/* splicing resets key and adds ranges, could finish status */
 			goto restart;
 		}
@@ -2741,6 +2840,7 @@ out:
 	mutex_unlock(&server->logs_mutex);
 	ret = server_apply_commit(sb, &hold, ret);

+respond:
 	return scoutfs_net_response(sb, conn, cmd, id, ret, &req, sizeof(req));
 }

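The multi-commit splicing in splice_log_merge_completions() follows a simple pattern. It samples the allocators once on entry, stops with -EINPROGRESS when either list falls below a quarter of that sample, and lets the caller commit and call back in. The following is a minimal userspace model of that pattern, not the server code. `struct txn_model`, `splice_some()`, and `splice_all()` are hypothetical. Each item is assumed to consume a fixed number of blocks from both lists, and `scoutfs_alloc_meta_low()` is assumed to report low when either list drops below the threshold. An index cursor stands in for the patch's check for already-cleared completion lists.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical model of one server transaction's metadata allocators. */
struct txn_model {
	uint64_t avail;   /* blocks left to allocate from */
	uint64_t freed;   /* room left to record freed blocks */
	uint64_t commits; /* intermediate commits performed */
};

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Process items from *next up to nr, each costing 'cost' blocks from
 * both lists.  The threshold is sampled once per call, like alloc_low
 * in the patch.  Returns -EINPROGRESS with *next at the first
 * unprocessed item when the allocators run low.
 */
static int splice_some(struct txn_model *t, unsigned int *next,
		       unsigned int nr, uint64_t cost)
{
	uint64_t alloc_low = min_u64(t->avail, t->freed) / 4;

	for (; *next < nr; (*next)++) {
		/* assumed meaning of scoutfs_alloc_meta_low(): either side is low */
		if (t->avail < alloc_low || t->freed < alloc_low ||
		    t->avail < cost || t->freed < cost)
			return -EINPROGRESS;

		t->avail -= cost;
		t->freed -= cost;
	}

	return 0;
}

/*
 * Caller loop, loosely like server_get_log_merge(): on -EINPROGRESS,
 * "commit" by refilling both lists and call back in.
 */
static int splice_all(struct txn_model *t, unsigned int nr, uint64_t cost,
		      uint64_t refill)
{
	unsigned int next = 0;
	int ret;

	/* model-only guard: an item that can never fit would loop forever */
	if (refill < cost)
		return -ENOSPC;

	while ((ret = splice_some(t, &next, nr, cost)) == -EINPROGRESS) {
		t->commits++;
		t->avail = refill;
		t->freed = refill;
	}

	return ret;
}
```

Suppose each transaction has 100 blocks and there are 10 items that cost 10 blocks each. The work then finishes after one intermediate commit, because the 25-block threshold is crossed after the eighth item. A fresh transaction always fits at least one item, so each call makes progress.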
--- a/tests/golden/data-prealloc
+++ b/tests/golden/data-prealloc
@@ -24,3 +24,307 @@
 /mnt/test/test/data-prealloc/file-2: 5 extents found
 /mnt/test/test/data-prealloc/file-1: 3 extents found
 /mnt/test/test/data-prealloc/file-2: 3 extents found
+== block writes into region allocs hole
+wrote blk 24
+wrote blk 32
+wrote blk 40
+wrote blk 55
+wrote blk 63
+wrote blk 71
+wrote blk 72
+wrote blk 79
+wrote blk 80
+wrote blk 87
+wrote blk 88
+wrote blk 95
+before:
+24.. 1: 
+32.. 1: 
+40.. 1: 
+55.. 1: 
+63.. 1: 
+71.. 2: 
+79.. 2: 
+87.. 2: 
+95.. 1: eof
+writing into existing 0 at pos 0
+wrote blk 0
+0.. 1: 
+1.. 7: unwritten
+24.. 1: 
+32.. 1: 
+40.. 1: 
+55.. 1: 
+63.. 1: 
+71.. 2: 
+79.. 2: 
+87.. 2: 
+95.. 1: eof
+writing into existing 0 at pos 1
+wrote blk 15
+0.. 1: 
+1.. 14: unwritten
+15.. 1: 
+24.. 1: 
+32.. 1: 
+40.. 1: 
+55.. 1: 
+63.. 1: 
+71.. 2: 
+79.. 2: 
+87.. 2: 
+95.. 1: eof
+writing into existing 0 at pos 2
+wrote blk 19
+0.. 1: 
+1.. 14: unwritten
+15.. 1: 
+16.. 3: unwritten
+19.. 1: 
+20.. 4: unwritten
+24.. 1: 
+32.. 1: 
+40.. 1: 
+55.. 1: 
+63.. 1: 
+71.. 2: 
+79.. 2: 
+87.. 2: 
+95.. 1: eof
+writing into existing 1 at pos 0
+wrote blk 25
+0.. 1: 
+1.. 14: unwritten
+15.. 1: 
+16.. 3: unwritten
+19.. 1: 
+20.. 4: unwritten
+24.. 1: 
+25.. 1: 
+26.. 6: unwritten
+32.. 1: 
+40.. 1: 
+55.. 1: 
+63.. 1: 
+71.. 2: 
+79.. 2: 
+87.. 2: 
+95.. 1: eof
+writing into existing 1 at pos 1
+wrote blk 39
+0.. 1: 
+1.. 14: unwritten
+15.. 1: 
+16.. 3: unwritten
+19.. 1: 
+20.. 4: unwritten
+24.. 1: 
+25.. 1: 
+26.. 6: unwritten
+32.. 1: 
+39.. 1: 
+40.. 1: 
+55.. 1: 
+63.. 1: 
+71.. 2: 
+79.. 2: 
+87.. 2: 
+95.. 1: eof
+writing into existing 1 at pos 2
+wrote blk 44
+0.. 1: 
+1.. 14: unwritten
+15.. 1: 
+16.. 3: unwritten
+19.. 1: 
+20.. 4: unwritten
+24.. 1: 
+25.. 1: 
+26.. 6: unwritten
+32.. 1: 
+39.. 1: 
+40.. 1: 
+44.. 1: 
+45.. 3: unwritten
+55.. 1: 
+63.. 1: 
+71.. 2: 
+79.. 2: 
+87.. 2: 
+95.. 1: eof
+writing into existing 2 at pos 0
+wrote blk 48
+0.. 1: 
+1.. 14: unwritten
+15.. 1: 
+16.. 3: unwritten
+19.. 1: 
+20.. 4: unwritten
+24.. 1: 
+25.. 1: 
+26.. 6: unwritten
+32.. 1: 
+39.. 1: 
+40.. 1: 
+44.. 1: 
+45.. 3: unwritten
+48.. 1: 
+49.. 6: unwritten
+55.. 1: 
+63.. 1: 
+71.. 2: 
+79.. 2: 
+87.. 2: 
+95.. 1: eof
+writing into existing 2 at pos 1
+wrote blk 62
+0.. 1: 
+1.. 14: unwritten
+15.. 1: 
+16.. 3: unwritten
+19.. 1: 
+20.. 4: unwritten
+24.. 1: 
+25.. 1: 
+26.. 6: unwritten
+32.. 1: 
+39.. 1: 
+40.. 1: 
+44.. 1: 
+45.. 3: unwritten
+48.. 1: 
+49.. 6: unwritten
+55.. 1: 
+56.. 6: unwritten
+62.. 1: 
+63.. 1: 
+71.. 2: 
+79.. 2: 
+87.. 2: 
+95.. 1: eof
+writing into existing 2 at pos 2
+wrote blk 67
+0.. 1: 
+1.. 14: unwritten
+15.. 1: 
+16.. 3: unwritten
+19.. 1: 
+20.. 4: unwritten
+24.. 1: 
+25.. 1: 
+26.. 6: unwritten
+32.. 1: 
+39.. 1: 
+40.. 1: 
+44.. 1: 
+45.. 3: unwritten
+48.. 1: 
+49.. 6: unwritten
+55.. 1: 
+56.. 6: unwritten
+62.. 1: 
+63.. 1: 
+64.. 3: unwritten
+67.. 1: 
+68.. 3: unwritten
+71.. 2: 
+79.. 2: 
+87.. 2: 
+95.. 1: eof
+writing into existing 3 at pos 0
+wrote blk 73
+0.. 1: 
+1.. 14: unwritten
+15.. 1: 
+16.. 3: unwritten
+19.. 1: 
+20.. 4: unwritten
+24.. 1: 
+25.. 1: 
+26.. 6: unwritten
+32.. 1: 
+39.. 1: 
+40.. 1: 
+44.. 1: 
+45.. 3: unwritten
+48.. 1: 
+49.. 6: unwritten
+55.. 1: 
+56.. 6: unwritten
+62.. 1: 
+63.. 1: 
+64.. 3: unwritten
+67.. 1: 
+68.. 3: unwritten
+71.. 2: 
+73.. 1: 
+74.. 5: unwritten
+79.. 2: 
+87.. 2: 
+95.. 1: eof
+writing into existing 3 at pos 1
+wrote blk 86
+0.. 1: 
+1.. 14: unwritten
+15.. 1: 
+16.. 3: unwritten
+19.. 1: 
+20.. 4: unwritten
+24.. 1: 
+25.. 1: 
+26.. 6: unwritten
+32.. 1: 
+39.. 1: 
+40.. 1: 
+44.. 1: 
+45.. 3: unwritten
+48.. 1: 
+49.. 6: unwritten
+55.. 1: 
+56.. 6: unwritten
+62.. 1: 
+63.. 1: 
+64.. 3: unwritten
+67.. 1: 
+68.. 3: unwritten
+71.. 2: 
+73.. 1: 
+74.. 5: unwritten
+79.. 2: 
+86.. 1: 
+87.. 2: 
+95.. 1: eof
+writing into existing 3 at pos 2
+wrote blk 92
+0.. 1: 
+1.. 14: unwritten
+15.. 1: 
+16.. 3: unwritten
+19.. 1: 
+20.. 4: unwritten
+24.. 1: 
+25.. 1: 
+26.. 6: unwritten
+32.. 1: 
+39.. 1: 
+40.. 1: 
+44.. 1: 
+45.. 3: unwritten
+48.. 1: 
+49.. 6: unwritten
+55.. 1: 
+56.. 6: unwritten
+62.. 1: 
+63.. 1: 
+64.. 3: unwritten
+67.. 1: 
+68.. 3: unwritten
+71.. 2: 
+73.. 1: 
+74.. 5: unwritten
+79.. 2: 
+86.. 1: 
+87.. 2: 
+92.. 1: 
+93.. 2: unwritten
+95.. 1: eof
--- a/tests/golden/get-referring-entries
+++ b/tests/golden/get-referring-entries
@@ -0,0 +1,18 @@
+== root inode returns nothing
+== crazy large unused inode does nothing
+== basic entry
+file
+== rename
+renamed
+== hard link
+file
+link
+== removal
+== different dirs
+== file types
+type b name block
+type c name char
+type d name dir
+type f name file
+type l name symlink
+== all name lengths work
--- a/tests/sequence
+++ b/tests/sequence
@@ -5,6 +5,7 @@ inode-items-updated.sh
 simple-inode-index.sh
 simple-staging.sh
 simple-release-extents.sh
+get-referring-entries.sh
 fallocate.sh
 basic-truncate.sh
 data-prealloc.sh
--- a/tests/tests/data-prealloc.sh
+++ b/tests/tests/data-prealloc.sh
@@ -6,6 +6,15 @@
 #
 t_require_commands scoutfs stat filefrag dd touch truncate

+write_block()
+{
+	local file="$1"
+	local blk="$2"
+
+	dd if=/dev/zero of="$file" bs=4096 seek=$blk count=1 conv=notrunc status=none
+	echo "wrote blk $blk"
+}
+
 write_forwards()
 {
 	local prefix="$1"
@@ -70,6 +79,25 @@ print_extents_found()
 	filefrag "$prefix"* 2>&1 | grep "extent.*found" | t_filter_fs
 }

+#
+# print the logical start, len, and flags if they're there.
+#
+print_logical_extents()
+{
+	local file="$1"
+
+	filefrag -v -b4096 "$file" 2>&1 | t_filter_fs | awk '
+		($1 ~ /[0-9]+:/) {
+			if ($NF !~  /[0-9]+:/) {
+				flags=$NF
+			} else {
+				flags=""
+			}
+			print $2, $6, flags
+		}
+	'
+}
+
 t_save_all_sysfs_mount_options data_prealloc_blocks
 t_save_all_sysfs_mount_options data_prealloc_contig_only
 restore_options()
@@ -133,4 +161,71 @@ t_set_sysfs_mount_option 0 data_prealloc_contig_only 0
 write_forwards $prefix 3
 print_extents_found $prefix

+#
+# prepare aligned regions of 8 blocks that we'll write into.
+# We'll write into the first, last, or middle empty block of each
+# region, which was prepared with no existing extents, one at
+# the start, one at the end, or one at both the start and the end.
+#
+# Let's keep this last because it creates a ton of output to read
+# through.  The correct output is tied to preallocation strategy so it
+# has to be verified each time we change preallocation.
+#
+echo "== block writes into region allocs hole" 
+t_set_sysfs_mount_option 0 data_prealloc_blocks 8
+t_set_sysfs_mount_option 0 data_prealloc_contig_only 1
+touch "$prefix"
+truncate -s 0 "$prefix"
+
+# write initial blocks in regions
+base=0
+for sides in 0 1 2 3; do
+	for i in 0 1 2; do
+		case "$sides" in
+			# none
+			0) ;;
+			# left
+			1) write_block $prefix $((base + 0)) ;;
+			# right
+			2) write_block $prefix $((base + 7)) ;;
+			# both
+			3) write_block $prefix $((base + 0)) 
+			   write_block $prefix $((base + 7)) ;;
+		esac
+		((base+=8))
+	done
+done
+
+echo before:
+print_logical_extents "$prefix"
+
+# now write into the first, last, or middle empty block of each region
+t_set_sysfs_mount_option 0 data_prealloc_contig_only 0
+base=0
+for sides in 0 1 2 3; do
+	for i in 0 1 2; do
+		echo "writing into existing $sides at pos $i"
+		case "$sides" in
+			# none
+			0) left=$base; right=$((base + 7));;
+			# left
+			1) left=$((base + 1)); right=$((base + 7));;
+			# right
+			2) left=$((base)); right=$((base + 6));;
+			# both
+			3) left=$((base + 1)); right=$((base + 6));;
+		esac
+		case "$i" in
+			# start
+			0) write_block $prefix $left ;;
+			# end
+			1) write_block $prefix $right ;;
+			# mid (both has 6 blocks internally)
+			2) write_block $prefix $((left + 3)) ;;
+		esac
+		print_logical_extents "$prefix"
+		((base+=8))
+	done
+done
+
 t_pass
--- a/tests/tests/get-referring-entries.sh
+++ b/tests/tests/get-referring-entries.sh
@@ -0,0 +1,99 @@
+
+#
+# Test _GET_REFERRING_ENTRIES ioctl via the get-referring-entries cli
+# command
+#
+
+# consistently print only entry names
+filter_names() {
+	exec cut -d ' ' -f 8- | sort
+}
+
+# print entries with type characters to match find.  not happy with hard
+# coding, but abi won't change much.
+filter_types() {
+	exec cut -d ' ' -f 5- | \
+	sed \
+		-e 's/type 1 /type p /' \
+		-e 's/type 2 /type c /' \
+		-e 's/type 4 /type d /' \
+		-e 's/type 6 /type b /' \
+		-e 's/type 8 /type f /' \
+		-e 's/type 10 /type l /' \
+		-e 's/type 12 /type s /' \
+		| \
+	sort
+}
+
+n_chars() {
+	local n="$1"
+	printf 'A%.0s' $(eval echo {1..\$n})
+}
+
+GRE="scoutfs get-referring-entries -p $T_M0"
+
+echo "== root inode returns nothing"
+$GRE 1
+
+echo "== crazy large unused inode does nothing"
+$GRE 4611686018427387904 # 1 << 62
+
+echo "== basic entry"
+touch $T_D0/file
+ino=$(stat -c '%i' $T_D0/file)
+$GRE $ino | filter_names
+
+echo "== rename"
+mv $T_D0/file $T_D0/renamed
+$GRE $ino | filter_names
+
+echo "== hard link"
+mv $T_D0/renamed $T_D0/file
+ln $T_D0/file $T_D0/link
+$GRE $ino | filter_names
+
+echo "== removal"
+rm $T_D0/file $T_D0/link
+$GRE $ino
+
+echo "== different dirs"
+touch $T_D0/file
+ino=$(stat -c '%i' $T_D0/file)
+for i in $(seq 1 10); do
+	mkdir $T_D0/dir-$i
+	ln $T_D0/file $T_D0/dir-$i/file-$i
+done
+diff -u <(find $T_D0 -type f -printf '%f\n' | sort) <($GRE $ino | filter_names)
+rm $T_D0/file
+
+echo "== file types"
+mkdir $T_D0/dir
+touch $T_D0/dir/file
+mkdir $T_D0/dir/dir
+ln -s $T_D0/dir/file $T_D0/dir/symlink
+mknod $T_D0/dir/char c 1 3 # null
+mknod $T_D0/dir/block b 7 0 # loop0
+for name in $(ls -UA $T_D0/dir | sort); do
+	ino=$(stat -c '%i' $T_D0/dir/$name)
+	$GRE $ino | filter_types
+done
+rm -rf $T_D0/dir
+
+echo "== all name lengths work"
+mkdir $T_D0/dir
+touch $T_D0/dir/file
+ino=$(stat -c '%i' $T_D0/dir/file)
+name=""
+> $T_TMP.unsorted
+for i in $(seq 1 255); do
+	name+="a"
+	echo "$name" >> $T_TMP.unsorted
+	ln $T_D0/dir/file $T_D0/dir/$name
+done
+sort $T_TMP.unsorted > $T_TMP.sorted
+rm $T_D0/dir/file
+$GRE $ino | filter_names > $T_TMP.gre
+diff -u $T_TMP.sorted $T_TMP.gre
+rm -rf $T_D0/dir
+
+t_pass
--- a/utils/man/scoutfs.8
+++ b/utils/man/scoutfs.8
@@ -209,6 +209,29 @@ A path within a ScoutFS filesystem.
 .RE
 .PD

+.TP
+.BI "get-referring-entries [-p|--path PATH] INO"
+.sp
+Find directory entries that reference an inode number.
+.sp
+Display all the directory entries that refer to a given inode.  Each
+entry includes the inode number of the directory that contains it, the
+d_off and d_type values for the entry as described by
+.BR readdir (3),
+and the name of the entry.
+.RS 1.0i
+.PD 0
+.TP
+.sp
+.TP
+.B "-p, --path PATH"
+A path within a ScoutFS filesystem.
+.TP
+.B "INO"
+The inode number of the target inode.
+.RE
+.PD
+
 .TP
 .BI "ino-path INODE-NUM [-p|--path PATH]"
 .sp
--- a/utils/src/get_referring_entries.c
+++ b/utils/src/get_referring_entries.c
@@ -0,0 +1,150 @@
+#include <unistd.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/ioctl.h>
+#include <fcntl.h>
+#include <errno.h>
+#include <string.h>
+#include <limits.h>
+#include <argp.h>
+
+#include "sparse.h"
+#include "parse.h"
+#include "util.h"
+#include "format.h"
+#include "ioctl.h"
+#include "parse.h"
+#include "cmd.h"
+
+struct gre_args {
+	char *path;
+	u64 ino;
+};
+
+static int do_get_referring_entries(struct gre_args *args)
+{
+	struct scoutfs_ioctl_get_referring_entries gre;
+	struct scoutfs_ioctl_dirent *dent;
+	unsigned int bytes;
+	void *buf;
+	int ret;
+	int fd;
+
+	fd = get_path(args->path, O_RDONLY);
+	if (fd < 0)
+		return fd;
+
+	bytes = PATH_MAX * 1024;
+	buf = malloc(bytes);
+	if (!buf) {
+		fprintf(stderr, "couldn't allocate %u byte buffer\n", bytes);
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	gre.ino = args->ino;
+	gre.dir_ino = 0;
+	gre.dir_pos = 0;
+	gre.entries_ptr = (intptr_t)buf;
+	gre.entries_bytes = bytes;
+
+	for (;;) {
+		ret = ioctl(fd, SCOUTFS_IOC_GET_REFERRING_ENTRIES, &gre);
+		if (ret <= 0) {
+			if (ret < 0) {
+				ret = -errno;
+				fprintf(stderr, "ioctl failed: %s (%d)\n", strerror(errno), errno);
+			}
+			goto out;
+		}
+
+		dent = buf;
+		while (ret-- > 0) {
+			printf("dir %llu pos %llu type %u name %s\n",
+			       dent->dir_ino, dent->dir_pos, dent->d_type, dent->name);
+
+			gre.dir_ino = dent->dir_ino;
+			gre.dir_pos = dent->dir_pos;
+
+			if (dent->flags & SCOUTFS_IOCTL_DIRENT_FLAG_LAST) {
+				ret = 0;
+				goto out;
+			}
+
+			dent = (void *)dent + dent->entry_bytes;
+		}
+
+		if (++gre.dir_pos == 0) {
+			if (++gre.dir_ino == 0) {
+				ret = 0;
+				goto out;
+			}
+		}
+	}
+
+out:
+	close(fd);
+	free(buf);
+
+	return ret;
+}
+
+static int parse_opt(int key, char *arg, struct argp_state *state)
+{
+	struct gre_args *args = state->input;
+	int ret;
+
+	switch (key) {
+	case 'p':
+		args->path = strdup_or_error(state, arg);
+		break;
+	case ARGP_KEY_ARG:
+		if (args->ino)
+			argp_error(state, "more than one argument given");
+		ret = parse_u64(arg, &args->ino);
+		if (ret)
+			argp_error(state, "inode parse error");
+		break;
+	case ARGP_KEY_FINI:
+		if (!args->ino) {
+			argp_error(state, "must provide inode number");
+		}
+		break;
+	default:
+		break;
+	}
+
+	return 0;
+}
+
+static struct argp_option options[] = {
+	{ "path", 'p', "PATH", 0, "Path to ScoutFS filesystem"},
+	{ NULL }
+};
+
+static struct argp argp = {
+	options,
+	parse_opt,
+	"INODE-NUM",
+	"Print directory entries that refer to inode number"
+};
+
+static int get_referring_entries_cmd(int argc, char **argv)
+{
+	struct gre_args args = {NULL};
+	int ret;
+
+	ret = argp_parse(&argp, argc, argv, 0, NULL, &args);
+	if (ret)
+		return ret;
+
+	return do_get_referring_entries(&args);
+}
+
+
+static void __attribute__((constructor)) get_referring_entries_ctor(void)
+{
+	cmd_register_argp("get-referring-entries", &argp, GROUP_SEARCH, get_referring_entries_cmd);
+}
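get-referring-entries resumes each ioctl call from the last entry it printed and then steps one position past it. When dir_pos wraps, the step carries into the directory inode number. The sketch below isolates that cursor step. `struct gre_cursor` and `gre_cursor_advance()` are hypothetical names standing in for the dir_ino/dir_pos fields of `struct scoutfs_ioctl_get_referring_entries`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the dir_ino/dir_pos fields of the ioctl struct. */
struct gre_cursor {
	uint64_t dir_ino;
	uint64_t dir_pos;
};

/*
 * Step the cursor just past the last entry that was returned.  dir_pos
 * carries into dir_ino when it wraps, and false is returned once the
 * whole (dir_ino, dir_pos) space has been exhausted.
 */
static bool gre_cursor_advance(struct gre_cursor *cur)
{
	if (++cur->dir_pos != 0)
		return true;
	return ++cur->dir_ino != 0;
}
```

For example, a cursor at dir 5, pos UINT64_MAX advances to dir 6, pos 0. A cursor at the maximum value of both fields reports that there is nothing left to search.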
Author	SHA1	Message	Date
Zach Brown	4784ccdfd5	Start server commits when holds wait for alloc Server code that wants to dirty blocks by holding a commit won't be allowed to until the current allocators for the server transaction have enough space for the holder. As an active holder applies the commit the allocators are refilled and the waiting holders will proceed. But the current allocators can have no resources as the server starts up. There will never be active holders to apply the commit and refill the allocators. In this case all the holders will block indefinitely. The fix is to trigger a server commit when a holder doesn't have room. It used to be that commits were only triggered when apply callers were waiting. We transfer some of that logic into a new 'committing' field so that we can have commits in flight without apply callers waiting. We add it to the server commit tracing. While we're at it we clean up the logic that tests if a hold can proceed. It used to be confusingly split across two functions that both could sample the current allocator space remaining. This could lead to weird cases where the first holder could use the second alloc remaining call, not the one whose values were tested to see if the holder could fit. Now each hold check only samples the allocators once. And finally we fix a subtle case where the budget exceeded message can spuriously trigger in the case where dirtying the freed list created a new empty block after the holder recorded the amount of space in the freed block. Signed-off-by: Zach Brown <zab@versity.com>	2023-10-03 13:32:09 -07:00
Zach Brown	778c2769df	Merge pull request #132 from versity/zab/v1.15 v1.15 Release	2023-07-17 13:02:10 -07:00
Zach Brown	9e3529060e	v1.15 Release Finish the release notes for the 1.15 release. Signed-off-by: Zach Brown <zab@versity.com>	2023-07-17 12:07:13 -07:00
Zach Brown	1672b3ecec	Merge pull request #130 from versity/zab/noncontig_alloc_einval Fix partial preallocation when _contig_only = 0	2023-07-17 10:21:18 -07:00
Zach Brown	55f9435fad	Fix partial preallocation when _contig_only = 0 Data preallocation attempts to allocate large aligned regions of extents. It tried to fill the hole around a write offset that didn't contain an extent. It missed the case where there can be multiple extents between the start of the region and the hole. It could try to overwrite these additional existing extents and writes could return EINVAL. We fix this by trimming the preallocation to start at the write offset if there are any extents in the region before the write offset. The data preallocation test output has to be updated now that allocation extents won't grow towards the start of the region when there are existing extents. Signed-off-by: Zach Brown <zab@versity.com>	2023-07-17 09:36:09 -07:00
Zach Brown	072f6868d3	Merge pull request #131 from versity/zab/server_merge_splice_failure Process log merge splicing in many commits	2023-07-15 21:03:32 -07:00
Zach Brown	8a64b46a2f	Process log merge splicing in many commits Log merge completions were spliced in one server commit. It's possible to get enough completion work pending that it all can't be completed in one server commit. Operations fail with ENOSPC and because these changes can't be unwound cleanly the server asserts. This allows the completion splicing to break the work up into multiple commits. Processing completions in multiple commits means that request creation can observe the merge status in states that weren't possible before. Splicing is careful to maintain an elevated nr_complete count while the client can't get requests because the tree is rebalancing. Signed-off-by: Zach Brown <zab@versity.com>	2023-07-14 13:28:29 -07:00
Zach Brown	14901c39aa	Merge pull request #129 from versity/zab/v1.14 v1.14 Release	2023-06-29 11:30:01 -07:00
Zach Brown	e095127ae9	v1.14 Release Finish the release notes for the 1.14 release. Signed-off-by: Zach Brown <zab@versity.com>	2023-06-29 10:03:53 -07:00
Zach Brown	a9da27444f	Merge pull request #128 from versity/zab/prealloc_fragmentation Zab/prealloc fragmentation	2023-06-29 09:57:32 -07:00
Zach Brown	49fe89741d	Merge pull request #125 from versity/zab/get_referring_entries Zab/get referring entries	2023-06-29 09:57:06 -07:00
Zach Brown	847916860d	Advance move_blocks extent search offset The move_blocks ioctl finds extents to move in the source file by searching from the starting block offset of the region to move. Logically, this is fine. After each extent item is deleted the next search will find the next extent. The problem is that deleted items still exist in the item cache. The next iteration has to skip over all the deleted extents from the start of the region. This is fine with large extents, but with heavily fragmented extents this creates a huge amplification of the number of items to traverse when moving the fragmented extents in a large file. (It's not quite O(n^2)/2 for the total extents, deleted items are purged as we write out the dirty items in each transaction.. but it's still immense.) The fix is to simply start searching for the next extent after the one we just moved. Signed-off-by: Zach Brown <zab@versity.com>	2023-06-28 16:54:28 -07:00
Zach Brown	564b942ead	Write test for hole filling noncontig prealloc Add a test which exercises filling holes in prealloc regions when the _contig_only prealloc option is not set. Signed-off-by: Zach Brown <zab@versity.com>	2023-06-28 16:16:04 -07:00
Zach Brown	3d99fda0f6	Preallocate data around iblock when noncontig If the _contig_only option isn't set then we try to preallocate aligned regions of files. The initial implementation naively only allowed one preallocation attempt in each aligned region. If it got a small allocation that didn't fill the region then every future allocation in the region would be a single block. This changes every preallocation in the region to attempt to fill the hole in the region that iblock fell in. It uses an extra extent search (item cache search) to try and avoid thousands of single block allocations. Signed-off-by: Zach Brown <zab@versity.com>	2023-06-28 12:21:25 -07:00
Zach Brown	74c5fe1115	Add get-referring-entries test Signed-off-by: Zach Brown <zab@versity.com>	2023-06-14 14:12:10 -07:00
Zach Brown	2279e9657f	Add get_referring_entries scoutfs command Add a cli command for the get_referring_entries ioctl. Signed-off-by: Zach Brown <zab@versity.com>	2023-06-14 14:12:10 -07:00
Zach Brown	707752a7bf	Add get_referring_entries ioctl Add an ioctl that gives the callers all entries that refer to an inode. It's like a backwards readdir. It's a light bit of translation between the internal _add_next_linkrefs() list of entries and the ioctl interface of a buffer of entry structs. Signed-off-by: Zach Brown <zab@versity.com>	2023-06-14 14:12:10 -07:00
Zach Brown	0316c22026	Extend scoutfs_dir_add_next_linkrefs Extend scoutfs_dir_add_next_linkref() to be able to return multiple backrefs under the lock for each call and have it take an argument to limit the number of backrefs that can be added and returned. Its return code changes a bit in that it returns 1 on success instead of 0 so we have to be a little careful with callers who were expecting 0. It still returns -ENOENT when no entries are found. We break up its tracepoint into one that records each entry added and one that records the result of each call. This will be used by an ioctl to give callers just the entries that point to an inode instead of assembling full paths from the root. Signed-off-by: Zach Brown <zab@versity.com>	2023-06-14 14:12:10 -07:00
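The hold admission rule from 4784ccdfd5 can be modeled without locks or work queues. This sketch is an approximation under several assumptions. `HOLD_BUDGET` stands in for `COMMIT_HOLD_ALLOC_BUDGET` with a made-up value, and `struct cusers_model` reduces `struct commit_users` to plain fields. Per-holder accounting is left out, and returning `HOLD_WAIT_COMMIT_QUEUED` replaces `queue_work()`. The sketch shows two things: the allocators are sampled once per check, and a commit starts when an idle server has no room.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-holder budget; not the real COMMIT_HOLD_ALLOC_BUDGET value. */
#define HOLD_BUDGET 500

/* Hypothetical, lock-free reduction of struct commit_users. */
struct cusers_model {
	unsigned int nr_holders;
	bool committing;       /* a commit has been started */
	bool applying;         /* an apply caller is waiting */
	uint64_t avail_before; /* sample taken by the first holder */
	uint64_t freed_before;
};

enum hold_result { HOLD_OK, HOLD_WAIT, HOLD_WAIT_COMMIT_QUEUED };

/*
 * av/fr are one sample of the allocators' remaining space.  The first
 * holder uses it; later holders test against the first holder's values.
 */
static enum hold_result try_hold(struct cusers_model *cu, uint64_t av, uint64_t fr)
{
	uint64_t budget;
	bool has_room;

	if (cu->nr_holders > 0) {
		av = cu->avail_before;
		fr = cu->freed_before;
	}

	/* +2: our own hold plus the final work the commit itself does */
	budget = (uint64_t)(cu->nr_holders + 2) * HOLD_BUDGET;
	has_room = av >= budget && fr >= budget;

	if (!cu->committing && has_room && !cu->applying) {
		if (cu->nr_holders == 0) {
			cu->avail_before = av;
			cu->freed_before = fr;
		}
		cu->nr_holders++;
		return HOLD_OK;
	}

	/* no active holder will ever apply and refill, so start a commit */
	if (!has_room && cu->nr_holders == 0 && !cu->committing) {
		cu->committing = true;
		return HOLD_WAIT_COMMIT_QUEUED;
	}

	return HOLD_WAIT;
}
```

Starting from empty allocators, the first hold queues a commit. Later holds wait until `committing` is cleared. Holds succeed again once the refilled allocators cover `(nr_holders + 2) * HOLD_BUDGET`.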