Compare commits

..

19 Commits

Author SHA1 Message Date
Auke Kok
732637d372 merge conflict from zab/shrink cleanup 2025-10-07 12:22:53 -07:00
Auke Kok
963591cc9a Fix a sparse warning in net.c 2025-10-07 12:22:40 -07:00
Auke Kok
ad79ee94f9 Add tcp_keepalive_timeout_ms option.
The default TCP keepalive value is currently 10s, resulting in clients
being disconnected after 10 seconds of not replying to a TCP keepalive
packet. These keepalive values are reasonable most of the time, but
we've seen client disconnects where this timeout has been exceeded,
resulting in fencing. The cause for this is unknown at this time, but
transient network interruptions are suspected.

This change adds a configurable value for this specific client socket
timeout. It enforces that its value is above UNRESPONSIVE_PROBES, whose
value remains unchanged.

The default value of 10000ms (10s) remains the trusted value. It is
entirely unclear and untested which values are reasonable and which
are not.  Since this setting can and will interact with other timeout
values, care must be taken not to exceed them.  I've tested this only
briefly with values of 5000 and 25000; values outside that range are
likely problematic.
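
As a rough, hypothetical sketch of the relationship described above (not
the actual scoutfs option code; the UNRESPONSIVE_PROBES value is assumed):

/* Hypothetical sketch: the configured timeout must exceed the fixed probe
 * count, and each probe then gets an equal share of the total timeout. */
#define UNRESPONSIVE_PROBES 5           /* assumed value, unchanged by the patch */

static bool keepalive_timeout_valid(unsigned int timeout_ms)
{
    return timeout_ms > UNRESPONSIVE_PROBES;
}

static unsigned int keepalive_probe_interval_ms(unsigned int timeout_ms)
{
    return timeout_ms / UNRESPONSIVE_PROBES;    /* e.g. 25000 / 5 = 5000ms */
}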

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-10-07 12:16:23 -07:00
Zach Brown
65ea250de9 Remove msghdr iov_iter kernelcompat
This removes the KC_MSGHDR_STRUCT_IOV_ITER kernel compat.
kernel_{send,recv}msg() initializes either msg_iov or msg_iter.

This isn't a clean revert of "69068ae2 Initialize msg.msg_iter from
iovec." because previous patches fixed the order of arguments, and the
net send caller was removed.

Signed-off-by: Zach Brown <zab@versity.com>
2025-10-07 12:15:59 -07:00
Zach Brown
86ca09ed7d Send messages in batches
Previous work had the receiver try to receive multiple messages in bulk.
This does the same for the sender.

We walk the send queue and initialize a vector that we then send with
one call.  This is intentionally similar to the single message sending
pattern to avoid unintended changes.

Along with the changes to receive in bulk, this ended up increasing the
message processing rate by about 6x when both send and receive were
going full throttle.
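
A minimal sketch of the batched send pattern, assuming a caller that has
already pulled buffers off the send queue (names are illustrative, not the
actual scoutfs functions):

#include <linux/net.h>
#include <linux/socket.h>
#include <linux/uio.h>

/* Sketch: gather queued message buffers into one kvec array and hand the
 * whole vector to kernel_sendmsg() in a single socket call. */
static int send_batch(struct socket *sock, void *bufs[], size_t lens[], int nr)
{
    struct kvec kv[16];     /* assumed batch limit */
    struct msghdr msg = { .msg_flags = MSG_NOSIGNAL };
    size_t total = 0;
    int i;

    if (nr > 16)
        nr = 16;
    for (i = 0; i < nr; i++) {
        kv[i].iov_base = bufs[i];
        kv[i].iov_len = lens[i];
        total += lens[i];
    }

    /* one call transmits the whole vector instead of one call per message */
    return kernel_sendmsg(sock, &msg, kv, nr, total);
}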

Signed-off-by: Zach Brown <zab@versity.com>
2025-10-07 12:15:51 -07:00
Zach Brown
5681920bfe Fix swapped sendmsg nr_segs/count
When the msg_iter compat was added the iter was initialized with nr_segs
and count swapped.  I'm not convinced this had any effect because the
kernel_{send,recv}msg() call would initialize msg_iter again with the
correct arguments.
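
For reference, a sketch of the ordering in question, assuming the pre-6.0
iov_iter_kvec() signature where the direction is READ/WRITE (the wrapper
function is illustrative):

#include <linux/socket.h>
#include <linux/uio.h>

/* nr_segs comes before the total byte count; swapping them largely went
 * unnoticed because kernel_{send,recv}msg() re-initialize msg_iter anyway. */
static void init_send_iter(struct msghdr *msg, struct kvec *kv,
                           unsigned long nr_segs, size_t count)
{
    iov_iter_kvec(&msg->msg_iter, WRITE, kv, nr_segs, count);
}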

Signed-off-by: Zach Brown <zab@versity.com>
2025-10-07 12:15:43 -07:00
Zach Brown
6c2ccf75ea Receive incoming messages in bulk
Our messaging layer is used for small control messages, not large data
payloads.  By calling recvmsg twice for every incoming message we're
hitting the socket lock reasonably hard.  With senders doing the same,
and a lot of messages flowing in each direction, the contention is
non-trivial.

This changes the receiver to copy as much of the incoming stream into a
page that is then framed and copied again into individual allocated
messages that can be processed concurrently.  We're avoiding contention
with the sender on the socket at the cost of additional copies of our
small messages.
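
A rough sketch of the approach, with illustrative framing (the header type
and field names here are hypothetical, not the scoutfs wire format):

#include <linux/net.h>
#include <linux/slab.h>
#include <linux/string.h>

struct wire_header {                    /* hypothetical framing header */
    __le16 data_len;
};

/* Sketch: drain as much of the stream as fits in one page with a single
 * recvmsg, then carve the buffered bytes into per-message allocations
 * that can be handed off for concurrent processing. */
static int recv_bulk(struct socket *sock, void *page_buf, size_t page_len)
{
    struct msghdr msg = { .msg_flags = MSG_DONTWAIT };
    struct kvec kv = { .iov_base = page_buf, .iov_len = page_len };
    size_t off = 0;
    int bytes;

    bytes = kernel_recvmsg(sock, &msg, &kv, 1, page_len, msg.msg_flags);
    if (bytes <= 0)
        return bytes;

    while (off + sizeof(struct wire_header) <= bytes) {
        struct wire_header *wh = page_buf + off;
        size_t len = sizeof(*wh) + le16_to_cpu(wh->data_len);
        void *copy;

        if (off + len > bytes)
            break;                      /* partial message, wait for more */
        copy = kmalloc(len, GFP_NOFS);
        if (!copy)
            return -ENOMEM;
        memcpy(copy, wh, len);          /* the extra copy traded for less socket contention */
        /* queue 'copy' to its processing work here */
        off += len;
    }
    return off;
}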

Signed-off-by: Zach Brown <zab@versity.com>
2025-10-07 12:15:34 -07:00
Zach Brown
a818b9e461 Process client lock messages in ordered work
The lock client can't handle some messages being processed out of
order.  Previously it detected message ordering itself, but missed some
cases.  Receive processing was then changed to always call lock message
processing from the recv work to globally order all lock messages.

This inline processing was contributing to excessive latencies in making
our way through the incoming receive queue, delaying work that would
otherwise be parallel once we got it off the recv queue.

This was seen in practice when a giant flood of lock shrink messages
arrived at the client.  It processed each in turn, starving a statfs
response long enough to trigger the hung task warning.

This fix does two things.

First, it moves ordered recv processing out of the recv work.  It lets
the recv work drain the socket quickly and turn it into a list that the
ordered work is consuming.  Other messages will have a chance to be
received and queued to their processing work without having to wait for
the ordered work to be processed.

Second, it adds parallelism to the ordered processing.  The incoming
lock messages don't need global ordering; they need ordering within each
lock.  We add an arbitrary but reasonable number of ordered workers and
hash lock messages to each worker based on the lock's key.
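
A minimal sketch of the hashing described above (the worker count and
types are illustrative assumptions):

#include <linux/jhash.h>
#include <linux/workqueue.h>

#define NR_ORDERED_WORKERS 8    /* "arbitrary but reasonable" */

/* Sketch: messages for the same lock key always map to the same ordered
 * worker, preserving per-lock ordering while allowing parallelism across
 * different locks. */
static struct work_struct *ordered_work_for_key(struct work_struct workers[],
                                                const void *key, size_t key_len)
{
    u32 h = jhash(key, key_len, 0);

    return &workers[h % NR_ORDERED_WORKERS];
}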

Signed-off-by: Zach Brown <zab@versity.com>
2025-10-07 12:15:20 -07:00
Zach Brown
b9f8eee59e Use list_lru for block cache shrinking
The block cache had a bizarre cache eviction policy that was trying to
avoid precise LRU updates at each block.  It had pretty bad behaviour,
including only allowing reclaim of maybe 20% of the blocks that were
visited by the shrinker.

We can use the existing list_lru facility in the kernel to do a better
job.  Blocks only exhibit contention as they're allocated and added to
per-node lists.  From then on we only set accessed bits and the private
list walkers move blocks around on the list as we see the accessed bits.
(It looks more like a fifo with lazy promotion than a "LRU" that is
actively moving list items around as they're accessed.)

Using the facility means changing how we remove blocks from the cache
and hide them from lookup.  We clean up the refcount inserted flag a bit
to be expressed more as a base refcount that can be acquired by
whoever's removing from the cache.  It seems a lot clearer.
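
A sketch of the list_lru pattern in question, assuming a block type with
an accessed bit and a base reference that the remover takes (names are
illustrative; the walk callback's exact signature depends on the kernel
version, see the compat commit below):

#include <linux/list_lru.h>

struct example_block {                  /* hypothetical block type */
    struct list_head lru_head;
    unsigned long bits;                 /* bit 0: accessed */
    atomic_t refcount;                  /* 1 == only the cache's base reference */
};

/* Sketch: the isolate callback lazily rotates recently accessed blocks,
 * skips blocks that are still referenced, and moves idle blocks to a
 * private dispose list for freeing outside the lru lock. */
static enum lru_status isolate_block(struct list_head *item,
                                     struct list_lru_one *list, void *cb_arg)
{
    struct list_head *dispose = cb_arg;
    struct example_block *bp = container_of(item, struct example_block, lru_head);

    if (test_and_clear_bit(0, &bp->bits))
        return LRU_ROTATE;      /* lazy promotion instead of per-access moves */

    if (atomic_cmpxchg(&bp->refcount, 1, 0) != 1)
        return LRU_SKIP;        /* someone else still holds a reference */

    list_lru_isolate_move(list, item, dispose);
    return LRU_REMOVED;
}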

Signed-off-by: Zach Brown <zab@versity.com>
2025-10-07 12:14:25 -07:00
Zach Brown
d8fcbb9564 Add kernelcompat for list_lru
Add kernelcompat helpers for initial use of list_lru for shrinking.  The
most complicated part is the walk callback type changing.
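
For illustration, the kind of shim involved might look like the following
hedged sketch; the actual scoutfs helpers may be shaped differently, though
KC_LIST_LRU_WALK_CB_LIST_LOCK is the flag used in the Makefile hunk below:

/* Older kernels pass a spinlock_t * to the walk callback; newer ones
 * dropped it.  A compat macro lets one callback body serve both. */
#ifdef KC_LIST_LRU_WALK_CB_LIST_LOCK
#define KC_LRU_WALK_CB_ARGS    struct list_head *item, struct list_lru_one *list, \
                               spinlock_t *lock, void *cb_arg
#else
#define KC_LRU_WALK_CB_ARGS    struct list_head *item, struct list_lru_one *list, \
                               void *cb_arg
#endif

static enum lru_status isolate_one(KC_LRU_WALK_CB_ARGS)
{
    /* shared body only touches item, list, and cb_arg */
    return LRU_SKIP;
}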

Signed-off-by: Zach Brown <zab@versity.com>
2025-10-07 12:14:15 -07:00
Zach Brown
4d58252e1a Retry stale item reads instead of stopping reclaim
Readers can read a set of items that is stale with respect to items that
were dirtied and written under a local cluster lock after the read
started.

The active reader mechanism addressed this by refusing to shrink pages
that could contain items that were dirtied while any readers were in
flight.  Under the right circumstances this can result in refusing to
shrink quite a lot of pages indeed.

This changes the mechanism to allow pages to be reclaimed, and instead
forces stale readers to retry.  The gamble is that reads are much faster
than writes.  A small fraction should have to retry, and when they do
they can be satisfied by the block cache.
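
A sketch of the retry loop this implies, using the read_dirty_barrier
counter that appears in the item cache hunks further down (the surrounding
helper and reader function are hypothetical):

/* Sketch: sample the barrier before reading; if writeback bumped it while
 * the read was in flight, the read pages may be stale, so drop and retry. */
static int read_items_with_retry(struct item_cache_info *cinf)
{
    u64 barrier;
    int ret;

    do {
        barrier = atomic64_read(&cinf->read_dirty_barrier);
        ret = read_items_from_blocks(cinf);     /* hypothetical reader */
        if (ret < 0)
            return ret;
    } while (atomic64_read(&cinf->read_dirty_barrier) != barrier);

    return 0;
}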

Signed-off-by: Zach Brown <zab@versity.com>
2025-10-07 12:12:29 -07:00
Chris Kirby
293df47589 Fix race condition in orphan-inodes test
Make sure that the orphan scanners can see deletions after forced unmounts
by waiting for reclaim_open_log_tree() to run on each mount, and by waiting
for finalize_and_start_log_merge() to run and not find any finalized trees.

Do this by adding two new counters, reclaimed_open_logs and
log_merge_no_finalized, and by fixing the orphan-inodes test to check them
before waiting for the orphan scanners to complete.

Signed-off-by: Chris Kirby <ckirby@versity.com>
2025-10-06 16:55:47 -05:00
Chris Kirby
2a58e4c147 Use ENOLINK as a special error code during forced unmount
Tests such as quorum-heartbeat-timeout were failing with EIO messages in dmesg output due to expected errors during forced unmount. Use ENOLINK instead, and filter all errors from dmesg with this errno (67).

Signed-off-by: Chris Kirby <ckirby@versity.com>
2025-10-06 15:57:42 -05:00
Auke Kok
1b7917e063 Don't run format-version-forward-back on el8, either
This test compiles an earlier commit from the tree that is starting to
fail due to various changes on the OS level, most recently due to sparse
issues with newer kernel headers. This problem will likely increase
in the future as we add more supported releases.

We opt to run this test only on el7 for now. While we could have made
it skip the sparse checks that fail on el8, it will suffice at this
point if the test works on one of the supported OS versions during
testing.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-10-06 12:27:25 -05:00
Zach Brown
4f9c3503c8 Add cond_resched to iput worker
The iput worker can accumulate quite a bit of pending work to do.  We've
seen hung task warnings while it's doing its work (admittedly in debug
kernels).  There's no harm in throwing in a cond_resched so other tasks
get a chance to do work.
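
A minimal sketch of the pattern (the pending-iput structure is hypothetical):

/* Sketch: drain the pending list but yield between iputs so other tasks
 * get scheduled even when the backlog is large. */
struct pending_iput {                   /* hypothetical queue entry */
    struct list_head head;
    struct inode *inode;
};

static void drain_pending_iputs(struct list_head *pending)
{
    struct pending_iput *pi, *tmp;

    list_for_each_entry_safe(pi, tmp, pending, head) {
        list_del(&pi->head);
        iput(pi->inode);
        kfree(pi);
        cond_resched();         /* the fix: periodically give up the CPU */
    }
}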

Signed-off-by: Zach Brown <zab@versity.com>
2025-10-06 12:27:25 -05:00
Chris Kirby
541cb47af0 Add tracing for get_file_block() and scoutfs_ioc_search_xattrs().
Signed-off-by: Chris Kirby <ckirby@versity.com>
2025-10-06 12:27:25 -05:00
Chris Kirby
d537365d0a Fix several cases in srch.c where the return value of EIO should have been -EIO.
Signed-off-by: Chris Kirby <ckirby@versity.com>
2025-10-06 12:27:25 -05:00
Chris Kirby
7375627861 Add the inode number to scoutfs_xattr_set traces.
Signed-off-by: Chris Kirby <ckirby@versity.com>
2025-10-06 12:27:25 -05:00
Chris Kirby
48d849e2f4 Only start new quorum election after a receive failure
It's possible for the quorum worker to be preempted for a long period,
especially on debug kernels. Since we only check how much time has
passed, it's possible for a clean receive to inadvertently trigger an
election. This can cause the quorum-heartbeat-timeout test to fail due
to observed delays outside the expected bounds.

Instead, make sure we had a receive failure before comparing timestamps.
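
A sketch of the changed condition (names and types are illustrative):

#include <linux/jiffies.h>

/* Sketch: a long gap since the last receive only triggers an election if
 * the most recent receive attempt actually failed; a clean receive after
 * a long preemption no longer counts. */
static bool should_start_election(int recv_ret, unsigned long last_recv,
                                  unsigned long timeout)
{
    if (recv_ret == 0)
        return false;

    return time_after(jiffies, last_recv + timeout);
}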

Signed-off-by: Chris Kirby <ckirby@versity.com>
2025-10-06 12:27:25 -05:00
55 changed files with 1553 additions and 2466 deletions

View File

@@ -1,74 +1,6 @@
Versity ScoutFS Release Notes
=============================
---
v1.27
\
*Jan 15, 2026*
Switch away from using the general VM cache reclaim machinery to reduce
idle cluster locks in the client. The VM treated locks like a cache and
let many accumulate, presuming that it would be efficient to free them
in batches. Lock freeing requires network communication so this could
result in enormous backlogs in network messages (on the order of
hundreds of thousands) and could result in significant delays of other
network messaging.
Fix inefficient network receive processing while many messages are in
the send queue. This consumed sufficient CPU to cause significant
stalls, perhaps resulting in hung task warning messages due to delayed
lock message delivery.
Fix a server livelock case that could happen while committing client
transactions that contain a large amount of freed file data extents.
This would present as client tasks hanging and a server task spinning
consuming cpu.
Fix a rare server request processing failure that doesn't deal with
retransmission of a request that a previous server partially processed.
This would present as hung client tasks and repeated "error -2
committing log merge: getting merge status item" kernel messages.
Fix an unnecessary server shutdown during specific circumstances in
client lock recovery. The shutdown was due to server state and was
ultimately harmless. The next server that started up would proceed
accordingly.
---
v1.26
\
*Nov 17, 2025*
Add the ino\_alloc\_per\_lock mount option. This changes the number of
inode numbers allocated under each cluster lock and can alleviate lock
contention for some patterns of larger file creation.
Add the tcp\_keepalive\_timeout\_ms mount option. This can enable the
system to survive longer periods of networking outages.
Fix a rare double free of internal btree metadata blocks when merging
log trees. The duplicated freed metadata block numbers would cause
persistent errors in the server, preventing the server from starting and
hanging the system.
Fix the data\_wait interface to not require the correct data\_version of
the inode when raising an error. This lets callers raise errors when
they're unable to recall the details of the inode to discover its
data\_version.
Change scoutfs to more aggressively reclaim cached memory when under
memory pressure. This makes scoutfs behave more like other kernel
components and it integrates better with the reclaim policy heuristics
in the VM core of the kernel.
Change scoutfs to more efficiently transmit and receive socket messages.
Under heavy load this can process messages sufficiently more quickly to
avoid hung task messages for tasks that were waiting for cluster lock
messages to be processed.
Fix faulty server block commit budget calculations that were generating
spurious "holders exceeded alloc budget" console messages.
---
v1.25
\

View File

@@ -278,14 +278,6 @@ ifneq (,$(shell grep 'int ..mknod. .struct user_namespace' include/linux/fs.h))
ccflags-y += -DKC_VFS_METHOD_USER_NAMESPACE_ARG
endif
#
# v6.2-rc1-2-gabf08576afe3
#
# fs: vfs methods use struct mnt_idmap instead of struct user_namespace
ifneq (,$(shell grep 'int vfs_mknod.struct mnt_idmap' include/linux/fs.h))
ccflags-y += -DKC_VFS_METHOD_MNT_IDMAP_ARG
endif
#
# v5.17-rc2-21-g07888c665b40
#
@@ -470,19 +462,3 @@ ifneq (,$(shell grep 'struct list_lru_one \*list, spinlock_t \*lock, void \*cb_a
ccflags-y += -DKC_LIST_LRU_WALK_CB_LIST_LOCK
endif
#
# v5.1-rc4-273-ge9b98e162aa5
#
# introduce stack trace helpers
#
ifneq (,$(shell grep '^unsigned int stack_trace_save' include/linux/stacktrace.h))
ccflags-y += -DKC_STACK_TRACE_SAVE
endif
# v6.1-rc1-4-g7420332a6ff4
#
# .get_acl() method now has dentry arg (and mnt_idmap). The old get_acl has been renamed
# to get_inode_acl() and is still available as well, but has an extra rcu param.
ifneq (,$(shell grep 'struct posix_acl ...get_acl..struct mnt_idmap ., struct dentry' include/linux/fs.h))
ccflags-y += -DKC_GET_ACL_DENTRY
endif

View File

@@ -107,15 +107,8 @@ struct posix_acl *scoutfs_get_acl_locked(struct inode *inode, int type, struct s
return acl;
}
#ifdef KC_GET_ACL_DENTRY
struct posix_acl *scoutfs_get_acl(KC_VFS_NS_DEF
struct dentry *dentry, int type)
{
struct inode *inode = dentry->d_inode;
#else
struct posix_acl *scoutfs_get_acl(struct inode *inode, int type)
{
#endif
struct super_block *sb = inode->i_sb;
struct scoutfs_lock *lock = NULL;
struct posix_acl *acl;
@@ -208,15 +201,8 @@ out:
return ret;
}
#ifdef KC_GET_ACL_DENTRY
int scoutfs_set_acl(KC_VFS_NS_DEF
struct dentry *dentry, struct posix_acl *acl, int type)
{
struct inode *inode = dentry->d_inode;
#else
int scoutfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
{
#endif
struct super_block *sb = inode->i_sb;
struct scoutfs_lock *lock = NULL;
LIST_HEAD(ind_locks);
@@ -254,12 +240,7 @@ int scoutfs_acl_get_xattr(struct dentry *dentry, const char *name, void *value,
if (!IS_POSIXACL(dentry->d_inode))
return -EOPNOTSUPP;
#ifdef KC_GET_ACL_DENTRY
acl = scoutfs_get_acl(KC_VFS_INIT_NS
dentry, type);
#else
acl = scoutfs_get_acl(dentry->d_inode, type);
#endif
if (IS_ERR(acl))
return PTR_ERR(acl);
if (acl == NULL)
@@ -305,11 +286,7 @@ int scoutfs_acl_set_xattr(struct dentry *dentry, const char *name, const void *v
}
}
#ifdef KC_GET_ACL_DENTRY
ret = scoutfs_set_acl(KC_VFS_INIT_NS dentry, acl, type);
#else
ret = scoutfs_set_acl(dentry->d_inode, acl, type);
#endif
out:
posix_acl_release(acl);

View File

@@ -1,14 +1,9 @@
#ifndef _SCOUTFS_ACL_H_
#define _SCOUTFS_ACL_H_
#ifdef KC_GET_ACL_DENTRY
struct posix_acl *scoutfs_get_acl(KC_VFS_NS_DEF struct dentry *dentry, int type);
int scoutfs_set_acl(KC_VFS_NS_DEF struct dentry *dentry, struct posix_acl *acl, int type);
#else
struct posix_acl *scoutfs_get_acl(struct inode *inode, int type);
int scoutfs_set_acl(struct inode *inode, struct posix_acl *acl, int type);
#endif
struct posix_acl *scoutfs_get_acl_locked(struct inode *inode, int type, struct scoutfs_lock *lock);
int scoutfs_set_acl(struct inode *inode, struct posix_acl *acl, int type);
int scoutfs_set_acl_locked(struct inode *inode, struct posix_acl *acl, int type,
struct scoutfs_lock *lock, struct list_head *ind_locks);
#ifdef KC_XATTR_STRUCT_XATTR_HANDLER

View File

@@ -857,7 +857,7 @@ static int find_zone_extent(struct super_block *sb, struct scoutfs_alloc_root *r
.zone = SCOUTFS_FREE_EXTENT_ORDER_ZONE,
};
struct scoutfs_extent found;
struct scoutfs_extent ext = {0,};
struct scoutfs_extent ext;
u64 start;
u64 len;
int nr;

View File

@@ -23,7 +23,6 @@
#include <linux/random.h>
#include <linux/sched/mm.h>
#include <linux/list_lru.h>
#include <linux/stacktrace.h>
#include "format.h"
#include "super.h"
@@ -81,8 +80,6 @@ struct block_private {
struct page *page;
void *virt;
};
unsigned int stack_len;
unsigned long stack[10];
};
#define TRACE_BLOCK(which, bp) \
@@ -103,17 +100,7 @@ static __le32 block_calc_crc(struct scoutfs_block_header *hdr, u32 size)
return cpu_to_le32(calc);
}
static noinline void save_block_stack(struct block_private *bp)
{
bp->stack_len = stack_trace_save(bp->stack, ARRAY_SIZE(bp->stack), 2);
}
static void print_block_stack(struct block_private *bp)
{
stack_trace_print(bp->stack, bp->stack_len, 1);
}
static noinline struct block_private *block_alloc(struct super_block *sb, u64 blkno)
static struct block_private *block_alloc(struct super_block *sb, u64 blkno)
{
struct block_private *bp;
unsigned int nofs_flags;
@@ -169,7 +156,6 @@ static noinline struct block_private *block_alloc(struct super_block *sb, u64 bl
atomic_set(&bp->io_count, 0);
TRACE_BLOCK(allocate, bp);
save_block_stack(bp);
out:
if (!bp)
@@ -1127,19 +1113,6 @@ static unsigned long block_scan_objects(struct shrinker *shrink, struct shrink_c
return freed;
}
static enum lru_status dump_lru_block(struct list_head *item, struct list_lru_one *list,
void *cb_arg)
{
struct block_private *bp = container_of(item, struct block_private, lru_head);
printk("blkno %llu refcount 0x%x io_count %d bits 0x%lx\n",
bp->bl.blkno, atomic_read(&bp->refcount), atomic_read(&bp->io_count),
bp->bits);
print_block_stack(bp);
return LRU_SKIP;
}
/*
* Called during shutdown with no other users. The isolating walk must
* find blocks on the lru that only have references for presence on the
@@ -1149,19 +1122,11 @@ static void block_shrink_all(struct super_block *sb)
{
DECLARE_BLOCK_INFO(sb, binf);
DECLARE_ISOLATE_ARGS(sb, ia);
long count;
count = DIV_ROUND_UP(list_lru_count(&binf->lru), 128) * 2;
do {
kc_list_lru_walk(&binf->lru, isolate_lru_block, &ia, 128);
shrink_dispose_blocks(sb, &ia.dispose);
} while (list_lru_count(&binf->lru) > 0 && --count > 0);
count = list_lru_count(&binf->lru);
if (count > 0) {
scoutfs_err(sb, "failed to isolate/dispose %ld blocks", count);
kc_list_lru_walk(&binf->lru, dump_lru_block, sb, count);
}
} while (list_lru_count(&binf->lru) > 0);
}
struct sm_block_completion {

View File

@@ -435,8 +435,8 @@ static int lookup_mounted_client_item(struct super_block *sb, u64 rid)
if (ret == -ENOENT)
ret = 0;
out:
kfree(super);
out:
return ret;
}

View File

@@ -125,6 +125,7 @@
EXPAND_COUNTER(item_update) \
EXPAND_COUNTER(item_write_dirty) \
EXPAND_COUNTER(lock_alloc) \
EXPAND_COUNTER(lock_count_objects) \
EXPAND_COUNTER(lock_free) \
EXPAND_COUNTER(lock_grant_request) \
EXPAND_COUNTER(lock_grant_response) \
@@ -138,13 +139,13 @@
EXPAND_COUNTER(lock_lock_error) \
EXPAND_COUNTER(lock_nonblock_eagain) \
EXPAND_COUNTER(lock_recover_request) \
EXPAND_COUNTER(lock_scan_objects) \
EXPAND_COUNTER(lock_shrink_attempted) \
EXPAND_COUNTER(lock_shrink_request_failed) \
EXPAND_COUNTER(lock_shrink_aborted) \
EXPAND_COUNTER(lock_shrink_work) \
EXPAND_COUNTER(lock_unlock) \
EXPAND_COUNTER(lock_wait) \
EXPAND_COUNTER(log_merge_complete) \
EXPAND_COUNTER(log_merge_no_finalized) \
EXPAND_COUNTER(log_merge_start) \
EXPAND_COUNTER(log_merge_wait_timeout) \
EXPAND_COUNTER(net_dropped_response) \
EXPAND_COUNTER(net_send_bytes) \
@@ -159,7 +160,6 @@
EXPAND_COUNTER(orphan_scan) \
EXPAND_COUNTER(orphan_scan_attempts) \
EXPAND_COUNTER(orphan_scan_cached) \
EXPAND_COUNTER(orphan_scan_empty) \
EXPAND_COUNTER(orphan_scan_error) \
EXPAND_COUNTER(orphan_scan_item) \
EXPAND_COUNTER(orphan_scan_omap_set) \

View File

@@ -2053,9 +2053,6 @@ const struct inode_operations scoutfs_dir_iops = {
#endif
.listxattr = scoutfs_listxattr,
.get_acl = scoutfs_get_acl,
#ifdef KC_GET_ACL_DENTRY
.set_acl = scoutfs_set_acl,
#endif
.symlink = scoutfs_symlink,
.permission = scoutfs_permission,
#ifdef KC_LINUX_HAVE_RHEL_IOPS_WRAPPER

View File

@@ -150,9 +150,6 @@ static const struct inode_operations scoutfs_file_iops = {
#endif
.listxattr = scoutfs_listxattr,
.get_acl = scoutfs_get_acl,
#ifdef KC_GET_ACL_DENTRY
.set_acl = scoutfs_set_acl,
#endif
.fiemap = scoutfs_data_fiemap,
};
@@ -166,9 +163,6 @@ static const struct inode_operations scoutfs_special_iops = {
#endif
.listxattr = scoutfs_listxattr,
.get_acl = scoutfs_get_acl,
#ifdef KC_GET_ACL_DENTRY
.set_acl = scoutfs_set_acl,
#endif
};
/*
@@ -1482,6 +1476,12 @@ static int remove_index_items(struct super_block *sb, u64 ino,
* Return an allocated and unused inode number. Returns -ENOSPC if
* we're out of inodes.
*
* Each parent directory has its own pool of free inode numbers. Items
* are sorted by their inode numbers as they're stored in segments.
* This will tend to group together files that are created in a
* directory at the same time in segments. Concurrent creation across
* different directories will be stored in their own regions.
*
* Inode numbers are never reclaimed. If the inode is evicted or we're
* unmounted the pending inode numbers will be lost. Asking for a
* relatively small number from the server each time will tend to
@@ -1491,18 +1491,12 @@ static int remove_index_items(struct super_block *sb, u64 ino,
int scoutfs_alloc_ino(struct super_block *sb, bool is_dir, u64 *ino_ret)
{
DECLARE_INODE_SB_INFO(sb, inf);
struct scoutfs_mount_options opts;
struct inode_allocator *ia;
u64 ino;
u64 nr;
int ret;
scoutfs_options_read(sb, &opts);
if (is_dir && opts.ino_alloc_per_lock == SCOUTFS_LOCK_INODE_GROUP_NR)
ia = &inf->dir_ino_alloc;
else
ia = &inf->ino_alloc;
ia = is_dir ? &inf->dir_ino_alloc : &inf->ino_alloc;
spin_lock(&ia->lock);
@@ -1523,17 +1517,6 @@ int scoutfs_alloc_ino(struct super_block *sb, bool is_dir, u64 *ino_ret)
*ino_ret = ia->ino++;
ia->nr--;
if (opts.ino_alloc_per_lock != SCOUTFS_LOCK_INODE_GROUP_NR) {
nr = ia->ino & SCOUTFS_LOCK_INODE_GROUP_MASK;
if (nr >= opts.ino_alloc_per_lock) {
nr = SCOUTFS_LOCK_INODE_GROUP_NR - nr;
if (nr > ia->nr)
nr = ia->nr;
ia->ino += nr;
ia->nr -= nr;
}
}
spin_unlock(&ia->lock);
ret = 0;
out:
@@ -1637,14 +1620,10 @@ int scoutfs_inode_orphan_delete(struct super_block *sb, u64 ino, struct scoutfs_
struct scoutfs_lock *primary)
{
struct scoutfs_key key;
int ret;
init_orphan_key(&key, ino);
ret = scoutfs_item_delete_force(sb, &key, lock, primary);
trace_scoutfs_inode_orphan_delete(sb, ino, ret);
return ret;
return scoutfs_item_delete_force(sb, &key, lock, primary);
}
/*
@@ -1726,8 +1705,6 @@ out:
scoutfs_release_trans(sb);
scoutfs_inode_index_unlock(sb, &ind_locks);
trace_scoutfs_delete_inode_end(sb, ino, mode, size, ret);
return ret;
}
@@ -1823,9 +1800,6 @@ out:
* they've checked that the inode could really be deleted. We serialize
* on a bit in the lock data so that we only have one deletion attempt
* per inode under this mount's cluster lock.
*
* Returns -EAGAIN if we either did some cleanup work or are unable to finish
* cleaning up this inode right now.
*/
static int try_delete_inode_items(struct super_block *sb, u64 ino)
{
@@ -1839,8 +1813,6 @@ static int try_delete_inode_items(struct super_block *sb, u64 ino)
int bit_nr;
int ret;
trace_scoutfs_try_delete(sb, ino);
ret = scoutfs_lock_ino(sb, SCOUTFS_LOCK_WRITE, 0, ino, &lock);
if (ret < 0)
goto out;
@@ -1853,32 +1825,27 @@ static int try_delete_inode_items(struct super_block *sb, u64 ino)
/* only one local attempt per inode at a time */
if (test_and_set_bit(bit_nr, ldata->trying)) {
trace_scoutfs_try_delete_local_busy(sb, ino);
ret = -EAGAIN;
ret = 0;
goto out;
}
clear_trying = true;
/* can't delete if it's cached in local or remote mounts */
if (scoutfs_omap_test(sb, ino) || test_bit_le(bit_nr, ldata->map.bits)) {
trace_scoutfs_try_delete_cached(sb, ino);
ret = -EAGAIN;
ret = 0;
goto out;
}
scoutfs_inode_init_key(&key, ino);
ret = lookup_inode_item(sb, &key, &sinode, lock);
if (ret < 0) {
if (ret == -ENOENT) {
trace_scoutfs_try_delete_no_item(sb, ino);
if (ret == -ENOENT)
ret = 0;
}
goto out;
}
if (le32_to_cpu(sinode.nlink) > 0) {
trace_scoutfs_try_delete_has_links(sb, ino, le32_to_cpu(sinode.nlink));
ret = -EAGAIN;
ret = 0;
goto out;
}
@@ -1887,10 +1854,8 @@ static int try_delete_inode_items(struct super_block *sb, u64 ino)
goto out;
ret = delete_inode_items(sb, ino, &sinode, lock, orph_lock);
if (ret == 0) {
ret = -EAGAIN;
if (ret == 0)
scoutfs_inc_counter(sb, inode_deleted);
}
out:
if (clear_trying)
@@ -2092,10 +2057,6 @@ void scoutfs_inode_schedule_orphan_dwork(struct super_block *sb)
* a locally cached inode. Then we ask the server for the open map
* containing the inode. Only if we don't see any cached users do we do
* the expensive work of acquiring locks to try and delete the items.
*
* We need to track whether there is any orphan cleanup work remaining so
* that tests such as inode-deletion can watch the orphan_scan_empty counter
* to determine when inode cleanup from open-unlink scenarios is complete.
*/
static void inode_orphan_scan_worker(struct work_struct *work)
{
@@ -2107,14 +2068,11 @@ static void inode_orphan_scan_worker(struct work_struct *work)
SCOUTFS_BTREE_ITEM_REF(iref);
struct scoutfs_key last;
struct scoutfs_key key;
bool work_todo = false;
u64 group_nr;
int bit_nr;
u64 ino;
int ret;
trace_scoutfs_orphan_scan_start(sb);
scoutfs_inc_counter(sb, orphan_scan);
init_orphan_key(&last, U64_MAX);
@@ -2134,10 +2092,8 @@ static void inode_orphan_scan_worker(struct work_struct *work)
init_orphan_key(&key, ino);
ret = scoutfs_btree_next(sb, &roots.fs_root, &key, &iref);
if (ret < 0) {
if (ret == -ENOENT) {
trace_scoutfs_orphan_scan_work(sb, 0);
if (ret == -ENOENT)
break;
}
goto out;
}
@@ -2152,7 +2108,6 @@ static void inode_orphan_scan_worker(struct work_struct *work)
/* locally cached inodes will try to delete as they evict */
if (scoutfs_omap_test(sb, ino)) {
work_todo = true;
scoutfs_inc_counter(sb, orphan_scan_cached);
continue;
}
@@ -2168,22 +2123,13 @@ static void inode_orphan_scan_worker(struct work_struct *work)
/* remote cached inodes will also try to delete */
if (test_bit_le(bit_nr, omap.bits)) {
work_todo = true;
scoutfs_inc_counter(sb, orphan_scan_omap_set);
continue;
}
/* seemingly orphaned and unused, get locks and check for sure */
scoutfs_inc_counter(sb, orphan_scan_attempts);
trace_scoutfs_orphan_scan_work(sb, ino);
ret = try_delete_inode_items(sb, ino);
if (ret == -EAGAIN) {
work_todo = true;
ret = 0;
}
trace_scoutfs_orphan_scan_end(sb, ino, ret);
}
ret = 0;
@@ -2192,11 +2138,6 @@ out:
if (ret < 0)
scoutfs_inc_counter(sb, orphan_scan_error);
if (!work_todo)
scoutfs_inc_counter(sb, orphan_scan_empty);
trace_scoutfs_orphan_scan_stop(sb, work_todo);
scoutfs_inode_schedule_orphan_dwork(sb);
}
@@ -2247,7 +2188,7 @@ int scoutfs_inode_walk_writeback(struct super_block *sb, bool write)
struct scoutfs_inode_info *si;
struct scoutfs_inode_info *tmp;
struct inode *inode;
int ret = 0;
int ret;
spin_lock(&inf->writeback_lock);

View File

@@ -441,6 +441,8 @@ static long scoutfs_ioc_data_wait_err(struct file *file, unsigned long arg)
if (!S_ISREG(inode->i_mode)) {
ret = -EINVAL;
} else if (scoutfs_inode_data_version(inode) != args.data_version) {
ret = -ESTALE;
} else {
ret = scoutfs_data_wait_err(inode, sblock, eblock, args.op,
args.err);
@@ -952,9 +954,6 @@ static int copy_alloc_detail_to_user(struct super_block *sb, void *arg,
if (args->copied == args->nr)
return -EOVERFLOW;
/* .type and .pad need clearing */
memset(&ade, 0, sizeof(struct scoutfs_ioctl_alloc_detail_entry));
ade.blocks = blocks;
ade.id = id;
ade.meta = !!meta;
@@ -1370,7 +1369,7 @@ static long scoutfs_ioc_get_referring_entries(struct file *file, unsigned long a
ent.d_type = bref->d_type;
ent.name_len = name_len;
if (copy_to_user(uent, &ent, offsetof(struct scoutfs_ioctl_dirent, name[0])) ||
if (copy_to_user(uent, &ent, sizeof(struct scoutfs_ioctl_dirent)) ||
copy_to_user(&uent->name[0], bref->dent.name, name_len) ||
put_user('\0', &uent->name[name_len])) {
ret = -EFAULT;

View File

@@ -366,15 +366,10 @@ struct scoutfs_ioctl_statfs_more {
*
* Find current waiters that match the inode, op, and block range to wake
* up and return an error.
*
* (*) ca. v1.25 and earlier required that the data_version passed match
* that of the waiter, but this check is removed. It was never needed
* because no data is modified during this ioctl. Any data_version value
* here is thus since then ignored.
*/
struct scoutfs_ioctl_data_wait_err {
__u64 ino;
__u64 data_version; /* Ignored, see above (*) */
__u64 data_version;
__u64 offset;
__u64 count;
__u64 op;

View File

@@ -86,8 +86,6 @@ struct item_cache_info {
/* often walked, but per-cpu refs are fast path */
rwlock_t rwlock;
struct rb_root pg_root;
/* stop readers from caching stale items behind reclaimed cleaned written items */
u64 read_dirty_barrier;
/* page-granular modification by writers, then exclusive to commit */
spinlock_t dirty_lock;
@@ -98,6 +96,9 @@ struct item_cache_info {
spinlock_t lru_lock;
struct list_head lru_list;
unsigned long lru_pages;
/* stop readers from caching stale items behind reclaimed cleaned written items */
atomic64_t read_dirty_barrier;
};
#define DECLARE_ITEM_CACHE_INFO(sb, name) \
@@ -1430,9 +1431,7 @@ static int read_pages(struct super_block *sb, struct item_cache_info *cinf,
pg->end = lock->end;
rbtree_insert(&pg->node, NULL, &root.rb_node, &root);
read_lock(&cinf->rwlock);
rdbar = cinf->read_dirty_barrier;
read_unlock(&cinf->rwlock);
rdbar = atomic64_read(&cinf->read_dirty_barrier);
start = lock->start;
end = lock->end;
@@ -1471,18 +1470,19 @@ static int read_pages(struct super_block *sb, struct item_cache_info *cinf,
retry:
write_lock(&cinf->rwlock);
/* can't insert if write has cleaned since we read */
if (cinf->read_dirty_barrier != rdbar) {
scoutfs_inc_counter(sb, item_read_pages_barrier);
ret = -ESTALE;
goto unlock;
}
ret = 0;
while ((rd = first_page(&root))) {
pg = page_rbtree_walk(sb, &cinf->pg_root, &rd->start, &rd->end,
NULL, NULL, &par, &pnode);
if (!pg) {
/* can't insert if write is cleaning (write_lock is read barrier) */
if (atomic64_read(&cinf->read_dirty_barrier) != rdbar) {
scoutfs_inc_counter(sb, item_read_pages_barrier);
ret = -ESTALE;
break;
}
/* insert read pages that don't intersect */
rbtree_erase(&rd->node, &root);
rbtree_insert(&rd->node, par, pnode, &cinf->pg_root);
@@ -1515,9 +1515,6 @@ retry:
}
}
ret = 0;
unlock:
write_unlock(&cinf->rwlock);
out:
@@ -2361,10 +2358,9 @@ int scoutfs_item_write_done(struct super_block *sb)
struct cached_item *tmp;
struct cached_page *pg;
/* don't let read_pages miss written+cleaned items */
write_lock(&cinf->rwlock);
cinf->read_dirty_barrier++;
write_unlock(&cinf->rwlock);
/* don't let read_pages insert possibly stale items */
atomic64_inc(&cinf->read_dirty_barrier);
smp_mb__after_atomic();
spin_lock(&cinf->dirty_lock);
while ((pg = list_first_entry_or_null(&cinf->dirty_list, struct cached_page, dirty_head))) {
@@ -2619,6 +2615,7 @@ int scoutfs_item_setup(struct super_block *sb)
atomic_set(&cinf->dirty_pages, 0);
spin_lock_init(&cinf->lru_lock);
INIT_LIST_HEAD(&cinf->lru_list);
atomic64_set(&cinf->read_dirty_barrier, 0);
cinf->pcpu_pages = alloc_percpu(struct item_percpu_pages);
if (!cinf->pcpu_pages)

View File

@@ -263,11 +263,6 @@ typedef unsigned int blk_opf_t;
#define kc__vmalloc __vmalloc
#endif
#ifdef KC_VFS_METHOD_MNT_IDMAP_ARG
#define KC_VFS_NS_DEF struct mnt_idmap *mnt_idmap,
#define KC_VFS_NS mnt_idmap,
#define KC_VFS_INIT_NS &nop_mnt_idmap,
#else
#ifdef KC_VFS_METHOD_USER_NAMESPACE_ARG
#define KC_VFS_NS_DEF struct user_namespace *mnt_user_ns,
#define KC_VFS_NS mnt_user_ns,
@@ -277,7 +272,6 @@ typedef unsigned int blk_opf_t;
#define KC_VFS_NS
#define KC_VFS_INIT_NS
#endif
#endif /* KC_VFS_METHOD_MNT_IDMAP_ARG */
#ifdef KC_BIO_ALLOC_DEV_OPF_ARGS
#define kc_bio_alloc bio_alloc
@@ -463,30 +457,4 @@ static inline void list_lru_isolate_move(struct list_lru_one *list, struct list_
}
#endif
#ifndef KC_STACK_TRACE_SAVE
#include <linux/stacktrace.h>
static inline unsigned int stack_trace_save(unsigned long *store, unsigned int size,
unsigned int skipnr)
{
struct stack_trace trace = {
.entries = store,
.max_entries = size,
.skip = skipnr,
};
save_stack_trace(&trace);
return trace.nr_entries;
}
static inline void stack_trace_print(unsigned long *entries, unsigned int nr_entries, int spaces)
{
struct stack_trace trace = {
.entries = entries,
.nr_entries = nr_entries,
};
print_stack_trace(&trace, spaces);
}
#endif
#endif

View File

@@ -53,10 +53,8 @@
* all access to the lock (by revoking it down to a null mode) then the
* lock is freed.
*
* Each client has a configurable number of locks that are allowed to
* remain idle after being granted, for use by future tasks. Past the
* limit locks are freed by requesting a null mode from the server,
* governed by a LRU.
* Memory pressure on the client can cause the client to request a null
* mode from the server so that once its granted the lock can be freed.
*
* So far we've only needed a minimal trylock. We return -EAGAIN if a
* lock attempt can't immediately match an existing granted lock. This
@@ -81,11 +79,14 @@ struct lock_info {
bool unmounting;
struct rb_root lock_tree;
struct rb_root lock_range_tree;
u64 nr_locks;
KC_DEFINE_SHRINKER(shrinker);
struct list_head lru_list;
unsigned long long lru_nr;
struct workqueue_struct *workq;
struct work_struct inv_work;
struct list_head inv_list;
struct work_struct shrink_work;
struct list_head shrink_list;
atomic64_t next_refresh_gen;
struct dentry *tseq_dentry;
@@ -248,6 +249,7 @@ static void lock_free(struct lock_info *linfo, struct scoutfs_lock *lock)
BUG_ON(!RB_EMPTY_NODE(&lock->range_node));
BUG_ON(!list_empty(&lock->lru_head));
BUG_ON(!list_empty(&lock->inv_head));
BUG_ON(!list_empty(&lock->shrink_head));
BUG_ON(!list_empty(&lock->cov_list));
kfree(lock->inode_deletion_data);
@@ -275,6 +277,7 @@ static struct scoutfs_lock *lock_alloc(struct super_block *sb,
INIT_LIST_HEAD(&lock->lru_head);
INIT_LIST_HEAD(&lock->inv_head);
INIT_LIST_HEAD(&lock->inv_list);
INIT_LIST_HEAD(&lock->shrink_head);
spin_lock_init(&lock->cov_list_lock);
INIT_LIST_HEAD(&lock->cov_list);
@@ -407,7 +410,6 @@ static bool lock_insert(struct super_block *sb, struct scoutfs_lock *ins)
rb_link_node(&ins->node, parent, node);
rb_insert_color(&ins->node, &linfo->lock_tree);
linfo->nr_locks++;
scoutfs_tseq_add(&linfo->tseq_tree, &ins->tseq_entry);
return true;
@@ -422,7 +424,6 @@ static void lock_remove(struct lock_info *linfo, struct scoutfs_lock *lock)
rb_erase(&lock->range_node, &linfo->lock_range_tree);
RB_CLEAR_NODE(&lock->range_node);
linfo->nr_locks--;
scoutfs_tseq_del(&linfo->tseq_tree, &lock->tseq_entry);
}
@@ -462,8 +463,10 @@ static void __lock_del_lru(struct lock_info *linfo, struct scoutfs_lock *lock)
{
assert_spin_locked(&linfo->lock);
if (!list_empty(&lock->lru_head))
if (!list_empty(&lock->lru_head)) {
list_del_init(&lock->lru_head);
linfo->lru_nr--;
}
}
/*
@@ -522,16 +525,14 @@ static struct scoutfs_lock *create_lock(struct super_block *sb,
* indicate that the lock wasn't idle. If it really is idle then we
* either free it if it's null or put it back on the lru.
*/
static void __put_lock(struct lock_info *linfo, struct scoutfs_lock *lock, bool tail)
static void put_lock(struct lock_info *linfo,struct scoutfs_lock *lock)
{
assert_spin_locked(&linfo->lock);
if (lock_idle(lock)) {
if (lock->mode != SCOUTFS_LOCK_NULL) {
if (tail)
list_add_tail(&lock->lru_head, &linfo->lru_list);
else
list_add(&lock->lru_head, &linfo->lru_list);
list_add_tail(&lock->lru_head, &linfo->lru_list);
linfo->lru_nr++;
} else {
lock_remove(linfo, lock);
lock_free(linfo, lock);
@@ -539,11 +540,6 @@ static void __put_lock(struct lock_info *linfo, struct scoutfs_lock *lock, bool
}
}
static inline void put_lock(struct lock_info *linfo, struct scoutfs_lock *lock)
{
__put_lock(linfo, lock, true);
}
/*
* The caller has made a change (set a lock mode) which can let one of the
* invalidating locks make forward progress.
@@ -717,14 +713,14 @@ static void lock_invalidate_worker(struct work_struct *work)
/* only lock protocol, inv can't call subsystems after shutdown */
if (!linfo->shutdown) {
ret = lock_invalidate(sb, lock, nl->old_mode, nl->new_mode);
BUG_ON(ret < 0 && ret != -ENOLINK);
BUG_ON(ret);
}
/* respond with the key and modes from the request, server might have died */
ret = scoutfs_client_lock_response(sb, ireq->net_id, nl);
if (ret == -ENOTCONN)
ret = 0;
BUG_ON(ret < 0 && ret != -ENOLINK);
BUG_ON(ret);
scoutfs_inc_counter(sb, lock_invalidate_response);
}
@@ -879,69 +875,6 @@ int scoutfs_lock_recover_request(struct super_block *sb, u64 net_id,
return ret;
}
/*
* This is called on every _lock call to try and keep the number of
* locks under the idle count. We're intentionally trying to throttle
* shrinking bursts by tying its frequency to lock use. It will only
* send requests to free unused locks, though, so it's always possible
* to exceed the high water mark under heavy load.
*
* We send a null request and the lock will be freed by the response
* once all users drain. If this races with invalidation then the
* server will only send the grant response once the invalidation is
* finished.
*/
static bool try_shrink_lock(struct super_block *sb, struct lock_info *linfo, bool force)
{
struct scoutfs_mount_options opts;
struct scoutfs_lock *lock = NULL;
struct scoutfs_net_lock nl;
int ret = 0;
scoutfs_options_read(sb, &opts);
/* avoiding lock contention with unsynchronized test, don't mind temp false results */
if (!force && (list_empty(&linfo->lru_list) ||
READ_ONCE(linfo->nr_locks) <= opts.lock_idle_count))
return false;
spin_lock(&linfo->lock);
lock = list_first_entry_or_null(&linfo->lru_list, struct scoutfs_lock, lru_head);
if (lock && (force || (linfo->nr_locks > opts.lock_idle_count))) {
__lock_del_lru(linfo, lock);
lock->request_pending = 1;
nl.key = lock->start;
nl.old_mode = lock->mode;
nl.new_mode = SCOUTFS_LOCK_NULL;
} else {
lock = NULL;
}
spin_unlock(&linfo->lock);
if (lock) {
ret = scoutfs_client_lock_request(sb, &nl);
if (ret < 0) {
scoutfs_inc_counter(sb, lock_shrink_request_failed);
spin_lock(&linfo->lock);
lock->request_pending = 0;
wake_up(&lock->waitq);
__put_lock(linfo, lock, false);
spin_unlock(&linfo->lock);
} else {
scoutfs_inc_counter(sb, lock_shrink_attempted);
trace_scoutfs_lock_shrink(sb, lock);
}
}
return lock && ret == 0;
}
static bool lock_wait_cond(struct super_block *sb, struct scoutfs_lock *lock,
enum scoutfs_lock_mode mode)
{
@@ -1004,8 +937,6 @@ static int lock_key_range(struct super_block *sb, enum scoutfs_lock_mode mode, i
if (WARN_ON_ONCE(scoutfs_trans_held()))
return -EDEADLK;
try_shrink_lock(sb, linfo, false);
spin_lock(&linfo->lock);
/* drops and re-acquires lock if it allocates */
@@ -1449,12 +1380,134 @@ bool scoutfs_lock_protected(struct scoutfs_lock *lock, struct scoutfs_key *key,
&lock->start, &lock->end) == 0;
}
/*
* The shrink callback got the lock, marked it request_pending, and put
* it on the shrink list. We send a null request and the lock will be
* freed by the response once all users drain. If this races with
* invalidation then the server will only send the grant response once
* the invalidation is finished.
*/
static void lock_shrink_worker(struct work_struct *work)
{
struct lock_info *linfo = container_of(work, struct lock_info,
shrink_work);
struct super_block *sb = linfo->sb;
struct scoutfs_net_lock nl;
struct scoutfs_lock *lock;
struct scoutfs_lock *tmp;
LIST_HEAD(list);
int ret;
scoutfs_inc_counter(sb, lock_shrink_work);
spin_lock(&linfo->lock);
list_splice_init(&linfo->shrink_list, &list);
spin_unlock(&linfo->lock);
list_for_each_entry_safe(lock, tmp, &list, shrink_head) {
list_del_init(&lock->shrink_head);
/* unlocked lock access, but should be stable since we queued */
nl.key = lock->start;
nl.old_mode = lock->mode;
nl.new_mode = SCOUTFS_LOCK_NULL;
ret = scoutfs_client_lock_request(sb, &nl);
if (ret) {
/* oh well, not freeing */
scoutfs_inc_counter(sb, lock_shrink_aborted);
spin_lock(&linfo->lock);
lock->request_pending = 0;
wake_up(&lock->waitq);
put_lock(linfo, lock);
spin_unlock(&linfo->lock);
}
}
}
static unsigned long lock_count_objects(struct shrinker *shrink,
struct shrink_control *sc)
{
struct lock_info *linfo = KC_SHRINKER_CONTAINER_OF(shrink, struct lock_info);
struct super_block *sb = linfo->sb;
scoutfs_inc_counter(sb, lock_count_objects);
return shrinker_min_long(linfo->lru_nr);
}
/*
* Start the shrinking process for locks on the lru. If a lock is on
* the lru then it can't have any active users. We don't want to block
* or allocate here so all we do is get the lock, mark it request
* pending, and kick off the work. The work sends a null request and
* eventually the lock is freed by its response.
*
* Only a racing lock attempt that isn't matched can prevent the lock
* from being freed. It'll block waiting to send its request for its
* mode which will prevent the lock from being freed when the null
* response arrives.
*/
static unsigned long lock_scan_objects(struct shrinker *shrink,
struct shrink_control *sc)
{
struct lock_info *linfo = KC_SHRINKER_CONTAINER_OF(shrink, struct lock_info);
struct super_block *sb = linfo->sb;
struct scoutfs_lock *lock;
struct scoutfs_lock *tmp;
unsigned long freed = 0;
unsigned long nr = sc->nr_to_scan;
bool added = false;
scoutfs_inc_counter(sb, lock_scan_objects);
spin_lock(&linfo->lock);
restart:
list_for_each_entry_safe(lock, tmp, &linfo->lru_list, lru_head) {
BUG_ON(!lock_idle(lock));
BUG_ON(lock->mode == SCOUTFS_LOCK_NULL);
BUG_ON(!list_empty(&lock->shrink_head));
if (nr-- == 0)
break;
__lock_del_lru(linfo, lock);
lock->request_pending = 1;
list_add_tail(&lock->shrink_head, &linfo->shrink_list);
added = true;
freed++;
scoutfs_inc_counter(sb, lock_shrink_attempted);
trace_scoutfs_lock_shrink(sb, lock);
/* could have bazillions of idle locks */
if (cond_resched_lock(&linfo->lock))
goto restart;
}
spin_unlock(&linfo->lock);
if (added)
queue_work(linfo->workq, &linfo->shrink_work);
trace_scoutfs_lock_shrink_exit(sb, sc->nr_to_scan, freed);
return freed;
}
void scoutfs_free_unused_locks(struct super_block *sb)
{
DECLARE_LOCK_INFO(sb, linfo);
struct lock_info *linfo = SCOUTFS_SB(sb)->lock_info;
struct shrink_control sc = {
.gfp_mask = GFP_NOFS,
.nr_to_scan = INT_MAX,
};
while (try_shrink_lock(sb, linfo, true))
cond_resched();
lock_scan_objects(KC_SHRINKER_FN(&linfo->shrinker), &sc);
}
static void lock_tseq_show(struct seq_file *m, struct scoutfs_tseq_entry *ent)
@@ -1537,10 +1590,10 @@ u64 scoutfs_lock_ino_refresh_gen(struct super_block *sb, u64 ino)
* transitions and sending requests. We set the shutdown flag to catch
* anyone who breaks this rule.
*
* With no more lock callers, we'll no longer try to shrink the pool of
* granted locks. We'll free all of them as _destroy() is called after
* the farewell response indicates that the server tore down all our
* lock state.
* We unregister the shrinker so that we won't try and send null
* requests in response to memory pressure. The locks will all be
* unceremoniously dropped once we get a farewell response from the
* server which indicates that they destroyed our locking state.
*
* We will still respond to invalidation requests that have to be
* processed to let unmount in other mounts acquire locks and make
@@ -1560,6 +1613,10 @@ void scoutfs_lock_shutdown(struct super_block *sb)
trace_scoutfs_lock_shutdown(sb, linfo);
/* stop the shrinker from queueing work */
KC_UNREGISTER_SHRINKER(&linfo->shrinker);
flush_work(&linfo->shrink_work);
/* cause current and future lock calls to return errors */
spin_lock(&linfo->lock);
linfo->shutdown = true;
@@ -1650,6 +1707,8 @@ void scoutfs_lock_destroy(struct super_block *sb)
list_del_init(&lock->inv_head);
lock->invalidate_pending = 0;
}
if (!list_empty(&lock->shrink_head))
list_del_init(&lock->shrink_head);
lock_remove(linfo, lock);
lock_free(linfo, lock);
}
@@ -1674,9 +1733,14 @@ int scoutfs_lock_setup(struct super_block *sb)
spin_lock_init(&linfo->lock);
linfo->lock_tree = RB_ROOT;
linfo->lock_range_tree = RB_ROOT;
KC_INIT_SHRINKER_FUNCS(&linfo->shrinker, lock_count_objects,
lock_scan_objects);
KC_REGISTER_SHRINKER(&linfo->shrinker, "scoutfs-lock:" SCSBF, SCSB_ARGS(sb));
INIT_LIST_HEAD(&linfo->lru_list);
INIT_WORK(&linfo->inv_work, lock_invalidate_worker);
INIT_LIST_HEAD(&linfo->inv_list);
INIT_WORK(&linfo->shrink_work, lock_shrink_worker);
INIT_LIST_HEAD(&linfo->shrink_list);
atomic64_set(&linfo->next_refresh_gen, 0);
scoutfs_tseq_tree_init(&linfo->tseq_tree, lock_tseq_show);

View File

@@ -506,19 +506,6 @@ out:
* because we don't know which locks they'll hold. Once recover
* finishes the server calls us to kick all the locks that were waiting
* during recovery.
*
* The calling server shuts down if we return errors indicating that we
* weren't able to ensure forward progress in the lock state machine.
*
* Failure to send to a disconnected client is not a fatal error.
* During normal disconnection the client's state is removed before
* their connection is destroyed. We can't use state to try and send to
* a non-existing connection. But a client that fails to reconnect is
* disconnected before being fenced. If we have multiple disconnected
* clients we can try to send to one while cleaning up another. If
* they've uncleanly disconnected their locks are going to be removed
* and the lock can make forward progress again. Or we'll shutdown for
* failure to fence.
*/
static int process_waiting_requests(struct super_block *sb,
struct server_lock_node *snode)
@@ -610,10 +597,6 @@ static int process_waiting_requests(struct super_block *sb,
out:
put_server_lock(inf, snode);
/* disconnected clients will be fenced, trying to send to them isn't fatal */
if (ret == -ENOTCONN)
ret = 0;
return ret;
}

View File

@@ -35,12 +35,6 @@ do { \
} \
} while (0) \
#define scoutfs_bug_on_err(sb, err, fmt, args...) \
do { \
__typeof__(err) _err = (err); \
scoutfs_bug_on(sb, _err < 0 && _err != -ENOLINK, fmt, ##args); \
} while (0)
/*
* Each message is only generated once per volume. Remounting resets
* the messages.

View File

@@ -21,7 +21,6 @@
#include <net/tcp.h>
#include <linux/log2.h>
#include <linux/jhash.h>
#include <linux/rbtree.h>
#include "format.h"
#include "counters.h"
@@ -126,7 +125,6 @@ struct message_send {
unsigned long dead:1;
struct list_head head;
scoutfs_net_response_t resp_func;
struct rb_node node;
void *resp_data;
struct scoutfs_net_header nh;
};
@@ -163,118 +161,49 @@ static bool nh_is_request(struct scoutfs_net_header *nh)
return !nh_is_response(nh);
}
static int cmp_sorted_msend(u64 pos, struct message_send *msend)
{
if (nh_is_request(&msend->nh))
return pos < le64_to_cpu(msend->nh.id) ? -1 :
pos > le64_to_cpu(msend->nh.id) ? 1 : 0;
else
return pos < le64_to_cpu(msend->nh.seq) ? -1 :
pos > le64_to_cpu(msend->nh.seq) ? 1 : 0;
}
static struct message_send *search_sorted_msends(struct rb_root *root, u64 pos, struct rb_node *ins)
{
struct rb_node **node = &root->rb_node;
struct rb_node *parent = NULL;
struct message_send *msend = NULL;
struct message_send *next = NULL;
int cmp = -1;
while (*node) {
parent = *node;
msend = container_of(*node, struct message_send, node);
cmp = cmp_sorted_msend(pos, msend);
if (cmp < 0) {
next = msend;
node = &(*node)->rb_left;
} else if (cmp > 0) {
node = &(*node)->rb_right;
} else {
next = msend;
break;
}
}
BUG_ON(cmp == 0 && ins);
if (ins) {
rb_link_node(ins, parent, node);
rb_insert_color(ins, root);
}
return next;
}
static struct message_send *next_sorted_msend(struct message_send *msend)
{
struct rb_node *node = rb_next(&msend->node);
return node ? rb_entry(node, struct message_send, node) : NULL;
}
#define for_each_sorted_msend(MSEND_, TMP_, ROOT_, POS_) \
for (MSEND_ = search_sorted_msends(ROOT_, POS_, NULL); \
MSEND_ != NULL && ({ TMP_ = next_sorted_msend(MSEND_); true; }); \
MSEND_ = TMP_)
static void insert_sorted_msend(struct scoutfs_net_connection *conn, struct message_send *msend)
{
BUG_ON(!RB_EMPTY_NODE(&msend->node));
if (nh_is_request(&msend->nh))
search_sorted_msends(&conn->req_root, le64_to_cpu(msend->nh.id), &msend->node);
else
search_sorted_msends(&conn->resp_root, le64_to_cpu(msend->nh.seq), &msend->node);
}
static void erase_sorted_msend(struct scoutfs_net_connection *conn, struct message_send *msend)
{
if (!RB_EMPTY_NODE(&msend->node)) {
if (nh_is_request(&msend->nh))
rb_erase(&msend->node, &conn->req_root);
else
rb_erase(&msend->node, &conn->resp_root);
RB_CLEAR_NODE(&msend->node);
}
}
static void move_sorted_msends(struct scoutfs_net_connection *dst_conn, struct rb_root *dst_root,
struct scoutfs_net_connection *src_conn, struct rb_root *src_root)
{
struct message_send *msend;
struct message_send *tmp;
for_each_sorted_msend(msend, tmp, src_root, 0) {
erase_sorted_msend(src_conn, msend);
insert_sorted_msend(dst_conn, msend);
}
}
/*
* Pending requests are uniquely identified by the id they were assigned
* as they were first put on the send queue.
* We return dead requests so that the caller can stop searching other
* lists for the dead request that we found.
*/
static struct message_send *find_request(struct scoutfs_net_connection *conn, u8 cmd, u64 id)
static struct message_send *search_list(struct scoutfs_net_connection *conn,
struct list_head *list,
u8 cmd, u64 id)
{
struct message_send *msend;
assert_spin_locked(&conn->lock);
msend = search_sorted_msends(&conn->req_root, id, NULL);
if (msend && !(msend->nh.cmd == cmd && le64_to_cpu(msend->nh.id) == id))
msend = NULL;
list_for_each_entry(msend, list, head) {
if (nh_is_request(&msend->nh) && msend->nh.cmd == cmd &&
le64_to_cpu(msend->nh.id) == id)
return msend;
}
return NULL;
}
/*
* Find an active send request on the lists. It's almost certainly
* waiting on the resend queue but it could be actively being sent.
*/
static struct message_send *find_request(struct scoutfs_net_connection *conn,
u8 cmd, u64 id)
{
struct message_send *msend;
msend = search_list(conn, &conn->resend_queue, cmd, id) ?:
search_list(conn, &conn->send_queue, cmd, id);
if (msend && msend->dead)
msend = NULL;
return msend;
}
/*
* Free a send message by moving it to the send queue and marking it
* dead. It is removed from the sorted rb roots so it won't be visible
* as a request for response processing.
* Complete a send message by moving it to the send queue and marking it
* to be freed. It won't be visible to callers trying to find sends.
*/
static void queue_dead_free(struct scoutfs_net_connection *conn, struct message_send *msend)
static void complete_send(struct scoutfs_net_connection *conn,
struct message_send *msend)
{
assert_spin_locked(&conn->lock);
@@ -284,7 +213,6 @@ static void queue_dead_free(struct scoutfs_net_connection *conn, struct message_
msend->dead = 1;
list_move(&msend->head, &conn->send_queue);
erase_sorted_msend(conn, msend);
queue_work(conn->workq, &conn->send_work);
}
@@ -442,7 +370,6 @@ static int submit_send(struct super_block *sb,
msend->resp_func = resp_func;
msend->resp_data = resp_data;
msend->dead = 0;
RB_CLEAR_NODE(&msend->node);
msend->nh.seq = cpu_to_le64(seq);
msend->nh.recv_seq = 0; /* set when sent, not when queued */
@@ -463,7 +390,6 @@ static int submit_send(struct super_block *sb,
} else {
list_add_tail(&msend->head, &conn->resend_queue);
}
insert_sorted_msend(conn, msend);
if (id_ret)
*id_ret = le64_to_cpu(msend->nh.id);
@@ -533,7 +459,7 @@ static int process_response(struct scoutfs_net_connection *conn,
if (msend) {
resp_func = msend->resp_func;
resp_data = msend->resp_data;
queue_dead_free(conn, msend);
complete_send(conn, msend);
} else {
scoutfs_inc_counter(sb, net_dropped_response);
}
@@ -624,21 +550,43 @@ static void queue_ordered_proc(struct scoutfs_net_connection *conn, struct messa
* Free live responses up to and including the seq by marking them dead
* and moving them to the send queue to be freed.
*/
static void free_acked_responses(struct scoutfs_net_connection *conn, u64 seq)
static bool move_acked_responses(struct scoutfs_net_connection *conn,
struct list_head *list, u64 seq)
{
struct message_send *msend;
struct message_send *tmp;
bool moved = false;
assert_spin_locked(&conn->lock);
list_for_each_entry_safe(msend, tmp, list, head) {
if (le64_to_cpu(msend->nh.seq) > seq)
break;
if (!nh_is_response(&msend->nh) || msend->dead)
continue;
msend->dead = 1;
list_move(&msend->head, &conn->send_queue);
moved = true;
}
return moved;
}
/* acks are processed inline in the recv worker */
static void free_acked_responses(struct scoutfs_net_connection *conn, u64 seq)
{
bool moved;
spin_lock(&conn->lock);
for_each_sorted_msend(msend, tmp, &conn->resp_root, 0) {
if (le64_to_cpu(msend->nh.seq) > seq)
break;
queue_dead_free(conn, msend);
}
moved = move_acked_responses(conn, &conn->send_queue, seq) |
move_acked_responses(conn, &conn->resend_queue, seq);
spin_unlock(&conn->lock);
if (moved)
queue_work(conn->workq, &conn->send_work);
}
static int k_recvmsg(struct socket *sock, void *buf, unsigned len)
@@ -876,11 +824,9 @@ static int k_sendmsg_full(struct socket *sock, struct kvec *kv, unsigned long nr
return ret;
}
static void free_msend(struct net_info *ninf, struct scoutfs_net_connection *conn,
struct message_send *msend)
static void free_msend(struct net_info *ninf, struct message_send *msend)
{
list_del_init(&msend->head);
erase_sorted_msend(conn, msend);
scoutfs_tseq_del(&ninf->msg_tseq_tree, &msend->tseq_entry);
kfree(msend);
}
@@ -920,10 +866,9 @@ static void scoutfs_net_send_worker(struct work_struct *work)
count = 0;
spin_lock(&conn->lock);
list_for_each_entry_safe(msend, _msend_, &conn->send_queue, head) {
if (msend->dead) {
free_msend(ninf, conn, msend);
free_msend(ninf, msend);
continue;
}
@@ -1012,7 +957,7 @@ static void scoutfs_net_destroy_worker(struct work_struct *work)
list_splice_init(&conn->resend_queue, &conn->send_queue);
list_for_each_entry_safe(msend, tmp, &conn->send_queue, head)
free_msend(ninf, conn, msend);
free_msend(ninf, msend);
/* accepted sockets are removed from their listener's list */
if (conn->listening_conn) {
@@ -1160,15 +1105,9 @@ static void scoutfs_net_listen_worker(struct work_struct *work)
conn->notify_down,
conn->info_size,
conn->req_funcs, "accepted");
/*
* scoutfs_net_alloc_conn() can fail due to ENOMEM. If this
* is the only thing that does so, there's no harm in trying
* to see if kernel_accept() can get enough memory to try accepting
* a new connection again. If that then fails with ENOMEM, it'll
* shut down the conn anyway. So just retry here.
*/
if (!acc_conn) {
sock_release(acc_sock);
ret = -ENOMEM;
continue;
}
@@ -1358,7 +1297,7 @@ static void scoutfs_net_shutdown_worker(struct work_struct *work)
struct message_send, head))) {
resp_func = msend->resp_func;
resp_data = msend->resp_data;
free_msend(ninf, conn, msend);
free_msend(ninf, msend);
spin_unlock(&conn->lock);
call_resp_func(sb, conn, resp_func, resp_data, NULL, 0, -ECONNABORTED);
@@ -1374,7 +1313,7 @@ static void scoutfs_net_shutdown_worker(struct work_struct *work)
list_splice_tail_init(&conn->send_queue, &conn->resend_queue);
list_for_each_entry_safe(msend, tmp, &conn->resend_queue, head) {
if (msend->nh.cmd == SCOUTFS_NET_CMD_GREETING)
free_msend(ninf, conn, msend);
free_msend(ninf, msend);
}
clear_conn_fl(conn, saw_greeting);
@@ -1548,8 +1487,6 @@ scoutfs_net_alloc_conn(struct super_block *sb,
atomic64_set(&conn->recv_seq, 0);
INIT_LIST_HEAD(&conn->send_queue);
INIT_LIST_HEAD(&conn->resend_queue);
conn->req_root = RB_ROOT;
conn->resp_root = RB_ROOT;
INIT_WORK(&conn->listen_work, scoutfs_net_listen_worker);
INIT_WORK(&conn->connect_work, scoutfs_net_connect_worker);
INIT_WORK(&conn->send_work, scoutfs_net_send_worker);
@@ -1762,7 +1699,7 @@ void scoutfs_net_client_greeting(struct super_block *sb,
atomic64_set(&conn->recv_seq, 0);
list_for_each_entry_safe(msend, tmp, &conn->resend_queue, head){
if (nh_is_response(&msend->nh))
free_msend(ninf, conn, msend);
free_msend(ninf, msend);
}
}
@@ -1865,8 +1802,6 @@ restart:
BUG_ON(!list_empty(&reconn->send_queue));
/* queued greeting response is racing, can be in send or resend queue */
list_splice_tail_init(&reconn->resend_queue, &conn->resend_queue);
move_sorted_msends(conn, &conn->req_root, reconn, &reconn->req_root);
move_sorted_msends(conn, &conn->resp_root, reconn, &reconn->resp_root);
/* new conn info is unused, swap, old won't call down */
swap(conn->info, reconn->info);

View File

@@ -67,8 +67,6 @@ struct scoutfs_net_connection {
u64 next_send_id;
struct list_head send_queue;
struct list_head resend_queue;
struct rb_root req_root;
struct rb_root resp_root;
atomic64_t recv_seq;
unsigned int ordered_proc_nr;

View File

@@ -592,7 +592,7 @@ static int handle_request(struct super_block *sb, struct omap_request *req)
ret = 0;
out:
free_rids(&priv_rids);
if ((ret < 0) && (req != NULL)) {
if (ret < 0) {
ret = scoutfs_server_send_omap_response(sb, req->client_rid, req->client_id,
NULL, ret);
free_req(req);

View File

@@ -33,8 +33,6 @@ enum {
Opt_acl,
Opt_data_prealloc_blocks,
Opt_data_prealloc_contig_only,
Opt_ino_alloc_per_lock,
Opt_lock_idle_count,
Opt_log_merge_wait_timeout_ms,
Opt_metadev_path,
Opt_noacl,
@@ -49,8 +47,6 @@ static const match_table_t tokens = {
{Opt_acl, "acl"},
{Opt_data_prealloc_blocks, "data_prealloc_blocks=%s"},
{Opt_data_prealloc_contig_only, "data_prealloc_contig_only=%s"},
{Opt_ino_alloc_per_lock, "ino_alloc_per_lock=%s"},
{Opt_lock_idle_count, "lock_idle_count=%s"},
{Opt_log_merge_wait_timeout_ms, "log_merge_wait_timeout_ms=%s"},
{Opt_metadev_path, "metadev_path=%s"},
{Opt_noacl, "noacl"},
@@ -121,10 +117,6 @@ static void free_options(struct scoutfs_mount_options *opts)
kfree(opts->metadev_path);
}
#define MIN_LOCK_IDLE_COUNT 32
#define DEFAULT_LOCK_IDLE_COUNT (10 * 1000)
#define MAX_LOCK_IDLE_COUNT (100 * 1000)
#define MIN_LOG_MERGE_WAIT_TIMEOUT_MS 100UL
#define DEFAULT_LOG_MERGE_WAIT_TIMEOUT_MS 500
#define MAX_LOG_MERGE_WAIT_TIMEOUT_MS (60 * MSEC_PER_SEC)
@@ -136,7 +128,7 @@ static void free_options(struct scoutfs_mount_options *opts)
#define MIN_DATA_PREALLOC_BLOCKS 1ULL
#define MAX_DATA_PREALLOC_BLOCKS ((unsigned long long)SCOUTFS_BLOCK_SM_MAX)
#define DEFAULT_TCP_KEEPALIVE_TIMEOUT_MS (60 * MSEC_PER_SEC)
#define DEFAULT_TCP_KEEPALIVE_TIMEOUT_MS (10 * MSEC_PER_SEC)
static void init_default_options(struct scoutfs_mount_options *opts)
{
@@ -144,8 +136,6 @@ static void init_default_options(struct scoutfs_mount_options *opts)
opts->data_prealloc_blocks = SCOUTFS_DATA_PREALLOC_DEFAULT_BLOCKS;
opts->data_prealloc_contig_only = 1;
opts->ino_alloc_per_lock = SCOUTFS_LOCK_INODE_GROUP_NR;
opts->lock_idle_count = DEFAULT_LOCK_IDLE_COUNT;
opts->log_merge_wait_timeout_ms = DEFAULT_LOG_MERGE_WAIT_TIMEOUT_MS;
opts->orphan_scan_delay_ms = -1;
opts->quorum_heartbeat_timeout_ms = SCOUTFS_QUORUM_DEF_HB_TIMEO_MS;
@@ -153,21 +143,6 @@ static void init_default_options(struct scoutfs_mount_options *opts)
opts->tcp_keepalive_timeout_ms = DEFAULT_TCP_KEEPALIVE_TIMEOUT_MS;
}
static int verify_lock_idle_count(struct super_block *sb, int ret, int val)
{
if (ret < 0) {
scoutfs_err(sb, "failed to parse lock_idle_count value");
return -EINVAL;
}
if (val < MIN_LOCK_IDLE_COUNT || val > MAX_LOCK_IDLE_COUNT) {
scoutfs_err(sb, "invalid lock_idle_count value %d, must be between %u and %u",
val, MIN_LOCK_IDLE_COUNT, MAX_LOCK_IDLE_COUNT);
return -EINVAL;
}
return 0;
}
static int verify_log_merge_wait_timeout_ms(struct super_block *sb, int ret, int val)
{
if (ret < 0) {
@@ -263,18 +238,6 @@ static int parse_options(struct super_block *sb, char *options, struct scoutfs_m
opts->data_prealloc_contig_only = nr;
break;
case Opt_ino_alloc_per_lock:
ret = match_int(args, &nr);
if (ret < 0 || nr < 1 || nr > SCOUTFS_LOCK_INODE_GROUP_NR) {
scoutfs_err(sb, "invalid ino_alloc_per_lock option, must be between 1 and %u",
SCOUTFS_LOCK_INODE_GROUP_NR);
if (ret == 0)
ret = -EINVAL;
return ret;
}
opts->ino_alloc_per_lock = nr;
break;
case Opt_tcp_keepalive_timeout_ms:
ret = match_int(args, &nr);
ret = verify_tcp_keepalive_timeout_ms(sb, ret, nr);
@@ -283,14 +246,6 @@ static int parse_options(struct super_block *sb, char *options, struct scoutfs_m
opts->tcp_keepalive_timeout_ms = nr;
break;
case Opt_lock_idle_count:
ret = match_int(args, &nr);
ret = verify_lock_idle_count(sb, ret, nr);
if (ret < 0)
return ret;
opts->lock_idle_count = nr;
break;
case Opt_log_merge_wait_timeout_ms:
ret = match_int(args, &nr);
ret = verify_log_merge_wait_timeout_ms(sb, ret, nr);
@@ -438,7 +393,6 @@ int scoutfs_options_show(struct seq_file *seq, struct dentry *root)
seq_puts(seq, ",acl");
seq_printf(seq, ",data_prealloc_blocks=%llu", opts.data_prealloc_blocks);
seq_printf(seq, ",data_prealloc_contig_only=%u", opts.data_prealloc_contig_only);
seq_printf(seq, ",ino_alloc_per_lock=%u", opts.ino_alloc_per_lock);
seq_printf(seq, ",metadev_path=%s", opts.metadev_path);
if (!is_acl)
seq_puts(seq, ",noacl");
@@ -527,82 +481,6 @@ static ssize_t data_prealloc_contig_only_store(struct kobject *kobj, struct kobj
}
SCOUTFS_ATTR_RW(data_prealloc_contig_only);
static ssize_t ino_alloc_per_lock_show(struct kobject *kobj, struct kobj_attribute *attr,
char *buf)
{
struct super_block *sb = SCOUTFS_SYSFS_ATTRS_SB(kobj);
struct scoutfs_mount_options opts;
scoutfs_options_read(sb, &opts);
return snprintf(buf, PAGE_SIZE, "%u", opts.ino_alloc_per_lock);
}
static ssize_t ino_alloc_per_lock_store(struct kobject *kobj, struct kobj_attribute *attr,
const char *buf, size_t count)
{
struct super_block *sb = SCOUTFS_SYSFS_ATTRS_SB(kobj);
DECLARE_OPTIONS_INFO(sb, optinf);
char nullterm[20]; /* more than enough for octal -U32_MAX */
long val;
int len;
int ret;
len = min(count, sizeof(nullterm) - 1);
memcpy(nullterm, buf, len);
nullterm[len] = '\0';
ret = kstrtol(nullterm, 0, &val);
if (ret < 0 || val < 1 || val > SCOUTFS_LOCK_INODE_GROUP_NR) {
scoutfs_err(sb, "invalid ino_alloc_per_lock option, must be between 1 and %u",
SCOUTFS_LOCK_INODE_GROUP_NR);
return -EINVAL;
}
write_seqlock(&optinf->seqlock);
optinf->opts.ino_alloc_per_lock = val;
write_sequnlock(&optinf->seqlock);
return count;
}
SCOUTFS_ATTR_RW(ino_alloc_per_lock);
static ssize_t lock_idle_count_show(struct kobject *kobj, struct kobj_attribute *attr,
char *buf)
{
struct super_block *sb = SCOUTFS_SYSFS_ATTRS_SB(kobj);
struct scoutfs_mount_options opts;
scoutfs_options_read(sb, &opts);
return snprintf(buf, PAGE_SIZE, "%u", opts.lock_idle_count);
}
static ssize_t lock_idle_count_store(struct kobject *kobj, struct kobj_attribute *attr,
const char *buf, size_t count)
{
struct super_block *sb = SCOUTFS_SYSFS_ATTRS_SB(kobj);
DECLARE_OPTIONS_INFO(sb, optinf);
char nullterm[30]; /* more than enough for octal -U64_MAX */
int val;
int len;
int ret;
len = min(count, sizeof(nullterm) - 1);
memcpy(nullterm, buf, len);
nullterm[len] = '\0';
ret = kstrtoint(nullterm, 0, &val);
ret = verify_lock_idle_count(sb, ret, val);
if (ret == 0) {
write_seqlock(&optinf->seqlock);
optinf->opts.lock_idle_count = val;
write_sequnlock(&optinf->seqlock);
ret = count;
}
return ret;
}
SCOUTFS_ATTR_RW(lock_idle_count);
static ssize_t log_merge_wait_timeout_ms_show(struct kobject *kobj, struct kobj_attribute *attr,
char *buf)
{
@@ -743,8 +621,6 @@ SCOUTFS_ATTR_RO(quorum_slot_nr);
static struct attribute *options_attrs[] = {
SCOUTFS_ATTR_PTR(data_prealloc_blocks),
SCOUTFS_ATTR_PTR(data_prealloc_contig_only),
SCOUTFS_ATTR_PTR(ino_alloc_per_lock),
SCOUTFS_ATTR_PTR(lock_idle_count),
SCOUTFS_ATTR_PTR(log_merge_wait_timeout_ms),
SCOUTFS_ATTR_PTR(metadev_path),
SCOUTFS_ATTR_PTR(orphan_scan_delay_ms),

View File

@@ -8,8 +8,6 @@
struct scoutfs_mount_options {
u64 data_prealloc_blocks;
bool data_prealloc_contig_only;
unsigned int ino_alloc_per_lock;
int lock_idle_count;
unsigned int log_merge_wait_timeout_ms;
char *metadev_path;
unsigned int orphan_scan_delay_ms;

View File

@@ -507,10 +507,10 @@ static int update_quorum_block(struct super_block *sb, int event, u64 term, bool
set_quorum_block_event(sb, &blk, event, term);
ret = write_quorum_block(sb, blkno, &blk);
if (ret < 0)
scoutfs_err(sb, "error %d writing quorum block %llu after updating event %d term %llu",
scoutfs_err(sb, "error %d reading quorum block %llu to update event %d term %llu",
ret, blkno, event, term);
} else {
scoutfs_err(sb, "error %d reading quorum block %llu to update event %d term %llu",
scoutfs_err(sb, "error %d writing quorum block %llu after updating event %d term %llu",
ret, blkno, event, term);
}
@@ -713,6 +713,8 @@ static void scoutfs_quorum_worker(struct work_struct *work)
struct quorum_status qst = {0,};
struct hb_recording hbr;
bool record_hb;
bool recv_failed;
bool initializing = true;
int ret;
int err;
@@ -745,6 +747,8 @@ static void scoutfs_quorum_worker(struct work_struct *work)
update_show_status(qinf, &qst);
recv_failed = false;
ret = recv_msg(sb, &msg, qst.timeout);
if (ret < 0) {
if (ret != -ETIMEDOUT && ret != -EAGAIN) {
@@ -752,6 +756,9 @@ static void scoutfs_quorum_worker(struct work_struct *work)
scoutfs_inc_counter(sb, quorum_recv_error);
goto out;
}
recv_failed = true;
msg.type = SCOUTFS_QUORUM_MSG_INVALID;
ret = 0;
}
@@ -809,13 +816,13 @@ static void scoutfs_quorum_worker(struct work_struct *work)
/* followers and candidates start new election on timeout */
if (qst.role != LEADER &&
msg.type == SCOUTFS_QUORUM_MSG_INVALID &&
(initializing || recv_failed) &&
ktime_after(ktime_get(), qst.timeout)) {
/* .. but only if their server has stopped */
if (!scoutfs_server_is_down(sb)) {
qst.timeout = election_timeout();
scoutfs_inc_counter(sb, quorum_candidate_server_stopping);
continue;
goto again;
}
qst.role = CANDIDATE;
@@ -952,6 +959,9 @@ static void scoutfs_quorum_worker(struct work_struct *work)
}
record_hb_delay(sb, qinf, &hbr, record_hb, qst.role);
again:
initializing = false;
}
update_show_status(qinf, &qst);
@@ -970,10 +980,7 @@ static void scoutfs_quorum_worker(struct work_struct *work)
}
/* record that this slot no longer has an active quorum */
err = update_quorum_block(sb, SCOUTFS_QUORUM_EVENT_END, qst.term, true);
if (err < 0 && ret == 0)
ret = err;
update_quorum_block(sb, SCOUTFS_QUORUM_EVENT_END, qst.term, true);
out:
if (ret < 0) {
scoutfs_err(sb, "quorum service saw error %d, shutting down. This mount is no longer participating in quorum. It should be remounted to restore service.",
@@ -1062,7 +1069,7 @@ static char *role_str(int role)
[LEADER] = "leader",
};
if (role < 0 || role >= ARRAY_SIZE(roles) || !roles[role])
if (role < 0 || role > ARRAY_SIZE(roles) || !roles[role])
return "invalid";
return roles[role];
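The hunk above settles on a ">=" bound when indexing the role name table. As a minimal illustration, here is a standalone sketch of that bounds check against a hypothetical three-entry table; the names and the ARRAY_SIZE macro here are stand-ins, not the kernel definitions.

#include <stdio.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* hypothetical stand-in for the quorum role name table */
static const char *roles[] = { "follower", "candidate", "leader" };

static const char *role_name(int role)
{
	/*
	 * ARRAY_SIZE(roles) is 3, so the valid indices are 0..2.  Checking
	 * "role > ARRAY_SIZE(roles)" would still accept role == 3 and read
	 * one element past the end of the table.
	 */
	if (role < 0 || role >= (int)ARRAY_SIZE(roles) || !roles[role])
		return "invalid";

	return roles[role];
}

int main(void)
{
	printf("%s %s\n", role_name(2), role_name(3));	/* leader invalid */
	return 0;
}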

View File

@@ -789,80 +789,6 @@ TRACE_EVENT(scoutfs_inode_walk_writeback,
__entry->ino, __entry->write, __entry->ret)
);
TRACE_EVENT(scoutfs_orphan_scan_start,
TP_PROTO(struct super_block *sb),
TP_ARGS(sb),
TP_STRUCT__entry(
SCSB_TRACE_FIELDS
),
TP_fast_assign(
SCSB_TRACE_ASSIGN(sb);
),
TP_printk(SCSBF, SCSB_TRACE_ARGS)
);
TRACE_EVENT(scoutfs_orphan_scan_stop,
TP_PROTO(struct super_block *sb, bool work_todo),
TP_ARGS(sb, work_todo),
TP_STRUCT__entry(
SCSB_TRACE_FIELDS
__field(bool, work_todo)
),
TP_fast_assign(
SCSB_TRACE_ASSIGN(sb);
__entry->work_todo = work_todo;
),
TP_printk(SCSBF" work_todo %d", SCSB_TRACE_ARGS, __entry->work_todo)
);
TRACE_EVENT(scoutfs_orphan_scan_work,
TP_PROTO(struct super_block *sb, __u64 ino),
TP_ARGS(sb, ino),
TP_STRUCT__entry(
SCSB_TRACE_FIELDS
__field(__u64, ino)
),
TP_fast_assign(
SCSB_TRACE_ASSIGN(sb);
__entry->ino = ino;
),
TP_printk(SCSBF" ino %llu", SCSB_TRACE_ARGS,
__entry->ino)
);
TRACE_EVENT(scoutfs_orphan_scan_end,
TP_PROTO(struct super_block *sb, __u64 ino, int ret),
TP_ARGS(sb, ino, ret),
TP_STRUCT__entry(
SCSB_TRACE_FIELDS
__field(__u64, ino)
__field(int, ret)
),
TP_fast_assign(
SCSB_TRACE_ASSIGN(sb);
__entry->ino = ino;
__entry->ret = ret;
),
TP_printk(SCSBF" ino %llu ret %d", SCSB_TRACE_ARGS,
__entry->ino, __entry->ret)
);
DECLARE_EVENT_CLASS(scoutfs_lock_info_class,
TP_PROTO(struct super_block *sb, struct lock_info *linfo),
@@ -1110,82 +1036,6 @@ TRACE_EVENT(scoutfs_orphan_inode,
MINOR(__entry->dev), __entry->ino)
);
DECLARE_EVENT_CLASS(scoutfs_try_delete_class,
TP_PROTO(struct super_block *sb, u64 ino),
TP_ARGS(sb, ino),
TP_STRUCT__entry(
SCSB_TRACE_FIELDS
__field(__u64, ino)
),
TP_fast_assign(
SCSB_TRACE_ASSIGN(sb);
__entry->ino = ino;
),
TP_printk(SCSBF" ino %llu", SCSB_TRACE_ARGS, __entry->ino)
);
DEFINE_EVENT(scoutfs_try_delete_class, scoutfs_try_delete,
TP_PROTO(struct super_block *sb, u64 ino),
TP_ARGS(sb, ino)
);
DEFINE_EVENT(scoutfs_try_delete_class, scoutfs_try_delete_local_busy,
TP_PROTO(struct super_block *sb, u64 ino),
TP_ARGS(sb, ino)
);
DEFINE_EVENT(scoutfs_try_delete_class, scoutfs_try_delete_cached,
TP_PROTO(struct super_block *sb, u64 ino),
TP_ARGS(sb, ino)
);
DEFINE_EVENT(scoutfs_try_delete_class, scoutfs_try_delete_no_item,
TP_PROTO(struct super_block *sb, u64 ino),
TP_ARGS(sb, ino)
);
TRACE_EVENT(scoutfs_try_delete_has_links,
TP_PROTO(struct super_block *sb, u64 ino, unsigned int nlink),
TP_ARGS(sb, ino, nlink),
TP_STRUCT__entry(
SCSB_TRACE_FIELDS
__field(__u64, ino)
__field(unsigned int, nlink)
),
TP_fast_assign(
SCSB_TRACE_ASSIGN(sb);
__entry->ino = ino;
__entry->nlink = nlink;
),
TP_printk(SCSBF" ino %llu nlink %u", SCSB_TRACE_ARGS, __entry->ino,
__entry->nlink)
);
TRACE_EVENT(scoutfs_inode_orphan_delete,
TP_PROTO(struct super_block *sb, u64 ino, int ret),
TP_ARGS(sb, ino, ret),
TP_STRUCT__entry(
SCSB_TRACE_FIELDS
__field(__u64, ino)
__field(int, ret)
),
TP_fast_assign(
SCSB_TRACE_ASSIGN(sb);
__entry->ino = ino;
__entry->ret = ret;
),
TP_printk(SCSBF" ino %llu ret %d", SCSB_TRACE_ARGS, __entry->ino,
__entry->ret)
);
TRACE_EVENT(scoutfs_delete_inode,
TP_PROTO(struct super_block *sb, u64 ino, umode_t mode, u64 size),
@@ -1210,32 +1060,6 @@ TRACE_EVENT(scoutfs_delete_inode,
__entry->mode, __entry->size)
);
TRACE_EVENT(scoutfs_delete_inode_end,
TP_PROTO(struct super_block *sb, u64 ino, umode_t mode, u64 size, int ret),
TP_ARGS(sb, ino, mode, size, ret),
TP_STRUCT__entry(
__field(dev_t, dev)
__field(__u64, ino)
__field(umode_t, mode)
__field(__u64, size)
__field(int, ret)
),
TP_fast_assign(
__entry->dev = sb->s_dev;
__entry->ino = ino;
__entry->mode = mode;
__entry->size = size;
__entry->ret = ret;
),
TP_printk("dev %d,%d ino %llu, mode 0x%x size %llu, ret %d",
MAJOR(__entry->dev), MINOR(__entry->dev), __entry->ino,
__entry->mode, __entry->size, __entry->ret)
);
DECLARE_EVENT_CLASS(scoutfs_key_class,
TP_PROTO(struct super_block *sb, struct scoutfs_key *key),
TP_ARGS(sb, key),
@@ -1619,6 +1443,28 @@ DEFINE_EVENT(scoutfs_work_class, scoutfs_data_return_server_extents_exit,
TP_ARGS(sb, data, ret)
);
DECLARE_EVENT_CLASS(scoutfs_shrink_exit_class,
TP_PROTO(struct super_block *sb, unsigned long nr_to_scan, int ret),
TP_ARGS(sb, nr_to_scan, ret),
TP_STRUCT__entry(
__field(void *, sb)
__field(unsigned long, nr_to_scan)
__field(int, ret)
),
TP_fast_assign(
__entry->sb = sb;
__entry->nr_to_scan = nr_to_scan;
__entry->ret = ret;
),
TP_printk("sb %p nr_to_scan %lu ret %d",
__entry->sb, __entry->nr_to_scan, __entry->ret)
);
DEFINE_EVENT(scoutfs_shrink_exit_class, scoutfs_lock_shrink_exit,
TP_PROTO(struct super_block *sb, unsigned long nr_to_scan, int ret),
TP_ARGS(sb, nr_to_scan, ret)
);
TRACE_EVENT(scoutfs_rename,
TP_PROTO(struct super_block *sb, struct inode *old_dir,
struct dentry *old_dentry, struct inode *new_dir,
@@ -3251,24 +3097,6 @@ TRACE_EVENT(scoutfs_ioc_search_xattrs,
__entry->ino, __entry->last_ino)
);
TRACE_EVENT(scoutfs_trigger_fired,
TP_PROTO(struct super_block *sb, const char *name),
TP_ARGS(sb, name),
TP_STRUCT__entry(
SCSB_TRACE_FIELDS
__field(const char *, name)
),
TP_fast_assign(
SCSB_TRACE_ASSIGN(sb);
__entry->name = name;
),
TP_printk(SCSBF" %s", SCSB_TRACE_ARGS, __entry->name)
);
#endif /* _TRACE_SCOUTFS_H */
/* This part must be outside protection */

View File

@@ -41,7 +41,6 @@
#include "recov.h"
#include "omap.h"
#include "fence.h"
#include "triggers.h"
/*
* Every active mount can act as the server that listens on a net
@@ -995,11 +994,10 @@ static int for_each_rid_last_lt(struct super_block *sb, struct scoutfs_btree_roo
}
/*
* Log merge range items are stored at the starting fs key of the range
* with the zone overwritten to indicate the log merge item type. This
* day0 mistake loses sorting information for items in the different
* zones in the fs root, so the range items aren't strictly sorted by
* the starting key of their range.
* Log merge range items are stored at the starting fs key of the range.
* The only fs key field that doesn't hold information is the zone, so
* we use the zone to differentiate all types that we store in the log
* merge tree.
*/
static void init_log_merge_key(struct scoutfs_key *key, u8 zone, u64 first,
u64 second)
@@ -1031,51 +1029,6 @@ static int next_log_merge_item_key(struct super_block *sb, struct scoutfs_btree_
return ret;
}
/*
* The range items aren't sorted by their range.start because
* _RANGE_ZONE clobbers the range's zone. We sweep all the items and
* find the range with the next least starting key that's greater than
* the caller's starting key. We have to be careful to iterate over the
* log_merge tree keys because the ranges can overlap as they're mapped
* to the log_merge keys by clobbering their zone.
*/
static int next_log_merge_range(struct super_block *sb, struct scoutfs_btree_root *root,
struct scoutfs_key *start, struct scoutfs_log_merge_range *rng)
{
struct scoutfs_log_merge_range *next;
SCOUTFS_BTREE_ITEM_REF(iref);
struct scoutfs_key key;
int ret;
key = *start;
key.sk_zone = SCOUTFS_LOG_MERGE_RANGE_ZONE;
scoutfs_key_set_ones(&rng->start);
do {
ret = scoutfs_btree_next(sb, root, &key, &iref);
if (ret == 0) {
if (iref.key->sk_zone != SCOUTFS_LOG_MERGE_RANGE_ZONE) {
ret = -ENOENT;
} else if (iref.val_len != sizeof(struct scoutfs_log_merge_range)) {
ret = -EIO;
} else {
next = iref.val;
if (scoutfs_key_compare(&next->start, &rng->start) < 0 &&
scoutfs_key_compare(&next->start, start) >= 0)
*rng = *next;
key = *iref.key;
scoutfs_key_inc(&key);
}
scoutfs_btree_put_iref(&iref);
}
} while (ret == 0);
if (ret == -ENOENT && !scoutfs_key_is_ones(&rng->start))
ret = 0;
return ret;
}
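The helper above sweeps every range item because the stored keys reuse a range's starting key with its zone rewritten, so btree order no longer tracks the true starting keys. A minimal sketch of that reordering, assuming a hypothetical two-field key rather than the real scoutfs_key:

#include <stdio.h>

/* hypothetical, heavily simplified stand-in for a scoutfs key */
struct tiny_key {
	unsigned char zone;
	unsigned long long first;
};

#define RANGE_ZONE 200

static int tiny_key_cmp(const struct tiny_key *a, const struct tiny_key *b)
{
	if (a->zone != b->zone)
		return a->zone < b->zone ? -1 : 1;
	if (a->first != b->first)
		return a->first < b->first ? -1 : 1;
	return 0;
}

int main(void)
{
	/* true starting keys of two ranges that live in different fs zones */
	struct tiny_key start_a = { .zone = 1, .first = 100 };
	struct tiny_key start_b = { .zone = 2, .first = 5 };

	/* the stored range items reuse the start key but clobber the zone */
	struct tiny_key item_a = start_a;
	struct tiny_key item_b = start_b;

	item_a.zone = RANGE_ZONE;
	item_b.zone = RANGE_ZONE;

	/* -1: by true start key, A sorts before B (zone 1 < zone 2) */
	printf("start order %d\n", tiny_key_cmp(&start_a, &start_b));
	/* 1: the stored items sort the other way (first 100 > 5) */
	printf("item order %d\n", tiny_key_cmp(&item_a, &item_b));
	return 0;
}

Because both stored items share the range zone, their order is decided by the remaining fields alone, which is why a single _next lookup can't be trusted to return the range with the least true starting key.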
static int next_log_merge_item(struct super_block *sb,
struct scoutfs_btree_root *root,
u8 zone, u64 first, u64 second,
@@ -1292,13 +1245,9 @@ static int finalize_and_start_log_merge(struct super_block *sb, struct scoutfs_l
* meta was low so that deleted items are merged
* promptly and freed blocks can bring the client out of
* enospc.
*
* The trigger can be used to force a log merge in cases where
* a test only generates small amounts of change.
*/
finalize_ours = (lt->item_root.height > 2) ||
(le32_to_cpu(lt->meta_avail.flags) & SCOUTFS_ALLOC_FLAG_LOW) ||
scoutfs_trigger(sb, LOG_MERGE_FORCE_FINALIZE_OURS);
(le32_to_cpu(lt->meta_avail.flags) & SCOUTFS_ALLOC_FLAG_LOW);
trace_scoutfs_server_finalize_decision(sb, rid, saw_finalized, others_active,
ours_visible, finalize_ours, delay_ms,
@@ -1407,8 +1356,6 @@ static int finalize_and_start_log_merge(struct super_block *sb, struct scoutfs_l
BUG_ON(err); /* inconsistent */
}
scoutfs_inc_counter(sb, log_merge_start);
/* we're done, caller can make forward progress */
break;
}
@@ -1625,8 +1572,7 @@ static int server_get_log_trees(struct super_block *sb,
goto update;
}
ret = alloc_move_empty(sb, &super->data_alloc, &lt.data_freed,
COMMIT_HOLD_ALLOC_BUDGET / 2);
ret = alloc_move_empty(sb, &super->data_alloc, &lt.data_freed, 100);
if (ret == -EINPROGRESS)
ret = 0;
if (ret < 0) {
@@ -1736,7 +1682,6 @@ static int server_commit_log_trees(struct super_block *sb,
int ret;
if (arg_len != sizeof(struct scoutfs_log_trees)) {
err_str = "invalid message log_trees size";
ret = -EINVAL;
goto out;
}
@@ -1800,7 +1745,7 @@ static int server_commit_log_trees(struct super_block *sb,
ret = scoutfs_btree_update(sb, &server->alloc, &server->wri,
&super->logs_root, &key, &lt, sizeof(lt));
BUG_ON(ret < 0); /* dirtying should have guaranteed success, srch item inconsistent */
BUG_ON(ret < 0); /* dirtying should have guaranteed success */
if (ret < 0)
err_str = "updating log trees item";
@@ -1808,10 +1753,11 @@ unlock:
mutex_unlock(&server->logs_mutex);
ret = server_apply_commit(sb, &hold, ret);
out:
if (ret < 0)
scoutfs_err(sb, "server error %d committing client logs for rid %016llx, nr %llu: %s",
ret, rid, le64_to_cpu(lt.nr), err_str);
out:
WARN_ON_ONCE(ret < 0);
return scoutfs_net_response(sb, conn, cmd, id, ret, NULL, 0);
}
@@ -1921,11 +1867,9 @@ static int reclaim_open_log_tree(struct super_block *sb, u64 rid)
scoutfs_alloc_splice_list(sb, &server->alloc, &server->wri, server->other_freed,
&lt.meta_avail)) ?:
(err_str = "empty data_avail",
alloc_move_empty(sb, &super->data_alloc, &lt.data_avail,
COMMIT_HOLD_ALLOC_BUDGET / 2)) ?:
alloc_move_empty(sb, &super->data_alloc, &lt.data_avail, 100)) ?:
(err_str = "empty data_freed",
alloc_move_empty(sb, &super->data_alloc, &lt.data_freed,
COMMIT_HOLD_ALLOC_BUDGET / 2));
alloc_move_empty(sb, &super->data_alloc, &lt.data_freed, 100));
mutex_unlock(&server->alloc_mutex);
/* only finalize, allowing merging, once the allocators are fully freed */
@@ -2150,7 +2094,7 @@ static int server_srch_get_compact(struct super_block *sb,
apply:
ret = server_apply_commit(sb, &hold, ret);
WARN_ON_ONCE(ret < 0 && ret != -ENOENT && ret != -ENOLINK); /* XXX leaked busy item */
WARN_ON_ONCE(ret < 0 && ret != -ENOENT); /* XXX leaked busy item */
out:
ret = scoutfs_net_response(sb, conn, cmd, id, ret,
sc, sizeof(struct scoutfs_srch_compact));
@@ -2190,7 +2134,7 @@ static int server_srch_commit_compact(struct super_block *sb,
&super->srch_root, rid, sc,
&av, &fr);
mutex_unlock(&server->srch_mutex);
if (ret < 0)
if (ret < 0) /* XXX very bad, leaks allocators */
goto apply;
/* reclaim allocators if they were set by _srch_commit_ */
@@ -2200,10 +2144,10 @@ static int server_srch_commit_compact(struct super_block *sb,
scoutfs_alloc_splice_list(sb, &server->alloc, &server->wri,
server->other_freed, &fr);
mutex_unlock(&server->alloc_mutex);
WARN_ON(ret < 0); /* XXX leaks allocators */
apply:
ret = server_apply_commit(sb, &hold, ret);
out:
WARN_ON(ret < 0); /* XXX leaks allocators */
return scoutfs_net_response(sb, conn, cmd, id, ret, NULL, 0);
}
@@ -2516,8 +2460,6 @@ static int splice_log_merge_completions(struct super_block *sb,
queue_work(server->wq, &server->log_merge_free_work);
else
err_str = "deleting merge status item";
scoutfs_inc_counter(sb, log_merge_complete);
out:
if (upd_stat) {
init_log_merge_key(&key, SCOUTFS_LOG_MERGE_STATUS_ZONE, 0, 0);
@@ -2530,9 +2472,10 @@ out:
}
}
/* inconsistent */
scoutfs_bug_on_err(sb, ret,
"server error %d splicing log merge completion: %s", ret, err_str);
if (ret < 0)
scoutfs_err(sb, "server error %d splicing log merge completion: %s", ret, err_str);
BUG_ON(ret); /* inconsistent */
return ret ?: einprogress;
}
@@ -2777,7 +2720,10 @@ restart:
/* find the next range, always checking for splicing */
for (;;) {
ret = next_log_merge_range(sb, &super->log_merge, &stat.next_range_key, &rng);
key = stat.next_range_key;
key.sk_zone = SCOUTFS_LOG_MERGE_RANGE_ZONE;
ret = next_log_merge_item_key(sb, &super->log_merge, SCOUTFS_LOG_MERGE_RANGE_ZONE,
&key, &rng, sizeof(rng));
if (ret < 0 && ret != -ENOENT) {
err_str = "finding merge range item";
goto out;
@@ -3048,13 +2994,7 @@ static int server_commit_log_merge(struct super_block *sb,
SCOUTFS_LOG_MERGE_STATUS_ZONE, 0, 0,
&stat, sizeof(stat));
if (ret < 0) {
/*
* During a retransmission, it's possible that the server
* already committed and resolved this log merge. ENOENT
* is expected in that case.
*/
if (ret != -ENOENT)
err_str = "getting merge status item";
err_str = "getting merge status item";
goto out;
}

View File

@@ -537,35 +537,23 @@ out:
* the pairs cancel each other out by all readers (the second encoding
* looks like deletion) so they aren't visible to the first/last bounds of
* the block or file.
*
* We use the same entry repeatedly, so the diff between them will be empty.
* This lets us just emit the two-byte count word, leaving the other bytes
* as zero.
*
* Split the desired total len into two pieces, adding any remainder to the
* first four-bit value.
*/
static void append_padded_entry(struct scoutfs_srch_file *sfl,
struct scoutfs_srch_block *srb,
int len)
static int append_padded_entry(struct scoutfs_srch_file *sfl, u64 blk,
struct scoutfs_srch_block *srb, struct scoutfs_srch_entry *sre)
{
int each;
int rem;
u16 lengths = 0;
u8 *buf = srb->entries + le32_to_cpu(srb->entry_bytes);
int ret;
each = (len - 2) >> 1;
rem = (len - 2) & 1;
ret = encode_entry(srb->entries + le32_to_cpu(srb->entry_bytes),
sre, &srb->tail);
if (ret > 0) {
srb->tail = *sre;
le32_add_cpu(&srb->entry_nr, 1);
le32_add_cpu(&srb->entry_bytes, ret);
le64_add_cpu(&sfl->entries, 1);
ret = 0;
}
lengths |= each + rem;
lengths |= each << 4;
memset(buf, 0, len);
put_unaligned_le16(lengths, buf);
le32_add_cpu(&srb->entry_nr, 1);
le32_add_cpu(&srb->entry_bytes, len);
le64_add_cpu(&sfl->entries, 1);
return ret;
}
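The arithmetic described above can be checked in isolation. This is a user-space sketch of the length split only, with a hypothetical helper name; the real function also advances the block's entry counts and the file's entry total.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Build the two-byte lengths word of a padded entry pair of total size
 * "len": the len - 2 remaining bytes are split into two four-bit byte
 * counts, with any odd remainder folded into the low nibble.
 */
static uint16_t padded_lengths(int len)
{
	int each = (len - 2) >> 1;
	int rem = (len - 2) & 1;

	return (uint16_t)((each << 4) | (each + rem));
}

int main(void)
{
	int len;

	for (len = 9; len <= 10; len++) {
		uint16_t lengths = padded_lengths(len);
		int decoded = 2 + (lengths >> 4) + (lengths & 0xf);

		/* a reader walking the nibbles consumes exactly len bytes */
		assert(decoded == len);
		printf("len %d -> nibbles %d/%d\n", len,
		       lengths >> 4, lengths & 0xf);
	}

	return 0;
}

For len = 9 the nibbles come out 3 and 4, and for len = 10 they are 4 and 4, so a reader consumes exactly len bytes either way.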
/*
@@ -576,41 +564,61 @@ static void append_padded_entry(struct scoutfs_srch_file *sfl,
* This is called when there is a single existing entry in the block.
* We have the entire block to work with. We encode pairs of matching
* entries. This hides them from readers (both searches and merging) as
* they're interpreted as creation and deletion and are deleted.
* they're interpreted as creation and deletion and are deleted. We use
* the existing hash value of the first entry in the block but then set
* the inode to an impossibly large number so it doesn't interfere with
* anything.
*
* For simplicity and to maintain sort ordering within the block, we reuse
* the existing entry. This lets us skip the encoding step, because we know
* the diff will be zero. We can zero-pad the resulting entries to hit the
* target offset exactly.
* To hit the specific offset we very carefully manage the amount of
* bytes of change between fields in the entry. We know that if we
change all the bytes of the ino and id we end up with a 20 byte
* (2+8+8,2) encoding of the pair of entries. To have the last entry
* start at the _SAFE_POS offset we know that the final 20 byte pair
* encoding needs to end at 2 bytes (second entry encoding) after the
* _SAFE_POS offset.
*
* Because we can't predict the exact number of entry_bytes when we start,
* we adjust the byte count of subsequent entries until we wind up at a
* multiple of 20 bytes away from our goal and then use that length for
* the remaining entries.
*
* We could just use a single pair of unnaturally large entries to consume
* the needed space, adjusting for an odd number of entry_bytes if necessary.
* The use of 19 or 20 bytes for the entry pair matches what we would see with
* real (non-zero) entries that vary from the existing entry.
* So as we encode pairs we watch the delta of our current offset from
* that desired final offset of 2 past _SAFE_POS. If we're a multiple
* of 20 away then we encode the full 20 byte pairs. If we're not, then
* we drop a byte to encode 19 bytes. That'll slowly change the offset
* to be a multiple of 20 again while encoding large entries.
*/
static void pad_entries_at_safe(struct scoutfs_srch_file *sfl,
static void pad_entries_at_safe(struct scoutfs_srch_file *sfl, u64 blk,
struct scoutfs_srch_block *srb)
{
struct scoutfs_srch_entry sre;
u32 target;
s32 diff;
u64 hash;
u64 ino;
u64 id;
int ret;
hash = le64_to_cpu(srb->tail.hash);
ino = le64_to_cpu(srb->tail.ino) | (1ULL << 62);
id = le64_to_cpu(srb->tail.id);
target = SCOUTFS_SRCH_BLOCK_SAFE_BYTES + 2;
while ((diff = target - le32_to_cpu(srb->entry_bytes)) > 0) {
append_padded_entry(sfl, srb, 10);
ino ^= 1ULL << (7 * 8);
if (diff % 20 == 0) {
append_padded_entry(sfl, srb, 10);
id ^= 1ULL << (7 * 8);
} else {
append_padded_entry(sfl, srb, 9);
id ^= 1ULL << (6 * 8);
}
}
WARN_ON_ONCE(diff != 0);
sre.hash = cpu_to_le64(hash);
sre.ino = cpu_to_le64(ino);
sre.id = cpu_to_le64(id);
ret = append_padded_entry(sfl, blk, srb, &sre);
if (ret == 0)
ret = append_padded_entry(sfl, blk, srb, &sre);
BUG_ON(ret != 0);
diff = target - le32_to_cpu(srb->entry_bytes);
}
}
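The 19/20 byte bookkeeping above can also be simulated. This sketch assumes made-up starting and target offsets and presumes the gap is large enough for the 19-byte pairs to realign the remainder, which holds for a nearly empty block:

#include <assert.h>
#include <stdio.h>

int main(void)
{
	/* hypothetical values: a nearly empty block and its safe offset */
	unsigned int entry_bytes = 12;
	const unsigned int target = 30000 + 2;	/* stand-in for _SAFE_BYTES + 2 */
	unsigned int diff;

	while ((diff = target - entry_bytes) > 0) {
		if (diff % 20 == 0)
			entry_bytes += 20;	/* full 2+8+8,2 pair */
		else
			entry_bytes += 19;	/* drop one byte to realign */
	}

	/* the loop lands exactly on the target offset */
	assert(entry_bytes == target);
	printf("entry_bytes %u target %u\n", entry_bytes, target);
	return 0;
}

Each 19-byte pair moves the remaining distance by one modulo 20, so at most nineteen of them are needed before full 20-byte pairs finish the job exactly.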
/*
@@ -856,14 +864,14 @@ static int search_sorted_file(struct super_block *sb,
if (pos > SCOUTFS_SRCH_BLOCK_SAFE_BYTES) {
/* can only be inconsistency :/ */
ret = -EIO;
goto out;
break;
}
ret = decode_entry(srb->entries + pos, &sre, &prev);
if (ret <= 0) {
/* can only be inconsistency :/ */
ret = -EIO;
goto out;
break;
}
pos += ret;
prev = sre;
@@ -1406,7 +1414,7 @@ int scoutfs_srch_commit_compact(struct super_block *sb,
ret = -EIO;
scoutfs_btree_put_iref(&iref);
}
if (ret < 0)
if (ret < 0) /* XXX leaks allocators */
goto out;
/* restore busy to pending if the operation failed */
@@ -1426,8 +1434,10 @@ int scoutfs_srch_commit_compact(struct super_block *sb,
/* update file references if we finished compaction (!deleting) */
if (!(res->flags & SCOUTFS_SRCH_COMPACT_FLAG_DELETE)) {
ret = commit_files(sb, alloc, wri, root, res);
if (ret < 0)
if (ret < 0) {
/* XXX we can't commit, shutdown? */
goto out;
}
/* transition flags for deleting input files */
for (i = 0; i < res->nr; i++) {
@@ -1454,7 +1464,7 @@ update:
le64_to_cpu(pending->id), 0);
ret = scoutfs_btree_insert(sb, alloc, wri, root, &key,
pending, sizeof(*pending));
if (WARN_ON_ONCE(ret < 0)) /* XXX inconsistency */
if (ret < 0)
goto out;
}
@@ -1467,6 +1477,7 @@ update:
BUG_ON(err); /* both busy and pending present */
}
out:
WARN_ON_ONCE(ret < 0); /* XXX inconsistency */
kfree(busy);
return ret;
}
@@ -1664,7 +1675,7 @@ static int kway_merge(struct super_block *sb,
/* end sorted block on _SAFE offset for testing */
if (bl && le32_to_cpu(srb->entry_nr) == 1 && logs_input &&
scoutfs_trigger(sb, SRCH_COMPACT_LOGS_PAD_SAFE)) {
pad_entries_at_safe(sfl, srb);
pad_entries_at_safe(sfl, blk, srb);
scoutfs_block_put(sb, bl);
bl = NULL;
blk++;
@@ -1862,7 +1873,7 @@ static int compact_logs(struct super_block *sb,
if (pos > SCOUTFS_SRCH_BLOCK_SAFE_BYTES) {
/* can only be inconsistency :/ */
ret = -EIO;
goto out;
break;
}
ret = decode_entry(srb->entries + pos, sre, &prev);
@@ -2276,11 +2287,12 @@ static void scoutfs_srch_compact_worker(struct work_struct *work)
} else {
ret = -EINVAL;
}
if (ret < 0)
goto commit;
scoutfs_alloc_prepare_commit(sb, &alloc, &wri);
if (ret == 0)
ret = scoutfs_alloc_prepare_commit(sb, &alloc, &wri) ?:
scoutfs_block_writer_write(sb, &wri);
commit:
/* the server won't use our partial compact if _ERROR is set */
sc->meta_avail = alloc.avail;
sc->meta_freed = alloc.freed;
@@ -2297,7 +2309,7 @@ out:
scoutfs_inc_counter(sb, srch_compact_error);
scoutfs_block_writer_forget_all(sb, &wri);
queue_compact_work(srinf, sc != NULL && sc->nr > 0 && ret == 0);
queue_compact_work(srinf, sc->nr > 0 && ret == 0);
kfree(sc);
}

View File

@@ -512,9 +512,9 @@ static int scoutfs_fill_super(struct super_block *sb, void *data, int silent)
sbi = kzalloc(sizeof(struct scoutfs_sb_info), GFP_KERNEL);
sb->s_fs_info = sbi;
sbi->sb = sb;
if (!sbi)
return -ENOMEM;
sbi->sb = sb;
ret = assign_random_id(sbi);
if (ret < 0)

View File

@@ -18,7 +18,6 @@
#include "super.h"
#include "triggers.h"
#include "scoutfs_trace.h"
/*
* We have debugfs files we can write to which arm triggers which
@@ -40,7 +39,6 @@ struct scoutfs_triggers {
static char *names[] = {
[SCOUTFS_TRIGGER_BLOCK_REMOVE_STALE] = "block_remove_stale",
[SCOUTFS_TRIGGER_LOG_MERGE_FORCE_FINALIZE_OURS] = "log_merge_force_finalize_ours",
[SCOUTFS_TRIGGER_SRCH_COMPACT_LOGS_PAD_SAFE] = "srch_compact_logs_pad_safe",
[SCOUTFS_TRIGGER_SRCH_FORCE_LOG_ROTATE] = "srch_force_log_rotate",
[SCOUTFS_TRIGGER_SRCH_MERGE_STOP_SAFE] = "srch_merge_stop_safe",
@@ -53,7 +51,6 @@ bool scoutfs_trigger_test_and_clear(struct super_block *sb, unsigned int t)
atomic_t *atom;
int old;
int mem;
bool fired;
BUG_ON(t >= SCOUTFS_TRIGGER_NR);
atom = &triggers->atomics[t];
@@ -67,12 +64,7 @@ bool scoutfs_trigger_test_and_clear(struct super_block *sb, unsigned int t)
mem = atomic_cmpxchg(atom, old, 0);
} while (mem && mem != old);
fired = !!mem;
if (fired)
trace_scoutfs_trigger_fired(sb, names[t]);
return fired;
return !!mem;
}
int scoutfs_setup_triggers(struct super_block *sb)

View File

@@ -3,7 +3,6 @@
enum scoutfs_trigger {
SCOUTFS_TRIGGER_BLOCK_REMOVE_STALE,
SCOUTFS_TRIGGER_LOG_MERGE_FORCE_FINALIZE_OURS,
SCOUTFS_TRIGGER_SRCH_COMPACT_LOGS_PAD_SAFE,
SCOUTFS_TRIGGER_SRCH_FORCE_LOG_ROTATE,
SCOUTFS_TRIGGER_SRCH_MERGE_STOP_SAFE,

View File

@@ -117,7 +117,6 @@ used during the test.
| T\_NR\_MOUNTS | number of mounts | -n | 3 |
| T\_O[0-9] | mount options | created per run | -o server\_addr= |
| T\_QUORUM | quorum count | -q | 2 |
| T\_EXTRA | per-test file dir | revision ctled | tests/extra/t |
| T\_TMP | per-test tmp prefix | made for test | results/tmp/t/tmp |
| T\_TMPDIR | per-test tmp dir dir | made for test | results/tmp/t |

View File

@@ -1,882 +0,0 @@
Ran:
generic/001
generic/002
generic/004
generic/005
generic/006
generic/007
generic/008
generic/009
generic/011
generic/012
generic/013
generic/014
generic/015
generic/016
generic/018
generic/020
generic/021
generic/022
generic/023
generic/024
generic/025
generic/026
generic/028
generic/029
generic/030
generic/031
generic/032
generic/033
generic/034
generic/035
generic/037
generic/039
generic/040
generic/041
generic/050
generic/052
generic/053
generic/056
generic/057
generic/058
generic/059
generic/060
generic/061
generic/062
generic/063
generic/064
generic/065
generic/066
generic/067
generic/069
generic/070
generic/071
generic/073
generic/076
generic/078
generic/079
generic/080
generic/081
generic/082
generic/084
generic/086
generic/087
generic/088
generic/090
generic/091
generic/092
generic/094
generic/096
generic/097
generic/098
generic/099
generic/101
generic/104
generic/105
generic/106
generic/107
generic/110
generic/111
generic/113
generic/114
generic/115
generic/116
generic/117
generic/118
generic/119
generic/120
generic/121
generic/122
generic/123
generic/124
generic/126
generic/128
generic/129
generic/130
generic/131
generic/134
generic/135
generic/136
generic/138
generic/139
generic/140
generic/141
generic/142
generic/143
generic/144
generic/145
generic/146
generic/147
generic/148
generic/149
generic/150
generic/151
generic/152
generic/153
generic/154
generic/155
generic/156
generic/157
generic/158
generic/159
generic/160
generic/161
generic/162
generic/163
generic/169
generic/171
generic/172
generic/173
generic/174
generic/177
generic/178
generic/179
generic/180
generic/181
generic/182
generic/183
generic/184
generic/185
generic/188
generic/189
generic/190
generic/191
generic/193
generic/194
generic/195
generic/196
generic/197
generic/198
generic/199
generic/200
generic/201
generic/202
generic/203
generic/205
generic/206
generic/207
generic/210
generic/211
generic/212
generic/214
generic/215
generic/216
generic/217
generic/218
generic/219
generic/220
generic/221
generic/222
generic/223
generic/225
generic/227
generic/228
generic/229
generic/230
generic/235
generic/236
generic/237
generic/238
generic/240
generic/244
generic/245
generic/246
generic/247
generic/248
generic/249
generic/250
generic/252
generic/253
generic/254
generic/255
generic/256
generic/257
generic/258
generic/259
generic/260
generic/261
generic/262
generic/263
generic/264
generic/265
generic/266
generic/267
generic/268
generic/271
generic/272
generic/276
generic/277
generic/278
generic/279
generic/281
generic/282
generic/283
generic/284
generic/286
generic/287
generic/288
generic/289
generic/290
generic/291
generic/292
generic/293
generic/294
generic/295
generic/296
generic/301
generic/302
generic/303
generic/304
generic/305
generic/306
generic/307
generic/308
generic/309
generic/312
generic/313
generic/314
generic/315
generic/316
generic/317
generic/319
generic/322
generic/324
generic/325
generic/326
generic/327
generic/328
generic/329
generic/330
generic/331
generic/332
generic/335
generic/336
generic/337
generic/341
generic/342
generic/343
generic/346
generic/348
generic/353
generic/355
generic/358
generic/359
generic/360
generic/361
generic/362
generic/363
generic/364
generic/365
generic/366
generic/367
generic/368
generic/369
generic/370
generic/371
generic/372
generic/373
generic/374
generic/375
generic/376
generic/377
generic/378
generic/379
generic/380
generic/381
generic/382
generic/383
generic/384
generic/385
generic/386
generic/389
generic/391
generic/392
generic/393
generic/394
generic/395
generic/396
generic/397
generic/398
generic/400
generic/401
generic/402
generic/403
generic/404
generic/406
generic/407
generic/408
generic/412
generic/413
generic/414
generic/417
generic/419
generic/420
generic/421
generic/422
generic/424
generic/425
generic/426
generic/427
generic/428
generic/436
generic/437
generic/439
generic/440
generic/443
generic/445
generic/446
generic/448
generic/449
generic/450
generic/451
generic/452
generic/453
generic/454
generic/456
generic/458
generic/460
generic/462
generic/463
generic/465
generic/466
generic/468
generic/469
generic/470
generic/471
generic/474
generic/477
generic/478
generic/479
generic/480
generic/481
generic/483
generic/485
generic/486
generic/487
generic/488
generic/489
generic/490
generic/491
generic/492
generic/498
generic/499
generic/501
generic/502
generic/503
generic/504
generic/505
generic/506
generic/507
generic/508
generic/509
generic/510
generic/511
generic/512
generic/513
generic/514
generic/515
generic/516
generic/517
generic/518
generic/519
generic/520
generic/523
generic/524
generic/525
generic/526
generic/527
generic/528
generic/529
generic/530
generic/531
generic/533
generic/534
generic/535
generic/536
generic/537
generic/538
generic/539
generic/540
generic/541
generic/542
generic/543
generic/544
generic/545
generic/546
generic/547
generic/548
generic/549
generic/550
generic/552
generic/553
generic/555
generic/556
generic/557
generic/566
generic/567
generic/571
generic/572
generic/573
generic/574
generic/575
generic/576
generic/577
generic/578
generic/580
generic/581
generic/582
generic/583
generic/584
generic/586
generic/587
generic/588
generic/591
generic/592
generic/593
generic/594
generic/595
generic/596
generic/597
generic/598
generic/599
generic/600
generic/601
generic/602
generic/603
generic/604
generic/605
generic/606
generic/607
generic/608
generic/609
generic/610
generic/611
generic/612
generic/613
generic/614
generic/618
generic/621
generic/623
generic/624
generic/625
generic/626
generic/628
generic/629
generic/630
generic/632
generic/634
generic/635
generic/637
generic/638
generic/639
generic/640
generic/644
generic/645
generic/646
generic/647
generic/651
generic/652
generic/653
generic/654
generic/655
generic/657
generic/658
generic/659
generic/660
generic/661
generic/662
generic/663
generic/664
generic/665
generic/666
generic/667
generic/668
generic/669
generic/673
generic/674
generic/675
generic/676
generic/677
generic/678
generic/679
generic/680
generic/681
generic/682
generic/683
generic/684
generic/685
generic/686
generic/687
generic/688
generic/689
shared/002
shared/032
Not
run:
generic/008
generic/009
generic/012
generic/015
generic/016
generic/018
generic/021
generic/022
generic/025
generic/026
generic/031
generic/033
generic/050
generic/052
generic/058
generic/059
generic/060
generic/061
generic/063
generic/064
generic/078
generic/079
generic/081
generic/082
generic/091
generic/094
generic/096
generic/110
generic/111
generic/113
generic/114
generic/115
generic/116
generic/118
generic/119
generic/121
generic/122
generic/123
generic/128
generic/130
generic/134
generic/135
generic/136
generic/138
generic/139
generic/140
generic/142
generic/143
generic/144
generic/145
generic/146
generic/147
generic/148
generic/149
generic/150
generic/151
generic/152
generic/153
generic/154
generic/155
generic/156
generic/157
generic/158
generic/159
generic/160
generic/161
generic/162
generic/163
generic/171
generic/172
generic/173
generic/174
generic/177
generic/178
generic/179
generic/180
generic/181
generic/182
generic/183
generic/185
generic/188
generic/189
generic/190
generic/191
generic/193
generic/194
generic/195
generic/196
generic/197
generic/198
generic/199
generic/200
generic/201
generic/202
generic/203
generic/205
generic/206
generic/207
generic/210
generic/211
generic/212
generic/214
generic/216
generic/217
generic/218
generic/219
generic/220
generic/222
generic/223
generic/225
generic/227
generic/229
generic/230
generic/235
generic/238
generic/240
generic/244
generic/250
generic/252
generic/253
generic/254
generic/255
generic/256
generic/259
generic/260
generic/261
generic/262
generic/263
generic/264
generic/265
generic/266
generic/267
generic/268
generic/271
generic/272
generic/276
generic/277
generic/278
generic/279
generic/281
generic/282
generic/283
generic/284
generic/287
generic/288
generic/289
generic/290
generic/291
generic/292
generic/293
generic/295
generic/296
generic/301
generic/302
generic/303
generic/304
generic/305
generic/312
generic/314
generic/316
generic/317
generic/324
generic/326
generic/327
generic/328
generic/329
generic/330
generic/331
generic/332
generic/353
generic/355
generic/358
generic/359
generic/361
generic/362
generic/363
generic/364
generic/365
generic/366
generic/367
generic/368
generic/369
generic/370
generic/371
generic/372
generic/373
generic/374
generic/378
generic/379
generic/380
generic/381
generic/382
generic/383
generic/384
generic/385
generic/386
generic/391
generic/392
generic/395
generic/396
generic/397
generic/398
generic/400
generic/402
generic/404
generic/406
generic/407
generic/408
generic/412
generic/413
generic/414
generic/417
generic/419
generic/420
generic/421
generic/422
generic/424
generic/425
generic/427
generic/439
generic/440
generic/446
generic/449
generic/450
generic/451
generic/453
generic/454
generic/456
generic/458
generic/462
generic/463
generic/465
generic/466
generic/468
generic/469
generic/470
generic/471
generic/474
generic/485
generic/487
generic/488
generic/491
generic/492
generic/499
generic/501
generic/503
generic/505
generic/506
generic/507
generic/508
generic/511
generic/513
generic/514
generic/515
generic/516
generic/517
generic/518
generic/519
generic/520
generic/528
generic/530
generic/536
generic/537
generic/538
generic/539
generic/540
generic/541
generic/542
generic/543
generic/544
generic/545
generic/546
generic/548
generic/549
generic/550
generic/552
generic/553
generic/555
generic/556
generic/566
generic/567
generic/572
generic/573
generic/574
generic/575
generic/576
generic/577
generic/578
generic/580
generic/581
generic/582
generic/583
generic/584
generic/586
generic/587
generic/588
generic/591
generic/592
generic/593
generic/594
generic/595
generic/596
generic/597
generic/598
generic/599
generic/600
generic/601
generic/602
generic/603
generic/605
generic/606
generic/607
generic/608
generic/609
generic/610
generic/612
generic/613
generic/621
generic/623
generic/624
generic/625
generic/626
generic/628
generic/629
generic/630
generic/635
generic/644
generic/645
generic/646
generic/647
generic/651
generic/652
generic/653
generic/654
generic/655
generic/657
generic/658
generic/659
generic/660
generic/661
generic/662
generic/663
generic/664
generic/665
generic/666
generic/667
generic/668
generic/669
generic/673
generic/674
generic/675
generic/677
generic/678
generic/679
generic/680
generic/681
generic/682
generic/683
generic/684
generic/685
generic/686
generic/687
generic/688
generic/689
shared/002
shared/032
Passed all 512 tests

View File

@@ -1,44 +0,0 @@
generic/003 # missing atime update in buffered read
generic/075 # file content mismatch failures (fds, etc)
generic/103 # enospc causes trans commit failures
generic/108 # mount fails on failing device?
generic/112 # file content mismatch failures (fds, etc)
generic/213 # enospc causes trans commit failures
generic/318 # can't support user namespaces until v5.11
generic/321 # requires selinux enabled for '+' in ls?
generic/338 # BUG_ON update inode error handling
generic/347 # _dmthin_mount doesn't work?
generic/356 # swap
generic/357 # swap
generic/409 # bind mounts not scripted yet
generic/410 # bind mounts not scripted yet
generic/411 # bind mounts not scripted yet
generic/423 # symlink inode size is strlen() + 1 on scoutfs
generic/430 # xfs_io copy_range missing in el7
generic/431 # xfs_io copy_range missing in el7
generic/432 # xfs_io copy_range missing in el7
generic/433 # xfs_io copy_range missing in el7
generic/434 # xfs_io copy_range missing in el7
generic/441 # dm-mapper
generic/444 # el9's posix_acl_update_mode is buggy ?
generic/467 # open_by_handle ESTALE
generic/472 # swap
generic/484 # dm-mapper
generic/493 # swap
generic/494 # swap
generic/495 # swap
generic/496 # swap
generic/497 # swap
generic/532 # xfs_io statx attrib_mask missing in el7
generic/554 # swap
generic/563 # cgroup+loopdev
generic/564 # xfs_io copy_range missing in el7
generic/565 # xfs_io copy_range missing in el7
generic/568 # falloc not resulting in block count increase
generic/569 # swap
generic/570 # swap
generic/620 # dm-hugedisk
generic/633 # id-mapped mounts missing in el7
generic/636 # swap
generic/641 # swap
generic/643 # swap

View File

@@ -8,33 +8,36 @@
echo "$0 running rid '$SCOUTFS_FENCED_REQ_RID' ip '$SCOUTFS_FENCED_REQ_IP' args '$@'"
echo_fail() {
echo "$@" >&2
log() {
echo "$@" > /dev/stderr
exit 1
}
# silence error messages
quiet_cat()
{
cat "$@" 2>/dev/null
echo_fail() {
echo "$@" > /dev/stderr
exit 1
}
rid="$SCOUTFS_FENCED_REQ_RID"
shopt -s nullglob
for fs in /sys/fs/scoutfs/*; do
fs_rid="$(quiet_cat $fs/rid)"
nr="$(quiet_cat $fs/data_device_maj_min)"
[ ! -d "$fs" -o "$fs_rid" != "$rid" ] && continue
[ ! -d "$fs" ] && continue
mnt=$(findmnt -l -n -t scoutfs -o TARGET -S $nr)
[ -z "$mnt" ] && continue
if ! umount -qf "$mnt"; then
if [ -d "$fs" ]; then
echo_fail "umount -qf $mnt failed"
fi
fs_rid="$(cat $fs/rid)" || \
echo_fail "failed to get rid in $fs"
if [ "$fs_rid" != "$rid" ]; then
continue
fi
nr="$(cat $fs/data_device_maj_min)" || \
echo_fail "failed to get data device major:minor in $fs"
mnts=$(findmnt -l -n -t scoutfs -o TARGET -S $nr) || \
echo_fail "findmnt -t scoutfs -S $nr failed"
for mnt in $mnts; do
umount -f "$mnt" || \
echo_fail "umout -f $mnt failed"
done
done
exit 0

View File

@@ -64,27 +64,21 @@ t_rc()
}
#
# As run, stdout/err are redirected to a file that will be compared with
# the stored expected golden output of the test. This redirects
# stdout/err in the script to stdout of the invoking run-test. It's
# intended to give visible output of tests without being included in the
# golden output.
# redirect test output back to the output of the invoking script instead
# of the compared output.
#
# (see the goofy "exec" fd manipulation in the main run-tests as it runs
# each test)
#
t_stdout_invoked()
t_restore_output()
{
exec >&6 2>&1
}
#
# This undoes t_stdout_invoked, returning the test's stdout/err to the
# output file as it was when it was launched.
# redirect a command's output back to the compared output after the
# test has restored its output
#
t_stdout_compare()
t_compare_output()
{
exec >&7 2>&1
"$@" >&7 2>&1
}
#

View File

@@ -121,7 +121,6 @@ t_filter_dmesg()
# in debugging kernels we can slow things down a bit
re="$re|hrtimer: interrupt took .*"
re="$re|clocksource: Long readout interval"
# fencing tests force unmounts and trigger timeouts
re="$re|scoutfs .* forcing unmount"
@@ -167,12 +166,6 @@ t_filter_dmesg()
# perf warning that it adjusted sample rate
re="$re|perf: interrupt took too long.*lowering kernel.perf_event_max_sample_rate.*"
# some ci test guests are unresponsive
re="$re|longest quorum heartbeat .* delay"
# creating block devices may trigger this
re="$re|block device autoloading is deprecated and will be removed."
egrep -v "($re)" | \
ignore_harmless_unwind_kasan_stack_oob
}

View File

@@ -498,121 +498,3 @@ t_restore_all_sysfs_mount_options() {
t_set_sysfs_mount_option $i $name "${_saved_opts[$ind]}"
done
}
t_force_log_merge() {
local sv=$(t_server_nr)
local merges_started
local last_merges_started
local merges_completed
local last_merges_completed
while true; do
last_merges_started=$(t_counter log_merge_start $sv)
last_merges_completed=$(t_counter log_merge_complete $sv)
t_trigger_arm_silent log_merge_force_finalize_ours $sv
t_sync_seq_index
while test "$(t_trigger_get log_merge_force_finalize_ours $sv)" == "1"; do
sleep .5
done
merges_started=$(t_counter log_merge_start $sv)
if (( merges_started > last_merges_started )); then
merges_completed=$(t_counter log_merge_complete $sv)
while (( merges_completed == last_merges_completed )); do
sleep .5
merges_completed=$(t_counter log_merge_complete $sv)
done
break
fi
done
}
declare -A _last_scan
t_get_orphan_scan_runs() {
local i
for i in $(t_fs_nrs); do
_last_scan[$i]=$(t_counter orphan_scan $i)
done
}
t_wait_for_orphan_scan_runs() {
local i
local scan
t_get_orphan_scan_runs
for i in $(t_fs_nrs); do
while true; do
scan=$(t_counter orphan_scan $i)
if (( scan != _last_scan[$i] )); then
break
fi
sleep .5
done
done
}
declare -A _last_empty
t_get_orphan_scan_empty() {
local i
for i in $(t_fs_nrs); do
_last_empty[$i]=$(t_counter orphan_scan_empty $i)
done
}
t_wait_for_no_orphans() {
local i;
local working;
local empty;
t_get_orphan_scan_empty
while true; do
working=0
t_wait_for_orphan_scan_runs
for i in $(t_fs_nrs); do
empty=$(t_counter orphan_scan_empty $i)
if (( empty == _last_empty[$i] )); then
(( working++ ))
else
(( _last_empty[$i] = empty ))
fi
done
if (( working == 0 )); then
break
fi
sleep 1
done
}
#
# Repeatedly run the arguments as a command, sleeping in between, until
# it returns success. The first argument is a relative timeout in
# seconds. The remaining arguments are the command and its arguments.
#
# If the timeout expires without the command returning 0 then the test
# fails.
#
t_wait_until_timeout() {
local relative="$1"
local expire="$((SECONDS + relative))"
shift
while (( SECONDS < expire )); do
"$@" && return
sleep 1
done
t_fail "command failed for $relative sec: $@"
}

View File

@@ -43,14 +43,9 @@ t_tap_progress()
local testname=$1
local result=$2
local stmsg=""
local diff=""
local dmsg=""
if [[ -s $T_RESULTS/tmp/${testname}/status.msg ]]; then
stmsg="1"
fi
if [[ -s "$T_RESULTS/tmp/${testname}/dmesg.new" ]]; then
dmsg="1"
fi
@@ -66,7 +61,6 @@ t_tap_progress()
echo "# ${testname} ** skipped - permitted **"
else
echo "not ok ${i} - ${testname}"
case ${result} in
101)
echo "# ${testname} ** skipped **"
@@ -76,13 +70,6 @@ t_tap_progress()
;;
esac
if [[ -n "${stmsg}" ]]; then
echo "#"
echo "# status:"
echo "#"
cat $T_RESULTS/tmp/${testname}/status.msg | sed 's/^/# - /'
fi
if [[ -n "${diff}" ]]; then
echo "#"
echo "# diff:"

View File

@@ -17,7 +17,7 @@ ino not found in dseq index
mount 0 contents after mount 1 rm: contents
ino found in dseq index
ino found in dseq index
stat: cannot stat '/mnt/test/test/inode-deletion/badfile': No such file or directory
stat: cannot stat '/mnt/test/test/inode-deletion/file': No such file or directory
ino not found in dseq index
ino not found in dseq index
== lots of deletions use one open map

View File

@@ -0,0 +1,882 @@
Ran:
generic/001
generic/002
generic/004
generic/005
generic/006
generic/007
generic/008
generic/009
generic/011
generic/012
generic/013
generic/014
generic/015
generic/016
generic/018
generic/020
generic/021
generic/022
generic/023
generic/024
generic/025
generic/026
generic/028
generic/029
generic/030
generic/031
generic/032
generic/033
generic/034
generic/035
generic/037
generic/039
generic/040
generic/041
generic/050
generic/052
generic/053
generic/056
generic/057
generic/058
generic/059
generic/060
generic/061
generic/062
generic/063
generic/064
generic/065
generic/066
generic/067
generic/069
generic/070
generic/071
generic/073
generic/076
generic/078
generic/079
generic/080
generic/081
generic/082
generic/084
generic/086
generic/087
generic/088
generic/090
generic/091
generic/092
generic/094
generic/096
generic/097
generic/098
generic/099
generic/101
generic/104
generic/105
generic/106
generic/107
generic/110
generic/111
generic/113
generic/114
generic/115
generic/116
generic/117
generic/118
generic/119
generic/120
generic/121
generic/122
generic/123
generic/124
generic/126
generic/128
generic/129
generic/130
generic/131
generic/134
generic/135
generic/136
generic/138
generic/139
generic/140
generic/141
generic/142
generic/143
generic/144
generic/145
generic/146
generic/147
generic/148
generic/149
generic/150
generic/151
generic/152
generic/153
generic/154
generic/155
generic/156
generic/157
generic/158
generic/159
generic/160
generic/161
generic/162
generic/163
generic/169
generic/171
generic/172
generic/173
generic/174
generic/177
generic/178
generic/179
generic/180
generic/181
generic/182
generic/183
generic/184
generic/185
generic/188
generic/189
generic/190
generic/191
generic/193
generic/194
generic/195
generic/196
generic/197
generic/198
generic/199
generic/200
generic/201
generic/202
generic/203
generic/205
generic/206
generic/207
generic/210
generic/211
generic/212
generic/214
generic/215
generic/216
generic/217
generic/218
generic/219
generic/220
generic/221
generic/222
generic/223
generic/225
generic/227
generic/228
generic/229
generic/230
generic/235
generic/236
generic/237
generic/238
generic/240
generic/244
generic/245
generic/246
generic/247
generic/248
generic/249
generic/250
generic/252
generic/253
generic/254
generic/255
generic/256
generic/257
generic/258
generic/259
generic/260
generic/261
generic/262
generic/263
generic/264
generic/265
generic/266
generic/267
generic/268
generic/271
generic/272
generic/276
generic/277
generic/278
generic/279
generic/281
generic/282
generic/283
generic/284
generic/286
generic/287
generic/288
generic/289
generic/290
generic/291
generic/292
generic/293
generic/294
generic/295
generic/296
generic/301
generic/302
generic/303
generic/304
generic/305
generic/306
generic/307
generic/308
generic/309
generic/312
generic/313
generic/314
generic/315
generic/316
generic/317
generic/319
generic/322
generic/324
generic/325
generic/326
generic/327
generic/328
generic/329
generic/330
generic/331
generic/332
generic/335
generic/336
generic/337
generic/341
generic/342
generic/343
generic/346
generic/348
generic/353
generic/355
generic/358
generic/359
generic/360
generic/361
generic/362
generic/363
generic/364
generic/365
generic/366
generic/367
generic/368
generic/369
generic/370
generic/371
generic/372
generic/373
generic/374
generic/375
generic/376
generic/377
generic/378
generic/379
generic/380
generic/381
generic/382
generic/383
generic/384
generic/385
generic/386
generic/389
generic/391
generic/392
generic/393
generic/394
generic/395
generic/396
generic/397
generic/398
generic/400
generic/401
generic/402
generic/403
generic/404
generic/406
generic/407
generic/408
generic/412
generic/413
generic/414
generic/417
generic/419
generic/420
generic/421
generic/422
generic/424
generic/425
generic/426
generic/427
generic/428
generic/436
generic/437
generic/439
generic/440
generic/443
generic/445
generic/446
generic/448
generic/449
generic/450
generic/451
generic/452
generic/453
generic/454
generic/456
generic/458
generic/460
generic/462
generic/463
generic/465
generic/466
generic/468
generic/469
generic/470
generic/471
generic/474
generic/477
generic/478
generic/479
generic/480
generic/481
generic/483
generic/485
generic/486
generic/487
generic/488
generic/489
generic/490
generic/491
generic/492
generic/498
generic/499
generic/501
generic/502
generic/503
generic/504
generic/505
generic/506
generic/507
generic/508
generic/509
generic/510
generic/511
generic/512
generic/513
generic/514
generic/515
generic/516
generic/517
generic/518
generic/519
generic/520
generic/523
generic/524
generic/525
generic/526
generic/527
generic/528
generic/529
generic/530
generic/531
generic/533
generic/534
generic/535
generic/536
generic/537
generic/538
generic/539
generic/540
generic/541
generic/542
generic/543
generic/544
generic/545
generic/546
generic/547
generic/548
generic/549
generic/550
generic/552
generic/553
generic/555
generic/556
generic/557
generic/566
generic/567
generic/571
generic/572
generic/573
generic/574
generic/575
generic/576
generic/577
generic/578
generic/580
generic/581
generic/582
generic/583
generic/584
generic/586
generic/587
generic/588
generic/591
generic/592
generic/593
generic/594
generic/595
generic/596
generic/597
generic/598
generic/599
generic/600
generic/601
generic/602
generic/603
generic/604
generic/605
generic/606
generic/607
generic/608
generic/609
generic/610
generic/611
generic/612
generic/613
generic/614
generic/618
generic/621
generic/623
generic/624
generic/625
generic/626
generic/628
generic/629
generic/630
generic/632
generic/634
generic/635
generic/637
generic/638
generic/639
generic/640
generic/644
generic/645
generic/646
generic/647
generic/651
generic/652
generic/653
generic/654
generic/655
generic/657
generic/658
generic/659
generic/660
generic/661
generic/662
generic/663
generic/664
generic/665
generic/666
generic/667
generic/668
generic/669
generic/673
generic/674
generic/675
generic/676
generic/677
generic/678
generic/679
generic/680
generic/681
generic/682
generic/683
generic/684
generic/685
generic/686
generic/687
generic/688
generic/689
shared/002
shared/032
Not
run:
generic/008
generic/009
generic/012
generic/015
generic/016
generic/018
generic/021
generic/022
generic/025
generic/026
generic/031
generic/033
generic/050
generic/052
generic/058
generic/059
generic/060
generic/061
generic/063
generic/064
generic/078
generic/079
generic/081
generic/082
generic/091
generic/094
generic/096
generic/110
generic/111
generic/113
generic/114
generic/115
generic/116
generic/118
generic/119
generic/121
generic/122
generic/123
generic/128
generic/130
generic/134
generic/135
generic/136
generic/138
generic/139
generic/140
generic/142
generic/143
generic/144
generic/145
generic/146
generic/147
generic/148
generic/149
generic/150
generic/151
generic/152
generic/153
generic/154
generic/155
generic/156
generic/157
generic/158
generic/159
generic/160
generic/161
generic/162
generic/163
generic/171
generic/172
generic/173
generic/174
generic/177
generic/178
generic/179
generic/180
generic/181
generic/182
generic/183
generic/185
generic/188
generic/189
generic/190
generic/191
generic/193
generic/194
generic/195
generic/196
generic/197
generic/198
generic/199
generic/200
generic/201
generic/202
generic/203
generic/205
generic/206
generic/207
generic/210
generic/211
generic/212
generic/214
generic/216
generic/217
generic/218
generic/219
generic/220
generic/222
generic/223
generic/225
generic/227
generic/229
generic/230
generic/235
generic/238
generic/240
generic/244
generic/250
generic/252
generic/253
generic/254
generic/255
generic/256
generic/259
generic/260
generic/261
generic/262
generic/263
generic/264
generic/265
generic/266
generic/267
generic/268
generic/271
generic/272
generic/276
generic/277
generic/278
generic/279
generic/281
generic/282
generic/283
generic/284
generic/287
generic/288
generic/289
generic/290
generic/291
generic/292
generic/293
generic/295
generic/296
generic/301
generic/302
generic/303
generic/304
generic/305
generic/312
generic/314
generic/316
generic/317
generic/324
generic/326
generic/327
generic/328
generic/329
generic/330
generic/331
generic/332
generic/353
generic/355
generic/358
generic/359
generic/361
generic/362
generic/363
generic/364
generic/365
generic/366
generic/367
generic/368
generic/369
generic/370
generic/371
generic/372
generic/373
generic/374
generic/378
generic/379
generic/380
generic/381
generic/382
generic/383
generic/384
generic/385
generic/386
generic/391
generic/392
generic/395
generic/396
generic/397
generic/398
generic/400
generic/402
generic/404
generic/406
generic/407
generic/408
generic/412
generic/413
generic/414
generic/417
generic/419
generic/420
generic/421
generic/422
generic/424
generic/425
generic/427
generic/439
generic/440
generic/446
generic/449
generic/450
generic/451
generic/453
generic/454
generic/456
generic/458
generic/462
generic/463
generic/465
generic/466
generic/468
generic/469
generic/470
generic/471
generic/474
generic/485
generic/487
generic/488
generic/491
generic/492
generic/499
generic/501
generic/503
generic/505
generic/506
generic/507
generic/508
generic/511
generic/513
generic/514
generic/515
generic/516
generic/517
generic/518
generic/519
generic/520
generic/528
generic/530
generic/536
generic/537
generic/538
generic/539
generic/540
generic/541
generic/542
generic/543
generic/544
generic/545
generic/546
generic/548
generic/549
generic/550
generic/552
generic/553
generic/555
generic/556
generic/566
generic/567
generic/572
generic/573
generic/574
generic/575
generic/576
generic/577
generic/578
generic/580
generic/581
generic/582
generic/583
generic/584
generic/586
generic/587
generic/588
generic/591
generic/592
generic/593
generic/594
generic/595
generic/596
generic/597
generic/598
generic/599
generic/600
generic/601
generic/602
generic/603
generic/605
generic/606
generic/607
generic/608
generic/609
generic/610
generic/612
generic/613
generic/621
generic/623
generic/624
generic/625
generic/626
generic/628
generic/629
generic/630
generic/635
generic/644
generic/645
generic/646
generic/647
generic/651
generic/652
generic/653
generic/654
generic/655
generic/657
generic/658
generic/659
generic/660
generic/661
generic/662
generic/663
generic/664
generic/665
generic/666
generic/667
generic/668
generic/669
generic/673
generic/674
generic/675
generic/677
generic/678
generic/679
generic/680
generic/681
generic/682
generic/683
generic/684
generic/685
generic/686
generic/687
generic/688
generic/689
shared/002
shared/032
Passed all 512 tests

View File

@@ -56,7 +56,6 @@ $(basename $0) options:
| only tests matching will be run. Can be provided multiple
| times
-i | Force removing and inserting the built scoutfs.ko module.
-l <nr> | Loop each test <nr> times while passing, last run counts.
-M <file> | Specify the filesystem's meta data device path that contains
| the file system to be tested. Will be clobbered by -m mkfs.
-m | Run mkfs on the device before mounting and running
@@ -70,7 +69,6 @@ $(basename $0) options:
-r <dir> | Specify the directory in which to store results of
| test runs. The directory will be created if it doesn't
| exist. Previous results will be deleted as each test runs.
-R | Shuffle the test order randomly using shuf
-s | Skip git repo checkouts.
-t | Enable trace events that match the given glob argument.
| Multiple options enable multiple globbed events.
@@ -91,8 +89,6 @@ done
# set some T_ defaults
T_TRACE_DUMP="0"
T_TRACE_PRINTK="0"
T_PORT_START="19700"
T_LOOP_ITER="1"
# array declarations to be able to use array ops
declare -a T_TRACE_GLOB
@@ -133,12 +129,6 @@ while true; do
-i)
T_INSMOD="1"
;;
-l)
test -n "$2" || die "-l must have a nr iterations argument"
test "$2" -eq "$2" 2>/dev/null || die "-l <nr> argument must be an integer"
T_LOOP_ITER="$2"
shift
;;
-M)
test -n "$2" || die "-z must have meta device file argument"
T_META_DEVICE="$2"
@@ -174,9 +164,6 @@ while true; do
T_RESULTS="$2"
shift
;;
-R)
T_SHUF="1"
;;
-s)
T_SKIP_CHECKOUT="1"
;;
@@ -274,37 +261,13 @@ for e in T_META_DEVICE T_DATA_DEVICE T_EX_META_DEV T_EX_DATA_DEV T_KMOD T_RESULT
eval $e=\"$(readlink -f "${!e}")\"
done
# try and check ports, but not necessary
T_TEST_PORT="$T_PORT_START"
T_SCRATCH_PORT="$((T_PORT_START + 100))"
T_DEV_PORT="$((T_PORT_START + 200))"
read local_start local_end < /proc/sys/net/ipv4/ip_local_port_range
if [ -n "$local_start" -a -n "$local_end" -a "$local_start" -lt "$local_end" ]; then
if [ ! "$T_DEV_PORT" -lt "$local_start" -a ! "$T_TEST_PORT" -gt "$local_end" ]; then
die "listening port range $T_TEST_PORT - $T_DEV_PORT is within local dynamic port range $local_start - $local_end in /proc/sys/net/ipv4/ip_local_port_range"
fi
fi
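If this check fires, one way to clear it (a sketch only; the ranges are illustrative and site-specific, not part of the script) is to move the kernel's dynamic client port range away from the test listening ports:
# show the dynamic range that the check reads
cat /proc/sys/net/ipv4/ip_local_port_range
# example: keep the dynamic range above the default 19700-19900 test ports
echo "32768 60999" > /proc/sys/net/ipv4/ip_local_port_range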
# permute sequence?
T_SEQUENCE=sequence
if [ -n "$T_SHUF" ]; then
msg "shuffling test order"
shuf sequence -o sequence.shuf
# keep xfstests at the end
if grep -q 'xfstests.sh' sequence.shuf ; then
sed -i '/xfstests.sh/d' sequence.shuf
echo "xfstests.sh" >> sequence.shuf
fi
T_SEQUENCE=sequence.shuf
fi
# include everything by default
test -z "$T_INCLUDE" && T_INCLUDE="-e '.*'"
# (quickly) exclude nothing by default
test -z "$T_EXCLUDE" && T_EXCLUDE="-e '\Zx'"
# eval to strip re ticks but not expand
tests=$(grep -v "^#" $T_SEQUENCE |
tests=$(grep -v "^#" sequence |
eval grep "$T_INCLUDE" | eval grep -v "$T_EXCLUDE")
test -z "$tests" && \
die "no tests found by including $T_INCLUDE and excluding $T_EXCLUDE"
@@ -383,7 +346,7 @@ fi
quo=""
if [ -n "$T_MKFS" ]; then
for i in $(seq -0 $((T_QUORUM - 1))); do
quo="$quo -Q $i,127.0.0.1,$((T_TEST_PORT + i))"
quo="$quo -Q $i,127.0.0.1,$((42000 + i))"
done
msg "making new filesystem with $T_QUORUM quorum members"
@@ -400,8 +363,7 @@ if [ -n "$T_INSMOD" ]; then
fi
if [ -n "$T_TRACE_MULT" ]; then
# orig_trace_size=$(cat /sys/kernel/debug/tracing/buffer_size_kb)
orig_trace_size=1408
orig_trace_size=$(cat /sys/kernel/debug/tracing/buffer_size_kb)
mult_trace_size=$((orig_trace_size * T_TRACE_MULT))
msg "increasing trace buffer size from $orig_trace_size KiB to $mult_trace_size KiB"
echo $mult_trace_size > /sys/kernel/debug/tracing/buffer_size_kb
@@ -439,30 +401,6 @@ cmd grep . /sys/kernel/debug/tracing/options/trace_printk \
/sys/kernel/debug/tracing/buffer_size_kb \
/proc/sys/kernel/ftrace_dump_on_oops
# we can record pids to kill as we exit, we kill in reverse added order
atexit_kill_pids=""
add_atexit_kill_pid()
{
atexit_kill_pids="$1 $atexit_kill_pids"
}
atexit_kill()
{
local pid
# suppress bg function exited messages
exec {ERR}>&2 2>/dev/null
for pid in $atexit_kill_pids; do
if test -e "/proc/$pid/status" ; then
kill "$pid"
wait "$pid"
fi
done
exec 2>&$ERR {ERR}>&-
}
trap atexit_kill EXIT
#
# Build a fenced config that runs scripts out of the repository rather
# than the default system directory
@@ -476,46 +414,26 @@ EOF
export SCOUTFS_FENCED_CONFIG_FILE="$conf"
T_FENCED_LOG="$T_RESULTS/fenced.log"
#
# Run the agent in the background, log its output, and kill it if we
# exit
#
fenced_log()
{
echo "[$(timestamp)] $*" >> "$T_FENCED_LOG"
}
fenced_pid=""
kill_fenced()
{
if test -n "$fenced_pid" -a -d "/proc/$fenced_pid" ; then
fenced_log "killing fenced pid $fenced_pid"
kill "$fenced_pid"
fi
}
trap kill_fenced EXIT
$T_UTILS/fenced/scoutfs-fenced > "$T_FENCED_LOG" 2>&1 &
fenced_pid=$!
add_atexit_kill_pid $fenced_pid
#
# some critical failures will cause fs operations to hang. We can watch
# for evidence of them and cause the system to crash, at least.
#
crash_monitor()
{
local bad=0
while sleep 1; do
if dmesg | grep -q "inserting extent.*overlaps existing"; then
echo "run-tests monitor saw overlapping extent message"
bad=1
fi
if dmesg | grep -q "error indicated by fence action" ; then
echo "run-tests monitor saw fence agent error message"
bad=1
fi
if [ ! -e "/proc/${fenced_pid}/status" ]; then
echo "run-tests monitor didn't see fenced pid $fenced_pid /proc dir"
bad=1
fi
if [ "$bad" != 0 ]; then
echo "run-tests monitor syncing and triggering crash"
# hail mary, the sync could well hang
(echo s > /proc/sysrq-trigger) &
sleep 5
echo c > /proc/sysrq-trigger
exit 1
fi
done
}
crash_monitor &
add_atexit_kill_pid $!
fenced_log "started fenced pid $fenced_pid in the background"
# setup dm tables
echo "0 $(blockdev --getsz $T_META_DEVICE) linear $T_META_DEVICE 0" > \
@@ -588,7 +506,7 @@ fi
. funcs/filter.sh
# give tests access to built binaries in src/, prefer over installed
export PATH="$PWD/src:$PATH"
PATH="$PWD/src:$PATH"
msg "running tests"
> "$T_RESULTS/skip.log"
@@ -608,110 +526,101 @@ for t in $tests; do
t="tests/$t"
test_name=$(basename "$t" | sed -e 's/.sh$//')
# create a temporary dir and file path for the test
T_TMPDIR="$T_RESULTS/tmp/$test_name"
T_TMP="$T_TMPDIR/tmp"
cmd rm -rf "$T_TMPDIR"
cmd mkdir -p "$T_TMPDIR"
# create a test name dir in the fs, clean up old data as needed
T_DS=""
for i in $(seq 0 $((T_NR_MOUNTS - 1))); do
dir="${T_M[$i]}/test/$test_name"
test $i == 0 && (
test -d "$dir" && cmd rm -rf "$dir"
cmd mkdir -p "$dir"
)
eval T_D$i=$dir
T_D[$i]=$dir
T_DS+="$dir "
done
# export all our T_ variables
for v in ${!T_*}; do
eval export $v
done
export PATH # give test access to scoutfs binary
# prepare to compare output to golden output
test -e "$T_RESULTS/output" || cmd mkdir -p "$T_RESULTS/output"
out="$T_RESULTS/output/$test_name"
> "$T_TMPDIR/status.msg"
golden="golden/$test_name"
# get stats from previous pass
last="$T_RESULTS/last-passed-test-stats"
stats=$(grep -s "^$test_name " "$last" | cut -d " " -f 2-)
test -n "$stats" && stats="last: $stats"
printf " %-30s $stats" "$test_name"
# mark in dmesg as to what test we are running
echo "run scoutfs test $test_name" > /dev/kmsg
# let the test get at its extra files
T_EXTRA="$T_TESTS/extra/$test_name"
# record dmesg before
dmesg | t_filter_dmesg > "$T_TMPDIR/dmesg.before"
for iter in $(seq 1 $T_LOOP_ITER); do
# give tests stdout and compared output on specific fds
exec 6>&1
exec 7>$out
# create a temporary dir and file path for the test
T_TMPDIR="$T_RESULTS/tmp/$test_name"
T_TMP="$T_TMPDIR/tmp"
cmd rm -rf "$T_TMPDIR"
cmd mkdir -p "$T_TMPDIR"
# run the test with access to our functions
start_secs=$SECONDS
bash -c "for f in funcs/*.sh; do . \$f; done; . $t" >&7 2>&1
sts="$?"
log "test $t exited with status $sts"
stats="$((SECONDS - start_secs))s"
# create a test name dir in the fs, clean up old data as needed
T_DS=""
for i in $(seq 0 $((T_NR_MOUNTS - 1))); do
dir="${T_M[$i]}/test/$test_name"
# close our weird descriptors
exec 6>&-
exec 7>&-
test $i == 0 && (
test -d "$dir" && cmd rm -rf "$dir"
cmd mkdir -p "$dir"
)
eval T_D$i=$dir
T_D[$i]=$dir
T_DS+="$dir "
done
# export all our T_ variables
for v in ${!T_*}; do
eval export $v
done
# prepare to compare output to golden output
test -e "$T_RESULTS/output" || cmd mkdir -p "$T_RESULTS/output"
out="$T_RESULTS/output/$test_name"
> "$T_TMPDIR/status.msg"
golden="golden/$test_name"
# record dmesg before
dmesg | t_filter_dmesg > "$T_TMPDIR/dmesg.before"
# give tests stdout and compared output on specific fds
exec 6>&1
exec 7>$out
# run the test with access to our functions
start_secs=$SECONDS
bash -c "for f in funcs/*.sh; do . \$f; done; . $t" >&7 2>&1
sts="$?"
log "test $t exited with status $sts"
stats="$((SECONDS - start_secs))s"
# close our weird descriptors
exec 6>&-
exec 7>&-
# compare output if the test returned passed status
if [ "$sts" == "$T_PASS_STATUS" ]; then
if [ ! -e "$golden" ]; then
message="no golden output"
sts=$T_FAIL_STATUS
elif ! cmp -s "$golden" "$out"; then
message="output differs"
sts=$T_FAIL_STATUS
diff -u "$golden" "$out" >> "$T_RESULTS/fail.log"
fi
else
# get message from t_*() functions
message=$(cat "$T_TMPDIR/status.msg")
fi
# see if anything unexpected was added to dmesg
if [ "$sts" == "$T_PASS_STATUS" ]; then
dmesg | t_filter_dmesg > "$T_TMPDIR/dmesg.after"
diff --old-line-format="" --unchanged-line-format="" \
"$T_TMPDIR/dmesg.before" "$T_TMPDIR/dmesg.after" > \
"$T_TMPDIR/dmesg.new"
if [ -s "$T_TMPDIR/dmesg.new" ]; then
message="unexpected messages in dmesg"
sts=$T_FAIL_STATUS
cat "$T_TMPDIR/dmesg.new" >> "$T_RESULTS/fail.log"
fi
fi
# record unknown exit status
if [ "$sts" -lt "$T_FIRST_STATUS" -o "$sts" -gt "$T_LAST_STATUS" ]; then
message="unknown status: $sts"
# compare output if the test returned passed status
if [ "$sts" == "$T_PASS_STATUS" ]; then
if [ ! -e "$golden" ]; then
message="no golden output"
sts=$T_FAIL_STATUS
elif ! cmp -s "$golden" "$out"; then
message="output differs"
sts=$T_FAIL_STATUS
diff -u "$golden" "$out" >> "$T_RESULTS/fail.log"
fi
else
# get message from t_*() functions
message=$(cat "$T_TMPDIR/status.msg")
fi
# stop looping if we didn't pass
if [ "$sts" != "$T_PASS_STATUS" ]; then
break;
# see if anything unexpected was added to dmesg
if [ "$sts" == "$T_PASS_STATUS" ]; then
dmesg | t_filter_dmesg > "$T_TMPDIR/dmesg.after"
diff --old-line-format="" --unchanged-line-format="" \
"$T_TMPDIR/dmesg.before" "$T_TMPDIR/dmesg.after" > \
"$T_TMPDIR/dmesg.new"
if [ -s "$T_TMPDIR/dmesg.new" ]; then
message="unexpected messages in dmesg"
sts=$T_FAIL_STATUS
cat "$T_TMPDIR/dmesg.new" >> "$T_RESULTS/fail.log"
fi
done
fi
# record unknown exit status
if [ "$sts" -lt "$T_FIRST_STATUS" -o "$sts" -gt "$T_LAST_STATUS" ]; then
message="unknown status: $sts"
sts=$T_FAIL_STATUS
fi
# show and record the result of the test
if [ "$sts" == "$T_PASS_STATUS" ]; then

View File

@@ -19,7 +19,6 @@
#include <sys/types.h>
#include <stdio.h>
#include <sys/stat.h>
#include <inttypes.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>
@@ -30,7 +29,7 @@
#include <errno.h>
static int size = 0;
static int duration = 0;
static int count = 0; /* XXX make this duration instead */
struct thread_info {
int nr;
@@ -42,8 +41,6 @@ static void *run_test_func(void *ptr)
void *buf = NULL;
char *addr = NULL;
struct thread_info *tinfo = ptr;
uint64_t seconds = 0;
struct timespec ts;
int c = 0;
int fd;
ssize_t read, written, ret;
@@ -64,15 +61,9 @@ static void *run_test_func(void *ptr)
usleep(100000); /* 0.1sec to allow all threads to start roughly at the same time */
clock_gettime(CLOCK_REALTIME, &ts); /* record start time */
seconds = ts.tv_sec + duration;
for (;;) {
if (++c % 16 == 0) {
clock_gettime(CLOCK_REALTIME, &ts);
if (ts.tv_sec >= seconds)
break;
}
if (++c > count)
break;
switch (rand() % 4) {
case 0: /* pread */
@@ -108,8 +99,6 @@ static void *run_test_func(void *ptr)
memcpy(addr, buf, size); /* noerr */
break;
}
usleep(10000);
}
munmap(addr, size);
@@ -131,7 +120,7 @@ int main(int argc, char **argv)
int i;
if (argc != 8) {
fprintf(stderr, "%s requires 7 arguments - size duration file1 file2 file3 file4 file5\n", argv[0]);
fprintf(stderr, "%s requires 7 arguments - size count file1 file2 file3 file4 file5\n", argv[0]);
exit(-1);
}
@@ -141,9 +130,9 @@ int main(int argc, char **argv)
exit(-1);
}
duration = atoi(argv[2]);
if (duration < 0) {
fprintf(stderr, "invalid duration, must be greater than or equal to 0\n");
count = atoi(argv[2]);
if (count < 0) {
fprintf(stderr, "invalid count, must be greater than 0\n");
exit(-1);
}

View File

@@ -15,7 +15,7 @@ echo "== prepare devices, mount point, and logs"
SCR="$T_TMPDIR/mnt.scratch"
mkdir -p "$SCR"
> $T_TMP.mount.out
scoutfs mkfs -f -Q 0,127.0.0.1,$T_SCRATCH_PORT "$T_EX_META_DEV" "$T_EX_DATA_DEV" > $T_TMP.mkfs.out 2>&1 \
scoutfs mkfs -f -Q 0,127.0.0.1,53000 "$T_EX_META_DEV" "$T_EX_DATA_DEV" > $T_TMP.mkfs.out 2>&1 \
|| t_fail "mkfs failed"
echo "== bad devices, bad options"

View File

@@ -11,7 +11,7 @@ truncate -s $sz "$T_TMP.equal"
truncate -s $large_sz "$T_TMP.large"
echo "== make scratch fs"
t_quiet scoutfs mkfs -f -Q 0,127.0.0.1,$T_SCRATCH_PORT "$T_EX_META_DEV" "$T_EX_DATA_DEV"
t_quiet scoutfs mkfs -f -Q 0,127.0.0.1,53000 "$T_EX_META_DEV" "$T_EX_DATA_DEV"
SCR="$T_TMPDIR/mnt.scratch"
mkdir -p "$SCR"

View File

@@ -57,7 +57,7 @@ test "$before" == "$after" || \
# XXX this is all pretty manual, would be nice to have helpers
echo "== make small meta fs"
# meta device just big enough for reserves and the metadata we'll fill
scoutfs mkfs -A -f -Q 0,127.0.0.1,$T_SCRATCH_PORT -m 10G "$T_EX_META_DEV" "$T_EX_DATA_DEV" > $T_TMP.mkfs.out 2>&1 || \
scoutfs mkfs -A -f -Q 0,127.0.0.1,53000 -m 10G "$T_EX_META_DEV" "$T_EX_DATA_DEV" > $T_TMP.mkfs.out 2>&1 || \
t_fail "mkfs failed"
SCR="$T_TMPDIR/mnt.scratch"
mkdir -p "$SCR"

View File

@@ -5,9 +5,6 @@
t_require_commands sleep touch grep sync scoutfs
t_require_mounts 2
# regularly see ~20/~30s
VERIFY_TIMEOUT_SECS=90
#
# Make sure that all mounts can read the results of a write from each
# mount.
@@ -43,10 +40,8 @@ verify_fenced_run()
for rid in $rids; do
grep -q ".* running rid '$rid'.* args 'ignored run args'" "$T_FENCED_LOG" || \
return 1
t_fail "fenced didn't execute RUN script for rid $rid"
done
return 0
}
echo "== make sure all mounts can see each other"
@@ -59,7 +54,14 @@ rid=$(t_mount_rid $cl)
echo "cl $cl sv $sv rid $rid" >> "$T_TMP.log"
sync
t_force_umount $cl
t_wait_until_timeout $VERIFY_TIMEOUT_SECS verify_fenced_run $rid
# wait for client reconnection to timeout
while grep -q $rid $(t_debugfs_path $sv)/connections; do
sleep .5
done
while t_rid_is_fencing $rid; do
sleep .5
done
verify_fenced_run $rid
t_mount $cl
check_read_write
@@ -81,7 +83,15 @@ for cl in $(t_fs_nrs); do
t_force_umount $cl
done
t_wait_until_timeout $VERIFY_TIMEOUT_SECS verify_fenced_run $rids
# wait for all client reconnections to timeout
while egrep -q "($pattern)" $(t_debugfs_path $sv)/connections; do
sleep .5
done
# wait for all fence requests to complete
while test -d $(echo /sys/fs/scoutfs/*/fence/* | cut -d " " -f 1); do
sleep .5
done
verify_fenced_run $rids
# remount all the clients
for cl in $(t_fs_nrs); do
if [ $cl == $sv ]; then
@@ -97,7 +107,12 @@ rid=$(t_mount_rid $sv)
echo "sv $sv rid $rid" >> "$T_TMP.log"
sync
t_force_umount $sv
t_wait_until_timeout $VERIFY_TIMEOUT_SECS verify_fenced_run $rid
t_wait_for_leader
# wait until new server is done fencing unmounted leader rid
while t_rid_is_fencing $rid; do
sleep .5
done
verify_fenced_run $rid
t_mount $sv
check_read_write
@@ -112,7 +127,11 @@ for nr in $(t_fs_nrs); do
t_force_umount $nr
done
t_mount_all
t_wait_until_timeout $VERIFY_TIMEOUT_SECS verify_fenced_run $rids
# wait for all fence requests to complete
while test -d $(echo /sys/fs/scoutfs/*/fence/* | cut -d " " -f 1); do
sleep .5
done
verify_fenced_run $rids
check_read_write
t_pass

View File

@@ -89,7 +89,7 @@ for vers in $(seq $MIN $((MAX - 1))); do
old_module="$builds/$vers/scoutfs.ko"
echo "mkfs $vers" >> "$T_TMP.log"
t_quiet $old_scoutfs mkfs -f -Q 0,127.0.0.1,$T_SCRATCH_PORT "$T_EX_META_DEV" "$T_EX_DATA_DEV" \
t_quiet $old_scoutfs mkfs -f -Q 0,127.0.0.1,53000 "$T_EX_META_DEV" "$T_EX_DATA_DEV" \
|| t_fail "mkfs $vers failed"
echo "mount $vers with $vers" >> "$T_TMP.log"

View File

@@ -72,7 +72,7 @@ touch $T_D0/dir/file
mkdir $T_D0/dir/dir
ln -s $T_D0/dir/file $T_D0/dir/symlink
mknod $T_D0/dir/char c 1 3 # null
mknod $T_D0/dir/block b 42 0 # SAMPLE block dev - nonexistent/demo use only number
mknod $T_D0/dir/block b 7 0 # loop0
for name in $(ls -UA $T_D0/dir | sort); do
ino=$(stat -c '%i' $T_D0/dir/$name)
$GRE $ino | filter_types

View File

@@ -61,28 +61,18 @@ rm -f "$T_D1/file"
check_ino_index "$ino" "$dseq" "$T_M0"
check_ino_index "$ino" "$dseq" "$T_M1"
# Hurry along the orphan scanners. If any are currently asleep, we will
# have to wait at least their current scan interval before they wake up,
# run, and notice their new interval.
t_save_all_sysfs_mount_options orphan_scan_delay_ms
t_set_all_sysfs_mount_options orphan_scan_delay_ms 500
t_wait_for_orphan_scan_runs
echo "== unlink wait for open on other mount"
echo "contents" > "$T_D0/badfile"
ino=$(stat -c "%i" "$T_D0/badfile")
dseq=$(scoutfs stat -s data_seq "$T_D0/badfile")
exec {FD}<"$T_D0/badfile"
rm -f "$T_D1/badfile"
echo "contents" > "$T_D0/file"
ino=$(stat -c "%i" "$T_D0/file")
dseq=$(scoutfs stat -s data_seq "$T_D0/file")
exec {FD}<"$T_D0/file"
rm -f "$T_D1/file"
echo "mount 0 contents after mount 1 rm: $(cat <&$FD)"
check_ino_index "$ino" "$dseq" "$T_M0"
check_ino_index "$ino" "$dseq" "$T_M1"
exec {FD}>&- # close
# we know that revalidating will unhash the remote dentry
stat "$T_D0/badfile" 2>&1 | sed 's/cannot statx/cannot stat/' | t_filter_fs
t_force_log_merge
# wait for orphan scanners to pick up the unlinked inode and become idle
t_wait_for_no_orphans
stat "$T_D0/file" 2>&1 | sed 's/cannot statx/cannot stat/' | t_filter_fs
check_ino_index "$ino" "$dseq" "$T_M0"
check_ino_index "$ino" "$dseq" "$T_M1"
@@ -93,20 +83,16 @@ rm -f "$T_D0/dir"/files-*
rmdir "$T_D0/dir"
echo "== open files survive remote scanning orphans"
echo "contents" > "$T_D0/lastfile"
ino=$(stat -c "%i" "$T_D0/lastfile")
dseq=$(scoutfs stat -s data_seq "$T_D0/lastfile")
exec {FD}<"$T_D0/lastfile"
rm -f "$T_D0/lastfile"
echo "contents" > "$T_D0/file"
ino=$(stat -c "%i" "$T_D0/file")
dseq=$(scoutfs stat -s data_seq "$T_D0/file")
exec {FD}<"$T_D0/file"
rm -f "$T_D0/file"
t_umount 1
t_mount 1
echo "mount 0 contents after mount 1 remounted: $(cat <&$FD)"
exec {FD}>&- # close
t_force_log_merge
t_wait_for_no_orphans
check_ino_index "$ino" "$dseq" "$T_M0"
check_ino_index "$ino" "$dseq" "$T_M1"
t_restore_all_sysfs_mount_options orphan_scan_delay_ms
t_pass

View File

@@ -5,7 +5,7 @@
t_require_commands mmap_stress mmap_validate scoutfs xfs_io
echo "== mmap_stress"
mmap_stress 8192 30 "$T_D0/mmap_stress" "$T_D0/mmap_stress" "$T_D0/mmap_stress" "$T_D3/mmap_stress" "$T_D3/mmap_stress" | sed 's/:.*//g' | sort
mmap_stress 8192 2000 "$T_D0/mmap_stress" "$T_D1/mmap_stress" "$T_D2/mmap_stress" "$T_D3/mmap_stress" "$T_D4/mmap_stress" | sed 's/:.*//g' | sort
echo "== basic mmap/read/write consistency checks"
mmap_validate 256 1000 "$T_D0/mmap_val1" "$T_D1/mmap_val1"

View File

@@ -62,7 +62,7 @@ test_timeout()
sleep 1
# tear down the current server/leader
t_force_umount $sv &
t_force_umount $sv
# see how long it takes for the next leader to start
start=$(time_ms)
@@ -73,7 +73,6 @@ test_timeout()
echo "to $to delay $delay" >> $T_TMP.delay
# restore the mount that we tore down
wait
t_mount $sv
# make sure the new leader delay was reasonable, allowing for some slack

View File

@@ -8,19 +8,19 @@ t_require_mounts 2
echo "=== renameat2 noreplace flag test"
# give each mount their own dir (lock group) to minimize create contention
mkdir $T_D0/dir0
mkdir $T_D1/dir1
mkdir $T_M0/dir0
mkdir $T_M1/dir1
echo "=== run two asynchronous calls to renameat2 NOREPLACE"
for i in $(seq 0 100); do
# prepare inputs in isolation
touch "$T_D0/dir0/old0"
touch "$T_D1/dir1/old1"
touch "$T_M0/dir0/old0"
touch "$T_M1/dir1/old1"
# race doing noreplace renames, at most one can succeed
dumb_renameat2 -n "$T_D0/dir0/old0" "$T_D0/dir0/sharednew" 2> /dev/null &
dumb_renameat2 -n "$T_M0/dir0/old0" "$T_M0/dir0/sharednew" 2> /dev/null &
pid0=$!
dumb_renameat2 -n "$T_D1/dir1/old1" "$T_D1/dir0/sharednew" 2> /dev/null &
dumb_renameat2 -n "$T_M1/dir1/old1" "$T_M1/dir0/sharednew" 2> /dev/null &
pid1=$!
wait $pid0
@@ -31,7 +31,7 @@ for i in $(seq 0 100); do
test "$rc0" == 0 -a "$rc1" == 0 && t_fail "both renames succeeded"
# blow away possible files for either race outcome
rm -f "$T_D0/dir0/old0" "$T_D1/dir1/old1" "$T_D0/dir0/sharednew" "$T_D1/dir1/sharednew"
rm -f "$T_M0/dir0/old0" "$T_M1/dir1/old1" "$T_M0/dir0/sharednew" "$T_M1/dir1/sharednew"
done
t_pass

View File

@@ -72,7 +72,7 @@ quarter_data=$(echo "$size_data / 4" | bc)
# XXX this is all pretty manual, would be nice to have helpers
echo "== make initial small fs"
scoutfs mkfs -A -f -Q 0,127.0.0.1,$T_SCRATCH_PORT -m $quarter_meta -d $quarter_data \
scoutfs mkfs -A -f -Q 0,127.0.0.1,53000 -m $quarter_meta -d $quarter_data \
"$T_EX_META_DEV" "$T_EX_DATA_DEV" > $T_TMP.mkfs.out 2>&1 || \
t_fail "mkfs failed"
SCR="$T_TMPDIR/mnt.scratch"

View File

@@ -50,9 +50,9 @@ t_quiet sync
cat << EOF > local.config
export FSTYP=scoutfs
export MKFS_OPTIONS="-f"
export MKFS_TEST_OPTIONS="-Q 0,127.0.0.1,$T_TEST_PORT"
export MKFS_SCRATCH_OPTIONS="-Q 0,127.0.0.1,$T_SCRATCH_PORT"
export MKFS_DEV_OPTIONS="-Q 0,127.0.0.1,$T_DEV_PORT"
export MKFS_TEST_OPTIONS="-Q 0,127.0.0.1,42000"
export MKFS_SCRATCH_OPTIONS="-Q 0,127.0.0.1,43000"
export MKFS_DEV_OPTIONS="-Q 0,127.0.0.1,44000"
export TEST_DEV=$T_DB0
export TEST_DIR=$T_M0
export SCRATCH_META_DEV=$T_EX_META_DEV
@@ -63,47 +63,73 @@ export MOUNT_OPTIONS="-o quorum_slot_nr=0,metadev_path=$T_MB0"
export TEST_FS_MOUNT_OPTS="-o quorum_slot_nr=0,metadev_path=$T_MB0"
EOF
cp "$T_EXTRA/local.exclude" local.exclude
cat << EOF > local.exclude
generic/003 # missing atime update in buffered read
generic/075 # file content mismatch failures (fds, etc)
generic/103 # enospc causes trans commit failures
generic/108 # mount fails on failing device?
generic/112 # file content mismatch failures (fds, etc)
generic/213 # enospc causes trans commit failures
generic/318 # can't support user namespaces until v5.11
generic/321 # requires selinux enabled for '+' in ls?
generic/338 # BUG_ON update inode error handling
generic/347 # _dmthin_mount doesn't work?
generic/356 # swap
generic/357 # swap
generic/409 # bind mounts not scripted yet
generic/410 # bind mounts not scripted yet
generic/411 # bind mounts not scripted yet
generic/423 # symlink inode size is strlen() + 1 on scoutfs
generic/430 # xfs_io copy_range missing in el7
generic/431 # xfs_io copy_range missing in el7
generic/432 # xfs_io copy_range missing in el7
generic/433 # xfs_io copy_range missing in el7
generic/434 # xfs_io copy_range missing in el7
generic/441 # dm-mapper
generic/444 # el9's posix_acl_update_mode is buggy ?
generic/467 # open_by_handle ESTALE
generic/472 # swap
generic/484 # dm-mapper
generic/493 # swap
generic/494 # swap
generic/495 # swap
generic/496 # swap
generic/497 # swap
generic/532 # xfs_io statx attrib_mask missing in el7
generic/554 # swap
generic/563 # cgroup+loopdev
generic/564 # xfs_io copy_range missing in el7
generic/565 # xfs_io copy_range missing in el7
generic/568 # falloc not resulting in block count increase
generic/569 # swap
generic/570 # swap
generic/620 # dm-hugedisk
generic/633 # id-mapped mounts missing in el7
generic/636 # swap
generic/641 # swap
generic/643 # swap
EOF
t_stdout_invoked
t_restore_output
echo " (showing output of xfstests)"
args="-E local.exclude ${T_XFSTESTS_ARGS:--g quick}"
./check $args
# the fs is unmounted when check finishes
t_stdout_compare
#
# ./check writes the results of the run to check.log. It lists the
# tests it ran, skipped, or failed. Then it writes a line saying
# everything passed or some failed.
#
#
# If XFSTESTS_ARGS were specified then we just pass/fail to match the
# check run.
#
if [ -n "$T_XFSTESTS_ARGS" ]; then
if tail -1 results/check.log | grep -q "Failed"; then
t_fail
else
t_pass
fi
fi
#
# Otherwise, typically, when there were no args then we scrape the most
# recent run and use it as the output to compare to make sure that we
# run the right tests and get the right results.
# ./check writes the results of the run to check.log. It lists
# the tests it ran, skipped, or failed. Then it writes a line saying
# everything passed or some failed. We scrape the most recent run and
# use it as the output to compare to make sure that we run the right
# tests and get the right results.
#
awk '
/^(Ran|Not run|Failures):.*/ {
if (pf) {
res=""
pf=""
}
res = res "\n" $0
}
res = res "\n" $0
}
/^(Passed|Failed).*tests$/ {
pf=$0
@@ -113,14 +139,10 @@ awk '
}' < results/check.log > "$T_TMPDIR/results"
# put a test per line so diff shows tests that differ
grep -E "^(Ran|Not run|Failures):" "$T_TMPDIR/results" | fmt -w 1 > "$T_TMPDIR/results.fmt"
grep -E "^(Passed|Failed).*tests$" "$T_TMPDIR/results" >> "$T_TMPDIR/results.fmt"
egrep "^(Ran|Not run|Failures):" "$T_TMPDIR/results" | \
fmt -w 1 > "$T_TMPDIR/results.fmt"
egrep "^(Passed|Failed).*tests$" "$T_TMPDIR/results" >> "$T_TMPDIR/results.fmt"
diff -u "$T_EXTRA/expected-results" "$T_TMPDIR/results.fmt" > "$T_TMPDIR/results.diff"
if [ -s "$T_TMPDIR/results.diff" ]; then
echo "tests that were skipped/run differed from expected:"
cat "$T_TMPDIR/results.diff"
t_fail
fi
t_compare_output cat "$T_TMPDIR/results.fmt"
t_pass

View File

@@ -62,28 +62,32 @@ test -x "$SCOUTFS_FENCED_RUN" || \
# files disappear.
#
# silence error messages
quiet_cat()
# generate failure messages to stderr while still echoing 0 for the caller
careful_cat()
{
cat "$@" 2>/dev/null
local path="$@"
cat "$@" || echo 0
}
while sleep $SCOUTFS_FENCED_DELAY; do
shopt -s nullglob
for fence in /sys/fs/scoutfs/*/fence/*; do
srv=$(basename $(dirname $(dirname $fence)))
fenced="$(quiet_cat $fence/fenced)"
error="$(quiet_cat $fence/error)"
rid="$(quiet_cat $fence/rid)"
ip="$(quiet_cat $fence/ipv4_addr)"
reason="$(quiet_cat $fence/reason)"
# request dirs can linger then disappear after fenced/error is set
if [ ! -d "$fence" -o "$fenced" == "1" -o "$error" == "1" ]; then
# catches unmatched regex when no dirs
if [ ! -d "$fence" ]; then
continue
fi
# skip requests that have been handled
if [ "$(careful_cat $fence/fenced)" == 1 -o \
"$(careful_cat $fence/error)" == 1 ]; then
continue
fi
srv=$(basename $(dirname $(dirname $fence)))
rid="$(cat $fence/rid)"
ip="$(cat $fence/ipv4_addr)"
reason="$(cat $fence/reason)"
log_message "server $srv fencing rid $rid at IP $ip for $reason"
# export _REQ_ vars for run to use

View File

@@ -55,30 +55,6 @@ with initial sparse regions (perhaps by multiple threads writing to
different regions) and wasted space isn't an issue (perhaps because the
file population contains few small files).
.TP
.B ino_alloc_per_lock=<number>
This option determines how many inode numbers are allocated in the same
cluster lock. The default, and maximum, is 1024. The minimum is 1.
Allocating fewer inodes per lock can allow more parallelism between
mounts because there are more locks that cover the same number of
created files. This can be helpful when working with smaller numbers of
large files.
.TP
.B lock_idle_count=<number>
This option sets the number of locks that the client will allow to
remain idle after being granted. If the number of locks exceeds this
count then the client will try to free the oldest locks. This setting
is per-mount and only changes the behavior of that mount.
.sp
Idle locks are not reclaimed by memory pressure so this option
determines the limit of how much memory is likely to be pinned by
allocated idle locks. Setting this too low can increase latency of
operations as repeated use of a working set of locks has to request the
locks from the network rather than using granted idle locks.
.sp
The count is not strictly enforced. Operations are allowed to use locks
while over the limit to avoid deadlocks under heavy concurrent load.
Exceeding the count only attempts freeing of idle locks.
.TP
.B log_merge_wait_timeout_ms=<number>
This option sets the amount of time, in milliseconds, that log merge
creation can wait before timing out. This setting is per-mount, only
@@ -161,10 +137,11 @@ connection will wait for active TCP packets, before deciding that
the connection is dead. This setting is per-mount and only changes
the behavior of that mount.
.sp
The default value of this setting is 60000msec (60s). Any precision
The default value of this setting is 10000msec (10s). Any precision
beyond a whole second is likely unrealistic due to the nature of
TCP keepalive mechanisms in the Linux kernel. Valid values are any
value higher than 3000 (3s).
value higher than 3000 (3s). Values that are higher than 30000msec
(30s) will likely interfere with other embedded timeout values.
.sp
The TCP keepalive mechanism is complex and observing a lost connection
quickly is important to maintain cluster stability. If the local