Introduce meta_reserve_blocks mount option, default value.

This change adds a mount option, meta_reserve_blocks, that reserves an additional number of blocks on the meta device. The default value is 16384, which corresponds to 1GB of space and roughly doubles the internal reserve that is calculated dynamically from the number of clients/mounts in typical configurations. It also comprises less than 2% of the smallest meta device size.

A suggested value for larger deployments is around 256 blocks per GB of meta device size, i.e. 1/64 of the meta device space, or about 1.6%.

Customers who are running into issues can adjust their mount options to increase the value for a larger safety buffer, or decrease it as a temporary way to get out of low space conditions. The value should be raised again as soon as possible once the low space condition has been resolved.

Our test suite will run with meta_reserve_blocks=0, so that the behavior of our existing tests is functionally unaffected by this change and the reserve doesn't interfere with resolving underlying ENOSPC issues. The addition of this option does, however, allow us to artificially create ENOSPC conditions at will, and we may want to add tests that specifically do so.

Signed-off-by: Auke Kok <auke.kok@versity.com>
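
The sizing arithmetic in the message above can be sketched as follows. This is only an illustration, not code from this change: the helper names are hypothetical, and the 64KB block size is an assumption inferred from 16384 blocks corresponding to 1GB (taken here as 2^30 bytes).

```c
#include <stdint.h>

/* Assumption: 64KB metadata blocks, implied by 16384 blocks == 1GB. */
#define BLOCK_SIZE_BYTES		(64ULL * 1024)
#define GB_BYTES			(1ULL << 30)

/* Values quoted in the commit message. */
#define DEFAULT_RESERVE_BLOCKS		16384ULL
#define SUGGESTED_BLOCKS_PER_GB		256ULL

/* Hypothetical helper: bytes held back by a given meta_reserve_blocks value. */
static inline uint64_t reserve_bytes(uint64_t blocks)
{
	return blocks * BLOCK_SIZE_BYTES;
}

/* Hypothetical helper: suggested reserve for a meta device of meta_dev_gb GB. */
static inline uint64_t suggested_reserve_blocks(uint64_t meta_dev_gb)
{
	return meta_dev_gb * SUGGESTED_BLOCKS_PER_GB;
}

/*
 * Hypothetical helper: the reserve as a fraction of the meta device,
 * in basis points (1/100 of a percent), rounded down.  Returns 0 for
 * an empty device rather than dividing by zero.
 */
static inline uint64_t reserve_basis_points(uint64_t blocks, uint64_t meta_dev_gb)
{
	if (meta_dev_gb == 0)
		return 0;

	return (reserve_bytes(blocks) * 10000) / (meta_dev_gb * GB_BYTES);
}
```

Under these assumptions, 256 blocks per GB works out to 16MB of reserve per GB of meta device, which is the 1/64 (about 1.6%) ratio quoted above, and the suggested ratio equals the default of 16384 blocks for a 64GB meta device.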
2026-04-30 18:05:43 +00:00 · 2025-04-17 16:06:33 -04:00
92 changed files with 2543 additions and 5600 deletions
--- a/ReleaseNotes.md
+++ b/ReleaseNotes.md
@@ -1,139 +1,6 @@
 Versity ScoutFS Release Notes
 =============================

---
-v1.30
-\
-*Apr 21, 2026*
-
-Fix a problem reading the accumulated totals of contributing .totl.
-xattrs when log merging is in progress.  The problem would have readers
-of the totals calculate the sums incorrectly.
-
-Fix a problem updating quota rules.  There was a race where updates
-could be corrupted if they happened while a transaction was being
-written.
-
-Fix a problem deleting files with .indx. xattrs.  The internal indexing
-metadata wouldn't be properly deleted so the files would still claim to
-be present and visible in the index, though the file no longer existed.
-
---
-v1.29
-\
-*Mar 25, 2026*
-
-Add a repair mechanism for mount logs that weren't properly resolved as
-mounts left the cluster.  The presence of these logs prevents log
-merging from making forward progress and the backlog of logs over time
-can cause operations to slow to a crawl.  With the repair mechanism in
-place the orphaned logs don't stop merging and operations proceed as
-usual.
-
-Add an ioctl for turning offline unmapped file regions into sparse
-regions.
-
---
-v1.28
-\
-*Feb 5, 2026*
-
-Fix a bug that lead to incorrect negative caching of ACL entries
-starting in version 9.6 of distribution kernels in the enterprise linux
-family.  This would manifest as ACLs seemingly disappearing,
-particularly default ACLs on directories.  The persistent ACLs always
-existed but because of internal API incompatibility some readers
-couldn't see them and would cache that they didn't exist.
-
---
-v1.27
-\
-*Jan 15, 2026*
-
-Switch away from using the general VM cache reclaim machinery to reduce
-idle cluster locks in the client.  The VM treated locks like a cache and
-let many accumulate, presuming that it would be efficient to free them
-in batches.  Lock freeing requires network communication so this could
-result in enormous backlogs in network messages (on the order of
-hundreds of thousands) and could result in signifcant delays of other
-network messaging.
-
-Fix inefficient network receive processing while many messages are in
-the send queue.  This consumed sufficient CPU to cause significant
-stalls, perhaps resulting in hung task warning messages due to delayed
-lock message delivery.
-
-Fix a server livelock case that could happen while committing client
-transactions that contain a large amount of freed file data extents.
-This would present as client tasks hanging and a server task spinning
-consuming cpu.
-
-Fix a rare server request processing failure that doesn't deal with
-retransmission of a request that a previous server partially processed.
-This would present as hung client tasks and repeated "error -2
-committing log merge: getting merge status item" kernel messages.
-
-Fix an unneccessary server shutdown during specific circumstances in
-client lock recovery.  The shutdown was due to server state and was
-ultimately harmless.  The next server that started up would proceed
-accordingly.
-
---
-v1.26
-\
-*Nov 17, 2025*
-
-Add the ino\_alloc\_per\_lock mount option.  This changes the number of
-inode numbers allocated under each cluster lock and can alleviate lock
-contention for some patterns of larger file creation.
-
-Add the tcp\_keepalive\_timeout\_ms mount option.  This can enable the
-system to survive longer periods of networking outages.
-
-Fix a rare double free of internal btree metadata blocks when merging
-log trees.  The duplicated freed metadata block numbers would cause
-persistent errors in the server, preventing the server from starting and
-hanging the system.
-
-Fix the data\_wait interface to not require the correct data\_version of
-the inode when raising an error.  This lets callers raise errors when
-they're unable to recall the details of the inode to discover its
-data\_version.
-
-Change scoutfs to more aggressively reclaim cached memory when under
-memory pressure.  This makes scoutfs behave more like other kernel
-components and it integrates better with the reclaim policy heuristics
-in the VM core of the kernel.
-
-Change scoutfs to more efficiently transmit and receive socket messages.
-Under heavy load this can process messages sufficiently more quickly to
-avoid hung task messages for tasks that were waiting for cluster lock
-messages to be processed.
-
-Fix faulty server block commit budget calculations that were generating
-spurious "holders exceeded alloc budget" console messages.
-
---
-v1.25
-\
-*Jun 3, 2025*
-
-Fix a bug that could cause indefinite retries of failed client commits.
-Under specific error conditions the client and server's understanding of
-the current client commit could get out of sync.  The client would retry
-commits indefinitely that could never succeed.  This manifested as
-infinite "critical transaction commit failure" messages in the kernel
-log on the client and matching "error <nr> committing client logs" on
-the server.
-
-Fix a bug in a specific case of server error handling that could result
-in sending references to unwritten blocks to the client.  The client
-would try to read blocks that hadn't been written and return spurious
-errors.  This was seen under low free space conditions on the server and
-resulted in error messages with error code 116 (The errno enum for
-ESTALE, the client's indication that it couldn't read the blocks that it
-expected.)
-
 ---
 v1.24
 \
--- a/kmod/Makefile
+++ b/kmod/Makefile
@@ -5,6 +5,13 @@ ifeq ($(SK_KSRC),)
 SK_KSRC := $(shell echo /lib/modules/`uname -r`/build)
 endif

+# fail if sparse fails if we find it
+ifeq ($(shell sparse && echo found),found)
+SP =
+else
+SP = @:
+endif
+
 SCOUTFS_GIT_DESCRIBE ?= \
 	$(shell git describe --all --abbrev=6 --long 2>/dev/null || \
 		echo no-git)
@@ -29,7 +36,9 @@ TARFILE = scoutfs-kmod-$(RPM_VERSION).tar
 all: module

 module:
-	$(MAKE) CHECK=$(CURDIR)/src/sparse-filtered.sh C=1 CF="-D__CHECK_ENDIAN__" $(SCOUTFS_ARGS)
+	$(MAKE) $(SCOUTFS_ARGS)
+	$(SP) $(MAKE) C=2 CF="-D__CHECK_ENDIAN__" $(SCOUTFS_ARGS)
+

 modules_install:
 	$(MAKE) $(SCOUTFS_ARGS) modules_install
--- a/kmod/src/Makefile.kernelcompat
+++ b/kmod/src/Makefile.kernelcompat
@@ -158,6 +158,15 @@ ifneq (,$(shell grep 'sock_create_kern.*struct net' include/linux/net.h))
 ccflags-y += -DKC_SOCK_CREATE_KERN_NET=1
 endif

+#
+# v3.18-rc6-1619-gc0371da6047a
+#
+# iov_iter is now part of struct msghdr
+#
+ifneq (,$(shell grep 'struct iov_iter.*msg_iter' include/linux/socket.h))
+ccflags-y += -DKC_MSGHDR_STRUCT_IOV_ITER=1
+endif
+
 #
 # v4.17-rc6-7-g95582b008388
 #
@@ -278,14 +287,6 @@ ifneq (,$(shell grep 'int ..mknod. .struct user_namespace' include/linux/fs.h))
 ccflags-y += -DKC_VFS_METHOD_USER_NAMESPACE_ARG
 endif

-#
-# v6.2-rc1-2-gabf08576afe3
-#
-# fs: vfs methods use struct mnt_idmap instead of struct user_namespace
-ifneq (,$(shell grep 'int vfs_mknod.struct mnt_idmap' include/linux/fs.h))
-ccflags-y += -DKC_VFS_METHOD_MNT_IDMAP_ARG
-endif
-
 #
 # v5.17-rc2-21-g07888c665b40
 #
@@ -433,66 +434,3 @@ endif
 ifneq (,$(shell grep 'int ..remap_pages..struct vm_area_struct' include/linux/mm.h))
 ccflags-y += -DKC_MM_REMAP_PAGES
 endif
-
-#
-# v3.19-4742-g503c358cf192
-#
-# list_lru_shrink_count() and list_lru_shrink_walk() introduced
-#
-ifneq (,$(shell grep 'list_lru_shrink_count.*struct list_lru' include/linux/list_lru.h))
-ccflags-y += -DKC_LIST_LRU_SHRINK_COUNT_WALK
-endif
-
-#
-# v3.19-4757-g3f97b163207c
-#
-# lru_list_walk_cb lru arg added
-#
-ifneq (,$(shell grep 'struct list_head \*item, spinlock_t \*lock, void \*cb_arg' include/linux/list_lru.h))
-ccflags-y += -DKC_LIST_LRU_WALK_CB_ITEM_LOCK
-endif
-
-#
-# v6.7-rc4-153-g0a97c01cd20b
-#
-# list_lru_{add,del} -> list_lru_{add,del}_obj
-#
-ifneq (,$(shell grep '^bool list_lru_add_obj' include/linux/list_lru.h))
-ccflags-y += -DKC_LIST_LRU_ADD_OBJ
-endif
-
-#
-# v6.12-rc6-227-gda0c02516c50
-#
-# lru_list_walk_cb lock arg removed
-#
-ifneq (,$(shell grep 'struct list_lru_one \*list, spinlock_t \*lock, void \*cb_arg' include/linux/list_lru.h))
-ccflags-y += -DKC_LIST_LRU_WALK_CB_LIST_LOCK
-endif
-
-#
-# v5.1-rc4-273-ge9b98e162aa5
-#
-# introduce stack trace helpers
-#
-ifneq (,$(shell grep '^unsigned int stack_trace_save' include/linux/stacktrace.h))
-ccflags-y += -DKC_STACK_TRACE_SAVE
-endif
-
-#
-# v6.1-rc1-2-g138060ba92b3
-#
-# set_acl now passed a struct dentry instead of inode.
-#
-ifneq (,$(shell grep 'int ..set_acl.*struct dentry' include/linux/fs.h))
-ccflags-y += -DKC_SET_ACL_DENTRY
-endif
-
-#
-# v6.1-rc1-3-gcac2f8b8d8b5
-#
-# get_acl renamed to get_inode_acl.
-#
-ifneq (,$(shell grep 'struct posix_acl.*get_inode_acl' include/linux/fs.h))
-ccflags-y += -DKC_GET_INODE_ACL
-endif
--- a/kmod/src/acl.c
+++ b/kmod/src/acl.c
@@ -107,22 +107,13 @@ struct posix_acl *scoutfs_get_acl_locked(struct inode *inode, int type, struct s
 	return acl;
 }

-#ifdef KC_GET_INODE_ACL
-struct posix_acl *scoutfs_get_acl(struct inode *inode, int type, bool rcu)
-#else
 struct posix_acl *scoutfs_get_acl(struct inode *inode, int type)
-#endif
 {
 	struct super_block *sb = inode->i_sb;
 	struct scoutfs_lock *lock = NULL;
 	struct posix_acl *acl;
 	int ret;

-#ifdef KC_GET_INODE_ACL
-	if (rcu)
-		return ERR_PTR(-ECHILD);
-#endif
-
 #ifndef KC___POSIX_ACL_CREATE
 	if (!IS_POSIXACL(inode))
 		return NULL;
@@ -210,15 +201,8 @@ out:
 	return ret;
 }

-#ifdef KC_SET_ACL_DENTRY
-int scoutfs_set_acl(KC_VFS_NS_DEF
-		    struct dentry *dentry, struct posix_acl *acl, int type)
-{
-	struct inode *inode = dentry->d_inode;
-#else
 int scoutfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 {
-#endif
 	struct super_block *sb = inode->i_sb;
 	struct scoutfs_lock *lock = NULL;
 	LIST_HEAD(ind_locks);
@@ -256,11 +240,7 @@ int scoutfs_acl_get_xattr(struct dentry *dentry, const char *name, void *value,
 	if (!IS_POSIXACL(dentry->d_inode))
 		return -EOPNOTSUPP;

-#ifdef KC_GET_INODE_ACL
-	acl = scoutfs_get_acl(dentry->d_inode, type, false);
-#else
 	acl = scoutfs_get_acl(dentry->d_inode, type);
-#endif
 	if (IS_ERR(acl))
 		return PTR_ERR(acl);
 	if (acl == NULL)
@@ -306,11 +286,7 @@ int scoutfs_acl_set_xattr(struct dentry *dentry, const char *name, const void *v
 		}
 	}

-#ifdef KC_SET_ACL_DENTRY
-	ret = scoutfs_set_acl(KC_VFS_INIT_NS dentry, acl, type);
-#else
 	ret = scoutfs_set_acl(dentry->d_inode, acl, type);
-#endif
 out:
 	posix_acl_release(acl);

--- a/kmod/src/acl.h
+++ b/kmod/src/acl.h
@@ -1,18 +1,9 @@
 #ifndef _SCOUTFS_ACL_H_
 #define _SCOUTFS_ACL_H_

-#ifdef KC_SET_ACL_DENTRY
-int scoutfs_set_acl(KC_VFS_NS_DEF
-		    struct dentry *dentry, struct posix_acl *acl, int type);
-#else
-int scoutfs_set_acl(struct inode *inode, struct posix_acl *acl, int type);
-#endif
-#ifdef KC_GET_INODE_ACL
-struct posix_acl *scoutfs_get_acl(struct inode *inode, int type, bool rcu);
-#else
 struct posix_acl *scoutfs_get_acl(struct inode *inode, int type);
-#endif
 struct posix_acl *scoutfs_get_acl_locked(struct inode *inode, int type, struct scoutfs_lock *lock);
+int scoutfs_set_acl(struct inode *inode, struct posix_acl *acl, int type);
 int scoutfs_set_acl_locked(struct inode *inode, struct posix_acl *acl, int type,
 			   struct scoutfs_lock *lock, struct list_head *ind_locks);
 #ifdef KC_XATTR_STRUCT_XATTR_HANDLER
--- a/kmod/src/alloc.c
+++ b/kmod/src/alloc.c
@@ -86,47 +86,18 @@ static u64 smallest_order_length(u64 len)
 }

 /*
- * Moving an extent between trees can dirty blocks in several ways. This
- * function calculates worst case number of blocks across these scenarions.
- * We treat the alloc and free counts independently, so the values below are
- * max(allocated, freed), not the sum.
- *
- * We track extents with two separate btree items: by block number and by size.
- *
- * If we're removing an extent from the btree (allocating), we can dirty
- * two blocks if the keys are in different leaves. If we wind up merging
- * leaves because we fall below the low water mark, we can wind up freeing
- * three leaves.
- *
- * That sequence is as follows, assuming the original keys are removed from
- * blocks A and B:
- *
- * Allocate new dirty A' and B'
- * Free old stable A and B
- * B' has fallen below the low water mark, so copy B' into A'
- * Free B'
- *
- * An extent insertion (freeing an extent) can dirty up to five distinct items
- * in the btree as it adds and removes the blkno and size sorted items for the
- * old and new lengths of the extent:
- *
- * In the by-blkno portion of the btree, we can dirty (allocate for COW) up
- * to two blocks- either by merging adjacent extents, which can cause us to
- * join leaf blocks; or by an insertion that causes a split.
- *
- * In the by-size portion, we never merge extents, so normally we just dirty
- * a single item with a size insertion. But if we merged adjacent extents in
- * the by-blkno portion of the tree, we might be working with three by-sizex
- * items: removing the two old ones that were combined in the merge; and
- * adding the new one for the larger, merged size.
- *
- * Finally, dirtying the paths to these leaves can grow the tree and grow/shrink
- * neighbours at each level, so we multiply by the height of the tree after
- * accounting for a possible new level.
+ * An extent modification dirties three distinct leaves of an allocator
+ * btree as it adds and removes the blkno and size sorted items for the
+ * old and new lengths of the extent.  Dirtying the paths to these
+ * leaves can grow the tree and grow/shrink neighbours at each level.
+ * We over-estimate the number of blocks allocated and freed (the paths
+ * share a root, growth doesn't free) to err on the simpler and safer
+ * side.  The overhead is minimal given the relatively large list blocks
+ * and relatively short allocator trees.
 */
 static u32 extent_mod_blocks(u32 height)
 {
-	return ((1 + height) * 3) * 5;
+	return ((1 + height) * 2) * 3;
 }

 /*
@@ -857,7 +828,7 @@ static int find_zone_extent(struct super_block *sb, struct scoutfs_alloc_root *r
 		.zone = SCOUTFS_FREE_EXTENT_ORDER_ZONE,
 	};
 	struct scoutfs_extent found;
-	struct scoutfs_extent ext = {0,};
+	struct scoutfs_extent ext;
 	u64 start;
 	u64 len;
 	int nr;
--- a/kmod/src/block.c
+++ b/kmod/src/block.c
@@ -22,8 +22,6 @@
 #include <linux/rhashtable.h>
 #include <linux/random.h>
 #include <linux/sched/mm.h>
-#include <linux/list_lru.h>
-#include <linux/stacktrace.h>

 #include "format.h"
 #include "super.h"
@@ -40,12 +38,26 @@
 * than the page size.  Callers can have their own contexts for tracking
 * dirty blocks that are written together.  We pin dirty blocks in
 * memory and only checksum them all as they're all written.
+ *
+ * Memory reclaim is driven by maintaining two very coarse groups of
+ * blocks.  As we access blocks we mark them with an increasing counter
+ * to discourage them from being reclaimed.  We then define a threshold
+ * at the current counter minus half the population.  Recent blocks have
+ * a counter greater than the threshold, and all other blocks with
+ * counters less than it are considered older and are candidates for
+ * reclaim.  This results in access updates rarely modifying an atomic
+ * counter as blocks need to be moved into the recent group, and shrink
+ * can randomly scan blocks looking for the half of the population that
+ * will be in the old group.  It's reasonably effective, but is
+ * particularly efficient and avoids contention between concurrent
+ * accesses and shrinking.
 */

 struct block_info {
 	struct super_block *sb;
+	atomic_t total_inserted;
+	atomic64_t access_counter;
 	struct rhashtable ht;
-	struct list_lru lru;
 	wait_queue_head_t waitq;
 	KC_DEFINE_SHRINKER(shrinker);
 	struct work_struct free_work;
@@ -64,15 +76,28 @@ enum block_status_bits {
 	BLOCK_BIT_PAGE_ALLOC,	/* page (possibly high order) allocation */
 	BLOCK_BIT_VIRT,		/* mapped virt allocation */
 	BLOCK_BIT_CRC_VALID,	/* crc has been verified */
-	BLOCK_BIT_ACCESSED,	/* seen by lookup since last lru add/walk */
 };

+/*
+ * We want to tie atomic changes in refcounts to whether or not the
+ * block is still visible in the hash table, so we store the hash
+ * table's reference up at a known high bit.  We could naturally set the
+ * inserted bit through excessive refcount increments.  We don't do
+ * anything about that but at least warn if we get close.
+ *
+ * We're avoiding the high byte for no real good reason, just out of a
+ * historical fear of implementations that don't provide the full
+ * precision.
+ */
+#define BLOCK_REF_INSERTED	(1U << 23)
+#define BLOCK_REF_FULL		(BLOCK_REF_INSERTED >> 1)
+
 struct block_private {
 	struct scoutfs_block bl;
 	struct super_block *sb;
 	atomic_t refcount;
+	u64 accessed;
 	struct rhash_head ht_head;
-	struct list_head lru_head;
 	struct list_head dirty_entry;
 	struct llist_node free_node;
 	unsigned long bits;
@@ -81,15 +106,13 @@ struct block_private {
 		struct page *page;
 		void *virt;
 	};
-	unsigned int stack_len;
-	unsigned long stack[10];
 };

 #define TRACE_BLOCK(which, bp)									\
 do {												\
 	__typeof__(bp) _bp = (bp);								\
 	trace_scoutfs_block_##which(_bp->sb, _bp, _bp->bl.blkno, atomic_read(&_bp->refcount),	\
-				    atomic_read(&_bp->io_count), _bp->bits);	\
+				    atomic_read(&_bp->io_count), _bp->bits, _bp->accessed);	\
 } while (0)

 #define BLOCK_PRIVATE(_bl) \
@@ -103,17 +126,7 @@ static __le32 block_calc_crc(struct scoutfs_block_header *hdr, u32 size)
 	return cpu_to_le32(calc);
 }

-static noinline void save_block_stack(struct block_private *bp)
-{
-	bp->stack_len = stack_trace_save(bp->stack, ARRAY_SIZE(bp->stack), 2);
-}
-
-static void print_block_stack(struct block_private *bp)
-{
-	stack_trace_print(bp->stack, bp->stack_len, 1);
-}
-
-static noinline struct block_private *block_alloc(struct super_block *sb, u64 blkno)
+static struct block_private *block_alloc(struct super_block *sb, u64 blkno)
 {
 	struct block_private *bp;
 	unsigned int nofs_flags;
@@ -163,13 +176,11 @@ static noinline struct block_private *block_alloc(struct super_block *sb, u64 bl
 	bp->bl.blkno = blkno;
 	bp->sb = sb;
 	atomic_set(&bp->refcount, 1);
-	INIT_LIST_HEAD(&bp->lru_head);
 	INIT_LIST_HEAD(&bp->dirty_entry);
 	set_bit(BLOCK_BIT_NEW, &bp->bits);
 	atomic_set(&bp->io_count, 0);

 	TRACE_BLOCK(allocate, bp);
-	save_block_stack(bp);

 out:
 	if (!bp)
@@ -222,85 +233,32 @@ static void block_free_work(struct work_struct *work)
 }

 /*
- * Users of blocks hold a refcount.  If putting a refcount drops to zero
- * then the block is freed.
- *
- * Acquiring new references and claiming the exclusive right to tear
- * down a block is built around this LIVE_REFCOUNT_BASE refcount value.
- * As blocks are initially cached they have the live base added to their
- * refcount.  Lookups will only increment the refcount and return blocks
- * for reference holders while the refcount is >= than the base.
- *
- * To remove a block from the cache and eventually free it, either by
- * the lru walk in the shrinker, or by reference holders, the live base
- * is removed and turned into a normal refcount increment that will be
- * put by the caller.  This can only be done once for a block, and once
- * its done lookup will not return any more references.
- */
-#define LIVE_REFCOUNT_BASE (INT_MAX ^ (INT_MAX >> 1))
-
-/*
- * Inc the refcount while holding an incremented refcount.  We can't
- * have so many individual reference holders that they pass the live
- * base.
+ * Get a reference to a block while holding an existing reference.
 */
 static void block_get(struct block_private *bp)
 {
-	int now = atomic_inc_return(&bp->refcount);
+	WARN_ON_ONCE((atomic_read(&bp->refcount) & ~BLOCK_REF_INSERTED) <= 0);

-	BUG_ON(now <= 1);
-	BUG_ON(now == LIVE_REFCOUNT_BASE);
+	atomic_inc(&bp->refcount);
 }

 /*
- * if (*v >= u) {
- * 	*v += a;
- * 	return true;
- * }
- */
-static bool atomic_add_unless_less(atomic_t *v, int a, int u)
+ * Get a reference to a block as long as it's been inserted in the hash
+ * table and hasn't been removed.
+ */ 
+static struct block_private *block_get_if_inserted(struct block_private *bp)
 {
-	int c;
+	int cnt;

 	do {
-		c = atomic_read(v);
-		if (c < u)
-			return false;
-	} while (atomic_cmpxchg(v, c, c + a) != c);
+		cnt = atomic_read(&bp->refcount);
+		WARN_ON_ONCE(cnt & BLOCK_REF_FULL);
+		if (!(cnt & BLOCK_REF_INSERTED))
+			return NULL;

-	return true;
-}
+	} while (atomic_cmpxchg(&bp->refcount, cnt, cnt + 1) != cnt);

-static bool block_get_if_live(struct block_private *bp)
-{
-	return atomic_add_unless_less(&bp->refcount, 1, LIVE_REFCOUNT_BASE);
-}
-
-/*
- * If the refcount still has the live base, subtract it and increment
- * the callers refcount that they'll put.
- */
-static bool block_get_remove_live(struct block_private *bp)
-{
-	return atomic_add_unless_less(&bp->refcount, (1 - LIVE_REFCOUNT_BASE), LIVE_REFCOUNT_BASE);
-}
-
-/*
- * Only get the live base refcount if it is the only refcount remaining.
- * This means that there are no active refcount holders and the block
- * can't be dirty or under IO, which both hold references.
- */
-static bool block_get_remove_live_only(struct block_private *bp)
-{
-	int c;
-
-	do {
-		c = atomic_read(&bp->refcount);
-		if (c != LIVE_REFCOUNT_BASE)
-			return false;
-	} while (atomic_cmpxchg(&bp->refcount, c, c - LIVE_REFCOUNT_BASE + 1) != c);
-
-	return true;
+	return bp;
 }

 /*
@@ -332,81 +290,143 @@ static const struct rhashtable_params block_ht_params = {
 };

 /*
- * Insert the block into the cache so that it's visible for lookups.
- * The caller can hold references (including for a dirty block).
- *
- * We make sure the base is added and the block is in the lru once it's
- * in the hash.  If hash table insertion fails it'll be briefly visible
- * in the lru, but won't be isolated/evicted because we hold an
- * incremented refcount in addition to the live base.
+ * Insert a new block into the hash table.  Once it is inserted in the
+ * hash table readers can start getting references.  The caller may have
+ * multiple refs but the block can't already be inserted.
 */
 static int block_insert(struct super_block *sb, struct block_private *bp)
 {
 	DECLARE_BLOCK_INFO(sb, binf);
 	int ret;

-	BUG_ON(atomic_read(&bp->refcount) >= LIVE_REFCOUNT_BASE);
-	atomic_add(LIVE_REFCOUNT_BASE, &bp->refcount);
-	smp_mb__after_atomic(); /* make sure live base is visible to list_lru walk */
-	list_lru_add_obj(&binf->lru, &bp->lru_head);
+	WARN_ON_ONCE(atomic_read(&bp->refcount) & BLOCK_REF_INSERTED);
+
 retry:
+	atomic_add(BLOCK_REF_INSERTED, &bp->refcount);
 	ret = rhashtable_lookup_insert_fast(&binf->ht, &bp->ht_head, block_ht_params);
 	if (ret < 0) {
+		atomic_sub(BLOCK_REF_INSERTED, &bp->refcount);
 		if (ret == -EBUSY) {
 			/* wait for pending rebalance to finish */
 			synchronize_rcu();
 			goto retry;
-		} else {
-			atomic_sub(LIVE_REFCOUNT_BASE, &bp->refcount);
-			BUG_ON(atomic_read(&bp->refcount) >= LIVE_REFCOUNT_BASE);
-			list_lru_del_obj(&binf->lru, &bp->lru_head);
 		}
 	} else {
+		atomic_inc(&binf->total_inserted);
 		TRACE_BLOCK(insert, bp);
 	}

 	return ret;
 }

-/*
- * Indicate to the lru walker that this block has been accessed since it
- * was added or last walked.
- */
-static void block_accessed(struct super_block *sb, struct block_private *bp)
+static u64 accessed_recently(struct block_info *binf)
 {
-	if (!test_and_set_bit(BLOCK_BIT_ACCESSED, &bp->bits))
-		scoutfs_inc_counter(sb, block_cache_access_update);
+	return atomic64_read(&binf->access_counter) - (atomic_read(&binf->total_inserted) >> 1);
 }

 /*
- * Remove the block from the cache.  When this returns the block won't
- * be visible for additional references from lookup.
- *
- * We always try and remove from the hash table.  It's safe to remove a
- * block that isn't hashed, it just returns -ENOENT.
- *
- * This is racing with the lru walk in the shrinker also trying to
- * remove idle blocks from the cache.  They both try to remove the live
- * refcount base and perform their removal and put if they get it.
+ * Make sure that a block that is being accessed is less likely to be
+ * reclaimed if it is seen by the shrinker.   If the block hasn't been
+ * accessed recently we update its accessed value.
 */
-static void block_remove(struct super_block *sb, struct block_private *bp)
+static void block_accessed(struct super_block *sb, struct block_private *bp)
 {
 	DECLARE_BLOCK_INFO(sb, binf);

-	rhashtable_remove_fast(&binf->ht, &bp->ht_head, block_ht_params);
-
-	if (block_get_remove_live(bp)) {
-		list_lru_del_obj(&binf->lru, &bp->lru_head);
-		block_put(sb, bp);
+	if (bp->accessed == 0 || bp->accessed < accessed_recently(binf)) {
+		scoutfs_inc_counter(sb, block_cache_access_update);
+		bp->accessed = atomic64_inc_return(&binf->access_counter);
 	}
 }

+/*
+ * The caller wants to remove the block from the hash table and has an
+ * idea what the refcount should be.  If the refcount does still
+ * indicate that the block is hashed, and we're able to clear that bit,
+ * then we can remove it from the hash table.
+ *
+ * The caller makes sure that it's safe to be referencing this block,
+ * either with their own held reference (most everything) or by being in
+ * an rcu grace period (shrink).
+ */
+static bool block_remove_cnt(struct super_block *sb, struct block_private *bp, int cnt)
+{
+	DECLARE_BLOCK_INFO(sb, binf);
+	int ret;
+
+	if ((cnt & BLOCK_REF_INSERTED) &&
+	    (atomic_cmpxchg(&bp->refcount, cnt, cnt & ~BLOCK_REF_INSERTED) == cnt)) {
+
+		TRACE_BLOCK(remove, bp);
+		ret = rhashtable_remove_fast(&binf->ht, &bp->ht_head, block_ht_params);
+		WARN_ON_ONCE(ret); /* must have been inserted */
+		atomic_dec(&binf->total_inserted);
+		return true;
+	}
+
+	return false;
+}
+
+/*
+ * Try to remove the block from the hash table as long as the refcount
+ * indicates that it is still in the hash table.  This can be racing
+ * with normal refcount changes so it might have to retry.
+ */
+static void block_remove(struct super_block *sb, struct block_private *bp)
+{
+	int cnt;
+
+	do {
+		cnt = atomic_read(&bp->refcount);
+	} while ((cnt & BLOCK_REF_INSERTED) && !block_remove_cnt(sb, bp, cnt));
+}
+
+/*
+ * Take one shot at removing the block from the hash table if it's still
+ * in the hash table and the caller has the only other reference.
+ */
+static bool block_remove_solo(struct super_block *sb, struct block_private *bp)
+{
+	return block_remove_cnt(sb, bp, BLOCK_REF_INSERTED | 1);
+}
+
 static bool io_busy(struct block_private *bp)
 {
 	smp_rmb(); /* test after adding to wait queue */
 	return test_bit(BLOCK_BIT_IO_BUSY, &bp->bits);
 }

+/*
+ * Called during shutdown with no other users.
+ */
+static void block_remove_all(struct super_block *sb)
+{
+	DECLARE_BLOCK_INFO(sb, binf);
+	struct rhashtable_iter iter;
+	struct block_private *bp;
+
+	rhashtable_walk_enter(&binf->ht, &iter);
+	rhashtable_walk_start(&iter);
+
+	for (;;) {
+		bp = rhashtable_walk_next(&iter);
+		if (bp == NULL)
+			break;
+		if (bp == ERR_PTR(-EAGAIN))
+			continue;
+
+		if (block_get_if_inserted(bp)) {
+			block_remove(sb, bp);
+			WARN_ON_ONCE(atomic_read(&bp->refcount) != 1);
+			block_put(sb, bp);
+		}
+	}
+
+	rhashtable_walk_stop(&iter);
+	rhashtable_walk_exit(&iter);
+
+	WARN_ON_ONCE(atomic_read(&binf->total_inserted) != 0);
+}

 /*
 * XXX The io_count and sb fields in the block_private are only used
@@ -468,7 +488,7 @@ static int block_submit_bio(struct super_block *sb, struct block_private *bp,
 	int ret = 0;

 	if (scoutfs_forcing_unmount(sb))
-		return -ENOLINK;
+		return -EIO;

 	sector = bp->bl.blkno << (SCOUTFS_BLOCK_LG_SHIFT - 9);

@@ -523,10 +543,6 @@ static int block_submit_bio(struct super_block *sb, struct block_private *bp,
 	return ret;
 }

-/*
- * Return a block with an elevated refcount if it was present in the
- * hash table and its refcount didn't indicate that it was being freed.
- */
 static struct block_private *block_lookup(struct super_block *sb, u64 blkno)
 {
 	DECLARE_BLOCK_INFO(sb, binf);
@@ -534,8 +550,8 @@ static struct block_private *block_lookup(struct super_block *sb, u64 blkno)

 	rcu_read_lock();
 	bp = rhashtable_lookup(&binf->ht, &blkno, block_ht_params);
-	if (bp && !block_get_if_live(bp))
-		bp = NULL;
+	if (bp)
+		bp = block_get_if_inserted(bp);
 	rcu_read_unlock();

 	return bp;
@@ -696,8 +712,8 @@ retry:

 	ret = 0;
 out:
-	if (!retried && !IS_ERR_OR_NULL(bp) && !block_is_dirty(bp) &&
-	    (ret == -ESTALE || scoutfs_trigger(sb, BLOCK_REMOVE_STALE))) {
+	if ((ret == -ESTALE || scoutfs_trigger(sb, BLOCK_REMOVE_STALE)) &&
+	    !retried && !block_is_dirty(bp)) {
 		retried = true;
 		scoutfs_inc_counter(sb, block_cache_remove_stale);
 		block_remove(sb, bp);
@@ -1062,106 +1078,100 @@ static unsigned long block_count_objects(struct shrinker *shrink, struct shrink_
 	struct super_block *sb = binf->sb;

 	scoutfs_inc_counter(sb, block_cache_count_objects);
-	return list_lru_shrink_count(&binf->lru, sc);
-}
-
-struct isolate_args {
-	struct super_block *sb;
-	struct list_head dispose;
-};
-
-#define DECLARE_ISOLATE_ARGS(sb_, name_) \
-	struct isolate_args name_ = { \
-		.sb = sb_, \
-		.dispose = LIST_HEAD_INIT(name_.dispose), \
-	}
-
-static enum lru_status isolate_lru_block(struct list_head *item, struct list_lru_one *list,
-					 void *cb_arg)
-{
-	struct block_private *bp = container_of(item, struct block_private, lru_head);
-	struct isolate_args *ia = cb_arg;
-
-	TRACE_BLOCK(isolate, bp);
-
-	/* rotate accessed blocks to the tail of the list (lazy promotion) */
-	if (test_and_clear_bit(BLOCK_BIT_ACCESSED, &bp->bits)) {
-		scoutfs_inc_counter(ia->sb, block_cache_isolate_rotate);
-		return LRU_ROTATE;
-	}
-
-	/* any refs, including dirty/io, stop us from acquiring lru refcount */
-	if (!block_get_remove_live_only(bp)) {
-		scoutfs_inc_counter(ia->sb, block_cache_isolate_skip);
-		return LRU_SKIP;
-	}
-
-	scoutfs_inc_counter(ia->sb, block_cache_isolate_removed);
-	list_lru_isolate_move(list, &bp->lru_head, &ia->dispose);
-	return LRU_REMOVED;
-}
-
-static void shrink_dispose_blocks(struct super_block *sb, struct list_head *dispose)
-{
-	struct block_private *bp;
-	struct block_private *bp__;
-
-	list_for_each_entry_safe(bp, bp__, dispose, lru_head) {
-		list_del_init(&bp->lru_head);
-		block_remove(sb, bp);
-		block_put(sb, bp);
-	}
+
+	return shrinker_min_long(atomic_read(&binf->total_inserted));
 }

+/*
+ * Remove a number of cached blocks that haven't been used recently.
+ *
+ * We don't maintain a strictly ordered LRU to avoid the contention of
+ * accesses always moving blocks around in some precise global
+ * structure.
+ *
+ * Instead we use counters to divide the blocks into two roughly equal
+ * groups by how recently they were accessed.  We randomly walk all
+ * inserted blocks looking for any blocks in the older half to remove
+ * and free.  Because the walk is random and there are two groups, we
+ * typically only walk a small multiple of the number we're looking for
+ * before we find them all.
+ *
+ * Our rcu walk of blocks can see blocks in all stages of their life
+ * cycle, from dirty blocks to those with 0 references that are queued
+ * for freeing.  We only want to free idle inserted blocks so we
+ * atomically remove blocks when the only references are ours and the
+ * hash table.
+ */
 static unsigned long block_scan_objects(struct shrinker *shrink, struct shrink_control *sc)
 {
 	struct block_info *binf = KC_SHRINKER_CONTAINER_OF(shrink, struct block_info);
 	struct super_block *sb = binf->sb;
-	DECLARE_ISOLATE_ARGS(sb, ia);
-	unsigned long freed;
+	struct rhashtable_iter iter;
+	struct block_private *bp;
+	bool stop = false;
+	unsigned long freed = 0;
+	unsigned long nr = sc->nr_to_scan;
+	u64 recently;

 	scoutfs_inc_counter(sb, block_cache_scan_objects);

-	freed = kc_list_lru_shrink_walk(&binf->lru, sc, isolate_lru_block, &ia);
-	shrink_dispose_blocks(sb, &ia.dispose);
-	return freed;
-}
+	recently = accessed_recently(binf);
+	rhashtable_walk_enter(&binf->ht, &iter);
+	rhashtable_walk_start(&iter);

-static enum lru_status dump_lru_block(struct list_head *item, struct list_lru_one *list,
-					 void *cb_arg)
-{
-	struct block_private *bp = container_of(item, struct block_private, lru_head);
+	/*
+	 * This isn't great but I don't see a better way.  We want to
+	 * walk the hash from a random point so that we're not
+	 * constantly walking over the same region that we've already
+	 * freed old blocks within.  The interface doesn't let us do
+	 * this explicitly, but this seems to work?  The difference this
+	 * makes is enormous, around a few orders of magnitude fewer
+	 * _nexts per shrink.
+	 */
+	if (iter.walker.tbl)
+		iter.slot = prandom_u32_max(iter.walker.tbl->size);

-	printk("blkno %llu refcount 0x%x io_count %d bits 0x%lx\n",
-		bp->bl.blkno, atomic_read(&bp->refcount), atomic_read(&bp->io_count),
-		bp->bits);
-	print_block_stack(bp);
+	while (nr > 0) {
+		bp = rhashtable_walk_next(&iter);
+		if (bp == NULL)
+			break;
+		if (bp == ERR_PTR(-EAGAIN)) {
+			/*
+			 * We can be called from reclaim in the allocation
+			 * to resize the hash table itself.  We have to
+			 * return so that the caller can proceed and
+			 * enable hash table iteration again.
+			 */
+			scoutfs_inc_counter(sb, block_cache_shrink_stop);
+			stop = true;
+			break;
+		}

-	return LRU_SKIP;
-}
+		scoutfs_inc_counter(sb, block_cache_shrink_next);

-/*
- * Called during shutdown with no other users.  The isolating walk must
- * find blocks on the lru that only have references for presence on the
- * lru and in the hash table.
- */
-static void block_shrink_all(struct super_block *sb)
-{
-	DECLARE_BLOCK_INFO(sb, binf);
-	DECLARE_ISOLATE_ARGS(sb, ia);
-	long count;
+		if (bp->accessed >= recently) {
+			scoutfs_inc_counter(sb, block_cache_shrink_recent);
+			continue;
+		}

-	count = DIV_ROUND_UP(list_lru_count(&binf->lru), 128) * 2;
-	do {
-		kc_list_lru_walk(&binf->lru, isolate_lru_block, &ia, 128);
-		shrink_dispose_blocks(sb, &ia.dispose);
-	} while (list_lru_count(&binf->lru) > 0 && --count > 0);
-
-	count = list_lru_count(&binf->lru);
-	if (count > 0) {
-		scoutfs_err(sb, "failed to isolate/dispose %ld blocks", count);
-		kc_list_lru_walk(&binf->lru, dump_lru_block, sb, count);
+		if (block_get_if_inserted(bp)) {
+			if (block_remove_solo(sb, bp)) {
+				scoutfs_inc_counter(sb, block_cache_shrink_remove);
+				TRACE_BLOCK(shrink, bp);
+				freed++;
+				nr--;
+			}
+			block_put(sb, bp);
+		}
 	}
+
+	rhashtable_walk_stop(&iter);
+	rhashtable_walk_exit(&iter);
+
+	if (stop)
+		return SHRINK_STOP;
+	else
+		return freed;
 }

 struct sm_block_completion {
@@ -1200,7 +1210,7 @@ static int sm_block_io(struct super_block *sb, struct block_device *bdev, blk_op
 	BUILD_BUG_ON(PAGE_SIZE < SCOUTFS_BLOCK_SM_SIZE);

 	if (scoutfs_forcing_unmount(sb))
-		return -ENOLINK;
+		return -EIO;

 	if (WARN_ON_ONCE(len > SCOUTFS_BLOCK_SM_SIZE) ||
 	    WARN_ON_ONCE(!op_is_write(opf) && !blk_crc))
@@ -1266,7 +1276,7 @@ int scoutfs_block_write_sm(struct super_block *sb,
 int scoutfs_block_setup(struct super_block *sb)
 {
 	struct scoutfs_sb_info *sbi = SCOUTFS_SB(sb);
-	struct block_info *binf = NULL;
+	struct block_info *binf;
 	int ret;

 	binf = kzalloc(sizeof(struct block_info), GFP_KERNEL);
@@ -1275,15 +1285,15 @@ int scoutfs_block_setup(struct super_block *sb)
 		goto out;
 	}

-	ret = list_lru_init(&binf->lru);
-	if (ret < 0)
-		goto out;
-
 	ret = rhashtable_init(&binf->ht, &block_ht_params);
-	if (ret < 0)
+	if (ret < 0) {
+		kfree(binf);
 		goto out;
+	}

 	binf->sb = sb;
+	atomic_set(&binf->total_inserted, 0);
+	atomic64_set(&binf->access_counter, 0);
 	init_waitqueue_head(&binf->waitq);
 	KC_INIT_SHRINKER_FUNCS(&binf->shrinker, block_count_objects,
 			       block_scan_objects);
@@ -1295,10 +1305,8 @@ int scoutfs_block_setup(struct super_block *sb)

 	ret = 0;
 out:
-	if (ret < 0 && binf) {
-		list_lru_destroy(&binf->lru);
-		kfree(binf);
-	}
+	if (ret)
+		scoutfs_block_destroy(sb);

 	return ret;
 }
@@ -1310,10 +1318,9 @@ void scoutfs_block_destroy(struct super_block *sb)

 	if (binf) {
 		KC_UNREGISTER_SHRINKER(&binf->shrinker);
-		block_shrink_all(sb);
+		block_remove_all(sb);
 		flush_work(&binf->free_work);
 		rhashtable_destroy(&binf->ht);
-		list_lru_destroy(&binf->lru);

 		kfree(binf);
 		sbi->block_info = NULL;
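The comment on block_scan_objects() in the block.c hunks above describes splitting cached blocks into an older and a newer half by access counter, then walking from a random point to evict blocks from the older half. A minimal userspace sketch of that idea follows. Everything named `demo_*` is hypothetical, the flat array merely stands in for the rhashtable, and the threshold of "half of the inserted count" is an assumption, since the body of accessed_recently() is not shown in this diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cached block; not the scoutfs block_private. */
struct demo_block {
	uint64_t accessed;	/* access counter value at last use */
	int inserted;		/* non-zero while the block is cached */
};

/* Hypothetical cache; a flat array stands in for the hash table. */
struct demo_cache {
	struct demo_block *blocks;
	size_t nr_slots;
	size_t total_inserted;
	uint64_t access_counter;
};

/* Record an access by stamping the block with the next counter value. */
static void demo_access(struct demo_cache *c, struct demo_block *b)
{
	b->accessed = ++c->access_counter;
}

/*
 * Assumed threshold: blocks stamped within the last total_inserted / 2
 * accesses count as recent.  The real accessed_recently() may differ.
 */
static uint64_t demo_recently(const struct demo_cache *c)
{
	uint64_t half = c->total_inserted / 2;

	return c->access_counter > half ? c->access_counter - half : 0;
}

/*
 * Visit every slot once, starting from a random slot, and evict up to
 * nr blocks from the older group.  Returns the number evicted.
 */
static size_t demo_shrink(struct demo_cache *c, size_t nr)
{
	uint64_t recently = demo_recently(c);
	size_t freed = 0;
	size_t start;
	size_t i;

	assert(c->nr_slots > 0);
	start = (size_t)rand() % c->nr_slots;

	for (i = 0; i < c->nr_slots && freed < nr; i++) {
		struct demo_block *b = &c->blocks[(start + i) % c->nr_slots];

		/* skip empty slots and recently used blocks */
		if (!b->inserted || b->accessed >= recently)
			continue;

		b->inserted = 0;
		c->total_inserted--;
		freed++;
	}

	return freed;
}
```

The sketch leaves out the reference counting that block_get_if_inserted() and block_remove_solo() provide in the patch, as well as the SHRINK_STOP path taken when the table is being resized. It only illustrates why no ordered LRU list is needed: an access is a single counter stamp, and eviction compares that stamp against a threshold.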
--- a/kmod/src/btree.c
+++ b/kmod/src/btree.c
@@ -2183,8 +2183,6 @@ static int merge_read_item(struct super_block *sb, struct scoutfs_key *key, u64
 		if (ret > 0) {
 			if (ret == SCOUTFS_DELTA_COMBINED) {
 				scoutfs_inc_counter(sb, btree_merge_delta_combined);
-				if (seq > found->seq)
-					found->seq = seq;
 			} else if (ret == SCOUTFS_DELTA_COMBINED_NULL) {
 				scoutfs_inc_counter(sb, btree_merge_delta_null);
 				free_mitem(rng, found);
@@ -2488,14 +2486,6 @@ int scoutfs_btree_merge(struct super_block *sb,
 			mitem = next_mitem(mitem);
 			free_mitem(&rng, tmp);
 		}
-
-		if (mitem && walk_val_len == 0 &&
-		    !(walk_flags & (BTW_INSERT | BTW_DELETE)) &&
-		    scoutfs_trigger(sb, LOG_MERGE_FORCE_PARTIAL)) {
-			ret = -ERANGE;
-			*next_ret = mitem->key;
-			goto out;
-		}
 	}

 	ret = 0;
--- a/kmod/src/client.c
+++ b/kmod/src/client.c
@@ -435,8 +435,8 @@ static int lookup_mounted_client_item(struct super_block *sb, u64 rid)
 	if (ret == -ENOENT)
 		ret = 0;

-out:
 	kfree(super);
+out:
 	return ret;
 }

--- a/kmod/src/counters.h
+++ b/kmod/src/counters.h
@@ -26,15 +26,17 @@
 	EXPAND_COUNTER(block_cache_alloc_page_order)		\
 	EXPAND_COUNTER(block_cache_alloc_virt)			\
 	EXPAND_COUNTER(block_cache_end_io_error)		\
-	EXPAND_COUNTER(block_cache_isolate_removed)		\
-	EXPAND_COUNTER(block_cache_isolate_rotate)		\
-	EXPAND_COUNTER(block_cache_isolate_skip)		\
 	EXPAND_COUNTER(block_cache_forget)			\
 	EXPAND_COUNTER(block_cache_free)			\
 	EXPAND_COUNTER(block_cache_free_work)			\
 	EXPAND_COUNTER(block_cache_remove_stale)		\
 	EXPAND_COUNTER(block_cache_count_objects)		\
 	EXPAND_COUNTER(block_cache_scan_objects)		\
+	EXPAND_COUNTER(block_cache_shrink)			\
+	EXPAND_COUNTER(block_cache_shrink_next)			\
+	EXPAND_COUNTER(block_cache_shrink_recent)		\
+	EXPAND_COUNTER(block_cache_shrink_remove)		\
+	EXPAND_COUNTER(block_cache_shrink_stop)			\
 	EXPAND_COUNTER(btree_compact_values)			\
 	EXPAND_COUNTER(btree_compact_values_enomem)		\
 	EXPAND_COUNTER(btree_delete)				\
@@ -88,7 +90,6 @@
 	EXPAND_COUNTER(forest_read_items)			\
 	EXPAND_COUNTER(forest_roots_next_hint)			\
 	EXPAND_COUNTER(forest_set_bloom_bits)			\
-	EXPAND_COUNTER(inode_deleted)				\
 	EXPAND_COUNTER(item_cache_count_objects)		\
 	EXPAND_COUNTER(item_cache_scan_objects)			\
 	EXPAND_COUNTER(item_clear_dirty)			\
@@ -116,15 +117,15 @@
 	EXPAND_COUNTER(item_pcpu_page_hit)			\
 	EXPAND_COUNTER(item_pcpu_page_miss)			\
 	EXPAND_COUNTER(item_pcpu_page_miss_keys)		\
-	EXPAND_COUNTER(item_read_pages_barrier)			\
-	EXPAND_COUNTER(item_read_pages_retry)			\
 	EXPAND_COUNTER(item_read_pages_split)			\
 	EXPAND_COUNTER(item_shrink_page)			\
 	EXPAND_COUNTER(item_shrink_page_dirty)			\
+	EXPAND_COUNTER(item_shrink_page_reader)			\
 	EXPAND_COUNTER(item_shrink_page_trylock)		\
 	EXPAND_COUNTER(item_update)				\
 	EXPAND_COUNTER(item_write_dirty)			\
 	EXPAND_COUNTER(lock_alloc)				\
+	EXPAND_COUNTER(lock_count_objects)			\
 	EXPAND_COUNTER(lock_free)				\
 	EXPAND_COUNTER(lock_grant_request)			\
 	EXPAND_COUNTER(lock_grant_response)			\
@@ -138,13 +139,12 @@
 	EXPAND_COUNTER(lock_lock_error)				\
 	EXPAND_COUNTER(lock_nonblock_eagain)			\
 	EXPAND_COUNTER(lock_recover_request)			\
+	EXPAND_COUNTER(lock_scan_objects)			\
 	EXPAND_COUNTER(lock_shrink_attempted)			\
-	EXPAND_COUNTER(lock_shrink_request_failed)		\
+	EXPAND_COUNTER(lock_shrink_aborted)			\
+	EXPAND_COUNTER(lock_shrink_work)			\
 	EXPAND_COUNTER(lock_unlock)				\
 	EXPAND_COUNTER(lock_wait)				\
-	EXPAND_COUNTER(log_merge_complete)			\
-	EXPAND_COUNTER(log_merge_no_finalized)			\
-	EXPAND_COUNTER(log_merge_start)				\
 	EXPAND_COUNTER(log_merge_wait_timeout)			\
 	EXPAND_COUNTER(net_dropped_response)			\
 	EXPAND_COUNTER(net_send_bytes)				\
@@ -159,7 +159,6 @@
 	EXPAND_COUNTER(orphan_scan)				\
 	EXPAND_COUNTER(orphan_scan_attempts)			\
 	EXPAND_COUNTER(orphan_scan_cached)			\
-	EXPAND_COUNTER(orphan_scan_empty)			\
 	EXPAND_COUNTER(orphan_scan_error)			\
 	EXPAND_COUNTER(orphan_scan_item)			\
 	EXPAND_COUNTER(orphan_scan_omap_set)			\
@@ -182,7 +181,6 @@
 	EXPAND_COUNTER(quorum_send_vote)			\
 	EXPAND_COUNTER(quorum_server_shutdown)			\
 	EXPAND_COUNTER(quorum_term_follower)			\
-	EXPAND_COUNTER(reclaimed_open_logs)			\
 	EXPAND_COUNTER(server_commit_hold)			\
 	EXPAND_COUNTER(server_commit_queue)			\
 	EXPAND_COUNTER(server_commit_worker)			\
--- a/kmod/src/data.c
+++ b/kmod/src/data.c
@@ -79,10 +79,8 @@ static void item_from_extent(struct scoutfs_key *key,
 		.skdx_end = cpu_to_le64(start + len - 1),
 		.skdx_len = cpu_to_le64(len),
 	};
-	*dv = (struct scoutfs_data_extent_val) {
-		.blkno = cpu_to_le64(map),
-		.flags = flags,
-	};
+	dv->blkno = cpu_to_le64(map);
+	dv->flags = flags;
 }

 static void ext_from_item(struct scoutfs_extent *ext,
@@ -1517,101 +1515,6 @@ out:
 	return ret;
 }

-/*
- * Punch holes in offline extents.  This is a very specific tool that
- * only does one job: it converts extents from offline to sparse.  It
- * returns an error if it encounters an extent that isn't offline or has
- * a block mapping.  It ignores i_size completely; it does not test it,
- * and does not update it.
- *
- * The caller has the inode locked in the vfs and performed basic sanity
- * checks.  We manage transactions and the extent_sem which is ordered
- * inside the transaction.
- */
-int scoutfs_data_punch_offline(struct inode *inode, u64 iblock, u64 last, u64 data_version,
-			       struct scoutfs_lock *lock)
-{
-	struct scoutfs_inode_info *si = SCOUTFS_I(inode);
-	struct super_block *sb = inode->i_sb;
-	struct data_ext_args args = {
-		.ino = scoutfs_ino(inode),
-		.inode = inode,
-		.lock = lock,
-	};
-	struct scoutfs_extent ext;
-	LIST_HEAD(ind_locks);
-	int ret;
-	int i;
-
-	if (WARN_ON_ONCE(iblock > last)) {
-		ret = -EINVAL;
-		goto out;
-	}
-
-	/* idiomatic to call start,last with 0,~0, clamp last to last possible */
-	last = min(last, SCOUTFS_BLOCK_SM_MAX);
-
-	ret = 0;
-	while (iblock <= last) {
-		ret = scoutfs_inode_index_lock_hold(inode, &ind_locks, true, false) ?:
-		      scoutfs_dirty_inode_item(inode, lock);
-		if (ret < 0)
-			break;
-
-		down_write(&si->extent_sem);
-
-		for (i = 0; i < 32 && (iblock <= last); i++) {
-			ret = scoutfs_ext_next(sb, &data_ext_ops, &args, iblock, 1, &ext);
-			if (ret == -ENOENT) {
-				iblock = last + 1;
-				ret = 0;
-				break;
-			}
-
-			if (ret < 0)
-				break;
-
-			if (ext.start > last) {
-				iblock = last + 1;
-				break;
-			}
-
-			if (ext.map) {
-				ret = -EINVAL;
-				break;
-			}
-
-			if (ext.flags & SEF_OFFLINE) {
-				if (iblock > ext.start) {
-					ext.len -= iblock - ext.start;
-					ext.start = iblock;
-				}
-				ext.len = min(ext.len, last - ext.start + 1);
-				ext.flags &= ~SEF_OFFLINE;
-
-				ret = scoutfs_ext_set(sb, &data_ext_ops, &args,
-						      ext.start, ext.len, ext.map, ext.flags);
-				if (ret < 0)
-					break;
-			}
-
-			iblock = ext.start + ext.len;
-		}
-
-		up_write(&si->extent_sem);
-
-		scoutfs_update_inode_item(inode, lock, &ind_locks);
-		scoutfs_release_trans(sb);
-		scoutfs_inode_index_unlock(sb, &ind_locks);
-
-		if (ret < 0)
-			break;
-	}
-
-out:
-	return ret;
-}
-
 /*
 * This copies to userspace :/
 */
--- a/kmod/src/data.h
+++ b/kmod/src/data.h
@@ -57,8 +57,6 @@ int scoutfs_data_init_offline_extent(struct inode *inode, u64 size,
 int scoutfs_data_move_blocks(struct inode *from, u64 from_off,
 			     u64 byte_len, struct inode *to, u64 to_off, bool to_stage,
 			     u64 data_version);
-int scoutfs_data_punch_offline(struct inode *inode, u64 iblock, u64 last, u64 data_version,
-			       struct scoutfs_lock *lock);

 int scoutfs_data_wait_check(struct inode *inode, loff_t pos, loff_t len,
 			    u8 sef, u8 op, struct scoutfs_data_wait *ow,
--- a/kmod/src/dir.c
+++ b/kmod/src/dir.c
@@ -587,12 +587,10 @@ static int add_entry_items(struct super_block *sb, u64 dir_ino, u64 hash,
 	}

 	/* initialize the dent */
-	*dent = (struct scoutfs_dirent) {
-		.ino = cpu_to_le64(ino),
-		.hash = cpu_to_le64(hash),
-		.pos = cpu_to_le64(pos),
-		.type = mode_to_type(mode),
-	};
+	dent->ino = cpu_to_le64(ino);
+	dent->hash = cpu_to_le64(hash);
+	dent->pos = cpu_to_le64(pos);
+	dent->type = mode_to_type(mode);
 	memcpy(dent->name, name, name_len);

 	init_dirent_key(&ent_key, SCOUTFS_DIRENT_TYPE, dir_ino, hash, pos);
@@ -2008,11 +2006,7 @@ const struct inode_operations scoutfs_symlink_iops = {
 #ifdef KC_LINUX_HAVE_RHEL_IOPS_WRAPPER
 	.removexattr	= generic_removexattr,
 #endif
-#ifdef KC_GET_INODE_ACL
-	.get_inode_acl	= scoutfs_get_acl,
-#else
 	.get_acl	= scoutfs_get_acl,
-#endif
 #ifndef KC_LINUX_HAVE_RHEL_IOPS_WRAPPER
 	.tmpfile	= scoutfs_tmpfile,
 	.rename		= scoutfs_rename_common,
@@ -2058,14 +2052,7 @@ const struct inode_operations scoutfs_dir_iops = {
 	.removexattr	= generic_removexattr,
 #endif
 	.listxattr	= scoutfs_listxattr,
-#ifdef KC_GET_INODE_ACL
-	.get_inode_acl	= scoutfs_get_acl,
-#else
 	.get_acl	= scoutfs_get_acl,
-#endif
-#ifdef KC_SET_ACL_DENTRY
-	.set_acl	= scoutfs_set_acl,
-#endif
 	.symlink	= scoutfs_symlink,
 	.permission	= scoutfs_permission,
 #ifdef KC_LINUX_HAVE_RHEL_IOPS_WRAPPER
--- a/kmod/src/forest.c
+++ b/kmod/src/forest.c
@@ -239,9 +239,9 @@ static int forest_read_items(struct super_block *sb, struct scoutfs_key *key, u6
 * to reset their state and retry with a newer version of the btrees.
 */
 int scoutfs_forest_read_items_roots(struct super_block *sb, struct scoutfs_net_roots *roots,
-				    u64 merge_input_seq, struct scoutfs_key *key,
-				    struct scoutfs_key *bloom_key, struct scoutfs_key *start,
-				    struct scoutfs_key *end, scoutfs_forest_item_cb cb, void *arg)
+				    struct scoutfs_key *key, struct scoutfs_key *bloom_key,
+				    struct scoutfs_key *start, struct scoutfs_key *end,
+				    scoutfs_forest_item_cb cb, void *arg)
 {
 	struct forest_read_items_data rid = {
 		.cb = cb,
@@ -317,17 +317,15 @@ int scoutfs_forest_read_items_roots(struct super_block *sb, struct scoutfs_net_r

 		scoutfs_inc_counter(sb, forest_bloom_pass);

-		if ((le64_to_cpu(lt.flags) & SCOUTFS_LOG_TREES_FINALIZED) &&
-		    (merge_input_seq == 0 ||
-		     le64_to_cpu(lt.finalize_seq) < merge_input_seq))
-			rid.fic |= FIC_MERGE_INPUT;
+		if ((le64_to_cpu(lt.flags) & SCOUTFS_LOG_TREES_FINALIZED))
+			rid.fic |= FIC_FINALIZED;

 		ret = scoutfs_btree_read_items(sb, &lt.item_root, key, start,
 					       end, forest_read_items, &rid);
 		if (ret < 0)
 			goto out;

-		rid.fic &= ~FIC_MERGE_INPUT;
+		rid.fic &= ~FIC_FINALIZED;
 	}

 	ret = 0;
@@ -347,7 +345,7 @@ int scoutfs_forest_read_items(struct super_block *sb,

 	ret = scoutfs_client_get_roots(sb, &roots);
 	if (ret == 0)
-		ret = scoutfs_forest_read_items_roots(sb, &roots, 0, key, bloom_key, start, end,
+		ret = scoutfs_forest_read_items_roots(sb, &roots, key, bloom_key, start, end,
 						      cb, arg);
 	return ret;
 }
@@ -795,7 +793,7 @@ out:
 	if (ret)
 		scoutfs_forest_destroy(sb);

-	return ret;
+	return 0;
 }

 void scoutfs_forest_start(struct super_block *sb)
--- a/kmod/src/forest.h
+++ b/kmod/src/forest.h
@@ -11,7 +11,7 @@ struct scoutfs_lock;
 /* caller gives an item to the callback */
 enum {
 	FIC_FS_ROOT = (1 << 0),
-	FIC_MERGE_INPUT = (1 << 1),
+	FIC_FINALIZED = (1 << 1),
 };
 typedef int (*scoutfs_forest_item_cb)(struct super_block *sb, struct scoutfs_key *key, u64 seq,
 				      u8 flags, void *val, int val_len, int fic, void *arg);
@@ -25,9 +25,9 @@ int scoutfs_forest_read_items(struct super_block *sb,
 			      struct scoutfs_key *end,
 			      scoutfs_forest_item_cb cb, void *arg);
 int scoutfs_forest_read_items_roots(struct super_block *sb, struct scoutfs_net_roots *roots,
-				    u64 merge_input_seq, struct scoutfs_key *key,
-				    struct scoutfs_key *bloom_key, struct scoutfs_key *start,
-				    struct scoutfs_key *end, scoutfs_forest_item_cb cb, void *arg);
+				    struct scoutfs_key *key, struct scoutfs_key *bloom_key,
+				    struct scoutfs_key *start, struct scoutfs_key *end,
+				    scoutfs_forest_item_cb cb, void *arg);
 int scoutfs_forest_set_bloom_bits(struct super_block *sb,
 				  struct scoutfs_lock *lock);
 void scoutfs_forest_set_max_seq(struct super_block *sb, u64 max_seq);
--- a/kmod/src/format.h
+++ b/kmod/src/format.h
@@ -470,7 +470,7 @@ struct scoutfs_srch_compact {
 * @get_trans_seq, @commit_trans_seq: These pair of sequence numbers
 * determine if a transaction is currently open for the mount that owns
 * the log_trees struct.  get_trans_seq is advanced by the server as the
- * transaction is opened.   The server sets commit_trans_seq equal to
+ * transaction is opened.   The server sets commit_trans_seq equal to
 * get_ as the transaction is committed.
 */
 struct scoutfs_log_trees {
@@ -1091,8 +1091,7 @@ enum scoutfs_net_cmd {
 	EXPAND_NET_ERRNO(ENOMEM)	\
 	EXPAND_NET_ERRNO(EIO)		\
 	EXPAND_NET_ERRNO(ENOSPC)	\
-	EXPAND_NET_ERRNO(EINVAL)	\
-	EXPAND_NET_ERRNO(ENOLINK)
+	EXPAND_NET_ERRNO(EINVAL)

 #undef EXPAND_NET_ERRNO
 #define EXPAND_NET_ERRNO(which) SCOUTFS_NET_ERR_##which,
--- a/kmod/src/inode.c
+++ b/kmod/src/inode.c
@@ -149,14 +149,7 @@ static const struct inode_operations scoutfs_file_iops = {
 	.removexattr	= generic_removexattr,
 #endif
 	.listxattr	= scoutfs_listxattr,
-#ifdef KC_GET_INODE_ACL
-	.get_inode_acl	= scoutfs_get_acl,
-#else
 	.get_acl	= scoutfs_get_acl,
-#endif
-#ifdef KC_SET_ACL_DENTRY
-	.set_acl	= scoutfs_set_acl,
-#endif
 	.fiemap		= scoutfs_data_fiemap,
 };

@@ -169,14 +162,7 @@ static const struct inode_operations scoutfs_special_iops = {
 	.removexattr	= generic_removexattr,
 #endif
 	.listxattr	= scoutfs_listxattr,
-#ifdef KC_GET_INODE_ACL
-	.get_inode_acl	= scoutfs_get_acl,
-#else
 	.get_acl	= scoutfs_get_acl,
-#endif
-#ifdef KC_SET_ACL_DENTRY
-	.set_acl	= scoutfs_set_acl,
-#endif
 };

 /*
@@ -1490,6 +1476,12 @@ static int remove_index_items(struct super_block *sb, u64 ino,
 * Return an allocated and unused inode number.  Returns -ENOSPC if
 * we're out of inode.
 *
+ * Each parent directory has its own pool of free inode numbers.  Items
+ * are sorted by their inode numbers as they're stored in segments.
+ * This will tend to group together files that are created in a
+ * directory at the same time in segments.  Concurrent creation across
+ * different directories will be stored in their own regions.
+ *
 * Inode numbers are never reclaimed.  If the inode is evicted or we're
 * unmounted the pending inode numbers will be lost.  Asking for a
 * relatively small number from the server each time will tend to
@@ -1499,18 +1491,12 @@ static int remove_index_items(struct super_block *sb, u64 ino,
 int scoutfs_alloc_ino(struct super_block *sb, bool is_dir, u64 *ino_ret)
 {
 	DECLARE_INODE_SB_INFO(sb, inf);
-	struct scoutfs_mount_options opts;
 	struct inode_allocator *ia;
 	u64 ino;
 	u64 nr;
 	int ret;

-	scoutfs_options_read(sb, &opts);
-
-	if (is_dir && opts.ino_alloc_per_lock == SCOUTFS_LOCK_INODE_GROUP_NR)
-		ia = &inf->dir_ino_alloc;
-	else
-		ia = &inf->ino_alloc;
+	ia = is_dir ? &inf->dir_ino_alloc : &inf->ino_alloc;

 	spin_lock(&ia->lock);

@@ -1531,17 +1517,6 @@ int scoutfs_alloc_ino(struct super_block *sb, bool is_dir, u64 *ino_ret)
 	*ino_ret = ia->ino++;
 	ia->nr--;

-	if (opts.ino_alloc_per_lock != SCOUTFS_LOCK_INODE_GROUP_NR) {
-		nr = ia->ino & SCOUTFS_LOCK_INODE_GROUP_MASK;
-		if (nr >= opts.ino_alloc_per_lock) {
-			nr = SCOUTFS_LOCK_INODE_GROUP_NR - nr;
-			if (nr > ia->nr)
-				nr = ia->nr;
-			ia->ino += nr;
-			ia->nr -= nr;
-		}
-	}
-
 	spin_unlock(&ia->lock);
 	ret = 0;
 out:
@@ -1645,14 +1620,10 @@ int scoutfs_inode_orphan_delete(struct super_block *sb, u64 ino, struct scoutfs_
 				struct scoutfs_lock *primary)
 {
 	struct scoutfs_key key;
-	int ret;

 	init_orphan_key(&key, ino);

-	ret = scoutfs_item_delete_force(sb, &key, lock, primary);
-	trace_scoutfs_inode_orphan_delete(sb, ino, ret);
-
-	return ret;
+	return scoutfs_item_delete_force(sb, &key, lock, primary);
 }

 /*
@@ -1734,8 +1705,6 @@ out:
 		scoutfs_release_trans(sb);
 	scoutfs_inode_index_unlock(sb, &ind_locks);

-	trace_scoutfs_delete_inode_end(sb, ino, mode, size, ret);
-
 	return ret;
 }

@@ -1831,9 +1800,6 @@ out:
 * they've checked that the inode could really be deleted.  We serialize
 * on a bit in the lock data so that we only have one deletion attempt
 * per inode under this mount's cluster lock.
- *
- * Returns -EAGAIN if we either did some cleanup work or are unable to finish
- * cleaning up this inode right now.
 */
 static int try_delete_inode_items(struct super_block *sb, u64 ino)
 {
@@ -1847,8 +1813,6 @@ static int try_delete_inode_items(struct super_block *sb, u64 ino)
 	int bit_nr;
 	int ret;

-	trace_scoutfs_try_delete(sb, ino);
-
 	ret = scoutfs_lock_ino(sb, SCOUTFS_LOCK_WRITE, 0, ino, &lock);
 	if (ret < 0)
 		goto out;
@@ -1861,32 +1825,27 @@ static int try_delete_inode_items(struct super_block *sb, u64 ino)

 	/* only one local attempt per inode at a time */
 	if (test_and_set_bit(bit_nr, ldata->trying)) {
-		trace_scoutfs_try_delete_local_busy(sb, ino);
-		ret = -EAGAIN;
+		ret = 0;
 		goto out;
 	}
 	clear_trying = true;

 	/* can't delete if it's cached in local or remote mounts */
 	if (scoutfs_omap_test(sb, ino) || test_bit_le(bit_nr, ldata->map.bits)) {
-		trace_scoutfs_try_delete_cached(sb, ino);
-		ret = -EAGAIN;
+		ret = 0;
 		goto out;
 	}

 	scoutfs_inode_init_key(&key, ino);
 	ret = lookup_inode_item(sb, &key, &sinode, lock);
 	if (ret < 0) {
-		if (ret == -ENOENT) {
-			trace_scoutfs_try_delete_no_item(sb, ino);
+		if (ret == -ENOENT)
 			ret = 0;
-		}
 		goto out;
 	}

 	if (le32_to_cpu(sinode.nlink) > 0) {
-		trace_scoutfs_try_delete_has_links(sb, ino, le32_to_cpu(sinode.nlink));
-		ret = -EAGAIN;
+		ret = 0;
 		goto out;
 	}

@@ -1895,11 +1854,6 @@ static int try_delete_inode_items(struct super_block *sb, u64 ino)
 		goto out;

 	ret = delete_inode_items(sb, ino, &sinode, lock, orph_lock);
-	if (ret == 0) {
-		ret = -EAGAIN;
-		scoutfs_inc_counter(sb, inode_deleted);
-	}
-
 out:
 	if (clear_trying)
 		clear_bit(bit_nr, ldata->trying);
@@ -2008,8 +1962,6 @@ static void iput_worker(struct work_struct *work)
 		while (count-- > 0)
 			iput(inode);

-		cond_resched();
-
 		/* can't touch inode after final iput */

 		spin_lock(&inf->iput_lock);
@@ -2100,10 +2052,6 @@ void scoutfs_inode_schedule_orphan_dwork(struct super_block *sb)
 * a locally cached inode.  Then we ask the server for the open map
 * containing the inode.  Only if we don't see any cached users do we do
 * the expensive work of acquiring locks to try and delete the items.
- *
- * We need to track whether there is any orphan cleanup work remaining so
- * that tests such as inode-deletion can watch the orphan_scan_empty counter
- * to determine when inode cleanup from open-unlink scenarios is complete.
 */
 static void inode_orphan_scan_worker(struct work_struct *work)
 {
@@ -2115,14 +2063,11 @@ static void inode_orphan_scan_worker(struct work_struct *work)
 	SCOUTFS_BTREE_ITEM_REF(iref);
 	struct scoutfs_key last;
 	struct scoutfs_key key;
-	bool work_todo = false;
 	u64 group_nr;
 	int bit_nr;
 	u64 ino;
 	int ret;

-	trace_scoutfs_orphan_scan_start(sb);
-
 	scoutfs_inc_counter(sb, orphan_scan);

 	init_orphan_key(&last, U64_MAX);
@@ -2142,10 +2087,8 @@ static void inode_orphan_scan_worker(struct work_struct *work)
 		init_orphan_key(&key, ino);
 		ret = scoutfs_btree_next(sb, &roots.fs_root, &key, &iref);
 		if (ret < 0) {
-			if (ret == -ENOENT) {
-				trace_scoutfs_orphan_scan_work(sb, 0);
+			if (ret == -ENOENT)
 				break;
-			}
 			goto out;
 		}

@@ -2160,7 +2103,6 @@ static void inode_orphan_scan_worker(struct work_struct *work)

 		/* locally cached inodes will try to delete as they evict */
 		if (scoutfs_omap_test(sb, ino)) {
-			work_todo = true;
 			scoutfs_inc_counter(sb, orphan_scan_cached);
 			continue;
 		}
@@ -2176,22 +2118,13 @@ static void inode_orphan_scan_worker(struct work_struct *work)

 		/* remote cached inodes will also try to delete */
 		if (test_bit_le(bit_nr, omap.bits)) {
-			work_todo = true;
 			scoutfs_inc_counter(sb, orphan_scan_omap_set);
 			continue;
 		}

 		/* seemingly orphaned and unused, get locks and check for sure */
 		scoutfs_inc_counter(sb, orphan_scan_attempts);
-		trace_scoutfs_orphan_scan_work(sb, ino);
-
 		ret = try_delete_inode_items(sb, ino);
-		if (ret == -EAGAIN) {
-			work_todo = true;
-			ret = 0;
-		}
-
-		trace_scoutfs_orphan_scan_end(sb, ino, ret);
 	}

 	ret = 0;
@@ -2200,11 +2133,6 @@ out:
 	if (ret < 0)
 		scoutfs_inc_counter(sb, orphan_scan_error);

-	if (!work_todo)
-		scoutfs_inc_counter(sb, orphan_scan_empty);
-
-	trace_scoutfs_orphan_scan_stop(sb, work_todo);
-
 	scoutfs_inode_schedule_orphan_dwork(sb);
 }

@@ -2255,7 +2183,7 @@ int scoutfs_inode_walk_writeback(struct super_block *sb, bool write)
 	struct scoutfs_inode_info *si;
 	struct scoutfs_inode_info *tmp;
 	struct inode *inode;
-	int ret = 0;
+	int ret;

 	spin_lock(&inf->writeback_lock);

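The scoutfs_alloc_ino() hunks above reduce the allocator to two pools, one for directories and one for everything else, each refilled with a small batch of numbers from the server. A minimal userspace sketch of that shape follows. All `demo_*` names and the batch size are hypothetical, the "server" is just a counter, and the locking done with ia->lock in the patch is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_BATCH 4	/* illustrative batch size, not a scoutfs constant */

/* Hypothetical pool of pre-granted inode numbers: [ino, ino + nr). */
struct demo_ino_pool {
	uint64_t ino;
	uint64_t nr;
};

struct demo_ino_allocs {
	struct demo_ino_pool dir_pool;	/* numbers for new directories */
	struct demo_ino_pool file_pool;	/* numbers for everything else */
	uint64_t server_next;		/* stand-in for the server's next free number */
	uint64_t server_end;		/* first number the server can't grant */
};

/* Stand-in for the server request: grant up to DEMO_BATCH numbers. */
static int demo_refill(struct demo_ino_allocs *a, struct demo_ino_pool *p)
{
	uint64_t avail = a->server_end - a->server_next;

	if (avail == 0)
		return -ENOSPC;

	p->ino = a->server_next;
	p->nr = avail < DEMO_BATCH ? avail : DEMO_BATCH;
	a->server_next += p->nr;
	return 0;
}

/* Hand out the next number from the pool chosen by is_dir. */
static int demo_alloc_ino(struct demo_ino_allocs *a, bool is_dir, uint64_t *ino_ret)
{
	struct demo_ino_pool *p = is_dir ? &a->dir_pool : &a->file_pool;
	int ret;

	assert(ino_ret != NULL);

	if (p->nr == 0) {
		ret = demo_refill(a, p);
		if (ret < 0)
			return ret;
	}

	*ino_ret = p->ino++;
	p->nr--;
	return 0;
}
```

Because each pool draws its own batch, numbers handed out for files stay contiguous even when directory creation is interleaved with file creation. Numbers left in a pool are simply lost when the pool is discarded, which matches the comment's note that inode numbers are never reclaimed.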
--- a/kmod/src/ioctl.c
+++ b/kmod/src/ioctl.c
@@ -415,6 +415,8 @@ static long scoutfs_ioc_data_wait_err(struct file *file, unsigned long arg)
 		return 0;
 	if ((args.op & SCOUTFS_IOC_DWO_UNKNOWN) || !IS_ERR_VALUE(args.err))
 		return -EINVAL;
+	if ((args.op & SCOUTFS_IOC_DWO_UNKNOWN) || !IS_ERR_VALUE(args.err))
+		return -EINVAL;

 	trace_scoutfs_ioc_data_wait_err(sb, &args);

@@ -439,6 +441,8 @@ static long scoutfs_ioc_data_wait_err(struct file *file, unsigned long arg)

 	if (!S_ISREG(inode->i_mode)) {
 		ret = -EINVAL;
+	} else if (scoutfs_inode_data_version(inode) != args.data_version) {
+		ret = -ESTALE;
 	} else {
 		ret = scoutfs_data_wait_err(inode, sblock, eblock, args.op,
 					    args.err);
@@ -950,9 +954,6 @@ static int copy_alloc_detail_to_user(struct super_block *sb, void *arg,
 	if (args->copied == args->nr)
 		return -EOVERFLOW;

-	/* .type and .pad need clearing */
-	memset(&ade, 0, sizeof(struct scoutfs_ioctl_alloc_detail_entry));
-
 	ade.blocks = blocks;
 	ade.id = id;
 	ade.meta = !!meta;
@@ -1368,7 +1369,7 @@ static long scoutfs_ioc_get_referring_entries(struct file *file, unsigned long a
 			ent.d_type = bref->d_type;
 			ent.name_len = name_len;

-			if (copy_to_user(uent, &ent, offsetof(struct scoutfs_ioctl_dirent, name[0])) ||
+			if (copy_to_user(uent, &ent, sizeof(struct scoutfs_ioctl_dirent)) ||
 			    copy_to_user(&uent->name[0], bref->dent.name, name_len) ||
 			    put_user('\0', &uent->name[name_len])) {
 				ret = -EFAULT;
@@ -1667,78 +1668,6 @@ out:
 	return ret;
 }

-static long scoutfs_ioc_punch_offline(struct file *file, unsigned long arg)
-{
-	struct inode *inode = file_inode(file);
-	struct super_block *sb = inode->i_sb;
-	struct scoutfs_ioctl_punch_offline __user *upo = (void __user *)arg;
-	struct scoutfs_ioctl_punch_offline po;
-	struct scoutfs_lock *lock = NULL;
-	u64 iblock;
-	u64 last;
-	u64 tmp;
-	int ret;
-
-	if (copy_from_user(&po, upo, sizeof(po)))
-		return -EFAULT;
-
-	if (po.len == 0)
-		return 0;
-
-	if (check_add_overflow(po.offset, po.len - 1, &tmp) ||
-	    (po.offset & SCOUTFS_BLOCK_SM_MASK) ||
-	    (po.len & SCOUTFS_BLOCK_SM_MASK))
-		return -EOVERFLOW;
-
-	if (po.flags)
-		return -EINVAL;
-
-	ret = mnt_want_write_file(file);
-	if (ret < 0)
-		return ret;
-
-	inode_lock(inode);
-
-	ret = scoutfs_lock_inode(sb, SCOUTFS_LOCK_WRITE,
-				 SCOUTFS_LKF_REFRESH_INODE, inode, &lock);
-	if (ret)
-		goto out;
-
-	if (!S_ISREG(inode->i_mode)) {
-		ret = -EINVAL;
-		goto out;
-	}
-
-	if (!(file->f_mode & FMODE_WRITE)) {
-		ret = -EINVAL;
-		goto out;
-	}
-
-	ret = inode_permission(KC_VFS_INIT_NS inode, MAY_WRITE);
-	if (ret < 0)
-		goto out;
-
-	if (scoutfs_inode_data_version(inode) != po.data_version) {
-		ret = -ESTALE;
-		goto out;
-	}
-
-	if ((ret = scoutfs_inode_check_retention(inode)))
-		goto out;
-
-	iblock = po.offset >> SCOUTFS_BLOCK_SM_SHIFT;
-	last = (po.offset + po.len - 1) >> SCOUTFS_BLOCK_SM_SHIFT;
-
-	ret = scoutfs_data_punch_offline(inode, iblock, last, po.data_version, lock);
-
-out:
-	scoutfs_unlock(sb, lock, SCOUTFS_LOCK_WRITE);
-	inode_unlock(inode);
-	mnt_drop_write_file(file);
-
-	return ret;
-}
-
 long scoutfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 {
 	switch (cmd) {
@@ -1788,8 +1717,6 @@ long scoutfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 		return scoutfs_ioc_mod_quota_rule(file, arg, false);
 	case SCOUTFS_IOC_READ_XATTR_INDEX:
 		return scoutfs_ioc_read_xattr_index(file, arg);
-	case SCOUTFS_IOC_PUNCH_OFFLINE:
-		return scoutfs_ioc_punch_offline(file, arg);
 	}

 	return -ENOTTY;
--- a/kmod/src/ioctl.h
+++ b/kmod/src/ioctl.h
@@ -366,15 +366,10 @@ struct scoutfs_ioctl_statfs_more {
 *
 * Find current waiters that match the inode, op, and block range to wake
 * up and return an error.
- *
- * (*) ca. v1.25 and earlier required that the data_version passed match
- * that of the waiter, but this check is removed. It was never needed
- * because no data is modified during this ioctl. Any data_version value
- * here is thus since then ignored.
 */
 struct scoutfs_ioctl_data_wait_err {
 	__u64 ino;
-	__u64 data_version; /* Ignored, see above (*) */
+	__u64 data_version;
 	__u64 offset;
 	__u64 count;
 	__u64 op;
@@ -848,32 +843,4 @@ struct scoutfs_ioctl_read_xattr_index {
 #define SCOUTFS_IOC_READ_XATTR_INDEX \
 	_IOR(SCOUTFS_IOCTL_MAGIC, 23, struct scoutfs_ioctl_read_xattr_index)

-/*
- * This is a limited and specific version of hole punching.  It's an
- * archive layer operation that only converts unmapped offline extents
- * into sparse extents.  It is intended to be used when restoring sparse
- * files after the initial creation set the entire file size offline.
- *
- * The offset and len fields are in units of bytes and must be aligned
- * to the small (4KiB) block size.  All regions of offline extents
- * covered by the region will be converted into sparse online extents,
- * including regions that straddle the boundaries of the region.  Any
- * existing sparse extents in the region are ignored.
- *
- * The data_version must match the inode or EINVAL is returned.  The
- * data_version is not modified by this operation.
- *
- * EINVAL is returned if any mapped extents are found in the region.  If
- * an error is returned then partial progress may have been made.
- */
-struct scoutfs_ioctl_punch_offline {
-	__u64 offset;
-	__u64 len;
-	__u64 data_version;
-	__u64 flags;
-};
-
-#define SCOUTFS_IOC_PUNCH_OFFLINE \
-	_IOW(SCOUTFS_IOCTL_MAGIC, 24, struct scoutfs_ioctl_punch_offline)
-
 #endif
--- a/kmod/src/item.c
+++ b/kmod/src/item.c
@@ -86,8 +86,6 @@ struct item_cache_info {
 	/* often walked, but per-cpu refs are fast path */
 	rwlock_t rwlock;
 	struct rb_root pg_root;
-	/* stop readers from caching stale items behind reclaimed cleaned written items */
-	u64 read_dirty_barrier;

 	/* page-granular modification by writers, then exclusive to commit */
 	spinlock_t dirty_lock;
@@ -98,6 +96,10 @@ struct item_cache_info {
 	spinlock_t lru_lock;
 	struct list_head lru_list;
 	unsigned long lru_pages;
+
+	/* written by page readers, read by shrink */
+	spinlock_t active_lock;
+	struct list_head active_list;
 };

 #define DECLARE_ITEM_CACHE_INFO(sb, name) \
@@ -1283,6 +1285,78 @@ static int cache_empty_page(struct super_block *sb,
 	return 0;
 }

+/*
+ * Readers operate independently from dirty items and transactions.
+ * They read a set of persistent items and insert them into the cache
+ * when there aren't already pages whose key range contains the items.
+ * This naturally prefers cached dirty items over stale read items.
+ *
+ * We have to deal with the case where dirty items are written and
+ * invalidated while a read is in flight.   The reader won't have seen
+ * the items that were dirty in their persistent roots as they started
+ * reading.  By the time they insert their read pages the previously
+ * dirty items have been reclaimed and are not in the cache.  The old
+ * stale items will be inserted in their place, effectively corrupting
+ * the cache by making the dirty items disappear.
+ *
+ * We fix this by tracking the max seq of items in pages.  As readers
+ * start they record the current transaction seq.  Invalidation skips
+ * pages whose max seq is at or past the first reader's seq because
+ * the items in the page have to stick around to prevent the reader's
+ * stale items from being inserted.
+ *
+ * This naturally only affects a small set of pages with items that were
+ * written relatively recently.  If we're in memory pressure then we
+ * probably have a lot of pages and they'll naturally have items that
+ * were visible to any readers.  We don't bother with the complicated and
+ * expensive further refinement of tracking the ranges that are being
+ * read and comparing those with pages to invalidate.
+ */
+struct active_reader {
+	struct list_head head;
+	u64 seq;
+};
+
+#define INIT_ACTIVE_READER(rdr) \
+	struct active_reader rdr = { .head = LIST_HEAD_INIT(rdr.head) }
+
+static void add_active_reader(struct super_block *sb, struct active_reader *active)
+{
+	DECLARE_ITEM_CACHE_INFO(sb, cinf);
+
+	BUG_ON(!list_empty(&active->head));
+
+	active->seq = scoutfs_trans_sample_seq(sb);
+
+	spin_lock(&cinf->active_lock);
+	list_add_tail(&active->head, &cinf->active_list);
+	spin_unlock(&cinf->active_lock);
+}
+
+static u64 first_active_reader_seq(struct item_cache_info *cinf)
+{
+	struct active_reader *active;
+	u64 first;
+
+	/* the first entry is the earliest added active reader */
+	spin_lock(&cinf->active_lock);
+	active = list_first_entry_or_null(&cinf->active_list, struct active_reader, head);
+	first = active ? active->seq : U64_MAX;
+	spin_unlock(&cinf->active_lock);
+
+	return first;
+}
+
+static void del_active_reader(struct item_cache_info *cinf, struct active_reader *active)
+{
+	/* only the calling task adds or deletes this active */
+	if (!list_empty(&active->head)) {
+		spin_lock(&cinf->active_lock);
+		list_del_init(&active->head);
+		spin_unlock(&cinf->active_lock);
+	}
+}
+
 /*
 * Add a newly read item to the pages that we're assembling for
 * insertion into the cache.   These pages are private, they only exist
@@ -1376,34 +1450,24 @@ static int read_page_item(struct super_block *sb, struct scoutfs_key *key, u64 s
 * and duplicates, we insert any resulting pages which don't overlap
 * with existing cached pages.
 *
- * The forest item reader is reading stable trees that could be
- * overwritten.  It can return -ESTALE which we return to the caller who
- * will retry the operation and work with a new set of more recent
- * btrees.
- *
 * We only insert uncached regions because this is called with cluster
 * locks held, but without locking the cache.  The regions we read can
 * be stale with respect to the current cache, which can be read and
 * dirtied by other cluster lock holders on our node, but the cluster
- * locks protect the stable items we read.
+ * locks protect the stable items we read.  Invalidation is careful not
+ * to drop pages that have items that we couldn't see because they were
+ * dirty when we started reading.
 *
- * Using the presence of locally written dirty pages to override stale
- * read pages only works if, well, the more recent locally written pages
- * are still present.  Readers are totally decoupled from writers and
- * can have a set of items that is very old indeed.  In the mean time
- * more recent items would have been dirtied locally, committed,
- * cleaned, and reclaimed.  We have a coarse barrier which ensures that
- * readers can't insert items read from old roots from before local data
- * was written.  If a write completes while a read is in progress the
- * read will have to retry.  The retried read can use cached blocks so
- * we're relying on reads being much faster than writes to reduce the
- * overhead to mostly cpu work of recollecting the items from cached
- * blocks via a more recent root from the server.
+ * The forest item reader is reading stable trees that could be
+ * overwritten.  It can return -ESTALE which we return to the caller who
+ * will retry the operation and work with a new set of more recent
+ * btrees.
 */
 static int read_pages(struct super_block *sb, struct item_cache_info *cinf,
 		      struct scoutfs_key *key, struct scoutfs_lock *lock)
 {
 	struct rb_root root = RB_ROOT;
+	INIT_ACTIVE_READER(active);
 	struct cached_page *right = NULL;
 	struct cached_page *pg;
 	struct cached_page *rd;
@@ -1416,7 +1480,6 @@ static int read_pages(struct super_block *sb, struct item_cache_info *cinf,
 	struct rb_node *par;
 	struct rb_node *pg_tmp;
 	struct rb_node *item_tmp;
-	u64 rdbar;
 	int pgi;
 	int ret;

@@ -1430,9 +1493,8 @@ static int read_pages(struct super_block *sb, struct item_cache_info *cinf,
 	pg->end = lock->end;
 	rbtree_insert(&pg->node, NULL, &root.rb_node, &root);

-	read_lock(&cinf->rwlock);
-	rdbar = cinf->read_dirty_barrier;
-	read_unlock(&cinf->rwlock);
+	/* set active reader seq before reading persistent roots */
+	add_active_reader(sb, &active);

 	start = lock->start;
 	end = lock->end;
@@ -1471,13 +1533,6 @@ static int read_pages(struct super_block *sb, struct item_cache_info *cinf,
 retry:
 	write_lock(&cinf->rwlock);

-	/* can't insert if write has cleaned since we read */
-	if (cinf->read_dirty_barrier != rdbar) {
-		scoutfs_inc_counter(sb, item_read_pages_barrier);
-		ret = -ESTALE;
-		goto unlock;
-	}
-
 	while ((rd = first_page(&root))) {

 		pg = page_rbtree_walk(sb, &cinf->pg_root, &rd->start, &rd->end,
@@ -1515,12 +1570,12 @@ retry:
 		}
 	}

-	ret = 0;
-
-unlock:
 	write_unlock(&cinf->rwlock);

+	ret = 0;
 out:
+	del_active_reader(cinf, &active);
+
 	/* free any pages we left dangling on error */
 	for_each_page_safe(&root, rd, pg_tmp) {
 		rbtree_erase(&rd->node, &root);
@@ -1580,7 +1635,6 @@ retry:
 			ret = read_pages(sb, cinf, key, lock);
 		if (ret < 0 && ret != -ESTALE)
 			goto out;
-		scoutfs_inc_counter(sb, item_read_pages_retry);
 		goto retry;
 	}

@@ -2347,12 +2401,6 @@ out:
 * The caller has successfully committed all the dirty btree blocks that
 * contained the currently dirty items.  Clear all the dirty items and
 * pages.
- *
- * This strange lock/trylock loop comes from sparse issuing spurious
- * mismatched context warnings if we do anything (like unlock and relax)
- * in the else branch of the failed trylock.  We're jumping through
- * hoops to not use the else but still drop and reacquire the dirty_lock
- * if the trylock fails.
 */
 int scoutfs_item_write_done(struct super_block *sb)
 {
@@ -2361,35 +2409,40 @@ int scoutfs_item_write_done(struct super_block *sb)
 	struct cached_item *tmp;
 	struct cached_page *pg;

-	/* don't let read_pages miss written+cleaned items */
-	write_lock(&cinf->rwlock);
-	cinf->read_dirty_barrier++;
-	write_unlock(&cinf->rwlock);
-
+retry:
 	spin_lock(&cinf->dirty_lock);
-	while ((pg = list_first_entry_or_null(&cinf->dirty_list, struct cached_page, dirty_head))) {
-		if (write_trylock(&pg->rwlock)) {
+
+	while ((pg = list_first_entry_or_null(&cinf->dirty_list,
+					      struct cached_page,
+					      dirty_head))) {
+
+		if (!write_trylock(&pg->rwlock)) {
 			spin_unlock(&cinf->dirty_lock);
-			list_for_each_entry_safe(item, tmp, &pg->dirty_list,
-						 dirty_head) {
-				clear_item_dirty(sb, cinf, pg, item);
-
-				if (item->delta)
-					scoutfs_inc_counter(sb, item_delta_written);
-
-				/* free deletion items */
-				if (item->deletion || item->delta)
-					erase_item(pg, item);
-				else
-					item->persistent = 1;
-			}
-
-			write_unlock(&pg->rwlock);
-			spin_lock(&cinf->dirty_lock);
+			cpu_relax();
+			goto retry;
 		}
+
 		spin_unlock(&cinf->dirty_lock);
+
+		list_for_each_entry_safe(item, tmp, &pg->dirty_list,
+					 dirty_head) {
+			clear_item_dirty(sb, cinf, pg, item);
+
+			if (item->delta)
+				scoutfs_inc_counter(sb, item_delta_written);
+
+			/* free deletion items */
+			if (item->deletion || item->delta)
+				erase_item(pg, item);
+			else
+				item->persistent = 1;
+		}
+
+		write_unlock(&pg->rwlock);
+
 		spin_lock(&cinf->dirty_lock);
-	} while (pg);
+	}
+
 	spin_unlock(&cinf->dirty_lock);

 	return 0;
@@ -2544,15 +2597,24 @@ static unsigned long item_cache_scan_objects(struct shrinker *shrink,
 	struct cached_page *tmp;
 	struct cached_page *pg;
 	unsigned long freed = 0;
+	u64 first_reader_seq;
 	int nr = sc->nr_to_scan;

 	scoutfs_inc_counter(sb, item_cache_scan_objects);

+	/* can't invalidate pages with items that weren't visible to first reader */
+	first_reader_seq = first_active_reader_seq(cinf);
+
 	write_lock(&cinf->rwlock);
 	spin_lock(&cinf->lru_lock);

 	list_for_each_entry_safe(pg, tmp, &cinf->lru_list, lru_head) {

+		if (first_reader_seq <= pg->max_seq) {
+			scoutfs_inc_counter(sb, item_shrink_page_reader);
+			continue;
+		}
+
 		if (!write_trylock(&pg->rwlock)) {
 			scoutfs_inc_counter(sb, item_shrink_page_trylock);
 			continue;
@@ -2619,6 +2681,8 @@ int scoutfs_item_setup(struct super_block *sb)
 	atomic_set(&cinf->dirty_pages, 0);
 	spin_lock_init(&cinf->lru_lock);
 	INIT_LIST_HEAD(&cinf->lru_list);
+	spin_lock_init(&cinf->active_lock);
+	INIT_LIST_HEAD(&cinf->active_list);

 	cinf->pcpu_pages = alloc_percpu(struct item_percpu_pages);
 	if (!cinf->pcpu_pages)
@@ -2651,6 +2715,8 @@ void scoutfs_item_destroy(struct super_block *sb)
 	int cpu;

 	if (cinf) {
+		BUG_ON(!list_empty(&cinf->active_list));
+
 #ifdef KC_CPU_NOTIFIER
 		unregister_hotcpu_notifier(&cinf->notifier);
 #endif
--- a/kmod/src/kernelcompat.c
+++ b/kmod/src/kernelcompat.c
@@ -81,69 +81,3 @@ kc_generic_file_buffered_write(struct kiocb *iocb, const struct iovec *iov,
 	return written ? written : status;
 }
 #endif
-
-#include <linux/list_lru.h>
-
-#ifdef KC_LIST_LRU_WALK_CB_ITEM_LOCK
-static enum lru_status kc_isolate(struct list_head *item, spinlock_t *lock, void *cb_arg)
-{
-	struct kc_isolate_args *args = cb_arg;
-
-	/* isolate doesn't use list, nr_items updated in caller */
-	return args->isolate(item, NULL, args->cb_arg);
-}
-
-unsigned long kc_list_lru_walk(struct list_lru *lru, kc_list_lru_walk_cb_t isolate, void *cb_arg,
-				      unsigned long nr_to_walk)
-{
-	struct kc_isolate_args args = {
-		.isolate = isolate,
-		.cb_arg = cb_arg,
-	};
-
-	return list_lru_walk(lru, kc_isolate, &args, nr_to_walk);
-}
-
-unsigned long kc_list_lru_shrink_walk(struct list_lru *lru, struct shrink_control *sc,
-				      kc_list_lru_walk_cb_t isolate, void *cb_arg)
-{
-	struct kc_isolate_args args = {
-		.isolate = isolate,
-		.cb_arg = cb_arg,
-	};
-
-	return list_lru_shrink_walk(lru, sc, kc_isolate, &args);
-}
-#endif
-
-#ifdef KC_LIST_LRU_WALK_CB_LIST_LOCK
-static enum lru_status kc_isolate(struct list_head *item, struct list_lru_one *list,
-				  spinlock_t *lock, void *cb_arg)
-{
-	struct kc_isolate_args *args = cb_arg;
-
-	return args->isolate(item, list, args->cb_arg);
-}
-
-unsigned long kc_list_lru_walk(struct list_lru *lru, kc_list_lru_walk_cb_t isolate, void *cb_arg,
-				      unsigned long nr_to_walk)
-{
-	struct kc_isolate_args args = {
-		.isolate = isolate,
-		.cb_arg = cb_arg,
-	};
-
-	return list_lru_walk(lru, kc_isolate, &args, nr_to_walk);
-}
-unsigned long kc_list_lru_shrink_walk(struct list_lru *lru, struct shrink_control *sc,
-				      kc_list_lru_walk_cb_t isolate, void *cb_arg)
-{
-	struct kc_isolate_args args = {
-		.isolate = isolate,
-		.cb_arg = cb_arg,
-	};
-
-	return list_lru_shrink_walk(lru, sc, kc_isolate, &args);
-}
-
-#endif
--- a/kmod/src/kernelcompat.h
+++ b/kmod/src/kernelcompat.h
@@ -263,11 +263,6 @@ typedef unsigned int blk_opf_t;
 #define kc__vmalloc __vmalloc
 #endif

-#ifdef KC_VFS_METHOD_MNT_IDMAP_ARG
-#define KC_VFS_NS_DEF struct mnt_idmap *mnt_idmap,
-#define KC_VFS_NS mnt_idmap,
-#define KC_VFS_INIT_NS &nop_mnt_idmap,
-#else
 #ifdef KC_VFS_METHOD_USER_NAMESPACE_ARG
 #define KC_VFS_NS_DEF struct user_namespace *mnt_user_ns,
 #define KC_VFS_NS mnt_user_ns,
@@ -277,7 +272,6 @@ typedef unsigned int blk_opf_t;
 #define KC_VFS_NS
 #define KC_VFS_INIT_NS
 #endif
-#endif /* KC_VFS_METHOD_MNT_IDMAP_ARG */

 #ifdef KC_BIO_ALLOC_DEV_OPF_ARGS
 #define kc_bio_alloc bio_alloc
@@ -416,77 +410,4 @@ static inline vm_fault_t vmf_error(int err)
 }
 #endif

-#include <linux/list_lru.h>
-
-#ifndef KC_LIST_LRU_SHRINK_COUNT_WALK
-/* we don't bother with sc->{nid,memcg} (which doesn't exist in oldest kernels) */
-static inline unsigned long list_lru_shrink_count(struct list_lru *lru,
-                                                  struct shrink_control *sc)
-{
-        return list_lru_count(lru);
-}
-static inline unsigned long
-list_lru_shrink_walk(struct list_lru *lru, struct shrink_control *sc,
-		     list_lru_walk_cb isolate, void *cb_arg)
-{
-	return list_lru_walk(lru, isolate, cb_arg, sc->nr_to_scan);
-}
-#endif
-
-#ifndef KC_LIST_LRU_ADD_OBJ
-#define list_lru_add_obj list_lru_add
-#define list_lru_del_obj list_lru_del
-#endif
-
-#if defined(KC_LIST_LRU_WALK_CB_LIST_LOCK) || defined(KC_LIST_LRU_WALK_CB_ITEM_LOCK)
-struct list_lru_one;
-typedef enum lru_status (*kc_list_lru_walk_cb_t)(struct list_head *item, struct list_lru_one *list,
-						 void *cb_arg);
-struct kc_isolate_args {
-	kc_list_lru_walk_cb_t isolate;
-	void *cb_arg;
-};
-unsigned long kc_list_lru_walk(struct list_lru *lru, kc_list_lru_walk_cb_t isolate, void *cb_arg,
-			       unsigned long nr_to_walk);
-unsigned long kc_list_lru_shrink_walk(struct list_lru *lru, struct shrink_control *sc,
-				      kc_list_lru_walk_cb_t isolate, void *cb_arg);
-#else
-#define kc_list_lru_shrink_walk list_lru_shrink_walk
-#endif
-
-#if defined(KC_LIST_LRU_WALK_CB_ITEM_LOCK)
-/* isolate moved by hand, nr_items updated in walk as _REMOVE returned */
-static inline void list_lru_isolate_move(struct list_lru_one *list, struct list_head *item,
-					 struct list_head *head)
-{
-        list_move(item, head);
-}
-#endif
-
-#ifndef KC_STACK_TRACE_SAVE
-#include <linux/stacktrace.h>
-static inline unsigned int stack_trace_save(unsigned long *store, unsigned int size,
-					    unsigned int skipnr)
-{
-        struct stack_trace trace = {
-                .entries        = store,
-                .max_entries    = size,
-                .skip           = skipnr,
-        };
-
-        save_stack_trace(&trace);
-        return trace.nr_entries;
-}
-
-static inline void stack_trace_print(unsigned long *entries, unsigned int nr_entries, int spaces)
-{
-        struct stack_trace trace = {
-                .entries        = entries,
-                .nr_entries     = nr_entries,
-        };
-
-	print_stack_trace(&trace, spaces);
-}
-#endif
-
 #endif
--- a/kmod/src/lock.c
+++ b/kmod/src/lock.c
@@ -53,10 +53,8 @@
 * all access to the lock (by revoking it down to a null mode) then the
 * lock is freed.
 *
- * Each client has a configurable number of locks that are allowed to
- * remain idle after being granted, for use by future tasks.  Past the
- * limit locks are freed by requesting a null mode from the server,
- * governed by a LRU.
+ * Memory pressure on the client can cause the client to request a null
+ * mode from the server so that once it's granted the lock can be freed.
 *
 * So far we've only needed a minimal trylock.  We return -EAGAIN if a
 * lock attempt can't immediately match an existing granted lock.  This
@@ -81,11 +79,14 @@ struct lock_info {
 	bool unmounting;
 	struct rb_root lock_tree;
 	struct rb_root lock_range_tree;
-	u64 nr_locks;
+	KC_DEFINE_SHRINKER(shrinker);
 	struct list_head lru_list;
+	unsigned long long lru_nr;
 	struct workqueue_struct *workq;
 	struct work_struct inv_work;
 	struct list_head inv_list;
+	struct work_struct shrink_work;
+	struct list_head shrink_list;
 	atomic64_t next_refresh_gen;

 	struct dentry *tseq_dentry;
@@ -167,6 +168,7 @@ static int lock_invalidate(struct super_block *sb, struct scoutfs_lock *lock,
 			   enum scoutfs_lock_mode prev, enum scoutfs_lock_mode mode)
 {
 	struct scoutfs_lock_coverage *cov;
+	struct scoutfs_lock_coverage *tmp;
 	u64 ino, last;
 	int ret = 0;

@@ -190,22 +192,19 @@ static int lock_invalidate(struct super_block *sb, struct scoutfs_lock *lock,

 	/* have to invalidate if we're not in the only usable case */
 	if (!(prev == SCOUTFS_LOCK_WRITE && mode == SCOUTFS_LOCK_READ)) {
-		/*
-		 * Remove cov items to tell users that their cache is
-		 * stale.  The unlock pattern comes from avoiding bad
-		 * sparse warnings when taking else in a failed trylock.
-		 */
+retry:
+		/* remove cov items to tell users that their cache is stale */
 		spin_lock(&lock->cov_list_lock);
-		while ((cov = list_first_entry_or_null(&lock->cov_list,
-						       struct scoutfs_lock_coverage, head))) {
-			if (spin_trylock(&cov->cov_lock)) {
-				list_del_init(&cov->head);
-				cov->lock = NULL;
-				spin_unlock(&cov->cov_lock);
-				scoutfs_inc_counter(sb, lock_invalidate_coverage);
+		list_for_each_entry_safe(cov, tmp, &lock->cov_list, head) {
+			if (!spin_trylock(&cov->cov_lock)) {
+				spin_unlock(&lock->cov_list_lock);
+				cpu_relax();
+				goto retry;
 			}
-			spin_unlock(&lock->cov_list_lock);
-			spin_lock(&lock->cov_list_lock);
+			list_del_init(&cov->head);
+			cov->lock = NULL;
+			spin_unlock(&cov->cov_lock);
+			scoutfs_inc_counter(sb, lock_invalidate_coverage);
 		}
 		spin_unlock(&lock->cov_list_lock);

@@ -248,6 +247,7 @@ static void lock_free(struct lock_info *linfo, struct scoutfs_lock *lock)
 	BUG_ON(!RB_EMPTY_NODE(&lock->range_node));
 	BUG_ON(!list_empty(&lock->lru_head));
 	BUG_ON(!list_empty(&lock->inv_head));
+	BUG_ON(!list_empty(&lock->shrink_head));
 	BUG_ON(!list_empty(&lock->cov_list));

 	kfree(lock->inode_deletion_data);
@@ -275,6 +275,7 @@ static struct scoutfs_lock *lock_alloc(struct super_block *sb,
 	INIT_LIST_HEAD(&lock->lru_head);
 	INIT_LIST_HEAD(&lock->inv_head);
 	INIT_LIST_HEAD(&lock->inv_list);
+	INIT_LIST_HEAD(&lock->shrink_head);
 	spin_lock_init(&lock->cov_list_lock);
 	INIT_LIST_HEAD(&lock->cov_list);

@@ -407,7 +408,6 @@ static bool lock_insert(struct super_block *sb, struct scoutfs_lock *ins)
 	rb_link_node(&ins->node, parent, node);
 	rb_insert_color(&ins->node, &linfo->lock_tree);

-	linfo->nr_locks++;
 	scoutfs_tseq_add(&linfo->tseq_tree, &ins->tseq_entry);

 	return true;
@@ -422,7 +422,6 @@ static void lock_remove(struct lock_info *linfo, struct scoutfs_lock *lock)
 	rb_erase(&lock->range_node, &linfo->lock_range_tree);
 	RB_CLEAR_NODE(&lock->range_node);

-	linfo->nr_locks--;
 	scoutfs_tseq_del(&linfo->tseq_tree, &lock->tseq_entry);
 }

@@ -462,8 +461,10 @@ static void __lock_del_lru(struct lock_info *linfo, struct scoutfs_lock *lock)
 {
 	assert_spin_locked(&linfo->lock);

-	if (!list_empty(&lock->lru_head))
+	if (!list_empty(&lock->lru_head)) {
 		list_del_init(&lock->lru_head);
+		linfo->lru_nr--;
+	}
 }

 /*
@@ -522,16 +523,14 @@ static struct scoutfs_lock *create_lock(struct super_block *sb,
 * indicate that the lock wasn't idle.  If it really is idle then we
 * either free it if it's null or put it back on the lru.
 */
-static void __put_lock(struct lock_info *linfo, struct scoutfs_lock *lock, bool tail)
+static void put_lock(struct lock_info *linfo, struct scoutfs_lock *lock)
 {
 	assert_spin_locked(&linfo->lock);

 	if (lock_idle(lock)) {
 		if (lock->mode != SCOUTFS_LOCK_NULL) {
-			if (tail)
-				list_add_tail(&lock->lru_head, &linfo->lru_list);
-			else
-				list_add(&lock->lru_head, &linfo->lru_list);
+			list_add_tail(&lock->lru_head, &linfo->lru_list);
+			linfo->lru_nr++;
 		} else {
 			lock_remove(linfo, lock);
 			lock_free(linfo, lock);
@@ -539,11 +538,6 @@ static void __put_lock(struct lock_info *linfo, struct scoutfs_lock *lock, bool
 	}
 }

-static inline void put_lock(struct lock_info *linfo, struct scoutfs_lock *lock)
-{
-	__put_lock(linfo, lock, true);
-}
-
 /*
 * The caller has made a change (set a lock mode) which can let one of the
 * invalidating locks make forward progress.
@@ -717,14 +711,14 @@ static void lock_invalidate_worker(struct work_struct *work)
 		/* only lock protocol, inv can't call subsystems after shutdown */
 		if (!linfo->shutdown) {
 			ret = lock_invalidate(sb, lock, nl->old_mode, nl->new_mode);
-			BUG_ON(ret < 0 && ret != -ENOLINK);
+			BUG_ON(ret);
 		}

 		/* respond with the key and modes from the request, server might have died */
 		ret = scoutfs_client_lock_response(sb, ireq->net_id, nl);
 		if (ret == -ENOTCONN)
 			ret = 0;
-		BUG_ON(ret < 0 && ret != -ENOLINK);
+		BUG_ON(ret);

 		scoutfs_inc_counter(sb, lock_invalidate_response);
 	}
@@ -879,69 +873,6 @@ int scoutfs_lock_recover_request(struct super_block *sb, u64 net_id,
 	return ret;
 }

-/*
- * This is called on every _lock call to try and keep the number of
- * locks under the idle count.  We're intentionally trying to throttle
- * shrinking bursts by tying its frequency to lock use.  It will only
- * send requests to free unused locks, though, so it's always possible
- * to exceed the high water mark under heavy load.
- *
- * We send a null request and the lock will be freed by the response
- * once all users drain.  If this races with invalidation then the
- * server will only send the grant response once the invalidation is
- * finished.
- */
-static bool try_shrink_lock(struct super_block *sb, struct lock_info *linfo, bool force)
-{
-	struct scoutfs_mount_options opts;
-	struct scoutfs_lock *lock = NULL;
-	struct scoutfs_net_lock nl;
-	int ret = 0;
-
-	scoutfs_options_read(sb, &opts);
-
-	/* avoiding lock contention with unsynchronized test, don't mind temp false results */
-	if (!force && (list_empty(&linfo->lru_list) ||
-	               READ_ONCE(linfo->nr_locks) <= opts.lock_idle_count))
-		return false;
-
-	spin_lock(&linfo->lock);
-
-	lock = list_first_entry_or_null(&linfo->lru_list, struct scoutfs_lock, lru_head);
-	if (lock && (force || (linfo->nr_locks > opts.lock_idle_count))) {
-		__lock_del_lru(linfo, lock);
-		lock->request_pending = 1;
-
-		nl.key = lock->start;
-		nl.old_mode = lock->mode;
-		nl.new_mode = SCOUTFS_LOCK_NULL;
-	} else {
-		lock = NULL;
-	}
-
-	spin_unlock(&linfo->lock);
-
-	if (lock) {
-		ret = scoutfs_client_lock_request(sb, &nl);
-		if (ret < 0) {
-			scoutfs_inc_counter(sb, lock_shrink_request_failed);
-
-			spin_lock(&linfo->lock);
-
-			lock->request_pending = 0;
-			wake_up(&lock->waitq);
-			__put_lock(linfo, lock, false);
-
-			spin_unlock(&linfo->lock);
-		} else {
-			scoutfs_inc_counter(sb, lock_shrink_attempted);
-			trace_scoutfs_lock_shrink(sb, lock);
-		}
-	}
-
-	return lock && ret == 0;
-}
-
 static bool lock_wait_cond(struct super_block *sb, struct scoutfs_lock *lock,
 			   enum scoutfs_lock_mode mode)
 {
@@ -1004,8 +935,6 @@ static int lock_key_range(struct super_block *sb, enum scoutfs_lock_mode mode, i
 	if (WARN_ON_ONCE(scoutfs_trans_held()))
 		return -EDEADLK;

-	try_shrink_lock(sb, linfo, false);
-
 	spin_lock(&linfo->lock);

 	/* drops and re-acquires lock if it allocates */
@@ -1449,12 +1378,134 @@ bool scoutfs_lock_protected(struct scoutfs_lock *lock, struct scoutfs_key *key,
 					  &lock->start, &lock->end) == 0;
 }

+/*
+ * The shrink callback got the lock, marked it request_pending, and put
+ * it on the shrink list.  We send a null request and the lock will be
+ * freed by the response once all users drain.  If this races with
+ * invalidation then the server will only send the grant response once
+ * the invalidation is finished.
+ */
+static void lock_shrink_worker(struct work_struct *work)
+{
+	struct lock_info *linfo = container_of(work, struct lock_info,
+					       shrink_work);
+	struct super_block *sb = linfo->sb;
+	struct scoutfs_net_lock nl;
+	struct scoutfs_lock *lock;
+	struct scoutfs_lock *tmp;
+	LIST_HEAD(list);
+	int ret;
+
+	scoutfs_inc_counter(sb, lock_shrink_work);
+
+	spin_lock(&linfo->lock);
+	list_splice_init(&linfo->shrink_list, &list);
+	spin_unlock(&linfo->lock);
+
+	list_for_each_entry_safe(lock, tmp, &list, shrink_head) {
+		list_del_init(&lock->shrink_head);
+
+		/* unlocked lock access, but should be stable since we queued */
+		nl.key = lock->start;
+		nl.old_mode = lock->mode;
+		nl.new_mode = SCOUTFS_LOCK_NULL;
+
+		ret = scoutfs_client_lock_request(sb, &nl);
+		if (ret) {
+			/* oh well, not freeing */
+			scoutfs_inc_counter(sb, lock_shrink_aborted);
+
+			spin_lock(&linfo->lock);
+
+			lock->request_pending = 0;
+			wake_up(&lock->waitq);
+			put_lock(linfo, lock);
+
+			spin_unlock(&linfo->lock);
+		}
+	}
+}
+
+static unsigned long lock_count_objects(struct shrinker *shrink,
+					struct shrink_control *sc)
+{
+	struct lock_info *linfo = KC_SHRINKER_CONTAINER_OF(shrink, struct lock_info);
+	struct super_block *sb = linfo->sb;
+
+	scoutfs_inc_counter(sb, lock_count_objects);
+
+	return shrinker_min_long(linfo->lru_nr);
+}
+
+/*
+ * Start the shrinking process for locks on the lru.  If a lock is on
+ * the lru then it can't have any active users.  We don't want to block
+ * or allocate here so all we do is get the lock, mark it request
+ * pending, and kick off the work.  The work sends a null request and
+ * eventually the lock is freed by its response.
+ *
+ * Only a racing lock attempt that isn't matched can prevent the lock
+ * from being freed.  It'll block waiting to send its request for its
+ * mode which will prevent the lock from being freed when the null
+ * response arrives.
+ */
+static unsigned long lock_scan_objects(struct shrinker *shrink,
+				       struct shrink_control *sc)
+{
+	struct lock_info *linfo = KC_SHRINKER_CONTAINER_OF(shrink, struct lock_info);
+	struct super_block *sb = linfo->sb;
+	struct scoutfs_lock *lock;
+	struct scoutfs_lock *tmp;
+	unsigned long freed = 0;
+	unsigned long nr = sc->nr_to_scan;
+	bool added = false;
+
+	scoutfs_inc_counter(sb, lock_scan_objects);
+
+	spin_lock(&linfo->lock);
+
+restart:
+	list_for_each_entry_safe(lock, tmp, &linfo->lru_list, lru_head) {
+
+		BUG_ON(!lock_idle(lock));
+		BUG_ON(lock->mode == SCOUTFS_LOCK_NULL);
+		BUG_ON(!list_empty(&lock->shrink_head));
+
+		if (nr-- == 0)
+			break;
+
+		__lock_del_lru(linfo, lock);
+		lock->request_pending = 1;
+		list_add_tail(&lock->shrink_head, &linfo->shrink_list);
+		added = true;
+		freed++;
+
+		scoutfs_inc_counter(sb, lock_shrink_attempted);
+		trace_scoutfs_lock_shrink(sb, lock);
+
+		/* could have bazillions of idle locks */
+		if (cond_resched_lock(&linfo->lock))
+			goto restart;
+	}
+
+	spin_unlock(&linfo->lock);
+
+	if (added)
+		queue_work(linfo->workq, &linfo->shrink_work);
+
+	trace_scoutfs_lock_shrink_exit(sb, sc->nr_to_scan, freed);
+	return freed;
+}
+
 void scoutfs_free_unused_locks(struct super_block *sb)
 {
-	DECLARE_LOCK_INFO(sb, linfo);
+	struct lock_info *linfo = SCOUTFS_SB(sb)->lock_info;
+	struct shrink_control sc = {
+		.gfp_mask = GFP_NOFS,
+		.nr_to_scan = INT_MAX,
+	};

-	while (try_shrink_lock(sb, linfo, true))
-		cond_resched();
+	lock_scan_objects(KC_SHRINKER_FN(&linfo->shrinker), &sc);
 }

 static void lock_tseq_show(struct seq_file *m, struct scoutfs_tseq_entry *ent)
@@ -1537,10 +1588,10 @@ u64 scoutfs_lock_ino_refresh_gen(struct super_block *sb, u64 ino)
 * transitions and sending requests.   We set the shutdown flag to catch
 * anyone who breaks this rule.
 *
- * With no more lock callers, we'll no longer try to shrink the pool of
- * granted locks.  We'll free all of them as _destroy() is called after
- * the farewell response indicates that the server tore down all our
- * lock state.
+ * We unregister the shrinker so that we won't try and send null
+ * requests in response to memory pressure.  The locks will all be
+ * unceremoniously dropped once we get a farewell response from the
+ * server which indicates that they destroyed our locking state.
 *
 * We will still respond to invalidation requests that have to be
 * processed to let unmount in other mounts acquire locks and make
@@ -1560,6 +1611,10 @@ void scoutfs_lock_shutdown(struct super_block *sb)

 	trace_scoutfs_lock_shutdown(sb, linfo);

+	/* stop the shrinker from queueing work */
+	KC_UNREGISTER_SHRINKER(&linfo->shrinker);
+	flush_work(&linfo->shrink_work);
+
 	/* cause current and future lock calls to return errors */
 	spin_lock(&linfo->lock);
 	linfo->shutdown = true;
@@ -1650,6 +1705,8 @@ void scoutfs_lock_destroy(struct super_block *sb)
 			list_del_init(&lock->inv_head);
 			lock->invalidate_pending = 0;
 		}
+		if (!list_empty(&lock->shrink_head))
+			list_del_init(&lock->shrink_head);
 		lock_remove(linfo, lock);
 		lock_free(linfo, lock);
 	}
@@ -1674,9 +1731,14 @@ int scoutfs_lock_setup(struct super_block *sb)
 	spin_lock_init(&linfo->lock);
 	linfo->lock_tree = RB_ROOT;
 	linfo->lock_range_tree = RB_ROOT;
+	KC_INIT_SHRINKER_FUNCS(&linfo->shrinker, lock_count_objects,
+			       lock_scan_objects);
+	KC_REGISTER_SHRINKER(&linfo->shrinker, "scoutfs-lock:" SCSBF, SCSB_ARGS(sb));
 	INIT_LIST_HEAD(&linfo->lru_list);
 	INIT_WORK(&linfo->inv_work, lock_invalidate_worker);
 	INIT_LIST_HEAD(&linfo->inv_list);
+	INIT_WORK(&linfo->shrink_work, lock_shrink_worker);
+	INIT_LIST_HEAD(&linfo->shrink_list);
 	atomic64_set(&linfo->next_refresh_gen, 0);
 	scoutfs_tseq_tree_init(&linfo->tseq_tree, lock_tseq_show);

--- a/kmod/src/lock_server.c
+++ b/kmod/src/lock_server.c
@@ -506,19 +506,6 @@ out:
 * because we don't know which locks they'll hold.  Once recover
 * finishes the server calls us to kick all the locks that were waiting
 * during recovery.
- *
- * The calling server shuts down if we return errors indicating that we
- * weren't able to ensure forward progress in the lock state machine.
- *
- * Failure to send to a disconnected client is not a fatal error.
- * During normal disconnection the client's state is removed before
- * their connection is destroyed.  We can't use state to try and send to
- * a non-existing connection.  But a client that fails to reconnect is
- * disconnected before being fenced.  If we have multiple disconnected
- * clients we can try to send to one while cleaning up another.  If
- * they've uncleanly disconnected their locks are going to be removed
- * and the lock can make forward progress again.  Or we'll shutdown for
- * failure to fence.
 */
 static int process_waiting_requests(struct super_block *sb,
 				    struct server_lock_node *snode)
@@ -610,10 +597,6 @@ static int process_waiting_requests(struct super_block *sb,
 out:
 	put_server_lock(inf, snode);

-	/* disconnected clients will be fenced, trying to send to them isn't fatal */
-	if (ret == -ENOTCONN)
-		ret = 0;
-
 	return ret;
 }

--- a/kmod/src/msg.h
+++ b/kmod/src/msg.h
@@ -35,12 +35,6 @@ do {									\
 	}								\
 } while (0)								\

-#define scoutfs_bug_on_err(sb, err, fmt, args...) \
-do { \
-	__typeof__(err) _err = (err); \
-	scoutfs_bug_on(sb, _err < 0 && _err != -ENOLINK, fmt, ##args); \
-} while (0)
-
 /*
 * Each message is only generated once per volume.  Remounting resets
 * the messages.
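
Review note: the net.c changes that follow replace the tcp_keepalive_timeout_ms mount option with two fixed constants, UNRESPONSIVE_TIMEOUT_SECS (10) and UNRESPONSIVE_PROBES (3). The arithmetic those hunks perform can be sketched in userspace as below. The helper names keepidle_secs() and user_timeout_ms() are hypothetical and exist only for this illustration; MSEC_PER_SEC is redefined locally with the kernel's value of 1000.

```c
#include <assert.h>

/* Constants as introduced by the patch. */
#define UNRESPONSIVE_TIMEOUT_SECS 10
#define UNRESPONSIVE_PROBES 3
/* Local copy of the kernel constant, for this sketch only. */
#define MSEC_PER_SEC 1000

/* Compile-time check playing the role of the patch's BUILD_BUG_ON(). */
_Static_assert(UNRESPONSIVE_PROBES < UNRESPONSIVE_TIMEOUT_SECS,
	       "probe count must be smaller than the timeout in seconds");

/* Hypothetical helper: the value handed to the keepidle socket option. */
static int keepidle_secs(void)
{
	return UNRESPONSIVE_TIMEOUT_SECS - UNRESPONSIVE_PROBES;
}

/* Hypothetical helper: the value handed to the user timeout socket option. */
static int user_timeout_ms(void)
{
	return UNRESPONSIVE_TIMEOUT_SECS * MSEC_PER_SEC;
}
```

With these constants the idle time before the first probe works out to 7 seconds and the user timeout to 10000 ms, and neither can be tuned at mount time any more.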
--- a/kmod/src/net.c
+++ b/kmod/src/net.c
@@ -20,8 +20,6 @@
 #include <net/sock.h>
 #include <net/tcp.h>
 #include <linux/log2.h>
-#include <linux/jhash.h>
-#include <linux/rbtree.h>

 #include "format.h"
 #include "counters.h"
@@ -33,7 +31,6 @@
 #include "endian_swap.h"
 #include "tseq.h"
 #include "fence.h"
-#include "options.h"

 /*
 * scoutfs networking delivers requests and responses between nodes.
@@ -126,7 +123,6 @@ struct message_send {
 	unsigned long dead:1;
 	struct list_head head;
 	scoutfs_net_response_t resp_func;
-	struct rb_node node;
 	void *resp_data;
 	struct scoutfs_net_header nh;
 };
@@ -138,7 +134,6 @@ struct message_send {
 struct message_recv {
 	struct scoutfs_tseq_entry tseq_entry;
 	struct work_struct proc_work;
-	struct list_head ordered_head;
 	struct scoutfs_net_connection *conn;
 	struct scoutfs_net_header nh;
 };
@@ -163,118 +158,49 @@ static bool nh_is_request(struct scoutfs_net_header *nh)
 	return !nh_is_response(nh);
 }

-static int cmp_sorted_msend(u64 pos, struct message_send *msend)
-{
-	if (nh_is_request(&msend->nh))
-		return pos < le64_to_cpu(msend->nh.id) ? -1 :
-		       pos > le64_to_cpu(msend->nh.id) ? 1 : 0;
-	else
-		return pos < le64_to_cpu(msend->nh.seq) ? -1 :
-		       pos > le64_to_cpu(msend->nh.seq) ? 1 : 0;
-}
-
-static struct message_send *search_sorted_msends(struct rb_root *root, u64 pos, struct rb_node *ins)
-{
-	struct rb_node **node = &root->rb_node;
-	struct rb_node *parent = NULL;
-	struct message_send *msend = NULL;
-	struct message_send *next = NULL;
-	int cmp = -1;
-
-	while (*node) {
-		parent = *node;
-		msend = container_of(*node, struct message_send, node);
-
-		cmp = cmp_sorted_msend(pos, msend);
-		if (cmp < 0) {
-			next = msend;
-			node = &(*node)->rb_left;
-		} else if (cmp > 0) {
-			node = &(*node)->rb_right;
-		} else {
-			next = msend;
-			break;
-		}
-	}
-
-	BUG_ON(cmp == 0 && ins);
-
-	if (ins) {
-		rb_link_node(ins, parent, node);
-		rb_insert_color(ins, root);
-	}
-
-	return next;
-}
-
-static struct message_send *next_sorted_msend(struct message_send *msend)
-{
-	struct rb_node *node = rb_next(&msend->node);
-
-	return node ? rb_entry(node, struct message_send, node) : NULL;
-}
-
-#define for_each_sorted_msend(MSEND_, TMP_, ROOT_, POS_) \
-	for (MSEND_ = search_sorted_msends(ROOT_, POS_, NULL); \
-	     MSEND_ != NULL && ({ TMP_ = next_sorted_msend(MSEND_); true; }); \
-	     MSEND_ = TMP_)
-
-static void insert_sorted_msend(struct scoutfs_net_connection *conn, struct message_send *msend)
-{
-	BUG_ON(!RB_EMPTY_NODE(&msend->node));
-
-	if (nh_is_request(&msend->nh))
-		search_sorted_msends(&conn->req_root, le64_to_cpu(msend->nh.id), &msend->node);
-	else
-		search_sorted_msends(&conn->resp_root, le64_to_cpu(msend->nh.seq), &msend->node);
-}
-
-static void erase_sorted_msend(struct scoutfs_net_connection *conn, struct message_send *msend)
-{
-	if (!RB_EMPTY_NODE(&msend->node)) {
-		if (nh_is_request(&msend->nh))
-			rb_erase(&msend->node, &conn->req_root);
-		else
-			rb_erase(&msend->node, &conn->resp_root);
-		RB_CLEAR_NODE(&msend->node);
-	}
-}
-
-static void move_sorted_msends(struct scoutfs_net_connection *dst_conn, struct rb_root *dst_root,
-			       struct scoutfs_net_connection *src_conn, struct rb_root *src_root)
-{
-	struct message_send *msend;
-	struct message_send *tmp;
-
-	for_each_sorted_msend(msend, tmp, src_root, 0) {
-		erase_sorted_msend(src_conn, msend);
-		insert_sorted_msend(dst_conn, msend);
-	}
-}
-
 /*
- * Pending requests are uniquely identified by the id they were assigned
- * as they were first put on the send queue.
+ * We return dead requests so that the caller can stop searching other
+ * lists for the dead request that we found.
 */
-static struct message_send *find_request(struct scoutfs_net_connection *conn, u8 cmd, u64 id)
+static struct message_send *search_list(struct scoutfs_net_connection *conn,
+					struct list_head *list,
+					u8 cmd, u64 id)
 {
 	struct message_send *msend;

 	assert_spin_locked(&conn->lock);

-	msend = search_sorted_msends(&conn->req_root, id, NULL);
-	if (msend && !(msend->nh.cmd == cmd && le64_to_cpu(msend->nh.id) == id))
-		msend = NULL;
+	list_for_each_entry(msend, list, head) {
+		if (nh_is_request(&msend->nh) && msend->nh.cmd == cmd &&
+		    le64_to_cpu(msend->nh.id) == id)
+			return msend;
+	}

+	return NULL;
+}
+
+/*
+ * Find an active send request on the lists.  It's almost certainly
+ * waiting on the resend queue but it could be actively being sent.
+ */
+static struct message_send *find_request(struct scoutfs_net_connection *conn,
+					 u8 cmd, u64 id)
+{
+	struct message_send *msend;
+
+	msend = search_list(conn, &conn->resend_queue, cmd, id) ?:
+		search_list(conn, &conn->send_queue, cmd, id);
+	if (msend && msend->dead)
+		msend = NULL;
 	return msend;
 }

 /*
- * Free a send message by moving it to the send queue and marking it
- * dead.  It is removed from the sorted rb roots so it won't be visible
- * as a request for response processing.
+ * Complete a send message by moving it to the send queue and marking it
+ * to be freed.  It won't be visible to callers trying to find sends.
 */
-static void queue_dead_free(struct scoutfs_net_connection *conn, struct message_send *msend)
+static void complete_send(struct scoutfs_net_connection *conn,
+			  struct message_send *msend)
 {
 	assert_spin_locked(&conn->lock);

@@ -284,7 +210,6 @@ static void queue_dead_free(struct scoutfs_net_connection *conn, struct message_

 	msend->dead = 1;
 	list_move(&msend->head, &conn->send_queue);
-	erase_sorted_msend(conn, msend);
 	queue_work(conn->workq, &conn->send_work);
 }

@@ -336,7 +261,7 @@ static inline u8 net_err_from_host(struct super_block *sb, int error)
 				     error);
 		}

-		return SCOUTFS_NET_ERR_EINVAL;
+		return -EINVAL;
 	}

 	return net_errs[ind];
@@ -407,7 +332,7 @@ static int submit_send(struct super_block *sb,
 		return -EINVAL;

 	if (scoutfs_forcing_unmount(sb))
-		return -ENOLINK;
+		return -EIO;

 	msend = kmalloc(offsetof(struct message_send,
 				 nh.data[data_len]), GFP_NOFS);
@@ -442,7 +367,6 @@ static int submit_send(struct super_block *sb,
 	msend->resp_func = resp_func;
 	msend->resp_data = resp_data;
 	msend->dead = 0;
-	RB_CLEAR_NODE(&msend->node);

 	msend->nh.seq = cpu_to_le64(seq);
 	msend->nh.recv_seq = 0;  /* set when sent, not when queued */
@@ -463,7 +387,6 @@ static int submit_send(struct super_block *sb,
 	} else {
 		list_add_tail(&msend->head, &conn->resend_queue);
 	}
-	insert_sorted_msend(conn, msend);

 	if (id_ret)
 		*id_ret = le64_to_cpu(msend->nh.id);
@@ -533,7 +456,7 @@ static int process_response(struct scoutfs_net_connection *conn,
 	if (msend) {
 		resp_func = msend->resp_func;
 		resp_data = msend->resp_data;
-		queue_dead_free(conn, msend);
+		complete_send(conn, msend);
 	} else {
 		scoutfs_inc_counter(sb, net_dropped_response);
 	}
@@ -575,83 +498,76 @@ static void scoutfs_net_proc_worker(struct work_struct *work)
 	trace_scoutfs_net_proc_work_exit(sb, 0, ret);
 }

-static void scoutfs_net_ordered_proc_worker(struct work_struct *work)
-{
-	struct scoutfs_work_list *wlist = container_of(work, struct scoutfs_work_list, work);
-	struct message_recv *mrecv;
-	struct message_recv *mrecv__;
-	LIST_HEAD(list);
-
-	spin_lock(&wlist->lock);
-	list_splice_init(&wlist->list, &list);
-	spin_unlock(&wlist->lock);
-
-	list_for_each_entry_safe(mrecv, mrecv__, &list, ordered_head) {
-		list_del_init(&mrecv->ordered_head);
-		scoutfs_net_proc_worker(&mrecv->proc_work);
-	}
-}
-
-/*
- * Some messages require in-order processing.  But the scope of the
- * ordering isn't global.  In the case of lock messages, it's per lock.
- * So for these messages we hash them to a number of ordered workers who
- * walk a list and call the usual work function in order.  This replaced
- * first the proc work detecting OOO and re-ordering, and then only
- * calling proc from the one recv work context.
- */
-static void queue_ordered_proc(struct scoutfs_net_connection *conn, struct message_recv *mrecv)
-{
-	struct scoutfs_work_list *wlist;
-	struct scoutfs_net_lock *nl;
-	u32 h;
-
-	if (WARN_ON_ONCE(mrecv->nh.cmd != SCOUTFS_NET_CMD_LOCK ||
-		         le16_to_cpu(mrecv->nh.data_len) != sizeof(struct scoutfs_net_lock)))
-		return scoutfs_net_proc_worker(&mrecv->proc_work);
-
-	nl = (void *)mrecv->nh.data;
-	h = jhash(&nl->key, sizeof(struct scoutfs_key), 0x6fdd3cd5);
-	wlist = &conn->ordered_proc_wlists[h % conn->ordered_proc_nr];
-
-	spin_lock(&wlist->lock);
-	list_add_tail(&mrecv->ordered_head, &wlist->list);
-	spin_unlock(&wlist->lock);
-	queue_work(conn->workq, &wlist->work);
-}
-
 /*
 * Free live responses up to and including the seq by marking them dead
 * and moving them to the send queue to be freed.
 */
-static void free_acked_responses(struct scoutfs_net_connection *conn, u64 seq)
+static bool move_acked_responses(struct scoutfs_net_connection *conn,
+				 struct list_head *list, u64 seq)
 {
 	struct message_send *msend;
 	struct message_send *tmp;
+	bool moved = false;
+
+	assert_spin_locked(&conn->lock);
+
+	list_for_each_entry_safe(msend, tmp, list, head) {
+		if (le64_to_cpu(msend->nh.seq) > seq)
+			break;
+		if (!nh_is_response(&msend->nh) || msend->dead)
+			continue;
+
+		msend->dead = 1;
+		list_move(&msend->head, &conn->send_queue);
+		moved = true;
+	}
+
+	return moved;
+}
+
+/* acks are processed inline in the recv worker */
+static void free_acked_responses(struct scoutfs_net_connection *conn, u64 seq)
+{
+	bool moved;

 	spin_lock(&conn->lock);

-	for_each_sorted_msend(msend, tmp, &conn->resp_root, 0) {
-		if (le64_to_cpu(msend->nh.seq) > seq)
-			break;
-
-		queue_dead_free(conn, msend);
-	}
+	moved = move_acked_responses(conn, &conn->send_queue, seq) |
+		move_acked_responses(conn, &conn->resend_queue, seq);

 	spin_unlock(&conn->lock);
+
+	if (moved)
+		queue_work(conn->workq, &conn->send_work);
 }

-static int k_recvmsg(struct socket *sock, void *buf, unsigned len)
+static int recvmsg_full(struct socket *sock, void *buf, unsigned len)
 {
-	struct kvec kv = {
-		.iov_base = buf,
-		.iov_len = len,
-	};
-	struct msghdr msg = {
-		.msg_flags = MSG_NOSIGNAL,
-	};
+	struct msghdr msg;
+	struct kvec kv;
+	int ret;

-	return kernel_recvmsg(sock, &msg, &kv, 1, len, msg.msg_flags);
+	while (len) {
+		memset(&msg, 0, sizeof(msg));
+		msg.msg_flags = MSG_NOSIGNAL;
+		kv.iov_base = buf;
+		kv.iov_len = len;
+
+#ifndef KC_MSGHDR_STRUCT_IOV_ITER
+		msg.msg_iov = (struct iovec *)&kv;
+		msg.msg_iovlen = 1;
+#else
+		iov_iter_init(&msg.msg_iter, READ, (struct iovec *)&kv, len, 1);
+#endif
+		ret = kernel_recvmsg(sock, &msg, &kv, 1, len, msg.msg_flags);
+		if (ret <= 0)
+			return -ECONNABORTED;
+
+		len -= ret;
+		buf += ret;
+	}
+
+	return 0;
 }

 static bool invalid_message(struct scoutfs_net_connection *conn,
@@ -688,72 +604,6 @@ static bool invalid_message(struct scoutfs_net_connection *conn,
 	return false;
 }

-static int recv_one_message(struct super_block *sb, struct net_info *ninf,
-			    struct scoutfs_net_connection *conn, struct scoutfs_net_header *nh,
-			    unsigned int data_len)
-{
-	struct message_recv *mrecv;
-	int ret;
-
-	scoutfs_inc_counter(sb, net_recv_messages);
-	scoutfs_add_counter(sb, net_recv_bytes, nh_bytes(data_len));
-	trace_scoutfs_net_recv_message(sb, &conn->sockname, &conn->peername, nh);
-
-	/* caller's invalid message checked data len */
-	mrecv = kmalloc(offsetof(struct message_recv, nh.data[data_len]), GFP_NOFS);
-	if (!mrecv) {
-		ret = -ENOMEM;
-		goto out;
-	}
-
-	mrecv->conn = conn;
-	INIT_WORK(&mrecv->proc_work, scoutfs_net_proc_worker);
-	INIT_LIST_HEAD(&mrecv->ordered_head);
-	mrecv->nh = *nh;
-	if (data_len)
-		memcpy(mrecv->nh.data, (nh + 1), data_len);
-
-	if (nh->cmd == SCOUTFS_NET_CMD_GREETING) {
-		/* greetings are out of band, no seq mechanics */
-		set_conn_fl(conn, saw_greeting);
-
-	} else if (le64_to_cpu(nh->seq) <=
-		   atomic64_read(&conn->recv_seq)) {
-		/* drop any resent duplicated messages */
-		scoutfs_inc_counter(sb, net_recv_dropped_duplicate);
-		kfree(mrecv);
-		ret = 0;
-		goto out;
-
-	} else {
-		/* record that we've received sender's seq */
-		atomic64_set(&conn->recv_seq, le64_to_cpu(nh->seq));
-		/* and free our responses that sender has received */
-		free_acked_responses(conn, le64_to_cpu(nh->recv_seq));
-	}
-
-	scoutfs_tseq_add(&ninf->msg_tseq_tree, &mrecv->tseq_entry);
-
-	/*
-	 * Initial received greetings are processed inline
-	 * before any other incoming messages.
-	 *
-	 * Incoming requests or responses to the lock client
-	 * can't handle re-ordering, so they're queued to
-	 * ordered receive processing work.
-	 */
-	if (nh->cmd == SCOUTFS_NET_CMD_GREETING)
-		scoutfs_net_proc_worker(&mrecv->proc_work);
-	else if (nh->cmd == SCOUTFS_NET_CMD_LOCK && !conn->listening_conn)
-		queue_ordered_proc(conn, mrecv);
-	else
-		queue_work(conn->workq, &mrecv->proc_work);
-	ret = 0;
-
-out:
-	return ret;
-}
-
 /*
 * Always block receiving from the socket.  Errors trigger shutting down
 * the connection.
@@ -764,72 +614,86 @@ static void scoutfs_net_recv_worker(struct work_struct *work)
 	struct super_block *sb = conn->sb;
 	struct net_info *ninf = SCOUTFS_SB(sb)->net_info;
 	struct socket *sock = conn->sock;
-	struct scoutfs_net_header *nh;
-	struct page *page = NULL;
+	struct scoutfs_net_header nh;
+	struct message_recv *mrecv;
 	unsigned int data_len;
-	int hdr_off;
-	int rx_off;
-	int size;
 	int ret;

 	trace_scoutfs_net_recv_work_enter(sb, 0, 0);

-	page = alloc_page(GFP_NOFS);
-	if (!page) {
-		ret = -ENOMEM;
-		goto out;
-	}
-
-	hdr_off = 0;
-	rx_off = 0;
-
 	for (;;) {
 		/* receive the header */
-		ret = k_recvmsg(sock, page_address(page) + rx_off, PAGE_SIZE - rx_off);
-		if (ret <= 0) {
-			ret = -ECONNABORTED;
-			goto out;
+		ret = recvmsg_full(sock, &nh, sizeof(nh));
+		if (ret)
+			break;
+
+		/* receiving an invalid message breaks the connection */
+		if (invalid_message(conn, &nh)) {
+			scoutfs_inc_counter(sb, net_recv_invalid_message);
+			ret = -EBADMSG;
+			break;
 		}

-		rx_off += ret;
+		data_len = le16_to_cpu(nh.data_len);

-		for (;;) {
-			size = rx_off - hdr_off;
-			if (size < sizeof(struct scoutfs_net_header))
-				break;
+		scoutfs_inc_counter(sb, net_recv_messages);
+		scoutfs_add_counter(sb, net_recv_bytes, nh_bytes(data_len));
+		trace_scoutfs_net_recv_message(sb, &conn->sockname,
+					       &conn->peername, &nh);

-			nh = page_address(page) + hdr_off;
-
-			/* receiving an invalid message breaks the connection */
-			if (invalid_message(conn, nh)) {
-				scoutfs_inc_counter(sb, net_recv_invalid_message);
-				ret = -EBADMSG;
-				break;
-			}
-
-			data_len = le16_to_cpu(nh->data_len);
-			if (sizeof(struct scoutfs_net_header) + data_len > size)
-				break;
-
-			ret = recv_one_message(sb, ninf, conn, nh, data_len);
-			if (ret < 0)
-				goto out;
-
-			hdr_off += sizeof(struct scoutfs_net_header) + data_len;
+		/* invalid_message() already checked data_len */
+		mrecv = kmalloc(offsetof(struct message_recv,
+					 nh.data[data_len]), GFP_NOFS);
+		if (!mrecv) {
+			ret = -ENOMEM;
+			break;
 		}

-		if ((PAGE_SIZE - rx_off) <
-		    (sizeof(struct scoutfs_net_header) + SCOUTFS_NET_MAX_DATA_LEN)) {
-			if (size)
-				memmove(page_address(page), page_address(page) + hdr_off, size);
-			hdr_off = 0;
-			rx_off = size;
+		mrecv->conn = conn;
+		INIT_WORK(&mrecv->proc_work, scoutfs_net_proc_worker);
+		mrecv->nh = nh;
+
+		/* receive the data payload */
+		ret = recvmsg_full(sock, mrecv->nh.data, data_len);
+		if (ret) {
+			kfree(mrecv);
+			break;
 		}
+
+		if (nh.cmd == SCOUTFS_NET_CMD_GREETING) {
+			/* greetings are out of band, no seq mechanics */
+			set_conn_fl(conn, saw_greeting);
+
+		} else if (le64_to_cpu(nh.seq) <=
+			   atomic64_read(&conn->recv_seq)) {
+			/* drop any resent duplicated messages */
+			scoutfs_inc_counter(sb, net_recv_dropped_duplicate);
+			kfree(mrecv);
+			continue;
+
+		} else {
+			/* record that we've received sender's seq */
+			atomic64_set(&conn->recv_seq, le64_to_cpu(nh.seq));
+			/* and free our responses that sender has received */
+			free_acked_responses(conn, le64_to_cpu(nh.recv_seq));
+		}
+
+		scoutfs_tseq_add(&ninf->msg_tseq_tree, &mrecv->tseq_entry);
+
+		/*
+		 * Initial received greetings are processed
+		 * synchronously before any other incoming messages.
+		 *
+		 * Incoming requests or responses to the lock client are
+		 * called synchronously to avoid reordering.
+		 */
+		if (nh.cmd == SCOUTFS_NET_CMD_GREETING ||
+		    (nh.cmd == SCOUTFS_NET_CMD_LOCK && !conn->listening_conn))
+			scoutfs_net_proc_worker(&mrecv->proc_work);
+		else
+			queue_work(conn->workq, &mrecv->proc_work);
 	}

-out:
-	__free_page(page);
-
 	if (ret)
 		scoutfs_inc_counter(sb, net_recv_error);

@@ -839,48 +703,38 @@ out:
 	trace_scoutfs_net_recv_work_exit(sb, 0, ret);
 }

-/*
- * This consumes the kvec.
- */
-static int k_sendmsg_full(struct socket *sock, struct kvec *kv, unsigned long nr_segs, size_t count)
+static int sendmsg_full(struct socket *sock, void *buf, unsigned len)
 {
-	int ret = 0;
+	struct msghdr msg;
+	struct kvec kv;
+	int ret;

-	while (count > 0) {
-		struct msghdr msg = {
-			.msg_flags = MSG_NOSIGNAL,
-		};
+	while (len) {
+		memset(&msg, 0, sizeof(msg));
+		msg.msg_flags = MSG_NOSIGNAL;
+		kv.iov_base = buf;
+		kv.iov_len = len;

-		ret = kernel_sendmsg(sock, &msg, kv, nr_segs, count);
-		if (ret <= 0) {
-			ret = -ECONNABORTED;
-			break;
-		}
+#ifndef KC_MSGHDR_STRUCT_IOV_ITER
+		msg.msg_iov = (struct iovec *)&kv;
+		msg.msg_iovlen = 1;
+#else
+		iov_iter_init(&msg.msg_iter, WRITE, (struct iovec *)&kv, len, 1);
+#endif
+		ret = kernel_sendmsg(sock, &msg, &kv, 1, len);
+		if (ret <= 0)
+			return -ECONNABORTED;

-		count -= ret;
-		if (count) {
-			while (nr_segs > 0 && ret >= kv->iov_len) {
-				ret -= kv->iov_len;
-				kv++;
-				nr_segs--;
-			}
-			if (nr_segs > 0 && ret > 0) {
-				kv->iov_base += ret;
-				kv->iov_len -= ret;
-			}
-			BUG_ON(nr_segs == 0);
-		}
-		ret = 0;
+		len -= ret;
+		buf += ret;
 	}
-	
-	return ret;
+
+	return 0;
 }

-static void free_msend(struct net_info *ninf, struct scoutfs_net_connection *conn,
-		       struct message_send *msend)
+static void free_msend(struct net_info *ninf, struct message_send *msend)
 {
 	list_del_init(&msend->head);
-	erase_sorted_msend(conn, msend);
 	scoutfs_tseq_del(&ninf->msg_tseq_tree, &msend->tseq_entry);
 	kfree(msend);
 }
@@ -906,74 +760,54 @@ static void scoutfs_net_send_worker(struct work_struct *work)
 	struct super_block *sb = conn->sb;
 	struct net_info *ninf = SCOUTFS_SB(sb)->net_info;
 	struct message_send *msend;
-	struct message_send *_msend_;
-	struct kvec kv[16];
-	unsigned long nr_segs;
-	size_t count;
+	int ret = 0;
 	int len;
-	int ret;

 	trace_scoutfs_net_send_work_enter(sb, 0, 0);

-	for (;;) {
-		nr_segs = 0;
-		count = 0;
+	spin_lock(&conn->lock);
+
+	while ((msend = list_first_entry_or_null(&conn->send_queue,
+						 struct message_send, head))) {
+
+		if (msend->dead) {
+			free_msend(ninf, msend);
+			continue;
+		}
+
+		if ((msend->nh.cmd == SCOUTFS_NET_CMD_FAREWELL) &&
+		    nh_is_response(&msend->nh)) {
+			set_conn_fl(conn, saw_farewell);
+		}
+
+		msend->nh.recv_seq =
+			cpu_to_le64(atomic64_read(&conn->recv_seq));
+
+		spin_unlock(&conn->lock);
+
+		len = nh_bytes(le16_to_cpu(msend->nh.data_len));
+
+		scoutfs_inc_counter(sb, net_send_messages);
+		scoutfs_add_counter(sb, net_send_bytes, len);
+		trace_scoutfs_net_send_message(sb, &conn->sockname,
+					       &conn->peername, &msend->nh);
+
+		ret = sendmsg_full(conn->sock, &msend->nh, len);

 		spin_lock(&conn->lock);

-		list_for_each_entry_safe(msend, _msend_, &conn->send_queue, head) {
-			if (msend->dead) {
-				free_msend(ninf, conn, msend);
-				continue;
-			}
+		msend->nh.recv_seq = 0;

-			len = nh_bytes(le16_to_cpu(msend->nh.data_len));
+		if (ret)
+			break;

-			if ((msend->nh.cmd == SCOUTFS_NET_CMD_FAREWELL) &&
-			    nh_is_response(&msend->nh)) {
-				set_conn_fl(conn, saw_farewell);
-			}
-
-			msend->nh.recv_seq = cpu_to_le64(atomic64_read(&conn->recv_seq));
-
-			scoutfs_inc_counter(sb, net_send_messages);
-			scoutfs_add_counter(sb, net_send_bytes, len);
-			trace_scoutfs_net_send_message(sb, &conn->sockname,
-						       &conn->peername, &msend->nh);
-
-			count += len;
-			kv[nr_segs].iov_base = &msend->nh;
-			kv[nr_segs].iov_len = len;
-			if (++nr_segs == ARRAY_SIZE(kv))
-				break;
-
-		}
-		spin_unlock(&conn->lock);
-
-		if (nr_segs == 0) {
-			ret = 0;
-			goto out;
-		}
-
-		ret = k_sendmsg_full(conn->sock, kv, nr_segs, count);
-		if (ret < 0)
-			goto out;
-
-		spin_lock(&conn->lock);
-		list_for_each_entry_safe(msend, _msend_, &conn->send_queue, head) {
-			msend->nh.recv_seq = 0;
-
-			/* resend if it wasn't freed while we sent */
-			if (!msend->dead)
-				list_move_tail(&msend->head, &conn->resend_queue);
-
-			if (--nr_segs == 0)
-				break;
-		}
-		spin_unlock(&conn->lock);
+		/* resend if it wasn't freed while we sent */
+		if (!msend->dead)
+			list_move_tail(&msend->head, &conn->resend_queue);
 	}

-out:
+	spin_unlock(&conn->lock);
+
 	if (ret) {
 		scoutfs_inc_counter(sb, net_send_error);
 		shutdown_conn(conn);
@@ -1012,7 +846,7 @@ static void scoutfs_net_destroy_worker(struct work_struct *work)

 	list_splice_init(&conn->resend_queue, &conn->send_queue);
 	list_for_each_entry_safe(msend, tmp, &conn->send_queue, head)
-		free_msend(ninf, conn, msend);
+		free_msend(ninf, msend);

 	/* accepted sockets are removed from their listener's list */
 	if (conn->listening_conn) {
@@ -1028,7 +862,6 @@ static void scoutfs_net_destroy_worker(struct work_struct *work)
 	destroy_workqueue(conn->workq);
 	scoutfs_tseq_del(&ninf->conn_tseq_tree, &conn->tseq_entry);
 	kfree(conn->info);
-	kfree(conn->ordered_proc_wlists);
 	trace_scoutfs_conn_destroy_free(conn);
 	kfree(conn);

@@ -1054,7 +887,7 @@ static void destroy_conn(struct scoutfs_net_connection *conn)
 * The TCP_KEEP* and TCP_USER_TIMEOUT option interaction is subtle.
 * TCP_USER_TIMEOUT only applies if there is unacked written data in the
 * send queue.  It doesn't work if the connection is idle.  Adding
- * keepalive probes with user_timeout set changes how the keepalive
+ * keepalive probes with user_timeout set changes how the keepalive
 * timeout is calculated.   CNT no longer matters.   Each time
 * additional probes (not the first) are sent the user timeout is
 * checked against the last time data was received.  If none of the
@@ -1066,16 +899,14 @@ static void destroy_conn(struct scoutfs_net_connection *conn)
 * elapses during the probe timer processing after the unsuccessful
 * probes.
 */
-static int sock_opts_and_names(struct super_block *sb,
-			       struct scoutfs_net_connection *conn,
+#define UNRESPONSIVE_TIMEOUT_SECS 10
+#define UNRESPONSIVE_PROBES 3
+static int sock_opts_and_names(struct scoutfs_net_connection *conn,
 			       struct socket *sock)
 {
-	struct scoutfs_mount_options opts;
 	int optval;
 	int ret;

-	scoutfs_options_read(sb, &opts);
-
 	/* we use a keepalive timeout instead of send timeout */
 	ret = kc_sock_set_sndtimeo(sock, 0);
 	if (ret)
@@ -1088,7 +919,8 @@ static int sock_opts_and_names(struct super_block *sb,
 	if (ret)
 		goto out;

-	optval = (opts.tcp_keepalive_timeout_ms / MSEC_PER_SEC) - UNRESPONSIVE_PROBES;
+	BUILD_BUG_ON(UNRESPONSIVE_PROBES >= UNRESPONSIVE_TIMEOUT_SECS);
+	optval = UNRESPONSIVE_TIMEOUT_SECS - (UNRESPONSIVE_PROBES);
 	ret = kc_tcp_sock_set_keepidle(sock, optval);
 	if (ret)
 		goto out;
@@ -1098,7 +930,7 @@ static int sock_opts_and_names(struct super_block *sb,
 	if (ret)
 		goto out;

-	optval = opts.tcp_keepalive_timeout_ms;
+	optval = UNRESPONSIVE_TIMEOUT_SECS * MSEC_PER_SEC;
 	ret = kc_tcp_sock_set_user_timeout(sock, optval);
 	if (ret)
 		goto out;
@@ -1160,19 +992,13 @@ static void scoutfs_net_listen_worker(struct work_struct *work)
 						  conn->notify_down,
 						  conn->info_size,
 						  conn->req_funcs, "accepted");
-		/*
-		 * scoutfs_net_alloc_conn() can fail due to ENOMEM. If this
-		 * is the only thing that does so, there's no harm in trying
-		 * to see if kernel_accept() can get enough memory to try accepting
-		 * a new connection again. If that then fails with ENOMEM, it'll
-		 * shut down the conn anyway. So just retry here.
-		 */
 		if (!acc_conn) {
 			sock_release(acc_sock);
+			ret = -ENOMEM;
 			continue;
 		}

-		ret = sock_opts_and_names(sb, acc_conn, acc_sock);
+		ret = sock_opts_and_names(acc_conn, acc_sock);
 		if (ret) {
 			sock_release(acc_sock);
 			destroy_conn(acc_conn);
@@ -1243,7 +1069,7 @@ static void scoutfs_net_connect_worker(struct work_struct *work)
 	if (ret)
 		goto out;

-	ret = sock_opts_and_names(sb, conn, sock);
+	ret = sock_opts_and_names(conn, sock);
 	if (ret)
 		goto out;

@@ -1358,7 +1184,7 @@ static void scoutfs_net_shutdown_worker(struct work_struct *work)
 							struct message_send, head))) {
 			resp_func = msend->resp_func;
 			resp_data = msend->resp_data;
-			free_msend(ninf, conn, msend);
+			free_msend(ninf, msend);
 			spin_unlock(&conn->lock);

 			call_resp_func(sb, conn, resp_func, resp_data, NULL, 0, -ECONNABORTED);
@@ -1374,7 +1200,7 @@ static void scoutfs_net_shutdown_worker(struct work_struct *work)
 	list_splice_tail_init(&conn->send_queue, &conn->resend_queue);
 	list_for_each_entry_safe(msend, tmp, &conn->resend_queue, head) {
 		if (msend->nh.cmd == SCOUTFS_NET_CMD_GREETING)
-			free_msend(ninf, conn, msend);
+			free_msend(ninf, msend);
 	}

 	clear_conn_fl(conn, saw_greeting);
@@ -1504,30 +1330,25 @@ scoutfs_net_alloc_conn(struct super_block *sb,
 {
 	struct net_info *ninf = SCOUTFS_SB(sb)->net_info;
 	struct scoutfs_net_connection *conn;
-	unsigned int nr;
-	unsigned int i;
-
-	nr = min_t(unsigned int, num_possible_cpus(),
-		   PAGE_SIZE / sizeof(struct scoutfs_work_list));

 	conn = kzalloc(sizeof(struct scoutfs_net_connection), GFP_NOFS);
-	if (conn) {
-		if (info_size)
-			conn->info = kzalloc(info_size, GFP_NOFS);
-		conn->ordered_proc_wlists = kmalloc_array(nr, sizeof(struct scoutfs_work_list),
-							  GFP_NOFS);
-		conn->workq = alloc_workqueue("scoutfs_net_%s",
-					      WQ_UNBOUND | WQ_NON_REENTRANT, 0,
-					      name_suffix);
-	}
-	if (!conn || (info_size && !conn->info) || !conn->workq || !conn->ordered_proc_wlists) {
-		if (conn) {
-			kfree(conn->info);
-			kfree(conn->ordered_proc_wlists);
-			if (conn->workq)
-				destroy_workqueue(conn->workq);
+	if (!conn)
+		return NULL;
+
+	if (info_size) {
+		conn->info = kzalloc(info_size, GFP_NOFS);
+		if (!conn->info) {
 			kfree(conn);
+			return NULL;
 		}
+	}
+
+	conn->workq = alloc_workqueue("scoutfs_net_%s",
+				      WQ_UNBOUND | WQ_NON_REENTRANT, 0,
+				      name_suffix);
+	if (!conn->workq) {
+		kfree(conn->info);
+		kfree(conn);
 		return NULL;
 	}

@@ -1548,8 +1369,6 @@ scoutfs_net_alloc_conn(struct super_block *sb,
 	atomic64_set(&conn->recv_seq, 0);
 	INIT_LIST_HEAD(&conn->send_queue);
 	INIT_LIST_HEAD(&conn->resend_queue);
-	conn->req_root = RB_ROOT;
-	conn->resp_root = RB_ROOT;
 	INIT_WORK(&conn->listen_work, scoutfs_net_listen_worker);
 	INIT_WORK(&conn->connect_work, scoutfs_net_connect_worker);
 	INIT_WORK(&conn->send_work, scoutfs_net_send_worker);
@@ -1559,13 +1378,6 @@ scoutfs_net_alloc_conn(struct super_block *sb,
 	INIT_DELAYED_WORK(&conn->reconn_free_dwork,
 			  scoutfs_net_reconn_free_worker);

-	conn->ordered_proc_nr = nr;
-	for (i = 0; i < nr; i++) {
-		INIT_WORK(&conn->ordered_proc_wlists[i].work, scoutfs_net_ordered_proc_worker);
-		spin_lock_init(&conn->ordered_proc_wlists[i].lock);
-		INIT_LIST_HEAD(&conn->ordered_proc_wlists[i].list);
-	}
-
 	scoutfs_tseq_add(&ninf->conn_tseq_tree, &conn->tseq_entry);
 	trace_scoutfs_conn_alloc(conn);

@@ -1762,7 +1574,7 @@ void scoutfs_net_client_greeting(struct super_block *sb,
 		atomic64_set(&conn->recv_seq, 0);
 		list_for_each_entry_safe(msend, tmp, &conn->resend_queue, head){
 			if (nh_is_response(&msend->nh))
-				free_msend(ninf, conn, msend);
+				free_msend(ninf, msend);
 		}
 	}

@@ -1865,8 +1677,6 @@ restart:
 		BUG_ON(!list_empty(&reconn->send_queue));
 		/* queued greeting response is racing, can be in send or resend queue */
 		list_splice_tail_init(&reconn->resend_queue, &conn->resend_queue);
-		move_sorted_msends(conn, &conn->req_root, reconn, &reconn->req_root);
-		move_sorted_msends(conn, &conn->resp_root, reconn, &reconn->resp_root);

 		/* new conn info is unused, swap, old won't call down */
 		swap(conn->info, reconn->info);
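
Review note: the recvmsg_full() and sendmsg_full() helpers in the net.c hunks above loop until the requested number of bytes has been transferred, and report any short transfer that ends in an error or EOF as -ECONNABORTED. The same pattern is sketched below in userspace with read() and write() on a pipe. The names read_full(), write_full() and full_io_demo() are hypothetical and are not part of the patch. Like the patch, the sketch does not retry on EINTR.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical userspace analogue of recvmsg_full(): keep calling
 * read() until len bytes have arrived.  EOF or any error is reported
 * as -ECONNABORTED.
 */
static int read_full(int fd, void *buf, size_t len)
{
	char *p = buf;
	ssize_t ret;

	while (len) {
		ret = read(fd, p, len);
		if (ret <= 0)
			return -ECONNABORTED;
		len -= (size_t)ret;
		p += ret;
	}
	return 0;
}

/* The same loop for the send side, mirroring sendmsg_full(). */
static int write_full(int fd, const void *buf, size_t len)
{
	const char *p = buf;
	ssize_t ret;

	while (len) {
		ret = write(fd, p, len);
		if (ret <= 0)
			return -ECONNABORTED;
		len -= (size_t)ret;
		p += ret;
	}
	return 0;
}

/* Round-trip a small buffer through a pipe; returns 0 on success. */
static int full_io_demo(void)
{
	const char msg[] = "header+payload";
	char got[sizeof(msg)];
	int fds[2];
	int ret;

	if (pipe(fds) < 0)
		return -errno;

	ret = write_full(fds[1], msg, sizeof(msg));
	if (ret == 0)
		ret = read_full(fds[0], got, sizeof(got));
	if (ret == 0 && memcmp(msg, got, sizeof(msg)) != 0)
		ret = -EIO;

	close(fds[0]);
	close(fds[1]);
	return ret;
}
```

The receive worker in the patch uses this shape twice per message: once for the fixed-size header and once for the data_len bytes of payload.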
--- a/kmod/src/net.h
+++ b/kmod/src/net.h
@@ -1,18 +1,10 @@
 #ifndef _SCOUTFS_NET_H_
 #define _SCOUTFS_NET_H_

-#include <linux/spinlock.h>
-#include <linux/list.h>
 #include <linux/in.h>
 #include "endian_swap.h"
 #include "tseq.h"

-struct scoutfs_work_list {
-	struct work_struct work;
-	spinlock_t lock;
-	struct list_head list;
-};
-
 struct scoutfs_net_connection;

 /* These are called in their own blocking context */
@@ -67,12 +59,8 @@ struct scoutfs_net_connection {
 	u64 next_send_id;
 	struct list_head send_queue;
 	struct list_head resend_queue;
-	struct rb_root req_root;
-	struct rb_root resp_root;

 	atomic64_t recv_seq;
-	unsigned int ordered_proc_nr;
-	struct scoutfs_work_list *ordered_proc_wlists;

 	struct workqueue_struct *workq;
 	struct work_struct listen_work;
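
Review note: with req_root and resp_root gone from struct scoutfs_net_connection, net.c locates a pending request by walking resend_queue and then send_queue, and treats a match that is already marked dead as not found. A minimal userspace sketch of that lookup follows. struct msend, search_queue() and find_live_request() are hypothetical stand-ins, and a singly linked list replaces the kernel's struct list_head. The walk is linear in the number of queued messages, where the removed rb-tree lookup was logarithmic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct message_send; only the searched fields. */
struct msend {
	struct msend *next;
	bool is_request;
	bool dead;
	uint8_t cmd;
	uint64_t id;
};

/* Walk one queue and return the first request matching cmd and id, dead or not. */
static struct msend *search_queue(struct msend *head, uint8_t cmd, uint64_t id)
{
	struct msend *m;

	for (m = head; m; m = m->next) {
		if (m->is_request && m->cmd == cmd && m->id == id)
			return m;
	}
	return NULL;
}

/*
 * Check the resend queue first, then the send queue.  A dead match
 * stops the search but is reported to the caller as "not found".
 */
static struct msend *find_live_request(struct msend *resend_q,
				       struct msend *send_q,
				       uint8_t cmd, uint64_t id)
{
	struct msend *m = search_queue(resend_q, cmd, id);

	if (!m)
		m = search_queue(send_q, cmd, id);
	if (m && m->dead)
		m = NULL;
	return m;
}
```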
--- a/kmod/src/omap.c
+++ b/kmod/src/omap.c
@@ -592,7 +592,7 @@ static int handle_request(struct super_block *sb, struct omap_request *req)
 	ret = 0;
 out:
 	free_rids(&priv_rids);
-	if ((ret < 0) && (req != NULL)) {
+	if (ret < 0) {
 		ret = scoutfs_server_send_omap_response(sb, req->client_rid, req->client_id,
 							NULL, ret);
 		free_req(req);
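
Review note: the options.c hunks that follow add the meta_reserve_blocks mount option with a default of 16384 and a verifier that accepts values from 0 to INT_MAX. Because the verifier receives an int, its `val > INT_MAX` comparison can never be true, so in effect only negative values are rejected. The sketch below shows the same acceptance rule in userspace, where the value is parsed into a long and the upper bound check does real work. parse_meta_reserve_blocks() is a hypothetical name, and the use of strtol() is an assumption of this sketch, not the parsing the kernel code uses.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical userspace parser for the N in "meta_reserve_blocks=N".
 * Returns 0 and stores N on success, -EINVAL otherwise.
 */
static int parse_meta_reserve_blocks(const char *str, int *val_ret)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(str, &end, 10);
	if (errno || end == str || *end != '\0')
		return -EINVAL;	/* empty, trailing junk, or overflowed long */
	if (val < 0 || val > INT_MAX)
		return -EINVAL;	/* outside the accepted 0..INT_MAX range */

	*val_ret = (int)val;
	return 0;
}
```

A value of 0 is accepted, so the reserve can be disabled entirely.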
--- a/kmod/src/options.c
+++ b/kmod/src/options.c
@@ -33,15 +33,13 @@ enum {
 	Opt_acl,
 	Opt_data_prealloc_blocks,
 	Opt_data_prealloc_contig_only,
-	Opt_ino_alloc_per_lock,
-	Opt_lock_idle_count,
 	Opt_log_merge_wait_timeout_ms,
 	Opt_metadev_path,
 	Opt_noacl,
 	Opt_orphan_scan_delay_ms,
 	Opt_quorum_heartbeat_timeout_ms,
 	Opt_quorum_slot_nr,
-	Opt_tcp_keepalive_timeout_ms,
+	Opt_meta_reserve_blocks,
 	Opt_err,
 };

@@ -49,15 +47,13 @@ static const match_table_t tokens = {
 	{Opt_acl, "acl"},
 	{Opt_data_prealloc_blocks, "data_prealloc_blocks=%s"},
 	{Opt_data_prealloc_contig_only, "data_prealloc_contig_only=%s"},
-	{Opt_ino_alloc_per_lock, "ino_alloc_per_lock=%s"},
-	{Opt_lock_idle_count, "lock_idle_count=%s"},
 	{Opt_log_merge_wait_timeout_ms, "log_merge_wait_timeout_ms=%s"},
 	{Opt_metadev_path, "metadev_path=%s"},
 	{Opt_noacl, "noacl"},
 	{Opt_orphan_scan_delay_ms, "orphan_scan_delay_ms=%s"},
 	{Opt_quorum_heartbeat_timeout_ms, "quorum_heartbeat_timeout_ms=%s"},
 	{Opt_quorum_slot_nr, "quorum_slot_nr=%s"},
-	{Opt_tcp_keepalive_timeout_ms, "tcp_keepalive_timeout_ms=%s"},
+	{Opt_meta_reserve_blocks, "meta_reserve_blocks=%s"},
 	{Opt_err, NULL}
 };

@@ -121,10 +117,6 @@ static void free_options(struct scoutfs_mount_options *opts)
 	kfree(opts->metadev_path);
 }

-#define MIN_LOCK_IDLE_COUNT	32
-#define DEFAULT_LOCK_IDLE_COUNT	(10 * 1000)
-#define MAX_LOCK_IDLE_COUNT	(100 * 1000)
-
 #define MIN_LOG_MERGE_WAIT_TIMEOUT_MS		100UL
 #define DEFAULT_LOG_MERGE_WAIT_TIMEOUT_MS	500
 #define MAX_LOG_MERGE_WAIT_TIMEOUT_MS		(60 * MSEC_PER_SEC)
@@ -136,7 +128,8 @@ static void free_options(struct scoutfs_mount_options *opts)
 #define MIN_DATA_PREALLOC_BLOCKS	1ULL
 #define MAX_DATA_PREALLOC_BLOCKS	((unsigned long long)SCOUTFS_BLOCK_SM_MAX)

-#define DEFAULT_TCP_KEEPALIVE_TIMEOUT_MS	(60 * MSEC_PER_SEC)
+#define SCOUTFS_META_RESERVE_DEFAULT_BLOCKS 16384
+

 static void init_default_options(struct scoutfs_mount_options *opts)
 {
@@ -144,28 +137,11 @@ static void init_default_options(struct scoutfs_mount_options *opts)

 	opts->data_prealloc_blocks = SCOUTFS_DATA_PREALLOC_DEFAULT_BLOCKS;
 	opts->data_prealloc_contig_only = 1;
-	opts->ino_alloc_per_lock = SCOUTFS_LOCK_INODE_GROUP_NR;
-	opts->lock_idle_count = DEFAULT_LOCK_IDLE_COUNT;
 	opts->log_merge_wait_timeout_ms = DEFAULT_LOG_MERGE_WAIT_TIMEOUT_MS;
 	opts->orphan_scan_delay_ms = -1;
 	opts->quorum_heartbeat_timeout_ms = SCOUTFS_QUORUM_DEF_HB_TIMEO_MS;
 	opts->quorum_slot_nr = -1;
-	opts->tcp_keepalive_timeout_ms = DEFAULT_TCP_KEEPALIVE_TIMEOUT_MS;
-}
-
-static int verify_lock_idle_count(struct super_block *sb, int ret, int val)
-{
-	if (ret < 0) {
-		scoutfs_err(sb, "failed to parse lock_idle_count value");
-		return -EINVAL;
-	}
-	if (val < MIN_LOCK_IDLE_COUNT || val > MAX_LOCK_IDLE_COUNT) {
-		scoutfs_err(sb, "invalid lock_idle_count value %d, must be between %u and %u",
-			    val, MIN_LOCK_IDLE_COUNT, MAX_LOCK_IDLE_COUNT);
-		return -EINVAL;
-	}
-
-	return 0;
+	opts->meta_reserve_blocks = SCOUTFS_META_RESERVE_DEFAULT_BLOCKS;
 }

 static int verify_log_merge_wait_timeout_ms(struct super_block *sb, int ret, int val)
@@ -197,16 +173,19 @@ static int verify_quorum_heartbeat_timeout_ms(struct super_block *sb, int ret, u

 	return 0;
 }
-
-static int verify_tcp_keepalive_timeout_ms(struct super_block *sb, int ret, int val)
+static int verify_meta_reserve_blocks(struct super_block *sb, int ret, int val)
 {
+	/*
+	 * Ideally we would cap this at something reasonable, like half of
+	 * total_meta_blocks, but that isn't available yet when mount is called.
+	 */
 	if (ret < 0) {
-		scoutfs_err(sb, "failed to parse tcp_keepalive_timeout_ms value");
+		scoutfs_err(sb, "failed to parse meta_reserve_blocks value");
 		return -EINVAL;
 	}
-	if (val <= (UNRESPONSIVE_PROBES * MSEC_PER_SEC)) {
-		scoutfs_err(sb, "invalid tcp_keepalive_timeout_ms value %d, must be larger than %lu",
-			    val, (UNRESPONSIVE_PROBES * MSEC_PER_SEC));
+	if (val < 0) {
+		scoutfs_err(sb, "invalid meta_reserve_blocks value %d, must be between 0 and %d",
+			    val, INT_MAX);
 		return -EINVAL;
 	}

@@ -263,34 +242,6 @@ static int parse_options(struct super_block *sb, char *options, struct scoutfs_m
 			opts->data_prealloc_contig_only = nr;
 			break;

-		case Opt_ino_alloc_per_lock:
-			ret = match_int(args, &nr);
-			if (ret < 0 || nr < 1 || nr > SCOUTFS_LOCK_INODE_GROUP_NR) {
-				scoutfs_err(sb, "invalid ino_alloc_per_lock option, must be between 1 and %u",
-					    SCOUTFS_LOCK_INODE_GROUP_NR);
-				if (ret == 0)
-					ret = -EINVAL;
-				return ret;
-			}
-			opts->ino_alloc_per_lock = nr;
-			break;
-
-		case Opt_tcp_keepalive_timeout_ms:
-			ret = match_int(args, &nr);
-			ret = verify_tcp_keepalive_timeout_ms(sb, ret, nr);
-			if (ret < 0)
-				return ret;
-			opts->tcp_keepalive_timeout_ms = nr;
-			break;
-
-		case Opt_lock_idle_count:
-			ret = match_int(args, &nr);
-			ret = verify_lock_idle_count(sb, ret, nr);
-			if (ret < 0)
-				return ret;
-			opts->lock_idle_count = nr;
-			break;
-
 		case Opt_log_merge_wait_timeout_ms:
 			ret = match_int(args, &nr);
 			ret = verify_log_merge_wait_timeout_ms(sb, ret, nr);
@@ -352,6 +303,14 @@ static int parse_options(struct super_block *sb, char *options, struct scoutfs_m
 			opts->quorum_slot_nr = nr;
 			break;

+		case Opt_meta_reserve_blocks:
+			ret = match_int(args, &nr);
+			ret = verify_meta_reserve_blocks(sb, ret, nr);
+			if (ret < 0)
+				return ret;
+			opts->meta_reserve_blocks = nr;
+			break;
+
 		default:
 			scoutfs_err(sb, "Unknown or malformed option, \"%s\"", p);
 			return -EINVAL;
@@ -438,14 +397,13 @@ int scoutfs_options_show(struct seq_file *seq, struct dentry *root)
 		seq_puts(seq, ",acl");
 	seq_printf(seq, ",data_prealloc_blocks=%llu", opts.data_prealloc_blocks);
 	seq_printf(seq, ",data_prealloc_contig_only=%u", opts.data_prealloc_contig_only);
-	seq_printf(seq, ",ino_alloc_per_lock=%u", opts.ino_alloc_per_lock);
 	seq_printf(seq, ",metadev_path=%s", opts.metadev_path);
 	if (!is_acl)
 		seq_puts(seq, ",noacl");
 	seq_printf(seq, ",orphan_scan_delay_ms=%u", opts.orphan_scan_delay_ms);
 	if (opts.quorum_slot_nr >= 0)
 		seq_printf(seq, ",quorum_slot_nr=%d", opts.quorum_slot_nr);
-	seq_printf(seq, ",tcp_keepalive_timeout_ms=%d", opts.tcp_keepalive_timeout_ms);
+	seq_printf(seq, ",meta_reserve_blocks=%llu", opts.meta_reserve_blocks);

 	return 0;
 }
@@ -527,82 +485,6 @@ static ssize_t data_prealloc_contig_only_store(struct kobject *kobj, struct kobj
 }
 SCOUTFS_ATTR_RW(data_prealloc_contig_only);

-static ssize_t ino_alloc_per_lock_show(struct kobject *kobj, struct kobj_attribute *attr,
-					 char *buf)
-{
-	struct super_block *sb = SCOUTFS_SYSFS_ATTRS_SB(kobj);
-	struct scoutfs_mount_options opts;
-
-	scoutfs_options_read(sb, &opts);
-
-	return snprintf(buf, PAGE_SIZE, "%u", opts.ino_alloc_per_lock);
-}
-static ssize_t ino_alloc_per_lock_store(struct kobject *kobj, struct kobj_attribute *attr,
-					  const char *buf, size_t count)
-{
-	struct super_block *sb = SCOUTFS_SYSFS_ATTRS_SB(kobj);
-	DECLARE_OPTIONS_INFO(sb, optinf);
-	char nullterm[20]; /* more than enough for octal -U32_MAX */
-	long val;
-	int len;
-	int ret;
-
-	len = min(count, sizeof(nullterm) - 1);
-	memcpy(nullterm, buf, len);
-	nullterm[len] = '\0';
-
-	ret = kstrtol(nullterm, 0, &val);
-	if (ret < 0 || val < 1 || val > SCOUTFS_LOCK_INODE_GROUP_NR) {
-		scoutfs_err(sb, "invalid ino_alloc_per_lock option, must be between 1 and %u",
-			    SCOUTFS_LOCK_INODE_GROUP_NR);
-		return -EINVAL;
-	}
-
-	write_seqlock(&optinf->seqlock);
-	optinf->opts.ino_alloc_per_lock = val;
-	write_sequnlock(&optinf->seqlock);
-
-	return count;
-}
-SCOUTFS_ATTR_RW(ino_alloc_per_lock);
-
-static ssize_t lock_idle_count_show(struct kobject *kobj, struct kobj_attribute *attr,
-						char *buf)
-{
-	struct super_block *sb = SCOUTFS_SYSFS_ATTRS_SB(kobj);
-	struct scoutfs_mount_options opts;
-
-	scoutfs_options_read(sb, &opts);
-
-	return snprintf(buf, PAGE_SIZE, "%u", opts.lock_idle_count);
-}
-static ssize_t lock_idle_count_store(struct kobject *kobj, struct kobj_attribute *attr,
-						 const char *buf, size_t count)
-{
-	struct super_block *sb = SCOUTFS_SYSFS_ATTRS_SB(kobj);
-	DECLARE_OPTIONS_INFO(sb, optinf);
-	char nullterm[30]; /* more than enough for octal -U64_MAX */
-	int val;
-	int len;
-	int ret;
-
-	len = min(count, sizeof(nullterm) - 1);
-	memcpy(nullterm, buf, len);
-	nullterm[len] = '\0';
-
-	ret = kstrtoint(nullterm, 0, &val);
-	ret = verify_lock_idle_count(sb, ret, val);
-	if (ret == 0) {
-		write_seqlock(&optinf->seqlock);
-		optinf->opts.lock_idle_count = val;
-		write_sequnlock(&optinf->seqlock);
-		ret = count;
-	}
-
-	return ret;
-}
-SCOUTFS_ATTR_RW(lock_idle_count);
-
 static ssize_t log_merge_wait_timeout_ms_show(struct kobject *kobj, struct kobj_attribute *attr,
 						char *buf)
 {
@@ -740,16 +622,26 @@ static ssize_t quorum_slot_nr_show(struct kobject *kobj, struct kobj_attribute *
 }
 SCOUTFS_ATTR_RO(quorum_slot_nr);

+static ssize_t meta_reserve_blocks_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf)
+{
+	struct super_block *sb = SCOUTFS_SYSFS_ATTRS_SB(kobj);
+	struct scoutfs_mount_options opts;
+
+	scoutfs_options_read(sb, &opts);
+
+	return snprintf(buf, PAGE_SIZE, "%llu\n", opts.meta_reserve_blocks);
+}
+SCOUTFS_ATTR_RO(meta_reserve_blocks);
+
 static struct attribute *options_attrs[] = {
 	SCOUTFS_ATTR_PTR(data_prealloc_blocks),
 	SCOUTFS_ATTR_PTR(data_prealloc_contig_only),
-	SCOUTFS_ATTR_PTR(ino_alloc_per_lock),
-	SCOUTFS_ATTR_PTR(lock_idle_count),
 	SCOUTFS_ATTR_PTR(log_merge_wait_timeout_ms),
 	SCOUTFS_ATTR_PTR(metadev_path),
 	SCOUTFS_ATTR_PTR(orphan_scan_delay_ms),
 	SCOUTFS_ATTR_PTR(quorum_heartbeat_timeout_ms),
 	SCOUTFS_ATTR_PTR(quorum_slot_nr),
+	SCOUTFS_ATTR_PTR(meta_reserve_blocks),
 	NULL,
 };

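The parse path above reads meta_reserve_blocks with match_int() into an int, while the option itself is stored as a u64. A minimal userspace sketch of range-checked parsing follows. It is not scoutfs code: parse_meta_reserve_blocks() is a hypothetical helper, and strtoull() stands in for the kernel's match_* helpers.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical userspace stand-in for the mount option parser; not
 * part of scoutfs.  Parses an unsigned decimal block count and applies
 * the same 0..INT_MAX range that the kernel verifier reports.
 */
static int parse_meta_reserve_blocks(const char *str, uint64_t *blocks)
{
	char *end;
	unsigned long long val;

	/* strtoull() would silently negate a leading '-', so reject it */
	if (str == NULL || *str == '\0' || *str == '-')
		return -EINVAL;

	errno = 0;
	val = strtoull(str, &end, 10);
	if (errno != 0 || *end != '\0' || val > INT_MAX)
		return -EINVAL;

	*blocks = val;
	return 0;
}
```

Parsing into an unsigned 64-bit value first and then range checking keeps the parsed type in line with the stored field, and rejects trailing garbage such as "12x".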
--- a/kmod/src/options.h
+++ b/kmod/src/options.h
@@ -8,18 +8,14 @@
 struct scoutfs_mount_options {
 	u64 data_prealloc_blocks;
 	bool data_prealloc_contig_only;
-	unsigned int ino_alloc_per_lock;
-	int lock_idle_count;
 	unsigned int log_merge_wait_timeout_ms;
 	char *metadev_path;
 	unsigned int orphan_scan_delay_ms;
 	int quorum_slot_nr;
 	u64 quorum_heartbeat_timeout_ms;
-	int tcp_keepalive_timeout_ms;
+	u64 meta_reserve_blocks;
 };

-#define UNRESPONSIVE_PROBES	3
-
 void scoutfs_options_read(struct super_block *sb, struct scoutfs_mount_options *opts);
 int scoutfs_options_show(struct seq_file *seq, struct dentry *root);

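The commit message gives the sizing arithmetic for this field: 16384 blocks is 1GB, and about 256 blocks per GB of meta device (1/64 of the device) is suggested for larger deployments. A small sketch of that calculation follows. The 64KB block size is inferred from "16384 blocks is 1GB", and the helper name and the choice to never go below the default are assumptions made for illustration.

```c
#include <stdint.h>

/* Assumption: 64KB metadata blocks, implied by 16384 blocks == 1GB. */
#define META_BLOCK_BYTES	(64ULL * 1024)
#define BYTES_PER_GB		(1024ULL * 1024 * 1024)
#define DEFAULT_RESERVE_BLOCKS	16384ULL
#define RESERVE_BLOCKS_PER_GB	256ULL

/*
 * Hypothetical helper: suggest a meta_reserve_blocks value for a meta
 * device of the given size.  Clamping to the default is an assumption
 * of this sketch, not something the patch does.
 */
static uint64_t suggested_meta_reserve_blocks(uint64_t meta_dev_bytes)
{
	uint64_t blocks = (meta_dev_bytes / BYTES_PER_GB) * RESERVE_BLOCKS_PER_GB;

	return blocks > DEFAULT_RESERVE_BLOCKS ? blocks : DEFAULT_RESERVE_BLOCKS;
}
```

For a 1TB meta device this yields 262144 blocks, which is 16GB, or 1/64 of the device.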
--- a/kmod/src/quorum.c
+++ b/kmod/src/quorum.c
@@ -243,6 +243,10 @@ static int send_msg_members(struct super_block *sb, int type, u64 term, int only
 	};
 	struct sockaddr_in sin;
 	struct msghdr mh = {
+#ifndef KC_MSGHDR_STRUCT_IOV_ITER
+		.msg_iov = (struct iovec *)&kv,
+		.msg_iovlen = 1,
+#endif
 		.msg_flags = MSG_DONTWAIT | MSG_NOSIGNAL,
 		.msg_name = &sin,
 		.msg_namelen = sizeof(sin),
@@ -264,7 +268,9 @@ static int send_msg_members(struct super_block *sb, int type, u64 term, int only

 		scoutfs_quorum_slot_sin(&qinf->qconf, i, &sin);
 		now = ktime_get();
-
+#ifdef KC_MSGHDR_STRUCT_IOV_ITER
+		iov_iter_init(&mh.msg_iter, WRITE, (struct iovec *)&kv, 1, sizeof(qmes));
+#endif
 		ret = kernel_sendmsg(qinf->sock, &mh, &kv, 1, kv.iov_len);
 		if (ret != kv.iov_len)
 			failed++;
@@ -306,6 +312,10 @@ static int recv_msg(struct super_block *sb, struct quorum_host_msg *msg,
 		.iov_len = sizeof(struct scoutfs_quorum_message),
 	};
 	struct msghdr mh = {
+#ifndef KC_MSGHDR_STRUCT_IOV_ITER
+		.msg_iov = (struct iovec *)&kv,
+		.msg_iovlen = 1,
+#endif
 		.msg_flags = MSG_NOSIGNAL,
 	};

@@ -323,6 +333,9 @@ static int recv_msg(struct super_block *sb, struct quorum_host_msg *msg,
 		ret = kc_tcp_sock_set_rcvtimeo(qinf->sock, rel_to);
 	}

+#ifdef KC_MSGHDR_STRUCT_IOV_ITER
+	iov_iter_init(&mh.msg_iter, READ, (struct iovec *)&kv, 1, sizeof(struct scoutfs_quorum_message));
+#endif
 	ret = kernel_recvmsg(qinf->sock, &mh, &kv, 1, kv.iov_len, mh.msg_flags);
 	if (ret < 0)
 		return ret;
@@ -507,10 +520,10 @@ static int update_quorum_block(struct super_block *sb, int event, u64 term, bool
 		set_quorum_block_event(sb, &blk, event, term);
 		ret = write_quorum_block(sb, blkno, &blk);
 		if (ret < 0)
 			scoutfs_err(sb, "error %d writing quorum block %llu after updating event %d term %llu",
 				    ret, blkno, event, term);
 	} else {
 		scoutfs_err(sb, "error %d reading quorum block %llu to update event %d term %llu",
 			    ret, blkno, event, term);
 	}

@@ -809,7 +822,6 @@ static void scoutfs_quorum_worker(struct work_struct *work)

 		/* followers and candidates start new election on timeout */
 		if (qst.role != LEADER &&
-		    msg.type == SCOUTFS_QUORUM_MSG_INVALID &&
 		    ktime_after(ktime_get(), qst.timeout)) {
 			/* .. but only if their server has stopped */
 			if (!scoutfs_server_is_down(sb)) {
@@ -970,10 +982,7 @@ static void scoutfs_quorum_worker(struct work_struct *work)
 	}

 	/* record that this slot no longer has an active quorum */
-	err = update_quorum_block(sb, SCOUTFS_QUORUM_EVENT_END, qst.term, true);
-	if (err < 0 && ret == 0)
-		ret = err;
-
+	update_quorum_block(sb, SCOUTFS_QUORUM_EVENT_END, qst.term, true);
 out:
 	if (ret < 0) {
 		scoutfs_err(sb, "quorum service saw error %d, shutting down.  This mount is no longer participating in quorum.  It should be remounted to restore service.",
@@ -1062,7 +1071,7 @@ static char *role_str(int role)
 		[LEADER] = "leader",
 	};

 	if (role < 0 || role >= ARRAY_SIZE(roles) || !roles[role])
 		return "invalid";

 	return roles[role];
@@ -1195,8 +1204,8 @@ static struct attribute *quorum_attrs[] = {

 static inline bool valid_ipv4_unicast(__be32 addr)
 {
 	return !(ipv4_is_multicast(addr) || ipv4_is_lbcast(addr) ||
 		 ipv4_is_zeronet(addr) || ipv4_is_local_multicast(addr));
 }

 static inline bool valid_ipv4_port(__be16 port)
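The bounds check in role_str() guards a table lookup, where an index equal to the array size is already one past the end. A standalone sketch of that pattern follows. Only LEADER and its "leader" string appear in the hunk above; the FOLLOWER and CANDIDATE names and strings, and the local ARRAY_SIZE definition, are assumptions made for illustration.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Assumed role values; only LEADER is visible in the diff. */
enum { FOLLOWER, CANDIDATE, LEADER };

static const char *role_str(int role)
{
	static const char *roles[] = {
		[FOLLOWER] = "follower",
		[CANDIDATE] = "candidate",
		[LEADER] = "leader",
	};

	/* valid indices are 0..ARRAY_SIZE-1, so reject role >= ARRAY_SIZE */
	if (role < 0 || (size_t)role >= ARRAY_SIZE(roles) || !roles[role])
		return "invalid";

	return roles[role];
}
```

With a strict ">" comparison, a role value of 3 would pass the check and read roles[3], which lies outside the three-entry table.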
--- a/kmod/src/quota.c
+++ b/kmod/src/quota.c
@@ -34,7 +34,6 @@
 #include "totl.h"
 #include "util.h"
 #include "quota.h"
-#include "trans.h"
 #include "counters.h"
 #include "scoutfs_trace.h"

@@ -1087,10 +1086,6 @@ int scoutfs_quota_mod_rule(struct super_block *sb, bool is_add,
 	if (ret < 0)
 		goto out;

-	ret = scoutfs_hold_trans(sb, true);
-	if (ret < 0)
-		goto out;
-
 	down_write(&qtinf->rwsem);

 	if (is_add) {
@@ -1100,30 +1095,28 @@ int scoutfs_quota_mod_rule(struct super_block *sb, bool is_add,
 		else if (ret == 0)
 			ret = -EEXIST;
 		if (ret < 0)
-			goto release;
+			goto unlock;

 		rule_to_rule_val(&rv, &rule);
 		ret = scoutfs_item_create(sb, &key, &rv, sizeof(rv), lock);
 		if (ret < 0)
-			goto release;
+			goto unlock;

 	} else {
 		ret = find_rule(sb, &rule, &key, lock) ?:
 		      scoutfs_item_delete(sb, &key, lock);
 		if (ret < 0)
-			goto release;
+			goto unlock;
 	}

 	scoutfs_quota_invalidate(sb);
 	ret = 0;

-release:
+unlock:
 	up_write(&qtinf->rwsem);
-	scoutfs_release_trans(sb);
-
-out:
 	scoutfs_unlock(sb, lock, SCOUTFS_LOCK_WRITE);

+out:
 	if (is_add)
 		trace_scoutfs_quota_add_rule(sb, &rule, ret);
 	else
--- a/kmod/src/scoutfs_trace.h
+++ b/kmod/src/scoutfs_trace.h
@@ -789,80 +789,6 @@ TRACE_EVENT(scoutfs_inode_walk_writeback,
 		  __entry->ino, __entry->write, __entry->ret)
 );

-TRACE_EVENT(scoutfs_orphan_scan_start,
-	TP_PROTO(struct super_block *sb),
-
-	TP_ARGS(sb),
-
-	TP_STRUCT__entry(
-		SCSB_TRACE_FIELDS
-	),
-
-	TP_fast_assign(
-		SCSB_TRACE_ASSIGN(sb);
-	),
-
-	TP_printk(SCSBF, SCSB_TRACE_ARGS)
-);
-
-TRACE_EVENT(scoutfs_orphan_scan_stop,
-	TP_PROTO(struct super_block *sb, bool work_todo),
-
-	TP_ARGS(sb, work_todo),
-
-	TP_STRUCT__entry(
-		SCSB_TRACE_FIELDS
-		__field(bool, work_todo)
-	),
-
-	TP_fast_assign(
-		SCSB_TRACE_ASSIGN(sb);
-		__entry->work_todo = work_todo;
-	),
-
-	TP_printk(SCSBF" work_todo %d", SCSB_TRACE_ARGS, __entry->work_todo)
-);
-
-TRACE_EVENT(scoutfs_orphan_scan_work,
-	TP_PROTO(struct super_block *sb, __u64 ino),
-
-	TP_ARGS(sb, ino),
-
-	TP_STRUCT__entry(
-		SCSB_TRACE_FIELDS
-		__field(__u64, ino)
-	),
-
-	TP_fast_assign(
-		SCSB_TRACE_ASSIGN(sb);
-		__entry->ino = ino;
-	),
-
-	TP_printk(SCSBF" ino %llu", SCSB_TRACE_ARGS,
-		  __entry->ino)
-);
-
-TRACE_EVENT(scoutfs_orphan_scan_end,
-	TP_PROTO(struct super_block *sb, __u64 ino, int ret),
-
-	TP_ARGS(sb, ino, ret),
-
-	TP_STRUCT__entry(
-		SCSB_TRACE_FIELDS
-		__field(__u64, ino)
-		__field(int, ret)
-	),
-
-	TP_fast_assign(
-		SCSB_TRACE_ASSIGN(sb);
-		__entry->ino = ino;
-		__entry->ret = ret;
-	),
-
-	TP_printk(SCSBF" ino %llu ret %d", SCSB_TRACE_ARGS,
-		  __entry->ino, __entry->ret)
-);
-
 DECLARE_EVENT_CLASS(scoutfs_lock_info_class,
 	TP_PROTO(struct super_block *sb, struct lock_info *linfo),

@@ -897,14 +823,13 @@ DEFINE_EVENT(scoutfs_lock_info_class, scoutfs_lock_destroy,
 );

 TRACE_EVENT(scoutfs_xattr_set,
-	TP_PROTO(struct super_block *sb, __u64 ino, size_t name_len,
-		 const void *value, size_t size, int flags),
+	TP_PROTO(struct super_block *sb, size_t name_len, const void *value,
+		 size_t size, int flags),

-	TP_ARGS(sb, ino, name_len, value, size, flags),
+	TP_ARGS(sb, name_len, value, size, flags),

 	TP_STRUCT__entry(
 		SCSB_TRACE_FIELDS
-		__field(__u64, ino)
 		__field(size_t, name_len)
 		__field(const void *, value)
 		__field(size_t, size)
@@ -913,16 +838,15 @@ TRACE_EVENT(scoutfs_xattr_set,

 	TP_fast_assign(
 		SCSB_TRACE_ASSIGN(sb);
-		__entry->ino = ino;
 		__entry->name_len = name_len;
 		__entry->value = value;
 		__entry->size = size;
 		__entry->flags = flags;
 	),

-	TP_printk(SCSBF" ino %llu name_len %zu value %p size %zu flags 0x%x",
-		  SCSB_TRACE_ARGS, __entry->ino,  __entry->name_len,
-		  __entry->value, __entry->size, __entry->flags)
+	TP_printk(SCSBF" name_len %zu value %p size %zu flags 0x%x",
+		  SCSB_TRACE_ARGS, __entry->name_len, __entry->value,
+		  __entry->size, __entry->flags)
 );

 TRACE_EVENT(scoutfs_advance_dirty_super,
@@ -1110,82 +1034,6 @@ TRACE_EVENT(scoutfs_orphan_inode,
 		  MINOR(__entry->dev), __entry->ino)
 );

-DECLARE_EVENT_CLASS(scoutfs_try_delete_class,
-        TP_PROTO(struct super_block *sb, u64 ino),
-        TP_ARGS(sb, ino),
-        TP_STRUCT__entry(
-		SCSB_TRACE_FIELDS
-		__field(__u64, ino)
-        ),
-        TP_fast_assign(
-		SCSB_TRACE_ASSIGN(sb);
-		__entry->ino = ino;
-        ),
-	TP_printk(SCSBF" ino %llu", SCSB_TRACE_ARGS, __entry->ino)
-);
-
-DEFINE_EVENT(scoutfs_try_delete_class, scoutfs_try_delete,
-        TP_PROTO(struct super_block *sb, u64 ino),
-        TP_ARGS(sb, ino)
-);
-
-DEFINE_EVENT(scoutfs_try_delete_class, scoutfs_try_delete_local_busy,
-        TP_PROTO(struct super_block *sb, u64 ino),
-        TP_ARGS(sb, ino)
-);
-
-DEFINE_EVENT(scoutfs_try_delete_class, scoutfs_try_delete_cached,
-        TP_PROTO(struct super_block *sb, u64 ino),
-        TP_ARGS(sb, ino)
-);
-
-DEFINE_EVENT(scoutfs_try_delete_class, scoutfs_try_delete_no_item,
-        TP_PROTO(struct super_block *sb, u64 ino),
-        TP_ARGS(sb, ino)
-);
-
-TRACE_EVENT(scoutfs_try_delete_has_links,
-	TP_PROTO(struct super_block *sb, u64 ino, unsigned int nlink),
-
-	TP_ARGS(sb, ino, nlink),
-
-	TP_STRUCT__entry(
-		SCSB_TRACE_FIELDS
-		__field(__u64, ino)
-		__field(unsigned int, nlink)
-	),
-
-	TP_fast_assign(
-		SCSB_TRACE_ASSIGN(sb);
-		__entry->ino = ino;
-		__entry->nlink = nlink;
-	),
-
-	TP_printk(SCSBF" ino %llu nlink %u", SCSB_TRACE_ARGS, __entry->ino,
-		  __entry->nlink)
-);
-
-TRACE_EVENT(scoutfs_inode_orphan_delete,
-	TP_PROTO(struct super_block *sb, u64 ino, int ret),
-
-	TP_ARGS(sb, ino, ret),
-
-	TP_STRUCT__entry(
-		SCSB_TRACE_FIELDS
-		__field(__u64, ino)
-		__field(int, ret)
-	),
-
-	TP_fast_assign(
-		SCSB_TRACE_ASSIGN(sb);
-		__entry->ino = ino;
-		__entry->ret = ret;
-	),
-
-	TP_printk(SCSBF" ino %llu ret %d", SCSB_TRACE_ARGS, __entry->ino,
-		__entry->ret)
-);
-
 TRACE_EVENT(scoutfs_delete_inode,
 	TP_PROTO(struct super_block *sb, u64 ino, umode_t mode, u64 size),

@@ -1210,32 +1058,6 @@ TRACE_EVENT(scoutfs_delete_inode,
 		  __entry->mode, __entry->size)
 );

-TRACE_EVENT(scoutfs_delete_inode_end,
-	TP_PROTO(struct super_block *sb, u64 ino, umode_t mode, u64 size, int ret),
-
-	TP_ARGS(sb, ino, mode, size, ret),
-
-	TP_STRUCT__entry(
-		__field(dev_t, dev)
-		__field(__u64, ino)
-		__field(umode_t, mode)
-		__field(__u64, size)
-		__field(int, ret)
-	),
-
-	TP_fast_assign(
-		__entry->dev = sb->s_dev;
-		__entry->ino = ino;
-		__entry->mode = mode;
-		__entry->size = size;
-		__entry->ret = ret;
-	),
-
-	TP_printk("dev %d,%d ino %llu, mode 0x%x size %llu, ret %d",
-		  MAJOR(__entry->dev), MINOR(__entry->dev), __entry->ino,
-		  __entry->mode, __entry->size, __entry->ret)
-);
-
 DECLARE_EVENT_CLASS(scoutfs_key_class,
        TP_PROTO(struct super_block *sb, struct scoutfs_key *key),
        TP_ARGS(sb, key),
@@ -1619,6 +1441,28 @@ DEFINE_EVENT(scoutfs_work_class, scoutfs_data_return_server_extents_exit,
        TP_ARGS(sb, data, ret)
 );

+DECLARE_EVENT_CLASS(scoutfs_shrink_exit_class,
+        TP_PROTO(struct super_block *sb, unsigned long nr_to_scan, int ret),
+        TP_ARGS(sb, nr_to_scan, ret),
+        TP_STRUCT__entry(
+		__field(void *, sb)
+		__field(unsigned long, nr_to_scan)
+		__field(int, ret)
+        ),
+        TP_fast_assign(
+		__entry->sb = sb;
+		__entry->nr_to_scan = nr_to_scan;
+		__entry->ret = ret;
+        ),
+        TP_printk("sb %p nr_to_scan %lu ret %d",
+		  __entry->sb, __entry->nr_to_scan, __entry->ret)
+);
+
+DEFINE_EVENT(scoutfs_shrink_exit_class, scoutfs_lock_shrink_exit,
+        TP_PROTO(struct super_block *sb, unsigned long nr_to_scan, int ret),
+        TP_ARGS(sb, nr_to_scan, ret)
+);
+
 TRACE_EVENT(scoutfs_rename,
 	TP_PROTO(struct super_block *sb, struct inode *old_dir,
 		 struct dentry *old_dentry, struct inode *new_dir,
@@ -2122,17 +1966,15 @@ DEFINE_EVENT(scoutfs_server_client_count_class, scoutfs_server_client_down,
 );

 DECLARE_EVENT_CLASS(scoutfs_server_commit_users_class,
-        TP_PROTO(struct super_block *sb, int holding, int applying,
-		 int nr_holders, u32 budget,
-		 u32 avail_before, u32 freed_before,
-		 int committing, int exceeded),
-        TP_ARGS(sb, holding, applying, nr_holders, budget, avail_before, freed_before, committing, exceeded),
+        TP_PROTO(struct super_block *sb, int holding, int applying, int nr_holders,
+		 u32 avail_before, u32 freed_before, int committing, int exceeded),
+        TP_ARGS(sb, holding, applying, nr_holders, avail_before, freed_before, committing,
+		exceeded),
        TP_STRUCT__entry(
 		SCSB_TRACE_FIELDS
 		__field(int, holding)
 		__field(int, applying)
 		__field(int, nr_holders)
-		__field(u32, budget)
 		__field(__u32, avail_before)
 		__field(__u32, freed_before)
 		__field(int, committing)
@@ -2143,45 +1985,35 @@ DECLARE_EVENT_CLASS(scoutfs_server_commit_users_class,
 		__entry->holding = !!holding;
 		__entry->applying = !!applying;
 		__entry->nr_holders = nr_holders;
-		__entry->budget = budget;
 		__entry->avail_before = avail_before;
 		__entry->freed_before = freed_before;
 		__entry->committing = !!committing;
 		__entry->exceeded = !!exceeded;
        ),
-	TP_printk(SCSBF" holding %u applying %u nr %u budget %u avail_before %u freed_before %u committing %u exceeded %u",
-		  SCSB_TRACE_ARGS, __entry->holding, __entry->applying,
-		  __entry->nr_holders, __entry->budget,
-		  __entry->avail_before, __entry->freed_before,
-		  __entry->committing, __entry->exceeded)
+	TP_printk(SCSBF" holding %u applying %u nr %u avail_before %u freed_before %u committing %u exceeded %u",
+		  SCSB_TRACE_ARGS, __entry->holding, __entry->applying, __entry->nr_holders,
+		  __entry->avail_before, __entry->freed_before, __entry->committing,
+		  __entry->exceeded)
 );
 DEFINE_EVENT(scoutfs_server_commit_users_class, scoutfs_server_commit_hold,
-        TP_PROTO(struct super_block *sb, int holding, int applying,
-		 int nr_holders, u32 budget,
-		 u32 avail_before, u32 freed_before,
-		 int committing, int exceeded),
-        TP_ARGS(sb, holding, applying, nr_holders, budget, avail_before, freed_before, committing, exceeded)
+        TP_PROTO(struct super_block *sb, int holding, int applying, int nr_holders,
+		 u32 avail_before, u32 freed_before, int committing, int exceeded),
+        TP_ARGS(sb, holding, applying, nr_holders, avail_before, freed_before, committing, exceeded)
 );
 DEFINE_EVENT(scoutfs_server_commit_users_class, scoutfs_server_commit_apply,
-        TP_PROTO(struct super_block *sb, int holding, int applying,
-		 int nr_holders, u32 budget,
-		 u32 avail_before, u32 freed_before,
-		 int committing, int exceeded),
-        TP_ARGS(sb, holding, applying, nr_holders, budget, avail_before, freed_before, committing, exceeded)
+        TP_PROTO(struct super_block *sb, int holding, int applying, int nr_holders,
+		 u32 avail_before, u32 freed_before, int committing, int exceeded),
+        TP_ARGS(sb, holding, applying, nr_holders, avail_before, freed_before, committing, exceeded)
 );
 DEFINE_EVENT(scoutfs_server_commit_users_class, scoutfs_server_commit_start,
-        TP_PROTO(struct super_block *sb, int holding, int applying,
-		 int nr_holders, u32 budget,
-		 u32 avail_before, u32 freed_before,
-		 int committing, int exceeded),
-        TP_ARGS(sb, holding, applying, nr_holders, budget, avail_before, freed_before, committing, exceeded)
+        TP_PROTO(struct super_block *sb, int holding, int applying, int nr_holders,
+		 u32 avail_before, u32 freed_before, int committing, int exceeded),
+        TP_ARGS(sb, holding, applying, nr_holders, avail_before, freed_before, committing, exceeded)
 );
 DEFINE_EVENT(scoutfs_server_commit_users_class, scoutfs_server_commit_end,
-        TP_PROTO(struct super_block *sb, int holding, int applying,
-		 int nr_holders, u32 budget,
-		 u32 avail_before, u32 freed_before,
-		 int committing, int exceeded),
-        TP_ARGS(sb, holding, applying, nr_holders, budget, avail_before, freed_before, committing, exceeded)
+        TP_PROTO(struct super_block *sb, int holding, int applying, int nr_holders,
+		 u32 avail_before, u32 freed_before, int committing, int exceeded),
+        TP_ARGS(sb, holding, applying, nr_holders, avail_before, freed_before, committing, exceeded)
 );

 #define slt_symbolic(mode)						\
@@ -2619,27 +2451,6 @@ TRACE_EVENT(scoutfs_block_dirty_ref,
 		  __entry->block_blkno, __entry->block_seq)
 );

-TRACE_EVENT(scoutfs_get_file_block,
-	TP_PROTO(struct super_block *sb, u64 blkno, int flags),
-
-	TP_ARGS(sb, blkno, flags),
-
-	TP_STRUCT__entry(
-		SCSB_TRACE_FIELDS
-		__field(__u64, blkno)
-		__field(int, flags)
-	),
-
-	TP_fast_assign(
-		SCSB_TRACE_ASSIGN(sb);
-		__entry->blkno = blkno;
-		__entry->flags = flags;
-	),
-
-	TP_printk(SCSBF" blkno %llu flags 0x%x",
-		  SCSB_TRACE_ARGS, __entry->blkno, __entry->flags)
-);
-
 TRACE_EVENT(scoutfs_block_stale,
 	TP_PROTO(struct super_block *sb, struct scoutfs_block_ref *ref,
 		 struct scoutfs_block_header *hdr, u32 magic, u32 crc),
@@ -2680,8 +2491,8 @@ TRACE_EVENT(scoutfs_block_stale,

 DECLARE_EVENT_CLASS(scoutfs_block_class,
 	TP_PROTO(struct super_block *sb, void *bp, u64 blkno, int refcount, int io_count,
-		 unsigned long bits),
-	TP_ARGS(sb, bp, blkno, refcount, io_count, bits),
+		 unsigned long bits, __u64 accessed),
+	TP_ARGS(sb, bp, blkno, refcount, io_count, bits, accessed),
 	TP_STRUCT__entry(
 		SCSB_TRACE_FIELDS
 		__field(void *, bp)
@@ -2689,6 +2500,7 @@ DECLARE_EVENT_CLASS(scoutfs_block_class,
 		__field(int, refcount)
 		__field(int, io_count)
 		__field(long, bits)
+		__field(__u64, accessed)
 	),
 	TP_fast_assign(
 		SCSB_TRACE_ASSIGN(sb);
@@ -2697,65 +2509,71 @@ DECLARE_EVENT_CLASS(scoutfs_block_class,
 		__entry->refcount = refcount;
 		__entry->io_count = io_count;
 		__entry->bits = bits;
+		__entry->accessed = accessed;
 	),
-	TP_printk(SCSBF" bp %p blkno %llu refcount %x io_count %d bits 0x%lx",
+	TP_printk(SCSBF" bp %p blkno %llu refcount %d io_count %d bits 0x%lx accessed %llu",
 		  SCSB_TRACE_ARGS, __entry->bp, __entry->blkno, __entry->refcount,
-		  __entry->io_count, __entry->bits)
+		  __entry->io_count, __entry->bits, __entry->accessed)
 );
 DEFINE_EVENT(scoutfs_block_class, scoutfs_block_allocate,
 	TP_PROTO(struct super_block *sb, void *bp, u64 blkno,
-		 int refcount, int io_count, unsigned long bits),
-	TP_ARGS(sb, bp, blkno, refcount, io_count, bits)
+		 int refcount, int io_count, unsigned long bits,
+		 __u64 accessed),
+	TP_ARGS(sb, bp, blkno, refcount, io_count, bits, accessed)
 );
 DEFINE_EVENT(scoutfs_block_class, scoutfs_block_free,
 	TP_PROTO(struct super_block *sb, void *bp, u64 blkno,
-		 int refcount, int io_count, unsigned long bits),
-	TP_ARGS(sb, bp, blkno, refcount, io_count, bits)
+		 int refcount, int io_count, unsigned long bits,
+		 __u64 accessed),
+	TP_ARGS(sb, bp, blkno, refcount, io_count, bits, accessed)
 );
 DEFINE_EVENT(scoutfs_block_class, scoutfs_block_insert,
 	TP_PROTO(struct super_block *sb, void *bp, u64 blkno,
-		 int refcount, int io_count, unsigned long bits),
-	TP_ARGS(sb, bp, blkno, refcount, io_count, bits)
+		 int refcount, int io_count, unsigned long bits,
+		 __u64 accessed),
+	TP_ARGS(sb, bp, blkno, refcount, io_count, bits, accessed)
 );
 DEFINE_EVENT(scoutfs_block_class, scoutfs_block_remove,
 	TP_PROTO(struct super_block *sb, void *bp, u64 blkno,
-		 int refcount, int io_count, unsigned long bits),
-	TP_ARGS(sb, bp, blkno, refcount, io_count, bits)
+		 int refcount, int io_count, unsigned long bits,
+		 __u64 accessed),
+	TP_ARGS(sb, bp, blkno, refcount, io_count, bits, accessed)
 );
 DEFINE_EVENT(scoutfs_block_class, scoutfs_block_end_io,
 	TP_PROTO(struct super_block *sb, void *bp, u64 blkno,
-		 int refcount, int io_count, unsigned long bits),
-	TP_ARGS(sb, bp, blkno, refcount, io_count, bits)
+		 int refcount, int io_count, unsigned long bits,
+		 __u64 accessed),
+	TP_ARGS(sb, bp, blkno, refcount, io_count, bits, accessed)
 );
 DEFINE_EVENT(scoutfs_block_class, scoutfs_block_submit,
 	TP_PROTO(struct super_block *sb, void *bp, u64 blkno,
-		 int refcount, int io_count, unsigned long bits),
-	TP_ARGS(sb, bp, blkno, refcount, io_count, bits)
+		 int refcount, int io_count, unsigned long bits,
+		 __u64 accessed),
+	TP_ARGS(sb, bp, blkno, refcount, io_count, bits, accessed)
 );
 DEFINE_EVENT(scoutfs_block_class, scoutfs_block_invalidate,
 	TP_PROTO(struct super_block *sb, void *bp, u64 blkno,
-		 int refcount, int io_count, unsigned long bits),
-	TP_ARGS(sb, bp, blkno, refcount, io_count, bits)
+		 int refcount, int io_count, unsigned long bits,
+		 __u64 accessed),
+	TP_ARGS(sb, bp, blkno, refcount, io_count, bits, accessed)
 );
 DEFINE_EVENT(scoutfs_block_class, scoutfs_block_mark_dirty,
 	TP_PROTO(struct super_block *sb, void *bp, u64 blkno,
-		 int refcount, int io_count, unsigned long bits),
-	TP_ARGS(sb, bp, blkno, refcount, io_count, bits)
+		 int refcount, int io_count, unsigned long bits,
+		 __u64 accessed),
+	TP_ARGS(sb, bp, blkno, refcount, io_count, bits, accessed)
 );
 DEFINE_EVENT(scoutfs_block_class, scoutfs_block_forget,
 	TP_PROTO(struct super_block *sb, void *bp, u64 blkno,
-		 int refcount, int io_count, unsigned long bits),
-	TP_ARGS(sb, bp, blkno, refcount, io_count, bits)
+		 int refcount, int io_count, unsigned long bits,
+		 __u64 accessed),
+	TP_ARGS(sb, bp, blkno, refcount, io_count, bits, accessed)
 );
 DEFINE_EVENT(scoutfs_block_class, scoutfs_block_shrink,
 	TP_PROTO(struct super_block *sb, void *bp, u64 blkno,
-		 int refcount, int io_count, unsigned long bits),
-	TP_ARGS(sb, bp, blkno, refcount, io_count, bits)
-);
-DEFINE_EVENT(scoutfs_block_class, scoutfs_block_isolate,
-	TP_PROTO(struct super_block *sb, void *bp, u64 blkno,
-		 int refcount, int io_count, unsigned long bits),
-	TP_ARGS(sb, bp, blkno, refcount, io_count, bits)
+		 int refcount, int io_count, unsigned long bits,
+		 __u64 accessed),
+	TP_ARGS(sb, bp, blkno, refcount, io_count, bits, accessed)
 );

 DECLARE_EVENT_CLASS(scoutfs_ext_next_class,
@@ -3230,45 +3048,6 @@ DEFINE_EVENT(scoutfs_srch_compact_class, scoutfs_srch_compact_client_recv,
 	TP_ARGS(sb, sc)
 );

-TRACE_EVENT(scoutfs_ioc_search_xattrs,
-	TP_PROTO(struct super_block *sb, u64 ino, u64 last_ino),
-
-	TP_ARGS(sb, ino, last_ino),
-
-	TP_STRUCT__entry(
-		SCSB_TRACE_FIELDS
-		__field(u64, ino)
-		__field(u64, last_ino)
-	),
-
-	TP_fast_assign(
-		SCSB_TRACE_ASSIGN(sb);
-		__entry->ino = ino;
-		__entry->last_ino = last_ino;
-	),
-
-	TP_printk(SCSBF" ino %llu last_ino %llu", SCSB_TRACE_ARGS,
-		  __entry->ino, __entry->last_ino)
-);
-
-TRACE_EVENT(scoutfs_trigger_fired,
-	TP_PROTO(struct super_block *sb, const char *name),
-
-	TP_ARGS(sb, name),
-
-	TP_STRUCT__entry(
-		SCSB_TRACE_FIELDS
-		__field(const char *, name)
-	),
-
-	TP_fast_assign(
-		SCSB_TRACE_ASSIGN(sb);
-		__entry->name = name;
-	),
-
-	TP_printk(SCSBF" %s", SCSB_TRACE_ARGS, __entry->name)
-);
-
 #endif /* _TRACE_SCOUTFS_H */

 /* This part must be outside protection */
--- a/kmod/src/server.c
+++ b/kmod/src/server.c
@@ -41,7 +41,6 @@
 #include "recov.h"
 #include "omap.h"
 #include "fence.h"
-#include "triggers.h"

 /*
 * Every active mount can act as the server that listens on a net
@@ -66,7 +65,6 @@ struct commit_users {
 	struct list_head holding;
 	struct list_head applying;
 	unsigned int nr_holders;
-	u32 budget;
 	u32 avail_before;
 	u32 freed_before;
 	bool committing;
@@ -86,9 +84,8 @@ static void init_commit_users(struct commit_users *cusers)
 do {												\
 	__typeof__(cusers) _cusers = (cusers);							\
 	trace_scoutfs_server_commit_##which(sb, !list_empty(&_cusers->holding),			\
-		!list_empty(&_cusers->applying), _cusers->nr_holders, _cusers->budget,		\
-		_cusers->avail_before, _cusers->freed_before, _cusers->committing,		\
-		_cusers->exceeded);								\
+		!list_empty(&_cusers->applying), _cusers->nr_holders, _cusers->avail_before,	\
+		_cusers->freed_before, _cusers->committing, _cusers->exceeded);			\
 } while (0)

 struct server_info {
@@ -256,14 +253,6 @@ static void server_down(struct server_info *server)
 		cmpxchg(&server->status, was, SERVER_DOWN);
 }

-static void init_mounted_client_key(struct scoutfs_key *key, u64 rid)
-{
-	*key = (struct scoutfs_key) {
-		.sk_zone = SCOUTFS_MOUNTED_CLIENT_ZONE,
-		.skmc_rid = cpu_to_le64(rid),
-	};
-}
-
 /*
 * The per-holder allocation block use budget balances batching
 * efficiency and concurrency.  The larger this gets, the fewer
@@ -314,6 +303,7 @@ static void check_holder_budget(struct super_block *sb, struct server_info *serv
 	u32 freed_used;
 	u32 avail_now;
 	u32 freed_now;
+	u32 budget;

 	assert_spin_locked(&cusers->lock);

@@ -328,14 +318,15 @@ static void check_holder_budget(struct super_block *sb, struct server_info *serv
 	else
 		freed_used = SCOUTFS_ALLOC_LIST_MAX_BLOCKS - freed_now;

-	if (avail_used <= cusers->budget && freed_used <= cusers->budget)
+	budget = cusers->nr_holders * COMMIT_HOLD_ALLOC_BUDGET;
+	if (avail_used <= budget && freed_used <= budget)
 		return;

 	exceeded_once = true;
 	cusers->exceeded = cusers->nr_holders;

-	scoutfs_err(sb, "holders exceeded alloc budget %u av: bef %u now %u, fr: bef %u now %u",
-		    cusers->budget, cusers->avail_before, avail_now,
+	scoutfs_err(sb, "%u holders exceeded alloc budget av: bef %u now %u, fr: bef %u now %u",
+		    cusers->nr_holders, cusers->avail_before, avail_now,
 		    cusers->freed_before, freed_now);

 	list_for_each_entry(hold, &cusers->holding, entry) {
@@ -358,7 +349,7 @@ static bool hold_commit(struct super_block *sb, struct server_info *server,
 {
 	bool has_room;
 	bool held;
-	u32 new_budget;
+	u32 budget;
 	u32 av;
 	u32 fr;

@@ -376,8 +367,8 @@ static bool hold_commit(struct super_block *sb, struct server_info *server,
 	}

 	/* +2 for our additional hold and then for the final commit work the server does */
-	new_budget = max(cusers->budget, (cusers->nr_holders + 2) * COMMIT_HOLD_ALLOC_BUDGET);
-	has_room = av >= new_budget && fr >= new_budget;
+	budget = (cusers->nr_holders + 2) * COMMIT_HOLD_ALLOC_BUDGET;
+	has_room = av >= budget && fr >= budget;
 	/* checking applying so holders drain once an apply caller starts waiting */
 	held = !cusers->committing && has_room && list_empty(&cusers->applying);

@@ -397,7 +388,6 @@ static bool hold_commit(struct super_block *sb, struct server_info *server,
 		list_add_tail(&hold->entry, &cusers->holding);

 		cusers->nr_holders++;
-		cusers->budget = new_budget;

 	} else if (!has_room && cusers->nr_holders == 0 && !cusers->committing) {
 		cusers->committing = true;
@@ -526,7 +516,6 @@ static void commit_end(struct super_block *sb, struct commit_users *cusers, int
 	list_for_each_entry_safe(hold, tmp, &cusers->applying, entry)
 		list_del_init(&hold->entry);
 	cusers->committing = false;
-	cusers->budget = 0;
 	spin_unlock(&cusers->lock);

 	wake_up(&cusers->waitq);
@@ -619,7 +608,7 @@ static void scoutfs_server_commit_func(struct work_struct *work)
 		goto out;

 	if (scoutfs_forcing_unmount(sb)) {
-		ret = -ENOLINK;
+		ret = -EIO;
 		goto out;
 	}

@@ -783,11 +772,14 @@ static int alloc_move_empty(struct super_block *sb,
 u64 scoutfs_server_reserved_meta_blocks(struct super_block *sb)
 {
 	DECLARE_SERVER_INFO(sb, server);
+	struct scoutfs_mount_options opts;
 	u64 server_blocks;
 	u64 client_blocks;
 	u64 log_blocks;
 	u64 nr_clients;

+	scoutfs_options_read(sb, &opts);
+
 	/* server has two meta_avail lists it swaps between */
 	server_blocks = SCOUTFS_SERVER_META_FILL_TARGET * 2;

@@ -812,7 +804,7 @@ u64 scoutfs_server_reserved_meta_blocks(struct super_block *sb)
 	nr_clients = server->nr_clients;
 	spin_unlock(&server->lock);

-	return server_blocks + (max(1ULL, nr_clients) * client_blocks);
+	return server_blocks + (max(1ULL, nr_clients) * client_blocks) + opts.meta_reserve_blocks;
 }

 /*
@@ -971,28 +963,6 @@ static int find_log_trees_item(struct super_block *sb,
 	return ret;
 }

-/*
- * Return true if the given rid has a mounted_clients entry.
- */
-static bool rid_is_mounted(struct super_block *sb, u64 rid)
-{
-	DECLARE_SERVER_INFO(sb, server);
-	struct scoutfs_super_block *super = DIRTY_SUPER_SB(sb);
-	SCOUTFS_BTREE_ITEM_REF(iref);
-	struct scoutfs_key key;
-	int ret;
-
-	init_mounted_client_key(&key, rid);
-
-	mutex_lock(&server->mounted_clients_mutex);
-	ret = scoutfs_btree_lookup(sb, &super->mounted_clients, &key, &iref);
-	if (ret == 0)
-		scoutfs_btree_put_iref(&iref);
-	mutex_unlock(&server->mounted_clients_mutex);
-
-	return ret == 0;
-}
-
 /*
 * Find the log_trees item with the greatest nr for each rid.  Fills the
 * caller's log_trees and sets the key before the returned log_trees for
@@ -1025,11 +995,10 @@ static int for_each_rid_last_lt(struct super_block *sb, struct scoutfs_btree_roo
 }

 /*
- * Log merge range items are stored at the starting fs key of the range
- * with the zone overwritten to indicate the log merge item type.  This
- * day0 mistake loses sorting information for items in the different
- * zones in the fs root, so the range items aren't strictly sorted by
- * the starting key of their range.
+ * Log merge range items are stored at the starting fs key of the range.
+ * The only fs key field that doesn't hold information is the zone, so
+ * we use the zone to differentiate all types that we store in the log
+ * merge tree.
 */
 static void init_log_merge_key(struct scoutfs_key *key, u8 zone, u64 first,
 			       u64 second)
@@ -1061,50 +1030,6 @@ static int next_log_merge_item_key(struct super_block *sb, struct scoutfs_btree_
 	return ret;
 }

-/*
- * The range items aren't sorted by their range.start because
- * _RANGE_ZONE clobbers the range's zone.  We sweep all the items and
- * find the range with the next least starting key that's greater than
- * the caller's starting key.  We have to be careful to iterate over the
- * log_merge tree keys because the ranges can overlap as they're mapped
- * to the log_merge keys by clobbering their zone.
- */
-static int next_log_merge_range(struct super_block *sb, struct scoutfs_btree_root *root,
-				struct scoutfs_key *start, struct scoutfs_log_merge_range *rng)
-{
-	struct scoutfs_log_merge_range *next;
-	SCOUTFS_BTREE_ITEM_REF(iref);
-	struct scoutfs_key key;
-	int ret;
-
-	init_log_merge_key(&key, SCOUTFS_LOG_MERGE_RANGE_ZONE, 0, 0);
-	scoutfs_key_set_ones(&rng->start);
-
-	do {
-		ret = scoutfs_btree_next(sb, root, &key, &iref);
-		if (ret == 0) {
-			if (iref.key->sk_zone != SCOUTFS_LOG_MERGE_RANGE_ZONE) {
-				ret = -ENOENT;
-			} else if (iref.val_len != sizeof(struct scoutfs_log_merge_range)) {
-				ret = -EIO;
-			} else {
-				next = iref.val;
-				if (scoutfs_key_compare(&next->start, &rng->start) < 0 &&
-				    scoutfs_key_compare(&next->start, start) >= 0)
-					*rng = *next;
-				key = *iref.key;
-				scoutfs_key_inc(&key);
-			}
-			scoutfs_btree_put_iref(&iref);
-		}
-	} while (ret == 0);
-
-	if (ret == -ENOENT && !scoutfs_key_is_ones(&rng->start))
-		ret = 0;
-
-	return ret;
-}
-
 static int next_log_merge_item(struct super_block *sb,
 			       struct scoutfs_btree_root *root,
 			       u8 zone, u64 first, u64 second,
@@ -1116,101 +1041,6 @@ static int next_log_merge_item(struct super_block *sb,
 	return next_log_merge_item_key(sb, root, zone, &key, val, val_len);
 }

-static int do_finalize_ours(struct super_block *sb,
-			    struct scoutfs_log_trees *lt,
-			    struct commit_hold *hold)
-{
-	struct server_info *server = SCOUTFS_SB(sb)->server_info;
-	struct scoutfs_super_block *super = DIRTY_SUPER_SB(sb);
-	struct scoutfs_key key;
-	char *err_str = NULL;
-	u64 rid = le64_to_cpu(lt->rid);
-	bool more;
-	int ret;
-	int err;
-
-	mutex_lock(&server->srch_mutex);
-	ret = scoutfs_srch_rotate_log(sb, &server->alloc, &server->wri,
-				      &super->srch_root, &lt->srch_file, true);
-	mutex_unlock(&server->srch_mutex);
-	if (ret < 0) {
-		scoutfs_err(sb, "error rotating srch log for rid %016llx: %d",
-			    rid, ret);
-		return ret;
-        }
-
-	do {
-		more = false;
-
-		/*
-		 * All of these can return errors, perhaps indicating successful
-		 * partial progress, after having modified the allocator trees.
-		 * We always have to update the roots in the log item.
-		 */
-		mutex_lock(&server->alloc_mutex);
-		ret = (err_str = "splice meta_freed to other_freed",
-				scoutfs_alloc_splice_list(sb, &server->alloc,
-					&server->wri, server->other_freed,
-					&lt->meta_freed)) ?:
-			(err_str = "splice meta_avail",
-			 scoutfs_alloc_splice_list(sb, &server->alloc,
-					&server->wri, server->other_freed,
-					&lt->meta_avail)) ?:
-			(err_str = "empty data_avail",
-			 alloc_move_empty(sb, &super->data_alloc,
-					  &lt->data_avail,
-					  COMMIT_HOLD_ALLOC_BUDGET / 2)) ?:
-			(err_str = "empty data_freed",
-			 alloc_move_empty(sb, &super->data_alloc,
-					  &lt->data_freed,
-					  COMMIT_HOLD_ALLOC_BUDGET / 2));
-		mutex_unlock(&server->alloc_mutex);
-
-		/*
-		 * only finalize, allowing merging, once the allocators are
-		 * fully freed
-		 */
-		if (ret == 0) {
-			/* the transaction is no longer open */
-			le64_add_cpu(&lt->flags, SCOUTFS_LOG_TREES_FINALIZED);
-			lt->finalize_seq = cpu_to_le64(scoutfs_server_next_seq(sb));
-		}
-
-		scoutfs_key_init_log_trees(&key, rid, le64_to_cpu(lt->nr));
-
-		err = scoutfs_btree_update(sb, &server->alloc, &server->wri,
-					   &super->logs_root, &key, lt,
-					   sizeof(*lt));
-		BUG_ON(err != 0); /* alloc, log, srch items out of sync */
-
-		if (ret == -EINPROGRESS) {
-			more = true;
-			mutex_unlock(&server->logs_mutex);
-			ret = server_apply_commit(sb, hold, 0);
-			if (ret < 0)
-				WARN_ON_ONCE(ret < 0);
-			server_hold_commit(sb, hold);
-			mutex_lock(&server->logs_mutex);
-		} else if (ret == 0) {
-			memset(&lt->item_root, 0, sizeof(lt->item_root));
-			memset(&lt->bloom_ref, 0, sizeof(lt->bloom_ref));
-			lt->inode_count_delta = 0;
-			lt->max_item_seq = 0;
-			lt->finalize_seq = 0;
-			le64_add_cpu(&lt->nr, 1);
-			lt->flags = 0;
-		}
-	} while (more);
-
-	if (ret < 0) {
-		scoutfs_err(sb,
-			    "error %d finalizing log trees for rid %016llx: %s",
-			    ret, rid, err_str);
-	}
-
-	return ret;
-}
-
 /*
 * Finalizing the log btrees for merging needs to be done carefully so
 * that items don't appear to go backwards in time.
@@ -1250,60 +1080,6 @@ static int do_finalize_ours(struct super_block *sb,
 * happens to arrive at just the right time.  That's fine, merging will
 * ignore and tear down the empty input.
 */
-
-static int reclaim_open_log_tree(struct super_block *sb, u64 rid);
-
-/*
- * Reclaim log trees for rids that have no mounted_clients entry.
- * They block merges by appearing active.  reclaim_open_log_tree
- * may need multiple commits to drain allocators (-EINPROGRESS).
- *
- * The caller holds logs_mutex and a commit, both are dropped and
- * re-acquired around each reclaim call.  Returns >0 if any orphans
- * were reclaimed so the caller can re-check state that may have
- * changed while the lock was dropped.
- */
-static int reclaim_orphan_log_trees(struct super_block *sb, u64 rid,
-				    struct commit_hold *hold)
-{
-	struct server_info *server = SCOUTFS_SB(sb)->server_info;
-	struct scoutfs_super_block *super = DIRTY_SUPER_SB(sb);
-	struct scoutfs_log_trees lt;
-	struct scoutfs_key key;
-	bool found = false;
-	u64 orphan_rid;
-	int ret;
-	int err;
-
-	scoutfs_key_init_log_trees(&key, U64_MAX, U64_MAX);
-	while ((ret = for_each_rid_last_lt(sb, &super->logs_root, &key, &lt)) > 0) {
-
-		if ((le64_to_cpu(lt.flags) & SCOUTFS_LOG_TREES_FINALIZED) ||
-		    le64_to_cpu(lt.rid) == rid ||
-		    rid_is_mounted(sb, le64_to_cpu(lt.rid)))
-			continue;
-
-		orphan_rid = le64_to_cpu(lt.rid);
-		scoutfs_err(sb, "reclaiming orphan log trees for rid %016llx nr %llu",
-			    orphan_rid, le64_to_cpu(lt.nr));
-		found = true;
-
-		do {
-			mutex_unlock(&server->logs_mutex);
-			err = reclaim_open_log_tree(sb, orphan_rid);
-			ret = server_apply_commit(sb, hold,
-						  err == -EINPROGRESS ? 0 : err);
-			server_hold_commit(sb, hold);
-			mutex_lock(&server->logs_mutex);
-		} while (err == -EINPROGRESS && ret == 0);
-
-		if (ret < 0)
-			break;
-	}
-
-	return ret < 0 ? ret : found;
-}
-
 #define FINALIZE_POLL_MIN_DELAY_MS	5U
 #define FINALIZE_POLL_MAX_DELAY_MS	100U
 #define FINALIZE_POLL_DELAY_GROWTH_PCT	150U
@@ -1316,6 +1092,7 @@ static int finalize_and_start_log_merge(struct super_block *sb, struct scoutfs_l
 	struct scoutfs_log_merge_range rng;
 	struct scoutfs_mount_options opts;
 	struct scoutfs_log_trees each_lt;
+	struct scoutfs_log_trees fin;
 	unsigned int delay_ms;
 	unsigned long timeo;
 	bool saw_finalized;
@@ -1344,16 +1121,6 @@ static int finalize_and_start_log_merge(struct super_block *sb, struct scoutfs_l
 			break;
 		}

-		ret = reclaim_orphan_log_trees(sb, rid, hold);
-		if (ret < 0) {
-			err_str = "reclaiming orphan log trees";
-			break;
-		}
-		if (ret > 0) {
-			/* lock was dropped, re-check merge status */
-			continue;
-		}
-
 		/* look for finalized and other active log btrees */
 		saw_finalized = false;
 		others_active = false;
@@ -1385,13 +1152,9 @@ static int finalize_and_start_log_merge(struct super_block *sb, struct scoutfs_l
 		 * meta was low so that deleted items are merged
 		 * promptly and freed blocks can bring the client out of
 		 * enospc.
-		 *
-		 * The trigger can be used to force a log merge in cases where
-		 * a test only generates small amounts of change.
 		 */
 		finalize_ours = (lt->item_root.height > 2) ||
-				(le32_to_cpu(lt->meta_avail.flags) & SCOUTFS_ALLOC_FLAG_LOW) ||
-				scoutfs_trigger(sb, LOG_MERGE_FORCE_FINALIZE_OURS);
+				(le32_to_cpu(lt->meta_avail.flags) & SCOUTFS_ALLOC_FLAG_LOW);

 		trace_scoutfs_server_finalize_decision(sb, rid, saw_finalized, others_active,
 						       ours_visible, finalize_ours, delay_ms,
@@ -1400,7 +1163,6 @@ static int finalize_and_start_log_merge(struct super_block *sb, struct scoutfs_l
 		/* done if we're not finalizing and there's no finalized */
 		if (!finalize_ours && !saw_finalized) {
 			ret = 0;
-			scoutfs_inc_counter(sb, log_merge_no_finalized);
 			break;
 		}

@@ -1435,11 +1197,32 @@ static int finalize_and_start_log_merge(struct super_block *sb, struct scoutfs_l

 		/* Finalize ours if it's visible to others */
 		if (ours_visible) {
-			ret = do_finalize_ours(sb, lt, hold);
+			fin = *lt;
+			memset(&fin.meta_avail, 0, sizeof(fin.meta_avail));
+			memset(&fin.meta_freed, 0, sizeof(fin.meta_freed));
+			memset(&fin.data_avail, 0, sizeof(fin.data_avail));
+			memset(&fin.data_freed, 0, sizeof(fin.data_freed));
+			memset(&fin.srch_file, 0, sizeof(fin.srch_file));
+			le64_add_cpu(&fin.flags, SCOUTFS_LOG_TREES_FINALIZED);
+			fin.finalize_seq = cpu_to_le64(scoutfs_server_next_seq(sb));
+
+			scoutfs_key_init_log_trees(&key, le64_to_cpu(fin.rid),
+						   le64_to_cpu(fin.nr));
+			ret = scoutfs_btree_update(sb, &server->alloc, &server->wri,
+						   &super->logs_root, &key, &fin,
+						   sizeof(fin));
 			if (ret < 0) {
-				err_str = "finalizing ours";
+				err_str = "updating finalized log_trees";
 				break;
 			}
+
+			memset(&lt->item_root, 0, sizeof(lt->item_root));
+			memset(&lt->bloom_ref, 0, sizeof(lt->bloom_ref));
+			lt->inode_count_delta = 0;
+			lt->max_item_seq = 0;
+			lt->finalize_seq = 0;
+			le64_add_cpu(&lt->nr, 1);
+			lt->flags = 0;
 		}

 		/* wait a bit for mounts to arrive */
@@ -1500,8 +1283,6 @@ static int finalize_and_start_log_merge(struct super_block *sb, struct scoutfs_l
 			BUG_ON(err); /* inconsistent */
 		}

-		scoutfs_inc_counter(sb, log_merge_start);
-
 		/* we're done, caller can make forward progress */
 		break;
 	}
@@ -1521,10 +1302,12 @@ static int finalize_and_start_log_merge(struct super_block *sb, struct scoutfs_l
 * is nested inside holding commits so we recheck the persistent item
 * each time we commit to make sure it's still what we think.   The
 * caller is still going to send the item to the client so we update the
- * caller's each time we make progress.  If we hit an error applying the
- * changes we make then we can't send the log_trees to the client.
+ * caller's each time we make progress.  This is a best-effort attempt
+ * to clean up and it's valid to leave extents in data_freed, so we don't
+ * return errors to the caller.  The client will continue the work later
+ * in get_log_trees or as the rid is reclaimed.
 */
-static int try_drain_data_freed(struct super_block *sb, struct scoutfs_log_trees *lt)
+static void try_drain_data_freed(struct super_block *sb, struct scoutfs_log_trees *lt)
 {
 	DECLARE_SERVER_INFO(sb, server);
 	struct scoutfs_super_block *super = DIRTY_SUPER_SB(sb);
@@ -1533,7 +1316,6 @@ static int try_drain_data_freed(struct super_block *sb, struct scoutfs_log_trees
 	struct scoutfs_log_trees drain;
 	struct scoutfs_key key;
 	COMMIT_HOLD(hold);
-	bool apply = false;
 	int ret = 0;
 	int err;

@@ -1542,27 +1324,22 @@ static int try_drain_data_freed(struct super_block *sb, struct scoutfs_log_trees
 	while (lt->data_freed.total_len != 0) {
 		server_hold_commit(sb, &hold);
 		mutex_lock(&server->logs_mutex);
-		apply = true;

 		ret = find_log_trees_item(sb, &super->logs_root, false, rid, U64_MAX, &drain);
-		if (ret < 0) {
-			ret = 0;
+		if (ret < 0)
 			break;
-		}

 		/* careful to only keep draining the caller's specific open trans */
 		if (drain.nr != lt->nr || drain.get_trans_seq != lt->get_trans_seq ||
 		    drain.commit_trans_seq != lt->commit_trans_seq || drain.flags != lt->flags) {
-			ret = 0;
+			ret = -ENOENT;
 			break;
 		}

 		ret = scoutfs_btree_dirty(sb, &server->alloc, &server->wri,
 					  &super->logs_root, &key);
-		if (ret < 0) {
-			ret = 0;
+		if (ret < 0)
 			break;
-		}

 		/* moving can modify and return errors, always update caller and item */
 		mutex_lock(&server->alloc_mutex);
@@ -1578,19 +1355,19 @@ static int try_drain_data_freed(struct super_block *sb, struct scoutfs_log_trees
 		BUG_ON(err < 0); /* dirtying must guarantee success */

 		mutex_unlock(&server->logs_mutex);
+
 		ret = server_apply_commit(sb, &hold, ret);
-		apply = false;
-
-		if (ret < 0)
+		if (ret < 0) {
+			ret = 0; /* don't try to abort, ignoring ret */
 			break;
+		}
 	}

-	if (apply) {
+	/* try to cleanly abort and write any partial dirty btree blocks, but ignore result */
+	if (ret < 0) {
 		mutex_unlock(&server->logs_mutex);
-		server_apply_commit(sb, &hold, ret);
+		server_apply_commit(sb, &hold, 0);
 	}
-
-	return ret;
 }

 /*
@@ -1718,8 +1495,7 @@ static int server_get_log_trees(struct super_block *sb,
 		goto update;
 	}

-	ret = alloc_move_empty(sb, &super->data_alloc, &lt.data_freed,
-			       COMMIT_HOLD_ALLOC_BUDGET / 2);
+	ret = alloc_move_empty(sb, &super->data_alloc, &lt.data_freed, 100);
 	if (ret == -EINPROGRESS)
 		ret = 0;
 	if (ret < 0) {
@@ -1799,9 +1575,9 @@ out:
 		scoutfs_err(sb, "error %d getting log trees for rid %016llx: %s",
 			    ret, rid, err_str);

-	/* try to drain excessive data_freed with additional commits, if needed */
+	/* try to drain excessive data_freed with additional commits, if needed, ignoring err */
 	if (ret == 0)
-		ret = try_drain_data_freed(sb, &lt);
+		try_drain_data_freed(sb, &lt);

 	return scoutfs_net_response(sb, conn, cmd, id, ret, &lt, sizeof(lt));
 }
@@ -1829,7 +1605,6 @@ static int server_commit_log_trees(struct super_block *sb,
 	int ret;

 	if (arg_len != sizeof(struct scoutfs_log_trees)) {
-		err_str = "invalid message log_trees size";
 		ret = -EINVAL;
 		goto out;
 	}
@@ -1893,7 +1668,7 @@ static int server_commit_log_trees(struct super_block *sb,

 	ret = scoutfs_btree_update(sb, &server->alloc, &server->wri,
 				   &super->logs_root, &key, &lt, sizeof(lt));
-	BUG_ON(ret < 0); /* dirtying should have guaranteed success, srch item inconsistent */
+	BUG_ON(ret < 0); /* dirtying should have guaranteed success */
 	if (ret < 0)
 		err_str = "updating log trees item";

@@ -1901,10 +1676,11 @@ unlock:
 	mutex_unlock(&server->logs_mutex);

 	ret = server_apply_commit(sb, &hold, ret);
-out:
 	if (ret < 0)
-		scoutfs_err(sb, "server error %d committing client logs for rid %016llx, nr %llu: %s",
-			    ret, rid, le64_to_cpu(lt.nr), err_str);
+		scoutfs_err(sb, "server error %d committing client logs for rid %016llx: %s",
+			    ret, rid, err_str);
+out:
+	WARN_ON_ONCE(ret < 0);
 	return scoutfs_net_response(sb, conn, cmd, id, ret, NULL, 0);
 }

@@ -2014,15 +1790,13 @@ static int reclaim_open_log_tree(struct super_block *sb, u64 rid)
 	       scoutfs_alloc_splice_list(sb, &server->alloc, &server->wri, server->other_freed,
 					 &lt.meta_avail)) ?:
 	      (err_str = "empty data_avail",
-	       alloc_move_empty(sb, &super->data_alloc, &lt.data_avail,
-				COMMIT_HOLD_ALLOC_BUDGET / 2)) ?:
+	       alloc_move_empty(sb, &super->data_alloc, &lt.data_avail, 100)) ?:
 	      (err_str = "empty data_freed",
-	       alloc_move_empty(sb, &super->data_alloc, &lt.data_freed,
-				COMMIT_HOLD_ALLOC_BUDGET / 2));
+	       alloc_move_empty(sb, &super->data_alloc, &lt.data_freed, 100));
 	mutex_unlock(&server->alloc_mutex);

 	/* only finalize, allowing merging, once the allocators are fully freed */
-	if (ret == 0 && !scoutfs_trigger(sb, RECLAIM_SKIP_FINALIZE)) {
+	if (ret == 0) {
 		/* the transaction is no longer open */
 		lt.commit_trans_seq = lt.get_trans_seq;

@@ -2039,9 +1813,6 @@ static int reclaim_open_log_tree(struct super_block *sb, u64 rid)
 out:
 	mutex_unlock(&server->logs_mutex);

-	if (ret == 0)
-		scoutfs_inc_counter(sb, reclaimed_open_logs);
-
 	if (ret < 0 && ret != -EINPROGRESS)
 		scoutfs_err(sb, "server error %d reclaiming log trees for rid %016llx: %s",
 			    ret, rid, err_str);
@@ -2074,8 +1845,7 @@ static int get_stable_trans_seq(struct super_block *sb, u64 *last_seq_ret)
 	scoutfs_key_init_log_trees(&key, U64_MAX, U64_MAX);
 	while ((ret = for_each_rid_last_lt(sb, &super->logs_root, &key, &lt)) > 0) {
 		if ((le64_to_cpu(lt.get_trans_seq) > le64_to_cpu(lt.commit_trans_seq)) &&
-		     le64_to_cpu(lt.get_trans_seq) <= last_seq &&
-		     rid_is_mounted(sb, le64_to_cpu(lt.rid))) {
+		     le64_to_cpu(lt.get_trans_seq) <= last_seq) {
 			last_seq = le64_to_cpu(lt.get_trans_seq) - 1;
 		}
 	}
@@ -2244,7 +2014,7 @@ static int server_srch_get_compact(struct super_block *sb,

 apply:
 	ret = server_apply_commit(sb, &hold, ret);
-	WARN_ON_ONCE(ret < 0 && ret != -ENOENT && ret != -ENOLINK); /* XXX leaked busy item */
+	WARN_ON_ONCE(ret < 0 && ret != -ENOENT); /* XXX leaked busy item */
 out:
 	ret = scoutfs_net_response(sb, conn, cmd, id, ret,
 				   sc, sizeof(struct scoutfs_srch_compact));
@@ -2284,7 +2054,7 @@ static int server_srch_commit_compact(struct super_block *sb,
 					  &super->srch_root, rid, sc,
 					  &av, &fr);
 	mutex_unlock(&server->srch_mutex);
-	if (ret < 0)
+	if (ret < 0) /* XXX very bad, leaks allocators */
 		goto apply;

 	/* reclaim allocators if they were set by _srch_commit_ */
@@ -2294,10 +2064,10 @@ static int server_srch_commit_compact(struct super_block *sb,
 	      scoutfs_alloc_splice_list(sb, &server->alloc, &server->wri,
 					server->other_freed, &fr);
 	mutex_unlock(&server->alloc_mutex);
-	WARN_ON(ret < 0); /* XXX leaks allocators */
 apply:
 	ret = server_apply_commit(sb, &hold, ret);
 out:
+	WARN_ON(ret < 0); /* XXX leaks allocators */
 	return scoutfs_net_response(sb, conn, cmd, id, ret, NULL, 0);
 }

@@ -2610,8 +2380,6 @@ static int splice_log_merge_completions(struct super_block *sb,
 		queue_work(server->wq, &server->log_merge_free_work);
 	else
 		err_str = "deleting merge status item";
-
-	scoutfs_inc_counter(sb, log_merge_complete);
 out:
 	if (upd_stat) {
 		init_log_merge_key(&key, SCOUTFS_LOG_MERGE_STATUS_ZONE, 0, 0);
@@ -2624,9 +2392,10 @@ out:
 		}
 	}

-	/* inconsistent */
-	scoutfs_bug_on_err(sb, ret,
-			   "server error %d splicing log merge completion: %s", ret, err_str);
+	if (ret < 0)
+		scoutfs_err(sb, "server error %d splicing log merge completion: %s", ret, err_str);
+
+	BUG_ON(ret); /* inconsistent */

 	return ret ?: einprogress;
 }
@@ -2761,7 +2530,7 @@ static void server_log_merge_free_work(struct work_struct *work)

 		ret = scoutfs_btree_free_blocks(sb, &server->alloc,
 						&server->wri, &fr.key,
-						&fr.root, COMMIT_HOLD_ALLOC_BUDGET / 8);
+						&fr.root, COMMIT_HOLD_ALLOC_BUDGET / 2);
 		if (ret < 0) {
 			err_str = "freeing log btree";
 			break;
@@ -2780,7 +2549,7 @@ static void server_log_merge_free_work(struct work_struct *work)
 		/* freed blocks are in allocator, we *have* to update fr */
 		BUG_ON(ret < 0);

-		if (server_hold_alloc_used_since(sb, &hold) >= (COMMIT_HOLD_ALLOC_BUDGET * 3) / 4) {
+		if (server_hold_alloc_used_since(sb, &hold) >= COMMIT_HOLD_ALLOC_BUDGET / 2) {
 			mutex_unlock(&server->logs_mutex);
 			ret = server_apply_commit(sb, &hold, ret);
 			commit = false;
@@ -2871,7 +2640,10 @@ restart:

 	/* find the next range, always checking for splicing */
 	for (;;) {
-		ret = next_log_merge_range(sb, &super->log_merge, &stat.next_range_key, &rng);
+		key = stat.next_range_key;
+		key.sk_zone = SCOUTFS_LOG_MERGE_RANGE_ZONE;
+		ret = next_log_merge_item_key(sb, &super->log_merge, SCOUTFS_LOG_MERGE_RANGE_ZONE,
+					      &key, &rng, sizeof(rng));
 		if (ret < 0 && ret != -ENOENT) {
 			err_str = "finding merge range item";
 			goto out;
@@ -3142,13 +2914,7 @@ static int server_commit_log_merge(struct super_block *sb,
 				  SCOUTFS_LOG_MERGE_STATUS_ZONE, 0, 0,
 				  &stat, sizeof(stat));
 	if (ret < 0) {
-		/*
-		 * During a retransmission, it's possible that the server
-		 * already committed and resolved this log merge. ENOENT
-		 * is expected in that case.
-		 */
-		if (ret != -ENOENT)
-			err_str = "getting merge status item";
+		err_str = "getting merge status item";
 		goto out;
 	}

@@ -3627,6 +3393,14 @@ out:
 	return scoutfs_net_response(sb, conn, cmd, id, ret, &nst, sizeof(nst));
 }

+static void init_mounted_client_key(struct scoutfs_key *key, u64 rid)
+{
+	*key = (struct scoutfs_key) {
+		.sk_zone = SCOUTFS_MOUNTED_CLIENT_ZONE,
+		.skmc_rid = cpu_to_le64(rid),
+	};
+}
+
 static bool invalid_mounted_client_item(struct scoutfs_btree_item_ref *iref)
 {
 	return (iref->val_len != sizeof(struct scoutfs_mounted_client_btree_val));
@@ -4378,7 +4152,7 @@ static void fence_pending_recov_worker(struct work_struct *work)
 	struct server_info *server = container_of(work, struct server_info,
 						  fence_pending_recov_work);
 	struct super_block *sb = server->sb;
-	union scoutfs_inet_addr addr = {{0,}};
+	union scoutfs_inet_addr addr;
 	u64 rid = 0;
 	int ret = 0;

--- a/kmod/src/sparse-filtered.sh
+++ b/kmod/src/sparse-filtered.sh
@@ -1,45 +0,0 @@
-#!/bin/bash
-
-#
-# Unfortunately, kernels can ship which contain sparse errors that are
-# unrelated to us.
-#
-# The exit status of this filtering wrapper will indicate an error if
-# sparse wasn't found or if there were any unfiltered output lines.  It
-# can hide error exit status from sparse or grep if they don't produce
-# output that makes it past the filters.
-#
-
-# must have sparse.  Fail with error message, mask success path.
-which sparse > /dev/null || exit 1
-
-# initial unmatchable, additional added as RE+="|..."
-RE="$^"
-
-#
-# Darn.  sparse has multi-line error messages, and I'd rather not bother
-# with multi-line filters.  So we'll just drop this context.
-#
-# command-line: note: in included file (through include/linux/netlink.h, include/linux/ethtool.h, include/linux/netdevice.h, include/net/sock.h, /root/scoutfs/kmod/src/kernelcompat.h, builtin): 
-#         fprintf(stderr, "%s: note: in included file%s:\n",
-#
-RE+="|: note: in included file"
-
-# 3.10.0-1160.119.1.el7.x86_64.debug
-# include/linux/posix_acl.h:138:9: warning: incorrect type in assignment (different address spaces)
-# include/linux/posix_acl.h:138:9:    expected struct posix_acl *<noident>
-# include/linux/posix_acl.h:138:9:    got struct posix_acl [noderef] <asn:4>*<noident>
-RE+="|include/linux/posix_acl.h:"
-
-# 3.10.0-1160.119.1.el7.x86_64.debug
-#include/uapi/linux/perf_event.h:146:56: warning: cast truncates bits from constant value (8000000000000000 becomes 0)
-RE+="|include/uapi/linux/perf_event.h:"
-
-# 4.18.0-513.24.1.el8_9.x86_64+debug'
-#./include/linux/skbuff.h:824:1: warning: directive in macro's argument list
-RE+="|include/linux/skbuff.h:"
-
-sparse "$@" |& \
-	grep -E -v "($RE)" |& \
-	awk '{ print $0 } END { exit NR > 0 }'
-exit $?
--- a/kmod/src/srch.c
+++ b/kmod/src/srch.c
@@ -62,7 +62,7 @@
 * re-allocated and re-written.  Search can restart by checking the
 * btree for the current set of files.  Compaction reads log files which
 * are protected from other compactions by the persistent busy items
- * created by the server.  Compaction won't see its blocks reused out
+ * created by the server.  Compaction won't see its blocks reused out
 * from under it, but it can encounter stale cached blocks that need to
 * be invalidated.
 */
@@ -442,10 +442,6 @@ out:
 	if (ret == 0 && (flags & GFB_INSERT) && blk >= le64_to_cpu(sfl->blocks))
 		sfl->blocks = cpu_to_le64(blk + 1);

-	if (bl) {
-		trace_scoutfs_get_file_block(sb, bl->blkno, flags);
-	}
-
 	*bl_ret = bl;
 	return ret;
 }
@@ -537,35 +533,23 @@ out:
 * the pairs cancel each other out by all readers (the second encoding
 * looks like deletion) so they aren't visible to the first/last bounds of
 * the block or file.
- *
- * We use the same entry repeatedly, so the diff between them will be empty.
- * This lets us just emit the two-byte count word, leaving the other bytes
- * as zero.
- *
- * Split the desired total len into two pieces, adding any remainder to the
- * first four-bit value.
 */
-static void append_padded_entry(struct scoutfs_srch_file *sfl,
-				struct scoutfs_srch_block *srb,
-				int len)
+static int append_padded_entry(struct scoutfs_srch_file *sfl, u64 blk,
+			       struct scoutfs_srch_block *srb, struct scoutfs_srch_entry *sre)
 {
-	int each;
-	int rem;
-	u16 lengths = 0;
-	u8 *buf = srb->entries + le32_to_cpu(srb->entry_bytes);
+	int ret;

-	each = (len - 2) >> 1;
-	rem = (len - 2) & 1;
+	ret = encode_entry(srb->entries + le32_to_cpu(srb->entry_bytes),
+			   sre, &srb->tail);
+	if (ret > 0) {
+		srb->tail = *sre;
+		le32_add_cpu(&srb->entry_nr, 1);
+		le32_add_cpu(&srb->entry_bytes, ret);
+		le64_add_cpu(&sfl->entries, 1);
+		ret = 0;
+	}

-	lengths |= each + rem;
-	lengths |= each << 4;
-
-	memset(buf, 0, len);
-	put_unaligned_le16(lengths, buf);
-
-	le32_add_cpu(&srb->entry_nr, 1);
-	le32_add_cpu(&srb->entry_bytes, len);
-	le64_add_cpu(&sfl->entries, 1);
+	return ret;
 }

 /*
@@ -576,41 +560,61 @@ static void append_padded_entry(struct scoutfs_srch_file *sfl,
 * This is called when there is a single existing entry in the block.
 * We have the entire block to work with.  We encode pairs of matching
 * entries.  This hides them from readers (both searches and merging) as
- * they're interpreted as creation and deletion and are deleted.
+ * they're interpreted as creation and deletion and are deleted.  We use
+ * the existing hash value of the first entry in the block but then set
+ * the inode to an impossibly large number so it doesn't interfere with
+ * anything.
 *
- * For simplicity and to maintain sort ordering within the block, we reuse
- * the existing entry. This lets us skip the encoding step, because we know
- * the diff will be zero. We can zero-pad the resulting entries to hit the
- * target offset exactly.
+ * To hit the specific offset we very carefully manage the amount of
+ * bytes of change between fields in the entry.  We know that if we
+ * change all the bytes of the ino and id we end up with a 20 byte
+ * (2+8+8,2) encoding of the pair of entries.  To have the last entry
+ * start at the _SAFE_POS offset we know that the final 20 byte pair
+ * encoding needs to end at 2 bytes (second entry encoding) after the
+ * _SAFE_POS offset.
 *
- * Because we can't predict the exact number of entry_bytes when we start,
- * we adjust the byte count of subsequent entries until we wind up at a
- * multiple of 20 bytes away from our goal and then use that length for
- * the remaining entries.
- *
- * We could just use a single pair of unnaturally large entries to consume
- * the needed space, adjusting for an odd number of entry_bytes if necessary.
- * The use of 19 or 20 bytes for the entry pair matches what we would see with
- * real (non-zero) entries that vary from the existing entry.
+ * So as we encode pairs we watch the delta of our current offset from
+ * that desired final offset of 2 past _SAFE_POS.  If we're a multiple
+ * of 20 away then we encode the full 20 byte pairs.  If we're not, then
+ * we drop a byte to encode 19 bytes.  That'll slowly change the offset
+ * to be a multiple of 20 again while encoding large entries.
 */
-static void pad_entries_at_safe(struct scoutfs_srch_file *sfl,
+static void pad_entries_at_safe(struct scoutfs_srch_file *sfl, u64 blk,
 				struct scoutfs_srch_block *srb)
 {
+	struct scoutfs_srch_entry sre;
 	u32 target;
 	s32 diff;
+	u64 hash;
+	u64 ino;
+	u64 id;
+	int ret;
+
+	hash = le64_to_cpu(srb->tail.hash);
+	ino = le64_to_cpu(srb->tail.ino) | (1ULL << 62);
+	id = le64_to_cpu(srb->tail.id);

 	target = SCOUTFS_SRCH_BLOCK_SAFE_BYTES + 2;

 	while ((diff = target - le32_to_cpu(srb->entry_bytes)) > 0) {
-		append_padded_entry(sfl, srb, 10);
+		ino ^= 1ULL << (7 * 8);
 		if (diff % 20 == 0) {
-			append_padded_entry(sfl, srb, 10);
+			id ^= 1ULL << (7 * 8);
 		} else {
-			append_padded_entry(sfl, srb, 9);
+			id ^= 1ULL << (6 * 8);
 		}
-	}

-	WARN_ON_ONCE(diff != 0);
+		sre.hash = cpu_to_le64(hash);
+		sre.ino = cpu_to_le64(ino);
+		sre.id = cpu_to_le64(id);
+
+		ret = append_padded_entry(sfl, blk, srb, &sre);
+		if (ret == 0)
+			ret = append_padded_entry(sfl, blk, srb, &sre);
+		BUG_ON(ret != 0);
+
+		diff = target - le32_to_cpu(srb->entry_bytes);
+	}
 }

 /*
@@ -745,14 +749,14 @@ static int search_log_file(struct super_block *sb,
 		for (i = 0; i < le32_to_cpu(srb->entry_nr); i++) {
 			if (pos > SCOUTFS_SRCH_BLOCK_SAFE_BYTES) {
 				/* can only be inconsistency :/ */
-				ret = -EIO;
+				ret = EIO;
 				break;
 			}

 			ret = decode_entry(srb->entries + pos, &sre, &prev);
 			if (ret <= 0) {
 				/* can only be inconsistency :/ */
-				ret = -EIO;
+				ret = EIO;
 				break;
 			}
 			pos += ret;
@@ -855,15 +859,15 @@ static int search_sorted_file(struct super_block *sb,

 		if (pos > SCOUTFS_SRCH_BLOCK_SAFE_BYTES) {
 			/* can only be inconsistency :/ */
-			ret = -EIO;
-			goto out;
+			ret = EIO;
+			break;
 		}

 		ret = decode_entry(srb->entries + pos, &sre, &prev);
 		if (ret <= 0) {
 			/* can only be inconsistency :/ */
-			ret = -EIO;
-			goto out;
+			ret = EIO;
+			break;
 		}
 		pos += ret;
 		prev = sre;
@@ -968,8 +972,6 @@ int scoutfs_srch_search_xattrs(struct super_block *sb,

 	scoutfs_inc_counter(sb, srch_search_xattrs);

-	trace_scoutfs_ioc_search_xattrs(sb, ino, last_ino);
-
 	*done = false;
 	srch_init_rb_root(sroot);

@@ -1406,7 +1408,7 @@ int scoutfs_srch_commit_compact(struct super_block *sb,
 			ret = -EIO;
 		scoutfs_btree_put_iref(&iref);
 	}
-	if (ret < 0)
+	if (ret < 0) /* XXX leaks allocators */
 		goto out;

 	/* restore busy to pending if the operation failed */
@@ -1426,8 +1428,10 @@ int scoutfs_srch_commit_compact(struct super_block *sb,
 	/* update file references if we finished compaction (!deleting) */
 	if (!(res->flags & SCOUTFS_SRCH_COMPACT_FLAG_DELETE)) {
 		ret = commit_files(sb, alloc, wri, root, res);
-		if (ret < 0)
+		if (ret < 0) {
+			/* XXX we can't commit, shutdown? */
 			goto out;
+		}

 		/* transition flags for deleting input files */
 		for (i = 0; i < res->nr; i++) {
@@ -1454,7 +1458,7 @@ update:
 			      le64_to_cpu(pending->id), 0);
 		ret = scoutfs_btree_insert(sb, alloc, wri, root, &key,
 					   pending, sizeof(*pending));
-		if (WARN_ON_ONCE(ret < 0)) /* XXX inconsistency */
+		if (ret < 0)
 			goto out;
 	}

@@ -1467,6 +1471,7 @@ update:
 		BUG_ON(err); /* both busy and pending present */
 	}
 out:
+	WARN_ON_ONCE(ret < 0); /* XXX inconsistency */
 	kfree(busy);
 	return ret;
 }
@@ -1664,7 +1669,7 @@ static int kway_merge(struct super_block *sb,
 			/* end sorted block on _SAFE offset for testing */
 			if (bl && le32_to_cpu(srb->entry_nr) == 1 && logs_input &&
 			    scoutfs_trigger(sb, SRCH_COMPACT_LOGS_PAD_SAFE)) {
-				pad_entries_at_safe(sfl, srb);
+				pad_entries_at_safe(sfl, blk, srb);
 				scoutfs_block_put(sb, bl);
 				bl = NULL;
 				blk++;
@@ -1797,7 +1802,7 @@ static void swap_page_sre(void *A, void *B, int size)
 * typically, ~10x worst case).
 *
 * Because we read and sort all the input files we must perform the full
- * compaction in one operation.  The server must have given us
+ * compaction in one operation.  The server must have given us a
 * sufficiently large avail/freed lists, otherwise we'll return ENOSPC.
 */
 static int compact_logs(struct super_block *sb,
@@ -1861,14 +1866,14 @@ static int compact_logs(struct super_block *sb,

 		if (pos > SCOUTFS_SRCH_BLOCK_SAFE_BYTES) {
 			/* can only be inconsistency :/ */
-			ret = -EIO;
-			goto out;
+			ret = EIO;
+			break;
 		}

 		ret = decode_entry(srb->entries + pos, sre, &prev);
 		if (ret <= 0) {
 			/* can only be inconsistency :/ */
-			ret = -EIO;
+			ret = EIO;
 			goto out;
 		}
 		prev = *sre;
@@ -2276,11 +2281,12 @@ static void scoutfs_srch_compact_worker(struct work_struct *work)
 	} else {
 		ret = -EINVAL;
 	}
+	if (ret < 0)
+		goto commit;

-	scoutfs_alloc_prepare_commit(sb, &alloc, &wri);
-	if (ret == 0)
+	ret = scoutfs_alloc_prepare_commit(sb, &alloc, &wri) ?:
 	      scoutfs_block_writer_write(sb, &wri);
-
+commit:
 	/* the server won't use our partial compact if _ERROR is set */
 	sc->meta_avail = alloc.avail;
 	sc->meta_freed = alloc.freed;
@@ -2297,7 +2303,7 @@ out:
 		scoutfs_inc_counter(sb, srch_compact_error);

 	scoutfs_block_writer_forget_all(sb, &wri);
-	queue_compact_work(srinf, sc != NULL && sc->nr > 0 && ret == 0);
+	queue_compact_work(srinf, sc->nr > 0 && ret == 0);

 	kfree(sc);
 }
--- a/kmod/src/super.c
+++ b/kmod/src/super.c
@@ -512,9 +512,9 @@ static int scoutfs_fill_super(struct super_block *sb, void *data, int silent)

 	sbi = kzalloc(sizeof(struct scoutfs_sb_info), GFP_KERNEL);
 	sb->s_fs_info = sbi;
+	sbi->sb = sb;
 	if (!sbi)
 		return -ENOMEM;
-	sbi->sb = sb;

 	ret = assign_random_id(sbi);
 	if (ret < 0)
--- a/kmod/src/totl.c
+++ b/kmod/src/totl.c
@@ -30,11 +30,6 @@ void scoutfs_totl_merge_init(struct scoutfs_totl_merging *merg)
 	memset(merg, 0, sizeof(struct scoutfs_totl_merging));
 }

-/*
- * bin the incoming merge inputs so that we can resolve delta items
- * properly. Finalized logs that are merge inputs are kept separately
- * from those that are not.
- */
 void scoutfs_totl_merge_contribute(struct scoutfs_totl_merging *merg,
 				   u64 seq, u8 flags, void *val, int val_len, int fic)
 {
@@ -44,10 +39,10 @@ void scoutfs_totl_merge_contribute(struct scoutfs_totl_merging *merg,
 		merg->fs_seq = seq;
 		merg->fs_total = le64_to_cpu(tval->total);
 		merg->fs_count = le64_to_cpu(tval->count);
-	} else if (fic & FIC_MERGE_INPUT) {
-		merg->inp_seq = seq;
-		merg->inp_total += le64_to_cpu(tval->total);
-		merg->inp_count += le64_to_cpu(tval->count);
+	} else if (fic & FIC_FINALIZED) {
+		merg->fin_seq = seq;
+		merg->fin_total += le64_to_cpu(tval->total);
+		merg->fin_count += le64_to_cpu(tval->count);
 	} else {
 		merg->log_seq = seq;
 		merg->log_total += le64_to_cpu(tval->total);
@@ -58,18 +53,15 @@ void scoutfs_totl_merge_contribute(struct scoutfs_totl_merging *merg,
 /*
 * .totl. item merging has to be careful because the log btree merging
 * code can write partial results to the fs_root.  This means that a
- * reader can see both cases where merge input deltas should be applied
- * to the old fs items and where they have already been applied to the
- * partially merged fs items.
- *
- * Only finalized log trees that are inputs to the current merge cycle
- * are tracked in the inp_ bucket.  Finalized trees that aren't merge
- * inputs and active log trees are always applied unconditionally since
- * they cannot be in fs_root.
+ * reader can see both cases where new finalized logs should be applied
+ * to the old fs items and where old finalized logs have already been
+ * applied to the partially merged fs items.  Currently active logged
+ * items are always applied on top of all cases.
 *
 * These cases are differentiated with a combination of sequence numbers
- * in items and the count of contributing xattrs.  This lets us
- * recognize all cases, including when merge inputs were merged and
+ * in items, the count of contributing xattrs, and a flag
+ * differentiating finalized and active logged items.  This lets us
+ * recognize all cases, including when finalized logs were merged and
 * deleted the fs item.
 */
 void scoutfs_totl_merge_resolve(struct scoutfs_totl_merging *merg, __u64 *total, __u64 *count)
@@ -83,14 +75,14 @@ void scoutfs_totl_merge_resolve(struct scoutfs_totl_merging *merg, __u64 *total,
 		*count = merg->fs_count;
 	}

-	/* apply merge input deltas if they're newer or creating */
-	if (((merg->fs_seq != 0) && (merg->inp_seq > merg->fs_seq)) ||
-	    ((merg->fs_seq == 0) && (merg->inp_count > 0))) {
-		*total += merg->inp_total;
-		*count += merg->inp_count;
+	/* apply finalized logs if they're newer or creating */
+	if (((merg->fs_seq != 0) && (merg->fin_seq > merg->fs_seq)) ||
+	    ((merg->fs_seq == 0) && (merg->fin_count > 0))) {
+		*total += merg->fin_total;
+		*count += merg->fin_count;
 	}

-	/* always apply non-input finalized and active logs */
+	/* always apply active logs which must be newer than fs and finalized */
 	if (merg->log_seq > 0) {
 		*total += merg->log_total;
 		*count += merg->log_count;
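Note on `scoutfs_totl_merge_resolve()`: the three-bucket resolution described in the comment on the `+` side of the hunks above can be exercised in userspace. This is a sketch under assumptions: the struct mirrors the `fin_` fields from totl.h, the start of the function (zeroing the outputs and taking the fs item when `fs_seq` is set) is not visible in this diff and is assumed, and the names `totl_merging` and `totl_resolve` are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace mirror of the fields shown in totl.h on the + side of the diff. */
struct totl_merging {
	uint64_t fs_seq, fs_total, fs_count;
	uint64_t fin_seq, fin_total;
	int64_t fin_count;
	uint64_t log_seq, log_total;
	int64_t log_count;
};

static void totl_resolve(const struct totl_merging *m,
			 uint64_t *total, uint64_t *count)
{
	assert(m && total && count);

	/* assumed: outputs start at zero and take the fs item if present */
	*total = 0;
	*count = 0;
	if (m->fs_seq != 0) {
		*total = m->fs_total;
		*count = m->fs_count;
	}

	/* apply finalized logs if they're newer than the fs item or creating it */
	if ((m->fs_seq != 0 && m->fin_seq > m->fs_seq) ||
	    (m->fs_seq == 0 && m->fin_count > 0)) {
		*total += m->fin_total;
		*count += m->fin_count;
	}

	/* always apply active logs */
	if (m->log_seq > 0) {
		*total += m->log_total;
		*count += m->log_count;
	}
}
```

With an fs item at seq 5, finalized deltas at seq 7 are added on top, while finalized deltas at seq 4 are skipped because the fs item already reflects them.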
--- a/kmod/src/totl.h
+++ b/kmod/src/totl.h
@@ -7,9 +7,9 @@ struct scoutfs_totl_merging {
 	u64 fs_seq;
 	u64 fs_total;
 	u64 fs_count;
-	u64 inp_seq;
-	u64 inp_total;
-	s64 inp_count;
+	u64 fin_seq;
+	u64 fin_total;
+	s64 fin_count;
 	u64 log_seq;
 	u64 log_total;
 	s64 log_count;
--- a/kmod/src/trans.c
+++ b/kmod/src/trans.c
@@ -159,58 +159,6 @@ static bool drained_holders(struct trans_info *tri)
 	return holders == 0;
 }

-static int commit_current_log_trees(struct super_block *sb, char **str)
-{
-	DECLARE_TRANS_INFO(sb, tri);
-
-	return (*str = "data submit", scoutfs_inode_walk_writeback(sb, true)) ?:
-	       (*str = "item dirty", scoutfs_item_write_dirty(sb))  ?:
-	       (*str = "data prepare", scoutfs_data_prepare_commit(sb))  ?:
-	       (*str = "alloc prepare", scoutfs_alloc_prepare_commit(sb, &tri->alloc, &tri->wri)) ?:
-	       (*str = "meta write", scoutfs_block_writer_write(sb, &tri->wri))  ?:
-	       (*str = "data wait", scoutfs_inode_walk_writeback(sb, false)) ?:
-	       (*str = "commit log trees", commit_btrees(sb)) ?:
-	       scoutfs_item_write_done(sb);
-}
-
-static int get_next_log_trees(struct super_block *sb, char **str)
-{
-	return (*str = "get log trees", scoutfs_trans_get_log_trees(sb));
-}
-
-static int retry_forever(struct super_block *sb, int (*func)(struct super_block *sb, char **str))
-{
-	bool retrying = false;
-	char *str;
-	int ret;
-
-	do {
-		str = NULL;
-
-		ret = func(sb, &str);
-		if (ret < 0) {
-			if (!retrying) {
-				scoutfs_warn(sb, "critical transaction commit failure: %s = %d, retrying",
-					    str, ret);
-				retrying = true;
-			}
-
-			if (scoutfs_forcing_unmount(sb)) {
-				ret = -ENOLINK;
-				break;
-			}
-
-			msleep(2 * MSEC_PER_SEC);
-
-		} else if (retrying) {
-			scoutfs_info(sb, "retried transaction commit succeeded");
-		}
-
-	} while (ret < 0);
-
-	return ret;
-}
-
 /*
 * This work func is responsible for writing out all the dirty blocks
 * that make up the current dirty transaction.  It prevents writers from
@@ -236,6 +184,8 @@ void scoutfs_trans_write_func(struct work_struct *work)
 	struct trans_info *tri = container_of(work, struct trans_info, write_work.work);
 	struct super_block *sb = tri->sb;
 	struct scoutfs_sb_info *sbi = SCOUTFS_SB(sb);
+	bool retrying = false;
+	char *s = NULL;
 	int ret = 0;

 	tri->task = current;
@@ -252,7 +202,7 @@ void scoutfs_trans_write_func(struct work_struct *work)
 	}

 	if (scoutfs_forcing_unmount(sb)) {
-		ret = -ENOLINK;
+		ret = -EIO;
 		goto out;
 	}

@@ -264,9 +214,37 @@ void scoutfs_trans_write_func(struct work_struct *work)

 	scoutfs_inc_counter(sb, trans_commit_written);

-	/* retry {commit,get}_log_trees until they succeeed, can only fail when forcing unmount */
-	ret = retry_forever(sb, commit_current_log_trees) ?:
-	      retry_forever(sb, get_next_log_trees);
+	do {
+		ret = (s = "data submit", scoutfs_inode_walk_writeback(sb, true)) ?:
+		      (s = "item dirty", scoutfs_item_write_dirty(sb))  ?:
+		      (s = "data prepare", scoutfs_data_prepare_commit(sb))  ?:
+		      (s = "alloc prepare", scoutfs_alloc_prepare_commit(sb, &tri->alloc,
+									 &tri->wri))  ?:
+		      (s = "meta write", scoutfs_block_writer_write(sb, &tri->wri))  ?:
+		      (s = "data wait", scoutfs_inode_walk_writeback(sb, false)) ?:
+		      (s = "commit log trees", commit_btrees(sb)) ?:
+		      scoutfs_item_write_done(sb) ?:
+		      (s = "get log trees", scoutfs_trans_get_log_trees(sb));
+		if (ret < 0) {
+			if (!retrying) {
+				scoutfs_warn(sb, "critical transaction commit failure: %s = %d, retrying",
+					    s, ret);
+				retrying = true;
+			}
+
+			if (scoutfs_forcing_unmount(sb)) {
+				ret = -EIO;
+				break;
+			}
+
+			msleep(2 * MSEC_PER_SEC);
+
+		} else if (retrying) {
+			scoutfs_info(sb, "retried transaction commit succeeded");
+		}
+
+	} while (ret < 0);
+
 out:
 	spin_lock(&tri->write_lock);
 	tri->write_count++;
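Note on the commit chain in `scoutfs_trans_write_func()`: the `(s = "name", call()) ?: ...` idiom combines the comma operator with the GNU C conditional that omits its middle operand. A minimal sketch of the idiom follows. The step functions and `run_steps()` are hypothetical stand-ins, not scoutfs functions, and the block needs a compiler that accepts the GNU `?:` extension (gcc and clang do).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical steps; each returns 0 on success or a negative errno-style value. */
static int step_ok(void)   { return 0; }
static int step_fail(void) { return -5; }

/*
 * Run the steps in order and stop at the first nonzero result.  The
 * comma operator stores the step's name before the call is made, so on
 * failure *s names the step that failed.  `a ?: b` yields a when a is
 * nonzero and b otherwise, evaluating a only once.
 */
static int run_steps(int fail_second, const char **s)
{
	assert(s != NULL);
	*s = NULL;

	return (*s = "first", step_ok()) ?:
	       (*s = "second", fail_second ? step_fail() : step_ok()) ?:
	       (*s = "third", step_ok());
}
```

A step written without a label keeps the previous step's name, so a failure in it is reported under that earlier name. The `+` side of the hunk above appears to have this property for `scoutfs_item_write_done()`, which follows the "commit log trees" label without setting its own.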
--- a/kmod/src/triggers.c
+++ b/kmod/src/triggers.c
@@ -18,7 +18,6 @@

 #include "super.h"
 #include "triggers.h"
-#include "scoutfs_trace.h"

 /*
 * We have debugfs files we can write to which arm triggers which
@@ -40,13 +39,10 @@ struct scoutfs_triggers {

 static char *names[] = {
 	[SCOUTFS_TRIGGER_BLOCK_REMOVE_STALE] = "block_remove_stale",
-	[SCOUTFS_TRIGGER_LOG_MERGE_FORCE_FINALIZE_OURS] = "log_merge_force_finalize_ours",
 	[SCOUTFS_TRIGGER_SRCH_COMPACT_LOGS_PAD_SAFE] = "srch_compact_logs_pad_safe",
 	[SCOUTFS_TRIGGER_SRCH_FORCE_LOG_ROTATE] = "srch_force_log_rotate",
 	[SCOUTFS_TRIGGER_SRCH_MERGE_STOP_SAFE] = "srch_merge_stop_safe",
 	[SCOUTFS_TRIGGER_STATFS_LOCK_PURGE] = "statfs_lock_purge",
-	[SCOUTFS_TRIGGER_RECLAIM_SKIP_FINALIZE] = "reclaim_skip_finalize",
-	[SCOUTFS_TRIGGER_LOG_MERGE_FORCE_PARTIAL] = "log_merge_force_partial",
 };

 bool scoutfs_trigger_test_and_clear(struct super_block *sb, unsigned int t)
@@ -55,7 +51,6 @@ bool scoutfs_trigger_test_and_clear(struct super_block *sb, unsigned int t)
 	atomic_t *atom;
 	int old;
 	int mem;
-	bool fired;

 	BUG_ON(t >= SCOUTFS_TRIGGER_NR);
 	atom = &triggers->atomics[t];
@@ -69,12 +64,7 @@ bool scoutfs_trigger_test_and_clear(struct super_block *sb, unsigned int t)
 		mem = atomic_cmpxchg(atom, old, 0);
 	} while (mem && mem != old);

-	fired = !!mem;
-
-	if (fired)
-		trace_scoutfs_trigger_fired(sb, names[t]);
-
-	return fired;
+	return !!mem;
 }

 int scoutfs_setup_triggers(struct super_block *sb)
--- a/kmod/src/triggers.h
+++ b/kmod/src/triggers.h
@@ -3,13 +3,10 @@

 enum scoutfs_trigger {
 	SCOUTFS_TRIGGER_BLOCK_REMOVE_STALE,
-	SCOUTFS_TRIGGER_LOG_MERGE_FORCE_FINALIZE_OURS,
 	SCOUTFS_TRIGGER_SRCH_COMPACT_LOGS_PAD_SAFE,
 	SCOUTFS_TRIGGER_SRCH_FORCE_LOG_ROTATE,
 	SCOUTFS_TRIGGER_SRCH_MERGE_STOP_SAFE,
 	SCOUTFS_TRIGGER_STATFS_LOCK_PURGE,
-	SCOUTFS_TRIGGER_RECLAIM_SKIP_FINALIZE,
-	SCOUTFS_TRIGGER_LOG_MERGE_FORCE_PARTIAL,
 	SCOUTFS_TRIGGER_NR,
 };

--- a/kmod/src/wkic.c
+++ b/kmod/src/wkic.c
@@ -95,7 +95,6 @@ struct wkic_info {
 	/* block reading slow path */
 	struct mutex roots_mutex;
 	struct scoutfs_net_roots roots;
-	u64 merge_input_seq;
 	u64 roots_read_seq;
 	ktime_t roots_expire;

@@ -806,79 +805,29 @@ static void free_page_list(struct super_block *sb, struct list_head *list)
 * read_seq number so that we can compare the age of the items in cached
 * pages.  Only one request to refresh the roots is in progress at a
 * time.  This is the slow path that's only used when the cache isn't
- * populated and the roots aren't cached.
- *
- * We read roots directly from the on-disk superblock rather than
- * requesting them from the server so that we can also read the
- * log_merge btree from the same superblock.  The merge status item
- * seq tells us which finalized log trees are inputs to the current
- * merge, which is needed to correctly resolve totl delta items.
+ * populated and the roots aren't cached.  The root request is fast
+ * enough, especially compared to the resulting item reading IO, that we
+ * don't mind hiding it behind a trivial mutex.
 */
-static int refresh_roots(struct super_block *sb, struct wkic_info *winf)
-{
-	struct scoutfs_super_block *super;
-	struct scoutfs_log_merge_status *stat;
-	SCOUTFS_BTREE_ITEM_REF(iref);
-	struct scoutfs_key key;
-	int ret;
-
-	super = kmalloc(sizeof(*super), GFP_NOFS);
-	if (!super)
-		return -ENOMEM;
-
-	ret = scoutfs_read_super(sb, super);
-	if (ret < 0)
-		goto out;
-
-	winf->roots = (struct scoutfs_net_roots){
-		.fs_root = super->fs_root,
-		.logs_root = super->logs_root,
-		.srch_root = super->srch_root,
-	};
-
-	winf->merge_input_seq = 0;
-	if (super->log_merge.ref.blkno) {
-		scoutfs_key_set_zeros(&key);
-		key.sk_zone = SCOUTFS_LOG_MERGE_STATUS_ZONE;
-		ret = scoutfs_btree_lookup(sb, &super->log_merge, &key, &iref);
-		if (ret == 0) {
-			if (iref.val_len == sizeof(*stat)) {
-				stat = iref.val;
-				winf->merge_input_seq = le64_to_cpu(stat->seq);
-			} else {
-				ret = -EUCLEAN;
-			}
-			scoutfs_btree_put_iref(&iref);
-		} else if (ret == -ENOENT) {
-			ret = 0;
-		}
-		if (ret < 0)
-			goto out;
-	}
-
-	winf->roots_read_seq++;
-	winf->roots_expire = ktime_add_ms(ktime_get_raw(), WKIC_CACHE_LIFETIME_MS);
-out:
-	kfree(super);
-	return ret;
-}
-
 static int get_roots(struct super_block *sb, struct wkic_info *winf,
-		     struct scoutfs_net_roots *roots_ret, u64 *merge_input_seq,
-		     u64 *read_seq, bool force_new)
+		     struct scoutfs_net_roots *roots_ret, u64 *read_seq, bool force_new)
 {
+	struct scoutfs_net_roots roots;
 	int ret;

 	mutex_lock(&winf->roots_mutex);

 	if (force_new || ktime_before(winf->roots_expire, ktime_get_raw())) {
-		ret = refresh_roots(sb, winf);
+		ret = scoutfs_client_get_roots(sb, &roots);
 		if (ret < 0)
 			goto out;
+
+		winf->roots = roots;
+		winf->roots_read_seq++;
+		winf->roots_expire = ktime_add_ms(ktime_get_raw(), WKIC_CACHE_LIFETIME_MS);
 	}

 	*roots_ret = winf->roots;
-	*merge_input_seq = winf->merge_input_seq;
 	*read_seq = winf->roots_read_seq;
 	ret = 0;
 out:
@@ -921,30 +870,24 @@ static int insert_read_pages(struct super_block *sb, struct wkic_info *winf,
 	struct scoutfs_key end;
 	struct wkic_page *wpage;
 	LIST_HEAD(pages);
-	u64 merge_input_seq;
-	u64 read_seq = 0;
+	u64 read_seq;
 	int ret;

 	ret = 0;
 retry_stale:
-	ret = get_roots(sb, winf, &roots, &merge_input_seq, &read_seq, ret == -ESTALE);
+	ret = get_roots(sb, winf, &roots, &read_seq, ret == -ESTALE);
 	if (ret < 0)
-		goto check_stale;
+		goto out;

 	start = *range_start;
 	end = *range_end;
-	ret = scoutfs_forest_read_items_roots(sb, &roots, merge_input_seq, key, range_start,
-					      &start, &end, read_items_cb, &root);
+	ret = scoutfs_forest_read_items_roots(sb, &roots, key, range_start, &start, &end,
+					      read_items_cb, &root);
 	trace_scoutfs_wkic_read_items(sb, key, &start, &end);
-check_stale:
 	ret = scoutfs_block_check_stale(sb, ret, &saved, &roots.fs_root.ref, &roots.logs_root.ref);
 	if (ret < 0) {
-		if (ret == -ESTALE) {
-			/* not safe to retry due to delta items, must restart clean */
-			free_item_tree(&root);
-			root = RB_ROOT;
+		if (ret == -ESTALE)
 			goto retry_stale;
-		}
 		goto out;
 	}

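Note on `get_roots()` in the wkic.c hunks above: the refresh policy on the `+` side (fetch when forced or when the cached roots have expired, bump `read_seq` on every fetch) can be sketched without the kernel. This is an illustration under assumptions: the mutex is omitted, time is passed in as a plain millisecond counter instead of `ktime_get_raw()`, `fetch_roots()` is a hypothetical stand-in for `scoutfs_client_get_roots()`, and `CACHE_LIFETIME_MS` is an invented value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LIFETIME_MS 1000u /* hypothetical lifetime */

struct roots_cache {
	uint64_t roots;     /* stand-in for struct scoutfs_net_roots */
	uint64_t read_seq;  /* bumped on every successful refresh */
	uint64_t expire_ms; /* refresh once now_ms is past this */
};

static uint64_t fetch_calls;

/* Hypothetical fetch: returns a new "roots" value on every call. */
static int fetch_roots(uint64_t *roots)
{
	*roots = ++fetch_calls;
	return 0;
}

static int get_roots(struct roots_cache *c, uint64_t now_ms, bool force_new,
		     uint64_t *roots_ret, uint64_t *read_seq)
{
	assert(c && roots_ret && read_seq);

	if (force_new || c->expire_ms < now_ms) {
		uint64_t roots;
		int ret = fetch_roots(&roots);

		if (ret < 0)
			return ret; /* leave the cached copy untouched */

		c->roots = roots;
		c->read_seq++;
		c->expire_ms = now_ms + CACHE_LIFETIME_MS;
	}

	*roots_ret = c->roots;
	*read_seq = c->read_seq;
	return 0;
}
```

Fetching into a local and only then updating the cache means a failed fetch never leaves half-updated state behind, which mirrors the local `roots` variable introduced on the `+` side.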
--- a/kmod/src/xattr.c
+++ b/kmod/src/xattr.c
@@ -742,7 +742,7 @@ int scoutfs_xattr_set_locked(struct inode *inode, const char *name, size_t name_
 	int ret;
 	int err;

-	trace_scoutfs_xattr_set(sb, ino, name_len, value, size, flags);
+	trace_scoutfs_xattr_set(sb, name_len, value, size, flags);

 	if (WARN_ON_ONCE(tgs->totl && tgs->indx) ||
 	    WARN_ON_ONCE((tgs->totl | tgs->indx) && !tag_lock))
@@ -1265,7 +1265,6 @@ int scoutfs_xattr_drop(struct super_block *sb, u64 ino,
 			ret = parse_indx_key(&tag_key, xat->name, xat->name_len, ino);
 			if (ret < 0)
 				goto out;
-			scoutfs_xattr_set_indx_key_xid(&tag_key, le64_to_cpu(key.skx_id));
 		}

 		if ((tgs.totl || tgs.indx) && locked_zone != tag_key.sk_zone) {
--- a/tests/README.md
+++ b/tests/README.md
@@ -117,7 +117,6 @@ used during the test.
 | T\_NR\_MOUNTS    | number of mounts     | -n              | 3                 |
 | T\_O[0-9]        | mount options        | created per run | -o server\_addr=  |
 | T\_QUORUM        | quorum count         | -q              | 2                 |
-| T\_EXTRA         | per-test file dir    | revision ctled  | tests/extra/t     |
 | T\_TMP           | per-test tmp prefix  | made for test   | results/tmp/t/tmp |
 | T\_TMPDIR        | per-test tmp dir dir | made for test   | results/tmp/t     |

--- a/tests/extra/xfstests/expected-results
+++ b/tests/extra/xfstests/expected-results
@@ -1,882 +0,0 @@
-Ran:
-generic/001
-generic/002
-generic/004
-generic/005
-generic/006
-generic/007
-generic/008
-generic/009
-generic/011
-generic/012
-generic/013
-generic/014
-generic/015
-generic/016
-generic/018
-generic/020
-generic/021
-generic/022
-generic/023
-generic/024
-generic/025
-generic/026
-generic/028
-generic/029
-generic/030
-generic/031
-generic/032
-generic/033
-generic/034
-generic/035
-generic/037
-generic/039
-generic/040
-generic/041
-generic/050
-generic/052
-generic/053
-generic/056
-generic/057
-generic/058
-generic/059
-generic/060
-generic/061
-generic/062
-generic/063
-generic/064
-generic/065
-generic/066
-generic/067
-generic/069
-generic/070
-generic/071
-generic/073
-generic/076
-generic/078
-generic/079
-generic/080
-generic/081
-generic/082
-generic/084
-generic/086
-generic/087
-generic/088
-generic/090
-generic/091
-generic/092
-generic/094
-generic/096
-generic/097
-generic/098
-generic/099
-generic/101
-generic/104
-generic/105
-generic/106
-generic/107
-generic/110
-generic/111
-generic/113
-generic/114
-generic/115
-generic/116
-generic/117
-generic/118
-generic/119
-generic/120
-generic/121
-generic/122
-generic/123
-generic/124
-generic/126
-generic/128
-generic/129
-generic/130
-generic/131
-generic/134
-generic/135
-generic/136
-generic/138
-generic/139
-generic/140
-generic/141
-generic/142
-generic/143
-generic/144
-generic/145
-generic/146
-generic/147
-generic/148
-generic/149
-generic/150
-generic/151
-generic/152
-generic/153
-generic/154
-generic/155
-generic/156
-generic/157
-generic/158
-generic/159
-generic/160
-generic/161
-generic/162
-generic/163
-generic/169
-generic/171
-generic/172
-generic/173
-generic/174
-generic/177
-generic/178
-generic/179
-generic/180
-generic/181
-generic/182
-generic/183
-generic/184
-generic/185
-generic/188
-generic/189
-generic/190
-generic/191
-generic/193
-generic/194
-generic/195
-generic/196
-generic/197
-generic/198
-generic/199
-generic/200
-generic/201
-generic/202
-generic/203
-generic/205
-generic/206
-generic/207
-generic/210
-generic/211
-generic/212
-generic/214
-generic/215
-generic/216
-generic/217
-generic/218
-generic/219
-generic/220
-generic/221
-generic/222
-generic/223
-generic/225
-generic/227
-generic/228
-generic/229
-generic/230
-generic/235
-generic/236
-generic/237
-generic/238
-generic/240
-generic/244
-generic/245
-generic/246
-generic/247
-generic/248
-generic/249
-generic/250
-generic/252
-generic/253
-generic/254
-generic/255
-generic/256
-generic/257
-generic/258
-generic/259
-generic/260
-generic/261
-generic/262
-generic/263
-generic/264
-generic/265
-generic/266
-generic/267
-generic/268
-generic/271
-generic/272
-generic/276
-generic/277
-generic/278
-generic/279
-generic/281
-generic/282
-generic/283
-generic/284
-generic/286
-generic/287
-generic/288
-generic/289
-generic/290
-generic/291
-generic/292
-generic/293
-generic/294
-generic/295
-generic/296
-generic/301
-generic/302
-generic/303
-generic/304
-generic/305
-generic/306
-generic/307
-generic/308
-generic/309
-generic/312
-generic/313
-generic/314
-generic/315
-generic/316
-generic/317
-generic/319
-generic/322
-generic/324
-generic/325
-generic/326
-generic/327
-generic/328
-generic/329
-generic/330
-generic/331
-generic/332
-generic/335
-generic/336
-generic/337
-generic/341
-generic/342
-generic/343
-generic/346
-generic/348
-generic/353
-generic/355
-generic/358
-generic/359
-generic/360
-generic/361
-generic/362
-generic/363
-generic/364
-generic/365
-generic/366
-generic/367
-generic/368
-generic/369
-generic/370
-generic/371
-generic/372
-generic/373
-generic/374
-generic/375
-generic/376
-generic/377
-generic/378
-generic/379
-generic/380
-generic/381
-generic/382
-generic/383
-generic/384
-generic/385
-generic/386
-generic/389
-generic/391
-generic/392
-generic/393
-generic/394
-generic/395
-generic/396
-generic/397
-generic/398
-generic/400
-generic/401
-generic/402
-generic/403
-generic/404
-generic/406
-generic/407
-generic/408
-generic/412
-generic/413
-generic/414
-generic/417
-generic/419
-generic/420
-generic/421
-generic/422
-generic/424
-generic/425
-generic/426
-generic/427
-generic/428
-generic/436
-generic/437
-generic/439
-generic/440
-generic/443
-generic/445
-generic/446
-generic/448
-generic/449
-generic/450
-generic/451
-generic/452
-generic/453
-generic/454
-generic/456
-generic/458
-generic/460
-generic/462
-generic/463
-generic/465
-generic/466
-generic/468
-generic/469
-generic/470
-generic/471
-generic/474
-generic/477
-generic/478
-generic/479
-generic/480
-generic/481
-generic/483
-generic/485
-generic/486
-generic/487
-generic/488
-generic/489
-generic/490
-generic/491
-generic/492
-generic/498
-generic/499
-generic/501
-generic/502
-generic/503
-generic/504
-generic/505
-generic/506
-generic/507
-generic/508
-generic/509
-generic/510
-generic/511
-generic/512
-generic/513
-generic/514
-generic/515
-generic/516
-generic/517
-generic/518
-generic/519
-generic/520
-generic/523
-generic/524
-generic/525
-generic/526
-generic/527
-generic/528
-generic/529
-generic/530
-generic/531
-generic/533
-generic/534
-generic/535
-generic/536
-generic/537
-generic/538
-generic/539
-generic/540
-generic/541
-generic/542
-generic/543
-generic/544
-generic/545
-generic/546
-generic/547
-generic/548
-generic/549
-generic/550
-generic/552
-generic/553
-generic/555
-generic/556
-generic/557
-generic/566
-generic/567
-generic/571
-generic/572
-generic/573
-generic/574
-generic/575
-generic/576
-generic/577
-generic/578
-generic/580
-generic/581
-generic/582
-generic/583
-generic/584
-generic/586
-generic/587
-generic/588
-generic/591
-generic/592
-generic/593
-generic/594
-generic/595
-generic/596
-generic/597
-generic/598
-generic/599
-generic/600
-generic/601
-generic/602
-generic/603
-generic/604
-generic/605
-generic/606
-generic/607
-generic/608
-generic/609
-generic/610
-generic/611
-generic/612
-generic/613
-generic/614
-generic/618
-generic/621
-generic/623
-generic/624
-generic/625
-generic/626
-generic/628
-generic/629
-generic/630
-generic/632
-generic/634
-generic/635
-generic/637
-generic/638
-generic/639
-generic/640
-generic/644
-generic/645
-generic/646
-generic/647
-generic/651
-generic/652
-generic/653
-generic/654
-generic/655
-generic/657
-generic/658
-generic/659
-generic/660
-generic/661
-generic/662
-generic/663
-generic/664
-generic/665
-generic/666
-generic/667
-generic/668
-generic/669
-generic/673
-generic/674
-generic/675
-generic/676
-generic/677
-generic/678
-generic/679
-generic/680
-generic/681
-generic/682
-generic/683
-generic/684
-generic/685
-generic/686
-generic/687
-generic/688
-generic/689
-shared/002
-shared/032
-Not
-run:
-generic/008
-generic/009
-generic/012
-generic/015
-generic/016
-generic/018
-generic/021
-generic/022
-generic/025
-generic/026
-generic/031
-generic/033
-generic/050
-generic/052
-generic/058
-generic/059
-generic/060
-generic/061
-generic/063
-generic/064
-generic/078
-generic/079
-generic/081
-generic/082
-generic/091
-generic/094
-generic/096
-generic/110
-generic/111
-generic/113
-generic/114
-generic/115
-generic/116
-generic/118
-generic/119
-generic/121
-generic/122
-generic/123
-generic/128
-generic/130
-generic/134
-generic/135
-generic/136
-generic/138
-generic/139
-generic/140
-generic/142
-generic/143
-generic/144
-generic/145
-generic/146
-generic/147
-generic/148
-generic/149
-generic/150
-generic/151
-generic/152
-generic/153
-generic/154
-generic/155
-generic/156
-generic/157
-generic/158
-generic/159
-generic/160
-generic/161
-generic/162
-generic/163
-generic/171
-generic/172
-generic/173
-generic/174
-generic/177
-generic/178
-generic/179
-generic/180
-generic/181
-generic/182
-generic/183
-generic/185
-generic/188
-generic/189
-generic/190
-generic/191
-generic/193
-generic/194
-generic/195
-generic/196
-generic/197
-generic/198
-generic/199
-generic/200
-generic/201
-generic/202
-generic/203
-generic/205
-generic/206
-generic/207
-generic/210
-generic/211
-generic/212
-generic/214
-generic/216
-generic/217
-generic/218
-generic/219
-generic/220
-generic/222
-generic/223
-generic/225
-generic/227
-generic/229
-generic/230
-generic/235
-generic/238
-generic/240
-generic/244
-generic/250
-generic/252
-generic/253
-generic/254
-generic/255
-generic/256
-generic/259
-generic/260
-generic/261
-generic/262
-generic/263
-generic/264
-generic/265
-generic/266
-generic/267
-generic/268
-generic/271
-generic/272
-generic/276
-generic/277
-generic/278
-generic/279
-generic/281
-generic/282
-generic/283
-generic/284
-generic/287
-generic/288
-generic/289
-generic/290
-generic/291
-generic/292
-generic/293
-generic/295
-generic/296
-generic/301
-generic/302
-generic/303
-generic/304
-generic/305
-generic/312
-generic/314
-generic/316
-generic/317
-generic/324
-generic/326
-generic/327
-generic/328
-generic/329
-generic/330
-generic/331
-generic/332
-generic/353
-generic/355
-generic/358
-generic/359
-generic/361
-generic/362
-generic/363
-generic/364
-generic/365
-generic/366
-generic/367
-generic/368
-generic/369
-generic/370
-generic/371
-generic/372
-generic/373
-generic/374
-generic/378
-generic/379
-generic/380
-generic/381
-generic/382
-generic/383
-generic/384
-generic/385
-generic/386
-generic/391
-generic/392
-generic/395
-generic/396
-generic/397
-generic/398
-generic/400
-generic/402
-generic/404
-generic/406
-generic/407
-generic/408
-generic/412
-generic/413
-generic/414
-generic/417
-generic/419
-generic/420
-generic/421
-generic/422
-generic/424
-generic/425
-generic/427
-generic/439
-generic/440
-generic/446
-generic/449
-generic/450
-generic/451
-generic/453
-generic/454
-generic/456
-generic/458
-generic/462
-generic/463
-generic/465
-generic/466
-generic/468
-generic/469
-generic/470
-generic/471
-generic/474
-generic/485
-generic/487
-generic/488
-generic/491
-generic/492
-generic/499
-generic/501
-generic/503
-generic/505
-generic/506
-generic/507
-generic/508
-generic/511
-generic/513
-generic/514
-generic/515
-generic/516
-generic/517
-generic/518
-generic/519
-generic/520
-generic/528
-generic/530
-generic/536
-generic/537
-generic/538
-generic/539
-generic/540
-generic/541
-generic/542
-generic/543
-generic/544
-generic/545
-generic/546
-generic/548
-generic/549
-generic/550
-generic/552
-generic/553
-generic/555
-generic/556
-generic/566
-generic/567
-generic/572
-generic/573
-generic/574
-generic/575
-generic/576
-generic/577
-generic/578
-generic/580
-generic/581
-generic/582
-generic/583
-generic/584
-generic/586
-generic/587
-generic/588
-generic/591
-generic/592
-generic/593
-generic/594
-generic/595
-generic/596
-generic/597
-generic/598
-generic/599
-generic/600
-generic/601
-generic/602
-generic/603
-generic/605
-generic/606
-generic/607
-generic/608
-generic/609
-generic/610
-generic/612
-generic/613
-generic/621
-generic/623
-generic/624
-generic/625
-generic/626
-generic/628
-generic/629
-generic/630
-generic/635
-generic/644
-generic/645
-generic/646
-generic/647
-generic/651
-generic/652
-generic/653
-generic/654
-generic/655
-generic/657
-generic/658
-generic/659
-generic/660
-generic/661
-generic/662
-generic/663
-generic/664
-generic/665
-generic/666
-generic/667
-generic/668
-generic/669
-generic/673
-generic/674
-generic/675
-generic/677
-generic/678
-generic/679
-generic/680
-generic/681
-generic/682
-generic/683
-generic/684
-generic/685
-generic/686
-generic/687
-generic/688
-generic/689
-shared/002
-shared/032
-Passed all 512 tests
--- a/tests/extra/xfstests/local.exclude
+++ b/tests/extra/xfstests/local.exclude
@@ -1,44 +0,0 @@
-generic/003	# missing atime update in buffered read
-generic/075	# file content mismatch failures (fds, etc)
-generic/103	# enospc causes trans commit failures
-generic/108	# mount fails on failing device?
-generic/112	# file content mismatch failures (fds, etc)
-generic/213	# enospc causes trans commit failures
-generic/318	# can't support user namespaces until v5.11
-generic/321	# requires selinux enabled for '+' in ls?
-generic/338	# BUG_ON update inode error handling
-generic/347	# _dmthin_mount doesn't work?
-generic/356	# swap
-generic/357	# swap
-generic/409	# bind mounts not scripted yet
-generic/410	# bind mounts not scripted yet
-generic/411	# bind mounts not scripted yet
-generic/423	# symlink inode size is strlen() + 1 on scoutfs
-generic/430	# xfs_io copy_range missing in el7
-generic/431	# xfs_io copy_range missing in el7
-generic/432	# xfs_io copy_range missing in el7
-generic/433	# xfs_io copy_range missing in el7
-generic/434	# xfs_io copy_range missing in el7
-generic/441	# dm-mapper
-generic/444	# el9's posix_acl_update_mode is buggy ?
-generic/467	# open_by_handle ESTALE
-generic/472	# swap
-generic/484	# dm-mapper
-generic/493	# swap
-generic/494	# swap
-generic/495	# swap
-generic/496	# swap
-generic/497	# swap
-generic/532	# xfs_io statx attrib_mask missing in el7
-generic/554	# swap
-generic/563	# cgroup+loopdev
-generic/564	# xfs_io copy_range missing in el7
-generic/565	# xfs_io copy_range missing in el7
-generic/568	# falloc not resulting in block count increase
-generic/569	# swap
-generic/570	# swap
-generic/620	# dm-hugedisk
-generic/633	# id-mapped mounts missing in el7
-generic/636	# swap
-generic/641	# swap
-generic/643	# swap
--- a/tests/fenced-local-force-unmount.sh
+++ b/tests/fenced-local-force-unmount.sh
@@ -8,33 +8,36 @@

 echo "$0 running rid '$SCOUTFS_FENCED_REQ_RID' ip '$SCOUTFS_FENCED_REQ_IP' args '$@'"

-echo_fail() {
-	echo "$@" >&2
+log() {
+	echo "$@" > /dev/stderr
 	exit 1
 }

-# silence error messages
-quiet_cat()
-{
-	cat "$@" 2>/dev/null
+echo_fail() {
+	echo "$@" > /dev/stderr
+	exit 1
 }

 rid="$SCOUTFS_FENCED_REQ_RID"

-shopt -s nullglob
 for fs in /sys/fs/scoutfs/*; do
-	fs_rid="$(quiet_cat $fs/rid)"
-	nr="$(quiet_cat $fs/data_device_maj_min)"
-	[ ! -d "$fs" -o "$fs_rid" != "$rid" ] && continue
+	[ ! -d "$fs" ] && continue

-	mnt=$(findmnt -l -n -t scoutfs -o TARGET -S $nr)
-	[ -z "$mnt" ] && continue
-
-	if ! umount -qf "$mnt"; then
-		if [ -d "$fs" ]; then
-			echo_fail "umount -qf $mnt failed"
-		fi
+	fs_rid="$(cat $fs/rid)" || \
+		echo_fail "failed to get rid in $fs"
+	if [ "$fs_rid" != "$rid" ]; then
+		continue
 	fi
+
+	nr="$(cat $fs/data_device_maj_min)" || \
+		echo_fail "failed to get data device major:minor in $fs"
+
+	mnts=$(findmnt -l -n -t scoutfs -o TARGET -S $nr) || \
+		echo_fail "findmnt -t scoutfs -S $nr failed"
+	for mnt in $mnts; do
+		umount -f "$mnt" || \
+			echo_fail "umount -f $mnt failed"
+	done
 done

 exit 0
--- a/tests/funcs/exec.sh
+++ b/tests/funcs/exec.sh
@@ -64,37 +64,19 @@ t_rc()
 }

 #
-# As run, stdout/err are redirected to a file that will be compared with
-# the stored expected golden output of the test.  This redirects
-# stdout/err in the script to stdout of the invoking run-test.  It's
-# intended to give visible output of tests without being included in the
-# golden output.
+# redirect test output back to the output of the invoking script instead
+# of the compared output.
 #
-# (see the goofy "exec" fd manipulation in the main run-tests as it runs
-# each test)
-#
-t_stdout_invoked()
+t_restore_output()
 {
 	exec >&6 2>&1
 }

 #
-# This undoes t_stdout_invokved, returning the test's stdout/err to the
-# output file as it was when it was launched.
+# redirect a command's output back to the compared output after the
+# test has restored its output
 #
-t_stdout_compare()
+t_compare_output()
 {
-	exec >&7 2>&1
-}
-
-#
-# usually bash prints an annoying output message when jobs
-# are killed.  We can avoid that by redirecting stderr for
-# the bash process when it reaps the jobs that are killed.
-#
-t_silent_kill() {
-	exec {ERR}>&2 2>/dev/null
-	kill "$@"
-	wait "$@"
-	exec 2>&$ERR {ERR}>&-
+	"$@" >&7 2>&1
 }
--- a/tests/funcs/filter.sh
+++ b/tests/funcs/filter.sh
@@ -20,6 +20,9 @@ t_filter_fs()
 # [ 2687.691366] BUG: KASAN: stack-out-of-bounds in get_reg+0x1bc/0x230
 # ...
 # [ 2687.706220] ==================================================================
+# [ 2687.707284] Disabling lock debugging due to kernel taint
+#
+# That final lock debugging message may not be included.
 #
 ignore_harmless_unwind_kasan_stack_oob()
 {
@@ -43,6 +46,10 @@ awk '
 		saved=""
        }
        ( in_soob == 2 && $0 ~ /==================================================================/ ) {
+                in_soob = 3
+                soob_nr = NR
+        }
+        ( in_soob == 3 && NR > soob_nr && $0 !~ /Disabling lock debugging/ ) {
                in_soob = 0
        }
        ( !in_soob ) { print $0 }
@@ -54,58 +61,6 @@ awk '
 '
 }

-#
-# in el97+, XFS can generate a spurious lockdep circular dependency
-# warning about reclaim. Fixed upstream in e.g. v5.7-rc4-129-g6dcde60efd94
-#
-ignore_harmless_xfs_lockdep_warning()
-{
-awk '
-	BEGIN {
-		in_block = 0
-		block_nr = 0
-		buf = ""
-	}
-	( !in_block && $0 ~ /======================================================/ ) {
-		in_block = 1
-		block_nr = NR
-		buf = $0 "\n"
-		next
-	}
-	( in_block == 1 && NR == (block_nr + 1) ) {
-		if (match($0, /WARNING: possible circular locking dependency detected/) != 0) {
-			in_block = 2
-			buf = buf $0 "\n"
-		} else {
-			in_block = 0
-			printf "%s", buf
-			print $0
-			buf = ""
-		}
-		next
-	}
-	( in_block == 2 ) {
-		buf = buf $0 "\n"
-		if ($0 ~ /<\/TASK>/) {
-			if (buf ~ /xfs_(nondir_|dir_)?ilock_class/ && buf ~ /fs_reclaim/) {
-				# known xfs lockdep false positive, discard
-			} else {
-				printf "%s", buf
-			}
-			in_block = 0
-			buf = ""
-		}
-		next
-	}
-	{ print $0 }
-	END {
-		if (buf) {
-			printf "%s", buf
-		}
-	}
-'
-}
-
 #
 # Filter out expected messages.  Putting messages here implies that
 # tests aren't relying on messages to discover failures.. they're
@@ -166,10 +121,6 @@ t_filter_dmesg()

 	# in debugging kernels we can slow things down a bit
 	re="$re|hrtimer: interrupt took .*"
-	re="$re|clocksource: Long readout interval"
-
-	# orphan log trees reclaim is handled, not an error
-	re="$re|scoutfs .* reclaiming orphan log trees"

 	# fencing tests force unmounts and trigger timeouts
 	re="$re|scoutfs .* forcing unmount"
@@ -189,9 +140,6 @@ t_filter_dmesg()
 	re="$re|scoutfs .* error.*server failed to bind to.*"
 	re="$re|scoutfs .* critical transaction commit failure.*"

-	# ENOLINK (-67) indicates an expected forced unmount error
-	re="$re|scoutfs .* error -67 .*"
-
 	# change-devices causes loop device resizing
 	re="$re|loop: module loaded"
 	re="$re|loop[0-9].* detected capacity change from.*"
@@ -212,19 +160,6 @@ t_filter_dmesg()
 	re="$re|Pipe handler or fully qualified core dump path required.*"
 	re="$re|Set kernel.core_pattern before fs.suid_dumpable.*"

-	# perf warning that it adjusted sample rate
-	re="$re|perf: interrupt took too long.*lowering kernel.perf_event_max_sample_rate.*"
-
-	# some ci test guests are unresponsive
-	re="$re|longest quorum heartbeat .* delay"
-
-	# creating block devices may trigger this
-	re="$re|block device autoloading is deprecated and will be removed."
-
-	# lockdep or kasan warnings can cause this
-	re="$re|Disabling lock debugging due to kernel taint"
-
 	egrep -v "($re)" | \
-		ignore_harmless_unwind_kasan_stack_oob | \
-		ignore_harmless_xfs_lockdep_warning
+		ignore_harmless_unwind_kasan_stack_oob
 }
--- a/tests/funcs/fs.sh
+++ b/tests/funcs/fs.sh
@@ -283,30 +283,6 @@ t_reinsert_remount_all()
 	t_quiet t_mount_all || t_fail "mounting all failed"
 }

-#
-# scratch helpers
-#
-t_scratch_mkfs()
-{
-	scoutfs mkfs -f -Q 0,127.0.0.1,$T_SCRATCH_PORT "$T_EX_META_DEV" "$T_EX_DATA_DEV" "$@" > $T_TMP.mkfs.out 2>&1 || \
-		t_fail "scratch mkfs failed"
-}
-
-t_scratch_mount()
-{
-	mkdir -p "$T_MSCR"
-	mount -t scoutfs -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 "$@" "$T_EX_DATA_DEV" "$T_MSCR" || \
-		t_fail "scratch mount failed"
-}
-
-t_scratch_umount()
-{
-	umount "$T_MSCR" || \
-		t_fail "scratch umount failed"
-	rmdir "$T_MSCR"
-}
-
-
 t_trigger_path() {
 	local nr="$1"

@@ -522,121 +498,3 @@ t_restore_all_sysfs_mount_options() {
 		t_set_sysfs_mount_option $i $name "${_saved_opts[$ind]}"
 	done
 }
-
-t_force_log_merge() {
-	local sv=$(t_server_nr)
-	local merges_started
-	local last_merges_started
-	local merges_completed
-	local last_merges_completed
-
-	while true; do
-		last_merges_started=$(t_counter log_merge_start $sv)
-		last_merges_completed=$(t_counter log_merge_complete $sv)
-
-		t_trigger_arm_silent log_merge_force_finalize_ours $sv
-
-		t_sync_seq_index
-
-		while test "$(t_trigger_get log_merge_force_finalize_ours $sv)" == "1"; do
-			sleep .5
-		done
-
-		merges_started=$(t_counter log_merge_start $sv)
-
-		if (( merges_started > last_merges_started )); then
-			merges_completed=$(t_counter log_merge_complete $sv)
-
-			while (( merges_completed == last_merges_completed )); do
-				sleep .5
-				merges_completed=$(t_counter log_merge_complete $sv)
-			done
-			break
-		fi
-	done
-}
-
-declare -A _last_scan
-t_get_orphan_scan_runs() {
-	local i
-
-	for i in $(t_fs_nrs); do
-		_last_scan[$i]=$(t_counter orphan_scan $i)
-	done
-}
-
-t_wait_for_orphan_scan_runs() {
-	local i
-	local scan
-
-	t_get_orphan_scan_runs
-
-	for i in $(t_fs_nrs); do
-		while true; do
-			scan=$(t_counter orphan_scan $i)
-			if (( scan != _last_scan[$i] )); then
-				break
-			fi
-			sleep .5
-		done
-	done
-}
-
-declare -A _last_empty
-t_get_orphan_scan_empty() {
-	local i
-
-	for i in $(t_fs_nrs); do
-		_last_empty[$i]=$(t_counter orphan_scan_empty $i)
-	done
-}
-
-t_wait_for_no_orphans() {
-	local i;
-	local working;
-	local empty;
-
-	t_get_orphan_scan_empty
-
-	while true; do
-		working=0
-
-		t_wait_for_orphan_scan_runs
-
-		for i in $(t_fs_nrs); do
-			empty=$(t_counter orphan_scan_empty $i)
-			if (( empty == _last_empty[$i] )); then
-				(( working++ ))
-			else
-				(( _last_empty[$i] = empty ))
-			fi
-		done
-
-		if (( working == 0 )); then
-			break
-		fi
-
-		sleep 1
-	done
-}
-
-#
-# Repeatedly run the arguments as a command, sleeping in between, until
-# it returns success.  The first argument is a relative timeout in
-# seconds.  The remaining arguments are the command and its arguments.
-#
-# If the timeout expires without the command returning 0 then the test
-# fails.
-#
-t_wait_until_timeout() {
-	local relative="$1"
-	local expire="$((SECONDS + relative))"
-	shift
-
-	while (( SECONDS < expire )); do
-		"$@" && return
-		sleep 1
-	done
-
-	t_fail "command failed for $relative sec: $@"
-}
--- a/tests/funcs/tap.sh
+++ b/tests/funcs/tap.sh
@@ -43,14 +43,9 @@ t_tap_progress()
 	local testname=$1
 	local result=$2

-	local stmsg=""
 	local diff=""
 	local dmsg=""

-	if [[ -s $T_RESULTS/tmp/${testname}/status.msg ]]; then
-		stmsg="1"
-	fi
-
 	if [[ -s "$T_RESULTS/tmp/${testname}/dmesg.new" ]]; then
 		dmsg="1"
 	fi
@@ -66,7 +61,6 @@ t_tap_progress()
 		echo "# ${testname} ** skipped - permitted **"
 	else
 		echo "not ok ${i} - ${testname}"
-
 		case ${result} in
 		101)
 			echo "# ${testname} ** skipped **"
@@ -76,13 +70,6 @@ t_tap_progress()
 			;;
 		esac

-		if [[ -n "${stmsg}" ]]; then
-			echo "#"
-			echo "# status:"
-			echo "#"
-			cat $T_RESULTS/tmp/${testname}/status.msg | sed 's/^/# - /'
-		fi
-
 		if [[ -n "${diff}" ]]; then
 			echo "#"
 			echo "# diff:"
--- a/tests/golden/basic-acl-consistency
+++ b/tests/golden/basic-acl-consistency
@@ -1,6 +0,0 @@
-== make scratch fs
-== create uid/gids
-== set acls and permissions
-== compare output
-== drop caches and compare again
-== cleanup scratch fs
--- a/tests/golden/basic-xattr-indx
+++ b/tests/golden/basic-xattr-indx
@@ -1,54 +0,0 @@
-== testing invalid read-xattr-index arguments
-bad index position entry argument 'bad', it must be in the form "a.b.ino" where each value can be prefixed by '0' for octal or '0x' for hex
-scoutfs: read-xattr-index failed: Invalid argument (22)
-bad index position entry argument '1.2', it must be in the form "a.b.ino" where each value can be prefixed by '0' for octal or '0x' for hex
-scoutfs: read-xattr-index failed: Invalid argument (22)
-initial major index position '256' must be between 0 and 255, inclusive.
-scoutfs: read-xattr-index failed: Invalid argument (22)
-first index position 1.2.3 must be less than last index position 0.0.0
-scoutfs: read-xattr-index failed: Invalid argument (22)
-first index position 1.2.0 must be less than last index position 1.1.2
-scoutfs: read-xattr-index failed: Invalid argument (22)
-first index position 2.2.2 must be less than last index position 2.2.1
-scoutfs: read-xattr-index failed: Invalid argument (22)
-== testing invalid names
-setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
-setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
-setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
-setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
-setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
-setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
-setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
-setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
-setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
-setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
-setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
-setfattr: /mnt/test/test/basic-xattr-indx/invalid: Numerical result out of range
-== testing boundary values
-0.0 found
-255.max found
-== indx xattr must have no value
-setfattr: /mnt/test/test/basic-xattr-indx/noval: Invalid argument
-setfattr: /mnt/test/test/basic-xattr-indx/noval: Invalid argument
-== set indx xattr and verify index entry
-found
-== setting same indx xattr again is a no-op
-found
-== removing non-existent indx xattr succeeds
-setfattr: /mnt/test/test/basic-xattr-indx/file: No such attribute
-still found
-== explicit xattr removal cleans up index entry
-== file deletion cleans up index entry
-found before delete
-== multiple indx xattrs on one file cleaned up by deletion
-entries before delete: 2
-entries after delete: 0
-== partial removal leaves other entries
-300 found
-== multiple files at same index position
-files at same position: 2
-surviving file found
-== cross-mount visibility
-found on mount 1
-== duplicate position deduplication
-entries for same position: 1
--- a/tests/golden/inode-deletion
+++ b/tests/golden/inode-deletion
@@ -17,7 +17,7 @@ ino not found in dseq index
 mount 0 contents after mount 1 rm: contents
 ino found in dseq index
 ino found in dseq index
-stat: cannot stat '/mnt/test/test/inode-deletion/badfile': No such file or directory
+stat: cannot stat '/mnt/test/test/inode-deletion/file': No such file or directory
 ino not found in dseq index
 ino not found in dseq index
 == lots of deletions use one open map
--- a/tests/golden/large-fragmented-free
+++ b/tests/golden/large-fragmented-free
@@ -1,3 +1,4 @@
+== setting longer hung task timeout
 == creating fragmented extents
 == unlink file with moved extents to free extents per block
 == cleanup
--- a/tests/golden/lock-rever-invalidate
+++ b/tests/golden/lock-rever-invalidate
--- a/tests/golden/offline-extent-waiting
+++ b/tests/golden/offline-extent-waiting
@@ -49,7 +49,7 @@ offline wating should be empty:
 0
 == truncating does wait
 truncate should be waiting for first block:
-truncate should no longer be waiting:
+trunate should no longer be waiting:
 0
 == writing waits
 should be waiting for write
--- a/tests/golden/orphan-log-trees
+++ b/tests/golden/orphan-log-trees
@@ -1,3 +0,0 @@
-== create orphan log_trees entry via trigger
-== verify orphan is reclaimed and merge completes
-== verify orphan reclaim was logged
--- a/tests/golden/punch-offline
+++ b/tests/golden/punch-offline
@@ -1,460 +0,0 @@
-== missing options should fail ==
-punch-offline: must provide offset
-Try `punch-offline --help' or `punch-offline --usage' for more information.
-punch-offline: must provide length
-Try `punch-offline --help' or `punch-offline --usage' for more information.
-punch-offline: must provide data_version
-Try `punch-offline --help' or `punch-offline --usage' for more information.
-== can't hole punch dir or special ==
-failed to open '/mnt/test.0/test/punch-offline/dir': Is a directory (21)
-scoutfs: punch-offline failed: Is a directory (21)
-== punching an empty file does nothing ==
-== punch outside of i_size does nothing ==
-== can't hole punch online extent ==
-0: offset: 0 length: 4096 flags: ..L
-extents: 1
-punch_offline ioctl failed: Invalid argument (22)
-scoutfs: punch-offline failed: Invalid argument (22)
-0: offset: 0 length: 4096 flags: ..L
-extents: 1
-== can't hole punch unwritten extent ==
-0: offset: 0 length: 12288 flags: .UL
-extents: 1
-punch_offline ioctl failed: Invalid argument (22)
-scoutfs: punch-offline failed: Invalid argument (22)
-0: offset: 0 length: 12288 flags: .UL
-extents: 1
-== hole punch offline extent ==
-0: offset: 0 length: 12288 flags: O.L
-extents: 1
-0: offset: 0 length: 4096 flags: O..
-1: offset: 8192 length: 4096 flags: O.L
-extents: 2
-== can't hole punch non-aligned bsz offset or len ==
-0: offset: 0 length: 12288 flags: O.L
-extents: 1
-punch_offline ioctl failed: Value too large for defined data type (75)
-scoutfs: punch-offline failed: Value too large for defined data type (75)
-punch_offline ioctl failed: Value too large for defined data type (75)
-scoutfs: punch-offline failed: Value too large for defined data type (75)
-punch_offline ioctl failed: Value too large for defined data type (75)
-scoutfs: punch-offline failed: Value too large for defined data type (75)
-punch_offline ioctl failed: Value too large for defined data type (75)
-scoutfs: punch-offline failed: Value too large for defined data type (75)
-punch_offline ioctl failed: Value too large for defined data type (75)
-scoutfs: punch-offline failed: Value too large for defined data type (75)
-punch_offline ioctl failed: Value too large for defined data type (75)
-scoutfs: punch-offline failed: Value too large for defined data type (75)
-0: offset: 0 length: 12288 flags: O.L
-extents: 1
-== can't hole punch mismatched data_version ==
-0: offset: 0 length: 12288 flags: O.L
-extents: 1
-punch_offline ioctl failed: Stale file handle (116)
-scoutfs: punch-offline failed: Stale file handle (116)
-punch_offline ioctl failed: Stale file handle (116)
-scoutfs: punch-offline failed: Stale file handle (116)
-punch_offline ioctl failed: Stale file handle (116)
-scoutfs: punch-offline failed: Stale file handle (116)
-0: offset: 0 length: 12288 flags: O.L
-extents: 1
-== Punch hole crossing multiple extents ==
-0: offset: 0 length: 7 flags: O.L
-extents: 1
-0: offset: 0 length: 1 flags: O..
-1: offset: 2 length: 1 flags: O..
-2: offset: 4 length: 1 flags: O..
-3: offset: 6 length: 1 flags: O.L
-extents: 4
-0: offset: 0 length: 1 flags: O..
-1: offset: 6 length: 1 flags: O.L
-extents: 2
-== punch hole starting at a hole ==
-0: offset: 0 length: 7 flags: O.L
-extents: 1
-0: offset: 0 length: 1 flags: O..
-1: offset: 2 length: 1 flags: O..
-2: offset: 4 length: 1 flags: O..
-3: offset: 6 length: 1 flags: O.L
-extents: 4
-0: offset: 0 length: 1 flags: O..
-1: offset: 6 length: 1 flags: O.L
-extents: 2
-== large punch ==
-0: offset: 0 length: 1572864 flags: O.L
-extents: 1
-0: offset: 0 length: 134123 flags: O..
-1: offset: 202466 length: 264807 flags: O..
-2: offset: 535616 length: 199007 flags: O..
-3: offset: 802966 length: 769898 flags: O.L
-extents: 4
-== overlapping punches with lots of extents ==
-0: offset: 0 length: 4194304 flags: O.L
-extents: 1
-extents: 512
-extents: 505
-extents: 378
-extents: 252
-0: offset: 0 length: 4096 flags: O..
-1: offset: 8192 length: 4096 flags: O..
-2: offset: 32768 length: 4096 flags: O..
-3: offset: 40960 length: 4096 flags: O..
-4: offset: 65536 length: 4096 flags: O..
-5: offset: 73728 length: 4096 flags: O..
-6: offset: 98304 length: 4096 flags: O..
-7: offset: 106496 length: 4096 flags: O..
-8: offset: 196608 length: 4096 flags: O..
-9: offset: 204800 length: 4096 flags: O..
-10: offset: 229376 length: 4096 flags: O..
-11: offset: 237568 length: 4096 flags: O..
-12: offset: 262144 length: 4096 flags: O..
-13: offset: 270336 length: 4096 flags: O..
-14: offset: 294912 length: 4096 flags: O..
-15: offset: 303104 length: 4096 flags: O..
-16: offset: 327680 length: 4096 flags: O..
-17: offset: 335872 length: 4096 flags: O..
-18: offset: 360448 length: 4096 flags: O..
-19: offset: 368640 length: 4096 flags: O..
-20: offset: 393216 length: 4096 flags: O..
-21: offset: 401408 length: 4096 flags: O..
-22: offset: 425984 length: 4096 flags: O..
-23: offset: 434176 length: 4096 flags: O..
-24: offset: 458752 length: 4096 flags: O..
-25: offset: 466944 length: 4096 flags: O..
-26: offset: 491520 length: 4096 flags: O..
-27: offset: 499712 length: 4096 flags: O..
-28: offset: 720896 length: 4096 flags: O..
-29: offset: 729088 length: 4096 flags: O..
-30: offset: 753664 length: 4096 flags: O..
-31: offset: 761856 length: 4096 flags: O..
-32: offset: 786432 length: 4096 flags: O..
-33: offset: 794624 length: 4096 flags: O..
-34: offset: 819200 length: 4096 flags: O..
-35: offset: 827392 length: 4096 flags: O..
-36: offset: 851968 length: 4096 flags: O..
-37: offset: 860160 length: 4096 flags: O..
-38: offset: 884736 length: 4096 flags: O..
-39: offset: 892928 length: 4096 flags: O..
-40: offset: 917504 length: 4096 flags: O..
-41: offset: 925696 length: 4096 flags: O..
-42: offset: 950272 length: 4096 flags: O..
-43: offset: 958464 length: 4096 flags: O..
-44: offset: 983040 length: 4096 flags: O..
-45: offset: 991232 length: 4096 flags: O..
-46: offset: 1015808 length: 4096 flags: O..
-47: offset: 1024000 length: 4096 flags: O..
-48: offset: 1048576 length: 4096 flags: O..
-49: offset: 1056768 length: 4096 flags: O..
-50: offset: 1081344 length: 4096 flags: O..
-51: offset: 1089536 length: 4096 flags: O..
-52: offset: 1114112 length: 4096 flags: O..
-53: offset: 1122304 length: 4096 flags: O..
-54: offset: 1146880 length: 4096 flags: O..
-55: offset: 1155072 length: 4096 flags: O..
-56: offset: 1179648 length: 4096 flags: O..
-57: offset: 1187840 length: 4096 flags: O..
-58: offset: 1212416 length: 4096 flags: O..
-59: offset: 1220608 length: 4096 flags: O..
-60: offset: 1245184 length: 4096 flags: O..
-61: offset: 1253376 length: 4096 flags: O..
-62: offset: 1277952 length: 4096 flags: O..
-63: offset: 1286144 length: 4096 flags: O..
-64: offset: 1310720 length: 4096 flags: O..
-65: offset: 1318912 length: 4096 flags: O..
-66: offset: 1343488 length: 4096 flags: O..
-67: offset: 1351680 length: 4096 flags: O..
-68: offset: 1376256 length: 4096 flags: O..
-69: offset: 1384448 length: 4096 flags: O..
-70: offset: 1409024 length: 4096 flags: O..
-71: offset: 1417216 length: 4096 flags: O..
-72: offset: 1441792 length: 4096 flags: O..
-73: offset: 1449984 length: 4096 flags: O..
-74: offset: 1474560 length: 4096 flags: O..
-75: offset: 1482752 length: 4096 flags: O..
-76: offset: 1507328 length: 4096 flags: O..
-77: offset: 1515520 length: 4096 flags: O..
-78: offset: 1540096 length: 4096 flags: O..
-79: offset: 1548288 length: 4096 flags: O..
-80: offset: 1572864 length: 4096 flags: O..
-81: offset: 1581056 length: 4096 flags: O..
-82: offset: 1605632 length: 4096 flags: O..
-83: offset: 1613824 length: 4096 flags: O..
-84: offset: 1638400 length: 4096 flags: O..
-85: offset: 1646592 length: 4096 flags: O..
-86: offset: 1671168 length: 4096 flags: O..
-87: offset: 1679360 length: 4096 flags: O..
-88: offset: 1703936 length: 4096 flags: O..
-89: offset: 1712128 length: 4096 flags: O..
-90: offset: 1736704 length: 4096 flags: O..
-91: offset: 1744896 length: 4096 flags: O..
-92: offset: 1769472 length: 4096 flags: O..
-93: offset: 1777664 length: 4096 flags: O..
-94: offset: 1802240 length: 4096 flags: O..
-95: offset: 1810432 length: 4096 flags: O..
-96: offset: 1835008 length: 4096 flags: O..
-97: offset: 1843200 length: 4096 flags: O..
-98: offset: 1867776 length: 4096 flags: O..
-99: offset: 1875968 length: 4096 flags: O..
-100: offset: 1900544 length: 4096 flags: O..
-101: offset: 1908736 length: 4096 flags: O..
-102: offset: 1933312 length: 4096 flags: O..
-103: offset: 1941504 length: 4096 flags: O..
-104: offset: 1966080 length: 4096 flags: O..
-105: offset: 1974272 length: 4096 flags: O..
-106: offset: 1998848 length: 4096 flags: O..
-107: offset: 2007040 length: 4096 flags: O..
-108: offset: 2031616 length: 4096 flags: O..
-109: offset: 2039808 length: 4096 flags: O..
-110: offset: 2064384 length: 4096 flags: O..
-111: offset: 2072576 length: 4096 flags: O..
-112: offset: 2097152 length: 4096 flags: O..
-113: offset: 2105344 length: 4096 flags: O..
-114: offset: 2129920 length: 4096 flags: O..
-115: offset: 2138112 length: 4096 flags: O..
-116: offset: 2162688 length: 4096 flags: O..
-117: offset: 2170880 length: 4096 flags: O..
-118: offset: 2195456 length: 4096 flags: O..
-119: offset: 2203648 length: 4096 flags: O..
-120: offset: 2228224 length: 4096 flags: O..
-121: offset: 2236416 length: 4096 flags: O..
-122: offset: 2260992 length: 4096 flags: O..
-123: offset: 2269184 length: 4096 flags: O..
-124: offset: 2293760 length: 4096 flags: O..
-125: offset: 2301952 length: 4096 flags: O..
-126: offset: 2326528 length: 4096 flags: O..
-127: offset: 2334720 length: 4096 flags: O..
-128: offset: 2359296 length: 4096 flags: O..
-129: offset: 2367488 length: 4096 flags: O..
-130: offset: 2392064 length: 4096 flags: O..
-131: offset: 2400256 length: 4096 flags: O..
-132: offset: 2424832 length: 4096 flags: O..
-133: offset: 2433024 length: 4096 flags: O..
-134: offset: 2457600 length: 4096 flags: O..
-135: offset: 2465792 length: 4096 flags: O..
-136: offset: 2490368 length: 4096 flags: O..
-137: offset: 2498560 length: 4096 flags: O..
-138: offset: 2523136 length: 4096 flags: O..
-139: offset: 2531328 length: 4096 flags: O..
-140: offset: 2555904 length: 4096 flags: O..
-141: offset: 2564096 length: 4096 flags: O..
-142: offset: 2588672 length: 4096 flags: O..
-143: offset: 2596864 length: 4096 flags: O..
-144: offset: 2621440 length: 4096 flags: O..
-145: offset: 2629632 length: 4096 flags: O..
-146: offset: 2654208 length: 4096 flags: O..
-147: offset: 2662400 length: 4096 flags: O..
-148: offset: 2686976 length: 4096 flags: O..
-149: offset: 2695168 length: 4096 flags: O..
-150: offset: 2719744 length: 4096 flags: O..
-151: offset: 2727936 length: 4096 flags: O..
-152: offset: 2752512 length: 4096 flags: O..
-153: offset: 2760704 length: 4096 flags: O..
-154: offset: 2785280 length: 4096 flags: O..
-155: offset: 2793472 length: 4096 flags: O..
-156: offset: 2818048 length: 4096 flags: O..
-157: offset: 2826240 length: 4096 flags: O..
-158: offset: 2850816 length: 4096 flags: O..
-159: offset: 2859008 length: 4096 flags: O..
-160: offset: 2883584 length: 4096 flags: O..
-161: offset: 2891776 length: 4096 flags: O..
-162: offset: 2916352 length: 4096 flags: O..
-163: offset: 2924544 length: 4096 flags: O..
-164: offset: 2949120 length: 4096 flags: O..
-165: offset: 2957312 length: 4096 flags: O..
-166: offset: 2981888 length: 4096 flags: O..
-167: offset: 2990080 length: 4096 flags: O..
-168: offset: 3014656 length: 4096 flags: O..
-169: offset: 3022848 length: 4096 flags: O..
-170: offset: 3047424 length: 4096 flags: O..
-171: offset: 3055616 length: 4096 flags: O..
-172: offset: 3080192 length: 4096 flags: O..
-173: offset: 3088384 length: 4096 flags: O..
-174: offset: 3112960 length: 4096 flags: O..
-175: offset: 3121152 length: 4096 flags: O..
-176: offset: 3145728 length: 4096 flags: O..
-177: offset: 3153920 length: 4096 flags: O..
-178: offset: 3178496 length: 4096 flags: O..
-179: offset: 3186688 length: 4096 flags: O..
-180: offset: 3211264 length: 4096 flags: O..
-181: offset: 3219456 length: 4096 flags: O..
-182: offset: 3244032 length: 4096 flags: O..
-183: offset: 3252224 length: 4096 flags: O..
-184: offset: 3276800 length: 4096 flags: O..
-185: offset: 3284992 length: 4096 flags: O..
-186: offset: 3309568 length: 4096 flags: O..
-187: offset: 3317760 length: 4096 flags: O..
-188: offset: 3342336 length: 4096 flags: O..
-189: offset: 3350528 length: 4096 flags: O..
-190: offset: 3375104 length: 4096 flags: O..
-191: offset: 3383296 length: 4096 flags: O..
-192: offset: 3407872 length: 4096 flags: O..
-193: offset: 3416064 length: 4096 flags: O..
-194: offset: 3440640 length: 4096 flags: O..
-195: offset: 3448832 length: 4096 flags: O..
-196: offset: 3473408 length: 4096 flags: O..
-197: offset: 3481600 length: 4096 flags: O..
-198: offset: 3506176 length: 4096 flags: O..
-199: offset: 3514368 length: 4096 flags: O..
-200: offset: 3538944 length: 4096 flags: O..
-201: offset: 3547136 length: 4096 flags: O..
-202: offset: 3571712 length: 4096 flags: O..
-203: offset: 3579904 length: 4096 flags: O..
-204: offset: 3604480 length: 4096 flags: O..
-205: offset: 3612672 length: 4096 flags: O..
-206: offset: 3637248 length: 4096 flags: O..
-207: offset: 3645440 length: 4096 flags: O..
-208: offset: 3670016 length: 4096 flags: O..
-209: offset: 3678208 length: 4096 flags: O..
-210: offset: 3702784 length: 4096 flags: O..
-211: offset: 3710976 length: 4096 flags: O..
-212: offset: 3735552 length: 4096 flags: O..
-213: offset: 3743744 length: 4096 flags: O..
-214: offset: 3768320 length: 4096 flags: O..
-215: offset: 3776512 length: 4096 flags: O..
-216: offset: 3801088 length: 4096 flags: O..
-217: offset: 3809280 length: 4096 flags: O..
-218: offset: 3833856 length: 4096 flags: O..
-219: offset: 3842048 length: 4096 flags: O..
-220: offset: 3866624 length: 4096 flags: O..
-221: offset: 3874816 length: 4096 flags: O..
-222: offset: 3899392 length: 4096 flags: O..
-223: offset: 3907584 length: 4096 flags: O..
-224: offset: 3932160 length: 4096 flags: O..
-225: offset: 3940352 length: 4096 flags: O..
-226: offset: 3964928 length: 4096 flags: O..
-227: offset: 3973120 length: 4096 flags: O..
-228: offset: 3997696 length: 4096 flags: O..
-229: offset: 4005888 length: 4096 flags: O..
-230: offset: 4030464 length: 4096 flags: O..
-231: offset: 4038656 length: 4096 flags: O..
-232: offset: 4063232 length: 4096 flags: O..
-233: offset: 4071424 length: 4096 flags: O..
-234: offset: 4096000 length: 4096 flags: O..
-235: offset: 4104192 length: 4096 flags: O..
-236: offset: 4128768 length: 4096 flags: O..
-237: offset: 4136960 length: 4096 flags: O..
-238: offset: 4161536 length: 4096 flags: O..
-239: offset: 4169728 length: 4096 flags: O.L
-extents: 240
-0: offset: 0 length: 1 flags: O..
-1: offset: 8 length: 1 flags: O..
-2: offset: 16 length: 1 flags: O..
-3: offset: 24 length: 1 flags: O..
-4: offset: 48 length: 1 flags: O..
-5: offset: 56 length: 1 flags: O..
-6: offset: 64 length: 1 flags: O..
-7: offset: 72 length: 1 flags: O..
-8: offset: 80 length: 1 flags: O..
-9: offset: 88 length: 1 flags: O..
-10: offset: 96 length: 1 flags: O..
-11: offset: 104 length: 1 flags: O..
-12: offset: 112 length: 1 flags: O..
-13: offset: 120 length: 1 flags: O..
-14: offset: 176 length: 1 flags: O..
-15: offset: 184 length: 1 flags: O..
-16: offset: 192 length: 1 flags: O..
-17: offset: 200 length: 1 flags: O..
-18: offset: 208 length: 1 flags: O..
-19: offset: 216 length: 1 flags: O..
-20: offset: 224 length: 1 flags: O..
-21: offset: 232 length: 1 flags: O..
-22: offset: 240 length: 1 flags: O..
-23: offset: 248 length: 1 flags: O..
-24: offset: 256 length: 1 flags: O..
-25: offset: 264 length: 1 flags: O..
-26: offset: 272 length: 1 flags: O..
-27: offset: 280 length: 1 flags: O..
-28: offset: 288 length: 1 flags: O..
-29: offset: 296 length: 1 flags: O..
-30: offset: 304 length: 1 flags: O..
-31: offset: 312 length: 1 flags: O..
-32: offset: 320 length: 1 flags: O..
-33: offset: 328 length: 1 flags: O..
-34: offset: 336 length: 1 flags: O..
-35: offset: 344 length: 1 flags: O..
-36: offset: 352 length: 1 flags: O..
-37: offset: 360 length: 1 flags: O..
-38: offset: 368 length: 1 flags: O..
-39: offset: 376 length: 1 flags: O..
-40: offset: 384 length: 1 flags: O..
-41: offset: 392 length: 1 flags: O..
-42: offset: 400 length: 1 flags: O..
-43: offset: 408 length: 1 flags: O..
-44: offset: 416 length: 1 flags: O..
-45: offset: 424 length: 1 flags: O..
-46: offset: 432 length: 1 flags: O..
-47: offset: 440 length: 1 flags: O..
-48: offset: 448 length: 1 flags: O..
-49: offset: 456 length: 1 flags: O..
-50: offset: 464 length: 1 flags: O..
-51: offset: 472 length: 1 flags: O..
-52: offset: 480 length: 1 flags: O..
-53: offset: 488 length: 1 flags: O..
-54: offset: 496 length: 1 flags: O..
-55: offset: 504 length: 1 flags: O..
-56: offset: 512 length: 1 flags: O..
-57: offset: 520 length: 1 flags: O..
-58: offset: 528 length: 1 flags: O..
-59: offset: 536 length: 1 flags: O..
-60: offset: 544 length: 1 flags: O..
-61: offset: 552 length: 1 flags: O..
-62: offset: 560 length: 1 flags: O..
-63: offset: 568 length: 1 flags: O..
-64: offset: 576 length: 1 flags: O..
-65: offset: 584 length: 1 flags: O..
-66: offset: 592 length: 1 flags: O..
-67: offset: 600 length: 1 flags: O..
-68: offset: 608 length: 1 flags: O..
-69: offset: 616 length: 1 flags: O..
-70: offset: 624 length: 1 flags: O..
-71: offset: 632 length: 1 flags: O..
-72: offset: 640 length: 1 flags: O..
-73: offset: 648 length: 1 flags: O..
-74: offset: 656 length: 1 flags: O..
-75: offset: 664 length: 1 flags: O..
-76: offset: 672 length: 1 flags: O..
-77: offset: 680 length: 1 flags: O..
-78: offset: 688 length: 1 flags: O..
-79: offset: 696 length: 1 flags: O..
-80: offset: 704 length: 1 flags: O..
-81: offset: 712 length: 1 flags: O..
-82: offset: 720 length: 1 flags: O..
-83: offset: 728 length: 1 flags: O..
-84: offset: 736 length: 1 flags: O..
-85: offset: 744 length: 1 flags: O..
-86: offset: 752 length: 1 flags: O..
-87: offset: 760 length: 1 flags: O..
-88: offset: 768 length: 1 flags: O..
-89: offset: 776 length: 1 flags: O..
-90: offset: 784 length: 1 flags: O..
-91: offset: 792 length: 1 flags: O..
-92: offset: 800 length: 1 flags: O..
-93: offset: 808 length: 1 flags: O..
-94: offset: 816 length: 1 flags: O..
-95: offset: 824 length: 1 flags: O..
-96: offset: 832 length: 1 flags: O..
-97: offset: 840 length: 1 flags: O..
-98: offset: 848 length: 1 flags: O..
-99: offset: 856 length: 1 flags: O..
-100: offset: 864 length: 1 flags: O..
-101: offset: 872 length: 1 flags: O..
-102: offset: 880 length: 1 flags: O..
-103: offset: 888 length: 1 flags: O..
-104: offset: 896 length: 1 flags: O..
-105: offset: 904 length: 1 flags: O..
-106: offset: 912 length: 1 flags: O..
-107: offset: 920 length: 1 flags: O..
-108: offset: 928 length: 1 flags: O..
-109: offset: 936 length: 1 flags: O..
-110: offset: 944 length: 1 flags: O..
-111: offset: 952 length: 1 flags: O..
-112: offset: 960 length: 1 flags: O..
-113: offset: 968 length: 1 flags: O..
-114: offset: 976 length: 1 flags: O..
-115: offset: 984 length: 1 flags: O..
-116: offset: 992 length: 1 flags: O..
-117: offset: 1000 length: 1 flags: O..
-118: offset: 1008 length: 1 flags: O..
-119: offset: 1016 length: 1 flags: O.L
-extents: 120
-extents: 0
--- a/tests/golden/totl-merge-read
+++ b/tests/golden/totl-merge-read
@@ -1,3 +0,0 @@
-== setup
-expected 4681
-== cleanup
--- a/tests/golden/xfstests
+++ b/tests/golden/xfstests
@@ -0,0 +1,882 @@
+Ran:
+generic/001
+generic/002
+generic/004
+generic/005
+generic/006
+generic/007
+generic/008
+generic/009
+generic/011
+generic/012
+generic/013
+generic/014
+generic/015
+generic/016
+generic/018
+generic/020
+generic/021
+generic/022
+generic/023
+generic/024
+generic/025
+generic/026
+generic/028
+generic/029
+generic/030
+generic/031
+generic/032
+generic/033
+generic/034
+generic/035
+generic/037
+generic/039
+generic/040
+generic/041
+generic/050
+generic/052
+generic/053
+generic/056
+generic/057
+generic/058
+generic/059
+generic/060
+generic/061
+generic/062
+generic/063
+generic/064
+generic/065
+generic/066
+generic/067
+generic/069
+generic/070
+generic/071
+generic/073
+generic/076
+generic/078
+generic/079
+generic/080
+generic/081
+generic/082
+generic/084
+generic/086
+generic/087
+generic/088
+generic/090
+generic/091
+generic/092
+generic/094
+generic/096
+generic/097
+generic/098
+generic/099
+generic/101
+generic/104
+generic/105
+generic/106
+generic/107
+generic/110
+generic/111
+generic/113
+generic/114
+generic/115
+generic/116
+generic/117
+generic/118
+generic/119
+generic/120
+generic/121
+generic/122
+generic/123
+generic/124
+generic/126
+generic/128
+generic/129
+generic/130
+generic/131
+generic/134
+generic/135
+generic/136
+generic/138
+generic/139
+generic/140
+generic/141
+generic/142
+generic/143
+generic/144
+generic/145
+generic/146
+generic/147
+generic/148
+generic/149
+generic/150
+generic/151
+generic/152
+generic/153
+generic/154
+generic/155
+generic/156
+generic/157
+generic/158
+generic/159
+generic/160
+generic/161
+generic/162
+generic/163
+generic/169
+generic/171
+generic/172
+generic/173
+generic/174
+generic/177
+generic/178
+generic/179
+generic/180
+generic/181
+generic/182
+generic/183
+generic/184
+generic/185
+generic/188
+generic/189
+generic/190
+generic/191
+generic/193
+generic/194
+generic/195
+generic/196
+generic/197
+generic/198
+generic/199
+generic/200
+generic/201
+generic/202
+generic/203
+generic/205
+generic/206
+generic/207
+generic/210
+generic/211
+generic/212
+generic/214
+generic/215
+generic/216
+generic/217
+generic/218
+generic/219
+generic/220
+generic/221
+generic/222
+generic/223
+generic/225
+generic/227
+generic/228
+generic/229
+generic/230
+generic/235
+generic/236
+generic/237
+generic/238
+generic/240
+generic/244
+generic/245
+generic/246
+generic/247
+generic/248
+generic/249
+generic/250
+generic/252
+generic/253
+generic/254
+generic/255
+generic/256
+generic/257
+generic/258
+generic/259
+generic/260
+generic/261
+generic/262
+generic/263
+generic/264
+generic/265
+generic/266
+generic/267
+generic/268
+generic/271
+generic/272
+generic/276
+generic/277
+generic/278
+generic/279
+generic/281
+generic/282
+generic/283
+generic/284
+generic/286
+generic/287
+generic/288
+generic/289
+generic/290
+generic/291
+generic/292
+generic/293
+generic/294
+generic/295
+generic/296
+generic/301
+generic/302
+generic/303
+generic/304
+generic/305
+generic/306
+generic/307
+generic/308
+generic/309
+generic/312
+generic/313
+generic/314
+generic/315
+generic/316
+generic/317
+generic/319
+generic/322
+generic/324
+generic/325
+generic/326
+generic/327
+generic/328
+generic/329
+generic/330
+generic/331
+generic/332
+generic/335
+generic/336
+generic/337
+generic/341
+generic/342
+generic/343
+generic/346
+generic/348
+generic/353
+generic/355
+generic/358
+generic/359
+generic/360
+generic/361
+generic/362
+generic/363
+generic/364
+generic/365
+generic/366
+generic/367
+generic/368
+generic/369
+generic/370
+generic/371
+generic/372
+generic/373
+generic/374
+generic/375
+generic/376
+generic/377
+generic/378
+generic/379
+generic/380
+generic/381
+generic/382
+generic/383
+generic/384
+generic/385
+generic/386
+generic/389
+generic/391
+generic/392
+generic/393
+generic/394
+generic/395
+generic/396
+generic/397
+generic/398
+generic/400
+generic/401
+generic/402
+generic/403
+generic/404
+generic/406
+generic/407
+generic/408
+generic/412
+generic/413
+generic/414
+generic/417
+generic/419
+generic/420
+generic/421
+generic/422
+generic/424
+generic/425
+generic/426
+generic/427
+generic/428
+generic/436
+generic/437
+generic/439
+generic/440
+generic/443
+generic/445
+generic/446
+generic/448
+generic/449
+generic/450
+generic/451
+generic/452
+generic/453
+generic/454
+generic/456
+generic/458
+generic/460
+generic/462
+generic/463
+generic/465
+generic/466
+generic/468
+generic/469
+generic/470
+generic/471
+generic/474
+generic/477
+generic/478
+generic/479
+generic/480
+generic/481
+generic/483
+generic/485
+generic/486
+generic/487
+generic/488
+generic/489
+generic/490
+generic/491
+generic/492
+generic/498
+generic/499
+generic/501
+generic/502
+generic/503
+generic/504
+generic/505
+generic/506
+generic/507
+generic/508
+generic/509
+generic/510
+generic/511
+generic/512
+generic/513
+generic/514
+generic/515
+generic/516
+generic/517
+generic/518
+generic/519
+generic/520
+generic/523
+generic/524
+generic/525
+generic/526
+generic/527
+generic/528
+generic/529
+generic/530
+generic/531
+generic/533
+generic/534
+generic/535
+generic/536
+generic/537
+generic/538
+generic/539
+generic/540
+generic/541
+generic/542
+generic/543
+generic/544
+generic/545
+generic/546
+generic/547
+generic/548
+generic/549
+generic/550
+generic/552
+generic/553
+generic/555
+generic/556
+generic/557
+generic/566
+generic/567
+generic/571
+generic/572
+generic/573
+generic/574
+generic/575
+generic/576
+generic/577
+generic/578
+generic/580
+generic/581
+generic/582
+generic/583
+generic/584
+generic/586
+generic/587
+generic/588
+generic/591
+generic/592
+generic/593
+generic/594
+generic/595
+generic/596
+generic/597
+generic/598
+generic/599
+generic/600
+generic/601
+generic/602
+generic/603
+generic/604
+generic/605
+generic/606
+generic/607
+generic/608
+generic/609
+generic/610
+generic/611
+generic/612
+generic/613
+generic/614
+generic/618
+generic/621
+generic/623
+generic/624
+generic/625
+generic/626
+generic/628
+generic/629
+generic/630
+generic/632
+generic/634
+generic/635
+generic/637
+generic/638
+generic/639
+generic/640
+generic/644
+generic/645
+generic/646
+generic/647
+generic/651
+generic/652
+generic/653
+generic/654
+generic/655
+generic/657
+generic/658
+generic/659
+generic/660
+generic/661
+generic/662
+generic/663
+generic/664
+generic/665
+generic/666
+generic/667
+generic/668
+generic/669
+generic/673
+generic/674
+generic/675
+generic/676
+generic/677
+generic/678
+generic/679
+generic/680
+generic/681
+generic/682
+generic/683
+generic/684
+generic/685
+generic/686
+generic/687
+generic/688
+generic/689
+shared/002
+shared/032
+Not
+run:
+generic/008
+generic/009
+generic/012
+generic/015
+generic/016
+generic/018
+generic/021
+generic/022
+generic/025
+generic/026
+generic/031
+generic/033
+generic/050
+generic/052
+generic/058
+generic/059
+generic/060
+generic/061
+generic/063
+generic/064
+generic/078
+generic/079
+generic/081
+generic/082
+generic/091
+generic/094
+generic/096
+generic/110
+generic/111
+generic/113
+generic/114
+generic/115
+generic/116
+generic/118
+generic/119
+generic/121
+generic/122
+generic/123
+generic/128
+generic/130
+generic/134
+generic/135
+generic/136
+generic/138
+generic/139
+generic/140
+generic/142
+generic/143
+generic/144
+generic/145
+generic/146
+generic/147
+generic/148
+generic/149
+generic/150
+generic/151
+generic/152
+generic/153
+generic/154
+generic/155
+generic/156
+generic/157
+generic/158
+generic/159
+generic/160
+generic/161
+generic/162
+generic/163
+generic/171
+generic/172
+generic/173
+generic/174
+generic/177
+generic/178
+generic/179
+generic/180
+generic/181
+generic/182
+generic/183
+generic/185
+generic/188
+generic/189
+generic/190
+generic/191
+generic/193
+generic/194
+generic/195
+generic/196
+generic/197
+generic/198
+generic/199
+generic/200
+generic/201
+generic/202
+generic/203
+generic/205
+generic/206
+generic/207
+generic/210
+generic/211
+generic/212
+generic/214
+generic/216
+generic/217
+generic/218
+generic/219
+generic/220
+generic/222
+generic/223
+generic/225
+generic/227
+generic/229
+generic/230
+generic/235
+generic/238
+generic/240
+generic/244
+generic/250
+generic/252
+generic/253
+generic/254
+generic/255
+generic/256
+generic/259
+generic/260
+generic/261
+generic/262
+generic/263
+generic/264
+generic/265
+generic/266
+generic/267
+generic/268
+generic/271
+generic/272
+generic/276
+generic/277
+generic/278
+generic/279
+generic/281
+generic/282
+generic/283
+generic/284
+generic/287
+generic/288
+generic/289
+generic/290
+generic/291
+generic/292
+generic/293
+generic/295
+generic/296
+generic/301
+generic/302
+generic/303
+generic/304
+generic/305
+generic/312
+generic/314
+generic/316
+generic/317
+generic/324
+generic/326
+generic/327
+generic/328
+generic/329
+generic/330
+generic/331
+generic/332
+generic/353
+generic/355
+generic/358
+generic/359
+generic/361
+generic/362
+generic/363
+generic/364
+generic/365
+generic/366
+generic/367
+generic/368
+generic/369
+generic/370
+generic/371
+generic/372
+generic/373
+generic/374
+generic/378
+generic/379
+generic/380
+generic/381
+generic/382
+generic/383
+generic/384
+generic/385
+generic/386
+generic/391
+generic/392
+generic/395
+generic/396
+generic/397
+generic/398
+generic/400
+generic/402
+generic/404
+generic/406
+generic/407
+generic/408
+generic/412
+generic/413
+generic/414
+generic/417
+generic/419
+generic/420
+generic/421
+generic/422
+generic/424
+generic/425
+generic/427
+generic/439
+generic/440
+generic/446
+generic/449
+generic/450
+generic/451
+generic/453
+generic/454
+generic/456
+generic/458
+generic/462
+generic/463
+generic/465
+generic/466
+generic/468
+generic/469
+generic/470
+generic/471
+generic/474
+generic/485
+generic/487
+generic/488
+generic/491
+generic/492
+generic/499
+generic/501
+generic/503
+generic/505
+generic/506
+generic/507
+generic/508
+generic/511
+generic/513
+generic/514
+generic/515
+generic/516
+generic/517
+generic/518
+generic/519
+generic/520
+generic/528
+generic/530
+generic/536
+generic/537
+generic/538
+generic/539
+generic/540
+generic/541
+generic/542
+generic/543
+generic/544
+generic/545
+generic/546
+generic/548
+generic/549
+generic/550
+generic/552
+generic/553
+generic/555
+generic/556
+generic/566
+generic/567
+generic/572
+generic/573
+generic/574
+generic/575
+generic/576
+generic/577
+generic/578
+generic/580
+generic/581
+generic/582
+generic/583
+generic/584
+generic/586
+generic/587
+generic/588
+generic/591
+generic/592
+generic/593
+generic/594
+generic/595
+generic/596
+generic/597
+generic/598
+generic/599
+generic/600
+generic/601
+generic/602
+generic/603
+generic/605
+generic/606
+generic/607
+generic/608
+generic/609
+generic/610
+generic/612
+generic/613
+generic/621
+generic/623
+generic/624
+generic/625
+generic/626
+generic/628
+generic/629
+generic/630
+generic/635
+generic/644
+generic/645
+generic/646
+generic/647
+generic/651
+generic/652
+generic/653
+generic/654
+generic/655
+generic/657
+generic/658
+generic/659
+generic/660
+generic/661
+generic/662
+generic/663
+generic/664
+generic/665
+generic/666
+generic/667
+generic/668
+generic/669
+generic/673
+generic/674
+generic/675
+generic/677
+generic/678
+generic/679
+generic/680
+generic/681
+generic/682
+generic/683
+generic/684
+generic/685
+generic/686
+generic/687
+generic/688
+generic/689
+shared/002
+shared/032
+Passed all 512 tests
--- a/tests/run-tests.sh
+++ b/tests/run-tests.sh
@@ -56,7 +56,6 @@ $(basename $0) options:
              | only tests matching will be run.  Can be provided multiple
              | times
    -i        | Force removing and inserting the built scoutfs.ko module.
-    -l <nr>   | Loop each test <nr> times while passing, last run counts.
    -M <file> | Specify the filesystem's meta data device path that contains
              | the file system to be tested.  Will be clobbered by -m mkfs.
    -m        | Run mkfs on the device before mounting and running
@@ -70,7 +69,6 @@ $(basename $0) options:
    -r <dir>  | Specify the directory in which to store results of
              | test runs.  The directory will be created if it doesn't
              | exist.  Previous results will be deleted as each test runs.
-    -R        | shuffle the test order randomly using shuf
    -s        | Skip git repo checkouts.
    -t        | Enabled trace events that match the given glob argument.
              | Multiple options enable multiple globbed events.
@@ -91,8 +89,6 @@ done
 # set some T_ defaults
 T_TRACE_DUMP="0"
 T_TRACE_PRINTK="0"
-T_PORT_START="19700"
-T_LOOP_ITER="1"

 # array declarations to be able to use array ops
 declare -a T_TRACE_GLOB
@@ -133,12 +129,6 @@ while true; do
 	-i)
 		T_INSMOD="1"
 		;;
-	-l)
-	        test -n "$2" || die "-l must have a nr iterations argument"
-		test "$2" -eq "$2" 2>/dev/null || die "-l <nr> argument must be an integer"
-		T_LOOP_ITER="$2"
-		shift
-		;;
 	-M)
 	        test -n "$2" || die "-z must have meta device file argument"
 	        T_META_DEVICE="$2"
@@ -174,9 +164,6 @@ while true; do
 		T_RESULTS="$2"
 		shift
 		;;
-	-R)
-		T_SHUF="1"
-		;;
 	-s)
 	        T_SKIP_CHECKOUT="1"
 		;;
@@ -274,37 +261,13 @@ for e in T_META_DEVICE T_DATA_DEVICE T_EX_META_DEV T_EX_DATA_DEV T_KMOD T_RESULT
 	eval $e=\"$(readlink -f "${!e}")\"
 done

-# try and check ports, but not necessary
-T_TEST_PORT="$T_PORT_START"
-T_SCRATCH_PORT="$((T_PORT_START + 100))"
-T_DEV_PORT="$((T_PORT_START + 200))"
-read local_start local_end < /proc/sys/net/ipv4/ip_local_port_range
-if [ -n "$local_start" -a -n "$local_end" -a "$local_start" -lt "$local_end" ]; then
-	if [ ! "$T_DEV_PORT" -lt "$local_start" -a ! "$T_TEST_PORT" -gt "$local_end" ]; then
-		die "listening port range $T_TEST_PORT - $T_DEV_PORT is within local dynamic port range $local_start - $local_end in /proc/sys/net/ipv4/ip_local_port_range"
-	fi
-fi
-
-# permute sequence?
-T_SEQUENCE=sequence
-if [ -n "$T_SHUF" ]; then
-	msg "shuffling test order"
-	shuf sequence -o sequence.shuf
-	# keep xfstests at the end
-	if grep -q 'xfstests.sh' sequence.shuf ; then
-		sed -i '/xfstests.sh/d' sequence.shuf
-		echo "xfstests.sh" >> sequence.shuf
-	fi
-	T_SEQUENCE=sequence.shuf
-fi
-
 # include everything by default
 test -z "$T_INCLUDE" && T_INCLUDE="-e '.*'"
 # (quickly) exclude nothing by default
 test -z "$T_EXCLUDE" && T_EXCLUDE="-e '\Zx'"

 # eval to strip re ticks but not expand
-tests=$(grep -v "^#" $T_SEQUENCE |
+tests=$(grep -v "^#" sequence |
 	eval grep "$T_INCLUDE" | eval grep -v "$T_EXCLUDE")
 test -z "$tests" && \
 	die "no tests found by including $T_INCLUDE and excluding $T_EXCLUDE"
@@ -383,7 +346,7 @@ fi
 quo=""
 if [ -n "$T_MKFS" ]; then
 	for i in $(seq -0 $((T_QUORUM - 1))); do
-		quo="$quo -Q $i,127.0.0.1,$((T_TEST_PORT + i))"
+		quo="$quo -Q $i,127.0.0.1,$((42000 + i))"
 	done

 	msg "making new filesystem with $T_QUORUM quorum members"
@@ -400,8 +363,7 @@ if [ -n "$T_INSMOD" ]; then
 fi

 if [ -n "$T_TRACE_MULT" ]; then
-#	orig_trace_size=$(cat /sys/kernel/debug/tracing/buffer_size_kb)
-	orig_trace_size=1408
+	orig_trace_size=$(cat /sys/kernel/debug/tracing/buffer_size_kb)
 	mult_trace_size=$((orig_trace_size * T_TRACE_MULT))
 	msg "increasing trace buffer size from $orig_trace_size KiB to $mult_trace_size KiB"
 	echo $mult_trace_size > /sys/kernel/debug/tracing/buffer_size_kb
@@ -439,30 +401,6 @@ cmd grep .  /sys/kernel/debug/tracing/options/trace_printk \
 	    /sys/kernel/debug/tracing/buffer_size_kb \
 	    /proc/sys/kernel/ftrace_dump_on_oops

-# we can record pids to kill as we exit, we kill in reverse added order
-atexit_kill_pids=""
-add_atexit_kill_pid()
-{
-	atexit_kill_pids="$1 $atexit_kill_pids"
-}
-atexit_kill()
-{
-	local pid
-
-	# suppress bg function exited messages
-	exec {ERR}>&2 2>/dev/null
-
-	for pid in $atexit_kill_pids; do
-		if test -e "/proc/$pid/status" ; then
-			kill "$pid"
-			wait "$pid"
-		fi
-	done
-
-	exec 2>&$ERR {ERR}>&-
-}
-trap atexit_kill EXIT
-
 #
 # Build a fenced config that runs scripts out of the repository rather
 # than the default system directory
@@ -476,46 +414,26 @@ EOF
 export SCOUTFS_FENCED_CONFIG_FILE="$conf"
 T_FENCED_LOG="$T_RESULTS/fenced.log"

+#
+# Run the agent in the background, log its output, and kill it if we
+# exit
+#
+fenced_log()
+{
+	echo "[$(timestamp)] $*" >> "$T_FENCED_LOG"
+}
+fenced_pid=""
+kill_fenced()
+{
+	if test -n "$fenced_pid" -a -d "/proc/$fenced_pid" ; then
+		fenced_log "killing fenced pid $fenced_pid"
+		kill "$fenced_pid"
+	fi
+}
+trap kill_fenced EXIT
 $T_UTILS/fenced/scoutfs-fenced > "$T_FENCED_LOG" 2>&1 &
 fenced_pid=$!
-add_atexit_kill_pid $fenced_pid
-
-#
-# some critical failures will cause fs operations to hang.  We can watch
-# for evidence of them and cause the system to crash, at least.
-#
-crash_monitor()
-{
-	local bad=0
-
-	while sleep 1; do
-		if dmesg | grep -q "inserting extent.*overlaps existing"; then
-			echo "run-tests monitor saw overlapping extent message"
-			bad=1
-		fi
-
-		if dmesg | grep -q "error indicated by fence action" ; then
-			echo "run-tests monitor saw fence agent error message"
-			bad=1
-		fi
-
-		if [ ! -e "/proc/${fenced_pid}/status" ]; then
-			echo "run-tests monitor didn't see fenced pid $fenced_pid /proc dir"
-			bad=1
-		fi
-
-		if [ "$bad" != 0 ]; then
-			echo "run-tests monitor syncing and triggering crash"
-			# hail mary, the sync could well hang
-			(echo s > /proc/sysrq-trigger) &
-			sleep 5
-			echo c > /proc/sysrq-trigger
-			exit 1
-		fi
-	done
-}
-crash_monitor &
-add_atexit_kill_pid $!
+fenced_log "started fenced pid $fenced_pid in the background"

 # setup dm tables
 echo "0 $(blockdev --getsz $T_META_DEVICE) linear $T_META_DEVICE 0" > \
@@ -546,6 +464,7 @@ for i in $(seq 0 $((T_NR_MOUNTS - 1))); do
 	if [ "$i" -lt "$T_QUORUM" ]; then
 		opts="$opts,quorum_slot_nr=$i"
 	fi
+	opts="$opts,meta_reserve_blocks=0"
 	opts="${opts}${T_MNT_OPTIONS}"

 	msg "mounting $meta_dev|$data_dev on $dir"
@@ -588,7 +507,7 @@ fi
 . funcs/filter.sh

 # give tests access to built binaries in src/, prefer over installed
-export PATH="$PWD/src:$PATH"
+PATH="$PWD/src:$PATH"

 msg "running tests"
 > "$T_RESULTS/skip.log"
@@ -608,113 +527,98 @@ for t in $tests; do
 	t="tests/$t"
 	test_name=$(basename "$t" | sed -e 's/.sh$//')

+	# create a temporary dir and file path for the test
+	T_TMPDIR="$T_RESULTS/tmp/$test_name"
+	T_TMP="$T_TMPDIR/tmp"
+	cmd rm -rf "$T_TMPDIR"
+	cmd mkdir -p "$T_TMPDIR"
+
+	# create a test name dir in the fs
+	T_DS=""
+	for i in $(seq 0 $((T_NR_MOUNTS - 1))); do
+		dir="${T_M[$i]}/test/$test_name"
+
+		test $i == 0 && cmd mkdir -p "$dir"
+
+		eval T_D$i=$dir
+		T_D[$i]=$dir
+		T_DS+="$dir "
+	done
+
+	# export all our T_ variables
+	for v in ${!T_*}; do
+		eval export $v
+	done
+	export PATH # give test access to scoutfs binary
+
+	# prepare to compare output to golden output
+	test -e "$T_RESULTS/output" || cmd mkdir -p "$T_RESULTS/output"
+	out="$T_RESULTS/output/$test_name"
+	> "$T_TMPDIR/status.msg"
+	golden="golden/$test_name"
+
 	# get stats from previous pass
 	last="$T_RESULTS/last-passed-test-stats"
 	stats=$(grep -s "^$test_name " "$last" | cut -d " " -f 2-)
 	test -n "$stats" && stats="last: $stats"
+
 	printf "  %-30s $stats" "$test_name"

 	# mark in dmesg as to what test we are running
 	echo "run scoutfs test $test_name" > /dev/kmsg

-	# let the test get at its extra files
-	T_EXTRA="$T_TESTS/extra/$test_name"
+	# record dmesg before
+	dmesg | t_filter_dmesg > "$T_TMPDIR/dmesg.before"

-	for iter in $(seq 1 $T_LOOP_ITER); do
+	# give tests stdout and compared output on specific fds
+	exec 6>&1
+	exec 7>$out

-		# create a temporary dir and file path for the test
-		T_TMPDIR="$T_RESULTS/tmp/$test_name"
-		T_TMP="$T_TMPDIR/tmp"
-		cmd rm -rf "$T_TMPDIR"
-		cmd mkdir -p "$T_TMPDIR"
+	# run the test with access to our functions
+	start_secs=$SECONDS
+	bash -c "for f in funcs/*.sh; do . \$f; done; . $t" >&7 2>&1
+	sts="$?"
+	log "test $t exited with status $sts"
+	stats="$((SECONDS - start_secs))s"

-		# assign scratch mount point in temporary dir
-		T_MSCR="$T_TMPDIR/scratch"
+	# close our weird descriptors
+	exec 6>&-
+	exec 7>&-

-		# create a test name dir in the fs, clean up old data as needed
-		T_DS=""
-		for i in $(seq 0 $((T_NR_MOUNTS - 1))); do
-			dir="${T_M[$i]}/test/$test_name"
-
-			test $i == 0 && (
-				test -d "$dir" && cmd rm -rf "$dir"
-				cmd mkdir -p "$dir"
-			)
-
-			eval T_D$i=$dir
-			T_D[$i]=$dir
-			T_DS+="$dir "
-		done
-
-		# export all our T_ variables
-		for v in ${!T_*}; do
-			eval export $v
-		done
-
-		# prepare to compare output to golden output
-		test -e "$T_RESULTS/output" || cmd mkdir -p "$T_RESULTS/output"
-		out="$T_RESULTS/output/$test_name"
-		> "$T_TMPDIR/status.msg"
-		golden="golden/$test_name"
-
-		# record dmesg before
-		dmesg | t_filter_dmesg > "$T_TMPDIR/dmesg.before"
-
-		# give tests stdout and compared output on specific fds
-		exec 6>&1
-		exec 7>$out
-
-		# run the test with access to our functions
-		start_secs=$SECONDS
-		bash -c "for f in funcs/*.sh; do . \$f; done; . $t" >&7 2>&1
-		sts="$?"
-		log "test $t exited with status $sts"
-		stats="$((SECONDS - start_secs))s"
-
-		# close our weird descriptors
-		exec 6>&-
-		exec 7>&-
-
-		# compare output if the test returned passed status
-		if [ "$sts" == "$T_PASS_STATUS" ]; then
-			if [ ! -e "$golden" ]; then
-				message="no golden output"
-				sts=$T_FAIL_STATUS
-			elif ! cmp -s "$golden" "$out"; then 
-				message="output differs"
-				sts=$T_FAIL_STATUS
-				diff -u "$golden" "$out" >> "$T_RESULTS/fail.log"
-			fi
-		else
-			# get message from t_*() functions
-			message=$(cat "$T_TMPDIR/status.msg")
-		fi
-
-		# see if anything unexpected was added to dmesg
-		if [ "$sts" == "$T_PASS_STATUS" ]; then
-			dmesg | t_filter_dmesg > "$T_TMPDIR/dmesg.after"
-			diff --old-line-format="" --unchanged-line-format="" \
-				"$T_TMPDIR/dmesg.before" "$T_TMPDIR/dmesg.after" | \
-				grep -v '^$' > "$T_TMPDIR/dmesg.new"
-
-			if [ -s "$T_TMPDIR/dmesg.new" ]; then
-				message="unexpected messages in dmesg"
-				sts=$T_FAIL_STATUS
-				cat "$T_TMPDIR/dmesg.new" >> "$T_RESULTS/fail.log"
-			fi
-		fi
-
-		# record unknown exit status
-		if [ "$sts" -lt "$T_FIRST_STATUS" -o "$sts" -gt "$T_LAST_STATUS" ]; then
-			message="unknown status: $sts"
+	# compare output if the test returned passed status
+	if [ "$sts" == "$T_PASS_STATUS" ]; then
+		if [ ! -e "$golden" ]; then
+			message="no golden output"
 			sts=$T_FAIL_STATUS
+		elif ! cmp -s "$golden" "$out"; then 
+			message="output differs"
+			sts=$T_FAIL_STATUS
+			diff -u "$golden" "$out" >> "$T_RESULTS/fail.log"
 		fi
+	else
+		# get message from t_*() functions
+		message=$(cat "$T_TMPDIR/status.msg")
+	fi

-		# stop looping if we didn't pass
-		if [ "$sts" != "$T_PASS_STATUS" ]; then
-			break;
+	# see if anything unexpected was added to dmesg
+	if [ "$sts" == "$T_PASS_STATUS" ]; then
+		dmesg | t_filter_dmesg > "$T_TMPDIR/dmesg.after"
+		diff --old-line-format="" --unchanged-line-format="" \
+			"$T_TMPDIR/dmesg.before" "$T_TMPDIR/dmesg.after" > \
+			"$T_TMPDIR/dmesg.new"
+
+		if [ -s "$T_TMPDIR/dmesg.new" ]; then
+			message="unexpected messages in dmesg"
+			sts=$T_FAIL_STATUS
+			cat "$T_TMPDIR/dmesg.new" >> "$T_RESULTS/fail.log"
 		fi
-	done
+	fi
+
+	# record unknown exit status
+	if [ "$sts" -lt "$T_FIRST_STATUS" -o "$sts" -gt "$T_LAST_STATUS" ]; then
+		message="unknown status: $sts"
+		sts=$T_FAIL_STATUS
+	fi

 	# show and record the result of the test
 	if [ "$sts" == "$T_PASS_STATUS" ]; then
--- a/tests/sequence
+++ b/tests/sequence
@@ -2,7 +2,6 @@ export-get-name-parent.sh
 basic-block-counts.sh
 basic-bad-mounts.sh
 basic-posix-acl.sh
-basic-acl-consistency.sh
 inode-items-updated.sh
 simple-inode-index.sh
 simple-staging.sh
@@ -11,7 +10,6 @@ simple-readdir.sh
 get-referring-entries.sh
 fallocate.sh
 basic-truncate.sh
-punch-offline.sh
 data-prealloc.sh
 setattr_more.sh
 offline-extent-waiting.sh
@@ -26,9 +24,7 @@ srch-basic-functionality.sh
 simple-xattr-unit.sh
 retention-basic.sh
 totl-xattr-tag.sh
-basic-xattr-indx.sh
 quota.sh
-totl-merge-read.sh
 lock-refleak.sh
 lock-shrink-consistency.sh
 lock-shrink-read-race.sh
@@ -52,7 +48,6 @@ setup-error-teardown.sh
 resize-devices.sh
 change-devices.sh
 fence-and-reclaim.sh
-orphan-log-trees.sh
 quorum-heartbeat-timeout.sh
 orphan-inodes.sh
 mount-unmount-race.sh
--- a/tests/src/mmap_stress.c
+++ b/tests/src/mmap_stress.c
@@ -19,7 +19,6 @@
 #include <sys/types.h>
 #include <stdio.h>
 #include <sys/stat.h>
-#include <inttypes.h>
 #include <fcntl.h>
 #include <unistd.h>
 #include <stdlib.h>
@@ -30,7 +29,7 @@
 #include <errno.h>

 static int size = 0;
-static int duration = 0;
+static int count = 0; /* XXX make this duration instead */

 struct thread_info {
 	int nr;
@@ -42,8 +41,6 @@ static void *run_test_func(void *ptr)
 	void *buf = NULL;
 	char *addr = NULL;
 	struct thread_info *tinfo = ptr;
-	uint64_t seconds = 0;
-	struct timespec ts;
 	int c = 0;
 	int fd;
 	ssize_t read, written, ret;
@@ -64,15 +61,9 @@ static void *run_test_func(void *ptr)

 	usleep(100000); /* 0.1sec to allow all threads to start roughly at the same time */

-	clock_gettime(CLOCK_REALTIME, &ts); /* record start time */
-	seconds = ts.tv_sec + duration;
-
 	for (;;) {
-		if (++c % 16 == 0) {
-			clock_gettime(CLOCK_REALTIME, &ts);
-			if (ts.tv_sec >= seconds)
-				break;
-		}
+		if (++c > count)
+			break;

 		switch (rand() % 4) {
 		case 0: /* pread */
@@ -108,8 +99,6 @@ static void *run_test_func(void *ptr)
 			memcpy(addr, buf, size); /* noerr */
 			break;
 		}
-
-		usleep(10000);
 	}

 	munmap(addr, size);
@@ -131,7 +120,7 @@ int main(int argc, char **argv)
 	int i;

 	if (argc != 8) {
-		fprintf(stderr, "%s requires 7 arguments - size duration file1 file2 file3 file4 file5\n", argv[0]);
+		fprintf(stderr, "%s requires 7 arguments - size count file1 file2 file3 file4 file5\n", argv[0]);
 		exit(-1);
 	}

@@ -141,9 +130,9 @@ int main(int argc, char **argv)
 		exit(-1);
 	}

-	duration = atoi(argv[2]);
-	if (duration < 0) {
-		fprintf(stderr, "invalid duration, must be greater than or equal to 0\n");
+	count = atoi(argv[2]);
+	if (count < 0) {
+		fprintf(stderr, "invalid count, must be greater than or equal to 0\n");
 		exit(-1);
 	}

--- a/tests/tests/basic-acl-consistency.sh
+++ b/tests/tests/basic-acl-consistency.sh
@@ -1,117 +0,0 @@
-
-#
-# Test basic clustered posix acl consistency.
-#
-
-t_require_commands getfacl setfacl
-
-GETFACL="getfacl --absolute-names"
-
-filter_scratch() {
-	sed "s@$T_MSCR@t_mscr@g"
-}
-
-acl_compare()
-{
-	diff -u - <($GETFACL $T_MSCR/data/dir_a/dir_b | filter_scratch) <<EOF1
-# file: t_mscr/data/dir_a/dir_b
-# owner: t_usr_3
-# group: t_grp_3
-# flags: -s-
-user::rwx
-group::rwx
-group:t_grp_2:r-x
-mask::rwx
-other::---
-default:user::rwx
-default:group::rwx
-default:group:t_grp_2:r-x
-default:group:t_grp_3:rwx
-default:mask::rwx
-default:other::---
-
-EOF1
-
-	test $? -eq 0 || t_fail "dir_b differs"
-
-	diff -u - <($GETFACL -p $T_MSCR/data/dir_a/dir_b/dir_c/dir_d | filter_scratch) <<EOF3
-# file: t_mscr/data/dir_a/dir_b/dir_c/dir_d
-# owner: t_usr_1
-# group: t_grp_1
-# flags: -s-
-user::rwx
-group::rwx
-group:t_grp_2:r-x
-mask::rwx
-other::---
-default:user::rwx
-default:group::rwx
-default:group:t_grp_2:r-x
-default:group:t_grp_3:rwx
-default:mask::rwx
-default:other::---
-
-EOF3
-	test $? -eq 0 || t_fail "dir_d differs"
-
-	diff -u - <($GETFACL $T_MSCR/data/dir_a/dir_b/dir_c | filter_scratch) <<EOF2
-# file: t_mscr/data/dir_a/dir_b/dir_c
-# owner: t_usr_3
-# group: t_grp_2
-# flags: -s-
-user::rwx
-group::rwx
-group:t_grp_2:r-x
-mask::rwx
-other::---
-default:user::rwx
-default:group::rwx
-default:group:t_grp_2:r-x
-default:group:t_grp_3:rwx
-default:mask::rwx
-default:other::---
-
-EOF2
-	test $? -eq 0 || t_fail "dir_c differs"
-}
-echo "== make scratch fs"
-t_scratch_mkfs
-t_scratch_mount
-
-rm -rf $T_MSCR/data
-
-echo "== create uid/gids"
-groupadd -g 7101 t_grp_1 > /dev/null 2>&1
-useradd -g 7101 -u 7101 t_usr_1 > /dev/null 2>&1
-groupadd -g 7102 t_grp_2 > /dev/null 2>&1
-groupadd -g 7103 t_grp_3 > /dev/null 2>&1
-useradd -g 7103 -u 7103 t_usr_3 > /dev/null 2>&1
-
-echo "== set acls and permissions"
-mkdir -p $T_MSCR/data/dir_a/dir_b
-chown t_usr_3:t_grp_3 $T_MSCR/data/dir_a/dir_b
-chmod 2770 $T_MSCR/data/dir_a/dir_b
-setfacl -m g:t_grp_2:rx $T_MSCR/data/dir_a/dir_b
-setfacl -m d:g:t_grp_2:rx $T_MSCR/data/dir_a/dir_b
-setfacl -m d:g:t_grp_3:rwx $T_MSCR/data/dir_a/dir_b
-
-mkdir -p $T_MSCR/data/dir_a/dir_b/dir_c
-chown t_usr_3:t_grp_2 $T_MSCR/data/dir_a/dir_b/dir_c
-setfacl -x g:t_grp_3 $T_MSCR/data/dir_a/dir_b/dir_c
-
-mkdir -p $T_MSCR/data/dir_a/dir_b/dir_c/dir_d
-chown t_usr_1:t_grp_1 $T_MSCR/data/dir_a/dir_b/dir_c/dir_d
-setfacl -x g:t_grp_3 $T_MSCR/data/dir_a/dir_b/dir_c/dir_d
-
-echo "== compare output"
-acl_compare
-
-echo "== drop caches and compare again"
-sync
-echo 3 > /proc/sys/vm/drop_caches
-acl_compare
-
-echo "== cleanup scratch fs"
-t_scratch_umount
-
-t_pass
--- a/tests/tests/basic-bad-mounts.sh
+++ b/tests/tests/basic-bad-mounts.sh
@@ -12,22 +12,25 @@ mount_fail()
 }

 echo "== prepare devices, mount point, and logs"
-t_scratch_mkfs
+SCR="$T_TMPDIR/mnt.scratch"
+mkdir -p "$SCR"
 > $T_TMP.mount.out
+scoutfs mkfs -f -Q 0,127.0.0.1,53000 "$T_EX_META_DEV" "$T_EX_DATA_DEV" > $T_TMP.mkfs.out 2>&1 \
+	|| t_fail "mkfs failed"

 echo "== bad devices, bad options"
-mount_fail -o _bad /dev/null /dev/null "$T_MSCR"
+mount_fail -o _bad /dev/null /dev/null "$SCR"

 echo "== swapped devices"
-mount_fail -o metadev_path=$T_EX_DATA_DEV,quorum_slot_nr=0 "$T_EX_META_DEV" "$T_MSCR"
+mount_fail -o metadev_path=$T_EX_DATA_DEV,quorum_slot_nr=0 "$T_EX_META_DEV" "$SCR"

 echo "== both meta devices"
-mount_fail -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 "$T_EX_META_DEV" "$T_MSCR"
+mount_fail -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 "$T_EX_META_DEV" "$SCR"

 echo "== both data devices"
-mount_fail -o metadev_path=$T_EX_DATA_DEV,quorum_slot_nr=0 "$T_EX_DATA_DEV" "$T_MSCR"
+mount_fail -o metadev_path=$T_EX_DATA_DEV,quorum_slot_nr=0 "$T_EX_DATA_DEV" "$SCR"

 echo "== good volume, bad option and good options"
-mount_fail -o _bad,metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 "$T_EX_DATA_DEV" "$T_MSCR"
+mount_fail -o _bad,metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 "$T_EX_DATA_DEV" "$SCR"

 t_pass
--- a/tests/tests/basic-xattr-indx.sh
+++ b/tests/tests/basic-xattr-indx.sh
@@ -1,143 +0,0 @@
-#
-# Test basic .indx. xattr tag functionality and index entry lifecycle
-#
-
-t_require_commands touch rm setfattr scoutfs stat
-t_require_mounts 2
-
-# query index from a specific mount, default mount 0
-read_xattr_index()
-{
-	local nr="${1:-0}"
-	local mnt="$(eval echo \$T_M$nr)"
-	shift
-
-	sync
-	echo 1 > $(t_debugfs_path $nr)/drop_weak_item_cache
-	scoutfs read-xattr-index -p "$mnt" "$@"
-}
-
-MAJOR=5
-MINOR=100
-
-echo "== testing invalid read-xattr-index arguments"
-scoutfs read-xattr-index -p "$T_M0" bad 2>&1
-scoutfs read-xattr-index -p "$T_M0" 1.2 2>&1
-scoutfs read-xattr-index -p "$T_M0" 1.2.3 256.0.0 2>&1
-scoutfs read-xattr-index -p "$T_M0" 1.2.3 0.0.0 2>&1
-scoutfs read-xattr-index -p "$T_M0" 1.2.0 1.1.2 2>&1
-scoutfs read-xattr-index -p "$T_M0" 2.2.2 2.2.1 2>&1
-
-echo "== testing invalid names"
-touch "$T_D0/invalid"
-setfattr -n scoutfs.hide.indx.test.$MAJOR "$T_D0/invalid" 2>&1 | t_filter_fs
-setfattr -n scoutfs.hide.indx.test "$T_D0/invalid" 2>&1 | t_filter_fs
-setfattr -n scoutfs.hide.indx.test.. "$T_D0/invalid" 2>&1 | t_filter_fs
-setfattr -n scoutfs.hide.indx.test..$MINOR "$T_D0/invalid" 2>&1 | t_filter_fs
-setfattr -n scoutfs.hide.indx.test.$MAJOR. "$T_D0/invalid" 2>&1 | t_filter_fs
-setfattr -n scoutfs.hide.indx.test.256.$MINOR "$T_D0/invalid" 2>&1 | t_filter_fs
-setfattr -n scoutfs.hide.indx.test.abc.$MINOR "$T_D0/invalid" 2>&1 | t_filter_fs
-setfattr -n scoutfs.hide.indx.test.$MAJOR.abc "$T_D0/invalid" 2>&1 | t_filter_fs
-setfattr -n scoutfs.hide.indx.test.-1.$MINOR "$T_D0/invalid" 2>&1 | t_filter_fs
-setfattr -n scoutfs.hide.indx.test.$MAJOR.-1 "$T_D0/invalid" 2>&1 | t_filter_fs
-setfattr -n scoutfs.hide.indx.test.18446744073709551616.$MINOR "$T_D0/invalid" 2>&1 | t_filter_fs
-setfattr -n scoutfs.hide.indx.$(printf 'x%.0s' $(seq 1 240)).$MAJOR.$MINOR "$T_D0/invalid" 2>&1 | t_filter_fs
-rm -f "$T_D0/invalid"
-
-echo "== testing boundary values"
-touch "$T_D0/boundary"
-INO=$(stat -c "%i" "$T_D0/boundary")
-setfattr -n scoutfs.hide.indx.test.0.0 "$T_D0/boundary"
-read_xattr_index 0 0.0.0 0.0.-1 | awk '($3 == "'$INO'") {print "0.0 found"}'
-setfattr -x scoutfs.hide.indx.test.0.0 "$T_D0/boundary"
-setfattr -n scoutfs.hide.indx.test.255.18446744073709551615 "$T_D0/boundary"
-read_xattr_index 0 255.0.0 255.-1.-1 | awk '($3 == "'$INO'") {print "255.max found"}'
-setfattr -x scoutfs.hide.indx.test.255.18446744073709551615 "$T_D0/boundary"
-rm -f "$T_D0/boundary"
-
-echo "== indx xattr must have no value"
-touch "$T_D0/noval"
-setfattr -n scoutfs.hide.indx.test.$MAJOR.$MINOR -v "" "$T_D0/noval" 2>&1 | t_filter_fs
-setfattr -n scoutfs.hide.indx.test.$MAJOR.$MINOR -v 0 "$T_D0/noval" 2>&1 | t_filter_fs
-setfattr -n scoutfs.hide.indx.test.$MAJOR.$MINOR -v 1 "$T_D0/noval" 2>&1 | t_filter_fs
-rm -f "$T_D0/noval"
-
-echo "== set indx xattr and verify index entry"
-touch "$T_D0/file"
-INO=$(stat -c "%i" "$T_D0/file")
-setfattr -n scoutfs.hide.indx.test.$MAJOR.$MINOR "$T_D0/file"
-read_xattr_index 0 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'") {print "found"}'
-
-echo "== setting same indx xattr again is a no-op"
-setfattr -n scoutfs.hide.indx.test.$MAJOR.$MINOR "$T_D0/file"
-read_xattr_index 0 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'") {print "found"}'
-
-echo "== removing non-existent indx xattr succeeds"
-setfattr -x scoutfs.hide.indx.nonexistent.$MAJOR.999 "$T_D0/file" 2>&1 | t_filter_fs
-read_xattr_index 0 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'") {print "still found"}'
-
-echo "== explicit xattr removal cleans up index entry"
-setfattr -x scoutfs.hide.indx.test.$MAJOR.$MINOR "$T_D0/file"
-read_xattr_index 0 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'") {print "found orphan"}'
-rm -f "$T_D0/file"
-
-echo "== file deletion cleans up index entry"
-touch "$T_D0/file2"
-INO=$(stat -c "%i" "$T_D0/file2")
-setfattr -n scoutfs.hide.indx.test.$MAJOR.$MINOR "$T_D0/file2"
-read_xattr_index 0 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'") {print "found before delete"}'
-rm -f "$T_D0/file2"
-read_xattr_index 0 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'") {print "found orphan after delete"}'
-
-echo "== multiple indx xattrs on one file cleaned up by deletion"
-touch "$T_D0/file3"
-INO=$(stat -c "%i" "$T_D0/file3")
-setfattr -n scoutfs.hide.indx.a.$MAJOR.200 "$T_D0/file3"
-setfattr -n scoutfs.hide.indx.b.$MAJOR.300 "$T_D0/file3"
-BEFORE=$(read_xattr_index 0 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'")' | wc -l)
-echo "entries before delete: $BEFORE"
-rm -f "$T_D0/file3"
-AFTER=$(read_xattr_index 0 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'")' | wc -l)
-echo "entries after delete: $AFTER"
-
-echo "== partial removal leaves other entries"
-touch "$T_D0/partial"
-INO=$(stat -c "%i" "$T_D0/partial")
-setfattr -n scoutfs.hide.indx.a.$MAJOR.200 "$T_D0/partial"
-setfattr -n scoutfs.hide.indx.b.$MAJOR.300 "$T_D0/partial"
-setfattr -x scoutfs.hide.indx.a.$MAJOR.200 "$T_D0/partial"
-read_xattr_index 0 $MAJOR.200.0 $MAJOR.200.-1 | awk '($3 == "'$INO'") {print "200 found"}'
-read_xattr_index 0 $MAJOR.300.0 $MAJOR.300.-1 | awk '($3 == "'$INO'") {print "300 found"}'
-rm -f "$T_D0/partial"
-
-echo "== multiple files at same index position"
-touch "$T_D0/multi_a" "$T_D0/multi_b"
-INO_A=$(stat -c "%i" "$T_D0/multi_a")
-INO_B=$(stat -c "%i" "$T_D0/multi_b")
-setfattr -n scoutfs.hide.indx.test.$MAJOR.$MINOR "$T_D0/multi_a"
-setfattr -n scoutfs.hide.indx.test.$MAJOR.$MINOR "$T_D0/multi_b"
-COUNT=$(read_xattr_index 0 $MAJOR.$MINOR.0 $MAJOR.$MINOR.-1 | wc -l)
-echo "files at same position: $COUNT"
-rm -f "$T_D0/multi_a"
-read_xattr_index 0 $MAJOR.$MINOR.0 $MAJOR.$MINOR.-1 | awk '($3 == "'$INO_A'") {print "deleted file still found"}'
-read_xattr_index 0 $MAJOR.$MINOR.0 $MAJOR.$MINOR.-1 | awk '($3 == "'$INO_B'") {print "surviving file found"}'
-rm -f "$T_D0/multi_b"
-
-echo "== cross-mount visibility"
-touch "$T_D0/file4"
-INO=$(stat -c "%i" "$T_D0/file4")
-setfattr -n scoutfs.hide.indx.test.$MAJOR.$MINOR "$T_D0/file4"
-read_xattr_index 1 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'") {print "found on mount 1"}'
-rm -f "$T_D0/file4"
-read_xattr_index 1 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'") {print "found orphan on mount 1"}'
-
-echo "== duplicate position deduplication"
-touch "$T_D0/file5"
-INO=$(stat -c "%i" "$T_D0/file5")
-setfattr -n scoutfs.hide.indx.aa.$MAJOR.$MINOR "$T_D0/file5"
-setfattr -n scoutfs.hide.indx.bb.$MAJOR.$MINOR "$T_D0/file5"
-COUNT=$(read_xattr_index 0 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'")' | wc -l)
-echo "entries for same position: $COUNT"
-rm -f "$T_D0/file5"
-
-t_pass
--- a/tests/tests/change-devices.sh
+++ b/tests/tests/change-devices.sh
@@ -11,8 +11,9 @@ truncate -s $sz "$T_TMP.equal"
 truncate -s $large_sz "$T_TMP.large"

 echo "== make scratch fs"
-t_scratch_mkfs
-mkdir -p "$T_MSCR"
+t_quiet scoutfs mkfs -f -Q 0,127.0.0.1,53000 "$T_EX_META_DEV" "$T_EX_DATA_DEV"
+SCR="$T_TMPDIR/mnt.scratch"
+mkdir -p "$SCR"

 echo "== small new data device fails"
 t_rc scoutfs prepare-empty-data-device "$T_EX_META_DEV" "$T_TMP.small"
@@ -22,13 +23,13 @@ t_rc scoutfs prepare-empty-data-device --check "$T_EX_META_DEV" "$T_TMP.small"
 t_rc scoutfs prepare-empty-data-device --check "$T_EX_META_DEV"

 echo "== preparing while mounted fails"
-mount -t scoutfs -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 "$T_EX_DATA_DEV" "$T_MSCR"
+mount -t scoutfs -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 "$T_EX_DATA_DEV" "$SCR"
 t_rc scoutfs prepare-empty-data-device "$T_EX_META_DEV" "$T_TMP.equal"
-umount "$T_MSCR"
+umount "$SCR"

 echo "== preparing without recovery fails"
-mount -t scoutfs -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 "$T_EX_DATA_DEV" "$T_MSCR"
-umount -f "$T_MSCR"
+mount -t scoutfs -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 "$T_EX_DATA_DEV" "$SCR"
+umount -f "$SCR"
 t_rc scoutfs prepare-empty-data-device "$T_EX_META_DEV" "$T_TMP.equal"

 echo "== check sees metadata errors"
@@ -36,16 +37,16 @@ t_rc scoutfs prepare-empty-data-device --check "$T_EX_META_DEV"
 t_rc scoutfs prepare-empty-data-device --check "$T_EX_META_DEV" "$T_TMP.equal"

 echo "== preparing with file data fails"
-mount -t scoutfs -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 "$T_EX_DATA_DEV" "$T_MSCR"
-echo hi > "$T_MSCR"/file
-umount "$T_MSCR"
+mount -t scoutfs -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 "$T_EX_DATA_DEV" "$SCR"
+echo hi > "$SCR"/file
+umount "$SCR"
 scoutfs print "$T_EX_META_DEV" > "$T_TMP.print"
 t_rc scoutfs prepare-empty-data-device "$T_EX_META_DEV" "$T_TMP.equal"

 echo "== preparing after emptied"
-mount -t scoutfs -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 "$T_EX_DATA_DEV" "$T_MSCR"
-rm -f "$T_MSCR"/file
-umount "$T_MSCR"
+mount -t scoutfs -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 "$T_EX_DATA_DEV" "$SCR"
+rm -f "$SCR"/file
+umount "$SCR"
 t_rc scoutfs prepare-empty-data-device "$T_EX_META_DEV" "$T_TMP.equal"

 echo "== checks pass"
@@ -54,22 +55,22 @@ t_rc scoutfs prepare-empty-data-device --check "$T_EX_META_DEV" "$T_TMP.equal"

 echo "== using prepared"
 scr_loop=$(losetup --find --show "$T_TMP.equal")
-mount -t scoutfs -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 "$scr_loop" "$T_MSCR"
-touch "$T_MSCR"/equal_prepared
-equal_tot=$(scoutfs statfs -s total_data_blocks -p "$T_MSCR")
-umount "$T_MSCR"
+mount -t scoutfs -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 "$scr_loop" "$SCR"
+touch "$SCR"/equal_prepared
+equal_tot=$(scoutfs statfs -s total_data_blocks -p "$SCR")
+umount "$SCR"
 losetup -d "$scr_loop"

 echo "== preparing larger and resizing"
 t_rc scoutfs prepare-empty-data-device "$T_EX_META_DEV" "$T_TMP.large"
 scr_loop=$(losetup --find --show "$T_TMP.large")
-mount -t scoutfs -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 "$scr_loop" "$T_MSCR"
-touch "$T_MSCR"/large_prepared
-ls "$T_MSCR"
-scoutfs resize-devices -p "$T_MSCR" -d $large_sz
-large_tot=$(scoutfs statfs -s total_data_blocks -p "$T_MSCR")
+mount -t scoutfs -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 "$scr_loop" "$SCR"
+touch "$SCR"/large_prepared
+ls "$SCR"
+scoutfs resize-devices -p "$SCR" -d $large_sz
+large_tot=$(scoutfs statfs -s total_data_blocks -p "$SCR")
 test "$large_tot" -gt "$equal_tot" ; echo "resized larger test rc: $?"
-umount "$T_MSCR"
+umount "$SCR"
 losetup -d "$scr_loop"

 echo "== cleanup"
--- a/tests/tests/enospc.sh
+++ b/tests/tests/enospc.sh
@@ -54,16 +54,21 @@ after=$(free_blocks Data "$T_M0")
 test "$before" == "$after" || \
 	t_fail "$after free data blocks after rm, expected $before"

+# XXX this is all pretty manual, would be nice to have helpers
 echo "== make small meta fs"
 # meta device just big enough for reserves and the metadata we'll fill
-t_scratch_mkfs -A -m 10G
-t_scratch_mount
+scoutfs mkfs -A -f -Q 0,127.0.0.1,53000 -m 10G "$T_EX_META_DEV" "$T_EX_DATA_DEV" > $T_TMP.mkfs.out 2>&1 || \
+	t_fail "mkfs failed"
+SCR="$T_TMPDIR/mnt.scratch"
+mkdir -p "$SCR"
+mount -t scoutfs -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 \
+	"$T_EX_DATA_DEV" "$SCR"

 echo "== create large xattrs until we fill up metadata"
-mkdir -p "$T_MSCR/xattrs"
+mkdir -p "$SCR/xattrs"

 for f in $(seq 1 100000); do
-	file="$T_MSCR/xattrs/file-$f"
+	file="$SCR/xattrs/file-$f"
 	touch "$file"

 	LC_ALL=C create_xattr_loop -c 1000 -n user.scoutfs-enospc -p "$file" -s 65535 > $T_TMP.cxl 2>&1
@@ -79,21 +84,17 @@ for f in $(seq 1 100000); do
 done

 echo "== remove files with xattrs after enospc"
-rm -rf "$T_MSCR/xattrs"
+rm -rf "$SCR/xattrs"

 echo "== make sure we can create again"
-file="$T_MSCR/file-after"
-C=120
-while (( C-- )); do
-	touch $file 2> /dev/null && break
-	sleep 1
-done
+file="$SCR/file-after"
 touch $file
 setfattr -n user.scoutfs-enospc -v 1 "$file"
 sync
 rm -f "$file"

 echo "== cleanup small meta fs"
-t_scratch_umount
+umount "$SCR"
+rmdir "$SCR"

 t_pass
--- a/tests/tests/fence-and-reclaim.sh
+++ b/tests/tests/fence-and-reclaim.sh
@@ -5,9 +5,6 @@
 t_require_commands sleep touch grep sync scoutfs
 t_require_mounts 2

-# regularly see ~20/~30s
-VERIFY_TIMEOUT_SECS=90
-
 #
 # Make sure that all mounts can read the results of a write from each
 # mount.
@@ -43,10 +40,8 @@ verify_fenced_run()

 	for rid in $rids; do
 		grep -q ".* running rid '$rid'.* args 'ignored run args'" "$T_FENCED_LOG" || \
-			return 1
+			t_fail "fenced didn't execute RUN script for rid $rid"
 	done
-
-	return 0
 }

 echo "== make sure all mounts can see each other"
@@ -59,7 +54,14 @@ rid=$(t_mount_rid $cl)
 echo "cl $cl sv $sv rid $rid" >> "$T_TMP.log"
 sync
 t_force_umount $cl
-t_wait_until_timeout $VERIFY_TIMEOUT_SECS verify_fenced_run $rid
+# wait for client reconnection to time out
+while grep -q $rid $(t_debugfs_path $sv)/connections; do
+	sleep .5
+done
+while t_rid_is_fencing $rid; do
+	sleep .5
+done
+verify_fenced_run $rid
 t_mount $cl
 check_read_write

@@ -81,7 +83,15 @@ for cl in $(t_fs_nrs); do
 	t_force_umount $cl
 done

-t_wait_until_timeout $VERIFY_TIMEOUT_SECS verify_fenced_run $rids
+# wait for all client reconnections to time out
+while egrep -q "($pattern)" $(t_debugfs_path $sv)/connections; do
+	sleep .5
+done
+# wait for all fence requests to complete
+while test -d $(echo /sys/fs/scoutfs/*/fence/* | cut -d " " -f 1); do
+	sleep .5
+done
+verify_fenced_run $rids
 # remount all the clients
 for cl in $(t_fs_nrs); do
 	if [ $cl == $sv ]; then
@@ -97,7 +107,12 @@ rid=$(t_mount_rid $sv)
 echo "sv $sv rid $rid" >> "$T_TMP.log"
 sync
 t_force_umount $sv
-t_wait_until_timeout $VERIFY_TIMEOUT_SECS verify_fenced_run $rid
+t_wait_for_leader
+# wait until new server is done fencing unmounted leader rid
+while t_rid_is_fencing $rid; do
+	sleep .5
+done
+verify_fenced_run $rid
 t_mount $sv
 check_read_write

@@ -112,7 +127,11 @@ for nr in $(t_fs_nrs); do
 	t_force_umount $nr
 done
 t_mount_all
-t_wait_until_timeout $VERIFY_TIMEOUT_SECS verify_fenced_run $rids
+# wait for all fence requests to complete
+while test -d $(echo /sys/fs/scoutfs/*/fence/* | cut -d " " -f 1); do
+	sleep .5
+done
+verify_fenced_run $rids
 check_read_write

 t_pass
--- a/tests/tests/format-version-forward-back.sh
+++ b/tests/tests/format-version-forward-back.sh
@@ -11,8 +11,8 @@
 # format version.
 #

-# not supported on el8 or higher
-if [ $(source /etc/os-release ; echo ${VERSION_ID:0:1}) -gt 7 ]; then
+# not supported on el9!
+if [ $(source /etc/os-release ; echo ${VERSION_ID:0:1}) -gt 8 ]; then
 	t_skip_permitted "Unsupported OS version"
 fi

@@ -89,7 +89,7 @@ for vers in $(seq $MIN $((MAX - 1))); do
 	old_module="$builds/$vers/scoutfs.ko"

 	echo "mkfs $vers" >> "$T_TMP.log"
-	t_quiet $old_scoutfs mkfs -f -Q 0,127.0.0.1,$T_SCRATCH_PORT "$T_EX_META_DEV" "$T_EX_DATA_DEV" \
+	t_quiet $old_scoutfs mkfs -f -Q 0,127.0.0.1,53000 "$T_EX_META_DEV" "$T_EX_DATA_DEV" \
 		|| t_fail "mkfs $vers failed"

 	echo "mount $vers with $vers" >> "$T_TMP.log"
--- a/tests/tests/get-referring-entries.sh
+++ b/tests/tests/get-referring-entries.sh
@@ -72,7 +72,7 @@ touch $T_D0/dir/file
 mkdir $T_D0/dir/dir
 ln -s $T_D0/dir/file $T_D0/dir/symlink
 mknod $T_D0/dir/char c 1 3 # null
-mknod $T_D0/dir/block b 42 0 # SAMPLE block dev - nonexistant/demo use only number
+mknod $T_D0/dir/block b 7 0 # loop0
 for name in $(ls -UA $T_D0/dir | sort); do
 	ino=$(stat -c '%i' $T_D0/dir/$name)
 	$GRE $ino | filter_types
--- a/tests/tests/inode-deletion.sh
+++ b/tests/tests/inode-deletion.sh
@@ -53,40 +53,26 @@ exec {FD1}>&-  # close
 exec {FD2}>&-  # close
 check_ino_index "$ino" "$dseq" "$T_M0"

-# Hurry along the orphan scanners. If any are currently asleep, we will
-# have to wait at least their current scan interval before they wake up,
-# run, and notice their new interval.
-t_save_all_sysfs_mount_options orphan_scan_delay_ms
-t_set_all_sysfs_mount_options orphan_scan_delay_ms 500
-t_wait_for_orphan_scan_runs
-
 echo "== remote unopened unlink deletes"
 echo "contents" > "$T_D0/file"
 ino=$(stat -c "%i" "$T_D0/file")
 dseq=$(scoutfs stat -s data_seq "$T_D0/file")
 rm -f "$T_D1/file"
-# cross-mount deletion falls back to the orphan scanner when the
-# creating mount still has the inode cached, wait for it to complete
-t_force_log_merge
-# wait for orphan scanners to pick up the unlinked inode and become idle
-t_wait_for_no_orphans
 check_ino_index "$ino" "$dseq" "$T_M0"
 check_ino_index "$ino" "$dseq" "$T_M1"

 echo "== unlink wait for open on other mount"
-echo "contents" > "$T_D0/badfile"
-ino=$(stat -c "%i" "$T_D0/badfile")
-dseq=$(scoutfs stat -s data_seq "$T_D0/badfile")
-exec {FD}<"$T_D0/badfile"
-rm -f "$T_D1/badfile"
+echo "contents" > "$T_D0/file"
+ino=$(stat -c "%i" "$T_D0/file")
+dseq=$(scoutfs stat -s data_seq "$T_D0/file")
+exec {FD}<"$T_D0/file"
+rm -f "$T_D1/file"
 echo "mount 0 contents after mount 1 rm: $(cat <&$FD)"
 check_ino_index "$ino" "$dseq" "$T_M0"
 check_ino_index "$ino" "$dseq" "$T_M1"
 exec {FD}>&-  # close
 # we know that revalidating will unhash the remote dentry
-stat "$T_D0/badfile" 2>&1 | sed 's/cannot statx/cannot stat/' | t_filter_fs
-t_force_log_merge
-t_wait_for_no_orphans
+stat "$T_D0/file" 2>&1 | sed 's/cannot statx/cannot stat/' | t_filter_fs
 check_ino_index "$ino" "$dseq" "$T_M0"
 check_ino_index "$ino" "$dseq" "$T_M1"

@@ -97,20 +83,16 @@ rm -f "$T_D0/dir"/files-*
 rmdir "$T_D0/dir"

 echo "== open files survive remote scanning orphans"
-echo "contents" > "$T_D0/lastfile"
-ino=$(stat -c "%i" "$T_D0/lastfile")
-dseq=$(scoutfs stat -s data_seq "$T_D0/lastfile")
-exec {FD}<"$T_D0/lastfile"
-rm -f "$T_D0/lastfile"
+echo "contents" > "$T_D0/file"
+ino=$(stat -c "%i" "$T_D0/file")
+dseq=$(scoutfs stat -s data_seq "$T_D0/file")
+exec {FD}<"$T_D0/file"
+rm -f "$T_D0/file"
 t_umount 1
 t_mount 1
 echo "mount 0 contents after mount 1 remounted: $(cat <&$FD)"
 exec {FD}>&-  # close
-t_force_log_merge
-t_wait_for_no_orphans
 check_ino_index "$ino" "$dseq" "$T_M0"
 check_ino_index "$ino" "$dseq" "$T_M1"

-t_restore_all_sysfs_mount_options orphan_scan_delay_ms
-
 t_pass
--- a/tests/tests/large-fragmented-free.sh
+++ b/tests/tests/large-fragmented-free.sh
@@ -10,6 +10,30 @@ EXTENTS_PER_BTREE_BLOCK=600
 EXTENTS_PER_LIST_BLOCK=8192
 FREED_EXTENTS=$((EXTENTS_PER_BTREE_BLOCK * EXTENTS_PER_LIST_BLOCK))

+#
+# This test specifically creates a pathologically sparse file that will
+# be as expensive as possible to free.  This is usually fine on
+# dedicated or reasonable hardware, but trying to run this in
+# virtualized debug kernels can take a very long time.  This test is
+# about making sure that the server doesn't fail, not that the platform
+# can handle the scale of work that our btree formats happen to require
+# while execution is bogged down with use-after-free memory reference
+# tracking.  So we give the test a lot more breathing room before
+# deciding that it's hung.
+#
+echo "== setting longer hung task timeout"
+if [ -w /proc/sys/kernel/hung_task_timeout_secs ]; then
+	secs=$(cat /proc/sys/kernel/hung_task_timeout_secs)
+	test "$secs" -gt 0 || \
+		t_fail "confusing value '$secs' from /proc/sys/kernel/hung_task_timeout_secs"
+	restore_hung_task_timeout()
+	{
+		echo "$secs" > /proc/sys/kernel/hung_task_timeout_secs
+	}
+	trap restore_hung_task_timeout EXIT
+	echo "$((secs * 5))" > /proc/sys/kernel/hung_task_timeout_secs
+fi
+
 echo "== creating fragmented extents"
 fragmented_data_extents $FREED_EXTENTS $EXTENTS_PER_BTREE_BLOCK "$T_D0/alloc" "$T_D0/move"

--- a/tests/tests/lock-recover-invalidate.sh
+++ b/tests/tests/lock-recover-invalidate.sh
@@ -38,6 +38,6 @@ while [ "$SECONDS" -lt "$END" ]; do
 done

 echo "== stopping background load"
-t_silent_kill $load_pids
+kill $load_pids

 t_pass
--- a/tests/tests/mmap.sh
+++ b/tests/tests/mmap.sh
@@ -5,7 +5,7 @@
 t_require_commands mmap_stress mmap_validate scoutfs xfs_io

 echo "== mmap_stress"
-mmap_stress 8192 30 "$T_D0/mmap_stress" "$T_D0/mmap_stress" "$T_D0/mmap_stress" "$T_D3/mmap_stress" "$T_D3/mmap_stress" | sed 's/:.*//g' | sort
+mmap_stress 8192 2000 "$T_D0/mmap_stress" "$T_D1/mmap_stress" "$T_D2/mmap_stress" "$T_D3/mmap_stress" "$T_D4/mmap_stress" | sed 's/:.*//g' | sort

 echo "== basic mmap/read/write consistency checks"
 mmap_validate 256 1000 "$T_D0/mmap_val1" "$T_D1/mmap_val1"
--- a/tests/tests/offline-extent-waiting.sh
+++ b/tests/tests/offline-extent-waiting.sh
@@ -157,7 +157,7 @@ echo "truncate should be waiting for first block:"
 expect_wait "$DIR/file" "change_size" $ino 0
 scoutfs stage "$DIR/golden" "$DIR/file" -V "$vers" -o 0 -l $BYTES
 sleep .1
-echo "truncate should no longer be waiting:"
+echo "trunate should no longer be waiting:"
 scoutfs data-waiting -B 0 -I 0 -p "$DIR" | wc -l
 cat "$DIR/golden" > "$DIR/file"
 vers=$(scoutfs stat -s data_version "$DIR/file")
@@ -168,13 +168,10 @@ scoutfs release "$DIR/file" -V "$vers" -o 0 -l $BYTES
 # overwrite, not truncate+write
 dd if="$DIR/other" of="$DIR/file" \
 	bs=$BS count=$BLOCKS conv=notrunc status=none &
-pid="$!"
 sleep .1
 echo "should be waiting for write"
 expect_wait "$DIR/file" "write" $ino 0
 scoutfs stage "$DIR/golden" "$DIR/file" -V "$vers" -o 0 -l $BYTES
-# wait for the background dd to complete
-wait "$pid" 2> /dev/null
 cmp "$DIR/file" "$DIR/other"

 echo "== cleanup"
--- a/tests/tests/orphan-inodes.sh
+++ b/tests/tests/orphan-inodes.sh
@@ -5,6 +5,18 @@
 t_require_commands sleep touch sync stat handle_cat kill rm
 t_require_mounts 2

+#
+# usually bash prints an annoying output message when jobs
+# are killed.  We can avoid that by redirecting stderr for
+# the bash process when it reaps the jobs that are killed.
+#
+silent_kill() {
+	exec {ERR}>&2 2>/dev/null
+	kill "$@"
+	wait "$@"
+	exec 2>&$ERR {ERR}>&-
+}
+
 #
 # We don't have a great way to test that inode items still exist.   We
 # don't prevent opening handles with nlink 0 today, so we'll use that.
@@ -40,7 +52,7 @@ inode_exists $ino || echo "$ino didn't exist"

 echo "== orphan from failed evict deletion is picked up"
 # pending kill signal stops evict from getting locks and deleting
-t_silent_kill $pid
+silent_kill $pid
 t_set_sysfs_mount_option 0 orphan_scan_delay_ms 1000
 sleep 5
 inode_exists $ino && echo "$ino still exists"
@@ -58,7 +70,7 @@ for nr in $(t_fs_nrs); do
 	rm -f "$path"
 done
 sync
-t_silent_kill $pids
+silent_kill $pids
 for nr in $(t_fs_nrs); do
 	t_force_umount $nr
 done
@@ -67,49 +79,10 @@ t_mount_all
 while test -d $(echo /sys/fs/scoutfs/*/fence/* | cut -d " " -f 1); do
 	sleep .5
 done
-
-
-sv=$(t_server_nr)
-
-# wait for reclaim_open_log_tree() to complete for each mount
-while [ $(t_counter reclaimed_open_logs $sv) -lt $T_NR_MOUNTS ]; do
-	sleep 1
-done
-
-# wait for finalize_and_start_log_merge() to find no active merges in flight
-# and not find any finalized trees
-while [ $(t_counter log_merge_no_finalized $sv) -lt 1 ]; do
-	sleep 1
-done
-
 # wait for orphan scans to run
 t_set_all_sysfs_mount_options orphan_scan_delay_ms 1000
-# wait until we see two consecutive orphan scan attempts without
-# any inode deletion forward progress in each mount
-for nr in $(t_fs_nrs); do
-	C=0
-	LOSA=$(t_counter orphan_scan_attempts $nr)
-	LDOP=$(t_counter inode_deleted $nr)
-
-	while [ $C -lt 2 ]; do
-		sleep 1
-
-		OSA=$(t_counter orphan_scan_attempts $nr)
-		DOP=$(t_counter inode_deleted $nr)
-
-		if [ $OSA != $LOSA ]; then
-			if [ $DOP == $LDOP ]; then
-				(( C++ ))
-			else
-				C=0
-			fi
-		fi
-
-		LOSA=$OSA
-		LDOP=$DOP
-	done
-done
-
+# also have to wait for delayed log merge work from mount
+sleep 15
 for ino in $inos; do
 	inode_exists $ino && echo "$ino still exists"
 done
@@ -158,7 +131,7 @@ while [ $SECONDS -lt $END ]; do
 	done

 	# trigger eviction deletion of each file in each mount
-	t_silent_kill $pids
+	silent_kill $pids

 	wait || t_fail "handle_fsetxattr failed"

--- a/tests/tests/orphan-log-trees.sh
+++ b/tests/tests/orphan-log-trees.sh
@@ -1,52 +0,0 @@
-#
-# Test that orphaned log_trees entries from unmounted rids are
-# finalized and merged.
-#
-# An orphan log_trees entry is one whose rid has no mounted_clients
-# entry.  This can happen from incomplete reclaim across server
-# failovers.  We simulate it with the reclaim_skip_finalize trigger
-# which makes reclaim_open_log_tree skip the finalization step.
-#
-
-t_require_commands touch scoutfs
-t_require_mounts 2
-
-TIMEOUT=90
-
-echo "== create orphan log_trees entry via trigger"
-sv=$(t_server_nr)
-cl=$(t_first_client_nr)
-rid=$(t_mount_rid $cl)
-
-touch "$T_D0/file" "$T_D1/file"
-sync
-
-# arm the trigger so reclaim skips finalization
-t_trigger_arm_silent reclaim_skip_finalize $sv
-
-# force unmount the client, server will fence and reclaim it
-# but the trigger makes reclaim leave log_trees unfinalized
-t_force_umount $cl
-
-# wait for fencing to run
-verify_fenced() {
-	grep -q "running rid '$rid'" "$T_FENCED_LOG" 2>/dev/null
-}
-t_wait_until_timeout $TIMEOUT verify_fenced
-
-# give the server time to complete reclaim after fence
-sleep 5
-
-# remount the client so t_force_log_merge can sync all mounts.
-# the client gets a new rid; the old rid's log_trees is the orphan.
-t_mount $cl
-
-echo "== verify orphan is reclaimed and merge completes"
-t_force_log_merge
-
-echo "== verify orphan reclaim was logged"
-if ! dmesg | grep -q "reclaiming orphan log trees for rid $rid"; then
-	t_fail "expected orphan reclaim message for rid $rid in dmesg"
-fi
-
-t_pass
--- a/tests/tests/punch-offline.sh
+++ b/tests/tests/punch-offline.sh
@@ -1,152 +0,0 @@
-
-t_require_commands scoutfs dd fallocate
-
-FILE="$T_D0/file"
-DIR="$T_D0/dir"
-
-echo "== missing options should fail =="
-rm -rf $DIR && mkdir -p $DIR
-scoutfs punch-offline $DIR -l 4096 -V 0
-scoutfs punch-offline $DIR -o 0 -V 0
-scoutfs punch-offline $DIR -o 0 -l 4096
-
-echo "== can't hole punch dir or special =="
-rm -rf $DIR && mkdir -p $DIR
-scoutfs punch-offline $DIR -o 0 -l 4096 -V 0
-
-echo "== punching an empty file does nothing =="
-rm -f $FILE && touch $FILE
-scoutfs punch-offline $FILE -o 0 -l 4096 -V 0
-
-echo "== punch outside of i_size does nothing =="
-dd if=/dev/zero of=$FILE bs=4096 count=1 status=none
-scoutfs punch-offline $FILE -o 4096 -l 4096 -V 1
-
-echo "== can't hole punch online extent =="
-scoutfs get-fiemap -Lb $FILE
-scoutfs punch-offline $FILE -o 0 -l 4096 -V 1
-scoutfs get-fiemap -Lb $FILE
-
-echo "== can't hole punch unwritten extent =="
-rm -rf $FILE && touch $FILE
-fallocate -l $((4096 * 3)) $FILE
-vers=$(scoutfs stat -s data_version "$FILE")
-scoutfs get-fiemap -Lb $FILE
-scoutfs punch-offline $FILE -o 4096 -l 4096 -V $vers
-scoutfs get-fiemap -Lb $FILE
-
-echo "== hole punch offline extent =="
-rm -rf $FILE && touch $FILE
-fallocate -l $((4096 * 3)) $FILE
-vers=$(scoutfs stat -s data_version "$FILE")
-scoutfs release $FILE --data-version $vers
-scoutfs get-fiemap -Lb $FILE
-scoutfs punch-offline $FILE -o 4096 -l 4096 -V $vers
-scoutfs get-fiemap -Lb $FILE
-
-echo "== can't hole punch non-aligned bsz offset or len =="
-rm -rf $FILE && touch $FILE
-fallocate -l $((4096 * 3)) $FILE
-vers=$(scoutfs stat -s data_version "$FILE")
-scoutfs release $FILE --data-version $vers
-scoutfs get-fiemap -Lb $FILE
-scoutfs punch-offline $FILE -o 4095 -l 4096 -V $vers
-scoutfs punch-offline $FILE -o 1 -l 4096 -V $vers
-scoutfs punch-offline $FILE -o 4096 -l 409700 -V $vers
-scoutfs punch-offline $FILE -o 4096 -l 4097 -V $vers
-scoutfs punch-offline $FILE -o 4096 -l 4095 -V $vers
-scoutfs punch-offline $FILE -o 4096 -l 1 -V $vers
-scoutfs punch-offline $FILE -o 4096 -l 0 -V $vers
-scoutfs get-fiemap -Lb $FILE
-
-echo "== can't hole punch mismatched data_version =="
-rm -rf $FILE && touch $FILE
-fallocate -l $((4096 * 3)) $FILE
-vers=$(scoutfs stat -s data_version "$FILE")
-scoutfs release $FILE --data-version $vers
-scoutfs get-fiemap -Lb $FILE
-scoutfs punch-offline $FILE -o 4096 -l 4096 -V 0
-scoutfs punch-offline $FILE -o 4096 -l 4096 -V 2
-scoutfs punch-offline $FILE -o 4096 -l 4096 -V 9999
-scoutfs get-fiemap -Lb $FILE
-
-echo "== Punch hole crossing multiple extents =="
-rm -rf $FILE && touch $FILE
-fallocate -l $((7 * 4096)) $FILE
-vers=$(scoutfs stat -s data_version "$FILE")
-scoutfs release $FILE --data-version $vers
-scoutfs get-fiemap -L $FILE
-scoutfs punch-offline $FILE -o $((1 * 4096)) -l 4096 -V $vers
-scoutfs punch-offline $FILE -o $((3 * 4096)) -l 4096 -V $vers
-scoutfs punch-offline $FILE -o $((5 * 4096)) -l 4096 -V $vers
-# 0.1.2.3
-scoutfs get-fiemap -L $FILE
-scoutfs punch-offline $FILE -o $((2 * 4096)) -l $((3 * 4096)) -V $vers
-# 0.....1
-scoutfs get-fiemap -L $FILE
-
-echo "== punch hole starting at a hole =="
-rm -rf $FILE && touch $FILE
-fallocate -l $((7 * 4096)) $FILE
-vers=$(scoutfs stat -s data_version "$FILE")
-scoutfs release $FILE --data-version $vers
-scoutfs get-fiemap -L $FILE
-scoutfs punch-offline $FILE -o $((1 * 4096)) -l 4096 -V $vers
-scoutfs punch-offline $FILE -o $((3 * 4096)) -l 4096 -V $vers
-scoutfs punch-offline $FILE -o $((5 * 4096)) -l 4096 -V $vers
-# 0.1.2.3
-scoutfs get-fiemap -L $FILE
-scoutfs punch-offline $FILE -o $((1 * 4096)) -l $((5 * 4096)) -V $vers
-# 0.....1
-scoutfs get-fiemap -L $FILE
-
-echo "== large punch =="
-rm -rf $FILE && touch $FILE
-fallocate -l $((6 * 1024 * 1024 * 1024)) $FILE
-vers=$(scoutfs stat -s data_version "$FILE")
-scoutfs release $FILE --data-version $vers
-scoutfs get-fiemap -L $FILE
-scoutfs punch-offline $FILE -o $((134123 * 4096)) -l $((68343 * 4096)) -V $vers
-scoutfs punch-offline $FILE -o $((467273 * 4096)) -l $((68343 * 4096)) -V $vers
-scoutfs punch-offline $FILE -o $((734623 * 4096)) -l $((68343 * 4096)) -V $vers
-scoutfs get-fiemap -L $FILE
-
-echo "== overlapping punches with lots of extents =="
-rm -rf $FILE && touch $FILE
-fallocate -l $((4096 * 1024)) $FILE
-vers=$(scoutfs stat -s data_version "$FILE")
-scoutfs release $FILE --data-version 1
-scoutfs get-fiemap -Lb $FILE
-# punch odd ones away
-for h in $(seq 1 2 1023); do
-	scoutfs punch-offline $FILE -o $((h * 4096)) -l 4096 -V $vers
-done
-scoutfs get-fiemap -Lb $FILE | tail -n 1
-# punch a large hole from 32 to 55, removing 7 extents
-scoutfs punch-offline $FILE -o $((32 * 4096)) -l $((13 * 4096)) -V $vers
-scoutfs get-fiemap -Lb $FILE | tail -n 1
-# punch every 8th @6
-for h in $(seq 6 8 1024); do
-	scoutfs punch-offline $FILE -o $((h * 4096)) -l 4096 -V $vers
-done
-# again @4
-scoutfs get-fiemap -Lb $FILE | tail -n 1
-for h in $(seq 4 8 1024); do
-	scoutfs punch-offline $FILE -o $((h * 4096)) -l 4096 -V $vers
-done
-scoutfs get-fiemap -Lb $FILE | tail -n 1
-# punching a large hole from 127 to 175, removing 12 extents
-scoutfs punch-offline $FILE -o $((127 * 4096)) -l $((48 * 4096)) -V $vers
-scoutfs get-fiemap -Lb $FILE
-# again @2
-for h in $(seq 2 8 1024); do
-	scoutfs punch-offline $FILE -o $((h * 4096)) -l 4096 -V $vers
-done
-scoutfs get-fiemap -L $FILE
-# and again @0, punching away everything remaining extent
-for h in $(seq 0 8 1024); do
-	scoutfs punch-offline $FILE -o $((h * 4096)) -l 4096 -V $vers
-done
-scoutfs get-fiemap -Lb $FILE
-
-t_pass
--- a/tests/tests/quorum-heartbeat-timeout.sh
+++ b/tests/tests/quorum-heartbeat-timeout.sh
@@ -62,7 +62,7 @@ test_timeout()
 	sleep 1

 	# tear down the current server/leader
-	t_force_umount $sv &
+	t_force_umount $sv

 	# see how long it takes for the next leader to start
 	start=$(time_ms)
@@ -73,7 +73,6 @@ test_timeout()
 	echo "to $to delay $delay" >> $T_TMP.delay

 	# restore the mount that we tore down
-	wait
 	t_mount $sv

 	# make sure the new leader delay was reasonable, allowing for some slack
--- a/tests/tests/renameat2-noreplace.sh
+++ b/tests/tests/renameat2-noreplace.sh
@@ -8,19 +8,19 @@ t_require_mounts 2
 echo "=== renameat2 noreplace flag test"

 # give each mount their own dir (lock group) to minimize create contention
-mkdir $T_D0/dir0
-mkdir $T_D1/dir1
+mkdir $T_M0/dir0
+mkdir $T_M1/dir1

 echo "=== run two asynchronous calls to renameat2 NOREPLACE"
 for i in $(seq 0 100); do
        # prepare inputs in isolation
-        touch "$T_D0/dir0/old0"
-        touch "$T_D1/dir1/old1"
+        touch "$T_M0/dir0/old0"
+        touch "$T_M1/dir1/old1"

        # race doing noreplace renames, both can't succeed
-        dumb_renameat2 -n "$T_D0/dir0/old0" "$T_D0/dir0/sharednew" 2> /dev/null &
+        dumb_renameat2 -n "$T_M0/dir0/old0" "$T_M0/dir0/sharednew" 2> /dev/null &
        pid0=$!
-        dumb_renameat2 -n "$T_D1/dir1/old1" "$T_D1/dir0/sharednew" 2> /dev/null &
+        dumb_renameat2 -n "$T_M1/dir1/old1" "$T_M1/dir0/sharednew" 2> /dev/null &
        pid1=$!

        wait $pid0
@@ -31,7 +31,7 @@ for i in $(seq 0 100); do
        test "$rc0" == 0 -a "$rc1" == 0 && t_fail "both renames succeeded"

        # blow away possible files for either race outcome
-        rm -f "$T_D0/dir0/old0" "$T_D1/dir1/old1" "$T_D0/dir0/sharednew" "$T_D1/dir1/sharednew"
+        rm -f "$T_M0/dir0/old0" "$T_M1/dir1/old1" "$T_M0/dir0/sharednew" "$T_M1/dir1/sharednew"
 done

 t_pass
--- a/tests/tests/resize-devices.sh
+++ b/tests/tests/resize-devices.sh
@@ -19,8 +19,8 @@ df_free() {
 }

 same_totals() {
-	cur_meta_tot=$(statfs_total meta "$T_MSCR")
-	cur_data_tot=$(statfs_total data "$T_MSCR")
+	cur_meta_tot=$(statfs_total meta "$SCR")
+	cur_data_tot=$(statfs_total data "$SCR")

 	test "$cur_meta_tot" == "$exp_meta_tot" || \
 		t_fail "cur total_meta_blocks $cur_meta_tot != expected $exp_meta_tot"
@@ -34,10 +34,10 @@ same_totals() {
 # some slop to account for reserved blocks and concurrent allocation.
 #
 devices_grew() {
-	cur_meta_tot=$(statfs_total meta "$T_MSCR")
-	cur_data_tot=$(statfs_total data "$T_MSCR")
-	cur_meta_df=$(df_free MetaData "$T_MSCR")
-	cur_data_df=$(df_free Data "$T_MSCR")
+	cur_meta_tot=$(statfs_total meta "$SCR")
+	cur_data_tot=$(statfs_total data "$SCR")
+	cur_meta_df=$(df_free MetaData "$SCR")
+	cur_data_df=$(df_free Data "$SCR")

 	local grow_meta_tot=$(echo "$exp_meta_tot * 2" | bc)
 	local grow_data_tot=$(echo "$exp_data_tot * 2" | bc)
@@ -70,13 +70,19 @@ size_data=$(blockdev --getsize64 "$T_EX_DATA_DEV")
 quarter_meta=$(echo "$size_meta / 4" | bc)
 quarter_data=$(echo "$size_data / 4" | bc)

+# XXX this is all pretty manual, would be nice to have helpers
 echo "== make initial small fs"
-t_scratch_mkfs -A -m $quarter_meta -d $quarter_data
-t_scratch_mount
+scoutfs mkfs -A -f -Q 0,127.0.0.1,53000 -m $quarter_meta -d $quarter_data \
+	"$T_EX_META_DEV" "$T_EX_DATA_DEV" > $T_TMP.mkfs.out 2>&1 || \
+		t_fail "mkfs failed"
+SCR="$T_TMPDIR/mnt.scratch"
+mkdir -p "$SCR"
+mount -t scoutfs -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 \
+	"$T_EX_DATA_DEV" "$SCR"

 # then calculate sizes based on blocks that mkfs used
-quarter_meta=$(echo "$(statfs_total meta "$T_MSCR") * 64 * 1024" | bc)
-quarter_data=$(echo "$(statfs_total data "$T_MSCR") * 4 * 1024" | bc)
+quarter_meta=$(echo "$(statfs_total meta "$SCR") * 64 * 1024" | bc)
+quarter_data=$(echo "$(statfs_total data "$SCR") * 4 * 1024" | bc)
 whole_meta=$(echo "$quarter_meta * 4" | bc)
 whole_data=$(echo "$quarter_data * 4" | bc)
 outsize_meta=$(echo "$whole_meta * 2" | bc)
@@ -87,58 +93,59 @@ shrink_meta=$(echo "$quarter_meta / 2" | bc)
 shrink_data=$(echo "$quarter_data / 2" | bc)

 # and save expected values for checks
-exp_meta_tot=$(statfs_total meta "$T_MSCR")
-exp_meta_df=$(df_free MetaData "$T_MSCR")
-exp_data_tot=$(statfs_total data "$T_MSCR")
-exp_data_df=$(df_free Data "$T_MSCR")
+exp_meta_tot=$(statfs_total meta "$SCR")
+exp_meta_df=$(df_free MetaData "$SCR")
+exp_data_tot=$(statfs_total data "$SCR")
+exp_data_df=$(df_free Data "$SCR")

 echo "== 0s do nothing"
-scoutfs resize-devices -p "$T_MSCR"
-scoutfs resize-devices -p "$T_MSCR" -m 0
-scoutfs resize-devices -p "$T_MSCR" -d 0
-scoutfs resize-devices -p "$T_MSCR" -m 0 -d 0
+scoutfs resize-devices -p "$SCR"
+scoutfs resize-devices -p "$SCR" -m 0
+scoutfs resize-devices -p "$SCR" -d 0
+scoutfs resize-devices -p "$SCR" -m 0 -d 0

 echo "== shrinking fails"
-scoutfs resize-devices -p "$T_MSCR" -m $shrink_meta
-scoutfs resize-devices -p "$T_MSCR" -d $shrink_data
-scoutfs resize-devices -p "$T_MSCR" -m $shrink_meta -d $shrink_data
+scoutfs resize-devices -p "$SCR" -m $shrink_meta
+scoutfs resize-devices -p "$SCR" -d $shrink_data
+scoutfs resize-devices -p "$SCR" -m $shrink_meta -d $shrink_data
 same_totals

 echo "== existing sizes do nothing"
-scoutfs resize-devices -p "$T_MSCR" -m $quarter_meta
-scoutfs resize-devices -p "$T_MSCR" -d $quarter_data
-scoutfs resize-devices -p "$T_MSCR" -m $quarter_meta -d $quarter_data
+scoutfs resize-devices -p "$SCR" -m $quarter_meta
+scoutfs resize-devices -p "$SCR" -d $quarter_data
+scoutfs resize-devices -p "$SCR" -m $quarter_meta -d $quarter_data
 same_totals

 echo "== growing outside device fails"
-scoutfs resize-devices -p "$T_MSCR" -m $outsize_meta
-scoutfs resize-devices -p "$T_MSCR" -d $outsize_data
-scoutfs resize-devices -p "$T_MSCR" -m $outsize_meta -d $outsize_data
+scoutfs resize-devices -p "$SCR" -m $outsize_meta
+scoutfs resize-devices -p "$SCR" -d $outsize_data
+scoutfs resize-devices -p "$SCR" -m $outsize_meta -d $outsize_data
 same_totals

 echo "== resizing meta works"
-scoutfs resize-devices -p "$T_MSCR" -m $half_meta
+scoutfs resize-devices -p "$SCR" -m $half_meta
 devices_grew meta

 echo "== resizing data works"
-scoutfs resize-devices -p "$T_MSCR" -d $half_data
+scoutfs resize-devices -p "$SCR" -d $half_data
 devices_grew data

 echo "== shrinking back fails"
-scoutfs resize-devices -p "$T_MSCR" -m $quarter_meta
-scoutfs resize-devices -p "$T_MSCR" -m $quarter_data
+scoutfs resize-devices -p "$SCR" -m $quarter_meta
+scoutfs resize-devices -p "$SCR" -m $quarter_data
 same_totals

 echo "== resizing again does nothing"
-scoutfs resize-devices -p "$T_MSCR" -m $half_meta
-scoutfs resize-devices -p "$T_MSCR" -m $half_data
+scoutfs resize-devices -p "$SCR" -m $half_meta
+scoutfs resize-devices -p "$SCR" -m $half_data
 same_totals

 echo "== resizing to full works"
-scoutfs resize-devices -p "$T_MSCR" -m $whole_meta -d $whole_data
+scoutfs resize-devices -p "$SCR" -m $whole_meta -d $whole_data
 devices_grew meta data

 echo "== cleanup extra fs"
-t_scratch_umount
+umount "$SCR"
+rmdir "$SCR"

 t_pass
--- a/tests/tests/simple-inode-index.sh
+++ b/tests/tests/simple-inode-index.sh
@@ -32,7 +32,7 @@ echo "== dirs shouldn't appear in data_seq queries"
 mkdir "$DIR"
 ino=$(stat -c "%i" "$DIR")
 t_sync_seq_index
-query_index data_seq | awk '($4 == "'$ino'")'
+query_index data_seq | grep "$ino\>"

 echo "== two created files are present and come after each other"
 touch "$DIR/first"
@@ -92,13 +92,13 @@ test "$before" -lt "$after" || \
 # didn't skip past deleted dirty items
 #
 echo "== make sure dirtying doesn't livelock walk"
-dd if=/dev/urandom of="$DIR/dirtying" bs=4K count=1 >> "$T_TMPDIR/seqres.full" 2>&1
+dd if=/dev/urandom of="$DIR/dirtying" bs=4K count=1 >> $seqres.full 2>&1
 nr=1
 while [ "$nr" -lt 100 ]; do
-	echo "dirty/walk attempt $nr" >> "$T_TMPDIR/seqres.full"
+	echo "dirty/walk attempt $nr" >> $seqres.full
 	sync
 	dd if=/dev/urandom of="$DIR/dirtying" bs=4K count=1 conv=notrunc \
-		>> "$T_TMPDIR/seqres.full" 2>&1
+		>> $seqres.full 2>&1
 	scoutfs walk-inodes data_seq 0 -1 $DIR/dirtying >& /dev/null 
 	((nr++))
 done
--- a/tests/tests/simple-staging.sh
+++ b/tests/tests/simple-staging.sh
@@ -12,12 +12,12 @@ create_file() {

 	if [ "$blocks" != 0 ]; then
 		dd if=/dev/urandom bs=4096 count=$blocks of="$file" \
-			>> "$T_TMPDIR/seqres.full" 2>&1
+			>> $seqres.full 2>&1
 	fi

 	if [ "$remainder" != 0 ]; then
 		dd if=/dev/urandom bs="$remainder" count=1 of="$file" \
-			conv=notrunc oflag=append >> "$T_TMPDIR/seqres.full" 2>&1
+			conv=notrunc oflag=append >> $seqres.full 2>&1
 	fi
 }

@@ -78,7 +78,7 @@ create_file "$FILE" $((4096 * 1024))
 cp "$FILE"  "$T_TMP"
 nr=1
 while [ "$nr" -lt 10 ]; do
-	echo "attempt $nr" >> "$T_TMPDIR/$seqres.full" 2>&1
+	echo "attempt $nr" >> $seqres.full 2>&1
 	release_vers "$FILE" stat 0 4096K
 	sync
 	echo 3 > /proc/sys/vm/drop_caches
--- a/tests/tests/totl-merge-read.sh
+++ b/tests/tests/totl-merge-read.sh
@@ -1,50 +0,0 @@
-#
-# Test that merge_read_item() correctly updates the sequence number when
-# combining delta items from multiple finalized log trees.  Each mount
-# sets a totl value in its own 3-bit lane (powers of 8) so that any
-# double-counting overflows the lane and is caught by: or(v, exp) != exp.
-#
-
-t_require_commands setfattr scoutfs
-t_require_mounts 5
-
-echo "== setup"
-for nr in $(t_fs_nrs); do
-	d=$(eval echo \$T_D$nr)
-	for i in $(seq 1 2500); do : > "$d/f$nr$i"; done
-done
-sync
-t_force_log_merge
-
-vals=(1 8 64 512 4096)
-expected=4681
-n=0
-for nr in $(t_fs_nrs); do
-	d=$(eval echo \$T_D$nr)
-	v=${vals[$((n++))]}
-	for i in $(seq 1 2500); do
-		setfattr -n "scoutfs.totl.t.$i.0.0" -v $v "$d/f$nr$i"
-	done
-done
-
-t_trigger_arm_silent log_merge_force_partial $(t_server_nr)
-
-bad="$T_TMPDIR/bad"
-for nr in $(t_fs_nrs); do
-	( while true; do
-		echo 1 > "$(t_debugfs_path $nr)/drop_weak_item_cache"
-		scoutfs read-xattr-totals -p "$(eval echo \$T_M$nr)" | \
-			awk -F'[ =,]+' -v e=$expected 'or($2+0,e) != e'
-	done ) >> "$bad" &
-done
-
-echo "expected $expected"
-t_force_log_merge
-t_silent_kill $(jobs -p)
-test -s "$bad" && echo "double-counted:" && cat "$bad"
-
-echo "== cleanup"
-for nr in $(t_fs_nrs); do
-	find "$(eval echo \$T_D$nr)" -name "f$nr*" -delete
-done
-t_pass
--- a/tests/tests/xfstests.sh
+++ b/tests/tests/xfstests.sh
@@ -50,9 +50,9 @@ t_quiet sync
 cat << EOF > local.config
 export FSTYP=scoutfs
 export MKFS_OPTIONS="-f"
-export MKFS_TEST_OPTIONS="-Q 0,127.0.0.1,$T_TEST_PORT"
-export MKFS_SCRATCH_OPTIONS="-Q 0,127.0.0.1,$T_SCRATCH_PORT"
-export MKFS_DEV_OPTIONS="-Q 0,127.0.0.1,$T_DEV_PORT"
+export MKFS_TEST_OPTIONS="-Q 0,127.0.0.1,42000"
+export MKFS_SCRATCH_OPTIONS="-Q 0,127.0.0.1,43000"
+export MKFS_DEV_OPTIONS="-Q 0,127.0.0.1,44000"
 export TEST_DEV=$T_DB0
 export TEST_DIR=$T_M0
 export SCRATCH_META_DEV=$T_EX_META_DEV
@@ -63,47 +63,73 @@ export MOUNT_OPTIONS="-o quorum_slot_nr=0,metadev_path=$T_MB0"
 export TEST_FS_MOUNT_OPTS="-o quorum_slot_nr=0,metadev_path=$T_MB0"
 EOF

-cp "$T_EXTRA/local.exclude" local.exclude
+cat << EOF > local.exclude
+generic/003	# missing atime update in buffered read
+generic/075	# file content mismatch failures (fds, etc)
+generic/103	# enospc causes trans commit failures
+generic/108	# mount fails on failing device?
+generic/112	# file content mismatch failures (fds, etc)
+generic/213	# enospc causes trans commit failures
+generic/318	# can't support user namespaces until v5.11
+generic/321	# requires selinux enabled for '+' in ls?
+generic/338	# BUG_ON update inode error handling
+generic/347	# _dmthin_mount doesn't work?
+generic/356	# swap
+generic/357	# swap
+generic/409	# bind mounts not scripted yet
+generic/410	# bind mounts not scripted yet
+generic/411	# bind mounts not scripted yet
+generic/423	# symlink inode size is strlen() + 1 on scoutfs
+generic/430	# xfs_io copy_range missing in el7
+generic/431	# xfs_io copy_range missing in el7
+generic/432	# xfs_io copy_range missing in el7
+generic/433	# xfs_io copy_range missing in el7
+generic/434	# xfs_io copy_range missing in el7
+generic/441	# dm-mapper
+generic/444	# el9's posix_acl_update_mode is buggy ?
+generic/467	# open_by_handle ESTALE
+generic/472	# swap
+generic/484	# dm-mapper
+generic/493	# swap
+generic/494	# swap
+generic/495	# swap
+generic/496	# swap
+generic/497	# swap
+generic/532	# xfs_io statx attrib_mask missing in el7
+generic/554	# swap
+generic/563	# cgroup+loopdev
+generic/564	# xfs_io copy_range missing in el7
+generic/565	# xfs_io copy_range missing in el7
+generic/568	# falloc not resulting in block count increase
+generic/569	# swap
+generic/570	# swap
+generic/620	# dm-hugedisk
+generic/633	# id-mapped mounts missing in el7
+generic/636	# swap
+generic/641	# swap
+generic/643	# swap
+EOF

-t_stdout_invoked
+t_restore_output
 echo "  (showing output of xfstests)"

 args="-E local.exclude ${T_XFSTESTS_ARGS:--g quick}"
 ./check $args
 # the fs is unmounted when check finishes

-t_stdout_compare
-
 #
-# ./check writes the results of the run to check.log.  It lists the
-# tests it ran, skipped, or failed.  Then it writes a line saying
-# everything passed or some failed.
-#
-
-#
-# If XFSTESTS_ARGS were specified then we just pass/fail to match the
-# check run.
-#
-if [ -n "$T_XFSTESTS_ARGS" ]; then
-	if tail -1 results/check.log | grep -q "Failed"; then
-		t_fail
-	else
-		t_pass
-	fi
-fi
-
-#
-# Otherwise, typically, when there were no args then we scrape the most
-# recent run and use it as the output to compare to make sure that we
-# run the right tests and get the right results.
+# ./check writes the results of the run to check.log.  It lists
+# the tests it ran, skipped, or failed.  Then it writes a line saying
+# everything passed or some failed.  We scrape the most recent run and
+# use it as the output to compare to make sure that we run the right
+# tests and get the right results.
 #
 awk '
 	/^(Ran|Not run|Failures):.*/ {
 		if (pf) {
 			res=""
 			pf=""
-		}
-		res = res "\n" $0
+		} res = res "\n" $0
 	}
 	/^(Passed|Failed).*tests$/ {
 		pf=$0
@@ -113,14 +139,10 @@ awk '
 	}' < results/check.log  > "$T_TMPDIR/results"

 # put a test per line so diff shows tests that differ
-grep -E "^(Ran|Not run|Failures):" "$T_TMPDIR/results" | fmt -w 1 > "$T_TMPDIR/results.fmt"
-grep -E "^(Passed|Failed).*tests$" "$T_TMPDIR/results" >> "$T_TMPDIR/results.fmt"
+egrep "^(Ran|Not run|Failures):" "$T_TMPDIR/results" | \
+	fmt -w 1 > "$T_TMPDIR/results.fmt"
+egrep "^(Passed|Failed).*tests$" "$T_TMPDIR/results" >> "$T_TMPDIR/results.fmt"

-diff -u "$T_EXTRA/expected-results" "$T_TMPDIR/results.fmt" > "$T_TMPDIR/results.diff"
-if [ -s "$T_TMPDIR/results.diff" ]; then
-	echo "tests that were skipped/run differed from expected:"
-	cat "$T_TMPDIR/results.diff"
-	t_fail
-fi
+t_compare_output cat "$T_TMPDIR/results.fmt"

 t_pass
--- a/utils/fenced/scoutfs-fenced
+++ b/utils/fenced/scoutfs-fenced
@@ -62,28 +62,32 @@ test -x "$SCOUTFS_FENCED_RUN" || \
 # files disappear.
 #

-# silence error messages
-quiet_cat()
+# generate failure messages to stderr while still echoing 0 for the caller
+careful_cat()
 {
-	cat "$@" 2>/dev/null
+	local path="$@"
+
+	cat "$@" || echo 0
 }

 while sleep $SCOUTFS_FENCED_DELAY; do
-	shopt -s nullglob
 	for fence in /sys/fs/scoutfs/*/fence/*; do
-
-		srv=$(basename $(dirname $(dirname $fence)))
-		fenced="$(quiet_cat $fence/fenced)"
-		error="$(quiet_cat $fence/error)"
-		rid="$(quiet_cat $fence/rid)"
-		ip="$(quiet_cat $fence/ipv4_addr)"
-		reason="$(quiet_cat $fence/reason)"
-
-		# request dirs can linger then disappear after fenced/error is set
-		if [ ! -d "$fence" -o "$fenced" == "1" -o "$error" == "1" ]; then
+		# catches the unmatched glob pattern when there are no fence dirs
+		if [ ! -d "$fence" ]; then
 			continue
 		fi

+		# skip requests that have been handled
+		if [ "$(careful_cat $fence/fenced)" == 1 -o \
+		     "$(careful_cat $fence/error)" == 1 ]; then
+			continue
+		fi
+
+		srv=$(basename $(dirname $(dirname $fence)))
+		rid="$(cat $fence/rid)"
+		ip="$(cat $fence/ipv4_addr)"
+		reason="$(cat $fence/reason)"
+
 		log_message "server $srv fencing rid $rid at IP $ip for $reason"

 		# export _REQ_ vars for run to use
--- a/utils/man/scoutfs.5
+++ b/utils/man/scoutfs.5
@@ -55,30 +55,6 @@ with initial sparse regions (perhaps by multiple threads writing to
 different regions) and wasted space isn't an issue (perhaps because the
 file population contains few small files).
 .TP
-.B ino_alloc_per_lock=<number>
-This option determines how many inode numbers are allocated in the same
-cluster lock.  The default, and maximum, is 1024.  The minimum is 1.
-Allocating fewer inodes per lock can allow more parallelism between
-mounts because there are more locks that cover the same number of
-created files.  This can be helpful when working with smaller numbers of
-large files.
-.TP
-.B lock_idle_count=<number>
-This option sets the number of locks that the client will allow to
-remain idle after being granted.  If the number of locks exceeds this
-count then the client will try to free the oldest locks.  This setting
-is per-mount and only changes the behavior of that mount.
-.sp
-Idle locks are not reclaimed by memory pressure so this option
-determines the limit of how much memory is likely to be pinned by
-allocated idle locks.  Setting this too low can increase latency of
-operations as repeated use of a working set of locks has to request the
-locks from the network rather than using granted idle locks.
-.sp
-The count is not strictly enforced.  Operations are allowed to use locks
-while over the limit to avoid deadlocks under heavy concurrent load.
-Exceeding the count only attempts freeing of idle locks.
-.TP
 .B log_merge_wait_timeout_ms=<number>
 This option sets the amount of time, in milliseconds, that log merge
 creation can wait before timing out.  This setting is per-mount, only
@@ -154,23 +130,6 @@ the server for the filesystem if it is elected leader.
 The assigned number must match one of the slots defined with \-Q options
 when the filesystem was created with mkfs.  If the number assigned
 doesn't match a number created during mkfs then the mount will fail.
-.TP
-.B tcp_keepalive_timeout_ms=<number>
-This option sets the amount of time, in milliseconds, that a client
-connection will wait for active TCP packets, before deciding that
-the connection is dead. This setting is per-mount and only changes
-the behavior of that mount.
-.sp
-The default value of this setting is 60000msec (60s). Any precision
-beyond a whole second is likely unrealistic due to the nature of
-TCP keepalive mechanisms in the Linux kernel. Valid values are any
-value higher than 3000 (3s).
-.sp
-The TCP keepalive mechanism is complex and observing a lost connection
-quickly is important to maintain cluster stability. If the local
-network suffers from intermittent outages this option may provide
-some respite to overcome these outages without the cluster becoming
-desynchronized.
 .SH VOLUME OPTIONS
 Volume options are persistent options which are stored in the super
 block in the metadata device and which apply to all mounts of the volume.
--- a/utils/sparse.sh
+++ b/utils/sparse.sh
@@ -1,7 +1,7 @@
 #!/bin/bash

-# must have sparse.  Fail with error message, mask success path.
-which sparse > /dev/null || exit 1
+# can we find sparse?  If not, we're done.
+which sparse > /dev/null 2>&1 || exit 0

 # 
 # one of the problems with using sparse in userspace is that it picks up
@@ -22,11 +22,6 @@ RE="$RE|warning: memset with byte count of 4194304"
 # some sparse versions don't know about some builtins
 RE="$RE|error: undefined identifier '__builtin_fpclassify'"

-# on el8, sparse can't handle __has_include for some reason when _GNU_SOURCE
-# is defined, and we need that for O_DIRECT.
-RE="$RE|note: in included file .through /usr/include/sys/stat.h.:"
-RE="$RE|/usr/include/bits/statx.h:30:6: error: "
-
 #
 # don't filter out 'too many errors' here, it can signify that
 # sparse doesn't understand something and is throwing a *ton*
--- a/utils/src/punch_offline.c
+++ b/utils/src/punch_offline.c
@@ -1,127 +0,0 @@
-#include <sys/ioctl.h>
-#include <fcntl.h>
-#include <errno.h>
-#include <string.h>
-#include <argp.h>
-
-#include "sparse.h"
-#include "parse.h"
-#include "util.h"
-#include "ioctl.h"
-#include "cmd.h"
-
-struct po_args {
-	char *path;
-	u64 offset;
-	u64 length;
-	u64 data_version;
-
-	unsigned offset_set:1,
-	         length_set:1,
-	         data_version_set:1;
-};
-
-static int do_punch_offline(struct po_args *args)
-{
-	struct scoutfs_ioctl_punch_offline ioctl_args;
-	int ret;
-	int fd;
-
-	fd = get_path(args->path, O_RDWR);
-	if (fd < 0)
-		return fd;
-
-	ioctl_args.offset = args->offset;
-	ioctl_args.len = args->length;
-	ioctl_args.data_version = args->data_version;
-	ioctl_args.flags = 0;
-
-	ret = ioctl(fd, SCOUTFS_IOC_PUNCH_OFFLINE, &ioctl_args);
-
-	if (ret < 0) {
-		ret = -errno;
-		fprintf(stderr, "punch_offline ioctl failed: %s (%d)\n",
-			strerror(errno), errno);
-	}
-
-	close(fd);
-	return ret;
-}
-
-static int parse_opt(int key, char *arg, struct argp_state *state)
-{
-	struct po_args *args = state->input;
-	int ret = 0;
-
-	switch (key) {
-	case 'V':
-		ret = parse_u64(arg, &args->data_version);
-		if (ret)
-			return ret;
-		args->data_version_set = 1;
-		break;
-	case 'o': /* offset */
-		ret = parse_human(arg, &args->offset);
-		if (ret)
-			return ret;
-		args->offset_set = 1;
-		break;
-	case 'l': /* length */
-		ret = parse_human(arg, &args->length);
-		if (ret)
-			return ret;
-		args->length_set = 1;
-		break;
-	case ARGP_KEY_ARG:
-		if (!args->path)
-			args->path = strdup_or_error(state, arg);
-		else
-			argp_error(state, "unknown extra argument given");
-		break;
-	case ARGP_KEY_FINI:
-		if (!args->path)
-			argp_error(state, "must provide path to file");
-		if (!args->offset_set)
-			argp_error(state, "must provide offset");
-		if (!args->length_set)
-			argp_error(state, "must provide length");
-		if (!args->data_version_set)
-			argp_error(state, "must provide data_version");
-		break;
-	default:
-		break;
-	}
-
-	return 0;
-}
-
-static struct argp_option options[] = {
-	{ "data-version", 'V', "VERSION", 0, "Data version of the file [Required]"},
-	{ "offset", 'o', "OFFSET", 0, "Offset (bytes or KMGTP units) in file to stage [Required]"},
-	{ "length", 'l', "LENGTH", 0, "Length of range (bytes or KMGTP units) of file to stage. [Required]"},
-	{ NULL }
-};
-
-static struct argp argp = {
-	options,
-	parse_opt,
-	"PATH",
-	"Make a (sparse) hole in the file at offset and with length"
-};
-
-static int punch_offline_cmd(int argc, char **argv)
-{
-	struct po_args po_args = {NULL};
-	int ret;
-
-	ret = argp_parse(&argp, argc, argv, 0, NULL, &po_args);
-	if (ret)
-		return ret;
-
-	return do_punch_offline(&po_args);
-}
-
-static void __attribute__((constructor)) punch_offline_ctor(void)
-{
-	cmd_register_argp("punch-offline", &argp, GROUP_AGENT, punch_offline_cmd);
-}