Fix commit budget calculation with multiple holders

The try_drain_data_freed() path was generating errors about overrunning
its commit budget:

    scoutfs f.2b8928.r.02689f error: 1 holders exceeded alloc budget av: bef 8185 now 8036, fr: bef 8185 now 7602

The budget overrun check was using the current number of commit holders
(in this case one) instead of the maximum number of concurrent holders
(in this case two).  So even well-behaved paths like
try_drain_data_freed() can appear to exceed their commit budget.  This
happens if other holders dirty some blocks and apply their commits
before the try_drain_data_freed() thread does its final budget
reconciliation.

Signed-off-by: Chris Kirby <ckirby@versity.com>
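The following is a minimal user-space model of the budget check described in this commit message. It makes the failure mode concrete, but it is a sketch and not the scoutfs code.

- The struct, its fields and the helper names are hypothetical.
- The per-holder allowance of 400 blocks is also hypothetical.
- The 8185/7602 values are borrowed from the "fr" counts in the error message purely for illustration.

The model contrasts two ways of sizing the allowance. One scales it by the holders attached right now. The other scales it by the peak number of concurrent holders during the commit.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of one open commit shared by several holders.
 * None of these names come from scoutfs.
 */
struct commit_budget {
	int holders;      /* holders attached right now */
	int max_holders;  /* peak concurrent holders (would reset when a new commit opens) */
	long before;      /* blocks available when the commit opened */
	long per_holder;  /* blocks each holder may consume (assumed value) */
};

static void holder_enter(struct commit_budget *cb)
{
	cb->holders++;
	if (cb->holders > cb->max_holders)
		cb->max_holders = cb->holders;
}

static void holder_exit(struct commit_budget *cb)
{
	assert(cb->holders > 0);
	cb->holders--;
}

/* Buggy check: the allowance scales with the holders still attached. */
static bool exceeded_current(const struct commit_budget *cb, long now)
{
	return cb->before - now > cb->per_holder * cb->holders;
}

/* Fixed check: the allowance scales with the peak number of holders. */
static bool exceeded_max(const struct commit_budget *cb, long now)
{
	return cb->before - now > cb->per_holder * cb->max_holders;
}
```

Consider two holders that enter, after which one finishes and leaves. The 583 blocks consumed are more than one holder's 400, so the current-holders check fires. The same consumption fits within the 800 allowed for the two holders that were actually active, so the peak-holders check does not fire.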
Fix dirtied block calculation in extent_mod_blocks()
2026-06-09 21:22:36 +00:00 · 2025-06-17 11:38:07 -05:00 · 2025-06-17 11:38:07 -05:00
103 changed files with 2431 additions and 5939 deletions
@@ -1,147 +1,6 @@
 Versity ScoutFS Release Notes
 =============================

---
-v1.32
-\
-*June 2, 2026*
-
-Fix writing POSIX ACLs over NFS mounts that export the scoutfs
-filesystem.
-
-Add support for kernels in the RHEL 9.8 minor release.
-
-Reduce unneeded block allocation when data\_prealloc\_contig\_only was
-set to 0. This will help achieve more efficient data space usage when
-writing small files.
-
---
-v1.31
-\
-*May 5, 2026*
-
-Fix race between modifying quota rules and internal reading of the rules
-that tripped an assertion.
-
-Fix a bug that could skip merging totl items under specific heavy write
-loads.  This could lead to merged totl items incorrectly tracking the
-sum of all the contributing totl xattrs.
-
-Fix many small low risk bugs in error paths that were found with code
-analysis and testing.
-
---
-v1.30
-\
-*Apr 21, 2026*
-
-Fix a problem reading the accumulated totals of contributing .totl.
-xattrs when log merging is in progress.  The problem would have readers
-of the totals calculate the sums incorrectly.
-
-Fix a problem updating quota rules.  There was a race where updates
-could be corrupted if they happened while a transaction was being
-written.
-
-Fix a problem deleting files with .indx. xattrs.  The internal indexing
-metadata wouldn't be properly deleted so the files would still claim to
-be present and visible in the index, though the file no longer existed.
-
---
-v1.29
-\
-*Mar 25, 2026*
-
-Add a repair mechanism for mount logs that weren't properly resolved as
-mounts left the cluster.  The presence of these logs prevents log
-merging from making forward progress and the backlog of logs over time
-can cause operations to slow to a crawl.  With the repair mechanism in
-place the orphaned logs don't stop merging and operations proceed as
-usual.
-
-Add an ioctl for turning offline unmapped file regions into sparse
-regions.
-
---
-v1.28
-\
-*Feb 5, 2026*
-
-Fix a bug that lead to incorrect negative caching of ACL entries
-starting in version 9.6 of distribution kernels in the enterprise linux
-family.  This would manifest as ACLs seemingly disappearing,
-particularly default ACLs on directories.  The persistent ACLs always
-existed but because of internal API incompatibility some readers
-couldn't see them and would cache that they didn't exist.
-
---
-v1.27
-\
-*Jan 15, 2026*
-
-Switch away from using the general VM cache reclaim machinery to reduce
-idle cluster locks in the client.  The VM treated locks like a cache and
-let many accumulate, presuming that it would be efficient to free them
-in batches.  Lock freeing requires network communication so this could
-result in enormous backlogs in network messages (on the order of
-hundreds of thousands) and could result in signifcant delays of other
-network messaging.
-
-Fix inefficient network receive processing while many messages are in
-the send queue.  This consumed sufficient CPU to cause significant
-stalls, perhaps resulting in hung task warning messages due to delayed
-lock message delivery.
-
-Fix a server livelock case that could happen while committing client
-transactions that contain a large amount of freed file data extents.
-This would present as client tasks hanging and a server task spinning
-consuming cpu.
-
-Fix a rare server request processing failure that doesn't deal with
-retransmission of a request that a previous server partially processed.
-This would present as hung client tasks and repeated "error -2
-committing log merge: getting merge status item" kernel messages.
-
-Fix an unneccessary server shutdown during specific circumstances in
-client lock recovery.  The shutdown was due to server state and was
-ultimately harmless.  The next server that started up would proceed
-accordingly.
-
---
-v1.26
-\
-*Nov 17, 2025*
-
-Add the ino\_alloc\_per\_lock mount option.  This changes the number of
-inode numbers allocated under each cluster lock and can alleviate lock
-contention for some patterns of larger file creation.
-
-Add the tcp\_keepalive\_timeout\_ms mount option.  This can enable the
-system to survive longer periods of networking outages.
-
-Fix a rare double free of internal btree metadata blocks when merging
-log trees.  The duplicated freed metadata block numbers would cause
-persistent errors in the server, preventing the server from starting and
-hanging the system.
-
-Fix the data\_wait interface to not require the correct data\_version of
-the inode when raising an error.  This lets callers raise errors when
-they're unable to recall the details of the inode to discover its
-data\_version.
-
-Change scoutfs to more aggressively reclaim cached memory when under
-memory pressure.  This makes scoutfs behave more like other kernel
-components and it integrates better with the reclaim policy heuristics
-in the VM core of the kernel.
-
-Change scoutfs to more efficiently transmit and receive socket messages.
-Under heavy load this can process messages sufficiently more quickly to
-avoid hung task messages for tasks that were waiting for cluster lock
-messages to be processed.
-
-Fix faulty server block commit budget calculations that were generating
-spurious "holders exceeded alloc budget" console messages.
-
 ---
 v1.25
 \
@@ -5,6 +5,13 @@ ifeq ($(SK_KSRC),)
 SK_KSRC := $(shell echo /lib/modules/`uname -r`/build)
 endif

+# only run sparse (and fail if it fails) if we find it
+ifeq ($(shell sparse && echo found),found)
+SP =
+else
+SP = @:
+endif
+
 SCOUTFS_GIT_DESCRIBE ?= \
 	$(shell git describe --all --abbrev=6 --long 2>/dev/null || \
 		echo no-git)
@@ -29,7 +36,9 @@ TARFILE = scoutfs-kmod-$(RPM_VERSION).tar
 all: module

 module:
-	$(MAKE) CHECK=$(CURDIR)/src/sparse-filtered.sh C=1 CF="-D__CHECK_ENDIAN__" $(SCOUTFS_ARGS)
+	$(MAKE) $(SCOUTFS_ARGS)
+	$(SP) $(MAKE) C=2 CF="-D__CHECK_ENDIAN__" $(SCOUTFS_ARGS)
+

 modules_install:
 	$(MAKE) $(SCOUTFS_ARGS) modules_install
@@ -158,6 +158,15 @@ ifneq (,$(shell grep 'sock_create_kern.*struct net' include/linux/net.h))
 ccflags-y += -DKC_SOCK_CREATE_KERN_NET=1
 endif

+#
+# v3.18-rc6-1619-gc0371da6047a
+#
+# iov_iter is now part of struct msghdr
+#
+ifneq (,$(shell grep 'struct iov_iter.*msg_iter' include/linux/socket.h))
+ccflags-y += -DKC_MSGHDR_STRUCT_IOV_ITER=1
+endif
+
 #
 # v4.17-rc6-7-g95582b008388
 #
@@ -278,14 +287,6 @@ ifneq (,$(shell grep 'int ..mknod. .struct user_namespace' include/linux/fs.h))
 ccflags-y += -DKC_VFS_METHOD_USER_NAMESPACE_ARG
 endif

-#
-# v6.2-rc1-2-gabf08576afe3
-#
-# fs: vfs methods use struct mnt_idmap instead of struct user_namespace
-ifneq (,$(shell grep 'int vfs_mknod.struct mnt_idmap' include/linux/fs.h))
-ccflags-y += -DKC_VFS_METHOD_MNT_IDMAP_ARG
-endif
-
 #
 # v5.17-rc2-21-g07888c665b40
 #
@@ -433,85 +434,3 @@ endif
 ifneq (,$(shell grep 'int ..remap_pages..struct vm_area_struct' include/linux/mm.h))
 ccflags-y += -DKC_MM_REMAP_PAGES
 endif
-
-#
-# v3.19-4742-g503c358cf192
-#
-# list_lru_shrink_count() and list_lru_shrink_walk() introduced
-#
-ifneq (,$(shell grep 'list_lru_shrink_count.*struct list_lru' include/linux/list_lru.h))
-ccflags-y += -DKC_LIST_LRU_SHRINK_COUNT_WALK
-endif
-
-#
-# v3.19-4757-g3f97b163207c
-#
-# lru_list_walk_cb lru arg added
-#
-ifneq (,$(shell grep 'struct list_head \*item, spinlock_t \*lock, void \*cb_arg' include/linux/list_lru.h))
-ccflags-y += -DKC_LIST_LRU_WALK_CB_ITEM_LOCK
-endif
-
-#
-# v6.7-rc4-153-g0a97c01cd20b
-#
-# list_lru_{add,del} -> list_lru_{add,del}_obj
-#
-ifneq (,$(shell grep '^bool list_lru_add_obj' include/linux/list_lru.h))
-ccflags-y += -DKC_LIST_LRU_ADD_OBJ
-endif
-
-#
-# v6.12-rc6-227-gda0c02516c50
-#
-# lru_list_walk_cb lock arg removed
-#
-ifneq (,$(shell grep 'struct list_lru_one \*list, spinlock_t \*lock, void \*cb_arg' include/linux/list_lru.h))
-ccflags-y += -DKC_LIST_LRU_WALK_CB_LIST_LOCK
-endif
-
-#
-# v5.1-rc4-273-ge9b98e162aa5
-#
-# introduce stack trace helpers
-#
-ifneq (,$(shell grep '^unsigned int stack_trace_save' include/linux/stacktrace.h))
-ccflags-y += -DKC_STACK_TRACE_SAVE
-endif
-
-#
-# v3.14-rc1-7-g4e34e719e457
-#
-# .set_acl callback added to struct inode_operations.  Most kernels
-# we target have it, but el7 (3.10 base) does not, so detect.
-#
-ifneq (,$(shell grep 'int ..set_acl..struct' include/linux/fs.h))
-ccflags-y += -DKC_HAS_SET_ACL
-endif
-
-#
-# v6.1-rc1-2-g138060ba92b3
-#
-# set_acl now passed a struct dentry instead of inode.
-#
-ifneq (,$(shell grep 'int ..set_acl.*struct dentry' include/linux/fs.h))
-ccflags-y += -DKC_SET_ACL_DENTRY
-endif
-
-#
-# v6.1-rc1-3-gcac2f8b8d8b5
-#
-# get_acl renamed to get_inode_acl.
-#
-ifneq (,$(shell grep 'struct posix_acl.*get_inode_acl' include/linux/fs.h))
-ccflags-y += -DKC_GET_INODE_ACL
-endif
-
-#
-# v6.15-13744-g41cb08555c41
-#
-# from_timer renamed to timer_container_of.
-#
-ifneq (,$(shell grep 'define timer_container_of' include/linux/timer.h))
-ccflags-y += -DKC_TIMER_CONTAINER_OF
-endif
@@ -107,22 +107,13 @@ struct posix_acl *scoutfs_get_acl_locked(struct inode *inode, int type, struct s
 	return acl;
 }

-#ifdef KC_GET_INODE_ACL
-struct posix_acl *scoutfs_get_acl(struct inode *inode, int type, bool rcu)
-#else
 struct posix_acl *scoutfs_get_acl(struct inode *inode, int type)
-#endif
 {
 	struct super_block *sb = inode->i_sb;
 	struct scoutfs_lock *lock = NULL;
 	struct posix_acl *acl;
 	int ret;

-#ifdef KC_GET_INODE_ACL
-	if (rcu)
-		return ERR_PTR(-ECHILD);
-#endif
-
 #ifndef KC___POSIX_ACL_CREATE
 	if (!IS_POSIXACL(inode))
 		return NULL;
@@ -210,16 +201,8 @@ out:
 	return ret;
 }

-#ifdef KC_SET_ACL_DENTRY
-int scoutfs_set_acl(KC_VFS_NS_DEF
-		    struct dentry *dentry, struct posix_acl *acl, int type)
+int scoutfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 {
-	struct inode *inode = dentry->d_inode;
-#else
-int scoutfs_set_acl(KC_VFS_NS_DEF
-		    struct inode *inode, struct posix_acl *acl, int type)
-{
-#endif
 	struct super_block *sb = inode->i_sb;
 	struct scoutfs_lock *lock = NULL;
 	LIST_HEAD(ind_locks);
@@ -257,11 +240,7 @@ int scoutfs_acl_get_xattr(struct dentry *dentry, const char *name, void *value,
 	if (!IS_POSIXACL(dentry->d_inode))
 		return -EOPNOTSUPP;

-#ifdef KC_GET_INODE_ACL
-	acl = scoutfs_get_acl(dentry->d_inode, type, false);
-#else
 	acl = scoutfs_get_acl(dentry->d_inode, type);
-#endif
 	if (IS_ERR(acl))
 		return PTR_ERR(acl);
 	if (acl == NULL)
@@ -307,11 +286,7 @@ int scoutfs_acl_set_xattr(struct dentry *dentry, const char *name, const void *v
 		}
 	}

-#ifdef KC_SET_ACL_DENTRY
-	ret = scoutfs_set_acl(KC_VFS_INIT_NS dentry, acl, type);
-#else
-	ret = scoutfs_set_acl(KC_VFS_INIT_NS dentry->d_inode, acl, type);
-#endif
+	ret = scoutfs_set_acl(dentry->d_inode, acl, type);
 out:
 	posix_acl_release(acl);

@@ -1,19 +1,9 @@
 #ifndef _SCOUTFS_ACL_H_
 #define _SCOUTFS_ACL_H_

-#ifdef KC_SET_ACL_DENTRY
-int scoutfs_set_acl(KC_VFS_NS_DEF
-		    struct dentry *dentry, struct posix_acl *acl, int type);
-#else
-int scoutfs_set_acl(KC_VFS_NS_DEF
-		    struct inode *inode, struct posix_acl *acl, int type);
-#endif
-#ifdef KC_GET_INODE_ACL
-struct posix_acl *scoutfs_get_acl(struct inode *inode, int type, bool rcu);
-#else
 struct posix_acl *scoutfs_get_acl(struct inode *inode, int type);
-#endif
 struct posix_acl *scoutfs_get_acl_locked(struct inode *inode, int type, struct scoutfs_lock *lock);
+int scoutfs_set_acl(struct inode *inode, struct posix_acl *acl, int type);
 int scoutfs_set_acl_locked(struct inode *inode, struct posix_acl *acl, int type,
 			   struct scoutfs_lock *lock, struct list_head *ind_locks);
 #ifdef KC_XATTR_STRUCT_XATTR_HANDLER
@@ -857,7 +857,7 @@ static int find_zone_extent(struct super_block *sb, struct scoutfs_alloc_root *r
 		.zone = SCOUTFS_FREE_EXTENT_ORDER_ZONE,
 	};
 	struct scoutfs_extent found;
-	struct scoutfs_extent ext = {0,};
+	struct scoutfs_extent ext;
 	u64 start;
 	u64 len;
 	int nr;
@@ -22,8 +22,6 @@
 #include <linux/rhashtable.h>
 #include <linux/random.h>
 #include <linux/sched/mm.h>
-#include <linux/list_lru.h>
-#include <linux/stacktrace.h>

 #include "format.h"
 #include "super.h"
@@ -40,12 +38,26 @@
 * than the page size.  Callers can have their own contexts for tracking
 * dirty blocks that are written together.  We pin dirty blocks in
 * memory and only checksum them all as they're all written.
+ *
+ * Memory reclaim is driven by maintaining two very coarse groups of
+ * blocks.  As we access blocks we mark them with an increasing counter
+ * to discourage them from being reclaimed.  We then define a threshold
+ * at the current counter minus half the population.  Recent blocks have
+ * a counter greater than the threshold, and all other blocks with
+ * counters less than it are considered older and are candidates for
+ * reclaim.  This results in access updates rarely modifying an atomic
+ * counter as blocks need to be moved into the recent group, and shrink
+ * can randomly scan blocks looking for the half of the population that
+ * will be in the old group.  It's reasonably effective, but is
+ * particularly efficient and avoids contention between concurrent
+ * accesses and shrinking.
 */

 struct block_info {
 	struct super_block *sb;
+	atomic_t total_inserted;
+	atomic64_t access_counter;
 	struct rhashtable ht;
-	struct list_lru lru;
 	wait_queue_head_t waitq;
 	KC_DEFINE_SHRINKER(shrinker);
 	struct work_struct free_work;
@@ -64,15 +76,28 @@ enum block_status_bits {
 	BLOCK_BIT_PAGE_ALLOC,	/* page (possibly high order) allocation */
 	BLOCK_BIT_VIRT,		/* mapped virt allocation */
 	BLOCK_BIT_CRC_VALID,	/* crc has been verified */
-	BLOCK_BIT_ACCESSED,	/* seen by lookup since last lru add/walk */
 };

+/*
+ * We want to tie atomic changes in refcounts to whether or not the
+ * block is still visible in the hash table, so we store the hash
+ * table's reference up at a known high bit.  We could accidentally set the
+ * inserted bit through excessive refcount increments.  We don't do
+ * anything about that but at least warn if we get close.
+ *
+ * We're avoiding the high byte for no real good reason, just out of a
+ * historical fear of implementations that don't provide the full
+ * precision.
+ */
+#define BLOCK_REF_INSERTED	(1U << 23)
+#define BLOCK_REF_FULL		(BLOCK_REF_INSERTED >> 1)
+
 struct block_private {
 	struct scoutfs_block bl;
 	struct super_block *sb;
 	atomic_t refcount;
+	u64 accessed;
 	struct rhash_head ht_head;
-	struct list_head lru_head;
 	struct list_head dirty_entry;
 	struct llist_node free_node;
 	unsigned long bits;
@@ -81,15 +106,13 @@ struct block_private {
 		struct page *page;
 		void *virt;
 	};
-	unsigned int stack_len;
-	unsigned long stack[10];
 };

 #define TRACE_BLOCK(which, bp)									\
 do {												\
 	__typeof__(bp) _bp = (bp);								\
 	trace_scoutfs_block_##which(_bp->sb, _bp, _bp->bl.blkno, atomic_read(&_bp->refcount),	\
-				    atomic_read(&_bp->io_count), _bp->bits);	\
+				    atomic_read(&_bp->io_count), _bp->bits, _bp->accessed);	\
 } while (0)

 #define BLOCK_PRIVATE(_bl) \
@@ -103,17 +126,7 @@ static __le32 block_calc_crc(struct scoutfs_block_header *hdr, u32 size)
 	return cpu_to_le32(calc);
 }

-static noinline void save_block_stack(struct block_private *bp)
-{
-	bp->stack_len = stack_trace_save(bp->stack, ARRAY_SIZE(bp->stack), 2);
-}
-
-static void print_block_stack(struct block_private *bp)
-{
-	stack_trace_print(bp->stack, bp->stack_len, 1);
-}
-
-static noinline struct block_private *block_alloc(struct super_block *sb, u64 blkno)
+static struct block_private *block_alloc(struct super_block *sb, u64 blkno)
 {
 	struct block_private *bp;
 	unsigned int nofs_flags;
@@ -163,13 +176,11 @@ static noinline struct block_private *block_alloc(struct super_block *sb, u64 bl
 	bp->bl.blkno = blkno;
 	bp->sb = sb;
 	atomic_set(&bp->refcount, 1);
-	INIT_LIST_HEAD(&bp->lru_head);
 	INIT_LIST_HEAD(&bp->dirty_entry);
 	set_bit(BLOCK_BIT_NEW, &bp->bits);
 	atomic_set(&bp->io_count, 0);

 	TRACE_BLOCK(allocate, bp);
-	save_block_stack(bp);

 out:
 	if (!bp)
@@ -218,90 +229,36 @@ static void block_free_work(struct work_struct *work)

 	llist_for_each_entry_safe(bp, tmp, deleted, free_node) {
 		block_free(sb, bp);
-		cond_resched();
 	}
 }

 /*
- * Users of blocks hold a refcount.  If putting a refcount drops to zero
- * then the block is freed.
- *
- * Acquiring new references and claiming the exclusive right to tear
- * down a block is built around this LIVE_REFCOUNT_BASE refcount value.
- * As blocks are initially cached they have the live base added to their
- * refcount.  Lookups will only increment the refcount and return blocks
- * for reference holders while the refcount is >= than the base.
- *
- * To remove a block from the cache and eventually free it, either by
- * the lru walk in the shrinker, or by reference holders, the live base
- * is removed and turned into a normal refcount increment that will be
- * put by the caller.  This can only be done once for a block, and once
- * its done lookup will not return any more references.
- */
-#define LIVE_REFCOUNT_BASE (INT_MAX ^ (INT_MAX >> 1))
-
-/*
- * Inc the refcount while holding an incremented refcount.  We can't
- * have so many individual reference holders that they pass the live
- * base.
+ * Get a reference to a block while holding an existing reference.
 */
 static void block_get(struct block_private *bp)
 {
-	int now = atomic_inc_return(&bp->refcount);
+	WARN_ON_ONCE((atomic_read(&bp->refcount) & ~BLOCK_REF_INSERTED) <= 0);

-	BUG_ON(now <= 1);
-	BUG_ON(now == LIVE_REFCOUNT_BASE);
+	atomic_inc(&bp->refcount);
 }

 /*
- * if (*v >= u) {
- * 	*v += a;
- * 	return true;
- * }
- */
-static bool atomic_add_unless_less(atomic_t *v, int a, int u)
+ * Get a reference to a block as long as it's been inserted in the hash
+ * table and hasn't been removed.
+ */
+static struct block_private *block_get_if_inserted(struct block_private *bp)
 {
-	int c;
+	int cnt;

 	do {
-		c = atomic_read(v);
-		if (c < u)
-			return false;
-	} while (atomic_cmpxchg(v, c, c + a) != c);
+		cnt = atomic_read(&bp->refcount);
+		WARN_ON_ONCE(cnt & BLOCK_REF_FULL);
+		if (!(cnt & BLOCK_REF_INSERTED))
+			return NULL;

-	return true;
-}
+	} while (atomic_cmpxchg(&bp->refcount, cnt, cnt + 1) != cnt);

-static bool block_get_if_live(struct block_private *bp)
-{
-	return atomic_add_unless_less(&bp->refcount, 1, LIVE_REFCOUNT_BASE);
-}
-
-/*
- * If the refcount still has the live base, subtract it and increment
- * the callers refcount that they'll put.
- */
-static bool block_get_remove_live(struct block_private *bp)
-{
-	return atomic_add_unless_less(&bp->refcount, (1 - LIVE_REFCOUNT_BASE), LIVE_REFCOUNT_BASE);
-}
-
-/*
- * Only get the live base refcount if it is the only refcount remaining.
- * This means that there are no active refcount holders and the block
- * can't be dirty or under IO, which both hold references.
- */
-static bool block_get_remove_live_only(struct block_private *bp)
-{
-	int c;
-
-	do {
-		c = atomic_read(&bp->refcount);
-		if (c != LIVE_REFCOUNT_BASE)
-			return false;
-	} while (atomic_cmpxchg(&bp->refcount, c, c - LIVE_REFCOUNT_BASE + 1) != c);
-
-	return true;
+	return bp;
 }
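The refcount encoding in this hunk can be exercised outside the kernel. The sketch below rests on these assumptions:

- C11 `<stdatomic.h>` stands in for the kernel's `atomic_t`.
- `assert()` stands in for `WARN_ON_ONCE()`.
- The function names are hypothetical.
- It leaves out the rhashtable and `total_inserted` bookkeeping that block_remove_cnt() also performs.

It keeps the patch's bit layout. The hash table's reference is bit 23, and lookups only gain a reference while that bit is set. A "solo" removal succeeds only when the count is exactly the inserted bit plus the caller's single reference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Same layout as the patch: the hash table's reference lives in bit 23. */
#define REF_INSERTED (1U << 23)
#define REF_FULL     (REF_INSERTED >> 1)

struct blk {
	atomic_int refcount;	/* stands in for the kernel's atomic_t */
};

/* Take a reference only while the inserted bit is still set. */
static bool get_if_inserted(struct blk *b)
{
	int cnt = atomic_load(&b->refcount);

	do {
		assert(!(cnt & REF_FULL));	/* stands in for WARN_ON_ONCE() */
		if (!(cnt & REF_INSERTED))
			return false;
		/* on failure cnt is reloaded and the bit is checked again */
	} while (!atomic_compare_exchange_weak(&b->refcount, &cnt, cnt + 1));

	return true;
}

/* Clear the inserted bit only if the count is exactly what we expect. */
static bool remove_cnt(struct blk *b, int cnt)
{
	return (cnt & REF_INSERTED) &&
	       atomic_compare_exchange_strong(&b->refcount, &cnt,
					      (int)(cnt & ~REF_INSERTED));
}

/* Succeeds only if the table's ref and the caller's single ref remain. */
static bool remove_solo(struct blk *b)
{
	return remove_cnt(b, (int)(REF_INSERTED | 1));
}
```

Because clearing the bit and checking the count happen in one compare-and-exchange, a concurrent lookup either gets its reference before removal (making the solo removal fail) or sees the bit already cleared and returns nothing.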

 /*
@@ -333,81 +290,143 @@ static const struct rhashtable_params block_ht_params = {
 };

 /*
- * Insert the block into the cache so that it's visible for lookups.
- * The caller can hold references (including for a dirty block).
- *
- * We make sure the base is added and the block is in the lru once it's
- * in the hash.  If hash table insertion fails it'll be briefly visible
- * in the lru, but won't be isolated/evicted because we hold an
- * incremented refcount in addition to the live base.
+ * Insert a new block into the hash table.  Once it is inserted in the
+ * hash table readers can start getting references.  The caller may have
+ * multiple refs but the block can't already be inserted.
 */
 static int block_insert(struct super_block *sb, struct block_private *bp)
 {
 	DECLARE_BLOCK_INFO(sb, binf);
 	int ret;

-	BUG_ON(atomic_read(&bp->refcount) >= LIVE_REFCOUNT_BASE);
-	atomic_add(LIVE_REFCOUNT_BASE, &bp->refcount);
-	smp_mb__after_atomic(); /* make sure live base is visible to list_lru walk */
-	list_lru_add_obj(&binf->lru, &bp->lru_head);
+	WARN_ON_ONCE(atomic_read(&bp->refcount) & BLOCK_REF_INSERTED);
+
 retry:
+	atomic_add(BLOCK_REF_INSERTED, &bp->refcount);
 	ret = rhashtable_lookup_insert_fast(&binf->ht, &bp->ht_head, block_ht_params);
 	if (ret < 0) {
+		atomic_sub(BLOCK_REF_INSERTED, &bp->refcount);
 		if (ret == -EBUSY) {
 			/* wait for pending rebalance to finish */
 			synchronize_rcu();
 			goto retry;
-		} else {
-			atomic_sub(LIVE_REFCOUNT_BASE, &bp->refcount);
-			BUG_ON(atomic_read(&bp->refcount) >= LIVE_REFCOUNT_BASE);
-			list_lru_del_obj(&binf->lru, &bp->lru_head);
 		}
 	} else {
+		atomic_inc(&binf->total_inserted);
 		TRACE_BLOCK(insert, bp);
 	}

 	return ret;
 }

-/*
- * Indicate to the lru walker that this block has been accessed since it
- * was added or last walked.
- */
-static void block_accessed(struct super_block *sb, struct block_private *bp)
+static u64 accessed_recently(struct block_info *binf)
 {
-	if (!test_and_set_bit(BLOCK_BIT_ACCESSED, &bp->bits))
-		scoutfs_inc_counter(sb, block_cache_access_update);
+	return atomic64_read(&binf->access_counter) - (atomic_read(&binf->total_inserted) >> 1);
 }

 /*
- * Remove the block from the cache.  When this returns the block won't
- * be visible for additional references from lookup.
- *
- * We always try and remove from the hash table.  It's safe to remove a
- * block that isn't hashed, it just returns -ENOENT.
- *
- * This is racing with the lru walk in the shrinker also trying to
- * remove idle blocks from the cache.  They both try to remove the live
- * refcount base and perform their removal and put if they get it.
+ * Make sure that a block that is being accessed is less likely to be
+ * reclaimed if it is seen by the shrinker.  If the block hasn't been
+ * accessed recently we update its accessed value.
 */
-static void block_remove(struct super_block *sb, struct block_private *bp)
+static void block_accessed(struct super_block *sb, struct block_private *bp)
 {
 	DECLARE_BLOCK_INFO(sb, binf);

-	rhashtable_remove_fast(&binf->ht, &bp->ht_head, block_ht_params);
-
-	if (block_get_remove_live(bp)) {
-		list_lru_del_obj(&binf->lru, &bp->lru_head);
-		block_put(sb, bp);
+	if (bp->accessed == 0 || bp->accessed < accessed_recently(binf)) {
+		scoutfs_inc_counter(sb, block_cache_access_update);
+		bp->accessed = atomic64_inc_return(&binf->access_counter);
 	}
 }
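A small user-space model may help with the two-group scheme. A block is stamped from a shared counter only when it has fallen below a threshold: the counter minus half the population. Repeated hits on recent blocks therefore never touch the shared counter.

This is a sketch under stated assumptions:

- C11 atomics replace `atomic64_t`.
- The names are hypothetical.
- Clamping the threshold at zero is a choice of this sketch rather than something the patch does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct cache {
	atomic_int total_inserted;		/* blocks currently hashed */
	_Atomic uint64_t access_counter;	/* bumped when a block is restamped */
};

struct cached_blk {
	uint64_t accessed;	/* 0 means never stamped */
};

/* Blocks stamped at or above this value are in the recent group. */
static uint64_t recent_threshold(struct cache *c)
{
	uint64_t counter = atomic_load(&c->access_counter);
	uint64_t half = (uint64_t)(atomic_load(&c->total_inserted) >> 1);

	/* clamping at 0 is a choice of this sketch, not the patch */
	return counter > half ? counter - half : 0;
}

/* Restamp a block only when it has fallen into the old group. */
static void mark_accessed(struct cache *c, struct cached_blk *b)
{
	if (b->accessed == 0 || b->accessed < recent_threshold(c))
		b->accessed = atomic_fetch_add(&c->access_counter, 1) + 1;
}

/* Old blocks are the shrinker's candidates for removal. */
static bool is_old(struct cache *c, const struct cached_blk *b)
{
	return b->accessed < recent_threshold(c);
}
```

Take four blocks, each accessed once. The counter reaches 4 and the threshold is 2, so only the block stamped 1 is old. Accessing a recent block again leaves the counter alone. Accessing the old block restamps it to 5, which raises the threshold to 3.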

+/*
+ * The caller wants to remove the block from the hash table and has an
+ * idea what the refcount should be.  If the refcount does still
+ * indicate that the block is hashed, and we're able to clear that bit,
+ * then we can remove it from the hash table.
+ *
+ * The caller makes sure that it's safe to be referencing this block,
+ * either with their own held reference (most everything) or by being in
+ * an rcu grace period (shrink).
+ */
+static bool block_remove_cnt(struct super_block *sb, struct block_private *bp, int cnt)
+{
+	DECLARE_BLOCK_INFO(sb, binf);
+	int ret;
+
+	if ((cnt & BLOCK_REF_INSERTED) &&
+	    (atomic_cmpxchg(&bp->refcount, cnt, cnt & ~BLOCK_REF_INSERTED) == cnt)) {
+
+		TRACE_BLOCK(remove, bp);
+		ret = rhashtable_remove_fast(&binf->ht, &bp->ht_head, block_ht_params);
+		WARN_ON_ONCE(ret); /* must have been inserted */
+		atomic_dec(&binf->total_inserted);
+		return true;
+	}
+
+	return false;
+}
+
+/*
+ * Try to remove the block from the hash table as long as the refcount
+ * indicates that it is still in the hash table.  This can be racing
+ * with normal refcount changes so it might have to retry.
+ */
+static void block_remove(struct super_block *sb, struct block_private *bp)
+{
+	int cnt;
+
+	do {
+		cnt = atomic_read(&bp->refcount);
+	} while ((cnt & BLOCK_REF_INSERTED) && !block_remove_cnt(sb, bp, cnt));
+}
+
+/*
+ * Take one shot at removing the block from the hash table if it's still
+ * in the hash table and the caller has the only other reference.
+ */
+static bool block_remove_solo(struct super_block *sb, struct block_private *bp)
+{
+	return block_remove_cnt(sb, bp, BLOCK_REF_INSERTED | 1);
+}
+
 static bool io_busy(struct block_private *bp)
 {
 	smp_rmb(); /* test after adding to wait queue */
 	return test_bit(BLOCK_BIT_IO_BUSY, &bp->bits);
 }

+/*
+ * Called during shutdown with no other users.
+ */
+static void block_remove_all(struct super_block *sb)
+{
+	DECLARE_BLOCK_INFO(sb, binf);
+	struct rhashtable_iter iter;
+	struct block_private *bp;
+
+	rhashtable_walk_enter(&binf->ht, &iter);
+	rhashtable_walk_start(&iter);
+
+	for (;;) {
+		bp = rhashtable_walk_next(&iter);
+		if (bp == NULL)
+			break;
+		if (bp == ERR_PTR(-EAGAIN))
+			continue;
+
+		if (block_get_if_inserted(bp)) {
+			block_remove(sb, bp);
+			WARN_ON_ONCE(atomic_read(&bp->refcount) != 1);
+			block_put(sb, bp);
+		}
+	}
+
+	rhashtable_walk_stop(&iter);
+	rhashtable_walk_exit(&iter);
+
+	WARN_ON_ONCE(atomic_read(&binf->total_inserted) != 0);
+}

 /*
 * XXX The io_count and sb fields in the block_private are only used
@@ -468,6 +487,9 @@ static int block_submit_bio(struct super_block *sb, struct block_private *bp,
 	sector_t sector;
 	int ret = 0;

+	if (scoutfs_forcing_unmount(sb))
+		return -EIO;
+
 	sector = bp->bl.blkno << (SCOUTFS_BLOCK_LG_SHIFT - 9);

 	WARN_ON_ONCE(bp->bl.blkno == U64_MAX);
@@ -478,17 +500,6 @@ static int block_submit_bio(struct super_block *sb, struct block_private *bp,
 	set_bit(BLOCK_BIT_IO_BUSY, &bp->bits);
 	block_get(bp);

-	/*
-	 * A second thread may already be waiting on this block's completion
-	 * after this thread won the race to submit the block.  We exit through
-	 * the block_end_io error path which sets BLOCK_BIT_ERROR and assures
-	 * that other callers in the waitq get woken up.
-	 */
-	if (scoutfs_forcing_unmount(sb)) {
-		ret = -ENOLINK;
-		goto end_io;
-	}
-
 	blk_start_plug(&plug);

 	for (off = 0; off < SCOUTFS_BLOCK_LG_SIZE; off += PAGE_SIZE) {
@@ -526,17 +537,12 @@ static int block_submit_bio(struct super_block *sb, struct block_private *bp,

 	blk_finish_plug(&plug);

-end_io:
 	/* let racing end_io know we're done */
 	block_end_io(sb, opf, bp, ret);

 	return ret;
 }

-/*
- * Return a block with an elevated refcount if it was present in the
- * hash table and its refcount didn't indicate that it was being freed.
- */
 static struct block_private *block_lookup(struct super_block *sb, u64 blkno)
 {
 	DECLARE_BLOCK_INFO(sb, binf);
@@ -544,8 +550,8 @@ static struct block_private *block_lookup(struct super_block *sb, u64 blkno)

 	rcu_read_lock();
 	bp = rhashtable_lookup(&binf->ht, &blkno, block_ht_params);
-	if (bp && !block_get_if_live(bp))
-		bp = NULL;
+	if (bp)
+		bp = block_get_if_inserted(bp);
 	rcu_read_unlock();

 	return bp;
@@ -706,8 +712,8 @@ retry:

 	ret = 0;
 out:
-	if (!retried && !IS_ERR_OR_NULL(bp) && !block_is_dirty(bp) &&
-	    (ret == -ESTALE || scoutfs_trigger(sb, BLOCK_REMOVE_STALE))) {
+	if ((ret == -ESTALE || scoutfs_trigger(sb, BLOCK_REMOVE_STALE)) &&
+	    !retried && !block_is_dirty(bp)) {
 		retried = true;
 		scoutfs_inc_counter(sb, block_cache_remove_stale);
 		block_remove(sb, bp);
@@ -1072,106 +1078,100 @@ static unsigned long block_count_objects(struct shrinker *shrink, struct shrink_
 	struct super_block *sb = binf->sb;

 	scoutfs_inc_counter(sb, block_cache_count_objects);
-	return list_lru_shrink_count(&binf->lru, sc);
-}
-
-struct isolate_args {
-	struct super_block *sb;
-	struct list_head dispose;
-};
-
-#define DECLARE_ISOLATE_ARGS(sb_, name_) \
-	struct isolate_args name_ = { \
-		.sb = sb_, \
-		.dispose = LIST_HEAD_INIT(name_.dispose), \
-	}
-
-static enum lru_status isolate_lru_block(struct list_head *item, struct list_lru_one *list,
-					 void *cb_arg)
-{
-	struct block_private *bp = container_of(item, struct block_private, lru_head);
-	struct isolate_args *ia = cb_arg;
-
-	TRACE_BLOCK(isolate, bp);
-
-	/* rotate accessed blocks to the tail of the list (lazy promotion) */
-	if (test_and_clear_bit(BLOCK_BIT_ACCESSED, &bp->bits)) {
-		scoutfs_inc_counter(ia->sb, block_cache_isolate_rotate);
-		return LRU_ROTATE;
-	}
-
-	/* any refs, including dirty/io, stop us from acquiring lru refcount */
-	if (!block_get_remove_live_only(bp)) {
-		scoutfs_inc_counter(ia->sb, block_cache_isolate_skip);
-		return LRU_SKIP;
-	}
-
-	scoutfs_inc_counter(ia->sb, block_cache_isolate_removed);
-	list_lru_isolate_move(list, &bp->lru_head, &ia->dispose);
-	return LRU_REMOVED;
-}
-
-static void shrink_dispose_blocks(struct super_block *sb, struct list_head *dispose)
-{
-	struct block_private *bp;
-	struct block_private *bp__;
-
-	list_for_each_entry_safe(bp, bp__, dispose, lru_head) {
-		list_del_init(&bp->lru_head);
-		block_remove(sb, bp);
-		block_put(sb, bp);
-	}
+
+	return shrinker_min_long(atomic_read(&binf->total_inserted));
 }

+/*
+ * Remove a number of cached blocks that haven't been used recently.
+ *
+ * We don't maintain a strictly ordered LRU to avoid the contention of
+ * accesses always moving blocks around in some precise global
+ * structure.
+ *
+ * Instead we use counters to divide the blocks into two roughly equal
+ * groups by how recently they were accessed.  We randomly walk all
+ * inserted blocks looking for any blocks in the older half to remove
+ * and free.  The random walk and there being two groups means that we
+ * typically only walk a small multiple of the number we're looking for
+ * before we find them all.
+ *
+ * Our rcu walk of blocks can see blocks in all stages of their life
+ * cycle, from dirty blocks to those with 0 references that are queued
+ * for freeing.  We only want to free idle inserted blocks so we
+ * atomically remove blocks when the only references are ours and the
+ * hash table.
+ */
 static unsigned long block_scan_objects(struct shrinker *shrink, struct shrink_control *sc)
 {
 	struct block_info *binf = KC_SHRINKER_CONTAINER_OF(shrink, struct block_info);
 	struct super_block *sb = binf->sb;
-	DECLARE_ISOLATE_ARGS(sb, ia);
-	unsigned long freed;
+	struct rhashtable_iter iter;
+	struct block_private *bp;
+	bool stop = false;
+	unsigned long freed = 0;
+	unsigned long nr = sc->nr_to_scan;
+	u64 recently;

 	scoutfs_inc_counter(sb, block_cache_scan_objects);

-	freed = kc_list_lru_shrink_walk(&binf->lru, sc, isolate_lru_block, &ia);
-	shrink_dispose_blocks(sb, &ia.dispose);
-	return freed;
-}
+	recently = accessed_recently(binf);
+	rhashtable_walk_enter(&binf->ht, &iter);
+	rhashtable_walk_start(&iter);

-static enum lru_status dump_lru_block(struct list_head *item, struct list_lru_one *list,
-					 void *cb_arg)
-{
-	struct block_private *bp = container_of(item, struct block_private, lru_head);
+	/*
+	 * This isn't great but I don't see a better way.  We want to
+	 * walk the hash from a random point so that we're not
+	 * constantly walking over the same region that we've already
+	 * freed old blocks within.  The interface doesn't let us do
+	 * this explicitly, but this seems to work?  The difference this
+	 * makes is enormous, around a few orders of magnitude fewer
+	 * _nexts per shrink.
+	 */
+	if (iter.walker.tbl)
+		iter.slot = prandom_u32_max(iter.walker.tbl->size);

-	printk("blkno %llu refcount 0x%x io_count %d bits 0x%lx\n",
-		bp->bl.blkno, atomic_read(&bp->refcount), atomic_read(&bp->io_count),
-		bp->bits);
-	print_block_stack(bp);
+	while (nr > 0) {
+		bp = rhashtable_walk_next(&iter);
+		if (bp == NULL)
+			break;
+		if (bp == ERR_PTR(-EAGAIN)) {
+			/*
+			 * We can be called from reclaim in the allocation
+			 * to resize the hash table itself.  We have to
+			 * return so that the caller can proceed and
+			 * enable hash table iteration again.
+			 */
+			scoutfs_inc_counter(sb, block_cache_shrink_stop);
+			stop = true;
+			break;
+		}

-	return LRU_SKIP;
-}
+		scoutfs_inc_counter(sb, block_cache_shrink_next);

-/*
- * Called during shutdown with no other users.  The isolating walk must
- * find blocks on the lru that only have references for presence on the
- * lru and in the hash table.
- */
-static void block_shrink_all(struct super_block *sb)
-{
-	DECLARE_BLOCK_INFO(sb, binf);
-	DECLARE_ISOLATE_ARGS(sb, ia);
-	long count;
+		if (bp->accessed >= recently) {
+			scoutfs_inc_counter(sb, block_cache_shrink_recent);
+			continue;
+		}

-	count = DIV_ROUND_UP(list_lru_count(&binf->lru), 128) * 2;
-	do {
-		kc_list_lru_walk(&binf->lru, isolate_lru_block, &ia, 128);
-		shrink_dispose_blocks(sb, &ia.dispose);
-	} while (list_lru_count(&binf->lru) > 0 && --count > 0);
-
-	count = list_lru_count(&binf->lru);
-	if (count > 0) {
-		scoutfs_err(sb, "failed to isolate/dispose %ld blocks", count);
-		kc_list_lru_walk(&binf->lru, dump_lru_block, sb, count);
+		if (block_get_if_inserted(bp)) {
+			if (block_remove_solo(sb, bp)) {
+				scoutfs_inc_counter(sb, block_cache_shrink_remove);
+				TRACE_BLOCK(shrink, bp);
+				freed++;
+				nr--;
+			}
+			block_put(sb, bp);
+		}
 	}
+
+	rhashtable_walk_stop(&iter);
+	rhashtable_walk_exit(&iter);
+
+	if (stop)
+		return SHRINK_STOP;
+	else
+		return freed;
 }

 struct sm_block_completion {
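A minimal userspace sketch of the two-group idea in block_scan_objects() above. The body of accessed_recently() isn't shown in this diff, so the threshold used here (access counter minus half the inserted count, which protects at most half of the blocks if each access bumps the counter once) is an assumption. A flat array stands in for the rhashtable, the caller picks the random start (the kernel code uses prandom_u32_max()), and all toy_ names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a cached block. */
struct toy_block {
	bool present;      /* slot holds an inserted block */
	uint64_t accessed; /* access counter value at the block's last use */
};

/*
 * Assumed stand-in for accessed_recently(): blocks stamped at or after
 * the returned value belong to the newer group and are skipped.
 */
static uint64_t toy_recent_threshold(uint64_t access_counter, uint64_t nr_inserted)
{
	uint64_t half = nr_inserted / 2;

	return access_counter > half ? access_counter - half : 0;
}

/*
 * Visit each slot at most once, starting at 'start' and wrapping, and
 * free up to 'nr' blocks from the older group.  Returns the number freed.
 */
static unsigned long toy_scan(struct toy_block *blocks, size_t size, size_t start,
			      uint64_t access_counter, uint64_t nr_inserted,
			      unsigned long nr)
{
	uint64_t recently = toy_recent_threshold(access_counter, nr_inserted);
	unsigned long freed = 0;
	size_t i;

	assert(size > 0 && start < size);

	for (i = 0; i < size && nr > 0; i++) {
		struct toy_block *b = &blocks[(start + i) % size];

		/* empty slot or recently used: leave it cached */
		if (!b->present || b->accessed >= recently)
			continue;

		b->present = false;
		freed++;
		nr--;
	}

	return freed;
}
```

Starting each scan at a different slot spreads the work across the table instead of repeatedly walking past the same already-freed region, which is the effect the comment in block_scan_objects() describes.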
@@ -1210,7 +1210,7 @@ static int sm_block_io(struct super_block *sb, struct block_device *bdev, blk_op
 	BUILD_BUG_ON(PAGE_SIZE < SCOUTFS_BLOCK_SM_SIZE);

 	if (scoutfs_forcing_unmount(sb))
-		return -ENOLINK;
+		return -EIO;

 	if (WARN_ON_ONCE(len > SCOUTFS_BLOCK_SM_SIZE) ||
 	    WARN_ON_ONCE(!op_is_write(opf) && !blk_crc))
@@ -1276,7 +1276,7 @@ int scoutfs_block_write_sm(struct super_block *sb,
 int scoutfs_block_setup(struct super_block *sb)
 {
 	struct scoutfs_sb_info *sbi = SCOUTFS_SB(sb);
-	struct block_info *binf = NULL;
+	struct block_info *binf;
 	int ret;

 	binf = kzalloc(sizeof(struct block_info), GFP_KERNEL);
@@ -1285,15 +1285,15 @@ int scoutfs_block_setup(struct super_block *sb)
 		goto out;
 	}

-	ret = list_lru_init(&binf->lru);
-	if (ret < 0)
-		goto out;
-
 	ret = rhashtable_init(&binf->ht, &block_ht_params);
-	if (ret < 0)
+	if (ret < 0) {
+		kfree(binf);
 		goto out;
+	}

 	binf->sb = sb;
+	atomic_set(&binf->total_inserted, 0);
+	atomic64_set(&binf->access_counter, 0);
 	init_waitqueue_head(&binf->waitq);
 	KC_INIT_SHRINKER_FUNCS(&binf->shrinker, block_count_objects,
 			       block_scan_objects);
@@ -1305,10 +1305,8 @@ int scoutfs_block_setup(struct super_block *sb)

 	ret = 0;
 out:
-	if (ret < 0 && binf) {
-		list_lru_destroy(&binf->lru);
-		kfree(binf);
-	}
+	if (ret)
+		scoutfs_block_destroy(sb);

 	return ret;
 }
@@ -1320,10 +1318,9 @@ void scoutfs_block_destroy(struct super_block *sb)

 	if (binf) {
 		KC_UNREGISTER_SHRINKER(&binf->shrinker);
-		block_shrink_all(sb);
+		block_remove_all(sb);
 		flush_work(&binf->free_work);
 		rhashtable_destroy(&binf->ht);
-		list_lru_destroy(&binf->lru);

 		kfree(binf);
 		sbi->block_info = NULL;
@@ -2183,8 +2183,6 @@ static int merge_read_item(struct super_block *sb, struct scoutfs_key *key, u64
 		if (ret > 0) {
 			if (ret == SCOUTFS_DELTA_COMBINED) {
 				scoutfs_inc_counter(sb, btree_merge_delta_combined);
-				if (seq > found->seq)
-					found->seq = seq;
 			} else if (ret == SCOUTFS_DELTA_COMBINED_NULL) {
 				scoutfs_inc_counter(sb, btree_merge_delta_null);
 				free_mitem(rng, found);
@@ -2488,14 +2486,6 @@ int scoutfs_btree_merge(struct super_block *sb,
 			mitem = next_mitem(mitem);
 			free_mitem(&rng, tmp);
 		}
-
-		if (mitem && walk_val_len == 0 &&
-		    !(walk_flags & (BTW_INSERT | BTW_DELETE)) &&
-		    scoutfs_trigger(sb, LOG_MERGE_FORCE_PARTIAL)) {
-			ret = -ERANGE;
-			*next_ret = mitem->key;
-			goto out;
-		}
 	}

 	ret = 0;
@@ -435,8 +435,8 @@ static int lookup_mounted_client_item(struct super_block *sb, u64 rid)
 	if (ret == -ENOENT)
 		ret = 0;

-out:
 	kfree(super);
+out:
 	return ret;
 }

@@ -26,15 +26,17 @@
 	EXPAND_COUNTER(block_cache_alloc_page_order)		\
 	EXPAND_COUNTER(block_cache_alloc_virt)			\
 	EXPAND_COUNTER(block_cache_end_io_error)		\
-	EXPAND_COUNTER(block_cache_isolate_removed)		\
-	EXPAND_COUNTER(block_cache_isolate_rotate)		\
-	EXPAND_COUNTER(block_cache_isolate_skip)		\
 	EXPAND_COUNTER(block_cache_forget)			\
 	EXPAND_COUNTER(block_cache_free)			\
 	EXPAND_COUNTER(block_cache_free_work)			\
 	EXPAND_COUNTER(block_cache_remove_stale)		\
 	EXPAND_COUNTER(block_cache_count_objects)		\
 	EXPAND_COUNTER(block_cache_scan_objects)		\
+	EXPAND_COUNTER(block_cache_shrink)			\
+	EXPAND_COUNTER(block_cache_shrink_next)			\
+	EXPAND_COUNTER(block_cache_shrink_recent)		\
+	EXPAND_COUNTER(block_cache_shrink_remove)		\
+	EXPAND_COUNTER(block_cache_shrink_stop)			\
 	EXPAND_COUNTER(btree_compact_values)			\
 	EXPAND_COUNTER(btree_compact_values_enomem)		\
 	EXPAND_COUNTER(btree_delete)				\
@@ -88,7 +90,6 @@
 	EXPAND_COUNTER(forest_read_items)			\
 	EXPAND_COUNTER(forest_roots_next_hint)			\
 	EXPAND_COUNTER(forest_set_bloom_bits)			\
-	EXPAND_COUNTER(inode_deleted)				\
 	EXPAND_COUNTER(item_cache_count_objects)		\
 	EXPAND_COUNTER(item_cache_scan_objects)			\
 	EXPAND_COUNTER(item_clear_dirty)			\
@@ -116,15 +117,15 @@
 	EXPAND_COUNTER(item_pcpu_page_hit)			\
 	EXPAND_COUNTER(item_pcpu_page_miss)			\
 	EXPAND_COUNTER(item_pcpu_page_miss_keys)		\
-	EXPAND_COUNTER(item_read_pages_barrier)			\
-	EXPAND_COUNTER(item_read_pages_retry)			\
 	EXPAND_COUNTER(item_read_pages_split)			\
 	EXPAND_COUNTER(item_shrink_page)			\
 	EXPAND_COUNTER(item_shrink_page_dirty)			\
+	EXPAND_COUNTER(item_shrink_page_reader)			\
 	EXPAND_COUNTER(item_shrink_page_trylock)		\
 	EXPAND_COUNTER(item_update)				\
 	EXPAND_COUNTER(item_write_dirty)			\
 	EXPAND_COUNTER(lock_alloc)				\
+	EXPAND_COUNTER(lock_count_objects)			\
 	EXPAND_COUNTER(lock_free)				\
 	EXPAND_COUNTER(lock_grant_request)			\
 	EXPAND_COUNTER(lock_grant_response)			\
@@ -138,13 +139,12 @@
 	EXPAND_COUNTER(lock_lock_error)				\
 	EXPAND_COUNTER(lock_nonblock_eagain)			\
 	EXPAND_COUNTER(lock_recover_request)			\
+	EXPAND_COUNTER(lock_scan_objects)			\
 	EXPAND_COUNTER(lock_shrink_attempted)			\
-	EXPAND_COUNTER(lock_shrink_request_failed)		\
+	EXPAND_COUNTER(lock_shrink_aborted)			\
+	EXPAND_COUNTER(lock_shrink_work)			\
 	EXPAND_COUNTER(lock_unlock)				\
 	EXPAND_COUNTER(lock_wait)				\
-	EXPAND_COUNTER(log_merge_complete)			\
-	EXPAND_COUNTER(log_merge_no_finalized)			\
-	EXPAND_COUNTER(log_merge_start)				\
 	EXPAND_COUNTER(log_merge_wait_timeout)			\
 	EXPAND_COUNTER(net_dropped_response)			\
 	EXPAND_COUNTER(net_send_bytes)				\
@@ -159,7 +159,6 @@
 	EXPAND_COUNTER(orphan_scan)				\
 	EXPAND_COUNTER(orphan_scan_attempts)			\
 	EXPAND_COUNTER(orphan_scan_cached)			\
-	EXPAND_COUNTER(orphan_scan_empty)			\
 	EXPAND_COUNTER(orphan_scan_error)			\
 	EXPAND_COUNTER(orphan_scan_item)			\
 	EXPAND_COUNTER(orphan_scan_omap_set)			\
@@ -182,7 +181,6 @@
 	EXPAND_COUNTER(quorum_send_vote)			\
 	EXPAND_COUNTER(quorum_server_shutdown)			\
 	EXPAND_COUNTER(quorum_term_follower)			\
-	EXPAND_COUNTER(reclaimed_open_logs)			\
 	EXPAND_COUNTER(server_commit_hold)			\
 	EXPAND_COUNTER(server_commit_queue)			\
 	EXPAND_COUNTER(server_commit_worker)			\
@@ -79,10 +79,8 @@ static void item_from_extent(struct scoutfs_key *key,
 		.skdx_end = cpu_to_le64(start + len - 1),
 		.skdx_len = cpu_to_le64(len),
 	};
-	*dv = (struct scoutfs_data_extent_val) {
-		.blkno = cpu_to_le64(map),
-		.flags = flags,
-	};
+	dv->blkno = cpu_to_le64(map);
+	dv->flags = flags;
 }

 static void ext_from_item(struct scoutfs_extent *ext,
@@ -422,8 +420,6 @@ static int alloc_block(struct super_block *sb, struct inode *inode,

 	mutex_lock(&datinf->mutex);

-	scoutfs_inode_get_onoff(inode, &online, &offline);
-
 	/* default to single allocation at the written block */
 	start = iblock;
 	count = 1;
@@ -446,6 +442,7 @@ static int alloc_block(struct super_block *sb, struct inode *inode,
 		 * the preallocation size to the number of online
 		 * blocks.
 		 */
+		scoutfs_inode_get_onoff(inode, &online, &offline);
 		if (iblock > 1 && iblock == online) {
 			ret = scoutfs_ext_next(sb, &data_ext_ops, &args,
 					       iblock, 1, &found);
@@ -487,13 +484,6 @@ static int alloc_block(struct super_block *sb, struct inode *inode,
 		/* trim count by next extent after iblock */
 		if (found.len && found.start > start && found.start < start + count)
 			count = (found.start - start);
-
-		/*
-		 * Ramp the aligned region size up proportionally with
-		 * the file's online block count rather than jumping to
-		 * the full prealloc size.
-		 */
-		count = max_t(u64, 1, min(count, online));
 	}

 	/* overall prealloc limit */
@@ -1525,101 +1515,6 @@ out:
 	return ret;
 }

-/*
- * Punch holes in offline extents.  This is a very specific tool that
- * only does one job: it converts extents from offline to sparse.  It
- * returns an error if it encounters an extent that isn't offline or has
- * a block mapping.  It ignores i_size completely; it does not test it,
- * and does not update it.
- *
- * The caller has the inode locked in the vfs and performed basic sanity
- * checks.  We manage transactions and the extent_sem which is ordered
- * inside the transaction.
- */
-int scoutfs_data_punch_offline(struct inode *inode, u64 iblock, u64 last, u64 data_version,
-			       struct scoutfs_lock *lock)
-{
-	struct scoutfs_inode_info *si = SCOUTFS_I(inode);
-	struct super_block *sb = inode->i_sb;
-	struct data_ext_args args = {
-		.ino = scoutfs_ino(inode),
-		.inode = inode,
-		.lock = lock,
-	};
-	struct scoutfs_extent ext;
-	LIST_HEAD(ind_locks);
-	int ret;
-	int i;
-
-	if (WARN_ON_ONCE(iblock > last)) {
-		ret = -EINVAL;
-		goto out;
-	}
-
-	/* idiomatic to call start,last with 0,~0, clamp last to last possible */
-	last = min(last, SCOUTFS_BLOCK_SM_MAX);
-
-	ret = 0;
-	while (iblock <= last) {
-		ret = scoutfs_inode_index_lock_hold(inode, &ind_locks, true, false) ?:
-		      scoutfs_dirty_inode_item(inode, lock);
-		if (ret < 0)
-			break;
-
-		down_write(&si->extent_sem);
-
-		for (i = 0; i < 32 && (iblock <= last); i++) {
-			ret = scoutfs_ext_next(sb, &data_ext_ops, &args, iblock, 1, &ext);
-			if (ret == -ENOENT) {
-				iblock = last + 1;
-				ret = 0;
-				break;
-			}
-
-			if (ret < 0)
-				break;
-
-			if (ext.start > last) {
-				iblock = last + 1;
-				break;
-			}
-
-			if (ext.map) {
-				ret = -EINVAL;
-				break;
-			}
-
-			if (ext.flags & SEF_OFFLINE) {
-				if (iblock > ext.start) {
-					ext.len -= iblock - ext.start;
-					ext.start = iblock;
-				}
-				ext.len = min(ext.len, last - ext.start + 1);
-				ext.flags &= ~SEF_OFFLINE;
-
-				ret = scoutfs_ext_set(sb, &data_ext_ops, &args,
-						      ext.start, ext.len, ext.map, ext.flags);
-				if (ret < 0)
-					break;
-			}
-
-			iblock = ext.start + ext.len;
-		}
-
-		up_write(&si->extent_sem);
-
-		scoutfs_update_inode_item(inode, lock, &ind_locks);
-		scoutfs_release_trans(sb);
-		scoutfs_inode_index_unlock(sb, &ind_locks);
-
-		if (ret < 0)
-			break;
-	}
-
-out:
-	return ret;
-}
-
 /*
 * This copies to userspace :/
 */
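The offline-to-sparse conversion removed above trims each offline extent to the requested [iblock, last] block range before clearing SEF_OFFLINE. The trimming arithmetic can be checked in isolation with a sketch like this. toy_extent and toy_clamp() are hypothetical, and the sketch assumes, as the removed loop does, that the extent came from a lookup at iblock (so it ends at or after iblock) and starts no later than last.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical minimal extent: 'len' blocks starting at block 'start'. */
struct toy_extent {
	uint64_t start;
	uint64_t len;
};

/*
 * Trim an extent that overlaps [iblock, last] so that it covers only
 * the overlapping blocks, following the removed punch loop.
 */
static struct toy_extent toy_clamp(struct toy_extent ext, uint64_t iblock, uint64_t last)
{
	/* caller guarantees the extent overlaps the range */
	assert(ext.start <= last && ext.start + ext.len > iblock);

	/* drop the part before iblock */
	if (iblock > ext.start) {
		ext.len -= iblock - ext.start;
		ext.start = iblock;
	}

	/* drop the part after last */
	if (ext.len > last - ext.start + 1)
		ext.len = last - ext.start + 1;

	return ext;
}
```

For example, an extent covering blocks 10..29 clamped to the range 15..24 becomes 10 blocks starting at 15, and the removed loop then continues from iblock = ext.start + ext.len = 25.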
@@ -57,8 +57,6 @@ int scoutfs_data_init_offline_extent(struct inode *inode, u64 size,
 int scoutfs_data_move_blocks(struct inode *from, u64 from_off,
 			     u64 byte_len, struct inode *to, u64 to_off, bool to_stage,
 			     u64 data_version);
-int scoutfs_data_punch_offline(struct inode *inode, u64 iblock, u64 last, u64 data_version,
-			       struct scoutfs_lock *lock);

 int scoutfs_data_wait_check(struct inode *inode, loff_t pos, loff_t len,
 			    u8 sef, u8 op, struct scoutfs_data_wait *ow,
@@ -587,12 +587,10 @@ static int add_entry_items(struct super_block *sb, u64 dir_ino, u64 hash,
 	}

 	/* initialize the dent */
-	*dent = (struct scoutfs_dirent) {
-		.ino = cpu_to_le64(ino),
-		.hash = cpu_to_le64(hash),
-		.pos = cpu_to_le64(pos),
-		.type = mode_to_type(mode),
-	};
+	dent->ino = cpu_to_le64(ino);
+	dent->hash = cpu_to_le64(hash);
+	dent->pos = cpu_to_le64(pos);
+	dent->type = mode_to_type(mode);
 	memcpy(dent->name, name, name_len);

 	init_dirent_key(&ent_key, SCOUTFS_DIRENT_TYPE, dir_ino, hash, pos);
@@ -2008,11 +2006,7 @@ const struct inode_operations scoutfs_symlink_iops = {
 #ifdef KC_LINUX_HAVE_RHEL_IOPS_WRAPPER
 	.removexattr	= generic_removexattr,
 #endif
-#ifdef KC_GET_INODE_ACL
-	.get_inode_acl	= scoutfs_get_acl,
-#else
 	.get_acl	= scoutfs_get_acl,
-#endif
 #ifndef KC_LINUX_HAVE_RHEL_IOPS_WRAPPER
 	.tmpfile	= scoutfs_tmpfile,
 	.rename		= scoutfs_rename_common,
@@ -2058,14 +2052,7 @@ const struct inode_operations scoutfs_dir_iops = {
 	.removexattr	= generic_removexattr,
 #endif
 	.listxattr	= scoutfs_listxattr,
-#ifdef KC_GET_INODE_ACL
-	.get_inode_acl	= scoutfs_get_acl,
-#else
 	.get_acl	= scoutfs_get_acl,
-#endif
-#ifdef KC_HAS_SET_ACL
-	.set_acl	= scoutfs_set_acl,
-#endif
 	.symlink	= scoutfs_symlink,
 	.permission	= scoutfs_permission,
 #ifdef KC_LINUX_HAVE_RHEL_IOPS_WRAPPER
@@ -222,7 +222,7 @@ static struct attribute *fence_attrs[] = {

 static void fence_timeout(struct timer_list *timer)
 {
-	struct pending_fence *fence = timer_container_of(fence, timer, timer);
+	struct pending_fence *fence = from_timer(fence, timer, timer);
 	struct super_block *sb = fence->sb;
 	DECLARE_FENCE_INFO(sb, fi);

@@ -239,9 +239,9 @@ static int forest_read_items(struct super_block *sb, struct scoutfs_key *key, u6
 * to reset their state and retry with a newer version of the btrees.
 */
 int scoutfs_forest_read_items_roots(struct super_block *sb, struct scoutfs_net_roots *roots,
-				    u64 merge_input_seq, struct scoutfs_key *key,
-				    struct scoutfs_key *bloom_key, struct scoutfs_key *start,
-				    struct scoutfs_key *end, scoutfs_forest_item_cb cb, void *arg)
+				    struct scoutfs_key *key, struct scoutfs_key *bloom_key,
+				    struct scoutfs_key *start, struct scoutfs_key *end,
+				    scoutfs_forest_item_cb cb, void *arg)
 {
 	struct forest_read_items_data rid = {
 		.cb = cb,
@@ -317,17 +317,15 @@ int scoutfs_forest_read_items_roots(struct super_block *sb, struct scoutfs_net_r

 		scoutfs_inc_counter(sb, forest_bloom_pass);

-		if ((le64_to_cpu(lt.flags) & SCOUTFS_LOG_TREES_FINALIZED) &&
-		    (merge_input_seq == 0 ||
-		     le64_to_cpu(lt.finalize_seq) < merge_input_seq))
-			rid.fic |= FIC_MERGE_INPUT;
+		if ((le64_to_cpu(lt.flags) & SCOUTFS_LOG_TREES_FINALIZED))
+			rid.fic |= FIC_FINALIZED;

 		ret = scoutfs_btree_read_items(sb, &lt.item_root, key, start,
 					       end, forest_read_items, &rid);
 		if (ret < 0)
 			goto out;

-		rid.fic &= ~FIC_MERGE_INPUT;
+		rid.fic &= ~FIC_FINALIZED;
 	}

 	ret = 0;
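The removed branch in scoutfs_forest_read_items_roots() only flags a log tree's items as merge input when the tree is finalized and, if the caller passed a nonzero merge_input_seq, the tree was finalized before that seq. Here is the predicate in isolation, with a hypothetical flag value standing in for SCOUTFS_LOG_TREES_FINALIZED:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit standing in for SCOUTFS_LOG_TREES_FINALIZED. */
#define TOY_LOG_TREES_FINALIZED (1ULL << 0)

/*
 * A log tree counts as merge input if it's finalized and either no
 * merge input seq was given (0) or it was finalized before that seq.
 */
static bool toy_is_merge_input(uint64_t lt_flags, uint64_t finalize_seq,
			       uint64_t merge_input_seq)
{
	return (lt_flags & TOY_LOG_TREES_FINALIZED) &&
	       (merge_input_seq == 0 || finalize_seq < merge_input_seq);
}
```

Passing 0, as scoutfs_forest_read_items() does on the removed side, reduces the test to "is the tree finalized", which matches the FIC_FINALIZED check on the added side.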
@@ -347,7 +345,7 @@ int scoutfs_forest_read_items(struct super_block *sb,

 	ret = scoutfs_client_get_roots(sb, &roots);
 	if (ret == 0)
-		ret = scoutfs_forest_read_items_roots(sb, &roots, 0, key, bloom_key, start, end,
+		ret = scoutfs_forest_read_items_roots(sb, &roots, key, bloom_key, start, end,
 						      cb, arg);
 	return ret;
 }
@@ -795,7 +793,7 @@ out:
 	if (ret)
 		scoutfs_forest_destroy(sb);

-	return ret;
+	return 0;
 }

 void scoutfs_forest_start(struct super_block *sb)
@@ -11,7 +11,7 @@ struct scoutfs_lock;
 /* caller gives an item to the callback */
 enum {
 	FIC_FS_ROOT = (1 << 0),
-	FIC_MERGE_INPUT = (1 << 1),
+	FIC_FINALIZED = (1 << 1),
 };
 typedef int (*scoutfs_forest_item_cb)(struct super_block *sb, struct scoutfs_key *key, u64 seq,
 				      u8 flags, void *val, int val_len, int fic, void *arg);
@@ -25,9 +25,9 @@ int scoutfs_forest_read_items(struct super_block *sb,
 			      struct scoutfs_key *end,
 			      scoutfs_forest_item_cb cb, void *arg);
 int scoutfs_forest_read_items_roots(struct super_block *sb, struct scoutfs_net_roots *roots,
-				    u64 merge_input_seq, struct scoutfs_key *key,
-				    struct scoutfs_key *bloom_key, struct scoutfs_key *start,
-				    struct scoutfs_key *end, scoutfs_forest_item_cb cb, void *arg);
+				    struct scoutfs_key *key, struct scoutfs_key *bloom_key,
+				    struct scoutfs_key *start, struct scoutfs_key *end,
+				    scoutfs_forest_item_cb cb, void *arg);
 int scoutfs_forest_set_bloom_bits(struct super_block *sb,
 				  struct scoutfs_lock *lock);
 void scoutfs_forest_set_max_seq(struct super_block *sb, u64 max_seq);
@@ -470,7 +470,7 @@ struct scoutfs_srch_compact {
 * @get_trans_seq, @commit_trans_seq: This pair of sequence numbers
 * determine if a transaction is currently open for the mount that owns
 * the log_trees struct.  get_trans_seq is advanced by the server as the
- * transaction is opened.   The server sets commit_trans_seq equal to
+ * transaction is opened.   The server sets comimt_trans_seq equal to
 * get_ as the transaction is committed.
 */
 struct scoutfs_log_trees {
@@ -1091,8 +1091,7 @@ enum scoutfs_net_cmd {
 	EXPAND_NET_ERRNO(ENOMEM)	\
 	EXPAND_NET_ERRNO(EIO)		\
 	EXPAND_NET_ERRNO(ENOSPC)	\
-	EXPAND_NET_ERRNO(EINVAL)	\
-	EXPAND_NET_ERRNO(ENOLINK)
+	EXPAND_NET_ERRNO(EINVAL)

 #undef EXPAND_NET_ERRNO
 #define EXPAND_NET_ERRNO(which) SCOUTFS_NET_ERR_##which,
@@ -149,14 +149,7 @@ static const struct inode_operations scoutfs_file_iops = {
 	.removexattr	= generic_removexattr,
 #endif
 	.listxattr	= scoutfs_listxattr,
-#ifdef KC_GET_INODE_ACL
-	.get_inode_acl	= scoutfs_get_acl,
-#else
 	.get_acl	= scoutfs_get_acl,
-#endif
-#ifdef KC_HAS_SET_ACL
-	.set_acl	= scoutfs_set_acl,
-#endif
 	.fiemap		= scoutfs_data_fiemap,
 };

@@ -169,14 +162,7 @@ static const struct inode_operations scoutfs_special_iops = {
 	.removexattr	= generic_removexattr,
 #endif
 	.listxattr	= scoutfs_listxattr,
-#ifdef KC_GET_INODE_ACL
-	.get_inode_acl	= scoutfs_get_acl,
-#else
 	.get_acl	= scoutfs_get_acl,
-#endif
-#ifdef KC_HAS_SET_ACL
-	.set_acl	= scoutfs_set_acl,
-#endif
 };

 /*
@@ -549,7 +535,6 @@ retry:
 				goto out;
 			if (scoutfs_data_wait_found(&dw)) {
 				scoutfs_unlock(sb, lock, SCOUTFS_LOCK_WRITE);
-				lock = NULL;

 				/* XXX callee locks instead? */
 				inode_unlock(inode);
@@ -1491,6 +1476,12 @@ static int remove_index_items(struct super_block *sb, u64 ino,
 * Return an allocated and unused inode number.  Returns -ENOSPC if
 * we're out of inode.
 *
+ * Each parent directory has its own pool of free inode numbers.  Items
+ * are sorted by their inode numbers as they're stored in segments.
+ * This tends to group files created in the same directory at the
+ * same time into the same segments.  Files created concurrently in
+ * different directories are stored in their own regions.
+ *
 * Inode numbers are never reclaimed.  If the inode is evicted or we're
 * unmounted the pending inode numbers will be lost.  Asking for a
 * relatively small number from the server each time will tend to
@@ -1500,18 +1491,12 @@ static int remove_index_items(struct super_block *sb, u64 ino,
 int scoutfs_alloc_ino(struct super_block *sb, bool is_dir, u64 *ino_ret)
 {
 	DECLARE_INODE_SB_INFO(sb, inf);
-	struct scoutfs_mount_options opts;
 	struct inode_allocator *ia;
 	u64 ino;
 	u64 nr;
 	int ret;

-	scoutfs_options_read(sb, &opts);
-
-	if (is_dir && opts.ino_alloc_per_lock == SCOUTFS_LOCK_INODE_GROUP_NR)
-		ia = &inf->dir_ino_alloc;
-	else
-		ia = &inf->ino_alloc;
+	ia = is_dir ? &inf->dir_ino_alloc : &inf->ino_alloc;

 	spin_lock(&ia->lock);

@@ -1532,17 +1517,6 @@ int scoutfs_alloc_ino(struct super_block *sb, bool is_dir, u64 *ino_ret)
 	*ino_ret = ia->ino++;
 	ia->nr--;

-	if (opts.ino_alloc_per_lock != SCOUTFS_LOCK_INODE_GROUP_NR) {
-		nr = ia->ino & SCOUTFS_LOCK_INODE_GROUP_MASK;
-		if (nr >= opts.ino_alloc_per_lock) {
-			nr = SCOUTFS_LOCK_INODE_GROUP_NR - nr;
-			if (nr > ia->nr)
-				nr = ia->nr;
-			ia->ino += nr;
-			ia->nr -= nr;
-		}
-	}
-
 	spin_unlock(&ia->lock);
 	ret = 0;
 out:
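The removed ino_alloc_per_lock handling in scoutfs_alloc_ino() limits how many inode numbers are handed out from each lock group before the pool skips ahead to the next group. The following sketch shows that arithmetic. It assumes a hypothetical power-of-two group size of 1024 (the real SCOUTFS_LOCK_INODE_GROUP_NR value isn't shown in this diff) and leaves out the spinlock and the server refill path.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical group size, assumed to be a power of two for the mask. */
#define TOY_GROUP_NR   1024ULL
#define TOY_GROUP_MASK (TOY_GROUP_NR - 1)

/* Hypothetical stand-in for struct inode_allocator. */
struct toy_ino_pool {
	uint64_t ino; /* next inode number to hand out */
	uint64_t nr;  /* inode numbers remaining in the pool */
};

/*
 * Hand out one inode number.  If the next number's offset within its
 * lock group has reached per_lock, skip to the start of the next group,
 * never skipping past the numbers left in the pool.
 */
static uint64_t toy_alloc_ino(struct toy_ino_pool *ia, uint64_t per_lock)
{
	uint64_t ino;
	uint64_t nr;

	assert(ia->nr > 0 && per_lock > 0 && per_lock <= TOY_GROUP_NR);

	ino = ia->ino++;
	ia->nr--;

	if (per_lock != TOY_GROUP_NR) {
		nr = ia->ino & TOY_GROUP_MASK;
		if (nr >= per_lock) {
			nr = TOY_GROUP_NR - nr;
			if (nr > ia->nr)
				nr = ia->nr;
			ia->ino += nr;
			ia->nr -= nr;
		}
	}

	return ino;
}
```

With per_lock = 2 and a pool starting at 1024, the first two calls return 1024 and 1025. The pool then jumps to 2048, discarding the rest of that group's numbers.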
@@ -1646,14 +1620,10 @@ int scoutfs_inode_orphan_delete(struct super_block *sb, u64 ino, struct scoutfs_
 				struct scoutfs_lock *primary)
 {
 	struct scoutfs_key key;
-	int ret;

 	init_orphan_key(&key, ino);

-	ret = scoutfs_item_delete_force(sb, &key, lock, primary);
-	trace_scoutfs_inode_orphan_delete(sb, ino, ret);
-
-	return ret;
+	return scoutfs_item_delete_force(sb, &key, lock, primary);
 }

 /*
@@ -1735,8 +1705,6 @@ out:
 		scoutfs_release_trans(sb);
 	scoutfs_inode_index_unlock(sb, &ind_locks);

-	trace_scoutfs_delete_inode_end(sb, ino, mode, size, ret);
-
 	return ret;
 }

@@ -1832,9 +1800,6 @@ out:
 * they've checked that the inode could really be deleted.  We serialize
 * on a bit in the lock data so that we only have one deletion attempt
 * per inode under this mount's cluster lock.
- *
- * Returns -EAGAIN if we either did some cleanup work or are unable to finish
- * cleaning up this inode right now.
 */
 static int try_delete_inode_items(struct super_block *sb, u64 ino)
 {
@@ -1848,8 +1813,6 @@ static int try_delete_inode_items(struct super_block *sb, u64 ino)
 	int bit_nr;
 	int ret;

-	trace_scoutfs_try_delete(sb, ino);
-
 	ret = scoutfs_lock_ino(sb, SCOUTFS_LOCK_WRITE, 0, ino, &lock);
 	if (ret < 0)
 		goto out;
@@ -1862,32 +1825,27 @@ static int try_delete_inode_items(struct super_block *sb, u64 ino)

 	/* only one local attempt per inode at a time */
 	if (test_and_set_bit(bit_nr, ldata->trying)) {
-		trace_scoutfs_try_delete_local_busy(sb, ino);
-		ret = -EAGAIN;
+		ret = 0;
 		goto out;
 	}
 	clear_trying = true;

 	/* can't delete if it's cached in local or remote mounts */
 	if (scoutfs_omap_test(sb, ino) || test_bit_le(bit_nr, ldata->map.bits)) {
-		trace_scoutfs_try_delete_cached(sb, ino);
-		ret = -EAGAIN;
+		ret = 0;
 		goto out;
 	}

 	scoutfs_inode_init_key(&key, ino);
 	ret = lookup_inode_item(sb, &key, &sinode, lock);
 	if (ret < 0) {
-		if (ret == -ENOENT) {
-			trace_scoutfs_try_delete_no_item(sb, ino);
+		if (ret == -ENOENT)
 			ret = 0;
-		}
 		goto out;
 	}

 	if (le32_to_cpu(sinode.nlink) > 0) {
-		trace_scoutfs_try_delete_has_links(sb, ino, le32_to_cpu(sinode.nlink));
-		ret = -EAGAIN;
+		ret = 0;
 		goto out;
 	}

@@ -1896,11 +1854,6 @@ static int try_delete_inode_items(struct super_block *sb, u64 ino)
 		goto out;

 	ret = delete_inode_items(sb, ino, &sinode, lock, orph_lock);
-	if (ret == 0) {
-		ret = -EAGAIN;
-		scoutfs_inc_counter(sb, inode_deleted);
-	}
-
 out:
 	if (clear_trying)
 		clear_bit(bit_nr, ldata->trying);
@@ -2009,8 +1962,6 @@ static void iput_worker(struct work_struct *work)
 		while (count-- > 0)
 			iput(inode);

-		cond_resched();
-
 		/* can't touch inode after final iput */

 		spin_lock(&inf->iput_lock);
@@ -2101,10 +2052,6 @@ void scoutfs_inode_schedule_orphan_dwork(struct super_block *sb)
 * a locally cached inode.  Then we ask the server for the open map
 * containing the inode.  Only if we don't see any cached users do we do
 * the expensive work of acquiring locks to try and delete the items.
- *
- * We need to track whether there is any orphan cleanup work remaining so
- * that tests such as inode-deletion can watch the orphan_scan_empty counter
- * to determine when inode cleanup from open-unlink scenarios is complete.
 */
 static void inode_orphan_scan_worker(struct work_struct *work)
 {
@@ -2116,14 +2063,11 @@ static void inode_orphan_scan_worker(struct work_struct *work)
 	SCOUTFS_BTREE_ITEM_REF(iref);
 	struct scoutfs_key last;
 	struct scoutfs_key key;
-	bool work_todo = false;
 	u64 group_nr;
 	int bit_nr;
 	u64 ino;
 	int ret;

-	trace_scoutfs_orphan_scan_start(sb);
-
 	scoutfs_inc_counter(sb, orphan_scan);

 	init_orphan_key(&last, U64_MAX);
@@ -2143,10 +2087,8 @@ static void inode_orphan_scan_worker(struct work_struct *work)
 		init_orphan_key(&key, ino);
 		ret = scoutfs_btree_next(sb, &roots.fs_root, &key, &iref);
 		if (ret < 0) {
-			if (ret == -ENOENT) {
-				trace_scoutfs_orphan_scan_work(sb, 0);
+			if (ret == -ENOENT)
 				break;
-			}
 			goto out;
 		}

@@ -2161,7 +2103,6 @@ static void inode_orphan_scan_worker(struct work_struct *work)

 		/* locally cached inodes will try to delete as they evict */
 		if (scoutfs_omap_test(sb, ino)) {
-			work_todo = true;
 			scoutfs_inc_counter(sb, orphan_scan_cached);
 			continue;
 		}
@@ -2177,22 +2118,13 @@ static void inode_orphan_scan_worker(struct work_struct *work)

 		/* remote cached inodes will also try to delete */
 		if (test_bit_le(bit_nr, omap.bits)) {
-			work_todo = true;
 			scoutfs_inc_counter(sb, orphan_scan_omap_set);
 			continue;
 		}

 		/* seemingly orphaned and unused, get locks and check for sure */
 		scoutfs_inc_counter(sb, orphan_scan_attempts);
-		trace_scoutfs_orphan_scan_work(sb, ino);
-
 		ret = try_delete_inode_items(sb, ino);
-		if (ret == -EAGAIN) {
-			work_todo = true;
-			ret = 0;
-		}
-
-		trace_scoutfs_orphan_scan_end(sb, ino, ret);
 	}

 	ret = 0;
@@ -2201,11 +2133,6 @@ out:
 	if (ret < 0)
 		scoutfs_inc_counter(sb, orphan_scan_error);

-	if (!work_todo)
-		scoutfs_inc_counter(sb, orphan_scan_empty);
-
-	trace_scoutfs_orphan_scan_stop(sb, work_todo);
-
 	scoutfs_inode_schedule_orphan_dwork(sb);
 }

@@ -2256,7 +2183,7 @@ int scoutfs_inode_walk_writeback(struct super_block *sb, bool write)
 	struct scoutfs_inode_info *si;
 	struct scoutfs_inode_info *tmp;
 	struct inode *inode;
-	int ret = 0;
+	int ret;

 	spin_lock(&inf->writeback_lock);

@@ -415,6 +415,8 @@ static long scoutfs_ioc_data_wait_err(struct file *file, unsigned long arg)
 		return 0;
 	if ((args.op & SCOUTFS_IOC_DWO_UNKNOWN) || !IS_ERR_VALUE(args.err))
 		return -EINVAL;
+	if ((args.op & SCOUTFS_IOC_DWO_UNKNOWN) || !IS_ERR_VALUE(args.err))
+		return -EINVAL;

 	trace_scoutfs_ioc_data_wait_err(sb, &args);

@@ -439,6 +441,8 @@ static long scoutfs_ioc_data_wait_err(struct file *file, unsigned long arg)

 	if (!S_ISREG(inode->i_mode)) {
 		ret = -EINVAL;
+	} else if (scoutfs_inode_data_version(inode) != args.data_version) {
+		ret = -ESTALE;
 	} else {
 		ret = scoutfs_data_wait_err(inode, sblock, eblock, args.op,
 					    args.err);
@@ -950,9 +954,6 @@ static int copy_alloc_detail_to_user(struct super_block *sb, void *arg,
 	if (args->copied == args->nr)
 		return -EOVERFLOW;

-	/* .type and .pad need clearing */
-	memset(&ade, 0, sizeof(struct scoutfs_ioctl_alloc_detail_entry));
-
 	ade.blocks = blocks;
 	ade.id = id;
 	ade.meta = !!meta;
@@ -1368,7 +1369,7 @@ static long scoutfs_ioc_get_referring_entries(struct file *file, unsigned long a
 			ent.d_type = bref->d_type;
 			ent.name_len = name_len;

-			if (copy_to_user(uent, &ent, offsetof(struct scoutfs_ioctl_dirent, name[0])) ||
+			if (copy_to_user(uent, &ent, sizeof(struct scoutfs_ioctl_dirent)) ||
 			    copy_to_user(&uent->name[0], bref->dent.name, name_len) ||
 			    put_user('\0', &uent->name[name_len])) {
 				ret = -EFAULT;
@@ -1667,115 +1668,6 @@ out:
 	return ret;
 }

-static long scoutfs_ioc_punch_offline(struct file *file, unsigned long arg)
-{
-	struct inode *inode = file_inode(file);
-	struct super_block *sb = inode->i_sb;
-	struct scoutfs_ioctl_punch_offline __user *upo = (void __user *)arg;
-	struct scoutfs_ioctl_punch_offline po;
-	struct scoutfs_lock *lock = NULL;
-	u64 iblock;
-	u64 last;
-	u64 tmp;
-	int ret;
-
-	if (copy_from_user(&po, upo, sizeof(po)))
-		return -EFAULT;
-
-	if (po.len == 0)
-		return 0;
-
-	if (check_add_overflow(po.offset, po.len - 1, &tmp) ||
-	    (po.offset & SCOUTFS_BLOCK_SM_MASK) ||
-	    (po.len & SCOUTFS_BLOCK_SM_MASK))
-		return -EOVERFLOW;
-
-	if (po.flags)
-		return -EINVAL;
-
-	ret = mnt_want_write_file(file);
-	if (ret < 0)
-		return ret;
-
-	inode_lock(inode);
-
-	ret = scoutfs_lock_inode(sb, SCOUTFS_LOCK_WRITE,
-				 SCOUTFS_LKF_REFRESH_INODE, inode, &lock);
-	if (ret)
-		goto out;
-
-	if (!S_ISREG(inode->i_mode)) {
-		ret = -EINVAL;
-		goto out;
-	}
-
-	if (!(file->f_mode & FMODE_WRITE)) {
-		ret = -EINVAL;
-		goto out;
-	}
-
-	ret = inode_permission(KC_VFS_INIT_NS inode, MAY_WRITE);
-	if (ret < 0)
-		goto out;
-
-	if (scoutfs_inode_data_version(inode) != po.data_version) {
-		ret = -ESTALE;
-		goto out;
-	}
-
-	if ((ret = scoutfs_inode_check_retention(inode)))
-		goto out;
-
-	iblock = po.offset >> SCOUTFS_BLOCK_SM_SHIFT;
-	last = (po.offset + po.len - 1) >> SCOUTFS_BLOCK_SM_SHIFT;
-
-	ret = scoutfs_data_punch_offline(inode, iblock, last, po.data_version, lock);
-
-out:
-	scoutfs_unlock(sb, lock, SCOUTFS_LOCK_WRITE);
-	inode_unlock(inode);
-	mnt_drop_write_file(file);
-
-	return ret;
-}
-
-static long scoutfs_ioc_inject_totl_delta(struct file *file, unsigned long arg)
-{
-	struct super_block *sb = file_inode(file)->i_sb;
-	struct scoutfs_ioctl_inject_totl_delta __user *uitd = (void __user *)arg;
-	struct scoutfs_ioctl_inject_totl_delta itd;
-	struct scoutfs_xattr_totl_val tval;
-	struct scoutfs_lock *lock = NULL;
-	struct scoutfs_key key;
-	int ret;
-
-	if (!capable(CAP_SYS_ADMIN))
-		return -EPERM;
-
-	if (copy_from_user(&itd, uitd, sizeof(itd)))
-		return -EFAULT;
-
-	scoutfs_xattr_init_totl_key(&key, itd.name);
-	tval.total = cpu_to_le64((u64)itd.total);
-	tval.count = cpu_to_le64((u64)itd.count);
-
-	ret = scoutfs_lock_xattr_totl(sb, SCOUTFS_LOCK_WRITE_ONLY, 0, &lock);
-	if (ret < 0)
-		goto out;
-
-	ret = scoutfs_hold_trans(sb, true);
-	if (ret < 0)
-		goto unlock;
-
-	ret = scoutfs_item_delta(sb, &key, &tval, sizeof(tval), lock);
-
-	scoutfs_release_trans(sb);
-unlock:
-	scoutfs_unlock(sb, lock, SCOUTFS_LOCK_WRITE_ONLY);
-out:
-	return ret;
-}
-
 long scoutfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 {
 	switch (cmd) {
@@ -1825,10 +1717,6 @@ long scoutfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 		return scoutfs_ioc_mod_quota_rule(file, arg, false);
 	case SCOUTFS_IOC_READ_XATTR_INDEX:
 		return scoutfs_ioc_read_xattr_index(file, arg);
-	case SCOUTFS_IOC_PUNCH_OFFLINE:
-		return scoutfs_ioc_punch_offline(file, arg);
-	case SCOUTFS_IOC_INJECT_TOTL_DELTA:
-		return scoutfs_ioc_inject_totl_delta(file, arg);
 	}

 	return -ENOTTY;
@@ -366,15 +366,10 @@ struct scoutfs_ioctl_statfs_more {
 *
 * Find current waiters that match the inode, op, and block range to wake
 * up and return an error.
- *
- * (*) ca. v1.25 and earlier required that the data_version passed match
- * that of the waiter, but this check is removed. It was never needed
- * because no data is modified during this ioctl. Any data_version value
- * here is thus since then ignored.
 */
 struct scoutfs_ioctl_data_wait_err {
 	__u64 ino;
-	__u64 data_version; /* Ignored, see above (*) */
+	__u64 data_version;
 	__u64 offset;
 	__u64 count;
 	__u64 op;
@@ -848,45 +843,4 @@ struct scoutfs_ioctl_read_xattr_index {
 #define SCOUTFS_IOC_READ_XATTR_INDEX \
 	_IOR(SCOUTFS_IOCTL_MAGIC, 23, struct scoutfs_ioctl_read_xattr_index)

-/*
- * This is a limited and specific version of hole punching.  It's an
- * archive layer operation that only converts unmapped offline extents
- * into sparse extents.  It is intended to be used when restoring sparse
- * files after the initial creation set the entire file size offline.
- *
- * The offset and len fields are in units of bytes and must be aligned
- * to the small (4KiB) block size.  All regions of offline extents
- * covered by the region will be converted into sparse online extents,
- * including regions that straddle the boundaries of the region.  Any
- * existing sparse extents in the region are ignored.
- *
- * The data_version must match the inode or EINVAL is returned.  The
- * data_version is not modified by this operation.
- *
- * EINVAL is returned if any mapped extents are found in the region.  If
- * an error is returned then partial progress may have been made.
- */
-struct scoutfs_ioctl_punch_offline {
-	__u64 offset;
-	__u64 len;
-	__u64 data_version;
-	__u64 flags;
-};
-
-#define SCOUTFS_IOC_PUNCH_OFFLINE \
-	_IOW(SCOUTFS_IOCTL_MAGIC, 24, struct scoutfs_ioctl_punch_offline)
-
-/*
- * Inject a signed (total, count) delta at the totl key @name (a, b, c
- * match the trailing dotted u64s of a totl xattr name).
- */
-struct scoutfs_ioctl_inject_totl_delta {
-	__u64	name[SCOUTFS_IOCTL_XATTR_TOTAL_NAME_NR];
-	__s64	total;
-	__s64	count;
-};
-
-#define SCOUTFS_IOC_INJECT_TOTL_DELTA \
-	_IOW(SCOUTFS_IOCTL_MAGIC, 25, struct scoutfs_ioctl_inject_totl_delta)
-
 #endif
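The SCOUTFS_IOC_PUNCH_OFFLINE comment above requires the byte offset and len to be aligned to the 4KiB small block size. The following is a minimal sketch, under stated assumptions, of how a caller might validate those arguments and convert them to the covered small-block range before issuing the ioctl. `punch_offline_blocks()` and `SMALL_BLOCK_SHIFT` are hypothetical and are not part of the scoutfs headers. The wraparound check is an added assumption, not documented kernel behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: the 4KiB small block size described in the ioctl comment. */
#define SMALL_BLOCK_SHIFT 12
#define SMALL_BLOCK_SIZE (1ULL << SMALL_BLOCK_SHIFT)

/*
 * Check that offset and len are small-block aligned and that the region
 * does not wrap.  On success, report the first small block and the number
 * of small blocks the region covers.  Returns false for unusable arguments.
 */
static bool punch_offline_blocks(uint64_t offset, uint64_t len,
				 uint64_t *first_blk, uint64_t *nr_blks)
{
	if ((offset | len) & (SMALL_BLOCK_SIZE - 1))
		return false;		/* not 4KiB aligned */
	if (offset + len < offset)
		return false;		/* assumption: reject wrapping regions */

	*first_blk = offset >> SMALL_BLOCK_SHIFT;
	*nr_blks = len >> SMALL_BLOCK_SHIFT;
	return true;
}
```

A pre-check like this only catches argument mistakes early. According to the comment, the kernel still returns EINVAL for a data_version mismatch or for mapped extents found in the region.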
@@ -86,8 +86,6 @@ struct item_cache_info {
 	/* often walked, but per-cpu refs are fast path */
 	rwlock_t rwlock;
 	struct rb_root pg_root;
-	/* stop readers from caching stale items behind reclaimed cleaned written items */
-	u64 read_dirty_barrier;

 	/* page-granular modification by writers, then exclusive to commit */
 	spinlock_t dirty_lock;
@@ -98,6 +96,10 @@ struct item_cache_info {
 	spinlock_t lru_lock;
 	struct list_head lru_list;
 	unsigned long lru_pages;
+
+	/* written by page readers, read by shrink */
+	spinlock_t active_lock;
+	struct list_head active_list;
 };

 #define DECLARE_ITEM_CACHE_INFO(sb, name) \
@@ -1283,6 +1285,78 @@ static int cache_empty_page(struct super_block *sb,
 	return 0;
 }

+/*
+ * Readers operate independently from dirty items and transactions.
+ * They read a set of persistent items and insert them into the cache
+ * when there aren't already pages whose key range contains the items.
+ * This naturally prefers cached dirty items over stale read items.
+ *
+ * We have to deal with the case where dirty items are written and
+ * invalidated while a read is in flight.  The reader won't have seen
+ * the items that were dirty in its persistent roots when it started
+ * reading.  By the time it inserts its read pages the previously
+ * dirty items have been reclaimed and are not in the cache.  The old
+ * stale items would be inserted in their place, effectively corrupting
+ * the cache by making the written items disappear.
+ *
+ * We fix this by tracking the max seq of items in pages.  As readers
+ * start they record the current transaction seq.  Invalidation skips
+ * pages with a max seq greater than or equal to the first reader seq
+ * because those items have to stick around to prevent the reader's
+ * stale items from being inserted.
+ *
+ * This naturally only affects a small set of pages with items that were
+ * written relatively recently.  If we're under memory pressure then we
+ * probably have a lot of pages and they'll naturally have items that
+ * were visible to any readers.  We don't bother with the complicated and
+ * expensive further refinement of tracking the ranges that are being
+ * read and comparing those with pages to invalidate.
+ */
+struct active_reader {
+	struct list_head head;
+	u64 seq;
+};
+
+#define INIT_ACTIVE_READER(rdr) \
+	struct active_reader rdr = { .head = LIST_HEAD_INIT(rdr.head) }
+
+static void add_active_reader(struct super_block *sb, struct active_reader *active)
+{
+	DECLARE_ITEM_CACHE_INFO(sb, cinf);
+
+	BUG_ON(!list_empty(&active->head));
+
+	active->seq = scoutfs_trans_sample_seq(sb);
+
+	spin_lock(&cinf->active_lock);
+	list_add_tail(&active->head, &cinf->active_list);
+	spin_unlock(&cinf->active_lock);
+}
+
+static u64 first_active_reader_seq(struct item_cache_info *cinf)
+{
+	struct active_reader *active;
+	u64 first;
+
+	/* readers are added at the tail, so the head started reading first */
+	spin_lock(&cinf->active_lock);
+	active = list_first_entry_or_null(&cinf->active_list, struct active_reader, head);
+	first = active ? active->seq : U64_MAX;
+	spin_unlock(&cinf->active_lock);
+
+	return first;
+}
+
+static void del_active_reader(struct item_cache_info *cinf, struct active_reader *active)
+{
+	/* only the calling task adds or deletes this active */
+	if (!list_empty(&active->head)) {
+		spin_lock(&cinf->active_lock);
+		list_del_init(&active->head);
+		spin_unlock(&cinf->active_lock);
+	}
+}
+
 /*
 * Add a newly read item to the pages that we're assembling for
 * insertion into the cache.   These pages are private, they only exist
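The invalidation rule in the new comment reduces to one comparison. A page may be dropped only if its newest item's seq is below the first active reader's seq, which mirrors the shrink check `first_reader_seq <= pg->max_seq` that skips a page. The userspace sketch below models that rule, replacing the spinlock-protected `active_list` with a small fixed array. `struct reader_set`, `MAX_READERS` and the helper names are hypothetical. Unlike `first_active_reader_seq()`, which takes the head of the list, the sketch scans for the minimum seq, so it does not depend on insertion order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_READERS 8	/* hypothetical fixed capacity for the sketch */

/* Stands in for the kernel's active_list of struct active_reader. */
struct reader_set {
	uint64_t seqs[MAX_READERS];
	int nr;
};

/* A reader records the current transaction seq before reading roots. */
static void reader_add(struct reader_set *rs, uint64_t trans_seq)
{
	assert(rs->nr < MAX_READERS);
	rs->seqs[rs->nr++] = trans_seq;
}

/* Remove one reader that recorded trans_seq (array order not preserved). */
static void reader_del(struct reader_set *rs, uint64_t trans_seq)
{
	for (int i = 0; i < rs->nr; i++) {
		if (rs->seqs[i] == trans_seq) {
			rs->seqs[i] = rs->seqs[--rs->nr];
			return;
		}
	}
}

/* Like first_active_reader_seq(): UINT64_MAX when no reader is active. */
static uint64_t first_reader_seq(const struct reader_set *rs)
{
	uint64_t first = UINT64_MAX;

	for (int i = 0; i < rs->nr; i++)
		if (rs->seqs[i] < first)
			first = rs->seqs[i];
	return first;
}

/* Invalidate only pages whose newest item predates every active reader. */
static bool page_can_invalidate(const struct reader_set *rs, uint64_t pg_max_seq)
{
	return first_reader_seq(rs) > pg_max_seq;
}
```

In the kernel, each reader adds and deletes only its own entry. This is why `del_active_reader()` can test `list_empty()` before taking `active_lock`.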
@@ -1376,34 +1450,24 @@ static int read_page_item(struct super_block *sb, struct scoutfs_key *key, u64 s
 * and duplicates, we insert any resulting pages which don't overlap
 * with existing cached pages.
 *
- * The forest item reader is reading stable trees that could be
- * overwritten.  It can return -ESTALE which we return to the caller who
- * will retry the operation and work with a new set of more recent
- * btrees.
- *
 * We only insert uncached regions because this is called with cluster
 * locks held, but without locking the cache.  The regions we read can
 * be stale with respect to the current cache, which can be read and
 * dirtied by other cluster lock holders on our node, but the cluster
- * locks protect the stable items we read.
+ * locks protect the stable items we read.  Invalidation is careful not
+ * to drop pages that have items that we couldn't see because they were
+ * dirty when we started reading.
 *
- * Using the presence of locally written dirty pages to override stale
- * read pages only works if, well, the more recent locally written pages
- * are still present.  Readers are totally decoupled from writers and
- * can have a set of items that is very old indeed.  In the meantime
- * more recent items would have been dirtied locally, committed,
- * cleaned, and reclaimed.  We have a coarse barrier which ensures that
- * readers can't insert items read from old roots from before local data
- * was written.  If a write completes while a read is in progress the
- * read will have to retry.  The retried read can use cached blocks so
- * we're relying on reads being much faster than writes to reduce the
- * overhead to mostly cpu work of recollecting the items from cached
- * blocks via a more recent root from the server.
+ * The forest item reader is reading stable trees that could be
+ * overwritten.  It can return -ESTALE which we return to the caller who
+ * will retry the operation and work with a new set of more recent
+ * btrees.
 */
 static int read_pages(struct super_block *sb, struct item_cache_info *cinf,
 		      struct scoutfs_key *key, struct scoutfs_lock *lock)
 {
 	struct rb_root root = RB_ROOT;
+	INIT_ACTIVE_READER(active);
 	struct cached_page *right = NULL;
 	struct cached_page *pg;
 	struct cached_page *rd;
@@ -1416,7 +1480,6 @@ static int read_pages(struct super_block *sb, struct item_cache_info *cinf,
 	struct rb_node *par;
 	struct rb_node *pg_tmp;
 	struct rb_node *item_tmp;
-	u64 rdbar;
 	int pgi;
 	int ret;

@@ -1430,9 +1493,8 @@ static int read_pages(struct super_block *sb, struct item_cache_info *cinf,
 	pg->end = lock->end;
 	rbtree_insert(&pg->node, NULL, &root.rb_node, &root);

-	read_lock(&cinf->rwlock);
-	rdbar = cinf->read_dirty_barrier;
-	read_unlock(&cinf->rwlock);
+	/* set active reader seq before reading persistent roots */
+	add_active_reader(sb, &active);

 	start = lock->start;
 	end = lock->end;
@@ -1471,13 +1533,6 @@ static int read_pages(struct super_block *sb, struct item_cache_info *cinf,
 retry:
 	write_lock(&cinf->rwlock);

-	/* can't insert if write has cleaned since we read */
-	if (cinf->read_dirty_barrier != rdbar) {
-		scoutfs_inc_counter(sb, item_read_pages_barrier);
-		ret = -ESTALE;
-		goto unlock;
-	}
-
 	while ((rd = first_page(&root))) {

 		pg = page_rbtree_walk(sb, &cinf->pg_root, &rd->start, &rd->end,
@@ -1515,12 +1570,12 @@ retry:
 		}
 	}

-	ret = 0;
-
-unlock:
 	write_unlock(&cinf->rwlock);

+	ret = 0;
 out:
+	del_active_reader(cinf, &active);
+
 	/* free any pages we left dangling on error */
 	for_each_page_safe(&root, rd, pg_tmp) {
 		rbtree_erase(&rd->node, &root);
@@ -1580,7 +1635,6 @@ retry:
 			ret = read_pages(sb, cinf, key, lock);
 		if (ret < 0 && ret != -ESTALE)
 			goto out;
-		scoutfs_inc_counter(sb, item_read_pages_retry);
 		goto retry;
 	}

@@ -2347,12 +2401,6 @@ out:
 * The caller has successfully committed all the dirty btree blocks that
 * contained the currently dirty items.  Clear all the dirty items and
 * pages.
- *
- * This strange lock/trylock loop comes from sparse issuing spurious
- * mismatched context warnings if we do anything (like unlock and relax)
- * in the else branch of the failed trylock.  We're jumping through
- * hoops to not use the else but still drop and reacquire the dirty_lock
- * if the trylock fails.
 */
 int scoutfs_item_write_done(struct super_block *sb)
 {
@@ -2361,35 +2409,40 @@ int scoutfs_item_write_done(struct super_block *sb)
 	struct cached_item *tmp;
 	struct cached_page *pg;

-	/* don't let read_pages miss written+cleaned items */
-	write_lock(&cinf->rwlock);
-	cinf->read_dirty_barrier++;
-	write_unlock(&cinf->rwlock);
-
+retry:
 	spin_lock(&cinf->dirty_lock);
-	while ((pg = list_first_entry_or_null(&cinf->dirty_list, struct cached_page, dirty_head))) {
-		if (write_trylock(&pg->rwlock)) {
+
+	while ((pg = list_first_entry_or_null(&cinf->dirty_list,
+					      struct cached_page,
+					      dirty_head))) {
+
+		if (!write_trylock(&pg->rwlock)) {
 			spin_unlock(&cinf->dirty_lock);
-			list_for_each_entry_safe(item, tmp, &pg->dirty_list,
-						 dirty_head) {
-				clear_item_dirty(sb, cinf, pg, item);
-
-				if (item->delta)
-					scoutfs_inc_counter(sb, item_delta_written);
-
-				/* free deletion items */
-				if (item->deletion || item->delta)
-					erase_item(pg, item);
-				else
-					item->persistent = 1;
-			}
-
-			write_unlock(&pg->rwlock);
-			spin_lock(&cinf->dirty_lock);
+			cpu_relax();
+			goto retry;
 		}
+
 		spin_unlock(&cinf->dirty_lock);
+
+		list_for_each_entry_safe(item, tmp, &pg->dirty_list,
+					 dirty_head) {
+			clear_item_dirty(sb, cinf, pg, item);
+
+			if (item->delta)
+				scoutfs_inc_counter(sb, item_delta_written);
+
+			/* free deletion items */
+			if (item->deletion || item->delta)
+				erase_item(pg, item);
+			else
+				item->persistent = 1;
+		}
+
+		write_unlock(&pg->rwlock);
+
 		spin_lock(&cinf->dirty_lock);
-	} while (pg);
+	}
+
 	spin_unlock(&cinf->dirty_lock);

 	return 0;
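The retry loop in `scoutfs_item_write_done()` picks a page while holding `dirty_lock`, so it can only trylock the page lock. If the trylock fails, it drops `dirty_lock`, relaxes and starts over rather than blocking while holding the list lock. This is presumably because other paths take the page lock before `dirty_lock`. Below is a simplified userspace sketch of the same shape using pthread mutexes. `struct sketch_page`, `struct sketch_cache` and `write_done()` are hypothetical. Unlike the kernel code, the sketch unlinks the page while holding both locks, and `sched_yield()` stands in for `cpu_relax()`.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical simplified page: its lock and a count of dirty items. */
struct sketch_page {
	pthread_mutex_t lock;
	int nr_dirty;
	struct sketch_page *next;	/* link on the dirty list */
};

struct sketch_cache {
	pthread_mutex_t dirty_lock;	/* protects dirty_head */
	struct sketch_page *dirty_head;
};

/*
 * Drain the dirty list.  We hold dirty_lock when we try the page lock,
 * so we may only trylock it.  On failure we drop dirty_lock and retry
 * instead of blocking while holding it.
 */
static void write_done(struct sketch_cache *c)
{
	struct sketch_page *pg;

	for (;;) {
		pthread_mutex_lock(&c->dirty_lock);
		pg = c->dirty_head;
		if (!pg) {
			pthread_mutex_unlock(&c->dirty_lock);
			return;
		}
		if (pthread_mutex_trylock(&pg->lock) != 0) {
			pthread_mutex_unlock(&c->dirty_lock);
			sched_yield();	/* stands in for cpu_relax() */
			continue;
		}
		/* holding both locks: unlink the page, then drop the list lock */
		c->dirty_head = pg->next;
		pg->next = NULL;
		pthread_mutex_unlock(&c->dirty_lock);

		pg->nr_dirty = 0;	/* models clearing the page's dirty items */
		pthread_mutex_unlock(&pg->lock);
	}
}
```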
@@ -2544,15 +2597,24 @@ static unsigned long item_cache_scan_objects(struct shrinker *shrink,
 	struct cached_page *tmp;
 	struct cached_page *pg;
 	unsigned long freed = 0;
+	u64 first_reader_seq;
 	int nr = sc->nr_to_scan;

 	scoutfs_inc_counter(sb, item_cache_scan_objects);

+	/* can't invalidate pages with items that weren't visible to first reader */
+	first_reader_seq = first_active_reader_seq(cinf);
+
 	write_lock(&cinf->rwlock);
 	spin_lock(&cinf->lru_lock);

 	list_for_each_entry_safe(pg, tmp, &cinf->lru_list, lru_head) {

+		if (first_reader_seq <= pg->max_seq) {
+			scoutfs_inc_counter(sb, item_shrink_page_reader);
+			continue;
+		}
+
 		if (!write_trylock(&pg->rwlock)) {
 			scoutfs_inc_counter(sb, item_shrink_page_trylock);
 			continue;
@@ -2619,6 +2681,8 @@ int scoutfs_item_setup(struct super_block *sb)
 	atomic_set(&cinf->dirty_pages, 0);
 	spin_lock_init(&cinf->lru_lock);
 	INIT_LIST_HEAD(&cinf->lru_list);
+	spin_lock_init(&cinf->active_lock);
+	INIT_LIST_HEAD(&cinf->active_list);

 	cinf->pcpu_pages = alloc_percpu(struct item_percpu_pages);
 	if (!cinf->pcpu_pages)
@@ -2651,6 +2715,8 @@ void scoutfs_item_destroy(struct super_block *sb)
 	int cpu;

 	if (cinf) {
+		BUG_ON(!list_empty(&cinf->active_list));
+
 #ifdef KC_CPU_NOTIFIER
 		unregister_hotcpu_notifier(&cinf->notifier);
 #endif
@@ -81,69 +81,3 @@ kc_generic_file_buffered_write(struct kiocb *iocb, const struct iovec *iov,
 	return written ? written : status;
 }
 #endif
-
-#include <linux/list_lru.h>
-
-#ifdef KC_LIST_LRU_WALK_CB_ITEM_LOCK
-static enum lru_status kc_isolate(struct list_head *item, spinlock_t *lock, void *cb_arg)
-{
-	struct kc_isolate_args *args = cb_arg;
-
-	/* isolate doesn't use list, nr_items updated in caller */
-	return args->isolate(item, NULL, args->cb_arg);
-}
-
-unsigned long kc_list_lru_walk(struct list_lru *lru, kc_list_lru_walk_cb_t isolate, void *cb_arg,
-				      unsigned long nr_to_walk)
-{
-	struct kc_isolate_args args = {
-		.isolate = isolate,
-		.cb_arg = cb_arg,
-	};
-
-	return list_lru_walk(lru, kc_isolate, &args, nr_to_walk);
-}
-
-unsigned long kc_list_lru_shrink_walk(struct list_lru *lru, struct shrink_control *sc,
-				      kc_list_lru_walk_cb_t isolate, void *cb_arg)
-{
-	struct kc_isolate_args args = {
-		.isolate = isolate,
-		.cb_arg = cb_arg,
-	};
-
-	return list_lru_shrink_walk(lru, sc, kc_isolate, &args);
-}
-#endif
-
-#ifdef KC_LIST_LRU_WALK_CB_LIST_LOCK
-static enum lru_status kc_isolate(struct list_head *item, struct list_lru_one *list,
-				  spinlock_t *lock, void *cb_arg)
-{
-	struct kc_isolate_args *args = cb_arg;
-
-	return args->isolate(item, list, args->cb_arg);
-}
-
-unsigned long kc_list_lru_walk(struct list_lru *lru, kc_list_lru_walk_cb_t isolate, void *cb_arg,
-				      unsigned long nr_to_walk)
-{
-	struct kc_isolate_args args = {
-		.isolate = isolate,
-		.cb_arg = cb_arg,
-	};
-
-	return list_lru_walk(lru, kc_isolate, &args, nr_to_walk);
-}
-unsigned long kc_list_lru_shrink_walk(struct list_lru *lru, struct shrink_control *sc,
-				      kc_list_lru_walk_cb_t isolate, void *cb_arg)
-{
-	struct kc_isolate_args args = {
-		.isolate = isolate,
-		.cb_arg = cb_arg,
-	};
-
-	return list_lru_shrink_walk(lru, sc, kc_isolate, &args);
-}
-
-#endif
@@ -263,11 +263,6 @@ typedef unsigned int blk_opf_t;
 #define kc__vmalloc __vmalloc
 #endif

-#ifdef KC_VFS_METHOD_MNT_IDMAP_ARG
-#define KC_VFS_NS_DEF struct mnt_idmap *mnt_idmap,
-#define KC_VFS_NS mnt_idmap,
-#define KC_VFS_INIT_NS &nop_mnt_idmap,
-#else
 #ifdef KC_VFS_METHOD_USER_NAMESPACE_ARG
 #define KC_VFS_NS_DEF struct user_namespace *mnt_user_ns,
 #define KC_VFS_NS mnt_user_ns,
@@ -277,7 +272,6 @@ typedef unsigned int blk_opf_t;
 #define KC_VFS_NS
 #define KC_VFS_INIT_NS
 #endif
-#endif /* KC_VFS_METHOD_MNT_IDMAP_ARG */

 #ifdef KC_BIO_ALLOC_DEV_OPF_ARGS
 #define kc_bio_alloc bio_alloc
@@ -416,82 +410,4 @@ static inline vm_fault_t vmf_error(int err)
 }
 #endif

-#include <linux/list_lru.h>
-
-#ifndef KC_LIST_LRU_SHRINK_COUNT_WALK
-/* we don't bother with sc->{nid,memcg} (which doesn't exist in oldest kernels) */
-static inline unsigned long list_lru_shrink_count(struct list_lru *lru,
-                                                  struct shrink_control *sc)
-{
-        return list_lru_count(lru);
-}
-static inline unsigned long
-list_lru_shrink_walk(struct list_lru *lru, struct shrink_control *sc,
-		     list_lru_walk_cb isolate, void *cb_arg)
-{
-	return list_lru_walk(lru, isolate, cb_arg, sc->nr_to_scan);
-}
-#endif
-
-#ifndef KC_LIST_LRU_ADD_OBJ
-#define list_lru_add_obj list_lru_add
-#define list_lru_del_obj list_lru_del
-#endif
-
-#if defined(KC_LIST_LRU_WALK_CB_LIST_LOCK) || defined(KC_LIST_LRU_WALK_CB_ITEM_LOCK)
-struct list_lru_one;
-typedef enum lru_status (*kc_list_lru_walk_cb_t)(struct list_head *item, struct list_lru_one *list,
-						 void *cb_arg);
-struct kc_isolate_args {
-	kc_list_lru_walk_cb_t isolate;
-	void *cb_arg;
-};
-unsigned long kc_list_lru_walk(struct list_lru *lru, kc_list_lru_walk_cb_t isolate, void *cb_arg,
-			       unsigned long nr_to_walk);
-unsigned long kc_list_lru_shrink_walk(struct list_lru *lru, struct shrink_control *sc,
-				      kc_list_lru_walk_cb_t isolate, void *cb_arg);
-#else
-#define kc_list_lru_shrink_walk list_lru_shrink_walk
-#endif
-
-#if defined(KC_LIST_LRU_WALK_CB_ITEM_LOCK)
-/* isolate moved by hand, nr_items updated in walk as _REMOVE returned */
-static inline void list_lru_isolate_move(struct list_lru_one *list, struct list_head *item,
-					 struct list_head *head)
-{
-        list_move(item, head);
-}
-#endif
-
-#ifndef KC_STACK_TRACE_SAVE
-#include <linux/stacktrace.h>
-static inline unsigned int stack_trace_save(unsigned long *store, unsigned int size,
-					    unsigned int skipnr)
-{
-        struct stack_trace trace = {
-                .entries        = store,
-                .max_entries    = size,
-                .skip           = skipnr,
-        };
-
-        save_stack_trace(&trace);
-        return trace.nr_entries;
-}
-
-static inline void stack_trace_print(unsigned long *entries, unsigned int nr_entries, int spaces)
-{
-        struct stack_trace trace = {
-                .entries        = entries,
-                .nr_entries     = nr_entries,
-        };
-
-	print_stack_trace(&trace, spaces);
-}
-#endif
-
-#ifndef KC_TIMER_CONTAINER_OF
-#define timer_container_of(var, callback_timer, timer_fieldname) \
-	from_timer(var, callback_timer, timer_fieldname)
-#endif
-
 #endif
@@ -53,10 +53,8 @@
 * all access to the lock (by revoking it down to a null mode) then the
 * lock is freed.
 *
- * Each client has a configurable number of locks that are allowed to
- * remain idle after being granted, for use by future tasks.  Past the
- * limit locks are freed by requesting a null mode from the server,
- * governed by an LRU.
+ * Memory pressure on the client can cause the client to request a null
+ * mode from the server so that once it's granted the lock can be freed.
 *
 * So far we've only needed a minimal trylock.  We return -EAGAIN if a
 * lock attempt can't immediately match an existing granted lock.  This
@@ -81,11 +79,14 @@ struct lock_info {
 	bool unmounting;
 	struct rb_root lock_tree;
 	struct rb_root lock_range_tree;
-	u64 nr_locks;
+	KC_DEFINE_SHRINKER(shrinker);
 	struct list_head lru_list;
+	unsigned long long lru_nr;
 	struct workqueue_struct *workq;
 	struct work_struct inv_work;
 	struct list_head inv_list;
+	struct work_struct shrink_work;
+	struct list_head shrink_list;
 	atomic64_t next_refresh_gen;

 	struct dentry *tseq_dentry;
@@ -167,6 +168,7 @@ static int lock_invalidate(struct super_block *sb, struct scoutfs_lock *lock,
 			   enum scoutfs_lock_mode prev, enum scoutfs_lock_mode mode)
 {
 	struct scoutfs_lock_coverage *cov;
+	struct scoutfs_lock_coverage *tmp;
 	u64 ino, last;
 	int ret = 0;

@@ -190,22 +192,19 @@ static int lock_invalidate(struct super_block *sb, struct scoutfs_lock *lock,

 	/* have to invalidate if we're not in the only usable case */
 	if (!(prev == SCOUTFS_LOCK_WRITE && mode == SCOUTFS_LOCK_READ)) {
-		/*
-		 * Remove cov items to tell users that their cache is
-		 * stale.  The unlock pattern comes from avoiding bad
-		 * sparse warnings when taking else in a failed trylock.
-		 */
+retry:
+		/* remove cov items to tell users that their cache is stale */
 		spin_lock(&lock->cov_list_lock);
-		while ((cov = list_first_entry_or_null(&lock->cov_list,
-						       struct scoutfs_lock_coverage, head))) {
-			if (spin_trylock(&cov->cov_lock)) {
-				list_del_init(&cov->head);
-				cov->lock = NULL;
-				spin_unlock(&cov->cov_lock);
-				scoutfs_inc_counter(sb, lock_invalidate_coverage);
+		list_for_each_entry_safe(cov, tmp, &lock->cov_list, head) {
+			if (!spin_trylock(&cov->cov_lock)) {
+				spin_unlock(&lock->cov_list_lock);
+				cpu_relax();
+				goto retry;
 			}
-			spin_unlock(&lock->cov_list_lock);
-			spin_lock(&lock->cov_list_lock);
+			list_del_init(&cov->head);
+			cov->lock = NULL;
+			spin_unlock(&cov->cov_lock);
+			scoutfs_inc_counter(sb, lock_invalidate_coverage);
 		}
 		spin_unlock(&lock->cov_list_lock);

@@ -248,6 +247,7 @@ static void lock_free(struct lock_info *linfo, struct scoutfs_lock *lock)
 	BUG_ON(!RB_EMPTY_NODE(&lock->range_node));
 	BUG_ON(!list_empty(&lock->lru_head));
 	BUG_ON(!list_empty(&lock->inv_head));
+	BUG_ON(!list_empty(&lock->shrink_head));
 	BUG_ON(!list_empty(&lock->cov_list));

 	kfree(lock->inode_deletion_data);
@@ -275,6 +275,7 @@ static struct scoutfs_lock *lock_alloc(struct super_block *sb,
 	INIT_LIST_HEAD(&lock->lru_head);
 	INIT_LIST_HEAD(&lock->inv_head);
 	INIT_LIST_HEAD(&lock->inv_list);
+	INIT_LIST_HEAD(&lock->shrink_head);
 	spin_lock_init(&lock->cov_list_lock);
 	INIT_LIST_HEAD(&lock->cov_list);

@@ -407,7 +408,6 @@ static bool lock_insert(struct super_block *sb, struct scoutfs_lock *ins)
 	rb_link_node(&ins->node, parent, node);
 	rb_insert_color(&ins->node, &linfo->lock_tree);

-	linfo->nr_locks++;
 	scoutfs_tseq_add(&linfo->tseq_tree, &ins->tseq_entry);

 	return true;
@@ -422,7 +422,6 @@ static void lock_remove(struct lock_info *linfo, struct scoutfs_lock *lock)
 	rb_erase(&lock->range_node, &linfo->lock_range_tree);
 	RB_CLEAR_NODE(&lock->range_node);

-	linfo->nr_locks--;
 	scoutfs_tseq_del(&linfo->tseq_tree, &lock->tseq_entry);
 }

@@ -462,8 +461,10 @@ static void __lock_del_lru(struct lock_info *linfo, struct scoutfs_lock *lock)
 {
 	assert_spin_locked(&linfo->lock);

-	if (!list_empty(&lock->lru_head))
+	if (!list_empty(&lock->lru_head)) {
 		list_del_init(&lock->lru_head);
+		linfo->lru_nr--;
+	}
 }

 /*
@@ -522,16 +523,14 @@ static struct scoutfs_lock *create_lock(struct super_block *sb,
 * indicate that the lock wasn't idle.  If it really is idle then we
 * either free it if it's null or put it back on the lru.
 */
-static void __put_lock(struct lock_info *linfo, struct scoutfs_lock *lock, bool tail)
+static void put_lock(struct lock_info *linfo, struct scoutfs_lock *lock)
 {
 	assert_spin_locked(&linfo->lock);

 	if (lock_idle(lock)) {
 		if (lock->mode != SCOUTFS_LOCK_NULL) {
-			if (tail)
-				list_add_tail(&lock->lru_head, &linfo->lru_list);
-			else
-				list_add(&lock->lru_head, &linfo->lru_list);
+			list_add_tail(&lock->lru_head, &linfo->lru_list);
+			linfo->lru_nr++;
 		} else {
 			lock_remove(linfo, lock);
 			lock_free(linfo, lock);
@@ -539,11 +538,6 @@ static void __put_lock(struct lock_info *linfo, struct scoutfs_lock *lock, bool
 	}
 }

-static inline void put_lock(struct lock_info *linfo, struct scoutfs_lock *lock)
-{
-	__put_lock(linfo, lock, true);
-}
-
 /*
 * The caller has made a change (set a lock mode) which can let one of the
 * invalidating locks make forward progress.
@@ -717,14 +711,14 @@ static void lock_invalidate_worker(struct work_struct *work)
 		/* only lock protocol, inv can't call subsystems after shutdown */
 		if (!linfo->shutdown) {
 			ret = lock_invalidate(sb, lock, nl->old_mode, nl->new_mode);
-			BUG_ON(ret < 0 && ret != -ENOLINK);
+			BUG_ON(ret);
 		}

 		/* respond with the key and modes from the request, server might have died */
 		ret = scoutfs_client_lock_response(sb, ireq->net_id, nl);
 		if (ret == -ENOTCONN)
 			ret = 0;
-		BUG_ON(ret < 0 && ret != -ENOLINK);
+		BUG_ON(ret);

 		scoutfs_inc_counter(sb, lock_invalidate_response);
 	}
@@ -813,7 +807,6 @@ int scoutfs_lock_invalidate_request(struct super_block *sb, u64 net_id,

 out:
 	if (!lock) {
-		kfree(ireq);
 		ret = scoutfs_client_lock_response(sb, net_id, nl);
 		BUG_ON(ret); /* lock server doesn't fence timed out client requests */
 	}
@@ -880,69 +873,6 @@ int scoutfs_lock_recover_request(struct super_block *sb, u64 net_id,
 	return ret;
 }

-/*
- * This is called on every _lock call to try and keep the number of
- * locks under the idle count.  We're intentionally trying to throttle
- * shrinking bursts by tying its frequency to lock use.  It will only
- * send requests to free unused locks, though, so it's always possible
- * to exceed the high water mark under heavy load.
- *
- * We send a null request and the lock will be freed by the response
- * once all users drain.  If this races with invalidation then the
- * server will only send the grant response once the invalidation is
- * finished.
- */
-static bool try_shrink_lock(struct super_block *sb, struct lock_info *linfo, bool force)
-{
-	struct scoutfs_mount_options opts;
-	struct scoutfs_lock *lock = NULL;
-	struct scoutfs_net_lock nl;
-	int ret = 0;
-
-	scoutfs_options_read(sb, &opts);
-
-	/* avoiding lock contention with unsynchronized test, don't mind temp false results */
-	if (!force && (list_empty(&linfo->lru_list) ||
-	               READ_ONCE(linfo->nr_locks) <= opts.lock_idle_count))
-		return false;
-
-	spin_lock(&linfo->lock);
-
-	lock = list_first_entry_or_null(&linfo->lru_list, struct scoutfs_lock, lru_head);
-	if (lock && (force || (linfo->nr_locks > opts.lock_idle_count))) {
-		__lock_del_lru(linfo, lock);
-		lock->request_pending = 1;
-
-		nl.key = lock->start;
-		nl.old_mode = lock->mode;
-		nl.new_mode = SCOUTFS_LOCK_NULL;
-	} else {
-		lock = NULL;
-	}
-
-	spin_unlock(&linfo->lock);
-
-	if (lock) {
-		ret = scoutfs_client_lock_request(sb, &nl);
-		if (ret < 0) {
-			scoutfs_inc_counter(sb, lock_shrink_request_failed);
-
-			spin_lock(&linfo->lock);
-
-			lock->request_pending = 0;
-			wake_up(&lock->waitq);
-			__put_lock(linfo, lock, false);
-
-			spin_unlock(&linfo->lock);
-		} else {
-			scoutfs_inc_counter(sb, lock_shrink_attempted);
-			trace_scoutfs_lock_shrink(sb, lock);
-		}
-	}
-
-	return lock && ret == 0;
-}
-
 static bool lock_wait_cond(struct super_block *sb, struct scoutfs_lock *lock,
 			   enum scoutfs_lock_mode mode)
 {
@@ -1005,8 +935,6 @@ static int lock_key_range(struct super_block *sb, enum scoutfs_lock_mode mode, i
 	if (WARN_ON_ONCE(scoutfs_trans_held()))
 		return -EDEADLK;

-	try_shrink_lock(sb, linfo, false);
-
 	spin_lock(&linfo->lock);

 	/* drops and re-acquires lock if it allocates */
@@ -1450,12 +1378,134 @@ bool scoutfs_lock_protected(struct scoutfs_lock *lock, struct scoutfs_key *key,
 					  &lock->start, &lock->end) == 0;
 }

+/*
+ * The shrink callback got the lock, marked it request_pending, and put
+ * it on the shrink list.  We send a null request and the lock will be
+ * freed by the response once all users drain.  If this races with
+ * invalidation then the server will only send the grant response once
+ * the invalidation is finished.
+ */
+static void lock_shrink_worker(struct work_struct *work)
+{
+	struct lock_info *linfo = container_of(work, struct lock_info,
+					       shrink_work);
+	struct super_block *sb = linfo->sb;
+	struct scoutfs_net_lock nl;
+	struct scoutfs_lock *lock;
+	struct scoutfs_lock *tmp;
+	LIST_HEAD(list);
+	int ret;
+
+	scoutfs_inc_counter(sb, lock_shrink_work);
+
+	spin_lock(&linfo->lock);
+	list_splice_init(&linfo->shrink_list, &list);
+	spin_unlock(&linfo->lock);
+
+	list_for_each_entry_safe(lock, tmp, &list, shrink_head) {
+		list_del_init(&lock->shrink_head);
+
+		/* unlocked lock access, but should be stable since we queued */
+		nl.key = lock->start;
+		nl.old_mode = lock->mode;
+		nl.new_mode = SCOUTFS_LOCK_NULL;
+
+		ret = scoutfs_client_lock_request(sb, &nl);
+		if (ret) {
+			/* oh well, not freeing */
+			scoutfs_inc_counter(sb, lock_shrink_aborted);
+
+			spin_lock(&linfo->lock);
+
+			lock->request_pending = 0;
+			wake_up(&lock->waitq);
+			put_lock(linfo, lock);
+
+			spin_unlock(&linfo->lock);
+		}
+	}
+}
+
+static unsigned long lock_count_objects(struct shrinker *shrink,
+					struct shrink_control *sc)
+{
+	struct lock_info *linfo = KC_SHRINKER_CONTAINER_OF(shrink, struct lock_info);
+	struct super_block *sb = linfo->sb;
+
+	scoutfs_inc_counter(sb, lock_count_objects);
+
+	return shrinker_min_long(linfo->lru_nr);
+}
+
+/*
+ * Start the shrinking process for locks on the lru.  If a lock is on
+ * the lru then it can't have any active users.  We don't want to block
+ * or allocate here so all we do is get the lock, mark it request
+ * pending, and kick off the work.  The work sends a null request and
+ * eventually the lock is freed by its response.
+ *
+ * Only a racing lock attempt that isn't matched can prevent the lock
+ * from being freed.  It'll block waiting to send its request for its
+ * mode which will prevent the lock from being freed when the null
+ * response arrives.
+ */
+static unsigned long lock_scan_objects(struct shrinker *shrink,
+				       struct shrink_control *sc)
+{
+	struct lock_info *linfo = KC_SHRINKER_CONTAINER_OF(shrink, struct lock_info);
+	struct super_block *sb = linfo->sb;
+	struct scoutfs_lock *lock;
+	struct scoutfs_lock *tmp;
+	unsigned long freed = 0;
+	unsigned long nr = sc->nr_to_scan;
+	bool added = false;
+
+	scoutfs_inc_counter(sb, lock_scan_objects);
+
+	spin_lock(&linfo->lock);
+
+restart:
+	list_for_each_entry_safe(lock, tmp, &linfo->lru_list, lru_head) {
+
+		BUG_ON(!lock_idle(lock));
+		BUG_ON(lock->mode == SCOUTFS_LOCK_NULL);
+		BUG_ON(!list_empty(&lock->shrink_head));
+
+		if (nr-- == 0)
+			break;
+
+		__lock_del_lru(linfo, lock);
+		lock->request_pending = 1;
+		list_add_tail(&lock->shrink_head, &linfo->shrink_list);
+		added = true;
+		freed++;
+
+		scoutfs_inc_counter(sb, lock_shrink_attempted);
+		trace_scoutfs_lock_shrink(sb, lock);
+
+		/* could have bazillions of idle locks */
+		if (cond_resched_lock(&linfo->lock))
+			goto restart;
+	}
+
+	spin_unlock(&linfo->lock);
+
+	if (added)
+		queue_work(linfo->workq, &linfo->shrink_work);
+
+	trace_scoutfs_lock_shrink_exit(sb, sc->nr_to_scan, freed);
+	return freed;
+}
+
 void scoutfs_free_unused_locks(struct super_block *sb)
 {
-	DECLARE_LOCK_INFO(sb, linfo);
+	struct lock_info *linfo = SCOUTFS_SB(sb)->lock_info;
+	struct shrink_control sc = {
+		.gfp_mask = GFP_NOFS,
+		.nr_to_scan = INT_MAX,
+	};

-	while (try_shrink_lock(sb, linfo, true))
-		cond_resched();
+	lock_scan_objects(KC_SHRINKER_FN(&linfo->shrinker), &sc);
 }

 static void lock_tseq_show(struct seq_file *m, struct scoutfs_tseq_entry *ent)
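The shrinker split in this hunk keeps `lock_scan_objects()` non-blocking. It moves up to `nr_to_scan` idle locks from the LRU to `shrink_list` and marks them `request_pending`. Sending the null requests is left to `lock_shrink_worker()`. The sketch below models only the count/scan bookkeeping, including keeping `lru_nr` in step with the list. `struct sk_lock`, `struct sk_info`, `sk_count()` and `sk_scan()` are hypothetical. Locking, the worker and the server round trip are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified lock: only the fields the scan touches. */
struct sk_lock {
	bool request_pending;
	struct sk_lock *lru_next;
	struct sk_lock *shrink_next;
};

struct sk_info {
	struct sk_lock *lru_head;	/* idle locks, oldest first */
	unsigned long lru_nr;		/* kept in step with the lru list */
	struct sk_lock *shrink_head;	/* handed to the worker */
};

/* count_objects: report how many idle locks could be shrunk. */
static unsigned long sk_count(const struct sk_info *li)
{
	return li->lru_nr;
}

/*
 * scan_objects: move up to nr idle locks from the lru to the shrink
 * list without blocking.  A worker would later send the null requests.
 */
static unsigned long sk_scan(struct sk_info *li, unsigned long nr)
{
	unsigned long freed = 0;
	struct sk_lock *lk;

	while (nr-- > 0 && (lk = li->lru_head) != NULL) {
		li->lru_head = lk->lru_next;
		lk->lru_next = NULL;
		li->lru_nr--;

		lk->request_pending = true;
		lk->shrink_next = li->shrink_head;	/* order doesn't matter here */
		li->shrink_head = lk;
		freed++;
	}
	return freed;
}
```

Like `lock_scan_objects()`, the sketch returns the number of locks it queued. The memory itself is only released later, when the null grant arrives and the lock is freed.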
@@ -1538,10 +1588,10 @@ u64 scoutfs_lock_ino_refresh_gen(struct super_block *sb, u64 ino)
 * transitions and sending requests.   We set the shutdown flag to catch
 * anyone who breaks this rule.
 *
- * With no more lock callers, we'll no longer try to shrink the pool of
- * granted locks.  We'll free all of them as _destroy() is called after
- * the farewell response indicates that the server tore down all our
- * lock state.
+ * We unregister the shrinker so that we won't try and send null
+ * requests in response to memory pressure.  The locks will all be
+ * unceremoniously dropped once we get a farewell response from the
+ * server, which indicates that it destroyed our locking state.
 *
 * We will still respond to invalidation requests that have to be
 * processed to let unmount in other mounts acquire locks and make
@@ -1561,6 +1611,10 @@ void scoutfs_lock_shutdown(struct super_block *sb)

 	trace_scoutfs_lock_shutdown(sb, linfo);

+	/* stop the shrinker from queueing work */
+	KC_UNREGISTER_SHRINKER(&linfo->shrinker);
+	flush_work(&linfo->shrink_work);
+
 	/* cause current and future lock calls to return errors */
 	spin_lock(&linfo->lock);
 	linfo->shutdown = true;
@@ -1651,6 +1705,8 @@ void scoutfs_lock_destroy(struct super_block *sb)
 			list_del_init(&lock->inv_head);
 			lock->invalidate_pending = 0;
 		}
+		if (!list_empty(&lock->shrink_head))
+			list_del_init(&lock->shrink_head);
 		lock_remove(linfo, lock);
 		lock_free(linfo, lock);
 	}
@@ -1675,9 +1731,14 @@ int scoutfs_lock_setup(struct super_block *sb)
 	spin_lock_init(&linfo->lock);
 	linfo->lock_tree = RB_ROOT;
 	linfo->lock_range_tree = RB_ROOT;
+	KC_INIT_SHRINKER_FUNCS(&linfo->shrinker, lock_count_objects,
+			       lock_scan_objects);
+	KC_REGISTER_SHRINKER(&linfo->shrinker, "scoutfs-lock:" SCSBF, SCSB_ARGS(sb));
 	INIT_LIST_HEAD(&linfo->lru_list);
 	INIT_WORK(&linfo->inv_work, lock_invalidate_worker);
 	INIT_LIST_HEAD(&linfo->inv_list);
+	INIT_WORK(&linfo->shrink_work, lock_shrink_worker);
+	INIT_LIST_HEAD(&linfo->shrink_list);
 	atomic64_set(&linfo->next_refresh_gen, 0);
 	scoutfs_tseq_tree_init(&linfo->tseq_tree, lock_tseq_show);

@@ -506,19 +506,6 @@ out:
 * because we don't know which locks they'll hold.  Once recover
 * finishes the server calls us to kick all the locks that were waiting
 * during recovery.
- *
- * The calling server shuts down if we return errors indicating that we
- * weren't able to ensure forward progress in the lock state machine.
- *
- * Failure to send to a disconnected client is not a fatal error.
- * During normal disconnection the client's state is removed before
- * their connection is destroyed.  We can't use state to try and send to
- * a non-existing connection.  But a client that fails to reconnect is
- * disconnected before being fenced.  If we have multiple disconnected
- * clients we can try to send to one while cleaning up another.  If
- * they've uncleanly disconnected their locks are going to be removed
- * and the lock can make forward progress again.  Or we'll shutdown for
- * failure to fence.
 */
 static int process_waiting_requests(struct super_block *sb,
 				    struct server_lock_node *snode)
@@ -610,10 +597,6 @@ static int process_waiting_requests(struct super_block *sb,
 out:
 	put_server_lock(inf, snode);

-	/* disconnected clients will be fenced, trying to send to them isn't fatal */
-	if (ret == -ENOTCONN)
-		ret = 0;
-
 	return ret;
 }

@@ -35,12 +35,6 @@ do {									\
 	}								\
 } while (0)								\

-#define scoutfs_bug_on_err(sb, err, fmt, args...) \
-do { \
-	__typeof__(err) _err = (err); \
-	scoutfs_bug_on(sb, _err < 0 && _err != -ENOLINK, fmt, ##args); \
-} while (0)
-
 /*
 * Each message is only generated once per volume.  Remounting resets
 * the messages.
@@ -20,8 +20,6 @@
 #include <net/sock.h>
 #include <net/tcp.h>
 #include <linux/log2.h>
-#include <linux/jhash.h>
-#include <linux/rbtree.h>

 #include "format.h"
 #include "counters.h"
@@ -33,7 +31,6 @@
 #include "endian_swap.h"
 #include "tseq.h"
 #include "fence.h"
-#include "options.h"

 /*
 * scoutfs networking delivers requests and responses between nodes.
@@ -126,7 +123,6 @@ struct message_send {
 	unsigned long dead:1;
 	struct list_head head;
 	scoutfs_net_response_t resp_func;
-	struct rb_node node;
 	void *resp_data;
 	struct scoutfs_net_header nh;
 };
@@ -138,7 +134,6 @@ struct message_send {
 struct message_recv {
 	struct scoutfs_tseq_entry tseq_entry;
 	struct work_struct proc_work;
-	struct list_head ordered_head;
 	struct scoutfs_net_connection *conn;
 	struct scoutfs_net_header nh;
 };
@@ -163,118 +158,49 @@ static bool nh_is_request(struct scoutfs_net_header *nh)
 	return !nh_is_response(nh);
 }

-static int cmp_sorted_msend(u64 pos, struct message_send *msend)
-{
-	if (nh_is_request(&msend->nh))
-		return pos < le64_to_cpu(msend->nh.id) ? -1 :
-		       pos > le64_to_cpu(msend->nh.id) ? 1 : 0;
-	else
-		return pos < le64_to_cpu(msend->nh.seq) ? -1 :
-		       pos > le64_to_cpu(msend->nh.seq) ? 1 : 0;
-}
-
-static struct message_send *search_sorted_msends(struct rb_root *root, u64 pos, struct rb_node *ins)
-{
-	struct rb_node **node = &root->rb_node;
-	struct rb_node *parent = NULL;
-	struct message_send *msend = NULL;
-	struct message_send *next = NULL;
-	int cmp = -1;
-
-	while (*node) {
-		parent = *node;
-		msend = container_of(*node, struct message_send, node);
-
-		cmp = cmp_sorted_msend(pos, msend);
-		if (cmp < 0) {
-			next = msend;
-			node = &(*node)->rb_left;
-		} else if (cmp > 0) {
-			node = &(*node)->rb_right;
-		} else {
-			next = msend;
-			break;
-		}
-	}
-
-	BUG_ON(cmp == 0 && ins);
-
-	if (ins) {
-		rb_link_node(ins, parent, node);
-		rb_insert_color(ins, root);
-	}
-
-	return next;
-}
-
-static struct message_send *next_sorted_msend(struct message_send *msend)
-{
-	struct rb_node *node = rb_next(&msend->node);
-
-	return node ? rb_entry(node, struct message_send, node) : NULL;
-}
-
-#define for_each_sorted_msend(MSEND_, TMP_, ROOT_, POS_) \
-	for (MSEND_ = search_sorted_msends(ROOT_, POS_, NULL); \
-	     MSEND_ != NULL && ({ TMP_ = next_sorted_msend(MSEND_); true; }); \
-	     MSEND_ = TMP_)
-
-static void insert_sorted_msend(struct scoutfs_net_connection *conn, struct message_send *msend)
-{
-	BUG_ON(!RB_EMPTY_NODE(&msend->node));
-
-	if (nh_is_request(&msend->nh))
-		search_sorted_msends(&conn->req_root, le64_to_cpu(msend->nh.id), &msend->node);
-	else
-		search_sorted_msends(&conn->resp_root, le64_to_cpu(msend->nh.seq), &msend->node);
-}
-
-static void erase_sorted_msend(struct scoutfs_net_connection *conn, struct message_send *msend)
-{
-	if (!RB_EMPTY_NODE(&msend->node)) {
-		if (nh_is_request(&msend->nh))
-			rb_erase(&msend->node, &conn->req_root);
-		else
-			rb_erase(&msend->node, &conn->resp_root);
-		RB_CLEAR_NODE(&msend->node);
-	}
-}
-
-static void move_sorted_msends(struct scoutfs_net_connection *dst_conn, struct rb_root *dst_root,
-			       struct scoutfs_net_connection *src_conn, struct rb_root *src_root)
-{
-	struct message_send *msend;
-	struct message_send *tmp;
-
-	for_each_sorted_msend(msend, tmp, src_root, 0) {
-		erase_sorted_msend(src_conn, msend);
-		insert_sorted_msend(dst_conn, msend);
-	}
-}
-
 /*
- * Pending requests are uniquely identified by the id they were assigned
- * as they were first put on the send queue.
+ * We return dead requests so that the caller can stop searching other
+ * lists for the dead request that we found.
 */
-static struct message_send *find_request(struct scoutfs_net_connection *conn, u8 cmd, u64 id)
+static struct message_send *search_list(struct scoutfs_net_connection *conn,
+					struct list_head *list,
+					u8 cmd, u64 id)
 {
 	struct message_send *msend;

 	assert_spin_locked(&conn->lock);

-	msend = search_sorted_msends(&conn->req_root, id, NULL);
-	if (msend && !(msend->nh.cmd == cmd && le64_to_cpu(msend->nh.id) == id))
-		msend = NULL;
+	list_for_each_entry(msend, list, head) {
+		if (nh_is_request(&msend->nh) && msend->nh.cmd == cmd &&
+		    le64_to_cpu(msend->nh.id) == id)
+			return msend;
+	}

+	return NULL;
+}
+
+/*
+ * Find an active send request on the lists.  It's almost certainly
+ * waiting on the resend queue but it could be actively being sent.
+ */
+static struct message_send *find_request(struct scoutfs_net_connection *conn,
+					 u8 cmd, u64 id)
+{
+	struct message_send *msend;
+
+	msend = search_list(conn, &conn->resend_queue, cmd, id) ?:
+		search_list(conn, &conn->send_queue, cmd, id);
+	if (msend && msend->dead)
+		msend = NULL;
 	return msend;
 }

 /*
- * Free a send message by moving it to the send queue and marking it
- * dead.  It is removed from the sorted rb roots so it won't be visible
- * as a request for response processing.
+ * Complete a send message by moving it to the send queue and marking it
+ * to be freed.  It won't be visible to callers trying to find sends.
 */
-static void queue_dead_free(struct scoutfs_net_connection *conn, struct message_send *msend)
+static void complete_send(struct scoutfs_net_connection *conn,
+			  struct message_send *msend)
 {
 	assert_spin_locked(&conn->lock);

@@ -284,7 +210,6 @@ static void queue_dead_free(struct scoutfs_net_connection *conn, struct message_

 	msend->dead = 1;
 	list_move(&msend->head, &conn->send_queue);
-	erase_sorted_msend(conn, msend);
 	queue_work(conn->workq, &conn->send_work);
 }

@@ -336,7 +261,7 @@ static inline u8 net_err_from_host(struct super_block *sb, int error)
 				     error);
 		}

-		return SCOUTFS_NET_ERR_EINVAL;
+		return -EINVAL;
 	}

 	return net_errs[ind];
@@ -407,7 +332,7 @@ static int submit_send(struct super_block *sb,
 		return -EINVAL;

 	if (scoutfs_forcing_unmount(sb))
-		return -ENOLINK;
+		return -EIO;

 	msend = kmalloc(offsetof(struct message_send,
 				 nh.data[data_len]), GFP_NOFS);
@@ -442,7 +367,6 @@ static int submit_send(struct super_block *sb,
 	msend->resp_func = resp_func;
 	msend->resp_data = resp_data;
 	msend->dead = 0;
-	RB_CLEAR_NODE(&msend->node);

 	msend->nh.seq = cpu_to_le64(seq);
 	msend->nh.recv_seq = 0;  /* set when sent, not when queued */
@@ -463,7 +387,6 @@ static int submit_send(struct super_block *sb,
 	} else {
 		list_add_tail(&msend->head, &conn->resend_queue);
 	}
-	insert_sorted_msend(conn, msend);

 	if (id_ret)
 		*id_ret = le64_to_cpu(msend->nh.id);
@@ -525,7 +448,7 @@ static int process_response(struct scoutfs_net_connection *conn,
 	struct super_block *sb = conn->sb;
 	struct message_send *msend;
 	scoutfs_net_response_t resp_func = NULL;
-	void *resp_data = NULL;
+	void *resp_data;

 	spin_lock(&conn->lock);

@@ -533,7 +456,7 @@ static int process_response(struct scoutfs_net_connection *conn,
 	if (msend) {
 		resp_func = msend->resp_func;
 		resp_data = msend->resp_data;
-		queue_dead_free(conn, msend);
+		complete_send(conn, msend);
 	} else {
 		scoutfs_inc_counter(sb, net_dropped_response);
 	}
@@ -575,83 +498,76 @@ static void scoutfs_net_proc_worker(struct work_struct *work)
 	trace_scoutfs_net_proc_work_exit(sb, 0, ret);
 }

-static void scoutfs_net_ordered_proc_worker(struct work_struct *work)
-{
-	struct scoutfs_work_list *wlist = container_of(work, struct scoutfs_work_list, work);
-	struct message_recv *mrecv;
-	struct message_recv *mrecv__;
-	LIST_HEAD(list);
-
-	spin_lock(&wlist->lock);
-	list_splice_init(&wlist->list, &list);
-	spin_unlock(&wlist->lock);
-
-	list_for_each_entry_safe(mrecv, mrecv__, &list, ordered_head) {
-		list_del_init(&mrecv->ordered_head);
-		scoutfs_net_proc_worker(&mrecv->proc_work);
-	}
-}
-
-/*
- * Some messages require in-order processing.  But the scope of the
- * ordering isn't global.  In the case of lock messages, it's per lock.
- * So for these messages we hash them to a number of ordered workers who
- * walk a list and call the usual work function in order.  This replaced
- * first the proc work detecting OOO and re-ordering, and then only
- * calling proc from the one recv work context.
- */
-static void queue_ordered_proc(struct scoutfs_net_connection *conn, struct message_recv *mrecv)
-{
-	struct scoutfs_work_list *wlist;
-	struct scoutfs_net_lock *nl;
-	u32 h;
-
-	if (WARN_ON_ONCE(mrecv->nh.cmd != SCOUTFS_NET_CMD_LOCK ||
-		         le16_to_cpu(mrecv->nh.data_len) != sizeof(struct scoutfs_net_lock)))
-		return scoutfs_net_proc_worker(&mrecv->proc_work);
-
-	nl = (void *)mrecv->nh.data;
-	h = jhash(&nl->key, sizeof(struct scoutfs_key), 0x6fdd3cd5);
-	wlist = &conn->ordered_proc_wlists[h % conn->ordered_proc_nr];
-
-	spin_lock(&wlist->lock);
-	list_add_tail(&mrecv->ordered_head, &wlist->list);
-	spin_unlock(&wlist->lock);
-	queue_work(conn->workq, &wlist->work);
-}
-
 /*
 * Free live responses up to and including the seq by marking them dead
 * and moving them to the send queue to be freed.
 */
-static void free_acked_responses(struct scoutfs_net_connection *conn, u64 seq)
+static bool move_acked_responses(struct scoutfs_net_connection *conn,
+				 struct list_head *list, u64 seq)
 {
 	struct message_send *msend;
 	struct message_send *tmp;
+	bool moved = false;
+
+	assert_spin_locked(&conn->lock);
+
+	list_for_each_entry_safe(msend, tmp, list, head) {
+		if (le64_to_cpu(msend->nh.seq) > seq)
+			break;
+		if (!nh_is_response(&msend->nh) || msend->dead)
+			continue;
+
+		msend->dead = 1;
+		list_move(&msend->head, &conn->send_queue);
+		moved = true;
+	}
+
+	return moved;
+}
+
+/* acks are processed inline in the recv worker */
+static void free_acked_responses(struct scoutfs_net_connection *conn, u64 seq)
+{
+	bool moved;

 	spin_lock(&conn->lock);

-	for_each_sorted_msend(msend, tmp, &conn->resp_root, 0) {
-		if (le64_to_cpu(msend->nh.seq) > seq)
-			break;
-
-		queue_dead_free(conn, msend);
-	}
+	moved = move_acked_responses(conn, &conn->send_queue, seq) |
+		move_acked_responses(conn, &conn->resend_queue, seq);

 	spin_unlock(&conn->lock);
+
+	if (moved)
+		queue_work(conn->workq, &conn->send_work);
 }

-static int k_recvmsg(struct socket *sock, void *buf, unsigned len)
+static int recvmsg_full(struct socket *sock, void *buf, unsigned len)
 {
-	struct kvec kv = {
-		.iov_base = buf,
-		.iov_len = len,
-	};
-	struct msghdr msg = {
-		.msg_flags = MSG_NOSIGNAL,
-	};
+	struct msghdr msg;
+	struct kvec kv;
+	int ret;

-	return kernel_recvmsg(sock, &msg, &kv, 1, len, msg.msg_flags);
+	while (len) {
+		memset(&msg, 0, sizeof(msg));
+		msg.msg_flags = MSG_NOSIGNAL;
+		kv.iov_base = buf;
+		kv.iov_len = len;
+
+#ifndef KC_MSGHDR_STRUCT_IOV_ITER
+		msg.msg_iov = (struct iovec *)&kv;
+		msg.msg_iovlen = 1;
+#else
+		iov_iter_init(&msg.msg_iter, READ, (struct iovec *)&kv, len, 1);
+#endif
+		ret = kernel_recvmsg(sock, &msg, &kv, 1, len, msg.msg_flags);
+		if (ret <= 0)
+			return -ECONNABORTED;
+
+		len -= ret;
+		buf += ret;
+	}
+
+	return 0;
 }

 static bool invalid_message(struct scoutfs_net_connection *conn,
@@ -688,72 +604,6 @@ static bool invalid_message(struct scoutfs_net_connection *conn,
 	return false;
 }

-static int recv_one_message(struct super_block *sb, struct net_info *ninf,
-			    struct scoutfs_net_connection *conn, struct scoutfs_net_header *nh,
-			    unsigned int data_len)
-{
-	struct message_recv *mrecv;
-	int ret;
-
-	scoutfs_inc_counter(sb, net_recv_messages);
-	scoutfs_add_counter(sb, net_recv_bytes, nh_bytes(data_len));
-	trace_scoutfs_net_recv_message(sb, &conn->sockname, &conn->peername, nh);
-
-	/* caller's invalid message checked data len */
-	mrecv = kmalloc(offsetof(struct message_recv, nh.data[data_len]), GFP_NOFS);
-	if (!mrecv) {
-		ret = -ENOMEM;
-		goto out;
-	}
-
-	mrecv->conn = conn;
-	INIT_WORK(&mrecv->proc_work, scoutfs_net_proc_worker);
-	INIT_LIST_HEAD(&mrecv->ordered_head);
-	mrecv->nh = *nh;
-	if (data_len)
-		memcpy(mrecv->nh.data, (nh + 1), data_len);
-
-	if (nh->cmd == SCOUTFS_NET_CMD_GREETING) {
-		/* greetings are out of band, no seq mechanics */
-		set_conn_fl(conn, saw_greeting);
-
-	} else if (le64_to_cpu(nh->seq) <=
-		   atomic64_read(&conn->recv_seq)) {
-		/* drop any resent duplicated messages */
-		scoutfs_inc_counter(sb, net_recv_dropped_duplicate);
-		kfree(mrecv);
-		ret = 0;
-		goto out;
-
-	} else {
-		/* record that we've received sender's seq */
-		atomic64_set(&conn->recv_seq, le64_to_cpu(nh->seq));
-		/* and free our responses that sender has received */
-		free_acked_responses(conn, le64_to_cpu(nh->recv_seq));
-	}
-
-	scoutfs_tseq_add(&ninf->msg_tseq_tree, &mrecv->tseq_entry);
-
-	/*
-	 * Initial received greetings are processed inline
-	 * before any other incoming messages.
-	 *
-	 * Incoming requests or responses to the lock client
-	 * can't handle re-ordering, so they're queued to
-	 * ordered receive processing work.
-	 */
-	if (nh->cmd == SCOUTFS_NET_CMD_GREETING)
-		scoutfs_net_proc_worker(&mrecv->proc_work);
-	else if (nh->cmd == SCOUTFS_NET_CMD_LOCK && !conn->listening_conn)
-		queue_ordered_proc(conn, mrecv);
-	else
-		queue_work(conn->workq, &mrecv->proc_work);
-	ret = 0;
-
-out:
-	return ret;
-}
-
 /*
 * Always block receiving from the socket.  Errors trigger shutting down
 * the connection.
@@ -764,72 +614,86 @@ static void scoutfs_net_recv_worker(struct work_struct *work)
 	struct super_block *sb = conn->sb;
 	struct net_info *ninf = SCOUTFS_SB(sb)->net_info;
 	struct socket *sock = conn->sock;
-	struct scoutfs_net_header *nh;
-	struct page *page = NULL;
+	struct scoutfs_net_header nh;
+	struct message_recv *mrecv;
 	unsigned int data_len;
-	int hdr_off;
-	int rx_off;
-	int size;
 	int ret;

 	trace_scoutfs_net_recv_work_enter(sb, 0, 0);

-	page = alloc_page(GFP_NOFS);
-	if (!page) {
-		ret = -ENOMEM;
-		goto out;
-	}
-
-	hdr_off = 0;
-	rx_off = 0;
-
 	for (;;) {
 		/* receive the header */
-		ret = k_recvmsg(sock, page_address(page) + rx_off, PAGE_SIZE - rx_off);
-		if (ret <= 0) {
-			ret = -ECONNABORTED;
-			goto out;
+		ret = recvmsg_full(sock, &nh, sizeof(nh));
+		if (ret)
+			break;
+
+		/* receiving an invalid message breaks the connection */
+		if (invalid_message(conn, &nh)) {
+			scoutfs_inc_counter(sb, net_recv_invalid_message);
+			ret = -EBADMSG;
+			break;
 		}

-		rx_off += ret;
+		data_len = le16_to_cpu(nh.data_len);

-		for (;;) {
-			size = rx_off - hdr_off;
-			if (size < sizeof(struct scoutfs_net_header))
-				break;
+		scoutfs_inc_counter(sb, net_recv_messages);
+		scoutfs_add_counter(sb, net_recv_bytes, nh_bytes(data_len));
+		trace_scoutfs_net_recv_message(sb, &conn->sockname,
+					       &conn->peername, &nh);

-			nh = page_address(page) + hdr_off;
-
-			/* receiving an invalid message breaks the connection */
-			if (invalid_message(conn, nh)) {
-				scoutfs_inc_counter(sb, net_recv_invalid_message);
-				ret = -EBADMSG;
-				goto out;
-			}
-
-			data_len = le16_to_cpu(nh->data_len);
-			if (sizeof(struct scoutfs_net_header) + data_len > size)
-				break;
-
-			ret = recv_one_message(sb, ninf, conn, nh, data_len);
-			if (ret < 0)
-				goto out;
-
-			hdr_off += sizeof(struct scoutfs_net_header) + data_len;
+		/* invalid message checked data len */
+		mrecv = kmalloc(offsetof(struct message_recv,
+					 nh.data[data_len]), GFP_NOFS);
+		if (!mrecv) {
+			ret = -ENOMEM;
+			break;
 		}

-		if ((PAGE_SIZE - rx_off) <
-		    (sizeof(struct scoutfs_net_header) + SCOUTFS_NET_MAX_DATA_LEN)) {
-			if (size)
-				memmove(page_address(page), page_address(page) + hdr_off, size);
-			hdr_off = 0;
-			rx_off = size;
+		mrecv->conn = conn;
+		INIT_WORK(&mrecv->proc_work, scoutfs_net_proc_worker);
+		mrecv->nh = nh;
+
+		/* receive the data payload */
+		ret = recvmsg_full(sock, mrecv->nh.data, data_len);
+		if (ret) {
+			kfree(mrecv);
+			break;
 		}
+
+		if (nh.cmd == SCOUTFS_NET_CMD_GREETING) {
+			/* greetings are out of band, no seq mechanics */
+			set_conn_fl(conn, saw_greeting);
+
+		} else if (le64_to_cpu(nh.seq) <=
+			   atomic64_read(&conn->recv_seq)) {
+			/* drop any resent duplicated messages */
+			scoutfs_inc_counter(sb, net_recv_dropped_duplicate);
+			kfree(mrecv);
+			continue;
+
+		} else {
+			/* record that we've received sender's seq */
+			atomic64_set(&conn->recv_seq, le64_to_cpu(nh.seq));
+			/* and free our responses that sender has received */
+			free_acked_responses(conn, le64_to_cpu(nh.recv_seq));
+		}
+
+		scoutfs_tseq_add(&ninf->msg_tseq_tree, &mrecv->tseq_entry);
+
+		/*
+		 * Initial received greetings are processed
+		 * synchronously before any other incoming messages.
+		 *
+		 * Incoming requests or responses to the lock client are
+		 * called synchronously to avoid reordering.
+		 */
+		if (nh.cmd == SCOUTFS_NET_CMD_GREETING ||
+		    (nh.cmd == SCOUTFS_NET_CMD_LOCK && !conn->listening_conn))
+			scoutfs_net_proc_worker(&mrecv->proc_work);
+		else
+			queue_work(conn->workq, &mrecv->proc_work);
 	}

-out:
-	__free_page(page);
-
 	if (ret)
 		scoutfs_inc_counter(sb, net_recv_error);

@@ -839,48 +703,38 @@ out:
 	trace_scoutfs_net_recv_work_exit(sb, 0, ret);
 }

-/*
- * This consumes the kvec.
- */
-static int k_sendmsg_full(struct socket *sock, struct kvec *kv, unsigned long nr_segs, size_t count)
+static int sendmsg_full(struct socket *sock, void *buf, unsigned len)
 {
-	int ret = 0;
+	struct msghdr msg;
+	struct kvec kv;
+	int ret;

-	while (count > 0) {
-		struct msghdr msg = {
-			.msg_flags = MSG_NOSIGNAL,
-		};
+	while (len) {
+		memset(&msg, 0, sizeof(msg));
+		msg.msg_flags = MSG_NOSIGNAL;
+		kv.iov_base = buf;
+		kv.iov_len = len;

-		ret = kernel_sendmsg(sock, &msg, kv, nr_segs, count);
-		if (ret <= 0) {
-			ret = -ECONNABORTED;
-			break;
-		}
+#ifndef KC_MSGHDR_STRUCT_IOV_ITER
+		msg.msg_iov = (struct iovec *)&kv;
+		msg.msg_iovlen = 1;
+#else
+		iov_iter_init(&msg.msg_iter, WRITE, (struct iovec *)&kv, len, 1);
+#endif
+		ret = kernel_sendmsg(sock, &msg, &kv, 1, len);
+		if (ret <= 0)
+			return -ECONNABORTED;

-		count -= ret;
-		if (count) {
-			while (nr_segs > 0 && ret >= kv->iov_len) {
-				ret -= kv->iov_len;
-				kv++;
-				nr_segs--;
-			}
-			if (nr_segs > 0 && ret > 0) {
-				kv->iov_base += ret;
-				kv->iov_len -= ret;
-			}
-			BUG_ON(nr_segs == 0);
-		}
-		ret = 0;
+		len -= ret;
+		buf += ret;
 	}
-	
-	return ret;
+
+	return 0;
 }

-static void free_msend(struct net_info *ninf, struct scoutfs_net_connection *conn,
-		       struct message_send *msend)
+static void free_msend(struct net_info *ninf, struct message_send *msend)
 {
 	list_del_init(&msend->head);
-	erase_sorted_msend(conn, msend);
 	scoutfs_tseq_del(&ninf->msg_tseq_tree, &msend->tseq_entry);
 	kfree(msend);
 }
@@ -906,74 +760,54 @@ static void scoutfs_net_send_worker(struct work_struct *work)
 	struct super_block *sb = conn->sb;
 	struct net_info *ninf = SCOUTFS_SB(sb)->net_info;
 	struct message_send *msend;
-	struct message_send *_msend_;
-	struct kvec kv[16];
-	unsigned long nr_segs;
-	size_t count;
+	int ret = 0;
 	int len;
-	int ret;

 	trace_scoutfs_net_send_work_enter(sb, 0, 0);

-	for (;;) {
-		nr_segs = 0;
-		count = 0;
+	spin_lock(&conn->lock);
+
+	while ((msend = list_first_entry_or_null(&conn->send_queue,
+						 struct message_send, head))) {
+
+		if (msend->dead) {
+			free_msend(ninf, msend);
+			continue;
+		}
+
+		if ((msend->nh.cmd == SCOUTFS_NET_CMD_FAREWELL) &&
+		    nh_is_response(&msend->nh)) {
+			set_conn_fl(conn, saw_farewell);
+		}
+
+		msend->nh.recv_seq =
+			cpu_to_le64(atomic64_read(&conn->recv_seq));
+
+		spin_unlock(&conn->lock);
+
+		len = nh_bytes(le16_to_cpu(msend->nh.data_len));
+
+		scoutfs_inc_counter(sb, net_send_messages);
+		scoutfs_add_counter(sb, net_send_bytes, len);
+		trace_scoutfs_net_send_message(sb, &conn->sockname,
+					       &conn->peername, &msend->nh);
+
+		ret = sendmsg_full(conn->sock, &msend->nh, len);

 		spin_lock(&conn->lock);

-		list_for_each_entry_safe(msend, _msend_, &conn->send_queue, head) {
-			if (msend->dead) {
-				free_msend(ninf, conn, msend);
-				continue;
-			}
+		msend->nh.recv_seq = 0;

-			len = nh_bytes(le16_to_cpu(msend->nh.data_len));
+		if (ret)
+			break;

-			if ((msend->nh.cmd == SCOUTFS_NET_CMD_FAREWELL) &&
-			    nh_is_response(&msend->nh)) {
-				set_conn_fl(conn, saw_farewell);
-			}
-
-			msend->nh.recv_seq = cpu_to_le64(atomic64_read(&conn->recv_seq));
-
-			scoutfs_inc_counter(sb, net_send_messages);
-			scoutfs_add_counter(sb, net_send_bytes, len);
-			trace_scoutfs_net_send_message(sb, &conn->sockname,
-						       &conn->peername, &msend->nh);
-
-			count += len;
-			kv[nr_segs].iov_base = &msend->nh;
-			kv[nr_segs].iov_len = len;
-			if (++nr_segs == ARRAY_SIZE(kv))
-				break;
-
-		}
-		spin_unlock(&conn->lock);
-
-		if (nr_segs == 0) {
-			ret = 0;
-			goto out;
-		}
-
-		ret = k_sendmsg_full(conn->sock, kv, nr_segs, count);
-		if (ret < 0)
-			goto out;
-
-		spin_lock(&conn->lock);
-		list_for_each_entry_safe(msend, _msend_, &conn->send_queue, head) {
-			msend->nh.recv_seq = 0;
-
-			/* resend if it wasn't freed while we sent */
-			if (!msend->dead)
-				list_move_tail(&msend->head, &conn->resend_queue);
-
-			if (--nr_segs == 0)
-				break;
-		}
-		spin_unlock(&conn->lock);
+		/* resend if it wasn't freed while we sent */
+		if (!msend->dead)
+			list_move_tail(&msend->head, &conn->resend_queue);
 	}

-out:
+	spin_unlock(&conn->lock);
+
 	if (ret) {
 		scoutfs_inc_counter(sb, net_send_error);
 		shutdown_conn(conn);
@@ -1012,7 +846,7 @@ static void scoutfs_net_destroy_worker(struct work_struct *work)

 	list_splice_init(&conn->resend_queue, &conn->send_queue);
 	list_for_each_entry_safe(msend, tmp, &conn->send_queue, head)
-		free_msend(ninf, conn, msend);
+		free_msend(ninf, msend);

 	/* accepted sockets are removed from their listener's list */
 	if (conn->listening_conn) {
@@ -1028,7 +862,6 @@ static void scoutfs_net_destroy_worker(struct work_struct *work)
 	destroy_workqueue(conn->workq);
 	scoutfs_tseq_del(&ninf->conn_tseq_tree, &conn->tseq_entry);
 	kfree(conn->info);
-	kfree(conn->ordered_proc_wlists);
 	trace_scoutfs_conn_destroy_free(conn);
 	kfree(conn);

@@ -1054,7 +887,7 @@ static void destroy_conn(struct scoutfs_net_connection *conn)
 * The TCP_KEEP* and TCP_USER_TIMEOUT option interaction is subtle.
 * TCP_USER_TIMEOUT only applies if there is unacked written data in the
 * send queue.  It doesn't work if the connection is idle.  Adding
 * keepalive probes with user_timeout set changes how the keepalive
 * timeout is calculated.   CNT no longer matters.   Each time
 * additional probes (not the first) are sent the user timeout is
 * checked against the last time data was received.  If none of the
@@ -1066,16 +899,14 @@ static void destroy_conn(struct scoutfs_net_connection *conn)
 * elapses during the probe timer processing after the unsuccessful
 * probes.
 */
-static int sock_opts_and_names(struct super_block *sb,
-			       struct scoutfs_net_connection *conn,
+#define UNRESPONSIVE_TIMEOUT_SECS 10
+#define UNRESPONSIVE_PROBES 3
+static int sock_opts_and_names(struct scoutfs_net_connection *conn,
 			       struct socket *sock)
 {
-	struct scoutfs_mount_options opts;
 	int optval;
 	int ret;

-	scoutfs_options_read(sb, &opts);
-
 	/* we use a keepalive timeout instead of send timeout */
 	ret = kc_sock_set_sndtimeo(sock, 0);
 	if (ret)
@@ -1088,7 +919,8 @@ static int sock_opts_and_names(struct super_block *sb,
 	if (ret)
 		goto out;

-	optval = (opts.tcp_keepalive_timeout_ms / MSEC_PER_SEC) - UNRESPONSIVE_PROBES;
+	BUILD_BUG_ON(UNRESPONSIVE_PROBES >= UNRESPONSIVE_TIMEOUT_SECS);
+	optval = UNRESPONSIVE_TIMEOUT_SECS - (UNRESPONSIVE_PROBES);
 	ret = kc_tcp_sock_set_keepidle(sock, optval);
 	if (ret)
 		goto out;
@@ -1098,7 +930,7 @@ static int sock_opts_and_names(struct super_block *sb,
 	if (ret)
 		goto out;

-	optval = opts.tcp_keepalive_timeout_ms;
+	optval = UNRESPONSIVE_TIMEOUT_SECS * MSEC_PER_SEC;
 	ret = kc_tcp_sock_set_user_timeout(sock, optval);
 	if (ret)
 		goto out;
@@ -1160,19 +992,13 @@ static void scoutfs_net_listen_worker(struct work_struct *work)
 						  conn->notify_down,
 						  conn->info_size,
 						  conn->req_funcs, "accepted");
-		/*
-		 * scoutfs_net_alloc_conn() can fail due to ENOMEM. If this
-		 * is the only thing that does so, there's no harm in trying
-		 * to see if kernel_accept() can get enough memory to try accepting
-		 * a new connection again. If that then fails with ENOMEM, it'll
-		 * shut down the conn anyway. So just retry here.
-		 */
 		if (!acc_conn) {
 			sock_release(acc_sock);
+			ret = -ENOMEM;
 			continue;
 		}

-		ret = sock_opts_and_names(sb, acc_conn, acc_sock);
+		ret = sock_opts_and_names(acc_conn, acc_sock);
 		if (ret) {
 			sock_release(acc_sock);
 			destroy_conn(acc_conn);
@@ -1243,7 +1069,7 @@ static void scoutfs_net_connect_worker(struct work_struct *work)
 	if (ret)
 		goto out;

-	ret = sock_opts_and_names(sb, conn, sock);
+	ret = sock_opts_and_names(conn, sock);
 	if (ret)
 		goto out;

@@ -1358,7 +1184,7 @@ static void scoutfs_net_shutdown_worker(struct work_struct *work)
 							struct message_send, head))) {
 			resp_func = msend->resp_func;
 			resp_data = msend->resp_data;
-			free_msend(ninf, conn, msend);
+			free_msend(ninf, msend);
 			spin_unlock(&conn->lock);

 			call_resp_func(sb, conn, resp_func, resp_data, NULL, 0, -ECONNABORTED);
@@ -1374,7 +1200,7 @@ static void scoutfs_net_shutdown_worker(struct work_struct *work)
 	list_splice_tail_init(&conn->send_queue, &conn->resend_queue);
 	list_for_each_entry_safe(msend, tmp, &conn->resend_queue, head) {
 		if (msend->nh.cmd == SCOUTFS_NET_CMD_GREETING)
-			free_msend(ninf, conn, msend);
+			free_msend(ninf, msend);
 	}

 	clear_conn_fl(conn, saw_greeting);
@@ -1504,30 +1330,25 @@ scoutfs_net_alloc_conn(struct super_block *sb,
 {
 	struct net_info *ninf = SCOUTFS_SB(sb)->net_info;
 	struct scoutfs_net_connection *conn;
-	unsigned int nr;
-	unsigned int i;
-
-	nr = min_t(unsigned int, num_possible_cpus(),
-		   PAGE_SIZE / sizeof(struct scoutfs_work_list));

 	conn = kzalloc(sizeof(struct scoutfs_net_connection), GFP_NOFS);
-	if (conn) {
-		if (info_size)
-			conn->info = kzalloc(info_size, GFP_NOFS);
-		conn->ordered_proc_wlists = kmalloc_array(nr, sizeof(struct scoutfs_work_list),
-							  GFP_NOFS);
-		conn->workq = alloc_workqueue("scoutfs_net_%s",
-					      WQ_UNBOUND | WQ_NON_REENTRANT, 0,
-					      name_suffix);
-	}
-	if (!conn || (info_size && !conn->info) || !conn->workq || !conn->ordered_proc_wlists) {
-		if (conn) {
-			kfree(conn->info);
-			kfree(conn->ordered_proc_wlists);
-			if (conn->workq)
-				destroy_workqueue(conn->workq);
+	if (!conn)
+		return NULL;
+
+	if (info_size) {
+		conn->info = kzalloc(info_size, GFP_NOFS);
+		if (!conn->info) {
 			kfree(conn);
+			return NULL;
 		}
+	}
+
+	conn->workq = alloc_workqueue("scoutfs_net_%s",
+				      WQ_UNBOUND | WQ_NON_REENTRANT, 0,
+				      name_suffix);
+	if (!conn->workq) {
+		kfree(conn->info);
+		kfree(conn);
 		return NULL;
 	}

@@ -1548,8 +1369,6 @@ scoutfs_net_alloc_conn(struct super_block *sb,
 	atomic64_set(&conn->recv_seq, 0);
 	INIT_LIST_HEAD(&conn->send_queue);
 	INIT_LIST_HEAD(&conn->resend_queue);
-	conn->req_root = RB_ROOT;
-	conn->resp_root = RB_ROOT;
 	INIT_WORK(&conn->listen_work, scoutfs_net_listen_worker);
 	INIT_WORK(&conn->connect_work, scoutfs_net_connect_worker);
 	INIT_WORK(&conn->send_work, scoutfs_net_send_worker);
@@ -1559,13 +1378,6 @@ scoutfs_net_alloc_conn(struct super_block *sb,
 	INIT_DELAYED_WORK(&conn->reconn_free_dwork,
 			  scoutfs_net_reconn_free_worker);

-	conn->ordered_proc_nr = nr;
-	for (i = 0; i < nr; i++) {
-		INIT_WORK(&conn->ordered_proc_wlists[i].work, scoutfs_net_ordered_proc_worker);
-		spin_lock_init(&conn->ordered_proc_wlists[i].lock);
-		INIT_LIST_HEAD(&conn->ordered_proc_wlists[i].list);
-	}
-
 	scoutfs_tseq_add(&ninf->conn_tseq_tree, &conn->tseq_entry);
 	trace_scoutfs_conn_alloc(conn);

@@ -1762,7 +1574,7 @@ void scoutfs_net_client_greeting(struct super_block *sb,
 		atomic64_set(&conn->recv_seq, 0);
 		list_for_each_entry_safe(msend, tmp, &conn->resend_queue, head){
 			if (nh_is_response(&msend->nh))
-				free_msend(ninf, conn, msend);
+				free_msend(ninf, msend);
 		}
 	}

@@ -1865,8 +1677,6 @@ restart:
 		BUG_ON(!list_empty(&reconn->send_queue));
 		/* queued greeting response is racing, can be in send or resend queue */
 		list_splice_tail_init(&reconn->resend_queue, &conn->resend_queue);
-		move_sorted_msends(conn, &conn->req_root, reconn, &reconn->req_root);
-		move_sorted_msends(conn, &conn->resp_root, reconn, &reconn->resp_root);

 		/* new conn info is unused, swap, old won't call down */
 		swap(conn->info, reconn->info);
@@ -1,18 +1,10 @@
 #ifndef _SCOUTFS_NET_H_
 #define _SCOUTFS_NET_H_

-#include <linux/spinlock.h>
-#include <linux/list.h>
 #include <linux/in.h>
 #include "endian_swap.h"
 #include "tseq.h"

-struct scoutfs_work_list {
-	struct work_struct work;
-	spinlock_t lock;
-	struct list_head list;
-};
-
 struct scoutfs_net_connection;

 /* These are called in their own blocking context */
@@ -67,12 +59,8 @@ struct scoutfs_net_connection {
 	u64 next_send_id;
 	struct list_head send_queue;
 	struct list_head resend_queue;
-	struct rb_root req_root;
-	struct rb_root resp_root;

 	atomic64_t recv_seq;
-	unsigned int ordered_proc_nr;
-	struct scoutfs_work_list *ordered_proc_wlists;

 	struct workqueue_struct *workq;
 	struct work_struct listen_work;
@@ -592,7 +592,7 @@ static int handle_request(struct super_block *sb, struct omap_request *req)
 	ret = 0;
 out:
 	free_rids(&priv_rids);
-	if ((ret < 0) && (req != NULL)) {
+	if (ret < 0) {
 		ret = scoutfs_server_send_omap_response(sb, req->client_rid, req->client_id,
 							NULL, ret);
 		free_req(req);
@@ -33,15 +33,12 @@ enum {
 	Opt_acl,
 	Opt_data_prealloc_blocks,
 	Opt_data_prealloc_contig_only,
-	Opt_ino_alloc_per_lock,
-	Opt_lock_idle_count,
 	Opt_log_merge_wait_timeout_ms,
 	Opt_metadev_path,
 	Opt_noacl,
 	Opt_orphan_scan_delay_ms,
 	Opt_quorum_heartbeat_timeout_ms,
 	Opt_quorum_slot_nr,
-	Opt_tcp_keepalive_timeout_ms,
 	Opt_err,
 };

@@ -49,15 +46,12 @@ static const match_table_t tokens = {
 	{Opt_acl, "acl"},
 	{Opt_data_prealloc_blocks, "data_prealloc_blocks=%s"},
 	{Opt_data_prealloc_contig_only, "data_prealloc_contig_only=%s"},
-	{Opt_ino_alloc_per_lock, "ino_alloc_per_lock=%s"},
-	{Opt_lock_idle_count, "lock_idle_count=%s"},
 	{Opt_log_merge_wait_timeout_ms, "log_merge_wait_timeout_ms=%s"},
 	{Opt_metadev_path, "metadev_path=%s"},
 	{Opt_noacl, "noacl"},
 	{Opt_orphan_scan_delay_ms, "orphan_scan_delay_ms=%s"},
 	{Opt_quorum_heartbeat_timeout_ms, "quorum_heartbeat_timeout_ms=%s"},
 	{Opt_quorum_slot_nr, "quorum_slot_nr=%s"},
-	{Opt_tcp_keepalive_timeout_ms, "tcp_keepalive_timeout_ms=%s"},
 	{Opt_err, NULL}
 };

@@ -121,10 +115,6 @@ static void free_options(struct scoutfs_mount_options *opts)
 	kfree(opts->metadev_path);
 }

-#define MIN_LOCK_IDLE_COUNT	32
-#define DEFAULT_LOCK_IDLE_COUNT	(10 * 1000)
-#define MAX_LOCK_IDLE_COUNT	(100 * 1000)
-
 #define MIN_LOG_MERGE_WAIT_TIMEOUT_MS		100UL
 #define DEFAULT_LOG_MERGE_WAIT_TIMEOUT_MS	500
 #define MAX_LOG_MERGE_WAIT_TIMEOUT_MS		(60 * MSEC_PER_SEC)
@@ -136,36 +126,16 @@ static void free_options(struct scoutfs_mount_options *opts)
 #define MIN_DATA_PREALLOC_BLOCKS	1ULL
 #define MAX_DATA_PREALLOC_BLOCKS	((unsigned long long)SCOUTFS_BLOCK_SM_MAX)

-#define DEFAULT_TCP_KEEPALIVE_TIMEOUT_MS	(60 * MSEC_PER_SEC)
-
 static void init_default_options(struct scoutfs_mount_options *opts)
 {
 	memset(opts, 0, sizeof(*opts));

 	opts->data_prealloc_blocks = SCOUTFS_DATA_PREALLOC_DEFAULT_BLOCKS;
 	opts->data_prealloc_contig_only = 1;
-	opts->ino_alloc_per_lock = SCOUTFS_LOCK_INODE_GROUP_NR;
-	opts->lock_idle_count = DEFAULT_LOCK_IDLE_COUNT;
 	opts->log_merge_wait_timeout_ms = DEFAULT_LOG_MERGE_WAIT_TIMEOUT_MS;
 	opts->orphan_scan_delay_ms = -1;
 	opts->quorum_heartbeat_timeout_ms = SCOUTFS_QUORUM_DEF_HB_TIMEO_MS;
 	opts->quorum_slot_nr = -1;
-	opts->tcp_keepalive_timeout_ms = DEFAULT_TCP_KEEPALIVE_TIMEOUT_MS;
-}
-
-static int verify_lock_idle_count(struct super_block *sb, int ret, int val)
-{
-	if (ret < 0) {
-		scoutfs_err(sb, "failed to parse lock_idle_count value");
-		return -EINVAL;
-	}
-	if (val < MIN_LOCK_IDLE_COUNT || val > MAX_LOCK_IDLE_COUNT) {
-		scoutfs_err(sb, "invalid lock_idle_count value %d, must be between %u and %u",
-			    val, MIN_LOCK_IDLE_COUNT, MAX_LOCK_IDLE_COUNT);
-		return -EINVAL;
-	}
-
-	return 0;
 }

 static int verify_log_merge_wait_timeout_ms(struct super_block *sb, int ret, int val)
@@ -198,21 +168,6 @@ static int verify_quorum_heartbeat_timeout_ms(struct super_block *sb, int ret, u
 	return 0;
 }

-static int verify_tcp_keepalive_timeout_ms(struct super_block *sb, int ret, int val)
-{
-	if (ret < 0) {
-		scoutfs_err(sb, "failed to parse tcp_keepalive_timeout_ms value");
-		return -EINVAL;
-	}
-	if (val <= (UNRESPONSIVE_PROBES * MSEC_PER_SEC)) {
-		scoutfs_err(sb, "invalid tcp_keepalive_timeout_ms value %d, must be larger than %lu",
-			    val, (UNRESPONSIVE_PROBES * MSEC_PER_SEC));
-		return -EINVAL;
-	}
-
-	return 0;
-}
-
 /*
 * Parse the option string into our options struct.   This can allocate
 * memory in the struct.  The caller is responsible for always calling
@@ -263,34 +218,6 @@ static int parse_options(struct super_block *sb, char *options, struct scoutfs_m
 			opts->data_prealloc_contig_only = nr;
 			break;

-		case Opt_ino_alloc_per_lock:
-			ret = match_int(args, &nr);
-			if (ret < 0 || nr < 1 || nr > SCOUTFS_LOCK_INODE_GROUP_NR) {
-				scoutfs_err(sb, "invalid ino_alloc_per_lock option, must be between 1 and %u",
-					    SCOUTFS_LOCK_INODE_GROUP_NR);
-				if (ret == 0)
-					ret = -EINVAL;
-				return ret;
-			}
-			opts->ino_alloc_per_lock = nr;
-			break;
-
-		case Opt_tcp_keepalive_timeout_ms:
-			ret = match_int(args, &nr);
-			ret = verify_tcp_keepalive_timeout_ms(sb, ret, nr);
-			if (ret < 0)
-				return ret;
-			opts->tcp_keepalive_timeout_ms = nr;
-			break;
-
-		case Opt_lock_idle_count:
-			ret = match_int(args, &nr);
-			ret = verify_lock_idle_count(sb, ret, nr);
-			if (ret < 0)
-				return ret;
-			opts->lock_idle_count = nr;
-			break;
-
 		case Opt_log_merge_wait_timeout_ms:
 			ret = match_int(args, &nr);
 			ret = verify_log_merge_wait_timeout_ms(sb, ret, nr);
@@ -438,14 +365,12 @@ int scoutfs_options_show(struct seq_file *seq, struct dentry *root)
 		seq_puts(seq, ",acl");
 	seq_printf(seq, ",data_prealloc_blocks=%llu", opts.data_prealloc_blocks);
 	seq_printf(seq, ",data_prealloc_contig_only=%u", opts.data_prealloc_contig_only);
-	seq_printf(seq, ",ino_alloc_per_lock=%u", opts.ino_alloc_per_lock);
 	seq_printf(seq, ",metadev_path=%s", opts.metadev_path);
 	if (!is_acl)
 		seq_puts(seq, ",noacl");
 	seq_printf(seq, ",orphan_scan_delay_ms=%u", opts.orphan_scan_delay_ms);
 	if (opts.quorum_slot_nr >= 0)
 		seq_printf(seq, ",quorum_slot_nr=%d", opts.quorum_slot_nr);
-	seq_printf(seq, ",tcp_keepalive_timeout_ms=%d", opts.tcp_keepalive_timeout_ms);

 	return 0;
 }
@@ -527,82 +452,6 @@ static ssize_t data_prealloc_contig_only_store(struct kobject *kobj, struct kobj
 }
 SCOUTFS_ATTR_RW(data_prealloc_contig_only);

-static ssize_t ino_alloc_per_lock_show(struct kobject *kobj, struct kobj_attribute *attr,
-					 char *buf)
-{
-	struct super_block *sb = SCOUTFS_SYSFS_ATTRS_SB(kobj);
-	struct scoutfs_mount_options opts;
-
-	scoutfs_options_read(sb, &opts);
-
-	return snprintf(buf, PAGE_SIZE, "%u", opts.ino_alloc_per_lock);
-}
-static ssize_t ino_alloc_per_lock_store(struct kobject *kobj, struct kobj_attribute *attr,
-					  const char *buf, size_t count)
-{
-	struct super_block *sb = SCOUTFS_SYSFS_ATTRS_SB(kobj);
-	DECLARE_OPTIONS_INFO(sb, optinf);
-	char nullterm[20]; /* more than enough for octal -U32_MAX */
-	long val;
-	int len;
-	int ret;
-
-	len = min(count, sizeof(nullterm) - 1);
-	memcpy(nullterm, buf, len);
-	nullterm[len] = '\0';
-
-	ret = kstrtol(nullterm, 0, &val);
-	if (ret < 0 || val < 1 || val > SCOUTFS_LOCK_INODE_GROUP_NR) {
-		scoutfs_err(sb, "invalid ino_alloc_per_lock option, must be between 1 and %u",
-			    SCOUTFS_LOCK_INODE_GROUP_NR);
-		return -EINVAL;
-	}
-
-	write_seqlock(&optinf->seqlock);
-	optinf->opts.ino_alloc_per_lock = val;
-	write_sequnlock(&optinf->seqlock);
-
-	return count;
-}
-SCOUTFS_ATTR_RW(ino_alloc_per_lock);
-
-static ssize_t lock_idle_count_show(struct kobject *kobj, struct kobj_attribute *attr,
-						char *buf)
-{
-	struct super_block *sb = SCOUTFS_SYSFS_ATTRS_SB(kobj);
-	struct scoutfs_mount_options opts;
-
-	scoutfs_options_read(sb, &opts);
-
-	return snprintf(buf, PAGE_SIZE, "%u", opts.lock_idle_count);
-}
-static ssize_t lock_idle_count_store(struct kobject *kobj, struct kobj_attribute *attr,
-						 const char *buf, size_t count)
-{
-	struct super_block *sb = SCOUTFS_SYSFS_ATTRS_SB(kobj);
-	DECLARE_OPTIONS_INFO(sb, optinf);
-	char nullterm[30]; /* more than enough for octal -U64_MAX */
-	int val;
-	int len;
-	int ret;
-
-	len = min(count, sizeof(nullterm) - 1);
-	memcpy(nullterm, buf, len);
-	nullterm[len] = '\0';
-
-	ret = kstrtoint(nullterm, 0, &val);
-	ret = verify_lock_idle_count(sb, ret, val);
-	if (ret == 0) {
-		write_seqlock(&optinf->seqlock);
-		optinf->opts.lock_idle_count = val;
-		write_sequnlock(&optinf->seqlock);
-		ret = count;
-	}
-
-	return ret;
-}
-SCOUTFS_ATTR_RW(lock_idle_count);
-
 static ssize_t log_merge_wait_timeout_ms_show(struct kobject *kobj, struct kobj_attribute *attr,
 						char *buf)
 {
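The removed ino_alloc_per_lock_store() works in three steps. It copies the unterminated sysfs buffer into a small stack array, parses it with kstrtol() in base 0, and range-checks the result before taking the seqlock. Below is a minimal userspace sketch of that parse-and-validate step. It makes two assumptions. First, strtol() stands in for kstrtol(); it approximates the kernel helper but does not match its whitespace rules exactly. Second, GROUP_NR is a hypothetical bound standing in for SCOUTFS_LOCK_INODE_GROUP_NR.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical upper bound standing in for SCOUTFS_LOCK_INODE_GROUP_NR. */
#define GROUP_NR 128

/*
 * Copy at most sizeof(nullterm) - 1 bytes of the (not NUL-terminated)
 * input, terminate it, parse it with base 0 (so 0x.. hex and 0.. octal
 * prefixes are accepted), then range-check before accepting the value.
 */
static int parse_store(const char *buf, size_t count, long *out)
{
	char nullterm[20];
	size_t len = count < sizeof(nullterm) - 1 ? count : sizeof(nullterm) - 1;
	char *end;
	long val;

	memcpy(nullterm, buf, len);
	nullterm[len] = '\0';

	errno = 0;
	val = strtol(nullterm, &end, 0);
	/* reject empty input, overflow, and trailing junk other than one newline */
	if (end == nullterm || errno != 0 ||
	    (*end != '\0' && strcmp(end, "\n") != 0))
		return -EINVAL;
	if (val < 1 || val > GROUP_NR)
		return -EINVAL;

	*out = val;
	return 0;
}
```

The newline is tolerated because values written with `echo` arrive with one trailing newline.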
@@ -743,8 +592,6 @@ SCOUTFS_ATTR_RO(quorum_slot_nr);
 static struct attribute *options_attrs[] = {
 	SCOUTFS_ATTR_PTR(data_prealloc_blocks),
 	SCOUTFS_ATTR_PTR(data_prealloc_contig_only),
-	SCOUTFS_ATTR_PTR(ino_alloc_per_lock),
-	SCOUTFS_ATTR_PTR(lock_idle_count),
 	SCOUTFS_ATTR_PTR(log_merge_wait_timeout_ms),
 	SCOUTFS_ATTR_PTR(metadev_path),
 	SCOUTFS_ATTR_PTR(orphan_scan_delay_ms),
@@ -8,18 +8,13 @@
 struct scoutfs_mount_options {
 	u64 data_prealloc_blocks;
 	bool data_prealloc_contig_only;
-	unsigned int ino_alloc_per_lock;
-	int lock_idle_count;
 	unsigned int log_merge_wait_timeout_ms;
 	char *metadev_path;
 	unsigned int orphan_scan_delay_ms;
 	int quorum_slot_nr;
 	u64 quorum_heartbeat_timeout_ms;
-	int tcp_keepalive_timeout_ms;
 };

-#define UNRESPONSIVE_PROBES	3
-
 void scoutfs_options_read(struct super_block *sb, struct scoutfs_mount_options *opts);
 int scoutfs_options_show(struct seq_file *seq, struct dentry *root);

@@ -243,6 +243,10 @@ static int send_msg_members(struct super_block *sb, int type, u64 term, int only
 	};
 	struct sockaddr_in sin;
 	struct msghdr mh = {
+#ifndef KC_MSGHDR_STRUCT_IOV_ITER
+		.msg_iov = (struct iovec *)&kv,
+		.msg_iovlen = 1,
+#endif
 		.msg_flags = MSG_DONTWAIT | MSG_NOSIGNAL,
 		.msg_name = &sin,
 		.msg_namelen = sizeof(sin),
@@ -264,7 +268,9 @@ static int send_msg_members(struct super_block *sb, int type, u64 term, int only

 		scoutfs_quorum_slot_sin(&qinf->qconf, i, &sin);
 		now = ktime_get();
-
+#ifdef KC_MSGHDR_STRUCT_IOV_ITER
+		iov_iter_init(&mh.msg_iter, WRITE, (struct iovec *)&kv, sizeof(qmes), 1);
+#endif
 		ret = kernel_sendmsg(qinf->sock, &mh, &kv, 1, kv.iov_len);
 		if (ret != kv.iov_len)
 			failed++;
@@ -306,6 +312,10 @@ static int recv_msg(struct super_block *sb, struct quorum_host_msg *msg,
 		.iov_len = sizeof(struct scoutfs_quorum_message),
 	};
 	struct msghdr mh = {
+#ifndef KC_MSGHDR_STRUCT_IOV_ITER
+		.msg_iov = (struct iovec *)&kv,
+		.msg_iovlen = 1,
+#endif
 		.msg_flags = MSG_NOSIGNAL,
 	};

@@ -323,6 +333,9 @@ static int recv_msg(struct super_block *sb, struct quorum_host_msg *msg,
 		ret = kc_tcp_sock_set_rcvtimeo(qinf->sock, rel_to);
 	}

+#ifdef KC_MSGHDR_STRUCT_IOV_ITER
+	iov_iter_init(&mh.msg_iter, READ, (struct iovec *)&kv, sizeof(struct scoutfs_quorum_message), 1);
+#endif
 	ret = kernel_recvmsg(qinf->sock, &mh, &kv, 1, kv.iov_len, mh.msg_flags);
 	if (ret < 0)
 		return ret;
@@ -507,10 +520,10 @@ static int update_quorum_block(struct super_block *sb, int event, u64 term, bool
 		set_quorum_block_event(sb, &blk, event, term);
 		ret = write_quorum_block(sb, blkno, &blk);
 		if (ret < 0)
-			scoutfs_err(sb, "error %d writing quorum block %llu after updating event %d term %llu",
+			scoutfs_err(sb, "error %d reading quorum block %llu to update event %d term %llu",
 				    ret, blkno, event, term);
 	} else {
-		scoutfs_err(sb, "error %d reading quorum block %llu to update event %d term %llu",
+		scoutfs_err(sb, "error %d writing quorum block %llu after updating event %d term %llu",
 			    ret, blkno, event, term);
 	}

@@ -809,7 +822,6 @@ static void scoutfs_quorum_worker(struct work_struct *work)

 		/* followers and candidates start new election on timeout */
 		if (qst.role != LEADER &&
-		    msg.type == SCOUTFS_QUORUM_MSG_INVALID &&
 		    ktime_after(ktime_get(), qst.timeout)) {
 			/* .. but only if their server has stopped */
 			if (!scoutfs_server_is_down(sb)) {
@@ -970,10 +982,7 @@ static void scoutfs_quorum_worker(struct work_struct *work)
 	}

 	/* record that this slot no longer has an active quorum */
-	err = update_quorum_block(sb, SCOUTFS_QUORUM_EVENT_END, qst.term, true);
-	if (err < 0 && ret == 0)
-		ret = err;
-
+	update_quorum_block(sb, SCOUTFS_QUORUM_EVENT_END, qst.term, true);
 out:
 	if (ret < 0) {
 		scoutfs_err(sb, "quorum service saw error %d, shutting down.  This mount is no longer participating in quorum.  It should be remounted to restore service.",
@@ -1062,7 +1071,7 @@ static char *role_str(int role)
 		[LEADER] = "leader",
 	};

-	if (role < 0 || role >= ARRAY_SIZE(roles) || !roles[role])
+	if (role < 0 || role > ARRAY_SIZE(roles) || !roles[role])
 		return "invalid";

 	return roles[role];
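The two sides of this hunk differ only in using `>=` versus `>` for the upper bound. Valid indices run from 0 to ARRAY_SIZE(roles) - 1, so the `>` form lets role == ARRAY_SIZE(roles) read one element past the end of the array. Below is a standalone sketch of the bounds check. Only the LEADER entry appears in the hunk above; the FOLLOWER and CANDIDATE entries are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical role values; only LEADER appears in the hunk above. */
enum { FOLLOWER, CANDIDATE, LEADER };

static const char *roles[] = {
	[FOLLOWER] = "follower",
	[CANDIDATE] = "candidate",
	[LEADER] = "leader",
};

/* Valid indices are 0 .. ARRAY_SIZE(roles) - 1, so reject role >= size. */
static const char *role_str(int role)
{
	if (role < 0 || (size_t)role >= ARRAY_SIZE(roles) || !roles[role])
		return "invalid";

	return roles[role];
}
```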
@@ -1195,8 +1204,8 @@ static struct attribute *quorum_attrs[] = {

 static inline bool valid_ipv4_unicast(__be32 addr)
 {
-	return !(ipv4_is_multicast(addr) || ipv4_is_lbcast(addr) ||
-		 ipv4_is_zeronet(addr) || ipv4_is_local_multicast(addr));
+	return !(ipv4_is_multicast(addr) && ipv4_is_lbcast(addr) &&
+		 ipv4_is_zeronet(addr) && ipv4_is_local_multicast(addr));
 }

 static inline bool valid_ipv4_port(__be16 port)
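The added version joins the predicates with `&&`. That form only rejects an address that is multicast, limited broadcast, zeronet and local multicast all at once. No address meets all four, so every address passes. The removed `||` form rejects an address that matches any one of them.

Below is a userspace sketch of both forms. It assumes host-byte-order uint32_t addresses. The helpers are simplified, hypothetical stand-ins for the kernel's ipv4_is_*() functions. The zeronet stand-in uses the classic 0.0.0.0/8 definition, which may not match every kernel version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-byte-order stand-ins for the kernel's ipv4_is_*() helpers. */
static bool is_multicast(uint32_t a)       { return (a & 0xf0000000u) == 0xe0000000u; } /* 224.0.0.0/4 */
static bool is_lbcast(uint32_t a)          { return a == 0xffffffffu; }                  /* 255.255.255.255 */
static bool is_zeronet(uint32_t a)         { return (a & 0xff000000u) == 0x00000000u; } /* 0.0.0.0/8 */
static bool is_local_multicast(uint32_t a) { return (a & 0xffffff00u) == 0xe0000000u; } /* 224.0.0.0/24 */

/* OR form: reject the address if ANY disqualifying predicate matches. */
static bool valid_unicast_or(uint32_t a)
{
	return !(is_multicast(a) || is_lbcast(a) ||
		 is_zeronet(a) || is_local_multicast(a));
}

/* AND form: only rejects an address matching ALL predicates, which none can. */
static bool valid_unicast_and(uint32_t a)
{
	return !(is_multicast(a) && is_lbcast(a) &&
		 is_zeronet(a) && is_local_multicast(a));
}
```

Multicast requires a top nibble of 0xe while zeronet requires a top byte of 0, so the AND chain can never be true.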
@@ -34,7 +34,6 @@
 #include "totl.h"
 #include "util.h"
 #include "quota.h"
-#include "trans.h"
 #include "counters.h"
 #include "scoutfs_trace.h"

@@ -1087,10 +1086,6 @@ int scoutfs_quota_mod_rule(struct super_block *sb, bool is_add,
 	if (ret < 0)
 		goto out;

-	ret = scoutfs_hold_trans(sb, true);
-	if (ret < 0)
-		goto out;
-
 	down_write(&qtinf->rwsem);

 	if (is_add) {
@@ -1100,31 +1095,28 @@ int scoutfs_quota_mod_rule(struct super_block *sb, bool is_add,
 		else if (ret == 0)
 			ret = -EEXIST;
 		if (ret < 0)
-			goto release;
+			goto unlock;

 		rule_to_rule_val(&rv, &rule);
 		ret = scoutfs_item_create(sb, &key, &rv, sizeof(rv), lock);
 		if (ret < 0)
-			goto release;
+			goto unlock;

 	} else {
 		ret = find_rule(sb, &rule, &key, lock) ?:
 		      scoutfs_item_delete(sb, &key, lock);
 		if (ret < 0)
-			goto release;
+			goto unlock;
 	}

-	wait_event(qtinf->waitq, !ruleset_is_busy(qtinf));
 	scoutfs_quota_invalidate(sb);
 	ret = 0;

-release:
+unlock:
 	up_write(&qtinf->rwsem);
-	scoutfs_release_trans(sb);
-
-out:
 	scoutfs_unlock(sb, lock, SCOUTFS_LOCK_WRITE);

+out:
 	if (is_add)
 		trace_scoutfs_quota_add_rule(sb, &rule, ret);
 	else
@@ -1143,17 +1135,12 @@ void scoutfs_quota_get_lock_range(struct scoutfs_key *start, struct scoutfs_key
 }

 /*
- * Mark the cached ruleset invalid and free the previous one once readers
- * drain.  Called from cluster lock invalidation and from quota rule
- * modification.
- *
- * Cluster lock invalidation runs only after the lock layer has drained
- * local READ users.  Since EBUSY is set only while a reader holds READ,
- * the reader has already published by the time we run.
- *
- * Quota rule modification waits on the waitq for any in-flight reader
- * to publish before calling here, so the next check rebuilds against
- * the newly written rules rather than the reader's stale result.
+ * This is called during cluster lock invalidation to indicate that the
+ * ruleset is no longer protected by cluster locking and might have been
+ * modified.  We mark the ruleset invalid and free it once all readers
+ * drain.  The next check will acquire the cluster lock and read the
+ * rules.  Because this is called during invalidation this is serialized
+ * with write holders of cluster locks so we can never see -EBUSY here.
 */
 void scoutfs_quota_invalidate(struct super_block *sb)
 {
@@ -1167,10 +1154,13 @@ void scoutfs_quota_invalidate(struct super_block *sb)

 	spin_lock(&qtinf->lock);
 	rs = rcu_dereference_protected(qtinf->ruleset, lockdep_is_held(&qtinf->lock));
-	if (rs == ERR_PTR(-ENOENT) || !IS_ERR(rs))
+	if (rs != ERR_PTR(-EINVAL))
 		rcu_assign_pointer(qtinf->ruleset, ERR_PTR(-EINVAL));
 	spin_unlock(&qtinf->lock);

+	/* cluster locking should have prevented this */
+	BUG_ON(rs == ERR_PTR(-EBUSY));
+
 	if (!IS_ERR(rs))
 		call_rcu(&rs->rcu, free_ruleset_rcu);

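The two sides of scoutfs_quota_invalidate() differ in which ruleset states they overwrite with ERR_PTR(-EINVAL). Per the removed comment, -EBUSY is set only while an in-flight reader holds READ. The newer condition leaves -EBUSY in place. The older condition overwrites it and asserts with BUG_ON() that cluster locking prevents that case.

Below is a userspace sketch that compares the two conditions. It assumes local re-implementations of ERR_PTR() and IS_ERR() that follow the kernel's MAX_ERRNO encoding.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace re-implementations of the kernel's ERR_PTR()/IS_ERR() encoding. */
#define MAX_ERRNO 4095
static void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static bool IS_ERR(const void *ptr) { return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO; }

/* Newer (removed) condition: replace a live ruleset or -ENOENT only. */
static bool newer_overwrites(void *rs)
{
	return rs == ERR_PTR(-ENOENT) || !IS_ERR(rs);
}

/* Older (added) condition: replace anything but -EINVAL, including -EBUSY. */
static bool older_overwrites(void *rs)
{
	return rs != ERR_PTR(-EINVAL);
}
```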
@@ -134,7 +134,7 @@ static int recov_finished(struct recov_info *recinf)

 static void timer_callback(struct timer_list *timer)
 {
-	struct recov_info *recinf = timer_container_of(recinf, timer, timer);
+	struct recov_info *recinf = from_timer(recinf, timer, timer);

 	recinf->timeout_fn(recinf->sb);
 }
@@ -789,80 +789,6 @@ TRACE_EVENT(scoutfs_inode_walk_writeback,
 		  __entry->ino, __entry->write, __entry->ret)
 );

-TRACE_EVENT(scoutfs_orphan_scan_start,
-	TP_PROTO(struct super_block *sb),
-
-	TP_ARGS(sb),
-
-	TP_STRUCT__entry(
-		SCSB_TRACE_FIELDS
-	),
-
-	TP_fast_assign(
-		SCSB_TRACE_ASSIGN(sb);
-	),
-
-	TP_printk(SCSBF, SCSB_TRACE_ARGS)
-);
-
-TRACE_EVENT(scoutfs_orphan_scan_stop,
-	TP_PROTO(struct super_block *sb, bool work_todo),
-
-	TP_ARGS(sb, work_todo),
-
-	TP_STRUCT__entry(
-		SCSB_TRACE_FIELDS
-		__field(bool, work_todo)
-	),
-
-	TP_fast_assign(
-		SCSB_TRACE_ASSIGN(sb);
-		__entry->work_todo = work_todo;
-	),
-
-	TP_printk(SCSBF" work_todo %d", SCSB_TRACE_ARGS, __entry->work_todo)
-);
-
-TRACE_EVENT(scoutfs_orphan_scan_work,
-	TP_PROTO(struct super_block *sb, __u64 ino),
-
-	TP_ARGS(sb, ino),
-
-	TP_STRUCT__entry(
-		SCSB_TRACE_FIELDS
-		__field(__u64, ino)
-	),
-
-	TP_fast_assign(
-		SCSB_TRACE_ASSIGN(sb);
-		__entry->ino = ino;
-	),
-
-	TP_printk(SCSBF" ino %llu", SCSB_TRACE_ARGS,
-		  __entry->ino)
-);
-
-TRACE_EVENT(scoutfs_orphan_scan_end,
-	TP_PROTO(struct super_block *sb, __u64 ino, int ret),
-
-	TP_ARGS(sb, ino, ret),
-
-	TP_STRUCT__entry(
-		SCSB_TRACE_FIELDS
-		__field(__u64, ino)
-		__field(int, ret)
-	),
-
-	TP_fast_assign(
-		SCSB_TRACE_ASSIGN(sb);
-		__entry->ino = ino;
-		__entry->ret = ret;
-	),
-
-	TP_printk(SCSBF" ino %llu ret %d", SCSB_TRACE_ARGS,
-		  __entry->ino, __entry->ret)
-);
-
 DECLARE_EVENT_CLASS(scoutfs_lock_info_class,
 	TP_PROTO(struct super_block *sb, struct lock_info *linfo),

@@ -897,14 +823,13 @@ DEFINE_EVENT(scoutfs_lock_info_class, scoutfs_lock_destroy,
 );

 TRACE_EVENT(scoutfs_xattr_set,
-	TP_PROTO(struct super_block *sb, __u64 ino, size_t name_len,
-		 const void *value, size_t size, int flags),
+	TP_PROTO(struct super_block *sb, size_t name_len, const void *value,
+		 size_t size, int flags),

-	TP_ARGS(sb, ino, name_len, value, size, flags),
+	TP_ARGS(sb, name_len, value, size, flags),

 	TP_STRUCT__entry(
 		SCSB_TRACE_FIELDS
-		__field(__u64, ino)
 		__field(size_t, name_len)
 		__field(const void *, value)
 		__field(size_t, size)
@@ -913,16 +838,15 @@ TRACE_EVENT(scoutfs_xattr_set,

 	TP_fast_assign(
 		SCSB_TRACE_ASSIGN(sb);
-		__entry->ino = ino;
 		__entry->name_len = name_len;
 		__entry->value = value;
 		__entry->size = size;
 		__entry->flags = flags;
 	),

-	TP_printk(SCSBF" ino %llu name_len %zu value %p size %zu flags 0x%x",
-		  SCSB_TRACE_ARGS, __entry->ino,  __entry->name_len,
-		  __entry->value, __entry->size, __entry->flags)
+	TP_printk(SCSBF" name_len %zu value %p size %zu flags 0x%x",
+		  SCSB_TRACE_ARGS, __entry->name_len, __entry->value,
+		  __entry->size, __entry->flags)
 );

 TRACE_EVENT(scoutfs_advance_dirty_super,
@@ -1110,82 +1034,6 @@ TRACE_EVENT(scoutfs_orphan_inode,
 		  MINOR(__entry->dev), __entry->ino)
 );

-DECLARE_EVENT_CLASS(scoutfs_try_delete_class,
-        TP_PROTO(struct super_block *sb, u64 ino),
-        TP_ARGS(sb, ino),
-        TP_STRUCT__entry(
-		SCSB_TRACE_FIELDS
-		__field(__u64, ino)
-        ),
-        TP_fast_assign(
-		SCSB_TRACE_ASSIGN(sb);
-		__entry->ino = ino;
-        ),
-	TP_printk(SCSBF" ino %llu", SCSB_TRACE_ARGS, __entry->ino)
-);
-
-DEFINE_EVENT(scoutfs_try_delete_class, scoutfs_try_delete,
-        TP_PROTO(struct super_block *sb, u64 ino),
-        TP_ARGS(sb, ino)
-);
-
-DEFINE_EVENT(scoutfs_try_delete_class, scoutfs_try_delete_local_busy,
-        TP_PROTO(struct super_block *sb, u64 ino),
-        TP_ARGS(sb, ino)
-);
-
-DEFINE_EVENT(scoutfs_try_delete_class, scoutfs_try_delete_cached,
-        TP_PROTO(struct super_block *sb, u64 ino),
-        TP_ARGS(sb, ino)
-);
-
-DEFINE_EVENT(scoutfs_try_delete_class, scoutfs_try_delete_no_item,
-        TP_PROTO(struct super_block *sb, u64 ino),
-        TP_ARGS(sb, ino)
-);
-
-TRACE_EVENT(scoutfs_try_delete_has_links,
-	TP_PROTO(struct super_block *sb, u64 ino, unsigned int nlink),
-
-	TP_ARGS(sb, ino, nlink),
-
-	TP_STRUCT__entry(
-		SCSB_TRACE_FIELDS
-		__field(__u64, ino)
-		__field(unsigned int, nlink)
-	),
-
-	TP_fast_assign(
-		SCSB_TRACE_ASSIGN(sb);
-		__entry->ino = ino;
-		__entry->nlink = nlink;
-	),
-
-	TP_printk(SCSBF" ino %llu nlink %u", SCSB_TRACE_ARGS, __entry->ino,
-		  __entry->nlink)
-);
-
-TRACE_EVENT(scoutfs_inode_orphan_delete,
-	TP_PROTO(struct super_block *sb, u64 ino, int ret),
-
-	TP_ARGS(sb, ino, ret),
-
-	TP_STRUCT__entry(
-		SCSB_TRACE_FIELDS
-		__field(__u64, ino)
-		__field(int, ret)
-	),
-
-	TP_fast_assign(
-		SCSB_TRACE_ASSIGN(sb);
-		__entry->ino = ino;
-		__entry->ret = ret;
-	),
-
-	TP_printk(SCSBF" ino %llu ret %d", SCSB_TRACE_ARGS, __entry->ino,
-		__entry->ret)
-);
-
 TRACE_EVENT(scoutfs_delete_inode,
 	TP_PROTO(struct super_block *sb, u64 ino, umode_t mode, u64 size),

@@ -1210,32 +1058,6 @@ TRACE_EVENT(scoutfs_delete_inode,
 		  __entry->mode, __entry->size)
 );

-TRACE_EVENT(scoutfs_delete_inode_end,
-	TP_PROTO(struct super_block *sb, u64 ino, umode_t mode, u64 size, int ret),
-
-	TP_ARGS(sb, ino, mode, size, ret),
-
-	TP_STRUCT__entry(
-		__field(dev_t, dev)
-		__field(__u64, ino)
-		__field(umode_t, mode)
-		__field(__u64, size)
-		__field(int, ret)
-	),
-
-	TP_fast_assign(
-		__entry->dev = sb->s_dev;
-		__entry->ino = ino;
-		__entry->mode = mode;
-		__entry->size = size;
-		__entry->ret = ret;
-	),
-
-	TP_printk("dev %d,%d ino %llu, mode 0x%x size %llu, ret %d",
-		  MAJOR(__entry->dev), MINOR(__entry->dev), __entry->ino,
-		  __entry->mode, __entry->size, __entry->ret)
-);
-
 DECLARE_EVENT_CLASS(scoutfs_key_class,
        TP_PROTO(struct super_block *sb, struct scoutfs_key *key),
        TP_ARGS(sb, key),
@@ -1619,6 +1441,28 @@ DEFINE_EVENT(scoutfs_work_class, scoutfs_data_return_server_extents_exit,
        TP_ARGS(sb, data, ret)
 );

+DECLARE_EVENT_CLASS(scoutfs_shrink_exit_class,
+        TP_PROTO(struct super_block *sb, unsigned long nr_to_scan, int ret),
+        TP_ARGS(sb, nr_to_scan, ret),
+        TP_STRUCT__entry(
+		__field(void *, sb)
+		__field(unsigned long, nr_to_scan)
+		__field(int, ret)
+        ),
+        TP_fast_assign(
+		__entry->sb = sb;
+		__entry->nr_to_scan = nr_to_scan;
+		__entry->ret = ret;
+        ),
+        TP_printk("sb %p nr_to_scan %lu ret %d",
+		  __entry->sb, __entry->nr_to_scan, __entry->ret)
+);
+
+DEFINE_EVENT(scoutfs_shrink_exit_class, scoutfs_lock_shrink_exit,
+        TP_PROTO(struct super_block *sb, unsigned long nr_to_scan, int ret),
+        TP_ARGS(sb, nr_to_scan, ret)
+);
+
 TRACE_EVENT(scoutfs_rename,
 	TP_PROTO(struct super_block *sb, struct inode *old_dir,
 		 struct dentry *old_dentry, struct inode *new_dir,
@@ -2619,27 +2463,6 @@ TRACE_EVENT(scoutfs_block_dirty_ref,
 		  __entry->block_blkno, __entry->block_seq)
 );

-TRACE_EVENT(scoutfs_get_file_block,
-	TP_PROTO(struct super_block *sb, u64 blkno, int flags),
-
-	TP_ARGS(sb, blkno, flags),
-
-	TP_STRUCT__entry(
-		SCSB_TRACE_FIELDS
-		__field(__u64, blkno)
-		__field(int, flags)
-	),
-
-	TP_fast_assign(
-		SCSB_TRACE_ASSIGN(sb);
-		__entry->blkno = blkno;
-		__entry->flags = flags;
-	),
-
-	TP_printk(SCSBF" blkno %llu flags 0x%x",
-		  SCSB_TRACE_ARGS, __entry->blkno, __entry->flags)
-);
-
 TRACE_EVENT(scoutfs_block_stale,
 	TP_PROTO(struct super_block *sb, struct scoutfs_block_ref *ref,
 		 struct scoutfs_block_header *hdr, u32 magic, u32 crc),
@@ -2680,8 +2503,8 @@ TRACE_EVENT(scoutfs_block_stale,

 DECLARE_EVENT_CLASS(scoutfs_block_class,
 	TP_PROTO(struct super_block *sb, void *bp, u64 blkno, int refcount, int io_count,
-		 unsigned long bits),
-	TP_ARGS(sb, bp, blkno, refcount, io_count, bits),
+		 unsigned long bits, __u64 accessed),
+	TP_ARGS(sb, bp, blkno, refcount, io_count, bits, accessed),
 	TP_STRUCT__entry(
 		SCSB_TRACE_FIELDS
 		__field(void *, bp)
@@ -2689,6 +2512,7 @@ DECLARE_EVENT_CLASS(scoutfs_block_class,
 		__field(int, refcount)
 		__field(int, io_count)
 		__field(long, bits)
+		__field(__u64, accessed)
 	),
 	TP_fast_assign(
 		SCSB_TRACE_ASSIGN(sb);
@@ -2697,65 +2521,71 @@ DECLARE_EVENT_CLASS(scoutfs_block_class,
 		__entry->refcount = refcount;
 		__entry->io_count = io_count;
 		__entry->bits = bits;
+		__entry->accessed = accessed;
 	),
-	TP_printk(SCSBF" bp %p blkno %llu refcount %x io_count %d bits 0x%lx",
+	TP_printk(SCSBF" bp %p blkno %llu refcount %d io_count %d bits 0x%lx accessed %llu",
 		  SCSB_TRACE_ARGS, __entry->bp, __entry->blkno, __entry->refcount,
-		  __entry->io_count, __entry->bits)
+		  __entry->io_count, __entry->bits, __entry->accessed)
 );
 DEFINE_EVENT(scoutfs_block_class, scoutfs_block_allocate,
 	TP_PROTO(struct super_block *sb, void *bp, u64 blkno,
-		 int refcount, int io_count, unsigned long bits),
-	TP_ARGS(sb, bp, blkno, refcount, io_count, bits)
+		 int refcount, int io_count, unsigned long bits,
+		 __u64 accessed),
+	TP_ARGS(sb, bp, blkno, refcount, io_count, bits, accessed)
 );
 DEFINE_EVENT(scoutfs_block_class, scoutfs_block_free,
 	TP_PROTO(struct super_block *sb, void *bp, u64 blkno,
-		 int refcount, int io_count, unsigned long bits),
-	TP_ARGS(sb, bp, blkno, refcount, io_count, bits)
+		 int refcount, int io_count, unsigned long bits,
+		 __u64 accessed),
+	TP_ARGS(sb, bp, blkno, refcount, io_count, bits, accessed)
 );
 DEFINE_EVENT(scoutfs_block_class, scoutfs_block_insert,
 	TP_PROTO(struct super_block *sb, void *bp, u64 blkno,
-		 int refcount, int io_count, unsigned long bits),
-	TP_ARGS(sb, bp, blkno, refcount, io_count, bits)
+		 int refcount, int io_count, unsigned long bits,
+		 __u64 accessed),
+	TP_ARGS(sb, bp, blkno, refcount, io_count, bits, accessed)
 );
 DEFINE_EVENT(scoutfs_block_class, scoutfs_block_remove,
 	TP_PROTO(struct super_block *sb, void *bp, u64 blkno,
-		 int refcount, int io_count, unsigned long bits),
-	TP_ARGS(sb, bp, blkno, refcount, io_count, bits)
+		 int refcount, int io_count, unsigned long bits,
+		 __u64 accessed),
+	TP_ARGS(sb, bp, blkno, refcount, io_count, bits, accessed)
 );
 DEFINE_EVENT(scoutfs_block_class, scoutfs_block_end_io,
 	TP_PROTO(struct super_block *sb, void *bp, u64 blkno,
-		 int refcount, int io_count, unsigned long bits),
-	TP_ARGS(sb, bp, blkno, refcount, io_count, bits)
+		 int refcount, int io_count, unsigned long bits,
+		 __u64 accessed),
+	TP_ARGS(sb, bp, blkno, refcount, io_count, bits, accessed)
 );
 DEFINE_EVENT(scoutfs_block_class, scoutfs_block_submit,
 	TP_PROTO(struct super_block *sb, void *bp, u64 blkno,
-		 int refcount, int io_count, unsigned long bits),
-	TP_ARGS(sb, bp, blkno, refcount, io_count, bits)
+		 int refcount, int io_count, unsigned long bits,
+		 __u64 accessed),
+	TP_ARGS(sb, bp, blkno, refcount, io_count, bits, accessed)
 );
 DEFINE_EVENT(scoutfs_block_class, scoutfs_block_invalidate,
 	TP_PROTO(struct super_block *sb, void *bp, u64 blkno,
-		 int refcount, int io_count, unsigned long bits),
-	TP_ARGS(sb, bp, blkno, refcount, io_count, bits)
+		 int refcount, int io_count, unsigned long bits,
+		 __u64 accessed),
+	TP_ARGS(sb, bp, blkno, refcount, io_count, bits, accessed)
 );
 DEFINE_EVENT(scoutfs_block_class, scoutfs_block_mark_dirty,
 	TP_PROTO(struct super_block *sb, void *bp, u64 blkno,
-		 int refcount, int io_count, unsigned long bits),
-	TP_ARGS(sb, bp, blkno, refcount, io_count, bits)
+		 int refcount, int io_count, unsigned long bits,
+		 __u64 accessed),
+	TP_ARGS(sb, bp, blkno, refcount, io_count, bits, accessed)
 );
 DEFINE_EVENT(scoutfs_block_class, scoutfs_block_forget,
 	TP_PROTO(struct super_block *sb, void *bp, u64 blkno,
-		 int refcount, int io_count, unsigned long bits),
-	TP_ARGS(sb, bp, blkno, refcount, io_count, bits)
+		 int refcount, int io_count, unsigned long bits,
+		 __u64 accessed),
+	TP_ARGS(sb, bp, blkno, refcount, io_count, bits, accessed)
 );
 DEFINE_EVENT(scoutfs_block_class, scoutfs_block_shrink,
 	TP_PROTO(struct super_block *sb, void *bp, u64 blkno,
-		 int refcount, int io_count, unsigned long bits),
-	TP_ARGS(sb, bp, blkno, refcount, io_count, bits)
-);
-DEFINE_EVENT(scoutfs_block_class, scoutfs_block_isolate,
-	TP_PROTO(struct super_block *sb, void *bp, u64 blkno,
-		 int refcount, int io_count, unsigned long bits),
-	TP_ARGS(sb, bp, blkno, refcount, io_count, bits)
+		 int refcount, int io_count, unsigned long bits,
+		 __u64 accessed),
+	TP_ARGS(sb, bp, blkno, refcount, io_count, bits, accessed)
 );

 DECLARE_EVENT_CLASS(scoutfs_ext_next_class,
@@ -3230,45 +3060,6 @@ DEFINE_EVENT(scoutfs_srch_compact_class, scoutfs_srch_compact_client_recv,
 	TP_ARGS(sb, sc)
 );

-TRACE_EVENT(scoutfs_ioc_search_xattrs,
-	TP_PROTO(struct super_block *sb, u64 ino, u64 last_ino),
-
-	TP_ARGS(sb, ino, last_ino),
-
-	TP_STRUCT__entry(
-		SCSB_TRACE_FIELDS
-		__field(u64, ino)
-		__field(u64, last_ino)
-	),
-
-	TP_fast_assign(
-		SCSB_TRACE_ASSIGN(sb);
-		__entry->ino = ino;
-		__entry->last_ino = last_ino;
-	),
-
-	TP_printk(SCSBF" ino %llu last_ino %llu", SCSB_TRACE_ARGS,
-		  __entry->ino, __entry->last_ino)
-);
-
-TRACE_EVENT(scoutfs_trigger_fired,
-	TP_PROTO(struct super_block *sb, const char *name),
-
-	TP_ARGS(sb, name),
-
-	TP_STRUCT__entry(
-		SCSB_TRACE_FIELDS
-		__field(const char *, name)
-	),
-
-	TP_fast_assign(
-		SCSB_TRACE_ASSIGN(sb);
-		__entry->name = name;
-	),
-
-	TP_printk(SCSBF" %s", SCSB_TRACE_ARGS, __entry->name)
-);
-
 #endif /* _TRACE_SCOUTFS_H */

 /* This part must be outside protection */
@@ -41,7 +41,6 @@
 #include "recov.h"
 #include "omap.h"
 #include "fence.h"
-#include "triggers.h"

 /*
 * Every active mount can act as the server that listens on a net
@@ -256,14 +255,6 @@ static void server_down(struct server_info *server)
 		cmpxchg(&server->status, was, SERVER_DOWN);
 }

-static void init_mounted_client_key(struct scoutfs_key *key, u64 rid)
-{
-	*key = (struct scoutfs_key) {
-		.sk_zone = SCOUTFS_MOUNTED_CLIENT_ZONE,
-		.skmc_rid = cpu_to_le64(rid),
-	};
-}
-
 /*
 * The per-holder allocation block use budget balances batching
 * efficiency and concurrency.  The larger this gets, the fewer
@@ -619,7 +610,7 @@ static void scoutfs_server_commit_func(struct work_struct *work)
 		goto out;

 	if (scoutfs_forcing_unmount(sb)) {
-		ret = -ENOLINK;
+		ret = -EIO;
 		goto out;
 	}

@@ -971,28 +962,6 @@ static int find_log_trees_item(struct super_block *sb,
 	return ret;
 }

-/*
- * Return true if the given rid has a mounted_clients entry.
- */
-static bool rid_is_mounted(struct super_block *sb, u64 rid)
-{
-	DECLARE_SERVER_INFO(sb, server);
-	struct scoutfs_super_block *super = DIRTY_SUPER_SB(sb);
-	SCOUTFS_BTREE_ITEM_REF(iref);
-	struct scoutfs_key key;
-	int ret;
-
-	init_mounted_client_key(&key, rid);
-
-	mutex_lock(&server->mounted_clients_mutex);
-	ret = scoutfs_btree_lookup(sb, &super->mounted_clients, &key, &iref);
-	if (ret == 0)
-		scoutfs_btree_put_iref(&iref);
-	mutex_unlock(&server->mounted_clients_mutex);
-
-	return ret == 0;
-}
-
 /*
 * Find the log_trees item with the greatest nr for each rid.  Fills the
 * caller's log_trees and sets the key before the returned log_trees for
@@ -1025,11 +994,10 @@ static int for_each_rid_last_lt(struct super_block *sb, struct scoutfs_btree_roo
 }

 /*
- * Log merge range items are stored at the starting fs key of the range
- * with the zone overwritten to indicate the log merge item type.  This
- * day0 mistake loses sorting information for items in the different
- * zones in the fs root, so the range items aren't strictly sorted by
- * the starting key of their range.
+ * Log merge range items are stored at the starting fs key of the range.
+ * The only fs key field that doesn't hold information is the zone, so
+ * we use the zone to differentiate all types that we store in the log
+ * merge tree.
 */
 static void init_log_merge_key(struct scoutfs_key *key, u8 zone, u64 first,
 			       u64 second)
@@ -1061,50 +1029,6 @@ static int next_log_merge_item_key(struct super_block *sb, struct scoutfs_btree_
 	return ret;
 }

-/*
- * The range items aren't sorted by their range.start because
- * _RANGE_ZONE clobbers the range's zone.  We sweep all the items and
- * find the range with the next least starting key that's greater than
- * the caller's starting key.  We have to be careful to iterate over the
- * log_merge tree keys because the ranges can overlap as they're mapped
- * to the log_merge keys by clobbering their zone.
- */
-static int next_log_merge_range(struct super_block *sb, struct scoutfs_btree_root *root,
-				struct scoutfs_key *start, struct scoutfs_log_merge_range *rng)
-{
-	struct scoutfs_log_merge_range *next;
-	SCOUTFS_BTREE_ITEM_REF(iref);
-	struct scoutfs_key key;
-	int ret;
-
-	init_log_merge_key(&key, SCOUTFS_LOG_MERGE_RANGE_ZONE, 0, 0);
-	scoutfs_key_set_ones(&rng->start);
-
-	do {
-		ret = scoutfs_btree_next(sb, root, &key, &iref);
-		if (ret == 0) {
-			if (iref.key->sk_zone != SCOUTFS_LOG_MERGE_RANGE_ZONE) {
-				ret = -ENOENT;
-			} else if (iref.val_len != sizeof(struct scoutfs_log_merge_range)) {
-				ret = -EIO;
-			} else {
-				next = iref.val;
-				if (scoutfs_key_compare(&next->start, &rng->start) < 0 &&
-				    scoutfs_key_compare(&next->start, start) >= 0)
-					*rng = *next;
-				key = *iref.key;
-				scoutfs_key_inc(&key);
-			}
-			scoutfs_btree_put_iref(&iref);
-		}
-	} while (ret == 0);
-
-	if (ret == -ENOENT && !scoutfs_key_is_ones(&rng->start))
-		ret = 0;
-
-	return ret;
-}
-
 static int next_log_merge_item(struct super_block *sb,
 			       struct scoutfs_btree_root *root,
 			       u8 zone, u64 first, u64 second,
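The removed next_log_merge_range() has to sweep every range item. Overwriting the key's zone loses start-key ordering, so the items cannot be searched in order.

Below is a minimal sketch of the same selection rule over an in-memory array rather than a btree. It assumes keys reduced to a single hypothetical u64, with UINT64_MAX standing in for the all-ones key sentinel. As in the original, a range starting exactly at the sentinel is never selected.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a log merge range with keys reduced to a u64. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Scan every range (they are stored out of start order) and keep the one
 * with the smallest start that is >= the caller's start.  UINT64_MAX plays
 * the role of the all-ones key used as the not-found sentinel.
 */
static int next_range(const struct range *items, size_t nr, uint64_t start,
		      struct range *rng)
{
	size_t i;

	rng->start = UINT64_MAX;
	rng->end = UINT64_MAX;

	for (i = 0; i < nr; i++) {
		if (items[i].start < rng->start && items[i].start >= start)
			*rng = items[i];
	}

	return rng->start == UINT64_MAX ? -ENOENT : 0;
}
```

Every call is linear in the number of range items. That cost comes from the key layout described in the removed comment, not from the selection rule itself.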
@@ -1116,101 +1040,6 @@ static int next_log_merge_item(struct super_block *sb,
 	return next_log_merge_item_key(sb, root, zone, &key, val, val_len);
 }

-static int do_finalize_ours(struct super_block *sb,
-			    struct scoutfs_log_trees *lt,
-			    struct commit_hold *hold)
-{
-	struct server_info *server = SCOUTFS_SB(sb)->server_info;
-	struct scoutfs_super_block *super = DIRTY_SUPER_SB(sb);
-	struct scoutfs_key key;
-	char *err_str = NULL;
-	u64 rid = le64_to_cpu(lt->rid);
-	bool more;
-	int ret;
-	int err;
-
-	mutex_lock(&server->srch_mutex);
-	ret = scoutfs_srch_rotate_log(sb, &server->alloc, &server->wri,
-				      &super->srch_root, &lt->srch_file, true);
-	mutex_unlock(&server->srch_mutex);
-	if (ret < 0) {
-		scoutfs_err(sb, "error rotating srch log for rid %016llx: %d",
-			    rid, ret);
-		return ret;
-        }
-
-	do {
-		more = false;
-
-		/*
-		 * All of these can return errors, perhaps indicating successful
-		 * partial progress, after having modified the allocator trees.
-		 * We always have to update the roots in the log item.
-		 */
-		mutex_lock(&server->alloc_mutex);
-		ret = (err_str = "splice meta_freed to other_freed",
-				scoutfs_alloc_splice_list(sb, &server->alloc,
-					&server->wri, server->other_freed,
-					&lt->meta_freed)) ?:
-			(err_str = "splice meta_avail",
-			 scoutfs_alloc_splice_list(sb, &server->alloc,
-					&server->wri, server->other_freed,
-					&lt->meta_avail)) ?:
-			(err_str = "empty data_avail",
-			 alloc_move_empty(sb, &super->data_alloc,
-					  &lt->data_avail,
-					  COMMIT_HOLD_ALLOC_BUDGET / 2)) ?:
-			(err_str = "empty data_freed",
-			 alloc_move_empty(sb, &super->data_alloc,
-					  &lt->data_freed,
-					  COMMIT_HOLD_ALLOC_BUDGET / 2));
-		mutex_unlock(&server->alloc_mutex);
-
-		/*
-		 * only finalize, allowing merging, once the allocators are
-		 * fully freed
-		 */
-		if (ret == 0) {
-			/* the transaction is no longer open */
-			le64_add_cpu(&lt->flags, SCOUTFS_LOG_TREES_FINALIZED);
-			lt->finalize_seq = cpu_to_le64(scoutfs_server_next_seq(sb));
-		}
-
-		scoutfs_key_init_log_trees(&key, rid, le64_to_cpu(lt->nr));
-
-		err = scoutfs_btree_update(sb, &server->alloc, &server->wri,
-					   &super->logs_root, &key, lt,
-					   sizeof(*lt));
-		BUG_ON(err != 0); /* alloc, log, srch items out of sync */
-
-		if (ret == -EINPROGRESS) {
-			more = true;
-			mutex_unlock(&server->logs_mutex);
-			ret = server_apply_commit(sb, hold, 0);
-			if (ret < 0)
-				WARN_ON_ONCE(ret < 0);
-			server_hold_commit(sb, hold);
-			mutex_lock(&server->logs_mutex);
-		} else if (ret == 0) {
-			memset(&lt->item_root, 0, sizeof(lt->item_root));
-			memset(&lt->bloom_ref, 0, sizeof(lt->bloom_ref));
-			lt->inode_count_delta = 0;
-			lt->max_item_seq = 0;
-			lt->finalize_seq = 0;
-			le64_add_cpu(&lt->nr, 1);
-			lt->flags = 0;
-		}
-	} while (more);
-
-	if (ret < 0) {
-		scoutfs_err(sb,
-			    "error %d finalizing log trees for rid %016llx: %s",
-			    ret, rid, err_str);
-	}
-
-	return ret;
-}
-
 /*
 * Finalizing the log btrees for merging needs to be done carefully so
 * that items don't appear to go backwards in time.
@@ -1250,60 +1079,6 @@ static int do_finalize_ours(struct super_block *sb,
 * happens to arrive at just the right time.  That's fine, merging will
 * ignore and tear down the empty input.
 */
-
-static int reclaim_open_log_tree(struct super_block *sb, u64 rid);
-
-/*
- * Reclaim log trees for rids that have no mounted_clients entry.
- * They block merges by appearing active.  reclaim_open_log_tree
- * may need multiple commits to drain allocators (-EINPROGRESS).
- *
- * The caller holds logs_mutex and a commit, both are dropped and
- * re-acquired around each reclaim call.  Returns >0 if any orphans
- * were reclaimed so the caller can re-check state that may have
- * changed while the lock was dropped.
- */
-static int reclaim_orphan_log_trees(struct super_block *sb, u64 rid,
-				    struct commit_hold *hold)
-{
-	struct server_info *server = SCOUTFS_SB(sb)->server_info;
-	struct scoutfs_super_block *super = DIRTY_SUPER_SB(sb);
-	struct scoutfs_log_trees lt;
-	struct scoutfs_key key;
-	bool found = false;
-	u64 orphan_rid;
-	int ret;
-	int err;
-
-	scoutfs_key_init_log_trees(&key, U64_MAX, U64_MAX);
-	while ((ret = for_each_rid_last_lt(sb, &super->logs_root, &key, &lt)) > 0) {
-
-		if ((le64_to_cpu(lt.flags) & SCOUTFS_LOG_TREES_FINALIZED) ||
-		    le64_to_cpu(lt.rid) == rid ||
-		    rid_is_mounted(sb, le64_to_cpu(lt.rid)))
-			continue;
-
-		orphan_rid = le64_to_cpu(lt.rid);
-		scoutfs_err(sb, "reclaiming orphan log trees for rid %016llx nr %llu",
-			    orphan_rid, le64_to_cpu(lt.nr));
-		found = true;
-
-		do {
-			mutex_unlock(&server->logs_mutex);
-			err = reclaim_open_log_tree(sb, orphan_rid);
-			ret = server_apply_commit(sb, hold,
-						  err == -EINPROGRESS ? 0 : err);
-			server_hold_commit(sb, hold);
-			mutex_lock(&server->logs_mutex);
-		} while (err == -EINPROGRESS && ret == 0);
-
-		if (ret < 0)
-			break;
-	}
-
-	return ret < 0 ? ret : found;
-}
-
 #define FINALIZE_POLL_MIN_DELAY_MS	5U
 #define FINALIZE_POLL_MAX_DELAY_MS	100U
 #define FINALIZE_POLL_DELAY_GROWTH_PCT	150U
@@ -1316,6 +1091,7 @@ static int finalize_and_start_log_merge(struct super_block *sb, struct scoutfs_l
 	struct scoutfs_log_merge_range rng;
 	struct scoutfs_mount_options opts;
 	struct scoutfs_log_trees each_lt;
+	struct scoutfs_log_trees fin;
 	unsigned int delay_ms;
 	unsigned long timeo;
 	bool saw_finalized;
@@ -1344,16 +1120,6 @@ static int finalize_and_start_log_merge(struct super_block *sb, struct scoutfs_l
 			break;
 		}

-		ret = reclaim_orphan_log_trees(sb, rid, hold);
-		if (ret < 0) {
-			err_str = "reclaiming orphan log trees";
-			break;
-		}
-		if (ret > 0) {
-			/* lock was dropped, re-check merge status */
-			continue;
-		}
-
 		/* look for finalized and other active log btrees */
 		saw_finalized = false;
 		others_active = false;
@@ -1385,13 +1151,9 @@ static int finalize_and_start_log_merge(struct super_block *sb, struct scoutfs_l
 		 * meta was low so that deleted items are merged
 		 * promptly and freed blocks can bring the client out of
 		 * enospc.
-		 *
-		 * The trigger can be used to force a log merge in cases where
-		 * a test only generates small amounts of change.
 		 */
 		finalize_ours = (lt->item_root.height > 2) ||
-				(le32_to_cpu(lt->meta_avail.flags) & SCOUTFS_ALLOC_FLAG_LOW) ||
-				scoutfs_trigger(sb, LOG_MERGE_FORCE_FINALIZE_OURS);
+				(le32_to_cpu(lt->meta_avail.flags) & SCOUTFS_ALLOC_FLAG_LOW);

 		trace_scoutfs_server_finalize_decision(sb, rid, saw_finalized, others_active,
 						       ours_visible, finalize_ours, delay_ms,
@@ -1400,7 +1162,6 @@ static int finalize_and_start_log_merge(struct super_block *sb, struct scoutfs_l
 		/* done if we're not finalizing and there's no finalized */
 		if (!finalize_ours && !saw_finalized) {
 			ret = 0;
-			scoutfs_inc_counter(sb, log_merge_no_finalized);
 			break;
 		}

@@ -1435,11 +1196,32 @@ static int finalize_and_start_log_merge(struct super_block *sb, struct scoutfs_l

 		/* Finalize ours if it's visible to others */
 		if (ours_visible) {
-			ret = do_finalize_ours(sb, lt, hold);
+			fin = *lt;
+			memset(&fin.meta_avail, 0, sizeof(fin.meta_avail));
+			memset(&fin.meta_freed, 0, sizeof(fin.meta_freed));
+			memset(&fin.data_avail, 0, sizeof(fin.data_avail));
+			memset(&fin.data_freed, 0, sizeof(fin.data_freed));
+			memset(&fin.srch_file, 0, sizeof(fin.srch_file));
+			le64_add_cpu(&fin.flags, SCOUTFS_LOG_TREES_FINALIZED);
+			fin.finalize_seq = cpu_to_le64(scoutfs_server_next_seq(sb));
+
+			scoutfs_key_init_log_trees(&key, le64_to_cpu(fin.rid),
+						   le64_to_cpu(fin.nr));
+			ret = scoutfs_btree_update(sb, &server->alloc, &server->wri,
+						   &super->logs_root, &key, &fin,
+						   sizeof(fin));
 			if (ret < 0) {
-				err_str = "finalizing ours";
+				err_str = "updating finalized log_trees";
 				break;
 			}
+
+			memset(&lt->item_root, 0, sizeof(lt->item_root));
+			memset(&lt->bloom_ref, 0, sizeof(lt->bloom_ref));
+			lt->inode_count_delta = 0;
+			lt->max_item_seq = 0;
+			lt->finalize_seq = 0;
+			le64_add_cpu(&lt->nr, 1);
+			lt->flags = 0;
 		}

 		/* wait a bit for mounts to arrive */
@@ -1500,8 +1282,6 @@ static int finalize_and_start_log_merge(struct super_block *sb, struct scoutfs_l
 			BUG_ON(err); /* inconsistent */
 		}

-		scoutfs_inc_counter(sb, log_merge_start);
-
 		/* we're done, caller can make forward progress */
 		break;
 	}
@@ -1718,8 +1498,7 @@ static int server_get_log_trees(struct super_block *sb,
 		goto update;
 	}

-	ret = alloc_move_empty(sb, &super->data_alloc, &lt.data_freed,
-			       COMMIT_HOLD_ALLOC_BUDGET / 2);
+	ret = alloc_move_empty(sb, &super->data_alloc, &lt.data_freed, 100);
 	if (ret == -EINPROGRESS)
 		ret = 0;
 	if (ret < 0) {
@@ -1829,7 +1608,6 @@ static int server_commit_log_trees(struct super_block *sb,
 	int ret;

 	if (arg_len != sizeof(struct scoutfs_log_trees)) {
-		err_str = "invalid message log_trees size";
 		ret = -EINVAL;
 		goto out;
 	}
@@ -1893,7 +1671,7 @@ static int server_commit_log_trees(struct super_block *sb,

 	ret = scoutfs_btree_update(sb, &server->alloc, &server->wri,
 				   &super->logs_root, &key, &lt, sizeof(lt));
-	BUG_ON(ret < 0); /* dirtying should have guaranteed success, srch item inconsistent */
+	BUG_ON(ret < 0); /* dirtying should have guaranteed success */
 	if (ret < 0)
 		err_str = "updating log trees item";

@@ -1901,10 +1679,11 @@ unlock:
 	mutex_unlock(&server->logs_mutex);

 	ret = server_apply_commit(sb, &hold, ret);
-out:
 	if (ret < 0)
-		scoutfs_err(sb, "server error %d committing client logs for rid %016llx, nr %llu: %s",
-			    ret, rid, le64_to_cpu(lt.nr), err_str);
+		scoutfs_err(sb, "server error %d committing client logs for rid %016llx: %s",
+			    ret, rid, err_str);
+out:
+	WARN_ON_ONCE(ret < 0);
 	return scoutfs_net_response(sb, conn, cmd, id, ret, NULL, 0);
 }

@@ -2014,15 +1793,13 @@ static int reclaim_open_log_tree(struct super_block *sb, u64 rid)
 	       scoutfs_alloc_splice_list(sb, &server->alloc, &server->wri, server->other_freed,
 					 &lt.meta_avail)) ?:
 	      (err_str = "empty data_avail",
-	       alloc_move_empty(sb, &super->data_alloc, &lt.data_avail,
-				COMMIT_HOLD_ALLOC_BUDGET / 2)) ?:
+	       alloc_move_empty(sb, &super->data_alloc, &lt.data_avail, 100)) ?:
 	      (err_str = "empty data_freed",
-	       alloc_move_empty(sb, &super->data_alloc, &lt.data_freed,
-				COMMIT_HOLD_ALLOC_BUDGET / 2));
+	       alloc_move_empty(sb, &super->data_alloc, &lt.data_freed, 100));
 	mutex_unlock(&server->alloc_mutex);

 	/* only finalize, allowing merging, once the allocators are fully freed */
-	if (ret == 0 && !scoutfs_trigger(sb, RECLAIM_SKIP_FINALIZE)) {
+	if (ret == 0) {
 		/* the transaction is no longer open */
 		lt.commit_trans_seq = lt.get_trans_seq;

@@ -2039,9 +1816,6 @@ static int reclaim_open_log_tree(struct super_block *sb, u64 rid)
 out:
 	mutex_unlock(&server->logs_mutex);

-	if (ret == 0)
-		scoutfs_inc_counter(sb, reclaimed_open_logs);
-
 	if (ret < 0 && ret != -EINPROGRESS)
 		scoutfs_err(sb, "server error %d reclaiming log trees for rid %016llx: %s",
 			    ret, rid, err_str);
@@ -2074,8 +1848,7 @@ static int get_stable_trans_seq(struct super_block *sb, u64 *last_seq_ret)
 	scoutfs_key_init_log_trees(&key, U64_MAX, U64_MAX);
 	while ((ret = for_each_rid_last_lt(sb, &super->logs_root, &key, &lt)) > 0) {
 		if ((le64_to_cpu(lt.get_trans_seq) > le64_to_cpu(lt.commit_trans_seq)) &&
-		     le64_to_cpu(lt.get_trans_seq) <= last_seq &&
-		     rid_is_mounted(sb, le64_to_cpu(lt.rid))) {
+		     le64_to_cpu(lt.get_trans_seq) <= last_seq) {
 			last_seq = le64_to_cpu(lt.get_trans_seq) - 1;
 		}
 	}
@@ -2244,7 +2017,7 @@ static int server_srch_get_compact(struct super_block *sb,

 apply:
 	ret = server_apply_commit(sb, &hold, ret);
-	WARN_ON_ONCE(ret < 0 && ret != -ENOENT && ret != -ENOLINK); /* XXX leaked busy item */
+	WARN_ON_ONCE(ret < 0 && ret != -ENOENT); /* XXX leaked busy item */
 out:
 	ret = scoutfs_net_response(sb, conn, cmd, id, ret,
 				   sc, sizeof(struct scoutfs_srch_compact));
@@ -2284,7 +2057,7 @@ static int server_srch_commit_compact(struct super_block *sb,
 					  &super->srch_root, rid, sc,
 					  &av, &fr);
 	mutex_unlock(&server->srch_mutex);
-	if (ret < 0)
+	if (ret < 0) /* XXX very bad, leaks allocators */
 		goto apply;

 	/* reclaim allocators if they were set by _srch_commit_ */
@@ -2294,10 +2067,10 @@ static int server_srch_commit_compact(struct super_block *sb,
 	      scoutfs_alloc_splice_list(sb, &server->alloc, &server->wri,
 					server->other_freed, &fr);
 	mutex_unlock(&server->alloc_mutex);
-	WARN_ON(ret < 0); /* XXX leaks allocators */
 apply:
 	ret = server_apply_commit(sb, &hold, ret);
 out:
+	WARN_ON(ret < 0); /* XXX leaks allocators */
 	return scoutfs_net_response(sb, conn, cmd, id, ret, NULL, 0);
 }

@@ -2610,8 +2383,6 @@ static int splice_log_merge_completions(struct super_block *sb,
 		queue_work(server->wq, &server->log_merge_free_work);
 	else
 		err_str = "deleting merge status item";
-
-	scoutfs_inc_counter(sb, log_merge_complete);
 out:
 	if (upd_stat) {
 		init_log_merge_key(&key, SCOUTFS_LOG_MERGE_STATUS_ZONE, 0, 0);
@@ -2624,9 +2395,10 @@ out:
 		}
 	}

-	/* inconsistent */
-	scoutfs_bug_on_err(sb, ret,
-			   "server error %d splicing log merge completion: %s", ret, err_str);
+	if (ret < 0)
+		scoutfs_err(sb, "server error %d splicing log merge completion: %s", ret, err_str);
+
+	BUG_ON(ret); /* inconsistent */

 	return ret ?: einprogress;
 }
@@ -2761,7 +2533,7 @@ static void server_log_merge_free_work(struct work_struct *work)

 		ret = scoutfs_btree_free_blocks(sb, &server->alloc,
 						&server->wri, &fr.key,
-						&fr.root, COMMIT_HOLD_ALLOC_BUDGET / 8);
+						&fr.root, COMMIT_HOLD_ALLOC_BUDGET / 2);
 		if (ret < 0) {
 			err_str = "freeing log btree";
 			break;
@@ -2780,7 +2552,7 @@ static void server_log_merge_free_work(struct work_struct *work)
 		/* freed blocks are in allocator, we *have* to update fr */
 		BUG_ON(ret < 0);

-		if (server_hold_alloc_used_since(sb, &hold) >= (COMMIT_HOLD_ALLOC_BUDGET * 3) / 4) {
+		if (server_hold_alloc_used_since(sb, &hold) >= COMMIT_HOLD_ALLOC_BUDGET / 2) {
 			mutex_unlock(&server->logs_mutex);
 			ret = server_apply_commit(sb, &hold, ret);
 			commit = false;
@@ -2871,7 +2643,10 @@ restart:

 	/* find the next range, always checking for splicing */
 	for (;;) {
-		ret = next_log_merge_range(sb, &super->log_merge, &stat.next_range_key, &rng);
+		key = stat.next_range_key;
+		key.sk_zone = SCOUTFS_LOG_MERGE_RANGE_ZONE;
+		ret = next_log_merge_item_key(sb, &super->log_merge, SCOUTFS_LOG_MERGE_RANGE_ZONE,
+					      &key, &rng, sizeof(rng));
 		if (ret < 0 && ret != -ENOENT) {
 			err_str = "finding merge range item";
 			goto out;
@@ -3142,13 +2917,7 @@ static int server_commit_log_merge(struct super_block *sb,
 				  SCOUTFS_LOG_MERGE_STATUS_ZONE, 0, 0,
 				  &stat, sizeof(stat));
 	if (ret < 0) {
-		/*
-		 * During a retransmission, it's possible that the server
-		 * already committed and resolved this log merge. ENOENT
-		 * is expected in that case.
-		 */
-		if (ret != -ENOENT)
-			err_str = "getting merge status item";
+		err_str = "getting merge status item";
 		goto out;
 	}

@@ -3627,6 +3396,14 @@ out:
 	return scoutfs_net_response(sb, conn, cmd, id, ret, &nst, sizeof(nst));
 }

+static void init_mounted_client_key(struct scoutfs_key *key, u64 rid)
+{
+	*key = (struct scoutfs_key) {
+		.sk_zone = SCOUTFS_MOUNTED_CLIENT_ZONE,
+		.skmc_rid = cpu_to_le64(rid),
+	};
+}
+
 static bool invalid_mounted_client_item(struct scoutfs_btree_item_ref *iref)
 {
 	return (iref->val_len != sizeof(struct scoutfs_mounted_client_btree_val));
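For reference, the removed `next_log_merge_range()` does not simply return the first range item at or after the caller's key. Its comment says that mapping a range onto a log_merge key clobbers the key's zone, so ranges can overlap. As we read it, this means item order in the log_merge tree need not follow the ranges' real start keys. The helper therefore walks every range item and keeps the one with the smallest start at or after the caller's start. Below is a minimal user-space sketch of that selection rule. It assumes a hypothetical `struct range` whose start is a plain `uint64_t`; the kernel compares `struct scoutfs_key` values with `scoutfs_key_compare()`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct scoutfs_log_merge_range. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Visit every range in storage order and keep the one with the
 * smallest start that is >= the caller's start.  Storage order may not
 * match start order, so we can't stop at the first candidate.
 */
static int next_range(const struct range *ranges, size_t nr,
		      uint64_t start, struct range *rng)
{
	size_t i;

	rng->start = UINT64_MAX;	/* plays the role of scoutfs_key_set_ones() */

	for (i = 0; i < nr; i++) {
		if (ranges[i].start < rng->start && ranges[i].start >= start)
			*rng = ranges[i];
	}

	return rng->start == UINT64_MAX ? -ENOENT : 0;
}
```

The trade-off in this sketch is a full pass over the range items per lookup in exchange for not relying on their order.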
@@ -1,45 +0,0 @@
-#!/bin/bash
-
-#
-# Unfortunately, kernels can ship which contain sparse errors that are
-# unrelated to us.
-#
-# The exit status of this filtering wrapper will indicate an error if
-# sparse wasn't found or if there were any unfiltered output lines.  It
-# can hide error exit status from sparse or grep if they don't produce
-# output that makes it past the filters.
-#
-
-# must have sparse.  Fail with error message, mask success path.
-which sparse > /dev/null || exit 1
-
-# initial unmatchable, additional added as RE+="|..."
-RE="$^"
-
-#
-# Darn.  sparse has multi-line error messages, and I'd rather not bother
-# with multi-line filters.  So we'll just drop this context.
-#
-# command-line: note: in included file (through include/linux/netlink.h, include/linux/ethtool.h, include/linux/netdevice.h, include/net/sock.h, /root/scoutfs/kmod/src/kernelcompat.h, builtin): 
-#         fprintf(stderr, "%s: note: in included file%s:\n",
-#
-RE+="|: note: in included file"
-
-# 3.10.0-1160.119.1.el7.x86_64.debug
-# include/linux/posix_acl.h:138:9: warning: incorrect type in assignment (different address spaces)
-# include/linux/posix_acl.h:138:9:    expected struct posix_acl *<noident>
-# include/linux/posix_acl.h:138:9:    got struct posix_acl [noderef] <asn:4>*<noident>
-RE+="|include/linux/posix_acl.h:"
-
-# 3.10.0-1160.119.1.el7.x86_64.debug
-#include/uapi/linux/perf_event.h:146:56: warning: cast truncates bits from constant value (8000000000000000 becomes 0)
-RE+="|include/uapi/linux/perf_event.h:"
-
-# 4.18.0-513.24.1.el8_9.x86_64+debug'
-#./include/linux/skbuff.h:824:1: warning: directive in macro's argument list
-RE+="|include/linux/skbuff.h:"
-
-sparse "$@" |& \
-	grep -E -v "($RE)" |& \
-	awk '{ print $0 } END { exit NR > 0 }'
-exit $?
@@ -62,7 +62,7 @@
 * re-allocated and re-written.  Search can restart by checking the
 * btree for the current set of files.  Compaction reads log files which
 * are protected from other compactions by the persistent busy items
- * created by the server.  Compaction won't see its blocks reused out
+ * created by the server.  Compaction won't see it's blocks reused out
 * from under it, but it can encounter stale cached blocks that need to
 * be invalidated.
 */
@@ -442,10 +442,6 @@ out:
 	if (ret == 0 && (flags & GFB_INSERT) && blk >= le64_to_cpu(sfl->blocks))
 		sfl->blocks = cpu_to_le64(blk + 1);

-	if (bl) {
-		trace_scoutfs_get_file_block(sb, bl->blkno, flags);
-	}
-
 	*bl_ret = bl;
 	return ret;
 }
@@ -537,35 +533,23 @@ out:
 * the pairs cancel each other out by all readers (the second encoding
 * looks like deletion) so they aren't visible to the first/last bounds of
 * the block or file.
- *
- * We use the same entry repeatedly, so the diff between them will be empty.
- * This lets us just emit the two-byte count word, leaving the other bytes
- * as zero.
- *
- * Split the desired total len into two pieces, adding any remainder to the
- * first four-bit value.
 */
-static void append_padded_entry(struct scoutfs_srch_file *sfl,
-				struct scoutfs_srch_block *srb,
-				int len)
+static int append_padded_entry(struct scoutfs_srch_file *sfl, u64 blk,
+			       struct scoutfs_srch_block *srb, struct scoutfs_srch_entry *sre)
 {
-	int each;
-	int rem;
-	u16 lengths = 0;
-	u8 *buf = srb->entries + le32_to_cpu(srb->entry_bytes);
+	int ret;

-	each = (len - 2) >> 1;
-	rem = (len - 2) & 1;
+	ret = encode_entry(srb->entries + le32_to_cpu(srb->entry_bytes),
+			   sre, &srb->tail);
+	if (ret > 0) {
+		srb->tail = *sre;
+		le32_add_cpu(&srb->entry_nr, 1);
+		le32_add_cpu(&srb->entry_bytes, ret);
+		le64_add_cpu(&sfl->entries, 1);
+		ret = 0;
+	}

-	lengths |= each + rem;
-	lengths |= each << 4;
-
-	memset(buf, 0, len);
-	put_unaligned_le16(lengths, buf);
-
-	le32_add_cpu(&srb->entry_nr, 1);
-	le32_add_cpu(&srb->entry_bytes, len);
-	le64_add_cpu(&sfl->entries, 1);
+	return ret;
 }

 /*
@@ -576,41 +560,61 @@ static void append_padded_entry(struct scoutfs_srch_file *sfl,
 * This is called when there is a single existing entry in the block.
 * We have the entire block to work with.  We encode pairs of matching
 * entries.  This hides them from readers (both searches and merging) as
- * they're interpreted as creation and deletion and are deleted.
+ * they're interpreted as creation and deletion and are deleted.  We use
+ * the existing hash value of the first entry in the block but then set
+ * the inode to an impossibly large number so it doesn't interfere with
+ * anything.
 *
- * For simplicity and to maintain sort ordering within the block, we reuse
- * the existing entry. This lets us skip the encoding step, because we know
- * the diff will be zero. We can zero-pad the resulting entries to hit the
- * target offset exactly.
+ * To hit the specific offset we very carefully manage the amount of
+ * bytes of change between fields in the entry.  We know that if we
+ * change all the bytes of the ino and id we end up with a 20 byte
+ * (2+8+8,2) encoding of the pair of entries.  To have the last entry
+ * start at the _SAFE_POS offset we know that the final 20 byte pair
+ * encoding needs to end at 2 bytes (second entry encoding) after the
+ * _SAFE_POS offset.
 *
- * Because we can't predict the exact number of entry_bytes when we start,
- * we adjust the byte count of subsequent entries until we wind up at a
- * multiple of 20 bytes away from our goal and then use that length for
- * the remaining entries.
- *
- * We could just use a single pair of unnaturally large entries to consume
- * the needed space, adjusting for an odd number of entry_bytes if necessary.
- * The use of 19 or 20 bytes for the entry pair matches what we would see with
- * real (non-zero) entries that vary from the existing entry.
+ * So as we encode pairs we watch the delta of our current offset from
+ * that desired final offset of 2 past _SAFE_POS.  If we're a multiple
+ * of 20 away then we encode the full 20 byte pairs.  If we're not, then
+ * we drop a byte to encode 19 bytes.  That'll slowly change the offset
+ * to be a multiple of 20 again while encoding large entries.
 */
-static void pad_entries_at_safe(struct scoutfs_srch_file *sfl,
+static void pad_entries_at_safe(struct scoutfs_srch_file *sfl, u64 blk,
 				struct scoutfs_srch_block *srb)
 {
+	struct scoutfs_srch_entry sre;
 	u32 target;
 	s32 diff;
+	u64 hash;
+	u64 ino;
+	u64 id;
+	int ret;
+
+	hash = le64_to_cpu(srb->tail.hash);
+	ino = le64_to_cpu(srb->tail.ino) | (1ULL << 62);
+	id = le64_to_cpu(srb->tail.id);

 	target = SCOUTFS_SRCH_BLOCK_SAFE_BYTES + 2;

 	while ((diff = target - le32_to_cpu(srb->entry_bytes)) > 0) {
-		append_padded_entry(sfl, srb, 10);
+		ino ^= 1ULL << (7 * 8);
 		if (diff % 20 == 0) {
-			append_padded_entry(sfl, srb, 10);
+			id ^= 1ULL << (7 * 8);
 		} else {
-			append_padded_entry(sfl, srb, 9);
+			id ^= 1ULL << (6 * 8);
 		}
-	}

-	WARN_ON_ONCE(diff != 0);
+		sre.hash = cpu_to_le64(hash);
+		sre.ino = cpu_to_le64(ino);
+		sre.id = cpu_to_le64(id);
+
+		ret = append_padded_entry(sfl, blk, srb, &sre);
+		if (ret == 0)
+			ret = append_padded_entry(sfl, blk, srb, &sre);
+		BUG_ON(ret != 0);
+
+		diff = target - le32_to_cpu(srb->entry_bytes);
+	}
 }

 /*
@@ -745,14 +749,14 @@ static int search_log_file(struct super_block *sb,
 		for (i = 0; i < le32_to_cpu(srb->entry_nr); i++) {
 			if (pos > SCOUTFS_SRCH_BLOCK_SAFE_BYTES) {
 				/* can only be inconsistency :/ */
-				ret = -EIO;
+				ret = EIO;
 				break;
 			}

 			ret = decode_entry(srb->entries + pos, &sre, &prev);
 			if (ret <= 0) {
 				/* can only be inconsistency :/ */
-				ret = -EIO;
+				ret = EIO;
 				break;
 			}
 			pos += ret;
@@ -855,15 +859,15 @@ static int search_sorted_file(struct super_block *sb,

 		if (pos > SCOUTFS_SRCH_BLOCK_SAFE_BYTES) {
 			/* can only be inconsistency :/ */
-			ret = -EIO;
-			goto out;
+			ret = EIO;
+			break;
 		}

 		ret = decode_entry(srb->entries + pos, &sre, &prev);
 		if (ret <= 0) {
 			/* can only be inconsistency :/ */
-			ret = -EIO;
-			goto out;
+			ret = EIO;
+			break;
 		}
 		pos += ret;
 		prev = sre;
@@ -968,8 +972,6 @@ int scoutfs_srch_search_xattrs(struct super_block *sb,

 	scoutfs_inc_counter(sb, srch_search_xattrs);

-	trace_scoutfs_ioc_search_xattrs(sb, ino, last_ino);
-
 	*done = false;
 	srch_init_rb_root(sroot);

@@ -1406,7 +1408,7 @@ int scoutfs_srch_commit_compact(struct super_block *sb,
 			ret = -EIO;
 		scoutfs_btree_put_iref(&iref);
 	}
-	if (ret < 0)
+	if (ret < 0) /* XXX leaks allocators */
 		goto out;

 	/* restore busy to pending if the operation failed */
@@ -1426,8 +1428,10 @@ int scoutfs_srch_commit_compact(struct super_block *sb,
 	/* update file references if we finished compaction (!deleting) */
 	if (!(res->flags & SCOUTFS_SRCH_COMPACT_FLAG_DELETE)) {
 		ret = commit_files(sb, alloc, wri, root, res);
-		if (ret < 0)
+		if (ret < 0) {
+			/* XXX we can't commit, shutdown? */
 			goto out;
+		}

 		/* transition flags for deleting input files */
 		for (i = 0; i < res->nr; i++) {
@@ -1454,7 +1458,7 @@ update:
 			      le64_to_cpu(pending->id), 0);
 		ret = scoutfs_btree_insert(sb, alloc, wri, root, &key,
 					   pending, sizeof(*pending));
-		if (WARN_ON_ONCE(ret < 0)) /* XXX inconsistency */
+		if (ret < 0)
 			goto out;
 	}

@@ -1467,6 +1471,7 @@ update:
 		BUG_ON(err); /* both busy and pending present */
 	}
 out:
+	WARN_ON_ONCE(ret < 0); /* XXX inconsistency */
 	kfree(busy);
 	return ret;
 }
@@ -1664,7 +1669,7 @@ static int kway_merge(struct super_block *sb,
 			/* end sorted block on _SAFE offset for testing */
 			if (bl && le32_to_cpu(srb->entry_nr) == 1 && logs_input &&
 			    scoutfs_trigger(sb, SRCH_COMPACT_LOGS_PAD_SAFE)) {
-				pad_entries_at_safe(sfl, srb);
+				pad_entries_at_safe(sfl, blk, srb);
 				scoutfs_block_put(sb, bl);
 				bl = NULL;
 				blk++;
@@ -1797,7 +1802,7 @@ static void swap_page_sre(void *A, void *B, int size)
 * typically, ~10x worst case).
 *
 * Because we read and sort all the input files we must perform the full
- * compaction in one operation.  The server must have given us
+ * compaction in one operation.  The server must have given us a
 * sufficiently large avail/freed lists, otherwise we'll return ENOSPC.
 */
 static int compact_logs(struct super_block *sb,
@@ -1861,14 +1866,14 @@ static int compact_logs(struct super_block *sb,

 		if (pos > SCOUTFS_SRCH_BLOCK_SAFE_BYTES) {
 			/* can only be inconsistency :/ */
-			ret = -EIO;
-			goto out;
+			ret = EIO;
+			break;
 		}

 		ret = decode_entry(srb->entries + pos, sre, &prev);
 		if (ret <= 0) {
 			/* can only be inconsistency :/ */
-			ret = -EIO;
+			ret = EIO;
 			goto out;
 		}
 		prev = *sre;
@@ -2276,11 +2281,12 @@ static void scoutfs_srch_compact_worker(struct work_struct *work)
 	} else {
 		ret = -EINVAL;
 	}
+	if (ret < 0)
+		goto commit;

-	scoutfs_alloc_prepare_commit(sb, &alloc, &wri);
-	if (ret == 0)
+	ret = scoutfs_alloc_prepare_commit(sb, &alloc, &wri) ?:
 	      scoutfs_block_writer_write(sb, &wri);
-
+commit:
 	/* the server won't use our partial compact if _ERROR is set */
 	sc->meta_avail = alloc.avail;
 	sc->meta_freed = alloc.freed;
@@ -2297,7 +2303,7 @@ out:
 		scoutfs_inc_counter(sb, srch_compact_error);

 	scoutfs_block_writer_forget_all(sb, &wri);
-	queue_compact_work(srinf, sc != NULL && sc->nr > 0 && ret == 0);
+	queue_compact_work(srinf, sc->nr > 0 && ret == 0);

 	kfree(sc);
 }
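The comments on `pad_entries_at_safe()` describe reaching `SCOUTFS_SRCH_BLOCK_SAFE_BYTES + 2` exactly by appending pairs of entries that encode to 20 or 19 bytes. The code uses a 19-byte pair whenever the remaining distance is not a multiple of 20, and each such pair raises the distance modulo 20 by one. Once the distance is a multiple of 20, it switches to 20-byte pairs. This sketch models only that offset arithmetic. `pad_to_target()` is a hypothetical helper, and the targets in the usage are arbitrary values rather than the real constant.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Advance 'bytes' toward 'target' in pair-sized steps, mirroring the
 * loop in pad_entries_at_safe(): a 20 byte pair when the remaining
 * distance is a multiple of 20, otherwise a 19 byte pair.  Returns the
 * final offset and reports how many pairs were appended.
 */
static uint32_t pad_to_target(uint32_t bytes, uint32_t target, unsigned *pairs)
{
	int32_t diff;

	*pairs = 0;
	while ((diff = (int32_t)target - (int32_t)bytes) > 0) {
		bytes += (diff % 20 == 0) ? 20 : 19;
		(*pairs)++;
	}

	return bytes;
}
```

Consider a distance of 20q + r with 0 < r < 20. It takes 20 - r pairs of 19 bytes to reach 20(q + r - 19), so any starting distance of at least 361 bytes lands exactly on the target. A shorter distance can overshoot, and that is the case the removed `WARN_ON_ONCE(diff != 0)` would report.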
@@ -512,9 +512,9 @@ static int scoutfs_fill_super(struct super_block *sb, void *data, int silent)

 	sbi = kzalloc(sizeof(struct scoutfs_sb_info), GFP_KERNEL);
 	sb->s_fs_info = sbi;
+	sbi->sb = sb;
 	if (!sbi)
 		return -ENOMEM;
-	sbi->sb = sb;

 	ret = assign_random_id(sbi);
 	if (ret < 0)
@@ -30,11 +30,6 @@ void scoutfs_totl_merge_init(struct scoutfs_totl_merging *merg)
 	memset(merg, 0, sizeof(struct scoutfs_totl_merging));
 }

-/*
- * bin the incoming merge inputs so that we can resolve delta items
- * properly. Finalized logs that are merge inputs are kept separately
- * from those that are not.
- */
 void scoutfs_totl_merge_contribute(struct scoutfs_totl_merging *merg,
 				   u64 seq, u8 flags, void *val, int val_len, int fic)
 {
@@ -44,10 +39,10 @@ void scoutfs_totl_merge_contribute(struct scoutfs_totl_merging *merg,
 		merg->fs_seq = seq;
 		merg->fs_total = le64_to_cpu(tval->total);
 		merg->fs_count = le64_to_cpu(tval->count);
-	} else if (fic & FIC_MERGE_INPUT) {
-		merg->inp_seq = seq;
-		merg->inp_total += le64_to_cpu(tval->total);
-		merg->inp_count += le64_to_cpu(tval->count);
+	} else if (fic & FIC_FINALIZED) {
+		merg->fin_seq = seq;
+		merg->fin_total += le64_to_cpu(tval->total);
+		merg->fin_count += le64_to_cpu(tval->count);
 	} else {
 		merg->log_seq = seq;
 		merg->log_total += le64_to_cpu(tval->total);
@@ -58,18 +53,15 @@ void scoutfs_totl_merge_contribute(struct scoutfs_totl_merging *merg,
 /*
 * .totl. item merging has to be careful because the log btree merging
 * code can write partial results to the fs_root.  This means that a
- * reader can see both cases where merge input deltas should be applied
- * to the old fs items and where they have already been applied to the
- * partially merged fs items.
- *
- * Only finalized log trees that are inputs to the current merge cycle
- * are tracked in the inp_ bucket.  Finalized trees that aren't merge
- * inputs and active log trees are always applied unconditionally since
- * they cannot be in fs_root.
+ * reader can see both cases where new finalized logs should be applied
+ * to the old fs items and where old finalized logs have already been
+ * applied to the partially merged fs items.  Currently active logged
+ * items are always applied on top of all cases.
 *
 * These cases are differentiated with a combination of sequence numbers
- * in items and the count of contributing xattrs.  This lets us
- * recognize all cases, including when merge inputs were merged and
+ * in items, the count of contributing xattrs, and a flag
+ * differentiating finalized and active logged items.  This lets us
+ * recognize all cases, including when finalized logs were merged and
 * deleted the fs item.
 */
 void scoutfs_totl_merge_resolve(struct scoutfs_totl_merging *merg, __u64 *total, __u64 *count)
@@ -83,14 +75,14 @@ void scoutfs_totl_merge_resolve(struct scoutfs_totl_merging *merg, __u64 *total,
 		*count = merg->fs_count;
 	}

-	/* apply merge input deltas if they're newer or creating */
-	if (((merg->fs_seq != 0) && (merg->inp_seq > merg->fs_seq)) ||
-	    ((merg->fs_seq == 0) && (merg->inp_count > 0))) {
-		*total += merg->inp_total;
-		*count += merg->inp_count;
+	/* apply finalized logs if they're newer or creating */
+	if (((merg->fs_seq != 0) && (merg->fin_seq > merg->fs_seq)) ||
+	    ((merg->fs_seq == 0) && (merg->fin_count > 0))) {
+		*total += merg->fin_total;
+		*count += merg->fin_count;
 	}

-	/* always apply non-input finalized and active logs */
+	/* always apply active logs which must be newer than fs and finalized */
 	if (merg->log_seq > 0) {
 		*total += merg->log_total;
 		*count += merg->log_count;
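Both sides of the `totl.c` hunks resolve a total from three sources. The first is the merged fs_root item. The second is finalized log deltas: the `inp_*` bucket on the removed side, which holds only the current merge's inputs, and the `fin_*` bucket on the added side. The third is active log deltas. Below is a minimal sketch of the resolution rules. It assumes a hypothetical `struct totl_bins` that mirrors the added side's fields. The opening lines of the real `scoutfs_totl_merge_resolve()` fall outside these hunks, so starting from the fs item only when `fs_seq` is non-zero is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of struct scoutfs_totl_merging. */
struct totl_bins {
	uint64_t fs_seq, fs_total, fs_count;	/* merged fs_root item */
	uint64_t fin_seq, fin_total;		/* finalized log deltas */
	int64_t fin_count;
	uint64_t log_seq, log_total;		/* active log deltas */
	int64_t log_count;
};

static void totl_resolve(const struct totl_bins *b, uint64_t *total, uint64_t *count)
{
	/* start from the fs_root item, if one contributed */
	*total = b->fs_seq ? b->fs_total : 0;
	*count = b->fs_seq ? b->fs_count : 0;

	/* finalized deltas apply only if newer than the fs item, or creating */
	if ((b->fs_seq != 0 && b->fin_seq > b->fs_seq) ||
	    (b->fs_seq == 0 && b->fin_count > 0)) {
		*total += b->fin_total;
		*count += b->fin_count;
	}

	/* active log deltas are always applied */
	if (b->log_seq > 0) {
		*total += b->log_total;
		*count += b->log_count;
	}
}
```

The sequence test lets a reader that races with a partial merge avoid counting deltas twice. Once the fs item's seq has caught up, the finalized deltas are taken to be folded into it already.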
@@ -7,9 +7,9 @@ struct scoutfs_totl_merging {
 	u64 fs_seq;
 	u64 fs_total;
 	u64 fs_count;
-	u64 inp_seq;
-	u64 inp_total;
-	s64 inp_count;
+	u64 fin_seq;
+	u64 fin_total;
+	s64 fin_count;
 	u64 log_seq;
 	u64 log_total;
 	s64 log_count;
@@ -196,7 +196,7 @@ static int retry_forever(struct super_block *sb, int (*func)(struct super_block
 			}

 			if (scoutfs_forcing_unmount(sb)) {
-				ret = -ENOLINK;
+				ret = -EIO;
 				break;
 			}

@@ -252,7 +252,7 @@ void scoutfs_trans_write_func(struct work_struct *work)
 	}

 	if (scoutfs_forcing_unmount(sb)) {
-		ret = -ENOLINK;
+		ret = -EIO;
 		goto out;
 	}

@@ -18,7 +18,6 @@

 #include "super.h"
 #include "triggers.h"
-#include "scoutfs_trace.h"

 /*
 * We have debugfs files we can write to which arm triggers which
@@ -40,13 +39,10 @@ struct scoutfs_triggers {

 static char *names[] = {
 	[SCOUTFS_TRIGGER_BLOCK_REMOVE_STALE] = "block_remove_stale",
-	[SCOUTFS_TRIGGER_LOG_MERGE_FORCE_FINALIZE_OURS] = "log_merge_force_finalize_ours",
 	[SCOUTFS_TRIGGER_SRCH_COMPACT_LOGS_PAD_SAFE] = "srch_compact_logs_pad_safe",
 	[SCOUTFS_TRIGGER_SRCH_FORCE_LOG_ROTATE] = "srch_force_log_rotate",
 	[SCOUTFS_TRIGGER_SRCH_MERGE_STOP_SAFE] = "srch_merge_stop_safe",
 	[SCOUTFS_TRIGGER_STATFS_LOCK_PURGE] = "statfs_lock_purge",
-	[SCOUTFS_TRIGGER_RECLAIM_SKIP_FINALIZE] = "reclaim_skip_finalize",
-	[SCOUTFS_TRIGGER_LOG_MERGE_FORCE_PARTIAL] = "log_merge_force_partial",
 };

 bool scoutfs_trigger_test_and_clear(struct super_block *sb, unsigned int t)
@@ -55,7 +51,6 @@ bool scoutfs_trigger_test_and_clear(struct super_block *sb, unsigned int t)
 	atomic_t *atom;
 	int old;
 	int mem;
-	bool fired;

 	BUG_ON(t >= SCOUTFS_TRIGGER_NR);
 	atom = &triggers->atomics[t];
@@ -69,12 +64,7 @@ bool scoutfs_trigger_test_and_clear(struct super_block *sb, unsigned int t)
 		mem = atomic_cmpxchg(atom, old, 0);
 	} while (mem && mem != old);

-	fired = !!mem;
-
-	if (fired)
-		trace_scoutfs_trigger_fired(sb, names[t]);
-
-	return fired;
+	return !!mem;
 }

 int scoutfs_setup_triggers(struct super_block *sb)
@@ -3,13 +3,10 @@

 enum scoutfs_trigger {
 	SCOUTFS_TRIGGER_BLOCK_REMOVE_STALE,
-	SCOUTFS_TRIGGER_LOG_MERGE_FORCE_FINALIZE_OURS,
 	SCOUTFS_TRIGGER_SRCH_COMPACT_LOGS_PAD_SAFE,
 	SCOUTFS_TRIGGER_SRCH_FORCE_LOG_ROTATE,
 	SCOUTFS_TRIGGER_SRCH_MERGE_STOP_SAFE,
 	SCOUTFS_TRIGGER_STATFS_LOCK_PURGE,
-	SCOUTFS_TRIGGER_RECLAIM_SKIP_FINALIZE,
-	SCOUTFS_TRIGGER_LOG_MERGE_FORCE_PARTIAL,
 	SCOUTFS_TRIGGER_NR,
 };

@@ -95,7 +95,6 @@ struct wkic_info {
 	/* block reading slow path */
 	struct mutex roots_mutex;
 	struct scoutfs_net_roots roots;
-	u64 merge_input_seq;
 	u64 roots_read_seq;
 	ktime_t roots_expire;

@@ -806,79 +805,29 @@ static void free_page_list(struct super_block *sb, struct list_head *list)
 * read_seq number so that we can compare the age of the items in cached
 * pages.  Only one request to refresh the roots is in progress at a
 * time.  This is the slow path that's only used when the cache isn't
- * populated and the roots aren't cached.
- *
- * We read roots directly from the on-disk superblock rather than
- * requesting them from the server so that we can also read the
- * log_merge btree from the same superblock.  The merge status item
- * seq tells us which finalized log trees are inputs to the current
- * merge, which is needed to correctly resolve totl delta items.
+ * populated and the roots aren't cached.  The root request is fast
+ * enough, especially compared to the resulting item reading IO, that we
+ * don't mind hiding it behind a trivial mutex.
 */
-static int refresh_roots(struct super_block *sb, struct wkic_info *winf)
-{
-	struct scoutfs_super_block *super;
-	struct scoutfs_log_merge_status *stat;
-	SCOUTFS_BTREE_ITEM_REF(iref);
-	struct scoutfs_key key;
-	int ret;
-
-	super = kmalloc(sizeof(*super), GFP_NOFS);
-	if (!super)
-		return -ENOMEM;
-
-	ret = scoutfs_read_super(sb, super);
-	if (ret < 0)
-		goto out;
-
-	winf->roots = (struct scoutfs_net_roots){
-		.fs_root = super->fs_root,
-		.logs_root = super->logs_root,
-		.srch_root = super->srch_root,
-	};
-
-	winf->merge_input_seq = 0;
-	if (super->log_merge.ref.blkno) {
-		scoutfs_key_set_zeros(&key);
-		key.sk_zone = SCOUTFS_LOG_MERGE_STATUS_ZONE;
-		ret = scoutfs_btree_lookup(sb, &super->log_merge, &key, &iref);
-		if (ret == 0) {
-			if (iref.val_len == sizeof(*stat)) {
-				stat = iref.val;
-				winf->merge_input_seq = le64_to_cpu(stat->seq);
-			} else {
-				ret = -EUCLEAN;
-			}
-			scoutfs_btree_put_iref(&iref);
-		} else if (ret == -ENOENT) {
-			ret = 0;
-		}
-		if (ret < 0)
-			goto out;
-	}
-
-	winf->roots_read_seq++;
-	winf->roots_expire = ktime_add_ms(ktime_get_raw(), WKIC_CACHE_LIFETIME_MS);
-out:
-	kfree(super);
-	return ret;
-}
-
 static int get_roots(struct super_block *sb, struct wkic_info *winf,
-		     struct scoutfs_net_roots *roots_ret, u64 *merge_input_seq,
-		     u64 *read_seq, bool force_new)
+		     struct scoutfs_net_roots *roots_ret, u64 *read_seq, bool force_new)
 {
+	struct scoutfs_net_roots roots;
 	int ret;

 	mutex_lock(&winf->roots_mutex);

 	if (force_new || ktime_before(winf->roots_expire, ktime_get_raw())) {
-		ret = refresh_roots(sb, winf);
+		ret = scoutfs_client_get_roots(sb, &roots);
 		if (ret < 0)
 			goto out;
+
+		winf->roots = roots;
+		winf->roots_read_seq++;
+		winf->roots_expire = ktime_add_ms(ktime_get_raw(), WKIC_CACHE_LIFETIME_MS);
 	}

 	*roots_ret = winf->roots;
-	*merge_input_seq = winf->merge_input_seq;
 	*read_seq = winf->roots_read_seq;
 	ret = 0;
 out:
@@ -921,30 +870,24 @@ static int insert_read_pages(struct super_block *sb, struct wkic_info *winf,
 	struct scoutfs_key end;
 	struct wkic_page *wpage;
 	LIST_HEAD(pages);
-	u64 merge_input_seq;
-	u64 read_seq = 0;
+	u64 read_seq;
 	int ret;

 	ret = 0;
 retry_stale:
-	ret = get_roots(sb, winf, &roots, &merge_input_seq, &read_seq, ret == -ESTALE);
+	ret = get_roots(sb, winf, &roots, &read_seq, ret == -ESTALE);
 	if (ret < 0)
-		goto check_stale;
+		goto out;

 	start = *range_start;
 	end = *range_end;
-	ret = scoutfs_forest_read_items_roots(sb, &roots, merge_input_seq, key, range_start,
-					      &start, &end, read_items_cb, &root);
+	ret = scoutfs_forest_read_items_roots(sb, &roots, key, range_start, &start, &end,
+					      read_items_cb, &root);
 	trace_scoutfs_wkic_read_items(sb, key, &start, &end);
-check_stale:
 	ret = scoutfs_block_check_stale(sb, ret, &saved, &roots.fs_root.ref, &roots.logs_root.ref);
 	if (ret < 0) {
-		if (ret == -ESTALE) {
-			/* not safe to retry due to delta items, must restart clean */
-			free_item_tree(&root);
-			root = RB_ROOT;
+		if (ret == -ESTALE)
 			goto retry_stale;
-		}
 		goto out;
 	}

@@ -742,7 +742,7 @@ int scoutfs_xattr_set_locked(struct inode *inode, const char *name, size_t name_
 	int ret;
 	int err;

-	trace_scoutfs_xattr_set(sb, ino, name_len, value, size, flags);
+	trace_scoutfs_xattr_set(sb, name_len, value, size, flags);

 	if (WARN_ON_ONCE(tgs->totl && tgs->indx) ||
 	    WARN_ON_ONCE((tgs->totl | tgs->indx) && !tag_lock))
@@ -1265,7 +1265,6 @@ int scoutfs_xattr_drop(struct super_block *sb, u64 ino,
 			ret = parse_indx_key(&tag_key, xat->name, xat->name_len, ino);
 			if (ret < 0)
 				goto out;
-			scoutfs_xattr_set_indx_key_xid(&tag_key, le64_to_cpu(key.skx_id));
 		}

 		if ((tgs.totl || tgs.indx) && locked_zone != tag_key.sk_zone) {
@@ -12,4 +12,3 @@ src/o_tmpfile_umask
 src/o_tmpfile_linkat
 src/mmap_stress
 src/mmap_validate
-src/totl-delta-inject
@@ -15,8 +15,7 @@ BIN := src/createmany			\
 	src/o_tmpfile_umask		\
 	src/o_tmpfile_linkat		\
 	src/mmap_stress			\
-	src/mmap_validate		\
-	src/totl-delta-inject
+	src/mmap_validate

 DEPS := $(wildcard src/*.d)

@@ -117,7 +117,6 @@ used during the test.
 | T\_NR\_MOUNTS    | number of mounts     | -n              | 3                 |
 | T\_O[0-9]        | mount options        | created per run | -o server\_addr=  |
 | T\_QUORUM        | quorum count         | -q              | 2                 |
-| T\_EXTRA         | per-test file dir    | revision ctled  | tests/extra/t     |
 | T\_TMP           | per-test tmp prefix  | made for test   | results/tmp/t/tmp |
 | T\_TMPDIR        | per-test tmp dir dir | made for test   | results/tmp/t     |

@@ -1,882 +0,0 @@
-Ran:
-generic/001
-generic/002
-generic/004
-generic/005
-generic/006
-generic/007
-generic/008
-generic/009
-generic/011
-generic/012
-generic/013
-generic/014
-generic/015
-generic/016
-generic/018
-generic/020
-generic/021
-generic/022
-generic/023
-generic/024
-generic/025
-generic/026
-generic/028
-generic/029
-generic/030
-generic/031
-generic/032
-generic/033
-generic/034
-generic/035
-generic/037
-generic/039
-generic/040
-generic/041
-generic/050
-generic/052
-generic/053
-generic/056
-generic/057
-generic/058
-generic/059
-generic/060
-generic/061
-generic/062
-generic/063
-generic/064
-generic/065
-generic/066
-generic/067
-generic/069
-generic/070
-generic/071
-generic/073
-generic/076
-generic/078
-generic/079
-generic/080
-generic/081
-generic/082
-generic/084
-generic/086
-generic/087
-generic/088
-generic/090
-generic/091
-generic/092
-generic/094
-generic/096
-generic/097
-generic/098
-generic/099
-generic/101
-generic/104
-generic/105
-generic/106
-generic/107
-generic/110
-generic/111
-generic/113
-generic/114
-generic/115
-generic/116
-generic/117
-generic/118
-generic/119
-generic/120
-generic/121
-generic/122
-generic/123
-generic/124
-generic/126
-generic/128
-generic/129
-generic/130
-generic/131
-generic/134
-generic/135
-generic/136
-generic/138
-generic/139
-generic/140
-generic/141
-generic/142
-generic/143
-generic/144
-generic/145
-generic/146
-generic/147
-generic/148
-generic/149
-generic/150
-generic/151
-generic/152
-generic/153
-generic/154
-generic/155
-generic/156
-generic/157
-generic/158
-generic/159
-generic/160
-generic/161
-generic/162
-generic/163
-generic/169
-generic/171
-generic/172
-generic/173
-generic/174
-generic/177
-generic/178
-generic/179
-generic/180
-generic/181
-generic/182
-generic/183
-generic/184
-generic/185
-generic/188
-generic/189
-generic/190
-generic/191
-generic/193
-generic/194
-generic/195
-generic/196
-generic/197
-generic/198
-generic/199
-generic/200
-generic/201
-generic/202
-generic/203
-generic/205
-generic/206
-generic/207
-generic/210
-generic/211
-generic/212
-generic/214
-generic/215
-generic/216
-generic/217
-generic/218
-generic/219
-generic/220
-generic/221
-generic/222
-generic/223
-generic/225
-generic/227
-generic/228
-generic/229
-generic/230
-generic/235
-generic/236
-generic/237
-generic/238
-generic/240
-generic/244
-generic/245
-generic/246
-generic/247
-generic/248
-generic/249
-generic/250
-generic/252
-generic/253
-generic/254
-generic/255
-generic/256
-generic/257
-generic/258
-generic/259
-generic/260
-generic/261
-generic/262
-generic/263
-generic/264
-generic/265
-generic/266
-generic/267
-generic/268
-generic/271
-generic/272
-generic/276
-generic/277
-generic/278
-generic/279
-generic/281
-generic/282
-generic/283
-generic/284
-generic/286
-generic/287
-generic/288
-generic/289
-generic/290
-generic/291
-generic/292
-generic/293
-generic/294
-generic/295
-generic/296
-generic/301
-generic/302
-generic/303
-generic/304
-generic/305
-generic/306
-generic/307
-generic/308
-generic/309
-generic/312
-generic/313
-generic/314
-generic/315
-generic/316
-generic/317
-generic/319
-generic/322
-generic/324
-generic/325
-generic/326
-generic/327
-generic/328
-generic/329
-generic/330
-generic/331
-generic/332
-generic/335
-generic/336
-generic/337
-generic/341
-generic/342
-generic/343
-generic/346
-generic/348
-generic/353
-generic/355
-generic/358
-generic/359
-generic/360
-generic/361
-generic/362
-generic/363
-generic/364
-generic/365
-generic/366
-generic/367
-generic/368
-generic/369
-generic/370
-generic/371
-generic/372
-generic/373
-generic/374
-generic/375
-generic/376
-generic/377
-generic/378
-generic/379
-generic/380
-generic/381
-generic/382
-generic/383
-generic/384
-generic/385
-generic/386
-generic/389
-generic/391
-generic/392
-generic/393
-generic/394
-generic/395
-generic/396
-generic/397
-generic/398
-generic/400
-generic/401
-generic/402
-generic/403
-generic/404
-generic/406
-generic/407
-generic/408
-generic/412
-generic/413
-generic/414
-generic/417
-generic/419
-generic/420
-generic/421
-generic/422
-generic/424
-generic/425
-generic/426
-generic/427
-generic/428
-generic/436
-generic/437
-generic/439
-generic/440
-generic/443
-generic/445
-generic/446
-generic/448
-generic/449
-generic/450
-generic/451
-generic/452
-generic/453
-generic/454
-generic/456
-generic/458
-generic/460
-generic/462
-generic/463
-generic/465
-generic/466
-generic/468
-generic/469
-generic/470
-generic/471
-generic/474
-generic/477
-generic/478
-generic/479
-generic/480
-generic/481
-generic/483
-generic/485
-generic/486
-generic/487
-generic/488
-generic/489
-generic/490
-generic/491
-generic/492
-generic/498
-generic/499
-generic/501
-generic/502
-generic/503
-generic/504
-generic/505
-generic/506
-generic/507
-generic/508
-generic/509
-generic/510
-generic/511
-generic/512
-generic/513
-generic/514
-generic/515
-generic/516
-generic/517
-generic/518
-generic/519
-generic/520
-generic/523
-generic/524
-generic/525
-generic/526
-generic/527
-generic/528
-generic/529
-generic/530
-generic/531
-generic/533
-generic/534
-generic/535
-generic/536
-generic/537
-generic/538
-generic/539
-generic/540
-generic/541
-generic/542
-generic/543
-generic/544
-generic/545
-generic/546
-generic/547
-generic/548
-generic/549
-generic/550
-generic/552
-generic/553
-generic/555
-generic/556
-generic/557
-generic/566
-generic/567
-generic/571
-generic/572
-generic/573
-generic/574
-generic/575
-generic/576
-generic/577
-generic/578
-generic/580
-generic/581
-generic/582
-generic/583
-generic/584
-generic/586
-generic/587
-generic/588
-generic/591
-generic/592
-generic/593
-generic/594
-generic/595
-generic/596
-generic/597
-generic/598
-generic/599
-generic/600
-generic/601
-generic/602
-generic/603
-generic/604
-generic/605
-generic/606
-generic/607
-generic/608
-generic/609
-generic/610
-generic/611
-generic/612
-generic/613
-generic/614
-generic/618
-generic/621
-generic/623
-generic/624
-generic/625
-generic/626
-generic/628
-generic/629
-generic/630
-generic/632
-generic/634
-generic/635
-generic/637
-generic/638
-generic/639
-generic/640
-generic/644
-generic/645
-generic/646
-generic/647
-generic/651
-generic/652
-generic/653
-generic/654
-generic/655
-generic/657
-generic/658
-generic/659
-generic/660
-generic/661
-generic/662
-generic/663
-generic/664
-generic/665
-generic/666
-generic/667
-generic/668
-generic/669
-generic/673
-generic/674
-generic/675
-generic/676
-generic/677
-generic/678
-generic/679
-generic/680
-generic/681
-generic/682
-generic/683
-generic/684
-generic/685
-generic/686
-generic/687
-generic/688
-generic/689
-shared/002
-shared/032
-Not
-run:
-generic/008
-generic/009
-generic/012
-generic/015
-generic/016
-generic/018
-generic/021
-generic/022
-generic/025
-generic/026
-generic/031
-generic/033
-generic/050
-generic/052
-generic/058
-generic/059
-generic/060
-generic/061
-generic/063
-generic/064
-generic/078
-generic/079
-generic/081
-generic/082
-generic/091
-generic/094
-generic/096
-generic/110
-generic/111
-generic/113
-generic/114
-generic/115
-generic/116
-generic/118
-generic/119
-generic/121
-generic/122
-generic/123
-generic/128
-generic/130
-generic/134
-generic/135
-generic/136
-generic/138
-generic/139
-generic/140
-generic/142
-generic/143
-generic/144
-generic/145
-generic/146
-generic/147
-generic/148
-generic/149
-generic/150
-generic/151
-generic/152
-generic/153
-generic/154
-generic/155
-generic/156
-generic/157
-generic/158
-generic/159
-generic/160
-generic/161
-generic/162
-generic/163
-generic/171
-generic/172
-generic/173
-generic/174
-generic/177
-generic/178
-generic/179
-generic/180
-generic/181
-generic/182
-generic/183
-generic/185
-generic/188
-generic/189
-generic/190
-generic/191
-generic/193
-generic/194
-generic/195
-generic/196
-generic/197
-generic/198
-generic/199
-generic/200
-generic/201
-generic/202
-generic/203
-generic/205
-generic/206
-generic/207
-generic/210
-generic/211
-generic/212
-generic/214
-generic/216
-generic/217
-generic/218
-generic/219
-generic/220
-generic/222
-generic/223
-generic/225
-generic/227
-generic/229
-generic/230
-generic/235
-generic/238
-generic/240
-generic/244
-generic/250
-generic/252
-generic/253
-generic/254
-generic/255
-generic/256
-generic/259
-generic/260
-generic/261
-generic/262
-generic/263
-generic/264
-generic/265
-generic/266
-generic/267
-generic/268
-generic/271
-generic/272
-generic/276
-generic/277
-generic/278
-generic/279
-generic/281
-generic/282
-generic/283
-generic/284
-generic/287
-generic/288
-generic/289
-generic/290
-generic/291
-generic/292
-generic/293
-generic/295
-generic/296
-generic/301
-generic/302
-generic/303
-generic/304
-generic/305
-generic/312
-generic/314
-generic/316
-generic/317
-generic/324
-generic/326
-generic/327
-generic/328
-generic/329
-generic/330
-generic/331
-generic/332
-generic/353
-generic/355
-generic/358
-generic/359
-generic/361
-generic/362
-generic/363
-generic/364
-generic/365
-generic/366
-generic/367
-generic/368
-generic/369
-generic/370
-generic/371
-generic/372
-generic/373
-generic/374
-generic/378
-generic/379
-generic/380
-generic/381
-generic/382
-generic/383
-generic/384
-generic/385
-generic/386
-generic/391
-generic/392
-generic/395
-generic/396
-generic/397
-generic/398
-generic/400
-generic/402
-generic/404
-generic/406
-generic/407
-generic/408
-generic/412
-generic/413
-generic/414
-generic/417
-generic/419
-generic/420
-generic/421
-generic/422
-generic/424
-generic/425
-generic/427
-generic/439
-generic/440
-generic/446
-generic/449
-generic/450
-generic/451
-generic/453
-generic/454
-generic/456
-generic/458
-generic/462
-generic/463
-generic/465
-generic/466
-generic/468
-generic/469
-generic/470
-generic/471
-generic/474
-generic/485
-generic/487
-generic/488
-generic/491
-generic/492
-generic/499
-generic/501
-generic/503
-generic/505
-generic/506
-generic/507
-generic/508
-generic/511
-generic/513
-generic/514
-generic/515
-generic/516
-generic/517
-generic/518
-generic/519
-generic/520
-generic/528
-generic/530
-generic/536
-generic/537
-generic/538
-generic/539
-generic/540
-generic/541
-generic/542
-generic/543
-generic/544
-generic/545
-generic/546
-generic/548
-generic/549
-generic/550
-generic/552
-generic/553
-generic/555
-generic/556
-generic/566
-generic/567
-generic/572
-generic/573
-generic/574
-generic/575
-generic/576
-generic/577
-generic/578
-generic/580
-generic/581
-generic/582
-generic/583
-generic/584
-generic/586
-generic/587
-generic/588
-generic/591
-generic/592
-generic/593
-generic/594
-generic/595
-generic/596
-generic/597
-generic/598
-generic/599
-generic/600
-generic/601
-generic/602
-generic/603
-generic/605
-generic/606
-generic/607
-generic/608
-generic/609
-generic/610
-generic/612
-generic/613
-generic/621
-generic/623
-generic/624
-generic/625
-generic/626
-generic/628
-generic/629
-generic/630
-generic/635
-generic/644
-generic/645
-generic/646
-generic/647
-generic/651
-generic/652
-generic/653
-generic/654
-generic/655
-generic/657
-generic/658
-generic/659
-generic/660
-generic/661
-generic/662
-generic/663
-generic/664
-generic/665
-generic/666
-generic/667
-generic/668
-generic/669
-generic/673
-generic/674
-generic/675
-generic/677
-generic/678
-generic/679
-generic/680
-generic/681
-generic/682
-generic/683
-generic/684
-generic/685
-generic/686
-generic/687
-generic/688
-generic/689
-shared/002
-shared/032
-Passed all 512 tests
@@ -1,44 +0,0 @@
-generic/003	# missing atime update in buffered read
-generic/075	# file content mismatch failures (fds, etc)
-generic/103	# enospc causes trans commit failures
-generic/108	# mount fails on failing device?
-generic/112	# file content mismatch failures (fds, etc)
-generic/213	# enospc causes trans commit failures
-generic/318	# can't support user namespaces until v5.11
-generic/321	# requires selinux enabled for '+' in ls?
-generic/338	# BUG_ON update inode error handling
-generic/347	# _dmthin_mount doesn't work?
-generic/356	# swap
-generic/357	# swap
-generic/409	# bind mounts not scripted yet
-generic/410	# bind mounts not scripted yet
-generic/411	# bind mounts not scripted yet
-generic/423	# symlink inode size is strlen() + 1 on scoutfs
-generic/430	# xfs_io copy_range missing in el7
-generic/431	# xfs_io copy_range missing in el7
-generic/432	# xfs_io copy_range missing in el7
-generic/433	# xfs_io copy_range missing in el7
-generic/434	# xfs_io copy_range missing in el7
-generic/441	# dm-mapper
-generic/444	# el9's posix_acl_update_mode is buggy ?
-generic/467	# open_by_handle ESTALE
-generic/472	# swap
-generic/484	# dm-mapper
-generic/493	# swap
-generic/494	# swap
-generic/495	# swap
-generic/496	# swap
-generic/497	# swap
-generic/532	# xfs_io statx attrib_mask missing in el7
-generic/554	# swap
-generic/563	# cgroup+loopdev
-generic/564	# xfs_io copy_range missing in el7
-generic/565	# xfs_io copy_range missing in el7
-generic/568	# falloc not resulting in block count increase
-generic/569	# swap
-generic/570	# swap
-generic/620	# dm-hugedisk
-generic/633	# id-mapped mounts missing in el7
-generic/636	# swap
-generic/641	# swap
-generic/643	# swap
@@ -8,33 +8,36 @@

 echo "$0 running rid '$SCOUTFS_FENCED_REQ_RID' ip '$SCOUTFS_FENCED_REQ_IP' args '$@'"

-echo_fail() {
-	echo "$@" >&2
+log() {
+	echo "$@" > /dev/stderr
 	exit 1
 }

-# silence error messages
-quiet_cat()
-{
-	cat "$@" 2>/dev/null
+echo_fail() {
+	echo "$@" > /dev/stderr
+	exit 1
 }

 rid="$SCOUTFS_FENCED_REQ_RID"

-shopt -s nullglob
 for fs in /sys/fs/scoutfs/*; do
-	fs_rid="$(quiet_cat $fs/rid)"
-	nr="$(quiet_cat $fs/data_device_maj_min)"
-	[ ! -d "$fs" -o "$fs_rid" != "$rid" ] && continue
+	[ ! -d "$fs" ] && continue

-	mnt=$(findmnt -l -n -t scoutfs -o TARGET -S $nr)
-	[ -z "$mnt" ] && continue
-
-	if ! umount -qf "$mnt"; then
-		if [ -d "$fs" ]; then
-			echo_fail "umount -qf $mnt failed"
-		fi
+	fs_rid="$(cat $fs/rid)" || \
+		echo_fail "failed to get rid in $fs"
+	if [ "$fs_rid" != "$rid" ]; then
+		continue
 	fi
+
+	nr="$(cat $fs/data_device_maj_min)" || \
+		echo_fail "failed to get data device major:minor in $fs"
+
+	mnts=$(findmnt -l -n -t scoutfs -o TARGET -S $nr) || \
+		echo_fail "findmnt -t scoutfs -S $nr failed"
+	for mnt in $mnts; do
		umount -f "$mnt" || \
			echo_fail "umount -f $mnt failed"
+	done
 done

 exit 0
@@ -64,27 +64,21 @@ t_rc()
 }

 #
-# As run, stdout/err are redirected to a file that will be compared with
-# the stored expected golden output of the test.  This redirects
-# stdout/err in the script to stdout of the invoking run-test.  It's
-# intended to give visible output of tests without being included in the
-# golden output.
+# redirect test output back to the output of the invoking script instead
+# of the compared output.
 #
-# (see the goofy "exec" fd manipulation in the main run-tests as it runs
-# each test)
-#
-t_stdout_invoked()
+t_restore_output()
 {
 	exec >&6 2>&1
 }

 #
-# This undoes t_stdout_invokved, returning the test's stdout/err to the
-# output file as it was when it was launched.
+# redirect a command's output back to the compared output after the
+# test has restored its output
 #
-t_stdout_compare()
+t_compare_output()
 {
-	exec >&7 2>&1
+	"$@" >&7 2>&1
 }

 #
@@ -20,6 +20,9 @@ t_filter_fs()
 # [ 2687.691366] BUG: KASAN: stack-out-of-bounds in get_reg+0x1bc/0x230
 # ...
 # [ 2687.706220] ==================================================================
+# [ 2687.707284] Disabling lock debugging due to kernel taint
+#
+# That final lock debugging message may not be included.
 #
 ignore_harmless_unwind_kasan_stack_oob()
 {
@@ -43,6 +46,10 @@ awk '
 		saved=""
        }
        ( in_soob == 2 && $0 ~ /==================================================================/ ) {
+                in_soob = 3
+                soob_nr = NR
+        }
+        ( in_soob == 3 && NR > soob_nr && $0 !~ /Disabling lock debugging/ ) {
                in_soob = 0
        }
        ( !in_soob ) { print $0 }
@@ -54,58 +61,6 @@ awk '
 '
 }

-#
-# in el97+, XFS can generate a spurious lockdep circular dependency
-# warning about reclaim. Fixed upstream in e.g. v5.7-rc4-129-g6dcde60efd94
-#
-ignore_harmless_xfs_lockdep_warning()
-{
-awk '
-	BEGIN {
-		in_block = 0
-		block_nr = 0
-		buf = ""
-	}
-	( !in_block && $0 ~ /======================================================/ ) {
-		in_block = 1
-		block_nr = NR
-		buf = $0 "\n"
-		next
-	}
-	( in_block == 1 && NR == (block_nr + 1) ) {
-		if (match($0, /WARNING: possible circular locking dependency detected/) != 0) {
-			in_block = 2
-			buf = buf $0 "\n"
-		} else {
-			in_block = 0
-			printf "%s", buf
-			print $0
-			buf = ""
-		}
-		next
-	}
-	( in_block == 2 ) {
-		buf = buf $0 "\n"
-		if ($0 ~ /<\/TASK>/) {
-			if (buf ~ /xfs_(nondir_|dir_)?ilock_class/ && buf ~ /fs_reclaim/) {
-				# known xfs lockdep false positive, discard
-			} else {
-				printf "%s", buf
-			}
-			in_block = 0
-			buf = ""
-		}
-		next
-	}
-	{ print $0 }
-	END {
-		if (buf) {
-			printf "%s", buf
-		}
-	}
-'
-}
-
 #
 # Filter out expected messages.  Putting messages here implies that
 # tests aren't relying on messages to discover failures.. they're
@@ -166,17 +121,6 @@ t_filter_dmesg()

 	# in debugging kernels we can slow things down a bit
 	re="$re|hrtimer: interrupt took .*"
-	re="$re|clocksource: Long readout interval"
-
-	# orphan log trees reclaim is handled, not an error
-	re="$re|scoutfs .* reclaiming orphan log trees"
-
-	# nfs can emit a whole range of messages we can ignore
-	re="$re|Installing knfsd .*"
-	re="$re|nfsd: .*"
-	re="$re|NFSD: .*"
-	re="$re|RPC: .*"
-	re="$re|FS-Cache: .*"

 	# fencing tests force unmounts and trigger timeouts
 	re="$re|scoutfs .* forcing unmount"
@@ -196,9 +140,6 @@ t_filter_dmesg()
 	re="$re|scoutfs .* error.*server failed to bind to.*"
 	re="$re|scoutfs .* critical transaction commit failure.*"

-	# ENOLINK (-67) indicates an expected forced unmount error
-	re="$re|scoutfs .* error -67 .*"
-
 	# change-devices causes loop device resizing
 	re="$re|loop: module loaded"
 	re="$re|loop[0-9].* detected capacity change from.*"
@@ -222,16 +163,6 @@ t_filter_dmesg()
 	# perf warning that it adjusted sample rate
 	re="$re|perf: interrupt took too long.*lowering kernel.perf_event_max_sample_rate.*"

-	# some ci test guests are unresponsive
-	re="$re|longest quorum heartbeat .* delay"
-
-	# creating block devices may trigger this
-	re="$re|block device autoloading is deprecated and will be removed."
-
-	# lockdep or kasan warnings can cause this
-	re="$re|Disabling lock debugging due to kernel taint"
-
 	egrep -v "($re)" | \
-		ignore_harmless_unwind_kasan_stack_oob | \
-		ignore_harmless_xfs_lockdep_warning
+		ignore_harmless_unwind_kasan_stack_oob
 }
@@ -283,30 +283,6 @@ t_reinsert_remount_all()
 	t_quiet t_mount_all || t_fail "mounting all failed"
 }

-#
-# scratch helpers
-#
-t_scratch_mkfs()
-{
-	scoutfs mkfs -f -Q 0,127.0.0.1,$T_SCRATCH_PORT "$T_EX_META_DEV" "$T_EX_DATA_DEV" "$@" > $T_TMP.mkfs.out 2>&1 || \
-		t_fail "scratch mkfs failed"
-}
-
-t_scratch_mount()
-{
-	mkdir -p "$T_MSCR"
-	mount -t scoutfs -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 "$@" "$T_EX_DATA_DEV" "$T_MSCR" || \
-		t_fail "scratch mount failed"
-}
-
-t_scratch_umount()
-{
-	umount "$T_MSCR" || \
-		t_fail "scratch umount failed"
-	rmdir "$T_MSCR"
-}
-
-
 t_trigger_path() {
 	local nr="$1"

@@ -522,121 +498,3 @@ t_restore_all_sysfs_mount_options() {
 		t_set_sysfs_mount_option $i $name "${_saved_opts[$ind]}"
 	done
 }
-
-t_force_log_merge() {
-	local sv=$(t_server_nr)
-	local merges_started
-	local last_merges_started
-	local merges_completed
-	local last_merges_completed
-
-	while true; do
-		last_merges_started=$(t_counter log_merge_start $sv)
-		last_merges_completed=$(t_counter log_merge_complete $sv)
-
-		t_trigger_arm_silent log_merge_force_finalize_ours $sv
-
-		t_sync_seq_index
-
-		while test "$(t_trigger_get log_merge_force_finalize_ours $sv)" == "1"; do
-			sleep .5
-		done
-
-		merges_started=$(t_counter log_merge_start $sv)
-
-		if (( merges_started > last_merges_started )); then
-			merges_completed=$(t_counter log_merge_complete $sv)
-
-			while (( merges_completed == last_merges_completed )); do
-				sleep .5
-				merges_completed=$(t_counter log_merge_complete $sv)
-			done
-			break
-		fi
-	done
-}
-
-declare -A _last_scan
-t_get_orphan_scan_runs() {
-	local i
-
-	for i in $(t_fs_nrs); do
-		_last_scan[$i]=$(t_counter orphan_scan $i)
-	done
-}
-
-t_wait_for_orphan_scan_runs() {
-	local i
-	local scan
-
-	t_get_orphan_scan_runs
-
-	for i in $(t_fs_nrs); do
-		while true; do
-			scan=$(t_counter orphan_scan $i)
-			if (( scan != _last_scan[$i] )); then
-				break
-			fi
-			sleep .5
-		done
-	done
-}
-
-declare -A _last_empty
-t_get_orphan_scan_empty() {
-	local i
-
-	for i in $(t_fs_nrs); do
-		_last_empty[$i]=$(t_counter orphan_scan_empty $i)
-	done
-}
-
-t_wait_for_no_orphans() {
-	local i;
-	local working;
-	local empty;
-
-	t_get_orphan_scan_empty
-
-	while true; do
-		working=0
-
-		t_wait_for_orphan_scan_runs
-
-		for i in $(t_fs_nrs); do
-			empty=$(t_counter orphan_scan_empty $i)
-			if (( empty == _last_empty[$i] )); then
-				(( working++ ))
-			else
-				(( _last_empty[$i] = empty ))
-			fi
-		done
-
-		if (( working == 0 )); then
-			break
-		fi
-
-		sleep 1
-	done
-}
-
-#
-# Repeatedly run the arguments as a command, sleeping in between, until
-# it returns success.  The first argument is a relative timeout in
-# seconds.  The remaining arguments are the command and its arguments.
-#
-# If the timeout expires without the command returning 0 then the test
-# fails.
-#
-t_wait_until_timeout() {
-	local relative="$1"
-	local expire="$((SECONDS + relative))"
-	shift
-
-	while (( SECONDS < expire )); do
-		"$@" && return
-		sleep 1
-	done
-
-	t_fail "command failed for $relative sec: $@"
-}
@@ -43,14 +43,9 @@ t_tap_progress()
 	local testname=$1
 	local result=$2

-	local stmsg=""
 	local diff=""
 	local dmsg=""

-	if [[ -s $T_RESULTS/tmp/${testname}/status.msg ]]; then
-		stmsg="1"
-	fi
-
 	if [[ -s "$T_RESULTS/tmp/${testname}/dmesg.new" ]]; then
 		dmsg="1"
 	fi
@@ -66,7 +61,6 @@ t_tap_progress()
 		echo "# ${testname} ** skipped - permitted **"
 	else
 		echo "not ok ${i} - ${testname}"
-
 		case ${result} in
 		101)
 			echo "# ${testname} ** skipped **"
@@ -76,13 +70,6 @@ t_tap_progress()
 			;;
 		esac

-		if [[ -n "${stmsg}" ]]; then
-			echo "#"
-			echo "# status:"
-			echo "#"
-			cat $T_RESULTS/tmp/${testname}/status.msg | sed 's/^/# - /'
-		fi
-
 		if [[ -n "${diff}" ]]; then
 			echo "#"
 			echo "# diff:"
@@ -1,6 +0,0 @@
-== make scratch fs
-== create uid/gids
-== set acls and permissions
-== compare output
-== drop caches and compare again
-== cleanup scratch fs
@@ -1,32 +0,0 @@
-== write via NFS, read both sides
-== POSIX ACL set via NFS, read both sides
-user::rw-
-user:22222:rw-
-group::r--
-mask::rw-
-other::r--
-
-user::rw-
-user:22222:rw-
-group::r--
-mask::rw-
-other::r--
-
-== POSIX ACL set on scoutfs, read via NFS
-user::rw-
-user:22222:rw-
-group::r--
-group:44444:r--
-mask::rw-
-other::r--
-
-== default ACL inheritance via NFS
-user::rw-
-user:22222:rwx	#effective:rw-
-group::r-x	#effective:r--
-mask::rw-
-other::r--
-
-== NFS read demand-stages a released file
-1
-== cleanup
@@ -1,54 +0,0 @@
-== testing invalid read-xattr-index arguments
-bad index position entry argument 'bad', it must be in the form "a.b.ino" where each value can be prefixed by '0' for octal or '0x' for hex
-scoutfs: read-xattr-index failed: Invalid argument (22)
-bad index position entry argument '1.2', it must be in the form "a.b.ino" where each value can be prefixed by '0' for octal or '0x' for hex
-scoutfs: read-xattr-index failed: Invalid argument (22)
-initial major index position '256' must be between 0 and 255, inclusive.
-scoutfs: read-xattr-index failed: Invalid argument (22)
-first index position 1.2.3 must be less than last index position 0.0.0
-scoutfs: read-xattr-index failed: Invalid argument (22)
-first index position 1.2.0 must be less than last index position 1.1.2
-scoutfs: read-xattr-index failed: Invalid argument (22)
-first index position 2.2.2 must be less than last index position 2.2.1
-scoutfs: read-xattr-index failed: Invalid argument (22)
-== testing invalid names
-setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
-setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
-setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
-setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
-setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
-setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
-setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
-setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
-setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
-setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
-setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
-setfattr: /mnt/test/test/basic-xattr-indx/invalid: Numerical result out of range
-== testing boundary values
-0.0 found
-255.max found
-== indx xattr must have no value
-setfattr: /mnt/test/test/basic-xattr-indx/noval: Invalid argument
-setfattr: /mnt/test/test/basic-xattr-indx/noval: Invalid argument
-== set indx xattr and verify index entry
-found
-== setting same indx xattr again is a no-op
-found
-== removing non-existent indx xattr succeeds
-setfattr: /mnt/test/test/basic-xattr-indx/file: No such attribute
-still found
-== explicit xattr removal cleans up index entry
-== file deletion cleans up index entry
-found before delete
-== multiple indx xattrs on one file cleaned up by deletion
-entries before delete: 2
-entries after delete: 0
-== partial removal leaves other entries
-300 found
-== multiple files at same index position
-files at same position: 2
-surviving file found
-== cross-mount visibility
-found on mount 1
-== duplicate position deduplication
-entries for same position: 1
@@ -8,10 +8,10 @@
 /mnt/test/test/data-prealloc/file-1: extents: 32
 /mnt/test/test/data-prealloc/file-2: extents: 32
 == any writes to region prealloc get full extents
-/mnt/test/test/data-prealloc/file-1: extents: 8
-/mnt/test/test/data-prealloc/file-2: extents: 8
-/mnt/test/test/data-prealloc/file-1: extents: 8
-/mnt/test/test/data-prealloc/file-2: extents: 8
+/mnt/test/test/data-prealloc/file-1: extents: 4
+/mnt/test/test/data-prealloc/file-2: extents: 4
+/mnt/test/test/data-prealloc/file-1: extents: 4
+/mnt/test/test/data-prealloc/file-2: extents: 4
 == streaming offline writes get full extents either way
 /mnt/test/test/data-prealloc/file-1: extents: 4
 /mnt/test/test/data-prealloc/file-2: extents: 4
@@ -20,8 +20,8 @@
 == goofy preallocation amounts work
 /mnt/test/test/data-prealloc/file-1: extents: 6
 /mnt/test/test/data-prealloc/file-2: extents: 6
-/mnt/test/test/data-prealloc/file-1: extents: 10
-/mnt/test/test/data-prealloc/file-2: extents: 10
+/mnt/test/test/data-prealloc/file-1: extents: 6
+/mnt/test/test/data-prealloc/file-2: extents: 6
 /mnt/test/test/data-prealloc/file-1: extents: 3
 /mnt/test/test/data-prealloc/file-2: extents: 3
 == block writes into region allocs hole
@@ -17,7 +17,7 @@ ino not found in dseq index
 mount 0 contents after mount 1 rm: contents
 ino found in dseq index
 ino found in dseq index
-stat: cannot stat '/mnt/test/test/inode-deletion/badfile': No such file or directory
+stat: cannot stat '/mnt/test/test/inode-deletion/file': No such file or directory
 ino not found in dseq index
 ino not found in dseq index
 == lots of deletions use one open map
@@ -1,3 +1,4 @@
+== setting longer hung task timeout
 == creating fragmented extents
 == unlink file with moved extents to free extents per block
 == cleanup
@@ -49,7 +49,7 @@ offline wating should be empty:
 0
 == truncating does wait
 truncate should be waiting for first block:
-truncate should no longer be waiting:
+trunate should no longer be waiting:
 0
 == writing waits
 should be waiting for write
@@ -1,3 +0,0 @@
-== create orphan log_trees entry via trigger
-== verify orphan is reclaimed and merge completes
-== verify orphan reclaim was logged
@@ -1,460 +0,0 @@
-== missing options should fail ==
-punch-offline: must provide offset
-Try `punch-offline --help' or `punch-offline --usage' for more information.
-punch-offline: must provide length
-Try `punch-offline --help' or `punch-offline --usage' for more information.
-punch-offline: must provide data_version
-Try `punch-offline --help' or `punch-offline --usage' for more information.
-== can't hole punch dir or special ==
-failed to open '/mnt/test.0/test/punch-offline/dir': Is a directory (21)
-scoutfs: punch-offline failed: Is a directory (21)
-== punching an empty file does nothing ==
-== punch outside of i_size does nothing ==
-== can't hole punch online extent ==
-0: offset: 0 length: 4096 flags: ..L
-extents: 1
-punch_offline ioctl failed: Invalid argument (22)
-scoutfs: punch-offline failed: Invalid argument (22)
-0: offset: 0 length: 4096 flags: ..L
-extents: 1
-== can't hole punch unwritten extent ==
-0: offset: 0 length: 12288 flags: .UL
-extents: 1
-punch_offline ioctl failed: Invalid argument (22)
-scoutfs: punch-offline failed: Invalid argument (22)
-0: offset: 0 length: 12288 flags: .UL
-extents: 1
-== hole punch offline extent ==
-0: offset: 0 length: 12288 flags: O.L
-extents: 1
-0: offset: 0 length: 4096 flags: O..
-1: offset: 8192 length: 4096 flags: O.L
-extents: 2
-== can't hole punch non-aligned bsz offset or len ==
-0: offset: 0 length: 12288 flags: O.L
-extents: 1
-punch_offline ioctl failed: Value too large for defined data type (75)
-scoutfs: punch-offline failed: Value too large for defined data type (75)
-punch_offline ioctl failed: Value too large for defined data type (75)
-scoutfs: punch-offline failed: Value too large for defined data type (75)
-punch_offline ioctl failed: Value too large for defined data type (75)
-scoutfs: punch-offline failed: Value too large for defined data type (75)
-punch_offline ioctl failed: Value too large for defined data type (75)
-scoutfs: punch-offline failed: Value too large for defined data type (75)
-punch_offline ioctl failed: Value too large for defined data type (75)
-scoutfs: punch-offline failed: Value too large for defined data type (75)
-punch_offline ioctl failed: Value too large for defined data type (75)
-scoutfs: punch-offline failed: Value too large for defined data type (75)
-0: offset: 0 length: 12288 flags: O.L
-extents: 1
-== can't hole punch mismatched data_version ==
-0: offset: 0 length: 12288 flags: O.L
-extents: 1
-punch_offline ioctl failed: Stale file handle (116)
-scoutfs: punch-offline failed: Stale file handle (116)
-punch_offline ioctl failed: Stale file handle (116)
-scoutfs: punch-offline failed: Stale file handle (116)
-punch_offline ioctl failed: Stale file handle (116)
-scoutfs: punch-offline failed: Stale file handle (116)
-0: offset: 0 length: 12288 flags: O.L
-extents: 1
-== Punch hole crossing multiple extents ==
-0: offset: 0 length: 7 flags: O.L
-extents: 1
-0: offset: 0 length: 1 flags: O..
-1: offset: 2 length: 1 flags: O..
-2: offset: 4 length: 1 flags: O..
-3: offset: 6 length: 1 flags: O.L
-extents: 4
-0: offset: 0 length: 1 flags: O..
-1: offset: 6 length: 1 flags: O.L
-extents: 2
-== punch hole starting at a hole ==
-0: offset: 0 length: 7 flags: O.L
-extents: 1
-0: offset: 0 length: 1 flags: O..
-1: offset: 2 length: 1 flags: O..
-2: offset: 4 length: 1 flags: O..
-3: offset: 6 length: 1 flags: O.L
-extents: 4
-0: offset: 0 length: 1 flags: O..
-1: offset: 6 length: 1 flags: O.L
-extents: 2
-== large punch ==
-0: offset: 0 length: 1572864 flags: O.L
-extents: 1
-0: offset: 0 length: 134123 flags: O..
-1: offset: 202466 length: 264807 flags: O..
-2: offset: 535616 length: 199007 flags: O..
-3: offset: 802966 length: 769898 flags: O.L
-extents: 4
-== overlapping punches with lots of extents ==
-0: offset: 0 length: 4194304 flags: O.L
-extents: 1
-extents: 512
-extents: 505
-extents: 378
-extents: 252
-0: offset: 0 length: 4096 flags: O..
-1: offset: 8192 length: 4096 flags: O..
-2: offset: 32768 length: 4096 flags: O..
-3: offset: 40960 length: 4096 flags: O..
-4: offset: 65536 length: 4096 flags: O..
-5: offset: 73728 length: 4096 flags: O..
-6: offset: 98304 length: 4096 flags: O..
-7: offset: 106496 length: 4096 flags: O..
-8: offset: 196608 length: 4096 flags: O..
-9: offset: 204800 length: 4096 flags: O..
-10: offset: 229376 length: 4096 flags: O..
-11: offset: 237568 length: 4096 flags: O..
-12: offset: 262144 length: 4096 flags: O..
-13: offset: 270336 length: 4096 flags: O..
-14: offset: 294912 length: 4096 flags: O..
-15: offset: 303104 length: 4096 flags: O..
-16: offset: 327680 length: 4096 flags: O..
-17: offset: 335872 length: 4096 flags: O..
-18: offset: 360448 length: 4096 flags: O..
-19: offset: 368640 length: 4096 flags: O..
-20: offset: 393216 length: 4096 flags: O..
-21: offset: 401408 length: 4096 flags: O..
-22: offset: 425984 length: 4096 flags: O..
-23: offset: 434176 length: 4096 flags: O..
-24: offset: 458752 length: 4096 flags: O..
-25: offset: 466944 length: 4096 flags: O..
-26: offset: 491520 length: 4096 flags: O..
-27: offset: 499712 length: 4096 flags: O..
-28: offset: 720896 length: 4096 flags: O..
-29: offset: 729088 length: 4096 flags: O..
-30: offset: 753664 length: 4096 flags: O..
-31: offset: 761856 length: 4096 flags: O..
-32: offset: 786432 length: 4096 flags: O..
-33: offset: 794624 length: 4096 flags: O..
-34: offset: 819200 length: 4096 flags: O..
-35: offset: 827392 length: 4096 flags: O..
-36: offset: 851968 length: 4096 flags: O..
-37: offset: 860160 length: 4096 flags: O..
-38: offset: 884736 length: 4096 flags: O..
-39: offset: 892928 length: 4096 flags: O..
-40: offset: 917504 length: 4096 flags: O..
-41: offset: 925696 length: 4096 flags: O..
-42: offset: 950272 length: 4096 flags: O..
-43: offset: 958464 length: 4096 flags: O..
-44: offset: 983040 length: 4096 flags: O..
-45: offset: 991232 length: 4096 flags: O..
-46: offset: 1015808 length: 4096 flags: O..
-47: offset: 1024000 length: 4096 flags: O..
-48: offset: 1048576 length: 4096 flags: O..
-49: offset: 1056768 length: 4096 flags: O..
-50: offset: 1081344 length: 4096 flags: O..
-51: offset: 1089536 length: 4096 flags: O..
-52: offset: 1114112 length: 4096 flags: O..
-53: offset: 1122304 length: 4096 flags: O..
-54: offset: 1146880 length: 4096 flags: O..
-55: offset: 1155072 length: 4096 flags: O..
-56: offset: 1179648 length: 4096 flags: O..
-57: offset: 1187840 length: 4096 flags: O..
-58: offset: 1212416 length: 4096 flags: O..
-59: offset: 1220608 length: 4096 flags: O..
-60: offset: 1245184 length: 4096 flags: O..
-61: offset: 1253376 length: 4096 flags: O..
-62: offset: 1277952 length: 4096 flags: O..
-63: offset: 1286144 length: 4096 flags: O..
-64: offset: 1310720 length: 4096 flags: O..
-65: offset: 1318912 length: 4096 flags: O..
-66: offset: 1343488 length: 4096 flags: O..
-67: offset: 1351680 length: 4096 flags: O..
-68: offset: 1376256 length: 4096 flags: O..
-69: offset: 1384448 length: 4096 flags: O..
-70: offset: 1409024 length: 4096 flags: O..
-71: offset: 1417216 length: 4096 flags: O..
-72: offset: 1441792 length: 4096 flags: O..
-73: offset: 1449984 length: 4096 flags: O..
-74: offset: 1474560 length: 4096 flags: O..
-75: offset: 1482752 length: 4096 flags: O..
-76: offset: 1507328 length: 4096 flags: O..
-77: offset: 1515520 length: 4096 flags: O..
-78: offset: 1540096 length: 4096 flags: O..
-79: offset: 1548288 length: 4096 flags: O..
-80: offset: 1572864 length: 4096 flags: O..
-81: offset: 1581056 length: 4096 flags: O..
-82: offset: 1605632 length: 4096 flags: O..
-83: offset: 1613824 length: 4096 flags: O..
-84: offset: 1638400 length: 4096 flags: O..
-85: offset: 1646592 length: 4096 flags: O..
-86: offset: 1671168 length: 4096 flags: O..
-87: offset: 1679360 length: 4096 flags: O..
-88: offset: 1703936 length: 4096 flags: O..
-89: offset: 1712128 length: 4096 flags: O..
-90: offset: 1736704 length: 4096 flags: O..
-91: offset: 1744896 length: 4096 flags: O..
-92: offset: 1769472 length: 4096 flags: O..
-93: offset: 1777664 length: 4096 flags: O..
-94: offset: 1802240 length: 4096 flags: O..
-95: offset: 1810432 length: 4096 flags: O..
-96: offset: 1835008 length: 4096 flags: O..
-97: offset: 1843200 length: 4096 flags: O..
-98: offset: 1867776 length: 4096 flags: O..
-99: offset: 1875968 length: 4096 flags: O..
-100: offset: 1900544 length: 4096 flags: O..
-101: offset: 1908736 length: 4096 flags: O..
-102: offset: 1933312 length: 4096 flags: O..
-103: offset: 1941504 length: 4096 flags: O..
-104: offset: 1966080 length: 4096 flags: O..
-105: offset: 1974272 length: 4096 flags: O..
-106: offset: 1998848 length: 4096 flags: O..
-107: offset: 2007040 length: 4096 flags: O..
-108: offset: 2031616 length: 4096 flags: O..
-109: offset: 2039808 length: 4096 flags: O..
-110: offset: 2064384 length: 4096 flags: O..
-111: offset: 2072576 length: 4096 flags: O..
-112: offset: 2097152 length: 4096 flags: O..
-113: offset: 2105344 length: 4096 flags: O..
-114: offset: 2129920 length: 4096 flags: O..
-115: offset: 2138112 length: 4096 flags: O..
-116: offset: 2162688 length: 4096 flags: O..
-117: offset: 2170880 length: 4096 flags: O..
-118: offset: 2195456 length: 4096 flags: O..
-119: offset: 2203648 length: 4096 flags: O..
-120: offset: 2228224 length: 4096 flags: O..
-121: offset: 2236416 length: 4096 flags: O..
-122: offset: 2260992 length: 4096 flags: O..
-123: offset: 2269184 length: 4096 flags: O..
-124: offset: 2293760 length: 4096 flags: O..
-125: offset: 2301952 length: 4096 flags: O..
-126: offset: 2326528 length: 4096 flags: O..
-127: offset: 2334720 length: 4096 flags: O..
-128: offset: 2359296 length: 4096 flags: O..
-129: offset: 2367488 length: 4096 flags: O..
-130: offset: 2392064 length: 4096 flags: O..
-131: offset: 2400256 length: 4096 flags: O..
-132: offset: 2424832 length: 4096 flags: O..
-133: offset: 2433024 length: 4096 flags: O..
-134: offset: 2457600 length: 4096 flags: O..
-135: offset: 2465792 length: 4096 flags: O..
-136: offset: 2490368 length: 4096 flags: O..
-137: offset: 2498560 length: 4096 flags: O..
-138: offset: 2523136 length: 4096 flags: O..
-139: offset: 2531328 length: 4096 flags: O..
-140: offset: 2555904 length: 4096 flags: O..
-141: offset: 2564096 length: 4096 flags: O..
-142: offset: 2588672 length: 4096 flags: O..
-143: offset: 2596864 length: 4096 flags: O..
-144: offset: 2621440 length: 4096 flags: O..
-145: offset: 2629632 length: 4096 flags: O..
-146: offset: 2654208 length: 4096 flags: O..
-147: offset: 2662400 length: 4096 flags: O..
-148: offset: 2686976 length: 4096 flags: O..
-149: offset: 2695168 length: 4096 flags: O..
-150: offset: 2719744 length: 4096 flags: O..
-151: offset: 2727936 length: 4096 flags: O..
-152: offset: 2752512 length: 4096 flags: O..
-153: offset: 2760704 length: 4096 flags: O..
-154: offset: 2785280 length: 4096 flags: O..
-155: offset: 2793472 length: 4096 flags: O..
-156: offset: 2818048 length: 4096 flags: O..
-157: offset: 2826240 length: 4096 flags: O..
-158: offset: 2850816 length: 4096 flags: O..
-159: offset: 2859008 length: 4096 flags: O..
-160: offset: 2883584 length: 4096 flags: O..
-161: offset: 2891776 length: 4096 flags: O..
-162: offset: 2916352 length: 4096 flags: O..
-163: offset: 2924544 length: 4096 flags: O..
-164: offset: 2949120 length: 4096 flags: O..
-165: offset: 2957312 length: 4096 flags: O..
-166: offset: 2981888 length: 4096 flags: O..
-167: offset: 2990080 length: 4096 flags: O..
-168: offset: 3014656 length: 4096 flags: O..
-169: offset: 3022848 length: 4096 flags: O..
-170: offset: 3047424 length: 4096 flags: O..
-171: offset: 3055616 length: 4096 flags: O..
-172: offset: 3080192 length: 4096 flags: O..
-173: offset: 3088384 length: 4096 flags: O..
-174: offset: 3112960 length: 4096 flags: O..
-175: offset: 3121152 length: 4096 flags: O..
-176: offset: 3145728 length: 4096 flags: O..
-177: offset: 3153920 length: 4096 flags: O..
-178: offset: 3178496 length: 4096 flags: O..
-179: offset: 3186688 length: 4096 flags: O..
-180: offset: 3211264 length: 4096 flags: O..
-181: offset: 3219456 length: 4096 flags: O..
-182: offset: 3244032 length: 4096 flags: O..
-183: offset: 3252224 length: 4096 flags: O..
-184: offset: 3276800 length: 4096 flags: O..
-185: offset: 3284992 length: 4096 flags: O..
-186: offset: 3309568 length: 4096 flags: O..
-187: offset: 3317760 length: 4096 flags: O..
-188: offset: 3342336 length: 4096 flags: O..
-189: offset: 3350528 length: 4096 flags: O..
-190: offset: 3375104 length: 4096 flags: O..
-191: offset: 3383296 length: 4096 flags: O..
-192: offset: 3407872 length: 4096 flags: O..
-193: offset: 3416064 length: 4096 flags: O..
-194: offset: 3440640 length: 4096 flags: O..
-195: offset: 3448832 length: 4096 flags: O..
-196: offset: 3473408 length: 4096 flags: O..
-197: offset: 3481600 length: 4096 flags: O..
-198: offset: 3506176 length: 4096 flags: O..
-199: offset: 3514368 length: 4096 flags: O..
-200: offset: 3538944 length: 4096 flags: O..
-201: offset: 3547136 length: 4096 flags: O..
-202: offset: 3571712 length: 4096 flags: O..
-203: offset: 3579904 length: 4096 flags: O..
-204: offset: 3604480 length: 4096 flags: O..
-205: offset: 3612672 length: 4096 flags: O..
-206: offset: 3637248 length: 4096 flags: O..
-207: offset: 3645440 length: 4096 flags: O..
-208: offset: 3670016 length: 4096 flags: O..
-209: offset: 3678208 length: 4096 flags: O..
-210: offset: 3702784 length: 4096 flags: O..
-211: offset: 3710976 length: 4096 flags: O..
-212: offset: 3735552 length: 4096 flags: O..
-213: offset: 3743744 length: 4096 flags: O..
-214: offset: 3768320 length: 4096 flags: O..
-215: offset: 3776512 length: 4096 flags: O..
-216: offset: 3801088 length: 4096 flags: O..
-217: offset: 3809280 length: 4096 flags: O..
-218: offset: 3833856 length: 4096 flags: O..
-219: offset: 3842048 length: 4096 flags: O..
-220: offset: 3866624 length: 4096 flags: O..
-221: offset: 3874816 length: 4096 flags: O..
-222: offset: 3899392 length: 4096 flags: O..
-223: offset: 3907584 length: 4096 flags: O..
-224: offset: 3932160 length: 4096 flags: O..
-225: offset: 3940352 length: 4096 flags: O..
-226: offset: 3964928 length: 4096 flags: O..
-227: offset: 3973120 length: 4096 flags: O..
-228: offset: 3997696 length: 4096 flags: O..
-229: offset: 4005888 length: 4096 flags: O..
-230: offset: 4030464 length: 4096 flags: O..
-231: offset: 4038656 length: 4096 flags: O..
-232: offset: 4063232 length: 4096 flags: O..
-233: offset: 4071424 length: 4096 flags: O..
-234: offset: 4096000 length: 4096 flags: O..
-235: offset: 4104192 length: 4096 flags: O..
-236: offset: 4128768 length: 4096 flags: O..
-237: offset: 4136960 length: 4096 flags: O..
-238: offset: 4161536 length: 4096 flags: O..
-239: offset: 4169728 length: 4096 flags: O.L
-extents: 240
-0: offset: 0 length: 1 flags: O..
-1: offset: 8 length: 1 flags: O..
-2: offset: 16 length: 1 flags: O..
-3: offset: 24 length: 1 flags: O..
-4: offset: 48 length: 1 flags: O..
-5: offset: 56 length: 1 flags: O..
-6: offset: 64 length: 1 flags: O..
-7: offset: 72 length: 1 flags: O..
-8: offset: 80 length: 1 flags: O..
-9: offset: 88 length: 1 flags: O..
-10: offset: 96 length: 1 flags: O..
-11: offset: 104 length: 1 flags: O..
-12: offset: 112 length: 1 flags: O..
-13: offset: 120 length: 1 flags: O..
-14: offset: 176 length: 1 flags: O..
-15: offset: 184 length: 1 flags: O..
-16: offset: 192 length: 1 flags: O..
-17: offset: 200 length: 1 flags: O..
-18: offset: 208 length: 1 flags: O..
-19: offset: 216 length: 1 flags: O..
-20: offset: 224 length: 1 flags: O..
-21: offset: 232 length: 1 flags: O..
-22: offset: 240 length: 1 flags: O..
-23: offset: 248 length: 1 flags: O..
-24: offset: 256 length: 1 flags: O..
-25: offset: 264 length: 1 flags: O..
-26: offset: 272 length: 1 flags: O..
-27: offset: 280 length: 1 flags: O..
-28: offset: 288 length: 1 flags: O..
-29: offset: 296 length: 1 flags: O..
-30: offset: 304 length: 1 flags: O..
-31: offset: 312 length: 1 flags: O..
-32: offset: 320 length: 1 flags: O..
-33: offset: 328 length: 1 flags: O..
-34: offset: 336 length: 1 flags: O..
-35: offset: 344 length: 1 flags: O..
-36: offset: 352 length: 1 flags: O..
-37: offset: 360 length: 1 flags: O..
-38: offset: 368 length: 1 flags: O..
-39: offset: 376 length: 1 flags: O..
-40: offset: 384 length: 1 flags: O..
-41: offset: 392 length: 1 flags: O..
-42: offset: 400 length: 1 flags: O..
-43: offset: 408 length: 1 flags: O..
-44: offset: 416 length: 1 flags: O..
-45: offset: 424 length: 1 flags: O..
-46: offset: 432 length: 1 flags: O..
-47: offset: 440 length: 1 flags: O..
-48: offset: 448 length: 1 flags: O..
-49: offset: 456 length: 1 flags: O..
-50: offset: 464 length: 1 flags: O..
-51: offset: 472 length: 1 flags: O..
-52: offset: 480 length: 1 flags: O..
-53: offset: 488 length: 1 flags: O..
-54: offset: 496 length: 1 flags: O..
-55: offset: 504 length: 1 flags: O..
-56: offset: 512 length: 1 flags: O..
-57: offset: 520 length: 1 flags: O..
-58: offset: 528 length: 1 flags: O..
-59: offset: 536 length: 1 flags: O..
-60: offset: 544 length: 1 flags: O..
-61: offset: 552 length: 1 flags: O..
-62: offset: 560 length: 1 flags: O..
-63: offset: 568 length: 1 flags: O..
-64: offset: 576 length: 1 flags: O..
-65: offset: 584 length: 1 flags: O..
-66: offset: 592 length: 1 flags: O..
-67: offset: 600 length: 1 flags: O..
-68: offset: 608 length: 1 flags: O..
-69: offset: 616 length: 1 flags: O..
-70: offset: 624 length: 1 flags: O..
-71: offset: 632 length: 1 flags: O..
-72: offset: 640 length: 1 flags: O..
-73: offset: 648 length: 1 flags: O..
-74: offset: 656 length: 1 flags: O..
-75: offset: 664 length: 1 flags: O..
-76: offset: 672 length: 1 flags: O..
-77: offset: 680 length: 1 flags: O..
-78: offset: 688 length: 1 flags: O..
-79: offset: 696 length: 1 flags: O..
-80: offset: 704 length: 1 flags: O..
-81: offset: 712 length: 1 flags: O..
-82: offset: 720 length: 1 flags: O..
-83: offset: 728 length: 1 flags: O..
-84: offset: 736 length: 1 flags: O..
-85: offset: 744 length: 1 flags: O..
-86: offset: 752 length: 1 flags: O..
-87: offset: 760 length: 1 flags: O..
-88: offset: 768 length: 1 flags: O..
-89: offset: 776 length: 1 flags: O..
-90: offset: 784 length: 1 flags: O..
-91: offset: 792 length: 1 flags: O..
-92: offset: 800 length: 1 flags: O..
-93: offset: 808 length: 1 flags: O..
-94: offset: 816 length: 1 flags: O..
-95: offset: 824 length: 1 flags: O..
-96: offset: 832 length: 1 flags: O..
-97: offset: 840 length: 1 flags: O..
-98: offset: 848 length: 1 flags: O..
-99: offset: 856 length: 1 flags: O..
-100: offset: 864 length: 1 flags: O..
-101: offset: 872 length: 1 flags: O..
-102: offset: 880 length: 1 flags: O..
-103: offset: 888 length: 1 flags: O..
-104: offset: 896 length: 1 flags: O..
-105: offset: 904 length: 1 flags: O..
-106: offset: 912 length: 1 flags: O..
-107: offset: 920 length: 1 flags: O..
-108: offset: 928 length: 1 flags: O..
-109: offset: 936 length: 1 flags: O..
-110: offset: 944 length: 1 flags: O..
-111: offset: 952 length: 1 flags: O..
-112: offset: 960 length: 1 flags: O..
-113: offset: 968 length: 1 flags: O..
-114: offset: 976 length: 1 flags: O..
-115: offset: 984 length: 1 flags: O..
-116: offset: 992 length: 1 flags: O..
-117: offset: 1000 length: 1 flags: O..
-118: offset: 1008 length: 1 flags: O..
-119: offset: 1016 length: 1 flags: O.L
-extents: 120
-extents: 0
@@ -1,6 +0,0 @@
-== setup
-== concurrent quota mod and check across mounts
-== verify quota rules are consistent after race
-== verify file creation still works under quota
-file visible on mount 1
-== cleanup
@@ -1,10 +0,0 @@
-== setup three files contributing to totl 8888.0.0
-== merge baseline into fs_root
-8888.0.0 = 42, 3
-== inject (+128, +2) unbalances totl 8888.0.0
-8888.0.0 = 170, 5
-== unlink f3 (value 32) produces a -32/-1 delta
-8888.0.0 = 138, 4
-== inject (-128, -2) restores accounting for the remaining files
-8888.0.0 = 10, 2
-== cleanup
@@ -1,3 +0,0 @@
-== setup
-expected 4681
-== cleanup
@@ -0,0 +1,882 @@
+Ran:
+generic/001
+generic/002
+generic/004
+generic/005
+generic/006
+generic/007
+generic/008
+generic/009
+generic/011
+generic/012
+generic/013
+generic/014
+generic/015
+generic/016
+generic/018
+generic/020
+generic/021
+generic/022
+generic/023
+generic/024
+generic/025
+generic/026
+generic/028
+generic/029
+generic/030
+generic/031
+generic/032
+generic/033
+generic/034
+generic/035
+generic/037
+generic/039
+generic/040
+generic/041
+generic/050
+generic/052
+generic/053
+generic/056
+generic/057
+generic/058
+generic/059
+generic/060
+generic/061
+generic/062
+generic/063
+generic/064
+generic/065
+generic/066
+generic/067
+generic/069
+generic/070
+generic/071
+generic/073
+generic/076
+generic/078
+generic/079
+generic/080
+generic/081
+generic/082
+generic/084
+generic/086
+generic/087
+generic/088
+generic/090
+generic/091
+generic/092
+generic/094
+generic/096
+generic/097
+generic/098
+generic/099
+generic/101
+generic/104
+generic/105
+generic/106
+generic/107
+generic/110
+generic/111
+generic/113
+generic/114
+generic/115
+generic/116
+generic/117
+generic/118
+generic/119
+generic/120
+generic/121
+generic/122
+generic/123
+generic/124
+generic/126
+generic/128
+generic/129
+generic/130
+generic/131
+generic/134
+generic/135
+generic/136
+generic/138
+generic/139
+generic/140
+generic/141
+generic/142
+generic/143
+generic/144
+generic/145
+generic/146
+generic/147
+generic/148
+generic/149
+generic/150
+generic/151
+generic/152
+generic/153
+generic/154
+generic/155
+generic/156
+generic/157
+generic/158
+generic/159
+generic/160
+generic/161
+generic/162
+generic/163
+generic/169
+generic/171
+generic/172
+generic/173
+generic/174
+generic/177
+generic/178
+generic/179
+generic/180
+generic/181
+generic/182
+generic/183
+generic/184
+generic/185
+generic/188
+generic/189
+generic/190
+generic/191
+generic/193
+generic/194
+generic/195
+generic/196
+generic/197
+generic/198
+generic/199
+generic/200
+generic/201
+generic/202
+generic/203
+generic/205
+generic/206
+generic/207
+generic/210
+generic/211
+generic/212
+generic/214
+generic/215
+generic/216
+generic/217
+generic/218
+generic/219
+generic/220
+generic/221
+generic/222
+generic/223
+generic/225
+generic/227
+generic/228
+generic/229
+generic/230
+generic/235
+generic/236
+generic/237
+generic/238
+generic/240
+generic/244
+generic/245
+generic/246
+generic/247
+generic/248
+generic/249
+generic/250
+generic/252
+generic/253
+generic/254
+generic/255
+generic/256
+generic/257
+generic/258
+generic/259
+generic/260
+generic/261
+generic/262
+generic/263
+generic/264
+generic/265
+generic/266
+generic/267
+generic/268
+generic/271
+generic/272
+generic/276
+generic/277
+generic/278
+generic/279
+generic/281
+generic/282
+generic/283
+generic/284
+generic/286
+generic/287
+generic/288
+generic/289
+generic/290
+generic/291
+generic/292
+generic/293
+generic/294
+generic/295
+generic/296
+generic/301
+generic/302
+generic/303
+generic/304
+generic/305
+generic/306
+generic/307
+generic/308
+generic/309
+generic/312
+generic/313
+generic/314
+generic/315
+generic/316
+generic/317
+generic/319
+generic/322
+generic/324
+generic/325
+generic/326
+generic/327
+generic/328
+generic/329
+generic/330
+generic/331
+generic/332
+generic/335
+generic/336
+generic/337
+generic/341
+generic/342
+generic/343
+generic/346
+generic/348
+generic/353
+generic/355
+generic/358
+generic/359
+generic/360
+generic/361
+generic/362
+generic/363
+generic/364
+generic/365
+generic/366
+generic/367
+generic/368
+generic/369
+generic/370
+generic/371
+generic/372
+generic/373
+generic/374
+generic/375
+generic/376
+generic/377
+generic/378
+generic/379
+generic/380
+generic/381
+generic/382
+generic/383
+generic/384
+generic/385
+generic/386
+generic/389
+generic/391
+generic/392
+generic/393
+generic/394
+generic/395
+generic/396
+generic/397
+generic/398
+generic/400
+generic/401
+generic/402
+generic/403
+generic/404
+generic/406
+generic/407
+generic/408
+generic/412
+generic/413
+generic/414
+generic/417
+generic/419
+generic/420
+generic/421
+generic/422
+generic/424
+generic/425
+generic/426
+generic/427
+generic/428
+generic/436
+generic/437
+generic/439
+generic/440
+generic/443
+generic/445
+generic/446
+generic/448
+generic/449
+generic/450
+generic/451
+generic/452
+generic/453
+generic/454
+generic/456
+generic/458
+generic/460
+generic/462
+generic/463
+generic/465
+generic/466
+generic/468
+generic/469
+generic/470
+generic/471
+generic/474
+generic/477
+generic/478
+generic/479
+generic/480
+generic/481
+generic/483
+generic/485
+generic/486
+generic/487
+generic/488
+generic/489
+generic/490
+generic/491
+generic/492
+generic/498
+generic/499
+generic/501
+generic/502
+generic/503
+generic/504
+generic/505
+generic/506
+generic/507
+generic/508
+generic/509
+generic/510
+generic/511
+generic/512
+generic/513
+generic/514
+generic/515
+generic/516
+generic/517
+generic/518
+generic/519
+generic/520
+generic/523
+generic/524
+generic/525
+generic/526
+generic/527
+generic/528
+generic/529
+generic/530
+generic/531
+generic/533
+generic/534
+generic/535
+generic/536
+generic/537
+generic/538
+generic/539
+generic/540
+generic/541
+generic/542
+generic/543
+generic/544
+generic/545
+generic/546
+generic/547
+generic/548
+generic/549
+generic/550
+generic/552
+generic/553
+generic/555
+generic/556
+generic/557
+generic/566
+generic/567
+generic/571
+generic/572
+generic/573
+generic/574
+generic/575
+generic/576
+generic/577
+generic/578
+generic/580
+generic/581
+generic/582
+generic/583
+generic/584
+generic/586
+generic/587
+generic/588
+generic/591
+generic/592
+generic/593
+generic/594
+generic/595
+generic/596
+generic/597
+generic/598
+generic/599
+generic/600
+generic/601
+generic/602
+generic/603
+generic/604
+generic/605
+generic/606
+generic/607
+generic/608
+generic/609
+generic/610
+generic/611
+generic/612
+generic/613
+generic/614
+generic/618
+generic/621
+generic/623
+generic/624
+generic/625
+generic/626
+generic/628
+generic/629
+generic/630
+generic/632
+generic/634
+generic/635
+generic/637
+generic/638
+generic/639
+generic/640
+generic/644
+generic/645
+generic/646
+generic/647
+generic/651
+generic/652
+generic/653
+generic/654
+generic/655
+generic/657
+generic/658
+generic/659
+generic/660
+generic/661
+generic/662
+generic/663
+generic/664
+generic/665
+generic/666
+generic/667
+generic/668
+generic/669
+generic/673
+generic/674
+generic/675
+generic/676
+generic/677
+generic/678
+generic/679
+generic/680
+generic/681
+generic/682
+generic/683
+generic/684
+generic/685
+generic/686
+generic/687
+generic/688
+generic/689
+shared/002
+shared/032
+Not
+run:
+generic/008
+generic/009
+generic/012
+generic/015
+generic/016
+generic/018
+generic/021
+generic/022
+generic/025
+generic/026
+generic/031
+generic/033
+generic/050
+generic/052
+generic/058
+generic/059
+generic/060
+generic/061
+generic/063
+generic/064
+generic/078
+generic/079
+generic/081
+generic/082
+generic/091
+generic/094
+generic/096
+generic/110
+generic/111
+generic/113
+generic/114
+generic/115
+generic/116
+generic/118
+generic/119
+generic/121
+generic/122
+generic/123
+generic/128
+generic/130
+generic/134
+generic/135
+generic/136
+generic/138
+generic/139
+generic/140
+generic/142
+generic/143
+generic/144
+generic/145
+generic/146
+generic/147
+generic/148
+generic/149
+generic/150
+generic/151
+generic/152
+generic/153
+generic/154
+generic/155
+generic/156
+generic/157
+generic/158
+generic/159
+generic/160
+generic/161
+generic/162
+generic/163
+generic/171
+generic/172
+generic/173
+generic/174
+generic/177
+generic/178
+generic/179
+generic/180
+generic/181
+generic/182
+generic/183
+generic/185
+generic/188
+generic/189
+generic/190
+generic/191
+generic/193
+generic/194
+generic/195
+generic/196
+generic/197
+generic/198
+generic/199
+generic/200
+generic/201
+generic/202
+generic/203
+generic/205
+generic/206
+generic/207
+generic/210
+generic/211
+generic/212
+generic/214
+generic/216
+generic/217
+generic/218
+generic/219
+generic/220
+generic/222
+generic/223
+generic/225
+generic/227
+generic/229
+generic/230
+generic/235
+generic/238
+generic/240
+generic/244
+generic/250
+generic/252
+generic/253
+generic/254
+generic/255
+generic/256
+generic/259
+generic/260
+generic/261
+generic/262
+generic/263
+generic/264
+generic/265
+generic/266
+generic/267
+generic/268
+generic/271
+generic/272
+generic/276
+generic/277
+generic/278
+generic/279
+generic/281
+generic/282
+generic/283
+generic/284
+generic/287
+generic/288
+generic/289
+generic/290
+generic/291
+generic/292
+generic/293
+generic/295
+generic/296
+generic/301
+generic/302
+generic/303
+generic/304
+generic/305
+generic/312
+generic/314
+generic/316
+generic/317
+generic/324
+generic/326
+generic/327
+generic/328
+generic/329
+generic/330
+generic/331
+generic/332
+generic/353
+generic/355
+generic/358
+generic/359
+generic/361
+generic/362
+generic/363
+generic/364
+generic/365
+generic/366
+generic/367
+generic/368
+generic/369
+generic/370
+generic/371
+generic/372
+generic/373
+generic/374
+generic/378
+generic/379
+generic/380
+generic/381
+generic/382
+generic/383
+generic/384
+generic/385
+generic/386
+generic/391
+generic/392
+generic/395
+generic/396
+generic/397
+generic/398
+generic/400
+generic/402
+generic/404
+generic/406
+generic/407
+generic/408
+generic/412
+generic/413
+generic/414
+generic/417
+generic/419
+generic/420
+generic/421
+generic/422
+generic/424
+generic/425
+generic/427
+generic/439
+generic/440
+generic/446
+generic/449
+generic/450
+generic/451
+generic/453
+generic/454
+generic/456
+generic/458
+generic/462
+generic/463
+generic/465
+generic/466
+generic/468
+generic/469
+generic/470
+generic/471
+generic/474
+generic/485
+generic/487
+generic/488
+generic/491
+generic/492
+generic/499
+generic/501
+generic/503
+generic/505
+generic/506
+generic/507
+generic/508
+generic/511
+generic/513
+generic/514
+generic/515
+generic/516
+generic/517
+generic/518
+generic/519
+generic/520
+generic/528
+generic/530
+generic/536
+generic/537
+generic/538
+generic/539
+generic/540
+generic/541
+generic/542
+generic/543
+generic/544
+generic/545
+generic/546
+generic/548
+generic/549
+generic/550
+generic/552
+generic/553
+generic/555
+generic/556
+generic/566
+generic/567
+generic/572
+generic/573
+generic/574
+generic/575
+generic/576
+generic/577
+generic/578
+generic/580
+generic/581
+generic/582
+generic/583
+generic/584
+generic/586
+generic/587
+generic/588
+generic/591
+generic/592
+generic/593
+generic/594
+generic/595
+generic/596
+generic/597
+generic/598
+generic/599
+generic/600
+generic/601
+generic/602
+generic/603
+generic/605
+generic/606
+generic/607
+generic/608
+generic/609
+generic/610
+generic/612
+generic/613
+generic/621
+generic/623
+generic/624
+generic/625
+generic/626
+generic/628
+generic/629
+generic/630
+generic/635
+generic/644
+generic/645
+generic/646
+generic/647
+generic/651
+generic/652
+generic/653
+generic/654
+generic/655
+generic/657
+generic/658
+generic/659
+generic/660
+generic/661
+generic/662
+generic/663
+generic/664
+generic/665
+generic/666
+generic/667
+generic/668
+generic/669
+generic/673
+generic/674
+generic/675
+generic/677
+generic/678
+generic/679
+generic/680
+generic/681
+generic/682
+generic/683
+generic/684
+generic/685
+generic/686
+generic/687
+generic/688
+generic/689
+shared/002
+shared/032
+Passed all 512 tests
@@ -56,7 +56,6 @@ $(basename $0) options:
              | only tests matching will be run.  Can be provided multiple
              | times
    -i        | Force removing and inserting the built scoutfs.ko module.
-    -l <nr>   | Loop each test <nr> times while passing, last run counts.
    -M <file> | Specify the filesystem's meta data device path that contains
              | the file system to be tested.  Will be clobbered by -m mkfs.
    -m        | Run mkfs on the device before mounting and running
@@ -70,7 +69,6 @@ $(basename $0) options:
    -r <dir>  | Specify the directory in which to store results of
              | test runs.  The directory will be created if it doesn't
              | exist.  Previous results will be deleted as each test runs.
-    -R        | shuffle the test order randomly using shuf
    -s        | Skip git repo checkouts.
    -t        | Enabled trace events that match the given glob argument.
              | Multiple options enable multiple globbed events.
@@ -91,8 +89,6 @@ done
 # set some T_ defaults
 T_TRACE_DUMP="0"
 T_TRACE_PRINTK="0"
-T_PORT_START="19700"
-T_LOOP_ITER="1"

 # array declarations to be able to use array ops
 declare -a T_TRACE_GLOB
@@ -133,12 +129,6 @@ while true; do
 	-i)
 		T_INSMOD="1"
 		;;
-	-l)
-	        test -n "$2" || die "-l must have a nr iterations argument"
-		test "$2" -eq "$2" 2>/dev/null || die "-l <nr> argument must be an integer"
-		T_LOOP_ITER="$2"
-		shift
-		;;
 	-M)
 	        test -n "$2" || die "-z must have meta device file argument"
 	        T_META_DEVICE="$2"
@@ -174,9 +164,6 @@ while true; do
 		T_RESULTS="$2"
 		shift
 		;;
-	-R)
-		T_SHUF="1"
-		;;
 	-s)
 	        T_SKIP_CHECKOUT="1"
 		;;
@@ -274,37 +261,13 @@ for e in T_META_DEVICE T_DATA_DEVICE T_EX_META_DEV T_EX_DATA_DEV T_KMOD T_RESULT
 	eval $e=\"$(readlink -f "${!e}")\"
 done

-# try and check ports, but not necessary
-T_TEST_PORT="$T_PORT_START"
-T_SCRATCH_PORT="$((T_PORT_START + 100))"
-T_DEV_PORT="$((T_PORT_START + 200))"
-read local_start local_end < /proc/sys/net/ipv4/ip_local_port_range
-if [ -n "$local_start" -a -n "$local_end" -a "$local_start" -lt "$local_end" ]; then
-	if [ ! "$T_DEV_PORT" -lt "$local_start" -a ! "$T_TEST_PORT" -gt "$local_end" ]; then
-		die "listening port range $T_TEST_PORT - $T_DEV_PORT is within local dynamic port range $local_start - $local_end in /proc/sys/net/ipv4/ip_local_port_range"
-	fi
-fi
-
-# permute sequence?
-T_SEQUENCE=sequence
-if [ -n "$T_SHUF" ]; then
-	msg "shuffling test order"
-	shuf sequence -o sequence.shuf
-	# keep xfstests at the end
-	if grep -q 'xfstests.sh' sequence.shuf ; then
-		sed -i '/xfstests.sh/d' sequence.shuf
-		echo "xfstests.sh" >> sequence.shuf
-	fi
-	T_SEQUENCE=sequence.shuf
-fi
-
 # include everything by default
 test -z "$T_INCLUDE" && T_INCLUDE="-e '.*'"
 # (quickly) exclude nothing by default
 test -z "$T_EXCLUDE" && T_EXCLUDE="-e '\Zx'"

 # eval to strip re ticks but not expand
-tests=$(grep -v "^#" $T_SEQUENCE |
+tests=$(grep -v "^#" sequence |
 	eval grep "$T_INCLUDE" | eval grep -v "$T_EXCLUDE")
 test -z "$tests" && \
 	die "no tests found by including $T_INCLUDE and excluding $T_EXCLUDE"
@@ -383,7 +346,7 @@ fi
 quo=""
 if [ -n "$T_MKFS" ]; then
 	for i in $(seq -0 $((T_QUORUM - 1))); do
-		quo="$quo -Q $i,127.0.0.1,$((T_TEST_PORT + i))"
+		quo="$quo -Q $i,127.0.0.1,$((42000 + i))"
 	done

 	msg "making new filesystem with $T_QUORUM quorum members"
@@ -400,8 +363,7 @@ if [ -n "$T_INSMOD" ]; then
 fi

 if [ -n "$T_TRACE_MULT" ]; then
-#	orig_trace_size=$(cat /sys/kernel/debug/tracing/buffer_size_kb)
-	orig_trace_size=1408
+	orig_trace_size=$(cat /sys/kernel/debug/tracing/buffer_size_kb)
 	mult_trace_size=$((orig_trace_size * T_TRACE_MULT))
 	msg "increasing trace buffer size from $orig_trace_size KiB to $mult_trace_size KiB"
 	echo $mult_trace_size > /sys/kernel/debug/tracing/buffer_size_kb
@@ -439,30 +401,6 @@ cmd grep .  /sys/kernel/debug/tracing/options/trace_printk \
 	    /sys/kernel/debug/tracing/buffer_size_kb \
 	    /proc/sys/kernel/ftrace_dump_on_oops

-# we can record pids to kill as we exit, we kill in reverse added order
-atexit_kill_pids=""
-add_atexit_kill_pid()
-{
-	atexit_kill_pids="$1 $atexit_kill_pids"
-}
-atexit_kill()
-{
-	local pid
-
-	# suppress bg function exited messages
-	exec {ERR}>&2 2>/dev/null
-
-	for pid in $atexit_kill_pids; do
-		if test -e "/proc/$pid/status" ; then
-			kill "$pid"
-			wait "$pid"
-		fi
-	done
-
-	exec 2>&$ERR {ERR}>&-
-}
-trap atexit_kill EXIT
-
 #
 # Build a fenced config that runs scripts out of the repository rather
 # than the default system directory
@@ -476,46 +414,26 @@ EOF
 export SCOUTFS_FENCED_CONFIG_FILE="$conf"
 T_FENCED_LOG="$T_RESULTS/fenced.log"

+#
+# Run the agent in the background, log its output, and kill it if we
+# exit
+#
+fenced_log()
+{
+	echo "[$(timestamp)] $*" >> "$T_FENCED_LOG"
+}
+fenced_pid=""
+kill_fenced()
+{
+	if test -n "$fenced_pid" -a -d "/proc/$fenced_pid" ; then
+		fenced_log "killing fenced pid $fenced_pid"
+		kill "$fenced_pid"
+	fi
+}
+trap kill_fenced EXIT
 $T_UTILS/fenced/scoutfs-fenced > "$T_FENCED_LOG" 2>&1 &
 fenced_pid=$!
-add_atexit_kill_pid $fenced_pid
-
-#
-# some critical failures will cause fs operations to hang.  We can watch
-# for evidence of them and cause the system to crash, at least.
-#
-crash_monitor()
-{
-	local bad=0
-
-	while sleep 1; do
-		if dmesg | grep -q "inserting extent.*overlaps existing"; then
-			echo "run-tests monitor saw overlapping extent message"
-			bad=1
-		fi
-
-		if dmesg | grep -q "error indicated by fence action" ; then
-			echo "run-tests monitor saw fence agent error message"
-			bad=1
-		fi
-
-		if [ ! -e "/proc/${fenced_pid}/status" ]; then
-			echo "run-tests monitor didn't see fenced pid $fenced_pid /proc dir"
-			bad=1
-		fi
-
-		if [ "$bad" != 0 ]; then
-			echo "run-tests monitor syncing and triggering crash"
-			# hail mary, the sync could well hang
-			(echo s > /proc/sysrq-trigger) &
-			sleep 5
-			echo c > /proc/sysrq-trigger
-			exit 1
-		fi
-	done
-}
-crash_monitor &
-add_atexit_kill_pid $!
+fenced_log "started fenced pid $fenced_pid in the background"

 # setup dm tables
 echo "0 $(blockdev --getsz $T_META_DEVICE) linear $T_META_DEVICE 0" > \
@@ -588,7 +506,7 @@ fi
 . funcs/filter.sh

 # give tests access to built binaries in src/, prefer over installed
-export PATH="$PWD/src:$PATH"
+PATH="$PWD/src:$PATH"

 msg "running tests"
 > "$T_RESULTS/skip.log"
@@ -608,113 +526,101 @@ for t in $tests; do
 	t="tests/$t"
 	test_name=$(basename "$t" | sed -e 's/.sh$//')

+	# create a temporary dir and file path for the test
+	T_TMPDIR="$T_RESULTS/tmp/$test_name"
+	T_TMP="$T_TMPDIR/tmp"
+	cmd rm -rf "$T_TMPDIR"
+	cmd mkdir -p "$T_TMPDIR"
+
+	# create a test name dir in the fs, clean up old data as needed
+	T_DS=""
+	for i in $(seq 0 $((T_NR_MOUNTS - 1))); do
+		dir="${T_M[$i]}/test/$test_name"
+
+		test $i == 0 && (
+			test -d "$dir" && cmd rm -rf "$dir"
+			cmd mkdir -p "$dir"
+		)
+
+		eval T_D$i=$dir
+		T_D[$i]=$dir
+		T_DS+="$dir "
+	done
+
+	# export all our T_ variables
+	for v in ${!T_*}; do
+		eval export $v
+	done
+	export PATH # give test access to scoutfs binary
+
+	# prepare to compare output to golden output
+	test -e "$T_RESULTS/output" || cmd mkdir -p "$T_RESULTS/output"
+	out="$T_RESULTS/output/$test_name"
+	> "$T_TMPDIR/status.msg"
+	golden="golden/$test_name"
+
 	# get stats from previous pass
 	last="$T_RESULTS/last-passed-test-stats"
 	stats=$(grep -s "^$test_name " "$last" | cut -d " " -f 2-)
 	test -n "$stats" && stats="last: $stats"
+
 	printf "  %-30s $stats" "$test_name"

 	# mark in dmesg as to what test we are running
 	echo "run scoutfs test $test_name" > /dev/kmsg

-	# let the test get at its extra files
-	T_EXTRA="$T_TESTS/extra/$test_name"
+	# record dmesg before
+	dmesg | t_filter_dmesg > "$T_TMPDIR/dmesg.before"

-	for iter in $(seq 1 $T_LOOP_ITER); do
+	# give tests stdout and compared output on specific fds
+	exec 6>&1
+	exec 7>$out

-		# create a temporary dir and file path for the test
-		T_TMPDIR="$T_RESULTS/tmp/$test_name"
-		T_TMP="$T_TMPDIR/tmp"
-		cmd rm -rf "$T_TMPDIR"
-		cmd mkdir -p "$T_TMPDIR"
+	# run the test with access to our functions
+	start_secs=$SECONDS
+	bash -c "for f in funcs/*.sh; do . \$f; done; . $t" >&7 2>&1
+	sts="$?"
+	log "test $t exited with status $sts"
+	stats="$((SECONDS - start_secs))s"

-		# assign scratch mount point in temporary dir
-		T_MSCR="$T_TMPDIR/scratch"
+	# close our weird descriptors
+	exec 6>&-
+	exec 7>&-

-		# create a test name dir in the fs, clean up old data as needed
-		T_DS=""
-		for i in $(seq 0 $((T_NR_MOUNTS - 1))); do
-			dir="${T_M[$i]}/test/$test_name"
-
-			test $i == 0 && (
-				test -d "$dir" && cmd rm -rf "$dir"
-				cmd mkdir -p "$dir"
-			)
-
-			eval T_D$i=$dir
-			T_D[$i]=$dir
-			T_DS+="$dir "
-		done
-
-		# export all our T_ variables
-		for v in ${!T_*}; do
-			eval export $v
-		done
-
-		# prepare to compare output to golden output
-		test -e "$T_RESULTS/output" || cmd mkdir -p "$T_RESULTS/output"
-		out="$T_RESULTS/output/$test_name"
-		> "$T_TMPDIR/status.msg"
-		golden="golden/$test_name"
-
-		# record dmesg before
-		dmesg | t_filter_dmesg > "$T_TMPDIR/dmesg.before"
-
-		# give tests stdout and compared output on specific fds
-		exec 6>&1
-		exec 7>$out
-
-		# run the test with access to our functions
-		start_secs=$SECONDS
-		bash -c "for f in funcs/*.sh; do . \$f; done; . $t" >&7 2>&1
-		sts="$?"
-		log "test $t exited with status $sts"
-		stats="$((SECONDS - start_secs))s"
-
-		# close our weird descriptors
-		exec 6>&-
-		exec 7>&-
-
-		# compare output if the test returned passed status
-		if [ "$sts" == "$T_PASS_STATUS" ]; then
-			if [ ! -e "$golden" ]; then
-				message="no golden output"
-				sts=$T_FAIL_STATUS
-			elif ! cmp -s "$golden" "$out"; then 
-				message="output differs"
-				sts=$T_FAIL_STATUS
-				diff -u "$golden" "$out" >> "$T_RESULTS/fail.log"
-			fi
-		else
-			# get message from t_*() functions
-			message=$(cat "$T_TMPDIR/status.msg")
-		fi
-
-		# see if anything unexpected was added to dmesg
-		if [ "$sts" == "$T_PASS_STATUS" ]; then
-			dmesg | t_filter_dmesg > "$T_TMPDIR/dmesg.after"
-			diff --old-line-format="" --unchanged-line-format="" \
-				"$T_TMPDIR/dmesg.before" "$T_TMPDIR/dmesg.after" | \
-				grep -v '^$' > "$T_TMPDIR/dmesg.new"
-
-			if [ -s "$T_TMPDIR/dmesg.new" ]; then
-				message="unexpected messages in dmesg"
-				sts=$T_FAIL_STATUS
-				cat "$T_TMPDIR/dmesg.new" >> "$T_RESULTS/fail.log"
-			fi
-		fi
-
-		# record unknown exit status
-		if [ "$sts" -lt "$T_FIRST_STATUS" -o "$sts" -gt "$T_LAST_STATUS" ]; then
-			message="unknown status: $sts"
+	# compare output if the test returned passed status
+	if [ "$sts" == "$T_PASS_STATUS" ]; then
+		if [ ! -e "$golden" ]; then
+			message="no golden output"
 			sts=$T_FAIL_STATUS
+		elif ! cmp -s "$golden" "$out"; then 
+			message="output differs"
+			sts=$T_FAIL_STATUS
+			diff -u "$golden" "$out" >> "$T_RESULTS/fail.log"
 		fi
+	else
+		# get message from t_*() functions
+		message=$(cat "$T_TMPDIR/status.msg")
+	fi

-		# stop looping if we didn't pass
-		if [ "$sts" != "$T_PASS_STATUS" ]; then
-			break;
+	# see if anything unexpected was added to dmesg
+	if [ "$sts" == "$T_PASS_STATUS" ]; then
+		dmesg | t_filter_dmesg > "$T_TMPDIR/dmesg.after"
+		diff --old-line-format="" --unchanged-line-format="" \
+			"$T_TMPDIR/dmesg.before" "$T_TMPDIR/dmesg.after" > \
+			"$T_TMPDIR/dmesg.new"
+
+		if [ -s "$T_TMPDIR/dmesg.new" ]; then
+			message="unexpected messages in dmesg"
+			sts=$T_FAIL_STATUS
+			cat "$T_TMPDIR/dmesg.new" >> "$T_RESULTS/fail.log"
 		fi
-	done
+	fi
+
+	# record unknown exit status
+	if [ "$sts" -lt "$T_FIRST_STATUS" -o "$sts" -gt "$T_LAST_STATUS" ]; then
+		message="unknown status: $sts"
+		sts=$T_FAIL_STATUS
+	fi

 	# show and record the result of the test
 	if [ "$sts" == "$T_PASS_STATUS" ]; then
@@ -2,8 +2,6 @@ export-get-name-parent.sh
 basic-block-counts.sh
 basic-bad-mounts.sh
 basic-posix-acl.sh
-basic-acl-consistency.sh
-basic-nfs.sh
 inode-items-updated.sh
 simple-inode-index.sh
 simple-staging.sh
@@ -12,7 +10,6 @@ simple-readdir.sh
 get-referring-entries.sh
 fallocate.sh
 basic-truncate.sh
-punch-offline.sh
 data-prealloc.sh
 setattr_more.sh
 offline-extent-waiting.sh
@@ -27,11 +24,7 @@ srch-basic-functionality.sh
 simple-xattr-unit.sh
 retention-basic.sh
 totl-xattr-tag.sh
-basic-xattr-indx.sh
 quota.sh
-totl-merge-read.sh
-quota-invalidate-race.sh
-totl-delta-inject.sh
 lock-refleak.sh
 lock-shrink-consistency.sh
 lock-shrink-read-race.sh
@@ -55,7 +48,6 @@ setup-error-teardown.sh
 resize-devices.sh
 change-devices.sh
 fence-and-reclaim.sh
-orphan-log-trees.sh
 quorum-heartbeat-timeout.sh
 orphan-inodes.sh
 mount-unmount-race.sh
@@ -19,7 +19,6 @@
 #include <sys/types.h>
 #include <stdio.h>
 #include <sys/stat.h>
-#include <inttypes.h>
 #include <fcntl.h>
 #include <unistd.h>
 #include <stdlib.h>
@@ -30,7 +29,7 @@
 #include <errno.h>

 static int size = 0;
-static int duration = 0;
+static int count = 0; /* XXX make this duration instead */

 struct thread_info {
 	int nr;
@@ -42,8 +41,6 @@ static void *run_test_func(void *ptr)
 	void *buf = NULL;
 	char *addr = NULL;
 	struct thread_info *tinfo = ptr;
-	uint64_t seconds = 0;
-	struct timespec ts;
 	int c = 0;
 	int fd;
 	ssize_t read, written, ret;
@@ -64,15 +61,9 @@ static void *run_test_func(void *ptr)

 	usleep(100000); /* 0.1sec to allow all threads to start roughly at the same time */

-	clock_gettime(CLOCK_REALTIME, &ts); /* record start time */
-	seconds = ts.tv_sec + duration;
-
 	for (;;) {
-		if (++c % 16 == 0) {
-			clock_gettime(CLOCK_REALTIME, &ts);
-			if (ts.tv_sec >= seconds)
-				break;
-		}
+		if (++c > count)
+			break;

 		switch (rand() % 4) {
 		case 0: /* pread */
@@ -108,8 +99,6 @@ static void *run_test_func(void *ptr)
 			memcpy(addr, buf, size); /* noerr */
 			break;
 		}
-
-		usleep(10000);
 	}

 	munmap(addr, size);
@@ -131,7 +120,7 @@ int main(int argc, char **argv)
 	int i;

 	if (argc != 8) {
-		fprintf(stderr, "%s requires 7 arguments - size duration file1 file2 file3 file4 file5\n", argv[0]);
+		fprintf(stderr, "%s requires 7 arguments - size count file1 file2 file3 file4 file5\n", argv[0]);
 		exit(-1);
 	}

@@ -141,9 +130,9 @@ int main(int argc, char **argv)
 		exit(-1);
 	}

-	duration = atoi(argv[2]);
-	if (duration < 0) {
-		fprintf(stderr, "invalid duration, must be greater than or equal to 0\n");
+	count = atoi(argv[2]);
+	if (count < 0) {
+		fprintf(stderr, "invalid count, must be greater than or equal to 0\n");
 		exit(-1);
 	}

@@ -1,121 +0,0 @@
-/*
- * Test helper that calls SCOUTFS_IOC_INJECT_TOTL_DELTA to seed
- * arbitrary totl deltas.
- *
- * Copyright (C) 2026 Versity Software, Inc.  All rights reserved.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public
- * License v2 as published by the Free Software Foundation.
- */
-
-#ifndef _GNU_SOURCE
-#define _GNU_SOURCE
-#endif
-#include <unistd.h>
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-#include <stdint.h>
-#include <inttypes.h>
-#include <fcntl.h>
-#include <errno.h>
-#include <sys/ioctl.h>
-#include <linux/types.h>
-
-#include "ioctl.h"
-
-static void usage(const char *prog)
-{
-	fprintf(stderr,
-		"Usage: %s <mountpoint> <a>.<b>.<c> <total> <count>\n",
-		prog);
-	exit(2);
-}
-
-static int parse_s64(const char *s, int64_t *out)
-{
-	char *end;
-	int64_t v;
-
-	errno = 0;
-	v = strtoll(s, &end, 0);
-	if (errno || *end != '\0' || end == s)
-		return -1;
-	*out = v;
-	return 0;
-}
-
-/*
- * Parse "<a>.<b>.<c>" into abc[0..2] (skxt_a, skxt_b, skxt_c).  Each
- * component must be a non-empty unsigned base-0 integer.
- */
-static int parse_dotted_name(const char *s, uint64_t abc[3])
-{
-	const char *p = s;
-	char *end;
-	int i;
-
-	for (i = 0; i < 3; i++) {
-		if (*p == '\0' || *p == '.')
-			return -1;
-		errno = 0;
-		abc[i] = strtoull(p, &end, 0);
-		if (errno || end == p)
-			return -1;
-
-		if (i < 2) {
-			if (*end != '.')
-				return -1;
-			p = end + 1;
-		} else {
-			if (*end != '\0')
-				return -1;
-		}
-	}
-	return 0;
-}
-
-int main(int argc, char **argv)
-{
-	struct scoutfs_ioctl_inject_totl_delta itd = {{0,}};
-	uint64_t abc[3];
-	int64_t total, count;
-	int fd;
-	int ret;
-
-	if (argc != 5)
-		usage(argv[0]);
-
-	if (parse_dotted_name(argv[2], abc) ||
-	    parse_s64(argv[3], &total) ||
-	    parse_s64(argv[4], &count)) {
-		fprintf(stderr, "could not parse arguments\n");
-		usage(argv[0]);
-	}
-
-	itd.name[0] = abc[0];
-	itd.name[1] = abc[1];
-	itd.name[2] = abc[2];
-	itd.total = total;
-	itd.count = count;
-
-	fd = open(argv[1], O_RDONLY | O_DIRECTORY);
-	if (fd < 0) {
-		fprintf(stderr, "open(%s): %s\n", argv[1], strerror(errno));
-		return 1;
-	}
-
-	ret = ioctl(fd, SCOUTFS_IOC_INJECT_TOTL_DELTA, &itd);
-	if (ret < 0) {
-		fprintf(stderr,
-			"INJECT_TOTL_DELTA(%" PRIu64 ".%" PRIu64 ".%" PRIu64
-			", total=%" PRId64 ", count=%" PRId64 "): %s\n",
-			abc[0], abc[1], abc[2], total, count, strerror(errno));
-		close(fd);
-		return 1;
-	}
-
-	close(fd);
-	return 0;
-}
@@ -1,117 +0,0 @@
-
-#
-# Test basic clustered posix acl consistency.
-#
-
-t_require_commands getfacl setfacl
-
-GETFACL="getfacl --absolute-names"
-
-filter_scratch() {
-	sed "s@$T_MSCR@t_mscr@g"
-}
-
-acl_compare()
-{
-	diff -u - <($GETFACL $T_MSCR/data/dir_a/dir_b | filter_scratch) <<EOF1
-# file: t_mscr/data/dir_a/dir_b
-# owner: t_usr_3
-# group: t_grp_3
-# flags: -s-
-user::rwx
-group::rwx
-group:t_grp_2:r-x
-mask::rwx
-other::---
-default:user::rwx
-default:group::rwx
-default:group:t_grp_2:r-x
-default:group:t_grp_3:rwx
-default:mask::rwx
-default:other::---
-
-EOF1
-
-	test $? -eq 0 || t_fail "dir_b differs"
-
-	diff -u - <($GETFACL -p $T_MSCR/data/dir_a/dir_b/dir_c/dir_d | filter_scratch) <<EOF3
-# file: t_mscr/data/dir_a/dir_b/dir_c/dir_d
-# owner: t_usr_1
-# group: t_grp_1
-# flags: -s-
-user::rwx
-group::rwx
-group:t_grp_2:r-x
-mask::rwx
-other::---
-default:user::rwx
-default:group::rwx
-default:group:t_grp_2:r-x
-default:group:t_grp_3:rwx
-default:mask::rwx
-default:other::---
-
-EOF3
-	test $? -eq 0 || t_fail "dir_d differs"
-
-	diff -u - <($GETFACL $T_MSCR/data/dir_a/dir_b/dir_c | filter_scratch) <<EOF2
-# file: t_mscr/data/dir_a/dir_b/dir_c
-# owner: t_usr_3
-# group: t_grp_2
-# flags: -s-
-user::rwx
-group::rwx
-group:t_grp_2:r-x
-mask::rwx
-other::---
-default:user::rwx
-default:group::rwx
-default:group:t_grp_2:r-x
-default:group:t_grp_3:rwx
-default:mask::rwx
-default:other::---
-
-EOF2
-	test $? -eq 0 || t_fail "dir_c differs"
-}
-echo "== make scratch fs"
-t_scratch_mkfs
-t_scratch_mount
-
-rm -rf $T_MSCR/data
-
-echo "== create uid/gids"
-groupadd -g 7101 t_grp_1 > /dev/null 2>&1
-useradd -g 7101 -u 7101 t_usr_1 > /dev/null 2>&1
-groupadd -g 7102 t_grp_2 > /dev/null 2>&1
-groupadd -g 7103 t_grp_3 > /dev/null 2>&1
-useradd -g 7103 -u 7103 t_usr_3 > /dev/null 2>&1
-
-echo "== set acls and permissions"
-mkdir -p $T_MSCR/data/dir_a/dir_b
-chown t_usr_3:t_grp_3 $T_MSCR/data/dir_a/dir_b
-chmod 2770 $T_MSCR/data/dir_a/dir_b
-setfacl -m g:t_grp_2:rx $T_MSCR/data/dir_a/dir_b
-setfacl -m d:g:t_grp_2:rx $T_MSCR/data/dir_a/dir_b
-setfacl -m d:g:t_grp_3:rwx $T_MSCR/data/dir_a/dir_b
-
-mkdir -p $T_MSCR/data/dir_a/dir_b/dir_c
-chown t_usr_3:t_grp_2 $T_MSCR/data/dir_a/dir_b/dir_c
-setfacl -x g:t_grp_3 $T_MSCR/data/dir_a/dir_b/dir_c
-
-mkdir -p $T_MSCR/data/dir_a/dir_b/dir_c/dir_d
-chown t_usr_1:t_grp_1 $T_MSCR/data/dir_a/dir_b/dir_c/dir_d
-setfacl -x g:t_grp_3 $T_MSCR/data/dir_a/dir_b/dir_c/dir_d
-
-echo "== compare output"
-acl_compare
-
-echo "== drop caches and compare again"
-sync
-echo 3 > /proc/sys/vm/drop_caches
-acl_compare
-
-echo "== cleanup scratch fs"
-t_scratch_umount
-
-t_pass
@@ -12,22 +12,25 @@ mount_fail()
 }

 echo "== prepare devices, mount point, and logs"
-t_scratch_mkfs
+SCR="$T_TMPDIR/mnt.scratch"
+mkdir -p "$SCR"
 > $T_TMP.mount.out
+scoutfs mkfs -f -Q 0,127.0.0.1,53000 "$T_EX_META_DEV" "$T_EX_DATA_DEV" > $T_TMP.mkfs.out 2>&1 \
+	|| t_fail "mkfs failed"

 echo "== bad devices, bad options"
-mount_fail -o _bad /dev/null /dev/null "$T_MSCR"
+mount_fail -o _bad /dev/null /dev/null "$SCR"

 echo "== swapped devices"
-mount_fail -o metadev_path=$T_EX_DATA_DEV,quorum_slot_nr=0 "$T_EX_META_DEV" "$T_MSCR"
+mount_fail -o metadev_path=$T_EX_DATA_DEV,quorum_slot_nr=0 "$T_EX_META_DEV" "$SCR"

 echo "== both meta devices"
-mount_fail -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 "$T_EX_META_DEV" "$T_MSCR"
+mount_fail -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 "$T_EX_META_DEV" "$SCR"

 echo "== both data devices"
-mount_fail -o metadev_path=$T_EX_DATA_DEV,quorum_slot_nr=0 "$T_EX_DATA_DEV" "$T_MSCR"
+mount_fail -o metadev_path=$T_EX_DATA_DEV,quorum_slot_nr=0 "$T_EX_DATA_DEV" "$SCR"

 echo "== good volume, bad option and good options"
-mount_fail -o _bad,metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 "$T_EX_DATA_DEV" "$T_MSCR"
+mount_fail -o _bad,metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 "$T_EX_DATA_DEV" "$SCR" 

 t_pass
@@ -1,86 +0,0 @@
-#
-# Test basic scoutfs-nfs interactions:
-# - read/write
-# - stage/release and data wait
-# - nfs setacl/getacl mapping
-#
-
-t_require_commands scoutfs setfacl getfacl exportfs mount.nfs umount \
-		   stat dd cmp systemctl
-
-systemctl start nfs-server >> "$T_TMPDIR/nfs.log" 2>&1 || \
-	t_skip "nfs-server not available"
-
-# Keep file creation modes deterministic for the ACL golden output.
-umask 022
-
-EXPORT_OPTS="rw,async,no_root_squash,no_subtree_check,fsid=42"
-NFS_MNT="$T_TMP.nfs"
-NFS_DIR="$NFS_MNT/test/basic-nfs"
-
-filter() { sed "s@$T_TMPDIR@T_TMPDIR@g" | t_filter_fs; }
-gf() { getfacl -n --omit-header "$@" 2>/dev/null; }
-
-teardown_nfs()
-{
-	(
-		umount "$NFS_MNT"
-		exportfs -u "127.0.0.1:$T_M0"
-		exportfs -f
-		systemctl stop nfs-server
-		rmdir "$NFS_MNT"
-	) >> "$T_TMPDIR/nfs.log" 2>&1
-}
-trap teardown_nfs EXIT
-
-exportfs -u "127.0.0.1:$T_M0" >> "$T_TMPDIR/nfs.log" 2>&1 || true
-t_quiet mkdir -p "$NFS_MNT"
-exportfs -o "$EXPORT_OPTS" "127.0.0.1:$T_M0" >> "$T_TMPDIR/nfs.log" 2>&1
-mount.nfs -o vers=3,noac,actimeo=0 "127.0.0.1:$T_M0" "$NFS_MNT" >> "$T_TMPDIR/nfs.log" 2>&1
-
-test -d "$NFS_DIR" || t_fail "test dir $NFS_DIR not visible over NFS"
-
-echo "== write via NFS, read both sides"
-dd if=/dev/urandom bs=4096 count=1 of="$T_TMP.data" status=none
-cp "$T_TMP.data" "$NFS_DIR/file"
-cmp "$T_TMP.data" "$T_D0/file"
-cmp "$T_TMP.data" "$NFS_DIR/file"
-
-echo "== POSIX ACL set via NFS, read both sides"
-setfacl -m u:22222:rw "$NFS_DIR/file" 2>&1 | filter
-gf "$NFS_DIR/file"
-gf "$T_D0/file"
-
-echo "== POSIX ACL set on scoutfs, read via NFS"
-setfacl -m g:44444:r "$T_D0/file" 2>&1 | filter
-gf "$NFS_DIR/file"
-
-echo "== default ACL inheritance via NFS"
-mkdir "$NFS_DIR/d"
-setfacl -d -m u:22222:rwx "$NFS_DIR/d" 2>&1 | filter
-touch "$NFS_DIR/d/child"
-gf "$NFS_DIR/d/child"
-
-echo "== NFS read demand-stages a released file"
-dd if=/dev/urandom bs=4096 count=1 of="$T_TMP.big" status=none
-cp "$T_TMP.big" "$T_D0/big"
-sync
-vers=$(scoutfs stat -s data_version "$T_D0/big")
-t_quiet scoutfs release "$T_D0/big" -V "$vers" -o 0 -l 4K
-
-# NFS read against the offline file blocks in scoutfs_read waiting
-# for the data to come back online.
-cat "$NFS_DIR/big" > "$T_TMP.read" &
-read_pid=$!
-sleep 1
-scoutfs data-waiting -B 0 -I 0 -p "$T_D0" | wc -l
-
-t_quiet scoutfs stage "$T_TMP.big" "$T_D0/big" -V "$vers" -o 0 -l 4096
-wait "$read_pid"
-cmp "$T_TMP.big" "$T_TMP.read"
-
-echo "== cleanup"
-rm -f "$T_D0/file" "$T_D0/big"
-rm -rf "$T_D0/d"
-
-t_pass
@@ -1,143 +0,0 @@
-#
-# Test basic .indx. xattr tag functionality and index entry lifecycle
-#
-
-t_require_commands touch rm setfattr scoutfs stat
-t_require_mounts 2
-
-# query index from a specific mount, default mount 0
-read_xattr_index()
-{
-	local nr="${1:-0}"
-	local mnt="$(eval echo \$T_M$nr)"
-	shift
-
-	sync
-	echo 1 > $(t_debugfs_path $nr)/drop_weak_item_cache
-	scoutfs read-xattr-index -p "$mnt" "$@"
-}
-
-MAJOR=5
-MINOR=100
-
-echo "== testing invalid read-xattr-index arguments"
-scoutfs read-xattr-index -p "$T_M0" bad 2>&1
-scoutfs read-xattr-index -p "$T_M0" 1.2 2>&1
-scoutfs read-xattr-index -p "$T_M0" 1.2.3 256.0.0 2>&1
-scoutfs read-xattr-index -p "$T_M0" 1.2.3 0.0.0 2>&1
-scoutfs read-xattr-index -p "$T_M0" 1.2.0 1.1.2 2>&1
-scoutfs read-xattr-index -p "$T_M0" 2.2.2 2.2.1 2>&1
-
-echo "== testing invalid names"
-touch "$T_D0/invalid"
-setfattr -n scoutfs.hide.indx.test.$MAJOR "$T_D0/invalid" 2>&1 | t_filter_fs
-setfattr -n scoutfs.hide.indx.test "$T_D0/invalid" 2>&1 | t_filter_fs
-setfattr -n scoutfs.hide.indx.test.. "$T_D0/invalid" 2>&1 | t_filter_fs
-setfattr -n scoutfs.hide.indx.test..$MINOR "$T_D0/invalid" 2>&1 | t_filter_fs
-setfattr -n scoutfs.hide.indx.test.$MAJOR. "$T_D0/invalid" 2>&1 | t_filter_fs
-setfattr -n scoutfs.hide.indx.test.256.$MINOR "$T_D0/invalid" 2>&1 | t_filter_fs
-setfattr -n scoutfs.hide.indx.test.abc.$MINOR "$T_D0/invalid" 2>&1 | t_filter_fs
-setfattr -n scoutfs.hide.indx.test.$MAJOR.abc "$T_D0/invalid" 2>&1 | t_filter_fs
-setfattr -n scoutfs.hide.indx.test.-1.$MINOR "$T_D0/invalid" 2>&1 | t_filter_fs
-setfattr -n scoutfs.hide.indx.test.$MAJOR.-1 "$T_D0/invalid" 2>&1 | t_filter_fs
-setfattr -n scoutfs.hide.indx.test.18446744073709551616.$MINOR "$T_D0/invalid" 2>&1 | t_filter_fs
-setfattr -n scoutfs.hide.indx.$(printf 'x%.0s' $(seq 1 240)).$MAJOR.$MINOR "$T_D0/invalid" 2>&1 | t_filter_fs
-rm -f "$T_D0/invalid"
-
-echo "== testing boundary values"
-touch "$T_D0/boundary"
-INO=$(stat -c "%i" "$T_D0/boundary")
-setfattr -n scoutfs.hide.indx.test.0.0 "$T_D0/boundary"
-read_xattr_index 0 0.0.0 0.0.-1 | awk '($3 == "'$INO'") {print "0.0 found"}'
-setfattr -x scoutfs.hide.indx.test.0.0 "$T_D0/boundary"
-setfattr -n scoutfs.hide.indx.test.255.18446744073709551615 "$T_D0/boundary"
-read_xattr_index 0 255.0.0 255.-1.-1 | awk '($3 == "'$INO'") {print "255.max found"}'
-setfattr -x scoutfs.hide.indx.test.255.18446744073709551615 "$T_D0/boundary"
-rm -f "$T_D0/boundary"
-
-echo "== indx xattr must have no value"
-touch "$T_D0/noval"
-setfattr -n scoutfs.hide.indx.test.$MAJOR.$MINOR -v "" "$T_D0/noval" 2>&1 | t_filter_fs
-setfattr -n scoutfs.hide.indx.test.$MAJOR.$MINOR -v 0 "$T_D0/noval" 2>&1 | t_filter_fs
-setfattr -n scoutfs.hide.indx.test.$MAJOR.$MINOR -v 1 "$T_D0/noval" 2>&1 | t_filter_fs
-rm -f "$T_D0/noval"
-
-echo "== set indx xattr and verify index entry"
-touch "$T_D0/file"
-INO=$(stat -c "%i" "$T_D0/file")
-setfattr -n scoutfs.hide.indx.test.$MAJOR.$MINOR "$T_D0/file"
-read_xattr_index 0 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'") {print "found"}'
-
-echo "== setting same indx xattr again is a no-op"
-setfattr -n scoutfs.hide.indx.test.$MAJOR.$MINOR "$T_D0/file"
-read_xattr_index 0 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'") {print "found"}'
-
-echo "== removing non-existent indx xattr succeeds"
-setfattr -x scoutfs.hide.indx.nonexistent.$MAJOR.999 "$T_D0/file" 2>&1 | t_filter_fs
-read_xattr_index 0 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'") {print "still found"}'
-
-echo "== explicit xattr removal cleans up index entry"
-setfattr -x scoutfs.hide.indx.test.$MAJOR.$MINOR "$T_D0/file"
-read_xattr_index 0 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'") {print "found orphan"}'
-rm -f "$T_D0/file"
-
-echo "== file deletion cleans up index entry"
-touch "$T_D0/file2"
-INO=$(stat -c "%i" "$T_D0/file2")
-setfattr -n scoutfs.hide.indx.test.$MAJOR.$MINOR "$T_D0/file2"
-read_xattr_index 0 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'") {print "found before delete"}'
-rm -f "$T_D0/file2"
-read_xattr_index 0 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'") {print "found orphan after delete"}'
-
-echo "== multiple indx xattrs on one file cleaned up by deletion"
-touch "$T_D0/file3"
-INO=$(stat -c "%i" "$T_D0/file3")
-setfattr -n scoutfs.hide.indx.a.$MAJOR.200 "$T_D0/file3"
-setfattr -n scoutfs.hide.indx.b.$MAJOR.300 "$T_D0/file3"
-BEFORE=$(read_xattr_index 0 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'")' | wc -l)
-echo "entries before delete: $BEFORE"
-rm -f "$T_D0/file3"
-AFTER=$(read_xattr_index 0 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'")' | wc -l)
-echo "entries after delete: $AFTER"
-
-echo "== partial removal leaves other entries"
-touch "$T_D0/partial"
-INO=$(stat -c "%i" "$T_D0/partial")
-setfattr -n scoutfs.hide.indx.a.$MAJOR.200 "$T_D0/partial"
-setfattr -n scoutfs.hide.indx.b.$MAJOR.300 "$T_D0/partial"
-setfattr -x scoutfs.hide.indx.a.$MAJOR.200 "$T_D0/partial"
-read_xattr_index 0 $MAJOR.200.0 $MAJOR.200.-1 | awk '($3 == "'$INO'") {print "200 found"}'
-read_xattr_index 0 $MAJOR.300.0 $MAJOR.300.-1 | awk '($3 == "'$INO'") {print "300 found"}'
-rm -f "$T_D0/partial"
-
-echo "== multiple files at same index position"
-touch "$T_D0/multi_a" "$T_D0/multi_b"
-INO_A=$(stat -c "%i" "$T_D0/multi_a")
-INO_B=$(stat -c "%i" "$T_D0/multi_b")
-setfattr -n scoutfs.hide.indx.test.$MAJOR.$MINOR "$T_D0/multi_a"
-setfattr -n scoutfs.hide.indx.test.$MAJOR.$MINOR "$T_D0/multi_b"
-COUNT=$(read_xattr_index 0 $MAJOR.$MINOR.0 $MAJOR.$MINOR.-1 | wc -l)
-echo "files at same position: $COUNT"
-rm -f "$T_D0/multi_a"
-read_xattr_index 0 $MAJOR.$MINOR.0 $MAJOR.$MINOR.-1 | awk '($3 == "'$INO_A'") {print "deleted file still found"}'
-read_xattr_index 0 $MAJOR.$MINOR.0 $MAJOR.$MINOR.-1 | awk '($3 == "'$INO_B'") {print "surviving file found"}'
-rm -f "$T_D0/multi_b"
-
-echo "== cross-mount visibility"
-touch "$T_D0/file4"
-INO=$(stat -c "%i" "$T_D0/file4")
-setfattr -n scoutfs.hide.indx.test.$MAJOR.$MINOR "$T_D0/file4"
-read_xattr_index 1 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'") {print "found on mount 1"}'
-rm -f "$T_D0/file4"
-read_xattr_index 1 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'") {print "found orphan on mount 1"}'
-
-echo "== duplicate position deduplication"
-touch "$T_D0/file5"
-INO=$(stat -c "%i" "$T_D0/file5")
-setfattr -n scoutfs.hide.indx.aa.$MAJOR.$MINOR "$T_D0/file5"
-setfattr -n scoutfs.hide.indx.bb.$MAJOR.$MINOR "$T_D0/file5"
-COUNT=$(read_xattr_index 0 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'")' | wc -l)
-echo "entries for same position: $COUNT"
-rm -f "$T_D0/file5"
-
-t_pass
@@ -11,8 +11,9 @@ truncate -s $sz "$T_TMP.equal"
 truncate -s $large_sz "$T_TMP.large"

 echo "== make scratch fs"
-t_scratch_mkfs
-mkdir -p "$T_MSCR"
+t_quiet scoutfs mkfs -f -Q 0,127.0.0.1,53000 "$T_EX_META_DEV" "$T_EX_DATA_DEV"
+SCR="$T_TMPDIR/mnt.scratch"
+mkdir -p "$SCR"

 echo "== small new data device fails"
 t_rc scoutfs prepare-empty-data-device "$T_EX_META_DEV" "$T_TMP.small"
@@ -22,13 +23,13 @@ t_rc scoutfs prepare-empty-data-device --check "$T_EX_META_DEV" "$T_TMP.small"
 t_rc scoutfs prepare-empty-data-device --check "$T_EX_META_DEV"

 echo "== preparing while mounted fails"
-mount -t scoutfs -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 "$T_EX_DATA_DEV" "$T_MSCR"
+mount -t scoutfs -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 "$T_EX_DATA_DEV" "$SCR"
 t_rc scoutfs prepare-empty-data-device "$T_EX_META_DEV" "$T_TMP.equal"
-umount "$T_MSCR"
+umount "$SCR"

 echo "== preparing without recovery fails"
-mount -t scoutfs -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 "$T_EX_DATA_DEV" "$T_MSCR"
-umount -f "$T_MSCR"
+mount -t scoutfs -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 "$T_EX_DATA_DEV" "$SCR"
+umount -f "$SCR"
 t_rc scoutfs prepare-empty-data-device "$T_EX_META_DEV" "$T_TMP.equal"

 echo "== check sees metadata errors"
@@ -36,16 +37,16 @@ t_rc scoutfs prepare-empty-data-device --check "$T_EX_META_DEV"
 t_rc scoutfs prepare-empty-data-device --check "$T_EX_META_DEV" "$T_TMP.equal"

 echo "== preparing with file data fails"
-mount -t scoutfs -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 "$T_EX_DATA_DEV" "$T_MSCR"
-echo hi > "$T_MSCR"/file
-umount "$T_MSCR"
+mount -t scoutfs -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 "$T_EX_DATA_DEV" "$SCR"
+echo hi > "$SCR"/file
+umount "$SCR"
 scoutfs print "$T_EX_META_DEV" > "$T_TMP.print"
 t_rc scoutfs prepare-empty-data-device "$T_EX_META_DEV" "$T_TMP.equal"

 echo "== preparing after emptied"
-mount -t scoutfs -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 "$T_EX_DATA_DEV" "$T_MSCR"
-rm -f "$T_MSCR"/file
-umount "$T_MSCR"
+mount -t scoutfs -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 "$T_EX_DATA_DEV" "$SCR"
+rm -f "$SCR"/file
+umount "$SCR"
 t_rc scoutfs prepare-empty-data-device "$T_EX_META_DEV" "$T_TMP.equal"

 echo "== checks pass"
@@ -54,22 +55,22 @@ t_rc scoutfs prepare-empty-data-device --check "$T_EX_META_DEV" "$T_TMP.equal"

 echo "== using prepared"
 scr_loop=$(losetup --find --show "$T_TMP.equal")
-mount -t scoutfs -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 "$scr_loop" "$T_MSCR"
-touch "$T_MSCR"/equal_prepared
-equal_tot=$(scoutfs statfs -s total_data_blocks -p "$T_MSCR")
-umount "$T_MSCR"
+mount -t scoutfs -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 "$scr_loop" "$SCR"
+touch "$SCR"/equal_prepared
+equal_tot=$(scoutfs statfs -s total_data_blocks -p "$SCR")
+umount "$SCR"
 losetup -d "$scr_loop"

 echo "== preparing larger and resizing"
 t_rc scoutfs prepare-empty-data-device "$T_EX_META_DEV" "$T_TMP.large"
 scr_loop=$(losetup --find --show "$T_TMP.large")
-mount -t scoutfs -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 "$scr_loop" "$T_MSCR"
-touch "$T_MSCR"/large_prepared
-ls "$T_MSCR"
-scoutfs resize-devices -p "$T_MSCR" -d $large_sz
-large_tot=$(scoutfs statfs -s total_data_blocks -p "$T_MSCR")
+mount -t scoutfs -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 "$scr_loop" "$SCR"
+touch "$SCR"/large_prepared
+ls "$SCR"
+scoutfs resize-devices -p "$SCR" -d $large_sz
+large_tot=$(scoutfs statfs -s total_data_blocks -p "$SCR")
 test "$large_tot" -gt "$equal_tot" ; echo "resized larger test rc: $?"
-umount "$T_MSCR"
+umount "$SCR"
 losetup -d "$scr_loop"

 echo "== cleanup"
@@ -54,16 +54,21 @@ after=$(free_blocks Data "$T_M0")
 test "$before" == "$after" || \
 	t_fail "$after free data blocks after rm, expected $before"

+# XXX this is all pretty manual, would be nice to have helpers
 echo "== make small meta fs"
 # meta device just big enough for reserves and the metadata we'll fill
-t_scratch_mkfs -A -m 10G
-t_scratch_mount
+scoutfs mkfs -A -f -Q 0,127.0.0.1,53000 -m 10G "$T_EX_META_DEV" "$T_EX_DATA_DEV" > $T_TMP.mkfs.out 2>&1 || \
+	t_fail "mkfs failed"
+SCR="$T_TMPDIR/mnt.scratch"
+mkdir -p "$SCR"
+mount -t scoutfs -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 \
+	"$T_EX_DATA_DEV" "$SCR"

 echo "== create large xattrs until we fill up metadata"
-mkdir -p "$T_MSCR/xattrs"
+mkdir -p "$SCR/xattrs"

 for f in $(seq 1 100000); do
-	file="$T_MSCR/xattrs/file-$f"
+	file="$SCR/xattrs/file-$f"
 	touch "$file"

 	LC_ALL=C create_xattr_loop -c 1000 -n user.scoutfs-enospc -p "$file" -s 65535 > $T_TMP.cxl 2>&1
@@ -79,10 +84,10 @@ for f in $(seq 1 100000); do
 done

 echo "== remove files with xattrs after enospc"
-rm -rf "$T_MSCR/xattrs"
+rm -rf "$SCR/xattrs"

 echo "== make sure we can create again"
-file="$T_MSCR/file-after"
+file="$SCR/file-after"
 C=120
 while (( C-- )); do
 	touch $file 2> /dev/null && break
@@ -94,6 +99,7 @@ sync
 rm -f "$file"

 echo "== cleanup small meta fs"
-t_scratch_umount
+umount "$SCR"
+rmdir "$SCR"

 t_pass
@@ -5,9 +5,6 @@
 t_require_commands sleep touch grep sync scoutfs
 t_require_mounts 2

-# regularly see ~20/~30s
-VERIFY_TIMEOUT_SECS=90
-
 #
 # Make sure that all mounts can read the results of a write from each
 # mount.
@@ -43,10 +40,8 @@ verify_fenced_run()

 	for rid in $rids; do
 		grep -q ".* running rid '$rid'.* args 'ignored run args'" "$T_FENCED_LOG" || \
-			return 1
+			t_fail "fenced didn't execute RUN script for rid $rid"
 	done
-
-	return 0
 }

 echo "== make sure all mounts can see each other"
@@ -59,7 +54,14 @@ rid=$(t_mount_rid $cl)
 echo "cl $cl sv $sv rid $rid" >> "$T_TMP.log"
 sync
 t_force_umount $cl
-t_wait_until_timeout $VERIFY_TIMEOUT_SECS verify_fenced_run $rid
+# wait for client reconnection to timeout
+while grep -q $rid $(t_debugfs_path $sv)/connections; do
+	sleep .5
+done
+while t_rid_is_fencing $rid; do
+	sleep .5
+done
+verify_fenced_run $rid
 t_mount $cl
 check_read_write

@@ -81,7 +83,15 @@ for cl in $(t_fs_nrs); do
 	t_force_umount $cl
 done

-t_wait_until_timeout $VERIFY_TIMEOUT_SECS verify_fenced_run $rids
+# wait for all client reconnections to timeout
+while egrep -q "($pattern)" $(t_debugfs_path $sv)/connections; do
+	sleep .5
+done
+# wait for all fence requests to complete
+while test -d $(echo /sys/fs/scoutfs/*/fence/* | cut -d " " -f 1); do
+	sleep .5
+done
+verify_fenced_run $rids
 # remount all the clients
 for cl in $(t_fs_nrs); do
 	if [ $cl == $sv ]; then
@@ -97,7 +107,12 @@ rid=$(t_mount_rid $sv)
 echo "sv $sv rid $rid" >> "$T_TMP.log"
 sync
 t_force_umount $sv
-t_wait_until_timeout $VERIFY_TIMEOUT_SECS verify_fenced_run $rid
+t_wait_for_leader
+# wait until new server is done fencing unmounted leader rid
+while t_rid_is_fencing $rid; do
+	sleep .5
+done
+verify_fenced_run $rid
 t_mount $sv
 check_read_write

@@ -112,7 +127,11 @@ for nr in $(t_fs_nrs); do
 	t_force_umount $nr
 done
 t_mount_all
-t_wait_until_timeout $VERIFY_TIMEOUT_SECS verify_fenced_run $rids
+# wait for all fence requests to complete
+while test -d $(echo /sys/fs/scoutfs/*/fence/* | cut -d " " -f 1); do
+	sleep .5
+done
+verify_fenced_run $rids
 check_read_write

 t_pass
@@ -11,8 +11,8 @@
 # format version.
 #

-# not supported on el8 or higher
-if [ $(source /etc/os-release ; echo ${VERSION_ID:0:1}) -gt 7 ]; then
+# not supported on el9!
+if [ $(source /etc/os-release ; echo ${VERSION_ID:0:1}) -gt 8 ]; then
 	t_skip_permitted "Unsupported OS version"
 fi

@@ -89,7 +89,7 @@ for vers in $(seq $MIN $((MAX - 1))); do
 	old_module="$builds/$vers/scoutfs.ko"

 	echo "mkfs $vers" >> "$T_TMP.log"
-	t_quiet $old_scoutfs mkfs -f -Q 0,127.0.0.1,$T_SCRATCH_PORT "$T_EX_META_DEV" "$T_EX_DATA_DEV" \
+	t_quiet $old_scoutfs mkfs -f -Q 0,127.0.0.1,53000 "$T_EX_META_DEV" "$T_EX_DATA_DEV" \
 		|| t_fail "mkfs $vers failed"

 	echo "mount $vers with $vers" >> "$T_TMP.log"
@@ -72,7 +72,7 @@ touch $T_D0/dir/file
 mkdir $T_D0/dir/dir
 ln -s $T_D0/dir/file $T_D0/dir/symlink
 mknod $T_D0/dir/char c 1 3 # null
-mknod $T_D0/dir/block b 42 0 # SAMPLE block dev - nonexistant/demo use only number
+mknod $T_D0/dir/block b 7 0 # loop0
 for name in $(ls -UA $T_D0/dir | sort); do
 	ino=$(stat -c '%i' $T_D0/dir/$name)
 	$GRE $ino | filter_types
@@ -53,40 +53,26 @@ exec {FD1}>&-  # close
 exec {FD2}>&-  # close
 check_ino_index "$ino" "$dseq" "$T_M0"

-# Hurry along the orphan scanners. If any are currently asleep, we will
-# have to wait at least their current scan interval before they wake up,
-# run, and notice their new interval.
-t_save_all_sysfs_mount_options orphan_scan_delay_ms
-t_set_all_sysfs_mount_options orphan_scan_delay_ms 500
-t_wait_for_orphan_scan_runs
-
 echo "== remote unopened unlink deletes"
 echo "contents" > "$T_D0/file"
 ino=$(stat -c "%i" "$T_D0/file")
 dseq=$(scoutfs stat -s data_seq "$T_D0/file")
 rm -f "$T_D1/file"
-# cross-mount deletion falls back to the orphan scanner when the
-# creating mount still has the inode cached, wait for it to complete
-t_force_log_merge
-# wait for orphan scanners to pick up the unlinked inode and become idle
-t_wait_for_no_orphans
 check_ino_index "$ino" "$dseq" "$T_M0"
 check_ino_index "$ino" "$dseq" "$T_M1"

 echo "== unlink wait for open on other mount"
-echo "contents" > "$T_D0/badfile"
-ino=$(stat -c "%i" "$T_D0/badfile")
-dseq=$(scoutfs stat -s data_seq "$T_D0/badfile")
-exec {FD}<"$T_D0/badfile"
-rm -f "$T_D1/badfile"
+echo "contents" > "$T_D0/file"
+ino=$(stat -c "%i" "$T_D0/file")
+dseq=$(scoutfs stat -s data_seq "$T_D0/file")
+exec {FD}<"$T_D0/file"
+rm -f "$T_D1/file"
 echo "mount 0 contents after mount 1 rm: $(cat <&$FD)"
 check_ino_index "$ino" "$dseq" "$T_M0"
 check_ino_index "$ino" "$dseq" "$T_M1"
 exec {FD}>&-  # close
 # we know that revalidating will unhash the remote dentry
-stat "$T_D0/badfile" 2>&1 | sed 's/cannot statx/cannot stat/' | t_filter_fs
-t_force_log_merge
-t_wait_for_no_orphans
+stat "$T_D0/file" 2>&1 | sed 's/cannot statx/cannot stat/' | t_filter_fs
 check_ino_index "$ino" "$dseq" "$T_M0"
 check_ino_index "$ino" "$dseq" "$T_M1"

@@ -97,20 +83,16 @@ rm -f "$T_D0/dir"/files-*
 rmdir "$T_D0/dir"

 echo "== open files survive remote scanning orphans"
-echo "contents" > "$T_D0/lastfile"
-ino=$(stat -c "%i" "$T_D0/lastfile")
-dseq=$(scoutfs stat -s data_seq "$T_D0/lastfile")
-exec {FD}<"$T_D0/lastfile"
-rm -f "$T_D0/lastfile"
+echo "contents" > "$T_D0/file"
+ino=$(stat -c "%i" "$T_D0/file")
+dseq=$(scoutfs stat -s data_seq "$T_D0/file")
+exec {FD}<"$T_D0/file"
+rm -f "$T_D0/file"
 t_umount 1
 t_mount 1
 echo "mount 0 contents after mount 1 remounted: $(cat <&$FD)"
 exec {FD}>&-  # close
-t_force_log_merge
-t_wait_for_no_orphans
 check_ino_index "$ino" "$dseq" "$T_M0"
 check_ino_index "$ino" "$dseq" "$T_M1"

-t_restore_all_sysfs_mount_options orphan_scan_delay_ms
-
 t_pass
@@ -10,6 +10,30 @@ EXTENTS_PER_BTREE_BLOCK=600
 EXTENTS_PER_LIST_BLOCK=8192
 FREED_EXTENTS=$((EXTENTS_PER_BTREE_BLOCK * EXTENTS_PER_LIST_BLOCK))

+#
+# This test specifically creates a pathologically sparse file that will
+# be as expensive as possible to free.  This is usually fine on
+# dedicated or reasonable hardware, but trying to run this in
+# virtualized debug kernels can take a very long time.  This test is
+# about making sure that the server doesn't fail, not that the platform
+# can handle the scale of work that our btree formats happen to require
+# while execution is bogged down with use-after-free memory reference
+# tracking.  So we give the test a lot more breathing room before
+# deciding that it's hung.
+#
+echo "== setting longer hung task timeout"
+if [ -w /proc/sys/kernel/hung_task_timeout_secs ]; then
+	secs=$(cat /proc/sys/kernel/hung_task_timeout_secs)
+	test "$secs" -gt 0 || \
+		t_fail "confusing value '$secs' from /proc/sys/kernel/hung_task_timeout_secs"
+	restore_hung_task_timeout()
+	{
+		echo "$secs" > /proc/sys/kernel/hung_task_timeout_secs
+	}
+	trap restore_hung_task_timeout EXIT
+	echo "$((secs * 5))" > /proc/sys/kernel/hung_task_timeout_secs
+fi
+
 echo "== creating fragmented extents"
 fragmented_data_extents $FREED_EXTENTS $EXTENTS_PER_BTREE_BLOCK "$T_D0/alloc" "$T_D0/move"

@@ -5,7 +5,7 @@
 t_require_commands mmap_stress mmap_validate scoutfs xfs_io

 echo "== mmap_stress"
-mmap_stress 8192 30 "$T_D0/mmap_stress" "$T_D0/mmap_stress" "$T_D0/mmap_stress" "$T_D3/mmap_stress" "$T_D3/mmap_stress" | sed 's/:.*//g' | sort
+mmap_stress 8192 2000 "$T_D0/mmap_stress" "$T_D1/mmap_stress" "$T_D2/mmap_stress" "$T_D3/mmap_stress" "$T_D4/mmap_stress" | sed 's/:.*//g' | sort

 echo "== basic mmap/read/write consistency checks"
 mmap_validate 256 1000 "$T_D0/mmap_val1" "$T_D1/mmap_val1"
@@ -157,7 +157,7 @@ echo "truncate should be waiting for first block:"
 expect_wait "$DIR/file" "change_size" $ino 0
 scoutfs stage "$DIR/golden" "$DIR/file" -V "$vers" -o 0 -l $BYTES
 sleep .1
-echo "truncate should no longer be waiting:"
+echo "trunate should no longer be waiting:"
 scoutfs data-waiting -B 0 -I 0 -p "$DIR" | wc -l
 cat "$DIR/golden" > "$DIR/file"
 vers=$(scoutfs stat -s data_version "$DIR/file")
@@ -168,13 +168,10 @@ scoutfs release "$DIR/file" -V "$vers" -o 0 -l $BYTES
 # overwrite, not truncate+write
 dd if="$DIR/other" of="$DIR/file" \
 	bs=$BS count=$BLOCKS conv=notrunc status=none &
-pid="$!"
 sleep .1
 echo "should be waiting for write"
 expect_wait "$DIR/file" "write" $ino 0
 scoutfs stage "$DIR/golden" "$DIR/file" -V "$vers" -o 0 -l $BYTES
-# wait for the background dd to complete
-wait "$pid" 2> /dev/null
 cmp "$DIR/file" "$DIR/other"

 echo "== cleanup"
@@ -67,49 +67,18 @@ t_mount_all
 while test -d $(echo /sys/fs/scoutfs/*/fence/* | cut -d " " -f 1); do
 	sleep .5
 done
-
-
-sv=$(t_server_nr)
-
-# wait for reclaim_open_log_tree() to complete for each mount
-while [ $(t_counter reclaimed_open_logs $sv) -lt $T_NR_MOUNTS ]; do
-	sleep 1
-done
-
-# wait for finalize_and_start_log_merge() to find no active merges in flight
-# and not find any finalized trees
-while [ $(t_counter log_merge_no_finalized $sv) -lt 1 ]; do
-	sleep 1
-done
-
 # wait for orphan scans to run
 t_set_all_sysfs_mount_options orphan_scan_delay_ms 1000
-# wait until we see two consecutive orphan scan attempts without
-# any inode deletion forward progress in each mount
-for nr in $(t_fs_nrs); do
-	C=0
-	LOSA=$(t_counter orphan_scan_attempts $nr)
-	LDOP=$(t_counter inode_deleted $nr)
-
-	while [ $C -lt 2 ]; do
-		sleep 1
-
-		OSA=$(t_counter orphan_scan_attempts $nr)
-		DOP=$(t_counter inode_deleted $nr)
-
-		if [ $OSA != $LOSA ]; then
-			if [ $DOP == $LDOP ]; then
-				(( C++ ))
-			else
-				C=0
-			fi
-		fi
-
-		LOSA=$OSA
-		LDOP=$DOP
+# also have to wait for delayed log merge work from mount
+C=120
+while (( C-- )); do
+	brk=1
+	for ino in $inos; do
+		inode_exists $ino && brk=0
 	done
+	test $brk -eq 1 && break
+	sleep 1
 done
-
 for ino in $inos; do
 	inode_exists $ino && echo "$ino still exists"
 done
@@ -1,52 +0,0 @@
-#
-# Test that orphaned log_trees entries from unmounted rids are
-# finalized and merged.
-#
-# An orphan log_trees entry is one whose rid has no mounted_clients
-# entry.  This can happen from incomplete reclaim across server
-# failovers.  We simulate it with the reclaim_skip_finalize trigger
-# which makes reclaim_open_log_tree skip the finalization step.
-#
-
-t_require_commands touch scoutfs
-t_require_mounts 2
-
-TIMEOUT=90
-
-echo "== create orphan log_trees entry via trigger"
-sv=$(t_server_nr)
-cl=$(t_first_client_nr)
-rid=$(t_mount_rid $cl)
-
-touch "$T_D0/file" "$T_D1/file"
-sync
-
-# arm the trigger so reclaim skips finalization
-t_trigger_arm_silent reclaim_skip_finalize $sv
-
-# force unmount the client, server will fence and reclaim it
-# but the trigger makes reclaim leave log_trees unfinalized
-t_force_umount $cl
-
-# wait for fencing to run
-verify_fenced() {
-	grep -q "running rid '$rid'" "$T_FENCED_LOG" 2>/dev/null
-}
-t_wait_until_timeout $TIMEOUT verify_fenced
-
-# give the server time to complete reclaim after fence
-sleep 5
-
-# remount the client so t_force_log_merge can sync all mounts.
-# the client gets a new rid; the old rid's log_trees is the orphan.
-t_mount $cl
-
-echo "== verify orphan is reclaimed and merge completes"
-t_force_log_merge
-
-echo "== verify orphan reclaim was logged"
-if ! dmesg | grep -q "reclaiming orphan log trees for rid $rid"; then
-	t_fail "expected orphan reclaim message for rid $rid in dmesg"
-fi
-
-t_pass
@@ -1,152 +0,0 @@
-
-t_require_commands scoutfs dd fallocate
-
-FILE="$T_D0/file"
-DIR="$T_D0/dir"
-
-echo "== missing options should fail =="
-rm -rf $DIR && mkdir -p $DIR
-scoutfs punch-offline $DIR -l 4096 -V 0
-scoutfs punch-offline $DIR -o 0 -V 0
-scoutfs punch-offline $DIR -o 0 -l 4096
-
-echo "== can't hole punch dir or special =="
-rm -rf $DIR && mkdir -p $DIR
-scoutfs punch-offline $DIR -o 0 -l 4096 -V 0
-
-echo "== punching an empty file does nothing =="
-rm -f $FILE && touch $FILE
-scoutfs punch-offline $FILE -o 0 -l 4096 -V 0
-
-echo "== punch outside of i_size does nothing =="
-dd if=/dev/zero of=$FILE bs=4096 count=1 status=none
-scoutfs punch-offline $FILE -o 4096 -l 4096 -V 1
-
-echo "== can't hole punch online extent =="
-scoutfs get-fiemap -Lb $FILE
-scoutfs punch-offline $FILE -o 0 -l 4096 -V 1
-scoutfs get-fiemap -Lb $FILE
-
-echo "== can't hole punch unwritten extent =="
-rm -rf $FILE && touch $FILE
-fallocate -l $((4096 * 3)) $FILE
-vers=$(scoutfs stat -s data_version "$FILE")
-scoutfs get-fiemap -Lb $FILE
-scoutfs punch-offline $FILE -o 4096 -l 4096 -V $vers
-scoutfs get-fiemap -Lb $FILE
-
-echo "== hole punch offline extent =="
-rm -rf $FILE && touch $FILE
-fallocate -l $((4096 * 3)) $FILE
-vers=$(scoutfs stat -s data_version "$FILE")
-scoutfs release $FILE --data-version $vers
-scoutfs get-fiemap -Lb $FILE
-scoutfs punch-offline $FILE -o 4096 -l 4096 -V $vers
-scoutfs get-fiemap -Lb $FILE
-
-echo "== can't hole punch non-aligned bsz offset or len =="
-rm -rf $FILE && touch $FILE
-fallocate -l $((4096 * 3)) $FILE
-vers=$(scoutfs stat -s data_version "$FILE")
-scoutfs release $FILE --data-version $vers
-scoutfs get-fiemap -Lb $FILE
-scoutfs punch-offline $FILE -o 4095 -l 4096 -V $vers
-scoutfs punch-offline $FILE -o 1 -l 4096 -V $vers
-scoutfs punch-offline $FILE -o 4096 -l 409700 -V $vers
-scoutfs punch-offline $FILE -o 4096 -l 4097 -V $vers
-scoutfs punch-offline $FILE -o 4096 -l 4095 -V $vers
-scoutfs punch-offline $FILE -o 4096 -l 1 -V $vers
-scoutfs punch-offline $FILE -o 4096 -l 0 -V $vers
-scoutfs get-fiemap -Lb $FILE
-
-echo "== can't hole punch mismatched data_version =="
-rm -rf $FILE && touch $FILE
-fallocate -l $((4096 * 3)) $FILE
-vers=$(scoutfs stat -s data_version "$FILE")
-scoutfs release $FILE --data-version $vers
-scoutfs get-fiemap -Lb $FILE
-scoutfs punch-offline $FILE -o 4096 -l 4096 -V 0
-scoutfs punch-offline $FILE -o 4096 -l 4096 -V 2
-scoutfs punch-offline $FILE -o 4096 -l 4096 -V 9999
-scoutfs get-fiemap -Lb $FILE
-
-echo "== Punch hole crossing multiple extents =="
-rm -rf $FILE && touch $FILE
-fallocate -l $((7 * 4096)) $FILE
-vers=$(scoutfs stat -s data_version "$FILE")
-scoutfs release $FILE --data-version $vers
-scoutfs get-fiemap -L $FILE
-scoutfs punch-offline $FILE -o $((1 * 4096)) -l 4096 -V $vers
-scoutfs punch-offline $FILE -o $((3 * 4096)) -l 4096 -V $vers
-scoutfs punch-offline $FILE -o $((5 * 4096)) -l 4096 -V $vers
-# 0.1.2.3
-scoutfs get-fiemap -L $FILE
-scoutfs punch-offline $FILE -o $((2 * 4096)) -l $((3 * 4096)) -V $vers
-# 0.....1
-scoutfs get-fiemap -L $FILE
-
-echo "== punch hole starting at a hole =="
-rm -rf $FILE && touch $FILE
-fallocate -l $((7 * 4096)) $FILE
-vers=$(scoutfs stat -s data_version "$FILE")
-scoutfs release $FILE --data-version $vers
-scoutfs get-fiemap -L $FILE
-scoutfs punch-offline $FILE -o $((1 * 4096)) -l 4096 -V $vers
-scoutfs punch-offline $FILE -o $((3 * 4096)) -l 4096 -V $vers
-scoutfs punch-offline $FILE -o $((5 * 4096)) -l 4096 -V $vers
-# 0.1.2.3
-scoutfs get-fiemap -L $FILE
-scoutfs punch-offline $FILE -o $((1 * 4096)) -l $((5 * 4096)) -V $vers
-# 0.....1
-scoutfs get-fiemap -L $FILE
-
-echo "== large punch =="
-rm -rf $FILE && touch $FILE
-fallocate -l $((6 * 1024 * 1024 * 1024)) $FILE
-vers=$(scoutfs stat -s data_version "$FILE")
-scoutfs release $FILE --data-version $vers
-scoutfs get-fiemap -L $FILE
-scoutfs punch-offline $FILE -o $((134123 * 4096)) -l $((68343 * 4096)) -V $vers
-scoutfs punch-offline $FILE -o $((467273 * 4096)) -l $((68343 * 4096)) -V $vers
-scoutfs punch-offline $FILE -o $((734623 * 4096)) -l $((68343 * 4096)) -V $vers
-scoutfs get-fiemap -L $FILE
-
-echo "== overlapping punches with lots of extents =="
-rm -rf $FILE && touch $FILE
-fallocate -l $((4096 * 1024)) $FILE
-vers=$(scoutfs stat -s data_version "$FILE")
-scoutfs release $FILE --data-version 1
-scoutfs get-fiemap -Lb $FILE
-# punch odd ones away
-for h in $(seq 1 2 1023); do
-	scoutfs punch-offline $FILE -o $((h * 4096)) -l 4096 -V $vers
-done
-scoutfs get-fiemap -Lb $FILE | tail -n 1
-# punch a large hole from 32 to 55, removing 7 extents
-scoutfs punch-offline $FILE -o $((32 * 4096)) -l $((13 * 4096)) -V $vers
-scoutfs get-fiemap -Lb $FILE | tail -n 1
-# punch every 8th @6
-for h in $(seq 6 8 1024); do
-	scoutfs punch-offline $FILE -o $((h * 4096)) -l 4096 -V $vers
-done
-# again @4
-scoutfs get-fiemap -Lb $FILE | tail -n 1
-for h in $(seq 4 8 1024); do
-	scoutfs punch-offline $FILE -o $((h * 4096)) -l 4096 -V $vers
-done
-scoutfs get-fiemap -Lb $FILE | tail -n 1
-# punching a large hole from 127 to 175, removing 12 extents
-scoutfs punch-offline $FILE -o $((127 * 4096)) -l $((48 * 4096)) -V $vers
-scoutfs get-fiemap -Lb $FILE
-# again @2
-for h in $(seq 2 8 1024); do
-	scoutfs punch-offline $FILE -o $((h * 4096)) -l 4096 -V $vers
-done
-scoutfs get-fiemap -L $FILE
-# and again @0, punching away everything remaining extent
-for h in $(seq 0 8 1024); do
-	scoutfs punch-offline $FILE -o $((h * 4096)) -l 4096 -V $vers
-done
-scoutfs get-fiemap -Lb $FILE
-
-t_pass
@@ -62,7 +62,7 @@ test_timeout()
 	sleep 1

 	# tear down the current server/leader
-	t_force_umount $sv &
+	t_force_umount $sv

 	# see how long it takes for the next leader to start
 	start=$(time_ms)
@@ -73,7 +73,6 @@ test_timeout()
 	echo "to $to delay $delay" >> $T_TMP.delay

 	# restore the mount that we tore down
-	wait
 	t_mount $sv

 	# make sure the new leader delay was reasonable, allowing for some slack
@@ -1,70 +0,0 @@
-#
-# Regression for the BUG_ON in scoutfs_quota_invalidate when a concurrent
-# ruleset read on one mount races with a quota rule modification.
-#
-
-t_require_mounts 2
-
-TEST_UID=22222
-SET_UID="--ruid=$TEST_UID --euid=$TEST_UID"
-
-echo "== setup"
-mkdir -p "$T_D0/dir"
-chown --quiet $TEST_UID "$T_D0/dir"
-
-# totl xattr gives quota checks something to consult
-setfattr -n scoutfs.totl.test.1.1.1 -v 1 "$T_D0/dir"
-
-echo "== concurrent quota mod and check across mounts"
-
-(
-	for i in $(seq 1 20); do
-		scoutfs quota-add -p "$T_M0" \
-			-r "1 1,L,- 1,L,- $i,L,- I 999999 -" 2>/dev/null
-		scoutfs quota-del -p "$T_M0" \
-			-r "1 1,L,- 1,L,- $i,L,- I 999999 -" 2>/dev/null
-	done
-) &
-MOD_PID=$!
-
-# same mount as the mod: races local read against invalidate
-(
-	for i in $(seq 1 50); do
-		setpriv $SET_UID touch "$T_D0/dir/race0_$i" 2>/dev/null
-		rm -f "$T_D0/dir/race0_$i"
-	done
-) &
-CHECK0_PID=$!
-
-# other mount: drives cross-node lock traffic
-(
-	for i in $(seq 1 50); do
-		setpriv $SET_UID touch "$T_D1/dir/race1_$i" 2>/dev/null
-		rm -f "$T_D1/dir/race1_$i"
-	done
-) &
-CHECK1_PID=$!
-
-t_quiet wait $MOD_PID
-t_quiet wait $CHECK0_PID
-t_quiet wait $CHECK1_PID
-
-echo "== verify quota rules are consistent after race"
-scoutfs quota-wipe -p "$T_M0"
-scoutfs quota-list -p "$T_M0"
-
-echo "== verify file creation still works under quota"
-scoutfs quota-add -p "$T_M0" -r "1 1,L,- 1,L,- 1,L,- I 999999 -"
-sync
-echo 1 > $(t_debugfs_path)/drop_weak_item_cache
-echo 1 > $(t_debugfs_path)/drop_quota_check_cache
-setpriv $SET_UID touch "$T_D0/dir/verify_file"
-test -f "$T_D1/dir/verify_file" && echo "file visible on mount 1"
-rm -f "$T_D0/dir/verify_file"
-scoutfs quota-wipe -p "$T_M0"
-
-echo "== cleanup"
-setfattr -x scoutfs.totl.test.1.1.1 "$T_D0/dir"
-rm -rf "$T_D0/dir"
-
-t_pass
@@ -8,19 +8,19 @@ t_require_mounts 2
 echo "=== renameat2 noreplace flag test"

 # give each mount their own dir (lock group) to minimize create contention
-mkdir $T_D0/dir0
-mkdir $T_D1/dir1
+mkdir $T_M0/dir0
+mkdir $T_M1/dir1

 echo "=== run two asynchronous calls to renameat2 NOREPLACE"
 for i in $(seq 0 100); do
        # prepare inputs in isolation
-        touch "$T_D0/dir0/old0"
-        touch "$T_D1/dir1/old1"
+        touch "$T_M0/dir0/old0"
+        touch "$T_M1/dir1/old1"

        # race doing noreplace renames, both can't succeed
-        dumb_renameat2 -n "$T_D0/dir0/old0" "$T_D0/dir0/sharednew" 2> /dev/null &
+        dumb_renameat2 -n "$T_M0/dir0/old0" "$T_M0/dir0/sharednew" 2> /dev/null &
        pid0=$!
-        dumb_renameat2 -n "$T_D1/dir1/old1" "$T_D1/dir0/sharednew" 2> /dev/null &
+        dumb_renameat2 -n "$T_M1/dir1/old1" "$T_M1/dir0/sharednew" 2> /dev/null &
        pid1=$!

        wait $pid0
@@ -31,7 +31,7 @@ for i in $(seq 0 100); do
        test "$rc0" == 0 -a "$rc1" == 0 && t_fail "both renames succeeded"

        # blow away possible files for either race outcome
-        rm -f "$T_D0/dir0/old0" "$T_D1/dir1/old1" "$T_D0/dir0/sharednew" "$T_D1/dir1/sharednew"
+        rm -f "$T_M0/dir0/old0" "$T_M1/dir1/old1" "$T_M0/dir0/sharednew" "$T_M1/dir1/sharednew"
 done

 t_pass
@@ -19,8 +19,8 @@ df_free() {
 }

 same_totals() {
-	cur_meta_tot=$(statfs_total meta "$T_MSCR")
-	cur_data_tot=$(statfs_total data "$T_MSCR")
+	cur_meta_tot=$(statfs_total meta "$SCR")
+	cur_data_tot=$(statfs_total data "$SCR")

 	test "$cur_meta_tot" == "$exp_meta_tot" || \
 		t_fail "cur total_meta_blocks $cur_meta_tot != expected $exp_meta_tot"
@@ -34,10 +34,10 @@ same_totals() {
 # some slop to account for reserved blocks and concurrent allocation.
 #
 devices_grew() {
-	cur_meta_tot=$(statfs_total meta "$T_MSCR")
-	cur_data_tot=$(statfs_total data "$T_MSCR")
-	cur_meta_df=$(df_free MetaData "$T_MSCR")
-	cur_data_df=$(df_free Data "$T_MSCR")
+	cur_meta_tot=$(statfs_total meta "$SCR")
+	cur_data_tot=$(statfs_total data "$SCR")
+	cur_meta_df=$(df_free MetaData "$SCR")
+	cur_data_df=$(df_free Data "$SCR")

 	local grow_meta_tot=$(echo "$exp_meta_tot * 2" | bc)
 	local grow_data_tot=$(echo "$exp_data_tot * 2" | bc)
@@ -70,13 +70,19 @@ size_data=$(blockdev --getsize64 "$T_EX_DATA_DEV")
 quarter_meta=$(echo "$size_meta / 4" | bc)
 quarter_data=$(echo "$size_data / 4" | bc)

+# XXX this is all pretty manual, would be nice to have helpers
 echo "== make initial small fs"
-t_scratch_mkfs -A -m $quarter_meta -d $quarter_data
-t_scratch_mount
+scoutfs mkfs -A -f -Q 0,127.0.0.1,53000 -m $quarter_meta -d $quarter_data \
+	"$T_EX_META_DEV" "$T_EX_DATA_DEV" > $T_TMP.mkfs.out 2>&1 || \
+		t_fail "mkfs failed"
+SCR="$T_TMPDIR/mnt.scratch"
+mkdir -p "$SCR"
+mount -t scoutfs -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 \
+	"$T_EX_DATA_DEV" "$SCR"

 # then calculate sizes based on blocks that mkfs used
-quarter_meta=$(echo "$(statfs_total meta "$T_MSCR") * 64 * 1024" | bc)
-quarter_data=$(echo "$(statfs_total data "$T_MSCR") * 4 * 1024" | bc)
+quarter_meta=$(echo "$(statfs_total meta "$SCR") * 64 * 1024" | bc)
+quarter_data=$(echo "$(statfs_total data "$SCR") * 4 * 1024" | bc)
 whole_meta=$(echo "$quarter_meta * 4" | bc)
 whole_data=$(echo "$quarter_data * 4" | bc)
 outsize_meta=$(echo "$whole_meta * 2" | bc)
@@ -87,58 +93,59 @@ shrink_meta=$(echo "$quarter_meta / 2" | bc)
 shrink_data=$(echo "$quarter_data / 2" | bc)

 # and save expected values for checks
-exp_meta_tot=$(statfs_total meta "$T_MSCR")
-exp_meta_df=$(df_free MetaData "$T_MSCR")
-exp_data_tot=$(statfs_total data "$T_MSCR")
-exp_data_df=$(df_free Data "$T_MSCR")
+exp_meta_tot=$(statfs_total meta "$SCR")
+exp_meta_df=$(df_free MetaData "$SCR")
+exp_data_tot=$(statfs_total data "$SCR")
+exp_data_df=$(df_free Data "$SCR")

 echo "== 0s do nothing"
-scoutfs resize-devices -p "$T_MSCR"
-scoutfs resize-devices -p "$T_MSCR" -m 0
-scoutfs resize-devices -p "$T_MSCR" -d 0
-scoutfs resize-devices -p "$T_MSCR" -m 0 -d 0
+scoutfs resize-devices -p "$SCR" 
+scoutfs resize-devices -p "$SCR" -m 0
+scoutfs resize-devices -p "$SCR" -d 0
+scoutfs resize-devices -p "$SCR" -m 0 -d 0

 echo "== shrinking fails"
-scoutfs resize-devices -p "$T_MSCR" -m $shrink_meta
-scoutfs resize-devices -p "$T_MSCR" -d $shrink_data
-scoutfs resize-devices -p "$T_MSCR" -m $shrink_meta -d $shrink_data
+scoutfs resize-devices -p "$SCR" -m $shrink_meta
+scoutfs resize-devices -p "$SCR" -d $shrink_data
+scoutfs resize-devices -p "$SCR" -m $shrink_meta -d $shrink_data
 same_totals

 echo "== existing sizes do nothing"
-scoutfs resize-devices -p "$T_MSCR" -m $quarter_meta
-scoutfs resize-devices -p "$T_MSCR" -d $quarter_data
-scoutfs resize-devices -p "$T_MSCR" -m $quarter_meta -d $quarter_data
+scoutfs resize-devices -p "$SCR" -m $quarter_meta
+scoutfs resize-devices -p "$SCR" -d $quarter_data
+scoutfs resize-devices -p "$SCR" -m $quarter_meta -d $quarter_data
 same_totals

 echo "== growing outside device fails"
-scoutfs resize-devices -p "$T_MSCR" -m $outsize_meta
-scoutfs resize-devices -p "$T_MSCR" -d $outsize_data
-scoutfs resize-devices -p "$T_MSCR" -m $outsize_meta -d $outsize_data
+scoutfs resize-devices -p "$SCR" -m $outsize_meta
+scoutfs resize-devices -p "$SCR" -d $outsize_data
+scoutfs resize-devices -p "$SCR" -m $outsize_meta -d $outsize_data
 same_totals

 echo "== resizing meta works"
-scoutfs resize-devices -p "$T_MSCR" -m $half_meta
+scoutfs resize-devices -p "$SCR" -m $half_meta
 devices_grew meta

 echo "== resizing data works"
-scoutfs resize-devices -p "$T_MSCR" -d $half_data
+scoutfs resize-devices -p "$SCR" -d $half_data
 devices_grew data

 echo "== shrinking back fails"
-scoutfs resize-devices -p "$T_MSCR" -m $quarter_meta
-scoutfs resize-devices -p "$T_MSCR" -m $quarter_data
+scoutfs resize-devices -p "$SCR" -m $quarter_meta
+scoutfs resize-devices -p "$SCR" -m $quarter_data
 same_totals

 echo "== resizing again does nothing"
-scoutfs resize-devices -p "$T_MSCR" -m $half_meta
-scoutfs resize-devices -p "$T_MSCR" -m $half_data
+scoutfs resize-devices -p "$SCR" -m $half_meta
+scoutfs resize-devices -p "$SCR" -m $half_data
 same_totals

 echo "== resizing to full works"
-scoutfs resize-devices -p "$T_MSCR" -m $whole_meta -d $whole_data
+scoutfs resize-devices -p "$SCR" -m $whole_meta -d $whole_data
 devices_grew meta data

 echo "== cleanup extra fs"
-t_scratch_umount
+umount "$SCR"
+rmdir "$SCR"

 t_pass
@@ -32,7 +32,7 @@ echo "== dirs shouldn't appear in data_seq queries"
 mkdir "$DIR"
 ino=$(stat -c "%i" "$DIR")
 t_sync_seq_index
-query_index data_seq | awk '($4 == "'$ino'")'
+query_index data_seq | grep "$ino\>"

 echo "== two created files are present and come after each other"
 touch "$DIR/first"
@@ -92,13 +92,13 @@ test "$before" -lt "$after" || \
 # didn't skip past deleted dirty items
 #
 echo "== make sure dirtying doesn't livelock walk"
-dd if=/dev/urandom of="$DIR/dirtying" bs=4K count=1 >> "$T_TMPDIR/seqres.full" 2>&1
+dd if=/dev/urandom of="$DIR/dirtying" bs=4K count=1 >> $seqres.full 2>&1
 nr=1
 while [ "$nr" -lt 100 ]; do
-	echo "dirty/walk attempt $nr" >> "$T_TMPDIR/seqres.full"
+	echo "dirty/walk attempt $nr" >> $seqres.full
 	sync
 	dd if=/dev/urandom of="$DIR/dirtying" bs=4K count=1 conv=notrunc \
-		>> "$T_TMPDIR/seqres.full" 2>&1
+		>> $seqres.full 2>&1
 	scoutfs walk-inodes data_seq 0 -1 $DIR/dirtying >& /dev/null 
 	((nr++))
 done
@@ -12,12 +12,12 @@ create_file() {

 	if [ "$blocks" != 0 ]; then
 		dd if=/dev/urandom bs=4096 count=$blocks of="$file" \
-			>> "$T_TMPDIR/seqres.full" 2>&1
+			>> $seqres.full 2>&1
 	fi

 	if [ "$remainder" != 0 ]; then
 		dd if=/dev/urandom bs="$remainder" count=1 of="$file" \
-			conv=notrunc oflag=append >> "$T_TMPDIR/seqres.full" 2>&1
+			conv=notrunc oflag=append >> $seqres.full 2>&1
 	fi
 }

@@ -78,7 +78,7 @@ create_file "$FILE" $((4096 * 1024))
 cp "$FILE"  "$T_TMP"
 nr=1
 while [ "$nr" -lt 10 ]; do
-	echo "attempt $nr" >> "$T_TMPDIR/$seqres.full" 2>&1
+	echo "attempt $nr" >> $seqres.full 2>&1
 	release_vers "$FILE" stat 0 4096K
 	sync
 	echo 3 > /proc/sys/vm/drop_caches
@@ -1,43 +0,0 @@
-#
-# Exercise the SCOUTFS_IOC_INJECT_TOTL_DELTA ioctl that injects totl
-# deltas directly via totl-delta-inject(1).
-#
-
-t_require_commands setfattr scoutfs sync rm touch totl-delta-inject
-
-# force a log merge then read-xattr-totals filtered to our own keys
-read_totals()
-{
-	t_force_log_merge
-	sync
-	echo 1 > $(t_debugfs_path)/drop_weak_item_cache
-	scoutfs read-xattr-totals -p "$T_M0" | \
-		grep -E '^8888\.' || true
-}
-
-echo "== setup three files contributing to totl 8888.0.0"
-touch "$T_D0/f1" "$T_D0/f2" "$T_D0/f3"
-setfattr -n scoutfs.totl.inj.8888.0.0 -v 2  "$T_D0/f1"
-setfattr -n scoutfs.totl.inj.8888.0.0 -v 8  "$T_D0/f2"
-setfattr -n scoutfs.totl.inj.8888.0.0 -v 32 "$T_D0/f3"
-
-echo "== merge baseline into fs_root"
-read_totals
-
-echo "== inject (+128, +2) unbalances totl 8888.0.0"
-totl-delta-inject "$T_M0" 8888.0.0 128 2
-read_totals
-
-echo "== unlink f3 (value 32) produces a -32/-1 delta"
-rm -f "$T_D0/f3"
-read_totals
-
-echo "== inject (-128, -2) restores accounting for the remaining files"
-totl-delta-inject "$T_M0" 8888.0.0 -128 -2
-read_totals
-
-echo "== cleanup"
-rm -f "$T_D0/f1" "$T_D0/f2"
-read_totals
-
-t_pass
@@ -1,50 +0,0 @@
-#
-# Test that merge_read_item() correctly updates the sequence number when
-# combining delta items from multiple finalized log trees.  Each mount
-# sets a totl value in its own 3-bit lane (powers of 8) so that any
-# double-counting overflows the lane and is caught by: or(v, exp) != exp.
-#
-
-t_require_commands setfattr scoutfs
-t_require_mounts 5
-
-echo "== setup"
-for nr in $(t_fs_nrs); do
-	d=$(eval echo \$T_D$nr)
-	for i in $(seq 1 2500); do : > "$d/f$nr$i"; done
-done
-sync
-t_force_log_merge
-
-vals=(1 8 64 512 4096)
-expected=4681
-n=0
-for nr in $(t_fs_nrs); do
-	d=$(eval echo \$T_D$nr)
-	v=${vals[$((n++))]}
-	for i in $(seq 1 2500); do
-		setfattr -n "scoutfs.totl.t.$i.0.0" -v $v "$d/f$nr$i"
-	done
-done
-
-t_trigger_arm_silent log_merge_force_partial $(t_server_nr)
-
-bad="$T_TMPDIR/bad"
-for nr in $(t_fs_nrs); do
-	( while true; do
-		echo 1 > "$(t_debugfs_path $nr)/drop_weak_item_cache"
-		scoutfs read-xattr-totals -p "$(eval echo \$T_M$nr)" | \
-			awk -F'[ =,]+' -v e=$expected 'or($2+0,e) != e'
-	done ) >> "$bad" &
-done
-
-echo "expected $expected"
-t_force_log_merge
-t_silent_kill $(jobs -p)
-test -s "$bad" && echo "double-counted:" && cat "$bad"
-
-echo "== cleanup"
-for nr in $(t_fs_nrs); do
-	find "$(eval echo \$T_D$nr)" -name "f$nr*" -delete
-done
-t_pass
@@ -50,9 +50,9 @@ t_quiet sync
 cat << EOF > local.config
 export FSTYP=scoutfs
 export MKFS_OPTIONS="-f"
-export MKFS_TEST_OPTIONS="-Q 0,127.0.0.1,$T_TEST_PORT"
-export MKFS_SCRATCH_OPTIONS="-Q 0,127.0.0.1,$T_SCRATCH_PORT"
-export MKFS_DEV_OPTIONS="-Q 0,127.0.0.1,$T_DEV_PORT"
+export MKFS_TEST_OPTIONS="-Q 0,127.0.0.1,42000"
+export MKFS_SCRATCH_OPTIONS="-Q 0,127.0.0.1,43000"
+export MKFS_DEV_OPTIONS="-Q 0,127.0.0.1,44000"
 export TEST_DEV=$T_DB0
 export TEST_DIR=$T_M0
 export SCRATCH_META_DEV=$T_EX_META_DEV
@@ -63,47 +63,73 @@ export MOUNT_OPTIONS="-o quorum_slot_nr=0,metadev_path=$T_MB0"
 export TEST_FS_MOUNT_OPTS="-o quorum_slot_nr=0,metadev_path=$T_MB0"
 EOF

-cp "$T_EXTRA/local.exclude" local.exclude
+cat << EOF > local.exclude
+generic/003	# missing atime update in buffered read
+generic/075	# file content mismatch failures (fds, etc)
+generic/103	# enospc causes trans commit failures
+generic/108	# mount fails on failing device?
+generic/112	# file content mismatch failures (fds, etc)
+generic/213	# enospc causes trans commit failures
+generic/318	# can't support user namespaces until v5.11
+generic/321	# requires selinux enabled for '+' in ls?
+generic/338	# BUG_ON update inode error handling
+generic/347	# _dmthin_mount doesn't work?
+generic/356	# swap
+generic/357	# swap
+generic/409	# bind mounts not scripted yet
+generic/410	# bind mounts not scripted yet
+generic/411	# bind mounts not scripted yet
+generic/423	# symlink inode size is strlen() + 1 on scoutfs
+generic/430	# xfs_io copy_range missing in el7
+generic/431	# xfs_io copy_range missing in el7
+generic/432	# xfs_io copy_range missing in el7
+generic/433	# xfs_io copy_range missing in el7
+generic/434	# xfs_io copy_range missing in el7
+generic/441	# dm-mapper
+generic/444	# el9's posix_acl_update_mode is buggy ?
+generic/467	# open_by_handle ESTALE
+generic/472	# swap
+generic/484	# dm-mapper
+generic/493	# swap
+generic/494	# swap
+generic/495	# swap
+generic/496	# swap
+generic/497	# swap
+generic/532	# xfs_io statx attrib_mask missing in el7
+generic/554	# swap
+generic/563	# cgroup+loopdev
+generic/564	# xfs_io copy_range missing in el7
+generic/565	# xfs_io copy_range missing in el7
+generic/568	# falloc not resulting in block count increase
+generic/569	# swap
+generic/570	# swap
+generic/620	# dm-hugedisk
+generic/633	# id-mapped mounts missing in el7
+generic/636	# swap
+generic/641	# swap
+generic/643	# swap
+EOF

-t_stdout_invoked
+t_restore_output
 echo "  (showing output of xfstests)"

 args="-E local.exclude ${T_XFSTESTS_ARGS:--g quick}"
 ./check $args
 # the fs is unmounted when check finishes

-t_stdout_compare
-
 #
-# ./check writes the results of the run to check.log.  It lists the
-# tests it ran, skipped, or failed.  Then it writes a line saying
-# everything passed or some failed.
-#
-
-#
-# If XFSTESTS_ARGS were specified then we just pass/fail to match the
-# check run.
-#
-if [ -n "$T_XFSTESTS_ARGS" ]; then
-	if tail -1 results/check.log | grep -q "Failed"; then
-		t_fail
-	else
-		t_pass
-	fi
-fi
-
-#
-# Otherwise, typically, when there were no args then we scrape the most
-# recent run and use it as the output to compare to make sure that we
-# run the right tests and get the right results.
+# ./check writes the results of the run to check.log.  It lists
+# the tests it ran, skipped, or failed.  Then it writes a line saying
+# everything passed or some failed.  We scrape the most recent run and
+# use it as the output to compare to make sure that we run the right
+# tests and get the right results.
 #
 awk '
 	/^(Ran|Not run|Failures):.*/ {
 		if (pf) {
 			res=""
 			pf=""
-		}
-		res = res "\n" $0
+		} res = res "\n" $0
 	}
 	/^(Passed|Failed).*tests$/ {
 		pf=$0
@@ -113,14 +139,10 @@ awk '
 	}' < results/check.log  > "$T_TMPDIR/results"

 # put a test per line so diff shows tests that differ
-grep -E "^(Ran|Not run|Failures):" "$T_TMPDIR/results" | fmt -w 1 > "$T_TMPDIR/results.fmt"
-grep -E "^(Passed|Failed).*tests$" "$T_TMPDIR/results" >> "$T_TMPDIR/results.fmt"
+egrep "^(Ran|Not run|Failures):" "$T_TMPDIR/results" | \
+	fmt -w 1 > "$T_TMPDIR/results.fmt"
+egrep "^(Passed|Failed).*tests$" "$T_TMPDIR/results" >> "$T_TMPDIR/results.fmt"

-diff -u "$T_EXTRA/expected-results" "$T_TMPDIR/results.fmt" > "$T_TMPDIR/results.diff"
-if [ -s "$T_TMPDIR/results.diff" ]; then
-	echo "tests that were skipped/run differed from expected:"
-	cat "$T_TMPDIR/results.diff"
-	t_fail
-fi
+t_compare_output cat "$T_TMPDIR/results.fmt"

 t_pass
@@ -62,28 +62,32 @@ test -x "$SCOUTFS_FENCED_RUN" || \
 # files disappear.
 #

-# silence error messages
-quiet_cat()
+# generate failure messages to stderr while still echoing 0 for the caller
+careful_cat()
 {
-	cat "$@" 2>/dev/null
+	local path="$@"
+
+	cat "$@" || echo 0
 }

 while sleep $SCOUTFS_FENCED_DELAY; do
-	shopt -s nullglob
 	for fence in /sys/fs/scoutfs/*/fence/*; do
-
-		srv=$(basename $(dirname $(dirname $fence)))
-		fenced="$(quiet_cat $fence/fenced)"
-		error="$(quiet_cat $fence/error)"
-		rid="$(quiet_cat $fence/rid)"
-		ip="$(quiet_cat $fence/ipv4_addr)"
-		reason="$(quiet_cat $fence/reason)"
-
-		# request dirs can linger then disappear after fenced/error is set
-		if [ ! -d "$fence" -o "$fenced" == "1" -o "$error" == "1" ]; then
+		# catches unmatched regex when no dirs
+		if [ ! -d "$fence" ]; then
 			continue
 		fi

+		# skip requests that have been handled
+		if [ "$(careful_cat $fence/fenced)" == 1 -o \
+		     "$(careful_cat $fence/error)" == 1 ]; then
+			continue
+		fi
+
+		srv=$(basename $(dirname $(dirname $fence)))
+		rid="$(cat $fence/rid)"
+		ip="$(cat $fence/ipv4_addr)"
+		reason="$(cat $fence/reason)"
+
 		log_message "server $srv fencing rid $rid at IP $ip for $reason"

 		# export _REQ_ vars for run to use