Compare commits

15 Commits

Author SHA1 Message Date
Auke Kok
4e7cf76afc Drain conn workers before nulling client->conn in destroy
scoutfs_client_destroy nulled client->conn before scoutfs_net_free_conn
had a chance to drain the conn's workqueue.  An in-flight proc_worker
running client_lock_recover dispatches scoutfs_lock_recover_request
synchronously, which in turn calls scoutfs_client_lock_recover_response.
That helper reads client->conn and hands it to scoutfs_net_response, so
a racing NULL made submit_send dereference conn->lock and trip a KASAN
null-ptr-deref followed by a GPF.

This only became reachable in practice once reconnect started draining
pending client requests with -ECONNRESET: the farewell can now return
while the server is still sending requests on the re-established socket.

Reorder so scoutfs_net_free_conn runs first; its shutdown_worker drains
conn->workq before any memory is freed, then client->conn is nulled.
The original intent of nulling to catch buggy late callers is preserved.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-22 13:49:33 -07:00
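The fixed ordering can be sketched as a userspace model (hypothetical names; pthread_join stands in for shutdown_worker draining conn->workq, and this is not the actual scoutfs code): drain the worker that may still read client->conn, and only then null the pointer.

```c
#include <pthread.h>
#include <stdlib.h>

/* hypothetical userspace model of the fixed ordering; pthread_join
 * stands in for shutdown_worker draining conn->workq */
struct conn {
	pthread_t worker;
	int handled;		/* work done by the in-flight proc_worker */
};

struct client {
	struct conn *conn;	/* read by in-flight workers */
};

static void *proc_worker(void *arg)
{
	struct client *client = arg;

	/* response helpers read client->conn, like lock_recover_response */
	client->conn->handled++;
	return NULL;
}

/* drains (joins) workers before any conn memory goes away */
static void conn_free(struct conn *conn)
{
	pthread_join(conn->worker, NULL);
}

/* returns the drained worker's tally so callers can verify the drain */
static int client_destroy(struct client *client)
{
	struct conn *conn = client->conn;
	int handled;

	conn_free(conn);	/* workers done: nothing reads client->conn now */
	handled = conn->handled;
	client->conn = NULL;	/* still catches buggy late callers */
	free(conn);
	return handled;
}
```

Nulling first, as the old code did, would let the worker dereference a NULL conn; joining first makes the null a pure tripwire for late callers.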
Auke Kok
64200ed61c Bound RPC waits in idempotent background workers
The srch compact and orphan scan workers called sync request RPCs that
would block indefinitely if the server stopped answering.  Both workers
are idempotent and reschedule on error, so blocking forever buys
nothing compared to treating a stalled RPC as a failure and trying
again on the next tick.

Add scoutfs_net_sync_request_timeout, a bounded-wait variant that
returns -ETIMEDOUT if the response doesn't arrive in time.  Response
state lives on a refcounted heap allocation rather than the caller's
stack so a late callback can't scribble into freed memory.  On timeout
we race with an arriving response for the msend under conn->lock: if
find_request wins we queue_dead_free and drop the callback's ref;
otherwise we wait for the in-flight callback to complete before
returning.

Add _timeout typed wrappers for the four RPCs these workers use and
thread a 5 minute bound in from each worker.  All other callers keep
the unbounded client_sync_request path with its reconnect retries.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-22 13:49:27 -07:00
Auke Kok
fe43c624aa Fail pending client requests when reconnecting to new server
Previously, client_greeting spliced pending requests back onto send_queue
when reconnecting to a new server.  Those requests carried state from
the old server (sequence numbers, log tree references, lock modes) that
was reclaimed at fence time, so resending against the new server was
incorrect.

Drain pending requests with -ECONNRESET at greeting time, mirroring the
forcing_unmount drain in the shutdown worker.  Thread the lock pointer
through scoutfs_client_lock_request so the response callback can clear
request_pending and wake waiters on error; otherwise a lock_key_range
waiter would block forever because the new server's lock recovery only
reports granted modes, not pending requests.

Wrap the sync request senders in client_sync_request so userspace paths
(statfs, mkdir, sysfs volopt, resize ioctl, walk-inodes ioctl) retry
transparently across failover instead of surfacing a new -ECONNRESET
that callers never saw before.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-22 13:49:20 -07:00
Auke Kok
0360462a35 Clear ref_blkno output when block is already dirty
block_dirty_ref() skipped setting *ref_blkno when the block was
already dirty, leaving the caller with a stale value.  Set it to 0
on the already-dirty fast path so callers do not try to free a
block that was not allocated.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-22 13:49:19 -07:00
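The fix is plain output-parameter hygiene, sketched here with a hypothetical block type (not the scoutfs structures): every successful return defines *ref_blkno, so the already-dirty fast path writes 0 instead of leaving the caller's stale value.

```c
/* hypothetical block type; the point is output-parameter hygiene */
struct blk {
	int dirty;
	unsigned long long blkno;
};

static unsigned long long next_blkno = 1000;

static int block_dirty_ref(struct blk *b, unsigned long long *ref_blkno)
{
	if (b->dirty) {
		if (ref_blkno)
			*ref_blkno = 0;	/* 0: nothing newly allocated to free */
		return 0;
	}

	/* slow path: allocate a new blkno for the newly dirtied block */
	b->dirty = 1;
	b->blkno = next_blkno++;
	if (ref_blkno)
		*ref_blkno = b->blkno;
	return 0;
}
```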
Auke Kok
283564f9a2 Validate freed ref consistency in dirty_alloc_blocks
Add a WARN_ON_ONCE check that the freed list ref blkno matches the
block header blkno after dirtying alloc blocks.  Also save and restore
freed.first_nr on the error path, and initialize av_old/fr_old to 0
so the diagnostic message has valid values.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-22 13:49:18 -07:00
Auke Kok
e8f4d0b8cc Break retry_forever loop on normal unmount
retry_forever() only checked scoutfs_forcing_unmount(), so a normal
unmount with a network error in the commit path would loop forever.
Also check scoutfs_unmounting() so the write worker can exit cleanly.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-22 13:49:17 -07:00
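A minimal model of the loop condition (hypothetical flags standing in for scoutfs_forcing_unmount() and scoutfs_unmounting(); a sketch, not the scoutfs source): the fix adds the normal-unmount check so a persistent error no longer spins forever.

```c
#include <stdbool.h>

/* hypothetical flags modeling the unmount state checks */
static bool forcing_unmount;
static bool unmounting;

/* retry op until it succeeds, or until any kind of unmount begins */
static int retry_forever(int (*op)(void), int *attempts)
{
	int ret;

	for (;;) {
		ret = op();
		(*attempts)++;
		if (ret == 0)
			return 0;
		if (forcing_unmount || unmounting)	/* the fix adds 'unmounting' */
			return ret;
	}
}

/* an op that always fails, like a commit hitting network errors */
static int failing_op(void)
{
	return -1;
}
```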
Auke Kok
d52fa5d71b lock: clear coverage and skip invalidation during unmount
During normal unmount, lock_invalidate_worker can hang in
scoutfs_trans_sync(sb, 1) because the trans commit path may
return network errors that cause an infinite retry loop.

Skip full lock_invalidate() during shutdown and unmount, and
extract lock_clear_coverage() to still clean up coverage items
in those paths and in scoutfs_lock_destroy().  Without this,
coverage items can remain attached to locks being freed.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-22 13:49:16 -07:00
Auke Kok
2e45e9dc8c net: break out of sync request wait during unmount
Replace unbounded wait_for_completion() in scoutfs_net_sync_request()
with a 60 second timeout loop that checks scoutfs_unmounting(). Cancel
the queued request before returning -ESHUTDOWN so that sync_response
cannot fire on freed stack memory after the caller returns.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-22 13:49:15 -07:00
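The wait/cancel shape can be modeled in plain C (hypothetical types; one polling iteration stands in for a 60 second wait_for_completion_timeout() tick, and max_ticks bounds the sketch where the real loop keeps ticking): cancel before returning so a late response can't touch the caller's memory.

```c
#include <stdatomic.h>
#include <errno.h>

/* hypothetical model; one loop iteration stands in for a 60 second
 * wait_for_completion_timeout() tick so the sketch stays testable */
struct sync_req {
	atomic_int done;
	atomic_int cancelled;
	int error;
};

static atomic_int unmount_flag;		/* models scoutfs_unmounting() */

static void cancel_request(struct sync_req *req)
{
	/* after this, a late response can no longer touch req's memory */
	atomic_store(&req->cancelled, 1);
}

static int sync_wait(struct sync_req *req, int max_ticks)
{
	int tick;

	for (tick = 0; tick < max_ticks; tick++) {
		if (atomic_load(&req->done))
			return req->error;
		if (atomic_load(&unmount_flag)) {
			cancel_request(req);
			return -ESHUTDOWN;
		}
	}
	return -ETIMEDOUT;	/* sketch-only bound; real loop keeps ticking */
}
```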
Chris Kirby
d6a4034564 Suppress another forced shutdown error message
The "server error emptying freed" error was causing a
fence-and-reclaim test failure. In this case, the error
was -ENOLINK, which we should ignore for messaging purposes.

Signed-off-by: Chris Kirby <ckirby@versity.com>
2026-04-22 13:49:14 -07:00
Auke Kok
c2e8675a8c Wake up lock waiters to prevent hangs during unmount.
Add unmounting checks to lock_wait_cond() and lock_key_range() so
that lock waiters wake up and new lock requests fail with -ESHUTDOWN
during unmount. Replace the unbounded wait_event() with a 60 second
timeout to prevent indefinite hangs. Relax the WARN_ON_ONCE at
lock_key_range entry to only warn when not unmounting, since late
lock attempts during shutdown are expected.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-22 13:49:13 -07:00
Auke Kok
7c6c3d223d Add client timeout to farewell completion wait.
Replace unbounded wait_for_completion() with a 120 second timeout
to prevent indefinite hangs during unmount if the server never
responds to the farewell request.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-22 13:49:12 -07:00
Auke Kok
6ec131da03 Add cond_resched in block_free_work
I'm seeing consistent CPU soft lockups in block_free_work on my
bare metal system that don't reproduce on VM instances. The bare
metal machine has far more memory available, so the block free
work list grows much larger, and working through all of it can
take 30+ seconds.

This is all with a debug kernel; a non-debug kernel would likely
work through the backlog at a much faster rate.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-22 13:49:11 -07:00
Auke Kok
d76a217ff8 Set BLOCK_BIT_ERROR on bio submit failure during forced unmount
block_submit_bio returns -ENOLINK if called during a forced
shutdown; the bio is never submitted, so no completion callback
fires to set BLOCK_BIT_ERROR, and any other task waiting for this
specific bp would wait forever.

To fix, fall through to the existing block_end_io call on the
error path instead of returning directly.  That means moving
the forcing_unmount check past the setup calls so block_end_io's
bookkeeping stays balanced. block_end_io then sets BLOCK_BIT_ERROR
and wakes up waiters just as it would on a failed async completion.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-22 13:49:08 -07:00
Zach Brown
af31b9f1e8 Merge pull request #306 from versity/zab/v1.30
v1.30 Release
2026-04-22 10:43:17 -07:00
Zach Brown
ad65116d8f v1.30 Release
Finish the release notes for the 1.30 release.

Signed-off-by: Zach Brown <zab@versity.com>
2026-04-21 16:43:12 -07:00
17 changed files with 689 additions and 370 deletions

View File

@@ -1,6 +1,23 @@
Versity ScoutFS Release Notes
=============================
---
v1.30
\
*Apr 21, 2026*
Fix a problem reading the accumulated totals of contributing .totl.
xattrs when log merging is in progress. The problem would have readers
of the totals calculate the sums incorrectly.
Fix a problem updating quota rules. There was a race where updates
could be corrupted if they happened while a transaction was being
written.
Fix a problem deleting files with .indx. xattrs. The internal indexing
metadata wouldn't be properly deleted so the files would still claim to
be present and visible in the index, though the file no longer existed.
---
v1.29
\

View File

@@ -24,6 +24,7 @@
#include "trans.h"
#include "alloc.h"
#include "counters.h"
#include "msg.h"
#include "scoutfs_trace.h"
/*
@@ -496,10 +497,11 @@ static int dirty_alloc_blocks(struct super_block *sb,
struct scoutfs_block *fr_bl = NULL;
struct scoutfs_block *bl;
bool link_orig = false;
__le32 orig_first_nr;
u64 av_peek;
u64 av_old;
u64 av_old = 0;
u64 fr_peek;
u64 fr_old;
u64 fr_old = 0;
int ret;
if (alloc->dirty_avail_bl != NULL)
@@ -509,6 +511,7 @@ static int dirty_alloc_blocks(struct super_block *sb,
/* undo dirty freed if we get an error after */
orig_freed = alloc->freed.ref;
orig_first_nr = alloc->freed.first_nr;
if (alloc->dirty_avail_bl != NULL) {
ret = 0;
@@ -562,6 +565,17 @@ static int dirty_alloc_blocks(struct super_block *sb,
/* sort dirty avail to encourage contiguous sorted meta blocks */
list_block_sort(av_bl->data);
lblk = fr_bl->data;
if (WARN_ON_ONCE(alloc->freed.ref.blkno != lblk->hdr.blkno)) {
scoutfs_err(sb, "dirty_alloc freed ref %llu hdr %llu av_old %llu fr_old %llu av_peek %llu fr_peek %llu link_orig %d",
le64_to_cpu(alloc->freed.ref.blkno),
le64_to_cpu(lblk->hdr.blkno),
av_old, fr_old, av_peek, fr_peek, link_orig);
ret = -EIO;
goto out;
}
lblk = NULL;
if (av_old)
list_block_add(&alloc->freed, fr_bl->data, av_old);
if (fr_old)
@@ -578,6 +592,7 @@ out:
if (fr_bl)
scoutfs_block_writer_forget(sb, wri, fr_bl);
alloc->freed.ref = orig_freed;
alloc->freed.first_nr = orig_first_nr;
}
mutex_unlock(&alloc->mutex);

View File

@@ -218,6 +218,7 @@ static void block_free_work(struct work_struct *work)
llist_for_each_entry_safe(bp, tmp, deleted, free_node) {
block_free(sb, bp);
cond_resched();
}
}
@@ -467,9 +468,6 @@ static int block_submit_bio(struct super_block *sb, struct block_private *bp,
sector_t sector;
int ret = 0;
if (scoutfs_forcing_unmount(sb))
return -ENOLINK;
sector = bp->bl.blkno << (SCOUTFS_BLOCK_LG_SHIFT - 9);
WARN_ON_ONCE(bp->bl.blkno == U64_MAX);
@@ -480,6 +478,17 @@ static int block_submit_bio(struct super_block *sb, struct block_private *bp,
set_bit(BLOCK_BIT_IO_BUSY, &bp->bits);
block_get(bp);
/*
* A second thread may already be waiting on this block's completion
* after this thread won the race to submit the block. We exit through
* the block_end_io error path which sets BLOCK_BIT_ERROR and assures
* that other callers in the waitq get woken up.
*/
if (scoutfs_forcing_unmount(sb)) {
ret = -ENOLINK;
goto end_io;
}
blk_start_plug(&plug);
for (off = 0; off < SCOUTFS_BLOCK_LG_SIZE; off += PAGE_SIZE) {
@@ -517,6 +526,7 @@ static int block_submit_bio(struct super_block *sb, struct block_private *bp,
blk_finish_plug(&plug);
end_io:
/* let racing end_io know we're done */
block_end_io(sb, opf, bp, ret);
@@ -836,6 +846,8 @@ int scoutfs_block_dirty_ref(struct super_block *sb, struct scoutfs_alloc *alloc,
bp = BLOCK_PRIVATE(bl);
if (block_is_dirty(bp)) {
if (ref_blkno)
*ref_blkno = 0;
ret = 0;
goto out;
}

View File

@@ -59,6 +59,31 @@ struct client_info {
struct completion farewell_comp;
};
/*
* Reconnection to a new server completes pending sync requests with
* -ECONNRESET because their state in the old server was reclaimed at
* fence time. Transparently retry so callers don't surface the
* reconnect as a failed RPC; preserve the pre-drain behavior where a
* sync request was silently resent across failover. Shutdown paths
* break the loop via the errors that submit and wait already return.
*/
static int client_sync_request(struct super_block *sb,
struct scoutfs_net_connection *conn,
u8 cmd, void *arg, unsigned arg_len,
void *resp, size_t resp_len)
{
int ret;
for (;;) {
ret = scoutfs_net_sync_request(sb, conn, cmd, arg, arg_len,
resp, resp_len);
if (ret != -ECONNRESET)
return ret;
if (scoutfs_unmounting(sb) || scoutfs_forcing_unmount(sb))
return -ESHUTDOWN;
}
}
/*
* Ask for a new run of allocated inode numbers. The server can return
fewer than @count. It will succeed with nr == 0 if we've run out.
@@ -72,10 +97,10 @@ int scoutfs_client_alloc_inodes(struct super_block *sb, u64 count,
u64 tmp;
int ret;
ret = scoutfs_net_sync_request(sb, client->conn,
SCOUTFS_NET_CMD_ALLOC_INODES,
&lecount, sizeof(lecount),
&ial, sizeof(ial));
ret = client_sync_request(sb, client->conn,
SCOUTFS_NET_CMD_ALLOC_INODES,
&lecount, sizeof(lecount),
&ial, sizeof(ial));
if (ret == 0) {
*ino = le64_to_cpu(ial.ino);
*nr = le64_to_cpu(ial.nr);
@@ -94,9 +119,9 @@ int scoutfs_client_get_log_trees(struct super_block *sb,
{
struct client_info *client = SCOUTFS_SB(sb)->client_info;
return scoutfs_net_sync_request(sb, client->conn,
SCOUTFS_NET_CMD_GET_LOG_TREES,
NULL, 0, lt, sizeof(*lt));
return client_sync_request(sb, client->conn,
SCOUTFS_NET_CMD_GET_LOG_TREES,
NULL, 0, lt, sizeof(*lt));
}
int scoutfs_client_commit_log_trees(struct super_block *sb,
@@ -104,9 +129,9 @@ int scoutfs_client_commit_log_trees(struct super_block *sb,
{
struct client_info *client = SCOUTFS_SB(sb)->client_info;
return scoutfs_net_sync_request(sb, client->conn,
SCOUTFS_NET_CMD_COMMIT_LOG_TREES,
lt, sizeof(*lt), NULL, 0);
return client_sync_request(sb, client->conn,
SCOUTFS_NET_CMD_COMMIT_LOG_TREES,
lt, sizeof(*lt), NULL, 0);
}
int scoutfs_client_get_roots(struct super_block *sb,
@@ -114,9 +139,26 @@ int scoutfs_client_get_roots(struct super_block *sb,
{
struct client_info *client = SCOUTFS_SB(sb)->client_info;
return scoutfs_net_sync_request(sb, client->conn,
SCOUTFS_NET_CMD_GET_ROOTS,
NULL, 0, roots, sizeof(*roots));
return client_sync_request(sb, client->conn,
SCOUTFS_NET_CMD_GET_ROOTS,
NULL, 0, roots, sizeof(*roots));
}
/*
* Bounded-wait get_roots for the orphan scan worker. The worker
* reschedules on error, so -ETIMEDOUT is treated like any other RPC
* failure and retries on the next scan.
*/
int scoutfs_client_get_roots_timeout(struct super_block *sb,
struct scoutfs_net_roots *roots,
unsigned long timeout_jiffies)
{
struct client_info *client = SCOUTFS_SB(sb)->client_info;
return scoutfs_net_sync_request_timeout(sb, client->conn,
SCOUTFS_NET_CMD_GET_ROOTS,
NULL, 0, roots, sizeof(*roots),
timeout_jiffies);
}
int scoutfs_client_get_last_seq(struct super_block *sb, u64 *seq)
@@ -125,9 +167,9 @@ int scoutfs_client_get_last_seq(struct super_block *sb, u64 *seq)
__le64 last_seq;
int ret;
ret = scoutfs_net_sync_request(sb, client->conn,
SCOUTFS_NET_CMD_GET_LAST_SEQ,
NULL, 0, &last_seq, sizeof(last_seq));
ret = client_sync_request(sb, client->conn,
SCOUTFS_NET_CMD_GET_LAST_SEQ,
NULL, 0, &last_seq, sizeof(last_seq));
if (ret == 0)
*seq = le64_to_cpu(last_seq);
@@ -140,24 +182,34 @@ static int client_lock_response(struct super_block *sb,
void *resp, unsigned int resp_len,
int error, void *data)
{
struct scoutfs_lock *lock = data;
if (error) {
scoutfs_lock_request_failed(sb, lock);
return 0;
}
if (resp_len != sizeof(struct scoutfs_net_lock))
return -EINVAL;
/* XXX error? */
return scoutfs_lock_grant_response(sb, resp);
}
/* Send a lock request to the server. */
/*
* Send a lock request to the server. The lock is anchored by
* request_pending so its address is stable until the response callback
* runs and clears request_pending on either the grant or error path.
*/
int scoutfs_client_lock_request(struct super_block *sb,
struct scoutfs_net_lock *nl)
struct scoutfs_net_lock *nl,
struct scoutfs_lock *lock)
{
struct client_info *client = SCOUTFS_SB(sb)->client_info;
return scoutfs_net_submit_request(sb, client->conn,
SCOUTFS_NET_CMD_LOCK,
nl, sizeof(*nl),
client_lock_response, NULL, NULL);
client_lock_response, lock, NULL);
}
/* Send a lock response to the server. */
@@ -189,9 +241,26 @@ int scoutfs_client_srch_get_compact(struct super_block *sb,
{
struct client_info *client = SCOUTFS_SB(sb)->client_info;
return scoutfs_net_sync_request(sb, client->conn,
SCOUTFS_NET_CMD_SRCH_GET_COMPACT,
NULL, 0, sc, sizeof(*sc));
return client_sync_request(sb, client->conn,
SCOUTFS_NET_CMD_SRCH_GET_COMPACT,
NULL, 0, sc, sizeof(*sc));
}
/*
* Bounded-wait get_compact for the srch compact worker. The worker
* reschedules on any error and the compact work is idempotent, so
* -ETIMEDOUT just defers this round.
*/
int scoutfs_client_srch_get_compact_timeout(struct super_block *sb,
struct scoutfs_srch_compact *sc,
unsigned long timeout_jiffies)
{
struct client_info *client = SCOUTFS_SB(sb)->client_info;
return scoutfs_net_sync_request_timeout(sb, client->conn,
SCOUTFS_NET_CMD_SRCH_GET_COMPACT,
NULL, 0, sc, sizeof(*sc),
timeout_jiffies);
}
/* Commit the result of a srch file compaction. */
@@ -200,9 +269,27 @@ int scoutfs_client_srch_commit_compact(struct super_block *sb,
{
struct client_info *client = SCOUTFS_SB(sb)->client_info;
return scoutfs_net_sync_request(sb, client->conn,
SCOUTFS_NET_CMD_SRCH_COMMIT_COMPACT,
res, sizeof(*res), NULL, 0);
return client_sync_request(sb, client->conn,
SCOUTFS_NET_CMD_SRCH_COMMIT_COMPACT,
res, sizeof(*res), NULL, 0);
}
/*
* Bounded-wait commit_compact for the srch compact worker. The server
* ignores partial work flagged with ERROR, so a timed-out commit
* (marked ERROR on this side) lets the server reclaim our allocators
* and reassign the compact on the next scheduled attempt.
*/
int scoutfs_client_srch_commit_compact_timeout(struct super_block *sb,
struct scoutfs_srch_compact *res,
unsigned long timeout_jiffies)
{
struct client_info *client = SCOUTFS_SB(sb)->client_info;
return scoutfs_net_sync_request_timeout(sb, client->conn,
SCOUTFS_NET_CMD_SRCH_COMMIT_COMPACT,
res, sizeof(*res), NULL, 0,
timeout_jiffies);
}
int scoutfs_client_get_log_merge(struct super_block *sb,
@@ -210,9 +297,9 @@ int scoutfs_client_get_log_merge(struct super_block *sb,
{
struct client_info *client = SCOUTFS_SB(sb)->client_info;
return scoutfs_net_sync_request(sb, client->conn,
SCOUTFS_NET_CMD_GET_LOG_MERGE,
NULL, 0, req, sizeof(*req));
return client_sync_request(sb, client->conn,
SCOUTFS_NET_CMD_GET_LOG_MERGE,
NULL, 0, req, sizeof(*req));
}
int scoutfs_client_commit_log_merge(struct super_block *sb,
@@ -220,9 +307,9 @@ int scoutfs_client_commit_log_merge(struct super_block *sb,
{
struct client_info *client = SCOUTFS_SB(sb)->client_info;
return scoutfs_net_sync_request(sb, client->conn,
SCOUTFS_NET_CMD_COMMIT_LOG_MERGE,
comp, sizeof(*comp), NULL, 0);
return client_sync_request(sb, client->conn,
SCOUTFS_NET_CMD_COMMIT_LOG_MERGE,
comp, sizeof(*comp), NULL, 0);
}
int scoutfs_client_send_omap_response(struct super_block *sb, u64 id,
@@ -254,8 +341,30 @@ int scoutfs_client_open_ino_map(struct super_block *sb, u64 group_nr,
.req_id = 0,
};
return scoutfs_net_sync_request(sb, client->conn, SCOUTFS_NET_CMD_OPEN_INO_MAP,
&args, sizeof(args), map, sizeof(*map));
return client_sync_request(sb, client->conn, SCOUTFS_NET_CMD_OPEN_INO_MAP,
&args, sizeof(args), map, sizeof(*map));
}
/*
* Bounded-wait open_ino_map for the orphan scan worker. The scan
* reschedules on error; the delete path callers keep the unbounded
* retry.
*/
int scoutfs_client_open_ino_map_timeout(struct super_block *sb, u64 group_nr,
struct scoutfs_open_ino_map *map,
unsigned long timeout_jiffies)
{
struct client_info *client = SCOUTFS_SB(sb)->client_info;
struct scoutfs_open_ino_map_args args = {
.group_nr = cpu_to_le64(group_nr),
.req_id = 0,
};
return scoutfs_net_sync_request_timeout(sb, client->conn,
SCOUTFS_NET_CMD_OPEN_INO_MAP,
&args, sizeof(args),
map, sizeof(*map),
timeout_jiffies);
}
/* The client is asking the server for the current volume options */
@@ -263,8 +372,8 @@ int scoutfs_client_get_volopt(struct super_block *sb, struct scoutfs_volume_opti
{
struct client_info *client = SCOUTFS_SB(sb)->client_info;
return scoutfs_net_sync_request(sb, client->conn, SCOUTFS_NET_CMD_GET_VOLOPT,
NULL, 0, volopt, sizeof(*volopt));
return client_sync_request(sb, client->conn, SCOUTFS_NET_CMD_GET_VOLOPT,
NULL, 0, volopt, sizeof(*volopt));
}
/* The client is asking the server to update volume options */
@@ -272,8 +381,8 @@ int scoutfs_client_set_volopt(struct super_block *sb, struct scoutfs_volume_opti
{
struct client_info *client = SCOUTFS_SB(sb)->client_info;
return scoutfs_net_sync_request(sb, client->conn, SCOUTFS_NET_CMD_SET_VOLOPT,
volopt, sizeof(*volopt), NULL, 0);
return client_sync_request(sb, client->conn, SCOUTFS_NET_CMD_SET_VOLOPT,
volopt, sizeof(*volopt), NULL, 0);
}
/* The client is asking the server to clear volume options */
@@ -281,24 +390,24 @@ int scoutfs_client_clear_volopt(struct super_block *sb, struct scoutfs_volume_op
{
struct client_info *client = SCOUTFS_SB(sb)->client_info;
return scoutfs_net_sync_request(sb, client->conn, SCOUTFS_NET_CMD_CLEAR_VOLOPT,
volopt, sizeof(*volopt), NULL, 0);
return client_sync_request(sb, client->conn, SCOUTFS_NET_CMD_CLEAR_VOLOPT,
volopt, sizeof(*volopt), NULL, 0);
}
int scoutfs_client_resize_devices(struct super_block *sb, struct scoutfs_net_resize_devices *nrd)
{
struct client_info *client = SCOUTFS_SB(sb)->client_info;
return scoutfs_net_sync_request(sb, client->conn, SCOUTFS_NET_CMD_RESIZE_DEVICES,
nrd, sizeof(*nrd), NULL, 0);
return client_sync_request(sb, client->conn, SCOUTFS_NET_CMD_RESIZE_DEVICES,
nrd, sizeof(*nrd), NULL, 0);
}
int scoutfs_client_statfs(struct super_block *sb, struct scoutfs_net_statfs *nst)
{
struct client_info *client = SCOUTFS_SB(sb)->client_info;
return scoutfs_net_sync_request(sb, client->conn, SCOUTFS_NET_CMD_STATFS,
NULL, 0, nst, sizeof(*nst));
return client_sync_request(sb, client->conn, SCOUTFS_NET_CMD_STATFS,
NULL, 0, nst, sizeof(*nst));
}
/*
@@ -646,8 +755,12 @@ void scoutfs_client_destroy(struct super_block *sb)
client_farewell_response,
NULL, NULL);
if (ret == 0) {
wait_for_completion(&client->farewell_comp);
ret = client->farewell_error;
if (!wait_for_completion_timeout(&client->farewell_comp,
120 * HZ)) {
ret = -ETIMEDOUT;
} else {
ret = client->farewell_error;
}
}
if (ret) {
scoutfs_inc_counter(sb, client_farewell_error);
@@ -661,10 +774,16 @@ void scoutfs_client_destroy(struct super_block *sb)
/* make sure worker isn't using the conn */
cancel_delayed_work_sync(&client->connect_dwork);
/* make racing conn use explode */
/*
* Drain the conn's workers before nulling client->conn. In-flight
* proc_workers dispatch request handlers that call back into client
* response helpers (e.g. scoutfs_client_lock_recover_response) which
* read client->conn; nulling it first races with those workers and
* causes submit_send to dereference a NULL conn->lock.
*/
conn = client->conn;
client->conn = NULL;
scoutfs_net_free_conn(sb, conn);
client->conn = NULL;
if (client->workq)
destroy_workqueue(client->workq);

View File

@@ -9,18 +9,28 @@ int scoutfs_client_commit_log_trees(struct super_block *sb,
struct scoutfs_log_trees *lt);
int scoutfs_client_get_roots(struct super_block *sb,
struct scoutfs_net_roots *roots);
int scoutfs_client_get_roots_timeout(struct super_block *sb,
struct scoutfs_net_roots *roots,
unsigned long timeout_jiffies);
u64 *scoutfs_client_bulk_alloc(struct super_block *sb);
int scoutfs_client_get_last_seq(struct super_block *sb, u64 *seq);
int scoutfs_client_lock_request(struct super_block *sb,
struct scoutfs_net_lock *nl);
struct scoutfs_net_lock *nl,
struct scoutfs_lock *lock);
int scoutfs_client_lock_response(struct super_block *sb, u64 net_id,
struct scoutfs_net_lock *nl);
int scoutfs_client_lock_recover_response(struct super_block *sb, u64 net_id,
struct scoutfs_net_lock_recover *nlr);
int scoutfs_client_srch_get_compact(struct super_block *sb,
struct scoutfs_srch_compact *sc);
int scoutfs_client_srch_get_compact_timeout(struct super_block *sb,
struct scoutfs_srch_compact *sc,
unsigned long timeout_jiffies);
int scoutfs_client_srch_commit_compact(struct super_block *sb,
struct scoutfs_srch_compact *res);
int scoutfs_client_srch_commit_compact_timeout(struct super_block *sb,
struct scoutfs_srch_compact *res,
unsigned long timeout_jiffies);
int scoutfs_client_get_log_merge(struct super_block *sb,
struct scoutfs_log_merge_request *req);
int scoutfs_client_commit_log_merge(struct super_block *sb,
@@ -29,6 +39,9 @@ int scoutfs_client_send_omap_response(struct super_block *sb, u64 id,
struct scoutfs_open_ino_map *map);
int scoutfs_client_open_ino_map(struct super_block *sb, u64 group_nr,
struct scoutfs_open_ino_map *map);
int scoutfs_client_open_ino_map_timeout(struct super_block *sb, u64 group_nr,
struct scoutfs_open_ino_map *map,
unsigned long timeout_jiffies);
int scoutfs_client_get_volopt(struct super_block *sb, struct scoutfs_volume_options *volopt);
int scoutfs_client_set_volopt(struct super_block *sb, struct scoutfs_volume_options *volopt);
int scoutfs_client_clear_volopt(struct super_block *sb, struct scoutfs_volume_options *volopt);

View File

@@ -62,6 +62,7 @@
EXPAND_COUNTER(btree_walk) \
EXPAND_COUNTER(btree_walk_restart) \
EXPAND_COUNTER(client_farewell_error) \
EXPAND_COUNTER(client_rpc_timeout) \
EXPAND_COUNTER(corrupt_btree_block_level) \
EXPAND_COUNTER(corrupt_btree_no_child_ref) \
EXPAND_COUNTER(corrupt_dirent_backref_name_len) \
@@ -138,6 +139,7 @@
EXPAND_COUNTER(lock_lock_error) \
EXPAND_COUNTER(lock_nonblock_eagain) \
EXPAND_COUNTER(lock_recover_request) \
EXPAND_COUNTER(lock_request_failed) \
EXPAND_COUNTER(lock_shrink_attempted) \
EXPAND_COUNTER(lock_shrink_request_failed) \
EXPAND_COUNTER(lock_unlock) \

View File

@@ -2074,6 +2074,14 @@ void scoutfs_inode_schedule_orphan_dwork(struct super_block *sb)
}
}
/*
* Generous per-RPC bound for the idempotent orphan scan worker. A
* server that hasn't answered in this long is assumed to be broken;
* dropping the request lets the scan reschedule instead of blocking
* forever.
*/
#define ORPHAN_SCAN_RPC_TIMEOUT (5 * 60 * HZ)
/*
* Find and delete inodes whose only remaining reference is the
* persistent orphan item that was created as they were unlinked.
@@ -2128,7 +2136,7 @@ static void inode_orphan_scan_worker(struct work_struct *work)
init_orphan_key(&last, U64_MAX);
omap.args.group_nr = cpu_to_le64(U64_MAX);
ret = scoutfs_client_get_roots(sb, &roots);
ret = scoutfs_client_get_roots_timeout(sb, &roots, ORPHAN_SCAN_RPC_TIMEOUT);
if (ret)
goto out;
@@ -2169,7 +2177,8 @@ static void inode_orphan_scan_worker(struct work_struct *work)
scoutfs_omap_calc_group_nrs(ino, &group_nr, &bit_nr);
if (le64_to_cpu(omap.args.group_nr) != group_nr) {
ret = scoutfs_client_open_ino_map(sb, group_nr, &omap);
ret = scoutfs_client_open_ino_map_timeout(sb, group_nr, &omap,
ORPHAN_SCAN_RPC_TIMEOUT);
if (ret < 0)
goto out;
}

View File

@@ -71,6 +71,8 @@
* relative to that lock state we resend.
*/
#define CLIENT_LOCK_WAIT_TIMEOUT (60 * HZ)
/*
* allocated per-super, freed on unmount.
*/
@@ -157,6 +159,33 @@ static void invalidate_inode(struct super_block *sb, u64 ino)
}
}
/*
* Remove all coverage items from the lock to tell users that their
* cache is stale. This is lock-internal bookkeeping that is safe to
* call during shutdown and unmount. The unconditional unlock/relock
* of cov_list_lock avoids sparse warnings from unbalanced locking in
* the trylock failure path.
*/
static void lock_clear_coverage(struct super_block *sb,
struct scoutfs_lock *lock)
{
struct scoutfs_lock_coverage *cov;
spin_lock(&lock->cov_list_lock);
while ((cov = list_first_entry_or_null(&lock->cov_list,
struct scoutfs_lock_coverage, head))) {
if (spin_trylock(&cov->cov_lock)) {
list_del_init(&cov->head);
cov->lock = NULL;
spin_unlock(&cov->cov_lock);
scoutfs_inc_counter(sb, lock_invalidate_coverage);
}
spin_unlock(&lock->cov_list_lock);
spin_lock(&lock->cov_list_lock);
}
spin_unlock(&lock->cov_list_lock);
}
/*
* Invalidate caches associated with this lock. Either we're
* invalidating a write to a read or we're invalidating to null. We
@@ -166,7 +195,6 @@ static void invalidate_inode(struct super_block *sb, u64 ino)
static int lock_invalidate(struct super_block *sb, struct scoutfs_lock *lock,
enum scoutfs_lock_mode prev, enum scoutfs_lock_mode mode)
{
struct scoutfs_lock_coverage *cov;
u64 ino, last;
int ret = 0;
@@ -190,24 +218,7 @@ static int lock_invalidate(struct super_block *sb, struct scoutfs_lock *lock,
/* have to invalidate if we're not in the only usable case */
if (!(prev == SCOUTFS_LOCK_WRITE && mode == SCOUTFS_LOCK_READ)) {
/*
* Remove cov items to tell users that their cache is
* stale. The unlock pattern comes from avoiding bad
* sparse warnings when taking else in a failed trylock.
*/
spin_lock(&lock->cov_list_lock);
while ((cov = list_first_entry_or_null(&lock->cov_list,
struct scoutfs_lock_coverage, head))) {
if (spin_trylock(&cov->cov_lock)) {
list_del_init(&cov->head);
cov->lock = NULL;
spin_unlock(&cov->cov_lock);
scoutfs_inc_counter(sb, lock_invalidate_coverage);
}
spin_unlock(&lock->cov_list_lock);
spin_lock(&lock->cov_list_lock);
}
spin_unlock(&lock->cov_list_lock);
lock_clear_coverage(sb, lock);
/* invalidate inodes after removing coverage so drop/evict aren't covered */
if (lock->start.sk_zone == SCOUTFS_FS_ZONE) {
@@ -643,6 +654,33 @@ int scoutfs_lock_grant_response(struct super_block *sb,
return 0;
}
/*
* The lock request we sent to the server was dropped before we could
* receive a grant response. This happens when the client reconnects to
* a new server and completes pending requests with an error, since the
* old server's pending-request state was reclaimed at fence time.
*
* Clear request_pending so that a waiter in lock_key_range re-evaluates
* and sends a fresh request to the new server, and symmetrically put
* the lock so shrink's lru state matches the grant_response path.
*/
void scoutfs_lock_request_failed(struct super_block *sb,
struct scoutfs_lock *lock)
{
DECLARE_LOCK_INFO(sb, linfo);
scoutfs_inc_counter(sb, lock_request_failed);
spin_lock(&linfo->lock);
BUG_ON(!lock->request_pending);
lock->request_pending = 0;
wake_up(&lock->waitq);
put_lock(linfo, lock);
spin_unlock(&linfo->lock);
}
struct inv_req {
struct list_head head;
struct scoutfs_lock *lock;
@@ -714,10 +752,13 @@ static void lock_invalidate_worker(struct work_struct *work)
ireq = list_first_entry(&lock->inv_list, struct inv_req, head);
nl = &ireq->nl;
/* only lock protocol, inv can't call subsystems after shutdown */
if (!linfo->shutdown) {
/* only lock protocol, inv can't call subsystems after shutdown or unmount */
if (!linfo->shutdown && !scoutfs_unmounting(sb)) {
ret = lock_invalidate(sb, lock, nl->old_mode, nl->new_mode);
BUG_ON(ret < 0 && ret != -ENOLINK);
} else {
lock_clear_coverage(sb, lock);
scoutfs_item_invalidate(sb, &lock->start, &lock->end);
}
/* respond with the key and modes from the request, server might have died */
@@ -922,7 +963,7 @@ static bool try_shrink_lock(struct super_block *sb, struct lock_info *linfo, boo
spin_unlock(&linfo->lock);
if (lock) {
ret = scoutfs_client_lock_request(sb, &nl);
ret = scoutfs_client_lock_request(sb, &nl, lock);
if (ret < 0) {
scoutfs_inc_counter(sb, lock_shrink_request_failed);
@@ -953,6 +994,9 @@ static bool lock_wait_cond(struct super_block *sb, struct scoutfs_lock *lock,
!lock->request_pending;
spin_unlock(&linfo->lock);
if (!wake)
wake = scoutfs_unmounting(sb);
if (!wake)
scoutfs_inc_counter(sb, lock_wait);
@@ -997,8 +1041,10 @@ static int lock_key_range(struct super_block *sb, enum scoutfs_lock_mode mode, i
return -EINVAL;
/* maybe catch _setup() and _shutdown order mistakes */
if (WARN_ON_ONCE(!linfo || linfo->shutdown))
if (!linfo || linfo->shutdown) {
WARN_ON_ONCE(!scoutfs_unmounting(sb));
return -ENOLCK;
}
/* have to lock before entering transactions */
if (WARN_ON_ONCE(scoutfs_trans_held()))
@@ -1024,6 +1070,11 @@ static int lock_key_range(struct super_block *sb, enum scoutfs_lock_mode mode, i
break;
}
if (scoutfs_unmounting(sb)) {
ret = -ESHUTDOWN;
break;
}
/* the fast path where we can use the granted mode */
if (lock_modes_match(lock->mode, mode)) {
lock_inc_count(lock->users, mode);
@@ -1053,7 +1104,7 @@ static int lock_key_range(struct super_block *sb, enum scoutfs_lock_mode mode, i
nl.old_mode = lock->mode;
nl.new_mode = mode;
ret = scoutfs_client_lock_request(sb, &nl);
ret = scoutfs_client_lock_request(sb, &nl, lock);
if (ret) {
spin_lock(&linfo->lock);
lock->request_pending = 0;
@@ -1067,8 +1118,9 @@ static int lock_key_range(struct super_block *sb, enum scoutfs_lock_mode mode, i
if (flags & SCOUTFS_LKF_INTERRUPTIBLE) {
ret = wait_event_interruptible(lock->waitq,
lock_wait_cond(sb, lock, mode));
} else {
wait_event(lock->waitq, lock_wait_cond(sb, lock, mode));
} else if (!wait_event_timeout(lock->waitq,
lock_wait_cond(sb, lock, mode),
CLIENT_LOCK_WAIT_TIMEOUT)) {
ret = 0;
}
@@ -1650,6 +1702,7 @@ void scoutfs_lock_destroy(struct super_block *sb)
list_del_init(&lock->inv_head);
lock->invalidate_pending = 0;
}
lock_clear_coverage(sb, lock);
lock_remove(linfo, lock);
lock_free(linfo, lock);
}

View File

@@ -60,6 +60,8 @@ struct scoutfs_lock_coverage {
int scoutfs_lock_grant_response(struct super_block *sb,
struct scoutfs_net_lock *nl);
void scoutfs_lock_request_failed(struct super_block *sb,
struct scoutfs_lock *lock);
int scoutfs_lock_invalidate_request(struct super_block *sb, u64 net_id,
struct scoutfs_net_lock *nl);
int scoutfs_lock_recover_request(struct super_block *sb, u64 net_id,

View File

@@ -1750,8 +1750,10 @@ void scoutfs_net_client_greeting(struct super_block *sb,
bool new_server)
{
struct net_info *ninf = SCOUTFS_SB(sb)->net_info;
scoutfs_net_response_t resp_func;
struct message_send *msend;
struct message_send *tmp;
void *resp_data;
/* only called on client connections :/ */
BUG_ON(conn->listening_conn);
@@ -1760,10 +1762,32 @@ void scoutfs_net_client_greeting(struct super_block *sb,
if (new_server) {
atomic64_set(&conn->recv_seq, 0);
/* drop stale responses; old server's state is gone */
list_for_each_entry_safe(msend, tmp, &conn->resend_queue, head) {
if (nh_is_response(&msend->nh))
free_msend(ninf, conn, msend);
}
/*
* Complete pending requests with -ECONNRESET. Any state
* they depended on in the old server was reclaimed at
* fence time, so resending is wrong. Callers re-issue on
* the new server if they still care.
*/
while ((msend = list_first_entry_or_null(&conn->resend_queue,
struct message_send, head))) {
if (nh_is_response(&msend->nh))
break;
resp_func = msend->resp_func;
resp_data = msend->resp_data;
free_msend(ninf, conn, msend);
spin_unlock(&conn->lock);
call_resp_func(sb, conn, resp_func, resp_data, NULL, 0, -ECONNRESET);
spin_lock(&conn->lock);
}
}
set_valid_greeting(conn);
@@ -1990,8 +2014,9 @@ static int sync_response(struct super_block *sb,
* buffer. Errors returned can come from the remote request processing
* or local failure to send.
*
* The wait for the response is interruptible and can return
* -ERESTARTSYS if it is interrupted.
* The wait for the response uses a 60 second timeout loop that
* checks for unmount, returning -ESHUTDOWN if the mount is
* being torn down.
*
* -EOVERFLOW is returned if the response message's data_length doesn't
* match the caller's resp_len buffer.
@@ -2002,6 +2027,7 @@ int scoutfs_net_sync_request(struct super_block *sb,
void *resp, size_t resp_len)
{
struct sync_request_completion sreq;
struct message_send *msend;
int ret;
u64 id;
@@ -2014,13 +2040,124 @@ int scoutfs_net_sync_request(struct super_block *sb,
sync_response, &sreq, &id);
if (ret == 0) {
wait_for_completion(&sreq.comp);
ret = sreq.error;
while (!wait_for_completion_timeout(&sreq.comp, 60 * HZ)) {
if (scoutfs_unmounting(sb)) {
ret = -ESHUTDOWN;
break;
}
}
if (ret == -ESHUTDOWN) {
spin_lock(&conn->lock);
msend = find_request(conn, cmd, id);
if (msend)
queue_dead_free(conn, msend);
spin_unlock(&conn->lock);
} else {
ret = sreq.error;
}
}
return ret;
}
/*
* A bounded-wait variant of sync_request for idempotent background
* workers that must reschedule instead of blocking indefinitely on an
* unresponsive server. Returns -ETIMEDOUT if the response doesn't
* arrive within timeout_jiffies; the caller then treats it like any
* other RPC failure and retries on its normal reschedule cadence.
*
* Response state lives in a refcounted heap allocation rather than on
* the caller's stack so a late callback can't scribble into freed
* memory if we give up waiting. On timeout we race with an arriving
* response for the msend: if find_request wins we queue_dead_free and
* the callback won't fire (we drop its ref); otherwise the callback is
* already running so we wait for it to complete before returning.
*/
struct bounded_sync {
struct completion comp;
void *resp;
unsigned int resp_len;
int error;
atomic_t refs;
};
static void bounded_sync_put(struct bounded_sync *bs)
{
if (atomic_dec_and_test(&bs->refs))
kfree(bs);
}
static int bounded_sync_response(struct super_block *sb,
struct scoutfs_net_connection *conn,
void *resp, unsigned int resp_len,
int error, void *data)
{
struct bounded_sync *bs = data;
if (error == 0 && resp_len != bs->resp_len)
error = -EMSGSIZE;
if (error)
bs->error = error;
else if (resp_len)
memcpy(bs->resp, resp, resp_len);
complete(&bs->comp);
bounded_sync_put(bs);
return 0;
}
int scoutfs_net_sync_request_timeout(struct super_block *sb,
struct scoutfs_net_connection *conn,
u8 cmd, void *arg, unsigned arg_len,
void *resp, size_t resp_len,
unsigned long timeout_jiffies)
{
struct message_send *msend;
struct bounded_sync *bs;
int ret;
u64 id;
bs = kzalloc(sizeof(*bs), GFP_NOFS);
if (!bs)
return -ENOMEM;
init_completion(&bs->comp);
bs->resp = resp;
bs->resp_len = resp_len;
bs->error = 0;
atomic_set(&bs->refs, 2);
ret = scoutfs_net_submit_request(sb, conn, cmd, arg, arg_len,
bounded_sync_response, bs, &id);
if (ret) {
bounded_sync_put(bs);
bounded_sync_put(bs);
return ret;
}
if (wait_for_completion_timeout(&bs->comp, timeout_jiffies) == 0) {
scoutfs_inc_counter(sb, client_rpc_timeout);
spin_lock(&conn->lock);
msend = find_request(conn, cmd, id);
if (msend)
queue_dead_free(conn, msend);
spin_unlock(&conn->lock);
if (msend)
bounded_sync_put(bs);
else
wait_for_completion(&bs->comp);
ret = -ETIMEDOUT;
} else {
ret = bs->error;
}
bounded_sync_put(bs);
return ret;
}
static void net_tseq_show_conn(struct seq_file *m,
struct scoutfs_tseq_entry *ent)
{

View File

@@ -150,6 +150,11 @@ int scoutfs_net_sync_request(struct super_block *sb,
struct scoutfs_net_connection *conn,
u8 cmd, void *arg, unsigned arg_len,
void *resp, size_t resp_len);
int scoutfs_net_sync_request_timeout(struct super_block *sb,
struct scoutfs_net_connection *conn,
u8 cmd, void *arg, unsigned arg_len,
void *resp, size_t resp_len,
unsigned long timeout_jiffies);
int scoutfs_net_response(struct super_block *sb,
struct scoutfs_net_connection *conn,
u8 cmd, u64 id, int error, void *resp, u16 resp_len);

View File

@@ -2620,60 +2620,24 @@ TRACE_EVENT(scoutfs_block_dirty_ref,
);
TRACE_EVENT(scoutfs_get_file_block,
TP_PROTO(struct super_block *sb, u64 blkno, int flags,
struct scoutfs_srch_block *srb),
TP_PROTO(struct super_block *sb, u64 blkno, int flags),
TP_ARGS(sb, blkno, flags, srb),
TP_ARGS(sb, blkno, flags),
TP_STRUCT__entry(
SCSB_TRACE_FIELDS
__field(__u64, blkno)
__field(__u32, entry_nr)
__field(__u32, entry_bytes)
__field(int, flags)
__field(__u64, first_hash)
__field(__u64, first_ino)
__field(__u64, first_id)
__field(__u64, last_hash)
__field(__u64, last_ino)
__field(__u64, last_id)
),
TP_fast_assign(
SCSB_TRACE_ASSIGN(sb);
__entry->blkno = blkno;
__entry->entry_nr = __le32_to_cpu(srb->entry_nr);
__entry->entry_bytes = __le32_to_cpu(srb->entry_bytes);
__entry->flags = flags;
__entry->first_hash = __le64_to_cpu(srb->first.hash);
__entry->first_ino = __le64_to_cpu(srb->first.ino);
__entry->first_id = __le64_to_cpu(srb->first.id);
__entry->last_hash = __le64_to_cpu(srb->last.hash);
__entry->last_ino = __le64_to_cpu(srb->last.ino);
__entry->last_id = __le64_to_cpu(srb->last.id);
),
TP_printk(SCSBF" blkno %llu nr %u bytes %u flags 0x%x first_hash 0x%llx first_ino %llu first_id 0x%llx last_hash 0x%llx last_ino %llu last_id 0x%llx",
SCSB_TRACE_ARGS, __entry->blkno, __entry->entry_nr,
__entry->entry_bytes, __entry->flags,
__entry->first_hash, __entry->first_ino, __entry->first_id,
__entry->last_hash, __entry->last_ino, __entry->last_id)
);
TRACE_EVENT(scoutfs_srch_new_merge,
TP_PROTO(struct super_block *sb),
TP_ARGS(sb),
TP_STRUCT__entry(
SCSB_TRACE_FIELDS
),
TP_fast_assign(
SCSB_TRACE_ASSIGN(sb);
),
TP_printk(SCSBF, SCSB_TRACE_ARGS)
TP_printk(SCSBF" blkno %llu flags 0x%x",
SCSB_TRACE_ARGS, __entry->blkno, __entry->flags)
);
TRACE_EVENT(scoutfs_block_stale,

View File

@@ -95,6 +95,13 @@ struct srch_info {
*/
#define SRCH_COMPACT_DIRTY_LIMIT_BYTES (32 * 1024 * 1024)
/*
* Generous per-RPC bound for the idempotent compact worker. A server
* that hasn't answered in this long is assumed to be broken; dropping
* the request lets the worker reschedule instead of blocking forever.
*/
#define COMPACT_RPC_TIMEOUT (5 * 60 * HZ)
static int sre_cmp(const struct scoutfs_srch_entry *a,
const struct scoutfs_srch_entry *b)
{
@@ -443,7 +450,7 @@ out:
sfl->blocks = cpu_to_le64(blk + 1);
if (bl) {
trace_scoutfs_get_file_block(sb, bl->blkno, flags, bl->data);
trace_scoutfs_get_file_block(sb, bl->blkno, flags);
}
*bl_ret = bl;
@@ -1525,65 +1532,6 @@ static bool should_commit(struct super_block *sb, struct scoutfs_alloc *alloc,
scoutfs_alloc_meta_low(sb, alloc, nr);
}
static int alloc_srch_block(struct super_block *sb, struct scoutfs_alloc *alloc,
struct scoutfs_block_writer *wri,
struct scoutfs_srch_file *sfl,
struct scoutfs_block **bl,
u64 blk)
{
DECLARE_SRCH_INFO(sb, srinf);
int ret;
if (atomic_read(&srinf->shutdown))
return -ESHUTDOWN;
/* could grow and dirty to a leaf */
if (should_commit(sb, alloc, wri, sfl->height + 1))
return -EAGAIN;
ret = get_file_block(sb, alloc, wri, sfl, GFB_INSERT | GFB_DIRTY,
blk, bl);
if (ret < 0)
return ret;
scoutfs_inc_counter(sb, srch_compact_dirty_block);
return 0;
}
static int emit_srch_entry(struct super_block *sb,
struct scoutfs_srch_file *sfl,
struct scoutfs_srch_block *srb,
struct scoutfs_srch_entry *sre,
u64 blk)
{
int ret;
ret = encode_entry(srb->entries + le32_to_cpu(srb->entry_bytes),
sre, &srb->tail);
if (WARN_ON_ONCE(ret <= 0)) {
/* shouldn't happen */
return -EIO;
}
if (srb->entry_bytes == 0) {
if (blk == 0)
sfl->first = *sre;
srb->first = *sre;
}
le32_add_cpu(&srb->entry_nr, 1);
le32_add_cpu(&srb->entry_bytes, ret);
srb->last = *sre;
srb->tail = *sre;
sfl->last = *sre;
le64_add_cpu(&sfl->entries, 1);
scoutfs_inc_counter(sb, srch_compact_entry);
return 0;
}
struct tourn_node {
struct scoutfs_srch_entry sre;
int ind;
@@ -1606,8 +1554,7 @@ static void tourn_update(struct tourn_node *tnodes, struct tourn_node *tn)
}
/* return the entry at the current position, can return enoent if done */
typedef int (*kway_get_t)(struct super_block *sb, struct scoutfs_alloc *alloc,
struct scoutfs_block_writer *wri,
typedef int (*kway_get_t)(struct super_block *sb,
struct scoutfs_srch_entry *sre_ret, void *arg);
/* only called after _get returns 0, advances to next entry for _get */
typedef void (*kway_advance_t)(struct super_block *sb, void *arg);
@@ -1619,18 +1566,20 @@ static int kway_merge(struct super_block *sb,
kway_get_t kway_get, kway_advance_t kway_adv,
void **args, int nr, bool logs_input)
{
DECLARE_SRCH_INFO(sb, srinf);
struct scoutfs_srch_block *srb = NULL;
struct scoutfs_srch_entry tmp_entry = {0};
struct scoutfs_srch_entry last_tail;
struct scoutfs_block *bl = NULL;
struct tourn_node *tnodes;
struct tourn_node *leaves;
struct tourn_node *root;
struct tourn_node *tn;
bool have_tmp = false;
int last_bytes = 0;
int nr_parents;
int nr_nodes;
int empty = 0;
int ret = 0;
int diff;
u64 blk;
int ind;
int i;
@@ -1654,7 +1603,7 @@ static int kway_merge(struct super_block *sb,
for (i = 0; i < nr; i++) {
tn = &leaves[i];
tn->ind = i;
ret = kway_get(sb, NULL, NULL, &tn->sre, args[i]);
ret = kway_get(sb, &tn->sre, args[i]);
if (ret == 0) {
tourn_update(tnodes, &leaves[i]);
} else if (ret == -ENOENT) {
@@ -1664,68 +1613,97 @@ static int kway_merge(struct super_block *sb,
}
}
trace_scoutfs_srch_new_merge(sb);
/* always append new blocks */
blk = le64_to_cpu(sfl->blocks);
while (empty < nr) {
if (sre_cmp(&root->sre, &tmp_entry) != 0) {
if (have_tmp) {
if (bl == NULL) {
ret = alloc_srch_block(sb, alloc, wri,
sfl, &bl, blk);
if (ret < 0)
goto out;
srb = bl->data;
}
ret = emit_srch_entry(sb, sfl, srb, &tmp_entry,
blk);
if (ret < 0)
goto out;
if (le32_to_cpu(srb->entry_bytes) >
SCOUTFS_SRCH_BLOCK_SAFE_BYTES) {
scoutfs_block_put(sb, bl);
bl = NULL;
blk++;
memset(&tmp_entry, 0, sizeof(tmp_entry));
have_tmp = false;
continue;
}
/*
* end sorted block on _SAFE offset for
* testing
*/
if (bl && le32_to_cpu(srb->entry_nr) == 1 &&
logs_input &&
scoutfs_trigger(sb, SRCH_COMPACT_LOGS_PAD_SAFE)) {
pad_entries_at_safe(sfl, srb);
scoutfs_block_put(sb, bl);
bl = NULL;
blk++;
memset(&tmp_entry, 0, sizeof(tmp_entry));
have_tmp = false;
continue;
}
if (bl == NULL) {
if (atomic_read(&srinf->shutdown)) {
ret = -ESHUTDOWN;
goto out;
}
tmp_entry = root->sre;
have_tmp = true;
/* could grow and dirty to a leaf */
if (should_commit(sb, alloc, wri, sfl->height + 1)) {
ret = 0;
goto out;
}
ret = get_file_block(sb, alloc, wri, sfl,
GFB_INSERT | GFB_DIRTY, blk, &bl);
if (ret < 0)
goto out;
srb = bl->data;
scoutfs_inc_counter(sb, srch_compact_dirty_block);
}
if (sre_cmp(&root->sre, &srb->last) != 0) {
last_bytes = le32_to_cpu(srb->entry_bytes);
last_tail = srb->last;
ret = encode_entry(srb->entries +
le32_to_cpu(srb->entry_bytes),
&root->sre, &srb->tail);
if (WARN_ON_ONCE(ret <= 0)) {
/* shouldn't happen */
ret = -EIO;
goto out;
}
if (srb->entry_bytes == 0) {
if (blk == 0)
sfl->first = root->sre;
srb->first = root->sre;
}
le32_add_cpu(&srb->entry_nr, 1);
le32_add_cpu(&srb->entry_bytes, ret);
srb->last = root->sre;
srb->tail = root->sre;
sfl->last = root->sre;
le64_add_cpu(&sfl->entries, 1);
ret = 0;
if (le32_to_cpu(srb->entry_bytes) >
SCOUTFS_SRCH_BLOCK_SAFE_BYTES) {
scoutfs_block_put(sb, bl);
bl = NULL;
blk++;
}
/* end sorted block on _SAFE offset for testing */
if (bl && le32_to_cpu(srb->entry_nr) == 1 && logs_input &&
scoutfs_trigger(sb, SRCH_COMPACT_LOGS_PAD_SAFE)) {
pad_entries_at_safe(sfl, srb);
scoutfs_block_put(sb, bl);
bl = NULL;
blk++;
}
scoutfs_inc_counter(sb, srch_compact_entry);
} else {
/*
* Duplicate entries indicate deletion so we
* undo the previously cached tmp entry and ignore
* undo the previously encoded entry and ignore
* this entry. This only happens within each
* block. Deletions can span block boundaries
* and will be filtered out by search and
* hopefully removed in future compactions.
*/
memset(&tmp_entry, 0, sizeof(tmp_entry));
have_tmp = false;
diff = le32_to_cpu(srb->entry_bytes) - last_bytes;
if (diff) {
memset(srb->entries + last_bytes, 0, diff);
if (srb->entry_bytes == 0) {
/* last_tail will be 0 */
if (blk == 0)
sfl->first = last_tail;
srb->first = last_tail;
}
le32_add_cpu(&srb->entry_nr, -1);
srb->entry_bytes = cpu_to_le32(last_bytes);
srb->last = last_tail;
srb->tail = last_tail;
sfl->last = last_tail;
le64_add_cpu(&sfl->entries, -1);
}
scoutfs_inc_counter(sb, srch_compact_removed_entry);
}
@@ -1734,7 +1712,7 @@ static int kway_merge(struct super_block *sb,
ind = root->ind;
tn = &leaves[ind];
kway_adv(sb, args[ind]);
ret = kway_get(sb, alloc, wri, &tn->sre, args[ind]);
ret = kway_get(sb, &tn->sre, args[ind]);
if (ret == -ENOENT) {
/* this index is done */
memset(&tn->sre, 0xff, sizeof(tn->sre));
@@ -1768,21 +1746,6 @@ static int kway_merge(struct super_block *sb,
/* could stream a final index.. arguably a small portion of work */
out:
if (have_tmp) {
bool emit = true;
if (bl == NULL) {
ret = alloc_srch_block(sb, alloc, wri, sfl, &bl, blk);
if (ret)
emit = false;
else
srb = bl->data;
}
if (emit)
ret = emit_srch_entry(sb, sfl, srb, &tmp_entry, blk);
}
scoutfs_block_put(sb, bl);
vfree(tnodes);
return ret;
@@ -1795,8 +1758,7 @@ static struct scoutfs_srch_entry *page_priv_sre(struct page *page)
return (struct scoutfs_srch_entry *)page_address(page) + page->private;
}
static int kway_get_page(struct super_block *sb, struct scoutfs_alloc *alloc,
struct scoutfs_block_writer *wri,
static int kway_get_page(struct super_block *sb,
struct scoutfs_srch_entry *sre_ret, void *arg)
{
struct page *page = arg;
@@ -2003,8 +1965,7 @@ struct kway_file_reader {
int decoded_bytes;
};
static int kway_get_reader(struct super_block *sb, struct scoutfs_alloc *alloc,
struct scoutfs_block_writer *wri,
static int kway_get_reader(struct super_block *sb,
struct scoutfs_srch_entry *sre_ret, void *arg)
{
struct kway_file_reader *rdr = arg;
@@ -2015,17 +1976,6 @@ static int kway_get_reader(struct super_block *sb, struct scoutfs_alloc *alloc,
return -ENOENT;
if (rdr->bl == NULL) {
/*
* Each new output block has the possibility of winding up with
* a straggler entry due to our need to look ahead an entry so
* that we don't wind up emitting an empty block.
*
* Make sure there's space to emit the straggler before starting
* another input block.
*/
if (alloc && should_commit(sb, alloc, wri, 16))
return -ENOENT;
ret = get_file_block(sb, NULL, NULL, rdr->sfl, 0, rdr->blk,
&rdr->bl);
if (ret < 0)
@@ -2039,11 +1989,6 @@ static int kway_get_reader(struct super_block *sb, struct scoutfs_alloc *alloc,
rdr->skip > SCOUTFS_SRCH_BLOCK_SAFE_BYTES ||
rdr->skip >= le32_to_cpu(srb->entry_bytes)) {
/* XXX inconsistency */
scoutfs_err(sb, "blkno %llu pos %u vs %ld, skip %u, bytes %u",
__le64_to_cpu(srb->hdr.blkno),
rdr->pos, SCOUTFS_SRCH_BLOCK_SAFE_BYTES,
rdr->skip,
le32_to_cpu(srb->entry_bytes));
return -EIO;
}
@@ -2318,7 +2263,8 @@ static void scoutfs_srch_compact_worker(struct work_struct *work)
scoutfs_block_writer_init(sb, &wri);
ret = scoutfs_client_srch_get_compact(sb, sc);
ret = scoutfs_client_srch_get_compact_timeout(sb, sc,
COMPACT_RPC_TIMEOUT);
if (ret >= 0)
trace_scoutfs_srch_compact_client_recv(sb, sc);
if (ret < 0 || sc->nr == 0)
@@ -2349,7 +2295,8 @@ static void scoutfs_srch_compact_worker(struct work_struct *work)
sc->flags |= ret < 0 ? SCOUTFS_SRCH_COMPACT_FLAG_ERROR : 0;
trace_scoutfs_srch_compact_client_send(sb, sc);
err = scoutfs_client_srch_commit_compact(sb, sc);
err = scoutfs_client_srch_commit_compact_timeout(sb, sc,
COMPACT_RPC_TIMEOUT);
if (err < 0 && ret == 0)
ret = err;
out:

View File

@@ -195,7 +195,8 @@ static int retry_forever(struct super_block *sb, int (*func)(struct super_block
retrying = true;
}
if (scoutfs_forcing_unmount(sb)) {
if (scoutfs_forcing_unmount(sb) ||
scoutfs_unmounting(sb)) {
ret = -ENOLINK;
break;
}

View File

@@ -1,7 +1,37 @@
== initialize per-mount values
== arm compaction triggers
trigger srch_compact_logs_pad_safe armed: 1
trigger srch_merge_stop_safe armed: 1
trigger srch_compact_logs_pad_safe armed: 1
trigger srch_merge_stop_safe armed: 1
trigger srch_compact_logs_pad_safe armed: 1
trigger srch_merge_stop_safe armed: 1
trigger srch_compact_logs_pad_safe armed: 1
trigger srch_merge_stop_safe armed: 1
trigger srch_compact_logs_pad_safe armed: 1
trigger srch_merge_stop_safe armed: 1
== compact more often
== create padded sorted inputs by forcing log rotation
trigger srch_force_log_rotate armed: 1
trigger srch_force_log_rotate armed: 1
trigger srch_force_log_rotate armed: 1
trigger srch_force_log_rotate armed: 1
trigger srch_compact_logs_pad_safe armed: 1
trigger srch_force_log_rotate armed: 1
trigger srch_force_log_rotate armed: 1
trigger srch_force_log_rotate armed: 1
trigger srch_force_log_rotate armed: 1
trigger srch_compact_logs_pad_safe armed: 1
trigger srch_force_log_rotate armed: 1
trigger srch_force_log_rotate armed: 1
trigger srch_force_log_rotate armed: 1
trigger srch_force_log_rotate armed: 1
trigger srch_compact_logs_pad_safe armed: 1
trigger srch_force_log_rotate armed: 1
trigger srch_force_log_rotate armed: 1
trigger srch_force_log_rotate armed: 1
trigger srch_force_log_rotate armed: 1
trigger srch_compact_logs_pad_safe armed: 1
== compaction of padded should stop at safe
== verify no compaction errors
== cleanup

View File

@@ -90,7 +90,7 @@ done
# set some T_ defaults
T_TRACE_DUMP="0"
T_TRACE_PRINTK=""
T_TRACE_PRINTK="0"
T_PORT_START="19700"
T_LOOP_ITER="1"
@@ -137,9 +137,6 @@ while true; do
test -n "$2" || die "-l must have a nr iterations argument"
test "$2" -eq "$2" 2>/dev/null || die "-l <nr> argument must be an integer"
T_LOOP_ITER="$2"
# when looping, break after first failure
T_ABORT="1"
shift
;;
-M)
@@ -402,44 +399,31 @@ if [ -n "$T_INSMOD" ]; then
cmd insmod "$T_MODULE"
fi
start_tracing() {
if [ -n "$T_TRACE_MULT" ]; then
orig_trace_size=1408
mult_trace_size=$((orig_trace_size * T_TRACE_MULT))
msg "increasing trace buffer size from $orig_trace_size KiB to $mult_trace_size KiB"
echo $mult_trace_size > /sys/kernel/debug/tracing/buffer_size_kb
fi
if [ -n "$T_TRACE_MULT" ]; then
# orig_trace_size=$(cat /sys/kernel/debug/tracing/buffer_size_kb)
orig_trace_size=1408
mult_trace_size=$((orig_trace_size * T_TRACE_MULT))
msg "increasing trace buffer size from $orig_trace_size KiB to $mult_trace_size KiB"
echo $mult_trace_size > /sys/kernel/debug/tracing/buffer_size_kb
fi
nr_globs=${#T_TRACE_GLOB[@]}
if [ $nr_globs -gt 0 ]; then
echo 0 > /sys/kernel/debug/tracing/events/scoutfs/enable
nr_globs=${#T_TRACE_GLOB[@]}
if [ $nr_globs -gt 0 ]; then
echo 0 > /sys/kernel/debug/tracing/events/scoutfs/enable
for g in "${T_TRACE_GLOB[@]}"; do
for e in /sys/kernel/debug/tracing/events/scoutfs/$g/enable; do
if test -w "$e"; then
echo 1 > "$e"
else
die "-t glob '$g' matched no scoutfs events"
fi
done
for g in "${T_TRACE_GLOB[@]}"; do
for e in /sys/kernel/debug/tracing/events/scoutfs/$g/enable; do
if test -w "$e"; then
echo 1 > "$e"
else
die "-t glob '$g' matched no scoutfs events"
fi
done
done
nr_events=$(cat /sys/kernel/debug/tracing/set_event | wc -l)
msg "enabled $nr_events trace events from $nr_globs -t globs"
fi
}
stop_tracing() {
if [ -n "$T_TRACE_GLOB" -o -n "$T_TRACE_PRINTK" ]; then
msg "saving traces and disabling tracing"
echo 0 > /sys/kernel/debug/tracing/events/scoutfs/enable
echo 0 > /sys/kernel/debug/tracing/options/trace_printk
cat /sys/kernel/debug/tracing/trace | gzip > "$T_RESULTS/traces.gz"
if [ -n "$orig_trace_size" ]; then
echo $orig_trace_size > /sys/kernel/debug/tracing/buffer_size_kb
fi
fi
}
nr_events=$(cat /sys/kernel/debug/tracing/set_event | wc -l)
msg "enabled $nr_events trace events from $nr_globs -t globs"
fi
if [ -n "$T_TRACE_PRINTK" ]; then
echo "$T_TRACE_PRINTK" > /sys/kernel/debug/tracing/options/trace_printk
@@ -619,26 +603,24 @@ passed=0
skipped=0
failed=0
skipped_permitted=0
for iter in $(seq 1 $T_LOOP_ITER); do
for t in $tests; do
# tests has basenames from sequence, get path and name
t="tests/$t"
test_name=$(basename "$t" | sed -e 's/.sh$//')
start_tracing
# get stats from previous pass
last="$T_RESULTS/last-passed-test-stats"
stats=$(grep -s "^$test_name " "$last" | cut -d " " -f 2-)
test -n "$stats" && stats="last: $stats"
printf " %-30s $stats" "$test_name"
for t in $tests; do
# tests has basenames from sequence, get path and name
t="tests/$t"
test_name=$(basename "$t" | sed -e 's/.sh$//')
# mark in dmesg as to what test we are running
echo "run scoutfs test $test_name" > /dev/kmsg
# get stats from previous pass
last="$T_RESULTS/last-passed-test-stats"
stats=$(grep -s "^$test_name " "$last" | cut -d " " -f 2-)
test -n "$stats" && stats="last: $stats"
printf " %-30s $stats" "$test_name"
# let the test get at its extra files
T_EXTRA="$T_TESTS/extra/$test_name"
# mark in dmesg as to what test we are running
echo "run scoutfs test $test_name" > /dev/kmsg
# let the test get at its extra files
T_EXTRA="$T_TESTS/extra/$test_name"
for iter in $(seq 1 $T_LOOP_ITER); do
# create a temporary dir and file path for the test
T_TMPDIR="$T_RESULTS/tmp/$test_name"
@@ -728,43 +710,55 @@ for iter in $(seq 1 $T_LOOP_ITER); do
sts=$T_FAIL_STATUS
fi
# show and record the result of the test
if [ "$sts" == "$T_PASS_STATUS" ]; then
echo " passed: $stats"
((passed++))
# save stats for passed test
grep -s -v "^$test_name " "$last" > "$last.tmp"
echo "$test_name $stats" >> "$last.tmp"
mv -f "$last.tmp" "$last"
elif [ "$sts" == "$T_SKIP_PERMITTED_STATUS" ]; then
echo " [ skipped (permitted): $message ]"
echo "$test_name skipped (permitted) $message " >> "$T_RESULTS/skip.log"
((skipped_permitted++))
elif [ "$sts" == "$T_SKIP_STATUS" ]; then
echo " [ skipped: $message ]"
echo "$test_name $message" >> "$T_RESULTS/skip.log"
((skipped++))
elif [ "$sts" == "$T_FAIL_STATUS" ]; then
echo " [ failed: $message ]"
echo "$test_name $message" >> "$T_RESULTS/fail.log"
((failed++))
if [ -n "$T_ABORT" ]; then
stop_tracing
die "aborting after first failure"
fi
# stop looping if we didn't pass
if [ "$sts" != "$T_PASS_STATUS" ]; then
break;
fi
# record results for TAP format output
t_tap_progress $test_name $sts
((testcount++))
done
stop_tracing
# show and record the result of the test
if [ "$sts" == "$T_PASS_STATUS" ]; then
echo " passed: $stats"
((passed++))
# save stats for passed test
grep -s -v "^$test_name " "$last" > "$last.tmp"
echo "$test_name $stats" >> "$last.tmp"
mv -f "$last.tmp" "$last"
elif [ "$sts" == "$T_SKIP_PERMITTED_STATUS" ]; then
echo " [ skipped (permitted): $message ]"
echo "$test_name skipped (permitted) $message " >> "$T_RESULTS/skip.log"
((skipped_permitted++))
elif [ "$sts" == "$T_SKIP_STATUS" ]; then
echo " [ skipped: $message ]"
echo "$test_name $message" >> "$T_RESULTS/skip.log"
((skipped++))
elif [ "$sts" == "$T_FAIL_STATUS" ]; then
echo " [ failed: $message ]"
echo "$test_name $message" >> "$T_RESULTS/fail.log"
((failed++))
test -n "$T_ABORT" && die "aborting after first failure"
fi
# record results for TAP format output
t_tap_progress $test_name $sts
((testcount++))
done
msg "all tests run: $passed passed, $skipped skipped, $skipped_permitted skipped (permitted), $failed failed"
if [ -n "$T_TRACE_GLOB" -o -n "$T_TRACE_PRINTK" ]; then
msg "saving traces and disabling tracing"
echo 0 > /sys/kernel/debug/tracing/events/scoutfs/enable
echo 0 > /sys/kernel/debug/tracing/options/trace_printk
cat /sys/kernel/debug/tracing/trace > "$T_RESULTS/traces"
if [ -n "$orig_trace_size" ]; then
echo $orig_trace_size > /sys/kernel/debug/tracing/buffer_size_kb
fi
fi
if [ "$skipped" == 0 -a "$failed" == 0 ]; then
msg "all tests passed"
unmount_all

View File

@@ -31,8 +31,8 @@ trap restore_compact_delay EXIT
echo "== arm compaction triggers"
for nr in $(t_fs_nrs); do
t_trigger_arm_silent srch_compact_logs_pad_safe $nr
t_trigger_arm_silent srch_merge_stop_safe $nr
t_trigger_arm srch_compact_logs_pad_safe $nr
t_trigger_arm srch_merge_stop_safe $nr
done
echo "== compact more often"
@@ -44,12 +44,11 @@ echo "== create padded sorted inputs by forcing log rotation"
sv=$(t_server_nr)
for i in $(seq 1 $COMPACT_NR); do
for j in $(seq 1 $COMPACT_NR); do
t_trigger_arm srch_force_log_rotate $sv
seq -f "f-$i-$j-$SEQF" 1 10 | \
bulk_create_paths -X "scoutfs.srch.t-srch-safe-merge-pos" -d "$T_D0" > \
/dev/null
t_trigger_arm_silent srch_force_log_rotate $sv
sync
test "$(t_trigger_get srch_force_log_rotate $sv)" == "0" || \
@@ -60,7 +59,7 @@ for i in $(seq 1 $COMPACT_NR); do
while test $padded == 0 && sleep .5; do
for nr in $(t_fs_nrs); do
if [ "$(t_trigger_get srch_compact_logs_pad_safe $nr)" == "0" ]; then
t_trigger_arm_silent srch_compact_logs_pad_safe $nr
t_trigger_arm srch_compact_logs_pad_safe $nr
padded=1
break
fi