Test export-lookup-evict-race in a loop with tracing.

This test hits the unmount hang more consistently than any other test in our CI, so run it in a tight loop with tracing enabled. Discard the traces whenever an iteration succeeds. If a hung task timeout occurs the run fails, so we should trigger a crash and then extract the traces from the crash. Also shorten the hung task timeout so that we don't wait an hour before that happens.

Signed-off-by: Auke Kok <auke.kok@versity.com>
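The loop-and-discard behaviour described in this commit message can be sketched in shell as follows. This is a minimal illustration, not the code in run-tests.sh: the run_loop helper and its arguments are hypothetical, and an ordinary temporary file stands in for /sys/kernel/debug/tracing/trace so that the sketch runs without tracefs.

```shell
# run_loop ITERATIONS TRACE_FILE CMD [ARGS...]
# Hypothetical helper: run CMD up to ITERATIONS times. After each
# successful iteration the trace file is emptied; on the first failure
# the loop stops and the trace file is left untouched for inspection.
run_loop() {
	rl_iters="$1"
	rl_trace="$2"
	shift 2
	rl_i=0
	while [ "$rl_i" -lt "$rl_iters" ]; do
		if ! "$@"; then
			echo "iteration $rl_i failed, keeping $rl_trace" >&2
			return 1
		fi
		# success: discard whatever was traced during this iteration
		: > "$rl_trace"
		rl_i=$((rl_i + 1))
	done
	return 0
}

# Usage with a stand-in trace file and a command that always succeeds.
demo_trace="$(mktemp)"
echo "trace data from a passing run" > "$demo_trace"
run_loop 3 "$demo_trace" true
rm -f "$demo_trace"
```

Stopping at the first failure matters here: the trace contents that remain are the ones from the failing iteration only, which keeps the data to inspect small.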
2026-01-07 20:45:18 +00:00 · 2025-12-10 14:22:05 -08:00 · 2025-12-08 10:05:46 -08:00 · 2025-12-08 09:47:19 -08:00 · 2025-12-04 14:34:02 -05:00 · 2025-12-04 13:24:48 -05:00
6 changed files with 31 additions and 72 deletions
--- a/kmod/src/server.c
+++ b/kmod/src/server.c
@@ -3036,7 +3036,13 @@ static int server_commit_log_merge(struct super_block *sb,
 				  SCOUTFS_LOG_MERGE_STATUS_ZONE, 0, 0,
 				  &stat, sizeof(stat));
 	if (ret < 0) {
-		err_str = "getting merge status item";
+		/*
+		 * During a retransmission, it's possible that the server
+		 * already committed and resolved this log merge. ENOENT
+		 * is expected in that case.
+		 */
+		if (ret != -ENOENT)
+			err_str = "getting merge status item";
 		goto out;
 	}

--- a/tests/fenced-local-force-unmount.sh
+++ b/tests/fenced-local-force-unmount.sh
@@ -9,7 +9,7 @@
 echo "$0 running rid '$SCOUTFS_FENCED_REQ_RID' ip '$SCOUTFS_FENCED_REQ_IP' args '$@'"

 echo_fail() {
-	echo "$@" >> /dev/stderr
+	echo "$@" >&2
 	exit 1
 }

@@ -27,8 +27,7 @@ for fs in /sys/fs/scoutfs/*; do
 	nr="$(quiet_cat $fs/data_device_maj_min)"
 	[ ! -d "$fs" -o "$fs_rid" != "$rid" ] && continue

-	mnt=$(findmnt -l -n -t scoutfs -o TARGET -S $nr) || \
-		echo_fail "findmnt -t scoutfs -S $nr failed"
+	mnt=$(findmnt -l -n -t scoutfs -o TARGET -S $nr)
 	[ -z "$mnt" ] && continue

 	if ! umount -qf "$mnt"; then
--- a/tests/run-tests.sh
+++ b/tests/run-tests.sh
@@ -92,10 +92,14 @@ done
 T_TRACE_DUMP="0"
 T_TRACE_PRINTK="0"
 T_PORT_START="19700"
-T_LOOP_ITER="1"
+T_LOOP_ITER="100"

 # array declarations to be able to use array ops
 declare -a T_TRACE_GLOB
+T_TRACE_GLOB=( "scoutfs*" )
+
+# CI sets this to 3600, but, for this case we want it very short
+echo 30 > /proc/sys/kernel/hung_task_timeout_secs

 while true; do
 	case $1 in
@@ -493,6 +497,11 @@ crash_monitor()
 			bad=1
 		fi

+		if dmesg | grep -q "blocked for more than"; then
+			echo "run-tests monitor saw blocked task message"
+			bad=1
+		fi
+
 		if dmesg | grep -q "error indicated by fence action" ; then
 			echo "run-tests monitor saw fence agent error message"
 			bad=1
@@ -504,6 +513,8 @@ crash_monitor()
 		fi

 		if [ "$bad" != 0 ]; then
+			sync & # maybe this gets logs synced...
+			sleep .1
 			echo "run-tests monitor triggering crash"
 			echo c > /proc/sysrq-trigger
 			exit 1
@@ -706,6 +717,8 @@ for t in $tests; do
 		# stop looping if we didn't pass
 		if [ "$sts" != "$T_PASS_STATUS" ]; then
 			break;
+		else
+			echo > /sys/kernel/debug/tracing/trace
 		fi
 	done

--- a/tests/sequence
+++ b/tests/sequence
@@ -1,60 +1 @@
-export-get-name-parent.sh
-basic-block-counts.sh
-basic-bad-mounts.sh
-basic-posix-acl.sh
-inode-items-updated.sh
-simple-inode-index.sh
-simple-staging.sh
-simple-release-extents.sh
-simple-readdir.sh
-get-referring-entries.sh
-fallocate.sh
-basic-truncate.sh
-data-prealloc.sh
-setattr_more.sh
-offline-extent-waiting.sh
-move-blocks.sh
-projects.sh
-large-fragmented-free.sh
-format-version-forward-back.sh
-enospc.sh
-mmap.sh
-srch-safe-merge-pos.sh
-srch-basic-functionality.sh
-simple-xattr-unit.sh
-retention-basic.sh
-totl-xattr-tag.sh
-quota.sh
-lock-refleak.sh
-lock-shrink-consistency.sh
-lock-shrink-read-race.sh
-lock-pr-cw-conflict.sh
-lock-revoke-getcwd.sh
-lock-recover-invalidate.sh
 export-lookup-evict-race.sh
-createmany-parallel.sh
-createmany-large-names.sh
-createmany-rename-large-dir.sh
-stage-release-race-alloc.sh
-stage-multi-part.sh
-o_tmpfile.sh
-basic-posix-consistency.sh
-dirent-consistency.sh
-mkdir-rename-rmdir.sh
-lock-ex-race-processes.sh
-cross-mount-data-free.sh
-persistent-item-vers.sh
-setup-error-teardown.sh
-resize-devices.sh
-change-devices.sh
-fence-and-reclaim.sh
-quorum-heartbeat-timeout.sh
-orphan-inodes.sh
-mount-unmount-race.sh
-client-unmount-recovery.sh
-createmany-parallel-mounts.sh
-archive-light-cycle.sh
-block-stale-reads.sh
-inode-deletion.sh
-renameat2-noreplace.sh
-xfstests.sh
--- a/tests/tests/renameat2-noreplace.sh
+++ b/tests/tests/renameat2-noreplace.sh
@@ -8,19 +8,19 @@ t_require_mounts 2
 echo "=== renameat2 noreplace flag test"

 # give each mount their own dir (lock group) to minimize create contention
-mkdir $T_M0/dir0
-mkdir $T_M1/dir1
+mkdir $T_D0/dir0
+mkdir $T_D1/dir1

 echo "=== run two asynchronous calls to renameat2 NOREPLACE"
 for i in $(seq 0 100); do
        # prepare inputs in isolation
-        touch "$T_M0/dir0/old0"
-        touch "$T_M1/dir1/old1"
+        touch "$T_D0/dir0/old0"
+        touch "$T_D1/dir1/old1"

        # race doing noreplace renames, both can't succeed
-        dumb_renameat2 -n "$T_M0/dir0/old0" "$T_M0/dir0/sharednew" 2> /dev/null &
+        dumb_renameat2 -n "$T_D0/dir0/old0" "$T_D0/dir0/sharednew" 2> /dev/null &
        pid0=$!
-        dumb_renameat2 -n "$T_M1/dir1/old1" "$T_M1/dir0/sharednew" 2> /dev/null &
+        dumb_renameat2 -n "$T_D1/dir1/old1" "$T_D1/dir0/sharednew" 2> /dev/null &
        pid1=$!

        wait $pid0
@@ -31,7 +31,7 @@ for i in $(seq 0 100); do
        test "$rc0" == 0 -a "$rc1" == 0 && t_fail "both renames succeeded"

        # blow away possible files for either race outcome
-        rm -f "$T_M0/dir0/old0" "$T_M1/dir1/old1" "$T_M0/dir0/sharednew" "$T_M1/dir1/sharednew"
+        rm -f "$T_D0/dir0/old0" "$T_D1/dir1/old1" "$T_D0/dir0/sharednew" "$T_D1/dir1/sharednew"
 done

 t_pass
--- a/utils/fenced/scoutfs-fenced
+++ b/utils/fenced/scoutfs-fenced
@@ -7,7 +7,7 @@ message_output()

 error_message()
 {
-	message_output "$@" >> /dev/stderr
+	message_output "$@" >&2
 }

 error_exit()
Author	SHA1	Message	Date
Auke Kok	523bbfd0b2	Test export-lookup-evict-race in a loop with tracing. This test hits the unmount hang consistently in our CI the most, so run it in a tight loop and enable tracing. Discard traces when the run succeeded. This will blow up if a hung task timeout occurs, so we should crash on panic and then extract traces from the crash. Make sure we don't wait for an hour before doing so, then, too. Signed-off-by: Auke Kok <auke.kok@versity.com>	2025-12-10 14:22:05 -08:00
Zach Brown	1768f69c3c	Merge pull request #224 from versity/auke/renameat2-test-sub-dir Use T_D0/1 instead of T_M0 here.	2025-12-08 10:05:46 -08:00
Zach Brown	dcb0fd5805	Merge pull request #268 from versity/auke/dont_use_bash_special_stdfiles Avoid using bash special device nodes.	2025-12-08 09:47:19 -08:00
Auke Kok	660f874488	Use T_D0/1 instead of T_M0 here. Use of T_M0 and variants should be reserved for e.g. scoutfs <subcommand> -p <mountpoint> type of usages. Tests should create individual content files in the assigned subdirectory. Signed-off-by: Auke Kok <auke.kok@versity.com>	2025-12-04 14:34:02 -05:00
Auke Kok	2884a92408	Avoid using bash special device nodes. Bash has special handling for these standard IO files, but there are cases where customers have special restrictions set on them, likely to avoid leaking error data out of system logs as part of IDS software. In any case, we can just reopen the existing file descriptors in both of these cases and avoid the device nodes entirely. This will always work. Signed-off-by: Auke Kok <auke.kok@versity.com>	2025-12-04 13:24:48 -05:00
Zach Brown	e194714004	Merge pull request #264 from versity/auke/findmnt_retval Findmnt returns 1 when no matching entries found	2025-12-03 14:29:31 -08:00
Auke Kok	8bb2f83cf9	Findmnt returns 1 when no matching entries found Our local fence script attempts to interpret errors executing `findmnt` as critical errors, but the program explicitly returns EXIT_FAILURE when the total number of matching mount entries is zero. This can happen if the mount disappeared while we're attempting to fence the mount, but the scoutfs sysfs files are still in place as we read them. It's a small window, but it's a fork/exec plus a full parse of /etc/fstab, and a lot can happen in the 0.015s findmnt takes on my system. There are no exit codes from findmnt other than 0 and 1. At that point, we can only assume that if the stdout is empty, the mount isn't there anymore. Signed-off-by: Auke Kok <auke.kok@versity.com>	2025-12-02 12:55:11 -08:00
Zach Brown	6a9a6789d5	Merge pull request #267 from versity/clk/merge_enoent Handle ENOENT when getting log merge status item	2025-12-02 09:34:28 -08:00
Chris Kirby	ee630b164f	Handle ENOENT when getting log merge status item Tests that cause client retries can fail with this error from server_commit_log_merge(): "error -2 committing log merge: getting merge status item". This can happen if the server has already committed and resolved the log merge that is being retried. We can safely ignore ENOENT here just like we do a few lines later. Signed-off-by: Chris Kirby <ckirby@versity.com>	2025-12-01 08:58:24 -06:00
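
The "Avoid using bash special device nodes" commit listed above replaces `>> /dev/stderr` with `>&2`. A minimal sketch of that pattern follows; the helper name echo_err is hypothetical and is not taken from the scripts in this change.

```shell
# Hypothetical helper for illustration.
# ">&2" duplicates file descriptor 2, which the process already holds,
# so no path such as /dev/stderr has to be opened.
echo_err() {
	echo "$@" >&2
}

echo_err "this message goes to stderr"
```

Because the redirection reuses a descriptor the script already has open, it should not be affected by access restrictions placed on the device nodes themselves.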