Add tuned profile with recommended vm tunings for scoutfs.

Install a scoutfs `tuned` profile as a system profile for `tuned`, with a recommendation hint to enable it.

The goal is to provide a set of basic VM tunings that are a reasonably good starting point for production deployments of scoutfs. The tunings reflect good practices that aim for responsiveness of the scoutfs deployment.

The values are based on existing tuned profiles, building on throughput-performance and network-throughput as a base, and tuning VM values in the same way that latency-performance does. All of them enable the performance CPU governor. None of this enables powersave settings.

Different deployments may have different performance characteristics and require further adjustment, or even a completely different profile.

Signed-off-by: Auke Kok <auke.kok@versity.com>
Merge pull request #266 from versity/zab/increase_move_empty_budget
2026-05-01 18:35:43 +00:00 · 2026-01-06 13:06:03 -08:00 · 2025-12-18 12:44:20 -08:00 · 2025-12-17 14:22:04 -06:00 · 2025-12-17 11:06:32 -08:00 · 2025-12-17 11:04:00 -08:00
12 changed files with 139 additions and 269 deletions
--- a/kmod/src/server.c
+++ b/kmod/src/server.c
@@ -1618,7 +1618,8 @@ static int server_get_log_trees(struct super_block *sb,
 		goto update;
 	}

-	ret = alloc_move_empty(sb, &super->data_alloc, &lt.data_freed, 100);
+	ret = alloc_move_empty(sb, &super->data_alloc, &lt.data_freed,
+			       COMMIT_HOLD_ALLOC_BUDGET / 2);
 	if (ret == -EINPROGRESS)
 		ret = 0;
 	if (ret < 0) {
@@ -1913,9 +1914,11 @@ static int reclaim_open_log_tree(struct super_block *sb, u64 rid)
 	       scoutfs_alloc_splice_list(sb, &server->alloc, &server->wri, server->other_freed,
 					 &lt.meta_avail)) ?:
 	      (err_str = "empty data_avail",
-	       alloc_move_empty(sb, &super->data_alloc, &lt.data_avail, 100)) ?:
+	       alloc_move_empty(sb, &super->data_alloc, &lt.data_avail,
+				COMMIT_HOLD_ALLOC_BUDGET / 2)) ?:
 	      (err_str = "empty data_freed",
-	       alloc_move_empty(sb, &super->data_alloc, &lt.data_freed, 100));
+	       alloc_move_empty(sb, &super->data_alloc, &lt.data_freed,
+				COMMIT_HOLD_ALLOC_BUDGET / 2));
 	mutex_unlock(&server->alloc_mutex);

 	/* only finalize, allowing merging, once the allocators are fully freed */
@@ -3036,7 +3039,13 @@ static int server_commit_log_merge(struct super_block *sb,
 				  SCOUTFS_LOG_MERGE_STATUS_ZONE, 0, 0,
 				  &stat, sizeof(stat));
 	if (ret < 0) {
-		err_str = "getting merge status item";
+		/*
+		 * During a retransmission, it's possible that the server
+		 * already committed and resolved this log merge. ENOENT
+		 * is expected in that case.
+		 */
+		if (ret != -ENOENT)
+			err_str = "getting merge status item";
 		goto out;
 	}

--- a/tests/fenced-local-force-unmount.sh
+++ b/tests/fenced-local-force-unmount.sh
@@ -9,7 +9,7 @@
 echo "$0 running rid '$SCOUTFS_FENCED_REQ_RID' ip '$SCOUTFS_FENCED_REQ_IP' args '$@'"

 echo_fail() {
-	echo "$@" >> /dev/stderr
+	echo "$@" >&2
 	exit 1
 }

@@ -27,8 +27,7 @@ for fs in /sys/fs/scoutfs/*; do
 	nr="$(quiet_cat $fs/data_device_maj_min)"
 	[ ! -d "$fs" -o "$fs_rid" != "$rid" ] && continue

-	mnt=$(findmnt -l -n -t scoutfs -o TARGET -S $nr) || \
-		echo_fail "findmnt -t scoutfs -S $nr failed"
+	mnt=$(findmnt -l -n -t scoutfs -o TARGET -S $nr)
 	[ -z "$mnt" ] && continue

 	if ! umount -qf "$mnt"; then
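The hunk above stops treating a non-zero `findmnt` exit status as fatal and relies on empty output instead. A minimal sketch of that pattern follows; `lookup_mount`, `fake_found`, and `fake_missing` are hypothetical names used only so the example runs without a scoutfs mount, and they are not part of the fence script.

```shell
#!/bin/bash
# Hypothetical helper: print the mount target for a device, or nothing.
# The exit status of the lookup command is deliberately ignored; only
# empty output is treated as "not mounted".
lookup_mount() {
	local lookup_cmd="$1"	# stand-in for a findmnt invocation (assumption)
	local dev="$2"
	local mnt

	mnt=$("$lookup_cmd" "$dev") || true
	printf '%s' "$mnt"
}

# Stand-ins for findmnt: one finds a mount, one finds none and returns 1.
fake_found()   { echo "/mnt/scoutfs"; return 0; }
fake_missing() { return 1; }

for cmd in fake_found fake_missing; do
	mnt=$(lookup_mount "$cmd" "8:16")
	if [ -z "$mnt" ]; then
		echo "$cmd: no mount, skipping"
	else
		echo "$cmd: would unmount $mnt"
	fi
done
```

The trade-off is that a lookup failing for some other reason is also skipped quietly, which the commit message accepts because the exit status alone cannot tell the two cases apart.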
--- a/tests/funcs/filter.sh
+++ b/tests/funcs/filter.sh
@@ -170,6 +170,9 @@ t_filter_dmesg()
 	# some ci test guests are unresponsive
 	re="$re|longest quorum heartbeat .* delay"

+	# creating block devices may trigger this
+	re="$re|block device autoloading is deprecated and will be removed."
+
 	egrep -v "($re)" | \
 		ignore_harmless_unwind_kasan_stack_oob
 }
--- a/tests/funcs/tap.sh
+++ b/tests/funcs/tap.sh
@@ -43,9 +43,14 @@ t_tap_progress()
 	local testname=$1
 	local result=$2

+	local stmsg=""
 	local diff=""
 	local dmsg=""

+	if [[ -s $T_RESULTS/tmp/${testname}/status.msg ]]; then
+		stmsg="1"
+	fi
+
 	if [[ -s "$T_RESULTS/tmp/${testname}/dmesg.new" ]]; then
 		dmsg="1"
 	fi
@@ -61,6 +66,7 @@ t_tap_progress()
 		echo "# ${testname} ** skipped - permitted **"
 	else
 		echo "not ok ${i} - ${testname}"
+
 		case ${result} in
 		101)
 			echo "# ${testname} ** skipped **"
@@ -70,6 +76,13 @@ t_tap_progress()
 			;;
 		esac

+		if [[ -n "${stmsg}" ]]; then
+			echo "#"
+			echo "# status:"
+			echo "#"
+			cat $T_RESULTS/tmp/${testname}/status.msg | sed 's/^/# - /'
+		fi
+
 		if [[ -n "${diff}" ]]; then
 			echo "#"
 			echo "# diff:"
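The status block added above can be tried in isolation. This is a sketch only: `emit_status` is a hypothetical wrapper, and the message text is made up for the demonstration. It uses the same `-s` test and the same `sed` prefix as the hunk.

```shell
#!/bin/bash
# Hypothetical helper mirroring the status block emitted for a failed test.
emit_status() {
	local msgfile="$1"

	# -s: the file exists and is non-empty, the same check the diff uses
	[ -s "$msgfile" ] || return 0

	echo "#"
	echo "# status:"
	echo "#"
	# prefix every line so it stays a TAP comment
	sed 's/^/# - /' "$msgfile"
}

tmp=$(mktemp)
printf 'both renames succeeded\n' > "$tmp"
emit_status "$tmp"
rm -f "$tmp"
```

Passing the file to `sed` directly avoids the extra `cat` process; the output is the same either way.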
--- a/tests/tests/get-referring-entries.sh
+++ b/tests/tests/get-referring-entries.sh
@@ -72,7 +72,7 @@ touch $T_D0/dir/file
 mkdir $T_D0/dir/dir
 ln -s $T_D0/dir/file $T_D0/dir/symlink
 mknod $T_D0/dir/char c 1 3 # null
-mknod $T_D0/dir/block b 7 0 # loop0
+mknod $T_D0/dir/block b 42 0 # SAMPLE block dev - nonexistent/demo use only number
 for name in $(ls -UA $T_D0/dir | sort); do
 	ino=$(stat -c '%i' $T_D0/dir/$name)
 	$GRE $ino | filter_types
--- a/tests/tests/renameat2-noreplace.sh
+++ b/tests/tests/renameat2-noreplace.sh
@@ -8,19 +8,19 @@ t_require_mounts 2
 echo "=== renameat2 noreplace flag test"

 # give each mount their own dir (lock group) to minimize create contention
-mkdir $T_M0/dir0
-mkdir $T_M1/dir1
+mkdir $T_D0/dir0
+mkdir $T_D1/dir1

 echo "=== run two asynchronous calls to renameat2 NOREPLACE"
 for i in $(seq 0 100); do
        # prepare inputs in isolation
-        touch "$T_M0/dir0/old0"
-        touch "$T_M1/dir1/old1"
+        touch "$T_D0/dir0/old0"
+        touch "$T_D1/dir1/old1"

        # race doing noreplace renames, both can't succeed
-        dumb_renameat2 -n "$T_M0/dir0/old0" "$T_M0/dir0/sharednew" 2> /dev/null &
+        dumb_renameat2 -n "$T_D0/dir0/old0" "$T_D0/dir0/sharednew" 2> /dev/null &
        pid0=$!
-        dumb_renameat2 -n "$T_M1/dir1/old1" "$T_M1/dir0/sharednew" 2> /dev/null &
+        dumb_renameat2 -n "$T_D1/dir1/old1" "$T_D1/dir0/sharednew" 2> /dev/null &
        pid1=$!

        wait $pid0
@@ -31,7 +31,7 @@ for i in $(seq 0 100); do
        test "$rc0" == 0 -a "$rc1" == 0 && t_fail "both renames succeeded"

        # blow away possible files for either race outcome
-        rm -f "$T_M0/dir0/old0" "$T_M1/dir1/old1" "$T_M0/dir0/sharednew" "$T_M1/dir1/sharednew"
+        rm -f "$T_D0/dir0/old0" "$T_D1/dir1/old1" "$T_D0/dir0/sharednew" "$T_D1/dir1/sharednew"
 done

 t_pass
--- a/utils/fenced/scoutfs-fenced
+++ b/utils/fenced/scoutfs-fenced
@@ -7,7 +7,7 @@ message_output()

 error_message()
 {
-	message_output "$@" >> /dev/stderr
+	message_output "$@" >&2
 }

 error_exit()
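A small sketch of why `>&2` is enough here: it duplicates the file descriptor that is already open, so no path such as `/dev/stderr` has to be opened. The function body matches the shape of the change above, but the message text and the capture lines are illustrative assumptions.

```shell
#!/bin/bash
error_message() {
	# duplicate the already-open fd 2; no lookup of /dev/stderr is needed
	echo "$@" >&2
}

# Capture stdout and stderr separately to confirm where the text goes.
out=$(error_message "fence failed" 2>/dev/null)
err=$(error_message "fence failed" 2>&1 >/dev/null)
echo "stdout='$out' stderr='$err'"
```

The second capture works because redirections are applied left to right: fd 2 is pointed at the capture pipe first, and only then is fd 1 sent to `/dev/null`.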
--- a/utils/man/scoutfs.8
+++ b/utils/man/scoutfs.8
@@ -402,39 +402,25 @@ before destroying an old empty data device.
 .PD

 .TP
-.BI "print {-a|--allocs} {-i|--items ITEMS} {-r|--roots ROOTS} {-S|--skip-likely-huge} META-DEVICE"
+.BI "print {-S|--skip-likely-huge} META-DEVICE"
 .sp
-Prints out some or all of the metadata in the file system.  This makes no effort
+Prints out all of the metadata in the file system.  This makes no effort
 to ensure that the structures are consistent as they're traversed and
 can present structures that seem corrupt as they change as they're
 output.
-.sp
-Structures that are related to the number of mounts and are maintained at a
-relatively reasonable size are always printed. These include per-mount log
-trees, srch files, allocators, and the metadata allocators used by server
-commits. Other btrees and their items can be selected as desired.
 .RS 1.0i
 .PD 0
+.TP
 .sp
-.TP
-.B "-a, --allocs"
-Print the metadata and data allocators. Enabled by default.
-.TP
-.B "-r, --roots ROOTS"
-This option can be used to select which btrees are traversed. It is a comma-separated list containing one or more of the following btree roots: logs, srch, fs. Default is all roots.
-.TP
-.B "-i, --items ITEMS"
-This option can be used to choose which btree items are printed from the
-selected btree roots. It is a comma-separated list containing one or
-more of the following items: inode, xattr, dirent, symlink, backref, extent.
-Default is all items.
-.TP
 .B "-S, --skip-likely-huge"
 Skip printing structures that are likely to be very large.  The
 structures that are skipped tend to be global and whose size tends to be
 related to the size of the volume.   Examples of skipped structures include
 the global fs items, srch files, and metadata and data
-allocators.
+allocators.  Similar structures that are not skipped are related to the
+number of mounts and are maintained at a relatively reasonable size.
+These include per-mount log trees, srch files, allocators, and the
+metadata allocators used by server commits.
 .sp
 Skipping the larger structures limits the print output to a relatively
 constant size rather than being a large multiple of the used metadata
--- a/utils/scoutfs-utils.spec.in
+++ b/utils/scoutfs-utils.spec.in
@@ -4,6 +4,12 @@

 %{!?_release: %global _release 0.%{pkg_date}git%{pkg_git_hash}}

+%if 0%{?rhel} && 0%{?rhel} < 10
+%global tuned_profiles_dir %{_prefix}/lib/tuned
+%else
+%global tuned_profiles_dir %{_prefix}/lib/tuned/profiles
+%endif
+
 Name:           scoutfs-utils
 Summary:        scoutfs user space utilities
 Version:        %{pkg_version}
@@ -57,6 +63,8 @@ install -m 644 -D src/format.h $RPM_BUILD_ROOT%{_includedir}/scoutfs/format.h
 install -m 755 -D fenced/scoutfs-fenced $RPM_BUILD_ROOT%{_libexecdir}/scoutfs-fenced/scoutfs-fenced
 install -m 644 -D fenced/scoutfs-fenced.service $RPM_BUILD_ROOT%{_unitdir}/scoutfs-fenced.service
 install -m 644 -D fenced/scoutfs-fenced.conf.example $RPM_BUILD_ROOT%{_sysconfdir}/scoutfs/scoutfs-fenced.conf.example
+install -m 644 -D tuned/tuned.conf $RPM_BUILD_ROOT%{tuned_profiles_dir}/scoutfs/tuned.conf
+install -m 644 -D tuned/40-scoutfs.conf $RPM_BUILD_ROOT%{_prefix}/lib/tuned/recommend.d/40-scoutfs.conf

 %files
 %defattr(644,root,root,755)
@@ -66,6 +74,8 @@ install -m 644 -D fenced/scoutfs-fenced.conf.example $RPM_BUILD_ROOT%{_sysconfdi
 %defattr(755,root,root,755)
 %{_sbindir}/scoutfs
 %{_libexecdir}/scoutfs-fenced
+%{tuned_profiles_dir}/scoutfs/tuned.conf
+%{_prefix}/lib/tuned/recommend.d/40-scoutfs.conf

 %files -n scoutfs-devel
 %defattr(644,root,root,755)
--- a/utils/src/print.c
+++ b/utils/src/print.c
@@ -29,42 +29,6 @@
 #include "leaf_item_hash.h"
 #include "dev.h"

-struct print_args {
-	char *meta_device;
-	bool skip_likely_huge;
-	bool roots_requested;
-	bool items_requested;
-	bool allocs_requested;
-	bool walk_allocs;
-	bool walk_logs_root;
-	bool walk_fs_root;
-	bool walk_srch_root;
-	bool print_inodes;
-	bool print_xattrs;
-	bool print_dirents;
-	bool print_symlinks;
-	bool print_backrefs;
-	bool print_extents;
-};
-
-static struct print_args print_args = {
-	.meta_device	  = NULL,
-	.skip_likely_huge = false,
-	.roots_requested  = false,
-	.items_requested  = false,
-	.allocs_requested = false,
-	.walk_allocs	  = true,
-	.walk_logs_root	  = true,
-	.walk_fs_root	  = true,
-	.walk_srch_root	  = true,
-	.print_inodes	  = true,
-	.print_xattrs	  = true,
-	.print_dirents	  = true,
-	.print_symlinks	  = true,
-	.print_backrefs	  = true,
-	.print_extents	  = true
-};
-
 static void print_block_header(struct scoutfs_block_header *hdr, int size)
 {
 	u32 crc = crc_block(hdr, size);
@@ -231,7 +195,7 @@ static void print_inode_index(struct scoutfs_key *key, void *val, int val_len)

 typedef void (*print_func_t)(struct scoutfs_key *key, void *val, int val_len);

-static print_func_t find_printer(u8 zone, u8 type, bool *suppress)
+static print_func_t find_printer(u8 zone, u8 type)
 {
 	if (zone == SCOUTFS_INODE_INDEX_ZONE &&
 	    type >= SCOUTFS_INODE_INDEX_META_SEQ_TYPE  &&
@@ -254,34 +218,13 @@ static print_func_t find_printer(u8 zone, u8 type, bool *suppress)

 	if (zone == SCOUTFS_FS_ZONE) {
 		switch(type) {
-			case SCOUTFS_INODE_TYPE:
-				if (!print_args.print_inodes)
-					*suppress = true;
-				return print_inode;
-			case SCOUTFS_XATTR_TYPE:
-				if (!print_args.print_xattrs)
-					*suppress = true;
-				return print_xattr;
-			case SCOUTFS_DIRENT_TYPE:
-				if (!print_args.print_dirents)
-					*suppress = true;
-				return print_dirent;
-			case SCOUTFS_READDIR_TYPE:
-				if (!print_args.print_dirents)
-					*suppress = true;
-				return print_dirent;
-			case SCOUTFS_SYMLINK_TYPE:
-				if (!print_args.print_symlinks)
-					*suppress = true;
-				return print_symlink;
-			case SCOUTFS_LINK_BACKREF_TYPE:
-				if (!print_args.print_backrefs)
-					*suppress = true;
-				return print_dirent;
-			case SCOUTFS_DATA_EXTENT_TYPE:
-				if (!print_args.print_extents)
-					*suppress = true;
-				return print_data_extent;
+			case SCOUTFS_INODE_TYPE: return print_inode;
+			case SCOUTFS_XATTR_TYPE: return print_xattr;
+			case SCOUTFS_DIRENT_TYPE: return print_dirent;
+			case SCOUTFS_READDIR_TYPE: return print_dirent;
+			case SCOUTFS_SYMLINK_TYPE: return print_symlink;
+			case SCOUTFS_LINK_BACKREF_TYPE: return print_dirent;
+			case SCOUTFS_DATA_EXTENT_TYPE: return print_data_extent;
 		}
 	}

@@ -301,16 +244,12 @@ static int print_fs_item(struct scoutfs_key *key, u64 seq, u8 flags, void *val,

 	/* only items in leaf blocks have values */
 	if (val != NULL && !(flags & SCOUTFS_ITEM_FLAG_DELETION)) {
-		bool suppress = false;
-
-		printer = find_printer(key->sk_zone, key->sk_type, &suppress);
-		if (printer) {
-			if (!suppress)
-				printer(key, val, val_len);
-		} else {
+		printer = find_printer(key->sk_zone, key->sk_type);
+		if (printer)
+			printer(key, val, val_len);
+		else
 			printf("      (unknown zone %u type %u)\n",
 			       key->sk_zone, key->sk_type);
-		}
 	}

 	return 0;
@@ -1098,7 +1037,12 @@ static void print_super_block(struct scoutfs_super_block *super, u64 blkno)
 	}
 }

-static int print_volume(int fd)
+struct print_args {
+	char *meta_device;
+	bool skip_likely_huge;
+};
+
+static int print_volume(int fd, struct print_args *args)
 {
 	struct scoutfs_super_block *super = NULL;
 	struct print_recursion_args pa;
@@ -1148,7 +1092,7 @@ static int print_volume(int fd)
 			ret = err;
 	}

-	if (print_args.walk_allocs) {
+	if (!args->skip_likely_huge) {
 		for (i = 0; i < array_size(super->meta_alloc); i++) {
 			snprintf(str, sizeof(str), "meta_alloc[%u]", i);
 			err = print_btree(fd, super, str, &super->meta_alloc[i].root,
@@ -1175,21 +1119,18 @@ static int print_volume(int fd)

 	pa.super = super;
 	pa.fd = fd;
-	if (print_args.walk_srch_root) {
+	if (!args->skip_likely_huge) {
 		err = print_btree_leaf_items(fd, super, &super->srch_root.ref,
 					     print_srch_root_files, &pa);
 		if (err && !ret)
 			ret = err;
 	}
+	err = print_btree_leaf_items(fd, super, &super->logs_root.ref,
+				     print_log_trees_roots, &pa);
+	if (err && !ret)
+		ret = err;

-	if (print_args.walk_logs_root) {
-		err = print_btree_leaf_items(fd, super, &super->logs_root.ref,
-					     print_log_trees_roots, &pa);
-		if (err && !ret)
-			ret = err;
-	}
-
-	if (print_args.walk_fs_root) {
+	if (!args->skip_likely_huge) {
 		err = print_btree(fd, super, "fs_root", &super->fs_root,
 				  print_fs_item, NULL);
 		if (err && !ret)
@@ -1202,16 +1143,16 @@ out:
 	return ret;
 }

-static int do_print(void)
+static int do_print(struct print_args *args)
 {
 	int ret;
 	int fd;

-	fd = open(print_args.meta_device, O_RDONLY);
+	fd = open(args->meta_device, O_RDONLY);
 	if (fd < 0) {
 		ret = -errno;
 		fprintf(stderr, "failed to open '%s': %s (%d)\n",
-			print_args.meta_device, strerror(errno), errno);
+			args->meta_device, strerror(errno), errno);
 		return ret;
 	}

@@ -1219,169 +1160,30 @@ static int do_print(void)
 	if (ret < 0)
 		goto out;

-	ret = print_volume(fd);
+	ret = print_volume(fd, args);
 out:
 	close(fd);
 	return ret;
 };

-enum {
-	LOGS_OPT = 0,
-	FS_OPT,
-	SRCH_OPT
-};
-
-static char *const root_tokens[] = {
-	[LOGS_OPT] = "logs",
-	[FS_OPT] =   "fs",
-	[SRCH_OPT] = "srch",
-	NULL
-};
-
-enum {
-	INODE_OPT = 0,
-	XATTR_OPT,
-	DIRENT_OPT,
-	SYMLINK_OPT,
-	BACKREF_OPT,
-	EXTENT_OPT
-};
-
-static char *const item_tokens[] = {
-	[INODE_OPT] =   "inode",
-	[XATTR_OPT] =   "xattr",
-	[DIRENT_OPT] =  "dirent",
-	[SYMLINK_OPT] = "symlink",
-	[BACKREF_OPT] = "backref",
-	[EXTENT_OPT] =  "extent",
-	NULL
-};
-
-static void clear_items(void)
-{
-	print_args.print_inodes = false;
-	print_args.print_xattrs = false;
-	print_args.print_dirents = false;
-	print_args.print_symlinks = false;
-	print_args.print_backrefs = false;
-	print_args.print_extents = false;
-}
-
-static void clear_roots(void)
-{
-	print_args.walk_logs_root = false;
-	print_args.walk_fs_root = false;
-	print_args.walk_srch_root = false;
-}
-
 static int parse_opt(int key, char *arg, struct argp_state *state)
 {
 	struct print_args *args = state->input;
-	char *subopts;
-	char *value;
-	bool parse_err = false;

 	switch (key) {
 	case 'S':
 		args->skip_likely_huge = true;
 		break;
-
-	case 'a':
-		args->allocs_requested = true;
-		args->walk_allocs = true;
-		break;
-
-	case 'i':
-		/* Specific items being requested- clear them all to start */
-		if (!args->items_requested) {
-			clear_items();
-			if (!args->allocs_requested)
-				args->walk_allocs = false;
-			args->items_requested = true;
-		}
-
-		subopts = arg;
-		while (*subopts != '\0' && !parse_err) {
-			switch (getsubopt(&subopts, item_tokens, &value)) {
-			case INODE_OPT:
-				args->print_inodes = true;
-				break;
-			case XATTR_OPT:
-				args->print_xattrs = true;
-				break;
-			case DIRENT_OPT:
-				args->print_dirents = true;
-				break;
-			case SYMLINK_OPT:
-				args->print_symlinks = true;
-				break;
-			case BACKREF_OPT:
-				args->print_backrefs = true;
-				break;
-			case EXTENT_OPT:
-				args->print_extents = true;
-				break;
-			default:
-				argp_usage(state);
-				parse_err = true;
-				break;
-			}
-		}
-		break;
-
-	case 'r':
-		/* Specific roots being requested- clear them all to start */
-		if (!args->roots_requested) {
-			clear_roots();
-			if (!args->allocs_requested)
-				args->walk_allocs = false;
-			args->roots_requested = true;
-		}
-
-		subopts = arg;
-		while (*subopts != '\0' && !parse_err) {
-			switch (getsubopt(&subopts, root_tokens, &value)) {
-			case LOGS_OPT:
-				args->walk_logs_root = true;
-				break;
-			case FS_OPT:
-				args->walk_fs_root = true;
-				break;
-			case SRCH_OPT:
-				args->walk_srch_root = true;
-				break;
-			default:
-				argp_usage(state);
-				parse_err = true;
-				break;
-			}
-		}
-		break;
-
 	case ARGP_KEY_ARG:
 		if (!args->meta_device)
 			args->meta_device = strdup_or_error(state, arg);
 		else
 			argp_error(state, "more than one argument given");
 		break;
-
 	case ARGP_KEY_FINI:
 		if (!args->meta_device)
 			argp_error(state, "no metadata device argument given");
-
-		/*
-		 * For backwards compatibility, translate -S. Should we warn if
-		 * this conflicts with other explicit options?
-		 */
-		if (args->skip_likely_huge) {
-			if (!args->allocs_requested)
-				args->walk_allocs = false;
-			args->walk_fs_root = false;
-			args->walk_srch_root = false;
-		}
-
 		break;
-
 	default:
 		break;
 	}
@@ -1390,10 +1192,7 @@ static int parse_opt(int key, char *arg, struct argp_state *state)
 }

 static struct argp_option options[] = {
-	{ "allocs", 'a', NULL, 0, "Print metadata and data alloc lists" },
-	{ "items", 'i', "ITEMS", 0, "Item(s) to print (inode, xattr, dirent, symlink, backref, extent)" },
-	{ "roots", 'r', "ROOTS", 0, "Tree root(s) to walk (logs, srch, fs)" },
-	{ "skip-likely-huge", 'S', NULL, 0, "Skip allocs, srch root and fs root to minimize output size" },
+	{ "skip-likely-huge", 'S', NULL, 0, "Skip large structures to minimize output size"},
 	{ NULL }
 };

@@ -1406,15 +1205,17 @@ static struct argp argp = {

 static int print_cmd(int argc, char **argv)
 {
+	struct print_args print_args = {NULL};
 	int ret;

 	ret = argp_parse(&argp, argc, argv, 0, NULL, &print_args);
 	if (ret)
 		return ret;

-	return do_print();
+	return do_print(&print_args);
 }

+
 static void __attribute__((constructor)) print_ctor(void)
 {
 	cmd_register_argp("print", &argp, GROUP_DEBUG, print_cmd);
--- a/utils/tuned/40-scoutfs.conf
+++ b/utils/tuned/40-scoutfs.conf
@@ -0,0 +1,9 @@
+#
+# scoutfs tuned recommendation
+#
+
+# If the system supports mounting scoutfs filesystems, which holds for
+# both client mounts and quorum mounts, then we always recommend
+# the scoutfs profile.
+[scoutfs]
+/proc/filesystems=scoutfs
--- a/utils/tuned/tuned.conf
+++ b/utils/tuned/tuned.conf
@@ -0,0 +1,40 @@
+#
+# ScoutFS specific tuned profile
+#
+
+# The parameters below are a mix of settings present in the throughput-performance
+# profile as well as the latency-performance profile. Generally speaking, we
+# want to encourage the system to avoid swap and accumulating large amounts of
+# dirty data, as this can cause reclaim to lead to congestion.
+
+# Enable this profile with `$ sudo tuned-adm profile scoutfs`
+
+# linux default values are marked with [<value>] for reference.
+
+[main]
+summary=Optimize for production scoutfs deployment
+description=Configures the system for production scoutfs filesystem server deployment.
+
+# network-throughput sets some larger buffers useful for 40gbe deployments
+# network-throughput also inherits throughput-performance
+include=network-throughput
+
+[vm]
+# throughput-performance sets dirty_bytes to 40% (much larger than linux default), but
+# this allows the accumulation of large backlogs of writeback. We prefer to writeback
+# often and early to avoid congestion [20%]
+dirty_bytes = 10%
+# start writing back at this amount [10%]
+dirty_background_bytes = 5%
+
+[sysctl]
+# the kernel default is 60. Lower it to instruct the kernel that swapping is
+# expensive and we want to avoid it. We assume scoutfs deployments have ample
+# available RAM. [60]
+vm.swappiness = 10
+
+# increase pdflush runs so it can more aggressively write out dirty data [500]
+vm.dirty_writeback_centisecs = 300
+
+# decrease time dirty data will linger before being written back [3000]
+vm.dirty_expire_centisecs = 2000
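After switching to the profile, the `[sysctl]` values above can be compared against the running system. The sketch below is one way to do that, not something shipped by this change: `check_sysctl` and the `SYSCTL_ROOT` override are hypothetical, and the expected numbers are copied from the profile.

```shell
#!/bin/bash
# Hypothetical checker: compare one sysctl file under $root with the
# value the profile is expected to set.
check_sysctl() {
	local root="$1" name="$2" want="$3"
	local path have

	# vm.swappiness -> <root>/vm/swappiness
	path="$root/$(echo "$name" | tr . /)"
	have=$(cat "$path" 2>/dev/null) || { echo "$name: unreadable"; return 2; }

	if [ "$have" = "$want" ]; then
		echo "$name: ok ($have)"
	else
		echo "$name: have $have, profile wants $want"
		return 1
	fi
}

# SYSCTL_ROOT is an assumption that lets the check run against a test tree.
root="${SYSCTL_ROOT:-/proc/sys}"
check_sysctl "$root" vm.swappiness 10
check_sysctl "$root" vm.dirty_writeback_centisecs 300
check_sysctl "$root" vm.dirty_expire_centisecs 2000
```

A mismatch may only mean that another profile is active or that a later sysctl setting overrides the value, so the output is a hint rather than a verdict.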
Author	SHA1	Message	Date
Auke Kok	21a676876b	Add tuned profile with recommended vm tunings for scoutfs. Install a scoutfs `tuned` profile as a system-profile for `tuned` with recommendation hint to enable this. The goal is to provide a set of basic VM tunings that are reasonably good starting point for production deployment of scoutfs. The tunings are chosen to reflect good practices to aim for responsiveness of the scoutfs deployment. The values chosen are based on existing tuned profiles, building on throughput-performance and network-throughput as a base, and tuning VM values in the same way that latency-performance does. All of them enable the performance CPU governor. None of this enables powersave settings. Different deployments may have different performance characteristics and require further adjustment, or even a completely different profile. Signed-off-by: Auke Kok <auke.kok@versity.com>	2026-01-06 13:06:03 -08:00
Zach Brown	50bff13f21	Merge pull request #266 from versity/zab/increase_move_empty_budget Increase server commit block budget for alloc move	2025-12-18 12:44:20 -08:00
Zach Brown	de70ca2372	Increase server commit block budget for alloc move A few callers of alloc_move_empty in the server were providing a budget that was too small. Recent changes to extent_mod_blocks increased the max budget that is necessary to move extents between btrees. The existing WAG of 100 was too small for trees of height 2 and 3. This caused looping in production. We can increase the move budget to half the overall commit budget, which leaves room for a height of around 7 each. This is much greater than we see in practice because the size of the per-mount btrees is effectively limited by both watermarks and thresholds to commit and drain. Signed-off-by: Zach Brown <zab@versity.com>	2025-12-17 14:22:04 -06:00
Zach Brown	5af1412d5f	Merge pull request #270 from versity/auke/bdev_autoloading Avoid block device autoloading warning.	2025-12-17 11:06:32 -08:00
Zach Brown	0a2b2ad409	Merge pull request #269 from versity/auke/tap_status_msg Include t_fail status in tap output.	2025-12-17 11:04:00 -08:00
Auke Kok	6c4590a8a0	Avoid block device autoloading warning. It's possible to trigger the block device autoloading mechanism with a mknod()/stat(), and this mechanism has long been declared obsolete, thus triggering a dmesg warning since el9_7, which then fails the test. You may need to `rmmod loop` to reproduce. Avoid this by not triggering a loop autoload - we just make a different blockdev. Choosing `42` here should avoid any autoload mechanism as this number is explicitly for demo drivers and should never trigger an autoload. We also just ignore the warning line in dmesg. Other tests can and might perhaps still trigger this, as well as background noise running during the test. Signed-off-by: Auke Kok <auke.kok@versity.com>	2025-12-08 13:04:58 -08:00
Zach Brown	1768f69c3c	Merge pull request #224 from versity/auke/renameat2-test-sub-dir Use T_D0/1 instead of T_M0 here.	2025-12-08 10:05:46 -08:00
Zach Brown	dcb0fd5805	Merge pull request #268 from versity/auke/dont_use_bash_special_stdfiles Avoid using bash special device nodes.	2025-12-08 09:47:19 -08:00
Auke Kok	660f874488	Use T_D0/1 instead of T_M0 here. Use of T_M0 and variants should be reserved for e.g. scoutfs <subcommand> -p <mountpoint> type of usages. Tests should create individual content files in the assigned subdirectory. Signed-off-by: Auke Kok <auke.kok@versity.com>	2025-12-04 14:34:02 -05:00
Auke Kok	e1a6689a9b	Include t_fail status in tap output. The tap output file was not yet complete as it failed to include the contents of `status.msg`. In a few cases, that would mean it lacks important context. Signed-off-by: Auke Kok <auke.kok@versity.com>	2025-12-04 14:09:39 -05:00
Auke Kok	2884a92408	Avoid using bash special device nodes. Bash has special handling for these standard IO files, but there are cases where customers have special restrictions set on them, likely to avoid leaking error data out of system logs as part of IDS software. In any case, we can just reopen existing file descriptors here in both these cases to avoid this entirely. This will always work. Signed-off-by: Auke Kok <auke.kok@versity.com>	2025-12-04 13:24:48 -05:00
Zach Brown	e194714004	Merge pull request #264 from versity/auke/findmnt_retval Findmnt returns 1 when no matching entries found	2025-12-03 14:29:31 -08:00
Auke Kok	8bb2f83cf9	Findmnt returns 1 when no matching entries found Our local fence script attempts to interpret errors executing `findmnt` as critical errors, but the program explicitly exits with EXIT_FAILURE when the total number of matching mount entries is zero. This can happen if the mount disappeared while we're attempting to fence the mount, but the scoutfs sysfs files are still in place as we read them. It's a small window, but it's a fork/exec plus full parse of /etc/fstab, and a lot can happen in the 0.015s findmnt takes on my system. There are no other exit codes from findmnt other than 0 and 1. At that point, we can only assume that if the stdout is empty, the mount isn't there anymore. Signed-off-by: Auke Kok <auke.kok@versity.com>	2025-12-02 12:55:11 -08:00
Zach Brown	6a9a6789d5	Merge pull request #267 from versity/clk/merge_enoent Handle ENOENT when getting log merge status item	2025-12-02 09:34:28 -08:00
Chris Kirby	ee630b164f	Handle ENOENT when getting log merge status item Tests that cause client retries can fail with this error from server_commit_log_merge(): error -2 committing log merge: getting merge status item This can happen if the server has already committed and resolved the log merge that is being retried. We can safely ignore ENOENT here just like we do a few lines later. Signed-off-by: Chris Kirby <ckirby@versity.com>	2025-12-01 08:58:24 -06:00