Compare commits

..

15 Commits

Author SHA1 Message Date
Auke Kok
21a676876b Add tuned profile with recommended vm tunings for scoutfs.
Install a scoutfs `tuned` profile as a system-profile for `tuned`
with recommendation hint to enable this.

The goal is to provide a set of basic VM tunings that are reasonably
good starting point for production deployment of scoutfs. The tunings
are chosen to reflect good practices to aim for responsiveness of
the scoutfs deployment.

The values chosen are based on existing tuned profiles, building on
throughput-performance and network-throughput as a base, and tuning
VM values in the same way that latency-performance does. All of them
enable the performance CPU governor. None of this enables powersave
settings.

Different deployments may have different performance characteristics
and require further adjustment, or even a completely different profile.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-01-06 13:06:03 -08:00
Zach Brown
50bff13f21 Merge pull request #266 from versity/zab/increase_move_empty_budget
Increase server commit block budget for alloc move
2025-12-18 12:44:20 -08:00
Zach Brown
de70ca2372 Increase server commit block budget for alloc move
A few callers of alloc_move_empty in the server were providing a budget
that was too small.  Recent changes to extent_mod_blocks increased the
max budget that is necessary to move extents between btrees.  The
existing WAG of 100 was too small for trees of height 2 and 3.  This
caused looping in production.

We can increase the move budget to half the overall commit budget, which
leaves room for a height of around 7 each.  This is much greater than we
see in practice because the size of the per-mount btrees is effectiely
limited by both watermarks and thresholds to commit and drain.

Signed-off-by: Zach Brown <zab@versity.com>
2025-12-17 14:22:04 -06:00
Zach Brown
5af1412d5f Merge pull request #270 from versity/auke/bdev_autoloading
Avoid block device autoloading warning.
2025-12-17 11:06:32 -08:00
Zach Brown
0a2b2ad409 Merge pull request #269 from versity/auke/tap_status_msg
Include t_fail status in tap output.
2025-12-17 11:04:00 -08:00
Auke Kok
6c4590a8a0 Avoid block device autoloading warning.
It's possible to trigger the block device autoloading mechanism
with a mknod()/stat(), and this mechanism has long been declared
obsolete, thus triggering a dmesg warning since el9_7, which then
fails the test. You may need to `rmmod loop` to reproduce.

Avoid this by avoiding to trigger a loop autoload - we just make a
different blockdev. Chosing `42` here should avoid any autoload
mechanism as this number is explicitly for demo drivers and should
never trigger an autoload.

We also just ignore the warning line in dmesg. Other tests can and
might perhaps still trigger this, as well as background noise running
during the test.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-12-08 13:04:58 -08:00
Zach Brown
1768f69c3c Merge pull request #224 from versity/auke/renameat2-test-sub-dir
Use T_D0/1 instead of T_M0 here.
2025-12-08 10:05:46 -08:00
Zach Brown
dcb0fd5805 Merge pull request #268 from versity/auke/dont_use_bash_special_stdfiles
Avoid using bash special device nodes.
2025-12-08 09:47:19 -08:00
Auke Kok
660f874488 Use T_D0/1 instead of T_M0 here.
Use of T_M0 and variants should be reserved for e.g. scoutfs
<subcommand> -p <mountpoint> type of usages. Tests should create
individual content files in the assigned subdirectory.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-12-04 14:34:02 -05:00
Auke Kok
e1a6689a9b Include t_fail status in tap output.
The tap output file was not yet complete as it failed to include
the contents of `status.msg`. In a few cases, that would mean it
lacks important context.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-12-04 14:09:39 -05:00
Auke Kok
2884a92408 Avoid using bash special device nodes.
Bash has special handling when these standard IO files, but
there are cases where customers have special restrictions set
on them. Likely to avoid leaking error data out of system logs
as part of IDS software.

In any case, we can just reopen existing file descriptors here
in both these cases to avoid this entirely. This will always
work.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-12-04 13:24:48 -05:00
Zach Brown
e194714004 Merge pull request #264 from versity/auke/findmnt_retval
Findmnt returns 1 when no matching entries found
2025-12-03 14:29:31 -08:00
Auke Kok
8bb2f83cf9 Findmnt returns 1 when no matching entries found
Our local fence script attempts to interpret errors executing `findmnt`
as critical errors, but the program exit code explicitly returns
EXIT_FAILURE when the total number of matching mount entries is zero.

This can happen if the mount disappeared while we're attempting to
fence the mount, but, the scoutfs sysfs files are still in place as
we read them. It's a small window, but, it's a fork/exec plus full
parse of /etc/fstab, and a lot can happen in the 0.015s findmnt takes
on my system.

There's no other exit codes from findmnt other than 0 and 1. At that
point, we can only assume that if the stdout is empty, the mount
isn't there anymore.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-12-02 12:55:11 -08:00
Zach Brown
6a9a6789d5 Merge pull request #267 from versity/clk/merge_enoent
Handle ENOENT when getting log merge status item
2025-12-02 09:34:28 -08:00
Chris Kirby
ee630b164f Handle ENOENT when getting log merge status item
Tests that cause client retries can fail with this error
from server_commit_log_merge():

error -2 committing log merge: getting merge status item

This can happen if the server has already committed and resolved
the log merge that is being retried. We can safely ignore ENOENT here
just like we do a few lines later.

Signed-off-by: Chris Kirby <ckirby@versity.com>
2025-12-01 08:58:24 -06:00
12 changed files with 139 additions and 269 deletions

View File

@@ -1618,7 +1618,8 @@ static int server_get_log_trees(struct super_block *sb,
goto update;
}
ret = alloc_move_empty(sb, &super->data_alloc, &lt.data_freed, 100);
ret = alloc_move_empty(sb, &super->data_alloc, &lt.data_freed,
COMMIT_HOLD_ALLOC_BUDGET / 2);
if (ret == -EINPROGRESS)
ret = 0;
if (ret < 0) {
@@ -1913,9 +1914,11 @@ static int reclaim_open_log_tree(struct super_block *sb, u64 rid)
scoutfs_alloc_splice_list(sb, &server->alloc, &server->wri, server->other_freed,
&lt.meta_avail)) ?:
(err_str = "empty data_avail",
alloc_move_empty(sb, &super->data_alloc, &lt.data_avail, 100)) ?:
alloc_move_empty(sb, &super->data_alloc, &lt.data_avail,
COMMIT_HOLD_ALLOC_BUDGET / 2)) ?:
(err_str = "empty data_freed",
alloc_move_empty(sb, &super->data_alloc, &lt.data_freed, 100));
alloc_move_empty(sb, &super->data_alloc, &lt.data_freed,
COMMIT_HOLD_ALLOC_BUDGET / 2));
mutex_unlock(&server->alloc_mutex);
/* only finalize, allowing merging, once the allocators are fully freed */
@@ -3036,7 +3039,13 @@ static int server_commit_log_merge(struct super_block *sb,
SCOUTFS_LOG_MERGE_STATUS_ZONE, 0, 0,
&stat, sizeof(stat));
if (ret < 0) {
err_str = "getting merge status item";
/*
* During a retransmission, it's possible that the server
* already committed and resolved this log merge. ENOENT
* is expected in that case.
*/
if (ret != -ENOENT)
err_str = "getting merge status item";
goto out;
}

View File

@@ -9,7 +9,7 @@
echo "$0 running rid '$SCOUTFS_FENCED_REQ_RID' ip '$SCOUTFS_FENCED_REQ_IP' args '$@'"
echo_fail() {
echo "$@" >> /dev/stderr
echo "$@" >&2
exit 1
}
@@ -27,8 +27,7 @@ for fs in /sys/fs/scoutfs/*; do
nr="$(quiet_cat $fs/data_device_maj_min)"
[ ! -d "$fs" -o "$fs_rid" != "$rid" ] && continue
mnt=$(findmnt -l -n -t scoutfs -o TARGET -S $nr) || \
echo_fail "findmnt -t scoutfs -S $nr failed"
mnt=$(findmnt -l -n -t scoutfs -o TARGET -S $nr)
[ -z "$mnt" ] && continue
if ! umount -qf "$mnt"; then

View File

@@ -170,6 +170,9 @@ t_filter_dmesg()
# some ci test guests are unresponsive
re="$re|longest quorum heartbeat .* delay"
# creating block devices may trigger this
re="$re|block device autoloading is deprecated and will be removed."
egrep -v "($re)" | \
ignore_harmless_unwind_kasan_stack_oob
}

View File

@@ -43,9 +43,14 @@ t_tap_progress()
local testname=$1
local result=$2
local stmsg=""
local diff=""
local dmsg=""
if [[ -s $T_RESULTS/tmp/${testname}/status.msg ]]; then
stmsg="1"
fi
if [[ -s "$T_RESULTS/tmp/${testname}/dmesg.new" ]]; then
dmsg="1"
fi
@@ -61,6 +66,7 @@ t_tap_progress()
echo "# ${testname} ** skipped - permitted **"
else
echo "not ok ${i} - ${testname}"
case ${result} in
101)
echo "# ${testname} ** skipped **"
@@ -70,6 +76,13 @@ t_tap_progress()
;;
esac
if [[ -n "${stmsg}" ]]; then
echo "#"
echo "# status:"
echo "#"
cat $T_RESULTS/tmp/${testname}/status.msg | sed 's/^/# - /'
fi
if [[ -n "${diff}" ]]; then
echo "#"
echo "# diff:"

View File

@@ -72,7 +72,7 @@ touch $T_D0/dir/file
mkdir $T_D0/dir/dir
ln -s $T_D0/dir/file $T_D0/dir/symlink
mknod $T_D0/dir/char c 1 3 # null
mknod $T_D0/dir/block b 7 0 # loop0
mknod $T_D0/dir/block b 42 0 # SAMPLE block dev - nonexistant/demo use only number
for name in $(ls -UA $T_D0/dir | sort); do
ino=$(stat -c '%i' $T_D0/dir/$name)
$GRE $ino | filter_types

View File

@@ -8,19 +8,19 @@ t_require_mounts 2
echo "=== renameat2 noreplace flag test"
# give each mount their own dir (lock group) to minimize create contention
mkdir $T_M0/dir0
mkdir $T_M1/dir1
mkdir $T_D0/dir0
mkdir $T_D1/dir1
echo "=== run two asynchronous calls to renameat2 NOREPLACE"
for i in $(seq 0 100); do
# prepare inputs in isolation
touch "$T_M0/dir0/old0"
touch "$T_M1/dir1/old1"
touch "$T_D0/dir0/old0"
touch "$T_D1/dir1/old1"
# race doing noreplace renames, both can't succeed
dumb_renameat2 -n "$T_M0/dir0/old0" "$T_M0/dir0/sharednew" 2> /dev/null &
dumb_renameat2 -n "$T_D0/dir0/old0" "$T_D0/dir0/sharednew" 2> /dev/null &
pid0=$!
dumb_renameat2 -n "$T_M1/dir1/old1" "$T_M1/dir0/sharednew" 2> /dev/null &
dumb_renameat2 -n "$T_D1/dir1/old1" "$T_D1/dir0/sharednew" 2> /dev/null &
pid1=$!
wait $pid0
@@ -31,7 +31,7 @@ for i in $(seq 0 100); do
test "$rc0" == 0 -a "$rc1" == 0 && t_fail "both renames succeeded"
# blow away possible files for either race outcome
rm -f "$T_M0/dir0/old0" "$T_M1/dir1/old1" "$T_M0/dir0/sharednew" "$T_M1/dir1/sharednew"
rm -f "$T_D0/dir0/old0" "$T_D1/dir1/old1" "$T_D0/dir0/sharednew" "$T_D1/dir1/sharednew"
done
t_pass

View File

@@ -7,7 +7,7 @@ message_output()
error_message()
{
message_output "$@" >> /dev/stderr
message_output "$@" >&2
}
error_exit()

View File

@@ -402,39 +402,25 @@ before destroying an old empty data device.
.PD
.TP
.BI "print {-a|--allocs} {-i|--items ITEMS} {-r|--roots ROOTS} {-S|--skip-likely-huge} META-DEVICE"
.BI "print {-S|--skip-likely-huge} META-DEVICE"
.sp
Prints out some or all of the metadata in the file system. This makes no effort
Prints out all of the metadata in the file system. This makes no effort
to ensure that the structures are consistent as they're traversed and
can present structures that seem corrupt as they change as they're
output.
.sp
Structures that are related to the number of mounts and are maintained at a
relatively reasonable size are always printed. These include per-mount log
trees, srch files, allocators, and the metadata allocators used by server
commits. Other btrees and their items can be selected as desired.
.RS 1.0i
.PD 0
.TP
.sp
.TP
.B "-a, --allocs"
Print the metadata and data allocators. Enabled by default.
.TP
.B "-r, --roots ROOTS"
This option can be used to select which btrees are traversed. It is a comma-separated list containing one or more of the following btree roots: logs, srch, fs. Default is all roots.
.TP
.B "-i, --items ITEMS"
This option can be used to choose which btree items are printed from the
selected btree roots. It is a comma-separated list containing one or
more of the following items: inode, xattr, dirent, symlink, backref, extent.
Default is all items.
.TP
.B "-S, --skip-likely-huge"
Skip printing structures that are likely to be very large. The
structures that are skipped tend to be global and whose size tends to be
related to the size of the volume. Examples of skipped structures include
the global fs items, srch files, and metadata and data
allocators.
allocators. Similar structures that are not skipped are related to the
number of mounts and are maintained at a relatively reasonable size.
These include per-mount log trees, srch files, allocators, and the
metadata allocators used by server commits.
.sp
Skipping the larger structures limits the print output to a relatively
constant size rather than being a large multiple of the used metadata

View File

@@ -4,6 +4,12 @@
%{!?_release: %global _release 0.%{pkg_date}git%{pkg_git_hash}}
%if 0%{?rhel} && 0%{?rhel} < 10
%global tuned_profiles_dir %{_prefix}/lib/tuned
%else
%global tuned_profiles_dir %{_prefix}/lib/tuned/profiles
%endif
Name: scoutfs-utils
Summary: scoutfs user space utilities
Version: %{pkg_version}
@@ -57,6 +63,8 @@ install -m 644 -D src/format.h $RPM_BUILD_ROOT%{_includedir}/scoutfs/format.h
install -m 755 -D fenced/scoutfs-fenced $RPM_BUILD_ROOT%{_libexecdir}/scoutfs-fenced/scoutfs-fenced
install -m 644 -D fenced/scoutfs-fenced.service $RPM_BUILD_ROOT%{_unitdir}/scoutfs-fenced.service
install -m 644 -D fenced/scoutfs-fenced.conf.example $RPM_BUILD_ROOT%{_sysconfdir}/scoutfs/scoutfs-fenced.conf.example
install -m 644 -D tuned/tuned.conf $RPM_BUILD_ROOT%{tuned_profiles_dir}/scoutfs/tuned.conf
install -m 644 -D tuned/40-scoutfs.conf $RPM_BUILD_ROOT%{_prefix}/lib/tuned/recommend.d/40-scoutfs.conf
%files
%defattr(644,root,root,755)
@@ -66,6 +74,8 @@ install -m 644 -D fenced/scoutfs-fenced.conf.example $RPM_BUILD_ROOT%{_sysconfdi
%defattr(755,root,root,755)
%{_sbindir}/scoutfs
%{_libexecdir}/scoutfs-fenced
%{tuned_profiles_dir}/scoutfs/tuned.conf
%{_prefix}/lib/tuned/recommend.d/40-scoutfs.conf
%files -n scoutfs-devel
%defattr(644,root,root,755)

View File

@@ -29,42 +29,6 @@
#include "leaf_item_hash.h"
#include "dev.h"
struct print_args {
char *meta_device;
bool skip_likely_huge;
bool roots_requested;
bool items_requested;
bool allocs_requested;
bool walk_allocs;
bool walk_logs_root;
bool walk_fs_root;
bool walk_srch_root;
bool print_inodes;
bool print_xattrs;
bool print_dirents;
bool print_symlinks;
bool print_backrefs;
bool print_extents;
};
static struct print_args print_args = {
.meta_device = NULL,
.skip_likely_huge = false,
.roots_requested = false,
.items_requested = false,
.allocs_requested = false,
.walk_allocs = true,
.walk_logs_root = true,
.walk_fs_root = true,
.walk_srch_root = true,
.print_inodes = true,
.print_xattrs = true,
.print_dirents = true,
.print_symlinks = true,
.print_backrefs = true,
.print_extents = true
};
static void print_block_header(struct scoutfs_block_header *hdr, int size)
{
u32 crc = crc_block(hdr, size);
@@ -231,7 +195,7 @@ static void print_inode_index(struct scoutfs_key *key, void *val, int val_len)
typedef void (*print_func_t)(struct scoutfs_key *key, void *val, int val_len);
static print_func_t find_printer(u8 zone, u8 type, bool *suppress)
static print_func_t find_printer(u8 zone, u8 type)
{
if (zone == SCOUTFS_INODE_INDEX_ZONE &&
type >= SCOUTFS_INODE_INDEX_META_SEQ_TYPE &&
@@ -254,34 +218,13 @@ static print_func_t find_printer(u8 zone, u8 type, bool *suppress)
if (zone == SCOUTFS_FS_ZONE) {
switch(type) {
case SCOUTFS_INODE_TYPE:
if (!print_args.print_inodes)
*suppress = true;
return print_inode;
case SCOUTFS_XATTR_TYPE:
if (!print_args.print_xattrs)
*suppress = true;
return print_xattr;
case SCOUTFS_DIRENT_TYPE:
if (!print_args.print_dirents)
*suppress = true;
return print_dirent;
case SCOUTFS_READDIR_TYPE:
if (!print_args.print_dirents)
*suppress = true;
return print_dirent;
case SCOUTFS_SYMLINK_TYPE:
if (!print_args.print_symlinks)
*suppress = true;
return print_symlink;
case SCOUTFS_LINK_BACKREF_TYPE:
if (!print_args.print_backrefs)
*suppress = true;
return print_dirent;
case SCOUTFS_DATA_EXTENT_TYPE:
if (!print_args.print_extents)
*suppress = true;
return print_data_extent;
case SCOUTFS_INODE_TYPE: return print_inode;
case SCOUTFS_XATTR_TYPE: return print_xattr;
case SCOUTFS_DIRENT_TYPE: return print_dirent;
case SCOUTFS_READDIR_TYPE: return print_dirent;
case SCOUTFS_SYMLINK_TYPE: return print_symlink;
case SCOUTFS_LINK_BACKREF_TYPE: return print_dirent;
case SCOUTFS_DATA_EXTENT_TYPE: return print_data_extent;
}
}
@@ -301,16 +244,12 @@ static int print_fs_item(struct scoutfs_key *key, u64 seq, u8 flags, void *val,
/* only items in leaf blocks have values */
if (val != NULL && !(flags & SCOUTFS_ITEM_FLAG_DELETION)) {
bool suppress = false;
printer = find_printer(key->sk_zone, key->sk_type, &suppress);
if (printer) {
if (!suppress)
printer(key, val, val_len);
} else {
printer = find_printer(key->sk_zone, key->sk_type);
if (printer)
printer(key, val, val_len);
else
printf(" (unknown zone %u type %u)\n",
key->sk_zone, key->sk_type);
}
}
return 0;
@@ -1098,7 +1037,12 @@ static void print_super_block(struct scoutfs_super_block *super, u64 blkno)
}
}
static int print_volume(int fd)
struct print_args {
char *meta_device;
bool skip_likely_huge;
};
static int print_volume(int fd, struct print_args *args)
{
struct scoutfs_super_block *super = NULL;
struct print_recursion_args pa;
@@ -1148,7 +1092,7 @@ static int print_volume(int fd)
ret = err;
}
if (print_args.walk_allocs) {
if (!args->skip_likely_huge) {
for (i = 0; i < array_size(super->meta_alloc); i++) {
snprintf(str, sizeof(str), "meta_alloc[%u]", i);
err = print_btree(fd, super, str, &super->meta_alloc[i].root,
@@ -1175,21 +1119,18 @@ static int print_volume(int fd)
pa.super = super;
pa.fd = fd;
if (print_args.walk_srch_root) {
if (!args->skip_likely_huge) {
err = print_btree_leaf_items(fd, super, &super->srch_root.ref,
print_srch_root_files, &pa);
if (err && !ret)
ret = err;
}
err = print_btree_leaf_items(fd, super, &super->logs_root.ref,
print_log_trees_roots, &pa);
if (err && !ret)
ret = err;
if (print_args.walk_logs_root) {
err = print_btree_leaf_items(fd, super, &super->logs_root.ref,
print_log_trees_roots, &pa);
if (err && !ret)
ret = err;
}
if (print_args.walk_fs_root) {
if (!args->skip_likely_huge) {
err = print_btree(fd, super, "fs_root", &super->fs_root,
print_fs_item, NULL);
if (err && !ret)
@@ -1202,16 +1143,16 @@ out:
return ret;
}
static int do_print(void)
static int do_print(struct print_args *args)
{
int ret;
int fd;
fd = open(print_args.meta_device, O_RDONLY);
fd = open(args->meta_device, O_RDONLY);
if (fd < 0) {
ret = -errno;
fprintf(stderr, "failed to open '%s': %s (%d)\n",
print_args.meta_device, strerror(errno), errno);
args->meta_device, strerror(errno), errno);
return ret;
}
@@ -1219,169 +1160,30 @@ static int do_print(void)
if (ret < 0)
goto out;
ret = print_volume(fd);
ret = print_volume(fd, args);
out:
close(fd);
return ret;
};
enum {
LOGS_OPT = 0,
FS_OPT,
SRCH_OPT
};
static char *const root_tokens[] = {
[LOGS_OPT] = "logs",
[FS_OPT] = "fs",
[SRCH_OPT] = "srch",
NULL
};
enum {
INODE_OPT = 0,
XATTR_OPT,
DIRENT_OPT,
SYMLINK_OPT,
BACKREF_OPT,
EXTENT_OPT
};
static char *const item_tokens[] = {
[INODE_OPT] = "inode",
[XATTR_OPT] = "xattr",
[DIRENT_OPT] = "dirent",
[SYMLINK_OPT] = "symlink",
[BACKREF_OPT] = "backref",
[EXTENT_OPT] = "extent",
NULL
};
static void clear_items(void)
{
print_args.print_inodes = false;
print_args.print_xattrs = false;
print_args.print_dirents = false;
print_args.print_symlinks = false;
print_args.print_backrefs = false;
print_args.print_extents = false;
}
static void clear_roots(void)
{
print_args.walk_logs_root = false;
print_args.walk_fs_root = false;
print_args.walk_srch_root = false;
}
static int parse_opt(int key, char *arg, struct argp_state *state)
{
struct print_args *args = state->input;
char *subopts;
char *value;
bool parse_err = false;
switch (key) {
case 'S':
args->skip_likely_huge = true;
break;
case 'a':
args->allocs_requested = true;
args->walk_allocs = true;
break;
case 'i':
/* Specific items being requested- clear them all to start */
if (!args->items_requested) {
clear_items();
if (!args->allocs_requested)
args->walk_allocs = false;
args->items_requested = true;
}
subopts = arg;
while (*subopts != '\0' && !parse_err) {
switch (getsubopt(&subopts, item_tokens, &value)) {
case INODE_OPT:
args->print_inodes = true;
break;
case XATTR_OPT:
args->print_xattrs = true;
break;
case DIRENT_OPT:
args->print_dirents = true;
break;
case SYMLINK_OPT:
args->print_symlinks = true;
break;
case BACKREF_OPT:
args->print_backrefs = true;
break;
case EXTENT_OPT:
args->print_extents = true;
break;
default:
argp_usage(state);
parse_err = true;
break;
}
}
break;
case 'r':
/* Specific roots being requested- clear them all to start */
if (!args->roots_requested) {
clear_roots();
if (!args->allocs_requested)
args->walk_allocs = false;
args->roots_requested = true;
}
subopts = arg;
while (*subopts != '\0' && !parse_err) {
switch (getsubopt(&subopts, root_tokens, &value)) {
case LOGS_OPT:
args->walk_logs_root = true;
break;
case FS_OPT:
args->walk_fs_root = true;
break;
case SRCH_OPT:
args->walk_srch_root = true;
break;
default:
argp_usage(state);
parse_err = true;
break;
}
}
break;
case ARGP_KEY_ARG:
if (!args->meta_device)
args->meta_device = strdup_or_error(state, arg);
else
argp_error(state, "more than one argument given");
break;
case ARGP_KEY_FINI:
if (!args->meta_device)
argp_error(state, "no metadata device argument given");
/*
* For backwards compatibility, translate -S. Should we warn if
* this conflicts with other explicit options?
*/
if (args->skip_likely_huge) {
if (!args->allocs_requested)
args->walk_allocs = false;
args->walk_fs_root = false;
args->walk_srch_root = false;
}
break;
default:
break;
}
@@ -1390,10 +1192,7 @@ static int parse_opt(int key, char *arg, struct argp_state *state)
}
static struct argp_option options[] = {
{ "allocs", 'a', NULL, 0, "Print metadata and data alloc lists" },
{ "items", 'i', "ITEMS", 0, "Item(s) to print (inode, xattr, dirent, symlink, backref, extent)" },
{ "roots", 'r', "ROOTS", 0, "Tree root(s) to walk (logs, srch, fs)" },
{ "skip-likely-huge", 'S', NULL, 0, "Skip allocs, srch root and fs root to minimize output size" },
{ "skip-likely-huge", 'S', NULL, 0, "Skip large structures to minimize output size"},
{ NULL }
};
@@ -1406,15 +1205,17 @@ static struct argp argp = {
static int print_cmd(int argc, char **argv)
{
struct print_args print_args = {NULL};
int ret;
ret = argp_parse(&argp, argc, argv, 0, NULL, &print_args);
if (ret)
return ret;
return do_print();
return do_print(&print_args);
}
static void __attribute__((constructor)) print_ctor(void)
{
cmd_register_argp("print", &argp, GROUP_DEBUG, print_cmd);

View File

@@ -0,0 +1,9 @@
#
# scoutfs tuned recommendation
#
# If the system has support for mounting scoutfs filesystems, which is
# valid for client mounts and quorum mounts. We then always recommend
# the scoutfs profile.
[scoutfs]
/proc/filesystems=scoutfs

40
utils/tuned/tuned.conf Normal file
View File

@@ -0,0 +1,40 @@
#
# ScoutFS specific tuned profile
#
# The parameters below are a mix of settings present in the throughput-performance
# profile as well as the latency-performance profile. Generally speaking, we
# want to encourage the system to avoid swap and accumulating large amounts of
# dirty data, as this can cause reclaim to lead to congestion.
# Enable this profile with `$ sudo tuned-adm profile scoutfs`
# linux default values are marked with [<value>] for reference.
[main]
summary=Optimize for production scoutfs deployment
description=Configures the system for production scoutfs filesystem server deployment.
# network-throughput sets some larger buffers useful for 40gbe deployments
# network-throughput also inherits throughput-performance
include=network-throughput
[vm]
# throughput-performance sets dirty_bytes to 40% (much larger than linux default), but
# this allows the accumulation of large backlogs of writeback. We prefer to writeback
# often and early to avoid congestion [20%]
dirty_bytes = 10%
# start writing back at this amount [10%]
dirty_background_bytes = 5%
[sysctl]
# the kernel default is 60. Lower it to instruct the kernel that swapping is
# expensive and we want to avoid it. We assume scoutfs deployments have ample
# available RAM. [60]
vm.swappiness = 10
# increase pdflush runs so it can more aggressively write out dirty data [500]
vm.dirty_writeback_centisecs = 300
# decrease time dirty data will linger before being written back [3000]
vm.dirty_expire_centisecs = 2000