Adds an accompanying option to set a data preallocation minimum
threshold value. The value can be set through sysfs or at mount
time.
data_prealloc_blocks_min cannot be larger than data_prealloc_blocks,
and this is enforced. This should be fine for all common use
cases where the _min option is expected to be less than 2048,
the default of data_prealloc_blocks.
Extra test cases validate bad mount option values and sysfs value
writes, and verify that the minimum threshold is set and honored as
expected.
Preallocation scales with scoutfs_get_inode_onoff() online values,
so that new extents double the online size every allocation until
it reaches data_prealloc_blocks. The _onoff() value is only
fetched once if possible.
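As a rough illustration of the scaling described above, assume the next
preallocation is simply the online block count clamped between the
minimum threshold and data_prealloc_blocks. That model, the helper name
and the minimum of 16 blocks are assumptions for this sketch, not the
kernel implementation.

```shell
# Assumed model: next preallocation = online blocks, clamped to
# [min threshold, data_prealloc_blocks]. Names are hypothetical.
next_prealloc() {
    local online="$1" min="$2" max="$3" n
    n=$online
    [ "$n" -lt "$min" ] && n=$min
    [ "$n" -gt "$max" ] && n=$max
    echo "$n"
}

# Simulate a few allocations on an initially empty file with an assumed
# minimum of 16 blocks and the default maximum of 2048.
online=0
sizes=""
for _ in 1 2 3 4 5; do
    p=$(next_prealloc "$online" 16 2048)
    sizes="$sizes $p"
    online=$((online + p))
done
echo "prealloc sizes:$sizes"
```

Under this model the online size doubles with each allocation once it
passes the minimum, and the preallocation never exceeds the maximum.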
Signed-off-by: Auke Kok <auke.kok@versity.com>
Add a reclaim_skip_finalize trigger that prevents reclaim from
setting FINALIZED on log_trees entries. The test arms this trigger,
force-unmounts a client to create an orphan, and verifies the log
merge succeeds without timeout and the orphan reclaim message
appears in dmesg.
Signed-off-by: Auke Kok <auke.kok@versity.com>
Basic testing for the punch-offline ioctl code. The tests consist of a
bunch of negative testing to make sure things that are expressly not
allowed fail, followed by a bunch of tests with known expected outcomes
that punch holes in several patterns and verify the results.
Signed-off-by: Auke Kok <auke.kok@versity.com>
This test regularly fails here because the grep is greedy and can
match inodes ending in the same digits as the one we're looking for.
Make it use the same awk pattern used below.
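The difference between a substring grep and an exact field match can be
sketched as follows. The listing format and inode numbers are invented
for illustration; the real test's input may look different.

```shell
# Hypothetical listing: inode number in the first field.
listing=$(printf '%s\n' "1234 fileA" "51234 fileB" "234 fileC")
ino=234

# A plain grep matches any line containing the digits, including other
# inodes that merely end in them.
grep_hits=$(printf '%s\n' "$listing" | grep -c "$ino")

# Comparing the whole field in awk only matches the exact inode.
awk_hits=$(printf '%s\n' "$listing" |
           awk -v ino="$ino" '$1 == ino { n++ } END { print n + 0 }')

echo "grep matched $grep_hits lines, awk matched $awk_hits"
```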
Signed-off-by: Auke Kok <auke.kok@versity.com>
The fix added in v1.26-17-gef0f6f8a does a good job of avoiding the
intermittent test failures for the part where it was added. The remote
unlink section could use it as well, as it suffers from the same
intermittent failures.
Signed-off-by: Auke Kok <auke.kok@versity.com>
This file was put into $CWD by the test scripts for no real good
reason. I suppose somewhere $seqres was supposed to be set before
these writes happened. Just write them to the test temp folder for
good measure for now.
Signed-off-by: Auke Kok <auke.kok@versity.com>
This test case is used to detect and reproduce a customer issue we're
seeing where the new .get_acl() method API and underlying changes in
el9_6+ are causing ACL cache fetching to return inconsistent results,
which shows as missing ACLs on directories.
This particular sequence is consistent enough that it warrants making
it into a specific test.
Signed-off-by: Auke Kok <auke.kok@versity.com>
Adds basic mkfs/mount/umount helpers that handle all the basics
for making, mounting and unmounting scratch devices. The mount/unmount
helpers create "$T_MSCR", which lives in "$T_TMPDIR".
Signed-off-by: Auke Kok <auke.kok@versity.com>
There have been a few failures where output is generated just before we
crash but it didn't have a chance to be written. Add a best-effort
background sync before crashing. There's a good chance it'll hang if
the system is stuck, so we don't wait for it directly; we just wait for
some time to pass.
Signed-off-by: Zach Brown <zab@versity.com>
The fence-and-reclaim test runs a bunch of scenarios and makes sure that
the fence agent was run on the appropriate mount's rids.
Unfortunately the checks were racy. The check itself only looked at
the log once to see if the rid had been fenced. Each check had steps
before that would wait until the rid should have been fenced and could
be checked.
Those steps were racy. They'd do things like make sure a fence request
wasn't pending, but never waited for it to be created in the first
place. They'd falsely indicate that the log should be checked and when
the rid wasn't found in the log the test would fail. In logs of
failures we'd see that the rids were fenced after this test failed and
moved on to the next.
This simplifies the checks. It gets rid of all the intermediate steps
and just waits around for the rid to be fenced, with a timeout. This
avoids the flaky tests.
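The "just wait for it, with a timeout" approach could look roughly like
the bash helper below. The helper name, the log file argument and the
pattern are hypothetical; the real test decides what it searches for.

```shell
# Hypothetical helper: poll a log file until a pattern appears or the
# timeout (in seconds) expires. Returns 0 on match, 1 on timeout.
wait_for_log_match() {
    local log="$1" pattern="$2" timeout="$3"
    local deadline=$((SECONDS + timeout))

    while ! grep -q -- "$pattern" "$log" 2>/dev/null; do
        if [ "$SECONDS" -ge "$deadline" ]; then
            return 1
        fi
        sleep 0.2
    done
    return 0
}
```

A caller would invoke it once per rid and fail the test only if it
returns non-zero, with no intermediate state checks beforehand.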
Signed-off-by: Zach Brown <zab@versity.com>
We observe that unmount in this test can consume up to 10sec of time
before proceeding to record heartbeat timeout elections by followers.
When this happens, elections and new leaders happen before unmount even
completes. This indicates that heartbeat packets from the unmounting
node cease immediately, but the unmount takes longer doing other things.
The timeouts then trigger, possibly during the unmount.
The result is that with timeouts of 3 seconds, we're not actually
waiting for an election at all. It already happened 7 seconds ago. The
code here just "sees" that it happens a few hundred ms after it started
looking for it.
There are a few ways to fix this. We could record the actual timestamp
of the election, and compare it with the actual timestamp of the last
heartbeat packet. This would be conclusive, and could disregard any
complication from umount taking too long. But it also means adding
timestamping in various places, or having to rely on tcpdump with packet
processing.
We can't just record $start before unmount. We would still violate the
part of the test that checks that elections didn't happen too late,
especially in the 3sec test case if unmount takes 10sec.
The simplest solution is to unmount in the background, and circle around
later to `wait` for it to ensure we can re-mount without ill effect.
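The shape of that change, using a stand-in function instead of the real
unmount, is sketched below. The function name and the one second delay
are placeholders.

```shell
# Stand-in for a slow umount; the real test would unmount the leader's
# mount here. The function name is hypothetical.
slow_umount() {
    sleep 1
}

start=$SECONDS

# Start the unmount in the background so timing begins immediately.
slow_umount &
umount_pid=$!

# ... watch for the election here, measuring from $start ...

# Circle back and make sure the unmount finished before re-mounting.
wait "$umount_pid"
umount_rc=$?
echo "umount exited with $umount_rc"
```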
Signed-off-by: Auke Kok <auke.kok@versity.com>
Due to an iput race, the "unlink wait for open on other mount"
subtest can fail. If the unlink happens inline, then the test
passes. But if the orphan scanner has to complete the unlink
work, it's possible that there won't be enough log merge work
for the scanner to do the cleanup before we look at the seq index.
Add SCOUTFS_TRIGGER_LOG_MERGE_FORCE_FINALIZE_OURS, to allow
forcing a log merge. Add new counters, log_merges_start and
log_merge_complete, so that tests can see that a merge has happened.
Then we have to wait for the orphan scanner to do its work.
Add a new counter, orphan_scan_empty, that increments each time
the scanner walks the entire inode space without finding any
orphans. Once the test sees that counter increment, it should be
safe to check the seq index and see that the unlinked inode is gone.
Signed-off-by: Chris Kirby <ckirby@versity.com>
The /sys/kernel/debug/tracing/buffer_size_kb file always reads as
"7 (expanded: 1408)". So the -T option to run-test.sh won't work,
because it tries to multiply that string by the given factor.
The expanded size defaults to 1408 on every platform we currently
support.
Just use that value so we can specify -T in CI runs.
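A small sketch of why the multiplication breaks and what the fixed value
gives instead. The factor of 4 is an arbitrary example and the variable
names are not taken from run-test.sh.

```shell
# Contents of buffer_size_kb as described above.
raw="7 (expanded: 1408)"
factor=4   # example multiplier, as would be passed with -T

# Using the raw string in shell arithmetic is an error; run it in a
# subshell so the failure doesn't abort this script.
if ( : $((raw * factor)) ) 2>/dev/null; then
    mult_ok=yes
else
    mult_ok=no
fi

# Using the known default works as expected.
default_kb=1408
scaled_kb=$((default_kb * factor))
echo "raw multiply ok: $mult_ok, scaled size: $scaled_kb"
```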
Signed-off-by: Chris Kirby <ckirby@versity.com>
It's possible to trigger the block device autoloading mechanism
with a mknod()/stat(), and this mechanism has long been declared
obsolete, thus triggering a dmesg warning since el9_7, which then
fails the test. You may need to `rmmod loop` to reproduce.
Avoid this by not triggering a loop autoload: we just make a
different blockdev. Choosing `42` here should avoid any autoload
mechanism as this number is explicitly for demo drivers and should
never trigger an autoload.
We also just ignore the warning line in dmesg. Other tests might
still trigger it, as might background activity running during the
test.
Signed-off-by: Auke Kok <auke.kok@versity.com>
Use of T_M0 and variants should be reserved for e.g. scoutfs
<subcommand> -p <mountpoint> type of usages. Tests should create
individual content files in the assigned subdirectory.
Signed-off-by: Auke Kok <auke.kok@versity.com>
The tap output file was not yet complete as it failed to include
the contents of `status.msg`. In a few cases, that would mean it
lacks important context.
Signed-off-by: Auke Kok <auke.kok@versity.com>
Bash has special handling for these standard IO files, but
there are cases where customers have special restrictions set
on them, likely to avoid leaking error data out of system logs
as part of IDS software.
In any case, we can just reopen existing file descriptors here
in both these cases to avoid this entirely. This will always
work.
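The two forms are contrasted below, assuming the files in question are
the /dev/stdout and /dev/stderr paths. The helper name and message are
made up for the example.

```shell
# Writing through the path asks for the file to be opened again, which
# can fail where those paths are restricted:
#     echo "message" > /dev/stderr
#
# Duplicating the descriptor the shell already holds avoids the open.
log_err() {
    echo "$*" >&2
}

# Capture stderr to show the message arrives on the existing descriptor.
captured=$(log_err "fence failed" 2>&1)
echo "captured: $captured"
```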
Signed-off-by: Auke Kok <auke.kok@versity.com>
Our local fence script attempts to interpret errors executing `findmnt`
as critical errors, but the program exit code explicitly returns
EXIT_FAILURE when the total number of matching mount entries is zero.
This can happen if the mount disappeared while we're attempting to
fence the mount but the scoutfs sysfs files were still in place when
we read them. It's a small window, but it covers a fork/exec plus a full
parse of /etc/fstab, and a lot can happen in the 0.015s findmnt takes
on my system.
There are no exit codes from findmnt other than 0 and 1. At that
point, we can only assume that if the stdout is empty, the mount
isn't there anymore.
Signed-off-by: Auke Kok <auke.kok@versity.com>
The userspace fencing process wasn't careful about handling underlying
directories that disappear while it was working.
On the server/fenced side, fencing requests can linger after they've
been resolved by writing 1 to fenced or error. The script could come
back around to see the directory before the server finally removes it,
causing all later uses of the request dir to fail. We saw this in the
logs as a bunch of cat errors for the various request files.
On the local fence script side, all the mounts can be in the process of
being unmounted so both the /sys/fs dirs and the mount itself can be
removed while we're working.
For both, when we're working with the /sys/fs files we read them without
logging errors and then test that the dir still exists before using what
we read. When fencing a mount, we stop if findmnt doesn't find the
mount and then raise a umount error if the /sys/fs dir exists after
umount fails.
And while we're at it, we have each script's logging append instead of
truncating (if, say, it's a log file instead of an interactive tty).
Signed-off-by: Zach Brown <zab@versity.com>
We're getting test failures from messages that our guests can be
unresponsive. They sure can be. We don't need to fail for this one
specific case.
Signed-off-by: Zach Brown <zab@versity.com>
mmap_stress gets completely stalled in lock messaging, starving
most of its threads, which causes it to delay and even
time out in CI.
Instead of spawning threads over all 5 test nodes, we artificially
limit it to spawning over only 2. This still does a good number
of operations on those nodes, and now the work is spread across the
two nodes evenly.
Additionally, I've added a minuscule (10ms) delay in between operations
that should hopefully be sufficient for other locking attempts to
settle and allow the threads to better spread the work.
This now shows that all the threads exit within < 0.25s on my test
machine, which is a lot better than the 40s variation that I was seeing
locally. Hopefully this fares better in CI.
Signed-off-by: Auke Kok <auke.kok@versity.com>
There's a scenario where mmap_stress gets enough resources that
two of the threads will starve the others, which then all take
a very long time catching up committing changes.
Because this test program didn't finish until all the threads had
completed a fixed amount of work, essentially these threads all
ended up tripping over each other. In CI this would exceed 6h,
while originally I intended this to run in about 100s or so.
Instead, cap the run time to ~30s by default. If threads exceed
this time, they will immediately exit, which causes any clog in
contention between the threads to drain relatively quickly.
Signed-off-by: Auke Kok <auke.kok@versity.com>
The xfstests's golden output includes the full set of tests we expect to
run when no args are specified. If we specify args then the set of
tests can change and the test will always fail when they do.
This fixes that by having the test check the set of tests itself, rather
than relying on golden output. If args are specified then our xfstest
only fails if any of the executed xfstest tests failed. Without args,
we perform the same scraping of the check output and compare it against
the expected results ourselves.
It would have been a bit much to put that large file inline in the test
file, so we add a dir of per-test files in revision control. We can
also put the list of exclusions there.
We can also clean up the output redirection helper functions to make
them more clear. After xfstests has executed we want to redirect output
back to the compared output so that we can catch any unexpected output.
Signed-off-by: Zach Brown <zab@versity.com>
Add a little background function that runs during the test which
triggers a crash if it finds catastrophic failure conditions.
This is the second bg task we want to kill and we can only have one
function run on the EXIT trap, so we create a generic process killing
trap function.
We feed it the fenced pid as well. run-tests didn't log much of value
into the fenced log, and we're not logging the kills into it anymore, so
we just remove run-tests fenced logging.
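A generic process-killing EXIT trap of the kind described could be
sketched like this. The variable and function names are hypothetical
and not taken from run-tests.

```shell
# Collect background pids and kill them all from a single EXIT trap,
# since only one handler can be registered for EXIT at a time.
BG_PIDS=""

add_bg_pid() {
    BG_PIDS="$BG_PIDS $1"
}

kill_bg_pids() {
    local pid
    for pid in $BG_PIDS; do
        kill "$pid" 2>/dev/null
    done
    BG_PIDS=""
}

trap kill_bg_pids EXIT
```

Each background task, such as the crash monitor or fenced, would then
register itself with `add_bg_pid $!` right after being started.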
Signed-off-by: Zach Brown <zab@versity.com>
Add an option to run-tests to have it run each selected test a number
of times in a loop. Looping stops if the test doesn't pass.
Most of the change in the per-test execution is indenting as we add the
for loop block. The stats and kmsg output are lifted up before the
loop.
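The looping behaviour amounts to something like the following sketch,
where the function name and messages are made up and the test is any
command passed as arguments.

```shell
# Hypothetical helper: run a command up to $1 times, stopping at the
# first failure.
run_looped() {
    local count="$1" i
    shift
    for i in $(seq 1 "$count"); do
        if ! "$@"; then
            echo "failed on iteration $i"
            return 1
        fi
    done
    echo "passed $count iterations"
}
```

For example, `run_looped 10 ./some-test.sh` would stop as soon as one
run of the (hypothetical) script fails.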
Signed-off-by: Zach Brown <zab@versity.com>
The tests were using high ephemeral port numbers for the mount server's
listening port. This caused occasional failure if the client's
ephemeral ports happened to collide with the ports used by the tests.
This puts all the port number configuration in one place and adds a
quick check to make sure it doesn't wander into the current ephemeral
range. Then it updates all the tests to use the chosen ports.
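A check of that kind might look like the sketch below. The helper name
is hypothetical, and the range is passed in as arguments so the logic
can be exercised without reading the live system setting.

```shell
# Hypothetical check: succeed only if the port lies outside the given
# ephemeral range [low, high].
port_outside_ephemeral() {
    local port="$1" low="$2" high="$3"
    [ "$port" -lt "$low" ] || [ "$port" -gt "$high" ]
}

# On Linux the current range could be read like this:
#     read -r low high < /proc/sys/net/ipv4/ip_local_port_range
# 32768-60999 is the usual Linux default, used here as an example.
if port_outside_ephemeral 42000 32768 60999; then
    echo "port 42000 is safe"
else
    echo "port 42000 collides with the ephemeral range"
fi
```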
Signed-off-by: Zach Brown <zab@versity.com>
Make sure that the orphan scanners can see deletions after forced unmounts
by waiting for reclaim_open_log_tree() to run on each mount; and waiting for
finalize_and_start_log_merge() to run and not find any finalized trees.
Do this by adding two new counters: reclaimed_open_logs and
log_merge_no_finalized and fixing the orphan-inodes test to check those
before waiting for the orphan scanners to complete.
Signed-off-by: Chris Kirby <ckirby@versity.com>
Tests such as quorum-heartbeat-timeout were failing with EIO messages in
dmesg output due to expected errors during forced unmount. Use ENOLINK
instead, and filter all errors from dmesg with this errno (67).
Signed-off-by: Chris Kirby <ckirby@versity.com>
This test compiles an earlier commit from the tree that is starting to
fail due to various changes on the OS level, most recently due to sparse
issues with newer kernel headers. This problem will likely increase
in the future as we add more supported releases.
We opt to only run this test on el7 for now. While we could have
made this skip sparse checks that fail it on el8, it will suffice at
this point if this just works on one of the supported OS versions
during testing.
Signed-off-by: Auke Kok <auke.kok@versity.com>
The issue with the previous attempt to fix the orphan-inodes test was
that we would regularly exceed the 120s timeout value put in there.
Instead, in this commit, we change the code to add a new counter to
indicate orphan deletion progress. When orphan inodes are deleted, the
increment of this counter indicates progress happened. Conversely,
every time the counter doesn't increment, and the orphan scan attempts
counter increments, we know that there was no more work to be done.
For safety, we wait until 2 consecutive scan attempts were made without
forward progress in the test case.
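The "two consecutive scans without progress" rule can be illustrated on
a stream of counter samples. How the real test samples the counters is
not shown here; the helper name and the input format are assumptions.

```shell
# Reads "scan_attempts deleted" counter samples, one pair per line, and
# prints the sample number at which two consecutive scan attempts had
# completed without any deletion progress. Returns 1 if that never
# happens in the input.
find_settled_sample() {
    local attempts deleted prev_attempts="" prev_deleted="" idle=0 n=0
    while read -r attempts deleted; do
        n=$((n + 1))
        if [ -n "$prev_attempts" ]; then
            if [ "$deleted" -ne "$prev_deleted" ]; then
                idle=0                # progress was made, start over
            elif [ "$attempts" -gt "$prev_attempts" ]; then
                idle=$((idle + 1))    # a scan ran and deleted nothing
            fi
            if [ "$idle" -ge 2 ]; then
                echo "$n"
                return 0
            fi
        fi
        prev_attempts=$attempts
        prev_deleted=$deleted
    done
    return 1
}
```

Any sample where the deletion counter moves resets the idle count, so
the wait only ends after the scanner has twice found nothing to do.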
Signed-off-by: Auke Kok <auke.kok@versity.com>
This reverts commit 138c7c6b49.
The timeout value here is still exceeded by CI test jobs, causing
the test to fail.
Signed-off-by: Auke Kok <auke.kok@versity.com>
Adjusting hung_task_timeout_secs is still needed for this test to pass
with a debug kernel. But the logic belongs on the platform side.
Signed-off-by: Chris Kirby <ckirby@versity.com>
The `-R` option will shuffle the order in which tests are executed.
The testing order shouldn't affect the outcome of any of the tests, but
in practice many of these tests will execute code slightly differently
based on the history of the filesystem, resources allocated, memory
usage etc. of tests that were executed before. Shuffling the order of
tests therefore introduces small semi-random variations in the
environment.
The xfstests test is the only one that can't be shuffled yet into the
mix, so it is kept at the end. This is because it leaves the filesystems
unmounted. At a later point we may want to address this.
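Shuffling while pinning one test to the end can be sketched with
`shuf`. The helper name and the test names, including the exact
spelling of the xfstests entry, are placeholders.

```shell
# Hypothetical helper: shuffle test names read from stdin (one per
# line) but always emit "xfstests" last if it is present.
shuffle_tests() {
    local all
    all=$(cat)
    printf '%s\n' "$all" | grep -v -x 'xfstests' | shuf
    printf '%s\n' "$all" | grep -x 'xfstests'
    return 0
}

printf '%s\n' a.sh xfstests b.sh c.sh | shuffle_tests
```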
Signed-off-by: Auke Kok <auke.kok@versity.com>