We observe that unmount in this test can consume up to 10sec of time
before proceeding to record heartbeat timeout elections by followers.
When this happens, elections complete and new leaders emerge before the
unmount even finishes. This indicates that heartbeat packets from the
unmounting node cease immediately, while the unmount itself takes longer
doing other things. The timeouts then trigger, possibly during the unmount.
The result is that with timeouts of 3 seconds, we're not actually
waiting for an election at all. It already happened 7 seconds ago. The
code here just "sees" that it happens a few hundred ms after it started
looking for it.
There are a few ways to fix this. We could record the actual timestamp
of the election, and compare it with the actual timestamp of the last
heartbeat packet. This would be conclusive, and could disregard any
complication from umount taking too long. But it also means adding
timestamping in various places, or having to rely on tcpdump with packet
processing.
We can't just record $start before unmount. That would still violate the
part of the test that checks that elections didn't happen too late,
especially in the 3sec test case if unmount takes 10sec.
The simplest solution is to unmount in the background, and circle around
later to `wait` for it to ensure we can re-mount without ill effect.
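
A minimal sketch of the background-then-`wait` pattern, assuming a
hypothetical slow_unmount helper that stands in for the real unmount
(the names here are illustrative and not taken from the test):

```shell
# Hypothetical stand-in for a slow unmount; the real test would
# call its own unmount helper here.
slow_unmount() {
    sleep 1
    return 0
}

# Record the start time before kicking off the unmount.
start=$(date +%s)

# Run the unmount in the background so its duration does not delay
# the point at which we start looking for the election.
slow_unmount &
unmount_pid=$!

# ... the election check would run here, measured against $start ...

# Circle back: make sure the unmount has finished before re-mounting.
wait "$unmount_pid"
unmount_rc=$?
```

`wait` returns the exit status of the background job, so a failed
unmount can still be detected before the re-mount.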
Signed-off-by: Auke Kok <auke.kok@versity.com>
A few callers of alloc_move_empty in the server were providing a budget
that was too small. Recent changes to extent_mod_blocks increased the
max budget that is necessary to move extents between btrees. The
existing WAG of 100 was too small for trees of height 2 and 3. This
caused looping in production.
We can increase the move budget to half the overall commit budget, which
leaves room for a height of around 7 each. This is much greater than we
see in practice because the size of the per-mount btrees is effectively
limited by both watermarks and thresholds to commit and drain.
Signed-off-by: Zach Brown <zab@versity.com>
It's possible to trigger the block device autoloading mechanism
with a mknod()/stat(), and this mechanism has long been declared
obsolete, thus triggering a dmesg warning since el9_7, which then
fails the test. You may need to `rmmod loop` to reproduce.
Avoid this by not triggering a loop autoload: we just make a
different blockdev. Choosing `42` here should avoid any autoload
mechanism as this number is explicitly for demo drivers and should
never trigger an autoload.
We also just ignore the warning line in dmesg. Other tests might
still trigger it, as might background activity running during the
test.
Signed-off-by: Auke Kok <auke.kok@versity.com>
Use of T_M0 and variants should be reserved for e.g. scoutfs
<subcommand> -p <mountpoint> type of usages. Tests should create
individual content files in the assigned subdirectory.
Signed-off-by: Auke Kok <auke.kok@versity.com>
The tap output file was not yet complete as it failed to include
the contents of `status.msg`. In a few cases, that would mean it
lacks important context.
Signed-off-by: Auke Kok <auke.kok@versity.com>
Bash has special handling for these standard IO files, but
there are cases where customers have special restrictions set
on them, likely to avoid leaking error data out of system logs
as part of IDS software.
In any case, we can just reopen existing file descriptors here
in both these cases to avoid this entirely. This will always
work.
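
A minimal sketch of the difference, assuming hypothetical log_err and
log_out helpers (not names from the scripts themselves): duplicating
an already-open descriptor never opens a path, so restrictions on the
standard IO device files do not matter.

```shell
# Hypothetical helper: write to stderr by duplicating fd 2 rather
# than opening the /dev/stderr path.
log_err() {
    echo "$@" >&2
}

# Hypothetical helper: same idea for stdout, duplicating fd 1.
log_out() {
    echo "$@" >&1
}
```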
Signed-off-by: Auke Kok <auke.kok@versity.com>
Our local fence script attempts to interpret errors executing `findmnt`
as critical errors, but the program exit code explicitly returns
EXIT_FAILURE when the total number of matching mount entries is zero.
This can happen if the mount disappeared while we're attempting to
fence the mount, but the scoutfs sysfs files were still in place when
we read them. It's a small window, but it's a fork/exec plus full
parse of /etc/fstab, and a lot can happen in the 0.015s findmnt takes
on my system.
findmnt has no exit codes other than 0 and 1. At that point, we can
only assume that if the stdout is empty, the mount isn't there
anymore.
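
A minimal sketch of that decision, assuming a hypothetical mount_state
helper that is handed the captured stdout of findmnt (the helper name
and the example invocation are illustrative, not the fence script's
actual code):

```shell
# Hypothetical helper: classify a mount from findmnt's captured
# stdout. The exit code is deliberately not consulted, since a
# non-zero code only means that no entries matched.
mount_state() {
    if [ -n "$1" ]; then
        echo "mounted"
    else
        echo "gone"
    fi
}

# Example use, assuming $dev holds the device being fenced:
#   out=$(findmnt --noheadings --output TARGET --source "$dev")
#   state=$(mount_state "$out")
```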
Signed-off-by: Auke Kok <auke.kok@versity.com>
Tests that cause client retries can fail with this error
from server_commit_log_merge():
error -2 committing log merge: getting merge status item
This can happen if the server has already committed and resolved
the log merge that is being retried. We can safely ignore ENOENT here
just like we do a few lines later.
Signed-off-by: Chris Kirby <ckirby@versity.com>