The core quorum work loop assumes that it has exclusive access to its
slot's quorum block. It uniquely marks blocks it writes and verifies
the marks on read to discover if another mount has written to its slot
under the assumption that this must be a configuration error that put
two mounts in the same slot.
But the design of the leader bit in the block violates the invariant
that only a slot will write to its block. As the server comes up and
fences previous leaders it writes to their block to clear their leader
bit.
The final hole in the design is that because we're fencing mounts, not
slots, each slot can have two mounts in play. An active mount can be
using the slot and there can still be a persistent record of a previous
mount in the slot that crashed that needs to be fenced.
All this comes together to have the server fence an old mount in a slot
while a new mount is coming up. The new mount sees the mark change and
freaks out and stops participating in quorum.
The fix is to rework the quorum blocks so that each slot only writes to
its own block. Instead of the server writing to each fenced mount's
slot, it writes a fence event to its block once all previous mounts have
been fenced. We add a bit of bookkeeping so that the server can
discover when all block leader fence operations have completed. Each
event gets its own term so we can compare events to discover live
servers.
We get rid of the write marks and instead have an event that is written
as a quorum agent starts up and is then checked on every read to make
sure it still matches.
Signed-off-by: Zach Brown <zab@versity.com>
Add a test which exercises the various reasons for fencing mounts and
checks that we reclaim the resources that they had.
Signed-off-by: Zach Brown <zab@versity.com>
The shared recovery layer outputs different messages than when it ran
only for lock_recovery in the lock server.
Signed-off-by: Zach Brown <zab@versity.com>
t_umount had a typo that had it try to unmount a mount based on a
caller's variable, which accidentally happened to work for its only
caller. Future callers would not have been so lucky.
Signed-off-by: Zach Brown <zab@versity.com>
t_trigger_arm always output the value of the trigger after arming on the
premise that tests required the trigger being armed. In the process of
showing the trigger it calls a bunch of t_ helpers that build the path
to the trigger file using statfs_more to get the rid of mounts.
If the trigger being armed is in the server's mount and the specific
trigger test is fired by the server's statfs_more request processing
then the trigger can be fired before read its value. Tests can
inconsistently fail as the golden output shows the trigger being armed
or not depending on if it was in the server's mount or not.
t_trigger_arm_silent doesn't output the value of the armed trigger. It
can be used for low level triggers that don't rely on reading the
trigger's value to discover that their effect has happened.
Signed-off-by: Zach Brown <zab@versity.com>
Tests can use t_counter_diff to put a message in their golden output
when a specific change in counters is expected. This adds
t_counter_diff_changed to output a message that indicates change or not,
for tests that want to see counters change but the amount of change
doesn't need to be precisely known.
Signed-off-by: Zach Brown <zab@versity.com>
We mask device numbers in command output to 0:0 so that we can have
consistent golden test output. The device number matching regex
responsible for this missed a few digits.
It didn't show up until we both tested enough mounts to get larger
device minor numbers and fixed multi-mount consistency so that the
affected tests didn't fail for other reasons.
Signed-off-by: Zach Brown <zab@versity.com>
Our test unmount function unmounted the device instead of the mount
point. It was written this way back in an old version of the harness
which didn't track mount points.
Now that we have mount points, we can just unmount that. This stops the
umount command from having to search through all the current mounts
looking for the mountpoint for the device it was asked to unmount.
Signed-off-by: Zach Brown <zab@versity.com>
When running in debug kernels in guests we can really bog down things
enough to trigger hrtimer warnings. I don't think there's much we can
reasonably do about that.
Signed-off-by: Zach Brown <zab@versity.com>
Add a function that tests can use to skip when the metadata device isn't
large enough. I thought we needed to avoid enospc in a particular test,
but it turns out the test's failure was unrelated. So this isn't used
for now but it seems nice to keep around.
Signed-off-by: Zach Brown <zab@versity.com>
When we had three repos the run-tests harness helped by checking
branches in kmod and utils repos to build and test. Now that we have
one repo we can just use the sibling kmod/ and utils/ dirs in the repo.
Signed-off-by: Zach Brown <zab@versity.com>
The xfstests generic/067 test is a bit of a stinker in that it's trying
to make sure a mount failes when the device is invalid. It does this
with raw mount calls without any filesystem-specific conventions. Our
mount fails, so the test passes, but not for the reason the test
assumes. It's not a great test. But we expect it to not be great and
produce this message.
Signed-off-by: Zach Brown <zab@versity.com>
Add another expected message that comes from attempting to mount an ext4
filesystem from a device that returns read errors.
Signed-off-by: Zach Brown <zab@versity.com>
Add -z option to run-tests.sh to specify metadata device.
Do a bunch of things twice.
Fix up setup-error-teardown test.
Signed-off-by: Andy Grover <agrover@versity.com>
[zab@versity.com: minor arg message fixes, golden output]
The first commit of the scoutfs-tests suite which uses multiple mounts
on one host to test multi-node scoutfs.
Signed-off-by: Zach Brown <zab@versity.com>