Commit Graph

83 Commits

Author SHA1 Message Date
Zach Brown
6d0694f1b0 Add resize_devices ioctl and scoutfs command
Add a scoutfs command that uses an ioctl to send a request to the server
to safely use a device that has grown.

Signed-off-by: Zach Brown <zab@versity.com>
2021-07-30 13:26:32 -07:00
Zach Brown
a9baeab22e stage_tmpfile test gets current data_version
The stage_tmpfile test util was written when fallocate didn't update
data_version for size extensions.  It is more correct to get the
data_version after fallocate changes data_versions for however many
transactions, extent allocations, and i_size extensions it took to
allocate space.

Signed-off-by: Zach Brown <zab@versity.com>
2021-07-30 13:22:42 -07:00
Zach Brown
e4dca8ddcc Don't shutdown quorum if server startup fails
The quorum service shuts down if it sees errors that mean that it can't
do its job.

This is mostly fatal errors gathering resources at startup or runtime IO
errors but it was also shutting down if server startup fails.   That's
not quite right.  This should be treated like the server shutting down
on errors.  Quorum needs to stay around to participate in electing the
next server.

Fence timeouts could trigger this.   A quorum mount could crash, the
next server without a fence script could have a fence request timeout
and shutdown, and now the third remaining server is left to indefinitely
send vote requests into the void.

With this fixed, continuing that example, the quorum service in the
second mount remains to elect the third server with a working fence
script after the second server shuts down after its fence request times
out.

Signed-off-by: Zach Brown <zab@versity.com>
2021-07-30 11:34:52 -07:00
Zach Brown
120c2d342a Add create_xattr_loop test tool
Add a quick tool that creates xattrs in a tight loop.

Signed-off-by: Zach Brown <zab@versity.com>
2021-07-07 14:13:14 -07:00
Zach Brown
73bf916182 Return ENOSPC as space gets low
Returning ENOSPC is challenging because we have clients working on
allocators which are a fraction of the whole and we use COW transactions
so we need to be able to allocate to free.  This adds support for
returning ENOSPC to client posix allocators as free space gets low.

For metadata, we reserve a number of free blocks for making progress
with client and server transactions which can free space.  The server
sets the low flag in a client's allocator if we start to dip into
reserved blocks.  In the client we add an argument to entering a
transaction which indicates if we're allocating new space (as opposed to
just modifying existing data or freeing).  When an allocating
transaction runs low and the server low flag is set then we return
ENOSPC.

Adding an argument to transaciton holders and having it return ENOSPC
gave us the opportunity to clean it up and make it a little clearer.
More work is done outside the wait_event function and it now
specifically waits for a transaction to cycle when it forces a commit
rather than spinning until the transaction worker acquires the lock and
stops it.

For data the same pattern applies except there are no reserved blocks
and we don't COW data so it's a simple case of returning the hard ENOSPC
when the data allocator flag is set.

The server needs to consider the reserved count when refilling the
client's meta_avail allocator and when swapping between the two
meta_avail and meta_free allocators.

We add the reserved metadata block count to statfs_more so that df can
subtract it from the free meta blocks and make it clear when enospc is
going to be returned for metadata allocations.

We increase the minimum device size in mkfs so that small testing
devices provide sufficient reserved blocks.

And finally we add a little test that makes sure we can fill both
metadata and data to ENOSPC and then recover by deleting what we filled.

Signed-off-by: Zach Brown <zab@versity.com>
2021-07-07 14:13:14 -07:00
Zach Brown
24d682bf81 Add orphan-inodes test
Signed-off-by: Zach Brown <zab@versity.com>
2021-07-02 10:54:56 -07:00
Zach Brown
38a4a56741 Stop writing to other quorum slot blocks
The core quorum work loop assumes that it has exclusive access to its
slot's quorum block.  It uniquely marks blocks it writes and verifies
the marks on read to discover if another mount has written to its slot
under the assumption that this must be a configuration error that put
two mounts in the same slot.

But the design of the leader bit in the block violates the invariant
that only a slot will write to its block.   As the server comes up and
fences previous leaders it writes to their block to clear their leader
bit.

The final hole in the design is that because we're fencing mounts, not
slots, each slot can have two mounts in play.  An active mount can be
using the slot and there can still be a persistent record of a previous
mount in the slot that crashed that needs to be fenced.

All this comes together to have the server fence an old mount in a slot
while a new mount is coming up.  The new mount sees the mark change and
freaks out and stops participating in quorum.

The fix is to rework the quorum blocks so that each slot only writes to
its own block.  Instead of the server writing to each fenced mount's
slot, it writes a fence event to its block once all previous mounts have
been fenced.  We add a bit of bookkeeping so that the server can
discover when all block leader fence operations have completed.  Each
event gets its own term so we can compare events to discover live
servers.

We get rid of the write marks and instead have an event that is written
as a quorum agent starts up and is then checked on every read to make
sure it still matches.

Signed-off-by: Zach Brown <zab@versity.com>
2021-05-31 13:10:45 -07:00
Zach Brown
a972e42fba Update dmesg filters for fencing and reclaim
Add regexes for the messages that come from fencing and reclaiming
resources from fenced mounts.

Signed-off-by: Zach Brown <zab@versity.com>
2021-05-26 14:18:28 -07:00
Zach Brown
aad2d3db59 Add stage_tmpfile to .gitignore
We missed adding this newly added binary to .gitignore.

Signed-off-by: Zach Brown <zab@versity.com>
2021-05-26 14:18:28 -07:00
Zach Brown
6663034295 Run the fence agent in the background of tests
Signed-off-by: Zach Brown <zab@versity.com>
2021-05-26 14:18:28 -07:00
Zach Brown
8b78f701a1 Add fence-and-reclaim test
Add a test which exercises the various reasons for fencing mounts and
checks that we reclaim the resources that they had.

Signed-off-by: Zach Brown <zab@versity.com>
2021-05-26 14:18:28 -07:00
Zach Brown
ef440ead28 Add -z to run-test for data-alloc-zone-blocks
Add an option to run-tests which gets passed through to the
data-alloc-zone-blocks argument for mkfs.

Signed-off-by: Zach Brown <zab@versity.com>
2021-05-21 15:31:02 -07:00
Zach Brown
5231cf4034 Add export-lookup-evict-race test
Add a test that creates races between fh_to_dentry and eviction
triggered by lock invalidation.

Signed-off-by: Zach Brown <zab@versity.com>
2021-04-28 12:11:06 -07:00
Zach Brown
1b4e60cae4 Add mkdir-rename-rmdir test
Add a test which performs mkdir, two renames of the dir, and rmdir on
all possible combinations of mounts.

Signed-off-by: Zach Brown <zab@versity.com>
2021-04-27 12:01:43 -07:00
Zach Brown
ba8bf13ae1 Update dmesg whitelist for recovery
The shared recovery layer outputs different messages than when it ran
only for lock_recovery in the lock server.

Signed-off-by: Zach Brown <zab@versity.com>
2021-04-21 12:17:33 -07:00
Zach Brown
dba88705f7 Fix t_umount mount point number
t_umount had a typo that had it try to unmount a mount based on a
caller's variable, which accidentally happened to work for its only
caller.  Future callers would not have been so lucky.

Signed-off-by: Zach Brown <zab@versity.com>
2021-04-21 12:17:33 -07:00
Zach Brown
b244b2d59c Add inode-deletion test
Signed-off-by: Zach Brown <zab@versity.com>
2021-04-21 12:17:33 -07:00
Andy Grover
0deb232d3f Support O_TMPFILE and allow MOVE_BLOCKS into released extents
Support O_TMPFILE: Create an unlinked file and put it on the orphan list.
If it ever gains a link, take it off the orphan list.

Change MOVE_BLOCKS ioctl to allow moving blocks into offline extent ranges.
Ioctl callers must set a new flag to enable this operation mode.

RH-compat: tmpfile support it actually backported by RH into 3.10 kernel.
We need to use some of their kabi-maintaining wrappers to use it:
use a struct inode_operations_wrapper instead of base struct
inode_operations, set S_IOPS_WRAPPER flag in i_flags. This lets
RH's modified vfs_tmpfile() find our tmpfile fn pointer.

Add a test that tests both creating tmpfiles as well as moving their
contents into a destination file via MOVE_BLOCKS.

xfstests common/004 now runs because tmpfile is supported.

Signed-off-by: Andy Grover <agrover@versity.com>
2021-04-05 14:23:44 -07:00
Zach Brown
5661a1fb02 Fix block-stale-reads test
The block-stale-reads test was built from the ashes of a test that
used counters and triggers to work with the btree when it was
only used on the server.

The initial quick translation to try and trigger block cache retries
while the forest called the btree got so much wrong.  It was still
trying to use some 'cl' variable that didn't refer to the client any
more, the trigger helpers now call statfs to find paths and can end up
triggering themselves. and many more counters stale reads can happen
throughout the system while we're working -- not just one from our
trigger.

This fixes it up to consistently use fs numbers instead of
the silly stale cl variable and be less sensitive to triggers firing and
counter differences.

Signed-off-by: Zach Brown <zab@versity.com>
2021-03-10 12:36:41 -08:00
Zach Brown
12fa289399 Add t_trigger_arm_silent
t_trigger_arm always output the value of the trigger after arming on the
premise that tests required the trigger being armed.  In the process of
showing the trigger it calls a bunch of t_ helpers that build the path
to the trigger file using statfs_more to get the rid of mounts.

If the trigger being armed is in the server's mount and the specific
trigger test is fired by the server's statfs_more request processing
then the trigger can be fired before read its value.  Tests can
inconsistently fail as the golden output shows the trigger being armed
or not depending on if it was in the server's mount or not.

t_trigger_arm_silent doesn't output the value of the armed trigger.  It
can be used for low level triggers that don't rely on reading the
trigger's value to discover that their effect has happened.

Signed-off-by: Zach Brown <zab@versity.com>
2021-03-10 12:36:34 -08:00
Zach Brown
75e8fab57c Add t_counter_diff_changed
Tests can use t_counter_diff to put a message in their golden output
when a specific change in counters is expected.  This adds
t_counter_diff_changed to output a message that indicates change or not,
for tests that want to see counters change but the amount of change
doesn't need to be precisely known.

Signed-off-by: Zach Brown <zab@versity.com>
2021-03-10 12:32:04 -08:00
Zach Brown
208c51d1d2 Update stale block reading test
The previous test that triggered re-reading blocks, as though they were
stale, was written in the era where it only hit btree blocks and
everything else was stored in LSM segments.

This reworks the test to make it clear that it affects all our block
readers today.  The test only exercise the core read retry path, but it
could be expanded to test callers retrying with newer references after
they get -ESTALE errors.

Signed-off-by: Zach Brown <zab@versity.com>
2021-03-01 09:50:00 -08:00
Zach Brown
f6f72e7eae Resume running the mount-unmount-race test
The recent quorum and unmount fixes should have addressed the failures
we were seeing in the mount-unmount-race test.

Signed-off-by: Zach Brown <zab@versity.com>
2021-02-22 13:28:38 -08:00
Zach Brown
7421bd1861 Filter all test device digits to 0
We mask device numbers in command output to 0:0 so that we can have
consistent golden test output.  The device number matching regex
responsible for this missed a few digits.

It didn't show up until we both tested enough mounts to get larger
device minor numbers and fixed multi-mount consistency so that the
affected tests didn't fail for other reasons.

Signed-off-by: Zach Brown <zab@versity.com>
2021-02-22 13:28:38 -08:00
Zach Brown
1db6f8194d Update xfstests to use quorum slot options
Signed-off-by: Zach Brown <zab@versity.com>
2021-02-22 13:28:38 -08:00
Zach Brown
2de7692336 Unmount mount point, not device
Our test unmount function unmounted the device instead of the mount
point.  It was written this way back in an old version of the harness
which didn't track mount points.

Now that we have mount points, we can just unmount that.  This stops the
umount command from having to search through all the current mounts
looking for the mountpoint for the device it was asked to unmount.

Signed-off-by: Zach Brown <zab@versity.com>
2021-02-22 13:28:38 -08:00
Zach Brown
8c1d96898a Log wait failure in mount-unmount-race test
I got a test failure where waiting returned an error, but it wasn't
clear what the error was or where it might have come from.  Add more
logging so that we learn more about what might have gone wrong.

Signed-off-by: Zach Brown <zab@versity.com>
2021-02-22 13:28:38 -08:00
Zach Brown
dbb716f1bb Update tests for quorum slots
Update the tests to deal with the mkfs and mount changes for the
specifically configured quorum slots.

Signed-off-by: Zach Brown <zab@versity.com>
2021-02-22 13:28:38 -08:00
Zach Brown
6ad18769cb Disable mount-unmount-race test
The mount-unmount-race test is occasionally hanging, disable it while we
debug it and have test coverage for unrelated work.

Signed-off-by: Zach Brown <zab@versity.com>
2021-02-01 10:07:47 -08:00
Zach Brown
5a90234c94 Use terminated test name when saving passed stats
We've grown some test names that are prefixes of others
(createmany-parallel, createmany-parallel-mounts).  When we're searching
for lines with the test name we have to search for the exact test name,
by terminating the name with a space, instead of searching for a line
that starts with the test name.

This fixes strange output and saved passed stats for the names that
share a prefix.

Signed-off-by: Zach Brown <zab@versity.com>
2021-01-26 14:46:07 -08:00
Zach Brown
f81e4cb98a Add whitespace to xfstests output message
The message indicating that xfstests output was now being shown was
mashed up against the previous passed stats and it was gross and I hated
it.

Signed-off-by: Zach Brown <zab@versity.com>
2021-01-26 14:46:07 -08:00
Zach Brown
1fc706bf3f Filter hrtimer slow messages from dmesg
When running in debug kernels in guests we can really bog down things
enough to trigger hrtimer warnings.  I don't think there's much we can
reasonably do about that.

Signed-off-by: Zach Brown <zab@versity.com>
2021-01-26 14:46:07 -08:00
Zach Brown
35ed1a2438 Add t_require_meta_size function
Add a function that tests can use to skip when the metadata device isn't
large enough.  I thought we needed to avoid enospc in a particular test,
but it turns out the test's failure was unrelated.  So this isn't used
for now but it seems nice to keep around.

Signed-off-by: Zach Brown <zab@versity.com>
2021-01-26 14:46:07 -08:00
Zach Brown
8123b8fc35 fix lock-conflicting-batch-commit conf output
The test had a silly typo in the label it put on the time it took mounts
to perform conflicting metadata changes.

Signed-off-by: Zach Brown <zab@versity.com>
2021-01-26 14:46:07 -08:00
Zach Brown
7a96537210 Leave mounts mounted if run-tests fails
We can lose interesting state if the mounts are unmounted as tests fail,
only unmount if all the tests pass.

Signed-off-by: Zach Brown <zab@versity.com>
2021-01-26 14:46:07 -08:00
Zach Brown
0607dfdac8 Enable and collect trace_printk
Weirdly, run-tests was treating trace_printk not as an option to enable
trace_printk() traces but as an option to print trace events to the
console with printk?  That's not a thing.

Make -P really enable trace_printk tracing and collect it as it would
enabled trace events.  It needs to be treated seperately from the -t
options that enable trace events.

While we're at it treat the -P trace dumping option as a stand-alone
option that works without -t arguments.

Signed-off-by: Zach Brown <zab@versity.com>
2021-01-26 14:46:07 -08:00
Zach Brown
0354bb64c5 More carefully enable tracing in run-tests
run-tests.sh has a -t argument which takes a whitespace seperated string
of globs of events to enable.  This was hard to use and made it very
easy to accidentally expand the globs at the wrong place in the script.

This makes each -t argument specify a single word glob which is stored
in an array so the glob isn't expanded until it's applied to the trace
event path.   We also add an error for -t globs that didn't match any
events and add a message with the count of -t arguments and enabled
events.

Signed-off-by: Zach Brown <zab@versity.com>
2021-01-26 14:46:07 -08:00
Zach Brown
47a1ac92f7 Update ino-path args in basic-posix-consistency
The ino-path calls in basic-posix-consistency weren't updated for the
recent change to scoutfs cli args.

Signed-off-by: Zach Brown <zab@versity.com>
2021-01-26 14:45:23 -08:00
Zach Brown
004f693af3 Add golden output for mount-unmount-race test
Signed-off-by: Zach Brown <zab@versity.com>
2021-01-25 14:19:35 -08:00
Zach Brown
773eb129ed Add move-blocks test
Add a basic test of the move_blocks ioctl.

Signed-off-by: Zach Brown <zab@versity.com>
2021-01-14 13:42:22 -08:00
Andy Grover
299062a456 Fix mkfs check for existing ScoutFS superblock
We were checking for the wrong magic value.

We now need to use -f when running mkfs in run-tests for things to work.

Signed-off-by: Andy Grover <agrover@versity.com>
2021-01-13 16:32:41 -08:00
Andy Grover
454dbebf59 Categorize not enough mounts as skip, not fail
Signed-off-by: Andy Grover <agrover@versity.com>
2021-01-12 16:29:42 -08:00
Andy Grover
64a698aa93 Make changes to tests for new scoutfs cmdline syntax
Some different error message require changes to golden/*

Signed-off-by: Andy Grover <agrover@versity.com>
2021-01-12 16:29:42 -08:00
Zach Brown
511cb04330 Add stage-mulit-part test
Add a test which stages a file in multiple parts while a long-lived
process is blocking on offline extents trying to compare the file to the
known contents.

Signed-off-by: Zach Brown <zab@versity.com>
2020-12-15 15:13:42 -08:00
Zach Brown
eb22425bad Update tests/ README
The README in tests/ had gone a bit stale.  While it was originally
written to be a README.md displayed in the github repo, we can
still use it in place as a quick introduction to the tests.

Signed-off-by: Zach Brown <zab@versity.com>
2020-12-07 10:39:20 -08:00
Zach Brown
6415814f92 Use kmod and utils subdirs instead of repos
When we had three repos the run-tests harness helped by checking
branches in kmod and utils repos to build and test.  Now that we have
one repo we can just use the sibling kmod/ and utils/ dirs in the repo.

Signed-off-by: Zach Brown <zab@versity.com>
2020-12-07 10:39:20 -08:00
Zach Brown
14530471c4 scoutfs-tests: add srch-basic-functionality
Add basic functional testing of finding inodes by their xattrs.

Signed-off-by: Zach Brown <zab@versity.com>
2020-12-03 13:40:33 -08:00
Zach Brown
88aefc381a scoutfs-tests: add find_xattrs
Add a utility that mimics our search_xattrs ioctl with directory entry
walking and fgetxattr as efficiently as it can so we can use it to test
large file populations.

Signed-off-by: Zach Brown <zab@versity.com>
2020-12-03 13:40:33 -08:00
Zach Brown
8982750266 scoutfs-tests: bulk create more clearly sets xattr
Just set the value using a single char, this messed up and set the size
of the pointer.

Signed-off-by: Zach Brown <zab@versity.com>
2020-12-03 13:40:33 -08:00
Zach Brown
7b2310442b scoutfs-tests: add createmany-rename-large-dir
Add a test that randomly renames entries in a single large directory.
This has caught bugs in the reservation of allocator resources for
client transactions.

Signed-off-by: Zach Brown <zab@versity.com>
2020-12-02 09:23:15 -08:00