scoutfs

mirror of https://github.com/versity/scoutfs.git synced 2026-02-07 11:10:44 +00:00

Author	SHA1	Message	Date
Auke Kok	553e6e909e	Scratch mount test helpers. Adds basic mkfs/mount/umount helpers that handle all the basics for making, mounting and unmounting scratch devices. The mount/unmount create "$T_MSCR", which lives in "$T_TMPDIR". Signed-off-by: Auke Kok <auke.kok@versity.com>	2026-01-20 12:35:09 -08:00
Zach Brown	b468352254	Add t_wait_until_timeout Add a test helper for waiting for a command to return success which will fail the test after a timeout. Signed-off-by: Zach Brown <zab@versity.com>	2026-01-13 15:34:55 -08:00
Chris Kirby	ef0f6f8ac2	Fix race in inode-deletion test Due to an iput race, the "unlink wait for open on other mount" subtest can fail. If the unlink happens inline, then the test passes. But if the orphan scanner has to complete the unlink work, it's possible that there won't be enough log merge work for the scanner to do the cleanup before we look at the seq index. Add SCOUTFS_TRIGGER_LOG_MERGE_FORCE_FINALIZE_OURS, to allow forcing a log merge. Add new counters, log_merges_start and log_merge_complete, so that tests can see that a merge has happened. Then we have to wait for the orphan scanner to do its work. Add a new counter, orphan_scan_empty, that increments each time the scanner walks the entire inode space without finding any orphans. Once the test sees that counter increment, it should be safe to check the seq index and see that the unlinked inode is gone. Signed-off-by: Chris Kirby <ckirby@versity.com>	2026-01-07 08:29:38 -06:00
Zach Brown	5af1412d5f	Merge pull request #270 from versity/auke/bdev_autoloading Avoid block device autoloading warning.	2025-12-17 11:06:32 -08:00
Auke Kok	6c4590a8a0	Avoid block device autoloading warning. It's possible to trigger the block device autoloading mechanism with a mknod()/stat(), and this mechanism has long been declared obsolete, thus triggering a dmesg warning since el9_7, which then fails the test. You may need to `rmmod loop` to reproduce. Avoid this by not triggering a loop autoload - we just make a different blockdev. Choosing `42` here should avoid any autoload mechanism as this number is explicitly for demo drivers and should never trigger an autoload. We also just ignore the warning line in dmesg. Other tests can and might perhaps still trigger this, as well as background noise running during the test. Signed-off-by: Auke Kok <auke.kok@versity.com>	2025-12-08 13:04:58 -08:00
Auke Kok	e1a6689a9b	Include t_fail status in tap output. The tap output file was not yet complete as it failed to include the contents of `status.msg`. In a few cases, that would mean it lacks important context. Signed-off-by: Auke Kok <auke.kok@versity.com>	2025-12-04 14:09:39 -05:00
Zach Brown	fd80c17ab6	Filter out kernel message when guests are slow Ignore more kernel messages when debug guests are being slow. Signed-off-by: Zach Brown <zab@versity.com>	2025-11-13 12:43:31 -08:00
Zach Brown	991e2cbdf8	Ignore slow quorum hb transfers in tests We're getting test failures from messages that our guests can be unresponsive. They sure can be. We don't need to fail for this one specific case. Signed-off-by: Zach Brown <zab@versity.com>	2025-11-13 12:43:31 -08:00
Zach Brown	8484a58dd6	Have xfstest pass when using args The xfstests's golden output includes the full set of tests we expect to run when no args are specified. If we specify args then the set of tests can change and the test will always fail when they do. This fixes that by having the test check the set of tests itself, rather than relying on golden output. If args are specified then our xfstest only fails if any of the executed xfstest tests failed. Without args, we perform the same scraping of the check output and compare it against the expected results ourself. It would have been a bit much to put that large file inline in the test file, so we add a dir of per-test files in revision control. We can also put the list of exclusions there. We can also clean up the output redirection helper functions to make them more clear. After xfstests has executed we want to redirect output back to the compared output so that we can catch any unexpected output. Signed-off-by: Zach Brown <zab@versity.com>	2025-11-13 12:43:31 -08:00
Chris Kirby	c72bf915ae	Use ENOLINK as a special error code during forced unmount Tests such as quorum-heartbeat-timeout were failing with EIO messages in dmesg output due to expected errors during forced unmount. Use ENOLINK instead, and filter all errors from dmesg with this errno (67). Signed-off-by: Chris Kirby <ckirby@versity.com>	2025-10-22 10:58:44 -07:00
Auke Kok	377e49caf1	Properly silently kill background tasks. Occasionally, we have some tests fail because these kills produce: tests/lock-recover-invalidate.sh: line 42: 9928 Terminated Even though we expected them to be silent. In these particular cases we already don't care about this output. We borrow the silent_kill() function from orphan-inodes and promote it to t_silent_kill() in funcs/exec.sh, and then use it everywhere where appropriate. Signed-off-by: Auke Kok <auke.kok@versity.com>	2025-05-08 12:03:04 -07:00
Zach Brown	8aa1a98901	Merge pull request #210 from versity/auke/perf-irq-took-too-long Filter out perf `interrupt took too long` dmesg.	2025-04-30 10:04:00 -07:00
Auke Kok	24031cde1d	TAP formatted output. Stored as `results/scoutfs.tap`, this file contains TAP format 14 generated test results. Embedded in the output are some metadata so that these files can be aggregated and stored in a unique and deduplicating way, but using a generated UUID at the start of testing. The file itself also catches git ID, date, and kernel version, as well as the (possibly altered) test sequence used. Any test that has diff or dmesg output will be considered failed, and a copy of the relevant data is included as comments. Signed-off-by: Auke Kok <auke.kok@versity.com>	2025-04-15 12:02:41 -07:00
Auke Kok	1b47e9429e	Filter out perf `interrupt took too long` dmesg. Example: ``` [ 2469.638414] perf: interrupt took too long (2507 > 2500), lowering kernel.perf_event_max_sample_rate to 79000 ``` Signed-off-by: Auke Kok <auke.kok@versity.com>	2025-04-14 12:06:58 -07:00
Auke Kok	fc7876e844	Allow certain tests to skip, but not fail exit condition. Previously, any t_skip would cause the final test result to be a failure because up until now no test should have been skipped. However, with format-version-forward-back not being compatible with el9, we are going to rely on el7/8 testing for that test solely, and therefore we have to allow skipping of this test on el9 and newer OS versions. We add `t_skip_permitted` to signal this from the test case to the run-tests.sh script. A new exit code is passed, and all accounting is updated to reflect that a test was skipped, but this was permitted. We modify format-version-forward-back to use this new exit path. Signed-off-by: Auke Kok <auke.kok@versity.com>	2024-10-03 15:38:34 -07:00
Auke Kok	5337b9e221	Ignore Process accounting resumed dmesg. I'm seeing more and more of these as audit is enabled in el8 and el9 images I am using for testing, and during ENOSPC tests this has a chance of triggering process accounting suspension, and subsequent resume. Signed-off-by: Auke Kok <auke.kok@versity.com>	2024-10-03 15:38:34 -07:00
Auke Kok	8a22bdd366	Ignore device mapper size change dmesg output. In v1.18-10-g5507ee5, we changed the test code away from loopback to device-mapper, which simplified our DUT setup code. However, this results in the occasional `device changed size` messages now being emitted by the `dm` driver instead of the `loop` kernel module. We have to additionally ignore these kernel messages from now as well. Signed-off-by: Auke Kok <auke.kok@versity.com>	2024-10-03 15:38:34 -07:00
Auke Kok	21b5032365	Add new xfstests that we won't support or don't pass The new version of xfstests adds a _lot_ more tests to our mix. Many of the new ones will auto enable or auto skip as needed. There are tests we can't or won't support that will be in future xfstests. Disable them now so we can avoid dealing with them later. Quite a few fall into "we don't support these types of mounting yet", mostly bind-mount or dm-mapper things. We disable all the swapfile tests flatout. A few tests fail on el7 but not el8/9 but we don't have a way to run them without failing yet, so disable them as well. Update golden with the proper new array of tests. This all requires the `auke/scoutfs-el9` branch in `versity/scoutfs-xfstests-dev`. Signed-off-by: Auke Kok <auke.kok@versity.com>	2024-10-03 15:38:34 -07:00
Zach Brown	5a53e7144d	Add format-version back/forward compat test Signed-off-by: Zach Brown <zab@versity.com> Signed-off-by: Bryant G. Duffy-Ly <bduffyly@versity.com>	2024-06-28 14:53:49 -07:00
Zach Brown	a23877b150	Add fs test functions for mounted paths We have some fs functions which return info based on the test mount nr as the test has setup. This refactors those a bit to also provide some of the info when the caller has a path in a given mount. This will let tests work with scratch mounts a little more easily. Signed-off-by: Zach Brown <zab@versity.com> Signed-off-by: Bryant G. Duffy-Ly <bduffyly@versity.com>	2024-06-28 14:53:49 -07:00
Zach Brown	b552406427	Ignore spurious KASAN unwind warning KASAN could raise a spurious warning if the unwinder started in code without ORC metadata and tried to access in the KASAN stack frame redzones. This was fixed upstream but we can rarely see it in older kernels. We can ignore these messages. Signed-off-by: Zach Brown <zab@versity.com>	2023-11-21 12:25:16 -08:00
Zach Brown	2b94cd6468	Add loop module kernel message filter Now that we're not setting up per-mount loopback devices we can not have the loop module loaded until tests are running. Signed-off-by: Zach Brown <zab@versity.com>	2023-11-15 13:39:38 -08:00
Zach Brown	77fbf92968	Add t_trigger_set helper Add a helper to arm or disarm a trigger with a value argument. Signed-off-by: Zach Brown <zab@versity.com>	2023-11-07 12:12:10 -08:00
Zach Brown	bb835b948d	Merge pull request #138 from versity/auke/ignore-journald-rotate Filter out journald rotate messages.	2023-10-16 14:54:56 -07:00
Auke Kok	7ceb215c91	Filter out journald rotate messages. On el9 distros systemd-journald will log rotation events into kmesg. Since the default logs on VM images are transient only, they are rotated several times during a single test cycle, causing test failures. Signed-off-by: Auke Kok <auke.kok@versity.com>	2023-10-12 12:27:41 -04:00
Zach Brown	cf05aefe50	t_quiet appends command output The t_quiet test command execution helper was constantly truncating the quiet.log with the output of each command. It was meant to show each command and its output as they're run. Signed-off-by: Zach Brown <zab@versity.com>	2023-10-11 14:50:04 -07:00
Auke Kok	e580f33f82	Ignore loop device resizing messages. These occasionally trigger during tests. Signed-off-by: Auke Kok <auke.kok@versity.com>	2023-10-09 15:35:40 -04:00
Zach Brown	05371b83f0	Update expected console messages during testing Signed-off-by: Zach Brown <zab@versity.com>	2023-06-16 09:37:37 -07:00
Zach Brown	e52435b993	Add t_mount_opt Add a test helper that mounts with a mount option. Signed-off-by: Zach Brown <zab@versity.com>	2023-05-22 16:30:01 -07:00
Zach Brown	904c5dce90	Filter forced unmount transaction commit error Add a transaction commit error message to the set of errors we ignore when triggering forced unmount. Signed-off-by: Zach Brown <zab@versity.com>	2023-05-18 15:50:34 -07:00
Zach Brown	57c6d78df8	Add test of quorum heartbeat timeout setting Signed-off-by: Zach Brown <zab@versity.com>	2023-05-18 15:50:33 -07:00
Zach Brown	74e9d0f764	Silence test sysfs option failure If setting a sysfs option fails the bash write error is output. It contains the script line number which can fail over time, leading to mismatched golden output failures if we used the output as an expected indication of failure. Callers should test its rc and output accordingly if they want the failure logged and compared. Signed-off-by: Zach Brown <zab@versity.com>	2023-05-18 11:15:28 -07:00
Zach Brown	98eb0eb649	Add t_quorum_nrs test helper Add a quick function that outputs the fs numbers of the quorum mounts. Signed-off-by: Zach Brown <zab@versity.com>	2023-05-18 11:15:28 -07:00
Zach Brown	6ded240089	Add t_rc test execution helper function Add a quick wrapper to run commands whose output is saved while only echoing their return code. Signed-off-by: Zach Brown <zab@versity.com>	2023-04-17 12:47:50 -07:00
Zach Brown	41174867ed	Add t_get_sysfs_mount_option test func Add a quick little function to get the value of a mount option. Signed-off-by: Zach Brown <zab@versity.com>	2022-12-02 12:28:13 -08:00
Zach Brown	d5ddf1ecac	Fix option save/restore test helpers The test shell helpers for saving and restoring mount options were trying to put each mount's option value in an array. It meant to build the array key by concatenating the option name and the mount number. But it didn't isolate the option "name" variable when evaluating it, instead always evaluating "name_" to nothing and building keys for all options that only contained the mount index. This then broke when tests attempted to save and restore multiple options. Signed-off-by: Zach Brown <zab@versity.com>	2022-10-17 09:12:21 -07:00
Zach Brown	875583b7ef	Add t_fs_is_leader test helper The t_server_nr and t_first_client_nr helpers iterated over all the fs numbers examining their quorum/is_leader files, but clients don't have a quorum/ directory. This was causing spurious outputs in tests that were looking for servers but didn't find it in the first quorum fs number and made it down into the clients. Give them a helper that returns 0 for being a leader if the quorum/ dir doesn't exist. Signed-off-by: Zach Brown <zab@versity.com>	2022-03-15 16:09:55 -07:00
Zach Brown	cd23cc61ca	Add mount option test bash functions Add some test functions which work with mount options. Signed-off-by: Zach Brown <zab@versity.com>	2022-03-10 11:43:11 -08:00
Zach Brown	b2834d3c28	Add basic bad mount testing Add some tests which exercise the kinds of reasonable mistakes that people will make in the field. Signed-off-by: Zach Brown <zab@versity.com>	2022-02-21 10:44:38 -08:00
Bryant G. Duffy-Ly	38ee2defd5	Add a filter for forced unmount error output [85164.299902] scoutfs f.8c19e1.r.facf2e error: server error writing btree blocks: -5 [144308.589596] scoutfs f.c9397a.r.8ae97f error: server error -5 freeing merged btree blocks: looping commit del/upd freeing item [174646.005596] scoutfs f.15f0b3.r.1862df error: server error -5 freeing merged btree blocks: final commit del/upd freeing item [146653.893676] scoutfs f.c7f188.r.34e23c error: server error writing super block: -5 [273218.436675] scoutfs f.dd4157.r.f0da7e error: server failed to bind to 127.0.0.1:42002, err -98 [376832.542823] scoutfs f.049985.r.1a8987 error: error -5 reading quorum block 19 to update event 1 term 3 The above is an example output that will be filtered out Signed-off-by: Bryant G. Duffy-Ly <bduffyly@versity.com>	2021-11-08 07:36:02 -06:00
Zach Brown	e4dca8ddcc	Don't shutdown quorum if server startup fails The quorum service shuts down if it sees errors that mean that it can't do its job. This is mostly fatal errors gathering resources at startup or runtime IO errors but it was also shutting down if server startup fails. That's not quite right. This should be treated like the server shutting down on errors. Quorum needs to stay around to participate in electing the next server. Fence timeouts could trigger this. A quorum mount could crash, the next server without a fence script could have a fence request timeout and shutdown, and now the third remaining server is left to indefinitely send vote requests into the void. With this fixed, continuing that example, the quorum service in the second mount remains to elect the third server with a working fence script after the second server shuts down after its fence request times out. Signed-off-by: Zach Brown <zab@versity.com>	2021-07-30 11:34:52 -07:00
Zach Brown	24d682bf81	Add orphan-inodes test Signed-off-by: Zach Brown <zab@versity.com>	2021-07-02 10:54:56 -07:00
Zach Brown	38a4a56741	Stop writing to other quorum slot blocks The core quorum work loop assumes that it has exclusive access to its slot's quorum block. It uniquely marks blocks it writes and verifies the marks on read to discover if another mount has written to its slot under the assumption that this must be a configuration error that put two mounts in the same slot. But the design of the leader bit in the block violates the invariant that only a slot will write to its block. As the server comes up and fences previous leaders it writes to their block to clear their leader bit. The final hole in the design is that because we're fencing mounts, not slots, each slot can have two mounts in play. An active mount can be using the slot and there can still be a persistent record of a previous mount in the slot that crashed that needs to be fenced. All this comes together to have the server fence an old mount in a slot while a new mount is coming up. The new mount sees the mark change and freaks out and stops participating in quorum. The fix is to rework the quorum blocks so that each slot only writes to its own block. Instead of the server writing to each fenced mount's slot, it writes a fence event to its block once all previous mounts have been fenced. We add a bit of bookkeeping so that the server can discover when all block leader fence operations have completed. Each event gets its own term so we can compare events to discover live servers. We get rid of the write marks and instead have an event that is written as a quorum agent starts up and is then checked on every read to make sure it still matches. Signed-off-by: Zach Brown <zab@versity.com>	2021-05-31 13:10:45 -07:00
Zach Brown	a972e42fba	Update dmesg filters for fencing and reclaim Add regexes for the messages that come from fencing and reclaiming resources from fenced mounts. Signed-off-by: Zach Brown <zab@versity.com>	2021-05-26 14:18:28 -07:00
Zach Brown	8b78f701a1	Add fence-and-reclaim test Add a test which exercises the various reasons for fencing mounts and checks that we reclaim the resources that they had. Signed-off-by: Zach Brown <zab@versity.com>	2021-05-26 14:18:28 -07:00
Zach Brown	ba8bf13ae1	Update dmesg whitelist for recovery The shared recovery layer outputs different messages than when it ran only for lock_recovery in the lock server. Signed-off-by: Zach Brown <zab@versity.com>	2021-04-21 12:17:33 -07:00
Zach Brown	dba88705f7	Fix t_umount mount point number t_umount had a typo that had it try to unmount a mount based on a caller's variable, which accidentally happened to work for its only caller. Future callers would not have been so lucky. Signed-off-by: Zach Brown <zab@versity.com>	2021-04-21 12:17:33 -07:00
Zach Brown	12fa289399	Add t_trigger_arm_silent t_trigger_arm always output the value of the trigger after arming on the premise that tests required the trigger being armed. In the process of showing the trigger it calls a bunch of t_ helpers that build the path to the trigger file using statfs_more to get the rid of mounts. If the trigger being armed is in the server's mount and the specific trigger test is fired by the server's statfs_more request processing then the trigger can be fired before we read its value. Tests can inconsistently fail as the golden output shows the trigger being armed or not depending on if it was in the server's mount or not. t_trigger_arm_silent doesn't output the value of the armed trigger. It can be used for low level triggers that don't rely on reading the trigger's value to discover that their effect has happened. Signed-off-by: Zach Brown <zab@versity.com>	2021-03-10 12:36:34 -08:00
Zach Brown	75e8fab57c	Add t_counter_diff_changed Tests can use t_counter_diff to put a message in their golden output when a specific change in counters is expected. This adds t_counter_diff_changed to output a message that indicates change or not, for tests that want to see counters change but the amount of change doesn't need to be precisely known. Signed-off-by: Zach Brown <zab@versity.com>	2021-03-10 12:32:04 -08:00
Zach Brown	7421bd1861	Filter all test device digits to 0 We mask device numbers in command output to 0:0 so that we can have consistent golden test output. The device number matching regex responsible for this missed a few digits. It didn't show up until we both tested enough mounts to get larger device minor numbers and fixed multi-mount consistency so that the affected tests didn't fail for other reasons. Signed-off-by: Zach Brown <zab@versity.com>	2021-02-22 13:28:38 -08:00
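
Several commits above add polling-style test helpers, such as t_wait_until_timeout in b468352254 ("waiting for a command to return success which will fail the test after a timeout"). The helper's source is not shown in this listing, so the following is only a minimal bash sketch of that general idea. The function name `wait_until_timeout`, its argument order, and the one-second poll interval are assumptions made for illustration, and it returns non-zero where the real helper would presumably fail the test.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the scoutfs implementation of t_wait_until_timeout.
# Usage: wait_until_timeout <seconds> <command> [args...]
# Re-runs the command until it succeeds or <seconds> have elapsed.
wait_until_timeout() {
    local timeout="$1"
    shift
    # SECONDS is a bash builtin counting seconds since the shell started.
    local deadline=$(( SECONDS + timeout ))

    # Keep retrying while the command returns a non-zero status.
    until "$@" >/dev/null 2>&1; do
        if (( SECONDS >= deadline )); then
            echo "timed out after ${timeout}s waiting for: $*" >&2
            return 1
        fi
        sleep 1
    done
    return 0
}
```

A caller might, for example, run `wait_until_timeout 10 test -e "$T_TMPDIR/ready"` to wait for a marker file, where the `ready` file name is made up for this example. Returning a status rather than exiting keeps the sketch usable outside a test harness.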


66 Commits