This test case detects and reproduces a customer issue we're seeing
where the new .get_acl() method API and underlying changes in el9_6+
cause ACL cache fetching to return inconsistent results, which shows
up as missing ACLs on directories.
This particular sequence reproduces consistently enough to warrant a
dedicated test.
Signed-off-by: Auke Kok <auke.kok@versity.com>
Adds basic mkfs/mount/umount helpers that handle making, mounting,
and unmounting scratch devices. The mount/unmount helpers create
"$T_MSCR", which lives in "$T_TMPDIR".
Signed-off-by: Auke Kok <auke.kok@versity.com>
There have been a few failures where output was generated just before
we crashed but didn't have a chance to be written. Add a best-effort
background sync before crashing. There's a good chance it'll hang if
the system is stuck, so we don't wait for it directly, we just let
some time pass.
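Roughly, and assuming the crash is triggered via sysrq (the exact
delay and trigger mechanism are assumptions here):

```
# best-effort: start a background sync but don't wait on it, it may
# hang if the system is already stuck
sync &
sleep 5                         # just let some time pass
echo c > /proc/sysrq-trigger    # then crash
```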
Signed-off-by: Zach Brown <zab@versity.com>
Errors from lock server calls typically shut the server down.
During a normal unmount a client's locks are reclaimed before the
connection is disconnected, so the lock server won't try to send to
unmounting clients.
Clients whose connections time out can cause ENOTCONN errors. Their
connection is freed before they're fenced and their locks are
reclaimed. The server can try to send to such a disconnected client
on behalf of a lock and get a send error.
These errors shouldn't shut down the server. The client is either
going to be fenced and have its locks reclaimed, ensuring forward
progress, or the server is going to shut down if it can't fence.
This was seen in testing as multiple clients were timed out.
Signed-off-by: Zach Brown <zab@versity.com>
The fence-and-reclaim test runs a bunch of scenarios and makes sure that
the fence agent was run on the appropriate mount's rids.
Unfortunately the checks were racy. The check itself only looked at
the log once to see if the rid had been fenced. Each check was
preceded by steps that were supposed to wait until the rid should
have been fenced and could be checked.
Those steps were racy. They'd do things like make sure a fence request
wasn't pending, but never waited for it to be created in the first
place. They'd falsely indicate that the log should be checked, and
when the rid wasn't found in the log the test would fail. In logs of
failures we'd see that the rids were fenced after this test had failed
and moved on to the next.
This simplifies the checks. It gets rid of all the intermediate steps
and just waits, with a timeout, for the rid to be fenced. This avoids
the flaky checks.
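The simplified check boils down to a wait loop along these lines (the
log path, message format, and t_fail helper are illustrative, not the
test's actual names):

```
wait_for_fenced_rid() {
        local rid="$1" timeout="${2:-60}"
        local deadline=$((SECONDS + timeout))

        while ! grep -q "fenced rid $rid" "$T_TMPDIR/fence-agent.log"; do
                if (( SECONDS >= deadline )); then
                        t_fail "rid $rid was not fenced within ${timeout}s"
                fi
                sleep 1
        done
}
```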
Signed-off-by: Zach Brown <zab@versity.com>
Lock invalidation has assertions for critical errors, but it doesn't
allow the synthetic errors that come from forced unmount severing the
client's connection to the world.
Signed-off-by: Zach Brown <zab@versity.com>
The net layer was initially built around send queue lists with the
presumption that there wouldn't be many messages in flight and that
responses would be sent roughly in order.
In the modern era, we can have tens of thousands of lock request
messages in flight. This led to O(n^2) processing in quite a few
places as recv processing searched for either requests to complete or
responses to free.
This adds messages to two rbtrees, indexing requests by their id and
responses by their send sequence, so recv processing can find messages
in O(log n). This patch is intended to be minimally disruptive: it
only replaces the search of the send and resend queues in the recv
path with rbtrees. Other uses of the two queue lists are untouched.
On a single node, with ~40k lock shrink attempts in flight, we go from
processing ~800 total request/grant request/response pairs per second to
~60,000 per second.
Signed-off-by: Zach Brown <zab@versity.com>
The use of the VM shrinker was a bad fit for locks. Shrinking a lock
requires a round trip with the server to request a null mode. The VM
treats the locks like a cache, as expected, which leads to huge
numbers of locks accumulating and then being shrunk in bulk. This
creates a huge backlog of locks making their way through the network
conversation with the server that implements invalidating to a null
mode and freeing. It starves other network and lock processing,
possibly for minutes.
This removes the VM shrinker and instead introduces an option that
sets a limit on the number of idle locks. Once the number of locks
exceeds that limit, we only try to free a single oldest lock at each
lock call. This results in a lock freeing pace that is proportional
to the allocation of new locks by callers and so is throttled by the
work done while callers hold locks. It avoids the bulk shrinking of
tens of thousands of locks that we see in the field.
Signed-off-by: Zach Brown <zab@versity.com>
We observe that unmount in this test can consume up to 10sec before
the test proceeds to record heartbeat timeout elections by followers.
When this happens, elections and new leaders happen before unmount
even completes. This indicates that heartbeat packets from the
unmounting mount cease immediately, but the unmount takes longer doing
other things. The timeouts then trigger, possibly during the unmount.
The result is that with timeouts of 3 seconds, we're not actually
waiting for an election at all. It already happened 7 seconds ago. The
code here just "sees" that it happens a few hundred ms after it started
looking for it.
There are a few ways to go about this fix. We could record the actual
timestamp of the election and compare it with the actual timestamp of
the last heartbeat packet. This would be conclusive, and could
disregard any complication from umount taking too long. But it also
means adding timestamping in various places, or having to rely on
tcpdump with packet processing.
We can't just record $start before unmount, either. We would still
violate the part of the test that checks that elections didn't happen
too late, especially in the 3sec test case if unmount takes 10sec.
The simplest solution is to unmount in a bg thread, and circle around
later to `wait` for it to ensure we can re-mount without ill effect.
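The shape of the fix, roughly (variable names are placeholders):

```
umount "$mnt" &
umount_pid=$!

start=$SECONDS
# ... watch for the heartbeat timeout election relative to $start ...

wait "$umount_pid"      # only re-mount once the unmount really finished
```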
Signed-off-by: Auke Kok <auke.kok@versity.com>
Due to an iput race, the "unlink wait for open on other mount"
subtest can fail. If the unlink happens inline, then the test
passes. But if the orphan scanner has to complete the unlink
work, it's possible that there won't be enough log merge work
for the scanner to do the cleanup before we look at the seq index.
Add SCOUTFS_TRIGGER_LOG_MERGE_FORCE_FINALIZE_OURS, to allow
forcing a log merge. Add new counters, log_merges_start and
log_merge_complete, so that tests can see that a merge has happened.
Then we have to wait for the orphan scanner to do its work.
Add a new counter, orphan_scan_empty, that increments each time
the scanner walks the entire inode space without finding any
orphans. Once the test sees that counter increment, it should be
safe to check the seq index and see that the unlinked inode is gone.
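In the test this turns into waiting for the new counters to tick over,
roughly like this (t_trigger and t_counter are hypothetical helpers
for arming triggers and reading counters):

```
merges=$(t_counter log_merge_complete)
t_trigger SCOUTFS_TRIGGER_LOG_MERGE_FORCE_FINALIZE_OURS
while [ "$(t_counter log_merge_complete)" -le "$merges" ]; do
        sleep 1
done

empties=$(t_counter orphan_scan_empty)
while [ "$(t_counter orphan_scan_empty)" -le "$empties" ]; do
        sleep 1
done
# now it should be safe to check the seq index for the unlinked inode
```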
Signed-off-by: Chris Kirby <ckirby@versity.com>
The /sys/kernel/debug/tracing/buffer_size_kb file always reads as
"7 (expanded: 1408)". So the -T option to run-test.sh won't work,
because it tries to multiply that string by the given factor.
It always defaults to 1408 on every platform we currently support.
Just use that value so we can specify -T in CI runs.
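That makes -T a simple multiplication of the known default
(T_TRACE_FACTOR is illustrative):

```
# buffer_size_kb reads back as "7 (expanded: 1408)", so scale the known
# 1408 KB default rather than parsing the file
trace_kb=$(( 1408 * T_TRACE_FACTOR ))
echo "$trace_kb" > /sys/kernel/debug/tracing/buffer_size_kb
```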
Signed-off-by: Chris Kirby <ckirby@versity.com>
A few callers of alloc_move_empty in the server were providing a budget
that was too small. Recent changes to extent_mod_blocks increased the
max budget that is necessary to move extents between btrees. The
existing WAG of 100 was too small for trees of height 2 and 3. This
caused looping in production.
We can increase the move budget to half the overall commit budget, which
leaves room for a height of around 7 each. This is much greater than we
see in practice because the size of the per-mount btrees is effectively
limited by both watermarks and thresholds to commit and drain.
Signed-off-by: Zach Brown <zab@versity.com>
It's possible to trigger the block device autoloading mechanism
with a mknod()/stat(), and this mechanism has long been declared
obsolete, thus triggering a dmesg warning since el9_7, which then
fails the test. You may need to `rmmod loop` to reproduce.
Avoid this by not triggering a loop autoload at all - we just make a
different blockdev. Choosing `42` here should avoid any autoload
mechanism, as this number is explicitly reserved for demo drivers and
should never trigger an autoload.
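A sketch of the idea (the path is illustrative, and it assumes 42 is
used as the block major):

```
# major 42 has no driver bound to it, so mknod/stat won't autoload anything
mknod "$T_TMPDIR/fake-bdev" b 42 0
stat "$T_TMPDIR/fake-bdev"
```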
We also just ignore the warning line in dmesg. Other tests, as well
as background noise running during the test, can and perhaps still
might trigger it.
Signed-off-by: Auke Kok <auke.kok@versity.com>
Use of T_M0 and variants should be reserved for e.g.
scoutfs <subcommand> -p <mountpoint> style usages. Tests should
create individual content files in the assigned subdirectory.
Signed-off-by: Auke Kok <auke.kok@versity.com>
The tap output file was not yet complete: it failed to include the
contents of `status.msg`. In a few cases that meant it lacked
important context.
Signed-off-by: Auke Kok <auke.kok@versity.com>
Bash has special handling for these standard IO files, but there are
cases where customers have special restrictions set on them, likely as
part of IDS software to avoid leaking error data out of system logs.
In any case, we can just reopen the existing file descriptors in both
these cases to avoid the problem entirely. This will always work.
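For example, rather than redirecting through /dev/stdout or
/dev/stderr, we can duplicate the descriptors that are already open (a
sketch of the idea, not the exact lines changed):

```
# >&1 and >&2 duplicate the already-open descriptors, so the restricted
# /dev/std* files are never touched
echo "progress message" >&1
echo "error message" >&2
```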
Signed-off-by: Auke Kok <auke.kok@versity.com>
Our local fence script interprets errors executing `findmnt` as
critical errors, but the program explicitly exits with EXIT_FAILURE
when the total number of matching mount entries is zero.
This can happen if the mount disappeared while we're attempting to
fence it but the scoutfs sysfs files are still in place as we read
them. It's a small window, but it's a fork/exec plus a full parse of
/etc/fstab, and a lot can happen in the 0.015s findmnt takes on my
system.
findmnt has no exit codes other than 0 and 1. At that point, we can
only assume that if the stdout is empty, the mount isn't there
anymore.
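So the script keys off the output instead of the exit code, something
like this ($dev and what we do when the mount is gone are assumptions):

```
# findmnt exits 1 both for real errors and for "no matching mounts"
target=$(findmnt -n -o TARGET "$dev")
if [ -z "$target" ]; then
        # empty output: the mount isn't there anymore, nothing to unmount
        exit 0
fi
```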
Signed-off-by: Auke Kok <auke.kok@versity.com>
Tests that cause client retries can fail with this error
from server_commit_log_merge():
error -2 committing log merge: getting merge status item
This can happen if the server has already committed and resolved
the log merge that is being retried. We can safely ignore ENOENT here
just like we do a few lines later.
Signed-off-by: Chris Kirby <ckirby@versity.com>
The server's commit_log_trees has an error message that includes the
source of the error, but it's not used for all errors. The WARN_ON is
redundant with the message and is removed because it isn't filtered out
when we see errors from forced unmount.
Signed-off-by: Zach Brown <zab@versity.com>
The userspace fencing process wasn't careful about handling underlying
directories that disappear while it was working.
On the server/fenced side, fencing requests can linger after they've
been resolved by writing 1 to fenced or error. The script could come
back around to see the directory before the server finally removes it,
causing all later uses of the request dir to fail. We saw this in the
logs as a bunch of cat errors for the various request files.
On the local fence script side, all the mounts can be in the process
of being unmounted so both the /sys/fs dirs and the mount itself can
be removed while we're working.
For both, when we're working with the /sys/fs files we read them without
logging errors and then test that the dir still exists before using what
we read. When fencing a mount, we stop if findmnt doesn't find the
mount and then raise a umount error if the /sys/fs dir exists after
umount fails.
And while we're at it, we have each script's logging append instead
of truncating (if, say, it's a log file instead of an interactive
tty).
Signed-off-by: Zach Brown <zab@versity.com>
We're getting test failures from messages that our guests can be
unresponsive. They sure can be. We don't need to fail for this one
specific case.
Signed-off-by: Zach Brown <zab@versity.com>
Silence another error warning and assertion that assume the result of
the errors is going to be persistent. When we're forcing an unmount
we've severed storage and networking.
Signed-off-by: Zach Brown <zab@versity.com>
mmap_stress gets completely stalled in lock messaging, starving most
of the mmap_stress threads, which causes it to delay and even time out
in CI.
Instead of spawning threads over all 5 test nodes, we artificially
reduce it to spawning over only 2. This still does a good number of
operations on those nodes, and now the work is spread evenly across
the two nodes.
Additionally, I've added a minuscule (10ms) delay in between
operations that should hopefully be sufficient for other locking
attempts to settle and allow the threads to better spread the work.
This now shows that all the threads exit within < 0.25s on my test
machine, which is a lot better than the 40s variation that I was seeing
locally. Hopefully this fares better in CI.
Signed-off-by: Auke Kok <auke.kok@versity.com>
There's a scenario where mmap_stress gets enough resources that two
of the threads will starve the others, which then all take a very long
time catching up committing changes.
Because this test program didn't finish until all the threads had
completed a fixed amount of work, these threads essentially all ended
up tripping over each other. In CI this would exceed 6h+, while
originally I intended this to run in about 100s or so.
Instead, cap the run time to ~30s by default. If threads exceed
this time, they will immediately exit, which causes any clog in
contention between the threads to drain relatively quickly.
Signed-off-by: Auke Kok <auke.kok@versity.com>
Assembling a srch compaction operation creates an item and populates
it with allocator state. If allocation filling fails, it doesn't
cleanly unwind the allocation or undo the compaction item change, and
it issues a warning.
This warning isn't needed if the error shows that we're in forced
unmount. The inconsistent state won't be applied, it will be dropped on
the floor as the mount is torn down.
Signed-off-by: Zach Brown <zab@versity.com>
The log merging process is meant to provide parallelism across workers
in mounts. The idea is that the server hands out a bunch of concurrent
non-intersecting work that's based on the structure of the stable input
fs_root btree.
The nature of the parallel work (cow of the blocks that intersect a key
range) means that the ranges of concurrently issued work can't overlap
or the work will all cow the same input blocks, freeing that input
stable block multiple times. We're seeing this in testing.
Correctness was intended by having an advancing key that sweeps
sorted ranges. Duplicate ranges would never be hit as the key
advanced past each one it visited. This was broken by the mapping of
fs item keys to log merge tree keys, which clobbers the sk_zone key
value. It effectively interleaves the ranges of each zone in the fs
root (meta indexes, orphans, fs items). With just the right log merge
conditions, involving logged items in the right places and partially
completed work inserting remaining ranges behind the key, ranges can
be stored at mapped keys that end up out of order. The server
iterates over these and ends up issuing overlapping work, which
results in duplicated frees of the input blocks.
The fix, without changing the format of the stored log tree items, is to
perform a full sweep of all the range items and determine the next item
by looking at the full precision stored keys. This ensures that the
processed ranges always advance and never overlap.
Signed-off-by: Zach Brown <zab@versity.com>
Our xfstest's golden output includes the full set of tests we expect
to run when no args are specified. If we specify args then the set of
tests can change, and the test will always fail when it does.
This fixes that by having the test check the set of tests itself, rather
than relying on golden output. If args are specified then our xfstest
only fails if any of the executed xfstest tests failed. Without args,
we perform the same scraping of the check output and compare it against
the expected results ourselves.
It would have been a bit much to put that large file inline in the test
file, so we add a dir of per-test files in revision control. We can
also put the list of exclusions there.
We can also clean up the output redirection helper functions to make
them clearer. After xfstests has executed we want to redirect output
back to the compared output so that we can catch any unexpected output.
Signed-off-by: Zach Brown <zab@versity.com>
Add a little background function that runs during the test which
triggers a crash if it finds catastrophic failure conditions.
This is the second bg task we want to kill and we can only have one
function run on the EXIT trap, so we create a generic process killing
trap function.
We feed it the fenced pid as well. run-tests didn't log much of value
into the fenced log, and we're not logging the kills into it anymore,
so we just remove run-tests' fenced logging.
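A sketch of the generic trap (helper and task names are illustrative):

```
bg_pids=()

kill_bg_pids() {
        kill "${bg_pids[@]}" 2>/dev/null
        wait "${bg_pids[@]}" 2>/dev/null
}
trap kill_bg_pids EXIT

watch_for_catastrophe &         # the new crash-triggering background check
bg_pids+=($!)
run_fenced &                    # fenced, reaped by the same trap
bg_pids+=($!)
```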
Signed-off-by: Zach Brown <zab@versity.com>
Add an option to run-tests that runs each test a number of times in a
loop. Looping stops if the test doesn't pass.
Most of the change in the per-test execution is indenting as we add
the for loop block. The stats and kmsg output are lifted up before
the loop.
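The loop itself is simple (names are placeholders):

```
for i in $(seq 1 "$nr_loops"); do
        run_one_test "$test" || break   # stop looping on the first failure
done
```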
Signed-off-by: Zach Brown <zab@versity.com>