scoutfs

mirror of https://github.com/versity/scoutfs.git synced 2026-04-10 00:49:08 +00:00

Author	SHA1	Message	Date
Auke Kok	efc19c3d75	server: limit to one merge request when fs_root height <= 2 When the fs_root is too short for subtrees, get_parent returns the entire root for every request. Multiple concurrent merges would each independently CoW and modify the same root tree. Processing their completions would replace the root with each result, only keeping the last and orphaning blocks allocated by earlier completions. Limit to one outstanding request when fs_root.height <= 2 to prevent this. Signed-off-by: Auke Kok <auke.kok@versity.com>	2026-03-17 10:13:56 -07:00
Auke Kok	a5e746d185	Fix use-after-free in scoutfs_btree_free_blocks() bt = bl->data, but we just marked bl to be freed with scoutfs_block_put(), so save the blkno. Very hypothetical. Signed-off-by: Auke Kok <auke.kok@versity.com>	2026-03-17 10:13:51 -07:00
Auke Kok	13149b121f	btree: free level-1 parent blocks in scoutfs_btree_free_blocks() The descent in free_blocks stops at level 1 and frees all leaf refs in that parent block. Ancestor blocks reached through their final child ref are recorded in blknos[] and freed after the leaves. But the level-1 parent block itself was never freed — it wasn't added to blknos[] during descent (that array only tracks levels 2+) and wasn't freed explicitly after the leaf loop. This leaked every level-1 parent block in every finalized log tree, leaving them as meta blocks that were neither referenced by any btree nor present in any free list. This caused CK_META_COVERAGE gap failures in scoutfs check. Free the level-1 parent block explicitly after all its leaf refs have been freed. Adjust the budget check to reserve space for this additional free (+1 for the parent alongside the existing +nr_par for ancestors). Signed-off-by: Auke Kok <auke.kok@versity.com>	2026-03-17 10:13:07 -07:00
Chris Kirby	b66ed414f0	Suppress another forced shutdown error message The "server error emptying freed" error was causing a fence-and-reclaim test failure. In this case, the error was -ENOLINK, which we should ignore for messaging purposes. Signed-off-by: Chris Kirby <ckirby@versity.com>	2026-03-17 10:13:06 -07:00
Chris Kirby	5a9ea9d246	Don't emit empty blocks in kway_merge() It's possible for a srch compaction to collapse down to nothing if given evenly paired create/delete entries. In this case, we were emitting an empty block. This could cause problems for search_sorted_file(), which assumes that every block it sees has a valid first and last entry. Fix this by keeping a temp entry and only emitting it if it differs from the next entry in the block. Be sure to flush out a straggling temp entry if we have one when we're done with the last block of the merge. Signed-off-by: Chris Kirby <ckirby@versity.com>	2026-03-17 10:13:04 -07:00
Chris Kirby	95a2be99b6	Improve tracing for get_file_block() Print the first and last entries, the entry_nr and entry_bytes. Signed-off-by: Chris Kirby <ckirby@versity.com>	2026-03-17 10:12:54 -07:00
Chris Kirby	ea64279ea0	Fix trigger firing race in srch-safe-merge-pos Because the srch triggers are inherently async to the test, we can't be sure they won't fire prematurely just because a compact worker started running at an inconvenient time. Make the trigger arming silent to avoid spurious test failures. Move the trigger arming closer to the point of interest to increase the chances that we're actually testing what we want. Signed-off-by: Chris Kirby <ckirby@versity.com>	2026-03-17 10:12:53 -07:00
Auke Kok	5beaa6c896	Wake up lock waiters to prevent hangs during unmount. Add unmounting checks to lock_wait_cond() and lock_key_range() so that lock waiters wake up and new lock requests fail with -ESHUTDOWN during unmount. Replace the unbounded wait_event() with a 60 second timeout to prevent indefinite hangs. Relax the WARN_ON_ONCE at lock_key_range entry to only warn when not unmounting, since late lock attempts during shutdown are expected. Signed-off-by: Auke Kok <auke.kok@versity.com>	2026-03-17 10:12:52 -07:00
Auke Kok	9261132b1f	Add client timeout to farewell completion wait. Replace unbounded wait_for_completion() with a 120 second timeout to prevent indefinite hangs during unmount if the server never responds to the farewell request. Signed-off-by: Auke Kok <auke.kok@versity.com>	2026-03-17 10:12:51 -07:00
Auke Kok	b43b8e9559	Set BLOCK_BIT_ERROR on bio submit failure. When block_submit_bio() fails, set BLOCK_BIT_ERROR so that waiters in wait_event(uptodate_or_error) will wake up rather than waiting indefinitely for a completion. Signed-off-by: Auke Kok <auke.kok@versity.com>	2026-03-17 10:12:49 -07:00
Zach Brown	a62708ac19	Merge pull request #286 from versity/auke/more-inode-deletion Also use orphan scan wait code for remote unlink parts.	2026-03-16 14:33:20 -07:00
Zach Brown	48c1f221b3	Merge pull request #285 from versity/auke/s-i-i-grep-awk-fix Use awk matching for ino.	2026-03-13 13:51:39 -07:00
Zach Brown	34713f3559	Merge pull request #290 from versity/auke/dirent_zero_pad Auke/dirent zero pad	2026-03-06 10:41:04 -08:00
Auke Kok	137abc1fe2	Zero scoutfs_data_extent_val padding. The initialization here avoids clearing __pad[], which leaks to disk. Use a struct initializer to avoid it. Signed-off-by: Auke Kok <auke.kok@versity.com>	2026-03-05 16:20:06 -08:00
Auke Kok	64fcbdc15e	Zero out dirent padding to avoid leaking to disk. This allocation here currently leaks through __pad[7] which is written to disk. Use the initializer to enforce zeroing the pad. The name member is written right after. Signed-off-by: Auke Kok <auke.kok@versity.com>	2026-03-05 16:20:06 -08:00
Zach Brown	d9c951ff48	Merge pull request #287 from versity/auke/misc_fixes Unsorted misc. fixes for minor/cosmetic issues.	2026-03-02 10:12:26 -08:00
Auke Kok	eaae92d983	Don't send -EINVAL as u8, over the network. The caller sends the return value of this inline as u8. If we return -EINVAL, it maps to (234) which is outside of our enum range. Assume this was meant to return SCOUTFS_NET_ERR_EINVAL which is a defined constant. Signed-off-by: Auke Kok <auke.kok@versity.com>	2026-02-26 14:02:42 -05:00
Auke Kok	43f3dd7259	Invalid address check logic. These boolean checks are all mutually exclusive, meaning this check will always succeed due to the negative. Instead of && it needs to use ||. Signed-off-by: Auke Kok <auke.kok@versity.com>	2026-02-26 14:02:42 -05:00
Auke Kok	7d96cf9b96	Remove copy/paste duplicate op flag check. The exact 2 lines here are repeated. It suggests that there may have been the intent of an additional check, but, there isn't anything left from what I can see that needs checking here. Signed-off-by: Auke Kok <auke.kok@versity.com>	2026-02-26 14:02:41 -05:00
Auke Kok	03e22164db	Return error on scoutfs_forest_setup(). This setup function always returned 0, even on error, causing initialization to continue despite the error. Signed-off-by: Auke Kok <auke.kok@versity.com>	2026-02-26 14:02:41 -05:00
Zach Brown	e0948ec6de	Merge pull request #281 from versity/auke/dotfull-file-seqres Put `.full` file in $T_TMPDIR.	2026-02-26 09:15:22 -08:00
Auke Kok	d0c1c28438	Use awk matching for ino. This test regularly fails here because the grep is greedy and can match inodes ending in the same digits as the one we're looking for. Make it use the same awk pattern used below. Signed-off-by: Auke Kok <auke.kok@versity.com>	2026-02-25 13:43:20 -05:00
Auke Kok	65808c2cb2	Also use orphan scan wait code for remote unlink parts. The fix added in v1.26-17-gef0f6f8a does a good job of avoiding the intermittent test failures for the part where it was added. The remote unlink section could use it as well, as it suffers from the same intermittent failures. Signed-off-by: Auke Kok <auke.kok@versity.com>	2026-02-24 14:12:03 -08:00
Zach Brown	73573d2c2b	Merge pull request #283 from versity/auke/rever Delete stray file from golden directory.	2026-02-20 10:12:21 -08:00
Auke Kok	f5db935afc	Delete stray file from golden directory. Signed-off-by: Auke Kok <auke.kok@versity.com>	2026-02-11 14:05:32 -05:00
Zach Brown	831faff7d2	Merge pull request #282 from versity/zab/v1.28 v1.28 Release	2026-02-06 09:28:52 -08:00
Zach Brown	8dad826f88	v1.28 Release Finish the release notes for the 1.28 release. Signed-off-by: Zach Brown <zab@versity.com> v1.28	2026-02-05 09:47:05 -08:00
Auke Kok	e2f3f2e060	Put `.full` file in $T_TMPDIR. This file was put into $CWD by the test scripts for no real good reason. I suppose somewhere $seqres was supposed to be set before these writes happened. Just write them to the test temp folder for good measure for now. Signed-off-by: Auke Kok <auke.kok@versity.com>	2026-02-02 14:37:40 -08:00
Zach Brown	3a05c69643	Merge pull request #279 from versity/auke/basic-acl-consistency Auke/basic acl consistency (test/reproduction)	2026-02-02 10:32:30 -08:00
Auke Kok	533f309aec	Switch to .get_inode_acl() to avoid rcu corruption. In el9.6, the kernel VFS no longer goes through xattr handlers to retrieve ACLs, but instead calls the FS drivers' .get_{inode_}acl method. In the initial compat version we hooked up to .get_acl given the identical name that was used in the past. However, this results in caching issues, as was encountered by customers and exposed in the added test case `basic-acl-consistency`. The result is that some group ACL entries may appear randomly missing. Dropping caches may temporarily fix the issue. The root cause of the issue is that the VFS now has 2 separate paths to retrieve ACLs from the FS driver, and they have conflicting implications for caching. `.get_acl` is purely meant for filesystems like overlay/ecryptfs where no caching should ever go on as they are fully passthrough only. Filesystems with dentries (i.e. all normal filesystems) should not expose this interface, and should instead expose the .get_inode_acl method. And indeed, in introducing the new interface, the upstream kernel converts all but a few fs's to use .get_inode_acl(). The functional change in the driver is to detach KC_GET_ACL_DENTRY and introduce KC_GET_INODE_ACL to handle the new (and required) interface. KC_SET_ACL_DENTRY is detached due to it being a different changeset in the kernel and we should separate these for good measure now. Signed-off-by: Auke Kok <auke.kok@versity.com>	2026-01-30 11:31:43 -08:00
Auke Kok	0ef22b3c44	Add basic ACL consistency test case. This test case is used to detect and reproduce a customer issue we're seeing where the new .get_acl() method API and underlying changes in el9_6+ are causing ACL cache fetching to return inconsistent results, which shows as missing ACLs on directories. This particular sequence is consistent enough that it warrants making it into a specific test. Signed-off-by: Auke Kok <auke.kok@versity.com>	2026-01-22 12:23:38 -08:00
Auke Kok	85ffba5329	Update existing tests to use scratch helpers. Signed-off-by: Auke Kok <auke.kok@versity.com>	2026-01-20 12:35:43 -08:00
Auke Kok	553e6e909e	Scratch mount test helpers. Adds basic mkfs/mount/umount helpers that handle all the basics for making, mounting and unmounting scratch devices. The mount/unmount create "$T_MSCR", which lives in "$T_TMPDIR". Signed-off-by: Auke Kok <auke.kok@versity.com>	2026-01-20 12:35:09 -08:00
Zach Brown	9b569415f2	Merge pull request #276 from versity/zab/v1.27 v1.27 Release	2026-01-15 19:36:38 -08:00
Zach Brown	6a1e136085	v1.27 Release Finish the release notes for the 1.27 release. Signed-off-by: Zach Brown <zab@versity.com> v1.27	2026-01-15 14:21:53 -08:00
Zach Brown	7ca789c837	Merge pull request #278 from versity/zab/test_sync_before_crash Have run-tests monitor sync before crashing	2026-01-15 14:03:26 -08:00
Zach Brown	4d55fe6251	Have run-tests monitor sync before crashing There have been a few failures where output is generated just before we crash but it didn't have a chance to be written. Add a best-effort background sync before crashing. There's a good chance it'll hang if the system is stuck so we don't wait for it directly, just for .. some time to pass. Signed-off-by: Zach Brown <zab@versity.com>	2026-01-15 10:41:44 -08:00
Zach Brown	8f896d9783	Merge pull request #277 from versity/zab/avoid_lock_shrink_storm_hangs Zab/avoid lock shrink storm hangs	2026-01-14 11:13:09 -08:00
Zach Brown	e54f8d3ec0	Don't shutdown server from sending to fencing client Errors from lock server calls typically shut the server down. During normal unmount a client's locks are reclaimed before the connection is disconnected. The lock server won't try to send to unmounting clients. Clients whose connections time out can cause ENOTCONN errors. Their connection is freed before they're fenced and their locks are reclaimed. The server can try to send to the client for a lock that's disconnected and get a send error. These errors shouldn't shut down the server. The client is either going to be fenced and have the locks reclaimed, ensuring forward progress, or the server is going to shutdown if it can't fence. This was seen in testing as multiple clients were timed out. Signed-off-by: Zach Brown <zab@versity.com>	2026-01-13 15:34:55 -08:00
Zach Brown	d89e16214d	Simplify fence-and-reclaim fence execution check The fence-and-reclaim test runs a bunch of scenarios and makes sure that the fence agent was run on the appropriate mount's rids. Unfortunately the checks were racey. The check itself only looked at the log once to see if the rid had been fenced. Each check had steps before that would wait until the rid should have been fenced and could be checked. Those steps were racey. They'd do things like make sure a fence request wasn't pending, but never waited for it to be created in the first place. They'd falsely indicate that the log should be checked and when the rid wasn't found in the log the test would fail. In logs of failures we'd see that the rids were fenced after this test failed and moved on to the next. This simplifies the checks. It gets rid of all the intermediate steps and just waits around for the rid to be fenced, with a timeout. This avoids the flakey tests. Signed-off-by: Zach Brown <zab@versity.com>	2026-01-13 15:34:55 -08:00
Zach Brown	b468352254	Add t_wait_until_timeout Add a test helper for waiting for a command to return success which will fail the test after a timeout. Signed-off-by: Zach Brown <zab@versity.com>	2026-01-13 15:34:55 -08:00
Zach Brown	0eb9dfebdc	Allow forced unmount errors in lock invalidation Lock invalidation has assertions for critical errors, but it doesn't allow the synthetic errors that come from forced unmount severing the client's connection to the world. Signed-off-by: Zach Brown <zab@versity.com>	2026-01-13 15:34:55 -08:00
Zach Brown	f5750de244	Search messages in rbtree instead of lists The net layer was initially built around send queue lists with the presumption that there wouldn't be many messages in flight and that responses would be sent roughly in order. In the modern era, we can have 10s of thousands of lock request messages in flight. This led to O(n^2) processing in quite a few places as recv processing searched for either requests to complete or responses to free. This adds messages to two rbtrees, indexing either requests by their id or responses by their send sequence. Recv processing can find messages in O(log n). This patch intends to be minimally disruptive. It's only replacing the search of the send and resend queues in the recv path with rbtrees. Other uses of the two queue lists are untouched. On a single node, with ~40k lock shrink attempts in flight, we go from processing ~800 total request/grant request/response pairs per second to ~60,000 per second. Signed-off-by: Zach Brown <zab@versity.com>	2026-01-13 15:32:55 -08:00
Zach Brown	f0c7996612	Limit client locks with option instead of shrinker The use of the VM shrinker was a bad fit for locks. Shrinking a lock requires a round trip with the server to request a null mode. The VM treats the locks like a cache, as expected, which leads to huge amounts of locks accumulating and then being shrunk in bulk. This creates a huge backlog of locks making their way through the network conversation with the server that implements invalidating to a null mode and freeing. It starves other network and lock processing, possibly for minutes. This removes the VM shrinker and instead introduces an option that sets a limit on the number of idle locks. As the number of locks exceeds the count we only try to free an oldest lock at each lock call. This results in a lock freeing pace that is proportional to the allocation of new locks by callers and so is throttled by the work done while callers hold locks. It avoids the bulk shrinking of 10s of thousands of locks that we see in the field. Signed-off-by: Zach Brown <zab@versity.com>	2026-01-08 10:58:50 -08:00
Zach Brown	5143927e07	Merge pull request #275 from versity/auke/qht_slow_umount_pr Unmounts can be slow and break quorum-heartbeat-timeout	2026-01-08 09:35:23 -08:00
Auke Kok	f495f52ec9	Unmounts can be slow and break quorum-heartbeat-timeout We observe that unmount in this test can consume up to 10sec of time before proceeding to record heartbeat timeout elections by followers. When this happens, elections and new leaders happen before unmount even completes. This indicates that heartbeat packets from the unmount are ceased immediately, but the unmount is taking longer doing other things. The timeouts then trigger, possibly during the unmount. The result is that with timeouts of 3 seconds, we're not actually waiting for an election at all. It already happened 7 seconds ago. The code here just "sees" that it happens a few hundred ms after it started looking for it. There are a few ways to fix this. We could record the actual timestamp of the election, and compare it with the actual timestamp of the last heartbeat packet. This would be conclusive, and could disregard any complication from umount taking too long. But it also means adding timestamping in various places, or having to rely on tcpdump with packet processing. We can't just record $start before unmount. We will still violate the part of the test that checks that elections didn't happen too late. Especially in the 3sec test case if unmount takes 10sec. The simplest solution is to unmount in a bg thread, and circle around later to `wait` for it to assure we can re-mount without ill effect. Signed-off-by: Auke Kok <auke.kok@versity.com>	2026-01-08 09:05:40 -08:00
Zach Brown	3dafeaac5b	Merge pull request #273 from versity/clk/inode_deletion Clk/inode deletion	2026-01-07 12:20:12 -08:00
Chris Kirby	ef0f6f8ac2	Fix race in inode-deletion test Due to an iput race, the "unlink wait for open on other mount" subtest can fail. If the unlink happens inline, then the test passes. But if the orphan scanner has to complete the unlink work, it's possible that there won't be enough log merge work for the scanner to do the cleanup before we look at the seq index. Add SCOUTFS_TRIGGER_LOG_MERGE_FORCE_FINALIZE_OURS, to allow forcing a log merge. Add new counters, log_merges_start and log_merge_complete, so that tests can see that a merge has happened. Then we have to wait for the orphan scanner to do its work. Add a new counter, orphan_scan_empty, that increments each time the scanner walks the entire inode space without finding any orphans. Once the test sees that counter increment, it should be safe to check the seq index and see that the unlinked inode is gone. Signed-off-by: Chris Kirby <ckirby@versity.com>	2026-01-07 08:29:38 -06:00
Chris Kirby	c0cd29aa1b	Fix run-test.sh buffer multiplier breakage The /sys/kernel/debug/tracing/buffer_size_kb file always reads as "7 (expanded: 1408)". So the -T option to run-test.sh won't work, because it tries to multiply that string by the given factor. It always defaults to 1408 on every platform we currently support. Just use that value so we can specify -T in CI runs. Signed-off-by: Chris Kirby <ckirby@versity.com>	2025-12-18 15:05:48 -06:00
Zach Brown	50bff13f21	Merge pull request #266 from versity/zab/increase_move_empty_budget Increase server commit block budget for alloc move	2025-12-18 12:44:20 -08:00
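
Commit eaae92d983 in the list above notes that a return value of -EINVAL, once sent as a u8, shows up as 234. A minimal sketch of that arithmetic follows, assuming the Linux value EINVAL = 22. The helper name as_u8 is hypothetical and is not part of scoutfs; it only mimics what a C cast to an unsigned 8-bit type does.

```python
# Illustration only: mimic truncating a negative errno to an unsigned 8-bit value.
# EINVAL = 22 is the Linux value and is an assumption of this sketch.
EINVAL = 22


def as_u8(value: int) -> int:
    """Keep only the low 8 bits, as a C cast to u8 would."""
    return value & 0xFF


wire_value = as_u8(-EINVAL)
print(wire_value)  # 256 - 22 = 234
```

The receiver sees 234 rather than a small defined error code, which is why the commit switches to the defined constant SCOUTFS_NET_ERR_EINVAL.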


2190 Commits
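
Commit b468352254 adds a t_wait_until_timeout test helper that waits for a command to return success and fails the test after a timeout, and commit d89e16214d uses that approach to replace racy intermediate checks. The general pattern can be sketched as below. This is an illustration in Python, not the helper's actual implementation; the names wait_until_timeout, check, timeout_s and interval_s are assumptions made for the example.

```python
import time
from typing import Callable


def wait_until_timeout(check: Callable[[], bool],
                       timeout_s: float,
                       interval_s: float = 0.1) -> bool:
    """Poll check() until it returns True or timeout_s elapses.

    Returns True on success and False on timeout. The condition is
    always evaluated at least once, even with a zero timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Usage: a condition that becomes true on the third poll.
calls = {"n": 0}


def fenced() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


ok = wait_until_timeout(fenced, timeout_s=2.0, interval_s=0.01)
```

Waiting directly on the final condition, with a deadline, avoids depending on intermediate states that may not exist yet when they are first checked.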