Commit Graph

2167 Commits

Author SHA1 Message Date
Chris Kirby
5465350ba2 block alloc fix 2026-01-14 13:25:00 -06:00
Chris Kirby
bf4f2ef8bb more dbg 2026-01-14 13:14:48 -06:00
Chris Kirby
c5207243f0 typo 2026-01-14 13:14:48 -06:00
Chris Kirby
8d330861ba track blkno 2026-01-14 13:14:48 -06:00
Chris Kirby
b7beadb22c track entry_bytes 2026-01-14 13:14:48 -06:00
Chris Kirby
926bb6b9ec debug 2026-01-14 13:14:48 -06:00
Chris Kirby
a420f5389e More debug 2026-01-14 13:14:48 -06:00
Chris Kirby
544362dfaa grab traces at exit 2026-01-14 13:14:48 -06:00
Chris Kirby
79593b6ecc And the output 2026-01-14 13:14:48 -06:00
Chris Kirby
b6e7d5b2d0 Temporarily make srch-safe triggers silent 2026-01-14 13:14:48 -06:00
Chris Kirby
487d38a4a5 init trace buffer size when starting tracing 2026-01-14 13:14:48 -06:00
Chris Kirby
9cddf6d9c9 Track srch_compact_error 2026-01-14 13:14:48 -06:00
Chris Kirby
d81c318c3e Beef up get_file_block() tracing
Signed-off-by: Chris Kirby <ckirby@versity.com>
2026-01-14 13:14:48 -06:00
Chris Kirby
279b093600 Debug stuff for srch-basic
Signed-off-by: Chris Kirby <ckirby@versity.com>
2026-01-14 13:14:48 -06:00
Zach Brown
8f896d9783 Merge pull request #277 from versity/zab/avoid_lock_shrink_storm_hangs
Zab/avoid lock shrink storm hangs
2026-01-14 11:13:09 -08:00
Zach Brown
e54f8d3ec0 Don't shutdown server from sending to fencing client
Errors from lock server calls typically shut the server down.

During normal unmount a client's locks are reclaimed before the
connection is disconnected.  The lock server won't try to send to
unmounting clients.

Clients whose connections time out can cause ENOTCONN errors.  Their
connection is freed before they're fenced and their locks are reclaimed.
The server can try to send to the client for a lock that's disconnected
and get a send error.

These errors shouldn't shut down the server.  The client is either going
to be fenced and have the locks reclaimed, ensuring forward progress, or
the server is going to shutdown if it can't fence.

This was seen in testing as multiple clients were timed out.

Signed-off-by: Zach Brown <zab@versity.com>
2026-01-13 15:34:55 -08:00
Zach Brown
d89e16214d Simplify fence-and-reclaim fence execution check
The fence-and-reclaim test runs a bunch of scenarios and makes sure that
the fence agent was run on the appropriate mount's rids.

Unfortunately the checks were racey.  The check itself only looked at
the log once to see if the rid had been fenced.  Each check had steps
before that would wait until the rid should have been fenced and could
be checked.

Those steps were racey.  They'd do things like make sure a fence request
wasn't pending, but never waited for it to be created in the first
place.  They'd falsely indicate that the log should be checked and when
the rid wasn't found in the log the test would fail.  In logs of
failures we'd see that the rids were fenced after this test failed and
moved on to the next.

This simplifies the checks.  It gets rid of all the intermediate steps
and just waits around for the rid to be fenced, with a timeout.  This
avoids the flakey tests.

Signed-off-by: Zach Brown <zab@versity.com>
2026-01-13 15:34:55 -08:00
Zach Brown
b468352254 Add t_wait_until_timeout
Add a test helper for waiting for a command to return success which will
fail the test after a timeout.

Signed-off-by: Zach Brown <zab@versity.com>
2026-01-13 15:34:55 -08:00
Zach Brown
0eb9dfebdc Allow forced unmount errors in lock invalidation
Lock invalidation has assertions for critical errors, but it doesn't
allow the synthetic errors that come from forced unmount severing the
client's connection to the world.

Signed-off-by: Zach Brown <zab@versity.com>
2026-01-13 15:34:55 -08:00
Zach Brown
f5750de244 Search messages in rbtree instead of lists
The net layer was initially built around send queue lists with the
presumption that there wouldn't be many messages in flight and that
responses would be sent roughly in order.

In the modern era, we can have 10s of thousands of lock request messages
in flight.  This lead to o(n^2) processing in quite a few places as recv
processing searched for either requests to complete or responses to
free.

This adds messages to two rbtrees, indexing either requests by their id
or responses by their send sequence.  Recv processing can find messages
in o(log n).  This patch intends to be minimally disruptive.  It's only
replacing the search of the send and resend queues in the recv path with
rbtrees.  Other uses of the two queue lists are untouched.

On a single node, with ~40k lock shrink attempts in flight, we go from
processing ~800 total request/grant request/response pairs per second to
~60,000 per second.

Signed-off-by: Zach Brown <zab@versity.com>
2026-01-13 15:32:55 -08:00
Zach Brown
f0c7996612 Limit client locks with option instead of shrinker
The use of the VM shrinker was a bad fit for locks.  Shrinking a lock
requires a round trip with the server to request a null mode.  The VM
treats the locks like a cache, as expected, which leads to huge amounts
of locks accumulating and then being shrank in bulk.  This creates a
huge backlog of locks making their way through the network conversation
with the server that implements invalidating to a null mode and freeing.
It starves other network and lock processing, possibly for minutes.

This removes the VM shrinker and instead introduces an option that sets
a limit on the number of idle locks.  As the number of locks exceeds the
count we only try to free an oldest lock at each lock call.  This
results in a lock freeing pace that is proportional to the allocation of
new locks by callers and so is throttled by the work done while callers
hold locks.  It avoids the bulk shrinking of 10s of thousands of locks
that we see in the field.

Signed-off-by: Zach Brown <zab@versity.com>
2026-01-08 10:58:50 -08:00
Zach Brown
5143927e07 Merge pull request #275 from versity/auke/qht_slow_umount_pr
Unmounts can be slow and break quorum-heartbeat-timeout
2026-01-08 09:35:23 -08:00
Auke Kok
f495f52ec9 Unmounts can be slow and break quorum-heartbeat-timeout
We observe that unmount in this test can consume up to 10sec of time
before proceeding to record heartbeat timeout elections by followers.

When this happens, elections and new leaders happen before unmount even
completes. This indicates that hearbeat packets from the unmount are
ceased immediately, but the unmount is taking longer doing other things.
The timeouts then trigger, possibly during the unmount.

The result is that with timeouts of 3 seconds, we're not actually
waiting for an election at all. It already happened 7 seconds ago. The
code here just "sees" that it happens a few hundred ms after it started
looking for it.

There's a few ways about this fix. We could record the actual timestamp
of the election, and compare it with the actual timestamp of the last
heartbeat packet. This would be conclusive, and could disregard any
complication from umount taking too long. But it also means adding
timestamping in various places, or having to rely on tcpdump with packet
processing.

We can't just record $start before unmount. We will still violate the
part of the test that checks that elections didn't happen too late.
Especially in the 3sec test case if unmount takes 10sec.

The simplest solution is to unmount in a bg thread, and circle around
later to `wait` for it to assure we can re-mount without ill effect.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-01-08 09:05:40 -08:00
Zach Brown
3dafeaac5b Merge pull request #273 from versity/clk/inode_deletion
Clk/inode deletion
2026-01-07 12:20:12 -08:00
Chris Kirby
ef0f6f8ac2 Fix race in inode-deletion test
Due to an iput race, the "unlink wait for open on other mount"
subtest can fail. If the unlink happens inline, then the test
passes. But if the orphan scanner has to complete the unlink
work, it's possible that there won't be enough log merge work
for the scanner to do the cleanup before we look at the seq index.

Add SCOUTFS_TRIGGER_LOG_MERGE_FORCE_FINALIZE_OURS, to allow
forcing a log merge. Add new counters, log_merges_start and
log_merge_complete, so that tests can see that a merge has happened.

Then we have to wait for the orphan scanner to do its work.
Add a new counter, orphan_scan_empty, that increments each time
the scanner walks the entire inode space without finding any
orphans. Once the test sees that counter increment, it should be
safe to check the seq index and see that the unlinked inode is gone.

Signed-off-by: Chris Kirby <ckirby@versity.com>
2026-01-07 08:29:38 -06:00
Chris Kirby
c0cd29aa1b Fix run-test.sh buffer multiplier breakage
The /sys/kernel/debug/tracing/buffer_size_kb file always reads as
"7 (expanded: 1408)". So the -T option to run-test.sh won't work,
because it tries to multiply that string by the given factor.

It always defaults to 1408 on every platform we currently support.
Just use that value so we can specify -T in CI runs.

Signed-off-by: Chris Kirby <ckirby@versity.com>
2025-12-18 15:05:48 -06:00
Zach Brown
50bff13f21 Merge pull request #266 from versity/zab/increase_move_empty_budget
Increase server commit block budget for alloc move
2025-12-18 12:44:20 -08:00
Zach Brown
de70ca2372 Increase server commit block budget for alloc move
A few callers of alloc_move_empty in the server were providing a budget
that was too small.  Recent changes to extent_mod_blocks increased the
max budget that is necessary to move extents between btrees.  The
existing WAG of 100 was too small for trees of height 2 and 3.  This
caused looping in production.

We can increase the move budget to half the overall commit budget, which
leaves room for a height of around 7 each.  This is much greater than we
see in practice because the size of the per-mount btrees is effectiely
limited by both watermarks and thresholds to commit and drain.

Signed-off-by: Zach Brown <zab@versity.com>
2025-12-17 14:22:04 -06:00
Zach Brown
5af1412d5f Merge pull request #270 from versity/auke/bdev_autoloading
Avoid block device autoloading warning.
2025-12-17 11:06:32 -08:00
Zach Brown
0a2b2ad409 Merge pull request #269 from versity/auke/tap_status_msg
Include t_fail status in tap output.
2025-12-17 11:04:00 -08:00
Auke Kok
6c4590a8a0 Avoid block device autoloading warning.
It's possible to trigger the block device autoloading mechanism
with a mknod()/stat(), and this mechanism has long been declared
obsolete, thus triggering a dmesg warning since el9_7, which then
fails the test. You may need to `rmmod loop` to reproduce.

Avoid this by avoiding to trigger a loop autoload - we just make a
different blockdev. Chosing `42` here should avoid any autoload
mechanism as this number is explicitly for demo drivers and should
never trigger an autoload.

We also just ignore the warning line in dmesg. Other tests can and
might perhaps still trigger this, as well as background noise running
during the test.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-12-08 13:04:58 -08:00
Zach Brown
1768f69c3c Merge pull request #224 from versity/auke/renameat2-test-sub-dir
Use T_D0/1 instead of T_M0 here.
2025-12-08 10:05:46 -08:00
Zach Brown
dcb0fd5805 Merge pull request #268 from versity/auke/dont_use_bash_special_stdfiles
Avoid using bash special device nodes.
2025-12-08 09:47:19 -08:00
Auke Kok
660f874488 Use T_D0/1 instead of T_M0 here.
Use of T_M0 and variants should be reserved for e.g. scoutfs
<subcommand> -p <mountpoint> type of usages. Tests should create
individual content files in the assigned subdirectory.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-12-04 14:34:02 -05:00
Auke Kok
e1a6689a9b Include t_fail status in tap output.
The tap output file was not yet complete as it failed to include
the contents of `status.msg`. In a few cases, that would mean it
lacks important context.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-12-04 14:09:39 -05:00
Auke Kok
2884a92408 Avoid using bash special device nodes.
Bash has special handling when these standard IO files, but
there are cases where customers have special restrictions set
on them. Likely to avoid leaking error data out of system logs
as part of IDS software.

In any case, we can just reopen existing file descriptors here
in both these cases to avoid this entirely. This will always
work.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-12-04 13:24:48 -05:00
Zach Brown
e194714004 Merge pull request #264 from versity/auke/findmnt_retval
Findmnt returns 1 when no matching entries found
2025-12-03 14:29:31 -08:00
Auke Kok
8bb2f83cf9 Findmnt returns 1 when no matching entries found
Our local fence script attempts to interpret errors executing `findmnt`
as critical errors, but the program exit code explicitly returns
EXIT_FAILURE when the total number of matching mount entries is zero.

This can happen if the mount disappeared while we're attempting to
fence the mount, but, the scoutfs sysfs files are still in place as
we read them. It's a small window, but, it's a fork/exec plus full
parse of /etc/fstab, and a lot can happen in the 0.015s findmnt takes
on my system.

There's no other exit codes from findmnt other than 0 and 1. At that
point, we can only assume that if the stdout is empty, the mount
isn't there anymore.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-12-02 12:55:11 -08:00
Zach Brown
6a9a6789d5 Merge pull request #267 from versity/clk/merge_enoent
Handle ENOENT when getting log merge status item
2025-12-02 09:34:28 -08:00
Chris Kirby
ee630b164f Handle ENOENT when getting log merge status item
Tests that cause client retries can fail with this error
from server_commit_log_merge():

error -2 committing log merge: getting merge status item

This can happen if the server has already committed and resolved
the log merge that is being retried. We can safely ignore ENOENT here
just like we do a few lines later.

Signed-off-by: Chris Kirby <ckirby@versity.com>
2025-12-01 08:58:24 -06:00
Zach Brown
1c7678b6f5 Merge pull request #263 from versity/zab/v1.26
v1.26 Release
2025-11-18 09:39:27 -08:00
Zach Brown
22b5e79bbd v1.26 Release
Finish the release notes for the 1.26 release.

Signed-off-by: Zach Brown <zab@versity.com>
v1.26
2025-11-17 14:42:14 -08:00
Zach Brown
259e639271 Merge pull request #262 from versity/zab/ino_alloc_per_lock
Add ino_alloc_per_lock option
2025-11-14 13:57:49 -08:00
Zach Brown
4d66c38c71 Remove redundant WARN in commit_log_trees
The server's commit_log_trees has an error message that includes the
source of the error, but it's not used for all errors.  The WARN_ON is
redundant with the message and is removed because it isn't filtered out
when we see errors from forced unmount.

Signed-off-by: Zach Brown <zab@versity.com>
2025-11-14 10:04:30 -08:00
Zach Brown
7ef62894bd Add ino_alloc_per_lock option
Add an option that can limit the number of inode numbers that are
allocated per lock group.

Signed-off-by: Zach Brown <zab@versity.com>
2025-11-13 17:19:04 -08:00
Zach Brown
1f363a1ead Merge pull request #261 from versity/zab/log_merge_double_free
Zab/log merge double free
2025-11-13 17:18:30 -08:00
Zach Brown
8ddf9b8c8c Handle disappearing fencing requests and targets
The userspace fencing process wasn't careful about handling underlying
directories that disappear while it was working.

On the server/fenced side, fencing requests can linger after they've
been resolved by writing 1 to fenced or error.  The script could come
back around to see the directory before the server finally removes it,
causing all later uses of the request dir to fail.  We saw this in the
logs as a bunch of cat errors for the various request files.

On the local fence script side, all the mounts can be in the process of
being unmounted so both the /sys/fs dirs and the mount it self can be
removed while we're working.

For both, when we're working with the /sys/fs files we read them without
logging errors and then test that the dir still exists before using what
we read.  When fencing a mount, we stop if findmnt doesn't find the
mount and then raise a umount error if the /sys/fs dir exists after
umount fails.

And while we're at it, we have each scripts logging append instead of
truncating (if, say, it's a log file instead of an interactive tty).

Signed-off-by: Zach Brown <zab@versity.com>
2025-11-13 12:43:31 -08:00
Zach Brown
fd80c17ab6 Filter out kernel message when guests are slow
Ignore more kernel messages when debug guests are being slow.

Signed-off-by: Zach Brown <zab@versity.com>
2025-11-13 12:43:31 -08:00
Zach Brown
991e2cbdf8 Ignore slow quorum hb transfers in tests
We're getting test failures from messages that our guests can be
unresponsive.  They sure can be.  We don't need to fail for this one
specific case.

Signed-off-by: Zach Brown <zab@versity.com>
2025-11-13 12:43:31 -08:00
Zach Brown
92ac132873 Silence merge splice error when forcing
Silence another error warning and assertion that's assuming that the
result of the errors is going to be persistent.  When we're forcing an
unmount we've severed storage and networking.

Signed-off-by: Zach Brown <zab@versity.com>
2025-11-13 12:43:31 -08:00