Commit Graph

2113 Commits

Author SHA1 Message Date
Chris Kirby
d38e41cb57 Add the inode number to scoutfs_xattr_set traces.
Signed-off-by: Chris Kirby <ckirby@versity.com>
2025-10-15 17:35:17 -05:00
Chris Kirby
a896984f59 Only start new quorum election after a receive failure
It's possible for the quorum worker to be preempted for a long period,
especially on debug kernels. Since we only check for how much time
has passed, it's possible for a clean receive to inadvertently
trigger an election. This can cause the quorum-heartbeat-timeout
test to fail due to observed delays outside of the expected bounds.

Instead, make sure we had a receive failure before comparing timestamps.

Signed-off-by: Chris Kirby <ckirby@versity.com>
2025-10-15 17:35:17 -05:00
Chris Kirby
35bcad91a6 Close window where we can lose search items
In finalize_and_start_log_merge(), we overwrite the server
mount's log tree with its finalized form and then later write out
its next open log tree. This leaves a window where the mount's
srch_file is nulled out, causing us to lose any search items in
that log tree.

This shows up as intermittent failures in the srch-basic-functionality
test.

Eliminate this timing window by doing what unmount/reclaim does when
it finalizes, by moving the resources from the item that we finalize
into server trees/items as it finalizes. Then there is no window
where those resources exist only in memory until we create another
transaction.

Signed-off-by: Chris Kirby <ckirby@versity.com>
2025-10-06 12:27:25 -05:00
Auke Kok
0b7b9d4a5e Avoid trigger munching of block_remove_stale trigger.
It's entirely likely that the trigger here is munched by a read on a
dirty block from any unrelated or background read. Avoid that by putting
the trigger at the end of the condition list.

Now that the order is swapped, we have to avoid a null deref in
block_is_dirty(bp) here, as well.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-10-06 12:27:25 -05:00
Auke Kok
f86a7b4d3c Fully wait for orphan inode scan to complete.
The issue with the previous attempt to fix the orphan-inodes test was
that we would regularly exceed the 120s timeout value put in there.

Instead, in this commit, we change the code to add a new counter to
indicate orphan deletion progress. When orphan inodes are deleted, the
increment of this counter indicates progress happened. Inversely,
every time the counter doesn't increment, and the orphan scan attempts
counter increments, we know that there was no more work to be done.

For safety, we wait until 2 consecutive scan attempts were made without
forward progress in the test case.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-10-06 12:27:25 -05:00
Auke Kok
96eb9662a1 Revert "Extend orphan-inodes timeout."
This reverts commit 138c7c6b49.

The timeout value here is still exceeded by CI test jobs, and thus
causing the test to fail.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-10-06 12:27:25 -05:00
Chris Kirby
47af90d078 Fix race in offline-extent-waiting test
Before comparing file contents, wait for the background dd to complete.
Also fix a typo.

Signed-off-by: Chris Kirby <ckirby@versity.com>
2025-10-06 12:27:25 -05:00
Chris Kirby
669e37c636 Remove hung task workaround from large-fragmented-free test
Adjusting hung_task_timeout_secs is still needed for this test to pass
with a debug kernel. But the logic belongs on the platform side.

Signed-off-by: Chris Kirby <ckirby@versity.com>
2025-10-06 12:27:25 -05:00
Chris Kirby
bb3e1f3665 Fix commit budget calculation with multiple holders
The try_drain_data_freed() path was generating errors about overrunning
its commit budget:

scoutfs f.2b8928.r.02689f error: 1 holders exceeded alloc budget av: bef 8185 now 8036, fr: bef 8185 now 7602

The budget overrun check was using the current number of commit holders
(in this case one) instead of the the maximum number of concurrent holders
(in this case two). So even well behaved paths like try_drain_data_freed()
can appear to exceed their commit budget if other holders dirty some blocks
and apply their commits before the try_drain_data_freed() thread does its
final budget reconciliation.

Signed-off-by: Chris Kirby <ckirby@versity.com>
2025-10-06 12:27:25 -05:00
Chris Kirby
0d262de4ac Fix dirtied block calculation in extent_mod_blocks()
Free extents are stored in two btrees: one sorted by block number, one
by size. So if you insert a new extent between two existing extents, you can
be modifying two items in the by-block-number tree. And depending on the size
of those items, that can result in three items over in the -by-size tree.
So that's a 5x multiplier per level.

If we're shrinking the tree and adding more freed blocks, we're conceptually
dirtying two blocks at each level to merge. (current *2 in the code).
But if they fall under the low water mark then one of them is freed, so we
can have *3 per level in this case.

Signed-off-by: Chris Kirby <ckirby@versity.com>
2025-10-06 12:27:25 -05:00
Auke Kok
70bd936213 Ignore sparse error about stat.h on el8.
On el8, sparse is at 0.6.4 in epel-release, but it fails with:
```
[SP src/util.c]
src/util.c: note: in included file (through /usr/include/sys/stat.h):
/usr/include/bits/statx.h:30:6: error: not a function <noident>
/usr/include/bits/statx.h:30:6: error: bad constant expression type
```

This is due to us needing O_DIRECT from <fcntl.h>, so we set _GNU_SOURCE
before including it, but this causes (through _USE_GNU in sys/stat.h)
statx.h to be included, and that has __has_include, and sparse is too
dumb to understand it.

Just shut it up.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-10-06 12:27:25 -05:00
Chris Kirby
3f786596e0 Don't overrun the block budget in server_log_merge_free_work().
This fixes a potential fence post failure like the following:

error: 1 holders exceeded alloc budget av: bef 7407 now 7392, fr: bef 8185 now 7672

The code is only accounting for the freed btree blocks, not the dirtying of
other items. So it's possible to be at exactly (COMMIT_HOLD_ALLOC_BUDGET / 2),
dirty some log btree blocks, loop again, then consume another
(COMMIT_HOLD_ALLOC_BUDGET / 2) and blow past the total budget.

In this example, we went over by 13 blocks.

By only consuming up to 1/8 of the budget on each loop, and committing when we
have consumed 3/4 of the budget, we can avoid the fence post condition.

Signed-off-by: Chris Kirby <ckirby@versity.com>
2025-10-06 12:27:25 -05:00
Zach Brown
cad47ed1ed Merge pull request #247 from versity/zab/sparse_error
Zab/sparse error
2025-10-06 10:09:07 -07:00
Auke Kok
bf87ea0a1c Add option to shuffle test order.
The `-R` option will shuffle the order in which tests are executed.

The testing order shouldn't affect the outcome of any of the tests, but
in practice many of these tests will execute code slightly different
based on the history of the filesystem, resources allocated, memory
usage etc. of tests that were executed before. Shuffling the order of
tests therefore introduces small semi-random variations in the
enviroment.

The xfstests test is the only one that can't be shuffled yet into the
mix, so it is kept at the end. This is because it leaves the filesystems
unmounted. At a later point we may want to address this.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-10-03 14:55:32 -07:00
Zach Brown
e088424d70 Add initial filters for warnings in distro source
Add a chunk of filters for sparse warnings that trigger on distro kernel
source.

Signed-off-by: Zach Brown <zab@versity.com>
2025-10-03 09:35:36 -07:00
Zach Brown
d0cf026298 Require sparse, and filter kernel sparse output
Fail the build if we don't check with sparse in both the kernel and
userspace utils.  Add a filtering wrapper to the kernel build so that we
have a place to filter out uninteresting errors from kernel sources that
we're building against.

Signed-off-by: Zach Brown <zab@versity.com>
2025-10-03 09:35:36 -07:00
Zach Brown
03fa1ce7c5 Avoid bad sparse warning in lock_invalidate()
This is another example of refactoring a loop to avoid sparse warnings
from doing something in the else of a failed trylock if.  We want to
drop and reacquire the lock if the trylock fails so we do it every loop
iteration.  This shouldn't be experiencing much contention because most
of the cov users are usually done under locks and invalidation has
excluded lock holders.  So the additional lock and unlock noise should
be local.

Signed-off-by: Zach Brown <zab@versity.com>
2025-10-03 09:35:36 -07:00
Zach Brown
3d9f10de93 Work around sparse warning in _item_write_done
scoutfs_item_write_done() acquires the cinf dirty_lock and pg rwlock out
of order.  It uses a trylock to detect failure and back off of both
before retrying.

sparse seems to have some peculiar sensitivity to following the else
branch from a failed trylock while already in a context.  Doing that
consistently triggered the spurious mismatched context warning.

This refactors the loop to always drop and reacquire the dirty_lock
after attemping the trylock.  It's not great, but this shouldn't be very
contended because the transaction write has serialized write lock
holderse that would be trying to dirty items.  The silly lock noise will
be mostly cached.

Signed-off-by: Zach Brown <zab@versity.com>
2025-10-03 09:34:23 -07:00
Zach Brown
9741d40e10 Merge pull request #229 from versity/zab/v1.25
v1.25 Release
2025-06-04 11:21:25 -07:00
Zach Brown
48ac7bdf7c v1.25 Release
Finish the release notes for the 1.25 release.

Signed-off-by: Zach Brown <zab@versity.com>
v1.25
2025-06-03 13:35:42 -07:00
Zach Brown
7865ee9f54 Merge pull request #223 from versity/auke/el9_5_wmaybe-uninit
Fix -Wmaybe-uninitalized since rhel9.5
2025-05-12 12:21:02 -07:00
Zach Brown
624eb128c6 Merge pull request #221 from versity/auke/enospc-test
Give enospc test more time to commit unlink.
2025-05-09 11:27:04 -07:00
Zach Brown
091eb3b683 Merge pull request #219 from versity/auke/fix-tests-failing-dirty-test-dirs
Fix test cases that don't run cleanly in a semi-dirty env.
2025-05-09 11:17:24 -07:00
Zach Brown
04e8cc6295 Merge pull request #220 from versity/auke/orphan-inodes
Extend orphan-inodes timeout.
2025-05-09 11:15:13 -07:00
Zach Brown
0f6fdb3eb5 Merge pull request #222 from versity/auke/t_kill_silent
Properly silently kill background tasks.
2025-05-09 11:11:24 -07:00
Auke Kok
2f48a606e8 Fix -Wmaybe-uninitalized since rhel9.5
Looks like the compiler isn't smart enough to understand the pass by
pointer value, and we can initialize it here easily.

make[1]: Entering directory '/usr/src/kernels/5.14.0-503.26.1.el9_5.x86_64'
  CC [M]  /home/auke/scoutfs/kmod/src/server.o
/home/auke/scoutfs/kmod/src/server.c: In function ‘fence_pending_recov_worker’:
/home/auke/scoutfs/kmod/src/server.c:4170:23: error: ‘addr.v4.addr’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
 4170 |                 ret = scoutfs_fence_start(sb, rid, le32_to_be32(addr.v4.addr),
      |                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 4171 |                                           SCOUTFS_FENCE_CLIENT_RECOVERY);
      |                                           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors

There's still the obvious issue here that we'd intended to support ipv6
but just disregard that here.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-05-08 15:20:50 -07:00
Auke Kok
377e49caf1 Properly silently kill background tasks.
Occasionally, we have some tests fail because these kills produce:

tests/lock-recover-invalidate.sh: line 42:  9928 Terminated

Even though we expected them to be silent. In these particular cases we
already don't care about this output.

We borrow the silent_kill() function from orphan-inodes and promote it
to t_silent_kill() in funcs/exec.sh, and then use it everywhere where
appropriate.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-05-08 12:03:04 -07:00
Auke Kok
d08eb66adc Give enospc test more time to commit unlink.
The current test sequence performs the unlink and immediately tests
whether enough resources are available to create new files again, and
this consistently fails.

One of my crummy VMs takes a good 12 seconds before the `touch` actually
succeeds. We care about the filesystem eventually returning from ENOSPC,
and certainly we don't want it to take forever, but there is a period
after our first ENOSPC error and cleanup that we expect ENOSPC to fail
for a bit longer.

Make the timeout 120s. As soon as the `touch` completes, exit the wait
loop.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-05-08 11:40:13 -07:00
Zach Brown
6f19d0bd36 Merge pull request #216 from versity/zab/stop_ending_dirty_data_freed
Zab/stop ending dirty data freed
2025-05-08 11:18:23 -07:00
Auke Kok
1d0cde7cc3 Clean up old test data as needed.
If run without `-m` (explicit mkfs) in subsequent testing, old test
data files may break several tests. Most failures are -EEXIST, but
there are some more subtle ones.

This change erases any existing test dir as needed just before we
run the tests, and avoids the issue entirely.

I considered doing a `mv dir dir.$$ && rm -rf dir.$$ &` alternative
solution but that likely will interfere disproportionally with
tests that do disconnects and other thing that can be impacted by an
unlink storm.

This has an obvious performance aspect - tests will be a little
slower to start on subsequent runs. In CI, this will effectively be
a no-op though.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-05-08 10:10:01 -07:00
Auke Kok
138c7c6b49 Extend orphan-inodes timeout.
This test regularly fails in CI when the 15 seconds elapses and the
system still hasn't concluded the mount log merges and orphan inode
scans needed to unlink the test files.

Instead of just extending the timeout value, we test-and-retry for 120s.
This hopefully is faster in most cases. My smallest VM needs about 6s-8s
on average.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-05-08 09:56:45 -07:00
Zach Brown
8aa1a98901 Merge pull request #210 from versity/auke/perf-irq-took-too-long
Filter out perf `interrupt took too long` dmesg.
2025-04-30 10:04:00 -07:00
Zach Brown
888b1394a6 Retry client commit and get log trees separately
The client transaction commit worker has a series of functions that it
calls to commit the current transaction and open the next one.  If any
of them fail, it retries all of them from the beginning each time until
they all succeed.

This pattern behaves badly since we added the strict get_trans_seq and
commit_trans_seq latching in the log_trees.  The server will only commit
the items for a get or commit request once, and will fail a commit
request if it isn't given the seq that matches the current item.

If the server gets an error it can have persisted items while sending an
error to the client.  If this error was for a get request, then the
client will retry all of its transaction write functions.  This includes
the commit request which is now using a stale seq and will fail
indefinitely.  This is visible in the server log as:

  error -5 committing client logs for rid e57e37132c919c4f: invalid log trees item get_trans_seq

The solution is to retry the commit and get phases independently.  This
way a failed get will be retried on its own without running through the
commit phase that had succeeded.  The client will eventually get the
next seq that it can then safely commit.

Signed-off-by: Zach Brown <zab@versity.com>
2025-04-29 11:46:38 -07:00
Zach Brown
e457694f19 Don't send dirty data_freed blocks to client
At the end of get_log_trees we can try and drain the data_freed extent
tree, which can take multiple commits.  If a commit fails then the
blocks are still dirty in memory.  We can't send references to those
blocks to the client.  We have to return an error and not send the
log_trees, like the main get_log_trees does.  The client will retry and
eventually get a log_trees that references blocks that were successfully
committed.

Signed-off-by: Zach Brown <zab@versity.com>
2025-04-29 11:46:38 -07:00
Zach Brown
459de5b478 Merge pull request #211 from versity/auke/tapf-output
TAP formatted output.
2025-04-15 14:25:06 -07:00
Auke Kok
24031cde1d TAP formatted output.
Stored as `results/scoutfs.tap`, this file contains TAP format 14
generated test results.

Embedded in the output are some metadata so that these files can be
aggregated and stored in an unique and deduplicating way, but using a
generated UUID at the start of testing. The file itself also catches git
ID, date, and kernel version, as well as the (possibly altered) test
sequence used.

Any test that has diff or dmesg output will be considered failed, and a
copy of the relevant data is included as comments.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-04-15 12:02:41 -07:00
Zach Brown
04cc41719c Merge pull request #209 from versity/auke/basic-truncate-yes-pipefail
Ignore pipefail alternative error when not a tty.
2025-04-14 13:15:03 -07:00
Auke Kok
1b47e9429e Filter out perf interrupt took too long dmesg.
Example:

```
[ 2469.638414] perf: interrupt took too long (2507 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
```

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-04-14 12:06:58 -07:00
Auke Kok
7ea084082d Ignore pipefail alternative error when not a tty.
This happens with the basic-truncate test, only. It's the only user
of the `yes` program.

The `yes` command normally fails gracefully under the usual runs that
are attached to some terminal. But when the test script runs entirely
under something else, it will throw a needless error message that
pollutes the test output:

  `yes: standard output: Broken pipe`

Adjust the redirect to omit all stderr for `yes` in this case.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-04-14 11:13:39 -07:00
Zach Brown
f565451f76 Merge pull request #208 from versity/zab/v1.24
v1.24 Release
2025-03-17 11:18:42 -07:00
Zach Brown
05f14640fb v1.24 Release
Finish the release notes for the 1.24 release.

Signed-off-by: Zach Brown <zab@versity.com>
v1.24
2025-03-14 12:19:30 -07:00
Zach Brown
609fc56cd6 Merge pull request #203 from versity/auke/new_inode_ctime
Fix new_inode ctime assignment.
2025-02-25 15:23:16 -08:00
Zach Brown
a4b5a256eb Merge pull request #175 from versity/auke/mmap
Support for mmap() writable mappings.
2025-02-20 14:03:01 -08:00
Zach Brown
f701ce104c Merge pull request #204 from versity/zab/remove_wordexp
Remove wordexp expansion of utils path argument
2025-02-19 09:27:15 -08:00
Zach Brown
c6dab3c306 Remove wordexp expansion of utils path argument
scoutfs cli commands were using a helper that tried to perform word
expansion on the path argument.  This was done with the intent of
providing the convenience of shell expansion (env vars, ~) within the
cli command argument.

But it breaks paths that accidentally have their file names match the
syntax that wordexp supports.   "[ ]" tripped up files in the wild.

We don't need to provide shell expansion functionality in our argument
parsing.  The shell can do that.  The cli must pass the arguments
straight through, no parsing at all.

Signed-off-by: Zach Brown <zab@versity.com>
2025-02-18 11:55:37 -08:00
Auke Kok
e3e2cfceec Fix new_inode ctime assignment.
Very old copy/paste bug here, we want to update new_inode's ctime
instead. old_inode already is updated.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-02-18 13:15:49 -05:00
Zach Brown
5a10c79409 Merge pull request #201 from versity/auke/fixes_pre_parallel_restore
Misc. fixes and changes to support parallel_restore and check.
2025-02-02 06:53:25 -08:00
Auke Kok
e9d147260c Fix ctx->pos updating to properly handle dent gaps
We need to assure we're emitting dents with the proper position
and we already have them as part of our dent. The only caveat is
to increment ctx->pos once beyond the list to make sure the caller
doesn't call us once more.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-01-27 14:49:04 -05:00
Auke Kok
6c85879489 Assert unlock doesn't underflow lock user count.
While debugging a double unlock error we hit this condition and
debugging would have been a lot easier had we enforced this simple
constraint that we can't decrement the lock users count if it's
already 0.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-01-27 14:49:04 -05:00
Auke Kok
8b76a53cf3 Avoid cluster locking while put_user() in _allocated_inos.
Similar to fiemap, readdir and walk_inodes, this method could have
put_user during a page fault, causing potentially a deadlock.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-01-27 14:49:04 -05:00