Commit Graph

1774 Commits

Author SHA1 Message Date
Auke Kok
81aa58253e module_init/_exit should have a semicolon at eol.
In the past this was not needed but since el7 onwards these macros
should require the semicolon.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-10-09 15:35:40 -04:00
Auke Kok
c683ded0e6 Adjust for new augmented rbtree compute callback function signature
The new variant of the code that recomputes the augmented value
is designed to handle non-scalar types and to facilitate that, it
has new semantics for the _compute callback. It is now passed a
boolean flag `exit` that indicates that if the value isn't changed,
it should exit and halt propagation.

The callback function now shall return whether that propagation should
stop or not, and not the computed new value. The callback can now
directly update the new computed value in the node.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-10-09 15:35:40 -04:00
Auke Kok
f27431b3ae Add include <blkdev.h>.
Fixes: Error: implicit declaration of function ‘blkdev_put’

Previously this was an `extern` in <fs.h> and included implicitly,
hence the need to hard include it now.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-10-09 15:35:40 -04:00
Auke Kok
28c3cee995 preempt_mask.h is removed entirely.
v4.1-rc4-22-g92cf211874e9 merges this into preempt.h, and on
rhel7 kernels we don't need this include anymore either.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-10-09 15:35:40 -04:00
Auke Kok
430960ef3c page_cache_release() is removed. put_page() instead.
Even in 3.x, this already was equivalent.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-10-09 15:35:40 -04:00
Auke Kok
7006a84d96 flush_work_sync is equivalent to flush_work.
v3.15-rc1-6-g1a56f2aa4752 removes flush_work_sync entirely, but
ever since v3.6-rc1-25-g606a5020b9bd which made all workqueues
non-reentrant, it has been equivalent to flush_work.

This is safe because in all cases only one server->work can be
in flight at a time.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-10-09 15:35:40 -04:00
Auke Kok
eafb8621da d_materialise_unique replaced with d_splice_alias.
Note argument order reversal.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-10-09 15:35:40 -04:00
Auke Kok
006555d42a READ_ONCE() replaces ACCESS_ONCE()
v3.18-rc3-2-g230fa253df63 forces us to remove ACCESS_ONCE() with
READ_ONCE(), but it is probably the better interface and works with
non-scalar types.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-10-09 15:35:40 -04:00
Auke Kok
8e458f9230 PAGE_CACHE_SIZE was removed, replace with PAGE_SIZE.
PAGE_CACHE_SIZE was previously defined to be equivalent to PAGE_SIZE.

This symbol was removed in v4.6-rc1-32-g1fa64f198b9f.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-10-09 15:35:40 -04:00
Auke Kok
32c0dbce09 Include kernel.h and fs.h at the top of kernelcompat.h
Because we `-include src/kernelcompat.h` from the command line,
this header gets included before any of the kernel includes in
most .c and .h files. We should at least make sure we pull in
<fs> and <kernel> since they're required.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-10-09 15:35:40 -04:00
Zach Brown
9c9ba651bd Merge pull request #141 from versity/zab/fence-reclaim-racey-seq-test
Remove seq test from fence-and-reclaim
2023-10-09 12:21:48 -07:00
Zach Brown
14eddb6420 Remove seq test from fence-and-reclaim
The fence-and-reclaim test has a little function that runs after fencing
and recovery to make sure that all the mounts are operational again.
The main thing it does is re-use the same locks across a lot of files to
ensure that lock recovery didn't lose any locks that stop forward
progress.

But I also threw in a test of the committed_seq machinery, as a bit of
belt and suspenders.  The problem is the test is racey.  It samples the
seq after the write so the greatest seq it rememebers can be after the
write and will not be committed by the other nodes reads.  It being less
than the committed_seq is a totally reasonable race.

Which explains why this test has been rarely failing since it was
written.  There's no particular reason to test the committed_seq
machinery here, so we can just remove that racey test.

Signed-off-by: Zach Brown <zab@versity.com>
2023-10-09 10:56:15 -07:00
Zach Brown
597208324d Merge pull request #140 from versity/zab/v1.16
v1.16 Release
2023-10-04 11:51:45 -07:00
Zach Brown
8596c9ad45 v1.16 Release
Finish the release notes for the 1.16 release.

Signed-off-by: Zach Brown <zab@versity.com>
v1.16
2023-10-04 10:32:55 -07:00
Zach Brown
8a705ea380 Merge pull request #139 from versity/zab/hold_commit_stuck
Start server commits when holds wait for alloc
2023-10-04 10:27:12 -07:00
Zach Brown
4784ccdfd5 Start server commits when holds wait for alloc
Server code that wants to dirty blocks by holding a commit won't be
allowed to until the current allocators for the server transaction have
enough space for the holder.  As an active holder applies the commit the
allocators are refilled and the waiting holders will proceed.

But the current allocators can have no resources as the server starts
up.  There will never be active holders to apply the commit and refill
the allocators.  In this case all the holders will block indefinitely.

The fix is to trigger a server commit when a holder doesn't have room.
It used to be that commits were only triggered when apply callers were
waiting.  We transfer some of that logic into a new 'committing' field
so that we can have commits in flight without apply callers waiting.  We
add it to the server commit tracing.

While we're at it we clean up the logic that tests if a hold can
proceed.  It used to be confusingly split across two functions that both
could sample the current allocator space remaining.  This could lead to
weird cases where the first holder could use the second alloc remaining
call, not the one whose values were tested to see if the holder could
fit.  Now each hold check only samples the allocators once.

And finally we fix a subtle case where the budget exceeded message can
spuriously trigger in the case where dirtying the freed list created a
new empty block after the holder recorded the amount of space in the
freed block.

Signed-off-by: Zach Brown <zab@versity.com>
2023-10-03 13:32:09 -07:00
Zach Brown
778c2769df Merge pull request #132 from versity/zab/v1.15
v1.15 Release
2023-07-17 13:02:10 -07:00
Zach Brown
9e3529060e v1.15 Release
Finish the release notes for the 1.15 release.

Signed-off-by: Zach Brown <zab@versity.com>
v1.15
2023-07-17 12:07:13 -07:00
Zach Brown
1672b3ecec Merge pull request #130 from versity/zab/noncontig_alloc_einval
Fix partial preallocation when _contig_only = 0
2023-07-17 10:21:18 -07:00
Zach Brown
55f9435fad Fix partial preallocation when _contig_only = 0
Data preallocation attempts to allocate large aligned regions of
extents.  It tried to fill the hole around a write offset that
didn't contain an extent.  It missed the case where there can be
multiple extents between the start of the region and the hole.
It could try to overwrite these additional existing extents and writes
could return EINVAL.

We fix this by trimming the preallocation to start at the write offset
if there are any extents in the region before the write offset.  The
data preallocation test output has to be updated now that allocation
extents won't grow towards the start of the region when there are
existing extents.

Signed-off-by: Zach Brown <zab@versity.com>
2023-07-17 09:36:09 -07:00
Zach Brown
072f6868d3 Merge pull request #131 from versity/zab/server_merge_splice_failure
Process log merge splicing in many commits
2023-07-15 21:03:32 -07:00
Zach Brown
8a64b46a2f Process log merge splicing in many commits
Log merge completions were spliced in one server commit.  It's possible
to get enough completion work pending that it all can't be completed in
one server commit.  Operations fail with ENOSPC and because these
changes can't be unwound cleanly the server asserts.

This allows the completion splicing to break the work up into multiple
commits.

Processing completions in multiple commits means that request creation
can observe the merge status in states that weren't possible before.
Splicing is careful to maintain an elevated nr_complete count while the
client can't get requests because the tree is rebalancing.

Signed-off-by: Zach Brown <zab@versity.com>
2023-07-14 13:28:29 -07:00
Zach Brown
14901c39aa Merge pull request #129 from versity/zab/v1.14
v1.14 Release
2023-06-29 11:30:01 -07:00
Zach Brown
e095127ae9 v1.14 Release
Finish the release notes for the 1.14 release.

Signed-off-by: Zach Brown <zab@versity.com>
v1.14
2023-06-29 10:03:53 -07:00
Zach Brown
a9da27444f Merge pull request #128 from versity/zab/prealloc_fragmentation
Zab/prealloc fragmentation
2023-06-29 09:57:32 -07:00
Zach Brown
49fe89741d Merge pull request #125 from versity/zab/get_referring_entries
Zab/get referring entries
2023-06-29 09:57:06 -07:00
Zach Brown
847916860d Advance move_blocks extent search offset
The move_blocks ioctl finds extents to move in the source file by
searching from the starting block offset of the region to move.
Logically, this is fine.  After each extent item is deleted the next
search will find the next extent.

The problem is that deleted items still exist in the item cache.  The
next iteration has to skip over all the deleted extents from the start
of the region.  This is fine with large extents, but with heavily
fragmented extents this creates a huge amplification of the number of
items to traverse when moving the fragmented extents in a large file.
(It's not quite O(n^2)/2 for the total extents, deleted items are purged
as we write out the dirty items in each transaction.. but it's still
immense.)

The fix is to simply start searching for the next extent after the one
we just moved.

Signed-off-by: Zach Brown <zab@versity.com>
2023-06-28 16:54:28 -07:00
Zach Brown
564b942ead Write test for hole filling noncontig prealloc
Add a test which exercises filling holes in prealloc regions when the
_contig_only prealloc option is not set.

Signed-off-by: Zach Brown <zab@versity.com>
2023-06-28 16:16:04 -07:00
Zach Brown
3d99fda0f6 Preallocate data around iblock when noncontig
If the _contig_only option isn't set then we try to preallocate aligned
regions of files.  The initial implementation naively only allowed one
preallocation attempt in each aligned region.  If it got a small
allocation that didn't fill the region then every future allocation
in the region would be a single block.

This changes every preallocation in the region to attempt to fill the
hole in the region that iblock fell in.  It uses an extra extent search
(item cache search) to try and avoid thousands of single block
allocations.

Signed-off-by: Zach Brown <zab@versity.com>
2023-06-28 12:21:25 -07:00
Zach Brown
6c0ab75477 Merge pull request #126 from versity/zab/rht_block_shrink_deadlock
Avoid deadlock from block reclaim in rht resize
2023-06-16 10:30:16 -07:00
Zach Brown
89b238a5c4 Add more acceptable quorum delay during testing
Loaded VMs can see a few more seconds delay.

Signed-off-by: Zach Brown <zab@versity.com>
2023-06-16 09:38:58 -07:00
Zach Brown
05371b83f0 Update expected console messages during testing
Signed-off-by: Zach Brown <zab@versity.com>
2023-06-16 09:37:37 -07:00
Zach Brown
acafb869e7 Avoid deadlock from block reclaim in rht resize
The RCU hash table uses deferred work to resize the hash table.  There's
a time during resize when hash table iteration will return EAGAIN until
resize makes more progress.  During this time resize can perform
GFP_KERNEL allocations.

Our shrinker tries to iterate over its RCU hash table to find blocks to
reclaim.  It tries to restart iteration if it gets EAGAIN on the
assumption that it will be usable again soon.

Combine the two and our shrinker can get stuck retrying iteration
indefinitely because it's shrinking on behalf of the hash table resizing
that is trying to allocate the next table before making iteration work
again.  We have to stop shrinking in this case so that the resizing
caller can proceed.

Signed-off-by: Zach Brown <zab@versity.com>
2023-06-15 14:45:26 -07:00
Zach Brown
74c5fe1115 Add get-referring-entries test
Signed-off-by: Zach Brown <zab@versity.com>
2023-06-14 14:12:10 -07:00
Zach Brown
2279e9657f Add get_referring_entries scoutfs command
Add a cli command for the get_referring_entries ioctl.

Signed-off-by: Zach Brown <zab@versity.com>
2023-06-14 14:12:10 -07:00
Zach Brown
707752a7bf Add get_referring_entries ioctl
Add an ioctl that gives the callers all entries that refer to an inode.
It's like a backwards readdir.  It's a light bit of translation between
the internal _add_next_linkrefs() list of entries and the ioctl
interface of a buffer of entry structs.

Signed-off-by: Zach Brown <zab@versity.com>
2023-06-14 14:12:10 -07:00
Zach Brown
0316c22026 Extend scoutfs_dir_add_next_linkrefs
Extend scoutfs_dir_add_next_linkref() to be able to return multiple
backrefs under the lock for each call and have it take an argument to
limit the number of backrefs that can be added and returned.

Its return code changes a bit in that it returns 1 on success instead of
0 so we have to be a little careful with callers who were expecting 0.
It still returns -ENOENT when no entries are found.

We break up its tracepoint into one that records each entry added and
one that records the result of each call.

This will be used by an ioctl to give callers just the entries that
point to an inode instead of assembling full paths from the root.

Signed-off-by: Zach Brown <zab@versity.com>
2023-06-14 14:12:10 -07:00
Zach Brown
5a1e5639c2 Merge pull request #124 from versity/zab/fix_quo_hb_mount_option
Zab/fix quo hb mount option
2023-06-07 10:50:32 -07:00
Zach Brown
950963375b Update quorum heartbeat test for mount option
Update the quorum_heartbeat_timeout_ms test to also test the mount
option, not just updating the timeout via sysfs.  This takes some
reworking as we have to avoid the active leader/server when setting the
timeout via the mount option.  We also allow for a bit more slack around
comparing kernel sleeps and userspace wall clocks.

Signed-off-by: Zach Brown <zab@versity.com>
2023-05-23 09:57:13 -07:00
Zach Brown
e52435b993 Add t_mount_opt
Add a test helper that mounts with a mount option.

Signed-off-by: Zach Brown <zab@versity.com>
2023-05-22 16:30:01 -07:00
Zach Brown
2b72c57cb0 Fix crash in quorum_heartbeat_timeout_ms parsing
Mount option parsing runs early enough that the rest of the option
read/write serialization infrastructure isn't set up yet.  The
quorum_heartbeat_timeout_ms mount option tried to use a helper that
updated the stored option but it wasn't initialized yet so it crashed.

The helper was really only to have the option validity test in one
place.  It's reworked to only verify the option and the actual setting
is left to the callers.

Signed-off-by: Zach Brown <zab@versity.com>
2023-05-22 16:29:56 -07:00
Zach Brown
9c67b2a42d Merge pull request #122 from versity/zab/v1.13
v1.13 Release
2023-05-19 11:38:48 -07:00
Zach Brown
0b38aeb5a4 v1.13 Release
Finish the release notes for the 1.13 release.

Signed-off-by: Zach Brown <zab@versity.com>
v1.13
2023-05-19 10:38:40 -07:00
Zach Brown
2daf873983 Merge pull request #121 from versity/zab/heartbeat_fencing_tweaks
Zab/heartbeat fencing tweaks
2023-05-18 17:10:40 -07:00
Zach Brown
904c5dce90 Filter forced unmount transaction commit error
Add a transaction commit error message to the set of errors we ignore
when triggering forced unmount.

Signed-off-by: Zach Brown <zab@versity.com>
2023-05-18 15:50:34 -07:00
Zach Brown
57c6d78df8 Add test of quorum heartbeat timeout setting
Signed-off-by: Zach Brown <zab@versity.com>
2023-05-18 15:50:33 -07:00
Zach Brown
74e9d0f764 Silence test syfs option failure
If setting a sysfs option failes the bash write error is output.  It
contains the script line number which can fail over time, leading to
mismatched golden output failures if we used the output as an expected
indication of failure.  Callers should test its rc and output
accordingly if they want the failure logged and compared.

Signed-off-by: Zach Brown <zab@versity.com>
2023-05-18 11:15:28 -07:00
Zach Brown
98eb0eb649 Add t_quorum_nrs test helper
Add a quick function that outputs the fs numbers of the quorum mounts.

Signed-off-by: Zach Brown <zab@versity.com>
2023-05-18 11:15:28 -07:00
Zach Brown
15de0c21c1 Have quorum drop messages on force unmount
Forced unmount is supposed to isolate the mount from the world.  The
net.c TCP messaging returns errors when sending during forced unmount.
The quorum code has its own UDP messaging and wasn't taking forced
unmount into account.

This lead to quorum still being able to send resignation messages to
other quorum peers during forced unmount, making it hard to test
heartbeat timeouts with forced unmount.

The quorum messaging is already unreliable so we can easily make it drop
messages during forced unmount.  Now forced unmount more fully isolates
the quorum code and it becomes easier to test.

Signed-off-by: Zach Brown <zab@versity.com>
2023-05-18 10:01:19 -07:00
Zach Brown
7b65767803 Track and log quorum heartbeat delays
Add tracking and reporting of delays in sending or receiving quorum
heartbeat messages.  We measure the time between back to back sends or
receives of heartbeat messages.  We record these delays truncated down
to second granularity in the quorum sysfs status file.  We log messages
to the console for each longest measured delay up to the maximum
configurable heartbeat timeout.

Signed-off-by: Zach Brown <zab@versity.com>
2023-05-17 14:44:27 -07:00