Commit Graph

1782 Commits

Author SHA1 Message Date
Auke Kok
8a7bc0cdfa Remove the use of backing_dev_info pt from address_space.
Instead, use the new inline inode_to_bdi from <backing-dev.h> to fill
in the task's backing_dev_info.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:35:48 -04:00
Auke Kok
e81d16f8db Do not use MS_* flags anymore in kernel space.
MS_* flags from <linux/mount.h> should not be used in the kernel
anymore from 4.x onwards. Instead, we need to use the SB_* versions

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:35:48 -04:00
Zach Brown
bad0455e28 Use count/scan objects shrinking interface
Move to the more recent interfaces for counting and scanning cached
objects to shrink.

Signed-off-by: Zach Brown <zab@versity.com>
Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:35:48 -04:00
Auke Kok
0a30c0b926 Use page->lru instead of page->list
With v3.14-rc1-10-g34bf6ef94a83, page->list is removed Instead,
use the union member ->lru.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:35:48 -04:00
Zach Brown
84a4000c85 Use more modern bio interfaces
Move towards modern bio intefaces, while unfortunately carrying along a
bunch of compat functions that let us still work with the old
incompatible interfaces.

Signed-off-by: Zach Brown <zab@versity.com>
Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:35:48 -04:00
Zach Brown
859f63e49b Use memalloc_nofs_save
memalloc_nofs_save() was introduced as preferential to trying to use GFP
flags to indicate that a task should not recurse during reclaim.  We use
it instead of the _noio_ we were using before.

Signed-off-by: Zach Brown <zab@versity.com>
2023-08-01 16:35:48 -04:00
Zach Brown
588bdb7969 Use percpu_counter_add_batch
__percpu_counter_add_batch was renamed to make it clear that the __
doesn't mean it's less safe, as it means in other calls in the API, but
just that it takes an additional parameter.

Signed-off-by: Zach Brown <zab@versity.com>
Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:35:48 -04:00
Auke Kok
b894f6b04c Use __posix_acl_create/_chmod and add backwards compatibility
There are new interfaces available but the old one has been retained
for us to use. In case of older kernels, we will need to fall back
to the previous name of these functions.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:35:46 -04:00
Auke Kok
e26573ae8e Fix argument test for __posix_acl_valid.
The argument is fixed to be user_namespace, instead of user_ns.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:34:50 -04:00
Auke Kok
3f6b98496f Use setattr_preapre() as inode_change_ok() was removed in v4.8-rc1
Instead, we can call setattr_prepare() directly. We provide a fallback
for older kernels.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:34:49 -04:00
Auke Kok
b8a378ede7 Use the new inode->i_version manipulation methods.
Provide fallback in degraded mode for kernels pre-v4.15-rc3 by directly
manipulating the member as needed.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:33:28 -04:00
Auke Kok
4b08e79988 inode->i_mutex has been replaced with inode->i_rwsem.
Since v4.6-rc3-27-g9902af79c01a, inode->i_mutex has been replaced
with ->i_rwsem. However, long since whenever, inode_lock() and
related functions already worked as intended and provided fully
exclusive locking to the inode.

To avoid a name clash on pre-rhel8 kernels, we have to rename a
stack variable in `src/file.c`.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:33:28 -04:00
Auke Kok
2ac28c4969 New inode->i_version API requires <iversion.h>
Since v4.15-rc3-4-gae5e165d855d, <linux/iversion.h> contains a new
inode->i_version API and it is not included by default.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:33:28 -04:00
Auke Kok
3608d1aae1 use $(MAKE) to allow passing jobserver flags.
With this, we can `make -jX` to speed up compiles a bit from
the kmod folder.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:33:28 -04:00
Auke Kok
f13757f0af module_init/_exit should have a semicolon at eol.
In the past this was not needed but since el7 onwards these macros
should require the semicolon.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:33:28 -04:00
Auke Kok
34e6efd39c Adjust for new augmented rbtree compute callback function signature
The new variant of the code that recomputes the augmented value
is designed to handle non-scalar types and to facilitate that, it
has new semantics for the _compute callback. It is now passed a
boolean flag `exit` that indicates that if the value isn't changed,
it should exit and halt propagation.

The callback function now shall return whether that propagation should
stop or not, and not the computed new value. The callback can now
directly update the new computed value in the node.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:30:16 -04:00
Auke Kok
b452ca3d23 Add include <blkdev.h>.
Fixes: Error: implicit declaration of function ‘blkdev_put’

Previously this was an `extern` in <fs.h> and included implicitly,
hence the need to hard include it now.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 13:40:59 -04:00
Auke Kok
090c795b7e preempt_mask.h is removed entirely.
v4.1-rc4-22-g92cf211874e9 merges this into preempt.h, and on
rhel7 kernels we don't need this include anymore either.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 13:40:59 -04:00
Auke Kok
d9394cb084 page_cache_release() is removed. put_page() instead.
Even in 3.x, this already was equivalent.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 13:40:59 -04:00
Auke Kok
67ae352618 flush_work_sync is equivalent to flush_work.
v3.15-rc1-6-g1a56f2aa4752 removes flush_work_sync entirely, but
ever since v3.6-rc1-25-g606a5020b9bd which made all workqueues
non-reentrant, it has been equivalent to flush_work.

This is safe because in all cases only one server->work can be
in flight at a time.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 13:40:59 -04:00
Auke Kok
38bb5a8254 d_materialise_unique replaced with d_splice_alias.
Note argument order reversal.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 13:40:59 -04:00
Auke Kok
2510688a36 READ_ONCE() replaces ACCESS_ONCE()
v3.18-rc3-2-g230fa253df63 forces us to remove ACCESS_ONCE() with
READ_ONCE(), but it is probably the better interface and works with
non-scalar types.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 13:40:59 -04:00
Auke Kok
15a5dca8c6 PAGE_CACHE_SIZE was removed, replace with PAGE_SIZE.
PAGE_CACHE_SIZE was previously defined to be equivalent to PAGE_SIZE.

This symbol was removed in v4.6-rc1-32-g1fa64f198b9f.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 13:40:59 -04:00
Auke Kok
c3996cb021 Include kernel.h and fs.h at the top of kernelcompat.h
Because we `-include src/kernelcompat.h` from the command line,
this header gets included before any of the kernel includes in
most .c and .h files. We should at least make sure we pull in
<fs> and <kernel> since they're required.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 13:40:59 -04:00
Zach Brown
778c2769df Merge pull request #132 from versity/zab/v1.15
v1.15 Release
2023-07-17 13:02:10 -07:00
Zach Brown
9e3529060e v1.15 Release
Finish the release notes for the 1.15 release.

Signed-off-by: Zach Brown <zab@versity.com>
v1.15
2023-07-17 12:07:13 -07:00
Zach Brown
1672b3ecec Merge pull request #130 from versity/zab/noncontig_alloc_einval
Fix partial preallocation when _contig_only = 0
2023-07-17 10:21:18 -07:00
Zach Brown
55f9435fad Fix partial preallocation when _contig_only = 0
Data preallocation attempts to allocate large aligned regions of
extents.  It tried to fill the hole around a write offset that
didn't contain an extent.  It missed the case where there can be
multiple extents between the start of the region and the hole.
It could try to overwrite these additional existing extents and writes
could return EINVAL.

We fix this by trimming the preallocation to start at the write offset
if there are any extents in the region before the write offset.  The
data preallocation test output has to be updated now that allocation
extents won't grow towards the start of the region when there are
existing extents.

Signed-off-by: Zach Brown <zab@versity.com>
2023-07-17 09:36:09 -07:00
Zach Brown
072f6868d3 Merge pull request #131 from versity/zab/server_merge_splice_failure
Process log merge splicing in many commits
2023-07-15 21:03:32 -07:00
Zach Brown
8a64b46a2f Process log merge splicing in many commits
Log merge completions were spliced in one server commit.  It's possible
to get enough completion work pending that it all can't be completed in
one server commit.  Operations fail with ENOSPC and because these
changes can't be unwound cleanly the server asserts.

This allows the completion splicing to break the work up into multiple
commits.

Processing completions in multiple commits means that request creation
can observe the merge status in states that weren't possible before.
Splicing is careful to maintain an elevated nr_complete count while the
client can't get requests because the tree is rebalancing.

Signed-off-by: Zach Brown <zab@versity.com>
2023-07-14 13:28:29 -07:00
Zach Brown
14901c39aa Merge pull request #129 from versity/zab/v1.14
v1.14 Release
2023-06-29 11:30:01 -07:00
Zach Brown
e095127ae9 v1.14 Release
Finish the release notes for the 1.14 release.

Signed-off-by: Zach Brown <zab@versity.com>
v1.14
2023-06-29 10:03:53 -07:00
Zach Brown
a9da27444f Merge pull request #128 from versity/zab/prealloc_fragmentation
Zab/prealloc fragmentation
2023-06-29 09:57:32 -07:00
Zach Brown
49fe89741d Merge pull request #125 from versity/zab/get_referring_entries
Zab/get referring entries
2023-06-29 09:57:06 -07:00
Zach Brown
847916860d Advance move_blocks extent search offset
The move_blocks ioctl finds extents to move in the source file by
searching from the starting block offset of the region to move.
Logically, this is fine.  After each extent item is deleted the next
search will find the next extent.

The problem is that deleted items still exist in the item cache.  The
next iteration has to skip over all the deleted extents from the start
of the region.  This is fine with large extents, but with heavily
fragmented extents this creates a huge amplification of the number of
items to traverse when moving the fragmented extents in a large file.
(It's not quite O(n^2)/2 for the total extents, deleted items are purged
as we write out the dirty items in each transaction.. but it's still
immense.)

The fix is to simply start searching for the next extent after the one
we just moved.

Signed-off-by: Zach Brown <zab@versity.com>
2023-06-28 16:54:28 -07:00
Zach Brown
564b942ead Write test for hole filling noncontig prealloc
Add a test which exercises filling holes in prealloc regions when the
_contig_only prealloc option is not set.

Signed-off-by: Zach Brown <zab@versity.com>
2023-06-28 16:16:04 -07:00
Zach Brown
3d99fda0f6 Preallocate data around iblock when noncontig
If the _contig_only option isn't set then we try to preallocate aligned
regions of files.  The initial implementation naively only allowed one
preallocation attempt in each aligned region.  If it got a small
allocation that didn't fill the region then every future allocation
in the region would be a single block.

This changes every preallocation in the region to attempt to fill the
hole in the region that iblock fell in.  It uses an extra extent search
(item cache search) to try and avoid thousands of single block
allocations.

Signed-off-by: Zach Brown <zab@versity.com>
2023-06-28 12:21:25 -07:00
Zach Brown
6c0ab75477 Merge pull request #126 from versity/zab/rht_block_shrink_deadlock
Avoid deadlock from block reclaim in rht resize
2023-06-16 10:30:16 -07:00
Zach Brown
89b238a5c4 Add more acceptable quorum delay during testing
Loaded VMs can see a few more seconds delay.

Signed-off-by: Zach Brown <zab@versity.com>
2023-06-16 09:38:58 -07:00
Zach Brown
05371b83f0 Update expected console messages during testing
Signed-off-by: Zach Brown <zab@versity.com>
2023-06-16 09:37:37 -07:00
Zach Brown
acafb869e7 Avoid deadlock from block reclaim in rht resize
The RCU hash table uses deferred work to resize the hash table.  There's
a time during resize when hash table iteration will return EAGAIN until
resize makes more progress.  During this time resize can perform
GFP_KERNEL allocations.

Our shrinker tries to iterate over its RCU hash table to find blocks to
reclaim.  It tries to restart iteration if it gets EAGAIN on the
assumption that it will be usable again soon.

Combine the two and our shrinker can get stuck retrying iteration
indefinitely because it's shrinking on behalf of the hash table resizing
that is trying to allocate the next table before making iteration work
again.  We have to stop shrinking in this case so that the resizing
caller can proceed.

Signed-off-by: Zach Brown <zab@versity.com>
2023-06-15 14:45:26 -07:00
Zach Brown
74c5fe1115 Add get-referring-entries test
Signed-off-by: Zach Brown <zab@versity.com>
2023-06-14 14:12:10 -07:00
Zach Brown
2279e9657f Add get_referring_entries scoutfs command
Add a cli command for the get_referring_entries ioctl.

Signed-off-by: Zach Brown <zab@versity.com>
2023-06-14 14:12:10 -07:00
Zach Brown
707752a7bf Add get_referring_entries ioctl
Add an ioctl that gives the callers all entries that refer to an inode.
It's like a backwards readdir.  It's a light bit of translation between
the internal _add_next_linkrefs() list of entries and the ioctl
interface of a buffer of entry structs.

Signed-off-by: Zach Brown <zab@versity.com>
2023-06-14 14:12:10 -07:00
Zach Brown
0316c22026 Extend scoutfs_dir_add_next_linkrefs
Extend scoutfs_dir_add_next_linkref() to be able to return multiple
backrefs under the lock for each call and have it take an argument to
limit the number of backrefs that can be added and returned.

Its return code changes a bit in that it returns 1 on success instead of
0 so we have to be a little careful with callers who were expecting 0.
It still returns -ENOENT when no entries are found.

We break up its tracepoint into one that records each entry added and
one that records the result of each call.

This will be used by an ioctl to give callers just the entries that
point to an inode instead of assembling full paths from the root.

Signed-off-by: Zach Brown <zab@versity.com>
2023-06-14 14:12:10 -07:00
Zach Brown
5a1e5639c2 Merge pull request #124 from versity/zab/fix_quo_hb_mount_option
Zab/fix quo hb mount option
2023-06-07 10:50:32 -07:00
Zach Brown
950963375b Update quorum heartbeat test for mount option
Update the quorum_heartbeat_timeout_ms test to also test the mount
option, not just updating the timeout via sysfs.  This takes some
reworking as we have to avoid the active leader/server when setting the
timeout via the mount option.  We also allow for a bit more slack around
comparing kernel sleeps and userspace wall clocks.

Signed-off-by: Zach Brown <zab@versity.com>
2023-05-23 09:57:13 -07:00
Zach Brown
e52435b993 Add t_mount_opt
Add a test helper that mounts with a mount option.

Signed-off-by: Zach Brown <zab@versity.com>
2023-05-22 16:30:01 -07:00
Zach Brown
2b72c57cb0 Fix crash in quorum_heartbeat_timeout_ms parsing
Mount option parsing runs early enough that the rest of the option
read/write serialization infrastructure isn't set up yet.  The
quorum_heartbeat_timeout_ms mount option tried to use a helper that
updated the stored option but it wasn't initialized yet so it crashed.

The helper was really only to have the option validity test in one
place.  It's reworked to only verify the option and the actual setting
is left to the callers.

Signed-off-by: Zach Brown <zab@versity.com>
2023-05-22 16:29:56 -07:00
Zach Brown
9c67b2a42d Merge pull request #122 from versity/zab/v1.13
v1.13 Release
2023-05-19 11:38:48 -07:00