This works together with the dropped block_write_full_page(), allowing
us to drop the _writepage() method as long as we implement
_writepages(). Since v5.19-rc3-395-g67235182a41c. This used to be the
.migratepage() method.
Signed-off-by: Auke Kok <auke.kok@versity.com>
This caller of scoutfs_get_block is now actively used in el10 and
the WARN_ON_ONCE(!lock) in data.c:567 triggers. Add the
scoutfs_per_task_add_excl/del calls in scoutfs_readpage,
scoutfs_readpages, and scoutfs_readahead to register the cluster
lock for scoutfs_get_block_read.
Add unconditionally rather than guarded by the add_excl return,
since these methods can be reached reentrantly from a top-level
read that already added the entry. Skipping the I/O in that case
left BUG_ON(!list_empty(pages)) in scoutfs_readpages and the page
locked in scoutfs_readpage.
Move scoutfs_per_task_del before scoutfs_unlock to match the
ordering used by file.c read/write paths.
Signed-off-by: Auke Kok <auke.kok@versity.com>
Since v5.1-rc3-29-gaa30f47cf666, and in el9, there are changes to reduce
the amount of boilerplate code needed to hook up lots of attribute files
using a .default_groups member. In el10, this is the required method as
.default_attrs has been removed. This touches every sysfs part that we
have.
Signed-off-by: Auke Kok <auke.kok@versity.com>
In v6.9-rc4-8-gead083aeeed9, this now takes a struct file argument,
adding to the ifdef salad we've got going on here.
Signed-off-by: Auke Kok <auke.kok@versity.com>
Since ~v6.5-rc1-95-g0d72b92883c6, generic_fillattr() asks us to pass
through the request_mask from the caller. This allows it to only
request a subset.
Signed-off-by: Auke Kok <auke.kok@versity.com>
Yet another major shrinker API evolution in v6.6-rc4-53-gc42d50aefd17.
The struct shrinker now has to be dynamically allocated. This is
purposely a backwards incompatible break.
Collapse the previous KC_ALLOC_SHRINKER, KC_INIT_SHRINKER_FUNCS,
and KC_REGISTER_SHRINKER macros into a single KC_SETUP_SHRINKER
macro. The three operations have to happen in different orders on
different kernel APIs (the name is needed at alloc time on el10
and at register time on KC_SHRINKER_NAME kernels), so coupling
them keeps the ordering correct per kernel.
Add KC_SHRINKER_IS_NULL so callers can detect shrinker_alloc()
failure on el10 and return -ENOMEM. The macro compiles to a
constant 0 on older kernels where the shrinker is an embedded
struct that cannot fail allocation.
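As a rough userspace illustration of the idea only (the TOY_* names, the struct layout, and the TOY_DYNAMIC_SHRINKER switch are all hypothetical and are not the actual kcompat macros), a null-check macro that folds to a constant on the embedded-struct variant might look like:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct shrinker. */
struct toy_shrinker {
	int seeks;
};

#ifdef TOY_DYNAMIC_SHRINKER
/* Newer style: the holder stores a pointer that allocation may leave NULL. */
#define TOY_SHRINKER_FIELD(name)	struct toy_shrinker *name
#define TOY_SHRINKER_IS_NULL(shr)	((shr) == NULL)
#else
/* Older style: the struct is embedded, so there is nothing that can fail. */
#define TOY_SHRINKER_FIELD(name)	struct toy_shrinker name
#define TOY_SHRINKER_IS_NULL(shr)	(0)
#endif

struct toy_cache {
	TOY_SHRINKER_FIELD(shrinker);
};

/* The caller's error check reads the same on both variants. */
static int toy_cache_setup(struct toy_cache *cache)
{
	if (TOY_SHRINKER_IS_NULL(cache->shrinker))
		return -ENOMEM;
	return 0;
}
```

On the embedded variant the check compiles away, so callers need no ifdef of their own.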
Signed-off-by: Auke Kok <auke.kok@versity.com>
The return type has always been int, so we just need to check the return
value and act on it. We could return -ENOMEM here as well; either way
it'll fall all the way through no matter what.
This is since v6.4-rc2-100-g83f2caaaf9cb.
Signed-off-by: Auke Kok <auke.kok@versity.com>
In v6.8-9146-gc759e609030c, the second argument for __assign_str() was
removed, as the second parameter is already derived from the __string()
definition and no longer needed. We have to do a little digging in
headers here to find the definition.
Note the missing `;` at a few places... it has to be added now.
Signed-off-by: Auke Kok <auke.kok@versity.com>
v6.9-rc4-29-g203c1ce0bb06 removes bd_inode. The canonical replacement is
bd_mapping->host, where applicable. We have one use where we directly
need the mapping instead of the inode, as well.
Signed-off-by: Auke Kok <auke.kok@versity.com>
We can no longer define a struct that ends with a flex array member as
`val[0]`: the compiler now balks at this since, technically, the spec
considers it unsanitary. As a result, however, we can't memcpy to
`struct->val`, since that's treated as a pointer and we'd be writing
something of a different length (u8s in our case) into something of
pointer size. So there we have to do the opposite, and memcpy to
`&struct->val[0]`.
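A minimal userspace sketch of the pattern, assuming the member is redeclared as a C99 flexible array member (`val[]`); the struct and function names here are made up for illustration and are not from the tree:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type ending in a flexible array member. */
struct demo_rec {
	unsigned short val_len;
	unsigned char val[];	/* assumed replacement for val[0] */
};

/* Allocate a record with room for `len` value bytes and copy them in. */
static struct demo_rec *demo_rec_alloc(const void *src, unsigned short len)
{
	struct demo_rec *rec = malloc(sizeof(*rec) + len);

	if (!rec)
		return NULL;

	rec->val_len = len;
	/* Name the first element explicitly as the copy destination. */
	memcpy(&rec->val[0], src, len);
	return rec;
}
```

The caller owns the returned record and frees it with free().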
Signed-off-by: Auke Kok <auke.kok@versity.com>
In v6.12-rc1-3-g5f60d5f6bbc1, asm/unaligned.h only included
asm-generic/unaligned.h and that was cleaned up from architecture
specific things. Everyone should now include linux/unaligned.h and the
former include was removed.
A quick peek at server.c shows that while included, it no longer uses
any function from this header at all, so it can just be dropped.
Signed-off-by: Auke Kok <auke.kok@versity.com>
In v6.6-rc5-1-g077c212f0344, one can no longer directly access the
inode's i_mtime, i_atime, etc. We have to go through static inline
accessor functions to get to them. The compat code closely mimics the
new functions.
Further back, ctime accessors were added in v6.5-rc1-7-g9b6304c1d537,
and need to be applied as well.
Signed-off-by: Auke Kok <auke.kok@versity.com>
In v6.1-rc5-2-ge9a688bcb193, get_random_u32_below() becomes available and
can start replacing prandom_bytes_max(). Switch to it where we can.
get_random_bytes() has been available since el7, so also replace
prandom_bytes() where we're using it.
Signed-off-by: Auke Kok <auke.kok@versity.com>
In RHEL10, the grep version is bumped from 3.6 to 3.11, and grep
no longer recognizes the \Z escape.
We have 2 solutions: We can either choose to use `grep -P` to
continue using it, or, alternatively, we can choose a different
`null` match to have an effectively empty exclude list.
The latter seems easy enough: By default, we can just exclude
empty lines ("^$") obtaining the exact same behavior as before.
Signed-off-by: Auke Kok <auke.kok@versity.com>
This is somewhat cumbersome: we want to see the error message, but the
format changes enough to make this messy. We opt to change the golden to
the new format, which only shows one of the arguments in its error
output: the thing that cannot be overwritten. We then add a filter that
rewrites the old output format with sed patterns to be exactly like the
new format, so this will work everywhere again, without changing or
adding filters to obscure error messages.
Signed-off-by: Auke Kok <auke.kok@versity.com>
The new format in el10 has non-hex output, separated by a comma. Add the
additional filter string so this works as expected.
Signed-off-by: Auke Kok <auke.kok@versity.com>
scoutfs_quota_mod_rule calls scoutfs_item_create/delete which use
the transaction allocator but it never held it. Without the hold,
a concurrent transaction commit can call scoutfs_alloc_init to
reinitialize the allocator while dirty_alloc_blocks is in the middle
of setting up the freed list block. This overwrites alloc->freed with
the server's fresh (empty) state, causing a blkno mismatch BUG_ON
in list_block_add.
Reproduced by stressing concurrent quota add/del operations across
mounts. Crashdump analysis confirms dirty_list_block COW'd a freed
block (fr_old=9842, new blkno=9852) but by the time list_block_add
ran, freed.ref.blkno was 0 with first_nr=0 and total_nr=0: the freed
list head had been zeroed by a concurrent alloc_init.
Fix by adding scoutfs_hold_trans/scoutfs_release_trans around the
item modification in scoutfs_quota_mod_rule, preventing transaction
commit from racing with the allocator use.
Rename the 'unlock' label to 'release' since 'out' now directly
does the unlock. The unlock safely handles a NULL lock.
Signed-off-by: Auke Kok <auke.kok@versity.com>
Filter empty lines from dmesg, without filtering overly broadly, so
dmesg.new doesn't trigger a test failure. I don't want to overly
process dmesg, so do this as late as possible.
The xfs lockdep patterns can forget a leading/trailing empty line,
causing a failure despite the explicit removal of the lockdep
false positive.
Signed-off-by: Auke Kok <auke.kok@versity.com>
This already caught xfs_nondir_ilock_class, but recent CI runs
have been hitting xfs_dir_ilock_class, too.
Signed-off-by: Auke Kok <auke.kok@versity.com>
The gcc version in el7 can't determine that scoutfs_block_check_stale
won't return ret = 0 when the input ret value is < 0, and
errors because we might call alloc_wpage with an uninitialized
read_seq. Initialize it to 0 to avoid it.
Signed-off-by: Auke Kok <auke.kok@versity.com>
Readers currently accumulate all finalized log tree deltas into
a single bucket for deciding whether they are already in fs_root
or not. However, finalized trees that aren't inputs to a current merge
will have higher seqs, and thus we may be double applying deltas
already merged into fs_root.
To distinguish, scoutfs_totl_merge_contribute() needs to know the
merge status item seq. We change wkic's get_roots() from using the
SCOUTFS_NET_CMD_GET_ROOTS RPC to reading the superblock directly.
This is needed because totl merge resolution has to use the same data
as the btree roots it is operating on, thus we can't grab it from a
SCOUTFS_NET_CMD_GET_ROOTS packet - it likely is different.
Signed-off-by: Auke Kok <auke.kok@versity.com>
These mislabeled members and enums were clearly not describing
the actual data being handled and obfuscating the intent of
avoiding mixing merge input items with non-merge input items.
Signed-off-by: Auke Kok <auke.kok@versity.com>
Before deltas were added this code path was correct, but with
deltas we can't just retry this without clearing &root, since
it would potentially double count.
The condition where this could happen is when there are deltas in
several finalized log trees, and we've made progress towards reading
some of them, and then encounter a stale btree block. The retry
would not clear the collected trees, apply the same delta as was
already applied before the retry, and thus double count.
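The failure mode can be shown with a toy userspace model. Nothing below is scoutfs code: the toy_* names and the fake stale read are assumptions made only to show why the accumulator has to be cleared before retrying.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_ESTALE (-1)

/* Hypothetical delta source; `stale_once` makes its first read fail. */
struct toy_tree {
	int64_t delta;
	int stale_once;
};

/* Add one tree's delta to *sum, or report a stale block one time. */
static int toy_read_tree(struct toy_tree *tree, int64_t *sum)
{
	if (tree->stale_once) {
		tree->stale_once = 0;
		return TOY_ESTALE;
	}
	*sum += tree->delta;
	return 0;
}

/* Sum all deltas, restarting from a cleared accumulator after a stale read. */
static int64_t toy_sum_deltas(struct toy_tree *trees, size_t nr)
{
	int64_t sum;
	size_t i;

retry:
	sum = 0;	/* without this reset, earlier deltas are counted twice */
	for (i = 0; i < nr; i++) {
		if (toy_read_tree(&trees[i], &sum) == TOY_ESTALE)
			goto retry;
	}
	return sum;
}
```

With deltas 4, 6 and 1, where the second tree is stale on its first read, the reset gives 11; dropping the reset would give 15 because the 4 is applied on both passes.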
Signed-off-by: Auke Kok <auke.kok@versity.com>
Two different clients can write deltas for totl indexes at the same
time, recording their changes. When merged, a reader should apply both
in order, and only once. To do so, the seq determines whether the delta
has been applied already.
The code fails to update the seq while walking the trees for deltas to
apply. Later, when processing subsequent trees, it could
re-process deltas already applied. In case of a large negative delta
(e.g. removal of large amounts of files), the totl value could become
negative, resulting in quota lockout.
The fix is simple: advance the seq when reading partial delta merges
to avoid double counting.
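A toy userspace model of seq gating, for illustration only. The toy_* types are hypothetical, and the sketch assumes deltas are visited in increasing seq order, which may not match how the real merge code walks the trees.

```c
#include <stddef.h>
#include <stdint.h>

/* A delta tagged with the seq of the transaction that wrote it. */
struct toy_delta {
	uint64_t seq;
	int64_t change;
};

/* Running total plus the highest seq already folded into it. */
struct toy_total {
	uint64_t seq;
	int64_t value;
};

/*
 * Apply every delta newer than the total's seq. Advancing total->seq
 * after each applied delta is what stops a later pass over the same
 * deltas from counting them again.
 */
static void toy_apply_deltas(struct toy_total *total,
			     const struct toy_delta *deltas, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (deltas[i].seq <= total->seq)
			continue;	/* already reflected in the total */
		total->value += deltas[i].change;
		total->seq = deltas[i].seq;
	}
}
```

If the seq were left untouched, a second call with the same deltas would add them again, which is the double counting described above.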
Signed-off-by: Auke Kok <auke.kok@versity.com>
Add a trigger that forces btree_merge() to return -ERANGE after
modifying a leaf's worth of items, causing many small partial merges
per merge cycle. This is used by tests to reliably reproduce races
that depend on partial merges splicing items into fs_root while
finalized logs still exist.
The trigger check lives inside btree_merge() where it can observe
actual item modification progress, rather than overriding the
caller's dirty byte limit argument which applies to the whole
writer context.
Signed-off-by: Auke Kok <auke.kok@versity.com>
merge_read_item() fails to update found->seq when combining delta items
from multiple finalized log trees. Add a test case to replicate the
conditions of this issue.
Each of 5 mounts sets totl value 1 on 2500 shared keys, giving an
expected total of 5 per key. Any total > 5 proves double-counting
from a stale seq.
The log_merge_force_partial trigger forces many partial merges per
cycle, creating the conditions where stale-seq items get spliced into
fs_root while finalized logs still exist. Parallel readers on all
mounts race against this window to detect double-counted values.
Signed-off-by: Auke Kok <auke.kok@versity.com>
We had no basic testing for `scoutfs read-xattr-index` whatsoever. This
adds your basic negative argument tests, lifecycle tests, the
deduplicated reads, and partial removal.
This exposes a bug in deletion where the indx entry isn't cleaned up
on inode delete.
Signed-off-by: Auke Kok <auke.kok@versity.com>
During inode deletion, scoutfs_xattr_drop forgot to set the xid
of the xattr after calling parse_indx_key, which hardcodes xid=0 and
leaves setting it to the caller. delete_force then deletes the wrong
key, and returns no errors on nonexistent keys.
So now there is a pending deletion for a nonexistent indx and an
orphan indx entry in the tree. Subsequent calls to `scoutfs
read-xattr-index` will thus return entries for deleted inodes.
Signed-off-by: Auke Kok <auke.kok@versity.com>
This xfs lockdep stack trace has at least 2 variants around
fs_reclaim, so try and capture it not too precisely here.
We can remove "lockdep disabled" in the $re grep -v, because it
can affect both this and the kasan one.
Signed-off-by: Auke Kok <auke.kok@versity.com>
Add a reclaim_skip_finalize trigger that prevents reclaim from
setting FINALIZED on log_trees entries. The test arms this trigger,
force-unmounts a client to create an orphan, and verifies the log
merge succeeds without timeout and the orphan reclaim message
appears in dmesg.
Signed-off-by: Auke Kok <auke.kok@versity.com>
An unfinalized log_trees entry whose rid is not in mounted_clients
is an orphan left behind by incomplete reclaim. Previously this
permanently blocked log merges because the finalize loop treated it
as an active client that would never commit.
Call reclaim_open_log_tree for orphaned rids before starting a log
merge. Once reclaimed, the existing merge and freeing paths include
them normally.
Also skip orphans in get_stable_trans_seq so their open transaction
doesn't artificially lower the stable sequence.
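The orphan condition can be expressed as a small predicate. This is a toy userspace sketch: the toy_* types and the flat array of mounted rids are stand-ins, not the server's actual data structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a log_trees entry: owning rid and FINALIZED state. */
struct toy_log_trees {
	uint64_t rid;
	bool finalized;
};

/* Linear search of a hypothetical list of mounted client rids. */
static bool toy_rid_mounted(const uint64_t *mounted, size_t nr, uint64_t rid)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (mounted[i] == rid)
			return true;
	}
	return false;
}

/* An orphan is unfinalized and owned by a rid that is no longer mounted. */
static bool toy_is_orphan(const struct toy_log_trees *lt,
			  const uint64_t *mounted, size_t nr)
{
	return !lt->finalized && !toy_rid_mounted(mounted, nr, lt->rid);
}
```

Entries that match this predicate are the ones to reclaim before the merge starts and to skip when computing the stable sequence.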
Signed-off-by: Auke Kok <auke.kok@versity.com>
Basic testing for the punch-offline ioctl code. The tests consist of a
bunch of negative testing to make sure things that are expressly not
allowed fail, followed by a bunch of known-expected outcome tests that
punch holes in several patterns, verifying them.
Signed-off-by: Auke Kok <auke.kok@versity.com>
A minimal punch_offline ioctl wrapper. Argument style is adopted from
stage/release.
Following the syntax for the option of stage/release, this calls the
punch offline ioctl, punching any offline extent within the designated
range from offset with length.
Signed-off-by: Auke Kok <auke.kok@versity.com>
Add an archive layer ioctl for converting offline extents into sparse
extents without relying on or modifying data_version. This is helpful
when working with files with very large sparse regions.
Signed-off-by: Zach Brown <zab@versity.com>
Signed-off-by: Auke Kok <auke.kok@versity.com>