Very old copy/paste bug here: we want to update new_inode's ctime
instead. old_inode is already updated.
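A minimal sketch of the corrected assignment, assuming the surrounding
rename code follows the usual VFS pattern for timestamps:

    /* both inodes' ctimes must move forward on rename; the bug set
     * old_inode's ctime twice and never touched new_inode's */
    struct timespec64 now = current_time(old_inode);

    old_inode->i_ctime = now;   /* was already correct */
    new_inode->i_ctime = now;   /* the missing update */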
Signed-off-by: Auke Kok <auke.kok@versity.com>
We need to ensure we're emitting dents with the proper position,
and we already have the position as part of our dent. The only caveat
is to increment ctx->pos once beyond the list to make sure the caller
doesn't call us again.
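Roughly the emit loop this describes (the dent fields and list name
are illustrative; dir_emit() is the VFS helper):

    loff_t last_pos = ctx->pos;

    list_for_each_entry(dent, &dents, head) {
            ctx->pos = dent->pos;
            if (!dir_emit(ctx, dent->name, dent->name_len,
                          dent->ino, dent->type))
                    return 0;
            last_pos = dent->pos;
    }
    /* park the position one past the list so we aren't called again */
    ctx->pos = last_pos + 1;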
Signed-off-by: Auke Kok <auke.kok@versity.com>
While debugging a double unlock error we hit this condition, and
debugging would have been a lot easier had we enforced the simple
constraint that the lock users count can't be decremented when it's
already 0.
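A hedged sketch of the kind of guard this adds, with hypothetical
struct and field names:

    static void lock_users_dec(struct lock_info *info)
    {
            /* loudly catch underflow instead of silently wrapping */
            if (WARN_ON_ONCE(info->users == 0))
                    return;
            info->users--;
    }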
Signed-off-by: Auke Kok <auke.kok@versity.com>
Similar to fiemap, readdir and walk_inodes, this method could page
fault during put_user, potentially causing a deadlock.
Signed-off-by: Auke Kok <auke.kok@versity.com>
Similar to the readdir and fiemap vfs methods, we can't copy to user
while holding cluster locks. The previous comment about it being safe
no longer applies, and this could deadlock.
Rewrite the loop to iterate and store entries in a page, then flush
the page contents while not holding a cluster lock.
Signed-off-by: Auke Kok <auke.kok@versity.com>
Now that we support mmap writes, at any point in time we could
page fault and lock for writes. That means that, just like readdir,
we can no longer lock and copy_to_user, since it may also page fault
and thus deadlock.
We statically allocate 32 extent entries on the stack and use these
to shuffle out a batch of fiemap entries at a time, locking and
unlocking around collecting and fiemap_fill_next_extent().
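A sketch of the batching loop; fiemap_fill_next_extent() is the real
kernel helper, while the locking and collection calls here are
illustrative stand-ins:

    struct cached_extent {
            u64 logical, phys, len;
            u32 flags;
    } ext[32];
    int nr, i, ret = 0;

    do {
            /* gather a batch of up to 32 extents under the cluster lock */
            cluster_lock(inode);
            nr = collect_extents(inode, ext, 32, &pos);
            cluster_unlock(inode);

            /* fill entries unlocked; the copy to user may page fault */
            for (i = 0; i < nr && ret == 0; i++)
                    ret = fiemap_fill_next_extent(fieinfo, ext[i].logical,
                                                  ext[i].phys, ext[i].len,
                                                  ext[i].flags);
    } while (nr == 32 && ret == 0);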
Signed-off-by: Auke Kok <auke.kok@versity.com>
dir_emit() will copy_to_user, which can pagefault. If this happens while
cluster locked, we could deadlock.
We use a single page to stage dir_emit data, and iterate between
fetching dirents while locked, and emitting them while not locked.
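A sketch of the resulting two-phase loop (the staging format and fill
helper are illustrative; dir_emit() is the real VFS helper):

    for (;;) {
            /* phase 1: copy dirents into our page under the cluster lock */
            cluster_lock(inode);
            nr = fill_dirent_page(inode, page, ctx->pos);
            cluster_unlock(inode);
            if (nr <= 0)
                    break;

            /* phase 2: emit unlocked, where copy_to_user may fault */
            for (de = first_dirent(page); nr-- > 0; de = next_dirent(de)) {
                    ctx->pos = de->pos;
                    if (!dir_emit(ctx, de->name, de->name_len,
                                  de->ino, de->type))
                            return 0;
            }
    }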
Signed-off-by: Auke Kok <auke.kok@versity.com>
These two sections of readdir compat are wholly obsolete and can be
hard dropped, which restores the method to look like current upstream
code.
The compat code was added in ddd1a4e.
Signed-off-by: Auke Kok <auke.kok@versity.com>
We merely trace exit values and position, and ignore length.
Because vm_fault_t is __bitwise, sparse will loudly complain about a
plain cast to u32, so we must use __force (on el8). ret will be 512
(VM_FAULT_LOCKED) in normal cases.
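The cast in question looks roughly like this (the tracepoint name is
illustrative):

    /* vm_fault_t is __bitwise, so sparse rejects a plain (u32) cast;
     * __force marks the conversion as intentional */
    trace_scoutfs_filemap_fault_done(inode, vmf->pgoff, (__force u32)ret);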
Signed-off-by: Auke Kok <auke.kok@versity.com>
Now that all of these should be passing, we enable all mmap() tests in
xfstests, and update the golden output with the new tests.
Signed-off-by: Auke Kok <auke.kok@versity.com>
Two test programs are added. The run time is about one minute on my
el7 instance.
The test script finishes up with a read/write mmap test on offline
extents to verify the data wait paths in those functions.
One program will perform vfs read/write and mmap read/write calls on
the same file across 5 threads (mounts) repeatedly. The goal is to
ensure there are no locking issues between read/write paths.
The second test program performs consistency checking on a file that is
repeatedly written/read using memory maps and normal reads and writes,
and the content is verified after every operation.
Signed-off-by: Auke Kok <auke.kok@versity.com>
Add support for writable MAP_SHARED mmap()ings. Avoid issues with late
writepage()s building transactions by doing the block_write_begin() work in
scoutfs_data_page_mkwrite(). Ensure the page is marked dirty and prepared
for write, then let the VM complete the write when the page is flushed or
invalidated.
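A hedged sketch of the handler's shape; the block preparation helper
stands in for the block_write_begin() work described above:

    static vm_fault_t scoutfs_data_page_mkwrite(struct vm_fault *vmf)
    {
            struct page *page = vmf->page;
            struct inode *inode = file_inode(vmf->vma->vm_file);

            lock_page(page);
            if (page->mapping != inode->i_mapping) {
                    /* raced with truncate/invalidate, retry the fault */
                    unlock_page(page);
                    return VM_FAULT_NOPAGE;
            }

            /* allocate and map blocks now so that a later writepage
             * never has to build a transaction */
            if (prepare_page_write(inode, page) < 0) {
                    unlock_page(page);
                    return VM_FAULT_SIGBUS;
            }

            set_page_dirty(page);
            /* VM_FAULT_LOCKED is the 512 seen in the tracing patch */
            return VM_FAULT_LOCKED;
    }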
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Auke Kok <auke.kok@versity.com>
The list alloc blocks have an array of blknos that are offset by a start
field in the block header. The print code wasn't using that and was
always referencing the beginning of the array, which could miss blocks.
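Illustratively, with guessed field names, the print loop needs to
index from the start offset rather than from element zero:

    u32 start = le32_to_cpu(lblk->start);
    u32 nr = le32_to_cpu(lblk->nr);

    for (i = 0; i < nr; i++)
            printf("    blkno %llu\n", (unsigned long long)
                   le64_to_cpu(lblk->blknos[start + i]));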
Signed-off-by: Zach Brown <zab@versity.com>
Since kABI migrations across minor versions are a thing of the past
going forward, we now:
- Detect if we're on EL9
- If so, add a requirement on the various flavors of release package
  for that specific major.minor version
This appropriately disallows upgrades across minor versions.
Additional blkdev/bdev changes now cause this call to be removed as
well, resulting in us having to use yet another API to do the same for
el9_5.
The changes are a little more subtle, as the bdev_mount() call now
passes a custom bd_holder_ops that we must match or else trigger a
WARN_ON, so we switch to using sbi as our holder arg instead.
Make sure to use bdev_fput() and not fput(), since we don't want our
private data cleanup deferred, which fails xfstests generic/604.
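Roughly, under the el9_5-era (upstream ~v6.9) API; the open and put
calls are upstream's, while the holder ops symbol here is ours and
illustrative:

    struct file *bdev_file;

    bdev_file = bdev_file_open_by_path(path,
                                       BLK_OPEN_READ | BLK_OPEN_WRITE,
                                       sbi, &scoutfs_bd_holder_ops);
    if (IS_ERR(bdev_file))
            return PTR_ERR(bdev_file);

    /* use file_bdev(bdev_file) while mounted */

    /* on teardown: bdev_fput(), not fput(), so the bdev release and
     * our private cleanup aren't deferred (see generic/604) */
    bdev_fput(bdev_file);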
Signed-off-by: Auke Kok <auke.kok@versity.com>
The assignments to it are no longer needed at all. All references
can be dropped since v6.4-rc4-163-g0d625446d0a4.
Signed-off-by: Auke Kok <auke.kok@versity.com>
Since v6.5-rc1-7-g9b6304c1d537, current_time() is no longer
extern, so we need to update this grep regex to continue to match.
Signed-off-by: Auke Kok <auke.kok@versity.com>
Add a quick test that races readers and shrinking to stress lock object
refcount racing between concurrent lock request handling threads in the
lock server.
Signed-off-by: Zach Brown <zab@versity.com>
Right now a client requesting a null mode for a lock will cause
invalidations of all existing granted modes of the lock across the
cluster.
This is unnecessarily broad. The absolute requirement is that a null
request invalidates other existing granted modes on the client. That's
how the client safely resolves shrinking's desire to free locks while
the locks are in use. It relies on turning it into the race between use
and remote invalidation.
But that only requires invalidating existing grants from the requesting
client, not all clients. It is always safe for null grants to coexist
with all grants on other clients. Consider the existing mechanics
involving null modes. First, null locks are instantiated on the client
before sending any requests at all. At any given time newly allocated
null locks are coexisting with all existing locks across the cluster.
Second, the server frees the client entry tracking struct the moment it
sends a null grant to the client. From that point on the client's null
lock can not have any impact on the rest of the lock holders because the
server has forgotten about it.
So we add this case to the server's test that two client lock modes are
compatible. We take the opportunity to comment the heck out of this
function instead of making it a dense boolean composition. The only
functional change is the addition of this case; the existing cases are
refactored but unchanged.
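Sketched, the added case sits at the top of the compatibility test
(the names and mode constants here are an illustrative paraphrase, not
the real function):

    /* null coexists with anything granted to *other* clients; a null
     * request still conflicts with, and so invalidates, the
     * requesting client's own grants */
    if (a->mode == LOCK_MODE_NULL || b->mode == LOCK_MODE_NULL)
            return a->rid != b->rid;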
Signed-off-by: Zach Brown <zab@versity.com>
When freeing acked responses in the net layer we sweep the send and
resend queues looking for queued responses up to the sequence number
we've had acked. The code that did this used a weird pattern of
returning ints and adding them, which gave me pause. Clean it up to
use bools and bitwise or (|, not the short-circuiting ||) to more
obviously communicate what's going on.
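After cleanup the pattern looks something like this (queue and wait
names guessed); | evaluates both sweeps, where || would skip the
second once the first found work:

    bool freed;

    freed = free_acked_responses(&conn->send_queue, acked_seq) |
            free_acked_responses(&conn->resend_queue, acked_seq);
    if (freed)
            wake_up(&conn->waitq);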
Signed-off-by: Zach Brown <zab@versity.com>
Over time some fields have been added to the lock struct which haven't
been added to the lock tracing output. Add some of the more relevant
lock fields to tracing.
Signed-off-by: Zach Brown <zab@versity.com>
Lock object lifetimes in the lock server are protected by reference
counts. References are acquired while holding a lock on an rbtree.
Unfortunately, the decision to free lock objects wasn't tested while
also holding that lock on the rbtree. A caller putting their object
would test the refcount, then wait to get the rbtree lock to remove it
from the tree.
There's a possible race where the decision is made to remove the object
but another reference is added before the object is removed. This was
seen in testing and manifested as an incoming request handling path
adding a request message to the object before it was freed, losing the
message.
Clients would then hang on a lock that never saw a response because
their request was freed with the lock object.
The fix is to hold the rbtree lock when testing the refcount and
deciding to free. It adds a bit more contention but not significantly
so, given the wild existing contention on a per-fs spinlocked rbtree.
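The fixed pattern can be sketched with refcount_dec_and_lock(), which
performs the final decrement to zero only while holding the lock,
closing the window where a new reference could sneak in (struct and
field names illustrative):

    static void server_lock_put(struct lock_server *server,
                                struct server_lock *slock)
    {
            if (refcount_dec_and_lock(&slock->refcount, &server->lock)) {
                    rb_erase(&slock->node, &server->locks);
                    spin_unlock(&server->lock);
                    kfree(slock);
            }
    }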
Signed-off-by: Zach Brown <zab@versity.com>
Previously, any t_skip would cause the final test result to be a failure
because up until now no test should have been skipped.
However, with format-version-forward-back not being compatible with
el9, we are going to rely solely on el7/8 testing for that test, and
therefore we have to allow skipping of this test on el9 and newer OS
versions.
We add `t_skip_permitted` to signal this from the test case to the
run-tests.sh script. A new exit code is passed, and all accounting is
updated to reflect that a test was skipped, but this was permitted. We
modify format-version-forward-back to use this new exit path.
Signed-off-by: Auke Kok <auke.kok@versity.com>
I'm seeing more and more of these as audit is enabled in the el8 and
el9 images I am using for testing, and during ENOSPC tests this has a
chance of triggering process accounting suspension and subsequent
resume.
Signed-off-by: Auke Kok <auke.kok@versity.com>
In v1.18-10-g5507ee5, we changed the test code away from loopback
to device-mapper, which simplified our DUT setup code.
However, this results in occasional `device changed size` messages
now being emitted by the `dm` driver instead of the `loop` kernel
module. We now have to ignore these kernel messages as well.
Signed-off-by: Auke Kok <auke.kok@versity.com>