We had a one-off test that was overly specific to staging from a tmpfile.
This renames it to a more generic test where we can add more O_TMPFILE
tests in general.
Signed-off-by: Zach Brown <zab@versity.com>
Add a quick test of the index items to make sure that rapid inode
updates don't create duplicate meta_seq items.
Signed-off-by: Zach Brown <zab@versity.com>
Add a test which gives the server a transaction with a free list block
that contains blknos that each dirty an individual btree block in the
global data free extent btree.
Signed-off-by: Zach Brown <zab@versity.com>
We're seeing some trouble with very specific race conditions. This
updates the orphan-inodes test to try to force final inode deletion
during eviction, the orphan scan worker, and opening inodes by handle
to all race on the same inode number at the same time.
Signed-off-by: Zach Brown <zab@versity.com>
This unit test reproduces the race we have between
client and server doing lock recovery while farewell
is processed.
Signed-off-by: Bryant G. Duffy-Ly <bduffyly@versity.com>
We want to enable the test cases for:
generic/023 - tests that the renameat2 syscall exists
generic/024 - renameat2 with the NOREPLACE flag
Move both generic/025 and generic/078 to the no-run list so that we
can test the 'not run' output when flags that aren't supported are
passed.
Example output:
generic/025 [not run] fs doesn't support RENAME_EXCHANGE
generic/078 [not run] fs doesn't support RENAME_WHITEOUT
Signed-off-by: Bryant G. Duffy-Ly <bduffyly@versity.com>
The goal of the test case is to have two mount points make two async
calls to renameat2. This allows the two calls to race to rename onto
the same target with RENAME_NOREPLACE. When this happens we expect one
of them to fail with -EEXIST, which validates that the new flag works.
Essentially, one of the two renameat2 calls should hit the new
RENAME_NOREPLACE code and exit early.
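For illustration, one of the racing callers might look roughly like
this (the paths are placeholders and the renameat2() wrapper needs
glibc 2.28 or newer; older systems would call syscall() directly):

    #define _GNU_SOURCE
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>

    int main(void)
    {
            /*
             * Each racing process renames its own source onto the same
             * destination; only one rename can win when NOREPLACE is set.
             */
            if (renameat2(AT_FDCWD, "/mnt/test.0/src0",
                          AT_FDCWD, "/mnt/test.0/dst",
                          RENAME_NOREPLACE) < 0 && errno == EEXIST)
                    printf("lost the race: destination already exists\n");

            return 0;
    }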
Signed-off-by: Bryant G. Duffy-Ly <bduffyly@versity.com>
The current test case attempts to create state to read by calling
setattr and getattr in an attempt to force block cache reads. It so
happens that this does not always force block cache reads, which in
rare cases causes this test case to fail.
The new test case removes all the extra bouncing around of mount
points and just directly calls scoutfs df, which walks everyone's
allocators, which are guaranteed to exist, to summarize the block
counts. Therefore we do not have to create any state prior to trying
to force a read.
Signed-off-by: Bryant G. Duffy-Ly <bduffyly@versity.com>
As we update xattrs we need to update any existing old items with the
contents of the new xattr that uses those items. The loop that updated
existing items only took the old xattr size into account and assumed
that the new xattr would use those items. If the new xattr size used
fewer parts then the attempt to update all the old parts that weren't
covered by the new size would go very wrong. The length of the region
in the new xattr would be negative so it'd try to use the max part
length. Worse, it'd copy these max part length regions outside the
input new xattr buffer. Typically this would land in addressable memory
and copy garbage into the unused old items before they were later
deleted.
However, it could access so far outside the input buffer that it could
cross a page boundary into inaccessible memory and fault. We saw this in
the field while trying to repeatedly incrementally shrink a large xattr.
This fixes the loop that updates overlapping items between the new and
old xattr to start with the smaller of their two item counts. Now it
will only update items that are actually used by both xattrs and will
only safely access the new xattr input buffer.
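A minimal sketch of the fixed bound, with an assumed helper and part
size that are only for illustration:

    /* assumed per-item payload size, for illustration only */
    #define XATTR_PART_SIZE 1024u

    static unsigned int xattr_nr_parts(unsigned int total_len)
    {
            return (total_len + XATTR_PART_SIZE - 1) / XATTR_PART_SIZE;
    }

    /*
     * Only this many existing items are updated in place with regions
     * of the new value; old items past the new count are deleted and
     * new items past the old count are created, so we never read past
     * the end of the new input buffer.
     */
    static unsigned int xattr_overlapping_parts(unsigned int old_len,
                                                unsigned int new_len)
    {
            unsigned int old_parts = xattr_nr_parts(old_len);
            unsigned int new_parts = xattr_nr_parts(new_len);

            return old_parts < new_parts ? old_parts : new_parts;
    }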
Signed-off-by: Zach Brown <zab@versity.com>
The k-way merge function at the core of the srch file entry merging had
some bookkeeping math (calculating number of parents) that couldn't
handle merging a single incoming entry stream, so it threw a warning and
returned an error. In refusing to handle that case, it assumed that
the caller was trying to merge down a single log file, which doesn't
make any sense.
But in the case of multiple small unsorted logs we can absolutely end
up with their entries stored in one sorted page. That gives us one
sorted input page that is merging multiple log files. The merge
function is also the path that writes to the output file, so we
absolutely need to handle this case.
We now calculate the number of parents more carefully, clamping it to
one parent where we'd otherwise get "(roundup(1) -> 1) - 1 == 0" when
deriving the number of parents from the number of inputs. The warning
and error can then be relaxed to only refuse to merge nothing.
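A rough sketch of the clamped calculation, assuming parents come from
rounding the input count up to a power of two (the function name is
made up):

    /*
     * Hypothetical sketch: the number of internal tournament-tree
     * parents for a k-way merge.  With a single input,
     * roundup(1) - 1 would be 0, so clamp to one parent instead of
     * refusing the merge.
     */
    static unsigned int merge_nr_parents(unsigned int nr_inputs)
    {
            unsigned int rounded = 1;

            while (rounded < nr_inputs)
                    rounded <<= 1;

            return rounded > 1 ? rounded - 1 : 1;
    }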
The test triggers this case by putting single search entries in the log
files for mounts and unmounting them to force rotation of the mount log
files into mergeable rotated log files.
Signed-off-by: Zach Brown <zab@versity.com>
Add a quick test to make sure that create is validating stale dentries
before deciding if it should create or return -EEXIST.
Signed-off-by: Zach Brown <zab@versity.com>
Add the .totl. xattr tag. When the tag is set the end of the name
specifies a total name with 3 encoded u64s separated by dots. The value
of the xattr is a u64 that is added to the named total. An ioctl is
added to read the totals.
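For example, bumping a total could look roughly like this; the exact
xattr name prefix and the encoding of the u64 value are assumptions
here, not taken from this change:

    #include <stdint.h>
    #include <sys/xattr.h>

    /* illustrative only: add 'amount' to the total named 1.2.3 */
    static int add_to_total(const char *path, uint64_t amount)
    {
            /* the "scoutfs.totl." prefix and raw u64 value are assumed */
            return setxattr(path, "scoutfs.totl.1.2.3",
                            &amount, sizeof(amount), 0);
    }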
Signed-off-by: Zach Brown <zab@versity.com>
We had some logic to try to delay lock invalidation while the lock was
still actively in use. This was trying to reduce the cost of
pathological lock conflict cases but it had some severe fairness
problems.
It was first introduced to deal with bad patterns in userspace that no
longer exist and it was built on top of the LSM transaction machinery
that also no longer exists. It hasn't aged well.
Instead of introducing invalidation latency in the hopes that it leads
to more batched work, which it can't always do, let's aim more towards
reducing latency in all parts of the write-invalidate-read path and
also aim towards reducing contention in the first place.
Signed-off-by: Zach Brown <zab@versity.com>
Add a scoutfs command that uses an ioctl to send a request to the server
to safely use a device that has grown.
Signed-off-by: Zach Brown <zab@versity.com>
Returning ENOSPC is challenging because we have clients working on
allocators which are a fraction of the whole, and we use COW
transactions so we need to be able to allocate in order to free. This
adds support for returning ENOSPC to client posix allocators as free
space gets low.
For metadata, we reserve a number of free blocks for making progress
with client and server transactions which can free space. The server
sets the low flag in a client's allocator if we start to dip into
reserved blocks. In the client we add an argument to entering a
transaction which indicates if we're allocating new space (as opposed to
just modifying existing data or freeing). When an allocating
transaction runs low and the server low flag is set then we return
ENOSPC.
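In other words, something like the following check decides when a
holder sees ENOSPC; the names are illustrative, not the driver's
actual identifiers:

    #include <errno.h>
    #include <stdbool.h>

    /*
     * Illustrative sketch: an allocating holder only gets ENOSPC once
     * its transaction's allocator is running low and the server has
     * flagged that refills would dip into the reserved blocks.
     */
    static int trans_hold_enospc(bool allocating, bool trans_low,
                                 bool server_low)
    {
            if (allocating && trans_low && server_low)
                    return -ENOSPC;

            return 0;
    }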
Adding an argument to transaction holders and having it return ENOSPC
gave us the opportunity to clean it up and make it a little clearer.
More work is done outside the wait_event function and it now
specifically waits for a transaction to cycle when it forces a commit
rather than spinning until the transaction worker acquires the lock and
stops it.
For data the same pattern applies except there are no reserved blocks
and we don't COW data so it's a simple case of returning the hard ENOSPC
when the data allocator flag is set.
The server needs to consider the reserved count when refilling the
client's meta_avail allocator and when swapping between the two
meta_avail and meta_free allocators.
We add the reserved metadata block count to statfs_more so that df can
subtract it from the free meta blocks and make it clear when ENOSPC is
going to be returned for metadata allocations.
We increase the minimum device size in mkfs so that small testing
devices provide sufficient reserved blocks.
And finally we add a little test that makes sure we can fill both
metadata and data to ENOSPC and then recover by deleting what we filled.
Signed-off-by: Zach Brown <zab@versity.com>
Add a test which exercises the various reasons for fencing mounts and
checks that we reclaim the resources that they had.
Signed-off-by: Zach Brown <zab@versity.com>
Support O_TMPFILE: Create an unlinked file and put it on the orphan list.
If it ever gains a link, take it off the orphan list.
Change MOVE_BLOCKS ioctl to allow moving blocks into offline extent ranges.
Ioctl callers must set a new flag to enable this operation mode.
RH-compat: tmpfile support is actually backported by RH into the 3.10
kernel. We need to use some of their kabi-maintaining wrappers to use
it: use a struct inode_operations_wrapper instead of the base struct
inode_operations, and set the S_IOPS_WRAPPER flag in i_flags. This lets
RH's modified vfs_tmpfile() find our tmpfile fn pointer.
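A rough sketch of that pattern, with the scoutfs_* names standing in
as placeholders:

    /*
     * Sketch of the RHEL 3.10 kabi pattern: the wrapper embeds the
     * normal ops struct and carries the extra .tmpfile pointer
     * alongside it.
     */
    static const struct inode_operations_wrapper scoutfs_dir_iops = {
            .ops = {
                    .lookup = scoutfs_lookup,
                    .create = scoutfs_create,
                    /* ... */
            },
            .tmpfile = scoutfs_tmpfile,
    };

    static void scoutfs_init_dir_iops(struct inode *inode)
    {
            inode->i_op = &scoutfs_dir_iops.ops;
            /* tells RH's vfs_tmpfile() that the wrapper fields are valid */
            inode->i_flags |= S_IOPS_WRAPPER;
    }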
Add a test that covers both creating tmpfiles and moving their
contents into a destination file via MOVE_BLOCKS.
xfstests common/004 now runs because tmpfile is supported.
Signed-off-by: Andy Grover <agrover@versity.com>
The block-stale-reads test was built from the ashes of a test that
used counters and triggers to work with the btree when it was
only used on the server.
The initial quick translation to try to trigger block cache retries
while the forest called the btree got a lot wrong. It was still trying
to use a 'cl' variable that no longer referred to the client, the
trigger helpers now call statfs to find paths and can end up
triggering themselves, and many more stale reads can show up in the
counters throughout the system while we're working -- not just the one
from our trigger.
This fixes it up to consistently use fs numbers instead of
the silly stale cl variable and be less sensitive to triggers firing and
counter differences.
Signed-off-by: Zach Brown <zab@versity.com>
The previous test that triggered re-reading blocks, as though they were
stale, was written in the era where it only hit btree blocks and
everything else was stored in LSM segments.
This reworks the test to make it clear that it affects all our block
readers today. The test only exercises the core read retry path, but it
could be expanded to test callers retrying with newer references after
they get -ESTALE errors.
Signed-off-by: Zach Brown <zab@versity.com>
Add a test which stages a file in multiple parts while a long-lived
process is blocking on offline extents trying to compare the file to the
known contents.
Signed-off-by: Zach Brown <zab@versity.com>
Add a test that randomly renames entries in a single large directory.
This has caught bugs in the reservation of allocator resources for
client transactions.
Signed-off-by: Zach Brown <zab@versity.com>
Add -z option to run-tests.sh to specify metadata device.
Do a bunch of things twice.
Fix up setup-error-teardown test.
Signed-off-by: Andy Grover <agrover@versity.com>
[zab@versity.com: minor arg message fixes, golden output]
Add a test which makes sure that we don't initialize the lock server's
write version to a version less than existing log tree items.
Signed-off-by: Zach Brown <zab@versity.com>
The test that exercises re-reading stale cached blocks was still
trying to use both tiny btree blocks and segments, both of which have
been removed.
Signed-off-by: Zach Brown <zab@versity.com>
Add support for reporting errors to data waiters via a new
SCOUTFS_IOC_DATA_WAIT_ERR ioctl. This allows waiters to return an error
to readers when staging fails.
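Roughly, a waiter could report a staging failure like this; only the
ioctl name comes from this change, the argument struct and its fields
are hypothetical:

    #include <stdint.h>
    #include <sys/ioctl.h>

    /* hypothetical argument layout, for illustration only */
    struct data_wait_err_args {
            uint64_t ino;
            uint64_t offset;
            uint64_t count;
            int64_t  err;
    };

    static int report_stage_error(int fd, uint64_t ino, uint64_t offset,
                                  uint64_t count, int err)
    {
            struct data_wait_err_args args = {
                    .ino = ino,
                    .offset = offset,
                    .count = count,
                    .err = err,
            };

            /* SCOUTFS_IOC_DATA_WAIT_ERR comes from the scoutfs ioctl header */
            return ioctl(fd, SCOUTFS_IOC_DATA_WAIT_ERR, &args);
    }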
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
[zab: renamed to data_wait_err, took ino arg]
Signed-off-by: Zach Brown <zab@versity.com>
We had a bug where we were creating extent lengths that were rounded up
to the size of the packed extent items instead of being limited by
i_size. As it happens the last setattr_more test would have found it if
I'd actually done the math to check that the extent length was correct.
We add an explicit offline blocks count test because that's what led us
to notice that the offline extent length was wrong.
Signed-off-by: Zach Brown <zab@versity.com>
We had a bug where offline extent creation during setattr_more just
wasn't making it all the way to persistent items. This adds basic
sanity tests of the setattr_more interface.
Signed-off-by: Zach Brown <zab@versity.com>
The segment-cache-fwd-back-iter test only applied to populating the item
cache from segments, and we don't do that anymore. The test can
be removed.
Signed-off-by: Zach Brown <zab@versity.com>
Add a test which makes sure that errors during setup can be properly
torn down. This found an assertion that was being triggered during lock
shutdown.
Signed-off-by: Zach Brown <zab@versity.com>
The first commit of the scoutfs-tests suite which uses multiple mounts
on one host to test multi-node scoutfs.
Signed-off-by: Zach Brown <zab@versity.com>