Add a run-tests -V option that passes through the -V option to mkfs so
that runs can specify the format version that the primary volume will
have. This doesn't affect the scratch file system versions.
Signed-off-by: Zach Brown <zab@versity.com>
Add support for the indx xattr tag which lets xattrs determine the sort
order of their inode number in a global index.
Signed-off-by: Zach Brown <zab@versity.com>
Add a test binary that uses O_TMPFILE and linkat to create a file in a
given dir. We have something similar, but it's weirdly specific to a
given test. This is a simpler building block that could be used by more
tests.
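The building block described above can be sketched in userspace C. The
helper name and interface here are illustrative, not the actual test
binary's:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Create an anonymous file with O_TMPFILE in dir, then give it a name
 * with linkat() via /proc/self/fd.  Returns 0 on success, -1 on error.
 * (Hypothetical helper; the real test binary's interface may differ.)
 */
int create_linked_tmpfile(const char *dir, const char *path)
{
	char proc_path[64];
	int fd;

	fd = open(dir, O_TMPFILE | O_WRONLY, 0644);
	if (fd < 0)
		return -1;

	snprintf(proc_path, sizeof(proc_path), "/proc/self/fd/%d", fd);
	if (linkat(AT_FDCWD, proc_path, AT_FDCWD, path,
		   AT_SYMLINK_FOLLOW) < 0) {
		close(fd);
		return -1;
	}

	close(fd);
	return 0;
}
```

linkat() on the /proc path with AT_SYMLINK_FOLLOW is the documented way
to give a name to an O_TMPFILE file that was opened without O_EXCL.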
Signed-off-by: Zach Brown <zab@versity.com>
Add support for project IDs. They're managed through the _attr_x
interfaces and are inherited from the parent directory during creation.
Signed-off-by: Zach Brown <zab@versity.com>
Now that the _READ_XATTR_TOTALS ioctl uses the weak item cache we have
to drop caches before each attempt to read the xattrs that we just wrote
and synced.
Signed-off-by: Zach Brown <zab@versity.com>
Change the read_xattr_totals ioctl to use the weak item cache instead of
manually reading and merging the fs items for the xattr totals on every
call.
Signed-off-by: Zach Brown <zab@versity.com>
The _READ_XATTR_TOTALS ioctl had manual code for merging the .totl.
total and value while reading fs items. We're going to want to do this
in another reader so let's put these in their own functions that clearly
isolate the logic of merging the fs items into a coherent result.
We can get rid of some of the totl_read_ counters that tracked which
items we were merging. They weren't adding much value and conflated the
reading ioctl interface with the merging logic.
Signed-off-by: Zach Brown <zab@versity.com>
Add a forest item reading interface that lets the caller specify the net
roots instead of always getting them from a network request.
Signed-off-by: Zach Brown <zab@versity.com>
Add the weak item cache, which is used for reads that can tolerate
results being a little behind.  This gives us a lot more freedom to
implement a cache that is biased towards concurrent reads.
Signed-off-by: Zach Brown <zab@versity.com>
Signed-off-by: Bryant G. Duffy-Ly <bduffyly@versity.com>
[zab@versity.com: refactored for retention, added test cases]
Signed-off-by: Zach Brown <zab@versity.com>
Add a bit to the private scoutfs inode flags which indicates that the
inode is in retention mode. The bit is visible through the _attr_x
interface.  It can only be set on regular files, and while set it
prevents modification of everything except non-user xattrs.  It can be
cleared by root.
Signed-off-by: Zach Brown <zab@versity.com>
We have some fs functions which return info based on the test mount nr
as the test has set it up.  This refactors them a bit to also provide
some of the info when the caller has a path in a given mount. This will
let tests work with scratch mounts a little more easily.
Signed-off-by: Zach Brown <zab@versity.com>
Signed-off-by: Bryant G. Duffy-Ly <bduffyly@versity.com>
Now that we have the attr_x calls we can implement stat_more with
get_attr_x and setattr_more with set_attr_x.
The conversion of stat_more fixes a surprising consistency bug.
stat_more wasn't acquiring a cluster lock for the inode nor refreshing
it, so it could have returned stale data if modifications were made in
another mount.
Signed-off-by: Zach Brown <zab@versity.com>
The existing stat_more and setattr_more interfaces aren't extensible.
This solves that problem by adding attribute interfaces that let callers
specify exactly which fields to work with.
We're about to add a few more inode fields and it makes sense to add
them to this extensible structure rather than adding more ioctls or
relatively clumsy xattrs. This is modeled loosely on the upstream
kernel's statx support.
The ioctl entry points call core functions so that we can also implement
the existing stat_more and setattr_more interfaces in terms of these new
attr_x functions.
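A statx-style mask-driven interface might look roughly like this.  The
struct layout, field names, and mask bits below are all assumptions for
illustration, not scoutfs's actual ABI:

```c
#include <assert.h>
#include <stdint.h>

/* hypothetical field mask bits; callers set the ones they care about */
#define X_MASK_PROJECT_ID	(1ULL << 0)
#define X_MASK_RETENTION	(1ULL << 1)

/*
 * get_attr_x fills the fields named by x_mask; set_attr_x only applies
 * the fields named by x_mask.  New fields can be appended along with
 * new mask bits without breaking old callers.
 */
struct attr_x {
	uint64_t x_mask;	/* which fields are requested/valid */
	uint64_t x_flags;	/* e.g. the retention bit */
	uint64_t x_project_id;
};
```

The mask is what makes the interface extensible: an old kernel simply
ignores bits it doesn't know, and an old caller never sees new fields.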
Signed-off-by: Zach Brown <zab@versity.com>
Initially setattr_more followed the general pattern where extent
manipulation might require multiple transactions if there are lots of
extent items to work with. The scoutfs_data_init_offline_extent()
function that creates an offline extent handled transactions itself.
But in this case the call only supports adding a single offline extent.
It will always use a small fixed amount of metadata and could be
combined with other metadata changes in one atomic transaction.
This changes scoutfs_data_init_offline_extent() to have the caller
handle transactions, inode updates, etc. This lets the caller perform
all the restore changes in one transaction. This interface change will
then be used as we add another caller that adds a single offline extent
in the same way.
Signed-off-by: Zach Brown <zab@versity.com>
Add a little inline helper to test whether the mounted format version
supports a feature, returning an errno that callers can return directly
as the shared expected error.
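The helper's shape is roughly the following; the name, argument types,
and the choice of -EOPNOTSUPP are assumptions, not the actual scoutfs
code:

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical sketch: return 0 if the mounted format version is at
 * least the version that introduced the feature, otherwise a negative
 * errno that the caller can return as the shared expected error.
 */
static inline int version_supports(unsigned long long mounted_vers,
				   unsigned long long feature_vers)
{
	return mounted_vers >= feature_vers ? 0 : -EOPNOTSUPP;
}
```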
Signed-off-by: Zach Brown <zab@versity.com>
We're about to add new format structures so increment the max version to
2. Future commits will add the features before we release version 2 in
the wild.
Signed-off-by: Zach Brown <zab@zabbo.net>
We're about to increase the inode size and increment the format version.
Inode reading and writing has to handle different valid inode sizes as
allowed by the format version.  This is the initial skeletal work;
later patches that actually increase the inode size will refine it to
add the specific known sizes and format versions.
Signed-off-by: Bryant G. Duffy-Ly <bduffyly@versity.com>
[zab@versity.com: reworded description, reworked to use _within]
Signed-off-by: Zach Brown <zab@versity.com>
Add a lookup variant that returns an error if the item value is larger
than the caller's value buffer size and which zeros the rest of the
caller's buffer if the returned value is smaller.
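The copying contract can be sketched like this; the function name and
the -ERANGE errno are assumptions for illustration:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Sketch of the _within value copy: error if the stored value is larger
 * than the caller's buffer, zero the tail if it's smaller.
 */
static int copy_val_within(void *buf, size_t buf_len,
			   const void *val, size_t val_len)
{
	if (val_len > buf_len)
		return -ERANGE;

	memcpy(buf, val, val_len);
	memset((char *)buf + val_len, 0, buf_len - val_len);
	return (int)val_len;
}
```

Zeroing the tail means callers can compare or hash fixed-size buffers
without worrying about stale bytes past the end of a short value.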
Signed-off-by: Zach Brown <zab@versity.com>
We were using a seqcount to protect high frequency reads and writes to
some of our private inode fields.  The writers were serialized by the
caller, but that's a bit too easy to get wrong.  We're already paying
for the write seqcount update, so the additional internal spinlock
stores in seqlocks aren't significant overhead.  Seqlocks also handle
preemption for us.
Signed-off-by: Zach Brown <zab@versity.com>
Don't let change-format-version decrease the format version. It doesn't
have the machinery to go back and migrate newer structures to older
structures that would be compatible with code expecting the older
version.
Signed-off-by: Bryant G. Duffy-Ly <bduffyly@versity.com>
[zab@versity.com: split from initial patch with other changes]
Signed-off-by: Zach Brown <zab@versity.com>
Definitions in forest.h use lock pointers. Pre-declare the struct so it
doesn't break inclusion without lock.h, following current practice in
the header.
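Pre-declaring works because the header only ever uses pointers to the
type; a minimal userspace illustration of the same pattern (names are
made up):

```c
#include <assert.h>
#include <stddef.h>

/* an incomplete type is enough when only pointers to it appear */
struct scoutfs_lock;

/* declarations can take pointers without lock.h's full definition */
static size_t lock_ptr_size(struct scoutfs_lock *lock)
{
	(void)lock;
	return sizeof(lock);	/* size of the pointer, not the struct */
}
```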
Signed-off-by: Zach Brown <zab@versity.com>
scoutfs_file_write_iter tried to track written bytes and return those
unless there was an error. But written was uninitialized if we got
errors in any of the calls leading up to performing the write. The
bytes written were also not being passed to the generic_write_sync
helper. This fixes up all those inconsistencies and makes it look like
the write_iter path in other filesystems.
Signed-off-by: Zach Brown <zab@versity.com>
When we write to file contents we change the data_version.  To stage
old contents into an offline region, the data_version of the file must
match the archived copy.  When writing we have to make sure that there
is no offline data so that we don't increase the data_version, which
would prevent staging of any other file regions because the
data_versions would no longer match.
scoutfs_file_write_iter was only checking for offline data in its write
region, not the entire file. Fix it to match the _aio_write method and
check the whole file.
Signed-off-by: Zach Brown <zab@versity.com>
scoutfs_data_wait_check_iter() was checking the contiguous region of the
file starting at its pos and extending for iter_iov_count() bytes. The
caller can do that with the previous _data_wait_check() method by
providing the same count that _check_iter() was using.
Signed-off-by: Zach Brown <zab@versity.com>
The item cache has some safety checks that make sure an operation is
performed while holding a lock that covers the item.  It dumped a stack
trace via WARN when that wasn't the case, but it didn't include any
details about the keys or lock modes involved.
This adds a message that's printed once which includes the keys and
modes when an operation is attempted that isn't protected.
Signed-off-by: Zach Brown <zab@versity.com>
scoutfs_item_create() was checking that its lock had a read mode, when
it should have been checking for a write mode. This worked out because
callers with write mode locks are also protecting reads.
Signed-off-by: Zach Brown <zab@versity.com>
Unlink looks up the entry items for the name it is removing because we
no longer store the extra key material in dentries.  If this lookup
fails it uses an error path which releases a transaction that wasn't
held.  Thankfully this error path is unlikely (corruption, or systemic
errors like EIO or ENOMEM), so we haven't hit it in practice.
Signed-off-by: Zach Brown <zab@versity.com>
When we added the crtime creation timestamp to the inode we forgot to
update mkfs to set the crtime of the root inode.
Signed-off-by: Zach Brown <zab@versity.com>
Block reads can return ESTALE naturally as mounts read through old
cached blocks.  We don't want to log it as an error every time, but we
should add a tracepoint that can be inspected.
Signed-off-by: Zach Brown <zab@versity.com>
This addresses some minor issues with how we drive the weak-modules
infrastructure for running on kernels the module wasn't explicitly
built for.
For one, we now drive weak-modules at install-time more explicitly (it
was adding symlinks for all modules into the right place for the running
kernel, whereas now it only handles that for scoutfs against all
installed kernels).
Also we no longer leave stale modules on the filesystem after an
uninstall/upgrade, similar to what's done for vsm's kmods right now.
RPM's pre/postinstall scriptlets are used to drive weak-modules to clean
things up.
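The scriptlet flow looks roughly like the following spec fragment; the
module path, version macro, and use of --no-initramfs are assumptions
based on the description above (weak-modules reads module paths on
stdin):

```spec
%post
printf '%s\n' /lib/modules/%{kernel_version}/extra/scoutfs/scoutfs.ko | \
    weak-modules --add-modules --no-initramfs

%preun
printf '%s\n' /lib/modules/%{kernel_version}/extra/scoutfs/scoutfs.ko | \
    weak-modules --remove-modules --no-initramfs
```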
Note that this (intentionally) does not (re)generate initrds of any
kind.
Finally, this was tested on both the native kernel version and on
updates that would need the migrated modules. As a result, installs are
a little quicker, the module still gets migrated successfully, and
uninstalls correctly remove (only) the packaged module.