Compare commits

..

46 Commits

Author SHA1 Message Date
Ben McClelland bcf559818b Add rpm spec file support for el8 builds
The rpmbuild support files no longer define the previously used kernel
module macros. This carves out the differences between el7 and el8 with
conditionals based on the distro we are building for.

Signed-off-by: Ben McClelland <ben.mcclelland@versity.com>
2023-08-25 15:40:05 -07:00
Auke Kok 36ee4d946b Ignore last flag output by filefrag.
New versions of filefrag will output the presence of the `last`
flag as well, but we don't care.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:35:48 -04:00
Auke Kok dc57b34b8d Don't use static struct initializer.
In rhel7 this is a nested struct with ktime_t. However, in rhel8
ktime_t is a simple s64, and not a union, and thus we can't do
this as easily. Just memset it.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:35:48 -04:00
Auke Kok 779b96df81 Allow the kernel to return -ESTALE from orphan-inode test
In newer kernels, we always get -ESTALE because the inode has been
marked immediately as deleting. Since this is expected behavior we
should not fail the test here on this error value.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:35:48 -04:00
Auke Kok 1ee0331b8b Skip userns based testing for RHEL8.
In RHEL7, this was skipped automatically. In RHEL8, we don't support
the needed passing through of the actual user namespace into our
ACL set/get handlers. Once we get around v5.11 or so, the handlers
are automatically passed the namespace. Until then, skip this test.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:35:48 -04:00
Auke Kok 42acf01dce Use .prefix for POSIX acl instead of .name.
New kernels expect to do a partial match when a .prefix is used here,
and provide a .name member in case matching should look at the whole
string. This is what we want.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:35:48 -04:00
Auke Kok ca4d463c75 Don't cache ACL's in newer kernels.
The caller takes care of caching for us. Us doing caching
messes with memory management of cached ACLs and breaks.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:35:48 -04:00
Auke Kok e7dadd09ae New versions of getfattr will quote empty attr values.
Instead of messing with quotes and using grep for the correct
xattr name, directly query the value of the xattr being tested
only, and compare that to the input.

Side effect is that this is significantly simpler and faster.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:35:48 -04:00
Auke Kok 954843d2ab Account for coreutils using statx() call instead of stat()
`stat` internally switched to using the new `statx` syscall, and this
affects the output of perror() subsequently. This is the same error
as before (and expected).

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:35:48 -04:00
Auke Kok 5968265aad Account for e2fsprogs output format changes.
The filefrag program in e2fsprogs-v1.42.10-10-g29758d2f now includes
an extra flag, and changes how the `unknown` flag is output.

We essentially adjust for this "new" golden value on the fly if we
encounter it. We don't expect future changes to the output.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:35:48 -04:00
Auke Kok f30c095e8e Account for quoting style changes in coreutils.
In older versions of coreutils, quoted strings are occasionally
output using utf-8 open/close single quotes.

New versions of coreutils will exclusively use the ASCII single quote
character "'" when the output is not a TTY - as is the case with
all test scripts.

We can avoid most of these problems by always setting LC_ALL=C in
testing, however.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:35:48 -04:00
Auke Kok 1578ded917 Ignore loop device resizing messages.
These occasionally trigger during tests.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:35:48 -04:00
Auke Kok 2c21f88f24 Support .read/write_iter callbacks in lieu of .aio_read/write
The aio_read and aio_write callbacks are no longer used by newer
kernels which now uses iter based readers and writers.

We can avoid implementing plain .read and .write as an iter will
be generated when needed for us automatically.

We add a new data_wait_check_iter() function accordingly.

With these methods removed from the kernel, the el8 kernel no
longer uses the extended ops wrapper struct and is much closer now
to upstream. As a result, a lot of methods are moving around from
inode_dir_operations to and from inode_file_operations etc, and
perhaps things will look a bit more structured as a result.

As a result, we need a slightly different data_wait_check() that
accounts for the iter and offset properly.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:35:48 -04:00
Auke Kok 57356b57aa Implement .readahead for address_space_operations (aops).
.readpages is obsolete in el8 kernels. We implement the .readahead
method instead which is passed a struct readahead_control. We use
the readahead_page(rac) accessor to retrieve page by page from the
struct.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:35:48 -04:00
Auke Kok b81e3bf421 implement generic_file_buffered_write()
This function is removed in el8 therefore we need to implement
it ourselves now. Copy it.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:35:48 -04:00
Auke Kok 5d1742a954 (un)register_hotcpu_notifier is obsolete
v4.9-12228-g530e9b76ae8f Drops all (un)register_(hot)cpu_notifier()
API functions. From here on we need to use the new cpuhp_* API.

We avoid this entirely for now, at the cost of leaking pages until
the filesystem is unmounted.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:35:48 -04:00
Auke Kok 22bd4c4493 Timespec64 changes for yr2038.
Provide a fallback `current_time(inode)` implementation for older
kernels.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:35:48 -04:00
Auke Kok 371bff49af Adjust scoutfs_quorum_loop trace point.
Convert the timeout struct unto a u64 nsecs value before passing it to
the trace point event, as to not overflow the 64bit limitation on args.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:35:48 -04:00
Auke Kok 3d43fdfeaa Initialize msg.msg_iter from iovec.
Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:35:48 -04:00
Auke Kok 6563f70a90 Handle net arg being added to sock_create_kern()
Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:35:48 -04:00
Auke Kok a14da52cbb kernel_getsockname and kernel_getpeername dropped addrlen arg.
v4.16-rc1-1-g9b2c45d479d0

This interface now returns (sizeof (addr)) on success, instead of 0.
Therefore, we have to change the error condition detection.

The compat for older kernels handles the addrlen check internally.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:35:48 -04:00
Auke Kok f367e485a6 xattr functions are now passed flags through struct xattr_handler
Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:35:48 -04:00
Auke Kok 8a7bc0cdfa Remove the use of backing_dev_info pt from address_space.
Instead, use the new inline inode_to_bdi from <backing-dev.h> to fill
in the task's backing_dev_info.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:35:48 -04:00
Auke Kok e81d16f8db Do not use MS_* flags anymore in kernel space.
MS_* flags from <linux/mount.h> should not be used in the kernel
anymore from 4.x onwards. Instead, we need to use the SB_* versions

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:35:48 -04:00
Zach Brown bad0455e28 Use count/scan objects shrinking interface
Move to the more recent interfaces for counting and scanning cached
objects to shrink.

Signed-off-by: Zach Brown <zab@versity.com>
Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:35:48 -04:00
Auke Kok 0a30c0b926 Use page->lru instead of page->list
With v3.14-rc1-10-g34bf6ef94a83, page->list is removed Instead,
use the union member ->lru.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:35:48 -04:00
Zach Brown 84a4000c85 Use more modern bio interfaces
Move towards modern bio intefaces, while unfortunately carrying along a
bunch of compat functions that let us still work with the old
incompatible interfaces.

Signed-off-by: Zach Brown <zab@versity.com>
Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:35:48 -04:00
Zach Brown 859f63e49b Use memalloc_nofs_save
memalloc_nofs_save() was introduced as preferential to trying to use GFP
flags to indicate that a task should not recurse during reclaim.  We use
it instead of the _noio_ we were using before.

Signed-off-by: Zach Brown <zab@versity.com>
2023-08-01 16:35:48 -04:00
Zach Brown 588bdb7969 Use percpu_counter_add_batch
__percpu_counter_add_batch was renamed to make it clear that the __
doesn't mean it's less safe, as it means in other calls in the API, but
just that it takes an additional parameter.

Signed-off-by: Zach Brown <zab@versity.com>
Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:35:48 -04:00
Auke Kok b894f6b04c Use __posix_acl_create/_chmod and add backwards compatibility
There are new interfaces available but the old one has been retained
for us to use. In case of older kernels, we will need to fall back
to the previous name of these functions.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:35:46 -04:00
Auke Kok e26573ae8e Fix argument test for __posix_acl_valid.
The argument is fixed to be user_namespace, instead of user_ns.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:34:50 -04:00
Auke Kok 3f6b98496f Use setattr_preapre() as inode_change_ok() was removed in v4.8-rc1
Instead, we can call setattr_prepare() directly. We provide a fallback
for older kernels.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:34:49 -04:00
Auke Kok b8a378ede7 Use the new inode->i_version manipulation methods.
Provide fallback in degraded mode for kernels pre-v4.15-rc3 by directly
manipulating the member as needed.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:33:28 -04:00
Auke Kok 4b08e79988 inode->i_mutex has been replaced with inode->i_rwsem.
Since v4.6-rc3-27-g9902af79c01a, inode->i_mutex has been replaced
with ->i_rwsem. However, long since whenever, inode_lock() and
related functions already worked as intended and provided fully
exclusive locking to the inode.

To avoid a name clash on pre-rhel8 kernels, we have to rename a
stack variable in `src/file.c`.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:33:28 -04:00
Auke Kok 2ac28c4969 New inode->i_version API requires <iversion.h>
Since v4.15-rc3-4-gae5e165d855d, <linux/iversion.h> contains a new
inode->i_version API and it is not included by default.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:33:28 -04:00
Auke Kok 3608d1aae1 use $(MAKE) to allow passing jobserver flags.
With this, we can `make -jX` to speed up compiles a bit from
the kmod folder.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:33:28 -04:00
Auke Kok f13757f0af module_init/_exit should have a semicolon at eol.
In the past this was not needed but since el7 onwards these macros
should require the semicolon.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:33:28 -04:00
Auke Kok 34e6efd39c Adjust for new augmented rbtree compute callback function signature
The new variant of the code that recomputes the augmented value
is designed to handle non-scalar types and to facilitate that, it
has new semantics for the _compute callback. It is now passed a
boolean flag `exit` that indicates that if the value isn't changed,
it should exit and halt propagation.

The callback function now shall return whether that propagation should
stop or not, and not the computed new value. The callback can now
directly update the new computed value in the node.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 16:30:16 -04:00
Auke Kok b452ca3d23 Add include <blkdev.h>.
Fixes: Error: implicit declaration of function ‘blkdev_put’

Previously this was an `extern` in <fs.h> and included implicitly,
hence the need to hard include it now.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 13:40:59 -04:00
Auke Kok 090c795b7e preempt_mask.h is removed entirely.
v4.1-rc4-22-g92cf211874e9 merges this into preempt.h, and on
rhel7 kernels we don't need this include anymore either.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 13:40:59 -04:00
Auke Kok d9394cb084 page_cache_release() is removed. put_page() instead.
Even in 3.x, this already was equivalent.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 13:40:59 -04:00
Auke Kok 67ae352618 flush_work_sync is equivalent to flush_work.
v3.15-rc1-6-g1a56f2aa4752 removes flush_work_sync entirely, but
ever since v3.6-rc1-25-g606a5020b9bd which made all workqueues
non-reentrant, it has been equivalent to flush_work.

This is safe because in all cases only one server->work can be
in flight at a time.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 13:40:59 -04:00
Auke Kok 38bb5a8254 d_materialise_unique replaced with d_splice_alias.
Note argument order reversal.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 13:40:59 -04:00
Auke Kok 2510688a36 READ_ONCE() replaces ACCESS_ONCE()
v3.18-rc3-2-g230fa253df63 forces us to remove ACCESS_ONCE() with
READ_ONCE(), but it is probably the better interface and works with
non-scalar types.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 13:40:59 -04:00
Auke Kok 15a5dca8c6 PAGE_CACHE_SIZE was removed, replace with PAGE_SIZE.
PAGE_CACHE_SIZE was previously defined to be equivalent to PAGE_SIZE.

This symbol was removed in v4.6-rc1-32-g1fa64f198b9f.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 13:40:59 -04:00
Auke Kok c3996cb021 Include kernel.h and fs.h at the top of kernelcompat.h
Because we `-include src/kernelcompat.h` from the command line,
this header gets included before any of the kernel includes in
most .c and .h files. We should at least make sure we pull in
<fs> and <kernel> since they're required.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2023-08-01 13:40:59 -04:00
83 changed files with 1026 additions and 7542 deletions
-72
View File
@@ -1,78 +1,6 @@
Versity ScoutFS Release Notes
=============================
---
v1.21
\
*Jul 1, 2024*
This release adds features that rely on incompatible changes to
structure the file system. The process of advancing the format version
to enable these features is described in scoutfs(5).
Added the ".indx." extended attribute tag which can be used to determine
the sorting of files in a global index.
Added ScoutFS quotas which let rules define file size and count limits
in terms of ".totl." extended attribute totals.
Added the project ID file attribute which is inherited from parent
directories on creation. ScoutFS quota rules can reference project IDs.
Add a retention attribute for files which prevents modification once
enabled.
---
v1.20
\
*Apr 22, 2024*
Minor changes to packaging to better support "weak" module linking of
the kernel module, and to including git hashes in the built package. No
changes in runtime behaviour.
---
v1.19
\
*Jan 30, 2024*
Added the log\_merge\_wait\_timeout\_ms mount option to set the timeout
for creating log merge operations. The previous timeout, now the
default, was too short for some systems and was resulting in consistent
timeouts which created an excessive number of log trees waiting to be
merged.
Improved performance of many in-mount server operations when there are a
large number of log trees waiting to be merged.
---
v1.18
\
*Nov 7, 2023*
Fixed a bug where background srch file compaction could stop making
forward progress if a partial compaction operation was committed at a
specific byte offset in a block. This would cause srch file searches to
be progressively more expensive over time. Once this fix is running
background compaction will resume, bringing the cost of searches back
down.
---
v1.17
\
*Oct 23, 2023*
Add support for EL8 generation kernels.
---
v1.16
\
*Oct 4, 2023*
Fix an issue where the server could hang on startup if its persistent
allocator structures were left in a specific degraded state by the
previously active server.
---
v1.15
\
+3 -9
View File
@@ -12,22 +12,17 @@ else
SP = @:
endif
SCOUTFS_GIT_DESCRIBE ?= \
SCOUTFS_GIT_DESCRIBE := \
$(shell git describe --all --abbrev=6 --long 2>/dev/null || \
echo no-git)
ESCAPED_GIT_DESCRIBE := \
$(shell echo $(SCOUTFS_GIT_DESCRIBE) |sed -e 's/\//\\\//g')
RPM_GITHASH ?= $(shell git rev-parse --short HEAD)
SCOUTFS_ARGS := SCOUTFS_GIT_DESCRIBE=$(SCOUTFS_GIT_DESCRIBE) \
RPM_GITHASH=$(RPM_GITHASH) \
CONFIG_SCOUTFS_FS=m -C $(SK_KSRC) M=$(CURDIR)/src \
EXTRA_CFLAGS="-Werror"
# - We use the git describe from tags to set up the RPM versioning
RPM_VERSION := $(shell git describe --long --tags | awk -F '-' '{gsub(/^v/,""); print $$1}')
RPM_GITHASH := $(shell git rev-parse --short HEAD)
TARFILE = scoutfs-kmod-$(RPM_VERSION).tar
@@ -46,8 +41,7 @@ modules_install:
%.spec: %.spec.in .FORCE
sed -e 's/@@VERSION@@/$(RPM_VERSION)/g' \
-e 's/@@GITHASH@@/$(RPM_GITHASH)/g' \
-e 's/@@GITDESCRIBE@@/$(ESCAPED_GIT_DESCRIBE)/g' < $< > $@+
-e 's/@@GITHASH@@/$(RPM_GITHASH)/g' < $< > $@+
mv $@+ $@
+2 -14
View File
@@ -1,7 +1,6 @@
%define kmod_name scoutfs
%define kmod_version @@VERSION@@
%define kmod_git_hash @@GITHASH@@
%define kmod_git_describe @@GITDESCRIBE@@
%define pkg_date %(date +%%Y%%m%%d)
# Disable the building of the debug package(s).
@@ -76,7 +75,7 @@ echo "Building for kernel: %{kernel_version} flavors: '%{flavors_to_build}'"
for flavor in %flavors_to_build; do
rm -rf obj/$flavor
cp -r source obj/$flavor
make RPM_GITHASH=%{kmod_git_hash} SCOUTFS_GIT_DESCRIBE=%{kmod_git_describe} SK_KSRC=%{kernel_source $flavor} -C obj/$flavor module
make SK_KSRC=%{kernel_source $flavor} -C obj/$flavor module
done
%install
@@ -98,21 +97,10 @@ find %{buildroot} -type f -name \*.ko -exec %{__chmod} u+x \{\} \;
/lib/modules
%post
echo /lib/modules/%{kversion}/%{install_mod_dir}/scoutfs.ko | weak-modules --add-modules --no-initramfs
weak-modules --add-kernel --no-initramfs
depmod -a
%endif
%clean
rm -rf %{buildroot}
%preun
# stash our modules for postun cleanup
SCOUTFS_RPM_NAME=$(rpm -q %{name} | grep "%{version}-%{release}")
rpm -ql $SCOUTFS_RPM_NAME | grep '\.ko$' > /var/run/%{name}-modules-%{version}-%{release} || true
%postun
if [ -x /sbin/weak-modules ]; then
cat /var/run/%{name}-modules-%{version}-%{release} | /sbin/weak-modules --remove-modules --no-initramfs
fi
rm /var/run/%{name}-modules-%{version}-%{release} || true
-4
View File
@@ -9,7 +9,6 @@ CFLAGS_scoutfs_trace.o = -I$(src) # define_trace.h double include
scoutfs-y += \
acl.o \
attr_x.o \
avl.o \
alloc.o \
block.o \
@@ -35,7 +34,6 @@ scoutfs-y += \
options.o \
per_task.o \
quorum.o \
quota.o \
recov.o \
scoutfs_trace.o \
server.o \
@@ -44,12 +42,10 @@ scoutfs-y += \
srch.o \
super.o \
sysfs.o \
totl.o \
trans.o \
triggers.o \
tseq.o \
volopt.o \
wkic.o \
xattr.o
#
-252
View File
@@ -1,252 +0,0 @@
/*
* Copyright (C) 2024 Versity Software, Inc. All rights reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public
* License v2 as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*/
#include <linux/kernel.h>
#include <linux/fs.h>
#include "format.h"
#include "super.h"
#include "inode.h"
#include "ioctl.h"
#include "lock.h"
#include "trans.h"
#include "attr_x.h"
static int validate_attr_x_input(struct super_block *sb, struct scoutfs_ioctl_inode_attr_x *iax)
{
int ret;
if ((iax->x_mask & SCOUTFS_IOC_IAX__UNKNOWN) ||
(iax->x_flags & SCOUTFS_IOC_IAX_F__UNKNOWN))
return -EINVAL;
if ((iax->x_mask & SCOUTFS_IOC_IAX_RETENTION) &&
(ret = scoutfs_fmt_vers_unsupported(sb, SCOUTFS_FORMAT_VERSION_FEAT_RETENTION)))
return ret;
if ((iax->x_mask & SCOUTFS_IOC_IAX_PROJECT_ID) &&
(ret = scoutfs_fmt_vers_unsupported(sb, SCOUTFS_FORMAT_VERSION_FEAT_PROJECT_ID)))
return ret;
return 0;
}
/*
* If the mask indicates interest in the given attr then set the field
* to the caller's value and return the new size if it didn't already
* include the attr field.
*/
#define fill_attr(size, iax, bit, field, val) \
({ \
__typeof__(iax) _iax = (iax); \
__typeof__(size) _size = (size); \
\
if (_iax->x_mask & (bit)) { \
_iax->field = (val); \
_size = max(_size, offsetof(struct scoutfs_ioctl_inode_attr_x, field) + \
sizeof_field(struct scoutfs_ioctl_inode_attr_x, field)); \
} \
\
_size; \
})
/*
* Returns -errno on error, or >= number of bytes filled by the
* response. 0 can be returned if no attributes are requested in the
* input x_mask.
*/
int scoutfs_get_attr_x(struct inode *inode, struct scoutfs_ioctl_inode_attr_x *iax)
{
struct super_block *sb = inode->i_sb;
struct scoutfs_inode_info *si = SCOUTFS_I(inode);
struct scoutfs_lock *lock = NULL;
size_t size = 0;
u64 offline;
u64 online;
u64 bits;
int ret;
if (iax->x_mask == 0) {
ret = 0;
goto out;
}
ret = validate_attr_x_input(sb, iax);
if (ret < 0)
goto out;
inode_lock(inode);
ret = scoutfs_lock_inode(sb, SCOUTFS_LOCK_READ, SCOUTFS_LKF_REFRESH_INODE, inode, &lock);
if (ret)
goto unlock;
size = fill_attr(size, iax, SCOUTFS_IOC_IAX_META_SEQ,
meta_seq, scoutfs_inode_meta_seq(inode));
size = fill_attr(size, iax, SCOUTFS_IOC_IAX_DATA_SEQ,
data_seq, scoutfs_inode_data_seq(inode));
size = fill_attr(size, iax, SCOUTFS_IOC_IAX_DATA_VERSION,
data_version, scoutfs_inode_data_version(inode));
if (iax->x_mask & (SCOUTFS_IOC_IAX_ONLINE_BLOCKS | SCOUTFS_IOC_IAX_OFFLINE_BLOCKS)) {
scoutfs_inode_get_onoff(inode, &online, &offline);
size = fill_attr(size, iax, SCOUTFS_IOC_IAX_ONLINE_BLOCKS,
online_blocks, online);
size = fill_attr(size, iax, SCOUTFS_IOC_IAX_OFFLINE_BLOCKS,
offline_blocks, offline);
}
size = fill_attr(size, iax, SCOUTFS_IOC_IAX_CTIME, ctime_sec, inode->i_ctime.tv_sec);
size = fill_attr(size, iax, SCOUTFS_IOC_IAX_CTIME, ctime_nsec, inode->i_ctime.tv_nsec);
size = fill_attr(size, iax, SCOUTFS_IOC_IAX_CRTIME, crtime_sec, si->crtime.tv_sec);
size = fill_attr(size, iax, SCOUTFS_IOC_IAX_CRTIME, crtime_nsec, si->crtime.tv_nsec);
size = fill_attr(size, iax, SCOUTFS_IOC_IAX_SIZE, size, i_size_read(inode));
if (iax->x_mask & SCOUTFS_IOC_IAX__BITS) {
bits = 0;
if ((iax->x_mask & SCOUTFS_IOC_IAX_RETENTION) &&
(scoutfs_inode_get_flags(inode) & SCOUTFS_INO_FLAG_RETENTION))
bits |= SCOUTFS_IOC_IAX_B_RETENTION;
size = fill_attr(size, iax, SCOUTFS_IOC_IAX__BITS, bits, bits);
}
size = fill_attr(size, iax, SCOUTFS_IOC_IAX_PROJECT_ID,
project_id, scoutfs_inode_get_proj(inode));
ret = size;
unlock:
scoutfs_unlock(sb, lock, SCOUTFS_LOCK_READ);
inode_unlock(inode);
out:
return ret;
}
static bool valid_attr_changes(struct inode *inode, struct scoutfs_ioctl_inode_attr_x *iax)
{
/* provided data_version must be non-zero */
if ((iax->x_mask & SCOUTFS_IOC_IAX_DATA_VERSION) && (iax->data_version == 0))
return false;
/* can only set size or data version in new regular files */
if (((iax->x_mask & SCOUTFS_IOC_IAX_SIZE) ||
(iax->x_mask & SCOUTFS_IOC_IAX_DATA_VERSION)) &&
(!S_ISREG(inode->i_mode) || scoutfs_inode_data_version(inode) != 0))
return false;
/* must provide non-zero data_version with non-zero size */
if (((iax->x_mask & SCOUTFS_IOC_IAX_SIZE) && (iax->size > 0)) &&
(!(iax->x_mask & SCOUTFS_IOC_IAX_DATA_VERSION) || (iax->data_version == 0)))
return false;
/* must provide non-zero size when setting offline extents to that size */
if ((iax->x_flags & SCOUTFS_IOC_IAX_F_SIZE_OFFLINE) &&
(!(iax->x_mask & SCOUTFS_IOC_IAX_SIZE) || (iax->size == 0)))
return false;
/* the retention bit only applies to regular files */
if ((iax->x_mask & SCOUTFS_IOC_IAX_RETENTION) && !S_ISREG(inode->i_mode))
return false;
return true;
}
int scoutfs_set_attr_x(struct inode *inode, struct scoutfs_ioctl_inode_attr_x *iax)
{
struct super_block *sb = inode->i_sb;
struct scoutfs_inode_info *si = SCOUTFS_I(inode);
struct scoutfs_lock *lock = NULL;
LIST_HEAD(ind_locks);
bool set_data_seq;
int ret;
/* initially all setting is root only, could loosen with finer grained checks */
if (!capable(CAP_SYS_ADMIN)) {
ret = -EPERM;
goto out;
}
if (iax->x_mask == 0) {
ret = 0;
goto out;
}
ret = validate_attr_x_input(sb, iax);
if (ret < 0)
goto out;
inode_lock(inode);
ret = scoutfs_lock_inode(sb, SCOUTFS_LOCK_WRITE, SCOUTFS_LKF_REFRESH_INODE, inode, &lock);
if (ret)
goto unlock;
/* check for errors before making any changes */
if (!valid_attr_changes(inode, iax)) {
ret = -EINVAL;
goto unlock;
}
/* retention prevents modification unless also clearing retention */
ret = scoutfs_inode_check_retention(inode);
if (ret < 0 && !((iax->x_mask & SCOUTFS_IOC_IAX_RETENTION) &&
!(iax->bits & SCOUTFS_IOC_IAX_B_RETENTION)))
goto unlock;
/* setting only so we don't see 0 data seq with nonzero data_version */
if ((iax->x_mask & SCOUTFS_IOC_IAX_DATA_VERSION) && (iax->data_version > 0))
set_data_seq = true;
else
set_data_seq = false;
ret = scoutfs_inode_index_lock_hold(inode, &ind_locks, set_data_seq, true);
if (ret)
goto unlock;
ret = scoutfs_dirty_inode_item(inode, lock);
if (ret < 0)
goto release;
/* creating offline extent first, it might fail */
if (iax->x_flags & SCOUTFS_IOC_IAX_F_SIZE_OFFLINE) {
ret = scoutfs_data_init_offline_extent(inode, iax->size, lock);
if (ret)
goto release;
}
/* make all changes once they're all checked and will succeed */
if (iax->x_mask & SCOUTFS_IOC_IAX_DATA_VERSION)
scoutfs_inode_set_data_version(inode, iax->data_version);
if (iax->x_mask & SCOUTFS_IOC_IAX_SIZE)
i_size_write(inode, iax->size);
if (iax->x_mask & SCOUTFS_IOC_IAX_CTIME) {
inode->i_ctime.tv_sec = iax->ctime_sec;
inode->i_ctime.tv_nsec = iax->ctime_nsec;
}
if (iax->x_mask & SCOUTFS_IOC_IAX_CRTIME) {
si->crtime.tv_sec = iax->crtime_sec;
si->crtime.tv_nsec = iax->crtime_nsec;
}
if (iax->x_mask & SCOUTFS_IOC_IAX_RETENTION) {
scoutfs_inode_set_flags(inode, ~SCOUTFS_INO_FLAG_RETENTION,
(iax->bits & SCOUTFS_IOC_IAX_B_RETENTION) ?
SCOUTFS_INO_FLAG_RETENTION : 0);
}
if (iax->x_mask & SCOUTFS_IOC_IAX_PROJECT_ID)
scoutfs_inode_set_proj(inode, iax->project_id);
scoutfs_update_inode_item(inode, lock, &ind_locks);
ret = 0;
release:
scoutfs_release_trans(sb);
unlock:
scoutfs_inode_index_unlock(sb, &ind_locks);
scoutfs_unlock(sb, lock, SCOUTFS_LOCK_WRITE);
inode_unlock(inode);
out:
return ret;
}
-11
View File
@@ -1,11 +0,0 @@
#ifndef _SCOUTFS_ATTR_X_H_
#define _SCOUTFS_ATTR_X_H_
#include <linux/kernel.h>
#include <linux/fs.h>
#include "ioctl.h"
int scoutfs_get_attr_x(struct inode *inode, struct scoutfs_ioctl_inode_attr_x *iax);
int scoutfs_set_attr_x(struct inode *inode, struct scoutfs_ioctl_inode_attr_x *iax);
#endif
+2 -6
View File
@@ -683,7 +683,6 @@ int scoutfs_block_read_ref(struct super_block *sb, struct scoutfs_block_ref *ref
struct scoutfs_block_header *hdr;
struct block_private *bp = NULL;
bool retried = false;
__le32 crc = 0;
int ret;
retry:
@@ -696,9 +695,7 @@ retry:
/* corrupted writes might be a sign of a stale reference */
if (!test_bit(BLOCK_BIT_CRC_VALID, &bp->bits)) {
crc = block_calc_crc(hdr, SCOUTFS_BLOCK_LG_SIZE);
if (hdr->crc != crc) {
trace_scoutfs_block_stale(sb, ref, hdr, magic, le32_to_cpu(crc));
if (hdr->crc != block_calc_crc(hdr, SCOUTFS_BLOCK_LG_SIZE)) {
ret = -ESTALE;
goto out;
}
@@ -708,7 +705,6 @@ retry:
if (hdr->magic != cpu_to_le32(magic) || hdr->fsid != cpu_to_le64(sbi->fsid) ||
hdr->seq != ref->seq || hdr->blkno != ref->blkno) {
trace_scoutfs_block_stale(sb, ref, hdr, magic, 0);
ret = -ESTALE;
goto out;
}
@@ -1082,7 +1078,7 @@ static unsigned long block_count_objects(struct shrinker *shrink, struct shrink_
scoutfs_inc_counter(sb, block_cache_count_objects);
return shrinker_min_long(atomic_read(&binf->total_inserted));
return shrinker_min_t_long((u64)atomic_read(&binf->total_inserted));
}
/*
+185 -260
View File
@@ -2029,253 +2029,187 @@ int scoutfs_btree_rebalance(struct super_block *sb,
key, SCOUTFS_BTREE_MAX_VAL_LEN, NULL, NULL, NULL);
}
struct merged_range {
struct scoutfs_key start;
struct scoutfs_key end;
struct rb_root root;
int size;
};
struct merged_item {
struct merge_pos {
struct rb_node node;
struct scoutfs_key key;
struct scoutfs_btree_root *root;
struct scoutfs_block *bl;
struct scoutfs_btree_block *bt;
struct scoutfs_avl_node *avl;
struct scoutfs_key *key;
u64 seq;
u8 flags;
unsigned int val_len;
u8 val[0];
u8 *val;
};
static inline struct merged_item *mitem_container(struct rb_node *node)
static struct merge_pos *first_mpos(struct rb_root *root)
{
return node ? container_of(node, struct merged_item, node) : NULL;
}
static inline struct merged_item *first_mitem(struct rb_root *root)
{
return mitem_container(rb_first(root));
}
static inline struct merged_item *last_mitem(struct rb_root *root)
{
return mitem_container(rb_last(root));
}
static inline struct merged_item *next_mitem(struct merged_item *mitem)
{
return mitem_container(mitem ? rb_next(&mitem->node) : NULL);
}
static inline struct merged_item *prev_mitem(struct merged_item *mitem)
{
return mitem_container(mitem ? rb_prev(&mitem->node) : NULL);
}
static struct merged_item *find_mitem(struct rb_root *root, struct scoutfs_key *key,
struct rb_node **parent_ret, struct rb_node ***link_ret)
{
struct rb_node **node = &root->rb_node;
struct rb_node *parent = NULL;
struct merged_item *mitem;
int cmp;
while (*node) {
parent = *node;
mitem = container_of(*node, struct merged_item, node);
cmp = scoutfs_key_compare(key, &mitem->key);
if (cmp < 0) {
node = &(*node)->rb_left;
} else if (cmp > 0) {
node = &(*node)->rb_right;
} else {
*parent_ret = NULL;
*link_ret = NULL;
return mitem;
}
}
*parent_ret = parent;
*link_ret = node;
struct rb_node *node = rb_first(root);
if (node)
return container_of(node, struct merge_pos, node);
return NULL;
}
static void insert_mitem(struct merged_range *rng, struct merged_item *mitem,
struct rb_node *parent, struct rb_node **link)
static struct merge_pos *next_mpos(struct merge_pos *mpos)
{
rb_link_node(&mitem->node, parent, link);
rb_insert_color(&mitem->node, &rng->root);
rng->size += item_len_bytes(mitem->val_len);
struct rb_node *node;
if (mpos && (node = rb_next(&mpos->node)))
return container_of(node, struct merge_pos, node);
else
return NULL;
}
static void replace_mitem(struct merged_range *rng, struct merged_item *victim,
struct merged_item *new)
static void free_mpos(struct super_block *sb, struct merge_pos *mpos)
{
rb_replace_node(&victim->node, &new->node, &rng->root);
RB_CLEAR_NODE(&victim->node);
rng->size -= item_len_bytes(victim->val_len);
rng->size += item_len_bytes(new->val_len);
scoutfs_block_put(sb, mpos->bl);
kfree(mpos);
}
static void free_mitem(struct merged_range *rng, struct merged_item *mitem)
static void insert_mpos(struct rb_root *pos_root, struct merge_pos *ins)
{
if (IS_ERR_OR_NULL(mitem))
return;
struct rb_node **node = &pos_root->rb_node;
struct rb_node *parent = NULL;
struct merge_pos *mpos;
int cmp;
if (!RB_EMPTY_NODE(&mitem->node)) {
rng->size -= item_len_bytes(mitem->val_len);
rb_erase(&mitem->node, &rng->root);
parent = NULL;
while (*node) {
parent = *node;
mpos = container_of(*node, struct merge_pos, node);
/* sort merge items by key then newest to oldest */
cmp = scoutfs_key_compare(ins->key, mpos->key) ?:
-scoutfs_cmp(ins->seq, mpos->seq);
if (cmp < 0)
node = &(*node)->rb_left;
else
node = &(*node)->rb_right;
}
kfree(mitem);
}
static void trim_range_size(struct merged_range *rng, int merge_window)
{
struct merged_item *mitem;
struct merged_item *tmp;
mitem = last_mitem(&rng->root);
while (mitem && rng->size > merge_window) {
rng->end = mitem->key;
scoutfs_key_dec(&rng->end);
tmp = mitem;
mitem = prev_mitem(mitem);
free_mitem(rng, tmp);
}
}
static void trim_range_end(struct merged_range *rng)
{
struct merged_item *mitem;
struct merged_item *tmp;
mitem = last_mitem(&rng->root);
while (mitem && scoutfs_key_compare(&mitem->key, &rng->end) > 0) {
tmp = mitem;
mitem = prev_mitem(mitem);
free_mitem(rng, tmp);
}
rb_link_node(&ins->node, parent, node);
rb_insert_color(&ins->node, pos_root);
}
/*
* Record and combine logged items from log roots for merging with the
* writable destination root. The caller is responsible for trimming
* the range if it gets too large or if the key range shrinks.
* Find the next item in the merge_pos root in the caller's range and
* insert it into the rbtree sorted by key and version so that merging
* can find the next newest item at the front of the rbtree. We free
* the mpos on error or if there are no more items in the range.
*/
static int merge_read_item(struct super_block *sb, struct scoutfs_key *key, u64 seq, u8 flags,
void *val, int val_len, void *arg)
static int reset_mpos(struct super_block *sb, struct rb_root *pos_root, struct merge_pos *mpos,
struct scoutfs_key *start, struct scoutfs_key *end)
{
struct merged_range *rng = arg;
struct merged_item *mitem;
struct merged_item *found;
struct rb_node *parent;
struct rb_node **link;
int ret;
struct scoutfs_btree_item *item;
struct scoutfs_avl_node *next;
struct btree_walk_key_range kr;
struct scoutfs_key walk_key;
int ret = 0;
found = find_mitem(&rng->root, key, &parent, &link);
if (found) {
ret = scoutfs_forest_combine_deltas(key, found->val, found->val_len, val, val_len);
if (ret < 0)
goto out;
if (ret > 0) {
if (ret == SCOUTFS_DELTA_COMBINED) {
scoutfs_inc_counter(sb, btree_merge_delta_combined);
} else if (ret == SCOUTFS_DELTA_COMBINED_NULL) {
scoutfs_inc_counter(sb, btree_merge_delta_null);
free_mitem(rng, found);
}
ret = 0;
goto out;
}
if (found->seq >= seq) {
ret = 0;
goto out;
}
/* always erase before freeing or inserting */
if (!RB_EMPTY_NODE(&mpos->node)) {
rb_erase(&mpos->node, pos_root);
RB_CLEAR_NODE(&mpos->node);
}
mitem = kmalloc(offsetof(struct merged_item, val[val_len]), GFP_NOFS);
if (!mitem) {
ret = -ENOMEM;
/*
* advance to next item via the avl tree. The caller's pos is
* only ever incremented past the last key so we can use next to
* iterate rather than using search to skip past multiple items.
*/
if (mpos->avl)
mpos->avl = scoutfs_avl_next(&mpos->bt->item_root, mpos->avl);
/* find the next leaf with the key if we run out of items */
walk_key = *start;
while (!mpos->avl && !scoutfs_key_is_zeros(&walk_key)) {
scoutfs_block_put(sb, mpos->bl);
mpos->bl = NULL;
ret = btree_walk(sb, NULL, NULL, mpos->root, BTW_NEXT, &walk_key,
0, &mpos->bl, &kr, NULL);
if (ret < 0) {
if (ret == -ENOENT)
ret = 0;
free_mpos(sb, mpos);
goto out;
}
mpos->bt = mpos->bl->data;
mpos->avl = scoutfs_avl_search(&mpos->bt->item_root, cmp_key_item,
start, NULL, NULL, &next, NULL) ?: next;
if (mpos->avl == NULL)
walk_key = kr.iter_next;
}
/* see if we're out of items within the range */
item = node_item(mpos->avl);
if (!item || scoutfs_key_compare(item_key(item), end) > 0) {
free_mpos(sb, mpos);
ret = 0;
goto out;
}
mitem->key = *key;
mitem->seq = seq;
mitem->flags = flags;
mitem->val_len = val_len;
if (val_len)
memcpy(mitem->val, val, val_len);
if (found) {
replace_mitem(rng, found, mitem);
free_mitem(rng, found);
} else {
insert_mitem(rng, mitem, parent, link);
}
/* insert the next item within range at its version */
mpos->key = item_key(item);
mpos->seq = le64_to_cpu(item->seq);
mpos->flags = item->flags;
mpos->val_len = item_val_len(item);
mpos->val = item_val(mpos->bt, item);
insert_mpos(pos_root, mpos);
ret = 0;
out:
return ret;
}
/*
* Read a range of merged items. The caller has set the key bounds of
* the range. We read a merge window's worth of items from blocks in
* each input btree.
* The caller has reset all the merge positions for all the input log
* btree roots and wants the next logged item it should try and merge
* with the items in the fs_root.
*
* The caller can only use the smallest range that overlaps with all the
* blocks that we read. We start reading from the range's start key so
* it will always be present and we don't need to adjust it. The final
* block we read from each input might not cover the range's end so it
* needs to be adjusted.
*
* The end range can also shrink if we have to drop items because the
* items exceeded the merge window size.
* We look ahead in the logged item stream to see if we should merge any
* older logged delta items into one result for the caller. We also
* take this opportunity to skip and reset the mpos for any older
* versions of the first item.
*/
static int read_merged_range(struct super_block *sb, struct merged_range *rng,
struct list_head *inputs, int merge_window)
static int next_resolved_mpos(struct super_block *sb, struct rb_root *pos_root,
struct scoutfs_key *end, struct merge_pos **mpos_ret)
{
struct scoutfs_btree_root_head *rhead;
struct scoutfs_key start;
struct scoutfs_key end;
struct merge_pos *mpos;
struct merge_pos *next;
struct scoutfs_key key;
int ret = 0;
int i;
list_for_each_entry(rhead, inputs, head) {
key = rng->start;
while ((mpos = first_mpos(pos_root)) && (next = next_mpos(mpos)) &&
!scoutfs_key_compare(mpos->key, next->key)) {
for (i = 0; i < merge_window; i += SCOUTFS_BLOCK_LG_SIZE) {
start = key;
end = rng->end;
ret = scoutfs_btree_read_items(sb, &rhead->root, &key, &start, &end,
merge_read_item, rng);
ret = scoutfs_forest_combine_deltas(mpos->key, mpos->val, mpos->val_len,
next->val, next->val_len);
if (ret < 0)
break;
/* reset advances to the next item */
key = *mpos->key;
scoutfs_key_inc(&key);
/* always skip next combined or older version */
ret = reset_mpos(sb, pos_root, next, &key, end);
if (ret < 0)
break;
if (ret == SCOUTFS_DELTA_COMBINED) {
scoutfs_inc_counter(sb, btree_merge_delta_combined);
} else if (ret == SCOUTFS_DELTA_COMBINED_NULL) {
scoutfs_inc_counter(sb, btree_merge_delta_null);
/* if merging resulted in no info, skip current */
ret = reset_mpos(sb, pos_root, mpos, &key, end);
if (ret < 0)
goto out;
if (scoutfs_key_compare(&end, &rng->end) >= 0)
break;
key = end;
scoutfs_key_inc(&key);
}
if (scoutfs_key_compare(&end, &rng->end) < 0) {
rng->end = end;
trim_range_end(rng);
}
if (rng->size > merge_window)
trim_range_size(rng, merge_window);
}
trace_scoutfs_btree_merge_read_range(sb, &rng->start, &rng->end, rng->size);
ret = 0;
out:
*mpos_ret = mpos;
return ret;
}
@@ -2292,13 +2226,6 @@ out:
* to allocators running low or needing to join/split the parent.
* *next_ret is set to the next key which hasn't been merged so that the
* caller can retry with a new allocator and subtree.
*
* The number of input roots can be immense. The merge_window specifies
* the size of the set of merged items that we'll maintain as we iterate
* over all the input roots. Once we've merged items into the window
* from all the input roots the merged input items are then merged to
* the writable destination root. It may take multiple passes of
* windows of merged items to cover the input key range.
*/
int scoutfs_btree_merge(struct super_block *sb,
struct scoutfs_alloc *alloc,
@@ -2308,16 +2235,18 @@ int scoutfs_btree_merge(struct super_block *sb,
struct scoutfs_key *next_ret,
struct scoutfs_btree_root *root,
struct list_head *inputs,
bool subtree, int dirty_limit, int alloc_low, int merge_window)
bool subtree, int dirty_limit, int alloc_low)
{
struct scoutfs_btree_root_head *rhead;
struct rb_root pos_root = RB_ROOT;
struct scoutfs_btree_item *item;
struct scoutfs_btree_block *bt;
struct scoutfs_block *bl = NULL;
struct btree_walk_key_range kr;
struct scoutfs_avl_node *par;
struct merged_item *mitem;
struct merged_item *tmp;
struct merged_range rng;
struct scoutfs_key next;
struct merge_pos *mpos;
struct merge_pos *tmp;
int walk_val_len;
int walk_flags;
bool is_del;
@@ -2328,59 +2257,49 @@ int scoutfs_btree_merge(struct super_block *sb,
trace_scoutfs_btree_merge(sb, root, start, end);
scoutfs_inc_counter(sb, btree_merge);
list_for_each_entry(rhead, inputs, head) {
mpos = kzalloc(sizeof(*mpos), GFP_NOFS);
if (!mpos) {
ret = -ENOMEM;
goto out;
}
RB_CLEAR_NODE(&mpos->node);
mpos->root = &rhead->root;
ret = reset_mpos(sb, &pos_root, mpos, start, end);
if (ret < 0)
goto out;
}
walk_flags = BTW_DIRTY;
if (subtree)
walk_flags |= BTW_SUBTREE;
walk_val_len = 0;
rng.start = *start;
rng.end = *end;
rng.root = RB_ROOT;
rng.size = 0;
ret = read_merged_range(sb, &rng, inputs, merge_window);
if (ret < 0)
goto out;
for (;;) {
/* read next window as it empties (and it is possible to read an empty range) */
mitem = first_mitem(&rng.root);
if (!mitem) {
/* done if the read range hit the end */
if (scoutfs_key_compare(&rng.end, end) >= 0)
break;
/* read next batch of merged items */
rng.start = rng.end;
scoutfs_key_inc(&rng.start);
rng.end = *end;
ret = read_merged_range(sb, &rng, inputs, merge_window);
if (ret < 0)
break;
continue;
}
while ((ret = next_resolved_mpos(sb, &pos_root, end, &mpos)) == 0 && mpos) {
if (scoutfs_block_writer_dirty_bytes(sb, wri) >= dirty_limit) {
scoutfs_inc_counter(sb, btree_merge_dirty_limit);
ret = -ERANGE;
*next_ret = mitem->key;
*next_ret = *mpos->key;
goto out;
}
if (scoutfs_alloc_meta_low(sb, alloc, alloc_low)) {
scoutfs_inc_counter(sb, btree_merge_alloc_low);
ret = -ERANGE;
*next_ret = mitem->key;
*next_ret = *mpos->key;
goto out;
}
scoutfs_block_put(sb, bl);
bl = NULL;
ret = btree_walk(sb, alloc, wri, root, walk_flags,
&mitem->key, walk_val_len, &bl, &kr, NULL);
mpos->key, walk_val_len, &bl, &kr, NULL);
if (ret < 0) {
if (ret == -ERANGE)
*next_ret = mitem->key;
*next_ret = *mpos->key;
goto out;
}
bt = bl->data;
@@ -2392,21 +2311,22 @@ int scoutfs_btree_merge(struct super_block *sb,
continue;
}
while (mitem) {
while ((ret = next_resolved_mpos(sb, &pos_root, end, &mpos)) == 0 && mpos) {
/* walk to new leaf if we exceed parent ref key */
if (scoutfs_key_compare(&mitem->key, &kr.end) > 0)
if (scoutfs_key_compare(mpos->key, &kr.end) > 0)
break;
/* see if there's an existing item */
item = leaf_item_hash_search(sb, bt, &mitem->key);
is_del = !!(mitem->flags & SCOUTFS_ITEM_FLAG_DELETION);
item = leaf_item_hash_search(sb, bt, mpos->key);
is_del = !!(mpos->flags & SCOUTFS_ITEM_FLAG_DELETION);
/* see if we're merging delta items */
if (item && !is_del)
delta = scoutfs_forest_combine_deltas(&mitem->key,
delta = scoutfs_forest_combine_deltas(mpos->key,
item_val(bt, item),
item_val_len(item),
mitem->val, mitem->val_len);
mpos->val, mpos->val_len);
else
delta = 0;
if (delta < 0) {
@@ -2418,38 +2338,40 @@ int scoutfs_btree_merge(struct super_block *sb,
scoutfs_inc_counter(sb, btree_merge_delta_null);
}
trace_scoutfs_btree_merge_items(sb, &mitem->key, mitem->val_len,
trace_scoutfs_btree_merge_items(sb, mpos->root,
mpos->key, mpos->val_len,
item ? root : NULL,
item ? item_key(item) : NULL,
item ? item_val_len(item) : 0, is_del);
/* rewalk and split if ins/update needs room */
if (!is_del && !delta && !mid_free_item_room(bt, mitem->val_len)) {
if (!is_del && !delta && !mid_free_item_room(bt, mpos->val_len)) {
walk_flags |= BTW_INSERT;
walk_val_len = mitem->val_len;
walk_val_len = mpos->val_len;
break;
}
/* insert missing non-deletion merge items */
if (!item && !is_del) {
scoutfs_avl_search(&bt->item_root, cmp_key_item, &mitem->key,
scoutfs_avl_search(&bt->item_root,
cmp_key_item, mpos->key,
&cmp, &par, NULL, NULL);
create_item(bt, &mitem->key, mitem->seq, mitem->flags,
mitem->val, mitem->val_len, par, cmp);
create_item(bt, mpos->key, mpos->seq, mpos->flags,
mpos->val, mpos->val_len, par, cmp);
scoutfs_inc_counter(sb, btree_merge_insert);
}
/* update existing items */
if (item && !is_del && !delta) {
item->seq = cpu_to_le64(mitem->seq);
item->flags = mitem->flags;
update_item_value(bt, item, mitem->val, mitem->val_len);
item->seq = cpu_to_le64(mpos->seq);
item->flags = mpos->flags;
update_item_value(bt, item, mpos->val, mpos->val_len);
scoutfs_inc_counter(sb, btree_merge_update);
}
/* update combined delta item seq */
if (delta == SCOUTFS_DELTA_COMBINED) {
item->seq = cpu_to_le64(mitem->seq);
item->seq = cpu_to_le64(mpos->seq);
}
/*
@@ -2481,18 +2403,21 @@ int scoutfs_btree_merge(struct super_block *sb,
walk_flags &= ~(BTW_INSERT | BTW_DELETE);
walk_val_len = 0;
/* finished with this merged item */
tmp = mitem;
mitem = next_mitem(mitem);
free_mitem(&rng, tmp);
/* finished with this key, skip any older items */
next = *mpos->key;
scoutfs_key_inc(&next);
ret = reset_mpos(sb, &pos_root, mpos, &next, end);
if (ret < 0)
goto out;
}
}
ret = 0;
out:
scoutfs_block_put(sb, bl);
rbtree_postorder_for_each_entry_safe(mitem, tmp, &rng.root, node)
free_mitem(&rng, mitem);
rbtree_postorder_for_each_entry_safe(mpos, tmp, &pos_root, node) {
free_mpos(sb, mpos);
}
return ret;
}
+1 -1
View File
@@ -119,7 +119,7 @@ int scoutfs_btree_merge(struct super_block *sb,
struct scoutfs_key *next_ret,
struct scoutfs_btree_root *root,
struct list_head *input_list,
bool subtree, int dirty_limit, int alloc_low, int merge_window);
bool subtree, int dirty_limit, int alloc_low);
int scoutfs_btree_free_blocks(struct super_block *sb,
struct scoutfs_alloc *alloc,
+3 -1
View File
@@ -145,7 +145,6 @@
EXPAND_COUNTER(lock_shrink_work) \
EXPAND_COUNTER(lock_unlock) \
EXPAND_COUNTER(lock_wait) \
EXPAND_COUNTER(log_merge_wait_timeout) \
EXPAND_COUNTER(net_dropped_response) \
EXPAND_COUNTER(net_send_bytes) \
EXPAND_COUNTER(net_send_error) \
@@ -199,7 +198,10 @@
EXPAND_COUNTER(srch_read_stale) \
EXPAND_COUNTER(statfs) \
EXPAND_COUNTER(totl_read_copied) \
EXPAND_COUNTER(totl_read_finalized) \
EXPAND_COUNTER(totl_read_fs) \
EXPAND_COUNTER(totl_read_item) \
EXPAND_COUNTER(totl_read_logged) \
EXPAND_COUNTER(trans_commit_data_alloc_low) \
EXPAND_COUNTER(trans_commit_dirty_meta_full) \
EXPAND_COUNTER(trans_commit_fsync) \
+53 -16
View File
@@ -586,12 +586,6 @@ static int scoutfs_get_block(struct inode *inode, sector_t iblock,
goto out;
}
if (create && !si->staging) {
ret = scoutfs_inode_check_retention(inode);
if (ret < 0)
goto out;
}
/* convert unwritten to written, could be staging */
if (create && ext.map && (ext.flags & SEF_UNWRITTEN)) {
un.start = iblock;
@@ -1110,10 +1104,6 @@ long scoutfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
while(iblock <= last) {
ret = scoutfs_quota_check_data(sb, inode);
if (ret)
goto out_extent;
ret = scoutfs_inode_index_lock_hold(inode, &ind_locks, false, true);
if (ret)
goto out_extent;
@@ -1165,9 +1155,9 @@ out:
* on regular files with no data extents. It's used to restore a file
* with an offline extent which can then trigger staging.
*
* The caller must take care of cluster locking, transactions, inode
* updates, and index updates (so that they can atomically make this
* change along with other metadata changes).
* The caller has taken care of locking the inode. We're updating the
* inode offline count as we create the offline extent so we take care
* of the index locking, updating, and transaction.
*/
int scoutfs_data_init_offline_extent(struct inode *inode, u64 size,
struct scoutfs_lock *lock)
@@ -1181,6 +1171,7 @@ int scoutfs_data_init_offline_extent(struct inode *inode, u64 size,
.lock = lock,
};
const u64 count = DIV_ROUND_UP(size, SCOUTFS_BLOCK_SM_SIZE);
LIST_HEAD(ind_locks);
u64 on;
u64 off;
int ret;
@@ -1193,10 +1184,28 @@ int scoutfs_data_init_offline_extent(struct inode *inode, u64 size,
goto out;
}
/* we're updating meta_seq with offline block count */
ret = scoutfs_inode_index_lock_hold(inode, &ind_locks, false, true);
if (ret < 0)
goto out;
ret = scoutfs_dirty_inode_item(inode, lock);
if (ret < 0)
goto unlock;
down_write(&si->extent_sem);
ret = scoutfs_ext_insert(sb, &data_ext_ops, &args,
0, count, 0, SEF_OFFLINE);
up_write(&si->extent_sem);
if (ret < 0)
goto unlock;
scoutfs_update_inode_item(inode, lock, &ind_locks);
unlock:
scoutfs_release_trans(sb);
scoutfs_inode_index_unlock(sb, &ind_locks);
ret = 0;
out:
return ret;
}
@@ -1264,9 +1273,6 @@ int scoutfs_data_move_blocks(struct inode *from, u64 from_off,
if (ret)
goto out;
if (!is_stage && (ret = scoutfs_inode_check_retention(to)))
goto out;
if ((from_off & SCOUTFS_BLOCK_SM_MASK) ||
(to_off & SCOUTFS_BLOCK_SM_MASK) ||
((byte_len & SCOUTFS_BLOCK_SM_MASK) &&
@@ -1801,6 +1807,37 @@ int scoutfs_data_wait_check_iov(struct inode *inode, const struct iovec *iov,
return ret;
}
int scoutfs_data_wait_check_iter(struct inode *inode, loff_t pos, struct iov_iter *iter,
u8 sef, u8 op, struct scoutfs_data_wait *dw,
struct scoutfs_lock *lock)
{
size_t count = iov_iter_count(iter);
size_t off = iter->iov_offset;
const struct iovec *iov;
size_t len;
int ret = 0;
for (iov = iter->iov; count > 0; iov++) {
len = iov->iov_len - off;
if (len == 0)
continue;
/* aren't we waiting on too much data here ? */
ret = scoutfs_data_wait_check(inode, pos, len,
sef, op, dw, lock);
if (ret != 0)
break;
pos += len;
count -= len;
off = 0;
}
return ret;
}
int scoutfs_data_wait(struct inode *inode, struct scoutfs_data_wait *dw)
{
DECLARE_DATA_WAIT_ROOT(inode->i_sb, rt);
+3
View File
@@ -65,6 +65,9 @@ int scoutfs_data_wait_check_iov(struct inode *inode, const struct iovec *iov,
unsigned long nr_segs, loff_t pos, u8 sef,
u8 op, struct scoutfs_data_wait *ow,
struct scoutfs_lock *lock);
int scoutfs_data_wait_check_iter(struct inode *inode, loff_t pos, struct iov_iter *iter,
u8 sef, u8 op, struct scoutfs_data_wait *ow,
struct scoutfs_lock *lock);
bool scoutfs_data_wait_found(struct scoutfs_data_wait *ow);
int scoutfs_data_wait(struct inode *inode,
struct scoutfs_data_wait *ow);
+66 -35
View File
@@ -34,7 +34,6 @@
#include "forest.h"
#include "acl.h"
#include "counters.h"
#include "quota.h"
#include "scoutfs_trace.h"
/*
@@ -652,10 +651,6 @@ static struct inode *lock_hold_create(struct inode *dir, struct dentry *dentry,
if (ret)
goto out_unlock;
ret = scoutfs_quota_check_inode(sb, dir);
if (ret)
goto out_unlock;
if (orph_lock) {
ret = scoutfs_lock_orphan(sb, SCOUTFS_LOCK_WRITE_ONLY, 0, ino, orph_lock);
if (ret < 0)
@@ -677,8 +672,6 @@ retry:
if (ret < 0)
goto out;
scoutfs_inode_set_proj(inode, scoutfs_inode_get_proj(dir));
ret = scoutfs_dirty_inode_item(dir, *dir_lock);
out:
if (ret)
@@ -933,16 +926,12 @@ static int scoutfs_unlink(struct inode *dir, struct dentry *dentry)
goto unlock;
}
ret = scoutfs_inode_check_retention(inode);
if (ret < 0)
goto unlock;
hash = dirent_name_hash(dentry->d_name.name, dentry->d_name.len);
ret = lookup_dirent(sb, scoutfs_ino(dir), dentry->d_name.name, dentry->d_name.len, hash,
&dent, dir_lock);
if (ret < 0)
goto unlock;
goto out;
if (should_orphan(inode)) {
ret = scoutfs_lock_orphan(sb, SCOUTFS_LOCK_WRITE_ONLY, 0, scoutfs_ino(inode),
@@ -1069,15 +1058,16 @@ static int symlink_item_ops(struct super_block *sb, enum symlink_ops op, u64 ino
return ret;
}
#ifdef KC_LINUX_HAVE_RHEL_IOPS_WRAPPER
/*
* Fill a buffer with the null terminated symlink, and return it
* so callers can free it once the vfs is done.
* Full a buffer with the null terminated symlink, point nd at it, and
* return it so put_link can free it once the vfs is done.
*
* We chose to pay the runtime cost of per-call allocation and copy
* overhead instead of wiring up symlinks to the page cache, storing
* each small link in a full page, and later having to reclaim them.
*/
static void *scoutfs_get_link_target(struct dentry *dentry)
static void *scoutfs_follow_link(struct dentry *dentry, struct nameidata *nd)
{
struct inode *inode = dentry->d_inode;
struct super_block *sb = inode->i_sb;
@@ -1136,20 +1126,10 @@ out:
if (ret < 0) {
kfree(path);
path = ERR_PTR(ret);
}
scoutfs_unlock(sb, inode_lock, SCOUTFS_LOCK_READ);
return path;
}
#ifdef KC_LINUX_HAVE_RHEL_IOPS_WRAPPER
static void *scoutfs_follow_link(struct dentry *dentry, struct nameidata *nd)
{
char *path;
path = scoutfs_get_link_target(dentry);
if (!IS_ERR_OR_NULL(path))
} else {
nd_set_link(nd, path);
}
scoutfs_unlock(sb, inode_lock, SCOUTFS_LOCK_READ);
return path;
}
@@ -1162,12 +1142,67 @@ static void scoutfs_put_link(struct dentry *dentry, struct nameidata *nd,
#else
static const char *scoutfs_get_link(struct dentry *dentry, struct inode *inode, struct delayed_call *done)
{
char *path;
struct super_block *sb = inode->i_sb;
struct scoutfs_lock *inode_lock = NULL;
char *path = NULL;
loff_t size;
int ret;
path = scoutfs_get_link_target(dentry);
if (!IS_ERR_OR_NULL(path))
ret = scoutfs_lock_inode(sb, SCOUTFS_LOCK_READ,
SCOUTFS_LKF_REFRESH_INODE, inode, &inode_lock);
if (ret)
return ERR_PTR(ret);
size = i_size_read(inode);
if (size == 0 || size > SCOUTFS_SYMLINK_MAX_SIZE) {
scoutfs_corruption(sb, SC_SYMLINK_INODE_SIZE,
corrupt_symlink_inode_size,
"ino %llu size %llu",
scoutfs_ino(inode), (u64)size);
ret = -EIO;
goto out;
}
/* unlikely, but possible I suppose */
if (size > PATH_MAX) {
ret = -ENAMETOOLONG;
goto out;
}
path = kmalloc(size, GFP_NOFS);
if (!path) {
ret = -ENOMEM;
goto out;
}
ret = symlink_item_ops(sb, SYM_LOOKUP, scoutfs_ino(inode), inode_lock,
path, size);
if (ret == -ENOENT) {
scoutfs_corruption(sb, SC_SYMLINK_MISSING_ITEM,
corrupt_symlink_missing_item,
"ino %llu size %llu", scoutfs_ino(inode),
size);
ret = -EIO;
} else if (ret == 0 && path[size - 1]) {
scoutfs_corruption(sb, SC_SYMLINK_NOT_NULL_TERM,
corrupt_symlink_not_null_term,
"ino %llu last %u",
scoutfs_ino(inode), path[size - 1]);
ret = -EIO;
}
if (ret != -EIO)
set_delayed_call(done, kfree_link, path);
out:
if (ret < 0) {
kfree(path);
path = ERR_PTR(ret);
}
scoutfs_unlock(sb, inode_lock, SCOUTFS_LOCK_READ);
return path;
}
#endif
@@ -1643,10 +1678,6 @@ static int scoutfs_rename_common(struct inode *old_dir,
goto out_unlock;
}
if ((old_inode && (ret = scoutfs_inode_check_retention(old_inode))) ||
(new_inode && (ret = scoutfs_inode_check_retention(new_inode))))
goto out_unlock;
if (should_orphan(new_inode)) {
ret = scoutfs_lock_orphan(sb, SCOUTFS_LOCK_WRITE_ONLY, 0, scoutfs_ino(new_inode),
&orph_lock);
+14 -26
View File
@@ -28,7 +28,6 @@
#include "inode.h"
#include "per_task.h"
#include "omap.h"
#include "quota.h"
#ifdef KC_LINUX_HAVE_FOP_AIO_READ
/*
@@ -109,10 +108,6 @@ retry:
if (ret)
goto out;
ret = scoutfs_inode_check_retention(inode);
if (ret < 0)
goto out;
ret = scoutfs_complete_truncate(inode, scoutfs_inode_lock);
if (ret)
goto out;
@@ -127,10 +122,6 @@ retry:
goto out;
}
ret = scoutfs_quota_check_data(sb, inode);
if (ret)
goto out;
/* XXX: remove SUID bit */
ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);
@@ -180,8 +171,10 @@ retry:
goto out;
if (scoutfs_per_task_add_excl(&si->pt_data_lock, &pt_ent, scoutfs_inode_lock)) {
ret = scoutfs_data_wait_check(inode, iocb->ki_pos, iov_iter_count(to), SEF_OFFLINE,
SCOUTFS_IOC_DWO_READ, &dw, scoutfs_inode_lock);
ret = scoutfs_data_wait_check_iter(inode, iocb->ki_pos, to,
SEF_OFFLINE,
SCOUTFS_IOC_DWO_READ,
&dw, scoutfs_inode_lock);
if (ret != 0)
goto out;
} else {
@@ -212,7 +205,8 @@ ssize_t scoutfs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
struct scoutfs_lock *scoutfs_inode_lock = NULL;
SCOUTFS_DECLARE_PER_TASK_ENTRY(pt_ent);
DECLARE_DATA_WAIT(dw);
ssize_t ret;
int ret;
int written;
retry:
inode_lock(inode);
@@ -225,29 +219,23 @@ retry:
if (ret <= 0)
goto out;
ret = scoutfs_inode_check_retention(inode);
if (ret < 0)
goto out;
ret = scoutfs_complete_truncate(inode, scoutfs_inode_lock);
if (ret)
goto out;
ret = scoutfs_quota_check_data(sb, inode);
if (ret)
goto out;
if (scoutfs_per_task_add_excl(&si->pt_data_lock, &pt_ent, scoutfs_inode_lock)) {
/* data_version is per inode, whole file must be online */
ret = scoutfs_data_wait_check(inode, 0, i_size_read(inode), SEF_OFFLINE,
SCOUTFS_IOC_DWO_WRITE, &dw, scoutfs_inode_lock);
ret = scoutfs_data_wait_check_iter(inode, iocb->ki_pos, from,
SEF_OFFLINE,
SCOUTFS_IOC_DWO_WRITE,
&dw, scoutfs_inode_lock);
if (ret != 0)
goto out;
}
/* XXX: remove SUID bit */
ret = __generic_file_write_iter(iocb, from);
written = __generic_file_write_iter(iocb, from);
out:
scoutfs_per_task_del(&si->pt_data_lock, &pt_ent);
@@ -260,10 +248,10 @@ out:
goto retry;
}
if (ret > 0)
ret = generic_write_sync(iocb, ret);
if (ret > 0 || ret == -EIOCBQUEUED)
ret = generic_write_sync(iocb, written);
return ret;
return written ? written : ret;
}
#endif
+15 -26
View File
@@ -238,16 +238,19 @@ static int forest_read_items(struct super_block *sb, struct scoutfs_key *key, u6
* We return -ESTALE if we hit stale blocks to give the caller a chance
* to reset their state and retry with a newer version of the btrees.
*/
int scoutfs_forest_read_items_roots(struct super_block *sb, struct scoutfs_net_roots *roots,
struct scoutfs_key *key, struct scoutfs_key *bloom_key,
struct scoutfs_key *start, struct scoutfs_key *end,
scoutfs_forest_item_cb cb, void *arg)
int scoutfs_forest_read_items(struct super_block *sb,
struct scoutfs_key *key,
struct scoutfs_key *bloom_key,
struct scoutfs_key *start,
struct scoutfs_key *end,
scoutfs_forest_item_cb cb, void *arg)
{
struct forest_read_items_data rid = {
.cb = cb,
.cb_arg = arg,
};
struct scoutfs_log_trees lt;
struct scoutfs_net_roots roots;
struct scoutfs_bloom_block *bb;
struct forest_bloom_nrs bloom;
SCOUTFS_BTREE_ITEM_REF(iref);
@@ -261,14 +264,18 @@ int scoutfs_forest_read_items_roots(struct super_block *sb, struct scoutfs_net_r
scoutfs_inc_counter(sb, forest_read_items);
calc_bloom_nrs(&bloom, bloom_key);
trace_scoutfs_forest_using_roots(sb, &roots->fs_root, &roots->logs_root);
ret = scoutfs_client_get_roots(sb, &roots);
if (ret)
goto out;
trace_scoutfs_forest_using_roots(sb, &roots.fs_root, &roots.logs_root);
*start = orig_start;
*end = orig_end;
/* start with fs root items */
rid.fic |= FIC_FS_ROOT;
ret = scoutfs_btree_read_items(sb, &roots->fs_root, key, start, end,
ret = scoutfs_btree_read_items(sb, &roots.fs_root, key, start, end,
forest_read_items, &rid);
if (ret < 0)
goto out;
@@ -276,7 +283,7 @@ int scoutfs_forest_read_items_roots(struct super_block *sb, struct scoutfs_net_r
scoutfs_key_init_log_trees(&ltk, 0, 0);
for (;; scoutfs_key_inc(&ltk)) {
ret = scoutfs_btree_next(sb, &roots->logs_root, &ltk, &iref);
ret = scoutfs_btree_next(sb, &roots.logs_root, &ltk, &iref);
if (ret == 0) {
if (iref.val_len == sizeof(lt)) {
ltk = *iref.key;
@@ -333,23 +340,6 @@ out:
return ret;
}
int scoutfs_forest_read_items(struct super_block *sb,
struct scoutfs_key *key,
struct scoutfs_key *bloom_key,
struct scoutfs_key *start,
struct scoutfs_key *end,
scoutfs_forest_item_cb cb, void *arg)
{
struct scoutfs_net_roots roots;
int ret;
ret = scoutfs_client_get_roots(sb, &roots);
if (ret == 0)
ret = scoutfs_forest_read_items_roots(sb, &roots, key, bloom_key, start, end,
cb, arg);
return ret;
}
/*
* If the items are deltas then combine the src with the destination
* value and store the result in the destination.
@@ -731,8 +721,7 @@ static void scoutfs_forest_log_merge_worker(struct work_struct *work)
ret = scoutfs_btree_merge(sb, &alloc, &wri, &req.start, &req.end,
&next, &comp.root, &inputs,
!!(req.flags & cpu_to_le64(SCOUTFS_LOG_MERGE_REQUEST_SUBTREE)),
SCOUTFS_LOG_MERGE_DIRTY_BYTE_LIMIT, 10,
(2 * 1024 * 1024));
SCOUTFS_LOG_MERGE_DIRTY_BYTE_LIMIT, 10);
if (ret == -ERANGE) {
comp.remain = next;
le64_add_cpu(&comp.flags, SCOUTFS_LOG_MERGE_COMP_REMAIN);
-5
View File
@@ -4,7 +4,6 @@
struct scoutfs_alloc;
struct scoutfs_block_writer;
struct scoutfs_block;
struct scoutfs_lock;
#include "btree.h"
@@ -24,10 +23,6 @@ int scoutfs_forest_read_items(struct super_block *sb,
struct scoutfs_key *start,
struct scoutfs_key *end,
scoutfs_forest_item_cb cb, void *arg);
int scoutfs_forest_read_items_roots(struct super_block *sb, struct scoutfs_net_roots *roots,
struct scoutfs_key *key, struct scoutfs_key *bloom_key,
struct scoutfs_key *start, struct scoutfs_key *end,
scoutfs_forest_item_cb cb, void *arg);
int scoutfs_forest_set_bloom_bits(struct super_block *sb,
struct scoutfs_lock *lock);
void scoutfs_forest_set_max_seq(struct super_block *sb, u64 max_seq);
+2 -73
View File
@@ -8,14 +8,9 @@
*/
#define SCOUTFS_FORMAT_VERSION_MIN 1
#define SCOUTFS_FORMAT_VERSION_MIN_STR __stringify(SCOUTFS_FORMAT_VERSION_MIN)
#define SCOUTFS_FORMAT_VERSION_MAX 2
#define SCOUTFS_FORMAT_VERSION_MAX 1
#define SCOUTFS_FORMAT_VERSION_MAX_STR __stringify(SCOUTFS_FORMAT_VERSION_MAX)
#define SCOUTFS_FORMAT_VERSION_FEAT_RETENTION 2
#define SCOUTFS_FORMAT_VERSION_FEAT_PROJECT_ID 2
#define SCOUTFS_FORMAT_VERSION_FEAT_QUOTA 2
#define SCOUTFS_FORMAT_VERSION_FEAT_INDX_TAG 2
/* statfs(2) f_type */
#define SCOUTFS_SUPER_MAGIC 0x554f4353 /* "SCOU" */
@@ -180,10 +175,6 @@ struct scoutfs_key {
#define sko_rid _sk_first
#define sko_ino _sk_second
/* quota rules */
#define skqr_hash _sk_second
#define skqr_coll_nr _sk_third
/* xattr totl */
#define skxt_a _sk_first
#define skxt_b _sk_second
@@ -594,9 +585,7 @@ struct scoutfs_log_merge_freeing {
*/
#define SCOUTFS_INODE_INDEX_ZONE 4
#define SCOUTFS_ORPHAN_ZONE 8
#define SCOUTFS_QUOTA_ZONE 10
#define SCOUTFS_XATTR_TOTL_ZONE 12
#define SCOUTFS_XATTR_INDX_ZONE 14
#define SCOUTFS_FS_ZONE 16
#define SCOUTFS_LOCK_ZONE 20
/* Items only stored in server btrees */
@@ -619,9 +608,6 @@ struct scoutfs_log_merge_freeing {
/* orphan zone, redundant type used for clarity */
#define SCOUTFS_ORPHAN_TYPE 4
/* quota zone */
#define SCOUTFS_QUOTA_RULE_TYPE 4
/* fs zone */
#define SCOUTFS_INODE_TYPE 4
#define SCOUTFS_XATTR_TYPE 8
@@ -675,34 +661,6 @@ struct scoutfs_xattr_totl_val {
__le64 count;
};
#define SQ_RF_TOTL_COUNT (1 << 0)
#define SQ_RF__UNKNOWN (~((1 << 1) - 1))
#define SQ_NS_LITERAL 0
#define SQ_NS_PROJ 1
#define SQ_NS_UID 2
#define SQ_NS_GID 3
#define SQ_NS__NR 4
#define SQ_NS__NR_SELECT (SQ_NS__NR - 1) /* !literal */
#define SQ_NF_SELECT (1 << 0)
#define SQ_NF__UNKNOWN (~((1 << 1) - 1))
#define SQ_OP_INODE 0
#define SQ_OP_DATA 1
#define SQ_OP__NR 2
struct scoutfs_quota_rule_val {
__le64 name_val[3];
__le64 limit;
__u8 prio;
__u8 op;
__u8 rule_flags;
__u8 name_source[3];
__u8 name_flags[3];
__u8 _pad[7];
};
/* XXX does this exist upstream somewhere? */
#define member_sizeof(TYPE, MEMBER) (sizeof(((TYPE *)0)->MEMBER))
@@ -901,38 +859,9 @@ struct scoutfs_inode {
struct scoutfs_timespec ctime;
struct scoutfs_timespec mtime;
struct scoutfs_timespec crtime;
__le64 proj;
};
#define SCOUTFS_INODE_FMT_V1_BYTES offsetof(struct scoutfs_inode, proj)
/*
* There are so few versions that we don't mind doing this work inline
* so that both utils and kernel can share these. Mounting has already
* checked that the format version is within the supported min and max,
* so these functions only deal with size variance within that band.
*/
/* Returns the native written inode size for the given format version, 0 for bad version */
static inline int scoutfs_inode_vers_bytes(__u64 fmt_vers)
{
if (fmt_vers == 1)
return SCOUTFS_INODE_FMT_V1_BYTES;
else
return sizeof(struct scoutfs_inode);
}
/*
* Returns true if bytes is a valid inode size to read from the given
* version. The given version must be greater than the version that
* introduced the size.
*/
static inline int scoutfs_inode_valid_vers_bytes(__u64 fmt_vers, int bytes)
{
return (bytes == sizeof(struct scoutfs_inode) && fmt_vers == SCOUTFS_FORMAT_VERSION_MAX) ||
(bytes == SCOUTFS_INODE_FMT_V1_BYTES);
}
#define SCOUTFS_INO_FLAG_TRUNCATE 0x1
#define SCOUTFS_INO_FLAG_RETENTION 0x2
#define SCOUTFS_INO_FLAG_TRUNCATE 0x1
#define SCOUTFS_ROOT_INO 1
+42 -130
View File
@@ -91,7 +91,7 @@ static void scoutfs_inode_ctor(void *obj)
init_rwsem(&si->extent_sem);
mutex_init(&si->item_mutex);
seqlock_init(&si->seqlock);
seqcount_init(&si->seqcount);
si->staging = false;
scoutfs_per_task_init(&si->pt_data_lock);
atomic64_set(&si->data_waitq.changed, 0);
@@ -250,7 +250,7 @@ static void set_item_info(struct scoutfs_inode_info *si,
set_item_major(si, SCOUTFS_INODE_INDEX_DATA_SEQ_TYPE, sinode->data_seq);
}
static void load_inode(struct inode *inode, struct scoutfs_inode *cinode, int inode_bytes)
static void load_inode(struct inode *inode, struct scoutfs_inode *cinode)
{
struct scoutfs_inode_info *si = SCOUTFS_I(inode);
@@ -278,7 +278,6 @@ static void load_inode(struct inode *inode, struct scoutfs_inode *cinode, int in
si->flags = le32_to_cpu(cinode->flags);
si->crtime.tv_sec = le64_to_cpu(cinode->crtime.sec);
si->crtime.tv_nsec = le32_to_cpu(cinode->crtime.nsec);
si->proj = le64_to_cpu(cinode->proj);
/*
* i_blocks is initialized from online and offline and is then
@@ -299,24 +298,6 @@ void scoutfs_inode_init_key(struct scoutfs_key *key, u64 ino)
};
}
/*
* Read an inode item into the caller's buffer and return the size that
* we read. Returns errors if the inode size is unsupported or doesn't
* make sense for the format version.
*/
static int lookup_inode_item(struct super_block *sb, struct scoutfs_key *key,
struct scoutfs_inode *sinode, struct scoutfs_lock *lock)
{
struct scoutfs_sb_info *sbi = SCOUTFS_SB(sb);
int ret;
ret = scoutfs_item_lookup_smaller_zero(sb, key, sinode, sizeof(struct scoutfs_inode), lock);
if (ret >= 0 && !scoutfs_inode_valid_vers_bytes(sbi->fmt_vers, ret))
return -EIO;
return ret;
}
/*
* Refresh the vfs inode fields if the lock indicates that the current
* contents could be stale.
@@ -352,12 +333,12 @@ int scoutfs_inode_refresh(struct inode *inode, struct scoutfs_lock *lock)
mutex_lock(&si->item_mutex);
if (atomic64_read(&si->last_refreshed) < refresh_gen) {
ret = lookup_inode_item(sb, &key, &sinode, lock);
if (ret > 0) {
load_inode(inode, &sinode, ret);
ret = scoutfs_item_lookup_exact(sb, &key, &sinode,
sizeof(sinode), lock);
if (ret == 0) {
load_inode(inode, &sinode);
atomic64_set(&si->last_refreshed, refresh_gen);
scoutfs_lock_add_coverage(sb, lock, &si->ino_lock_cov);
ret = 0;
}
} else {
ret = 0;
@@ -505,10 +486,6 @@ retry:
if (ret)
goto out;
ret = scoutfs_inode_check_retention(inode);
if (ret < 0)
goto out;
attr_size = (attr->ia_valid & ATTR_SIZE) ? attr->ia_size :
i_size_read(inode);
@@ -589,9 +566,11 @@ static void set_trans_seq(struct inode *inode, u64 *seq)
struct scoutfs_sb_info *sbi = SCOUTFS_SB(sb);
if (*seq != sbi->trans_seq) {
write_seqlock(&si->seqlock);
preempt_disable();
write_seqcount_begin(&si->seqcount);
*seq = sbi->trans_seq;
write_sequnlock(&si->seqlock);
write_seqcount_end(&si->seqcount);
preempt_enable();
}
}
@@ -613,18 +592,22 @@ void scoutfs_inode_inc_data_version(struct inode *inode)
{
struct scoutfs_inode_info *si = SCOUTFS_I(inode);
write_seqlock(&si->seqlock);
preempt_disable();
write_seqcount_begin(&si->seqcount);
si->data_version++;
write_sequnlock(&si->seqlock);
write_seqcount_end(&si->seqcount);
preempt_enable();
}
void scoutfs_inode_set_data_version(struct inode *inode, u64 data_version)
{
struct scoutfs_inode_info *si = SCOUTFS_I(inode);
write_seqlock(&si->seqlock);
preempt_disable();
write_seqcount_begin(&si->seqcount);
si->data_version = data_version;
write_sequnlock(&si->seqlock);
write_seqcount_end(&si->seqcount);
preempt_enable();
}
void scoutfs_inode_add_onoff(struct inode *inode, s64 on, s64 off)
@@ -633,7 +616,8 @@ void scoutfs_inode_add_onoff(struct inode *inode, s64 on, s64 off)
if (inode && (on || off)) {
si = SCOUTFS_I(inode);
write_seqlock(&si->seqlock);
preempt_disable();
write_seqcount_begin(&si->seqcount);
/* inode and extents out of sync, bad callers */
if (((s64)si->online_blocks + on < 0) ||
@@ -654,7 +638,8 @@ void scoutfs_inode_add_onoff(struct inode *inode, s64 on, s64 off)
si->online_blocks,
si->offline_blocks);
write_sequnlock(&si->seqlock);
write_seqcount_end(&si->seqcount);
preempt_enable();
}
/* any time offline extents decreased we try and wake waiters */
@@ -662,16 +647,16 @@ void scoutfs_inode_add_onoff(struct inode *inode, s64 on, s64 off)
scoutfs_data_wait_changed(inode);
}
static u64 read_seqlock_u64(struct inode *inode, u64 *val)
static u64 read_seqcount_u64(struct inode *inode, u64 *val)
{
struct scoutfs_inode_info *si = SCOUTFS_I(inode);
unsigned seq;
unsigned int seq;
u64 v;
do {
seq = read_seqbegin(&si->seqlock);
seq = read_seqcount_begin(&si->seqcount);
v = *val;
} while (read_seqretry(&si->seqlock, seq));
} while (read_seqcount_retry(&si->seqcount, seq));
return v;
}
@@ -680,82 +665,33 @@ u64 scoutfs_inode_meta_seq(struct inode *inode)
{
struct scoutfs_inode_info *si = SCOUTFS_I(inode);
return read_seqlock_u64(inode, &si->meta_seq);
return read_seqcount_u64(inode, &si->meta_seq);
}
u64 scoutfs_inode_data_seq(struct inode *inode)
{
struct scoutfs_inode_info *si = SCOUTFS_I(inode);
return read_seqlock_u64(inode, &si->data_seq);
return read_seqcount_u64(inode, &si->data_seq);
}
u64 scoutfs_inode_data_version(struct inode *inode)
{
struct scoutfs_inode_info *si = SCOUTFS_I(inode);
return read_seqlock_u64(inode, &si->data_version);
return read_seqcount_u64(inode, &si->data_version);
}
void scoutfs_inode_get_onoff(struct inode *inode, s64 *on, s64 *off)
{
struct scoutfs_inode_info *si = SCOUTFS_I(inode);
unsigned seq;
unsigned int seq;
do {
seq = read_seqbegin(&si->seqlock);
seq = read_seqcount_begin(&si->seqcount);
*on = SCOUTFS_I(inode)->online_blocks;
*off = SCOUTFS_I(inode)->offline_blocks;
} while (read_seqretry(&si->seqlock, seq));
}
/*
* Get our private scoutfs inode flags, not the vfs i_flags.
*/
u32 scoutfs_inode_get_flags(struct inode *inode)
{
struct scoutfs_inode_info *si = SCOUTFS_I(inode);
unsigned seq;
u32 flags;
do {
seq = read_seqbegin(&si->seqlock);
flags = si->flags;
} while (read_seqretry(&si->seqlock, seq));
return flags;
}
void scoutfs_inode_set_flags(struct inode *inode, u32 and, u32 or)
{
struct scoutfs_inode_info *si = SCOUTFS_I(inode);
write_seqlock(&si->seqlock);
si->flags = (si->flags & and) | or;
write_sequnlock(&si->seqlock);
}
u64 scoutfs_inode_get_proj(struct inode *inode)
{
struct scoutfs_inode_info *si = SCOUTFS_I(inode);
unsigned seq;
u64 proj;
do {
seq = read_seqbegin(&si->seqlock);
proj = si->proj;
} while (read_seqretry(&si->seqlock, seq));
return proj;
}
void scoutfs_inode_set_proj(struct inode *inode, u64 proj)
{
struct scoutfs_inode_info *si = SCOUTFS_I(inode);
write_seqlock(&si->seqlock);
si->proj = proj;
write_sequnlock(&si->seqlock);
} while (read_seqcount_retry(&si->seqcount, seq));
}
static int scoutfs_iget_test(struct inode *inode, void *arg)
@@ -867,7 +803,7 @@ out:
return inode;
}
static void store_inode(struct scoutfs_inode *cinode, struct inode *inode, int inode_bytes)
static void store_inode(struct scoutfs_inode *cinode, struct inode *inode)
{
struct scoutfs_inode_info *si = SCOUTFS_I(inode);
u64 online_blocks;
@@ -903,7 +839,6 @@ static void store_inode(struct scoutfs_inode *cinode, struct inode *inode, int i
cinode->crtime.sec = cpu_to_le64(si->crtime.tv_sec);
cinode->crtime.nsec = cpu_to_le32(si->crtime.tv_nsec);
memset(cinode->crtime.__pad, 0, sizeof(cinode->crtime.__pad));
cinode->proj = cpu_to_le64(si->proj);
}
/*
@@ -927,18 +862,15 @@ static void store_inode(struct scoutfs_inode *cinode, struct inode *inode, int i
int scoutfs_dirty_inode_item(struct inode *inode, struct scoutfs_lock *lock)
{
struct super_block *sb = inode->i_sb;
struct scoutfs_sb_info *sbi = SCOUTFS_SB(sb);
struct scoutfs_inode sinode;
struct scoutfs_key key;
int inode_bytes;
int ret;
inode_bytes = scoutfs_inode_vers_bytes(sbi->fmt_vers);
store_inode(&sinode, inode, inode_bytes);
store_inode(&sinode, inode);
scoutfs_inode_init_key(&key, scoutfs_ino(inode));
ret = scoutfs_item_update(sb, &key, &sinode, inode_bytes, lock);
ret = scoutfs_item_update(sb, &key, &sinode, sizeof(sinode), lock);
if (!ret)
trace_scoutfs_dirty_inode(inode);
return ret;
@@ -1140,11 +1072,9 @@ void scoutfs_update_inode_item(struct inode *inode, struct scoutfs_lock *lock,
{
struct scoutfs_inode_info *si = SCOUTFS_I(inode);
struct super_block *sb = inode->i_sb;
struct scoutfs_sb_info *sbi = SCOUTFS_SB(sb);
const u64 ino = scoutfs_ino(inode);
struct scoutfs_inode sinode;
struct scoutfs_key key;
int inode_bytes;
struct scoutfs_inode sinode;
int ret;
int err;
@@ -1153,17 +1083,15 @@ void scoutfs_update_inode_item(struct inode *inode, struct scoutfs_lock *lock,
/* set the meta version once per trans for any inode updates */
scoutfs_inode_set_meta_seq(inode);
inode_bytes = scoutfs_inode_vers_bytes(sbi->fmt_vers);
/* only race with other inode field stores once */
store_inode(&sinode, inode, inode_bytes);
store_inode(&sinode, inode);
ret = update_indices(sb, si, ino, inode->i_mode, &sinode, lock_list, lock);
BUG_ON(ret);
scoutfs_inode_init_key(&key, ino);
err = scoutfs_item_update(sb, &key, &sinode, inode_bytes, lock);
err = scoutfs_item_update(sb, &key, &sinode, sizeof(sinode), lock);
if (err) {
scoutfs_err(sb, "inode %llu update err %d", ino, err);
BUG_ON(err);
@@ -1531,12 +1459,10 @@ out:
int scoutfs_new_inode(struct super_block *sb, struct inode *dir, umode_t mode, dev_t rdev,
u64 ino, struct scoutfs_lock *lock, struct inode **inode_ret)
{
struct scoutfs_sb_info *sbi = SCOUTFS_SB(sb);
struct scoutfs_inode_info *si;
struct scoutfs_inode sinode;
struct scoutfs_key key;
struct scoutfs_inode sinode;
struct inode *inode;
int inode_bytes;
int ret;
inode = new_inode(sb);
@@ -1552,7 +1478,6 @@ int scoutfs_new_inode(struct super_block *sb, struct inode *dir, umode_t mode, d
si->offline_blocks = 0;
si->next_readdir_pos = SCOUTFS_DIRENT_FIRST_POS;
si->next_xattr_id = 0;
si->proj = 0;
si->have_item = false;
atomic64_set(&si->last_refreshed, lock->refresh_gen);
scoutfs_lock_add_coverage(sb, lock, &si->ino_lock_cov);
@@ -1568,16 +1493,14 @@ int scoutfs_new_inode(struct super_block *sb, struct inode *dir, umode_t mode, d
inode->i_rdev = rdev;
set_inode_ops(inode);
inode_bytes = scoutfs_inode_vers_bytes(sbi->fmt_vers);
store_inode(&sinode, inode, inode_bytes);
store_inode(&sinode, inode);
scoutfs_inode_init_key(&key, scoutfs_ino(inode));
ret = scoutfs_omap_set(sb, ino);
if (ret < 0)
goto out;
ret = scoutfs_item_create(sb, &key, &sinode, inode_bytes, lock);
ret = scoutfs_item_create(sb, &key, &sinode, sizeof(sinode), lock);
if (ret < 0)
scoutfs_omap_clear(sb, ino);
out:
@@ -1831,7 +1754,7 @@ static int try_delete_inode_items(struct super_block *sb, u64 ino)
}
scoutfs_inode_init_key(&key, ino);
ret = lookup_inode_item(sb, &key, &sinode, lock);
ret = scoutfs_item_lookup_exact(sb, &key, &sinode, sizeof(sinode), lock);
if (ret < 0) {
if (ret == -ENOENT)
ret = 0;
@@ -2220,17 +2143,6 @@ out:
return ret;
}
/*
* Return an error if the inode has the retention flag set and can not
* be modified. This mimics the errno returned by the vfs whan an
* inode's immutable flag is set. The flag won't be set on older format
* versions so we don't check the mounted format version here.
*/
int scoutfs_inode_check_retention(struct inode *inode)
{
return (scoutfs_inode_get_flags(inode) & SCOUTFS_INO_FLAG_RETENTION) ? -EPERM : 0;
}
int scoutfs_inode_setup(struct super_block *sb)
{
struct scoutfs_sb_info *sbi = SCOUTFS_SB(sb);
+1 -9
View File
@@ -21,7 +21,6 @@ struct scoutfs_inode_info {
u64 data_version;
u64 online_blocks;
u64 offline_blocks;
u64 proj;
u32 flags;
struct kc_timespec crtime;
@@ -48,7 +47,7 @@ struct scoutfs_inode_info {
atomic64_t last_refreshed;
/* initialized once for slab object */
seqlock_t seqlock;
seqcount_t seqcount;
bool staging; /* holder of i_mutex is staging */
struct scoutfs_per_task pt_data_lock;
struct scoutfs_data_waitq data_waitq;
@@ -121,15 +120,8 @@ u64 scoutfs_inode_meta_seq(struct inode *inode);
u64 scoutfs_inode_data_seq(struct inode *inode);
u64 scoutfs_inode_data_version(struct inode *inode);
void scoutfs_inode_get_onoff(struct inode *inode, s64 *on, s64 *off);
u32 scoutfs_inode_get_flags(struct inode *inode);
void scoutfs_inode_set_flags(struct inode *inode, u32 and, u32 or);
u64 scoutfs_inode_get_proj(struct inode *inode);
void scoutfs_inode_set_proj(struct inode *inode, u64 proj);
int scoutfs_complete_truncate(struct inode *inode, struct scoutfs_lock *lock);
int scoutfs_inode_check_retention(struct inode *inode);
int scoutfs_inode_refresh(struct inode *inode, struct scoutfs_lock *lock);
#ifdef KC_LINUX_HAVE_RHEL_IOPS_WRAPPER
int scoutfs_getattr(struct vfsmount *mnt, struct dentry *dentry,
+293 -386
View File
@@ -42,10 +42,6 @@
#include "alloc.h"
#include "server.h"
#include "counters.h"
#include "attr_x.h"
#include "totl.h"
#include "wkic.h"
#include "quota.h"
#include "scoutfs_trace.h"
/*
@@ -549,41 +545,20 @@ out:
static long scoutfs_ioc_stat_more(struct file *file, unsigned long arg)
{
struct inode *inode = file_inode(file);
struct scoutfs_ioctl_inode_attr_x *iax = NULL;
struct scoutfs_ioctl_stat_more *stm = NULL;
int ret;
struct scoutfs_inode_info *si = SCOUTFS_I(inode);
struct scoutfs_ioctl_stat_more stm;
iax = kmalloc(sizeof(struct scoutfs_ioctl_inode_attr_x), GFP_KERNEL);
stm = kmalloc(sizeof(struct scoutfs_ioctl_stat_more), GFP_KERNEL);
if (!iax || !stm) {
ret = -ENOMEM;
goto out;
}
stm.meta_seq = scoutfs_inode_meta_seq(inode);
stm.data_seq = scoutfs_inode_data_seq(inode);
stm.data_version = scoutfs_inode_data_version(inode);
scoutfs_inode_get_onoff(inode, &stm.online_blocks, &stm.offline_blocks);
stm.crtime_sec = si->crtime.tv_sec;
stm.crtime_nsec = si->crtime.tv_nsec;
iax->x_mask = SCOUTFS_IOC_IAX_META_SEQ | SCOUTFS_IOC_IAX_DATA_SEQ |
SCOUTFS_IOC_IAX_DATA_VERSION | SCOUTFS_IOC_IAX_ONLINE_BLOCKS |
SCOUTFS_IOC_IAX_OFFLINE_BLOCKS | SCOUTFS_IOC_IAX_CRTIME;
iax->x_flags = 0;
ret = scoutfs_get_attr_x(inode, iax);
if (ret < 0)
goto out;
if (copy_to_user((void __user *)arg, &stm, sizeof(stm)))
return -EFAULT;
stm->meta_seq = iax->meta_seq;
stm->data_seq = iax->data_seq;
stm->data_version = iax->data_version;
stm->online_blocks = iax->online_blocks;
stm->offline_blocks = iax->offline_blocks;
stm->crtime_sec = iax->crtime_sec;
stm->crtime_nsec = iax->crtime_nsec;
if (copy_to_user((void __user *)arg, stm, sizeof(struct scoutfs_ioctl_stat_more)))
ret = -EFAULT;
else
ret = 0;
out:
kfree(iax);
kfree(stm);
return ret;
return 0;
}
static bool inc_wrapped(u64 *ino, u64 *iblock)
@@ -640,19 +615,24 @@ static long scoutfs_ioc_data_waiting(struct file *file, unsigned long arg)
* This is used when restoring files, it lets the caller set all the
* inode attributes which are otherwise unreachable. Changing the file
* size can only be done for regular files with a data_version of 0.
*
* We unconditionally fill the iax attributes from the sm set and let
* set_attr_x check them.
*/
static long scoutfs_ioc_setattr_more(struct file *file, unsigned long arg)
{
struct inode *inode = file_inode(file);
struct inode *inode = file->f_inode;
struct scoutfs_inode_info *si = SCOUTFS_I(inode);
struct super_block *sb = inode->i_sb;
struct scoutfs_ioctl_setattr_more __user *usm = (void __user *)arg;
struct scoutfs_ioctl_inode_attr_x *iax = NULL;
struct scoutfs_ioctl_setattr_more sm;
struct scoutfs_lock *lock = NULL;
LIST_HEAD(ind_locks);
bool set_data_seq;
int ret;
if (!capable(CAP_SYS_ADMIN)) {
ret = -EPERM;
goto out;
}
if (!(file->f_mode & FMODE_WRITE)) {
ret = -EBADF;
goto out;
@@ -663,38 +643,65 @@ static long scoutfs_ioc_setattr_more(struct file *file, unsigned long arg)
goto out;
}
if (sm.flags & SCOUTFS_IOC_SETATTR_MORE_UNKNOWN) {
if ((sm.i_size > 0 && sm.data_version == 0) ||
((sm.flags & SCOUTFS_IOC_SETATTR_MORE_OFFLINE) && !sm.i_size) ||
(sm.flags & SCOUTFS_IOC_SETATTR_MORE_UNKNOWN)) {
ret = -EINVAL;
goto out;
}
iax = kzalloc(sizeof(struct scoutfs_ioctl_inode_attr_x), GFP_KERNEL);
if (!iax) {
ret = -ENOMEM;
ret = mnt_want_write_file(file);
if (ret)
goto out;
inode_lock(inode);
ret = scoutfs_lock_inode(sb, SCOUTFS_LOCK_WRITE,
SCOUTFS_LKF_REFRESH_INODE, inode, &lock);
if (ret)
goto unlock;
/* can only change size/dv on untouched regular files */
if ((sm.i_size != 0 || sm.data_version != 0) &&
((!S_ISREG(inode->i_mode) ||
scoutfs_inode_data_version(inode) != 0))) {
ret = -EINVAL;
goto unlock;
}
iax->x_mask = SCOUTFS_IOC_IAX_DATA_VERSION | SCOUTFS_IOC_IAX_CTIME |
SCOUTFS_IOC_IAX_CRTIME | SCOUTFS_IOC_IAX_SIZE;
iax->data_version = sm.data_version;
iax->ctime_sec = sm.ctime_sec;
iax->ctime_nsec = sm.ctime_nsec;
iax->crtime_sec = sm.crtime_sec;
iax->crtime_nsec = sm.crtime_nsec;
iax->size = sm.i_size;
/* create offline extents in potentially many transactions */
if (sm.flags & SCOUTFS_IOC_SETATTR_MORE_OFFLINE) {
ret = scoutfs_data_init_offline_extent(inode, sm.i_size, lock);
if (ret)
goto unlock;
}
if (sm.flags & SCOUTFS_IOC_SETATTR_MORE_OFFLINE)
iax->x_flags |= SCOUTFS_IOC_IAX_F_SIZE_OFFLINE;
/* setting only so we don't see 0 data seq with nonzero data_version */
set_data_seq = sm.data_version != 0 ? true : false;
ret = scoutfs_inode_index_lock_hold(inode, &ind_locks, set_data_seq, false);
if (ret)
goto unlock;
ret = mnt_want_write_file(file);
if (ret < 0)
goto out;
if (sm.data_version)
scoutfs_inode_set_data_version(inode, sm.data_version);
if (sm.i_size)
i_size_write(inode, sm.i_size);
inode->i_ctime.tv_sec = sm.ctime_sec;
inode->i_ctime.tv_nsec = sm.ctime_nsec;
si->crtime.tv_sec = sm.crtime_sec;
si->crtime.tv_nsec = sm.crtime_nsec;
ret = scoutfs_set_attr_x(inode, iax);
scoutfs_update_inode_item(inode, lock, &ind_locks);
ret = 0;
scoutfs_release_trans(sb);
unlock:
scoutfs_inode_index_unlock(sb, &ind_locks);
scoutfs_unlock(sb, lock, SCOUTFS_LOCK_WRITE);
inode_unlock(inode);
mnt_drop_write_file(file);
out:
kfree(iax);
return ret;
}
@@ -1028,32 +1035,124 @@ out:
return ret;
}
struct read_xattr_total_iter_cb_args {
struct scoutfs_ioctl_xattr_total *xt;
unsigned int copied;
unsigned int total;
struct xattr_total_entry {
struct rb_node node;
struct scoutfs_ioctl_xattr_total xt;
u64 fs_seq;
u64 fs_total;
u64 fs_count;
u64 fin_seq;
u64 fin_total;
s64 fin_count;
u64 log_seq;
u64 log_total;
s64 log_count;
};
/*
* This is called under an RCU read lock so it can't copy to userspace.
*/
static int read_xattr_total_iter_cb(struct scoutfs_key *key, void *val, unsigned int val_len,
void *cb_arg)
static int cmp_xt_entry_name(const struct xattr_total_entry *a,
const struct xattr_total_entry *b)
{
return scoutfs_cmp_u64s(a->xt.name[0], b->xt.name[0]) ?:
scoutfs_cmp_u64s(a->xt.name[1], b->xt.name[1]) ?:
scoutfs_cmp_u64s(a->xt.name[2], b->xt.name[2]);
}
/*
* Record the contribution of the three classes of logged items we can
* see: the item in the fs_root, items from finalized log btrees, and
* items from active log btrees. Once we have the full set the caller
* can decide which of the items contribute to the total it sends to the
* user.
*/
static int read_xattr_total_item(struct super_block *sb, struct scoutfs_key *key,
u64 seq, u8 flags, void *val, int val_len, int fic, void *arg)
{
struct read_xattr_total_iter_cb_args *cba = cb_arg;
struct scoutfs_xattr_totl_val *tval = val;
struct scoutfs_ioctl_xattr_total *xt = &cba->xt[cba->copied];
struct xattr_total_entry *ent;
struct xattr_total_entry rd;
struct rb_root *root = arg;
struct rb_node *parent;
struct rb_node **node;
int cmp;
xt->name[0] = le64_to_cpu(key->skxt_a);
xt->name[1] = le64_to_cpu(key->skxt_b);
xt->name[2] = le64_to_cpu(key->skxt_c);
xt->total = le64_to_cpu(tval->total);
xt->count = le64_to_cpu(tval->count);
rd.xt.name[0] = le64_to_cpu(key->skxt_a);
rd.xt.name[1] = le64_to_cpu(key->skxt_b);
rd.xt.name[2] = le64_to_cpu(key->skxt_c);
if (++cba->copied < cba->total)
return -EAGAIN;
else
return 0;
/* find entry matching name */
node = &root->rb_node;
parent = NULL;
cmp = -1;
while (*node) {
parent = *node;
ent = container_of(*node, struct xattr_total_entry, node);
/* sort merge items by key then newest to oldest */
cmp = cmp_xt_entry_name(&rd, ent);
if (cmp < 0)
node = &(*node)->rb_left;
else if (cmp > 0)
node = &(*node)->rb_right;
else
break;
}
/* allocate and insert new node if we need to */
if (cmp != 0) {
ent = kzalloc(sizeof(*ent), GFP_KERNEL);
if (!ent)
return -ENOMEM;
memcpy(&ent->xt.name, &rd.xt.name, sizeof(ent->xt.name));
rb_link_node(&ent->node, parent, node);
rb_insert_color(&ent->node, root);
}
if (fic & FIC_FS_ROOT) {
ent->fs_seq = seq;
ent->fs_total = le64_to_cpu(tval->total);
ent->fs_count = le64_to_cpu(tval->count);
} else if (fic & FIC_FINALIZED) {
ent->fin_seq = seq;
ent->fin_total += le64_to_cpu(tval->total);
ent->fin_count += le64_to_cpu(tval->count);
} else {
ent->log_seq = seq;
ent->log_total += le64_to_cpu(tval->total);
ent->log_count += le64_to_cpu(tval->count);
}
scoutfs_inc_counter(sb, totl_read_item);
return 0;
}
/* these are always _safe, node stores next */
#define for_each_xt_ent(ent, node, root) \
for (node = rb_first(root); \
node && (ent = rb_entry(node, struct xattr_total_entry, node), \
node = rb_next(node), 1); )
#define for_each_xt_ent_reverse(ent, node, root) \
for (node = rb_last(root); \
node && (ent = rb_entry(node, struct xattr_total_entry, node), \
node = rb_prev(node), 1); )
static void free_xt_ent(struct rb_root *root, struct xattr_total_entry *ent)
{
rb_erase(&ent->node, root);
kfree(ent);
}
static void free_all_xt_ents(struct rb_root *root)
{
struct xattr_total_entry *ent;
struct rb_node *node;
for_each_xt_ent(ent, node, root)
free_xt_ent(root, ent);
}
/*
@@ -1063,6 +1162,30 @@ static int read_xattr_total_iter_cb(struct scoutfs_key *key, void *val, unsigned
* have been committed. It doesn't use locking to force commits and
* block writers so it can be a little bit out of date with respect to
* dirty xattrs in memory across the system.
*
* Our reader has to be careful because the log btree merging code can
* write partial results to the fs_root. This means that a reader can
* see both cases where new finalized logs should be applied to the old
* fs items and where old finalized logs have already been applied to
* the partially merged fs items. Currently active logged items are
* always applied on top of all cases.
*
* These cases are differentiated with a combination of sequence numbers
* in items, the count of contributing xattrs, and a flag
* differentiating finalized and active logged items. This lets us
* recognize all cases, including when finalized logs were merged and
* deleted the fs item.
*
* We're allocating a tracking struct for each totl name we see while
* traversing the item btrees. The forest reader is providing the items
* it finds in leaf blocks that contain the search key. In the worst
* case all of these blocks are full and none of the items overlap. At
* most, figure order a thousand names per mount. But in practice many
* of these factors fall away: leaf blocks aren't fill, leaf items
* overlap, there aren't finalized log btrees, and not all mounts are
* actively changing totals. We're much more likely to only read a
* leaf block's worth of totals that have been long since merged into
* the fs_root.
*/
static long scoutfs_ioc_read_xattr_totals(struct file *file, unsigned long arg)
{
@@ -1070,13 +1193,14 @@ static long scoutfs_ioc_read_xattr_totals(struct file *file, unsigned long arg)
struct scoutfs_ioctl_read_xattr_totals __user *urxt = (void __user *)arg;
struct scoutfs_ioctl_read_xattr_totals rxt;
struct scoutfs_ioctl_xattr_total __user *uxt;
struct read_xattr_total_iter_cb_args cba = {NULL, };
struct scoutfs_key range_start;
struct scoutfs_key range_end;
struct xattr_total_entry *ent;
struct scoutfs_key key;
unsigned int copied = 0;
unsigned int total;
unsigned int ready;
struct scoutfs_key bloom_key;
struct scoutfs_key start;
struct scoutfs_key end;
struct rb_root root = RB_ROOT;
struct rb_node *node;
int count = 0;
int ret;
if (!(file->f_mode & FMODE_READ)) {
@@ -1089,13 +1213,6 @@ static long scoutfs_ioc_read_xattr_totals(struct file *file, unsigned long arg)
goto out;
}
cba.xt = (void *)__get_free_page(GFP_KERNEL);
if (!cba.xt) {
ret = -ENOMEM;
goto out;
}
cba.total = PAGE_SIZE / sizeof(struct scoutfs_ioctl_xattr_total);
if (copy_from_user(&rxt, urxt, sizeof(rxt))) {
ret = -EFAULT;
goto out;
@@ -1108,40 +1225,101 @@ static long scoutfs_ioc_read_xattr_totals(struct file *file, unsigned long arg)
goto out;
}
total = div_u64(min_t(u64, rxt.totals_bytes, INT_MAX),
sizeof(struct scoutfs_ioctl_xattr_total));
scoutfs_key_set_zeros(&bloom_key);
bloom_key.sk_zone = SCOUTFS_XATTR_TOTL_ZONE;
scoutfs_xattr_init_totl_key(&start, rxt.pos_name);
scoutfs_totl_set_range(&range_start, &range_end);
scoutfs_xattr_init_totl_key(&key, rxt.pos_name);
while (rxt.totals_bytes >= sizeof(struct scoutfs_ioctl_xattr_total)) {
while (copied < total) {
cba.copied = 0;
ret = scoutfs_wkic_iterate(sb, &key, &range_end, &range_start, &range_end,
read_xattr_total_iter_cb, &cba);
if (ret < 0)
goto out;
if (cba.copied == 0)
scoutfs_key_set_ones(&end);
end.sk_zone = SCOUTFS_XATTR_TOTL_ZONE;
if (scoutfs_key_compare(&start, &end) > 0)
break;
ready = min(total - copied, cba.copied);
if (copy_to_user(&uxt[copied], cba.xt, ready * sizeof(cba.xt[0]))) {
ret = -EFAULT;
key = start;
ret = scoutfs_forest_read_items(sb, &key, &bloom_key, &start, &end,
read_xattr_total_item, &root);
if (ret < 0) {
if (ret == -ESTALE) {
free_all_xt_ents(&root);
continue;
}
goto out;
}
scoutfs_xattr_init_totl_key(&key, cba.xt[ready - 1].name);
scoutfs_key_inc(&key);
copied += ready;
if (RB_EMPTY_ROOT(&root))
break;
/* trim totals that fall outside of the consistent range */
for_each_xt_ent(ent, node, &root) {
scoutfs_xattr_init_totl_key(&key, ent->xt.name);
if (scoutfs_key_compare(&key, &start) < 0) {
free_xt_ent(&root, ent);
} else {
break;
}
}
for_each_xt_ent_reverse(ent, node, &root) {
scoutfs_xattr_init_totl_key(&key, ent->xt.name);
if (scoutfs_key_compare(&key, &end) > 0) {
free_xt_ent(&root, ent);
} else {
break;
}
}
/* copy resulting unique non-zero totals to userspace */
for_each_xt_ent(ent, node, &root) {
if (rxt.totals_bytes < sizeof(ent->xt))
break;
/* start with the fs item if we have it */
if (ent->fs_seq != 0) {
ent->xt.total = ent->fs_total;
ent->xt.count = ent->fs_count;
scoutfs_inc_counter(sb, totl_read_fs);
}
/* apply finalized logs if they're newer or creating */
if (((ent->fs_seq != 0) && (ent->fin_seq > ent->fs_seq)) ||
((ent->fs_seq == 0) && (ent->fin_count > 0))) {
ent->xt.total += ent->fin_total;
ent->xt.count += ent->fin_count;
scoutfs_inc_counter(sb, totl_read_finalized);
}
/* always apply active logs which must be newer than fs and finalized */
if (ent->log_seq > 0) {
ent->xt.total += ent->log_total;
ent->xt.count += ent->log_count;
scoutfs_inc_counter(sb, totl_read_logged);
}
if (ent->xt.total != 0 || ent->xt.count != 0) {
if (copy_to_user(uxt, &ent->xt, sizeof(ent->xt))) {
ret = -EFAULT;
goto out;
}
uxt++;
rxt.totals_bytes -= sizeof(ent->xt);
count++;
scoutfs_inc_counter(sb, totl_read_copied);
}
free_xt_ent(&root, ent);
}
/* continue after the last possible key read */
start = end;
scoutfs_key_inc(&start);
}
ret = 0;
out:
if (cba.xt)
free_page((long)cba.xt);
free_all_xt_ents(&root);
return ret ?: copied;
return ret ?: count;
}
static long scoutfs_ioc_get_allocated_inos(struct file *file, unsigned long arg)
@@ -1326,265 +1504,6 @@ out:
return nr ?: ret;
}
static long scoutfs_ioc_get_attr_x(struct file *file, unsigned long arg)
{
struct inode *inode = file_inode(file);
struct scoutfs_ioctl_inode_attr_x __user *uiax = (void __user *)arg;
struct scoutfs_ioctl_inode_attr_x *iax = NULL;
int ret;
iax = kmalloc(sizeof(struct scoutfs_ioctl_inode_attr_x), GFP_KERNEL);
if (!iax) {
ret = -ENOMEM;
goto out;
}
ret = get_user(iax->x_mask, &uiax->x_mask) ?:
get_user(iax->x_flags, &uiax->x_flags);
if (ret < 0)
goto out;
ret = scoutfs_get_attr_x(inode, iax);
if (ret < 0)
goto out;
/* only copy results after dropping cluster locks (could fault) */
if (ret > 0 && copy_to_user(uiax, iax, ret) != 0)
ret = -EFAULT;
else
ret = 0;
out:
kfree(iax);
return ret;
}
static long scoutfs_ioc_set_attr_x(struct file *file, unsigned long arg)
{
struct inode *inode = file_inode(file);
struct scoutfs_ioctl_inode_attr_x __user *uiax = (void __user *)arg;
struct scoutfs_ioctl_inode_attr_x *iax = NULL;
int ret;
iax = kmalloc(sizeof(struct scoutfs_ioctl_inode_attr_x), GFP_KERNEL);
if (!iax) {
ret = -ENOMEM;
goto out;
}
if (copy_from_user(iax, uiax, sizeof(struct scoutfs_ioctl_inode_attr_x))) {
ret = -EFAULT;
goto out;
}
ret = mnt_want_write_file(file);
if (ret < 0)
goto out;
ret = scoutfs_set_attr_x(inode, iax);
mnt_drop_write_file(file);
out:
kfree(iax);
return ret;
}
static long scoutfs_ioc_get_quota_rules(struct file *file, unsigned long arg)
{
struct super_block *sb = file_inode(file)->i_sb;
struct scoutfs_ioctl_get_quota_rules __user *ugqr = (void __user *)arg;
struct scoutfs_ioctl_get_quota_rules gqr;
struct scoutfs_ioctl_quota_rule __user *uirules;
struct scoutfs_ioctl_quota_rule *irules;
struct page *page = NULL;
int copied = 0;
int nr;
int ret;
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
if (copy_from_user(&gqr, ugqr, sizeof(gqr)))
return -EFAULT;
if (gqr.rules_nr == 0)
return 0;
uirules = (void __user *)gqr.rules_ptr;
/* limit rules copied per call */
gqr.rules_nr = min_t(u64, gqr.rules_nr, INT_MAX);
page = alloc_page(GFP_KERNEL | __GFP_ZERO);
if (!page) {
ret = -ENOMEM;
goto out;
}
irules = page_address(page);
while (copied < gqr.rules_nr) {
nr = min_t(u64, gqr.rules_nr - copied,
PAGE_SIZE / sizeof(struct scoutfs_ioctl_quota_rule));
ret = scoutfs_quota_get_rules(sb, gqr.iterator, page_address(page), nr);
if (ret <= 0)
goto out;
if (copy_to_user(&uirules[copied], irules, ret * sizeof(irules[0]))) {
ret = -EFAULT;
goto out;
}
copied += ret;
}
ret = 0;
out:
if (page)
__free_page(page);
if (ret == 0 && copy_to_user(ugqr->iterator, gqr.iterator, sizeof(gqr.iterator)))
ret = -EFAULT;
return ret ?: copied;
}
static long scoutfs_ioc_mod_quota_rule(struct file *file, unsigned long arg, bool is_add)
{
struct super_block *sb = file_inode(file)->i_sb;
struct scoutfs_ioctl_quota_rule __user *uirule = (void __user *)arg;
struct scoutfs_ioctl_quota_rule irule;
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
if (copy_from_user(&irule, uirule, sizeof(irule)))
return -EFAULT;
return scoutfs_quota_mod_rule(sb, is_add, &irule);
}
struct read_index_buf {
int nr;
int size;
struct scoutfs_ioctl_xattr_index_entry ents[0];
};
#define READ_INDEX_BUF_MAX_ENTS \
((PAGE_SIZE - sizeof(struct read_index_buf)) / \
sizeof(struct scoutfs_ioctl_xattr_index_entry))
/*
* This doesn't filter out duplicates, the caller filters them out to
* catch duplicates between iteration calls.
*/
static int read_index_cb(struct scoutfs_key *key, void *val, unsigned int val_len, void *cb_arg)
{
struct read_index_buf *rib = cb_arg;
struct scoutfs_ioctl_xattr_index_entry *ent = &rib->ents[rib->nr];
u64 xid;
if (val_len != 0)
return -EIO;
/* discard the xid, they're not exposed to ioctl callers */
scoutfs_xattr_get_indx_key(key, &ent->major, &ent->minor, &ent->ino, &xid);
if (++rib->nr == rib->size)
return rib->nr;
return -EAGAIN;
}
static long scoutfs_ioc_read_xattr_index(struct file *file, unsigned long arg)
{
struct super_block *sb = file_inode(file)->i_sb;
struct scoutfs_ioctl_read_xattr_index __user *urxi = (void __user *)arg;
struct scoutfs_ioctl_xattr_index_entry __user *uents;
struct scoutfs_ioctl_xattr_index_entry *ent;
struct scoutfs_ioctl_xattr_index_entry prev;
struct scoutfs_ioctl_read_xattr_index rxi;
struct read_index_buf *rib;
struct page *page = NULL;
struct scoutfs_key first;
struct scoutfs_key last;
struct scoutfs_key start;
struct scoutfs_key end;
int copied = 0;
int ret;
int i;
if (!capable(CAP_SYS_ADMIN)) {
ret = -EPERM;
goto out;
}
if (copy_from_user(&rxi, urxi, sizeof(rxi))) {
ret = -EFAULT;
goto out;
}
uents = (void __user *)rxi.entries_ptr;
rxi.entries_nr = min_t(u64, rxi.entries_nr, INT_MAX);
page = alloc_page(GFP_KERNEL);
if (!page) {
ret = -ENOMEM;
goto out;
}
rib = page_address(page);
scoutfs_xattr_init_indx_key(&first, rxi.first.major, rxi.first.minor, rxi.first.ino, 0);
scoutfs_xattr_init_indx_key(&last, rxi.last.major, rxi.last.minor, rxi.last.ino, U64_MAX);
scoutfs_xattr_indx_get_range(&start, &end);
if (scoutfs_key_compare(&first, &last) > 0) {
ret = -EINVAL;
goto out;
}
/* 0 ino doesn't exist, can't ever match entry to return */
memset(&prev, 0, sizeof(prev));
while (copied < rxi.entries_nr) {
rib->nr = 0;
rib->size = min_t(u64, rxi.entries_nr - copied, READ_INDEX_BUF_MAX_ENTS);
ret = scoutfs_wkic_iterate(sb, &first, &last, &start, &end,
read_index_cb, rib);
if (ret < 0)
goto out;
if (rib->nr == 0)
break;
/*
* Copy entries to userspace, skipping duplicate entries
* that can result from multiple xattrs indexing an
* inode at the same position and which can span
* multiple cache iterations. (Comparing in order of
* most likely to change to fail fast.)
*/
for (i = 0, ent = rib->ents; i < rib->nr; i++, ent++) {
if (ent->ino == prev.ino && ent->minor == prev.minor &&
ent->major == prev.major)
continue;
if (copy_to_user(&uents[copied], ent, sizeof(*ent))) {
ret = -EFAULT;
goto out;
}
prev = *ent;
copied++;
}
scoutfs_xattr_init_indx_key(&first, prev.major, prev.minor, prev.ino, U64_MAX);
scoutfs_key_inc(&first);
}
ret = copied;
out:
if (page)
__free_page(page);
return ret;
}
long scoutfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
switch (cmd) {
@@ -1622,18 +1541,6 @@ long scoutfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
return scoutfs_ioc_get_allocated_inos(file, arg);
case SCOUTFS_IOC_GET_REFERRING_ENTRIES:
return scoutfs_ioc_get_referring_entries(file, arg);
case SCOUTFS_IOC_GET_ATTR_X:
return scoutfs_ioc_get_attr_x(file, arg);
case SCOUTFS_IOC_SET_ATTR_X:
return scoutfs_ioc_set_attr_x(file, arg);
case SCOUTFS_IOC_GET_QUOTA_RULES:
return scoutfs_ioc_get_quota_rules(file, arg);
case SCOUTFS_IOC_ADD_QUOTA_RULE:
return scoutfs_ioc_mod_quota_rule(file, arg, true);
case SCOUTFS_IOC_DEL_QUOTA_RULE:
return scoutfs_ioc_mod_quota_rule(file, arg, false);
case SCOUTFS_IOC_READ_XATTR_INDEX:
return scoutfs_ioc_read_xattr_index(file, arg);
}
return -ENOTTY;
-170
View File
@@ -673,174 +673,4 @@ struct scoutfs_ioctl_dirent {
#define SCOUTFS_IOC_GET_REFERRING_ENTRIES \
_IOW(SCOUTFS_IOCTL_MAGIC, 17, struct scoutfs_ioctl_get_referring_entries)
struct scoutfs_ioctl_inode_attr_x {
__u64 x_mask;
__u64 x_flags;
__u64 meta_seq;
__u64 data_seq;
__u64 data_version;
__u64 online_blocks;
__u64 offline_blocks;
__u64 ctime_sec;
__u32 ctime_nsec;
__u32 crtime_nsec;
__u64 crtime_sec;
__u64 size;
__u64 bits;
__u64 project_id;
};
/*
* Behavioral flags set in the x_flags field. These flags don't
* necessarily correspond to specific attributes, but instead change the
* behaviour of a _get_ or _set_ operation.
*
* @SCOUTFS_IOC_IAX_F_SIZE_OFFLINE: When setting i_size, also create
* extents which are marked offline for the region of the file from
* offset 0 to the new set size. This can only be set when setting the
* size and has no effect if setting the size fails.
*/
#define SCOUTFS_IOC_IAX_F_SIZE_OFFLINE (1ULL << 0)
#define SCOUTFS_IOC_IAX_F__UNKNOWN (U64_MAX << 1)
/*
* Single-bit values stored in the @bits field. These indicate whether
* the bit is set, or not. The main _IAX_ bits set in the mask indicate
* whether this value bit is populated by _get or stored by _set.
*/
#define SCOUTFS_IOC_IAX_B_RETENTION (1ULL << 0)
/*
* x_mask bits which indicate which attributes of the inode to populate
* on return for _get or to set on the inode for _set. Each mask bit
* corresponds to the matching named field in the attr_x struct passed
* to the _get_ and _set_ calls.
*
* Each field can have different permissions or other attribute
* requirements which can cause calls to fail. If _set_ fails then no
* other attribute changes will have been made by the same call.
*
* @SCOUTFS_IOC_IAX_RETENTION: Mark a file for retention. When marked,
* no modification can be made to the file other than changing extended
* attributes outside the "user." prefix and clearing the retention
* mark. This can only be set on regular files and requires root (the
* CAP_SYS_ADMIN capability). Other attributes can be set with a
* set_attr_x call on a retention inode as long as that call also
* successfully clears the retention mark.
*/
#define SCOUTFS_IOC_IAX_META_SEQ (1ULL << 0)
#define SCOUTFS_IOC_IAX_DATA_SEQ (1ULL << 1)
#define SCOUTFS_IOC_IAX_DATA_VERSION (1ULL << 2)
#define SCOUTFS_IOC_IAX_ONLINE_BLOCKS (1ULL << 3)
#define SCOUTFS_IOC_IAX_OFFLINE_BLOCKS (1ULL << 4)
#define SCOUTFS_IOC_IAX_CTIME (1ULL << 5)
#define SCOUTFS_IOC_IAX_CRTIME (1ULL << 6)
#define SCOUTFS_IOC_IAX_SIZE (1ULL << 7)
#define SCOUTFS_IOC_IAX_RETENTION (1ULL << 8)
#define SCOUTFS_IOC_IAX_PROJECT_ID (1ULL << 9)
/* single bit attributes that are packed in the bits field as _B_ */
#define SCOUTFS_IOC_IAX__BITS (SCOUTFS_IOC_IAX_RETENTION)
/* inverse of all the bits we understand */
#define SCOUTFS_IOC_IAX__UNKNOWN (U64_MAX << 10)
#define SCOUTFS_IOC_GET_ATTR_X \
_IOW(SCOUTFS_IOCTL_MAGIC, 18, struct scoutfs_ioctl_inode_attr_x)
#define SCOUTFS_IOC_SET_ATTR_X \
_IOW(SCOUTFS_IOCTL_MAGIC, 19, struct scoutfs_ioctl_inode_attr_x)
/*
* (These fields are documented in the order that they're displayed by
* the scoutfs cli utility which matches the sort order of the rules.)
*
* @prio: The priority of the rule. Rules are sorted by their fields
* with prio at the highest magnitude. When multiple rules match the
* rule with the highest sort order is enforced. The priority field
* lets rules override the default field sort order.
*
* @name_val[3]: The three 64bit values that make up the name of the
* totl xattr whose total will be checked against the rule's limit to
* see if the quota rule has been exceeded. The behavior of the values
* can be changed by their corresponding name_source and name_flags.
*
* @name_source[3]: The SQ_NS_ enums that control where the value comes
* from. _LITERAL uses the value from name_val. Inode attribute
* sources (_PROJ, _UID, _GID) are taken from the inode of the operation
* that is being checked against the rule.
*
* @name_flags[3]: The SQ_NF_ enums that alter the name values. _SELECT
* makes the rule only match if the inode attribute of the operation
* matches the attribute value stored in name_val. This lets rules
* match a specific value of an attribute rather than mapping all
* attribute values of to totl names.
*
* @op: The SQ_OP_ enums which specify the operation that can't exceed
* the rule's limit. _INODE checks inode creation and the inode
* attributes are taken from the inode that would be created. _DATA
* checks file data block allocation and the inode fields come from the
* inode that is allocating the blocks.
*
* @limit: The 64bit value that is checked against the totl value
* described by the rule. If the totl value is greater than or equal to
* this value of the matching rule then the operation will return
* -EDQUOT.
*
* @rule_flags: SQ_RF_TOTL_COUNT indicates that the rule's limit should
* be checked against the number of xattrs contributing to a totl value
* instead of the sum of the xattrs.
*/
struct scoutfs_ioctl_quota_rule {
__u64 name_val[3];
__u64 limit;
__u8 prio;
__u8 op;
__u8 rule_flags;
__u8 name_source[3];
__u8 name_flags[3];
__u8 _pad[7];
};
struct scoutfs_ioctl_get_quota_rules {
__u64 iterator[2];
__u64 rules_ptr;
__u64 rules_nr;
};
/*
* Rules are uniquely identified by their non-padded fields. Addition will fail
* with -EEXIST if the specified rule already exists and deletion must find a rule
* with all matching fields to delete.
*/
#define SCOUTFS_IOC_GET_QUOTA_RULES \
_IOR(SCOUTFS_IOCTL_MAGIC, 20, struct scoutfs_ioctl_get_quota_rules)
#define SCOUTFS_IOC_ADD_QUOTA_RULE \
_IOW(SCOUTFS_IOCTL_MAGIC, 21, struct scoutfs_ioctl_quota_rule)
#define SCOUTFS_IOC_DEL_QUOTA_RULE \
_IOW(SCOUTFS_IOCTL_MAGIC, 22, struct scoutfs_ioctl_quota_rule)
/*
* Inodes can be indexed in a global key space at a position determined
* by a .indx. tagged xattr. The xattr name specifies the two index
* position values, with major having the more significant comparison
* order.
*/
struct scoutfs_ioctl_xattr_index_entry {
__u64 minor;
__u64 ino;
__u8 major;
__u8 _pad[7];
};
struct scoutfs_ioctl_read_xattr_index {
__u64 flags;
struct scoutfs_ioctl_xattr_index_entry first;
struct scoutfs_ioctl_xattr_index_entry last;
__u64 entries_ptr;
__u64 entries_nr;
};
#define SCOUTFS_IOC_READ_XATTR_INDEX \
_IOR(SCOUTFS_IOCTL_MAGIC, 23, struct scoutfs_ioctl_read_xattr_index)
#endif
+16 -55
View File
@@ -24,7 +24,6 @@
#include "item.h"
#include "forest.h"
#include "block.h"
#include "msg.h"
#include "trans.h"
#include "counters.h"
#include "scoutfs_trace.h"
@@ -1671,24 +1670,13 @@ out:
return ret;
}
static int lock_safe(struct super_block *sb, struct scoutfs_lock *lock, struct scoutfs_key *key,
static int lock_safe(struct scoutfs_lock *lock, struct scoutfs_key *key,
int mode)
{
bool prot = scoutfs_lock_protected(lock, key, mode);
if (!prot) {
static bool once = false;
if (!once) {
scoutfs_err(sb, "lock (start "SK_FMT" end "SK_FMT" mode 0x%x) does not protect operation (key "SK_FMT" mode 0x%x)",
SK_ARG(&lock->start), SK_ARG(&lock->end), lock->mode,
SK_ARG(key), mode);
dump_stack();
once = true;
}
if (WARN_ON_ONCE(!scoutfs_lock_protected(lock, key, mode)))
return -EINVAL;
}
return 0;
else
return 0;
}
static int optional_lock_mode_match(struct scoutfs_lock *lock, int mode)
@@ -1720,8 +1708,8 @@ static int copy_val(void *dst, int dst_len, void *src, int src_len)
* The amount of bytes copied is returned which can be 0 or truncated if
* the caller's buffer isn't big enough.
*/
static int item_lookup(struct super_block *sb, struct scoutfs_key *key,
void *val, int val_len, int len_limit, struct scoutfs_lock *lock)
int scoutfs_item_lookup(struct super_block *sb, struct scoutfs_key *key,
void *val, int val_len, struct scoutfs_lock *lock)
{
DECLARE_ITEM_CACHE_INFO(sb, cinf);
struct cached_item *item;
@@ -1730,7 +1718,7 @@ static int item_lookup(struct super_block *sb, struct scoutfs_key *key,
scoutfs_inc_counter(sb, item_lookup);
if ((ret = lock_safe(sb, lock, key, SCOUTFS_LOCK_READ)))
if ((ret = lock_safe(lock, key, SCOUTFS_LOCK_READ)))
goto out;
ret = get_cached_page(sb, cinf, lock, key, false, false, 0, &pg);
@@ -1741,8 +1729,6 @@ static int item_lookup(struct super_block *sb, struct scoutfs_key *key,
item = item_rbtree_walk(&pg->item_root, key, NULL, NULL, NULL);
if (!item || item->deletion)
ret = -ENOENT;
else if (len_limit > 0 && item->val_len > len_limit)
ret = -EIO;
else
ret = copy_val(val, val_len, item->val, item->val_len);
@@ -1751,38 +1737,13 @@ out:
return ret;
}
int scoutfs_item_lookup(struct super_block *sb, struct scoutfs_key *key,
void *val, int val_len, struct scoutfs_lock *lock)
{
return item_lookup(sb, key, val, val_len, 0, lock);
}
/*
* Copy an item's value into the caller's buffer. If the item's value
* is larger than the caller's buffer then -EIO is returned. If the
* item is smaller then the bytes from the end of the copied value to
* the end of the buffer are zeroed. The number of value bytes copied
* is returned, and 0 can be returned for an item with no value.
*/
int scoutfs_item_lookup_smaller_zero(struct super_block *sb, struct scoutfs_key *key,
void *val, int val_len, struct scoutfs_lock *lock)
{
int ret;
ret = item_lookup(sb, key, val, val_len, val_len, lock);
if (ret >= 0 && ret < val_len)
memset(val + ret, 0, val_len - ret);
return ret;
}
int scoutfs_item_lookup_exact(struct super_block *sb, struct scoutfs_key *key,
void *val, int val_len,
struct scoutfs_lock *lock)
{
int ret;
ret = item_lookup(sb, key, val, val_len, 0, lock);
ret = scoutfs_item_lookup(sb, key, val, val_len, lock);
if (ret == val_len)
ret = 0;
else if (ret >= 0)
@@ -1832,7 +1793,7 @@ int scoutfs_item_next(struct super_block *sb, struct scoutfs_key *key,
goto out;
}
if ((ret = lock_safe(sb, lock, key, SCOUTFS_LOCK_READ)))
if ((ret = lock_safe(lock, key, SCOUTFS_LOCK_READ)))
goto out;
pos = *key;
@@ -1913,7 +1874,7 @@ int scoutfs_item_dirty(struct super_block *sb, struct scoutfs_key *key,
scoutfs_inc_counter(sb, item_dirty);
if ((ret = lock_safe(sb, lock, key, SCOUTFS_LOCK_WRITE)))
if ((ret = lock_safe(lock, key, SCOUTFS_LOCK_WRITE)))
goto out;
ret = scoutfs_forest_set_bloom_bits(sb, lock);
@@ -1959,7 +1920,7 @@ static int item_create(struct super_block *sb, struct scoutfs_key *key,
scoutfs_inc_counter(sb, item_create);
if ((ret = lock_safe(sb, lock, key, mode)) ||
if ((ret = lock_safe(lock, key, mode)) ||
(ret = optional_lock_mode_match(primary, SCOUTFS_LOCK_WRITE)))
goto out;
@@ -2002,7 +1963,7 @@ int scoutfs_item_create(struct super_block *sb, struct scoutfs_key *key,
void *val, int val_len, struct scoutfs_lock *lock)
{
return item_create(sb, key, val, val_len, lock, NULL,
SCOUTFS_LOCK_WRITE, false);
SCOUTFS_LOCK_READ, false);
}
int scoutfs_item_create_force(struct super_block *sb, struct scoutfs_key *key,
@@ -2033,7 +1994,7 @@ int scoutfs_item_update(struct super_block *sb, struct scoutfs_key *key,
scoutfs_inc_counter(sb, item_update);
if ((ret = lock_safe(sb, lock, key, SCOUTFS_LOCK_WRITE)))
if ((ret = lock_safe(lock, key, SCOUTFS_LOCK_WRITE)))
goto out;
ret = scoutfs_forest_set_bloom_bits(sb, lock);
@@ -2101,7 +2062,7 @@ int scoutfs_item_delta(struct super_block *sb, struct scoutfs_key *key,
scoutfs_inc_counter(sb, item_delta);
if ((ret = lock_safe(sb, lock, key, SCOUTFS_LOCK_WRITE_ONLY)))
if ((ret = lock_safe(lock, key, SCOUTFS_LOCK_WRITE_ONLY)))
goto out;
ret = scoutfs_forest_set_bloom_bits(sb, lock);
@@ -2174,7 +2135,7 @@ static int item_delete(struct super_block *sb, struct scoutfs_key *key,
scoutfs_inc_counter(sb, item_delete);
if ((ret = lock_safe(sb, lock, key, mode)) ||
if ((ret = lock_safe(lock, key, mode)) ||
(ret = optional_lock_mode_match(primary, SCOUTFS_LOCK_WRITE)))
goto out;
@@ -2580,7 +2541,7 @@ static unsigned long item_cache_count_objects(struct shrinker *shrink,
scoutfs_inc_counter(sb, item_cache_count_objects);
return shrinker_min_long(cinf->lru_pages);
return shrinker_min_t_long((u64)(cinf->lru_pages));
}
/*
-2
View File
@@ -3,8 +3,6 @@
int scoutfs_item_lookup(struct super_block *sb, struct scoutfs_key *key,
void *val, int val_len, struct scoutfs_lock *lock);
int scoutfs_item_lookup_smaller_zero(struct super_block *sb, struct scoutfs_key *key,
void *val, int val_len, struct scoutfs_lock *lock);
int scoutfs_item_lookup_exact(struct super_block *sb, struct scoutfs_key *key,
void *val, int val_len,
struct scoutfs_lock *lock);
+5 -29
View File
@@ -36,8 +36,6 @@
#include "item.h"
#include "omap.h"
#include "util.h"
#include "totl.h"
#include "quota.h"
/*
* scoutfs uses a lock service to manage item cache consistency between
@@ -187,9 +185,6 @@ static int lock_invalidate(struct super_block *sb, struct scoutfs_lock *lock,
return ret;
}
if (lock->start.sk_zone == SCOUTFS_QUOTA_ZONE && !lock_mode_can_read(mode))
scoutfs_quota_invalidate(sb);
/* have to invalidate if we're not in the only usable case */
if (!(prev == SCOUTFS_LOCK_WRITE && mode == SCOUTFS_LOCK_READ)) {
retry:
@@ -1249,29 +1244,10 @@ int scoutfs_lock_xattr_totl(struct super_block *sb, enum scoutfs_lock_mode mode,
struct scoutfs_key start;
struct scoutfs_key end;
scoutfs_totl_set_range(&start, &end);
return lock_key_range(sb, mode, flags, &start, &end, lock);
}
int scoutfs_lock_xattr_indx(struct super_block *sb, enum scoutfs_lock_mode mode, int flags,
struct scoutfs_lock **lock)
{
struct scoutfs_key start;
struct scoutfs_key end;
scoutfs_xattr_indx_get_range(&start, &end);
return lock_key_range(sb, mode, flags, &start, &end, lock);
}
int scoutfs_lock_quota(struct super_block *sb, enum scoutfs_lock_mode mode, int flags,
struct scoutfs_lock **lock)
{
struct scoutfs_key start;
struct scoutfs_key end;
scoutfs_quota_get_lock_range(&start, &end);
scoutfs_key_set_zeros(&start);
start.sk_zone = SCOUTFS_XATTR_TOTL_ZONE;
scoutfs_key_set_ones(&end);
end.sk_zone = SCOUTFS_XATTR_TOTL_ZONE;
return lock_key_range(sb, mode, flags, &start, &end, lock);
}
@@ -1433,7 +1409,7 @@ static unsigned long lock_count_objects(struct shrinker *shrink,
scoutfs_inc_counter(sb, lock_count_objects);
return shrinker_min_long(linfo->lru_nr);
return shrinker_min_t_long((u64)(linfo->lru_nr));
}
/*
-4
View File
@@ -86,10 +86,6 @@ int scoutfs_lock_orphan(struct super_block *sb, enum scoutfs_lock_mode mode, int
u64 ino, struct scoutfs_lock **lock);
int scoutfs_lock_xattr_totl(struct super_block *sb, enum scoutfs_lock_mode mode, int flags,
struct scoutfs_lock **lock);
int scoutfs_lock_xattr_indx(struct super_block *sb, enum scoutfs_lock_mode mode, int flags,
struct scoutfs_lock **lock);
int scoutfs_lock_quota(struct super_block *sb, enum scoutfs_lock_mode mode, int flags,
struct scoutfs_lock **lock);
void scoutfs_unlock(struct super_block *sb, struct scoutfs_lock *lock,
enum scoutfs_lock_mode mode);
-68
View File
@@ -33,7 +33,6 @@ enum {
Opt_acl,
Opt_data_prealloc_blocks,
Opt_data_prealloc_contig_only,
Opt_log_merge_wait_timeout_ms,
Opt_metadev_path,
Opt_noacl,
Opt_orphan_scan_delay_ms,
@@ -46,7 +45,6 @@ static const match_table_t tokens = {
{Opt_acl, "acl"},
{Opt_data_prealloc_blocks, "data_prealloc_blocks=%s"},
{Opt_data_prealloc_contig_only, "data_prealloc_contig_only=%s"},
{Opt_log_merge_wait_timeout_ms, "log_merge_wait_timeout_ms=%s"},
{Opt_metadev_path, "metadev_path=%s"},
{Opt_noacl, "noacl"},
{Opt_orphan_scan_delay_ms, "orphan_scan_delay_ms=%s"},
@@ -115,10 +113,6 @@ static void free_options(struct scoutfs_mount_options *opts)
kfree(opts->metadev_path);
}
#define MIN_LOG_MERGE_WAIT_TIMEOUT_MS 100UL
#define DEFAULT_LOG_MERGE_WAIT_TIMEOUT_MS 500
#define MAX_LOG_MERGE_WAIT_TIMEOUT_MS (60 * MSEC_PER_SEC)
#define MIN_ORPHAN_SCAN_DELAY_MS 100UL
#define DEFAULT_ORPHAN_SCAN_DELAY_MS (10 * MSEC_PER_SEC)
#define MAX_ORPHAN_SCAN_DELAY_MS (60 * MSEC_PER_SEC)
@@ -132,27 +126,11 @@ static void init_default_options(struct scoutfs_mount_options *opts)
opts->data_prealloc_blocks = SCOUTFS_DATA_PREALLOC_DEFAULT_BLOCKS;
opts->data_prealloc_contig_only = 1;
opts->log_merge_wait_timeout_ms = DEFAULT_LOG_MERGE_WAIT_TIMEOUT_MS;
opts->orphan_scan_delay_ms = -1;
opts->quorum_heartbeat_timeout_ms = SCOUTFS_QUORUM_DEF_HB_TIMEO_MS;
opts->quorum_slot_nr = -1;
}
static int verify_log_merge_wait_timeout_ms(struct super_block *sb, int ret, int val)
{
if (ret < 0) {
scoutfs_err(sb, "failed to parse log_merge_wait_timeout_ms value");
return -EINVAL;
}
if (val < MIN_LOG_MERGE_WAIT_TIMEOUT_MS || val > MAX_LOG_MERGE_WAIT_TIMEOUT_MS) {
scoutfs_err(sb, "invalid log_merge_wait_timeout_ms value %d, must be between %lu and %lu",
val, MIN_LOG_MERGE_WAIT_TIMEOUT_MS, MAX_LOG_MERGE_WAIT_TIMEOUT_MS);
return -EINVAL;
}
return 0;
}
static int verify_quorum_heartbeat_timeout_ms(struct super_block *sb, int ret, u64 val)
{
if (ret < 0) {
@@ -218,14 +196,6 @@ static int parse_options(struct super_block *sb, char *options, struct scoutfs_m
opts->data_prealloc_contig_only = nr;
break;
case Opt_log_merge_wait_timeout_ms:
ret = match_int(args, &nr);
ret = verify_log_merge_wait_timeout_ms(sb, ret, nr);
if (ret < 0)
return ret;
opts->log_merge_wait_timeout_ms = nr;
break;
case Opt_metadev_path:
ret = parse_bdev_path(sb, &args[0], &opts->metadev_path);
if (ret < 0)
@@ -452,43 +422,6 @@ static ssize_t data_prealloc_contig_only_store(struct kobject *kobj, struct kobj
}
SCOUTFS_ATTR_RW(data_prealloc_contig_only);
static ssize_t log_merge_wait_timeout_ms_show(struct kobject *kobj, struct kobj_attribute *attr,
char *buf)
{
struct super_block *sb = SCOUTFS_SYSFS_ATTRS_SB(kobj);
struct scoutfs_mount_options opts;
scoutfs_options_read(sb, &opts);
return snprintf(buf, PAGE_SIZE, "%u", opts.log_merge_wait_timeout_ms);
}
static ssize_t log_merge_wait_timeout_ms_store(struct kobject *kobj, struct kobj_attribute *attr,
const char *buf, size_t count)
{
struct super_block *sb = SCOUTFS_SYSFS_ATTRS_SB(kobj);
DECLARE_OPTIONS_INFO(sb, optinf);
char nullterm[30]; /* more than enough for octal -U64_MAX */
int val;
int len;
int ret;
len = min(count, sizeof(nullterm) - 1);
memcpy(nullterm, buf, len);
nullterm[len] = '\0';
ret = kstrtoint(nullterm, 0, &val);
ret = verify_log_merge_wait_timeout_ms(sb, ret, val);
if (ret == 0) {
write_seqlock(&optinf->seqlock);
optinf->opts.log_merge_wait_timeout_ms = val;
write_sequnlock(&optinf->seqlock);
ret = count;
}
return ret;
}
SCOUTFS_ATTR_RW(log_merge_wait_timeout_ms);
static ssize_t metadev_path_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf)
{
struct super_block *sb = SCOUTFS_SYSFS_ATTRS_SB(kobj);
@@ -592,7 +525,6 @@ SCOUTFS_ATTR_RO(quorum_slot_nr);
static struct attribute *options_attrs[] = {
SCOUTFS_ATTR_PTR(data_prealloc_blocks),
SCOUTFS_ATTR_PTR(data_prealloc_contig_only),
SCOUTFS_ATTR_PTR(log_merge_wait_timeout_ms),
SCOUTFS_ATTR_PTR(metadev_path),
SCOUTFS_ATTR_PTR(orphan_scan_delay_ms),
SCOUTFS_ATTR_PTR(quorum_heartbeat_timeout_ms),
-1
View File
@@ -8,7 +8,6 @@
struct scoutfs_mount_options {
u64 data_prealloc_blocks;
bool data_prealloc_contig_only;
unsigned int log_merge_wait_timeout_ms;
char *metadev_path;
unsigned int orphan_scan_delay_ms;
int quorum_slot_nr;
-1261
View File
File diff suppressed because it is too large Load Diff
-48
View File
@@ -1,48 +0,0 @@
#ifndef _SCOUTFS_QUOTA_H_
#define _SCOUTFS_QUOTA_H_
#include "ioctl.h"
/*
* Each rule's name can be in the ruleset's rbtree associated with the
* source attr that it selects. This lets checks only test rules that
* the inputs could match. The 'i' field indicates which name is in the
* tree so we can find the containing rule.
*
* This is mostly private to quota.c but we expose it for tracing.
*/
struct squota_rule {
u64 limit;
u8 prio;
u8 op;
u8 rule_flags;
struct squota_rule_name {
struct rb_node node;
u64 val;
u8 source;
u8 flags;
u8 i;
} names[3];
};
/* private to quota.c, only here for tracing */
struct squota_input {
u64 attrs[SQ_NS__NR_SELECT];
u8 op;
};
int scoutfs_quota_check_inode(struct super_block *sb, struct inode *dir);
int scoutfs_quota_check_data(struct super_block *sb, struct inode *inode);
int scoutfs_quota_get_rules(struct super_block *sb, u64 *iterator,
struct scoutfs_ioctl_quota_rule *irules, int nr);
int scoutfs_quota_mod_rule(struct super_block *sb, bool is_add,
struct scoutfs_ioctl_quota_rule *irule);
void scoutfs_quota_get_lock_range(struct scoutfs_key *start, struct scoutfs_key *end);
void scoutfs_quota_invalidate(struct super_block *sb);
int scoutfs_quota_setup(struct super_block *sb);
void scoutfs_quota_destroy(struct super_block *sb);
#endif
+26 -227
View File
@@ -37,10 +37,6 @@
#include "net.h"
#include "data.h"
#include "ext.h"
#include "quota.h"
#include "trace/quota.h"
#include "trace/wkic.h"
struct lock_info;
@@ -443,7 +439,6 @@ DECLARE_EVENT_CLASS(scoutfs_trans_hold_release_class,
SCSB_TRACE_ASSIGN(sb);
__entry->journal_info = (unsigned long)journal_info;
__entry->holders = holders;
__entry->ret = ret;
),
TP_printk(SCSBF" journal_info 0x%0lx holders %d ret %d",
@@ -1751,41 +1746,21 @@ TRACE_EVENT(scoutfs_btree_merge,
sk_trace_args(end))
);
TRACE_EVENT(scoutfs_btree_merge_read_range,
TP_PROTO(struct super_block *sb, struct scoutfs_key *start, struct scoutfs_key *end,
int size),
TP_ARGS(sb, start, end, size),
TP_STRUCT__entry(
SCSB_TRACE_FIELDS
sk_trace_define(start)
sk_trace_define(end)
__field(int, size)
),
TP_fast_assign(
SCSB_TRACE_ASSIGN(sb);
sk_trace_assign(start, start);
sk_trace_assign(end, end);
__entry->size = size;
),
TP_printk(SCSBF" start "SK_FMT" end "SK_FMT" size %d",
SCSB_TRACE_ARGS, sk_trace_args(start), sk_trace_args(end), __entry->size)
);
TRACE_EVENT(scoutfs_btree_merge_items,
TP_PROTO(struct super_block *sb,
struct scoutfs_btree_root *m_root,
struct scoutfs_key *m_key, int m_val_len,
struct scoutfs_btree_root *f_root,
struct scoutfs_key *f_key, int f_val_len,
int is_del),
TP_ARGS(sb, m_key, m_val_len, f_root, f_key, f_val_len, is_del),
TP_ARGS(sb, m_root, m_key, m_val_len, f_root, f_key, f_val_len, is_del),
TP_STRUCT__entry(
SCSB_TRACE_FIELDS
__field(__u64, m_root_blkno)
__field(__u64, m_root_seq)
__field(__u8, m_root_height)
sk_trace_define(m_key)
__field(int, m_val_len)
__field(__u64, f_root_blkno)
@@ -1798,6 +1773,10 @@ TRACE_EVENT(scoutfs_btree_merge_items,
TP_fast_assign(
SCSB_TRACE_ASSIGN(sb);
__entry->m_root_blkno = m_root ?
le64_to_cpu(m_root->ref.blkno) : 0;
__entry->m_root_seq = m_root ? le64_to_cpu(m_root->ref.seq) : 0;
__entry->m_root_height = m_root ? m_root->height : 0;
sk_trace_assign(m_key, m_key);
__entry->m_val_len = m_val_len;
__entry->f_root_blkno = f_root ?
@@ -1809,9 +1788,11 @@ TRACE_EVENT(scoutfs_btree_merge_items,
__entry->is_del = !!is_del;
),
TP_printk(SCSBF" merge item key "SK_FMT" val_len %d, fs item root blkno %llu seq %llu height %u key "SK_FMT" val_len %d, is_del %d",
SCSB_TRACE_ARGS, sk_trace_args(m_key), __entry->m_val_len,
__entry->f_root_blkno, __entry->f_root_seq, __entry->f_root_height,
TP_printk(SCSBF" merge item root blkno %llu seq %llu height %u key "SK_FMT" val_len %d, fs item root blkno %llu seq %llu height %u key "SK_FMT" val_len %d, is_del %d",
SCSB_TRACE_ARGS, __entry->m_root_blkno, __entry->m_root_seq,
__entry->m_root_height, sk_trace_args(m_key),
__entry->m_val_len, __entry->f_root_blkno,
__entry->f_root_seq, __entry->f_root_height,
sk_trace_args(f_key), __entry->f_val_len, __entry->is_del)
);
@@ -1915,9 +1896,8 @@ DEFINE_EVENT(scoutfs_server_client_count_class, scoutfs_server_client_down,
DECLARE_EVENT_CLASS(scoutfs_server_commit_users_class,
TP_PROTO(struct super_block *sb, int holding, int applying, int nr_holders,
u32 avail_before, u32 freed_before, int committing, int exceeded),
TP_ARGS(sb, holding, applying, nr_holders, avail_before, freed_before, committing,
exceeded),
u32 avail_before, u32 freed_before, int exceeded),
TP_ARGS(sb, holding, applying, nr_holders, avail_before, freed_before, exceeded),
TP_STRUCT__entry(
SCSB_TRACE_FIELDS
__field(int, holding)
@@ -1925,7 +1905,6 @@ DECLARE_EVENT_CLASS(scoutfs_server_commit_users_class,
__field(int, nr_holders)
__field(__u32, avail_before)
__field(__u32, freed_before)
__field(int, committing)
__field(int, exceeded)
),
TP_fast_assign(
@@ -1935,33 +1914,31 @@ DECLARE_EVENT_CLASS(scoutfs_server_commit_users_class,
__entry->nr_holders = nr_holders;
__entry->avail_before = avail_before;
__entry->freed_before = freed_before;
__entry->committing = !!committing;
__entry->exceeded = !!exceeded;
),
TP_printk(SCSBF" holding %u applying %u nr %u avail_before %u freed_before %u committing %u exceeded %u",
TP_printk(SCSBF" holding %u applying %u nr %u avail_before %u freed_before %u exceeded %u",
SCSB_TRACE_ARGS, __entry->holding, __entry->applying, __entry->nr_holders,
__entry->avail_before, __entry->freed_before, __entry->committing,
__entry->exceeded)
__entry->avail_before, __entry->freed_before, __entry->exceeded)
);
DEFINE_EVENT(scoutfs_server_commit_users_class, scoutfs_server_commit_hold,
TP_PROTO(struct super_block *sb, int holding, int applying, int nr_holders,
u32 avail_before, u32 freed_before, int committing, int exceeded),
TP_ARGS(sb, holding, applying, nr_holders, avail_before, freed_before, committing, exceeded)
u32 avail_before, u32 freed_before, int exceeded),
TP_ARGS(sb, holding, applying, nr_holders, avail_before, freed_before, exceeded)
);
DEFINE_EVENT(scoutfs_server_commit_users_class, scoutfs_server_commit_apply,
TP_PROTO(struct super_block *sb, int holding, int applying, int nr_holders,
u32 avail_before, u32 freed_before, int committing, int exceeded),
TP_ARGS(sb, holding, applying, nr_holders, avail_before, freed_before, committing, exceeded)
u32 avail_before, u32 freed_before, int exceeded),
TP_ARGS(sb, holding, applying, nr_holders, avail_before, freed_before, exceeded)
);
DEFINE_EVENT(scoutfs_server_commit_users_class, scoutfs_server_commit_start,
TP_PROTO(struct super_block *sb, int holding, int applying, int nr_holders,
u32 avail_before, u32 freed_before, int committing, int exceeded),
TP_ARGS(sb, holding, applying, nr_holders, avail_before, freed_before, committing, exceeded)
u32 avail_before, u32 freed_before, int exceeded),
TP_ARGS(sb, holding, applying, nr_holders, avail_before, freed_before, exceeded)
);
DEFINE_EVENT(scoutfs_server_commit_users_class, scoutfs_server_commit_end,
TP_PROTO(struct super_block *sb, int holding, int applying, int nr_holders,
u32 avail_before, u32 freed_before, int committing, int exceeded),
TP_ARGS(sb, holding, applying, nr_holders, avail_before, freed_before, committing, exceeded)
u32 avail_before, u32 freed_before, int exceeded),
TP_ARGS(sb, holding, applying, nr_holders, avail_before, freed_before, exceeded)
);
#define slt_symbolic(mode) \
@@ -2094,71 +2071,6 @@ TRACE_EVENT(scoutfs_trans_seq_last,
SCSB_TRACE_ARGS, __entry->s_rid, __entry->trans_seq)
);
TRACE_EVENT(scoutfs_server_finalize_items,
TP_PROTO(struct super_block *sb, u64 rid, u64 item_rid, u64 item_nr, u64 item_flags,
u64 item_get_trans_seq),
TP_ARGS(sb, rid, item_rid, item_nr, item_flags, item_get_trans_seq),
TP_STRUCT__entry(
SCSB_TRACE_FIELDS
__field(__u64, c_rid)
__field(__u64, item_rid)
__field(__u64, item_nr)
__field(__u64, item_flags)
__field(__u64, item_get_trans_seq)
),
TP_fast_assign(
SCSB_TRACE_ASSIGN(sb);
__entry->c_rid = rid;
__entry->item_rid = item_rid;
__entry->item_nr = item_nr;
__entry->item_flags = item_flags;
__entry->item_get_trans_seq = item_get_trans_seq;
),
TP_printk(SCSBF" rid %016llx item_rid %016llx item_nr %llu item_flags 0x%llx item_get_trans_seq %llu",
SCSB_TRACE_ARGS, __entry->c_rid, __entry->item_rid, __entry->item_nr,
__entry->item_flags, __entry->item_get_trans_seq)
);
TRACE_EVENT(scoutfs_server_finalize_decision,
TP_PROTO(struct super_block *sb, u64 rid, bool saw_finalized, bool others_active,
bool ours_visible, bool finalize_ours, unsigned int delay_ms,
u64 finalize_sent_seq),
TP_ARGS(sb, rid, saw_finalized, others_active, ours_visible, finalize_ours, delay_ms,
finalize_sent_seq),
TP_STRUCT__entry(
SCSB_TRACE_FIELDS
__field(__u64, c_rid)
__field(bool, saw_finalized)
__field(bool, others_active)
__field(bool, ours_visible)
__field(bool, finalize_ours)
__field(unsigned int, delay_ms)
__field(__u64, finalize_sent_seq)
),
TP_fast_assign(
SCSB_TRACE_ASSIGN(sb);
__entry->c_rid = rid;
__entry->saw_finalized = saw_finalized;
__entry->others_active = others_active;
__entry->ours_visible = ours_visible;
__entry->finalize_ours = finalize_ours;
__entry->delay_ms = delay_ms;
__entry->finalize_sent_seq = finalize_sent_seq;
),
TP_printk(SCSBF" rid %016llx saw_finalized %u others_active %u ours_visible %u finalize_ours %u delay_ms %u finalize_sent_seq %llu",
SCSB_TRACE_ARGS, __entry->c_rid, __entry->saw_finalized, __entry->others_active,
__entry->ours_visible, __entry->finalize_ours, __entry->delay_ms,
__entry->finalize_sent_seq)
);
TRACE_EVENT(scoutfs_get_log_merge_status,
TP_PROTO(struct super_block *sb, u64 rid, struct scoutfs_key *next_range_key,
u64 nr_requests, u64 nr_complete, u64 seq),
@@ -2399,44 +2311,6 @@ TRACE_EVENT(scoutfs_block_dirty_ref,
__entry->block_blkno, __entry->block_seq)
);
TRACE_EVENT(scoutfs_block_stale,
TP_PROTO(struct super_block *sb, struct scoutfs_block_ref *ref,
struct scoutfs_block_header *hdr, u32 magic, u32 crc),
TP_ARGS(sb, ref, hdr, magic, crc),
TP_STRUCT__entry(
SCSB_TRACE_FIELDS
__field(__u64, ref_blkno)
__field(__u64, ref_seq)
__field(__u32, hdr_crc)
__field(__u32, hdr_magic)
__field(__u64, hdr_fsid)
__field(__u64, hdr_seq)
__field(__u64, hdr_blkno)
__field(__u32, magic)
__field(__u32, crc)
),
TP_fast_assign(
SCSB_TRACE_ASSIGN(sb);
__entry->ref_blkno = le64_to_cpu(ref->blkno);
__entry->ref_seq = le64_to_cpu(ref->seq);
__entry->hdr_crc = le32_to_cpu(hdr->crc);
__entry->hdr_magic = le32_to_cpu(hdr->magic);
__entry->hdr_fsid = le64_to_cpu(hdr->fsid);
__entry->hdr_seq = le64_to_cpu(hdr->seq);
__entry->hdr_blkno = le64_to_cpu(hdr->blkno);
__entry->magic = magic;
__entry->crc = crc;
),
TP_printk(SCSBF" ref_blkno %llu ref_seq %016llx hdr_crc %08x hdr_magic %08x hdr_fsid %016llx hdr_seq %016llx hdr_blkno %llu magic %08x crc %08x",
SCSB_TRACE_ARGS, __entry->ref_blkno, __entry->ref_seq, __entry->hdr_crc,
__entry->hdr_magic, __entry->hdr_fsid, __entry->hdr_seq, __entry->hdr_blkno,
__entry->magic, __entry->crc)
);
DECLARE_EVENT_CLASS(scoutfs_block_class,
TP_PROTO(struct super_block *sb, void *bp, u64 blkno, int refcount, int io_count,
unsigned long bits, __u64 accessed),
@@ -2921,81 +2795,6 @@ TRACE_EVENT(scoutfs_omap_should_delete,
SCSB_TRACE_ARGS, __entry->ino, __entry->nlink, __entry->ret)
);
#define SSCF_FMT "[bo %llu bs %llu es %llu]"
#define SSCF_FIELDS(pref) \
__field(__u64, pref##_blkno) \
__field(__u64, pref##_blocks) \
__field(__u64, pref##_entries)
#define SSCF_ASSIGN(pref, sfl) \
__entry->pref##_blkno = le64_to_cpu((sfl)->ref.blkno); \
__entry->pref##_blocks = le64_to_cpu((sfl)->blocks); \
__entry->pref##_entries = le64_to_cpu((sfl)->entries);
#define SSCF_ENTRY_ARGS(pref) \
__entry->pref##_blkno, \
__entry->pref##_blocks, \
__entry->pref##_entries
DECLARE_EVENT_CLASS(scoutfs_srch_compact_class,
TP_PROTO(struct super_block *sb, struct scoutfs_srch_compact *sc),
TP_ARGS(sb, sc),
TP_STRUCT__entry(
SCSB_TRACE_FIELDS
__field(__u64, id)
__field(__u8, nr)
__field(__u8, flags)
SSCF_FIELDS(out)
__field(__u64, in0_blk)
__field(__u64, in0_pos)
SSCF_FIELDS(in0)
__field(__u64, in1_blk)
__field(__u64, in1_pos)
SSCF_FIELDS(in1)
__field(__u64, in2_blk)
__field(__u64, in2_pos)
SSCF_FIELDS(in2)
__field(__u64, in3_blk)
__field(__u64, in3_pos)
SSCF_FIELDS(in3)
),
TP_fast_assign(
SCSB_TRACE_ASSIGN(sb);
__entry->id = le64_to_cpu(sc->id);
__entry->nr = sc->nr;
__entry->flags = sc->flags;
SSCF_ASSIGN(out, &sc->out)
__entry->in0_blk = le64_to_cpu(sc->in[0].blk);
__entry->in0_pos = le64_to_cpu(sc->in[0].pos);
SSCF_ASSIGN(in0, &sc->in[0].sfl)
__entry->in1_blk = le64_to_cpu(sc->in[0].blk);
__entry->in1_pos = le64_to_cpu(sc->in[0].pos);
SSCF_ASSIGN(in1, &sc->in[1].sfl)
__entry->in2_blk = le64_to_cpu(sc->in[0].blk);
__entry->in2_pos = le64_to_cpu(sc->in[0].pos);
SSCF_ASSIGN(in2, &sc->in[2].sfl)
__entry->in3_blk = le64_to_cpu(sc->in[0].blk);
__entry->in3_pos = le64_to_cpu(sc->in[0].pos);
SSCF_ASSIGN(in3, &sc->in[3].sfl)
),
TP_printk(SCSBF" id %llu nr %u flags 0x%x out "SSCF_FMT" in0 b %llu p %llu "SSCF_FMT" in1 b %llu p %llu "SSCF_FMT" in2 b %llu p %llu "SSCF_FMT" in3 b %llu p %llu "SSCF_FMT,
SCSB_TRACE_ARGS, __entry->id, __entry->nr, __entry->flags, SSCF_ENTRY_ARGS(out),
__entry->in0_blk, __entry->in0_pos, SSCF_ENTRY_ARGS(in0),
__entry->in1_blk, __entry->in1_pos, SSCF_ENTRY_ARGS(in1),
__entry->in2_blk, __entry->in2_pos, SSCF_ENTRY_ARGS(in2),
__entry->in3_blk, __entry->in3_pos, SSCF_ENTRY_ARGS(in3))
);
DEFINE_EVENT(scoutfs_srch_compact_class, scoutfs_srch_compact_client_send,
TP_PROTO(struct super_block *sb, struct scoutfs_srch_compact *sc),
TP_ARGS(sb, sc)
);
DEFINE_EVENT(scoutfs_srch_compact_class, scoutfs_srch_compact_client_recv,
TP_PROTO(struct super_block *sb, struct scoutfs_srch_compact *sc),
TP_ARGS(sb, sc)
);
#endif /* _TRACE_SCOUTFS_H */
/* This part must be outside protection */
+118 -164
View File
@@ -67,7 +67,6 @@ struct commit_users {
unsigned int nr_holders;
u32 avail_before;
u32 freed_before;
bool committing;
bool exceeded;
};
@@ -85,13 +84,12 @@ do { \
__typeof__(cusers) _cusers = (cusers); \
trace_scoutfs_server_commit_##which(sb, !list_empty(&_cusers->holding), \
!list_empty(&_cusers->applying), _cusers->nr_holders, _cusers->avail_before, \
_cusers->freed_before, _cusers->committing, _cusers->exceeded); \
_cusers->freed_before, _cusers->exceeded); \
} while (0)
struct server_info {
struct super_block *sb;
spinlock_t lock;
seqlock_t seqlock;
wait_queue_head_t waitq;
struct workqueue_struct *wq;
@@ -133,9 +131,11 @@ struct server_info {
struct mutex mounted_clients_mutex;
/* stable super stored from commits, given in locks and rpcs */
seqcount_t stable_seqcount;
struct scoutfs_super_block stable_super;
/* serializing and get and set volume options */
seqcount_t volopt_seqcount;
struct mutex volopt_mutex;
struct scoutfs_volume_options volopt;
@@ -148,8 +148,6 @@ struct server_info {
struct scoutfs_quorum_config qconf;
/* a running server maintains a private dirty super */
struct scoutfs_super_block dirty_super;
u64 finalize_sent_seq;
};
#define DECLARE_SERVER_INFO(sb, name) \
@@ -183,7 +181,7 @@ static bool get_volopt_val(struct server_info *server, int nr, u64 *val)
unsigned seq;
do {
seq = read_seqbegin(&server->seqlock);
seq = read_seqcount_begin(&server->volopt_seqcount);
if ((le64_to_cpu(server->volopt.set_bits) & bit)) {
is_set = true;
*val = le64_to_cpup(opt);
@@ -191,7 +189,7 @@ static bool get_volopt_val(struct server_info *server, int nr, u64 *val)
is_set = false;
*val = 0;
};
} while (read_seqretry(&server->seqlock, seq));
} while (read_seqcount_retry(&server->volopt_seqcount, seq));
return is_set;
}
@@ -284,14 +282,6 @@ struct commit_hold {
* per-holder allocation consumption tracking. The best we can do is
* flag all the current holders so that as they release we can see
* everyone involved in crossing the limit.
*
* The consumption of space to record freed blocks is tricky. The
* freed_before value was the space available as the holder started.
* But that happens before we actually dirty the first block in the
* freed list. If that block is too full then we just allocate a new
* empty first block. In that case the current remaining here can be a
* lot more than the initial freed_before. We account for that and
* treat freed_before as the maximum capacity.
*/
static void check_holder_budget(struct super_block *sb, struct server_info *server,
struct commit_users *cusers)
@@ -311,13 +301,8 @@ static void check_holder_budget(struct super_block *sb, struct server_info *serv
return;
scoutfs_alloc_meta_remaining(&server->alloc, &avail_now, &freed_now);
avail_used = cusers->avail_before - avail_now;
if (freed_now < cusers->freed_before)
freed_used = cusers->freed_before - freed_now;
else
freed_used = SCOUTFS_ALLOC_LIST_MAX_BLOCKS - freed_now;
freed_used = cusers->freed_before - freed_now;
budget = cusers->nr_holders * COMMIT_HOLD_ALLOC_BUDGET;
if (avail_used <= budget && freed_used <= budget)
return;
@@ -340,18 +325,31 @@ static void check_holder_budget(struct super_block *sb, struct server_info *serv
/*
* We don't have per-holder consumption. We allow commit holders as
* long as the total budget of all the holders doesn't exceed the alloc
* resources that were available. If a hold is waiting for budget
* availability in the allocators then we try and kick off a commit to
* fill and use the next allocators after the current transaction.
* resources that were available
*/
static bool commit_alloc_has_room(struct server_info *server, struct commit_users *cusers,
unsigned int more_holders)
{
u32 avail_before;
u32 freed_before;
u32 budget;
if (cusers->nr_holders > 0) {
avail_before = cusers->avail_before;
freed_before = cusers->freed_before;
} else {
scoutfs_alloc_meta_remaining(&server->alloc, &avail_before, &freed_before);
}
budget = (cusers->nr_holders + more_holders) * COMMIT_HOLD_ALLOC_BUDGET;
return avail_before >= budget && freed_before >= budget;
}
static bool hold_commit(struct super_block *sb, struct server_info *server,
struct commit_users *cusers, struct commit_hold *hold)
{
bool has_room;
bool held;
u32 budget;
u32 av;
u32 fr;
bool held = false;
spin_lock(&cusers->lock);
@@ -359,39 +357,19 @@ static bool hold_commit(struct super_block *sb, struct server_info *server,
check_holder_budget(sb, server, cusers);
if (cusers->nr_holders == 0) {
scoutfs_alloc_meta_remaining(&server->alloc, &av, &fr);
} else {
av = cusers->avail_before;
fr = cusers->freed_before;
}
/* +2 for our additional hold and then for the final commit work the server does */
budget = (cusers->nr_holders + 2) * COMMIT_HOLD_ALLOC_BUDGET;
has_room = av >= budget && fr >= budget;
/* checking applying so holders drain once an apply caller starts waiting */
held = !cusers->committing && has_room && list_empty(&cusers->applying);
if (held) {
if (list_empty(&cusers->applying) && commit_alloc_has_room(server, cusers, 2)) {
scoutfs_alloc_meta_remaining(&server->alloc, &hold->avail, &hold->freed);
if (cusers->nr_holders == 0) {
cusers->avail_before = av;
cusers->freed_before = fr;
hold->avail = av;
hold->freed = fr;
cusers->avail_before = hold->avail;
cusers->freed_before = hold->freed;
cusers->exceeded = false;
} else {
scoutfs_alloc_meta_remaining(&server->alloc, &hold->avail, &hold->freed);
}
hold->exceeded = false;
hold->start = ktime_get();
list_add_tail(&hold->entry, &cusers->holding);
cusers->nr_holders++;
} else if (!has_room && cusers->nr_holders == 0 && !cusers->committing) {
cusers->committing = true;
queue_work(server->wq, &server->commit_work);
held = true;
}
spin_unlock(&cusers->lock);
@@ -415,27 +393,6 @@ static void server_hold_commit(struct super_block *sb, struct commit_hold *hold)
wait_event(cusers->waitq, hold_commit(sb, server, cusers, hold));
}
/*
* Return the higher of the avail or freed used by the active commit
* since this holder joined the commit. This is *not* the amount used
* by the holder, we don't track per-holder alloc use.
*/
static u32 server_hold_alloc_used_since(struct super_block *sb, struct commit_hold *hold)
{
DECLARE_SERVER_INFO(sb, server);
u32 avail_used;
u32 freed_used;
u32 avail_now;
u32 freed_now;
scoutfs_alloc_meta_remaining(&server->alloc, &avail_now, &freed_now);
avail_used = hold->avail - avail_now;
freed_used = hold->freed - freed_now;
return max(avail_used, freed_used);
}
/*
* This is called while holding the commit and returns once the commit
* is successfully written. Many holders can all wait for all holders
@@ -446,6 +403,7 @@ static int server_apply_commit(struct super_block *sb, struct commit_hold *hold,
DECLARE_SERVER_INFO(sb, server);
struct commit_users *cusers = &server->cusers;
struct timespec ts;
bool start_commit;
spin_lock(&cusers->lock);
@@ -466,15 +424,13 @@ static int server_apply_commit(struct super_block *sb, struct commit_hold *hold,
list_del_init(&hold->entry);
hold->ret = err;
}
cusers->nr_holders--;
if (cusers->nr_holders == 0 && !cusers->committing && !list_empty(&cusers->applying)) {
cusers->committing = true;
queue_work(server->wq, &server->commit_work);
}
start_commit = cusers->nr_holders == 0 && !list_empty(&cusers->applying);
spin_unlock(&cusers->lock);
if (start_commit)
queue_work(server->wq, &server->commit_work);
wait_event(cusers->waitq, list_empty_careful(&hold->entry));
smp_rmb(); /* entry load before ret */
return hold->ret;
@@ -482,8 +438,8 @@ static int server_apply_commit(struct super_block *sb, struct commit_hold *hold,
/*
* Start a commit from the commit work. We should only have been queued
* while there are no active holders and someone started the commit.
* There may or may not be blocked apply callers waiting for the result.
* while a holder is waiting to apply after all active holders have
* finished.
*/
static int commit_start(struct super_block *sb, struct commit_users *cusers)
{
@@ -492,7 +448,7 @@ static int commit_start(struct super_block *sb, struct commit_users *cusers)
/* make sure holders held off once commit started */
spin_lock(&cusers->lock);
TRACE_COMMIT_USERS(sb, cusers, start);
if (WARN_ON_ONCE(!cusers->committing || cusers->nr_holders != 0))
if (WARN_ON_ONCE(list_empty(&cusers->applying) || cusers->nr_holders != 0))
ret = -EINVAL;
spin_unlock(&cusers->lock);
@@ -515,7 +471,6 @@ static void commit_end(struct super_block *sb, struct commit_users *cusers, int
smp_wmb(); /* ret stores before list updates */
list_for_each_entry_safe(hold, tmp, &cusers->applying, entry)
list_del_init(&hold->entry);
cusers->committing = false;
spin_unlock(&cusers->lock);
wake_up(&cusers->waitq);
@@ -528,7 +483,7 @@ static void get_stable(struct super_block *sb, struct scoutfs_super_block *super
unsigned int seq;
do {
seq = read_seqbegin(&server->seqlock);
seq = read_seqcount_begin(&server->stable_seqcount);
if (super)
*super = server->stable_super;
if (roots) {
@@ -536,7 +491,7 @@ static void get_stable(struct super_block *sb, struct scoutfs_super_block *super
roots->logs_root = server->stable_super.logs_root;
roots->srch_root = server->stable_super.srch_root;
}
} while (read_seqretry(&server->seqlock, seq));
} while (read_seqcount_retry(&server->stable_seqcount, seq));
}
u64 scoutfs_server_seq(struct super_block *sb)
@@ -570,9 +525,11 @@ void scoutfs_server_set_seq_if_greater(struct super_block *sb, u64 seq)
static void set_stable_super(struct server_info *server, struct scoutfs_super_block *super)
{
write_seqlock(&server->seqlock);
preempt_disable();
write_seqcount_begin(&server->stable_seqcount);
server->stable_super = *super;
write_sequnlock(&server->seqlock);
write_seqcount_end(&server->stable_seqcount);
preempt_enable();
}
/*
@@ -586,7 +543,7 @@ static void set_stable_super(struct server_info *server, struct scoutfs_super_bl
* implement commits with a single pending work func.
*
* Processing paths hold the commit while they're making multiple
* dependent changes. When they're done and want it persistent they
* dependent changes. When they're done and want it persistent they add
* queue the commit work. This work runs, performs the commit, and
* wakes all the applying waiters with the result. Readers can run
* concurrently with these commits.
@@ -961,24 +918,22 @@ static int find_log_trees_item(struct super_block *sb,
}
/*
* Find the log_trees item with the greatest nr for each rid. Fills the
* caller's log_trees and sets the key before the returned log_trees for
* the next iteration. Returns 0 when done, > 0 for each item, and
* -errno on fatal errors.
* Find the next log_trees item from the key. Fills the caller's log_trees and sets
* the key past the returned log_trees for iteration. Returns 0 when done, > 0 for each
* item, and -errno on fatal errors.
*/
static int for_each_rid_last_lt(struct super_block *sb, struct scoutfs_btree_root *root,
struct scoutfs_key *key, struct scoutfs_log_trees *lt)
static int for_each_lt(struct super_block *sb, struct scoutfs_btree_root *root,
struct scoutfs_key *key, struct scoutfs_log_trees *lt)
{
SCOUTFS_BTREE_ITEM_REF(iref);
int ret;
ret = scoutfs_btree_prev(sb, root, key, &iref);
ret = scoutfs_btree_next(sb, root, key, &iref);
if (ret == 0) {
if (iref.val_len == sizeof(struct scoutfs_log_trees)) {
memcpy(lt, iref.val, iref.val_len);
*key = *iref.key;
key->sklt_nr = 0;
scoutfs_key_dec(key);
scoutfs_key_inc(key);
ret = 1;
} else {
ret = -EIO;
@@ -1073,13 +1028,21 @@ static int next_log_merge_item(struct super_block *sb,
* abandoned log btree finalized. If it takes too long each client has
* a change to make forward progress before being asked to commit again.
*
* We're waiting on heavy state that is protected by mutexes and
* transaction machinery. It's tricky to recreate that state for
* lightweight condition tests that don't change task state. Instead of
* trying to get that right, particularly as we unwind after success or
* after timeouts, waiters use an unsatisfying poll. Short enough to
* not add terrible latency, given how heavy and infrequent this already
* is, and long enough to not melt the cpu. This could be tuned if it
* becomes a problem.
*
* This can end up finalizing a new empty log btree if a new mount
* happens to arrive at just the right time. That's fine, merging will
* ignore and tear down the empty input.
*/
#define FINALIZE_POLL_MIN_DELAY_MS 5U
#define FINALIZE_POLL_MAX_DELAY_MS 100U
#define FINALIZE_POLL_DELAY_GROWTH_PCT 150U
#define FINALIZE_POLL_MS (11)
#define FINALIZE_TIMEOUT_MS (MSEC_PER_SEC / 2)
static int finalize_and_start_log_merge(struct super_block *sb, struct scoutfs_log_trees *lt,
u64 rid, struct commit_hold *hold)
{
@@ -1087,10 +1050,8 @@ static int finalize_and_start_log_merge(struct super_block *sb, struct scoutfs_l
struct scoutfs_super_block *super = DIRTY_SUPER_SB(sb);
struct scoutfs_log_merge_status stat;
struct scoutfs_log_merge_range rng;
struct scoutfs_mount_options opts;
struct scoutfs_log_trees each_lt;
struct scoutfs_log_trees fin;
unsigned int delay_ms;
unsigned long timeo;
bool saw_finalized;
bool others_active;
@@ -1098,14 +1059,10 @@ static int finalize_and_start_log_merge(struct super_block *sb, struct scoutfs_l
bool ours_visible;
struct scoutfs_key key;
char *err_str = NULL;
ktime_t start;
int ret;
int err;
scoutfs_options_read(sb, &opts);
timeo = jiffies + msecs_to_jiffies(opts.log_merge_wait_timeout_ms);
delay_ms = FINALIZE_POLL_MIN_DELAY_MS;
start = ktime_get_raw();
timeo = jiffies + msecs_to_jiffies(FINALIZE_TIMEOUT_MS);
for (;;) {
/* nothing to do if there's already a merge in flight */
@@ -1122,13 +1079,8 @@ static int finalize_and_start_log_merge(struct super_block *sb, struct scoutfs_l
saw_finalized = false;
others_active = false;
ours_visible = false;
scoutfs_key_init_log_trees(&key, U64_MAX, U64_MAX);
while ((ret = for_each_rid_last_lt(sb, &super->logs_root, &key, &each_lt)) > 0) {
trace_scoutfs_server_finalize_items(sb, rid, le64_to_cpu(each_lt.rid),
le64_to_cpu(each_lt.nr),
le64_to_cpu(each_lt.flags),
le64_to_cpu(each_lt.get_trans_seq));
scoutfs_key_init_log_trees(&key, 0, 0);
while ((ret = for_each_lt(sb, &super->logs_root, &key, &each_lt)) > 0) {
if ((le64_to_cpu(each_lt.flags) & SCOUTFS_LOG_TREES_FINALIZED))
saw_finalized = true;
@@ -1153,10 +1105,6 @@ static int finalize_and_start_log_merge(struct super_block *sb, struct scoutfs_l
finalize_ours = (lt->item_root.height > 2) ||
(le32_to_cpu(lt->meta_avail.flags) & SCOUTFS_ALLOC_FLAG_LOW);
trace_scoutfs_server_finalize_decision(sb, rid, saw_finalized, others_active,
ours_visible, finalize_ours, delay_ms,
server->finalize_sent_seq);
/* done if we're not finalizing and there's no finalized */
if (!finalize_ours && !saw_finalized) {
ret = 0;
@@ -1164,13 +1112,12 @@ static int finalize_and_start_log_merge(struct super_block *sb, struct scoutfs_l
}
/* send sync requests soon to give time to commit */
scoutfs_key_init_log_trees(&key, U64_MAX, U64_MAX);
scoutfs_key_init_log_trees(&key, 0, 0);
while (others_active &&
(ret = for_each_rid_last_lt(sb, &super->logs_root, &key, &each_lt)) > 0) {
(ret = for_each_lt(sb, &super->logs_root, &key, &each_lt)) > 0) {
if ((le64_to_cpu(each_lt.flags) & SCOUTFS_LOG_TREES_FINALIZED) ||
(le64_to_cpu(each_lt.rid) == rid) ||
(le64_to_cpu(each_lt.get_trans_seq) <= server->finalize_sent_seq))
(le64_to_cpu(each_lt.rid) == rid))
continue;
ret = scoutfs_net_submit_request_node(sb, server->conn,
@@ -1190,8 +1137,6 @@ static int finalize_and_start_log_merge(struct super_block *sb, struct scoutfs_l
break;
}
server->finalize_sent_seq = scoutfs_server_seq(sb);
/* Finalize ours if it's visible to others */
if (ours_visible) {
fin = *lt;
@@ -1229,16 +1174,13 @@ static int finalize_and_start_log_merge(struct super_block *sb, struct scoutfs_l
if (ret < 0)
err_str = "applying commit before waiting for finalized";
msleep(delay_ms);
delay_ms = min(delay_ms * FINALIZE_POLL_DELAY_GROWTH_PCT / 100,
FINALIZE_POLL_MAX_DELAY_MS);
msleep(FINALIZE_POLL_MS);
server_hold_commit(sb, hold);
mutex_lock(&server->logs_mutex);
/* done if we timed out */
if (time_after(jiffies, timeo)) {
scoutfs_inc_counter(sb, log_merge_wait_timeout);
ret = 0;
break;
}
@@ -1821,29 +1763,43 @@ out:
* Give the caller the last seq before outstanding client commits. All
* seqs up to and including this are stable, new client transactions can
* only have greater seqs.
*
* For each rid, only its greatest log trees nr can be an open commit.
* We look at the last log_trees item for each client rid and record its
* trans seq if it hasn't been committed.
*/
static int get_stable_trans_seq(struct super_block *sb, u64 *last_seq_ret)
{
struct scoutfs_super_block *super = DIRTY_SUPER_SB(sb);
DECLARE_SERVER_INFO(sb, server);
struct scoutfs_log_trees lt;
SCOUTFS_BTREE_ITEM_REF(iref);
struct scoutfs_log_trees *lt;
struct scoutfs_key key;
u64 last_seq = 0;
int ret;
last_seq = scoutfs_server_seq(sb) - 1;
scoutfs_key_init_log_trees(&key, 0, 0);
mutex_lock(&server->logs_mutex);
scoutfs_key_init_log_trees(&key, U64_MAX, U64_MAX);
while ((ret = for_each_rid_last_lt(sb, &super->logs_root, &key, &lt)) > 0) {
if ((le64_to_cpu(lt.get_trans_seq) > le64_to_cpu(lt.commit_trans_seq)) &&
le64_to_cpu(lt.get_trans_seq) <= last_seq) {
last_seq = le64_to_cpu(lt.get_trans_seq) - 1;
for (;; scoutfs_key_inc(&key)) {
ret = scoutfs_btree_next(sb, &super->logs_root, &key, &iref);
if (ret == 0) {
if (iref.val_len == sizeof(*lt)) {
lt = iref.val;
if ((le64_to_cpu(lt->get_trans_seq) >
le64_to_cpu(lt->commit_trans_seq)) &&
le64_to_cpu(lt->get_trans_seq) <= last_seq) {
last_seq = le64_to_cpu(lt->get_trans_seq) - 1;
}
key = *iref.key;
} else {
ret = -EIO;
}
scoutfs_btree_put_iref(&iref);
}
if (ret < 0) {
if (ret == -ENOENT) {
ret = 0;
break;
}
}
}
@@ -1990,7 +1946,9 @@ static int server_srch_get_compact(struct super_block *sb,
ret = scoutfs_srch_get_compact(sb, &server->alloc, &server->wri,
&super->srch_root, rid, sc);
mutex_unlock(&server->srch_mutex);
if (ret < 0 || (ret == 0 && sc->nr == 0))
if (ret == 0 && sc->nr == 0)
ret = -ENOENT;
if (ret < 0)
goto apply;
mutex_lock(&server->alloc_mutex);
@@ -2495,11 +2453,9 @@ static void server_log_merge_free_work(struct work_struct *work)
while (!server_is_stopping(server)) {
if (!commit) {
server_hold_commit(sb, &hold);
mutex_lock(&server->logs_mutex);
commit = true;
}
server_hold_commit(sb, &hold);
mutex_lock(&server->logs_mutex);
commit = true;
ret = next_log_merge_item(sb, &super->log_merge,
SCOUTFS_LOG_MERGE_FREEING_ZONE,
@@ -2546,14 +2502,12 @@ static void server_log_merge_free_work(struct work_struct *work)
/* freed blocks are in allocator, we *have* to update fr */
BUG_ON(ret < 0);
if (server_hold_alloc_used_since(sb, &hold) >= COMMIT_HOLD_ALLOC_BUDGET / 2) {
mutex_unlock(&server->logs_mutex);
ret = server_apply_commit(sb, &hold, ret);
commit = false;
if (ret < 0) {
err_str = "looping commit del/upd freeing item";
break;
}
mutex_unlock(&server->logs_mutex);
ret = server_apply_commit(sb, &hold, ret);
commit = false;
if (ret < 0) {
err_str = "looping commit del/upd freeing item";
break;
}
}
@@ -3096,9 +3050,9 @@ static int server_get_volopt(struct super_block *sb, struct scoutfs_net_connecti
}
do {
seq = read_seqbegin(&server->seqlock);
seq = read_seqcount_begin(&server->volopt_seqcount);
volopt = server->volopt;
} while (read_seqretry(&server->seqlock, seq));
} while (read_seqcount_retry(&server->volopt_seqcount, seq));
out:
return scoutfs_net_response(sb, conn, cmd, id, ret, &volopt, sizeof(volopt));
@@ -3167,12 +3121,12 @@ static int server_set_volopt(struct super_block *sb, struct scoutfs_net_connecti
apply:
ret = server_apply_commit(sb, &hold, ret);
write_seqlock(&server->seqlock);
write_seqcount_begin(&server->volopt_seqcount);
if (ret == 0)
server->volopt = super->volopt;
else
super->volopt = server->volopt;
write_sequnlock(&server->seqlock);
write_seqcount_end(&server->volopt_seqcount);
mutex_unlock(&server->volopt_mutex);
out:
@@ -3215,12 +3169,12 @@ static int server_clear_volopt(struct super_block *sb, struct scoutfs_net_connec
ret = server_apply_commit(sb, &hold, ret);
write_seqlock(&server->seqlock);
write_seqcount_begin(&server->volopt_seqcount);
if (ret == 0)
server->volopt = super->volopt;
else
super->volopt = server->volopt;
write_sequnlock(&server->seqlock);
write_seqcount_end(&server->volopt_seqcount);
mutex_unlock(&server->volopt_mutex);
out:
@@ -4326,7 +4280,6 @@ static void scoutfs_server_worker(struct work_struct *work)
scoutfs_info(sb, "server starting at "SIN_FMT, SIN_ARG(&sin));
scoutfs_block_writer_init(sb, &server->wri);
server->finalize_sent_seq = 0;
/* first make sure no other servers are still running */
ret = scoutfs_quorum_fence_leaders(sb, &server->qconf, server->term);
@@ -4360,9 +4313,9 @@ static void scoutfs_server_worker(struct work_struct *work)
}
/* update volume options early, possibly for use during startup */
write_seqlock(&server->seqlock);
write_seqcount_begin(&server->volopt_seqcount);
server->volopt = super->volopt;
write_sequnlock(&server->seqlock);
write_seqcount_end(&server->volopt_seqcount);
atomic64_set(&server->seq_atomic, le64_to_cpu(super->seq));
set_stable_super(server, super);
@@ -4502,7 +4455,6 @@ int scoutfs_server_setup(struct super_block *sb)
server->sb = sb;
spin_lock_init(&server->lock);
seqlock_init(&server->seqlock);
init_waitqueue_head(&server->waitq);
INIT_WORK(&server->work, scoutfs_server_worker);
server->status = SERVER_DOWN;
@@ -4517,6 +4469,8 @@ int scoutfs_server_setup(struct super_block *sb)
INIT_WORK(&server->log_merge_free_work, server_log_merge_free_work);
mutex_init(&server->srch_mutex);
mutex_init(&server->mounted_clients_mutex);
seqcount_init(&server->stable_seqcount);
seqcount_init(&server->volopt_seqcount);
mutex_init(&server->volopt_mutex);
INIT_WORK(&server->fence_pending_recov_work, fence_pending_recov_worker);
INIT_DELAYED_WORK(&server->reclaim_dwork, reclaim_worker);
+13 -188
View File
@@ -30,9 +30,6 @@
#include "client.h"
#include "counters.h"
#include "scoutfs_trace.h"
#include "triggers.h"
#include "sysfs.h"
#include "msg.h"
/*
* This srch subsystem gives us a way to find inodes that have a given
@@ -71,14 +68,10 @@ struct srch_info {
atomic_t shutdown;
struct workqueue_struct *workq;
struct delayed_work compact_dwork;
struct scoutfs_sysfs_attrs ssa;
atomic_t compact_delay_ms;
};
#define DECLARE_SRCH_INFO(sb, name) \
struct srch_info *name = SCOUTFS_SB(sb)->srch_info
#define DECLARE_SRCH_INFO_KOBJ(kobj, name) \
DECLARE_SRCH_INFO(SCOUTFS_SYSFS_ATTRS_SB(kobj), name)
#define SRE_FMT "%016llx.%llu.%llu"
#define SRE_ARG(sre) \
@@ -527,95 +520,6 @@ out:
return ret;
}
/*
* Padded entries are encoded in pairs after an existing entry. All of
* the pairs cancel each other out by all readers (the second encoding
* looks like deletion) so they aren't visible to the first/last bounds of
* the block or file.
*/
static int append_padded_entry(struct scoutfs_srch_file *sfl, u64 blk,
struct scoutfs_srch_block *srb, struct scoutfs_srch_entry *sre)
{
int ret;
ret = encode_entry(srb->entries + le32_to_cpu(srb->entry_bytes),
sre, &srb->tail);
if (ret > 0) {
srb->tail = *sre;
le32_add_cpu(&srb->entry_nr, 1);
le32_add_cpu(&srb->entry_bytes, ret);
le64_add_cpu(&sfl->entries, 1);
ret = 0;
}
return ret;
}
/*
* This is called by a testing trigger to create a very specific case of
* encoded entry offsets. We want the last entry in the block to start
* precisely at the _SAFE_BYTES offset.
*
* This is called when there is a single existing entry in the block.
* We have the entire block to work with. We encode pairs of matching
* entries. This hides them from readers (both searches and merging) as
* they're interpreted as creation and deletion and are deleted. We use
* the existing hash value of the first entry in the block but then set
* the inode to an impossibly large number so it doesn't interfere with
* anything.
*
* To hit the specific offset we very carefully manage the amount of
* bytes of change between fields in the entry. We know that if we
* change all the byte of the ino and id we end up with a 20 byte
* (2+8+8,2) encoding of the pair of entries. To have the last entry
* start at the _SAFE_POS offset we know that the final 20 byte pair
* encoding needs to end at 2 bytes (second entry encoding) after the
* _SAFE_POS offset.
*
* So as we encode pairs we watch the delta of our current offset from
* that desired final offset of 2 past _SAFE_POS. If we're a multiple
* of 20 away then we encode the full 20 byte pairs. If we're not, then
* we drop a byte to encode 19 bytes. That'll slowly change the offset
* to be a multiple of 20 again while encoding large entries.
*/
static void pad_entries_at_safe(struct scoutfs_srch_file *sfl, u64 blk,
struct scoutfs_srch_block *srb)
{
struct scoutfs_srch_entry sre;
u32 target;
s32 diff;
u64 hash;
u64 ino;
u64 id;
int ret;
hash = le64_to_cpu(srb->tail.hash);
ino = le64_to_cpu(srb->tail.ino) | (1ULL << 62);
id = le64_to_cpu(srb->tail.id);
target = SCOUTFS_SRCH_BLOCK_SAFE_BYTES + 2;
while ((diff = target - le32_to_cpu(srb->entry_bytes)) > 0) {
ino ^= 1ULL << (7 * 8);
if (diff % 20 == 0) {
id ^= 1ULL << (7 * 8);
} else {
id ^= 1ULL << (6 * 8);
}
sre.hash = cpu_to_le64(hash);
sre.ino = cpu_to_le64(ino);
sre.id = cpu_to_le64(id);
ret = append_padded_entry(sfl, blk, srb, &sre);
if (ret == 0)
ret = append_padded_entry(sfl, blk, srb, &sre);
BUG_ON(ret != 0);
diff = target - le32_to_cpu(srb->entry_bytes);
}
}
/*
* The caller is dropping an ino/id because the tracking rbtree is full.
* This loses information so we can't return any entries at or after the
@@ -1083,9 +987,6 @@ int scoutfs_srch_rotate_log(struct super_block *sb,
struct scoutfs_key key;
int ret;
if (sfl->ref.blkno && !force && scoutfs_trigger(sb, SRCH_FORCE_LOG_ROTATE))
force = true;
if (sfl->ref.blkno == 0 ||
(!force && le64_to_cpu(sfl->blocks) < SCOUTFS_SRCH_LOG_BLOCK_LIMIT))
return 0;
@@ -1561,7 +1462,7 @@ static int kway_merge(struct super_block *sb,
struct scoutfs_block_writer *wri,
struct scoutfs_srch_file *sfl,
kway_get_t kway_get, kway_advance_t kway_adv,
void **args, int nr, bool logs_input)
void **args, int nr)
{
DECLARE_SRCH_INFO(sb, srinf);
struct scoutfs_srch_block *srb = NULL;
@@ -1666,15 +1567,6 @@ static int kway_merge(struct super_block *sb,
blk++;
}
/* end sorted block on _SAFE offset for testing */
if (bl && le32_to_cpu(srb->entry_nr) == 1 && logs_input &&
scoutfs_trigger(sb, SRCH_COMPACT_LOGS_PAD_SAFE)) {
pad_entries_at_safe(sfl, blk, srb);
scoutfs_block_put(sb, bl);
bl = NULL;
blk++;
}
scoutfs_inc_counter(sb, srch_compact_entry);
} else {
@@ -1717,8 +1609,6 @@ static int kway_merge(struct super_block *sb,
empty++;
ret = 0;
} else if (ret < 0) {
if (ret == -ENOANO) /* just testing trigger */
ret = 0;
goto out;
}
@@ -1926,7 +1816,7 @@ static int compact_logs(struct super_block *sb,
}
ret = kway_merge(sb, alloc, wri, &sc->out, kway_get_page, kway_adv_page,
args, nr_pages, true);
args, nr_pages);
if (ret < 0)
goto out;
@@ -1984,18 +1874,12 @@ static int kway_get_reader(struct super_block *sb,
srb = rdr->bl->data;
if (rdr->pos > SCOUTFS_SRCH_BLOCK_SAFE_BYTES ||
rdr->skip > SCOUTFS_SRCH_BLOCK_SAFE_BYTES ||
rdr->skip >= SCOUTFS_SRCH_BLOCK_SAFE_BYTES ||
rdr->skip >= le32_to_cpu(srb->entry_bytes)) {
/* XXX inconsistency */
return -EIO;
}
if (rdr->decoded_bytes == 0 && rdr->pos == SCOUTFS_SRCH_BLOCK_SAFE_BYTES &&
scoutfs_trigger(sb, SRCH_MERGE_STOP_SAFE)) {
/* only used in testing */
return -ENOANO;
}
/* decode entry, possibly skipping start of the block */
while (rdr->decoded_bytes == 0 || rdr->pos < rdr->skip) {
ret = decode_entry(srb->entries + rdr->pos,
@@ -2085,7 +1969,7 @@ static int compact_sorted(struct super_block *sb,
}
ret = kway_merge(sb, alloc, wri, &sc->out, kway_get_reader,
kway_adv_reader, args, nr, false);
kway_adv_reader, args, nr);
sc->flags |= SCOUTFS_SRCH_COMPACT_FLAG_DONE;
for (i = 0; i < nr; i++) {
@@ -2214,15 +2098,8 @@ static int delete_files(struct super_block *sb, struct scoutfs_alloc *alloc,
return ret;
}
static void queue_compact_work(struct srch_info *srinf, bool immediate)
{
unsigned long delay;
if (!atomic_read(&srinf->shutdown)) {
delay = immediate ? 0 : msecs_to_jiffies(atomic_read(&srinf->compact_delay_ms));
queue_delayed_work(srinf->workq, &srinf->compact_dwork, delay);
}
}
/* wait 10s between compact attempts on error, immediate after success */
#define SRCH_COMPACT_DELAY_MS (10 * MSEC_PER_SEC)
/*
* Get a compaction operation from the server, sort the entries from the
@@ -2250,6 +2127,7 @@ static void scoutfs_srch_compact_worker(struct work_struct *work)
struct super_block *sb = srinf->sb;
struct scoutfs_block_writer wri;
struct scoutfs_alloc alloc;
unsigned long delay;
int ret;
int err;
@@ -2262,8 +2140,6 @@ static void scoutfs_srch_compact_worker(struct work_struct *work)
scoutfs_block_writer_init(sb, &wri);
ret = scoutfs_client_srch_get_compact(sb, sc);
if (ret >= 0)
trace_scoutfs_srch_compact_client_recv(sb, sc);
if (ret < 0 || sc->nr == 0)
goto out;
@@ -2292,7 +2168,6 @@ commit:
sc->meta_freed = alloc.freed;
sc->flags |= ret < 0 ? SCOUTFS_SRCH_COMPACT_FLAG_ERROR : 0;
trace_scoutfs_srch_compact_client_send(sb, sc);
err = scoutfs_client_srch_commit_compact(sb, sc);
if (err < 0 && ret == 0)
ret = err;
@@ -2303,56 +2178,14 @@ out:
scoutfs_inc_counter(sb, srch_compact_error);
scoutfs_block_writer_forget_all(sb, &wri);
queue_compact_work(srinf, sc->nr > 0 && ret == 0);
if (!atomic_read(&srinf->shutdown)) {
delay = ret == 0 ? 0 : msecs_to_jiffies(SRCH_COMPACT_DELAY_MS);
queue_delayed_work(srinf->workq, &srinf->compact_dwork, delay);
}
kfree(sc);
}
static ssize_t compact_delay_ms_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf)
{
DECLARE_SRCH_INFO_KOBJ(kobj, srinf);
return snprintf(buf, PAGE_SIZE, "%u", atomic_read(&srinf->compact_delay_ms));
}
#define MIN_COMPACT_DELAY_MS MSEC_PER_SEC
#define DEF_COMPACT_DELAY_MS (10 * MSEC_PER_SEC)
#define MAX_COMPACT_DELAY_MS (60 * MSEC_PER_SEC)
static ssize_t compact_delay_ms_store(struct kobject *kobj, struct kobj_attribute *attr,
const char *buf, size_t count)
{
struct super_block *sb = SCOUTFS_SYSFS_ATTRS_SB(kobj);
DECLARE_SRCH_INFO(sb, srinf);
char nullterm[30]; /* more than enough for octal -U64_MAX */
u64 val;
int len;
int ret;
len = min(count, sizeof(nullterm) - 1);
memcpy(nullterm, buf, len);
nullterm[len] = '\0';
ret = kstrtoll(nullterm, 0, &val);
if (ret < 0 || val < MIN_COMPACT_DELAY_MS || val > MAX_COMPACT_DELAY_MS) {
scoutfs_err(sb, "invalid compact_delay_ms value, must be between %lu and %lu",
MIN_COMPACT_DELAY_MS, MAX_COMPACT_DELAY_MS);
return -EINVAL;
}
atomic_set(&srinf->compact_delay_ms, val);
cancel_delayed_work(&srinf->compact_dwork);
queue_compact_work(srinf, false);
return count;
}
SCOUTFS_ATTR_RW(compact_delay_ms);
static struct attribute *srch_attrs[] = {
SCOUTFS_ATTR_PTR(compact_delay_ms),
NULL,
};
void scoutfs_srch_destroy(struct super_block *sb)
{
struct scoutfs_sb_info *sbi = SCOUTFS_SB(sb);
@@ -2369,8 +2202,6 @@ void scoutfs_srch_destroy(struct super_block *sb)
destroy_workqueue(srinf->workq);
}
scoutfs_sysfs_destroy_attrs(sb, &srinf->ssa);
kfree(srinf);
sbi->srch_info = NULL;
}
@@ -2388,15 +2219,8 @@ int scoutfs_srch_setup(struct super_block *sb)
srinf->sb = sb;
atomic_set(&srinf->shutdown, 0);
INIT_DELAYED_WORK(&srinf->compact_dwork, scoutfs_srch_compact_worker);
scoutfs_sysfs_init_attrs(sb, &srinf->ssa);
atomic_set(&srinf->compact_delay_ms, DEF_COMPACT_DELAY_MS);
sbi->srch_info = srinf;
ret = scoutfs_sysfs_create_attrs(sb, &srinf->ssa, srch_attrs, "srch");
if (ret < 0)
goto out;
srinf->workq = alloc_workqueue("scoutfs_srch_compact",
WQ_NON_REENTRANT | WQ_UNBOUND |
WQ_HIGHPRI, 0);
@@ -2405,7 +2229,8 @@ int scoutfs_srch_setup(struct super_block *sb)
goto out;
}
queue_compact_work(srinf, false);
queue_delayed_work(srinf->workq, &srinf->compact_dwork,
msecs_to_jiffies(SRCH_COMPACT_DELAY_MS));
ret = 0;
out:
-6
View File
@@ -49,8 +49,6 @@
#include "volopt.h"
#include "fence.h"
#include "xattr.h"
#include "wkic.h"
#include "quota.h"
#include "scoutfs_trace.h"
static struct dentry *scoutfs_debugfs_root;
@@ -196,9 +194,7 @@ static void scoutfs_put_super(struct super_block *sb)
scoutfs_shutdown_trans(sb);
scoutfs_volopt_destroy(sb);
scoutfs_client_destroy(sb);
scoutfs_quota_destroy(sb);
scoutfs_inode_destroy(sb);
scoutfs_wkic_destroy(sb);
scoutfs_item_destroy(sb);
scoutfs_forest_destroy(sb);
scoutfs_data_destroy(sb);
@@ -548,9 +544,7 @@ static int scoutfs_fill_super(struct super_block *sb, void *data, int silent)
scoutfs_block_setup(sb) ?:
scoutfs_forest_setup(sb) ?:
scoutfs_item_setup(sb) ?:
scoutfs_wkic_setup(sb) ?:
scoutfs_inode_setup(sb) ?:
scoutfs_quota_setup(sb) ?:
scoutfs_data_setup(sb) ?:
scoutfs_setup_trans(sb) ?:
scoutfs_omap_setup(sb) ?:
-17
View File
@@ -30,8 +30,6 @@ struct recov_info;
struct omap_info;
struct volopt_info;
struct fence_info;
struct wkic_info;
struct squota_info;
struct scoutfs_sb_info {
struct super_block *sb;
@@ -57,8 +55,6 @@ struct scoutfs_sb_info {
struct omap_info *omap_info;
struct volopt_info *volopt_info;
struct item_cache_info *item_cache_info;
struct wkic_info *wkic_info;
struct squota_info *squota_info;
struct fence_info *fence_info;
/* tracks tasks waiting for data extents */
@@ -160,17 +156,4 @@ int scoutfs_write_super(struct super_block *sb,
/* to keep this out of the ioctl.h public interface definition */
long scoutfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
/*
* Returns 0 when supported, non-zero -errno when unsupported.
*/
static inline int scoutfs_fmt_vers_unsupported(struct super_block *sb, u64 vers)
{
struct scoutfs_sb_info *sbi = SCOUTFS_SB(sb);
if (sbi && (sbi->fmt_vers < vers))
return -EOPNOTSUPP;
else
return 0;
}
#endif
-90
View File
@@ -1,90 +0,0 @@
/*
* Copyright (C) 2023 Versity Software, Inc. All rights reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public
* License v2 as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*/
#include <linux/kernel.h>
#include <linux/string.h>
#include "format.h"
#include "forest.h"
#include "totl.h"
void scoutfs_totl_set_range(struct scoutfs_key *start, struct scoutfs_key *end)
{
scoutfs_key_set_zeros(start);
start->sk_zone = SCOUTFS_XATTR_TOTL_ZONE;
scoutfs_key_set_ones(end);
end->sk_zone = SCOUTFS_XATTR_TOTL_ZONE;
}
void scoutfs_totl_merge_init(struct scoutfs_totl_merging *merg)
{
memset(merg, 0, sizeof(struct scoutfs_totl_merging));
}
void scoutfs_totl_merge_contribute(struct scoutfs_totl_merging *merg,
u64 seq, u8 flags, void *val, int val_len, int fic)
{
struct scoutfs_xattr_totl_val *tval = val;
if (fic & FIC_FS_ROOT) {
merg->fs_seq = seq;
merg->fs_total = le64_to_cpu(tval->total);
merg->fs_count = le64_to_cpu(tval->count);
} else if (fic & FIC_FINALIZED) {
merg->fin_seq = seq;
merg->fin_total += le64_to_cpu(tval->total);
merg->fin_count += le64_to_cpu(tval->count);
} else {
merg->log_seq = seq;
merg->log_total += le64_to_cpu(tval->total);
merg->log_count += le64_to_cpu(tval->count);
}
}
/*
* .totl. item merging has to be careful because the log btree merging
* code can write partial results to the fs_root. This means that a
* reader can see both cases where new finalized logs should be applied
* to the old fs items and where old finalized logs have already been
* applied to the partially merged fs items. Currently active logged
* items are always applied on top of all cases.
*
* These cases are differentiated with a combination of sequence numbers
* in items, the count of contributing xattrs, and a flag
* differentiating finalized and active logged items. This lets us
* recognize all cases, including when finalized logs were merged and
* deleted the fs item.
*/
void scoutfs_totl_merge_resolve(struct scoutfs_totl_merging *merg, __u64 *total, __u64 *count)
{
*total = 0;
*count = 0;
/* start with the fs item if we have it */
if (merg->fs_seq != 0) {
*total = merg->fs_total;
*count = merg->fs_count;
}
/* apply finalized logs if they're newer or creating */
if (((merg->fs_seq != 0) && (merg->fin_seq > merg->fs_seq)) ||
((merg->fs_seq == 0) && (merg->fin_count > 0))) {
*total += merg->fin_total;
*count += merg->fin_count;
}
/* always apply active logs which must be newer than fs and finalized */
if (merg->log_seq > 0) {
*total += merg->log_total;
*count += merg->log_count;
}
}
-24
View File
@@ -1,24 +0,0 @@
#ifndef _SCOUTFS_TOTL_H_
#define _SCOUTFS_TOTL_H_
#include "key.h"
struct scoutfs_totl_merging {
u64 fs_seq;
u64 fs_total;
u64 fs_count;
u64 fin_seq;
u64 fin_total;
s64 fin_count;
u64 log_seq;
u64 log_total;
s64 log_count;
};
void scoutfs_totl_set_range(struct scoutfs_key *start, struct scoutfs_key *end);
void scoutfs_totl_merge_init(struct scoutfs_totl_merging *merg);
void scoutfs_totl_merge_contribute(struct scoutfs_totl_merging *merg,
u64 seq, u8 flags, void *val, int val_len, int fic);
void scoutfs_totl_merge_resolve(struct scoutfs_totl_merging *merg, __u64 *total, __u64 *count);
#endif
-143
View File
@@ -1,143 +0,0 @@
/*
* Tracing squota_input
*/
#define SQI_FMT "[%u %llu %llu %llu]"
#define SQI_ARGS(i) \
(i)->op, (i)->attrs[0], (i)->attrs[1], (i)->attrs[2]
#define SQI_FIELDS(pref) \
__array(__u64, pref##_attrs, SQ_NS__NR_SELECT) \
__field(__u8, pref##_op)
#define SQI_ASSIGN(pref, i) \
__entry->pref##_attrs[0] = (i)->attrs[0]; \
__entry->pref##_attrs[1] = (i)->attrs[1]; \
__entry->pref##_attrs[2] = (i)->attrs[2]; \
__entry->pref##_op = (i)->op;
#define SQI_ENTRY_ARGS(pref) \
__entry->pref##_op, __entry->pref##_attrs[0], \
__entry->pref##_attrs[1], __entry->pref##_attrs[2]
/*
* Tracing squota_rule
*/
#define SQR_FMT "[%u %llu,%u,%x %llu,%u,%x %llu,%u,%x %u %llu]"
#define SQR_ARGS(r) \
(r)->prio, \
(r)->name_val[0], (r)->name_source[0], (r)->name_flags[0], \
(r)->name_val[1], (r)->name_source[1], (r)->name_flags[1], \
(r)->name_val[2], (r)->name_source[2], (r)->name_flags[2], \
(r)->op, (r)->limit \
#define SQR_FIELDS(pref) \
__array(__u64, pref##_name_val, 3) \
__field(__u64, pref##_limit) \
__array(__u8, pref##_name_source, 3) \
__array(__u8, pref##_name_flags, 3) \
__field(__u8, pref##_prio) \
__field(__u8, pref##_op)
#define SQR_ASSIGN(pref, r) \
__entry->pref##_name_val[0] = (r)->names[0].val; \
__entry->pref##_name_val[1] = (r)->names[1].val; \
__entry->pref##_name_val[2] = (r)->names[2].val; \
__entry->pref##_limit = (r)->limit; \
__entry->pref##_name_source[0] = (r)->names[0].source; \
__entry->pref##_name_source[1] = (r)->names[1].source; \
__entry->pref##_name_source[2] = (r)->names[2].source; \
__entry->pref##_name_flags[0] = (r)->names[0].flags; \
__entry->pref##_name_flags[1] = (r)->names[1].flags; \
__entry->pref##_name_flags[2] = (r)->names[2].flags; \
__entry->pref##_prio = (r)->prio; \
__entry->pref##_op = (r)->op;
#define SQR_ENTRY_ARGS(pref) \
__entry->pref##_prio, __entry->pref##_name_val[0], \
__entry->pref##_name_source[0], __entry->pref##_name_flags[0], \
__entry->pref##_name_val[1], __entry->pref##_name_source[1], \
__entry->pref##_name_flags[1], __entry->pref##_name_val[2], \
__entry->pref##_name_source[2], __entry->pref##_name_flags[2], \
__entry->pref##_op, __entry->pref##_limit
TRACE_EVENT(scoutfs_quota_check,
TP_PROTO(struct super_block *sb, long rs_ptr, struct squota_input *inp, int ret),
TP_ARGS(sb, rs_ptr, inp, ret),
TP_STRUCT__entry(
SCSB_TRACE_FIELDS
__field(long, rs_ptr)
SQI_FIELDS(i)
__field(int, ret)
),
TP_fast_assign(
SCSB_TRACE_ASSIGN(sb);
__entry->rs_ptr = rs_ptr;
SQI_ASSIGN(i, inp);
__entry->ret = ret;
),
TP_printk(SCSBF" rs_ptr %ld ret %d inp "SQI_FMT,
SCSB_TRACE_ARGS, __entry->rs_ptr, __entry->ret, SQI_ENTRY_ARGS(i))
);
DECLARE_EVENT_CLASS(scoutfs_quota_rule_op_class,
TP_PROTO(struct super_block *sb, struct squota_rule *rule, int ret),
TP_ARGS(sb, rule, ret),
TP_STRUCT__entry(
SCSB_TRACE_FIELDS
SQR_FIELDS(r)
__field(int, ret)
),
TP_fast_assign(
SCSB_TRACE_ASSIGN(sb);
SQR_ASSIGN(r, rule);
__entry->ret = ret;
),
TP_printk(SCSBF" "SQR_FMT" ret %d",
SCSB_TRACE_ARGS, SQR_ENTRY_ARGS(r), __entry->ret)
);
DEFINE_EVENT(scoutfs_quota_rule_op_class, scoutfs_quota_add_rule,
TP_PROTO(struct super_block *sb, struct squota_rule *rule, int ret),
TP_ARGS(sb, rule, ret)
);
DEFINE_EVENT(scoutfs_quota_rule_op_class, scoutfs_quota_del_rule,
TP_PROTO(struct super_block *sb, struct squota_rule *rule, int ret),
TP_ARGS(sb, rule, ret)
);
TRACE_EVENT(scoutfs_quota_totl_check,
TP_PROTO(struct super_block *sb, struct squota_input *inp, struct scoutfs_key *key,
u64 limit, int ret),
TP_ARGS(sb, inp, key, limit, ret),
TP_STRUCT__entry(
SCSB_TRACE_FIELDS
SQI_FIELDS(i)
sk_trace_define(k)
__field(__u64, limit)
__field(int, ret)
),
TP_fast_assign(
SCSB_TRACE_ASSIGN(sb);
SQI_ASSIGN(i, inp);
sk_trace_assign(k, key);
__entry->limit = limit;
__entry->ret = ret;
),
TP_printk(SCSBF" inp "SQI_FMT" key "SK_FMT" limit %llu ret %d",
SCSB_TRACE_ARGS, SQI_ENTRY_ARGS(i), sk_trace_args(k), __entry->limit,
__entry->ret)
);
-112
View File
@@ -1,112 +0,0 @@
DECLARE_EVENT_CLASS(scoutfs_wkic_wpage_class,
TP_PROTO(struct super_block *sb, void *ptr, int which, bool n0l, bool n1l,
struct scoutfs_key *start, struct scoutfs_key *end),
TP_ARGS(sb, ptr, which, n0l, n1l, start, end),
TP_STRUCT__entry(
SCSB_TRACE_FIELDS
__field(void *, ptr)
__field(int, which)
__field(bool, n0l)
__field(bool, n1l)
sk_trace_define(start)
sk_trace_define(end)
),
TP_fast_assign(
SCSB_TRACE_ASSIGN(sb);
__entry->ptr = ptr;
__entry->which = which;
__entry->n0l = n0l;
__entry->n1l = n1l;
sk_trace_assign(start, start);
sk_trace_assign(end, end);
__entry->which = which;
),
TP_printk(SCSBF" ptr %p wh %d nl %u,%u start "SK_FMT " end "SK_FMT, SCSB_TRACE_ARGS,
__entry->ptr, __entry->which, __entry->n0l, __entry->n1l,
sk_trace_args(start), sk_trace_args(end))
);
DEFINE_EVENT(scoutfs_wkic_wpage_class, scoutfs_wkic_wpage_alloced,
TP_PROTO(struct super_block *sb, void *ptr, int which, bool n0l, bool n1l,
struct scoutfs_key *start, struct scoutfs_key *end),
TP_ARGS(sb, ptr, which, n0l, n1l, start, end)
);
DEFINE_EVENT(scoutfs_wkic_wpage_class, scoutfs_wkic_wpage_freeing,
TP_PROTO(struct super_block *sb, void *ptr, int which, bool n0l, bool n1l,
struct scoutfs_key *start, struct scoutfs_key *end),
TP_ARGS(sb, ptr, which, n0l, n1l, start, end)
);
DEFINE_EVENT(scoutfs_wkic_wpage_class, scoutfs_wkic_wpage_found,
TP_PROTO(struct super_block *sb, void *ptr, int which, bool n0l, bool n1l,
struct scoutfs_key *start, struct scoutfs_key *end),
TP_ARGS(sb, ptr, which, n0l, n1l, start, end)
);
DEFINE_EVENT(scoutfs_wkic_wpage_class, scoutfs_wkic_wpage_trimmed,
TP_PROTO(struct super_block *sb, void *ptr, int which, bool n0l, bool n1l,
struct scoutfs_key *start, struct scoutfs_key *end),
TP_ARGS(sb, ptr, which, n0l, n1l, start, end)
);
DEFINE_EVENT(scoutfs_wkic_wpage_class, scoutfs_wkic_wpage_erased,
TP_PROTO(struct super_block *sb, void *ptr, int which, bool n0l, bool n1l,
struct scoutfs_key *start, struct scoutfs_key *end),
TP_ARGS(sb, ptr, which, n0l, n1l, start, end)
);
DEFINE_EVENT(scoutfs_wkic_wpage_class, scoutfs_wkic_wpage_inserting,
TP_PROTO(struct super_block *sb, void *ptr, int which, bool n0l, bool n1l,
struct scoutfs_key *start, struct scoutfs_key *end),
TP_ARGS(sb, ptr, which, n0l, n1l, start, end)
);
DEFINE_EVENT(scoutfs_wkic_wpage_class, scoutfs_wkic_wpage_inserted,
TP_PROTO(struct super_block *sb, void *ptr, int which, bool n0l, bool n1l,
struct scoutfs_key *start, struct scoutfs_key *end),
TP_ARGS(sb, ptr, which, n0l, n1l, start, end)
);
DEFINE_EVENT(scoutfs_wkic_wpage_class, scoutfs_wkic_wpage_shrinking,
TP_PROTO(struct super_block *sb, void *ptr, int which, bool n0l, bool n1l,
struct scoutfs_key *start, struct scoutfs_key *end),
TP_ARGS(sb, ptr, which, n0l, n1l, start, end)
);
DEFINE_EVENT(scoutfs_wkic_wpage_class, scoutfs_wkic_wpage_dropping,
TP_PROTO(struct super_block *sb, void *ptr, int which, bool n0l, bool n1l,
struct scoutfs_key *start, struct scoutfs_key *end),
TP_ARGS(sb, ptr, which, n0l, n1l, start, end)
);
DEFINE_EVENT(scoutfs_wkic_wpage_class, scoutfs_wkic_wpage_replaying,
TP_PROTO(struct super_block *sb, void *ptr, int which, bool n0l, bool n1l,
struct scoutfs_key *start, struct scoutfs_key *end),
TP_ARGS(sb, ptr, which, n0l, n1l, start, end)
);
DEFINE_EVENT(scoutfs_wkic_wpage_class, scoutfs_wkic_wpage_filled,
TP_PROTO(struct super_block *sb, void *ptr, int which, bool n0l, bool n1l,
struct scoutfs_key *start, struct scoutfs_key *end),
TP_ARGS(sb, ptr, which, n0l, n1l, start, end)
);
TRACE_EVENT(scoutfs_wkic_read_items,
TP_PROTO(struct super_block *sb, struct scoutfs_key *key, struct scoutfs_key *start,
struct scoutfs_key *end),
TP_ARGS(sb, key, start, end),
TP_STRUCT__entry(
SCSB_TRACE_FIELDS
sk_trace_define(key)
sk_trace_define(start)
sk_trace_define(end)
),
TP_fast_assign(
SCSB_TRACE_ASSIGN(sb);
sk_trace_assign(key, start);
sk_trace_assign(start, start);
sk_trace_assign(end, end);
),
TP_printk(SCSBF" key "SK_FMT" start "SK_FMT " end "SK_FMT, SCSB_TRACE_ARGS,
sk_trace_args(key), sk_trace_args(start), sk_trace_args(end))
);
-3
View File
@@ -39,9 +39,6 @@ struct scoutfs_triggers {
static char *names[] = {
[SCOUTFS_TRIGGER_BLOCK_REMOVE_STALE] = "block_remove_stale",
[SCOUTFS_TRIGGER_SRCH_COMPACT_LOGS_PAD_SAFE] = "srch_compact_logs_pad_safe",
[SCOUTFS_TRIGGER_SRCH_FORCE_LOG_ROTATE] = "srch_force_log_rotate",
[SCOUTFS_TRIGGER_SRCH_MERGE_STOP_SAFE] = "srch_merge_stop_safe",
[SCOUTFS_TRIGGER_STATFS_LOCK_PURGE] = "statfs_lock_purge",
};
-3
View File
@@ -3,9 +3,6 @@
enum scoutfs_trigger {
SCOUTFS_TRIGGER_BLOCK_REMOVE_STALE,
SCOUTFS_TRIGGER_SRCH_COMPACT_LOGS_PAD_SAFE,
SCOUTFS_TRIGGER_SRCH_FORCE_LOG_ROTATE,
SCOUTFS_TRIGGER_SRCH_MERGE_STOP_SAFE,
SCOUTFS_TRIGGER_STATFS_LOCK_PURGE,
SCOUTFS_TRIGGER_NR,
};
+2 -2
View File
@@ -23,9 +23,9 @@ static inline void down_write_two(struct rw_semaphore *a,
* ~0UL values. Hence, we cap count to ~0L, which is arbitarily high
* enough to avoid it.
*/
static inline long shrinker_min_long(long count)
static inline unsigned long shrinker_min_t_long(unsigned long count)
{
return min(count, LONG_MAX);
return min_t(u64, count, LONG_MAX);
}
#endif
-1155
View File
File diff suppressed because it is too large Load Diff
-19
View File
@@ -1,19 +0,0 @@
#ifndef _SCOUTFS_WKIC_H_
#define _SCOUTFS_WKIC_H_
#include "format.h"
typedef int (*wkic_iter_cb_t)(struct scoutfs_key *key, void *val, unsigned int val_len,
void *cb_arg);
int scoutfs_wkic_iterate(struct super_block *sb, struct scoutfs_key *key, struct scoutfs_key *last,
struct scoutfs_key *range_start, struct scoutfs_key *range_end,
wkic_iter_cb_t cb, void *cb_arg);
int scoutfs_wkic_iterate_stable(struct super_block *sb, struct scoutfs_key *key,
struct scoutfs_key *last, struct scoutfs_key *range_start,
struct scoutfs_key *range_end, wkic_iter_cb_t cb, void *cb_arg);
int scoutfs_wkic_setup(struct super_block *sb);
void scoutfs_wkic_destroy(struct super_block *sb);
#endif
+42 -209
View File
@@ -81,20 +81,7 @@ static void init_xattr_key(struct scoutfs_key *key, u64 ino, u32 name_hash,
#define SCOUTFS_XATTR_PREFIX "scoutfs."
#define SCOUTFS_XATTR_PREFIX_LEN (sizeof(SCOUTFS_XATTR_PREFIX) - 1)
/*
* We could have hidden the logic that needs this in a user-prefix
* specific .set handler, but I wanted to make sure that we always
* applied that logic from any call chains to _xattr_set. The
* additional strcmp isn't so expensive given all the rest of the work
* we're doing in here.
*/
static inline bool is_user(const char *name)
{
return !strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN);
}
#define HIDE_TAG "hide."
#define INDX_TAG "indx."
#define SRCH_TAG "srch."
#define TOTL_TAG "totl."
#define TAG_LEN (sizeof(HIDE_TAG) - 1)
@@ -116,9 +103,6 @@ int scoutfs_xattr_parse_tags(const char *name, unsigned int name_len,
if (!strncmp(name, HIDE_TAG, TAG_LEN)) {
if (++tgs->hide == 0)
return -EINVAL;
} else if (!strncmp(name, INDX_TAG, TAG_LEN)) {
if (++tgs->indx == 0)
return -EINVAL;
} else if (!strncmp(name, SRCH_TAG, TAG_LEN)) {
if (++tgs->srch == 0)
return -EINVAL;
@@ -556,57 +540,47 @@ static int parse_totl_u64(const char *s, int len, u64 *res)
}
/*
* non-destructive relatively quick parse of final dotted u64s in an
* xattr name. If the required number of values are found then we
* return the number of bytes in the name that are not the final dotted
* u64s with their dots. -EINVAL is returned if we didn't find the
* required number of values.
* non-destructive relatively quick parse of the last 3 dotted u64s that
* make up the name of the xattr total. -EINVAL is returned if there
* are anything but 3 valid u64 encodings between single dots at the end
* of the name.
*/
static int parse_dotted_u64s(u64 *u64s, int nr, const char *name, int name_len)
static int parse_totl_key(struct scoutfs_key *key, const char *name, int name_len)
{
u64 tot_name[3];
int end = name_len;
int nr = 0;
int len;
int ret;
int i;
int u;
/* parse name elements in reserve order from end of xattr name string */
for (u = nr - 1, i = name_len - 1; u >= 0 && i >= 0; i--) {
for (i = name_len - 1; i >= 0 && nr < ARRAY_SIZE(tot_name); i--) {
if (name[i] != '.')
continue;
len = end - (i + 1);
ret = parse_totl_u64(&name[i + 1], len, &u64s[u]);
ret = parse_totl_u64(&name[i + 1], len, &tot_name[nr]);
if (ret < 0)
goto out;
end = i;
u--;
nr++;
}
if (u == -1)
ret = end;
else
if (nr == ARRAY_SIZE(tot_name)) {
/* swap to account for parsing in reverse */
swap(tot_name[0], tot_name[2]);
scoutfs_xattr_init_totl_key(key, tot_name);
ret = 0;
} else {
ret = -EINVAL;
}
out:
return ret;
}
static int parse_totl_key(struct scoutfs_key *key, const char *name, int name_len)
{
u64 u64s[3];
int ret;
ret = parse_dotted_u64s(u64s, ARRAY_SIZE(u64s), name, name_len);
if (ret >= 0) {
scoutfs_xattr_init_totl_key(key, u64s);
ret = 0;
}
return ret;
}
static int apply_totl_delta(struct super_block *sb, struct scoutfs_key *key,
struct scoutfs_xattr_totl_val *tval, struct scoutfs_lock *lock)
{
@@ -633,72 +607,6 @@ int scoutfs_xattr_combine_totl(void *dst, int dst_len, void *src, int src_len)
return SCOUTFS_DELTA_COMBINED;
}
void scoutfs_xattr_indx_get_range(struct scoutfs_key *start, struct scoutfs_key *end)
{
scoutfs_key_set_zeros(start);
start->sk_zone = SCOUTFS_XATTR_INDX_ZONE;
scoutfs_key_set_ones(end);
end->sk_zone = SCOUTFS_XATTR_INDX_ZONE;
}
/*
* .indx. keys are a bit funny because we're iterating over index keys
* by major:minor:inode:xattr_id. That doesn't map nicely to the
* comparison precedence of the key fields. We have to mess around a
* little bit to get the major into the most significant key bits and
* the low bits of xattr id into the least significant key bits.
*/
void scoutfs_xattr_init_indx_key(struct scoutfs_key *key, u8 major, u64 minor, u64 ino, u64 xid)
{
scoutfs_key_set_zeros(key);
key->sk_zone = SCOUTFS_XATTR_INDX_ZONE;
key->_sk_first = cpu_to_le64(((u64)major << 56) | (minor >> 8));
key->_sk_second = cpu_to_le64((minor << 56) | (ino >> 8));
key->_sk_third = cpu_to_le64((ino << 56) | (xid >> 8));
key->_sk_fourth = xid & 0xff;
}
void scoutfs_xattr_get_indx_key(struct scoutfs_key *key, u8 *major, u64 *minor, u64 *ino, u64 *xid)
{
*major = le64_to_cpu(key->_sk_first) >> 56;
*minor = (le64_to_cpu(key->_sk_first) << 8) | (le64_to_cpu(key->_sk_second) >> 56);
*ino = (le64_to_cpu(key->_sk_second) << 8) | (le64_to_cpu(key->_sk_third) >> 56);
*xid = (le64_to_cpu(key->_sk_third) << 8) | key->_sk_fourth;
}
void scoutfs_xattr_set_indx_key_xid(struct scoutfs_key *key, u64 xid)
{
u8 major;
u64 minor;
u64 ino;
u64 dummy;
scoutfs_xattr_get_indx_key(key, &major, &minor, &ino, &dummy);
scoutfs_xattr_init_indx_key(key, major, minor, ino, xid);
}
/*
* This initial parsing of the name doesn't yet have access to an xattr
* id to put in the key. That's added later as the existing xattr is
* found or a new xattr's id is allocated.
*/
static int parse_indx_key(struct scoutfs_key *key, const char *name, int name_len, u64 ino)
{
u64 u64s[2];
int ret;
ret = parse_dotted_u64s(u64s, ARRAY_SIZE(u64s), name, name_len);
if (ret < 0)
return ret;
if (u64s[0] > U8_MAX)
return -EINVAL;
scoutfs_xattr_init_indx_key(key, u64s[0], u64s[1], ino, 0);
return 0;
}
/*
* The confusing swiss army knife of creating, modifying, and deleting
* xattrs.
@@ -719,7 +627,7 @@ static int parse_indx_key(struct scoutfs_key *key, const char *name, int name_le
int scoutfs_xattr_set_locked(struct inode *inode, const char *name, size_t name_len,
const void *value, size_t size, int flags,
const struct scoutfs_xattr_prefix_tags *tgs,
struct scoutfs_lock *lck, struct scoutfs_lock *tag_lock,
struct scoutfs_lock *lck, struct scoutfs_lock *totl_lock,
struct list_head *ind_locks)
{
struct scoutfs_inode_info *si = SCOUTFS_I(inode);
@@ -727,11 +635,10 @@ int scoutfs_xattr_set_locked(struct inode *inode, const char *name, size_t name_
const u64 ino = scoutfs_ino(inode);
struct scoutfs_xattr_totl_val tval = {0,};
struct scoutfs_xattr *xat = NULL;
struct scoutfs_key tag_key;
struct scoutfs_key totl_key;
struct scoutfs_key key;
bool undo_srch = false;
bool undo_totl = false;
bool undo_indx = false;
u8 found_parts;
unsigned int xat_bytes_totl;
unsigned int xat_bytes;
@@ -744,8 +651,7 @@ int scoutfs_xattr_set_locked(struct inode *inode, const char *name, size_t name_
trace_scoutfs_xattr_set(sb, name_len, value, size, flags);
if (WARN_ON_ONCE(tgs->totl && tgs->indx) ||
WARN_ON_ONCE((tgs->totl | tgs->indx) && !tag_lock))
if (WARN_ON_ONCE(tgs->totl && !totl_lock))
return -EINVAL;
/* mirror the syscall's errors for large names and values */
@@ -758,22 +664,10 @@ int scoutfs_xattr_set_locked(struct inode *inode, const char *name, size_t name_
(flags & ~(XATTR_CREATE | XATTR_REPLACE)))
return -EINVAL;
if ((tgs->hide | tgs->indx | tgs->srch | tgs->totl) && !capable(CAP_SYS_ADMIN))
if ((tgs->hide | tgs->srch | tgs->totl) && !capable(CAP_SYS_ADMIN))
return -EPERM;
if (tgs->totl && ((ret = parse_totl_key(&tag_key, name, name_len)) != 0))
return ret;
if (tgs->indx &&
(ret = scoutfs_fmt_vers_unsupported(sb, SCOUTFS_FORMAT_VERSION_FEAT_INDX_TAG)))
return ret;
if (tgs->indx && ((ret = parse_indx_key(&tag_key, name, name_len, ino)) != 0))
return ret;
/* retention blocks user. xattr modification, all else allowed */
ret = scoutfs_inode_check_retention(inode);
if (ret < 0 && is_user(name))
if (tgs->totl && ((ret = parse_totl_key(&totl_key, name, name_len)) != 0))
return ret;
/* allocate enough to always read an existing xattr's totl */
@@ -814,12 +708,6 @@ int scoutfs_xattr_set_locked(struct inode *inode, const char *name, size_t name_
/* found fields in key will also be used */
found_parts = ret >= 0 ? xattr_nr_parts(xat) : 0;
/* use existing xattr's id or allocate new when creating */
if (found_parts)
id = le64_to_cpu(key.skx_id);
else if (value)
id = si->next_xattr_id++;
if (found_parts && tgs->totl) {
/* parse old totl value before we clobber xat buf */
val_len = ret - offsetof(struct scoutfs_xattr, name[xat->name_len]);
@@ -830,25 +718,12 @@ int scoutfs_xattr_set_locked(struct inode *inode, const char *name, size_t name_
le64_add_cpu(&tval.total, -total);
}
/*
* indx xattrs don't have a value. After returning an error for
* non-zero val length or short circuiting modifying with the
* same 0 length, all we're left with is creating or deleting
* the xattr.
*/
if (tgs->indx) {
if (size != 0) {
ret = -EINVAL;
goto out;
}
if (found_parts && value) {
ret = 0;
goto out;
}
}
/* prepare the xattr header, name, and start of value in first item */
if (value) {
if (found_parts)
id = le64_to_cpu(key.skx_id);
else
id = si->next_xattr_id++;
xat->name_len = name_len;
xat->val_len = cpu_to_le16(size);
memset(xat->__pad, 0, sizeof(xat->__pad));
@@ -866,18 +741,9 @@ int scoutfs_xattr_set_locked(struct inode *inode, const char *name, size_t name_
le64_add_cpu(&tval.total, total);
}
if (tgs->indx) {
scoutfs_xattr_set_indx_key_xid(&tag_key, id);
if (value)
ret = scoutfs_item_create_force(sb, &tag_key, NULL, 0, tag_lock, NULL);
else
ret = scoutfs_item_delete_force(sb, &tag_key, tag_lock, NULL);
if (ret < 0)
goto out;
undo_indx = true;
}
if (tgs->srch && !(found_parts && value)) {
if (found_parts)
id = le64_to_cpu(key.skx_id);
hash = scoutfs_hash64(name, name_len);
ret = scoutfs_forest_srch_add(sb, hash, ino, id);
if (ret < 0)
@@ -886,7 +752,7 @@ int scoutfs_xattr_set_locked(struct inode *inode, const char *name, size_t name_
}
if (tgs->totl) {
ret = apply_totl_delta(sb, &tag_key, &tval, tag_lock);
ret = apply_totl_delta(sb, &totl_key, &tval, totl_lock);
if (ret < 0)
goto out;
undo_totl = true;
@@ -911,13 +777,6 @@ int scoutfs_xattr_set_locked(struct inode *inode, const char *name, size_t name_
ret = 0;
out:
if (ret < 0 && undo_indx) {
if (value)
err = scoutfs_item_delete_force(sb, &tag_key, tag_lock, NULL);
else
err = scoutfs_item_create_force(sb, &tag_key, NULL, 0, tag_lock, NULL);
BUG_ON(err); /* inconsistent */
}
if (ret < 0 && undo_srch) {
err = scoutfs_forest_srch_add(sb, hash, ino, id);
BUG_ON(err);
@@ -926,7 +785,7 @@ out:
/* _delta() on dirty items shouldn't fail */
tval.total = cpu_to_le64(-le64_to_cpu(tval.total));
tval.count = cpu_to_le64(-le64_to_cpu(tval.count));
err = apply_totl_delta(sb, &tag_key, &tval, tag_lock);
err = apply_totl_delta(sb, &totl_key, &tval, totl_lock);
BUG_ON(err);
}
@@ -942,7 +801,7 @@ static int scoutfs_xattr_set(struct dentry *dentry, const char *name, const void
struct inode *inode = dentry->d_inode;
struct super_block *sb = inode->i_sb;
struct scoutfs_xattr_prefix_tags tgs;
struct scoutfs_lock *tag_lock = NULL;
struct scoutfs_lock *totl_lock = NULL;
struct scoutfs_lock *lck = NULL;
size_t name_len = strlen(name);
LIST_HEAD(ind_locks);
@@ -957,11 +816,8 @@ static int scoutfs_xattr_set(struct dentry *dentry, const char *name, const void
if (ret)
goto unlock;
if (tgs.totl || tgs.indx) {
if (tgs.totl)
ret = scoutfs_lock_xattr_totl(sb, SCOUTFS_LOCK_WRITE_ONLY, 0, &tag_lock);
else
ret = scoutfs_lock_xattr_indx(sb, SCOUTFS_LOCK_WRITE_ONLY, 0, &tag_lock);
if (tgs.totl) {
ret = scoutfs_lock_xattr_totl(sb, SCOUTFS_LOCK_WRITE_ONLY, 0, &totl_lock);
if (ret)
goto unlock;
}
@@ -980,7 +836,7 @@ retry:
goto release;
ret = scoutfs_xattr_set_locked(dentry->d_inode, name, name_len, value, size, flags, &tgs,
lck, tag_lock, &ind_locks);
lck, totl_lock, &ind_locks);
if (ret == 0)
scoutfs_update_inode_item(inode, lck, &ind_locks);
@@ -989,7 +845,7 @@ release:
scoutfs_inode_index_unlock(sb, &ind_locks);
unlock:
scoutfs_unlock(sb, lck, SCOUTFS_LOCK_WRITE);
scoutfs_unlock(sb, tag_lock, SCOUTFS_LOCK_WRITE_ONLY);
scoutfs_unlock(sb, totl_lock, SCOUTFS_LOCK_WRITE_ONLY);
return ret;
}
@@ -1199,15 +1055,14 @@ int scoutfs_xattr_drop(struct super_block *sb, u64 ino,
{
struct scoutfs_xattr_prefix_tags tgs;
struct scoutfs_xattr *xat = NULL;
struct scoutfs_lock *tag_lock = NULL;
struct scoutfs_lock *totl_lock = NULL;
struct scoutfs_xattr_totl_val tval;
struct scoutfs_key tag_key;
struct scoutfs_key totl_key;
struct scoutfs_key last;
struct scoutfs_key key;
bool release = false;
unsigned int bytes;
unsigned int val_len;
u8 locked_zone = 0;
void *value;
u64 total;
u64 hash;
@@ -1253,32 +1108,16 @@ int scoutfs_xattr_drop(struct super_block *sb, u64 ino,
goto out;
}
ret = parse_totl_key(&tag_key, xat->name, xat->name_len) ?:
ret = parse_totl_key(&totl_key, xat->name, xat->name_len) ?:
parse_totl_u64(value, val_len, &total);
if (ret < 0)
break;
}
if (tgs.indx) {
ret = parse_indx_key(&tag_key, xat->name, xat->name_len, ino);
if (ret < 0)
goto out;
}
if ((tgs.totl || tgs.indx) && locked_zone != tag_key.sk_zone) {
if (tag_lock) {
scoutfs_unlock(sb, tag_lock, SCOUTFS_LOCK_WRITE_ONLY);
tag_lock = NULL;
}
if (tgs.totl)
ret = scoutfs_lock_xattr_totl(sb, SCOUTFS_LOCK_WRITE_ONLY, 0,
&tag_lock);
else
ret = scoutfs_lock_xattr_indx(sb, SCOUTFS_LOCK_WRITE_ONLY, 0,
&tag_lock);
if (tgs.totl && totl_lock == NULL) {
ret = scoutfs_lock_xattr_totl(sb, SCOUTFS_LOCK_WRITE_ONLY, 0, &totl_lock);
if (ret < 0)
break;
locked_zone = tag_key.sk_zone;
}
ret = scoutfs_hold_trans(sb, false);
@@ -1301,13 +1140,7 @@ int scoutfs_xattr_drop(struct super_block *sb, u64 ino,
if (tgs.totl) {
tval.total = cpu_to_le64(-total);
tval.count = cpu_to_le64(-1LL);
ret = apply_totl_delta(sb, &tag_key, &tval, tag_lock);
if (ret < 0)
break;
}
if (tgs.indx) {
ret = scoutfs_item_delete_force(sb, &tag_key, tag_lock, NULL);
ret = apply_totl_delta(sb, &totl_key, &tval, totl_lock);
if (ret < 0)
break;
}
@@ -1320,7 +1153,7 @@ int scoutfs_xattr_drop(struct super_block *sb, u64 ino,
if (release)
scoutfs_release_trans(sb);
scoutfs_unlock(sb, tag_lock, SCOUTFS_LOCK_WRITE_ONLY);
scoutfs_unlock(sb, totl_lock, SCOUTFS_LOCK_WRITE_ONLY);
kfree(xat);
out:
return ret;
-6
View File
@@ -3,7 +3,6 @@
struct scoutfs_xattr_prefix_tags {
unsigned long hide:1,
indx:1,
srch:1,
totl:1;
};
@@ -31,9 +30,4 @@ int scoutfs_xattr_parse_tags(const char *name, unsigned int name_len,
void scoutfs_xattr_init_totl_key(struct scoutfs_key *key, u64 *name);
int scoutfs_xattr_combine_totl(void *dst, int dst_len, void *src, int src_len);
void scoutfs_xattr_indx_get_range(struct scoutfs_key *start, struct scoutfs_key *end);
void scoutfs_xattr_init_indx_key(struct scoutfs_key *key, u8 major, u64 minor, u64 ino, u64 xid);
void scoutfs_xattr_get_indx_key(struct scoutfs_key *key, u8 *major, u64 *minor, u64 *ino, u64 *xid);
void scoutfs_xattr_set_indx_key_xid(struct scoutfs_key *key, u64 xid);
#endif
-1
View File
@@ -9,4 +9,3 @@ src/find_xattrs
src/stage_tmpfile
src/create_xattr_loop
src/o_tmpfile_umask
src/o_tmpfile_linkat
+1 -2
View File
@@ -12,8 +12,7 @@ BIN := src/createmany \
src/find_xattrs \
src/create_xattr_loop \
src/fragmented_data_extents \
src/o_tmpfile_umask \
src/o_tmpfile_linkat
src/o_tmpfile_umask
DEPS := $(wildcard src/*.d)
+4 -6
View File
@@ -25,9 +25,8 @@ All options can be seen by running with -h.
This script is built to test multi-node systems on one host by using
different mounts of the same devices. The script creates a fake block
device in front of each fs block device for each mount that will be
tested. It will create predictable device mapper devices and mounts
them on /mnt/test.N. These static device names and mount paths limit
the script to a single execution per host.
tested. Currently it will create free loop devices and will mount on
/mnt/test.[0-9].
All tests will be run by default. Particular tests can be included or
excluded by providing test name regular expressions with the -I and -E
@@ -105,15 +104,14 @@ used during the test.
| Variable | Description | Origin | Example |
| ---------------- | ------------------- | --------------- | ----------------- |
| T\_MB[0-9] | per-mount meta bdev | created per run | /dev/mapper/\_scoutfs\_test\_meta\_[0-9] |
| T\_DB[0-9] | per-mount data bdev | created per run | /dev/mapper/\_scoutfs\_test\_data\_[0-9] |
| T\_MB[0-9] | per-mount meta bdev | created per run | /dev/loop0 |
| T\_DB[0-9] | per-mount data bdev | created per run | /dev/loop1 |
| T\_D[0-9] | per-mount test dir | made for test | /mnt/test.[0-9]/t |
| T\_META\_DEVICE | main FS meta bdev | -M | /dev/vda |
| T\_DATA\_DEVICE | main FS data bdev | -D | /dev/vdb |
| T\_EX\_META\_DEV | scratch meta bdev | -f | /dev/vdd |
| T\_EX\_DATA\_DEV | scratch meta bdev | -e | /dev/vdc |
| T\_M[0-9] | mount paths | mounted per run | /mnt/test.[0-9]/ |
| T\_MODULE | built kernel module | created per run | ../kmod/src/..ko |
| T\_NR\_MOUNTS | number of mounts | -n | 3 |
| T\_O[0-9] | mount options | created per run | -o server\_addr= |
| T\_QUORUM | quorum count | -q | 2 |
+1 -1
View File
@@ -35,7 +35,7 @@ t_fail()
t_quiet()
{
echo "# $*" >> "$T_TMPDIR/quiet.log"
"$@" >> "$T_TMPDIR/quiet.log" 2>&1 || \
"$@" > "$T_TMPDIR/quiet.log" 2>&1 || \
t_fail "quiet command failed"
}
+1 -65
View File
@@ -6,61 +6,6 @@ t_filter_fs()
-e 's@Device: [a-fA-F0-9]*h/[0-9]*d@Device: 0h/0d@g'
}
#
# We can hit a spurious kasan warning that was fixed upstream:
#
# e504e74cc3a2 x86/unwind/orc: Disable KASAN checking in the ORC unwinder, part 2
#
# KASAN can get mad when the unwinder doesn't find ORC metadata and
# wanders up without using frames and hits the KASAN stack red zones.
# We can ignore these messages.
#
# They're bracketed by:
# [ 2687.690127] ==================================================================
# [ 2687.691366] BUG: KASAN: stack-out-of-bounds in get_reg+0x1bc/0x230
# ...
# [ 2687.706220] ==================================================================
# [ 2687.707284] Disabling lock debugging due to kernel taint
#
# That final lock debugging message may not be included.
#
ignore_harmless_unwind_kasan_stack_oob()
{
awk '
BEGIN {
in_soob = 0
soob_nr = 0
}
( !in_soob && $0 ~ /==================================================================/ ) {
in_soob = 1
soob_nr = NR
saved = $0
}
( in_soob == 1 && NR == (soob_nr + 1) ) {
if (match($0, /KASAN: stack-out-of-bounds in get_reg/) != 0) {
in_soob = 2
} else {
in_soob = 0
print saved
}
saved=""
}
( in_soob == 2 && $0 ~ /==================================================================/ ) {
in_soob = 3
soob_nr = NR
}
( in_soob == 3 && NR > soob_nr && $0 !~ /Disabling lock debugging/ ) {
in_soob = 0
}
( !in_soob ) { print $0 }
END {
if (saved) {
print saved
}
}
'
}
#
# Filter out expected messages. Putting messages here implies that
# tests aren't relying on messages to discover failures.. they're
@@ -141,16 +86,7 @@ t_filter_dmesg()
re="$re|scoutfs .* critical transaction commit failure.*"
# change-devices causes loop device resizing
re="$re|loop: module loaded"
re="$re|loop[0-9].* detected capacity change from.*"
# ignore systemd-journal rotating
re="$re|systemd-journald.*"
# format vers back/compat tries bad mounts
re="$re|scoutfs .* error.*outside of supported version.*"
re="$re|scoutfs .* error.*could not get .*super.*"
egrep -v "($re)" | \
ignore_harmless_unwind_kasan_stack_oob
egrep -v "($re)"
}
+8 -47
View File
@@ -29,12 +29,13 @@ t_mount_rid()
}
#
# Output the "f.$fsid.r.$rid" identifier string for the given path
# in a mounted scoutfs volume.
# Output the "f.$fsid.r.$rid" identifier string for the given mount
# number, 0 is used by default if none is specified.
#
t_ident_from_mnt()
t_ident()
{
local mnt="$1"
local nr="${1:-0}"
local mnt="$(eval echo \$T_M$nr)"
local fsid
local rid
@@ -44,38 +45,6 @@ t_ident_from_mnt()
echo "f.${fsid:0:6}.r.${rid:0:6}"
}
#
# Output the "f.$fsid.r.$rid" identifier string for the given mount
# number, 0 is used by default if none is specified.
#
t_ident()
{
local nr="${1:-0}"
local mnt="$(eval echo \$T_M$nr)"
t_ident_from_mnt "$mnt"
}
#
# Output the sysfs path for a path in a mounted fs.
#
t_sysfs_path_from_ident()
{
local ident="$1"
echo "/sys/fs/scoutfs/$ident"
}
#
# Output the sysfs path for a path in a mounted fs.
#
t_sysfs_path_from_mnt()
{
local mnt="$1"
t_sysfs_path_from_ident $(t_ident_from_mnt $mnt)
}
#
# Output the mount's sysfs path, defaulting to mount 0 if none is
# specified.
@@ -84,7 +53,7 @@ t_sysfs_path()
{
local nr="$1"
t_sysfs_path_from_ident $(t_ident $nr)
echo "/sys/fs/scoutfs/$(t_ident $nr)"
}
#
@@ -296,15 +265,6 @@ t_trigger_get() {
cat "$(t_trigger_path "$nr")/$which"
}
t_trigger_set() {
local which="$1"
local nr="$2"
local val="$3"
local path=$(t_trigger_path "$nr")
echo "$val" > "$path/$which"
}
t_trigger_show() {
local which="$1"
local string="$2"
@@ -316,8 +276,9 @@ t_trigger_show() {
t_trigger_arm_silent() {
local which="$1"
local nr="$2"
local path=$(t_trigger_path "$nr")
t_trigger_set "$which" "$nr" 1
echo 1 > "$path/$which"
}
t_trigger_arm() {
-1
View File
@@ -1,4 +1,3 @@
== measure initial createmany
== measure initial createmany
== measure two concurrent createmany runs
== cleanup
-4
View File
@@ -1,4 +0,0 @@
== ensuring utils and module for old versions
== unmounting test fs and removing test module
== testing combinations of old and new format versions
== restoring test module and mount
-1
View File
@@ -1,4 +1,3 @@
== setting longer hung task timeout
== creating fragmented extents
== unlink file with moved extents to free extents per block
== cleanup
-24
View File
@@ -1,24 +0,0 @@
== default new files don't have project
0
== set new project on files and dirs
8675309
8675309
== non-root can see id
8675309
== can use IDs around long width limits
2147483647
2147483648
4294967295
9223372036854775807
9223372036854775808
18446744073709551615
== created files and dirs inherit project id
8675309
8675309
== inheritance continues
8675309
== clearing project id stops inheritance
0
0
== o_tmpfile creations inherit dir
8675309
-40
View File
@@ -1,40 +0,0 @@
== prepare dir with write perm for test ids
== test assumes starting with no rules, empty list
== add rule
7 13,L,- 15,L,- 17,L,- I 33 -
== list is empty again after delete
== can change limits without deleting
1 1,L,- 1,L,- 1,L,- I 100 -
1 1,L,- 1,L,- 1,L,- I 101 -
1 1,L,- 1,L,- 1,L,- I 99 -
== wipe and restore rules in bulk
7 15,L,- 0,L,- 0,L,- I 33 -
7 14,L,- 0,L,- 0,L,- I 33 -
7 13,L,- 0,L,- 0,L,- I 33 -
7 12,L,- 0,L,- 0,L,- I 33 -
7 11,L,- 0,L,- 0,L,- I 33 -
7 10,L,- 0,L,- 0,L,- I 33 -
7 15,L,- 0,L,- 0,L,- I 33 -
7 14,L,- 0,L,- 0,L,- I 33 -
7 13,L,- 0,L,- 0,L,- I 33 -
7 12,L,- 0,L,- 0,L,- I 33 -
7 11,L,- 0,L,- 0,L,- I 33 -
7 10,L,- 0,L,- 0,L,- I 33 -
== default rule prevents file creation
touch: cannot touch '/mnt/test/test/quota/dir/file': Disk quota exceeded
== decreasing totl allows file creation again
== attr selecting rules prevent creation
touch: cannot touch '/mnt/test/test/quota/dir/file': Disk quota exceeded
touch: cannot touch '/mnt/test/test/quota/dir/file': Disk quota exceeded
== multi attr selecting doesn't prevent partial
touch: cannot touch '/mnt/test/test/quota/dir/file': Disk quota exceeded
== op differentiates
== higher priority rule applies
touch: cannot touch '/mnt/test/test/quota/dir/file': Disk quota exceeded
== data rules with total and count prevent write and fallocate
dd: error writing '/mnt/test/test/quota/dir/file': Disk quota exceeded
fallocate: fallocate failed: Disk quota exceeded
dd: error writing '/mnt/test/test/quota/dir/file': Disk quota exceeded
fallocate: fallocate failed: Disk quota exceeded
== added rules work after bulk restore
touch: cannot touch '/mnt/test/test/quota/dir/file': Disk quota exceeded
-28
View File
@@ -1,28 +0,0 @@
== setting retention on dir fails
attr_x ioctl failed on '/mnt/test/test/retention-basic': Invalid argument (22)
scoutfs: set-attr-x failed: Invalid argument (22)
== set retention
== get-attr-x shows retention
1
== unpriv can't clear retention
attr_x ioctl failed on '/mnt/test/test/retention-basic/file-1': Operation not permitted (1)
scoutfs: set-attr-x failed: Operation not permitted (1)
== can set hidden scoutfs xattr in retention
== setting user. xattr fails in retention
setfattr: /mnt/test/test/retention-basic/file-1: Operation not permitted
== file deletion fails in retention
rm: cannot remove '/mnt/test/test/retention-basic/file-1': Operation not permitted
== file rename fails in retention
mv: cannot move '/mnt/test/test/retention-basic/file-1' to '/mnt/test/test/retention-basic/file-2': Operation not permitted
== file write fails in retention
date: write error: Operation not permitted
== file truncate fails in retention
truncate: failed to truncate '/mnt/test/test/retention-basic/file-1' at 0 bytes: Operation not permitted
== setattr fails in retention
touch: setting times of '/mnt/test/test/retention-basic/file-1': Operation not permitted
== clear retention
== file write
== file rename
== setattr
== xattr deletion
== cleanup
-37
View File
@@ -1,37 +0,0 @@
== initialize per-mount values
== arm compaction triggers
trigger srch_compact_logs_pad_safe armed: 1
trigger srch_merge_stop_safe armed: 1
trigger srch_compact_logs_pad_safe armed: 1
trigger srch_merge_stop_safe armed: 1
trigger srch_compact_logs_pad_safe armed: 1
trigger srch_merge_stop_safe armed: 1
trigger srch_compact_logs_pad_safe armed: 1
trigger srch_merge_stop_safe armed: 1
trigger srch_compact_logs_pad_safe armed: 1
trigger srch_merge_stop_safe armed: 1
== compact more often
== create padded sorted inputs by forcing log rotation
trigger srch_force_log_rotate armed: 1
trigger srch_force_log_rotate armed: 1
trigger srch_force_log_rotate armed: 1
trigger srch_force_log_rotate armed: 1
trigger srch_compact_logs_pad_safe armed: 1
trigger srch_force_log_rotate armed: 1
trigger srch_force_log_rotate armed: 1
trigger srch_force_log_rotate armed: 1
trigger srch_force_log_rotate armed: 1
trigger srch_compact_logs_pad_safe armed: 1
trigger srch_force_log_rotate armed: 1
trigger srch_force_log_rotate armed: 1
trigger srch_force_log_rotate armed: 1
trigger srch_force_log_rotate armed: 1
trigger srch_compact_logs_pad_safe armed: 1
trigger srch_force_log_rotate armed: 1
trigger srch_force_log_rotate armed: 1
trigger srch_force_log_rotate armed: 1
trigger srch_force_log_rotate armed: 1
trigger srch_compact_logs_pad_safe armed: 1
== compaction of padded should stop at safe
== verify no compaction errors
== cleanup
+16 -26
View File
@@ -73,7 +73,6 @@ $(basename $0) options:
-t | Enabled trace events that match the given glob argument.
| Multiple options enable multiple globbed events.
-T <nr> | Multiply the original trace buffer size by nr during the run.
-V <nr> | Set mkfs device format version.
-X | xfstests git repo. Used by tests/xfstests.sh.
-x | xfstests git branch to checkout and track.
-y | xfstests ./check additional args
@@ -177,11 +176,6 @@ while true; do
T_TRACE_MULT="$2"
shift
;;
-V)
test -n "$2" || die "-V must have a format version argument"
T_MKFS_FORMAT_VERSION="-V $2"
shift
;;
-X)
test -n "$2" || die "-X requires xfstests git repo dir argument"
T_XFSTESTS_REPO="$2"
@@ -332,10 +326,16 @@ unmount_all() {
cmd wait $p
done
# delete all temp devices
for dev in /dev/mapper/_scoutfs_test_*; do
if [ -b "$dev" ]; then
cmd dmsetup remove $dev
# delete all temp meta devices
for dev in $(losetup --associated "$T_META_DEVICE" | cut -d : -f 1); do
if [ -e "$dev" ]; then
cmd losetup -d "$dev"
fi
done
# delete all temp data devices
for dev in $(losetup --associated "$T_DATA_DEVICE" | cut -d : -f 1); do
if [ -e "$dev" ]; then
cmd losetup -d "$dev"
fi
done
}
@@ -350,7 +350,7 @@ if [ -n "$T_MKFS" ]; then
done
msg "making new filesystem with $T_QUORUM quorum members"
cmd scoutfs mkfs -f $quo $T_DATA_ALLOC_ZONE_BLOCKS $T_MKFS_FORMAT_VERSION \
cmd scoutfs mkfs -f $quo $T_DATA_ALLOC_ZONE_BLOCKS \
"$T_META_DEVICE" "$T_DATA_DEVICE"
fi
@@ -358,8 +358,7 @@ if [ -n "$T_INSMOD" ]; then
msg "removing and reinserting scoutfs module"
test -e /sys/module/scoutfs && cmd rmmod scoutfs
cmd modprobe libcrc32c
T_MODULE="$T_KMOD/src/scoutfs.ko"
cmd insmod "$T_MODULE"
cmd insmod "$T_KMOD/src/scoutfs.ko"
fi
if [ -n "$T_TRACE_MULT" ]; then
@@ -435,12 +434,6 @@ $T_UTILS/fenced/scoutfs-fenced > "$T_FENCED_LOG" 2>&1 &
fenced_pid=$!
fenced_log "started fenced pid $fenced_pid in the background"
# setup dm tables
echo "0 $(blockdev --getsz $T_META_DEVICE) linear $T_META_DEVICE 0" > \
$T_RESULTS/dmtable.meta
echo "0 $(blockdev --getsz $T_DATA_DEVICE) linear $T_DATA_DEVICE 0" > \
$T_RESULTS/dmtable.data
#
# mount concurrently so that a quorum is present to elect the leader and
# start a server.
@@ -449,13 +442,10 @@ msg "mounting $T_NR_MOUNTS mounts on meta $T_META_DEVICE data $T_DATA_DEVICE"
pids=""
for i in $(seq 0 $((T_NR_MOUNTS - 1))); do
name="_scoutfs_test_meta_$i"
cmd dmsetup create "$name" --table "$(cat $T_RESULTS/dmtable.meta)"
meta_dev="/dev/mapper/$name"
name="_scoutfs_test_data_$i"
cmd dmsetup create "$name" --table "$(cat $T_RESULTS/dmtable.data)"
data_dev="/dev/mapper/$name"
meta_dev=$(losetup --find --show $T_META_DEVICE)
test -b "$meta_dev" || die "failed to create temp device $meta_dev"
data_dev=$(losetup --find --show $T_DATA_DEVICE)
test -b "$data_dev" || die "failed to create temp device $data_dev"
dir="/mnt/test.$i"
test -d "$dir" || cmd mkdir -p "$dir"
-5
View File
@@ -12,16 +12,11 @@ data-prealloc.sh
setattr_more.sh
offline-extent-waiting.sh
move-blocks.sh
projects.sh
large-fragmented-free.sh
format-version-forward-back.sh
enospc.sh
srch-safe-merge-pos.sh
srch-basic-functionality.sh
simple-xattr-unit.sh
retention-basic.sh
totl-xattr-tag.sh
quota.sh
lock-refleak.sh
lock-shrink-consistency.sh
lock-pr-cw-conflict.sh
+40 -55
View File
@@ -1,7 +1,6 @@
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdarg.h>
#include <errno.h>
#include <string.h>
#include <sys/stat.h>
@@ -36,10 +35,10 @@ struct opts {
unsigned int dry_run:1,
ls_output:1,
quiet:1,
xattr_set:1,
xattr_file:1,
xattr_group:1;
char *xattr_name;
user_xattr:1,
same_srch_xattr:1,
group_srch_xattr:1,
unique_srch_xattr:1;
};
struct stats {
@@ -150,31 +149,12 @@ static void free_dir(struct dir *dir)
free(dir);
}
static size_t snprintf_off(void *buf, size_t sz, size_t off, char *fmt, ...)
{
va_list ap;
int ret;
if (off >= sz)
return sz;
va_start(ap, fmt);
ret = vsnprintf(buf + off, sz - off, fmt, ap);
va_end(ap);
if (ret <= 0)
return sz;
return off + ret;
}
static void create_dir(struct dir *dir, struct opts *opts,
struct stats *stats)
{
struct str_list *s;
char name[256]; /* max len and null term */
char name[100];
char val = 'v';
size_t off;
int rc;
int i;
@@ -195,21 +175,29 @@ static void create_dir(struct dir *dir, struct opts *opts,
rc = mknod(s->str, S_IFREG | 0644, 0);
error_exit(rc, "mknod %s failed"ERRF, s->str, ERRA);
if (opts->xattr_set) {
off = snprintf_off(name, sizeof(name), 0, "%s", opts->xattr_name);
if (opts->xattr_file)
off = snprintf_off(name, sizeof(name), off,
"-f-%lu", stats->files);
if (opts->xattr_group)
off = snprintf_off(name, sizeof(name), off,
"-g-%lu", stats->files / 10000);
error_exit(off >= sizeof(name), "xattr name longer than 255 bytes");
rc = 0;
if (rc == 0 && opts->user_xattr) {
strcpy(name, "user.scoutfs_bcp");
rc = setxattr(s->str, name, &val, 1, 0);
}
if (rc == 0 && opts->same_srch_xattr) {
strcpy(name, "scoutfs.srch.scoutfs_bcp");
rc = setxattr(s->str, name, &val, 1, 0);
}
if (rc == 0 && opts->group_srch_xattr) {
snprintf(name, sizeof(name),
"scoutfs.srch.scoutfs_bcp.group.%lu",
stats->files / 10000);
rc = setxattr(s->str, name, &val, 1, 0);
}
if (rc == 0 && opts->unique_srch_xattr) {
snprintf(name, sizeof(name),
"scoutfs.srch.scoutfs_bcp.unique.%lu",
stats->files);
rc = setxattr(s->str, name, &val, 1, 0);
error_exit(rc, "setxattr %s %s failed"ERRF, s->str, name, ERRA);
}
error_exit(rc, "setxattr %s %s failed"ERRF, s->str, name, ERRA);
stats->files++;
rate_banner(opts, stats);
@@ -377,10 +365,11 @@ static void usage(void)
" -d DIR | create all files in DIR top level directory\n"
" -n | dry run, only parse, don't create any files\n"
" -q | quiet, don't regularly print rates\n"
" -F | append \"-f-NR\" file nr to xattr name, requires -X\n"
" -G | append \"-g-NR\" file nr/10000 to xattr name, requires -X\n"
" -L | parse ls output; only reg, skip meta, paths at ./\n"
" -X NAM | set named xattr in all files\n");
" -X | set the same user. xattr name in all files\n"
" -S | set the same .srch. xattr name in all files\n"
" -G | set a .srch. xattr name shared by groups of files\n"
" -U | set a unique .srch. xattr name in all files\n");
}
int main(int argc, char **argv)
@@ -397,7 +386,7 @@ int main(int argc, char **argv)
memset(&opts, 0, sizeof(opts));
while ((c = getopt(argc, argv, "d:nqFGLX:")) != -1) {
while ((c = getopt(argc, argv, "d:nqLXSGU")) != -1) {
switch(c) {
case 'd':
top_dir = strdup(optarg);
@@ -408,19 +397,20 @@ int main(int argc, char **argv)
case 'q':
opts.quiet = 1;
break;
case 'F':
opts.xattr_file = 1;
break;
case 'G':
opts.xattr_group = 1;
break;
case 'L':
opts.ls_output = 1;
break;
case 'X':
opts.xattr_set = 1;
opts.xattr_name = strdup(optarg);
error_exit(!opts.xattr_name, "error allocating xattr name");
opts.user_xattr = 1;
break;
case 'S':
opts.same_srch_xattr = 1;
break;
case 'G':
opts.group_srch_xattr = 1;
break;
case 'U':
opts.unique_srch_xattr = 1;
break;
case '?':
printf("Unknown option '%c'\n", optopt);
@@ -429,11 +419,6 @@ int main(int argc, char **argv)
}
}
error_exit(opts.xattr_file && !opts.xattr_set,
"must specify xattr -X when appending file nr with -F");
error_exit(opts.xattr_group && !opts.xattr_set,
"must specify xattr -X when appending file nr with -G");
if (!opts.dry_run) {
error_exit(!top_dir,
"must specify top level directory with -d");
-71
View File
@@ -1,71 +0,0 @@
/*
* Copyright (C) 2023 Versity Software, Inc. All rights reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public
* License v2 as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*/
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/stat.h>
#include <assert.h>
#include <limits.h>
static void linkat_tmpfile(char *dir, char *lpath)
{
char proc_self[PATH_MAX];
int ret;
int fd;
fd = open(dir, O_RDWR | O_TMPFILE, 0777);
if (fd < 0) {
perror("open(O_TMPFILE)");
exit(1);
}
snprintf(proc_self, sizeof(proc_self), "/proc/self/fd/%d", fd);
ret = linkat(AT_FDCWD, proc_self, AT_FDCWD, lpath, AT_SYMLINK_FOLLOW);
if (ret < 0) {
perror("linkat");
exit(1);
}
close(fd);
}
/*
* Use O_TMPFILE and linkat to create a new visible file, used to test
* the O_TMPFILE creation path by inspecting the created file.
*/
int main(int argc, char **argv)
{
char *lpath;
char *dir;
if (argc < 3) {
printf("%s <open_dir> <linkat_path>\n", argv[0]);
return 1;
}
dir = argv[1];
lpath = argv[2];
linkat_tmpfile(dir, lpath);
return 0;
}
+1 -6
View File
@@ -11,13 +11,8 @@ FILE="$T_D0/file"
# final block as we truncated past it.
#
echo "== truncate writes zeroed partial end of file block"
yes | dd of="$FILE" bs=8K count=1 status=none iflag=fullblock
yes | dd of="$FILE" bs=8K count=1 status=none
sync
# not passing iflag=fullblock causes the file occasionally to just be
# 4K, so just to be safe we should at least check size once
test `stat --printf="%s\n" "$FILE"` -eq 8192 || t_fail "test file incorrect start size"
truncate -s 6K "$FILE"
truncate -s 12K "$FILE"
echo 3 > /proc/sys/vm/drop_caches
+6 -13
View File
@@ -7,11 +7,9 @@ t_require_mounts 2
COUNT=50000
#
# Prep dirs for test. We have per-directory inode number allocators so
# by putting each createmany in a per-mount dir they get their own inode
# number region and cluster locks.
#
# Prep dirs for test. Each mount needs to make their own parent dir for
# the createmany run, otherwise both dirs will end up in the same inode
# group, causing updates to bounce that lock around.
echo "== measure initial createmany"
mkdir -p $T_D0/dir/0
mkdir $T_D1/dir/1
@@ -19,20 +17,18 @@ mkdir $T_D1/dir/1
echo "== measure initial createmany"
START=$SECONDS
createmany -o "$T_D0/file_" $COUNT >> $T_TMP.full
sync
SINGLE=$((SECONDS - START))
echo single $SINGLE >> $T_TMP.full
echo "== measure two concurrent createmany runs"
START=$SECONDS
(cd $T_D0/dir/0; createmany -o ./file_ $COUNT > /dev/null) &
createmany -o $T_D0/dir/0/file $COUNT > /dev/null &
pids="$!"
(cd $T_D1/dir/1; createmany -o ./file_ $COUNT > /dev/null) &
createmany -o $T_D1/dir/1/file $COUNT > /dev/null &
pids="$pids $!"
for p in $pids; do
wait $p
done
sync
BOTH=$((SECONDS - START))
echo both $BOTH >> $T_TMP.full
@@ -45,10 +41,7 @@ echo both $BOTH >> $T_TMP.full
# synchronized operation.
FACTOR=200
if [ "$BOTH" -gt $(($SINGLE*$FACTOR)) ]; then
t_fail "both createmany took $BOTH sec, more than $FACTOR x single $SINGLE sec"
echo "both createmany took $BOTH sec, more than $FACTOR x single $SINGLE sec"
fi
echo "== cleanup"
find $T_D0/dir -delete
t_pass
+14 -1
View File
@@ -7,11 +7,14 @@ t_require_mounts 2
#
# Make sure that all mounts can read the results of a write from each
# mount.
# mount. And make sure that the greatest of all the written seqs is
# visible after the writes were commited by remote reads.
#
check_read_write()
{
local expected
local greatest=0
local seq
local path
local saw
local w
@@ -22,6 +25,11 @@ check_read_write()
eval path="\$T_D${w}/written"
echo "$expected" > "$path"
seq=$(scoutfs stat -s meta_seq $path)
if [ "$seq" -gt "$greatest" ]; then
greatest=$seq
fi
for r in $(t_fs_nrs); do
eval path="\$T_D${r}/written"
saw=$(cat "$path")
@@ -30,6 +38,11 @@ check_read_write()
fi
done
done
seq=$(scoutfs statfs -s committed_seq -p $T_D0)
if [ "$seq" -lt "$greatest" ]; then
echo "committed_seq $seq less than greatest $greatest"
fi
}
# verify that fenced ran our testing fence script
-179
View File
@@ -1,179 +0,0 @@
#
# Test our basic ability to work with different format versions.
#
# The current code being tested has a range of supported format
# versions. For each of the older supported format versions we have a
# git hash of the commit before the next greater version was introduced.
# We build versions of the scoutfs utility and kernel module for the
# last commit in tree that had a lesser supported version as its max
# supported version. We use those binaries to test forward and back
# compat as new and old code works with a persistent volume with a given
# format version.
#
mount_has_format_version()
{
local mnt="$1"
local vers="$2"
local sysfs_fmt_vers="$(t_sysfs_path_from_mnt $SCR)/format_version"
test "$(cat $sysfs_fmt_vers)" == "$vers"
}
SCR="/mnt/scoutfs.scratch"
MIN=$(modinfo $T_MODULE | awk '($1 == "scoutfs_format_version_min:"){print $2}')
MAX=$(modinfo $T_MODULE | awk '($1 == "scoutfs_format_version_max:"){print $2}')
echo "min: $MIN max: $MAX" > "$T_TMP.log"
test "$MIN" -gt 0 -a "$MAX" -gt 0 -a "$MIN" -le "$MAX" || \
t_fail "parsed bad versions, min: $MIN max: $MAX"
test "$MIN" == "$MAX" && \
t_skip "only one supported format version: $MIN"
# prepare dir and wipe any weird old partial state
builds="$T_RESULTS/format_version_builds"
mkdir -p "$builds"
echo "== ensuring utils and module for old versions"
declare -A commits
commits[1]=c3c4b080
for vers in $(seq $MIN $((MAX - 1))); do
dir="$builds/$vers"
platform=$(uname -rp)
buildmark="$dir/buildmark"
commit="${commits[$vers]}"
test -n "$commit" || \
t_fail "no commit for vers $vers"
# have our files for this version
test "$(cat $buildmark 2>&1)" == "$platform" && \
continue
# build as one big sequence of commands that can return failure
(
set -o pipefail
rm -rf $dir &&
mkdir -p $dir/building &&
cd "$T_TESTS/.." &&
git archive --format=tar "$commit" | tar -C "$dir/building" -xf - &&
cd - &&
find $dir &&
make -C "$dir/building" &&
mv $dir/building/utils/src/scoutfs $dir &&
mv $dir/building/kmod/src/scoutfs.ko $dir &&
rm -rf $dir/building &&
echo "$platform" > $buildmark &&
find $dir &&
cat $buildmark
) >> "$T_TMP.log" 2>&1 || t_fail "version $vers build failed"
done
echo "== unmounting test fs and removing test module"
t_quiet t_umount_all
t_quiet rmmod scoutfs
echo "== testing combinations of old and new format versions"
mkdir -p "$SCR"
for vers in $(seq $MIN $((MAX - 1))); do
old_scoutfs="$builds/$vers/scoutfs"
old_module="$builds/$vers/scoutfs.ko"
echo "mkfs $vers" >> "$T_TMP.log"
t_quiet $old_scoutfs mkfs -f -Q 0,127.0.0.1,53000 "$T_EX_META_DEV" "$T_EX_DATA_DEV" \
|| t_fail "mkfs $vers failed"
echo "mount $vers with $vers" >> "$T_TMP.log"
t_quiet insmod $old_module
t_quiet mount -t scoutfs -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 \
"$T_EX_DATA_DEV" "$SCR"
t_quiet mount_has_format_version "$SCR" "$vers"
echo "creating files in $vers" >> "$T_TMP.log"
t_quiet touch "$SCR/file-"{1,2,3}
stat "$SCR"/file-* > "$T_TMP.stat" || \
t_fail "stat in $vers failed"
echo "remounting $vers fs with $MAX" >> "$T_TMP.log"
t_quiet umount "$SCR"
rmmod scoutfs
insmod "$T_MODULE"
t_quiet mount -t scoutfs -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 \
"$T_EX_DATA_DEV" "$SCR"
t_quiet mount_has_format_version "$SCR" "$vers"
echo "verifying stat in $vers with $MAX" >> "$T_TMP.log"
diff -u "$T_TMP.stat" <(stat "$SCR"/file-*)
echo "keep/update/del existing, create new in $vers" >> "$T_TMP.log"
t_quiet touch "$SCR/file-2"
t_quiet rm -f "$SCR/file-3"
t_quiet touch "$SCR/file-4"
stat "$SCR"/file-* > "$T_TMP.stat" || \
t_fail "stat in $vers failed"
echo "remounting $vers fs with $vers" >> "$T_TMP.log"
t_quiet umount "$SCR"
rmmod scoutfs
insmod "$old_module"
t_quiet mount -t scoutfs -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 \
"$T_EX_DATA_DEV" "$SCR"
t_quiet mount_has_format_version "$SCR" "$vers"
echo "verifying stat in $vers with $vers" >> "$T_TMP.log"
diff -u "$T_TMP.stat" <(stat "$SCR"/file-*)
echo "changing format vers to $MAX" >> "$T_TMP.log"
t_quiet umount "$SCR"
rmmod scoutfs
t_quiet scoutfs change-format-version -F -V $MAX $T_EX_META_DEV "$T_EX_DATA_DEV"
echo "mount fs $MAX with old $vers should fail" >> "$T_TMP.log"
insmod "$old_module"
mount -t scoutfs -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 \
"$T_EX_DATA_DEV" "$SCR" >> "$T_TMP.log" 2>&1
if [ "$?" == "0" ]; then
umount "$SCR"
t_fail "old code ver $vers able to mount new ver $MAX"
fi
echo "remounting $MAX fs with $MAX" >> "$T_TMP.log"
rmmod scoutfs
insmod "$T_MODULE"
t_quiet mount -t scoutfs -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 \
"$T_EX_DATA_DEV" "$SCR"
t_quiet mount_has_format_version "$SCR" "$MAX"
echo "verifying stat in $MAX with $MAX" >> "$T_TMP.log"
diff -u "$T_TMP.stat" <(stat "$SCR"/file-*)
echo "keep/update/del existing, create new in $MAX" >> "$T_TMP.log"
t_quiet touch "$SCR/file-2"
t_quiet rm -f "$SCR/file-4"
t_quiet touch "$SCR/file-5"
stat "$SCR"/file-* > "$T_TMP.stat" || \
t_fail "stat in $MAX failed"
echo "remounting $MAX fs with $MAX again" >> "$T_TMP.log"
t_quiet umount "$SCR"
t_quiet mount -t scoutfs -o metadev_path=$T_EX_META_DEV,quorum_slot_nr=0 \
"$T_EX_DATA_DEV" "$SCR"
t_quiet mount_has_format_version "$SCR" "$MAX"
echo "verifying stat in $MAX with $MAX again" >> "$T_TMP.log"
diff -u "$T_TMP.stat" <(stat "$SCR"/file-*)
echo "done with old vers $vers" >> "$T_TMP.log"
t_quiet umount "$SCR"
rmmod scoutfs
done
echo "== restoring test module and mount"
insmod "$T_MODULE"
t_mount_all
t_pass
-24
View File
@@ -10,30 +10,6 @@ EXTENTS_PER_BTREE_BLOCK=600
EXTENTS_PER_LIST_BLOCK=8192
FREED_EXTENTS=$((EXTENTS_PER_BTREE_BLOCK * EXTENTS_PER_LIST_BLOCK))
#
# This test specifically creates a pathologically sparse file that will
# be as expensive as possible to free. This is usually fine on
# dedicated or reasonable hardware, but trying to run this in
# virtualized debug kernels can take a very long time. This test is
# about making sure that the server doesn't fail, not that the platform
# can handle the scale of work that our btree formats happen to require
# while execution is bogged down with use-after-free memory reference
# tracking. So we give the test a lot more breathing room before
# deciding that its hung.
#
echo "== setting longer hung task timeout"
if [ -w /proc/sys/kernel/hung_task_timeout_secs ]; then
secs=$(cat /proc/sys/kernel/hung_task_timeout_secs)
test "$secs" -gt 0 || \
t_fail "confusing value '$secs' from /proc/sys/kernel/hung_task_timeout_secs"
restore_hung_task_timeout()
{
echo "$secs" > /proc/sys/kernel/hung_task_timeout_secs
}
trap restore_hung_task_timeout EXIT
echo "$((secs * 5))" > /proc/sys/kernel/hung_task_timeout_secs
fi
echo "== creating fragmented extents"
fragmented_data_extents $FREED_EXTENTS $EXTENTS_PER_BTREE_BLOCK "$T_D0/alloc" "$T_D0/move"
-52
View File
@@ -1,52 +0,0 @@
# notable id to recognize in output
ID=8675309
echo "== default new files don't have project"
touch "$T_D0/file"
scoutfs get-attr-x -p "$T_D0/file"
echo "== set new project on files and dirs"
mkdir "$T_D0/dir"
scoutfs set-attr-x -p $ID "$T_D0/file"
scoutfs set-attr-x -p $ID "$T_D0/dir"
scoutfs get-attr-x -p "$T_D0/file"
scoutfs get-attr-x -p "$T_D0/dir"
echo "== non-root can see id"
chmod 644 "$T_D0/file"
setpriv --ruid=12345 --euid=12345 scoutfs get-attr-x -p "$T_D0/file"
echo "== can use IDs around long width limits"
touch "$T_D0/ids"
for id in 0x7FFFFFFF 0x80000000 0xFFFFFFFF \
0x7FFFFFFFFFFFFFFF 0x8000000000000000 0xFFFFFFFFFFFFFFFF; do
scoutfs set-attr-x -p $id "$T_D0/ids"
scoutfs get-attr-x -p "$T_D0/ids"
done
echo "== created files and dirs inherit project id"
touch "$T_D0/dir/file"
mkdir "$T_D0/dir/sub"
scoutfs get-attr-x -p "$T_D0/dir/file"
scoutfs get-attr-x -p "$T_D0/dir/sub"
echo "== inheritance continues"
mkdir "$T_D0/dir/sub/more"
scoutfs get-attr-x -p "$T_D0/dir/sub/more"
# .. just inherits 0 :)
echo "== clearing project id stops inheritance"
scoutfs set-attr-x -p 0 "$T_D0/dir"
touch "$T_D0/dir/another-file"
mkdir "$T_D0/dir/another-sub"
scoutfs get-attr-x -p "$T_D0/dir/another-file"
scoutfs get-attr-x -p "$T_D0/dir/another-sub"
echo "== o_tmpfile creations inherit dir"
scoutfs set-attr-x -p $ID "$T_D0/dir"
o_tmpfile_linkat "$T_D0/dir" "$T_D0/dir/tmpfile"
scoutfs get-attr-x -p "$T_D0/dir/tmpfile"
t_pass
-150
View File
@@ -1,150 +0,0 @@
TEST_UID=22222
TEST_GID=44444
# sys_setreuid() set fs[uid] to e[ug]id
SET_UID="--ruid=$TEST_UID --euid=$TEST_UID"
SET_GID="--rgid=$TEST_GID --egid=$TEST_GID --clear-groups"
FILE="$T_D0/dir/file"
sync_and_drop()
{
sync
echo 1 > $(t_debugfs_path)/drop_weak_item_cache
echo 1 > $(t_debugfs_path)/drop_quota_check_cache
}
reset_all()
{
rm -f "$FILE"
scoutfs quota-wipe -p "$T_M0"
getfattr --absolute-names -d -m - "$T_D0" | \
grep "^scoutfs.totl." | \
cut -d '=' -f 1 | \
xargs -n 1 -I'{}' setfattr -x '{}' "$T_D0"
}
echo "== prepare dir with write perm for test ids"
mkdir "$T_D0/dir"
chown --quiet $TEST_UID "$T_D0/dir"
chgrp --quiet $TEST_GID "$T_D0/dir"
echo "== test assumes starting with no rules, empty list"
scoutfs quota-list -p "$T_M0"
echo "== add rule"
scoutfs quota-add -p "$T_M0" -r "7 13,L,- 15,L,- 17,L,- I 33 -"
scoutfs quota-list -p "$T_M0"
echo "== list is empty again after delete"
scoutfs quota-del -p "$T_M0" -r "7 13,L,- 15,L,- 17,L,- I 33 -"
scoutfs quota-list -p "$T_M0"
echo "== can change limits without deleting"
scoutfs quota-add -p "$T_M0" -r "1 1,L,- 1,L,- 1,L,- I 100 -"
scoutfs quota-list -p "$T_M0"
scoutfs quota-add -p "$T_M0" -r "1 1,L,- 1,L,- 1,L,- I 101 -"
scoutfs quota-del -p "$T_M0" -r "1 1,L,- 1,L,- 1,L,- I 100 -"
scoutfs quota-list -p "$T_M0"
scoutfs quota-add -p "$T_M0" -r "1 1,L,- 1,L,- 1,L,- I 99 -"
scoutfs quota-del -p "$T_M0" -r "1 1,L,- 1,L,- 1,L,- I 101 -"
scoutfs quota-list -p "$T_M0"
scoutfs quota-del -p "$T_M0" -r "1 1,L,- 1,L,- 1,L,- I 99 -"
reset_all
echo "== wipe and restore rules in bulk"
for a in $(seq 10 15); do
scoutfs quota-add -p "$T_M0" -r "7 $a,L,- 0,L,- 0,L,- I 33 -"
done
scoutfs quota-list -p "$T_M0"
scoutfs quota-list -p "$T_M0" > "$T_TMP.list"
scoutfs quota-wipe -p "$T_M0"
scoutfs quota-list -p "$T_M0"
scoutfs quota-restore -p "$T_M0" < "$T_TMP.list"
scoutfs quota-list -p "$T_M0"
reset_all
echo "== default rule prevents file creation"
scoutfs quota-add -p "$T_M0" -r "1 1,L,- 1,L,- 1,L,- I 1 -"
setfattr -n scoutfs.totl.test.1.1.1 -v 2 "$T_D0"
sync_and_drop
setpriv $SET_UID touch "$FILE" 2>&1 | t_filter_fs
echo "== decreasing totl allows file creation again"
setfattr -x scoutfs.totl.test.1.1.1 "$T_D0"
sync_and_drop
setpriv $SET_UID touch "$FILE"
reset_all
echo "== attr selecting rules prevent creation"
scoutfs quota-add -p "$T_M0" -r "1 $TEST_UID,U,S 1,L,- 1,L,- I 1 -"
scoutfs quota-add -p "$T_M0" -r "1 $TEST_GID,G,S 1,L,- 1,L,- I 1 -"
setfattr -n scoutfs.totl.test.$TEST_UID.1.1 -v 2 "$T_D0"
setfattr -n scoutfs.totl.test.$TEST_GID.1.1 -v 2 "$T_D0"
sync_and_drop
setpriv $SET_UID touch "$FILE" 2>&1 | t_filter_fs
setpriv $SET_GID touch "$FILE" 2>&1 | t_filter_fs
reset_all
echo "== multi attr selecting doesn't prevent partial"
scoutfs quota-add -p "$T_M0" -r "1 $TEST_UID,U,S $TEST_GID,G,S 1,L,- I 1 -"
setfattr -n scoutfs.totl.test.$TEST_UID.$TEST_GID.1 -v 2 "$T_D0"
sync_and_drop
setpriv $SET_UID touch "$FILE"
rm -f "$FILE"
setpriv $SET_GID touch "$FILE"
rm -f "$FILE"
setpriv $SET_UID $SET_GID touch "$FILE" 2>&1 | t_filter_fs
reset_all
echo "== op differentiates"
# inode ops succeed in presence of data rule
scoutfs quota-add -p "$T_M0" -r "1 $TEST_UID,U,S 1,L,- 1,L,- D 1 -"
setfattr -n scoutfs.totl.test.$TEST_UID.1.1 -v 2 "$T_D0"
sync_and_drop
setpriv $SET_UID touch "$FILE" 2>&1 | t_filter_fs
reset_all
# data ops succeed in presence of inode rule
touch "$FILE"
chown --quiet $TEST_UID "$FILE"
scoutfs quota-add -p "$T_M0" -r "1 $TEST_UID,U,S 1,L,- 1,L,- I 1 -"
setfattr -n scoutfs.totl.test.$TEST_UID.1.1 -v 2 "$T_D0"
sync_and_drop
setpriv $SET_UID fallocate -l 4096 "$FILE" 2>&1 | t_filter_fs
reset_all
echo "== higher priority rule applies"
scoutfs quota-add -p "$T_M0" -r "1 $TEST_UID,U,S 1,L,- 1,L,- I 1000 -"
scoutfs quota-add -p "$T_M0" -r "2 $TEST_UID,U,S 1,L,- 1,L,- I 1 -"
setfattr -n scoutfs.totl.test.$TEST_UID.1.1 -v 2 "$T_D0"
sync_and_drop
setpriv $SET_UID touch "$FILE" 2>&1 | t_filter_fs
reset_all
echo "== data rules with total and count prevent write and fallocate"
touch "$FILE"
scoutfs quota-add -p "$T_M0" -r "1 1,L,- 1,L,- 1,L,- D 1 -"
setfattr -n scoutfs.totl.test.1.1.1 -v 2 "$T_D0"
sync_and_drop
dd if=/dev/zero of="$FILE" bs=4096 count=1 conv=notrunc status=none 2>&1 | t_filter_fs
fallocate -l 4096 "$FILE" 2>&1 | t_filter_fs
scoutfs quota-del -p "$T_M0" -r "1 1,L,- 1,L,- 1,L,- D 1 -"
scoutfs quota-add -p "$T_M0" -r "1 1,L,- 1,L,- 1,L,- D 0 C"
sync_and_drop
dd if=/dev/zero of="$FILE" bs=4096 count=1 conv=notrunc status=none 2>&1 | t_filter_fs
fallocate -l 4096 "$FILE" 2>&1 | t_filter_fs
reset_all
echo "== added rules work after bulk restore"
seq -f " 1 %.0f,U,S 1,L,- 1,L,- I 1 -" 9000050000 -1 9000000000 > "$T_TMP.lots"
scoutfs quota-restore -p "$T_M0" < "$T_TMP.lots"
scoutfs quota-list -p "$T_M0" > "$T_TMP.list"
diff -u "$T_TMP.lots" "$T_TMP.list"
scoutfs quota-add -p "$T_M0" -r "1 $TEST_UID,U,S 1,L,- 1,L,- I 1 -"
setfattr -n scoutfs.totl.test.$TEST_UID.1.1 -v 2 "$T_D0"
sync_and_drop
setpriv $SET_UID touch "$FILE" 2>&1 | t_filter_fs
reset_all
t_pass
-2
View File
@@ -2,8 +2,6 @@
# Some basic tests of online resizing metadata and data devices.
#
t_require_commands bc
statfs_total() {
local single="total_$1_blocks"
local mnt="$2"
-57
View File
@@ -1,57 +0,0 @@
t_require_commands scoutfs touch rm setfattr
touch "$T_D0/file-1"
echo "== setting retention on dir fails"
scoutfs set-attr-x -t 1 "$T_D0" 2>&1 | t_filter_fs
echo "== set retention"
scoutfs set-attr-x -t 1 "$T_D0/file-1"
echo "== get-attr-x shows retention"
scoutfs get-attr-x -t "$T_D0/file-1"
echo "== unpriv can't clear retention"
setpriv --ruid=12345 --euid=12345 scoutfs set-attr-x -t 0 "$T_D0/file-1" 2>&1 | t_filter_fs
echo "== can set hidden scoutfs xattr in retention"
setfattr -n scoutfs.hide.srch.retention_test -v val "$T_D0/file-1"
echo "== setting user. xattr fails in retention"
setfattr -n user.retention_test -v val "$T_D0/file-1" 2>&1 | t_filter_fs
echo "== file deletion fails in retention"
rm -f "$T_D0/file-1" 2>&1 | t_filter_fs
echo "== file rename fails in retention"
mv $T_D0/file-1 $T_D0/file-2 2>&1 | t_filter_fs
echo "== file write fails in retention"
date >> $T_D0/file-1
echo "== file truncate fails in retention"
truncate -s 0 $T_D0/file-1 2>&1 | t_filter_fs
echo "== setattr fails in retention"
touch $T_D0/file-1 2>&1 | t_filter_fs
echo "== clear retention"
scoutfs set-attr-x -t 0 "$T_D0/file-1"
echo "== file write"
date >> $T_D0/file-1
echo "== file rename"
mv $T_D0/file-1 $T_D0/file-2
mv $T_D0/file-2 $T_D0/file-1
echo "== setattr"
touch $T_D0/file-1 2>&1 | t_filter_fs
echo "== xattr deletion"
setfattr -x scoutfs.hide.srch.retention_test "$T_D0/file-1"
echo "== cleanup"
rm -f "$T_D0/file-1"
t_pass
+24 -25
View File
@@ -9,7 +9,6 @@ LOG=340000
LIM=1000000
SEQF="%.20g"
SXA="scoutfs.srch.test-srch-basic-functionality"
t_require_commands touch rm setfattr scoutfs find_xattrs
@@ -28,20 +27,20 @@ diff_srch_find()
echo "== create new xattrs"
touch "$T_D0/"{create,update}
setfattr -n $SXA -v 1 "$T_D0/"{create,update} 2>&1 | t_filter_fs
diff_srch_find $SXA
setfattr -n scoutfs.srch.test -v 1 "$T_D0/"{create,update} 2>&1 | t_filter_fs
diff_srch_find scoutfs.srch.test
echo "== update existing xattr"
setfattr -n $SXA -v 2 "$T_D0/update" 2>&1 | t_filter_fs
diff_srch_find $SXA
setfattr -n scoutfs.srch.test -v 2 "$T_D0/update" 2>&1 | t_filter_fs
diff_srch_find scoutfs.srch.test
echo "== remove an xattr"
setfattr -x $SXA "$T_D0/create" 2>&1 | t_filter_fs
diff_srch_find $SXA
setfattr -x scoutfs.srch.test "$T_D0/create" 2>&1 | t_filter_fs
diff_srch_find scoutfs.srch.test
echo "== remove xattr with files"
rm -f "$T_D0/"{create,update}
diff_srch_find $SXA
diff_srch_find scoutfs.srch.test
echo "== trigger small log merges by rotating single block with unmount"
sv=$(t_server_nr)
@@ -57,7 +56,7 @@ while [ "$i" -lt "8" ]; do
eval path="\$T_D${nr}/single-block-$i"
touch "$path"
setfattr -n $SXA -v $i "$path"
setfattr -n scoutfs.srch.single-block-logs -v $i "$path"
t_umount $nr
t_mount $nr
@@ -66,51 +65,51 @@ while [ "$i" -lt "8" ]; do
done
# wait for srch compaction worker delay
sleep 10
find "$T_D0" -type f -name 'single-block-*' -delete
rm -rf "$T_D0/single-block-*"
echo "== create entries in current log"
DIR="$T_D0/dir"
NR=$((LOG / 4))
mkdir -p "$DIR"
seq -f "f-$SEQF" 1 $NR | src/bulk_create_paths -X $SXA -d "$DIR" > /dev/null
diff_srch_find $SXA
seq -f "f-$SEQF" 1 $NR | src/bulk_create_paths -S -d "$DIR" > /dev/null
diff_srch_find scoutfs.srch.scoutfs_bcp
echo "== delete small fraction"
seq -f "$DIR/f-$SEQF" 1 7 $NR | xargs setfattr -x $SXA
diff_srch_find $SXA
seq -f "$DIR/f-$SEQF" 1 7 $NR | xargs setfattr -x scoutfs.srch.scoutfs_bcp
diff_srch_find scoutfs.srch.scoutfs_bcp
echo "== remove files"
rm -rf "$DIR"
diff_srch_find $SXA
diff_srch_find scoutfs.srch.scoutfs_bcp
echo "== create entries that exceed one log"
NR=$((LOG * 3 / 2))
mkdir -p "$DIR"
seq -f "f-$SEQF" 1 $NR | src/bulk_create_paths -X $SXA -d "$DIR" > /dev/null
diff_srch_find $SXA
seq -f "f-$SEQF" 1 $NR | src/bulk_create_paths -S -d "$DIR" > /dev/null
diff_srch_find scoutfs.srch.scoutfs_bcp
echo "== delete fractions in phases"
for i in $(seq 1 3); do
seq -f "$DIR/f-$SEQF" $i 3 $NR | xargs setfattr -x $SXA
diff_srch_find $SXA
seq -f "$DIR/f-$SEQF" $i 3 $NR | xargs setfattr -x scoutfs.srch.scoutfs_bcp
diff_srch_find scoutfs.srch.scoutfs_bcp
done
echo "== remove files"
rm -rf "$DIR"
diff_srch_find $SXA
diff_srch_find scoutfs.srch.scoutfs_bcp
echo "== create entries for exceed search entry limit"
NR=$((LIM * 3 / 2))
mkdir -p "$DIR"
seq -f "f-$SEQF" 1 $NR | src/bulk_create_paths -X $SXA -d "$DIR" > /dev/null
diff_srch_find $SXA
seq -f "f-$SEQF" 1 $NR | src/bulk_create_paths -S -d "$DIR" > /dev/null
diff_srch_find scoutfs.srch.scoutfs_bcp
echo "== delete half"
seq -f "$DIR/f-$SEQF" 1 2 $NR | xargs setfattr -x $SXA
diff_srch_find $SXA
seq -f "$DIR/f-$SEQF" 1 2 $NR | xargs setfattr -x scoutfs.srch.scoutfs_bcp
diff_srch_find scoutfs.srch.scoutfs_bcp
echo "== entirely remove third batch"
rm -rf "$DIR"
diff_srch_find $SXA
diff_srch_find scoutfs.srch.scoutfs_bcp
t_pass
-90
View File
@@ -1,90 +0,0 @@
#
# There was a bug where srch file compaction could get stuck if a
# partial compaction finished at the specific _SAFE_BYTES offset in a
# block. Resuming from that position would return an error and
# compaction would stop making forward progress.
#
# We use triggers to pad the output of log compaction to end on the safe
# offset and then cause compaction of those padded inputs to stop at the
# safe offset. Continuation will either succeed or return errors.
#
# forcing rotation, so just a few
NR=10
SEQF="%.20g"
COMPACT_NR=4
echo "== initialize per-mount values"
declare -a err
declare -a compact_delay
for nr in $(t_fs_nrs); do
err[$nr]=$(t_counter srch_compact_error $nr)
compact_delay[$nr]=$(cat $(t_sysfs_path $nr)/srch/compact_delay_ms)
done
restore_compact_delay()
{
for nr in $(t_fs_nrs); do
echo ${compact_delay[$nr]} > $(t_sysfs_path $nr)/srch/compact_delay_ms
done
}
trap restore_compact_delay EXIT
echo "== arm compaction triggers"
for nr in $(t_fs_nrs); do
t_trigger_arm srch_compact_logs_pad_safe $nr
t_trigger_arm srch_merge_stop_safe $nr
done
echo "== compact more often"
for nr in $(t_fs_nrs); do
echo 1000 > $(t_sysfs_path $nr)/srch/compact_delay_ms
done
echo "== create padded sorted inputs by forcing log rotation"
sv=$(t_server_nr)
for i in $(seq 1 $COMPACT_NR); do
for j in $(seq 1 $COMPACT_NR); do
t_trigger_arm srch_force_log_rotate $sv
seq -f "f-$i-$j-$SEQF" 1 10 | \
bulk_create_paths -X "scoutfs.srch.t-srch-safe-merge-pos" -d "$T_D0" > \
/dev/null
sync
test "$(t_trigger_get srch_force_log_rotate $sv)" == "0" || \
t_fail "srch_force_log_rotate didn't trigger"
done
padded=0
while test $padded == 0 && sleep .5; do
for nr in $(t_fs_nrs); do
if [ "$(t_trigger_get srch_compact_logs_pad_safe $nr)" == "0" ]; then
t_trigger_arm srch_compact_logs_pad_safe $nr
padded=1
break
fi
test "$(t_counter srch_compact_error $nr)" == "${err[$nr]}" || \
t_fail "srch_compact_error counter increased on mount $nr"
done
done
done
echo "== compaction of padded should stop at safe"
sleep 2
for nr in $(t_fs_nrs); do
if [ "$(t_trigger_get srch_merge_stop_safe $nr)" == "0" ]; then
break
fi
done
echo "== verify no compaction errors"
sleep 2
for nr in $(t_fs_nrs); do
test "$(t_counter srch_compact_error $nr)" == "${err[$nr]}" || \
t_fail "srch_compact_error counter increased on mount $nr"
done
echo "== cleanup"
find "$T_D0" -type f -name 'f-*' -delete
t_pass
+1 -1
View File
@@ -3,7 +3,6 @@ t_require_commands touch rm setfattr scoutfs find_xattrs
read_xattr_totals()
{
sync
echo 1 > $(t_debugfs_path)/drop_weak_item_cache
scoutfs read-xattr-totals -p "$T_M0"
}
@@ -113,6 +112,7 @@ for phase in create update remove; do
echo "$k.0.0 = ${totals[$k]}, ${counts[$k]}"
done ) | grep -v "= 0, 0$" | sort -n >> $T_TMP.check_arr
sync
read_xattr_totals | sort -n >> $T_TMP.check_read
diff -u $T_TMP.check_arr $T_TMP.check_read || \
-71
View File
@@ -55,19 +55,6 @@ with initial sparse regions (perhaps by multiple threads writing to
different regions) and wasted space isn't an issue (perhaps because the
file population contains few small files).
.TP
.B log_merge_wait_timeout_ms=<number>
This option sets the amount of time, in milliseconds, that log merge
creation can wait before timing out. This setting is per-mount, only
changes the behavior of that mount, and only affects the server when it
is running in that mount.
.sp
This determines how long it may take for mounts to synchronize
committing their log trees to create a log merge operation. Setting it
too high can create long latencies in the event that a mount takes a
long time to commit their log. Setting it too low can result in the
creation of excessive numbers of log trees that are never merged. The
default is 500 and it can not be less than 100 nor greater than 60000.
.TP
.B metadev_path=<device>
The metadev_path option specifies the path to the block device that
contains the filesystem's metadata.
@@ -270,21 +257,6 @@ metadata that is bound to a specific volume and should not be
transferred with the file by tools that read extended attributes, like
.BR tar(1) .
.TP
.B .indx.
Attributes with the .indx. tag dd the inode containing the attribute to
a filesystem-wide index. The name of the extended attribute must end
with strings representing two values separated by dots. The first value
is an unsigned 8bit value and the second is an unsigned 64bit value.
These attributes can only be modified with root privileges and the
attributes can not have a value.
.sp
The inodes in the index are stored in increasing sort order of the
values, with the first u8 value being most significant. Inodes can be
at many positions as tracked by many extended attributes, and their
position follows the creation, renaming, or deletion of the attributes.
The index can be read with the read-xattr-index command which uses the
underlying READ_XATTR_INDEX ioctl.
.TP
.B .srch.
Attributes with the .srch. tag are indexed so that they can be
found by the
@@ -310,36 +282,6 @@ with the
ioctl.
.RE
.SH FILE RETENTION MODE
A file can be set to retention mode by setting the
.IB RETENTION
attribute with the
.IB SET_ATTR_X
ioctl. This flag can only be set on regular files and requires root
permission (the
.IB CAP_SYS_ADMIN
capability).
.sp
Once in retention mode all modifications of the file will fail. The
only exceptions are that system extended attributes (all those without
the "user." prefix) may be modified. The retention bit may be cleared
with sufficient priveledges to remove the retention restrictions on
other modifications.
.RE
.SH PROJECT IDs
All inodes have a project ID attribute that can be set via the
SET_ATTR_X ioctl and displayed with the GET_ATTR_X ioctl. Project IDs
are an unsigned 64bit value and the value of 0 is reserved to indicate
that no project ID is assigned. If a project ID is set on a directory
then all inodes created with it as the initial parent inheret that ID,
for all file types. This includes files initially unlinked from the
namespace when created with O_TMPFILE. Project IDs are only
automatically inherited from the parent dir on initial creation.
They're not changed as directory entry linkes to the inode are created
or renamed.
.RE
.SH FORMAT VERSION
The format version defines the layout and use of structures stored on
devices and passed over the network. The version is incremented for
@@ -418,19 +360,6 @@ The version that a mount is using is shown in the
file in the mount's sysfs directory, typically
.I /sys/fs/scoutfs/f.FSID.r.RID/
.RE
.sp
The defined format versions are:
.RS
.TP
.sp
.B 1
Initial format version.
.TP
.B 2
Added retention mode by setting the retention attribute. Added the
project ID inode attribute. Added quota rules and enforcement. Added
the .indx. extended attribute tag.
.RE
.SH CORRUPTION DETECTION
A
-19
View File
@@ -209,16 +209,6 @@ A path within a ScoutFS filesystem.
.RE
.PD
.TP
.BI "get-attr-x FILE"
.sp
Display ScoutFS-specific attributes from a file. If no options are
given than all the attributes that the command supports will be
displayed. If attributes are specified with options then only those
attributes are displayed. If only one attribute is specified then it
will not have a label prefix in the display output. The --help option
will list the attributes that the command supports. The file system may
support a different set of attributes.
.TP
.BI "get-referring-entries [-p|--path PATH] INO"
.sp
@@ -516,15 +506,6 @@ A path within a ScoutFS filesystem.
.RE
.PD
.TP
.BI "set-attr-x FILE"
.sp
Set ScoutFS-specific attributes on a file. Only the attributes that are
spcified by options will be set. The --help option will list the
attributes that the command understands. The file system may support a
different set of attributes.
.PD
.TP
.BI "setattr FILE [-d, --data-version=VERSION [-s, --size=SIZE [-o, --offline]]] [-t, --ctime=TIMESPEC]"
.sp
-304
View File
@@ -1,304 +0,0 @@
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <assert.h>
#include <argp.h>
#include <stdbool.h>
#include "sparse.h"
#include "util.h"
#include "format.h"
#include "ioctl.h"
#include "parse.h"
#include "cmd.h"
struct attr_x_args {
bool set;
char *filename;
struct scoutfs_ioctl_inode_attr_x iax;
};
#define pr(iax, name, label, fmt, args...) \
do { \
if ((iax->x_mask & SCOUTFS_IOC_IAX_##name)) { \
if (__builtin_popcount(iax->x_mask) > 1) \
printf(label ": " fmt "\n", ##args); \
else \
printf(fmt "\n", ##args); \
} \
} while (0)
#define prb(iax, name, label) \
pr(iax, name, label, "%u", !!((iax)->bits & SCOUTFS_IOC_IAX_B_##name))
static int do_attr_x(struct attr_x_args *args)
{
struct scoutfs_ioctl_inode_attr_x *iax = &args->iax;
int fd = -1;
int ret;
int op;
if (args->set) {
/* nothing to do if not setting */
if (iax->x_mask == 0)
return 0;
op = SCOUTFS_IOC_SET_ATTR_X;
} else {
/* get all known if none specified */
if (iax->x_mask == 0)
iax->x_mask = ~SCOUTFS_IOC_IAX__UNKNOWN;
op = SCOUTFS_IOC_GET_ATTR_X;
}
fd = open(args->filename, O_RDONLY);
if (fd < 0) {
ret = -errno;
fprintf(stderr, "failed to open '%s': %s (%d)\n",
args->filename, strerror(errno), errno);
goto out;
}
ret = ioctl(fd, op, iax);
if (ret < 0) {
ret = -errno;
fprintf(stderr, "attr_x ioctl failed on '%s': "
"%s (%d)\n", args->filename, strerror(errno), errno);
goto out;
}
if (!args->set) {
pr(iax, META_SEQ, "meta_seq", "%llu", iax->meta_seq);
pr(iax, DATA_SEQ, "data_seq", "%llu", iax->data_seq);
pr(iax, DATA_VERSION, "data_version", "%llu", iax->data_version);
pr(iax, ONLINE_BLOCKS, "online_blocks", "%llu", iax->online_blocks);
pr(iax, OFFLINE_BLOCKS, "offline_blocks", "%llu", iax->offline_blocks);
pr(iax, CTIME, "ctime", "%llu.%u", iax->ctime_sec, iax->ctime_nsec);
pr(iax, CRTIME, "crtime", "%llu.%u", iax->crtime_sec, iax->crtime_nsec);
pr(iax, SIZE, "size", "%llu", iax->size);
prb(iax, RETENTION, "retention");
pr(iax, PROJECT_ID, "project_id", "%llu", iax->project_id);
}
ret = 0;
out:
if (fd >= 0)
close(fd);
return ret;
}
/*
* This is called for both get and set. The get calls won't have
* arguments and are only setting the mask. The set calls parse the
* value to set. We could have defaults by making set option arguments
* optional, like setting the current time for timestamps, but that
* hasn't been needed.
*
* Option value parsing places no constraints on the attributes or
* values themselves once parsed. This lets us use the set command to
* test the kernel's testing for invalid attribute combinations and
* values.
*/
static int parse_opt(int key, char *arg, struct argp_state *state)
{
struct attr_x_args *args = state->input;
struct timespec ts;
int ret;
u64 x;
switch (key) {
case 'm':
args->iax.x_mask |= SCOUTFS_IOC_IAX_META_SEQ;
if (arg) {
ret = parse_u64(arg, &args->iax.meta_seq);
if (ret)
return ret;
}
break;
case 'd':
args->iax.x_mask |= SCOUTFS_IOC_IAX_DATA_SEQ;
if (arg) {
ret = parse_u64(arg, &args->iax.data_seq);
if (ret)
return ret;
}
break;
case 'v':
args->iax.x_mask |= SCOUTFS_IOC_IAX_DATA_VERSION;
if (arg) {
ret = parse_u64(arg, &args->iax.data_version);
if (ret)
return ret;
if (args->iax.data_version == 0)
argp_error(state, "data version must not be 0");
}
break;
case 'n':
args->iax.x_mask |= SCOUTFS_IOC_IAX_ONLINE_BLOCKS;
if (arg) {
ret = parse_u64(arg, &args->iax.online_blocks);
if (ret)
return ret;
}
break;
case 'f':
args->iax.x_mask |= SCOUTFS_IOC_IAX_OFFLINE_BLOCKS;
if (arg) {
ret = parse_u64(arg, &args->iax.offline_blocks);
if (ret)
return ret;
}
break;
case 'c':
args->iax.x_mask |= SCOUTFS_IOC_IAX_CTIME;
if (arg) {
ret = parse_timespec(arg, &ts);
if (ret)
return ret;
args->iax.ctime_sec = ts.tv_sec;
args->iax.ctime_nsec = ts.tv_nsec;
}
break;
case 'r':
args->iax.x_mask |= SCOUTFS_IOC_IAX_CRTIME;
if (arg) {
ret = parse_timespec(arg, &ts);
if (ret)
return ret;
args->iax.crtime_sec = ts.tv_sec;
args->iax.crtime_nsec = ts.tv_nsec;
}
break;
case 's':
args->iax.x_mask |= SCOUTFS_IOC_IAX_SIZE;
if (arg) {
ret = parse_u64(arg, &args->iax.size);
if (ret)
return ret;
}
break;
case 't':
args->iax.x_mask |= SCOUTFS_IOC_IAX_RETENTION;
if (arg) {
ret = parse_u64(arg, &x);
if (ret)
return ret;
if (x)
args->iax.bits |= SCOUTFS_IOC_IAX_B_RETENTION;
}
break;
case 'p':
args->iax.x_mask |= SCOUTFS_IOC_IAX_PROJECT_ID;
if (arg) {
ret = parse_u64(arg, &args->iax.project_id);
if (ret)
return ret;
}
break;
case ARGP_KEY_ARG:
if (!args->filename)
args->filename = strdup_or_error(state, arg);
else
argp_error(state, "more than one argument given");
break;
case ARGP_KEY_FINI:
if (!args->filename)
argp_error(state, "no filename given");
break;
default:
break;
}
return 0;
}
/*
* The get options are derived from these by copying the struct and
* modifying fields.
*/
static struct argp_option set_options[] = {
{ "meta_seq", 'm', "SEQ", 0, "Inode Metadata change index sequence number"},
{ "data_seq", 'd', "SEQ", 0, "File Data change index sequence number"},
{ "data_version", 'v', "VERSION", 0, "File Data contents version"},
{ "online_blocks", 'n', "COUNT", 0, "Online data block count"},
{ "offline_blocks", 'f', "COUNT", 0, "Offline data block count"},
{ "ctime", 'c', "SECS.NSECS", 0, "Inode change time (posix ctime)"},
{ "crtime", 'r', "SECS.NSECS", 0, "ScoutFS creation time"},
{ "size", 's', "SIZE", 0, "Inode i_size field"},
{ "retention", 't', "0|1", 0, "Retention flag"},
{ "project_id", 'p', "PROJECT_ID", 0, "Project ID"},
{ NULL }
};
static struct argp get_argp = {
NULL, /* dynamically built */
parse_opt,
"FILE",
"get extensible file attributes"
};
static int get_attr_x_cmd(int argc, char **argv)
{
struct attr_x_args args = {0,};
int ret;
ret = argp_parse(&get_argp, argc, argv, 0, NULL, &args);
if (ret)
return ret;
return do_attr_x(&args);
}
/*
* The set options match the get arguments but don't take argument
* values to set.
*/
static void build_get_options(void)
{
struct argp_option **opts = (struct argp_option **)&get_argp.options;
int i;
*opts = calloc(array_size(set_options), sizeof(set_options[0]));
assert(*opts);
memcpy(*opts, set_options, array_size(set_options) * sizeof(set_options[0]));
for (i = 0; i < array_size(set_options) - 1; i++)
(*opts)[i].arg = NULL;
}
static void __attribute__((constructor)) get_ctor(void)
{
build_get_options();
cmd_register_argp("get-attr-x", &get_argp, GROUP_AGENT, get_attr_x_cmd);
}
static struct argp set_argp = {
set_options,
parse_opt,
"FILE",
"Set extensible file attributes"
};
static int set_attr_x_cmd(int argc, char **argv)
{
struct attr_x_args args = {.set = true,};
int ret;
ret = argp_parse(&set_argp, argc, argv, 0, NULL, &args);
if (ret)
return ret;
return do_attr_x(&args);
}
static void __attribute__((constructor)) set_ctor(void)
{
cmd_register_argp("set-attr-x", &set_argp, GROUP_AGENT, set_attr_x_cmd);
}
-10
View File
@@ -119,16 +119,6 @@ static int do_change_fmt_vers(struct change_fmt_vers_args *args)
goto out;
}
if (le64_to_cpu(meta_super->fmt_vers) > args->fmt_vers ||
le64_to_cpu(data_super->fmt_vers) > args->fmt_vers) {
ret = -EPERM;
printf("Downgrade of Meta Format Version: %llu and Data Format Version: %llu to Format Version: %llu is not allowed\n",
le64_to_cpu(meta_super->fmt_vers),
le64_to_cpu(data_super->fmt_vers),
args->fmt_vers);
goto out;
}
if (le64_to_cpu(meta_super->fmt_vers) != args->fmt_vers) {
meta_super->fmt_vers = cpu_to_le64(args->fmt_vers);
-9
View File
@@ -141,13 +141,4 @@ static inline void scoutfs_key_dec(struct scoutfs_key *key)
key->sk_zone--;
}
static inline void scoutfs_xattr_get_indx_key(struct scoutfs_key *key, u8 *major, u64 *minor,
u64 *ino, u64 *xid)
{
*major = le64_to_cpu(key->_sk_first) >> 56;
*minor = (le64_to_cpu(key->_sk_first) << 8) | (le64_to_cpu(key->_sk_second) >> 56);
*ino = (le64_to_cpu(key->_sk_second) << 8) | (le64_to_cpu(key->_sk_third) >> 56);
*xid = (le64_to_cpu(key->_sk_third) << 8) | key->_sk_fourth;
}
#endif
+1 -3
View File
@@ -274,9 +274,7 @@ static int do_mkfs(struct mkfs_args *args)
inode.ctime.nsec = inode.atime.nsec;
inode.mtime.sec = inode.atime.sec;
inode.mtime.nsec = inode.atime.nsec;
inode.crtime.sec = inode.atime.sec;
inode.crtime.nsec = inode.atime.nsec;
btree_append_item(bt, &key, &inode, scoutfs_inode_vers_bytes(args->fmt_vers));
btree_append_item(bt, &key, &inode, sizeof(inode));
ret = write_block(meta_fd, SCOUTFS_BLOCK_MAGIC_BTREE, fsid, 1, blkno,
SCOUTFS_BLOCK_LG_SHIFT, &bt->hdr);
+1 -37
View File
@@ -49,7 +49,7 @@ static void print_inode(struct scoutfs_key *key, void *val, int val_len)
{
struct scoutfs_inode *inode = val;
printf(" inode: ino %llu size %llu version %llu proj %llu nlink %u\n"
printf(" inode: ino %llu size %llu version %llu nlink %u\n"
" uid %u gid %u mode 0%o rdev 0x%x flags 0x%x\n"
" next_readdir_pos %llu meta_seq %llu data_seq %llu data_version %llu\n"
" atime %llu.%08u ctime %llu.%08u\n"
@@ -57,7 +57,6 @@ static void print_inode(struct scoutfs_key *key, void *val, int val_len)
le64_to_cpu(key->ski_ino),
le64_to_cpu(inode->size),
le64_to_cpu(inode->version),
le64_to_cpu(inode->proj),
le32_to_cpu(inode->nlink), le32_to_cpu(inode->uid),
le32_to_cpu(inode->gid), le32_to_cpu(inode->mode),
le32_to_cpu(inode->rdev),
@@ -80,24 +79,6 @@ static void print_orphan(struct scoutfs_key *key, void *val, int val_len)
}
#define SQR_FMT "[%u %llu,%u,%x %llu,%u,%x %llu,%u,%x %u %llu %x]"
#define SQR_ARGS(r) \
(r)->prio, \
le64_to_cpu((r)->name_val[0]), (r)->name_source[0], (r)->name_flags[0], \
le64_to_cpu((r)->name_val[1]), (r)->name_source[1], (r)->name_flags[1], \
le64_to_cpu((r)->name_val[2]), (r)->name_source[2], (r)->name_flags[2], \
(r)->op, le64_to_cpu((r)->limit), (r)->rule_flags
static void print_quota(struct scoutfs_key *key, void *val, int val_len)
{
struct scoutfs_quota_rule_val *rv = val;
printf(" quota rule: hash 0x%016llx coll_nr %llu\n"
" "SQR_FMT"\n",
le64_to_cpu(key->skqr_hash), le64_to_cpu(key->skqr_coll_nr), SQR_ARGS(rv));
}
static void print_xattr_totl(struct scoutfs_key *key, void *val, int val_len)
{
struct scoutfs_xattr_totl_val *tval = val;
@@ -108,17 +89,6 @@ static void print_xattr_totl(struct scoutfs_key *key, void *val, int val_len)
le64_to_cpu(tval->count));
}
static void print_xattr_indx(struct scoutfs_key *key, void *val, int val_len)
{
u64 minor;
u64 ino;
u64 xid;
u8 major;
scoutfs_xattr_get_indx_key(key, &major, &minor, &ino, &xid);
printf(" xattr indx: major %u minor %llu ino %llu xid %llu", major, minor, ino, xid);
}
static u8 *global_printable_name(u8 *name, int name_len)
{
static u8 name_buf[SCOUTFS_NAME_LEN + 1];
@@ -207,15 +177,9 @@ static print_func_t find_printer(u8 zone, u8 type)
return print_orphan;
}
if (zone == SCOUTFS_QUOTA_ZONE)
return print_quota;
if (zone == SCOUTFS_XATTR_TOTL_ZONE)
return print_xattr_totl;
if (zone == SCOUTFS_XATTR_INDX_ZONE)
return print_xattr_indx;
if (zone == SCOUTFS_FS_ZONE) {
switch(type) {
case SCOUTFS_INODE_TYPE: return print_inode;
-547
View File
@@ -1,547 +0,0 @@
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <argp.h>
#include "sparse.h"
#include "parse.h"
#include "util.h"
#include "format.h"
#include "ioctl.h"
#include "cmd.h"
#include "util.h"
#include "key.h"
static char opc[] = {
[SQ_OP_DATA] = 'D',
[SQ_OP_INODE] = 'I',
};
static char nsc[] = {
[SQ_NS_LITERAL] = 'L',
[SQ_NS_PROJ] = 'P',
[SQ_NS_UID] = 'U',
[SQ_NS_GID] = 'G',
};
static void printf_rule(struct scoutfs_ioctl_quota_rule *irule)
{
int i;
/* priority: [0-9]+ */
printf("%3u ", irule->prio);
/* totl name: ([0-9]+,[LPUG-]+,[S-]+){3} */
for (i = 0; i < array_size(irule->name_val); i++) {
printf("%llu,%c,%c ",
irule->name_val[i],
nsc[irule->name_source[i]],
(irule->name_flags[i] & SQ_NF_SELECT) ? 'S' : '-');
}
/* op: [ID], limit: [0-9]+, flags [C-] */
printf("%c %llu %c\n",
opc[irule->op], irule->limit, (irule->rule_flags & SQ_RF_TOTL_COUNT) ? 'C' : '-');
}
static int parse_rule(struct scoutfs_ioctl_quota_rule *irule, char *str)
{
char ns[3];
char nf[3];
char rf;
char op;
int ret;
int i;
int j;
memset(irule, 0, sizeof(struct scoutfs_ioctl_quota_rule));
ret = sscanf(str, " %hhu %llu,%c,%c %llu,%c,%c %llu,%c,%c %c %llu %c",
&irule->prio, &irule->name_val[0], &ns[0], &nf[0], &irule->name_val[1],
&ns[1], &nf[1], &irule->name_val[2], &ns[2], &nf[2], &op, &irule->limit,
&rf);
if (ret != 13) {
printf("invalid rule, missing fields: %s\n", str);
ret = -EINVAL;
goto out;
}
for (i = 0; i < array_size(irule->name_val); i++) {
irule->name_source[i] = SQ_NS__NR;
for (j = 0; j < array_size(nsc); j++) {
if (ns[i] == nsc[j]) {
irule->name_source[i] = j;
break;
}
}
if (irule->name_source[i] == SQ_NS__NR) {
printf("invalid name source '%c' in name #%u in rule:\n\t%s\n",
ns[i], i + 1, str);
ret = -EINVAL;
goto out;
}
irule->name_flags[i] = nf[i] == '-' ? 0 :
nf[i] == 'S' ? SQ_NF_SELECT :
SQ_NF__UNKNOWN;
if (irule->name_flags[i] == SQ_NF__UNKNOWN) {
printf("invalid name flags '%c' in name #%u in rule:\n\t%s\n",
nf[i], i + 1, str);
ret = -EINVAL;
goto out;
}
}
irule->op = SQ_NS__NR;
for (i = 0; i < array_size(opc); i++) {
if (op == opc[i]) {
irule->op = i;
break;
}
}
if (irule->op == SQ_NS__NR) {
printf("invalid op '%c' in rule:\n\t%s\n", op, str);
ret = -EINVAL;
goto out;
}
irule->rule_flags = rf == '-' ? 0 : rf == 'C' ? SQ_RF_TOTL_COUNT : SQ_RF__UNKNOWN;
if (irule->rule_flags == SQ_RF__UNKNOWN) {
printf("invalid rule flags '%c' in rule:\n\t%s\n", rf, str);
ret = -EINVAL;
goto out;
}
ret = 0;
out:
return ret;
}
/* ---------------------------------------------- */
struct mod_args {
char *path;
char *rule_str;
bool is_add;
};
static int do_mod(struct mod_args *args)
{
struct scoutfs_ioctl_quota_rule irule;
unsigned int cmd;
int fd = -1;
int ret;
memset(&irule, 0, sizeof(irule));
ret = parse_rule(&irule, args->rule_str);
if (ret < 0)
goto out;
fd = get_path(args->path, O_RDONLY);
if (fd < 0)
return fd;
cmd = args->is_add ? SCOUTFS_IOC_ADD_QUOTA_RULE : SCOUTFS_IOC_DEL_QUOTA_RULE;
ret = ioctl(fd, cmd, &irule);
if (ret < 0) {
ret = -errno;
fprintf(stderr, "MOD_QUOTA_RULE ioctl failed: %s (%d)\n",
strerror(errno), errno);
goto out;
}
ret = 0;
out:
if (fd >= 0)
close(fd);
return ret;
}
static int parse_mod_opt(int key, char *arg, struct argp_state *state)
{
struct mod_args *args = state->input;
switch (key) {
case 'p':
args->path = strdup_or_error(state, arg);
break;
case 'r':
args->rule_str = strdup_or_error(state, arg);
break;
case ARGP_KEY_FINI:
if (!args->path)
argp_error(state, "must provide file path");
if (!args->rule_str)
argp_error(state, "must provide rule string");
break;
default:
break;
}
return 0;
}
static struct argp_option add_options[] = {
{ "path", 'p', "PATH", 0, "Path to ScoutFS filesystem"},
{ "rule", 'r', "RULE_STRING", 0, "Rule string"},
{ NULL }
};
static struct argp add_argp = {
add_options,
parse_mod_opt,
"",
"Add quota rule"
};
static int add_cmd(int argc, char **argv)
{
struct mod_args args = { .is_add = true, };
int ret;
ret = argp_parse(&add_argp, argc, argv, 0, NULL, &args);
if (ret)
return ret;
return do_mod(&args);
}
static void __attribute__((constructor)) add_ctor(void)
{
cmd_register_argp("quota-add", &add_argp, GROUP_CORE, add_cmd);
}
static struct argp_option del_options[] = {
{ "path", 'p', "PATH", 0, "Path to ScoutFS filesystem"},
{ "rule", 'r', "RULE_STRING", 0, "Rule string"},
{ NULL }
};
static struct argp del_argp = {
del_options,
parse_mod_opt,
"",
"Delete quota rule"
};
static int del_cmd(int argc, char **argv)
{
struct mod_args args = { .is_add = false };
int ret;
ret = argp_parse(&del_argp, argc, argv, 0, NULL, &args);
if (ret)
return ret;
return do_mod(&args);
}
static void __attribute__((constructor)) del_ctor(void)
{
cmd_register_argp("quota-del", &del_argp, GROUP_CORE, del_cmd);
}
/* ---------------------------------------------- */
struct bulk_args {
char *path;
bool unsorted;
};
typedef int (*bulk_in_fn)(int fd, struct scoutfs_ioctl_quota_rule *irules, size_t nr,
void *in_args);
typedef int (*bulk_out_fn)(int fd, struct scoutfs_ioctl_quota_rule *irule, void *out_args);
static int cmp_irules(const struct scoutfs_ioctl_quota_rule *a,
const struct scoutfs_ioctl_quota_rule *b)
{
return scoutfs_cmp(a->prio, b->prio) ?:
scoutfs_cmp(a->name_val[0], b->name_val[0]) ?:
scoutfs_cmp(a->name_source[0], b->name_source[0]) ?:
scoutfs_cmp(a->name_flags[0], b->name_flags[0]) ?:
scoutfs_cmp(a->name_val[1], b->name_val[1]) ?:
scoutfs_cmp(a->name_source[1], b->name_source[1]) ?:
scoutfs_cmp(a->name_flags[1], b->name_flags[1]) ?:
scoutfs_cmp(a->name_val[2], b->name_val[2]) ?:
scoutfs_cmp(a->name_source[2], b->name_source[2]) ?:
scoutfs_cmp(a->name_flags[2], b->name_flags[2]) ?:
scoutfs_cmp(a->op, b->op) ?:
scoutfs_cmp(a->limit, b->limit) ?:
scoutfs_cmp(a->rule_flags, b->rule_flags);
}
static int compar_irules(const void *a, const void *b)
{
return -cmp_irules(a, b);
}
static int do_bulk(struct bulk_args *args, bulk_in_fn in_fn, void *in_args,
bulk_out_fn out_fn, void *out_args)
{
struct scoutfs_ioctl_quota_rule *irules = NULL;
size_t alloced = 0;
size_t nr = 0;
size_t batch;
size_t i;
int fd = -1;
int ret;
fd = get_path(args->path, O_RDONLY);
if (fd < 0)
return fd;
for (;;) {
if (nr == alloced) {
alloced += 1024;
irules = realloc(irules, alloced * sizeof(irules[0]));
if (!irules) {
ret = -errno;
fprintf(stderr, "memory allocation failed: %s (%d)\n",
strerror(errno), errno);
goto out;
}
}
ret = in_fn(fd, &irules[nr], alloced - nr, in_args);
if (ret == 0)
break;
if (ret < 0)
goto out;
batch = ret;
if (args->unsorted) {
for (i = 0; i < batch; i++) {
ret = out_fn(fd, &irules[nr + i], out_args);
if (ret < 0)
goto out;
}
} else {
nr += batch;
}
}
if (!args->unsorted) {
qsort(irules, nr, sizeof(irules[0]), compar_irules);
for (i = 0; i < nr; i++) {
ret = out_fn(fd, &irules[i], out_args);
if (ret < 0)
goto out;
}
}
ret = 0;
out:
if (fd >= 0)
close(fd);
if (irules)
free(irules);
return ret;
}
/* ---------------------------------------------- */
/* maintain iterator in gqr between calls */
static int get_ioctl_in_fn(int fd, struct scoutfs_ioctl_quota_rule *irules, size_t nr,
void *in_args)
{
struct scoutfs_ioctl_get_quota_rules *gqr = in_args;
int ret;
gqr->rules_ptr = (intptr_t)irules;
gqr->rules_nr = nr;
ret = ioctl(fd, SCOUTFS_IOC_GET_QUOTA_RULES, gqr);
if (ret < 0) {
ret = -errno;
fprintf(stderr, "GET_QUOTA_RULES ioctl failed: %s (%d)\n",
strerror(errno), errno);
}
return ret;
}
static int parse_stdin_in_fn(int fd, struct scoutfs_ioctl_quota_rule *irules, size_t nr,
void *in_args)
{
char *line = NULL;
size_t size;
int ret;
ret = getline(&line, &size, stdin);
if (ret < 0) {
if (errno == ENOENT)
return 0;
ret = -errno;
fprintf(stderr, "error reading rules: %s (%d)\n",
strerror(errno), errno);
return ret;
}
ret = parse_rule(&irules[0], line);
if (ret == 0)
ret = 1;
free(line);
return ret;
}
struct mod_ioctl_args {
unsigned int cmd;
char *which;
};
static int mod_ioctl_out_fn(int fd, struct scoutfs_ioctl_quota_rule *irule, void *out_args)
{
struct mod_ioctl_args *args = out_args;
int ret;
ret = ioctl(fd, args->cmd, irule);
if (ret < 0) {
ret = -errno;
printf("Failed to %s following rule:\n ", args->which);
printf_rule(irule);
fprintf(stderr, "Error: %s (%d)\n", strerror(-ret), -ret);
}
return ret;
}
static int print_out_fn(int fd, struct scoutfs_ioctl_quota_rule *irule, void *out_args)
{
printf_rule(irule);
return 0;
}
/* ---------------------------------------------- */
static int parse_bulk_opt(int key, char *arg, struct argp_state *state)
{
struct bulk_args *args = state->input;
switch (key) {
case 'p':
args->path = strdup_or_error(state, arg);
break;
case 'U':
args->unsorted = true;
break;
case ARGP_KEY_FINI:
if (!args->path)
argp_error(state, "must provide file path");
break;
default:
break;
}
return 0;
}
static struct argp_option bulk_options[] = {
{ "path", 'p', "PATH", 0, "Path to ScoutFS filesystem"},
{ "unsorted", 'U', NULL, 0, "Process rules in unsorted filesystem storage order"},
{ NULL }
};
static struct argp list_argp = {
bulk_options,
parse_bulk_opt,
"",
"List quota rules"
};
static int list_cmd(int argc, char **argv)
{
struct scoutfs_ioctl_get_quota_rules gqr = {{0,}};
struct bulk_args args = {NULL};
int ret;
ret = argp_parse(&list_argp, argc, argv, 0, NULL, &args);
if (ret)
return ret;
return do_bulk(&args, get_ioctl_in_fn, &gqr, print_out_fn, NULL);
}
static void __attribute__((constructor)) list_ctor(void)
{
cmd_register_argp("quota-list", &list_argp, GROUP_CORE, list_cmd);
}
/* ---------------------------------------------- */
static struct argp wipe_argp = {
bulk_options,
parse_bulk_opt,
"",
"Delete all quota rules"
};
static int wipe_cmd(int argc, char **argv)
{
struct bulk_args args = {NULL};
struct scoutfs_ioctl_get_quota_rules gqr = {{0,}};
struct mod_ioctl_args out_args = {
.cmd = SCOUTFS_IOC_DEL_QUOTA_RULE,
.which = "delete",
};
int ret;
ret = argp_parse(&wipe_argp, argc, argv, 0, NULL, &args);
if (ret)
return ret;
return do_bulk(&args, get_ioctl_in_fn, &gqr, mod_ioctl_out_fn, &out_args);
}
static void __attribute__((constructor)) wipe_ctor(void)
{
cmd_register_argp("quota-wipe", &wipe_argp, GROUP_CORE, wipe_cmd);
}
/* ---------------------------------------------- */
static struct argp restore_argp = {
bulk_options,
parse_bulk_opt,
"",
"Restore quota rules from list output on stdin"
};
static int restore_cmd(int argc, char **argv)
{
struct bulk_args args = {NULL};
struct mod_ioctl_args out_args = {
.cmd = SCOUTFS_IOC_ADD_QUOTA_RULE,
.which = "add",
};
int ret;
ret = argp_parse(&restore_argp, argc, argv, 0, NULL, &args);
if (ret)
return ret;
return do_bulk(&args, parse_stdin_in_fn, NULL, mod_ioctl_out_fn, &out_args);
}
static void __attribute__((constructor)) restore_ctor(void)
{
cmd_register_argp("quota-restore", &restore_argp, GROUP_CORE, restore_cmd);
}
-186
View File
@@ -1,186 +0,0 @@
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <argp.h>
#include "sparse.h"
#include "parse.h"
#include "util.h"
#include "format.h"
#include "ioctl.h"
#include "cmd.h"
#include "cmp.h"
#define ENTF "%u.%llu.%llu"
#define ENTA(e) (e)->major, (e)->minor, (e)->ino
struct xattr_args {
char *path;
char *first_entry;
char *last_entry;
};
static int compare_entries(struct scoutfs_ioctl_xattr_index_entry *a,
struct scoutfs_ioctl_xattr_index_entry *b)
{
return scoutfs_cmp(a->major, b->major) ?: scoutfs_cmp(a->minor, b->minor) ?:
scoutfs_cmp(a->ino, b->ino);
}
static int parse_entry(struct scoutfs_ioctl_xattr_index_entry *ent, char *str)
{
int major;
int ret;
ret = sscanf(str, "%i.%lli.%lli", &major, &ent->minor, &ent->ino);
if (ret != 3) {
fprintf(stderr, "bad index position entry argument '%s', it must be "
"in the form \"a.b.ino\" where each value can be prefixed by "
"'0' for octal or '0x' for hex\n", str);
return -EINVAL;
}
if (major < 0 || major > UCHAR_MAX) {
fprintf(stderr, "initial major index position '%d' must be between 0 and 255, "
"inclusive.\n", major);
return -EINVAL;
}
ent->major = major;
return 0;
}
#define NR_ENTRIES 1024
static int do_read_xattr_index(struct xattr_args *args)
{
struct scoutfs_ioctl_read_xattr_index rxi;
struct scoutfs_ioctl_xattr_index_entry *ents;
struct scoutfs_ioctl_xattr_index_entry *ent;
int fd = -1;
int ret;
int i;
ents = calloc(NR_ENTRIES, sizeof(struct scoutfs_ioctl_xattr_index_entry));
if (!ents) {
fprintf(stderr, "xattr index entry allocation failed\n");
ret = -ENOMEM;
goto out;
}
fd = get_path(args->path, O_RDONLY);
if (fd < 0)
return fd;
memset(&rxi, 0, sizeof(rxi));
memset(&rxi.last, 0xff, sizeof(rxi.last));
rxi.entries_ptr = (unsigned long)ents;
rxi.entries_nr = NR_ENTRIES;
ret = 0;
if (args->first_entry)
ret = parse_entry(&rxi.first, args->first_entry);
if (args->last_entry)
ret = parse_entry(&rxi.last, args->last_entry);
if (ret < 0)
goto out;
if (compare_entries(&rxi.first, &rxi.last) > 0) {
fprintf(stderr, "first index position "ENTF" must be less than last index position "ENTF"\n",
ENTA(&rxi.first), ENTA(&rxi.last));
ret = -EINVAL;
goto out;
}
for (;;) {
ret = ioctl(fd, SCOUTFS_IOC_READ_XATTR_INDEX, &rxi);
if (ret == 0)
break;
if (ret < 0) {
ret = -errno;
fprintf(stderr, "read_xattr_index ioctl failed: "
"%s (%d)\n", strerror(errno), errno);
goto out;
}
for (i = 0; i < ret; i++) {
ent = &ents[i];
printf("%u.%llu = %llu\n",
ent->major, ent->minor, ent->ino);
}
rxi.first = *ent;
if ((++rxi.first.ino == 0 && ++rxi.first.minor == 0 && ++rxi.first.major == 0) ||
compare_entries(&rxi.first, &rxi.last) > 0)
break;
}
ret = 0;
out:
if (fd >= 0)
close(fd);
free(ents);
return ret;
};
static int parse_opt(int key, char *arg, struct argp_state *state)
{
struct xattr_args *args = state->input;
switch (key) {
case 'p':
args->path = strdup_or_error(state, arg);
break;
case ARGP_KEY_ARG:
if (!args->first_entry)
args->first_entry = strdup_or_error(state, arg);
else if (!args->last_entry)
args->last_entry = strdup_or_error(state, arg);
else
argp_error(state, "more than two entry arguments given");
break;
default:
break;
}
return 0;
}
static struct argp_option options[] = {
{ "path", 'p', "PATH", 0, "Path to ScoutFS filesystem"},
{ NULL }
};
static struct argp argp = {
options,
parse_opt,
"FIRST-ENTRY LAST-ENTRY",
"Search and print inode numbers indexed by their .indx. xattrs"
};
static int read_xattr_index_cmd(int argc, char **argv)
{
struct xattr_args xattr_args = {NULL};
int ret;
ret = argp_parse(&argp, argc, argv, 0, NULL, &xattr_args);
if (ret)
return ret;
return do_read_xattr_index(&xattr_args);
}
static void __attribute__((constructor)) read_xattr_index_ctor(void)
{
cmd_register_argp("read-xattr-index", &argp, GROUP_INFO, read_xattr_index_cmd);
}