Compare commits

...

48 Commits

Author SHA1 Message Date
Auke Kok c137637007 We can't cross-mount ipv4/ipv6.
I thought this would just work between ipv4 and ipv6 based quorum
members, but it turns out it will only work one way by default. While
we could make this work (multiple sockets, special sockopts) it is
highly unlikely and very undesirable.

Much stronger feels to just disallow it explicitly and reject mixed
v4/v6 configurations outright (mkfs/change-quorum, and mount) to avoid
this. I can't imagine this doing any good for users' fencing setups.

The test cases added validate the 2 easy userspace checks. The mount
check isn't easily testable because we disallow userspace from creating
such a failure path.

One additional test section tests the migration path from v4->v6->v4
so there's at least some test that checks that this actually works.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-05-14 14:43:00 -07:00
Auke Kok 2c46e37543 Account for ipv6 in kernel_get{sock,peer}name compat.
These 2 kernel wrapper functions need to be able to properly handle
the ipv6 addrlen, instead of returning -EAFNOSUPPORT(-97).

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-05-06 15:17:26 -07:00
Auke Kok 61ece361a7 Add IPv6 support to the kernel module.
This adds IPv6 support to the kernel module side.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-05-06 15:17:26 -07:00
Auke Kok afba683be6 Enable ipv6 in testing.
Instead of using 127.0.0.1, we initialize the quorum slots to ::1,
enabling all ipv6 support.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-05-06 15:17:26 -07:00
Auke Kok a95996e932 Add ipv6 support to scoutfs userspace utility.
This change adds ipv6 support to various scoutfs sub-commands, allowing
users to mkfs, print and change-quorum-config using ipv6 addresses, and
modifies the outputs.

Any ipv6 address/port is displayed as [::1]:5000 to comply with the
related RFC's. Input strings remain consistent as the quorum config
input value is comma-separated already, not posing any issues.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-05-06 15:17:26 -07:00
Auke Kok 6ff9fd39fa Don't stack alloc struct scoutfs_quorum_block_event old
The size of this thing is well over 1kb, and the compiler will
error on several supported distributions that this particular
function reaches over 2k stack frame size, which is excessive,
even for a function that isn't called regularly.

We can allocate the thing in one go if we smartly allocate this
as an array of (an array of structs) which allows us to index
it as a 2d array as before, taking away some of the additional
complexities.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-05-06 15:17:26 -07:00
Zach Brown fece0a9372 Merge pull request #310 from versity/zab/v1.31
v1.31 Release
2026-05-06 10:37:07 -07:00
Zach Brown aa432727f2 v1.31 Release
Finish the release notes for the 1.31 release.

Signed-off-by: Zach Brown <zab@versity.com>
2026-05-05 14:29:18 -07:00
Zach Brown ceebadd139 Merge pull request #308 from versity/auke/totl-delta-repair
totl key repair
2026-05-05 13:05:57 -07:00
Zach Brown 4b4ddc9ded Merge pull request #298 from versity/auke/double_unlock_dw_truncate
Fix double unlock in scoutfs_setattr data_wait error path
2026-05-04 09:52:29 -07:00
Zach Brown 94d3ece590 Merge pull request #299 from versity/auke/cond_resched_block_free
Add cond_resched in block_free_work
2026-05-04 09:49:43 -07:00
Auke Kok 6d5517614b Fix double unlock in scoutfs_setattr data_wait error path
When scoutfs_setattr truncates a file with offline extents, it unlocks
the inode lock before calling scoutfs_data_wait to wait for the data
to be staged. If data_wait returns any error, the code jumps to 'goto
out' which calls scoutfs_unlock again, thus double-unlocking the lock.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-05-04 09:48:54 -07:00
Auke Kok 10279d0b23 Add test exercising the totl delta inject ioctl.
Skews a totl twice, restore it, and intersperse setfattr/unlink to
exercise both injected and naturally-produced deltas.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-05-04 09:43:01 -07:00
Zach Brown 443c34309f Merge pull request #303 from versity/auke/clang_build_werr
3 minor clang things
2026-05-04 09:42:43 -07:00
Auke Kok 5c81a979d5 Add SCOUTFS_IOC_INJECT_TOTL_DELTA ioctl.
Inject a signed (total, count) delta at a totl key.  No validity
checking.  Requires CAP_SYS_ADMIN.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-05-04 09:42:42 -07:00
Zach Brown ec38b6e1c8 Merge pull request #305 from versity/auke/block_submit_bio_err
Set BLOCK_BIT_ERROR on bio submit failure during forced unmount
2026-05-04 09:35:43 -07:00
Zach Brown 8e0066b231 Merge pull request #309 from versity/auke/quota_invalidate_race
fix and test - quota invalidate race
2026-05-04 09:34:26 -07:00
Zach Brown a0fda5b735 Merge pull request #307 from versity/zab/next_merge_range_zero
Search all merge range items for next
2026-05-04 09:29:54 -07:00
Auke Kok fc56a69d8f Add quota invalidate race regression test
Run concurrent quota add/del on one mount against rapid file
creation and deletion on both mounts to exercise the race fixed
in the previous commit.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-05-02 13:19:31 -07:00
Auke Kok c8bc42ccdb Fix quota invalidate race with concurrent ruleset read
A quota check holds the quota cluster lock for READ and marks the
cached ruleset EBUSY while loading rules.  A quota mod on the same
mount holds the lock for WRITE (compatible with the local READ)
and calls scoutfs_quota_invalidate(), tripping
BUG_ON(rs == ERR_PTR(-EBUSY)).

Make invalidate skip EBUSY so the reader's claim is preserved, and
have scoutfs_quota_mod_rule wait for the reader to finish before
calling invalidate.  Without the wait, the in-flight reader would
publish its stale ruleset after invalidate runs, leaving the cache
stale until the next invalidation.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-05-02 13:19:31 -07:00
Zach Brown 4db0a48fe4 Search all merge range items for next
When searching for the next least merge range we need to sweep all the
stored items because they're interleaved with respect to key sorting
because we've clobbered the zone.

To search all of them we need to start from 0, not from the caller's
start key after setting the zone.  If the caller happens to provide a
start key with a small zone but large other fields (totl keys with
sufficiently large identifiers) we can miss ranges.

Signed-off-by: Zach Brown <zab@zabbo.net>
2026-04-29 10:17:38 -07:00
Auke Kok ac1ab8e87f Add cond_resched in block_free_work
I'm seeing consistent CPU soft lockups in block_free_work on
my bare metal system that aren't reached by VM instances. The
reason is that the bare metal machine has a ton more memory
available causing the block free work queue to grow much
larger in size, and then it has so much work that it can take 30+
seconds before it goes through it all.

This is all with a debug kernel. A non debug kernel will likely
zoom through the outstanding work here at a much faster rate.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-22 13:39:32 -07:00
Zach Brown af31b9f1e8 Merge pull request #306 from versity/zab/v1.30
v1.30 Release
2026-04-22 10:43:17 -07:00
Zach Brown ad65116d8f v1.30 Release
Finish the release notes for the 1.30 release.

Signed-off-by: Zach Brown <zab@versity.com>
2026-04-21 16:43:12 -07:00
Auke Kok 8bfd35db0b Set BLOCK_BIT_ERROR on bio submit failure during forced unmount
block_submit_bio will return -ENOLINK if called during a forced
shutdown, the bio is never submitted, and thus no completion callback
will fire to set BLOCK_BIT_ERROR. Any other task waiting for this
specific bp will end up waiting forever.

To fix, fall through to the existing block_end_io call on the
error path instead of returning directly.  That means moving
the forcing_unmount check past the setup calls so block_end_io's
bookkeeping stays balanced. block_end_io then sets BLOCK_BIT_ERROR
and wakes up waiters just as it would on a failed async completion.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-20 17:01:12 -07:00
Zach Brown e20765a9c7 Merge pull request #300 from versity/auke/more_false_positive_failures
Auke/more false positive failures: xfs lockdep miss, newline
2026-04-17 09:17:50 -07:00
Zach Brown 066da5c2a2 Merge pull request #297 from versity/auke/quota_mod_trans_hold
Hold transaction in scoutfs_quota_mod_rule to prevent alloc corruption.
2026-04-17 09:16:41 -07:00
Auke Kok 7eacc7139c Hold transaction in scoutfs_quota_mod_rule to prevent alloc corruption.
scoutfs_quota_mod_rule calls scoutfs_item_create/delete which use
the transaction allocator but it never held it. Without the hold,
a concurrent transaction commit can call scoutfs_alloc_init to
reinitialize the allocator while dirty_alloc_blocks is in the middle
of setting up the freed list block. This overwrites alloc->freed with
the server's fresh (empty) state, causing a blkno mismatch BUG_ON
in list_block_add.

Reproduced by stressing concurrent quota add/del operations across
mounts. Crashdump analysis confirms dirty_list_block COW'd a freed
block (fr_old=9842, new blkno=9852) but by the time list_block_add
ran, freed.ref.blkno was 0 with first_nr=0 and total_nr=0: the freed
list head had been zeroed by a concurrent alloc_init.

Fix by adding scoutfs_hold_trans/scoutfs_release_trans around the
item modification in scoutfs_quota_mod_rule, preventing transaction
commit from racing with the allocator use.

Rename the 'unlock' label to 'release' since 'out' now directly
does the unlock. The unlock safely handles a NULL lock.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-16 16:20:47 -07:00
Auke Kok 019125d86d Don't swallow invalid message error
A malformed message encountered here increases the counter, but doesn't
tear down the connection because of the nested for loops. The comments
indicate that that is the expected behavior - a misbehaving client
should not be tolerated.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-15 17:02:40 -07:00
Auke Kok 347e27acec Fix leak in client side lock invalidation
Clang's scan-build found this leak when we get an invalidation
for a lock we no longer have. Free ireq to fix.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-15 16:35:10 -07:00
Auke Kok 3ce5d47f2c Initialize resp_data to silence clang uninitialized warning
Clang flow analysis flags resp_data in process_response as possibly
uninitialized when find_request returns NULL.

  kmod/src/net.c:533:6: error: variable 'resp_data' is used uninitialized
  whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]

In practice the read is harmless because resp_func stays NULL in that
path and call_resp_func only dereferences resp_data when resp_func is
non-NULL. Initialize at declaration.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-15 14:06:46 -07:00
Auke Kok 9e3b01b3b4 Filter newlines out dmesg.new
Without overly broad filtering empty lines from dmesg, filter
them so dmesg.new doesn't trigger a test failure. I don't want
to overly process dmesg, so do this as late as possible.

The xfs lockdep patterns can forget a leading/trailing empty line,
causing a failure despite the explicit removal of the lockdep
false positive.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-15 10:36:28 -07:00
Auke Kok 876c233f06 Ignore another xfs lockdep class
This already caught xfs_nondir_ilock_class, but recent CI runs
have been hitting xfs_dir_ilock_class, too.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-15 10:36:28 -07:00
Zach Brown 6aa5876c71 Merge pull request #301 from versity/auke/el7_uninit_read_seq
Squelch gcc uninitialized warning on el7
2026-04-15 09:58:23 -07:00
Auke Kok 7a9f9ec698 Squelch gcc uninitialized warning on el7
The gcc version in el7 can't determine that scoutfs_block_check_stale
won't return ret = 0 when the input ret value is < 0, and
errors because we might call alloc_wpage with an uninitialized
read_seq. Initialize it to 0 to avoid it.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-14 15:09:20 -04:00
Zach Brown fc0fc1427f Merge pull request #296 from versity/auke/indx_key_delete
Fix indx delete using wrong xid, leaving orphans. && Add basic-xattr-indx tests.
2026-04-13 14:34:37 -07:00
Zach Brown ec68845201 Merge pull request #289 from versity/auke/merge_read_item_stale_seq
Update seq when merging deltas from partial log merge.
2026-04-13 14:10:37 -07:00
Auke Kok 5e2009f939 Avoid double counting deltas from non-input finalized log trees.
Readers currently accumulate all finalized log tree deltas into
a single bucket for deciding whether they are already in fs_root
or not, but, finalized trees that aren't inputs to a current merge
will have higher seqs, and thus we may be double applying deltas
already merged into fs_root.

To distinguish, scoutfs_totl_merge_contribute() needs to know the
merge status item seq.  We change wkic's get_roots() from using the
SCOUTFS_NET_CMD_GET_ROOTS RPC to reading the superblock directly.
This is needed because totl merge resolution has to use the same data
as the btree roots it is operating on, thus we can't grab it from a
SCOUTFS_NET_CMD_GET_ROOTS packet - it likely is different.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-10 13:50:21 -07:00
Auke Kok 8bdc20af21 Rename/reword FINALIZED to MERGE_INPUT.
These mislabeled members and enums were clearly not describing
the actual data being handled and obfuscating the intent of
avoiding mixing merge input items with non-merge input items.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-10 13:50:21 -07:00
Auke Kok 857a39579e Clear roots when retrying due to stale btree blocks.
Before deltas were added this code path was correct, but with
deltas we can't just retry this without clearing &root, since
it would potentially double count.

The condition where this could happen is when there are deltas in
several finalized log trees, and we've made progress towards reading
some of them, and then encounter a stale btree block. The retry
would not clear the collected trees, apply the same delta as was
already applied before the retry, and thus double count.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-10 13:50:21 -07:00
Auke Kok 38d36c9f5c Update seq when merging deltas from partial log merge.
Two different clients can write delta's for totl indexes at the same
time, recording their changes. When merged, a reader should apply both
in order, and only once. To do so, the seq determines whether the delta
has been applied already.

The code fails to update the seq while walking the trees for deltas to
apply. Subsequently, when processing subsequent trees, it could
re-process deltas already applied. In case of a large negative delta
(e.g. removal of large amounts of files), the totl value could become
negative, resulting in quota lockout.

The fix is simple: advance the seq when reading partial delta merges
to avoid double counting.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-10 13:50:21 -07:00
Auke Kok b724567b2a Add log_merge_force_partial trigger for testing partial merges.
Add a trigger that forces btree_merge() to return -ERANGE after
modifying a leaf's worth of items, causing many small partial merges
per merge cycle. This is used by tests to reliably reproduce races
that depend on partial merges splicing items into fs_root while
finalized logs still exist.

The trigger check lives inside btree_merge() where it can observe
actual item modification progress, rather than overriding the
caller's dirty byte limit argument which applies to the whole
writer context.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-10 12:25:30 -07:00
Auke Kok add1da10dc Add test for stale seq in merge delta combining.
merge_read_item() fails to update found->seq when combining delta items
from multiple finalized log trees. Add a test case to replicate the
conditions of this issue.

Each of 5 mounts sets totl value 1 on 2500 shared keys, giving an
expected total of 5 per key.  Any total > 5 proves double-counting
from a stale seq.

The log_merge_force_partial trigger forces many partial merges per
cycle, creating the conditions where stale-seq items get spliced into
fs_root while finalized logs still exist.  Parallel readers on all
mounts race against this window to detect double-counted values.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-10 12:25:30 -07:00
Auke Kok b9c49629a2 Add basic-xattr-indx tests.
We had no basic testing for `scoutfs read-xattr-index` whatsoever. This
adds your basic negative argument tests, lifecycle tests, the
deduplicated reads, and partial removal.

This exposes a bug in deletion where the indx entry isn't cleaned up
on inode delete.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-08 13:45:56 -07:00
Auke Kok 9737009437 Fix indx delete using wrong xid, leaving orphans.
During inode deletion, scoutfs_xattr_drop forgot to set the xid
of the xattr after calling parse_indx_key, which hardcodes xid=0, and it
is the callers' responsibility. delete_force then deletes the wrong
key, and returns no errors on nonexistant keys.

So now there is a pending deletion for a non-existant indx and an
orphan indx entry in the tree. Subsequent calls to `scoutfs
read-xattr-index` will thus return entries for deleted inodes.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-08 11:48:47 -07:00
Zach Brown 3d54ae03e6 Merge pull request #295 from versity/auke/xfs_lockdep_ignore
Avoid xfs lockdep false positive dmesg errors.
2026-04-03 09:46:44 -07:00
Auke Kok e27ec0add6 Avoid xfs lockdep false positive dmesg errors.
This xfs lockdep stack trace has at least 2 variants around
fs_reclaim, so try and capture it not too precisely here.

We can remove "lockdep disabled" in the $re grep -v, because it
can affect both this and the kasan one.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-01 14:25:48 -07:00
Zach Brown 5457741672 Merge pull request #292 from versity/zab/v1.29
v1.29 Release
2026-03-25 22:36:28 -07:00
46 changed files with 1276 additions and 275 deletions
+32
View File
@@ -1,6 +1,38 @@
Versity ScoutFS Release Notes
=============================
---
v1.31
\
*May 5, 2026*
Fix race between modifying quota rules and internal reading of the rules
that tripped an assertion.
Fix a bug that could skip merging totl items under specific heavy write
loads. This could lead to merged totl items incorrectly tracking the
sum of all the contributing totl xattrs.
Fix many small low risk bugs in error paths that were found with code
analysis and testing.
---
v1.30
\
*Apr 21, 2026*
Fix a problem reading the accumulated totals of contributing .totl.
xattrs when log merging is in progress. The problem would have readers
of the totals calculate the sums incorrectly.
Fix a problem updating quota rules. There was a race where updates
could be corrupted if they happened while a transaction was being
written.
Fix a problem deleting files with .indx. xattrs. The internal indexing
metadata wouldn't be properly deleted so the files would still claim to
be present and visible in the index, though the file no longer existed.
---
v1.29
\
+13 -3
View File
@@ -218,6 +218,7 @@ static void block_free_work(struct work_struct *work)
llist_for_each_entry_safe(bp, tmp, deleted, free_node) {
block_free(sb, bp);
cond_resched();
}
}
@@ -467,9 +468,6 @@ static int block_submit_bio(struct super_block *sb, struct block_private *bp,
sector_t sector;
int ret = 0;
if (scoutfs_forcing_unmount(sb))
return -ENOLINK;
sector = bp->bl.blkno << (SCOUTFS_BLOCK_LG_SHIFT - 9);
WARN_ON_ONCE(bp->bl.blkno == U64_MAX);
@@ -480,6 +478,17 @@ static int block_submit_bio(struct super_block *sb, struct block_private *bp,
set_bit(BLOCK_BIT_IO_BUSY, &bp->bits);
block_get(bp);
/*
* A second thread may already be waiting on this block's completion
* after this thread won the race to submit the block. We exit through
* the block_end_io error path which sets BLOCK_BIT_ERROR and assures
* that other callers in the waitq get woken up.
*/
if (scoutfs_forcing_unmount(sb)) {
ret = -ENOLINK;
goto end_io;
}
blk_start_plug(&plug);
for (off = 0; off < SCOUTFS_BLOCK_LG_SIZE; off += PAGE_SIZE) {
@@ -517,6 +526,7 @@ static int block_submit_bio(struct super_block *sb, struct block_private *bp,
blk_finish_plug(&plug);
end_io:
/* let racing end_io know we're done */
block_end_io(sb, opf, bp, ret);
+10
View File
@@ -2183,6 +2183,8 @@ static int merge_read_item(struct super_block *sb, struct scoutfs_key *key, u64
if (ret > 0) {
if (ret == SCOUTFS_DELTA_COMBINED) {
scoutfs_inc_counter(sb, btree_merge_delta_combined);
if (seq > found->seq)
found->seq = seq;
} else if (ret == SCOUTFS_DELTA_COMBINED_NULL) {
scoutfs_inc_counter(sb, btree_merge_delta_null);
free_mitem(rng, found);
@@ -2486,6 +2488,14 @@ int scoutfs_btree_merge(struct super_block *sb,
mitem = next_mitem(mitem);
free_mitem(&rng, tmp);
}
if (mitem && walk_val_len == 0 &&
!(walk_flags & (BTW_INSERT | BTW_DELETE)) &&
scoutfs_trigger(sb, LOG_MERGE_FORCE_PARTIAL)) {
ret = -ERANGE;
*next_ret = mitem->key;
goto out;
}
}
ret = 0;
+1 -1
View File
@@ -479,7 +479,7 @@ static void scoutfs_client_connect_worker(struct work_struct *work)
struct scoutfs_sb_info *sbi = SCOUTFS_SB(sb);
struct scoutfs_mount_options opts;
struct scoutfs_net_greeting greet;
struct sockaddr_in sin;
struct sockaddr_storage sin;
bool am_quorum;
int ret;
+13 -7
View File
@@ -25,6 +25,7 @@
#include "sysfs.h"
#include "server.h"
#include "fence.h"
#include "net.h"
/*
* Fencing ensures that a given mount can no longer write to the
@@ -79,7 +80,7 @@ struct pending_fence {
struct timer_list timer;
ktime_t start_kt;
__be32 ipv4_addr;
union scoutfs_inet_addr addr;
bool fenced;
bool error;
int reason;
@@ -171,14 +172,19 @@ static ssize_t error_store(struct kobject *kobj, struct kobj_attribute *attr, co
}
SCOUTFS_ATTR_RW(error);
static ssize_t ipv4_addr_show(struct kobject *kobj,
static ssize_t inet_addr_show(struct kobject *kobj,
struct kobj_attribute *attr, char *buf)
{
DECLARE_FENCE_FROM_KOBJ(fence, kobj);
struct sockaddr_storage sin;
return snprintf(buf, PAGE_SIZE, "%pI4", &fence->ipv4_addr);
memset(&sin, 0, sizeof(struct sockaddr_storage));
scoutfs_addr_to_sin(&sin, &fence->addr);
return snprintf(buf, PAGE_SIZE, "%pISc", SIN_ARG(&sin));
}
SCOUTFS_ATTR_RO(ipv4_addr);
SCOUTFS_ATTR_RO(inet_addr);
static ssize_t reason_show(struct kobject *kobj, struct kobj_attribute *attr,
char *buf)
@@ -212,7 +218,7 @@ static struct attribute *fence_attrs[] = {
SCOUTFS_ATTR_PTR(elapsed_secs),
SCOUTFS_ATTR_PTR(fenced),
SCOUTFS_ATTR_PTR(error),
SCOUTFS_ATTR_PTR(ipv4_addr),
SCOUTFS_ATTR_PTR(inet_addr),
SCOUTFS_ATTR_PTR(reason),
SCOUTFS_ATTR_PTR(rid),
NULL,
@@ -232,7 +238,7 @@ static void fence_timeout(struct timer_list *timer)
wake_up(&fi->waitq);
}
int scoutfs_fence_start(struct super_block *sb, u64 rid, __be32 ipv4_addr, int reason)
int scoutfs_fence_start(struct super_block *sb, u64 rid, union scoutfs_inet_addr *addr, int reason)
{
DECLARE_FENCE_INFO(sb, fi);
struct pending_fence *fence;
@@ -248,7 +254,7 @@ int scoutfs_fence_start(struct super_block *sb, u64 rid, __be32 ipv4_addr, int r
scoutfs_sysfs_init_attrs(sb, &fence->ssa);
fence->start_kt = ktime_get();
fence->ipv4_addr = ipv4_addr;
memcpy(&fence->addr, addr, sizeof(union scoutfs_inet_addr));
fence->fenced = false;
fence->error = false;
fence->reason = reason;
+1 -1
View File
@@ -7,7 +7,7 @@ enum {
SCOUTFS_FENCE_QUORUM_BLOCK_LEADER,
};
int scoutfs_fence_start(struct super_block *sb, u64 rid, __be32 ipv4_addr, int reason);
int scoutfs_fence_start(struct super_block *sb, u64 rid, union scoutfs_inet_addr *addr, int reason);
int scoutfs_fence_next(struct super_block *sb, u64 *rid, int *reason, bool *error);
int scoutfs_fence_reason_pending(struct super_block *sb, int reason);
int scoutfs_fence_free(struct super_block *sb, u64 rid);
+9 -7
View File
@@ -239,9 +239,9 @@ static int forest_read_items(struct super_block *sb, struct scoutfs_key *key, u6
* to reset their state and retry with a newer version of the btrees.
*/
int scoutfs_forest_read_items_roots(struct super_block *sb, struct scoutfs_net_roots *roots,
struct scoutfs_key *key, struct scoutfs_key *bloom_key,
struct scoutfs_key *start, struct scoutfs_key *end,
scoutfs_forest_item_cb cb, void *arg)
u64 merge_input_seq, struct scoutfs_key *key,
struct scoutfs_key *bloom_key, struct scoutfs_key *start,
struct scoutfs_key *end, scoutfs_forest_item_cb cb, void *arg)
{
struct forest_read_items_data rid = {
.cb = cb,
@@ -317,15 +317,17 @@ int scoutfs_forest_read_items_roots(struct super_block *sb, struct scoutfs_net_r
scoutfs_inc_counter(sb, forest_bloom_pass);
if ((le64_to_cpu(lt.flags) & SCOUTFS_LOG_TREES_FINALIZED))
rid.fic |= FIC_FINALIZED;
if ((le64_to_cpu(lt.flags) & SCOUTFS_LOG_TREES_FINALIZED) &&
(merge_input_seq == 0 ||
le64_to_cpu(lt.finalize_seq) < merge_input_seq))
rid.fic |= FIC_MERGE_INPUT;
ret = scoutfs_btree_read_items(sb, &lt.item_root, key, start,
end, forest_read_items, &rid);
if (ret < 0)
goto out;
rid.fic &= ~FIC_FINALIZED;
rid.fic &= ~FIC_MERGE_INPUT;
}
ret = 0;
@@ -345,7 +347,7 @@ int scoutfs_forest_read_items(struct super_block *sb,
ret = scoutfs_client_get_roots(sb, &roots);
if (ret == 0)
ret = scoutfs_forest_read_items_roots(sb, &roots, key, bloom_key, start, end,
ret = scoutfs_forest_read_items_roots(sb, &roots, 0, key, bloom_key, start, end,
cb, arg);
return ret;
}
+4 -4
View File
@@ -11,7 +11,7 @@ struct scoutfs_lock;
/* caller gives an item to the callback */
enum {
FIC_FS_ROOT = (1 << 0),
FIC_FINALIZED = (1 << 1),
FIC_MERGE_INPUT = (1 << 1),
};
typedef int (*scoutfs_forest_item_cb)(struct super_block *sb, struct scoutfs_key *key, u64 seq,
u8 flags, void *val, int val_len, int fic, void *arg);
@@ -25,9 +25,9 @@ int scoutfs_forest_read_items(struct super_block *sb,
struct scoutfs_key *end,
scoutfs_forest_item_cb cb, void *arg);
int scoutfs_forest_read_items_roots(struct super_block *sb, struct scoutfs_net_roots *roots,
struct scoutfs_key *key, struct scoutfs_key *bloom_key,
struct scoutfs_key *start, struct scoutfs_key *end,
scoutfs_forest_item_cb cb, void *arg);
u64 merge_input_seq, struct scoutfs_key *key,
struct scoutfs_key *bloom_key, struct scoutfs_key *start,
struct scoutfs_key *end, scoutfs_forest_item_cb cb, void *arg);
int scoutfs_forest_set_bloom_bits(struct super_block *sb,
struct scoutfs_lock *lock);
void scoutfs_forest_set_max_seq(struct super_block *sb, u64 max_seq);
+1
View File
@@ -549,6 +549,7 @@ retry:
goto out;
if (scoutfs_data_wait_found(&dw)) {
scoutfs_unlock(sb, lock, SCOUTFS_LOCK_WRITE);
lock = NULL;
/* XXX callee locks instead? */
inode_unlock(inode);
+39
View File
@@ -1739,6 +1739,43 @@ out:
return ret;
}
static long scoutfs_ioc_inject_totl_delta(struct file *file, unsigned long arg)
{
struct super_block *sb = file_inode(file)->i_sb;
struct scoutfs_ioctl_inject_totl_delta __user *uitd = (void __user *)arg;
struct scoutfs_ioctl_inject_totl_delta itd;
struct scoutfs_xattr_totl_val tval;
struct scoutfs_lock *lock = NULL;
struct scoutfs_key key;
int ret;
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
if (copy_from_user(&itd, uitd, sizeof(itd)))
return -EFAULT;
scoutfs_xattr_init_totl_key(&key, itd.name);
tval.total = cpu_to_le64((u64)itd.total);
tval.count = cpu_to_le64((u64)itd.count);
ret = scoutfs_lock_xattr_totl(sb, SCOUTFS_LOCK_WRITE_ONLY, 0, &lock);
if (ret < 0)
goto out;
ret = scoutfs_hold_trans(sb, true);
if (ret < 0)
goto unlock;
ret = scoutfs_item_delta(sb, &key, &tval, sizeof(tval), lock);
scoutfs_release_trans(sb);
unlock:
scoutfs_unlock(sb, lock, SCOUTFS_LOCK_WRITE_ONLY);
out:
return ret;
}
long scoutfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
switch (cmd) {
@@ -1790,6 +1827,8 @@ long scoutfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
return scoutfs_ioc_read_xattr_index(file, arg);
case SCOUTFS_IOC_PUNCH_OFFLINE:
return scoutfs_ioc_punch_offline(file, arg);
case SCOUTFS_IOC_INJECT_TOTL_DELTA:
return scoutfs_ioc_inject_totl_delta(file, arg);
}
return -ENOTTY;
+13
View File
@@ -876,4 +876,17 @@ struct scoutfs_ioctl_punch_offline {
#define SCOUTFS_IOC_PUNCH_OFFLINE \
_IOW(SCOUTFS_IOCTL_MAGIC, 24, struct scoutfs_ioctl_punch_offline)
/*
* Inject a signed (total, count) delta at the totl key @name (a, b, c
* match the trailing dotted u64s of a totl xattr name).
*/
struct scoutfs_ioctl_inject_totl_delta {
__u64 name[SCOUTFS_IOCTL_XATTR_TOTAL_NAME_NR];
__s64 total;
__s64 count;
};
#define SCOUTFS_IOC_INJECT_TOTL_DELTA \
_IOW(SCOUTFS_IOCTL_MAGIC, 25, struct scoutfs_ioctl_inject_totl_delta)
#endif
+8 -4
View File
@@ -195,9 +195,11 @@ struct kc_shrinker_wrapper {
#include <linux/inet.h>
static inline int kc_kernel_getsockname(struct socket *sock, struct sockaddr *addr)
{
int addrlen = sizeof(struct sockaddr_in);
int addrlen = sizeof(struct sockaddr_storage);
int ret = kernel_getsockname(sock, addr, &addrlen);
if (ret == 0 && addrlen != sizeof(struct sockaddr_in))
if (ret == 0 && (!(
(addrlen == sizeof(struct sockaddr_in)) ||
(addrlen == sizeof(struct sockaddr_in6)))))
return -EAFNOSUPPORT;
else if (ret < 0)
return ret;
@@ -206,9 +208,11 @@ static inline int kc_kernel_getsockname(struct socket *sock, struct sockaddr *ad
}
static inline int kc_kernel_getpeername(struct socket *sock, struct sockaddr *addr)
{
int addrlen = sizeof(struct sockaddr_in);
int addrlen = sizeof(struct sockaddr_storage);
int ret = kernel_getpeername(sock, addr, &addrlen);
if (ret == 0 && addrlen != sizeof(struct sockaddr_in))
if (ret == 0 && (!(
(addrlen == sizeof(struct sockaddr_in)) ||
(addrlen == sizeof(struct sockaddr_in6)))))
return -EAFNOSUPPORT;
else if (ret < 0)
return ret;
+1
View File
@@ -813,6 +813,7 @@ int scoutfs_lock_invalidate_request(struct super_block *sb, u64 net_id,
out:
if (!lock) {
kfree(ireq);
ret = scoutfs_client_lock_response(sb, net_id, nl);
BUG_ON(ret); /* lock server doesn't fence timed out client requests */
}
+27 -15
View File
@@ -525,7 +525,7 @@ static int process_response(struct scoutfs_net_connection *conn,
struct super_block *sb = conn->sb;
struct message_send *msend;
scoutfs_net_response_t resp_func = NULL;
void *resp_data;
void *resp_data = NULL;
spin_lock(&conn->lock);
@@ -804,7 +804,7 @@ static void scoutfs_net_recv_worker(struct work_struct *work)
if (invalid_message(conn, nh)) {
scoutfs_inc_counter(sb, net_recv_invalid_message);
ret = -EBADMSG;
break;
goto out;
}
data_len = le16_to_cpu(nh->data_len);
@@ -1218,7 +1218,8 @@ static void scoutfs_net_connect_worker(struct work_struct *work)
trace_scoutfs_net_connect_work_enter(sb, 0, 0);
ret = kc_sock_create_kern(AF_INET, SOCK_STREAM, IPPROTO_TCP, &sock);
ret = kc_sock_create_kern(conn->connect_sin.ss_family,
SOCK_STREAM, IPPROTO_TCP, &sock);
if (ret)
goto out;
@@ -1239,7 +1240,9 @@ static void scoutfs_net_connect_worker(struct work_struct *work)
trace_scoutfs_conn_connect_start(conn);
ret = kernel_connect(sock, (struct sockaddr *)&conn->connect_sin,
sizeof(struct sockaddr_in), 0);
conn->connect_sin.ss_family == AF_INET ?
sizeof(struct sockaddr_in) : sizeof(struct sockaddr_in6),
0);
if (ret)
goto out;
@@ -1281,6 +1284,13 @@ static bool empty_accepted_list(struct scoutfs_net_connection *conn)
return empty;
}
/*
* sockaddr_storage wraps both _in and _in6, which have _port always
* __be16 at the same offset, and we only need to test whether it's
* zero.
*/
#define sockaddr_port_is_nonzero(sin) ((sin).__data[0] || (sin).__data[1])
/*
* Safely shut down an active connection. This can be triggered by
* errors in workers or by an external call to free the connection. The
@@ -1304,7 +1314,7 @@ static void scoutfs_net_shutdown_worker(struct work_struct *work)
trace_scoutfs_conn_shutdown_start(conn);
/* connected and accepted conns print a message */
if (conn->peername.sin_port != 0)
if (sockaddr_port_is_nonzero(conn->peername))
scoutfs_info(sb, "%s "SIN_FMT" -> "SIN_FMT,
conn->listening_conn ? "server closing" :
"client disconnected",
@@ -1434,6 +1444,7 @@ static void scoutfs_net_reconn_free_worker(struct work_struct *work)
DEFINE_CONN_FROM_WORK(conn, work, reconn_free_dwork.work);
struct super_block *sb = conn->sb;
struct scoutfs_net_connection *acc;
union scoutfs_inet_addr addr;
unsigned long now = jiffies;
unsigned long deadline = 0;
bool requeue = false;
@@ -1454,8 +1465,9 @@ restart:
if (!test_conn_fl(conn, shutting_down)) {
scoutfs_info(sb, "client "SIN_FMT" reconnect timed out, fencing",
SIN_ARG(&acc->last_peername));
scoutfs_sin_to_addr(&addr, &acc->last_peername);
ret = scoutfs_fence_start(sb, acc->rid,
acc->last_peername.sin_addr.s_addr,
&addr,
SCOUTFS_FENCE_CLIENT_RECONNECT);
if (ret) {
scoutfs_err(sb, "client fence returned err %d, shutting down server",
@@ -1538,9 +1550,9 @@ scoutfs_net_alloc_conn(struct super_block *sb,
conn->req_funcs = req_funcs;
spin_lock_init(&conn->lock);
init_waitqueue_head(&conn->waitq);
conn->sockname.sin_family = AF_INET;
conn->peername.sin_family = AF_INET;
conn->last_peername.sin_family = AF_INET;
conn->sockname.ss_family = AF_UNSPEC;
conn->peername.ss_family = AF_UNSPEC;
conn->last_peername.ss_family = AF_UNSPEC;
INIT_LIST_HEAD(&conn->accepted_head);
INIT_LIST_HEAD(&conn->accepted_list);
conn->next_send_seq = 1;
@@ -1619,7 +1631,7 @@ void scoutfs_net_free_conn(struct super_block *sb,
*/
int scoutfs_net_bind(struct super_block *sb,
struct scoutfs_net_connection *conn,
struct sockaddr_in *sin)
struct sockaddr_storage *sin)
{
struct socket *sock = NULL;
int addrlen;
@@ -1630,7 +1642,7 @@ int scoutfs_net_bind(struct super_block *sb,
if (WARN_ON_ONCE(conn->sock))
return -EINVAL;
ret = kc_sock_create_kern(AF_INET, SOCK_STREAM, IPPROTO_TCP, &sock);
ret = kc_sock_create_kern(sin->ss_family, SOCK_STREAM, IPPROTO_TCP, &sock);
if (ret)
goto out;
@@ -1642,7 +1654,7 @@ int scoutfs_net_bind(struct super_block *sb,
if (ret)
goto out;
addrlen = sizeof(struct sockaddr_in);
addrlen = sin->ss_family == AF_INET ? sizeof(struct sockaddr_in) : sizeof(struct sockaddr_in6);
ret = kernel_bind(sock, (struct sockaddr *)sin, addrlen);
if (ret)
goto out;
@@ -1658,7 +1670,7 @@ int scoutfs_net_bind(struct super_block *sb,
ret = 0;
conn->sock = sock;
*sin = conn->sockname;
sin = (struct sockaddr_storage *)&conn->sockname;
out:
if (ret < 0 && sock)
@@ -1693,7 +1705,7 @@ static bool connect_result(struct scoutfs_net_connection *conn, int *error)
done = true;
*error = 0;
} else if (test_conn_fl(conn, shutting_down) ||
conn->connect_sin.sin_family == 0) {
conn->connect_sin.ss_family == AF_UNSPEC) {
done = true;
*error = -ESHUTDOWN;
}
@@ -1714,7 +1726,7 @@ static bool connect_result(struct scoutfs_net_connection *conn, int *error)
*/
int scoutfs_net_connect(struct super_block *sb,
struct scoutfs_net_connection *conn,
struct sockaddr_in *sin, unsigned long timeout_ms)
struct sockaddr_storage *sin, unsigned long timeout_ms)
{
int ret = 0;
+38 -21
View File
@@ -49,15 +49,15 @@ struct scoutfs_net_connection {
unsigned long flags; /* CONN_FL_* bitmask */
unsigned long reconn_deadline;
struct sockaddr_in connect_sin;
struct sockaddr_storage connect_sin;
unsigned long connect_timeout_ms;
struct socket *sock;
u64 rid;
u64 greeting_id;
struct sockaddr_in sockname;
struct sockaddr_in peername;
struct sockaddr_in last_peername;
struct sockaddr_storage sockname;
struct sockaddr_storage peername;
struct sockaddr_storage last_peername;
struct list_head accepted_head;
struct scoutfs_net_connection *listening_conn;
@@ -99,27 +99,44 @@ enum conn_flags {
CONN_FL_reconn_freeing = (1UL << 6), /* waiting done, setter frees */
};
#define SIN_FMT "%pIS:%u"
#define SIN_ARG(sin) sin, be16_to_cpu((sin)->sin_port)
#define SIN_FMT "%pISpc"
#define SIN_ARG(sin) sin
static inline void scoutfs_addr_to_sin(struct sockaddr_in *sin,
static inline void scoutfs_addr_to_sin(struct sockaddr_storage *sin,
union scoutfs_inet_addr *addr)
{
BUG_ON(addr->v4.family != cpu_to_le16(SCOUTFS_AF_IPV4));
sin->sin_family = AF_INET;
sin->sin_addr.s_addr = cpu_to_be32(le32_to_cpu(addr->v4.addr));
sin->sin_port = cpu_to_be16(le16_to_cpu(addr->v4.port));
if (addr->v4.family == cpu_to_le16(SCOUTFS_AF_IPV4)) {
struct sockaddr_in *sin4 = (struct sockaddr_in *)sin;
memset(sin, 0, sizeof(struct sockaddr_storage));
sin4->sin_family = AF_INET;
sin4->sin_addr.s_addr = cpu_to_be32(le32_to_cpu(addr->v4.addr));
sin4->sin_port = cpu_to_be16(le16_to_cpu(addr->v4.port));
} else if (addr->v6.family == cpu_to_le16(SCOUTFS_AF_IPV6)) {
struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)sin;
memset(sin, 0, sizeof(struct sockaddr_storage));
sin6->sin6_family = AF_INET6;
memcpy(&sin6->sin6_addr.in6_u.u6_addr8, &addr->v6.addr, 16);
sin6->sin6_port = cpu_to_be16(le16_to_cpu(addr->v6.port));
} else
BUG();
}
static inline void scoutfs_sin_to_addr(union scoutfs_inet_addr *addr, struct sockaddr_in *sin)
static inline void scoutfs_sin_to_addr(union scoutfs_inet_addr *addr, struct sockaddr_storage *sin)
{
BUG_ON(sin->sin_family != AF_INET);
memset(addr, 0, sizeof(union scoutfs_inet_addr));
addr->v4.family = cpu_to_le16(SCOUTFS_AF_IPV4);
addr->v4.addr = be32_to_le32(sin->sin_addr.s_addr);
addr->v4.port = be16_to_le16(sin->sin_port);
if (sin->ss_family == AF_INET) {
struct sockaddr_in *sin4 = (struct sockaddr_in *)sin;
memset(addr, 0, sizeof(union scoutfs_inet_addr));
addr->v4.family = cpu_to_le16(SCOUTFS_AF_IPV4);
addr->v4.addr = be32_to_le32(sin4->sin_addr.s_addr);
addr->v4.port = be16_to_le16(sin4->sin_port);
} else if (sin->ss_family == AF_INET6) {
struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)sin;
memset(addr, 0, sizeof(union scoutfs_inet_addr));
addr->v6.family = cpu_to_le16(SCOUTFS_AF_IPV6);
memcpy(&addr->v6.addr, &sin6->sin6_addr.in6_u.u6_addr8, 16);
addr->v6.port = be16_to_le16(sin6->sin6_port);
} else
BUG();
}
struct scoutfs_net_connection *
@@ -130,10 +147,10 @@ scoutfs_net_alloc_conn(struct super_block *sb,
u64 scoutfs_net_client_rid(struct scoutfs_net_connection *conn);
int scoutfs_net_connect(struct super_block *sb,
struct scoutfs_net_connection *conn,
struct sockaddr_in *sin, unsigned long timeout_ms);
struct sockaddr_storage *sin, unsigned long timeout_ms);
int scoutfs_net_bind(struct super_block *sb,
struct scoutfs_net_connection *conn,
struct sockaddr_in *sin);
struct sockaddr_storage *sin);
void scoutfs_net_listen(struct super_block *sb,
struct scoutfs_net_connection *conn);
int scoutfs_net_submit_request(struct super_block *sb,
+137 -43
View File
@@ -145,14 +145,26 @@ struct quorum_info {
#define DECLARE_QUORUM_INFO_KOBJ(kobj, name) \
DECLARE_QUORUM_INFO(SCOUTFS_SYSFS_ATTRS_SB(kobj), name)
static bool quorum_slot_present(struct scoutfs_quorum_config *qconf, int i)
static bool quorum_slot_ipv4(struct scoutfs_quorum_config *qconf, int i)
{
BUG_ON(i < 0 || i > SCOUTFS_QUORUM_MAX_SLOTS);
return qconf->slots[i].addr.v4.family == cpu_to_le16(SCOUTFS_AF_IPV4);
}
static void quorum_slot_sin(struct scoutfs_quorum_config *qconf, int i, struct sockaddr_in *sin)
static bool quorum_slot_ipv6(struct scoutfs_quorum_config *qconf, int i)
{
BUG_ON(i < 0 || i > SCOUTFS_QUORUM_MAX_SLOTS);
return qconf->slots[i].addr.v6.family == cpu_to_le16(SCOUTFS_AF_IPV6);
}
static bool quorum_slot_present(struct scoutfs_quorum_config *qconf, int i)
{
return quorum_slot_ipv4(qconf, i) || quorum_slot_ipv6(qconf, i);
}
static void quorum_slot_sin(struct scoutfs_quorum_config *qconf, int i, struct sockaddr_storage *sin)
{
BUG_ON(i < 0 || i >= SCOUTFS_QUORUM_MAX_SLOTS);
@@ -179,11 +191,18 @@ static int create_socket(struct super_block *sb)
{
DECLARE_QUORUM_INFO(sb, qinf);
struct socket *sock = NULL;
struct sockaddr_in sin;
struct sockaddr_storage sin;
struct scoutfs_quorum_slot slot = qinf->qconf.slots[qinf->our_quorum_slot_nr];
int addrlen;
int ret;
ret = kc_sock_create_kern(PF_INET, SOCK_DGRAM, IPPROTO_UDP, &sock);
if (le16_to_cpu(slot.addr.v4.family) == SCOUTFS_AF_IPV4)
ret = kc_sock_create_kern(PF_INET, SOCK_DGRAM, IPPROTO_UDP, &sock);
else if (le16_to_cpu(slot.addr.v6.family) == SCOUTFS_AF_IPV6)
ret = kc_sock_create_kern(PF_INET6, SOCK_DGRAM, IPPROTO_UDP, &sock);
else
BUG();
if (ret) {
scoutfs_err(sb, "quorum couldn't create udp socket: %d", ret);
goto out;
@@ -192,9 +211,9 @@ static int create_socket(struct super_block *sb)
/* rather fail and retry than block waiting for free */
sock->sk->sk_allocation = GFP_ATOMIC;
addrlen = (le16_to_cpu(slot.addr.v4.family) == SCOUTFS_AF_IPV4) ?
sizeof(struct sockaddr_in) : sizeof(struct sockaddr_in6);
quorum_slot_sin(&qinf->qconf, qinf->our_quorum_slot_nr, &sin);
addrlen = sizeof(sin);
ret = kernel_bind(sock, (struct sockaddr *)&sin, addrlen);
if (ret) {
scoutfs_err(sb, "quorum failed to bind udp socket to "SIN_FMT": %d",
@@ -241,7 +260,7 @@ static int send_msg_members(struct super_block *sb, int type, u64 term, int only
.iov_base = &qmes,
.iov_len = sizeof(qmes),
};
struct sockaddr_in sin;
struct sockaddr_storage sin;
struct msghdr mh = {
.msg_flags = MSG_DONTWAIT | MSG_NOSIGNAL,
.msg_name = &sin,
@@ -542,10 +561,11 @@ int scoutfs_quorum_fence_leaders(struct super_block *sb, struct scoutfs_quorum_c
u64 term)
{
#define NR_OLD 2
struct scoutfs_quorum_block_event old[SCOUTFS_QUORUM_MAX_SLOTS][NR_OLD] = {{{0,}}};
struct scoutfs_quorum_block_event (*old)[NR_OLD];
struct scoutfs_sb_info *sbi = SCOUTFS_SB(sb);
struct scoutfs_quorum_block blk;
struct sockaddr_in sin;
struct sockaddr_storage sin;
union scoutfs_inet_addr addr;
const __le64 lefsid = cpu_to_le64(sbi->fsid);
const u64 rid = sbi->rid;
bool fence_started = false;
@@ -558,13 +578,20 @@ int scoutfs_quorum_fence_leaders(struct super_block *sb, struct scoutfs_quorum_c
BUILD_BUG_ON(SCOUTFS_QUORUM_BLOCKS < SCOUTFS_QUORUM_MAX_SLOTS);
old = kmalloc(NR_OLD * SCOUTFS_QUORUM_MAX_SLOTS * sizeof(struct scoutfs_quorum_block_event), GFP_KERNEL);
if (!old) {
ret = -ENOMEM;
goto out;
}
memset(old, 0, NR_OLD * SCOUTFS_QUORUM_MAX_SLOTS * sizeof(struct scoutfs_quorum_block_event));
for (i = 0; i < SCOUTFS_QUORUM_MAX_SLOTS; i++) {
if (!quorum_slot_present(qconf, i))
continue;
ret = read_quorum_block(sb, SCOUTFS_QUORUM_BLKNO + i, &blk, false);
if (ret < 0)
goto out;
goto out_free;
/* elected leader still running */
if (le64_to_cpu(blk.events[SCOUTFS_QUORUM_EVENT_ELECT].term) >
@@ -598,14 +625,17 @@ int scoutfs_quorum_fence_leaders(struct super_block *sb, struct scoutfs_quorum_c
scoutfs_info(sb, "fencing previous leader "SCSBF" at term %llu in slot %u with address "SIN_FMT,
SCSB_LEFR_ARGS(lefsid, fence_rid),
le64_to_cpu(old[i][j].term), i, SIN_ARG(&sin));
ret = scoutfs_fence_start(sb, le64_to_cpu(fence_rid), sin.sin_addr.s_addr,
scoutfs_sin_to_addr(&addr, &sin);
ret = scoutfs_fence_start(sb, le64_to_cpu(fence_rid), &addr,
SCOUTFS_FENCE_QUORUM_BLOCK_LEADER);
if (ret < 0)
goto out;
goto out_free;
fence_started = true;
}
}
out_free:
kfree(old);
out:
err = scoutfs_fence_wait_fenced(sb, msecs_to_jiffies(SCOUTFS_QUORUM_FENCE_TO_MS));
if (ret == 0)
@@ -708,7 +738,7 @@ static void scoutfs_quorum_worker(struct work_struct *work)
struct quorum_info *qinf = container_of(work, struct quorum_info, work);
struct scoutfs_mount_options opts;
struct super_block *sb = qinf->sb;
struct sockaddr_in unused;
struct sockaddr_storage unused;
struct quorum_host_msg msg;
struct quorum_status qst = {0,};
struct hb_recording hbr;
@@ -990,7 +1020,7 @@ out:
* leader with the greatest elected term. If we get it wrong the
* connection will timeout and the client will try again.
*/
int scoutfs_quorum_server_sin(struct super_block *sb, struct sockaddr_in *sin)
int scoutfs_quorum_server_sin(struct super_block *sb, struct sockaddr_storage *sin)
{
struct scoutfs_super_block *super = NULL;
struct scoutfs_quorum_block blk;
@@ -1049,7 +1079,7 @@ u8 scoutfs_quorum_votes_needed(struct super_block *sb)
return qinf->votes_needed;
}
void scoutfs_quorum_slot_sin(struct scoutfs_quorum_config *qconf, int i, struct sockaddr_in *sin)
void scoutfs_quorum_slot_sin(struct scoutfs_quorum_config *qconf, int i, struct sockaddr_storage *sin)
{
return quorum_slot_sin(qconf, i, sin);
}
@@ -1208,8 +1238,13 @@ static int verify_quorum_slots(struct super_block *sb, struct quorum_info *qinf,
struct scoutfs_quorum_config *qconf)
{
char slots[(SCOUTFS_QUORUM_MAX_SLOTS * 3) + 1];
struct sockaddr_in other;
struct sockaddr_in sin;
struct sockaddr_storage other;
struct sockaddr_storage sin;
struct sockaddr_in *sin4;
struct sockaddr_in *other4;
struct sockaddr_in6 *sin6;
struct sockaddr_in6 *other6;
__le16 family = cpu_to_le16(SCOUTFS_AF_NONE);
int found = 0;
int ret;
int i;
@@ -1220,35 +1255,94 @@ static int verify_quorum_slots(struct super_block *sb, struct quorum_info *qinf,
if (!quorum_slot_present(qconf, i))
continue;
scoutfs_quorum_slot_sin(qconf, i, &sin);
if (!valid_ipv4_unicast(sin.sin_addr.s_addr)) {
scoutfs_err(sb, "quorum slot #%d has invalid ipv4 unicast address: "SIN_FMT,
i, SIN_ARG(&sin));
return -EINVAL;
}
if (!valid_ipv4_port(sin.sin_port)) {
scoutfs_err(sb, "quorum slot #%d has invalid ipv4 port number:"SIN_FMT,
i, SIN_ARG(&sin));
return -EINVAL;
}
for (j = i + 1; j < SCOUTFS_QUORUM_MAX_SLOTS; j++) {
if (!quorum_slot_present(qconf, j))
continue;
scoutfs_quorum_slot_sin(qconf, j, &other);
if (sin.sin_addr.s_addr == other.sin_addr.s_addr &&
sin.sin_port == other.sin_port) {
scoutfs_err(sb, "quorum slots #%u and #%u have the same address: "SIN_FMT,
i, j, SIN_ARG(&sin));
if (quorum_slot_ipv4(qconf, i)) {
if (family == cpu_to_le16(SCOUTFS_AF_NONE)) {
family = cpu_to_le16(SCOUTFS_AF_IPV4);
} else if (family != cpu_to_le16(SCOUTFS_AF_IPV4)) {
scoutfs_err(sb, "quorum slot #%d is IPv4 but earlier slots are IPv6; mixed IPv4/IPv6 quorum is not supported",
i);
return -EINVAL;
}
}
found++;
scoutfs_quorum_slot_sin(qconf, i, &sin);
sin4 = (struct sockaddr_in *)&sin;
if (!valid_ipv4_unicast(sin4->sin_addr.s_addr)) {
scoutfs_err(sb, "quorum slot #%d has invalid ipv4 unicast address: "SIN_FMT,
i, SIN_ARG(&sin));
return -EINVAL;
}
if (!valid_ipv4_port(sin4->sin_port)) {
scoutfs_err(sb, "quorum slot #%d has invalid ipv4 port number:"SIN_FMT,
i, SIN_ARG(&sin));
return -EINVAL;
}
for (j = i + 1; j < SCOUTFS_QUORUM_MAX_SLOTS; j++) {
if (!quorum_slot_ipv4(qconf, j))
continue;
scoutfs_quorum_slot_sin(qconf, j, &other);
other4 = (struct sockaddr_in *)&other;
if (sin4->sin_addr.s_addr == other4->sin_addr.s_addr &&
sin4->sin_port == other4->sin_port) {
scoutfs_err(sb, "quorum slots #%u and #%u have the same address: "SIN_FMT,
i, j, SIN_ARG(&sin));
return -EINVAL;
}
}
found++;
} else if (quorum_slot_ipv6(qconf, i)) {
if (family == cpu_to_le16(SCOUTFS_AF_NONE)) {
family = cpu_to_le16(SCOUTFS_AF_IPV6);
} else if (family != cpu_to_le16(SCOUTFS_AF_IPV6)) {
scoutfs_err(sb, "quorum slot #%d is IPv6 but earlier slots are IPv4; mixed IPv4/IPv6 quorum is not supported",
i);
return -EINVAL;
}
quorum_slot_sin(qconf, i, &sin);
sin6 = (struct sockaddr_in6 *)&sin;
if ((sin6->sin6_addr.in6_u.u6_addr32[0] == 0) && (sin6->sin6_addr.in6_u.u6_addr32[1] == 0) &&
(sin6->sin6_addr.in6_u.u6_addr32[2] == 0) && (sin6->sin6_addr.in6_u.u6_addr32[3] == 0)) {
scoutfs_err(sb, "quorum slot #%d has unspecified ipv6 address:"SIN_FMT,
i, SIN_ARG(&sin));
return -EINVAL;
}
if (sin6->sin6_addr.in6_u.u6_addr8[0] == 0xff) {
scoutfs_err(sb, "quorum slot #%d has multicast ipv6 address:"SIN_FMT,
i, SIN_ARG(&sin));
return -EINVAL;
}
if (!valid_ipv4_port(sin6->sin6_port)) {
scoutfs_err(sb, "quorum slot #%d has invalid ipv6 port number:"SIN_FMT,
i, SIN_ARG(&sin));
return -EINVAL;
}
for (j = i + 1; j < SCOUTFS_QUORUM_MAX_SLOTS; j++) {
if (!quorum_slot_ipv6(qconf, j))
continue;
quorum_slot_sin(qconf, j, &other);
other6 = (struct sockaddr_in6 *)&other;
if ((ipv6_addr_equal(&sin6->sin6_addr, &other6->sin6_addr)) &&
(sin6->sin6_port == other6->sin6_port)) {
scoutfs_err(sb, "quorum slots #%u and #%u have the same address: "SIN_FMT,
i, j, SIN_ARG(&sin));
return -EINVAL;
}
}
found++;
}
}
if (found == 0) {
+2 -2
View File
@@ -1,11 +1,11 @@
#ifndef _SCOUTFS_QUORUM_H_
#define _SCOUTFS_QUORUM_H_
int scoutfs_quorum_server_sin(struct super_block *sb, struct sockaddr_in *sin);
int scoutfs_quorum_server_sin(struct super_block *sb, struct sockaddr_storage *sin);
u8 scoutfs_quorum_votes_needed(struct super_block *sb);
void scoutfs_quorum_slot_sin(struct scoutfs_quorum_config *qconf, int i,
struct sockaddr_in *sin);
struct sockaddr_storage *sin);
int scoutfs_quorum_fence_leaders(struct super_block *sb, struct scoutfs_quorum_config *qconf,
u64 term);
+25 -15
View File
@@ -34,6 +34,7 @@
#include "totl.h"
#include "util.h"
#include "quota.h"
#include "trans.h"
#include "counters.h"
#include "scoutfs_trace.h"
@@ -1086,6 +1087,10 @@ int scoutfs_quota_mod_rule(struct super_block *sb, bool is_add,
if (ret < 0)
goto out;
ret = scoutfs_hold_trans(sb, true);
if (ret < 0)
goto out;
down_write(&qtinf->rwsem);
if (is_add) {
@@ -1095,28 +1100,31 @@ int scoutfs_quota_mod_rule(struct super_block *sb, bool is_add,
else if (ret == 0)
ret = -EEXIST;
if (ret < 0)
goto unlock;
goto release;
rule_to_rule_val(&rv, &rule);
ret = scoutfs_item_create(sb, &key, &rv, sizeof(rv), lock);
if (ret < 0)
goto unlock;
goto release;
} else {
ret = find_rule(sb, &rule, &key, lock) ?:
scoutfs_item_delete(sb, &key, lock);
if (ret < 0)
goto unlock;
goto release;
}
wait_event(qtinf->waitq, !ruleset_is_busy(qtinf));
scoutfs_quota_invalidate(sb);
ret = 0;
unlock:
release:
up_write(&qtinf->rwsem);
scoutfs_unlock(sb, lock, SCOUTFS_LOCK_WRITE);
scoutfs_release_trans(sb);
out:
scoutfs_unlock(sb, lock, SCOUTFS_LOCK_WRITE);
if (is_add)
trace_scoutfs_quota_add_rule(sb, &rule, ret);
else
@@ -1135,12 +1143,17 @@ void scoutfs_quota_get_lock_range(struct scoutfs_key *start, struct scoutfs_key
}
/*
* This is called during cluster lock invalidation to indicate that the
* ruleset is no longer protected by cluster locking and might have been
* modified. We mark the ruleset invalid and free it once all readers
* drain. The next check will acquire the cluster lock and read the
* rules. Because this is called during invalidation this is serialized
* with write holders of cluster locks so we can never see -EBUSY here.
* Mark the cached ruleset invalid and free the previous one once readers
* drain. Called from cluster lock invalidation and from quota rule
* modification.
*
* Cluster lock invalidation runs only after the lock layer has drained
* local READ users. Since EBUSY is set only while a reader holds READ,
* the reader has already published by the time we run.
*
* Quota rule modification waits on the waitq for any in-flight reader
* to publish before calling here, so the next check rebuilds against
* the newly written rules rather than the reader's stale result.
*/
void scoutfs_quota_invalidate(struct super_block *sb)
{
@@ -1154,13 +1167,10 @@ void scoutfs_quota_invalidate(struct super_block *sb)
spin_lock(&qtinf->lock);
rs = rcu_dereference_protected(qtinf->ruleset, lockdep_is_held(&qtinf->lock));
if (rs != ERR_PTR(-EINVAL))
if (rs == ERR_PTR(-ENOENT) || !IS_ERR(rs))
rcu_assign_pointer(qtinf->ruleset, ERR_PTR(-EINVAL));
spin_unlock(&qtinf->lock);
/* cluster locking should have prevented this */
BUG_ON(rs == ERR_PTR(-EBUSY));
if (!IS_ERR(rs))
call_rcu(&rs->rcu, free_ruleset_rcu);
+21 -19
View File
@@ -1355,35 +1355,37 @@ DEFINE_EVENT(scoutfs_lock_class, scoutfs_lock_shrink,
);
DECLARE_EVENT_CLASS(scoutfs_net_class,
TP_PROTO(struct super_block *sb, struct sockaddr_in *name,
struct sockaddr_in *peer, struct scoutfs_net_header *nh),
TP_PROTO(struct super_block *sb, struct sockaddr_storage *name,
struct sockaddr_storage *peer, struct scoutfs_net_header *nh),
TP_ARGS(sb, name, peer, nh),
TP_STRUCT__entry(
SCSB_TRACE_FIELDS
si4_trace_define(name)
si4_trace_define(peer)
__field_struct(struct sockaddr_storage, name)
__field_struct(struct sockaddr_storage, peer)
snh_trace_define(nh)
),
TP_fast_assign(
SCSB_TRACE_ASSIGN(sb);
si4_trace_assign(name, name);
si4_trace_assign(peer, peer);
memcpy(&__entry->name, name, sizeof(struct sockaddr_storage));
memcpy(&__entry->peer, peer, sizeof(struct sockaddr_storage));
snh_trace_assign(nh, nh);
),
TP_printk(SCSBF" name "SI4_FMT" peer "SI4_FMT" nh "SNH_FMT,
SCSB_TRACE_ARGS, si4_trace_args(name), si4_trace_args(peer),
TP_printk(SCSBF" name "SIN_FMT" peer "SIN_FMT" nh "SNH_FMT,
SCSB_TRACE_ARGS,
&__entry->name,
&__entry->peer,
snh_trace_args(nh))
);
DEFINE_EVENT(scoutfs_net_class, scoutfs_net_send_message,
TP_PROTO(struct super_block *sb, struct sockaddr_in *name,
struct sockaddr_in *peer, struct scoutfs_net_header *nh),
TP_PROTO(struct super_block *sb, struct sockaddr_storage *name,
struct sockaddr_storage *peer, struct scoutfs_net_header *nh),
TP_ARGS(sb, name, peer, nh)
);
DEFINE_EVENT(scoutfs_net_class, scoutfs_net_recv_message,
TP_PROTO(struct super_block *sb, struct sockaddr_in *name,
struct sockaddr_in *peer, struct scoutfs_net_header *nh),
TP_PROTO(struct super_block *sb, struct sockaddr_storage *name,
struct sockaddr_storage *peer, struct scoutfs_net_header *nh),
TP_ARGS(sb, name, peer, nh)
);
@@ -1416,8 +1418,8 @@ DECLARE_EVENT_CLASS(scoutfs_net_conn_class,
__field(void *, sock)
__field(__u64, c_rid)
__field(__u64, greeting_id)
si4_trace_define(sockname)
si4_trace_define(peername)
__field_struct(struct sockaddr_storage, sockname)
__field_struct(struct sockaddr_storage, peername)
__field(unsigned char, e_accepted_head)
__field(void *, listening_conn)
__field(unsigned char, e_accepted_list)
@@ -1435,8 +1437,8 @@ DECLARE_EVENT_CLASS(scoutfs_net_conn_class,
__entry->sock = conn->sock;
__entry->c_rid = conn->rid;
__entry->greeting_id = conn->greeting_id;
si4_trace_assign(sockname, &conn->sockname);
si4_trace_assign(peername, &conn->peername);
memcpy(&__entry->sockname, &conn->sockname, sizeof(struct sockaddr_storage));
memcpy(&__entry->peername, &conn->peername, sizeof(struct sockaddr_storage));
__entry->e_accepted_head = !!list_empty(&conn->accepted_head);
__entry->listening_conn = conn->listening_conn;
__entry->e_accepted_list = !!list_empty(&conn->accepted_list);
@@ -1446,7 +1448,7 @@ DECLARE_EVENT_CLASS(scoutfs_net_conn_class,
__entry->e_resend_queue = !!list_empty(&conn->resend_queue);
__entry->recv_seq = atomic64_read(&conn->recv_seq);
),
TP_printk(SCSBF" flags %s rc_dl %lu cto %lu sk %p rid %llu grid %llu sn "SI4_FMT" pn "SI4_FMT" eah %u lc %p eal %u nss %llu nsi %llu esq %u erq %u rs %llu",
TP_printk(SCSBF" flags %s rc_dl %lu cto %lu sk %p rid %llu grid %llu sn "SIN_FMT" pn "SIN_FMT" eah %u lc %p eal %u nss %llu nsi %llu esq %u erq %u rs %llu",
SCSB_TRACE_ARGS,
print_conn_flags(__entry->flags),
__entry->reconn_deadline,
@@ -1454,8 +1456,8 @@ DECLARE_EVENT_CLASS(scoutfs_net_conn_class,
__entry->sock,
__entry->c_rid,
__entry->greeting_id,
si4_trace_args(sockname),
si4_trace_args(peername),
&__entry->sockname,
&__entry->peername,
__entry->e_accepted_head,
__entry->listening_conn,
__entry->e_accepted_list,
+5 -6
View File
@@ -1077,8 +1077,7 @@ static int next_log_merge_range(struct super_block *sb, struct scoutfs_btree_roo
struct scoutfs_key key;
int ret;
key = *start;
key.sk_zone = SCOUTFS_LOG_MERGE_RANGE_ZONE;
init_log_merge_key(&key, SCOUTFS_LOG_MERGE_RANGE_ZONE, 0, 0);
scoutfs_key_set_ones(&rng->start);
do {
@@ -3640,7 +3639,7 @@ static bool invalid_mounted_client_item(struct scoutfs_btree_item_ref *iref)
* it's acceptable to see -EEXIST.
*/
static int insert_mounted_client(struct super_block *sb, u64 rid, u64 gr_flags,
struct sockaddr_in *sin)
struct sockaddr_storage *sin)
{
DECLARE_SERVER_INFO(sb, server);
struct scoutfs_super_block *super = DIRTY_SUPER_SB(sb);
@@ -4393,7 +4392,7 @@ static void fence_pending_recov_worker(struct work_struct *work)
break;
}
ret = scoutfs_fence_start(sb, rid, le32_to_be32(addr.v4.addr),
ret = scoutfs_fence_start(sb, rid, &addr,
SCOUTFS_FENCE_CLIENT_RECOVERY);
if (ret < 0) {
scoutfs_err(sb, "fence returned err %d, shutting down server", ret);
@@ -4544,7 +4543,7 @@ static void scoutfs_server_worker(struct work_struct *work)
struct scoutfs_net_connection *conn = NULL;
struct scoutfs_mount_options opts;
DECLARE_WAIT_QUEUE_HEAD(waitq);
struct sockaddr_in sin;
struct sockaddr_storage sin;
bool alloc_init = false;
u64 max_seq;
int ret;
@@ -4553,7 +4552,7 @@ static void scoutfs_server_worker(struct work_struct *work)
scoutfs_options_read(sb, &opts);
scoutfs_quorum_slot_sin(&server->qconf, opts.quorum_slot_nr, &sin);
scoutfs_info(sb, "server starting at "SIN_FMT, SIN_ARG(&sin));
scoutfs_info(sb, "server starting at "SIN_FMT, &sin);
scoutfs_block_writer_init(sb, &server->wri);
server->finalize_sent_seq = 0;
-21
View File
@@ -1,27 +1,6 @@
#ifndef _SCOUTFS_SERVER_H_
#define _SCOUTFS_SERVER_H_
#define SI4_FMT "%u.%u.%u.%u:%u"
#define si4_trace_define(name) \
__field(__u32, name##_addr) \
__field(__u16, name##_port)
#define si4_trace_assign(name, sin) \
do { \
__typeof__(sin) _sin = (sin); \
\
__entry->name##_addr = be32_to_cpu(_sin->sin_addr.s_addr); \
__entry->name##_port = be16_to_cpu(_sin->sin_port); \
} while(0)
#define si4_trace_args(name) \
(__entry->name##_addr >> 24), \
(__entry->name##_addr >> 16) & 255, \
(__entry->name##_addr >> 8) & 255, \
__entry->name##_addr & 255, \
__entry->name##_port
#define SNH_FMT \
"seq %llu recv_seq %llu id %llu data_len %u cmd %u flags 0x%x error %u"
#define SNH_ARG(nh) \
+25 -17
View File
@@ -30,6 +30,11 @@ void scoutfs_totl_merge_init(struct scoutfs_totl_merging *merg)
memset(merg, 0, sizeof(struct scoutfs_totl_merging));
}
/*
* bin the incoming merge inputs so that we can resolve delta items
* properly. Finalized logs that are merge inputs are kept separately
* from those that are not.
*/
void scoutfs_totl_merge_contribute(struct scoutfs_totl_merging *merg,
u64 seq, u8 flags, void *val, int val_len, int fic)
{
@@ -39,10 +44,10 @@ void scoutfs_totl_merge_contribute(struct scoutfs_totl_merging *merg,
merg->fs_seq = seq;
merg->fs_total = le64_to_cpu(tval->total);
merg->fs_count = le64_to_cpu(tval->count);
} else if (fic & FIC_FINALIZED) {
merg->fin_seq = seq;
merg->fin_total += le64_to_cpu(tval->total);
merg->fin_count += le64_to_cpu(tval->count);
} else if (fic & FIC_MERGE_INPUT) {
merg->inp_seq = seq;
merg->inp_total += le64_to_cpu(tval->total);
merg->inp_count += le64_to_cpu(tval->count);
} else {
merg->log_seq = seq;
merg->log_total += le64_to_cpu(tval->total);
@@ -53,15 +58,18 @@ void scoutfs_totl_merge_contribute(struct scoutfs_totl_merging *merg,
/*
* .totl. item merging has to be careful because the log btree merging
* code can write partial results to the fs_root. This means that a
* reader can see both cases where new finalized logs should be applied
* to the old fs items and where old finalized logs have already been
* applied to the partially merged fs items. Currently active logged
* items are always applied on top of all cases.
* reader can see both cases where merge input deltas should be applied
* to the old fs items and where they have already been applied to the
* partially merged fs items.
*
* Only finalized log trees that are inputs to the current merge cycle
* are tracked in the inp_ bucket. Finalized trees that aren't merge
* inputs and active log trees are always applied unconditionally since
* they cannot be in fs_root.
*
* These cases are differentiated with a combination of sequence numbers
* in items, the count of contributing xattrs, and a flag
* differentiating finalized and active logged items. This lets us
* recognize all cases, including when finalized logs were merged and
* in items and the count of contributing xattrs. This lets us
* recognize all cases, including when merge inputs were merged and
* deleted the fs item.
*/
void scoutfs_totl_merge_resolve(struct scoutfs_totl_merging *merg, __u64 *total, __u64 *count)
@@ -75,14 +83,14 @@ void scoutfs_totl_merge_resolve(struct scoutfs_totl_merging *merg, __u64 *total,
*count = merg->fs_count;
}
/* apply finalized logs if they're newer or creating */
if (((merg->fs_seq != 0) && (merg->fin_seq > merg->fs_seq)) ||
((merg->fs_seq == 0) && (merg->fin_count > 0))) {
*total += merg->fin_total;
*count += merg->fin_count;
/* apply merge input deltas if they're newer or creating */
if (((merg->fs_seq != 0) && (merg->inp_seq > merg->fs_seq)) ||
((merg->fs_seq == 0) && (merg->inp_count > 0))) {
*total += merg->inp_total;
*count += merg->inp_count;
}
/* always apply active logs which must be newer than fs and finalized */
/* always apply non-input finalized and active logs */
if (merg->log_seq > 0) {
*total += merg->log_total;
*count += merg->log_count;
+3 -3
View File
@@ -7,9 +7,9 @@ struct scoutfs_totl_merging {
u64 fs_seq;
u64 fs_total;
u64 fs_count;
u64 fin_seq;
u64 fin_total;
s64 fin_count;
u64 inp_seq;
u64 inp_total;
s64 inp_count;
u64 log_seq;
u64 log_total;
s64 log_count;
+1
View File
@@ -46,6 +46,7 @@ static char *names[] = {
[SCOUTFS_TRIGGER_SRCH_MERGE_STOP_SAFE] = "srch_merge_stop_safe",
[SCOUTFS_TRIGGER_STATFS_LOCK_PURGE] = "statfs_lock_purge",
[SCOUTFS_TRIGGER_RECLAIM_SKIP_FINALIZE] = "reclaim_skip_finalize",
[SCOUTFS_TRIGGER_LOG_MERGE_FORCE_PARTIAL] = "log_merge_force_partial",
};
bool scoutfs_trigger_test_and_clear(struct super_block *sb, unsigned int t)
+1
View File
@@ -9,6 +9,7 @@ enum scoutfs_trigger {
SCOUTFS_TRIGGER_SRCH_MERGE_STOP_SAFE,
SCOUTFS_TRIGGER_STATFS_LOCK_PURGE,
SCOUTFS_TRIGGER_RECLAIM_SKIP_FINALIZE,
SCOUTFS_TRIGGER_LOG_MERGE_FORCE_PARTIAL,
SCOUTFS_TRIGGER_NR,
};
+74 -17
View File
@@ -95,6 +95,7 @@ struct wkic_info {
/* block reading slow path */
struct mutex roots_mutex;
struct scoutfs_net_roots roots;
u64 merge_input_seq;
u64 roots_read_seq;
ktime_t roots_expire;
@@ -805,29 +806,79 @@ static void free_page_list(struct super_block *sb, struct list_head *list)
* read_seq number so that we can compare the age of the items in cached
* pages. Only one request to refresh the roots is in progress at a
* time. This is the slow path that's only used when the cache isn't
* populated and the roots aren't cached. The root request is fast
* enough, especially compared to the resulting item reading IO, that we
* don't mind hiding it behind a trivial mutex.
* populated and the roots aren't cached.
*
* We read roots directly from the on-disk superblock rather than
* requesting them from the server so that we can also read the
* log_merge btree from the same superblock. The merge status item
* seq tells us which finalized log trees are inputs to the current
* merge, which is needed to correctly resolve totl delta items.
*/
static int get_roots(struct super_block *sb, struct wkic_info *winf,
struct scoutfs_net_roots *roots_ret, u64 *read_seq, bool force_new)
static int refresh_roots(struct super_block *sb, struct wkic_info *winf)
{
struct scoutfs_super_block *super;
struct scoutfs_log_merge_status *stat;
SCOUTFS_BTREE_ITEM_REF(iref);
struct scoutfs_key key;
int ret;
super = kmalloc(sizeof(*super), GFP_NOFS);
if (!super)
return -ENOMEM;
ret = scoutfs_read_super(sb, super);
if (ret < 0)
goto out;
winf->roots = (struct scoutfs_net_roots){
.fs_root = super->fs_root,
.logs_root = super->logs_root,
.srch_root = super->srch_root,
};
winf->merge_input_seq = 0;
if (super->log_merge.ref.blkno) {
scoutfs_key_set_zeros(&key);
key.sk_zone = SCOUTFS_LOG_MERGE_STATUS_ZONE;
ret = scoutfs_btree_lookup(sb, &super->log_merge, &key, &iref);
if (ret == 0) {
if (iref.val_len == sizeof(*stat)) {
stat = iref.val;
winf->merge_input_seq = le64_to_cpu(stat->seq);
} else {
ret = -EUCLEAN;
}
scoutfs_btree_put_iref(&iref);
} else if (ret == -ENOENT) {
ret = 0;
}
if (ret < 0)
goto out;
}
winf->roots_read_seq++;
winf->roots_expire = ktime_add_ms(ktime_get_raw(), WKIC_CACHE_LIFETIME_MS);
out:
kfree(super);
return ret;
}
static int get_roots(struct super_block *sb, struct wkic_info *winf,
struct scoutfs_net_roots *roots_ret, u64 *merge_input_seq,
u64 *read_seq, bool force_new)
{
struct scoutfs_net_roots roots;
int ret;
mutex_lock(&winf->roots_mutex);
if (force_new || ktime_before(winf->roots_expire, ktime_get_raw())) {
ret = scoutfs_client_get_roots(sb, &roots);
ret = refresh_roots(sb, winf);
if (ret < 0)
goto out;
winf->roots = roots;
winf->roots_read_seq++;
winf->roots_expire = ktime_add_ms(ktime_get_raw(), WKIC_CACHE_LIFETIME_MS);
}
*roots_ret = winf->roots;
*merge_input_seq = winf->merge_input_seq;
*read_seq = winf->roots_read_seq;
ret = 0;
out:
@@ -870,24 +921,30 @@ static int insert_read_pages(struct super_block *sb, struct wkic_info *winf,
struct scoutfs_key end;
struct wkic_page *wpage;
LIST_HEAD(pages);
u64 read_seq;
u64 merge_input_seq;
u64 read_seq = 0;
int ret;
ret = 0;
retry_stale:
ret = get_roots(sb, winf, &roots, &read_seq, ret == -ESTALE);
ret = get_roots(sb, winf, &roots, &merge_input_seq, &read_seq, ret == -ESTALE);
if (ret < 0)
goto out;
goto check_stale;
start = *range_start;
end = *range_end;
ret = scoutfs_forest_read_items_roots(sb, &roots, key, range_start, &start, &end,
read_items_cb, &root);
ret = scoutfs_forest_read_items_roots(sb, &roots, merge_input_seq, key, range_start,
&start, &end, read_items_cb, &root);
trace_scoutfs_wkic_read_items(sb, key, &start, &end);
check_stale:
ret = scoutfs_block_check_stale(sb, ret, &saved, &roots.fs_root.ref, &roots.logs_root.ref);
if (ret < 0) {
if (ret == -ESTALE)
if (ret == -ESTALE) {
/* not safe to retry due to delta items, must restart clean */
free_item_tree(&root);
root = RB_ROOT;
goto retry_stale;
}
goto out;
}
+1
View File
@@ -1265,6 +1265,7 @@ int scoutfs_xattr_drop(struct super_block *sb, u64 ino,
ret = parse_indx_key(&tag_key, xat->name, xat->name_len, ino);
if (ret < 0)
goto out;
scoutfs_xattr_set_indx_key_xid(&tag_key, le64_to_cpu(key.skx_id));
}
if ((tgs.totl || tgs.indx) && locked_zone != tag_key.sk_zone) {
+1
View File
@@ -12,3 +12,4 @@ src/o_tmpfile_umask
src/o_tmpfile_linkat
src/mmap_stress
src/mmap_validate
src/totl-delta-inject
+2 -1
View File
@@ -15,7 +15,8 @@ BIN := src/createmany \
src/o_tmpfile_umask \
src/o_tmpfile_linkat \
src/mmap_stress \
src/mmap_validate
src/mmap_validate \
src/totl-delta-inject
DEPS := $(wildcard src/*.d)
+57 -8
View File
@@ -20,9 +20,6 @@ t_filter_fs()
# [ 2687.691366] BUG: KASAN: stack-out-of-bounds in get_reg+0x1bc/0x230
# ...
# [ 2687.706220] ==================================================================
# [ 2687.707284] Disabling lock debugging due to kernel taint
#
# That final lock debugging message may not be included.
#
ignore_harmless_unwind_kasan_stack_oob()
{
@@ -46,10 +43,6 @@ awk '
saved=""
}
( in_soob == 2 && $0 ~ /==================================================================/ ) {
in_soob = 3
soob_nr = NR
}
( in_soob == 3 && NR > soob_nr && $0 !~ /Disabling lock debugging/ ) {
in_soob = 0
}
( !in_soob ) { print $0 }
@@ -61,6 +54,58 @@ awk '
'
}
#
# in el97+, XFS can generate a spurious lockdep circular dependency
# warning about reclaim. Fixed upstream in e.g. v5.7-rc4-129-g6dcde60efd94
#
ignore_harmless_xfs_lockdep_warning()
{
awk '
BEGIN {
in_block = 0
block_nr = 0
buf = ""
}
( !in_block && $0 ~ /======================================================/ ) {
in_block = 1
block_nr = NR
buf = $0 "\n"
next
}
( in_block == 1 && NR == (block_nr + 1) ) {
if (match($0, /WARNING: possible circular locking dependency detected/) != 0) {
in_block = 2
buf = buf $0 "\n"
} else {
in_block = 0
printf "%s", buf
print $0
buf = ""
}
next
}
( in_block == 2 ) {
buf = buf $0 "\n"
if ($0 ~ /<\/TASK>/) {
if (buf ~ /xfs_(nondir_|dir_)?ilock_class/ && buf ~ /fs_reclaim/) {
# known xfs lockdep false positive, discard
} else {
printf "%s", buf
}
in_block = 0
buf = ""
}
next
}
{ print $0 }
END {
if (buf) {
printf "%s", buf
}
}
'
}
#
# Filter out expected messages. Putting messages here implies that
# tests aren't relying on messages to discover failures.. they're
@@ -176,6 +221,10 @@ t_filter_dmesg()
# creating block devices may trigger this
re="$re|block device autoloading is deprecated and will be removed."
# lockdep or kasan warnings can cause this
re="$re|Disabling lock debugging due to kernel taint"
egrep -v "($re)" | \
ignore_harmless_unwind_kasan_stack_oob
ignore_harmless_unwind_kasan_stack_oob | \
ignore_harmless_xfs_lockdep_warning
}
+7
View File
@@ -0,0 +1,7 @@
== mkfs rejects mixed v4/v6 quorum
rc: 64
== mkfs all-v4, mount three members, cross-mount signature visible
== change-quorum-config rejects mixed v4/v6 quorum
rc: 64
== switch v4 -> v6, signature survives, cross-mount write again
== switch v6 -> v4, signatures survive
+54
View File
@@ -0,0 +1,54 @@
== testing invalid read-xattr-index arguments
bad index position entry argument 'bad', it must be in the form "a.b.ino" where each value can be prefixed by '0' for octal or '0x' for hex
scoutfs: read-xattr-index failed: Invalid argument (22)
bad index position entry argument '1.2', it must be in the form "a.b.ino" where each value can be prefixed by '0' for octal or '0x' for hex
scoutfs: read-xattr-index failed: Invalid argument (22)
initial major index position '256' must be between 0 and 255, inclusive.
scoutfs: read-xattr-index failed: Invalid argument (22)
first index position 1.2.3 must be less than last index position 0.0.0
scoutfs: read-xattr-index failed: Invalid argument (22)
first index position 1.2.0 must be less than last index position 1.1.2
scoutfs: read-xattr-index failed: Invalid argument (22)
first index position 2.2.2 must be less than last index position 2.2.1
scoutfs: read-xattr-index failed: Invalid argument (22)
== testing invalid names
setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
setfattr: /mnt/test/test/basic-xattr-indx/invalid: Numerical result out of range
== testing boundary values
0.0 found
255.max found
== indx xattr must have no value
setfattr: /mnt/test/test/basic-xattr-indx/noval: Invalid argument
setfattr: /mnt/test/test/basic-xattr-indx/noval: Invalid argument
== set indx xattr and verify index entry
found
== setting same indx xattr again is a no-op
found
== removing non-existent indx xattr succeeds
setfattr: /mnt/test/test/basic-xattr-indx/file: No such attribute
still found
== explicit xattr removal cleans up index entry
== file deletion cleans up index entry
found before delete
== multiple indx xattrs on one file cleaned up by deletion
entries before delete: 2
entries after delete: 0
== partial removal leaves other entries
300 found
== multiple files at same index position
files at same position: 2
surviving file found
== cross-mount visibility
found on mount 1
== duplicate position deduplication
entries for same position: 1
+6
View File
@@ -0,0 +1,6 @@
== setup
== concurrent quota mod and check across mounts
== verify quota rules are consistent after race
== verify file creation still works under quota
file visible on mount 1
== cleanup
+10
View File
@@ -0,0 +1,10 @@
== setup three files contributing to totl 8888.0.0
== merge baseline into fs_root
8888.0.0 = 42, 3
== inject (+128, +2) unbalances totl 8888.0.0
8888.0.0 = 170, 5
== unlink f3 (value 32) produces a -32/-1 delta
8888.0.0 = 138, 4
== inject (-128, -2) restores accounting for the remaining files
8888.0.0 = 10, 2
== cleanup
+3
View File
@@ -0,0 +1,3 @@
== setup
expected 4681
== cleanup
+3 -3
View File
@@ -383,7 +383,7 @@ fi
quo=""
if [ -n "$T_MKFS" ]; then
for i in $(seq -0 $((T_QUORUM - 1))); do
quo="$quo -Q $i,127.0.0.1,$((T_TEST_PORT + i))"
quo="$quo -Q $i,::1,$((T_TEST_PORT + i))"
done
msg "making new filesystem with $T_QUORUM quorum members"
@@ -694,8 +694,8 @@ for t in $tests; do
if [ "$sts" == "$T_PASS_STATUS" ]; then
dmesg | t_filter_dmesg > "$T_TMPDIR/dmesg.after"
diff --old-line-format="" --unchanged-line-format="" \
"$T_TMPDIR/dmesg.before" "$T_TMPDIR/dmesg.after" > \
"$T_TMPDIR/dmesg.new"
"$T_TMPDIR/dmesg.before" "$T_TMPDIR/dmesg.after" | \
grep -v '^$' > "$T_TMPDIR/dmesg.new"
if [ -s "$T_TMPDIR/dmesg.new" ]; then
message="unexpected messages in dmesg"
+5
View File
@@ -1,6 +1,7 @@
export-get-name-parent.sh
basic-block-counts.sh
basic-bad-mounts.sh
basic-inetaddr.sh
basic-posix-acl.sh
basic-acl-consistency.sh
inode-items-updated.sh
@@ -26,7 +27,11 @@ srch-basic-functionality.sh
simple-xattr-unit.sh
retention-basic.sh
totl-xattr-tag.sh
basic-xattr-indx.sh
quota.sh
totl-merge-read.sh
quota-invalidate-race.sh
totl-delta-inject.sh
lock-refleak.sh
lock-shrink-consistency.sh
lock-shrink-read-race.sh
+121
View File
@@ -0,0 +1,121 @@
/*
* Test helper that calls SCOUTFS_IOC_INJECT_TOTL_DELTA to seed
* arbitrary totl deltas.
*
* Copyright (C) 2026 Versity Software, Inc. All rights reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public
* License v2 as published by the Free Software Foundation.
*/
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <inttypes.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/types.h>
#include "ioctl.h"
static void usage(const char *prog)
{
fprintf(stderr,
"Usage: %s <mountpoint> <a>.<b>.<c> <total> <count>\n",
prog);
exit(2);
}
static int parse_s64(const char *s, int64_t *out)
{
char *end;
int64_t v;
errno = 0;
v = strtoll(s, &end, 0);
if (errno || *end != '\0' || end == s)
return -1;
*out = v;
return 0;
}
/*
* Parse "<a>.<b>.<c>" into abc[0..2] (skxt_a, skxt_b, skxt_c). Each
* component must be a non-empty unsigned base-0 integer.
*/
static int parse_dotted_name(const char *s, uint64_t abc[3])
{
const char *p = s;
char *end;
int i;
for (i = 0; i < 3; i++) {
if (*p == '\0' || *p == '.')
return -1;
errno = 0;
abc[i] = strtoull(p, &end, 0);
if (errno || end == p)
return -1;
if (i < 2) {
if (*end != '.')
return -1;
p = end + 1;
} else {
if (*end != '\0')
return -1;
}
}
return 0;
}
int main(int argc, char **argv)
{
struct scoutfs_ioctl_inject_totl_delta itd = {{0,}};
uint64_t abc[3];
int64_t total, count;
int fd;
int ret;
if (argc != 5)
usage(argv[0]);
if (parse_dotted_name(argv[2], abc) ||
parse_s64(argv[3], &total) ||
parse_s64(argv[4], &count)) {
fprintf(stderr, "could not parse arguments\n");
usage(argv[0]);
}
itd.name[0] = abc[0];
itd.name[1] = abc[1];
itd.name[2] = abc[2];
itd.total = total;
itd.count = count;
fd = open(argv[1], O_RDONLY | O_DIRECTORY);
if (fd < 0) {
fprintf(stderr, "open(%s): %s\n", argv[1], strerror(errno));
return 1;
}
ret = ioctl(fd, SCOUTFS_IOC_INJECT_TOTL_DELTA, &itd);
if (ret < 0) {
fprintf(stderr,
"INJECT_TOTL_DELTA(%" PRIu64 ".%" PRIu64 ".%" PRIu64
", total=%" PRId64 ", count=%" PRId64 "): %s\n",
abc[0], abc[1], abc[2], total, count, strerror(errno));
close(fd);
return 1;
}
close(fd);
return 0;
}
+78
View File
@@ -0,0 +1,78 @@
#
# Test that mixed ipv4/6 fails through mkfs/quorum change and that
# users can migrate from ipv4 to v6 and back.
#
t_require_commands dmsetup blockdev cmp
P0=$T_SCRATCH_PORT
P1=$((T_SCRATCH_PORT + 1))
P2=$((T_SCRATCH_PORT + 2))
SIG=$T_TMP.sig
seq 1 4096 > "$SIG"
trap '
umount $T_TMPDIR/m0 $T_TMPDIR/m1 $T_TMPDIR/m2 2>/dev/null
dmsetup remove _bia_m0 _bia_m1 _bia_m2 _bia_d0 _bia_d1 _bia_d2 2>/dev/null
' EXIT
mkdir -p "$T_TMPDIR/m0" "$T_TMPDIR/m1" "$T_TMPDIR/m2"
for nv in "m0 $T_EX_META_DEV" "m1 $T_EX_META_DEV" "m2 $T_EX_META_DEV" \
"d0 $T_EX_DATA_DEV" "d1 $T_EX_DATA_DEV" "d2 $T_EX_DATA_DEV"; do
set -- $nv
t_quiet dmsetup create _bia_$1 --table "0 $(blockdev --getsz $2) linear $2 0"
done
mnt() {
mount -t scoutfs \
-o metadev_path=/dev/mapper/_bia_m$1,quorum_slot_nr=$1 \
/dev/mapper/_bia_d$1 "$T_TMPDIR/m$1"
}
mount_all() {
mnt 0 &
mnt 1 &
mnt 2 &
wait
}
umount_all() {
umount $T_TMPDIR/m0 &
umount $T_TMPDIR/m1 &
umount $T_TMPDIR/m2 &
wait
}
verify() {
cmp -s "$SIG" "$T_TMPDIR/m0/sig" &&
cmp -s "$SIG" "$T_TMPDIR/m1/sig" &&
cmp -s "$SIG" "$T_TMPDIR/m2/sig" || t_fail "$1"
}
echo "== mkfs rejects mixed v4/v6 quorum"
t_rc scoutfs mkfs -f -Q 0,127.0.0.1,$P0 -Q 1,::1,$P1 -Q 2,127.0.0.1,$P2 /dev/mapper/_bia_m0 /dev/mapper/_bia_d0
echo "== mkfs all-v4, mount three members, cross-mount signature visible"
t_quiet scoutfs mkfs -f -Q 0,127.0.0.1,$P0 -Q 1,127.0.0.1,$P1 -Q 2,127.0.0.1,$P2 /dev/mapper/_bia_m0 /dev/mapper/_bia_d0
mount_all
cp "$SIG" "$T_TMPDIR/m0/sig"
verify "v4 initial"
umount_all
echo "== change-quorum-config rejects mixed v4/v6 quorum"
t_rc scoutfs change-quorum-config --offline -Q 0,127.0.0.1,$P0 -Q 1,::1,$P1 -Q 2,127.0.0.1,$P2 /dev/mapper/_bia_m0
echo "== switch v4 -> v6, signature survives, cross-mount write again"
t_quiet scoutfs change-quorum-config --offline -Q 0,::1,$P0 -Q 1,::1,$P1 -Q 2,::1,$P2 /dev/mapper/_bia_m0
mount_all
verify "after v4->v6"
cp "$SIG" "$T_TMPDIR/m1/sig-v6"
cmp -s "$SIG" "$T_TMPDIR/m0/sig-v6" || t_fail "v6 cross-mount write not visible on m0"
cmp -s "$SIG" "$T_TMPDIR/m2/sig-v6" || t_fail "v6 cross-mount write not visible on m2"
umount_all
echo "== switch v6 -> v4, signatures survive"
t_quiet scoutfs change-quorum-config --offline -Q 0,127.0.0.1,$P0 -Q 1,127.0.0.1,$P1 -Q 2,127.0.0.1,$P2 /dev/mapper/_bia_m0
mount_all
verify "after v6->v4"
cmp -s "$SIG" "$T_TMPDIR/m0/sig-v6" || t_fail "after v6->v4 sig-v6 lost"
umount_all
t_pass
+143
View File
@@ -0,0 +1,143 @@
#
# Test basic .indx. xattr tag functionality and index entry lifecycle
#
t_require_commands touch rm setfattr scoutfs stat
t_require_mounts 2
# query index from a specific mount, default mount 0
read_xattr_index()
{
local nr="${1:-0}"
local mnt="$(eval echo \$T_M$nr)"
shift
sync
echo 1 > $(t_debugfs_path $nr)/drop_weak_item_cache
scoutfs read-xattr-index -p "$mnt" "$@"
}
MAJOR=5
MINOR=100
echo "== testing invalid read-xattr-index arguments"
scoutfs read-xattr-index -p "$T_M0" bad 2>&1
scoutfs read-xattr-index -p "$T_M0" 1.2 2>&1
scoutfs read-xattr-index -p "$T_M0" 1.2.3 256.0.0 2>&1
scoutfs read-xattr-index -p "$T_M0" 1.2.3 0.0.0 2>&1
scoutfs read-xattr-index -p "$T_M0" 1.2.0 1.1.2 2>&1
scoutfs read-xattr-index -p "$T_M0" 2.2.2 2.2.1 2>&1
echo "== testing invalid names"
touch "$T_D0/invalid"
setfattr -n scoutfs.hide.indx.test.$MAJOR "$T_D0/invalid" 2>&1 | t_filter_fs
setfattr -n scoutfs.hide.indx.test "$T_D0/invalid" 2>&1 | t_filter_fs
setfattr -n scoutfs.hide.indx.test.. "$T_D0/invalid" 2>&1 | t_filter_fs
setfattr -n scoutfs.hide.indx.test..$MINOR "$T_D0/invalid" 2>&1 | t_filter_fs
setfattr -n scoutfs.hide.indx.test.$MAJOR. "$T_D0/invalid" 2>&1 | t_filter_fs
setfattr -n scoutfs.hide.indx.test.256.$MINOR "$T_D0/invalid" 2>&1 | t_filter_fs
setfattr -n scoutfs.hide.indx.test.abc.$MINOR "$T_D0/invalid" 2>&1 | t_filter_fs
setfattr -n scoutfs.hide.indx.test.$MAJOR.abc "$T_D0/invalid" 2>&1 | t_filter_fs
setfattr -n scoutfs.hide.indx.test.-1.$MINOR "$T_D0/invalid" 2>&1 | t_filter_fs
setfattr -n scoutfs.hide.indx.test.$MAJOR.-1 "$T_D0/invalid" 2>&1 | t_filter_fs
setfattr -n scoutfs.hide.indx.test.18446744073709551616.$MINOR "$T_D0/invalid" 2>&1 | t_filter_fs
setfattr -n scoutfs.hide.indx.$(printf 'x%.0s' $(seq 1 240)).$MAJOR.$MINOR "$T_D0/invalid" 2>&1 | t_filter_fs
rm -f "$T_D0/invalid"
echo "== testing boundary values"
touch "$T_D0/boundary"
INO=$(stat -c "%i" "$T_D0/boundary")
setfattr -n scoutfs.hide.indx.test.0.0 "$T_D0/boundary"
read_xattr_index 0 0.0.0 0.0.-1 | awk '($3 == "'$INO'") {print "0.0 found"}'
setfattr -x scoutfs.hide.indx.test.0.0 "$T_D0/boundary"
setfattr -n scoutfs.hide.indx.test.255.18446744073709551615 "$T_D0/boundary"
read_xattr_index 0 255.0.0 255.-1.-1 | awk '($3 == "'$INO'") {print "255.max found"}'
setfattr -x scoutfs.hide.indx.test.255.18446744073709551615 "$T_D0/boundary"
rm -f "$T_D0/boundary"
echo "== indx xattr must have no value"
touch "$T_D0/noval"
setfattr -n scoutfs.hide.indx.test.$MAJOR.$MINOR -v "" "$T_D0/noval" 2>&1 | t_filter_fs
setfattr -n scoutfs.hide.indx.test.$MAJOR.$MINOR -v 0 "$T_D0/noval" 2>&1 | t_filter_fs
setfattr -n scoutfs.hide.indx.test.$MAJOR.$MINOR -v 1 "$T_D0/noval" 2>&1 | t_filter_fs
rm -f "$T_D0/noval"
echo "== set indx xattr and verify index entry"
touch "$T_D0/file"
INO=$(stat -c "%i" "$T_D0/file")
setfattr -n scoutfs.hide.indx.test.$MAJOR.$MINOR "$T_D0/file"
read_xattr_index 0 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'") {print "found"}'
echo "== setting same indx xattr again is a no-op"
setfattr -n scoutfs.hide.indx.test.$MAJOR.$MINOR "$T_D0/file"
read_xattr_index 0 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'") {print "found"}'
echo "== removing non-existent indx xattr succeeds"
setfattr -x scoutfs.hide.indx.nonexistent.$MAJOR.999 "$T_D0/file" 2>&1 | t_filter_fs
read_xattr_index 0 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'") {print "still found"}'
echo "== explicit xattr removal cleans up index entry"
setfattr -x scoutfs.hide.indx.test.$MAJOR.$MINOR "$T_D0/file"
read_xattr_index 0 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'") {print "found orphan"}'
rm -f "$T_D0/file"
echo "== file deletion cleans up index entry"
touch "$T_D0/file2"
INO=$(stat -c "%i" "$T_D0/file2")
setfattr -n scoutfs.hide.indx.test.$MAJOR.$MINOR "$T_D0/file2"
read_xattr_index 0 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'") {print "found before delete"}'
rm -f "$T_D0/file2"
read_xattr_index 0 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'") {print "found orphan after delete"}'
echo "== multiple indx xattrs on one file cleaned up by deletion"
touch "$T_D0/file3"
INO=$(stat -c "%i" "$T_D0/file3")
setfattr -n scoutfs.hide.indx.a.$MAJOR.200 "$T_D0/file3"
setfattr -n scoutfs.hide.indx.b.$MAJOR.300 "$T_D0/file3"
BEFORE=$(read_xattr_index 0 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'")' | wc -l)
echo "entries before delete: $BEFORE"
rm -f "$T_D0/file3"
AFTER=$(read_xattr_index 0 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'")' | wc -l)
echo "entries after delete: $AFTER"
echo "== partial removal leaves other entries"
touch "$T_D0/partial"
INO=$(stat -c "%i" "$T_D0/partial")
setfattr -n scoutfs.hide.indx.a.$MAJOR.200 "$T_D0/partial"
setfattr -n scoutfs.hide.indx.b.$MAJOR.300 "$T_D0/partial"
setfattr -x scoutfs.hide.indx.a.$MAJOR.200 "$T_D0/partial"
read_xattr_index 0 $MAJOR.200.0 $MAJOR.200.-1 | awk '($3 == "'$INO'") {print "200 found"}'
read_xattr_index 0 $MAJOR.300.0 $MAJOR.300.-1 | awk '($3 == "'$INO'") {print "300 found"}'
rm -f "$T_D0/partial"
echo "== multiple files at same index position"
touch "$T_D0/multi_a" "$T_D0/multi_b"
INO_A=$(stat -c "%i" "$T_D0/multi_a")
INO_B=$(stat -c "%i" "$T_D0/multi_b")
setfattr -n scoutfs.hide.indx.test.$MAJOR.$MINOR "$T_D0/multi_a"
setfattr -n scoutfs.hide.indx.test.$MAJOR.$MINOR "$T_D0/multi_b"
COUNT=$(read_xattr_index 0 $MAJOR.$MINOR.0 $MAJOR.$MINOR.-1 | wc -l)
echo "files at same position: $COUNT"
rm -f "$T_D0/multi_a"
read_xattr_index 0 $MAJOR.$MINOR.0 $MAJOR.$MINOR.-1 | awk '($3 == "'$INO_A'") {print "deleted file still found"}'
read_xattr_index 0 $MAJOR.$MINOR.0 $MAJOR.$MINOR.-1 | awk '($3 == "'$INO_B'") {print "surviving file found"}'
rm -f "$T_D0/multi_b"
echo "== cross-mount visibility"
touch "$T_D0/file4"
INO=$(stat -c "%i" "$T_D0/file4")
setfattr -n scoutfs.hide.indx.test.$MAJOR.$MINOR "$T_D0/file4"
read_xattr_index 1 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'") {print "found on mount 1"}'
rm -f "$T_D0/file4"
read_xattr_index 1 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'") {print "found orphan on mount 1"}'
echo "== duplicate position deduplication"
touch "$T_D0/file5"
INO=$(stat -c "%i" "$T_D0/file5")
setfattr -n scoutfs.hide.indx.aa.$MAJOR.$MINOR "$T_D0/file5"
setfattr -n scoutfs.hide.indx.bb.$MAJOR.$MINOR "$T_D0/file5"
COUNT=$(read_xattr_index 0 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'")' | wc -l)
echo "entries for same position: $COUNT"
rm -f "$T_D0/file5"
t_pass
+70
View File
@@ -0,0 +1,70 @@
#
# Regression for the BUG_ON in scoutfs_quota_invalidate when a concurrent
# ruleset read on one mount races with a quota rule modification.
#
t_require_mounts 2
TEST_UID=22222
SET_UID="--ruid=$TEST_UID --euid=$TEST_UID"
echo "== setup"
mkdir -p "$T_D0/dir"
chown --quiet $TEST_UID "$T_D0/dir"
# totl xattr gives quota checks something to consult
setfattr -n scoutfs.totl.test.1.1.1 -v 1 "$T_D0/dir"
echo "== concurrent quota mod and check across mounts"
(
for i in $(seq 1 20); do
scoutfs quota-add -p "$T_M0" \
-r "1 1,L,- 1,L,- $i,L,- I 999999 -" 2>/dev/null
scoutfs quota-del -p "$T_M0" \
-r "1 1,L,- 1,L,- $i,L,- I 999999 -" 2>/dev/null
done
) &
MOD_PID=$!
# same mount as the mod: races local read against invalidate
(
for i in $(seq 1 50); do
setpriv $SET_UID touch "$T_D0/dir/race0_$i" 2>/dev/null
rm -f "$T_D0/dir/race0_$i"
done
) &
CHECK0_PID=$!
# other mount: drives cross-node lock traffic
(
for i in $(seq 1 50); do
setpriv $SET_UID touch "$T_D1/dir/race1_$i" 2>/dev/null
rm -f "$T_D1/dir/race1_$i"
done
) &
CHECK1_PID=$!
t_quiet wait $MOD_PID
t_quiet wait $CHECK0_PID
t_quiet wait $CHECK1_PID
echo "== verify quota rules are consistent after race"
scoutfs quota-wipe -p "$T_M0"
scoutfs quota-list -p "$T_M0"
echo "== verify file creation still works under quota"
scoutfs quota-add -p "$T_M0" -r "1 1,L,- 1,L,- 1,L,- I 999999 -"
sync
echo 1 > $(t_debugfs_path)/drop_weak_item_cache
echo 1 > $(t_debugfs_path)/drop_quota_check_cache
setpriv $SET_UID touch "$T_D0/dir/verify_file"
test -f "$T_D1/dir/verify_file" && echo "file visible on mount 1"
rm -f "$T_D0/dir/verify_file"
scoutfs quota-wipe -p "$T_M0"
echo "== cleanup"
setfattr -x scoutfs.totl.test.1.1.1 "$T_D0/dir"
rm -rf "$T_D0/dir"
t_pass
+43
View File
@@ -0,0 +1,43 @@
#
# Exercise the SCOUTFS_IOC_INJECT_TOTL_DELTA ioctl that injects totl
# deltas directly via totl-delta-inject(1).
#
t_require_commands setfattr scoutfs sync rm touch totl-delta-inject
# force a log merge then read-xattr-totals filtered to our own keys
read_totals()
{
t_force_log_merge
sync
echo 1 > $(t_debugfs_path)/drop_weak_item_cache
scoutfs read-xattr-totals -p "$T_M0" | \
grep -E '^8888\.' || true
}
echo "== setup three files contributing to totl 8888.0.0"
touch "$T_D0/f1" "$T_D0/f2" "$T_D0/f3"
setfattr -n scoutfs.totl.inj.8888.0.0 -v 2 "$T_D0/f1"
setfattr -n scoutfs.totl.inj.8888.0.0 -v 8 "$T_D0/f2"
setfattr -n scoutfs.totl.inj.8888.0.0 -v 32 "$T_D0/f3"
echo "== merge baseline into fs_root"
read_totals
echo "== inject (+128, +2) unbalances totl 8888.0.0"
totl-delta-inject "$T_M0" 8888.0.0 128 2
read_totals
echo "== unlink f3 (value 32) produces a -32/-1 delta"
rm -f "$T_D0/f3"
read_totals
echo "== inject (-128, -2) restores accounting for the remaining files"
totl-delta-inject "$T_M0" 8888.0.0 -128 -2
read_totals
echo "== cleanup"
rm -f "$T_D0/f1" "$T_D0/f2"
read_totals
t_pass
+50
View File
@@ -0,0 +1,50 @@
#
# Test that merge_read_item() correctly updates the sequence number when
# combining delta items from multiple finalized log trees. Each mount
# sets a totl value in its own 3-bit lane (powers of 8) so that any
# double-counting overflows the lane and is caught by: or(v, exp) != exp.
#
t_require_commands setfattr scoutfs
t_require_mounts 5
echo "== setup"
for nr in $(t_fs_nrs); do
d=$(eval echo \$T_D$nr)
for i in $(seq 1 2500); do : > "$d/f$nr$i"; done
done
sync
t_force_log_merge
vals=(1 8 64 512 4096)
expected=4681
n=0
for nr in $(t_fs_nrs); do
d=$(eval echo \$T_D$nr)
v=${vals[$((n++))]}
for i in $(seq 1 2500); do
setfattr -n "scoutfs.totl.t.$i.0.0" -v $v "$d/f$nr$i"
done
done
t_trigger_arm_silent log_merge_force_partial $(t_server_nr)
bad="$T_TMPDIR/bad"
for nr in $(t_fs_nrs); do
( while true; do
echo 1 > "$(t_debugfs_path $nr)/drop_weak_item_cache"
scoutfs read-xattr-totals -p "$(eval echo \$T_M$nr)" | \
awk -F'[ =,]+' -v e=$expected 'or($2+0,e) != e'
done ) >> "$bad" &
done
echo "expected $expected"
t_force_log_merge
t_silent_kill $(jobs -p)
test -s "$bad" && echo "double-counted:" && cat "$bad"
echo "== cleanup"
for nr in $(t_fs_nrs); do
find "$(eval echo \$T_D$nr)" -name "f$nr*" -delete
done
t_pass
+22 -11
View File
@@ -160,15 +160,16 @@ int parse_timespec(char *str, struct timespec *ts)
* Parse a quorum slot specification string "NR,ADDR,PORT" into its
* component parts. We use sscanf to both parse the leading NR and
* trailing PORT integers, and to pull out the inner ADDR string which
* is then parsed to make sure that it's a valid unicast ipv4 address.
* is then parsed to make sure that it's a valid unicast ip address.
* We require that all components be specified, and sccanf will check
* this by the number of matches it returns.
*/
int parse_quorum_slot(struct scoutfs_quorum_slot *slot, char *arg)
{
#define ADDR_CHARS 45 /* max ipv6 */
char addr[ADDR_CHARS + 1] = {'\0',};
#define ADDR_CHARS 45 /* (INET6_ADDRSTRLEN - 1) */
char addr[INET6_ADDRSTRLEN] = {'\0',};
struct in_addr in;
struct in6_addr in6;
int port;
int parsed;
int nr;
@@ -206,15 +207,25 @@ int parse_quorum_slot(struct scoutfs_quorum_slot *slot, char *arg)
return -EINVAL;
}
if (inet_aton(addr, &in) == 0 || htonl(in.s_addr) == 0 ||
htonl(in.s_addr) == UINT_MAX) {
printf("invalid ipv4 address '%s' in quorum slot '%s'\n",
addr, arg);
return -EINVAL;
if (inet_pton(AF_INET, addr, &in) == 1) {
if (htonl(in.s_addr) == 0 || htonl(in.s_addr) == UINT_MAX) {
printf("invalid ipv4 address '%s' in quorum slot '%s'\n",
addr, arg);
return -EINVAL;
}
slot->addr.v4.family = cpu_to_le16(SCOUTFS_AF_IPV4);
slot->addr.v4.addr = cpu_to_le32(htonl(in.s_addr));
slot->addr.v4.port = cpu_to_le16(port);
} else if (inet_pton(AF_INET6, addr, &in6) == 1) {
if (IN6_IS_ADDR_UNSPECIFIED(&in6) || IN6_IS_ADDR_MULTICAST(&in6)) {
printf("invalid ipv6 address '%s' in quorum slot '%s'\n",
addr, arg);
return -EINVAL;
}
slot->addr.v6.family = cpu_to_le16(SCOUTFS_AF_IPV6);
memcpy(slot->addr.v6.addr, &in6, 16);
slot->addr.v6.port = cpu_to_le16(port);
}
slot->addr.v4.family = cpu_to_le16(SCOUTFS_AF_IPV4);
slot->addr.v4.addr = cpu_to_le32(htonl(in.s_addr));
slot->addr.v4.port = cpu_to_le16(port);
return nr;
}
+40 -17
View File
@@ -28,6 +28,7 @@
#include "srch.h"
#include "leaf_item_hash.h"
#include "dev.h"
#include "quorum.h"
static void print_block_header(struct scoutfs_block_header *hdr, int size)
{
@@ -400,12 +401,20 @@ static int print_mounted_client_entry(struct scoutfs_key *key, u64 seq, u8 flags
{
struct scoutfs_mounted_client_btree_val *mcv = val;
struct in_addr in;
char ip6addr[INET6_ADDRSTRLEN];
memset(&in, 0, sizeof(in));
in.s_addr = htonl(le32_to_cpu(mcv->addr.v4.addr));
if (mcv->addr.v4.family == cpu_to_le16(SCOUTFS_AF_IPV4)) {
in.s_addr = htonl(le32_to_cpu(mcv->addr.v4.addr));
printf(" rid %016llx ipv4_addr %s flags 0x%x\n",
le64_to_cpu(key->skmc_rid), inet_ntoa(in), mcv->flags);
printf(" rid %016llx ipv4_addr %s flags 0x%x\n",
le64_to_cpu(key->skmc_rid), inet_ntoa(in), mcv->flags);
} else if (mcv->addr.v6.family == cpu_to_le16(SCOUTFS_AF_IPV6)) {
printf(" rid %016llx ipv6_addr %s flags 0x%x\n",
le64_to_cpu(key->skmc_rid),
inet_ntop(AF_INET, mcv->addr.v6.addr, ip6addr, INET6_ADDRSTRLEN),
mcv->flags);
}
return 0;
}
@@ -891,26 +900,40 @@ static int print_btree_leaf_items(int fd, struct scoutfs_super_block *super,
static char *alloc_addr_str(union scoutfs_inet_addr *ia)
{
struct in_addr addr;
char ip6addr[INET6_ADDRSTRLEN];
char *quad;
char *str;
int len;
memset(&addr, 0, sizeof(addr));
addr.s_addr = htonl(le32_to_cpu(ia->v4.addr));
quad = inet_ntoa(addr);
if (quad == NULL)
return NULL;
if (le16_to_cpu(ia->v4.family) == SCOUTFS_AF_IPV4) {
memset(&addr, 0, sizeof(addr));
addr.s_addr = htonl(le32_to_cpu(ia->v4.addr));
quad = inet_ntoa(addr);
if (quad == NULL)
return NULL;
len = snprintf(NULL, 0, "%s:%u", quad, le16_to_cpu(ia->v4.port));
if (len < 1 || len > 22)
return NULL;
len = snprintf(NULL, 0, "%s:%u", quad, le16_to_cpu(ia->v4.port));
if (len < 1 || len > 22)
return NULL;
len++; /* null */
str = malloc(len);
if (!str)
return NULL;
len++; /* null */
str = malloc(len);
if (!str)
return NULL;
snprintf(str, len, "%s:%u", quad, le16_to_cpu(ia->v4.port));
snprintf(str, len, "%s:%u", quad, le16_to_cpu(ia->v4.port));
} else if (le16_to_cpu(ia->v6.family) == SCOUTFS_AF_IPV6) {
if (inet_ntop(AF_INET6, ia->v6.addr, ip6addr, INET6_ADDRSTRLEN) == NULL)
return NULL;
len = strlen(ip6addr) + 9; /* "[]:\0" (4) plus max strlen(u16) (5) */
str = malloc(len);
if (!str)
return NULL;
snprintf(str, len, "[%s]:%u", ip6addr, le16_to_cpu(ia->v6.port));
} else
return NULL;
return str;
}
@@ -1026,7 +1049,7 @@ static void print_super_block(struct scoutfs_super_block *super, u64 blkno)
printf(" quorum config version %llu\n",
le64_to_cpu(super->qconf.version));
for (i = 0; i < array_size(super->qconf.slots); i++) {
if (super->qconf.slots[i].addr.v4.family != cpu_to_le16(SCOUTFS_AF_IPV4))
if (!quorum_slot_present(super, i))
continue;
addr = alloc_addr_str(&super->qconf.slots[i].addr);
+56 -29
View File
@@ -10,7 +10,8 @@
bool quorum_slot_present(struct scoutfs_super_block *super, int i)
{
return super->qconf.slots[i].addr.v4.family == cpu_to_le16(SCOUTFS_AF_IPV4);
return ((super->qconf.slots[i].addr.v4.family == cpu_to_le16(SCOUTFS_AF_IPV4)) ||
(super->qconf.slots[i].addr.v6.family == cpu_to_le16(SCOUTFS_AF_IPV6)));
}
bool valid_quorum_slots(struct scoutfs_quorum_slot *slots)
@@ -18,35 +19,57 @@ bool valid_quorum_slots(struct scoutfs_quorum_slot *slots)
struct in_addr in;
bool valid = true;
char *addr;
char ip6addr[INET6_ADDRSTRLEN];
__le16 family = cpu_to_le16(SCOUTFS_AF_NONE);
int i;
int j;
for (i = 0; i < SCOUTFS_QUORUM_MAX_SLOTS; i++) {
if (slots[i].addr.v4.family == cpu_to_le16(SCOUTFS_AF_NONE))
continue;
if (slots[i].addr.v4.family == cpu_to_le16(SCOUTFS_AF_IPV4)) {
if (family == cpu_to_le16(SCOUTFS_AF_NONE)) {
family = cpu_to_le16(SCOUTFS_AF_IPV4);
} else if (family != cpu_to_le16(SCOUTFS_AF_IPV4)) {
fprintf(stderr, "quorum slot nr %u is IPv4 but earlier slots are IPv6; mixed IPv4/IPv6 quorum is not supported\n",
i);
valid = false;
}
if (slots[i].addr.v4.family != cpu_to_le16(SCOUTFS_AF_IPV4)) {
for (j = i + 1; j < SCOUTFS_QUORUM_MAX_SLOTS; j++) {
if (slots[i].addr.v4.addr == slots[j].addr.v4.addr &&
slots[i].addr.v4.port == slots[j].addr.v4.port) {
in.s_addr =
htonl(le32_to_cpu(slots[i].addr.v4.addr));
addr = inet_ntoa(in);
fprintf(stderr, "quorum slot nr %u and %u have the same address %s:%u\n",
i, j, addr,
le16_to_cpu(slots[i].addr.v4.port));
valid = false;
}
}
} else if (slots[i].addr.v6.family == cpu_to_le16(SCOUTFS_AF_IPV6)) {
if (family == cpu_to_le16(SCOUTFS_AF_NONE)) {
family = cpu_to_le16(SCOUTFS_AF_IPV6);
} else if (family != cpu_to_le16(SCOUTFS_AF_IPV6)) {
fprintf(stderr, "quorum slot nr %u is IPv6 but earlier slots are IPv4; mixed IPv4/IPv6 quorum is not supported\n",
i);
valid = false;
}
for (j = i + 1; j < SCOUTFS_QUORUM_MAX_SLOTS; j++) {
if ((IN6_ARE_ADDR_EQUAL(slots[i].addr.v6.addr, slots[j].addr.v6.addr)) &&
(slots[i].addr.v6.port == slots[j].addr.v6.port)) {
fprintf(stderr, "quorum slot nr %u and %u have the same address [%s]:%u\n",
i, j,
inet_ntop(AF_INET6, slots[i].addr.v6.addr, ip6addr, INET6_ADDRSTRLEN),
le16_to_cpu(slots[i].addr.v6.port));
valid = false;
}
}
} else if (slots[i].addr.v6.family != cpu_to_le16(SCOUTFS_AF_NONE)) {
fprintf(stderr, "quorum slot nr %u has invalid family %u\n",
i, le16_to_cpu(slots[i].addr.v4.family));
valid = false;
}
for (j = i + 1; j < SCOUTFS_QUORUM_MAX_SLOTS; j++) {
if (slots[i].addr.v4.family != cpu_to_le16(SCOUTFS_AF_IPV4))
continue;
if (slots[i].addr.v4.addr == slots[j].addr.v4.addr &&
slots[i].addr.v4.port == slots[j].addr.v4.port) {
in.s_addr =
htonl(le32_to_cpu(slots[i].addr.v4.addr));
addr = inet_ntoa(in);
fprintf(stderr, "quorum slot nr %u and %u have the same address %s:%u\n",
i, j, addr,
le16_to_cpu(slots[i].addr.v4.port));
valid = false;
}
}
}
return valid;
@@ -61,19 +84,23 @@ void print_quorum_slots(struct scoutfs_quorum_slot *slots, int nr, char *indent)
{
struct scoutfs_quorum_slot *sl;
struct in_addr in;
char ip6addr[INET6_ADDRSTRLEN];
bool first = true;
int i;
for (i = 0, sl = slots; i < SCOUTFS_QUORUM_MAX_SLOTS; i++, sl++) {
if (sl->addr.v4.family == cpu_to_le16(SCOUTFS_AF_IPV4)) {
in.s_addr = htonl(le32_to_cpu(sl->addr.v4.addr));
printf("%s%u: %s:%u\n", first ? "" : indent,
i, inet_ntoa(in), le16_to_cpu(sl->addr.v4.port));
if (sl->addr.v4.family != cpu_to_le16(SCOUTFS_AF_IPV4))
continue;
in.s_addr = htonl(le32_to_cpu(sl->addr.v4.addr));
printf("%s%u: %s:%u\n", first ? "" : indent,
i, inet_ntoa(in), le16_to_cpu(sl->addr.v4.port));
first = false;
first = false;
} else if (sl->addr.v6.family == cpu_to_le16(SCOUTFS_AF_IPV6)) {
printf("%s%u: [%s]:%u\n", first ? "" : indent, i,
inet_ntop(AF_INET6, sl->addr.v6.addr, ip6addr, INET6_ADDRSTRLEN),
le16_to_cpu(sl->addr.v6.port));
first = false;
}
}
}