Compare commits

43 Commits

Author SHA1 Message Date
Auke Kok
d63b608658 Use spin_lock_bh on recinf->lock to fix softirq deadlock
timer_callback() runs in softirq context and acquires recinf->lock,
but the process-context callers (scoutfs_recov_prepare, _begin,
_finish, _is_pending, _next_pending, _shutdown) were taking the
same lock with plain spin_lock(), leaving softirqs enabled. Found
by Lockdep:

```
	================================
	WARNING: inconsistent lock state
	5.14.0-427.35.1.el9_4.x86_64+debug #1 Tainted: G           OE     -------  ---
	--------------------------------
	inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
	swapper/2/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
	ffff88813cdd9c20 (&recinf->lock){+.?.}-{2:2}, at: timer_callback+0x26/0x380 [scoutfs]
	{SOFTIRQ-ON-W} state was registered at:
	  __lock_acquire+0x7d0/0x1900
	  lock_acquire+0x1da/0x640
	  _raw_spin_lock+0x34/0x80
	  scoutfs_recov_finish+0x80/0x830 [scoutfs]
	  server_greeting+0x244/0xe60 [scoutfs]
	  scoutfs_net_proc_worker+0x28a/0xce0 [scoutfs]
	  recv_one_message+0x7e3/0xd10 [scoutfs]
	  scoutfs_net_recv_worker+0x441/0xe00 [scoutfs]
	  process_one_work+0x8e5/0x1530
	  worker_thread+0x598/0xf70
	  kthread+0x2a4/0x350
	  ret_from_fork+0x29/0x50
	irq event stamp: 549813370
	hardirqs last  enabled at (549813370): [<ffffffffabe25cb4>] _raw_spin_unlock_irq+0x24/0x50
	hardirqs last disabled at (549813369): [<ffffffffabe2594e>] _raw_spin_lock_irq+0x5e/0x90
	softirqs last  enabled at (549813356): [<ffffffffabe28c91>] __do_softirq+0x621/0x9c2
	softirqs last disabled at (549813363): [<ffffffffa9a44665>] __irq_exit_rcu+0x185/0x230

	other info that might help us debug this:
	 Possible unsafe locking scenario:
	       CPU0
	       ----
	  lock(&recinf->lock);
	  <Interrupt>
	    lock(&recinf->lock);

	 *** DEADLOCK ***
```

Convert the six process-context sites to spin_lock_bh()/spin_unlock_bh().

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-05-12 11:26:55 -07:00
Zach Brown
fece0a9372 Merge pull request #310 from versity/zab/v1.31
v1.31 Release
2026-05-06 10:37:07 -07:00
Zach Brown
aa432727f2 v1.31 Release
Finish the release notes for the 1.31 release.

Signed-off-by: Zach Brown <zab@versity.com>
2026-05-05 14:29:18 -07:00
Zach Brown
ceebadd139 Merge pull request #308 from versity/auke/totl-delta-repair
totl key repair
2026-05-05 13:05:57 -07:00
Zach Brown
4b4ddc9ded Merge pull request #298 from versity/auke/double_unlock_dw_truncate
Fix double unlock in scoutfs_setattr data_wait error path
2026-05-04 09:52:29 -07:00
Zach Brown
94d3ece590 Merge pull request #299 from versity/auke/cond_resched_block_free
Add cond_resched in block_free_work
2026-05-04 09:49:43 -07:00
Auke Kok
6d5517614b Fix double unlock in scoutfs_setattr data_wait error path
When scoutfs_setattr truncates a file with offline extents, it unlocks
the inode lock before calling scoutfs_data_wait to wait for the data
to be staged. If data_wait returns any error, the code jumps to 'out',
which calls scoutfs_unlock again, thus double-unlocking the lock.

Set lock to NULL after the early unlock so that the exit path doesn't
unlock it a second time.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-05-04 09:48:54 -07:00
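The double unlock described in the commit above can be sketched in user-space C. This is a toy model, not the scoutfs code: `toy_lock`, `toy_unlock` and `toy_setattr` are hypothetical names, and the only property borrowed from the log is that unlocking a NULL lock is a safe no-op.

```c
#include <stddef.h>

/* Hypothetical stand-in for a cluster lock: counts outstanding holds. */
struct toy_lock {
	int holds;
};

static void toy_lock_acquire(struct toy_lock *lock)
{
	lock->holds++;
}

/* As the log notes for the real unlock, a NULL lock is a no-op. */
static void toy_unlock(struct toy_lock *lock)
{
	if (lock)
		lock->holds--;
}

/*
 * Models an operation that drops its lock before a blocking wait and
 * leaves through a shared exit label on error.  Returns the hold count
 * at exit: 0 is balanced, a negative value means a double unlock.
 */
static int toy_setattr(struct toy_lock *backing, int wait_err, int clear_ptr)
{
	struct toy_lock *lock = backing;

	backing->holds = 0;
	toy_lock_acquire(lock);

	/* drop the lock before waiting */
	toy_unlock(lock);
	if (clear_ptr)
		lock = NULL;	/* forget the lock we already released */

	if (wait_err)
		goto out;

	/* success: a real caller would relock and retry; the sketch stops */
	return backing->holds;
out:
	toy_unlock(lock);
	return backing->holds;
}
```

Calling `toy_setattr(&l, -5, 0)` models the old error path and leaves the count unbalanced, while passing `clear_ptr = 1` keeps it balanced.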
Auke Kok
10279d0b23 Add test exercising the totl delta inject ioctl.
Skews a totl twice, restores it, and intersperses setfattr/unlink to
exercise both injected and naturally-produced deltas.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-05-04 09:43:01 -07:00
Zach Brown
443c34309f Merge pull request #303 from versity/auke/clang_build_werr
3 minor clang things
2026-05-04 09:42:43 -07:00
Auke Kok
5c81a979d5 Add SCOUTFS_IOC_INJECT_TOTL_DELTA ioctl.
Inject a signed (total, count) delta at a totl key.  No validity
checking.  Requires CAP_SYS_ADMIN.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-05-04 09:42:42 -07:00
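The ioctl above takes signed deltas but stores them in unsigned 64-bit fields. A minimal sketch of why that works, assuming deltas are combined with ordinary wrapping unsigned addition (the helper name `toy_apply_delta` is hypothetical and this is not the scoutfs combining code):

```c
#include <stdint.h>

/*
 * Apply a signed delta to an unsigned total.  Converting a negative
 * int64_t to uint64_t yields its value modulo 2^64, so the wrapping
 * unsigned addition subtracts the magnitude of the delta.
 */
static uint64_t toy_apply_delta(uint64_t total, int64_t delta)
{
	return total + (uint64_t)delta;
}
```

With a total of 10, a delta of -3 gives 7 and a delta of 5 gives 15. Nothing in this arithmetic stops a delta from taking the total below zero, which matches the "No validity checking" note in the commit message.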
Zach Brown
ec38b6e1c8 Merge pull request #305 from versity/auke/block_submit_bio_err
Set BLOCK_BIT_ERROR on bio submit failure during forced unmount
2026-05-04 09:35:43 -07:00
Zach Brown
8e0066b231 Merge pull request #309 from versity/auke/quota_invalidate_race
fix and test - quota invalidate race
2026-05-04 09:34:26 -07:00
Zach Brown
a0fda5b735 Merge pull request #307 from versity/zab/next_merge_range_zero
Search all merge range items for next
2026-05-04 09:29:54 -07:00
Auke Kok
fc56a69d8f Add quota invalidate race regression test
Run concurrent quota add/del on one mount against rapid file
creation and deletion on both mounts to exercise the race fixed
in the previous commit.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-05-02 13:19:31 -07:00
Auke Kok
c8bc42ccdb Fix quota invalidate race with concurrent ruleset read
A quota check holds the quota cluster lock for READ and marks the
cached ruleset EBUSY while loading rules.  A quota mod on the same
mount holds the lock for WRITE (compatible with the local READ)
and calls scoutfs_quota_invalidate(), tripping
BUG_ON(rs == ERR_PTR(-EBUSY)).

Make invalidate skip EBUSY so the reader's claim is preserved, and
have scoutfs_quota_mod_rule wait for the reader to finish before
calling invalidate.  Without the wait, the in-flight reader would
publish its stale ruleset after invalidate runs, leaving the cache
stale until the next invalidation.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-05-02 13:19:31 -07:00
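The ordering problem in the commit above can be modeled with a small state machine. This is a sketch under assumptions: the enum values are hypothetical stand-ins for the ERR_PTR values in the quota code, the meaning given to the -ENOENT state ("loaded, no rules") is a guess, and the reader is assumed to publish only while the cache is still marked busy.

```c
/* Hypothetical cache states standing in for the ruleset pointer values. */
enum toy_ruleset_state {
	TOY_RS_INVALID,	/* must be reloaded on the next check */
	TOY_RS_BUSY,	/* a reader is loading the rules right now */
	TOY_RS_EMPTY,	/* assumed meaning of -ENOENT: loaded, no rules */
	TOY_RS_LOADED,	/* a ruleset is cached */
};

/* Invalidate leaves BUSY alone so the reader's claim is preserved. */
static enum toy_ruleset_state toy_invalidate(enum toy_ruleset_state cur)
{
	if (cur == TOY_RS_EMPTY || cur == TOY_RS_LOADED)
		return TOY_RS_INVALID;
	return cur;
}

/* A finishing reader replaces its BUSY claim with what it loaded. */
static enum toy_ruleset_state toy_reader_publish(enum toy_ruleset_state cur,
						 enum toy_ruleset_state loaded)
{
	return cur == TOY_RS_BUSY ? loaded : cur;
}
```

Invalidating first and publishing second leaves `TOY_RS_LOADED` in the cache, which is the stale result the commit message warns about. Publishing first and invalidating second ends in `TOY_RS_INVALID`, which is what waiting for the reader achieves.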
Zach Brown
4db0a48fe4 Search all merge range items for next
When searching for the next least merge range we need to sweep all the
stored items because they're interleaved with respect to key sorting
because we've clobbered the zone.

To search all of them we need to start from 0, not from the caller's
start key after setting the zone.  If the caller happens to provide a
start key with a small zone but large other fields (totl keys with
sufficiently large identifiers) we can miss ranges.

Signed-off-by: Zach Brown <zab@zabbo.net>
2026-04-29 10:17:38 -07:00
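A toy model of the missed-range case described in the commit above. It assumes stored range items are indexed only by the non-zone part of their start key (because the zone was overwritten), which is a simplification; `toy_key`, `toy_range` and `toy_next_range` are hypothetical names, not scoutfs structures.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_key {
	uint8_t zone;
	uint64_t id;
};

/* Full key order: zone first, then id. */
static int toy_key_cmp(const struct toy_key *a, const struct toy_key *b)
{
	if (a->zone != b->zone)
		return a->zone < b->zone ? -1 : 1;
	if (a->id != b->id)
		return a->id < b->id ? -1 : 1;
	return 0;
}

/* A stored range, indexed in the toy only by start.id. */
struct toy_range {
	struct toy_key start;
	struct toy_key end;
};

/*
 * Return the index of the range with the least start key whose end is
 * not before @from, or -1.  Only items whose index id is at least
 * @first_id are visited, modeling where the sweep begins.
 */
static int toy_next_range(const struct toy_range *rngs, size_t nr,
			  const struct toy_key *from, uint64_t first_id)
{
	int best = -1;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (rngs[i].start.id < first_id)
			continue;	/* never reached by the sweep */
		if (toy_key_cmp(&rngs[i].end, from) < 0)
			continue;	/* range lies entirely before @from */
		if (best < 0 ||
		    toy_key_cmp(&rngs[i].start, &rngs[best].start) < 0)
			best = (int)i;
	}
	return best;
}
```

With one range spanning (zone 3, id 10) to (zone 3, id 20) and a start key of (zone 1, id 1000000), beginning the sweep at the start key's id finds nothing, while beginning at 0 finds the range.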
Auke Kok
ac1ab8e87f Add cond_resched in block_free_work
I'm seeing consistent CPU soft lockups in block_free_work on
my bare metal system that VM instances don't hit. The bare
metal machine has far more memory available, so the block free
work queue grows much larger, and it can take 30+ seconds to
get through all of the queued work.

This is all with a debug kernel. A non debug kernel will likely
zoom through the outstanding work here at a much faster rate.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-22 13:39:32 -07:00
Zach Brown
af31b9f1e8 Merge pull request #306 from versity/zab/v1.30
v1.30 Release
2026-04-22 10:43:17 -07:00
Zach Brown
ad65116d8f v1.30 Release
Finish the release notes for the 1.30 release.

Signed-off-by: Zach Brown <zab@versity.com>
2026-04-21 16:43:12 -07:00
Auke Kok
8bfd35db0b Set BLOCK_BIT_ERROR on bio submit failure during forced unmount
block_submit_bio will return -ENOLINK if called during a forced
shutdown, the bio is never submitted, and thus no completion callback
will fire to set BLOCK_BIT_ERROR. Any other task waiting for this
specific bp will end up waiting forever.

To fix, fall through to the existing block_end_io call on the
error path instead of returning directly.  That means moving
the forcing_unmount check past the setup calls so block_end_io's
bookkeeping stays balanced. block_end_io then sets BLOCK_BIT_ERROR
and wakes up waiters just as it would on a failed async completion.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-20 17:01:12 -07:00
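The control-flow change described above can be sketched in user-space C. This is a simplified model with hypothetical names (`toy_block`, `toy_submit`, `toy_end_io`); unlike the real code, it completes synchronously and has no wait queue.

```c
#include <errno.h>

enum {
	TOY_BIT_IO_BUSY = 1 << 0,
	TOY_BIT_ERROR   = 1 << 1,
};

struct toy_block {
	unsigned int bits;
	int refs;
};

/* Completion path: records errors and undoes the submit-time setup. */
static void toy_end_io(struct toy_block *bp, int err)
{
	if (err)
		bp->bits |= TOY_BIT_ERROR;
	bp->bits &= ~(unsigned int)TOY_BIT_IO_BUSY;
	bp->refs--;
}

/*
 * @early_return selects the old behaviour, where a forced unmount
 * returns before setup and no completion ever marks the block.
 */
static int toy_submit(struct toy_block *bp, int forcing_unmount,
		      int early_return)
{
	int ret = 0;

	if (forcing_unmount && early_return)
		return -ENOLINK;

	/* setup that the completion path is expected to balance */
	bp->bits |= TOY_BIT_IO_BUSY;
	bp->refs++;

	if (forcing_unmount) {
		ret = -ENOLINK;
		goto end_io;
	}

	/* a real implementation would build and submit bios here */
end_io:
	toy_end_io(bp, ret);
	return ret;
}
```

In the early-return variant the block never gains `TOY_BIT_ERROR`, so anything waiting on it has nothing to observe. Checking after the setup keeps the reference and busy-bit bookkeeping balanced when the error is routed through `toy_end_io`.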
Zach Brown
e20765a9c7 Merge pull request #300 from versity/auke/more_false_positive_failures
Auke/more false positive failures: xfs lockdep miss, newline
2026-04-17 09:17:50 -07:00
Zach Brown
066da5c2a2 Merge pull request #297 from versity/auke/quota_mod_trans_hold
Hold transaction in scoutfs_quota_mod_rule to prevent alloc corruption.
2026-04-17 09:16:41 -07:00
Auke Kok
7eacc7139c Hold transaction in scoutfs_quota_mod_rule to prevent alloc corruption.
scoutfs_quota_mod_rule calls scoutfs_item_create/delete which use
the transaction allocator but it never held it. Without the hold,
a concurrent transaction commit can call scoutfs_alloc_init to
reinitialize the allocator while dirty_alloc_blocks is in the middle
of setting up the freed list block. This overwrites alloc->freed with
the server's fresh (empty) state, causing a blkno mismatch BUG_ON
in list_block_add.

Reproduced by stressing concurrent quota add/del operations across
mounts. Crashdump analysis confirms dirty_list_block COW'd a freed
block (fr_old=9842, new blkno=9852) but by the time list_block_add
ran, freed.ref.blkno was 0 with first_nr=0 and total_nr=0: the freed
list head had been zeroed by a concurrent alloc_init.

Fix by adding scoutfs_hold_trans/scoutfs_release_trans around the
item modification in scoutfs_quota_mod_rule, preventing transaction
commit from racing with the allocator use.

Rename the 'unlock' label to 'release' since 'out' now directly
does the unlock. The unlock safely handles a NULL lock.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-16 16:20:47 -07:00
Auke Kok
019125d86d Don't swallow invalid message error
A malformed message encountered here increments the counter, but doesn't
tear down the connection because the break only leaves the inner of the
nested for loops. The comments indicate that tearing down the connection
is the expected behavior: a misbehaving client should not be tolerated.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-15 17:02:40 -07:00
Auke Kok
347e27acec Fix leak in client side lock invalidation
Clang's scan-build found this leak when we get an invalidation
for a lock we no longer have. Free ireq to fix.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-15 16:35:10 -07:00
Auke Kok
3ce5d47f2c Initialize resp_data to silence clang uninitialized warning
Clang flow analysis flags resp_data in process_response as possibly
uninitialized when find_request returns NULL.

  kmod/src/net.c:533:6: error: variable 'resp_data' is used uninitialized
  whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]

In practice the read is harmless because resp_func stays NULL in that
path and call_resp_func only dereferences resp_data when resp_func is
non-NULL. Initialize at declaration.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-15 14:06:46 -07:00
Auke Kok
9e3b01b3b4 Filter newlines out dmesg.new
Filter empty lines out of dmesg.new so that they don't trigger a test
failure, without filtering dmesg too broadly. I don't want to overly
process dmesg, so do this as late as possible.

The xfs lockdep patterns can forget a leading/trailing empty line,
causing a failure despite the explicit removal of the lockdep
false positive.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-15 10:36:28 -07:00
Auke Kok
876c233f06 Ignore another xfs lockdep class
This already caught xfs_nondir_ilock_class, but recent CI runs
have been hitting xfs_dir_ilock_class, too.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-15 10:36:28 -07:00
Zach Brown
6aa5876c71 Merge pull request #301 from versity/auke/el7_uninit_read_seq
Squelch gcc uninitialized warning on el7
2026-04-15 09:58:23 -07:00
Auke Kok
7a9f9ec698 Squelch gcc uninitialized warning on el7
The gcc version in el7 can't determine that scoutfs_block_check_stale
won't return ret = 0 when the input ret value is < 0, and
errors because we might call alloc_wpage with an uninitialized
read_seq. Initialize it to 0 to avoid it.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-14 15:09:20 -04:00
Zach Brown
fc0fc1427f Merge pull request #296 from versity/auke/indx_key_delete
Fix indx delete using wrong xid, leaving orphans. && Add basic-xattr-indx tests.
2026-04-13 14:34:37 -07:00
Zach Brown
ec68845201 Merge pull request #289 from versity/auke/merge_read_item_stale_seq
Update seq when merging deltas from partial log merge.
2026-04-13 14:10:37 -07:00
Auke Kok
5e2009f939 Avoid double counting deltas from non-input finalized log trees.
Readers currently accumulate all finalized log tree deltas into
a single bucket for deciding whether they are already in fs_root
or not. However, finalized trees that aren't inputs to the current
merge will have higher seqs, and thus we may double-apply deltas
already merged into fs_root.

To distinguish, scoutfs_totl_merge_contribute() needs to know the
merge status item seq.  We change wkic's get_roots() from using the
SCOUTFS_NET_CMD_GET_ROOTS RPC to reading the superblock directly.
This is needed because totl merge resolution has to use the same data
as the btree roots it is operating on, thus we can't grab it from a
SCOUTFS_NET_CMD_GET_ROOTS packet - it likely is different.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-10 13:50:21 -07:00
Auke Kok
8bdc20af21 Rename/reword FINALIZED to MERGE_INPUT.
These mislabeled members and enums did not describe the actual
data being handled and obscured the intent of not mixing merge
input items with non-merge input items.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-10 13:50:21 -07:00
Auke Kok
857a39579e Clear roots when retrying due to stale btree blocks.
Before deltas were added this code path was correct, but with
deltas we can't just retry this without clearing &root, since
it would potentially double count.

The condition where this could happen is when there are deltas in
several finalized log trees, and we've made progress towards reading
some of them, and then encounter a stale btree block. The retry
would not clear the collected trees, apply the same delta as was
already applied before the retry, and thus double count.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-10 13:50:21 -07:00
Auke Kok
38d36c9f5c Update seq when merging deltas from partial log merge.
Two different clients can write deltas for totl indexes at the same
time, recording their changes. When merged, a reader should apply both
in order, and only once. To do so, the seq determines whether the delta
has been applied already.

The code fails to update the seq while walking the trees for deltas to
apply. Later, when processing subsequent trees, it could
re-process deltas already applied. In case of a large negative delta
(e.g. removal of large amounts of files), the totl value could become
negative, resulting in quota lockout.

The fix is simple: advance the seq when reading partial delta merges
to avoid double counting.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-10 13:50:21 -07:00
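A toy model of the stale-seq double count fixed above. The reader rule here (apply a logged delta only when its seq is newer than the merged item's seq) is a simplification of the real resolution logic, and `toy_item`, `toy_combine` and `toy_read` are hypothetical names.

```c
#include <stdint.h>

struct toy_item {
	uint64_t seq;
	int64_t total;
};

/*
 * Fold a delta into an already-found item.  @advance_seq selects the
 * fixed behaviour, where the item remembers the newest seq it holds.
 */
static void toy_combine(struct toy_item *found, uint64_t seq, int64_t delta,
			int advance_seq)
{
	found->total += delta;
	if (advance_seq && seq > found->seq)
		found->seq = seq;
}

/* Simplified reader: a logged delta counts only if newer than @merged. */
static int64_t toy_read(const struct toy_item *merged, uint64_t log_seq,
			int64_t log_delta)
{
	if (log_seq > merged->seq)
		return merged->total + log_delta;
	return merged->total;
}
```

Starting from an item at seq 1 with total 1 and combining a delta of 1 at seq 2, the stale variant still reports seq 1, so a reader that also sees the seq 2 delta in a log counts it again and gets 3. With the seq advanced, the reader returns 2.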
Auke Kok
b724567b2a Add log_merge_force_partial trigger for testing partial merges.
Add a trigger that forces btree_merge() to return -ERANGE after
modifying a leaf's worth of items, causing many small partial merges
per merge cycle. This is used by tests to reliably reproduce races
that depend on partial merges splicing items into fs_root while
finalized logs still exist.

The trigger check lives inside btree_merge() where it can observe
actual item modification progress, rather than overriding the
caller's dirty byte limit argument which applies to the whole
writer context.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-10 12:25:30 -07:00
Auke Kok
add1da10dc Add test for stale seq in merge delta combining.
merge_read_item() fails to update found->seq when combining delta items
from multiple finalized log trees. Add a test case to replicate the
conditions of this issue.

Each of 5 mounts sets totl value 1 on 2500 shared keys, giving an
expected total of 5 per key.  Any total > 5 proves double-counting
from a stale seq.

The log_merge_force_partial trigger forces many partial merges per
cycle, creating the conditions where stale-seq items get spliced into
fs_root while finalized logs still exist.  Parallel readers on all
mounts race against this window to detect double-counted values.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-10 12:25:30 -07:00
Auke Kok
b9c49629a2 Add basic-xattr-indx tests.
We had no basic testing for `scoutfs read-xattr-index` whatsoever. This
adds your basic negative argument tests, lifecycle tests, the
deduplicated reads, and partial removal.

This exposes a bug in deletion where the indx entry isn't cleaned up
on inode delete.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-08 13:45:56 -07:00
Auke Kok
9737009437 Fix indx delete using wrong xid, leaving orphans.
During inode deletion, scoutfs_xattr_drop forgot to set the xid
of the xattr after calling parse_indx_key, which hardcodes xid=0 and
leaves setting it as the caller's responsibility. delete_force then
deletes the wrong key, and returns no errors on nonexistent keys.

So now there is a pending deletion for a nonexistent indx and an
orphan indx entry in the tree. Subsequent calls to `scoutfs
read-xattr-index` will thus return entries for deleted inodes.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-08 11:48:47 -07:00
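The orphan described above can be reproduced in a small user-space model. All names are hypothetical (`toy_parse_indx_key` only mimics the described behaviour of leaving xid as 0), and the index is a flat array rather than a btree.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_indx_key {
	uint64_t ino;
	uint64_t xid;
};

/* Mimics the described parse step: xid is left 0 for the caller to set. */
static void toy_parse_indx_key(struct toy_indx_key *key, uint64_t ino)
{
	key->ino = ino;
	key->xid = 0;
}

/*
 * Delete @key if present and report no error when it is absent.
 * Returns the number of entries left in @tree.
 */
static size_t toy_delete_force(struct toy_indx_key *tree, size_t nr,
			       const struct toy_indx_key *key)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (tree[i].ino == key->ino && tree[i].xid == key->xid) {
			tree[i] = tree[nr - 1];
			return nr - 1;
		}
	}
	return nr;
}

/* @set_xid selects the fixed behaviour of filling in the real xid. */
static size_t toy_drop(struct toy_indx_key *tree, size_t nr, uint64_t ino,
		       uint64_t xid, int set_xid)
{
	struct toy_indx_key key;

	toy_parse_indx_key(&key, ino);
	if (set_xid)
		key.xid = xid;
	return toy_delete_force(tree, nr, &key);
}
```

With a single entry (ino 7, xid 42), dropping without setting the xid silently targets (7, 0) and leaves the entry behind, while the fixed variant removes it.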
Zach Brown
3d54ae03e6 Merge pull request #295 from versity/auke/xfs_lockdep_ignore
Avoid xfs lockdep false positive dmesg errors.
2026-04-03 09:46:44 -07:00
Auke Kok
e27ec0add6 Avoid xfs lockdep false positive dmesg errors.
This xfs lockdep stack trace has at least 2 variants around
fs_reclaim, so try and capture it not too precisely here.

We can remove "lockdep disabled" in the $re grep -v, because it
can affect both this and the kasan one.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-01 14:25:48 -07:00
Zach Brown
5457741672 Merge pull request #292 from versity/zab/v1.29
v1.29 Release
2026-03-25 22:36:28 -07:00
33 changed files with 833 additions and 93 deletions

View File

@@ -1,6 +1,38 @@
Versity ScoutFS Release Notes
=============================
---
v1.31
\
*May 5, 2026*
Fix race between modifying quota rules and internal reading of the rules
that tripped an assertion.
Fix a bug that could skip merging totl items under specific heavy write
loads. This could lead to merged totl items incorrectly tracking the
sum of all the contributing totl xattrs.
Fix many small low risk bugs in error paths that were found with code
analysis and testing.
---
v1.30
\
*Apr 21, 2026*
Fix a problem reading the accumulated totals of contributing .totl.
xattrs when log merging is in progress. The problem would have readers
of the totals calculate the sums incorrectly.
Fix a problem updating quota rules. There was a race where updates
could be corrupted if they happened while a transaction was being
written.
Fix a problem deleting files with .indx. xattrs. The internal indexing
metadata wouldn't be properly deleted so the files would still claim to
be present and visible in the index, though the file no longer existed.
---
v1.29
\

View File

@@ -218,6 +218,7 @@ static void block_free_work(struct work_struct *work)
llist_for_each_entry_safe(bp, tmp, deleted, free_node) {
block_free(sb, bp);
cond_resched();
}
}
@@ -467,9 +468,6 @@ static int block_submit_bio(struct super_block *sb, struct block_private *bp,
sector_t sector;
int ret = 0;
if (scoutfs_forcing_unmount(sb))
return -ENOLINK;
sector = bp->bl.blkno << (SCOUTFS_BLOCK_LG_SHIFT - 9);
WARN_ON_ONCE(bp->bl.blkno == U64_MAX);
@@ -480,6 +478,17 @@ static int block_submit_bio(struct super_block *sb, struct block_private *bp,
set_bit(BLOCK_BIT_IO_BUSY, &bp->bits);
block_get(bp);
/*
* A second thread may already be waiting on this block's completion
* after this thread won the race to submit the block. We exit through
* the block_end_io error path which sets BLOCK_BIT_ERROR and assures
* that other callers in the waitq get woken up.
*/
if (scoutfs_forcing_unmount(sb)) {
ret = -ENOLINK;
goto end_io;
}
blk_start_plug(&plug);
for (off = 0; off < SCOUTFS_BLOCK_LG_SIZE; off += PAGE_SIZE) {
@@ -517,6 +526,7 @@ static int block_submit_bio(struct super_block *sb, struct block_private *bp,
blk_finish_plug(&plug);
end_io:
/* let racing end_io know we're done */
block_end_io(sb, opf, bp, ret);

View File

@@ -2183,6 +2183,8 @@ static int merge_read_item(struct super_block *sb, struct scoutfs_key *key, u64
if (ret > 0) {
if (ret == SCOUTFS_DELTA_COMBINED) {
scoutfs_inc_counter(sb, btree_merge_delta_combined);
if (seq > found->seq)
found->seq = seq;
} else if (ret == SCOUTFS_DELTA_COMBINED_NULL) {
scoutfs_inc_counter(sb, btree_merge_delta_null);
free_mitem(rng, found);
@@ -2486,6 +2488,14 @@ int scoutfs_btree_merge(struct super_block *sb,
mitem = next_mitem(mitem);
free_mitem(&rng, tmp);
}
if (mitem && walk_val_len == 0 &&
!(walk_flags & (BTW_INSERT | BTW_DELETE)) &&
scoutfs_trigger(sb, LOG_MERGE_FORCE_PARTIAL)) {
ret = -ERANGE;
*next_ret = mitem->key;
goto out;
}
}
ret = 0;

View File

@@ -239,9 +239,9 @@ static int forest_read_items(struct super_block *sb, struct scoutfs_key *key, u6
* to reset their state and retry with a newer version of the btrees.
*/
int scoutfs_forest_read_items_roots(struct super_block *sb, struct scoutfs_net_roots *roots,
struct scoutfs_key *key, struct scoutfs_key *bloom_key,
struct scoutfs_key *start, struct scoutfs_key *end,
scoutfs_forest_item_cb cb, void *arg)
u64 merge_input_seq, struct scoutfs_key *key,
struct scoutfs_key *bloom_key, struct scoutfs_key *start,
struct scoutfs_key *end, scoutfs_forest_item_cb cb, void *arg)
{
struct forest_read_items_data rid = {
.cb = cb,
@@ -317,15 +317,17 @@ int scoutfs_forest_read_items_roots(struct super_block *sb, struct scoutfs_net_r
scoutfs_inc_counter(sb, forest_bloom_pass);
if ((le64_to_cpu(lt.flags) & SCOUTFS_LOG_TREES_FINALIZED))
rid.fic |= FIC_FINALIZED;
if ((le64_to_cpu(lt.flags) & SCOUTFS_LOG_TREES_FINALIZED) &&
(merge_input_seq == 0 ||
le64_to_cpu(lt.finalize_seq) < merge_input_seq))
rid.fic |= FIC_MERGE_INPUT;
ret = scoutfs_btree_read_items(sb, &lt.item_root, key, start,
end, forest_read_items, &rid);
if (ret < 0)
goto out;
rid.fic &= ~FIC_FINALIZED;
rid.fic &= ~FIC_MERGE_INPUT;
}
ret = 0;
@@ -345,7 +347,7 @@ int scoutfs_forest_read_items(struct super_block *sb,
ret = scoutfs_client_get_roots(sb, &roots);
if (ret == 0)
ret = scoutfs_forest_read_items_roots(sb, &roots, key, bloom_key, start, end,
ret = scoutfs_forest_read_items_roots(sb, &roots, 0, key, bloom_key, start, end,
cb, arg);
return ret;
}

View File

@@ -11,7 +11,7 @@ struct scoutfs_lock;
/* caller gives an item to the callback */
enum {
FIC_FS_ROOT = (1 << 0),
FIC_FINALIZED = (1 << 1),
FIC_MERGE_INPUT = (1 << 1),
};
typedef int (*scoutfs_forest_item_cb)(struct super_block *sb, struct scoutfs_key *key, u64 seq,
u8 flags, void *val, int val_len, int fic, void *arg);
@@ -25,9 +25,9 @@ int scoutfs_forest_read_items(struct super_block *sb,
struct scoutfs_key *end,
scoutfs_forest_item_cb cb, void *arg);
int scoutfs_forest_read_items_roots(struct super_block *sb, struct scoutfs_net_roots *roots,
struct scoutfs_key *key, struct scoutfs_key *bloom_key,
struct scoutfs_key *start, struct scoutfs_key *end,
scoutfs_forest_item_cb cb, void *arg);
u64 merge_input_seq, struct scoutfs_key *key,
struct scoutfs_key *bloom_key, struct scoutfs_key *start,
struct scoutfs_key *end, scoutfs_forest_item_cb cb, void *arg);
int scoutfs_forest_set_bloom_bits(struct super_block *sb,
struct scoutfs_lock *lock);
void scoutfs_forest_set_max_seq(struct super_block *sb, u64 max_seq);

View File

@@ -549,6 +549,7 @@ retry:
goto out;
if (scoutfs_data_wait_found(&dw)) {
scoutfs_unlock(sb, lock, SCOUTFS_LOCK_WRITE);
lock = NULL;
/* XXX callee locks instead? */
inode_unlock(inode);

View File

@@ -1739,6 +1739,43 @@ out:
return ret;
}
static long scoutfs_ioc_inject_totl_delta(struct file *file, unsigned long arg)
{
struct super_block *sb = file_inode(file)->i_sb;
struct scoutfs_ioctl_inject_totl_delta __user *uitd = (void __user *)arg;
struct scoutfs_ioctl_inject_totl_delta itd;
struct scoutfs_xattr_totl_val tval;
struct scoutfs_lock *lock = NULL;
struct scoutfs_key key;
int ret;
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
if (copy_from_user(&itd, uitd, sizeof(itd)))
return -EFAULT;
scoutfs_xattr_init_totl_key(&key, itd.name);
tval.total = cpu_to_le64((u64)itd.total);
tval.count = cpu_to_le64((u64)itd.count);
ret = scoutfs_lock_xattr_totl(sb, SCOUTFS_LOCK_WRITE_ONLY, 0, &lock);
if (ret < 0)
goto out;
ret = scoutfs_hold_trans(sb, true);
if (ret < 0)
goto unlock;
ret = scoutfs_item_delta(sb, &key, &tval, sizeof(tval), lock);
scoutfs_release_trans(sb);
unlock:
scoutfs_unlock(sb, lock, SCOUTFS_LOCK_WRITE_ONLY);
out:
return ret;
}
long scoutfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
switch (cmd) {
@@ -1790,6 +1827,8 @@ long scoutfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
return scoutfs_ioc_read_xattr_index(file, arg);
case SCOUTFS_IOC_PUNCH_OFFLINE:
return scoutfs_ioc_punch_offline(file, arg);
case SCOUTFS_IOC_INJECT_TOTL_DELTA:
return scoutfs_ioc_inject_totl_delta(file, arg);
}
return -ENOTTY;

View File

@@ -876,4 +876,17 @@ struct scoutfs_ioctl_punch_offline {
#define SCOUTFS_IOC_PUNCH_OFFLINE \
_IOW(SCOUTFS_IOCTL_MAGIC, 24, struct scoutfs_ioctl_punch_offline)
/*
* Inject a signed (total, count) delta at the totl key @name (a, b, c
* match the trailing dotted u64s of a totl xattr name).
*/
struct scoutfs_ioctl_inject_totl_delta {
__u64 name[SCOUTFS_IOCTL_XATTR_TOTAL_NAME_NR];
__s64 total;
__s64 count;
};
#define SCOUTFS_IOC_INJECT_TOTL_DELTA \
_IOW(SCOUTFS_IOCTL_MAGIC, 25, struct scoutfs_ioctl_inject_totl_delta)
#endif

View File

@@ -813,6 +813,7 @@ int scoutfs_lock_invalidate_request(struct super_block *sb, u64 net_id,
out:
if (!lock) {
kfree(ireq);
ret = scoutfs_client_lock_response(sb, net_id, nl);
BUG_ON(ret); /* lock server doesn't fence timed out client requests */
}

View File

@@ -525,7 +525,7 @@ static int process_response(struct scoutfs_net_connection *conn,
struct super_block *sb = conn->sb;
struct message_send *msend;
scoutfs_net_response_t resp_func = NULL;
void *resp_data;
void *resp_data = NULL;
spin_lock(&conn->lock);
@@ -804,7 +804,7 @@ static void scoutfs_net_recv_worker(struct work_struct *work)
if (invalid_message(conn, nh)) {
scoutfs_inc_counter(sb, net_recv_invalid_message);
ret = -EBADMSG;
break;
goto out;
}
data_len = le16_to_cpu(nh->data_len);

View File

@@ -34,6 +34,7 @@
#include "totl.h"
#include "util.h"
#include "quota.h"
#include "trans.h"
#include "counters.h"
#include "scoutfs_trace.h"
@@ -1086,6 +1087,10 @@ int scoutfs_quota_mod_rule(struct super_block *sb, bool is_add,
if (ret < 0)
goto out;
ret = scoutfs_hold_trans(sb, true);
if (ret < 0)
goto out;
down_write(&qtinf->rwsem);
if (is_add) {
@@ -1095,28 +1100,31 @@ int scoutfs_quota_mod_rule(struct super_block *sb, bool is_add,
else if (ret == 0)
ret = -EEXIST;
if (ret < 0)
goto unlock;
goto release;
rule_to_rule_val(&rv, &rule);
ret = scoutfs_item_create(sb, &key, &rv, sizeof(rv), lock);
if (ret < 0)
goto unlock;
goto release;
} else {
ret = find_rule(sb, &rule, &key, lock) ?:
scoutfs_item_delete(sb, &key, lock);
if (ret < 0)
goto unlock;
goto release;
}
wait_event(qtinf->waitq, !ruleset_is_busy(qtinf));
scoutfs_quota_invalidate(sb);
ret = 0;
unlock:
release:
up_write(&qtinf->rwsem);
scoutfs_unlock(sb, lock, SCOUTFS_LOCK_WRITE);
scoutfs_release_trans(sb);
out:
scoutfs_unlock(sb, lock, SCOUTFS_LOCK_WRITE);
if (is_add)
trace_scoutfs_quota_add_rule(sb, &rule, ret);
else
@@ -1135,12 +1143,17 @@ void scoutfs_quota_get_lock_range(struct scoutfs_key *start, struct scoutfs_key
}
/*
* This is called during cluster lock invalidation to indicate that the
* ruleset is no longer protected by cluster locking and might have been
* modified. We mark the ruleset invalid and free it once all readers
* drain. The next check will acquire the cluster lock and read the
* rules. Because this is called during invalidation this is serialized
* with write holders of cluster locks so we can never see -EBUSY here.
* Mark the cached ruleset invalid and free the previous one once readers
* drain. Called from cluster lock invalidation and from quota rule
* modification.
*
* Cluster lock invalidation runs only after the lock layer has drained
* local READ users. Since EBUSY is set only while a reader holds READ,
* the reader has already published by the time we run.
*
* Quota rule modification waits on the waitq for any in-flight reader
* to publish before calling here, so the next check rebuilds against
* the newly written rules rather than the reader's stale result.
*/
void scoutfs_quota_invalidate(struct super_block *sb)
{
@@ -1154,13 +1167,10 @@ void scoutfs_quota_invalidate(struct super_block *sb)
spin_lock(&qtinf->lock);
rs = rcu_dereference_protected(qtinf->ruleset, lockdep_is_held(&qtinf->lock));
if (rs != ERR_PTR(-EINVAL))
if (rs == ERR_PTR(-ENOENT) || !IS_ERR(rs))
rcu_assign_pointer(qtinf->ruleset, ERR_PTR(-EINVAL));
spin_unlock(&qtinf->lock);
/* cluster locking should have prevented this */
BUG_ON(rs == ERR_PTR(-EBUSY));
if (!IS_ERR(rs))
call_rcu(&rs->rcu, free_ruleset_rcu);

View File

@@ -103,7 +103,7 @@ int scoutfs_recov_prepare(struct super_block *sb, u64 rid, int which)
if (!alloc)
return -ENOMEM;
spin_lock(&recinf->lock);
spin_lock_bh(&recinf->lock);
pend = lookup_pending(recinf, rid, SCOUTFS_RECOV_ALL);
if (pend) {
@@ -116,7 +116,7 @@ int scoutfs_recov_prepare(struct super_block *sb, u64 rid, int which)
list_sort(NULL, &recinf->pending, cmp_pending_rid);
}
spin_unlock(&recinf->lock);
spin_unlock_bh(&recinf->lock);
kfree(alloc);
return 0;
@@ -153,7 +153,7 @@ int scoutfs_recov_begin(struct super_block *sb, void (*timeout_fn)(struct super_
DECLARE_RECOV_INFO(sb, recinf);
int ret;
spin_lock(&recinf->lock);
spin_lock_bh(&recinf->lock);
recinf->timeout_fn = timeout_fn;
recinf->timer.expires = jiffies + msecs_to_jiffies(timeout_ms);
@@ -161,7 +161,7 @@ int scoutfs_recov_begin(struct super_block *sb, void (*timeout_fn)(struct super_
ret = recov_finished(recinf);
spin_unlock(&recinf->lock);
spin_unlock_bh(&recinf->lock);
if (ret > 0)
del_timer_sync(&recinf->timer);
@@ -183,7 +183,7 @@ int scoutfs_recov_finish(struct super_block *sb, u64 rid, int which)
struct recov_pending *pend;
int ret = 0;
spin_lock(&recinf->lock);
spin_lock_bh(&recinf->lock);
pend = lookup_pending(recinf, rid, which);
if (pend) {
@@ -196,7 +196,7 @@ int scoutfs_recov_finish(struct super_block *sb, u64 rid, int which)
}
}
spin_unlock(&recinf->lock);
spin_unlock_bh(&recinf->lock);
if (ret > 0)
del_timer_sync(&recinf->timer);
@@ -215,9 +215,9 @@ bool scoutfs_recov_is_pending(struct super_block *sb, u64 rid, int which)
DECLARE_RECOV_INFO(sb, recinf);
bool is_pending;
spin_lock(&recinf->lock);
spin_lock_bh(&recinf->lock);
is_pending = lookup_pending(recinf, rid, which) != NULL;
spin_unlock(&recinf->lock);
spin_unlock_bh(&recinf->lock);
return is_pending;
}
@@ -236,10 +236,10 @@ u64 scoutfs_recov_next_pending(struct super_block *sb, u64 rid, int which)
DECLARE_RECOV_INFO(sb, recinf);
struct recov_pending *pend;
spin_lock(&recinf->lock);
spin_lock_bh(&recinf->lock);
pend = next_pending(recinf, rid, which);
rid = pend ? pend->rid : 0;
spin_unlock(&recinf->lock);
spin_unlock_bh(&recinf->lock);
return rid;
}
@@ -257,10 +257,10 @@ void scoutfs_recov_shutdown(struct super_block *sb)
del_timer_sync(&recinf->timer);
spin_lock(&recinf->lock);
spin_lock_bh(&recinf->lock);
list_splice_init(&recinf->pending, &list);
recinf->timeout_fn = NULL;
spin_unlock(&recinf->lock);
spin_unlock_bh(&recinf->lock);
list_for_each_entry_safe(pend, tmp, &list, head) {
list_del(&pend->head);

View File

@@ -1077,8 +1077,7 @@ static int next_log_merge_range(struct super_block *sb, struct scoutfs_btree_roo
struct scoutfs_key key;
int ret;
key = *start;
key.sk_zone = SCOUTFS_LOG_MERGE_RANGE_ZONE;
init_log_merge_key(&key, SCOUTFS_LOG_MERGE_RANGE_ZONE, 0, 0);
scoutfs_key_set_ones(&rng->start);
do {

View File

@@ -30,6 +30,11 @@ void scoutfs_totl_merge_init(struct scoutfs_totl_merging *merg)
memset(merg, 0, sizeof(struct scoutfs_totl_merging));
}
/*
* bin the incoming merge inputs so that we can resolve delta items
* properly. Finalized logs that are merge inputs are kept separately
* from those that are not.
*/
void scoutfs_totl_merge_contribute(struct scoutfs_totl_merging *merg,
u64 seq, u8 flags, void *val, int val_len, int fic)
{
@@ -39,10 +44,10 @@ void scoutfs_totl_merge_contribute(struct scoutfs_totl_merging *merg,
merg->fs_seq = seq;
merg->fs_total = le64_to_cpu(tval->total);
merg->fs_count = le64_to_cpu(tval->count);
} else if (fic & FIC_FINALIZED) {
merg->fin_seq = seq;
merg->fin_total += le64_to_cpu(tval->total);
merg->fin_count += le64_to_cpu(tval->count);
} else if (fic & FIC_MERGE_INPUT) {
merg->inp_seq = seq;
merg->inp_total += le64_to_cpu(tval->total);
merg->inp_count += le64_to_cpu(tval->count);
} else {
merg->log_seq = seq;
merg->log_total += le64_to_cpu(tval->total);
@@ -53,15 +58,18 @@ void scoutfs_totl_merge_contribute(struct scoutfs_totl_merging *merg,
/*
* .totl. item merging has to be careful because the log btree merging
* code can write partial results to the fs_root. This means that a
* reader can see both cases where new finalized logs should be applied
* to the old fs items and where old finalized logs have already been
* applied to the partially merged fs items. Currently active logged
* items are always applied on top of all cases.
* reader can see both cases where merge input deltas should be applied
* to the old fs items and where they have already been applied to the
* partially merged fs items.
*
* Only finalized log trees that are inputs to the current merge cycle
* are tracked in the inp_ bucket. Finalized trees that aren't merge
* inputs and active log trees are always applied unconditionally since
* they cannot be in fs_root.
*
* These cases are differentiated with a combination of sequence numbers
* in items, the count of contributing xattrs, and a flag
* differentiating finalized and active logged items. This lets us
* recognize all cases, including when finalized logs were merged and
* in items and the count of contributing xattrs. This lets us
* recognize all cases, including when merge inputs were merged and
* deleted the fs item.
*/
void scoutfs_totl_merge_resolve(struct scoutfs_totl_merging *merg, __u64 *total, __u64 *count)
@@ -75,14 +83,14 @@ void scoutfs_totl_merge_resolve(struct scoutfs_totl_merging *merg, __u64 *total,
*count = merg->fs_count;
}
/* apply finalized logs if they're newer or creating */
if (((merg->fs_seq != 0) && (merg->fin_seq > merg->fs_seq)) ||
((merg->fs_seq == 0) && (merg->fin_count > 0))) {
*total += merg->fin_total;
*count += merg->fin_count;
/* apply merge input deltas if they're newer or creating */
if (((merg->fs_seq != 0) && (merg->inp_seq > merg->fs_seq)) ||
((merg->fs_seq == 0) && (merg->inp_count > 0))) {
*total += merg->inp_total;
*count += merg->inp_count;
}
/* always apply active logs which must be newer than fs and finalized */
/* always apply non-input finalized and active logs */
if (merg->log_seq > 0) {
*total += merg->log_total;
*count += merg->log_count;


@@ -7,9 +7,9 @@ struct scoutfs_totl_merging {
u64 fs_seq;
u64 fs_total;
u64 fs_count;
u64 fin_seq;
u64 fin_total;
s64 fin_count;
u64 inp_seq;
u64 inp_total;
s64 inp_count;
u64 log_seq;
u64 log_total;
s64 log_count;
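
The bucket change above can be modelled in userspace to see why a partially merged fs item is not counted twice. This is a minimal sketch, not the scoutfs implementation: the `SRC_*` flags, the `totl_model` struct, and both function names are hypothetical stand-ins for the `FIC_*` flags and `scoutfs_totl_merging`. The start of the resolve step (zeroing the outputs and loading the fs item) is assumed, since the diff only shows its tail. The sketch also assumes that a merged fs item carries the sequence number of the merge inputs it absorbed and that a seq of zero means "no item seen".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical source flags; the real FIC_* values are not shown in the diff. */
enum {
	SRC_FS_ROOT     = 1 << 0,	/* item read from the fs_root */
	SRC_MERGE_INPUT = 1 << 1,	/* finalized log that feeds the current merge */
};

/* Mirrors the three buckets in the header diff: fs_, inp_, and log_. */
struct totl_model {
	uint64_t fs_seq, fs_total, fs_count;
	uint64_t inp_seq, inp_total;
	int64_t inp_count;
	uint64_t log_seq, log_total;
	int64_t log_count;
};

/* Bin one item. Negative deltas are passed as wrapped unsigned totals. */
static void model_contribute(struct totl_model *m, uint64_t seq, int src,
			     uint64_t total, int64_t count)
{
	assert(seq != 0);	/* assumption: zero is reserved for "no item" */

	if (src & SRC_FS_ROOT) {
		m->fs_seq = seq;
		m->fs_total = total;
		m->fs_count = (uint64_t)count;
	} else if (src & SRC_MERGE_INPUT) {
		m->inp_seq = seq;
		m->inp_total += total;
		m->inp_count += count;
	} else {
		/* non-input finalized logs and active logs share this bucket */
		m->log_seq = seq;
		m->log_total += total;
		m->log_count += count;
	}
}

static void model_resolve(const struct totl_model *m, uint64_t *total,
			  uint64_t *count)
{
	*total = 0;
	*count = 0;

	/* start from the fs item if there is one */
	if (m->fs_seq != 0) {
		*total = m->fs_total;
		*count = m->fs_count;
	}

	/* merge inputs apply only if the fs item has not absorbed them yet */
	if ((m->fs_seq != 0 && m->inp_seq > m->fs_seq) ||
	    (m->fs_seq == 0 && m->inp_count > 0)) {
		*total += m->inp_total;
		*count += (uint64_t)m->inp_count;
	}

	/* everything else can never be in the fs_root, so always apply it */
	if (m->log_seq > 0) {
		*total += m->log_total;
		*count += (uint64_t)m->log_count;
	}
}
```

Feeding the model an fs item at seq 10 with (42, 3), a merge input at seq 12 with (+128, +2), and a non-input delta of (-32, -1) resolves to (138, 4). If the fs item has instead already been rewritten to (170, 5) at seq 12 while the same merge input is still visible, the input is skipped and the result is again (138, 4).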


@@ -46,6 +46,7 @@ static char *names[] = {
[SCOUTFS_TRIGGER_SRCH_MERGE_STOP_SAFE] = "srch_merge_stop_safe",
[SCOUTFS_TRIGGER_STATFS_LOCK_PURGE] = "statfs_lock_purge",
[SCOUTFS_TRIGGER_RECLAIM_SKIP_FINALIZE] = "reclaim_skip_finalize",
[SCOUTFS_TRIGGER_LOG_MERGE_FORCE_PARTIAL] = "log_merge_force_partial",
};
bool scoutfs_trigger_test_and_clear(struct super_block *sb, unsigned int t)


@@ -9,6 +9,7 @@ enum scoutfs_trigger {
SCOUTFS_TRIGGER_SRCH_MERGE_STOP_SAFE,
SCOUTFS_TRIGGER_STATFS_LOCK_PURGE,
SCOUTFS_TRIGGER_RECLAIM_SKIP_FINALIZE,
SCOUTFS_TRIGGER_LOG_MERGE_FORCE_PARTIAL,
SCOUTFS_TRIGGER_NR,
};


@@ -95,6 +95,7 @@ struct wkic_info {
/* block reading slow path */
struct mutex roots_mutex;
struct scoutfs_net_roots roots;
u64 merge_input_seq;
u64 roots_read_seq;
ktime_t roots_expire;
@@ -805,29 +806,79 @@ static void free_page_list(struct super_block *sb, struct list_head *list)
* read_seq number so that we can compare the age of the items in cached
* pages. Only one request to refresh the roots is in progress at a
* time. This is the slow path that's only used when the cache isn't
* populated and the roots aren't cached. The root request is fast
* enough, especially compared to the resulting item reading IO, that we
* don't mind hiding it behind a trivial mutex.
* populated and the roots aren't cached.
*
* We read roots directly from the on-disk superblock rather than
* requesting them from the server so that we can also read the
* log_merge btree from the same superblock. The merge status item
* seq tells us which finalized log trees are inputs to the current
* merge, which is needed to correctly resolve totl delta items.
*/
static int get_roots(struct super_block *sb, struct wkic_info *winf,
struct scoutfs_net_roots *roots_ret, u64 *read_seq, bool force_new)
static int refresh_roots(struct super_block *sb, struct wkic_info *winf)
{
struct scoutfs_super_block *super;
struct scoutfs_log_merge_status *stat;
SCOUTFS_BTREE_ITEM_REF(iref);
struct scoutfs_key key;
int ret;
super = kmalloc(sizeof(*super), GFP_NOFS);
if (!super)
return -ENOMEM;
ret = scoutfs_read_super(sb, super);
if (ret < 0)
goto out;
winf->roots = (struct scoutfs_net_roots){
.fs_root = super->fs_root,
.logs_root = super->logs_root,
.srch_root = super->srch_root,
};
winf->merge_input_seq = 0;
if (super->log_merge.ref.blkno) {
scoutfs_key_set_zeros(&key);
key.sk_zone = SCOUTFS_LOG_MERGE_STATUS_ZONE;
ret = scoutfs_btree_lookup(sb, &super->log_merge, &key, &iref);
if (ret == 0) {
if (iref.val_len == sizeof(*stat)) {
stat = iref.val;
winf->merge_input_seq = le64_to_cpu(stat->seq);
} else {
ret = -EUCLEAN;
}
scoutfs_btree_put_iref(&iref);
} else if (ret == -ENOENT) {
ret = 0;
}
if (ret < 0)
goto out;
}
winf->roots_read_seq++;
winf->roots_expire = ktime_add_ms(ktime_get_raw(), WKIC_CACHE_LIFETIME_MS);
out:
kfree(super);
return ret;
}
static int get_roots(struct super_block *sb, struct wkic_info *winf,
struct scoutfs_net_roots *roots_ret, u64 *merge_input_seq,
u64 *read_seq, bool force_new)
{
struct scoutfs_net_roots roots;
int ret;
mutex_lock(&winf->roots_mutex);
if (force_new || ktime_before(winf->roots_expire, ktime_get_raw())) {
ret = scoutfs_client_get_roots(sb, &roots);
ret = refresh_roots(sb, winf);
if (ret < 0)
goto out;
winf->roots = roots;
winf->roots_read_seq++;
winf->roots_expire = ktime_add_ms(ktime_get_raw(), WKIC_CACHE_LIFETIME_MS);
}
*roots_ret = winf->roots;
*merge_input_seq = winf->merge_input_seq;
*read_seq = winf->roots_read_seq;
ret = 0;
out:
@@ -870,24 +921,30 @@ static int insert_read_pages(struct super_block *sb, struct wkic_info *winf,
struct scoutfs_key end;
struct wkic_page *wpage;
LIST_HEAD(pages);
u64 read_seq;
u64 merge_input_seq;
u64 read_seq = 0;
int ret;
ret = 0;
retry_stale:
ret = get_roots(sb, winf, &roots, &read_seq, ret == -ESTALE);
ret = get_roots(sb, winf, &roots, &merge_input_seq, &read_seq, ret == -ESTALE);
if (ret < 0)
goto out;
goto check_stale;
start = *range_start;
end = *range_end;
ret = scoutfs_forest_read_items_roots(sb, &roots, key, range_start, &start, &end,
read_items_cb, &root);
ret = scoutfs_forest_read_items_roots(sb, &roots, merge_input_seq, key, range_start,
&start, &end, read_items_cb, &root);
trace_scoutfs_wkic_read_items(sb, key, &start, &end);
check_stale:
ret = scoutfs_block_check_stale(sb, ret, &saved, &roots.fs_root.ref, &roots.logs_root.ref);
if (ret < 0) {
if (ret == -ESTALE)
if (ret == -ESTALE) {
/* not safe to retry due to delta items, must restart clean */
free_item_tree(&root);
root = RB_ROOT;
goto retry_stale;
}
goto out;
}
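
The "must restart clean" comment in the retry path can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `read_deltas()` and `read_with_retry()` are hypothetical helpers, the accumulator stands in for the item tree that `read_items_cb` fills, and the first read is made to fail with `-ESTALE` after a chosen number of items. It shows that delta items applied before the failure are applied a second time on retry unless the accumulated state is discarded first.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the tree of items built up by the read callback. */
struct acc {
	int64_t total;
};

/*
 * Hypothetical reader: applies deltas in order and returns -ESTALE
 * once it reaches index fail_after.  Passing fail_after == nr means
 * the read never fails.
 */
static int read_deltas(const int64_t *deltas, size_t nr, size_t fail_after,
		       struct acc *acc)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (i == fail_after)
			return -ESTALE;
		acc->total += deltas[i];
	}
	return 0;
}

/*
 * Read once, and retry a single time on -ESTALE.  reset_on_stale
 * selects whether the partial results are thrown away before the
 * retry, which is what freeing the item tree does in the diff.
 */
static int64_t read_with_retry(const int64_t *deltas, size_t nr,
			       size_t fail_after, int reset_on_stale)
{
	struct acc acc = { 0 };
	int ret;

	ret = read_deltas(deltas, nr, fail_after, &acc);
	if (ret == -ESTALE) {
		if (reset_on_stale)
			acc.total = 0;
		ret = read_deltas(deltas, nr, nr, &acc);
	}
	assert(ret == 0);

	return acc.total;
}
```

With deltas of 2, 8, and 32 and a failure injected before the third item, the resetting retry returns 42, while the retry that keeps the partial state returns 52 because the first two deltas are counted twice.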


@@ -1265,6 +1265,7 @@ int scoutfs_xattr_drop(struct super_block *sb, u64 ino,
ret = parse_indx_key(&tag_key, xat->name, xat->name_len, ino);
if (ret < 0)
goto out;
scoutfs_xattr_set_indx_key_xid(&tag_key, le64_to_cpu(key.skx_id));
}
if ((tgs.totl || tgs.indx) && locked_zone != tag_key.sk_zone) {

tests/.gitignore vendored

@@ -12,3 +12,4 @@ src/o_tmpfile_umask
src/o_tmpfile_linkat
src/mmap_stress
src/mmap_validate
src/totl-delta-inject


@@ -15,7 +15,8 @@ BIN := src/createmany \
src/o_tmpfile_umask \
src/o_tmpfile_linkat \
src/mmap_stress \
src/mmap_validate
src/mmap_validate \
src/totl-delta-inject
DEPS := $(wildcard src/*.d)


@@ -20,9 +20,6 @@ t_filter_fs()
# [ 2687.691366] BUG: KASAN: stack-out-of-bounds in get_reg+0x1bc/0x230
# ...
# [ 2687.706220] ==================================================================
# [ 2687.707284] Disabling lock debugging due to kernel taint
#
# That final lock debugging message may not be included.
#
ignore_harmless_unwind_kasan_stack_oob()
{
@@ -46,10 +43,6 @@ awk '
saved=""
}
( in_soob == 2 && $0 ~ /==================================================================/ ) {
in_soob = 3
soob_nr = NR
}
( in_soob == 3 && NR > soob_nr && $0 !~ /Disabling lock debugging/ ) {
in_soob = 0
}
( !in_soob ) { print $0 }
@@ -61,6 +54,58 @@ awk '
'
}
#
# in el97+, XFS can generate a spurious lockdep circular dependency
# warning about reclaim. Fixed upstream in e.g. v5.7-rc4-129-g6dcde60efd94
#
ignore_harmless_xfs_lockdep_warning()
{
awk '
BEGIN {
in_block = 0
block_nr = 0
buf = ""
}
( !in_block && $0 ~ /======================================================/ ) {
in_block = 1
block_nr = NR
buf = $0 "\n"
next
}
( in_block == 1 && NR == (block_nr + 1) ) {
if (match($0, /WARNING: possible circular locking dependency detected/) != 0) {
in_block = 2
buf = buf $0 "\n"
} else {
in_block = 0
printf "%s", buf
print $0
buf = ""
}
next
}
( in_block == 2 ) {
buf = buf $0 "\n"
if ($0 ~ /<\/TASK>/) {
if (buf ~ /xfs_(nondir_|dir_)?ilock_class/ && buf ~ /fs_reclaim/) {
# known xfs lockdep false positive, discard
} else {
printf "%s", buf
}
in_block = 0
buf = ""
}
next
}
{ print $0 }
END {
if (buf) {
printf "%s", buf
}
}
'
}
#
# Filter out expected messages. Putting messages here implies that
# tests aren't relying on messages to discover failures.. they're
@@ -176,6 +221,10 @@ t_filter_dmesg()
# creating block devices may trigger this
re="$re|block device autoloading is deprecated and will be removed."
# lockdep or kasan warnings can cause this
re="$re|Disabling lock debugging due to kernel taint"
egrep -v "($re)" | \
ignore_harmless_unwind_kasan_stack_oob
ignore_harmless_unwind_kasan_stack_oob | \
ignore_harmless_xfs_lockdep_warning
}


@@ -0,0 +1,54 @@
== testing invalid read-xattr-index arguments
bad index position entry argument 'bad', it must be in the form "a.b.ino" where each value can be prefixed by '0' for octal or '0x' for hex
scoutfs: read-xattr-index failed: Invalid argument (22)
bad index position entry argument '1.2', it must be in the form "a.b.ino" where each value can be prefixed by '0' for octal or '0x' for hex
scoutfs: read-xattr-index failed: Invalid argument (22)
initial major index position '256' must be between 0 and 255, inclusive.
scoutfs: read-xattr-index failed: Invalid argument (22)
first index position 1.2.3 must be less than last index position 0.0.0
scoutfs: read-xattr-index failed: Invalid argument (22)
first index position 1.2.0 must be less than last index position 1.1.2
scoutfs: read-xattr-index failed: Invalid argument (22)
first index position 2.2.2 must be less than last index position 2.2.1
scoutfs: read-xattr-index failed: Invalid argument (22)
== testing invalid names
setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
setfattr: /mnt/test/test/basic-xattr-indx/invalid: Invalid argument
setfattr: /mnt/test/test/basic-xattr-indx/invalid: Numerical result out of range
== testing boundary values
0.0 found
255.max found
== indx xattr must have no value
setfattr: /mnt/test/test/basic-xattr-indx/noval: Invalid argument
setfattr: /mnt/test/test/basic-xattr-indx/noval: Invalid argument
== set indx xattr and verify index entry
found
== setting same indx xattr again is a no-op
found
== removing non-existent indx xattr succeeds
setfattr: /mnt/test/test/basic-xattr-indx/file: No such attribute
still found
== explicit xattr removal cleans up index entry
== file deletion cleans up index entry
found before delete
== multiple indx xattrs on one file cleaned up by deletion
entries before delete: 2
entries after delete: 0
== partial removal leaves other entries
300 found
== multiple files at same index position
files at same position: 2
surviving file found
== cross-mount visibility
found on mount 1
== duplicate position deduplication
entries for same position: 1


@@ -0,0 +1,6 @@
== setup
== concurrent quota mod and check across mounts
== verify quota rules are consistent after race
== verify file creation still works under quota
file visible on mount 1
== cleanup


@@ -0,0 +1,10 @@
== setup three files contributing to totl 8888.0.0
== merge baseline into fs_root
8888.0.0 = 42, 3
== inject (+128, +2) unbalances totl 8888.0.0
8888.0.0 = 170, 5
== unlink f3 (value 32) produces a -32/-1 delta
8888.0.0 = 138, 4
== inject (-128, -2) restores accounting for the remaining files
8888.0.0 = 10, 2
== cleanup


@@ -0,0 +1,3 @@
== setup
expected 4681
== cleanup


@@ -694,8 +694,8 @@ for t in $tests; do
if [ "$sts" == "$T_PASS_STATUS" ]; then
dmesg | t_filter_dmesg > "$T_TMPDIR/dmesg.after"
diff --old-line-format="" --unchanged-line-format="" \
"$T_TMPDIR/dmesg.before" "$T_TMPDIR/dmesg.after" > \
"$T_TMPDIR/dmesg.new"
"$T_TMPDIR/dmesg.before" "$T_TMPDIR/dmesg.after" | \
grep -v '^$' > "$T_TMPDIR/dmesg.new"
if [ -s "$T_TMPDIR/dmesg.new" ]; then
message="unexpected messages in dmesg"


@@ -26,7 +26,11 @@ srch-basic-functionality.sh
simple-xattr-unit.sh
retention-basic.sh
totl-xattr-tag.sh
basic-xattr-indx.sh
quota.sh
totl-merge-read.sh
quota-invalidate-race.sh
totl-delta-inject.sh
lock-refleak.sh
lock-shrink-consistency.sh
lock-shrink-read-race.sh


@@ -0,0 +1,121 @@
/*
* Test helper that calls SCOUTFS_IOC_INJECT_TOTL_DELTA to seed
* arbitrary totl deltas.
*
* Copyright (C) 2026 Versity Software, Inc. All rights reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public
* License v2 as published by the Free Software Foundation.
*/
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <inttypes.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/types.h>
#include "ioctl.h"
static void usage(const char *prog)
{
fprintf(stderr,
"Usage: %s <mountpoint> <a>.<b>.<c> <total> <count>\n",
prog);
exit(2);
}
static int parse_s64(const char *s, int64_t *out)
{
char *end;
int64_t v;
errno = 0;
v = strtoll(s, &end, 0);
if (errno || *end != '\0' || end == s)
return -1;
*out = v;
return 0;
}
/*
* Parse "<a>.<b>.<c>" into abc[0..2] (skxt_a, skxt_b, skxt_c). Each
* component must be a non-empty unsigned base-0 integer.
*/
static int parse_dotted_name(const char *s, uint64_t abc[3])
{
const char *p = s;
char *end;
int i;
for (i = 0; i < 3; i++) {
if (*p == '\0' || *p == '.')
return -1;
errno = 0;
abc[i] = strtoull(p, &end, 0);
if (errno || end == p)
return -1;
if (i < 2) {
if (*end != '.')
return -1;
p = end + 1;
} else {
if (*end != '\0')
return -1;
}
}
return 0;
}
int main(int argc, char **argv)
{
struct scoutfs_ioctl_inject_totl_delta itd = {{0,}};
uint64_t abc[3];
int64_t total, count;
int fd;
int ret;
if (argc != 5)
usage(argv[0]);
if (parse_dotted_name(argv[2], abc) ||
parse_s64(argv[3], &total) ||
parse_s64(argv[4], &count)) {
fprintf(stderr, "could not parse arguments\n");
usage(argv[0]);
}
itd.name[0] = abc[0];
itd.name[1] = abc[1];
itd.name[2] = abc[2];
itd.total = total;
itd.count = count;
fd = open(argv[1], O_RDONLY | O_DIRECTORY);
if (fd < 0) {
fprintf(stderr, "open(%s): %s\n", argv[1], strerror(errno));
return 1;
}
ret = ioctl(fd, SCOUTFS_IOC_INJECT_TOTL_DELTA, &itd);
if (ret < 0) {
fprintf(stderr,
"INJECT_TOTL_DELTA(%" PRIu64 ".%" PRIu64 ".%" PRIu64
", total=%" PRId64 ", count=%" PRId64 "): %s\n",
abc[0], abc[1], abc[2], total, count, strerror(errno));
close(fd);
return 1;
}
close(fd);
return 0;
}


@@ -0,0 +1,143 @@
#
# Test basic .indx. xattr tag functionality and index entry lifecycle
#
t_require_commands touch rm setfattr scoutfs stat
t_require_mounts 2
# query index from a specific mount, default mount 0
read_xattr_index()
{
local nr="${1:-0}"
local mnt="$(eval echo \$T_M$nr)"
shift
sync
echo 1 > $(t_debugfs_path $nr)/drop_weak_item_cache
scoutfs read-xattr-index -p "$mnt" "$@"
}
MAJOR=5
MINOR=100
echo "== testing invalid read-xattr-index arguments"
scoutfs read-xattr-index -p "$T_M0" bad 2>&1
scoutfs read-xattr-index -p "$T_M0" 1.2 2>&1
scoutfs read-xattr-index -p "$T_M0" 1.2.3 256.0.0 2>&1
scoutfs read-xattr-index -p "$T_M0" 1.2.3 0.0.0 2>&1
scoutfs read-xattr-index -p "$T_M0" 1.2.0 1.1.2 2>&1
scoutfs read-xattr-index -p "$T_M0" 2.2.2 2.2.1 2>&1
echo "== testing invalid names"
touch "$T_D0/invalid"
setfattr -n scoutfs.hide.indx.test.$MAJOR "$T_D0/invalid" 2>&1 | t_filter_fs
setfattr -n scoutfs.hide.indx.test "$T_D0/invalid" 2>&1 | t_filter_fs
setfattr -n scoutfs.hide.indx.test.. "$T_D0/invalid" 2>&1 | t_filter_fs
setfattr -n scoutfs.hide.indx.test..$MINOR "$T_D0/invalid" 2>&1 | t_filter_fs
setfattr -n scoutfs.hide.indx.test.$MAJOR. "$T_D0/invalid" 2>&1 | t_filter_fs
setfattr -n scoutfs.hide.indx.test.256.$MINOR "$T_D0/invalid" 2>&1 | t_filter_fs
setfattr -n scoutfs.hide.indx.test.abc.$MINOR "$T_D0/invalid" 2>&1 | t_filter_fs
setfattr -n scoutfs.hide.indx.test.$MAJOR.abc "$T_D0/invalid" 2>&1 | t_filter_fs
setfattr -n scoutfs.hide.indx.test.-1.$MINOR "$T_D0/invalid" 2>&1 | t_filter_fs
setfattr -n scoutfs.hide.indx.test.$MAJOR.-1 "$T_D0/invalid" 2>&1 | t_filter_fs
setfattr -n scoutfs.hide.indx.test.18446744073709551616.$MINOR "$T_D0/invalid" 2>&1 | t_filter_fs
setfattr -n scoutfs.hide.indx.$(printf 'x%.0s' $(seq 1 240)).$MAJOR.$MINOR "$T_D0/invalid" 2>&1 | t_filter_fs
rm -f "$T_D0/invalid"
echo "== testing boundary values"
touch "$T_D0/boundary"
INO=$(stat -c "%i" "$T_D0/boundary")
setfattr -n scoutfs.hide.indx.test.0.0 "$T_D0/boundary"
read_xattr_index 0 0.0.0 0.0.-1 | awk '($3 == "'$INO'") {print "0.0 found"}'
setfattr -x scoutfs.hide.indx.test.0.0 "$T_D0/boundary"
setfattr -n scoutfs.hide.indx.test.255.18446744073709551615 "$T_D0/boundary"
read_xattr_index 0 255.0.0 255.-1.-1 | awk '($3 == "'$INO'") {print "255.max found"}'
setfattr -x scoutfs.hide.indx.test.255.18446744073709551615 "$T_D0/boundary"
rm -f "$T_D0/boundary"
echo "== indx xattr must have no value"
touch "$T_D0/noval"
setfattr -n scoutfs.hide.indx.test.$MAJOR.$MINOR -v "" "$T_D0/noval" 2>&1 | t_filter_fs
setfattr -n scoutfs.hide.indx.test.$MAJOR.$MINOR -v 0 "$T_D0/noval" 2>&1 | t_filter_fs
setfattr -n scoutfs.hide.indx.test.$MAJOR.$MINOR -v 1 "$T_D0/noval" 2>&1 | t_filter_fs
rm -f "$T_D0/noval"
echo "== set indx xattr and verify index entry"
touch "$T_D0/file"
INO=$(stat -c "%i" "$T_D0/file")
setfattr -n scoutfs.hide.indx.test.$MAJOR.$MINOR "$T_D0/file"
read_xattr_index 0 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'") {print "found"}'
echo "== setting same indx xattr again is a no-op"
setfattr -n scoutfs.hide.indx.test.$MAJOR.$MINOR "$T_D0/file"
read_xattr_index 0 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'") {print "found"}'
echo "== removing non-existent indx xattr succeeds"
setfattr -x scoutfs.hide.indx.nonexistent.$MAJOR.999 "$T_D0/file" 2>&1 | t_filter_fs
read_xattr_index 0 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'") {print "still found"}'
echo "== explicit xattr removal cleans up index entry"
setfattr -x scoutfs.hide.indx.test.$MAJOR.$MINOR "$T_D0/file"
read_xattr_index 0 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'") {print "found orphan"}'
rm -f "$T_D0/file"
echo "== file deletion cleans up index entry"
touch "$T_D0/file2"
INO=$(stat -c "%i" "$T_D0/file2")
setfattr -n scoutfs.hide.indx.test.$MAJOR.$MINOR "$T_D0/file2"
read_xattr_index 0 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'") {print "found before delete"}'
rm -f "$T_D0/file2"
read_xattr_index 0 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'") {print "found orphan after delete"}'
echo "== multiple indx xattrs on one file cleaned up by deletion"
touch "$T_D0/file3"
INO=$(stat -c "%i" "$T_D0/file3")
setfattr -n scoutfs.hide.indx.a.$MAJOR.200 "$T_D0/file3"
setfattr -n scoutfs.hide.indx.b.$MAJOR.300 "$T_D0/file3"
BEFORE=$(read_xattr_index 0 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'")' | wc -l)
echo "entries before delete: $BEFORE"
rm -f "$T_D0/file3"
AFTER=$(read_xattr_index 0 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'")' | wc -l)
echo "entries after delete: $AFTER"
echo "== partial removal leaves other entries"
touch "$T_D0/partial"
INO=$(stat -c "%i" "$T_D0/partial")
setfattr -n scoutfs.hide.indx.a.$MAJOR.200 "$T_D0/partial"
setfattr -n scoutfs.hide.indx.b.$MAJOR.300 "$T_D0/partial"
setfattr -x scoutfs.hide.indx.a.$MAJOR.200 "$T_D0/partial"
read_xattr_index 0 $MAJOR.200.0 $MAJOR.200.-1 | awk '($3 == "'$INO'") {print "200 found"}'
read_xattr_index 0 $MAJOR.300.0 $MAJOR.300.-1 | awk '($3 == "'$INO'") {print "300 found"}'
rm -f "$T_D0/partial"
echo "== multiple files at same index position"
touch "$T_D0/multi_a" "$T_D0/multi_b"
INO_A=$(stat -c "%i" "$T_D0/multi_a")
INO_B=$(stat -c "%i" "$T_D0/multi_b")
setfattr -n scoutfs.hide.indx.test.$MAJOR.$MINOR "$T_D0/multi_a"
setfattr -n scoutfs.hide.indx.test.$MAJOR.$MINOR "$T_D0/multi_b"
COUNT=$(read_xattr_index 0 $MAJOR.$MINOR.0 $MAJOR.$MINOR.-1 | wc -l)
echo "files at same position: $COUNT"
rm -f "$T_D0/multi_a"
read_xattr_index 0 $MAJOR.$MINOR.0 $MAJOR.$MINOR.-1 | awk '($3 == "'$INO_A'") {print "deleted file still found"}'
read_xattr_index 0 $MAJOR.$MINOR.0 $MAJOR.$MINOR.-1 | awk '($3 == "'$INO_B'") {print "surviving file found"}'
rm -f "$T_D0/multi_b"
echo "== cross-mount visibility"
touch "$T_D0/file4"
INO=$(stat -c "%i" "$T_D0/file4")
setfattr -n scoutfs.hide.indx.test.$MAJOR.$MINOR "$T_D0/file4"
read_xattr_index 1 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'") {print "found on mount 1"}'
rm -f "$T_D0/file4"
read_xattr_index 1 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'") {print "found orphan on mount 1"}'
echo "== duplicate position deduplication"
touch "$T_D0/file5"
INO=$(stat -c "%i" "$T_D0/file5")
setfattr -n scoutfs.hide.indx.aa.$MAJOR.$MINOR "$T_D0/file5"
setfattr -n scoutfs.hide.indx.bb.$MAJOR.$MINOR "$T_D0/file5"
COUNT=$(read_xattr_index 0 $MAJOR.0.0 $MAJOR.-1.-1 | awk '($3 == "'$INO'")' | wc -l)
echo "entries for same position: $COUNT"
rm -f "$T_D0/file5"
t_pass


@@ -0,0 +1,70 @@
#
# Regression for the BUG_ON in scoutfs_quota_invalidate when a concurrent
# ruleset read on one mount races with a quota rule modification.
#
t_require_mounts 2
TEST_UID=22222
SET_UID="--ruid=$TEST_UID --euid=$TEST_UID"
echo "== setup"
mkdir -p "$T_D0/dir"
chown --quiet $TEST_UID "$T_D0/dir"
# totl xattr gives quota checks something to consult
setfattr -n scoutfs.totl.test.1.1.1 -v 1 "$T_D0/dir"
echo "== concurrent quota mod and check across mounts"
(
for i in $(seq 1 20); do
scoutfs quota-add -p "$T_M0" \
-r "1 1,L,- 1,L,- $i,L,- I 999999 -" 2>/dev/null
scoutfs quota-del -p "$T_M0" \
-r "1 1,L,- 1,L,- $i,L,- I 999999 -" 2>/dev/null
done
) &
MOD_PID=$!
# same mount as the mod: races local read against invalidate
(
for i in $(seq 1 50); do
setpriv $SET_UID touch "$T_D0/dir/race0_$i" 2>/dev/null
rm -f "$T_D0/dir/race0_$i"
done
) &
CHECK0_PID=$!
# other mount: drives cross-node lock traffic
(
for i in $(seq 1 50); do
setpriv $SET_UID touch "$T_D1/dir/race1_$i" 2>/dev/null
rm -f "$T_D1/dir/race1_$i"
done
) &
CHECK1_PID=$!
t_quiet wait $MOD_PID
t_quiet wait $CHECK0_PID
t_quiet wait $CHECK1_PID
echo "== verify quota rules are consistent after race"
scoutfs quota-wipe -p "$T_M0"
scoutfs quota-list -p "$T_M0"
echo "== verify file creation still works under quota"
scoutfs quota-add -p "$T_M0" -r "1 1,L,- 1,L,- 1,L,- I 999999 -"
sync
echo 1 > $(t_debugfs_path)/drop_weak_item_cache
echo 1 > $(t_debugfs_path)/drop_quota_check_cache
setpriv $SET_UID touch "$T_D0/dir/verify_file"
test -f "$T_D1/dir/verify_file" && echo "file visible on mount 1"
rm -f "$T_D0/dir/verify_file"
scoutfs quota-wipe -p "$T_M0"
echo "== cleanup"
setfattr -x scoutfs.totl.test.1.1.1 "$T_D0/dir"
rm -rf "$T_D0/dir"
t_pass


@@ -0,0 +1,43 @@
#
# Exercise the SCOUTFS_IOC_INJECT_TOTL_DELTA ioctl that injects totl
# deltas directly via totl-delta-inject(1).
#
t_require_commands setfattr scoutfs sync rm touch totl-delta-inject
# force a log merge then read-xattr-totals filtered to our own keys
read_totals()
{
t_force_log_merge
sync
echo 1 > $(t_debugfs_path)/drop_weak_item_cache
scoutfs read-xattr-totals -p "$T_M0" | \
grep -E '^8888\.' || true
}
echo "== setup three files contributing to totl 8888.0.0"
touch "$T_D0/f1" "$T_D0/f2" "$T_D0/f3"
setfattr -n scoutfs.totl.inj.8888.0.0 -v 2 "$T_D0/f1"
setfattr -n scoutfs.totl.inj.8888.0.0 -v 8 "$T_D0/f2"
setfattr -n scoutfs.totl.inj.8888.0.0 -v 32 "$T_D0/f3"
echo "== merge baseline into fs_root"
read_totals
echo "== inject (+128, +2) unbalances totl 8888.0.0"
totl-delta-inject "$T_M0" 8888.0.0 128 2
read_totals
echo "== unlink f3 (value 32) produces a -32/-1 delta"
rm -f "$T_D0/f3"
read_totals
echo "== inject (-128, -2) restores accounting for the remaining files"
totl-delta-inject "$T_M0" 8888.0.0 -128 -2
read_totals
echo "== cleanup"
rm -f "$T_D0/f1" "$T_D0/f2"
read_totals
t_pass


@@ -0,0 +1,50 @@
#
# Test that merge_read_item() correctly updates the sequence number when
# combining delta items from multiple finalized log trees. Each mount
# sets a totl value in its own 3-bit lane (powers of 8) so that any
# double-counting overflows the lane and is caught by: or(v, exp) != exp.
#
t_require_commands setfattr scoutfs
t_require_mounts 5
echo "== setup"
for nr in $(t_fs_nrs); do
d=$(eval echo \$T_D$nr)
for i in $(seq 1 2500); do : > "$d/f$nr$i"; done
done
sync
t_force_log_merge
vals=(1 8 64 512 4096)
expected=4681
n=0
for nr in $(t_fs_nrs); do
d=$(eval echo \$T_D$nr)
v=${vals[$((n++))]}
for i in $(seq 1 2500); do
setfattr -n "scoutfs.totl.t.$i.0.0" -v $v "$d/f$nr$i"
done
done
t_trigger_arm_silent log_merge_force_partial $(t_server_nr)
bad="$T_TMPDIR/bad"
for nr in $(t_fs_nrs); do
( while true; do
echo 1 > "$(t_debugfs_path $nr)/drop_weak_item_cache"
scoutfs read-xattr-totals -p "$(eval echo \$T_M$nr)" | \
awk -F'[ =,]+' -v e=$expected 'or($2+0,e) != e'
done ) >> "$bad" &
done
echo "expected $expected"
t_force_log_merge
t_silent_kill $(jobs -p)
test -s "$bad" && echo "double-counted:" && cat "$bad"
echo "== cleanup"
for nr in $(t_fs_nrs); do
find "$(eval echo \$T_D$nr)" -name "f$nr*" -delete
done
t_pass
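
The lane encoding described in the header comment of this test can be checked with a few lines of C. This is a sketch for illustration only: `lanes_expected()` and `lanes_overcounted()` are hypothetical names, and the second one mirrors the `or($2+0,e) != e` expression in the awk filter. Each mount contributes one unit in its own 3-bit lane, so five mounts are expected to sum to 1 + 8 + 64 + 512 + 4096 = 4681, and any lane counted twice sets a bit that is absent from that value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LANE_BITS 3	/* each mount owns one 3-bit lane, so lanes are powers of 8 */

/* Expected total when every one of nr_lanes lanes holds exactly one unit. */
static uint64_t lanes_expected(unsigned int nr_lanes)
{
	uint64_t exp = 0;
	unsigned int i;

	assert(nr_lanes * LANE_BITS <= 63);	/* keep every shift within 64 bits */

	for (i = 0; i < nr_lanes; i++)
		exp |= UINT64_C(1) << (i * LANE_BITS);

	return exp;
}

/*
 * True if v has any bit set that the expected value lacks.  A lane
 * that was counted twice holds 2 rather than 1, which sets the next
 * bit up in that lane and fails this check.  A total that is missing
 * a lane is a subset of the expected bits and is not flagged.
 */
static bool lanes_overcounted(uint64_t v, uint64_t exp)
{
	return (v | exp) != exp;
}
```

For five lanes the expected value is 4681. A total of 4617, which is missing the third mount's 64, passes the check because a reader may simply not see every contribution yet, while 4689, where the second mount's 8 was applied twice, is reported as overcounted.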