Compare commits

..

22 Commits

Author SHA1 Message Date
Auke Kok
9ab925b903 Clear ref_blkno output when block is already dirty
block_dirty_ref() skipped setting *ref_blkno when the block was
already dirty, leaving the caller with a stale value passed by
reference.

dirty_alloc_blocks() calls it twice but when the referenced block
is already dirty, it will receive the uninitialized stack value back,
and then adding it freed list with list_block_add() later.

Set it to 0 on the already-dirty fast path so callers do not try to
free a random block.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-05-12 15:10:47 -07:00
Zach Brown
fece0a9372 Merge pull request #310 from versity/zab/v1.31
v1.31 Release
2026-05-06 10:37:07 -07:00
Zach Brown
aa432727f2 v1.31 Release
Finish the release notes for the 1.31 release.

Signed-off-by: Zach Brown <zab@versity.com>
2026-05-05 14:29:18 -07:00
Zach Brown
ceebadd139 Merge pull request #308 from versity/auke/totl-delta-repair
totl key repair
2026-05-05 13:05:57 -07:00
Zach Brown
4b4ddc9ded Merge pull request #298 from versity/auke/double_unlock_dw_truncate
Fix double unlock in scoutfs_setattr data_wait error path
2026-05-04 09:52:29 -07:00
Zach Brown
94d3ece590 Merge pull request #299 from versity/auke/cond_resched_block_free
Add cond_resched in block_free_work
2026-05-04 09:49:43 -07:00
Auke Kok
6d5517614b Fix double unlock in scoutfs_setattr data_wait error path
When scoutfs_setattr truncates a file with offline extents, it unlocks
the inode lock before calling scoutfs_data_wait to wait for the data
to be staged. If data_wait returns any error, the code jumps to 'goto
out' which calls scoutfs_unlock again, thus double-unlocking the lock.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-05-04 09:48:54 -07:00
Auke Kok
10279d0b23 Add test exercising the totl delta inject ioctl.
Skews a totl twice, restore it, and intersperse setfattr/unlink to
exercise both injected and naturally-produced deltas.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-05-04 09:43:01 -07:00
Zach Brown
443c34309f Merge pull request #303 from versity/auke/clang_build_werr
3 minor clang things
2026-05-04 09:42:43 -07:00
Auke Kok
5c81a979d5 Add SCOUTFS_IOC_INJECT_TOTL_DELTA ioctl.
Inject a signed (total, count) delta at a totl key.  No validity
checking.  Requires CAP_SYS_ADMIN.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-05-04 09:42:42 -07:00
Zach Brown
ec38b6e1c8 Merge pull request #305 from versity/auke/block_submit_bio_err
Set BLOCK_BIT_ERROR on bio submit failure during forced unmount
2026-05-04 09:35:43 -07:00
Zach Brown
8e0066b231 Merge pull request #309 from versity/auke/quota_invalidate_race
fix and test - quota invalidate race
2026-05-04 09:34:26 -07:00
Zach Brown
a0fda5b735 Merge pull request #307 from versity/zab/next_merge_range_zero
Search all merge range items for next
2026-05-04 09:29:54 -07:00
Auke Kok
fc56a69d8f Add quota invalidate race regression test
Run concurrent quota add/del on one mount against rapid file
creation and deletion on both mounts to exercise the race fixed
in the previous commit.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-05-02 13:19:31 -07:00
Auke Kok
c8bc42ccdb Fix quota invalidate race with concurrent ruleset read
A quota check holds the quota cluster lock for READ and marks the
cached ruleset EBUSY while loading rules.  A quota mod on the same
mount holds the lock for WRITE (compatible with the local READ)
and calls scoutfs_quota_invalidate(), tripping
BUG_ON(rs == ERR_PTR(-EBUSY)).

Make invalidate skip EBUSY so the reader's claim is preserved, and
have scoutfs_quota_mod_rule wait for the reader to finish before
calling invalidate.  Without the wait, the in-flight reader would
publish its stale ruleset after invalidate runs, leaving the cache
stale until the next invalidation.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-05-02 13:19:31 -07:00
Zach Brown
4db0a48fe4 Search all merge range items for next
When searching for the next least merge range we need to sweep all the
stored items because they're interleaved with respect to key sorting
because we've clobbered the zone.

To search all of them we need to start from 0, not from the caller's
start key after setting the zone.  If the caller happens to provide a
start key with a small zone but large other fields (totl keys with
sufficiently large identifiers) we can miss ranges.

Signed-off-by: Zach Brown <zab@zabbo.net>
2026-04-29 10:17:38 -07:00
Auke Kok
ac1ab8e87f Add cond_resched in block_free_work
I'm seeing consistent CPU soft lockups in block_free_work on
my bare metal system that aren't reached by VM instances. The
reason is that the bare metal machine has a ton more memory
available causing the block free work queue to grow much
larger in size, and then it has so much work that it can take 30+
seconds before it goes through it all.

This is all with a debug kernel. A non debug kernel will likely
zoom through the outstanding work here at a much faster rate.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-22 13:39:32 -07:00
Zach Brown
af31b9f1e8 Merge pull request #306 from versity/zab/v1.30
v1.30 Release
2026-04-22 10:43:17 -07:00
Auke Kok
8bfd35db0b Set BLOCK_BIT_ERROR on bio submit failure during forced unmount
block_submit_bio will return -ENOLINK if called during a forced
shutdown, the bio is never submitted, and thus no completion callback
will fire to set BLOCK_BIT_ERROR. Any other task waiting for this
specific bp will end up waiting forever.

To fix, fall through to the existing block_end_io call on the
error path instead of returning directly.  That means moving
the forcing_unmount check past the setup calls so block_end_io's
bookkeeping stays balanced. block_end_io then sets BLOCK_BIT_ERROR
and wakes up waiters just as it would on a failed async completion.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-20 17:01:12 -07:00
Auke Kok
019125d86d Don't swallow invalid message error
A malformed message encountered here increases the counter, but doesn't
tear down the connection because of the nested for loops. The comments
indicate that that is the expected behavior - a misbehaving client
should not be tolerated.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-15 17:02:40 -07:00
Auke Kok
347e27acec Fix leak in client side lock invalidation
Clang's scan-build found this leak when we get an invalidation
for a lock we no longer have. Free ireq to fix.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-15 16:35:10 -07:00
Auke Kok
3ce5d47f2c Initialize resp_data to silence clang uninitialized warning
Clang flow analysis flags resp_data in process_response as possibly
uninitialized when find_request returns NULL.

  kmod/src/net.c:533:6: error: variable 'resp_data' is used uninitialized
  whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]

In practice the read is harmless because resp_func stays NULL in that
path and call_resp_func only dereferences resp_data when resp_func is
non-NULL. Initialize at declaration.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-04-15 14:06:46 -07:00
17 changed files with 355 additions and 18 deletions

View File

@@ -1,6 +1,21 @@
Versity ScoutFS Release Notes
=============================
---
v1.31
\
*May 5, 2026*
Fix race between modifying quota rules and internal reading of the rules
that tripped an assertion.
Fix a bug that could skip merging totl items under specific heavy write
loads. This could lead to merged totl items incorrectly tracking the
sum of all the contributing totl xattrs.
Fix many small low risk bugs in error paths that were found with code
analysis and testing.
---
v1.30
\

View File

@@ -218,6 +218,7 @@ static void block_free_work(struct work_struct *work)
llist_for_each_entry_safe(bp, tmp, deleted, free_node) {
block_free(sb, bp);
cond_resched();
}
}
@@ -467,9 +468,6 @@ static int block_submit_bio(struct super_block *sb, struct block_private *bp,
sector_t sector;
int ret = 0;
if (scoutfs_forcing_unmount(sb))
return -ENOLINK;
sector = bp->bl.blkno << (SCOUTFS_BLOCK_LG_SHIFT - 9);
WARN_ON_ONCE(bp->bl.blkno == U64_MAX);
@@ -480,6 +478,17 @@ static int block_submit_bio(struct super_block *sb, struct block_private *bp,
set_bit(BLOCK_BIT_IO_BUSY, &bp->bits);
block_get(bp);
/*
* A second thread may already be waiting on this block's completion
* after this thread won the race to submit the block. We exit through
* the block_end_io error path which sets BLOCK_BIT_ERROR and assures
* that other callers in the waitq get woken up.
*/
if (scoutfs_forcing_unmount(sb)) {
ret = -ENOLINK;
goto end_io;
}
blk_start_plug(&plug);
for (off = 0; off < SCOUTFS_BLOCK_LG_SIZE; off += PAGE_SIZE) {
@@ -517,6 +526,7 @@ static int block_submit_bio(struct super_block *sb, struct block_private *bp,
blk_finish_plug(&plug);
end_io:
/* let racing end_io know we're done */
block_end_io(sb, opf, bp, ret);
@@ -836,6 +846,8 @@ int scoutfs_block_dirty_ref(struct super_block *sb, struct scoutfs_alloc *alloc,
bp = BLOCK_PRIVATE(bl);
if (block_is_dirty(bp)) {
if (ref_blkno)
*ref_blkno = 0;
ret = 0;
goto out;
}

View File

@@ -549,6 +549,7 @@ retry:
goto out;
if (scoutfs_data_wait_found(&dw)) {
scoutfs_unlock(sb, lock, SCOUTFS_LOCK_WRITE);
lock = NULL;
/* XXX callee locks instead? */
inode_unlock(inode);

View File

@@ -1739,6 +1739,43 @@ out:
return ret;
}
static long scoutfs_ioc_inject_totl_delta(struct file *file, unsigned long arg)
{
struct super_block *sb = file_inode(file)->i_sb;
struct scoutfs_ioctl_inject_totl_delta __user *uitd = (void __user *)arg;
struct scoutfs_ioctl_inject_totl_delta itd;
struct scoutfs_xattr_totl_val tval;
struct scoutfs_lock *lock = NULL;
struct scoutfs_key key;
int ret;
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
if (copy_from_user(&itd, uitd, sizeof(itd)))
return -EFAULT;
scoutfs_xattr_init_totl_key(&key, itd.name);
tval.total = cpu_to_le64((u64)itd.total);
tval.count = cpu_to_le64((u64)itd.count);
ret = scoutfs_lock_xattr_totl(sb, SCOUTFS_LOCK_WRITE_ONLY, 0, &lock);
if (ret < 0)
goto out;
ret = scoutfs_hold_trans(sb, true);
if (ret < 0)
goto unlock;
ret = scoutfs_item_delta(sb, &key, &tval, sizeof(tval), lock);
scoutfs_release_trans(sb);
unlock:
scoutfs_unlock(sb, lock, SCOUTFS_LOCK_WRITE_ONLY);
out:
return ret;
}
long scoutfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
switch (cmd) {
@@ -1790,6 +1827,8 @@ long scoutfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
return scoutfs_ioc_read_xattr_index(file, arg);
case SCOUTFS_IOC_PUNCH_OFFLINE:
return scoutfs_ioc_punch_offline(file, arg);
case SCOUTFS_IOC_INJECT_TOTL_DELTA:
return scoutfs_ioc_inject_totl_delta(file, arg);
}
return -ENOTTY;

View File

@@ -876,4 +876,17 @@ struct scoutfs_ioctl_punch_offline {
#define SCOUTFS_IOC_PUNCH_OFFLINE \
_IOW(SCOUTFS_IOCTL_MAGIC, 24, struct scoutfs_ioctl_punch_offline)
/*
* Inject a signed (total, count) delta at the totl key @name (a, b, c
* match the trailing dotted u64s of a totl xattr name).
*/
struct scoutfs_ioctl_inject_totl_delta {
__u64 name[SCOUTFS_IOCTL_XATTR_TOTAL_NAME_NR];
__s64 total;
__s64 count;
};
#define SCOUTFS_IOC_INJECT_TOTL_DELTA \
_IOW(SCOUTFS_IOCTL_MAGIC, 25, struct scoutfs_ioctl_inject_totl_delta)
#endif

View File

@@ -813,6 +813,7 @@ int scoutfs_lock_invalidate_request(struct super_block *sb, u64 net_id,
out:
if (!lock) {
kfree(ireq);
ret = scoutfs_client_lock_response(sb, net_id, nl);
BUG_ON(ret); /* lock server doesn't fence timed out client requests */
}

View File

@@ -525,7 +525,7 @@ static int process_response(struct scoutfs_net_connection *conn,
struct super_block *sb = conn->sb;
struct message_send *msend;
scoutfs_net_response_t resp_func = NULL;
void *resp_data;
void *resp_data = NULL;
spin_lock(&conn->lock);
@@ -804,7 +804,7 @@ static void scoutfs_net_recv_worker(struct work_struct *work)
if (invalid_message(conn, nh)) {
scoutfs_inc_counter(sb, net_recv_invalid_message);
ret = -EBADMSG;
break;
goto out;
}
data_len = le16_to_cpu(nh->data_len);

View File

@@ -1114,6 +1114,7 @@ int scoutfs_quota_mod_rule(struct super_block *sb, bool is_add,
goto release;
}
wait_event(qtinf->waitq, !ruleset_is_busy(qtinf));
scoutfs_quota_invalidate(sb);
ret = 0;
@@ -1142,12 +1143,17 @@ void scoutfs_quota_get_lock_range(struct scoutfs_key *start, struct scoutfs_key
}
/*
* This is called during cluster lock invalidation to indicate that the
* ruleset is no longer protected by cluster locking and might have been
* modified. We mark the ruleset invalid and free it once all readers
* drain. The next check will acquire the cluster lock and read the
* rules. Because this is called during invalidation this is serialized
* with write holders of cluster locks so we can never see -EBUSY here.
* Mark the cached ruleset invalid and free the previous one once readers
* drain. Called from cluster lock invalidation and from quota rule
* modification.
*
* Cluster lock invalidation runs only after the lock layer has drained
* local READ users. Since EBUSY is set only while a reader holds READ,
* the reader has already published by the time we run.
*
* Quota rule modification waits on the waitq for any in-flight reader
* to publish before calling here, so the next check rebuilds against
* the newly written rules rather than the reader's stale result.
*/
void scoutfs_quota_invalidate(struct super_block *sb)
{
@@ -1161,13 +1167,10 @@ void scoutfs_quota_invalidate(struct super_block *sb)
spin_lock(&qtinf->lock);
rs = rcu_dereference_protected(qtinf->ruleset, lockdep_is_held(&qtinf->lock));
if (rs != ERR_PTR(-EINVAL))
if (rs == ERR_PTR(-ENOENT) || !IS_ERR(rs))
rcu_assign_pointer(qtinf->ruleset, ERR_PTR(-EINVAL));
spin_unlock(&qtinf->lock);
/* cluster locking should have prevented this */
BUG_ON(rs == ERR_PTR(-EBUSY));
if (!IS_ERR(rs))
call_rcu(&rs->rcu, free_ruleset_rcu);

View File

@@ -1077,8 +1077,7 @@ static int next_log_merge_range(struct super_block *sb, struct scoutfs_btree_roo
struct scoutfs_key key;
int ret;
key = *start;
key.sk_zone = SCOUTFS_LOG_MERGE_RANGE_ZONE;
init_log_merge_key(&key, SCOUTFS_LOG_MERGE_RANGE_ZONE, 0, 0);
scoutfs_key_set_ones(&rng->start);
do {

1
tests/.gitignore vendored
View File

@@ -12,3 +12,4 @@ src/o_tmpfile_umask
src/o_tmpfile_linkat
src/mmap_stress
src/mmap_validate
src/totl-delta-inject

View File

@@ -15,7 +15,8 @@ BIN := src/createmany \
src/o_tmpfile_umask \
src/o_tmpfile_linkat \
src/mmap_stress \
src/mmap_validate
src/mmap_validate \
src/totl-delta-inject
DEPS := $(wildcard src/*.d)

View File

@@ -0,0 +1,6 @@
== setup
== concurrent quota mod and check across mounts
== verify quota rules are consistent after race
== verify file creation still works under quota
file visible on mount 1
== cleanup

View File

@@ -0,0 +1,10 @@
== setup three files contributing to totl 8888.0.0
== merge baseline into fs_root
8888.0.0 = 42, 3
== inject (+128, +2) unbalances totl 8888.0.0
8888.0.0 = 170, 5
== unlink f3 (value 32) produces a -32/-1 delta
8888.0.0 = 138, 4
== inject (-128, -2) restores accounting for the remaining files
8888.0.0 = 10, 2
== cleanup

View File

@@ -29,6 +29,8 @@ totl-xattr-tag.sh
basic-xattr-indx.sh
quota.sh
totl-merge-read.sh
quota-invalidate-race.sh
totl-delta-inject.sh
lock-refleak.sh
lock-shrink-consistency.sh
lock-shrink-read-race.sh

View File

@@ -0,0 +1,121 @@
/*
* Test helper that calls SCOUTFS_IOC_INJECT_TOTL_DELTA to seed
* arbitrary totl deltas.
*
* Copyright (C) 2026 Versity Software, Inc. All rights reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public
* License v2 as published by the Free Software Foundation.
*/
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <inttypes.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/types.h>
#include "ioctl.h"
static void usage(const char *prog)
{
fprintf(stderr,
"Usage: %s <mountpoint> <a>.<b>.<c> <total> <count>\n",
prog);
exit(2);
}
static int parse_s64(const char *s, int64_t *out)
{
char *end;
int64_t v;
errno = 0;
v = strtoll(s, &end, 0);
if (errno || *end != '\0' || end == s)
return -1;
*out = v;
return 0;
}
/*
* Parse "<a>.<b>.<c>" into abc[0..2] (skxt_a, skxt_b, skxt_c). Each
* component must be a non-empty unsigned base-0 integer.
*/
static int parse_dotted_name(const char *s, uint64_t abc[3])
{
const char *p = s;
char *end;
int i;
for (i = 0; i < 3; i++) {
if (*p == '\0' || *p == '.')
return -1;
errno = 0;
abc[i] = strtoull(p, &end, 0);
if (errno || end == p)
return -1;
if (i < 2) {
if (*end != '.')
return -1;
p = end + 1;
} else {
if (*end != '\0')
return -1;
}
}
return 0;
}
int main(int argc, char **argv)
{
struct scoutfs_ioctl_inject_totl_delta itd = {{0,}};
uint64_t abc[3];
int64_t total, count;
int fd;
int ret;
if (argc != 5)
usage(argv[0]);
if (parse_dotted_name(argv[2], abc) ||
parse_s64(argv[3], &total) ||
parse_s64(argv[4], &count)) {
fprintf(stderr, "could not parse arguments\n");
usage(argv[0]);
}
itd.name[0] = abc[0];
itd.name[1] = abc[1];
itd.name[2] = abc[2];
itd.total = total;
itd.count = count;
fd = open(argv[1], O_RDONLY | O_DIRECTORY);
if (fd < 0) {
fprintf(stderr, "open(%s): %s\n", argv[1], strerror(errno));
return 1;
}
ret = ioctl(fd, SCOUTFS_IOC_INJECT_TOTL_DELTA, &itd);
if (ret < 0) {
fprintf(stderr,
"INJECT_TOTL_DELTA(%" PRIu64 ".%" PRIu64 ".%" PRIu64
", total=%" PRId64 ", count=%" PRId64 "): %s\n",
abc[0], abc[1], abc[2], total, count, strerror(errno));
close(fd);
return 1;
}
close(fd);
return 0;
}

View File

@@ -0,0 +1,70 @@
#
# Regression for the BUG_ON in scoutfs_quota_invalidate when a concurrent
# ruleset read on one mount races with a quota rule modification.
#
t_require_mounts 2
TEST_UID=22222
SET_UID="--ruid=$TEST_UID --euid=$TEST_UID"
echo "== setup"
mkdir -p "$T_D0/dir"
chown --quiet $TEST_UID "$T_D0/dir"
# totl xattr gives quota checks something to consult
setfattr -n scoutfs.totl.test.1.1.1 -v 1 "$T_D0/dir"
echo "== concurrent quota mod and check across mounts"
(
for i in $(seq 1 20); do
scoutfs quota-add -p "$T_M0" \
-r "1 1,L,- 1,L,- $i,L,- I 999999 -" 2>/dev/null
scoutfs quota-del -p "$T_M0" \
-r "1 1,L,- 1,L,- $i,L,- I 999999 -" 2>/dev/null
done
) &
MOD_PID=$!
# same mount as the mod: races local read against invalidate
(
for i in $(seq 1 50); do
setpriv $SET_UID touch "$T_D0/dir/race0_$i" 2>/dev/null
rm -f "$T_D0/dir/race0_$i"
done
) &
CHECK0_PID=$!
# other mount: drives cross-node lock traffic
(
for i in $(seq 1 50); do
setpriv $SET_UID touch "$T_D1/dir/race1_$i" 2>/dev/null
rm -f "$T_D1/dir/race1_$i"
done
) &
CHECK1_PID=$!
t_quiet wait $MOD_PID
t_quiet wait $CHECK0_PID
t_quiet wait $CHECK1_PID
echo "== verify quota rules are consistent after race"
scoutfs quota-wipe -p "$T_M0"
scoutfs quota-list -p "$T_M0"
echo "== verify file creation still works under quota"
scoutfs quota-add -p "$T_M0" -r "1 1,L,- 1,L,- 1,L,- I 999999 -"
sync
echo 1 > $(t_debugfs_path)/drop_weak_item_cache
echo 1 > $(t_debugfs_path)/drop_quota_check_cache
setpriv $SET_UID touch "$T_D0/dir/verify_file"
test -f "$T_D1/dir/verify_file" && echo "file visible on mount 1"
rm -f "$T_D0/dir/verify_file"
scoutfs quota-wipe -p "$T_M0"
echo "== cleanup"
setfattr -x scoutfs.totl.test.1.1.1 "$T_D0/dir"
rm -rf "$T_D0/dir"
t_pass

View File

@@ -0,0 +1,43 @@
#
# Exercise the SCOUTFS_IOC_INJECT_TOTL_DELTA ioctl that injects totl
# deltas directly via totl-delta-inject(1).
#
t_require_commands setfattr scoutfs sync rm touch totl-delta-inject
# force a log merge then read-xattr-totals filtered to our own keys
read_totals()
{
t_force_log_merge
sync
echo 1 > $(t_debugfs_path)/drop_weak_item_cache
scoutfs read-xattr-totals -p "$T_M0" | \
grep -E '^8888\.' || true
}
echo "== setup three files contributing to totl 8888.0.0"
touch "$T_D0/f1" "$T_D0/f2" "$T_D0/f3"
setfattr -n scoutfs.totl.inj.8888.0.0 -v 2 "$T_D0/f1"
setfattr -n scoutfs.totl.inj.8888.0.0 -v 8 "$T_D0/f2"
setfattr -n scoutfs.totl.inj.8888.0.0 -v 32 "$T_D0/f3"
echo "== merge baseline into fs_root"
read_totals
echo "== inject (+128, +2) unbalances totl 8888.0.0"
totl-delta-inject "$T_M0" 8888.0.0 128 2
read_totals
echo "== unlink f3 (value 32) produces a -32/-1 delta"
rm -f "$T_D0/f3"
read_totals
echo "== inject (-128, -2) restores accounting for the remaining files"
totl-delta-inject "$T_M0" 8888.0.0 -128 -2
read_totals
echo "== cleanup"
rm -f "$T_D0/f1" "$T_D0/f2"
read_totals
t_pass