Compare commits


23 Commits

Author SHA1 Message Date
Chris Kirby
417261a051 Do orphaned inode extent freeing in chunks
Fix an issue where the final freeing of extents for unlinked files
could trigger spurious hung task timeout warnings. Instead of holding the
scoutfs inode lock for the entire duration of extent freeing, release and
reacquire the lock every time the transaction sequence number changes.

This is only used in the evict and orphan inode cleanup paths; truncate
and release continue to free during a single lock hold.

Remove the hung_task_timeout_secs workaround from the large-fragmented-free
test script.

Signed-off-by: Chris Kirby <ckirby@versity.com>
2025-06-11 13:24:11 -05:00
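
A condensed userspace sketch of the pattern described above (illustrative names only, not the scoutfs kernel APIs; the real change is in the scoutfs_data_truncate_items and delete_inode_items hunks later in this compare): the freeing loop samples the transaction sequence up front and returns -EINPROGRESS once the sequence changes, so the caller can drop and retake the inode lock between chunks.

```
/*
 * Illustrative userspace sketch only -- not the scoutfs kernel code.
 * The freeing loop samples a transaction sequence at the start and
 * bails out with -EINPROGRESS once it changes, so the caller can drop
 * and retake the lock between chunks instead of holding it for the
 * entire free.
 */
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t ino_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long trans_seq;

/* stand-in for scoutfs_trans_sample_seq() */
static unsigned long sample_seq(void)
{
	return trans_seq;
}

/* free extents starting at *next, pausing when the transaction rolls over */
static int free_extent_chunk(unsigned long *next, unsigned long last)
{
	unsigned long start_seq = sample_seq();

	while (*next <= last) {
		/* ... free one batch of extents at *next ... */
		if (++(*next) % 256 == 0)
			trans_seq++;		/* simulate a commit finishing */

		if (sample_seq() != start_seq)
			return -EINPROGRESS;	/* more to do, but let go of the lock */
	}
	return 0;
}

int main(void)
{
	unsigned long next = 0;
	int ret;

	do {
		pthread_mutex_lock(&ino_lock);
		ret = free_extent_chunk(&next, 1000);
		pthread_mutex_unlock(&ino_lock);
		/* other mounts can grab the inode lock here between chunks */
	} while (ret == -EINPROGRESS);

	printf("done, next block %lu\n", next);
	return 0;
}
```
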
Zach Brown
9741d40e10 Merge pull request #229 from versity/zab/v1.25
v1.25 Release
2025-06-04 11:21:25 -07:00
Zach Brown
48ac7bdf7c v1.25 Release
Finish the release notes for the 1.25 release.

Signed-off-by: Zach Brown <zab@versity.com>
2025-06-03 13:35:42 -07:00
Zach Brown
7865ee9f54 Merge pull request #223 from versity/auke/el9_5_wmaybe-uninit
Fix -Wmaybe-uninitialized since rhel9.5
2025-05-12 12:21:02 -07:00
Zach Brown
624eb128c6 Merge pull request #221 from versity/auke/enospc-test
Give enospc test more time to commit unlink.
2025-05-09 11:27:04 -07:00
Zach Brown
091eb3b683 Merge pull request #219 from versity/auke/fix-tests-failing-dirty-test-dirs
Fix test cases that don't run cleanly in a semi-dirty env.
2025-05-09 11:17:24 -07:00
Zach Brown
04e8cc6295 Merge pull request #220 from versity/auke/orphan-inodes
Extend orphan-inodes timeout.
2025-05-09 11:15:13 -07:00
Zach Brown
0f6fdb3eb5 Merge pull request #222 from versity/auke/t_kill_silent
Properly silently kill background tasks.
2025-05-09 11:11:24 -07:00
Auke Kok
2f48a606e8 Fix -Wmaybe-uninitialized since rhel9.5
Looks like the compiler isn't smart enough to see that the value is
filled in through the passed pointer, and we can easily initialize it
here.

make[1]: Entering directory '/usr/src/kernels/5.14.0-503.26.1.el9_5.x86_64'
  CC [M]  /home/auke/scoutfs/kmod/src/server.o
/home/auke/scoutfs/kmod/src/server.c: In function ‘fence_pending_recov_worker’:
/home/auke/scoutfs/kmod/src/server.c:4170:23: error: ‘addr.v4.addr’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
 4170 |                 ret = scoutfs_fence_start(sb, rid, le32_to_be32(addr.v4.addr),
      |                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 4171 |                                           SCOUTFS_FENCE_CLIENT_RECOVERY);
      |                                           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors

There's still the obvious issue that we'd intended to support IPv6,
but we disregard that here.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-05-08 15:20:50 -07:00
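
A minimal, self-contained illustration of this class of warning, not the scoutfs code: the value is only written through a pointer inside a helper, and whether the compiler can prove the later guarded read safe varies by GCC version and optimization level. Initializing at the declaration, as the patch does for `addr` in server.c, sidesteps the question.

```
/*
 * Minimal illustration of the warning class, not the scoutfs code.
 * 'out' is only written when the helper succeeds; whether GCC can
 * prove the guarded read below is safe depends on version and
 * optimization level.  Initializing at the declaration silences the
 * warning either way.
 */
#include <stdio.h>

struct v4_addr {
	unsigned int addr;
};

/* fills *out only on success */
static int lookup_addr(int key, struct v4_addr *out)
{
	if (key < 0)
		return -1;
	out->addr = 0x7f000001u;
	return 0;
}

int main(void)
{
	struct v4_addr addr = { 0 };	/* the fix: start from a known value */
	int ret = lookup_addr(1, &addr);

	if (ret == 0)	/* without the initializer, gcc may warn on this read */
		printf("addr %#x\n", addr.addr);
	return 0;
}
```
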
Auke Kok
377e49caf1 Properly silently kill background tasks.
Occasionally, we have some tests fail because these kills produce:

tests/lock-recover-invalidate.sh: line 42:  9928 Terminated

even though we expected them to be silent. In these particular cases
we don't care about this output anyway.

We borrow the silent_kill() function from orphan-inodes, promote it
to t_silent_kill() in funcs/exec.sh, and then use it wherever
appropriate.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-05-08 12:03:04 -07:00
Auke Kok
d08eb66adc Give enospc test more time to commit unlink.
The current test sequence performs the unlink and immediately tests
whether enough resources are available to create new files again, and
this consistently fails.

One of my crummy VMs takes a good 12 seconds before the `touch` actually
succeeds. We care about the filesystem eventually recovering from
ENOSPC, and we certainly don't want that to take forever, but for a
while after our first ENOSPC error and the cleanup we expect creation
to keep failing with ENOSPC.

Make the timeout 120s. As soon as the `touch` completes, exit the wait
loop.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-05-08 11:40:13 -07:00
Zach Brown
6f19d0bd36 Merge pull request #216 from versity/zab/stop_ending_dirty_data_freed
Zab/stop ending dirty data freed
2025-05-08 11:18:23 -07:00
Auke Kok
1d0cde7cc3 Clean up old test data as needed.
If subsequent test runs are done without `-m` (explicit mkfs), old
test data files may break several tests. Most failures are -EEXIST,
but there are some more subtle ones.

This change erases any existing test dir as needed just before we
run the tests, and avoids the issue entirely.

I considered a `mv dir dir.$$ && rm -rf dir.$$ &` alternative, but
that would likely interfere disproportionately with tests that do
disconnects and other things that can be affected by an unlink storm.

This has an obvious performance cost: tests will be a little slower
to start on subsequent runs. In CI, though, this is effectively a
no-op.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-05-08 10:10:01 -07:00
Auke Kok
138c7c6b49 Extend orphan-inodes timeout.
This test regularly fails in CI when the 15-second timeout elapses
and the system still hasn't completed the mount log merges and orphan
inode scans needed to unlink the test files.

Instead of just extending the timeout value, we test-and-retry for up
to 120s. This is hopefully faster in most cases; my smallest VM needs
about 6-8s on average.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-05-08 09:56:45 -07:00
Zach Brown
8aa1a98901 Merge pull request #210 from versity/auke/perf-irq-took-too-long
Filter out perf `interrupt took too long` dmesg.
2025-04-30 10:04:00 -07:00
Zach Brown
888b1394a6 Retry client commit and get log trees separately
The client transaction commit worker has a series of functions that it
calls to commit the current transaction and open the next one.  If any
of them fail, it retries all of them from the beginning each time until
they all succeed.

This pattern behaves badly since we added the strict get_trans_seq and
commit_trans_seq latching in the log_trees.  The server will only commit
the items for a get or commit request once, and will fail a commit
request if it isn't given the seq that matches the current item.

If the server gets an error it can have persisted items while sending an
error to the client.  If this error was for a get request, then the
client will retry all of its transaction write functions.  This includes
the commit request which is now using a stale seq and will fail
indefinitely.  This is visible in the server log as:

  error -5 committing client logs for rid e57e37132c919c4f: invalid log trees item get_trans_seq

The solution is to retry the commit and get phases independently.  This
way a failed get will be retried on its own without running through the
commit phase that had succeeded.  The client will eventually get the
next seq that it can then safely commit.

Signed-off-by: Zach Brown <zab@versity.com>
2025-04-29 11:46:38 -07:00
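
The shape of the fix, condensed into a standalone sketch with placeholder phase functions (the real refactor is the retry_forever()/commit_current_log_trees()/get_next_log_trees() change in the transaction worker hunk later in this compare):

```
/*
 * Standalone sketch of the retry split, not the kernel code.  The
 * point is that a failure in the "get" phase no longer re-runs a
 * commit that already succeeded with a sequence number the server has
 * since retired.
 */
#include <stdio.h>

static int commit_current(void)
{
	/* write out the current dirty transaction */
	return 0;
}

static int get_next(void)
{
	/* ask the server for the next transaction's log trees */
	return 0;
}

static int retry_until_ok(int (*phase)(void), const char *name)
{
	int ret;

	do {
		ret = phase();
		if (ret < 0)
			fprintf(stderr, "%s failed (%d), retrying\n", name, ret);
	} while (ret < 0);
	return ret;
}

int main(void)
{
	/* before: one loop retried both phases from the top on any error */
	retry_until_ok(commit_current, "commit log trees");
	retry_until_ok(get_next, "get log trees");
	return 0;
}
```
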
Zach Brown
e457694f19 Don't send dirty data_freed blocks to client
At the end of get_log_trees we can try to drain the data_freed extent
tree, which can take multiple commits.  If a commit fails then the
blocks are still dirty in memory.  We can't send references to those
blocks to the client.  We have to return an error and not send the
log_trees, like the main get_log_trees does.  The client will retry and
eventually get a log_trees that references blocks that were successfully
committed.

Signed-off-by: Zach Brown <zab@versity.com>
2025-04-29 11:46:38 -07:00
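
A small sketch of the same idea with illustrative names (the actual server-side change is in the try_drain_data_freed() hunk below): if any commit during the drain fails, propagate the error so the reply never references blocks that were left dirty in memory.

```
/*
 * Illustrative sketch, not the server code.  The drain can take
 * several commits; if one fails, the freshly dirtied blocks were never
 * written, so the handler returns the error and the client retries
 * instead of receiving references to unwritten blocks.
 */
#include <errno.h>
#include <stdio.h>

/* pretend the third commit of the drain hits an I/O error */
static int commit_one_drain_batch(int i)
{
	return i == 2 ? -EIO : 0;
}

static int drain_freed_extents(void)
{
	int ret = 0;

	for (int i = 0; i < 4 && ret == 0; i++)
		ret = commit_one_drain_batch(i);

	return ret;	/* previously the result was discarded */
}

int main(void)
{
	int ret = drain_freed_extents();

	if (ret < 0)
		printf("fail the response (%d); client will retry\n", ret);
	else
		printf("send log_trees referencing committed blocks\n");
	return 0;
}
```
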
Zach Brown
459de5b478 Merge pull request #211 from versity/auke/tapf-output
TAP formatted output.
2025-04-15 14:25:06 -07:00
Auke Kok
24031cde1d TAP formatted output.
Stored as `results/scoutfs.tap`, this file contains test results in
TAP version 14 format.

Embedded in the output is some metadata so that these files can be
aggregated and deduplicated, keyed by a UUID generated at the start of
testing. The file also captures the git ID, date, and kernel version,
as well as the (possibly altered) test sequence used.

Any test that has diff or dmesg output will be considered failed, and a
copy of the relevant data is included as comments.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-04-15 12:02:41 -07:00
Zach Brown
04cc41719c Merge pull request #209 from versity/auke/basic-truncate-yes-pipefail
Ignore pipefail alternative error when not a tty.
2025-04-14 13:15:03 -07:00
Auke Kok
1b47e9429e Filter out perf interrupt took too long dmesg.
Example:

```
[ 2469.638414] perf: interrupt took too long (2507 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
```

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-04-14 12:06:58 -07:00
Auke Kok
7ea084082d Ignore pipefail alternative error when not a tty.
This happens only with the basic-truncate test; it's the only user of
the `yes` program.

The `yes` command normally fails gracefully under the usual runs that
are attached to a terminal, but when the test script runs entirely
under something else it emits a needless error message that pollutes
the test output:

  `yes: standard output: Broken pipe`

Adjust the redirect to omit all stderr for `yes` in this case.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2025-04-14 11:13:39 -07:00
Zach Brown
f565451f76 Merge pull request #208 from versity/zab/v1.24
v1.24 Release
2025-03-17 11:18:42 -07:00
18 changed files with 309 additions and 123 deletions

View File

@@ -1,6 +1,27 @@
Versity ScoutFS Release Notes
=============================
---
v1.25
\
*Jun 3, 2025*
Fix a bug that could cause indefinite retries of failed client commits.
Under specific error conditions the client and server's understanding of
the current client commit could get out of sync. The client would
indefinitely retry commits that could never succeed. This manifested as
infinite "critical transaction commit failure" messages in the kernel
log on the client and matching "error <nr> committing client logs" on
the server.
Fix a bug in a specific case of server error handling that could result
in sending references to unwritten blocks to the client. The client
would try to read blocks that hadn't been written and return spurious
errors. This was seen under low free space conditions on the server and
resulted in error messages with error code 116 (the errno value for
ESTALE, the client's indication that it couldn't read the blocks it
expected).
---
v1.24
\

View File

@@ -296,28 +296,39 @@ static s64 truncate_extents(struct super_block *sb, struct inode *inode,
* and offline blocks. If it's not provided then the inode is being
* destroyed and isn't reachable, we don't need to update it.
*
* If 'pause' is set, then we are destroying the inode and we should take
* breaks occasionally to allow other nodes access to this inode lock shard.
*
* The caller is in charge of locking the inode and data, but we may
* have to modify far more items than fit in a transaction so we're in
* charge of batching updates into transactions. If the inode is
* provided then we're responsible for updating its item as we go.
*/
int scoutfs_data_truncate_items(struct super_block *sb, struct inode *inode,
u64 ino, u64 iblock, u64 last, bool offline,
struct scoutfs_lock *lock)
u64 ino, u64 *iblock, u64 last, bool offline,
struct scoutfs_lock *lock, bool pause)
{
struct scoutfs_inode_info *si = NULL;
LIST_HEAD(ind_locks);
u64 cur_seq;
s64 ret = 0;
WARN_ON_ONCE(inode && !inode_is_locked(inode));
/*
* If the inode is provided, then we aren't destroying it. So it's not
* safe to pause while removing items- it needs to be done in one chunk.
*/
if (WARN_ON_ONCE(pause && inode))
return -EINVAL;
/* clamp last to the last possible block? */
if (last > SCOUTFS_BLOCK_SM_MAX)
last = SCOUTFS_BLOCK_SM_MAX;
trace_scoutfs_data_truncate_items(sb, iblock, last, offline);
trace_scoutfs_data_truncate_items(sb, *iblock, last, offline, pause);
if (WARN_ON_ONCE(last < iblock))
if (WARN_ON_ONCE(last < *iblock))
return -EINVAL;
if (inode) {
@@ -325,7 +336,9 @@ int scoutfs_data_truncate_items(struct super_block *sb, struct inode *inode,
down_write(&si->extent_sem);
}
while (iblock <= last) {
cur_seq = scoutfs_trans_sample_seq(sb);
while (*iblock <= last) {
if (inode)
ret = scoutfs_inode_index_lock_hold(inode, &ind_locks, true, false);
else
@@ -339,7 +352,7 @@ int scoutfs_data_truncate_items(struct super_block *sb, struct inode *inode,
ret = 0;
if (ret == 0)
ret = truncate_extents(sb, inode, ino, iblock, last,
ret = truncate_extents(sb, inode, ino, *iblock, last,
offline, lock);
if (inode)
@@ -351,8 +364,19 @@ int scoutfs_data_truncate_items(struct super_block *sb, struct inode *inode,
if (ret <= 0)
break;
iblock = ret;
*iblock = ret;
ret = 0;
/*
* We know there's more to do because truncate_extents()
* pauses every EXTENTS_PER_HOLD extents and it returned the
* next starting block. Our caller might also want us to pause,
* which we will do whenever we cross a transaction boundary.
*/
if (pause && (cur_seq != scoutfs_trans_sample_seq(sb))) {
ret = -EINPROGRESS;
break;
}
}
if (si)

View File

@@ -47,8 +47,8 @@ int scoutfs_get_block_write(struct inode *inode, sector_t iblock, struct buffer_
int create);
int scoutfs_data_truncate_items(struct super_block *sb, struct inode *inode,
u64 ino, u64 iblock, u64 last, bool offline,
struct scoutfs_lock *lock);
u64 ino, u64 *iblock, u64 last, bool offline,
struct scoutfs_lock *lock, bool pause);
int scoutfs_data_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
u64 start, u64 len);
long scoutfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len);

View File

@@ -468,8 +468,8 @@ int scoutfs_complete_truncate(struct inode *inode, struct scoutfs_lock *lock)
start = (i_size_read(inode) + SCOUTFS_BLOCK_SM_SIZE - 1) >>
SCOUTFS_BLOCK_SM_SHIFT;
ret = scoutfs_data_truncate_items(inode->i_sb, inode,
scoutfs_ino(inode), start, ~0ULL,
false, lock);
scoutfs_ino(inode), &start, ~0ULL,
false, lock, false);
err = clear_truncate_flag(inode, lock);
return ret ? ret : err;
@@ -1635,7 +1635,8 @@ int scoutfs_inode_orphan_delete(struct super_block *sb, u64 ino, struct scoutfs_
* partial deletion until all deletion is complete and the orphan item
* is removed.
*/
static int delete_inode_items(struct super_block *sb, u64 ino, struct scoutfs_inode *sinode,
static int delete_inode_items(struct super_block *sb, u64 ino,
struct scoutfs_inode *sinode, u64 *start,
struct scoutfs_lock *lock, struct scoutfs_lock *orph_lock)
{
struct scoutfs_key key;
@@ -1654,8 +1655,8 @@ static int delete_inode_items(struct super_block *sb, u64 ino, struct scoutfs_in
/* remove data items in their own transactions */
if (S_ISREG(mode)) {
ret = scoutfs_data_truncate_items(sb, NULL, ino, 0, ~0ULL,
false, lock);
ret = scoutfs_data_truncate_items(sb, NULL, ino, start, ~0ULL,
false, lock, true);
if (ret)
goto out;
}
@@ -1803,16 +1804,23 @@ out:
*/
static int try_delete_inode_items(struct super_block *sb, u64 ino)
{
struct inode_deletion_lock_data *ldata = NULL;
struct scoutfs_lock *orph_lock = NULL;
struct scoutfs_lock *lock = NULL;
struct inode_deletion_lock_data *ldata;
struct scoutfs_lock *orph_lock;
struct scoutfs_lock *lock;
struct scoutfs_inode sinode;
struct scoutfs_key key;
bool clear_trying = false;
bool more = false;
u64 group_nr;
u64 start = 0;
int bit_nr;
int ret;
again:
ldata = NULL;
orph_lock = NULL;
lock = NULL;
ret = scoutfs_lock_ino(sb, SCOUTFS_LOCK_WRITE, 0, ino, &lock);
if (ret < 0)
goto out;
@@ -1824,11 +1832,12 @@ static int try_delete_inode_items(struct super_block *sb, u64 ino)
goto out;
/* only one local attempt per inode at a time */
if (test_and_set_bit(bit_nr, ldata->trying)) {
if (!more && test_and_set_bit(bit_nr, ldata->trying)) {
ret = 0;
goto out;
}
clear_trying = true;
more = false;
/* can't delete if it's cached in local or remote mounts */
if (scoutfs_omap_test(sb, ino) || test_bit_le(bit_nr, ldata->map.bits)) {
@@ -1853,7 +1862,15 @@ static int try_delete_inode_items(struct super_block *sb, u64 ino)
if (ret < 0)
goto out;
ret = delete_inode_items(sb, ino, &sinode, lock, orph_lock);
ret = delete_inode_items(sb, ino, &sinode, &start, lock, orph_lock);
if (ret == -EINPROGRESS) {
more = true;
clear_trying = false;
} else {
more = false;
}
out:
if (clear_trying)
clear_bit(bit_nr, ldata->trying);
@@ -1861,6 +1878,9 @@ out:
scoutfs_unlock(sb, lock, SCOUTFS_LOCK_WRITE);
scoutfs_unlock(sb, orph_lock, SCOUTFS_LOCK_WRITE_ONLY);
if (more)
goto again;
return ret;
}

View File

@@ -372,9 +372,9 @@ static long scoutfs_ioc_release(struct file *file, unsigned long arg)
sblock = args.offset >> SCOUTFS_BLOCK_SM_SHIFT;
eblock = (args.offset + args.length - 1) >> SCOUTFS_BLOCK_SM_SHIFT;
ret = scoutfs_data_truncate_items(sb, inode, scoutfs_ino(inode),
sblock,
&sblock,
eblock, true,
lock);
lock, false);
if (ret == 0) {
scoutfs_inode_get_onoff(inode, &online, &offline);
isize = i_size_read(inode);
@@ -383,8 +383,8 @@ static long scoutfs_ioc_release(struct file *file, unsigned long arg)
>> SCOUTFS_BLOCK_SM_SHIFT;
ret = scoutfs_data_truncate_items(sb, inode,
scoutfs_ino(inode),
sblock, U64_MAX,
false, lock);
&sblock, U64_MAX,
false, lock, false);
}
}

View File

@@ -378,15 +378,16 @@ DEFINE_EVENT(scoutfs_data_file_extent_class, scoutfs_data_fiemap_extent,
);
TRACE_EVENT(scoutfs_data_truncate_items,
TP_PROTO(struct super_block *sb, __u64 iblock, __u64 last, int offline),
TP_PROTO(struct super_block *sb, __u64 iblock, __u64 last, int offline, bool pause),
TP_ARGS(sb, iblock, last, offline),
TP_ARGS(sb, iblock, last, offline, pause),
TP_STRUCT__entry(
SCSB_TRACE_FIELDS
__field(__u64, iblock)
__field(__u64, last)
__field(int, offline)
__field(bool, pause)
),
TP_fast_assign(
@@ -394,10 +395,12 @@ TRACE_EVENT(scoutfs_data_truncate_items,
__entry->iblock = iblock;
__entry->last = last;
__entry->offline = offline;
__entry->pause = pause;
),
TP_printk(SCSBF" iblock %llu last %llu offline %u", SCSB_TRACE_ARGS,
__entry->iblock, __entry->last, __entry->offline)
TP_printk(SCSBF" iblock %llu last %llu offline %u pause %d",
SCSB_TRACE_ARGS, __entry->iblock, __entry->last,
__entry->offline, __entry->pause)
);
TRACE_EVENT(scoutfs_data_wait_check,

View File

@@ -1299,12 +1299,10 @@ static int finalize_and_start_log_merge(struct super_block *sb, struct scoutfs_l
* is nested inside holding commits so we recheck the persistent item
* each time we commit to make sure it's still what we think. The
* caller is still going to send the item to the client so we update the
* caller's each time we make progress. This is a best-effort attempt
* to clean up and it's valid to leave extents in data_freed we don't
* return errors to the caller. The client will continue the work later
* in get_log_trees or as the rid is reclaimed.
* caller's each time we make progress. If we hit an error applying the
* changes we make then we can't send the log_trees to the client.
*/
static void try_drain_data_freed(struct super_block *sb, struct scoutfs_log_trees *lt)
static int try_drain_data_freed(struct super_block *sb, struct scoutfs_log_trees *lt)
{
DECLARE_SERVER_INFO(sb, server);
struct scoutfs_super_block *super = DIRTY_SUPER_SB(sb);
@@ -1313,6 +1311,7 @@ static void try_drain_data_freed(struct super_block *sb, struct scoutfs_log_tree
struct scoutfs_log_trees drain;
struct scoutfs_key key;
COMMIT_HOLD(hold);
bool apply = false;
int ret = 0;
int err;
@@ -1321,22 +1320,27 @@ static void try_drain_data_freed(struct super_block *sb, struct scoutfs_log_tree
while (lt->data_freed.total_len != 0) {
server_hold_commit(sb, &hold);
mutex_lock(&server->logs_mutex);
apply = true;
ret = find_log_trees_item(sb, &super->logs_root, false, rid, U64_MAX, &drain);
if (ret < 0)
if (ret < 0) {
ret = 0;
break;
}
/* careful to only keep draining the caller's specific open trans */
if (drain.nr != lt->nr || drain.get_trans_seq != lt->get_trans_seq ||
drain.commit_trans_seq != lt->commit_trans_seq || drain.flags != lt->flags) {
ret = -ENOENT;
ret = 0;
break;
}
ret = scoutfs_btree_dirty(sb, &server->alloc, &server->wri,
&super->logs_root, &key);
if (ret < 0)
if (ret < 0) {
ret = 0;
break;
}
/* moving can modify and return errors, always update caller and item */
mutex_lock(&server->alloc_mutex);
@@ -1352,19 +1356,19 @@ static void try_drain_data_freed(struct super_block *sb, struct scoutfs_log_tree
BUG_ON(err < 0); /* dirtying must guarantee success */
mutex_unlock(&server->logs_mutex);
ret = server_apply_commit(sb, &hold, ret);
if (ret < 0) {
ret = 0; /* don't try to abort, ignoring ret */
apply = false;
if (ret < 0)
break;
}
}
/* try to cleanly abort and write any partial dirty btree blocks, but ignore result */
if (ret < 0) {
if (apply) {
mutex_unlock(&server->logs_mutex);
server_apply_commit(sb, &hold, 0);
server_apply_commit(sb, &hold, ret);
}
return ret;
}
/*
@@ -1572,9 +1576,9 @@ out:
scoutfs_err(sb, "error %d getting log trees for rid %016llx: %s",
ret, rid, err_str);
/* try to drain excessive data_freed with additional commits, if needed, ignoring err */
/* try to drain excessive data_freed with additional commits, if needed */
if (ret == 0)
try_drain_data_freed(sb, &lt);
ret = try_drain_data_freed(sb, &lt);
return scoutfs_net_response(sb, conn, cmd, id, ret, &lt, sizeof(lt));
}
@@ -4149,7 +4153,7 @@ static void fence_pending_recov_worker(struct work_struct *work)
struct server_info *server = container_of(work, struct server_info,
fence_pending_recov_work);
struct super_block *sb = server->sb;
union scoutfs_inet_addr addr;
union scoutfs_inet_addr addr = {{0,}};
u64 rid = 0;
int ret = 0;

View File

@@ -159,6 +159,58 @@ static bool drained_holders(struct trans_info *tri)
return holders == 0;
}
static int commit_current_log_trees(struct super_block *sb, char **str)
{
DECLARE_TRANS_INFO(sb, tri);
return (*str = "data submit", scoutfs_inode_walk_writeback(sb, true)) ?:
(*str = "item dirty", scoutfs_item_write_dirty(sb)) ?:
(*str = "data prepare", scoutfs_data_prepare_commit(sb)) ?:
(*str = "alloc prepare", scoutfs_alloc_prepare_commit(sb, &tri->alloc, &tri->wri)) ?:
(*str = "meta write", scoutfs_block_writer_write(sb, &tri->wri)) ?:
(*str = "data wait", scoutfs_inode_walk_writeback(sb, false)) ?:
(*str = "commit log trees", commit_btrees(sb)) ?:
scoutfs_item_write_done(sb);
}
static int get_next_log_trees(struct super_block *sb, char **str)
{
return (*str = "get log trees", scoutfs_trans_get_log_trees(sb));
}
static int retry_forever(struct super_block *sb, int (*func)(struct super_block *sb, char **str))
{
bool retrying = false;
char *str;
int ret;
do {
str = NULL;
ret = func(sb, &str);
if (ret < 0) {
if (!retrying) {
scoutfs_warn(sb, "critical transaction commit failure: %s = %d, retrying",
str, ret);
retrying = true;
}
if (scoutfs_forcing_unmount(sb)) {
ret = -EIO;
break;
}
msleep(2 * MSEC_PER_SEC);
} else if (retrying) {
scoutfs_info(sb, "retried transaction commit succeeded");
}
} while (ret < 0);
return ret;
}
/*
* This work func is responsible for writing out all the dirty blocks
* that make up the current dirty transaction. It prevents writers from
@@ -184,8 +236,6 @@ void scoutfs_trans_write_func(struct work_struct *work)
struct trans_info *tri = container_of(work, struct trans_info, write_work.work);
struct super_block *sb = tri->sb;
struct scoutfs_sb_info *sbi = SCOUTFS_SB(sb);
bool retrying = false;
char *s = NULL;
int ret = 0;
tri->task = current;
@@ -214,37 +264,9 @@ void scoutfs_trans_write_func(struct work_struct *work)
scoutfs_inc_counter(sb, trans_commit_written);
do {
ret = (s = "data submit", scoutfs_inode_walk_writeback(sb, true)) ?:
(s = "item dirty", scoutfs_item_write_dirty(sb)) ?:
(s = "data prepare", scoutfs_data_prepare_commit(sb)) ?:
(s = "alloc prepare", scoutfs_alloc_prepare_commit(sb, &tri->alloc,
&tri->wri)) ?:
(s = "meta write", scoutfs_block_writer_write(sb, &tri->wri)) ?:
(s = "data wait", scoutfs_inode_walk_writeback(sb, false)) ?:
(s = "commit log trees", commit_btrees(sb)) ?:
scoutfs_item_write_done(sb) ?:
(s = "get log trees", scoutfs_trans_get_log_trees(sb));
if (ret < 0) {
if (!retrying) {
scoutfs_warn(sb, "critical transaction commit failure: %s = %d, retrying",
s, ret);
retrying = true;
}
if (scoutfs_forcing_unmount(sb)) {
ret = -EIO;
break;
}
msleep(2 * MSEC_PER_SEC);
} else if (retrying) {
scoutfs_info(sb, "retried transaction commit succeeded");
}
} while (ret < 0);
/* retry {commit,get}_log_trees until they succeed, can only fail when forcing unmount */
ret = retry_forever(sb, commit_current_log_trees) ?:
retry_forever(sb, get_next_log_trees);
out:
spin_lock(&tri->write_lock);
tri->write_count++;

View File

@@ -80,3 +80,15 @@ t_compare_output()
{
"$@" >&7 2>&1
}
#
# usually bash prints an annoying output message when jobs
# are killed. We can avoid that by redirecting stderr for
# the bash process when it reaps the jobs that are killed.
#
t_silent_kill() {
exec {ERR}>&2 2>/dev/null
kill "$@"
wait "$@"
exec 2>&$ERR {ERR}>&-
}

View File

@@ -160,6 +160,9 @@ t_filter_dmesg()
re="$re|Pipe handler or fully qualified core dump path required.*"
re="$re|Set kernel.core_pattern before fs.suid_dumpable.*"
# perf warning that it adjusted sample rate
re="$re|perf: interrupt took too long.*lowering kernel.perf_event_max_sample_rate.*"
egrep -v "($re)" | \
ignore_harmless_unwind_kasan_stack_oob
}

tests/funcs/tap.sh (new file, +88 lines)
View File

@@ -0,0 +1,88 @@
#
# Generate TAP format test results
#
t_tap_header()
{
local runid=$1
local sequence=( $(echo $tests) )
local count=${#sequence[@]}
# avoid recreating the same TAP result over again - harness sets this
[[ -z "$runid" ]] && runid="*test*"
cat > $T_RESULTS/scoutfs.tap <<TAPEOF
TAP version 14
1..${count}
#
# TAP results for run ${runid}
#
# host/run info:
#
# hostname: ${HOSTNAME}
# test start time: $(date --utc)
# uname -r: $(uname -r)
# scoutfs commit id: $(git describe --tags)
#
# sequence for this run:
#
TAPEOF
# Sequence
for t in ${tests}; do
echo ${t/.sh/}
done | cat -n | expand | column -c 120 | expand | sed 's/^ /#/' >> $T_RESULTS/scoutfs.tap
echo "#" >> $T_RESULTS/scoutfs.tap
}
t_tap_progress()
{
(
local i=$(( testcount + 1 ))
local testname=$1
local result=$2
local diff=""
local dmsg=""
if [[ -s "$T_RESULTS/tmp/${testname}/dmesg.new" ]]; then
dmsg="1"
fi
if ! cmp -s golden/${testname} $T_RESULTS/output/${testname}; then
diff="1"
fi
if [[ "${result}" == "100" ]] && [[ -z "${dmsg}" ]] && [[ -z "${diff}" ]]; then
echo "ok ${i} - ${testname}"
elif [[ "${result}" == "103" ]]; then
echo "ok ${i} - ${testname}"
echo "# ${testname} ** skipped - permitted **"
else
echo "not ok ${i} - ${testname}"
case ${result} in
101)
echo "# ${testname} ** skipped **"
;;
102)
echo "# ${testname} ** failed **"
;;
esac
if [[ -n "${diff}" ]]; then
echo "#"
echo "# diff:"
echo "#"
diff -u golden/${testname} $T_RESULTS/output/${testname} | expand | sed 's/^/# /'
fi
if [[ -n "${dmsg}" ]]; then
echo "#"
echo "# dmesg:"
echo "#"
cat "$T_RESULTS/tmp/${testname}/dmesg.new" | sed 's/^/# /'
fi
fi
) >> $T_RESULTS/scoutfs.tap
}

View File

@@ -1,4 +1,3 @@
== setting longer hung task timeout
== creating fragmented extents
== unlink file with moved extents to free extents per block
== cleanup

View File

@@ -512,6 +512,11 @@ msg "running tests"
> "$T_RESULTS/skip.log"
> "$T_RESULTS/fail.log"
# generate a test ID to make sure we can de-duplicate TAP results in aggregation
. funcs/tap.sh
t_tap_header $(uuidgen)
testcount=0
passed=0
skipped=0
failed=0
@@ -527,12 +532,15 @@ for t in $tests; do
cmd rm -rf "$T_TMPDIR"
cmd mkdir -p "$T_TMPDIR"
# create a test name dir in the fs
# create a test name dir in the fs, clean up old data as needed
T_DS=""
for i in $(seq 0 $((T_NR_MOUNTS - 1))); do
dir="${T_M[$i]}/test/$test_name"
test $i == 0 && cmd mkdir -p "$dir"
test $i == 0 && (
test -d "$dir" && cmd rm -rf "$dir"
cmd mkdir -p "$dir"
)
eval T_D$i=$dir
T_D[$i]=$dir
@@ -637,6 +645,11 @@ for t in $tests; do
test -n "$T_ABORT" && die "aborting after first failure"
fi
# record results for TAP format output
t_tap_progress $test_name $sts
((testcount++))
done
msg "all tests run: $passed passed, $skipped skipped, $skipped_permitted skipped (permitted), $failed failed"

View File

@@ -11,7 +11,7 @@ FILE="$T_D0/file"
# final block as we truncated past it.
#
echo "== truncate writes zeroed partial end of file block"
yes | dd of="$FILE" bs=8K count=1 status=none iflag=fullblock
yes 2>/dev/null | dd of="$FILE" bs=8K count=1 status=none iflag=fullblock
sync
# not passing iflag=fullblock causes the file occasionally to just be

View File

@@ -88,6 +88,11 @@ rm -rf "$SCR/xattrs"
echo "== make sure we can create again"
file="$SCR/file-after"
C=120
while (( C-- )); do
touch $file 2> /dev/null && break
sleep 1
done
touch $file
setfattr -n user.scoutfs-enospc -v 1 "$file"
sync

View File

@@ -10,30 +10,6 @@ EXTENTS_PER_BTREE_BLOCK=600
EXTENTS_PER_LIST_BLOCK=8192
FREED_EXTENTS=$((EXTENTS_PER_BTREE_BLOCK * EXTENTS_PER_LIST_BLOCK))
#
# This test specifically creates a pathologically sparse file that will
# be as expensive as possible to free. This is usually fine on
# dedicated or reasonable hardware, but trying to run this in
# virtualized debug kernels can take a very long time. This test is
# about making sure that the server doesn't fail, not that the platform
# can handle the scale of work that our btree formats happen to require
# while execution is bogged down with use-after-free memory reference
# tracking. So we give the test a lot more breathing room before
# deciding that its hung.
#
echo "== setting longer hung task timeout"
if [ -w /proc/sys/kernel/hung_task_timeout_secs ]; then
secs=$(cat /proc/sys/kernel/hung_task_timeout_secs)
test "$secs" -gt 0 || \
t_fail "confusing value '$secs' from /proc/sys/kernel/hung_task_timeout_secs"
restore_hung_task_timeout()
{
echo "$secs" > /proc/sys/kernel/hung_task_timeout_secs
}
trap restore_hung_task_timeout EXIT
echo "$((secs * 5))" > /proc/sys/kernel/hung_task_timeout_secs
fi
echo "== creating fragmented extents"
fragmented_data_extents $FREED_EXTENTS $EXTENTS_PER_BTREE_BLOCK "$T_D0/alloc" "$T_D0/move"

View File

@@ -38,6 +38,6 @@ while [ "$SECONDS" -lt "$END" ]; do
done
echo "== stopping background load"
kill $load_pids
t_silent_kill $load_pids
t_pass

View File

@@ -5,18 +5,6 @@
t_require_commands sleep touch sync stat handle_cat kill rm
t_require_mounts 2
#
# usually bash prints an annoying output message when jobs
# are killed. We can avoid that by redirecting stderr for
# the bash process when it reaps the jobs that are killed.
#
silent_kill() {
exec {ERR}>&2 2>/dev/null
kill "$@"
wait "$@"
exec 2>&$ERR {ERR}>&-
}
#
# We don't have a great way to test that inode items still exist. We
# don't prevent opening handles with nlink 0 today, so we'll use that.
@@ -52,7 +40,7 @@ inode_exists $ino || echo "$ino didn't exist"
echo "== orphan from failed evict deletion is picked up"
# pending kill signal stops evict from getting locks and deleting
silent_kill $pid
t_silent_kill $pid
t_set_sysfs_mount_option 0 orphan_scan_delay_ms 1000
sleep 5
inode_exists $ino && echo "$ino still exists"
@@ -70,7 +58,7 @@ for nr in $(t_fs_nrs); do
rm -f "$path"
done
sync
silent_kill $pids
t_silent_kill $pids
for nr in $(t_fs_nrs); do
t_force_umount $nr
done
@@ -82,7 +70,15 @@ done
# wait for orphan scans to run
t_set_all_sysfs_mount_options orphan_scan_delay_ms 1000
# also have to wait for delayed log merge work from mount
sleep 15
C=120
while (( C-- )); do
brk=1
for ino in $inos; do
inode_exists $ino && brk=0
done
test $brk -eq 1 && break
sleep 1
done
for ino in $inos; do
inode_exists $ino && echo "$ino still exists"
done
@@ -131,7 +127,7 @@ while [ $SECONDS -lt $END ]; do
done
# trigger eviction deletion of each file in each mount
silent_kill $pids
t_silent_kill $pids
wait || t_fail "handle_fsetxattr failed"