Author SHA1 Message Date
Auke Kok c042bf68a7 Fix incomplete freed list head recovery in dirty_alloc_blocks
dirty_alloc_blocks saves the freed list state before modifying it so
it can roll back on error.  But it only saved the block_ref (blkno
+ seq), not the full alloc_list_head which also includes first_nr,
total_nr, and flags.

When the freed list's head block is nearly full, dirty_alloc_blocks
forces allocation of a new empty head block. It zeros alloc->freed.ref
and sets alloc->freed.first_nr = 0.  dirty_list_block then sees the
empty ref, allocates a fresh block, and updates alloc->freed.ref to
point at it. first_nr = 0 correctly describes that new empty block. All
good so far.

If the subsequent avail dirty_list_block fails, the error path restores
alloc->freed.ref to orig_freed but leaves alloc->freed.first_nr at 0.
The original head block still holds its N entries on disk, but the
in-memory head now claims first_nr = 0 -- the recovery should have
rewound first_nr back to N along with the ref, but had no saved copy
to rewind from.

On the next dirty_alloc_blocks call the threshold check sees first_nr =
0 (a seemingly empty head block) and skips the new-block path.  dirty_list_block
CoWs the existing head block and list_block_add writes new entries
past the N already-present blknos while incrementing first_nr from 0.
The head's first_nr drifts permanently below the block's actual nr.
This leads to the alloc list head/block mismatch BUG_ON at alloc.c:375
and downstream extent overlap errors that permanently stall the server.
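
A rough model of the drift, assuming the head and the list block each keep their own entry count and that an add bumps both. The names and the slot count are hypothetical, and the check is only a guess at the kind of head/block comparison the BUG_ON performs.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_LIST_BLOCK_SLOTS 8	/* hypothetical capacity, not the scoutfs value */

/* In-memory head: its own idea of how many entries the head block holds. */
struct model_head {
	uint32_t first_nr;
};

/* List block: the count stored in the block itself, plus the entries. */
struct model_block {
	uint32_t nr;
	uint64_t blknos[MODEL_LIST_BLOCK_SLOTS];
};

/* Sketch of an add: append after the block's entries, bump both counters. */
void model_list_block_add(struct model_head *head, struct model_block *blk,
			  uint64_t blkno)
{
	assert(blk->nr < MODEL_LIST_BLOCK_SLOTS);
	blk->blknos[blk->nr++] = blkno;
	head->first_nr++;
}

/* Consistency condition: the head's count must equal the block's count. */
int model_head_matches_block(const struct model_head *head,
			     const struct model_block *blk)
{
	return head->first_nr == blk->nr;
}
```

Starting from a block with nr = 5 and a stale head with first_nr = 0, one add leaves nr = 6 and first_nr = 1. The two counters advance in lockstep from then on, so the gap never closes.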

Fix by saving and restoring the full scoutfs_alloc_list_head struct
instead of just the scoutfs_block_ref. Exposed by stress testing.

Signed-off-by: Auke Kok <auke.kok@versity.com>
2026-05-21 14:02:27 -07:00
+5 -5
@@ -490,7 +490,7 @@ static int dirty_alloc_blocks(struct super_block *sb,
 			      struct scoutfs_alloc *alloc,
 			      struct scoutfs_block_writer *wri)
 {
-	struct scoutfs_block_ref orig_freed;
+	struct scoutfs_alloc_list_head orig_freed;
 	struct scoutfs_alloc_list_block *lblk;
 	struct scoutfs_block *av_bl = NULL;
 	struct scoutfs_block *fr_bl = NULL;
@@ -508,7 +508,7 @@ static int dirty_alloc_blocks(struct super_block *sb,
 	mutex_lock(&alloc->mutex);
 	/* undo dirty freed if we get an error after */
-	orig_freed = alloc->freed.ref;
+	orig_freed = alloc->freed;
 	if (alloc->dirty_avail_bl != NULL) {
 		ret = 0;
@@ -549,7 +549,7 @@ static int dirty_alloc_blocks(struct super_block *sb,
 	if (link_orig) {
 		/* .. and point the new block at the rest of the list */
 		lblk = fr_bl->data;
-		lblk->next = orig_freed;
+		lblk->next = orig_freed.ref;
 		lblk = NULL;
 	}
@@ -574,10 +574,10 @@ static int dirty_alloc_blocks(struct super_block *sb,
 	ret = 0;
 out:
-	if (ret < 0 && alloc->freed.ref.blkno != orig_freed.blkno) {
+	if (ret < 0 && alloc->freed.ref.blkno != orig_freed.ref.blkno) {
 		if (fr_bl)
 			scoutfs_block_writer_forget(sb, wri, fr_bl);
-		alloc->freed.ref = orig_freed;
+		alloc->freed = orig_freed;
 	}
 	mutex_unlock(&alloc->mutex);