scoutfs: migrate dirty btree blocks during wrap

We were seeing ring btree corruption that manifested as the server seeing stale btree blocks as it tried to read all the btrees to migrate blocks during a write. A block it tried to read didn't match its reference.

It turned out that the block wasn't being migrated. It would get stuck at a position in the ring. Eventually new block writes would overwrite it and then the next read would see corruption.

It wasn't being migrated because the block reading function didn't realize that it had to migrate a dirty block. The block was written in a transaction at the end of the ring. The ring wrapped during the transaction and then migration tried to migrate the dirty block. It wouldn't be dirtied again, and thus wouldn't be migrated, because it was already dirty in the transaction.

The fix is to add more cases to the dirtying decision that take migration specifically into account. We'll no longer short circuit dirtying blocks for migration when they're in the old half of the ring, even though they're already dirty.

Signed-off-by: Zach Brown <zab@versity.com>
2026-02-07 19:20:44 +00:00 · 2019-05-16 13:21:00 -07:00
parent e150ebc8d2
commit e10033b34d
1 changed files with 10 additions and 2 deletions
--- a/kmod/src/btree.c
+++ b/kmod/src/btree.c
@@ -700,9 +700,17 @@ retry:
 			goto out;
 		}

-		/* done if not dirtying or already dirty */
+		/*
+		 * We don't need to cow the existing block if we're not
+		 * dirtying the block, or we're not migrating and it's
+		 * already dirty in this transaction, or we're
+		 * migrating and it's already in the current half.
+		 */
 		if (!(flags & BTW_DIRTY) ||
-		    (le64_to_cpu(bt->hdr.seq) >= bti->first_dirty_seq)) {
+		    (!(flags & BTW_MIGRATE) &&
+		     (le64_to_cpu(bt->hdr.seq) >= bti->first_dirty_seq)) ||
+		    ((flags & BTW_MIGRATE) &&
+		     blkno_is_current(bring, le64_to_cpu(ref->blkno)))) {
 			ret = 0;
 			goto out;
 		}
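
The new condition can be read as a three-way decision. The following is a minimal standalone sketch of that decision, not the code in btree.c: the flag values, `struct ring_model`, `model_blkno_is_current()` and `can_skip_cow()` are hypothetical names invented for illustration, and the "current half" is assumed to be a simple block number range, which may not match how `blkno_is_current()` is actually implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values; the real definitions live in the scoutfs source. */
#define BTW_DIRTY   (1 << 0)
#define BTW_MIGRATE (1 << 1)

/* Simplified stand-in for the ring state consulted by the walk (assumed layout). */
struct ring_model {
	uint64_t first_dirty_seq;	/* first seq dirtied in this transaction */
	uint64_t cur_start;		/* first blkno of the current half */
	uint64_t cur_end;		/* one past the last blkno of the current half */
};

/* Assumed model: a block is "current" if its blkno falls in the current half. */
static bool model_blkno_is_current(const struct ring_model *r, uint64_t blkno)
{
	assert(r != NULL);
	return blkno >= r->cur_start && blkno < r->cur_end;
}

/*
 * Return true when the existing block can be used as is, mirroring the
 * three cases in the patched comment:
 *  - the caller isn't dirtying at all,
 *  - it's migrating and the block is already in the current half,
 *  - it's not migrating and the block is already dirty in this transaction.
 */
static bool can_skip_cow(const struct ring_model *r, int flags,
			 uint64_t seq, uint64_t blkno)
{
	if (!(flags & BTW_DIRTY))
		return true;
	if (flags & BTW_MIGRATE)
		return model_blkno_is_current(r, blkno);
	return seq >= r->first_dirty_seq;
}
```

The case the commit message describes is a block with a seq at or past `first_dirty_seq` whose blkno is in the old half. Under the old check it would be skipped regardless of flags; in this sketch a `BTW_DIRTY | BTW_MIGRATE` walk no longer skips it, while a plain `BTW_DIRTY` walk still does.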