scoutfs: migrate dirty btree blocks during wrap

We were seeing ring btree corruption that manifested as the server
finding stale btree blocks while it read all the btrees to migrate
blocks during a write.  A block it tried to read didn't match its
reference.

It turned out that the block wasn't being migrated, so it stayed stuck
at its position in the ring.  Eventually new block writes would
overwrite it and the next read would see corruption.
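The failure mode in the previous paragraph can be reproduced with a toy ring.  This is a sketch under simplifying assumptions, not the scoutfs ring: it assumes a ring split into two halves, where the half holding the write cursor is "current", and it uses hypothetical names (`struct toy_ring`, `toy_ring_write()`, `toy_blkno_is_current()`) rather than the real `blkno_is_current()` and ring structures.  A block written in the last slot ends up in the old half once the cursor wraps, and if nothing migrates it, the writer eventually comes around and overwrites its slot.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy ring: 8 slots split into two halves of 4. */
#define RING_BLOCKS 8
#define RING_HALF   (RING_BLOCKS / 2)

struct toy_ring {
	uint64_t next;              /* next slot the writer will fill */
	uint64_t seq[RING_BLOCKS];  /* seq of the block stored in each slot */
};

/* Index (0 or 1) of the half that contains blkno. */
static unsigned ring_half(uint64_t blkno)
{
	return (unsigned)((blkno % RING_BLOCKS) / RING_HALF);
}

/* Toy stand-in for blkno_is_current(): is blkno in the half being written? */
static bool toy_blkno_is_current(const struct toy_ring *r, uint64_t blkno)
{
	return ring_half(blkno) == ring_half(r->next);
}

/* Write a block at the cursor, overwriting the slot's old contents, then advance and wrap. */
static uint64_t toy_ring_write(struct toy_ring *r, uint64_t seq)
{
	uint64_t blkno = r->next;

	r->seq[blkno] = seq;
	r->next = (blkno + 1) % RING_BLOCKS;
	return blkno;
}

/*
 * Write a block in the last slot, let the cursor wrap, then keep writing
 * without ever migrating it.  Returns true if its reference went stale.
 */
static bool unmigrated_block_goes_stale(void)
{
	struct toy_ring r = { 0 };
	uint64_t seq = 1;
	uint64_t ref_blkno, ref_seq;

	/* fill slots 0..6 */
	while (r.next != RING_BLOCKS - 1)
		toy_ring_write(&r, seq++);

	/* the block we keep a reference to lands at the end of the ring */
	ref_seq = seq++;
	ref_blkno = toy_ring_write(&r, ref_seq);

	/* the cursor wrapped to slot 0, so the block now sits in the old half */
	if (toy_blkno_is_current(&r, ref_blkno))
		return false;

	/* keep writing until the cursor has gone all the way around again */
	for (int i = 0; i < RING_BLOCKS; i++)
		toy_ring_write(&r, seq++);

	/* the referenced slot now holds a different block */
	return r.seq[ref_blkno] != ref_seq;
}
```

Calling `unmigrated_block_goes_stale()` returns true: the slot the reference points at now carries a newer seq, which is the mismatch the server reported.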

It wasn't being migrated because the block reading function didn't
realize that it had to migrate a dirty block.  The block was written in
a transaction at the end of the ring.  The ring wrapped during the
transaction and then migration tried to migrate the dirty block.  The
reading function skipped dirtying it, and so never migrated it, because
it was already dirty in the transaction.

The fix is to add more cases to the dirtying decision that take
migration specifically into account.  We'll no longer short circuit
dirtying a block for migration just because it's already dirty; if it's
in the old half of the ring it is dirtied, and thus migrated.
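The old and new dirtying decisions can be compared side by side as plain predicates.  This is a minimal sketch: the `BTW_*` values are hypothetical placeholders (the real definitions aren't part of this commit), and `struct cow_input` is a made-up flattening of `bt->hdr.seq`, `bti->first_dirty_seq` and the result of `blkno_is_current()` into ordinary fields.  Each predicate returns true when the existing block can be used without a cow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values; the real BTW_* definitions are not shown here. */
#define BTW_DIRTY   (1 << 0)
#define BTW_MIGRATE (1 << 1)

/* Hypothetical flattening of the values the decision looks at. */
struct cow_input {
	int flags;
	uint64_t block_seq;        /* le64_to_cpu(bt->hdr.seq) */
	uint64_t first_dirty_seq;  /* bti->first_dirty_seq */
	bool in_current_half;      /* blkno_is_current(bring, blkno) */
};

/* Before the fix: skip the cow whenever the block is already dirty. */
static bool old_can_skip_cow(const struct cow_input *in)
{
	return !(in->flags & BTW_DIRTY) ||
	       in->block_seq >= in->first_dirty_seq;
}

/* After the fix: migration only skips blocks already in the current half. */
static bool new_can_skip_cow(const struct cow_input *in)
{
	if (!(in->flags & BTW_DIRTY))
		return true;
	if (!(in->flags & BTW_MIGRATE))
		return in->block_seq >= in->first_dirty_seq;
	return in->in_current_half;
}
```

The case from this bug is a migrating walk over a block that is dirty in this transaction (`block_seq >= first_dirty_seq`) but sits in the old half: `old_can_skip_cow()` returns true and the block is left behind, while `new_can_skip_cow()` returns false so the block is cowed into the current half.  Non-migrating walks keep the old "already dirty" shortcut.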

Signed-off-by: Zach Brown <zab@versity.com>
Zach Brown
2019-05-16 13:21:00 -07:00
committed by Zach Brown
parent e150ebc8d2
commit e10033b34d


@@ -700,9 +700,17 @@ retry:
 		goto out;
 	}
 
-	/* done if not dirtying or already dirty */
+	/*
+	 * We don't need to cow the exiting block if we're not
+	 * dirtying the block, or we're not migrating and it's
+	 * already dirty in this transaction, or we're
+	 * migrating and it's already in the current half.
+	 */
 	if (!(flags & BTW_DIRTY) ||
-	    (le64_to_cpu(bt->hdr.seq) >= bti->first_dirty_seq)) {
+	    (!(flags & BTW_MIGRATE) &&
+	     (le64_to_cpu(bt->hdr.seq) >= bti->first_dirty_seq)) ||
+	    ((flags & BTW_MIGRATE) &&
+	     blkno_is_current(bring, le64_to_cpu(ref->blkno)))) {
 		ret = 0;
 		goto out;
 	}