seaweedfs/weed/shell
Chris Lu c2591b4395 fix(replication): verify-before-destroy in VolumeCopy, check.disk, and over-replication trim (#9943)
* volume: verify before destroy in VolumeCopy and replication repair

Four data-safety fixes around copy/repair paths that could destroy or
resurrect data before verifying the source or survivors.

(a) VolumeCopy no longer deletes a pre-existing local replica up front.
The delete is deferred until ReadVolumeFileStatus on the source succeeds,
so a transient source outage (or a retry after one) can no longer wipe a
healthy destination replica. Gated on source readability only; size/count
comparisons are intentionally not used because they invert legitimately
after divergent vacuum/compaction. Mirrored in the Rust volume server.
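
The ordering in (a) can be sketched roughly as below. This is an
illustration only, not the actual VolumeCopy code: localStore, copyVolume,
the readStatus and fetch callbacks, and errSourceDown are hypothetical
names, with readStatus standing in for the ReadVolumeFileStatus call.

```go
package main

import (
	"errors"
	"fmt"
)

// errSourceDown simulates a transient source outage in this sketch.
var errSourceDown = errors.New("source unavailable")

// localStore is a hypothetical in-memory stand-in for the destination
// server: volume id -> contents of the local replica.
type localStore struct {
	volumes map[uint32]string
}

// copyVolume replaces the local replica of vid with the source's data.
// readStatus is a hypothetical stand-in for ReadVolumeFileStatus; the
// local replica is deleted only after it succeeds.
func (s *localStore) copyVolume(vid uint32, readStatus func(uint32) error, fetch func(uint32) string) error {
	if err := readStatus(vid); err != nil {
		// Source not readable: leave the existing replica untouched.
		return fmt.Errorf("volume %d: source not readable, local replica kept: %w", vid, err)
	}
	// Source verified: only now is the old replica removed and replaced.
	delete(s.volumes, vid)
	s.volumes[vid] = fetch(vid)
	return nil
}

func main() {
	store := &localStore{volumes: map[uint32]string{7: "healthy local replica"}}
	down := func(uint32) error { return errSourceDown }
	up := func(uint32) error { return nil }
	fetch := func(uint32) string { return "fresh copy from source" }

	fmt.Println(store.copyVolume(7, down, fetch) != nil, store.volumes[7])
	fmt.Println(store.copyVolume(7, up, fetch) == nil, store.volumes[7])
}
```

Note that the gate is the status read alone; the sketch deliberately does
not compare sizes or file counts, in line with the reasoning above.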

(b) volume.check.disk no longer resurrects vacuumed-deleted needles. A
key present-and-live on the source but entirely absent on the target is
ambiguous: it may be a genuine missing write, or a needle deleted on the
target and then vacuumed (its index entry and any tombstone are gone). An
individual needle's AppendAtNs has no monotonic relation to a vacuum
watermark, so the old cutoff heuristic could not tell the two cases apart.
Without positive proof that the absence is a missing write, the safe
default is to NOT push it back. Tradeoff: a real missing write may go
unrepaired until a tombstone-aware path exists, but we never resurrect
deleted data.
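
The decision in (b) can be written as a small truth table. The sketch
below is an assumption-laden illustration, not the volume.check.disk
code: targetEntry, shouldPushToTarget, and the provenMissingWrite flag
are hypothetical, and today nothing can set that flag to true because no
tombstone-aware path exists yet.

```go
package main

import "fmt"

// targetEntry describes what the target replica knows about a key that
// is present and live on the source (hypothetical type for this sketch).
type targetEntry int

const (
	targetAbsent     targetEntry = iota // no index entry and no tombstone
	targetLive                          // target already has the needle
	targetTombstoned                    // target recorded a delete
)

// shouldPushToTarget reports whether a needle that is live on the source
// may be written to the target. An absent key is ambiguous (missing
// write vs. deleted-then-vacuumed), so it is pushed only with positive
// proof; provenMissingWrite is a hypothetical placeholder for that proof.
func shouldPushToTarget(target targetEntry, provenMissingWrite bool) bool {
	switch target {
	case targetAbsent:
		return provenMissingWrite
	default:
		// Already live, or explicitly deleted: never push.
		return false
	}
}

func main() {
	fmt.Println(shouldPushToTarget(targetAbsent, false))
	fmt.Println(shouldPushToTarget(targetAbsent, true))
	fmt.Println(shouldPushToTarget(targetTombstoned, true))
}
```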

(c) Over-replication trim no longer resurrects needles or removes the
wrong replica. The pre-delete sync now runs read-only (divergence check
only) instead of writing the doomed replica's needles into the survivor.
pickOneReplicaToDelete only ever removes the smallest of multiple healthy
writable replicas; it refuses the trim when doing so would leave only
read-only/integrity-flagged survivors, since file_count>0 alone cannot
prove the survivor's .dat is readable.
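
A minimal sketch of the victim rule in (c), under simplifying
assumptions: the replica struct and pickTrimVictim are hypothetical, and
a single readOnly flag stands in for both read-only and
integrity-flagged states. It is not the real pickOneReplicaToDelete.

```go
package main

import "fmt"

// replica is a hypothetical, simplified view of one volume replica.
type replica struct {
	node     string
	size     uint64
	readOnly bool // stands in for read-only or integrity-flagged
}

// pickTrimVictim returns the smallest writable replica, but only when at
// least two writable replicas exist, so that a writable survivor always
// remains. It returns false to refuse the trim otherwise.
func pickTrimVictim(replicas []replica) (replica, bool) {
	var victim replica
	writable := 0
	for _, r := range replicas {
		if r.readOnly {
			continue
		}
		if writable == 0 || r.size < victim.size {
			victim = r
		}
		writable++
	}
	if writable < 2 {
		// Removing the only writable replica would leave only
		// read-only survivors, so the trim is refused.
		return replica{}, false
	}
	return victim, true
}

func main() {
	victim, ok := pickTrimVictim([]replica{
		{node: "a", size: 30},
		{node: "b", size: 10, readOnly: true},
		{node: "c", size: 20},
	})
	fmt.Println(victim.node, ok)
}
```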

(d) Incomplete-volume (.note) cleanup keeps the shared .vif when an .ecx
for the same vid coexists on the disk, so removing an interrupted regular
copy cannot strip a coexisting EC volume's info file. VolumeCopy now
surfaces .note write/remove errors instead of ignoring them. In the Rust
volume server (where a persisting note is actually reachable) the .note
check moves below the empty-stub sweep and EC validation, keeps the .vif
on EC coexistence, and the mount path fails when a .note still persists.
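
The .vif rule in (d) might look like the sketch below. The function
name, the exists callback, and the exact set of extensions removed for
an incomplete volume (.dat, .idx, .note) are assumptions made for
illustration; only the "keep .vif when an .ecx coexists" rule comes from
the description above.

```go
package main

import "fmt"

// incompleteVolumeFilesToRemove lists the files to delete when cleaning
// up an interrupted regular copy identified by base (path without
// extension). exists is a hypothetical file-existence check. The shared
// .vif is kept whenever an .ecx for the same volume id is present.
func incompleteVolumeFilesToRemove(base string, exists func(path string) bool) []string {
	exts := []string{".dat", ".idx", ".note"} // assumed extension set
	// Check .ecx at cleanup time, not from an earlier snapshot.
	if !exists(base + ".ecx") {
		exts = append(exts, ".vif")
	}
	var out []string
	for _, ext := range exts {
		if p := base + ext; exists(p) {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	onDisk := map[string]bool{
		"/data/5.dat": true, "/data/5.note": true,
		"/data/5.vif": true, "/data/5.ecx": true,
	}
	exists := func(p string) bool { return onDisk[p] }
	fmt.Println(incompleteVolumeFilesToRemove("/data/5", exists))
}
```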

* shell: scope the over-replication writable-survivor guard to the trim path only

The writable-survivor guard (never trim down to a read-only survivor) lived
inside the shared pickOneReplicaToDelete, so it also gated the misplaced-volume
relocation via pickOneMisplacedVolume -- a misplaced read-only volume (e.g. a
full one) would silently stop being rebalanced. Extract pickSmallestReplica
for the relocation path (which deletes-and-recreates and must act on read-only
replicas), and keep the writable-survivor guard only in pickOneReplicaToDelete
used by the over-replication trim.

* seaweed-volume: recompute keep_vif after invalid-EC cleanup in the .note path

keep_vif used the pre-validation ecx_exists snapshot, so when the EC-validation
step above removed the invalid .ecx/shards, the .note cleanup still preserved a
now-orphaned .vif. Re-check .ecx existence at cleanup time, matching the Go
hasEcxFile re-check.

* shell: keep placement when picking an over-replication victim to delete

The trim picked the smallest writable replica without regard to placement, so
it could delete the only replica in a required failure domain (e.g. with
replication "100" and one replica in dc1 plus two in dc2, deleting the dc1
replica leaves both survivors in dc2). Prefer a writable replica whose removal
still satisfies placement, falling back to the smallest writable only when
none does.
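
A rough sketch of the placement-aware choice, assuming placement is
reduced to "at least requiredDCs distinct data centers among the
survivors" (2 for the "100" example). The replica struct, distinctDCs,
and pickPlacementAwareVictim are hypothetical; the real placement check
also covers racks and servers.

```go
package main

import "fmt"

// replica is a hypothetical view of one writable replica.
type replica struct {
	node       string
	dataCenter string
	size       uint64
}

// distinctDCs counts the data centers left after removing replicas[skip].
func distinctDCs(replicas []replica, skip int) int {
	seen := map[string]bool{}
	for i, r := range replicas {
		if i != skip {
			seen[r.dataCenter] = true
		}
	}
	return len(seen)
}

// pickPlacementAwareVictim returns the index of the smallest replica
// whose removal keeps at least requiredDCs data centers populated. If no
// candidate qualifies it falls back to the smallest replica overall, and
// it returns -1 for an empty list.
func pickPlacementAwareVictim(writable []replica, requiredDCs int) int {
	best, fallback := -1, -1
	for i, r := range writable {
		if fallback == -1 || r.size < writable[fallback].size {
			fallback = i
		}
		if distinctDCs(writable, i) >= requiredDCs &&
			(best == -1 || r.size < writable[best].size) {
			best = i
		}
	}
	if best != -1 {
		return best
	}
	return fallback
}

func main() {
	replicas := []replica{
		{node: "n1", dataCenter: "dc1", size: 10},
		{node: "n2", dataCenter: "dc2", size: 20},
		{node: "n3", dataCenter: "dc2", size: 30},
	}
	i := pickPlacementAwareVictim(replicas, 2)
	fmt.Println(replicas[i].node)
}
```

In the example, n1 is the smallest replica but also the only one in dc1,
so the sketch skips it and selects the smaller of the two dc2 replicas.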
2026-06-13 20:05:33 -07:00