mirror of
https://github.com/seaweedfs/seaweedfs.git
synced 2026-06-09 18:32:43 +00:00
c2591b4395
* volume: verify before destroy in VolumeCopy and replication repair

Four data-safety fixes around copy/repair paths that could destroy or resurrect data before verifying the source or survivors.

(a) VolumeCopy no longer deletes a pre-existing local replica up front. The delete is deferred until ReadVolumeFileStatus on the source succeeds, so a transient source outage (or a retry after one) can no longer wipe a healthy destination replica. Gated on source readability only; size/count comparisons are intentionally not used because they invert legitimately after divergent vacuum/compaction. Mirrored in the Rust volume server.

(b) volume.check.disk no longer resurrects needles that were deleted and then vacuumed. A key present-and-live on the source but entirely absent on the target is ambiguous: it may be a genuine missing write, or a needle deleted on the target and then vacuumed (its index entry and any tombstone are gone). An individual needle's AppendAtNs has no monotonic relation to a vacuum watermark, so the old cutoff heuristic could not tell them apart. Without positive proof that the absence is a missing write, the safe default is to NOT push it back. Tradeoff: a real missing write may go unrepaired until a tombstone-aware path exists, but deleted data is never resurrected.

(c) Over-replication trim no longer resurrects needles or removes the wrong replica. The pre-delete sync now runs read-only (divergence check only) instead of writing the doomed replica's needles into the survivor. pickOneReplicaToDelete only ever removes the smallest of multiple healthy writable replicas; it refuses the trim when doing so would leave only read-only/integrity-flagged survivors, since file_count>0 alone cannot prove the survivor's .dat is readable.

(d) Incomplete-volume (.note) cleanup keeps the shared .vif when an .ecx for the same vid coexists on the disk, so removing an interrupted regular copy cannot strip a coexisting EC volume's info file.

VolumeCopy now surfaces .note write/remove errors instead of ignoring them. In the Rust volume server (where a persisting note is actually reachable) the .note check moves below the empty-stub sweep and EC validation, keeps the .vif on EC coexistence, and the mount path fails when a .note still persists.

* shell: scope the over-replication writable-survivor guard to the trim path only

The writable-survivor guard (never trim down to a read-only survivor) lived inside the shared pickOneReplicaToDelete, so it also gated the misplaced-volume relocation via pickOneMisplacedVolume. As a result, a misplaced read-only volume (e.g. a full one) would silently stop being rebalanced. Extract pickSmallestReplica for the relocation path (which deletes-and-recreates and must act on read-only replicas), and keep the writable-survivor guard only in pickOneReplicaToDelete used by the over-replication trim.

* seaweed-volume: recompute keep_vif after invalid-EC cleanup in the .note path

keep_vif used the pre-validation ecx_exists snapshot, so when the EC-validation step above removed the invalid .ecx/shards, the .note cleanup still preserved a now-orphaned .vif. Re-check .ecx existence at cleanup time, matching the Go hasEcxFile re-check.

* shell: keep placement when picking an over-replication victim to delete

The trim picked the smallest writable replica without regard to placement, so it could delete the only replica in a required failure domain (e.g. with "100" and replicas dc1 + two in dc2, deleting dc1 leaves both survivors in dc2). Prefer a writable replica whose removal still satisfies placement, falling back to the smallest writable only when none does.
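The victim selection described in the last entry, combined with the writable-survivor guard, can be sketched roughly as below. This is an illustration under stated assumptions, not the shell's real code: the `replica` type, `pickVictim`, and `distinctDCsWithout` are hypothetical, and placement is reduced to "survivors must span at least `minDCs` data centers", which is much simpler than the real replica placement rules.

```go
package main

import "sort"

// replica is a hypothetical, simplified view of one volume replica.
type replica struct {
	node     string
	dc       string
	size     uint64
	readOnly bool
}

// distinctDCsWithout counts the data centers covered by all replicas except
// the one on node skip.
func distinctDCsWithout(replicas []replica, skip string) int {
	seen := map[string]bool{}
	for _, r := range replicas {
		if r.node != skip {
			seen[r.dc] = true
		}
	}
	return len(seen)
}

// pickVictim returns the replica to trim, or false when the trim is refused.
// Assumption: placement is modelled only as a minimum number of distinct DCs.
func pickVictim(replicas []replica, minDCs int) (replica, bool) {
	var writable []replica
	for _, r := range replicas {
		if !r.readOnly {
			writable = append(writable, r)
		}
	}
	// Guard: only trim when at least one writable replica would survive.
	if len(writable) < 2 {
		return replica{}, false
	}
	// Consider the smallest writable replicas first.
	sort.SliceStable(writable, func(i, j int) bool { return writable[i].size < writable[j].size })
	// Prefer a victim whose removal keeps the survivors spread over enough DCs.
	for _, c := range writable {
		if distinctDCsWithout(replicas, c.node) >= minDCs {
			return c, true
		}
	}
	// No candidate preserves placement: fall back to the smallest writable one.
	return writable[0], true
}
```

For the example in the commit message (one replica in dc1, two in dc2, two data centers required), the dc1 replica is skipped even if it is the smallest, and the smaller of the two dc2 replicas is chosen instead.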