mirror of
https://github.com/seaweedfs/seaweedfs.git
synced 2026-05-21 01:01:29 +00:00
* test(erasure_coding): reproduce #9184 deleteOriginalVolume swallowing errors

  ErasureCodingTask.deleteOriginalVolume logs a warning when any replica
  VolumeDelete fails and then returns nil, so the EC task reports success
  to the admin even when a source replica survives. That stale replica
  lets a later detection scan re-propose the same volume and, once
  retried, drives the mounted-shard-truncation corruption that issue 9184
  also describes.

  Reproducer: wire one reachable replica (succeeds) and one unreachable
  replica (fails) and assert the function currently returns nil. After
  the fix the function must surface the replica failure so the task is
  retried rather than marked done, and this test needs to be inverted.

* fix(erasure_coding): surface replica delete failures from EC task

  ErasureCodingTask.deleteOriginalVolume previously logged a warning and
  returned nil when any VolumeDelete against a source replica failed. The
  EC task therefore reported overall success to the admin even when a
  source replica stayed on disk, which let a later detection scan propose
  a duplicate EC encoding of the same volume. The retry then walked the
  ReceiveFile path against servers that already had mounted EC shards for
  the volume, truncating the live shard files in place (the other half of
  #9184).

  This change returns an error describing the per-replica failures after
  the best-effort delete pass, so the task is marked failed instead of
  silently moving on. Successful deletes are still applied (per-replica
  progress is preserved); only the final return changes. When combined
  with the ReceiveFile mount-safety check, a stuck original replica now
  produces loud, actionable failures instead of silent corruption.

  Tests:
  - TestDeleteOriginalVolumeSurfacesReplicaFailures: asserts an error is
    returned and names the unreachable replica, while the reachable
    replica still gets deleted.
  - TestDeleteOriginalVolumeSucceedsWhenAllReplicasReachable: pins the
    happy path.
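
For illustration, the behavior described above (attempt every replica, keep the successful deletes, and return an error naming the replicas that failed) could look roughly like the sketch below. This is not the SeaweedFS implementation: `deleteReplicas`, `deleteFunc`, and `fakeDelete` are hypothetical names, the function value stands in for the real VolumeDelete call, and the error text is an assumed format.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// deleteFunc is a hypothetical stand-in for the per-replica VolumeDelete
// call; the real task talks to volume servers instead.
type deleteFunc func(server string) error

// deleteReplicas makes a best-effort pass over every replica. A failure on
// one replica does not stop the loop, so reachable replicas are still
// deleted. If any delete failed, the returned error names each failing
// replica, so the caller can mark the task failed instead of done.
func deleteReplicas(volumeID uint32, servers []string, del deleteFunc) error {
	var failures []string
	for _, server := range servers {
		if err := del(server); err != nil {
			failures = append(failures, fmt.Sprintf("%s: %v", server, err))
		}
	}
	if len(failures) > 0 {
		return fmt.Errorf("volume %d: failed to delete %d of %d replicas: %s",
			volumeID, len(failures), len(servers), strings.Join(failures, "; "))
	}
	return nil
}

// fakeDelete builds a deleteFunc for tests: servers listed in unreachable
// return an error, all others are recorded in deleted.
func fakeDelete(unreachable, deleted map[string]bool) deleteFunc {
	return func(server string) error {
		if unreachable[server] {
			return errors.New("connection refused")
		}
		deleted[server] = true
		return nil
	}
}
```

With one reachable and one unreachable replica, this sketch returns a non-nil error that names the unreachable server while the reachable one is still recorded as deleted, which mirrors what the two tests listed above assert.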