mirror of
https://github.com/seaweedfs/seaweedfs.git
synced 2026-05-17 23:31:31 +00:00
* fix(ec): clear cross-server stale EC shards before re-distribute (#9478) A previous failed encode leaves partial .ec?? shards mounted on destination volume servers that are not the .dat owner. PR #9480 only prunes when the .dat sits on a sibling disk of the SAME store, so the cross-server case stays stuck: every retry trips volume_grpc_copy.go:570's "ec volume %d is mounted; refusing overwrite" guard and the scheduler loops. Detection already lists existing EC shards as CleanupECShards sources; plumb the shard ids through (ActiveTopology.GetECShardLocations, TaskSourceSpec, TaskSource.shard_ids) and have the EC worker call VolumeEcShardsUnmount + VolumeEcShardsDelete on each destination after the local shard set is generated and before distributeEcShards. Skip EC-shard sources in getReplicas so the post-encode VolumeDelete step does not target destination-only nodes. Integration test mounts a partial shard subset, asserts the mounted-volume refusal, runs cleanupStaleEcShards, and asserts the next ReceiveFile lands. * chore(ec): tighten code comments in stale-shard cleanup Drop issue-number refs from code comments and shorten the docstrings on cleanupStaleEcShards / unmountAndDeleteEcShards / getReplicas plus the new test file. Behavior unchanged. * fix(ec): skip empty-ShardIds locations; dedupe getReplicas by node GetECShardLocations dropped entries where ecShardMatchesCollection saw a phantom info record with EcIndexBits=0 — without ShardIds, getReplicas misread the resulting source as a regular replica and would have called VolumeDelete on a destination-only node. getReplicas now dedupes by Node since VolumeDelete is server-wide; per-disk source rows on the same server collapse to one call. * refactor(ec): use MaxShardCount and ShardBits in collectShardIdsForDisk Drop the literal 32 bit-iteration bound for erasure_coding.MaxShardCount and treat the EcIndexBits union as a ShardBits so Count() drives the slice preallocation. Keeps the helper aligned with the rest of the EC code and survives any future expansion of the shard-count ceiling.