seaweedfs/weed/server
Chris Lu 68794fb94c fix(ec_distribute): remove partial files on copy stream error (#9543)
* fix(ec_distribute): remove partial files on copy stream error

writeToFile opens the destination with O_TRUNC and streams into it. On
a mid-stream receive / write / cancellation error it returned the
failure but left the destination behind in whatever state had been
written so far, typically 0 bytes when the source errored before
sending any FileContent. VolumeEcShardsCopy distributes .ecx by
calling doCopyFile, so this same stub-leaving behaviour produced the
0-byte .ecx files seen on EC encoding failures: the source claims a
non-zero ModifiedTsNs (so the existing "source not found" cleanup
doesn't fire), the stream then errors immediately, and the receiver
ends up with a 0-byte .ecx that downstream code mistook for a valid
empty index.

Clean up the partial file on every error path that returns from the
streaming loop (receive, write, and cancellation). Skip cleanup when
isAppend=true so resumable appends keep their existing content. As
defense in depth, VolumeEcShardsCopy also stats the .ecx after copy
and removes / errors on a 0-byte result so the orchestrator can pick
a different source.

The Rust volume server has only the source side of CopyFile (no
client-side stream-to-disk consumer) and no .ecx subsystem yet, so
this fix has no Rust mirror.

* fix(ec_distribute): close file before remove, fail fast on stat error

Address review feedback:

- writeToFile's mid-stream removeIncomplete called os.Remove while the
  destination file handle was still open. On Windows os.Remove fails
  while a handle is open, so the cleanup wouldn't run there. Wrap the
  handle close in a once-only helper, call it from removeIncomplete
  and from the existing "source not found" cleanup, and keep a deferred
  close as the safety net for the normal-return path.
- VolumeEcShardsCopy's post-copy .ecx check silently passed when
  os.Stat returned an error: doCopyFile had reported success but if
  the file was already gone, unreadable, or somehow a directory, the
  orchestrator only learned at mount time with no useful context.
  Treat any non-nil stat error and any directory result as a copy
  failure here and surface it immediately.
2026-05-18 15:19:51 -07:00