Files
seaweedfs/weed/shell
qzhello 9de9dbaa83 fix(shell): exclude failed EC shard copies from rebuild recoverability gate (#10043)
* fix(shell): correct volume.list -writable filter unit and comparison

* fix(shell): correct volume.list -writable filter unit and comparison

* chore(shell): fix typo in EC shard helper param names

* fix(shell): use exact match for volume.balance -racks/-nodes filter

The old strings.Contains-based filter quietly included any id that was a
  substring of the user-supplied flag value (e.g. -racks=rack10 also matched
  rack1). Replace it with an exact-match set parsed from the comma-separated
  flag value, and add regression tests for both -racks and -nodes paths.

  Also fix a small typo in the "remote storage" error returned by
  maybeMoveOneVolume.

* fix(shell): use exact match for volume.balance -racks/-nodes filter

The old strings.Contains-based filter quietly included any id that was a
  substring of the user-supplied flag value (e.g. -racks=rack10 also matched
  rack1). Replace it with an exact-match set parsed from the comma-separated
  flag value, and add regression tests for both -racks and -nodes paths.

  Also fix a small typo in the "remote storage" error returned by
  maybeMoveOneVolume.

* refactor(shell): drop nil sentinel in splitCSVSet, use len() in callers

* fix(shell): exclude failed EC shard copies from rebuild recoverability gate

prepareDataToRecover incremented the remote-shard counter before the copy
RPC, so in apply mode a failed VolumeEcShardsCopy was still counted toward
the DataShardsCount recoverability gate. The gate could then pass with
fewer real shards than required, deferring the failure to the deeper
generateMissingShards/reconstruct step and reporting an inflated shard
count in the "not enough shards" error.

Count the remote shard only after a successful copy (apply mode) or when
planning (dry-run), and rename wouldCopy to recoverableRemoteShards for
clarity. Add a regression test covering an apply-mode copy failure.

* fix(shell): clean up copied EC shards when the recoverability gate fails

A runtime copy failure can trip the gate after earlier copies already
succeeded, stranding those working shards on the rebuilder. Return the
copied shard ids on the error path and run the cleanup defer even when
recovery fails, so the temp shards get deleted.

---------

Co-authored-by: Chris Lu <chris.lu@gmail.com>
2026-06-22 11:23:23 -07:00
..
2026-06-19 09:09:01 -07:00
2026-06-19 09:09:01 -07:00