46 Commits

Author SHA1 Message Date
Chris Lu
f5b833ab6a test(ec): end-to-end encode over a multi-server multi-disk stuck layout (#9728)
* test(framework): support multiple disks per server in MultiVolumeCluster

StartMultiVolumeClusterWithDisks gives each volume server N data
directories (one DiskLocation each), passed to -dir as a comma list, with
a per-server disk-dir accessor for file inspection. StartMultiVolumeCluster
keeps its one-disk default.

* test(ec): end-to-end encode over a multi-server multi-disk stuck layout

A volume in the stuck state — real .dat source, a 0-byte stub replica, and
partial stale EC shards from an interrupted encode — must converge to one
valid EC layout. Asserts the full shard set across servers, .ecx/.vif kept
per server (info file survives the source-volume delete), stale shards
cleared, and no regular .dat/.idx left behind.
2026-05-28 16:44:42 -07:00
Chris Lu
3674f9d04d fix(storage): keep EC .vif when deleting a coexisting regular volume (#9723)
* fix(storage): keep EC .vif when deleting a coexisting regular volume

A regular volume and an EC volume for the same id share <base>.vif. When
EC shards are distributed onto a server that still holds the regular
volume — the encode source, or any replica the planner targets — the
post-encode VolumeDelete ran removeVolumeFiles and stripped the shared
.vif, leaving the freshly built EC volume without its info file.

Skip the .vif in removeVolumeFiles when an EC volume for the same id
exists on the disk (mounted, or a sealed .ecx on disk). The regular
volume's .dat/.idx still go; the EC sidecars survive.

A two-server end-to-end test encodes a volume whose source and a stub
replica both also receive shards, and asserts the final on-disk layout:
both .dat/.idx gone, each server holding only its assigned shards plus
.ecx/.vif. Storage unit tests cover the with-EC and no-EC cases, and the
Rust seaweed-volume port carries the same guard and tests.

* test(storage): assert .idx is removed in the no-EC destroy case

Strengthen TestDestroyRemovesVifWhenNoEc to confirm the full regular
volume cleanup (.dat, .idx, .vif) when no EC volume coexists.
2026-05-28 15:39:31 -07:00
Chris Lu
b1dcb6c52e fix(ec): delete empty stub replicas before distributing EC shards (#9722)
* fix(ec): delete empty stub replicas before distributing EC shards

An interrupted encode can leave a 0-byte .dat replica behind. Until now
the only thing that removed it was deleteOriginalVolume, which runs after
distribute+mount and calls VolumeDelete -> removeVolumeFiles. A regular
volume and an EC volume share the same <collection>_<vid>.vif path, so
deleting the stub at that point strips the .vif out from under the
freshly distributed shards.

Sweep the original replicas with VolumeDelete(OnlyEmpty=true) before
distribute: doIsEmpty uses the same superblock threshold, so only the
0-byte stubs go and any data-bearing replica is refused and kept for the
post-verify delete. Servers cleared in the sweep are skipped by
deleteOriginalVolume so it never touches a server that now holds only EC
shards.

* fix(ec): fail the encode when an empty-replica sweep can't confirm a node

The sweep swallowed every VolumeDelete(OnlyEmpty) error, so a transient
failure on a stub node fell through to the post-verify force-delete on
that node — the shared-.vif clobber the sweep exists to avoid.

Treat only the expected cases (volume not empty, or already gone) as
leave-in-place; any other error propagates and fails the encode, which
rolls back the readonly marks and retries next cycle.
2026-05-28 13:21:24 -07:00
Chris Lu
691e601e6f fix(ec): prefer credible replica as canonical metric in EC detection (#9717)
* fix(ec): prefer credible replica as canonical metric in EC detection

An interrupted encode can leave a 0-byte .dat replica behind. When that
stub sits on a lower-sorting server than the real replica, the
lowest-server canonical pick reported Size=0, tripped the min-size gate,
and the volume was stranded in skippedTooSmall: detection never proposed
an encode, so the partial EC shards were never cleared and re-distribute
kept hitting the mounted-volume guard.

selectCanonicalMetric now prefers the lowest-server credible replica
(data-bearing, not already EC), falling back to the lowest-server metric
only when nothing is credible so the downstream gates skip as before. A
leftover EC shard set on a lower server no longer short-circuits the
volume at the IsECVolume guard either, so the orphan-source cleanup and
re-encode paths get their chance.

* fix(ec): treat a bare superblock .dat as a stub too

An interrupted encode or copy can write the 8-byte superblock and then
fail, leaving an 8-byte .dat with no data. isStubReplica used a strict <
so that file slipped through as credible, could win the canonical pick on
a low server, and re-tripped the min-size gate. Use <= the superblock so a
data-less .dat never shadows a real replica.
2026-05-28 13:06:21 -07:00
Chris Lu
21f2699624 EC detection: build placement snapshot once per cycle (fix large-topology timeout) (#9625)
* EC detection: build placement snapshot once per cycle, not per volume

planECDestinations rebuilt the full ecbalancer snapshot (FromActiveTopology) for
every eligible volume, and resolved each shard destination's address via
ResolveServerAddress, which rebuilds the whole node map on every call. Both are
O(volumes x topology) and made detection time out on large clusters
(TestErasureCodingDetectionLargeTopology: 300k volumes hit the 2-minute
deadline).

Build the snapshot and the node-address map once per detection cycle and pass
them in. planECDestinations now reserves the shards it assigns directly into the
shared snapshot, so volumes planned later in the same cycle still see the reduced
capacity (previously this was observed by rebuilding from ActiveTopology's
pending tasks). Large-topology detection drops from a 120s timeout to ~3.5s.

* Update weed/worker/tasks/erasure_coding/detection.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-05-22 22:20:39 -07:00
Chris Lu
d1665750e1 Delete the EC placement package now that encode/repair use ecbalancer.Place (#9624)
Delete the EC placement package and the dead encode planner code

Now that encode (and repair) place via ecbalancer.Place, nothing uses the
erasure_coding/placement package or the EC-only planner machinery
(ecPlacementPlanner, diskInfosToCandidates, calculateECScoreCandidate,
distributeECShards) in detection.go. Removes them and the package, along with the
planner-direct unit tests.
2026-05-22 20:32:09 -07:00
Chris Lu
0566fbd552 EC encode: place shards via ecbalancer.Place + configurable replica placement (#9623)
* Add shared super_block.ResolveReplicaPlacement; use it in ec_balance

* Add ecbalancer.FromActiveTopology snapshot constructor for EC encode/repair

* Add ecbalancer.Place greenfield/repair placement core (strict + durability-first)

* topology: add GetEffectiveAvailableEcShardSlots; FromActiveTopology uses shard-granular free slots

GetDisksWithEffectiveCapacity flattens reserved shard slots into volume slots via
integer truncation, so an in-flight EC task reserving a non-multiple-of-
DataShardsCount number of shards was lost from the snapshot and freeSlots was
over-reported. GetEffectiveAvailableEcShardSlots subtracts the full reservation
impact at shard granularity.

* ecbalancer.Place: reject nodes without a free disk of the requested type

FromActiveTopology keeps all disk types in the snapshot, so an SSD-only request
could be routed to a node with only HDD capacity (pickBestDiskOnNode then returns
disk 0 on the wrong tier). Filter rack/node selection to those with a free disk
of the requested type.

* ecbalancer.Place: enforce ReplicaPlacement DiffDataCenterCount (per-DC shard cap)

* ecbalancer: enforce DiffDataCenterCount in balance (cross-DC phase + cross-rack DC cap)

Adds a cross-DC corrective phase that drains data centers holding more than
DiffDataCenterCount shards of a volume, and a per-DC cap on cross-rack move
targets. Both are no-ops when DiffDataCenterCount is unset, so balance output is
unchanged for non-DC placements.

* topology: ratio-aware EC shard slots and provisional empty-disk slot

GetEffectiveAvailableEcShardSlots now takes the target collection's data-shard
count, so a 4+2 volume's larger shards are not over-counted at 10 per volume slot;
and it keeps the one provisional slot for freshly started empty servers that
report max=0, matching getEffectiveAvailableCapacityUnsafe. FromActiveTopology
threads the ratio through.

* ecbalancer.Place: explicit disk-type filter signal (fix HDD vs any ambiguity)

HardDriveType normalizes to "", which collided with "" meaning any disk. Add
Constraints.FilterDiskType and normalize both sides so a hdd request matches disks
reported as "" and never leaks to SSD, while filter=false still means any.

* ecbalancer: add clearShardAccounting for repair snapshot reconciliation

Clears one disk's copy of a shard from per-domain accounting and recomputes the
node-level union (preserving a kept copy on another disk of the same node), without
crediting capacity. Repair uses it to drop to-be-deleted copies before placing
missing shards.

* ecbalancer: don't cap cross-DC target racks when DiffRackCount is unset

len(racks)+1 wrongly limited each target rack (3 in a 2-rack cluster), so draining
a DC could stop short of the DiffDataCenterCount cap. Use MaxShardCount+1 as the
effectively-unlimited default.

* topology/ecbalancer: ratio-correct EC capacity accounting

Reservation shard slots (default ShardsPerVolumeSlot units) are now converted to
the target ratio before subtracting, and existing EC shards are charged by size
(targetDataShards/shardDataShards) so a 2+1 shard isn't counted as one 10+4 slot.
Per-shard ratio lookup is behind shardDataShards (OSS uses the standard ratio).

* ecbalancer.Place: candidate tiering and eligible-rack caps

Adds a per-disk eligibility/preference abstraction so Place supports:
- preferred-tag whole-plan retry (try disks carrying the earliest tags first,
  widen to all only if a tier cannot place every shard; reports
  SpilledOutsidePreferredTags),
- soft disk-type spill via DiskTypePolicy (Any/Prefer/Require): Prefer fills the
  preferred type then spills, reporting SpilledToOtherDiskType; Require filters,
- even per-rack caps that divide by racks holding an eligible disk, so a tiered
  cluster (e.g. SSDs in 2 of 4 racks) isn't capped impossibly low.
Disk tags carried via Node.AddDiskTags + FromActiveTopology.

* ecbalancer: export ClearShardAccounting for repair snapshot reconciliation

* ecbalancer: address review feedback (ratio rounding, bitmap walk, same-DC moves)

- topology/ecbalancer: round shard-reservation and existing-shard footprint up
  when converting to target-ratio shard slots, so a sub-slot reservation is not
  truncated to zero and free capacity is not overstated for low-data-shard
  layouts (targetDataShards < ds).
- erasure_coding: add ShardBits.All iterator and use it across the balancer,
  cross-DC phase, and placement scoring instead of scanning 0..MaxShardCount and
  probing Has on every id.
- ecbalancer: allow same-DC cross-rack moves when a DC already sits at its
  DiffDataCenterCount cap; a same-DC move leaves the DC total unchanged. Add a
  regression test that fails without the guard.
- ecbalancer cross-DC phase: pick targets via the eligible-aware
  pickNodeInRackEligible/pickBestDiskEligible helpers so the disk-type filter is
  honored and a 0 disk id is not mistaken for a valid selection.

* ecbalancer: test ecShardSlotsOnDisk fractional round-up

Cover the mixed-ratio path (targetDataShards < existing data shards) so a
shard's fractional footprint is never floored to zero and free capacity is not
overstated. Exercises the round-up via the targetDataShards parameter; OSS uses
the standard ratio at runtime while the enterprise build hits it with real
per-volume ratios.

* ecbalancer: assert node B rack in TestFromActiveTopology

* ecbalancer: split Destination into separate DataCenter and bare Rack

Replace the composite "dc:rack" Rack field on Destination with separate
DataCenter and bare Rack values, matching topology.DiskInfo and the worker-task
convention. Callers (and tests) read the data center directly instead of parsing
the composite with strings.SplitN.

* shell ec.balance: use utilization-based global balancing (parity with worker)

The shell's global rebalance phase balanced by raw shard count; switch it to
fractional fullness (shards/capacity), as the worker already does. On uniform
capacity the two agree; on heterogeneous capacity it fills nodes proportionally
instead of driving small-capacity nodes toward full.

Updates the heterogeneous-capacity regression test to assert even fullness
(~equal shards/capacity per node) rather than even shard count.

* ecbalancer: bounded-proportional per-DC shard spread

DiffDataCenterCount was enforced only as a ceiling (drain-to-cap), which could
leave a within-cap-but-lopsided DC distribution under a loose cap (e.g. 10/4 of 14
with cap=10). Now the cross-DC phase, the cross-rack DC guard, and Place all target
boundedMaxPerDC = min(DiffDataCenterCount, max(ceil(total/numDCs), parityShards)):
shards spread proportionally across DCs, but no tighter than the durability floor
(once each DC holds <= parityShards a DC loss is recoverable, so further spreading
only adds cross-DC/WAN traffic). No-op when DiffDataCenterCount is 0; identical to
before when the cap is the binding constraint.

* ecbalancer: drop DiffDataCenterCount enforcement for EC placement

The 1-byte volume ReplicaPlacement packs xyz into x*100+y*10+z<=255, so the DC
digit can only be 0-2 -- far too small to be a meaningful per-DC EC shard cap (a
cap of 1-2 would demand 7-14 DCs for a 10+4 volume). It's volume replica-placement,
not an EC spec. Removes the cross-DC balance phase, the DC guard in the cross-rack
phase, and the per-DC cap in Place (and the just-added bounded-proportional logic);
EC relies on the RP-independent rack/node even spread instead. Rack/node caps
(DiffRackCount/SameRackCount) are unchanged. Per-domain EC caps are left for a real
EC placement spec.

* ecbalancer: enforce per-disk durability cap; symmetric reserve/release

Place now refuses to put more than parityShards shards of a volume on a single
disk (pickBestDiskEligible skips a disk once it holds parityShards of the volume,
a hard cap not relaxed even in durability-first). Previously Place assigned by
free capacity, so a skewed near-full cluster could pile >parityShards onto one
disk -> losing it loses the volume; only distinct-disk count was checked. This
covers encode and repair (both route through Place); the caller skips/leaves the
volume rather than minting an unrecoverable layout.

Also makes reserveShard decrement freeSlots unconditionally, symmetric with
releaseShard's unconditional increment (the old guarded decrement could credit a
phantom slot on release if a shard were ever reserved onto a full disk).

* ecbalancer: add Topology.ReleaseVolumeShards (clear + credit) for greenfield encode

Releases all of a volume's shards from the snapshot and credits the freed disk
capacity, so a greenfield encode can plan as if stale EC shards from a prior failed
attempt are gone. Safe to credit because the encode task deletes stale shards
(cleanupStaleEcShards) before distributing the new ones. Distinct from
ClearShardAccounting (repair), which does not credit.

* ecbalancer: ReleaseVolumeShards credits node freeSlots, not just disks

releaseShard only increments per-disk freeSlots, but rack capacity is summed from
node freeSlots (buildRacks) and node freeSlots gates node eligibility. Crediting
only disks left a node/rack looking full after releasing stale shards, so a
greenfield encode still couldn't use the freed capacity. Now credits the node by
the total disk-slots freed.

* ecbalancer: correct PlacementMode docs (encode uses durability-first)

PlaceStrict was labeled '(encode)' but encode uses PlaceDurabilityFirst. Clarify
that durability-first is used by both encode and repair, reports relaxations in
PlaceResult.Relaxed, and never relaxes the per-disk durability cap.

* ecbalancer: treat SameRackCount as a direct per-node shard cap

The 3rd ReplicaPlacement digit now caps shards per node at exactly the digit
value, matching how DiffRackCount (2nd digit) caps per rack, instead of allowing
digit+1 per node. This makes the per-rack and per-node caps consistent and
matches the documented "digits cap EC shards per rack and per node" semantics;
e.g. 011 now means at most one shard per rack and one per node.

* EC encode: place shards via ecbalancer.Place + configurable replica placement

Encode now plans destinations through the shared ecbalancer.Place policy
(durability-first: prefers the source disk type and honors replica placement /
caps / anti-affinity, relaxing rather than failing when capacity is tight) instead
of the EC-only placement planner. Targets and capacity reservations use Place's
actual per-disk shard assignment, not a round-robin guess; cross-volume in-cycle
capacity is tracked by ActiveTopology's pending task, so the cached planner is no
longer consulted. Adds a configurable replica_placement (proto field 6 + worker
form + reader) that overrides the master default replication.

The placement-package planner code is left in place (now unused) and removed in a
follow-up that drops the package.

* EC encode: drop unused dataShards param from createECTargets

Addresses review feedback: after switching to Place's per-disk shardsPerPlan
assignment, createECTargets no longer needs the data-shard count.

* EC encode: fix packed-target validation, greenfield stale-shard accounting, RP docs

- Validate counts distinct shard ids across targets, not target rows, so packed
  plans (fewer (node,disk) targets than shards) aren't rejected.
- planECDestinations releases the volume's stale EC shards from the snapshot before
  Place (ReleaseVolumeShards), crediting their capacity. The encode task deletes
  stale shards before distributing, so a retry on tight capacity no longer fails
  planning by counting shards that are about to be removed.
- replica_placement config/form help no longer claims a data-center limit (the DC
  digit is ignored for EC); detection logs a warning when a DC digit is set.

* EC encode: surface relaxed placement; mark replica_placement best-effort

Encode places with PlaceDurabilityFirst (the chosen lenient behavior), which can
relax caps/anti-affinity/replica-placement to avoid deferring. That was silent
(only disk-type/tag spills were logged). Now logs PlaceResult.Relaxed so a tight
replica placement isn't weakened unnoticed, and the config/form help states the
rack/node caps are best-effort during encode (enforced by rebalancing).

* EC encode: key per-disk shard grouping by struct, not formatted string

planECDestinations grouped destinations using a fmt.Sprintf("%s:%d") map key
per shard; use a {node,diskID} struct key and pre-size the map/slice to the
shard count to drop the per-shard string allocation.
2026-05-22 20:22:30 -07:00
Chris Lu
0accff0e4a fix(ec): log EC destination planning failures at v=2
The maintenance scanner tries to plan EC destinations for every
eligible volume, so clusters that can't place EC logged a warning per
volume every cycle. The min-node gate already skips clusters with fewer
nodes than parity shards; demote the rest to V(2).
2026-05-21 10:35:34 -07:00
Chris Lu
cd15ae1395 fix(ec): bring ec.encode worker and EC/volume helpers to parity with shell (#9599)
* refactor(volume): extract replica sync/select into shared volume_replica package

Move the volume replica reconciliation helpers (status, union builder,
SyncAndSelectBestReplica, ReadNeedleMeta) out of the shell into a new
weed/storage/volume_replica package so both the shell (ec.encode, volume.tier.move,
volume.check.disk) and the EC encode worker can reuse them. No behavior change.

* fix(ec): bring ec.encode worker to parity with the shell

- Sync replicas and encode the most-complete one (via the shared
  volume_replica.SyncAndSelectBestReplica) instead of a possibly-stale replica,
  marking all replicas readonly first. Prevents silent data loss when a stale
  replica is encoded and the originals deleted.
- Skip remote/tiered volumes in detection (shell ec.encode excludes them).
- Min-node safety gate: refuse to encode when cluster nodes < parity shards.
- Align default thresholds with the shell (fullness 0.95, quiet 1h).

* fix(vacuum): plugin path honors min_volume_age_seconds override

deriveVacuumConfig hard-coded MinVolumeAgeSeconds=0, dropping any configured
value. Read it from worker config (default 0, matching the shell/master vacuum
which has no age gate) so an explicit override is honored.

* address review feedback

- config.go: align GetConfigSpec schema defaults (quiet_for_seconds=3600,
  fullness_ratio=0.95) with the runtime defaults so UI/bootstrap flows match the
  shell (coderabbitai).
- ec_task.go: roll back readonly when markReplicasReadonly fails partway, so
  already-marked replicas don't stay readonly (coderabbitai).
- volume_replica: pass the caller's replica statuses into buildUnionReplica instead
  of re-fetching them, and skip the per-needle ReadNeedleMeta RPC when the source
  replica is read-only (gemini-code-assist).

* test(plugin_workers/ec): make fixtures eligible under the new defaults

The default EC encode thresholds were raised to match the shell (fullness 0.95,
quiet 1h), but the plugin-worker integration fixtures still used 90%-full /
10-minute-old volumes, so detection found no eligible volumes and the tests failed
in CI. Bump the eligible fixtures to 96% full and 2h old.
2026-05-21 02:16:28 -07:00
Chris Lu
024b59fb31 fix(ec): pack EC shards onto fewer disks instead of refusing the task (#9588)
The planner refused to create an EC task unless it found totalShards
distinct (server, disk_id) targets, so a cluster with fewer disks than
shards (e.g. 8 single-disk servers for a 10+4 scheme) could never encode.

A disk safely holds several distinct shards of one volume: each is its own
.ecNN file and ReceiveFile keys by that extension. Drop the strict check and
let createECTargets round-robin shards across the available disks, matching
ec.encode's "4,4,3,3" fallback. The minTotalDisks floor (ceil(total/parity))
already keeps any disk under parityShards shards, so the volume still
survives losing any one disk.

Reserve capacity for the actual per-disk shard count rather than assuming
one shard each, so packing doesn't over-commit disk slots.
2026-05-20 11:50:42 -07:00
Chris Lu
2a41e76101 fix(ec): blanket-clean every destination over the full shard range (#9512)
* fix(ec): blanket-clean every destination over the full shard range

The previous cleanup pass walked t.sources only, with the shard ids the
topology had reported at detection time. In the wild, a destination can
end up with EC shards mounted that the topology snapshot didn't list —
shards on a sibling disk that hadn't heartbeated, or shards left over
from a concurrent attempt's mount step. FindEcVolume still returns
true, so the next ReceiveFile trips the mounted-volume guard.

Cleanup now unions t.sources (with ShardIds) and t.targets and issues
unmount + delete over [0..totalShards-1] on each. Both RPCs are
idempotent on missing shards, so the wider sweep is free.

Two new tests cover the gap: shards mounted beyond what t.sources
lists, and a target-only destination with no source row.

* log(ec): include disk_id in EC unmount/delete/refusal log lines

The current logs identify the volume and shard but leave disk_id off,
which makes the cross-server cleanup story hard to follow when
multiple disks of one server hold pieces of the same volume:

  UnmountEcShards 4121.1                              -> add disk_id
  ec volume video-recordings_4121 shard delete [1 5]  -> add per-loc disk_id
  volume server X:Y deletes ec shards from 4121 [...] -> add disk_id
  ReceiveFile: ec volume 4121 is mounted; refusing... -> add disk_ids

ReceiveFile's refusal now names the disk_ids actually holding the
mount so operators can see whether the next cleanup pass needs to
target a sibling disk. Added Store.FindEcVolumeDiskIds /
Store::find_ec_volume_disk_ids as the supporting primitive.

Mirrored in seaweed-volume/src/ (unmount log in Store::unmount_ec_shard,
heartbeat delete log in diff_ec_shard_delta_messages, refusal in the
ReceiveFile handler).

* test(ec): stub VolumeEcShardsUnmount/Delete on the fake volume server

The plugin-worker EC tests boot a fake volume server that embeds
UnimplementedVolumeServerServer. After the worker started calling
VolumeEcShardsUnmount + VolumeEcShardsDelete pre-distribute, the
default Unimplemented response surfaced as fourteen "method not
implemented" errors and TestErasureCodingExecutionEncodesShards
failed. Both RPCs are no-ops here — nothing on the fake server has
mounted state or persisted shard files to remove.
2026-05-17 11:31:37 -07:00
Chris Lu
2c1482f7a6 fix(ec): clear cross-server stale EC shards before re-distribute (#9478) (#9499)
* fix(ec): clear cross-server stale EC shards before re-distribute (#9478)

A previous failed encode leaves partial .ec?? shards mounted on
destination volume servers that are not the .dat owner. PR #9480 only
prunes when the .dat sits on a sibling disk of the SAME store, so the
cross-server case stays stuck: every retry trips
volume_grpc_copy.go:570's "ec volume %d is mounted; refusing overwrite"
guard and the scheduler loops.

Detection already lists existing EC shards as CleanupECShards sources;
plumb the shard ids through (ActiveTopology.GetECShardLocations,
TaskSourceSpec, TaskSource.shard_ids) and have the EC worker call
VolumeEcShardsUnmount + VolumeEcShardsDelete on each destination after
the local shard set is generated and before distributeEcShards. Skip
EC-shard sources in getReplicas so the post-encode VolumeDelete step
does not target destination-only nodes.

Integration test mounts a partial shard subset, asserts the
mounted-volume refusal, runs cleanupStaleEcShards, and asserts the
next ReceiveFile lands.

* chore(ec): tighten code comments in stale-shard cleanup

Drop issue-number refs from code comments and shorten the docstrings
on cleanupStaleEcShards / unmountAndDeleteEcShards / getReplicas plus
the new test file. Behavior unchanged.

* fix(ec): skip empty-ShardIds locations; dedupe getReplicas by node

GetECShardLocations dropped entries where ecShardMatchesCollection saw a
phantom info record with EcIndexBits=0 — without ShardIds, getReplicas
misread the resulting source as a regular replica and would have called
VolumeDelete on a destination-only node.

getReplicas now dedupes by Node since VolumeDelete is server-wide;
per-disk source rows on the same server collapse to one call.

* refactor(ec): use MaxShardCount and ShardBits in collectShardIdsForDisk

Drop the literal 32 bit-iteration bound for erasure_coding.MaxShardCount
and treat the EcIndexBits union as a ShardBits so Count() drives the
slice preallocation. Keeps the helper aligned with the rest of the EC
code and survives any future expansion of the shard-count ceiling.
2026-05-14 11:57:45 -07:00
Chris Lu
3a8389cd68 fix(ec): verify full shard set before deleting source volume (#9490) (#9493)
* fix(ec): verify full shard set before deleting source volume (#9490)

Before this change, both the worker EC task and the shell ec.encode
command would delete the source .dat as soon as MountEcShards returned —
even if distribute/mount failed partway, leaving fewer than 14 shards
in the cluster. The deletion was logged at V(2), so by the time someone
noticed missing data the only trace was a 0-byte .dat synthesized by
disk_location at next restart.

- Worker path adds Step 6: poll VolumeEcShardsInfo on every destination,
  union the bitmaps, and refuse to call deleteOriginalVolume unless all
  TotalShardsCount distinct shard ids are observed. A failed gate leaves
  the source readonly so the next detection scan can retry.
- Shell ec.encode adds the same gate after EcBalance, walking the master
  topology with collectEcNodeShardsInfo.
- VolumeDelete RPC success and .dat/.idx unlinks now log at V(0) so any
  source destruction is traceable in default-verbosity production logs.

The EC-balance-vs-in-flight-encode race is intentionally left for a
follow-up; balance should refuse to move shards for a volume whose
encode job is not in Completed state.

* fix(ec): trim doc comments on the new shard-verification path

Drop WHAT-describing godoc on freshly added helpers; keep only the WHY
notes (query-error policy in VerifyShardsAcrossServers, the #9490
reference at the call sites).

* fix(ec): drop issue-number anchors from new comments

Issue references age poorly — the why behind each comment already
stands on its own.

* fix(ec): parametrize RequireFullShardSet on totalShards

Take totalShards as an argument instead of reading the package-level
TotalShardsCount constant. The OSS callers continue to pass 14, but the
helper is now usable with any DataShards+ParityShards ratio.

* test(plugin_workers): make fake volume server respond to VolumeEcShardsInfo

The new pre-delete verification gate calls VolumeEcShardsInfo on every
destination after mount, and the fake server's UnimplementedVolumeServer
returns Unimplemented — the verifier read that as zero shards on every
node and aborted source deletion. Build the response from recorded
mount requests so the integration test exercises the gate end-to-end.

* fix(rust/volume): log .dat/.idx unlink with size in remove_volume_files

Mirror the Go-side change in weed/storage/volume_write.go: stat each
file before removing and emit an info-level log for .dat/.idx so a
destructive call is always traceable. The OSS Rust crate previously
unlinked them silently.

* fix(ec/decode): verify regenerated .dat before deleting EC shards

After mountDecodedVolume succeeds, the previous code immediately
unmounts and deletes every EC shard. A silent failure in generate or
mount could leave the cluster with neither shards nor a valid normal
volume. Probe ReadVolumeFileStatus on the target and refuse to proceed
if dat or idx is 0 bytes.

Also make the fake volume server's VolumeEcShardsInfo reflect whichever
shard files exist on disk (seeded for tests as well as mounted via
RPC), so the new gate can be exercised end-to-end.

* fix(ec): address PR review nits in verification + fake server

- Drop unused ServerShardInventory.Sizes field.
- Skip shard ids >= MaxShardCount before bitmap Set so the ShardBits
  bound is explicit (Set already no-ops on overflow, this is for
  clarity).
- Nil-guard the fake server's VolumeEcShardsInfo so a malformed call
  doesn't panic the test process.
2026-05-13 19:29:24 -07:00
Chris Lu
d221a64262 fix(ec): skip re-encode when EC shards already exist for the volume (#9448) (#9458)
* fix(ec): skip re-encode when EC shards already exist for the volume (#9448)

When an earlier EC encoding succeeded but the post-encode source-delete
left a regular replica behind on one of the servers, the next detection
cycle proposes the same volume again. The new encode tries to redistribute
shards to targets that already have them mounted, the volume server
returns `ec volume %d is mounted; refusing overwrite`, the task fails,
and detection re-queues the volume. The cycle repeats forever — issue
#9448.

The existing `metric.IsECVolume` skip catches the case where the canonical
metric is reported on the EC-shard side of the heartbeat, but when the
master sees BOTH a regular replica AND its EC shards in the same volume
list, the canonical metric we pick is the regular replica and
IsECVolume is false. Add a second guard that checks the topology
directly via `findExistingECShards` (already present and indexed) and
skip the volume when any shards exist, logging a warning that points
the admin at the stuck source.

This breaks the loop. Auto-cleanup of the orphaned replica is left as
follow-up work — deleting a source replica from inside the detector is
only safe with a re-verification step right before the delete, plus a
config opt-in, and is best done in its own change.

* fix(ec): #9448 guard only fires when EC shard set is complete

The first version of the #9448 guard tripped on `len(existingShards) > 0`,
which is broader than necessary. The existing recovery branch in the
encode arm (around the `existingECShards` block, ~line 216) is designed
to fold partial leftover shards from a previously failed encode into
the new task as cleanup sources. Skipping unconditionally on any
existing shards made that branch dead code, regressing the recovery
behavior Gemini flagged in the review of af09e1ec7.

Two corrections:

  1. New helper `countExistingEcShardsForVolume` walks each disk's
     `EcIndexBits` bitmap and ORs the results into a `ShardBits`,
     returning the distinct-shard popcount. This is the right unit:
     a single `VolumeEcShardInformationMessage` can carry several
     shards, so `len(EcShardInfos)` is not the same as the number
     of present shards. Per Gemini's "use helper functions that walk
     the actual shard bitmap" note.
  2. The guard now fires only when `shardCount >= totalShards`.
     Partial shard sets fall through to the existing recovery branch,
     unchanged.

Tests:
  - TestDetectionSkipsWhenECShardsAlreadyExist: complete shards →
    no proposal (the regression test for #9448 itself, unchanged
    intent, rewritten on top of new helpers).
  - TestDetectionAllowsRegularReplicaWhenShardsPartial: partial
    shards → guard does NOT swallow the volume; the encode arm
    still gets a chance.
  - TestCountExistingEcShardsForVolume: the helper walks the
    bitmap correctly even when one info entry packs multiple
    shards on one disk.

The dangerous `volume.delete` hint in the warning is unchanged for
now — it gets fixed in the next commit.

* fix(ec): drop dangerous shell-command hint from #9448 warning

The previous warning told operators to run `volume.delete -volumeId=%d`
in the SeaweedFS shell to clean up the orphaned source replica. That
command is cluster-wide — it deletes every replica of the volume,
including the EC shards, which share the same volume id. Running it
in the state the message describes would cause the data loss the
guard exists to prevent.

Replace it with explicit guidance that the cleanup must be a targeted
VolumeDelete RPC against the source server only, and that the
shell command is the exact wrong thing to use here. The next two
commits add the plumbing and the auto-execution of that targeted
delete so most operators never see this hint at all.

Per Gemini comment on af09e1ec7.

* feat(worker): plumb grpc dial option through ClusterInfo

Add ClusterInfo.GrpcDialOption (optional) and set it in the
erasure_coding plugin handler. Lets the detector make targeted
gRPC calls during detection — used by the follow-up commit to
auto-clean orphan source replicas via VolumeDelete RPCs.

Zero-value safe: existing detectors that don't need RPC access
get a nil DialOption and ignore the field.

* feat(ec): auto-clean orphan source replica via targeted VolumeDelete

Builds on the previous commits: the guard now identifies the
#9448 stuck-source state and a gRPC dial option is available on
ClusterInfo. When both are true, detection auto-cleans the
orphaned regular replica instead of just warning the operator.

New helper `cleanupOrphanSourceReplicas`:

  1. Re-verifies the EC shard set is still complete via
     `countExistingEcShardsForVolume` against the live topology
     snapshot. If the count dropped between detection start and
     the cleanup decision (a volume server going down mid-cycle),
     it aborts — the source replica is the only complete copy and
     deleting it without a healthy shard set would be data loss.
  2. Issues targeted VolumeDelete RPCs to each regular-replica
     server via `operation.WithVolumeServerClient`. That RPC only
     touches the regular volume on the targeted server; EC shards
     live in a separate store path and are not affected. This is
     the safe alternative to the cluster-wide `volume.delete`
     shell command we previously warned against.

If the cleanup partially fails (one replica delete errors, others
succeed), detection logs the failure and continues to skip the
volume. The next detection cycle will try again. We deliberately
don't fall back to a re-encode because that would just collide
with the mounted shards on the targets again.

When no dial option is available the existing warning still
points operators at the safe manual procedure.
2026-05-11 23:12:57 -07:00
Chris Lu
532b088262 fix(ec): preserve source disk type across EC encoding (#9423) (#9449)
* fix(ec): carry source disk type on VolumeEcShardsMount (#9423)

When EC shards land on a target whose disk type differs from the
source volume's, master heartbeats wrongly reported under the target
disk's type. Add source_disk_type to VolumeEcShardsMountRequest; the
target server applies it to the in-memory EcVolume via SetDiskType so
the mount notification and steady-state heartbeat both carry the
source's disk type. Empty value falls back to the location's disk
type (used by disk-scan reload paths).

The override is not persisted with the volume — disk type stays an
environmental property and .vif remains portable.

* fix(ec): plumb source disk type through plugin worker (#9423)

Add source_disk_type to ErasureCodingTaskParams (field 8; 7 reserved),
populate it from the metric the detector already collects, thread it
through ec_task into the MountEcShards helper, and forward it on the
VolumeEcShardsMount RPC.

* fix(ec): mirror source disk type plumbing in rust volume server (#9423)

The volume_ec_shards_mount handler now forwards source_disk_type into
mount_ec_shard → DiskLocation::mount_ec_shards. When non-empty it
overrides ec_vol.disk_type (and each mounted shard's disk_type) via
the new set_disk_type method; empty value keeps the location's disk
type, so disk-scan reload and reconcile paths are unchanged.

Also picks up two pre-existing proto drifts that 'make gen' synced
from weed/pb (LockRingUpdate in master.proto, listing_cache_ttl_seconds
in remote.proto).

* feat(ec): bias placement toward preferred disk type (#9423)

Add DiskCandidate.DiskType and PlacementRequest.PreferredDiskType.
When PreferredDiskType is non-empty, SelectDestinations partitions
suitable disks into matching/fallback tiers and runs the rack/server/
disk-diversity passes on the matching tier first; the fallback tier
is only consulted if the matching pool can't satisfy ShardsNeeded.
PlacementResult.SpilledToOtherDiskType lets callers warn on spillover.

Empty PreferredDiskType keeps the existing single-pool behavior.

* fix(ec): plumb source disk type into placement planner (#9423)

diskInfosToCandidates now copies DiskInfo.DiskType into the placement
candidate, and ecPlacementPlanner.selectDestinations forwards
metric.DiskType as PreferredDiskType so EC shards land on disks
matching the source volume's disk type when possible. A glog warning
fires when placement had to spill to other disk types.

* test(ec): integration coverage for source-disk-type plumbing (#9423)

store_ec_disk_type_test exercises Store.MountEcShards end-to-end: a
shard physically lives on an HDD location, MountEcShards is called
with sourceDiskType="ssd", and the test asserts that the in-memory
EcVolume, the mounted shard, the NewEcShardsChan notification, and
the steady-state heartbeat all report under the source's disk type.
A companion test pins the empty-source path so disk-scan reload
keeps the location's disk type.

detection_disk_type_test exercises the worker plumbing: with a
cluster of nodes carrying both HDD and SSD disks, planECDestinations
must place every shard on SSD when metric.DiskType="ssd"; with only
one SSD node and 13 HDD nodes it must still satisfy a 10+4 layout
via spillover (and log a warning).

* revert(ec): drop unrelated proto drift in seaweed-volume/proto (#9423)

make gen pulled two pre-existing OSS changes into the rust proto
tree (LockRingUpdate / by_plugin in master.proto,
listing_cache_ttl_seconds in remote.proto). Reviewers flagged it as
scope creep — none of the rust EC fix references those fields.
Restore both files to origin/master so this branch only touches
EC-related symbols.

* fix(ec placement): treat empty disk type as hdd and skip used racks on spill (#9423)

partitionByDiskType used raw string comparison, so a PreferredDiskType
of "hdd" never matched candidates whose DiskType is "" (the
HardDriveType sentinel that weed/storage/types uses). EC encoding of
an HDD source would spill onto any HDD reporting "" even when the
cluster has plenty of matching capacity. Normalize both sides
through normalizeDiskType, which lowercases and folds "" → "hdd",
mirroring types.ToDiskType without taking a dependency on it.

selectFromTier's rack-diversity pass also kept revisiting racks the
preferred tier had already used when running on the fallback tier,
which negated PreferDifferentRacks on spillover. Skip racks already
in usedRacks so fallback placements still spread onto new racks.

* fix(ec): empty-source remount must not clobber existing disk type (#9423)

mount_ec_shards_with_idx_dir runs more than once per vid (RPC mount,
disk-scan reload, orphan-shard reconcile). After an RPC sets the
source-derived disk type, any later call passing source_disk_type=""
was resetting ec_vol.disk_type back to the location's value, which
reintroduces the heartbeat drift this PR is meant to fix. Only
default to the location's disk type when the EC volume is fresh
(no shards mounted yet); otherwise leave the recorded type alone so
empty-source reloads preserve whatever the original mount RPC set.
2026-05-11 20:21:50 -07:00
Chris Lu
fd463155e4 fix(ec): planner treats each (server, disk_id) as a distinct target (#9369) (#9371)
* fix(ec): planner treats each (server, disk_id) as a distinct target (#9369)

master_pb.DataNodeInfo.DiskInfos is keyed by disk type, so a volume
server with multiple physical disks of the same type collapses into a
single DiskInfo. Per-disk attribution survives only inside the
VolumeInfos[].DiskId / EcShardInfos[].DiskId records, and the active
topology never put it back together. The EC planner saw N candidates
instead of N×disks, returned a short plan, and createECTargets
round-robined extra shards onto the same (server, disk_id) — colliding
with the #9185 disk_id-aware ReceiveFile.

Reconstruct per-physical-disk view in UpdateTopology by splitting each
DiskInfo into one entry per observed disk_id, and index volumes / EC
shards by their own DiskId so lookups stay aligned. Refuse to plan an
EC task when fewer than totalShards distinct disks are available rather
than packing shards onto the same disk.

Threads dataShards/parityShards through planECDestinations,
createECTargets and createECTaskParams so the helpers don't depend on
the OSS 10+4 constants — keeps enterprise merges clean.

* trim verbose comments

* align EC param signatures with enterprise

- dataShards/parityShards: uint32 → int (matches enterprise's ratio API)
- drop unused multiPlan from createECTaskParams
- minTotalDisks: total/parity+1 → ceil(total/parity), correct for non-default ratios

Reduces merge surface when this PR lands in seaweed-enterprise.
2026-05-08 12:59:02 -07:00
Chris Lu
5d43f84df7 refactor(plugin): rename detection_interval_seconds → detection_interval_minutes (#9366)
Minutes is the natural granularity for detection cadence — every
production handler already set the seconds field to a 60-multiple
(17*60, 30*60, 3600, 24*60*60). Switching to minutes drops the *60
arithmetic and matches the unit conventions used elsewhere in the
plugin worker forms.

- Proto: AdminRuntimeDefaults + AdminRuntimeConfig.detection_interval_*
  field renamed.
- Helpers: durationFromMinutes / minutesFromDuration alongside the
  existing seconds variants in plugin_scheduler.go.
- Handlers: vacuum, ec_balance, balance, erasure_coding, iceberg,
  admin_script, s3_lifecycle now declare DetectionIntervalMinutes.
- Admin: scheduler_status + types + UI templ + plugin_api.go pass
  through the new field; UI label and table cells switch to "min".
2026-05-08 10:33:02 -07:00
Chris Lu
1f6f473995 refactor(worker): co-locate plugin handlers with their task packages (#9301)
* refactor(worker): co-locate plugin handlers with their task packages

Move every per-task plugin handler from weed/plugin/worker/ into the
matching weed/worker/tasks/<name>/ package, so each task owns its
detection, scheduling, execution, and plugin handler in one place.

Step 0 (within pluginworker, no behavior change): extract shared helpers
that previously lived inside individual handler files into dedicated
files and export the ones now consumed across packages.

  - activity.go: BuildExecutorActivity, BuildDetectorActivity
  - config.go: ReadStringConfig/Double/Int64/Bytes/StringList, MapTaskPriority
  - interval.go: ShouldSkipDetectionByInterval
  - volume_state.go: VolumeState + consts, FilterMetricsByVolumeState/Location
  - collection_filter.go: CollectionFilterMode + consts
  - volume_metrics.go: export CollectVolumeMetricsFromMasters,
    MasterAddressCandidates, FetchVolumeList
  - testing_senders_test.go: shared test stubs

Phase 1: move the per-task plugin handlers (and the iceberg subpackage)
into their task packages.

  weed/plugin/worker/vacuum_handler.go         -> weed/worker/tasks/vacuum/plugin_handler.go
  weed/plugin/worker/ec_balance_handler.go     -> weed/worker/tasks/ec_balance/plugin_handler.go
  weed/plugin/worker/erasure_coding_handler.go -> weed/worker/tasks/erasure_coding/plugin_handler.go
  weed/plugin/worker/volume_balance_handler.go -> weed/worker/tasks/balance/plugin_handler.go
  weed/plugin/worker/iceberg/                   -> weed/worker/tasks/iceberg/

  weed/plugin/worker/handlers/handlers.go now blank-imports all five
  task subpackages so their init() registrations fire.

  weed/command/mini.go and the worker tests construct the handler with
  vacuum.DefaultMaxExecutionConcurrency (the constant moved with the
  vacuum handler).

admin_script remains in weed/plugin/worker/ because there is no
underlying weed/worker/tasks/admin_script/ package to merge with.

* refactor(worker): update test/plugin_workers imports for moved handlers

Three handler constructors moved out of pluginworker into their task
packages — update the integration test files in test/plugin_workers/
to import from the new locations:

  pluginworker.NewVacuumHandler        -> vacuum.NewVacuumHandler
  pluginworker.NewVolumeBalanceHandler -> balance.NewVolumeBalanceHandler
  pluginworker.NewErasureCodingHandler -> erasure_coding.NewErasureCodingHandler

The pluginworker import is kept where the file still uses
pluginworker.WorkerOptions / pluginworker.JobHandler.

* refactor(worker): update test/s3tables iceberg import path

The iceberg subpackage moved from weed/plugin/worker/iceberg/ to
weed/worker/tasks/iceberg/. test/s3tables/maintenance/maintenance_integration_test.go
still imported the old path, breaking S3 Tables / RisingWave / Trino /
Spark / Iceberg-catalog / STS integration test builds.

Mirrors the OSS-side fix needed by every job in the run that
transitively imports test/s3tables/maintenance.

* chore: gofmt PR-touched files

The S3 Tables Format Check job runs `gofmt -l` over weed/s3api/s3tables
and test/s3tables, then fails if anything is unformatted. Files this
PR moved or modified had import-grouping and trailing-spacing issues
introduced by perl-based renames; reformat them with gofmt -w.

Touched files:
  test/plugin_workers/erasure_coding/{detection,execution}_test.go
  test/s3tables/maintenance/maintenance_integration_test.go
  weed/plugin/worker/handlers/handlers.go
  weed/worker/tasks/{balance,ec_balance,erasure_coding,vacuum}/plugin_handler*.go

* refactor(worker): bounds-checked int conversions for plugin config values

CodeQL flagged 18 go/incorrect-integer-conversion warnings on the moved
plugin handler files: results of pluginworker.ReadInt64Config (which
ultimately calls strconv.ParseInt with bit size 64) were being narrowed
to int32/uint32/int without an upper-bound check, so a malicious or
malformed admin/worker config value could overflow the target type.

Add three helpers in weed/plugin/worker/config.go that wrap
ReadInt64Config and clamp out-of-range values back to the caller's
fallback:

  ReadInt32Config (math.MinInt32 .. math.MaxInt32)
  ReadUint32Config (0 .. math.MaxUint32)
  ReadIntConfig    (math.MinInt32 .. math.MaxInt32, platform-portable)

Update each flagged call site in the four moved task packages to use
the bounds-checked helper. For protobuf uint32 fields (volume IDs)
the variable type also becomes uint32, removing the trailing
uint32(volumeID) casts and changing the "missing volume_id" check
from `<= 0` to `== 0`.

Touched files:
  weed/plugin/worker/config.go
  weed/worker/tasks/balance/plugin_handler.go
  weed/worker/tasks/erasure_coding/plugin_handler.go
  weed/worker/tasks/vacuum/plugin_handler.go

* refactor(worker): use ReadIntConfig for clamped derive-worker-config helpers

CodeQL still flagged three call sites where ReadInt64Config was being
narrowed to int after a value-range clamp (max_concurrent_moves <= 50,
batch_size <= 100, min_server_count >= 2). The clamp is correct but
CodeQL's flow analysis didn't recognize the bound, so it flagged them
as unbounded narrowing.

Switch to ReadIntConfig (already int32-bounded by the helper) for
those three sites, drop the now-redundant int64 intermediate variables.

Also drops the now-unused `> math.MaxInt32` clamp in
ec_balance.deriveECBalanceWorkerConfig (the helper covers it).
2026-05-02 18:03:13 -07:00
Chris Lu
628363c4a6 fix(erasure_coding): surface replica delete failures from EC task (#9184) (#9187)
* test(erasure_coding): reproduce #9184 deleteOriginalVolume swallowing errors

ErasureCodingTask.deleteOriginalVolume logs a warning when any replica
VolumeDelete fails and then returns nil, so the EC task reports
success to the admin even when a source replica survives. That stale
replica lets a later detection scan re-propose the same volume and,
once retried, drives the mounted-shard-truncation corruption that
issue 9184 also describes.

Reproducer: wire one reachable replica (succeeds) and one unreachable
replica (fails) and assert the function currently returns nil. After
the fix the function must surface the replica failure so the task is
retried rather than marked done, and this test needs to be inverted.

* fix(erasure_coding): surface replica delete failures from EC task

ErasureCodingTask.deleteOriginalVolume previously logged a warning
and returned nil when any VolumeDelete against a source replica
failed. The EC task therefore reported overall success to the admin
even when a source replica stayed on disk, which let a later
detection scan propose a duplicate EC encoding of the same volume.
The retry then walked the ReceiveFile path against servers that
already had mounted EC shards for the volume, truncating the live
shard files in place (the other half of #9184).

This change returns an error describing the per-replica failures
after the best-effort delete pass, so the task is marked failed
instead of silently moving on. Successful deletes are still applied
(per-replica progress is preserved); only the final return changes.

When combined with the ReceiveFile mount-safety check, a stuck
original replica now produces loud, actionable failures instead of
silent corruption.

Tests:
- TestDeleteOriginalVolumeSurfacesReplicaFailures: asserts an error
  is returned and names the unreachable replica, while the reachable
  replica still gets deleted.
- TestDeleteOriginalVolumeSucceedsWhenAllReplicasReachable: pins the
  happy path.
2026-04-22 16:02:51 -07:00
Chris Lu
940eed0bd3 fix(ec): generate .ecx before EC shards to prevent data inconsistency (#8972)
* fix(ec): generate .ecx before EC shards to prevent data inconsistency

In VolumeEcShardsGenerate, the .ecx index was generated from .idx AFTER
the EC shards were generated from .dat. If any write occurred between
these two steps (e.g. WriteNeedleBlob during replica sync, which bypasses
the read-only check), the .ecx would contain entries pointing to data
that doesn't exist in the EC shards, causing "shard too short" and
"size mismatch" errors on subsequent reads and scrubs.

Fix by generating .ecx FIRST, then snapshotting datFileSize, then
encoding EC shards. If a write sneaks in after .ecx generation, the
EC shards contain more data than .ecx references — which is harmless
(the extra data is simply not indexed).

Also snapshot datFileSize before EC encoding to ensure the .vif
reflects the same .dat state that .ecx was generated from.

Add TestEcConsistency_WritesBetweenEncodeAndEcx that reproduces the
race condition by appending data between EC encoding and .ecx generation.

* fix: pass actual offset to ReadBytes, improve test quality

- Pass offset.ToActualOffset() to ReadBytes instead of 0 to preserve
  correct error metrics and error messages within ReadBytes
- Handle Stat() error in assembleFromIntervalsAllowError
- Rename TestEcConsistency_DatFileGrowsDuringEncoding to
  TestEcConsistency_ExactLargeRowEncoding (test verifies fixed-size
  encoding, not concurrent growth)
- Update test comment to clarify it reproduces the old buggy sequence
- Fix verification loop to advance by readSize for full data coverage

* fix(ec): add dat/idx consistency check in worker EC encoding

The erasure_coding worker copies .dat and .idx as separate network
transfers. If a write lands on the source between these copies, the
.idx may have entries pointing past the end of .dat, leading to EC
volumes with .ecx entries that reference non-existent shard data.

Add verifyDatIdxConsistency() that walks the .idx and verifies no
entry's offset+size exceeds the .dat file size. This fails the EC
task early with a clear error instead of silently producing corrupt
EC volumes.

* test(ec): add integration test verifying .ecx/.ecd consistency

TestEcIndexConsistencyAfterEncode uploads multiple needles of varying
sizes (14B to 256KB), EC-encodes the volume, mounts data shards, then
reads every needle back via the EC read path and verifies payload
correctness. This catches any inconsistency between .ecx index entries
and EC shard data.

* fix(test): account for needle overhead in test volume fixture

WriteTestVolumeFiles created a .dat of exactly datSize bytes but the
.idx entry claimed a needle of that same size. GetActualSize adds
header + checksum + timestamp overhead, so the consistency check
correctly rejects this as the needle extends past the .dat file.

Fix by sizing the .dat to GetActualSize(datSize) so the .idx entry
is consistent with the .dat contents.

* fix(test): remove flaky shard ID assertion in EC scrub test

When shard 0 is truncated on disk after mount, the volume server may
detect corruption via parity mismatches (shards 10-13) rather than a
direct read failure on shard 0, depending on OS caching/mmap behavior.
Replace the brittle shard-0-specific check with a volume ID validation.

* fix(test): close upload response bodies and tighten file count assertion

Wrap UploadBytes calls with ReadAllAndClose to prevent connection/fd
leaks during test execution. Also tighten TotalFiles check from >= 1
to == 1 since ecSetup uploads exactly one file.
2026-04-07 19:05:36 -07:00
Chris Lu
995dfc4d5d chore: remove ~50k lines of unreachable dead code (#8913)
* chore: remove unreachable dead code across the codebase

Remove ~50,000 lines of unreachable code identified by static analysis.

Major removals:
- weed/filer/redis_lua: entire unused Redis Lua filer store implementation
- weed/wdclient/net2, resource_pool: unused connection/resource pool packages
- weed/plugin/worker/lifecycle: unused lifecycle plugin worker
- weed/s3api: unused S3 policy templates, presigned URL IAM, streaming copy,
  multipart IAM, key rotation, and various SSE helper functions
- weed/mq/kafka: unused partition mapping, compression, schema, and protocol functions
- weed/mq/offset: unused SQL storage and migration code
- weed/worker: unused registry, task, and monitoring functions
- weed/query: unused SQL engine, parquet scanner, and type functions
- weed/shell: unused EC proportional rebalance functions
- weed/storage/erasure_coding/distribution: unused distribution analysis functions
- Individual unreachable functions removed from 150+ files across admin,
  credential, filer, iam, kms, mount, mq, operation, pb, s3api, server,
  shell, storage, topology, and util packages

* fix(s3): reset shared memory store in IAM test to prevent flaky failure

TestLoadIAMManagerFromConfig_EmptyConfigWithFallbackKey was flaky because
the MemoryStore credential backend is a singleton registered via init().
Earlier tests that create anonymous identities pollute the shared store,
causing LookupAnonymous() to unexpectedly return true.

Fix by calling Reset() on the memory store before the test runs.

* style: run gofmt on changed files

* fix: restore KMS functions used by integration tests

* fix(plugin): prevent panic on send to closed worker session channel

The Plugin.sendToWorker method could panic with "send on closed channel"
when a worker disconnected while a message was being sent. The race was
between streamSession.close() closing the outgoing channel and sendToWorker
writing to it concurrently.

Add a done channel to streamSession that is closed before the outgoing
channel, and check it in sendToWorker's select to safely detect closed
sessions without panicking.
2026-04-03 16:04:27 -07:00
Chris Lu
d074830016 fix(worker): pass compaction revision and file sizes in EC volume copy (#8835)
* fix(worker): pass compaction revision and file sizes in EC volume copy

The worker EC task was sending CopyFile requests without the current
compaction revision (defaulting to 0) and with StopOffset set to
math.MaxInt64.  After a vacuum compaction this caused the volume server
to reject the copy or return stale data.

Read the volume file status first and forward the compaction revision
and actual file sizes so the copy is consistent with the compacted
volume.

* propagate erasure coding task context

* fix(worker): validate volume file status and detect short copies

Reject zero dat file size from ReadVolumeFileStatus — a zero-sized
snapshot would produce 0-byte copies and broken EC shards.

After streaming, verify totalBytes matches the expected stopOffset
and return an error on short copies instead of logging success.

* fix(worker): reject zero idx file size in volume status validation

A non-empty dat with zero idx indicates an empty or corrupt volume.
Without this guard, copyFileFromSource gets stopOffset=0, produces a
0-byte .idx, passes the short-copy check, and generateEcShardsLocally
runs against a volume with no index.

* fix fake plugin volume file status

* fix plugin volume balance test fixtures
2026-03-29 18:47:15 -07:00
Lars Lehtonen
9cc26d09e8 chore:(weed/worker/tasks/erasure_coding): Prune Unused and Untested Functions (#8761)
* chore(weed/worker/tasks/erasure_coding): prune unused findVolumeReplicas()

* chore(weed/worker/tasks/erasure_coding): prune unused isDiskSuitableForEC()

* chore(weed/worker/tasks/erasure_coding): prune unused selectBestECDestinations()

* chore(weed/worker/tasks/erasure_coding): prune unused candidatesToDiskInfos()
2026-03-24 10:10:28 -07:00
Chris Lu
55bce53953 reduce logs 2026-03-09 12:14:25 -07:00
Chris Lu
f5c35240be Add volume dir tags and EC placement priority (#8472)
* Add volume dir tags to topology

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add preferred tag config for EC

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Prioritize EC destinations by tags

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add EC placement planner tag tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Refactor EC placement tests to reuse buildActiveTopology

Remove buildActiveTopologyWithDiskTags helper function and consolidate
tag setup inline in test cases. Tests now use UpdateTopology to apply
tags after topology creation, reusing the existing buildActiveTopology
function rather than duplicating its logic.

All tag scenario tests pass:
- TestECPlacementPlannerPrefersTaggedDisks
- TestECPlacementPlannerFallsBackWhenTagsInsufficient

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Consolidate normalizeTagList into shared util package

Extract normalizeTagList from three locations (volume.go,
detection.go, erasure_coding_handler.go) into new weed/util/tag.go
as exported NormalizeTagList function. Replace all duplicate
implementations with imports and calls to util.NormalizeTagList.

This improves code reuse and maintainability by centralizing
tag normalization logic.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add PreferredTags to EC config persistence

Add preferred_tags field to ErasureCodingTaskConfig protobuf with field
number 5. Update GetConfigSpec to include preferred_tags field in the
UI configuration schema. Add PreferredTags to ToTaskPolicy to serialize
config to protobuf. Add PreferredTags to FromTaskPolicy to deserialize
from protobuf with defensive copy to prevent external mutation.

This allows EC preferred tags to be persisted and restored across
worker restarts.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add defensive copy for Tags slice in DiskLocation

Copy the incoming tags slice in NewDiskLocation instead of storing
by reference. This prevents external callers from mutating the
DiskLocation.Tags slice after construction, improving encapsulation
and preventing unexpected changes to disk metadata.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add doc comment to buildCandidateSets method

Document the tiered candidate selection and fallback behavior. Explain
that for a planner with preferredTags, it accumulates disks matching
each tag in order into progressively larger tiers, emits a candidate
set once a tier reaches shardsNeeded, and finally falls back to the
full candidates set if preferred-tag tiers are insufficient.

This clarifies the intended semantics for future maintainers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Apply final PR review fixes

1. Update parseVolumeTags to replicate single tag entry to all folders
   instead of leaving some folders with nil tags. This prevents nil
   pointer dereferences when processing folders without explicit tags.

2. Add defensive copy in ToTaskPolicy for PreferredTags slice to match
   the pattern used in FromTaskPolicy, preventing external mutation of
   the returned TaskPolicy.

3. Add clarifying comment in buildCandidateSets explaining that the
   shardsNeeded <= 0 branch is a defensive check for direct callers,
   since selectDestinations guarantees shardsNeeded > 0.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix nil pointer dereference in parseVolumeTags

Ensure all folder tags are initialized to either normalized tags or
empty slices, not nil. When multiple tag entries are provided and there
are more folders than entries, remaining folders now get empty slices
instead of nil, preventing nil pointer dereference in downstream code.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix NormalizeTagList to return empty slice instead of nil

Change NormalizeTagList to always return a non-nil slice. When all tags
are empty or whitespace after normalization, return an empty slice
instead of nil. This prevents nil pointer dereferences in downstream
code that expects a valid (possibly empty) slice.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add nil safety check for v.tags pointer

Add a safety check to handle the case where v.tags might be nil,
preventing a nil pointer dereference. If v.tags is nil, use an empty
string instead. This is defensive programming to prevent panics in
edge cases.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add volume.tags flag to weed server and weed mini commands

Add the volume.tags CLI option to both the 'weed server' and 'weed mini'
commands. This allows users to specify disk tags when running the
combined server modes, just like they can with 'weed volume'.

The flag uses the same format and description as the volume command:
comma-separated tag groups per data dir with ':' separators
(e.g. fast:ssd,archive).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-01 10:22:00 -08:00
Chris Lu
7354fa87f1 refactor ec shard distribution (#8465)
* refactor ec shard distribution

* fix shard assignment merge and mount errors

* fix mount error aggregation scope

* make WithFields compatible and wrap errors
2026-02-27 17:21:13 -08:00
Chris Lu
4f647e1036 Worker set its working directory (#8461)
* set working directory

* consolidate to worker directory

* working directory

* correct directory name

* refactoring to use wildcard matcher

* simplify

* cleaning ec working directory

* fix reference

* clean

* adjust test
2026-02-27 12:22:21 -08:00
Chris Lu
453310b057 Add plugin worker integration tests for erasure coding (#8450)
* test: add plugin worker integration harness

* test: add erasure coding detection integration tests

* test: add erasure coding execution integration tests

* ci: add plugin worker integration workflow

* test: extend fake volume server for vacuum and balance

* test: expand erasure coding detection topologies

* test: add large erasure coding detection topology

* test: add vacuum plugin worker integration tests

* test: add volume balance plugin worker integration tests

* ci: run plugin worker tests per worker

* fixes

* erasure coding: stop after placement failures

* erasure coding: record hasMore when early stopping

* erasure coding: relax large topology expectations
2026-02-25 22:11:41 -08:00
Chris Lu
d2b92938ee Make EC detection context aware (#8449)
* Make EC detection context aware

* Update register.go

* Speed up EC detection planning

* Add tests for EC detection planner

* optimizations

detection.go: extracted ParseCollectionFilter (exported) and feed it into the detection loop so both detection and tracing share the same parsing/whitelisting logic; the detection loop now iterates on a sorted list of volume IDs, checks the context at every iteration, and only sets hasMore when there are still unprocessed groups after hitting maxResults, keeping runtime bounded while still scheduling planned tasks before returning the results.
erasure_coding_handler.go: dropped the duplicated inline filter parsing in emitErasureCodingDetectionDecisionTrace and now reuse erasurecodingtask.ParseCollectionFilter, and the summary suffix logic now only accounts for the hasMore case that can actually happen.
detection_test.go: updated the helper topology builder to use master_pb.VolumeInformationMessage (matching the current protobuf types) and tightened the cancellation/max-results tests so they reliably exercise the detection logic (cancel before calling Detection, and provide enough disks so one result is produced before the limit).

* use working directory

* fix compilation

* fix compilation

* rename

* go vet

* fix getenv

* address comments, fix error
2026-02-25 18:02:35 -08:00
Аlexey Medvedev
6a3a97333f Add support for TLS in gRPC communication between worker and volume server (#8370)
* Add support for TLS in gRPC communication between worker and volume server

* address comments

* worker: capture shared grpc.DialOption in BalanceTask registration closure

* worker: capture shared grpc.DialOption in ErasureCodingTask registration closure

* worker: capture shared grpc.DialOption in VacuumTask registration closure

* worker: use grpc.worker security configuration section for tasks

* plugin/worker: fix compilation errors by passing grpc.DialOption to task constructors

* plugin/worker: prevent double-counting in EC skip counters

---------

Co-authored-by: Chris Lu <chris.lu@gmail.com>
2026-02-18 15:39:53 -08:00
Chris Lu
72a8f598f2 Fix Maintenance Task Sorting and Refactor Log Persistence (#8199)
* fix float stepping

* do not auto refresh

* only logs when non 200 status

* fix maintenance task sorting and cleanup redundant handler logic

* Refactor log retrieval to persist to disk and fix slowness

- Move log retrieval to disk-based persistence in GetMaintenanceTaskDetail
- Implement background log fetching on task completion in worker_grpc_server.go
- Implement async background refresh for in-progress tasks
- Completely remove blocking gRPC calls from the UI path to fix 10s timeouts
- Cleanup debug logs and performance profiling code

* Ensure consistent deterministic sorting in config_persistence cleanup

* Replace magic numbers with constants and remove debug logs

- Added descriptive constants for truncation limits and timeouts in admin_server.go and worker_grpc_server.go
- Replaced magic numbers with these constants throughout the codebase
- Verified removal of stdout debug printing
- Ensured consistent truncation logic during log persistence

* Address code review feedback on history truncation and logging logic

- Fix AssignmentHistory double-serialization by copying task in GetMaintenanceTaskDetail
- Fix handleTaskCompletion logging logic (mutually exclusive success/failure logs)
- Remove unused Timeout field from LogRequestContext and sync select timeouts with constants
- Ensure AssignmentHistory is only provided in the top-level field for better JSON structure

* Implement goroutine leak protection and request deduplication

- Add request deduplication in RequestTaskLogs to prevent multiple concurrent fetches for the same task
- Implement safe cleanup in timeout handlers to avoid race conditions in pendingLogRequests map
- Add a 10s cooldown for background log refreshes in GetMaintenanceTaskDetail to prevent spamming
- Ensure all persistent log-fetching goroutines are bounded and efficiently managed

* Fix potential nil pointer panics in maintenance handlers

- Add nil checks for adminServer in ShowTaskDetail, ShowMaintenanceWorkers, and UpdateTaskConfig
- Update getMaintenanceQueueData to return a descriptive error instead of nil when adminServer is uninitialized
- Ensure internal helper methods consistently check for adminServer initialization before use

* Strictly enforce disk-only log reading

- Remove background log fetching from GetMaintenanceTaskDetail to prevent timeouts and network calls during page view
- Remove unused lastLogFetch tracking fields to clean up dead code
- Ensure logs are only updated upon task completion via handleTaskCompletion

* Refactor GetWorkerLogs to read from disk

- Update /api/maintenance/workers/:id/logs endpoint to use configPersistence.LoadTaskExecutionLogs
- Remove synchronous gRPC call RequestTaskLogs to prevent timeouts and bad gateway errors
- Ensure consistent log retrieval behavior across the application (disk-only)

* Fix timestamp parsing in log viewer

- Update task_detail.templ JS to handle both ISO 8601 strings and Unix timestamps
- Fix "Invalid time value" error when displaying logs fetched from disk
- Regenerate templates

* master: fallback to HDD if SSD volumes are full in Assign

* worker: improve EC detection logging and fix skip counters

* worker: add Sync method to TaskLogger interface

* worker: implement Sync and ensure logs are flushed before task completion

* admin: improve task log retrieval with retries and better timeouts

* admin: robust timestamp parsing in task detail view
2026-02-04 08:48:55 -08:00
Chris Lu
13dcf445a4 Fix maintenance worker panic and add EC integration tests (#8068)
* Fix nil pointer panic in maintenance worker when receiving empty task assignment

When a worker requests a task and none are available, the admin server
sends an empty TaskAssignment message. The worker was attempting to log
the task details without checking if the TaskId was empty, causing a
nil pointer dereference when accessing taskAssign.Params.VolumeId.

This fix adds a check for empty TaskId before processing the assignment,
preventing worker crashes and improving stability in production environments.

* Add EC integration test for admin-worker maintenance system

Adds comprehensive integration test that verifies the end-to-end flow
of erasure coding maintenance tasks:
- Admin server detects volumes needing EC encoding
- Workers register and receive task assignments
- EC encoding is executed and verified in master topology
- File read-back validation confirms data integrity

The test uses unique absolute working directories for each worker to
prevent ID conflicts and ensure stable worker registration. Includes
proper cleanup and process management for reliable test execution.

* Improve maintenance system stability and task deduplication

- Add cross-type task deduplication to prevent concurrent maintenance
  operations on the same volume (EC, balance, vacuum)
- Implement HasAnyTask check in ActiveTopology for better coordination
- Increase RequestTask timeout from 5s to 30s to prevent unnecessary
  worker reconnections
- Add TaskTypeNone sentinel for generic task checks
- Update all task detectors to use HasAnyTask for conflict prevention
- Improve config persistence and schema handling

* Add GitHub Actions workflow for EC integration tests

Adds CI workflow that runs EC integration tests on push and pull requests
to master branch. The workflow:
- Triggers on changes to admin, worker, or test files
- Builds the weed binary
- Runs the EC integration test suite
- Uploads test logs as artifacts on failure for debugging

This ensures the maintenance system remains stable and worker-admin
integration is validated in CI.

* go version 1.24

* address comments

* Update maintenance_integration.go

* support seconds

* ec prioritize over balancing in tests
2026-01-20 15:07:43 -08:00
Chris Lu
e10f11b480 opt: reduce ShardsInfo memory usage with bitmap and sorted slice (#7974)
* opt: reduce ShardsInfo memory usage with bitmap and sorted slice

- Replace map[ShardId]*ShardInfo with sorted []ShardInfo slice
- Add ShardBits (uint32) bitmap for O(1) existence checks
- Use binary search for O(log n) lookups by shard ID
- Maintain sorted order for efficient iteration
- Add comprehensive unit tests and benchmarks

Memory savings:
- Map overhead: ~48 bytes per entry eliminated
- Pointers: 8 bytes per entry eliminated
- Total: ~56 bytes per shard saved

Performance improvements:
- Has(): O(1) using bitmap
- Size(): O(log n) using binary search (was O(1), acceptable tradeoff)
- Count(): O(1) using popcount on bitmap
- Iteration: Faster due to cache locality

* refactor: add methods to ShardBits type

- Add Has(), Set(), Clear(), and Count() methods to ShardBits
- Simplify ShardsInfo methods by using ShardBits methods
- Improves code readability and encapsulation

* opt: use ShardBits directly in ShardsCountFromVolumeEcShardInformationMessage

Avoid creating a full ShardsInfo object just to count shards.
Directly cast vi.EcIndexBits to ShardBits and use Count() method.

* opt: use strings.Builder in ShardsInfo.String() for efficiency

* refactor: change AsSlice to return []ShardInfo (values instead of pointers)

This completes the memory optimization by avoiding unnecessary pointer slices and potential allocations.

* refactor: rename ShardsCountFromVolumeEcShardInformationMessage to GetShardCount

* fix: prevent deadlock in Add and Subtract methods

Copy shards data from 'other' before releasing its lock to avoid
potential deadlock when a.Add(b) and b.Add(a) are called concurrently.

The previous implementation held other's lock while calling si.Set/Delete,
which acquires si's lock. This could deadlock if two goroutines tried to
add/subtract each other concurrently.

* opt: avoid unnecessary locking in constructor functions

ShardsInfoFromVolume and ShardsInfoFromVolumeEcShardInformationMessage
now build shards slice and bitmap directly without calling Set(), which
acquires a lock on every call. Since the object is local and not yet
shared, locking is unnecessary and adds overhead.

This improves performance during object construction.

* fix: rename 'copy' variable to avoid shadowing built-in function

The variable name 'copy' in TestShardsInfo_Copy shadowed the built-in
copy() function, which is confusing and bad practice. Renamed to 'siCopy'.

* opt: use math/bits.OnesCount32 and reorganize types

1. Replace manual popcount loop with math/bits.OnesCount32 for better
   performance and idiomatic Go code
2. Move ShardSize type definition to ec_shards_info.go for better code
   organization since it's primarily used there

* refactor: Set() now accepts ShardInfo for future extensibility

Changed Set(id ShardId, size ShardSize) to Set(shard ShardInfo) to
support future additions to ShardInfo without changing the API.

This makes the code more extensible as new fields can be added to
ShardInfo (e.g., checksum, location, etc.) without breaking the Set API.

* refactor: move ShardInfo and ShardSize to separate file

Created ec_shard_info.go to hold the basic shard types (ShardInfo and
ShardSize) for better code organization and separation of concerns.

* refactor: add ShardInfo constructor and helper functions

Added NewShardInfo() constructor and IsValid() method to better
encapsulate ShardInfo creation and validation. Updated code to use
the constructor for cleaner, more maintainable code.

* fix: update remaining Set() calls to use NewShardInfo constructor

Fixed compilation errors in storage and shell packages where Set() calls
were not updated to use the new NewShardInfo() constructor.

* fix: remove unreachable code in filer backup commands

Removed unreachable return statements after infinite loops in
filer_backup.go and filer_meta_backup.go to fix compilation errors.

* fix: rename 'new' variable to avoid shadowing built-in

Renamed 'new' to 'result' in MinusParityShards, Plus, and Minus methods
to avoid shadowing Go's built-in new() function.

* fix: update remaining test files to use NewShardInfo constructor

Fixed Set() calls in command_volume_list_test.go and
ec_rebalance_slots_test.go to use NewShardInfo() constructor.
2026-01-06 00:09:52 -08:00
Lisandro Pin
6b98b52acc Fix reporting of EC shard sizes from nodes to masters. (#7835)
SeaweedFS tracks EC shard sizes on topology data stuctures, but this information is never
relayed to master servers :( The end result is that commands reporting disk usage, such
as `volume.list` and `cluster.status`, yield incorrect figures when EC shards are present.

As an example for a simple 5-node test cluster, before...

```
> volume.list
Topology volumeSizeLimit:30000 MB hdd(volume:6/40 active:6 free:33 remote:0)
  DataCenter DefaultDataCenter hdd(volume:6/40 active:6 free:33 remote:0)
    Rack DefaultRack hdd(volume:6/40 active:6 free:33 remote:0)
      DataNode 192.168.10.111:9001 hdd(volume:1/8 active:1 free:7 remote:0)
        Disk hdd(volume:1/8 active:1 free:7 remote:0) id:0
          volume id:3  size:88967096  file_count:172  replica_placement:2  version:3  modified_at_second:1766349617
          ec volume id:1 collection: shards:[1 5]
        Disk hdd total size:88967096 file_count:172
      DataNode 192.168.10.111:9001 total size:88967096 file_count:172
  DataCenter DefaultDataCenter hdd(volume:6/40 active:6 free:33 remote:0)
    Rack DefaultRack hdd(volume:6/40 active:6 free:33 remote:0)
      DataNode 192.168.10.111:9002 hdd(volume:2/8 active:2 free:6 remote:0)
        Disk hdd(volume:2/8 active:2 free:6 remote:0) id:0
          volume id:2  size:77267536  file_count:166  replica_placement:2  version:3  modified_at_second:1766349617
          volume id:3  size:88967096  file_count:172  replica_placement:2  version:3  modified_at_second:1766349617
          ec volume id:1 collection: shards:[0 4]
        Disk hdd total size:166234632 file_count:338
      DataNode 192.168.10.111:9002 total size:166234632 file_count:338
  DataCenter DefaultDataCenter hdd(volume:6/40 active:6 free:33 remote:0)
    Rack DefaultRack hdd(volume:6/40 active:6 free:33 remote:0)
      DataNode 192.168.10.111:9003 hdd(volume:1/8 active:1 free:7 remote:0)
        Disk hdd(volume:1/8 active:1 free:7 remote:0) id:0
          volume id:2  size:77267536  file_count:166  replica_placement:2  version:3  modified_at_second:1766349617
          ec volume id:1 collection: shards:[2 6]
        Disk hdd total size:77267536 file_count:166
      DataNode 192.168.10.111:9003 total size:77267536 file_count:166
  DataCenter DefaultDataCenter hdd(volume:6/40 active:6 free:33 remote:0)
    Rack DefaultRack hdd(volume:6/40 active:6 free:33 remote:0)
      DataNode 192.168.10.111:9004 hdd(volume:2/8 active:2 free:6 remote:0)
        Disk hdd(volume:2/8 active:2 free:6 remote:0) id:0
          volume id:2  size:77267536  file_count:166  replica_placement:2  version:3  modified_at_second:1766349617
          volume id:3  size:88967096  file_count:172  replica_placement:2  version:3  modified_at_second:1766349617
          ec volume id:1 collection: shards:[3 7]
        Disk hdd total size:166234632 file_count:338
      DataNode 192.168.10.111:9004 total size:166234632 file_count:338
  DataCenter DefaultDataCenter hdd(volume:6/40 active:6 free:33 remote:0)
    Rack DefaultRack hdd(volume:6/40 active:6 free:33 remote:0)
      DataNode 192.168.10.111:9005 hdd(volume:0/8 active:0 free:8 remote:0)
        Disk hdd(volume:0/8 active:0 free:8 remote:0) id:0
          ec volume id:1 collection: shards:[8 9 10 11 12 13]
        Disk hdd total size:0 file_count:0
    Rack DefaultRack total size:498703896 file_count:1014
  DataCenter DefaultDataCenter total size:498703896 file_count:1014
total size:498703896 file_count:1014
```

...and after:

```
> volume.list
Topology volumeSizeLimit:30000 MB hdd(volume:6/40 active:6 free:33 remote:0)
  DataCenter DefaultDataCenter hdd(volume:6/40 active:6 free:33 remote:0)
    Rack DefaultRack hdd(volume:6/40 active:6 free:33 remote:0)
      DataNode 192.168.10.111:9001 hdd(volume:1/8 active:1 free:7 remote:0)
        Disk hdd(volume:1/8 active:1 free:7 remote:0) id:0
          volume id:2  size:81761800  file_count:161  replica_placement:2  version:3  modified_at_second:1766349495
          ec volume id:1 collection: shards:[1 5 9] sizes:[1:8.00 MiB 5:8.00 MiB 9:8.00 MiB] total:24.00 MiB
        Disk hdd total size:81761800 file_count:161
      DataNode 192.168.10.111:9001 total size:81761800 file_count:161
  DataCenter DefaultDataCenter hdd(volume:6/40 active:6 free:33 remote:0)
    Rack DefaultRack hdd(volume:6/40 active:6 free:33 remote:0)
      DataNode 192.168.10.111:9002 hdd(volume:1/8 active:1 free:7 remote:0)
        Disk hdd(volume:1/8 active:1 free:7 remote:0) id:0
          volume id:3  size:88678712  file_count:170  replica_placement:2  version:3  modified_at_second:1766349495
          ec volume id:1 collection: shards:[11 12 13] sizes:[11:8.00 MiB 12:8.00 MiB 13:8.00 MiB] total:24.00 MiB
        Disk hdd total size:88678712 file_count:170
      DataNode 192.168.10.111:9002 total size:88678712 file_count:170
  DataCenter DefaultDataCenter hdd(volume:6/40 active:6 free:33 remote:0)
    Rack DefaultRack hdd(volume:6/40 active:6 free:33 remote:0)
      DataNode 192.168.10.111:9003 hdd(volume:2/8 active:2 free:6 remote:0)
        Disk hdd(volume:2/8 active:2 free:6 remote:0) id:0
          volume id:2  size:81761800  file_count:161  replica_placement:2  version:3  modified_at_second:1766349495
          volume id:3  size:88678712  file_count:170  replica_placement:2  version:3  modified_at_second:1766349495
          ec volume id:1 collection: shards:[0 4 8] sizes:[0:8.00 MiB 4:8.00 MiB 8:8.00 MiB] total:24.00 MiB
        Disk hdd total size:170440512 file_count:331
      DataNode 192.168.10.111:9003 total size:170440512 file_count:331
  DataCenter DefaultDataCenter hdd(volume:6/40 active:6 free:33 remote:0)
    Rack DefaultRack hdd(volume:6/40 active:6 free:33 remote:0)
      DataNode 192.168.10.111:9004 hdd(volume:2/8 active:2 free:6 remote:0)
        Disk hdd(volume:2/8 active:2 free:6 remote:0) id:0
          volume id:2  size:81761800  file_count:161  replica_placement:2  version:3  modified_at_second:1766349495
          volume id:3  size:88678712  file_count:170  replica_placement:2  version:3  modified_at_second:1766349495
          ec volume id:1 collection: shards:[2 6 10] sizes:[2:8.00 MiB 6:8.00 MiB 10:8.00 MiB] total:24.00 MiB
        Disk hdd total size:170440512 file_count:331
      DataNode 192.168.10.111:9004 total size:170440512 file_count:331
  DataCenter DefaultDataCenter hdd(volume:6/40 active:6 free:33 remote:0)
    Rack DefaultRack hdd(volume:6/40 active:6 free:33 remote:0)
      DataNode 192.168.10.111:9005 hdd(volume:0/8 active:0 free:8 remote:0)
        Disk hdd(volume:0/8 active:0 free:8 remote:0) id:0
          ec volume id:1 collection: shards:[3 7] sizes:[3:8.00 MiB 7:8.00 MiB] total:16.00 MiB
        Disk hdd total size:0 file_count:0
    Rack DefaultRack total size:511321536 file_count:993
  DataCenter DefaultDataCenter total size:511321536 file_count:993
total size:511321536 file_count:993
```
2025-12-28 19:30:42 -08:00
Chris Lu
c260e6a22e Fix issue #7880: Tasks use Volume IDs instead of ip:port (#7881)
* Fix issue #7880: Tasks use Volume IDs instead of ip:port

When volume servers are registered with custom IDs, tasks were attempting
to connect using the ID instead of the actual ip:port address, causing
connection failures.

Modified task detection logic in balance, erasure coding, and vacuum tasks
to resolve volume server IDs to their actual ip:port addresses using
ActiveTopology information.

* Use server addresses directly instead of translating from IDs

Modified VolumeHealthMetrics to include ServerAddress field populated
directly from topology DataNodeInfo.Address. Updated task detection
logic to use addresses directly without runtime lookups.

Changes:
- Added ServerAddress field to VolumeHealthMetrics
- Updated maintenance scanner to populate ServerAddress
- Modified task detection to use ServerAddress for Node fields
- Updated DestinationPlan to include TargetAddress
- Removed runtime address lookups in favor of direct address usage

* Address PR comments: add ServerAddress field, improve error handling

- Add missing ServerAddress field to VolumeHealthMetrics struct
- Add warning in vacuum detection when server not found in topology
- Improve error handling in erasure coding to abort task if sources missing
- Make vacuum task stricter by skipping if server not found in topology

* Refactor: Extract common address resolution logic into shared utility

- Created weed/worker/tasks/util/address.go with ResolveServerAddress function
- Updated balance, erasure_coding, and vacuum detection to use the shared utility
- Removed code duplication and improved maintainability
- Consistent error handling across all task types

* Fix critical issues in task address resolution

- Vacuum: Require topology availability and fail if server not found (no fallback to ID)
- Ensure all task types consistently fail early when topology is incomplete
- Prevent creation of tasks that would fail due to missing server addresses

* Address additional PR feedback

- Add validation for empty addresses in ResolveServerAddress
- Remove redundant serverAddress variable in vacuum detection
- Improve robustness of address resolution

* Improve error logging in vacuum detection

- Include actual error details in log message for better diagnostics
- Make error messages consistent with other task types
2025-12-25 16:14:05 -08:00
Chris Lu
4f038820dc Add disk-aware EC rebalancing (#7597)
* Add placement package for EC shard placement logic

- Consolidate EC shard placement algorithm for reuse across shell and worker tasks
- Support multi-pass selection: racks, then servers, then disks
- Include proper spread verification and scoring functions
- Comprehensive test coverage for various cluster topologies

* Make ec.balance disk-aware for multi-disk servers

- Add EcDisk struct to track individual disks on volume servers
- Update EcNode to maintain per-disk shard distribution
- Parse disk_id from EC shard information during topology collection
- Implement pickBestDiskOnNode() for selecting best disk per shard
- Add diskDistributionScore() for tie-breaking node selection
- Update all move operations to specify target disk in RPC calls
- Improves shard balance within multi-disk servers, not just across servers

* Use placement package in EC detection for consistent disk-level placement

- Replace custom EC disk selection logic with shared placement package
- Convert topology DiskInfo to placement.DiskCandidate format
- Use SelectDestinations() for multi-rack/server/disk spreading
- Convert placement results back to topology DiskInfo for task creation
- Ensures EC detection uses same placement logic as shell commands

* Make volume server evacuation disk-aware

- Use pickBestDiskOnNode() when selecting evacuation target disk
- Specify target disk in evacuation RPC requests
- Maintains balanced disk distribution during server evacuations

* Rename PlacementConfig to PlacementRequest for clarity

PlacementRequest better reflects that this is a request for placement
rather than a configuration object. This improves API semantics.

* Rename DefaultConfig to DefaultPlacementRequest

Aligns with the PlacementRequest type naming for consistency

* Address review comments from Gemini and CodeRabbit

Fix HIGH issues:
- Fix empty disk discovery: Now discovers all disks from VolumeInfos,
  not just from EC shards. This ensures disks without EC shards are
  still considered for placement.
- Fix EC shard count calculation in detection.go: Now correctly filters
  by DiskId and sums actual shard counts using ShardBits.ShardIdCount()
  instead of just counting EcShardInfo entries.

Fix MEDIUM issues:
- Add disk ID to evacuation log messages for consistency with other logging
- Remove unused serverToDisks variable in placement.go
- Fix comment that incorrectly said 'ascending' when sorting is 'descending'

* add ec tests

* Update ec-integration-tests.yml

* Update ec_integration_test.go

* Fix EC integration tests CI: build weed binary and update actions

- Add 'Build weed binary' step before running tests
- Update actions/setup-go from v4 to v6 (Node20 compatibility)
- Update actions/checkout from v2 to v4 (Node20 compatibility)
- Move working-directory to test step only

* Add disk-aware EC rebalancing integration tests

- Add TestDiskAwareECRebalancing test with multi-disk cluster setup
- Test EC encode with disk awareness (shows disk ID in output)
- Test EC balance with disk-level shard distribution
- Add helper functions for disk-level verification:
  - startMultiDiskCluster: 3 servers x 4 disks each
  - countShardsPerDisk: track shards per disk per server
  - calculateDiskShardVariance: measure distribution balance
- Verify no single disk is overloaded with shards
2025-12-02 12:30:15 -08:00
Chris Lu
208d7f24f4 Erasure Coding: Ec refactoring (#7396)
* refactor: add ECContext structure to encapsulate EC parameters

- Create ec_context.go with ECContext struct
- NewDefaultECContext() creates context with default 10+4 configuration
- Helper methods: CreateEncoder(), ToExt(), String()
- Foundation for cleaner function signatures
- No behavior change, still uses hardcoded 10+4

* refactor: update ec_encoder.go to use ECContext

- Add WriteEcFilesWithContext() and RebuildEcFilesWithContext() functions
- Keep old functions for backward compatibility (call new versions)
- Update all internal functions to accept ECContext parameter
- Use ctx.DataShards, ctx.ParityShards, ctx.TotalShards consistently
- Use ctx.CreateEncoder() instead of hardcoded reedsolomon.New()
- Use ctx.ToExt() for shard file extensions
- No behavior change, still uses default 10+4 configuration

* refactor: update ec_volume.go to use ECContext

- Add ECContext field to EcVolume struct
- Initialize ECContext with default configuration in NewEcVolume()
- Update LocateEcShardNeedleInterval() to use ECContext.DataShards
- Phase 1: Always uses default 10+4 configuration
- No behavior change

* refactor: add EC shard count fields to VolumeInfo protobuf

- Add data_shards_count field (field 8) to VolumeInfo message
- Add parity_shards_count field (field 9) to VolumeInfo message
- Fields are optional, 0 means use default (10+4)
- Backward compatible: fields added at end
- Phase 1: Foundation for future customization

* refactor: regenerate protobuf Go files with EC shard count fields

- Regenerated volume_server_pb/*.go with new EC fields
- DataShardsCount and ParityShardsCount accessors added to VolumeInfo
- No behavior change, fields not yet used

* refactor: update VolumeEcShardsGenerate to use ECContext

- Create ECContext with default configuration in VolumeEcShardsGenerate
- Use ecCtx.TotalShards and ecCtx.ToExt() in cleanup
- Call WriteEcFilesWithContext() instead of WriteEcFiles()
- Save EC configuration (DataShardsCount, ParityShardsCount) to VolumeInfo
- Log EC context being used
- Phase 1: Always uses default 10+4 configuration
- No behavior change

* fmt

* refactor: update ec_test.go to use ECContext

- Update TestEncodingDecoding to create and use ECContext
- Update validateFiles() to accept ECContext parameter
- Update removeGeneratedFiles() to use ctx.TotalShards and ctx.ToExt()
- Test passes with default 10+4 configuration

* refactor: use EcShardConfig message instead of separate fields

* optimize: pre-calculate row sizes in EC encoding loop

* refactor: replace TotalShards field with Total() method

- Remove TotalShards field from ECContext to avoid field drift
- Add Total() method that computes DataShards + ParityShards
- Update all references to use ctx.Total() instead of ctx.TotalShards
- Read EC config from VolumeInfo when loading EC volumes
- Read data shard count from .vif in VolumeEcShardsToVolume
- Use >= instead of > for exact boundary handling in encoding loops

* optimize: simplify VolumeEcShardsToVolume to use existing EC context

- Remove redundant CollectEcShards call
- Remove redundant .vif file loading
- Use v.ECContext.DataShards directly (already loaded by NewEcVolume)
- Slice tempShards instead of collecting again

* refactor: rename MaxShardId to MaxShardCount for clarity

- Change from MaxShardId=31 to MaxShardCount=32
- Eliminates confusing +1 arithmetic (MaxShardId+1)
- More intuitive: MaxShardCount directly represents the limit

fix: support custom EC ratios beyond 14 shards in VolumeEcShardsToVolume

- Add MaxShardId constant (31, since ShardBits is uint32)
- Use MaxShardId+1 (32) instead of TotalShardsCount (14) for tempShards buffer
- Prevents panic when slicing for volumes with >14 total shards
- Critical fix for custom EC configurations like 20+10

* fix: add validation for EC shard counts from VolumeInfo

- Validate DataShards/ParityShards are positive and within MaxShardCount
- Prevent zero or invalid values that could cause divide-by-zero
- Fallback to defaults if validation fails, with warning log
- VolumeEcShardsGenerate now preserves existing EC config when regenerating
- Critical safety fix for corrupted or legacy .vif files

* fix: RebuildEcFiles now loads EC config from .vif file

- Critical: RebuildEcFiles was always using default 10+4 config
- Now loads actual EC config from .vif file when rebuilding shards
- Validates config before use (positive shards, within MaxShardCount)
- Falls back to default if .vif missing or invalid
- Prevents data corruption when rebuilding custom EC volumes

* add: defensive validation for dataShards in VolumeEcShardsToVolume

- Validate dataShards > 0 and <= MaxShardCount before use
- Prevents panic from corrupted or uninitialized ECContext
- Returns clear error message instead of panic
- Defense-in-depth: validates even though upstream should catch issues

* fix: replace TotalShardsCount with MaxShardCount for custom EC ratio support

Critical fixes to support custom EC ratios > 14 shards:

disk_location_ec.go:
- validateEcVolume: Check shards 0-31 instead of 0-13 during validation
- removeEcVolumeFiles: Remove shards 0-31 instead of 0-13 during cleanup

ec_volume_info.go ShardBits methods:
- ShardIds(): Iterate up to MaxShardCount (32) instead of TotalShardsCount (14)
- ToUint32Slice(): Iterate up to MaxShardCount (32)
- IndexToShardId(): Iterate up to MaxShardCount (32)
- MinusParityShards(): Remove shards 10-31 instead of 10-13 (added note about Phase 2)
- Minus() shard size copy: Iterate up to MaxShardCount (32)
- resizeShardSizes(): Iterate up to MaxShardCount (32)

Without these changes:
- Custom EC ratios > 14 total shards would fail validation on startup
- Shards 14-31 would never be discovered or cleaned up
- ShardBits operations would miss shards >= 14

These changes are backward compatible - MaxShardCount (32) includes
the default TotalShardsCount (14), so existing 10+4 volumes work as before.

* fix: replace TotalShardsCount with MaxShardCount in critical data structures

Critical fixes for buffer allocations and loops that must support
custom EC ratios up to 32 shards:

Data Structures:
- store_ec.go:354: Buffer allocation for shard recovery (bufs array)
- topology_ec.go:14: EcShardLocations.Locations fixed array size
- command_ec_rebuild.go:268: EC shard map allocation
- command_ec_common.go:626: Shard-to-locations map allocation

Shard Discovery Loops:
- ec_task.go:378: Loop to find generated shard files
- ec_shard_management.go: All 8 loops that check/count EC shards

These changes are critical because:
1. Buffer allocations sized to 14 would cause index-out-of-bounds panics
   when accessing shards 14-31
2. Fixed arrays sized to 14 would truncate shard location data
3. Loops limited to 0-13 would never discover/manage shards 14-31

Note: command_ec_encode.go:208 intentionally NOT changed - it creates
shard IDs to mount after encoding. In Phase 1 we always generate 14
shards, so this remains TotalShardsCount and will be made dynamic in
Phase 2 based on actual EC context.

Without these fixes, custom EC ratios > 14 total shards would cause:
- Runtime panics (array index out of bounds)
- Data loss (shards 14-31 never discovered/tracked)
- Incomplete shard management (missing shards not detected)

* refactor: move MaxShardCount constant to ec_encoder.go

Moved MaxShardCount from ec_volume_info.go to ec_encoder.go to group it
with other shard count constants (DataShardsCount, ParityShardsCount,
TotalShardsCount). This improves code organization and makes it easier
to understand the relationship between these constants.

Location: ec_encoder.go line 22, between TotalShardsCount and MinTotalDisks

* improve: add defensive programming and better error messages for EC

Code review improvements from CodeRabbit:

1. ShardBits Guardrails (ec_volume_info.go):
   - AddShardId, RemoveShardId: Reject shard IDs >= MaxShardCount
   - HasShardId: Return false for out-of-range shard IDs
   - Prevents silent no-ops from bit shifts with invalid IDs

2. Future-Proof Regex (disk_location_ec.go):
   - Updated regex from \.ec[0-9][0-9] to \.ec\d{2,3}
   - Now matches .ec00 through .ec999 (currently .ec00-.ec31 used)
   - Supports future increases to MaxShardCount beyond 99

3. Better Error Messages (volume_grpc_erasure_coding.go):
   - Include valid range (1..32) in dataShards validation error
   - Helps operators quickly identify the problem

4. Validation Before Save (volume_grpc_erasure_coding.go):
   - Validate ECContext (DataShards > 0, ParityShards > 0, Total <= MaxShardCount)
   - Log EC config being saved to .vif for debugging
   - Prevents writing invalid configs to disk

These changes improve robustness and debuggability without changing
core functionality.

* fmt

* fix: critical bugs from code review + clean up comments

Critical bug fixes:
1. command_ec_rebuild.go: Fixed indentation causing compilation error
   - Properly nested if/for blocks in registerEcNode

2. ec_shard_management.go: Fixed isComplete logic incorrectly using MaxShardCount
   - Changed from MaxShardCount (32) back to TotalShardsCount (14)
   - Default 10+4 volumes were being incorrectly reported as incomplete
   - Missing shards 14-31 were being incorrectly reported as missing
   - Fixed in 4 locations: volume completeness checks and getMissingShards

3. ec_volume_info.go: Fixed MinusParityShards removing too many shards
   - Changed from MaxShardCount (32) back to TotalShardsCount (14)
   - Was incorrectly removing shard IDs 10-31 instead of just 10-13

Comment cleanup:
- Removed Phase 1/Phase 2 references (development plan context)
- Replaced with clear statements about default 10+4 configuration
- SeaweedFS repo uses fixed 10+4 EC ratio, no phases needed

Root cause: Over-aggressive replacement of TotalShardsCount with MaxShardCount.
MaxShardCount (32) is the limit for buffer allocations and shard ID loops,
but TotalShardsCount (14) must be used for default EC configuration logic.

* fix: add defensive bounds checks and compute actual shard counts

Critical fixes from code review:

1. topology_ec.go: Add defensive bounds checks to AddShard/DeleteShard
   - Prevent panic when shardId >= MaxShardCount (32)
   - Return false instead of crashing on out-of-range shard IDs

2. command_ec_common.go: Fix doBalanceEcShardsAcrossRacks
   - Was using hardcoded TotalShardsCount (14) for all volumes
   - Now computes actual totalShardsForVolume from rackToShardCount
   - Fixes incorrect rebalancing for volumes with custom EC ratios
   - Example: 5+2=7 shards would incorrectly use 14 as average

These fixes improve robustness and prepare for future custom EC ratios
without changing current behavior for default 10+4 volumes.

Note: MinusParityShards and ec_task.go intentionally NOT changed for
seaweedfs repo - these will be enhanced in seaweed-enterprise repo
where custom EC ratio configuration is added.

* fmt

* style: make MaxShardCount type casting explicit in loops

Improved code clarity by explicitly casting MaxShardCount to the
appropriate type when used in loop comparisons:

- ShardId comparisons: Cast to ShardId(MaxShardCount)
- uint32 comparisons: Cast to uint32(MaxShardCount)

Changed in 5 locations:
- Minus() loop (line 90)
- ShardIds() loop (line 143)
- ToUint32Slice() loop (line 152)
- IndexToShardId() loop (line 219)
- resizeShardSizes() loop (line 248)

This makes the intent explicit and improves type safety readability.
No functional changes - purely a style improvement.
2025-10-27 22:13:31 -07:00
Chris Lu
25bbf4c3d4 Admin UI: Fetch task logs (#7114)
* show task details

* loading tasks

* task UI works

* generic rendering

* rendering the export link

* removing placementConflicts from task parameters

* remove TaskSourceLocation

* remove "Server ID" column

* rendering balance task source

* sources and targets

* fix ec task generation

* move info

* render timeline

* simplified worker id

* simplify

* read task logs from worker

* isValidTaskID

* address comments

* Update weed/worker/tasks/balance/execution.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update weed/worker/tasks/erasure_coding/ec_task.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update weed/worker/tasks/task_log_handler.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix shard ids

* plan distributing shard id

* rendering planned shards in task details

* remove Conflicts

* worker logs correctly

* pass in dc and rack

* task logging

* Update weed/admin/maintenance/maintenance_queue.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* display log details

* logs have fields now

* sort field keys

* fix link

* fix collection filtering

* avoid hard coded ec shard counts

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-09 21:47:29 -07:00
Chris Lu
0ecb466eda Admin: refactoring active topology (#7073)
* refactoring

* add ec shard size

* address comments

* passing task id

There seems to be a disconnect between the pending tasks created in ActiveTopology and the TaskDetectionResult returned by this function. A taskID is generated locally and used to create pending tasks via AddPendingECShardTask, but this taskID is not stored in the TaskDetectionResult or passed along in any way.

This makes it impossible for the worker that eventually executes the task to know which pending task in ActiveTopology it corresponds to. Without the correct taskID, the worker cannot call AssignTask or CompleteTask on the master, breaking the entire task lifecycle and capacity management feature.

A potential solution is to add a TaskID field to TaskDetectionResult and worker_pb.TaskParams, ensuring the ID is propagated from detection to execution.

* 1 source multiple destinations

* task supports multi source and destination

* ec needs to clean up previous shards

* use erasure coding constants

* getPlanningCapacityUnsafe getEffectiveAvailableCapacityUnsafe  should return StorageSlotChange for calculation

* use CanAccommodate to calculate

* remove dead code

* address comments

* fix Mutex Copying in Protobuf Structs

* use constants

* fix estimatedSize

The calculation for estimatedSize only considers source.EstimatedSize and dest.StorageChange, but omits dest.EstimatedSize. The TaskDestination struct has an EstimatedSize field, which seems to be ignored here. This could lead to an incorrect estimation of the total size of data involved in tasks on a disk. The loop should probably also include estimatedSize += dest.EstimatedSize.

* at.assignTaskToDisk(task)

* refactoring

* Update weed/admin/topology/internal.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* fail fast

* fix compilation

* Update weed/worker/tasks/erasure_coding/detection.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* indexes for volume and shard locations

* dedup with ToVolumeSlots

* return an additional boolean to indicate success, or an error

* Update abstract_sql_store.go

* fix

* Update weed/worker/tasks/erasure_coding/detection.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update weed/admin/topology/task_management.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* faster findVolumeDisk

* Update weed/worker/tasks/erasure_coding/detection.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update weed/admin/topology/storage_slot_test.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* refactor

* simplify

* remove unused GetDiskStorageImpact function

* refactor

* add comments

* Update weed/admin/topology/storage_impact.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update weed/admin/topology/storage_slot_test.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update storage_impact.go

* AddPendingTask

The unified AddPendingTask function now serves as the single entry point for all task creation, successfully consolidating the previously separate functions while maintaining full functionality and improving code organization.

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-08-03 01:35:38 -07:00
Chris Lu
9d013ea9b8 Admin UI: include ec shard sizes into volume server info (#7071)
* show ec shards on dashboard, show max in its own column

* master collect shard size info

* master send shard size via VolumeList

* change to more efficient shard sizes slice

* include ec shard sizes into volume server info

* Eliminated Redundant gRPC Calls

* much more efficient

* Efficient Counting: bits.OnesCount32() uses CPU-optimized instructions to count set bits in O(1)

* avoid extra volume list call

* simplify

* preserve existing shard sizes

* avoid hard coded value

* Update weed/storage/erasure_coding/ec_volume_info.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update weed/admin/dash/volume_management.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update ec_volume_info.go

* address comments

* avoid duplicated functions

* Update weed/admin/dash/volume_management.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* simplify

* refactoring

* fix compilation

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-02 02:16:49 -07:00
Chris Lu
0975968e71 admin: Refactor task destination planning (#7063)
* refactor planning into task detection

* refactoring worker tasks

* refactor

* compiles, but only balance task is registered

* compiles, but has nil exception

* avoid nil logger

* add back ec task

* setting ec log directory

* implement balance and vacuum tasks

* EC tasks will no longer fail with "file not found" errors

* Use ReceiveFile API to send locally generated shards

* distributing shard files and ecx,ecj,vif files

* generate .ecx files correctly

* do not mount all possible EC shards (0-13) on every destination

* use constants

* delete all replicas

* rename files

* pass in volume size to tasks
2025-08-01 11:18:32 -07:00
chrislu
f5c53b1bd8 fix reason display 2025-07-30 16:43:14 -07:00
Chris Lu
891a2fb6eb Admin: misc improvements on admin server and workers. EC now works. (#7055)
* initial design

* added simulation as tests

* reorganized the codebase to move the simulation framework and tests into their own dedicated package

* integration test. ec worker task

* remove "enhanced" reference

* start master, volume servers, filer

Current Status
 Master: Healthy and running (port 9333)
 Filer: Healthy and running (port 8888)
 Volume Servers: All 6 servers running (ports 8080-8085)
🔄 Admin/Workers: Will start when dependencies are ready

* generate write load

* tasks are assigned

* admin start wtih grpc port. worker has its own working directory

* Update .gitignore

* working worker and admin. Task detection is not working yet.

* compiles, detection uses volumeSizeLimitMB from master

* compiles

* worker retries connecting to admin

* build and restart

* rendering pending tasks

* skip task ID column

* sticky worker id

* test canScheduleTaskNow

* worker reconnect to admin

* clean up logs

* worker register itself first

* worker can run ec work and report status

but:
1. one volume should not be repeatedly worked on.
2. ec shards needs to be distributed and source data should be deleted.

* move ec task logic

* listing ec shards

* local copy, ec. Need to distribute.

* ec is mostly working now

* distribution of ec shards needs improvement
* need configuration to enable ec

* show ec volumes

* interval field UI component

* rename

* integration test with vauuming

* garbage percentage threshold

* fix warning

* display ec shard sizes

* fix ec volumes list

* Update ui.go

* show default values

* ensure correct default value

* MaintenanceConfig use ConfigField

* use schema defined defaults

* config

* reduce duplication

* refactor to use BaseUIProvider

* each task register its schema

* checkECEncodingCandidate use ecDetector

* use vacuumDetector

* use volumeSizeLimitMB

* remove

remove

* remove unused

* refactor

* use new framework

* remove v2 reference

* refactor

* left menu can scroll now

* The maintenance manager was not being initialized when no data directory was configured for persistent storage.

* saving config

* Update task_config_schema_templ.go

* enable/disable tasks

* protobuf encoded task configurations

* fix system settings

* use ui component

* remove logs

* interface{} Reduction

* reduce interface{}

* reduce interface{}

* avoid from/to map

* reduce interface{}

* refactor

* keep it DRY

* added logging

* debug messages

* debug level

* debug

* show the log caller line

* use configured task policy

* log level

* handle admin heartbeat response

* Update worker.go

* fix EC rack and dc count

* Report task status to admin server

* fix task logging, simplify interface checking, use erasure_coding constants

* factor in empty volume server during task planning

* volume.list adds disk id

* track disk id also

* fix locking scheduled and manual scanning

* add active topology

* simplify task detector

* ec task completed, but shards are not showing up

* implement ec in ec_typed.go

* adjust log level

* dedup

* implementing ec copying shards and only ecx files

* use disk id when distributing ec shards

🎯 Planning: ActiveTopology creates DestinationPlan with specific TargetDisk
📦 Task Creation: maintenance_integration.go creates ECDestination with DiskId
🚀 Task Execution: EC task passes DiskId in VolumeEcShardsCopyRequest
💾 Volume Server: Receives disk_id and stores shards on specific disk (vs.store.Locations[req.DiskId])
📂 File System: EC shards and metadata land in the exact disk directory planned

* Delete original volume from all locations

* clean up existing shard locations

* local encoding and distributing

* Update docker/admin_integration/EC-TESTING-README.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* check volume id range

* simplify

* fix tests

* fix types

* clean up logs and tests

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-07-30 12:38:03 -07:00
Chris Lu
69553e5ba6 convert error fromating to %w everywhere (#6995) 2025-07-16 23:39:27 -07:00
Chris Lu
687a6a6c1d Admin UI: Add policies (#6968)
* add policies to UI, accessing filer directly

* view, edit policies

* add back buttons for "users" page

* remove unused

* fix ui dark mode when modal is closed

* bucket view details button

* fix browser buttons

* filer action button works

* clean up masters page

* fix volume servers action buttons

* fix collections page action button

* fix properties page

* more obvious

* fix directory creation file mode

* Update file_browser_handlers.go

* directory permission
2025-07-12 01:13:11 -07:00
Chris Lu
aa66852304 Admin UI add maintenance menu (#6944)
* add ui for maintenance

* valid config loading. fix workers page.

* refactor

* grpc between admin and workers

* add a long-running bidirectional grpc call between admin and worker
* use the grpc call to heartbeat
* use the grpc call to communicate
* worker can remove the http client
* admin uses http port + 10000 as its default grpc port

* one task one package

* handles connection failures gracefully with exponential backoff

* grpc with insecure tls

* grpc with optional tls

* fix detecting tls

* change time config from nano seconds to seconds

* add tasks with 3 interfaces

* compiles reducing hard coded

* remove a couple of tasks

* remove hard coded references

* reduce hard coded values

* remove hard coded values

* remove hard coded from templ

* refactor maintenance package

* fix import cycle

* simplify

* simplify

* auto register

* auto register factory

* auto register task types

* self register types

* refactor

* simplify

* remove one task

* register ui

* lazy init executor factories

* use registered task types

* DefaultWorkerConfig remove hard coded task types

* remove more hard coded

* implement get maintenance task

* dynamic task configuration

* "System Settings" should only have system level settings

* adjust menu for tasks

* ensure menu not collapsed

* render job configuration well

* use templ for ui of task configuration

* fix ordering

* fix bugs

* saving duration in seconds

* use value and unit for duration

* Delete WORKER_REFACTORING_PLAN.md

* Delete maintenance.json

* Delete custom_worker_example.go

* remove address from workers

* remove old code from ec task

* remove creating collection button

* reconnect with exponential backoff

* worker use security.toml

* start admin server with tls info from security.toml

* fix "weed admin" cli description
2025-07-06 13:57:02 -07:00