Commit Graph

6 Commits

Author SHA1 Message Date
Chris Lu
0566fbd552 EC encode: place shards via ecbalancer.Place + configurable replica placement (#9623)
* Add shared super_block.ResolveReplicaPlacement; use it in ec_balance

* Add ecbalancer.FromActiveTopology snapshot constructor for EC encode/repair

* Add ecbalancer.Place greenfield/repair placement core (strict + durability-first)

* topology: add GetEffectiveAvailableEcShardSlots; FromActiveTopology uses shard-granular free slots

GetDisksWithEffectiveCapacity flattens reserved shard slots into volume slots via
integer truncation, so an in-flight EC task reserving a non-multiple-of-
DataShardsCount number of shards was lost from the snapshot and freeSlots was
over-reported. GetEffectiveAvailableEcShardSlots subtracts the full reservation
impact at shard granularity.

* ecbalancer.Place: reject nodes without a free disk of the requested type

FromActiveTopology keeps all disk types in the snapshot, so an SSD-only request
could be routed to a node with only HDD capacity (pickBestDiskOnNode then returns
disk 0 on the wrong tier). Filter rack/node selection to those with a free disk
of the requested type.

* ecbalancer.Place: enforce ReplicaPlacement DiffDataCenterCount (per-DC shard cap)

* ecbalancer: enforce DiffDataCenterCount in balance (cross-DC phase + cross-rack DC cap)

Adds a cross-DC corrective phase that drains data centers holding more than
DiffDataCenterCount shards of a volume, and a per-DC cap on cross-rack move
targets. Both are no-ops when DiffDataCenterCount is unset, so balance output is
unchanged for non-DC placements.

* topology: ratio-aware EC shard slots and provisional empty-disk slot

GetEffectiveAvailableEcShardSlots now takes the target collection's data-shard
count, so a 4+2 volume's larger shards are not over-counted at 10 per volume slot;
and it keeps the one provisional slot for freshly started empty servers that
report max=0, matching getEffectiveAvailableCapacityUnsafe. FromActiveTopology
threads the ratio through.

* ecbalancer.Place: explicit disk-type filter signal (fix HDD vs any ambiguity)

HardDriveType normalizes to "", which collided with "" meaning any disk. Add
Constraints.FilterDiskType and normalize both sides so a hdd request matches disks
reported as "" and never leaks to SSD, while filter=false still means any.

* ecbalancer: add clearShardAccounting for repair snapshot reconciliation

Clears one disk's copy of a shard from per-domain accounting and recomputes the
node-level union (preserving a kept copy on another disk of the same node), without
crediting capacity. Repair uses it to drop to-be-deleted copies before placing
missing shards.

* ecbalancer: don't cap cross-DC target racks when DiffRackCount is unset

len(racks)+1 wrongly limited each target rack (3 in a 2-rack cluster), so draining
a DC could stop short of the DiffDataCenterCount cap. Use MaxShardCount+1 as the
effectively-unlimited default.

* topology/ecbalancer: ratio-correct EC capacity accounting

Reservation shard slots (default ShardsPerVolumeSlot units) are now converted to
the target ratio before subtracting, and existing EC shards are charged by size
(targetDataShards/shardDataShards) so a 2+1 shard isn't counted as one 10+4 slot.
Per-shard ratio lookup is behind shardDataShards (OSS uses the standard ratio).

* ecbalancer.Place: candidate tiering and eligible-rack caps

Adds a per-disk eligibility/preference abstraction so Place supports:
- preferred-tag whole-plan retry (try disks carrying the earliest tags first,
  widen to all only if a tier cannot place every shard; reports
  SpilledOutsidePreferredTags),
- soft disk-type spill via DiskTypePolicy (Any/Prefer/Require): Prefer fills the
  preferred type then spills, reporting SpilledToOtherDiskType; Require filters,
- even per-rack caps that divide by racks holding an eligible disk, so a tiered
  cluster (e.g. SSDs in 2 of 4 racks) isn't capped impossibly low.
Disk tags carried via Node.AddDiskTags + FromActiveTopology.

* ecbalancer: export ClearShardAccounting for repair snapshot reconciliation

* ecbalancer: address review feedback (ratio rounding, bitmap walk, same-DC moves)

- topology/ecbalancer: round shard-reservation and existing-shard footprint up
  when converting to target-ratio shard slots, so a sub-slot reservation is not
  truncated to zero and free capacity is not overstated for low-data-shard
  layouts (targetDataShards < ds).
- erasure_coding: add ShardBits.All iterator and use it across the balancer,
  cross-DC phase, and placement scoring instead of scanning 0..MaxShardCount and
  probing Has on every id.
- ecbalancer: allow same-DC cross-rack moves when a DC already sits at its
  DiffDataCenterCount cap; a same-DC move leaves the DC total unchanged. Add a
  regression test that fails without the guard.
- ecbalancer cross-DC phase: pick targets via the eligible-aware
  pickNodeInRackEligible/pickBestDiskEligible helpers so the disk-type filter is
  honored and a 0 disk id is not mistaken for a valid selection.

* ecbalancer: test ecShardSlotsOnDisk fractional round-up

Cover the mixed-ratio path (targetDataShards < existing data shards) so a
shard's fractional footprint is never floored to zero and free capacity is not
overstated. Exercises the round-up via the targetDataShards parameter; OSS uses
the standard ratio at runtime while the enterprise build hits it with real
per-volume ratios.

* ecbalancer: assert node B rack in TestFromActiveTopology

* ecbalancer: split Destination into separate DataCenter and bare Rack

Replace the composite "dc:rack" Rack field on Destination with separate
DataCenter and bare Rack values, matching topology.DiskInfo and the worker-task
convention. Callers (and tests) read the data center directly instead of parsing
the composite with strings.SplitN.

* shell ec.balance: use utilization-based global balancing (parity with worker)

The shell's global rebalance phase balanced by raw shard count; switch it to
fractional fullness (shards/capacity), as the worker already does. On uniform
capacity the two agree; on heterogeneous capacity it fills nodes proportionally
instead of driving small-capacity nodes toward full.

Updates the heterogeneous-capacity regression test to assert even fullness
(~equal shards/capacity per node) rather than even shard count.

* ecbalancer: bounded-proportional per-DC shard spread

DiffDataCenterCount was enforced only as a ceiling (drain-to-cap), which could
leave a within-cap-but-lopsided DC distribution under a loose cap (e.g. 10/4 of 14
with cap=10). Now the cross-DC phase, the cross-rack DC guard, and Place all target
boundedMaxPerDC = min(DiffDataCenterCount, max(ceil(total/numDCs), parityShards)):
shards spread proportionally across DCs, but no tighter than the durability floor
(once each DC holds <= parityShards a DC loss is recoverable, so further spreading
only adds cross-DC/WAN traffic). No-op when DiffDataCenterCount is 0; identical to
before when the cap is the binding constraint.

* ecbalancer: drop DiffDataCenterCount enforcement for EC placement

The 1-byte volume ReplicaPlacement packs xyz into x*100+y*10+z<=255, so the DC
digit can only be 0-2 -- far too small to be a meaningful per-DC EC shard cap (a
cap of 1-2 would demand 7-14 DCs for a 10+4 volume). It's volume replica-placement,
not an EC spec. Removes the cross-DC balance phase, the DC guard in the cross-rack
phase, and the per-DC cap in Place (and the just-added bounded-proportional logic);
EC relies on the RP-independent rack/node even spread instead. Rack/node caps
(DiffRackCount/SameRackCount) are unchanged. Per-domain EC caps are left for a real
EC placement spec.

* ecbalancer: enforce per-disk durability cap; symmetric reserve/release

Place now refuses to put more than parityShards shards of a volume on a single
disk (pickBestDiskEligible skips a disk once it holds parityShards of the volume,
a hard cap not relaxed even in durability-first). Previously Place assigned by
free capacity, so a skewed near-full cluster could pile >parityShards onto one
disk -> losing it loses the volume; only distinct-disk count was checked. This
covers encode and repair (both route through Place); the caller skips/leaves the
volume rather than minting an unrecoverable layout.

Also makes reserveShard decrement freeSlots unconditionally, symmetric with
releaseShard's unconditional increment (the old guarded decrement could credit a
phantom slot on release if a shard were ever reserved onto a full disk).

* ecbalancer: add Topology.ReleaseVolumeShards (clear + credit) for greenfield encode

Releases all of a volume's shards from the snapshot and credits the freed disk
capacity, so a greenfield encode can plan as if stale EC shards from a prior failed
attempt are gone. Safe to credit because the encode task deletes stale shards
(cleanupStaleEcShards) before distributing the new ones. Distinct from
ClearShardAccounting (repair), which does not credit.

* ecbalancer: ReleaseVolumeShards credits node freeSlots, not just disks

releaseShard only increments per-disk freeSlots, but rack capacity is summed from
node freeSlots (buildRacks) and node freeSlots gates node eligibility. Crediting
only disks left a node/rack looking full after releasing stale shards, so a
greenfield encode still couldn't use the freed capacity. Now credits the node by
the total disk-slots freed.

* ecbalancer: correct PlacementMode docs (encode uses durability-first)

PlaceStrict was labeled '(encode)' but encode uses PlaceDurabilityFirst. Clarify
that durability-first is used by both encode and repair, reports relaxations in
PlaceResult.Relaxed, and never relaxes the per-disk durability cap.

* ecbalancer: treat SameRackCount as a direct per-node shard cap

The 3rd ReplicaPlacement digit now caps shards per node at exactly the digit
value, matching how DiffRackCount (2nd digit) caps per rack, instead of allowing
digit+1 per node. This makes the per-rack and per-node caps consistent and
matches the documented "digits cap EC shards per rack and per node" semantics;
e.g. 011 now means at most one shard per rack and one per node.

* EC encode: place shards via ecbalancer.Place + configurable replica placement

Encode now plans destinations through the shared ecbalancer.Place policy
(durability-first: prefers the source disk type and honors replica placement /
caps / anti-affinity, relaxing rather than failing when capacity is tight) instead
of the EC-only placement planner. Targets and capacity reservations use Place's
actual per-disk shard assignment, not a round-robin guess; cross-volume in-cycle
capacity is tracked by ActiveTopology's pending task, so the cached planner is no
longer consulted. Adds a configurable replica_placement (proto field 6 + worker
form + reader) that overrides the master default replication.

The placement-package planner code is left in place (now unused) and removed in a
follow-up that drops the package.

* EC encode: drop unused dataShards param from createECTargets

Addresses review feedback: after switching to Place's per-disk shardsPerPlan
assignment, createECTargets no longer needs the data-shard count.

* EC encode: fix packed-target validation, greenfield stale-shard accounting, RP docs

- Validate counts distinct shard ids across targets, not target rows, so packed
  plans (fewer (node,disk) targets than shards) aren't rejected.
- planECDestinations releases the volume's stale EC shards from the snapshot before
  Place (ReleaseVolumeShards), crediting their capacity. The encode task deletes
  stale shards before distributing, so a retry on tight capacity no longer fails
  planning by counting shards that are about to be removed.
- replica_placement config/form help no longer claims a data-center limit (the DC
  digit is ignored for EC); detection logs a warning when a DC digit is set.

* EC encode: surface relaxed placement; mark replica_placement best-effort

Encode places with PlaceDurabilityFirst (the chosen lenient behavior), which can
relax caps/anti-affinity/replica-placement to avoid deferring. That was silent
(only disk-type/tag spills were logged). Now logs PlaceResult.Relaxed so a tight
replica placement isn't weakened unnoticed, and the config/form help states the
rack/node caps are best-effort during encode (enforced by rebalancing).

* EC encode: key per-disk shard grouping by struct, not formatted string

planECDestinations grouped destinations using a fmt.Sprintf("%s:%d") map key
per shard; use a {node,diskID} struct key and pre-size the map/slice to the
shard count to drop the per-shard string allocation.
2026-05-22 20:22:30 -07:00
Chris Lu
cd15ae1395 fix(ec): bring ec.encode worker and EC/volume helpers to parity with shell (#9599)
* refactor(volume): extract replica sync/select into shared volume_replica package

Move the volume replica reconciliation helpers (status, union builder,
SyncAndSelectBestReplica, ReadNeedleMeta) out of the shell into a new
weed/storage/volume_replica package so both the shell (ec.encode, volume.tier.move,
volume.check.disk) and the EC encode worker can reuse them. No behavior change.

* fix(ec): bring ec.encode worker to parity with the shell

- Sync replicas and encode the most-complete one (via the shared
  volume_replica.SyncAndSelectBestReplica) instead of a possibly-stale replica,
  marking all replicas readonly first. Prevents silent data loss when a stale
  replica is encoded and the originals deleted.
- Skip remote/tiered volumes in detection (shell ec.encode excludes them).
- Min-node safety gate: refuse to encode when cluster nodes < parity shards.
- Align default thresholds with the shell (fullness 0.95, quiet 1h).

* fix(vacuum): plugin path honors min_volume_age_seconds override

deriveVacuumConfig hard-coded MinVolumeAgeSeconds=0, dropping any configured
value. Read it from worker config (default 0, matching the shell/master vacuum
which has no age gate) so an explicit override is honored.

* address review feedback

- config.go: align GetConfigSpec schema defaults (quiet_for_seconds=3600,
  fullness_ratio=0.95) with the runtime defaults so UI/bootstrap flows match the
  shell (coderabbitai).
- ec_task.go: roll back readonly when markReplicasReadonly fails partway, so
  already-marked replicas don't stay readonly (coderabbitai).
- volume_replica: pass the caller's replica statuses into buildUnionReplica instead
  of re-fetching them, and skip the per-needle ReadNeedleMeta RPC when the source
  replica is read-only (gemini-code-assist).

* test(plugin_workers/ec): make fixtures eligible under the new defaults

The default EC encode thresholds were raised to match the shell (fullness 0.95,
quiet 1h), but the plugin-worker integration fixtures still used 90%-full /
10-minute-old volumes, so detection found no eligible volumes and the tests failed
in CI. Bump the eligible fixtures to 96% full and 2h old.
2026-05-21 02:16:28 -07:00
Chris Lu
d221a64262 fix(ec): skip re-encode when EC shards already exist for the volume (#9448) (#9458)
* fix(ec): skip re-encode when EC shards already exist for the volume (#9448)

When an earlier EC encoding succeeded but the post-encode source-delete
left a regular replica behind on one of the servers, the next detection
cycle proposes the same volume again. The new encode tries to redistribute
shards to targets that already have them mounted, the volume server
returns `ec volume %d is mounted; refusing overwrite`, the task fails,
and detection re-queues the volume. The cycle repeats forever — issue
#9448.

The existing `metric.IsECVolume` skip catches the case where the canonical
metric is reported on the EC-shard side of the heartbeat, but when the
master sees BOTH a regular replica AND its EC shards in the same volume
list, the canonical metric we pick is the regular replica and
IsECVolume is false. Add a second guard that checks the topology
directly via `findExistingECShards` (already present and indexed) and
skip the volume when any shards exist, logging a warning that points
the admin at the stuck source.

This breaks the loop. Auto-cleanup of the orphaned replica is left as
follow-up work — deleting a source replica from inside the detector is
only safe with a re-verification step right before the delete, plus a
config opt-in, and is best done in its own change.

* fix(ec): #9448 guard only fires when EC shard set is complete

The first version of the #9448 guard tripped on `len(existingShards) > 0`,
which is broader than necessary. The existing recovery branch in the
encode arm (around the `existingECShards` block, ~line 216) is designed
to fold partial leftover shards from a previously failed encode into
the new task as cleanup sources. Skipping unconditionally on any
existing shards made that branch dead code, regressing the recovery
behavior Gemini flagged in the review of af09e1ec7.

Two corrections:

  1. New helper `countExistingEcShardsForVolume` walks each disk's
     `EcIndexBits` bitmap and ORs the results into a `ShardBits`,
     returning the distinct-shard popcount. This is the right unit:
     a single `VolumeEcShardInformationMessage` can carry several
     shards, so `len(EcShardInfos)` is not the same as the number
     of present shards. Per Gemini's "use helper functions that walk
     the actual shard bitmap" note.
  2. The guard now fires only when `shardCount >= totalShards`.
     Partial shard sets fall through to the existing recovery branch,
     unchanged.

Tests:
  - TestDetectionSkipsWhenECShardsAlreadyExist: complete shards →
    no proposal (the regression test for #9448 itself, unchanged
    intent, rewritten on top of new helpers).
  - TestDetectionAllowsRegularReplicaWhenShardsPartial: partial
    shards → guard does NOT swallow the volume; the encode arm
    still gets a chance.
  - TestCountExistingEcShardsForVolume: the helper walks the
    bitmap correctly even when one info entry packs multiple
    shards on one disk.

The dangerous `volume.delete` hint in the warning is unchanged for
now — it gets fixed in the next commit.

* fix(ec): drop dangerous shell-command hint from #9448 warning

The previous warning told operators to run `volume.delete -volumeId=%d`
in the SeaweedFS shell to clean up the orphaned source replica. That
command is cluster-wide — it deletes every replica of the volume,
including the EC shards, which share the same volume id. Running it
in the state the message describes would cause the data loss the
guard exists to prevent.

Replace it with explicit guidance that the cleanup must be a targeted
VolumeDelete RPC against the source server only, and that the
shell command is the exact wrong thing to use here. The next two
commits add the plumbing and the auto-execution of that targeted
delete so most operators never see this hint at all.

Per Gemini comment on af09e1ec7.

* feat(worker): plumb grpc dial option through ClusterInfo

Add ClusterInfo.GrpcDialOption (optional) and set it in the
erasure_coding plugin handler. Lets the detector make targeted
gRPC calls during detection — used by the follow-up commit to
auto-clean orphan source replicas via VolumeDelete RPCs.

Zero-value safe: existing detectors that don't need RPC access
get a nil DialOption and ignore the field.

* feat(ec): auto-clean orphan source replica via targeted VolumeDelete

Builds on the previous commits: the guard now identifies the
#9448 stuck-source state and a gRPC dial option is available on
ClusterInfo. When both are true, detection auto-cleans the
orphaned regular replica instead of just warning the operator.

New helper `cleanupOrphanSourceReplicas`:

  1. Re-verifies the EC shard set is still complete via
     `countExistingEcShardsForVolume` against the live topology
     snapshot. If the count dropped between detection start and
     the cleanup decision (a volume server going down mid-cycle),
     it aborts — the source replica is the only complete copy and
     deleting it without a healthy shard set would be data loss.
  2. Issues targeted VolumeDelete RPCs to each regular-replica
     server via `operation.WithVolumeServerClient`. That RPC only
     touches the regular volume on the targeted server; EC shards
     live in a separate store path and are not affected. This is
     the safe alternative to the cluster-wide `volume.delete`
     shell command we previously warned against.

If the cleanup partially fails (one replica delete errors, others
succeed), detection logs the failure and continues to skip the
volume. The next detection cycle will try again. We deliberately
don't fall back to a re-encode because that would just collide
with the mounted shards on the targets again.

When no dial option is available the existing warning still
points operators at the safe manual procedure.
2026-05-11 23:12:57 -07:00
Chris Lu
532b088262 fix(ec): preserve source disk type across EC encoding (#9423) (#9449)
* fix(ec): carry source disk type on VolumeEcShardsMount (#9423)

When EC shards land on a target whose disk type differs from the
source volume's, master heartbeats wrongly reported under the target
disk's type. Add source_disk_type to VolumeEcShardsMountRequest; the
target server applies it to the in-memory EcVolume via SetDiskType so
the mount notification and steady-state heartbeat both carry the
source's disk type. Empty value falls back to the location's disk
type (used by disk-scan reload paths).

The override is not persisted with the volume — disk type stays an
environmental property and .vif remains portable.

* fix(ec): plumb source disk type through plugin worker (#9423)

Add source_disk_type to ErasureCodingTaskParams (field 8; 7 reserved),
populate it from the metric the detector already collects, thread it
through ec_task into the MountEcShards helper, and forward it on the
VolumeEcShardsMount RPC.

* fix(ec): mirror source disk type plumbing in rust volume server (#9423)

The volume_ec_shards_mount handler now forwards source_disk_type into
mount_ec_shard → DiskLocation::mount_ec_shards. When non-empty it
overrides ec_vol.disk_type (and each mounted shard's disk_type) via
the new set_disk_type method; empty value keeps the location's disk
type, so disk-scan reload and reconcile paths are unchanged.

Also picks up two pre-existing proto drifts that 'make gen' synced
from weed/pb (LockRingUpdate in master.proto, listing_cache_ttl_seconds
in remote.proto).

* feat(ec): bias placement toward preferred disk type (#9423)

Add DiskCandidate.DiskType and PlacementRequest.PreferredDiskType.
When PreferredDiskType is non-empty, SelectDestinations partitions
suitable disks into matching/fallback tiers and runs the rack/server/
disk-diversity passes on the matching tier first; the fallback tier
is only consulted if the matching pool can't satisfy ShardsNeeded.
PlacementResult.SpilledToOtherDiskType lets callers warn on spillover.

Empty PreferredDiskType keeps the existing single-pool behavior.

* fix(ec): plumb source disk type into placement planner (#9423)

diskInfosToCandidates now copies DiskInfo.DiskType into the placement
candidate, and ecPlacementPlanner.selectDestinations forwards
metric.DiskType as PreferredDiskType so EC shards land on disks
matching the source volume's disk type when possible. A glog warning
fires when placement had to spill to other disk types.

* test(ec): integration coverage for source-disk-type plumbing (#9423)

store_ec_disk_type_test exercises Store.MountEcShards end-to-end: a
shard physically lives on an HDD location, MountEcShards is called
with sourceDiskType="ssd", and the test asserts that the in-memory
EcVolume, the mounted shard, the NewEcShardsChan notification, and
the steady-state heartbeat all report under the source's disk type.
A companion test pins the empty-source path so disk-scan reload
keeps the location's disk type.

detection_disk_type_test exercises the worker plumbing: with a
cluster of nodes carrying both HDD and SSD disks, planECDestinations
must place every shard on SSD when metric.DiskType="ssd"; with only
one SSD node and 13 HDD nodes it must still satisfy a 10+4 layout
via spillover (and log a warning).

* revert(ec): drop unrelated proto drift in seaweed-volume/proto (#9423)

make gen pulled two pre-existing OSS changes into the rust proto
tree (LockRingUpdate / by_plugin in master.proto,
listing_cache_ttl_seconds in remote.proto). Reviewers flagged it as
scope creep — none of the rust EC fix references those fields.
Restore both files to origin/master so this branch only touches
EC-related symbols.

* fix(ec placement): treat empty disk type as hdd and skip used racks on spill (#9423)

partitionByDiskType used raw string comparison, so a PreferredDiskType
of "hdd" never matched candidates whose DiskType is "" (the
HardDriveType sentinel that weed/storage/types uses). EC encoding of
an HDD source would spill onto any HDD reporting "" even when the
cluster has plenty of matching capacity. Normalize both sides
through normalizeDiskType, which lowercases and folds "" → "hdd",
mirroring types.ToDiskType without taking a dependency on it.

selectFromTier's rack-diversity pass also kept revisiting racks the
preferred tier had already used when running on the fallback tier,
which negated PreferDifferentRacks on spillover. Skip racks already
in usedRacks so fallback placements still spread onto new racks.

* fix(ec): empty-source remount must not clobber existing disk type (#9423)

mount_ec_shards_with_idx_dir runs more than once per vid (RPC mount,
disk-scan reload, orphan-shard reconcile). After an RPC sets the
source-derived disk type, any later call passing source_disk_type=""
was resetting ec_vol.disk_type back to the location's value, which
reintroduces the heartbeat drift this PR is meant to fix. Only
default to the location's disk type when the EC volume is fresh
(no shards mounted yet); otherwise leave the recorded type alone so
empty-source reloads preserve whatever the original mount RPC set.
2026-05-11 20:21:50 -07:00
Chris Lu
5d43f84df7 refactor(plugin): rename detection_interval_seconds → detection_interval_minutes (#9366)
Minutes is the natural granularity for detection cadence — every
production handler already set the seconds field to a 60-multiple
(17*60, 30*60, 3600, 24*60*60). Switching to minutes drops the *60
arithmetic and matches the unit conventions used elsewhere in the
plugin worker forms.

- Proto: AdminRuntimeDefaults + AdminRuntimeConfig.detection_interval_*
  field renamed.
- Helpers: durationFromMinutes / minutesFromDuration alongside the
  existing seconds variants in plugin_scheduler.go.
- Handlers: vacuum, ec_balance, balance, erasure_coding, iceberg,
  admin_script, s3_lifecycle now declare DetectionIntervalMinutes.
- Admin: scheduler_status + types + UI templ + plugin_api.go pass
  through the new field; UI label and table cells switch to "min".
2026-05-08 10:33:02 -07:00
Chris Lu
1f6f473995 refactor(worker): co-locate plugin handlers with their task packages (#9301)
* refactor(worker): co-locate plugin handlers with their task packages

Move every per-task plugin handler from weed/plugin/worker/ into the
matching weed/worker/tasks/<name>/ package, so each task owns its
detection, scheduling, execution, and plugin handler in one place.

Step 0 (within pluginworker, no behavior change): extract shared helpers
that previously lived inside individual handler files into dedicated
files and export the ones now consumed across packages.

  - activity.go: BuildExecutorActivity, BuildDetectorActivity
  - config.go: ReadStringConfig/Double/Int64/Bytes/StringList, MapTaskPriority
  - interval.go: ShouldSkipDetectionByInterval
  - volume_state.go: VolumeState + consts, FilterMetricsByVolumeState/Location
  - collection_filter.go: CollectionFilterMode + consts
  - volume_metrics.go: export CollectVolumeMetricsFromMasters,
    MasterAddressCandidates, FetchVolumeList
  - testing_senders_test.go: shared test stubs

Phase 1: move the per-task plugin handlers (and the iceberg subpackage)
into their task packages.

  weed/plugin/worker/vacuum_handler.go         -> weed/worker/tasks/vacuum/plugin_handler.go
  weed/plugin/worker/ec_balance_handler.go     -> weed/worker/tasks/ec_balance/plugin_handler.go
  weed/plugin/worker/erasure_coding_handler.go -> weed/worker/tasks/erasure_coding/plugin_handler.go
  weed/plugin/worker/volume_balance_handler.go -> weed/worker/tasks/balance/plugin_handler.go
  weed/plugin/worker/iceberg/                   -> weed/worker/tasks/iceberg/

  weed/plugin/worker/handlers/handlers.go now blank-imports all five
  task subpackages so their init() registrations fire.

  weed/command/mini.go and the worker tests construct the handler with
  vacuum.DefaultMaxExecutionConcurrency (the constant moved with the
  vacuum handler).

admin_script remains in weed/plugin/worker/ because there is no
underlying weed/worker/tasks/admin_script/ package to merge with.

* refactor(worker): update test/plugin_workers imports for moved handlers

Three handler constructors moved out of pluginworker into their task
packages — update the integration test files in test/plugin_workers/
to import from the new locations:

  pluginworker.NewVacuumHandler        -> vacuum.NewVacuumHandler
  pluginworker.NewVolumeBalanceHandler -> balance.NewVolumeBalanceHandler
  pluginworker.NewErasureCodingHandler -> erasure_coding.NewErasureCodingHandler

The pluginworker import is kept where the file still uses
pluginworker.WorkerOptions / pluginworker.JobHandler.

* refactor(worker): update test/s3tables iceberg import path

The iceberg subpackage moved from weed/plugin/worker/iceberg/ to
weed/worker/tasks/iceberg/. test/s3tables/maintenance/maintenance_integration_test.go
still imported the old path, breaking S3 Tables / RisingWave / Trino /
Spark / Iceberg-catalog / STS integration test builds.

Mirrors the OSS-side fix needed by every job in the run that
transitively imports test/s3tables/maintenance.

* chore: gofmt PR-touched files

The S3 Tables Format Check job runs `gofmt -l` over weed/s3api/s3tables
and test/s3tables, then fails if anything is unformatted. Files this
PR moved or modified had import-grouping and trailing-spacing issues
introduced by perl-based renames; reformat them with gofmt -w.

Touched files:
  test/plugin_workers/erasure_coding/{detection,execution}_test.go
  test/s3tables/maintenance/maintenance_integration_test.go
  weed/plugin/worker/handlers/handlers.go
  weed/worker/tasks/{balance,ec_balance,erasure_coding,vacuum}/plugin_handler*.go

* refactor(worker): bounds-checked int conversions for plugin config values

CodeQL flagged 18 go/incorrect-integer-conversion warnings on the moved
plugin handler files: results of pluginworker.ReadInt64Config (which
ultimately calls strconv.ParseInt with bit size 64) were being narrowed
to int32/uint32/int without an upper-bound check, so a malicious or
malformed admin/worker config value could overflow the target type.

Add three helpers in weed/plugin/worker/config.go that wrap
ReadInt64Config and clamp out-of-range values back to the caller's
fallback:

  ReadInt32Config (math.MinInt32 .. math.MaxInt32)
  ReadUint32Config (0 .. math.MaxUint32)
  ReadIntConfig    (math.MinInt32 .. math.MaxInt32, platform-portable)

Update each flagged call site in the four moved task packages to use
the bounds-checked helper. For protobuf uint32 fields (volume IDs)
the variable type also becomes uint32, removing the trailing
uint32(volumeID) casts and changing the "missing volume_id" check
from `<= 0` to `== 0`.

Touched files:
  weed/plugin/worker/config.go
  weed/worker/tasks/balance/plugin_handler.go
  weed/worker/tasks/erasure_coding/plugin_handler.go
  weed/worker/tasks/vacuum/plugin_handler.go

* refactor(worker): use ReadIntConfig for clamped derive-worker-config helpers

CodeQL still flagged three call sites where ReadInt64Config was being
narrowed to int after a value-range clamp (max_concurrent_moves <= 50,
batch_size <= 100, min_server_count >= 2). The clamp is correct but
CodeQL's flow analysis didn't recognize the bound, so it flagged them
as unbounded narrowing.

Switch to ReadIntConfig (already int32-bounded by the helper) for
those three sites, drop the now-redundant int64 intermediate variables.

Also drops the now-unused `> math.MaxInt32` clamp in
ec_balance.deriveECBalanceWorkerConfig (the helper covers it).
2026-05-02 18:03:13 -07:00