seaweedfs/weed
Chris Lu e2c79af6ec feat(master): size-aware volume assignment with weighted selection (#9031)
* feat(master): size-aware volume assignment with weighted selection

PickForWrite now selects volumes proportional to remaining capacity
instead of uniform random, so emptier volumes receive more writes.

- Add vid2size map to VolumeLayout tracking effective volume sizes
- Weighted pick over a random sample of k=3 volumes, so the cost stays
  O(1) regardless of how many volumes are writable
- RecordAssign tracks estimated pending bytes between heartbeats
- Exponential decay on heartbeat: halve the pending excess over the
  reported size each cycle
- Proactive crowded detection using effective size
- Zero extra heap allocations on the unconstrained hot path

Benchmark (20 writable volumes, unconstrained):
  Before: 36 ns/op, 32 B/op, 2 allocs/op
  After:  85 ns/op, 32 B/op, 2 allocs/op

* fix: address review feedback on size-aware assignment

- RecordAssign: use write lock (Lock) instead of read lock (RLock)
  since it mutates vid2size map and crowded set
- RegisterVolume: clear crowded flag when heartbeat decay drops
  effective size below the threshold
- pickWeightedByRemaining: fix misleading Fisher-Yates comment,
  simplify to plain random sampling (duplicates are harmless)
- ShouldGrowVolumesByDcAndRack: read vid2size under RLock
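
The locking rule in the review fixes above can be illustrated as follows.
This is a sketch under assumptions: the layout type, its fields, and the
method signatures are hypothetical and only loosely named after the
identifiers in this message. Methods that mutate the size map or the
crowded set take the write lock; pure readers take the read lock.

```go
package main

import (
	"fmt"
	"sync"
)

// layout is a hypothetical stand-in for the real volume layout type.
type layout struct {
	mu        sync.RWMutex
	vid2size  map[uint32]uint64   // effective size per volume id
	crowded   map[uint32]struct{} // volumes at or above the threshold
	threshold uint64
}

// recordAssign mutates both maps, so it needs Lock, not RLock.
func (l *layout) recordAssign(vid uint32, pending uint64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.vid2size[vid] += pending
	if l.vid2size[vid] >= l.threshold {
		l.crowded[vid] = struct{}{}
	}
}

// updateSize stores a new effective size and clears the crowded flag
// when the size has dropped below the threshold.
func (l *layout) updateSize(vid uint32, size uint64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.vid2size[vid] = size
	if size < l.threshold {
		delete(l.crowded, vid)
	}
}

// isCrowded only reads, so RLock is sufficient.
func (l *layout) isCrowded(vid uint32) bool {
	l.mu.RLock()
	defer l.mu.RUnlock()
	_, ok := l.crowded[vid]
	return ok
}

func main() {
	l := &layout{
		vid2size:  map[uint32]uint64{},
		crowded:   map[uint32]struct{}{},
		threshold: 100,
	}
	l.recordAssign(1, 120)
	fmt.Println("crowded after assign:", l.isCrowded(1))
	l.updateSize(1, 50)
	fmt.Println("crowded after update:", l.isCrowded(1))
}
```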

* fix: decay once per heartbeat cycle, not per replica

RegisterVolume is called once per replica of a volume. For replicated
volumes, the pending size decay was running multiple times per heartbeat
cycle, reducing the excess by 75% instead of 50% (for 2 replicas).

Fix: track vid2reportedSize and only run decay when the heartbeat-
reported size actually changes. A second replica reporting the same
size in the same cycle is a no-op.

Also fix CodeQL alert: cap count*EstimatedNeedleSizeBytes to avoid
uint64→int64 overflow in RecordAssign call.
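
One way to cap such a product before converting it is sketched below. The
helper name pendingBytes and the constant's value are assumptions for
illustration; the message only states that count*EstimatedNeedleSizeBytes
is capped, not how.

```go
package main

import (
	"fmt"
	"math"
)

// estimatedNeedleSizeBytes is a hypothetical value, not the real constant.
const estimatedNeedleSizeBytes uint64 = 1024

// pendingBytes multiplies in uint64 and saturates at math.MaxInt64 so the
// final conversion to int64 can never wrap to a negative number.
func pendingBytes(count uint64) int64 {
	if count > math.MaxInt64/estimatedNeedleSizeBytes {
		return math.MaxInt64
	}
	return int64(count * estimatedNeedleSizeBytes)
}

func main() {
	fmt.Println(pendingBytes(3))
	fmt.Println(pendingBytes(math.MaxUint64))
}
```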

* Potential fix for pull request finding 'CodeQL / Incorrect conversion between integer types'

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* fix: fail fast in test setup on JSON errors

- setupWithLimit now takes testing.TB and calls t.Fatalf on unmarshal
  errors or type assertion failures instead of printing and continuing
- benchSetup removed; benchmarks reuse setupWithLimit directly

* fix: run size decay on every heartbeat, not just new volumes

RegisterVolume is only called for newly discovered volumes, not on
every heartbeat. The pending size decay was never running in production.

- Extract decay logic into UpdateVolumeSize(), called from
  SyncDataNodeRegistration for every reported volume on every heartbeat
- RegisterVolume only initializes vid2size for brand-new volumes
- Constrained PickForWrite: scan from random offset, collect up to
  pickSampleSize matches in a stack array (no append allocation)
- Tests now exercise UpdateVolumeSize directly instead of RegisterVolume
  to match the production heartbeat path

* fix: compute pending bytes in uint64 to satisfy CodeQL

---------

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2026-04-11 09:19:05 -07:00