Mirror of https://github.com/seaweedfs/seaweedfs.git, synced 2026-05-21 09:11:29 +00:00
* fix(master): eagerly remove volume from writable when RecordAssign hits limit
Previously, a volume was removed from the writable list only by the
heartbeat-driven CollectDeadNodeAndFullVolumes pass, which runs every
pulse (5s) and can only see a volume's new size after the next 5s
heartbeat reports it. Under sustained concurrent writes, fio-style
workloads in the field grew volumes 8-20x past the configured 100MB
limit (median 530MB, peak 1.98GB) during that 5-15s detection window.
RecordAssign already tracks each volume's effective size (reported +
pending) on every /dir/assign. It now also removes the volume from the
writable list as soon as effectiveSize reaches volumeSizeLimit, and
applies the same activeVolumeCount decrement that
Topology.SetVolumeCapacityFull would otherwise have applied on the next
heartbeat. The heartbeat path is unchanged and idempotent:
vl.SetVolumeCapacityFull returns false if the volume was already
removed, so the count is never decremented twice.
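A minimal Go sketch of the eager-removal accounting described above, assuming a heavily simplified layout. The type, fields, and methods (volumeLayout, recordAssign, setVolumeCapacityFull, onAssign, onHeartbeatFull, activeVolumes) are hypothetical illustrations, not SeaweedFS's real API. It shows why the count drops only once: whichever path removes the volume first reports true, and the other path finds it already gone.

```go
package main

import "sync"

// volumeLayout is a hypothetical, heavily simplified stand-in for
// SeaweedFS's VolumeLayout; none of these names are the real API.
type volumeLayout struct {
	mu            sync.Mutex
	sizeLimit     uint64
	reported      map[uint32]uint64 // last size reported by a heartbeat
	pending       map[uint32]uint64 // bytes promised by recent assigns
	writable      map[uint32]bool   // volumes eligible for new assigns
	activeVolumes int               // plays the role of activeVolumeCount
}

func newVolumeLayout(limit uint64, ids ...uint32) *volumeLayout {
	vl := &volumeLayout{
		sizeLimit: limit,
		reported:  map[uint32]uint64{},
		pending:   map[uint32]uint64{},
		writable:  map[uint32]bool{},
	}
	for _, id := range ids {
		vl.writable[id] = true
		vl.activeVolumes++
	}
	return vl
}

// recordAssign adds delta to the volume's pending bytes and removes the
// volume from writables once reported+pending reaches the limit. It
// returns true only if this call performed the removal.
func (vl *volumeLayout) recordAssign(vid uint32, delta uint64) (reachedCapacity bool) {
	vl.mu.Lock()
	defer vl.mu.Unlock()
	vl.pending[vid] += delta
	if vl.reported[vid]+vl.pending[vid] >= vl.sizeLimit && vl.writable[vid] {
		delete(vl.writable, vid)
		return true
	}
	return false
}

// setVolumeCapacityFull models the heartbeat path: it returns false if
// the volume is already out of writables, which prevents a second
// decrement.
func (vl *volumeLayout) setVolumeCapacityFull(vid uint32) bool {
	vl.mu.Lock()
	defer vl.mu.Unlock()
	if !vl.writable[vid] {
		return false
	}
	delete(vl.writable, vid)
	return true
}

func (vl *volumeLayout) adjustActive(delta int) {
	vl.mu.Lock()
	defer vl.mu.Unlock()
	vl.activeVolumes += delta
}

// onAssign plays the PickForWrite role: decrement only on an eager removal.
func (vl *volumeLayout) onAssign(vid uint32, delta uint64) {
	if vl.recordAssign(vid, delta) {
		vl.adjustActive(-1)
	}
}

// onHeartbeatFull plays the heartbeat role for a volume reported full.
func (vl *volumeLayout) onHeartbeatFull(vid uint32) {
	if vl.setVolumeCapacityFull(vid) {
		vl.adjustActive(-1)
	}
}
```

For example, with a 100-byte limit, assigns of 60 and 40 bytes remove the volume on the second call, and a later heartbeat finds it already removed and leaves activeVolumes at 0.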
Recovery still works: if a heartbeat later reports size < limit and
the volume is not oversized, EnsureCorrectWritables adds it back.
- weed/topology/volume_layout.go: RecordAssign returns reachedCapacity
bool; adds AdjustActiveVolumeCountForFull helper.
- weed/topology/topology.go: PickForWrite applies the activeVolumeCount
decrement when RecordAssign reports an eager full transition.
- TestPickForWrite: pass a 1024-byte hint instead of 0 so the default
1MB pendingDelta does not immediately exceed the test's 32KB limit.
- New TestRecordAssignReachingCapacityRemovesFromWritable covers the
eager removal, active count accounting, and no-double-accounting.
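To make the TestPickForWrite arithmetic concrete, here is a small Go sketch. It assumes, as the note above implies, that a zero size hint falls back to a 1MB pending delta while a non-zero hint is used as-is; pendingDeltaFor and assignsUntilFull are hypothetical helpers, not SeaweedFS functions.

```go
package main

// Assumption drawn from the test note: a zero size hint falls back to a
// 1MB pending delta; a non-zero hint is used as the delta directly.
const defaultPendingDelta uint64 = 1 << 20 // 1MB

// pendingDeltaFor is a hypothetical helper, not a SeaweedFS function.
func pendingDeltaFor(hint uint64) uint64 {
	if hint == 0 {
		return defaultPendingDelta
	}
	return hint
}

// assignsUntilFull returns how many back-to-back assigns push an empty
// volume to its limit (effective size >= limit), ignoring decay.
func assignsUntilFull(hint, limit uint64) uint64 {
	d := pendingDeltaFor(hint)
	return (limit + d - 1) / d // ceiling division
}
```

With the old hint of 0, a single assign adds 1,048,576 bytes of pending size against a 32,768-byte limit, so the first pick already fills the volume; with a 1024-byte hint it takes 32 assigns (assuming no reported size and no decay).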
* fix(master): recover eagerly-removed volume once decay clears pending
After RecordAssign eagerly removes a volume from writables because
effectiveSize reached the limit, decay of the pending bytes can later
bring effectiveSize back under the limit (e.g., when not every assign in
a burst results in an upload). Without a recovery path, the volume would
stay non-writable until a vacuum or a ReadOnly flip.
UpdateVolumeSize now re-adds the volume to writables once all of the
following hold:
* RecordAssign is what removed it (tracked via the fullSince timestamp)
* at least capacityRecoveryDelay (30s) has elapsed since the removal,
  which prevents the volume from bouncing in and out of writables
  during a steady stream of assigns near the limit
* effectiveSize has decayed below the crowded threshold (90% of the
  limit)
* reportedSize is under the limit (the data actually on disk is not
  over it)
* the standard EnsureCorrectWritables preconditions hold: enough
  copies, all copies writable, and the volume is not oversized
The caller (SyncDataNodeRegistration) re-increments activeVolumeCount
symmetrically with the decrement done on eager removal.
* review: release VolumeLayout lock before UpAdjustDiskUsageDelta
adjustActiveVolumeCount held vl.accessLock across the tree-climbing
UpAdjustDiskUsageDelta walk. That walk takes per-level DiskUsages
locks, while other call paths hold a node-level lock and then acquire
vl.accessLock, so the two paths take the same locks in opposite order
and could deadlock. Copy the node list under the VolumeLayout lock and
release it before the tree walk to eliminate this lock-ordering hazard.
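A simplified Go sketch of the fixed lock discipline: snapshot under the layout lock, release it, then walk the tree. node, layout, upAdjust, and the locations map are hypothetical stand-ins for the topology tree and VolumeLayout; only the names accessLock and adjustActiveVolumeCount come from the description above, and using a read lock for the snapshot is an assumption of this sketch.

```go
package main

import "sync"

// node is a hypothetical stand-in for a topology tree node (data node,
// rack, data center) whose counters are guarded by its own lock.
type node struct {
	mu          sync.Mutex
	activeCount int
	parent      *node
}

// upAdjust climbs from n to the root, locking one level at a time, like
// the UpAdjustDiskUsageDelta walk described above.
func (n *node) upAdjust(delta int) {
	for cur := n; cur != nil; cur = cur.parent {
		cur.mu.Lock()
		cur.activeCount += delta
		cur.mu.Unlock()
	}
}

// layout is a hypothetical stand-in for VolumeLayout.
type layout struct {
	accessLock sync.RWMutex
	locations  map[uint32][]*node // volume id -> nodes holding a copy
}

// adjustActiveVolumeCount copies the node list while holding accessLock,
// releases it, and only then walks the tree, so accessLock is never held
// while a node-level lock is being acquired.
func (l *layout) adjustActiveVolumeCount(vid uint32, delta int) {
	l.accessLock.RLock()
	nodes := append([]*node(nil), l.locations[vid]...)
	l.accessLock.RUnlock()

	for _, n := range nodes {
		n.upAdjust(delta)
	}
}
```

Because accessLock is released before any node lock is taken, no goroutine holds accessLock while waiting on a node lock, so the reverse order used elsewhere (node lock, then accessLock) can no longer form a cycle.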