seaweedfs

mirror of https://github.com/seaweedfs/seaweedfs.git synced 2026-05-21 09:11:29 +00:00

Chris Lu ecc0390795 fix(master): eagerly remove volume from writable when assign hits limit (#9108)

* fix(master): eagerly remove volume from writable when RecordAssign hits limit

Previously, a volume was only removed from the writable list by the
heartbeat-driven CollectDeadNodeAndFullVolumes pass, which runs every
pulse (5s) after a 5s heartbeat. Under sustained concurrent writes,
fio-style workloads observed in the field grew volumes 8-20x past the
configured 100MB limit (median 530MB, peak 1.98GB) during that
5-15s detection window.

RecordAssign already tracks effective size (reported + pending) on each
/dir/assign. It now also removes the volume from writable the moment
effectiveSize reaches volumeSizeLimit, and mirrors the activeVolumeCount
decrement that Topology.SetVolumeCapacityFull would have done on the
next heartbeat. The heartbeat path remains unchanged and idempotent
(vl.SetVolumeCapacityFull returns false if already removed, so no
double-decrement).
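
The eager-removal flow described above can be sketched as follows. This is a minimal illustration, not the SeaweedFS implementation: the `layout` and `volumeState` types and every method name here are hypothetical, and the sketch folds the active-count decrement into the layout itself, whereas the commit has PickForWrite invoke a separate helper.

```go
package main

import "sync"

// volumeState is a hypothetical, simplified stand-in for per-volume bookkeeping.
type volumeState struct {
	reportedSize uint64 // size from the last heartbeat
	pendingSize  uint64 // bytes handed out by assigns but not yet reported
}

// layout is a hypothetical, simplified volume layout (not the real VolumeLayout).
type layout struct {
	mu              sync.Mutex
	volumeSizeLimit uint64
	volumes         map[uint32]*volumeState
	writable        map[uint32]bool
	activeCount     int
}

func newLayout(limit uint64) *layout {
	return &layout{
		volumeSizeLimit: limit,
		volumes:         make(map[uint32]*volumeState),
		writable:        make(map[uint32]bool),
	}
}

// addVolume registers an empty, writable volume.
func (l *layout) addVolume(id uint32) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.volumes[id] = &volumeState{}
	l.writable[id] = true
	l.activeCount++
}

// setCapacityFullLocked removes id from the writable set and decrements the
// active count. It returns false if the volume was already removed, which is
// what keeps the eager path and the heartbeat path from double-decrementing.
func (l *layout) setCapacityFullLocked(id uint32) bool {
	if !l.writable[id] {
		return false
	}
	delete(l.writable, id)
	l.activeCount--
	return true
}

// recordAssign adds pendingDelta to the volume's pending bytes and removes the
// volume from writable as soon as reported+pending reaches the limit.
// It reports whether this call caused the transition.
func (l *layout) recordAssign(id uint32, pendingDelta uint64) (reachedCapacity bool) {
	l.mu.Lock()
	defer l.mu.Unlock()
	v, ok := l.volumes[id]
	if !ok {
		return false
	}
	v.pendingSize += pendingDelta
	if v.reportedSize+v.pendingSize >= l.volumeSizeLimit {
		return l.setCapacityFullLocked(id)
	}
	return false
}

// heartbeatFull models the slower heartbeat-driven path marking a volume full.
func (l *layout) heartbeatFull(id uint32) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.setCapacityFullLocked(id)
}
```

In this sketch, a later `heartbeatFull` call on a volume that `recordAssign` already removed returns false and leaves `activeCount` untouched, mirroring the idempotence the commit relies on.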

Recovery still works: if a heartbeat later reports size < limit and
the volume is not oversized, EnsureCorrectWritables adds it back.

- weed/topology/volume_layout.go: RecordAssign returns reachedCapacity
  bool; adds AdjustActiveVolumeCountForFull helper.
- weed/topology/topology.go: PickForWrite invokes the decrement on
  eager full transitions.
- TestPickForWrite: pass a 1024-byte hint instead of 0 so the default
  1MB pendingDelta does not immediately bust the test's 32KB limit.
- New TestRecordAssignReachingCapacityRemovesFromWritable covers the
  eager removal, active count accounting, and no-double-accounting.

* fix(master): recover eagerly-removed volume once decay clears pending

After RecordAssign eagerly removes a volume from writables because
effectiveSize reached the limit, decay can later bring effectiveSize
back under the limit (e.g., when a burst of assigns didn't all result
in uploads). Without recovery the volume would stay non-writable until
vacuum or a ReadOnly flip.

UpdateVolumeSize now re-adds the volume to writables once all of the
following hold:

  * RecordAssign is what removed it (tracked via fullSince timestamp)
  * at least capacityRecoveryDelay (30s) has elapsed since the removal;
    this prevents bouncing during a steady stream of assigns near
    the limit
  * effectiveSize has decayed below the crowded threshold (90% of limit)
  * reportedSize is under the limit (actual disk is not over)
  * standard EnsureCorrectWritables preconditions: enough copies, all
    copies writable, not oversized

The caller (SyncDataNodeRegistration) re-increments activeVolumeCount
symmetrically with the decrement done on eager removal.

* review: release VolumeLayout lock before UpAdjustDiskUsageDelta

adjustActiveVolumeCount held vl.accessLock across the tree-climbing
UpAdjustDiskUsageDelta walk. That walk takes per-level DiskUsages
locks and could be re-entered from other call paths that hold a
node-level lock and then acquire vl.accessLock. Copy the node list
under the VolumeLayout lock and release it before the tree walk to
eliminate the lock-ordering hazard.
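
The copy-then-release pattern described above might look like the following. This is a hedged sketch with hypothetical types: `node` stands in for a topology node with its own lock, and `adjustUp` stands in for the tree-climbing UpAdjustDiskUsageDelta walk. It is not the actual SeaweedFS code.

```go
package main

import "sync"

// node is a hypothetical topology node (data node, rack, data center, ...).
type node struct {
	mu     sync.Mutex
	active int
	parent *node
}

// adjustUp applies delta at this node and every ancestor, taking each
// level's own lock in turn.
func (n *node) adjustUp(delta int) {
	for cur := n; cur != nil; cur = cur.parent {
		cur.mu.Lock()
		cur.active += delta
		cur.mu.Unlock()
	}
}

// layout is a hypothetical, simplified volume layout.
type layout struct {
	accessLock sync.RWMutex
	locations  map[uint32][]*node // volume id -> nodes holding a replica
}

// adjustActiveVolumeCount snapshots the node list under accessLock, releases
// the lock, and only then walks the tree. No node-level lock is ever acquired
// while accessLock is held, so a path that takes a node lock first and
// accessLock second cannot deadlock against this one.
func (l *layout) adjustActiveVolumeCount(id uint32, delta int) {
	l.accessLock.RLock()
	nodes := append([]*node(nil), l.locations[id]...) // copy, do not alias
	l.accessLock.RUnlock()

	for _, n := range nodes {
		n.adjustUp(delta)
	}
}
```

The trade-off in this sketch is that the walk operates on a snapshot: a node added or removed after the copy is not reflected in that particular adjustment.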

2026-04-16 12:50:30 -07:00

allocate_volume.go

add version to volume proto

2025-06-16 22:05:06 -07:00

cluster_commands.go

Prevent split-brain: Persistent ClusterID and Join Validation (#8022)

2026-01-18 14:02:34 -08:00

collection.go

chore: execute goimports to format the code (#7983)

2026-01-07 13:06:08 -08:00

configuration.go

skip deltaBeat if dn is zero (#3630)