seaweedfs/weed/server
Chris Lu 29eec2f111 master: timeout AllocateVolume/DeleteVolume and defer growRequest cleanup (#9698)
* master: timeout AllocateVolume/DeleteVolume and defer growRequest cleanup

The volume-grow goroutine clears the layout's growRequest flag only after
ms.DoAutomaticVolumeGrow returns, and AllocateVolume / DeleteVolume were
calling the volume-server RPC with context.Background(). A volume server
that hung mid-call (heavy I/O, stuck lock, dead peer behind a stable VIP)
would park the goroutine forever, leaving growRequest=true and silently
blocking every subsequent automatic grow for that layout. Assign retries
then drained their 30s budget with "context deadline exceeded" until the
operator restarted the master.

Bound both RPCs with a 5-minute deadline (creating/removing a volume is
normally sub-second, so this is generous even for contended disks) and move the flag
clear + filter delete into defers so a panic in DoAutomaticVolumeGrow
doesn't strand the layout either.

* allocate_volume: shorten timeout to 1m for faster recovery

Volume create/delete is sub-second under normal conditions; 1 minute is
generous even on a contended disk and clears the growRequest flag well
before too many client Assigns drain their own retry budget.

* trim comments
2026-05-26 16:26:21 -07:00