Chris Lu f643893891 fix(master): shed assign load when volume growth is already in flight (#10121)
Under a herd of concurrent assigns with no writable volume, Assign spun
PickForWrite for the full 10s timeout, pinning a goroutine per request and
starving the master of the cycles it needs to process growth and answer
heartbeats. When growth is the relevant remedy and already in flight, stop
spinning: if free space exists, shed with a fast retryable error so clients
back off and retry once growth lands; if the cluster is out of space, fail fast
with the real out-of-space error instead of masking it as retryable.
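
The decision described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `assignState`, `shedDecision`, and both error values are hypothetical names, and the real master tracks this state differently.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors; the names in the actual codebase may differ.
var (
	ErrGrowthInFlight = errors.New("volume growth in progress, retry later") // retryable
	ErrOutOfSpace     = errors.New("cluster is out of space")               // not retryable
)

// assignState is an assumed snapshot of what the master knows when an
// assign request finds nothing to write to.
type assignState struct {
	HasWritableVolume bool // a writable volume is available right now
	GrowthInFlight    bool // a volume growth request is already running
	HasFreeSpace      bool // the cluster still has free volume slots
}

// shedDecision returns nil when the request should follow the normal path,
// and an error when it should be rejected immediately instead of spinning.
func shedDecision(s assignState) error {
	if s.HasWritableVolume || !s.GrowthInFlight {
		return nil // normal path: pick a volume, or trigger growth and wait
	}
	if s.HasFreeSpace {
		return ErrGrowthInFlight // shed fast; the client backs off and retries
	}
	return ErrOutOfSpace // fail fast with the real cause
}

func main() {
	fmt.Println(shedDecision(assignState{GrowthInFlight: true, HasFreeSpace: true}))
	fmt.Println(shedDecision(assignState{GrowthInFlight: true, HasFreeSpace: false}))
}
```

Keeping the two failure cases as distinct errors is what lets the caller map one to a retryable status and the other to a terminal one.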

The gRPC shed uses ResourceExhausted, not Unavailable: operation.Assign retries
it, but the client connection layer doesn't treat it as a dead channel, so a
per-request shed across a herd doesn't tear down the shared master connection
and cancel every other in-flight assign. The HTTP dirAssignHandler sheds with
503 + Retry-After.
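
For the HTTP side, a shed response of this shape could look like the sketch below. It uses only the standard library; `assignHandler`, `errGrowthInFlight`, `probe`, and the one-second `Retry-After` value are assumptions made for illustration and are not taken from dirAssignHandler itself.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
)

// errGrowthInFlight is a hypothetical retryable error returned by the assign path.
var errGrowthInFlight = errors.New("volume growth in progress, retry later")

// retryAfterSeconds is an assumed back-off hint; the real value may differ.
const retryAfterSeconds = 1

// assignHandler wraps an assign function and maps its errors to HTTP statuses.
func assignHandler(assign func() (string, error)) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		fid, err := assign()
		switch {
		case err == nil:
			fmt.Fprintln(w, fid)
		case errors.Is(err, errGrowthInFlight):
			// Shed: tell the client to back off and retry shortly.
			w.Header().Set("Retry-After", strconv.Itoa(retryAfterSeconds))
			http.Error(w, err.Error(), http.StatusServiceUnavailable)
		default:
			// Any other failure is reported as a plain server error.
			http.Error(w, err.Error(), http.StatusInternalServerError)
		}
	}
}

// probe runs the handler once in memory and returns the status code and
// Retry-After header it produced for the given assign error.
func probe(assignErr error) (int, string) {
	h := assignHandler(func() (string, error) { return "", assignErr })
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/dir/assign", nil))
	return rec.Code, rec.Header().Get("Retry-After")
}

func main() {
	code, retryAfter := probe(errGrowthInFlight)
	fmt.Println(code, retryAfter)
}
```

The gRPC path follows the same idea with a status code instead of an HTTP status: per the message above, ResourceExhausted is chosen because it is retried by operation.Assign without being treated as a dead channel.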
2026-06-26 14:23:40 -07:00