seaweedfs/weed/operation
Chris Lu 4bf27278fa topology: fail replica writes fast when a replica is unreachable (#9744)
* operation: bound upload retries and honor context cancellation

retriedUploadData hardcoded 3 attempts and an uninterruptible backoff
sleep. A synchronous replica write to a dead host therefore paid the
full dial timeout three times over before failing.

Add UploadOption.MaxAttempts (<=0 keeps the default of 3) so callers can
cap attempts, and make the loop return as soon as the context is
cancelled so an abandoned upload unwinds instead of retrying.

* topology: fail replica writes fast when a replica is unreachable

DistributedOperation already returns on the first error, but a single
dead replica is itself the slow result: its goroutine retries the upload
three times through the dial timeout (~30s) before any error surfaces,
stalling the originating client write the whole time.

Make the replica write a single attempt (MaxAttempts=1) so a dead
replica fails after one dial timeout instead of three, and thread a
context into DistributedOperation that is cancelled once the outcome is
decided, so a healthy replica is no longer held hostage by one that is
stalled in a dial. Retrying is left to the originating client write.

* topology: keep replica deletes off the client request context

ReplicatedDelete runs after the local needle is already deleted. Driving
the replica deletes off r.Context() means a client disconnect cancels
them and orphans needles on the replicas, so use a background context.

* operation, topology: trim comments on the replica fail-fast path
2026-05-30 10:45:02 -07:00