Files
seaweedfs/weed/cluster
Chris Lu a5d0e4a735 Samba-over-FUSE integration test and distributed-lock handoff fixes (#9590)
* test(mount): add Samba over FUSE integration test

Export a SeaweedFS FUSE mount over SMB with smbd and drive it with
smbclient: file round-trips, directories, rename, large-file chunking,
recursive upload, cross-protocol consistency, and deletes.

A second -dlm mount adds locking coverage: POSIX fcntl byte-range locks,
distributed-lock write coordination, and concurrent writers. The two
cross-mount handoff checks currently fail and pin a known limitation -
the distributed lock is released on FUSE Release, which the kernel can
delay under contention.

Runs locally via test/samba/run.sh or in Docker via the compose file;
wired into CI as samba-integration.yml.

* fix(cluster): release distributed lock without racing the renewal goroutine

Stop() closed the cancel channel, slept 10ms, then unlocked using
renewToken. A renewal in flight during that window rotates the token on
the server, so the unlock may be sent with a stale token, fail with a
mismatch, and leave the lock to linger until its TTL expires - stalling
other mounts waiting to write the same file.

Wait for the renewal goroutine to exit before unlocking. The channel
close also makes the renewToken read happen-after the last renewal.

* fix(cluster): poll for distributed lock acquisition without exponential backoff

A mount waiting to write a file held by another mount acquired through
util.RetryUntil, whose backoff grows to several seconds. Once the holder
released, the waiter could sleep that long before retrying, stretching
the cross-mount handoff past client timeouts.

Poll at the steady ~1s cadence AttemptToLock already enforces instead.

* test(mount): tighten Samba harness and mark the DLM handoff checks xfail

Run the workflow for weed/cluster changes, fail fast when the filer or
smbd port never opens, and fold the recursive mput result into its own
assertion so it cannot false-pass.

Mark the two cross-mount handoff checks expected-fail: they pin the
remaining DLM liveness bug (the lock is freed only on the delayed FUSE
Release) without failing CI, and turn the suite red if the handoff is
ever fixed.

* fix(cluster): keep a wedged renewal shutdown from sending a stale unlock

If the renewal goroutine is stuck in a slow RPC, Stop() fell through to
unlock anyway once it timed out waiting. A late renewal can rotate
renewToken, so that unlock races it, is rejected on a stale token, and
leaves the lock lingering until its TTL regardless. On the timeout path,
skip the unlock and let the TTL expire the lock instead.

* fix(cluster): wake the long-lived lock renewal loop promptly on Stop

StartLongLivedLock's renewal loop slept uninterruptibly between attempts,
up to 5*renewInterval (2.5*lockTTL) while unlocked. Stop() waits only
lockTTL+2s for the goroutine to exit, so a Stop() during that backoff
would time out before the goroutine woke and closed renewalDone,
breaking the shutdown synchronization. Sleep on a timer with a select on
cancelCh so the loop exits immediately.
2026-05-20 14:52:17 -07:00
..
2026-04-10 17:31:14 -07:00
2026-04-10 17:31:14 -07:00