mirror of
https://github.com/seaweedfs/seaweedfs.git
synced 2026-05-22 01:31:34 +00:00
* test(mount): add Samba over FUSE integration test

Export a SeaweedFS FUSE mount over SMB with smbd and drive it with smbclient: file round-trips, directories, rename, large-file chunking, recursive upload, cross-protocol consistency, and deletes. A second -dlm mount adds locking coverage: POSIX fcntl byte-range locks, distributed-lock write coordination, and concurrent writers.

The two cross-mount handoff checks currently fail and pin a known limitation: the distributed lock is released on FUSE Release, which the kernel can delay under contention.

Runs locally via test/samba/run.sh or in Docker via the compose file; wired into CI as samba-integration.yml.

* fix(cluster): release distributed lock without racing the renewal goroutine

Stop() closed the cancel channel, slept 10ms, then unlocked using renewToken. A renewal in flight during that window rotates the token on the server, so the unlock may be sent with a stale token, fail with a mismatch, and leave the lock lingering until its TTL expires, stalling other mounts waiting to write the same file.

Wait for the renewal goroutine to exit before unlocking. Because the goroutine closes renewalDone on exit, that close also guarantees the renewToken read happens after the last renewal.

* fix(cluster): poll for distributed lock acquisition without exponential backoff

A mount waiting to write a file held by another mount acquired the lock through util.RetryUntil, whose backoff grows to several seconds. Once the holder released the lock, the waiter could sleep that long before retrying, stretching the cross-mount handoff past client timeouts.

Instead, poll at the steady ~1s cadence that AttemptToLock already enforces.

* test(mount): tighten Samba harness and mark the DLM handoff checks xfail

Run the workflow for weed/cluster changes, fail fast when the filer or smbd port never opens, and fold the recursive mput result into its own assertion so it cannot false-pass.
Mark the two cross-mount handoff checks as expected failures (xfail): they pin the remaining DLM liveness bug (the lock is freed only on the delayed FUSE Release) without failing CI, and they turn the suite red if the handoff is ever fixed.

* fix(cluster): keep a wedged renewal shutdown from sending a stale unlock

If the renewal goroutine was stuck in a slow RPC, Stop() still fell through to the unlock once it timed out waiting. A late renewal can rotate renewToken, so that unlock races it, is rejected on a stale token, and leaves the lock lingering until its TTL expires anyway.

On the timeout path, skip the unlock and let the TTL expire the lock instead.

* fix(cluster): wake the long-lived lock renewal loop promptly on Stop

StartLongLivedLock's renewal loop slept uninterruptibly between attempts, for up to 5*renewInterval (2.5*lockTTL) while unlocked. Stop() waits only lockTTL+2s for the goroutine to exit, so a Stop() during that backoff would time out before the goroutine woke and closed renewalDone, breaking the shutdown synchronization.

Sleep on a timer with a select on cancelCh so the loop exits immediately.