mirror of
https://github.com/seaweedfs/seaweedfs.git
synced 2026-05-25 11:10:20 +00:00
* fix(nfs): make Linux `mount -t nfs` work without client-side workaround (#9199)

The upstream go-nfs library serves NFSv3 + MOUNT on a single TCP port
and does not register with portmap. Linux mount.nfs queries portmap on
port 111 first, so the plain `mount -t nfs host:/export /mnt` form
failed with "portmap query failed" / "requested NFS version or
transport protocol is not supported" against a default `weed nfs`
deployment.

- Add a minimal PORTMAP v2 responder (weed/server/nfs/portmap.go) with
  TCP+UDP listeners implementing PMAP_NULL, PMAP_GETPORT, PMAP_DUMP,
  and proper PROG_MISMATCH / PROG_UNAVAIL / PROC_UNAVAIL responses.
  Advertises NFS v3 TCP and MOUNT v3 TCP at the configured NFS port.
- New CLI flag `-portmap.bind` (empty, disabled by default) to opt
  into the responder. Binding port 111 requires root or
  CAP_NET_BIND_SERVICE and must not collide with a system rpcbind.
- Extended `weed nfs -h` help with the two supported ways to mount
  from Linux (client-side portmap bypass, or server-side
  `-portmap.bind`).
- Startup log now prints a copy-pasteable mount command tailored to
  whether portmap is enabled.

Unit tests cover RPC/XDR parsing, accept-stat paths, and a TCP+UDP
round-trip against the real listener.

Verified in a privileged Debian 12 container: with
`-portmap.bind=0.0.0.0` the exact command from #9199
(`mount -t nfs -o nfsvers=3,nolock host:/export /mnt`) now succeeds
and both read and write work.

* fix(nfs): harden portmap responder per review feedback (#9201)

Addresses three review findings on the portmap responder:

- parseRPCCall: validate opaque_auth length against the record limit
  before applying the XDR 4-byte padding, so a near-uint32-max authLen
  can no longer overflow (authLen + 3) and bypass the bounds check.
  (gemini-code-assist)
- serveTCP/Close: track live TCP connections and evict them on Close()
  so shutdown does not block on idle clients waiting for the read
  deadline to trip. serveTCP also no longer tears the listener down on
  a non-fatal Accept error (e.g. EMFILE); it logs and retries after a
  small back-off. Replaces the atomic.Bool closed flag with a
  mutex-guarded one so closed, conns, and the shutdown transition stay
  consistent. (coderabbit, minor)
- handleTCPConn: apply per-IO read/write deadlines (30s idle, 10s
  in-flight) so a peer that opens the privileged port 111 and stalls
  cannot pin a goroutine indefinitely. (coderabbit, major)

Adds TestPortmapServer_CloseEvictsIdleTCPConn, which holds a TCP
connection idle and asserts Close() returns within 2s (well under the
30s idle deadline) and that the client sees the eviction. All existing
tests still pass, including under -race.

* fix(nfs): keep portmap UDP responder alive on transient read errors (#9201)

- serveUDP: on a non-shutdown ReadFromUDP error, log, back off, and
  continue instead of returning. Matches how serveTCP now treats
  non-fatal Accept errors so a transient network blip doesn't take UDP
  portmap down until restart. (coderabbit)
- Rename portmapAcceptBackoff -> portmapRetryBackoff now that both
  paths use it.
- pmapProcDump: fix the pre-allocation capacity to match the actual
  encoding (20 bytes per entry + 4-byte terminator), replacing the old
  over-estimate of 24 per entry. No behavior change; just documents
  intent. (coderabbit nit)

* docs(nfs): clarify encodeAcceptedReply body semantics (#9201)

The prior comment said body is "nil when the accept_stat is itself an
error", which was misleading: the PROG_MISMATCH branch already passes
an 8-byte mismatch_info body. Rewrite to enumerate which error
accept_stat values omit the body and call out PROG_MISMATCH as the
exception, referencing RFC 5531 §9. Comment-only. (coderabbit nit)

* fix(nfs): make portmap retry backoff interruptible by Close() (#9201)

serveTCP and serveUDP both sleep portmapRetryBackoff (50ms) after a
non-fatal listener error. If Close() races in during that sleep, the
goroutine can't be interrupted, so Close() has to wait out the
remaining backoff before wg.Wait() returns.

Add a done channel that Close() closes once, and replace both
time.Sleep calls with a select on ps.done + time.After. The window was
tiny in practice but the select makes shutdown strictly bounded by
Close()'s own work. (coderabbit nit)
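
The opaque_auth fix in the hardening commit comes down to an ordering rule: compare the raw length against the limit first, then round up for XDR padding. The sketch below illustrates that rule under stated assumptions. `paddedAuthLen`, `maxRecord`, and its 64 KiB value are hypothetical; the commit message does not give the real limit or the exact signature used in parseRPCCall.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// maxRecord is a hypothetical cap on one RPC record, chosen for this sketch.
const maxRecord = 64 * 1024

var errAuthTooLong = errors.New("opaque_auth length exceeds record limit")

// paddedAuthLen returns authLen rounded up to the XDR 4-byte boundary,
// or an error if the body cannot fit in the remaining record bytes.
func paddedAuthLen(authLen uint32, remaining int) (int, error) {
	// Validate the raw length BEFORE padding. In uint32 arithmetic,
	// authLen+3 wraps around for values near math.MaxUint32, which
	// would make a huge length look tiny to a later bounds check.
	if authLen > maxRecord || int(authLen) > remaining {
		return 0, errAuthTooLong
	}
	// Safe now: authLen is small, so the round-up cannot overflow.
	padded := (int(authLen) + 3) &^ 3
	if padded > remaining {
		return 0, errAuthTooLong
	}
	return padded, nil
}

func main() {
	// The failure mode being guarded against: padding first wraps to 0.
	n := uint32(math.MaxUint32 - 1)
	wrapped := (n + 3) &^ 3
	fmt.Println("padding before checking wraps to:", wrapped)

	_, err := paddedAuthLen(n, 100)
	fmt.Println("checking before padding rejects it:", err != nil)

	padded, _ := paddedAuthLen(5, 100)
	fmt.Println("5-byte body occupies", padded, "bytes on the wire")
}
```

Doing the comparison on the unpadded value keeps the check independent of integer width, so the same logic holds whether the padding arithmetic is done in uint32 or int.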