Files
seaweedfs/weed
Chris Lu 3d39324bc1 fix(nfs): make Linux mount -t nfs work without client workaround (#9199) (#9201)
* fix(nfs): make Linux `mount -t nfs` work without client-side workaround (#9199)

The upstream go-nfs library serves NFSv3 + MOUNT on a single TCP port and
does not register with portmap. Linux mount.nfs queries portmap on port 111
first, so the plain `mount -t nfs host:/export /mnt` form failed with
"portmap query failed" / "requested NFS version or transport protocol is
not supported" against a default `weed nfs` deployment.

- Add a minimal PORTMAP v2 responder (weed/server/nfs/portmap.go) with
  TCP+UDP listeners implementing PMAP_NULL, PMAP_GETPORT, PMAP_DUMP, and
  proper PROG_MISMATCH / PROG_UNAVAIL / PROC_UNAVAIL responses.
  Advertises NFS v3 TCP and MOUNT v3 TCP at the configured NFS port.

- New CLI flag `-portmap.bind` (empty, disabled by default) to opt into
  the responder. Binding port 111 requires root or CAP_NET_BIND_SERVICE
  and must not collide with a system rpcbind.

- Extended `weed nfs -h` help with the two supported ways to mount from
  Linux (client-side portmap bypass, or server-side `-portmap.bind`).

- Startup log now prints a copy-pasteable mount command tailored to
  whether portmap is enabled.

Unit tests cover RPC/XDR parsing, accept-stat paths, and a TCP+UDP
round-trip against the real listener.

Verified in a privileged Debian 12 container: with `-portmap.bind=0.0.0.0`
the exact command from #9199 (`mount -t nfs -o nfsvers=3,nolock
host:/export /mnt`) now succeeds and both read and write work.

* fix(nfs): harden portmap responder per review feedback (#9201)

Addresses three review findings on the portmap responder:

- parseRPCCall: validate opaque_auth length against the record limit
  before applying the XDR 4-byte padding, so a near-uint32-max authLen
  can no longer overflow (authLen + 3) and bypass the bounds check.
  (gemini-code-assist)

- serveTCP/Close: track live TCP connections and evict them on Close()
  so shutdown does not block on idle clients waiting for the read
  deadline to trip. serveTCP also no longer tears the listener down on
  a non-fatal Accept error (e.g. EMFILE); it logs and retries after a
  small back-off. Replaces the atomic.Bool closed flag with a
  mutex-guarded one so closed, conns, and the shutdown transition stay
  consistent. (coderabbit, minor)

- handleTCPConn: apply per-IO read/write deadlines (30s idle, 10s
  in-flight) so a peer that opens the privileged port 111 and stalls
  cannot pin a goroutine indefinitely. (coderabbit, major)

Adds TestPortmapServer_CloseEvictsIdleTCPConn, which holds a TCP
connection idle and asserts Close() returns within 2s (well under the
30s idle deadline) and that the client sees the eviction.

All existing tests still pass, including under -race.

* fix(nfs): keep portmap UDP responder alive on transient read errors (#9201)

- serveUDP: on a non-shutdown ReadFromUDP error, log, back off, and
  continue instead of returning. Matches how serveTCP now treats
  non-fatal Accept errors so a transient network blip doesn't take
  UDP portmap down until restart. (coderabbit)

- Rename portmapAcceptBackoff -> portmapRetryBackoff now that both
  paths use it.

- pmapProcDump: fix the pre-allocation capacity to match the actual
  encoding (20 bytes per entry + 4-byte terminator), replacing the
  old over-estimate of 24 per entry. No behavior change; just
  documents intent. (coderabbit nit)

* docs(nfs): clarify encodeAcceptedReply body semantics (#9201)

The prior comment said body is "nil when the accept_stat is itself an
error", which was misleading: the PROG_MISMATCH branch already passes
an 8-byte mismatch_info body. Rewrite to enumerate which error
accept_stat values omit the body and call out PROG_MISMATCH as the
exception, referencing RFC 5531 §9. Comment-only. (coderabbit nit)

* fix(nfs): make portmap retry backoff interruptible by Close() (#9201)

serveTCP and serveUDP both sleep portmapRetryBackoff (50ms) after a
non-fatal listener error. If Close() races in during that sleep, the
goroutine can't be interrupted, so Close() has to wait out the
remaining backoff before wg.Wait() returns.

Add a done channel that Close() closes once, and replace both
time.Sleep calls with a select on ps.done + time.After. The window
was tiny in practice but the select makes shutdown strictly bounded
by Close()'s own work. (coderabbit nit)
2026-04-23 13:53:53 -07:00
..
2026-04-10 17:31:14 -07:00
2026-04-10 17:31:14 -07:00
2026-04-14 20:48:24 -07:00
2026-04-23 10:05:51 -07:00