seaweedfs/test/nfs/kernel_mount_test.go
Chris Lu 35fe3c801b feat(nfs): UDP MOUNT v3 responder + real-Linux e2e mount harness (#9267)
* feat(nfs): add UDP MOUNT v3 responder

The upstream willscott/go-nfs library only serves the MOUNT protocol
over TCP. Linux's mount.nfs and the in-kernel NFS client default
mountproto to UDP in many configurations, so against a stock weed nfs
deployment the kernel queries portmap for "MOUNT v3 UDP", gets port=0
("not registered"), and either falls back inconsistently or returns
EPROTONOSUPPORT, which the user sees as the "requested NFS version or
transport protocol is not supported" error reported in #9263. The
workaround has been to add `mountproto=tcp` or `mountport=2049` to the
mount options to coerce TCP just for the MOUNT phase.

Add a small UDP responder that speaks just enough of MOUNT v3 to handle
the procedures the kernel actually invokes during mount setup and
teardown: NULL, MNT, and UMNT. The wire layout for MNT mirrors
handler.go's TCP path so both transports produce the same root
filehandle and the same auth flavor list for the same export. Other
v3 procedures (DUMP, EXPORT, UMNTALL) cleanly return PROC_UNAVAIL.
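
As a rough illustration of the dispatch rule described above (not the
code in this commit), the sketch below maps a MOUNT v3 procedure number
to an RPC accept_stat. The procedure numbers follow RFC 1813 and the
accept_stat values follow RFC 5531; the helper name acceptStatFor is
hypothetical.

```go
package main

import "fmt"

// MOUNT v3 procedure numbers as defined in RFC 1813.
const (
	mountProcNull    = 0
	mountProcMnt     = 1
	mountProcDump    = 2
	mountProcUmnt    = 3
	mountProcUmntAll = 4
	mountProcExport  = 5
)

// ONC RPC accept_stat values as defined in RFC 5531.
const (
	acceptSuccess     = 0
	acceptProcUnavail = 3
)

// acceptStatFor is a hypothetical helper: it reports SUCCESS for the three
// procedures the responder handles (NULL, MNT, UMNT) and PROC_UNAVAIL for
// everything else, including DUMP, EXPORT, and UMNTALL.
func acceptStatFor(proc uint32) uint32 {
	switch proc {
	case mountProcNull, mountProcMnt, mountProcUmnt:
		return acceptSuccess
	default:
		return acceptProcUnavail
	}
}

func main() {
	for proc := uint32(0); proc <= mountProcExport; proc++ {
		fmt.Printf("proc %d -> accept_stat %d\n", proc, acceptStatFor(proc))
	}
}
```

A real responder would also have to encode the MNT reply body
(filehandle and auth flavors); this sketch only covers the
accept/reject decision.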

This commit only adds the responder; portmap-advertise and Server.Start
wire-up follow in subsequent commits so each step stays independently
reviewable.

References: RFC 1813 §5 (NFSv3/MOUNTv3), RFC 5531 (RPC). Existing
constants and parseRPCCall / encodeAcceptedReply helpers from
portmap.go are reused so behaviour stays consistent across both UDP
listening goroutines.

* feat(nfs): advertise UDP MOUNT v3 in the portmap responder

The portmap responder advertised TCP-only entries because go-nfs only
serves TCP, but with the new UDP MOUNT responder in place we can now
honestly advertise MOUNT v3 over UDP as well. Linux clients whose
default mountproto is UDP query portmap during mount setup; if the
answer is "not registered" some kernels translate the result to
EPROTONOSUPPORT instead of falling back to TCP, which is exactly the
failure pattern reported in #9263.

Add the entry, refresh the doc comment, and extend the existing
GETPORT and DUMP unit tests so a regression that drops the entry shows
up at unit-test granularity rather than only in an end-to-end mount.
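
To make the GETPORT behaviour concrete, here is a minimal sketch of a
lookup table keyed by (program, version, protocol). It assumes the
three advertised entries named in this PR and a shared port of 2049;
the type and function names are hypothetical and do not come from
portmap.go. Program numbers 100003 (NFS) and 100005 (MOUNT) and the IP
protocol numbers 6 (TCP) and 17 (UDP) are the standard values.

```go
package main

import "fmt"

const (
	progNFS   = 100003 // NFS RPC program number
	progMount = 100005 // MOUNT RPC program number
	protoTCP  = 6      // IPPROTO_TCP
	protoUDP  = 17     // IPPROTO_UDP
)

// mapping is a hypothetical key type for a portmap entry.
type mapping struct {
	prog, vers, prot uint32
}

// advertised builds the table of entries the responder would announce,
// assuming every service shares one port.
func advertised(port uint32) map[mapping]uint32 {
	return map[mapping]uint32{
		{progNFS, 3, protoTCP}:   port,
		{progMount, 3, protoTCP}: port,
		{progMount, 3, protoUDP}: port,
	}
}

// getPort mimics GETPORT semantics: a missing entry yields 0, which
// clients interpret as "not registered".
func getPort(table map[mapping]uint32, prog, vers, prot uint32) uint32 {
	return table[mapping{prog, vers, prot}]
}

func main() {
	table := advertised(2049)
	fmt.Println("MOUNT v3 udp:", getPort(table, progMount, 3, protoUDP))
	fmt.Println("NFS v3 udp:  ", getPort(table, progNFS, 3, protoUDP))
}
```

Dropping the MOUNT v3 UDP entry from the table makes the first lookup
return 0 again, which is the regression the extended unit tests are
meant to catch.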

* feat(nfs): start UDP MOUNT v3 responder alongside the TCP NFS listener

Plug the new mountUDPServer into Server.Start so it comes up on the
same bind/port as the TCP NFS listener. It is started before portmap so
that a fast client's portmap query never returns a UDP MOUNT entry the
responder isn't answering yet, and it is shut down via the same defer
chain so a portmap or listener startup failure doesn't leave the UDP
responder dangling.

The portmap startup log now reflects all three advertised entries
(NFS v3 tcp, MOUNT v3 tcp, MOUNT v3 udp) so operators can confirm at a
glance that the UDP MOUNT path is up.
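
The ordering and cleanup rule above can be sketched generically as
follows. This is an illustration under assumptions, not Server.Start
itself: the component type, startInOrder, and simulate are hypothetical
names, and the "services" only record events.

```go
package main

import (
	"errors"
	"fmt"
)

// component is a hypothetical stand-in for a listener or responder.
type component struct {
	name  string
	start func() error
	stop  func()
}

// startInOrder starts components in slice order. If any start fails, the
// deferred function stops the already-started ones in reverse order, so
// nothing is left running after a partial startup.
func startInOrder(components []component) (err error) {
	var started []component
	defer func() {
		if err == nil {
			return
		}
		for i := len(started) - 1; i >= 0; i-- {
			started[i].stop()
		}
	}()
	for _, c := range components {
		if err = c.start(); err != nil {
			return fmt.Errorf("start %s: %w", c.name, err)
		}
		started = append(started, c)
	}
	return nil
}

// simulate records start/stop events, optionally failing the portmap step.
func simulate(portmapFails bool) []string {
	var events []string
	mk := func(name string, fail bool) component {
		return component{
			name: name,
			start: func() error {
				if fail {
					return errors.New("bind failed")
				}
				events = append(events, "start "+name)
				return nil
			},
			stop: func() { events = append(events, "stop "+name) },
		}
	}
	// The UDP MOUNT responder is listed first so it is up before portmap.
	_ = startInOrder([]component{
		mk("udp-mount", false),
		mk("portmap", portmapFails),
	})
	return events
}

func main() {
	fmt.Println(simulate(false))
	fmt.Println(simulate(true))
}
```

With the portmap step failing, the recorded events are the UDP
responder starting and then being stopped again, which is the
"no dangling responder" property described above.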

Verified end-to-end: built a Linux/arm64 binary, ran weed nfs in a
container with -portmap.bind, and mounted from another container using
both the user-reported failing setup from #9263 (vers=3 + tcp without
mountport) and an explicit mountproto=udp to force the new code path.
The trace `mount.nfs: trying ... prog 100005 vers 3 prot UDP port 2049`
now leads to a successful mount instead of EPROTONOSUPPORT.

* docs(nfs): note that the plain mount form works on UDP-default clients

With UDP MOUNT v3 now served alongside TCP, the only path that ever
required mountproto=tcp / mountport=2049 — clients whose default
mountproto is UDP — works against the plain mount example. Update the
startup mount hint and the `weed nfs` long help so users don't go
hunting for a mount-option workaround that no longer applies.

The "without -portmap.bind" branch is unchanged: that path still has
to bypass portmap entirely because there is no portmap responder for
the kernel to query.

* test(nfs): add kernel-mount e2e tests under test/nfs

The existing test/nfs/ harness boots a real master + volume + filer +
weed nfs subprocess stack and drives it via go-nfs-client. That covers
protocol behaviour from a Go client's perspective, but anything
mis-coded once a real Linux kernel parses the wire bytes is invisible:
both ends of the test use the same RPC library, so identical bugs
round-trip cleanly. The two NFS issues hit recently were exactly that
shape — NFSv4 mis-routed to v3 SETATTR (#9262) and missing UDP MOUNT v3
— and only surfaced in a real client.

Add three end-to-end tests that mount the harness's running NFS server
through the in-tree Linux client:

  - TestKernelMountV3TCP: NFSv3 + MOUNT v3 over TCP (baseline).
  - TestKernelMountV3MountProtoUDP: NFSv3 over TCP, MOUNT v3 over UDP
    only — regression test for the new UDP MOUNT v3 responder.
  - TestKernelMountV4RejectsCleanly: vers=4 against the v3-only server,
    asserting the kernel surfaces a protocol/version-level error rather
    than a generic "mount system call failed" — regression test for the
    PROG_MISMATCH path from #9262.

The tests pass explicit port=/mountport= mount options so the kernel
never queries portmap, which means the harness doesn't need to bind
the privileged port 111 and won't collide with a system rpcbind on a
shared CI runner. They t.Skip cleanly when the host isn't Linux, when
mount.nfs isn't installed, or when the test process isn't running as
root.

Run locally with:

	cd test/nfs
	sudo go test -v -run TestKernelMount ./...

CI wiring follows in the next commit.

* ci(nfs): run kernel-mount e2e tests in nfs-tests workflow

Wire the new TestKernelMount* tests from test/nfs into the existing
NFS workflow:

  - Existing protocol-layer step now skips '^TestKernelMount' so a
    "skipped because not root" line doesn't appear on every run.
  - New "Install kernel NFS client" step pulls nfs-common (mount.nfs +
    helpers) and netbase (/etc/protocols, which mount.nfs's protocol-
    name lookups need to resolve `tcp`/`udp`).
  - New privileged step runs only the kernel-mount tests under sudo,
    preserving PATH and pointing GOMODCACHE/GOCACHE at the user's
    caches so the second `go test` invocation reuses already-built
    test binaries instead of redownloading modules under root.

The summary block now lists the three kernel-mount cases explicitly
so a regression in either the #9262 fix or this PR's UDP MOUNT change
is traceable from the workflow run page.
2026-04-28 14:06:35 -07:00

//go:build linux

package nfs

// End-to-end mount tests that drive the real Linux NFS client (mount.nfs +
// in-tree kernel) against a running `weed nfs` subprocess. These exist to
// catch regressions that the existing framework can't see, because the
// framework drives the server with willscott/go-nfs-client — the same RPC
// library the server uses internally — so any bug shared between the two
// (XDR layout, version dispatch, RPC framing) round-trips invisibly.
//
// Two real bugs hit recently were exactly that shape:
//  1. NFSv4 mis-routed to the v3 SETATTR handler (#9262). The client
//     library never sends NFSv4, so the test suite never noticed; the
//     Linux kernel mount path did notice, with EIO.
//  2. UDP MOUNT v3 missing. Only TCP MOUNT was advertised; the kernel
//     defaults mountproto=udp in many setups, so the in-tree client
//     surfaced EPROTONOSUPPORT during MOUNT setup.
//
// These tests mount over the actual loopback interface using mount.nfs and
// shell out to /bin/mount and /bin/umount. They require root (mount(2) is
// privileged) and Linux (the in-tree NFS client is what's being exercised).
// The build constraint above excludes them on other platforms, and they
// t.Skip cleanly when not running as root or when mount.nfs isn't installed.
//
// Run locally with:
//
//	cd test/nfs
//	sudo go test -v -run TestKernelMount ./...
//
// CI runs them via .github/workflows/nfs-tests.yml after installing
// nfs-common (mount.nfs + helpers).

import (
	"errors"
	"fmt"
	"net"
	"os"
	"os/exec"
	"strings"
	"testing"
)

// kernelMountSkipIfUnsupported skips the test when the host can't run a
// real NFS mount. The combined check belongs in one place so the three
// kernel-mount tests stay focused on what they're actually verifying.
func kernelMountSkipIfUnsupported(t *testing.T) {
	t.Helper()
	if os.Geteuid() != 0 {
		t.Skip("kernel mount test requires root; mount(2) is privileged")
	}
	if _, err := exec.LookPath("mount.nfs"); err != nil {
		t.Skipf("mount.nfs not installed: %v (CI installs the nfs-common package)", err)
	}
}

// kernelMount runs /bin/mount with the given options against the framework's
// running NFS server, returns the mountpoint and an unmount closure. We pass
// explicit port=/mountport= options so the kernel never queries portmap.
// That keeps the harness honest about what it's testing — the NFS / MOUNT
// wire protocol — and avoids colliding with a system rpcbind on shared CI
// runners (port 111 is privileged and frequently in use already).
func kernelMount(t *testing.T, fw *NfsTestFramework, optsTemplate string) (string, func()) {
	t.Helper()

	host, portStr, err := net.SplitHostPort(fw.NfsAddr())
	if err != nil {
		t.Fatalf("split nfs addr %q: %v", fw.NfsAddr(), err)
	}

	mountpoint, err := os.MkdirTemp("", "weed-nfs-kmount-")
	if err != nil {
		t.Fatalf("mkdtemp: %v", err)
	}

	opts := strings.ReplaceAll(optsTemplate, "{port}", portStr)
	target := fmt.Sprintf("%s:%s", host, fw.ExportRoot())

	cmd := exec.Command("mount", "-t", "nfs", "-o", opts, target, mountpoint)
	if out, err := cmd.CombinedOutput(); err != nil {
		_ = os.RemoveAll(mountpoint)
		t.Fatalf("mount %s -o %s failed: %v\nmount output:\n%s", target, opts, err, out)
	}

	teardown := func() {
		// -f to bail out faster if the server's already gone.
		_ = exec.Command("umount", "-f", mountpoint).Run()
		_ = os.RemoveAll(mountpoint)
	}
	return mountpoint, teardown
}
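
// newKernelMountFramework boots the NFS test framework with the default
// config and registers its cleanup with t. A setup failure cleans up
// immediately and fails the test.
func newKernelMountFramework(t *testing.T) *NfsTestFramework {
	t.Helper()
	cfg := DefaultTestConfig()
	fw := NewNfsTestFramework(t, cfg)
	if err := fw.Setup(cfg); err != nil {
		fw.Cleanup()
		t.Fatalf("framework setup: %v", err)
	}
	t.Cleanup(fw.Cleanup)
	return fw
}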

// TestKernelMountV3TCP exercises the most common mount form: NFSv3 + MOUNT
// v3, both over TCP. This is what the existing go-nfs-client tests cover at
// the protocol layer, but running it through mount.nfs and the kernel
// confirms that the wire format we emit decodes cleanly under a different
// XDR/RPC parser.
func TestKernelMountV3TCP(t *testing.T) {
	kernelMountSkipIfUnsupported(t)
	fw := newKernelMountFramework(t)

	mountpoint, undo := kernelMount(t, fw,
		"nfsvers=3,nolock,port={port},mountport={port},proto=tcp,mountproto=tcp")
	defer undo()

	if _, err := os.Stat(mountpoint); err != nil {
		t.Errorf("stat mountpoint: %v", err)
	}
	if _, err := os.ReadDir(mountpoint); err != nil {
		t.Errorf("readdir mountpoint: %v", err)
	}
}

// TestKernelMountV3MountProtoUDP is the regression test for the UDP MOUNT
// v3 responder. mountproto=udp forces the kernel to call MOUNT over UDP
// only; before the responder existed the kernel hit nothing (MOUNT was
// advertised TCP-only) and surfaced EPROTONOSUPPORT during mount setup.
func TestKernelMountV3MountProtoUDP(t *testing.T) {
	kernelMountSkipIfUnsupported(t)
	fw := newKernelMountFramework(t)

	mountpoint, undo := kernelMount(t, fw,
		"nfsvers=3,nolock,port={port},mountport={port},proto=tcp,mountproto=udp")
	defer undo()

	if _, err := os.Stat(mountpoint); err != nil {
		t.Errorf("stat mountpoint: %v", err)
	}
}

// TestKernelMountV4RejectsCleanly is the regression test for the NFSv4
// PROG_MISMATCH path (#9262). The server only speaks NFSv3, but the
// previous behaviour was to mis-route v4 COMPOUND to the v3 SETATTR
// handler and write garbage; the kernel surfaced EIO instead of a
// version-mismatch error and (depending on distro) didn't fall back to
// v3. The version filter now answers PROG_MISMATCH so the kernel sees
// "v4 not supported" cleanly.
//
// The test asserts:
//  1. mount.nfs exits non-zero (no silent success against a v3 server);
//  2. the failure message mentions protocol/version/io, which is what the
//     kernel surfaces when it gets PROG_MISMATCH instead of garbage. A
//     pre-fix server returns "mount system call failed" with no further
//     context, so a regression collapses the assertion onto that branch.
func TestKernelMountV4RejectsCleanly(t *testing.T) {
	kernelMountSkipIfUnsupported(t)
	fw := newKernelMountFramework(t)

	host, portStr, err := net.SplitHostPort(fw.NfsAddr())
	if err != nil {
		t.Fatalf("split nfs addr: %v", err)
	}

	mountpoint, err := os.MkdirTemp("", "weed-nfs-kmount-v4-")
	if err != nil {
		t.Fatalf("mkdtemp: %v", err)
	}
	defer os.RemoveAll(mountpoint)

	target := fmt.Sprintf("%s:%s", host, fw.ExportRoot())
	cmd := exec.Command("mount", "-t", "nfs", "-o",
		fmt.Sprintf("vers=4,port=%s", portStr),
		target, mountpoint)
	out, err := cmd.CombinedOutput()
	defer exec.Command("umount", "-f", mountpoint).Run()

	if err == nil {
		t.Fatalf("v4 mount unexpectedly succeeded against v3-only server\nmount output:\n%s", out)
	}

	// Don't pin the exact error string — different distros print slightly
	// different things — but require some hint that the kernel saw a
	// protocol-level failure rather than a generic "mount system call
	// failed". Without the version filter, mount.nfs prints the latter
	// alone; with it, the former.
	lower := strings.ToLower(string(out))
	if !strings.Contains(lower, "protocol") &&
		!strings.Contains(lower, "version") &&
		!strings.Contains(lower, "i/o") {
		t.Errorf("v4 mount failure didn't mention protocol/version/io; output:\n%s", out)
	}

	// Also require a non-zero exit so a future change that makes mount(2)
	// silently succeed (e.g. by relaxing the version filter) shows up
	// here even if the message phrasing changes.
	var ee *exec.ExitError
	if !errors.As(err, &ee) {
		t.Errorf("expected mount to exit non-zero with ExitError, got %v", err)
	}
}