Mirror of https://github.com/seaweedfs/seaweedfs.git (synced 2026-05-13 21:31:32 +00:00)
* add filer inode foundation for nfs
* nfs command skeleton
* add filer inode index foundation for nfs
* make nfs inode index hardlink aware
* add nfs filehandle and inode lookup plumbing
* add read-only nfs frontend foundation
* add nfs namespace mutation support
* add chunk-backed nfs write path
* add nfs protocol integration tests
* add stale handle nfs coverage
* complete nfs hardlink and failover coverage
* add nfs export access controls
* add nfs metadata cache invalidation
* fix nfs chunk read lookup routing
* fix nfs review findings and rename regression
* address pr 9067 review comments
- filer_inode: fail fast if the snowflake sequencer cannot start, and let
operators override the 10-bit node id via SEAWEEDFS_FILER_SNOWFLAKE_ID
to avoid multi-filer collisions
- filer_inode: drop the redundant retry loop in nextInode
- filerstore_wrapper: treat inode-index writes/removals as best-effort so
  an inode-index failure no longer turns a successful primary-store
  operation into a reported failure
- filer_grpc_server_rename: defer overwritten-target chunk deletion until
after CommitTransaction so a rolled-back rename does not strand live
metadata pointing at freshly deleted chunks
- command/nfs: default ip.bind to loopback and require an explicit
filer.path, so the experimental server does not expose the entire
filer namespace on first run
- nfs integration_test: document why LinkArgs matches go-nfs's on-the-wire
layout rather than RFC 1813 LINK3args
* mount: pre-allocate inode in Mkdir and Symlink
Mkdir and Symlink used to send filer_pb.CreateEntryRequest with
Attributes.Inode = 0. After PR 9067, the filer's CreateEntry now assigns
its own inode in that case, so the filer-side entry ends up with a
different inode than the one the mount allocates via inodeToPath.Lookup
and returns to the kernel. Once applyLocalMetadataEvent stores the
filer's entry in the meta cache, subsequent GetAttr calls read the
cached entry and hit the setAttrByPbEntry override at line 197 of
weedfs_attr.go, returning the filer-assigned inode instead of the
mount's local one. pjdfstest tests/rename/00.t (subtests 81/87/91)
caught this — it lstat'd a freshly-created directory/symlink, renamed
it, lstat'd again, and saw a different inode the second time.
createRegularFile already pre-allocates via inodeToPath.AllocateInode
and stamps it into the create request. Do the same thing in Mkdir and
Symlink so both sides agree on the object identity from the very first
request, and so GetAttr's cache path returns the same value as Mkdir /
Symlink's initial response.
* sequence: mask snowflake node id on int→uint32 conversion
CodeQL flagged the unchecked uint32(snowflakeId) cast in
NewSnowflakeSequencer as a potential truncation bug when snowflakeId is
sourced from user input (e.g. via SEAWEEDFS_FILER_SNOWFLAKE_ID). Mask
to the 10 bits the snowflake library actually uses so any caller-
supplied int is safely clamped into range.
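The clamping described here can be sketched as a plain bitmask. A minimal, self-contained illustration (constant and helper names are mine, not SeaweedFS's exact code), assuming the usual snowflake layout with a 10-bit node id (valid ids 0..1023):

```go
package main

import "fmt"

// snowflakeNodeBits is the node-id width used by typical snowflake
// implementations; masking to it keeps any caller-supplied value in
// the 0..1023 range before the int→uint32 cast.
const snowflakeNodeBits = 10
const snowflakeNodeMask = (1 << snowflakeNodeBits) - 1 // 0x3FF

// clampNodeID masks an operator-supplied id so the unchecked cast
// flagged by CodeQL can never truncate to an unexpected value.
// Even a negative int lands in range (two's complement & 0x3FF).
func clampNodeID(id int) uint32 {
	return uint32(id & snowflakeNodeMask)
}

func main() {
	fmt.Println(clampNodeID(5))    // 5
	fmt.Println(clampNodeID(1029)) // 1029 & 0x3FF = 5
	fmt.Println(clampNodeID(-1))   // all-ones & 0x3FF = 1023
}
```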
* add test/nfs integration suite
Boots a real SeaweedFS cluster (master + volume + filer) plus the
experimental `weed nfs` frontend as subprocesses and drives it through
the NFSv3 wire protocol via go-nfs-client, mirroring the layout of
test/sftp. The tests run without a kernel NFS mount, privileged ports,
or any platform-specific tooling.
Coverage includes read/write round-trip, mkdir/rmdir, nested
directories, rename content preservation, overwrite + explicit
truncate, 3 MiB binary file, all-byte binary and empty files, symlink
round-trip, ReadDirPlus listing, missing-path remove, FSInfo sanity,
sequential appends, and readdir-after-remove.
Framework notes:
- Picks ephemeral ports with net.Listen("127.0.0.1:0") and passes
-port.grpc explicitly so the default port+10000 convention cannot
overflow uint16 on macOS.
- Pre-creates the /nfs_export directory via the filer HTTP API before
starting the NFS server — the NFS server's ensureIndexedEntry check
requires the export root to exist with a real entry, which filer.Root
does not satisfy when the export path is "/".
- Reuses the same rpc.Client for mount and target so go-nfs-client does
not try to re-dial via portmapper (which concatenates ":111" onto the
address).
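The ephemeral-port trick in the first framework note is the standard one: bind port 0, let the kernel choose, and read the port back. A minimal sketch (the helper name is illustrative; the repo's testutil.MustAllocatePorts additionally allocates whole batches atomically to avoid the close-then-reuse race):

```go
package main

import (
	"fmt"
	"net"
)

// pickFreePort asks the kernel for an ephemeral port by listening on
// port 0, then closes the listener and returns the assigned port.
func pickFreePort() (int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer ln.Close()
	// The listener's address carries the kernel-assigned port.
	return ln.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	port, err := pickFreePort()
	if err != nil {
		panic(err)
	}
	fmt.Println(port > 0 && port < 65536)
}
```

Passing the resulting gRPC port explicitly (rather than relying on port+10000) is what keeps the derived port inside uint16 range.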
* ci: add NFS integration test workflow
Mirror test/sftp's workflow for the new test/nfs suite so PRs that touch
the NFS server, the inode filer plumbing it depends on, or the test
harness itself run the 14 NFSv3-over-RPC integration tests on Ubuntu
22.04 via `make test`.
* nfs: use append for buffer growth in Write and Truncate
The previous make+copy pattern reallocated the full buffer on every
extending write or truncate, giving O(N^2) behaviour for sequential
write loops. Switching to `append(f.content, make([]byte, delta)...)`
lets Go's amortized growth strategy absorb the repeated extensions.
Called out by gemini-code-assist on PR 9067.
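The growth pattern in question, reduced to a standalone sketch (helper name is mine): append lets the runtime over-allocate capacity so a loop of extending writes copies each byte O(1) amortized times, where re-making the slice each time copies the whole prefix on every iteration.

```go
package main

import "fmt"

// extend grows buf to newLen using append so Go's amortized growth
// strategy absorbs repeated extensions, instead of reallocating and
// copying the full buffer on every extending write (O(N^2) overall).
func extend(buf []byte, newLen int) []byte {
	if delta := newLen - len(buf); delta > 0 {
		buf = append(buf, make([]byte, delta)...)
	}
	return buf
}

func main() {
	var content []byte
	for i := 1; i <= 4; i++ {
		content = extend(content, i*1024) // sequential extending writes
	}
	fmt.Println(len(content), cap(content) >= len(content))
}
```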
* filer: honor caller cancellation in collectInodeIndexEntries
Dropping the WithoutCancel wrapper lets DeleteFolderChildren bail out of
the inode-index scan if the client disconnects mid-walk. The cleanup is
already treated as best-effort by the caller (it logs on error and
continues), so a cancelled walk just means the partial index rebuild is
skipped — the same failure mode as any other index write error.
Flagged as a DoS concern by gemini-code-assist on PR 9067.
* nfs: skip filer read on open when O_TRUNC is set
openFile used to unconditionally loadWritableContent for every writable
open and then discard the buffer if O_TRUNC was set. For large files
that is a pointless 64 MiB round-trip. Reorder the branches so we only
fetch existing content when the caller intends to keep it, and mark the
file dirty right away so the subsequent Close still issues the
truncating write. Called out by gemini-code-assist on PR 9067.
* nfs: allow Seek on O_APPEND files and document buffered write cap
Two related cleanups on filesystem.go:
- POSIX only restricts Write on an O_APPEND fd, not lseek. The existing
Seek error ("append-only file descriptors may only seek to EOF")
prevented read-and-write workloads that legitimately reposition the
read cursor. Write already snaps the offset to EOF before persisting
(see seaweedFile Write), so Seek can unconditionally accept any
offset. Update the unit test that was asserting the old behaviour.
- Add a doc comment on maxBufferedWriteSize explaining that it is a
per-file ceiling, the memory footprint it implies, and that the real
fix for larger whole-file rewrites is streaming / multi-chunk support.
Both changes flagged by gemini-code-assist on PR 9067.
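The POSIX semantics behind the first cleanup can be shown with a toy in-memory append-mode file (illustrative only, not seaweedFile): Seek freely repositions the cursor, which only reads consult, while Write ignores the cursor and always lands at EOF.

```go
package main

import "fmt"

// appendFile models an O_APPEND fd: Seek affects reads only,
// Write snaps to EOF before persisting.
type appendFile struct {
	data   []byte
	offset int64
}

func (f *appendFile) Seek(off int64) int64 {
	f.offset = off // any offset is legal; writes will ignore it
	return f.offset
}

func (f *appendFile) Write(p []byte) int {
	f.data = append(f.data, p...) // snap to EOF, then persist
	f.offset = int64(len(f.data))
	return len(p)
}

func (f *appendFile) Read(p []byte) int {
	n := copy(p, f.data[f.offset:])
	f.offset += int64(n)
	return n
}

func main() {
	f := &appendFile{}
	f.Write([]byte("hello"))
	f.Seek(0) // legal: repositions the read cursor only
	buf := make([]byte, 5)
	f.Read(buf)
	f.Write([]byte("!")) // still appends at EOF despite the Seek
	fmt.Println(string(buf), string(f.data))
}
```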
* nfs: guard offset before casting to int in Write
CodeQL flagged `int(f.offset) + len(p)` inside the Write growth path as
a potential overflow on architectures where `int` is 32-bit. The
existing check only bounded the post-cast value, which is too late.
Clamp f.offset against maxBufferedWriteSize before the cast and also
reject negative/overflowed endOffset results. Both branches fall
through to billy.ErrNotSupported, the same behaviour the caller gets
today for any out-of-range buffered write.
* nfs: compute Write endOffset in int64 to satisfy CodeQL
The previous guard bounded f.offset but left len(p) unchecked, so
CodeQL still flagged `int(f.offset) + len(p)` as a possible int-width
overflow path. Bound len(p) against maxBufferedWriteSize first, do the
addition in int64, and only cast down after the total has been clamped
against the buffer ceiling. Behaviour is unchanged: any out-of-range
write still returns billy.ErrNotSupported.
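The overflow-safe guard described above, as a self-contained sketch (function name and the exact bounds are illustrative): bound both operands first, add in int64, and only cast down once the total is known to fit under the ceiling. A naive `int(offset) + len(p)` could wrap on a 32-bit `int` before any check runs.

```go
package main

import "fmt"

const maxBufferedWriteSize = 64 << 20 // illustrative 64 MiB ceiling

// writeEndOffset returns the post-write end offset, or ok=false when
// the write would exceed the buffered ceiling or the inputs are out
// of range. All arithmetic happens in int64 so it cannot overflow on
// architectures where int is 32-bit.
func writeEndOffset(offset int64, n int) (int, bool) {
	if offset < 0 || offset > maxBufferedWriteSize || n > maxBufferedWriteSize {
		return 0, false
	}
	end := offset + int64(n) // 64-bit addition, bounded operands
	if end > maxBufferedWriteSize {
		return 0, false
	}
	return int(end), true // safe: end <= 64 MiB fits any int
}

func main() {
	fmt.Println(writeEndOffset(1024, 4096))
	fmt.Println(writeEndOffset(maxBufferedWriteSize, 1))
}
```

A caller would map the ok=false branch to billy.ErrNotSupported, matching the behaviour described above.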
* ci: drop emojis from nfs-tests workflow summary
Plain-text step summary per user preference — no decorative glyphs in
the NFS CI output or checklist.
* nfs: annotate remaining DEV_PLAN TODOs with status
Three of the unchecked items are genuine follow-up PRs rather than
missing work in this one, and one was actually already done:
- Reuse chunk cache and mutation stream helpers without FUSE deps:
checked off — the NFS server imports weed/filer.ReaderCache and
weed/util/chunk_cache directly with no weed/mount or go-fuse imports.
- Extract shared read/write helpers from mount/WebDAV/SFTP: annotated
as deferred to a separate refactor PR (touches four packages).
- Expand direct data-path writes beyond the 64 MiB buffered fallback:
annotated as deferred — requires a streaming WRITE path.
- Shared lock state + lock tests: annotated as blocked upstream on
go-nfs's missing NLM/NFSv4 lock state RPCs, matching the existing
"Current Blockers" note.
* test/nfs: share port+readiness helpers with test/testutil
Drop the per-suite mustPickFreePort and waitForService re-implementations
in favor of testutil.MustAllocatePorts (atomic batch allocation; no
close-then-hope race) and testutil.WaitForPort / SeaweedMiniStartupTimeout.
Pull testutil in via a local replace directive so this standalone
seaweedfs-nfs-tests module can import the in-repo package without a
separate release.
Subprocess startup is still master + volume + filer + nfs — no switch to
weed mini yet, since mini does not know about the nfs frontend.
* nfs: stream writes to volume servers instead of buffering the whole file
Before this change the NFS write path held the full contents of every
writable open in memory:
- OpenFile(write) called loadWritableContent which read the existing
file into seaweedFile.content up to maxBufferedWriteSize (64 MiB)
- each Write() extended content in-place
- Close() uploaded the whole buffer as a single chunk via
persistContent + AssignVolume
The 64 MiB ceiling made large NFS writes return NFS3ERR_NOTSUPP, and
even below the cap every Write paid a whole-file-in-memory cost. This
PR rewrites the write path to match how `weed filer` and the S3 gateway
persist data:
- openFile(write) no longer loads the existing content at all; it
only issues an UpdateEntry when O_TRUNC is set *and* the file is
non-empty (so a fresh create+trunc is still zero-RPC)
- Write() streams the caller's bytes straight to a volume server via
one AssignVolume + one chunk upload, then atomically appends the
resulting chunk to the filer entry through mutateEntry. Any
previously inlined entry.Content is migrated to a chunk in the same
update so the chunk list becomes the authoritative representation.
- Truncate() becomes a direct mutateEntry (drop chunks past the new
size, clip inline content, update FileSize) instead of resizing an
in-memory buffer.
- Close() is a no-op because everything was flushed inline.
The small-file fast path that the filer HTTP handler uses is preserved:
if the post-write size still fits in maxInlineWriteSize (4 MiB) and
the file has no existing chunks, we rewrite entry.Content directly and
skip the volume-server round-trip. This keeps single-shot tiny writes
(echo, small edits) cheap while completely removing the 64 MiB cap on
larger files. Read() now always reads through the chunk reader instead
of a local byte slice, so reads inside the same session see the freshly
appended data.
Drops the unused seaweedFile.content / dirty fields, the
maxBufferedWriteSize constant, and the loadWritableContent helper.
Updates TestSeaweedFileSystemSupportsNamespaceMutations expectations
to match the new "no extra O_TRUNC UpdateEntry on an empty file"
behavior (still 3 updates: Write + Chmod + Truncate).
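The inline-vs-chunk decision at the heart of the preserved fast path fits in one predicate. A hedged sketch with hypothetical names (the real code works on the filer entry, not bare ints):

```go
package main

import "fmt"

const maxInlineWriteSize = 4 << 20 // illustrative 4 MiB inline ceiling

// useInlineContent mirrors the small-file fast path: rewrite
// entry.Content in place only when the post-write size still fits the
// inline ceiling AND the file has no chunks yet; otherwise the bytes
// stream to a volume server and the chunk list becomes authoritative.
func useInlineContent(postWriteSize int64, existingChunks int) bool {
	return postWriteSize <= maxInlineWriteSize && existingChunks == 0
}

func main() {
	fmt.Println(useInlineContent(512, 0))   // tiny echo-style write: inline
	fmt.Println(useInlineContent(8<<20, 0)) // too big: goes to a chunk
	fmt.Println(useInlineContent(1024, 3))  // already chunked: stays chunked
}
```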
* filer: extract shared gateway upload helper for NFS and WebDAV
Three filer-backed gateways (NFS, WebDAV, and mount) each had a local
saveDataAsChunk that wrapped operation.NewUploader().UploadWithRetry
with near-identical bodies: build AssignVolumeRequest, build
UploadOption, build genFileUrlFn with optional filerProxy rewriting,
call UploadWithRetry, validate the result, and call ToPbFileChunk.
Pull that body into filer.SaveGatewayDataAsChunk with a
GatewayChunkUploadRequest struct so both NFS and WebDAV can delegate
to one implementation.
- NFS's saveDataAsChunk is now a thin adapter that assembles the
GatewayChunkUploadRequest from server options and calls the helper.
The chunkUploader interface keeps working for test injection because
the new GatewayChunkUploader interface is structurally identical.
- WebDAV's saveDataAsChunk is similarly a thin adapter — it drops the
local operation.NewUploader call plus the AssignVolume/UploadOption
scaffolding.
- mount is intentionally left alone. mount's saveDataAsChunk has two
features that do not fit the shared helper (a pre-allocated file-id
pool used to skip AssignVolume entirely, and a chunkCache
write-through at offset 0 so future reads hit the mount's local
cache), both of which are mount-specific.
Marks the Phase 2 "extract shared read/write helpers from mount,
WebDAV, and SFTP" DEV_PLAN item as done. The filer-level chunk read
path (NonOverlappingVisibleIntervals + ViewFromVisibleIntervals +
NewChunkReaderAtFromClient) was already shared.
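The adapter shape described above can be sketched with a structurally-typed seam. All names and fields here are illustrative, modeled on the commit message rather than the exact SeaweedFS API; the point is that a thin per-gateway adapter assembles the request while tests inject a fake through the interface:

```go
package main

import "fmt"

// GatewayChunkUploadRequest carries the per-gateway upload options the
// shared helper needs (hypothetical subset of the real struct).
type GatewayChunkUploadRequest struct {
	Collection  string
	Replication string
	FilerProxy  bool
}

// GatewayChunkUploader is the seam both gateways and tests satisfy;
// structural typing lets an existing uploader keep working unchanged.
type GatewayChunkUploader interface {
	SaveDataAsChunk(data []byte, offset int64, req GatewayChunkUploadRequest) (fileID string, err error)
}

// fakeUploader stands in for the shared filer helper in tests.
type fakeUploader struct{ calls int }

func (f *fakeUploader) SaveDataAsChunk(data []byte, offset int64, req GatewayChunkUploadRequest) (string, error) {
	f.calls++
	return fmt.Sprintf("3,%02x", len(data)), nil
}

// nfsSaveDataAsChunk is a thin adapter in the style described above:
// assemble the request from server options and delegate.
func nfsSaveDataAsChunk(u GatewayChunkUploader, data []byte, offset int64) (string, error) {
	return u.SaveDataAsChunk(data, offset, GatewayChunkUploadRequest{Collection: "nfs"})
}

func main() {
	u := &fakeUploader{}
	id, _ := nfsSaveDataAsChunk(u, []byte("hello"), 0)
	fmt.Println(id, u.calls)
}
```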
* nfs: remove DESIGN.md and DEV_PLAN.md
The planning documents have served their purpose: all phase 1 and
phase 2 items (including the extracted shared helpers) have landed,
phase 3 streaming writes have landed, and the two remaining phase 4
items (shared lock state + lock tests) are blocked upstream on
github.com/willscott/go-nfs which exposes no NLM or NFSv4 lock state
RPCs. The running decision log no longer reflects current code and
would just drift. The NFS wiki page
(https://github.com/seaweedfs/seaweedfs/wiki/NFS-Server) now carries
the overview, configuration surface, architecture notes, and known
limitations; the source is the source of truth for the rest.