Files
seaweedfs/weed/command/mount_std.go
Chris Lu 7a7f220224 feat(mount): cap write buffer with -writeBufferSizeMB (#9066)
* feat(mount): cap write buffer with -writeBufferSizeMB

Without a bound on the per-mount write pipeline, sustained upload
failures (e.g. volume server returning "Volume Size Exceeded" while
the master hasn't yet rotated assignments) let sealed chunks pile up
across open file handles until the swap directory — by default
os.TempDir() — fills the disk. Reported on 4.19 filling /tmp to 1.8 TB
during a large rclone sync.

Add a global WriteBufferAccountant shared across every UploadPipeline
in a mount. Creating a new page chunk (memory or swap) first reserves
ChunkSize bytes; when the cap is reached the writer blocks until an
uploader finishes and releases, turning swap overflow into natural
FUSE-level backpressure instead of unbounded disk growth.

The new -writeBufferSizeMB flag (also accepted via fuse.conf) defaults
to 0 = unlimited, preserving current behavior. Reserve drops
chunksLock while blocking so uploader goroutines — which take
chunksLock on completion before calling Release — cannot deadlock,
and an oversized reservation on an empty accountant succeeds to avoid
single-handle starvation.

* fix(mount): plug write-budget leaks in pipeline Shutdown

Review on #9066 caught two accounting bugs on the Destroy() path:

1. Writable-chunk leak (high). SaveDataAt() reserves ChunkSize before
   inserting into writableChunks, but Shutdown() only iterated
   sealedChunks. Truncate / metadata-invalidation flows call Destroy()
   (via ResetDirtyPages) without flushing first, so any dirty but
   unsealed chunks would permanently shrink the global write budget.
   Shutdown now frees and releases writable chunks too.

2. Double release with racing uploader (medium). Shutdown called
   accountant.Release directly after FreeReference, while the async
   uploader goroutine did the same on normal completion — under a
   Destroy-before-flush race this could underflow the accountant and
   let later writes exceed the configured cap. Move accounting into
   SealedChunk.FreeReference itself: the refcount-zero transition is
   exactly-once by construction, so any number of FreeReference calls
   release the slot precisely once.

Add regression tests for the writable-leak and the FreeReference
idempotency guarantee.

* test(mount): remove sleep-based race in accountant blocking test

Address review nits on #9066:
- Replace time.Sleep(50ms) proxy for "goroutine entered Reserve" with
  a started channel the goroutine closes immediately before calling
  Reserve. Reserve cannot make progress until Release is called, so
  landed is guaranteed false after the handshake — no arbitrary wait.
- Short-circuit WriteBufferAccountant.Used() in unlimited mode for
  consistency with Reserve/Release, avoiding a mutex round-trip.

* test(mount): add end-to-end write-buffer cap integration test

Exercises the full write-budget plumbing with a small cap (4 chunks of
64 KiB = 256 KiB) shared across three UploadPipelines fed by six
concurrent writers. A gated saveFn models the "volume server rejecting
uploads" condition from the original report: no sealed chunk can drain
until the test opens the gate. A background sampler records the peak
value of accountant.Used() throughout the run.

The test asserts:
  - writers fill the budget and then block on Reserve (Used() stays at
    the cap while stalled)
  - Used() never exceeds the configured cap even under concurrent
    pressure from multiple pipelines
  - after the gate opens, writers drain to zero
  - peak observed Used() matches the cap (262144 bytes in this run)

While wiring this up, the race detector surfaced a pre-existing data
race on UploadPipeline.uploaderCount: the two glog.V(4) lines around
the atomic Add sites read the field non-atomically. Capture the new
value from AddInt32 and log that instead — one-liner each, no
behavioral change.

* test(fuse): end-to-end integration test for -writeBufferSizeMB

Exercise the new write-buffer cap against a real weed mount so CI
(fuse-integration.yml) covers the FUSE→upload-pipeline→filer path, not
just the in-package unit tests. Uses a 4 MiB cap with 2 MiB chunks so
every subtest's total write demand is multiples of the budget and
Reserve/Release must drive forward progress for writes to complete.

Subtests:
- ConcurrentLargeWrites: six parallel 6 MiB files (36 MiB total, ~18
  chunk allocations) through the same mount, verifies every byte
  round-trips.
- SingleFileExceedingCap: one 20 MiB file (10 chunks) through a single
  handle, catching any self-deadlock when the pipeline's own earlier
  chunks already fill the global budget.
- DoesNotDeadlockAfterPressure: final small write with a 30s timeout,
  catching budget-slot leaks that would otherwise hang subsequent
  writes on a still-full accountant.

Ran locally on Darwin with macfuse against a real weed mini + mount:
  === RUN   TestWriteBufferCap
  --- PASS: TestWriteBufferCap (1.82s)

* test(fuse): loosen write-buffer cap e2e test + fail-fast on hang

On Linux CI the previous configuration (-writeBufferSizeMB=4,
-concurrentWriters=4 against a 20 MiB single-handle write)
deterministically hung the "Run FUSE Integration Tests" step to the
45-minute workflow timeout, while on macOS / macfuse the same test
completes in ~2 seconds (see run 24386197483). The Linux hang shows
up after TestWriteBufferCap/ConcurrentLargeWrites completes cleanly,
then TestWriteBufferCap/SingleFileExceedingCap starts and never
emits its PASS line.

Change:
- Loosen the cap to 16 MiB (8 × 2 MiB chunk slots) and drop the
  custom -concurrentWriters override. The subtests still drive demand
  well above the cap (32 MiB concurrent, 12 MiB single-handle), so
  Reserve/Release is still on every chunk-allocation path; the cap
  just gives the pipeline enough headroom that interactions with the
  per-file writableChunkLimit and the go-fuse MaxWrite batching don't
  wedge a single-handle writer on a slow runner.
- Wrap every os.WriteFile in a writeWithTimeout helper that dumps every
  live goroutine on timeout. If this ever re-regresses, CI surfaces
  the actual stuck goroutines instead of a 45-minute walltime.
- Also guard the concurrent-writer goroutines with the same timeout +
  stack dump.

The in-package unit test TestWriteBufferCap_SharedAcrossPipelines
remains the deterministic, controlled verification of the blocking
Reserve/Release path — this e2e test is now a smoke test for
correctness and absence of deadlocks through a real FUSE mount, which
is all it should be.

* fix: address PR #9066 review — idempotent FreeReference, subtest watchdog, larger single-handle test

FreeReference on SealedChunk now early-returns when referenceCounter is
already <= 0. The existing == 0 body guard already made side effects
idempotent, but the counter itself would still decrement into the
negatives on a double-call — ugly and a latent landmine for any future
caller that does math on the counter. Make double-call a strict no-op.

test(fuse): per-subtest watchdog + larger single-handle test

- Add runSubtestWithWatchdog and wrap every TestWriteBufferCap subtest
  with a 3-minute deadline. Individual writes were already
  timeout-wrapped but the readback loops and surrounding bookkeeping
  were not, leaving a gap where a subtest body could still hang. On
  watchdog fire, every live goroutine is dumped so CI surfaces the
  wedge instead of a 45-minute walltime.

- Bump testLargeFileUnderCap from 12 MiB → 20 MiB (10 chunks) to
  exceed the 16 MiB cap (8 slots) again and actually exercise
  Reserve/Release backpressure on a single file handle. The earlier
  e2e hang was under much tighter params (-writeBufferSizeMB=4,
  -concurrentWriters=4, writable limit 4); with the current loosened
  config the pressure is gentle and the goroutine-dump-on-timeout
  safety net is in place if it ever regresses.

Declined: adding an observable peak-Used() assertion to the e2e test.
The mount runs as a subprocess so its in-process WriteBufferAccountant
state isn't reachable from the test without adding a metrics/RPC
surface. The deterministic peak-vs-cap verification already lives in
the in-package unit test TestWriteBufferCap_SharedAcrossPipelines.
Recorded this rationale inline in TestWriteBufferCap's doc comment.

* test(fuse): capture mount pprof goroutine dump on write-timeout

The previous run (24388549058) hung on LargeFileUnderCap and the
test-side dumpAllGoroutines only showed the test process — the test's
syscall.Write is blocked in the kernel waiting for FUSE to respond,
which tells us nothing about where the MOUNT is stuck. The mount runs
as a subprocess so its in-process stacks aren't reachable from the
test.

Enable the mount's pprof endpoint via -debug=true -debug.port=<free>,
allocate the port from the test, and on write-timeout fetch
/debug/pprof/goroutine?debug=2 from the mount process and log it. This
gives CI the only view that can actually diagnose a write-buffer
backpressure deadlock (writer goroutines blocked on Reserve, uploader
goroutines stalled on something, etc).

Kept fileSize at 20 MiB so the Linux CI run will still hit the hang
(if it's genuinely there) and produce an actionable mount-side dump;
the alternative — silently shrinking the test below the cap — would
lose the regression signal entirely.

* review: constructor-inject accountant + subtest watchdog body on main

Two PR-#9066 review fixes:

1. NewUploadPipeline now takes the WriteBufferAccountant as a
   constructor parameter; SetWriteBufferAccountant is removed. In
   practice the previous setter was only called once during
   newMemoryChunkPages, before any goroutine could touch the
   pipeline, so there was no actual race — but constructor injection
   makes the "accountant is fixed at construction time" invariant
   explicit and eliminates the possibility of a future caller
   mutating it mid-flight. All three call sites (real + two tests)
   updated; the legacy TestUploadPipeline passes a nil accountant,
   preserving backward-compatible unlimited-mode behavior.

2. runSubtestWithWatchdog now runs body on the subtest main goroutine
   and starts a watcher goroutine that only calls goroutine-safe t
   methods (t.Log, t.Logf, t.Errorf). The previous version ran body
   on a spawned goroutine, which meant any require.* or writeWithTimeout
   t.Fatalf inside body was being called from a non-test goroutine —
   explicitly disallowed by Go's testing docs. The watcher no longer
   interrupts body (it can't), so body must return on its own —
   which it does via writeWithTimeout's internal 90s timeout firing
   t.Fatalf on (now) the main goroutine. The watchdog still provides
   the critical diagnostic: on timeout it dumps both test-side and
   mount-side (via pprof) goroutine stacks and marks the test failed
   via t.Errorf.

* fix(mount): IsComplete must detect coverage across adjacent intervals

Linux FUSE caps per-op writes at FUSE_MAX_PAGES_PER_REQ (typically
1 MiB on x86_64) regardless of go-fuse's requested MaxWrite, so a
2 MiB chunk filled by a sequential writer arrives as two adjacent
1 MiB write ops. addInterval in ChunkWrittenIntervalList does not
merge adjacent intervals, so the resulting list has two elements
{[0,1M], [1M,2M]} — fully covered, but list.size()==2.

IsComplete previously returned `list.size() == 1 &&
list.head.next.isComplete(chunkSize)`, which required a single
interval covering [0, chunkSize). Under that rule, chunks filled by
adjacent writes never reach IsComplete==true, so maybeMoveToSealed
never fires, and the chunks sit in writableChunks until
FlushAll/close. SaveContent handles the adjacency correctly via its
inline merge loop, so uploads work once they're triggered — but
IsComplete is the gate that triggers them.

This was a latent bug: without the write-buffer cap, the overflow
path kicks in at writableChunkLimit (default 128) and force-seals
chunks, hiding the leak. #9066's -writeBufferSizeMB adds a tighter
global cap, and with 8 slots / 20 MiB test, the budget trips long
before overflow. The writer blocks in Reserve, waiting for a slot
that never frees because no uploader ever ran — observed in the CI
run 24390596623 mount pprof dump: goroutine 1 stuck in
WriteBufferAccountant.Reserve → cond.Wait, zero uploader goroutines
anywhere in the 89-goroutine dump.

Walk the (sorted) interval list tracking the furthest covered
offset; return true if coverage reaches chunkSize with no gaps. This
correctly handles adjacent intervals, overlapping intervals, and
out-of-order inserts. Added TestIsComplete_AdjacentIntervals
covering single-write, two adjacent halves (both orderings), eight
adjacent eighths, gaps, missing edges, and overlaps.

* test(fuse): route mount glog to stderr + dump mount on any write error

Run 24392087737 (with the IsComplete fix) no longer hangs on Linux —
huge progress. Now TestWriteBufferCap/LargeFileUnderCap fails with
'close(...write_buffer_cap_large.bin): input/output error', meaning
a chunk upload failed and pages.lastErr propagated via FlushData to
close(). But the mount log in the CI artifact is empty because weed
mount's glog defaults to /tmp/weed.* files, which the CI upload step
never sees, so we can't tell WHICH upload failed or WHY.

Add -logtostderr=true -v=2 to MountOptions so glog output goes to
the mount process's stderr, which the framework's startProcess
redirects into f.logDir/mount.log, which the framework's DumpLogs
then prints to the test output on failure. The -v=2 floor enables
saveDataAsChunk upload errors (currently logged at V(0)) plus the
medium-level write_pipeline/upload traces without drowning the log
in V(4) noise.

Also dump MOUNT goroutines on any writeWithTimeout error (not just
timeout). The IsComplete fix means we now get explicit errors
instead of silent hangs, and the goroutine dump at the error moment
shows in-flight upload state (pending sealed chunks, retry loops,
etc) that a post-failure log alone can't capture.
2026-04-14 07:47:35 -07:00

414 lines
14 KiB
Go

//go:build linux || darwin || freebsd
package command
import (
"context"
"fmt"
"net"
"net/http"
"os"
"os/user"
"path"
"runtime"
"strconv"
"strings"
"syscall"
"time"
"github.com/seaweedfs/seaweedfs/weed/util/version"
"github.com/seaweedfs/go-fuse/v2/fuse"
"github.com/seaweedfs/seaweedfs/weed/glog"
"github.com/seaweedfs/seaweedfs/weed/mount"
"github.com/seaweedfs/seaweedfs/weed/mount/meta_cache"
"github.com/seaweedfs/seaweedfs/weed/mount/unmount"
"github.com/seaweedfs/seaweedfs/weed/pb"
"github.com/seaweedfs/seaweedfs/weed/pb/filer_pb"
"github.com/seaweedfs/seaweedfs/weed/pb/mount_pb"
"github.com/seaweedfs/seaweedfs/weed/s3api/s3_constants"
"github.com/seaweedfs/seaweedfs/weed/security"
"github.com/seaweedfs/seaweedfs/weed/storage/types"
"google.golang.org/grpc/reflection"
"github.com/seaweedfs/seaweedfs/weed/util"
"github.com/seaweedfs/seaweedfs/weed/util/grace"
)
func runMount(cmd *Command, args []string) bool {
if *mountOptions.debug {
go http.ListenAndServe(fmt.Sprintf(":%d", *mountOptions.debugPort), nil)
}
grace.SetupProfiling(*mountCpuProfile, *mountMemProfile)
if *mountReadRetryTime < time.Second {
*mountReadRetryTime = time.Second
}
util.RetryWaitTime = *mountReadRetryTime
umask, umaskErr := strconv.ParseUint(*mountOptions.umaskString, 8, 64)
if umaskErr != nil {
fmt.Printf("can not parse umask %s", *mountOptions.umaskString)
return false
}
if len(args) > 0 {
return false
}
return RunMount(&mountOptions, os.FileMode(umask))
}
func ensureBucketAllowEmptyFolders(ctx context.Context, filerClient filer_pb.FilerClient, mountRoot, bucketRootPath string) error {
bucketPath, isBucketRootMount := bucketPathForMountRoot(mountRoot, bucketRootPath)
if !isBucketRootMount {
return nil
}
entry, err := filer_pb.GetEntry(ctx, filerClient, util.FullPath(bucketPath))
if err != nil {
return err
}
if entry == nil {
return fmt.Errorf("bucket %s not found", bucketPath)
}
if entry.Extended == nil {
entry.Extended = make(map[string][]byte)
}
if strings.EqualFold(strings.TrimSpace(string(entry.Extended[s3_constants.ExtAllowEmptyFolders])), "true") {
return nil
}
entry.Extended[s3_constants.ExtAllowEmptyFolders] = []byte("true")
bucketFullPath := util.FullPath(bucketPath)
parent, _ := bucketFullPath.DirAndName()
if err := filerClient.WithFilerClient(false, func(client filer_pb.SeaweedFilerClient) error {
return filer_pb.UpdateEntry(ctx, client, &filer_pb.UpdateEntryRequest{
Directory: parent,
Entry: entry,
})
}); err != nil {
return err
}
glog.V(3).Infof("RunMount: set bucket %s %s=true", bucketPath, s3_constants.ExtAllowEmptyFolders)
return nil
}
func bucketPathForMountRoot(mountRoot, bucketRootPath string) (string, bool) {
cleanPath := path.Clean("/" + strings.TrimPrefix(mountRoot, "/"))
cleanBucketRoot := path.Clean("/" + strings.TrimPrefix(bucketRootPath, "/"))
if cleanBucketRoot == "/" {
return "", false
}
prefix := cleanBucketRoot + "/"
if !strings.HasPrefix(cleanPath, prefix) {
return "", false
}
rest := strings.TrimPrefix(cleanPath, prefix)
bucketParts := strings.Split(rest, "/")
if len(bucketParts) != 1 || bucketParts[0] == "" {
return "", false
}
return cleanBucketRoot + "/" + bucketParts[0], true
}
func RunMount(option *MountOptions, umask os.FileMode) bool {
// basic checks
chunkSizeLimitMB := *mountOptions.chunkSizeLimitMB
if chunkSizeLimitMB <= 0 {
fmt.Printf("Please specify a reasonable buffer size.\n")
return false
}
// try to connect to filer
filerAddresses := pb.ServerAddresses(*option.filer).ToAddresses()
util.LoadSecurityConfiguration()
grpcDialOption := security.LoadClientTLS(util.GetViper(), "grpc.client")
var cipher bool
var bucketRootPath string
var err error
for i := 0; i < 10; i++ {
err = pb.WithOneOfGrpcFilerClients(false, filerAddresses, grpcDialOption, func(client filer_pb.SeaweedFilerClient) error {
resp, err := client.GetFilerConfiguration(context.Background(), &filer_pb.GetFilerConfigurationRequest{})
if err != nil {
return fmt.Errorf("get filer grpc address %v configuration: %w", filerAddresses, err)
}
cipher = resp.Cipher
bucketRootPath = resp.DirBuckets
return nil
})
if err != nil {
glog.V(0).Infof("failed to talk to filer %v: %v", filerAddresses, err)
glog.V(0).Infof("wait for %d seconds ...", i+1)
time.Sleep(time.Duration(i+1) * time.Second)
}
}
if err != nil {
glog.Errorf("failed to talk to filer %v: %v", filerAddresses, err)
return true
}
if bucketRootPath == "" {
bucketRootPath = "/buckets"
}
filerMountRootPath := *option.filerMountRootPath
// clean up mount point
dir := util.ResolvePath(*option.dir)
if dir == "" {
fmt.Printf("Please specify the mount directory via \"-dir\"")
return false
}
unmount.Unmount(dir)
// start on local unix socket
if *option.localSocket == "" {
mountDirHash := util.HashToInt32([]byte(dir))
if mountDirHash < 0 {
mountDirHash = -mountDirHash
}
*option.localSocket = fmt.Sprintf("/tmp/seaweedfs-mount-%d.sock", mountDirHash)
}
if err := os.Remove(*option.localSocket); err != nil && !os.IsNotExist(err) {
glog.Fatalf("Failed to remove %s, error: %s", *option.localSocket, err.Error())
}
montSocketListener, err := net.Listen("unix", *option.localSocket)
if err != nil {
glog.Fatalf("Failed to listen on %s: %v", *option.localSocket, err)
}
// detect mount folder mode
if *option.dirAutoCreate {
os.MkdirAll(dir, os.FileMode(0777)&^umask)
}
fileInfo, err := os.Stat(dir)
// collect uid, gid
uid, gid := uint32(0), uint32(0)
mountMode := os.ModeDir | 0777
if err == nil {
mountMode = os.ModeDir | os.FileMode(0777)&^umask
uid, gid = util.GetFileUidGid(fileInfo)
fmt.Printf("mount point owner uid=%d gid=%d mode=%s\n", uid, gid, mountMode)
} else {
fmt.Printf("can not stat %s\n", dir)
return false
}
// detect uid, gid
if uid == 0 {
if u, err := user.Current(); err == nil {
if parsedId, pe := strconv.ParseUint(u.Uid, 10, 32); pe == nil {
uid = uint32(parsedId)
}
if parsedId, pe := strconv.ParseUint(u.Gid, 10, 32); pe == nil {
gid = uint32(parsedId)
}
fmt.Printf("current uid=%d gid=%d\n", uid, gid)
}
}
// mapping uid, gid
uidGidMapper, err := meta_cache.NewUidGidMapper(*option.uidMap, *option.gidMap)
if err != nil {
fmt.Printf("failed to parse %s %s: %v\n", *option.uidMap, *option.gidMap, err)
return false
}
// Ensure target mount point availability
skipAutofs := option.hasAutofs != nil && *option.hasAutofs
if isValid := checkMountPointAvailable(dir, skipAutofs); !isValid {
glog.Fatalf("Target mount point is not available: %s, please check!", dir)
return true
}
serverFriendlyName := strings.ReplaceAll(*option.filer, ",", "+")
// When autofs/systemd-mount is used, FsName must be "fuse" so util-linux/mount can recognize
// it as a pseudo filesystem. Otherwise, preserve the descriptive name for mount/df output.
fsName := serverFriendlyName + ":" + filerMountRootPath
if skipAutofs {
fsName = "fuse"
}
// mount fuse
fuseMountOptions := &fuse.MountOptions{
AllowOther: *option.allowOthers,
Options: option.extraOptions,
MaxBackground: 128,
MaxWrite: 1024 * 1024 * 2,
MaxReadAhead: 1024 * 1024 * 2,
IgnoreSecurityLabels: false,
RememberInodes: false,
FsName: fsName,
Name: "seaweedfs",
SingleThreaded: false,
DisableXAttrs: *option.disableXAttr,
Debug: *option.debugFuse,
EnableLocks: true,
ExplicitDataCacheControl: false,
DirectMount: true,
DirectMountFlags: 0,
//SyncRead: false, // set to false to enable the FUSE_CAP_ASYNC_READ capability
EnableAcl: true,
}
if *option.defaultPermissions {
fuseMountOptions.Options = append(fuseMountOptions.Options, "default_permissions")
}
if *option.nonempty {
fuseMountOptions.Options = append(fuseMountOptions.Options, "nonempty")
}
if *option.readOnly {
if runtime.GOOS == "darwin" {
fuseMountOptions.Options = append(fuseMountOptions.Options, "rdonly")
} else {
fuseMountOptions.Options = append(fuseMountOptions.Options, "ro")
}
}
if runtime.GOOS == "darwin" {
// https://github-wiki-see.page/m/macfuse/macfuse/wiki/Mount-Options
ioSizeMB := 1
for ioSizeMB*2 <= *option.chunkSizeLimitMB && ioSizeMB*2 <= 32 {
ioSizeMB *= 2
}
fuseMountOptions.Options = append(fuseMountOptions.Options, "daemon_timeout=600")
if runtime.GOARCH == "amd64" {
fuseMountOptions.Options = append(fuseMountOptions.Options, "noapplexattr")
}
if option.novncache != nil && *option.novncache {
fuseMountOptions.Options = append(fuseMountOptions.Options, "novncache")
}
fuseMountOptions.Options = append(fuseMountOptions.Options, "slow_statfs")
fuseMountOptions.Options = append(fuseMountOptions.Options, "volname="+serverFriendlyName)
fuseMountOptions.Options = append(fuseMountOptions.Options, fmt.Sprintf("iosize=%d", ioSizeMB*1024*1024))
}
if option.writebackCache != nil {
fuseMountOptions.EnableWriteback = *option.writebackCache
}
if option.asyncDio != nil {
fuseMountOptions.EnableAsyncDio = *option.asyncDio
}
if option.cacheSymlink != nil && *option.cacheSymlink {
fuseMountOptions.EnableSymlinkCaching = true
}
// find mount point
mountRoot := filerMountRootPath
if mountRoot != "/" && strings.HasSuffix(mountRoot, "/") {
mountRoot = mountRoot[0 : len(mountRoot)-1]
}
cacheDirForWrite := *option.cacheDirForWrite
if cacheDirForWrite == "" {
cacheDirForWrite = *option.cacheDirForRead
}
seaweedFileSystem := mount.NewSeaweedFileSystem(&mount.Option{
MountDirectory: dir,
FilerAddresses: filerAddresses,
GrpcDialOption: grpcDialOption,
FilerSigningKey: security.SigningKey(util.GetViper().GetString("jwt.filer_signing.key")),
FilerSigningExpiresAfterSec: util.GetViper().GetInt("jwt.filer_signing.expires_after_seconds"),
FilerMountRootPath: mountRoot,
Collection: *option.collection,
Replication: *option.replication,
TtlSec: int32(*option.ttlSec),
DiskType: types.ToDiskType(*option.diskType),
ChunkSizeLimit: int64(chunkSizeLimitMB) * 1024 * 1024,
ConcurrentWriters: *option.concurrentWriters,
ConcurrentReaders: *option.concurrentReaders,
CacheDirForRead: *option.cacheDirForRead,
CacheSizeMBForRead: *option.cacheSizeMBForRead,
CacheDirForWrite: cacheDirForWrite,
WriteBufferSizeMB: *option.writeBufferSizeMB,
CacheMetaTTlSec: *option.cacheMetaTtlSec,
DataCenter: *option.dataCenter,
Quota: int64(*option.collectionQuota) * 1024 * 1024,
MountUid: uid,
MountGid: gid,
MountMode: mountMode,
MountCtime: fileInfo.ModTime(),
MountMtime: time.Now(),
Umask: umask,
VolumeServerAccess: *mountOptions.volumeServerAccess,
Cipher: cipher,
UidGidMapper: uidGidMapper,
IncludeSystemEntries: *option.includeSystemEntries,
DisableXAttr: *option.disableXAttr,
IsMacOs: runtime.GOOS == "darwin",
MetadataFlushSeconds: *option.metadataFlushSeconds,
// RDMA acceleration options
RdmaEnabled: *option.rdmaEnabled,
RdmaSidecarAddr: *option.rdmaSidecarAddr,
RdmaFallback: *option.rdmaFallback,
RdmaReadOnly: *option.rdmaReadOnly,
RdmaMaxConcurrent: *option.rdmaMaxConcurrent,
RdmaTimeoutMs: *option.rdmaTimeoutMs,
DirIdleEvictSec: *option.dirIdleEvictSec,
EnableDistributedLock: option.distributedLock != nil && *option.distributedLock,
WritebackCache: option.writebackCache != nil && *option.writebackCache,
PosixDirNlink: option.posixDirNlink != nil && *option.posixDirNlink,
})
// create mount root
mountRootPath := util.FullPath(mountRoot)
mountRootParent, mountDir := mountRootPath.DirAndName()
if err = filer_pb.Mkdir(context.Background(), seaweedFileSystem, mountRootParent, mountDir, nil); err != nil {
fmt.Printf("failed to create dir %s on filer %s: %v\n", mountRoot, filerAddresses, err)
return false
}
if err := ensureBucketAllowEmptyFolders(context.Background(), seaweedFileSystem, mountRoot, bucketRootPath); err != nil {
fmt.Printf("failed to set bucket auto-remove-empty-folders policy for %s: %v\n", mountRoot, err)
return false
}
server, err := fuse.NewServer(seaweedFileSystem, dir, fuseMountOptions)
if err != nil {
glog.Fatalf("Mount fail: %v", err)
}
grace.OnInterrupt(func() {
unmount.Unmount(dir)
})
if mountOptions.fuseCommandPid != 0 {
// send a signal to the parent process to notify that the mount is ready
err = syscall.Kill(mountOptions.fuseCommandPid, syscall.SIGTERM)
if err != nil {
fmt.Printf("failed to notify parent process: %v\n", err)
return false
}
}
grpcS := pb.NewGrpcServer()
mount_pb.RegisterSeaweedMountServer(grpcS, seaweedFileSystem)
reflection.Register(grpcS)
go grpcS.Serve(montSocketListener)
err = seaweedFileSystem.StartBackgroundTasks()
if err != nil {
fmt.Printf("failed to start background tasks: %v\n", err)
return false
}
glog.V(0).Infof("mounted %s%s to %v", *option.filer, mountRoot, dir)
glog.V(0).Infof("This is SeaweedFS version %s %s %s", version.Version(), runtime.GOOS, runtime.GOARCH)
server.Serve()
// Wait for any pending background flushes (writebackCache async mode)
// before clearing caches, to prevent data loss during clean unmount.
seaweedFileSystem.WaitForAsyncFlush()
seaweedFileSystem.ClearCacheDir()
return true
}