mirror of
https://github.com/seaweedfs/seaweedfs.git
synced 2026-05-22 17:51:30 +00:00
* proto: define MountRegister/MountList and MountPeer service

  Adds the wire types for peer chunk sharing between weed mount clients:

  * filer.proto: MountRegister / MountList RPCs so each mount can heartbeat its peer-serve address into a filer-hosted registry, and refresh the list of peers. Tiny payload; the filer stores only O(fleet_size) state.
  * mount_peer.proto (new): ChunkAnnounce / ChunkLookup RPCs for the mount-to-mount chunk directory. Each fid's directory entry lives on an HRW-assigned mount; announces and lookups route to that mount.

  No behavior yet — later PRs wire the RPCs into the filer and mount. See design-weed-mount-peer-chunk-sharing.md for the full design.

* filer: add mount-server registry behind -peer.registry.enable

  Implements tier 1 of the peer chunk sharing design: an in-memory registry of live weed mount servers, keyed by peer address, refreshed by MountRegister heartbeats and served by MountList.

  * weed/filer/peer_registry.go: thread-safe map with TTL eviction; lazy sweep on List plus a background sweeper goroutine for bounded memory.
  * weed/server/filer_grpc_server_peer.go: MountRegister / MountList RPC handlers. When -peer.registry.enable is false (the default), both RPCs are silent no-ops so probing older filers is harmless.
  * -peer.registry.enable flag on weed filer; FilerOption.PeerRegistryEnabled wires it through.

  Phase 1 is single-filer (no cross-filer replication of the registry); mounts that fail over to another filer will re-register on the next heartbeat, so the registry self-heals within one TTL cycle.

  Part of the peer-chunk-sharing design; no behavior change at runtime until a later PR enables the flag on both filer and mount.

* filer: nil-safe peerRegistryEnable + registry hardening

  Addresses review feedback on PR #9131.

  * Fix: nil pointer deref in the mini cluster. FilerOptions instances constructed outside weed/command/filer.go (e.g. miniFilerOptions in mini.go) do not populate peerRegistryEnable, so dereferencing the pointer panics at Filer startup. Use the same `nil && deref` idiom already used for distributedLock / writebackCache.
  * Hardening (gemini review): registry now enforces three invariants:
    - empty peer_addr is silently rejected (no client-controlled sentinel mass-inserts)
    - TTL is capped at 1 hour so a runaway client cannot pin entries
    - new-entry count is capped at 10000 to bound memory; renewals of existing entries are always honored, so a full registry still heartbeats its existing members correctly

    Covered by new unit tests.

* filer: rename -peer.registry.enable flag to -mount.p2p

  Per review feedback: the old name "peer.registry.enable" leaked the implementation ("registry") into the CLI surface. "mount.p2p" is shorter and describes what it actually controls — whether this filer participates in mount-to-mount peer chunk sharing.

  Flag renames (all three keep default=true, idle cost is near-zero):
    -peer.registry.enable        -> -mount.p2p        (weed filer)
    -filer.peer.registry.enable  -> -filer.mount.p2p  (weed mini, weed server)

  Internal variable names (mountPeerRegistryEnable, MountPeerRegistry) keep their longer form — they describe the component, not the knob.

* filer: MountList returns DataCenter + List uses RLock

  Two review follow-ups on the mount peer registry:

  * weed/server/filer_grpc_server_mount_peer.go: MountList was dropping the DataCenter on the wire. The whole point of carrying DC separately from Rack is letting the mount-side fetcher re-rank peers by the two-level locality hierarchy (same-rack > same-DC > cross-DC); without DC in the response every remote peer collapsed to "unknown locality."
  * weed/filer/mount_peer_registry.go: List() was taking a write lock so it could lazy-delete expired entries inline. But MountList is a read-heavy RPC hit on every mount's 30 s refresh loop, and Sweep is already wired as the sole reclamation path (same pattern as the mount-side PeerDirectory). Switch List to RLock + filter, let Sweep do the map mutation, so concurrent MountList callers don't serialize on each other.

  Test updated to reflect the new contract (List no longer mutates the map; Sweep is what drops expired entries).

* mount: add peer chunk sharing options + advertise address resolver

  First cut at the peer chunk sharing wiring on the mount side. No functional behavior yet — this PR just introduces the option fields, the -peer.* flags, and the helper that resolves a reachable host:port from them. The server implementation arrives in PR #5 (gRPC service) and the fetcher in PR #7.

  * ResolvePeerAdvertiseAddr: an explicit -peer.advertise wins; else we use -peer.listen's bind host if specific; else util.DetectedHostAddress combined with the port. This is what gets registered with the filer and announced to peers, so wildcard binds no longer result in unreachable identities like "[::]:18080".
  * Option fields: PeerEnabled, PeerListen, PeerAdvertise, PeerRack. One port handles both directory RPCs and streaming chunk fetches (see PR #1 FetchChunk proto), so there is no second -peer.grpc.* flag — the old HTTP byte-transfer path is gone.
  * New flags on weed mount: -peer.enable, -peer.listen (default :18080), -peer.advertise (default auto), -peer.rack.
197 lines
11 KiB
Go
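The commit message says each fid's directory entry lives on an HRW-assigned mount, with announces and lookups routed to that mount. HRW (highest random weight, also called rendezvous hashing) lets every mount compute the same owner from the same peer list without coordination: score each (fid, peer) pair with a hash and pick the highest score. A minimal sketch follows; the FNV-1a hash, the tie-break rule and the names `hrwScore` / `hrwOwner` are assumptions, not SeaweedFS's actual choices.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hrwScore hashes the (key, node) pair. FNV-1a is an assumed choice of hash.
func hrwScore(key, node string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	h.Write([]byte{0}) // separator so "ab"+"c" and "a"+"bc" hash differently
	h.Write([]byte(node))
	return h.Sum64()
}

// hrwOwner returns the node with the highest score for key, or "" if there are no nodes.
// Equal scores are broken by node name so every caller agrees on the owner.
func hrwOwner(key string, nodes []string) string {
	bestIdx, bestScore := -1, uint64(0)
	for i, n := range nodes {
		s := hrwScore(key, n)
		if bestIdx < 0 || s > bestScore || (s == bestScore && n < nodes[bestIdx]) {
			bestIdx, bestScore = i, s
		}
	}
	if bestIdx < 0 {
		return ""
	}
	return nodes[bestIdx]
}

func main() {
	peers := []string{"10.0.0.1:18080", "10.0.0.2:18080", "10.0.0.3:18080"}
	fid := "3,01637037d6" // example fid (volume id, then needle key and cookie)
	fmt.Println("directory owner:", hrwOwner(fid, peers))
}
```

The property that suits a changing mount fleet: removing a peer only moves the fids that peer owned, because every other fid's highest score is unaffected, and adding a peer only moves the fids the newcomer now wins.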
package command

import (
	"os"
	"time"
)

type MountOptions struct {
	filer                *string
	filerMountRootPath   *string
	dir                  *string
	dirAutoCreate        *bool
	collection           *string
	collectionQuota      *int
	replication          *string
	diskType             *string
	ttlSec               *int
	chunkSizeLimitMB     *int
	concurrentWriters    *int
	concurrentReaders    *int
	cacheMetaTtlSec      *int
	cacheDirForRead      *string
	cacheDirForWrite     *string
	cacheSizeMBForRead   *int64
	writeBufferSizeMB    *int64
	dataCenter           *string
	allowOthers          *bool
	defaultPermissions   *bool
	umaskString          *string
	nonempty             *bool
	volumeServerAccess   *string
	uidMap               *string
	gidMap               *string
	readOnly             *bool
	includeSystemEntries *bool
	debug                *bool
	debugPort            *int
	debugFuse            *bool
	localSocket          *string
	disableXAttr         *bool
	extraOptions         []string
	fuseCommandPid       int

	// Periodic metadata flush to protect against orphan chunk cleanup
	metadataFlushSeconds *int

	// RDMA acceleration options
	rdmaEnabled       *bool
	rdmaSidecarAddr   *string
	rdmaFallback      *bool
	rdmaReadOnly      *bool
	rdmaMaxConcurrent *int
	rdmaTimeoutMs     *int

	// Peer chunk sharing options (design-weed-mount-peer-chunk-sharing.md).
	peerEnabled    *bool
	peerListen     *string
	peerAdvertise  *string
	peerDataCenter *string
	peerRack       *string

	dirIdleEvictSec *int

	// Distributed lock for cross-mount write coordination
	distributedLock *bool

	// POSIX compliance options
	posixDirNlink *bool

	// FUSE performance options
	writebackCache *bool
	asyncDio       *bool
	cacheSymlink   *bool

	// macOS-specific FUSE options
	novncache *bool

	// if true, we assume autofs exists over current mount point. Autofs (the kernel one, used by systemd automount)
	// is expected to be mounted as a shim between auto-mounted fs and original mount point to provide auto mount.
	// with this option, we ignore autofs mounted on the same point.
	hasAutofs *bool
}

var (
	mountOptions       MountOptions
	mountCpuProfile    *string
	mountMemProfile    *string
	mountReadRetryTime *time.Duration
)

func init() {
	cmdMount.Run = runMount // break init cycle
	mountOptions.filer = cmdMount.Flag.String("filer", "localhost:8888", "comma-separated weed filer location")
	mountOptions.filerMountRootPath = cmdMount.Flag.String("filer.path", "/", "mount this remote path from filer server")
	mountOptions.dir = cmdMount.Flag.String("dir", ".", "mount weed filer to this directory")
	mountOptions.dirAutoCreate = cmdMount.Flag.Bool("dirAutoCreate", false, "auto create the directory to mount to")
	mountOptions.collection = cmdMount.Flag.String("collection", "", "collection to create the files in")
	mountOptions.collectionQuota = cmdMount.Flag.Int("collectionQuotaMB", 0, "quota for the collection")
	mountOptions.replication = cmdMount.Flag.String("replication", "", "replication (e.g. 000, 001) for created files. If empty, let filer decide.")
	mountOptions.diskType = cmdMount.Flag.String("disk", "", "[hdd|ssd|<tag>] hard drive or solid state drive or any tag")
	mountOptions.ttlSec = cmdMount.Flag.Int("ttl", 0, "file ttl in seconds")
	mountOptions.chunkSizeLimitMB = cmdMount.Flag.Int("chunkSizeLimitMB", 2, "local write buffer size, also chunk large files")
	mountOptions.concurrentWriters = cmdMount.Flag.Int("concurrentWriters", 128, "limit concurrent goroutine writers")
	mountOptions.concurrentReaders = cmdMount.Flag.Int("concurrentReaders", 128, "limit concurrent chunk fetches for read operations")
	mountOptions.cacheDirForRead = cmdMount.Flag.String("cacheDir", os.TempDir(), "local cache directory for file chunks and meta data")
	mountOptions.cacheSizeMBForRead = cmdMount.Flag.Int64("cacheCapacityMB", 128, "file chunk read cache capacity in MB")
	mountOptions.cacheDirForWrite = cmdMount.Flag.String("cacheDirWrite", "", "buffer writes mostly for large files")
	mountOptions.writeBufferSizeMB = cmdMount.Flag.Int64("writeBufferSizeMB", 0, "global cap on the per-mount write buffer (memory + swap) in MB, 0 means unlimited. Bounds /tmp growth when volume uploads stall")
	mountOptions.cacheMetaTtlSec = cmdMount.Flag.Int("cacheMetaTtlSec", 60, "metadata cache validity seconds")
	mountOptions.dataCenter = cmdMount.Flag.String("dataCenter", "", "prefer to write to the data center")
	mountOptions.allowOthers = cmdMount.Flag.Bool("allowOthers", true, "allows other users to access the file system")
	mountOptions.defaultPermissions = cmdMount.Flag.Bool("defaultPermissions", true, "enforce permissions by the operating system")
	mountOptions.umaskString = cmdMount.Flag.String("umask", "022", "octal umask, e.g., 022, 0111")
	mountOptions.nonempty = cmdMount.Flag.Bool("nonempty", false, "allows the mounting over a non-empty directory")
	mountOptions.volumeServerAccess = cmdMount.Flag.String("volumeServerAccess", "direct", "access volume servers by [direct|publicUrl|filerProxy]")
	mountOptions.uidMap = cmdMount.Flag.String("map.uid", "", "map local uid to uid on filer, comma-separated <local_uid>:<filer_uid>")
	mountOptions.gidMap = cmdMount.Flag.String("map.gid", "", "map local gid to gid on filer, comma-separated <local_gid>:<filer_gid>")
	mountOptions.readOnly = cmdMount.Flag.Bool("readOnly", false, "read only")
	mountOptions.includeSystemEntries = cmdMount.Flag.Bool("includeSystemEntries", false, "show filer system entries (e.g. /topics, /etc) in directory listings")
	mountOptions.debug = cmdMount.Flag.Bool("debug", false, "serves runtime profiling data, e.g., http://localhost:<debug.port>/debug/pprof/goroutine?debug=2")
	mountOptions.debugPort = cmdMount.Flag.Int("debug.port", 6061, "http port for debugging")
	mountOptions.debugFuse = cmdMount.Flag.Bool("debug.fuse", false, "log raw FUSE protocol requests and responses")
	mountOptions.localSocket = cmdMount.Flag.String("localSocket", "", "default to /tmp/seaweedfs-mount-<mount_dir_hash>.sock")
	mountOptions.disableXAttr = cmdMount.Flag.Bool("disableXAttr", false, "disable xattr")
	mountOptions.hasAutofs = cmdMount.Flag.Bool("autofs", false, "ignore autofs mounted on the same mountpoint (useful when systemd.automount and autofs is used)")
	mountOptions.fuseCommandPid = 0

	// Periodic metadata flush to protect against orphan chunk cleanup
	mountOptions.metadataFlushSeconds = cmdMount.Flag.Int("metadataFlushSeconds", 120, "periodically flush file metadata to filer in seconds (0 to disable). This protects chunks from being purged by volume.fsck for long-running writes")

	// RDMA acceleration flags
	mountOptions.rdmaEnabled = cmdMount.Flag.Bool("rdma.enabled", false, "enable RDMA acceleration for reads")
	mountOptions.rdmaSidecarAddr = cmdMount.Flag.String("rdma.sidecar", "", "RDMA sidecar address (e.g., localhost:8081)")
	mountOptions.rdmaFallback = cmdMount.Flag.Bool("rdma.fallback", true, "fallback to HTTP when RDMA fails")
	mountOptions.rdmaReadOnly = cmdMount.Flag.Bool("rdma.readOnly", false, "use RDMA for reads only (writes use HTTP)")
	mountOptions.rdmaMaxConcurrent = cmdMount.Flag.Int("rdma.maxConcurrent", 64, "max concurrent RDMA operations")
	mountOptions.rdmaTimeoutMs = cmdMount.Flag.Int("rdma.timeoutMs", 5000, "RDMA operation timeout in milliseconds")

	// Peer chunk sharing flags.
	mountOptions.peerEnabled = cmdMount.Flag.Bool("peer.enable", false, "opt in to peer chunk sharing — mount serves its chunk cache to other mounts and fetches from peers instead of volume servers when available")
	mountOptions.peerListen = cmdMount.Flag.String("peer.listen", ":18080", "bind address for peer gRPC (directory RPCs + FetchChunk streaming)")
	mountOptions.peerAdvertise = cmdMount.Flag.String("peer.advertise", "", "externally-reachable host:port other mounts use to reach this one (defaults to autodetected host + -peer.listen port)")
	mountOptions.peerDataCenter = cmdMount.Flag.String("peer.dataCenter", "", "optional data-center label advertised to peers; used with -peer.rack for two-level locality ranking")
	mountOptions.peerRack = cmdMount.Flag.String("peer.rack", "", "optional rack label advertised to peers")

	mountOptions.dirIdleEvictSec = cmdMount.Flag.Int("dirIdleEvictSec", 600, "seconds to evict idle cached directories (0 to disable)")

	mountCpuProfile = cmdMount.Flag.String("cpuprofile", "", "cpu profile output file")
	mountMemProfile = cmdMount.Flag.String("memprofile", "", "memory profile output file")
	mountReadRetryTime = cmdMount.Flag.Duration("readRetryTime", 6*time.Second, "maximum read retry wait time")

	// Distributed lock for cross-mount write coordination
	mountOptions.distributedLock = cmdMount.Flag.Bool("dlm", false, "enable distributed lock for cross-mount write coordination (only one mount can write a file at a time)")

	// POSIX compliance options
	mountOptions.posixDirNlink = cmdMount.Flag.Bool("posix.dirNLink", false, "report POSIX-compliant directory nlink (2 + subdirectory count); costs one directory listing per stat")

	// FUSE performance options
	mountOptions.writebackCache = cmdMount.Flag.Bool("writebackCache", false, "enable FUSE writeback cache for improved write performance (at risk of data loss on crash)")
	mountOptions.asyncDio = cmdMount.Flag.Bool("asyncDio", false, "enable async direct I/O for better concurrency")
	mountOptions.cacheSymlink = cmdMount.Flag.Bool("cacheSymlink", false, "enable symlink caching to reduce metadata lookups")

	// macOS-specific FUSE options
	mountOptions.novncache = cmdMount.Flag.Bool("sys.novncache", false, "(macOS only) disable vnode name caching to avoid stale data")
}

var cmdMount = &Command{
	UsageLine: "mount -filer=localhost:8888 -dir=/some/dir",
	Short:     "mount weed filer to a directory as file system in userspace (FUSE)",
	Long: `mount weed filer to userspace.

  Pre-requisites:
  1) have SeaweedFS master and volume servers running
  2) have a "weed filer" running
  These 2 requirements can be achieved with one command "weed server -filer=true"

  This uses github.com/seaweedfs/fuse, which enables writing FUSE file systems on
  Linux and OS X.

  On OS X, it requires OSXFUSE (https://osxfuse.github.io/).

  RDMA Acceleration:
  For ultra-fast reads, enable RDMA acceleration with an RDMA sidecar:
    weed mount -filer=localhost:8888 -dir=/mnt/seaweedfs \
      -rdma.enabled=true -rdma.sidecar=localhost:8081

  RDMA Options:
    -rdma.enabled=false       Enable RDMA acceleration for reads
    -rdma.sidecar=""          RDMA sidecar address (required if enabled)
    -rdma.fallback=true       Fallback to HTTP when RDMA fails
    -rdma.readOnly=false      Use RDMA for reads only (writes use HTTP)
    -rdma.maxConcurrent=64    Max concurrent RDMA operations
    -rdma.timeoutMs=5000      RDMA operation timeout in milliseconds

  `,
}
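The `-peer.listen` (default `:18080`) and `-peer.advertise` flags registered in `init` feed the advertise-address resolution described in the commit message: an explicit advertise address wins, a specific bind host is used as-is, and a wildcard bind falls back to the detected host address plus the listen port. The sketch below illustrates that order with the standard library only; `resolveAdvertiseAddr` is a hypothetical stand-in for `ResolvePeerAdvertiseAddr`, and the injected `detectHost` function stands in for `util.DetectedHostAddress`, whose exact signature is assumed here.

```go
package main

import (
	"fmt"
	"net"
)

// resolveAdvertiseAddr is a hypothetical stand-in for ResolvePeerAdvertiseAddr.
// detectHost is an assumed substitute for util.DetectedHostAddress.
func resolveAdvertiseAddr(advertise, listen string, detectHost func() string) (string, error) {
	// 1. An explicit -peer.advertise value always wins.
	if advertise != "" {
		return advertise, nil
	}
	host, port, err := net.SplitHostPort(listen)
	if err != nil {
		return "", fmt.Errorf("bad -peer.listen %q: %w", listen, err)
	}
	// 2. A specific bind host (not "", 0.0.0.0 or ::) is reachable as-is.
	//    Hostnames do not parse as IPs and are treated as specific.
	if host != "" {
		if ip := net.ParseIP(host); ip == nil || !ip.IsUnspecified() {
			return net.JoinHostPort(host, port), nil
		}
	}
	// 3. Wildcard bind: pair the detected host address with the listen port,
	//    so peers never receive an unreachable identity like "[::]:18080".
	return net.JoinHostPort(detectHost(), port), nil
}

func main() {
	detect := func() string { return "192.168.1.20" }
	for _, listen := range []string{":18080", "[::]:18080", "10.1.2.3:18080"} {
		addr, _ := resolveAdvertiseAddr("", listen, detect)
		fmt.Println(listen, "->", addr)
	}
	// :18080 -> 192.168.1.20:18080
	// [::]:18080 -> 192.168.1.20:18080
	// 10.1.2.3:18080 -> 10.1.2.3:18080
}
```

`net.JoinHostPort` brackets IPv6 hosts automatically, so a detected address such as `fe80::1` becomes `[fe80::1]:18080` without special-casing.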