122 Commits

Author SHA1 Message Date
Chris Lu
f51468cf73 Revert #9443 — heartbeat peer binding breaks hostname-based clusters (#9474)
Revert "master: bind heartbeat claims to the connecting peer (#9443)"

This reverts commit f28c7ce6df.

The strict heartbeat-ip-vs-peer match in authorizeHeartbeatPeer rejects
every hostname-based deployment. In docker-compose / k8s the volume
server is started with -ip=<service-name> and the gRPC peer surfaces
as the container/pod IP, so the two never match and every heartbeat
fails with `heartbeat ip "volume" does not match peer "172.18.0.3"`.
The master therefore never learns about any volume, growth fails, and
fio writes against the mount return EIO.

After the #9440 revert merged (43a8c4fdc), the e2e workflow is still
failing for this reason; see
https://github.com/seaweedfs/seaweedfs/actions/runs/25767265775 .

Reverting to unblock e2e. A narrower re-do should accept the heartbeat
when heartbeat.Ip resolves (DNS) to the peer address, so the spoof
hardening can return without breaking hostname-based clusters.
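
The narrower re-do proposed above could look roughly like this (hypothetical helper name, a sketch rather than actual SeaweedFS code): accept the heartbeat when the claimed Ip is literally the peer address, or is a hostname that resolves to it.

```go
package main

import (
	"context"
	"net"
	"time"
)

// heartbeatIPMatchesPeer is a hypothetical sketch of the relaxed check:
// the claimed heartbeat IP is accepted when it equals the peer address,
// or when it is a hostname (docker-compose service name, k8s pod DNS
// name, ...) that resolves to the peer address.
func heartbeatIPMatchesPeer(hbIP, peerIP string) bool {
	if hbIP == peerIP {
		return true
	}
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	addrs, err := net.DefaultResolver.LookupHost(ctx, hbIP)
	if err != nil {
		return false // not resolvable: keep rejecting the spoofed claim
	}
	for _, a := range addrs {
		if a == peerIP {
			return true
		}
	}
	return false
}
```

A deployment started with -ip=volume would then pass as long as "volume" resolves (via the container DNS) to the connecting pod IP, while arbitrary spoofed literals still fail.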
2026-05-12 18:22:21 -07:00
Chris Lu
f28c7ce6df master: bind heartbeat claims to the connecting peer (#9443)
SendHeartbeat used to accept whatever Ip/Port/Volumes the caller put on
the wire. Three changes tighten that:

- Reject heartbeats whose Ip does not match the gRPC peer's source
  address. Loopback peers are still trusted; operators behind a proxy
  can opt out with -master.allowUntrustedHeartbeat.
- Track which (ip, port) first claimed a volume id or an ec shard slot
  and drop foreign re-claims. Non-EC volume claims are bounded by the
  replica copy count so legitimate replicas still register. EC
  ownership is keyed by (vid, shard_id) so the same vid can legitimately
  be split across many peers as long as their EcIndexBits are disjoint;
  rejected bits are cleared from the bitmap and the parallel ShardSizes
  array is compacted in lock-step.
- Maintain reverse indexes owner -> volumes and owner -> ec shard slots
  so disconnect cleanup is O(M) in the number of ids that peer held, not O(N)
  over the whole map.

Bindings are also released when a heartbeat reports that the peer no
longer holds an id, either via explicit Deleted{Volumes,EcShards}
entries or by omitting it from a full snapshot. Without this, a planned
rebalance that moved a vid or an ec shard from peer A to peer B would
leave B's heartbeats permanently filtered out until A disconnected,
breaking ec encode/decode flows that delete shards on the source as
soon as the move completes.

The (vid -> owners) binding still does not track which replica slot
each peer occupies, so the first N claims under the copy count win;
strict per-slot mapping is a follow-up.
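
The (vid, shard_id) keying can be sketched like this (illustrative types and names, not the actual SeaweedFS topology code): each set bit in a claimed shard bitmap is accepted only if the shard is unclaimed or already owned by the claiming peer, mirroring the "rejected bits are cleared from the bitmap" behavior.

```go
package main

// shardKey keys EC ownership by (vid, shardId), so the same vid can be
// split across many peers as long as their claimed bits are disjoint.
type shardKey struct {
	vid     uint32
	shardID uint8
}

type ecOwners struct {
	owner map[shardKey]string // (vid, shard) -> "ip:port" that first claimed it
}

// claim filters claimedBits: a bit is kept when the shard is unclaimed
// or this peer already owns it; bits owned by another peer are cleared.
func (o *ecOwners) claim(peer string, vid uint32, claimedBits uint32) (accepted uint32) {
	for shard := uint8(0); shard < 32; shard++ {
		bit := uint32(1) << shard
		if claimedBits&bit == 0 {
			continue
		}
		k := shardKey{vid: vid, shardID: shard}
		if owner, ok := o.owner[k]; ok && owner != peer {
			continue // foreign re-claim: reject this bit
		}
		o.owner[k] = peer
		accepted |= bit
	}
	return accepted
}
```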
2026-05-12 15:38:52 -07:00
Chris Lu
d605feb403 refactor(command): expand "~" in all path-style CLI flags (#9306)
* refactor(command): expand "~" in all path-style CLI flags

Many of weed's path-bearing flags (-s3.config, -s3.iam.config,
-admin.dataDir, -webdav.cacheDir, -volume.dir.idx, TLS cert/key
files, profile output paths, mount cache dirs, sftp key files, ...)
were never run through util.ResolvePath, so a value like "~/iam.json"
was used literally. Tilde only worked when the shell expanded it,
which silently fails for the common -flag=~/path form (bash leaves
the tilde literal in --opt=~/path).

- Extend util.ResolvePath to also handle "~user" / "~user/rest",
  matching shell tilde expansion. Add unit tests.
- Apply util.ResolvePath at the top of each shared start* function
  (s3, webdav, sftp) so mini/server/filer/standalone callers all
  inherit it; resolve at the few one-off use sites (mount cache
  dirs, volume idx folder, mini admin.dataDir, profile paths).
- Drop the duplicate expandHomeDir helper from admin.go in favor of
  the now-equivalent util.ResolvePath.

* fixup: handle comma-separated -dir flags for tilde expansion

`weed mini -dir`, `weed server -dir`, and `weed volume -dir` accept
comma-separated paths (`dir[,dir]...`). Calling util.ResolvePath on
the whole string mishandled multi-folder values with tilde, e.g.
"~/d1,~/d2" would resolve as if "d1,~/d2" were a single subpath.

- Add util.ResolveCommaSeparatedPaths: split on ",", run each entry
  through ResolvePath, rejoin. Short-circuits when no "~" present.
- Use it for *miniDataFolders (mini.go), *volumeDataFolders (server.go),
  and resolve each entry of v.folders in-place (volume.go) so all
  downstream consumers see resolved paths.
- Add 7-case TestResolveCommaSeparatedPaths covering empty, single,
  multiple, and mixed inputs.

* address PR review: metaFolder + Windows backslash

- master.go: resolve *m.metaFolder at the top of runMaster so
  util.FullPath(*m.metaFolder) on the next line sees an expanded
  path. Drop the now-redundant ResolvePath in TestFolderWritable.
- server.go: same treatment for *masterOptions.metaFolder, paired
  with the existing cpu/mem profile resolves. Drop the redundant
  inner ResolvePath at TestFolderWritable.
- file_util.go: ResolvePath now accepts filepath.Separator as a
  separator after the tilde, so "~\\data" works on Windows. Other
  platforms keep current behaviour (backslash stays literal because
  it is a valid filename character in usernames and paths).
- file_util_test.go: add two cases using filepath.Separator that
  exercise the new code path on Windows and remain a no-op on Unix.

* address PR review: resolve "~" in remaining command path flags

Comprehensive sweep of path-bearing flags across every weed
subcommand, applying util.ResolvePath in-place at the top of each
run* function so all downstream consumers see expanded paths.

- webdav.go: resolve *wo.cacheDir at the top of startWebDav so
  mini/server/filer/standalone callers all inherit it.
- mount_std.go: cpu/mem profile paths.
- filer_sync.go: cpu/mem profile paths.
- mq_broker.go: cpu/mem profile paths.
- benchmark.go: cpuprofile output path.
- backup.go: -dir resolved once at runBackup; drop the duplicated
  inline ResolvePath in NewVolume calls.
- compact.go: -dir resolved at runCompact; drop inline ResolvePath.
- export.go: -dir and -o resolved at runExport; drop inline
  ResolvePath in LoadFromIdx and ScanVolumeFile.
- download.go: -dir resolved at runDownload; drop inline.
- update.go: -dir resolved at runUpdate so filepath.Join uses the
  expanded path; drop inline ResolvePath in TestFolderWritable.
- scaffold.go: -output expanded before filepath.Join.
- worker.go: -workingDir expanded before being passed to runtime.

* address PR review: resolve option-struct paths at run* entry points

server.go:381 propagates s3Options.config to filerOptions.s3ConfigFile
*before* startS3Server runs, which meant the filer-side code saw the
unresolved tilde-prefixed pointer. Same pattern for webdavOptions and
sftpOptions (and equivalent in mini.go / filer.go).

The fix: hoist resolution from the shared start* functions up to the
run* entry points, where every shared pointer is set up before any
propagation happens.

- s3.go, webdav.go, sftp.go: extract a resolvePaths() method on each
  Options struct that runs every path field through util.ResolvePath
  in-place. Idempotent.
- runS3, runWebDav, runSftp: call the standalone struct's resolvePaths
  before starting metrics / loading security config.
- runServer, runMini, runFiler: call resolvePaths on every embedded
  options struct, plus resolve loose flags (serverIamConfig,
  miniS3Config, miniIamConfig, miniMasterOptions.metaFolder, and
  filer's defaultLevelDbDirectory) so they're expanded before any
  pointer copy or use.
- Drop the now-redundant inline ResolvePath at filer's
  defaultLevelDbDirectory composition.

* address PR review: re-resolve mini -dir post-config, cover misc paths

- mini.go: applyConfigFileOptions can overwrite -dir with a literal
  ~/data from mini.options. Re-resolve *miniDataFolders after the
  config-file apply, alongside the other path resolves, so the mini
  filer no longer ends up with a literal ~/data/filerldb2.
- benchmark.go: resolve *b.idListFile (-list).
- filer_sync.go: resolve *syncOptions.aSecurity / .bSecurity
  (-a.security / -b.security) before LoadClientTLSFromFile.
- filer_cat.go: resolve *filerCat.output (-o) before os.OpenFile.
- admin.go: drop trailing blank line at EOF (git diff --check).

* address PR review: resolve -a.security/-b.security/-config before use

Four follow-up fixes:

- filer_sync.go: the -a.security / -b.security resolves were placed
  *after* LoadClientTLSFromFile / LoadHTTPClientFromFile were called,
  so weed filer.sync -a.security=~/a.toml still passed the literal
  tilde path. Hoist the resolves above the security-loading block so
  TLS clients see expanded paths.
- filer_sync_verify.go: same flag pair was never resolved at all in
  the verify command; resolve at the top of runFilerSyncVerify.
- filer_meta_backup.go: -config (the backup_filer.toml path) was
  passed directly to viper. Resolve at the top of runFilerMetaBackup.
- mini.go: master.dir defaulted to the entire comma-joined
  miniDataFolders. With weed mini -dir=~/d1,~/d2 (or any multi-dir
  setup), TestFolderWritable then stat'd the joined string instead
  of a single directory. Default to the first entry via StringSplit
  to mirror the disk-space calculation a few lines below, and drop
  the now-redundant ResolvePath in TestFolderWritable.
2026-05-03 21:46:21 -07:00
Lars Lehtonen
29e14f89f1 fix(weed/command) address unhandled errors (#9208)
* fix(weed/command) address unhandled errors

* fix(command): don't log graceful-shutdown sentinels; plug response-body leak

- s3: Serve on unix socket treated http.ErrServerClosed as fatal; now
  excluded like the other Serve/ServeTLS paths in this file.
- mq_agent, mq_broker: filter grpc.ErrServerStopped so clean shutdown
  doesn't log as an error.
- worker_runtime: the added decodeErr early-continue skipped
  resp.Body.Close(); drop it since the existing check below already
  surfaces the decode error.
- mount_std: the pre-mount Unmount commonly fails when nothing is
  mounted; demote to V(1) Infof.
- fuse_std: tidy panic message to match sibling cases.

* fix(mq_broker): filter grpc.ErrServerStopped on localhost listener

The localhost listener goroutine logged any Serve error unconditionally,
which includes grpc.ErrServerStopped on graceful shutdown. Match the
main listener's check so clean stops don't surface as errors.

---------

Co-authored-by: Chris Lu <chris.lu@gmail.com>
2026-04-23 22:15:05 -07:00
Chris Lu
9ae905e456 feat(security): hot-reload HTTPS certs without restart (k8s cert-manager) (#9181)
* feat(security): hot-reload HTTPS certs for master/volume/filer/webdav/admin

S3 and filer already use a refreshing pemfile provider for their HTTPS
cert, so rotated certificates (e.g. from k8s cert-manager) are picked up
without a restart. Master, volume, webdav, and admin, however, passed
cert/key paths straight to ServeTLS/ListenAndServeTLS and loaded once at
startup — rotating those certs required a pod restart.

Add a small helper NewReloadingServerCertificate in weed/security that
wraps pemfile.Provider and returns a tls.Config.GetCertificate closure,
then wire it into the four remaining HTTPS entry points. httpdown now
also calls ServeTLS when TLSConfig carries a GetCertificate/Certificates
but CertFile/KeyFile are empty, so volume server can pre-populate
TLSConfig.

A unit test exercises the rotation path (write cert, rotate on disk,
assert the callback returns the new cert) with a short refresh window.

* refactor(security): route filer/s3 HTTPS through the shared cert reloader

Before: filer.go and s3.go each kept a *certprovider.Provider on the
options struct plus a duplicated GetCertificateWithUpdate method. Both
were loading pemfile themselves. Behaviorally they already reloaded, but
the logic was duplicated two ways and neither path was shared with the
newly-added master/volume/webdav/admin wiring.

After: both use security.NewReloadingServerCertificate like the other
servers. The per-struct certProvider field and GetCertificateWithUpdate
method are removed, along with the now-unused certprovider and pemfile
imports. Net: -32 lines, one code path for all HTTPS cert reloading.

No behavior change — the refresh window, cache, and handshake contract
are identical (the helper wraps the same pemfile.NewProvider).

* feat(security): hot-reload HTTPS client certs for mount/backup/upload/etc

The HTTP client in weed/util/http/client loaded the mTLS client cert
once at startup via tls.LoadX509KeyPair. That left every long-lived
HTTPS client process (weed mount, backup, filer.copy, filer→volume,
s3→filer/volume) unable to pick up a rotated client cert without a
restart — even though the same cert-manager setup was already rotating
the server side fine.

Swap the client cert loader for a tls.Config.GetClientCertificate
callback backed by the same refreshing pemfile provider. New TLS
handshakes pick up the rotated cert; in-flight pooled connections keep
their old cert and drop as normal transport churn happens.

To keep this reusable from both server and client TLS code without an
import cycle (weed/security already imports weed/util/http/client for
LoadHTTPClientFromFile), extract the pemfile wrapper into a new
weed/security/certreload subpackage. weed/security keeps its thin
NewReloadingServerCertificate wrapper. The existing unit test moves
with the implementation.

gRPC mTLS was already handled by security.LoadServerTLS /
LoadClientTLS; this PR does not change any gRPC paths. MQ broker, MQ
agent, Kafka gateway, and FUSE mount control plane are gRPC-only and
therefore already rotate.

CA bundles (ClientCAs / RootCAs / grpc.ca) are still loaded once — noted
as a known limitation in the wiki.

* fix(security): address PR review feedback on cert reloader

Bots (gemini-code-assist + coderabbit) flagged three real issues and a
couple of nits. Addressing them here:

1. KeyMaterial used context.Background(). The grpc pemfile provider's
   KeyMaterial blocks until material arrives or the context deadline
   expires; with Background() a slow disk could hang the TLS handshake
   indefinitely. Switched both the server and client callbacks to use
   hello.Context() / cri.Context() so a stuck read is bounded by the
   handshake timeout.

2. Admin server loaded TLS inside the serve goroutine. If the cert was
   bad, the goroutine returned but startAdminServer kept blocking on
   <-ctx.Done() with no listener, making the process look healthy with
   nothing bound. Moved TLS setup to run before the goroutine starts
   and propagate errors via fmt.Errorf; also captures the provider and
   defers Close().

3. HTTP client discarded the certprovider.Provider from
   NewClientGetCertificate. That leaked the refresh goroutine, and
   NewHttpClientWithTLS had a worse case where a CA-file failure after
   provider creation orphaned the provider entirely. Added a
   certProvider field and a Close() method on HTTPClient, and made
   the constructors close the provider on subsequent error paths.

4. Server-side paths (master/volume/filer/s3/webdav/admin) now retain
   the provider. filer and webdav run ServeTLS synchronously, so a
   plain defer works. master/volume/s3 dispatch goroutines and return
   while the server keeps running, so they hook Close() into
   grace.OnInterrupt.

5. Test: certreload_test now tolerates transient read/parse errors
   during file rotation (writeSelfSigned rewrites cert before key) and
   reports the last error only if the deadline expires.

No user-visible behavior change for the happy path.

* test(tls): add end-to-end HTTPS cert rotation integration test

Boots a real `weed master` with HTTPS enabled, captures the leaf cert
served at TLS handshake time, atomically rewrites the cert/key files
on disk (the same rename-in-place pattern kubelet does when it swaps
a cert-manager Secret), and asserts that a subsequent TLS handshake
observes the rotated leaf — with no process restart, no SIGHUP, no
reloader sidecar. Verifies the full path: on-disk change → pemfile
refresh tick → provider.KeyMaterial → tls.Config.GetCertificate →
server TLS handshake.

Runtime is ~1s by exposing the reloader's refresh window as an env
var (WEED_TLS_CERT_REFRESH_INTERVAL) and setting it to 500ms for the
test. The same env var is user-facing — documented in the wiki — so
operators running short-lived certs (Vault, cert-manager with
duration: 24h, etc.) can tighten the rotation-pickup window without a
rebuild. Defaults to 5h to preserve prior behavior.

security.CredRefreshingInterval is kept for API compatibility but now
aliases certreload.DefaultRefreshInterval so the same env controls
both gRPC mTLS and HTTPS reload.

* ci(tls): wire the TLS rotation integration test into GitHub Actions

Mirrors the existing vacuum-integration-tests.yml shape: Ubuntu runner,
Go 1.25, build weed, run `go test` in test/tls_rotation, upload master
logs on failure. 10-minute job timeout; the test itself finishes in
about a second because WEED_TLS_CERT_REFRESH_INTERVAL is set to 500ms
inside the test.

Runs on every push to master and on every PR to master.

* fix(tls): address follow-up PR review comments

Three new comments on the integration test + volume shutdown path:

1. Test: peekServerCert was swallowing every dial/handshake error,
   which meant waitForCert's "last err: <nil>" fatal message lost all
   diagnostic value. Thread errors back through: peekServerCert now
   returns (*x509.Certificate, error), and waitForCert records the
   latest error so a CI flake points at the actual cause (master
   didn't come up, handshake rejected, CA pool mismatch, etc.).

2. Test: set HOME=<tempdir> on the master subprocess. Viper today
   registers the literal path "$HOME/.seaweedfs" without env
   expansion, so a developer's ~/.seaweedfs/security.toml is
   accidentally invisible — the test was relying on that. Pinning
   HOME is belt-and-braces against a future viper upgrade that does
   expand env vars.

3. volume.go: startClusterHttpService's provider close was registered
   via grace.OnInterrupt, which fires on SIGTERM but NOT on the
   v.shutdownCtx.Done() path used by mini / integration tests. The
   pemfile refresh goroutine leaked in that shutdown path. Now the
   helper returns a close func and the caller invokes it on BOTH
   shutdown paths for parity.

Also add MinVersion: TLS 1.2 to the test's tls.Config to quiet the
ast-grep static-analysis nit — zero-risk since the pool only trusts
our in-memory CA.

Test runs clean 3/3.
2026-04-21 20:20:11 -07:00
Chris Lu
9d15705c16 fix(mini): shut down admin/s3/webdav/filer before volume/master on Ctrl+C (#9112)
* fix(mini): shut down admin/s3/webdav/filer before volume/master on Ctrl+C

Interrupts fired grace hooks in registration order, so master (started
first) shut down before its clients, producing heartbeat-canceled errors
and masterClient reconnection noise during weed mini shutdown. Admin/s3/
webdav had no interrupt hooks at all and were killed at os.Exit.

- grace: execute interrupt hooks in LIFO (defer-style) order so later-
  started services tear down first.
- filer: consolidate the three separate interrupt hooks (gRPC / HTTP /
  DB) into one that runs in order, so filer shutdown stays correct
  independent of FIFO/LIFO semantics.
- mini: add MiniClientsShutdownCtx (separate from test-facing
  MiniClusterCtx) plus an OnMiniClientsShutdown helper. Admin, S3,
  WebDAV and the maintenance worker observe it; runMini registers a
  cancel hook after startup so under LIFO it fires first and waits up to
  10s on a WaitGroup for those services to drain before filer, volume,
  and master shut down.

Resulting order on Ctrl+C: admin/s3/webdav/worker -> filer (gRPC -> HTTP
-> DB) -> volume -> master.
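
The LIFO hook execution can be sketched as follows (illustrative type; the real grace package also handles OS signals and once-only semantics):

```go
package main

import "sync"

// hookList runs registered interrupt hooks in reverse registration
// order, defer-style, so later-started services tear down first.
type hookList struct {
	mu    sync.Mutex
	hooks []func()
}

func (h *hookList) OnInterrupt(fn func()) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.hooks = append(h.hooks, fn)
}

func (h *hookList) runOnInterrupt() {
	h.mu.Lock()
	defer h.mu.Unlock()
	for i := len(h.hooks) - 1; i >= 0; i-- { // LIFO
		h.hooks[i]()
	}
}
```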

* refactor(mini): group mini-client shutdown into one state struct

The first pass spread the shutdown plumbing across three globals
(MiniClientsShutdownCtx, miniClientsWg, cancelMiniClients) and two
ctx-derivation sites (OnMiniClientsShutdown and startMiniAdminWithWorker).

Group into a private miniClientsState (ctx/cancel/wg) rebuilt per runMini
invocation, and chain its ctx from MiniClusterCtx so clients only observe
one signal. Tests that cancel MiniClusterCtx still trigger client
shutdown via parent-child propagation.

- resetMiniClients() installs fresh state at the top of runMini, so
  in-process test reruns don't inherit stale ctx/wg.
- onMiniClientsShutdown(fn) replaces the exported OnMiniClientsShutdown
  and only observes one ctx.
- trackMiniClient() replaces the manual wg.Add/Done dance for the admin
  goroutine.
- miniClientsCtx() gives the admin startup a ctx without re-deriving.
- triggerMiniClientsShutdown(timeout) is the interrupt hook body.

No behaviour change; existing tests pass.

* refactor: generalize shutdown ctx as an option, not a mini-specific helper

Several service files (s3, webdav, filer, master, volume) observed the
mini-specific MiniClusterCtx or called onMiniClientsShutdown directly.
That leaked mini orchestration into code that also runs under weed s3,
weed webdav, weed filer, weed master, and weed volume standalone.

Replace with a generic `shutdownCtx context.Context` field on each
service's Options struct. When non-nil, the server watches it and shuts
down gracefully; when nil (standalone), the shutdown path is a no-op.

Mini wires the contexts up from a single place (runMini):
 - miniMasterOptions/miniOptions.v/miniFilerOptions.shutdownCtx =
   MiniClusterCtx (drives test-triggered teardown)
 - miniS3Options/miniWebDavOptions.shutdownCtx = miniClientsCtx() (drives
   Ctrl+C teardown before filer/volume/master)

All knowledge of MiniClusterCtx now lives in mini.go.

* fix(mini): stop worker before clients ctx so admin shutdown isn't blocked

Symptom on Ctrl+C of a clean weed mini: mini's Shutting down admin/s3/
webdav hook sat for 10s then logged "timed out". Admin had started its
shutdown but was blocked inside StopWorkerGrpcServer's GracefulStop,
waiting for the still-connected worker stream. That in turn left filer
clients connected and cascaded into filer's own 10s gRPC graceful-stop
timeout.

Two root causes fixed, plus a mini-side ordering change:

1. worker.Stop() deadlocked on clean shutdown. It sent ActionStop (which
   makes managerLoop `break out` and exit), then called getTaskLoad()
   which sends to the same unbuffered cmd channel — no receiver, hangs
   forever. Reorder Stop() to snapshot the admin client and drain tasks
   BEFORE sending ActionStop, and call Disconnect() via the local
   snapshot afterwards.

2. Worker's taskRequestLoop raced with Disconnect(): RequestTask reads
   from c.incoming, which Disconnect closes, yielding a nil response and
   a panic on response.Message. Handle the closed channel explicitly.

3. Mini now has a preCancel phase (beforeMiniClientsShutdown) that runs
   synchronously BEFORE the clients ctx is cancelled. Register worker
   shutdown there so admin's worker-gRPC GracefulStop finds the worker
   already disconnected and returns immediately, instead of waiting on
   a stream that is about to close anyway.

Observed shutdown of a clean mini: admin/s3/webdav down in <10ms; full
process exit in ~11s (the remaining 10s is a pre-existing filer gRPC
graceful-stop timeout, not cascaded from the clients tier).
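
Point 1's deadlock-and-reorder can be reduced to this shape (illustrative types, not the actual worker code): querying the manager loop after telling it to exit blocks forever on the unbuffered command channel, so state must be snapshotted first.

```go
package main

// manager owns an unbuffered command channel served only by run();
// after the stop signal fires, nothing reads cmds anymore.
type manager struct {
	cmds chan func()
	stop chan struct{}
	load int
}

func (m *manager) run() {
	for {
		select {
		case f := <-m.cmds:
			f()
		case <-m.stop:
			return
		}
	}
}

// Load asks the loop for its state via the command channel.
func (m *manager) Load() int {
	done := make(chan int)
	m.cmds <- func() { done <- m.load }
	return <-done
}

// Stop snapshots state BEFORE signalling the loop to exit; calling
// m.Load() after close(m.stop) would hang forever, which is the
// clean-shutdown deadlock described above.
func (m *manager) Stop() int {
	load := m.Load()
	close(m.stop)
	return load
}
```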

* feat(mini): cap filer gRPC graceful stop at 1s under weed mini

Full weed mini shutdown was ~11s on a clean exit, dominated by the
filer's default 10s gRPC GracefulStop timeout while background
SubscribeLocalMetadata streams drained.

Expose the timeout as a FilerOptions.gracefulStopTimeout field (default
10s for standalone weed filer) and set it to 1s in mini. Clean weed mini
shutdown now takes ~2s.
2026-04-16 16:11:01 -07:00
Chris Lu
6f036c7015 fix(master): skip redundant DoJoinCommand on resumeState to prevent deadlock (#8998)
* fix(master): skip redundant DoJoinCommand on resumeState to prevent deadlock

When fastResume is active (single-master + resumeState + non-empty log),
the raft server becomes leader within ~1ms. DoJoinCommand then enters
the leaderLoop's processCommand path, which calls setCommitIndex to
commit all pending entries. The goraft setCommitIndex implementation
returns early when it encounters a JoinCommand entry (to recalculate
quorum), which can prevent the new entry's event channel from being
notified — leaving DoJoinCommand blocked forever.

Each restart appends a new raft:join entry to the log, while the conf
file's commitIndex (only persisted on AddPeer) lags behind. After 3-4
restarts the uncommitted range contains old JoinCommand entries that
trigger the early return before the new entry is reached.

Fix: skip DoJoinCommand when the raft log already has entries (the
server was already joined in a previous run). The fastResume mechanism
handles leader election independently.

* fix(master): handle Hashicorp Raft in HasExistingState

Add Hashicorp Raft support to HasExistingState by checking
AppliedIndex, consistent with how other RaftServer methods
handle both raft implementations.

* fix(master): use LastIndex() instead of AppliedIndex() for Hashicorp Raft

AppliedIndex() reflects in-memory FSM state which starts at 0 before
log replay completes. LastIndex() reads from persisted stable storage,
correctly mirroring the non-Hashicorp IsLogEmpty() check.
2026-04-08 21:08:50 -07:00
Chris Lu
4efe0acaf5 fix(master): fast resume state and default resumeState to true (#8925)
* fix(master): fast resume state and default resumeState to true

When resumeState is enabled in single-master mode, the raft server had
existing log entries so the self-join path couldn't promote to leader.
The server waited the full election timeout (10-20s) before self-electing.

Fix by temporarily setting election timeout to 1ms before Start() when
in single-master + resumeState mode with existing log, then restoring
the original timeout after leader election. This makes resume near-instant.

Also change the default for resumeState from false to true across all
CLI commands (master, mini, server) so state is preserved by default.

* fix(master): prevent fastResume goroutine from hanging forever

Use defer to guarantee election timeout is always restored, and bound
the polling loop with a timeout so it cannot spin indefinitely if
leader election never succeeds.

* fix(master): use ticker instead of time.After in fastResume polling loop
2026-04-04 14:15:56 -07:00
Chris Lu
2eaf98a7a2 Use Unix sockets for gRPC in mini mode (#8856)
* Use Unix sockets for gRPC between co-located services in mini mode

In `weed mini`, all services run in one process. Previously, inter-service
gRPC traffic (volume↔master, filer↔master, S3↔filer, worker↔admin, etc.)
went through TCP loopback. This adds a gRPC Unix socket registry in the pb
package: mini mode registers a socket path per gRPC port at startup, each
gRPC server additionally listens on its socket, and GrpcDial transparently
routes to the socket via WithContextDialer when a match is found.

Standalone commands (weed master, weed filer, etc.) are unaffected since
no sockets are registered. TCP listeners are kept for external clients.

* Handle Serve error and clean up socket file in ServeGrpcOnLocalSocket

Log non-expected errors from grpcServer.Serve (ignoring
grpc.ErrServerStopped) and always remove the Unix socket file
when Serve returns, ensuring cleanup on Stop/GracefulStop.
2026-03-30 18:18:52 -07:00
Chris Lu
15f4a97029 fix: improve raft leader election reliability and failover speed (#8692)
* fix: clear raft vote state file on non-resume startup

The seaweedfs/raft library v1.1.7 added a persistent `state` file for
currentTerm and votedFor. When RaftResumeState=false (the default), the
log, conf, and snapshot directories are cleared but this state file was
not. On repeated restarts, different masters accumulate divergent terms,
causing AppendEntries rejections and preventing leader election.

Fixes #8690

* fix: recover TopologyId from snapshot before clearing raft state

When RaftResumeState=false clears log/conf/snapshot, the TopologyId
(used for license validation) was lost. Now extract it from the latest
snapshot before cleanup and restore it on the topology.

Both seaweedfs/raft and hashicorp/raft paths are handled, with a shared
recoverTopologyIdFromState helper in raft_common.go.

* fix: stagger multi-master bootstrap delay by peer index

Previously all masters used a fixed 1500ms delay before the bootstrap
check. Now the delay is proportional to the peer's sorted index with
randomization (matching the hashicorp raft path), giving the designated
bootstrap node (peer 0) a head start while later peers wait for gRPC
servers to be ready.

Also adds diagnostic logging showing why DoJoinCommand was or wasn't
called, making leader election issues easier to diagnose from logs.

* fix: skip unreachable masters during leader reconnection

When a master leader goes down, non-leader masters still redirect
clients to the stale leader address. The masterClient would follow
these redirects, fail, and retry — wasting round-trips each cycle.

Now tryAllMasters tracks which masters failed within a cycle and skips
redirects pointing to them, reducing log spam and connection overhead
during leader failover.

* fix: take snapshot after TopologyId generation for recovery

After generating a new TopologyId on the leader, immediately take a raft
snapshot so the ID can be recovered from the snapshot on future restarts
with RaftResumeState=false. Without this, short-lived clusters would
lose the TopologyId on restart since no automatic snapshot had been
taken yet.

* test: add multi-master raft failover integration tests

Integration test framework and 5 test scenarios for 3-node master
clusters:

- TestLeaderConsistencyAcrossNodes: all nodes agree on leader and
  TopologyId
- TestLeaderDownAndRecoverQuickly: leader stops, new leader elected,
  old leader rejoins as follower
- TestLeaderDownSlowRecover: leader gone for extended period, cluster
  continues with 2/3 quorum
- TestTwoMastersDownAndRestart: quorum lost (2/3 down), recovered
  when both restart
- TestAllMastersDownAndRestart: full cluster restart, leader elected,
  all nodes agree on TopologyId

* fix: address PR review comments

- peerIndex: return -1 (not 0) when self not found, add warning log
- recoverTopologyIdFromSnapshot: defer dir.Close()
- tests: check GetTopologyId errors instead of discarding them

* fix: address review comments on failover tests

- Assert no leader after quorum loss (was only logging)
- Verify follower cs.Leader matches expected leader via
  ServerAddress.ToHttpAddress() comparison
- Check GetTopologyId error in TestTwoMastersDownAndRestart
2026-03-18 23:28:07 -07:00
Chris Lu
ba8e2aaae9 Fix master leader election when grpc ports change (#8272)
* Fix master leader detection when grpc ports change

* Canonicalize self peer entry to avoid raft self-alias panic

* Normalize and deduplicate master peer addresses
2026-02-09 18:13:02 -08:00
Chris Lu
c106532b79 fix: prevent MiniClusterCtx race conditions in command shutdown
Capture global MiniClusterCtx into local variables before goroutine/select
evaluation to prevent nil dereference/data race when context is reset to nil
after nil check. Applied to filer, master, volume, and s3 commands.
2026-01-28 19:42:16 -08:00
Chris Lu
01c17478ae command: implement graceful shutdown for mini cluster
- Introduce MiniClusterCtx to coordinate shutdown across mini services
- Update Master, Volume, Filer, S3, and WebDAV servers to respect context cancellation
- Ensure all resources are cleaned up properly during test teardown
- Integrate MiniClusterCtx in s3tables integration tests
2026-01-28 10:36:19 -08:00
Chris Lu
ed1da07665 Add consistent -debug and -debug.port flags to commands (#7816)
* Add consistent -debug and -debug.port flags to commands

Add -debug and -debug.port flags to weed master, weed volume, weed s3,
weed mq.broker, and weed filer.sync commands for consistency with
weed filer.

When -debug is enabled, an HTTP server starts on the specified port
(default 6060) serving runtime profiling data at /debug/pprof/.

For mq.broker, replaced the older -port.pprof flag with the new
-debug and -debug.port pattern for consistency.

* Update weed/util/grace/pprof.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-18 17:44:36 -08:00
Chris Lu
f096b067fd weed master add peers=none option for faster startup (#7419)
* weed master -peers=none

* single master mode only when peers is none

* refactoring

* revert duplicated code

* revert

* Update weed/command/master.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* preventing "none" passed to other components if master is not started

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-31 18:29:16 -07:00
Chris Lu
5ab49e2971 Adjust cli option (#7418)
* adjust "weed benchmark" CLI to use readOnly/writeOnly

* consistently use "-master" CLI option

* If both -readOnly and -writeOnly are specified, the current logic silently allows it with -writeOnly taking precedence. This is confusing and could lead to unexpected behavior.
2025-10-31 17:08:00 -07:00
zuzuviewer
8fa1a69f8c * Fix undefined http serve behavior (#6943) 2025-07-07 22:48:12 -07:00
chrislu
166e36bcd3 use telemetry.seaweedfs.com 2025-06-28 19:48:03 -07:00
Chris Lu
a1aab8a083 add telemetry (#6926)
* add telemetry

* fix go mod

* add default telemetry server url

* Update README.md

* replace with broker count instead of s3 count

* Update telemetry.pb.go

* github action to deploy
2025-06-28 14:11:55 -07:00
chrislu
ab49540d2b use master.toml value if not empty
fix https://github.com/seaweedfs/seaweedfs/issues/6922
2025-06-25 17:54:56 -07:00
chrislu
bd4891a117 change version directory 2025-06-03 22:46:10 -07:00
Ethan Mosbaugh
9ebc132ffc fix: bucket-hook fails with gnu wget (#6521) 2025-02-06 23:11:17 -08:00
chrislu
ec155022e7 "golang.org/x/exp/slices" => "slices" and go fmt 2024-12-19 19:25:06 -08:00
zouyixiong
881c9a009e [master] fix: master was not starting the LoopPushingMetric routine. (#6018) 2024-09-13 20:01:34 -07:00
chrislu
4463296811 add parallel vacuuming 2024-08-21 22:53:54 -07:00
vadimartynov
b796c21fa9 Added loadSecurityConfigOnce (#5792) 2024-07-16 09:15:55 -07:00
vadimartynov
ec9e7493b3 -metricsIp cmd flag (#5773)
* Added/Updated:
- Added a metrics IP option for all servers;
- Fixed a bug in choosing between the bindIp and ip parameters for the metrics handler;

* Fixed cmd flags
2024-07-12 10:56:26 -07:00
Konstantin Lebedev
5ffacbb6ea refactor all methods strings to const (#5726) 2024-07-01 01:00:39 -07:00
vadimartynov
8aae82dd71 Added context for the MasterClient's methods to avoid endless loops (#5628)
* Added context for the MasterClient's methods to avoid endless loops

* Returned WithClient function. Added WithClientCustomGetMaster function

* Hid unused ctx arguments

* Using a common context for the KeepConnectedToMaster and WaitUntilConnected functions

* Changed the context termination check in the tryConnectToMaster function

* Added a child context to the tryConnectToMaster function

* Added a common context for KeepConnectedToMaster and WaitUntilConnected functions in benchmark
2024-06-14 11:40:34 -07:00
chrislu
645ae8c57b Revert "Revert "Merge branch 'master' of https://github.com/seaweedfs/seaweedfs""
This reverts commit 8cb42c39
2023-09-25 09:35:16 -07:00
chrislu
8cb42c39ad Revert "Merge branch 'master' of https://github.com/seaweedfs/seaweedfs"
This reverts commit 2e5aa06026, reversing
changes made to 4d414f54a2.
2023-09-18 16:12:50 -07:00
dependabot[bot]
a04bd4d26f Bump github.com/rclone/rclone from 1.63.1 to 1.64.0 (#4850)
* Bump github.com/rclone/rclone from 1.63.1 to 1.64.0

Bumps [github.com/rclone/rclone](https://github.com/rclone/rclone) from 1.63.1 to 1.64.0.
- [Release notes](https://github.com/rclone/rclone/releases)
- [Changelog](https://github.com/rclone/rclone/blob/master/RELEASE.md)
- [Commits](https://github.com/rclone/rclone/compare/v1.63.1...v1.64.0)

---
updated-dependencies:
- dependency-name: github.com/rclone/rclone
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* API changes

* go mod

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>
Co-authored-by: chrislu <chris.lu@gmail.com>
2023-09-18 14:43:05 -07:00
Stewart Miles
dd71f54c6b Fix -raftHashicorp and -raftBootstrap flag propagation. (#4309)
`weed server` was not correctly propagating
`-master.raftHashicorp` and `-master.raftBootstrap` flags when
starting the master server.

Related to #4307
2023-03-15 13:03:20 -07:00
Jiffs Maverick
4b0430e71d [metrics] Add the ability to control bind ip (#4012) 2022-11-24 10:22:59 -08:00
Konstantin Lebedev
401315f337 master fix interruption through ctrl+c (#3834) 2022-10-12 07:18:40 -07:00
Konstantin Lebedev
b9933d5589 master server graceful stop (#3797) 2022-10-06 09:30:30 -07:00
Ryan Russell
8efe1db01a refactor(various): Listner -> Listener readability improvements (#3672)
* refactor(net_timeout): `listner` -> `listener`

Signed-off-by: Ryan Russell <git@ryanrussell.org>

* refactor(s3): `s3ApiLocalListner` -> `s3ApiLocalListener`

Signed-off-by: Ryan Russell <git@ryanrussell.org>

* refactor(filer): `localPublicListner` -> `localPublicListener`

Signed-off-by: Ryan Russell <git@ryanrussell.org>

* refactor(command): `masterLocalListner` -> `masterLocalListener`

Signed-off-by: Ryan Russell <git@ryanrussell.org>

* refactor(net_timeout): `ipListner` -> `ipListener`

Signed-off-by: Ryan Russell <git@ryanrussell.org>

Signed-off-by: Ryan Russell <git@ryanrussell.org>
2022-09-14 11:59:55 -07:00
Patrick Schmidt
7b424a54dc Add raft server access mutex to avoid races (#3503) 2022-08-24 09:49:05 -07:00
chrislu
67814a5c79 refactor and fix strings.Split 2022-08-07 01:34:32 -07:00
Patrick Schmidt
1a4a36d510 Add healthy indicator for raft status 2022-07-30 19:34:26 +02:00
chrislu
26dbc6c905 move to https://github.com/seaweedfs/seaweedfs 2022-07-29 00:17:28 -07:00
chrislu
3828b8ce87 "github.com/chrislusf/raft" => "github.com/seaweedfs/raft" 2022-07-27 12:12:40 -07:00
chrislu
492da3dbce master: put metadata under instance specific folder 2022-06-20 19:04:49 -07:00
leyou240
89eb87c1d1 Merge branch 'master' into slices.SortFunc 2022-04-18 10:39:29 +08:00
justin
3551ca2fcf enhancement: replace sort.Slice with slices.SortFunc to reduce reflection 2022-04-18 10:35:43 +08:00
Konstantin Lebedev
17c6e8e39f Merge branch 'new_master' into hashicorp_raft
# Conflicts:
#	go.mod
#	go.sum
2022-04-05 13:29:46 +05:00
Konstantin Lebedev
622297f1a7 add stats raft handler 2022-04-04 19:16:06 +05:00
Konstantin Lebedev
14dd971890 hashicorp raft with state machine 2022-04-04 17:51:51 +05:00
Konstantin Lebedev
c514710b7b initial add hashicorp raft 2022-04-04 13:50:56 +05:00
chrislu
daca2d22a5 use original server address string as map key 2022-04-01 17:34:42 -07:00