141 Commits

Author SHA1 Message Date
Chris Lu
69da20bdae volume: gate FetchAndWriteNeedle behind admin auth and refuse internal endpoints (#9441)
volume: require admin auth and refuse loopback endpoints in FetchAndWriteNeedle

Gate the RPC behind checkGrpcAdminAuth for parity with the rest of the
destructive volume-server RPCs, and reject cluster-internal remote S3
endpoints (loopback / link-local / IMDS / RFC 1918 / CGNAT) before
dialing. Pin the validated address against DNS rebinding by routing the
AWS SDK through an HTTP transport whose DialContext re-resolves the host
and re-applies the deny list on every dial, so an endpoint that resolves
to a public IP at validate-time and then flips to 127.0.0.1 at connect
time is refused. Operators that legitimately fetch from private hosts
can opt out with -volume.allowUntrustedRemoteEndpoints.
2026-05-12 10:11:20 -07:00
Chris Lu
d605feb403 refactor(command): expand "~" in all path-style CLI flags (#9306)
* refactor(command): expand "~" in all path-style CLI flags

Many of weed's path-bearing flags (-s3.config, -s3.iam.config,
-admin.dataDir, -webdav.cacheDir, -volume.dir.idx, TLS cert/key
files, profile output paths, mount cache dirs, sftp key files, ...)
were never run through util.ResolvePath, so a value like "~/iam.json"
was used literally. Tilde only worked when the shell expanded it,
which silently fails for the common -flag=~/path form (bash leaves
the tilde literal in --opt=~/path).

- Extend util.ResolvePath to also handle "~user" / "~user/rest",
  matching shell tilde expansion. Add unit tests.
- Apply util.ResolvePath at the top of each shared start* function
  (s3, webdav, sftp) so mini/server/filer/standalone callers all
  inherit it; resolve at the few one-off use sites (mount cache
  dirs, volume idx folder, mini admin.dataDir, profile paths).
- Drop the duplicate expandHomeDir helper from admin.go in favor of
  the now-equivalent util.ResolvePath.

* fixup: handle comma-separated -dir flags for tilde expansion

`weed mini -dir`, `weed server -dir`, and `weed volume -dir` accept
comma-separated paths (`dir[,dir]...`). Calling util.ResolvePath on
the whole string mishandled multi-folder values with tilde, e.g.
"~/d1,~/d2" would resolve as if "d1,~/d2" were a single subpath.

- Add util.ResolveCommaSeparatedPaths: split on ",", run each entry
  through ResolvePath, rejoin. Short-circuits when no "~" present.
- Use it for *miniDataFolders (mini.go), *volumeDataFolders (server.go),
  and resolve each entry of v.folders in-place (volume.go) so all
  downstream consumers see resolved paths.
- Add 7-case TestResolveCommaSeparatedPaths covering empty, single,
  multiple, and mixed inputs.

* address PR review: metaFolder + Windows backslash

- master.go: resolve *m.metaFolder at the top of runMaster so
  util.FullPath(*m.metaFolder) on the next line sees an expanded
  path. Drop the now-redundant ResolvePath in TestFolderWritable.
- server.go: same treatment for *masterOptions.metaFolder, paired
  with the existing cpu/mem profile resolves. Drop the redundant
  inner ResolvePath at TestFolderWritable.
- file_util.go: ResolvePath now accepts filepath.Separator as a
  separator after the tilde, so "~\\data" works on Windows. Other
  platforms keep current behaviour (backslash stays literal because
  it is a valid filename character in usernames and paths).
- file_util_test.go: add two cases using filepath.Separator that
  exercise the new code path on Windows and remain a no-op on Unix.

* address PR review: resolve "~" in remaining command path flags

Comprehensive sweep of path-bearing flags across every weed
subcommand, applying util.ResolvePath in-place at the top of each
run* function so all downstream consumers see expanded paths.

- webdav.go: resolve *wo.cacheDir at the top of startWebDav so
  mini/server/filer/standalone callers all inherit it.
- mount_std.go: cpu/mem profile paths.
- filer_sync.go: cpu/mem profile paths.
- mq_broker.go: cpu/mem profile paths.
- benchmark.go: cpuprofile output path.
- backup.go: -dir resolved once at runBackup; drop the duplicated
  inline ResolvePath in NewVolume calls.
- compact.go: -dir resolved at runCompact; drop inline ResolvePath.
- export.go: -dir and -o resolved at runExport; drop inline
  ResolvePath in LoadFromIdx and ScanVolumeFile.
- download.go: -dir resolved at runDownload; drop inline.
- update.go: -dir resolved at runUpdate so filepath.Join uses the
  expanded path; drop inline ResolvePath in TestFolderWritable.
- scaffold.go: -output expanded before filepath.Join.
- worker.go: -workingDir expanded before being passed to runtime.

* address PR review: resolve option-struct paths at run* entry points

server.go:381 propagates s3Options.config to filerOptions.s3ConfigFile
*before* startS3Server runs, which meant the filer-side code saw the
unresolved tilde-prefixed pointer. Same pattern for webdavOptions and
sftpOptions (and equivalent in mini.go / filer.go).

The fix: hoist resolution from the shared start* functions up to the
run* entry points, where every shared pointer is set up before any
propagation happens.

- s3.go, webdav.go, sftp.go: extract a resolvePaths() method on each
  Options struct that runs every path field through util.ResolvePath
  in-place. Idempotent.
- runS3, runWebDav, runSftp: call the standalone struct's resolvePaths
  before starting metrics / loading security config.
- runServer, runMini, runFiler: call resolvePaths on every embedded
  options struct, plus resolve loose flags (serverIamConfig,
  miniS3Config, miniIamConfig, miniMasterOptions.metaFolder, and
  filer's defaultLevelDbDirectory) so they're expanded before any
  pointer copy or use.
- Drop the now-redundant inline ResolvePath at filer's
  defaultLevelDbDirectory composition.

* address PR review: re-resolve mini -dir post-config, cover misc paths

- mini.go: applyConfigFileOptions can overwrite -dir with a literal
  ~/data from mini.options. Re-resolve *miniDataFolders after the
  config-file apply, alongside the other path resolves, so the mini
  filer no longer ends up with a literal ~/data/filerldb2.
- benchmark.go: resolve *b.idListFile (-list).
- filer_sync.go: resolve *syncOptions.aSecurity / .bSecurity
  (-a.security / -b.security) before LoadClientTLSFromFile.
- filer_cat.go: resolve *filerCat.output (-o) before os.OpenFile.
- admin.go: drop trailing blank line at EOF (git diff --check).

* address PR review: resolve -a.security/-b.security/-config before use

Three follow-up fixes:

- filer_sync.go: the -a.security / -b.security resolves were placed
  *after* LoadClientTLSFromFile / LoadHTTPClientFromFile were called,
  so weed filer.sync -a.security=~/a.toml still passed the literal
  tilde path. Hoist the resolves above the security-loading block so
  TLS clients see expanded paths.
- filer_sync_verify.go: same flag pair was never resolved at all in
  the verify command; resolve at the top of runFilerSyncVerify.
- filer_meta_backup.go: -config (the backup_filer.toml path) was
  passed directly to viper. Resolve at the top of runFilerMetaBackup.
- mini.go: master.dir defaulted to the entire comma-joined
  miniDataFolders. With weed mini -dir=~/d1,~/d2 (or any multi-dir
  setup), TestFolderWritable then stat'd the joined string instead
  of a single directory. Default to the first entry via StringSplit
  to mirror the disk-space calculation a few lines below, and drop
  the now-redundant ResolvePath in TestFolderWritable.
2026-05-03 21:46:21 -07:00
Lars Lehtonen
29e14f89f1 fix(weed/command) address unhandled errors (#9208)
* fix(weed/command) address unhandled errors

* fix(command): don't log graceful-shutdown sentinels; plug response-body leak

- s3: Serve on unix socket treated http.ErrServerClosed as fatal; now
  excluded like the other Serve/ServeTLS paths in this file.
- mq_agent, mq_broker: filter grpc.ErrServerStopped so clean shutdown
  doesn't log as an error.
- worker_runtime: the added decodeErr early-continue skipped
  resp.Body.Close(); drop it since the existing check below already
  surfaces the decode error.
- mount_std: the pre-mount Unmount commonly fails when nothing is
  mounted; demote to V(1) Infof.
- fuse_std: tidy panic message to match sibling cases.

* fix(mq_broker): filter grpc.ErrServerStopped on localhost listener

The localhost listener goroutine logged any Serve error unconditionally,
which includes grpc.ErrServerStopped on graceful shutdown. Match the
main listener's check so clean stops don't surface as errors.

---------

Co-authored-by: Chris Lu <chris.lu@gmail.com>
2026-04-23 22:15:05 -07:00
Chris Lu
9ae905e456 feat(security): hot-reload HTTPS certs without restart (k8s cert-manager) (#9181)
* feat(security): hot-reload HTTPS certs for master/volume/filer/webdav/admin

S3 and filer already use a refreshing pemfile provider for their HTTPS
cert, so rotated certificates (e.g. from k8s cert-manager) are picked up
without a restart. Master, volume, webdav, and admin, however, passed
cert/key paths straight to ServeTLS/ListenAndServeTLS and loaded once at
startup — rotating those certs required a pod restart.

Add a small helper NewReloadingServerCertificate in weed/security that
wraps pemfile.Provider and returns a tls.Config.GetCertificate closure,
then wire it into the four remaining HTTPS entry points. httpdown now
also calls ServeTLS when TLSConfig carries a GetCertificate/Certificates
but CertFile/KeyFile are empty, so volume server can pre-populate
TLSConfig.

A unit test exercises the rotation path (write cert, rotate on disk,
assert the callback returns the new cert) with a short refresh window.

* refactor(security): route filer/s3 HTTPS through the shared cert reloader

Before: filer.go and s3.go each kept a *certprovider.Provider on the
options struct plus a duplicated GetCertificateWithUpdate method. Both
were loading pemfile themselves. Behaviorally they already reloaded, but
the logic was duplicated two ways and neither path was shared with the
newly-added master/volume/webdav/admin wiring.

After: both use security.NewReloadingServerCertificate like the other
servers. The per-struct certProvider field and GetCertificateWithUpdate
method are removed, along with the now-unused certprovider and pemfile
imports. Net: -32 lines, one code path for all HTTPS cert reloading.

No behavior change — the refresh window, cache, and handshake contract
are identical (the helper wraps the same pemfile.NewProvider).

* feat(security): hot-reload HTTPS client certs for mount/backup/upload/etc

The HTTP client in weed/util/http/client loaded the mTLS client cert
once at startup via tls.LoadX509KeyPair. That left every long-lived
HTTPS client process (weed mount, backup, filer.copy, filer→volume,
s3→filer/volume) unable to pick up a rotated client cert without a
restart — even though the same cert-manager setup was already rotating
the server side fine.

Swap the client cert loader for a tls.Config.GetClientCertificate
callback backed by the same refreshing pemfile provider. New TLS
handshakes pick up the rotated cert; in-flight pooled connections keep
their old cert and drop as normal transport churn happens.

To keep this reusable from both server and client TLS code without an
import cycle (weed/security already imports weed/util/http/client for
LoadHTTPClientFromFile), extract the pemfile wrapper into a new
weed/security/certreload subpackage. weed/security keeps its thin
NewReloadingServerCertificate wrapper. The existing unit test moves
with the implementation.

gRPC mTLS was already handled by security.LoadServerTLS /
LoadClientTLS; this PR does not change any gRPC paths. MQ broker, MQ
agent, Kafka gateway, and FUSE mount control plane are gRPC-only and
therefore already rotate.

CA bundles (ClientCAs / RootCAs / grpc.ca) are still loaded once — noted
as a known limitation in the wiki.

* fix(security): address PR review feedback on cert reloader

Bots (gemini-code-assist + coderabbit) flagged three real issues and a
couple of nits. Addressing them here:

1. KeyMaterial used context.Background(). The grpc pemfile provider's
   KeyMaterial blocks until material arrives or the context deadline
   expires; with Background() a slow disk could hang the TLS handshake
   indefinitely. Switched both the server and client callbacks to use
   hello.Context() / cri.Context() so a stuck read is bounded by the
   handshake timeout.

2. Admin server loaded TLS inside the serve goroutine. If the cert was
   bad, the goroutine returned but startAdminServer kept blocking on
   <-ctx.Done() with no listener, making the process look healthy with
   nothing bound. Moved TLS setup to run before the goroutine starts
   and propagate errors via fmt.Errorf; also captures the provider and
   defers Close().

3. HTTP client discarded the certprovider.Provider from
   NewClientGetCertificate. That leaked the refresh goroutine, and
   NewHttpClientWithTLS had a worse case where a CA-file failure after
   provider creation orphaned the provider entirely. Added a
   certProvider field and a Close() method on HTTPClient, and made
   the constructors close the provider on subsequent error paths.

4. Server-side paths (master/volume/filer/s3/webdav/admin) now retain
   the provider. filer and webdav run ServeTLS synchronously, so a
   plain defer works. master/volume/s3 dispatch goroutines and return
   while the server keeps running, so they hook Close() into
   grace.OnInterrupt.

5. Test: certreload_test now tolerates transient read/parse errors
   during file rotation (writeSelfSigned rewrites cert before key) and
   reports the last error only if the deadline expires.

No user-visible behavior change for the happy path.

* test(tls): add end-to-end HTTPS cert rotation integration test

Boots a real `weed master` with HTTPS enabled, captures the leaf cert
served at TLS handshake time, atomically rewrites the cert/key files
on disk (the same rename-in-place pattern kubelet does when it swaps
a cert-manager Secret), and asserts that a subsequent TLS handshake
observes the rotated leaf — with no process restart, no SIGHUP, no
reloader sidecar. Verifies the full path: on-disk change → pemfile
refresh tick → provider.KeyMaterial → tls.Config.GetCertificate →
server TLS handshake.

Runtime is ~1s by exposing the reloader's refresh window as an env
var (WEED_TLS_CERT_REFRESH_INTERVAL) and setting it to 500ms for the
test. The same env var is user-facing — documented in the wiki — so
operators running short-lived certs (Vault, cert-manager with
duration: 24h, etc.) can tighten the rotation-pickup window without a
rebuild. Defaults to 5h to preserve prior behavior.

security.CredRefreshingInterval is kept for API compatibility but now
aliases certreload.DefaultRefreshInterval so the same env controls
both gRPC mTLS and HTTPS reload.

* ci(tls): wire the TLS rotation integration test into GitHub Actions

Mirrors the existing vacuum-integration-tests.yml shape: Ubuntu runner,
Go 1.25, build weed, run `go test` in test/tls_rotation, upload master
logs on failure. 10-minute job timeout; the test itself finishes in
about a second because WEED_TLS_CERT_REFRESH_INTERVAL is set to 500ms
inside the test.

Runs on every push to master and on every PR to master.

* fix(tls): address follow-up PR review comments

Three new comments on the integration test + volume shutdown path:

1. Test: peekServerCert was swallowing every dial/handshake error,
   which meant waitForCert's "last err: <nil>" fatal message lost all
   diagnostic value. Thread errors back through: peekServerCert now
   returns (*x509.Certificate, error), and waitForCert records the
   latest error so a CI flake points at the actual cause (master
   didn't come up, handshake rejected, CA pool mismatch, etc.).

2. Test: set HOME=<tempdir> on the master subprocess. Viper today
   registers the literal path "$HOME/.seaweedfs" without env
   expansion, so a developer's ~/.seaweedfs/security.toml is
   accidentally invisible — the test was relying on that. Pinning
   HOME is belt-and-braces against a future viper upgrade that does
   expand env vars.

3. volume.go: startClusterHttpService's provider close was registered
   via grace.OnInterrupt, which fires on SIGTERM but NOT on the
   v.shutdownCtx.Done() path used by mini / integration tests. The
   pemfile refresh goroutine leaked in that shutdown path. Now the
   helper returns a close func and the caller invokes it on BOTH
   shutdown paths for parity.

Also add MinVersion: TLS 1.2 to the test's tls.Config to quiet the
ast-grep static-analysis nit — zero-risk since the pool only trusts
our in-memory CA.

Test runs clean 3/3.
2026-04-21 20:20:11 -07:00
Chris Lu
9d15705c16 fix(mini): shut down admin/s3/webdav/filer before volume/master on Ctrl+C (#9112)
* fix(mini): shut down admin/s3/webdav/filer before volume/master on Ctrl+C

Interrupts fired grace hooks in registration order, so master (started
first) shut down before its clients, producing heartbeat-canceled errors
and masterClient reconnection noise during weed mini shutdown. Admin/s3/
webdav had no interrupt hooks at all and were killed at os.Exit.

- grace: execute interrupt hooks in LIFO (defer-style) order so later-
  started services tear down first.
- filer: consolidate the three separate interrupt hooks (gRPC / HTTP /
  DB) into one that runs in order, so filer shutdown stays correct
  independent of FIFO/LIFO semantics.
- mini: add MiniClientsShutdownCtx (separate from test-facing
  MiniClusterCtx) plus an OnMiniClientsShutdown helper. Admin, S3,
  WebDAV and the maintenance worker observe it; runMini registers a
  cancel hook after startup so under LIFO it fires first and waits up to
  10s on a WaitGroup for those services to drain before filer, volume,
  and master shut down.

Resulting order on Ctrl+C: admin/s3/webdav/worker -> filer (gRPC -> HTTP
-> DB) -> volume -> master.

* refactor(mini): group mini-client shutdown into one state struct

The first pass spread the shutdown plumbing across three globals
(MiniClientsShutdownCtx, miniClientsWg, cancelMiniClients) and two
ctx-derivation sites (OnMiniClientsShutdown and startMiniAdminWithWorker).

Group into a private miniClientsState (ctx/cancel/wg) rebuilt per runMini
invocation, and chain its ctx from MiniClusterCtx so clients only observe
one signal. Tests that cancel MiniClusterCtx still trigger client
shutdown via parent-child propagation.

- resetMiniClients() installs fresh state at the top of runMini, so
  in-process test reruns don't inherit stale ctx/wg.
- onMiniClientsShutdown(fn) replaces the exported OnMiniClientsShutdown
  and only observes one ctx.
- trackMiniClient() replaces the manual wg.Add/Done dance for the admin
  goroutine.
- miniClientsCtx() gives the admin startup a ctx without re-deriving.
- triggerMiniClientsShutdown(timeout) is the interrupt hook body.

No behaviour change; existing tests pass.

* refactor: generalize shutdown ctx as an option, not a mini-specific helper

Several service files (s3, webdav, filer, master, volume) observed the
mini-specific MiniClusterCtx or called onMiniClientsShutdown directly.
That leaked mini orchestration into code that also runs under weed s3,
weed webdav, weed filer, weed master, and weed volume standalone.

Replace with a generic `shutdownCtx context.Context` field on each
service's Options struct. When non-nil, the server watches it and shuts
down gracefully; when nil (standalone), the shutdown path is a no-op.

Mini wires the contexts up from a single place (runMini):
 - miniMasterOptions/miniOptions.v/miniFilerOptions.shutdownCtx =
   MiniClusterCtx (drives test-triggered teardown)
 - miniS3Options/miniWebDavOptions.shutdownCtx = miniClientsCtx() (drives
   Ctrl+C teardown before filer/volume/master)

All knowledge of MiniClusterCtx now lives in mini.go.

* fix(mini): stop worker before clients ctx so admin shutdown isn't blocked

Symptom on Ctrl+C of a clean weed mini: mini's Shutting down admin/s3/
webdav hook sat for 10s then logged "timed out". Admin had started its
shutdown but was blocked inside StopWorkerGrpcServer's GracefulStop,
waiting for the still-connected worker stream. That in turn left filer
clients connected and cascaded into filer's own 10s gRPC graceful-stop
timeout.

Two causes, both fixed:

1. worker.Stop() deadlocked on clean shutdown. It sent ActionStop (which
   makes managerLoop `break out` and exit), then called getTaskLoad()
   which sends to the same unbuffered cmd channel — no receiver, hangs
   forever. Reorder Stop() to snapshot the admin client and drain tasks
   BEFORE sending ActionStop, and call Disconnect() via the local
   snapshot afterwards.

2. Worker's taskRequestLoop raced with Disconnect(): RequestTask reads
   from c.incoming, which Disconnect closes, yielding a nil response and
   a panic on response.Message. Handle the closed channel explicitly.

3. Mini now has a preCancel phase (beforeMiniClientsShutdown) that runs
   synchronously BEFORE the clients ctx is cancelled. Register worker
   shutdown there so admin's worker-gRPC GracefulStop finds the worker
   already disconnected and returns immediately, instead of waiting on
   a stream that is about to close anyway.

Observed shutdown of a clean mini: admin/s3/webdav down in <10ms; full
process exit in ~11s (the remaining 10s is a pre-existing filer gRPC
graceful-stop timeout, not cascaded from the clients tier).

* feat(mini): cap filer gRPC graceful stop at 1s under weed mini

Full weed mini shutdown was ~11s on a clean exit, dominated by the
filer's default 10s gRPC GracefulStop timeout while background
SubscribeLocalMetadata streams drained.

Expose the timeout as a FilerOptions.gracefulStopTimeout field (default
10s for standalone weed filer) and set it to 1s in mini. Clean weed mini
shutdown now takes ~2s.
2026-04-16 16:11:01 -07:00
Chris Lu
2eaf98a7a2 Use Unix sockets for gRPC in mini mode (#8856)
* Use Unix sockets for gRPC between co-located services in mini mode

In `weed mini`, all services run in one process. Previously, inter-service
gRPC traffic (volume↔master, filer↔master, S3↔filer, worker↔admin, etc.)
went through TCP loopback. This adds a gRPC Unix socket registry in the pb
package: mini mode registers a socket path per gRPC port at startup, each
gRPC server additionally listens on its socket, and GrpcDial transparently
routes to the socket via WithContextDialer when a match is found.

Standalone commands (weed master, weed filer, etc.) are unaffected since
no sockets are registered. TCP listeners are kept for external clients.

* Handle Serve error and clean up socket file in ServeGrpcOnLocalSocket

Log non-expected errors from grpcServer.Serve (ignoring
grpc.ErrServerStopped) and always remove the Unix socket file
when Serve returns, ensuring cleanup on Stop/GracefulStop.
2026-03-30 18:18:52 -07:00
Chris Lu
8cde3d4486 Add data file compaction to iceberg maintenance (Phase 2) (#8503)
* Add iceberg_maintenance plugin worker handler (Phase 1)

Implement automated Iceberg table maintenance as a new plugin worker job
type. The handler scans S3 table buckets for tables needing maintenance
and executes operations in the correct Iceberg order: expire snapshots,
remove orphan files, and rewrite manifests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add data file compaction to iceberg maintenance handler (Phase 2)

Implement bin-packing compaction for small Parquet data files:
- Enumerate data files from manifests, group by partition
- Merge small files using parquet-go (read rows, write merged output)
- Create new manifest with ADDED/DELETED/EXISTING entries
- Commit new snapshot with compaction metadata

Add 'compact' operation to maintenance order (runs before expire_snapshots),
configurable via target_file_size_bytes and min_input_files thresholds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix memory exhaustion in mergeParquetFiles by processing files sequentially

Previously all source Parquet files were loaded into memory simultaneously,
risking OOM when a compaction bin contained many small files. Now each file
is loaded, its rows are streamed into the output writer, and its data is
released before the next file is loaded — keeping peak memory proportional
to one input file plus the output buffer.

* Validate bucket/namespace/table names against path traversal

Reject names containing '..', '/', or '\' in Execute to prevent
directory traversal via crafted job parameters.

* Add filer address failover in iceberg maintenance handler

Try each filer address from cluster context in order instead of only
using the first one. This improves resilience when the primary filer
is temporarily unreachable.

* Add separate MinManifestsToRewrite config for manifest rewrite threshold

The rewrite_manifests operation was reusing MinInputFiles (meant for
compaction bin file counts) as its manifest count threshold. Add a
dedicated MinManifestsToRewrite field with its own config UI section
and default value (5) so the two thresholds can be tuned independently.

* Fix risky mtime fallback in orphan removal that could delete new files

When entry.Attributes is nil, mtime defaulted to Unix epoch (1970),
which would always be older than the safety threshold, causing the
file to be treated as eligible for deletion. Skip entries with nil
Attributes instead, matching the safer logic in operations.go.

* Fix undefined function references in iceberg_maintenance_handler.go

Use the exported function names (ShouldSkipDetectionByInterval,
BuildDetectorActivity, BuildExecutorActivity) matching their
definitions in vacuum_handler.go.

* Remove duplicated iceberg maintenance handler in favor of iceberg/ subpackage

The IcebergMaintenanceHandler and its compaction code in the parent
pluginworker package duplicated the logic already present in the
iceberg/ subpackage (which self-registers via init()). The old code
lacked stale-plan guards, proper path normalization, CAS-based xattr
updates, and error-returning parseOperations.

Since the registry pattern (default "all") makes the old handler
unreachable, remove it entirely. All functionality is provided by
iceberg.Handler with the reviewed improvements.

* Fix MinManifestsToRewrite clamping to match UI minimum of 2

The clamp reset values below 2 to the default of 5, contradicting the
UI's advertised MinValue of 2. Clamp to 2 instead.

* Sort entries by size descending in splitOversizedBin for better packing

Entries were processed in insertion order which is non-deterministic
from map iteration. Sorting largest-first before the splitting loop
improves bin packing efficiency by filling bins more evenly.

* Add context cancellation check to drainReader loop

The row-streaming loop in drainReader did not check ctx between
iterations, making long compaction merges uncancellable. Check
ctx.Done() at the top of each iteration.

* Fix splitOversizedBin to always respect targetSize limit

The minFiles check in the split condition allowed bins to grow past
targetSize when they had fewer than minFiles entries, defeating the
OOM protection. Now bins always split at targetSize, and a trailing
runt with fewer than minFiles entries is merged into the previous bin.

* Add integration tests for iceberg table maintenance plugin worker

Tests start a real weed mini cluster, create S3 buckets and Iceberg
table metadata via filer gRPC, then exercise the iceberg.Handler
operations (ExpireSnapshots, RemoveOrphans, RewriteManifests) against
the live filer. A full maintenance cycle test runs all operations in
sequence and verifies metadata consistency.

Also adds exported method wrappers (testing_api.go) so the integration
test package can call the unexported handler methods.

* Fix splitOversizedBin dropping files and add source path to drainReader errors

The runt-merge step could leave leading bins with fewer than minFiles
entries (e.g. [80,80,10,10] with targetSize=100, minFiles=2 would drop
the first 80-byte file). Replace the filter-based approach with an
iterative merge that folds any sub-minFiles bin into its smallest
neighbor, preserving all eligible files.

Also add the source file path to drainReader error messages so callers
can identify which Parquet file caused a read/write failure.

* Harden integration test error handling

- s3put: fail immediately on HTTP 4xx/5xx instead of logging and
  continuing
- lookupEntry: distinguish NotFound (return nil) from unexpected RPC
  errors (fail the test)
- writeOrphan and orphan creation in FullMaintenanceCycle: check
  CreateEntryResponse.Error in addition to the RPC error

* go fmt

---------

Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 11:27:42 -07:00
Chris Lu
f5c35240be Add volume dir tags and EC placement priority (#8472)
* Add volume dir tags to topology

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add preferred tag config for EC

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Prioritize EC destinations by tags

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add EC placement planner tag tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Refactor EC placement tests to reuse buildActiveTopology

Remove buildActiveTopologyWithDiskTags helper function and consolidate
tag setup inline in test cases. Tests now use UpdateTopology to apply
tags after topology creation, reusing the existing buildActiveTopology
function rather than duplicating its logic.

All tag scenario tests pass:
- TestECPlacementPlannerPrefersTaggedDisks
- TestECPlacementPlannerFallsBackWhenTagsInsufficient

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Consolidate normalizeTagList into shared util package

Extract normalizeTagList from three locations (volume.go,
detection.go, erasure_coding_handler.go) into new weed/util/tag.go
as exported NormalizeTagList function. Replace all duplicate
implementations with imports and calls to util.NormalizeTagList.

This improves code reuse and maintainability by centralizing
tag normalization logic.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add PreferredTags to EC config persistence

Add preferred_tags field to ErasureCodingTaskConfig protobuf with field
number 5. Update GetConfigSpec to include preferred_tags field in the
UI configuration schema. Add PreferredTags to ToTaskPolicy to serialize
config to protobuf. Add PreferredTags to FromTaskPolicy to deserialize
from protobuf with defensive copy to prevent external mutation.

This allows EC preferred tags to be persisted and restored across
worker restarts.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add defensive copy for Tags slice in DiskLocation

Copy the incoming tags slice in NewDiskLocation instead of storing
by reference. This prevents external callers from mutating the
DiskLocation.Tags slice after construction, improving encapsulation
and preventing unexpected changes to disk metadata.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add doc comment to buildCandidateSets method

Document the tiered candidate selection and fallback behavior. Explain
that for a planner with preferredTags, it accumulates disks matching
each tag in order into progressively larger tiers, emits a candidate
set once a tier reaches shardsNeeded, and finally falls back to the
full candidates set if preferred-tag tiers are insufficient.

This clarifies the intended semantics for future maintainers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Apply final PR review fixes

1. Update parseVolumeTags to replicate single tag entry to all folders
   instead of leaving some folders with nil tags. This prevents nil
   pointer dereferences when processing folders without explicit tags.

2. Add defensive copy in ToTaskPolicy for PreferredTags slice to match
   the pattern used in FromTaskPolicy, preventing external mutation of
   the returned TaskPolicy.

3. Add clarifying comment in buildCandidateSets explaining that the
   shardsNeeded <= 0 branch is a defensive check for direct callers,
   since selectDestinations guarantees shardsNeeded > 0.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix nil pointer dereference in parseVolumeTags

Ensure all folder tags are initialized to either normalized tags or
empty slices, not nil. When multiple tag entries are provided and there
are more folders than entries, remaining folders now get empty slices
instead of nil, preventing nil pointer dereference in downstream code.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix NormalizeTagList to return empty slice instead of nil

Change NormalizeTagList to always return a non-nil slice. When all tags
are empty or whitespace after normalization, return an empty slice
instead of nil. This prevents nil pointer dereferences in downstream
code that expects a valid (possibly empty) slice.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add nil safety check for v.tags pointer

Add a safety check to handle the case where v.tags might be nil,
preventing a nil pointer dereference. If v.tags is nil, use an empty
string instead. This is defensive programming to prevent panics in
edge cases.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add volume.tags flag to weed server and weed mini commands

Add the volume.tags CLI option to both the 'weed server' and 'weed mini'
commands. This allows users to specify disk tags when running the
combined server modes, just like they can with 'weed volume'.

The flag uses the same format and description as the volume command:
comma-separated tag groups per data dir with ':' separators
(e.g. fast:ssd,archive).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-01 10:22:00 -08:00
Chris Lu
c106532b79 fix: prevent MiniClusterCtx race conditions in command shutdown
Capture global MiniClusterCtx into local variables before goroutine/select
evaluation to prevent nil dereference/data race when context is reset to nil
after nil check. Applied to filer, master, volume, and s3 commands.
2026-01-28 19:42:16 -08:00
Chris Lu
01c17478ae command: implement graceful shutdown for mini cluster
- Introduce MiniClusterCtx to coordinate shutdown across mini services
- Update Master, Volume, Filer, S3, and WebDAV servers to respect context cancellation
- Ensure all resources are cleaned up properly during test teardown
- Integrate MiniClusterCtx in s3tables integration tests
2026-01-28 10:36:19 -08:00
Chris Lu
ed1da07665 Add consistent -debug and -debug.port flags to commands (#7816)
* Add consistent -debug and -debug.port flags to commands

Add -debug and -debug.port flags to weed master, weed volume, weed s3,
weed mq.broker, and weed filer.sync commands for consistency with
weed filer.

When -debug is enabled, an HTTP server starts on the specified port
(default 6060) serving runtime profiling data at /debug/pprof/.

For mq.broker, replaced the older -port.pprof flag with the new
-debug and -debug.port pattern for consistency.

* Update weed/util/grace/pprof.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-18 17:44:36 -08:00
Chris Lu
2d06ddab41 Remove default concurrent upload/download limits for best performance (#7712)
Change all concurrentUploadLimitMB and concurrentDownloadLimitMB defaults
from fixed values (64, 128, 256 MB) to 0 (unlimited).

This removes artificial throttling that can limit throughput on high-performance
systems, especially on all-flash setups with many cores.

Files changed:
- volume.go: concurrentUploadLimitMB 256->0, concurrentDownloadLimitMB 256->0
- server.go: filer/volume/s3 concurrent limits 64/128->0
- s3.go: concurrentUploadLimitMB 128->0
- filer.go: concurrentUploadLimitMB 128->0, s3.concurrentUploadLimitMB 128->0

Users can still set explicit limits if needed for resource management.
2025-12-10 14:12:51 -08:00
msementsov
c0dad091f1 Separate vacuum speed from replication speed (#7632) 2025-12-05 12:24:38 -08:00
Chris Lu
5ed0b00fb9 Support separate volume server ID independent of RPC bind address (#7609)
* pb: add id field to Heartbeat message for stable volume server identification

This adds an 'id' field to the Heartbeat protobuf message that allows
volume servers to identify themselves independently of their IP:port address.

Ref: https://github.com/seaweedfs/seaweedfs/issues/7487

* storage: add Id field to Store struct

Add Id field to Store struct and include it in CollectHeartbeat().
The Id field provides a stable volume server identity independent of IP:port.

Ref: https://github.com/seaweedfs/seaweedfs/issues/7487

* topology: support id-based DataNode identification

Update GetOrCreateDataNode to accept an id parameter for stable node
identification. When id is provided, the DataNode can maintain its identity
even when its IP address changes (e.g., in Kubernetes pod reschedules).

For backward compatibility:
- If id is provided, use it as the node ID
- If id is empty, fall back to ip:port

Ref: https://github.com/seaweedfs/seaweedfs/issues/7487

* volume: add -id flag for stable volume server identity

Add -id command line flag to volume server that allows specifying a stable
identifier independent of the IP address. This is useful for Kubernetes
deployments with hostPath volumes where pods can be rescheduled to different
nodes while the persisted data remains on the original node.

Usage: weed volume -id=node-1 -ip=10.0.0.1 ...

If -id is not specified, it defaults to ip:port for backward compatibility.

Fixes https://github.com/seaweedfs/seaweedfs/issues/7487

* server: add -volume.id flag to weed server command

Support the -volume.id flag in the all-in-one 'weed server' command,
consistent with the standalone 'weed volume' command.

Usage: weed server -volume.id=node-1 ...

Ref: https://github.com/seaweedfs/seaweedfs/issues/7487

* topology: add test for id-based DataNode identification

Test the key scenarios:
1. Create DataNode with explicit id
2. Same id with different IP returns same DataNode (K8s reschedule)
3. IP/PublicUrl are updated when node reconnects with new address
4. Different id creates new DataNode
5. Empty id falls back to ip:port (backward compatibility)

Ref: https://github.com/seaweedfs/seaweedfs/issues/7487

* pb: add address field to DataNodeInfo for proper node addressing

Previously, DataNodeInfo.Id was used as the node address, which worked
when Id was always ip:port. Now that Id can be an explicit string,
we need a separate Address field for connection purposes.

Changes:
- Add 'address' field to DataNodeInfo protobuf message
- Update ToDataNodeInfo() to populate the address field
- Update NewServerAddressFromDataNode() to use Address (with Id fallback)
- Fix LookupEcVolume to use dn.Url() instead of dn.Id()

Ref: https://github.com/seaweedfs/seaweedfs/issues/7487

* fix: trim whitespace from volume server id and fix test

- Trim whitespace from -id flag to treat ' ' as empty
- Fix store_load_balancing_test.go to include id parameter in NewStore call

Ref: https://github.com/seaweedfs/seaweedfs/issues/7487

* refactor: extract GetVolumeServerId to util package

Move the volume server ID determination logic to a shared utility function
to avoid code duplication between volume.go and rack.go.

Ref: https://github.com/seaweedfs/seaweedfs/issues/7487

* fix: improve transition logic for legacy nodes

- Use exact ip:port match instead of net.SplitHostPort heuristic
- Update GrpcPort and PublicUrl during transition for consistency
- Remove unused net import

Ref: https://github.com/seaweedfs/seaweedfs/issues/7487

* fix: add id normalization and address change logging

- Normalize id parameter at function boundary (trim whitespace)
- Log when DataNode IP:Port changes (helps debug K8s pod rescheduling)

Ref: https://github.com/seaweedfs/seaweedfs/issues/7487
2025-12-02 22:08:11 -08:00
chrislu
6201cd099e fix help messages 2025-11-10 17:31:11 -08:00
Lisandro Pin
f466ff1412 Nit: use time.Durations instead of constants in seconds. (#7438)
Nit: use `time.Durations` instead of constants in seconds. Makes for slightly more readable code.
2025-11-04 13:02:22 -08:00
Chris Lu
5ab49e2971 Adjust cli option (#7418)
* adjust "weed benchmark" CLI to use readOnly/writeOnly

* consistently use "-master" CLI option

* If both -readOnly and -writeOnly are specified, the current logic silently allows it with -writeOnly taking precedence. This is confusing and could lead to unexpected behavior.
2025-10-31 17:08:00 -07:00
zuzuviewer
8fa1a69f8c * Fix undefined http serve behaiver (#6943) 2025-07-07 22:48:12 -07:00
Konstantin Lebedev
93007c1842 [volume] refactor and add metrics for flight upload and download data limit condition (#6920)
* refactor concurrentDownloadLimit

* fix loop

* fix cmdServer

* fix: resolve conversation pr 6920

* Changes logging function (#6919)

* updated logging methods for stores

* updated logging methods for stores

* updated logging methods for filer

* updated logging methods for uploader and http_util

* updated logging methods for weed server

---------

Co-authored-by: akosov <a.kosov@kryptonite.ru>

* Improve lock ring (#6921)

* fix flaky lock ring test

* add more tests

* fix: build

* fix: rm import util/version

* fix: serverOptions

* refactoring

---------

Co-authored-by: Aleksey Kosov <rusyak777@list.ru>
Co-authored-by: akosov <a.kosov@kryptonite.ru>
Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>
Co-authored-by: chrislu <chris.lu@gmail.com>
2025-07-02 18:03:49 -07:00
chrislu
bd4891a117 change version directory 2025-06-03 22:46:10 -07:00
Konstantin Lebedev
b65eb2ec45 [security] reload whiteList on http seerver (#6302)
* reload whiteList

* white_list add to scaffold
2024-12-02 10:38:10 -08:00
vadimartynov
b796c21fa9 Added loadSecurityConfigOnce (#5792) 2024-07-16 09:15:55 -07:00
vadimartynov
ec9e7493b3 -metricsIp cmd flag (#5773)
* Added/Updated:
- Added metrics ip options for all servers;
- Fixed a bug with the selection of the binIp or ip parameter for the metrics handler;

* Fixed cmd flags
2024-07-12 10:56:26 -07:00
Konstantin Lebedev
2b3e39397e fix: skipping checking active volumes with the same number of files at the moment (#4893)
* fix: skipping checking active volumes with the same number of files at the moment
 https://github.com/seaweedfs/seaweedfs/issues/4140

* refactor with comments
https://github.com/seaweedfs/seaweedfs/issues/4140

* add TestShouldSkipVolume

---------

Co-authored-by: Konstantin Lebedev <9497591+kmlebedev@users.noreply.github.co>
2023-10-09 09:57:26 -07:00
Jiffs Maverick
4b0430e71d [metrics] Add the ability to control bind ip (#4012) 2022-11-24 10:22:59 -08:00
Guo Lei
5b905fb2b7 Lazy loading (#3958)
* types packages is imported more than onece

* lazy-loading

* fix bugs

* fix bugs

* fix unit tests

* fix test error

* rename function

* unload ldb after initial startup

* Don't load ldb when starting volume server if ldbtimeout is set.

* remove uncessary unloadldb

* Update weed/command/server.go

Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>

* Update weed/command/volume.go

Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>

Co-authored-by: guol-fnst <goul-fnst@fujitsu.com>
Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>
2022-11-14 00:19:27 -08:00
famosss
cacc3e883b volume server:set the default value of "hasSlowRead" to true (#3710)
* simplify a bit

* feat: volume: add "readBufSize" option to customize read optimization

* refactor : redbufSIze -> readBufferSize

* simplify a bit

* simplify a bit

* volume server:set the default value of "hasSlowRead" to true
2022-10-12 21:13:26 -07:00
Konstantin Lebedev
301b678147 [volume] Add new volumes to HUP(reload) signal (#3755)
Add new volumes to HUP(reload) signal
2022-09-28 12:44:13 -07:00
chrislu
10d5b4b32b volume server: rename readBufferSize to readBufferSizeMB 2022-09-17 10:56:28 -07:00
famosss
d949a238b8 volume: add "readBufSize" option to customize read optimization (#3702)
* simplify a bit

* feat: volume: add "readBufSize" option to customize read optimization

* refactor : redbufSIze -> readBufferSize

* simplify a bit

* simplify a bit
2022-09-16 00:30:40 -07:00
chrislu
cf90f76a35 mark "hasSlowRead" as experimental 2022-09-15 23:33:46 -07:00
chrislu
896a85d6e4 volume: add "hasSlowRead" option to customize read optimization 2022-09-15 03:11:32 -07:00
chrislu
9b084d4c88 purge tcp implementation 2022-09-08 18:03:43 -07:00
chrislu
3f3a1341d8 make CodeQL happy 2022-08-26 17:09:11 -07:00
chrislu
67814a5c79 refactor and fix strings.Split 2022-08-07 01:34:32 -07:00
chrislu
26dbc6c905 move to https://github.com/seaweedfs/seaweedfs 2022-07-29 00:17:28 -07:00
liubaojiang
076e48a676 add inflight upload data wait timeout 2022-05-21 10:38:08 +08:00
guol-fnst
076595fbdd just exit in case of duplicated volume directories were loaded 2022-05-17 15:41:49 +08:00
Berck Nash
9b14f0c81a Add mTLS support for both master and volume http server. 2022-03-16 09:52:17 -06:00
chrislu
3a6eb8ca5f default bind to one ip address
fix https://github.com/chrislusf/seaweedfs/issues/1937
2022-03-11 14:02:39 -08:00
chrislu
18543c6e8b minor 2022-02-27 23:11:09 -08:00
Konstantin Lebedev
526094d2da StopTimeout 30 sec 2022-02-14 21:42:27 +05:00
Konstantin Lebedev
275e9a4e86 reduce to default http server KillTimeout and StopTimeout 2022-02-14 21:38:24 +05:00
chrislu
8907e6a40a add more help messages 2022-01-13 13:03:04 -08:00
Chris Lu
52fe86df45 use default 10000 for grpc port 2021-09-20 14:05:59 -07:00
Chris Lu
e5fc35ed0c change server address from string to a type 2021-09-12 22:47:52 -07:00
Chris Lu
e690a2be16 custom grpc port: volume server 2021-09-12 02:25:15 -07:00
Chris Lu
574485ec69 better IP v6 support 2021-09-07 19:29:42 -07:00
Chris Lu
734c980040 volume: support concurrent download data size limit 2021-08-08 23:25:16 -07:00
Chris Lu
49c66e88a0 volume: change all writes to fsync during graceful stopping
fix https://github.com/chrislusf/seaweedfs/issues/2193
2021-07-13 01:29:57 -07:00