volume: require admin auth and refuse loopback endpoints in FetchAndWriteNeedle
Gate the RPC behind checkGrpcAdminAuth for parity with the rest of the
destructive volume-server RPCs, and reject cluster-internal remote S3
endpoints (loopback / link-local / IMDS / RFC 1918 / CGNAT) before
dialing. Pin the validated address against DNS rebinding by routing the
AWS SDK through an HTTP transport whose DialContext re-resolves the host
and re-applies the deny list on every dial, so an endpoint that resolves
to a public IP at validate-time and then flips to 127.0.0.1 at connect
time is refused. Operators that legitimately fetch from private hosts
can opt out with -volume.allowUntrustedRemoteEndpoints.
* refactor(command): expand "~" in all path-style CLI flags
Many of weed's path-bearing flags (-s3.config, -s3.iam.config,
-admin.dataDir, -webdav.cacheDir, -volume.dir.idx, TLS cert/key
files, profile output paths, mount cache dirs, sftp key files, ...)
were never run through util.ResolvePath, so a value like "~/iam.json"
was used literally. The tilde only worked when the shell expanded it,
and shell expansion silently fails for the common -flag=~/path form
(bash leaves the tilde literal inside an --opt=~/path argument).
- Extend util.ResolvePath to also handle "~user" / "~user/rest",
matching shell tilde expansion. Add unit tests.
- Apply util.ResolvePath at the top of each shared start* function
(s3, webdav, sftp) so mini/server/filer/standalone callers all
inherit it; resolve at the few one-off use sites (mount cache
dirs, volume idx folder, mini admin.dataDir, profile paths).
- Drop the duplicate expandHomeDir helper from admin.go in favor of
the now-equivalent util.ResolvePath.
* fixup: handle comma-separated -dir flags for tilde expansion
`weed mini -dir`, `weed server -dir`, and `weed volume -dir` accept
comma-separated paths (`dir[,dir]...`). Calling util.ResolvePath on
the whole string mishandled multi-folder values with tilde, e.g.
"~/d1,~/d2" would resolve as if "d1,~/d2" were a single subpath.
- Add util.ResolveCommaSeparatedPaths: split on ",", run each entry
through ResolvePath, rejoin. Short-circuits when no "~" present.
- Use it for *miniDataFolders (mini.go), *volumeDataFolders (server.go),
and resolve each entry of v.folders in-place (volume.go) so all
downstream consumers see resolved paths.
- Add 7-case TestResolveCommaSeparatedPaths covering empty, single,
multiple, and mixed inputs.
* address PR review: metaFolder + Windows backslash
- master.go: resolve *m.metaFolder at the top of runMaster so
util.FullPath(*m.metaFolder) on the next line sees an expanded
path. Drop the now-redundant ResolvePath in TestFolderWritable.
- server.go: same treatment for *masterOptions.metaFolder, paired
with the existing cpu/mem profile resolves. Drop the redundant
inner ResolvePath at TestFolderWritable.
- file_util.go: ResolvePath now accepts filepath.Separator as a
separator after the tilde, so "~\\data" works on Windows. Other
platforms keep the current behaviour (the backslash stays literal there,
since it is a valid character in Unix filenames and usernames).
- file_util_test.go: add two cases using filepath.Separator that
exercise the new code path on Windows and remain a no-op on Unix.
* address PR review: resolve "~" in remaining command path flags
Comprehensive sweep of path-bearing flags across every weed
subcommand, applying util.ResolvePath in-place at the top of each
run* function so all downstream consumers see expanded paths.
- webdav.go: resolve *wo.cacheDir at the top of startWebDav so
mini/server/filer/standalone callers all inherit it.
- mount_std.go: cpu/mem profile paths.
- filer_sync.go: cpu/mem profile paths.
- mq_broker.go: cpu/mem profile paths.
- benchmark.go: cpuprofile output path.
- backup.go: -dir resolved once at runBackup; drop the duplicated
inline ResolvePath in NewVolume calls.
- compact.go: -dir resolved at runCompact; drop inline ResolvePath.
- export.go: -dir and -o resolved at runExport; drop inline
ResolvePath in LoadFromIdx and ScanVolumeFile.
- download.go: -dir resolved at runDownload; drop inline.
- update.go: -dir resolved at runUpdate so filepath.Join uses the
expanded path; drop inline ResolvePath in TestFolderWritable.
- scaffold.go: -output expanded before filepath.Join.
- worker.go: -workingDir expanded before being passed to runtime.
* address PR review: resolve option-struct paths at run* entry points
server.go:381 propagates s3Options.config to filerOptions.s3ConfigFile
*before* startS3Server runs, which meant the filer-side code saw the
unresolved tilde-prefixed pointer. Same pattern for webdavOptions and
sftpOptions (and equivalent in mini.go / filer.go).
The fix: hoist resolution from the shared start* functions up to the
run* entry points, where every shared pointer is set up before any
propagation happens.
- s3.go, webdav.go, sftp.go: extract a resolvePaths() method on each
Options struct that runs every path field through util.ResolvePath
in-place. Idempotent.
- runS3, runWebDav, runSftp: call the standalone struct's resolvePaths
before starting metrics / loading security config.
- runServer, runMini, runFiler: call resolvePaths on every embedded
options struct, plus resolve loose flags (serverIamConfig,
miniS3Config, miniIamConfig, miniMasterOptions.metaFolder, and
filer's defaultLevelDbDirectory) so they're expanded before any
pointer copy or use.
- Drop the now-redundant inline ResolvePath at filer's
defaultLevelDbDirectory composition.
* address PR review: re-resolve mini -dir post-config, cover misc paths
- mini.go: applyConfigFileOptions can overwrite -dir with a literal
~/data from mini.options. Re-resolve *miniDataFolders after the
config-file apply, alongside the other path resolves, so the mini
filer no longer ends up with a literal ~/data/filerldb2.
- benchmark.go: resolve *b.idListFile (-list).
- filer_sync.go: resolve *syncOptions.aSecurity / .bSecurity
(-a.security / -b.security) before LoadClientTLSFromFile.
- filer_cat.go: resolve *filerCat.output (-o) before os.OpenFile.
- admin.go: drop trailing blank line at EOF (git diff --check).
* address PR review: resolve -a.security/-b.security/-config before use
Four follow-up fixes:
- filer_sync.go: the -a.security / -b.security resolves were placed
*after* LoadClientTLSFromFile / LoadHTTPClientFromFile were called,
so weed filer.sync -a.security=~/a.toml still passed the literal
tilde path. Hoist the resolves above the security-loading block so
TLS clients see expanded paths.
- filer_sync_verify.go: same flag pair was never resolved at all in
the verify command; resolve at the top of runFilerSyncVerify.
- filer_meta_backup.go: -config (the backup_filer.toml path) was
passed directly to viper. Resolve at the top of runFilerMetaBackup.
- mini.go: master.dir defaulted to the entire comma-joined
miniDataFolders. With weed mini -dir=~/d1,~/d2 (or any multi-dir
setup), TestFolderWritable then stat'd the joined string instead
of a single directory. Default to the first entry via StringSplit
to mirror the disk-space calculation a few lines below, and drop
the now-redundant ResolvePath in TestFolderWritable.
* fix(weed/command): address unhandled errors
* fix(command): don't log graceful-shutdown sentinels; plug response-body leak
- s3: Serve on unix socket treated http.ErrServerClosed as fatal; now
excluded like the other Serve/ServeTLS paths in this file.
- mq_agent, mq_broker: filter grpc.ErrServerStopped so clean shutdown
doesn't log as an error.
- worker_runtime: the added decodeErr early-continue skipped
resp.Body.Close(); drop it since the existing check below already
surfaces the decode error.
- mount_std: the pre-mount Unmount commonly fails when nothing is
mounted; demote to V(1) Infof.
- fuse_std: tidy panic message to match sibling cases.
* fix(mq_broker): filter grpc.ErrServerStopped on localhost listener
The localhost listener goroutine logged any Serve error unconditionally,
which includes grpc.ErrServerStopped on graceful shutdown. Match the
main listener's check so clean stops don't surface as errors.
---------
Co-authored-by: Chris Lu <chris.lu@gmail.com>
* feat(security): hot-reload HTTPS certs for master/volume/filer/webdav/admin
S3 and filer already use a refreshing pemfile provider for their HTTPS
cert, so rotated certificates (e.g. from k8s cert-manager) are picked up
without a restart. Master, volume, webdav, and admin, however, passed
cert/key paths straight to ServeTLS/ListenAndServeTLS and loaded once at
startup — rotating those certs required a pod restart.
Add a small helper NewReloadingServerCertificate in weed/security that
wraps pemfile.Provider and returns a tls.Config.GetCertificate closure,
then wire it into the four remaining HTTPS entry points. httpdown now
also calls ServeTLS when TLSConfig carries a GetCertificate/Certificates
but CertFile/KeyFile are empty, so volume server can pre-populate
TLSConfig.
A unit test exercises the rotation path (write cert, rotate on disk,
assert the callback returns the new cert) with a short refresh window.
* refactor(security): route filer/s3 HTTPS through the shared cert reloader
Before: filer.go and s3.go each kept a *certprovider.Provider on the
options struct plus a duplicated GetCertificateWithUpdate method. Both
were loading pemfile themselves. Behaviorally they already reloaded, but
the logic was duplicated two ways and neither path was shared with the
newly-added master/volume/webdav/admin wiring.
After: both use security.NewReloadingServerCertificate like the other
servers. The per-struct certProvider field and GetCertificateWithUpdate
method are removed, along with the now-unused certprovider and pemfile
imports. Net: -32 lines, one code path for all HTTPS cert reloading.
No behavior change — the refresh window, cache, and handshake contract
are identical (the helper wraps the same pemfile.NewProvider).
* feat(security): hot-reload HTTPS client certs for mount/backup/upload/etc
The HTTP client in weed/util/http/client loaded the mTLS client cert
once at startup via tls.LoadX509KeyPair. That left every long-lived
HTTPS client process (weed mount, backup, filer.copy, filer→volume,
s3→filer/volume) unable to pick up a rotated client cert without a
restart — even though the same cert-manager setup was already rotating
the server side fine.
Swap the client cert loader for a tls.Config.GetClientCertificate
callback backed by the same refreshing pemfile provider. New TLS
handshakes pick up the rotated cert; in-flight pooled connections keep
their old cert and drop as normal transport churn happens.
To keep this reusable from both server and client TLS code without an
import cycle (weed/security already imports weed/util/http/client for
LoadHTTPClientFromFile), extract the pemfile wrapper into a new
weed/security/certreload subpackage. weed/security keeps its thin
NewReloadingServerCertificate wrapper. The existing unit test moves
with the implementation.
gRPC mTLS was already handled by security.LoadServerTLS /
LoadClientTLS; this PR does not change any gRPC paths. MQ broker, MQ
agent, Kafka gateway, and FUSE mount control plane are gRPC-only and
therefore already rotate.
CA bundles (ClientCAs / RootCAs / grpc.ca) are still loaded once — noted
as a known limitation in the wiki.
* fix(security): address PR review feedback on cert reloader
Bots (gemini-code-assist + coderabbit) flagged three real issues and a
couple of nits. Addressing them here:
1. KeyMaterial used context.Background(). The grpc pemfile provider's
KeyMaterial blocks until material arrives or the context deadline
expires; with Background() a slow disk could hang the TLS handshake
indefinitely. Switched both the server and client callbacks to use
hello.Context() / cri.Context() so a stuck read is bounded by the
handshake timeout.
2. Admin server loaded TLS inside the serve goroutine. If the cert was
bad, the goroutine returned but startAdminServer kept blocking on
<-ctx.Done() with no listener, making the process look healthy with
nothing bound. Moved TLS setup to run before the goroutine starts
and propagate errors via fmt.Errorf; also captures the provider and
defers Close().
3. HTTP client discarded the certprovider.Provider from
NewClientGetCertificate. That leaked the refresh goroutine, and
NewHttpClientWithTLS had a worse case where a CA-file failure after
provider creation orphaned the provider entirely. Added a
certProvider field and a Close() method on HTTPClient, and made
the constructors close the provider on subsequent error paths.
4. Server-side paths (master/volume/filer/s3/webdav/admin) now retain
the provider. filer and webdav run ServeTLS synchronously, so a
plain defer works. master/volume/s3 dispatch goroutines and return
while the server keeps running, so they hook Close() into
grace.OnInterrupt.
5. Test: certreload_test now tolerates transient read/parse errors
during file rotation (writeSelfSigned rewrites cert before key) and
reports the last error only if the deadline expires.
No user-visible behavior change for the happy path.
* test(tls): add end-to-end HTTPS cert rotation integration test
Boots a real `weed master` with HTTPS enabled, captures the leaf cert
served at TLS handshake time, atomically rewrites the cert/key files
on disk (the same rename-in-place pattern kubelet does when it swaps
a cert-manager Secret), and asserts that a subsequent TLS handshake
observes the rotated leaf — with no process restart, no SIGHUP, no
reloader sidecar. Verifies the full path: on-disk change → pemfile
refresh tick → provider.KeyMaterial → tls.Config.GetCertificate →
server TLS handshake.
Runtime is ~1s, achieved by exposing the reloader's refresh window as
an env var (WEED_TLS_CERT_REFRESH_INTERVAL) and setting it to 500ms in
the test. The same env var is user-facing — documented in the wiki — so
operators running short-lived certs (Vault, cert-manager with
duration: 24h, etc.) can tighten the rotation-pickup window without a
rebuild. Defaults to 5h to preserve prior behavior.
security.CredRefreshingInterval is kept for API compatibility but now
aliases certreload.DefaultRefreshInterval so the same env controls
both gRPC mTLS and HTTPS reload.
* ci(tls): wire the TLS rotation integration test into GitHub Actions
Mirrors the existing vacuum-integration-tests.yml shape: Ubuntu runner,
Go 1.25, build weed, run `go test` in test/tls_rotation, upload master
logs on failure. 10-minute job timeout; the test itself finishes in
about a second because WEED_TLS_CERT_REFRESH_INTERVAL is set to 500ms
inside the test.
Runs on every push to master and on every PR to master.
* fix(tls): address follow-up PR review comments
Three new comments on the integration test + volume shutdown path:
1. Test: peekServerCert was swallowing every dial/handshake error,
which meant waitForCert's "last err: <nil>" fatal message lost all
diagnostic value. Thread errors back through: peekServerCert now
returns (*x509.Certificate, error), and waitForCert records the
latest error so a CI flake points at the actual cause (master
didn't come up, handshake rejected, CA pool mismatch, etc.).
2. Test: set HOME=<tempdir> on the master subprocess. Viper today
registers the literal path "$HOME/.seaweedfs" without env
expansion, so a developer's ~/.seaweedfs/security.toml is
accidentally invisible — the test was relying on that. Pinning
HOME is belt-and-braces against a future viper upgrade that does
expand env vars.
3. volume.go: startClusterHttpService's provider close was registered
via grace.OnInterrupt, which fires on SIGTERM but NOT on the
v.shutdownCtx.Done() path used by mini / integration tests. The
pemfile refresh goroutine leaked in that shutdown path. Now the
helper returns a close func and the caller invokes it on BOTH
shutdown paths for parity.
Also add MinVersion: TLS 1.2 to the test's tls.Config to quiet the
ast-grep static-analysis nit — zero-risk since the pool only trusts
our in-memory CA.
Test runs clean 3/3.
* fix(mini): shut down admin/s3/webdav/filer before volume/master on Ctrl+C
Interrupts fired grace hooks in registration order, so master (started
first) shut down before its clients, producing heartbeat-canceled errors
and masterClient reconnection noise during weed mini shutdown. Admin/s3/
webdav had no interrupt hooks at all and were killed at os.Exit.
- grace: execute interrupt hooks in LIFO (defer-style) order so later-
started services tear down first.
- filer: consolidate the three separate interrupt hooks (gRPC / HTTP /
DB) into one that runs in order, so filer shutdown stays correct
independent of FIFO/LIFO semantics.
- mini: add MiniClientsShutdownCtx (separate from test-facing
MiniClusterCtx) plus an OnMiniClientsShutdown helper. Admin, S3,
WebDAV and the maintenance worker observe it; runMini registers a
cancel hook after startup so under LIFO it fires first and waits up to
10s on a WaitGroup for those services to drain before filer, volume,
and master shut down.
Resulting order on Ctrl+C: admin/s3/webdav/worker -> filer (gRPC -> HTTP
-> DB) -> volume -> master.
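The LIFO semantics amount to a reverse walk over the registered hooks (minimal sketch of the grace change):

```go
package main

// runHooksLIFO executes interrupt hooks in reverse registration order,
// defer-style, so later-started services tear down before the services
// they depend on.
func runHooksLIFO(hooks []func()) {
	for i := len(hooks) - 1; i >= 0; i-- {
		hooks[i]()
	}
}
```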
* refactor(mini): group mini-client shutdown into one state struct
The first pass spread the shutdown plumbing across three globals
(MiniClientsShutdownCtx, miniClientsWg, cancelMiniClients) and two
ctx-derivation sites (OnMiniClientsShutdown and startMiniAdminWithWorker).
Group into a private miniClientsState (ctx/cancel/wg) rebuilt per runMini
invocation, and chain its ctx from MiniClusterCtx so clients only observe
one signal. Tests that cancel MiniClusterCtx still trigger client
shutdown via parent-child propagation.
- resetMiniClients() installs fresh state at the top of runMini, so
in-process test reruns don't inherit stale ctx/wg.
- onMiniClientsShutdown(fn) replaces the exported OnMiniClientsShutdown
and only observes one ctx.
- trackMiniClient() replaces the manual wg.Add/Done dance for the admin
goroutine.
- miniClientsCtx() gives the admin startup a ctx without re-deriving.
- triggerMiniClientsShutdown(timeout) is the interrupt hook body.
No behaviour change; existing tests pass.
* refactor: generalize shutdown ctx as an option, not a mini-specific helper
Several service files (s3, webdav, filer, master, volume) observed the
mini-specific MiniClusterCtx or called onMiniClientsShutdown directly.
That leaked mini orchestration into code that also runs under weed s3,
weed webdav, weed filer, weed master, and weed volume standalone.
Replace with a generic `shutdownCtx context.Context` field on each
service's Options struct. When non-nil, the server watches it and shuts
down gracefully; when nil (standalone), the shutdown path is a no-op.
Mini wires the contexts up from a single place (runMini):
- miniMasterOptions/miniOptions.v/miniFilerOptions.shutdownCtx =
MiniClusterCtx (drives test-triggered teardown)
- miniS3Options/miniWebDavOptions.shutdownCtx = miniClientsCtx() (drives
Ctrl+C teardown before filer/volume/master)
All knowledge of MiniClusterCtx now lives in mini.go.
* fix(mini): stop worker before clients ctx so admin shutdown isn't blocked
Symptom on Ctrl+C of a clean weed mini: mini's Shutting down admin/s3/
webdav hook sat for 10s then logged "timed out". Admin had started its
shutdown but was blocked inside StopWorkerGrpcServer's GracefulStop,
waiting for the still-connected worker stream. That in turn left filer
clients connected and cascaded into filer's own 10s gRPC graceful-stop
timeout.
Two causes, both fixed:
1. worker.Stop() deadlocked on clean shutdown. It sent ActionStop (which
makes managerLoop `break out` and exit), then called getTaskLoad()
which sends to the same unbuffered cmd channel — no receiver, hangs
forever. Reorder Stop() to snapshot the admin client and drain tasks
BEFORE sending ActionStop, and call Disconnect() via the local
snapshot afterwards.
2. Worker's taskRequestLoop raced with Disconnect(): RequestTask reads
from c.incoming, which Disconnect closes, yielding a nil response and
a panic on response.Message. Handle the closed channel explicitly.
3. Mini now has a preCancel phase (beforeMiniClientsShutdown) that runs
synchronously BEFORE the clients ctx is cancelled. Register worker
shutdown there so admin's worker-gRPC GracefulStop finds the worker
already disconnected and returns immediately, instead of waiting on
a stream that is about to close anyway.
Observed shutdown of a clean mini: admin/s3/webdav down in <10ms; full
process exit in ~11s (the remaining 10s is a pre-existing filer gRPC
graceful-stop timeout, not cascaded from the clients tier).
* feat(mini): cap filer gRPC graceful stop at 1s under weed mini
Full weed mini shutdown was ~11s on a clean exit, dominated by the
filer's default 10s gRPC GracefulStop timeout while background
SubscribeLocalMetadata streams drained.
Expose the timeout as a FilerOptions.gracefulStopTimeout field (default
10s for standalone weed filer) and set it to 1s in mini. Clean weed mini
shutdown now takes ~2s.
* Use Unix sockets for gRPC between co-located services in mini mode
In `weed mini`, all services run in one process. Previously, inter-service
gRPC traffic (volume↔master, filer↔master, S3↔filer, worker↔admin, etc.)
went through TCP loopback. This adds a gRPC Unix socket registry in the pb
package: mini mode registers a socket path per gRPC port at startup, each
gRPC server additionally listens on its socket, and GrpcDial transparently
routes to the socket via WithContextDialer when a match is found.
Standalone commands (weed master, weed filer, etc.) are unaffected since
no sockets are registered. TCP listeners are kept for external clients.
* Handle Serve error and clean up socket file in ServeGrpcOnLocalSocket
Log non-expected errors from grpcServer.Serve (ignoring
grpc.ErrServerStopped) and always remove the Unix socket file
when Serve returns, ensuring cleanup on Stop/GracefulStop.
* Add iceberg_maintenance plugin worker handler (Phase 1)
Implement automated Iceberg table maintenance as a new plugin worker job
type. The handler scans S3 table buckets for tables needing maintenance
and executes operations in the correct Iceberg order: expire snapshots,
remove orphan files, and rewrite manifests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Add data file compaction to iceberg maintenance handler (Phase 2)
Implement bin-packing compaction for small Parquet data files:
- Enumerate data files from manifests, group by partition
- Merge small files using parquet-go (read rows, write merged output)
- Create new manifest with ADDED/DELETED/EXISTING entries
- Commit new snapshot with compaction metadata
Add 'compact' operation to maintenance order (runs before expire_snapshots),
configurable via target_file_size_bytes and min_input_files thresholds.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Fix memory exhaustion in mergeParquetFiles by processing files sequentially
Previously all source Parquet files were loaded into memory simultaneously,
risking OOM when a compaction bin contained many small files. Now each file
is loaded, its rows are streamed into the output writer, and its data is
released before the next file is loaded — keeping peak memory proportional
to one input file plus the output buffer.
* Validate bucket/namespace/table names against path traversal
Reject names containing '..', '/', or '\' in Execute to prevent
directory traversal via crafted job parameters.
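The check amounts to (illustrative function name, not the actual handler code):

```go
package main

import "strings"

// isSafeName rejects path-traversal attempts in job-supplied
// bucket/namespace/table names: no "..", no '/' or '\'.
func isSafeName(name string) bool {
	return !strings.Contains(name, "..") && !strings.ContainsAny(name, `/\`)
}
```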
* Add filer address failover in iceberg maintenance handler
Try each filer address from cluster context in order instead of only
using the first one. This improves resilience when the primary filer
is temporarily unreachable.
* Add separate MinManifestsToRewrite config for manifest rewrite threshold
The rewrite_manifests operation was reusing MinInputFiles (meant for
compaction bin file counts) as its manifest count threshold. Add a
dedicated MinManifestsToRewrite field with its own config UI section
and default value (5) so the two thresholds can be tuned independently.
* Fix risky mtime fallback in orphan removal that could delete new files
When entry.Attributes is nil, mtime defaulted to Unix epoch (1970),
which would always be older than the safety threshold, causing the
file to be treated as eligible for deletion. Skip entries with nil
Attributes instead, matching the safer logic in operations.go.
* Fix undefined function references in iceberg_maintenance_handler.go
Use the exported function names (ShouldSkipDetectionByInterval,
BuildDetectorActivity, BuildExecutorActivity) matching their
definitions in vacuum_handler.go.
* Remove duplicated iceberg maintenance handler in favor of iceberg/ subpackage
The IcebergMaintenanceHandler and its compaction code in the parent
pluginworker package duplicated the logic already present in the
iceberg/ subpackage (which self-registers via init()). The old code
lacked stale-plan guards, proper path normalization, CAS-based xattr
updates, and error-returning parseOperations.
Since the registry pattern (default "all") makes the old handler
unreachable, remove it entirely. All functionality is provided by
iceberg.Handler with the reviewed improvements.
* Fix MinManifestsToRewrite clamping to match UI minimum of 2
The clamp reset values below 2 to the default of 5, contradicting the
UI's advertised MinValue of 2. Clamp to 2 instead.
* Sort entries by size descending in splitOversizedBin for better packing
Entries arrived in whatever order Go's map iteration produced, which
is non-deterministic. Sorting largest-first before the splitting loop
improves bin packing efficiency by filling bins more evenly.
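A simplified sketch of the deterministic largest-first packing (the real splitOversizedBin additionally enforces a minFiles floor per bin; types and names here are illustrative):

```go
package main

import "sort"

type fileEntry struct {
	Path string
	Size int64
}

// packBins sorts entries largest-first -- making the split deterministic
// regardless of map iteration order -- then greedily fills bins up to
// targetSize, starting a new bin when the next entry would overflow.
func packBins(entries []fileEntry, targetSize int64) [][]fileEntry {
	sort.Slice(entries, func(i, j int) bool { return entries[i].Size > entries[j].Size })
	var bins [][]fileEntry
	var cur []fileEntry
	var curSize int64
	for _, e := range entries {
		if len(cur) > 0 && curSize+e.Size > targetSize {
			bins = append(bins, cur)
			cur, curSize = nil, 0
		}
		cur = append(cur, e)
		curSize += e.Size
	}
	if len(cur) > 0 {
		bins = append(bins, cur)
	}
	return bins
}
```

Every entry lands in exactly one bin, so no file is dropped.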
* Add context cancellation check to drainReader loop
The row-streaming loop in drainReader did not check ctx between
iterations, making long compaction merges uncancellable. Check
ctx.Done() at the top of each iteration.
* Fix splitOversizedBin to always respect targetSize limit
The minFiles check in the split condition allowed bins to grow past
targetSize when they had fewer than minFiles entries, defeating the
OOM protection. Now bins always split at targetSize, and a trailing
runt with fewer than minFiles entries is merged into the previous bin.
* Add integration tests for iceberg table maintenance plugin worker
Tests start a real weed mini cluster, create S3 buckets and Iceberg
table metadata via filer gRPC, then exercise the iceberg.Handler
operations (ExpireSnapshots, RemoveOrphans, RewriteManifests) against
the live filer. A full maintenance cycle test runs all operations in
sequence and verifies metadata consistency.
Also adds exported method wrappers (testing_api.go) so the integration
test package can call the unexported handler methods.
* Fix splitOversizedBin dropping files and add source path to drainReader errors
The runt-merge step could leave leading bins with fewer than minFiles
entries (e.g. [80,80,10,10] with targetSize=100, minFiles=2 would drop
the first 80-byte file). Replace the filter-based approach with an
iterative merge that folds any sub-minFiles bin into its smallest
neighbor, preserving all eligible files.
Also add the source file path to drainReader error messages so callers
can identify which Parquet file caused a read/write failure.
* Harden integration test error handling
- s3put: fail immediately on HTTP 4xx/5xx instead of logging and
continuing
- lookupEntry: distinguish NotFound (return nil) from unexpected RPC
errors (fail the test)
- writeOrphan and orphan creation in FullMaintenanceCycle: check
CreateEntryResponse.Error in addition to the RPC error
* go fmt
---------
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Add volume dir tags to topology
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add preferred tag config for EC
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Prioritize EC destinations by tags
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add EC placement planner tag tests
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Refactor EC placement tests to reuse buildActiveTopology
Remove buildActiveTopologyWithDiskTags helper function and consolidate
tag setup inline in test cases. Tests now use UpdateTopology to apply
tags after topology creation, reusing the existing buildActiveTopology
function rather than duplicating its logic.
All tag scenario tests pass:
- TestECPlacementPlannerPrefersTaggedDisks
- TestECPlacementPlannerFallsBackWhenTagsInsufficient
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Consolidate normalizeTagList into shared util package
Extract normalizeTagList from three locations (volume.go,
detection.go, erasure_coding_handler.go) into new weed/util/tag.go
as exported NormalizeTagList function. Replace all duplicate
implementations with imports and calls to util.NormalizeTagList.
This improves code reuse and maintainability by centralizing
tag normalization logic.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add PreferredTags to EC config persistence
Add preferred_tags field to ErasureCodingTaskConfig protobuf with field
number 5. Update GetConfigSpec to include preferred_tags field in the
UI configuration schema. Add PreferredTags to ToTaskPolicy to serialize
config to protobuf. Add PreferredTags to FromTaskPolicy to deserialize
from protobuf with defensive copy to prevent external mutation.
This allows EC preferred tags to be persisted and restored across
worker restarts.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add defensive copy for Tags slice in DiskLocation
Copy the incoming tags slice in NewDiskLocation instead of storing
by reference. This prevents external callers from mutating the
DiskLocation.Tags slice after construction, improving encapsulation
and preventing unexpected changes to disk metadata.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add doc comment to buildCandidateSets method
Document the tiered candidate selection and fallback behavior. Explain
that for a planner with preferredTags, it accumulates disks matching
each tag in order into progressively larger tiers, emits a candidate
set once a tier reaches shardsNeeded, and finally falls back to the
full candidates set if preferred-tag tiers are insufficient.
This clarifies the intended semantics for future maintainers.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
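The tiered selection reads roughly as follows (types and names illustrative, not the actual planner API):

```go
package main

// disk is a minimal stand-in for a topology disk with normalized tags.
type disk struct {
	ID   string
	Tags map[string]bool
}

// buildCandidateTiers walks preferredTags in order, accumulating
// matching disks into a progressively larger tier; it returns the first
// tier large enough for shardsNeeded, else falls back to the full
// candidate list when preferred-tag tiers are insufficient.
func buildCandidateTiers(candidates []disk, preferredTags []string, shardsNeeded int) []disk {
	var tier []disk
	seen := map[string]bool{}
	for _, tag := range preferredTags {
		for _, d := range candidates {
			if d.Tags[tag] && !seen[d.ID] {
				tier = append(tier, d)
				seen[d.ID] = true
			}
		}
		if len(tier) >= shardsNeeded {
			return tier // a preferred-tag tier can hold all shards
		}
	}
	return candidates // fallback: preferred tiers are too small
}
```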
* Apply final PR review fixes
1. Update parseVolumeTags to replicate single tag entry to all folders
instead of leaving some folders with nil tags. This prevents nil
pointer dereferences when processing folders without explicit tags.
2. Add defensive copy in ToTaskPolicy for PreferredTags slice to match
the pattern used in FromTaskPolicy, preventing external mutation of
the returned TaskPolicy.
3. Add clarifying comment in buildCandidateSets explaining that the
shardsNeeded <= 0 branch is a defensive check for direct callers,
since selectDestinations guarantees shardsNeeded > 0.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Fix nil pointer dereference in parseVolumeTags
Ensure all folder tags are initialized to either normalized tags or
empty slices, not nil. When multiple tag entries are provided and there
are more folders than entries, remaining folders now get empty slices
instead of nil, preventing nil pointer dereference in downstream code.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Fix NormalizeTagList to return empty slice instead of nil
Change NormalizeTagList to always return a non-nil slice. When all tags
are empty or whitespace after normalization, return an empty slice
instead of nil. This prevents nil pointer dereferences in downstream
code that expects a valid (possibly empty) slice.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
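The contract can be sketched as below (lowercase name to signal this is a sketch, not the exact util.NormalizeTagList):

```go
package main

import "strings"

// normalizeTagList trims whitespace, drops empty entries, and always
// returns a non-nil slice so downstream code never sees nil.
func normalizeTagList(tags []string) []string {
	out := make([]string, 0, len(tags))
	for _, t := range tags {
		if t = strings.TrimSpace(t); t != "" {
			out = append(out, t)
		}
	}
	return out
}
```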
* Add nil safety check for v.tags pointer
Add a safety check to handle the case where v.tags might be nil,
preventing a nil pointer dereference. If v.tags is nil, use an empty
string instead. This is defensive programming to prevent panics in
edge cases.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add volume.tags flag to weed server and weed mini commands
Add the volume.tags CLI option to both the 'weed server' and 'weed mini'
commands. This allows users to specify disk tags when running the
combined server modes, just like they can with 'weed volume'.
The flag uses the same format and description as the volume command:
comma-separated tag groups per data dir with ':' separators
(e.g. fast:ssd,archive).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Capture the global MiniClusterCtx into a local variable before goroutine/select
evaluation, preventing a nil dereference or data race when the context is reset
to nil after the nil check. Applied to the filer, master, volume, and s3 commands.
- Introduce MiniClusterCtx to coordinate shutdown across mini services
- Update Master, Volume, Filer, S3, and WebDAV servers to respect context cancellation
- Ensure all resources are cleaned up properly during test teardown
- Integrate MiniClusterCtx in s3tables integration tests
* Add consistent -debug and -debug.port flags to commands
Add -debug and -debug.port flags to weed master, weed volume, weed s3,
weed mq.broker, and weed filer.sync commands for consistency with
weed filer.
When -debug is enabled, an HTTP server starts on the specified port
(default 6060) serving runtime profiling data at /debug/pprof/.
For mq.broker, replaced the older -port.pprof flag with the new
-debug and -debug.port pattern for consistency.
* Update weed/util/grace/pprof.go
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Change all concurrentUploadLimitMB and concurrentDownloadLimitMB defaults
from fixed values (64, 128, 256 MB) to 0 (unlimited).
This removes artificial throttling that can limit throughput on high-performance
systems, especially on all-flash setups with many cores.
Files changed:
- volume.go: concurrentUploadLimitMB 256->0, concurrentDownloadLimitMB 256->0
- server.go: filer/volume/s3 concurrent limits 64/128->0
- s3.go: concurrentUploadLimitMB 128->0
- filer.go: concurrentUploadLimitMB 128->0, s3.concurrentUploadLimitMB 128->0
Users can still set explicit limits if needed for resource management.
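The zero-means-unlimited gating amounts to a check like the following sketch (names are illustrative; the real limiters track in-flight request sizes):

```go
package main

// limitExceeded enforces a limit only when the operator sets an
// explicit positive value; 0 (the new default) disables throttling.
func limitExceeded(inFlightBytes, limitMB int64) bool {
	if limitMB <= 0 {
		return false // unlimited
	}
	return inFlightBytes > limitMB*1024*1024
}

func main() {}
```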
* pb: add id field to Heartbeat message for stable volume server identification
This adds an 'id' field to the Heartbeat protobuf message that allows
volume servers to identify themselves independently of their IP:port address.
Ref: https://github.com/seaweedfs/seaweedfs/issues/7487
* storage: add Id field to Store struct
Add Id field to Store struct and include it in CollectHeartbeat().
The Id field provides a stable volume server identity independent of IP:port.
Ref: https://github.com/seaweedfs/seaweedfs/issues/7487
* topology: support id-based DataNode identification
Update GetOrCreateDataNode to accept an id parameter for stable node
identification. When id is provided, the DataNode can maintain its identity
even when its IP address changes (e.g., in Kubernetes pod reschedules).
For backward compatibility:
- If id is provided, use it as the node ID
- If id is empty, fall back to ip:port
Ref: https://github.com/seaweedfs/seaweedfs/issues/7487
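The backward-compatible identity rule above (also used by the later whitespace-trim fix) reduces to a small helper, sketched here with an assumed name:

```go
package main

import (
	"fmt"
	"strings"
)

// nodeID returns the explicit id when one is given (after trimming
// whitespace) and otherwise falls back to the historical ip:port form.
func nodeID(id, ip string, port int) string {
	if id = strings.TrimSpace(id); id != "" {
		return id
	}
	return fmt.Sprintf("%s:%d", ip, port)
}

func main() {}
```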
* volume: add -id flag for stable volume server identity
Add -id command line flag to volume server that allows specifying a stable
identifier independent of the IP address. This is useful for Kubernetes
deployments with hostPath volumes where pods can be rescheduled to different
nodes while the persisted data remains on the original node.
Usage: weed volume -id=node-1 -ip=10.0.0.1 ...
If -id is not specified, it defaults to ip:port for backward compatibility.
Fixes https://github.com/seaweedfs/seaweedfs/issues/7487
* server: add -volume.id flag to weed server command
Support the -volume.id flag in the all-in-one 'weed server' command,
consistent with the standalone 'weed volume' command.
Usage: weed server -volume.id=node-1 ...
Ref: https://github.com/seaweedfs/seaweedfs/issues/7487
* topology: add test for id-based DataNode identification
Test the key scenarios:
1. Create DataNode with explicit id
2. Same id with different IP returns same DataNode (K8s reschedule)
3. IP/PublicUrl are updated when node reconnects with new address
4. Different id creates new DataNode
5. Empty id falls back to ip:port (backward compatibility)
Ref: https://github.com/seaweedfs/seaweedfs/issues/7487
* pb: add address field to DataNodeInfo for proper node addressing
Previously, DataNodeInfo.Id was used as the node address, which worked
when Id was always ip:port. Now that Id can be an explicit string,
we need a separate Address field for connection purposes.
Changes:
- Add 'address' field to DataNodeInfo protobuf message
- Update ToDataNodeInfo() to populate the address field
- Update NewServerAddressFromDataNode() to use Address (with Id fallback)
- Fix LookupEcVolume to use dn.Url() instead of dn.Id()
Ref: https://github.com/seaweedfs/seaweedfs/issues/7487
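The Address-with-Id-fallback rule can be sketched as below; `dataNodeInfo` is an illustrative struct, not the generated pb type:

```go
package main

// dataNodeInfo mirrors the two protobuf fields discussed above.
type dataNodeInfo struct {
	Id      string
	Address string
}

// serverAddress prefers the new Address field and falls back to Id,
// which keeps messages from older senders (where Id was always
// ip:port) connectable.
func serverAddress(dn dataNodeInfo) string {
	if dn.Address != "" {
		return dn.Address
	}
	return dn.Id
}

func main() {}
```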
* fix: trim whitespace from volume server id and fix test
- Trim whitespace from -id flag to treat ' ' as empty
- Fix store_load_balancing_test.go to include id parameter in NewStore call
Ref: https://github.com/seaweedfs/seaweedfs/issues/7487
* refactor: extract GetVolumeServerId to util package
Move the volume server ID determination logic to a shared utility function
to avoid code duplication between volume.go and rack.go.
Ref: https://github.com/seaweedfs/seaweedfs/issues/7487
* fix: improve transition logic for legacy nodes
- Use exact ip:port match instead of net.SplitHostPort heuristic
- Update GrpcPort and PublicUrl during transition for consistency
- Remove unused net import
Ref: https://github.com/seaweedfs/seaweedfs/issues/7487
* fix: add id normalization and address change logging
- Normalize id parameter at function boundary (trim whitespace)
- Log when DataNode IP:Port changes (helps debug K8s pod rescheduling)
Ref: https://github.com/seaweedfs/seaweedfs/issues/7487
* adjust "weed benchmark" CLI to use readOnly/writeOnly
* consistently use "-master" CLI option
* If both -readOnly and -writeOnly are specified, the current logic silently lets -writeOnly take precedence; this is confusing and can lead to unexpected behavior.
* Added/Updated:
- Added metrics IP options for all servers;
- Fixed a bug in the selection of the bindIp or ip parameter for the metrics handler;
* Fixed cmd flags
* the types package was imported more than once
* lazy-loading
* fix bugs
* fix bugs
* fix unit tests
* fix test error
* rename function
* unload ldb after initial startup
* Don't load ldb when starting volume server if ldbtimeout is set.
* remove unnecessary unloadldb call
* Update weed/command/server.go
Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>
* Update weed/command/volume.go
Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>
Co-authored-by: guol-fnst <goul-fnst@fujitsu.com>
Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>
* simplify a bit
* feat: volume: add "readBufSize" option to customize read optimization
* refactor: readBufSize -> readBufferSize
* simplify a bit
* simplify a bit
* volume server: set the default value of "hasSlowRead" to true
* simplify a bit