seaweedfs

mirror of https://github.com/seaweedfs/seaweedfs.git synced 2026-05-14 05:41:29 +00:00

Author	SHA1	Message	Date
Chris Lu	69da20bdae	volume: gate FetchAndWriteNeedle behind admin auth and refuse internal endpoints (#9441 ) volume: require admin auth and refuse loopback endpoints in FetchAndWriteNeedle Gate the RPC behind checkGrpcAdminAuth for parity with the rest of the destructive volume-server RPCs, and reject cluster-internal remote S3 endpoints (loopback / link-local / IMDS / RFC 1918 / CGNAT) before dialing. Pin the validated address against DNS rebinding by routing the AWS SDK through an HTTP transport whose DialContext re-resolves the host and re-applies the deny list on every dial, so an endpoint that resolves to a public IP at validate-time and then flips to 127.0.0.1 at connect time is refused. Operators that legitimately fetch from private hosts can opt out with -volume.allowUntrustedRemoteEndpoints.	2026-05-12 10:11:20 -07:00
Chris Lu	d605feb403	refactor(command): expand "~" in all path-style CLI flags (#9306 ) * refactor(command): expand "~" in all path-style CLI flags Many of weed's path-bearing flags (-s3.config, -s3.iam.config, -admin.dataDir, -webdav.cacheDir, -volume.dir.idx, TLS cert/key files, profile output paths, mount cache dirs, sftp key files, ...) were never run through util.ResolvePath, so a value like "~/iam.json" was used literally. Tilde only worked when the shell expanded it, which silently fails for the common -flag=~/path form (bash leaves the tilde literal in --opt=~/path). - Extend util.ResolvePath to also handle "~user" / "~user/rest", matching shell tilde expansion. Add unit tests. - Apply util.ResolvePath at the top of each shared start* function (s3, webdav, sftp) so mini/server/filer/standalone callers all inherit it; resolve at the few one-off use sites (mount cache dirs, volume idx folder, mini admin.dataDir, profile paths). - Drop the duplicate expandHomeDir helper from admin.go in favor of the now-equivalent util.ResolvePath. * fixup: handle comma-separated -dir flags for tilde expansion `weed mini -dir`, `weed server -dir`, and `weed volume -dir` accept comma-separated paths (`dir[,dir]...`). Calling util.ResolvePath on the whole string mishandled multi-folder values with tilde, e.g. "~/d1,~/d2" would resolve as if "d1,~/d2" were a single subpath. - Add util.ResolveCommaSeparatedPaths: split on ",", run each entry through ResolvePath, rejoin. Short-circuits when no "~" present. - Use it for miniDataFolders (mini.go), volumeDataFolders (server.go), and resolve each entry of v.folders in-place (volume.go) so all downstream consumers see resolved paths. - Add 7-case TestResolveCommaSeparatedPaths covering empty, single, multiple, and mixed inputs. * address PR review: metaFolder + Windows backslash - master.go: resolve m.metaFolder at the top of runMaster so util.FullPath(m.metaFolder) on the next line sees an expanded path. Drop the now-redundant ResolvePath in TestFolderWritable. - server.go: same treatment for masterOptions.metaFolder, paired with the existing cpu/mem profile resolves. Drop the redundant inner ResolvePath at TestFolderWritable. - file_util.go: ResolvePath now accepts filepath.Separator as a separator after the tilde, so "~\\data" works on Windows. Other platforms keep current behaviour (backslash stays literal because it is a valid filename character in usernames and paths). - file_util_test.go: add two cases using filepath.Separator that exercise the new code path on Windows and remain a no-op on Unix. address PR review: resolve "~" in remaining command path flags Comprehensive sweep of path-bearing flags across every weed subcommand, applying util.ResolvePath in-place at the top of each run* function so all downstream consumers see expanded paths. - webdav.go: resolve wo.cacheDir at the top of startWebDav so mini/server/filer/standalone callers all inherit it. - mount_std.go: cpu/mem profile paths. - filer_sync.go: cpu/mem profile paths. - mq_broker.go: cpu/mem profile paths. - benchmark.go: cpuprofile output path. - backup.go: -dir resolved once at runBackup; drop the duplicated inline ResolvePath in NewVolume calls. - compact.go: -dir resolved at runCompact; drop inline ResolvePath. - export.go: -dir and -o resolved at runExport; drop inline ResolvePath in LoadFromIdx and ScanVolumeFile. - download.go: -dir resolved at runDownload; drop inline. - update.go: -dir resolved at runUpdate so filepath.Join uses the expanded path; drop inline ResolvePath in TestFolderWritable. - scaffold.go: -output expanded before filepath.Join. - worker.go: -workingDir expanded before being passed to runtime. address PR review: resolve option-struct paths at run* entry points server.go:381 propagates s3Options.config to filerOptions.s3ConfigFile before startS3Server runs, which meant the filer-side code saw the unresolved tilde-prefixed pointer. Same pattern for webdavOptions and sftpOptions (and equivalent in mini.go / filer.go). The fix: hoist resolution from the shared start* functions up to the run* entry points, where every shared pointer is set up before any propagation happens. - s3.go, webdav.go, sftp.go: extract a resolvePaths() method on each Options struct that runs every path field through util.ResolvePath in-place. Idempotent. - runS3, runWebDav, runSftp: call the standalone struct's resolvePaths before starting metrics / loading security config. - runServer, runMini, runFiler: call resolvePaths on every embedded options struct, plus resolve loose flags (serverIamConfig, miniS3Config, miniIamConfig, miniMasterOptions.metaFolder, and filer's defaultLevelDbDirectory) so they're expanded before any pointer copy or use. - Drop the now-redundant inline ResolvePath at filer's defaultLevelDbDirectory composition. * address PR review: re-resolve mini -dir post-config, cover misc paths - mini.go: applyConfigFileOptions can overwrite -dir with a literal ~/data from mini.options. Re-resolve miniDataFolders after the config-file apply, alongside the other path resolves, so the mini filer no longer ends up with a literal ~/data/filerldb2. - benchmark.go: resolve b.idListFile (-list). - filer_sync.go: resolve syncOptions.aSecurity / .bSecurity (-a.security / -b.security) before LoadClientTLSFromFile. - filer_cat.go: resolve filerCat.output (-o) before os.OpenFile. - admin.go: drop trailing blank line at EOF (git diff --check). * address PR review: resolve -a.security/-b.security/-config before use Three follow-up fixes: - filer_sync.go: the -a.security / -b.security resolves were placed after LoadClientTLSFromFile / LoadHTTPClientFromFile were called, so weed filer.sync -a.security=~/a.toml still passed the literal tilde path. Hoist the resolves above the security-loading block so TLS clients see expanded paths. - filer_sync_verify.go: same flag pair was never resolved at all in the verify command; resolve at the top of runFilerSyncVerify. - filer_meta_backup.go: -config (the backup_filer.toml path) was passed directly to viper. Resolve at the top of runFilerMetaBackup. - mini.go: master.dir defaulted to the entire comma-joined miniDataFolders. With weed mini -dir=~/d1,~/d2 (or any multi-dir setup), TestFolderWritable then stat'd the joined string instead of a single directory. Default to the first entry via StringSplit to mirror the disk-space calculation a few lines below, and drop the now-redundant ResolvePath in TestFolderWritable.	2026-05-03 21:46:21 -07:00
Lars Lehtonen	29e14f89f1	fix(weed/command) address unhandled errors (#9208 ) * fix(weed/command) address unhandled errors * fix(command): don't log graceful-shutdown sentinels; plug response-body leak - s3: Serve on unix socket treated http.ErrServerClosed as fatal; now excluded like the other Serve/ServeTLS paths in this file. - mq_agent, mq_broker: filter grpc.ErrServerStopped so clean shutdown doesn't log as an error. - worker_runtime: the added decodeErr early-continue skipped resp.Body.Close(); drop it since the existing check below already surfaces the decode error. - mount_std: the pre-mount Unmount commonly fails when nothing is mounted; demote to V(1) Infof. - fuse_std: tidy panic message to match sibling cases. * fix(mq_broker): filter grpc.ErrServerStopped on localhost listener The localhost listener goroutine logged any Serve error unconditionally, which includes grpc.ErrServerStopped on graceful shutdown. Match the main listener's check so clean stops don't surface as errors. --------- Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-04-23 22:15:05 -07:00
Chris Lu	9ae905e456	feat(security): hot-reload HTTPS certs without restart (k8s cert-manager) (#9181 ) * feat(security): hot-reload HTTPS certs for master/volume/filer/webdav/admin S3 and filer already use a refreshing pemfile provider for their HTTPS cert, so rotated certificates (e.g. from k8s cert-manager) are picked up without a restart. Master, volume, webdav, and admin, however, passed cert/key paths straight to ServeTLS/ListenAndServeTLS and loaded once at startup — rotating those certs required a pod restart. Add a small helper NewReloadingServerCertificate in weed/security that wraps pemfile.Provider and returns a tls.Config.GetCertificate closure, then wire it into the four remaining HTTPS entry points. httpdown now also calls ServeTLS when TLSConfig carries a GetCertificate/Certificates but CertFile/KeyFile are empty, so volume server can pre-populate TLSConfig. A unit test exercises the rotation path (write cert, rotate on disk, assert the callback returns the new cert) with a short refresh window. * refactor(security): route filer/s3 HTTPS through the shared cert reloader Before: filer.go and s3.go each kept a certprovider.Provider on the options struct plus a duplicated GetCertificateWithUpdate method. Both were loading pemfile themselves. Behaviorally they already reloaded, but the logic was duplicated two ways and neither path was shared with the newly-added master/volume/webdav/admin wiring. After: both use security.NewReloadingServerCertificate like the other servers. The per-struct certProvider field and GetCertificateWithUpdate method are removed, along with the now-unused certprovider and pemfile imports. Net: -32 lines, one code path for all HTTPS cert reloading. No behavior change — the refresh window, cache, and handshake contract are identical (the helper wraps the same pemfile.NewProvider). feat(security): hot-reload HTTPS client certs for mount/backup/upload/etc The HTTP client in weed/util/http/client loaded the mTLS client cert once at startup via tls.LoadX509KeyPair. That left every long-lived HTTPS client process (weed mount, backup, filer.copy, filer→volume, s3→filer/volume) unable to pick up a rotated client cert without a restart — even though the same cert-manager setup was already rotating the server side fine. Swap the client cert loader for a tls.Config.GetClientCertificate callback backed by the same refreshing pemfile provider. New TLS handshakes pick up the rotated cert; in-flight pooled connections keep their old cert and drop as normal transport churn happens. To keep this reusable from both server and client TLS code without an import cycle (weed/security already imports weed/util/http/client for LoadHTTPClientFromFile), extract the pemfile wrapper into a new weed/security/certreload subpackage. weed/security keeps its thin NewReloadingServerCertificate wrapper. The existing unit test moves with the implementation. gRPC mTLS was already handled by security.LoadServerTLS / LoadClientTLS; this PR does not change any gRPC paths. MQ broker, MQ agent, Kafka gateway, and FUSE mount control plane are gRPC-only and therefore already rotate. CA bundles (ClientCAs / RootCAs / grpc.ca) are still loaded once — noted as a known limitation in the wiki. * fix(security): address PR review feedback on cert reloader Bots (gemini-code-assist + coderabbit) flagged three real issues and a couple of nits. Addressing them here: 1. KeyMaterial used context.Background(). The grpc pemfile provider's KeyMaterial blocks until material arrives or the context deadline expires; with Background() a slow disk could hang the TLS handshake indefinitely. Switched both the server and client callbacks to use hello.Context() / cri.Context() so a stuck read is bounded by the handshake timeout. 2. Admin server loaded TLS inside the serve goroutine. If the cert was bad, the goroutine returned but startAdminServer kept blocking on <-ctx.Done() with no listener, making the process look healthy with nothing bound. Moved TLS setup to run before the goroutine starts and propagate errors via fmt.Errorf; also captures the provider and defers Close(). 3. HTTP client discarded the certprovider.Provider from NewClientGetCertificate. That leaked the refresh goroutine, and NewHttpClientWithTLS had a worse case where a CA-file failure after provider creation orphaned the provider entirely. Added a certProvider field and a Close() method on HTTPClient, and made the constructors close the provider on subsequent error paths. 4. Server-side paths (master/volume/filer/s3/webdav/admin) now retain the provider. filer and webdav run ServeTLS synchronously, so a plain defer works. master/volume/s3 dispatch goroutines and return while the server keeps running, so they hook Close() into grace.OnInterrupt. 5. Test: certreload_test now tolerates transient read/parse errors during file rotation (writeSelfSigned rewrites cert before key) and reports the last error only if the deadline expires. No user-visible behavior change for the happy path. * test(tls): add end-to-end HTTPS cert rotation integration test Boots a real `weed master` with HTTPS enabled, captures the leaf cert served at TLS handshake time, atomically rewrites the cert/key files on disk (the same rename-in-place pattern kubelet does when it swaps a cert-manager Secret), and asserts that a subsequent TLS handshake observes the rotated leaf — with no process restart, no SIGHUP, no reloader sidecar. Verifies the full path: on-disk change → pemfile refresh tick → provider.KeyMaterial → tls.Config.GetCertificate → server TLS handshake. Runtime is ~1s by exposing the reloader's refresh window as an env var (WEED_TLS_CERT_REFRESH_INTERVAL) and setting it to 500ms for the test. The same env var is user-facing — documented in the wiki — so operators running short-lived certs (Vault, cert-manager with duration: 24h, etc.) can tighten the rotation-pickup window without a rebuild. Defaults to 5h to preserve prior behavior. security.CredRefreshingInterval is kept for API compatibility but now aliases certreload.DefaultRefreshInterval so the same env controls both gRPC mTLS and HTTPS reload. * ci(tls): wire the TLS rotation integration test into GitHub Actions Mirrors the existing vacuum-integration-tests.yml shape: Ubuntu runner, Go 1.25, build weed, run `go test` in test/tls_rotation, upload master logs on failure. 10-minute job timeout; the test itself finishes in about a second because WEED_TLS_CERT_REFRESH_INTERVAL is set to 500ms inside the test. Runs on every push to master and on every PR to master. * fix(tls): address follow-up PR review comments Three new comments on the integration test + volume shutdown path: 1. Test: peekServerCert was swallowing every dial/handshake error, which meant waitForCert's "last err: <nil>" fatal message lost all diagnostic value. Thread errors back through: peekServerCert now returns (*x509.Certificate, error), and waitForCert records the latest error so a CI flake points at the actual cause (master didn't come up, handshake rejected, CA pool mismatch, etc.). 2. Test: set HOME=<tempdir> on the master subprocess. Viper today registers the literal path "$HOME/.seaweedfs" without env expansion, so a developer's ~/.seaweedfs/security.toml is accidentally invisible — the test was relying on that. Pinning HOME is belt-and-braces against a future viper upgrade that does expand env vars. 3. volume.go: startClusterHttpService's provider close was registered via grace.OnInterrupt, which fires on SIGTERM but NOT on the v.shutdownCtx.Done() path used by mini / integration tests. The pemfile refresh goroutine leaked in that shutdown path. Now the helper returns a close func and the caller invokes it on BOTH shutdown paths for parity. Also add MinVersion: TLS 1.2 to the test's tls.Config to quiet the ast-grep static-analysis nit — zero-risk since the pool only trusts our in-memory CA. Test runs clean 3/3.	2026-04-21 20:20:11 -07:00
Chris Lu	9d15705c16	fix(mini): shut down admin/s3/webdav/filer before volume/master on Ctrl+C (#9112 ) * fix(mini): shut down admin/s3/webdav/filer before volume/master on Ctrl+C Interrupts fired grace hooks in registration order, so master (started first) shut down before its clients, producing heartbeat-canceled errors and masterClient reconnection noise during weed mini shutdown. Admin/s3/ webdav had no interrupt hooks at all and were killed at os.Exit. - grace: execute interrupt hooks in LIFO (defer-style) order so later- started services tear down first. - filer: consolidate the three separate interrupt hooks (gRPC / HTTP / DB) into one that runs in order, so filer shutdown stays correct independent of FIFO/LIFO semantics. - mini: add MiniClientsShutdownCtx (separate from test-facing MiniClusterCtx) plus an OnMiniClientsShutdown helper. Admin, S3, WebDAV and the maintenance worker observe it; runMini registers a cancel hook after startup so under LIFO it fires first and waits up to 10s on a WaitGroup for those services to drain before filer, volume, and master shut down. Resulting order on Ctrl+C: admin/s3/webdav/worker -> filer (gRPC -> HTTP -> DB) -> volume -> master. * refactor(mini): group mini-client shutdown into one state struct The first pass spread the shutdown plumbing across three globals (MiniClientsShutdownCtx, miniClientsWg, cancelMiniClients) and two ctx-derivation sites (OnMiniClientsShutdown and startMiniAdminWithWorker). Group into a private miniClientsState (ctx/cancel/wg) rebuilt per runMini invocation, and chain its ctx from MiniClusterCtx so clients only observe one signal. Tests that cancel MiniClusterCtx still trigger client shutdown via parent-child propagation. - resetMiniClients() installs fresh state at the top of runMini, so in-process test reruns don't inherit stale ctx/wg. - onMiniClientsShutdown(fn) replaces the exported OnMiniClientsShutdown and only observes one ctx. - trackMiniClient() replaces the manual wg.Add/Done dance for the admin goroutine. - miniClientsCtx() gives the admin startup a ctx without re-deriving. - triggerMiniClientsShutdown(timeout) is the interrupt hook body. No behaviour change; existing tests pass. * refactor: generalize shutdown ctx as an option, not a mini-specific helper Several service files (s3, webdav, filer, master, volume) observed the mini-specific MiniClusterCtx or called onMiniClientsShutdown directly. That leaked mini orchestration into code that also runs under weed s3, weed webdav, weed filer, weed master, and weed volume standalone. Replace with a generic `shutdownCtx context.Context` field on each service's Options struct. When non-nil, the server watches it and shuts down gracefully; when nil (standalone), the shutdown path is a no-op. Mini wires the contexts up from a single place (runMini): - miniMasterOptions/miniOptions.v/miniFilerOptions.shutdownCtx = MiniClusterCtx (drives test-triggered teardown) - miniS3Options/miniWebDavOptions.shutdownCtx = miniClientsCtx() (drives Ctrl+C teardown before filer/volume/master) All knowledge of MiniClusterCtx now lives in mini.go. * fix(mini): stop worker before clients ctx so admin shutdown isn't blocked Symptom on Ctrl+C of a clean weed mini: mini's Shutting down admin/s3/ webdav hook sat for 10s then logged "timed out". Admin had started its shutdown but was blocked inside StopWorkerGrpcServer's GracefulStop, waiting for the still-connected worker stream. That in turn left filer clients connected and cascaded into filer's own 10s gRPC graceful-stop timeout. Two causes, both fixed: 1. worker.Stop() deadlocked on clean shutdown. It sent ActionStop (which makes managerLoop `break out` and exit), then called getTaskLoad() which sends to the same unbuffered cmd channel — no receiver, hangs forever. Reorder Stop() to snapshot the admin client and drain tasks BEFORE sending ActionStop, and call Disconnect() via the local snapshot afterwards. 2. Worker's taskRequestLoop raced with Disconnect(): RequestTask reads from c.incoming, which Disconnect closes, yielding a nil response and a panic on response.Message. Handle the closed channel explicitly. 3. Mini now has a preCancel phase (beforeMiniClientsShutdown) that runs synchronously BEFORE the clients ctx is cancelled. Register worker shutdown there so admin's worker-gRPC GracefulStop finds the worker already disconnected and returns immediately, instead of waiting on a stream that is about to close anyway. Observed shutdown of a clean mini: admin/s3/webdav down in <10ms; full process exit in ~11s (the remaining 10s is a pre-existing filer gRPC graceful-stop timeout, not cascaded from the clients tier). * feat(mini): cap filer gRPC graceful stop at 1s under weed mini Full weed mini shutdown was ~11s on a clean exit, dominated by the filer's default 10s gRPC GracefulStop timeout while background SubscribeLocalMetadata streams drained. Expose the timeout as a FilerOptions.gracefulStopTimeout field (default 10s for standalone weed filer) and set it to 1s in mini. Clean weed mini shutdown now takes ~2s.	2026-04-16 16:11:01 -07:00
Chris Lu	2eaf98a7a2	Use Unix sockets for gRPC in mini mode (#8856 ) * Use Unix sockets for gRPC between co-located services in mini mode In `weed mini`, all services run in one process. Previously, inter-service gRPC traffic (volume↔master, filer↔master, S3↔filer, worker↔admin, etc.) went through TCP loopback. This adds a gRPC Unix socket registry in the pb package: mini mode registers a socket path per gRPC port at startup, each gRPC server additionally listens on its socket, and GrpcDial transparently routes to the socket via WithContextDialer when a match is found. Standalone commands (weed master, weed filer, etc.) are unaffected since no sockets are registered. TCP listeners are kept for external clients. * Handle Serve error and clean up socket file in ServeGrpcOnLocalSocket Log non-expected errors from grpcServer.Serve (ignoring grpc.ErrServerStopped) and always remove the Unix socket file when Serve returns, ensuring cleanup on Stop/GracefulStop.	2026-03-30 18:18:52 -07:00
Chris Lu	8cde3d4486	Add data file compaction to iceberg maintenance (Phase 2) (#8503 ) * Add iceberg_maintenance plugin worker handler (Phase 1) Implement automated Iceberg table maintenance as a new plugin worker job type. The handler scans S3 table buckets for tables needing maintenance and executes operations in the correct Iceberg order: expire snapshots, remove orphan files, and rewrite manifests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Add data file compaction to iceberg maintenance handler (Phase 2) Implement bin-packing compaction for small Parquet data files: - Enumerate data files from manifests, group by partition - Merge small files using parquet-go (read rows, write merged output) - Create new manifest with ADDED/DELETED/EXISTING entries - Commit new snapshot with compaction metadata Add 'compact' operation to maintenance order (runs before expire_snapshots), configurable via target_file_size_bytes and min_input_files thresholds. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Fix memory exhaustion in mergeParquetFiles by processing files sequentially Previously all source Parquet files were loaded into memory simultaneously, risking OOM when a compaction bin contained many small files. Now each file is loaded, its rows are streamed into the output writer, and its data is released before the next file is loaded — keeping peak memory proportional to one input file plus the output buffer. * Validate bucket/namespace/table names against path traversal Reject names containing '..', '/', or '\' in Execute to prevent directory traversal via crafted job parameters. * Add filer address failover in iceberg maintenance handler Try each filer address from cluster context in order instead of only using the first one. This improves resilience when the primary filer is temporarily unreachable. * Add separate MinManifestsToRewrite config for manifest rewrite threshold The rewrite_manifests operation was reusing MinInputFiles (meant for compaction bin file counts) as its manifest count threshold. Add a dedicated MinManifestsToRewrite field with its own config UI section and default value (5) so the two thresholds can be tuned independently. * Fix risky mtime fallback in orphan removal that could delete new files When entry.Attributes is nil, mtime defaulted to Unix epoch (1970), which would always be older than the safety threshold, causing the file to be treated as eligible for deletion. Skip entries with nil Attributes instead, matching the safer logic in operations.go. * Fix undefined function references in iceberg_maintenance_handler.go Use the exported function names (ShouldSkipDetectionByInterval, BuildDetectorActivity, BuildExecutorActivity) matching their definitions in vacuum_handler.go. * Remove duplicated iceberg maintenance handler in favor of iceberg/ subpackage The IcebergMaintenanceHandler and its compaction code in the parent pluginworker package duplicated the logic already present in the iceberg/ subpackage (which self-registers via init()). The old code lacked stale-plan guards, proper path normalization, CAS-based xattr updates, and error-returning parseOperations. Since the registry pattern (default "all") makes the old handler unreachable, remove it entirely. All functionality is provided by iceberg.Handler with the reviewed improvements. * Fix MinManifestsToRewrite clamping to match UI minimum of 2 The clamp reset values below 2 to the default of 5, contradicting the UI's advertised MinValue of 2. Clamp to 2 instead. * Sort entries by size descending in splitOversizedBin for better packing Entries were processed in insertion order which is non-deterministic from map iteration. Sorting largest-first before the splitting loop improves bin packing efficiency by filling bins more evenly. * Add context cancellation check to drainReader loop The row-streaming loop in drainReader did not check ctx between iterations, making long compaction merges uncancellable. Check ctx.Done() at the top of each iteration. * Fix splitOversizedBin to always respect targetSize limit The minFiles check in the split condition allowed bins to grow past targetSize when they had fewer than minFiles entries, defeating the OOM protection. Now bins always split at targetSize, and a trailing runt with fewer than minFiles entries is merged into the previous bin. * Add integration tests for iceberg table maintenance plugin worker Tests start a real weed mini cluster, create S3 buckets and Iceberg table metadata via filer gRPC, then exercise the iceberg.Handler operations (ExpireSnapshots, RemoveOrphans, RewriteManifests) against the live filer. A full maintenance cycle test runs all operations in sequence and verifies metadata consistency. Also adds exported method wrappers (testing_api.go) so the integration test package can call the unexported handler methods. * Fix splitOversizedBin dropping files and add source path to drainReader errors The runt-merge step could leave leading bins with fewer than minFiles entries (e.g. [80,80,10,10] with targetSize=100, minFiles=2 would drop the first 80-byte file). Replace the filter-based approach with an iterative merge that folds any sub-minFiles bin into its smallest neighbor, preserving all eligible files. Also add the source file path to drainReader error messages so callers can identify which Parquet file caused a read/write failure. * Harden integration test error handling - s3put: fail immediately on HTTP 4xx/5xx instead of logging and continuing - lookupEntry: distinguish NotFound (return nil) from unexpected RPC errors (fail the test) - writeOrphan and orphan creation in FullMaintenanceCycle: check CreateEntryResponse.Error in addition to the RPC error * go fmt --------- Co-authored-by: Copilot <copilot@github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-15 11:27:42 -07:00
Chris Lu	f5c35240be	Add volume dir tags and EC placement priority (#8472 ) * Add volume dir tags to topology Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add preferred tag config for EC Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Prioritize EC destinations by tags Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add EC placement planner tag tests Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Refactor EC placement tests to reuse buildActiveTopology Remove buildActiveTopologyWithDiskTags helper function and consolidate tag setup inline in test cases. Tests now use UpdateTopology to apply tags after topology creation, reusing the existing buildActiveTopology function rather than duplicating its logic. All tag scenario tests pass: - TestECPlacementPlannerPrefersTaggedDisks - TestECPlacementPlannerFallsBackWhenTagsInsufficient Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Consolidate normalizeTagList into shared util package Extract normalizeTagList from three locations (volume.go, detection.go, erasure_coding_handler.go) into new weed/util/tag.go as exported NormalizeTagList function. Replace all duplicate implementations with imports and calls to util.NormalizeTagList. This improves code reuse and maintainability by centralizing tag normalization logic. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add PreferredTags to EC config persistence Add preferred_tags field to ErasureCodingTaskConfig protobuf with field number 5. Update GetConfigSpec to include preferred_tags field in the UI configuration schema. Add PreferredTags to ToTaskPolicy to serialize config to protobuf. Add PreferredTags to FromTaskPolicy to deserialize from protobuf with defensive copy to prevent external mutation. This allows EC preferred tags to be persisted and restored across worker restarts. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add defensive copy for Tags slice in DiskLocation Copy the incoming tags slice in NewDiskLocation instead of storing by reference. This prevents external callers from mutating the DiskLocation.Tags slice after construction, improving encapsulation and preventing unexpected changes to disk metadata. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add doc comment to buildCandidateSets method Document the tiered candidate selection and fallback behavior. Explain that for a planner with preferredTags, it accumulates disks matching each tag in order into progressively larger tiers, emits a candidate set once a tier reaches shardsNeeded, and finally falls back to the full candidates set if preferred-tag tiers are insufficient. This clarifies the intended semantics for future maintainers. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Apply final PR review fixes 1. Update parseVolumeTags to replicate single tag entry to all folders instead of leaving some folders with nil tags. This prevents nil pointer dereferences when processing folders without explicit tags. 2. Add defensive copy in ToTaskPolicy for PreferredTags slice to match the pattern used in FromTaskPolicy, preventing external mutation of the returned TaskPolicy. 3. Add clarifying comment in buildCandidateSets explaining that the shardsNeeded <= 0 branch is a defensive check for direct callers, since selectDestinations guarantees shardsNeeded > 0. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix nil pointer dereference in parseVolumeTags Ensure all folder tags are initialized to either normalized tags or empty slices, not nil. When multiple tag entries are provided and there are more folders than entries, remaining folders now get empty slices instead of nil, preventing nil pointer dereference in downstream code. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix NormalizeTagList to return empty slice instead of nil Change NormalizeTagList to always return a non-nil slice. When all tags are empty or whitespace after normalization, return an empty slice instead of nil. This prevents nil pointer dereferences in downstream code that expects a valid (possibly empty) slice. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add nil safety check for v.tags pointer Add a safety check to handle the case where v.tags might be nil, preventing a nil pointer dereference. If v.tags is nil, use an empty string instead. This is defensive programming to prevent panics in edge cases. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add volume.tags flag to weed server and weed mini commands Add the volume.tags CLI option to both the 'weed server' and 'weed mini' commands. This allows users to specify disk tags when running the combined server modes, just like they can with 'weed volume'. The flag uses the same format and description as the volume command: comma-separated tag groups per data dir with ':' separators (e.g. fast:ssd,archive). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <copilot@github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-01 10:22:00 -08:00
Chris Lu	c106532b79	fix: prevent MiniClusterCtx race conditions in command shutdown Capture global MiniClusterCtx into local variables before goroutine/select evaluation to prevent nil dereference/data race when context is reset to nil after nil check. Applied to filer, master, volume, and s3 commands.	2026-01-28 19:42:16 -08:00
Chris Lu	01c17478ae	command: implement graceful shutdown for mini cluster - Introduce MiniClusterCtx to coordinate shutdown across mini services - Update Master, Volume, Filer, S3, and WebDAV servers to respect context cancellation - Ensure all resources are cleaned up properly during test teardown - Integrate MiniClusterCtx in s3tables integration tests	2026-01-28 10:36:19 -08:00
Chris Lu	ed1da07665	Add consistent -debug and -debug.port flags to commands (#7816 ) * Add consistent -debug and -debug.port flags to commands Add -debug and -debug.port flags to weed master, weed volume, weed s3, weed mq.broker, and weed filer.sync commands for consistency with weed filer. When -debug is enabled, an HTTP server starts on the specified port (default 6060) serving runtime profiling data at /debug/pprof/. For mq.broker, replaced the older -port.pprof flag with the new -debug and -debug.port pattern for consistency. * Update weed/util/grace/pprof.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-18 17:44:36 -08:00
Chris Lu	2d06ddab41	Remove default concurrent upload/download limits for best performance (#7712 ) Change all concurrentUploadLimitMB and concurrentDownloadLimitMB defaults from fixed values (64, 128, 256 MB) to 0 (unlimited). This removes artificial throttling that can limit throughput on high-performance systems, especially on all-flash setups with many cores. Files changed: - volume.go: concurrentUploadLimitMB 256->0, concurrentDownloadLimitMB 256->0 - server.go: filer/volume/s3 concurrent limits 64/128->0 - s3.go: concurrentUploadLimitMB 128->0 - filer.go: concurrentUploadLimitMB 128->0, s3.concurrentUploadLimitMB 128->0 Users can still set explicit limits if needed for resource management.	2025-12-10 14:12:51 -08:00
msementsov	c0dad091f1	Separate vacuum speed from replication speed (#7632 )	2025-12-05 12:24:38 -08:00
Chris Lu	5ed0b00fb9	Support separate volume server ID independent of RPC bind address (#7609 ) * pb: add id field to Heartbeat message for stable volume server identification This adds an 'id' field to the Heartbeat protobuf message that allows volume servers to identify themselves independently of their IP:port address. Ref: https://github.com/seaweedfs/seaweedfs/issues/7487 * storage: add Id field to Store struct Add Id field to Store struct and include it in CollectHeartbeat(). The Id field provides a stable volume server identity independent of IP:port. Ref: https://github.com/seaweedfs/seaweedfs/issues/7487 * topology: support id-based DataNode identification Update GetOrCreateDataNode to accept an id parameter for stable node identification. When id is provided, the DataNode can maintain its identity even when its IP address changes (e.g., in Kubernetes pod reschedules). For backward compatibility: - If id is provided, use it as the node ID - If id is empty, fall back to ip:port Ref: https://github.com/seaweedfs/seaweedfs/issues/7487 * volume: add -id flag for stable volume server identity Add -id command line flag to volume server that allows specifying a stable identifier independent of the IP address. This is useful for Kubernetes deployments with hostPath volumes where pods can be rescheduled to different nodes while the persisted data remains on the original node. Usage: weed volume -id=node-1 -ip=10.0.0.1 ... If -id is not specified, it defaults to ip:port for backward compatibility. Fixes https://github.com/seaweedfs/seaweedfs/issues/7487 * server: add -volume.id flag to weed server command Support the -volume.id flag in the all-in-one 'weed server' command, consistent with the standalone 'weed volume' command. Usage: weed server -volume.id=node-1 ... Ref: https://github.com/seaweedfs/seaweedfs/issues/7487 * topology: add test for id-based DataNode identification Test the key scenarios: 1. Create DataNode with explicit id 2. Same id with different IP returns same DataNode (K8s reschedule) 3. IP/PublicUrl are updated when node reconnects with new address 4. Different id creates new DataNode 5. Empty id falls back to ip:port (backward compatibility) Ref: https://github.com/seaweedfs/seaweedfs/issues/7487 * pb: add address field to DataNodeInfo for proper node addressing Previously, DataNodeInfo.Id was used as the node address, which worked when Id was always ip:port. Now that Id can be an explicit string, we need a separate Address field for connection purposes. Changes: - Add 'address' field to DataNodeInfo protobuf message - Update ToDataNodeInfo() to populate the address field - Update NewServerAddressFromDataNode() to use Address (with Id fallback) - Fix LookupEcVolume to use dn.Url() instead of dn.Id() Ref: https://github.com/seaweedfs/seaweedfs/issues/7487 * fix: trim whitespace from volume server id and fix test - Trim whitespace from -id flag to treat ' ' as empty - Fix store_load_balancing_test.go to include id parameter in NewStore call Ref: https://github.com/seaweedfs/seaweedfs/issues/7487 * refactor: extract GetVolumeServerId to util package Move the volume server ID determination logic to a shared utility function to avoid code duplication between volume.go and rack.go. Ref: https://github.com/seaweedfs/seaweedfs/issues/7487 * fix: improve transition logic for legacy nodes - Use exact ip:port match instead of net.SplitHostPort heuristic - Update GrpcPort and PublicUrl during transition for consistency - Remove unused net import Ref: https://github.com/seaweedfs/seaweedfs/issues/7487 * fix: add id normalization and address change logging - Normalize id parameter at function boundary (trim whitespace) - Log when DataNode IP:Port changes (helps debug K8s pod rescheduling) Ref: https://github.com/seaweedfs/seaweedfs/issues/7487	2025-12-02 22:08:11 -08:00
chrislu	6201cd099e	fix help messages	2025-11-10 17:31:11 -08:00
Lisandro Pin	f466ff1412	Nit: use `time.Duration`s instead of constants in seconds. (#7438 ) Nit: use `time.Durations` instead of constants in seconds. Makes for slightly more readable code.	2025-11-04 13:02:22 -08:00
Chris Lu	5ab49e2971	Adjust cli option (#7418 ) * adjust "weed benchmark" CLI to use readOnly/writeOnly * consistently use "-master" CLI option * If both -readOnly and -writeOnly are specified, the current logic silently allows it with -writeOnly taking precedence. This is confusing and could lead to unexpected behavior.	2025-10-31 17:08:00 -07:00
zuzuviewer	8fa1a69f8c	* Fix undefined http serve behaiver (#6943 )	2025-07-07 22:48:12 -07:00
Konstantin Lebedev	93007c1842	[volume] refactor and add metrics for flight upload and download data limit condition (#6920 ) * refactor concurrentDownloadLimit * fix loop * fix cmdServer * fix: resolve conversation pr 6920 * Changes logging function (#6919) * updated logging methods for stores * updated logging methods for stores * updated logging methods for filer * updated logging methods for uploader and http_util * updated logging methods for weed server --------- Co-authored-by: akosov <a.kosov@kryptonite.ru> * Improve lock ring (#6921) * fix flaky lock ring test * add more tests * fix: build * fix: rm import util/version * fix: serverOptions * refactoring --------- Co-authored-by: Aleksey Kosov <rusyak777@list.ru> Co-authored-by: akosov <a.kosov@kryptonite.ru> Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com> Co-authored-by: chrislu <chris.lu@gmail.com>	2025-07-02 18:03:49 -07:00
chrislu	bd4891a117	change version directory	2025-06-03 22:46:10 -07:00
Konstantin Lebedev	b65eb2ec45	[security] reload whiteList on http seerver (#6302 ) * reload whiteList * white_list add to scaffold	2024-12-02 10:38:10 -08:00
vadimartynov	b796c21fa9	Added loadSecurityConfigOnce (#5792 )	2024-07-16 09:15:55 -07:00
vadimartynov	ec9e7493b3	-metricsIp cmd flag (#5773 ) * Added/Updated: - Added metrics ip options for all servers; - Fixed a bug with the selection of the binIp or ip parameter for the metrics handler; * Fixed cmd flags	2024-07-12 10:56:26 -07:00
Konstantin Lebedev	2b3e39397e	fix: skipping checking active volumes with the same number of files at the moment (#4893 ) * fix: skipping checking active volumes with the same number of files at the moment https://github.com/seaweedfs/seaweedfs/issues/4140 * refactor with comments https://github.com/seaweedfs/seaweedfs/issues/4140 * add TestShouldSkipVolume --------- Co-authored-by: Konstantin Lebedev <9497591+kmlebedev@users.noreply.github.co>	2023-10-09 09:57:26 -07:00
Jiffs Maverick	4b0430e71d	[metrics] Add the ability to control bind ip (#4012 )	2022-11-24 10:22:59 -08:00
Guo Lei	5b905fb2b7	Lazy loading (#3958 ) * types packages is imported more than onece * lazy-loading * fix bugs * fix bugs * fix unit tests * fix test error * rename function * unload ldb after initial startup * Don't load ldb when starting volume server if ldbtimeout is set. * remove uncessary unloadldb * Update weed/command/server.go Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com> * Update weed/command/volume.go Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com> Co-authored-by: guol-fnst <goul-fnst@fujitsu.com> Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>	2022-11-14 00:19:27 -08:00
famosss	cacc3e883b	volume server:set the default value of "hasSlowRead" to true (#3710 ) * simplify a bit * feat: volume: add "readBufSize" option to customize read optimization * refactor : redbufSIze -> readBufferSize * simplify a bit * simplify a bit * volume server:set the default value of "hasSlowRead" to true	2022-10-12 21:13:26 -07:00
Konstantin Lebedev	301b678147	[volume] Add new volumes to HUP(reload) signal (#3755 ) Add new volumes to HUP(reload) signal	2022-09-28 12:44:13 -07:00
chrislu	10d5b4b32b	volume server: rename readBufferSize to readBufferSizeMB	2022-09-17 10:56:28 -07:00
famosss	d949a238b8	volume: add "readBufSize" option to customize read optimization (#3702 ) * simplify a bit * feat: volume: add "readBufSize" option to customize read optimization * refactor : redbufSIze -> readBufferSize * simplify a bit * simplify a bit	2022-09-16 00:30:40 -07:00
chrislu	cf90f76a35	mark "hasSlowRead" as experimental	2022-09-15 23:33:46 -07:00
chrislu	896a85d6e4	volume: add "hasSlowRead" option to customize read optimization	2022-09-15 03:11:32 -07:00
chrislu	9b084d4c88	purge tcp implementation	2022-09-08 18:03:43 -07:00
chrislu	3f3a1341d8	make CodeQL happy	2022-08-26 17:09:11 -07:00
chrislu	67814a5c79	refactor and fix strings.Split	2022-08-07 01:34:32 -07:00
chrislu	26dbc6c905	move to https://github.com/seaweedfs/seaweedfs	2022-07-29 00:17:28 -07:00
liubaojiang	076e48a676	add inflight upload data wait timeout	2022-05-21 10:38:08 +08:00
guol-fnst	076595fbdd	just exit in case of duplicated volume directories were loaded	2022-05-17 15:41:49 +08:00
Berck Nash	9b14f0c81a	Add mTLS support for both master and volume http server.	2022-03-16 09:52:17 -06:00
chrislu	3a6eb8ca5f	default bind to one ip address fix https://github.com/chrislusf/seaweedfs/issues/1937	2022-03-11 14:02:39 -08:00
chrislu	18543c6e8b	minor	2022-02-27 23:11:09 -08:00
Konstantin Lebedev	526094d2da	StopTimeout 30 sec	2022-02-14 21:42:27 +05:00
Konstantin Lebedev	275e9a4e86	reduce to default http server KillTimeout and StopTimeout	2022-02-14 21:38:24 +05:00
chrislu	8907e6a40a	add more help messages	2022-01-13 13:03:04 -08:00
Chris Lu	52fe86df45	use default 10000 for grpc port	2021-09-20 14:05:59 -07:00
Chris Lu	e5fc35ed0c	change server address from string to a type	2021-09-12 22:47:52 -07:00
Chris Lu	e690a2be16	custom grpc port: volume server	2021-09-12 02:25:15 -07:00
Chris Lu	574485ec69	better IP v6 support	2021-09-07 19:29:42 -07:00
Chris Lu	734c980040	volume: support concurrent download data size limit	2021-08-08 23:25:16 -07:00
Chris Lu	49c66e88a0	volume: change all writes to fsync during graceful stopping fix https://github.com/chrislusf/seaweedfs/issues/2193	2021-07-13 01:29:57 -07:00

1 2 3

141 Commits