Mirror of https://github.com/seaweedfs/seaweedfs.git (synced 2026-05-13 21:31:32 +00:00)
master branch, 231 commits

884b0bcbfd
feat(s3/lifecycle): cluster rate-limit allocation (Phase 3) (#9456)
* feat(s3/lifecycle): cluster rate-limit allocation (Phase 3)
Admin computes a per-worker share of cluster_deletes_per_second at
ExecuteJob time and ships it to the worker via
ClusterContext.Metadata. The worker reads the share, constructs a
golang.org/x/time/rate.Limiter, and passes it to dailyrun.Run via
cfg.Limiter (Phase 2 already plumbed the field). Phase 5 deletes the
streaming path; until then streaming ignores the cap.
Why allocate at admin: the cluster cap is a single knob operators
care about. Dividing it locally per worker would either need
out-of-band coordination or accept N× the configured budget. Admin
is the only party that knows how many execute-capable workers there
are, so it owns the math.
Admin side (weed/admin/plugin):
- Registry.CountCapableExecutors(jobType) returns the number of
non-stale workers with CanExecute=true.
- New file cluster_rate_limit.go: decorateClusterContextForJob clones
the input ClusterContext and injects two metadata keys for
s3_lifecycle. cloneClusterContext duplicates Metadata so per-job
decoration doesn't race shared base state.
- executeJobWithExecutor calls the decorator after loading the admin
config; other job types pass through unchanged.
Worker side (weed/worker/tasks/s3_lifecycle):
- New cluster_rate_limit.go declares the constants both sides agree
on (admin-config field names, metadata keys). Plain strings on the
admin side keep weed/admin/plugin free of a dependency on the
s3_lifecycle worker package; the two sets of constants are pinned
to identical values and a mismatch would silently disable rate
limiting.
- handler.go executeDailyReplay reads ClusterContext.Metadata,
builds a rate.Limiter, and passes it into dailyrun.Config{Limiter}.
Missing/empty/non-positive values → no limiter (legacy unlimited
behavior). burst defaults to 2 × rate, clamped to ≥1 to avoid a
bucket that never refills (see the sketch after this list).
- Admin form gains two fields under "Scope": cluster_deletes_per_second
(rate, 0 = unlimited) and cluster_deletes_burst (0 = 2 × rate).
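A minimal Go sketch of the worker-side construction, assuming a plain map[string]string for ClusterContext.Metadata and illustrative key names (the real constants live in the shared cluster_rate_limit.go files):

```go
package main

import (
	"fmt"
	"strconv"

	"golang.org/x/time/rate"
)

// limiterFromMetadata builds a rate.Limiter from the per-worker share the
// admin injected into ClusterContext.Metadata. The key names here are
// illustrative stand-ins for the pinned constants both sides agree on.
func limiterFromMetadata(md map[string]string) *rate.Limiter {
	rps, err := strconv.ParseFloat(md["s3_lifecycle.deletes_per_second"], 64)
	if err != nil || rps <= 0 {
		return nil // missing/empty/non-positive => legacy unlimited behavior
	}
	burst, err := strconv.Atoi(md["s3_lifecycle.deletes_burst"])
	if err != nil || burst <= 0 {
		burst = int(2 * rps) // default burst = 2 x rate
	}
	if burst < 1 {
		burst = 1 // clamp so a tiny rate still yields a usable bucket
	}
	return rate.NewLimiter(rate.Limit(rps), burst)
}

func main() {
	l := limiterFromMetadata(map[string]string{
		"s3_lifecycle.deletes_per_second": "0.2",
		"s3_lifecycle.deletes_burst":      "0",
	})
	fmt.Println(l.Limit(), l.Burst()) // 0.2 1
}
```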
Metric:
- New S3LifecycleDispatchLimiterWaitSeconds histogram observes how
long each Limiter.Wait blocks before a LifecycleDelete RPC.
Operators tune the cap by reading p95 — near-zero means the cap
isn't binding, a long tail at 1/rate means it is.
Tests:
- weed/admin/plugin/cluster_rate_limit_test.go: 9 cases covering
pass-through for non-allocator job types, rps=0 / no-executors
skip, even sharing, burst sharing, burst=0 omit (worker default
kicks in), burst floor of 1, no mutation of input metadata, nil
input.
- weed/worker/tasks/s3_lifecycle/cluster_rate_limit_test.go: 7 cases
covering nil/empty/missing metadata, non-positive/invalid rate,
positive rate builds correctly, burst missing defaults to 2× rate,
tiny rate clamps burst to ≥1.
Build clean. Phase 2 (#9446) and Phase 4 engine (#9447) are the
parents; this branch stacks on Phase 2 since it consumes
dailyrun.Config{Limiter} which lands there.
* fix(s3/lifecycle): divide cluster budget by active workers, not all capable
gemini pointed out that s3_lifecycle has MaxJobsPerDetection=1
(handler.go:189) — it's a singleton job, only one worker is ever active.
Dividing the cluster_deletes_per_second budget by the count of capable
executors gave the single active worker just 1/N of the configured cap.
Pass adminRuntime.MaxJobsPerDetection through to the decorator. Divisor
is now min(executors, maxJobsPerDetection), clamped to >=1. For
s3_lifecycle (maxJobs=1) the active worker gets the full budget; for a
hypothetical parallel-dispatch job (maxJobs>1) the budget divides
across the running-set.
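A small sketch of the allocation math after this fix; the function and parameter names are illustrative, not the actual decorator API:

```go
package main

import "fmt"

// perWorkerShare divides the cluster-wide deletes-per-second budget across
// the workers that can actually run the job at once: min(capable executors,
// maxJobsPerDetection), clamped to >= 1.
func perWorkerShare(clusterRPS float64, capableExecutors, maxJobsPerDetection int) float64 {
	divisor := capableExecutors
	if maxJobsPerDetection > 0 && maxJobsPerDetection < divisor {
		divisor = maxJobsPerDetection
	}
	if divisor < 1 {
		divisor = 1
	}
	return clusterRPS / float64(divisor)
}

func main() {
	fmt.Println(perWorkerShare(100, 4, 1))  // singleton job: full budget, 100
	fmt.Println(perWorkerShare(100, 4, 4))  // parallel-limited: 25 per worker
	fmt.Println(perWorkerShare(100, 4, 10)) // maxJobs exceeds executors: divisor 4, so 25
}
```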
Tests swap the SharedEvenly case for two pinned scenarios:
- SingletonJobGetsFullBudget: maxJobs=1 across 4 executors => 100/1
- SharedEvenlyWhenParallelLimited: maxJobs=4 across 4 executors => 25/worker
- MaxJobsExceedsExecutors: maxJobs=10 across 4 executors => divisor 4
* feat(s3/lifecycle): drop Worker Count knob from admin config form
The "Worker Count" admin field controlled in-process pipeline goroutines
across the 16-shard space — per-worker tuning, not a cluster-wide scope
concern. Operators looking at the form alongside Cluster Delete Rate
reasonably misread it as the number of workers in the cluster.
Drop the form field and DefaultValues entry. cfg.Workers is now hardcoded
to shardPipelineGoroutines (=1) inside ParseConfig; the rest of the
plumbing through dailyrun.Config.Workers stays so a future need can
re-introduce it as a worker-local knob (or just bump the constant).
handler_test.go pins that "workers" must NOT appear in the form so the
removal doesn't silently regress.

fd463155e4
fix(ec): planner treats each (server, disk_id) as a distinct target (#9369) (#9371)
* fix(ec): planner treats each (server, disk_id) as a distinct target (#9369)
master_pb.DataNodeInfo.DiskInfos is keyed by disk type, so a volume server with multiple physical disks of the same type collapses into a single DiskInfo. Per-disk attribution survives only inside the VolumeInfos[].DiskId / EcShardInfos[].DiskId records, and the active topology never put it back together. The EC planner saw N candidates instead of N×disks, returned a short plan, and createECTargets round-robined extra shards onto the same (server, disk_id) — colliding with the #9185 disk_id-aware ReceiveFile.
Reconstruct per-physical-disk view in UpdateTopology by splitting each DiskInfo into one entry per observed disk_id, and index volumes / EC shards by their own DiskId so lookups stay aligned. Refuse to plan an EC task when fewer than totalShards distinct disks are available rather than packing shards onto the same disk.
Threads dataShards/parityShards through planECDestinations, createECTargets and createECTaskParams so the helpers don't depend on the OSS 10+4 constants — keeps enterprise merges clean.
* trim verbose comments
* align EC param signatures with enterprise
- dataShards/parityShards: uint32 → int (matches enterprise's ratio API)
- drop unused multiPlan from createECTaskParams
- minTotalDisks: total/parity+1 → ceil(total/parity), correct for non-default ratios
Reduces merge surface when this PR lands in seaweed-enterprise.
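A rough sketch of the per-physical-disk candidate expansion described above, using simplified stand-in types rather than the real master_pb.DataNodeInfo / active-topology structures:

```go
package main

import "fmt"

// diskTarget identifies one placement candidate: a specific disk_id on a
// specific volume server, not the server as a whole.
type diskTarget struct {
	server string
	diskID uint32
}

// expandTargets splits each server into one candidate per observed disk_id,
// so a server with three disks of the same type contributes three targets.
func expandTargets(observed map[string][]uint32) []diskTarget {
	var targets []diskTarget
	for server, diskIDs := range observed {
		for _, id := range diskIDs {
			targets = append(targets, diskTarget{server: server, diskID: id})
		}
	}
	return targets
}

func main() {
	targets := expandTargets(map[string][]uint32{
		"vs1:8080": {0, 1, 2},
		"vs2:8080": {0, 1},
	})
	totalShards := 14 // e.g. 10 data + 4 parity
	if len(targets) < totalShards {
		// matches the new refusal: never pack shards onto the same disk
		fmt.Printf("refusing to plan: %d distinct disks < %d shards\n", len(targets), totalShards)
	}
}
```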

5d43f84df7
refactor(plugin): rename detection_interval_seconds → detection_interval_minutes (#9366)
Minutes is the natural granularity for detection cadence — every production handler already set the seconds field to a 60-multiple (17*60, 30*60, 3600, 24*60*60). Switching to minutes drops the *60 arithmetic and matches the unit conventions used elsewhere in the plugin worker forms.
- Proto: AdminRuntimeDefaults + AdminRuntimeConfig.detection_interval_* field renamed.
- Helpers: durationFromMinutes / minutesFromDuration alongside the existing seconds variants in plugin_scheduler.go.
- Handlers: vacuum, ec_balance, balance, erasure_coding, iceberg, admin_script, s3_lifecycle now declare DetectionIntervalMinutes.
- Admin: scheduler_status + types + UI templ + plugin_api.go pass through the new field; UI label and table cells switch to "min".
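A minimal sketch of the minute-granularity helpers; the signatures are assumed to mirror the existing seconds variants:

```go
package main

import (
	"fmt"
	"time"
)

// durationFromMinutes converts a minutes field from the proto config into a
// time.Duration; minutesFromDuration is the inverse used when rendering the form.
func durationFromMinutes(minutes int32) time.Duration {
	return time.Duration(minutes) * time.Minute
}

func minutesFromDuration(d time.Duration) int32 {
	return int32(d / time.Minute)
}

func main() {
	fmt.Println(durationFromMinutes(17))             // 17m0s
	fmt.Println(minutesFromDuration(24 * time.Hour)) // 1440
}
```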

7f254e158e
feat(worker/s3_lifecycle): plugin handler with admin UI config (#9362)
* feat(s3/lifecycle): scheduler — N pipelines over an even shard split
Scheduler.Run spawns Workers Pipeline goroutines plus one engine-refresh
ticker. Each worker owns a contiguous AssignShards(idx, total) slice of
[0, ShardCount) and runs Pipeline.Run with EventBudget bounding each
iteration; brief RetryBackoff between iterations avoids hot-loop on
errors. The refresh ticker rebuilds the engine snapshot from the filer's
bucket configs every RefreshInterval.
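A small sketch of the contiguous shard split each pipeline goroutine owns; the function name, signature, and the 16-shard constant are assumptions for illustration:

```go
package main

import "fmt"

const shardCount = 16 // assumed size of the lifecycle shard space

// assignShards hands worker idx a contiguous slice of [0, shardCount),
// spreading any remainder across the first workers.
func assignShards(idx, total int) []int {
	base, extra := shardCount/total, shardCount%total
	start := idx*base + min(idx, extra)
	size := base
	if idx < extra {
		size++
	}
	shards := make([]int, 0, size)
	for s := start; s < start+size; s++ {
		shards = append(shards, s)
	}
	return shards
}

func main() {
	for w := 0; w < 3; w++ {
		fmt.Println(w, assignShards(w, 3)) // 0 [0..5], 1 [6..10], 2 [11..15]
	}
}
```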
LoadCompileInputs / IsBucketVersioned / AllActivePriorStates are
exported from a configload.go sibling so the shell command can move to
this shared implementation in a follow-up.
* refactor(shell): reuse scheduler.LoadCompileInputs in run-shard
Drop the local copies of loadLifecycleCompileInputs / isBucketVersioned
/ allActivePriorStates / lifecycleParseError that the new
scheduler package now exports. Same behavior, one source of truth.
* feat(worker/s3_lifecycle): plugin handler with admin UI config
Registers a JobHandler for s3_lifecycle via pluginworker.RegisterHandler.
Admin pulls the descriptor over the worker plugin gRPC and renders the
AdminConfigForm + WorkerConfigForm in the existing UI:
Admin form (cluster shape):
- workers (1..16, default 1)
- s3_grpc_endpoints (comma list)
Worker form (operational tuning):
- dispatch_tick_ms (default 5000)
- checkpoint_tick_ms (default 30000)
- refresh_interval_ms (default 300000)
- event_budget (default 0 = unbounded)
Detect emits a single proposal whenever S3 endpoints + filer addresses
are configured. MaxExecutionConcurrency=1 so admin only ever runs one
lifecycle daemon per worker; a fresh proposal next cycle restarts it
if the prior Execute exits.
Execute dials the configured S3 endpoint + filer, builds a
scheduler.Scheduler with the parsed config, and runs it until
ctx cancellation. Reuses the existing scheduler / dispatcher /
reader / engine packages — the handler is the thin glue that
parses descriptor values and wires the long-running daemon.
* proto(plugin): add s3_grpc_addresses to ClusterContext
So workers can dial s3 servers discovered by the master rather than a
hand-typed list in the admin form.
* feat(admin): populate ClusterContext.s3_grpc_addresses from master
ListClusterNodes(S3Type) returns the live S3 servers; the plugin
scheduler now hands these to job handlers alongside filer/volume
addresses.
* feat(worker/s3_lifecycle): discover s3 endpoints from cluster context
Drop the s3_grpc_endpoints admin form field and read the master-supplied
ClusterContext.S3GrpcAddresses instead. Operators no longer maintain a
hand-typed list, and a stale entry self-heals when the master's view
updates.
* feat(worker/s3_lifecycle): time-based runtime cap, friendlier cadence units
- dispatch_tick_minutes (was *_ms): minutes is the natural granularity
for a daily batch; default 1 minute.
- checkpoint_tick_seconds: seconds for the durable cursor write; default
30 seconds.
- refresh_interval_minutes: minutes for the engine snapshot rebuild.
- max_runtime_minutes replaces event_budget. Each daily run is bounded
by wall clock — typical run wraps in well under an hour because the
cursor persists and the meta-log streams fast. Default 60 minutes.
- AdminRuntimeDefaults.DetectionIntervalSeconds = 86400 so the admin
schedules one job per day.

a1e5eb9dad
Fix UI prefix url encoding (#9344)
* Fix filer UI navigation for URL-sensitive object prefixes
* Fix filer UI navigation for URL-sensitive object prefixes
* Clarify filer UI path escaping test name
Rename the legacy filer UI path test to describe the actual behavior being checked. The printpath helper preserves timestamp characters that are valid in URL path components, while the PR fix is focused on query-string escaping for path and cursor parameters.

7b0b64db65
fix(admin/view): wrap plugin history URL with basePath (#9341)
Plugin tabs/sub-tabs use history.pushState/replaceState to keep the
URL bar in sync with the active view, but updateURL fed it the raw
output of buildPluginURL ("/plugin/lanes/<lane>/..."). Under a
urlPrefix deployment that strips the prefix, so reloading the page
hit /plugin/... directly and 404'd at the proxy.
Wrap with basePath() so the rewritten URL keeps the deployment
prefix.
Reported at #9240.

1c0e24f06a
fix(balance): don't move remote-tiered volumes; don't fatal on missing .idx (#9335)
* fix(volume): don't fatal on missing .idx for remote-tiered volume
A .vif left behind without its .idx (orphaned by a crashed move, partial copy, or hand-edit) would trip glog.Fatalf in checkIdxFile and take the whole volume server down on boot, killing every healthy volume on it too. For remote-tiered volumes treat it as a per-volume load error so the server can come up and the operator can clean up the stray .vif. Refs #9331.
* fix(balance): skip remote-tiered volumes in admin balance detection
The admin/worker balance detector had no equivalent of the shell-side guard ("does not move volume in remote storage" in command_volume_balance.go), so it scheduled moves on remote-tiered volumes. The "move" copies .idx/.vif to the destination and then calls Volume.Destroy on the source, which calls backendStorage.DeleteFile — deleting the remote object the destination's new .vif now points at. Populate HasRemoteCopy on the metrics emitted by both the admin maintenance scanner and the worker's master poll, then drop those volumes at the top of Detection. Fixes #9331.
* Apply suggestion from @gemini-code-assist[bot]
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* fix(volume): keep remote data on volume-move-driven delete
The on-source delete after a volume move (admin/worker balance and shell volume.move) ran Volume.Destroy with no way to opt out of the remote-object cleanup. Volume.Destroy unconditionally calls backendStorage.DeleteFile for remote-tiered volumes, so a successful move would copy .idx/.vif to the destination and then nuke the cloud object the destination's new .vif was already pointing at.
Add VolumeDeleteRequest.keep_remote_data and plumb it through Store.DeleteVolume / DiskLocation.DeleteVolume / Volume.Destroy. The balance task and shell volume.move set it to true; the post-tier-upload cleanup of other replicas and the over-replication trim in volume.fix.replication also set it to true since the remote object is still referenced. Other real-delete callers keep the default. The delete-before-receive path in VolumeCopy also sets it: the inbound copy carries a .vif that may reference the same cloud object as the existing volume. Refs #9331.
* test(storage): in-process remote-tier integration tests
Cover the four operations the user is most likely to run against a cloud-tiered volume — balance/move, vacuum, EC encode, EC decode — by registering a local-disk-backed BackendStorage as the "remote" tier and exercising the real Volume / DiskLocation / EC encoder code paths. Locks in:
- Destroy(keepRemoteData=true) preserves the remote object (move case)
- Destroy(keepRemoteData=false) deletes it (real-delete case)
- Vacuum/compact on a remote-tier volume never deletes the remote object
- EC encode requires the local .dat (callers must download first)
- EC encode + rebuild round-trips after a tier-down
Tests run in-process and finish in under a second total — no cluster, binary, or external storage required.
* fix(rust-volume): keep remote data on volume-move-driven delete
Mirror the Go fix in seaweed-volume: plumb keep_remote_data through grpc volume_delete → Store.delete_volume → DiskLocation.delete_volume → Volume.destroy, and skip the s3-tier delete_file call when the flag is set. The pre-receive cleanup in volume_copy passes true for the same reason as the Go side: the inbound copy carries a .vif that may reference the same cloud object as the existing volume.
The Rust loader already warns rather than fataling on a stray .vif without an .idx (volume.rs load_index_inmemory / load_index_redb), so no counterpart to the Go fatal-on-missing-idx fix is needed. Refs #9331.
* fix(volume): preserve remote tier on IO-error eviction; fix EC test target
Two review nits:
- Store.MaybeAddVolumes' periodic cleanup pass deleted IO-errored volumes with keepRemoteData=false, so a transient local fault on a remote-tiered volume would also nuke the cloud object. Track the delete reason via a parallel slice and pass keepRemoteData=v.HasRemoteFile() for IO-error evictions; TTL-expired evictions still pass false.
- TestRemoteTier_ECEncodeDecode_AfterDownload deleted shards 0..3 but called them "parity" — by the klauspost/reedsolomon convention shards 0..DataShardsCount-1 are data and DataShardsCount..TotalShardsCount-1 are parity. Switch the loop to delete the parity range so the intent matches the indices.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
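A hedged sketch of the keep-remote-data opt-out described above, with simplified stand-in types for the real Volume / BackendStorage plumbing:

```go
package main

import (
	"fmt"
	"os"
)

type backendStorage interface {
	DeleteFile(key string) error
}

type volume struct {
	localFiles []string // .dat/.idx/.vif paths
	remoteKey  string   // set when the volume is tiered to remote storage
	backend    backendStorage
}

// destroy removes the local files; the remote object is only deleted when the
// caller is doing a real delete, not a balance/move that re-homes the volume.
func (v *volume) destroy(keepRemoteData bool) error {
	for _, f := range v.localFiles {
		if err := os.Remove(f); err != nil && !os.IsNotExist(err) {
			return err
		}
	}
	if v.remoteKey != "" && !keepRemoteData {
		return v.backend.DeleteFile(v.remoteKey)
	}
	return nil
}

type printBackend struct{}

func (printBackend) DeleteFile(key string) error {
	fmt.Println("deleting remote object", key)
	return nil
}

func main() {
	v := &volume{remoteKey: "bucket/vol_7.dat", backend: printBackend{}}
	_ = v.destroy(true)  // balance/move: remote object preserved
	_ = v.destroy(false) // real delete: remote object removed
}
```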

f2c3bd7b77
fix(admin/view): define basePath in plugin IIFE scopes (#9298)
The plugin.templ and plugin_lane.templ components use basePath() in their IIFE (Immediately Invoked Function Expression) scopes to handle subdirectory deployments. However, basePath was not defined locally, causing "basePath is not defined" errors when accessing plugin pages.
Added local basePath function definitions in both files, matching the pattern from admin.js. This function checks window.__BASE_PATH__ (set by the layout during page initialization) and prepends it to API paths.

14cd426cf9
templ

db34e8b6fd
feat(admin): prefer stored S3 Content-Type metadata over key-extension MIME inference (#9286)
* feat(admin): prefer stored filer mime in file browser and properties
* feat(mime): enhance MIME type registration and improve fallback logic
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: baracudaz <baracudaz@users.noreply.github.com>

e2f96687ff
fix(admin): use protocol-relative URLs for component links so HTTPS clusters don't break clicks (#9256)
* fix(admin): use protocol-relative URLs for component links
Hardcoded http:// in admin UI templates breaks browser-initiated clicks to master / volume / filer / EC shard / Iceberg REST URLs whenever the target component runs HTTPS-only via security.toml [https.X] sections. The browser sends plain HTTP to a TLS-only endpoint and gets 400 "client sent an HTTP request to an HTTPS server". Same root pattern as #9227 (admin's own backend /dir/status fetch); this PR is the browser-facing equivalent.
Replace fmt.Sprintf("http://%s...") with fmt.Sprintf("//%s...") and the JS-string '<a href="http://' with '<a href="//' so the browser uses the same scheme as the page hosting the link.
Backwards compatible:
- HTTPS-only deployments: links now work
- HTTP-only deployments: identical behavior to before
- Mixed: edge case, addressed by future per-component public-URL work
Affected templates (9 files), each kept in lockstep with its generated _templ.go sibling so reviewers don't need to run templ generate:
- weed/admin/view/app/admin.templ
- weed/admin/view/app/cluster_filers.templ
- weed/admin/view/app/cluster_masters.templ (Go templ + JS modal)
- weed/admin/view/app/cluster_volume_servers.templ (Go templ + JS modal)
- weed/admin/view/app/cluster_volumes.templ
- weed/admin/view/app/ec_volume_details.templ
- weed/admin/view/app/volume_details.templ
- weed/admin/view/app/iceberg_catalog.templ
- weed/admin/view/app/s3tables_buckets.templ
17 link constructions total, +32/-32 lines.
* fix(admin): protocol-relative URLs in iceberg + s3tables JS overrides
Per Gemini code review on this PR: the JS scripts in iceberg_catalog and s3tables_buckets templates overwrite the href attribute of the "Open Iceberg REST" links after page load, replacing the protocol-relative URL set by the templ render with a hardcoded http://<host>:<port>/v1/config. Apply the same protocol-relative fix to the JS template literals so they don't undo the templ-side change. Browser uses the page scheme (http or https) to fill in the protocol. Mirrored in iceberg_catalog_templ.go and s3tables_buckets_templ.go.
* fix(admin): displayed Iceberg endpoint scheme follows page protocol
Per CodeRabbit review on this PR: the on-page guidance text in iceberg and s3tables templates still showed a literal `http://` even after the clickable link was switched to a protocol-relative URL. In HTTPS-only deployments operators see `http://host:8181/v1` as the suggested endpoint, copy it, and get a broken connection.
Wrap the scheme in <span id="iceberg-protocol"> (and the s3tables counterpart) and have the existing inline script set its innerText to window.location.protocol minus the trailing colon. Same pattern as the existing dynamic host substitution. Mirrored in *_templ.go so reviewers do not need templ generate.
SQL/JSON code-block examples (CREATE EXTERNAL TABLE ... ENDPOINT 'http://...', "uri": "http://...") are intentionally left as-is — they are starter snippets users adapt to their environment, not clickable or copy-paste-into-runtime values. Happy to follow up with server-side scheme threading if requested.
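The core change is a small swap in the URL format strings; a tiny sketch of before and after (host and path are illustrative):

```go
package main

import "fmt"

func main() {
	host := "volume-1.example.com:8080"

	// Before: hardcoded scheme breaks when the target is HTTPS-only.
	before := fmt.Sprintf("http://%s/ui/index.html", host)

	// After: a protocol-relative URL inherits the scheme of the admin page.
	after := fmt.Sprintf("//%s/ui/index.html", host)

	fmt.Println(before)
	fmt.Println(after)
}
```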

fa492a9eed
fix(admin): wrap plugin URLs with basePath for subdir deployments
Two more spots that broke under a subdirectory deployment:
- plugin.templ pluginRequest() called fetch(url) with relative API paths from 14+ callers; wrap once inside the helper so they all honor window.__BASE_PATH__.
- plugin_lane.templ generated <a href="/plugin/configuration?job=..."> with an absolute path; wrap with basePath() so the link stays inside the deployment prefix.
Follow-up to a6adf530c.

3ea489d013
fix(admin): wrap plugin lane fetch URL with basePath
Plugin lane page fetches API endpoints with raw absolute URLs, breaking deployments under a subdirectory. Wrap the fetch URL with basePath() so window.__BASE_PATH__ is honored, matching other admin pages.
Addresses https://github.com/seaweedfs/seaweedfs/issues/9240

f407bdaa36
fix(admin): use TLS-aware HTTP client for /dir/status fetch (#9227)
fetchPublicUrlMap() in weed/admin/dash/cluster_topology.go uses a
dedicated &http.Client{} that doesn't honor security.toml client TLS
configuration, and hardcodes "http://" in the URL. When master is
configured HTTPS-only ([https.master] set), every cluster topology
cache refresh logs:
NOTICE: http: TLS handshake error from <admin-ip>:<port>: client
sent an HTTP request to an HTTPS server
The function falls through to glog.V(1).Infof and returns nil, so the
admin UI loses PublicUrl enrichment for data nodes. Cosmetic but noisy.
Switch to util_http.GetGlobalHttpClient() whose Do() calls
NormalizeHttpScheme(), which automatically rewrites http:// to https://
when [https.client] is enabled and presents the configured client cert.
Preserve the 5-second timeout via context.WithTimeout().
Same pattern as weed/admin/handlers/file_browser_handlers.go,
weed/server/master_server.go, weed/shell/command_volume_fsck.go.
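A sketch of the shape of the fix, using a plain *http.Client as a stand-in for util_http.GetGlobalHttpClient() so the example stays self-contained; the real shared client adds the security.toml client TLS config and scheme normalization:

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"time"
)

// sharedClient stands in for the global TLS-aware client; a bare http.Client
// is used here only to keep the sketch runnable on its own.
var sharedClient = &http.Client{}

func fetchDirStatus(master string) ([]byte, error) {
	// Preserve the original 5-second budget with a context instead of a
	// per-request throwaway client with its own Timeout.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, "http://"+master+"/dir/status", nil)
	if err != nil {
		return nil, err
	}
	// In the real code the shared client's Do() rewrites http:// to https://
	// when [https.client] is enabled and presents the configured client cert.
	resp, err := sharedClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return io.ReadAll(resp.Body)
}

func main() {
	if _, err := fetchDirStatus("localhost:9333"); err != nil {
		fmt.Println("fetch failed:", err)
	}
}
```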

5eead9409a
fix(admin): S3 Tables CSRF token + non-empty 409 status (#9221)
* fix(admin): attach CSRF token to S3 Tables write requests
Several POST/PUT/DELETE calls in s3tables.js were sent without an X-CSRF-Token header while the corresponding handlers in weed/admin/dash/s3tables_management.go enforce CSRF via requireSessionCSRFToken, so authenticated users hit "invalid CSRF token" on actions like creating a table bucket (#9220), updating policies, and managing tags.
Add an s3tWriteHeaders helper that pulls the token from the existing csrf-token meta tag and use it on every write to /api/s3tables/buckets, /bucket-policy, /tables, /table-policy, and /tags. The Iceberg-page write paths already attached the token and are unchanged.
Fixes #9220
* fix(admin): map BucketNotEmpty/NamespaceNotEmpty to 409 for S3 Tables
DELETE on a non-empty table bucket or namespace returned HTTP 500 because s3TablesErrorStatus didn't list ErrCodeBucketNotEmpty or ErrCodeNamespaceNotEmpty in its conflict case, even though the backend handler emits them with 409 Conflict (matching AWS S3 Tables). Add both codes to the existing conflict mapping.
* refactor(admin): route Iceberg S3 Tables writes through s3tWriteHeaders
Iceberg namespace/table create and Iceberg table delete were still hand-rolling CSRF headers. Replace those blocks with the existing s3tWriteHeaders() helper so every S3 Tables write uses the same code path. Drop the now-unused csrfTokenInput.value population in initIcebergNamespaces and initIcebergTables (the templ hidden inputs have no server-rendered value, and nothing reads the input now that the JS reads the token from the meta tag via getCSRFToken()).

0fcd5173be
fix(admin): use basePath for API fetches when urlPrefix is set (#9197)
* fix(admin): use basePath for API fetches when urlPrefix is set
* fix(admin): drop duplicate iam-utils script on Groups page
* fix(admin): route topics page fetches through basePath
The Topics page missed two fetch() calls that still used root-relative URLs, so create-topic and view-details still broke when -urlPrefix was set.
Co-authored-by: Maksim Babkou <maksim.babkou@innovatrics.com>
Co-authored-by: Chris Lu <chris.lu@gmail.com>

036191c78a
Merge branch 'master' of https://github.com/seaweedfs/seaweedfs

ae93f87a46
adjust logo

c6302fcb54
feat(iam): allow caller-supplied AccessKeyId and SecretAccessKey in CreateAccessKey (#9172)
* feat(iam): support caller-supplied AccessKeyId and SecretAccessKey in CreateAccessKey
Both IAM implementations (standalone and embedded) now check for caller-supplied AccessKeyId and SecretAccessKey form parameters before generating random credentials. If provided, the caller-supplied values are used. If empty, random keys are generated as before. This enables programmatic identity provisioning where the caller needs to control the S3 credentials. Backward-compatible: no behavior change for callers that omit these parameters.
* refactor(iam): extract shared caller-supplied credential validation
Move the AccessKeyId/SecretAccessKey format checks and the in-memory collision scan into weed/iam so the standalone IAM API, the embedded IAM in s3api, and the admin dashboard all enforce the same rules.
- ValidateCallerSuppliedAccessKeyId: 4-128 alphanumeric (rejects SigV4-breaking characters like '/' and '=').
- ValidateCallerSuppliedSecretAccessKey: 8-128 chars.
- FindAccessKeyOwner: scans identities and service accounts and returns the owning entity type + name for debug logging, without exposing the owner in caller-facing error messages.
The admin dashboard previously only length-checked caller-supplied keys; it now enforces the same alphanumeric rule, which matches what SigV4 actually accepts anyway.
* fix(iam): reject partial caller-supplied AccessKeyId/SecretAccessKey
Previously, if a caller supplied only one of AccessKeyId or SecretAccessKey, CreateAccessKey logged a warning and auto-generated the missing half. That silently returns a credential the caller did not fully choose, which is surprising and easy to miss in a response they expected to echo back their input. Return ErrCodeInvalidInputException instead: either both are supplied or neither is. Updates the mixed-supply tests in weed/iamapi and weed/s3api to assert the rejection.
* chore(iam): centralize and broaden sensitive form redaction
DoActions and ExecuteAction both had an inline loop that redacted SecretAccessKey from their debug-level request log. Replace the two copies with iam.RedactSensitiveFormValues, backed by an explicit sensitive-keys set. The set now also covers Password, OldPassword, NewPassword, PrivateKey, and SessionToken. None of those parameters are used by today's IAM actions, but naming them here makes the log-safety guarantee survive future additions such as LoginProfile / STS.
* test(iam): cover the upper length bound for CreateAccessKey
TestCreateAccessKeyBoundary / TestEmbeddedIamCreateAccessKeyBoundary only exercised the 3/4-char lower edge. Add cases for 128 (accepted) and 129 (rejected) for AccessKeyId, plus 7 / 128 / 129-char cases for SecretAccessKey, so both ends of the validator are locked in at the handler level (the pure validators in weed/iam already cover this).
* fix(s3api/iam): verify user existence before RNG and collision scan
In the embedded IAM CreateAccessKey, the user lookup ran last: a request for a non-existent user still walked the whole identity / service-account list for collisions and, if no caller-supplied keys were present, generated fresh random credentials with crypto/rand before the NoSuchEntity error finally surfaced.
Reorder: validate inputs, then find the target identity, then do the collision scan, then generate keys. A missing user now fails fast and consumes no entropy, and the handler returns NoSuchEntity instead of a misleading EntityAlreadyExists when both the user is missing and the supplied AccessKeyId happens to collide with another identity's key. Add TestEmbeddedIamCreateAccessKeyRejectsMissingUser to lock in the "no mutation on unknown user" guarantee.
The standalone iamapi CreateAccessKey intentionally keeps its pre-existing "create-or-attach" semantics where a missing user is implicitly provisioned — that is a behavior change beyond the scope of this PR.
* test(iam): tighten collision leak assertion and cover 8-char secret
- Rename the collision-owner identity in TestCreateAccessKeyRejectsCollision (both iamapi and the embedded s3api test) from "existing" / "ExistingUser" to "ownerAlpha". The old assert.NotContains check was effectively a no-op because the error message never contained those substrings; a distinctive name shared with no part of the expected error body makes the leak guard actually meaningful if the wording ever drifts. The embedded test also adds a NotContains assertion that was previously missing entirely.
- Add an explicit 8-char SecretAccessKey pass case to both boundary tests so the lower edge of the validator is locked in at the handler level alongside the 7 / 128 / 129-char cases.
* fix(iamapi): enforce both-or-none before the collision lookup
In the standalone IAM CreateAccessKey, FindAccessKeyOwner ran before the partial-credential check. If a caller supplied only AccessKeyId and it happened to collide with an existing key, the response was EntityAlreadyExists instead of the more fundamental InvalidInput for omitting SecretAccessKey — wrong error class, and leaked the fact that the probed key is already in use. Swap the order: validate both-or-none first, then do the collision scan. Matches the embedded IAM path and AWS behavior. Add a case to TestCreateAccessKeyRejectsPartialSupply that combines partial supply with a collision to lock in the ordering.
* fix(admin): reject partial caller-supplied AccessKey/SecretKey
The admin dashboard path silently generated the missing half when a caller supplied only one of AccessKey or SecretKey, while the IAM API and embedded IAM paths now reject this. Align the three: if exactly one is provided, return ErrInvalidInput. Also simplifies the generator block — either both are provided or neither is, so there is no mixed path to handle.
* test(s3api/iam): guard dereferences in caller-supplied-keys test
TestEmbeddedIamCreateAccessKeyWithCallerSuppliedKeys dereferenced *AccessKeyId/*SecretAccessKey/*UserName and indexed Identities[0].Credentials[0] without first verifying shape, so any future regression that returns a partial response or skips the config mutation would panic mid-assertion instead of failing with a clear message. Add require.NotNil on the response pointers and require.Len on the identities/credentials slices before the asserts.
* test(iamapi): exercise the service-account branch of the collision check
FindAccessKeyOwner scans both Identities[*].Credentials and ServiceAccounts[*].Credential, but TestCreateAccessKeyRejectsCollision only covered the identity branch. Split the test into two subtests — one per branch — so a future refactor that drops the service-account scan (or mutates the existing credential) trips a failure. Also asserts the existing service-account credential is not mutated and no credential is attached to the target identity on rejection.
* test(iam): isolate 129-char secret subcase from prior credential
In both TestCreateAccessKeyBoundary (iamapi) and TestEmbeddedIamCreateAccessKeyBoundary (s3api), the 129-char SecretAccessKey subcase reused the "validkey" AccessKeyId that the preceding 8-char subcase had just persisted into the config. The test still asserted the right outcome because the handler validates secret length before running the collision scan — but if the two checks ever swap, the subcase would pass (or fail) for the wrong reason. Reset the in-memory credentials before the 129-char subcase, matching the pattern already used by the 3/128/129-char AccessKeyId and 7-char secret subcases. No behavior change; purely test isolation.
Co-authored-by: Chris Lu <chris.lu@gmail.com>
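A compact sketch of the shared validation rules described above (both-or-none, 4-128 alphanumeric key id, 8-128 character secret); the function name is illustrative, not the weed/iam API:

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

var accessKeyIdRe = regexp.MustCompile(`^[A-Za-z0-9]{4,128}$`)

// validateCallerSuppliedCredentials enforces both-or-none plus the format
// rules; empty/empty means "generate random credentials as before".
func validateCallerSuppliedCredentials(accessKeyId, secretAccessKey string) error {
	if accessKeyId == "" && secretAccessKey == "" {
		return nil // neither supplied: fall back to random generation
	}
	if accessKeyId == "" || secretAccessKey == "" {
		return errors.New("InvalidInput: supply both AccessKeyId and SecretAccessKey or neither")
	}
	if !accessKeyIdRe.MatchString(accessKeyId) {
		return errors.New("InvalidInput: AccessKeyId must be 4-128 alphanumeric characters")
	}
	if n := len(secretAccessKey); n < 8 || n > 128 {
		return errors.New("InvalidInput: SecretAccessKey must be 8-128 characters")
	}
	return nil
}

func main() {
	fmt.Println(validateCallerSuppliedCredentials("AKIDEXAMPLE", ""))          // partial: rejected
	fmt.Println(validateCallerSuppliedCredentials("AKIDEXAMPLE", "s3cr3tkey")) // ok: <nil>
}
```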

eaab3ec5d0
fix(topology): drop per-disk task-type conflict map (#9147) (#9166)
* fix(topology): drop per-disk task-type conflict map (#9147)
Different job types (Balance, ErasureCoding, Vacuum) operate on different volumes, so a per-disk cross-type exclusion adds no correctness guarantee beyond what HasAnyTask already enforces at task detection time. The conflict map turned this into a deadlock on small clusters: a single in-flight (or retrying) balance task would prune the source/destination disks from EC placement, dropping the candidate count below MinTotalDisks and permanently blocking auto-EC.
Removing the map lets EC see all eligible disks. Per-volume safety is still guaranteed by HasAnyTask, and per-disk load shaping remains available via MaxConcurrentTasksPerDisk.
* chore(topology): trim verbose comments from #9147 fix

46b801aedb
fix(admin): list all masters and dedupe EC file counts in dashboard (#9093)
* fix(admin): list all masters and dedupe EC file counts in dashboard
Dashboard -> Master Nodes only ever showed the currently connected master because getMasterNodesStatus hard-coded a single entry. Replace it with a RaftListClusterServers call that returns every master in the raft group and tags the real leader, falling back to the current master only if the raft call fails.
Buckets -> Object Store Buckets could render 0 objects for a bucket backed by an EC volume. Every shard holder reports the same whole-volume file_count (read from the replicated .ecx), so the first-seen value wins; if that first node had not yet finished loading .ecx it reported 0 and pinned the aggregate at 0. Take the max across reporting nodes instead.
The dashboard header total_files also dropped after volumes were converted to erasure coding because getTopologyViaGRPC never folded EC file_count into topology.TotalFiles. Aggregate it with the same max/sum dedupe.
* fix(admin): address PR review comments
- bound RaftListClusterServers with a 3s timeout so the dashboard endpoint cannot hang on a stalled master
- pre-validate raft addresses with net.SplitHostPort before calling pb.GrpcAddressToServerAddress, which otherwise glog.Fatalf's on a malformed entry and would crash the admin process
- when raft is unreachable, mark the fallback master as not-leader rather than claiming leadership the code cannot verify
- warn when summed EC delete_count exceeds file_count while folding into topology.TotalFiles, matching collectCollectionStats
* fix(admin): distinguish empty raft response from RPC failure
When RaftListClusterServers returns successfully with no servers, raft is not initialized (standalone/non-raft cluster), so the single fallback master is the leader. Only treat the fallback as a non-leader when the RPC actually failed.
* fix(admin): remove misleading Objects column from S3 buckets page
The bucket "Objects" column displayed needle counts from volume collection stats, not actual S3 object counts. This is confusing because a single S3 object can span multiple needles (multipart uploads, versions) and the count is inaccurate for EC volumes. Remove the ObjectCount field from S3Bucket, the Objects table column, the sort-by-objects handler, the detail-view row, and both CSV export references.
* fix(admin): correct cell indexes in fallback bucket CSV export
After the Objects column was removed, the fallback CSV exporter in admin.js still used stale cell indexes: cells[1] mapped to Owner (not Created), cells[2] to Created (not Size), cells[3] to Logical Size (not Quota). Align all indexes with the current table column order and include Owner, Logical Size, and Physical Size.
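A small sketch of the max-across-nodes dedupe applied to EC file counts; field and type names are simplified:

```go
package main

import "fmt"

type ecShardReport struct {
	volumeID  uint32
	fileCount uint64 // whole-volume count read from the replicated .ecx
}

// dedupeEcFileCounts keeps the max file_count per EC volume id, so a shard
// holder that has not finished loading .ecx (and reports 0) cannot pin the
// aggregate at 0.
func dedupeEcFileCounts(reports []ecShardReport) map[uint32]uint64 {
	perVolume := make(map[uint32]uint64)
	for _, r := range reports {
		if r.fileCount > perVolume[r.volumeID] {
			perVolume[r.volumeID] = r.fileCount
		}
	}
	return perVolume
}

func main() {
	counts := dedupeEcFileCounts([]ecShardReport{
		{volumeID: 12, fileCount: 0},    // node still loading .ecx
		{volumeID: 12, fileCount: 4821}, // fully loaded node
	})
	fmt.Println(counts[12]) // 4821, not 0
}
```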

300e906330
admin: report file and delete counts for EC volumes (#9060)
* admin: report file and delete counts for EC volumes
The admin bucket size fix (#9058) left object counts at zero for EC-encoded data because VolumeEcShardInformationMessage carried no file count. Billing/monitoring dashboards therefore still under-report objects once a bucket is EC-encoded. Thread file_count and delete_count end-to-end:
- Add file_count/delete_count to VolumeEcShardInformationMessage (proto fields 8 and 9) and regenerate master_pb.
- Compute them lazily on volume servers by walking the .ecx index once per EcVolume, cache on the struct, and keep the cache in sync inside DeleteNeedleFromEcx (distinguishing live vs already-tombstoned entries so idempotent deletes do not drift the counts).
- Populate the new proto fields from EcVolume.ToVolumeEcShardInformationMessage and carry them through the master-side EcVolumeInfo / topology sync.
- Aggregate in admin collectCollectionStats, deduping per volume id: every node holding shards of an EC volume reports the same counts, so summing across nodes would otherwise multiply the object count by the number of shard holders.
Regression tests cover the initial .ecx walk, live/tombstoned delete bookkeeping (including idempotent and missing-key cases), and the admin dedup path for an EC volume reported by multiple nodes.
* ec: include .ecj journal in EcVolume delete count
The initial delete count only reflected .ecx tombstones, missing any needle that was journaled in .ecj but not yet folded into .ecx — e.g. on partial recovery. Expand initCountsLocked to take the union of .ecx tombstones and .ecj journal entries, deduped by needle id, so:
- an id that is both tombstoned in .ecx and listed in .ecj counts once
- a duplicate .ecj entry counts once
- an .ecj id with a live .ecx entry is counted as deleted (not live)
- an .ecj id with no matching .ecx entry is still counted
Covered by TestEcVolumeFileAndDeleteCountEcjUnion.
* ec: report delete count authoritatively and tombstone once per delete
Address two issues with the previous EcVolume file/delete count work:
1. The delete count was computed lazily on first heartbeat and mixed in a .ecj-union fallback to "recover" partial state. That diverged from how regular volumes report counts (always live from the needle map) and had drift cases when .ecj got reconciled. Replace with an eager walk of .ecx at NewEcVolume time, maintained incrementally on every DeleteNeedleFromEcx call. Semantics now match needle_map_metric: FileCount is the total number of needles ever recorded in .ecx (live + tombstoned), DeleteCount is the tombstones — so live = FileCount - DeleteCount. Drop the .ecj-union logic entirely.
2. A single EC needle delete fanned out to every node holding a replica of the primary data shard and called DeleteNeedleFromEcx on each, which inflated the per-volume delete total by the replica factor. Rewrite doDeleteNeedleFromRemoteEcShardServers to try replicas in order and stop at the first success (one tombstone per delete), and only fall back to other shards when the primary shard has no home (ErrEcShardMissing sentinel), not on transient RPC errors.
Admin aggregation now folds EC counts correctly: FileCount is deduped per volume id (every shard holder has an identical .ecx) and DeleteCount is summed across nodes (each delete tombstones exactly one node). Live object count = deduped FileCount - summed DeleteCount.
Tests updated to match the new semantics:
- EC volume counts seed FileCount as total .ecx entries (live + tombstoned), DeleteCount as tombstones.
- DeleteNeedleFromEcx keeps FileCount constant and increments DeleteCount only on live->tombstone transitions.
- Admin dedup test uses distinct per-node delete counts (5 + 3 + 2) to prove they're summed, while FileCount=100 is applied once.
* ec: test fixture uses real vid; admin warns on skewed ec counts
- writeFixture now builds the .ecx/.ecj/.ec00/.vif filenames from the actual vid passed in, instead of hardcoding "_1". The existing tests all use vid=1 so behaviour is unchanged, but the helper no longer silently diverges from its documented parameter.
- collectCollectionStats logs a glog warning when an EC volume's summed delete count exceeds its deduped file count, surfacing the anomaly (stale heartbeat, counter drift, etc.) instead of silently dropping the volume from the object count.
* ec: derive file/delete counts from .ecx/.ecj file sizes
seedCountsFromEcx walked the full .ecx index at volume load, which is wasted work: .ecx has fixed-size entries (NeedleMapEntrySize) and .ecj has fixed-size deletion records (NeedleIdSize), so both counts are pure file-size arithmetic.
fileCount = ecxFileSize / NeedleMapEntrySize
deleteCount = ecjFileSize / NeedleIdSize
Rip out the cached counters, countsLock, seedCountsFromEcx, and the recordDelete helper. Track ecjFileSize directly on the EcVolume struct, seed it from Stat() at load, and bump it on every successful .ecj append inside DeleteNeedleFromEcx under ecjFileAccessLock. Skip the .ecj write entirely when the needle is already tombstoned so the derived delete count stays idempotent on repeat deletes. Heartbeats now compute counts in O(1).
Tests updated: the initial fixture pre-populates .ecj with two ids to verify the file-size derivation end-to-end, and the delete test keeps its idempotent-re-delete / missing-needle invariants (unchanged externally, now enforced by the early return rather than a cache guard).
* ec: sync Rust volume server with Go file/delete count semantics
Mirror the Go-side EC file/delete count work in the Rust volume server so mixed Go/Rust clusters report consistent bucket object counts in the admin dashboard.
- Add file_count (8) and delete_count (9) to the Rust copy of VolumeEcShardInformationMessage (seaweed-volume/proto/master.proto).
- EcVolume gains ecj_file_size, seeded from the journal's metadata on open and bumped inside journal_delete on every successful append.
- file_and_delete_count() returns counts derived in O(1) from ecx_file_size / NEEDLE_MAP_ENTRY_SIZE and ecj_file_size / NEEDLE_ID_SIZE, matching Go's FileAndDeleteCount.
- to_volume_ec_shard_information_messages populates the new proto fields instead of defaulting them to zero.
- mark_needle_deleted_in_ecx now returns a DeleteOutcome enum (NotFound / AlreadyDeleted / Tombstoned) so journal_delete can skip both the .ecj append and the size bump when the needle is missing or already tombstoned, keeping the derived delete_count idempotent on repeat or no-op deletes.
- Rust's EcVolume::new no longer replays .ecj into .ecx on load. Go's RebuildEcxFile is only called from specific decode/rebuild gRPC handlers, not on volume open, and replaying on load was hiding the deletion journal from the new file-size-derived delete counter. rebuild_ecx_from_journal is kept as dead_code for future decode paths that may want the same replay semantics.
Also clean up the Go FileAndDeleteCount to drop unnecessary runtime guards against zero constants — NeedleMapEntrySize and NeedleIdSize are compile-time non-zero. test_ec_volume_journal updated to pre-populate the .ecx with the needles it deletes, and extended to verify that repeat and missing-id deletes do not drift the derived counts.
* ec: document enterprise-reserved proto field range on ec shard info
Both OSS master.proto copies now note that fields 10-19 are reserved for future upstream additions while 20+ are owned by the enterprise fork. Enterprise already pins data_shards/parity_shards at 20/21, so keeping OSS additions inside 8-19 avoids wire-level collisions for mixed deployments.
* ec(rust): resolve .ecx/.ecj helpers from ecx_actual_dir
ecx_file_name() and ecj_file_name() resolved from self.dir_idx, but new() opens the actual files from ecx_actual_dir (which may fall back to the data dir when the idx dir does not contain the index). After a fallback, read_deleted_needles() and rebuild_ecx_from_journal() would read/rebuild the wrong (nonexistent) path while heartbeats reported counts from the file actually in use — silently dropping deletes.
Point idx_base_name() at ecx_actual_dir, which is initialized to dir_idx and only diverges after a successful fallback, so every call site agrees with the file new() has open. The pre-fallback call in new() (line 142) still returns the dir_idx path because ecx_actual_dir == dir_idx at that point. Update the destroy() sweep to build the dir_idx cleanup paths explicitly instead of leaning on the helpers, so post-fallback stale files in the idx dir are still removed.
* ec: reset ecj size after rebuild; rollback ecx tombstone on ecj failure
Two EC delete-count correctness fixes applied symmetrically to Go and Rust volume servers.
1. rebuild_ecx_from_journal (Rust) now sets ecj_file_size = 0 after recreating the empty journal, matching the on-disk truth. Previously the cached size still reflected the pre-rebuild journal and file_and_delete_count() would keep reporting stale delete counts. The Go side has no equivalent bug because RebuildEcxFile runs in an offline helper that does not touch an EcVolume struct.
2. DeleteNeedleFromEcx / journal_delete used to tombstone the .ecx entry before writing the .ecj record. If the .ecj append then failed, the needle was permanently marked deleted but the heartbeat-reported delete_count never advanced (it is derived from .ecj file size), and a retry would see AlreadyDeleted and early-return, leaving the drift permanent. Both languages now capture the entry's file offset and original size bytes during the mark step, attempt the .ecj append, and on failure roll the .ecx tombstone back by writing the original size bytes at the known offset. A rollback that itself errors is logged (glog / tracing) but cannot re-sync the files — this is the same failure mode a double disk error would produce, and is unavoidable without a full on-disk transaction log.
Go: wrap MarkNeedleDeleted in a closure that captures the file offset into an outer variable, then pass the offset + oldSize to the new rollbackEcxTombstone helper on .ecj seek/write errors.
Rust: DeleteOutcome::Tombstoned now carries the size_offset and a [u8; SIZE_SIZE] copy of the pre-tombstone size field. journal_delete destructures on Tombstoned and calls restore_ecx_size on .ecj append failure.
* test(ec): widen admin /health wait to 180s for cold CI
TestEcEndToEnd starts master, 14 volume servers, filer, 2 workers and admin in sequence, then waited only 60s for admin's HTTP server to come up. On cold GitHub runners the tail of the earlier subprocess startups eats most of that budget and the wait occasionally times out (last hit on run 24374773031). The local fast path is still ~20s total, so the bump only extends the timeout ceiling, not the happy path.
* test(ec): fork volume servers in parallel in TestEcEndToEnd
startWeed is non-blocking (just cmd.Start()), so the per-process fork + mkdir + log-file-open overhead for 14 volume servers was serialized for no reason. On cold CI disks that overhead stacks up and eats into the subsequent admin /health wait, which is how run 24374773031 flaked. Wrap the volume-server loop in a sync.WaitGroup and guard runningCmds with a mutex so concurrent appends are safe. startWeed still calls t.Fatalf on failure, which is fine from a goroutine for a fatal test abort; the fail-fast isn't something we rely on for precise ordering.
* ec: fsync ecx before ecj, truncate on failure, harden rebuild
Four correctness fixes covering both volume servers.
1. Durability ordering (Go + Rust). After marking the .ecx tombstone we now fsync .ecx before touching .ecj, so a crash between the two files cannot leave the journal with an entry for a needle whose tombstone is still sitting in page cache. Once the fsync returns, the tombstone is the source of truth: reads see "deleted", delete_count may under-count by one (benign, idempotent retries) but never over-reports. If the fsync itself fails we restore the original size bytes and surface the error. The .ecj append is then followed by its own Sync so the reported delete_count matches the on-disk journal once the write returns.
2. .ecj truncation on append failure. write_all may have extended the journal on disk before sync_all / Sync errors out, leaving the cached ecj_file_size out of sync with the physical length and drifting delete_count permanently after restart. Both languages now capture the pre-append size, truncate the file back via set_len / Truncate on any write or sync failure, and only then restore the .ecx tombstone. Truncation errors are logged — same-fd length resets cannot realistically fail — but cannot themselves re-sync the files.
3. Atomic rebuild_ecx_from_journal (Rust, dead code today but wired up on any future decode path). Previously a failed mark_needle_deleted_in_ecx call was swallowed with `let _ = ...` and the journal was still removed, silently losing tombstones. We now bubble up any non-NotFound error, fsync .ecx after the whole replay succeeds, and only then drop and recreate .ecj. NotFound is still ignored (expected race between delete and encode).
4. Missing-.ecx hardening (Rust). mark_needle_deleted_in_ecx used to return Ok(NotFound) when self.ecx_file was None, hiding a closed or corrupt volume behind what looks like an idempotent no-op. It now returns an io::Error carrying the volume id so callers (e.g. journal_delete) fail loudly instead.
Existing Go and Rust EC test suites stay green.
* ec: make .ecx immutable at runtime; track deletes in memory + .ecj
Refactors both volume servers so the sealed sorted .ecx index is never mutated during normal operation. Runtime deletes are committed to the .ecj deletion journal and tracked in an in-memory deleted-needle set; read-path lookups consult that set to mask out deleted ids on top of the immutable .ecx record. Mirrors the intended design on both Go and Rust sides.
EcVolume gains a `deletedNeedles` / `deleted_needles` set seeded from .ecj in NewEcVolume / EcVolume::new.
DeleteNeedleFromEcx / journal_delete:
1. Looks the needle up read-only in .ecx.
2. Missing needle -> no-op.
3. Pre-existing .ecx tombstone (from a prior decode/rebuild) -> mirror into the in-memory set, no .ecj append.
4. Otherwise append the id to .ecj, fsync, and only then publish the id into the set. A partial write is truncated back to the pre-append length so the on-disk journal and the in-memory set cannot drift.
FindNeedleFromEcx / find_needle_from_ecx now return TombstoneFileSize when the id is in the in-memory set, even though the bytes on disk still show the original size.
FileAndDeleteCount:
fileCount = .ecx size / NeedleMapEntrySize (unchanged)
deleteCount = len(deletedNeedles) (was: .ecj size / NeedleIdSize)
The RebuildEcxFile / rebuild_ecx_from_journal decode-time helpers still fold .ecj into .ecx — that is the one place tombstones land in the physical index, and it runs offline on closed files. Rust's rebuild helper now also clears the in-memory set when it succeeds.
Dead code removed on the Rust side: `DeleteOutcome`, `mark_needle_deleted_in_ecx`, `restore_ecx_size`. Go drops the runtime `rollbackEcxTombstone` path. Neither helper was needed once .ecx stopped being a runtime mutation target.
TestEcVolumeSyncEnsuresDeletionsVisible (issue #7751) is rewritten as TestEcVolumeDeleteDurableToJournal, which exercises the full durability chain: delete -> .ecj fsync -> FindNeedleFromEcx masks via the in-memory set -> raw .ecx bytes are *unchanged* -> Close + RebuildEcxFile folds the journal into .ecx -> raw bytes now show the tombstone, as CopyFile in the decode path expects.
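A condensed sketch of the final design (immutable .ecx, in-memory deleted set, O(1) counts); the constant and types are illustrative stand-ins, and the .ecj append/fsync step is elided:

```go
package main

import (
	"fmt"
	"sync"
)

// needleMapEntrySize stands in for the real NeedleMapEntrySize constant.
const needleMapEntrySize = 16

// ecVolume mirrors the final design: the sealed sorted .ecx index is never
// mutated at runtime; deletes are journaled to .ecj and tracked in memory.
type ecVolume struct {
	ecxFileSize int64
	mu          sync.RWMutex
	deleted     map[uint64]struct{} // seeded from .ecj at load time
}

// fileAndDeleteCount is O(1): total needles from the index size, deletes from
// the in-memory set length.
func (v *ecVolume) fileAndDeleteCount() (fileCount, deleteCount uint64) {
	v.mu.RLock()
	defer v.mu.RUnlock()
	return uint64(v.ecxFileSize / needleMapEntrySize), uint64(len(v.deleted))
}

// deleteNeedle would append the id to .ecj and fsync before publishing it
// into the set; repeat deletes are idempotent.
func (v *ecVolume) deleteNeedle(id uint64) {
	v.mu.Lock()
	defer v.mu.Unlock()
	if _, done := v.deleted[id]; done {
		return
	}
	// real code: append to .ecj, fsync, truncate back on failure
	v.deleted[id] = struct{}{}
}

func main() {
	v := &ecVolume{ecxFileSize: 5 * needleMapEntrySize, deleted: map[uint64]struct{}{}}
	v.deleteNeedle(42)
	v.deleteNeedle(42) // idempotent
	f, d := v.fileAndDeleteCount()
	fmt.Println(f, d) // 5 1
}
```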

ef77df6141
admin: include EC volumes in bucket size reporting (#9058)
* admin: include EC volumes in bucket size reporting
The Object Store buckets page computed per-collection size by iterating only regular volumes, so once a bucket's data was EC-encoded it silently disappeared from the reported size — breaking usage-based billing. Walk EcShardInfos alongside VolumeInfos in collectCollectionStats: add raw shard bytes to PhysicalSize, and the parity-stripped value (shardBytes * DataShardsCount / TotalShardsCount) to LogicalSize, matching the normalization used by `weed shell` cluster.status.
* admin: derive EC logical size from shard bitmap, not constants
Use ShardsInfoFromVolumeEcShardInformationMessage + MinusParityShards to sum actual data-shard bytes instead of scaling raw bytes by the DataShardsCount/TotalShardsCount ratio. Keeps the data/parity split encapsulated in the erasure_coding package and is exact when shard sizes differ (e.g. last shard).
* admin: regression test for EC shard size aggregation
Cover the uneven-tail-shard case (data shard 9 < 1000 bytes) and the empty-collection-name path to pin PhysicalSize/LogicalSize behavior for collectCollectionStats against future changes.
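A sketch of the two logical-size calculations, assuming the OSS default 10+4 shard layout; the bitmap-based variant in the real code goes through ShardsInfoFromVolumeEcShardInformationMessage + MinusParityShards:

```go
package main

import "fmt"

const (
	dataShards  = 10
	totalShards = 14
)

// logicalFromRatio is the first-pass approximation: scale raw shard bytes by
// the data/total ratio.
func logicalFromRatio(rawShardBytes uint64) uint64 {
	return rawShardBytes * dataShards / totalShards
}

// logicalFromShards sums only the bytes of the data shards actually held,
// exact even when the last data shard is shorter.
func logicalFromShards(shardSizes map[int]uint64) uint64 {
	var sum uint64
	for shardID, size := range shardSizes {
		if shardID < dataShards { // shards 0..dataShards-1 are data
			sum += size
		}
	}
	return sum
}

func main() {
	shards := map[int]uint64{}
	for i := 0; i < totalShards; i++ {
		shards[i] = 1000
	}
	shards[9] = 400 // uneven tail data shard
	fmt.Println(logicalFromRatio(14*1000), logicalFromShards(shards)) // 10000 9400
}
```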

512912cbb8
Update plugin_templ.go

ae08e77979
fix(scheduler): give worker tasks a real per-attempt execution deadline (#9041)
* fix(scheduler): give worker tasks a real per-attempt execution deadline
The plugin scheduler derived the per-attempt execution deadline as DetectionTimeoutSeconds * 2, which capped every worker task at twice the cluster-scan budget regardless of actual work. For volume_balance batches this was 240s — far too short for 20 large volume copies, so every attempt died at "context deadline exceeded" and all in-flight sub-RPCs surfaced as "context canceled". Retries restarted from move 1 and hit the same wall.
Add an explicit ExecutionTimeoutSeconds field to the plugin proto and make each handler declare its own baseline (1800s for vacuum, balance, EC; 3600s for iceberg). Size-aware handlers also emit an estimated_runtime_seconds parameter on each proposal so the scheduler extends the per-attempt deadline based on actual workload:
- volume_balance batch: max(largest single move, total / concurrency) at 5 min/GB, so a skewed batch with one big volume isn't averaged away.
- volume_balance single, vacuum (already), erasure_coding (10 min/GB), ec_balance (5 min/GB): per-volume budgets.
admin_script and iceberg keep the configurable handler default since their workloads are opaque to the detector.
* fix(scheduler): apply descriptor defaults to existing persisted configs
The previous commit added execution_timeout_seconds to the proto and each handler's descriptor defaults, but two paths still left existing deployments broken:
1. deriveSchedulerAdminRuntime returned stored AdminRuntime configs as-is. Persisted configs from older versions have no execution_timeout_seconds, so the scheduler fell back to the 90s default — worse than the prior 240s behavior. Overlay descriptor defaults for any zero numeric fields when loading.
2. The admin form did not round-trip execution_timeout_seconds, so a normal save would clear it back to zero. Add the input field, the fillAdminSettings/collectAdminSettings hooks, and as defense in depth reapply descriptor defaults in UpdatePluginJobTypeConfigAPI before persisting so a stale form can never silently clobber a baseline.
* fix(volume_balance): account for partial scheduling rounds in batch estimate
With N moves and C slots, the busiest slot processes ceil(N/C) moves, not N/C. Dividing total seconds by C underestimates wall-clock time whenever N is not a multiple of C — e.g. 6 moves at concurrency 5 needs 2 rounds, not 1.2. Use avg * ceil(N/C) so partial rounds are counted as full ones.
* fix(volume_balance): scale minBudget per wave instead of per move
Orchestration overhead (setup/teardown for the parallel move runner) happens once per wave, not once per move. Use numRounds*60 as the floor instead of len(moves)*60 so the minimum doesn't inflate linearly with batch size when individual moves are tiny.
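A sketch combining the batch estimate with its two follow-up fixes (ceil(N/C) rounds, per-wave floor); the constant and helper name are illustrative, not the handler's actual API:

```go
package main

import "fmt"

const secondsPerGB = 300 // 5 min/GB, the balance handler's assumed copy rate

// estimateBatchRuntime sizes the per-attempt deadline for a batch of volume
// moves: the busiest slot runs ceil(N/C) moves, one oversized volume must not
// be averaged away, and the floor scales per wave rather than per move.
func estimateBatchRuntime(moveSizesGB []float64, concurrency int) float64 {
	if len(moveSizesGB) == 0 || concurrency < 1 {
		return 0
	}
	var total, largest float64
	for _, gb := range moveSizesGB {
		total += gb
		if gb > largest {
			largest = gb
		}
	}
	rounds := (len(moveSizesGB) + concurrency - 1) / concurrency // ceil(N/C)
	avg := total / float64(len(moveSizesGB))
	estimate := avg * float64(rounds) * secondsPerGB
	if single := largest * secondsPerGB; single > estimate {
		estimate = single // a skewed batch is bounded by its biggest move
	}
	if floor := float64(rounds) * 60; estimate < floor {
		estimate = floor // per-wave orchestration overhead
	}
	return estimate
}

func main() {
	// 6 moves at concurrency 5 is 2 rounds; the 30 GB move dominates: 9000s.
	fmt.Println(estimateBatchRuntime([]float64{1, 1, 1, 1, 1, 30}, 5))
}
```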

28d1ef24ec
fix(admin): allow control chars in file paths when browsing filer (#9043)
* fix(admin): allow control chars in file paths when browsing filer
The admin UI rejected any path containing \x00, \r, or \n as "path contains invalid characters". These bytes are legal in S3 object keys, so objects created through the S3 API (or replicated via filer.sync) could exist on the filer but be unreachable from the admin UI — browse, download, and upload all failed with "Invalid file path".
Drop the control-character rejection and instead URL-escape the path when constructing filer request URLs, so that such bytes cannot inject into the HTTP request target. Path traversal protection via path.Clean is unchanged.
* test(admin): strengthen file path tests with byte-preserving checks
Assert full expected output for validateAndCleanFilePath so silent stripping of control characters would fail the test, and cover \r and \x00 escaping in filerFileURL in addition to \n and space.

41ff105f47
object_store_users: fix specific bucket admin permission (#9014)
Fix an issue where selecting Specific Buckets with Admin permission while creating/editing an object store user would grant Admin permission on all buckets.
||
|
|
e648c76bcf | go fmt | ||
|
|
b0e79ad207 |
fix(admin): respect urlPrefix for root redirect and JS API calls (#8975)
* fix(admin): respect urlPrefix for root redirect and JS API calls (#8967)
Two issues when running admin UI behind a reverse proxy with -urlPrefix:
1. Visiting the prefix path without trailing slash (e.g. /s3-admin) caused a
redirect to / instead of /s3-admin/ because http.StripPrefix produced an
empty path that the router redirected to root.
2. Several JavaScript API calls in admin.js used hardcoded paths instead of
basePath(), causing file upload, download, and preview to fail.
* fix(admin): preserve query params in prefix redirect and use 302
Use http.StatusFound instead of 301 to avoid aggressive browser caching of a
configuration-dependent redirect, and preserve query parameters. |
||
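A small sketch of the redirect behavior described above; the handler and prefix names are illustrative, not the admin server's actual wiring. It issues a 302 to the prefixed path with a trailing slash and carries any query string along.

```go
// Minimal sketch of a prefix-aware redirect in front of http.StripPrefix.
package main

import "net/http"

// prefixRedirect sends /s3-admin to /s3-admin/ with a 302 so browsers do not
// cache a configuration-dependent redirect, and keeps the query string.
func prefixRedirect(urlPrefix string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		target := urlPrefix + "/"
		if r.URL.RawQuery != "" {
			target += "?" + r.URL.RawQuery
		}
		http.Redirect(w, r, target, http.StatusFound)
	})
}

func main() {
	adminMux := http.NewServeMux()
	adminMux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {})

	mux := http.NewServeMux()
	mux.Handle("/s3-admin", prefixRedirect("/s3-admin"))
	mux.Handle("/s3-admin/", http.StripPrefix("/s3-admin", adminMux))
	_ = http.ListenAndServe(":8080", mux)
}
```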
|
|
076d504044 |
fix(admin): reduce memory usage and verbose logging for large clusters (#8927)
* fix(admin): reduce memory usage and verbose logging for large clusters (#8919)
The admin server used excessive memory and produced thousands of log lines
on clusters with many volumes (e.g., 33k volumes). Three root causes:
1. Scanner duplicated all volume metrics: getVolumeHealthMetrics() created
VolumeHealthMetrics objects, then convertToTaskMetrics() copied them all
into identical types.VolumeHealthMetrics. Now uses the task-system type
directly, eliminating the duplicate allocation and removing
convertToTaskMetrics.
2. All previous task states loaded at startup: LoadTasksFromPersistence read
and deserialized every .pb file from disk, logging each one. With thousands
of balance tasks persisted, this caused massive startup I/O, memory usage,
and log noise (including unguarded DEBUG glog.Infof per task). Now starts
with an empty queue — the scanner re-detects current needs from live cluster
state. Terminal tasks are purged from memory and disk when new scan results
arrive.
3. Verbose per-volume/per-node logging: V(2) and V(3) logs produced
thousands of lines per scan. Per-volume logs bumped to V(4),
per-node/rack/disk logs bumped to V(3). Topology summary now logs counts
instead of full node ID arrays.
Also removes lastTopologyInfo field from MaintenanceScanner — the raw
protobuf topology is returned as a local value and not retained between
30-minute scans.
* fix(admin): delete stale task files at startup, add DeleteAllTaskStates
Old task .pb files from previous runs were left on disk. The periodic
CleanupCompletedTasks still loads all files to find completed ones — the
same expensive 4GB path from the pprof profile. Now at startup,
DeleteAllTaskStates removes all .pb files by scanning the directory without
reading or deserializing them. The scanner will re-detect any tasks still
needed from live cluster state.
* fix(admin): don't persist terminal tasks to disk
CompleteTask was saving failed/completed tasks to disk where they'd
accumulate. The periodic cleanup only triggered for completed tasks, not
failed ones. Now terminal tasks are deleted from disk immediately and only
kept in memory for the current session's UI.
* fix(admin): cap in-memory tasks to 100 per job type
Without a limit, the task map grows unbounded — balance could create
thousands of pending tasks for a cluster with many imbalanced volumes. Now
AddTask rejects new tasks when a job type already has 100 in the queue. The
scanner will re-detect skipped volumes on the next scan.
* fix(admin): address PR review - memory-only purge, active-only capacity
- purgeTerminalTasks now only cleans in-memory map (terminal tasks are
already deleted from disk by CompleteTask)
- Per-type capacity limit counts only active tasks (pending/assigned/
in_progress), not terminal ones
- When at capacity, purge terminal tasks first before rejecting
* fix(admin): fix orphaned comment, add TaskStatusCancelled to terminal switch
- Move hasQueuedOrActiveTaskForVolume comment to its function definition
- Add TaskStatusCancelled to the terminal state switch in CompleteTask so
cancelled task files are deleted from disk |
||
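A hedged sketch of the startup cleanup idea above, assuming the real DeleteAllTaskStates works roughly like this: list the task directory and remove .pb files by name, never reading or unmarshaling them, so the cost scales with the directory listing rather than with accumulated task history.

```go
// Illustrative sketch, not the actual admin persistence code.
package main

import (
	"os"
	"path/filepath"
	"strings"
)

// deleteAllTaskStates removes persisted task .pb files without opening them.
func deleteAllTaskStates(taskDir string) error {
	entries, err := os.ReadDir(taskDir)
	if err != nil {
		if os.IsNotExist(err) {
			return nil // nothing persisted yet
		}
		return err
	}
	for _, e := range entries {
		if e.IsDir() || !strings.HasSuffix(e.Name(), ".pb") {
			continue
		}
		if err := os.Remove(filepath.Join(taskDir, e.Name())); err != nil {
			return err
		}
	}
	return nil
}

func main() { _ = deleteAllTaskStates("/var/lib/admin/tasks") }
```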
|
|
d37b592bc4 | Update object_store_users_templ.go | ||
|
|
d1823d3784 |
fix(s3): include static identities in listing operations (#8903)
* fix(s3): include static identities in listing operations
Static identities loaded from -s3.config file were only stored in the S3 API
server's in-memory state. Listing operations (s3.configure shell command,
aws iam list-users) queried the credential manager which only returned
dynamic identities from the backend store.
Register static identities with the credential manager after loading so they
are included in LoadConfiguration and ListUsers results, and filtered out
before SaveConfiguration to avoid persisting them to the dynamic store.
Fixes https://github.com/seaweedfs/seaweedfs/discussions/8896
* fix: avoid mutating caller's config and defensive copies
- SaveConfiguration: use shallow struct copy instead of mutating the
caller's config.Identities field
- SetStaticIdentities: skip nil entries to avoid panics
- GetStaticIdentities: defensively copy PolicyNames slice to avoid aliasing
the original
* fix: filter nil static identities and sync on config reload
- SetStaticIdentities: filter nil entries from the stored slice (not just
from staticNames) to prevent panics in LoadConfiguration/ListUsers
- Extract updateCredentialManagerStaticIdentities helper and call it from
both startup and the grace.OnReload handler so the credential manager's
static snapshot stays current after config file reloads
* fix: add mutex for static identity fields and fix ListUsers for store callers
- Add sync.RWMutex to protect staticIdentities/staticNames against
concurrent reads during config reload
- Revert CredentialManager.ListUsers to return only store users, since
internal callers (e.g. DeletePolicy) look up each user in the store and
fail on non-existent static entries
- Merge static usernames in the filer gRPC ListUsers handler instead, via
the new GetStaticUsernames method
- Fix CI: TestIAMPolicyManagement/managed_policy_crud_lifecycle was failing
because DeletePolicy iterated static users that don't exist in the store
* fix: show static identities in admin UI and weed shell
The admin UI and weed shell s3.configure command query the filer's
credential manager via gRPC, which is a separate instance from the S3
server's credential manager. Static identities were only registered on the
S3 server's credential manager, so they never appeared in the filer's
responses.
- Add CredentialManager.LoadS3ConfigFile to parse a static S3 config file
and register its identities
- Add FilerOptions.s3ConfigFile so the filer can load the same static
config that the S3 server uses
- Wire s3ConfigFile through in weed mini and weed server modes
- Merge static usernames in filer gRPC ListUsers handler
- Add CredentialManager.GetStaticUsernames helper
- Add sync.RWMutex to protect concurrent access to static identity fields
- Avoid importing weed/filer from weed/credential (which pulled in filer
store init() registrations and broke test isolation)
- Add docker/compose/s3_static_users_example.json
* fix(admin): make static users read-only in admin UI
Static users loaded from the -s3.config file should not be editable or
deletable through the admin UI since they are managed via the config file.
- Add IsStatic field to ObjectStoreUser, set from credential manager
- Hide edit, delete, and access key buttons for static users in the users
table template
- Show a "static" badge next to static user names
- Return 403 Forbidden from UpdateUser and DeleteUser API handlers when the
target user is a static identity
* fix(admin): show details for static users
GetObjectStoreUserDetails called credentialManager.GetUser which only
queries the dynamic store. For static users this returned ErrUserNotFound.
Fall back to GetStaticIdentity when the store lookup fails.
* fix(admin): load static S3 identities in admin server
The admin server has its own credential manager (gRPC store) which is a
separate instance from the S3 server's and filer's. It had no static
identity data, so IsStaticIdentity returned false (edit/delete buttons
shown) and GetStaticIdentity returned nil (details page failed).
Pass the -s3.config file path through to the admin server and call
LoadS3ConfigFile on its credential manager, matching the approach used for
the filer.
* fix: use protobuf is_static field instead of passing config file path
The previous approach passed -s3.config file path to every component (filer,
admin). This is wrong because the admin server should not need to know about
S3 config files.
Instead, add an is_static field to the Identity protobuf message. The field
is set when static identities are serialized (in GetStaticIdentities and
LoadS3ConfigFile). Any gRPC client that loads configuration via
GetConfiguration automatically sees which identities are static, without
needing the config file.
- Add is_static field (tag 8) to iam_pb.Identity proto message
- Set IsStatic=true in GetStaticIdentities and LoadS3ConfigFile
- Admin GetObjectStoreUsers reads identity.IsStatic from proto
- Admin IsStaticUser helper loads config via gRPC to check the flag
- Filer GetUser gRPC handler falls back to GetStaticIdentity
- Remove s3ConfigFile from AdminOptions and NewAdminServer signature |
||
|
|
995dfc4d5d |
chore: remove ~50k lines of unreachable dead code (#8913)
* chore: remove unreachable dead code across the codebase
Remove ~50,000 lines of unreachable code identified by static analysis.
Major removals:
- weed/filer/redis_lua: entire unused Redis Lua filer store implementation
- weed/wdclient/net2, resource_pool: unused connection/resource pool packages
- weed/plugin/worker/lifecycle: unused lifecycle plugin worker
- weed/s3api: unused S3 policy templates, presigned URL IAM, streaming copy,
multipart IAM, key rotation, and various SSE helper functions
- weed/mq/kafka: unused partition mapping, compression, schema, and protocol
functions
- weed/mq/offset: unused SQL storage and migration code
- weed/worker: unused registry, task, and monitoring functions
- weed/query: unused SQL engine, parquet scanner, and type functions
- weed/shell: unused EC proportional rebalance functions
- weed/storage/erasure_coding/distribution: unused distribution analysis
functions
- Individual unreachable functions removed from 150+ files across admin,
credential, filer, iam, kms, mount, mq, operation, pb, s3api, server, shell,
storage, topology, and util packages
* fix(s3): reset shared memory store in IAM test to prevent flaky failure
TestLoadIAMManagerFromConfig_EmptyConfigWithFallbackKey was flaky because
the MemoryStore credential backend is a singleton registered via init().
Earlier tests that create anonymous identities pollute the shared store,
causing LookupAnonymous() to unexpectedly return true. Fix by calling
Reset() on the memory store before the test runs.
* style: run gofmt on changed files
* fix: restore KMS functions used by integration tests
* fix(plugin): prevent panic on send to closed worker session channel
The Plugin.sendToWorker method could panic with "send on closed channel"
when a worker disconnected while a message was being sent. The race was
between streamSession.close() closing the outgoing channel and sendToWorker
writing to it concurrently.
Add a done channel to streamSession that is closed before the outgoing
channel, and check it in sendToWorker's select to safely detect closed
sessions without panicking. |
||
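The send-on-closed-channel fix above follows a common done-channel pattern. The sketch below is a simplified illustration with hypothetical types, not the plugin code: it closes only the done channel and leaves outgoing unclosed so the example itself stays race-free, whereas the commit's fix closes done before the outgoing channel and checks it in the send select.

```go
// Simplified done-channel sketch for refusing sends after shutdown.
package main

import (
	"fmt"
	"sync"
)

type streamSession struct {
	outgoing  chan []byte
	done      chan struct{}
	closeOnce sync.Once
}

func newStreamSession() *streamSession {
	return &streamSession{outgoing: make(chan []byte), done: make(chan struct{})}
}

// close signals shutdown; receivers drain via done instead of a closed channel.
func (s *streamSession) close() {
	s.closeOnce.Do(func() { close(s.done) })
}

// sendToWorker reports false once the session has shut down, instead of
// panicking with "send on closed channel".
func (s *streamSession) sendToWorker(msg []byte) bool {
	select {
	case <-s.done:
		return false
	case s.outgoing <- msg:
		return true
	}
}

func main() {
	s := newStreamSession()
	s.close()
	fmt.Println(s.sendToWorker([]byte("hello"))) // false: session already closed
}
```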
|
|
2e98902f29 |
fix(s3): use URL-safe secret keys for dashboard users and service accounts (#8902)
* fix(s3): use URL-safe secret keys for admin dashboard users and service accounts The dashboard's generateSecretKey() used base64.StdEncoding which produces +, /, and = characters that break S3 signature authentication. Reuse the IAM package's GenerateSecretAccessKey() which was already fixed in #7990. Fixes #8898 * fix: handle error from GenerateSecretAccessKey instead of ignoring it |
||
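For reference, a minimal URL-safe generator in the spirit of the fix above; this is an assumption-laden sketch, not the IAM package's GenerateSecretAccessKey, but it shows why RawURLEncoding avoids the '+', '/' and '=' characters that the commit says break S3 signature authentication.

```go
// Illustrative sketch of a URL-safe secret key generator.
package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
)

// generateSecretKey returns a 40-character secret containing only URL-safe
// characters (A-Z, a-z, 0-9, '-', '_'); no padding bytes are emitted.
func generateSecretKey() (string, error) {
	buf := make([]byte, 30) // 30 random bytes encode to 40 base64 characters
	if _, err := rand.Read(buf); err != nil {
		return "", err // do not ignore entropy failures
	}
	return base64.RawURLEncoding.EncodeToString(buf), nil
}

func main() {
	key, err := generateSecretKey()
	if err != nil {
		panic(err)
	}
	fmt.Println(key, len(key))
}
```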
|
|
888c32cbde |
fix(admin): respect urlPrefix in S3 bucket and S3Tables navigation links (#8885)
* fix(admin): respect urlPrefix in S3 bucket and S3Tables navigation links (#8884) Several admin UI templates used hardcoded URLs (templ.SafeURL) instead of dash.PUrl(ctx, ...) for navigation links, causing 404 errors when the admin is deployed with --urlPrefix. Fixed in: s3_buckets.templ, s3tables_buckets.templ, s3tables_tables.templ * fix(admin): URL-escape bucketName in S3Tables navigation links Add url.PathEscape(bucketName) for consistency and correctness in s3tables_tables.templ (back-to-namespaces link) and s3tables_buckets.templ (namespace link), matching the escaping already used in the table details link. |
||
|
|
44d5cb8f90 |
Fix Admin UI master list showing gRPC port instead of HTTP port (#8869)
* Fix Admin UI master list showing gRPC port instead of HTTP port for followers (#8867) Raft stores server addresses as gRPC addresses. The Admin UI was using these addresses directly via ToHttpAddress(), which cannot extract the HTTP port from a plain gRPC address. Use GrpcAddressToServerAddress() to properly convert gRPC addresses back to HTTP addresses. * Use httpAddress consistently as masterMap key Address review feedback: masterInfo.Address (HTTP form) was already computed but the raw address was used as the map key, causing potential key mismatches between topology and raft data. |
||
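A hedged sketch of the address conversion above: SeaweedFS conventionally derives a server's gRPC port by adding 10000 to its HTTP port, so converting a raft-stored gRPC address back to an HTTP address subtracts it. The helper below is illustrative; the real pb.GrpcAddressToServerAddress may handle more cases.

```go
// Illustrative gRPC-to-HTTP address conversion under the +10000 convention.
package main

import (
	"fmt"
	"net"
	"strconv"
)

func grpcAddressToHTTPAddress(grpcAddr string) (string, error) {
	host, portStr, err := net.SplitHostPort(grpcAddr)
	if err != nil {
		return "", err
	}
	port, err := strconv.Atoi(portStr)
	if err != nil {
		return "", err
	}
	// Assumed convention: gRPC port = HTTP port + 10000.
	return net.JoinHostPort(host, strconv.Itoa(port-10000)), nil
}

func main() {
	addr, _ := grpcAddressToHTTPAddress("master-1:19333")
	fmt.Println(addr) // master-1:9333
}
```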
|
|
c1acf9e479 |
Prune unused functions from weed/admin/dash. (#8871)
* chore(weed/admin/dash): prune unused functions * chore(weed/admin/dash): prune test-only function |
||
|
|
2eaf98a7a2 |
Use Unix sockets for gRPC in mini mode (#8856)
* Use Unix sockets for gRPC between co-located services in mini mode In `weed mini`, all services run in one process. Previously, inter-service gRPC traffic (volume↔master, filer↔master, S3↔filer, worker↔admin, etc.) went through TCP loopback. This adds a gRPC Unix socket registry in the pb package: mini mode registers a socket path per gRPC port at startup, each gRPC server additionally listens on its socket, and GrpcDial transparently routes to the socket via WithContextDialer when a match is found. Standalone commands (weed master, weed filer, etc.) are unaffected since no sockets are registered. TCP listeners are kept for external clients. * Handle Serve error and clean up socket file in ServeGrpcOnLocalSocket Log non-expected errors from grpcServer.Serve (ignoring grpc.ErrServerStopped) and always remove the Unix socket file when Serve returns, ensuring cleanup on Stop/GracefulStop. |
||
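A minimal sketch of the socket-registry idea described above, using hypothetical registry and dial helpers: when a Unix socket path has been registered for the target gRPC address, the dial option routes the connection through grpc.WithContextDialer; otherwise the dial falls back to plain TCP, which is what standalone commands keep using.

```go
// Illustrative sketch of Unix-socket routing for co-located gRPC services.
package main

import (
	"context"
	"net"
	"sync"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

var (
	socketMu  sync.RWMutex
	socketFor = map[string]string{} // "host:grpcPort" -> unix socket path
)

// registerLocalSocket is called once per co-located gRPC port in mini mode.
func registerLocalSocket(grpcAddr, socketPath string) {
	socketMu.Lock()
	defer socketMu.Unlock()
	socketFor[grpcAddr] = socketPath
}

// dialMaybeLocal prefers a registered Unix socket and falls back to TCP.
func dialMaybeLocal(grpcAddr string) (*grpc.ClientConn, error) {
	socketMu.RLock()
	socketPath, ok := socketFor[grpcAddr]
	socketMu.RUnlock()

	opts := []grpc.DialOption{grpc.WithTransportCredentials(insecure.NewCredentials())}
	if ok {
		opts = append(opts, grpc.WithContextDialer(func(ctx context.Context, _ string) (net.Conn, error) {
			var d net.Dialer
			return d.DialContext(ctx, "unix", socketPath)
		}))
	}
	return grpc.NewClient(grpcAddr, opts...)
}

func main() {
	registerLocalSocket("localhost:19333", "/tmp/seaweed-master.sock")
	if conn, err := dialMaybeLocal("localhost:19333"); err == nil {
		defer conn.Close()
	}
}
```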
|
|
a95b8396e4 |
plugin scheduler: run iceberg and lifecycle lanes concurrently (#8821)
* plugin scheduler: run iceberg and lifecycle lanes concurrently
The default lane serialises job types under a single admin lock because
volume-management operations share global state. Iceberg and lifecycle lanes
have no such constraint, so run each of their job types independently in
separate goroutines.
* Fix concurrent lane scheduler status
* plugin scheduler: address review feedback
- Extract collectDueJobTypes helper to deduplicate policy loading between
locked and concurrent iteration paths.
- Use atomic.Bool instead of sync.Mutex for hadJobs in the concurrent path.
- Set lane loop state to "busy" before launching concurrent goroutines so
the lane is not reported as idle while work runs.
- Convert TestLaneRequiresLock to table-driven style.
- Add TestRunLaneSchedulerIterationLockBehavior to verify the scheduler
acquires the admin lock only for lanes that require it.
- Fix flaky TestGetLaneSchedulerStatusShowsActiveConcurrentLaneWork by not
starting background scheduler goroutines that race with the direct
runJobTypeIteration call. |
||
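A short illustration of the concurrent iteration shape implied by the atomic.Bool review item above; all names are hypothetical. Each job type runs in its own goroutine and a lock-free flag records whether any of them found work.

```go
// Illustrative sketch of a lock-free hadJobs flag across lane goroutines.
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runConcurrentLaneIteration runs each job type in parallel and reports
// whether any goroutine dispatched work, without a mutex around the flag.
func runConcurrentLaneIteration(jobTypes []string, runJobType func(string) bool) bool {
	var hadJobs atomic.Bool
	var wg sync.WaitGroup
	for _, jt := range jobTypes {
		wg.Add(1)
		go func(jobType string) {
			defer wg.Done()
			if runJobType(jobType) {
				hadJobs.Store(true)
			}
		}(jt)
	}
	wg.Wait()
	return hadJobs.Load()
}

func main() {
	got := runConcurrentLaneIteration([]string{"iceberg_maintenance", "s3_lifecycle"},
		func(jt string) bool { fmt.Println("ran", jt); return jt == "s3_lifecycle" })
	fmt.Println("hadJobs:", got)
}
```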
|
|
8c8d21d7e2 | Update plugin_lane_templ.go | ||
|
|
2604ec7deb |
Remove min_interval_seconds from plugin workers; vacuum default to 17m (#8790)
remove min_interval_seconds from plugin workers and default vacuum interval to 17m The worker-level min_interval_seconds was redundant with the admin-side DetectionIntervalSeconds, complicating scheduling logic. Remove it from vacuum, volume_balance, erasure_coding, and ec_balance handlers. Also change the vacuum default DetectionIntervalSeconds from 2 hours to 17 minutes to match the previous default behavior. |
||
|
|
cc2f790c73 |
feat: add per-lane scheduler status API and lane worker UI pages
- GET /api/plugin/lanes returns all lanes with status and job types
- GET /api/plugin/workers?lane=X filters workers by lane
- GET /api/plugin/scheduler-states?lane=X filters job types by lane
- GET /api/plugin/scheduler-status?lane=X returns lane-scoped status
- GET /plugin/lanes/{lane}/workers renders per-lane worker page
- SchedulerJobTypeState now includes a "lane" field
The lane worker pages show scheduler status, job type configuration,
and connected workers scoped to a single lane, with links back to
the main plugin overview.
|
||
|
|
e3e015e108 |
feat: introduce scheduler lanes for independent per-workload scheduling
Split the single plugin scheduler loop into independent per-lane goroutines
so that volume management, iceberg compaction, and lifecycle operations
never block each other.
Each lane has its own:
- Goroutine (laneSchedulerLoop)
- Wake channel for immediate scheduling
- Admin lock scope (e.g. "plugin scheduler:default")
- Configurable idle sleep duration
- Loop state tracking
Three lanes are defined:
- default: vacuum, volume_balance, ec_balance, erasure_coding, admin_script
- iceberg: iceberg_maintenance
- lifecycle: s3_lifecycle (new, handler coming in a later commit)
Job types are mapped to lanes via a hardcoded map with LaneDefault as the
fallback. The SchedulerJobTypeState and SchedulerStatus types now include a
Lane field for API consumers. |
||
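A sketch of the job-type-to-lane mapping with the LaneDefault fallback; the lane constants and entries follow the names listed in the commit message, but the map and helper shown here are illustrative rather than the actual scheduler code.

```go
// Illustrative job-type → lane mapping with a default fallback.
package main

import "fmt"

const (
	LaneDefault   = "default"
	LaneIceberg   = "iceberg"
	LaneLifecycle = "lifecycle"
)

var jobTypeLaneMap = map[string]string{
	"vacuum":              LaneDefault,
	"volume_balance":      LaneDefault,
	"ec_balance":          LaneDefault,
	"erasure_coding":      LaneDefault,
	"admin_script":        LaneDefault,
	"iceberg_maintenance": LaneIceberg,
	"s3_lifecycle":        LaneLifecycle,
}

// laneForJobType falls back to LaneDefault so an unmapped job type is never
// left without a scheduler lane.
func laneForJobType(jobType string) string {
	if lane, ok := jobTypeLaneMap[jobType]; ok {
		return lane
	}
	return LaneDefault
}

func main() {
	fmt.Println(laneForJobType("s3_lifecycle"), laneForJobType("unknown_job"))
}
```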
|
|
d95df76bca |
feat: separate scheduler lanes for iceberg, lifecycle, and volume management (#8787)
* feat: introduce scheduler lanes for independent per-workload scheduling
Split the single plugin scheduler loop into independent per-lane
goroutines so that volume management, iceberg compaction, and lifecycle
operations never block each other.
Each lane has its own:
- Goroutine (laneSchedulerLoop)
- Wake channel for immediate scheduling
- Admin lock scope (e.g. "plugin scheduler:default")
- Configurable idle sleep duration
- Loop state tracking
Three lanes are defined:
- default: vacuum, volume_balance, ec_balance, erasure_coding, admin_script
- iceberg: iceberg_maintenance
- lifecycle: s3_lifecycle (new, handler coming in a later commit)
Job types are mapped to lanes via a hardcoded map with LaneDefault as
the fallback. The SchedulerJobTypeState and SchedulerStatus types now
include a Lane field for API consumers.
* feat: per-lane execution reservation pools for resource isolation
Each scheduler lane now maintains its own execution reservation map
so that a busy volume lane cannot consume execution slots needed by
iceberg or lifecycle lanes. The per-lane pool is used by default when
dispatching jobs through the lane scheduler; the global pool remains
as a fallback for the public DispatchProposals API.
* feat: add per-lane scheduler status API and lane worker UI pages
- GET /api/plugin/lanes returns all lanes with status and job types
- GET /api/plugin/workers?lane=X filters workers by lane
- GET /api/plugin/scheduler-states?lane=X filters job types by lane
- GET /api/plugin/scheduler-status?lane=X returns lane-scoped status
- GET /plugin/lanes/{lane}/workers renders per-lane worker page
- SchedulerJobTypeState now includes a "lane" field
The lane worker pages show scheduler status, job type configuration,
and connected workers scoped to a single lane, with links back to
the main plugin overview.
* feat: add s3_lifecycle worker handler for object store lifecycle management
Implements a full plugin worker handler for S3 lifecycle management,
assigned to the new "lifecycle" scheduler lane.
Detection phase:
- Reads filer.conf to find buckets with TTL lifecycle rules
- Creates one job proposal per bucket with active lifecycle rules
- Supports bucket_filter wildcard pattern from admin config
Execution phase:
- Walks the bucket directory tree breadth-first
- Identifies expired objects by checking TtlSec + Crtime < now
- Deletes expired objects in configurable batches
- Reports progress with scanned/expired/error counts
- Supports dry_run mode for safe testing
Configurable via admin UI:
- batch_size: entries per filer listing page (default 1000)
- max_deletes_per_bucket: safety cap per run (default 10000)
- dry_run: detect without deleting
- delete_marker_cleanup: clean expired delete markers
- abort_mpu_days: abort stale multipart uploads
The handler integrates with the existing PutBucketLifecycle flow which
sets TtlSec on entries via filer.conf path rules.
* feat: add per-lane submenu items under Workers sidebar menu
Replace the single "Workers" sidebar link with a collapsible submenu
containing three lane entries:
- Default (volume management + admin scripts) -> /plugin
- Iceberg (table compaction) -> /plugin/lanes/iceberg/workers
- Lifecycle (S3 object expiration) -> /plugin/lanes/lifecycle/workers
The submenu auto-expands when on any /plugin page and highlights the
active lane. Icons match each lane's job type descriptor (server,
snowflake, hourglass).
* feat: scope plugin pages to their scheduler lane
The plugin overview, configuration, detection, queue, and execution
pages now filter workers, job types, scheduler states, and scheduler
status to only show data for their lane.
- Plugin() templ function accepts a lane parameter (default: "default")
- JavaScript appends ?lane= to /api/plugin/workers, /job-types,
/scheduler-states, and /scheduler-status API calls
- GET /api/plugin/job-types now supports ?lane= filtering
- When ?job= is provided (e.g. ?job=iceberg_maintenance), the lane is
auto-derived from the job type so the page scopes correctly
This ensures /plugin shows only default-lane workers and
/plugin/configuration?job=iceberg_maintenance scopes to the iceberg lane.
* fix: remove "Lane" from lane worker page titles and capitalize properly
"lifecycle Lane Workers" -> "Lifecycle Workers"
"iceberg Lane Workers" -> "Iceberg Workers"
* refactor: promote lane items to top-level sidebar menu entries
Move Default, Iceberg, and Lifecycle from a collapsible submenu to
direct top-level items under the WORKERS heading. Removes the
intermediate "Workers" parent link and collapse toggle.
* admin: unify plugin lane routes and handlers
* admin: filter plugin jobs and activities by lane
* admin: reuse plugin UI for worker lane pages
* fix: use ServerAddress.ToGrpcAddress() for filer connections in lifecycle handler
ClusterContext addresses use ServerAddress format (host:port.grpcPort).
Convert to the actual gRPC address via ToGrpcAddress() before dialing,
and add a Ping verification after connecting.
Fixes: "dial tcp: lookup tcp/8888.18888: unknown port"
* fix: resolve ServerAddress gRPC port in iceberg and lifecycle filer connections
ClusterContext addresses use ServerAddress format (host:httpPort.grpcPort).
Both the iceberg and lifecycle handlers now detect the compound format
and extract the gRPC port via ToGrpcAddress() before dialing. Plain
host:port addresses (e.g. from tests) are passed through unchanged.
Fixes: "dial tcp: lookup tcp/8888.18888: unknown port"
* align url
* Potential fix for code scanning alert no. 335: Incorrect conversion between integer types
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
* fix: address PR review findings across scheduler lanes and lifecycle handler
- Fix variable shadowing: rename loop var `w` to `worker` in
GetPluginWorkersAPI to avoid shadowing the http.ResponseWriter param
- Fix stale GetSchedulerStatus: aggregate loop states across all lanes
instead of reading never-updated legacy schedulerLoopState
- Scope InProcessJobs to lane in GetLaneSchedulerStatus
- Fix AbortMPUDays=0 treated as unset: change <= 0 to < 0 so 0 disables
- Propagate listing errors in lifecycle bucket walk instead of swallowing
- Implement DeleteMarkerCleanup: scan for S3 delete marker entries and
remove them
- Implement AbortMPUDays: scan .uploads directory and remove stale
multipart uploads older than the configured threshold
- Fix success determination: mark job failed when result.errors > 0
even if no fatal error occurred
- Add regression test for jobTypeLaneMap to catch drift from handler
registrations
* fix: guard against nil result in lifecycle completion and trim filer addresses
- Guard result dereference in completion summary: use local vars
defaulting to 0 when result is nil to prevent panic
- Append trimmed filer addresses instead of originals so whitespace
is not passed to the gRPC dialer
* fix: propagate ctx cancellation from deleteExpiredObjects and add config logging
- deleteExpiredObjects now returns a third error value when the context
is canceled mid-batch; the caller stops processing further batches
and returns the cancellation error to the job completion handler
- readBoolConfig and readInt64Config now log unexpected ConfigValue
types at V(1) for debugging, consistent with readStringConfig
* fix: propagate errors in lifecycle cleanup helpers and use correct delete marker key
- cleanupDeleteMarkers: return error on ctx cancellation and SeaweedList
failures instead of silently continuing
- abortIncompleteMPUs: log SeaweedList errors instead of discarding
- isDeleteMarker: use ExtDeleteMarkerKey ("Seaweed-X-Amz-Delete-Marker")
instead of ExtLatestVersionIsDeleteMarker which is for the parent entry
- batchSize cap: use math.MaxInt instead of math.MaxInt32
* fix: propagate ctx cancellation from abortIncompleteMPUs and log unrecognized bool strings
- abortIncompleteMPUs now returns (aborted, errors, ctxErr) matching
cleanupDeleteMarkers; caller stops on cancellation or listing failure
- readBoolConfig logs unrecognized string values before falling back
* fix: shared per-bucket budget across lifecycle phases and allow cleanup without expired objects
- Thread a shared remaining counter through TTL deletion, delete marker
cleanup, and MPU abort so the total operations per bucket never exceed
MaxDeletesPerBucket
- Remove early return when no TTL-expired objects found so delete marker
cleanup and MPU abort still run
- Add NOTE on cleanupDeleteMarkers about version-safety limitation
---------
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
|
||
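Two of the lifecycle mechanics above are compact enough to sketch: the per-entry expiry test (TtlSec + Crtime < now) and the shared per-bucket budget that TTL deletes, delete-marker cleanup, and MPU aborts all draw from. Field and variable names here are illustrative, not the handler's actual types.

```go
// Illustrative expiry check and shared per-bucket delete budget.
package main

import (
	"fmt"
	"time"
)

type entryAttrs struct {
	Crtime int64 // creation time, unix seconds
	TtlSec int32 // TTL from the bucket's lifecycle rule; 0 means no expiry
}

// isExpired mirrors the "TtlSec + Crtime < now" test described above.
func isExpired(a entryAttrs, now time.Time) bool {
	if a.TtlSec <= 0 {
		return false
	}
	return a.Crtime+int64(a.TtlSec) < now.Unix()
}

func main() {
	now := time.Now()
	old := entryAttrs{Crtime: now.Add(-48 * time.Hour).Unix(), TtlSec: 86400}
	fresh := entryAttrs{Crtime: now.Unix(), TtlSec: 86400}
	fmt.Println(isExpired(old, now), isExpired(fresh, now)) // true false

	// Shared budget: every phase decrements the same counter so the total
	// operations per bucket never exceed max_deletes_per_bucket.
	remaining := int64(10000)
	deleteOne := func() bool {
		if remaining <= 0 {
			return false
		}
		remaining--
		return true
	}
	fmt.Println(deleteOne(), remaining)
}
```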
|
|
67a551fd62 |
admin UI: add anonymous user creation checkbox (#8773)
Add an "Anonymous" checkbox next to the username field in the Create User modal. When checked, the username is set to "anonymous" and the credential generation checkbox is disabled since anonymous users do not need keys. The checkbox is only shown when no anonymous user exists yet. The manage-access-keys button in the users table is hidden for the anonymous user. |
||
|
|
7f0cf72574 |
admin/plugin: delete job_detail files when jobs are pruned from memory (#8722)
* admin/plugin: delete job_detail files when jobs are pruned from memory pruneTrackedJobsLocked evicts the oldest terminal jobs from the in-memory tracker when the total exceeds maxTrackedJobsTotal (1000). However the dedicated per-job detail files in jobs/job_details/ were never removed, causing them to accumulate indefinitely on disk. Add ConfigStore.DeleteJobDetail and call it from pruneTrackedJobsLocked so that the file is cleaned up together with the in-memory entry. Deletion errors are logged at verbosity level 2 and do not abort the prune. * admin/plugin: add test for DeleteJobDetail --------- Co-authored-by: Anton Ustyugov <anton@devops> Co-authored-by: Chris Lu <chris.lu@gmail.com> |
||
|
|
90277ceed5 |
admin/plugin: migrate inline job details asynchronously to avoid slow startup (#8721)
loadPersistedMonitorState performed a backward-compatibility migration that wrote every job with inline rich detail fields to a dedicated per-job detail file synchronously during startup. On deployments with many historical jobs (e.g. 1000+) stored on distributed block storage (e.g. Longhorn), each individual file write requires an fsync round-trip, making startup disproportionately slow and causing readiness/liveness probe failures. The in-memory state is populated correctly before the goroutine is started because stripTrackedJobDetailFields is still called in-place; only the disk writes are deferred. A completion log message at V(1) is emitted once the background migration finishes. Co-authored-by: Anton Ustyugov <anton@devops> |
||
|
|
ae170f1fbb |
admin: fix manual job run to use scheduler dispatch with capacity management and retry (#8720)
RunPluginJobTypeAPI previously executed proposals with a naive sequential
loop calling ExecutePluginJob per proposal. This had two bugs:
1. Double-lock: RunPluginJobTypeAPI held pluginLock while calling
ExecutePluginJob, which tried to re-acquire the same lock for every job in
the loop.
2. No capacity management: proposals were fired directly at workers without
reserveScheduledExecutor, so every job beyond the worker concurrency limit
received an immediate at_capacity error with no retry or backoff.
Fix: add Plugin.DispatchProposals which reuses dispatchScheduledProposals -
the same code path the scheduler loop uses - with executor reservation,
configurable concurrency, and per-job retry with backoff.
RunPluginJobTypeAPI now calls DispatchPluginProposals (a thin AdminServer
wrapper) after holding pluginLock once.
Co-authored-by: Anton Ustyugov <anton@devops> |
||
|
|
6ccda3e809 |
fix(s3): allow deleting the anonymous user from admin webui (#8706)
Remove the block that prevented deleting the "anonymous" identity and stop auto-creating it when absent. If no anonymous identity exists (or it is disabled), LookupAnonymous returns not-found and both auth paths return ErrAccessDenied for anonymous requests. To enable anonymous access, explicitly create the "anonymous" user. To revoke it, delete the user like any other identity. Closes #8694 |
||
|
|
1f1eac4f08 |
feat: improve aio support for admin/volume ingress and fix UI links (#8679)
* feat: improve allInOne mode support for admin/volume ingress and fix master UI links
- Add allInOne support to admin ingress template, matching the pattern used
by filer and s3 ingress templates (or-based enablement with ternary service
name selection)
- Add allInOne support to volume ingress template, which previously required
volume.enabled even when the volume server runs within the allInOne pod
- Expose admin ports in allInOne deployment and service when
allInOne.admin.enabled is set
- Add allInOne.admin config section to values.yaml (enabled by default,
ports inherit from admin.*)
- Fix legacy master UI templates (master.html, masterNewRaft.html) to prefer
PublicUrl over internal Url when linking to volume server UI. The new admin
UI already handles this correctly.
* fix: revert admin allInOne changes and fix PublicUrl in admin dashboard
The admin binary (`weed admin`) is a separate process that cannot run inside
`weed server` (allInOne mode). Revert the admin-related allInOne helm chart
changes that caused 503 errors on admin ingress.
Fix bug in cluster_topology.go where VolumeServer.PublicURL was set to
node.Id (internal pod address) instead of the actual public URL.
Add public_url field to DataNodeInfo proto message so the topology gRPC
response carries the public URL set via -volume.publicUrl flag.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use HTTP /dir/status to populate PublicUrl in admin dashboard
The gRPC DataNodeInfo proto does not include PublicUrl, so the admin
dashboard showed internal pod IPs instead of the configured public URL.
Fetch PublicUrl from the master's /dir/status HTTP endpoint and apply it in
both GetClusterTopology and GetClusterVolumeServers code paths.
Also reverts the unnecessary proto field additions from the previous commit
and cleans up a stray blank line in all-in-one-service.yml.
* fix: apply PublicUrl link fix to masterNewRaft.html
Match the same conditional logic already applied to master.html — prefer
PublicUrl when set and different from Url.
* fix: add HTTP timeout and status check to fetchPublicUrlMap
Use a 5s-timeout client instead of http.DefaultClient to prevent blocking
indefinitely when the master is unresponsive. Also check the HTTP status
code before attempting to parse the response body.
* fix: fall back to node address when PublicUrl is empty
Prevents blank links in the admin dashboard when PublicUrl is not
configured, such as in standalone or mixed-version clusters.
* fix: log io.ReadAll error in fetchPublicUrlMap
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Chris Lu <chris.lu@gmail.com> |
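A hedged sketch of the fetchPublicUrlMap hardening described above: a 5-second-timeout client, a status-code check before parsing, and an explicit error when the body cannot be read. The JSON handling is generic because the actual /dir/status response shape isn't reproduced here; the function name mirrors the commit but the code is illustrative.

```go
// Illustrative timeout-bounded fetch of the master's /dir/status endpoint.
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// statusClient bounds the request so an unresponsive master cannot block the
// admin dashboard indefinitely (http.DefaultClient has no timeout).
var statusClient = &http.Client{Timeout: 5 * time.Second}

func fetchDirStatus(masterHTTPAddr string) (map[string]any, error) {
	resp, err := statusClient.Get("http://" + masterHTTPAddr + "/dir/status")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("dir/status: unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, fmt.Errorf("dir/status: read body: %w", err)
	}
	var parsed map[string]any
	if err := json.Unmarshal(body, &parsed); err != nil {
		return nil, err
	}
	return parsed, nil
}

func main() {
	if status, err := fetchDirStatus("localhost:9333"); err == nil {
		fmt.Println(len(status), "top-level fields")
	}
}
```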