231 Commits

Author SHA1 Message Date
Chris Lu
884b0bcbfd feat(s3/lifecycle): cluster rate-limit allocation (Phase 3) (#9456)
* feat(s3/lifecycle): cluster rate-limit allocation (Phase 3)

Admin computes a per-worker share of cluster_deletes_per_second at
ExecuteJob time and ships it to the worker via
ClusterContext.Metadata. The worker reads the share, constructs a
golang.org/x/time/rate.Limiter, and passes it to dailyrun.Run via
cfg.Limiter (Phase 2 already plumbed the field). Phase 5 deletes the
streaming path; until then streaming ignores the cap.

Why allocate at admin: the cluster cap is a single knob operators
care about. Dividing it locally per worker would either need
out-of-band coordination or accept N× the configured budget. Admin
is the only party that knows how many execute-capable workers there
are, so it owns the math.

Admin side (weed/admin/plugin):
- Registry.CountCapableExecutors(jobType) returns the number of
  non-stale workers with CanExecute=true.
- New file cluster_rate_limit.go: decorateClusterContextForJob clones
  the input ClusterContext and injects two metadata keys for
  s3_lifecycle. cloneClusterContext duplicates Metadata so per-job
  decoration doesn't race shared base state.
- executeJobWithExecutor calls the decorator after loading the admin
  config; other job types pass through unchanged.

Worker side (weed/worker/tasks/s3_lifecycle):
- New cluster_rate_limit.go declares the constants both sides agree
  on (admin-config field names, metadata keys). Plain strings on the
  admin side keep weed/admin/plugin free of a dependency on the
  s3_lifecycle worker package; the two sets of constants are pinned
  to identical values and a mismatch would silently disable rate
  limiting.
- handler.go executeDailyReplay reads ClusterContext.Metadata,
  builds a rate.Limiter, and passes it into dailyrun.Config{Limiter}.
  Missing/empty/non-positive values → no limiter (legacy unlimited
  behavior). burst defaults to 2 × rate, clamped to ≥1 to avoid a
  bucket that never refills.
- Admin form gains two fields under "Scope": cluster_deletes_per_second
  (rate, 0 = unlimited) and cluster_deletes_burst (0 = 2 × rate).

Metric:
- New S3LifecycleDispatchLimiterWaitSeconds histogram observes how
  long each Limiter.Wait blocks before a LifecycleDelete RPC.
  Operators tune the cap by reading p95 — near-zero means the cap
  isn't binding, a long tail at 1/rate means it is.

Tests:
- weed/admin/plugin/cluster_rate_limit_test.go: 9 cases covering
  pass-through for non-allocator job types, rps=0 / no-executors
  skip, even sharing, burst sharing, burst=0 omit (worker default
  kicks in), burst floor of 1, no mutation of input metadata, nil
  input.
- weed/worker/tasks/s3_lifecycle/cluster_rate_limit_test.go: 7 cases
  covering nil/empty/missing metadata, non-positive/invalid rate,
  positive rate builds correctly, burst missing defaults to 2× rate,
  tiny rate clamps burst to ≥1.

Build clean. Phase 2 (#9446) and Phase 4 engine (#9447) are the
parents; this branch stacks on Phase 2 since it consumes
dailyrun.Config{Limiter} which lands there.

* fix(s3/lifecycle): divide cluster budget by active workers, not all capable

gemini pointed out that s3_lifecycle has MaxJobsPerDetection=1
(handler.go:189) — it's a singleton job, only one worker is ever active.
Dividing the cluster_deletes_per_second budget by the count of capable
executors gave the single active worker just 1/N of the configured cap.

Pass adminRuntime.MaxJobsPerDetection through to the decorator. Divisor
is now min(executors, maxJobsPerDetection), clamped to >=1. For
s3_lifecycle (maxJobs=1) the active worker gets the full budget; for a
hypothetical parallel-dispatch job (maxJobs>1) the budget divides
across the running-set.

Tests swap the SharedEvenly case for two pinned scenarios:
  - SingletonJobGetsFullBudget: maxJobs=1 across 4 executors => 100/1
  - SharedEvenlyWhenParallelLimited: maxJobs=4 across 4 executors => 25/worker
  - MaxJobsExceedsExecutors: maxJobs=10 across 4 executors => divisor 4

* feat(s3/lifecycle): drop Worker Count knob from admin config form

The "Worker Count" admin field controlled in-process pipeline goroutines
across the 16-shard space — per-worker tuning, not a cluster-wide scope
concern. Operators looking at the form alongside Cluster Delete Rate
reasonably misread it as the number of workers in the cluster.

Drop the form field and DefaultValues entry. cfg.Workers is now hardcoded
to shardPipelineGoroutines (=1) inside ParseConfig; the rest of the
plumbing through dailyrun.Config.Workers stays so a future need can
re-introduce it as a worker-local knob (or just bump the constant).

handler_test.go pins that "workers" must NOT appear in the form so the
removal doesn't silently regress.
2026-05-11 19:17:06 -07:00
Chris Lu
fd463155e4 fix(ec): planner treats each (server, disk_id) as a distinct target (#9369) (#9371)
* fix(ec): planner treats each (server, disk_id) as a distinct target (#9369)

master_pb.DataNodeInfo.DiskInfos is keyed by disk type, so a volume
server with multiple physical disks of the same type collapses into a
single DiskInfo. Per-disk attribution survives only inside the
VolumeInfos[].DiskId / EcShardInfos[].DiskId records, and the active
topology never put it back together. The EC planner saw N candidates
instead of N×disks, returned a short plan, and createECTargets
round-robined extra shards onto the same (server, disk_id) — colliding
with the #9185 disk_id-aware ReceiveFile.

Reconstruct per-physical-disk view in UpdateTopology by splitting each
DiskInfo into one entry per observed disk_id, and index volumes / EC
shards by their own DiskId so lookups stay aligned. Refuse to plan an
EC task when fewer than totalShards distinct disks are available rather
than packing shards onto the same disk.

Threads dataShards/parityShards through planECDestinations,
createECTargets and createECTaskParams so the helpers don't depend on
the OSS 10+4 constants — keeps enterprise merges clean.

* trim verbose comments

* align EC param signatures with enterprise

- dataShards/parityShards: uint32 → int (matches enterprise's ratio API)
- drop unused multiPlan from createECTaskParams
- minTotalDisks: total/parity+1 → ceil(total/parity), correct for non-default ratios

Reduces merge surface when this PR lands in seaweed-enterprise.
2026-05-08 12:59:02 -07:00
Chris Lu
5d43f84df7 refactor(plugin): rename detection_interval_seconds → detection_interval_minutes (#9366)
Minutes is the natural granularity for detection cadence — every
production handler already set the seconds field to a 60-multiple
(17*60, 30*60, 3600, 24*60*60). Switching to minutes drops the *60
arithmetic and matches the unit conventions used elsewhere in the
plugin worker forms.

- Proto: AdminRuntimeDefaults + AdminRuntimeConfig.detection_interval_*
  field renamed.
- Helpers: durationFromMinutes / minutesFromDuration alongside the
  existing seconds variants in plugin_scheduler.go.
- Handlers: vacuum, ec_balance, balance, erasure_coding, iceberg,
  admin_script, s3_lifecycle now declare DetectionIntervalMinutes.
- Admin: scheduler_status + types + UI templ + plugin_api.go pass
  through the new field; UI label and table cells switch to "min".
2026-05-08 10:33:02 -07:00
Chris Lu
7f254e158e feat(worker/s3_lifecycle): plugin handler with admin UI config (#9362)
* feat(s3/lifecycle): scheduler — N pipelines over an even shard split

Scheduler.Run spawns Workers Pipeline goroutines plus one engine-refresh
ticker. Each worker owns a contiguous AssignShards(idx, total) slice of
[0, ShardCount) and runs Pipeline.Run with EventBudget bounding each
iteration; brief RetryBackoff between iterations avoids hot-loop on
errors. The refresh ticker rebuilds the engine snapshot from the filer's
bucket configs every RefreshInterval.

LoadCompileInputs / IsBucketVersioned / AllActivePriorStates are
exported from a configload.go sibling so the shell command can move to
this shared implementation in a follow-up.

* refactor(shell): reuse scheduler.LoadCompileInputs in run-shard

Drop the local copies of loadLifecycleCompileInputs / isBucketVersioned
/ allActivePriorStates / lifecycleParseError that the new
scheduler package now exports. Same behavior, one source of truth.

* feat(worker/s3_lifecycle): plugin handler with admin UI config

Registers a JobHandler for s3_lifecycle via pluginworker.RegisterHandler.
Admin pulls the descriptor over the worker plugin gRPC and renders the
AdminConfigForm + WorkerConfigForm in the existing UI:

  Admin form (cluster shape):
    - workers (1..16, default 1)
    - s3_grpc_endpoints (comma list)

  Worker form (operational tuning):
    - dispatch_tick_ms (default 5000)
    - checkpoint_tick_ms (default 30000)
    - refresh_interval_ms (default 300000)
    - event_budget (default 0 = unbounded)

Detect emits a single proposal whenever S3 endpoints + filer addresses
are configured. MaxExecutionConcurrency=1 so admin only ever runs one
lifecycle daemon per worker; a fresh proposal next cycle restarts it
if the prior Execute exits.

Execute dials the configured S3 endpoint + filer, builds a
scheduler.Scheduler with the parsed config, and runs it until
ctx cancellation. Reuses the existing scheduler / dispatcher /
reader / engine packages — the handler is the thin glue that
parses descriptor values and wires the long-running daemon.

* proto(plugin): add s3_grpc_addresses to ClusterContext

So workers can dial s3 servers discovered by the master rather than a
hand-typed list in the admin form.

* feat(admin): populate ClusterContext.s3_grpc_addresses from master

ListClusterNodes(S3Type) returns the live S3 servers; the plugin
scheduler now hands these to job handlers alongside filer/volume
addresses.

* feat(worker/s3_lifecycle): discover s3 endpoints from cluster context

Drop the s3_grpc_endpoints admin form field and read the master-supplied
ClusterContext.S3GrpcAddresses instead. Operators no longer maintain a
hand-typed list, and a stale entry self-heals when the master's view
updates.

* feat(worker/s3_lifecycle): time-based runtime cap, friendlier cadence units

- dispatch_tick_minutes (was *_ms): minutes is the natural granularity
  for a daily batch; default 1 minute.
- checkpoint_tick_seconds: seconds for the durable cursor write; default
  30 seconds.
- refresh_interval_minutes: minutes for the engine snapshot rebuild.
- max_runtime_minutes replaces event_budget. Each daily run is bounded
  by wall clock — typical run wraps in well under an hour because the
  cursor persists and the meta-log streams fast. Default 60 minutes.
- AdminRuntimeDefaults.DetectionIntervalSeconds = 86400 so the admin
  schedules one job per day.
2026-05-08 10:30:02 -07:00
Minsoo Kim
a1e5eb9dad Fix UI prefix url encoding (#9344)
* Fix filer UI navigation for URL-sensitive object prefixes

* Fix filer UI navigation for URL-sensitive object prefixes

* Clarify filer UI path escaping test name

Rename the legacy filer UI
  path test to describe the actual behavior being checked.

  The printpath helper preserves timestamp characters that are valid in URL path
  components, while the PR fix is focused on query-string escaping for path and cursor
  parameters.
2026-05-06 19:14:36 -07:00
Chris Lu
7b0b64db65 fix(admin/view): wrap plugin history URL with basePath (#9341)
Plugin tabs/sub-tabs use history.pushState/replaceState to keep the
URL bar in sync with the active view, but updateURL fed it the raw
output of buildPluginURL ("/plugin/lanes/<lane>/..."). Under a
urlPrefix deployment that strips the prefix, so reloading the page
hit /plugin/... directly and 404'd at the proxy.

Wrap with basePath() so the rewritten URL keeps the deployment
prefix.

Reported at #9240.
2026-05-06 15:25:06 -07:00
Chris Lu
1c0e24f06a fix(balance): don't move remote-tiered volumes; don't fatal on missing .idx (#9335)
* fix(volume): don't fatal on missing .idx for remote-tiered volume

A .vif left behind without its .idx (orphaned by a crashed move, partial
copy, or hand-edit) would trip glog.Fatalf in checkIdxFile and take the
whole volume server down on boot, killing every healthy volume on it
too. For remote-tiered volumes treat it as a per-volume load error so
the server can come up and the operator can clean up the stray .vif.

Refs #9331.

* fix(balance): skip remote-tiered volumes in admin balance detection

The admin/worker balance detector had no equivalent of the shell-side
guard ("does not move volume in remote storage" in
command_volume_balance.go), so it scheduled moves on remote-tiered
volumes. The "move" copies .idx/.vif to the destination and then calls
Volume.Destroy on the source, which calls backendStorage.DeleteFile —
deleting the remote object the destination's new .vif now points at.

Populate HasRemoteCopy on the metrics emitted by both the admin
maintenance scanner and the worker's master poll, then drop those
volumes at the top of Detection.

Fixes #9331.

* Apply suggestion from @gemini-code-assist[bot]

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* fix(volume): keep remote data on volume-move-driven delete

The on-source delete after a volume move (admin/worker balance and
shell volume.move) ran Volume.Destroy with no way to opt out of the
remote-object cleanup. Volume.Destroy unconditionally calls
backendStorage.DeleteFile for remote-tiered volumes, so a successful
move would copy .idx/.vif to the destination and then nuke the cloud
object the destination's new .vif was already pointing at.

Add VolumeDeleteRequest.keep_remote_data and plumb it through
Store.DeleteVolume / DiskLocation.DeleteVolume / Volume.Destroy. The
balance task and shell volume.move set it to true; the post-tier-upload
cleanup of other replicas and the over-replication trim in
volume.fix.replication also set it to true since the remote object is
still referenced. Other real-delete callers keep the default. The
delete-before-receive path in VolumeCopy also sets it: the inbound copy
carries a .vif that may reference the same cloud object as the
existing volume.

Refs #9331.

* test(storage): in-process remote-tier integration tests

Cover the four operations the user is most likely to run against a
cloud-tiered volume — balance/move, vacuum, EC encode, EC decode — by
registering a local-disk-backed BackendStorage as the "remote" tier and
exercising the real Volume / DiskLocation / EC encoder code paths.

Locks in:
- Destroy(keepRemoteData=true) preserves the remote object (move case)
- Destroy(keepRemoteData=false) deletes it (real-delete case)
- Vacuum/compact on a remote-tier volume never deletes the remote object
- EC encode requires the local .dat (callers must download first)
- EC encode + rebuild round-trips after a tier-down

Tests run in-process and finish in under a second total — no cluster,
binary, or external storage required.

* fix(rust-volume): keep remote data on volume-move-driven delete

Mirror the Go fix in seaweed-volume: plumb keep_remote_data through
grpc volume_delete → Store.delete_volume → DiskLocation.delete_volume
→ Volume.destroy, and skip the s3-tier delete_file call when the flag
is set. The pre-receive cleanup in volume_copy passes true for the
same reason as the Go side: the inbound copy carries a .vif that may
reference the same cloud object as the existing volume.

The Rust loader already warns rather than fataling on a stray .vif
without an .idx (volume.rs load_index_inmemory / load_index_redb), so
no counterpart to the Go fatal-on-missing-idx fix is needed.

Refs #9331.

* fix(volume): preserve remote tier on IO-error eviction; fix EC test target

Two review nits:

- Store.MaybeAddVolumes' periodic cleanup pass deleted IO-errored
  volumes with keepRemoteData=false, so a transient local fault on a
  remote-tiered volume would also nuke the cloud object. Track the
  delete reason via a parallel slice and pass keepRemoteData=v.HasRemoteFile()
  for IO-error evictions; TTL-expired evictions still pass false.

- TestRemoteTier_ECEncodeDecode_AfterDownload deleted shards 0..3 but
  called them "parity" — by the klauspost/reedsolomon convention shards
  0..DataShardsCount-1 are data and DataShardsCount..TotalShardsCount-1
  are parity. Switch the loop to delete the parity range so the
  intent matches the indices.

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-05-06 15:19:43 -07:00
Chris Lu
f2c3bd7b77 fix(admin/view): define basePath in plugin IIFE scopes (#9298)
The plugin.templ and plugin_lane.templ components use basePath() in their IIFE
(Immediately Invoked Function Expression) scopes to handle subdirectory
deployments. However, basePath was not defined locally, causing "basePath is
not defined" errors when accessing plugin pages.

Added local basePath function definitions in both files, matching the pattern
from admin.js. This function checks window.__BASE_PATH__ (set by the layout
during page initialization) and prepends it to API paths.
2026-05-01 18:22:39 -07:00
Chris Lu
14cd426cf9 templ 2026-04-29 16:08:38 -07:00
baracudaz
db34e8b6fd feat(admin): prefer stored S3 Content-Type metadata over key-extension MIME inference (#9286)
* feat(admin): prefer stored filer mime in file browser and properties

* feat(mime): enhance MIME type registration and improve fallback logic

Co-authored-by: Copilot <copilot@github.com>

---------

Co-authored-by: baracudaz <baracudaz@users.noreply.github.com>
Co-authored-by: Copilot <copilot@github.com>
2026-04-29 10:21:03 -07:00
Parviz Miriyev
e2f96687ff fix(admin): use protocol-relative URLs for component links so HTTPS clusters don't break clicks (#9256)
* fix(admin): use protocol-relative URLs for component links

Hardcoded http:// in admin UI templates breaks browser-initiated clicks
to master / volume / filer / EC shard / Iceberg REST URLs whenever the
target component runs HTTPS-only via security.toml [https.X] sections.
The browser sends plain HTTP to a TLS-only endpoint and gets 400
"client sent an HTTP request to an HTTPS server".

Same root pattern as #9227 (admin's own backend /dir/status fetch);
this PR is the browser-facing equivalent.

Replace fmt.Sprintf("http://%s...") with fmt.Sprintf("//%s...") and the
JS-string '<a href="http://' with '<a href="//' so the browser uses the
same scheme as the page hosting the link. Backwards compatible:
  - HTTPS-only deployments: links now work
  - HTTP-only deployments: identical behavior to before
  - Mixed: edge case, addressed by future per-component public-URL work

Affected templates (9 files), each kept in lockstep with its generated
_templ.go sibling so reviewers don't need to run templ generate:

- weed/admin/view/app/admin.templ
- weed/admin/view/app/cluster_filers.templ
- weed/admin/view/app/cluster_masters.templ (Go templ + JS modal)
- weed/admin/view/app/cluster_volume_servers.templ (Go templ + JS modal)
- weed/admin/view/app/cluster_volumes.templ
- weed/admin/view/app/ec_volume_details.templ
- weed/admin/view/app/volume_details.templ
- weed/admin/view/app/iceberg_catalog.templ
- weed/admin/view/app/s3tables_buckets.templ

17 link constructions total, +32/-32 lines.

* fix(admin): protocol-relative URLs in iceberg + s3tables JS overrides

Per Gemini code review on this PR: the JS scripts in iceberg_catalog
and s3tables_buckets templates overwrite the href attribute of the
"Open Iceberg REST" links after page load, replacing the
protocol-relative URL set by the templ render with a hardcoded
http://<host>:<port>/v1/config.

Apply the same protocol-relative fix to the JS template literals so
they don't undo the templ-side change. Browser uses the page scheme
(http or https) to fill in the protocol.

Mirrored in iceberg_catalog_templ.go and s3tables_buckets_templ.go.

* fix(admin): displayed Iceberg endpoint scheme follows page protocol

Per CodeRabbit review on this PR: the on-page guidance text in iceberg
and s3tables templates still showed a literal `http://` even after the
clickable link was switched to a protocol-relative URL. In HTTPS-only
deployments operators see `http://host:8181/v1` as the suggested
endpoint, copy it, and get a broken connection.

Wrap the scheme in <span id="iceberg-protocol"> (and the s3tables
counterpart) and have the existing inline script set its innerText to
window.location.protocol minus the trailing colon. Same pattern as the
existing dynamic host substitution. Mirrored in *_templ.go so reviewers
do not need templ generate.

SQL/JSON code-block examples (CREATE EXTERNAL TABLE ... ENDPOINT
'http://...', "uri": "http://..." ) are intentionally left as-is —
they are starter snippets users adapt to their environment, not
clickable or copy-paste-into-runtime values. Happy to follow up with
server-side scheme threading if requested.
2026-04-27 23:10:11 -07:00
Chris Lu
fa492a9eed fix(admin): wrap plugin URLs with basePath for subdir deployments
Two more spots that broke under a subdirectory deployment:

- plugin.templ pluginRequest() called fetch(url) with relative API
  paths from 14+ callers; wrap once inside the helper so they all
  honor window.__BASE_PATH__.
- plugin_lane.templ generated <a href="/plugin/configuration?job=...">
  with an absolute path; wrap with basePath() so the link stays
  inside the deployment prefix.

Follow-up to a6adf530c.
2026-04-27 09:04:12 -07:00
Chris Lu
3ea489d013 fix(admin): wrap plugin lane fetch URL with basePath
Plugin lane page fetches API endpoints with raw absolute URLs, breaking
deployments under a subdirectory. Wrap the fetch URL with basePath() so
window.__BASE_PATH__ is honored, matching other admin pages.

Addresses https://github.com/seaweedfs/seaweedfs/issues/9240
2026-04-27 09:00:29 -07:00
Parviz Miriyev
f407bdaa36 fix(admin): use TLS-aware HTTP client for /dir/status fetch (#9227)
fetchPublicUrlMap() in weed/admin/dash/cluster_topology.go uses a
dedicated &http.Client{} that doesn't honor security.toml client TLS
configuration, and hardcodes "http://" in the URL. When master is
configured HTTPS-only ([https.master] set), every cluster topology
cache refresh logs:

  NOTICE: http: TLS handshake error from <admin-ip>:<port>: client
  sent an HTTP request to an HTTPS server

The function falls through to glog.V(1).Infof and returns nil, so the
admin UI loses PublicUrl enrichment for data nodes. Cosmetic but noisy.

Switch to util_http.GetGlobalHttpClient() whose Do() calls
NormalizeHttpScheme(), which automatically rewrites http:// to https://
when [https.client] is enabled and presents the configured client cert.
Preserve the 5-second timeout via context.WithTimeout().

Same pattern as weed/admin/handlers/file_browser_handlers.go,
weed/server/master_server.go, weed/shell/command_volume_fsck.go.
2026-04-26 12:25:53 -07:00
Chris Lu
5eead9409a fix(admin): S3 Tables CSRF token + non-empty 409 status (#9221)
* fix(admin): attach CSRF token to S3 Tables write requests

Several POST/PUT/DELETE calls in s3tables.js were sent without an
X-CSRF-Token header while the corresponding handlers in
weed/admin/dash/s3tables_management.go enforce CSRF via
requireSessionCSRFToken, so authenticated users hit "invalid CSRF token"
on actions like creating a table bucket (#9220), updating policies, and
managing tags.

Add an s3tWriteHeaders helper that pulls the token from the existing
csrf-token meta tag and use it on every write to /api/s3tables/buckets,
/bucket-policy, /tables, /table-policy, and /tags. The Iceberg-page
write paths already attached the token and are unchanged.

Fixes #9220

* fix(admin): map BucketNotEmpty/NamespaceNotEmpty to 409 for S3 Tables

DELETE on a non-empty table bucket or namespace returned HTTP 500
because s3TablesErrorStatus didn't list ErrCodeBucketNotEmpty or
ErrCodeNamespaceNotEmpty in its conflict case, even though the
backend handler emits them with 409 Conflict (matching AWS S3 Tables).
Add both codes to the existing conflict mapping.

* refactor(admin): route Iceberg S3 Tables writes through s3tWriteHeaders

Iceberg namespace/table create and Iceberg table delete were still
hand-rolling CSRF headers. Replace those blocks with the existing
s3tWriteHeaders() helper so every S3 Tables write uses the same code
path. Drop the now-unused csrfTokenInput.value population in
initIcebergNamespaces and initIcebergTables (the templ hidden inputs
have no server-rendered value, and nothing reads the input now that
the JS reads the token from the meta tag via getCSRFToken()).
2026-04-24 22:48:41 -07:00
faspix
0fcd5173be fix(admin): use basePath for API fetches when urlPrefix is set (#9197)
* fix(admin): use basePath for API fetches when urlPrefix is set

* fix(admin): drop duplicate iam-utils script on Groups page

* fix(admin): route topics page fetches through basePath

The Topics page missed two fetch() calls that still used root-relative
URLs, so create-topic and view-details still broke when -urlPrefix was
set.

---------

Co-authored-by: Maksim Babkou <maksim.babkou@innovatrics.com>
Co-authored-by: Chris Lu <chris.lu@gmail.com>
2026-04-23 11:55:07 -07:00
Chris Lu
036191c78a Merge branch 'master' of https://github.com/seaweedfs/seaweedfs 2026-04-23 11:09:59 -07:00
Chris Lu
ae93f87a46 adjust logo 2026-04-23 10:05:51 -07:00
Jon E Nesvold
c6302fcb54 feat(iam): allow caller-supplied AccessKeyId and SecretAccessKey in CreateAccessKey (#9172)
* feat(iam): support caller-supplied AccessKeyId and SecretAccessKey in CreateAccessKey

Both IAM implementations (standalone and embedded) now check for
caller-supplied AccessKeyId and SecretAccessKey form parameters before
generating random credentials. If provided, the caller-supplied values
are used. If empty, random keys are generated as before.

This enables programmatic identity provisioning where the caller needs
to control the S3 credentials.

Backward-compatible: no behavior change for callers that omit these
parameters.

* refactor(iam): extract shared caller-supplied credential validation

Move the AccessKeyId/SecretAccessKey format checks and the in-memory
collision scan into weed/iam so the standalone IAM API, the embedded
IAM in s3api, and the admin dashboard all enforce the same rules.

- ValidateCallerSuppliedAccessKeyId: 4-128 alphanumeric (rejects
  SigV4-breaking characters like '/' and '=').
- ValidateCallerSuppliedSecretAccessKey: 8-128 chars.
- FindAccessKeyOwner: scans identities and service accounts and
  returns the owning entity type + name for debug logging, without
  exposing the owner in caller-facing error messages.

The admin dashboard previously only length-checked caller-supplied
keys; it now enforces the same alphanumeric rule, which matches what
SigV4 actually accepts anyway.

* fix(iam): reject partial caller-supplied AccessKeyId/SecretAccessKey

Previously, if a caller supplied only one of AccessKeyId or
SecretAccessKey, CreateAccessKey logged a warning and auto-generated
the missing half. That silently returns a credential the caller did
not fully choose, which is surprising and easy to miss in a response
they expected to echo back their input.

Return ErrCodeInvalidInputException instead: either both are supplied
or neither is. Updates the mixed-supply tests in weed/iamapi and
weed/s3api to assert the rejection.

* chore(iam): centralize and broaden sensitive form redaction

DoActions and ExecuteAction both had an inline loop that redacted
SecretAccessKey from their debug-level request log. Replace the two
copies with iam.RedactSensitiveFormValues, backed by an explicit
sensitive-keys set.

The set now also covers Password, OldPassword, NewPassword,
PrivateKey, and SessionToken. None of those parameters are used by
today's IAM actions, but naming them here makes the log-safety
guarantee survive future additions such as LoginProfile / STS.

* test(iam): cover the upper length bound for CreateAccessKey

TestCreateAccessKeyBoundary / TestEmbeddedIamCreateAccessKeyBoundary
only exercised the 3/4-char lower edge. Add cases for 128 (accepted)
and 129 (rejected) for AccessKeyId, plus 7 / 128 / 129-char cases
for SecretAccessKey, so both ends of the validator are locked in at
the handler level (the pure validators in weed/iam already cover
this).

* fix(s3api/iam): verify user existence before RNG and collision scan

In the embedded IAM CreateAccessKey, the user lookup ran last: a
request for a non-existent user still walked the whole identity /
service-account list for collisions and, if no caller-supplied keys
were present, generated fresh random credentials with crypto/rand
before the NoSuchEntity error finally surfaced.

Reorder: validate inputs, then find the target identity, then do the
collision scan, then generate keys. A missing user now fails fast and
consumes no entropy, and the handler returns NoSuchEntity instead of
a misleading EntityAlreadyExists when both the user is missing and
the supplied AccessKeyId happens to collide with another identity's
key.

Add TestEmbeddedIamCreateAccessKeyRejectsMissingUser to lock in the
"no mutation on unknown user" guarantee.

The standalone iamapi CreateAccessKey intentionally keeps its
pre-existing "create-or-attach" semantics where a missing user is
implicitly provisioned — that is a behavior change beyond the scope
of this PR.

* test(iam): tighten collision leak assertion and cover 8-char secret

- Rename the collision-owner identity in TestCreateAccessKeyRejectsCollision
  (both iamapi and the embedded s3api test) from "existing" / "ExistingUser"
  to "ownerAlpha". The old assert.NotContains check was effectively a no-op
  because the error message never contained those substrings; a distinctive
  name shared with no part of the expected error body makes the leak guard
  actually meaningful if the wording ever drifts. The embedded test also
  adds a NotContains assertion that was previously missing entirely.
- Add an explicit 8-char SecretAccessKey pass case to both boundary tests
  so the lower edge of the validator is locked in at the handler level
  alongside the 7 / 128 / 129-char cases.

* fix(iamapi): enforce both-or-none before the collision lookup

In the standalone IAM CreateAccessKey, FindAccessKeyOwner ran before
the partial-credential check. If a caller supplied only AccessKeyId
and it happened to collide with an existing key, the response was
EntityAlreadyExists instead of the more fundamental InvalidInput for
omitting SecretAccessKey — wrong error class, and leaked the fact
that the probed key is already in use.

Swap the order: validate both-or-none first, then do the collision
scan. Matches the embedded IAM path and AWS behavior.

Add a case to TestCreateAccessKeyRejectsPartialSupply that combines
partial supply with a collision to lock in the ordering.

* fix(admin): reject partial caller-supplied AccessKey/SecretKey

The admin dashboard path silently generated the missing half when a
caller supplied only one of AccessKey or SecretKey, while the IAM
API and embedded IAM paths now reject this. Align the three: if
exactly one is provided, return ErrInvalidInput.

Also simplifies the generator block — either both are provided or
neither is, so there is no mixed path to handle.

* test(s3api/iam): guard dereferences in caller-supplied-keys test

TestEmbeddedIamCreateAccessKeyWithCallerSuppliedKeys dereferenced
*AccessKeyId/*SecretAccessKey/*UserName and indexed
Identities[0].Credentials[0] without first verifying shape, so any
future regression that returns a partial response or skips the
config mutation would panic mid-assertion instead of failing with
a clear message.

Add require.NotNil on the response pointers and require.Len on
the identities/credentials slices before the asserts.

* test(iamapi): exercise the service-account branch of the collision check

FindAccessKeyOwner scans both Identities[*].Credentials and
ServiceAccounts[*].Credential, but TestCreateAccessKeyRejectsCollision
only covered the identity branch. Split the test into two subtests —
one per branch — so a future refactor that drops the service-account
scan (or mutates the existing credential) trips a failure.

Also asserts the existing service-account credential is not mutated
and no credential is attached to the target identity on rejection.

* test(iam): isolate 129-char secret subcase from prior credential

In both TestCreateAccessKeyBoundary (iamapi) and
TestEmbeddedIamCreateAccessKeyBoundary (s3api), the 129-char
SecretAccessKey subcase reused the "validkey" AccessKeyId that the
preceding 8-char subcase had just persisted into the config. The
test still asserted the right outcome because the handler validates
secret length before running the collision scan — but if the two
checks ever swap, the subcase would pass (or fail) for the wrong
reason.

Reset the in-memory credentials before the 129-char subcase, matching
the pattern already used by the 3/128/129-char AccessKeyId and
7-char secret subcases. No behavior change; purely test isolation.

---------

Co-authored-by: Chris Lu <chris.lu@gmail.com>
2026-04-22 12:35:55 -07:00
Chris Lu
eaab3ec5d0 fix(topology): drop per-disk task-type conflict map (#9147) (#9166)
* fix(topology): drop per-disk task-type conflict map (#9147)

Different job types (Balance, ErasureCoding, Vacuum) operate on different
volumes, so a per-disk cross-type exclusion adds no correctness guarantee
beyond what HasAnyTask already enforces at task detection time. The
conflict map turned this into a deadlock on small clusters: a single
in-flight (or retrying) balance task would prune the source/destination
disks from EC placement, dropping the candidate count below MinTotalDisks
and permanently blocking auto-EC.

Removing the map lets EC see all eligible disks. Per-volume safety is
still guaranteed by HasAnyTask, and per-disk load shaping remains
available via MaxConcurrentTasksPerDisk.

* chore(topology): trim verbose comments from #9147 fix
2026-04-20 17:40:25 -07:00
Chris Lu
46b801aedb fix(admin): list all masters and dedupe EC file counts in dashboard (#9093)
* fix(admin): list all masters and dedupe EC file counts in dashboard

Dashboard -> Master Nodes only ever showed the currently connected master
because getMasterNodesStatus hard-coded a single entry. Replace it with a
RaftListClusterServers call that returns every master in the raft group and
tags the real leader, falling back to the current master only if the raft
call fails.

Buckets -> Object Store Buckets could render 0 objects for a bucket backed
by an EC volume. Every shard holder reports the same whole-volume
file_count (read from the replicated .ecx), so the first-seen value wins;
if that first node had not yet finished loading .ecx it reported 0 and
pinned the aggregate at 0. Take the max across reporting nodes instead.

The dashboard header total_files also dropped after volumes were converted
to erasure coding because getTopologyViaGRPC never folded EC file_count
into topology.TotalFiles. Aggregate it with the same max/sum dedupe.

* fix(admin): address PR review comments

- bound RaftListClusterServers with a 3s timeout so the dashboard endpoint
  cannot hang on a stalled master
- pre-validate raft addresses with net.SplitHostPort before calling
  pb.GrpcAddressToServerAddress, which otherwise glog.Fatalf's on a
  malformed entry and would crash the admin process
- when raft is unreachable, mark the fallback master as not-leader rather
  than claiming leadership the code cannot verify
- warn when summed EC delete_count exceeds file_count while folding into
  topology.TotalFiles, matching collectCollectionStats

* fix(admin): distinguish empty raft response from RPC failure

When RaftListClusterServers returns successfully with no servers, raft is
not initialized (standalone/non-raft cluster), so the single fallback
master is the leader. Only treat the fallback as a non-leader when the
RPC actually failed.

* fix(admin): remove misleading Objects column from S3 buckets page

The bucket "Objects" column displayed needle counts from volume
collection stats, not actual S3 object counts. This is confusing
because a single S3 object can span multiple needles (multipart
uploads, versions) and the count is inaccurate for EC volumes.

Remove the ObjectCount field from S3Bucket, the Objects table column,
the sort-by-objects handler, the detail-view row, and both CSV export
references.

* fix(admin): correct cell indexes in fallback bucket CSV export

After the Objects column was removed, the fallback CSV exporter in
admin.js still used stale cell indexes: cells[1] mapped to Owner
(not Created), cells[2] to Created (not Size), cells[3] to Logical
Size (not Quota). Align all indexes with the current table column
order and include Owner, Logical Size, and Physical Size.
2026-04-15 22:28:54 -07:00
Chris Lu
300e906330 admin: report file and delete counts for EC volumes (#9060)
* admin: report file and delete counts for EC volumes

The admin bucket size fix (#9058) left object counts at zero for
EC-encoded data because VolumeEcShardInformationMessage carried no file
count. Billing/monitoring dashboards therefore still under-report
objects once a bucket is EC-encoded.

Thread file_count and delete_count end-to-end:

- Add file_count/delete_count to VolumeEcShardInformationMessage (proto
  fields 8 and 9) and regenerate master_pb.
- Compute them lazily on volume servers by walking the .ecx index once
  per EcVolume, cache on the struct, and keep the cache in sync inside
  DeleteNeedleFromEcx (distinguishing live vs already-tombstoned
  entries so idempotent deletes do not drift the counts).
- Populate the new proto fields from EcVolume.ToVolumeEcShardInformationMessage
  and carry them through the master-side EcVolumeInfo / topology sync.
- Aggregate in admin collectCollectionStats, deduping per volume id:
  every node holding shards of an EC volume reports the same counts, so
  summing across nodes would otherwise multiply the object count by the
  number of shard holders.

Regression tests cover the initial .ecx walk, live/tombstoned delete
bookkeeping (including idempotent and missing-key cases), and the admin
dedup path for an EC volume reported by multiple nodes.

* ec: include .ecj journal in EcVolume delete count

The initial delete count only reflected .ecx tombstones, missing any
needle that was journaled in .ecj but not yet folded into .ecx — e.g.
on partial recovery. Expand initCountsLocked to take the union of
.ecx tombstones and .ecj journal entries, deduped by needle id, so:

  - an id that is both tombstoned in .ecx and listed in .ecj counts once
  - a duplicate .ecj entry counts once
  - an .ecj id with a live .ecx entry is counted as deleted (not live)
  - an .ecj id with no matching .ecx entry is still counted

Covered by TestEcVolumeFileAndDeleteCountEcjUnion.

* ec: report delete count authoritatively and tombstone once per delete

Address two issues with the previous EcVolume file/delete count work:

1. The delete count was computed lazily on first heartbeat and mixed
   in a .ecj-union fallback to "recover" partial state. That diverged
   from how regular volumes report counts (always live from the needle
   map) and had drift cases when .ecj got reconciled. Replace with an
   eager walk of .ecx at NewEcVolume time, maintained incrementally on
   every DeleteNeedleFromEcx call. Semantics now match needle_map_metric:
   FileCount is the total number of needles ever recorded in .ecx
   (live + tombstoned), DeleteCount is the tombstones — so live =
   FileCount - DeleteCount. Drop the .ecj-union logic entirely.

2. A single EC needle delete fanned out to every node holding a replica
   of the primary data shard and called DeleteNeedleFromEcx on each,
   which inflated the per-volume delete total by the replica factor.
   Rewrite doDeleteNeedleFromRemoteEcShardServers to try replicas in
   order and stop at the first success (one tombstone per delete), and
   only fall back to other shards when the primary shard has no home
   (ErrEcShardMissing sentinel), not on transient RPC errors.

Admin aggregation now folds EC counts correctly: FileCount is deduped
per volume id (every shard holder has an identical .ecx) and DeleteCount
is summed across nodes (each delete tombstones exactly one node). Live
object count = deduped FileCount - summed DeleteCount.

Tests updated to match the new semantics:
  - EC volume counts seed FileCount as total .ecx entries (live +
    tombstoned), DeleteCount as tombstones.
  - DeleteNeedleFromEcx keeps FileCount constant and increments
    DeleteCount only on live->tombstone transitions.
  - Admin dedup test uses distinct per-node delete counts (5 + 3 + 2)
    to prove they're summed, while FileCount=100 is applied once.

* ec: test fixture uses real vid; admin warns on skewed ec counts

- writeFixture now builds the .ecx/.ecj/.ec00/.vif filenames from the
  actual vid passed in, instead of hardcoding "_1". The existing tests
  all use vid=1 so behaviour is unchanged, but the helper no longer
  silently diverges from its documented parameter.
- collectCollectionStats logs a glog warning when an EC volume's summed
  delete count exceeds its deduped file count, surfacing the anomaly
  (stale heartbeat, counter drift, etc.) instead of silently dropping
  the volume from the object count.

* ec: derive file/delete counts from .ecx/.ecj file sizes

seedCountsFromEcx walked the full .ecx index at volume load, which is
wasted work: .ecx has fixed-size entries (NeedleMapEntrySize) and .ecj
has fixed-size deletion records (NeedleIdSize), so both counts are pure
file-size arithmetic.

  fileCount   = ecxFileSize / NeedleMapEntrySize
  deleteCount = ecjFileSize / NeedleIdSize

Rip out the cached counters, countsLock, seedCountsFromEcx, and the
recordDelete helper. Track ecjFileSize directly on the EcVolume struct,
seed it from Stat() at load, and bump it on every successful .ecj append
inside DeleteNeedleFromEcx under ecjFileAccessLock. Skip the .ecj write
entirely when the needle is already tombstoned so the derived delete
count stays idempotent on repeat deletes. Heartbeats now compute counts
in O(1).

Tests updated: the initial fixture pre-populates .ecj with two ids to
verify the file-size derivation end-to-end, and the delete test keeps
its idempotent-re-delete / missing-needle invariants (unchanged
externally, now enforced by the early return rather than a cache guard).

* ec: sync Rust volume server with Go file/delete count semantics

Mirror the Go-side EC file/delete count work in the Rust volume server
so mixed Go/Rust clusters report consistent bucket object counts in
the admin dashboard.

- Add file_count (8) and delete_count (9) to the Rust copy of
  VolumeEcShardInformationMessage (seaweed-volume/proto/master.proto).
- EcVolume gains ecj_file_size, seeded from the journal's metadata on
  open and bumped inside journal_delete on every successful append.
- file_and_delete_count() returns counts derived in O(1) from
  ecx_file_size / NEEDLE_MAP_ENTRY_SIZE and
  ecj_file_size / NEEDLE_ID_SIZE, matching Go's FileAndDeleteCount.
- to_volume_ec_shard_information_messages populates the new proto
  fields instead of defaulting them to zero.
- mark_needle_deleted_in_ecx now returns a DeleteOutcome enum
  (NotFound / AlreadyDeleted / Tombstoned) so journal_delete can skip
  both the .ecj append and the size bump when the needle is missing
  or already tombstoned, keeping the derived delete_count idempotent
  on repeat or no-op deletes.
- Rust's EcVolume::new no longer replays .ecj into .ecx on load. Go's
  RebuildEcxFile is only called from specific decode/rebuild gRPC
  handlers, not on volume open, and replaying on load was hiding the
  deletion journal from the new file-size-derived delete counter.
  rebuild_ecx_from_journal is kept as dead_code for future decode
  paths that may want the same replay semantics.

Also clean up the Go FileAndDeleteCount to drop unnecessary runtime
guards against zero constants — NeedleMapEntrySize and NeedleIdSize
are compile-time non-zero.

test_ec_volume_journal updated to pre-populate the .ecx with the
needles it deletes, and extended to verify that repeat and
missing-id deletes do not drift the derived counts.

* ec: document enterprise-reserved proto field range on ec shard info

Both OSS master.proto copies now note that fields 10-19 are reserved
for future upstream additions while 20+ are owned by the enterprise
fork. Enterprise already pins data_shards/parity_shards at 20/21, so
keeping OSS additions inside 8-19 avoids wire-level collisions for
mixed deployments.

* ec(rust): resolve .ecx/.ecj helpers from ecx_actual_dir

ecx_file_name() and ecj_file_name() resolved from self.dir_idx, but
new() opens the actual files from ecx_actual_dir (which may fall back
to the data dir when the idx dir does not contain the index). After a
fallback, read_deleted_needles() and rebuild_ecx_from_journal() would
read/rebuild the wrong (nonexistent) path while heartbeats reported
counts from the file actually in use — silently dropping deletes.

Point idx_base_name() at ecx_actual_dir, which is initialized to
dir_idx and only diverges after a successful fallback, so every call
site agrees with the file new() has open. The pre-fallback call in
new() (line 142) still returns the dir_idx path because
ecx_actual_dir == dir_idx at that point.

Update the destroy() sweep to build the dir_idx cleanup paths
explicitly instead of leaning on the helpers, so post-fallback stale
files in the idx dir are still removed.

* ec: reset ecj size after rebuild; rollback ecx tombstone on ecj failure

Two EC delete-count correctness fixes applied symmetrically to Go and
Rust volume servers.

1. rebuild_ecx_from_journal (Rust) now sets ecj_file_size = 0 after
   recreating the empty journal, matching the on-disk truth.
   Previously the cached size still reflected the pre-rebuild journal
   and file_and_delete_count() would keep reporting stale delete
   counts. The Go side has no equivalent bug because RebuildEcxFile
   runs in an offline helper that does not touch an EcVolume struct.

2. DeleteNeedleFromEcx / journal_delete used to tombstone the .ecx
   entry before writing the .ecj record. If the .ecj append then
   failed, the needle was permanently marked deleted but the
   heartbeat-reported delete_count never advanced (it is derived from
   .ecj file size), and a retry would see AlreadyDeleted and early-
   return, leaving the drift permanent.

   Both languages now capture the entry's file offset and original
   size bytes during the mark step, attempt the .ecj append, and on
   failure roll the .ecx tombstone back by writing the original size
   bytes at the known offset. A rollback that itself errors is
   logged (glog / tracing) but cannot re-sync the files — this is
   the same failure mode a double disk error would produce, and is
   unavoidable without a full on-disk transaction log.

Go: wrap MarkNeedleDeleted in a closure that captures the file
offset into an outer variable, then pass the offset + oldSize to the
new rollbackEcxTombstone helper on .ecj seek/write errors.

Rust: DeleteOutcome::Tombstoned now carries the size_offset and a
[u8; SIZE_SIZE] copy of the pre-tombstone size field. journal_delete
destructures on Tombstoned and calls restore_ecx_size on .ecj append
failure.

* test(ec): widen admin /health wait to 180s for cold CI

TestEcEndToEnd starts master, 14 volume servers, filer, 2 workers and
admin in sequence, then waited only 60s for admin's HTTP server to come
up. On cold GitHub runners the tail of the earlier subprocess startups
eats most of that budget and the wait occasionally times out (last hit
on run 24374773031). The local fast path is still ~20s total, so the
bump only extends the timeout ceiling, not the happy path.

* test(ec): fork volume servers in parallel in TestEcEndToEnd

startWeed is non-blocking (just cmd.Start()), so the per-process fork +
mkdir + log-file-open overhead for 14 volume servers was serialized for
no reason. On cold CI disks that overhead stacks up and eats into the
subsequent admin /health wait, which is how run 24374773031 flaked.

Wrap the volume-server loop in a sync.WaitGroup and guard runningCmds
with a mutex so concurrent appends are safe. startWeed still calls
t.Fatalf on failure, which is fine from a goroutine for a fatal test
abort; the fail-fast isn't something we rely on for precise ordering.

* ec: fsync ecx before ecj, truncate on failure, harden rebuild

Four correctness fixes covering both volume servers.

1. Durability ordering (Go + Rust). After marking the .ecx tombstone
   we now fsync .ecx before touching .ecj, so a crash between the two
   files cannot leave the journal with an entry for a needle whose
   tombstone is still sitting in page cache. Once the fsync returns,
   the tombstone is the source of truth: reads see "deleted",
   delete_count may under-count by one (benign, idempotent retries)
   but never over-reports. If the fsync itself fails we restore the
   original size bytes and surface the error. The .ecj append is then
   followed by its own Sync so the reported delete_count matches the
   on-disk journal once the write returns.

2. .ecj truncation on append failure. write_all may have extended the
   journal on disk before sync_all / Sync errors out, leaving the
   cached ecj_file_size out of sync with the physical length and
   drifting delete_count permanently after restart. Both languages
   now capture the pre-append size, truncate the file back via
   set_len / Truncate on any write or sync failure, and only then
   restore the .ecx tombstone. Truncation errors are logged — same-fd
   length resets cannot realistically fail — but cannot themselves
   re-sync the files.

3. Atomic rebuild_ecx_from_journal (Rust, dead code today but wired
   up on any future decode path). Previously a failed
   mark_needle_deleted_in_ecx call was swallowed with `let _ = ...`
   and the journal was still removed, silently losing tombstones.
   We now bubble up any non-NotFound error, fsync .ecx after the
   whole replay succeeds, and only then drop and recreate .ecj.
   NotFound is still ignored (expected race between delete and encode).

4. Missing-.ecx hardening (Rust). mark_needle_deleted_in_ecx used to
   return Ok(NotFound) when self.ecx_file was None, hiding a closed or
   corrupt volume behind what looks like an idempotent no-op. It now
   returns an io::Error carrying the volume id so callers (e.g.
   journal_delete) fail loudly instead.

Existing Go and Rust EC test suites stay green.

* ec: make .ecx immutable at runtime; track deletes in memory + .ecj

Refactors both volume servers so the sealed sorted .ecx index is never
mutated during normal operation. Runtime deletes are committed to the
.ecj deletion journal and tracked in an in-memory deleted-needle set;
read-path lookups consult that set to mask out deleted ids on top of
the immutable .ecx record. Mirrors the intended design on both Go and
Rust sides.

EcVolume gains a `deletedNeedles` / `deleted_needles` set seeded from
.ecj in NewEcVolume / EcVolume::new. DeleteNeedleFromEcx /
journal_delete:

  1. Looks the needle up read-only in .ecx.
  2. Missing needle -> no-op.
  3. Pre-existing .ecx tombstone (from a prior decode/rebuild) ->
     mirror into the in-memory set, no .ecj append.
  4. Otherwise append the id to .ecj, fsync, and only then publish
     the id into the set. A partial write is truncated back to the
     pre-append length so the on-disk journal and the in-memory set
     cannot drift.

FindNeedleFromEcx / find_needle_from_ecx now return
TombstoneFileSize when the id is in the in-memory set, even though
the bytes on disk still show the original size.

FileAndDeleteCount:
  fileCount   = .ecx size / NeedleMapEntrySize (unchanged)
  deleteCount = len(deletedNeedles) (was: .ecj size / NeedleIdSize)

The RebuildEcxFile / rebuild_ecx_from_journal decode-time helpers
still fold .ecj into .ecx — that is the one place tombstones land in
the physical index, and it runs offline on closed files. Rust's
rebuild helper now also clears the in-memory set when it succeeds.

Dead code removed on the Rust side: `DeleteOutcome`,
`mark_needle_deleted_in_ecx`, `restore_ecx_size`. Go drops the
runtime `rollbackEcxTombstone` path. Neither helper was needed once
.ecx stopped being a runtime mutation target.

TestEcVolumeSyncEnsuresDeletionsVisible (issue #7751) is rewritten
as TestEcVolumeDeleteDurableToJournal, which exercises the full
durability chain: delete -> .ecj fsync -> FindNeedleFromEcx masks
via the in-memory set -> raw .ecx bytes are *unchanged* -> Close +
RebuildEcxFile folds the journal into .ecx -> raw bytes now show
the tombstone, as CopyFile in the decode path expects.
2026-04-13 21:10:36 -07:00
Chris Lu
ef77df6141 admin: include EC volumes in bucket size reporting (#9058)
* admin: include EC volumes in bucket size reporting

The Object Store buckets page computed per-collection size by iterating
only regular volumes, so once a bucket's data was EC-encoded it silently
disappeared from the reported size — breaking usage-based billing.

Walk EcShardInfos alongside VolumeInfos in collectCollectionStats: add
raw shard bytes to PhysicalSize, and the parity-stripped value
(shardBytes * DataShardsCount / TotalShardsCount) to LogicalSize,
matching the normalization used by `weed shell` cluster.status.

* admin: derive EC logical size from shard bitmap, not constants

Use ShardsInfoFromVolumeEcShardInformationMessage + MinusParityShards
to sum actual data-shard bytes instead of scaling raw bytes by the
DataShardsCount/TotalShardsCount ratio. Keeps the data/parity split
encapsulated in the erasure_coding package and is exact when shard
sizes differ (e.g. last shard).

* admin: regression test for EC shard size aggregation

Cover the uneven-tail-shard case (data shard 9 < 1000 bytes) and the
empty-collection-name path to pin PhysicalSize/LogicalSize behavior
for collectCollectionStats against future changes.
2026-04-13 15:15:21 -07:00
Chris Lu
512912cbb8 Update plugin_templ.go 2026-04-13 13:10:03 -07:00
Chris Lu
ae08e77979 fix(scheduler): give worker tasks a real per-attempt execution deadline (#9041)
* fix(scheduler): give worker tasks a real per-attempt execution deadline

The plugin scheduler derived the per-attempt execution deadline as
DetectionTimeoutSeconds * 2, which capped every worker task at twice
the cluster-scan budget regardless of actual work. For volume_balance
batches this was 240s — far too short for 20 large volume copies, so
every attempt died at "context deadline exceeded" and all in-flight
sub-RPCs surfaced as "context canceled". Retries restarted from move 1
and hit the same wall.

Add an explicit ExecutionTimeoutSeconds field to the plugin proto and
make each handler declare its own baseline (1800s for vacuum, balance,
EC; 3600s for iceberg). Size-aware handlers also emit an
estimated_runtime_seconds parameter on each proposal so the scheduler
extends the per-attempt deadline based on actual workload:

- volume_balance batch: max(largest single move, total / concurrency)
  at 5 min/GB, so a skewed batch with one big volume isn't averaged
  away.
- volume_balance single, vacuum (already), erasure_coding (10 min/GB),
  ec_balance (5 min/GB): per-volume budgets.

admin_script and iceberg keep the configurable handler default since
their workloads are opaque to the detector.

* fix(scheduler): apply descriptor defaults to existing persisted configs

The previous commit added execution_timeout_seconds to the proto and
each handler's descriptor defaults, but two paths still left existing
deployments broken:

1. deriveSchedulerAdminRuntime returned stored AdminRuntime configs
   as-is. Persisted configs from older versions have no
   execution_timeout_seconds, so the scheduler fell back to the 90s
   default — worse than the prior 240s behavior. Overlay descriptor
   defaults for any zero numeric fields when loading.

2. The admin form did not round-trip execution_timeout_seconds, so a
   normal save would clear it back to zero. Add the input field, the
   fillAdminSettings/collectAdminSettings hooks, and as defense in
   depth reapply descriptor defaults in UpdatePluginJobTypeConfigAPI
   before persisting so a stale form can never silently clobber a
   baseline.

* fix(volume_balance): account for partial scheduling rounds in batch estimate

With N moves and C slots, the busiest slot processes ceil(N/C) moves,
not N/C. Dividing total seconds by C underestimates wall-clock time
whenever N is not a multiple of C — e.g. 6 moves at concurrency 5
needs 2 rounds, not 1.2. Use avg * ceil(N/C) so partial rounds are
counted as full ones.

* fix(volume_balance): scale minBudget per wave instead of per move

Orchestration overhead (setup/teardown for the parallel move runner)
happens once per wave, not once per move. Use numRounds*60 as the
floor instead of len(moves)*60 so the minimum doesn't inflate
linearly with batch size when individual moves are tiny.
2026-04-13 01:15:53 -07:00
Chris Lu
28d1ef24ec fix(admin): allow control chars in file paths when browsing filer (#9043)
* fix(admin): allow control chars in file paths when browsing filer

The admin UI rejected any path containing \x00, \r, or \n as "path contains
invalid characters". These bytes are legal in S3 object keys, so objects
created through the S3 API (or replicated via filer.sync) could exist on the
filer but be unreachable from the admin UI — browse, download, and upload
all failed with "Invalid file path".

Drop the control-character rejection and instead URL-escape the path when
constructing filer request URLs, so that such bytes cannot inject into the
HTTP request target. Path traversal protection via path.Clean is unchanged.

* test(admin): strengthen file path tests with byte-preserving checks

Assert full expected output for validateAndCleanFilePath so silent stripping
of control characters would fail the test, and cover \r and \x00 escaping in
filerFileURL in addition to \n and space.
2026-04-13 01:15:35 -07:00
Moray Baruh
41ff105f47 object_store_users: fix specific bucket admin permission (#9014)
Fix an issue where seleting Sepecific Buckets with Admin permission
while creating/editing an object store user would grant Admin permission on all
buckets
2026-04-10 18:10:05 -07:00
Chris Lu
e648c76bcf go fmt 2026-04-10 17:31:14 -07:00
Chris Lu
b0e79ad207 fix(admin): respect urlPrefix for root redirect and JS API calls (#8975)
* fix(admin): respect urlPrefix for root redirect and JS API calls (#8967)

Two issues when running admin UI behind a reverse proxy with -urlPrefix:

1. Visiting the prefix path without trailing slash (e.g. /s3-admin) caused
   a redirect to / instead of /s3-admin/ because http.StripPrefix produced
   an empty path that the router redirected to root.

2. Several JavaScript API calls in admin.js used hardcoded paths instead
   of basePath(), causing file upload, download, and preview to fail.

* fix(admin): preserve query params in prefix redirect and use 302

Use http.StatusFound instead of 301 to avoid aggressive browser caching
of a configuration-dependent redirect, and preserve query parameters.
2026-04-07 14:12:05 -07:00
Chris Lu
076d504044 fix(admin): reduce memory usage and verbose logging for large clusters (#8927)
* fix(admin): reduce memory usage and verbose logging for large clusters (#8919)

The admin server used excessive memory and produced thousands of log lines
on clusters with many volumes (e.g., 33k volumes). Three root causes:

1. Scanner duplicated all volume metrics: getVolumeHealthMetrics() created
   VolumeHealthMetrics objects, then convertToTaskMetrics() copied them all
   into identical types.VolumeHealthMetrics. Now uses the task-system type
   directly, eliminating the duplicate allocation and removing convertToTaskMetrics.

2. All previous task states loaded at startup: LoadTasksFromPersistence read
   and deserialized every .pb file from disk, logging each one. With thousands
   of balance tasks persisted, this caused massive startup I/O, memory usage,
   and log noise (including unguarded DEBUG glog.Infof per task). Now starts
   with an empty queue — the scanner re-detects current needs from live cluster
   state. Terminal tasks are purged from memory and disk when new scan results
   arrive.

3. Verbose per-volume/per-node logging: V(2) and V(3) logs produced thousands
   of lines per scan. Per-volume logs bumped to V(4), per-node/rack/disk logs
   bumped to V(3). Topology summary now logs counts instead of full node ID arrays.

Also removes lastTopologyInfo field from MaintenanceScanner — the raw protobuf
topology is returned as a local value and not retained between 30-minute scans.

* fix(admin): delete stale task files at startup, add DeleteAllTaskStates

Old task .pb files from previous runs were left on disk. The periodic
CleanupCompletedTasks still loads all files to find completed ones —
the same expensive 4GB path from the pprof profile.

Now at startup, DeleteAllTaskStates removes all .pb files by scanning
the directory without reading or deserializing them. The scanner will
re-detect any tasks still needed from live cluster state.

* fix(admin): don't persist terminal tasks to disk

CompleteTask was saving failed/completed tasks to disk where they'd
accumulate. The periodic cleanup only triggered for completed tasks,
not failed ones. Now terminal tasks are deleted from disk immediately
and only kept in memory for the current session's UI.

* fix(admin): cap in-memory tasks to 100 per job type

Without a limit, the task map grows unbounded — balance could create
thousands of pending tasks for a cluster with many imbalanced volumes.
Now AddTask rejects new tasks when a job type already has 100 in the
queue. The scanner will re-detect skipped volumes on the next scan.

* fix(admin): address PR review - memory-only purge, active-only capacity

- purgeTerminalTasks now only cleans in-memory map (terminal tasks are
  already deleted from disk by CompleteTask)
- Per-type capacity limit counts only active tasks (pending/assigned/
  in_progress), not terminal ones
- When at capacity, purge terminal tasks first before rejecting

* fix(admin): fix orphaned comment, add TaskStatusCancelled to terminal switch

- Move hasQueuedOrActiveTaskForVolume comment to its function definition
- Add TaskStatusCancelled to the terminal state switch in CompleteTask
  so cancelled task files are deleted from disk
2026-04-04 18:45:57 -07:00
Chris Lu
d37b592bc4 Update object_store_users_templ.go 2026-04-04 11:52:57 -07:00
Chris Lu
d1823d3784 fix(s3): include static identities in listing operations (#8903)
* fix(s3): include static identities in listing operations

Static identities loaded from -s3.config file were only stored in the
S3 API server's in-memory state. Listing operations (s3.configure shell
command, aws iam list-users) queried the credential manager which only
returned dynamic identities from the backend store.

Register static identities with the credential manager after loading
so they are included in LoadConfiguration and ListUsers results, and
filtered out before SaveConfiguration to avoid persisting them to the
dynamic store.

Fixes https://github.com/seaweedfs/seaweedfs/discussions/8896

* fix: avoid mutating caller's config and defensive copies

- SaveConfiguration: use shallow struct copy instead of mutating the
  caller's config.Identities field
- SetStaticIdentities: skip nil entries to avoid panics
- GetStaticIdentities: defensively copy PolicyNames slice to avoid
  aliasing the original

* fix: filter nil static identities and sync on config reload

- SetStaticIdentities: filter nil entries from the stored slice (not
  just from staticNames) to prevent panics in LoadConfiguration/ListUsers
- Extract updateCredentialManagerStaticIdentities helper and call it
  from both startup and the grace.OnReload handler so the credential
  manager's static snapshot stays current after config file reloads

* fix: add mutex for static identity fields and fix ListUsers for store callers

- Add sync.RWMutex to protect staticIdentities/staticNames against
  concurrent reads during config reload
- Revert CredentialManager.ListUsers to return only store users, since
  internal callers (e.g. DeletePolicy) look up each user in the store
  and fail on non-existent static entries
- Merge static usernames in the filer gRPC ListUsers handler instead,
  via the new GetStaticUsernames method
- Fix CI: TestIAMPolicyManagement/managed_policy_crud_lifecycle was
  failing because DeletePolicy iterated static users that don't exist
  in the store

* fix: show static identities in admin UI and weed shell

The admin UI and weed shell s3.configure command query the filer's
credential manager via gRPC, which is a separate instance from the S3
server's credential manager. Static identities were only registered
on the S3 server's credential manager, so they never appeared in the
filer's responses.

- Add CredentialManager.LoadS3ConfigFile to parse a static S3 config
  file and register its identities
- Add FilerOptions.s3ConfigFile so the filer can load the same static
  config that the S3 server uses
- Wire s3ConfigFile through in weed mini and weed server modes
- Merge static usernames in filer gRPC ListUsers handler
- Add CredentialManager.GetStaticUsernames helper
- Add sync.RWMutex to protect concurrent access to static identity
  fields
- Avoid importing weed/filer from weed/credential (which pulled in
  filer store init() registrations and broke test isolation)
- Add docker/compose/s3_static_users_example.json

* fix(admin): make static users read-only in admin UI

Static users loaded from the -s3.config file should not be editable
or deletable through the admin UI since they are managed via the
config file.

- Add IsStatic field to ObjectStoreUser, set from credential manager
- Hide edit, delete, and access key buttons for static users in the
  users table template
- Show a "static" badge next to static user names
- Return 403 Forbidden from UpdateUser and DeleteUser API handlers
  when the target user is a static identity

* fix(admin): show details for static users

GetObjectStoreUserDetails called credentialManager.GetUser which only
queries the dynamic store. For static users this returned
ErrUserNotFound. Fall back to GetStaticIdentity when the store lookup
fails.

* fix(admin): load static S3 identities in admin server

The admin server has its own credential manager (gRPC store) which is
a separate instance from the S3 server's and filer's. It had no static
identity data, so IsStaticIdentity returned false (edit/delete buttons
shown) and GetStaticIdentity returned nil (details page failed).

Pass the -s3.config file path through to the admin server and call
LoadS3ConfigFile on its credential manager, matching the approach
used for the filer.

* fix: use protobuf is_static field instead of passing config file path

The previous approach passed -s3.config file path to every component
(filer, admin). This is wrong because the admin server should not need
to know about S3 config files.

Instead, add an is_static field to the Identity protobuf message.
The field is set when static identities are serialized (in
GetStaticIdentities and LoadS3ConfigFile). Any gRPC client that loads
configuration via GetConfiguration automatically sees which identities
are static, without needing the config file.

- Add is_static field (tag 8) to iam_pb.Identity proto message
- Set IsStatic=true in GetStaticIdentities and LoadS3ConfigFile
- Admin GetObjectStoreUsers reads identity.IsStatic from proto
- Admin IsStaticUser helper loads config via gRPC to check the flag
- Filer GetUser gRPC handler falls back to GetStaticIdentity
- Remove s3ConfigFile from AdminOptions and NewAdminServer signature
2026-04-03 20:01:28 -07:00
Chris Lu
995dfc4d5d chore: remove ~50k lines of unreachable dead code (#8913)
* chore: remove unreachable dead code across the codebase

Remove ~50,000 lines of unreachable code identified by static analysis.

Major removals:
- weed/filer/redis_lua: entire unused Redis Lua filer store implementation
- weed/wdclient/net2, resource_pool: unused connection/resource pool packages
- weed/plugin/worker/lifecycle: unused lifecycle plugin worker
- weed/s3api: unused S3 policy templates, presigned URL IAM, streaming copy,
  multipart IAM, key rotation, and various SSE helper functions
- weed/mq/kafka: unused partition mapping, compression, schema, and protocol functions
- weed/mq/offset: unused SQL storage and migration code
- weed/worker: unused registry, task, and monitoring functions
- weed/query: unused SQL engine, parquet scanner, and type functions
- weed/shell: unused EC proportional rebalance functions
- weed/storage/erasure_coding/distribution: unused distribution analysis functions
- Individual unreachable functions removed from 150+ files across admin,
  credential, filer, iam, kms, mount, mq, operation, pb, s3api, server,
  shell, storage, topology, and util packages

* fix(s3): reset shared memory store in IAM test to prevent flaky failure

TestLoadIAMManagerFromConfig_EmptyConfigWithFallbackKey was flaky because
the MemoryStore credential backend is a singleton registered via init().
Earlier tests that create anonymous identities pollute the shared store,
causing LookupAnonymous() to unexpectedly return true.

Fix by calling Reset() on the memory store before the test runs.

* style: run gofmt on changed files

* fix: restore KMS functions used by integration tests

* fix(plugin): prevent panic on send to closed worker session channel

The Plugin.sendToWorker method could panic with "send on closed channel"
when a worker disconnected while a message was being sent. The race was
between streamSession.close() closing the outgoing channel and sendToWorker
writing to it concurrently.

Add a done channel to streamSession that is closed before the outgoing
channel, and check it in sendToWorker's select to safely detect closed
sessions without panicking.
2026-04-03 16:04:27 -07:00
Chris Lu
2e98902f29 fix(s3): use URL-safe secret keys for dashboard users and service accounts (#8902)
* fix(s3): use URL-safe secret keys for admin dashboard users and service accounts

The dashboard's generateSecretKey() used base64.StdEncoding which produces
+, /, and = characters that break S3 signature authentication. Reuse the
IAM package's GenerateSecretAccessKey() which was already fixed in #7990.

Fixes #8898

* fix: handle error from GenerateSecretAccessKey instead of ignoring it
2026-04-03 11:20:28 -07:00
Chris Lu
888c32cbde fix(admin): respect urlPrefix in S3 bucket and S3Tables navigation links (#8885)
* fix(admin): respect urlPrefix in S3 bucket and S3Tables navigation links (#8884)

Several admin UI templates used hardcoded URLs (templ.SafeURL) instead of
dash.PUrl(ctx, ...) for navigation links, causing 404 errors when the
admin is deployed with --urlPrefix.

Fixed in: s3_buckets.templ, s3tables_buckets.templ, s3tables_tables.templ

* fix(admin): URL-escape bucketName in S3Tables navigation links

Add url.PathEscape(bucketName) for consistency and correctness in
s3tables_tables.templ (back-to-namespaces link) and s3tables_buckets.templ
(namespace link), matching the escaping already used in the table details link.
2026-04-02 11:54:19 -07:00
Chris Lu
44d5cb8f90 Fix Admin UI master list showing gRPC port instead of HTTP port (#8869)
* Fix Admin UI master list showing gRPC port instead of HTTP port for followers (#8867)

Raft stores server addresses as gRPC addresses. The Admin UI was using
these addresses directly via ToHttpAddress(), which cannot extract the
HTTP port from a plain gRPC address. Use GrpcAddressToServerAddress()
to properly convert gRPC addresses back to HTTP addresses.

* Use httpAddress consistently as masterMap key

Address review feedback: masterInfo.Address (HTTP form) was already
computed but the raw address was used as the map key, causing
potential key mismatches between topology and raft data.
2026-04-01 09:43:50 -07:00
Lars Lehtonen
c1acf9e479 Prune unused functions from weed/admin/dash. (#8871)
* chore(weed/admin/dash): prune unused functions

* chore(weed/admin/dash): prune test-only function
2026-04-01 09:22:49 -07:00
Chris Lu
2eaf98a7a2 Use Unix sockets for gRPC in mini mode (#8856)
* Use Unix sockets for gRPC between co-located services in mini mode

In `weed mini`, all services run in one process. Previously, inter-service
gRPC traffic (volume↔master, filer↔master, S3↔filer, worker↔admin, etc.)
went through TCP loopback. This adds a gRPC Unix socket registry in the pb
package: mini mode registers a socket path per gRPC port at startup, each
gRPC server additionally listens on its socket, and GrpcDial transparently
routes to the socket via WithContextDialer when a match is found.

Standalone commands (weed master, weed filer, etc.) are unaffected since
no sockets are registered. TCP listeners are kept for external clients.

* Handle Serve error and clean up socket file in ServeGrpcOnLocalSocket

Log non-expected errors from grpcServer.Serve (ignoring
grpc.ErrServerStopped) and always remove the Unix socket file
when Serve returns, ensuring cleanup on Stop/GracefulStop.
2026-03-30 18:18:52 -07:00
Chris Lu
a95b8396e4 plugin scheduler: run iceberg and lifecycle lanes concurrently (#8821)
* plugin scheduler: run iceberg and lifecycle lanes concurrently

The default lane serialises job types under a single admin lock
because volume-management operations share global state. Iceberg
and lifecycle lanes have no such constraint, so run each of their
job types independently in separate goroutines.

* Fix concurrent lane scheduler status

* plugin scheduler: address review feedback

- Extract collectDueJobTypes helper to deduplicate policy loading
  between locked and concurrent iteration paths.
- Use atomic.Bool instead of sync.Mutex for hadJobs in the concurrent
  path.
- Set lane loop state to "busy" before launching concurrent goroutines
  so the lane is not reported as idle while work runs.
- Convert TestLaneRequiresLock to table-driven style.
- Add TestRunLaneSchedulerIterationLockBehavior to verify the scheduler
  acquires the admin lock only for lanes that require it.
- Fix flaky TestGetLaneSchedulerStatusShowsActiveConcurrentLaneWork by
  not starting background scheduler goroutines that race with the
  direct runJobTypeIteration call.
2026-03-29 00:06:20 -07:00
Chris Lu
8c8d21d7e2 Update plugin_lane_templ.go 2026-03-26 23:11:10 -07:00
Chris Lu
2604ec7deb Remove min_interval_seconds from plugin workers; vacuum default to 17m (#8790)
remove min_interval_seconds from plugin workers and default vacuum interval to 17m

The worker-level min_interval_seconds was redundant with the admin-side
DetectionIntervalSeconds, complicating scheduling logic. Remove it from
vacuum, volume_balance, erasure_coding, and ec_balance handlers.

Also change the vacuum default DetectionIntervalSeconds from 2 hours to
17 minutes to match the previous default behavior.
2026-03-26 23:04:36 -07:00
Chris Lu
cc2f790c73 feat: add per-lane scheduler status API and lane worker UI pages
- GET /api/plugin/lanes returns all lanes with status and job types
- GET /api/plugin/workers?lane=X filters workers by lane
- GET /api/plugin/scheduler-states?lane=X filters job types by lane
- GET /api/plugin/scheduler-status?lane=X returns lane-scoped status
- GET /plugin/lanes/{lane}/workers renders per-lane worker page
- SchedulerJobTypeState now includes a "lane" field

The lane worker pages show scheduler status, job type configuration,
and connected workers scoped to a single lane, with links back to
the main plugin overview.
2026-03-26 19:33:42 -07:00
Chris Lu
e3e015e108 feat: introduce scheduler lanes for independent per-workload scheduling
Split the single plugin scheduler loop into independent per-lane
goroutines so that volume management, iceberg compaction, and lifecycle
operations never block each other.

Each lane has its own:
- Goroutine (laneSchedulerLoop)
- Wake channel for immediate scheduling
- Admin lock scope (e.g. "plugin scheduler:default")
- Configurable idle sleep duration
- Loop state tracking

Three lanes are defined:
- default: vacuum, volume_balance, ec_balance, erasure_coding, admin_script
- iceberg: iceberg_maintenance
- lifecycle: s3_lifecycle (new, handler coming in a later commit)

Job types are mapped to lanes via a hardcoded map with LaneDefault as
the fallback. The SchedulerJobTypeState and SchedulerStatus types now
include a Lane field for API consumers.
2026-03-26 19:32:18 -07:00
Chris Lu
d95df76bca feat: separate scheduler lanes for iceberg, lifecycle, and volume management (#8787)
* feat: introduce scheduler lanes for independent per-workload scheduling

Split the single plugin scheduler loop into independent per-lane
goroutines so that volume management, iceberg compaction, and lifecycle
operations never block each other.

Each lane has its own:
- Goroutine (laneSchedulerLoop)
- Wake channel for immediate scheduling
- Admin lock scope (e.g. "plugin scheduler:default")
- Configurable idle sleep duration
- Loop state tracking

Three lanes are defined:
- default: vacuum, volume_balance, ec_balance, erasure_coding, admin_script
- iceberg: iceberg_maintenance
- lifecycle: s3_lifecycle (new, handler coming in a later commit)

Job types are mapped to lanes via a hardcoded map with LaneDefault as
the fallback. The SchedulerJobTypeState and SchedulerStatus types now
include a Lane field for API consumers.

* feat: per-lane execution reservation pools for resource isolation

Each scheduler lane now maintains its own execution reservation map
so that a busy volume lane cannot consume execution slots needed by
iceberg or lifecycle lanes. The per-lane pool is used by default when
dispatching jobs through the lane scheduler; the global pool remains
as a fallback for the public DispatchProposals API.

* feat: add per-lane scheduler status API and lane worker UI pages

- GET /api/plugin/lanes returns all lanes with status and job types
- GET /api/plugin/workers?lane=X filters workers by lane
- GET /api/plugin/scheduler-states?lane=X filters job types by lane
- GET /api/plugin/scheduler-status?lane=X returns lane-scoped status
- GET /plugin/lanes/{lane}/workers renders per-lane worker page
- SchedulerJobTypeState now includes a "lane" field

The lane worker pages show scheduler status, job type configuration,
and connected workers scoped to a single lane, with links back to
the main plugin overview.

* feat: add s3_lifecycle worker handler for object store lifecycle management

Implements a full plugin worker handler for S3 lifecycle management,
assigned to the new "lifecycle" scheduler lane.

Detection phase:
- Reads filer.conf to find buckets with TTL lifecycle rules
- Creates one job proposal per bucket with active lifecycle rules
- Supports bucket_filter wildcard pattern from admin config

Execution phase:
- Walks the bucket directory tree breadth-first
- Identifies expired objects by checking TtlSec + Crtime < now
- Deletes expired objects in configurable batches
- Reports progress with scanned/expired/error counts
- Supports dry_run mode for safe testing

Configurable via admin UI:
- batch_size: entries per filer listing page (default 1000)
- max_deletes_per_bucket: safety cap per run (default 10000)
- dry_run: detect without deleting
- delete_marker_cleanup: clean expired delete markers
- abort_mpu_days: abort stale multipart uploads

The handler integrates with the existing PutBucketLifecycle flow which
sets TtlSec on entries via filer.conf path rules.

* feat: add per-lane submenu items under Workers sidebar menu

Replace the single "Workers" sidebar link with a collapsible submenu
containing three lane entries:
- Default (volume management + admin scripts) -> /plugin
- Iceberg (table compaction) -> /plugin/lanes/iceberg/workers
- Lifecycle (S3 object expiration) -> /plugin/lanes/lifecycle/workers

The submenu auto-expands when on any /plugin page and highlights the
active lane. Icons match each lane's job type descriptor (server,
snowflake, hourglass).

* feat: scope plugin pages to their scheduler lane

The plugin overview, configuration, detection, queue, and execution
pages now filter workers, job types, scheduler states, and scheduler
status to only show data for their lane.

- Plugin() templ function accepts a lane parameter (default: "default")
- JavaScript appends ?lane= to /api/plugin/workers, /job-types,
  /scheduler-states, and /scheduler-status API calls
- GET /api/plugin/job-types now supports ?lane= filtering
- When ?job= is provided (e.g. ?job=iceberg_maintenance), the lane is
  auto-derived from the job type so the page scopes correctly

This ensures /plugin shows only default-lane workers and
/plugin/configuration?job=iceberg_maintenance scopes to the iceberg lane.

* fix: remove "Lane" from lane worker page titles and capitalize properly

"lifecycle Lane Workers" -> "Lifecycle Workers"
"iceberg Lane Workers" -> "Iceberg Workers"

* refactor: promote lane items to top-level sidebar menu entries

Move Default, Iceberg, and Lifecycle from a collapsible submenu to
direct top-level items under the WORKERS heading. Removes the
intermediate "Workers" parent link and collapse toggle.

* admin: unify plugin lane routes and handlers

* admin: filter plugin jobs and activities by lane

* admin: reuse plugin UI for worker lane pages

* fix: use ServerAddress.ToGrpcAddress() for filer connections in lifecycle handler

ClusterContext addresses use ServerAddress format (host:port.grpcPort).
Convert to the actual gRPC address via ToGrpcAddress() before dialing,
and add a Ping verification after connecting.

Fixes: "dial tcp: lookup tcp/8888.18888: unknown port"

* fix: resolve ServerAddress gRPC port in iceberg and lifecycle filer connections

ClusterContext addresses use ServerAddress format (host:httpPort.grpcPort).
Both the iceberg and lifecycle handlers now detect the compound format
and extract the gRPC port via ToGrpcAddress() before dialing. Plain
host:port addresses (e.g. from tests) are passed through unchanged.

Fixes: "dial tcp: lookup tcp/8888.18888: unknown port"

* align url

* Potential fix for code scanning alert no. 335: Incorrect conversion between integer types

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* fix: address PR review findings across scheduler lanes and lifecycle handler

- Fix variable shadowing: rename loop var `w` to `worker` in
  GetPluginWorkersAPI to avoid shadowing the http.ResponseWriter param
- Fix stale GetSchedulerStatus: aggregate loop states across all lanes
  instead of reading never-updated legacy schedulerLoopState
- Scope InProcessJobs to lane in GetLaneSchedulerStatus
- Fix AbortMPUDays=0 treated as unset: change <= 0 to < 0 so 0 disables
- Propagate listing errors in lifecycle bucket walk instead of swallowing
- Implement DeleteMarkerCleanup: scan for S3 delete marker entries and
  remove them
- Implement AbortMPUDays: scan .uploads directory and remove stale
  multipart uploads older than the configured threshold
- Fix success determination: mark job failed when result.errors > 0
  even if no fatal error occurred
- Add regression test for jobTypeLaneMap to catch drift from handler
  registrations

* fix: guard against nil result in lifecycle completion and trim filer addresses

- Guard result dereference in completion summary: use local vars
  defaulting to 0 when result is nil to prevent panic
- Append trimmed filer addresses instead of originals so whitespace
  is not passed to the gRPC dialer

* fix: propagate ctx cancellation from deleteExpiredObjects and add config logging

- deleteExpiredObjects now returns a third error value when the context
  is canceled mid-batch; the caller stops processing further batches
  and returns the cancellation error to the job completion handler
- readBoolConfig and readInt64Config now log unexpected ConfigValue
  types at V(1) for debugging, consistent with readStringConfig

* fix: propagate errors in lifecycle cleanup helpers and use correct delete marker key

- cleanupDeleteMarkers: return error on ctx cancellation and SeaweedList
  failures instead of silently continuing
- abortIncompleteMPUs: log SeaweedList errors instead of discarding
- isDeleteMarker: use ExtDeleteMarkerKey ("Seaweed-X-Amz-Delete-Marker")
  instead of ExtLatestVersionIsDeleteMarker which is for the parent entry
- batchSize cap: use math.MaxInt instead of math.MaxInt32

* fix: propagate ctx cancellation from abortIncompleteMPUs and log unrecognized bool strings

- abortIncompleteMPUs now returns (aborted, errors, ctxErr) matching
  cleanupDeleteMarkers; caller stops on cancellation or listing failure
- readBoolConfig logs unrecognized string values before falling back

* fix: shared per-bucket budget across lifecycle phases and allow cleanup without expired objects

- Thread a shared remaining counter through TTL deletion, delete marker
  cleanup, and MPU abort so the total operations per bucket never exceed
  MaxDeletesPerBucket
- Remove early return when no TTL-expired objects found so delete marker
  cleanup and MPU abort still run
- Add NOTE on cleanupDeleteMarkers about version-safety limitation

---------

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2026-03-26 19:28:13 -07:00
Chris Lu
67a551fd62 admin UI: add anonymous user creation checkbox (#8773)
Add an "Anonymous" checkbox next to the username field in the Create User
modal. When checked, the username is set to "anonymous" and the credential
generation checkbox is disabled since anonymous users do not need keys.

The checkbox is only shown when no anonymous user exists yet. The
manage-access-keys button in the users table is hidden for the anonymous
user.
2026-03-25 21:24:10 -07:00
Anton
7f0cf72574 admin/plugin: delete job_detail files when jobs are pruned from memory (#8722)
* admin/plugin: delete job_detail files when jobs are pruned from memory

pruneTrackedJobsLocked evicts the oldest terminal jobs from the in-memory
tracker when the total exceeds maxTrackedJobsTotal (1000). However the
dedicated per-job detail files in jobs/job_details/ were never removed,
causing them to accumulate indefinitely on disk.

Add ConfigStore.DeleteJobDetail and call it from pruneTrackedJobsLocked so
that the file is cleaned up together with the in-memory entry. Deletion
errors are logged at verbosity level 2 and do not abort the prune.

* admin/plugin: add test for DeleteJobDetail

---------

Co-authored-by: Anton Ustyugov <anton@devops>
Co-authored-by: Chris Lu <chris.lu@gmail.com>
2026-03-21 14:23:32 -07:00
Anton
90277ceed5 admin/plugin: migrate inline job details asynchronously to avoid slow startup (#8721)
loadPersistedMonitorState performed a backward-compatibility migration that
wrote every job with inline rich detail fields to a dedicated per-job detail
file synchronously during startup. On deployments with many historical jobs
(e.g. 1000+) stored on distributed block storage (e.g. Longhorn), each
individual file write requires an fsync round-trip, making startup
disproportionately slow and causing readiness/liveness probe failures.

The in-memory state is populated correctly before the goroutine is started
because stripTrackedJobDetailFields is still called in-place; only the disk
writes are deferred. A completion log message at V(1) is emitted once the
background migration finishes.

Co-authored-by: Anton Ustyugov <anton@devops>
2026-03-21 14:18:42 -07:00
Anton
ae170f1fbb admin: fix manual job run to use scheduler dispatch with capacity management and retry (#8720)
RunPluginJobTypeAPI previously executed proposals with a naive sequential loop
calling ExecutePluginJob per proposal. This had two bugs:

1. Double-lock: RunPluginJobTypeAPI held pluginLock while calling ExecutePluginJob,
   which tried to re-acquire the same lock for every job in the loop.

2. No capacity management: proposals were fired directly at workers without
   reserveScheduledExecutor, so every job beyond the worker concurrency limit
   received an immediate at_capacity error with no retry or backoff.

Fix: add Plugin.DispatchProposals which reuses dispatchScheduledProposals - the
same code path the scheduler loop uses - with executor reservation, configurable
concurrency, and per-job retry with backoff. RunPluginJobTypeAPI now calls
DispatchPluginProposals (a thin AdminServer wrapper) after holding pluginLock once.

Co-authored-by: Anton Ustyugov <anton@devops>
2026-03-21 14:09:31 -07:00
Chris Lu
6ccda3e809 fix(s3): allow deleting the anonymous user from admin webui (#8706)
Remove the block that prevented deleting the "anonymous" identity
and stop auto-creating it when absent.  If no anonymous identity
exists (or it is disabled), LookupAnonymous returns not-found and
both auth paths return ErrAccessDenied for anonymous requests.

To enable anonymous access, explicitly create the "anonymous" user.
To revoke it, delete the user like any other identity.

Closes #8694
2026-03-19 18:10:20 -07:00
Jayshan Raghunandan
1f1eac4f08 feat: improve aio support for admin/volume ingress and fix UI links (#8679)
* feat: improve allInOne mode support for admin/volume ingress and fix master UI links

- Add allInOne support to admin ingress template, matching the pattern
  used by filer and s3 ingress templates (or-based enablement with
  ternary service name selection)
- Add allInOne support to volume ingress template, which previously
  required volume.enabled even when the volume server runs within the
  allInOne pod
- Expose admin ports in allInOne deployment and service when
  allInOne.admin.enabled is set
- Add allInOne.admin config section to values.yaml (enabled by default,
  ports inherit from admin.*)
- Fix legacy master UI templates (master.html, masterNewRaft.html) to
  prefer PublicUrl over internal Url when linking to volume server UI.
  The new admin UI already handles this correctly.

* fix: revert admin allInOne changes and fix PublicUrl in admin dashboard

The admin binary (`weed admin`) is a separate process that cannot run
inside `weed server` (allInOne mode). Revert the admin-related allInOne
helm chart changes that caused 503 errors on admin ingress.

Fix bug in cluster_topology.go where VolumeServer.PublicURL was set to
node.Id (internal pod address) instead of the actual public URL. Add
public_url field to DataNodeInfo proto message so the topology gRPC
response carries the public URL set via -volume.publicUrl flag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use HTTP /dir/status to populate PublicUrl in admin dashboard

The gRPC DataNodeInfo proto does not include PublicUrl, so the admin dashboard showed internal pod IPs instead of the configured public URL.
Fetch PublicUrl from the master's /dir/status HTTP endpoint and apply it
in both GetClusterTopology and GetClusterVolumeServers code paths.

Also reverts the unnecessary proto field additions from the previous
commit and cleans up a stray blank line in all-in-one-service.yml.

* fix: apply PublicUrl link fix to masterNewRaft.html

Match the same conditional logic already applied to master.html —
prefer PublicUrl when set and different from Url.

* fix: add HTTP timeout and status check to fetchPublicUrlMap

Use a 5s-timeout client instead of http.DefaultClient to prevent
blocking indefinitely when the master is unresponsive. Also check
the HTTP status code before attempting to parse the response body.

* fix: fall back to node address when PublicUrl is empty

Prevents blank links in the admin dashboard when PublicUrl is not
configured, such as in standalone or mixed-version clusters.

* fix: log io.ReadAll error in fetchPublicUrlMap

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Chris Lu <chris.lu@gmail.com>
2026-03-18 13:20:55 -07:00