127 Commits

Author SHA1 Message Date
Chris Lu
5d43f84df7 refactor(plugin): rename detection_interval_seconds → detection_interval_minutes (#9366)
Minutes is the natural granularity for detection cadence — every
production handler already set the seconds field to a 60-multiple
(17*60, 30*60, 3600, 24*60*60). Switching to minutes drops the *60
arithmetic and matches the unit conventions used elsewhere in the
plugin worker forms.

- Proto: AdminRuntimeDefaults + AdminRuntimeConfig.detection_interval_*
  field renamed.
- Helpers: durationFromMinutes / minutesFromDuration alongside the
  existing seconds variants in plugin_scheduler.go.
- Handlers: vacuum, ec_balance, balance, erasure_coding, iceberg,
  admin_script, s3_lifecycle now declare DetectionIntervalMinutes.
- Admin: scheduler_status + types + UI templ + plugin_api.go pass
  through the new field; UI label and table cells switch to "min".
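
A minimal sketch of what the minutes helpers could look like. The names durationFromMinutes and minutesFromDuration come from the commit message; the signatures, the clamping of non-positive values, and the round-up behavior are assumptions, not the code in plugin_scheduler.go.

```go
package main

import (
	"fmt"
	"time"
)

// durationFromMinutes converts a whole-minute config value to a Duration.
// Treating non-positive input as zero is an assumption of this sketch.
func durationFromMinutes(minutes int32) time.Duration {
	if minutes <= 0 {
		return 0
	}
	return time.Duration(minutes) * time.Minute
}

// minutesFromDuration converts back to whole minutes, rounding up so a
// sub-minute remainder is not silently dropped (also an assumption).
func minutesFromDuration(d time.Duration) int32 {
	if d <= 0 {
		return 0
	}
	return int32((d + time.Minute - 1) / time.Minute)
}

func main() {
	// A handler that used to declare 17*60 seconds now declares 17 minutes.
	fmt.Println(durationFromMinutes(17))
	fmt.Println(minutesFromDuration(30 * time.Minute))
}
```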
2026-05-08 10:33:02 -07:00
Minsoo Kim
a1e5eb9dad Fix UI prefix url encoding (#9344)
* Fix filer UI navigation for URL-sensitive object prefixes

* Clarify filer UI path escaping test name

Rename the legacy filer UI path test to describe the actual behavior
being checked.

The printpath helper preserves timestamp characters that are valid in
URL path components, while the PR fix is focused on query-string
escaping for path and cursor parameters.
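
To illustrate the query-string escaping this commit refers to, here is a small sketch using net/url. The helper name listingURL and the query keys "path" and "cursor" are assumptions for illustration; the filer UI code itself may build its links differently.

```go
package main

import (
	"fmt"
	"net/url"
)

// listingURL is a hypothetical helper that builds a directory listing link.
// url.Values.Encode escapes characters such as '&', '=' and '/' so a prefix
// containing them cannot be misread as extra query parameters.
func listingURL(base, path, cursor string) string {
	q := url.Values{}
	q.Set("path", path)
	if cursor != "" {
		q.Set("cursor", cursor)
	}
	return base + "?" + q.Encode()
}

func main() {
	fmt.Println(listingURL("/", "/buckets/a&b=c/", "x y"))
}
```

Note that Encode sorts keys, so "cursor" appears before "path" in the result.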
2026-05-06 19:14:36 -07:00
Chris Lu
7b0b64db65 fix(admin/view): wrap plugin history URL with basePath (#9341)
Plugin tabs/sub-tabs use history.pushState/replaceState to keep the
URL bar in sync with the active view, but updateURL fed it the raw
output of buildPluginURL ("/plugin/lanes/<lane>/..."). Under a
urlPrefix deployment this drops the prefix, so reloading the page
hit /plugin/... directly and 404'd at the proxy.

Wrap with basePath() so the rewritten URL keeps the deployment
prefix.

Reported at #9240.
2026-05-06 15:25:06 -07:00
Chris Lu
f2c3bd7b77 fix(admin/view): define basePath in plugin IIFE scopes (#9298)
The plugin.templ and plugin_lane.templ components use basePath() in their IIFE
(Immediately Invoked Function Expression) scopes to handle subdirectory
deployments. However, basePath was not defined locally, causing "basePath is
not defined" errors when accessing plugin pages.

Added local basePath function definitions in both files, matching the pattern
from admin.js. This function checks window.__BASE_PATH__ (set by the layout
during page initialization) and prepends it to API paths.
2026-05-01 18:22:39 -07:00
Chris Lu
14cd426cf9 templ 2026-04-29 16:08:38 -07:00
Parviz Miriyev
e2f96687ff fix(admin): use protocol-relative URLs for component links so HTTPS clusters don't break clicks (#9256)
* fix(admin): use protocol-relative URLs for component links

Hardcoded http:// in admin UI templates breaks browser-initiated clicks
to master / volume / filer / EC shard / Iceberg REST URLs whenever the
target component runs HTTPS-only via security.toml [https.X] sections.
The browser sends plain HTTP to a TLS-only endpoint and gets 400
"client sent an HTTP request to an HTTPS server".

Same root pattern as #9227 (admin's own backend /dir/status fetch);
this PR is the browser-facing equivalent.

Replace fmt.Sprintf("http://%s...") with fmt.Sprintf("//%s...") and the
JS-string '<a href="http://' with '<a href="//' so the browser uses the
same scheme as the page hosting the link. Backwards compatible:
  - HTTPS-only deployments: links now work
  - HTTP-only deployments: identical behavior to before
  - Mixed: edge case, addressed by future per-component public-URL work
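
The substitution described above amounts to the following, shown as a sketch. The helper name componentLink and the example host are hypothetical; only the "//%s" format string is taken from the commit message.

```go
package main

import "fmt"

// componentLink is a hypothetical helper. It emits a protocol-relative URL,
// so the browser reuses the scheme (http or https) of the page that hosts
// the link instead of a hardcoded "http://".
func componentLink(hostPort, path string) string {
	return fmt.Sprintf("//%s%s", hostPort, path)
}

func main() {
	fmt.Println(componentLink("master1:9333", "/dir/status"))
}
```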

Affected templates (9 files), each kept in lockstep with its generated
_templ.go sibling so reviewers don't need to run templ generate:

- weed/admin/view/app/admin.templ
- weed/admin/view/app/cluster_filers.templ
- weed/admin/view/app/cluster_masters.templ (Go templ + JS modal)
- weed/admin/view/app/cluster_volume_servers.templ (Go templ + JS modal)
- weed/admin/view/app/cluster_volumes.templ
- weed/admin/view/app/ec_volume_details.templ
- weed/admin/view/app/volume_details.templ
- weed/admin/view/app/iceberg_catalog.templ
- weed/admin/view/app/s3tables_buckets.templ

17 link constructions total, +32/-32 lines.

* fix(admin): protocol-relative URLs in iceberg + s3tables JS overrides

Per Gemini code review on this PR: the JS scripts in iceberg_catalog
and s3tables_buckets templates overwrite the href attribute of the
"Open Iceberg REST" links after page load, replacing the
protocol-relative URL set by the templ render with a hardcoded
http://<host>:<port>/v1/config.

Apply the same protocol-relative fix to the JS template literals so
they don't undo the templ-side change. Browser uses the page scheme
(http or https) to fill in the protocol.

Mirrored in iceberg_catalog_templ.go and s3tables_buckets_templ.go.

* fix(admin): displayed Iceberg endpoint scheme follows page protocol

Per CodeRabbit review on this PR: the on-page guidance text in iceberg
and s3tables templates still showed a literal `http://` even after the
clickable link was switched to a protocol-relative URL. In HTTPS-only
deployments operators see `http://host:8181/v1` as the suggested
endpoint, copy it, and get a broken connection.

Wrap the scheme in <span id="iceberg-protocol"> (and the s3tables
counterpart) and have the existing inline script set its innerText to
window.location.protocol minus the trailing colon. Same pattern as the
existing dynamic host substitution. Mirrored in *_templ.go so reviewers
do not need templ generate.

SQL/JSON code-block examples (CREATE EXTERNAL TABLE ... ENDPOINT
'http://...', "uri": "http://..." ) are intentionally left as-is —
they are starter snippets users adapt to their environment, not
clickable or copy-paste-into-runtime values. Happy to follow up with
server-side scheme threading if requested.
2026-04-27 23:10:11 -07:00
Chris Lu
fa492a9eed fix(admin): wrap plugin URLs with basePath for subdir deployments
Two more spots that broke under a subdirectory deployment:

- plugin.templ pluginRequest() called fetch(url) with relative API
  paths from 14+ callers; wrap once inside the helper so they all
  honor window.__BASE_PATH__.
- plugin_lane.templ generated <a href="/plugin/configuration?job=...">
  with an absolute path; wrap with basePath() so the link stays
  inside the deployment prefix.

Follow-up to a6adf530c.
2026-04-27 09:04:12 -07:00
Chris Lu
3ea489d013 fix(admin): wrap plugin lane fetch URL with basePath
Plugin lane page fetches API endpoints with raw absolute URLs, breaking
deployments under a subdirectory. Wrap the fetch URL with basePath() so
window.__BASE_PATH__ is honored, matching other admin pages.

Addresses https://github.com/seaweedfs/seaweedfs/issues/9240
2026-04-27 09:00:29 -07:00
faspix
0fcd5173be fix(admin): use basePath for API fetches when urlPrefix is set (#9197)
* fix(admin): use basePath for API fetches when urlPrefix is set

* fix(admin): drop duplicate iam-utils script on Groups page

* fix(admin): route topics page fetches through basePath

The Topics page missed two fetch() calls that still used root-relative
URLs, so create-topic and view-details still broke when -urlPrefix was
set.

---------

Co-authored-by: Maksim Babkou <maksim.babkou@innovatrics.com>
Co-authored-by: Chris Lu <chris.lu@gmail.com>
2026-04-23 11:55:07 -07:00
Chris Lu
46b801aedb fix(admin): list all masters and dedupe EC file counts in dashboard (#9093)
* fix(admin): list all masters and dedupe EC file counts in dashboard

Dashboard -> Master Nodes only ever showed the currently connected master
because getMasterNodesStatus hard-coded a single entry. Replace it with a
RaftListClusterServers call that returns every master in the raft group and
tags the real leader, falling back to the current master only if the raft
call fails.

Buckets -> Object Store Buckets could render 0 objects for a bucket backed
by an EC volume. Every shard holder reports the same whole-volume
file_count (read from the replicated .ecx), so the first-seen value wins;
if that first node had not yet finished loading .ecx it reported 0 and
pinned the aggregate at 0. Take the max across reporting nodes instead.

The dashboard header total_files also dropped after volumes were converted
to erasure coding because getTopologyViaGRPC never folded EC file_count
into topology.TotalFiles. Aggregate it with the same max/sum dedupe.
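
A rough sketch of the max/sum dedupe described above: take the max file_count per EC volume across reporting nodes, then sum across volumes. The type and function names are hypothetical and the real admin code works on topology protobufs rather than this simplified struct.

```go
package main

import "fmt"

// ecShardReport is a simplified, hypothetical view of what one shard holder
// reports: the whole-volume file count for an EC volume.
type ecShardReport struct {
	VolumeID  uint32
	FileCount uint64
}

// dedupeECFileCounts keeps the max count per volume, so a node that has not
// finished loading .ecx (and reports 0) cannot pin the aggregate at 0.
func dedupeECFileCounts(reports []ecShardReport) map[uint32]uint64 {
	counts := make(map[uint32]uint64)
	for _, r := range reports {
		if cur, seen := counts[r.VolumeID]; !seen || r.FileCount > cur {
			counts[r.VolumeID] = r.FileCount
		}
	}
	return counts
}

// totalECFiles sums the deduplicated per-volume counts.
func totalECFiles(counts map[uint32]uint64) uint64 {
	var total uint64
	for _, c := range counts {
		total += c
	}
	return total
}

func main() {
	reports := []ecShardReport{{7, 0}, {7, 120}, {7, 120}, {9, 30}}
	counts := dedupeECFileCounts(reports)
	fmt.Println(counts[7], totalECFiles(counts))
}
```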

* fix(admin): address PR review comments

- bound RaftListClusterServers with a 3s timeout so the dashboard endpoint
  cannot hang on a stalled master
- pre-validate raft addresses with net.SplitHostPort before calling
  pb.GrpcAddressToServerAddress, which otherwise glog.Fatalf's on a
  malformed entry and would crash the admin process
- when raft is unreachable, mark the fallback master as not-leader rather
  than claiming leadership the code cannot verify
- warn when summed EC delete_count exceeds file_count while folding into
  topology.TotalFiles, matching collectCollectionStats
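
The address pre-validation could look roughly like this. net.SplitHostPort is the standard-library call named in the commit; the helper name validRaftAddress and the extra port-range check are assumptions of this sketch.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// validRaftAddress is a hypothetical guard that rejects malformed entries
// before they reach a conversion routine that would abort the process.
func validRaftAddress(addr string) bool {
	host, port, err := net.SplitHostPort(addr)
	if err != nil || host == "" {
		return false
	}
	n, err := strconv.Atoi(port)
	return err == nil && n > 0 && n <= 65535
}

func main() {
	for _, addr := range []string{"master1:19333", "master1", ":19333"} {
		fmt.Println(addr, validRaftAddress(addr))
	}
}
```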

* fix(admin): distinguish empty raft response from RPC failure

When RaftListClusterServers returns successfully with no servers, raft is
not initialized (standalone/non-raft cluster), so the single fallback
master is the leader. Only treat the fallback as a non-leader when the
RPC actually failed.

* fix(admin): remove misleading Objects column from S3 buckets page

The bucket "Objects" column displayed needle counts from volume
collection stats, not actual S3 object counts. This is confusing
because a single S3 object can span multiple needles (multipart
uploads, versions) and the count is inaccurate for EC volumes.

Remove the ObjectCount field from S3Bucket, the Objects table column,
the sort-by-objects handler, the detail-view row, and both CSV export
references.

* fix(admin): correct cell indexes in fallback bucket CSV export

After the Objects column was removed, the fallback CSV exporter in
admin.js still used stale cell indexes: cells[1] mapped to Owner
(not Created), cells[2] to Created (not Size), cells[3] to Logical
Size (not Quota). Align all indexes with the current table column
order and include Owner, Logical Size, and Physical Size.
2026-04-15 22:28:54 -07:00
Chris Lu
512912cbb8 Update plugin_templ.go 2026-04-13 13:10:03 -07:00
Chris Lu
ae08e77979 fix(scheduler): give worker tasks a real per-attempt execution deadline (#9041)
* fix(scheduler): give worker tasks a real per-attempt execution deadline

The plugin scheduler derived the per-attempt execution deadline as
DetectionTimeoutSeconds * 2, which capped every worker task at twice
the cluster-scan budget regardless of actual work. For volume_balance
batches this was 240s — far too short for 20 large volume copies, so
every attempt died at "context deadline exceeded" and all in-flight
sub-RPCs surfaced as "context canceled". Retries restarted from move 1
and hit the same wall.

Add an explicit ExecutionTimeoutSeconds field to the plugin proto and
make each handler declare its own baseline (1800s for vacuum, balance,
EC; 3600s for iceberg). Size-aware handlers also emit an
estimated_runtime_seconds parameter on each proposal so the scheduler
extends the per-attempt deadline based on actual workload:

- volume_balance batch: max(largest single move, total / concurrency)
  at 5 min/GB, so a skewed batch with one big volume isn't averaged
  away.
- volume_balance single, vacuum (already), erasure_coding (10 min/GB),
  ec_balance (5 min/GB): per-volume budgets.

admin_script and iceberg keep the configurable handler default since
their workloads are opaque to the detector.

* fix(scheduler): apply descriptor defaults to existing persisted configs

The previous commit added execution_timeout_seconds to the proto and
each handler's descriptor defaults, but two paths still left existing
deployments broken:

1. deriveSchedulerAdminRuntime returned stored AdminRuntime configs
   as-is. Persisted configs from older versions have no
   execution_timeout_seconds, so the scheduler fell back to the 90s
   default — worse than the prior 240s behavior. Overlay descriptor
   defaults for any zero numeric fields when loading.

2. The admin form did not round-trip execution_timeout_seconds, so a
   normal save would clear it back to zero. Add the input field, the
   fillAdminSettings/collectAdminSettings hooks, and as defense in
   depth reapply descriptor defaults in UpdatePluginJobTypeConfigAPI
   before persisting so a stale form can never silently clobber a
   baseline.
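
As an illustration of the overlay in item 1, here is a sketch that fills zero numeric fields from descriptor defaults. The struct and function names are hypothetical stand-ins for the proto-generated types; only the field names DetectionTimeoutSeconds and ExecutionTimeoutSeconds appear in the commit messages.

```go
package main

import "fmt"

// runtimeConfig is a simplified, hypothetical stand-in for the persisted
// admin runtime config.
type runtimeConfig struct {
	DetectionTimeoutSeconds int32
	ExecutionTimeoutSeconds int32
}

// overlayDefaults returns stored with every zero numeric field replaced by
// the descriptor default, leaving explicitly set values untouched.
func overlayDefaults(stored, defaults runtimeConfig) runtimeConfig {
	if stored.DetectionTimeoutSeconds == 0 {
		stored.DetectionTimeoutSeconds = defaults.DetectionTimeoutSeconds
	}
	if stored.ExecutionTimeoutSeconds == 0 {
		stored.ExecutionTimeoutSeconds = defaults.ExecutionTimeoutSeconds
	}
	return stored
}

func main() {
	// A config persisted by an older version has no execution timeout.
	stored := runtimeConfig{DetectionTimeoutSeconds: 120}
	defaults := runtimeConfig{DetectionTimeoutSeconds: 60, ExecutionTimeoutSeconds: 1800}
	fmt.Printf("%+v\n", overlayDefaults(stored, defaults))
}
```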

* fix(volume_balance): account for partial scheduling rounds in batch estimate

With N moves and C slots, the busiest slot processes ceil(N/C) moves,
not N/C. Dividing total seconds by C underestimates wall-clock time
whenever N is not a multiple of C — e.g. 6 moves at concurrency 5
needs 2 rounds, not 1.2. Use avg * ceil(N/C) so partial rounds are
counted as full ones.
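
The arithmetic above can be sketched as follows. The function name is hypothetical, and keeping the "largest single move" lower bound from the earlier commit is an assumption about how the two rules combine.

```go
package main

import "fmt"

// estimateBatchSeconds is a hypothetical estimator: average move time
// multiplied by ceil(N/C) rounds, never less than the largest single move.
func estimateBatchSeconds(moveSeconds []int64, concurrency int) int64 {
	n := len(moveSeconds)
	if n == 0 {
		return 0
	}
	if concurrency < 1 {
		concurrency = 1
	}
	var total, largest int64
	for _, s := range moveSeconds {
		total += s
		if s > largest {
			largest = s
		}
	}
	rounds := int64((n + concurrency - 1) / concurrency) // ceil(N/C)
	estimate := (total / int64(n)) * rounds
	if largest > estimate {
		estimate = largest
	}
	return estimate
}

func main() {
	// 6 equal moves at concurrency 5 need 2 rounds, not 1.2.
	moves := []int64{300, 300, 300, 300, 300, 300}
	fmt.Println(estimateBatchSeconds(moves, 5))
}
```

With these inputs, dividing the total by the concurrency would give 360 seconds, while counting two full rounds gives 600.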

* fix(volume_balance): scale minBudget per wave instead of per move

Orchestration overhead (setup/teardown for the parallel move runner)
happens once per wave, not once per move. Use numRounds*60 as the
floor instead of len(moves)*60 so the minimum doesn't inflate
linearly with batch size when individual moves are tiny.
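
A small sketch of the per-wave floor, assuming 60 seconds of overhead per round as stated above. The function name minBudgetSeconds is hypothetical.

```go
package main

import "fmt"

// minBudgetSeconds is a hypothetical helper: the floor scales with the
// number of scheduling rounds (waves), not with the number of moves.
func minBudgetSeconds(numMoves, concurrency int) int64 {
	if numMoves <= 0 {
		return 0
	}
	if concurrency < 1 {
		concurrency = 1
	}
	numRounds := (numMoves + concurrency - 1) / concurrency
	return int64(numRounds) * 60
}

func main() {
	// 20 tiny moves at concurrency 5: 4 waves, so 240s rather than 1200s.
	fmt.Println(minBudgetSeconds(20, 5))
}
```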
2026-04-13 01:15:53 -07:00
Moray Baruh
41ff105f47 object_store_users: fix specific bucket admin permission (#9014)
Fix an issue where selecting Specific Buckets with Admin permission
while creating/editing an object store user would grant Admin permission on all
buckets.
2026-04-10 18:10:05 -07:00
Chris Lu
d37b592bc4 Update object_store_users_templ.go 2026-04-04 11:52:57 -07:00
Chris Lu
d1823d3784 fix(s3): include static identities in listing operations (#8903)
* fix(s3): include static identities in listing operations

Static identities loaded from -s3.config file were only stored in the
S3 API server's in-memory state. Listing operations (s3.configure shell
command, aws iam list-users) queried the credential manager which only
returned dynamic identities from the backend store.

Register static identities with the credential manager after loading
so they are included in LoadConfiguration and ListUsers results, and
filtered out before SaveConfiguration to avoid persisting them to the
dynamic store.

Fixes https://github.com/seaweedfs/seaweedfs/discussions/8896

* fix: avoid mutating caller's config and defensive copies

- SaveConfiguration: use shallow struct copy instead of mutating the
  caller's config.Identities field
- SetStaticIdentities: skip nil entries to avoid panics
- GetStaticIdentities: defensively copy PolicyNames slice to avoid
  aliasing the original
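
The defensive copy mentioned in the last bullet follows a common Go pattern, sketched here with a hypothetical identity struct rather than the real iam_pb.Identity type.

```go
package main

import "fmt"

// identity is a simplified, hypothetical stand-in for a static identity.
type identity struct {
	Name        string
	PolicyNames []string
}

// copyIdentity returns a copy whose PolicyNames slice has its own backing
// array, so callers cannot mutate the stored original through it.
func copyIdentity(src *identity) *identity {
	if src == nil {
		return nil
	}
	dst := *src
	dst.PolicyNames = append([]string(nil), src.PolicyNames...)
	return &dst
}

func main() {
	orig := &identity{Name: "static-admin", PolicyNames: []string{"read"}}
	cp := copyIdentity(orig)
	cp.PolicyNames[0] = "write"
	fmt.Println(orig.PolicyNames[0], cp.PolicyNames[0])
}
```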

* fix: filter nil static identities and sync on config reload

- SetStaticIdentities: filter nil entries from the stored slice (not
  just from staticNames) to prevent panics in LoadConfiguration/ListUsers
- Extract updateCredentialManagerStaticIdentities helper and call it
  from both startup and the grace.OnReload handler so the credential
  manager's static snapshot stays current after config file reloads

* fix: add mutex for static identity fields and fix ListUsers for store callers

- Add sync.RWMutex to protect staticIdentities/staticNames against
  concurrent reads during config reload
- Revert CredentialManager.ListUsers to return only store users, since
  internal callers (e.g. DeletePolicy) look up each user in the store
  and fail on non-existent static entries
- Merge static usernames in the filer gRPC ListUsers handler instead,
  via the new GetStaticUsernames method
- Fix CI: TestIAMPolicyManagement/managed_policy_crud_lifecycle was
  failing because DeletePolicy iterated static users that don't exist
  in the store

* fix: show static identities in admin UI and weed shell

The admin UI and weed shell s3.configure command query the filer's
credential manager via gRPC, which is a separate instance from the S3
server's credential manager. Static identities were only registered
on the S3 server's credential manager, so they never appeared in the
filer's responses.

- Add CredentialManager.LoadS3ConfigFile to parse a static S3 config
  file and register its identities
- Add FilerOptions.s3ConfigFile so the filer can load the same static
  config that the S3 server uses
- Wire s3ConfigFile through in weed mini and weed server modes
- Merge static usernames in filer gRPC ListUsers handler
- Add CredentialManager.GetStaticUsernames helper
- Add sync.RWMutex to protect concurrent access to static identity
  fields
- Avoid importing weed/filer from weed/credential (which pulled in
  filer store init() registrations and broke test isolation)
- Add docker/compose/s3_static_users_example.json

* fix(admin): make static users read-only in admin UI

Static users loaded from the -s3.config file should not be editable
or deletable through the admin UI since they are managed via the
config file.

- Add IsStatic field to ObjectStoreUser, set from credential manager
- Hide edit, delete, and access key buttons for static users in the
  users table template
- Show a "static" badge next to static user names
- Return 403 Forbidden from UpdateUser and DeleteUser API handlers
  when the target user is a static identity

* fix(admin): show details for static users

GetObjectStoreUserDetails called credentialManager.GetUser which only
queries the dynamic store. For static users this returned
ErrUserNotFound. Fall back to GetStaticIdentity when the store lookup
fails.

* fix(admin): load static S3 identities in admin server

The admin server has its own credential manager (gRPC store) which is
a separate instance from the S3 server's and filer's. It had no static
identity data, so IsStaticIdentity returned false (edit/delete buttons
shown) and GetStaticIdentity returned nil (details page failed).

Pass the -s3.config file path through to the admin server and call
LoadS3ConfigFile on its credential manager, matching the approach
used for the filer.

* fix: use protobuf is_static field instead of passing config file path

The previous approach passed -s3.config file path to every component
(filer, admin). This is wrong because the admin server should not need
to know about S3 config files.

Instead, add an is_static field to the Identity protobuf message.
The field is set when static identities are serialized (in
GetStaticIdentities and LoadS3ConfigFile). Any gRPC client that loads
configuration via GetConfiguration automatically sees which identities
are static, without needing the config file.

- Add is_static field (tag 8) to iam_pb.Identity proto message
- Set IsStatic=true in GetStaticIdentities and LoadS3ConfigFile
- Admin GetObjectStoreUsers reads identity.IsStatic from proto
- Admin IsStaticUser helper loads config via gRPC to check the flag
- Filer GetUser gRPC handler falls back to GetStaticIdentity
- Remove s3ConfigFile from AdminOptions and NewAdminServer signature
2026-04-03 20:01:28 -07:00
Chris Lu
995dfc4d5d chore: remove ~50k lines of unreachable dead code (#8913)
* chore: remove unreachable dead code across the codebase

Remove ~50,000 lines of unreachable code identified by static analysis.

Major removals:
- weed/filer/redis_lua: entire unused Redis Lua filer store implementation
- weed/wdclient/net2, resource_pool: unused connection/resource pool packages
- weed/plugin/worker/lifecycle: unused lifecycle plugin worker
- weed/s3api: unused S3 policy templates, presigned URL IAM, streaming copy,
  multipart IAM, key rotation, and various SSE helper functions
- weed/mq/kafka: unused partition mapping, compression, schema, and protocol functions
- weed/mq/offset: unused SQL storage and migration code
- weed/worker: unused registry, task, and monitoring functions
- weed/query: unused SQL engine, parquet scanner, and type functions
- weed/shell: unused EC proportional rebalance functions
- weed/storage/erasure_coding/distribution: unused distribution analysis functions
- Individual unreachable functions removed from 150+ files across admin,
  credential, filer, iam, kms, mount, mq, operation, pb, s3api, server,
  shell, storage, topology, and util packages

* fix(s3): reset shared memory store in IAM test to prevent flaky failure

TestLoadIAMManagerFromConfig_EmptyConfigWithFallbackKey was flaky because
the MemoryStore credential backend is a singleton registered via init().
Earlier tests that create anonymous identities pollute the shared store,
causing LookupAnonymous() to unexpectedly return true.

Fix by calling Reset() on the memory store before the test runs.

* style: run gofmt on changed files

* fix: restore KMS functions used by integration tests

* fix(plugin): prevent panic on send to closed worker session channel

The Plugin.sendToWorker method could panic with "send on closed channel"
when a worker disconnected while a message was being sent. The race was
between streamSession.close() closing the outgoing channel and sendToWorker
writing to it concurrently.

Add a done channel to streamSession that is closed before the outgoing
channel, and check it in sendToWorker's select to safely detect closed
sessions without panicking.
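
A sketch of the done-channel pattern, under stated assumptions. The type name streamSession and the fields outgoing and done follow the commit message; everything else is illustrative. As a simplification, this version never closes the outgoing channel at all and relies on done alone, which differs from the commit's description of closing done before outgoing.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errSessionClosed = errors.New("worker session closed")

// streamSession is a simplified, hypothetical version of the worker session.
type streamSession struct {
	outgoing  chan string
	done      chan struct{}
	closeOnce sync.Once
}

func newStreamSession(buffer int) *streamSession {
	return &streamSession{outgoing: make(chan string, buffer), done: make(chan struct{})}
}

// close signals shutdown. Only done is closed, and only once.
func (s *streamSession) close() {
	s.closeOnce.Do(func() { close(s.done) })
}

// send returns an error instead of panicking when the session is closed.
func (s *streamSession) send(msg string) error {
	// Check done first so a closed session is rejected even if the
	// buffer still has room.
	select {
	case <-s.done:
		return errSessionClosed
	default:
	}
	select {
	case <-s.done:
		return errSessionClosed
	case s.outgoing <- msg:
		return nil
	}
}

func main() {
	s := newStreamSession(2)
	fmt.Println(s.send("job-1"))
	s.close()
	fmt.Println(s.send("job-2"))
}
```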
2026-04-03 16:04:27 -07:00
Chris Lu
888c32cbde fix(admin): respect urlPrefix in S3 bucket and S3Tables navigation links (#8885)
* fix(admin): respect urlPrefix in S3 bucket and S3Tables navigation links (#8884)

Several admin UI templates used hardcoded URLs (templ.SafeURL) instead of
dash.PUrl(ctx, ...) for navigation links, causing 404 errors when the
admin is deployed with --urlPrefix.

Fixed in: s3_buckets.templ, s3tables_buckets.templ, s3tables_tables.templ

* fix(admin): URL-escape bucketName in S3Tables navigation links

Add url.PathEscape(bucketName) for consistency and correctness in
s3tables_tables.templ (back-to-namespaces link) and s3tables_buckets.templ
(namespace link), matching the escaping already used in the table details link.
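
For illustration, escaping a bucket name in a prefixed link might look like this. url.PathEscape is the standard-library call named above; the helper name and the route layout are hypothetical and do not reproduce dash.PUrl.

```go
package main

import (
	"fmt"
	"net/url"
)

// namespaceLink is a hypothetical helper. It escapes the bucket name as a
// single path segment and keeps the link under the deployment prefix.
func namespaceLink(urlPrefix, bucketName string) string {
	return urlPrefix + "/object-store/s3tables/buckets/" + url.PathEscape(bucketName) + "/namespaces"
}

func main() {
	fmt.Println(namespaceLink("/seaweedfs", "my bucket/2026"))
}
```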
2026-04-02 11:54:19 -07:00
Chris Lu
8c8d21d7e2 Update plugin_lane_templ.go 2026-03-26 23:11:10 -07:00
Chris Lu
cc2f790c73 feat: add per-lane scheduler status API and lane worker UI pages
- GET /api/plugin/lanes returns all lanes with status and job types
- GET /api/plugin/workers?lane=X filters workers by lane
- GET /api/plugin/scheduler-states?lane=X filters job types by lane
- GET /api/plugin/scheduler-status?lane=X returns lane-scoped status
- GET /plugin/lanes/{lane}/workers renders per-lane worker page
- SchedulerJobTypeState now includes a "lane" field

The lane worker pages show scheduler status, job type configuration,
and connected workers scoped to a single lane, with links back to
the main plugin overview.
2026-03-26 19:33:42 -07:00
Chris Lu
d95df76bca feat: separate scheduler lanes for iceberg, lifecycle, and volume management (#8787)
* feat: introduce scheduler lanes for independent per-workload scheduling

Split the single plugin scheduler loop into independent per-lane
goroutines so that volume management, iceberg compaction, and lifecycle
operations never block each other.

Each lane has its own:
- Goroutine (laneSchedulerLoop)
- Wake channel for immediate scheduling
- Admin lock scope (e.g. "plugin scheduler:default")
- Configurable idle sleep duration
- Loop state tracking

Three lanes are defined:
- default: vacuum, volume_balance, ec_balance, erasure_coding, admin_script
- iceberg: iceberg_maintenance
- lifecycle: s3_lifecycle (new, handler coming in a later commit)

Job types are mapped to lanes via a hardcoded map with LaneDefault as
the fallback. The SchedulerJobTypeState and SchedulerStatus types now
include a Lane field for API consumers.
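
The job-type-to-lane lookup described above can be sketched as a map with a fallback. LaneDefault and jobTypeLaneMap are named in the commit messages; the Lane type, the other constant names, and laneForJobType are assumptions of this sketch.

```go
package main

import "fmt"

// Lane identifies an independent scheduler lane.
type Lane string

const (
	LaneDefault   Lane = "default"
	LaneIceberg   Lane = "iceberg"
	LaneLifecycle Lane = "lifecycle"
)

// jobTypeLaneMap assigns job types to lanes, using the lane membership
// listed in the commit message.
var jobTypeLaneMap = map[string]Lane{
	"vacuum":              LaneDefault,
	"volume_balance":      LaneDefault,
	"ec_balance":          LaneDefault,
	"erasure_coding":      LaneDefault,
	"admin_script":        LaneDefault,
	"iceberg_maintenance": LaneIceberg,
	"s3_lifecycle":        LaneLifecycle,
}

// laneForJobType falls back to LaneDefault for unmapped job types.
func laneForJobType(jobType string) Lane {
	if lane, ok := jobTypeLaneMap[jobType]; ok {
		return lane
	}
	return LaneDefault
}

func main() {
	fmt.Println(laneForJobType("iceberg_maintenance"), laneForJobType("unknown_job"))
}
```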

* feat: per-lane execution reservation pools for resource isolation

Each scheduler lane now maintains its own execution reservation map
so that a busy volume lane cannot consume execution slots needed by
iceberg or lifecycle lanes. The per-lane pool is used by default when
dispatching jobs through the lane scheduler; the global pool remains
as a fallback for the public DispatchProposals API.

* feat: add per-lane scheduler status API and lane worker UI pages

- GET /api/plugin/lanes returns all lanes with status and job types
- GET /api/plugin/workers?lane=X filters workers by lane
- GET /api/plugin/scheduler-states?lane=X filters job types by lane
- GET /api/plugin/scheduler-status?lane=X returns lane-scoped status
- GET /plugin/lanes/{lane}/workers renders per-lane worker page
- SchedulerJobTypeState now includes a "lane" field

The lane worker pages show scheduler status, job type configuration,
and connected workers scoped to a single lane, with links back to
the main plugin overview.

* feat: add s3_lifecycle worker handler for object store lifecycle management

Implements a full plugin worker handler for S3 lifecycle management,
assigned to the new "lifecycle" scheduler lane.

Detection phase:
- Reads filer.conf to find buckets with TTL lifecycle rules
- Creates one job proposal per bucket with active lifecycle rules
- Supports bucket_filter wildcard pattern from admin config

Execution phase:
- Walks the bucket directory tree breadth-first
- Identifies expired objects by checking TtlSec + Crtime < now
- Deletes expired objects in configurable batches
- Reports progress with scanned/expired/error counts
- Supports dry_run mode for safe testing

Configurable via admin UI:
- batch_size: entries per filer listing page (default 1000)
- max_deletes_per_bucket: safety cap per run (default 10000)
- dry_run: detect without deleting
- delete_marker_cleanup: clean expired delete markers
- abort_mpu_days: abort stale multipart uploads

The handler integrates with the existing PutBucketLifecycle flow which
sets TtlSec on entries via filer.conf path rules.
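
The expiry test in the execution phase (TtlSec + Crtime < now) can be sketched like this. The entry struct is a simplified, hypothetical stand-in for a filer entry, with Crtime held as Unix seconds; treating a non-positive TtlSec as "no TTL" is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// entry is a simplified, hypothetical view of a filer entry's attributes.
type entry struct {
	Name   string
	Crtime int64 // creation time, Unix seconds
	TtlSec int32 // 0 means no TTL in this sketch
}

// isExpired reports whether the entry's TTL elapsed strictly before now.
func isExpired(e entry, nowUnix int64) bool {
	if e.TtlSec <= 0 {
		return false
	}
	return e.Crtime+int64(e.TtlSec) < nowUnix
}

func main() {
	now := time.Now().Unix()
	old := entry{Name: "logs/a.txt", Crtime: now - 7200, TtlSec: 3600}
	fresh := entry{Name: "logs/b.txt", Crtime: now - 60, TtlSec: 3600}
	fmt.Println(isExpired(old, now), isExpired(fresh, now))
}
```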

* feat: add per-lane submenu items under Workers sidebar menu

Replace the single "Workers" sidebar link with a collapsible submenu
containing three lane entries:
- Default (volume management + admin scripts) -> /plugin
- Iceberg (table compaction) -> /plugin/lanes/iceberg/workers
- Lifecycle (S3 object expiration) -> /plugin/lanes/lifecycle/workers

The submenu auto-expands when on any /plugin page and highlights the
active lane. Icons match each lane's job type descriptor (server,
snowflake, hourglass).

* feat: scope plugin pages to their scheduler lane

The plugin overview, configuration, detection, queue, and execution
pages now filter workers, job types, scheduler states, and scheduler
status to only show data for their lane.

- Plugin() templ function accepts a lane parameter (default: "default")
- JavaScript appends ?lane= to /api/plugin/workers, /job-types,
  /scheduler-states, and /scheduler-status API calls
- GET /api/plugin/job-types now supports ?lane= filtering
- When ?job= is provided (e.g. ?job=iceberg_maintenance), the lane is
  auto-derived from the job type so the page scopes correctly

This ensures /plugin shows only default-lane workers and
/plugin/configuration?job=iceberg_maintenance scopes to the iceberg lane.

* fix: remove "Lane" from lane worker page titles and capitalize properly

"lifecycle Lane Workers" -> "Lifecycle Workers"
"iceberg Lane Workers" -> "Iceberg Workers"

* refactor: promote lane items to top-level sidebar menu entries

Move Default, Iceberg, and Lifecycle from a collapsible submenu to
direct top-level items under the WORKERS heading. Removes the
intermediate "Workers" parent link and collapse toggle.

* admin: unify plugin lane routes and handlers

* admin: filter plugin jobs and activities by lane

* admin: reuse plugin UI for worker lane pages

* fix: use ServerAddress.ToGrpcAddress() for filer connections in lifecycle handler

ClusterContext addresses use ServerAddress format (host:port.grpcPort).
Convert to the actual gRPC address via ToGrpcAddress() before dialing,
and add a Ping verification after connecting.

Fixes: "dial tcp: lookup tcp/8888.18888: unknown port"

* fix: resolve ServerAddress gRPC port in iceberg and lifecycle filer connections

ClusterContext addresses use ServerAddress format (host:httpPort.grpcPort).
Both the iceberg and lifecycle handlers now detect the compound format
and extract the gRPC port via ToGrpcAddress() before dialing. Plain
host:port addresses (e.g. from tests) are passed through unchanged.

Fixes: "dial tcp: lookup tcp/8888.18888: unknown port"
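
To show why the compound format fails when dialed directly, here is a sketch that extracts the gRPC port from a host:httpPort.grpcPort address. grpcDialAddress is a hypothetical helper written for this example; it is not the ToGrpcAddress implementation and only handles the pass-through behavior described above.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// grpcDialAddress is a hypothetical helper: "host:8888.18888" becomes
// "host:18888", while a plain "host:port" is returned unchanged.
func grpcDialAddress(addr string) string {
	host, port, err := net.SplitHostPort(addr)
	if err != nil {
		return addr
	}
	httpPort, grpcPort, found := strings.Cut(port, ".")
	if !found || httpPort == "" || grpcPort == "" {
		return addr
	}
	return net.JoinHostPort(host, grpcPort)
}

func main() {
	fmt.Println(grpcDialAddress("filer1:8888.18888"))
	fmt.Println(grpcDialAddress("filer1:18888"))
}
```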

* align url

* Potential fix for code scanning alert no. 335: Incorrect conversion between integer types

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* fix: address PR review findings across scheduler lanes and lifecycle handler

- Fix variable shadowing: rename loop var `w` to `worker` in
  GetPluginWorkersAPI to avoid shadowing the http.ResponseWriter param
- Fix stale GetSchedulerStatus: aggregate loop states across all lanes
  instead of reading never-updated legacy schedulerLoopState
- Scope InProcessJobs to lane in GetLaneSchedulerStatus
- Fix AbortMPUDays=0 treated as unset: change <= 0 to < 0 so 0 disables
- Propagate listing errors in lifecycle bucket walk instead of swallowing
- Implement DeleteMarkerCleanup: scan for S3 delete marker entries and
  remove them
- Implement AbortMPUDays: scan .uploads directory and remove stale
  multipart uploads older than the configured threshold
- Fix success determination: mark job failed when result.errors > 0
  even if no fatal error occurred
- Add regression test for jobTypeLaneMap to catch drift from handler
  registrations

* fix: guard against nil result in lifecycle completion and trim filer addresses

- Guard result dereference in completion summary: use local vars
  defaulting to 0 when result is nil to prevent panic
- Append trimmed filer addresses instead of originals so whitespace
  is not passed to the gRPC dialer

* fix: propagate ctx cancellation from deleteExpiredObjects and add config logging

- deleteExpiredObjects now returns a third error value when the context
  is canceled mid-batch; the caller stops processing further batches
  and returns the cancellation error to the job completion handler
- readBoolConfig and readInt64Config now log unexpected ConfigValue
  types at V(1) for debugging, consistent with readStringConfig

* fix: propagate errors in lifecycle cleanup helpers and use correct delete marker key

- cleanupDeleteMarkers: return error on ctx cancellation and SeaweedList
  failures instead of silently continuing
- abortIncompleteMPUs: log SeaweedList errors instead of discarding
- isDeleteMarker: use ExtDeleteMarkerKey ("Seaweed-X-Amz-Delete-Marker")
  instead of ExtLatestVersionIsDeleteMarker which is for the parent entry
- batchSize cap: use math.MaxInt instead of math.MaxInt32

* fix: propagate ctx cancellation from abortIncompleteMPUs and log unrecognized bool strings

- abortIncompleteMPUs now returns (aborted, errors, ctxErr) matching
  cleanupDeleteMarkers; caller stops on cancellation or listing failure
- readBoolConfig logs unrecognized string values before falling back

* fix: shared per-bucket budget across lifecycle phases and allow cleanup without expired objects

- Thread a shared remaining counter through TTL deletion, delete marker
  cleanup, and MPU abort so the total operations per bucket never exceed
  MaxDeletesPerBucket
- Remove early return when no TTL-expired objects found so delete marker
  cleanup and MPU abort still run
- Add NOTE on cleanupDeleteMarkers about version-safety limitation
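
The shared counter could be sketched as below. The deleteBudget type and take method are hypothetical; the point is only that all three phases draw from one remaining value, so their combined operations stay within MaxDeletesPerBucket.

```go
package main

import "fmt"

// deleteBudget is a hypothetical shared counter threaded through the TTL
// deletion, delete marker cleanup, and MPU abort phases of one bucket.
type deleteBudget struct {
	remaining int64
}

// take grants up to want operations and deducts them from the budget.
func (b *deleteBudget) take(want int64) int64 {
	if want <= 0 || b.remaining <= 0 {
		return 0
	}
	if want > b.remaining {
		want = b.remaining
	}
	b.remaining -= want
	return want
}

func main() {
	b := &deleteBudget{remaining: 10000} // max_deletes_per_bucket default
	ttl := b.take(9000)                  // TTL-expired objects
	markers := b.take(2000)              // delete markers, capped by what is left
	mpus := b.take(5)                    // stale multipart uploads
	fmt.Println(ttl, markers, mpus)
}
```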

---------

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2026-03-26 19:28:13 -07:00
Chris Lu
67a551fd62 admin UI: add anonymous user creation checkbox (#8773)
Add an "Anonymous" checkbox next to the username field in the Create User
modal. When checked, the username is set to "anonymous" and the credential
generation checkbox is disabled since anonymous users do not need keys.

The checkbox is only shown when no anonymous user exists yet. The
manage-access-keys button in the users table is hidden for the anonymous
user.
2026-03-25 21:24:10 -07:00
Chris Lu
7c83460b10 adjust template path 2026-03-16 15:29:12 -07:00
Chris Lu
e8914ac879 feat(admin): add -urlPrefix flag for subdirectory deployment (#8670)
Allow the admin server to run behind a reverse proxy under a
subdirectory by adding a -urlPrefix flag (e.g. -urlPrefix=/seaweedfs).

Closes #8646
2026-03-16 15:26:02 -07:00
Chris Lu
6fc0489dd8 feat(plugin): make page tabs and sub-tabs addressable by URLs (#8626)
* feat(plugin): make page tabs and sub-tabs addressable by URLs

Update the plugin page so that clicking tabs and sub-tabs pushes
browser history via history.pushState(), enabling bookmarkable URLs,
browser back/forward navigation, and shareable links.

URL mapping:
  - /plugin              → Overview tab
  - /plugin/configuration → Configuration sub-tab
  - /plugin/detection     → Job Detection sub-tab
  - /plugin/queue         → Job Queue sub-tab
  - /plugin/execution     → Job Execution sub-tab

Job-type-specific URLs use the ?job= query parameter (e.g.,
/plugin/configuration?job=vacuum) so that a specific job type tab
is pre-selected on page load.

Changes:
- Add initialJob parameter to Plugin() template and handler
- Extract ?job= query param in renderPluginPage handler
- Add buildPluginURL/updateURL helpers in JavaScript
- Push history state on top-tab, sub-tab, and job-type clicks
- Listen for popstate to restore tab state on back/forward
- Replace initial history entry on page load via replaceState

* make popstate handler async with proper error handling

Await loadDescriptorAndConfig so data loading completes before
rendering dependent views. Log errors instead of silently
swallowing them.
2026-03-13 23:02:43 -07:00
Chris Lu
a6774f0e01 add git commit hash on admin ui 2026-03-12 23:51:25 -07:00
Chris Lu
ac579c1746 Fix plugin configuration tab layout overflow (#8596)
Fix plugin configuration tab layout overflow (#8587)

Remove h-100 from Job Scheduling Settings card, which caused it to
stretch to 100% of the row height and push the Next Run card below
the row boundary, overflowing into the Detection Results section.
2026-03-10 19:19:42 -07:00
Chris Lu
07f3f5eec5 remove worker links 2026-03-10 18:03:38 -07:00
Chris Lu
00000ec006 Update s3_buckets_templ.go 2026-03-09 22:41:07 -07:00
Chris Lu
5f85bf5e8a Batch volume balance: run multiple moves per job (#8561)
* proto: add BalanceMoveSpec and batch fields to BalanceTaskParams

Add BalanceMoveSpec message for encoding individual volume moves,
and max_concurrent_moves + repeated moves fields to BalanceTaskParams
to support batching multiple volume moves in a single job.

* balance handler: add batch execution with concurrent volume moves

Refactor Execute() into executeSingleMove() (backward compatible) and
executeBatchMoves() which runs multiple volume moves concurrently using
a semaphore-bounded goroutine pool. When BalanceTaskParams.Moves is
populated, the batch path is taken; otherwise the single-move path.

Includes aggregate progress reporting across concurrent moves,
per-move error collection, and partial failure support.

* balance handler: add batch config fields to Descriptor and worker config

Add max_concurrent_moves and batch_size fields to the worker config
form and deriveBalanceWorkerConfig(). These control how many volume
moves run concurrently within a batch job and the maximum batch size.

* balance handler: group detection proposals into batch jobs

When batch_size > 1, the Detect method groups detection results into
batch proposals where each proposal encodes multiple BalanceMoveSpec
entries in BalanceTaskParams.Moves. Single-result batches fall back
to the existing single-move proposal format for backward compatibility.

* admin UI: add volume balance execution plan and batch badge

Add renderBalanceExecutionPlan() for rich rendering of volume balance
jobs in the job detail modal. Single-move jobs show source/target/volume
info; batch jobs show a moves table with all volume moves.

Add batch badge (e.g., "5 moves") next to job type in the execution
jobs table when the job has batch=true label.

* Update plugin_templ.go

* fix: detection algorithm uses greedy target instead of divergent topology scores

The detection loop tracked effective volume counts via an adjustments map,
but createBalanceTask independently called planBalanceDestination which used
the topology's LoadCount — a separate, unadjusted source of truth. This
divergence caused multiple moves to pile onto the same server.

Changes:
- Add resolveBalanceDestination to resolve the detection loop's greedy
  target (minServer) rather than independently picking a destination
- Add oscillation guard: stop when max-min <= 1 since no single move
  can improve the balance beyond that point
- Track unseeded destinations: if a target server wasn't in the initial
  serverVolumeCounts, add it so subsequent iterations include it
- Add TestDetection_UnseededDestinationDoesNotOverload

* fix: handler force_move propagation, partial failure, deterministic dedupe

- Propagate ForceMove from outer BalanceTaskParams to individual move
  TaskParams so batch moves respect the force_move flag
- Fix partial failure: mark job successful if at least one move
  succeeded (succeeded > 0 || failed == 0) to avoid re-running
  already-completed moves on retry
- Use SHA-256 hash for deterministic dedupe key fallback instead of
  time.Now().UnixNano() which is non-deterministic
- Remove unused successDetails variable
- Extract maxProposalStringLength constant to replace magic number 200

* admin UI: use template literals in balance execution plan rendering

* fix: integration test handles batch proposals from batched detection

With batch_size=20, all moves are grouped into a single proposal
containing BalanceParams.Moves instead of top-level Sources/Targets.
Update assertions to handle both batch and single-move proposal formats.

* fix: verify volume size on target before deleting source during balance

Add a pre-delete safety check that reads the volume file status on both
source and target, then compares .dat file size and file count. If they
don't match, the move is aborted — leaving the source intact rather than
risking irreversible data loss.

Also removes the redundant mountVolume call since VolumeCopy already
mounts the volume on the target server.

* fix: clamp maxConcurrent, serialize progress sends, validate config as int64

- Clamp maxConcurrentMoves to defaultMaxConcurrentMoves before creating
  the semaphore so a stale or malicious job cannot request unbounded
  concurrent volume moves
- Extend progressMu to cover sender.SendProgress calls since the
  underlying gRPC stream is not safe for concurrent writes
- Perform bounds checks on max_concurrent_moves and batch_size in int64
  space before casting to int, avoiding potential overflow on 32-bit

* fix: check disk capacity in resolveBalanceDestination

Skip disks where VolumeCount >= MaxVolumeCount so the detection loop
does not propose moves to a full disk that would fail at execution time.

* test: rename unseeded destination test to match actual behavior

The test exercises a server with 0 volumes that IS seeded from topology
(matching disk type), not an unseeded destination. Rename to
TestDetection_ZeroVolumeServerIncludedInBalance and fix comments.

* test: tighten integration test to assert exactly one batch proposal

With default batch_size=20, all moves should be grouped into a single
batch proposal. Assert len(proposals)==1 and require BalanceParams with
Moves, removing the legacy single-move else branch.

* fix: propagate ctx to RPCs and restore source writability on abort

- All helper methods (markVolumeReadonly, copyVolume, tailVolume,
  readVolumeFileStatus, deleteVolume) now accept a context parameter
  instead of using context.Background(), so Execute's ctx propagates
  cancellation and timeouts into every volume server RPC
- Add deferred cleanup that restores the source volume to writable if
  any step after markVolumeReadonly fails, preventing the source from
  being left permanently readonly on abort
- Add markVolumeWritable helper using VolumeMarkWritableRequest

* fix: deep-copy protobuf messages in test recording sender

Use proto.Clone in recordingExecutionSender to store immutable snapshots
of JobProgressUpdate and JobCompleted, preventing assertions from
observing mutations if the handler reuses message pointers.

* fix: add VolumeMarkWritable and ReadVolumeFileStatus to fake volume server

The balance task now calls ReadVolumeFileStatus for pre-delete
verification and VolumeMarkWritable to restore writability on abort.
Add both RPCs to the test fake, and drop the mountCalls assertion since
BalanceTask no longer calls VolumeMount directly (VolumeCopy handles it).

* fix: use maxConcurrentMovesLimit (50) for clamp, not defaultMaxConcurrentMoves

defaultMaxConcurrentMoves (5) is the fallback when the field is unset,
not an upper bound. Clamping to it silently overrides valid config
values like 10/20/50. Introduce maxConcurrentMovesLimit (50) matching
the descriptor's MaxValue and clamp to that instead.

* fix: cancel batch moves on progress stream failure

Derive a cancellable batchCtx from the caller's ctx. If
sender.SendProgress returns an error (client disconnect, context
cancelled), capture it, skip further sends, and cancel batchCtx so
in-flight moves abort via their propagated context rather than running
blind to completion.

* fix: bound cleanup timeout and validate batch move fields

- Use a 30-second timeout for the deferred markVolumeWritable cleanup
  instead of context.Background() which can block indefinitely if the
  volume server is unreachable
- Validate required fields (VolumeID, SourceNode, TargetNode) before
  appending moves to a batch proposal, skipping invalid entries
- Fall back to a single-move proposal when filtering leaves only one
  valid move in a batch

* fix: cancel task execution on SendProgress stream failure

All handler progress callbacks previously ignored SendProgress errors,
allowing tasks to continue executing after the client disconnected.
Now each handler creates a derived cancellable context and cancels it
on the first SendProgress error, stopping the in-flight task promptly.

Handlers fixed: erasure_coding, vacuum, volume_balance (single-move),
and admin_script (breaks command loop on send failure).

* fix: validate batch moves before scheduling in executeBatchMoves

Reject empty batches, enforce a hard upper bound (100 moves), and
filter out nil or incomplete move specs (missing source/target/volume)
before allocating progress tracking and launching goroutines.

* test: add batch balance execution integration test

Tests the batch move path with 3 volumes, max concurrency 2, using
fake volume servers. Verifies all moves complete with correct readonly,
copy, tail, and delete RPC counts.

* test: add MarkWritableCount and ReadFileStatusCount accessors

Expose the markWritableCalls and readFileStatusCalls counters on the
fake volume server, following the existing MarkReadonlyCount pattern.

* fix: oscillation guard uses global effective counts for heterogeneous capacity

The oscillation guard (max-min <= 1) previously used maxServer/minServer
which are determined by utilization ratio. With heterogeneous capacity,
maxServer by utilization can have fewer raw volumes than minServer,
producing a negative diff and incorrectly triggering the guard.

Now scans all servers' effective counts to find the true global max/min
volume counts, so the guard works correctly regardless of whether
utilization-based or raw-count balancing is used.
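
As an illustration of the guard, the sketch below scans all effective counts for the global max and min and stops when they differ by at most one. shouldStop and the map shape are assumptions made for this example, not the detection code itself; treating fewer than two servers as "stop" is also an assumption.

```go
package main

import "fmt"

// shouldStop is a hypothetical guard: it reports whether the spread between
// the largest and smallest effective volume count is <= 1, in which case no
// single move can improve the balance.
func shouldStop(effectiveCounts map[string]int) bool {
	if len(effectiveCounts) < 2 {
		return true // assumption: nothing to balance with fewer than two servers
	}
	first := true
	var maxCount, minCount int
	for _, c := range effectiveCounts {
		if first {
			maxCount, minCount = c, c
			first = false
			continue
		}
		if c > maxCount {
			maxCount = c
		}
		if c < minCount {
			minCount = c
		}
	}
	return maxCount-minCount <= 1
}

func main() {
	fmt.Println(shouldStop(map[string]int{"a": 10, "b": 9, "c": 10})) // true
	fmt.Println(shouldStop(map[string]int{"a": 10, "b": 7, "c": 9}))  // false
}
```

Because the scan takes the true max and min, the difference can never be negative, which is the failure mode described above for the utilization-ranked pair.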

* fix: admin script handler breaks outer loop on SendProgress failure

The break on SendProgress error inside the shell.Commands scan only
exited the inner loop, letting the outer command loop continue
executing commands on a broken stream. Use a sendBroken flag to
propagate the break to the outer execCommands loop.
2026-03-09 19:30:08 -07:00
Chris Lu
b991acf634 fix: paginate bucket listing in Admin UI to show all buckets (#8585)
* fix: paginate bucket listing in Admin UI to show all buckets

The Admin UI's GetS3Buckets() had a hardcoded Limit of 1000 in the
ListEntries request, causing the Total Buckets count to cap at 1000
even when more buckets exist. This adds pagination to iterate through
all buckets by continuing from the last entry name when a full page
is returned.

Fixes seaweedfs/seaweedfs#8564

* feat: add server-side pagination and sorting to S3 buckets page

Add pagination controls, page size selector, and sortable column
headers to the Admin UI's Object Store buckets page, following the
same pattern used by the Cluster Volumes page. This ensures the UI
remains responsive with thousands of buckets.

- Add CurrentPage, TotalPages, PageSize, SortBy, SortOrder to S3BucketsData
- Accept page/pageSize/sortBy/sortOrder query params in ShowS3Buckets handler
- Sort buckets by name, owner, created, objects, logical/physical size
- Paginate results server-side (default 100 per page)
- Add pagination nav, page size dropdown, and sort indicators to template

* Update s3_buckets_templ.go

* Update object_store_users_templ.go

* fix: use errors.Is(err, io.EOF) instead of string comparison

Replace brittle err.Error() == "EOF" string comparison with idiomatic
errors.Is(err, io.EOF) for checking stream end in bucket listing.

* fix: address PR review findings for bucket pagination

- Clamp page to totalPages when page exceeds total, preventing empty
  results with misleading pagination state
- Fix sort comparator to use explicit ascending/descending comparisons
  with a name tie-breaker, satisfying strict weak ordering for sort.Slice
- Capture SnapshotTsNs from first ListEntries response and pass it to
  subsequent requests for consistent pagination across pages
- Replace non-focusable <th onclick> sort headers with <a> tags and
  reuse getSortIcon, matching the cluster_volumes accessibility pattern
- Change exportBucketList() to fetch all buckets from /api/s3/buckets
  instead of scraping DOM rows (which now only contain the current page)
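
The page clamp and the comparator fix can be sketched as follows. The bucket type and both function names are hypothetical. The point of the explicit branches is that a descending order obtained by negating an ascending "less" returns true for equal elements in both directions, which is not a strict weak ordering; comparing explicitly and falling back to the name avoids that.

```go
package main

import (
	"fmt"
	"sort"
)

// bucket is a hypothetical, trimmed-down row for this example.
type bucket struct {
	Name    string
	Objects int64
}

// sortByObjects orders by object count in the requested direction and breaks
// ties by name, so equal counts never compare "less" in both directions.
func sortByObjects(buckets []bucket, descending bool) {
	sort.Slice(buckets, func(i, j int) bool {
		a, b := buckets[i], buckets[j]
		if a.Objects != b.Objects {
			if descending {
				return a.Objects > b.Objects
			}
			return a.Objects < b.Objects
		}
		return a.Name < b.Name // tie-breaker
	})
}

// clampPage keeps page within [1, totalPages] and returns both values.
func clampPage(page, pageSize, total int) (int, int) {
	if pageSize < 1 {
		pageSize = 1
	}
	totalPages := (total + pageSize - 1) / pageSize
	if totalPages < 1 {
		totalPages = 1
	}
	if page < 1 {
		page = 1
	}
	if page > totalPages {
		page = totalPages
	}
	return page, totalPages
}

func main() {
	buckets := []bucket{{"b", 2}, {"a", 2}, {"c", 5}}
	sortByObjects(buckets, true)
	fmt.Println(buckets)                 // [{c 5} {a 2} {b 2}]
	fmt.Println(clampPage(9, 100, 250)) // 3 3
}
```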
2026-03-09 18:55:47 -07:00
Chris Lu
02d3e3195c Update object_store_users_templ.go 2026-03-09 18:34:58 -07:00
Chris Lu
6dab90472b admin: fix access key creation UX (#8579)
* admin: remove misleading "secret key only shown once" warning

The access key details modal already allows viewing both the access key
and secret key at any time, so the warning about the secret key only
being displayed once is incorrect and misleading.

* admin: allow specifying custom access key and secret key

Add optional access_key and secret_key fields to the create access key
API. When provided, the specified keys are used instead of generating
random ones. The UI now shows a form with optional fields when creating
a new key, with a note that leaving them blank auto-generates keys.

* admin: check access key uniqueness before creating

Access keys must be globally unique across all users since S3 auth
looks them up in a single global map. Add an explicit check using
GetUserByAccessKey before creating, so the user gets a clear error
("access key is already in use") rather than a generic store error.

* Update object_store_users_templ.go

* admin: address review feedback for access key creation

Handler:
- Use decodeJSONBody/newJSONMaxReader instead of raw json.Decode to
  enforce request size limits and handle malformed JSON properly
- Return 409 Conflict for duplicate access keys, 400 Bad Request for
  validation errors, instead of generic 500

Backend:
- Validate access key length (4-128 chars) and secret key length
  (8-128 chars) when user-provided

Frontend:
- Extract resetCreateKeyForm() helper to avoid duplicated cleanup logic
- Wire resetCreateKeyForm to accessKeysModal hidden.bs.modal event so
  form state is always cleared when modal is dismissed
- Change secret key input to type="password" with a visibility toggle

* admin: guard against nil request and handle GetUserByAccessKey errors

- Add nil check for the CreateAccessKeyRequest pointer before
  dereferencing, defaulting to an empty request (auto-generate both
  keys).
- Handle non-"not found" errors from GetUserByAccessKey explicitly
  instead of silently proceeding, so store errors (e.g. db connection
  failures) surface rather than being swallowed.

* Update object_store_users_templ.go

* admin: fix access key uniqueness check with gRPC store

GetUserByAccessKey returns a gRPC NotFound status error (not the
sentinel credential.ErrAccessKeyNotFound) when using the gRPC store,
causing the uniqueness check to fail with a spurious error.

Treat the lookup as best-effort: only reject when a user is found
(err == nil). Any error (not-found via any store, connectivity issues)
falls through to the store's own CreateAccessKey which enforces
uniqueness definitively.

* admin: fix error handling and input validation for access key creation

Backend:
- Remove access key value from the duplicate-key error message to avoid
  logging the caller-supplied identifier.

Handler:
- Handle empty POST body (io.EOF) as a valid request that auto-generates
  both keys, instead of rejecting it as malformed JSON.
- Return 404 for "not found" errors (e.g. non-existent user) instead of
  collapsing them into a 500.

Frontend:
- Add minlength/maxlength attributes matching backend constraints
  (access key 4-128, secret key 8-128).
- Call reportValidity() before submitting so invalid lengths are caught
  client-side without a round trip.

* admin: use sentinel errors and fix GetUserByAccessKey error handling

Backend (user_management.go):
- Define sentinel errors (ErrAccessKeyInUse, ErrUserNotFound,
  ErrInvalidInput) and wrap them in returned errors so callers can use
  errors.Is.
- Handle GetUserByAccessKey errors properly: check the sentinel
  credential.ErrAccessKeyNotFound first, then fall back to string
  matching for stores (gRPC) that return non-sentinel not-found errors.
  Surface unexpected errors instead of silently proceeding.

Handler (user_handlers.go):
- Replace fragile strings.Contains error matching with errors.Is
  against the new sentinels defined in the dash package.

Frontend (object_store_users.templ):
- Add double-submit guard (isCreatingKey flag + button disabling) to
  prevent duplicate access key creation requests.
2026-03-09 14:03:41 -07:00
Chris Lu
992db11d2b iam: add IAM group management (#8560)
* iam: add Group message to protobuf schema

Add Group message (name, members, policy_names, disabled) and
add groups field to S3ApiConfiguration for IAM group management
support (issue #7742).

* iam: add group CRUD to CredentialStore interface and all backends

Add group management methods (CreateGroup, GetGroup, DeleteGroup,
ListGroups, UpdateGroup) to the CredentialStore interface with
implementations for memory, filer_etc, postgres, and grpc stores.
Wire group loading/saving into filer_etc LoadConfiguration and
SaveConfiguration.

* iam: add group IAM response types

Add XML response types for group management IAM actions:
CreateGroup, DeleteGroup, GetGroup, ListGroups, AddUserToGroup,
RemoveUserFromGroup, AttachGroupPolicy, DetachGroupPolicy,
ListAttachedGroupPolicies, ListGroupsForUser.

* iam: add group management handlers to embedded IAM API

Add CreateGroup, DeleteGroup, GetGroup, ListGroups, AddUserToGroup,
RemoveUserFromGroup, AttachGroupPolicy, DetachGroupPolicy,
ListAttachedGroupPolicies, and ListGroupsForUser handlers with
dispatch in ExecuteAction.

* iam: add group management handlers to standalone IAM API

Add group handlers (CreateGroup, DeleteGroup, GetGroup, ListGroups,
AddUserToGroup, RemoveUserFromGroup, AttachGroupPolicy, DetachGroupPolicy,
ListAttachedGroupPolicies, ListGroupsForUser) and wire into DoActions
dispatch. Also add helper functions for user/policy side effects.

* iam: integrate group policies into authorization

Add groups and userGroups reverse index to IdentityAccessManagement.
Populate both maps during ReplaceS3ApiConfiguration and
MergeS3ApiConfiguration. Modify evaluateIAMPolicies to evaluate
policies from user's enabled groups in addition to user policies.
Update VerifyActionPermission to consider group policies when
checking hasAttachedPolicies.

* iam: add group side effects on user deletion and rename

When a user is deleted, remove them from all groups they belong to.
When a user is renamed, update group membership references. Applied
to both embedded and standalone IAM handlers.

* iam: watch /etc/iam/groups directory for config changes

Add groups directory to the filer subscription watcher so group
file changes trigger IAM configuration reloads.

* admin: add group management page to admin UI

Add groups page with CRUD operations, member management, policy
attachment, and enable/disable toggle. Register routes in admin
handlers and add Groups entry to sidebar navigation.

* test: add IAM group management integration tests

Add comprehensive integration tests for group CRUD, membership,
policy attachment, policy enforcement, disabled group behavior,
user deletion side effects, and multi-group membership. Add
"group" test type to CI matrix in s3-iam-tests workflow.

* iam: address PR review comments for group management

- Fix XSS vulnerability in groups.templ: replace innerHTML string
  concatenation with DOM APIs (createElement/textContent) for rendering
  member and policy lists
- Use userGroups reverse index in embedded IAM ListGroupsForUser for
  O(1) lookup instead of iterating all groups
- Add buildUserGroupsIndex helper in standalone IAM handlers; use it
  in ListGroupsForUser and removeUserFromAllGroups for efficient lookup
- Add note about gRPC store load-modify-save race condition limitation

* iam: add defensive copies, validation, and XSS fixes for group management

- Memory store: clone groups on store/retrieve to prevent mutation
- Admin dash: deep copy groups before mutation, validate user/policy exists
- HTTP handlers: translate credential errors to proper HTTP status codes,
  use *bool for Enabled field to distinguish missing vs false
- Groups templ: use data attributes + event delegation instead of inline
  onclick for XSS safety, prevent stale async responses
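
To illustrate the *bool point above: with a plain bool, an omitted field and an explicit false both decode to false, whereas a pointer stays nil when the field is absent. The request type, JSON field name, and describe helper below are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// updateGroupRequest is a hypothetical request body for this example.
type updateGroupRequest struct {
	Enabled *bool `json:"enabled"` // nil means "field not sent"
}

// describe decodes body and reports which action the request implies.
func describe(body string) (string, error) {
	var req updateGroupRequest
	if err := json.Unmarshal([]byte(body), &req); err != nil {
		return "", err
	}
	switch {
	case req.Enabled == nil:
		return "unchanged", nil
	case *req.Enabled:
		return "enable", nil
	default:
		return "disable", nil
	}
}

func main() {
	for _, body := range []string{`{}`, `{"enabled":false}`, `{"enabled":true}`} {
		action, err := describe(body)
		fmt.Println(body, "->", action, err)
	}
}
```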

* iam: add explicit group methods to PropagatingCredentialStore

Add CreateGroup, GetGroup, DeleteGroup, ListGroups, and UpdateGroup
methods instead of relying on embedded interface fallthrough. Group
changes propagate via filer subscription so no RPC propagation needed.

* iam: detect postgres unique constraint violation and add groups index

Return ErrGroupAlreadyExists when INSERT hits SQLState 23505 instead of
a generic error. Add index on groups(disabled) for filtered queries.

* iam: add Marker field to group list response types

Add Marker string field to GetGroupResult, ListGroupsResult,
ListAttachedGroupPoliciesResult, and ListGroupsForUserResult to
match AWS IAM pagination response format.

* iam: check group attachment before policy deletion

Reject DeletePolicy if the policy is attached to any group, matching
AWS IAM behavior. Add PolicyArn to ListAttachedGroupPolicies response.

* iam: include group policies in IAM authorization

Merge policy names from user's enabled groups into the IAMIdentity
used for authorization, so group-attached policies are evaluated
alongside user-attached policies.

* iam: check for name collision before renaming user in UpdateUser

Scan identities and inline policies for newUserName before mutating,
returning EntityAlreadyExists if a collision is found. Reuse the
already-loaded policies instead of loading them again inside the loop.

* test: use t.Cleanup for bucket cleanup in group policy test

* iam: wrap ErrUserNotInGroup sentinel in RemoveGroupMember error

Wrap credential.ErrUserNotInGroup so errors.Is works in
groupErrorToHTTPStatus, returning proper 400 instead of 500.

* admin: regenerate groups_templ.go with XSS-safe data attributes

Regenerated from groups.templ which uses data-group-name attributes
instead of inline onclick with string interpolation.

* iam: add input validation and persist groups during migration

- Validate nil/empty group name in CreateGroup and UpdateGroup
- Save groups in migrateToMultiFile so they survive legacy migration

* admin: use groupErrorToHTTPStatus in GetGroupMembers and GetGroupPolicies

* iam: short-circuit UpdateUser when newUserName equals current name

* iam: require empty PolicyNames before group deletion

Reject DeleteGroup when group has attached policies, matching the
existing members check. Also fix GetGroup error handling in
DeletePolicy to only skip ErrGroupNotFound, not all errors.

* ci: add weed/pb/** to S3 IAM test trigger paths

* test: replace time.Sleep with require.Eventually for propagation waits

Use polling with timeout instead of fixed sleeps to reduce flakiness
in integration tests waiting for IAM policy propagation.

* fix: use credentialManager.GetPolicy for AttachGroupPolicy validation

Policies created via CreatePolicy through credentialManager are stored
in the credential store, not in s3cfg.Policies (which only has static
config policies). Change AttachGroupPolicy to use credentialManager.GetPolicy()
for policy existence validation.

* feat: add UpdateGroup handler to embedded IAM API

Add UpdateGroup action to enable/disable groups and rename groups
via the IAM API. This is a SeaweedFS extension (not in AWS SDK) used
by tests to toggle group disabled status.

* fix: authenticate raw IAM API calls in group tests

The embedded IAM endpoint rejects anonymous requests. Replace
callIAMAPI with callIAMAPIAuthenticated that uses JWT bearer token
authentication via the test framework.

* feat: add UpdateGroup handler to standalone IAM API

Mirror the embedded IAM UpdateGroup handler in the standalone IAM API
for parity.

* fix: add omitempty to Marker XML tags in group responses

Non-truncated responses should not emit an empty <Marker/> element.

* fix: distinguish backend errors from missing policies in AttachGroupPolicy

Return ServiceFailure for credential manager errors instead of masking
them as NoSuchEntity. Also switch ListGroupsForUser to use s3cfg.Groups
instead of in-memory reverse index to avoid stale data. Add duplicate
name check to UpdateGroup rename.

* fix: standalone IAM AttachGroupPolicy uses persisted policy store

Check managed policies from GetPolicies() instead of s3cfg.Policies
so dynamically created policies are found. Also add duplicate name
check to UpdateGroup rename.

* fix: rollback inline policies on UpdateUser PutPolicies failure

If PutPolicies fails after moving inline policies to the new username,
restore both the identity name and the inline policies map to their
original state to avoid a partial-write window.

* fix: correct test cleanup ordering for group tests

Replace scattered defers with single ordered t.Cleanup in each test
to ensure resources are torn down in reverse-creation order:
remove membership, detach policies, delete access keys, delete users,
delete groups, delete policies. Move bucket cleanup to parent test
scope and delete objects before bucket.

* fix: move identity nil check before map lookup and refine hasAttachedPolicies

Move the nil check on identity before accessing identity.Name to
prevent panic. Also refine hasAttachedPolicies to only consider groups
that are enabled and have actual policies attached, so membership in
a no-policy group doesn't incorrectly trigger IAM authorization.

* fix: fail group reload on unreadable or corrupt group files

Return errors instead of logging and continuing when group files
cannot be read or unmarshaled. This prevents silently applying a
partial IAM config with missing group memberships or policies.

* fix: use errors.Is for sql.ErrNoRows comparison in postgres group store

* docs: explain why group methods skip propagateChange

Group changes propagate to S3 servers via filer subscription
(watching /etc/iam/groups/) rather than gRPC RPCs, since there
are no group-specific RPCs in the S3 cache protocol.

* fix: remove unused policyNameFromArn and strings import

* fix: update service account ParentUser on user rename

When renaming a user via UpdateUser, also update ParentUser references
in service accounts to prevent them from becoming orphaned after the
next configuration reload.

* fix: wrap DetachGroupPolicy error with ErrPolicyNotAttached sentinel

Use credential.ErrPolicyNotAttached so groupErrorToHTTPStatus maps
it to 400 instead of falling back to 500.

* fix: use admin S3 client for bucket cleanup in enforcement test

The user S3 client may lack permissions by cleanup time since the
user is removed from the group in an earlier subtest. Use the admin
S3 client to ensure bucket and object cleanup always succeeds.

* fix: add nil guard for group param in propagating store log calls

Prevent potential nil dereference when logging group.Name in
CreateGroup and UpdateGroup of PropagatingCredentialStore.

* fix: validate Disabled field in UpdateGroup handlers

Reject values other than "true" or "false" with InvalidInputException
instead of silently treating them as false.

* fix: seed mergedGroups from existing groups in MergeS3ApiConfiguration

Previously the merge started with empty group maps, dropping any
static-file groups. Now seeds from existing iam.groups before
overlaying dynamic config, and builds the reverse index after
merging to avoid stale entries from overridden groups.

* fix: use errors.Is for filer_pb.ErrNotFound comparison in group loading

Replace direct equality (==) with errors.Is() to correctly match
wrapped errors, consistent with the rest of the codebase.

* fix: add ErrUserNotFound and ErrPolicyNotFound to groupErrorToHTTPStatus

Map these sentinel errors to 404 so AddGroupMember and
AttachGroupPolicy return proper HTTP status codes.

* fix: log cleanup errors in group integration tests

Replace fire-and-forget cleanup calls with error-checked versions
that log failures via t.Logf for debugging visibility.

* fix: prevent duplicate group test runs in CI matrix

The basic lane's -run "TestIAM" regex also matched TestIAMGroup*
tests, causing them to run in both the basic and group lanes.
Replace with explicit test function names.

* fix: add GIN index on groups.members JSONB for membership lookups

Without this index, ListGroupsForUser and membership queries
require full table scans on the groups table.

* fix: handle cross-directory moves in IAM config subscription

When a file is moved out of an IAM directory (e.g., /etc/iam/groups),
the dir variable was overwritten with NewParentPath, causing the
source directory change to be missed. Now also notifies handlers
about the source directory for cross-directory moves.

* fix: validate members/policies before deleting group in admin handler

AdminServer.DeleteGroup now checks for attached members and policies
before delegating to credentialManager, matching the IAM handler guards.

* fix: merge groups by name instead of blind append during filer load

Match the identity loader's merge behavior: find existing group
by name and replace, only append when no match exists. Prevents
duplicates when legacy and multi-file configs overlap.
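
The replace-or-append merge described above could be sketched like this. The group type and mergeGroup helper are illustrative stand-ins, not the filer store's real types.

```go
package main

import "fmt"

// group is a hypothetical, trimmed-down group record.
type group struct {
	Name    string
	Members []string
}

// mergeGroup replaces an existing entry with the same name, or appends g
// when no entry matches, so overlapping sources never produce duplicates.
func mergeGroup(groups []*group, g *group) []*group {
	for i, existing := range groups {
		if existing.Name == g.Name {
			groups[i] = g
			return groups
		}
	}
	return append(groups, g)
}

func main() {
	var groups []*group
	groups = mergeGroup(groups, &group{Name: "devs", Members: []string{"alice"}})
	groups = mergeGroup(groups, &group{Name: "ops", Members: []string{"bob"}})
	// A second "devs" from another config source replaces the first.
	groups = mergeGroup(groups, &group{Name: "devs", Members: []string{"alice", "carol"}})

	fmt.Println(len(groups), groups[0].Members) // 2 [alice carol]
}
```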

* fix: check DeleteEntry response error when cleaning obsolete group files

Capture and log resp.Error from filer DeleteEntry calls during
group file cleanup, matching the pattern used in deleteGroupFile.

* fix: verify source user exists before no-op check in UpdateUser

Reorder UpdateUser to find the source identity first and return
NoSuchEntityException if not found, before checking if the rename
is a no-op. Previously a non-existent user renamed to itself
would incorrectly return success.

* fix: update service account parent refs on user rename in embedded IAM

The embedded IAM UpdateUser handler updated group membership but
not service account ParentUser fields, unlike the standalone handler.

* fix: replay source-side events for all handlers on cross-dir moves

Pass nil newEntry to bucket, IAM, and circuit-breaker handlers for
the source directory during cross-directory moves, so all watchers
can clear caches for the moved-away resource.

* fix: don't seed mergedGroups from existing iam.groups in merge

Groups are always dynamic (from filer), never static (from s3.config).
Seeding from iam.groups caused stale deleted groups to persist.
Now only uses config.Groups from the dynamic filer config.

* fix: add deferred user cleanup in TestIAMGroupUserDeletionSideEffect

Register t.Cleanup for the created user so it gets cleaned up
even if the test fails before the inline DeleteUser call.

* fix: assert UpdateGroup HTTP status in disabled group tests

Add require.Equal checks for 200 status after UpdateGroup calls
so the test fails immediately on API errors rather than relying
on the subsequent Eventually timeout.

* fix: trim whitespace from group name in filer store operations

Trim leading/trailing whitespace from group.Name before validation
in CreateGroup and UpdateGroup to prevent whitespace-only filenames.
Also merge groups by name during multi-file load to prevent duplicates.

* fix: add nil/empty group validation in gRPC store

Guard CreateGroup and UpdateGroup against nil group or empty name
to prevent panics and invalid persistence.

* fix: add nil/empty group validation in postgres store

Guard CreateGroup and UpdateGroup against nil group or empty name
to prevent panics from nil member access and empty-name row inserts.

* fix: add name collision check in embedded IAM UpdateUser

The embedded IAM handler renamed users without checking if the
target name already existed, unlike the standalone handler.

* fix: add ErrGroupNotEmpty sentinel and map to HTTP 409

AdminServer.DeleteGroup now wraps conflict errors with
ErrGroupNotEmpty, and groupErrorToHTTPStatus maps it to
409 Conflict instead of 500.
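
A sentinel error mapped to a status code might look like the sketch below. The names `ErrGroupNotEmpty`, `ErrGroupNotFound`, and `groupErrorToHTTPStatus` appear in this log, but the function bodies and the `deleteGroup` helper are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Sentinel errors; the exact messages are assumptions.
var (
	ErrGroupNotFound = errors.New("group not found")
	ErrGroupNotEmpty = errors.New("group is not empty")
)

// deleteGroup is a hypothetical stand-in that wraps the sentinel with %w so
// callers can still match it with errors.Is.
func deleteGroup(name string, memberCount int) error {
	if memberCount > 0 {
		return fmt.Errorf("cannot delete group %q with %d members: %w",
			name, memberCount, ErrGroupNotEmpty)
	}
	return nil
}

// groupErrorToHTTPStatus maps known sentinels to specific statuses and
// everything else to 500.
func groupErrorToHTTPStatus(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, ErrGroupNotFound):
		return http.StatusNotFound
	case errors.Is(err, ErrGroupNotEmpty):
		return http.StatusConflict
	default:
		return http.StatusInternalServerError
	}
}

func main() {
	fmt.Println(groupErrorToHTTPStatus(deleteGroup("admins", 2)))
	fmt.Println(groupErrorToHTTPStatus(deleteGroup("empty", 0)))
}
```

Wrapping with `%w` keeps the human-readable context while leaving the sentinel matchable, so the handler does not need to compare error strings.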

* fix: use appropriate error message in GetGroupDetails based on status

Return "Group not found" only for 404, use "Failed to retrieve group"
for other error statuses instead of always saying "Group not found".

* fix: use backend-normalized group.Name in CreateGroup response

After credentialManager.CreateGroup may normalize the name (e.g.,
trim whitespace), use group.Name instead of the raw input for
the returned GroupData to ensure consistency.

* fix: add nil/empty group validation in memory store

Guard CreateGroup and UpdateGroup against nil group or empty name
to prevent panics from nil pointer dereference on map access.

* fix: reorder embedded IAM UpdateUser to verify source first

Find the source identity before checking for collisions, matching
the standalone handler's logic. Previously a non-existent user
renamed to an existing name would get EntityAlreadyExists instead
of NoSuchEntity.

* fix: handle same-directory renames in metadata subscription

Replay a delete event for the old entry name during same-directory
renames so handlers like onBucketMetadataChange can clean up stale
state for the old name.

* fix: abort GetGroups on non-ErrGroupNotFound errors

Only skip groups that return ErrGroupNotFound. Other errors (e.g.,
transient backend failures) now abort the handler and return the
error to the caller instead of silently producing partial results.

* fix: add aria-label and title to icon-only group action buttons

Add accessible labels to View and Delete buttons so screen readers
and tooltips provide meaningful context.

* fix: validate group name in saveGroup to prevent invalid filenames

Trim whitespace and reject empty names before writing group JSON
files, preventing creation of files like ".json".

* fix: add /etc/iam/groups to filer subscription watched directories

The groups directory was missing from the watched directories list,
so S3 servers in a cluster would not detect group changes made by
other servers via filer. The onIamConfigChange handler already had
code to handle group directory changes but it was never triggered.

* add direct gRPC propagation for group changes to S3 servers

Groups now have the same dual propagation as identities and policies:
direct gRPC push via propagateChange + async filer subscription.

- Add PutGroup/RemoveGroup proto messages and RPCs
- Add PutGroup/RemoveGroup in-memory cache methods on IAM
- Add PutGroup/RemoveGroup gRPC server handlers
- Update PropagatingCredentialStore to call propagateChange on group mutations

* reduce log verbosity for config load summary

Change ReplaceS3ApiConfiguration log from Infof to V(1).Infof
to avoid noisy output on every config reload.

* admin: show user groups in view and edit user modals

- Add Groups field to UserDetails and populate from credential manager
- Show groups as badges in user details view modal
- Add group management to edit user modal: display current groups,
  add to group via dropdown, remove from group via badge x button

* fix: remove duplicate showAlert that broke modal-alerts.js

admin.js defined showAlert(type, message) which overwrote the
modal-alerts.js version showAlert(message, type), causing broken
unstyled alert boxes. Remove the duplicate and swap all callers
in admin.js to use the correct (message, type) argument order.

* fix: unwrap groups API response in edit user modal

The /api/groups endpoint returns {"groups": [...]}, not a bare array.

* Update object_store_users_templ.go

* test: assert AccessDenied error code in group denial tests

Replace plain assert.Error checks with awserr.Error type assertion
and AccessDenied code verification, matching the pattern used in
other IAM integration tests.

* fix: propagate GetGroups errors in ShowGroups handler

getGroupsPageData was swallowing errors and returning an empty page
with 200 status. Now returns the error so ShowGroups can respond
with a proper error status.

* fix: reject AttachGroupPolicy when credential manager is nil

Previously the handler skipped policy existence validation when
credentialManager was nil, allowing attachment of nonexistent
policies. It now returns a ServiceFailureException error.

* fix: preserve groups during partial MergeS3ApiConfiguration updates

UpsertIdentity calls MergeS3ApiConfiguration with a partial config
containing only the updated identity (nil Groups). This was wiping
all in-memory group state. Now only replaces groups when
config.Groups is non-nil (full config reload).
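
The nil-versus-non-nil distinction can be illustrated with a small sketch. The `Config`, `IAM`, and `merge` names below are hypothetical simplifications, and treating an empty non-nil slice as "clear all groups" is an assumption of this example rather than documented behavior.

```go
package main

import "fmt"

// Group, Config, and IAM are hypothetical, simplified types.
type Group struct{ Name string }

type Config struct {
	Identities []string
	Groups     []*Group // nil means "this update does not carry groups"
}

type IAM struct {
	identities map[string]bool
	groups     map[string]*Group
}

// merge applies a config update. Group state is replaced only when the
// update carries a non-nil Groups slice.
func (iam *IAM) merge(cfg *Config) {
	for _, id := range cfg.Identities {
		iam.identities[id] = true
	}
	if cfg.Groups == nil {
		return // partial update: keep the existing in-memory groups
	}
	groups := make(map[string]*Group, len(cfg.Groups))
	for _, g := range cfg.Groups {
		groups[g.Name] = g
	}
	iam.groups = groups
}

func main() {
	iam := &IAM{
		identities: map[string]bool{},
		groups:     map[string]*Group{"admins": {Name: "admins"}},
	}
	iam.merge(&Config{Identities: []string{"alice"}}) // identity-only update
	fmt.Println(len(iam.groups))
}
```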

* fix: propagate errors from group lookup in GetObjectStoreUserDetails

ListGroups and GetGroup errors were silently ignored, potentially
showing incomplete group data in the UI.

* fix: use DOM APIs for group badge remove button to prevent XSS

Replace the innerHTML markup that interpolated values into an onclick
string with the DOM createElement + addEventListener pattern. Also add
aria-label and title to the add-to-group button.

* fix: snapshot group policies under RLock to prevent concurrent map access

evaluateIAMPolicies was copying the map reference via groupMap :=
iam.groups under RLock then iterating after RUnlock, while PutGroup
mutates the map in-place. Now copies the needed policy names into
a slice while holding the lock.
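
A minimal sketch of the snapshot-under-lock pattern follows. The `IAM` struct, the `PutGroup` signature, and the `policyNamesFor` helper are simplified assumptions for illustration; only the idea of copying policy names into a slice while holding the read lock comes from the message above.

```go
package main

import (
	"fmt"
	"sync"
)

// Group and IAM are hypothetical, simplified types.
type Group struct{ PolicyNames []string }

type IAM struct {
	mu     sync.RWMutex
	groups map[string]*Group
}

func NewIAM() *IAM { return &IAM{groups: make(map[string]*Group)} }

// PutGroup mutates the map in place under the write lock.
func (iam *IAM) PutGroup(name string, policies []string) {
	iam.mu.Lock()
	defer iam.mu.Unlock()
	iam.groups[name] = &Group{PolicyNames: policies}
}

// policyNamesFor copies the needed policy names into a fresh slice while the
// read lock is held, so the caller never touches the shared map afterwards.
func (iam *IAM) policyNamesFor(groupNames []string) []string {
	iam.mu.RLock()
	defer iam.mu.RUnlock()
	var names []string
	for _, gn := range groupNames {
		if g, ok := iam.groups[gn]; ok {
			names = append(names, g.PolicyNames...)
		}
	}
	return names
}

func main() {
	iam := NewIAM()
	iam.PutGroup("admins", []string{"full-access"})

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // concurrent writer
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			iam.PutGroup(fmt.Sprintf("g%d", i), []string{"read-only"})
		}
	}()
	go func() { // concurrent reader iterates only its private snapshot
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			for range iam.policyNamesFor([]string{"admins"}) {
			}
		}
	}()
	wg.Wait()
	fmt.Println(iam.policyNamesFor([]string{"admins"}))
}
```

Copying only the map reference under the lock does not help, because a Go map is a reference type and the later iteration would still race with writers.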

* fix: add nil IAM check to PutGroup and RemoveGroup gRPC handlers

Match the nil guard pattern used by PutPolicy/DeletePolicy to
prevent nil pointer dereference when IAM is not initialized.
2026-03-09 11:54:32 -07:00
Chris Lu
ba66411337 Update plugin_templ.go 2026-03-08 14:29:06 -07:00
Chris Lu
7808b301ef admin: remove Scheduler Settings cards from plugin UI (#8558)
* admin: remove Scheduler Settings cards, make Next Run full-width

Remove the two "Scheduler Settings" placeholder cards from the plugin
UI (overview page and scheduler tab). They only contained a text note
saying detection intervals are configured per job type, which is
self-evident from the per-job-type settings form.

Make the "Next Run" card full-width on the overview page since it no
longer shares a row with the removed card.

* plugin UI: promote Next Run to top summary card row

Move "Next Run" from a standalone card into the top row alongside
Workers, Active Jobs, and Activities as a compact stat card.
2026-03-08 14:27:57 -07:00
Chris Lu
fa7da0f57e template 2026-03-08 14:05:42 -07:00
Chris Lu
961c270aba admin: expose per-job-type detection interval in plugin UI (#8552)
* admin: expose per-job-type detection interval in plugin UI

The detection_interval_seconds field was not editable in the admin UI.
collectAdminSettings() silently preserved the existing value, making it
impossible for users to change how often a job type checks for new work.
Users would change the global "Sleep Between Iterations" setting expecting
it to control job scheduling frequency, but that only controls the
scheduler loop's idle polling rate.

Add a "Detection Interval (s)" input to the per-job-type admin settings
form so users can actually configure it.

Fixes #8549

* admin: remove global Sleep Between Iterations setting

Now that per-job-type detection intervals are exposed in the UI, the
global IdleSleepSeconds setting is redundant and confusing. It only
controlled the scheduler loop's idle polling rate, which is always
overridden by earliestNextDetectionAt() when job types exist.

Replace the three usages with simpler alternatives:
- Scheduler loop sleep: use defaultSchedulerIdleSleep constant
- Initial delay for new job types: use policy.DetectionInterval/2
  (more logical since it's already per-job-type)
- Status fallback: use the constant

The API endpoints are kept for backward compatibility but the UI
no longer exposes or calls them.

* admin: restore configurable idle sleep in scheduler loop

The EC integration test sets idle_sleep_seconds=1 via the scheduler
config API so the scheduler wakes quickly after workers connect. The
previous commit replaced this with a hardcoded 613s constant, causing
the scheduler to sleep through the entire test window.

Restore GetSchedulerConfig().IdleSleepDuration() in the scheduler loop
and status reporting. The UI removal of the setting is still correct —
the API endpoint remains for programmatic use (e.g., tests).

* admin: cap first-run initial delay to 5s instead of DetectionInterval/2

The initial delay for first-run job types was set to
policy.DetectionInterval/2, which creates unbounded first-run latency
(e.g., 1 hour for vacuum with a 2-hour detection interval). A small
fixed 5-second delay provides sufficient stagger without penalizing
startup time.
2026-03-08 14:03:51 -07:00
Chris Lu
e25558e4d8 admin: fix mobile sidebar menu inaccessible in portrait mode (#8556)
* admin: fix mobile sidebar menu inaccessible in portrait mode

The hamburger button only toggled the user dropdown, leaving the
sidebar navigation inaccessible on mobile devices in portrait mode.

Add a dedicated sidebar toggle button (visible only on mobile), give
the sidebar an id so Bootstrap collapse can target it, add a backdrop
overlay for the open state, and auto-close the sidebar when a nav
link is clicked.

Fixes #8550

* admin: address review feedback on mobile sidebar

- Remove redundant JS show/hide.bs.collapse listeners; CSS sibling
  selector already handles backdrop visibility
- Use const instead of var for non-reassigned variables
- Move inline style on user icon to CSS class

* admin: add aria attributes to user-menu toggler, use CSS variable for navbar height

- Add aria-controls, aria-expanded, and aria-label to the user-menu
  toggle button for assistive technology
- Extract hard-coded 56px navbar height into --navbar-height CSS
  custom property used by sidebar and backdrop positioning

* admin: extract hideSidebar helper, use toggler visibility for breakpoint check

- Extract duplicated collapse-hide logic into a hideSidebar helper
- Replace hardcoded window.innerWidth < 768 with a check on the
  sidebar toggler's computed display, decoupling JS from CSS breakpoints
- Add aria-expanded="false" to sidebar toggle button

---------

Co-authored-by: Copilot <copilot@github.com>
2026-03-08 12:32:14 -07:00
Chris Lu
1f3df6e9ef admin: remove Alpha badge and unused Metrics/Logs menu items (#8525)
* admin: remove Alpha badge and unused Metrics/Logs menu items

* Update layout_templ.go
2026-03-05 11:51:11 -08:00
Fábio Henrique Araújo
88e8342e44 style: Reseted padding to container-fluid div in layout template (#8505)
* style: Reseted padding to container-fluid div in layout template

* address comment

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Chris Lu <chris.lu@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-04 14:24:23 -08:00
Copilot
70ed9c2a55 Update plugin_templ.go
Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-04 00:41:20 -08:00
Chris Lu
18ccc9b773 Plugin scheduler: sequential iterations with max runtime (#8496)
* pb: add job type max runtime setting

* plugin: default job type max runtime

* plugin: redesign scheduler loop

* admin ui: update scheduler settings

* plugin: fix scheduler loop state name

* plugin scheduler: restore backlog skip

* plugin scheduler: drop legacy detection helper

* admin api: require scheduler config body

* admin ui: preserve detection interval on save

* plugin scheduler: use job context and drain cancels

* plugin scheduler: respect detection intervals

* plugin scheduler: gate runs and drain queue

* ec test: reuse req/resp vars

* ec test: add scheduler debug logs

* Adjust scheduler idle sleep and initial run delay

* Clear pending job queue before scheduler runs

* Log next detection time in EC integration test

* Improve plugin scheduler debug logging in EC test

* Expose scheduler next detection time

* Log scheduler next detection time in EC test

* Wake scheduler on config or worker updates

* Expose scheduler sleep interval in UI

* Fix scheduler sleep save value selection

* Set scheduler idle sleep default to 613s

* Show scheduler next run time in plugin UI

---------

Co-authored-by: Copilot <copilot@github.com>
2026-03-03 23:09:49 -08:00
Chris Lu
e1e5b4a8a6 add admin script worker (#8491)
* admin: add plugin lock coordination

* shell: allow bypassing lock checks

* plugin worker: add admin script handler

* mini: include admin_script in plugin defaults

* admin script UI: drop name and enlarge text

* admin script: add default script

* admin_script: make run interval configurable

* plugin: gate other jobs during admin_script runs

* plugin: use last completed admin_script run

* admin: backfill plugin config defaults

* templ

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

* comparable to default version

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

* default to run

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

* format

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

* shell: respect pre-set noLock for fix.replication

* shell: add force no-lock mode for admin scripts

* volume balance worker already exists

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

* admin: expose scheduler status JSON

* shell: add sleep command

* shell: restrict sleep syntax

* Revert "shell: respect pre-set noLock for fix.replication"

This reverts commit 2b14e8b826.

* templ

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

* fix import

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

* less logs

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

* Reduce master client logs on canceled contexts

* Update mini default job type count

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-03 15:10:40 -08:00
Chris Lu
a61a2affe3 Expire stuck plugin jobs (#8492)
* Add stale job expiry and expire API

* Add expire job button

* Add test hook and coverage for ExpirePluginJobAPI

* Document scheduler filtering side effect and reuse helper

* Restore job spec proposal test

* Regenerate plugin template output

---------

Co-authored-by: Copilot <copilot@github.com>
2026-03-03 01:27:25 -08:00
Chris Lu
c73e65ad5e Add customizable plugin display names and weights (#8459)
* feat: add customizable plugin display names and weights

- Add weight field to JobTypeCapability proto message
- Modify ListKnownJobTypes() to return JobTypeInfo with display names and weights
- Modify ListPluginJobTypes() to return JobTypeInfo instead of string
- Sort plugins by weight (descending) then alphabetically
- Update admin API to return enriched job type metadata
- Update plugin UI template to display names instead of IDs
- Consolidate API by reusing existing function names instead of suffixed variants

* perf: optimize plugin job type capability lookup and add null-safe parsing

- Pre-calculate job type capabilities in a map to reduce O(n*m) nested loops
  to O(n+m) lookup time in ListKnownJobTypes()
- Add parseJobTypeItem() helper function for null-safe job type item parsing
- Refactor plugin.templ to use parseJobTypeItem() in all job type access points
  (hasJobType, applyInitialNavigation, ensureActiveNavigation, renderTopTabs)
- Make capability resolution deterministic by using the first worker's capability

* templ

* refactor: use parseJobTypeItem helper consistently in plugin.templ

Replace duplicated job type extraction logic at lines 1296-1298 with
the parseJobTypeItem() helper function for consistency and maintainability.

* improve: prefer richer capability metadata and add null-safety checks

- Improve capability selection in ListKnownJobTypes() to prefer capabilities
  with non-empty DisplayName and higher Weight across all workers instead of
  first-wins approach. Handles mixed-version clusters better.
- Add defensive null checks in renderJobTypeSummary() to safely access
  parseJobTypeItem() result before property access
- Ensures malformed or missing entries won't break the rendering pipeline

* fix: preserve existing DisplayName when merging capabilities

Fix capability merge logic to respect existing DisplayName values:
- If existing has DisplayName but candidate doesn't, preserve existing
- If existing doesn't have DisplayName but candidate does, use candidate
- Only use Weight comparison if DisplayName status is equal
- Prevents higher-weight capabilities with empty DisplayName from
  overriding capabilities with non-empty DisplayName
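
The preference rules listed above could be expressed as a single comparison, sketched here. The `Capability` struct and the `preferCandidate` function are hypothetical names; only the DisplayName-then-Weight ordering is taken from the message.

```go
package main

import "fmt"

// Capability is a hypothetical, reduced view of a job type capability.
type Capability struct {
	DisplayName string
	Weight      int32
}

// preferCandidate reports whether candidate should replace existing.
// DisplayName presence wins first; Weight breaks ties only when both or
// neither capability has a DisplayName.
func preferCandidate(existing, candidate Capability) bool {
	existingNamed := existing.DisplayName != ""
	candidateNamed := candidate.DisplayName != ""
	if existingNamed != candidateNamed {
		return candidateNamed
	}
	return candidate.Weight > existing.Weight
}

func main() {
	named := Capability{DisplayName: "Vacuum", Weight: 10}
	heavyUnnamed := Capability{Weight: 100}
	fmt.Println(preferCandidate(named, heavyUnnamed)) // unnamed never overrides named
	fmt.Println(preferCandidate(heavyUnnamed, named))
}
```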
2026-02-26 19:20:48 -08:00
Chris Lu
cba69f4593 Update layout_templ.go 2026-02-24 13:22:12 -08:00
Chris Lu
8d59ef41d5 Admin UI: replace gin with mux (#8420)
* Replace admin gin router with mux

* Update layout_templ.go

* Harden admin handlers

* Add login CSRF handling

* Fix filer copy naming conflict

* address comments

* address comments
2026-02-23 19:11:17 -08:00
Chris Lu
7b08cf74ed consistent template generation 2026-02-22 13:34:06 -08:00
Chris Lu
8ec9ff4a12 Refactor plugin system and migrate worker runtime (#8369)
* admin: add plugin runtime UI page and route wiring

* pb: add plugin gRPC contract and generated bindings

* admin/plugin: implement worker registry, runtime, monitoring, and config store

* admin/dash: wire plugin runtime and expose plugin workflow APIs

* command: add flags to enable plugin runtime

* admin: rename remaining plugin v2 wording to plugin

* admin/plugin: add detectable job type registry helper

* admin/plugin: add scheduled detection and dispatch orchestration

* admin/plugin: prefetch job type descriptors when workers connect

* admin/plugin: add known job type discovery API and UI

* admin/plugin: refresh design doc to match current implementation

* admin/plugin: enforce per-worker scheduler concurrency limits

* admin/plugin: use descriptor runtime defaults for scheduler policy

* admin/ui: auto-load first known plugin job type on page open

* admin/plugin: bootstrap persisted config from descriptor defaults

* admin/plugin: dedupe scheduled proposals by dedupe key

* admin/ui: add job type and state filters for plugin monitoring

* admin/ui: add per-job-type plugin activity summary

* admin/plugin: split descriptor read API from schema refresh

* admin/ui: keep plugin summary metrics global while tables are filtered

* admin/plugin: retry executor reservation before timing out

* admin/plugin: expose scheduler states for monitoring

* admin/ui: show per-job-type scheduler states in plugin monitor

* pb/plugin: rename protobuf package to plugin

* admin/plugin: rename pluginRuntime wiring to plugin

* admin/plugin: remove runtime naming from plugin APIs and UI

* admin/plugin: rename runtime files to plugin naming

* admin/plugin: persist jobs and activities for monitor recovery

* admin/plugin: lease one detector worker per job type

* admin/ui: show worker load from plugin heartbeats

* admin/plugin: skip stale workers for detector and executor picks

* plugin/worker: add plugin worker command and stream runtime scaffold

* plugin/worker: implement vacuum detect and execute handlers

* admin/plugin: document external vacuum plugin worker starter

* command: update plugin.worker help to reflect implemented flow

* command/admin: drop legacy Plugin V2 label

* plugin/worker: validate vacuum job type and respect min interval

* plugin/worker: test no-op detect when min interval not elapsed

* command/admin: document plugin.worker external process

* plugin/worker: advertise configured concurrency in hello

* command/plugin.worker: add jobType handler selection

* command/plugin.worker: test handler selection by job type

* command/plugin.worker: persist worker id in workingDir

* admin/plugin: document plugin.worker jobType and workingDir flags

* plugin/worker: support cancel request for in-flight work

* plugin/worker: test cancel request acknowledgements

* command/plugin.worker: document workingDir and jobType behavior

* plugin/worker: emit executor activity events for monitor

* plugin/worker: test executor activity builder

* admin/plugin: send last successful run in detection request

* admin/plugin: send cancel request when detect or execute context ends

* admin/plugin: document worker cancel request responsibility

* admin/handlers: expose plugin scheduler states API in no-auth mode

* admin/handlers: test plugin scheduler states route registration

* admin/plugin: keep worker id on worker-generated activity records

* admin/plugin: test worker id propagation in monitor activities

* admin/dash: always initialize plugin service

* command/admin: remove plugin enable flags and default to enabled

* admin/dash: drop pluginEnabled constructor parameter

* admin/plugin UI: stop checking plugin enabled state

* admin/plugin: remove docs for plugin enable flags

* admin/dash: remove unused plugin enabled check method

* admin/dash: fallback to in-memory plugin init when dataDir fails

* admin/plugin API: expose worker gRPC port in status

* command/plugin.worker: resolve admin gRPC port via plugin status

* split plugin UI into overview/configuration/monitoring pages

* Update layout_templ.go

* add volume_balance plugin worker handler

* wire plugin.worker CLI for volume_balance job type

* add erasure_coding plugin worker handler

* wire plugin.worker CLI for erasure_coding job type

* support multi-job handlers in plugin worker runtime

* allow plugin.worker jobType as comma-separated list

* admin/plugin UI: rename to Workers and simplify config view

* plugin worker: queue detection requests instead of capacity reject

* Update plugin_worker.go

* plugin volume_balance: remove force_move/timeout from worker config UI

* plugin erasure_coding: enforce local working dir and cleanup

* admin/plugin UI: rename admin settings to job scheduling

* admin/plugin UI: persist and robustly render detection results

* admin/plugin: record and return detection trace metadata

* admin/plugin UI: show detection process and decision trace

* plugin: surface detector decision trace as activities

* mini: start a plugin worker by default

* admin/plugin UI: split monitoring into detection and execution tabs

* plugin worker: emit detection decision trace for EC and balance

* admin workers UI: split monitoring into detection and execution pages

* plugin scheduler: skip proposals for active assigned/running jobs

* admin workers UI: add job queue tab

* plugin worker: add dummy stress detector and executor job type

* admin workers UI: reorder tabs to detection queue execution

* admin workers UI: regenerate plugin template

* plugin defaults: include dummy stress and add stress tests

* plugin dummy stress: rotate detection selections across runs

* plugin scheduler: remove cross-run proposal dedupe

* plugin queue: track pending scheduled jobs

* plugin scheduler: wait for executor capacity before dispatch

* plugin scheduler: skip detection when waiting backlog is high

* plugin: add disk-backed job detail API and persistence

* admin ui: show plugin job detail modal from job id links

* plugin: generate unique job ids instead of reusing proposal ids

* plugin worker: emit heartbeats on work state changes

* plugin registry: round-robin tied executor and detector picks

* add temporary EC overnight stress runner

* plugin job details: persist and render EC execution plans

* ec volume details: color data and parity shard badges

* shard labels: keep parity ids numeric and color-only distinction

* admin: remove legacy maintenance UI routes and templates

* admin: remove dead maintenance endpoint helpers

* Update layout_templ.go

* remove dummy_stress worker and command support

* refactor plugin UI to job-type top tabs and sub-tabs

* migrate weed worker command to plugin runtime

* remove plugin.worker command and keep worker runtime with metrics

* update helm worker args for jobType and execution flags

* set plugin scheduling defaults to global 16 and per-worker 4

* stress: fix RPC context reuse and remove redundant variables in ec_stress_runner

* admin/plugin: fix lifecycle races, safe channel operations, and terminal state constants

* admin/dash: randomize job IDs and fix priority zero-value overwrite in plugin API

* admin/handlers: implement buffered rendering to prevent response corruption

* admin/plugin: implement debounced persistence flusher and optimize BuildJobDetail memory lookups

* admin/plugin: fix priority overwrite and implement bounded wait in scheduler reserve

* admin/plugin: implement atomic file writes and fix run record side effects

* admin/plugin: use P prefix for parity shard labels in execution plans

* admin/plugin: enable parallel execution for cancellation tests

* admin: refactor time.Time fields to pointers for better JSON omitempty support

* admin/plugin: implement pointer-safe time assignments and comparisons in plugin core

* admin/plugin: fix time assignment and sorting logic in plugin monitor after pointer refactor

* admin/plugin: update scheduler activity tracking to use time pointers

* admin/plugin: fix time-based run history trimming after pointer refactor

* admin/dash: fix JobSpec struct literal in plugin API after pointer refactor

* admin/view: add D/P prefixes to EC shard badges for UI consistency

* admin/plugin: use lifecycle-aware context for schema prefetching

* Update ec_volume_details_templ.go

* admin/stress: fix proposal sorting and log volume cleanup errors

* stress: refine ec stress runner with math/rand and collection name

- Added Collection field to VolumeEcShardsDeleteRequest for correct filename construction.
- Replaced crypto/rand with seeded math/rand PRNG for bulk payloads.
- Added documentation for EcMinAge zero-value behavior.
- Added logging for ignored errors in volume/shard deletion.

* admin: return internal server error for plugin store failures

Changed error status code from 400 Bad Request to 500 Internal Server Error for failures in GetPluginJobDetail to correctly reflect server-side errors.

* admin: implement safe channel sends and graceful shutdown sync

- Added sync.WaitGroup to Plugin struct to manage background goroutines.
- Implemented safeSendCh helper using recover() to prevent panics on closed channels.
- Ensured Shutdown() waits for all background operations to complete.
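
A recover()-based safe send might look like the sketch below. Only the helper name `safeSendCh` and its use of `recover()` come from the message; the signature, the string payload, and the non-blocking `select` are assumptions of this example.

```go
package main

import "fmt"

// safeSendCh attempts a non-blocking send and reports whether the value was
// delivered. A send on a closed channel panics, so the deferred recover turns
// that panic into a false result instead of crashing the caller.
func safeSendCh(ch chan<- string, msg string) (sent bool) {
	defer func() {
		if r := recover(); r != nil {
			sent = false
		}
	}()
	select {
	case ch <- msg:
		return true
	default:
		return false // receiver not ready and buffer full
	}
}

func main() {
	ch := make(chan string, 1)
	fmt.Println(safeSendCh(ch, "first"))  // buffer has room
	fmt.Println(safeSendCh(ch, "second")) // buffer is full
	close(ch)
	fmt.Println(safeSendCh(ch, "third")) // channel is closed
}
```

Recovering from the panic is a pragmatic guard during shutdown; it does not replace coordinating senders so that the channel is closed only after they stop.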

* admin: robustify plugin monitor with nil-safe time and record init

- Standardized nil-safe assignment for *time.Time pointers (CreatedAt, UpdatedAt, CompletedAt).
- Ensured persistJobDetailSnapshot initializes new records correctly if they don't exist on disk.
- Fixed debounced persistence to trigger immediate write on job completion.

* admin: improve scheduler shutdown behavior and logic guards

- Replaced brittle error string matching with explicit r.shutdownCh selection for shutdown detection.
- Removed redundant nil guard in buildScheduledJobSpec.
- Standardized WaitGroup usage for schedulerLoop.

* admin: implement deep copy for job parameters and atomic write fixes

- Implemented deepCopyGenericValue and used it in cloneTrackedJob to prevent shared state.
- Ensured atomicWriteFile creates parent directories before writing.
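
One common way to write a file atomically, including the parent-directory step mentioned above, is to write to a temporary file and rename it. The sketch below borrows the `atomicWriteFile` name from the message, but its signature and body are assumptions; `demoAtomicWrite` is a hypothetical helper for the example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// atomicWriteFile writes data to a temp file in the target directory and
// renames it over path, so readers see either the old or the new content.
func atomicWriteFile(path string, data []byte, perm os.FileMode) error {
	dir := filepath.Dir(path)
	// Create parent directories first so the temp file can be created.
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	tmp, err := os.CreateTemp(dir, filepath.Base(path)+".tmp-*")
	if err != nil {
		return err
	}
	tmpName := tmp.Name()
	defer os.Remove(tmpName) // harmless after a successful rename

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	if err := os.Chmod(tmpName, perm); err != nil {
		return err
	}
	return os.Rename(tmpName, path)
}

// demoAtomicWrite writes into a nested directory that does not exist yet and
// reads the content back.
func demoAtomicWrite() (string, error) {
	root, err := os.MkdirTemp("", "atomic-demo-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(root)
	target := filepath.Join(root, "jobs", "detail", "job.json")
	if err := atomicWriteFile(target, []byte("hello"), 0o644); err != nil {
		return "", err
	}
	content, err := os.ReadFile(target)
	return string(content), err
}

func main() {
	content, err := demoAtomicWrite()
	fmt.Println(content, err)
}
```

The temp file is created in the same directory as the target because a rename is only atomic within a single filesystem.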

* admin: remove unreachable branch in shard classification

Removed an unreachable 'totalShards <= 0' check in classifyShardID as dataShards and parityShards are already guarded.

* admin: secure UI links and use canonical shard constants

- Added rel="noopener noreferrer" to external links for security.
- Replaced magic number 14 with erasure_coding.TotalShardsCount.
- Used renderEcShardBadge for missing shard list consistency.

* admin: stabilize plugin tests and fix regressions

- Composed a robust plugin_monitor_test.go to handle asynchronous persistence.
- Updated all time.Time literals to use timeToPtr helper.
- Added explicit Shutdown() calls in tests to synchronize with debounced writes.
- Fixed syntax errors and orphaned struct literals in tests.

* Potential fix for code scanning alert no. 278: Slice memory allocation with excessive size value

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* Potential fix for code scanning alert no. 283: Uncontrolled data used in path expression

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* admin: finalize refinements for error handling, scheduler, and race fixes

- Standardized HTTP 500 status codes for store failures in plugin_api.go.
- Tracked scheduled detection goroutines with sync.WaitGroup for safe shutdown.
- Fixed race condition in safeSendDetectionComplete by extracting channel under lock.
- Implemented deep copy for JobActivity details.
- Used defaultDirPerm constant in atomicWriteFile.

* test(ec): migrate admin dockertest to plugin APIs

* admin/plugin_api: fix RunPluginJobTypeAPI to return 500 for server-side detection/filter errors

* admin/plugin_api: fix ExecutePluginJobAPI to return 500 for job execution failures

* admin/plugin_api: limit parseProtoJSONBody request body to 1MB to prevent unbounded memory usage

* admin/plugin: consolidate regex to package-level validJobTypePattern; add char validation to sanitizeJobID

* admin/plugin: fix racy Shutdown channel close with sync.Once

* admin/plugin: track sendLoop and recv goroutines in WorkerStream with r.wg

* admin/plugin: document writeProtoFiles atomicity — .pb is source of truth, .json is human-readable only

* admin/plugin: extract activityLess helper to deduplicate nil-safe OccurredAt sort comparators

* test/ec: check http.NewRequest errors to prevent nil req panics

* test/ec: replace deprecated ioutil/math/rand, fix stale step comment 5.1→3.1

* plugin(ec): raise default detection and scheduling throughput limits

* topology: include empty disks in volume list and EC capacity fallback

* topology: remove hard 10-task cap for detection planning

* Update ec_volume_details_templ.go

* adjust default

* fix tests

---------

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2026-02-18 13:42:41 -08:00
Chris Lu
f44e25b422 fix(iam): ensure access key status is persisted and defaulted to Active (#8341)
* Fix master leader election startup issue

Fixes #error-log-leader-not-selected-yet

* not useful test

* fix(iam): ensure access key status is persisted and defaulted to Active

* make pb

* update tests

* using constants
2026-02-13 20:28:41 -08:00