Chris Lu 7f254e158e feat(worker/s3_lifecycle): plugin handler with admin UI config (#9362)
* feat(s3/lifecycle): scheduler — N pipelines over an even shard split

Scheduler.Run spawns one Pipeline goroutine per configured worker
(Workers) plus an engine-refresh ticker. Each worker owns a contiguous
AssignShards(idx, total) slice of
[0, ShardCount) and runs Pipeline.Run with EventBudget bounding each
iteration; a brief RetryBackoff between iterations avoids hot-looping on
errors. The refresh ticker rebuilds the engine snapshot from the filer's
bucket configs every RefreshInterval.
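The even shard split can be sketched as follows. This is a stand-alone sketch, not the real scheduler code: the actual AssignShards takes (idx, total) with ShardCount from package scope, and the early-worker remainder handling is an assumption about how "even" is implemented.

```go
package main

import "fmt"

// AssignShards returns the contiguous half-open range [start, end) of
// shard IDs owned by worker idx out of total workers, splitting
// shardCount as evenly as possible. The first shardCount%total workers
// each take one extra shard.
func AssignShards(idx, total, shardCount int) (start, end int) {
	base := shardCount / total
	extra := shardCount % total
	if idx < extra {
		start = idx * (base + 1)
		end = start + base + 1
	} else {
		start = extra*(base+1) + (idx-extra)*base
		end = start + base
	}
	return start, end
}

func main() {
	// 3 workers over 8 shards: [0,3), [3,6), [6,8)
	for idx := 0; idx < 3; idx++ {
		s, e := AssignShards(idx, 3, 8)
		fmt.Printf("worker %d: [%d,%d)\n", idx, s, e)
	}
}
```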

LoadCompileInputs / IsBucketVersioned / AllActivePriorStates are
exported from a configload.go sibling so the shell command can move to
this shared implementation in a follow-up.

* refactor(shell): reuse scheduler.LoadCompileInputs in run-shard

Drop the local copies of loadLifecycleCompileInputs / isBucketVersioned
/ allActivePriorStates / lifecycleParseError that the new
scheduler package now exports. Same behavior, one source of truth.

* feat(worker/s3_lifecycle): plugin handler with admin UI config

Registers a JobHandler for s3_lifecycle via pluginworker.RegisterHandler.
Admin pulls the descriptor over the worker plugin gRPC and renders the
AdminConfigForm + WorkerConfigForm in the existing UI:

  Admin form (cluster shape):
    - workers (1..16, default 1)
    - s3_grpc_endpoints (comma list)

  Worker form (operational tuning):
    - dispatch_tick_ms (default 5000)
    - checkpoint_tick_ms (default 30000)
    - refresh_interval_ms (default 300000)
    - event_budget (default 0 = unbounded)
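Parsing the form values with the defaults above can be sketched like this; the map[string]string shape and the helper name are assumptions for illustration (the real descriptor type comes from the worker plugin proto):

```go
package main

import (
	"fmt"
	"strconv"
)

// getIntOrDefault reads one integer setting from a descriptor's
// key/value map, falling back to the stated default when the key is
// absent or malformed.
func getIntOrDefault(values map[string]string, key string, def int) int {
	if v, ok := values[key]; ok {
		if n, err := strconv.Atoi(v); err == nil {
			return n
		}
	}
	return def
}

func main() {
	form := map[string]string{"dispatch_tick_ms": "2500"}
	fmt.Println(getIntOrDefault(form, "dispatch_tick_ms", 5000)) // operator override
	fmt.Println(getIntOrDefault(form, "event_budget", 0))        // default: unbounded
}
```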

Detect emits a single proposal whenever S3 endpoints + filer addresses
are configured. MaxExecutionConcurrency=1 so the admin only ever runs one
lifecycle daemon per worker; a fresh proposal next cycle restarts it
if the prior Execute exits.
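The Detect rule reduces to one predicate, sketched here with assumed parameter names (however the two address lists are sourced):

```go
package main

import "fmt"

// shouldPropose mirrors the Detect rule: emit a single proposal only
// when both S3 endpoints and filer addresses are configured.
func shouldPropose(s3Endpoints, filerAddresses []string) bool {
	return len(s3Endpoints) > 0 && len(filerAddresses) > 0
}

func main() {
	fmt.Println(shouldPropose([]string{"s3:18888"}, []string{"filer:18888"})) // true
	fmt.Println(shouldPropose(nil, []string{"filer:18888"}))                  // false
}
```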

Execute dials the configured S3 endpoint + filer, builds a
scheduler.Scheduler with the parsed config, and runs it until
ctx cancellation. Reuses the existing scheduler / dispatcher /
reader / engine packages — the handler is the thin glue that
parses descriptor values and wires the long-running daemon.

* proto(plugin): add s3_grpc_addresses to ClusterContext

So workers can dial s3 servers discovered by the master rather than a
hand-typed list in the admin form.

* feat(admin): populate ClusterContext.s3_grpc_addresses from master

ListClusterNodes(S3Type) returns the live S3 servers; the plugin
scheduler now hands these to job handlers alongside filer/volume
addresses.

* feat(worker/s3_lifecycle): discover s3 endpoints from cluster context

Drop the s3_grpc_endpoints admin form field and read the master-supplied
ClusterContext.S3GrpcAddresses instead. Operators no longer maintain a
hand-typed list, and a stale entry self-heals when the master's view
updates.

* feat(worker/s3_lifecycle): time-based runtime cap, friendlier cadence units

- dispatch_tick_minutes (was *_ms): minutes is the natural granularity
  for a daily batch; default 1 minute.
- checkpoint_tick_seconds: seconds for the durable cursor write; default
  30 seconds.
- refresh_interval_minutes: minutes for the engine snapshot rebuild.
- max_runtime_minutes replaces event_budget. Each daily run is bounded
  by wall clock; a typical run finishes in well under an hour because the
  cursor persists and the meta-log streams fast. Default 60 minutes.
- AdminRuntimeDefaults.DetectionIntervalSeconds = 86400 so the admin
  schedules one job per day.
2026-05-08 10:30:02 -07:00