seaweedfs/weed/worker
Chris Lu 4f79d8e358 feat(s3/lifecycle): bucket-level bootstrap walker (#9350)
* feat(worker): add TaskTypeS3Lifecycle constant

Single job type for the lifecycle worker; the S3LifecycleParams.Subtype
field (READ / BOOTSTRAP / DRAIN) dispatches inside the handler. The
"s3_lifecycle" string is already wired to LaneLifecycle in
admin/plugin/scheduler_lane.go so adding the constant doesn't change
runtime behavior — it lets future commits reference the type name
without sprinkling string literals.
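
A minimal sketch of the single-type, subtype-dispatched shape described above. Only the "s3_lifecycle" string, the TaskTypeS3Lifecycle name, and the READ / BOOTSTRAP / DRAIN subtypes come from this commit; the TaskType and Subtype Go types and the dispatch function are hypothetical stand-ins, not the actual worker API.

```go
package main

import "fmt"

// TaskType is a hypothetical stand-in for the worker's task type.
type TaskType string

// TaskTypeS3Lifecycle names the single lifecycle job type, so callers
// reference the constant instead of the "s3_lifecycle" literal.
const TaskTypeS3Lifecycle TaskType = "s3_lifecycle"

// Subtype mirrors the READ / BOOTSTRAP / DRAIN values; this Go
// representation is an assumption made for illustration.
type Subtype int

const (
	SubtypeRead Subtype = iota
	SubtypeBootstrap
	SubtypeDrain
)

// dispatch routes on the subtype inside one handler.
func dispatch(s Subtype) string {
	switch s {
	case SubtypeRead:
		return "read"
	case SubtypeBootstrap:
		return "bootstrap"
	case SubtypeDrain:
		return "drain"
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(TaskTypeS3Lifecycle, dispatch(SubtypeBootstrap))
}
```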

* feat(s3/lifecycle): bucket-level bootstrap walker

Iterates entries in a bucket, evaluates every active ActionKey in the
engine snapshot against each entry, and dispatches inline-delete for
currently-due actions. Date-kind actions and pending_bootstrap actions
are skipped — the former are handled by their own SCAN_AT_DATE
bootstrap, the latter aren't IsActive() yet.

Walker is callback-driven so callers supply the listing source
(real filer_pb.SeaweedList or test fake) and the dispatcher (real
LifecycleDelete client or test fake). This keeps the walker free of
filer_pb dependencies and makes the per-action evaluation flow
unit-testable in isolation.
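
The callback-driven shape might look roughly like the sketch below. The names ListFunc and Entry appear elsewhere in this message, but their signatures here, along with DispatchFunc, Walk's return values, and fakeList, are assumptions chosen to keep the example self-contained.

```go
package main

import "fmt"

// Entry is a hypothetical minimal listing entry.
type Entry struct{ Path string }

// ListFunc streams entries to fn; DispatchFunc acts on one entry.
// Both signatures are assumptions for this sketch.
type ListFunc func(fn func(Entry) error) error
type DispatchFunc func(e Entry) error

// Walk wires the listing to the dispatcher without knowing where
// either one comes from. It returns the number of dispatched entries.
func Walk(list ListFunc, dispatch DispatchFunc) (int, error) {
	n := 0
	err := list(func(e Entry) error {
		if err := dispatch(e); err != nil {
			return err
		}
		n++
		return nil
	})
	return n, err
}

// fakeList is a test fake that replays a fixed set of paths.
func fakeList(paths ...string) ListFunc {
	return func(fn func(Entry) error) error {
		for _, p := range paths {
			if err := fn(Entry{Path: p}); err != nil {
				return err
			}
		}
		return nil
	}
}

func main() {
	var deleted []string
	n, err := Walk(fakeList("a", "b"), func(e Entry) error {
		deleted = append(deleted, e.Path)
		return nil
	})
	fmt.Println(n, err, deleted)
}
```

Because both sides are plain function values, a unit test can swap in fakes like fakeList without importing any filer client.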

Checkpoint state (LastScannedPath, Completed) is returned to the
caller, who is responsible for persisting it under
/etc/s3/lifecycle/<bucket>/_bootstrap. Walk() honours opts.Resume so
a kill-resumed task picks up where the previous walker stopped.
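
A rough sketch of the checkpoint-and-resume flow, under stated assumptions: the Checkpoint fields use the names given above, but the walk function, its signature, and errBoom are hypothetical. The `p <= resume` filter stands in for the listing source, which in the real code is supplied by the caller.

```go
package main

import (
	"errors"
	"fmt"
)

// Checkpoint carries the state the caller is expected to persist.
type Checkpoint struct {
	LastScannedPath string
	Completed       bool
}

// errBoom is a hypothetical dispatch failure used by the demo.
var errBoom = errors.New("boom")

// walk visits sorted paths strictly after resume. On a dispatch error
// it returns the checkpoint of the last successfully handled path.
func walk(paths []string, resume string, dispatch func(string) error) (Checkpoint, error) {
	cp := Checkpoint{LastScannedPath: resume}
	for _, p := range paths {
		if p <= resume {
			continue // already covered by a previous run
		}
		if err := dispatch(p); err != nil {
			return cp, err
		}
		cp.LastScannedPath = p
	}
	cp.Completed = true
	return cp, nil
}

func main() {
	paths := []string{"a", "b", "c"}

	// First run fails on "b", so the checkpoint stays at "a".
	cp, err := walk(paths, "", func(p string) error {
		if p == "b" {
			return errBoom
		}
		return nil
	})
	fmt.Println(cp, err)

	// Second run resumes after "a" and finishes the bucket.
	cp, err = walk(paths, cp.LastScannedPath, func(string) error { return nil })
	fmt.Println(cp, err)
}
```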

Tests cover:

- prefix-mismatched skip
- not-yet-due skip (reader's job)
- date-kind skip
- pending_bootstrap skip
- multi-action rule: one rule with three actions dispatches three
  times (the regression that per-action keying fixes)
- dispatch error halts at the last-successful checkpoint
- Resume skips entries up to and including the resume path

* test(s3/lifecycle): walker test uses bucket-scoped ActionKey

Mechanical follow-up to the bucket-scoped ActionKey on
lifecycle-engine: the bootstrap walker tests construct ActionKeys to
seed PriorStates and need the Bucket field to match what
engine.Compile keys against.

* fix(s3/lifecycle): walker quick wins

Two minor cleanups noted on review:

- Drop the redundant Resume re-filter inside the Walk callback.
  ListFunc's contract already promises "skip entries with Path <=
  start"; trusting that contract avoids divergence if the filter
  logic ever changes on one side and not the other.

- Hoist the ObjectInfo allocation out of the per-action loop in
  walkEntry. Multi-action rules previously allocated one ObjectInfo
  per (entry, kind) pair; now it's one per entry, reused across all
  matching kinds.

* fix(s3/lifecycle): walker Entry.NoncurrentIndex tracks ObjectInfo's *int

ObjectInfo.NoncurrentIndex is now *int so unset is unambiguous;
mirror that on bootstrap.Entry so the per-entry construction stays
type-clean. Phase 5 (versioned-bucket walks) is the first caller
that will populate the field.
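
To illustrate why a pointer makes "unset" unambiguous: with a plain int, index 0 and "never populated" are indistinguishable. The struct below is a hypothetical one-field stand-in for ObjectInfo, and describe is an illustrative helper, not part of the codebase.

```go
package main

import "fmt"

// ObjectInfo is a one-field stand-in; nil means "not set".
type ObjectInfo struct {
	NoncurrentIndex *int
}

// describe distinguishes an unset index from an explicit zero.
func describe(o ObjectInfo) string {
	if o.NoncurrentIndex == nil {
		return "unset"
	}
	return fmt.Sprintf("index=%d", *o.NoncurrentIndex)
}

func main() {
	zero := 0
	fmt.Println(describe(ObjectInfo{}))
	fmt.Println(describe(ObjectInfo{NoncurrentIndex: &zero}))
}
```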

* refactor(s3/lifecycle): trim narration from bootstrap walker

Drop the inline step-by-step on Walk and the multi-paragraph package
preamble; the function names already say it. Keep one-liner WHYs at
the SCAN_AT_DATE skip and the once-per-entry ObjectInfo build.

* fix(s3/lifecycle): walker skips directories and ModeDisabled actions

Two safety findings from review:

1. SeaweedFS directory entries can appear in the listing alongside
   objects; without an IsDirectory check the walker would treat a dir
   like any other entry and could dispatch a delete against it. Add
   IsDirectory to bootstrap.Entry and short-circuit it before
   walkEntry.

2. ModeDisabled is set by the operator (e.g. shell pause) independent
   of the XML rule's Status field. EvaluateAction gates on Status and
   would still fire for an operator-disabled action whose XML status
   is "Enabled". Skip ModeDisabled explicitly in walkEntry alongside
   the existing SCAN_AT_DATE skip.

Two regression tests pin both cases.
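
The two guards can be sketched as follows. IsDirectory and ModeDisabled are named in this commit; the Mode type, ModeActive, the Action struct with its DateKind flag, and dueActions are assumptions standing in for the real engine types.

```go
package main

import "fmt"

// Mode is a hypothetical operator-controlled state, separate from
// the XML rule's Status field.
type Mode int

const (
	ModeActive Mode = iota
	ModeDisabled
)

// Action is a simplified stand-in for a compiled lifecycle action.
type Action struct {
	Name     string
	Mode     Mode
	DateKind bool // handled by a separate SCAN_AT_DATE bootstrap
}

// Entry is a simplified listing entry.
type Entry struct {
	Path        string
	IsDirectory bool
}

// dueActions returns the actions the walker would evaluate for e.
func dueActions(e Entry, actions []Action) []string {
	if e.IsDirectory {
		return nil // never dispatch a delete against a directory
	}
	var out []string
	for _, a := range actions {
		if a.DateKind || a.Mode == ModeDisabled {
			continue // skipped regardless of the rule's XML status
		}
		out = append(out, a.Name)
	}
	return out
}

func main() {
	actions := []Action{
		{Name: "expire"},
		{Name: "paused", Mode: ModeDisabled},
		{Name: "at-date", DateKind: true},
	}
	fmt.Println(dueActions(Entry{Path: "dir/", IsDirectory: true}, actions))
	fmt.Println(dueActions(Entry{Path: "dir/obj"}, actions))
}
```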

* perf(s3/lifecycle): reuse ObjectInfo across walker entries

Walker allocated one ObjectInfo struct per entry. For buckets with
millions of objects that's measurable GC pressure. Hoist the
allocation out of the per-entry callback (one per Walk) and reuse
via field assignment in walkEntry.

EvaluateAction reads ObjectInfo synchronously and doesn't retain a
reference, so the reuse is safe — the next iteration's overwrite
can't corrupt an in-flight evaluation.
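
A sketch of the reuse pattern, assuming a simplified ObjectInfo with two fields and a hypothetical evaluate function that, like EvaluateAction as described above, reads the struct synchronously and keeps no reference to it.

```go
package main

import "fmt"

// ObjectInfo is a simplified stand-in for the engine's struct.
type ObjectInfo struct {
	Key  string
	Size int64
}

type entry struct {
	path string
	size int64
}

// evaluate reads info synchronously and retains no pointer to it.
func evaluate(info *ObjectInfo) bool {
	return info.Size > 100
}

// walk allocates one ObjectInfo and overwrites it for every entry.
func walk(entries []entry) []string {
	info := &ObjectInfo{} // one allocation per walk, not per entry
	var due []string
	for _, e := range entries {
		// Overwrite the whole struct so no field leaks from the
		// previous entry.
		*info = ObjectInfo{Key: e.path, Size: e.size}
		if evaluate(info) {
			due = append(due, info.Key)
		}
	}
	return due
}

func main() {
	fmt.Println(walk([]entry{{"small", 10}, {"big", 500}}))
}
```

Assigning a fresh struct value through the pointer, rather than setting fields one by one, is a design choice in this sketch: it guards against a stale field surviving from the previous entry if a new field is added later.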

* refactor(s3/lifecycle): trim narration on walker

Drop the multi-line Entry / ObjectInfo-reuse / SCAN_AT_DATE+DISABLED
explanations. The walker's structure is small enough that the
condition itself reads as the documentation.
2026-05-07 15:04:51 -07:00