Mirror of https://github.com/seaweedfs/seaweedfs.git (synced 2026-05-14 05:41:29 +00:00)
* feat(s3/lifecycle): plumb RetentionWindow into dailyrun.Config
Adds a Config.RetentionWindow field that runShard threads into
engine.PromotedHash. Zero (the default) falls back to maxTTL, which
matches Phase 4a behavior — PromotedHash stays empty and the
partition-flip recovery trigger stays dormant.
Pure plumbing: the handler still passes zero, so nothing changes at
runtime. The walker work (Phase 4b proper) will set a real retention
from the meta-log boundary, at which point the partition-flip trigger
starts firing.
* feat(s3/lifecycle): WalkerDispatcher adapter for the daily-run walker
Phase 4b prep. Implements bootstrap.Dispatcher on top of LifecycleClient
so the same LifecycleDelete RPC drives both the meta-log replay path
and the walker. No CAS witness — the server's identityMatches treats
nil ExpectedIdentity as a bootstrap call and rebuilds the witness from
the live entry, which is the right contract for a full-tree walk.
Adds VersionID to bootstrap.Entry so versioned-bucket walks address
the right version. MPU init uses DestKey for ObjectPath (matching the
prefix-match contract); rejecting empty DestKey keeps malformed init
records out of the dispatch path.
Not wired yet — runShard still doesn't invoke the walker. Follow-up
commits add the ListFunc adapter and the recovery-branch wiring.
* feat(s3/lifecycle): wire Walker hook into runShard's recovery branch
Adds a Config.Walker callback that fires on rule-content edit /
partition flip BEFORE the cursor rewinds, so already-due objects across
the rewritten rule set get caught instead of waiting on meta-log
replay alone. The callback receives engine.RecoveryView(snap) and the
per-shard ID; nil disables it (Phase 4a behavior preserved).
The wiring is deliberately decoupled from the implementation: the
handler-side WalkerFunc that drives bootstrap.Walk via the filer lands
in the follow-up commit, and tests can stub the callback without
standing up the full filer/client/lister harness.
Tests pin: walker fires exactly once on hash mismatch, walker error
propagates and leaves the cursor unchanged, nil Walker is a no-op.
* feat(s3/lifecycle): WalkBuckets composes ListFunc + Dispatcher per shard
Adds dailyrun.WalkBuckets — the composable driver the handler-side
WalkerFunc will call. Iterates a bucket list, wraps the supplied
bootstrap.ListFunc with a per-shard filter (Path for non-MPU, DestKey
for MPU init), and runs bootstrap.Walk per bucket using the supplied
Dispatcher. First bucket error wins; remaining buckets log and run to
completion so one filer flake doesn't kill the shard.
Composable rather than monolithic so callers and tests can swap parts:
production uses a filer-backed ListFunc + WalkerDispatcher; tests use
bootstrap.EntryCallback + a stub. The filer-backed ListFunc is the
next commit.
Tests pin: shard filter routes only matching entries, MPU shard uses
DestKey not the .uploads/<id> path, single-bucket error propagates
while other buckets still run, ctx cancellation short-circuits between
buckets, nil guards on view/list/dispatch.
* feat(s3/lifecycle): filer-backed ListFunc for the daily-run walker
Phase 4b: dailyrun.FilerListFunc returns a bootstrap.ListFunc that
streams entries under <bucketsPath>/<bucket> by paginated SeaweedList.
Recurses into regular directories; .versions/ and .uploads/ are
skipped at this stage so they don't surface as raw children — the
sibling expansion (versioned NoncurrentDays state, MPU init dispatch)
lands in the next commit.
listAll and isVersionsDir are ported from scheduler/bootstrap.go's
same-named helpers. Phase 5 deletes the scheduler copies along with
the streaming path.
Tests pin: flat listing, recursion through nested directories,
.versions/ and .uploads/ skipped, kill-resume via the start path
contract, nil-client error, attribute propagation (mtime / size /
IsLatest default).
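A toy in-memory version of the recursion and skip rules, assuming a map-backed tree in place of paginated SeaweedList calls; the suffix check merely stands in for the real isVersionsDir helper, which is richer:

```go
package main

import (
	"fmt"
	"strings"
)

// node is an in-memory stand-in for a filer entry.
type node struct {
	isDir    bool
	children []string
}

func listTree(tree map[string]node, dir string, emit func(path string)) {
	for _, name := range tree[dir].children {
		full := dir + "/" + name
		if tree[full].isDir {
			// .versions/ and .uploads/ don't surface as raw children
			// at this stage; sibling expansion handles them later.
			if strings.HasSuffix(name, ".versions") || name == ".uploads" {
				continue
			}
			listTree(tree, full, emit) // recurse into regular dirs
			continue
		}
		emit(full)
	}
}

func main() {
	tree := map[string]node{
		"/buckets/b":            {isDir: true, children: []string{"a.txt", "photos", "k.versions", ".uploads"}},
		"/buckets/b/photos":     {isDir: true, children: []string{"p.jpg"}},
		"/buckets/b/k.versions": {isDir: true, children: []string{"v1"}},
		"/buckets/b/.uploads":   {isDir: true, children: []string{"id1"}},
	}
	var got []string
	listTree(tree, "/buckets/b", func(p string) { got = append(got, p) })
	fmt.Println(got)
}
```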
* feat(s3/lifecycle): versioned-sibling expansion in FilerListFunc
Adds the .versions/<key>/ expansion to the daily-run's filer-backed
ListFunc. Each call emits one bootstrap.Entry per sibling (real
version files + the bare null version, when found) with the same
sibling state the streaming bootstrap injects via reader.Event:
- Path = logical key (not the .versions/<file> physical path), so
bootstrap.Walk's MatchPath uses the user's intended path.
- VersionID per sibling (version_id or "null").
- IsLatest resolved via parent's ExtLatestVersionIdKey, falling back
to explicit-null-bare, falling back to newest-by-mtime.
- NoncurrentIndex rank computed against the latest's position.
- SuccessorModTime: SuccessorFromEntryStamp if stamped, else the
previous-newer sibling's mtime (legacy derivation).
- IsDeleteMarker from ExtDeleteMarkerKey.
- NumVersions = len(siblings).
Two-pass walk so .versions/ dirs run before regular files; the bare
null-version path is recorded in skipBare so pass 2 doesn't emit it
twice.
expandVersionsDir and lookupNullVersion are ported from
scheduler/bootstrap.go. Sort order, latest resolution, and successor
derivation must agree with that path verbatim so streaming and walker
reach the same verdict on the same objects. Phase 5 deletes the
scheduler copy.
MPU init (.uploads/<id>) remains skipped — the dedicated commit emits
it with IsMPUInit and DestKey.
Tests pin: pointer-wins latest resolution, no-pointer newest-sibling
fallback, explicit-null-is-latest with skipBare suppression of the
bare emission, coincidentally-named .versions folder recursing as a
regular subdir, delete-marker propagation.
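The SuccessorModTime derivation in the list above can be sketched as follows; the sibling record and helper name are hypothetical, with `stamp` modeling SuccessorFromEntryStamp and siblings ordered newest-first:

```go
package main

import "fmt"

// sib is a hypothetical sibling record in newest-first order.
type sib struct {
	stamp int64 // SuccessorFromEntryStamp value, 0 if unstamped
	mtime int64
}

// successorModTime follows the commit's derivation: the stamped value
// wins; otherwise fall back to the previous-newer sibling's mtime
// (legacy derivation). The newest sibling has no successor.
func successorModTime(siblings []sib, i int) int64 {
	if siblings[i].stamp != 0 {
		return siblings[i].stamp
	}
	if i == 0 {
		return 0 // newest: nothing succeeded it
	}
	return siblings[i-1].mtime
}

func main() {
	sibs := []sib{{0, 300}, {250, 200}, {0, 100}} // newest-first
	fmt.Println(successorModTime(sibs, 1), successorModTime(sibs, 2))
}
```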
* feat(s3/lifecycle): emit MPU init records from FilerListFunc
Last gap in the filer-backed ListFunc. A directory at .uploads/<id>
carrying ExtMultipartObjectKey is the MPU init record; emit one
bootstrap.Entry with IsMPUInit=true and DestKey set to the user's
intended path. The walker's MatchPath uses DestKey for prefix
matching; the WalkerDispatcher uses it for the LifecycleDelete RPC's
ObjectPath. .uploads/<id> directories without the extended key are
mid-write before metadata landed and stay skipped.
isMPUInitDir is upgraded from the path-shape-only stub to the full
shape + extended-attr check that mirrors router.mpuInitInfo and
scheduler/bootstrap.go's same-named helper.
Tests pin: valid init record emits with the right DestKey, missing
ExtMultipartObjectKey skips the directory.
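A sketch of the shape-plus-extended-attr check; the attribute name below is invented for illustration (the real key mirrors router.mpuInitInfo and the scheduler helper), and its value carries the user's intended DestKey:

```go
package main

import (
	"fmt"
	"path"
)

// extMultipartObjectKey is an illustrative name, not the real key.
const extMultipartObjectKey = "Ext-Multipart-Object-Key"

func isMPUInitDir(dirPath string, ext map[string][]byte) (destKey string, ok bool) {
	if path.Base(path.Dir(dirPath)) != ".uploads" {
		return "", false // wrong shape: not directly under .uploads/
	}
	v, has := ext[extMultipartObjectKey]
	if !has || len(v) == 0 {
		// Mid-write: the init directory exists but its metadata
		// hasn't landed yet, so the walker skips it.
		return "", false
	}
	return string(v), true
}

func main() {
	ext := map[string][]byte{extMultipartObjectKey: []byte("bucket/photo.jpg")}
	dk, ok := isMPUInitDir("/b/.uploads/id1", ext)
	fmt.Println(dk, ok)
}
```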
* feat(s3/lifecycle): wire walker into executeDailyReplay
Activates the recovery-branch walker. The handler composes the three
Phase 4b building blocks — FilerListFunc + WalkerDispatcher + WalkBuckets
— into a dailyrun.WalkerFunc and passes it via Config.Walker. The
bucket list is derived from the compiled inputs so it matches the
engine snapshot exactly.
Effect on master behavior: when a worker observes a RuleSetHash or
PromotedHash mismatch on its persisted cursor (rule content edited /
partition flip), runShard now walks the live filer tree under the
RecoveryView before rewinding the cursor. Already-due objects across
the rewritten rule set fire immediately instead of waiting on the
sliding meta-log replay.
Still scoped to replay-eligible action kinds because
checkSnapshotForUnsupported continues to reject walker-bound rules
(ExpirationDate / ExpiredDeleteMarker / NewerNoncurrent) and
scan_only-promoted rules at the top of Run. The follow-up commit
relaxes the gate once the steady-state walker over RulesForShard's
walk view is wired so those rules fire every day, not just on rule
edits.
* feat(s3/lifecycle): steady-state walker + drop unsupported-rule gate
Adds the second walker invocation in runShard. After the recovery
check passes, runShard derives the walk view via snap.RulesForShard
(using the same retentionWindow that PromotedHash used, so the
partition is consistent) and runs the walker over it. The view holds
walker-bound action kinds (ExpirationDate / ExpiredDeleteMarker /
NewerNoncurrent) plus any replay-eligible rules promoted to walk by
retention shortage; an empty view skips the call so non-versioned,
replay-only deployments don't pay an O(N) bucket walk per run.
With the walker now servicing every rule kind, checkSnapshotForUnsupported
and its UnsupportedRuleError type are obsolete. router.Route gates
replay on Mode == ModeEventDriven, so walker-bound and scan_only
rules are silently dropped by replay and picked up by the walker
instead — no double-dispatch. Drop the gate, delete replayability.go
+ replayability_test.go, and remove the handler's redundant
IsUnsupportedRule branch.
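A minimal sketch of the walk-view selection and the empty-view skip, assuming a hypothetical Rule shape (kind names follow the commit text; the real filter lives in snap.RulesForShard):

```go
package main

import "fmt"

// Rule is an illustrative stand-in; PromotedToWalk models promotion
// by retention shortage.
type Rule struct {
	Kind           string
	PromotedToWalk bool
}

var walkerBound = map[string]bool{
	"ExpirationDate":      true,
	"ExpiredDeleteMarker": true,
	"NewerNoncurrent":     true,
}

// walkView keeps walker-bound kinds plus any replay-eligible rules
// promoted to walk; everything else stays on the replay path.
func walkView(rules []Rule) []Rule {
	var out []Rule
	for _, r := range rules {
		if walkerBound[r.Kind] || r.PromotedToWalk {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	v := walkView([]Rule{{Kind: "ExpirationDays"}, {Kind: "NewerNoncurrent"}})
	if len(v) == 0 {
		fmt.Println("skip walk") // empty view: no O(N) bucket walk
	} else {
		fmt.Println("walk", len(v), "rules")
	}
}
```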
* fix(s3/lifecycle): walker dispatcher nil-response guard + retention-comment
Two PR-review fixes on PR 9459:
1. WalkerDispatcher.Delete used to panic on a (nil, nil) RPC return —
add a defensive nil-response check so the walk halts cleanly
instead. Spotted by coderabbit.
2. The retentionWindow=maxTTL comment in runShard claimed PromotedHash
"stays empty" in fallback mode, which gemini correctly pointed out
is only true once rules are active. During bootstrap (rules
compiled but IsActive=false) MaxEffectiveTTL is 0 while
PromotedHash counts every non-disabled rule, so promoted becomes
non-empty and the next post-activation run hits the recovery
branch. That's the intended bootstrap walk — rewrite the comment
to explain it rather than misstate the invariant.
Test: pins nil-response → error path on WalkerDispatcher.
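The guard in fix 1 can be sketched as follows; the response type is hypothetical (the real one comes from the lifecycle protobuf), and the point is only the defensive (nil, nil) check:

```go
package main

import (
	"errors"
	"fmt"
)

// DeleteResponse is an illustrative stand-in for the RPC response.
type DeleteResponse struct{ Outcome string }

// checkResp guards both failure modes: a transport error halts the
// walk with the RPC error, and a (nil, nil) return is converted into
// an error instead of being dereferenced later (the old panic).
func checkResp(resp *DeleteResponse, err error) error {
	if err != nil {
		return err
	}
	if resp == nil {
		return errors.New("lifecycle delete: nil response")
	}
	return nil
}

func main() {
	fmt.Println(checkResp(nil, nil)) // the case that used to panic
}
```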
* fix(s3/lifecycle): explicit stale-pointer fallback in versioned expansion
Reviewer caught a structural bug in expandVersionsDir's latest
resolution: when ExtLatestVersionIdKey was set but no scanned sibling
carried that id (stale pointer), the code left latestPos at the
default 0 without ever entering the no-pointer fallback. Today the
two paths yield the same value (newest sibling wins), but the
implicit fall-through makes the intent unclear and would break
silently if the no-pointer branch ever did anything more than
latestPos=0.
Track a pointerResolved flag explicitly so the no-pointer branch
(including the explicit-null-bare check) re-runs on a stale pointer.
Behavior unchanged today.
Test pins: stale pointer + two real versions falls back to
newest-sibling (vnew, not vold).
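The fixed resolution order can be sketched like this; names are illustrative, not the real expandVersionsDir helpers, and the explicit-null-bare check is collapsed into the fallback for brevity:

```go
package main

import "fmt"

type version struct {
	id    string
	mtime int64
}

// resolveLatestPos tracks pointerResolved explicitly so a stale
// pointer (set, but matching no scanned sibling) re-runs the
// no-pointer branch instead of silently keeping latestPos == 0.
func resolveLatestPos(siblings []version, pointer string) int {
	latestPos, pointerResolved := 0, false
	if pointer != "" {
		for i, s := range siblings {
			if s.id == pointer {
				latestPos, pointerResolved = i, true
				break
			}
		}
	}
	if !pointerResolved {
		// No-pointer branch: newest sibling by mtime wins.
		for i, s := range siblings {
			if s.mtime > siblings[latestPos].mtime {
				latestPos = i
			}
		}
	}
	return latestPos
}

func main() {
	sibs := []version{{"vold", 10}, {"vnew", 20}}
	fmt.Println(resolveLatestPos(sibs, "missing")) // stale pointer: newest sibling wins
}
```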
* feat(s3/lifecycle): walker-side dispatch metrics in WalkerDispatcher
Mirrors the Phase 6 instrumentation already on the replay side
(processMatches) onto the walker's Delete dispatch. Every walker
dispatch now bumps S3LifecycleDispatchCounter with the resolved
outcome (or TRANSPORT_ERROR / NIL_RESPONSE for the failure paths) so
streaming, daily_replay's replay drain, and daily_replay's walker
share a single per-(bucket, kind, outcome) counter view.
Lands together with the rest of Phase 4b — no new metric, just an
extra observation site for the existing one.
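The shared label scheme can be illustrated with a toy map-backed counter standing in for the Prometheus metric (the real S3LifecycleDispatchCounter is a prometheus counter vec; this only shows the per-(bucket, kind, outcome) keying all three dispatch paths share):

```go
package main

import "fmt"

// dispatchCounter is a toy stand-in keyed by (bucket, kind, outcome).
type dispatchCounter map[[3]string]int

func (c dispatchCounter) bump(bucket, kind, outcome string) {
	c[[3]string{bucket, kind, outcome}]++
}

func main() {
	c := dispatchCounter{}
	c.bump("photos", "Expiration", "DELETED")         // replay drain
	c.bump("photos", "Expiration", "TRANSPORT_ERROR") // walker failure path
	c.bump("photos", "Expiration", "DELETED")         // walker success
	fmt.Println(c[[3]string{"photos", "Expiration", "DELETED"}])
}
```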