Chris Lu 7f2b20d577 feat(s3/lifecycle): policy engine — XML conversion, Compile, decideMode, Match (#9348)
* feat(s3/lifecycle): XML lifecycle config to canonical Rule

LifecycleToCanonical takes a parsed *Lifecycle and returns
[]*s3lifecycle.Rule, the flat shape the engine compiles against.
Filter resolution mirrors AWS: <And> sub-elements (Prefix + Tags +
size filters) flatten into the canonical Rule's individual fields;
a single <Tag> filter populates FilterTags with one entry; a <Prefix>
filter takes precedence over the rule's top-level <Prefix>.

Multi-action rules (Expiration + NoncurrentVersion + AbortMPU on
the same XML <Rule>) populate every action field they declare.
RuleActionKinds expands the canonical rule into its compiled actions
downstream.

* feat(s3/lifecycle): engine snapshot skeleton + ActionKey type

Defines s3lifecycle.ActionKey{rule_hash, action_kind} as the engine's
primary identity, and adds the engine package's Snapshot type.
Snapshot is immutable after Compile (atomic-swapped on rebuild) and
holds the ActionKey-keyed routing indexes:

  - originalDelayGroups: map[time.Duration][]ActionKey
  - predicateActions:    []ActionKey
  - dateActions:         map[ActionKey]time.Time
  - actions:             map[ActionKey]*CompiledAction

CompiledAction.engineState is an atomic.Uint32 so MarkActive (called
after the durable bootstrap_complete + mode write commits) is visible
to in-flight reader passes without a recompile. The reader filters on
IsActive() before dispatching, so stale-snapshot dispatches are
prevented.
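A minimal sketch of the activation bit, assuming string keys in place of ActionKey and a hypothetical 0/1 encoding for engineState (the real values are not shown here). It illustrates why the field is atomic: a reader pass holding the same snapshot sees a MarkActive flip on its next IsActive() check, with no recompile.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Hypothetical engineState encoding.
const (
	stateInactive uint32 = iota
	stateActive
)

// CompiledAction keeps its activation bit in an atomic so it can flip
// while reader passes still hold the snapshot that contains it.
type CompiledAction struct {
	Name        string
	engineState atomic.Uint32
}

func (a *CompiledAction) MarkActive()    { a.engineState.Store(stateActive) }
func (a *CompiledAction) IsActive() bool { return a.engineState.Load() == stateActive }

// Snapshot is immutable after compile apart from each action's engineState.
type Snapshot struct {
	actions map[string]*CompiledAction // string stands in for ActionKey
}

// dispatchable is the reader-side filter: only active actions fire.
func (s *Snapshot) dispatchable(keys []string) []string {
	var out []string
	for _, k := range keys {
		if a, ok := s.actions[k]; ok && a.IsActive() {
			out = append(out, k)
		}
	}
	return out
}

func main() {
	snap := &Snapshot{actions: map[string]*CompiledAction{"expire": {Name: "expire"}}}
	fmt.Println(snap.dispatchable([]string{"expire"})) // [] before MarkActive
	snap.actions["expire"].MarkActive()                 // e.g. after the durable write commits
	fmt.Println(snap.dispatchable([]string{"expire"})) // [expire]
}
```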

No callers yet; downstream commits add Compile, decideMode, and the
Match functions.

* feat(s3/lifecycle): decideMode + retention gate

decideMode picks the scheduling mode for one (rule, kind) compiled
action. Disabled rule -> DISABLED; EXPIRATION_DATE -> SCAN_AT_DATE;
reader-driven kind whose eventLogHorizon + bootstrapLookbackMin
exceeds metaLogRetention -> SCAN_ONLY; otherwise EVENT_DRIVEN. The
gate runs per (rule, kind), so a 90d ExpirationDays sibling can
degrade to scan_only while its 7d AbortMPU sibling stays active.

MetaLogRetention=0 is treated as "unbounded", matching the SeaweedFS
default (Phase 0 verified that meta-log files are written without
TtlSec by default), so the gate doesn't trip until an operator opts
in to volume-TTL pruning of /topics/.system/log/.

RuleMode is a Go-level enum here, separate from the wire-form
LifecycleState.RuleMode in the proto package; the worker maps between
them when reading/writing the durable state file.

* feat(s3/lifecycle): Compile builds the engine snapshot per-action

Compile produces a fresh Snapshot from per-bucket canonical rules.
Each input rule expands into N CompiledActions via RuleActionKinds;
mode comes from decideMode; activation requires both
bootstrap_complete (from PriorStates) and mode==EVENT_DRIVEN.

Routing indexes are populated by mode:
- SCAN_AT_DATE: always indexed in dateActions (detector schedules at
  rule.date regardless of bootstrap status; the action runs once on
  the date and is then done).
- EVENT_DRIVEN + active: indexed in originalDelayGroups (and in
  predicateActions when the rule has tag/size filters).
- SCAN_ONLY / DISABLED / pending_bootstrap: not indexed; safety-scan
  tick or operator action handle these.

snapshot_id is monotonic per process; pending writes stamp it. The
new snapshot replaces the engine's atomic pointer; in-flight reader
passes continue against their loaded snapshot.
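A minimal sketch of the publish/load pattern, assuming a hypothetical Engine holder built on sync/atomic's Uint64 and Pointer types (Go 1.19+). Each Compile stamps the next monotonic id and swaps the pointer; a reader pass that loaded the old pointer keeps using it until it finishes.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Snapshot is never mutated after publication.
type Snapshot struct {
	ID      uint64
	Actions []string
}

// Engine is a hypothetical holder for the current snapshot pointer.
type Engine struct {
	nextID atomic.Uint64
	cur    atomic.Pointer[Snapshot]
}

// Compile builds a fresh snapshot with the next monotonic id and swaps it in.
func (e *Engine) Compile(actions []string) *Snapshot {
	s := &Snapshot{
		ID:      e.nextID.Add(1),
		Actions: append([]string(nil), actions...), // own copy of the input
	}
	e.cur.Store(s)
	return s
}

// Load is called once at the start of a reader pass; the pass keeps that
// pointer even if Compile publishes a newer snapshot meanwhile.
func (e *Engine) Load() *Snapshot { return e.cur.Load() }

func main() {
	e := &Engine{}
	e.Compile([]string{"expire"})
	pass := e.Load() // in-flight reader pass
	e.Compile([]string{"expire", "abort_mpu"})
	fmt.Println(pass.ID, len(pass.Actions), e.Load().ID) // 1 1 2
}
```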

Tests cover: single-action rule, multi-action expansion (one rule ->
three CompiledActions with three distinct delay groups), pending
bootstrap exclusion from indexes, retention gate, sibling actions
degrading independently under partial retention, ExpirationDate path,
disabled rule, MarkActive flipping IsActive(), Compile producing
monotonic snapshot ids.

* feat(s3/lifecycle): MatchOriginalWrite / MatchPredicateChange / MatchPath

The reader feeds events through the engine's match functions to find
the active ActionKeys whose filter applies. The minimal Event shape
the engine takes (bucket, path, tags, size, IsLatest, IsDeleteMarker,
IsMPUInit) keeps the engine free of filer_pb dependencies; the reader
extracts these fields from the persisted *filer_pb.LogEntry payload
in Phase 3.

- MatchOriginalWrite: per-delay-group sweep entry. Filters on shape =
  EventShapeOriginalWrite, prefix, tag, size, then per-kind shape
  gating (ABORT_MPU only on IsMPUInit; EXPIRED_DELETE_MARKER only on
  IsLatest+IsDeleteMarker).
- MatchPredicateChange: single near-now sweep. Returns only the
  predicate-sensitive subset of active ActionKeys.
- MatchPath: bucket-level walker entry. Returns every active action
  whose filter matches; bootstrap iterates these per object and calls
  EvaluateAction per kind.

All three filter on a.IsActive() at routing time, so MarkActive flips
become visible without a recompile.
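A sketch of MatchOriginalWrite's filter-then-shape order. Event, Action and the helper functions are hypothetical simplifications: the bucket is omitted, a plain Active bool stands in for IsActive(), the event is assumed to already be an original write, and size thresholds are treated as exclusive bounds with 0 meaning unset. Kinds other than ABORT_MPU and EXPIRED_DELETE_MARKER pass the shape gate in this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

type ActionKind int

const (
	KindExpirationDays ActionKind = iota
	KindAbortMPU
	KindExpiredDeleteMarker
)

// Event mirrors the minimal engine-facing shape (bucket omitted here).
type Event struct {
	Path           string
	Tags           map[string]string
	Size           int64
	IsLatest       bool
	IsDeleteMarker bool
	IsMPUInit      bool
}

// Action is a hypothetical flattened CompiledAction.
type Action struct {
	Kind       ActionKind
	Prefix     string
	FilterTags map[string]string
	SizeGT     int64 // 0 = unset
	SizeLT     int64 // 0 = unset
	Active     bool
}

func filterMatches(a *Action, ev *Event) bool {
	if !strings.HasPrefix(ev.Path, a.Prefix) {
		return false
	}
	for k, v := range a.FilterTags {
		if got, ok := ev.Tags[k]; !ok || got != v {
			return false
		}
	}
	if a.SizeGT > 0 && ev.Size <= a.SizeGT {
		return false
	}
	if a.SizeLT > 0 && ev.Size >= a.SizeLT {
		return false
	}
	return true
}

// shapeMatches applies the per-kind gating.
func shapeMatches(kind ActionKind, ev *Event) bool {
	switch kind {
	case KindAbortMPU:
		return ev.IsMPUInit
	case KindExpiredDeleteMarker:
		return ev.IsLatest && ev.IsDeleteMarker
	default:
		return true
	}
}

func matchOriginalWrite(actions []*Action, ev *Event) []*Action {
	var out []*Action
	for _, a := range actions {
		if a.Active && filterMatches(a, ev) && shapeMatches(a.Kind, ev) {
			out = append(out, a)
		}
	}
	return out
}

func main() {
	abort := &Action{Kind: KindAbortMPU, Prefix: "uploads/", Active: true}
	marker := &Action{Kind: KindExpiredDeleteMarker, Active: true}
	ev := &Event{Path: "uploads/big.bin", IsMPUInit: true}
	fmt.Println(len(matchOriginalWrite([]*Action{abort, marker}, ev))) // 1: marker fails shape gating
}
```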

* fix(s3/lifecycle): scope ActionKey by bucket; defensive copies; tidy compile

Addresses three review findings on the engine PR:

1. Critical (cross-bucket collision): ActionKey was {RuleHash, ActionKind}
   only. Two buckets with rules whose XML is identical produce the same
   RuleHash; the second bucket's Compile would overwrite the first
   bucket's CompiledAction in snap.actions. Add Bucket to ActionKey
   so the engine's identity matches the on-disk path layout
   /etc/s3/lifecycle/<bucket>/<rule_hash>/<action_kind>/. Regression
   test pins it.

2. Major (immutability leak): OriginalDelayGroups, PredicateActions,
   DateActions returned the snapshot's internal maps/slices by
   reference, letting an external caller mutate routing state and
   break the documented immutability contract. Return defensive
   copies.

3. Minor (redundant condition): mode==EVENT_DRIVEN already implies
   kind != EXPIRATION_DATE because decideMode routes the date kind
   to SCAN_AT_DATE. Drop the redundant check.

Tests updated to construct ActionKey with the new Bucket field.

* fix(s3/lifecycle): drop size filters from rulePredicateSensitive

An object's size is immutable once written: any content change is a
fresh write that flows through the original-write stream, not the
predicate-change one. Rules with tag filters really can flip post-PUT
(an operator adds or removes a tag without rewriting the object), so
they belong; size filters do not.

Including size filters here added rules to predicateActions for no
purpose: every predicate-change sweep wasted cycles re-evaluating
size predicates that physically can't have changed.
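A minimal sketch of the narrowed predicate, assuming a hypothetical Rule with only the filter fields that matter and a hypothetical predicateActions helper that collects rule ids. A size-only rule no longer lands in the predicate-change set; a rule with both tags and size stays in because of its tags.

```go
package main

import (
	"fmt"
	"sort"
)

// Rule is a hypothetical stand-in with only the filter fields that matter here.
type Rule struct {
	FilterTags            map[string]string
	FilterSizeGreaterThan int64
	FilterSizeLessThan    int64
}

// rulePredicateSensitive: only tag filters can change without a fresh write;
// size is fixed once the object is written, so size filters are ignored.
func rulePredicateSensitive(r *Rule) bool {
	return len(r.FilterTags) > 0
}

// predicateActions collects the rule ids the predicate-change sweep must consider.
func predicateActions(rules map[string]*Rule) []string {
	var ids []string
	for id, r := range rules {
		if rulePredicateSensitive(r) {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids) // map iteration order is random
	return ids
}

func main() {
	rules := map[string]*Rule{
		"tags":      {FilterTags: map[string]string{"tier": "cold"}},
		"size-only": {FilterSizeGreaterThan: 1 << 20},
		"both":      {FilterTags: map[string]string{"tier": "cold"}, FilterSizeLessThan: 4096},
	}
	fmt.Println(predicateActions(rules)) // [both tags]
}
```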

* perf(s3/lifecycle): pre-sort AllActions at Compile time

Snapshot is immutable after Compile (engineState bit-flips don't
change membership), so the (bucket, rule_hash, action_kind) ordering
is stable for the snapshot's lifetime. Build the sorted slice once
and serve every AllActions() call from it; drop the per-call
sort.Slice. The bootstrap walker is the primary caller and may
iterate this on every task entry.
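A sketch of sorting once at compile time, with a simplified ActionKey and a hypothetical compile function. Whether the real AllActions returns the shared slice or a copy is not stated; this sketch returns a copy (O(n) per call rather than an O(n log n) sort) so callers cannot reorder the snapshot's slice.

```go
package main

import (
	"fmt"
	"sort"
)

type ActionKey struct {
	Bucket     string
	RuleHash   string
	ActionKind int
}

// keyLess orders by (bucket, rule_hash, action_kind).
func keyLess(a, b ActionKey) bool {
	if a.Bucket != b.Bucket {
		return a.Bucket < b.Bucket
	}
	if a.RuleHash != b.RuleHash {
		return a.RuleHash < b.RuleHash
	}
	return a.ActionKind < b.ActionKind
}

type Snapshot struct {
	actions map[ActionKey]struct{}
	sorted  []ActionKey // built once; membership never changes after compile
}

func compile(keys []ActionKey) *Snapshot {
	s := &Snapshot{actions: make(map[ActionKey]struct{}, len(keys))}
	for _, k := range keys {
		s.actions[k] = struct{}{}
	}
	s.sorted = make([]ActionKey, 0, len(s.actions))
	for k := range s.actions {
		s.sorted = append(s.sorted, k)
	}
	sort.Slice(s.sorted, func(i, j int) bool { return keyLess(s.sorted[i], s.sorted[j]) })
	return s
}

// AllActions serves the precomputed order without re-sorting.
func (s *Snapshot) AllActions() []ActionKey {
	return append([]ActionKey(nil), s.sorted...)
}

func main() {
	s := compile([]ActionKey{{"b", "h2", 1}, {"a", "h9", 0}, {"a", "h1", 2}, {"a", "h1", 1}})
	fmt.Println(s.AllActions())
}
```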

* docs(s3/lifecycle): note the FilterSizeGreaterThan=0 ambiguity

Per AWS S3 spec, <ObjectSizeGreaterThan>0</ObjectSizeGreaterThan>
explicitly excludes 0-byte objects, but with the int64 zero value as
the unset sentinel we can't distinguish that from omitted-and-default.
Document the limitation inline so a future deployment that needs the
distinction can switch to *int64 (or a paired set-bool) and update
the matchers / RuleHash accordingly. Not fixing now: the explicit-zero
configuration is unusual, the canonical Rule shape mirrors the same
zero-as-unset convention as s3api.Filter, and a structural fix
touches every filter-using site (evaluator, due_at, match, RuleHash).
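A small illustration of the ambiguity, using two hypothetical matcher helpers. The first models the current zero-as-unset convention; the second models the possible *int64 fix mentioned above, which is not what the code does today.

```go
package main

import "fmt"

// matchSizeGTZeroUnset models the current convention: 0 doubles as "unset".
func matchSizeGTZeroUnset(limit, size int64) bool {
	if limit == 0 {
		return true // unset, or an explicit 0: indistinguishable
	}
	return size > limit
}

// matchSizeGTPtr models the possible *int64 fix: nil is unset, and an
// explicit 0 excludes 0-byte objects as the AWS spec requires.
func matchSizeGTPtr(limit *int64, size int64) bool {
	if limit == nil {
		return true
	}
	return size > *limit
}

func main() {
	zero := int64(0)
	fmt.Println(matchSizeGTZeroUnset(0, 0)) // true: explicit 0 still matches an empty object
	fmt.Println(matchSizeGTPtr(&zero, 0))   // false
	fmt.Println(matchSizeGTPtr(nil, 0))     // true
}
```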

* fix(s3/lifecycle): make ObjectInfo.NoncurrentIndex *int

The previous int field had a zero-value collision: 0 is both "newest
non-current version" (a valid index) and "uninitialised by ObjectInfo{}
literal." A caller who built &ObjectInfo{IsLatest: false} without
explicitly setting NoncurrentIndex would have it implicitly read as
"newest non-current," and the count-based NewerNoncurrent retention
would use that bogus 0 to decide eligibility.

Switch to *int so nil is explicitly "not a non-current version /
index not yet computed." The evaluator's NoncurrentDays and
NewerNoncurrent paths conservatively return ActionNone when the
index is nil — the safety scan will revisit once the index is
supplied. This removes a class of latent footguns in test setup and
in any future code path that constructs ObjectInfo without a
versioning-aware builder.

An idx() helper is added in tests to keep each call site to one line.
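A sketch of the nil-index short-circuit, assuming hypothetical ObjectInfo, Rule and evaluateNoncurrent shapes rather than the real evaluator. DaysNoncurrent is an invented field for illustration, and treating index < NewerNoncurrent as "inside the keep-N window" is an assumption about the retention semantics. Note that the zero-value ObjectInfo{} now has a nil index and is a no-op, which is the footgun the change removes.

```go
package main

import "fmt"

type Action int

const (
	ActionNone Action = iota
	ActionDeleteVersion
)

// ObjectInfo is a hypothetical subset of the real struct.
type ObjectInfo struct {
	IsLatest        bool
	NoncurrentIndex *int // 0 = newest noncurrent; nil = not noncurrent / not computed
	DaysNoncurrent  int  // hypothetical: days since the version became noncurrent
}

type Rule struct {
	NoncurrentDays  int // 0 = unset
	NewerNoncurrent int // keep this many newest noncurrent versions; 0 = unset
}

func evaluateNoncurrent(r Rule, o ObjectInfo) Action {
	if o.IsLatest || (r.NoncurrentDays == 0 && r.NewerNoncurrent == 0) {
		return ActionNone
	}
	if o.NoncurrentIndex == nil {
		return ActionNone // conservative: a later safety scan revisits once the index is known
	}
	if r.NewerNoncurrent > 0 && *o.NoncurrentIndex < r.NewerNoncurrent {
		return ActionNone // inside the keep-N window
	}
	if r.NoncurrentDays > 0 && o.DaysNoncurrent < r.NoncurrentDays {
		return ActionNone // not old enough yet
	}
	return ActionDeleteVersion
}

// idx mirrors the test helper: a one-line way to get an *int.
func idx(i int) *int { return &i }

func main() {
	r := Rule{NoncurrentDays: 30, NewerNoncurrent: 2}
	fmt.Println(evaluateNoncurrent(r, ObjectInfo{DaysNoncurrent: 90}) == ActionNone)
	fmt.Println(evaluateNoncurrent(r, ObjectInfo{NoncurrentIndex: idx(2), DaysNoncurrent: 90}) == ActionDeleteVersion)
}
```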

* refactor(s3/lifecycle): trim narration from engine + helpers

Drop "what" comments where well-named identifiers already say it
(IsActive, MarkActive, AllActions, etc.); collapse multi-paragraph
"why" docs to one-liners where the design rationale is already in
the design doc. Keep WHY comments only at non-obvious load-bearing
spots: the routing-index activation predicate, the *int rationale on
NoncurrentIndex, the field-tag namespace in RuleHash, the SmallDelay
horizon rule.

Files: action_kind.go, rule.go, rule_hash.go, evaluate.go, due_at.go,
min_trigger_age.go, event_log_horizon.go, engine/engine.go,
engine/compile.go, engine/match.go, engine/mode.go.

No behavior change; tests untouched and pass.

* fix(s3/lifecycle): durable PriorState.Mode wins over decideMode

PriorState.Mode was declared but never read; Compile recomputed mode
via decideMode and stored that on every CompiledAction. Effect: an
action durably persisted as SCAN_ONLY (lag fallback or operator
pause) or DISABLED would silently re-promote to EVENT_DRIVEN on the
next engine rebuild as soon as decideMode's XML+retention predicate
said so, defeating the point of persisting mode state.

Use prior.Mode when set; fall through to decideMode only for new
actions (no prior at all) and for legacy entries persisted before
Mode existed (zero value). Regression test pins both branches.
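A sketch of the precedence rule, assuming a hypothetical resolveMode helper and a trimmed PriorState. decideMode is passed as a closure so it is only evaluated on the fall-through path: no prior state at all, or a legacy entry whose Mode is the zero value.

```go
package main

import "fmt"

type RuleMode int

const (
	ModeUnspecified RuleMode = iota // zero value: legacy entry or no Mode persisted
	ModeEventDriven
	ModeScanAtDate
	ModeScanOnly
	ModeDisabled
)

// PriorState is a hypothetical subset of the durable per-action state.
type PriorState struct {
	BootstrapComplete bool
	Mode              RuleMode
}

// resolveMode: a persisted Mode wins; decideMode's answer is only a fallback.
func resolveMode(prior *PriorState, decide func() RuleMode) RuleMode {
	if prior != nil && prior.Mode != ModeUnspecified {
		return prior.Mode
	}
	return decide()
}

func main() {
	decide := func() RuleMode { return ModeEventDriven } // XML + retention say event-driven
	paused := &PriorState{BootstrapComplete: true, Mode: ModeScanOnly}
	fmt.Println(resolveMode(paused, decide) == ModeScanOnly) // true: no silent re-promotion
	fmt.Println(resolveMode(nil, decide) == ModeEventDriven) // true: new action
}
```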

* fix(s3/lifecycle): MarkActive routability — index every EVENT_DRIVEN key

MarkActive's documented contract was "flip visible without a
recompile," but the routing indexes (originalDelayGroups,
predicateActions) were only populated when active && mode ==
EVENT_DRIVEN at compile time. So a key compiled with
BootstrapComplete=false would never enter the indexes; a later
MarkActive flipped engineState but MatchOriginalWrite /
MatchPredicateChange iterated the indexes and never saw the key.
Only MatchPath (which walks bi.actionKeys) and DateActions worked.

Index every EVENT_DRIVEN key regardless of `active`. The runtime
IsActive() filter inside filterMatching already gates dispatch, so
inactive entries are matched-but-not-fired; flipping MarkActive
makes them routable without recompile, matching the documented
contract.

Tests updated: TestCompile_BootstrapPendingIndexedButInactive
asserts the indexed-but-inactive shape; TestMatchOriginalWrite_MarkActiveBecomesRoutable
asserts a MarkActive flip routes the next match.

* test(s3/lifecycle): pin nil NoncurrentIndex no-op behavior

Two regression tests for the *int pointer migration: nil index
combined with NewerNoncurrent (either paired with NoncurrentDays or
standalone) must short-circuit to ActionNone rather than guess at
the version's position in the keep-N window.

* refactor(s3/lifecycle): trim follow-up narration on engine + helpers

Trims comments accumulated since the last sweep: the durable-Mode
rationale, the MarkActive routability note, the routing-index doc, the
NoncurrentIndex pointer rationale, and the EvaluateAction docblock.
Each is cut to one or two terse lines; the underlying contracts live
in the design doc.

* docs(s3/lifecycle): note CompileInput one-per-bucket invariant
2026-05-07 15:00:49 -07:00


package engine

import (
	"time"

	"github.com/seaweedfs/seaweedfs/weed/s3api/s3lifecycle"
)

// RuleMode mirrors the durable s3_lifecycle_pb.LifecycleState.RuleMode enum;
// the worker maps between them when it reads/writes durable state.
type RuleMode int

const (
	ModeUnspecified RuleMode = iota
	ModeEventDriven
	ModeScanAtDate
	ModeScanOnly
	ModeDisabled
	ModePendingBootstrap
)

func (m RuleMode) String() string {
	switch m {
	case ModeEventDriven:
		return "event_driven"
	case ModeScanAtDate:
		return "scan_at_date"
	case ModeScanOnly:
		return "scan_only"
	case ModeDisabled:
		return "disabled"
	case ModePendingBootstrap:
		return "pending_bootstrap"
	default:
		return "unspecified"
	}
}

// decideMode: nil or disabled rule -> DISABLED; EXPIRATION_DATE -> SCAN_AT_DATE;
// reader-driven kind whose horizon + bootstrapLookbackMin exceeds retention ->
// SCAN_ONLY; else EVENT_DRIVEN. metaLogRetention=0 means unbounded (the
// default), so the gate never trips.
func decideMode(rule *s3lifecycle.Rule, kind s3lifecycle.ActionKind, metaLogRetention, bootstrapLookbackMin time.Duration) RuleMode {
	if rule == nil || rule.Status != s3lifecycle.StatusEnabled {
		return ModeDisabled
	}
	if kind == s3lifecycle.ActionKindExpirationDate {
		return ModeScanAtDate
	}
	if metaLogRetention > 0 {
		horizon := s3lifecycle.EventLogHorizon(rule, kind)
		if horizon > 0 && metaLogRetention < horizon+bootstrapLookbackMin {
			return ModeScanOnly
		}
	}
	return ModeEventDriven
}