seaweedfs/weed/s3api
Chris Lu 4e10669221 docs(s3/lifecycle): event-driven redesign (#9346)
* docs(s3/lifecycle): event-driven redesign

Replaces the synchronous PUT-handler walk with an event-driven worker
model:

- meta-log reader subscribed to one filer
- client-side heap merge with per-filer-shard MessagePosition cursors
- bucket-level bootstrap with inline delete
- blocked-cursor handling for fatal events
- durable retry budget for sustained-transient promotion
- retention mode gate that downgrades reader-driven rules to scan_only
  when log retention falls below the rule's event-log horizon

* docs(s3/lifecycle): record Phase 0 verified assumptions

ReadPersistedLogBuffer payload carries Extended (event marshaled via
SubscribeMetadataResponse → ToProtoEntry). Meta-log files at
topics/.system/log/<date>/<HH-MM>.<filerId> are written without TtlSec
in filer_notify_append.go; retention is unbounded by default and only
shrinks if an operator sets a filer.conf rule with a TTL on the
SystemLogDir prefix. .versions/ filenames are v_<16h-ts><16h-rand>
with old/new (inverted) format distinguished by threshold
0x4000000000000000; getVersionTimestamp / compareVersionIds give
format-agnostic ordering for successor-version discovery.

* feat(s3/lifecycle): rule evaluator and dueAt helper

Evaluate(rule, info, now) returns the EvalResult by object shape:

- IsMPUInit -> AbortMultipartUpload
- IsDeleteMarker -> ExpireDeleteMarker when sole survivor
- IsLatest -> DeleteObject (Days or Date)
- non-current -> DeleteVersion (Days falls back to ModTime when
  SuccessorModTime is zero; NewerNoncurrent retention is enforced when
  both are set)

ComputeDueAt mirrors Evaluate's shape and returns the earliest eligible
wall-clock time for the same (rule, info), used by the reader/bootstrap
to decide pending vs inline-delete.

Adds StatusEnabled/Disabled and SmallDelay consts and an IsMPUInit
flag on ObjectInfo so .uploads/<id>/ entries route to AbortMPU without
overloading IsLatest.

No callers; package compiles standalone.

* feat(s3/lifecycle): MinTriggerAge for safety-scan cadence

Returns the smallest non-zero day threshold across the rule's actions.
Used as max(MinTriggerAge, kindFloor) for the per-kind cadence; date,
count-only, and delete-marker-only rules return 0 so callers fall
through to their kind-specific floor.

* feat(s3/lifecycle): EventLogHorizon for retention mode gate

Returns the maximum event age the reader needs for a rule. Days-based
kinds return their day threshold; pure NewerNoncurrent (count) and
ExpiredObjectDeleteMarker return SmallDelay. Date rules return 0 (the
gate skips them). Multi-action rules take the max — strictest horizon
wins.

Drives the Phase 2 mode gate: metaLogRetention < EventLogHorizon(rule)
+ bootstrapLookbackMin -> scan_only with RETENTION_BELOW_HORIZON.

* feat(s3/lifecycle): RuleHash for per-rule state CAS

sha256 over canonicalized form, first 8 bytes. Stable across tag-key
reorder, prefix trailing-slash variation, ID renames, and Status flips
(state continuity is preserved when an operator toggles
Enabled/Disabled). Different action shapes — different days, filter,
or action type — hash differently.

Used by the per-rule state directory layout
/etc/s3/lifecycle/<bucket>/<rule_hash_hex>/ and by the bootstrap
detector's reconcile-on-PUT.

* feat(s3/lifecycle): add s3_lifecycle.proto storage schema

Defines the durable types backing the lifecycle worker:
LifecycleState (per-rule mode + bootstrap_complete + degraded_reason
incl. RETENTION_BELOW_HORIZON / LOST_LOG), PendingItem, EntryIdentity,
BootstrapState, ReaderState (per-filer-shard cursors plus
tail_drained_streams marker), BlockerRecord (rule_hash optional for
pre-evaluation failures), RetryBudgetEntry with the four-shape
StreamKey oneof, and RetryTarget (no action / no expected_identity —
retry replays handler against current state).

No callers; schema only. Wired into the pb Makefile.

* feat(s3/lifecycle): add S3LifecycleParams to TaskParams

Wires lifecycle subroutines into the existing worker dispatch.
Subtype is READ (cluster-singleton meta-log reader), BOOTSTRAP
(per-bucket walker), or DRAIN (per-rule pending). bucket / rule_hash
populated for the latter two; ContinuationHint is an advisory resume
point for kill-resumable BOOTSTRAP / READ.

oneof tag = 14, after the existing ec_balance_params at 13.

* fix(s3/lifecycle): noncurrent delete markers honor NoncurrentDays

A non-current delete marker is just another version per AWS S3 spec
and is eligible under NoncurrentVersionExpirationDays. The
IsDeleteMarker special case is meant only for the *current* delete
marker (sole-survivor ExpiredObjectDeleteMarker action), so guard
that switch arm with IsLatest. Without IsLatest, the bootstrap walker
silently skipped pre-existing non-current delete markers under a
NoncurrentDays rule because ComputeDueAt returned zero.

Mirrors the same fix in evaluate.go so the runtime decision matches.

* fix(s3/lifecycle): preserve prefix trailing slash in RuleHash

"logs" and "logs/" match different object sets under literal
strings.HasPrefix semantics: "logs" matches "logsmore/x", "logs/" does
not. Collapsing them in the hash would let an XML edit silently bind
the new rule to the previous rule's durable state directory, causing
stale bootstrap_complete and stale pending entries against a rule
that now matches a different set.
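
The difference between the two prefixes is easy to check with the
standard library. The matches helper is only a wrapper for
illustration:

```go
package main

import (
	"fmt"
	"strings"
)

// matches reports whether an object key falls under a rule prefix,
// using literal strings.HasPrefix semantics.
func matches(prefix, key string) bool {
	return strings.HasPrefix(key, prefix)
}

func main() {
	fmt.Println(matches("logs", "logsmore/x"))  // true
	fmt.Println(matches("logs/", "logsmore/x")) // false
	fmt.Println(matches("logs/", "logs/a.txt")) // true
}
```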

* fix(s3/lifecycle): add UNSPECIFIED sentinels to enum zero values

Proto3 best practice: enum 0 should be an _UNSPECIFIED sentinel, not
an active value. This matters most for persisted state schemas: a
partially populated payload (or one written by an older binary that
didn't set the field) would otherwise silently default to a
semantically active value.

- LifecycleState.RuleKind: shifts EXPIRATION_DAYS..EXPIRED_DELETE_MARKER
  from 0..5 to 1..6, with RULE_KIND_UNSPECIFIED at 0.
- LifecycleState.RuleMode: shifts EVENT_DRIVEN..PENDING_BOOTSTRAP from
  0..4 to 1..5.
- LifecycleState.DegradedReason: replaces NONE=0 with
  DEGRADED_REASON_UNSPECIFIED=0 (operators treat both as healthy).
- StreamKind: shifts ORIGINAL..PENDING from 0..3 to 1..4.
- S3LifecycleParams.Subtype: shifts READ..DRAIN from 0..2 to 1..3, so
  an unset subtype no longer routes into the cluster-singleton READ task.

No on-disk state has been written yet; renumbering is safe.
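
The effect on dispatch can be sketched in Go. The type and constant
names below are hypothetical (the generated protobuf names will
differ); only the numbering 0 = unspecified, 1..3 = READ..DRAIN follows
the list above.

```go
package main

import (
	"errors"
	"fmt"
)

// Subtype mirrors the renumbered enum; names are illustrative only.
type Subtype int32

const (
	SubtypeUnspecified Subtype = 0 // zero value: field was never set
	SubtypeRead        Subtype = 1
	SubtypeBootstrap   Subtype = 2
	SubtypeDrain       Subtype = 3
)

var errUnsetSubtype = errors.New("lifecycle task: subtype not set")

// dispatch picks a handler name and rejects unset or unknown subtypes
// instead of silently treating them as READ.
func dispatch(s Subtype) (string, error) {
	switch s {
	case SubtypeRead:
		return "read", nil
	case SubtypeBootstrap:
		return "bootstrap", nil
	case SubtypeDrain:
		return "drain", nil
	default:
		return "", errUnsetSubtype
	}
}

func main() {
	var unset Subtype // zero value, as decoded from a payload without the field
	_, err := dispatch(unset)
	fmt.Println(err != nil) // true
}
```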

* docs(s3/lifecycle): per-action state, not per-rule

A single AWS lifecycle XML <Rule> can declare multiple actions in
parallel (e.g. ExpirationDays=90 + AbortMPU=7 + NoncurrentDays=30).
Each must drive its own delay/horizon/mode/pending stream
independently. Modeling the rule as one compiled entry with one kind
collapses these — picking the smallest delay (7d MPU) means the 90d
expiration cursor advances past objects that aren't yet due, and the
90d action never re-fires.

Restructure storage to per-action: every XML rule expands into N
compiled actions; state lives at <bucket>/<rule_hash>/<action_kind>/.
The intermediate rule_hash directory keeps a rule's actions grouped
for operator listing. Each action has its own state file with its own
mode + bootstrap_complete + degraded_reason; sibling actions of the
same rule can degrade independently.

* fix(s3/lifecycle): per-action proto schema + safety-scan counters

Realigns the durable schema with the per-action storage model and the
safety-scan contract in the design doc.

- Promote LifecycleState.RuleKind to a top-level ActionKind enum and
  rename rule_kind -> action_kind. The same enum now also keys
  BootstrapKey, PendingKey, and BlockerRecord so per-action streams
  under one rule never collapse.
- LifecycleState now keyed by (rule_hash, action_kind). Added
  last_safety_scan_ts_ns / next_safety_scan_ts_ns and the four
  observability counters the design specifies (evaluated_total,
  expired_total, metadata_only_total, error_total). Dropped
  last_evaluated_ns, deleted_total, skipped_object_lock_total, and
  pending_size — subsumed or out-of-band.
- BlockerRecord.action_kind is OPTIONAL (UNSPECIFIED for
  pre-evaluation failures, just like rule_hash).

No on-disk state has been written yet; the renumbering / rename is
free.

* fix(s3/lifecycle): per-action MinTriggerAge / EventLogHorizon helpers

The retention mode gate and safety-scan cadence run on each compiled
action independently. Taking a "min/max across all actions in the
rule" — as the previous helpers did — was wrong: a 90d ExpirationDays
sibling alongside a 7d AbortMPU action would cause the 90d cursor to
advance at 7d (because MinTriggerAge picked the smallest), and the
ExpirationDays action would never re-fire on objects that aged past
the 7d sweep window before reaching 90d.

Both helpers now take an ActionKind and return the threshold for that
specific action only. Returns 0 if the rule does not declare the
requested kind, which makes the gate a no-op and the cadence fall
through to the kind floor.

Also adds RuleActionKinds(rule), the canonical expansion that the
engine uses at compile time to turn one XML rule into N compiled
actions. NewerNoncurrentVersions paired with NoncurrentDays is
subsumed into a single NONCURRENT_DAYS action (AWS-paired
conditions); only when NewerNoncurrent stands alone does it become
NEWER_NONCURRENT.
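
The expansion and the per-kind threshold lookup might look like the
sketch below. It is not the project's implementation: the Rule fields,
the lowercase helper names, the use of whole days as the unit, and the
reduced set of kinds (date and delete-marker kinds are left out) are
all assumptions for illustration.

```go
package main

import "fmt"

type ActionKind int

const (
	ActionKindUnspecified ActionKind = iota
	ExpirationDays
	NoncurrentDays
	NewerNoncurrent
	AbortMPU
)

// Rule is a hypothetical, flattened view of one XML <Rule>.
type Rule struct {
	ExpirationDays          int
	NoncurrentDays          int
	NewerNoncurrentVersions int
	AbortMPUDays            int
}

// ruleActionKinds expands one rule into its compiled action kinds.
// NewerNoncurrentVersions becomes its own kind only when it stands alone.
func ruleActionKinds(r Rule) []ActionKind {
	var kinds []ActionKind
	if r.ExpirationDays > 0 {
		kinds = append(kinds, ExpirationDays)
	}
	switch {
	case r.NoncurrentDays > 0:
		kinds = append(kinds, NoncurrentDays)
	case r.NewerNoncurrentVersions > 0:
		kinds = append(kinds, NewerNoncurrent)
	}
	if r.AbortMPUDays > 0 {
		kinds = append(kinds, AbortMPU)
	}
	return kinds
}

// minTriggerAgeDays returns the day threshold of one kind only,
// or 0 when the rule does not declare it (or the kind is count-only).
func minTriggerAgeDays(r Rule, kind ActionKind) int {
	switch kind {
	case ExpirationDays:
		return r.ExpirationDays
	case NoncurrentDays:
		return r.NoncurrentDays
	case AbortMPU:
		return r.AbortMPUDays
	}
	return 0
}

// cadenceDays is max(MinTriggerAge, kindFloor) for one compiled action.
func cadenceDays(r Rule, kind ActionKind, kindFloor int) int {
	if age := minTriggerAgeDays(r, kind); age > kindFloor {
		return age
	}
	return kindFloor
}

func main() {
	r := Rule{ExpirationDays: 90, NoncurrentDays: 30, NewerNoncurrentVersions: 3, AbortMPUDays: 7}
	fmt.Println(ruleActionKinds(r))               // [1 2 4]
	fmt.Println(cadenceDays(r, ExpirationDays, 1)) // 90
	fmt.Println(cadenceDays(r, AbortMPU, 1))       // 7
}
```

Each sibling keeps its own cadence, so the 7d AbortMPU sweep no longer
drags the 90d expiration along with it.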

* fix(s3/lifecycle): length-prefix RuleHash to remove delimiter ambiguity

The previous encoding used "tag=K=V\n" lines, which is ambiguous: a
tag (a=b, c) serializes identically to (a, b=c). A prefix containing
"\nexp_days=99" could likewise forge an action field. Either could
silently bind two semantically different rules to the same per-rule
state directory.

Switch to a length-prefixed canonical form: each scalar is written as
<field-tag-byte> <uvarint-length> <bytes>. Field-tag bytes namespace
each scalar so ("a=b" tag-key) and ("a" tag-key with "=b" leakage)
can't collide. Three regression tests pin the resistance.
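
A minimal sketch of the length-prefixed encoding, reduced to a single
tag pair. The field-tag byte values and the helper names are
hypothetical; the message only fixes the shape
<field-tag-byte> <uvarint-length> <bytes> and the sha256 / first-8-bytes
truncation.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// Hypothetical field-tag bytes; the real values are not given in the message.
const (
	fieldTagKey   byte = 0x01
	fieldTagValue byte = 0x02
)

// appendField writes <field-tag-byte> <uvarint-length> <bytes>.
func appendField(buf []byte, fieldTag byte, val string) []byte {
	var lenBuf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(lenBuf[:], uint64(len(val)))
	buf = append(buf, fieldTag)
	buf = append(buf, lenBuf[:n]...)
	return append(buf, val...)
}

// hashTag hashes one tag pair and keeps the first 8 bytes, hex-encoded.
func hashTag(key, val string) string {
	var buf []byte
	buf = appendField(buf, fieldTagKey, key)
	buf = appendField(buf, fieldTagValue, val)
	sum := sha256.Sum256(buf)
	return hex.EncodeToString(sum[:8])
}

// legacyLine reproduces the ambiguous "tag=K=V\n" form for comparison.
func legacyLine(key, val string) string {
	return "tag=" + key + "=" + val + "\n"
}

func main() {
	// The delimiter form cannot tell (a=b, c) from (a, b=c).
	fmt.Println(legacyLine("a=b", "c") == legacyLine("a", "b=c")) // true
	// The length-prefixed form encodes them as different byte strings.
	fmt.Println(hashTag("a=b", "c") == hashTag("a", "b=c")) // false
}
```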

* docs(s3/lifecycle): ActionKey is the engine identity, not rule_hash

Finishes the per-action restructure across the engine pseudocode,
bootstrap completion, drain locks, detector paths, policy CAS, and
Phase 2 plan. Every per-action data structure — engine indexes,
target modes, newly-completed sets, bootstrap completion bits, drain
keys, locks, metrics, status — is now keyed by
ActionKey{rule_hash, action_kind}, not by rule_hash alone.

Without this, sibling actions under one XML rule still shared
scheduling state in the engine: originalDelayGroups holding
[]ruleHash means a rule's 7d AbortMPU and 90d ExpirationDays
collapse into one entry, the smaller delay wins for cursor advance,
and the larger sibling never re-fires.

Helper API tightened: EvaluateAction(rule, kind, info, now) and
ComputeDueAt(rule, kind, info) replace the aggregate-rule signatures
so a caller asks one specific action's eligibility against one
entry, never "any action of this rule." Drain task lock includes
action_kind so siblings have independent re-arm timers. Policy CAS
moves from rule_hash to ActionKey membership.
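
Why the key matters can be shown with two maps. This is a sketch under
assumptions: the Go shape of ActionKey, the [8]byte hash type, and the
map-based delay tables are illustrative, not the engine's actual data
structures.

```go
package main

import (
	"fmt"
	"time"
)

type ActionKind int

const (
	ActionKindUnspecified ActionKind = iota
	ExpirationDays
	AbortMPU
)

// ActionKey identifies one compiled action; comparable, so usable as a map key.
type ActionKey struct {
	RuleHash   [8]byte
	ActionKind ActionKind
}

const day = 24 * time.Hour

// delaysByRule keys scheduling state by rule hash alone: siblings collide.
func delaysByRule(hash [8]byte) map[[8]byte]time.Duration {
	m := map[[8]byte]time.Duration{}
	m[hash] = 90 * day // ExpirationDays
	m[hash] = 7 * day  // AbortMPU overwrites its sibling
	return m
}

// delaysByAction keys the same state by ActionKey: siblings stay separate.
func delaysByAction(hash [8]byte) map[ActionKey]time.Duration {
	m := map[ActionKey]time.Duration{}
	m[ActionKey{hash, ExpirationDays}] = 90 * day
	m[ActionKey{hash, AbortMPU}] = 7 * day
	return m
}

func main() {
	hash := [8]byte{0xde, 0xad, 0xbe, 0xef, 0, 0, 0, 1}
	fmt.Println(len(delaysByRule(hash)))   // 1
	fmt.Println(len(delaysByAction(hash))) // 2
}
```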

* fix(s3/lifecycle): kind-aware EvaluateAction / ComputeDueAt

Replaces the aggregate-rule signatures with per-action ones:
EvaluateAction(rule, kind, info, now) and ComputeDueAt(rule, kind,
info). The old Evaluate(rule, info, now) could return the verdict of
ANY action declared on the rule, which conflicted with the per-action
engine indexing (each ActionKey has its own delay group, mode, and
pending stream).

Each helper now decides exactly one (rule, kind) compiled action's
fate against one entry. Asking for a kind the rule doesn't declare,
or asking against the wrong object shape for that kind, returns
ActionNone / zero — never silently routes to a sibling.

Multi-action regression test: a rule with ExpirationDays=90 and
AbortMPU=7 evaluates each kind independently for the same entry; the
7d window has no influence on the 90d eligibility decision.
2026-05-07 10:02:32 -07:00

see https://blog.aqwari.net/xml-schema-go/

1. go install aqwari.net/xml/cmd/xsdgen@latest (on Go versions before 1.16: go get aqwari.net/xml/cmd/xsdgen)
2. Add EncodingType element for ListBucketResult in AmazonS3.xsd
3. xsdgen -o s3api_xsd_generated.go -pkg s3api AmazonS3.xsd
4. Remove empty Grantee struct in s3api_xsd_generated.go
5. Remove xmlns: sed s'/http:\/\/s3.amazonaws.com\/doc\/2006-03-01\/\ //' s3api_xsd_generated.go