Chris Lu e9bcb8f4ad docs(s3/lifecycle): refresh DESIGN.md as-built (#9491)
* docs(s3/lifecycle): refresh DESIGN.md as-built + add wiki pages

DESIGN.md was written as a phased implementation plan ("Phase 2 will
ship X, Phase 4 will ship Y"). All phases are now merged, and the
post-cutover changes from #9477/#9481/#9484/#9485/#9486 substantially
changed the worker model (single subscription, walker throttle,
observability gauges). Rewrite the doc in the present tense to
describe what is actually there.

Net changes vs the prior plan-style doc:
- Algorithm pseudo-code reflects the single-subscription fan-out plus
  the walkedThisPass within-pass guard.
- Walker invocation table replaces the implicit "two distinct calls"
  prose with three call sites (recovery / steady-state / empty-replay)
  and their throttle gates.
- New section on the subscription model (one Reader, ShardPredicate,
  fan-out by ev.ShardID).
- New section on cursor.LastWalkedNs and the WalkerInterval throttle.
- Observability section: gauges, heartbeat tokens, what each means.
- "Implementation history" table maps phases to merged PRs.
- "Future work" lists the four optimizations we deferred (long-lived
  subscription, bucket-coordinated walker, per-bucket lag metric,
  filer meta-log retention).

Drop the "Phase N — ..." narrative from the bottom; the PR history
table is the durable artifact now.

Add wiki pages under docs/wiki/s3-lifecycle/ as source-of-truth for
the operator-facing docs. README explains the sync workflow with the
external seaweedfs.wiki.git repo. Five pages:

- Home.md — landing page, supported rule shapes, what the worker does
- Operator-Guide.md — config knobs, when to change each, walker
  interval recommendations by cluster size
- Monitoring.md — Prometheus metric reference + heartbeat token table
  + suggested PromQL alerts
- Troubleshooting.md — stuck cursor, walker stuck, failure outcomes,
  cursor schema for manual inspection
- Architecture.md — high-level overview for newcomers; sits between
  Home.md (operator) and DESIGN.md (developer)

* docs(s3/lifecycle): address PR review feedback on docs

Coderabbit + gemini findings on #9491:

- Monitoring.md: clarify the "matches all dispatched" phrasing; note
  that LIFECYCLE_DELETE_OUTCOME_UNSPECIFIED is the proto zero-value
  (shouldn't appear in healthy systems); filter PromQL alerts to
  ignore zero-valued gauges so fresh-install heartbeats don't trip.
- Operator-Guide.md, Troubleshooting.md: clarify weed shell -master
  format as host:http_port.grpc_port (SeaweedFS ServerAddress).
- Troubleshooting.md: pause the s3_lifecycle job in the admin UI
  before manually editing a cursor file, otherwise the worker's
  save races with the operator's edit.
- Architecture.md, Home.md, Operator-Guide.md, Monitoring.md,
  Troubleshooting.md, DESIGN.md: add language tags (`text`) to
  fenced code blocks for markdownlint MD040 compliance.
- DESIGN.md: standardize on the S3 spec rule names
  (`ExpiredObjectDeleteMarker`, `NewerNoncurrentVersions`,
  `AbortIncompleteMultipartUpload`) and add a one-line note mapping
  them to the engine's `ActionKind*` constants.
- README.md: prepend `cd "$(git rev-parse --show-toplevel)"` to the
  sync workflow so the `cp` commands' repo-root-relative paths work
  whether the operator's shell is at the repo root or at
  docs/wiki/s3-lifecycle/.
- Home.md: the copy in this repo lagged the merged wiki-repo version
  (it still had the older pre-merge content). Re-sync from the wiki
  repo so the source matches.

* docs(s3/lifecycle): remove wiki pages from PR

The wiki pages belong in seaweedfs.wiki.git, not the main repo. The
source-of-truth concern that motivated adding them here is real, but
it is outweighed by the cost: every code-review touchpoint would
require reviewers to load operator-facing pages too. The wiki pages
are already pushed locally (~/dev/seaweedfs.wiki); they'll publish
through the operator-side workflow.

This PR remains scoped to DESIGN.md (the developer-facing reference
that does belong with the code).

* docs(s3/lifecycle): drop Implementation history section

git log is the durable record of what shipped when; the prose table
duplicates it and goes stale faster than commit metadata.

* docs(s3/lifecycle): soften 'exactly once per run' in Goal

The prior phrasing overstated the guarantee versus the failure model
documented later in the same file. Reword to: 'process due objects
each pass; retryable/blocked outcomes get retried from the cursor on
later runs.' Surfaces the head-of-line-blocking semantics up front so
the rest of the doc reads consistently.

Also: drop the stale 'see docs/wiki/s3-lifecycle/' pointer, since
those pages live in the wiki repo, not the main repo.
2026-05-13 17:06:14 -07:00