Commit Graph

13770 Commits

Author SHA1 Message Date
Chris Lu
ca95d33092 test(s3/lifecycle): bundle dispatcher + engine accessor coverage (#9410)
* test(s3/lifecycle): bundle dispatcher + engine accessor coverage

Two-package bundle covering pure helpers and snapshot read-side
accessors that the router and dispatcher reach for at runtime. None
were directly tested; regressions previously surfaced only as
downstream Tick / Match / Compile failures.

dispatcher (10 tests):
- keyOf: derives every retryKey field from the Match; equal Match
  values produce equal keys (so the second dispatch hits the first's
  retry counter); distinct VersionIDs and ActionKinds produce
  distinct keys (so a noisy version can't starve a healthy one,
  and two kinds on the same object don't share a budget).
- budget(): configured value when set; defaultRetryBudget when zero
  or negative — pins the >0 guard against a flipped comparison.
- backoff(): same pattern as budget for RetryBackoff.
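
A minimal sketch of the retry-key and budget behavior listed above. The `match` and `retryKey` types, their fields, and the default value of 3 are illustrative assumptions, not the dispatcher's actual definitions.

```go
package main

import "fmt"

// match is a hypothetical stand-in for the dispatcher's Match type.
type match struct {
	Bucket, Key, VersionID string
	ActionKind             int
}

// retryKey is a comparable struct, so it can index a retry-counter map directly.
type retryKey struct {
	Bucket, Key, VersionID string
	ActionKind             int
}

// keyOf copies every identifying field from the match into the key.
func keyOf(m match) retryKey {
	return retryKey{m.Bucket, m.Key, m.VersionID, m.ActionKind}
}

const defaultRetryBudget = 3 // assumed value, for illustration only

// budget returns the configured value only when it is strictly positive.
func budget(configured int) int {
	if configured > 0 {
		return configured
	}
	return defaultRetryBudget
}

func main() {
	a := match{"b", "k", "v1", 1}
	b := match{"b", "k", "v2", 1}
	fmt.Println(keyOf(a) == keyOf(a), keyOf(a) == keyOf(b))
	fmt.Println(budget(5), budget(0), budget(-1))
}
```

Because the key is a plain struct of comparable fields, two matches share a retry counter only when every field agrees.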

engine snapshot accessors (8 tests):
- OriginalDelayGroups exposes the compiled per-delay groups; rules
  with multiple kinds at different cadences land in distinct entries;
  scan-only actions don't leak into delay groups so the dispatcher
  doesn't try to drive them event-driven.
- PredicateActions populated for tag-sensitive rules, empty for non-
  tag-sensitive ones (so MatchPredicateChange doesn't route
  irrelevant kinds).
- DateActions surfaces ExpirationDate verbatim for date kinds; empty
  for non-date rules.
- MarkActive on an unknown key is a no-op (durable bootstrap-complete
  write races a recompile that drops the rule; panic here would crash
  the worker).
- MarkActive flips a fresh-no-prior-state action from inactive to
  active.
- BucketActionKeys covers every kind RuleActionKinds reports.

* test(s3/lifecycle): strengthen snapshot accessor content assertions

Per gemini review on #9410: assertions previously only checked counts
and non-empty status. Verify the specific ActionKeys land where
expected so an indexing regression that produces the right number of
items with wrong kinds gets caught.

OriginalDelayGroups: each delay group's slice now uses assert.Contains
for the specific (bucket, rule_hash, kind) ActionKey instead of just
NotEmpty.

PredicateActions: assert.Contains the expected key instead of just
NotEmpty.

BucketActionKeys: every key.Bucket must equal the test bucket (catches
cross-bucket leak), and ElementsMatch pins kinds against
RuleActionKinds.
2026-05-09 22:01:54 -07:00
Chris Lu
0955d1aa08 test(s3/lifecycle): direct prefixMatches + filterAllows coverage (#9408)
Both helpers were exercised indirectly through MatchOriginalWrite /
MatchPath; pinning them directly catches a regression at the helper
level so a Match-test failure isn't the first signal of a broken
filter.

prefixMatches: empty prefix fast path; exact-prefix match; non-match
rejection; path shorter than prefix.

filterAllows: no-filter accepts any event; FilterSizeGreaterThan is
strictly > (boundary value rejected); FilterSizeLessThan is strictly
<; zero-size thresholds mean "not set" (must let any size through —
a regression treating 0 as a real threshold would reject everything);
required tag present accepts; missing key, empty tags map, wrong
value, and missing-among-multiple all reject; size + tag filters are
AND'd so either failing rejects.
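
The filter semantics above can be sketched roughly as follows. The `filter` struct and the function signatures are simplified assumptions for illustration; the real helpers operate on the package's own rule and event types.

```go
package main

import (
	"fmt"
	"strings"
)

// filter is a hypothetical simplification of a lifecycle rule filter.
type filter struct {
	SizeGreaterThan int64             // 0 means "not set"
	SizeLessThan    int64             // 0 means "not set"
	Tags            map[string]string // every listed tag must match
}

// prefixMatches accepts everything for an empty prefix; otherwise the
// path must start with the prefix (a shorter path can never match).
func prefixMatches(prefix, path string) bool {
	return prefix == "" || strings.HasPrefix(path, prefix)
}

// filterAllows ANDs the size and tag conditions together.
func filterAllows(f *filter, size int64, tags map[string]string) bool {
	if f == nil {
		return true // no filter accepts any event
	}
	if f.SizeGreaterThan > 0 && size <= f.SizeGreaterThan {
		return false // strictly greater: the boundary value is rejected
	}
	if f.SizeLessThan > 0 && size >= f.SizeLessThan {
		return false // strictly less: the boundary value is rejected
	}
	for k, want := range f.Tags {
		if got, ok := tags[k]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	f := &filter{SizeGreaterThan: 100, Tags: map[string]string{"env": "prod"}}
	fmt.Println(filterAllows(f, 101, map[string]string{"env": "prod"}))
	fmt.Println(filterAllows(f, 100, map[string]string{"env": "prod"}))
	fmt.Println(prefixMatches("logs/", "log"))
}
```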
2026-05-09 20:47:35 -07:00
Chris Lu
edbe7ab140 test(s3/lifecycle): meta-log Event builder + monotonic clock fixture (#9406)
* test(s3/lifecycle): meta-log Event builder + monotonic clock fixture

Several test files build *reader.Event ad-hoc; consolidate the common
shape into the lifecycletest package, as the task #12 spec calls out
("fixture meta-log generator"). New tests using the builder don't
have to thread Mtime / ShardID / leaf-name semantics by hand, and
existing helpers can migrate over time without churning this PR.

NewCreate / NewDelete / NewUpdate cover the three event shapes;
WithSize / WithModTime / WithTtlSec / WithVersionID / WithExtended /
WithChunks / WithBootstrapVersion / WithShardID compose deterministic
overrides. ShardID defaults to s3lifecycle.ShardID(bucket, key) so
events route through the same shard the production reader would.

MetaLogClock issues monotonic timestamps with a configurable step
(default 1s); concurrent-safe so fan-out fixtures don't have to lock
externally.

15 unit tests pin every option, the IsCreate/IsDelete/IsUpdate
discriminators, leaf-name extraction for nested keys, ShardID
derivation, option-ordering semantics, the concurrent clock contract
under -race, and a Peek-doesn't-advance check.
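
A rough sketch of a monotonic, concurrent-safe test clock in the spirit of MetaLogClock. The type name, the nanosecond-integer representation, and the choice to return the current value before advancing are assumptions made for this example.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// metaLogClock hands out strictly increasing timestamps (nanoseconds).
type metaLogClock struct {
	mu     sync.Mutex
	nextNs int64
	stepNs int64
}

// newMetaLogClock falls back to a 1s step when stepNs is not positive.
func newMetaLogClock(startNs, stepNs int64) *metaLogClock {
	if stepNs <= 0 {
		stepNs = int64(time.Second)
	}
	return &metaLogClock{nextNs: startNs, stepNs: stepNs}
}

// Next returns the pending timestamp and advances the clock by one step.
func (c *metaLogClock) Next() int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	ts := c.nextNs
	c.nextNs += c.stepNs
	return ts
}

// Peek returns the pending timestamp without advancing.
func (c *metaLogClock) Peek() int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.nextNs
}

func main() {
	c := newMetaLogClock(0, 0)
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Next()
		}()
	}
	wg.Wait()
	// Eight concurrent calls advance the clock by exactly eight steps.
	fmt.Println(c.Peek() == 8*int64(time.Second))
}
```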

* test(s3/lifecycle): address review comments on event builder

- leafOf strips trailing slashes before splitting so directory-key
  fixtures (e.g. "folder/") get the slashless leaf "folder" — pre-fix
  it returned "" which would break router tests for directory markers.
- NewUpdate now seeds OldEntry.Attributes.Mtime with the event ts
  (matching NewDelete), so a downstream router that compares mtimes
  doesn't see a synthetic 1970 epoch on the pre-update state.
- New WithOldSize / WithOldChunks / WithOldModTime options let Update
  events configure pre-update state independently. The unprefixed
  variants still target NewEntry on Update events; the WithOld*
  options are no-ops on Create (no OldEntry to mutate) and never bleed
  into NewEntry.
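
The trailing-slash fix to leaf extraction can be illustrated as below. This is a sketch of the described behavior, not the fixture's actual code; only the name `leafOf` comes from the message above.

```go
package main

import (
	"fmt"
	"strings"
)

// leafOf returns the last path component of an object key. Trailing
// slashes are stripped first, so "folder/" yields "folder" rather than "".
func leafOf(key string) string {
	trimmed := strings.TrimRight(key, "/")
	if i := strings.LastIndex(trimmed, "/"); i >= 0 {
		return trimmed[i+1:]
	}
	return trimmed
}

func main() {
	for _, k := range []string{"folder/", "a/b/c.txt", "a/b//", "plain"} {
		fmt.Printf("%q -> %q\n", k, leafOf(k))
	}
}
```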

5 new tests pin: directory-key + multi-slash leaf extraction; OldEntry
mtime default on Update; the WithOld* targeting + Create-event
no-bleed contract.
2026-05-09 20:47:27 -07:00
Chris Lu
9d20e71883 test(s3/lifecycle): cover worker handler lookupBucketsPath (#9407)
Three branches: gRPC error from GetFilerConfiguration must propagate
(else Execute would proceed to dial S3 with an empty buckets path
and never dispatch); a non-empty DirBuckets is honored verbatim so
operators with a non-default layout aren't force-routed to /buckets;
an empty DirBuckets falls back to the documented "/buckets" default
rather than returning empty (which would route to root). stubFilerConfigClient
embeds filer_pb.SeaweedFilerClient so methods other than the one
under test panic if called — keeps the surface narrow.
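
The stub-by-embedding pattern and the three branches can be sketched as follows. The `filerClient` interface here is a two-method stand-in invented for the example (the real filer_pb.SeaweedFilerClient is much larger and its methods take a context and request messages), and `lookupBucketsPath` is reduced to the fallback logic only.

```go
package main

import (
	"errors"
	"fmt"
)

// filerClient is a hypothetical, much smaller stand-in for the real client.
type filerClient interface {
	GetFilerConfiguration() (string, error)
	DeleteEntry(name string) error
}

var errUnavailable = errors.New("filer unavailable")

// stubFilerConfigClient embeds the interface and leaves it nil, so any
// method that is not overridden panics when called.
type stubFilerConfigClient struct {
	filerClient
	dirBuckets string
	err        error
}

func (s *stubFilerConfigClient) GetFilerConfiguration() (string, error) {
	return s.dirBuckets, s.err
}

// lookupBucketsPath propagates errors, honors a configured path verbatim,
// and falls back to "/buckets" when the configured path is empty.
func lookupBucketsPath(c filerClient) (string, error) {
	dir, err := c.GetFilerConfiguration()
	if err != nil {
		return "", err
	}
	if dir == "" {
		return "/buckets", nil
	}
	return dir, nil
}

func main() {
	fmt.Println(lookupBucketsPath(&stubFilerConfigClient{}))
	fmt.Println(lookupBucketsPath(&stubFilerConfigClient{dirBuckets: "/data/b"}))
	fmt.Println(lookupBucketsPath(&stubFilerConfigClient{err: errUnavailable}))
}
```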
2026-05-09 20:41:09 -07:00
Chris Lu
1aa55f5bf9 test(s3/lifecycle): direct decideMode + RuleMode.String coverage (#9405)
Compile tests cover decideMode indirectly; these direct tests pin
every branch so a regression in the classifier itself can't slip
behind a more elaborate Compile failure.

Pinned: nil rule and Disabled status both → Disabled; ExpirationDate
→ ScanAtDate without consulting retention; metaLogRetention=0 means
unbounded so any horizon → EventDriven; horizon within retention →
EventDriven; horizon exceeding retention → ScanOnly; bootstrapLookback
adds to horizon (not retention) so a near-threshold case is still
gated; zero horizon (rule field unset) skips the gate. RuleMode.String
must render the documented names for every variant; an unknown value
collapses to "unspecified" rather than empty or panic.
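
One way to read the pinned branches is the sketch below. The `rule` struct, the day-granularity integers, the mode names other than "unspecified", and the treatment of a horizon exactly equal to retention as event-driven are all assumptions made for illustration.

```go
package main

import "fmt"

type ruleMode int

const (
	modeUnspecified ruleMode = iota
	modeDisabled
	modeEventDriven
	modeScanOnly
	modeScanAtDate
)

// String renders a name for every variant; unknown values collapse to
// "unspecified". Names other than "unspecified" are assumed here.
func (m ruleMode) String() string {
	switch m {
	case modeDisabled:
		return "disabled"
	case modeEventDriven:
		return "event_driven"
	case modeScanOnly:
		return "scan_only"
	case modeScanAtDate:
		return "scan_at_date"
	default:
		return "unspecified"
	}
}

// rule is a hypothetical, flattened view of a lifecycle rule.
type rule struct {
	Enabled           bool
	HasExpirationDate bool
	HorizonDays       int // 0 means the rule field is unset
}

// decideMode classifies a rule; retentionDays == 0 means unbounded.
func decideMode(r *rule, retentionDays, lookbackDays int) ruleMode {
	if r == nil || !r.Enabled {
		return modeDisabled
	}
	if r.HasExpirationDate {
		return modeScanAtDate // retention is not consulted
	}
	if r.HorizonDays == 0 || retentionDays == 0 {
		return modeEventDriven // the gate is skipped
	}
	if r.HorizonDays+lookbackDays > retentionDays {
		return modeScanOnly // lookback is added to the horizon
	}
	return modeEventDriven
}

func main() {
	r := &rule{Enabled: true, HorizonDays: 29}
	fmt.Println(decideMode(r, 30, 0), decideMode(r, 30, 2), ruleMode(99))
}
```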
2026-05-09 20:35:34 -07:00
Chris Lu
619cb39827 test(s3/lifecycle): pin Schedule edge cases beyond happy path (Phase 15 slice) (#9403)
* test(s3/lifecycle): pin Schedule edge cases beyond happy path

Pre-existing schedule_test covered the happy path (ordered Drain,
empty schedule, duplicates, boundary-inclusive). Five new tests pin
edge cases the dispatcher relies on:

- Drain at a time before any DueTime returns nil and leaves the heap
  intact, so the dispatcher can't accidentally consume future-due
  matches.
- NextDue after partial Drain points to the next earliest, catching
  a Drain that forgets the heap invariant.
- Add after Drain bubbles a fresh earlier DueTime to the front, so
  late-arriving high-priority matches don't sit behind older ones.
- Drain returns Matches in ascending DueTime order regardless of
  insert order — explicit pinning of the documented contract.
- Concurrent Add+Drain across 64 goroutines under -race.
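
The contract these tests pin can be sketched with the standard container/heap package. The `schedule` and `match` types below are simplified assumptions (integer due times, a single mutex), not the package's actual Schedule.

```go
package main

import (
	"container/heap"
	"fmt"
	"sync"
)

type match struct {
	Key   string
	DueNs int64
}

// matchHeap is a min-heap ordered by DueNs.
type matchHeap []match

func (h matchHeap) Len() int           { return len(h) }
func (h matchHeap) Less(i, j int) bool { return h[i].DueNs < h[j].DueNs }
func (h matchHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *matchHeap) Push(x any)        { *h = append(*h, x.(match)) }
func (h *matchHeap) Pop() any {
	old := *h
	n := len(old)
	m := old[n-1]
	*h = old[:n-1]
	return m
}

type schedule struct {
	mu sync.Mutex
	h  matchHeap
}

func (s *schedule) Add(m match) {
	s.mu.Lock()
	defer s.mu.Unlock()
	heap.Push(&s.h, m)
}

// NextDue reports the earliest due time, if any match is queued.
func (s *schedule) NextDue() (int64, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.h.Len() == 0 {
		return 0, false
	}
	return s.h[0].DueNs, true
}

// Drain pops every match with DueNs <= now in ascending order and
// returns nil, leaving the heap untouched, when nothing is due.
func (s *schedule) Drain(now int64) []match {
	s.mu.Lock()
	defer s.mu.Unlock()
	var out []match
	for s.h.Len() > 0 && s.h[0].DueNs <= now {
		out = append(out, heap.Pop(&s.h).(match))
	}
	return out
}

func main() {
	s := &schedule{}
	s.Add(match{"c", 30})
	s.Add(match{"a", 10})
	s.Add(match{"b", 20})
	fmt.Println(s.Drain(5), s.Drain(20))
}
```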

* test(s3/lifecycle): actually exercise Drain in AddAfterDrain test

Per coderabbit review on #9403: the test name promised "after Drain"
but the previous body only Add'd both items without ever calling
Drain in between. Insert a real Drain (popping "drain_me") before
the second Add, so the heap-invariant-across-Drain-then-Add path is
actually pinned. Bumps the after-Drain Match's DueTime out of the
way so the Drain in step 3 returns it deterministically.
2026-05-09 20:35:22 -07:00
Chris Lu
435ef7f94f test(s3/lifecycle): pin toProtoActionKind + toProtoIdentity converters (#9404)
test(s3/lifecycle): pin toProtoActionKind + toProtoIdentity

The two converters are the worker-side wire to LifecycleDelete; a
miss in toProtoActionKind sends ACTION_KIND_UNSPECIFIED that the
server rejects FATAL, and a wrong toProtoIdentity flips the CAS
witness so every dispatch comes back NOOP_RESOLVED with STALE_IDENTITY
even though the entry hasn't changed.

10 tests pin: every listed s3lifecycle.ActionKind maps to its proto
counterpart (table-driven, one subtest per kind); ActionKindUnspecified
and a future unknown kind both collapse to ACTION_KIND_UNSPECIFIED
(forward compat); nil EntryIdentity stays nil (preserves the no-CAS
sentinel); a populated identity copies every field; a zero-valued
identity still produces a non-nil output so the server treats it as
a real CAS witness rather than no-CAS.
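
The converter contracts can be sketched as follows. All type names, enum members, and identity fields here are placeholders chosen for the example; only the nil-stays-nil, zero-value-stays-non-nil, and unknown-collapses-to-unspecified behaviors come from the description above.

```go
package main

import "fmt"

type actionKind int

const (
	actionKindUnspecified actionKind = iota
	actionKindExpirationDays
	actionKindNoncurrentDays
)

type protoActionKind int32

const (
	protoUnspecified protoActionKind = iota
	protoExpirationDays
	protoNoncurrentDays
)

// toProtoActionKind maps each known kind explicitly; anything else,
// including future values, collapses to the unspecified proto value.
func toProtoActionKind(k actionKind) protoActionKind {
	switch k {
	case actionKindExpirationDays:
		return protoExpirationDays
	case actionKindNoncurrentDays:
		return protoNoncurrentDays
	default:
		return protoUnspecified
	}
}

// entryIdentity and protoIdentity use hypothetical fields.
type entryIdentity struct{ MtimeNs, Size int64 }
type protoIdentity struct{ MtimeNs, Size int64 }

// toProtoIdentity keeps nil as nil (the no-CAS sentinel) and returns a
// non-nil copy for every non-nil input, even a zero-valued one.
func toProtoIdentity(id *entryIdentity) *protoIdentity {
	if id == nil {
		return nil
	}
	return &protoIdentity{MtimeNs: id.MtimeNs, Size: id.Size}
}

func main() {
	fmt.Println(toProtoActionKind(actionKind(42)) == protoUnspecified)
	fmt.Println(toProtoIdentity(nil) == nil, toProtoIdentity(&entryIdentity{}) != nil)
}
```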
2026-05-09 20:35:04 -07:00
Chris Lu
1350e681c9 test(s3/lifecycle): pin Pipeline.Run dependency + shard validation (Phase 15 slice) (#9402)
* test(s3/lifecycle): pin Pipeline.Run dependency + shard validation

Pre-existing TestPipelineRunRequiresDependencies only checked that an
empty Pipeline errors; it didn't pin which specific dependency must be
present. A refactor that accidentally makes one nilable would slip
through.

8 new tests pin every validation branch in Pipeline.Run: missing
Engine / Persister / Client / FilerClient each error with "missing
required dependency"; missing BucketsPath errors with its own
distinct message so operators can spot the missing wiring; ShardID =
-1 / ShardCount errors with a range message (covers the half-open
[0, ShardCount) boundary so a < to <= refactor can't introduce a
one-past-the-end shard); and a multi-shard config with one
out-of-range entry refuses the whole run rather than silently
disabling the rest.
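
The half-open range check can be illustrated with a short sketch. `validateShards` and its error text are hypothetical; the point is the `[0, ShardCount)` boundary and the refuse-the-whole-run behavior.

```go
package main

import "fmt"

// validateShards rejects the whole configuration if any shard ID falls
// outside the half-open range [0, shardCount).
func validateShards(shardIDs []int, shardCount int) error {
	for _, id := range shardIDs {
		if id < 0 || id >= shardCount {
			return fmt.Errorf("shard %d out of range [0, %d)", id, shardCount)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateShards([]int{0, 15}, 16))
	fmt.Println(validateShards([]int{0, 16}, 16))
}
```

Writing the upper bound as `id >= shardCount` is what keeps a one-past-the-end shard out; a `>` comparison would admit `shardCount` itself.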

* test(s3/lifecycle): refactor Pipeline.Run validation tests as table-driven

Per gemini review on #9402: collapse the eight per-branch tests into
TestPipelineRunValidation with a slice of (name, mutate, wantErr)
cases. Same coverage, ~30 fewer lines, idiomatic Go pattern that
makes adding a new validation case trivial.
2026-05-09 20:34:51 -07:00
Chris Lu
cb6e498e0b test(s3/lifecycle): pin Descriptor structural invariants (#9401)
* test(s3/lifecycle): pin Descriptor structural invariants

Pre-existing handler tests covered Capability and Detect; Descriptor
was previously untested. A drift between the form fields it advertises
and the defaults config.go reads silently breaks the admin UI in two
ways: the form renders blank (admin can't tune) or the worker clamps
to a hardcoded fallback ignoring the admin's edits. The new tests
catch both directions.

Pinned: jobType / DisplayName / Description / DescriptorVersion;
AdminConfigForm exposes a workers field whose default matches
defaultWorkers; WorkerConfigForm has a default and a field for every
cadence knob ParseConfig reads (dispatch_tick / checkpoint_tick /
refresh_interval / bootstrap_interval / max_runtime); AdminRuntime-
Defaults hits a daily cadence with bounded detection timeout and
single job per detection.

* test(s3/lifecycle): tighten Descriptor invariant assertions

Per gemini review on #9401: pin DetectionTimeoutSeconds to its exact
value (60) instead of ">0" so an accidental tweak is caught, and
assert WorkerConfigForm fields are INT64 (matching ParseConfig's
readInt64) so a STRING-type drift can't silently make the worker
ignore admin edits.
2026-05-09 20:34:17 -07:00
Chris Lu
6f9668c20b test(s3/lifecycle): pin lifecycleDispatch validation early-returns (#9400)
Three pure-validation paths in lifecycleDispatch return BLOCKED before
any filer call; without coverage a refactor could let them fall
through to a real delete. ABORT_MPU at dispatch time is a defensive
catch (the route bypass should never happen, but if it does the
fallthrough must not become a default-case rm). Unknown ActionKind
gets the same treatment for forward-compatibility with new proto
values. Empty version_id on noncurrent / EXPIRED_DELETE_MARKER kinds
must be rejected before deleteSpecificObjectVersion is called, so a
malformed event can't silently delete the latest pointer.
2026-05-09 20:11:08 -07:00
Chris Lu
af2a359e45 feat(s3/lifecycle): metadata_only_total Prometheus counter (#9399)
Operator-visible signal for the metadata-only delete path landed in
PR 9390. Increment seaweedfs_s3_lifecycle_metadata_only_total{bucket,
rule_hash} after each successful unversioned or noncurrent / expired-
marker delete that took the skip-chunk path. Suspended-versioning
null delete is intentionally not counted: that path's nil err can
mean "deleted" or "NotFound", so a count there would over-report.
rule_hash is hex-encoded for label safety; nil bytes collapse to
"". DeleteBucketMetrics tears the new series down alongside the
existing lifecycle counters when a bucket is removed.
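
A small sketch of the label encoding, using only the standard library. The helper name `ruleHashLabel` is an assumption; the hex encoding and the nil-collapses-to-empty behavior are as described above.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// ruleHashLabel renders raw hash bytes as lowercase hex so the value is
// always a safe label string; nil or empty input yields "".
func ruleHashLabel(h []byte) string {
	return hex.EncodeToString(h)
}

func main() {
	fmt.Printf("%q %q\n", ruleHashLabel(nil), ruleHashLabel([]byte{0xca, 0x95, 0xd3}))
}
```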
2026-05-09 20:02:26 -07:00
Chris Lu
c0cf1417f1 test(s3/lifecycle): cover worker handler Execute validation paths (#9398)
7 tests pin the Execute early-return surface that runs without a
filer or S3 dial: nil request / nil Job / nil sender all error;
foreign JobType errors with the offending name in the message; no
S3 endpoints in cluster context errors (Execute is stricter than
Detect — the admin shouldn't have routed the job there); missing
filer_grpc_address parameter errors (proposal must have been tampered
with or dropped); empty JobType is accepted as broadcast routing and
flows through to the next validation step. The dial path itself is
intentionally not covered here — those tests would need an in-process
gRPC server and belong with the integration suite.
2026-05-09 19:51:31 -07:00
Chris Lu
284d37c3b6 test(s3/lifecycle): cover InMemoryPersister deep-copy contract (#9397)
* test(s3/lifecycle): cover InMemoryPersister deep-copy contract

8 tests pin the persister contract other lifecycle tests rely on for
cursor checkpointing: Load on an unknown shard returns an empty map
(not an error); Save then Load roundtrips; Save copies the input so
caller-side mutation doesn't bleed into stored state; Load returns a
copy so caller-side mutation of the snapshot doesn't bleed back; Save
replaces (not merges) prior state so stale resume points don't survive
restart; different shards stay isolated; saving an empty map clears
state; concurrent Save+Load is race-free under -race. A regression on
any of these silently corrupts downstream tests.
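
The deep-copy contract can be sketched like this. The method signatures are simplified assumptions (integer shard IDs, no context or error return); the real persister interface may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// inMemoryPersister stores one cursor map per shard.
type inMemoryPersister struct {
	mu    sync.Mutex
	state map[int]map[string]int64
}

// copyMap always returns a fresh, non-nil map.
func copyMap(in map[string]int64) map[string]int64 {
	out := make(map[string]int64, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}

// Save replaces (does not merge) the shard's state with a copy of cursors.
func (p *inMemoryPersister) Save(shard int, cursors map[string]int64) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.state == nil {
		p.state = make(map[int]map[string]int64)
	}
	p.state[shard] = copyMap(cursors)
}

// Load returns a copy; an unknown shard yields an empty map.
func (p *inMemoryPersister) Load(shard int) map[string]int64 {
	p.mu.Lock()
	defer p.mu.Unlock()
	return copyMap(p.state[shard])
}

func main() {
	p := &inMemoryPersister{}
	in := map[string]int64{"a": 1}
	p.Save(0, in)
	in["a"] = 99 // caller-side mutation after Save
	fmt.Println(p.Load(0)["a"], len(p.Load(7)))
}
```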

* test(s3/lifecycle): assert.NotContains for InMemoryPersister key absence

assert.Empty on a map[K]V index returns true when the value is the
zero value, which would mask a key that leaked through with int64(0).
Use assert.NotContains so the assertion fails on key presence
regardless of the stored value.
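
The underlying pitfall can be shown with plain Go, independent of any assertion library. The helper names are hypothetical: indexing a map yields the zero value for a missing key, so only the comma-ok form detects a key that leaked in with a zero value.

```go
package main

import "fmt"

// leakedByValue inspects only the indexed value, so it cannot tell a
// missing key from a key stored with the zero value.
func leakedByValue(m map[string]int64, k string) bool {
	return m[k] != 0
}

// leakedByPresence uses the comma-ok form and reports key presence
// regardless of the stored value.
func leakedByPresence(m map[string]int64, k string) bool {
	_, ok := m[k]
	return ok
}

func main() {
	m := map[string]int64{"stale": 0} // key leaked through with int64(0)
	fmt.Println(leakedByValue(m, "stale"), leakedByPresence(m, "stale"))
}
```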
2026-05-09 19:47:16 -07:00
Chris Lu
62e04623ce test(s3/lifecycle): cover worker handler Detect + helpers (#9396)
* test(s3/lifecycle): cover worker handler Detect + helpers

13 tests pin the worker-handler surface that runs without a live filer
or S3 server. Pure helpers: clusterS3Endpoints (nil context, empty
list, filter empty entries while preserving order, all-valid
passthrough); readString (missing key, nil ConfigValue, wrong kind
falls back, string returned). Capability advertises jobType with
single-job concurrency caps. Detect: nil request / nil sender / wrong
JobType all error; no S3 endpoints emits a 'skipped' activity and
completes with success; no filer addresses behaves the same; the
happy path proposes one job parameterized with the first filer
address; empty JobType is accepted (broadcast detect); a
SendProposals failure propagates without firing complete.

* test(s3/lifecycle): cover SendComplete error propagation in worker Detect

The recordingSender already supported forcing an err on SendComplete
via errOn, but no case exercised it. A SendComplete failure must
propagate so the admin learns the completion signal never landed;
proposals went out before the failure so they remain recorded.
2026-05-09 19:46:57 -07:00
Chris Lu
551e700e64 test(s3/lifecycle): cover scheduler configload surface (#9395)
* test(s3/lifecycle): cover scheduler configload surface

LoadCompileInputs is the bridge between the filer's bucket directory
and the engine snapshot the scheduler compiles every refresh; a missed
or misclassified bucket silently disables lifecycle for that prefix
until the next refresh. Tests pin: empty bucket dir, files at the
bucket level skipped, buckets without the lifecycle XML extended key
skipped, empty-bytes XML skipped, valid XML becomes a CompileInput,
versioning attr propagates to CompileInput.Versioned, malformed XML
surfaces as a ParseError without aborting the walk, and pagination
across the 1024 page boundary preserves bucket order.

Also covers the IsBucketVersioned (case + whitespace tolerance,
rejection of garbage values) and AllActivePriorStates (one entry per
(bucket, ruleHash, actionKind), bucket-keyed isolation) helpers.

* test(s3/lifecycle): tighten configload pagination boundary check

Switch the bucket-count check to require.Len so a regression that
returns the wrong number of buckets fails fast before the boundary
asserts panic on out-of-range index. Add explicit assertions on the
last entry of page 1 (b01023) and the first entry of page 2 (b01024)
so a pagination-loop bug that drops or duplicates the seam is caught
directly rather than only via the count check.
2026-05-09 19:46:40 -07:00
Chris Lu
6021a88606 test(s3/lifecycle): cover CompareVersionIds tiebreak surface (#9394)
* test(s3/lifecycle): cover CompareVersionIds tiebreak surface

13 tests pin every documented branch of the version-id comparator and
its helpers (isNewFormatVersionId, getVersionTimestamp): equality and
short-circuit paths, null sorting last, both-new-format with smaller-
=-newer ordering, both-old-format with larger-=-newer ordering, mixed-
format compared by parsed timestamp, mixed-format with synthesized
equal timestamps, length / null / non-hex rejection, the strictly-
greater-than threshold boundary at 0x4000000000000000, and the
inverted-value invariant the comparator relies on. Getting any axis
wrong silently inverts retention rankings, which would resurrect
deleted versions or evict live ones.

* test(s3/lifecycle): use plain assert.Equal in mixed-format compare test

The previous local require := assert.New(t) shadowed testify's require
package while actually returning an assert.Assertions (continue-on-fail
semantics, not fail-fast). Use plain assert.Equal(t, ...) calls so the
behavior matches the variable's name and the rest of the file.
2026-05-09 19:03:31 -07:00
Chris Lu
7781eef429 test(s3/lifecycle): cover dispatcher filerSiblingLister surface (Phase 14 slice) (#9392)
* test(s3/lifecycle): cover dispatcher's filerSiblingLister surface

Tests pin the four routing-critical filer interactions on the
filerSiblingLister: Survivors (count cap, LoneEntry semantics,
null-version detection across regular files and directory-key
markers, error propagation in both list and lookup paths),
ListVersions (NotFound collapse, dir/missing-id filtering,
1024-page boundary, error propagation), LookupNullVersion (regular
file, explicit-null flag, directory-key marker accept, plain-dir
reject, NotFound collapse, error propagation), and LookupVersion
(empty version-id no-op, v_ prefix, NotFound collapse, error
propagation).

The fake SeaweedFilerClient mirrors the real filer's NotFound
shape — gRPC succeeds at stream creation and the first Recv()
surfaces filer_pb.ErrNotFound — which is what the lister's
errors.Is check depends on. NewFullPath strips a trailing slash
before splitting so directory-key markers are stored under their
slashless Name.

* test(s3/lifecycle): gofmt sibling_lister_test.go

Trailing comment alignment.
2026-05-09 18:55:50 -07:00
Chris Lu
8cf42a5abb test(s3/lifecycle): assert per-goroutine errors in fake-server concurrent test (#9393)
test(s3/lifecycle): assert per-goroutine errors in concurrent fake test

The previous TestFake_ConcurrentCallsSerializeWithoutDeadlock dropped
the err return from each LifecycleDelete call, so a regression in the
concurrent path could pass the length-only assertion. Capture each
err on a buffered channel and require.NoError after wg.Wait().
2026-05-09 18:54:15 -07:00
Chris Lu
ddfb219ec3 test(s3/lifecycle): fake LifecycleDelete server (Phase 12 slice) (#9391)
* test(s3/lifecycle): fake LifecycleDelete server for component tests

A reusable double for SeaweedS3LifecycleInternalServer with per-key
FIFO outcome queues, a fallback Default, and recorded request capture.
Tests of the worker pipeline that need to hit the proto boundary can
queue up DONE/NOOP/RETRY/FATAL/SKIPPED_OBJECT_LOCK responses per
(bucket, objectPath, versionId) and assert dispatch order against
Recorded(). SetError flips the server into transport-failure mode
without polluting the request log.

* test(s3/lifecycle): use struct map key for FakeLifecycleServer queues

Bucket / object path / version-id are user-supplied strings that can
contain "/" or "@", which would collide if the queue map were keyed by
"<bucket>/<object>@<version>". Switch to a struct key so the
components stay separate.
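
The collision is easy to reproduce in a sketch. `queueKey` and `stringKey` are illustrative names; the string format mirrors the one quoted above.

```go
package main

import "fmt"

// queueKey keeps the three components separate, so user-supplied "/" or
// "@" characters cannot make two different triples compare equal.
type queueKey struct {
	Bucket, ObjectPath, VersionID string
}

// stringKey is the collision-prone alternative, shown for comparison.
func stringKey(bucket, object, version string) string {
	return bucket + "/" + object + "@" + version
}

func main() {
	// Two different (bucket, object) pairs flatten to the same string.
	fmt.Println(stringKey("a", "b/c", "v") == stringKey("a/b", "c", "v"))
	// The struct keys remain distinct.
	fmt.Println(queueKey{"a", "b/c", "v"} == queueKey{"a/b", "c", "v"})
}
```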

* test(s3/lifecycle): deep-copy recorded LifecycleDelete requests

Tests that mutate a Recorded() entry — or a request pointer they
already passed in — were able to corrupt the fake's bookkeeping
because the slice carried shared pointers. Clone with proto.Clone at
both record and read time so the fake holds an independent snapshot
of every arriving request and hands callers an independent snapshot
back. Tightened TestFake_VersionIDPartOfKey error checks while there.
2026-05-09 18:38:52 -07:00
Chris Lu
bb0c7c779f feat(s3/lifecycle): metadata-only delete when entry TtlSec > 0 (Phase 2b) (#9390)
* refactor(s3): thread metadataOnly into delete helpers

Add a metadataOnly bool to deleteUnversionedObjectWithClient and
deleteSpecificObjectVersion. When true the helper sends IsDeleteData=
false to the filer's DeleteEntry RPC so per-chunk DeleteFile RPCs are
skipped — the volume server reclaims chunks on its own at TTL drop.
Non-lifecycle callers (DELETE handlers, batch delete) pass false to
preserve today's eager-chunk-delete behavior; only the lifecycle
handler in the next commit will pass true.

* feat(s3/lifecycle): metadata-only delete when entry TtlSec > 0

Per-write TTL stamping (PR 9377) sets Attributes.TtlSec on every
lifecycle-fitting entry. When the live entry the LifecycleDelete
handler fetched carries TtlSec > 0 the volume server is guaranteed
to reclaim chunks at TTL drop, so the filer can skip per-chunk
DeleteFile RPCs and just remove the entry record. lifecycleDispatch
now computes metadataOnly from the live entry and threads it through
the unversioned, suspended-null, and noncurrent/expired-marker delete
paths. createDeleteMarker is unaffected — it creates a marker, never
deletes chunks.
2026-05-09 18:38:38 -07:00
Chris Lu
255e9cd0f7 test(s3/lifecycle): cover reader cursor + Run validation contracts (#9389)
* test(s3/lifecycle): cover reader cursor + Run validation contracts

Layer 2 tests pinning four reader-package contracts the dispatcher
pipeline depends on: MinTsNs anchors at frozen positions, Snapshot
returns a deep copy in both directions, Restore replaces (not merges),
and Run validates ShardID/Events/BucketsPath before subscribing.

* test(s3/lifecycle): tighten cursor composition assertions

Snapshot deep-copy: also assert cursor doesn't see keys added to the
returned map. Restore replace: freeze before second Restore and assert
IsFrozen returns false after, pinning the contract that Restore wipes
frozen state alongside the value map. Run validation: bound the call
with a 5s context timeout so a regression that lets Run reach the nil
client surfaces as a failure instead of a hang.
2026-05-09 14:32:11 -07:00
Chris Lu
aa280443e7 test(s3/lifecycle): Layer 2 multi-shard composition for the dispatcher (#9387)
* test(s3/lifecycle): Layer 2 multi-shard composition for the dispatcher

The existing dispatcher unit tests cover individual outcomes
(DONE / RETRY_LATER / BLOCKED / etc.) on a single shard, and
pipeline_test.go has only one end-to-end happy-path assertion.
Multi-shard composition — the contract Pipeline.Run wires up at
runtime — was untested at the component level.

Add four Layer 2 tests in dispatcher/multi_shard_test.go:

  Two events for two shards land in different schedules, dispatch
  independently, and each cursor advances only for its own event
  (no cross-contamination on the action-key map).

  A poison event on shard 0 returns BLOCKED and freezes shard 0's
  cursor; shard 1's normal event continues to dispatch and its
  cursor advances. Per-shard isolation contract.

  Save/Load round-trips a per-shard cursor snapshot through the
  Persister: a fresh dispatcher restores the same TsNs map. Pins
  the contract Pipeline.Run drives on the checkpoint ticker.

  RETRY_LATER respects RetryBackoff against the wall clock — a
  Tick within the backoff window doesn't re-dispatch; a Tick past
  the new DueTime does. Guards against premature retries from
  refresh ticks landing inside the backoff.

Pipeline.Run itself can't run here (it builds a real reader.Reader),
so the tests share the same fakeClient pattern dispatcher_test.go
uses and drive Tick directly.

* test(s3/lifecycle): drop unused snapshot helper and addAndTick parameter
2026-05-09 14:12:21 -07:00
Chris Lu
1854101125 feat(s3/lifecycle): bootstrap re-walk cadence + operator hooks (Phase 8) (#9386)
* feat(s3/lifecycle): bootstrap re-walk cadence + operator hooks (Phase 8)

scan_only actions only fire from the bootstrap walk: the engine
classifies a rule as scan_only when its retention horizon exceeds the
meta-log retention, so event-driven routing can't be trusted. Today
each bucket walks once per process, so a long-running worker never
revisits — scan_only retention only catches up when the worker
restarts.

Replace BucketBootstrapper.known (set) with BucketBootstrapper.lastWalk
(name -> completion time). KickOffNew now re-walks a bucket whose
last walk completed more than BootstrapInterval ago. Zero interval
preserves the legacy walk-once-per-process behavior so existing
deployments don't change cadence by default. walkBucket re-stamps
on success and clears the stamp on failure (via MarkDirty), so the
next KickOffNew picks failed walks back up.

Add MarkDirty / MarkAllDirty operator hooks for forced re-walks, and
a Now func() for testable time travel.

weed shell run-shard grows --bootstrap-interval (cadence knob) and
--force-bootstrap (drop in-memory state at startup so every bucket
walks again immediately, useful when a config change should take
effect without a restart).

Tests: cadence respected (skip inside interval, re-walk past it);
zero interval keeps once-per-process; MarkDirty forces re-walk
under a 24h interval; MarkAllDirty resets every record. The
fakeClock helper guards the test clock with a mutex so race-detector
runs are clean.

* fix(s3/lifecycle): split walk state, thread BootstrapInterval through worker, drop dead flag

Three issues with the Phase 8 cadence work as it landed:

1. lastWalk did double duty as both completed-walk timestamp and
   in-flight debounce. A walk that took longer than BootstrapInterval
   would have a fresh KickOffNew start a duplicate goroutine on the
   next refresh tick because the stamp from KickOffNew looked stale
   against the interval. Split into lastCompleted (set on success)
   and inFlight (set on dispatch, cleared after the walk goroutine
   returns success or failure). KickOffNew skips inFlight buckets
   regardless of cadence.

2. The cadence knob existed on `weed shell` but not on the production
   path: scheduler.Scheduler constructed BucketBootstrapper without
   BootstrapInterval, and weed/worker/tasks/s3_lifecycle/Config had
   no field for it. Add Scheduler.BootstrapInterval, parse
   `bootstrap_interval_minutes` in ParseConfig (zero = legacy walk-
   once-per-process; negative clamps to zero), and forward it from
   the handler. Tests cover default, override, clamp, and explicit-zero.

3. --force-bootstrap was a no-op: BucketBootstrapper is freshly
   allocated at command start, so MarkAllDirty on empty state does
   nothing, and the flag couldn't influence an already-running
   process anyway. Remove it; a real runtime trigger (SIGHUP, control
   RPC) is a separate change.

In-flight regression: a blockingInjector pins the first walk in
progress while the test advances the clock past the interval. The
second KickOffNew is a no-op (inFlight check). After release, the
post-completion KickOffNew within the interval is also a no-op.
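
The split between completed-walk stamps and in-flight tracking can be sketched as below. The `bootstrapper` type, the `tryStart`/`finish` method names, and the integer-seconds clock are assumptions for this example; the real BucketBootstrapper runs walks in goroutines and uses time values.

```go
package main

import (
	"fmt"
	"sync"
)

type bootstrapper struct {
	mu            sync.Mutex
	interval      int64        // seconds; 0 keeps walk-once-per-process
	now           func() int64 // injectable clock for tests
	lastCompleted map[string]int64
	inFlight      map[string]bool
}

func newBootstrapper(interval int64, now func() int64) *bootstrapper {
	return &bootstrapper{
		interval:      interval,
		now:           now,
		lastCompleted: make(map[string]int64),
		inFlight:      make(map[string]bool),
	}
}

// tryStart reports whether a walk should begin and, if so, marks the
// bucket in flight. In-flight buckets are skipped regardless of cadence.
func (b *bootstrapper) tryStart(bucket string) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.inFlight[bucket] {
		return false
	}
	if last, ok := b.lastCompleted[bucket]; ok {
		if b.interval <= 0 || b.now()-last <= b.interval {
			return false
		}
	}
	b.inFlight[bucket] = true
	return true
}

// finish clears the in-flight mark; success stamps the completion time,
// failure clears the stamp so the next tryStart retries immediately.
func (b *bootstrapper) finish(bucket string, success bool) {
	b.mu.Lock()
	defer b.mu.Unlock()
	delete(b.inFlight, bucket)
	if success {
		b.lastCompleted[bucket] = b.now()
	} else {
		delete(b.lastCompleted, bucket)
	}
}

func main() {
	clock := int64(0)
	b := newBootstrapper(3600, func() int64 { return clock })
	fmt.Println(b.tryStart("bkt")) // first walk starts
	clock = 7200
	fmt.Println(b.tryStart("bkt")) // still in flight, so skipped
}
```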

* test(s3/lifecycle): wait for lastCompleted stamp before advancing fake clock

The cadence test polled listedN to know "the walk happened" — but
that fires once both list passes are issued, while the success-stamp
lands later, after walkBucketDir returns. A clock.Advance(30m)
between those two events would record the stamp at clock+30m
instead of T0; the next assertion would then see now.Sub(last) < 1h
and skip the expected re-walk. Tight in practice but exposed under
-race / load.

Add a waitForCompleted helper that polls b.lastCompleted directly,
and use it before each clock advance in both the cadence and zero-
interval tests.

* fix(s3/lifecycle): expose bootstrap interval in worker UI; honor MarkDirty during walks

Two follow-ups on Phase 8.

The worker config descriptor had no bootstrap_interval_minutes field,
so the production operator UI couldn't enable the cadence — only the
internal ParseConfig + Scheduler wiring knew about it. Add the field
to the cadence section (MinValue=0 since 0 is the legacy default) and
include the default in DefaultValues so existing deployments see the
knob with the right preset.

MarkDirty / MarkAllDirty silently lost their effect when a walk was
in flight: the methods cleared lastCompleted, but the walk's success
path then wrote a fresh timestamp, hiding the operator's invalidation.
Track a pendingDirty set; the walk goroutine consumes the flag on
exit and skips the success stamp, so the next KickOffNew picks the
bucket up immediately.

Regression: pin a walk in progress with a blockingInjector, MarkDirty
the bucket, release the walk, and assert lastCompleted stayed empty
plus the next KickOffNew triggers a new walk inside the
BootstrapInterval window.

* refactor(s3/lifecycle): drop unused MarkDirty / MarkAllDirty + pendingDirty

These methods were the operator-hook half of Phase 8, but the only
caller (--force-bootstrap on the shell command) was removed when it
turned out to be a no-op against a freshly-allocated bootstrapper.
Nothing in production calls them anymore.

Strip the dead surface: MarkDirty, MarkAllDirty, the pendingDirty
set, the dirty-suppression branch in walkBucket, and the three tests
that only exercised those methods. BootstrapInterval-driven
re-bootstrap is the live mechanism. A real runtime trigger (SIGHUP,
control RPC) is a separate change with a real call site.
2026-05-09 13:42:31 -07:00
Chris Lu
edfa1ce210 feat(s3/lifecycle): pointer-transition routing for live PUTs (Phase 5b/4) (#9385)
* feat(s3/lifecycle): pointer-transition routing for live PUTs (Phase 5b/4)

Bootstrap covers existing versions, but a live PUT that creates a new
.versions/<v-new> file and updates the parent's ExtLatestVersionIdKey
didn't fire NoncurrentDays / NewerNoncurrent on the displaced prior
version until the next bootstrap. Close that runtime gap.

The meta-log already emits an Update event for the .versions/
directory itself when the latest pointer changes; the router was
dropping it because buildObjectInfo returns nil for directories. New
branch in Route detects that shape (versioned bucket, NewEntry +
OldEntry both directories with the .versions/ suffix, ExtLatestVersionIdKey
changed, old ID different from the new ID) and emits a Match against the
LOGICAL key with VersionID=oldID. Match.Identity comes from a single
LookupVersion RPC for the displaced version file; SuccessorModTime is
the directory update's mtime, which is the moment the displaced
version became noncurrent.

SiblingLister grows LookupVersion(bucket, key, versionID) for that
single-RPC fetch. filerSiblingLister implements it; routing path
treats NotFound as "displaced version was hard-deleted in the
meantime, suppress" rather than an error.

The router gates the lookup on at least one active event-driven
NoncurrentDays / NewerNoncurrent rule for the bucket, so most buckets
pay nothing per directory update.

Tests: pointer-flip fires NoncurrentDays with displaced version_id;
unchanged pointer skips; empty old pointer skips (first-PUT scenario);
displaced-version NotFound suppresses; no-rule skips lookup;
NewerNoncurrentVersions retains rank-0; unversioned bucket skips.

* fix(s3/lifecycle): SuccessorModTime cache + NewerNoncurrent expansion

Two correctness gaps in pointer-transition routing.

The .versions/ directory's own Attributes.Mtime is preserved across
pointer updates by updateLatestVersionInDirectory: it's a stale clock
relative to the freshly-written latest version. Using it as the
displaced version's SuccessorModTime made NoncurrentDays compute
due = staleMtime + days, which fires immediately on a fresh PUT
into an old .versions/ container. Read ExtLatestVersionMtimeKey
written by setCachedListMetadata; suppress (return no matches) when
the cache is missing rather than fall back to dir mtime.
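
The clock arithmetic above can be sketched as follows. This is a minimal
illustration, not the router's code: noncurrentDue and secondsPerDay are
hypothetical names, and mtimes are assumed to be Unix seconds. The only
facts taken from the commit are due = successor mtime + days and
"suppress when the cached mtime is missing".

```go
package main

import "fmt"

const secondsPerDay = 86400

// noncurrentDue is a hypothetical helper, not the router's real API.
// successorMtime is the Unix-seconds mtime of the version that displaced
// this one; 0 stands for "cached value missing".
func noncurrentDue(successorMtime int64, days int64) (due int64, ok bool) {
	if successorMtime == 0 {
		// No trustworthy clock: suppress rather than fall back to the
		// directory's own (possibly stale) mtime.
		return 0, false
	}
	return successorMtime + days*secondsPerDay, true
}

func main() {
	now := int64(1_778_328_000) // arbitrary "today" in Unix seconds
	staleDirMtime := now - 100*secondsPerDay

	wrong, _ := noncurrentDue(staleDirMtime, 30) // the bug: stale container mtime
	right, _ := noncurrentDue(now, 30)           // the fix: cached latest-version mtime

	fmt.Println("stale dir mtime is already due:", wrong <= now)
	fmt.Println("cached mtime is due in days:", (right-now)/secondsPerDay)

	_, ok := noncurrentDue(0, 30)
	fmt.Println("missing cache schedules a match:", ok)
}
```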

Single-oldID lookup is only enough for pure
NoncurrentVersionExpirationDays. Any rule with NewerNoncurrentVersions
> 0 cares about the noncurrent ranks, and a pointer flip shifts every
prior noncurrent's index by one — the version that just crossed the
keep-count threshold needs to be evaluated too. When any matching
rule needs ranks, list the full .versions/ container, sort newest-
first with mtime + version-id tiebreak, and route every noncurrent
with its real index. Identity-CAS dedups against earlier schedules.

SiblingLister grows ListVersions(bucket, key); filerSiblingLister's
implementation paginates the container fully.

Two regression tests: stale dir mtime + correct cached mtime
schedules ~30 days out (not immediate); NewerNoncurrentVersions=2
with 4 versions fires on the rank-2 entry that just crossed the
threshold while rank-0/1 are retained.

* fix(s3/lifecycle): bound pointer-transition expansion to threshold crossings

routePointerTransitionExpand emitted a Match for every eligible
noncurrent on every PUT. Schedule.Add doesn't dedup, so identity-CAS
at dispatch only saved the wasted RPC, not the heap slot. A hot key
with many already-eligible versions and a count rule would push
O(versions) entries per flip, repeatedly, until dispatch caught up.

Bound the emission to versions that newly entered eligibility on
this specific flip: rank 0 (the displaced version, for the
NoncurrentDays clock) plus rank == rule.NewerNoncurrentVersions
for each active count-gated rule (the version that just crossed
from kept to expired). Bootstrap still owns full backfill for
versions that were already over-threshold.
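
A rough sketch of the bounded emission described above, under stated
assumptions: crossingRanks is a hypothetical helper, ranks are 0-based
among noncurrents in newest-first order, and keepCounts holds the
NewerNoncurrentVersions value of each active count-gated rule. It only
picks which ranks are worth emitting; whether a rule then fires on a
given rank is decided elsewhere.

```go
package main

import "fmt"

// crossingRanks is a hypothetical helper, not the router's real API.
// It returns the noncurrent ranks that newly entered eligibility on one
// pointer flip: rank 0 (the displaced version) plus, for each count-gated
// rule, the rank equal to its keep count.
func crossingRanks(numNoncurrent int, keepCounts []int) []int {
	seen := map[int]bool{}
	var ranks []int
	add := func(r int) {
		if r >= 0 && r < numNoncurrent && !seen[r] {
			seen[r] = true
			ranks = append(ranks, r)
		}
	}
	add(0) // displaced version: its NoncurrentDays clock starts now
	for _, keep := range keepCounts {
		if keep > 0 {
			add(keep) // version that just moved from kept to expired
		}
	}
	return ranks
}

func main() {
	// 6 versions = 1 latest + 5 noncurrents; one rule keeps the 2 newest.
	fmt.Println(crossingRanks(5, []int{2}))
}
```

Ranks 3 and 4 in that example are already over the threshold and are
left to bootstrap, which keeps the per-flip cost independent of how many
versions the key has accumulated.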

Adds a regression with 6 versions and NewerNoncurrentVersions=2:
asserts only the rank-2 entry that just crossed fires, not the
already-over-threshold rank-3/rank-4 entries.

* fix(s3/lifecycle): suppress pointer-transition expansion when newID missing

routePointerTransitionExpand defaulted latestPos to 0 if newID wasn't
found in the listing. That treated the newest sibling as latest,
against the pointer's intent, and misranked every other version. A
race between the pointer write and the version write could land us
there.

Default latestPos to -1, set it only on a real match, and suppress
the expansion when the search misses. Bootstrap repairs state on
the next walk.
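
The sentinel pattern reads roughly like this. latestPosition is a
hypothetical stand-in for the search inside
routePointerTransitionExpand; the sibling IDs are assumed to be sorted
already.

```go
package main

import "fmt"

// latestPosition is a hypothetical helper. It returns the index of newID
// in the sorted sibling IDs, or -1 when the pointer names a version the
// listing did not return.
func latestPosition(siblingIDs []string, newID string) int {
	latestPos := -1 // stays -1 unless a real match is found
	for i, id := range siblingIDs {
		if id == newID {
			latestPos = i
			break
		}
	}
	return latestPos
}

func main() {
	siblings := []string{"v3", "v2", "v1"}
	if pos := latestPosition(siblings, "v4"); pos < 0 {
		// Pointer write raced ahead of the version write: emit nothing
		// and let the next bootstrap walk repair state.
		fmt.Println("newID missing from listing: suppress expansion")
	}
}
```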

The NewerNoncurrentVersions retention test was setting only
lookupEntry, so Route never reached the expansion path it claimed to
exercise. Repoint to listVersions and assert ListVersions was
consulted while LookupVersion was not. Adds a regression covering
the missing-newID suppression directly.

* fix(s3/lifecycle): include bare null version in pointer-transition routing

Bootstrap models the bare-key object as a "null" sibling alongside
.versions/ children, but the live pointer-transition path didn't.
Two cases lost:

1. oldID == "" was treated as "nothing displaced". A pre-versioning
   bare object becomes noncurrent when the first versioned PUT lands
   and the pointer flips to a real id, but live routing skipped it
   and waited for the next bootstrap.

2. The expansion path's ListVersions returned only .versions/
   children. With a bare null in the picture, the noncurrent ranks
   were wrong, so NewerNoncurrentVersions could retain versions that
   should expire and delete versions that should be kept.

SiblingLister grows LookupNullVersion(bucket, key) returning the
bare entry plus an explicit-null flag (matches the bootstrap shape).
filerSiblingLister implements it via util.NewFullPath +
filer_pb.LookupEntry.

routePointerTransitionDisplaced: oldID == "" now consults
LookupNullVersion. When the bare entry exists, route it as
VersionID="null" against the LOGICAL key.

routePointerTransitionExpand: collect .versions/ children plus the
null entry into one sibling slice before sorting and ranking. The
threshold-crossing logic now sees the same N-version set that
bootstrap would compute.

Three new tests: oldID == "" with no null is a no-op (one null
lookup, no version lookup); oldID == "" with a bare null schedules
NoncurrentDays as VersionID="null"; expansion with a bare null
between .versions/ siblings places null at its mtime-correct rank
and only that rank-N entry fires.

* fix(s3/lifecycle): atomic listPageSize so test cleanup doesn't race

KickOffNew dispatches walks via `go b.walkBucket(...)`. A test that
finishes before its goroutines drain leaves them running into the
next test's t.Cleanup, which mutates listPageSize. -race spots the
read/write collision intermittently. Convert listPageSize to
atomic.Uint32; tests use Load/Store. No production semantics change.
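
A minimal sketch of the conversion, assuming Go 1.19+ for
atomic.Uint32. listPageSize and its default of 1024 come from this log;
withPageSize is a hypothetical test helper added for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// listPageSize mirrors the variable named in the commit.
var listPageSize atomic.Uint32

func init() { listPageSize.Store(1024) }

// withPageSize is a hypothetical test helper: swap in a page size,
// run fn, then restore the previous value.
func withPageSize(n uint32, fn func()) {
	old := listPageSize.Swap(n)
	defer listPageSize.Store(old)
	fn()
}

func main() {
	var wg sync.WaitGroup
	// Readers stand in for walk goroutines that outlive a test.
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = listPageSize.Load() // safe alongside concurrent Store
		}()
	}
	withPageSize(2, func() {
		fmt.Println("page size inside test:", listPageSize.Load())
	})
	wg.Wait()
	fmt.Println("page size restored:", listPageSize.Load())
}
```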

* fix(s3/lifecycle): null becomes latest when suspended PUT clears pointer

The router treated newID == "" as if the cached
ExtLatestVersionMtimeKey were still authoritative — but that cache
holds the displaced version's mtime, written by setCachedListMetadata
when the prior version became latest. Using it as SuccessorModTime
made NoncurrentDays=30 immediately fire on a 100-day-old displaced
version even though it became noncurrent today.

When newID == "" the bare null is the new latest. Look it up,
substitute its mtime as the successor clock, and substitute "null"
as the latestPos target for the expansion path's id match. Both
displaced and expand paths now derive the right clock.

updateIsLatestFlagsForSuspendedVersioning was the upstream cause of
the staleness — it cleared ExtLatestVersionIdKey and FileNameKey but
left the cached size/mtime/etag/owner/delete-marker behind. Call
clearCachedVersionMetadata so the .versions/ container is consistent
with "null is latest". The router-side guard is still needed for
older deployments that ran the buggy code, but new writes won't
exercise the workaround.

Two regressions: 100-day-old displaced under NoncurrentDays=30 with
a today-null PUT schedules ~30d out (not immediate); same shape with
NewerNoncurrentVersions=2 ranks the null at latest and only the
rank-2 entry fires.
2026-05-09 12:21:35 -07:00
Chris Lu
2f7ac1d664 feat(s3/lifecycle): NoncurrentVersionExpiration via bootstrap (Phase 5b/3) (#9383)
* feat(s3/lifecycle): NoncurrentVersionExpiration via bootstrap (Phase 5b/3)

Bootstrap now expands every <key>.versions/ directory into one event
per version with sibling state pre-computed. The router fires
NoncurrentDays / NewerNoncurrent off these events using
SuccessorModTime as the noncurrent clock; previously these rules
never ran on a versioned bucket because buildObjectInfo couldn't
classify version-folder events without the latest pointer.

Mechanics

walkBucketDir treats a directory ending in .versions and carrying
ExtLatestVersionIdKey as a SeaweedFS .versions container — emit it
once and skip the recursion. Coincidentally-named directories without
the latest pointer recurse normally.

BucketBootstrapper.expandVersionsDir lists the children, sorts
newest-first by mtime, resolves the latest position from the pointer,
and injects a synthesized reader.Event per version with
BootstrapVersion populated. NoncurrentIndex is 0-based among
noncurrents in newest-first order; SuccessorModTime is the immediate
newer sibling's mtime (zero for the latest). A missing pointer, or one
naming an absent version, falls back to the newest-by-mtime sibling so
a race window can't flag every entry as noncurrent.
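
The ranking step can be sketched as below. The type and function names
are hypothetical and mtimes are assumed to be Unix seconds. One detail
is an assumption of this sketch rather than something the commit
states: a noncurrent that sorts newer than a backdated latest has no
newer sibling, so the sketch uses the latest's mtime as its successor.

```go
package main

import (
	"fmt"
	"sort"
)

type version struct {
	ID    string
	Mtime int64 // Unix seconds
}

type rankedVersion struct {
	ID               string
	IsLatest         bool
	NoncurrentIndex  int   // 0-based among noncurrents, newest first; -1 for the latest
	SuccessorModTime int64 // 0 for the latest
}

// rankVersions is a hypothetical stand-in for the sibling pre-computation.
func rankVersions(children []version, latestID string) []rankedVersion {
	sorted := append([]version(nil), children...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Mtime > sorted[j].Mtime })

	latestPos := 0 // fallback: newest by mtime when the pointer is missing or dangling
	for i, v := range sorted {
		if v.ID == latestID {
			latestPos = i
			break
		}
	}

	out := make([]rankedVersion, 0, len(sorted))
	rank := 0
	for i, v := range sorted {
		if i == latestPos {
			out = append(out, rankedVersion{ID: v.ID, IsLatest: true, NoncurrentIndex: -1})
			continue
		}
		successor := sorted[latestPos].Mtime // sketch assumption, see lead-in
		if i > 0 {
			successor = sorted[i-1].Mtime // immediate newer sibling
		}
		out = append(out, rankedVersion{ID: v.ID, NoncurrentIndex: rank, SuccessorModTime: successor})
		rank++
	}
	return out
}

func main() {
	children := []version{{"v1", 100}, {"v2", 200}, {"v3", 300}}
	for _, r := range rankVersions(children, "v3") {
		fmt.Printf("%+v\n", r)
	}
}
```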

routeBootstrapVersion uses BootstrapVersion to build ObjectInfo
directly (bypassing the version-folder skip in buildObjectInfo) and
runs the standard match loop. ABORT_MPU is excluded by kind-shape
gate. The schedule clock uses SuccessorModTime for noncurrents and
ModTime for the latest, so the dispatcher fires when the rule's days
threshold is met. Match.ObjectKey is the LOGICAL key,
Match.VersionID is the marker's stored version_id — the dispatcher
reaches deleteSpecificObjectVersion or createDeleteMarker correctly.

Layer 2 tests cover both sides. Router: latest fires ExpirationDays;
noncurrent fires NoncurrentDays; NewerNoncurrentVersions retains the
N newest noncurrents; ABORT_MPU never matches. Bootstrap: .versions
dir emitted once and not recursed; missing latest pointer falls back
to newest; backdated PUT (latest pointer is older by mtime) keeps
the right noncurrent index; delete-marker flag propagates.

* fix(s3/lifecycle): no VersionID for latest expirations, child-based dir disambig

Two correctness gaps in Phase 5b/3.

Bootstrap was pinning the version_id on every Match. For
EXPIRATION_DAYS / EXPIRATION_DATE on the latest version this is
unsafe: between schedule and dispatch a fresh PUT can land, the
dispatcher would still identity-match against the original version's
bytes (it still exists at that path) and the resulting delete marker
would hide the new latest. Drop VersionID for those kinds; an empty
VersionID makes the dispatcher fetch the current latest, where
identity-CAS resolves to STALE_IDENTITY and bootstrap re-schedules
with the new latest's identity. NoncurrentDays / NewerNoncurrent /
EXPIRED_DELETE_MARKER still pin the version_id since those are
version-targeted.

isVersionsDir gating on ExtLatestVersionIdKey lost a race window:
createDeleteMarker writes the version file before updating the
parent's Extended pointer, so a walk between those two steps would
see a .versions/ dir without the pointer, recurse into it, and emit
raw version files that the router drops. Match the suffix only and
let expandVersionsDir disambiguate by child inspection: if any child
carries ExtVersionIdKey it's a real .versions container and we expand;
otherwise it's a coincidentally-named user folder and we recurse via
the bucket-walk's own callback so nested entries still flow through.

Tests: latest-expiration assertion flipped to expect empty VersionID;
new tests cover the coincidentally-named-folder recursion and the
race-window expansion (children present, pointer absent).

* fix(s3/lifecycle): filter directory + missing-version-id children at listing

expandVersionsDir's listing callback collected every child with
attributes; subdirectories or entries without ExtVersionIdKey would
be dropped by the empty-id skip in the inner loop but still inflate
NumVersions and skew NoncurrentIndex (the rank derives from the
filtered slice's position, which was wrong when the unfiltered slice
was sorted). Drop directories at listing time and partition the
file children into a versions slice that's the actual rank source.

Test cleanups: out-of-order-mtime test now sets v1 older than v2 so
latestPos > 0 actually exercises the rank-skip branch in
expandVersionsDir; bootstrapVersionEntry preserves nanosecond
precision via MtimeNs to match markerLoneEntry's pattern; drop a
leftover unused idx variable.

* fix(s3/lifecycle): null version + canonical version-id tiebreak

Two correctness gaps in Phase 5b/3 bootstrap.

Null versions live at the bare logical path, not under .versions/.
Bootstrap previously expanded only .versions/<key>/ children, so:
  - pre-versioning objects with newer .versions/ history never had
    their null version expired by NoncurrentDays
  - suspended-bucket writes (which clear the .versions/ latest pointer
    so null becomes current) had every .versions/ child wrongly
    classified as latest by the buildObjectInfo fallback

expandVersionsDir now looks up the bare key via NewFullPath +
LookupEntry, accepts a regular file or an explicit S3 directory-key
marker (Mime set), and folds it into the sibling set with
VersionID="null". Latest resolution: pointer present + names a real
id wins; pointer absent + null exists makes null latest; otherwise
falls back to newest sibling. The walker's regular emission for the
bare entry would otherwise duplicate, so walkBucketDir now does a
two-pass walk per directory level — .versions/ first, then everything
else with a per-walk skipBare set keyed by bucket-relative path that
expandVersionsDir populates when it claims a bare null sibling.

Sort tiebreak: PUTs only set second-level Mtime, so two versions
written in the same second tied. The unstable secondary order let
old-format version filenames sort oldest-first and corrupt
NoncurrentIndex under NewerNoncurrentVersions retention. Add
CompareVersionIds to s3lifecycle/version_time.go (mirrors the
canonical comparator in s3api/s3api_version_id.go to avoid the
import cycle) and use it as a secondary key after mtime equality.

Tests: pre-versioning null-as-noncurrent, suspended null-as-current,
directory-key marker as null version, end-to-end claim through
walkBucketDir's two-pass ordering, and same-second tiebreak via
canonical version-id ordering. fakeFilerClient grows a
LookupDirectoryEntry implementation backed by the same in-memory tree.

* fix(s3/lifecycle): only treat explicit-null bare entries as current

The pointer-missing branch in expandVersionsDir made null latest as
soon as a bare object was found. That's correct for suspended-bucket
writes (s3api_object_handlers_put.go writes the bare entry with
ExtVersionIdKey="null") but wrong for the pre-versioning race window:
a brand-new version under .versions/<file> exists before the parent's
ExtLatestVersionIdKey update lands, and a pre-versioning bare object
has no version-id marker. Marking that older bare object latest hides
the real new version and skips noncurrent expiration of the null
until the next process restart/bootstrap.

Distinguish the two: lookupNullVersion now returns whether the bare
entry's Extended map carries ExtVersionIdKey="null" (the suspended
write marker). expandVersionsDir's pointer-missing branch only
promotes null to latest when explicit; otherwise it falls back to
newest-sibling, which is safe for the race window since the new
version's mtime is fresher than the bare object's.

The existing suspended-null test now uses a new helper that adds the
explicit marker. New regression test covers the race window: bare
entry without the marker + a fresh .versions/<v1> file + missing
parent pointer must keep v1 as latest and the null as noncurrent.

* fix(s3/lifecycle): only the newest item can be the explicit-null latest

The pointer-missing branch in expandVersionsDir scanned every item for
an explicit null and promoted it to latest. After a suspended->enabled
transition that's the wrong call: createVersion writes the version
file before updating ExtLatestVersionIdKey, so a bootstrap that lands
in the race window sees an older bare null with ExtVersionIdKey="null"
plus a newer .versions/<v-new> child and no parent pointer. Promoting
the null misclassifies v-new as noncurrent and skips both the new
version's current-version expiration and the null's noncurrent
scheduling until the next bootstrap.

Constrain the explicit-null branch to items[0]: if the suspended-null
write is genuinely current it'll be the newest by mtime AND tagged.
Anything else falls through to the newest-sibling default.

Adds a regression test for the suspended->re-enabled race.

* fix(s3/lifecycle): paginate bootstrap directory listings

SeaweedList(..., limit=0) is a single-page request: the filer caps
limit=0 at DirListingLimit (1000 by default) and returns whatever fits
in one round trip. expandVersionsDir and walkBucketDir both relied on
that, so any directory bigger than the cap silently truncated. For
noncurrent retention this is correctness, not just scale — a hot key
with more versions than the cap had its rank/sort math computed off
the first page only, leaving NumVersions, NoncurrentIndex,
SuccessorModTime, and the latest-fallback all wrong, with the older
versions never scheduled until a future bootstrap.

Add a listAll helper that drives pagination via StartFromFileName +
inclusive=false, looping until a page returns fewer entries than the
configured page size. Use it in both call sites. Page size is a var
(listPageSize, default 1024) so tests can shrink it without
generating thousands of entries.
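
The pagination loop has roughly this shape. It is a sketch, not the
actual helper: pageFunc and fakeLister are hypothetical stand-ins for
the filer listing call and the fake client, error handling is omitted,
and the signature of the real listAll is not shown in this log.

```go
package main

import (
	"fmt"
	"sort"
)

// pageFunc stands in for one listing RPC: it returns up to limit names
// that sort strictly after startFrom (exclusive start).
type pageFunc func(startFrom string, limit int) []string

// listAll visits every entry by paging until a short page is returned.
// It reports how many listing calls were made.
func listAll(list pageFunc, pageSize int, visit func(name string)) (calls int) {
	if pageSize <= 0 {
		pageSize = 1 // guard: a zero page size would never terminate
	}
	startFrom := ""
	for {
		page := list(startFrom, pageSize)
		calls++
		for _, name := range page {
			visit(name)
		}
		if len(page) < pageSize {
			return calls // short page: nothing left
		}
		startFrom = page[len(page)-1]
	}
}

// fakeLister mimics the semantics described for the fake filer client:
// sorted children, exclusive start, capped at limit.
func fakeLister(names []string) pageFunc {
	sorted := append([]string(nil), names...)
	sort.Strings(sorted)
	return func(startFrom string, limit int) []string {
		i := sort.SearchStrings(sorted, startFrom)
		if startFrom != "" && i < len(sorted) && sorted[i] == startFrom {
			i++ // exclusive start
		}
		end := i + limit
		if end > len(sorted) {
			end = len(sorted)
		}
		return sorted[i:end]
	}
}

func main() {
	var seen []string
	calls := listAll(fakeLister([]string{"e", "a", "c", "b", "d"}), 2,
		func(name string) { seen = append(seen, name) })
	fmt.Println(seen, "in", calls, "calls")
}
```

When the entry count is an exact multiple of the page size, the loop
makes one extra call that returns an empty page; that is the cost of
using "short page" as the stop condition.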

The fake filer client now mirrors the real semantics: sort children
by name, honor StartFromFileName/InclusiveStartFrom, cap at Limit.
New regression tests force a small page size and assert the full
result set is processed and the call count matches what pagination
should drive.

* perf(s3/lifecycle): stream bucket walk in two passes instead of buffering

walkBucketDir was paginating into a children slice and then iterating
twice (pass 1: .versions/, pass 2: everything else). For flat buckets
with millions of entries the buffer is a real memory spike. Drop the
materialization: each pass now drives its own listAll over the same
directory and acts on entries as they stream in. The skipBare ordering
contract is preserved — pass 2 still runs after pass 1 finishes — and
the per-pass paging keeps memory bounded by listPageSize.

Tradeoff: each directory level is listed twice. For workloads where
that matters more than the memory headroom, we can revisit; the
correctness/scale dial here is what the noncurrent rules need.

Updated three tests for the new call count: each walk now records 2
listings per directory (pass 1 + pass 2). The KickOffNew dedup tests
expect 2 calls per bucket; the pagination test expects 6 instead of 3.
2026-05-09 10:48:32 -07:00
Chris Lu
1c917ffacb fix(volume): sticky EIO quarantine; track streamed reads (#9384)
Two follow-ups on PR #9382:

1. Quarantine wasn't sticky. Once CollectHeartbeat crossed the streak
   threshold and hid the replica, a subsequent successful read called
   checkReadWriteError(nil), wiping the streak; the next heartbeat
   then re-announced the suspect replica as read-only and master could
   send reads back to a disk that already failed IoErrorTolerance.
   Added an ioErrorQuarantined sticky flag set on the first heartbeat
   that observes the threshold and cleared only by MarkVolumeWritable
   (resetIoErrorState). clearIoError continues to reset just the
   streak so successful ops don't accumulate phantom errors.

2. Streamed reads bypassed the EIO counter. readNeedleDataInto and
   ReadNeedleBlob — the hot paths for large/range GETs — returned
   ReadNeedleData / needle.ReadNeedleBlob errors without threading
   them through checkReadWriteError, so a disk failing only on those
   paths would never trip IoErrorTolerance. Both now route the
   backend error through the tracker, and a fully clean
   readNeedleDataInto call clears the streak.
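
The sticky behaviour in item 1 can be sketched as a small state
machine. replicaHealth and its methods are hypothetical names; the
tolerance of 3 is the IoErrorTolerance value quoted in the log entry
for #9382. This is an illustration of the flag's lifecycle, not the
volume server's code.

```go
package main

import (
	"fmt"
	"sync"
)

const ioErrorTolerance = 3

// replicaHealth is a hypothetical model of the streak plus sticky flag.
type replicaHealth struct {
	mu          sync.Mutex
	streak      int
	quarantined bool
}

func (h *replicaHealth) noteEIO() {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.streak++
}

// noteSuccess resets only the streak; the quarantine flag is untouched.
func (h *replicaHealth) noteSuccess() {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.streak = 0
}

// hiddenFromHeartbeat latches the flag the first time a heartbeat
// observes the threshold, and keeps returning true afterwards.
func (h *replicaHealth) hiddenFromHeartbeat() bool {
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.streak >= ioErrorTolerance {
		h.quarantined = true
	}
	return h.quarantined
}

// markWritable is the only path that clears the quarantine.
func (h *replicaHealth) markWritable() {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.streak = 0
	h.quarantined = false
}

func main() {
	h := &replicaHealth{}
	for i := 0; i < ioErrorTolerance; i++ {
		h.noteEIO()
	}
	fmt.Println("hidden after threshold:", h.hiddenFromHeartbeat())
	h.noteSuccess()
	fmt.Println("hidden after a clean read:", h.hiddenFromHeartbeat())
	h.markWritable()
	fmt.Println("hidden after operator reset:", h.hiddenFromHeartbeat())
}
```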

Tests cover the sticky flag (TestQuarantineIsSticky) and the streamed
read path (TestReadNeedleBlobTracksEIO via a fake EIO backend).
2026-05-09 09:55:02 -07:00
Chris Lu
7c60407897 fix(volume): don't nuke local data on transient IO error (#9378) (#9382)
* fix(volume): don't nuke local data on transient IO error (#9378)

A single syscall.EIO from any read/write/delete set v.lastIoError, and
the next CollectHeartbeat then called Volume.Destroy on the replica —
removing the .dat/.idx/.vif/.sdx/.ldb/.rdb files. A brief NFS / fabric
/ controller blip hitting several replicas at once could cascade into
removal of the last healthy copy, with no recovery for non-tiered
volumes.

Now require IoErrorTolerance (3) consecutive EIOs before acting, and on
that threshold mark the volume read-only and stop announcing it to the
master so re-replication kicks in from healthy peers — never delete
the data files. The on-disk copy stays for operator inspection /
recovery.

* review: fix race, accounting, recovery, non-EIO streak break

Addressing PR #9382 review:

- Data race on lastIoError: guard lastIoError + lastIoErrorCount with a
  RWMutex and expose them through note/clear/get helpers so the
  heartbeat reader sees a consistent snapshot. Verified with -race.
- Collection-size accounting: when a volume is quarantined for sustained
  EIO, skip the entire per-volume bookkeeping (`continue`) instead of
  flipping shouldDeleteVolume — the old branch subtracted a size that
  was never added, dragging the collection gauge to zero / negative.
- Recoverability: MarkVolumeWritable now also calls clearIoError so an
  operator can rejoin a quarantined replica. The next failed op
  re-arms the streak if the disk is still bad.
- Non-EIO streak break: a non-EIO error (e.g. ENOSPC) now resets the
  consecutive-EIO counter, so a sequence EIO,EIO,ENOSPC,EIO is treated
  as a streak of one — the counter only tracks consecutive EIOs.
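
A minimal sketch of the guarded consecutive-EIO counter from the list
above. eioStreak, errEIO, and errNoSpace are hypothetical names; the
commit only states that the fields sit behind an RWMutex with
note/clear/get helpers and that any non-EIO outcome breaks the streak.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"syscall"
)

const ioErrorTolerance = 3

// Convenience error values for the example.
var (
	errEIO     error = syscall.EIO
	errNoSpace error = syscall.ENOSPC
)

// eioStreak counts consecutive EIO results under a lock so a concurrent
// heartbeat reader sees a consistent value.
type eioStreak struct {
	mu    sync.RWMutex
	count int
}

// note records one operation result. A success (nil) or any non-EIO
// error resets the streak.
func (s *eioStreak) note(err error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if errors.Is(err, syscall.EIO) {
		s.count++
		return
	}
	s.count = 0
}

func (s *eioStreak) get() int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.count
}

func (s *eioStreak) overTolerance() bool { return s.get() >= ioErrorTolerance }

func main() {
	s := &eioStreak{}
	for _, err := range []error{errEIO, errEIO, errNoSpace, errEIO} {
		s.note(err)
	}
	fmt.Println("streak:", s.get(), "over tolerance:", s.overTolerance())
}
```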

Reads already call checkReadWriteError (volume_read.go), so successful
reads also clear the streak — no change needed there.
2026-05-09 09:20:31 -07:00
Chris Lu
c6ad6dcf74 feat(s3/lifecycle): sole-survivor delete-marker routing (Phase 5b/2) (#9381)
* feat(s3/lifecycle): sole-survivor delete-marker routing (Phase 5b/2)

Production stores delete markers under <object>.versions/<version-file>;
buildObjectInfo skips every version-folder event because it can't tell
current from noncurrent without sibling state. EXP_DM is the one rule
that can be routed from the file path alone: if the marker is the only
entry under .versions/<key>/ then it's necessarily the latest.

Add SiblingLister; gate the listing on (versioned, version-folder path,
delete-marker entry, EXP_DM rule active for the bucket, lister supplied)
so non-marker version writes pay nothing. The Match carries the LOGICAL
key in ObjectKey and the marker's version_id in VersionID so the
dispatcher can reach deleteSpecificObjectVersion(bucket, logical, vid);
without a version_id the dispatcher would BLOCK and freeze the cursor.

Server-side dispatch re-checks sole-survivor before deleting: another
PUT can land between schedule and dispatch, and identity-CAS only covers
the marker entry, not the directory shape. Lists .versions/<key>/ with
limit 2 and returns NOOP_RESOLVED if count != 1 or the surviving entry
is not the same version.

Fix isDeleteMarkerEntry to compare string(v) == "true" — production
writes []byte("true"), not {1}; every other reader of ExtDeleteMarkerKey
in this repo uses the string predicate.
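
The difference between the two encodings is easy to see in isolation.
In this sketch extDeleteMarkerKey is a placeholder value, since the
real ExtDeleteMarkerKey string is not shown in this log, and the
Extended map is modelled as a plain map[string][]byte.

```go
package main

import "fmt"

// extDeleteMarkerKey is a placeholder for the real constant.
const extDeleteMarkerKey = "example-delete-marker-key"

// isDeleteMarkerEntry uses the string predicate described above.
func isDeleteMarkerEntry(extended map[string][]byte) bool {
	return string(extended[extDeleteMarkerKey]) == "true"
}

func main() {
	production := map[string][]byte{extDeleteMarkerKey: []byte("true")}
	singleByte := map[string][]byte{extDeleteMarkerKey: {1}}

	fmt.Println("production encoding:", isDeleteMarkerEntry(production))
	fmt.Println("single-byte encoding:", isDeleteMarkerEntry(singleByte))
}
```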

filerSiblingLister.Count caps at limit 2 — callers only distinguish
"sole survivor" (1) from "more than one" (>=2).

Follows #9373.

* fix(s3/lifecycle): gate sibling listing on active event-driven EXP_DM

hasActionKind tested key presence only; an EXP_DM rule whose action was
inactive (BootstrapComplete=false) or scan-only would still trigger the
sibling-listing RPC even though no match could fire. Replace with a
helper that mirrors the per-key filter Route already applies before
emitting matches: kind matches, snap.Action returns non-nil, IsActive,
and Mode == ModeEventDriven.

Add a regression test that builds a versioned snapshot with the rule
but no PriorStates (action stays inactive) and asserts the lister is
never consulted.

* chore(s3/lifecycle): trim verbose comments

* fix(s3/lifecycle): null version, hard-delete trigger, latest pointer

Three correctness gaps in EXP_DM routing.

1. Null version. SiblingLister.Count saw only .versions/<key>/, but a
   pre-versioning bare-key object survives outside that folder when
   versioning is enabled later. Marker + null would look like count==1
   and EXP_DM would re-expose the old object. Replace Count with
   Survivors which also reports HasNullVersion; route- and dispatch-time
   suppress when set.

2. Hard-delete trigger. The router skipped events with NewEntry==nil so
   a noncurrent hard-delete that left the marker as sole survivor never
   fired EXP_DM. Allow version-folder hard-deletes through; the lister's
   LoneEntry carries the surviving marker's version_id and identity.

3. Latest pointer. checkSoleSurvivorMarker only checked count and
   filename. The worker can race createDeleteMarker between the file
   write and the .versions/ directory metadata update. Require
   ExtLatestVersionIdKey == versionId; missing pointer returns
   retry-later instead of deleting.

Adds a null-version exists check on the dispatch path too.

* fix(s3/lifecycle): normalize null-version lookup errors and detect dir markers

Two correctness bugs in the null-version check.

Route-side: filerSiblingLister called client.LookupDirectoryEntry
directly. SeaweedList wraps not-found via filer_pb.LookupEntry, which
normalizes the gRPC string-mapped not-found into ErrNotFound. The raw
client returns it as a generic error instead, so every absent
null-version (the common case) bubbled up as an error and the router
suppressed every otherwise-valid match. Use filer_pb.LookupEntry.

Both sides: explicit S3 directory-key markers (object names ending in /)
are stored as directory entries with Attributes.Mime set;
processExplicitDirectory in the listing path treats them as null
versions. The previous check was !IsDirectory only, which let the
marker delete proceed and re-expose the directory key. Add
IsDirectoryKeyObject() to both predicates. Also use
util.NewFullPath(...).DirAndName() for the parent/name split so a
trailing-slash key resolves to the same underlying entry path as the
listing code.

* fix(s3/lifecycle): EXP_DM ctx propagation, nil-entry guard, fast-path skip

Three small follow-ups on the EXP_DM dispatch path.

checkSoleSurvivorMarker now takes ctx instead of context.Background()
so worker shutdown / deadlines cancel the SeaweedList RPC instead of
stalling.

If SeaweedList fires the lone callback with entry==nil, firstName
stays empty and the marker-replaced check would short-circuit; that's
the one shape that bypasses the dispatch guard, so retry-later instead.

routeSoleSurvivorMarker now skips the Survivors RPC on regular
non-marker version creates — those always have Count >= 2, so the
listing was wasted load on every versioned write under an EXP_DM rule.
Hard-delete events (NewEntry==nil) and marker creates still flow
through. Added a regression test asserting the regular-create case
doesn't consult the lister.

Documented that logicalKeyFromVersionPath rejects bucket-root markers
intentionally.
2026-05-08 23:24:08 -07:00
Chris Lu
196c41d21a test(s3/lifecycle): cover scheduler/bootstrap walker + MPU detection (#9380)
Locks down isMPUInitDir, walkBucketDir (regular/nested/MPU/error/cancel
paths), and BucketBootstrapper.KickOffNew (per-bucket fanout and
in-process dedup) against a fake SeaweedFilerClient.
2026-05-08 22:14:45 -07:00
Chris Lu
ee1d8f9e8c refactor(s3api): drop filer.conf TTL routing from PUT lifecycle (#9379)
PutBucketLifecycleConfiguration used to install /buckets/<bucket>/<prefix>
day-TTL entries in filer.conf so the volume server's RocksDB compaction
filter would expire matching writes. With #9377 the s3api server now stamps
volume TTL per-write via LifecycleTTLResolver off the stored XML, which
covers the same prefix-only Expiration.Days subset and additionally
handles size filters and AWS overlapping-rule precedence. Maintaining
both paths means a rule change has to mutate two stores in lockstep, and
the filer.conf path can't represent everything the resolver does.

Drop the add path. Keep a one-way cleanup loop so an upgrade still wipes
day-TTL entries written by older builds — otherwise a stale entry would
silently double-stamp writes (volume server expires under the old rule)
or contradict the new XML after a rule change.

Also removes resolveLifecycleDefaultsFromFilerConf (no longer needed) and
the versioning-fast-path guard (the resolver itself returns nil for
versioned/object-lock buckets, covered by
TestNewLifecycleTTLResolver_NilOnVersionedBucket). Tests covering the
deleted helpers are deleted with them; the GET fallback that synthesizes
lifecycle rules from existing filer.conf TTLs is unchanged so users who
historically configured TTL via filer.conf directly still see a rule.
2026-05-08 21:54:39 -07:00
Chris Lu
2458f6c81c feat(s3api): apply lifecycle TTL at write time (#9377)
* feat(s3api): apply lifecycle TTL at write time

The S3 server already has the bucket's lifecycle XML at PUT time (via
the cached BucketConfig), so volume-TTL routing is just a per-write
decision instead of something that needs a separate filer.conf
projection kept in sync via operator commands.

- BucketConfig caches the canonical Rules parsed from the lifecycle
  XML once on load (BucketConfigCache invalidates on Put/Delete
  Lifecycle, so the rules stay current automatically).
- resolveLifecycleTTLForWrite walks the cached rules: longest-prefix
  match, applies tag and size filters against the request, returns
  Days * 86400. Versioned buckets, non-Expiration.Days rules, and
  unevaluable size filters (no Content-Length) yield 0 — the
  lifecycle worker handles those at scan time.
- putToFiler resolves TTL once and passes it through both the
  AssignVolumeRequest (so chunks land on a TTL volume) and the new
  entry's Attributes.TtlSec (so the filer's RocksDB compaction also
  expires the metadata).

Lifecycle XML PUT/DELETE now influences write routing immediately —
no operator command, no filer.conf bookkeeping. The lifecycle worker
remains authoritative for the cases the fast path can't cover (existing
objects via bootstrap, versioned buckets, noncurrent retention,
abort-MPU, tag/size filters that didn't hold at PUT time).

CompleteMultipartUpload and CopyObject still need wiring; left for
follow-ups so this PR stays scoped.

* perf(s3api): pre-filter and sort lifecycle rules for the per-PUT TTL walk

resolveLifecycleTTLForWrite walked every lifecycle rule on every
PutObject, including disabled / non-Expiration.Days rules that could
never fire on the fast path, and computed "longest prefix wins" via a
running max instead of an early exit.

Cache a pre-filtered + pre-sorted slice in BucketConfig:
- buildTTLFastPathRules drops everything except Status=Enabled +
  ExpirationDays>0;
- sorts by descending prefix length (stable, so equal-length rules
  keep their XML order).

The resolver returns on first prefix+filter match. A bucket whose
lifecycle XML has no Expiration.Days rules is now O(1); a typical
bucket with one Expiration.Days rule walks one HasPrefix per PUT.

The cache is built once per bucket-config load. PutBucketLifecycle /
DeleteBucketLifecycle already invalidate the cache, so the fast-path
slice stays current automatically.

* refactor(s3api): LifecycleTTLResolver object + four review fixes

Pulls the per-PUT TTL resolution into a dedicated type so the bucket
config holds one object instead of a slice + magic-walk function:

- LifecycleTTLResolver wraps the pre-filtered, pre-sorted rules.
  nil-safe Resolve so the call site doesn't have to special-case
  buckets with no eligible rules.

Four review findings:

1. (high) drop tag-filtered rules from the fast path. Tags are mutable
   post-PUT via PutObjectTagging but volume TTL is irreversible — an
   object that matched at write time would still expire after the tag
   was removed. Worker re-evaluates current tags at scan time. Fast
   path now keeps only stable predicates: prefix and size.

2. (high) move TTL resolution out of putToFiler. MPU parts, copy-part
   destinations, and other transient writes called putToFiler with
   object="" — bucket-wide rules (empty Prefix) matched and bound a
   TTL clock starting at part-upload time, before
   CompleteMultipartUpload existed. putToFiler now takes an explicit
   ttlSec parameter; only the user-visible PutObject paths
   (PutObjectHandler, postpolicy) feed it from the resolver. MPU and
   copy-part pass 0.

3. (medium) AWS overlapping-rule precedence is "shorter expiration
   wins", not "longest prefix wins". Sort by ExpirationDays ascending
   so the first prefix match is also the shortest applicable rule.

4. (medium) overflow no longer caps at math.MaxInt32 seconds (~68y).
   A longer policy would have expired early. Return 0 instead so the
   worker enforces the actual policy on its own schedule.

Versioning gate moves into the resolver constructor — versioned
buckets get a nil resolver. The five putToFiler callers are all updated:
PutObjectHandler + postpolicy resolve via lifecycleTTLForObjectWrite,
suspended/versioned wrappers pass 0 by construction, MPU part and
copy-part SSE pass 0 with a one-line comment about why.

* refactor(s3api): drop unused BucketConfig.LifecycleRules field

The full canonical rule set was set on every bucket-config load but
never read — resolveLifecycleTTLForWrite worked off the resolver's
filtered slice, and the lifecycle worker reads bucket entries straight
off the meta-log instead of this cache. Remove the field and its
s3lifecycle import.

* perf(s3api): pre-compute LifecycleTTLResolver hot-path fields

Resolve was doing per-call work that's actually constant per bucket-
config load: int64 multiplication, max-int32 overflow check, field
indirections through *s3lifecycle.Rule. Move it to the constructor
and pack the rule into a compact ttlRule (prefix + ttlSec int32 +
sizeGT/sizeLT) so the inner loop is HasPrefix → optional size check
→ return.

Drop overflowing rules at construction rather than handling per-
resolve: capping would expire long policies early, and returning 0
in the inner loop would prevent any shorter overlapping rule from
firing. Drop-at-construction composes correctly with the ascending
sort.
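
A minimal sketch of the constructor/Resolve split described above. The
type and field names (rule, ttlRule, resolver, newResolver) are
illustrative stand-ins, not the actual s3api definitions, and the strict
size comparisons are an assumption modeled on S3's
ObjectSizeGreaterThan / ObjectSizeLessThan semantics:

```go
package main

import (
	"math"
	"sort"
	"strings"
)

// rule is a hypothetical stand-in for a parsed lifecycle rule.
type rule struct {
	Prefix         string
	ExpirationDays int64
	SizeGT, SizeLT int64 // 0 means "no bound" in this sketch
}

// ttlRule is the compact record the hot path walks.
type ttlRule struct {
	prefix         string
	ttlSec         int32
	sizeGT, sizeLT int64
}

type resolver struct{ rules []ttlRule }

// newResolver does the per-bucket-config work once: multiply days into
// seconds, drop rules that would overflow int32, sort ascending by TTL.
func newResolver(in []rule) *resolver {
	out := make([]ttlRule, 0, len(in))
	for _, r := range in {
		if r.ExpirationDays <= 0 || r.ExpirationDays > math.MaxInt32/86400 {
			continue // ineligible or overflowing: left to the worker
		}
		out = append(out, ttlRule{r.Prefix, int32(r.ExpirationDays * 86400), r.SizeGT, r.SizeLT})
	}
	if len(out) == 0 {
		return nil
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].ttlSec < out[j].ttlSec })
	return &resolver{rules: out}
}

// Resolve is nil-safe; size < 0 means "unknown" and skips size-filtered rules.
func (r *resolver) Resolve(key string, size int64) int32 {
	if r == nil {
		return 0
	}
	for _, tr := range r.rules {
		if !strings.HasPrefix(key, tr.prefix) {
			continue
		}
		if tr.sizeGT > 0 || tr.sizeLT > 0 {
			if size < 0 || (tr.sizeGT > 0 && size <= tr.sizeGT) || (tr.sizeLT > 0 && size >= tr.sizeLT) {
				continue
			}
		}
		return tr.ttlSec // first match is the shortest applicable TTL
	}
	return 0
}
```

Because overflowing rules never enter the slice, a shorter overlapping
rule on the same prefix can still match, which is the composition
property the ascending sort relies on.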

Benchmarks (Apple M4):
  NilReceiver           0.99 ns/op   0 B/op
  OneRuleMatching       2.75 ns/op   0 B/op
  FiveRulesNoMatch     13.5  ns/op   0 B/op

* fix(s3api): refresh LifecycleTTL resolver on bucket-config update

storeBucketLifecycleConfiguration writes to Entry.Extended via
updateBucketConfig, which clones the cached BucketConfig and calls
the user fn, then caches the result. The clone inherits the prior
LifecycleTTL pointer and nothing rebuilt it from the new XML, so
add/replace/delete of a lifecycle policy left the wrong resolver in
cache until eviction. Same gap on the meta-log side: peer-driven
updates flowed through updateBucketConfigCacheFromEntry without
re-deriving the resolver.

Centralize the Entry -> derived-field mapping in one helper that
resets every Extended-backed field then repopulates from the entry,
and call it from getBucketConfig (initial load), updateBucketConfig
(after updateEntry succeeds, before caching), and
updateBucketConfigCacheFromEntry (meta-log path). Reset is the
load-bearing part: deleting the lifecycle XML must yield a nil
resolver, since stamping a stale TTL onto subsequent writes is
irreversible.
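
The reset-then-repopulate shape could look roughly like the sketch
below. All names here (applyEntry, bucketConfig, ttlResolver, and the
two Extended keys) are hypothetical placeholders rather than the real
s3api identifiers:

```go
package main

// entry is a stand-in for the filer entry backing a bucket.
type entry struct{ Extended map[string][]byte }

// ttlResolver is a placeholder for the derived lifecycle resolver.
type ttlResolver struct{ xml string }

type bucketConfig struct {
	Versioning   string
	LifecycleTTL *ttlResolver
}

// Hypothetical Extended keys, for illustration only.
const (
	extVersioning = "versioning"
	extLifecycle  = "lifecycle-xml"
)

// applyEntry resets every Extended-backed field before repopulating it,
// so a field whose backing key was deleted cannot survive on a clone.
func applyEntry(cfg *bucketConfig, e *entry) {
	cfg.Versioning = ""
	cfg.LifecycleTTL = nil
	if e == nil {
		return
	}
	if v, ok := e.Extended[extVersioning]; ok {
		cfg.Versioning = string(v)
	}
	if x, ok := e.Extended[extLifecycle]; ok && len(x) > 0 {
		cfg.LifecycleTTL = &ttlResolver{xml: string(x)}
	}
}
```

Calling the same helper from all three load/update paths means a clone
that inherited a stale resolver pointer is corrected before it is
cached.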

* fix(s3api): PostPolicy passes object size, not multipart wire size

lifecycleTTLForObjectWrite was reading r.ContentLength, which on the
PostPolicy path is the multipart envelope (form fields + boundaries),
not the uploaded object body. A size-filtered rule would evaluate
against that inflated total and stamp (or skip) a TTL the policy
didn't intend.

Take the object size as an explicit parameter. PutObject still passes
r.ContentLength (correct there); PostPolicy passes the fileSize
already extracted from the form part. Negative size means unknown
and continues to skip any size-filtered rule.

* fix(s3api): treat Object Lock as versioned for lifecycle TTL fast path

Object Lock requires versioning at the API level, but it can be
enabled at create time without S3 ever writing the explicit
Versioning header. The lifecycle resolver construction site only
checked Versioning, so an Object-Lock bucket with no Versioning byte
would still get a fast-path resolver and stamp volume TTL onto
writes — destroying noncurrent versions when the volume expires.

Mirror the OR already used in BucketIsVersioned: ObjectLockConfig
non-nil counts as versioned for resolver construction. Existing
explicit-Versioning paths are unchanged.
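
As a rough illustration of the gate, assuming a simplified config
struct (the field types and the helper names below are hypothetical, and
treating any non-empty versioning state as versioned is an assumption of
this sketch):

```go
package main

// objectLockConfig is a placeholder for the parsed Object Lock settings.
type objectLockConfig struct{}

type bucketConfig struct {
	Versioning       string // "" when no versioning state was ever written
	ObjectLockConfig *objectLockConfig
}

// treatAsVersioned ORs the explicit versioning state with the presence
// of an Object Lock configuration.
func treatAsVersioned(c *bucketConfig) bool {
	return c.Versioning != "" || c.ObjectLockConfig != nil
}

// fastPathAllowed reports whether a PUT-time TTL resolver may be built.
func fastPathAllowed(c *bucketConfig) bool {
	return !treatAsVersioned(c)
}
```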
2026-05-08 21:35:27 -07:00
Chris Lu
e55db58ca9 feat(s3/lifecycle): expose Prometheus metrics (Phase 7) (#9375)
* feat(s3/lifecycle): expose Prometheus metrics (Phase 7)

Five new gauges/counters under the s3_lifecycle subsystem so operators
can see what the worker is doing without grepping logs:

- dispatch_total{bucket,kind,outcome} — every LifecycleDelete RPC
  bumps this. Outcome is the proto enum name (DONE, NOOP_RESOLVED,
  RETRY_LATER, BLOCKED, …) plus a synthetic "RPC_ERROR" for transport
  failures classified as RETRY_LATER.
- schedule_depth{shard} — pending matches in each shard's schedule,
  sampled on the dispatcher tick.
- cursor_min_ts_ns{shard} — per-shard min cursor timestamp; lag is
  derived as (now - min) by the scrape side.
- events_total{shard} — meta-log events the reader fed to the router.
- bootstrap_dispatch_total{bucket,kind} — bootstrap-walk dispatches.

Test asserts the dispatch counter increments for both DONE and
RPC_ERROR paths.

* fix(stats): purge lifecycle bucket label series in DeleteBucketMetrics

The two new bucket-labeled lifecycle counters
(S3LifecycleDispatchCounter, S3LifecycleBootstrapDispatchCounter)
weren't included in DeleteBucketMetrics, so explicit bucket teardown
left their label series behind — same cardinality leak the existing
counters above already avoid. Tack them onto the same DeletePartialMatch
chain.
2026-05-08 17:49:10 -07:00
Chris Lu
05d31a04b6 fix(s3tests): wire lifecycle worker for expiration suite (#9374)
* fix(s3tests): wire lifecycle worker for expiration suite

The upstream s3-tests `test_lifecycle_expiration` / `test_lifecyclev2_expiration`
exercise the "set rule, wait, verify deletion" path. Phase 4 (#9367) intentionally
stripped the PUT-time back-stamp, so pre-existing objects no longer pick up TtlSec
on a freshly-applied rule. The s3tests CI bare-bones `weed -s3` had nothing left
driving expiration.

Three changes that work together:

- Engine scales `Days` by `util.LifeCycleInterval`. Production keeps the 24h day;
  the `s3tests` build tag shrinks it to 10s so a `Days: 1` rule completes inside
  the suite's 30s polling window. Exported `DaysToDuration` so sibling-package
  tests pin to the same scale.
- Scheduler/dispatcher tick defaults split into `_default` / `_s3tests` files.
  Production stays 5s/30s/5m; the test build runs at 500ms/2s/2s so deletions
  land within a couple ticks of becoming due.
- s3tests.yml spawns `weed shell s3.lifecycle.run-shard -shards 0-15 -events 0
  -runtime 1800s` alongside the s3 server in both the basic and SQL blocks; the
  shell command runs the full pipeline (reader + scheduler + dispatcher) for the
  duration of the suite. `test_lifecycle_expiration_versioning_enabled` is left
  out for now — versioned-bucket expiration via the worker still needs its own
  pass.
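
The day rescale in the first bullet amounts to multiplying by a
build-selected interval. A sketch under the assumption that the interval
is passed in explicitly (the real DaysToDuration reads
util.LifeCycleInterval, and its exact signature is not shown here):

```go
package main

import "time"

// Stand-ins for the two build-tag values of util.LifeCycleInterval.
const (
	prodInterval = 24 * time.Hour
	testInterval = 10 * time.Second
)

// daysToDuration scales a rule's Days by the configured "day" length.
func daysToDuration(days int, interval time.Duration) time.Duration {
	return time.Duration(days) * interval
}
```

With the 10s interval, a `Days: 1` rule becomes due after 10s, which
fits inside the suite's 30s polling window.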

Drive-by: bump `TestWorkerDefaultJobTypes` to 7 to match the registered
handler count (8b87ceb0d updated `mini_plugin_test.go` for the s3_lifecycle
plugin but missed this twin test).

Two retention-gate engine tests `t.Skip` under the s3tests build because they
rely on absolute lookback-vs-retention math the day-rescale collapses; the prod
build still covers them.

* review: harden lifecycle worker spawn + assert handler identity

- Workflow: aliveness check on the backgrounded `weed shell` (a bad command
  exits in <1s, and the suite would otherwise fail with an opaque timeout);
  move worker/server teardown into a `trap cleanup EXIT` so failure paths
  still print the worker log and reap the data dir.
- worker_test: check the actual job-type set by name, not just the count.

* fix(shell): keep s3.lifecycle.run-shard alive when no rules exist yet

The s3-tests CI runs the worker BEFORE any test creates a bucket, so
LoadCompileInputs returns empty and the shell command was bailing out
with "no buckets with enabled lifecycle rules found" within ~1s. The
aliveness check then fired exit 1 before tox ever started.

Two changes:

- Don't early-exit on empty inputs. Compile against the empty set, log a
  one-liner, and let the pipeline run normally — the meta-log subscription
  is already up, so events for buckets created later DO arrive; they just
  need the engine to know about them when they do.
- Add `-refresh <duration>` (default 5m, 2s in s3tests CI) that
  periodically re-runs LoadCompileInputs + engine.Compile so rules added
  after startup land in the snapshot the dispatcher reads on its next
  tick. Production deployments keep the 5m default; only the CI workflow
  drops to 2s.

Workflow passes `-refresh 2s` in both basic and SQL blocks.

* fix(shell): backfill pre-rule entries via bootstrap walker

The reader-driven path only sees meta-log events created AFTER its
engine snapshot knows the rule. The s3-tests CI scenario PUTs objects
first, then PUTs the lifecycle config, so by the time the engine
refresh picks up the new bucket the object events have already been
seen-and-dropped (BucketActionKeys returned empty for the bucket).

Wire bootstrap.Walk into the shell command:

- bucketBootstrapper tracks buckets seen so far. kickOffNew spawns one
  loop goroutine per fresh bucket.
- Each goroutine re-walks the bucket every walkInterval (defaults to
  the same value as -refresh, i.e. 2s in s3tests CI, 5m in prod) and
  feeds each entry through bootstrap.Walk; due actions dispatch via a
  direct LifecycleDelete RPC. Not-yet-due entries are silently skipped
  and picked up on a later iteration once they age past their (rescaled
  or real) threshold.
- LifecycleDelete is called with no expected_identity; the server-side
  identityMatches treats nil as "skip CAS", which is the right call
  for bootstrap (the bootstrap entry doesn't carry chunk fid /
  extended hash anyway).

The dispatcher's pkg-private toProtoActionKind is duplicated in the
shell file rather than exported, since the shape is six lines and the
reverse import would pull a proto dep into the s3lifecycle root.

* refactor(s3/lifecycle): hoist bucket bootstrapper into scheduler pkg

The shell command got the backfill in the previous commit but the worker
plugin (weed/worker/tasks/s3_lifecycle/handler.go) drives Scheduler.Run
directly and missed it — same root cause: the reader-driven path only
sees events created after the rule lands, so a daily cron picking up a
freshly-PUT rule wouldn't expire any pre-rule object.

Move the looping bucket walker into scheduler.BucketBootstrapper:

- Scheduler.Run now constructs one and calls KickOffNew on every engine
  refresh. Per-bucket goroutines re-walk every BootstrapWalkInterval
  (defaults to RefreshInterval — 5m in prod, 2s under s3tests).
- The shell command consumes the same struct instead of its own copy
  so the two paths can't drift in semantics.

* refactor(s3/lifecycle): walk-once + schedule via event injection

The previous per-bucket walker re-listed every WalkInterval forever. For a
bucket with N objects under a long rule, the worker did O(N * runtime /
walkInterval) listings even when nothing was newly due — way too much
for production-scale buckets.

New approach: walk each bucket exactly once on first sight, synthesize
one *reader.Event per existing entry, push it onto Pipeline.events.
Router.Route builds a Match with DueTime=mtime+delay; future-due matches
sit in the per-shard Schedule and fire when their DueTime arrives.
Currently-due matches fire on the very next dispatch tick.

Wiring:

- dispatcher.Pipeline lifts its events channel into a struct field
  with sync.Once init, and exposes InjectEvent(ctx, ev). Reader no
  longer closes the channel — the dispatch goroutine exits on runCtx
  cancellation, which works the same as channel-close did.
- scheduler.BucketBootstrapper drops the WalkInterval ticker. KickOffNew
  spawns one walker goroutine per fresh bucket; the goroutine lists,
  synthesizes events, then exits.
- scheduler.Scheduler builds its pipelines up front and exposes a
  pipelineFanout (shard -> Pipeline) as the EventInjector, so a multi-
  worker scheduler routes each synthesized event to the pipeline that
  owns its shard.
- Shell command's single-pipeline path passes pipeline.InjectEvent
  directly.

Synthesized events carry TsNs=0; dispatcher.advance treats that as a
no-op so the reader's persisted cursor isn't ratcheted past unprocessed
meta-log events. Identity (HeadFid + ExtendedHash) is still computed
from the real filer entry, so the server's identity-CAS catches an
overwrite between bootstrap and dispatch.
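
A simplified sketch of the two invariants above: DueTime is derived from
the entry's mtime, and a zero TsNs never moves the cursor. The event,
match, and cursor types are illustrative, not the real reader /
dispatcher structs:

```go
package main

import "time"

// event is a stand-in for *reader.Event; synthesized events carry TsNs=0.
type event struct {
	Key   string
	Mtime time.Time
	TsNs  int64
}

type match struct {
	Key     string
	DueTime time.Time
}

// route computes when the action becomes due for this entry.
func route(ev event, delay time.Duration) match {
	return match{Key: ev.Key, DueTime: ev.Mtime.Add(delay)}
}

// cursor tracks the highest meta-log timestamp processed so far.
type cursor struct{ tsNs int64 }

// advance ignores synthesized events (TsNs == 0) and never moves backwards.
func (c *cursor) advance(tsNs int64) {
	if tsNs == 0 {
		return
	}
	if tsNs > c.tsNs {
		c.tsNs = tsNs
	}
}
```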

* debug(s3tests): make lifecycle worker progress visible in CI logs

The previous CI failure dumped an empty $LC_LOG even though the worker
was running. Two reasons:

1. weed shell suppresses glog by default (logtostderr / alsologtostderr
   set to false). Pass `-debug` so the bootstrapper's V(0) lines reach
   stderr instead of disappearing into /tmp/weed.*.log.
2. cleanup used `kill -9` which skips Go's stdout flush. SIGTERM first
   with a 1s grace, then SIGKILL the holdout, then read the log.

While here: bump the bootstrap walker's two informational logs to V(0)
so the diagnosis from CI doesn't require -v=1 on the worker.

* fix(s3/lifecycle/dispatcher): refresh snap on every event

Pipeline.Run captured snap at startup and only refreshed it on the
dispatch tick. With bootstrap event injection, the walker pushes events
seconds after engine.Compile sees the bucket — typically WITHIN the
same dispatch interval. Routing against the cached (empty) snap then
silently dropped every match because BucketActionKeys returned nil for
the bucket-not-yet-in-snapshot case.

Re-fetch on each event. Engine.Snapshot is an atomic.Pointer.Load, so
the cost is negligible. The dispatch-tick branch keeps using a fresh
local read for its own loop, so its semantics are unchanged.
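
The stale-snapshot failure and the per-event re-fetch can be sketched
with sync/atomic's generic Pointer. The engine and snapshot types below
are simplified placeholders for the real ones:

```go
package main

import "sync/atomic"

// snapshot maps bucket name to its active action keys.
type snapshot struct{ buckets map[string][]string }

func (s *snapshot) bucketActionKeys(bucket string) []string {
	if s == nil {
		return nil
	}
	return s.buckets[bucket]
}

type engine struct{ snap atomic.Pointer[snapshot] }

// Snapshot is a single atomic load, cheap enough to call per event.
func (e *engine) Snapshot() *snapshot { return e.snap.Load() }

// compile publishes a new immutable snapshot.
func (e *engine) compile(buckets map[string][]string) {
	e.snap.Store(&snapshot{buckets: buckets})
}

// routeEvent re-fetches the snapshot instead of using one captured at startup.
func routeEvent(e *engine, bucket string) []string {
	return e.Snapshot().bucketActionKeys(bucket)
}
```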
2026-05-08 17:29:47 -07:00
Chris Lu
159cfc97ce feat(s3/lifecycle): classify versioned events by storage path (Phase 5b/1) (#9373)
* feat(s3/lifecycle/router): classify versioned events by storage path

Phase 5b first slice. Pass the bucket's Versioned flag from the engine
snapshot into buildObjectInfo and:

- Recognize <key>.versions/<vid> events as noncurrent versions.
  IsLatest=false, info.Key strips the .versions/<vid> suffix so a
  rule's Filter.Prefix matches the user's logical key, and the
  AWS-visible version_id rides on Match.VersionID for the dispatcher
  to target a single version on the server.
- Read IsDeleteMarker from Extended unconditionally — the engine
  rejects ExpiredObjectDeleteMarker when NumVersions != 1, so without
  sibling listing the marker case stays correctly suppressed (a
  separate PR will add the listing).
- Non-versioned buckets keep the existing behavior even when an
  object literally named "*.versions/v1" exists; Versioned=false
  short-circuits the path classification.

Time-based NoncurrentDays now fires on noncurrent events.
NewerNoncurrent and ExpiredObjectDeleteMarker still need sibling
listing — left for a follow-up.

* fix(s3/lifecycle/router): require ExtVersionIdKey to confirm noncurrent

Path classification alone misclassifies a literal-key collision: a
versioned bucket holding an object with key "logs/backup.versions/2023"
would be flagged noncurrent and have its key stripped to "logs/backup",
losing the user's actual rule-prefix-matching path. SeaweedFS doesn't
reserve the .versions/ segment, so the path shape is necessary but
not sufficient.

Add an authoritative confirmation: the entry must declare the same
version_id via ExtVersionIdKey (the field SeaweedFS sets when storing
a tracked version). Also reject idx==0 paths so ".versions/<vid>"
can't yield an empty logical key.

Tests:
- collision: versioned bucket + .versions/ in literal key + no
  metadata (and the mismatched-vid variant) → still classified as a
  current-version object;
- root-versions: .versions/v1 (idx==0) → treated as a regular key;
- existing noncurrent test now sets ExtVersionIdKey to mirror the
  storage shape.

* fix(s3/lifecycle/router): skip versioned-bucket version-folder events

The previous attempt tried to classify <key>.versions/<vid> events as
noncurrent versions by storage path. That's broken on three counts:

- SeaweedFS stores version files as v_<id> (getVersionFileName), so
  comparing the path suffix to the raw ExtVersionIdKey never matches.
- The "current latest" version on a versioned bucket lives at the
  same .versions/v_<id> path shape as noncurrent versions; the latest
  pointer is on the parent .versions/ directory's
  Extended[ExtLatestVersionIdKey], which the router doesn't see.
- Even with a correct vid match, IsLatest=false plus the storage path
  as ObjectKey would have the dispatcher recompose <storagepath>.versions/v_<id>
  and no-op (or worse, target the wrong file).

Until we route from .versions/ directory pointer-transition events
(or supply IsLatest/SuccessorModTime/index from sibling listing),
skip every event under a *.versions/ folder. Bare-key events (null
versions) still route normally; bootstrap walking covers the
versioned-storage cases.

Tests assert the skip across tracked, literal-collision, and
bucket-root .versions paths.

* feat(s3api): refuse noncurrent-kind delete on the current latest version

Defense-in-depth for the noncurrent kinds: even when bootstrap (or a
future event-driven path) thinks a version is noncurrent, the server
must verify against the .versions/ directory's
Extended[ExtLatestVersionIdKey] before deleting. If the target version
matches the latest pointer the action is silently dropped as
NOOP_RESOLVED:VERSION_IS_LATEST instead of deleting the live data.

* refactor(s3/lifecycle): tidy versioning gates per review

- router: skip directory entries (other than MPU init) in
  buildObjectInfo so .versions/ folder events never become
  ObjectInfo. Subtest "versions dir itself" added.
- s3api: switch isCurrentLatestVersion's path split from
  filepath.Split (OS-dependent) to path.Split so filer paths
  always use '/'.
2026-05-08 14:15:32 -07:00
Lars Lehtonen
935fb42e1d chore(weed/util/chunk_cache): remove unused functions (#9372)
* chore(weed/util/chunk_cache): remove unused functions

* fix(chunk_cache): bound ReadAt buffer in readNeedleSliceAt

When the caller-provided buffer is larger than the remaining needle
bytes, ReadAt would spill into the next needle and trigger the
n != wanted error. Slice to data[:wanted] so the read stops at the
needle boundary.
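
A self-contained sketch of the bounding idea. readBounded and memFile
are hypothetical names; the real readNeedleSliceAt signature is not
reproduced here:

```go
package main

import (
	"errors"
	"io"
)

// memFile is a tiny in-memory io.ReaderAt used only for this sketch.
type memFile []byte

func (m memFile) ReadAt(p []byte, off int64) (int, error) {
	if off < 0 || off >= int64(len(m)) {
		return 0, io.EOF
	}
	n := copy(p, m[off:])
	if n < len(p) {
		return n, io.EOF
	}
	return n, nil
}

// readBounded reads at most the bytes remaining in the needle, even when
// the caller's buffer is larger, so the read stops at the needle boundary.
func readBounded(r io.ReaderAt, data []byte, offset, needleEnd int64) (int, error) {
	wanted := len(data)
	if remaining := needleEnd - offset; int64(wanted) > remaining {
		wanted = int(remaining)
	}
	if wanted <= 0 {
		return 0, nil
	}
	n, err := r.ReadAt(data[:wanted], offset)
	if err != nil && !errors.Is(err, io.EOF) {
		return n, err
	}
	if n != wanted {
		return n, errors.New("short read: needle truncated")
	}
	return n, nil
}
```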

---------

Co-authored-by: Chris Lu <chris.lu@gmail.com>
2026-05-08 13:12:11 -07:00
Chris Lu
fd463155e4 fix(ec): planner treats each (server, disk_id) as a distinct target (#9369) (#9371)
* fix(ec): planner treats each (server, disk_id) as a distinct target (#9369)

master_pb.DataNodeInfo.DiskInfos is keyed by disk type, so a volume
server with multiple physical disks of the same type collapses into a
single DiskInfo. Per-disk attribution survives only inside the
VolumeInfos[].DiskId / EcShardInfos[].DiskId records, and the active
topology never put it back together. The EC planner saw N candidates
instead of N×disks, returned a short plan, and createECTargets
round-robined extra shards onto the same (server, disk_id) — colliding
with the #9185 disk_id-aware ReceiveFile.

Reconstruct per-physical-disk view in UpdateTopology by splitting each
DiskInfo into one entry per observed disk_id, and index volumes / EC
shards by their own DiskId so lookups stay aligned. Refuse to plan an
EC task when fewer than totalShards distinct disks are available rather
than packing shards onto the same disk.

Threads dataShards/parityShards through planECDestinations,
createECTargets and createECTaskParams so the helpers don't depend on
the OSS 10+4 constants — keeps enterprise merges clean.

* trim verbose comments

* align EC param signatures with enterprise

- dataShards/parityShards: uint32 → int (matches enterprise's ratio API)
- drop unused multiPlan from createECTaskParams
- minTotalDisks: total/parity+1 → ceil(total/parity), correct for non-default ratios

Reduces merge surface when this PR lands in seaweed-enterprise.
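
For the minTotalDisks change, the integer arithmetic can be compared
directly. This is a sketch of the two formulas only; the function names
and the zero return for an invalid ratio are assumptions:

```go
package main

// oldMinTotalDisks is the previous formula: total/parity + 1.
func oldMinTotalDisks(totalShards, parityShards int) int {
	if parityShards <= 0 {
		return 0 // invalid ratio; rejection is left to the caller in this sketch
	}
	return totalShards/parityShards + 1
}

// minTotalDisks is ceil(total/parity) in integer arithmetic.
func minTotalDisks(totalShards, parityShards int) int {
	if parityShards <= 0 {
		return 0 // invalid ratio; rejection is left to the caller in this sketch
	}
	return (totalShards + parityShards - 1) / parityShards
}
```

For the default 10+4 layout both formulas give 4, but for a ratio where
total divides evenly by parity (for example 8+4) the old formula asks
for 4 disks where the ceiling is 3.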
2026-05-08 12:59:02 -07:00
Chris Lu
194dce27bf fix(mount): preserve user-set mtime through async/periodic flush (#9363) (#9370)
* fix(mount): preserve user-set mtime through async/periodic flush (#9363)

flushMetadataToFiler and flushFileMetadata both stamped time.Now() onto
the entry before sending it to the filer, clobbering any mtime SetAttr
had stored from utimes()/touch -m -d. The reproducer hit this ~1s after
touch because the writebackCache deferred close from the prior write
ran flushMetadataToFiler after the user's utimes call.

Flush has no business inventing timestamps. Move the write-time stamp
into Write (where it always belonged for POSIX correctness) and let
flush persist whatever Write or SetAttr already put on the entry.

* test(mount): tighten mtime regression test, drop tautological one

- userMtime now has non-zero nanoseconds, so the *Ns assertions catch a
  regression that would zero the field.
- Add CtimeNs assertion (was missing).
- Drop TestWriteStampsEntryMtime: it duplicated the implementation it
  was supposed to test, so a regression in Write would not have failed
  it. Driving the real Write path needs a full PageWriter, which is out
  of scope for this fix; TestFlushFileMetadataPreservesUserMtime is the
  meaningful regression for #9363.
2026-05-08 12:37:23 -07:00
Chris Lu
89aab30821 feat(s3/lifecycle): wire AbortIncompleteMultipartUpload (Phase 5a) (#9368)
* feat(s3/lifecycle/router): emit ABORT_MPU events for .uploads/<id> init dirs

Detect a meta-log event at exactly .uploads/<upload_id> (a directory)
and build the ObjectInfo from its destination key (entry.Extended[key])
so a rule with Filter.Prefix=foo/ matches an MPU uploading to foo/bar.
Sub-events under .uploads/<id>/<part> ride a different mtime and would
over-fire the ABORT_MPU schedule, so they're rejected explicitly.

m.ObjectKey stays as ev.Key (.uploads/<upload_id>) — the dispatcher
needs the upload directory path, not the destination key, to actually
remove the in-flight upload.

* feat(s3api): wire LifecycleDelete ABORT_MPU to remove the upload dir

Replaces the retryLater stub. Validates the .uploads/<upload_id> shape
of req.ObjectPath (so a malformed event can't escalate to a wider rm),
then deletes the upload directory under <bucket>/.uploads/<id>. Maps
NotFound to NOOP_RESOLVED, transport errors to RETRY_LATER, success to
DONE.

* refactor(s3api): drop redundant exists check before lifecycle ABORT_MPU rm

s3a.rm already does a NotFound-returning lookup, so the pre-check just
adds a round-trip. Map filer_pb.ErrNotFound to NOOP_RESOLVED on rm,
keep transport errors as RETRY_LATER.

* refactor(s3/lifecycle/router): use s3_constants for MPU paths + Extended key

Drop the hardcoded ".uploads/" and "key" string literals; the symbols
already exist as s3_constants.MultipartUploadsFolder and
ExtMultipartObjectKey, and the server side reaches them through the
same constants. Keeping the test helpers tied to those names also makes
the negative-result tests meaningful — they'd otherwise still pass if
the lookup constant drifted.

* fix(s3api): close lifecycle ABORT_MPU traversal + NOT_FOUND gaps

Two issues with the recent ABORT_MPU plumbing:

- "." and ".." passed the no-slash check but resolve to the bucket root
  via util.JoinPath, so .uploads/.. could rm the wrong directory.
- filer.DeleteEntry suppresses ErrNotFound and returns success, so the
  rm path can't distinguish missing from deleted; the previous version
  reported DONE for an already-aborted upload instead of NOOP_RESOLVED.

Reject the two reserved names explicitly and restore the existence
pre-check so the outcome map stays correct. Add a table-test covering
the rejected paths.
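
The shape validation could be sketched as follows. uploadIDFromPath and
the uploadsFolder constant are illustrative; the real code uses
s3_constants.MultipartUploadsFolder and its own helper:

```go
package main

import "strings"

// uploadsFolder stands in for s3_constants.MultipartUploadsFolder.
const uploadsFolder = ".uploads"

// uploadIDFromPath accepts only ".uploads/<upload_id>" where the id is a
// single path segment and not one of the reserved names "." or "..".
func uploadIDFromPath(objectPath string) (string, bool) {
	id, ok := strings.CutPrefix(objectPath, uploadsFolder+"/")
	if !ok || id == "" || id == "." || id == ".." || strings.Contains(id, "/") {
		return "", false
	}
	return id, true
}
```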

* fix(s3/lifecycle/bootstrap): walk MPU init dirs by destination key

A real MPU init record is a directory under .uploads/<id> created by
mkdir; the bootstrap walker was skipping every directory entry, so an
MPU that existed before the meta-log subscription was never aborted.
Even with the skip relaxed, MatchPath used the .uploads/<id> path, so
a rule with Filter.Prefix=logs/ would never fire on an MPU uploading
to logs/foo.txt.

Add Entry.DestKey, let IsMPUInit directories through, and use DestKey
for both MatchPath and ObjectInfo.Key. A bare init directory with no
DestKey means metadata hasn't landed yet — skip rather than guess.

* fix(s3/lifecycle): gate (kind, info) shape so MPU init only fires ABORT_MPU

An MPU init record carries IsMPUInit=true and IsLatest=false. Without
gating, the router and bootstrap walker matched it against every active
ActionKey for the bucket, so NONCURRENT_DAYS / NEWER_NONCURRENT fired
(IsLatest=false reads as a noncurrent version). The dispatcher would
then BLOCK on empty version_id and freeze the cursor.

Add a shape gate at both call sites:
  - IsMPUInit + non-ABORT_MPU kind → continue
  - regular object + ABORT_MPU kind → continue

Plus a defense-in-depth check at the top of EvaluateAction so future
callers can't reintroduce the bug. Tests cover all three layers.
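
The gate reduces to one boolean equality. A sketch with placeholder
types (actionKind, objectInfo, and shapeAllows are not the real
identifiers):

```go
package main

type actionKind int

const (
	expirationDays actionKind = iota
	noncurrentDays
	abortMPU
)

// objectInfo carries only the field the gate needs in this sketch.
type objectInfo struct{ IsMPUInit bool }

// shapeAllows is true only when an MPU init record meets ABORT_MPU, or a
// regular object meets any other kind.
func shapeAllows(kind actionKind, info objectInfo) bool {
	return info.IsMPUInit == (kind == abortMPU)
}
```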

* test(s3/lifecycle): tighten dual-action coverage at the call sites

- Walk multi-action: replace the kinds-as-set check with an exact-shape
  DeepEqual on (path, kind) tuples. The set check would have missed an
  MPU init wrongly firing NONCURRENT_DAYS — exactly the regression the
  (kind, info) gate fixes.
- Router: add a converse case for the dual ExpirationDays +
  AbortIncompleteMultipartUpload rule. A regular current-version object
  must fire only EXPIRATION_DAYS; without the gate the dispatcher would
  also receive ABORT_MPU and rm the object via the MPU code path.
2026-05-08 12:12:42 -07:00
Chris Lu
8b87ceb0d1 refactor(s3api): strip back-stamp from PutBucketLifecycleConfiguration (Phase 4) (#9367)
* refactor(s3api): strip back-stamp from PutBucketLifecycleConfiguration

The handler used to walk every existing entry under the rule's prefix
and stamp entry.Attributes.TtlSec + the SeaweedFSExpiresS3 flag so that
the filer's compaction filter would expire them. With the event-driven
lifecycle worker live, that retroactive walk is redundant — the worker
drives expiration off the meta-log and a one-time bootstrap scan, so a
PUT lifecycle stays O(rules) instead of O(objects).

New writes still inherit TTL from the filer.conf location entry above;
that volume-routing path is unchanged here and will move to an explicit
operator command later (Phase 11).

Drops updateEntriesTTL + processDirectoryTTL + processTTLBatch +
updateEntryTTL from filer_util.go.

* fix(s3api): clear stale lifecycle TTL entries on PUT

PutBucketLifecycleConfiguration only ever appended/updated filer.conf
entries. It never cleared entries for rules the operator had removed,
moved to a different prefix, disabled, narrowed with a tag filter, or
taken off the fast path by enabling bucket versioning. The stale day-TTL
kept routing new writes (and would expire old ones if any landed under
the prefix) after the policy was updated.

Treat PUT as a full replacement: walk this bucket's existing day-TTL
entries, clear them, then add fresh entries from the new rule set.

* test(command): bump mini default plugin job-type count to 7

The s3_lifecycle plugin handler registered in #9362 is the seventh
default; the test still asserted six.

* fix(s3api): delete stale lifecycle PathConf instead of blanking Ttl

Just clearing pathConf.Ttl leaves the rule's Collection, Replication,
and VolumeGrowthCount in place, so new writes still match the stale
prefix and inherit outdated routing/placement. Use
fc.DeleteLocationConf so the lifecycle-owned PathConf goes away
entirely. Same fix in DeleteBucketLifecycleHandler, which had the
same bug.
2026-05-08 11:03:03 -07:00
Chris Lu
5d43f84df7 refactor(plugin): rename detection_interval_seconds → detection_interval_minutes (#9366)
Minutes is the natural granularity for detection cadence — every
production handler already set the seconds field to a 60-multiple
(17*60, 30*60, 3600, 24*60*60). Switching to minutes drops the *60
arithmetic and matches the unit conventions used elsewhere in the
plugin worker forms.

- Proto: AdminRuntimeDefaults + AdminRuntimeConfig.detection_interval_*
  field renamed.
- Helpers: durationFromMinutes / minutesFromDuration alongside the
  existing seconds variants in plugin_scheduler.go.
- Handlers: vacuum, ec_balance, balance, erasure_coding, iceberg,
  admin_script, s3_lifecycle now declare DetectionIntervalMinutes.
- Admin: scheduler_status + types + UI templ + plugin_api.go pass
  through the new field; UI label and table cells switch to "min".
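
The minute helpers are presumably thin conversions. A sketch under the
assumptions that the field is an int32, that non-positive values map to
zero, and that partial minutes truncate:

```go
package main

import "time"

// durationFromMinutes converts a config value in minutes to a Duration.
func durationFromMinutes(minutes int32) time.Duration {
	if minutes <= 0 {
		return 0
	}
	return time.Duration(minutes) * time.Minute
}

// minutesFromDuration converts back, truncating any partial minute.
func minutesFromDuration(d time.Duration) int32 {
	if d <= 0 {
		return 0
	}
	return int32(d / time.Minute)
}
```

A handler that used to declare 17*60 seconds would declare 17 under the
renamed field.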
2026-05-08 10:33:02 -07:00
Chris Lu
7f254e158e feat(worker/s3_lifecycle): plugin handler with admin UI config (#9362)
* feat(s3/lifecycle): scheduler — N pipelines over an even shard split

Scheduler.Run spawns one Pipeline goroutine per configured worker
(Workers) plus one engine-refresh ticker. Each worker owns a contiguous
AssignShards(idx, total) slice of
[0, ShardCount) and runs Pipeline.Run with EventBudget bounding each
iteration; brief RetryBackoff between iterations avoids hot-loop on
errors. The refresh ticker rebuilds the engine snapshot from the filer's
bucket configs every RefreshInterval.
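
One way to produce an even, contiguous split is sketched below. The
extra shardCount parameter and the "first workers take the remainder"
policy are assumptions; the real AssignShards(idx, total) may distribute
the remainder differently:

```go
package main

// assignShards returns the contiguous shard ids owned by worker idx when
// shardCount shards are divided among total workers. The first
// shardCount%total workers each take one extra shard.
func assignShards(idx, total, shardCount int) []int {
	if total <= 0 || idx < 0 || idx >= total {
		return nil
	}
	base, extra := shardCount/total, shardCount%total
	start := idx * base
	n := base
	if idx < extra {
		start += idx
		n++
	} else {
		start += extra
	}
	out := make([]int, 0, n)
	for s := start; s < start+n; s++ {
		out = append(out, s)
	}
	return out
}
```

With 16 shards and 3 workers this yields slices of 6, 5, and 5 shards
that together cover every shard exactly once.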

LoadCompileInputs / IsBucketVersioned / AllActivePriorStates are
exported from a configload.go sibling so the shell command can move to
this shared implementation in a follow-up.

* refactor(shell): reuse scheduler.LoadCompileInputs in run-shard

Drop the local copies of loadLifecycleCompileInputs / isBucketVersioned
/ allActivePriorStates / lifecycleParseError that the new
scheduler package now exports. Same behavior, one source of truth.

* feat(worker/s3_lifecycle): plugin handler with admin UI config

Registers a JobHandler for s3_lifecycle via pluginworker.RegisterHandler.
Admin pulls the descriptor over the worker plugin gRPC and renders the
AdminConfigForm + WorkerConfigForm in the existing UI:

  Admin form (cluster shape):
    - workers (1..16, default 1)
    - s3_grpc_endpoints (comma list)

  Worker form (operational tuning):
    - dispatch_tick_ms (default 5000)
    - checkpoint_tick_ms (default 30000)
    - refresh_interval_ms (default 300000)
    - event_budget (default 0 = unbounded)

Detect emits a single proposal whenever S3 endpoints + filer addresses
are configured. MaxExecutionConcurrency=1 so admin only ever runs one
lifecycle daemon per worker; a fresh proposal next cycle restarts it
if the prior Execute exits.

Execute dials the configured S3 endpoint + filer, builds a
scheduler.Scheduler with the parsed config, and runs it until
ctx cancellation. Reuses the existing scheduler / dispatcher /
reader / engine packages — the handler is the thin glue that
parses descriptor values and wires the long-running daemon.

* proto(plugin): add s3_grpc_addresses to ClusterContext

So workers can dial s3 servers discovered by the master rather than a
hand-typed list in the admin form.

* feat(admin): populate ClusterContext.s3_grpc_addresses from master

ListClusterNodes(S3Type) returns the live S3 servers; the plugin
scheduler now hands these to job handlers alongside filer/volume
addresses.

* feat(worker/s3_lifecycle): discover s3 endpoints from cluster context

Drop the s3_grpc_endpoints admin form field and read the master-supplied
ClusterContext.S3GrpcAddresses instead. Operators no longer maintain a
hand-typed list, and a stale entry self-heals when the master's view
updates.

* feat(worker/s3_lifecycle): time-based runtime cap, friendlier cadence units

- dispatch_tick_minutes (was *_ms): minutes is the natural granularity
  for a daily batch; default 1 minute.
- checkpoint_tick_seconds: seconds for the durable cursor write; default
  30 seconds.
- refresh_interval_minutes: minutes for the engine snapshot rebuild.
- max_runtime_minutes replaces event_budget. Each daily run is bounded
  by wall clock; a typical run finishes in well under an hour because the
  cursor persists and the meta-log streams fast. Default 60 minutes.
- AdminRuntimeDefaults.DetectionIntervalSeconds = 86400 so the admin
  schedules one job per day.
2026-05-08 10:30:02 -07:00
Chris Lu
85abf3ca88 feat(shell): s3.lifecycle.run-shard + integration test (#9361)
* feat(shell): s3.lifecycle.run-shard for manual Phase 3 dispatch

Subscribes to the filer meta-log filtered to one (bucket, key-prefix-hash)
shard, routes events through the compiled lifecycle engine, and dispatches
due actions to the S3 server's LifecycleDelete RPC. Persists the per-shard
cursor to /etc/s3/lifecycle/cursors/shard-NN.json so subsequent runs resume.

Operator-runnable harness for end-to-end Phase 3 validation while the
plugin-worker auto-scheduler is still pending. EventBudget bounds a single
invocation; flags expose dispatch + checkpoint cadence.

Discovers buckets by walking the configured DirBuckets path and reading
each bucket entry's Extended[s3-bucket-lifecycle-configuration-xml]
through lifecycle_xml.ParseCanonical. All compiled actions are seeded
BootstrapComplete=true so the run dispatches whatever fires immediately;
production bootstrap walks set this incrementally per bucket.

* test(s3/lifecycle): integration test driving the run-shard shell command

Spins up 'weed mini', creates a bucket with a 1-day expiration on a prefix,
PUTs the target object, then rewrites the entry's Mtime via filer
UpdateEntry to 30 days ago. Runs 's3.lifecycle.run-shard' for every
shard via 'weed shell' subprocess and asserts the backdated object is
deleted within 30s, and the in-prefix-but-recent object remains.

The S3 API rejects Expiration.Days < 1, so 'wait a day' is unworkable.
Backdating via the filer's gRPC sidesteps that constraint while still
exercising the real Reader -> Router -> Schedule -> Dispatcher ->
LifecycleDelete RPC path end-to-end.

Wires a new s3-lifecycle-tests job into s3-go-tests.yml. The test runs
all 16 shards because ShardID(bucket, key) is hash-based and the test
shouldn't couple to that detail; running every shard keeps the test
independent of the hash function.

* fix(shell/s3.lifecycle.run-shard): address review findings

- Reject negative -events explicitly. Help text already defines 0 as
  unbounded; negative budgets created ambiguous behavior in pipeline.Run.
- Bound the gRPC dial with a 30s timeout instead of context.Background()
  so an unreachable S3 endpoint doesn't hang the shell.
- Paginate the bucket listing in loadLifecycleCompileInputs. SeaweedList
  takes a single-RPC limit; the prior 4096 silently dropped buckets
  past that page on large clusters. Loop with startFrom until a page
  comes back short.
- Surface parse errors instead of swallowing them. Buckets with
  malformed lifecycle XML now print the first three errors verbatim
  and a count for the rest, so an operator running this command for
  diagnostics can find what's wrong.

* feat(shell/s3.lifecycle.run-shard): -shards range/set with one subscription

Adds -shards "lo-hi" or "a,b,c" to the manual run command and threads
the same model through Reader and Pipeline.

- reader.Reader gains ShardPredicate (func(int) bool) and StartTsNs;
  ShardID stays for the single-shard short form. Event carries the
  computed ShardID so consumers can route per-shard without rehashing.
- dispatcher.Pipeline gains Shards []int. When set, Run holds one
  Cursor + Schedule + Dispatcher per shard, opens one filer
  SubscribeMetadata stream with a predicate covering the whole set,
  and routes events into the matching shard's schedule from a single
  dispatch goroutine — no per-shard goroutine fan-out.
- shell command parses -shard or -shards (mutually exclusive),
  formats progress messages with a contiguous-range label when
  applicable, and validates against ShardCount.
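
A rough sketch of parsing the "lo-hi" and "a,b,c" forms described above. parseShards and parseOne are hypothetical names (the log only mentions parseShardsSpec without showing it), and shardCount = 16 is taken from the 16 shards mentioned elsewhere in this log.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// shardCount is assumed to be 16, matching the shard count in the log.
const shardCount = 16

// parseOne parses a single shard ID and validates it against shardCount.
func parseOne(s string) (int, error) {
	n, err := strconv.Atoi(strings.TrimSpace(s))
	if err != nil {
		return 0, fmt.Errorf("invalid shard %q: %w", s, err)
	}
	if n < 0 || n >= shardCount {
		return 0, fmt.Errorf("shard %d out of range [0, %d)", n, shardCount)
	}
	return n, nil
}

// parseShards accepts "lo-hi" or "a,b,c" and returns sorted unique IDs.
func parseShards(spec string) ([]int, error) {
	spec = strings.TrimSpace(spec)
	if spec == "" {
		return nil, fmt.Errorf("empty shard spec")
	}
	var ids []int
	if lo, hi, ok := strings.Cut(spec, "-"); ok && !strings.Contains(spec, ",") {
		l, err := parseOne(lo)
		if err != nil {
			return nil, err
		}
		h, err := parseOne(hi)
		if err != nil {
			return nil, err
		}
		if l > h {
			return nil, fmt.Errorf("empty range %d-%d", l, h)
		}
		for i := l; i <= h; i++ {
			ids = append(ids, i)
		}
		return ids, nil
	}
	seen := map[int]bool{}
	for _, part := range strings.Split(spec, ",") {
		n, err := parseOne(part)
		if err != nil {
			return nil, err
		}
		if !seen[n] {
			seen[n] = true
			ids = append(ids, n)
		}
	}
	sort.Ints(ids)
	return ids, nil
}

func main() {
	fmt.Println(parseShards("0-3"))   // [0 1 2 3] <nil>
	fmt.Println(parseShards("3,1,3")) // [1 3] <nil>
}
```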

Integration test now uses -shards 0-15 (one subprocess invocation)
instead of a 16-iteration loop.

* fix(s3/lifecycle): allow Reader with StartTsNs=0 + Cursor=nil

The reader rejected the legitimate 'fresh subscription from epoch'
state when called from a fresh Pipeline.Run on a multi-shard worker
(no cursor file yet, all shards' MinTsNs=0). The downstream
SubscribeMetadata call handles SinceNs=0 fine; the up-front check
was over-defensive and broke the auto-scheduler completely (CI
showed 5-second-cadence retries with this exact error).

* fix(s3/lifecycle): schedule from ModTime not eventTime

A backdated or out-of-band entry update has eventTime ≈ now while
ModTime is far in the past; eventTime+Delay would push the dispatch
into the future even though the rule already fires. ModTime+Delay
is the correct fire moment. The dispatcher's identity-CAS still
catches drift between schedule and dispatch.

* fix(s3/lifecycle): -runtime cap on run-shard so it exits on quiet shards

The CI integration test sets -events 200 expecting the subprocess to
return after 200 in-shard events. But -events counts only events that
pass the shard filter; the test produces ~5 such events (bucket
create, lifecycle PUT, two object PUTs, mtime backdate), so the
reader stays in stream.Recv forever and runShellCommand hangs until the
test deadline.

- weed/shell/command_s3_lifecycle_run_shard.go: add -runtime D flag.
  When > 0, Pipeline.Run runs under context.WithTimeout(D); on
  expiry the reader/dispatcher drain cleanly and the cursor saves.
- weed/s3api/s3lifecycle/dispatcher/pipeline.go: treat
  context.DeadlineExceeded the same as context.Canceled at exit
  (both are graceful shutdown signals).

* test(s3/lifecycle): pass -runtime 10s to run-shard

Pair with the new -runtime flag so the subprocess exits cleanly
after 10s instead of waiting for an event budget that never lands
on quiet shards.

* refactor(s3/lifecycle): extract HashExtended to s3lifecycle pkg

The worker's router needs the same length-prefixed sha256 of the entry's
Extended map; pulling it out of the s3api private file lets both sides
import it.
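
For readers unfamiliar with length-prefixed hashing, a sketch under stated assumptions follows. The real HashExtended encoding is not shown in this log; the sorted key order, the 8-byte big-endian length prefix, and the handling of an empty map here are all assumptions, and hashExtended is a hypothetical name.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"sort"
)

// hashExtended hashes a map deterministically: keys are visited in
// sorted order and every key and value is prefixed with its length, so
// {"ab": "c"} and {"a": "bc"} cannot collide by concatenation.
func hashExtended(ext map[string][]byte) []byte {
	keys := make([]string, 0, len(ext))
	for k := range ext {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := sha256.New()
	var lenBuf [8]byte
	writeField := func(b []byte) {
		binary.BigEndian.PutUint64(lenBuf[:], uint64(len(b)))
		h.Write(lenBuf[:])
		h.Write(b)
	}
	for _, k := range keys {
		writeField([]byte(k))
		writeField(ext[k])
	}
	return h.Sum(nil)
}

func main() {
	a := hashExtended(map[string][]byte{"ab": []byte("c")})
	b := hashExtended(map[string][]byte{"a": []byte("bc")})
	fmt.Println(len(a), string(a) == string(b)) // 32 false
}
```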

* fix(s3/lifecycle): worker captures ExtendedHash for identity-CAS

Without this, the dispatcher sends ExpectedIdentity.ExtendedHash = nil
while the live entry on the server has a non-nil hash, so every dispatch
returns NOOP_RESOLVED:STALE_IDENTITY and nothing is ever deleted.

* fix(s3/lifecycle): identity HeadFid via GetFileIdString

Meta-log events go through BeforeEntrySerialization, which clears
FileChunk.FileId and writes the Fid struct instead. Reading .FileId
directly returns "" on the worker side while the server's freshly
fetched entry still has a populated string, so the identity-CAS would
mismatch and every expiration ended in NOOP_RESOLVED:STALE_IDENTITY.

* fix(s3/lifecycle): treat gRPC Canceled/DeadlineExceeded as graceful exit

errors.Is doesn't unwrap a gRPC status error back to the stdlib ctx
errors, so a subscription that ends because runCtx was canceled was
being logged as a fatal reader error. Check status.Code as well so the
shell's -runtime cap exits cleanly.

* fix(test/s3/lifecycle): pass the gRPC port (not HTTP) to run-shard

run-shard's -s3 flag dials the LifecycleDelete gRPC service, which
listens on s3.port + 10000. The integration test was passing the HTTP
port instead, so the dispatcher's RPC just timed out and the shell
command exited under -runtime with no work done.

* chore(test/s3/lifecycle): drop emoji from Makefile output

* docs(test/s3/lifecycle): correct '-shards 0-15' wording

* fix(s3/lifecycle): reject out-of-range shard IDs in Pipeline.Run

The shell's parseShardsSpec already validates, but a programmatic caller
(scheduler, future worker config) shouldn't be able to silently produce
no-op states by passing -1 or 99.

* fix(s3/lifecycle): bound drain + final-save with their own timeouts

Shutdown was using context.Background, so a stuck dispatcher RPC or
filer save could keep Pipeline.Run from ever returning.

* fix(test/s3/lifecycle): drop self-killing pkill in stop-server

The pkill pattern "weed mini -dir=..." is also in the running shell's
argv (it's the recipe body), so pkill -f matches its own bash and the
recipe exits with Terminated. CI test job passed but the cleanup step
failed with exit 2. The PID file is sufficient on its own.

* docs(test/s3/lifecycle): document S3_GRPC_ENDPOINT env var
2026-05-08 09:59:10 -07:00
dependabot[bot]
c918660901 build(deps): bump io.netty:netty-transport-native-epoll from 4.1.132.Final to 4.2.13.Final in /test/java/spark (#9365) 2026-05-08 06:00:51 -07:00
dependabot[bot]
9cb103cd35 build(deps): bump github.com/apache/thrift from 0.22.0 to 0.23.0 (#9364) 2026-05-08 05:59:26 -07:00
Chris Lu
b7928637a0 refactor(s3api): move Lifecycle XML structs to leaf package lifecycle_xml (#9360)
* refactor(s3api): move Lifecycle XML structs to leaf package lifecycle_xml

The structs S3 PutBucketLifecycleConfiguration parses and the canonical
conversion to s3lifecycle.Rule lived in package s3api, which transitively
imports weed/server (which imports weed/shell). Any caller outside
weed/s3api — the shell, the future lifecycle worker — that wanted to
parse a bucket's lifecycle XML hit an import cycle.

Moves:
  weed/s3api/s3api_policy.go              -> lifecycle_xml/types.go
  weed/s3api/s3api_lifecycle_canonical.go -> lifecycle_xml/canonical.go
  s3api_lifecycle_canonical_test.go       -> lifecycle_xml/canonical_test.go
  s3api_policy_test.go                    -> lifecycle_xml/round_trip_test.go

Exports the RuleStatus type (was unexported ruleStatus) and adds
small accessor methods (Set/Val/AndSet/TagSet) for fields the s3api
handler needs to read across the package boundary. Adds NewPrefix and
NewExpirationDays constructors so the GET handler can build response
rules without poking at unexported fields. Adds a Tag struct local to
the package so it has zero internal seaweed deps. Adds a one-shot
ParseCanonical(xml []byte) helper for non-server callers.

s3api_policy.go was misnamed — its content is lifecycle XML, not S3
bucket policy. The new package name reflects the actual scope.

* test(s3api/lifecycle_xml): exercise public API in tests

- canonical_test.go's parseLifecycle helper went through xml.Unmarshal
  directly; route it through the package's exported Parse so tests
  validate the public entrypoint.
- round_trip_test.go asserted internal flags (rule.Filter.tagSet,
  rule.Filter.andSet, Transition.set, NoncurrentVersionTransition.set);
  switch to TagSet(), AndSet(), Set() — exercises the public contract
  that downstream callers (s3api handler, future shell command) rely on.
2026-05-07 18:54:06 -07:00
Chris Lu
c567da7164 feat(s3): register SeaweedS3LifecycleInternal gRPC service (#9359)
Phase 2 added the LifecycleDelete handler on S3ApiServer but never
registered it on a running gRPC server, so workers had no endpoint to
dial. Embed UnimplementedSeaweedS3LifecycleInternalServer on
S3ApiServer and register it on the s3 command's grpc server alongside
SeaweedS3IamCacheServer.
2026-05-07 18:19:42 -07:00
Chris Lu
35e3fe89bc feat(s3/lifecycle): filer-backed cursor Persister + drop BlockerStore (#9358)
* feat(s3/lifecycle): filer-backed cursor Persister

FilerPersister persists per-shard cursor maps as JSON to
/etc/s3/lifecycle/cursors/shard-NN.json via filer.SaveInsideFiler.
One file per shard keeps Save atomic — the filer writes the entry
in a single mutation, so a crash mid-write doesn't leak partial
state. Pipeline.Run loads on start; the periodic checkpoint and
graceful-shutdown save go through this implementation.

A small FilerStore interface wraps the SeaweedFilerClient surface
the persister needs, so tests inject an in-memory fake instead of
mocking the full gRPC client.

* refactor(s3/lifecycle): drop BlockerStore — durable cursor IS the block

A frozen cursor doesn't advance, so the durable cursor (FilerPersister)
encodes the blocked state on its own. On worker restart the reader
re-encounters the poison event at MinTsNs, the dispatcher walks the
same retry budget to BLOCKED, and the cursor freezes at the same
EventTs. Other in-flight events between freeze tsNs and prior cursor
positions self-resolve via NOOP_RESOLVED (STALE_IDENTITY) since the
underlying objects were already deleted on the prior pass.

Removed:
  - BlockerStore interface + InMemoryBlockerStore + BlockerRecord
  - Dispatcher.Blockers + Dispatcher.ReplayBlockers
  - the BlockerStore.Put call in handleBlocked
  - Pipeline.Blockers field + the ReplayBlockers call on startup

Added TestDispatchRestartReFreezesNaturally, which pins the
self-recovery property: a fresh Dispatcher with a fresh Cursor, fed
the same poison event, reaches the same frozen state at the same
EventTs without any durable blocker store.

Operator visibility: a cursor whose MinTsNs hasn't advanced is the
signal — surfaced via the durable cursor file.

* refactor(filer): SaveInsideFiler accepts ctx

ReadInsideFiler already takes ctx; SaveInsideFiler used context.Background()
internally and silently dropped the caller's ctx. Symmetric API now;
cancellation/deadlines propagate through LookupEntry / CreateEntry /
UpdateEntry. Mechanical update of all callers — most pass
context.Background() since the existing call sites have no ctx in scope.

* fix(s3/lifecycle): deterministic order in cursor save

Iterating Go maps yields random order, so json.Encode produced a different
byte sequence on each save even when the state hadn't changed. Sort
entries by (Bucket, ActionKind, RuleHash) before encoding so the on-disk
file diffs cleanly. New test pins byte-identical output across two saves
of the same map.

* fix(s3/lifecycle): log reason when freezing cursor in handleBlocked

handleBlocked dropped the reason via _ = reason with a comment claiming
the caller logged it; none of the three callers do. A frozen cursor is
the only surface where the operator finds out something stuck, so the
reason has to land somewhere. glog.Warningf with shard, key, eventTs,
and the original reason — same shape the rest of the package uses.
2026-05-07 17:45:04 -07:00
Chris Lu
ec83a87d68 perf(s3/lifecycle): defer pool Put on ShardID hasher
defer guarantees the hasher returns to the pool even if h.Write or
h.Sum panics, preventing a pool leak under unexpected failure modes.
2026-05-07 17:08:50 -07:00
Chris Lu
3a192c6c57 fix(s3/lifecycle): address Phase 3 post-merge review (#9354 #9355 #9356) (#9357)
* fix(s3/lifecycle): reader handles bare /buckets parent and pre-normalizes prefix

extractBucketKey accepted /buckets/ but rejected /buckets (no trailing
slash); some delete events emit the bare form, so bucket-root events
were silently dropped. Pre-normalize BucketsPath once on Run instead
of recomputing per event.
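
A sketch of the normalization and the bare-parent case, assuming events carry a parent directory and an entry name. normalizeBucketsPath is a hypothetical helper, and this extractBucketKey signature is an assumption; the real function's parameters are not shown in this log.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeBucketsPath returns the path with exactly one trailing slash.
// It is meant to be computed once, not per event.
func normalizeBucketsPath(p string) string {
	return strings.TrimRight(p, "/") + "/"
}

// extractBucketKey splits an event's (dir, name) into bucket and key.
// Joining dir and name first means a bare "/buckets" parent and a
// "/buckets/" parent are handled the same way.
func extractBucketKey(bucketsPathSlash, dir, name string) (bucket, key string, ok bool) {
	full := strings.TrimRight(dir, "/") + "/" + name
	if !strings.HasPrefix(full, bucketsPathSlash) {
		return "", "", false
	}
	rest := full[len(bucketsPathSlash):]
	bucket, key, _ = strings.Cut(rest, "/")
	return bucket, key, bucket != ""
}

func main() {
	prefix := normalizeBucketsPath("/buckets")
	fmt.Println(extractBucketKey(prefix, "/buckets", "b1"))            // b1  true
	fmt.Println(extractBucketKey(prefix, "/buckets/b1/logs", "a.txt")) // b1 logs/a.txt true
}
```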

* perf(s3/lifecycle): pool sha256 hashers in ShardID

ShardID runs on every meta-log event before the shard filter; a fresh
sha256.New per call produces measurable allocator pressure under load.
sync.Pool reuses hashers across calls.

* fix(s3/lifecycle): router skips hard deletes and missing-attribute events

A hard delete carries no schedule-relevant state — Expiration would hit
NOOP_RESOLVED at dispatch and ExpiredObjectDeleteMarker fires from a
Create on the latest version. Skip rather than burn a schedule slot.

Missing Attributes leaves ModTime at year 0001, which makes
ExpirationDays fire immediately at dispatch. Skip the event instead.

Drop the unused 'versioned' parameter from buildObjectInfo; the
dispatcher's identity-CAS handles version drift in Phase 5.

* fix(s3/lifecycle): EntryIdentity.MtimeNs holds true nanoseconds

Both computeEntryIdentity (server) and buildIdentity (router) wrote
entry.Attributes.Mtime (seconds) into a field named MtimeNs. The CAS
worked because both sides agreed, but the encoding contradicted the
field name and would break if either side later started using true
nanoseconds. Combine Mtime*1e9 + the FuseAttributes.MtimeNs nanosecond
component on both sides; the test was updated to match.
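
The combination described above can be sketched as follows, assuming the seconds field is an int64 and the nanosecond component an int32. mtimeNanos and modTime are hypothetical helper names. The check shows the combined value agrees with time.Unix given the same two inputs.

```go
package main

import (
	"fmt"
	"time"
)

// mtimeNanos combines whole seconds and the sub-second nanosecond
// component into true nanoseconds since the Unix epoch.
func mtimeNanos(mtimeSec int64, mtimeNs int32) int64 {
	return mtimeSec*int64(time.Second) + int64(mtimeNs)
}

// modTime builds a time.Time from the same two fields.
func modTime(mtimeSec int64, mtimeNs int32) time.Time {
	return time.Unix(mtimeSec, int64(mtimeNs))
}

func main() {
	sec, ns := int64(1700000000), int32(123456789)
	fmt.Println(mtimeNanos(sec, ns))                             // 1700000000123456789
	fmt.Println(modTime(sec, ns).UnixNano() == mtimeNanos(sec, ns)) // true
}
```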

* fix(s3/lifecycle): dispatcher distinguishes ctx cancel from transport errors

A canceled or deadline-exceeded RPC is shutdown, not a transport
failure: re-queue the Match at its original DueTime with no retry-budget
burn so a quick restart can't escalate it to BLOCKED.

* fix(s3/lifecycle): reader fallback prefix normalization mirrors Run

The fallback path that builds prefix from r.BucketsPath when
bucketsPathSlash is empty (test-only entry into extractBucketKey) was
appending an unconditional '/', producing '//' if BucketsPath already
ended with one. Use the same normalization Run does.

* fix(s3/lifecycle): ObjectInfo.ModTime carries the nanosecond component

ModTime dropped FuseAttributes.MtimeNs, leaving ExpirationDays off by
the sub-second remainder relative to EntryIdentity.MtimeNs. Pass both to
time.Unix so the precision matches the CAS witness.
2026-05-07 16:54:24 -07:00
Chris Lu
5c991f38f5 feat(s3/lifecycle): dispatcher + per-shard pipeline (Phase 3 PR-D) (#9356)
feat(s3/lifecycle): dispatcher + blocker store + per-shard pipeline

Dispatcher consumes due Matches from the schedule, calls LifecycleDelete,
and routes outcomes:
  DONE / NOOP_RESOLVED / SKIPPED_OBJECT_LOCK -> Cursor.Advance
  RETRY_LATER (within budget)                 -> re-schedule with backoff
  RETRY_LATER (budget exhausted) / BLOCKED    -> BlockerStore.Put + Freeze
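
The routing table above can be read as a small switch. The sketch below uses hypothetical outcome and action types. Reading "within budget" as attempts < budget is an assumption, and the freeze action stands in for the BlockerStore.Put + Freeze step described in this commit.

```go
package main

import "fmt"

// outcome mirrors the result names in the table (hypothetical type).
type outcome int

const (
	done outcome = iota
	noopResolved
	skippedObjectLock
	retryLater
	blocked
)

// action is what the dispatcher does next (hypothetical type).
type action string

const (
	advance    action = "advance cursor"
	reschedule action = "re-schedule with backoff"
	freeze     action = "record blocker and freeze cursor"
)

// route maps an outcome plus the retry state to the next action.
func route(o outcome, attempts, budget int) action {
	switch o {
	case done, noopResolved, skippedObjectLock:
		return advance
	case retryLater:
		if attempts < budget { // assumed meaning of "within budget"
			return reschedule
		}
		return freeze
	default: // blocked, or anything unrecognized
		return freeze
	}
}

func main() {
	fmt.Println(route(done, 0, 3))       // advance cursor
	fmt.Println(route(retryLater, 1, 3)) // re-schedule with backoff
	fmt.Println(route(retryLater, 3, 3)) // record blocker and freeze cursor
}
```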

BlockerStore is a small interface with InMemoryBlockerStore for tests;
the filer-backed impl follows when the worker task registration lands.

Pipeline composes Reader + Router + Dispatcher into a single Run loop
keyed by shard. Cursor is restored on start, blockers are replayed as
freezes, checkpoints write at a configurable cadence, and a final save
fires on shutdown. The meta-log itself is the durable buffer for in-flight
schedule entries — restart re-derives them from the cursor's MinTsNs.
2026-05-07 15:44:09 -07:00