seaweedfs

mirror of https://github.com/seaweedfs/seaweedfs.git synced 2026-07-26 10:03:13 +00:00

Author	SHA1	Message	Date
Chris Lu and GitHub	47b491b53c	mount: version open file handles by filer log position (#10403)
* filer: stamp a log position on lookup and remote-cache responses
Metadata events are logged after their store write and stamped with the filer clock. Reading that clock before serving an entry therefore gives a timestamp with a causal guarantee: every event at or below it is reflected in the returned entry. Clients caching filer state can use it as the entry's version to order the response against subscription events, including events committed before the call but delivered after it.
* mount: version open file handles by filer log position
A subscription event refreshing an open handle did a second lookup; a transient failure left the handle pinned to its old entry with no retry, since the subscription cursor had already advanced. The deeper problem is ordering: the handle is a cache written by three unordered channels (the async invalidation worker, local mutation acks, and open-time lookups), and overwriting cached state safely requires knowing which write is newer. The filer log timestamp is that order, and it now travels with every value instead of being derived out of band. Events carry it natively; lookup and remote-cache responses carry the log position stamped before the serving read; mutation acks carry it in their returned event; and the local store pairs each read with a version cursor advanced under the same lock as the store write. Each handle records the version its entry reflects, and one rule replaces the per-site reasoning: state at or below the handle's version is old news and must not be installed. The invalidation itself applies the event's own entry (no lookup, so no transient-failure window), except under a cached parent, where the store entry is the ordered merge of the event and anything applied since, and its version outranks the event's. An uncached parent receives no store writes, so a hit there would be a stale leftover masking the event.
A vacated path (delete, rename away) keeps the last entry so unlinked-but-open reads still work. Directory builds version the completed directory at the listing snapshot and re-invalidate buffered events at that version, since their mid-build refresh ran against an incomplete store. The tests replay every race this change replaces machinery for: rollback of a newer local flush (queued, cached, and read-through), stale leftovers under uncached parents, the build window including abort, handles opened after an event was queued, events landing mid-lookup, and undelivered events at remote-cache time across a filer failover.
* filer: serialize the log position fence with mutations, stamp mutation acks
The fence stamped before an unlocked entry read could precede state the read returned: a mutation writes storage first and assigns its event timestamp only at notify time, so a lookup racing that window handed the mount an entry newer than its fence, and the event's later delivery looked like fresh news, destroying dirty pages for a change the handle already had. The mutation handlers already hold an exclusive per-path lock across read, write, and notify; the lookup and remote-cache reads now take it shared around the stamp and the read, making the fence exact: everything at or below it is in the entry, nothing above it is. A no-change update returns success without an event, leaving the mount nothing to fence with even though the response confirms current state. Create and update acks now carry a log position stamped under the same lock, and the mount falls back to it whenever the ack has no event. Also regenerate the VT marshalers, which the earlier generation missed: without them a VT round-trip silently zeroed every log position.
* java: sync filer.proto
* mount: scope store versions to what they vouch for; atomic handle install
The store's version cursor claimed too much.
Advanced by local mutation acks and directory listing snapshots, it inflated the version of store reads for unrelated paths whose events the subscription still owed, and those events were then fenced out permanently. The cursor now tracks subscription progress only (events arrive in log order, so everything at or below it has been delivered for every path), and a completed listing records its snapshot as a per-directory floor instead of a global claim. Local acks never touch it: they version their own handle directly. Buffered build events advance the cursor at delivery, since their store write may never happen (abort) while their invalidation is already queued; their read-through directory pairs no store read with it, and rename fragments are applied first. Concurrent first opens raced: a slower opener's older lookup could overwrite the newer entry a faster opener had installed, while the monotonic version kept the newer timestamp, leaving an old entry fenced at a new version and immune to every correcting event. Entry and version are now installed as one decision under the handle map lock, and an install that does not outrank the handle's version is dropped. The remote-cache commit also escaped the fence: it wrote storage and notified without the path lock, so a lookup's shared-locked fence and read could land between the two and hand out the cached state under-versioned. The commit now re-reads and writes under the exclusive path lock, and backs off entirely when the entry changed during the download, since the concurrent writer supersedes the cached content.
* mount: floors gate store applies; installs respect handle users; renames join the fence
A directory floor certifies the listing state as of its snapshot, but a delayed event at or below the floor was still applied to the store, rolling the content back to pre-snapshot state while the floor kept claiming the snapshot version, so the correcting events were fenced out of every future read.
Events are now gated against the affected directory's floor, each half of a rename independently. Fences are lower bounds: a listing or lookup can include a mutation whose event has not been delivered yet, and that event later passes every gate carrying state the handle already holds. Such a re-delivery now advances the version without destroying dirty pages or reinstalling the entry: invalidating local writes over a no-op was the real damage in every remaining under-fence window, including the unlocked listing snapshot, which no per-path lock can serialize. The concurrent-open install moved from the map lock to the handle lock every reader, writer, and invalidation synchronizes on, and rejects what cannot improve the handle: dirty state (local writes would be lost), unversioned lookup responses (they cannot outrank anything, and two zero-version opens must not overwrite each other), and anything not strictly newer. New handles are still fully initialized before the map exposes them. Renames committed metadata and emitted events with no path lock, so a lookup could read the renamed state under a fence preceding its events. Both rename handlers now hold the source and destination locks, ordered by path, across commit and notification; descendants of a renamed directory are not individually locked and rely on the no-op re-delivery handling above.
* mount: per-entry store versions replace the cursor and directory floors
The store's aggregate versions (a global subscription cursor and per-directory listing floors) were versions at coarser granularity than the values they described, and every over-claiming bug in this series traced to that gap: an aggregate vouching for state its source never saw. Each store entry now carries the filer log position of the write that produced it (the event that applied it, or the listing snapshot that inserted it), recorded in the store's key-value space under the same lock as the entry write.
The store becomes what the handle already is: a last-writer-wins register with one rule, install only what outranks the current claim. The cursor, the floors, their advancement rules, the pairing ordering constraint, and the floor gating all collapse into that rule. Applies are gated per entry, each half of a rename independently; an unversioned local write clears the claim its content no longer proves; version records lingering after a bulk folder wipe cannot fence a recreate, since a claim only blocks while its entry exists. Listing inserts are stamped at build completion, before the buffered replay, so newer replayed events override the stamp. Filer side, the fence dance every versioned read must perform is now a single choke point, fencedFindEntry, so a future read RPC gets the lock-serialized stamp by construction rather than by convention.
* mount: judge no-op re-deliveries against an immutable base, not the live entry
The equal-state skip compared the incoming event to the live handle entry, but local writes mutate the live entry (size, timestamps, chunks), so a delayed event re-delivering the base the handle was opened with no longer matched, and the installer destroyed the dirty pages and rolled the entry back over nothing new. The handle now keeps an immutable snapshot of the filer state it last installed or acknowledged, refreshed at every install and mutation ack (flush acks snapshot the request entry before the id mapping mutates it), and the no-op judgment runs against that base: an event carrying the base brings nothing, whatever the live entry has diverged to since.
* mount: tombstones for versioned deletes, absence floors, copy enrollment
Four gaps in the per-entry version protocol, all the same shape: a versioned fact with nothing carrying its version.
A deletion is a fact about a path with no entry left to hold it: clearing the record let a delayed older event resurrect the deleted path, permanently, since the deletion's own redelivery is dedup-suppressed. Versioned deletes now leave a tombstone record that fences without an entry; renames tombstone their source the same way. Plain records still only block while their entry exists, so records lingering after a bulk folder wipe cannot fence a recreate. A completed listing proves absences as well as presences: a name it omitted was deleted as of the snapshot, and a delayed create below the snapshot could re-create it. The snapshot is kept per directory strictly as an absence fence, consulted only when a path has neither an entry nor a version record. Present entries carry their own versions and never touch it, which is what separates this from the over-claiming floor it replaces. A rebuild against a pre-upgrade filer returns no snapshot; stamping now clears the children's records in that case, so a reinserted entry cannot reactivate the stale claim its previous incarnation left behind and reject valid events below it. Server-side copies installed the copied entry without enrolling in the base protocol, so the copy's own event differed from the stale pre-copy base and destroyed writes made to the destination after the copy. The install now refreshes the base and takes its version from the fenced readback.
* mount: deletion facts outlive the cache's knowledge of the entry
A versioned delete of a path the store held no entry for recorded nothing, so a delayed older event recreated the path, permanently, with the deletion's redelivery dedup-suppressed. The tombstone is now written whenever a versioned event vacates a path: the deletion is a fact about the path, not about what this cache happened to hold.
For an absent entry, the listing's absence floor now overrides whatever older record remains: a tombstone at one position does not exhaust what is known about the path when a newer snapshot has confirmed the name still absent, and an event between the two was slipping past both. A committed copy whose readback failed installed a synthesized base with local timestamps; the copy's real event legitimately differs from it, and was read as foreign state, destroying writes made to the destination after the copy. The handle now marks that its own event is en route and adopts that event's state as the base without touching the live entry or the dirty pages; the adoption is one-shot, so a genuinely foreign event still invalidates.
* mount: authoritative acks cancel pending event adoption; tombstones scoped and pruned
The copy-event adoption flag could outlive its purpose: a flush after the failed readback installs a newer base and advances the version, the copy's own event is then version-gated without consuming the flag, and the next genuinely foreign event was silently adopted (base advanced, live entry and dirty pages untouched), leaving the mount to later overwrite that remote change. Every local acknowledgment now installs its base through one helper that also cancels any pending adoption: the ack supersedes the mutation the adoption was waiting for. Tombstones were written for every versioned delete under the mount and survived directory eviction by design, growing LevelDB with historical deletions on delete-heavy mounts. They are now scoped to directories whose cached state the fence actually protects (an uncached parent neither serves from the store nor applies the resurrecting insert), and a completed listing prunes the direct-child tombstones its absence floor supersedes, leaving only those above the snapshot. The store gains a key-prefix visitor for the sweep.
* mount: acked saves install their value; trailer snapshots; direct-child prune range
A version must never advance without its value. saveEntry stamped any open handle with the acknowledgment's version, but a handle opened while the save was in flight holds the pre-mutation entry; stamping it fenced out the events carrying the state it lacked, permanently, with the local apply performing no invalidation and the redelivery deduplicated. The acknowledged entry is now installed together with its version, through the same guarded install the racing-open path uses: under the handle lock, only when it outranks the handle, never over dirty local writes. Empty listings return no in-band snapshot (a snapshot-only response would be read as an entry by older consumers), so directories that end empty gained no absence floor and their tombstones were never pruned. The filer now sends the snapshot in the stream trailer, which older clients ignore, and the client reads it when no in-band snapshot arrived. Empty directories get real floors, their tombstones prune, and their buffered replays gain the snapshot filter instead of the replay-all fallback. Version records now encode the parent directory and name separated by a NUL, making a directory's direct children one contiguous key range: the tombstone prune scans exactly them under the cache lock, instead of walking every descendant record (the whole store, for root).
* mount: fix dirty-page loss, uid/gid base, download race, copy adopt, leak; dedup
Correctness fixes from the versioned-invalidation review:
- A foreign delete/rename-away of a file held open with unflushed local writes destroyed the dirty pages unconditionally. A process may keep writing to an unlinked-but-open file and those writes were already acknowledged; preserve the pages when the handle is dirty.
- downloadRemoteEntry stored the handle's base with filer-side uid/gid while every candidate it is later compared against is in local form, so under a non-identity UidGidMapper an unchanged re-delivery looked foreign and force-destroyed dirty pages. Map the base to local.
- downloadRemoteEntry wrote the entry/base/version triple under only the handle's shared lock, so two concurrent reads of the same remote-only file could tear it. Serialize the install with a dedicated mutex (invalidation is already excluded by the exclusive handle lock).
- A committed server-side copy whose readback failed adopted the FIRST event past the version gate as its base; a foreign write delivered first was silently swallowed. Adopt only an event whose content matches the synthesized base (the copy's own event), and install any other normally.
- The deferred-create path relied on AcquireFileHandle installing the passed entry on a pre-existing handle, which the version rework dropped. Restore that install in the compat wrapper; the versioned open path keeps its gated install.
Growth and hot-path cost:
- Per-entry version records and tombstones leaked when a directory was evicted or read-through without a rebuild. An uncached directory gates its own inserts, so its records fence nothing; clear a directory's child version records when it is wiped for eviction.
- FindEntry paid for the version KvGet on every lookup/getattr cache hit and threw it away. FindEntry now reads only the entry; the hot lookupEntry cache-hit path skips the version entirely.
Cleanups:
- Extract ackVersionTsNs over the shared response interface, replacing the metadata-event-else-log-ts snippet copy-pasted at four ack sites.
- Extract acquireRenamePathLocks, replacing the verbatim sorted two-path lock fence in both rename handlers.
* mount: no resurrection on foreign delete, version no-event acks, gate downloads, tighten copy adopt
Follow-ups to the review patches:
- Preserving dirty pages on a foreign delete let the next flush pass the isDeleted guard and CreateEntry, resurrecting the remotely unlinked name. Mark the handle deleted in the vacate branch: the open fd can still read its buffered writes, but a flush no longer recreates the file.
- A no-event acknowledgment (log fence only) synthesized a metadata event with TsNs 0, so the cache stored the entry unversioned and an older subscriber event rolled it back. Stamp the synthesized event with the ack's log position at all four ack sites.
- downloadRemoteEntry serialized its install but did not check the version, so an older response arriving last overwrote the entry/base while the monotonic version kept the newer value, fencing corrections out. Install only when the response is at least as new as the handle.
- sameEntryContent compared only size and chunks, so a foreign chmod with unchanged content was adopted as the copy's own event. Compare everything except server-assigned timestamps, so a metadata-only foreign change installs instead.
* mount: trim comments to the non-obvious why
The versioning work accumulated multi-line comment blocks restating what the code says. Keep the constraint a reader cannot derive (why a fence is exact, why a version must not advance without its value, why an uncached parent's records fence nothing) and drop the rest.
* mount: distinguish rename from delete, tighten the download and adopt gates
- A rename emits a nil old-path invalidation just like an unlink, so the vacate branch marked the handle deleted and later writes through the already-open descriptor were skipped instead of persisted. Carry the delete/rename distinction on the invalidation and mark only an actual delete.
- The remote-download install accepted an unversioned response regardless of the handle's version, so during a rolling upgrade a delayed response could install stale content under a newer version. Require the response to be at least as new, with one exception: a handle still lacking local chunks takes the content anyway (it cannot read without it) but does not claim the response's log position.
- Copy-event adoption returned without installing, so a foreign touch arriving before the copy's own event lost its timestamps. Content is unchanged either way, so the dirty pages stay valid; a clean handle now takes the entry, while a dirty one keeps its diverged version.
* mount: one directory floor instead of a record per child; agree on TTL
Review feedback:
- Build completion wrote one KV record per direct child inside the cache write lock, so a large directory stalled every other cache operation for O(children) store writes. The directory's listing snapshot already covers every child it saw; make that floor the version for any child without a record of its own, and a child earns a record only when a later event touches it. One map write per build replaces the per-child writes, with the same fencing.
- The presence probe read the store directly and so counted a TTL-expired entry as present, judging the path by a record describing content that has logically vanished. It now applies the same expiry the read path does, and an expired path falls back to its directory floor.
- Preserve ErrNotFound identity when the commit-time re-read finds the object deleted, so callers still surface a 404.
- Assert the rename-away source fence timestamp in the invalidation test.
Also record the tombstone ceiling: distinct deleted names in a cached directory accumulate until it is rebuilt or evicted, which prunes everything at or below the new snapshot.
* mount: pin the fence's clock domain instead of letting skew decide
A log-position fence is stamped by one filer's clock under that filer's in-process lock, so comparing it to an event another filer logged is comparing two unrelated clocks. The two error directions are not equally costly: applying an event the fence already covered is a re-apply the base-equality check absorbs, while skipping one it does not cover leaves the handle holding exactly the state the event was meant to correct, with the subscription cursor already past it (the unhealable staleness this whole PR exists to remove). So refuse to guess. Fences now carry the signature of the filer that stamped them, and a handle records it alongside the position. An event is only fenced out when the filer that logged it is the one that stamped the fence; the logging filer appends its own signature, so its presence identifies the clock domain. Events from any other filer are applied. Positions taken from events keep comparing as before; the subscription already delivers those in order. The invalidation callback takes a struct now: it carries the path, entry, position, delete/rename distinction, and signatures, and was about to need a fifth positional parameter.
* mount: follow a foreign rename; key page invalidation on content, not equality
- A rename's old-path invalidation now carries the destination, and the handle follows the file there: an open fd tracks the inode, and leaving it on the old path made its next flush recreate that name instead of updating the renamed file.
- Dirty pages overlay content, so only a content change invalidates them. Keying that on exact equality meant any timestamp-only event destroyed them, which the copy-adoption marker existed to paper over; a foreign touch could consume the marker and leave the copy's own event to drop the post-copy writes.
Comparing content instead makes the marker unnecessary, so it is gone: a metadata-only event keeps the overlay, and a dirty handle keeps its diverged entry unless foreign content supersedes it.
- A remote download response that is merely older is now refused even when the handle still lacks chunks; only an unversioned one is taken (and claims no position), since an older response's content predates what the handle reflects.
- A refused or unversioned download no longer publishes to the metadata cache, where a zero-position event would clear the entry's version and let an older subscriber event roll the cache back.
* mount: page invalidation keys on content alone; unversioned writes claim no position
- sameEntryContent compared everything but timestamps, so a foreign chmod, chown, or xattr change counted as a content change and destroyed the dirty-page overlay. It was strict only to serve the copy-adoption marker, which is gone; its one caller now asks the question it actually needs (did the bytes change), so metadata-only events leave the overlay alone.
- A rename over an existing file destroys that file, but its open handle was left live and still pointed at the name the renamed source now occupies, so its flush could overwrite it. MovePath already reports the displaced inode; mark that handle deleted.
- An acknowledgment was refused whenever its position was numerically lower, even when a different filer stamped the fence it lost to. Two known, differing signatures mean unrelated clocks, so the comparison no longer applies there; unknown signatures still compare as before.
- A local write with no log position behind it now records that explicitly instead of deleting its version record. Absence means the directory listing covers the path, which is why the snapshot floor applies; local content the listing never saw must not inherit it, or the events that would correct it are fenced out.
* mount: widen the existing lookup functions instead of forking WithVersion twins
The versioning work grew a parallel function for every accessor that needed to return a log position: lookupEntryWithVersion beside lookupEntry, maybeLoadEntryWithVersion beside maybeLoadEntry, FindEntryWithVersion beside FindEntry, AcquireFileHandleWithVersion beside AcquireFileHandle, advanceEntryVersion beside advanceEntryVersionTsNs, plus a getPbEntryWithVersion wrapper and an InsertListedEntriesForTest hook. Two names for one operation is two places to keep in step, and the split let callers pick the one that happened to compile. Each pair is now the single original name carrying the position, with callers that do not want it discarding it. filer_pb.GetEntry returns the fence its response already carried rather than a mount-side wrapper re-issuing the lookup, and InsertEntry takes the position its content reflects rather than a test-only twin that inserted without one. The one behavioural knot the merge exposed: AcquireFileHandle had been installing the entry on a pre-existing handle only in its unversioned form, which conflated 'the caller is authoritative' with 'the lookup had no version'. Deferred create is the only caller that means the former, so it now installs explicitly and the map function just acquires.	2026-07-23 17:44:02 -07:00
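The guarded install that the racing-open and acknowledged-save paths share can be pictured as a small last-writer-wins register. This is a minimal sketch, not the mount's code: `Entry`, `FileHandle`, `NewFileHandle`, and `TryInstall` are hypothetical names, and a plain `sync.Mutex` stands in for the handle lock. It encodes the three rejections the commit lists: dirty local writes, unversioned responses, and anything not strictly newer.

```go
package main

import "sync"

// Entry is a hypothetical stand-in for a filer entry.
type Entry struct {
	Name string
	Size int64
}

// FileHandle is a hypothetical open-file handle caching one entry together
// with the filer log position (version) that entry reflects.
type FileHandle struct {
	mu      sync.Mutex // stands in for the lock readers, writers, and invalidation share
	entry   *Entry
	version int64 // filer log position in ns; 0 means "unversioned"
	dirty   bool  // true while unflushed local writes exist
}

// NewFileHandle fully initializes a handle before anyone else can see it.
func NewFileHandle(entry *Entry, version int64) *FileHandle {
	return &FileHandle{entry: entry, version: version}
}

// TryInstall installs entry at version only when that cannot lose
// information, and reports whether it did.
func (h *FileHandle) TryInstall(entry *Entry, version int64) bool {
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.dirty {
		return false // local writes would be lost
	}
	if version == 0 {
		return false // an unversioned response cannot outrank anything
	}
	if version <= h.version {
		return false // at or below the handle's version: old news
	}
	// Entry and version change together, as one decision under the lock.
	h.entry = entry
	h.version = version
	return true
}
```

Keeping the rejection rules in one function mirrors the commit's point that one rule replaces per-site reasoning.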
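The clock-domain rule from the "pin the fence's clock domain" change can be sketched as a pure predicate. The `Fence` and `Event` types and the `FencedOut` function are hypothetical; the sketch assumes, as the description says, that the logging filer appends its own signature to each event, and that a fence with no known signature keeps the old numeric comparison. Whenever the comparison would be meaningless, the event is applied, because a redundant re-apply is absorbed while a wrongful skip cannot heal.

```go
package main

// Fence is a hypothetical record of the log position a handle's entry reflects.
type Fence struct {
	TsNs      int64
	Signature int32 // filer that stamped the fence; 0 = unknown (e.g. taken from an event)
}

// Event is a hypothetical metadata event: its log position plus the
// signatures appended by the filers it passed through.
type Event struct {
	TsNs       int64
	Signatures []int32
}

// FencedOut reports whether ev must be skipped because the handle already
// reflects it.
func FencedOut(f Fence, ev Event) bool {
	if ev.TsNs > f.TsNs {
		return false // strictly newer: apply
	}
	if f.Signature == 0 {
		return true // unknown stamping filer: compare numerically as before
	}
	for _, s := range ev.Signatures {
		if s == f.Signature {
			return true // same filer clock, so the comparison is meaningful
		}
	}
	return false // another filer's clock: refuse to guess, apply
}
```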
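Read together, the per-entry records, tombstones, and directory floors reduce to one question: what log position does the cache vouch for at a path? The sketch below is one reading of the final rules described above, under stated assumptions: `Record`, `Claim`, and `ShouldApply` are hypothetical; a record with `TsNs` 0 on a present entry models the explicit "unversioned local write" marker; and an event applies only when it strictly outranks the claim.

```go
package main

// Record is a hypothetical per-path version record. TsNs is the filer log
// position of the write that produced the state; TsNs == 0 on a present
// entry models an explicit "unversioned local write" that proves nothing.
type Record struct {
	TsNs      int64
	Tombstone bool // the path was vacated by a versioned delete or rename
}

// Claim returns the log position the cache vouches for at one path.
// rec is the path's own record (nil if none), entryExists says whether the
// store holds an entry there, and dirFloor is the parent directory's
// listing snapshot (0 if the directory was never listed).
func Claim(rec *Record, entryExists bool, dirFloor int64) int64 {
	if entryExists {
		if rec != nil {
			return rec.TsNs // own record wins; 0 means it claims nothing
		}
		return dirFloor // recordless child: the listing snapshot covers it
	}
	// Absent path: the listing's absence floor applies, extended by a
	// newer tombstone. A plain record left after a wipe does not block.
	claim := dirFloor
	if rec != nil && rec.Tombstone && rec.TsNs > claim {
		claim = rec.TsNs
	}
	return claim
}

// ShouldApply reports whether an event at evTsNs outranks the claim;
// state at or below the claim is old news.
func ShouldApply(evTsNs, claim int64) bool {
	return evTsNs > claim
}
```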
Chris Lu and GitHub	3f4cb6d2fb	feat(s3/lifecycle/engine): daily-replay view surface (Phase 4 engine) (#9447)
* feat(s3/lifecycle/engine): daily-replay view surface (Phase 4 engine)
Adds the engine-side API the new daily-replay worker reaches for: per-view snapshot construction (RulesForShard, RecoveryView), the two cursor hashes that gate recovery (ReplayContentHash, PromotedHash), and the cursor sliding-window helper (MaxEffectiveTTL). CurrentSnapshot is a stub keyed on a package-level atomic that the worker startup wiring populates. Views return new Snapshot instances holding cloned CompiledAction values so per-clone active/Mode never leak across partitions. Replay clones force Mode=ModeEventDriven to rehabilitate any persistent ModeScanOnly carried over from PriorState; walk and recovery clones preserve Mode as-is. Disabled actions are excluded from all views. No production caller is wired here; Phase 4's walker/dailyrun integration is the follow-up. dailyrun's local helpers (localReplayContentHash, localMaxEffectiveTTL) become one-line redirects to these exports.
API surface:
- CurrentSnapshot() Snapshot: stub until Phase 4 wiring.
- SetCurrentEngine(Engine): Phase 4 wiring entry point.
- Snapshot.RulesForShard(shardID, retentionWindow) (replay, walk Snapshot)
- RecoveryView(s Snapshot) Snapshot: force-active over the full set.
- ReplayContentHash(s Snapshot) [32]byte: partition-independent.
- PromotedHash(s Snapshot, retentionWindow) [32]byte: partition-flip.
- MaxEffectiveTTL(s Snapshot) time.Duration: over active replay only.
30 unit tests covering clone isolation, Mode rewrite, partition membership including the multi-action-kind XML rule split, RecoveryView activating pre-BootstrapComplete actions, ReplayContentHash partition-independence, PromotedHash sensitivity to promotion in either direction, and MaxEffectiveTTL aggregation. Build + race-tests green.
* refactor(s3/lifecycle/engine): consolidate hash helpers; clarify shardID semantics
Addresses PR #9447 review feedback. Three medium-priority items from gemini, all code-quality refinements (no behavior change):
1. Duplicated sort comparator between ReplayContentHash and PromotedHash. Extract a shared sortHashItems helper so the two hashes use the same ordering by construction; if one drifted, the cursor could see a spurious "rule changed" on a no-op snapshot rebuild.
2. Duplicated writeField/writeInt closures. Extract a hashWriter struct holding the sha256 running hash + lenbuf, with method helpers. Same allocation profile (one Hash, one tiny stack buffer per helper); it just deduplicates ~20 lines.
3. The shardID parameter on RulesForShard is unused. Per the design's open question, every shard sees every rule today (the shard filter runs at the entry-iteration site, not at view construction). Keep the parameter for API stability, since removing it now would force a breaking change when bucket-shard ownership lands, and update the doc comment to explain why it's reserved.
go build ./... clean; engine test suite green.	2026-05-11 18:07:54 -07:00
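Both refactors in this review pass, one shared ordering for the two hashes and a length-prefixed writer, can be sketched with the standard library alone. `hashWriter`, `hashItem`, `sortHashItems`, and `contentHash` here are illustrative reconstructions under an assumed field layout, not the engine's code. Sorting a copy before hashing makes the digest independent of input order, and the length prefixes keep `("ab", "c")` and `("a", "bc")` from colliding.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"hash"
	"sort"
)

// hashWriter holds a running SHA-256 and a reusable 8-byte buffer.
type hashWriter struct {
	h      hash.Hash
	lenbuf [8]byte
}

func newHashWriter() *hashWriter { return &hashWriter{h: sha256.New()} }

// writeField writes len(b) as a big-endian uint64, then b itself.
func (w *hashWriter) writeField(b []byte) {
	binary.BigEndian.PutUint64(w.lenbuf[:], uint64(len(b)))
	w.h.Write(w.lenbuf[:])
	w.h.Write(b)
}

// writeInt writes v as a fixed-width big-endian value.
func (w *hashWriter) writeInt(v int64) {
	binary.BigEndian.PutUint64(w.lenbuf[:], uint64(v))
	w.h.Write(w.lenbuf[:])
}

func (w *hashWriter) sum() [32]byte {
	var out [32]byte
	copy(out[:], w.h.Sum(nil))
	return out
}

// hashItem is a hypothetical stand-in for one rule's hashed fields.
type hashItem struct {
	Key  string
	Days int64
}

// sortHashItems is the single ordering every hash uses.
func sortHashItems(items []hashItem) {
	sort.Slice(items, func(i, j int) bool {
		if items[i].Key != items[j].Key {
			return items[i].Key < items[j].Key
		}
		return items[i].Days < items[j].Days
	})
}

// contentHash hashes a sorted copy, leaving the caller's slice untouched.
func contentHash(items []hashItem) [32]byte {
	sorted := append([]hashItem(nil), items...)
	sortHashItems(sorted)
	w := newHashWriter()
	for _, it := range sorted {
		w.writeField([]byte(it.Key))
		w.writeInt(it.Days)
	}
	return w.sum()
}
```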
Chris Lu and GitHub	82648cca53	test(s3/lifecycle/engine): pin delay-group dedup across buckets (#9418)
Compile a 100-bucket × 5-rule snapshot where the five Days values include duplicates (1, 1, 7, 7, 30) and assert:
- snap.actions has 500 entries: every (bucket, rule) compiles to its own ActionKey, with no collapse.
- snap.originalDelayGroups has exactly 3 entries: the routing index is keyed by Delay, so same-day rules across all buckets share a group. This is the property that lets the dispatcher index by delay group rather than per rule.
- Per-group key count = (rules with that day) × buckets, so every action is reachable from its group entry.	2026-05-10 10:36:54 -07:00
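The arithmetic behind the three assertions: 5 rules × 100 buckets = 500 actions; the values (1, 1, 7, 7, 30) give 3 distinct delays; and the groups hold 2 × 100 = 200 (1 day), 2 × 100 = 200 (7 days), and 1 × 100 = 100 (30 days) keys. A minimal sketch of the same indexing follows, with a hypothetical `compile` function and `ActionKey` type; the real index is keyed by a Delay value, for which plain day counts are assumed here.

```go
package main

import "fmt"

// ActionKey is a hypothetical identity for one compiled (bucket, rule) action.
type ActionKey struct {
	Bucket string
	Rule   int
}

// compile gives every (bucket, rule) its own action and indexes each
// action under its delay, so same-day rules across buckets share a group.
func compile(buckets int, days []int) (actions map[ActionKey]int, groups map[int][]ActionKey) {
	actions = make(map[ActionKey]int)
	groups = make(map[int][]ActionKey)
	for b := 0; b < buckets; b++ {
		bucket := fmt.Sprintf("bucket-%03d", b)
		for r, d := range days {
			k := ActionKey{Bucket: bucket, Rule: r}
			actions[k] = d
			groups[d] = append(groups[d], k)
		}
	}
	return actions, groups
}
```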
Chris Lu and GitHub	c7b01c72b2	test(s3/lifecycle): integration coverage for versioning + filters (#9415)
* test(s3/lifecycle): integration coverage for versioning + filters
First integration-test bundle building on the existing single-test backdating harness. Each scenario follows the same shape: create bucket, set lifecycle, PUT object, backdate mtime via filer UpdateEntry, run the shell command for one shard sweep, assert S3-side state. Five new tests:
- TestLifecycleVersionedBucketCreatesDeleteMarker: Expiration on a versioned bucket must produce a delete marker (the latest version after the worker runs is a marker) AND keep the original version directly addressable by versionId. ListObjectVersions confirms IsLatest=true on the marker.
- TestLifecycleNoncurrentVersionExpiration: NoncurrentVersionExpiration fires only on demoted versions. PUT v1, PUT v2 (so v1 → noncurrent), backdate v1, run worker. v1 must be gone, v2 still current.
- TestLifecycleExpiredDeleteMarkerCleanup: a combined rule (noncurrent + expired-delete-marker) cleans up a sole-survivor marker. PUT v1, DELETE (creates marker), backdate both, run worker. Every version AND marker must be gone for the key.
- TestLifecycleDisabledRuleSkipsObject: a rule with Status=Disabled must not produce dispatches even on a backdated match. Negative test for the engine's enabled-status gate.
- TestLifecycleTagFilter: a rule with And{Prefix, Tag} only matches objects carrying the tag. Two backdated objects (one tagged, one not); only the tagged one is removed.
Helpers extracted to keep each test focused: putVersioningEnabled, putNoncurrentExpirationLifecycle, putExpiredDeleteMarkerLifecycle, backdateVersionedMtime (ages a specific .versions/v_<id> entry), runLifecycleShard (one-shot shell invocation with FATAL guard).
* test(s3/lifecycle): tighten noncurrent expiration diagnostics
A local run showed TestLifecycleNoncurrentVersionExpiration failing with a bare 404 on HEAD(latest), not enough to tell whether v2 was deleted, the bare-key pointer was removed, or a delete marker was synthesized. Strengthen the test to:
- HEAD by versionId=v2 first, so we pin "v2 file still on disk" separately from "the latest pointer resolves to v2"
- on HEAD(latest) failure, log ListObjectVersions output (versions + markers, with IsLatest) so the next failure shows which side the bug is on rather than just NotFound
* test(s3/lifecycle): integration coverage for AbortIncompleteMultipartUpload
Exercises the lifecycleAbortMPU handler path that the prefix-based expiration tests can't reach: routing keys off .uploads/<id>/ directory events, not regular object events, and the dispatcher uses a different RPC path (rm on the .uploads/<id>/ folder). Setup: AbortIncompleteMultipartUpload rule with DaysAfterInitiation=1, CreateMultipartUpload, UploadPart (so the directory carries the right shape), backdate the .uploads/<uploadID>/ directory entry 30 days, run the worker. The upload must drop out of ListMultipartUploads. Helpers added: putAbortMPULifecycle, backdateUploadDir.
* test(s3/lifecycle): integration coverage for NewerNoncurrentVersions
NewerNoncurrentVersions=N keeps the N most recent noncurrent versions and expires the rest. It is distinct from per-version NoncurrentDays: it depends on per-version rank, not just per-version age, and routes through routePointerTransition's "needs full expansion" path. Setup: PUT v1, v2, v3, v4 on a versioned bucket (v4 current; v1-v3 noncurrent), backdate v1+v2+v3 so all satisfy the NoncurrentDays>=1 floor, run the worker. Expect v1+v2 expired (older noncurrent), v3 (newest noncurrent within keep=1) and v4 (current) preserved. Helper added: putNewerNoncurrentLifecycle.
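The filter semantics these tests pin, an And{Prefix, Tag} match and the strict `>` ObjectSizeGreaterThan gate, fit in one predicate. This is a hedged sketch: `Rule`, `Object`, and `filterAllows` are hypothetical types and names modelled on the descriptions here, with a nil `SizeGreaterThan` assumed to mean "no size filter".

```go
package main

import "strings"

// Rule is a hypothetical compiled lifecycle filter.
type Rule struct {
	Prefix          string
	Tags            map[string]string // every tag must match (And semantics)
	SizeGreaterThan *int64            // nil = no size filter
}

// Object is a hypothetical view of an object the sweep considers.
type Object struct {
	Key  string
	Size int64
	Tags map[string]string
}

// filterAllows reports whether rule r applies to object o.
func filterAllows(r Rule, o Object) bool {
	if !strings.HasPrefix(o.Key, r.Prefix) {
		return false
	}
	for k, v := range r.Tags {
		if got, ok := o.Tags[k]; !ok || got != v {
			return false // missing or different tag value
		}
	}
	if r.SizeGreaterThan != nil && o.Size <= *r.SizeGreaterThan {
		return false // strict ">": an object exactly at the threshold is kept
	}
	return true
}
```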
* test(s3/lifecycle): integration coverage for suspended-versioning Expiration Suspended versioning takes a distinct code path in lifecycleDispatch: the VersioningSuspended branch first deletes the null version (via deleteSpecificObjectVersion(versionId="null")) and then writes a fresh delete marker on top. Other branches (Enabled → only writes a marker; Off → straight rm) miss this two-step. Setup: enable versioning, PUT v1 (real versionId), suspend versioning, PUT again (creates the null version, demotes v1 to noncurrent), set the Expiration rule, backdate the null at the bare path. Expect: latest is now a fresh delete marker, the "null" version is gone from ListObjectVersions, and v1 (noncurrent under Enabled) still addressable directly — suspended Expiration must only touch the null, not other versions. Helper added: putVersioningSuspended. * test(s3/lifecycle): integration coverage for multi-bucket sweep A single shell-driven shard sweep must process every bucket carrying lifecycle config, not just the first one alphabetically. Pinned because the scheduler iterates the buckets directory and a regression that returns early after the first match would silently disable lifecycle for every later bucket. Two buckets, each with their own prefix-expiration rule and a backdated object. Both must be expired after the same sweep. * test(s3/lifecycle): integration coverage for ObjectSizeGreaterThan filter ObjectSizeGreaterThan is a strict > gate (filterAllows uses ev.Size <= rule.FilterSizeGreaterThan to reject). Pinned at the boundary: an object whose size equals the threshold must remain; only an object strictly larger expires. Catches a > vs >= flip. Two backdated objects on the same prefix, sizes 100 and 150 with threshold=100 — boundary survives, larger expires. * test(s3/lifecycle): scrub bucket lifecycle config + versions on cleanup Tests share one weed mini server. 
Three pollution modes were producing order-dependent failures: - A later test's shard sweep would still load the prior test's lifecycle config (the worker reads every bucket's XML from filer state, and DeleteBucket alone doesn't drop lifecycle config cleanly on this codebase). - Versioned-bucket tests left versions + delete markers behind that ListObjectsV2 can't see, so the existing best-effort empty-then-delete didn't actually empty those buckets. - The AbortMPU test intentionally leaves an in-flight upload; without an explicit AbortMultipartUpload the bucket DELETE hits NotEmpty. Cleanup now runs DeleteBucketLifecycle, ListObjectVersions → DeleteObject(versionId), ListObjectsV2 → DeleteObject (catches what ListObjectVersions missed), ListMultipartUploads → AbortMultipartUpload, then DeleteBucket. Best-effort throughout so a half-torn-down bucket doesn't fail the cleanup chain. * test(s3/lifecycle): backdate both versions for NoncurrentDays clock Per codex review: NoncurrentDays is clocked from the SUCCESSOR version's mtime (when the displaced version became noncurrent), not from the displaced version's own mtime. Backdating only v1 left the clock (v2's mtime) at "now" and the rule never fired — the test was wrong, not the production path. Backdate v1=31d and v2=30d: v1's noncurrent clock (v2's mtime) is then 30 days old, well past the 1-day threshold, so the noncurrent rule fires and v2 stays current. * test(s3/lifecycle): assert specific NotFound on multi-bucket deletion Per codex review: TestLifecycleMultipleBucketsInOneSweep treated any HeadObject error as "deleted", which lets a transport failure or dead endpoint mask a real bug. Recognize NoSuchKey/NotFound/HTTP-404 specifically via a small isS3NotFound helper so the assertion actually proves deletion happened, not just that the call broke. * test(s3/lifecycle): gofmt size-filter test * test(s3/lifecycle): integration coverage for Object Lock skip Object Lock retention must override the lifecycle rule. 
The handler's enforceObjectLockProtections check (s3api_internal_lifecycle.go:47) returns an error when retention is active; the dispatcher then classifies the outcome as SKIPPED_OBJECT_LOCK and the object stays. No existing integration test reaches that outcome. Setup: bucket created with ObjectLockEnabledForBucket=true, expiration rule on prefix "lock/", two backdated objects under the same prefix — one with GOVERNANCE retention until 1h from now, one without. After the worker runs, the unlocked object expires (positive control); the locked one survives. Custom cleanup uses BypassGovernanceRetention so the test can drop the locked version when the test finishes — otherwise the retention window keeps the bucket from being deleted. * test(s3/lifecycle): integration coverage for config update between sweeps An operator changes the lifecycle rule between two shell-driven sweeps. The second sweep must respect the NEW rule, not a cached copy of the old one. Each runLifecycleShard invocation spawns a fresh weed shell subprocess, so cached engine state from a previous sweep doesn't persist — but a regression that caches rules across PutBucketLifecycleConfiguration calls within the S3 server itself would still surface here. Sweep 1: rule prefix="first/", PUT + backdate firstKey, run worker → firstKey expires. Update rule to prefix="second/", PUT + backdate secondKey AND a new key under the OLD prefix ("first/post-update.txt"). Sweep 2 must expire only the second-prefix object; the post-update old-prefix one must survive — config replacement, not merge. * test(s3/lifecycle): integration coverage for ExpirationDate (past) Rules with Expiration{Date: <past>} route through ScanAtDate in the engine (decideMode's ActionKindExpirationDate case) — a separate compile + dispatch branch from the EventDriven delay-group path the Days-based tests exercise. Past date + in-prefix object → must expire. Out-of-prefix object → must remain. 
Object also backdated as defense-in-depth so the assertion doesn't depend on whether the dispatcher consults MinTriggerAge for date kinds. * test(s3/lifecycle): integration coverage for bootstrap walk on existing objects Production scenario: operator enables lifecycle on a bucket that already holds objects from before the policy. The worker must discover them via the bootstrap walk (BucketBootstrapper) — there were no meta-log events to observe because the objects predate the rule. Without the bootstrap path, only NEW writes would ever match. Setup: PUT 5 objects (no lifecycle config yet) + 1 out-of-prefix survivor, backdate all, THEN set the Expiration rule, run the worker. Every in-prefix pre-existing object must be expired; the out-of-prefix one must remain. * test(s3/lifecycle): integration coverage for DeleteBucketLifecycle stops dispatching Operator UX: after DeleteBucketLifecycle, the worker must observe the removal on the next sweep and stop expiring objects under the now-gone rule. A regression that caches old configs across PutBucketLifecycleConfiguration → DeleteBucketLifecycle would keep silently dropping objects. Setup: positive control (rule active, backdated obj expires) → DeleteBucketLifecycle → PUT + backdate a fresh object → second sweep. The fresh object must remain. * test(s3/lifecycle): integration coverage for empty bucket sweep no-op A bucket carrying lifecycle config but no objects must produce a successful sweep — no hangs, no errors, no dispatches. Pinned because the bootstrap walker iterates bucket directories, and an empty directory is a corner of that traversal that's easy to break (slice-bounds bug on the first listing returning zero entries). Asserts: worker logs "loaded lifecycle for" and "shards 0-15 complete", no FATAL output, bucket still exists after the sweep. 
* test(s3/lifecycle): fix Object Lock backdate path + skip unwired ScanAtDate ObjectLock: enabling Object Lock on a bucket implicitly enables versioning, so PUT objects land at .versions/v_<id>, not at the bare key. The test was calling backdateMtime (bare path) and failing in the helper with "filer: no entry is found". Switch to backdateVersionedMtime with the versionId returned by PutObject. ExpirationDate: ScanAtDate dispatch path isn't wired to the run-shard shell command yet — the bootstrap walker explicitly skips actions in ModeScanAtDate (walker.go:141 says "SCAN_AT_DATE runs its own date-triggered bootstrap" but no such bootstrap exists in the scheduler or shell). Skip with a t.Skip + explanation so the test activates the moment the date-triggered path lands. * fix(s3/lifecycle): wire ExpirationDate dispatch through bootstrap walker The walker explicitly skipped ModeScanAtDate actions on the comment "SCAN_AT_DATE runs its own date-triggered bootstrap" — but no such bootstrap exists in the scheduler or shell layer. The result: rules with Expiration{Date: ...} compiled correctly, populated the snapshot's dateActions map, and were never dispatched. ExpirationDate is silently a no-op in production. EvaluateAction already handles ActionKindExpirationDate correctly (rejects when now.Before(rule.ExpirationDate), otherwise emits ActionDeleteObject). The walker just needed to fall through instead of skipping. Pre-date walks become no-ops via EvaluateAction's date check; post-date walks expire eligible objects. Un-skip TestLifecycleExpirationDateInThePast — it now exercises the fixed path end-to-end. * test(s3/lifecycle): integration coverage for multiple rules per bucket A single bucket carries two independent Expiration rules with disjoint prefix filters and different Days thresholds. Each rule must fire only on its prefix; objects outside both prefixes must survive. 
Pinned because Compile builds one CompiledAction per rule per kind all sharing the same bucket index — a bug that lets one rule's prefix or threshold leak into another (e.g. last-write-wins on a shared map) would silently expire wrong objects. Setup: rule A with prefix=logs/ Days=1, rule B with prefix=tmp/ Days=7. Three backdated objects: logs/access.log, tmp/scratch.bin, data/keep.bin. After the worker runs, logs/ + tmp/ are gone; data/ — outside both rule prefixes — survives. * fix(s3/lifecycle): mark ScanAtDate actions active in Compile Two layers were silently filtering ScanAtDate actions out of routing: the walker's mode skip (fixed in `e785f59d6`) and Compile only marking ModeEventDriven actions active. MatchPath / MatchOriginalWrite both require IsActive() to emit a key, so a ScanAtDate action that's never marked active never reaches a dispatch path even after the walker falls through. ScanAtDate's only dispatch path is the bootstrap walk's MatchPath call — there's no bootstrap-completion rendezvous to wait on. Make the active flag include ModeScanAtDate alongside the EventDriven+BootstrapComplete combination. ExpirationDate-based rules now actually fire end-to-end. The TestLifecycleExpirationDateInThePast integration test exercises this. * fix(s3/lifecycle): route date kinds via ComputeDueAt ExpirationDate has MinTriggerAge=0, so router computed dueTime = info.ModTime + 0 = info.ModTime. For a backdated entry that mtime is BEFORE rule.ExpirationDate, so EvaluateAction's now.Before(rule.ExpirationDate) check returned ActionNone and the date rule never fired through the event-driven path. ComputeDueAt already knows the per-kind shape — rule.ExpirationDate for date kinds, ModTime+Days for the rest — so use it as the single source of truth for dueTime in Route's main loop. * test(s3/lifecycle): pin bootstrap walker date dispatch The original TestWalk_DateActionsSkipped pinned the pre-e785f59d6 behavior that the regular walker skipped ExpirationDate. 
That walker was rewired to fire date rules whose date has passed (the SCAN_AT_DATE bootstrap was never wired); update the test to match. Split into two: post-date entries dispatch, pre-date entries don't. * test(s3/lifecycle): drop unused putExpiredDeleteMarkerLifecycle The helper was never called — TestLifecycleExpiredDeleteMarkerCleanup constructs a combined noncurrent + expired-marker rule inline, which the helper doesn't cover. The blank-assignment workaround was just hiding dead code; remove both. * test(s3/lifecycle): tighten HeadObject termination check to typed not-found Generic err != nil also passes on transport, auth, and timeout errors, letting the test go green without proving the lifecycle action actually fired. Switch the three Eventuallyf HeadObject predicates to isS3NotFound, matching the pattern already in the multi-bucket and expiration-date tests. * test(s3/lifecycle): guard ListObjectVersions diagnostic against nil When ListObjectVersions errors, listOut is nil and the diagnostic log path panics on listOut.Versions before the real assertion fires. Branch on (listErr != nil || listOut == nil) so the failure log is robust whatever ListObjectVersions returned.	2026-05-10 09:30:50 -07:00
Chris Lu and GitHub	b740e22e63	test(s3/lifecycle): bundle dispatcher + engine edge-case coverage (#9413) * test(s3/lifecycle): bundle dispatcher + engine edge-case coverage Two-package bundle covering uncovered branches in production code that the existing happy-path tests don't reach. Dispatcher 58.1% → 60.2% and engine 81.0% → 81.7% (engine lift modest because most branches were already hit; the nil-rule defensive case is otherwise unreachable from a Compile flow). dispatcher (4 tests): - FilerPersister.Load with nil Store errors with a "nil Store" message rather than panicking at the Read call. - FilerPersister.Save with nil Store: same. - FilerPersister.Load with a non-NotFound transport error wraps the shard ID into the message AND keeps the underlying error recoverable via errors.Is. - FilerPersister.Load with successful empty []byte returns an empty map, not a JSON-decode error — pinning that an existing-but-empty cursor file is treated as "no entries". - Tick initializes the retries map on first call without panic so a freshly-constructed Dispatcher works. - Tick with already-canceled ctx re-queues the popped Match, returns zero, and never invokes the LifecycleDelete client — the Match must not be lost across worker restart. engine (4 tests): - rulePredicateSensitive(nil) returns false rather than panicking on the FilterTags dereference. The non-nil paths run through Compile, but a defensive nil-rule arrival isn't reachable that way. - rule with no FilterTags / empty FilterTags map returns false (the check is len(FilterTags) > 0, so empty must classify as non-sensitive — pinning catches a flipped >= comparison). - rule with a populated FilterTags returns true. * fix(s3/lifecycle): Tick must requeue every drained Match on shutdown Per codex review on #9413: Tick called Schedule.Drain to pop ALL due matches at once, then iterated. 
If ctx was canceled mid-loop, only the current Match was re-added — everything past that index was silently lost across the worker restart. With N due matches, up to N-1 were dropped. Fix: on cancellation, re-add due[i:] (current + remaining) before returning. Matches already dispatched (due[:i]) stay processed; the schedule is left exactly as it would be if Drain had returned only the dispatched prefix. Strengthen the existing test to enqueue three due matches and assert sched.Len()==3 after a pre-canceled Tick. Before the fix, the test would have seen Len()==1 because only the first popped Match was re-added.	2026-05-09 22:02:17 -07:00
Chris Lu and GitHub	ca95d33092	test(s3/lifecycle): bundle dispatcher + engine accessor coverage (#9410) * test(s3/lifecycle): bundle dispatcher + engine accessor coverage Two-package bundle covering pure helpers and snapshot read-side accessors that the router and dispatcher reach for at runtime. None were directly tested; regressions previously surfaced only as downstream Tick / Match / Compile failures. dispatcher (10 tests): - keyOf: derives every retryKey field from the Match; equal Match values produce equal keys (so the second dispatch hits the first's retry counter); distinct VersionIDs and ActionKinds produce distinct keys (so a noisy version can't starve a healthy one, and two kinds on the same object don't share a budget). - budget(): configured value when set; defaultRetryBudget when zero or negative — pins the >0 guard against a flipped comparison. - backoff(): same pattern as budget for RetryBackoff. engine snapshot accessors (8 tests): - OriginalDelayGroups exposes the compiled per-delay groups; rules with multiple kinds at different cadences land in distinct entries; scan-only actions don't leak into delay groups so the dispatcher doesn't try to drive them event-driven. - PredicateActions populated for tag-sensitive rules, empty for non-tag-sensitive ones (so MatchPredicateChange doesn't route irrelevant kinds). - DateActions surfaces ExpirationDate verbatim for date kinds; empty for non-date rules. - MarkActive on an unknown key is a no-op (durable bootstrap-complete write races a recompile that drops the rule; panic here would crash the worker). - MarkActive flips a fresh-no-prior-state action from inactive to active. - BucketActionKeys covers every kind RuleActionKinds reports. * test(s3/lifecycle): strengthen snapshot accessor content assertions Per gemini review on #9410: assertions previously only checked counts and non-empty status. 
Verify the specific ActionKeys land where expected so an indexing regression that produces the right number of items with wrong kinds gets caught. OriginalDelayGroups: each delay group's slice asserts.Contains the specific (bucket, rule_hash, kind) ActionKey instead of just NotEmpty. PredicateActions: assert.Contains the expected key instead of just NotEmpty. BucketActionKeys: every key.Bucket must equal the test bucket (catches cross-bucket leak), and ElementsMatch pins kinds against RuleActionKinds.	2026-05-09 22:01:54 -07:00
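The keyOf / budget() / backoff() contracts pinned above can be expressed as a hypothetical Go sketch. The field names and the default values (5 retries, 30s backoff) are assumptions for illustration; they are not the dispatcher's actual constants.

```go
package main

import "time"

// match is a hypothetical, simplified dispatch Match.
type match struct {
	Bucket, Path, VersionID, RuleHash string
	Kind                              int
}

// retryKey identifies one retry budget. Including VersionID and Kind means a
// failing version or kind cannot consume another one's retries.
type retryKey struct {
	Bucket, Path, VersionID, RuleHash string
	Kind                              int
}

// keyOf derives every retryKey field from the Match, so equal matches share a key.
func keyOf(m match) retryKey {
	return retryKey{m.Bucket, m.Path, m.VersionID, m.RuleHash, m.Kind}
}

// Hypothetical defaults, used when the configured value is zero or negative.
const (
	defaultRetryBudget  = 5
	defaultRetryBackoff = 30 * time.Second
)

type config struct {
	RetryBudget  int
	RetryBackoff time.Duration
}

// budget returns the configured value only when it is > 0.
func (c config) budget() int {
	if c.RetryBudget > 0 {
		return c.RetryBudget
	}
	return defaultRetryBudget
}

// backoff follows the same > 0 guard as budget.
func (c config) backoff() time.Duration {
	if c.RetryBackoff > 0 {
		return c.RetryBackoff
	}
	return defaultRetryBackoff
}
```

Because the key is a comparable struct rather than a formatted string, it can serve directly as a map key for per-key retry counters.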
Chris Lu and GitHub	0955d1aa08	test(s3/lifecycle): direct prefixMatches + filterAllows coverage (#9408) Both helpers were exercised indirectly through MatchOriginalWrite / MatchPath; pinning them directly catches a regression at the helper level so a Match-test failure isn't the first signal of a broken filter. prefixMatches: empty prefix fast path; exact-prefix match; non-match rejection; path shorter than prefix. filterAllows: no-filter accepts any event; FilterSizeGreaterThan is strictly > (boundary value rejected); FilterSizeLessThan is strictly <; zero-size thresholds mean "not set" (must let any size through — a regression treating 0 as a real threshold would reject everything); required tag present accepts; missing key, empty tags map, wrong value, and missing-among-multiple all reject; size + tag filters are AND'd so either failing rejects.	2026-05-09 20:47:35 -07:00
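A minimal sketch of the filter semantics these tests pin: strict size bounds, zero meaning "not set", and tag requirements that are AND'd together. The `rule` and `event` types are simplified, hypothetical shapes.

```go
package main

import "strings"

// event and rule are simplified, hypothetical stand-ins.
type event struct {
	Size int64
	Tags map[string]string
}

type rule struct {
	Prefix                string
	FilterSizeGreaterThan int64 // 0 = not set
	FilterSizeLessThan    int64 // 0 = not set
	FilterTags            map[string]string
}

// prefixMatches treats an empty prefix as match-all.
func prefixMatches(prefix, path string) bool {
	return prefix == "" || strings.HasPrefix(path, prefix)
}

// filterAllows applies strict size bounds and requires every tag;
// all conditions must hold (AND).
func filterAllows(r rule, ev event) bool {
	if r.FilterSizeGreaterThan > 0 && ev.Size <= r.FilterSizeGreaterThan {
		return false // strictly >: the boundary value is rejected
	}
	if r.FilterSizeLessThan > 0 && ev.Size >= r.FilterSizeLessThan {
		return false // strictly <
	}
	for k, v := range r.FilterTags {
		if got, ok := ev.Tags[k]; !ok || got != v {
			return false // missing key or wrong value
		}
	}
	return true
}
```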
Chris Lu and GitHub	1aa55f5bf9	test(s3/lifecycle): direct decideMode + RuleMode.String coverage (#9405) Compile tests cover decideMode indirectly; these direct tests pin every branch so a regression in the classifier itself can't slip behind a more elaborate Compile failure. Pinned: nil rule and Disabled status both → Disabled; ExpirationDate → ScanAtDate without consulting retention; metaLogRetention=0 means unbounded so any horizon → EventDriven; horizon within retention → EventDriven; horizon exceeding retention → ScanOnly; bootstrapLookback adds to horizon (not retention) so a near-threshold case is still gated; zero horizon (rule field unset) skips the gate. RuleMode.String must render the documented names for every variant; an unknown value collapses to "unspecified" rather than rendering empty or panicking.	2026-05-09 20:35:34 -07:00
Chris Lu and GitHub	05d31a04b6	fix(s3tests): wire lifecycle worker for expiration suite (#9374) * fix(s3tests): wire lifecycle worker for expiration suite The upstream s3-tests `test_lifecycle_expiration` / `test_lifecyclev2_expiration` exercise the "set rule, wait, verify deletion" path. Phase 4 (#9367) intentionally stripped the PUT-time back-stamp, so pre-existing objects no longer pick up TtlSec on a freshly-applied rule. The s3tests CI bare-bones `weed -s3` had nothing left driving expiration. Three changes that work together: - Engine scales `Days` by `util.LifeCycleInterval`. Production keeps the 24h day; the `s3tests` build tag shrinks it to 10s so a `Days: 1` rule completes inside the suite's 30s polling window. Exported `DaysToDuration` so sibling-package tests pin to the same scale. - Scheduler/dispatcher tick defaults split into `_default` / `_s3tests` files. Production stays 5s/30s/5m; the test build runs at 500ms/2s/2s so deletions land within a couple of ticks of becoming due. - s3tests.yml spawns `weed shell s3.lifecycle.run-shard -shards 0-15 -events 0 -runtime 1800s` alongside the s3 server in both the basic and SQL blocks; the shell command runs the full pipeline (reader + scheduler + dispatcher) for the duration of the suite. `test_lifecycle_expiration_versioning_enabled` is left out for now — versioned-bucket expiration via the worker still needs its own pass. Drive-by: bump `TestWorkerDefaultJobTypes` to 7 to match the registered handler count (`8b87ceb0d` updated `mini_plugin_test.go` for the s3_lifecycle plugin but missed this twin test). Two retention-gate engine tests `t.Skip` under the s3tests build because they rely on absolute lookback-vs-retention math the day-rescale collapses; the prod build still covers them. 
* review: harden lifecycle worker spawn + assert handler identity - Workflow: aliveness check on the backgrounded `weed shell` (a bad command exits in <1s, and the suite would otherwise just time out with no diagnostic); move worker/server teardown into a `trap cleanup EXIT` so failure paths still print the worker log and reap the data dir. - worker_test: check the actual job-type set by name, not just the count. * fix(shell): keep s3.lifecycle.run-shard alive when no rules exist yet The s3-tests CI runs the worker BEFORE any test creates a bucket, so LoadCompileInputs returns empty and the shell command was bailing out with "no buckets with enabled lifecycle rules found" within ~1s. The aliveness check then fired exit 1 before tox ever started. Two changes: - Don't early-exit on empty inputs. Compile against the empty set, log a one-liner, and let the pipeline run normally — the meta-log subscription is already up, so events for buckets created later DO arrive; they just need the engine to know about them when they do. - Add `-refresh <duration>` (default 5m, 2s in s3tests CI) that periodically re-runs LoadCompileInputs + engine.Compile so rules added after startup land in the snapshot the dispatcher reads on its next tick. Production deployments keep the 5m default; only the CI workflow drops to 2s. Workflow passes `-refresh 2s` in both basic and SQL blocks. * fix(shell): backfill pre-rule entries via bootstrap walker The reader-driven path only sees meta-log events created AFTER its engine snapshot knows the rule. The s3-tests CI scenario PUTs objects first, then PUTs the lifecycle config, so by the time the engine refresh picks up the new bucket the object events have already been seen-and-dropped (BucketActionKeys returned empty for the bucket). Wire bootstrap.Walk into the shell command: - bucketBootstrapper tracks buckets seen so far. kickOffNew spawns one loop goroutine per fresh bucket. 
- Each goroutine re-walks the bucket every walkInterval (defaults to the same value as -refresh, i.e. 2s in s3tests CI, 5m in prod) and feeds each entry through bootstrap.Walk; due actions dispatch via a direct LifecycleDelete RPC. Not-yet-due entries are silently skipped and picked up on a later iteration once they age past their (rescaled or real) threshold. - LifecycleDelete is called with no expected_identity; the server-side identityMatches treats nil as "skip CAS", which is the right call for bootstrap (the bootstrap entry doesn't carry chunk fid / extended hash anyway). The dispatcher's pkg-private toProtoActionKind is duplicated in the shell file rather than exported, since the shape is six lines and the reverse import would pull a proto dep into the s3lifecycle root. * refactor(s3/lifecycle): hoist bucket bootstrapper into scheduler pkg The shell command got the backfill in the previous commit but the worker plugin (weed/worker/tasks/s3_lifecycle/handler.go) drives Scheduler.Run directly and missed it — same root cause: the reader-driven path only sees events created after the rule lands, so a daily cron picking up a freshly-PUT rule wouldn't expire any pre-rule object. Move the looping bucket walker into scheduler.BucketBootstrapper: - Scheduler.Run now constructs one and calls KickOffNew on every engine refresh. Per-bucket goroutines re-walk every BootstrapWalkInterval (defaults to RefreshInterval — 5m in prod, 2s under s3tests). - The shell command consumes the same struct instead of its own copy so the two paths can't drift in semantics. * refactor(s3/lifecycle): walk-once + schedule via event injection Previous per-bucket walker re-listed every WalkInterval forever. For a bucket with N objects under a long rule, the worker did O(N * runtime / walkInterval) listings even when nothing was newly due — way too much for production-scale buckets. 
New approach: walk each bucket exactly once on first sight, synthesize one reader.Event per existing entry, push it onto Pipeline.events. Router.Route builds a Match with DueTime=mtime+delay; future-due matches sit in the per-shard Schedule and fire when their DueTime arrives. Currently-due matches fire on the very next dispatch tick. Wiring: - dispatcher.Pipeline lifts its events channel into a struct field with sync.Once init, and exposes InjectEvent(ctx, ev). Reader no longer closes the channel — the dispatch goroutine exits on runCtx cancellation, which works the same as channel-close did. - scheduler.BucketBootstrapper drops the WalkInterval ticker. KickOffNew spawns one walker goroutine per fresh bucket; the goroutine lists, synthesizes events, then exits. - scheduler.Scheduler builds its pipelines up front and exposes a pipelineFanout (shard -> Pipeline) as the EventInjector, so a multi- worker scheduler routes each synthesized event to the pipeline that owns its shard. - Shell command's single-pipeline path passes pipeline.InjectEvent directly. Synthesized events carry TsNs=0; dispatcher.advance treats that as a no-op so the reader's persisted cursor isn't ratcheted past unprocessed meta-log events. Identity (HeadFid + ExtendedHash) is still computed from the real filer entry, so the server's identity-CAS catches an overwrite between bootstrap and dispatch. debug(s3tests): make lifecycle worker progress visible in CI logs The previous CI failure dumped an empty $LC_LOG even though the worker was running. Two reasons: 1. weed shell suppresses glog by default (logtostderr / alsologtostderr set to false). Pass `-debug` so the bootstrapper's V(0) lines reach stderr instead of disappearing into /tmp/weed..log. 2. cleanup used `kill -9` which skips Go's stdout flush. SIGTERM first with a 1s grace, then SIGKILL the holdout, then read the log. 
While here: bump the bootstrap walker's two informational logs to V(0) so the diagnosis from CI doesn't require -v=1 on the worker. fix(s3/lifecycle/dispatcher): refresh snap on every event Pipeline.Run captured snap at startup and only refreshed it on the dispatch tick. With bootstrap event injection, the walker pushes events seconds after engine.Compile sees the bucket — typically WITHIN the same dispatch interval. Routing against the cached (empty) snap then silently dropped every match because BucketActionKeys returned nil for the bucket-not-yet-in-snapshot case. Re-fetch on each event. Engine.Snapshot is an atomic.Pointer.Load, so the cost is negligible. The dispatch-tick branch keeps using a fresh local read for its own loop, so its semantics are unchanged.	2026-05-08 17:29:47 -07:00
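This hypothetical sketch shows the two dispatcher rules above together. First, the snapshot is re-read for every event, which costs one atomic load. Second, the persisted cursor never advances for synthesized `TsNs=0` events. The `pipeline`, `snapshot` and `event` types are simplifications. Ignoring non-increasing timestamps is an added assumption.

```go
package main

import "sync/atomic"

// snapshot is a hypothetical immutable routing snapshot.
type snapshot struct{ buckets map[string]bool }

// event is a simplified reader event.
type event struct {
	Bucket string
	TsNs   int64 // 0 for events synthesized by the bootstrap walker
}

type pipeline struct {
	snap   atomic.Pointer[snapshot] // swapped by the engine on recompile
	cursor int64                    // persisted meta-log position
	routed []string
}

// handle re-reads the snapshot per event, so an event injected right after a
// recompile is routed against the new rules instead of a stale, empty snap.
func (p *pipeline) handle(ev event) {
	if s := p.snap.Load(); s != nil && s.buckets[ev.Bucket] {
		p.routed = append(p.routed, ev.Bucket)
	}
	p.advance(ev.TsNs)
}

// advance ratchets the cursor forward. TsNs == 0 (synthesized) is a no-op,
// so the cursor never skips past unprocessed meta-log events.
func (p *pipeline) advance(tsNs int64) {
	if tsNs == 0 || tsNs <= p.cursor {
		return
	}
	p.cursor = tsNs
}
```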
Chris Lu and GitHub	8425c42858	feat(s3/lifecycle): event router + schedule (Phase 3 PR-C) (#9355) feat(s3/lifecycle): event router + DueTime schedule Router consumes per-shard reader events, looks up matching ActionKeys via the engine's BucketActionKeys index, and emits Matches with DueTime = event_time + action.Delay. Evaluation runs at DueTime so the age gate passes for fresh events; the dispatcher's identity-CAS catches drift. Schedule is a min-heap by DueTime; duplicates allowed (RPC CAS handles the redundant dispatch as NOOP_RESOLVED). BucketActionKeys accessor added to engine.Snapshot.	2026-05-07 15:43:27 -07:00
Chris Lu and GitHub	7f2b20d577	feat(s3/lifecycle): policy engine — XML conversion, Compile, decideMode, Match (#9348) * feat(s3/lifecycle): XML lifecycle config to canonical Rule LifecycleToCanonical takes a parsed Lifecycle and returns []s3lifecycle.Rule, the flat shape the engine compiles against. Filter resolution mirrors AWS: <And> sub-elements (Prefix + Tags + size filters) flatten into the canonical Rule's individual fields; single <Tag> filter populates FilterTags with one entry; <Prefix> filter takes precedence over the rule's top-level <Prefix>. Multi-action rules (Expiration + NoncurrentVersion + AbortMPU on the same XML <Rule>) populate every action field they declare. RuleActionKinds expands the canonical rule into its compiled actions downstream. * feat(s3/lifecycle): engine snapshot skeleton + ActionKey type Defines s3lifecycle.ActionKey{rule_hash, action_kind} as the engine's primary identity, and adds the engine package's Snapshot type. Snapshot is immutable after Compile (atomic-swapped on rebuild) and holds the ActionKey-keyed routing indexes: - originalDelayGroups: map[time.Duration][]ActionKey - predicateActions: []ActionKey - dateActions: map[ActionKey]time.Time - actions: map[ActionKey]CompiledAction CompiledAction.engineState is an atomic.Uint32 so MarkActive (called after the durable bootstrap_complete + mode write commits) is visible to in-flight reader passes without a recompile. The reader filters on IsActive() before dispatching, so stale-snapshot dispatches are prevented. No callers yet; downstream commits add Compile, decideMode, and the Match functions. feat(s3/lifecycle): decideMode + retention gate decideMode picks the scheduling mode for one (rule, kind) compiled action. Disabled rule -> DISABLED; EXPIRATION_DATE -> SCAN_AT_DATE; reader-driven kind whose eventLogHorizon + bootstrapLookbackMin exceeds metaLogRetention -> SCAN_ONLY; otherwise EVENT_DRIVEN. 
The gate runs per (rule, kind), so a 90d ExpirationDays sibling can degrade to scan_only while its 7d AbortMPU sibling stays active. MetaLogRetention=0 is treated as "unbounded" — matches the SeaweedFS default (Phase 0 verified that meta-log files are written without TtlSec by default), so the gate doesn't trip until an operator opts in to volume-TTL pruning of /topics/.system/log/. RuleMode is a Go-level enum here, separate from the wire-form LifecycleState.RuleMode in the proto package; the worker maps between them when reading/writing the durable state file. * feat(s3/lifecycle): Compile builds the engine snapshot per-action Compile produces a fresh Snapshot from per-bucket canonical rules. Each input rule expands into N CompiledActions via RuleActionKinds; mode comes from decideMode; activation requires both bootstrap_complete (from PriorStates) and mode==EVENT_DRIVEN. Routing indexes are populated by mode: - SCAN_AT_DATE: always indexed in dateActions (detector schedules at rule.date regardless of bootstrap status; the action runs once on the date and is then done). - EVENT_DRIVEN + active: indexed in originalDelayGroups (and in predicateActions when the rule has tag/size filters). - SCAN_ONLY / DISABLED / pending_bootstrap: not indexed; safety-scan tick or operator action handle these. snapshot_id is monotonic per process; pending writes stamp it. The new snapshot replaces the engine's atomic pointer; in-flight reader passes continue against their loaded snapshot. Tests cover: single-action rule, multi-action expansion (one rule -> three CompiledActions with three distinct delay groups), pending bootstrap exclusion from indexes, retention gate, sibling actions degrading independently under partial retention, ExpirationDate path, disabled rule, MarkActive flipping IsActive(), Compile producing monotonic snapshot ids. 
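The atomic activation flag can be sketched together with a bucket-scoped ActionKey. The Bucket field comes from the cross-bucket collision fix in this same PR. The types and the 0/1 state encoding are hypothetical.

```go
package main

import "sync/atomic"

// actionKey mirrors the (bucket, rule_hash, action_kind) identity; the
// Bucket field keeps identical rules in two buckets from colliding.
type actionKey struct {
	Bucket   string
	RuleHash string
	Kind     int
}

// compiledAction keeps its activation bit in an atomic, so MarkActive is
// visible to in-flight readers without rebuilding the snapshot.
type compiledAction struct {
	engineState atomic.Uint32 // 0 = inactive, 1 = active (assumed encoding)
}

func (a *compiledAction) MarkActive()    { a.engineState.Store(1) }
func (a *compiledAction) IsActive() bool { return a.engineState.Load() == 1 }

// snapshot membership is fixed after compile; only engineState flips.
type snapshot struct {
	actions map[actionKey]*compiledAction
}

// MarkActive on an unknown key (e.g. dropped by a racing recompile) is a
// no-op rather than a panic.
func (s *snapshot) MarkActive(k actionKey) {
	if a, ok := s.actions[k]; ok {
		a.MarkActive()
	}
}
```

The map stores pointers because atomic values must not be copied after first use.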
* feat(s3/lifecycle): MatchOriginalWrite / MatchPredicateChange / MatchPath
The reader feeds events through the engine's match functions to find the active ActionKeys whose filter applies. The engine takes a minimal Event shape (bucket, path, tags, size, IsLatest, IsDeleteMarker, IsMPUInit), which keeps it free of filer_pb dependencies. The reader extracts these fields from the persisted filer_pb.LogEntry payload in Phase 3.
- MatchOriginalWrite: per-delay-group sweep entry. Filters on shape = EventShapeOriginalWrite, prefix, tag, and size, then applies per-kind shape gating (ABORT_MPU only on IsMPUInit; EXPIRED_DELETE_MARKER only on IsLatest+IsDeleteMarker).
- MatchPredicateChange: single near-now sweep. Returns only the predicate-sensitive subset of active ActionKeys.
- MatchPath: bucket-level walker entry. Returns every active action whose filter matches; bootstrap iterates these per object and calls EvaluateAction per kind.
All three filter on a.IsActive() at routing time, so MarkActive flips become visible without a recompile.
* fix(s3/lifecycle): scope ActionKey by bucket; defensive copies; tidy compile
This addresses three findings on the engine PR:
1. Critical (cross-bucket collision): ActionKey was {RuleHash, ActionKind} only. Two buckets whose rules have identical XML produce the same RuleHash, so the second bucket's Compile would overwrite the first bucket's CompiledAction in snap.actions. Add Bucket to ActionKey so the engine's identity matches the on-disk path layout /etc/s3/lifecycle/<bucket>/<rule_hash>/<action_kind>/. A regression test pins it.
2. Major (immutability leak): OriginalDelayGroups, PredicateActions, and DateActions returned the snapshot's internal maps/slices by reference. An external caller could therefore mutate routing state and break the documented immutability contract. Return defensive copies.
3. Minor (redundant condition): mode == EVENT_DRIVEN already implies kind != EXPIRATION_DATE, because decideMode routes the date kind to SCAN_AT_DATE. Drop the redundant check.
Tests updated to construct ActionKey with the new Bucket field.
* fix(s3/lifecycle): drop size filters from rulePredicateSensitive
An object's size is immutable once written. Any content change is a fresh write that flows through the original-write stream, not the predicate-change one. Tag-filtered rules really can flip after the PUT (an operator adds or removes a tag without rewriting the object), so they belong here; size filters do not. Including size filters added rules to predicateActions for no purpose. Every predicate-change sweep wasted cycles re-evaluating size predicates that physically can't have changed.
* perf(s3/lifecycle): pre-sort AllActions at Compile time
Snapshot is immutable after Compile (engineState bit flips don't change membership), so the (bucket, rule_hash, action_kind) ordering is stable for the snapshot's lifetime. Build the sorted slice once and serve every AllActions() call from it; drop the per-call sort.Slice. The bootstrap walker is the primary caller and may iterate this on every task entry.
* docs(s3/lifecycle): note the FilterSizeGreaterThan=0 ambiguity
Per the AWS S3 spec, <ObjectSizeGreaterThan>0</ObjectSizeGreaterThan> explicitly excludes 0-byte objects. Because the int64 zero value serves as the unset sentinel, we can't distinguish that from omitted-and-default. Document the limitation inline, so that a future deployment needing the distinction can switch to *int64 (or a paired set-bool) and update the matchers / RuleHash accordingly. Not fixing now, for three reasons:
- The explicit-zero configuration is unusual.
- The canonical Rule shape mirrors the same zero-as-unset convention as s3api.Filter.
- A structural fix touches every filter-using site (evaluator, due_at, match, RuleHash).
* fix(s3/lifecycle): make ObjectInfo.NoncurrentIndex *int
The previous int field had a zero-value collision: 0 meant both "newest non-current version" (a valid index) and "uninitialised by an ObjectInfo{} literal."
A caller who built &ObjectInfo{IsLatest: false} without explicitly setting NoncurrentIndex would have it implicitly read as "newest non-current". The count-based NewerNoncurrent retention would then use that bogus 0 to decide eligibility. Switch to *int so nil explicitly means "not a non-current version / index not yet computed." The evaluator's NoncurrentDays and NewerNoncurrent paths conservatively return ActionNone when the index is nil; the safety scan will revisit once the index is supplied. This removes a class of latent footguns in test setup and in any future code path that constructs ObjectInfo without a versioning-aware builder. An idx() helper was added in tests to keep each call site a one-liner.
* refactor(s3/lifecycle): trim narration from engine + helpers
Drop "what" comments where well-named identifiers already say it (IsActive, MarkActive, AllActions, etc.). Collapse multi-paragraph "why" docs to one-liners where the design rationale is already in the design doc. Keep WHY comments only at non-obvious load-bearing spots:
- the routing-index activation predicate
- the *int rationale on NoncurrentIndex
- the field-tag namespace in RuleHash
- the SmallDelay horizon rule
Files: action_kind.go, rule.go, rule_hash.go, evaluate.go, due_at.go, min_trigger_age.go, event_log_horizon.go, engine/engine.go, engine/compile.go, engine/match.go, engine/mode.go. No behavior change; tests untouched and pass.
* fix(s3/lifecycle): durable PriorState.Mode wins over decideMode
PriorState.Mode was declared but never read: Compile recomputed the mode via decideMode and stored that on every CompiledAction. Consider an action durably persisted as SCAN_ONLY (lag fallback or operator pause) or DISABLED. It would silently re-promote to EVENT_DRIVEN on the next engine rebuild as soon as decideMode's XML+retention predicate allowed it, which defeated the durability of mode state.
Use prior.Mode when set. Fall through to decideMode only for new actions (no prior at all) and for legacy entries persisted before Mode existed (zero value). A regression test pins both branches.
* fix(s3/lifecycle): MarkActive routability (index every EVENT_DRIVEN key)
MarkActive's documented contract was "flip visible without a recompile". However, the routing indexes (originalDelayGroups, predicateActions) were only populated when active && mode == EVENT_DRIVEN at compile time. A key compiled with BootstrapComplete=false therefore never entered the indexes. A later MarkActive flipped engineState, but MatchOriginalWrite / MatchPredicateChange iterated the indexes and never saw the key; only MatchPath (which walks bi.actionKeys) and DateActions worked. Index every EVENT_DRIVEN key regardless of `active`. The runtime IsActive() filter inside filterMatching already gates dispatch, so inactive entries are matched but not fired. Flipping MarkActive then makes them routable without a recompile, matching the documented contract. Tests updated:
- TestCompile_BootstrapPendingIndexedButInactive asserts the indexed-but-inactive shape.
- TestMatchOriginalWrite_MarkActiveBecomesRoutable asserts that a MarkActive flip routes the next match.
* test(s3/lifecycle): pin nil NoncurrentIndex no-op behavior
Two regression tests for the int pointer migration. A nil index combined with NewerNoncurrent (either paired with NoncurrentDays or standalone) must short-circuit to ActionNone rather than guess at the version's position in the keep-N window.
* refactor(s3/lifecycle): trim follow-up narration on engine + helpers
Several comments had accumulated since the last sweep:
- the durable-Mode rationale
- the MarkActive routability note
- the routing-index doc
- the NoncurrentIndex pointer rationale
- the EvaluateAction docblock
Each was trimmed to one or two terse lines; the underlying contracts live in the design doc.
* docs(s3/lifecycle): note CompileInput one-per-bucket invariant	2026-05-07 15:00:49 -07:00