mirror of
https://github.com/seaweedfs/seaweedfs.git
synced 2026-05-29 13:10:21 +00:00
ca95d33092400a58d24977e60ec93e20a25809f1
13770 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
ca95d33092 |
test(s3/lifecycle): bundle dispatcher + engine accessor coverage (#9410)
* test(s3/lifecycle): bundle dispatcher + engine accessor coverage
Two-package bundle covering pure helpers and snapshot read-side accessors that the router and dispatcher reach for at runtime. None were directly tested; regressions previously surfaced only as downstream Tick / Match / Compile failures.
dispatcher (10 tests):
- keyOf: derives every retryKey field from the Match; equal Match values produce equal keys (so the second dispatch hits the first's retry counter); distinct VersionIDs and ActionKinds produce distinct keys (so a noisy version can't starve a healthy one, and two kinds on the same object don't share a budget).
- budget(): configured value when set; defaultRetryBudget when zero or negative; pins the >0 guard against a flipped comparison.
- backoff(): same pattern as budget for RetryBackoff.
engine snapshot accessors (8 tests):
- OriginalDelayGroups exposes the compiled per-delay groups; rules with multiple kinds at different cadences land in distinct entries; scan-only actions don't leak into delay groups so the dispatcher doesn't try to drive them event-driven.
- PredicateActions populated for tag-sensitive rules, empty for non-tag-sensitive ones (so MatchPredicateChange doesn't route irrelevant kinds).
- DateActions surfaces ExpirationDate verbatim for date kinds; empty for non-date rules.
- MarkActive on an unknown key is a no-op (durable bootstrap-complete write races a recompile that drops the rule; panic here would crash the worker).
- MarkActive flips a fresh-no-prior-state action from inactive to active.
- BucketActionKeys covers every kind RuleActionKinds reports.
* test(s3/lifecycle): strengthen snapshot accessor content assertions
Per gemini review on #9410: assertions previously only checked counts and non-empty status. Verify the specific ActionKeys land where expected so an indexing regression that produces the right number of items with wrong kinds gets caught.
OriginalDelayGroups: each delay group's slice is checked with assert.Contains for the specific (bucket, rule_hash, kind) ActionKey instead of just NotEmpty. PredicateActions: assert.Contains the expected key instead of just NotEmpty. BucketActionKeys: every key.Bucket must equal the test bucket (catches cross-bucket leak), and ElementsMatch pins kinds against RuleActionKinds. |
||
|
|
0955d1aa08 |
test(s3/lifecycle): direct prefixMatches + filterAllows coverage (#9408)
Both helpers were exercised indirectly through MatchOriginalWrite / MatchPath; pinning them directly catches a regression at the helper level so a Match-test failure isn't the first signal of a broken filter. prefixMatches: empty prefix fast path; exact-prefix match; non-match rejection; path shorter than prefix. filterAllows: no-filter accepts any event; FilterSizeGreaterThan is strictly > (boundary value rejected); FilterSizeLessThan is strictly <; zero-size thresholds mean "not set" (must let any size through — a regression treating 0 as a real threshold would reject everything); required tag present accepts; missing key, empty tags map, wrong value, and missing-among-multiple all reject; size + tag filters are AND'd so either failing rejects. |
||
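The filter semantics listed in this commit message can be sketched as follows. This is a minimal illustration, not the SeaweedFS implementation: the `filter` struct, its field names, and both function signatures are assumptions made for the example.

```go
package main

import "fmt"

// filter is a hypothetical stand-in for a lifecycle rule filter.
// A zero size threshold means "not set".
type filter struct {
	SizeGreaterThan int64
	SizeLessThan    int64
	Tags            map[string]string
}

// prefixMatches reports whether key starts with prefix.
// An empty prefix is the fast path and matches every key.
func prefixMatches(prefix, key string) bool {
	if prefix == "" {
		return true
	}
	return len(key) >= len(prefix) && key[:len(prefix)] == prefix
}

// filterAllows applies the size and tag conditions. Every condition that is
// set must hold (logical AND); a nil filter accepts everything.
func filterAllows(f *filter, size int64, tags map[string]string) bool {
	if f == nil {
		return true
	}
	// Strictly greater-than: the boundary value itself is rejected.
	if f.SizeGreaterThan > 0 && size <= f.SizeGreaterThan {
		return false
	}
	// Strictly less-than: the boundary value itself is rejected.
	if f.SizeLessThan > 0 && size >= f.SizeLessThan {
		return false
	}
	// Every required tag must be present with the expected value.
	for k, want := range f.Tags {
		if got, ok := tags[k]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	f := &filter{SizeGreaterThan: 100, Tags: map[string]string{"env": "prod"}}
	prod := map[string]string{"env": "prod"}
	fmt.Println(filterAllows(f, 100, prod))     // false: boundary value rejected
	fmt.Println(filterAllows(f, 101, prod))     // true
	fmt.Println(filterAllows(f, 101, nil))      // false: required tag missing
	fmt.Println(prefixMatches("logs/", "logs")) // false: path shorter than prefix
}
```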
|
|
edbe7ab140 |
test(s3/lifecycle): meta-log Event builder + monotonic clock fixture (#9406)
* test(s3/lifecycle): meta-log Event builder + monotonic clock fixture
Several test files build *reader.Event ad-hoc; consolidate the common shape into the lifecycletest package as task #12 spec calls out ("fixture meta-log generator"). New tests using the builder don't have to thread Mtime / ShardID / leaf-name semantics by hand, and existing helpers can migrate over time without churning this PR.
NewCreate / NewDelete / NewUpdate cover the three event shapes; WithSize / WithModTime / WithTtlSec / WithVersionID / WithExtended / WithChunks / WithBootstrapVersion / WithShardID compose deterministic overrides. ShardID defaults to s3lifecycle.ShardID(bucket, key) so events route through the same shard the production reader would.
MetaLogClock issues monotonic timestamps with a configurable step (default 1s); concurrent-safe so fan-out fixtures don't have to lock externally.
15 unit tests pin every option, the IsCreate/IsDelete/IsUpdate discriminators, leaf-name extraction for nested keys, ShardID derivation, option-ordering semantics, the concurrent clock contract under -race, and a Peek-doesn't-advance check.
* test(s3/lifecycle): address review comments on event builder
- leafOf strips trailing slashes before splitting so directory-key fixtures (e.g. "folder/") get the slashless leaf "folder"; pre-fix it returned "" which would break router tests for directory markers.
- NewUpdate now seeds OldEntry.Attributes.Mtime with the event ts (matching NewDelete), so a downstream router that compares mtimes doesn't see a synthetic 1970 epoch on the pre-update state.
- New WithOldSize / WithOldChunks / WithOldModTime options let Update events configure pre-update state independently. The unprefixed variants still target NewEntry on Update events; the WithOld* options are no-ops on Create (no OldEntry to mutate) and never bleed into NewEntry.
5 new tests pin: directory-key + multi-slash leaf extraction; OldEntry mtime default on Update; the WithOld* targeting + Create-event no-bleed contract. |
||
|
|
9d20e71883 |
test(s3/lifecycle): cover worker handler lookupBucketsPath (#9407)
Three branches: gRPC error from GetFilerConfiguration must propagate (else Execute would proceed to dial S3 with an empty buckets path and never dispatch); a non-empty DirBuckets is honored verbatim so operators with a non-default layout aren't force-routed to /buckets; an empty DirBuckets falls back to the documented "/buckets" default rather than returning empty (which would route to root). stubFilerConfigClient embeds filer_pb.SeaweedFilerClient so methods other than the one under test panic if called — keeps the surface narrow. |
||
|
|
1aa55f5bf9 |
test(s3/lifecycle): direct decideMode + RuleMode.String coverage (#9405)
Compile tests cover decideMode indirectly; these direct tests pin every branch so a regression in the classifier itself can't slip behind a more elaborate Compile failure. Pinned: nil rule and Disabled status both → Disabled; ExpirationDate → ScanAtDate without consulting retention; metaLogRetention=0 means unbounded so any horizon → EventDriven; horizon within retention → EventDriven; horizon exceeding retention → ScanOnly; bootstrapLookback adds to horizon (not retention) so a near-threshold case is still gated; zero horizon (rule field unset) skips the gate. RuleMode.String must render the documented names for every variant; an unknown value collapses to "unspecified" rather than empty or panic. |
||
|
|
619cb39827 |
test(s3/lifecycle): pin Schedule edge cases beyond happy path (Phase 15 slice) (#9403)
* test(s3/lifecycle): pin Schedule edge cases beyond happy path
Pre-existing schedule_test covered the happy path (ordered Drain, empty schedule, duplicates, boundary-inclusive). Five new tests pin edge cases the dispatcher relies on:
- Drain at a time before any DueTime returns nil and leaves the heap intact, so the dispatcher can't accidentally consume future-due matches.
- NextDue after partial Drain points to the next earliest, catching a Drain that forgets the heap invariant.
- Add after Drain bubbles a fresh earlier DueTime to the front, so late-arriving high-priority matches don't sit behind older ones.
- Drain returns Matches in ascending DueTime order regardless of insert order: explicit pinning of the documented contract.
- Concurrent Add+Drain across 64 goroutines under -race.
* test(s3/lifecycle): actually exercise Drain in AddAfterDrain test
Per coderabbit review on #9403: the test name promised "after Drain" but the previous body only Add'd both items without ever calling Drain in between. Insert a real Drain (popping "drain_me") before the second Add, so the heap-invariant-across-Drain-then-Add path is actually pinned. Bumps the after-Drain Match's DueTime out of the way so the Drain in step 3 returns it deterministically. |
||
|
|
435ef7f94f |
test(s3/lifecycle): pin toProtoActionKind + toProtoIdentity converters (#9404)
test(s3/lifecycle): pin toProtoActionKind + toProtoIdentity The two converters are the worker-side wire to LifecycleDelete; a miss in toProtoActionKind sends ACTION_KIND_UNSPECIFIED that the server rejects FATAL, and a wrong toProtoIdentity flips the CAS witness so every dispatch comes back NOOP_RESOLVED with STALE_IDENTITY even though the entry hasn't changed. 10 tests pin: every listed s3lifecycle.ActionKind maps to its proto counterpart (table-driven, one subtest per kind); ActionKindUnspecified and a future unknown kind both collapse to ACTION_KIND_UNSPECIFIED (forward compat); nil EntryIdentity stays nil (preserves the no-CAS sentinel); a populated identity copies every field; a zero-valued identity still produces a non-nil output so the server treats it as a real CAS witness rather than no-CAS. |
||
|
|
1350e681c9 |
test(s3/lifecycle): pin Pipeline.Run dependency + shard validation (Phase 15 slice) (#9402)
* test(s3/lifecycle): pin Pipeline.Run dependency + shard validation Pre-existing TestPipelineRunRequiresDependencies only checked that an empty Pipeline errors; it didn't pin which specific dependency must be present. A refactor that makes one nilable accidentally would slip through. 8 new tests pin every validation branch in Pipeline.Run: missing Engine / Persister / Client / FilerClient each error with "missing required dependency"; missing BucketsPath errors with its own distinct message so operators can spot the missing wiring; ShardID = -1 / ShardCount errors with a range message (covers the half-open [0, ShardCount) boundary so a < to <= refactor can't introduce a one-past-the-end shard); and a multi-shard config with one out-of-range entry refuses the whole run rather than silently disabling the rest. * test(s3/lifecycle): refactor Pipeline.Run validation tests as table-driven Per gemini review on #9402: collapse the eight per-branch tests into TestPipelineRunValidation with a slice of (name, mutate, wantErr) cases. Same coverage, ~30 fewer lines, idiomatic Go pattern that makes adding a new validation case trivial. |
||
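The half-open shard range check and the table-driven shape mentioned in this commit can be sketched as follows. `validateShards` and its error text are hypothetical; the real validation lives inside `Pipeline.Run` and also checks the required dependencies, which this example leaves out.

```go
package main

import "fmt"

// validateShards rejects the whole set if any shard ID falls outside the
// half-open range [0, shardCount). One bad entry refuses the entire run.
func validateShards(shards []int, shardCount int) error {
	for _, id := range shards {
		if id < 0 || id >= shardCount {
			return fmt.Errorf("shard %d out of range [0, %d)", id, shardCount)
		}
	}
	return nil
}

func main() {
	// Table of cases in the (name, input, wantErr) style the commit describes.
	cases := []struct {
		name    string
		shards  []int
		wantErr bool
	}{
		{"negative", []int{-1}, true},
		{"one past the end", []int{4}, true},
		{"last valid", []int{3}, false},
		{"one bad among good", []int{0, 1, 7}, true},
	}
	for _, c := range cases {
		err := validateShards(c.shards, 4)
		fmt.Printf("%s: gotErr=%v wantErr=%v\n", c.name, err != nil, c.wantErr)
	}
}
```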
|
|
cb6e498e0b |
test(s3/lifecycle): pin Descriptor structural invariants (#9401)
* test(s3/lifecycle): pin Descriptor structural invariants
Pre-existing handler tests covered Capability and Detect; Descriptor was previously untested. A drift between the form fields it advertises and the defaults config.go reads silently breaks the admin UI in two ways: the form renders blank (admin can't tune) or the worker clamps to a hardcoded fallback ignoring the admin's edits. The new tests catch both directions.
Pinned: jobType / DisplayName / Description / DescriptorVersion; AdminConfigForm exposes a workers field whose default matches defaultWorkers; WorkerConfigForm has a default and a field for every cadence knob ParseConfig reads (dispatch_tick / checkpoint_tick / refresh_interval / bootstrap_interval / max_runtime); AdminRuntimeDefaults hits a daily cadence with bounded detection timeout and single job per detection.
* test(s3/lifecycle): tighten Descriptor invariant assertions
Per gemini review on #9401: pin DetectionTimeoutSeconds to its exact value (60) instead of ">0" so an accidental tweak is caught, and assert WorkerConfigForm fields are INT64 (matching ParseConfig's readInt64) so a STRING-type drift can't silently make the worker ignore admin edits. |
||
|
|
6f9668c20b |
test(s3/lifecycle): pin lifecycleDispatch validation early-returns (#9400)
Three pure-validation paths in lifecycleDispatch return BLOCKED before any filer call; without coverage a refactor could let them fall through to a real delete. ABORT_MPU at dispatch time is a defensive catch (the route bypass should never happen, but if it does the fallthrough must not become a default-case rm). Unknown ActionKind gets the same treatment for forward-compatibility with new proto values. Empty version_id on noncurrent / EXPIRED_DELETE_MARKER kinds must be rejected before deleteSpecificObjectVersion is called, so a malformed event can't silently delete the latest pointer. |
||
|
|
af2a359e45 |
feat(s3/lifecycle): metadata_only_total Prometheus counter (#9399)
Operator-visible signal for the metadata-only delete path landed in
PR 9390. Increment seaweedfs_s3_lifecycle_metadata_only_total{bucket,
rule_hash} after each successful unversioned or noncurrent / expired-
marker delete that took the skip-chunk path. Suspended-versioning
null delete is intentionally not counted: that path's nil err can
mean "deleted" or "NotFound", so a count there would over-report.
rule_hash is hex-encoded for label safety; nil bytes collapse to
"". DeleteBucketMetrics tears the new series down alongside the
existing lifecycle counters when a bucket is removed.
|
||
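The label handling described above (hex-encoded rule_hash, nil collapsing to the empty string, per-bucket teardown) can be illustrated with a plain map instead of a real metrics client. Everything here is a stand-in assumed for the example: the actual counter is a Prometheus metric, and `seriesKey`, `metadataOnlyCounter`, and the method names are hypothetical.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// seriesKey mirrors the {bucket, rule_hash} label pair.
type seriesKey struct {
	Bucket   string
	RuleHash string
}

// metadataOnlyCounter is a toy counter keyed by label values.
type metadataOnlyCounter map[seriesKey]int

// ruleHashLabel hex-encodes the hash so arbitrary bytes are label-safe.
// hex.EncodeToString(nil) returns "", so a nil hash collapses to "".
func ruleHashLabel(h []byte) string {
	return hex.EncodeToString(h)
}

func (c metadataOnlyCounter) inc(bucket string, ruleHash []byte) {
	c[seriesKey{Bucket: bucket, RuleHash: ruleHashLabel(ruleHash)}]++
}

// deleteBucket drops every series belonging to bucket.
func (c metadataOnlyCounter) deleteBucket(bucket string) {
	for k := range c {
		if k.Bucket == bucket {
			delete(c, k)
		}
	}
}

func main() {
	c := metadataOnlyCounter{}
	c.inc("photos", []byte{0xde, 0xad})
	c.inc("photos", nil)
	fmt.Println(c[seriesKey{Bucket: "photos", RuleHash: "dead"}]) // 1
	fmt.Println(c[seriesKey{Bucket: "photos", RuleHash: ""}])     // 1
	c.deleteBucket("photos")
	fmt.Println(len(c)) // 0
}
```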
|
|
c0cf1417f1 |
test(s3/lifecycle): cover worker handler Execute validation paths (#9398)
7 tests pin the Execute early-return surface that runs without a filer or S3 dial: nil request / nil Job / nil sender all error; foreign JobType errors with the offending name in the message; no S3 endpoints in cluster context errors (Execute is stricter than Detect — the admin shouldn't have routed the job there); missing filer_grpc_address parameter errors (proposal must have been tampered with or dropped); empty JobType is accepted as broadcast routing and flows through to the next validation step. The dial path itself is intentionally not covered here — those tests would need an in-process gRPC server and belong with the integration suite. |
||
|
|
284d37c3b6 |
test(s3/lifecycle): cover InMemoryPersister deep-copy contract (#9397)
* test(s3/lifecycle): cover InMemoryPersister deep-copy contract 8 tests pin the persister contract other lifecycle tests rely on for cursor checkpointing: Load on an unknown shard returns an empty map (not an error); Save then Load roundtrips; Save copies the input so caller-side mutation doesn't bleed into stored state; Load returns a copy so caller-side mutation of the snapshot doesn't bleed back; Save replaces (not merges) prior state so stale resume points don't survive restart; different shards stay isolated; saving an empty map clears state; concurrent Save+Load is race-free under -race. A regression on any of these silently corrupts downstream tests. * test(s3/lifecycle): assert.NotContains for InMemoryPersister key absence assert.Empty on a map[K]V index returns true when the value is the zero value, which would mask a key that leaked through with int64(0). Use assert.NotContains so the assertion fails on key presence regardless of the stored value. |
||
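A minimal sketch of the copy-on-save, copy-on-load contract follows. The method signatures are simplified assumptions (the real persister presumably takes a context and returns errors), and the cursor map is reduced to `map[string]int64`.

```go
package main

import (
	"fmt"
	"sync"
)

// inMemoryPersister keeps one cursor map per shard.
type inMemoryPersister struct {
	mu    sync.Mutex
	state map[int]map[string]int64
}

// copyMap returns an independent copy; a nil input yields an empty map.
func copyMap(in map[string]int64) map[string]int64 {
	out := make(map[string]int64, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}

// Save replaces (does not merge) the shard's state with a copy of cursors.
func (p *inMemoryPersister) Save(shard int, cursors map[string]int64) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.state == nil {
		p.state = make(map[int]map[string]int64)
	}
	p.state[shard] = copyMap(cursors)
}

// Load returns a copy, so callers cannot mutate stored state.
// An unknown shard yields an empty map.
func (p *inMemoryPersister) Load(shard int) map[string]int64 {
	p.mu.Lock()
	defer p.mu.Unlock()
	return copyMap(p.state[shard])
}

func main() {
	p := &inMemoryPersister{}
	in := map[string]int64{"k": 1}
	p.Save(0, in)
	in["k"] = 99 // caller-side mutation after Save

	snap := p.Load(0)
	snap["leak"] = 0 // mutation of the returned snapshot

	_, leaked := p.Load(0)["leak"]
	fmt.Println(p.Load(0)["k"], leaked) // 1 false
}
```

The second commit's point about `assert.Empty` applies to the `leaked` check: reading `m["leak"]` alone would return `0` whether or not the key exists, so key absence has to be tested with the two-value lookup (or `assert.NotContains` in testify).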
|
|
62e04623ce |
test(s3/lifecycle): cover worker handler Detect + helpers (#9396)
* test(s3/lifecycle): cover worker handler Detect + helpers 13 tests pin the worker-handler surface that runs without a live filer or S3 server. Pure helpers: clusterS3Endpoints (nil context, empty list, filter empty entries while preserving order, all-valid passthrough); readString (missing key, nil ConfigValue, wrong kind falls back, string returned). Capability advertises jobType with single-job concurrency caps. Detect: nil request / nil sender / wrong JobType all error; no S3 endpoints emits a 'skipped' activity and completes with success; no filer addresses behaves the same; the happy path proposes one job parameterized with the first filer address; empty JobType is accepted (broadcast detect); a SendProposals failure propagates without firing complete. * test(s3/lifecycle): cover SendComplete error propagation in worker Detect The recordingSender already supported forcing an err on SendComplete via errOn, but no case exercised it. A SendComplete failure must propagate so the admin learns the completion signal never landed; proposals went out before the failure so they remain recorded. |
||
|
|
551e700e64 |
test(s3/lifecycle): cover scheduler configload surface (#9395)
* test(s3/lifecycle): cover scheduler configload surface LoadCompileInputs is the bridge between the filer's bucket directory and the engine snapshot the scheduler compiles every refresh; a missed or misclassified bucket silently disables lifecycle for that prefix until the next refresh. Tests pin: empty bucket dir, files at the bucket level skipped, buckets without the lifecycle XML extended key skipped, empty-bytes XML skipped, valid XML becomes a CompileInput, versioning attr propagates to CompileInput.Versioned, malformed XML surfaces as a ParseError without aborting the walk, and pagination across the 1024 page boundary preserves bucket order. Also covers the IsBucketVersioned (case + whitespace tolerance, rejection of garbage values) and AllActivePriorStates (one entry per (bucket, ruleHash, actionKind), bucket-keyed isolation) helpers. * test(s3/lifecycle): tighten configload pagination boundary check Switch the bucket-count check to require.Len so a regression that returns the wrong number of buckets fails fast before the boundary asserts panic on out-of-range index. Add explicit assertions on the last entry of page 1 (b01023) and the first entry of page 2 (b01024) so a pagination-loop bug that drops or duplicates the seam is caught directly rather than only via the count check. |
||
|
|
6021a88606 |
test(s3/lifecycle): cover CompareVersionIds tiebreak surface (#9394)
* test(s3/lifecycle): cover CompareVersionIds tiebreak surface
13 tests pin every documented branch of the version-id comparator and its helpers (isNewFormatVersionId, getVersionTimestamp): equality and short-circuit paths, null sorting last, both-new-format with smaller-=-newer ordering, both-old-format with larger-=-newer ordering, mixed-format compared by parsed timestamp, mixed-format with synthesized equal timestamps, length / null / non-hex rejection, the strictly-greater-than threshold boundary at 0x4000000000000000, and the inverted-value invariant the comparator relies on. Getting any axis wrong silently inverts retention rankings, which would resurrect deleted versions or evict live ones.
* test(s3/lifecycle): use plain assert.Equal in mixed-format compare test
The previous local require := assert.New(t) shadowed testify's require package while actually returning an assert.Assertions (continue-on-fail semantics, not fail-fast). Use plain assert.Equal(t, ...) calls so the behavior matches the variable's name and the rest of the file. |
||
|
|
7781eef429 |
test(s3/lifecycle): cover dispatcher filerSiblingLister surface (Phase 14 slice) (#9392)
* test(s3/lifecycle): cover dispatcher's filerSiblingLister surface Tests pin the four routing-critical filer interactions on the filerSiblingLister: Survivors (count cap, LoneEntry semantics, null-version detection across regular files and directory-key markers, error propagation in both list and lookup paths), ListVersions (NotFound collapse, dir/missing-id filtering, 1024-page boundary, error propagation), LookupNullVersion (regular file, explicit-null flag, directory-key marker accept, plain-dir reject, NotFound collapse, error propagation), and LookupVersion (empty version-id no-op, v_ prefix, NotFound collapse, error propagation). The fake SeaweedFilerClient mirrors the real filer's NotFound shape — gRPC succeeds at stream creation and the first Recv() surfaces filer_pb.ErrNotFound — which is what the lister's errors.Is check depends on. NewFullPath strips a trailing slash before splitting so directory-key markers are stored under their slashless Name. * test(s3/lifecycle): gofmt sibling_lister_test.go Trailing comment alignment. |
||
|
|
8cf42a5abb |
test(s3/lifecycle): assert per-goroutine errors in fake-server concurrent test (#9393)
test(s3/lifecycle): assert per-goroutine errors in concurrent fake test The previous TestFake_ConcurrentCallsSerializeWithoutDeadlock dropped the err return from each LifecycleDelete call, so a regression in the concurrent path could pass the length-only assertion. Capture each err on a buffered channel and require.NoError after wg.Wait(). |
||
|
|
ddfb219ec3 |
test(s3/lifecycle): fake LifecycleDelete server (Phase 12 slice) (#9391)
* test(s3/lifecycle): fake LifecycleDelete server for component tests A reusable double for SeaweedS3LifecycleInternalServer with per-key FIFO outcome queues, a fallback Default, and recorded request capture. Tests of the worker pipeline that need to hit the proto boundary can queue up DONE/NOOP/RETRY/FATAL/SKIPPED_OBJECT_LOCK responses per (bucket, objectPath, versionId) and assert dispatch order against Recorded(). SetError flips the server into transport-failure mode without polluting the request log. * test(s3/lifecycle): use struct map key for FakeLifecycleServer queues Bucket / object path / version-id are user-supplied strings that can contain "/" or "@", which would collide if the queue map were keyed by "<bucket>/<object>@<version>". Switch to a struct key so the components stay separate. * test(s3/lifecycle): deep-copy recorded LifecycleDelete requests Tests that mutate a Recorded() entry — or a request pointer they already passed in — were able to corrupt the fake's bookkeeping because the slice carried shared pointers. Clone with proto.Clone at both record and read time so the fake holds an independent snapshot of every arriving request and hands callers an independent snapshot back. Tightened TestFake_VersionIDPartOfKey error checks while there. |
||
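The key-collision problem that motivated the struct map key is easy to reproduce. In the sketch below, `stringKey` mimics the `"<bucket>/<object>@<version>"` format from the commit message, while `queueKey` is a hypothetical struct key with assumed field names.

```go
package main

import "fmt"

// stringKey concatenates the components with separators that may also
// appear inside the components themselves.
func stringKey(bucket, object, version string) string {
	return bucket + "/" + object + "@" + version
}

// queueKey keeps the components separate, so no separator can collide.
// Structs of comparable fields are valid Go map keys.
type queueKey struct {
	Bucket     string
	ObjectPath string
	VersionID  string
}

func main() {
	// Two different (object, version) pairs that flatten to the same string.
	a := stringKey("b", "x@1", "2")
	b := stringKey("b", "x", "1@2")
	fmt.Println(a == b) // true: the string keys collide

	ka := queueKey{Bucket: "b", ObjectPath: "x@1", VersionID: "2"}
	kb := queueKey{Bucket: "b", ObjectPath: "x", VersionID: "1@2"}
	fmt.Println(ka == kb) // false: the struct keys stay distinct

	queues := map[queueKey][]string{}
	queues[ka] = append(queues[ka], "DONE")
	queues[kb] = append(queues[kb], "RETRY")
	fmt.Println(len(queues)) // 2
}
```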
|
|
bb0c7c779f |
feat(s3/lifecycle): metadata-only delete when entry TtlSec > 0 (Phase 2b) (#9390)
* refactor(s3): thread metadataOnly into delete helpers Add a metadataOnly bool to deleteUnversionedObjectWithClient and deleteSpecificObjectVersion. When true the helper sends IsDeleteData= false to the filer's DeleteEntry RPC so per-chunk DeleteFile RPCs are skipped — the volume server reclaims chunks on its own at TTL drop. Non-lifecycle callers (DELETE handlers, batch delete) pass false to preserve today's eager-chunk-delete behavior; only the lifecycle handler in the next commit will pass true. * feat(s3/lifecycle): metadata-only delete when entry TtlSec > 0 Per-write TTL stamping (PR 9377) sets Attributes.TtlSec on every lifecycle-fitting entry. When the live entry the LifecycleDelete handler fetched carries TtlSec > 0 the volume server is guaranteed to reclaim chunks at TTL drop, so the filer can skip per-chunk DeleteFile RPCs and just remove the entry record. lifecycleDispatch now computes metadataOnly from the live entry and threads it through the unversioned, suspended-null, and noncurrent/expired-marker delete paths. createDeleteMarker is unaffected — it creates a marker, never deletes chunks. |
||
|
|
255e9cd0f7 |
test(s3/lifecycle): cover reader cursor + Run validation contracts (#9389)
* test(s3/lifecycle): cover reader cursor + Run validation contracts Layer 2 tests pinning four reader-package contracts the dispatcher pipeline depends on: MinTsNs anchors at frozen positions, Snapshot returns a deep copy in both directions, Restore replaces (not merges), and Run validates ShardID/Events/BucketsPath before subscribing. * test(s3/lifecycle): tighten cursor composition assertions Snapshot deep-copy: also assert cursor doesn't see keys added to the returned map. Restore replace: freeze before second Restore and assert IsFrozen returns false after, pinning the contract that Restore wipes frozen state alongside the value map. Run validation: bound the call with a 5s context timeout so a regression that lets Run reach the nil client surfaces as a failure instead of a hang. |
||
|
|
aa280443e7 |
test(s3/lifecycle): Layer 2 multi-shard composition for the dispatcher (#9387)
* test(s3/lifecycle): Layer 2 multi-shard composition for the dispatcher The existing dispatcher unit tests cover individual outcomes (DONE / RETRY_LATER / BLOCKED / etc.) on a single shard, and pipeline_test.go has only one end-to-end happy-path assertion. Multi-shard composition — the contract Pipeline.Run wires up at runtime — was untested at the component level. Add four Layer 2 tests in dispatcher/multi_shard_test.go: Two events for two shards land in different schedules, dispatch independently, and each cursor advances only for its own event (no cross-contamination on the action-key map). A poison event on shard 0 returns BLOCKED and freezes shard 0's cursor; shard 1's normal event continues to dispatch and its cursor advances. Per-shard isolation contract. Save/Load round-trips a per-shard cursor snapshot through the Persister: a fresh dispatcher restores the same TsNs map. Pins the contract Pipeline.Run drives on the checkpoint ticker. RETRY_LATER respects RetryBackoff against the wall clock — a Tick within the backoff window doesn't re-dispatch; a Tick past the new DueTime does. Guards against premature retries from refresh ticks landing inside the backoff. Pipeline.Run itself can't run here (it builds a real reader.Reader), so the tests share the same fakeClient pattern dispatcher_test.go uses and drive Tick directly. * test(s3/lifecycle): drop unused snapshot helper and addAndTick parameter |
||
|
|
1854101125 |
feat(s3/lifecycle): bootstrap re-walk cadence + operator hooks (Phase 8) (#9386)
* feat(s3/lifecycle): bootstrap re-walk cadence + operator hooks (Phase 8) scan_only actions only fire from the bootstrap walk: the engine classifies a rule as scan_only when its retention horizon exceeds the meta-log retention, so event-driven routing can't be trusted. Today each bucket walks once per process, so a long-running worker never revisits — scan_only retention only catches up when the worker restarts. Replace BucketBootstrapper.known (set) with BucketBootstrapper.lastWalk (name -> completion time). KickOffNew now re-walks a bucket whose last walk completed more than BootstrapInterval ago. Zero interval preserves the legacy walk-once-per-process behavior so existing deployments don't change cadence by default. walkBucket re-stamps on success and clears the stamp on failure (via MarkDirty), so the next KickOffNew picks failed walks back up. Add MarkDirty / MarkAllDirty operator hooks for forced re-walks, and a Now func() for testable time travel. weed shell run-shard grows --bootstrap-interval (cadence knob) and --force-bootstrap (drop in-memory state at startup so every bucket walks again immediately, useful when a config change should take effect without a restart). Tests: cadence respected (skip inside interval, re-walk past it); zero interval keeps once-per-process; MarkDirty forces re-walk under a 24h interval; MarkAllDirty resets every record. The fakeClock helper guards the test clock with a mutex so race-detector runs are clean. * fix(s3/lifecycle): split walk state, thread BootstrapInterval through worker, drop dead flag Three issues with the Phase 8 cadence work as it landed: 1. lastWalk did double duty as both completed-walk timestamp and in-flight debounce. A walk that took longer than BootstrapInterval would have a fresh KickOffNew start a duplicate goroutine on the next refresh tick because the stamp from KickOffNew looked stale against the interval. 
Split into lastCompleted (set on success) and inFlight (set on dispatch, cleared after the walk goroutine returns success or failure). KickOffNew skips inFlight buckets regardless of cadence.
2. The cadence knob existed on `weed shell` but not on the production path: scheduler.Scheduler constructed BucketBootstrapper without BootstrapInterval, and weed/worker/tasks/s3_lifecycle/Config had no field for it. Add Scheduler.BootstrapInterval, parse `bootstrap_interval_minutes` in ParseConfig (zero = legacy walk-once-per-process; negative clamps to zero), and forward it from the handler. Tests cover default, override, clamp, and explicit-zero.
3. --force-bootstrap was a no-op: BucketBootstrapper is freshly allocated at command start, so MarkAllDirty on empty state does nothing, and the flag couldn't influence an already-running process anyway. Remove it; a real runtime trigger (SIGHUP, control RPC) is a separate change.
In-flight regression: a blockingInjector pins the first walk in progress while the test advances the clock past the interval. The second KickOffNew is a no-op (inFlight check). After release, the post-completion KickOffNew within the interval is also a no-op.
* test(s3/lifecycle): wait for lastCompleted stamp before advancing fake clock
The cadence test polled listedN to know "the walk happened", but that fires once both list passes are issued, while the success-stamp lands later, after walkBucketDir returns. A clock.Advance(30m) between those two events would record the stamp at clock+30m instead of T0; the next assertion would then see now.Sub(last) < 1h and skip the expected re-walk. Tight in practice but exposed under -race / load. Add a waitForCompleted helper that polls b.lastCompleted directly, and use it before each clock advance in both the cadence and zero-interval tests.
* fix(s3/lifecycle): expose bootstrap interval in worker UI; honor MarkDirty during walks
Two follow-ups on Phase 8. 
The worker config descriptor had no bootstrap_interval_minutes field, so the production operator UI couldn't enable the cadence — only the internal ParseConfig + Scheduler wiring knew about it. Add the field to the cadence section (MinValue=0 since 0 is the legacy default) and include the default in DefaultValues so existing deployments see the knob with the right preset. MarkDirty / MarkAllDirty silently lost their effect when a walk was in flight: the methods cleared lastCompleted, but the walk's success path then wrote a fresh timestamp, hiding the operator's invalidation. Track a pendingDirty set; the walk goroutine consumes the flag on exit and skips the success stamp, so the next KickOffNew picks the bucket up immediately. Regression: pin a walk in progress with a blockingInjector, MarkDirty the bucket, release the walk, and assert lastCompleted stayed empty plus the next KickOffNew triggers a new walk inside the BootstrapInterval window. * refactor(s3/lifecycle): drop unused MarkDirty / MarkAllDirty + pendingDirty These methods were the operator-hook half of Phase 8, but the only caller (--force-bootstrap on the shell command) was removed when it turned out to be a no-op against a freshly-allocated bootstrapper. Nothing in production calls them anymore. Strip the dead surface: MarkDirty, MarkAllDirty, the pendingDirty set, the dirty-suppression branch in walkBucket, and the three tests that only exercised those methods. BootstrapInterval-driven re-bootstrap is the live mechanism. A real runtime trigger (SIGHUP, control RPC) is a separate change with a real call site. |
||
|
|
edfa1ce210 |
feat(s3/lifecycle): pointer-transition routing for live PUTs (Phase 5b/4) (#9385)
* feat(s3/lifecycle): pointer-transition routing for live PUTs (Phase 5b/4)
Bootstrap covers existing versions, but a live PUT that creates a new
.versions/<v-new> file and updates the parent's ExtLatestVersionIdKey
didn't fire NoncurrentDays / NewerNoncurrent on the displaced prior
version until the next bootstrap. Close that runtime gap.
The meta-log already emits an Update event for the .versions/
directory itself when the latest pointer changes; the router was
dropping it because buildObjectInfo returns nil for directories. New
branch in Route detects that shape (versioned bucket, NewEntry +
OldEntry both directories with the .versions/ suffix,
ExtLatestVersionIdKey changed, ID different from the new ID) and emits
a Match against the LOGICAL key with VersionID=oldID. Match.Identity
comes from a single LookupVersion RPC for the displaced version file;
SuccessorModTime is the directory update's mtime, which is the moment
the displaced version became noncurrent.
SiblingLister grows LookupVersion(bucket, key, versionID) for that
single-RPC fetch. filerSiblingLister implements it; routing path
treats NotFound as "displaced version was hard-deleted in the
meantime, suppress" rather than an error. The router gates the lookup
on at least one active event-driven NoncurrentDays / NewerNoncurrent
rule for the bucket, so most buckets pay nothing per directory update.
Tests: pointer-flip fires NoncurrentDays with displaced version_id;
unchanged pointer skips; empty old pointer skips (first-PUT scenario);
displaced-version NotFound suppresses; no-rule skips lookup;
NewerNoncurrentVersions retains rank-0; unversioned bucket skips.
* fix(s3/lifecycle): SuccessorModTime cache + NewerNoncurrent expansion
Two correctness gaps in pointer-transition routing.
The .versions/ directory's own Attributes.Mtime is preserved across
pointer updates by updateLatestVersionInDirectory: it's a stale clock
relative to the freshly-written latest version. Using it as the
displaced version's SuccessorModTime made NoncurrentDays compute
due = staleMtime + days, which fires immediately on a fresh PUT into
an old .versions/ container. Read ExtLatestVersionMtimeKey written by
setCachedListMetadata; suppress (return no matches) when the cache is
missing rather than fall back to dir mtime.
Single-oldID lookup is only enough for pure
NoncurrentVersionExpirationDays. Any rule with
NewerNoncurrentVersions > 0 cares about the noncurrent ranks, and a
pointer flip shifts every prior noncurrent's index by one; the version
that just crossed the keep-count threshold needs to be evaluated too.
When any matching rule needs ranks, list the full .versions/
container, sort newest-first with mtime + version-id tiebreak, and
route every noncurrent with its real index. Identity-CAS dedups
against earlier schedules.
SiblingLister grows ListVersions(bucket, key); filerSiblingLister's
implementation paginates the container fully.
Two regression tests: stale dir mtime + correct cached mtime schedules
~30 days out (not immediate); NewerNoncurrentVersions=2 with 4
versions fires on the rank-2 entry that just crossed the threshold
while rank-0/1 are retained.
* fix(s3/lifecycle): bound pointer-transition expansion to threshold crossings
routePointerTransitionExpand emitted a Match for every eligible
noncurrent on every PUT. Schedule.Add doesn't dedup, so identity-CAS
at dispatch only saved the wasted RPC, not the heap slot. A hot key
with many already-eligible versions and a count rule would push
O(versions) entries per flip, repeatedly, until dispatch caught up.
Bound the emission to versions that newly entered eligibility on this
specific flip: rank 0 (the displaced version, for the NoncurrentDays
clock) plus rank == rule.NewerNoncurrentVersions for each active
count-gated rule (the version that just crossed from kept to expired).
Bootstrap still owns full backfill for versions that were already
over-threshold.
Adds a regression with 6 versions and NewerNoncurrentVersions=2:
asserts only the rank-2 entry that just crossed fires, not the
already-over-threshold rank-3/rank-4 entries.
* fix(s3/lifecycle): suppress pointer-transition expansion when newID missing
routePointerTransitionExpand defaulted latestPos to 0 if newID wasn't
found in the listing. That made the actual newest sibling latest
against the pointer's intent, then misranked every other version. A
race between the pointer write and the version write could land us
there. Default latestPos to -1, set it only on a real match, and
suppress the expansion when the search misses. Bootstrap repairs state
on the next walk.
The NewerNoncurrentVersions retention test was setting only
lookupEntry, so Route never reached the expansion path it claimed to
exercise. Repoint to listVersions and assert ListVersions was
consulted while LookupVersion was not.
Adds a regression covering the missing-newID suppression directly.
* fix(s3/lifecycle): include bare null version in pointer-transition routing
Bootstrap models the bare-key object as a "null" sibling alongside
.versions/ children, but the live pointer-transition path didn't. Two
cases lost:
1. oldID == "" was treated as "nothing displaced". A pre-versioning
bare object becomes noncurrent when the first versioned PUT lands and
the pointer flips to a real id, but live routing skipped it and waited
for the next bootstrap.
2. The expansion path's ListVersions returned only .versions/
children. With a bare null in the picture, the noncurrent ranks were
wrong, so NewerNoncurrentVersions could keep the wrong versions and
delete the right ones (or vice versa).
SiblingLister grows LookupNullVersion(bucket, key) returning the bare
entry plus an explicit-null flag (matches the bootstrap shape).
filerSiblingLister implements it via util.NewFullPath +
filer_pb.LookupEntry.
routePointerTransitionDisplaced: oldID == "" now consults
LookupNullVersion. When the bare entry exists, route it as
VersionID="null" against the LOGICAL key.
routePointerTransitionExpand: collect .versions/ children plus the
null entry into one sibling slice before sorting and ranking. The
threshold-crossing logic now sees the same N-version set that
bootstrap would compute.
Three new tests: oldID == "" with no null is a no-op (one null lookup,
no version lookup); oldID == "" with a bare null schedules
NoncurrentDays as VersionID="null"; expansion with a bare null between
.versions/ siblings places null at its mtime-correct rank and only
that rank-N entry fires.
* fix(s3/lifecycle): atomic listPageSize so test cleanup doesn't race
KickOffNew dispatches walks via `go b.walkBucket(...)`. A test that
finishes before its goroutines drain leaves them running into the next
test's t.Cleanup, which mutates listPageSize. -race spots the
read/write collision intermittently.
Convert listPageSize to atomic.Uint32; tests use Load/Store. No
production semantics change.
* fix(s3/lifecycle): null becomes latest when suspended PUT clears pointer
The router treated newID == "" as if the cached
ExtLatestVersionMtimeKey were still authoritative, but that cache
holds the displaced version's mtime, written by setCachedListMetadata
when the prior version became latest. Using it as SuccessorModTime
made NoncurrentDays=30 immediately fire on a 100-day-old displaced
version even though it became noncurrent today.
When newID == "" the bare null is the new latest. Look it up,
substitute its mtime as the successor clock, and substitute "null" as
the latestPos target for the expansion path's id match. Both displaced
and expand paths now derive the right clock.
updateIsLatestFlagsForSuspendedVersioning was the upstream cause of
the staleness: it cleared ExtLatestVersionIdKey and FileNameKey but
left the cached size/mtime/etag/owner/delete-marker behind. Call
clearCachedVersionMetadata so the .versions/ container is consistent
with "null is latest". The router-side guard is still needed for older
deployments that ran the buggy code, but new writes won't exercise the
workaround.
Two regressions: 100-day-old displaced under NoncurrentDays=30 with a
today-null PUT schedules ~30d out (not immediate); same shape with
NewerNoncurrentVersions=2 ranks the null at latest and only the rank-2
entry fires. |
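A minimal sketch of the threshold-crossing selection described in this commit, assuming a simplified in-memory sibling list. The `version` type, the `crossings` function, and the plain string comparison used as the version-id tiebreak are hypothetical stand-ins, not the repository's types or its canonical comparator. It shows the three rules from the message: sort newest-first with a tiebreak, suppress when newID is missing (latestPos stays -1), and emit only rank 0 plus rank == keep-count.

```go
package main

import (
	"fmt"
	"sort"
)

// version is a hypothetical, simplified sibling record.
type version struct {
	ID    string
	Mtime int64 // unix seconds
}

// crossings sorts vs newest-first (in place) and returns the IDs of the
// noncurrent versions that need evaluation after the latest pointer flips
// to newID: rank 0, plus rank == keep for every positive keep count.
// It returns nil when newID is not in the listing.
func crossings(vs []version, newID string, keepCounts []int) []string {
	sort.SliceStable(vs, func(i, j int) bool {
		if vs[i].Mtime != vs[j].Mtime {
			return vs[i].Mtime > vs[j].Mtime
		}
		// Assumption: string order stands in for the real version-id comparator.
		return vs[i].ID > vs[j].ID
	})

	latestPos := -1 // only set on a real match
	for i, v := range vs {
		if v.ID == newID {
			latestPos = i
			break
		}
	}
	if latestPos < 0 {
		return nil // suppress; a later full walk is expected to repair state
	}

	wanted := map[int]bool{0: true}
	for _, k := range keepCounts {
		if k > 0 {
			wanted[k] = true
		}
	}

	var out []string
	rank := 0 // 0-based index among noncurrents, newest-first
	for i, v := range vs {
		if i == latestPos {
			continue
		}
		if wanted[rank] {
			out = append(out, v.ID)
		}
		rank++
	}
	return out
}

func main() {
	vs := []version{{"v1", 1}, {"v2", 2}, {"v3", 3}, {"v4", 4}, {"v5", 5}, {"v6", 6}}
	fmt.Println(crossings(vs, "v6", []int{2})) // [v5 v3]
}
```

With six versions and a keep count of 2, only the displaced version (rank 0) and the one that just crossed the threshold (rank 2) are returned; the already-over-threshold ranks 3 and 4 are left to a full backfill.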
||
|
|
2f7ac1d664 |
feat(s3/lifecycle): NoncurrentVersionExpiration via bootstrap (Phase 5b/3) (#9383)
* feat(s3/lifecycle): NoncurrentVersionExpiration via bootstrap (Phase 5b/3)
Bootstrap now expands every <key>.versions/ directory into one event
per version with sibling state pre-computed. The router fires
NoncurrentDays / NewerNoncurrent off these events using
SuccessorModTime as the noncurrent clock; previously these rules
never ran on a versioned bucket because buildObjectInfo couldn't
classify version-folder events without the latest pointer.
Mechanics
walkBucketDir treats a directory ending in .versions and carrying
ExtLatestVersionIdKey as a SeaweedFS .versions container — emit it
once and skip the recursion. Coincidentally-named directories without
the latest pointer recurse normally.
BucketBootstrapper.expandVersionsDir lists the children, sorts
newest-first by mtime, resolves the latest position from the pointer,
and injects a synthesized reader.Event per version with
BootstrapVersion populated. NoncurrentIndex is 0-based among
noncurrents in newest-first order; SuccessorModTime is the immediate
newer sibling's mtime (zero for the latest). A missing pointer, or one
naming an absent version, falls back to the newest-by-mtime sibling so
a race window can't flag every entry as noncurrent.
routeBootstrapVersion uses BootstrapVersion to build ObjectInfo
directly (bypassing the version-folder skip in buildObjectInfo) and
runs the standard match loop. ABORT_MPU is excluded by kind-shape
gate. The schedule clock uses SuccessorModTime for noncurrents and
ModTime for the latest, so the dispatcher fires when the rule's days
threshold is met. Match.ObjectKey is the LOGICAL key,
Match.VersionID is the marker's stored version_id — the dispatcher
reaches deleteSpecificObjectVersion or createDeleteMarker correctly.
Layer 2 tests cover both sides. Router: latest fires ExpirationDays;
noncurrent fires NoncurrentDays; NewerNoncurrentVersions retains the
N newest noncurrents; ABORT_MPU never matches. Bootstrap: .versions
dir emitted once and not recursed; missing latest pointer falls back
to newest; backdated PUT (latest pointer is older by mtime) keeps
the right noncurrent index; delete-marker flag propagates.
* fix(s3/lifecycle): no VersionID for latest expirations, child-based dir disambig
Two correctness gaps in Phase 5b/3.
Bootstrap was pinning the version_id on every Match. For
EXPIRATION_DAYS / EXPIRATION_DATE on the latest version this is
unsafe: between schedule and dispatch a fresh PUT can land, the
dispatcher would still identity-match against the original version's
bytes (it still exists at that path) and the resulting delete marker
would hide the new latest. Drop VersionID for those kinds; an empty
VersionID makes the dispatcher fetch the current latest, where
identity-CAS resolves to STALE_IDENTITY and bootstrap re-schedules
with the new latest's identity. NoncurrentDays / NewerNoncurrent /
EXPIRED_DELETE_MARKER still pin the version_id since those are
version-targeted.
isVersionsDir gating on ExtLatestVersionIdKey lost a race window:
createDeleteMarker writes the version file before updating the
parent's Extended pointer, so a walk between those two steps would
see a .versions/ dir without the pointer, recurse into it, and emit
raw version files that the router drops. Match the suffix only and
let expandVersionsDir disambiguate by child inspection: if any child
carries ExtVersionIdKey it's a real .versions container and we expand;
otherwise it's a coincidentally-named user folder and we recurse via
the bucket-walk's own callback so nested entries still flow through.
Tests: latest-expiration assertion flipped to expect empty VersionID;
new tests cover the coincidentally-named-folder recursion and the
race-window expansion (children present, pointer absent).
* fix(s3/lifecycle): filter directory + missing-version-id children at listing
expandVersionsDir's listing callback collected every child with
attributes; subdirectories or entries without ExtVersionIdKey would
make it past the empty-id skip in the inner loop but still inflate
NumVersions and skew NoncurrentIndex (the rank derives from the
filtered slice's position, which was wrong when the unfiltered slice
was sorted). Drop directories at listing time and partition the
file children into a versions slice that's the actual rank source.
Test cleanups: out-of-order-mtime test now sets v1 older than v2 so
latestPos > 0 actually exercises the rank-skip branch in
expandVersionsDir; bootstrapVersionEntry preserves nanosecond
precision via MtimeNs to match markerLoneEntry's pattern; drop a
leftover unused idx variable.
* fix(s3/lifecycle): null version + canonical version-id tiebreak
Two correctness gaps in Phase 5b/3 bootstrap.
Null versions live at the bare logical path, not under .versions/.
Bootstrap previously expanded only .versions/<key>/ children, so:
- pre-versioning objects with newer .versions/ history never had
their null version expired by NoncurrentDays
- suspended-bucket writes (which clear the .versions/ latest pointer
so null becomes current) had every .versions/ child wrongly
classified as latest by the buildObjectInfo fallback
expandVersionsDir now looks up the bare key via NewFullPath +
LookupEntry, accepts a regular file or an explicit S3 directory-key
marker (Mime set), and folds it into the sibling set with
VersionID="null". Latest resolution: pointer present + names a real
id wins; pointer absent + null exists makes null latest; otherwise
falls back to newest sibling. The walker's regular emission for the
bare entry would otherwise duplicate, so walkBucketDir now does a
two-pass walk per directory level — .versions/ first, then everything
else with a per-walk skipBare set keyed by bucket-relative path that
expandVersionsDir populates when it claims a bare null sibling.
Sort tiebreak: PUTs only set second-level Mtime, so two versions
written in the same second tied. The unstable secondary order let
old-format version filenames sort oldest-first and corrupt
NoncurrentIndex under NewerNoncurrentVersions retention. Add
CompareVersionIds to s3lifecycle/version_time.go (mirrors the
canonical comparator in s3api/s3api_version_id.go to avoid the
import cycle) and use it as a secondary key after mtime equality.
Tests: pre-versioning null-as-noncurrent, suspended null-as-current,
directory-key marker as null version, end-to-end claim through
walkBucketDir's two-pass ordering, and same-second tiebreak via
canonical version-id ordering. fakeFilerClient grows a
LookupDirectoryEntry implementation backed by the same in-memory tree.
* fix(s3/lifecycle): only treat explicit-null bare entries as current
The pointer-missing branch in expandVersionsDir made null latest as
soon as a bare object was found. That's correct for suspended-bucket
writes (s3api_object_handlers_put.go writes the bare entry with
ExtVersionIdKey="null") but wrong for the pre-versioning race window:
a brand-new version under .versions/<file> exists before the parent's
ExtLatestVersionIdKey update lands, and a pre-versioning bare object
has no version-id marker. Marking that older bare object latest hides
the real new version and skips noncurrent expiration of the null
until the next process restart/bootstrap.
Distinguish the two: lookupNullVersion now returns whether the bare
entry's Extended map carries ExtVersionIdKey="null" (the suspended
write marker). expandVersionsDir's pointer-missing branch only
promotes null to latest when explicit; otherwise it falls back to
newest-sibling, which is safe for the race window since the new
version's mtime is fresher than the bare object's.
The existing suspended-null test now uses a new helper that adds the
explicit marker. New regression test covers the race window: bare
entry without the marker + a fresh .versions/<v1> file + missing
parent pointer must keep v1 as latest and the null as noncurrent.
* fix(s3/lifecycle): only the newest item can be the explicit-null latest
The pointer-missing branch in expandVersionsDir scanned every item for
an explicit null and promoted it to latest. After a suspended->enabled
transition that's the wrong call: createVersion writes the version
file before updating ExtLatestVersionIdKey, so a bootstrap that lands
in the race window sees an older bare null with ExtVersionIdKey="null"
plus a newer .versions/<v-new> child and no parent pointer. Promoting
the null misclassifies v-new as noncurrent and skips both the new
version's current-version expiration and the null's noncurrent
scheduling until the next bootstrap.
Constrain the explicit-null branch to items[0]: if the suspended-null
write is genuinely current it'll be the newest by mtime AND tagged.
Anything else falls through to the newest-sibling default.
Adds a regression test for the suspended->re-enabled race.
* fix(s3/lifecycle): paginate bootstrap directory listings
SeaweedList(..., limit=0) is a single-page request: the filer caps
limit=0 at DirListingLimit (1000 by default) and returns whatever fits
in one round trip. expandVersionsDir and walkBucketDir both relied on
that, so any directory bigger than the cap was silently truncated. For
noncurrent retention this is a correctness problem, not just a scale
problem: a hot key with more versions than the cap had its rank/sort
math computed off the first page only, so NumVersions,
NoncurrentIndex, SuccessorModTime, and the latest-fallback were all
wrong, and the older versions were never scheduled until a future
bootstrap.
Add a listAll helper that drives pagination via StartFromFileName +
inclusive=false, looping until a page returns fewer entries than the
configured page size. Use it in both call sites. Page size is a var
(listPageSize, default 1024) so tests can shrink it without
generating thousands of entries.
The fake filer client now mirrors the real semantics: sort children
by name, honor StartFromFileName/InclusiveStartFrom, cap at Limit.
New regression tests force a small page size and assert the full
result set is processed and the call count matches what pagination
should drive.
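The pagination loop described above can be sketched as follows, assuming a hypothetical `pageLister` callback in place of the real filer listing call and an in-memory `fakeDir` in the spirit of the fake filer client. Only the control flow is taken from the message: resume after the last name seen (exclusive start) and stop when a page comes back shorter than the page size.

```go
package main

import (
	"fmt"
	"sort"
)

// pageLister is a hypothetical stand-in for one listing round trip: it returns
// up to limit names that sort strictly after startFrom.
type pageLister func(startFrom string, limit int) []string

// listAll drains a directory page by page and reports how many calls it made.
func listAll(list pageLister, pageSize int) (all []string, calls int) {
	startFrom := ""
	for {
		page := list(startFrom, pageSize)
		calls++
		all = append(all, page...)
		if len(page) < pageSize {
			return all, calls // short page: nothing left to fetch
		}
		startFrom = page[len(page)-1] // resume after the last name seen
	}
}

// fakeDir builds an in-memory lister that sorts children by name,
// honors the exclusive start, and caps each page at limit.
func fakeDir(names []string) pageLister {
	sorted := append([]string(nil), names...)
	sort.Strings(sorted)
	return func(startFrom string, limit int) []string {
		// First index whose name sorts strictly after startFrom.
		i := sort.Search(len(sorted), func(k int) bool { return sorted[k] > startFrom })
		end := i + limit
		if end > len(sorted) {
			end = len(sorted)
		}
		return sorted[i:end]
	}
}

func main() {
	all, calls := listAll(fakeDir([]string{"v5", "v1", "v3", "v2", "v4"}), 2)
	fmt.Println(all, calls) // [v1 v2 v3 v4 v5] 3
}
```

One consequence of the short-page stop condition: a directory whose size is an exact multiple of the page size costs one extra, empty round trip.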
* perf(s3/lifecycle): stream bucket walk in two passes instead of buffering
walkBucketDir was paginating into a children slice and then iterating
twice (pass 1: .versions/, pass 2: everything else). For flat buckets
with millions of entries the buffer is a real memory spike. Drop the
materialization: each pass now drives its own listAll over the same
directory and acts on entries as they stream in. The skipBare ordering
contract is preserved — pass 2 still runs after pass 1 finishes — and
the per-pass paging keeps memory bounded by listPageSize.
Tradeoff: each directory level is listed twice. For workloads where
that matters more than the memory headroom, we can revisit; the
correctness/scale dial here is what the noncurrent rules need.
Updated three tests for the new call count: each walk now records 2
listings per directory (pass 1 + pass 2). The KickOffNew dedup tests
expect 2 calls per bucket; the pagination test expects 6 instead of 3.
|
||
|
|
1c917ffacb |
fix(volume): sticky EIO quarantine; track streamed reads (#9384)
Two follow-ups on PR #9382:
1. Quarantine wasn't sticky. Once CollectHeartbeat crossed the streak
threshold and hid the replica, a subsequent successful read called
checkReadWriteError(nil), wiping the streak; the next heartbeat then
re-announced the suspect replica as read-only and master could send
reads back to a disk that already failed IoErrorTolerance. Added an
ioErrorQuarantined sticky flag set on the first heartbeat that
observes the threshold and cleared only by MarkVolumeWritable
(resetIoErrorState). clearIoError continues to reset just the streak
so successful ops don't accumulate phantom errors.
2. Streamed reads bypassed the EIO counter. readNeedleDataInto and
ReadNeedleBlob (the hot paths for large/range GETs) returned
ReadNeedleData / needle.ReadNeedleBlob errors without threading them
through checkReadWriteError, so a disk failing only on those paths
would never trip IoErrorTolerance. Both now route the backend error
through the tracker, and a fully clean readNeedleDataInto call clears
the streak.
Tests cover the sticky flag (TestQuarantineIsSticky) and the streamed
read path (TestReadNeedleBlobTracksEIO via a fake EIO backend). |
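A minimal sketch of the streak-plus-sticky-flag behavior, under stated assumptions: the `ioTracker` type and its method names are hypothetical, the tolerance of 3 is the value cited in #9382, and the sketch is single-goroutine (the real code is described as guarding its state with a RWMutex). It shows why a successful read after the threshold must not un-hide the replica.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// ioErrorTolerance mirrors the "IoErrorTolerance (3)" cited in #9382.
const ioErrorTolerance = 3

// Named errors so callers do not need to import syscall themselves.
var (
	errEIO     error = syscall.EIO
	errNoSpace error = syscall.ENOSPC
)

// ioTracker is a hypothetical, non-concurrent model of the per-volume state.
type ioTracker struct {
	streak      int  // consecutive EIOs
	quarantined bool // sticky once a heartbeat observes the threshold
}

// note records the outcome of one read/write/delete.
func (t *ioTracker) note(err error) {
	if errors.Is(err, syscall.EIO) {
		t.streak++
		return
	}
	t.streak = 0 // success or a non-EIO error breaks the streak
}

// heartbeat reports whether the volume should be hidden from the master.
// The flag latches: later clean operations reset the streak, not the flag.
func (t *ioTracker) heartbeat() bool {
	if t.streak >= ioErrorTolerance {
		t.quarantined = true
	}
	return t.quarantined
}

// markWritable models the operator reset that clears both streak and flag.
func (t *ioTracker) markWritable() {
	t.streak = 0
	t.quarantined = false
}

func main() {
	var t ioTracker
	for i := 0; i < 3; i++ {
		t.note(errEIO)
	}
	fmt.Println(t.heartbeat()) // true: threshold reached
	t.note(nil)
	fmt.Println(t.heartbeat()) // true: still quarantined after a clean read
	t.markWritable()
	fmt.Println(t.heartbeat()) // false
}
```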
||
|
|
7c60407897 |
fix(volume): don't nuke local data on transient IO error (#9378) (#9382)
* fix(volume): don't nuke local data on transient IO error (#9378)
A single syscall.EIO from any read/write/delete set v.lastIoError, and
the next CollectHeartbeat then called Volume.Destroy on the replica,
removing the .dat/.idx/.vif/.sdx/.ldb/.rdb files. A brief NFS /
fabric / controller blip hitting several replicas at once could
cascade into removal of the last healthy copy, with no recovery for
non-tiered volumes.
Now require IoErrorTolerance (3) consecutive EIOs before acting, and
on that threshold mark the volume read-only and stop announcing it to
the master so re-replication kicks in from healthy peers; never delete
the data files. The on-disk copy stays for operator inspection /
recovery.
* review: fix race, accounting, recovery, non-EIO streak break
Addressing PR #9382 review:
- Data race on lastIoError: guard lastIoError + lastIoErrorCount with
a RWMutex and expose them through note/clear/get helpers so the
heartbeat reader sees a consistent snapshot. Verified with -race.
- Collection-size accounting: when a volume is quarantined for
sustained EIO, skip the entire per-volume bookkeeping (`continue`)
instead of flipping shouldDeleteVolume; the old branch subtracted a
size that was never added, dragging the collection gauge to zero /
negative.
- Recoverability: MarkVolumeWritable now also calls clearIoError so an
operator can rejoin a quarantined replica. The next failed op re-arms
the streak if the disk is still bad.
- Non-EIO streak break: a non-EIO error (e.g. ENOSPC) now resets the
consecutive-EIO counter, so a sequence EIO,EIO,ENOSPC,EIO is treated
as a streak of one; the counter only tracks consecutive EIOs.
Reads already call checkReadWriteError (volume_read.go), so successful
reads also clear the streak; no change needed there. |
||
|
|
c6ad6dcf74 |
feat(s3/lifecycle): sole-survivor delete-marker routing (Phase 5b/2) (#9381)
* feat(s3/lifecycle): sole-survivor delete-marker routing (Phase 5b/2)
Production stores delete markers under <object>.versions/<version-file>;
buildObjectInfo skips every version-folder event because it can't tell
current from noncurrent without sibling state. EXP_DM is the one rule
that can be routed from the file path alone: if the marker is the only
entry under .versions/<key>/ then it's necessarily the latest.
Add SiblingLister; gate the listing on (versioned, version-folder path,
delete-marker entry, EXP_DM rule active for the bucket, lister supplied)
so non-marker version writes pay nothing. The Match carries the LOGICAL
key in ObjectKey and the marker's version_id in VersionID so the
dispatcher can reach deleteSpecificObjectVersion(bucket, logical, vid);
without a version_id the dispatcher would BLOCK and freeze the cursor.
Server-side dispatch re-checks sole-survivor before deleting: another
PUT can land between schedule and dispatch, and identity-CAS only covers
the marker entry, not the directory shape. Lists .versions/<key>/ with
limit 2 and returns NOOP_RESOLVED if count != 1 or the surviving entry
is not the same version.
Fix isDeleteMarkerEntry to compare string(v) == "true" — production
writes []byte("true"), not {1}; every other reader of ExtDeleteMarkerKey
in this repo uses the string predicate.
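A small illustration of the predicate fix above, assuming a hypothetical `deleteMarkerKey` constant in place of the real ExtDeleteMarkerKey value. The point is only that the stored value is the text "true", so a comparison against a one-byte {1} value never matches.

```go
package main

import "fmt"

// deleteMarkerKey is a hypothetical key name used only for this sketch.
const deleteMarkerKey = "delete-marker"

// isDeleteMarkerEntry treats the entry as a delete marker only when the
// extended attribute holds the literal text "true".
func isDeleteMarkerEntry(extended map[string][]byte) bool {
	return string(extended[deleteMarkerKey]) == "true"
}

func main() {
	fmt.Println(isDeleteMarkerEntry(map[string][]byte{deleteMarkerKey: []byte("true")})) // true
	fmt.Println(isDeleteMarkerEntry(map[string][]byte{deleteMarkerKey: {1}}))            // false
	fmt.Println(isDeleteMarkerEntry(nil))                                                // false
}
```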
filerSiblingLister.Count caps at limit 2 — callers only distinguish
"sole survivor" (1) from "more than one" (>=2).
Follows #9373.
* fix(s3/lifecycle): gate sibling listing on active event-driven EXP_DM
hasActionKind tested key presence only; an EXP_DM rule whose action was
inactive (BootstrapComplete=false) or scan-only would still trigger the
sibling-listing RPC even though no match could fire. Replace with a
helper that mirrors the per-key filter Route already applies before
emitting matches: kind matches, snap.Action returns non-nil, IsActive,
and Mode == ModeEventDriven.
Add a regression test that builds a versioned snapshot with the rule
but no PriorStates (action stays inactive) and asserts the lister is
never consulted.
* chore(s3/lifecycle): trim verbose comments
* fix(s3/lifecycle): null version, hard-delete trigger, latest pointer
Three correctness gaps in EXP_DM routing.
1. Null version. SiblingLister.Count saw only .versions/<key>/, but a
pre-versioning bare-key object survives outside that folder when
versioning is enabled later. Marker + null would look like count==1
and EXP_DM would re-expose the old object. Replace Count with
Survivors which also reports HasNullVersion; route- and dispatch-time
suppress when set.
2. Hard-delete trigger. The router skipped events with NewEntry==nil so
a noncurrent hard-delete that left the marker as sole survivor never
fired EXP_DM. Allow version-folder hard-deletes through; the lister's
LoneEntry carries the surviving marker's version_id and identity.
3. Latest pointer. checkSoleSurvivorMarker only checked count and
filename. The worker can race createDeleteMarker between the file
write and the .versions/ directory metadata update. Require
ExtLatestVersionIdKey == versionId; missing pointer returns
retry-later instead of deleting.
Adds a null-version exists check on the dispatch path too.
* fix(s3/lifecycle): normalize null-version lookup errors and detect dir markers
Two correctness bugs in the null-version check.
Route-side: filerSiblingLister called client.LookupDirectoryEntry
directly. SeaweedList wraps not-found via filer_pb.LookupEntry, which
normalizes the gRPC string-mapped not-found into ErrNotFound. The raw
client returns it as a generic error instead, so every absent
null-version (the common case) bubbled up as an error and the router
suppressed every otherwise-valid match. Use filer_pb.LookupEntry.
Both sides: explicit S3 directory-key markers (object names ending in /)
are stored as directory entries with Attributes.Mime set;
processExplicitDirectory in the listing path treats them as null
versions. The previous check was !IsDirectory only, which let the
marker delete proceed and re-expose the directory key. Add
IsDirectoryKeyObject() to both predicates. Also use
util.NewFullPath(...).DirAndName() for the parent/name split so a
trailing-slash key resolves to the same underlying entry path as the
listing code.
* fix(s3/lifecycle): EXP_DM ctx propagation, nil-entry guard, fast-path skip
Three small follow-ups on the EXP_DM dispatch path.
checkSoleSurvivorMarker now takes ctx instead of context.Background()
so worker shutdown / deadlines cancel the SeaweedList RPC instead of
stalling.
If SeaweedList fires the lone callback with entry==nil, firstName
stays empty and the marker-replaced check would short-circuit; that's
the one shape that bypasses the dispatch guard, so retry-later instead.
routeSoleSurvivorMarker now skips the Survivors RPC on regular
non-marker version creates — those always have Count >= 2, so the
listing was wasted load on every versioned write under an EXP_DM rule.
Hard-delete events (NewEntry==nil) and marker creates still flow
through. Added a regression test asserting the regular-create case
doesn't consult the lister.
Documented that logicalKeyFromVersionPath rejects bucket-root markers
intentionally.
|
||
|
|
196c41d21a |
test(s3/lifecycle): cover scheduler/bootstrap walker + MPU detection (#9380)
Locks down isMPUInitDir, walkBucketDir (regular/nested/MPU/error/cancel paths), and BucketBootstrapper.KickOffNew (per-bucket fanout and in-process dedup) against a fake SeaweedFilerClient. |
||
|
|
ee1d8f9e8c |
refactor(s3api): drop filer.conf TTL routing from PUT lifecycle (#9379)
PutBucketLifecycleConfiguration used to install
/buckets/<bucket>/<prefix> day-TTL entries in filer.conf so the volume
server's RocksDB compaction filter would expire matching writes. With
#9377 the s3api server now stamps volume TTL per-write via
LifecycleTTLResolver off the stored XML, which covers the same
prefix-only Expiration.Days subset and additionally handles size
filters and AWS overlapping-rule precedence. Maintaining both paths
means a rule change has to mutate two stores in lockstep, and the
filer.conf path can't represent everything the resolver does.
Drop the add path. Keep a one-way cleanup loop so an upgrade still
wipes day-TTL entries written by older builds; otherwise a stale entry
would silently double-stamp writes (volume server expires under the
old rule) or contradict the new XML after a rule change.
Also removes resolveLifecycleDefaultsFromFilerConf (no longer needed)
and the versioning-fast-path guard (the resolver itself returns nil
for versioned/object-lock buckets, covered by
TestNewLifecycleTTLResolver_NilOnVersionedBucket). Tests covering the
deleted helpers are deleted with them; the GET fallback that
synthesizes lifecycle rules from existing filer.conf TTLs is unchanged
so users who historically configured TTL via filer.conf directly still
see a rule. |
||
|
|
2458f6c81c |
feat(s3api): apply lifecycle TTL at write time (#9377)
* feat(s3api): apply lifecycle TTL at write time
The S3 server already has the bucket's lifecycle XML at PUT time (via
the cached BucketConfig), so volume-TTL routing is just a per-write
decision instead of something that needs a separate filer.conf
projection kept in sync via operator commands.
- BucketConfig caches the canonical Rules parsed from the lifecycle
XML once on load (BucketConfigCache invalidates on Put/Delete
Lifecycle, so the rules stay current automatically).
- resolveLifecycleTTLForWrite walks the cached rules: longest-prefix
match, applies tag and size filters against the request, returns
Days * 86400. Versioned buckets, non-Expiration.Days rules, and
unevaluable size filters (no Content-Length) yield 0; the lifecycle
worker handles those at scan time.
- putToFiler resolves TTL once and passes it through both the
AssignVolumeRequest (so chunks land on a TTL volume) and the new
entry's Attributes.TtlSec (so the filer's RocksDB compaction also
expires the metadata).
Lifecycle XML PUT/DELETE now influences write routing immediately: no
operator command, no filer.conf bookkeeping. The lifecycle worker
remains authoritative for the cases the fast path can't cover
(existing objects via bootstrap, versioned buckets, noncurrent
retention, abort-MPU, tag/size filters that didn't hold at PUT time).
CompleteMultipartUpload and CopyObject still need wiring; left for
follow-ups so this PR stays scoped.
* perf(s3api): pre-filter and sort lifecycle rules for the per-PUT TTL walk
resolveLifecycleTTLForWrite walked every lifecycle rule on every
PutObject, including disabled / non-Expiration.Days rules that could
never fire on the fast path, and computed "longest prefix wins" via a
running max instead of an early exit.
Cache a pre-filtered + pre-sorted slice in BucketConfig:
- buildTTLFastPathRules drops everything except Status=Enabled +
ExpirationDays>0;
- sorts by descending prefix length (stable, so equal-length rules
keep their XML order).
The resolver returns on first prefix+filter match. A bucket whose
lifecycle XML has no Expiration.Days rules is now O(1); a typical
bucket with one Expiration.Days rule walks one HasPrefix per PUT.
The cache is built once per bucket-config load. PutBucketLifecycle /
DeleteBucketLifecycle already invalidate the cache, so the fast-path
slice stays current automatically.
* refactor(s3api): LifecycleTTLResolver object + four review fixes
Pulls the per-PUT TTL resolution into a dedicated type so the bucket
config holds one object instead of a slice + magic-walk function:
- LifecycleTTLResolver wraps the pre-filtered, pre-sorted rules.
nil-safe Resolve so the call site doesn't have to special-case buckets
with no eligible rules.
Four review findings:
1. (high) drop tag-filtered rules from the fast path. Tags are mutable
post-PUT via PutObjectTagging but volume TTL is irreversible: an
object that matched at write time would still expire after the tag was
removed. Worker re-evaluates current tags at scan time. Fast path now
keeps only stable predicates: prefix and size.
2. (high) move TTL resolution out of putToFiler. MPU parts, copy-part
destinations, and other transient writes called putToFiler with
object=""; bucket-wide rules (empty Prefix) matched and bound a TTL
clock starting at part-upload time, before CompleteMultipartUpload
existed. putToFiler now takes an explicit ttlSec parameter; only the
user-visible PutObject paths (PutObjectHandler, postpolicy) feed it
from the resolver. MPU and copy-part pass 0.
3. (medium) AWS overlapping-rule precedence is "shorter expiration
wins", not "longest prefix wins". Sort by ExpirationDays ascending so
the first prefix match is also the shortest applicable rule.
4. (medium) overflow no longer caps at math.MaxInt32 seconds (~68y). A
longer policy would have expired early. Return 0 instead so the worker
enforces the actual policy on its own schedule.
Versioning gate moves into the resolver constructor; versioned buckets
get a nil resolver. The five putToFiler callers all updated:
PutObjectHandler + postpolicy resolve via lifecycleTTLForObjectWrite,
suspended/versioned wrappers pass 0 by construction, MPU part and
copy-part SSE pass 0 with a one-line comment about why.
* refactor(s3api): drop unused BucketConfig.LifecycleRules field
The full canonical rule set was set on every bucket-config load but
never read; resolveLifecycleTTLForWrite worked off the resolver's
filtered slice, and the lifecycle worker reads bucket entries straight
off the meta-log instead of this cache. Remove the field and its
s3lifecycle import.
* perf(s3api): pre-compute LifecycleTTLResolver hot-path fields
Resolve was doing per-call work that's actually constant per
bucket-config load: int64 multiplication, max-int32 overflow check,
field indirections through *s3lifecycle.Rule. Move it to the
constructor and pack the rule into a compact ttlRule (prefix + ttlSec
int32 + sizeGT/sizeLT) so the inner loop is HasPrefix → optional size
check → return.
Drop overflowing rules at construction rather than handling
per-resolve: capping would expire long policies early, and returning 0
in the inner loop would prevent any shorter overlapping rule from
firing. Drop-at-construction composes correctly with the ascending
sort.
Benchmarks (Apple M4):
NilReceiver 0.99 ns/op 0 B/op
OneRuleMatching 2.75 ns/op 0 B/op
FiveRulesNoMatch 13.5 ns/op 0 B/op
* fix(s3api): refresh LifecycleTTL resolver on bucket-config update
storeBucketLifecycleConfiguration writes to Entry.Extended via
updateBucketConfig, which clones the cached BucketConfig and calls the
user fn, then caches the result. The clone inherits the prior
LifecycleTTL pointer and nothing rebuilt it from the new XML, so
add/replace/delete of a lifecycle policy left the wrong resolver in
cache until eviction. Same gap on the meta-log side: peer-driven
updates flowed through updateBucketConfigCacheFromEntry without
re-deriving the resolver.
Centralize the Entry -> derived-field mapping in one helper that
resets every Extended-backed field then repopulates from the entry,
and call it from getBucketConfig (initial load), updateBucketConfig
(after updateEntry succeeds, before caching), and
updateBucketConfigCacheFromEntry (meta-log path). Reset is the
load-bearing part: deleting the lifecycle XML must yield a nil
resolver, since stamping a stale TTL onto subsequent writes is
irreversible.
* fix(s3api): PostPolicy passes object size, not multipart wire size
lifecycleTTLForObjectWrite was reading r.ContentLength, which on the
PostPolicy path is the multipart envelope (form fields + boundaries),
not the uploaded object body. A size-filtered rule would evaluate
against that inflated total and stamp (or skip) a TTL the policy
didn't intend.
Take the object size as an explicit parameter. PutObject still passes
r.ContentLength (correct there); PostPolicy passes the fileSize
already extracted from the form part. Negative size means unknown and
continues to skip any size-filtered rule.
* fix(s3api): treat Object Lock as versioned for lifecycle TTL fast path
Object Lock requires versioning at the API level, but it can be
enabled at create time without S3 ever writing the explicit Versioning
header. The lifecycle resolver construction site only checked
Versioning, so an Object-Lock bucket with no Versioning byte would
still get a fast-path resolver and stamp volume TTL onto writes,
destroying noncurrent versions when the volume expires.
Mirror the OR already used in BucketIsVersioned: ObjectLockConfig
non-nil counts as versioned for resolver construction. Existing
explicit-Versioning paths are unchanged. |
||
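A minimal sketch of the resolver shape described in the commit above, assuming a simplified rule that carries only a prefix and Expiration.Days. The names `rule`, `ttlRule`, `resolver`, and `newResolver` are illustrative, not the actual SeaweedFS types, and the size filter is left out for brevity. It shows three of the described properties: overflowing rules are dropped at construction, rules are sorted by TTL ascending so the first prefix match is the shortest applicable expiration, and Resolve is safe on a nil receiver.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strings"
)

// rule is a hypothetical, simplified lifecycle rule: a prefix plus Expiration.Days.
type rule struct {
	Prefix string
	Days   int64
}

// ttlRule is the compact, pre-computed form walked on the hot path.
type ttlRule struct {
	prefix string
	ttlSec int32
}

// resolver holds rules sorted by TTL ascending. A nil *resolver is valid.
type resolver struct {
	rules []ttlRule
}

const secondsPerDay = 24 * 60 * 60

// newResolver drops rules whose TTL does not fit in int32 seconds and sorts
// the rest so the first prefix match is also the shortest expiration.
// It returns nil when no rule is eligible.
func newResolver(in []rule) *resolver {
	var out []ttlRule
	for _, r := range in {
		if r.Days <= 0 {
			continue
		}
		sec := r.Days * secondsPerDay
		if sec > math.MaxInt32 {
			continue // dropped: capping would expire a long policy early
		}
		out = append(out, ttlRule{prefix: r.Prefix, ttlSec: int32(sec)})
	}
	if len(out) == 0 {
		return nil
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].ttlSec < out[j].ttlSec })
	return &resolver{rules: out}
}

// Resolve returns the TTL in seconds for key, or 0 when no rule applies.
func (r *resolver) Resolve(key string) int32 {
	if r == nil {
		return 0
	}
	for _, tr := range r.rules {
		if strings.HasPrefix(key, tr.prefix) {
			return tr.ttlSec
		}
	}
	return 0
}

func main() {
	r := newResolver([]rule{
		{Prefix: "", Days: 30},
		{Prefix: "logs/", Days: 7},
		{Prefix: "archive/", Days: 30000}, // overflows int32 seconds, dropped
	})
	fmt.Println(r.Resolve("logs/a.txt"))    // 604800
	fmt.Println(r.Resolve("data/b.bin"))    // 2592000
	fmt.Println(r.Resolve("archive/c.tar")) // 2592000
}
```

Because the overflowing `archive/` rule is dropped rather than returned as 0 from the loop, the bucket-wide 30-day rule still applies to keys under that prefix.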
|
|
e55db58ca9 |
feat(s3/lifecycle): expose Prometheus metrics (Phase 7) (#9375)
* feat(s3/lifecycle): expose Prometheus metrics (Phase 7)
Five new gauges/counters under the s3_lifecycle subsystem so operators
can see what the worker is doing without grepping logs:
- dispatch_total{bucket,kind,outcome} — every LifecycleDelete RPC
bumps this. Outcome is the proto enum name (DONE, NOOP_RESOLVED,
RETRY_LATER, BLOCKED, …) plus a synthetic "RPC_ERROR" for transport
failures classified as RETRY_LATER.
- schedule_depth{shard} — pending matches in each shard's schedule,
sampled on the dispatcher tick.
- cursor_min_ts_ns{shard} — per-shard min cursor timestamp; lag is
derived as (now - min) by the scrape side.
- events_total{shard} — meta-log events the reader fed to the router.
- bootstrap_dispatch_total{bucket,kind} — bootstrap-walk dispatches.
Test asserts the dispatch counter increments for both DONE and
RPC_ERROR paths.
* fix(stats): purge lifecycle bucket label series in DeleteBucketMetrics
The two new bucket-labeled lifecycle counters
(S3LifecycleDispatchCounter, S3LifecycleBootstrapDispatchCounter)
weren't included in DeleteBucketMetrics, so explicit bucket teardown
left their label series behind — same cardinality leak the existing
counters above already avoid. Tack them onto the same DeletePartialMatch
chain.
|
||
|
|
05d31a04b6 |
fix(s3tests): wire lifecycle worker for expiration suite (#9374)
* fix(s3tests): wire lifecycle worker for expiration suite
The upstream s3-tests `test_lifecycle_expiration` / `test_lifecyclev2_expiration`
exercise the "set rule, wait, verify deletion" path. Phase 4 (#9367) intentionally
stripped the PUT-time back-stamp, so pre-existing objects no longer pick up TtlSec
on a freshly-applied rule. The s3tests CI bare-bones `weed -s3` had nothing left
driving expiration.
Three changes that work together:
- Engine scales `Days` by `util.LifeCycleInterval`. Production keeps the 24h day;
the `s3tests` build tag shrinks it to 10s so a `Days: 1` rule completes inside
the suite's 30s polling window. Exported `DaysToDuration` so sibling-package
tests pin to the same scale.
- Scheduler/dispatcher tick defaults split into `_default` / `_s3tests` files.
Production stays 5s/30s/5m; the test build runs at 500ms/2s/2s so deletions
land within a couple ticks of becoming due.
- s3tests.yml spawns `weed shell s3.lifecycle.run-shard -shards 0-15 -events 0
-runtime 1800s` alongside the s3 server in both the basic and SQL blocks; the
shell command runs the full pipeline (reader + scheduler + dispatcher) for the
duration of the suite. `test_lifecycle_expiration_versioning_enabled` is left
out for now — versioned-bucket expiration via the worker still needs its own
pass.
Drive-by: bump `TestWorkerDefaultJobTypes` to 7 to match the registered
handler count (
|
||
|
|
159cfc97ce |
feat(s3/lifecycle): classify versioned events by storage path (Phase 5b/1) (#9373)
* feat(s3/lifecycle/router): classify versioned events by storage path Phase 5b first slice. Pass the bucket's Versioned flag from the engine snapshot into buildObjectInfo and: - Recognize <key>.versions/<vid> events as noncurrent versions. IsLatest=false, info.Key strips the .versions/<vid> suffix so a rule's Filter.Prefix matches the user's logical key, and the AWS-visible version_id rides on Match.VersionID for the dispatcher to target a single version on the server. - Read IsDeleteMarker from Extended unconditionally — the engine rejects ExpiredObjectDeleteMarker when NumVersions != 1, so without sibling listing the marker case stays correctly suppressed (a separate PR will add the listing). - Non-versioned buckets keep the existing behavior even when an object literally named "*.versions/v1" exists; Versioned=false short-circuits the path classification. Time-based NoncurrentDays now fires on noncurrent events. NewerNoncurrent and ExpiredObjectDeleteMarker still need sibling listing — left for a follow-up. * fix(s3/lifecycle/router): require ExtVersionIdKey to confirm noncurrent Path classification alone misclassifies a literal-key collision: a versioned bucket holding an object with key "logs/backup.versions/2023" would be flagged noncurrent and have its key stripped to "logs/backup", losing the user's actual rule-prefix-matching path. SeaweedFS doesn't reserve the .versions/ segment, so the path shape is necessary but not sufficient. Add an authoritative confirmation: the entry must declare the same version_id via ExtVersionIdKey (the field SeaweedFS sets when storing a tracked version). Also reject idx==0 paths so ".versions/<vid>" can't yield an empty logical key. 
Tests: - collision: versioned bucket + .versions/ in literal key + no metadata (and the mismatched-vid variant) → still classified as a current-version object; - root-versions: .versions/v1 (idx==0) → treated as a regular key; - existing noncurrent test now sets ExtVersionIdKey to mirror the storage shape. * fix(s3/lifecycle/router): skip versioned-bucket version-folder events The previous attempt tried to classify <key>.versions/<vid> events as noncurrent versions by storage path. That's broken on three counts: - SeaweedFS stores version files as v_<id> (getVersionFileName), so comparing the path suffix to the raw ExtVersionIdKey never matches. - The "current latest" version on a versioned bucket lives at the same .versions/v_<id> path shape as noncurrent versions; the latest pointer is on the parent .versions/ directory's Extended[ExtLatestVersionIdKey], which the router doesn't see. - Even with a correct vid match, IsLatest=false plus the storage path as ObjectKey would have the dispatcher recompose <storagepath>.versions/v_<id> and no-op (or worse, target the wrong file). Until we route from .versions/ directory pointer-transition events (or supply IsLatest/SuccessorModTime/index from sibling listing), skip every event under a *.versions/ folder. Bare-key events (null versions) still route normally; bootstrap walking covers the versioned-storage cases. Tests assert the skip across tracked, literal-collision, and bucket-root .versions paths. * feat(s3api): refuse noncurrent-kind delete on the current latest version Defense-in-depth for the noncurrent kinds: even when bootstrap (or a future event-driven path) thinks a version is noncurrent, the server must verify against the .versions/ directory's Extended[ExtLatestVersionIdKey] before deleting. If the target version matches the latest pointer the action is silently dropped as NOOP_RESOLVED:VERSION_IS_LATEST instead of deleting the live data. 
* refactor(s3/lifecycle): tidy versioning gates per review - router: skip directory entries (other than MPU init) in buildObjectInfo so .versions/ folder events never become ObjectInfo. Subtest "versions dir itself" added. - s3api: switch isCurrentLatestVersion's path split from filepath.Split (OS-dependent) to path.Split so filer paths always use '/'. |
||
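A rough sketch of the "skip every event under a *.versions/ folder" rule from the commit above. The helper name `underVersionsFolder` is hypothetical, and checking every ancestor directory (rather than only the immediate parent) is an assumption. It uses `path` rather than `path/filepath`, in line with the note that filer paths always use '/'.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// underVersionsFolder reports whether key lives inside a directory whose
// name ends in ".versions". The key's own final segment is not inspected,
// so the check is about where the entry lives, not what it is called.
func underVersionsFolder(key string) bool {
	for dir := path.Dir(key); dir != "." && dir != "/"; dir = path.Dir(dir) {
		if strings.HasSuffix(dir, ".versions") {
			return true
		}
	}
	return false
}

func main() {
	keys := []string{
		"logs/backup.versions/v_123", // tracked version file
		".versions/v1",               // bucket-root .versions path
		"logs/backup",                // bare key (null version)
		"report.versions",            // entry merely named *.versions
	}
	for _, k := range keys {
		fmt.Printf("%-28s skip=%v\n", k, underVersionsFolder(k))
	}
}
```

An event for the `.versions` directory entry itself is not caught by this check; the commit handles that case separately by skipping directory entries in buildObjectInfo.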
|
|
935fb42e1d |
chore(weed/util/chunk_cache): remove unused functions (#9372)
* chore(weed/util/chunk_cache): remove unused functions
* fix(chunk_cache): bound ReadAt buffer in readNeedleSliceAt
When the caller-provided buffer is larger than the remaining needle bytes,
ReadAt would spill into the next needle and trigger the n != wanted error.
Slice to data[:wanted] so the read stops at the needle boundary.
---------
Co-authored-by: Chris Lu <chris.lu@gmail.com> |
||
|
|
fd463155e4 |
fix(ec): planner treats each (server, disk_id) as a distinct target (#9369) (#9371)
* fix(ec): planner treats each (server, disk_id) as a distinct target (#9369) master_pb.DataNodeInfo.DiskInfos is keyed by disk type, so a volume server with multiple physical disks of the same type collapses into a single DiskInfo. Per-disk attribution survives only inside the VolumeInfos[].DiskId / EcShardInfos[].DiskId records, and the active topology never put it back together. The EC planner saw N candidates instead of N×disks, returned a short plan, and createECTargets round-robined extra shards onto the same (server, disk_id) — colliding with the #9185 disk_id-aware ReceiveFile. Reconstruct per-physical-disk view in UpdateTopology by splitting each DiskInfo into one entry per observed disk_id, and index volumes / EC shards by their own DiskId so lookups stay aligned. Refuse to plan an EC task when fewer than totalShards distinct disks are available rather than packing shards onto the same disk. Threads dataShards/parityShards through planECDestinations, createECTargets and createECTaskParams so the helpers don't depend on the OSS 10+4 constants — keeps enterprise merges clean. * trim verbose comments * align EC param signatures with enterprise - dataShards/parityShards: uint32 → int (matches enterprise's ratio API) - drop unused multiPlan from createECTaskParams - minTotalDisks: total/parity+1 → ceil(total/parity), correct for non-default ratios Reduces merge surface when this PR lands in seaweed-enterprise. |
||
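The `minTotalDisks` change in the commit above is integer arithmetic, so a small worked sketch may help. The function names and the (dataShards, parityShards) parameters are assumptions for illustration; only the two formulas come from the commit message.

```go
package main

import "fmt"

// oldMinTotalDisks is the previous formula: total/parity + 1.
func oldMinTotalDisks(dataShards, parityShards int) int {
	total := dataShards + parityShards
	return total/parityShards + 1
}

// minTotalDisks is the corrected formula: ceil(total/parity), computed
// with integer arithmetic. It returns 0 for a non-positive parity count.
func minTotalDisks(dataShards, parityShards int) int {
	if parityShards <= 0 {
		return 0
	}
	total := dataShards + parityShards
	return (total + parityShards - 1) / parityShards
}

func main() {
	// Default 10+4 ratio: both formulas agree.
	fmt.Println(oldMinTotalDisks(10, 4), minTotalDisks(10, 4)) // 4 4
	// Non-default 8+4 ratio: total divides evenly, so the old formula overshoots.
	fmt.Println(oldMinTotalDisks(8, 4), minTotalDisks(8, 4)) // 4 3
}
```

The two formulas differ exactly when the total is a multiple of the parity count, which is why the default 10+4 ratio never exposed the difference.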
|
|
194dce27bf |
fix(mount): preserve user-set mtime through async/periodic flush (#9363) (#9370)
* fix(mount): preserve user-set mtime through async/periodic flush (#9363) flushMetadataToFiler and flushFileMetadata both stamped time.Now() onto the entry before sending it to the filer, clobbering any mtime SetAttr had stored from utimes()/touch -m -d. The reproducer hit this ~1s after touch because the writebackCache deferred close from the prior write ran flushMetadataToFiler after the user's utimes call. Flush has no business inventing timestamps. Move the write-time stamp into Write (where it always belonged for POSIX correctness) and let flush persist whatever Write or SetAttr already put on the entry. * test(mount): tighten mtime regression test, drop tautological one - userMtime now has non-zero nanoseconds, so the *Ns assertions catch a regression that would zero the field. - Add CtimeNs assertion (was missing). - Drop TestWriteStampsEntryMtime: it duplicated the implementation it was supposed to test, so a regression in Write would not have failed it. Driving the real Write path needs a full PageWriter, which is out of scope for this fix; TestFlushFileMetadataPreservesUserMtime is the meaningful regression for #9363. |
||
|
|
89aab30821 |
feat(s3/lifecycle): wire AbortIncompleteMultipartUpload (Phase 5a) (#9368)
* feat(s3/lifecycle/router): emit ABORT_MPU events for .uploads/<id> init dirs Detect a meta-log event at exactly .uploads/<upload_id> (a directory) and build the ObjectInfo from its destination key (entry.Extended[key]) so a rule with Filter.Prefix=foo/ matches an MPU uploading to foo/bar. Sub-events under .uploads/<id>/<part> ride a different mtime and would over-fire the ABORT_MPU schedule, so they're rejected explicitly. m.ObjectKey stays as ev.Key (.uploads/<upload_id>) — the dispatcher needs the upload directory path, not the destination key, to actually remove the in-flight upload. * feat(s3api): wire LifecycleDelete ABORT_MPU to remove the upload dir Replaces the retryLater stub. Validates the .uploads/<upload_id> shape of req.ObjectPath (so a malformed event can't escalate to a wider rm), then deletes the upload directory under <bucket>/.uploads/<id>. Maps NotFound to NOOP_RESOLVED, transport errors to RETRY_LATER, success to DONE. * refactor(s3api): drop redundant exists check before lifecycle ABORT_MPU rm s3a.rm already does a NotFound-returning lookup, so the pre-check just adds a round-trip. Map filer_pb.ErrNotFound to NOOP_RESOLVED on rm, keep transport errors as RETRY_LATER. * refactor(s3/lifecycle/router): use s3_constants for MPU paths + Extended key Drop the hardcoded ".uploads/" and "key" string literals; the symbols already exist as s3_constants.MultipartUploadsFolder and ExtMultipartObjectKey, and the server side reaches them through the same constants. Keeping the test helpers tied to those names also makes the negative-result tests meaningful — they'd otherwise still pass if the lookup constant drifted. * fix(s3api): close lifecycle ABORT_MPU traversal + NOT_FOUND gaps Two issues with the recent ABORT_MPU plumbing: - "." and ".." passed the no-slash check but resolve to the bucket root via util.JoinPath, so .uploads/.. could rm the wrong directory. 
- filer.DeleteEntry suppresses ErrNotFound and returns success, so the rm path can't distinguish missing from deleted; the previous version reported DONE for an already-aborted upload instead of NOOP_RESOLVED. Reject the two reserved names explicitly and restore the existence pre-check so the outcome map stays correct. Add a table-test covering the rejected paths. * fix(s3/lifecycle/bootstrap): walk MPU init dirs by destination key A real MPU init record is a directory under .uploads/<id> created by mkdir; the bootstrap walker was skipping every directory entry, so an MPU that existed before the meta-log subscription was never aborted. Even with the skip relaxed, MatchPath used the .uploads/<id> path, so a rule with Filter.Prefix=logs/ would never fire on an MPU uploading to logs/foo.txt. Add Entry.DestKey, let IsMPUInit directories through, and use DestKey for both MatchPath and ObjectInfo.Key. A bare init directory with no DestKey means metadata hasn't landed yet — skip rather than guess. * fix(s3/lifecycle): gate (kind, info) shape so MPU init only fires ABORT_MPU An MPU init record carries IsMPUInit=true and IsLatest=false. Without gating, the router and bootstrap walker matched it against every active ActionKey for the bucket, so NONCURRENT_DAYS / NEWER_NONCURRENT fired (IsLatest=false reads as a noncurrent version). The dispatcher would then BLOCK on empty version_id and freeze the cursor. Add a shape gate at both call sites: - IsMPUInit + non-ABORT_MPU kind → continue - regular object + ABORT_MPU kind → continue Plus a defense-in-depth check at the top of EvaluateAction so future callers can't reintroduce the bug. Tests cover all three layers. * test(s3/lifecycle): tighten dual-action coverage at the call sites - Walk multi-action: replace the kinds-as-set check with an exact-shape DeepEqual on (path, kind) tuples. The set check would have missed an MPU init wrongly firing NONCURRENT_DAYS — exactly the regression the (kind, info) gate fixes. 
- Router: add a converse case for the dual ExpirationDays + AbortIncompleteMultipartUpload rule. A regular current-version object must fire only EXPIRATION_DAYS; without the gate the dispatcher would also receive ABORT_MPU and rm the object via the MPU code path. |
||
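The ABORT_MPU path validation described above (exact `.uploads/<upload_id>` shape, no nested segments, and explicit rejection of "." and "..") can be sketched like this. `uploadIDFromPath` is a hypothetical helper, and the local `uploadsFolder` constant stands in for s3_constants.MultipartUploadsFolder.

```go
package main

import (
	"fmt"
	"strings"
)

// uploadsFolder is a local stand-in for s3_constants.MultipartUploadsFolder.
const uploadsFolder = ".uploads"

// uploadIDFromPath accepts only paths of the exact form ".uploads/<id>".
// It rejects an empty id, ids containing a slash (part sub-paths), and the
// reserved names "." and "..", which would otherwise resolve outside the
// intended upload directory once joined onto the bucket path.
func uploadIDFromPath(objectPath string) (string, bool) {
	prefix := uploadsFolder + "/"
	if !strings.HasPrefix(objectPath, prefix) {
		return "", false
	}
	id := strings.TrimPrefix(objectPath, prefix)
	if id == "" || id == "." || id == ".." || strings.Contains(id, "/") {
		return "", false
	}
	return id, true
}

func main() {
	paths := []string{".uploads/abc123", ".uploads/..", ".uploads/.", ".uploads/abc/part1", "logs/foo.txt"}
	for _, p := range paths {
		id, ok := uploadIDFromPath(p)
		fmt.Printf("%q -> id=%q ok=%v\n", p, id, ok)
	}
}
```

Validating before the delete keeps a malformed event from widening the scope of the rm, which is the concern the commit message raises.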
|
|
8b87ceb0d1 |
refactor(s3api): strip back-stamp from PutBucketLifecycleConfiguration (Phase 4) (#9367)
* refactor(s3api): strip back-stamp from PutBucketLifecycleConfiguration
The handler used to walk every existing entry under the rule's prefix and
stamp entry.Attributes.TtlSec + the SeaweedFSExpiresS3 flag so that the
filer's compaction filter would expire them. With the event-driven lifecycle
worker live, that retroactive walk is redundant: the worker drives expiration
off the meta-log and a one-time bootstrap scan, so a PUT lifecycle stays
O(rules) instead of O(objects).
New writes still inherit TTL from the filer.conf location entry above; that
volume-routing path is unchanged here and will move to an explicit operator
command later (Phase 11).
Drops updateEntriesTTL + processDirectoryTTL + processTTLBatch +
updateEntryTTL from filer_util.go.
* fix(s3api): clear stale lifecycle TTL entries on PUT
PutBucketLifecycleConfiguration only ever appended/updated filer.conf
entries. It never cleared entries for rules the operator had removed, moved
to a different prefix, disabled, retagged with a tag filter, or taken out of
the fast path by versioning the bucket. The stale day-TTL kept routing new
writes (and would expire old ones if any landed under the prefix) after the
policy was updated.
Treat PUT as a full replacement: walk this bucket's existing day-TTL entries,
clear them, then add fresh entries from the new rule set.
* test(command): bump mini default plugin job-type count to 7
The s3_lifecycle plugin handler registered in #9362 is the seventh default;
the test still asserted six.
* fix(s3api): delete stale lifecycle PathConf instead of blanking Ttl
Just clearing pathConf.Ttl leaves the rule's Collection, Replication, and
VolumeGrowthCount in place, so new writes still match the stale prefix and
inherit outdated routing/placement. Use fc.DeleteLocationConf so the
lifecycle-owned PathConf goes away entirely. Same fix in
DeleteBucketLifecycleHandler, which had the same bug. |
||
|
|
5d43f84df7 |
refactor(plugin): rename detection_interval_seconds → detection_interval_minutes (#9366)
Minutes is the natural granularity for detection cadence: every production
handler already set the seconds field to a 60-multiple (17*60, 30*60, 3600,
24*60*60). Switching to minutes drops the *60 arithmetic and matches the unit
conventions used elsewhere in the plugin worker forms.
- Proto: AdminRuntimeDefaults + AdminRuntimeConfig.detection_interval_* field
renamed.
- Helpers: durationFromMinutes / minutesFromDuration alongside the existing
seconds variants in plugin_scheduler.go.
- Handlers: vacuum, ec_balance, balance, erasure_coding, iceberg, admin_script,
s3_lifecycle now declare DetectionIntervalMinutes.
- Admin: scheduler_status + types + UI templ + plugin_api.go pass through the
new field; UI label and table cells switch to "min". |
||
|
|
7f254e158e |
feat(worker/s3_lifecycle): plugin handler with admin UI config (#9362)
* feat(s3/lifecycle): scheduler — N pipelines over an even shard split
Scheduler.Run spawns Workers Pipeline goroutines plus one engine-refresh
ticker. Each worker owns a contiguous AssignShards(idx, total) slice of
[0, ShardCount) and runs Pipeline.Run with EventBudget bounding each
iteration; brief RetryBackoff between iterations avoids hot-loop on
errors. The refresh ticker rebuilds the engine snapshot from the filer's
bucket configs every RefreshInterval.
LoadCompileInputs / IsBucketVersioned / AllActivePriorStates are
exported from a configload.go sibling so the shell command can move to
this shared implementation in a follow-up.
* refactor(shell): reuse scheduler.LoadCompileInputs in run-shard
Drop the local copies of loadLifecycleCompileInputs / isBucketVersioned
/ allActivePriorStates / lifecycleParseError that the new
scheduler package now exports. Same behavior, one source of truth.
* feat(worker/s3_lifecycle): plugin handler with admin UI config
Registers a JobHandler for s3_lifecycle via pluginworker.RegisterHandler.
Admin pulls the descriptor over the worker plugin gRPC and renders the
AdminConfigForm + WorkerConfigForm in the existing UI:
Admin form (cluster shape):
- workers (1..16, default 1)
- s3_grpc_endpoints (comma list)
Worker form (operational tuning):
- dispatch_tick_ms (default 5000)
- checkpoint_tick_ms (default 30000)
- refresh_interval_ms (default 300000)
- event_budget (default 0 = unbounded)
Detect emits a single proposal whenever S3 endpoints + filer addresses
are configured. MaxExecutionConcurrency=1 so admin only ever runs one
lifecycle daemon per worker; a fresh proposal next cycle restarts it
if the prior Execute exits.
Execute dials the configured S3 endpoint + filer, builds a
scheduler.Scheduler with the parsed config, and runs it until
ctx cancellation. Reuses the existing scheduler / dispatcher /
reader / engine packages — the handler is the thin glue that
parses descriptor values and wires the long-running daemon.
* proto(plugin): add s3_grpc_addresses to ClusterContext
So workers can dial s3 servers discovered by the master rather than a
hand-typed list in the admin form.
* feat(admin): populate ClusterContext.s3_grpc_addresses from master
ListClusterNodes(S3Type) returns the live S3 servers; the plugin
scheduler now hands these to job handlers alongside filer/volume
addresses.
* feat(worker/s3_lifecycle): discover s3 endpoints from cluster context
Drop the s3_grpc_endpoints admin form field and read the master-supplied
ClusterContext.S3GrpcAddresses instead. Operators no longer maintain a
hand-typed list, and a stale entry self-heals when the master's view
updates.
* feat(worker/s3_lifecycle): time-based runtime cap, friendlier cadence units
- dispatch_tick_minutes (was *_ms): minutes is the natural granularity
for a daily batch; default 1 minute.
- checkpoint_tick_seconds: seconds for the durable cursor write; default
30 seconds.
- refresh_interval_minutes: minutes for the engine snapshot rebuild.
- max_runtime_minutes replaces event_budget. Each daily run is bounded
by wall clock — typical run wraps in well under an hour because the
cursor persists and the meta-log streams fast. Default 60 minutes.
- AdminRuntimeDefaults.DetectionIntervalSeconds = 86400 so the admin
schedules one job per day.
|
||
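The even shard split from the scheduler commit above (each worker owns a contiguous AssignShards(idx, total) slice of [0, ShardCount)) could be computed along these lines. This is a sketch, not the actual implementation: the boundary formula is one common way to split a range evenly, and `shardCount = 16` is inferred from the `-shards 0-15` usage elsewhere in this log.

```go
package main

import "fmt"

// shardCount is assumed to be 16, matching the "-shards 0-15" usage.
const shardCount = 16

// assignShards returns the contiguous block of shard IDs owned by worker
// idx out of total workers. Blocks differ in size by at most one shard and
// together cover [0, shardCount) exactly once.
func assignShards(idx, total int) []int {
	if total <= 0 || idx < 0 || idx >= total {
		return nil
	}
	lo := idx * shardCount / total
	hi := (idx + 1) * shardCount / total
	shards := make([]int, 0, hi-lo)
	for s := lo; s < hi; s++ {
		shards = append(shards, s)
	}
	return shards
}

func main() {
	for idx := 0; idx < 3; idx++ {
		fmt.Println(idx, assignShards(idx, 3))
	}
}
```

With three workers this yields blocks of 5, 5, and 6 shards; with more workers than shards, some workers would receive an empty slice.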
|
|
85abf3ca88 |
feat(shell): s3.lifecycle.run-shard + integration test (#9361)
* feat(shell): s3.lifecycle.run-shard for manual Phase 3 dispatch Subscribes to the filer meta-log filtered to one (bucket, key-prefix-hash) shard, routes events through the compiled lifecycle engine, and dispatches due actions to the S3 server's LifecycleDelete RPC. Persists the per-shard cursor to /etc/s3/lifecycle/cursors/shard-NN.json so subsequent runs resume. Operator-runnable harness for end-to-end Phase 3 validation while the plugin-worker auto-scheduler is still pending. EventBudget bounds a single invocation; flags expose dispatch + checkpoint cadence. Discovers buckets by walking the configured DirBuckets path and reading each bucket entry's Extended[s3-bucket-lifecycle-configuration-xml] through lifecycle_xml.ParseCanonical. All compiled actions are seeded BootstrapComplete=true so the run dispatches whatever fires immediately; production bootstrap walks set this incrementally per bucket. * test(s3/lifecycle): integration test driving the run-shard shell command Spins up 'weed mini', creates a bucket with a 1-day expiration on a prefix, PUTs the target object, then rewrites the entry's Mtime via filer UpdateEntry to 30 days ago. Runs 's3.lifecycle.run-shard' for every shard via 'weed shell' subprocess and asserts the backdated object is deleted within 30s, and the in-prefix-but-recent object remains. The S3 API rejects Expiration.Days < 1, so 'wait a day' is unworkable. Backdating via the filer's gRPC sidesteps that constraint while still exercising the real Reader -> Router -> Schedule -> Dispatcher -> LifecycleDelete RPC path end-to-end. Wires a new s3-lifecycle-tests job into s3-go-tests.yml. The test runs all 16 shards because ShardID(bucket, key) is hash-based and the test shouldn't couple to that detail; running every shard keeps the test independent of the hash function. * fix(shell/s3.lifecycle.run-shard): address review findings - Reject negative -events explicitly. 
Help text already defines 0 as unbounded; negative budgets created ambiguous behavior in pipeline.Run. - Bound the gRPC dial with a 30s timeout instead of context.Background() so an unreachable S3 endpoint doesn't hang the shell. - Paginate the bucket listing in loadLifecycleCompileInputs. SeaweedList takes a single-RPC limit; the prior 4096 silently dropped buckets past that page on large clusters. Loop with startFrom until a page comes back short. - Surface parse errors instead of swallowing them. Buckets with malformed lifecycle XML now print the first three errors verbatim and a count for the rest, so an operator running this command for diagnostics can find what's wrong. * feat(shell/s3.lifecycle.run-shard): -shards range/set with one subscription Adds -shards "lo-hi" or "a,b,c" to the manual run command and threads the same model through Reader and Pipeline. - reader.Reader gains ShardPredicate (func(int) bool) and StartTsNs; ShardID stays for the single-shard short form. Event carries the computed ShardID so consumers can route per-shard without rehashing. - dispatcher.Pipeline gains Shards []int. When set, Run holds one Cursor + Schedule + Dispatcher per shard, opens one filer SubscribeMetadata stream with a predicate covering the whole set, and routes events into the matching shard's schedule from a single dispatch goroutine — no per-shard goroutine fan-out. - shell command parses -shard or -shards (mutually exclusive), formats progress messages with a contiguous-range label when applicable, and validates against ShardCount. Integration test now uses -shards 0-15 (one subprocess invocation) instead of a 16-iteration loop. * fix(s3/lifecycle): allow Reader with StartTsNs=0 + Cursor=nil The reader rejected the legitimate 'fresh subscription from epoch' state when called from a fresh Pipeline.Run on a multi-shard worker (no cursor file yet, all shards' MinTsNs=0). 
The downstream SubscribeMetadata call handles SinceNs=0 fine; the up-front check was over-defensive and broke the auto-scheduler completely (CI showed 5-second-cadence retries with this exact error). * fix(s3/lifecycle): schedule from ModTime not eventTime A backdated or out-of-band entry update has eventTime ≈ now while ModTime is far in the past; eventTime+Delay would push the dispatch into the future even though the rule already fires. ModTime+Delay is the correct fire moment. The dispatcher's identity-CAS still catches drift between schedule and dispatch. * fix(s3/lifecycle): -runtime cap on run-shard so it exits on quiet shards The CI integration test sets -events 200 expecting the subprocess to return after 200 in-shard events. But -events counts only events that pass the shard filter; the test produces ~5 such events (bucket create, lifecycle PUT, two object PUTs, mtime backdate), so the reader stays in stream.Recv forever and runShellCommand hangs the test deadline. - weed/shell/command_s3_lifecycle_run_shard.go: add -runtime D flag. When > 0, Pipeline.Run runs under context.WithTimeout(D); on expiry the reader/dispatcher drain cleanly and the cursor saves. - weed/s3api/s3lifecycle/dispatcher/pipeline.go: treat context.DeadlineExceeded the same as context.Canceled at exit (both are graceful shutdown signals). * test(s3/lifecycle): pass -runtime 10s to run-shard Pair with the new -runtime flag so the subprocess exits cleanly after 10s instead of waiting for an event budget that never lands on quiet shards. * refactor(s3/lifecycle): extract HashExtended to s3lifecycle pkg The worker's router needs the same length-prefixed sha256 of the entry's Extended map; pulling it out of the s3api private file lets both sides import it. 
* fix(s3/lifecycle): worker captures ExtendedHash for identity-CAS Without this, the dispatcher sends ExpectedIdentity.ExtendedHash = nil while the live entry on the server has a non-nil hash, so every dispatch returns NOOP_RESOLVED:STALE_IDENTITY and nothing is ever deleted. * fix(s3/lifecycle): identity HeadFid via GetFileIdString Meta-log events go through BeforeEntrySerialization, which clears FileChunk.FileId and writes the Fid struct instead. Reading .FileId directly returns "" on the worker side while the server's freshly fetched entry still has a populated string, so the identity-CAS would mismatch and every expiration ended in NOOP_RESOLVED:STALE_IDENTITY. * fix(s3/lifecycle): treat gRPC Canceled/DeadlineExceeded as graceful exit errors.Is doesn't unwrap a gRPC status error back to the stdlib ctx errors, so a subscription that ends because runCtx was canceled was being logged as a fatal reader error. Check status.Code as well so the shell's -runtime cap exits cleanly. * fix(test/s3/lifecycle): pass the gRPC port (not HTTP) to run-shard run-shard's -s3 flag dials the LifecycleDelete gRPC service, which listens on s3.port + 10000. The integration test was passing the HTTP port instead, so the dispatcher's RPC just timed out and the shell command exited under -runtime with no work done. * chore(test/s3/lifecycle): drop emoji from Makefile output * docs(test/s3/lifecycle): correct '-shards 0-15' wording * fix(s3/lifecycle): reject out-of-range shard IDs in Pipeline.Run The shell's parseShardsSpec already validates, but a programmatic caller (scheduler, future worker config) shouldn't be able to silently produce no-op states by passing -1 or 99. * fix(s3/lifecycle): bound drain + final-save with their own timeouts Shutdown was using context.Background, so a stuck dispatcher RPC or filer save could keep Pipeline.Run from ever returning. 
* fix(test/s3/lifecycle): drop self-killing pkill in stop-server The pkill pattern \"weed mini -dir=...\" is also in the running shell's argv (it's the recipe body), so pkill -f matches its own bash and the recipe exits with Terminated. CI test job passed but the cleanup step failed with exit 2. The PID file is sufficient on its own. * docs(test/s3/lifecycle): document S3_GRPC_ENDPOINT env var |
||
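The `-shards` flag described above accepts either a "lo-hi" range or an "a,b,c" set and validates against ShardCount. A minimal parser under those rules might look like the following. `parseShards` is a hypothetical name and signature, not the shell command's actual parseShardsSpec.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseShards parses "lo-hi" or "a,b,c" into shard IDs and rejects anything
// outside [0, shardCount). Duplicate IDs in a set are kept as written.
func parseShards(spec string, shardCount int) ([]int, error) {
	spec = strings.TrimSpace(spec)
	if lo, hi, ok := strings.Cut(spec, "-"); ok {
		l, errLo := strconv.Atoi(strings.TrimSpace(lo))
		h, errHi := strconv.Atoi(strings.TrimSpace(hi))
		if errLo != nil || errHi != nil || l > h {
			return nil, fmt.Errorf("invalid shard range %q", spec)
		}
		if l < 0 || h >= shardCount {
			return nil, fmt.Errorf("shard range %q outside [0,%d)", spec, shardCount)
		}
		out := make([]int, 0, h-l+1)
		for s := l; s <= h; s++ {
			out = append(out, s)
		}
		return out, nil
	}
	var out []int
	for _, part := range strings.Split(spec, ",") {
		s, err := strconv.Atoi(strings.TrimSpace(part))
		if err != nil || s < 0 || s >= shardCount {
			return nil, fmt.Errorf("invalid shard %q", part)
		}
		out = append(out, s)
	}
	return out, nil
}

func main() {
	for _, spec := range []string{"0-15", "1,4,9", "3", "-1", "0-99"} {
		shards, err := parseShards(spec, 16)
		fmt.Printf("%q -> %v err=%v\n", spec, shards, err)
	}
}
```

A leading minus such as "-1" is read as a range with an empty lower bound and fails to parse, so negative IDs are rejected without a separate check.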
|
|
c918660901 | build(deps): bump io.netty:netty-transport-native-epoll from 4.1.132.Final to 4.2.13.Final in /test/java/spark (#9365) | ||
|
|
9cb103cd35 | build(deps): bump github.com/apache/thrift from 0.22.0 to 0.23.0 (#9364) | ||
|
|
b7928637a0 |
refactor(s3api): move Lifecycle XML structs to leaf package lifecycle_xml (#9360)
* refactor(s3api): move Lifecycle XML structs to leaf package lifecycle_xml The structs S3 PutBucketLifecycleConfiguration parses and the canonical conversion to s3lifecycle.Rule lived in package s3api, which transitively imports weed/server (which imports weed/shell). Any caller outside weed/s3api — the shell, the future lifecycle worker — that wanted to parse a bucket's lifecycle XML hit an import cycle. Moves: weed/s3api/s3api_policy.go -> lifecycle_xml/types.go weed/s3api/s3api_lifecycle_canonical.go -> lifecycle_xml/canonical.go s3api_lifecycle_canonical_test.go -> lifecycle_xml/canonical_test.go s3api_policy_test.go -> lifecycle_xml/round_trip_test.go Renames the public RuleStatus type (was unexported ruleStatus) and adds small accessor methods (Set/Val/AndSet/TagSet) for fields the s3api handler needs to read across the package boundary. Adds NewPrefix and NewExpirationDays constructors so the GET handler can build response rules without poking at unexported fields. Adds a Tag struct local to the package so it has zero internal seaweed deps. Adds a one-shot ParseCanonical(xml []byte) helper for non-server callers. s3api_policy.go was misnamed — its content is lifecycle XML, not S3 bucket policy. The new package name reflects the actual scope. * test(s3api/lifecycle_xml): exercise public API in tests - canonical_test.go's parseLifecycle helper went through xml.Unmarshal directly; route it through the package's exported Parse so tests validate the public entrypoint. - round_trip_test.go asserted internal flags (rule.Filter.tagSet, rule.Filter.andSet, Transition.set, NoncurrentVersionTransition.set); switch to TagSet(), AndSet(), Set() — exercises the public contract that downstream callers (s3api handler, future shell command) rely on. |
||
|
|
c567da7164 |
feat(s3): register SeaweedS3LifecycleInternal gRPC service (#9359)
Phase 2 added the LifecycleDelete handler on S3ApiServer but never registered it on a running gRPC server, so workers had no endpoint to dial. Embed UnimplementedSeaweedS3LifecycleInternalServer on S3ApiServer and register it on the s3 command's grpc server alongside SeaweedS3IamCacheServer. |
||
|
|
35e3fe89bc |
feat(s3/lifecycle): filer-backed cursor Persister + drop BlockerStore (#9358)
* feat(s3/lifecycle): filer-backed cursor Persister
FilerPersister persists per-shard cursor maps as JSON to /etc/s3/lifecycle/cursors/shard-NN.json via filer.SaveInsideFiler. One file per shard keeps Save atomic: the filer writes the entry in a single mutation, so a crash mid-write doesn't leak partial state. Pipeline.Run loads on start; the periodic checkpoint and graceful-shutdown save go through this implementation. A small FilerStore interface wraps the SeaweedFilerClient surface the persister needs, so tests inject an in-memory fake instead of mocking the full gRPC client.
* refactor(s3/lifecycle): drop BlockerStore; durable cursor IS the block
A frozen cursor doesn't advance, so the durable cursor (FilerPersister) encodes the blocked state on its own. On worker restart the reader re-encounters the poison event at MinTsNs, the dispatcher walks the same retry budget to BLOCKED, and the cursor freezes at the same EventTs. Other in-flight events between the freeze tsNs and prior cursor positions self-resolve via NOOP_RESOLVED (STALE_IDENTITY), since the underlying objects were already deleted on the prior pass.
Removed:
- BlockerStore interface + InMemoryBlockerStore + BlockerRecord
- Dispatcher.Blockers + Dispatcher.ReplayBlockers
- the BlockerStore.Put call in handleBlocked
- Pipeline.Blockers field + the ReplayBlockers call on startup
Added TestDispatchRestartReFreezesNaturally, which pins the self-recovery property: a fresh Dispatcher with a fresh Cursor, fed the same poison event, reaches the same frozen state at the same EventTs without any durable blocker store.
Operator visibility: a cursor whose MinTsNs hasn't advanced is the signal, surfaced via the durable cursor file.
* refactor(filer): SaveInsideFiler accepts ctx
ReadInsideFiler already takes ctx; SaveInsideFiler used context.Background() internally and silently dropped the caller's ctx. The API is now symmetric; cancellation/deadlines propagate through LookupEntry / CreateEntry / UpdateEntry.
Mechanical update of all callers — most pass context.Background() since the existing call sites have no ctx in scope. * fix(s3/lifecycle): deterministic order in cursor save Iterating Go maps yields random order, so json.Encode produced a different byte sequence on each save even when the state hadn't changed. Sort entries by (Bucket, ActionKind, RuleHash) before encoding so the on-disk file diffs cleanly. New test pins byte-identical output across two saves of the same map. * fix(s3/lifecycle): log reason when freezing cursor in handleBlocked handleBlocked dropped the reason via _ = reason with a comment claiming the caller logged it; none of the three callers do. A frozen cursor is the only surface where the operator finds out something stuck, so the reason has to land somewhere. glog.Warningf with shard, key, eventTs, and the original reason — same shape the rest of the package uses. |
||
|
|
ec83a87d68 |
perf(s3/lifecycle): defer pool Put on ShardID hasher
defer guarantees the hasher returns to the pool even if h.Write or h.Sum panic, preventing pool leak under unexpected failure modes. |
||
|
|
3a192c6c57 |
fix(s3/lifecycle): address Phase 3 post-merge review (#9354 #9355 #9356) (#9357)
* fix(s3/lifecycle): reader handles bare /buckets parent and pre-normalizes prefix extractBucketKey accepted /buckets/ but rejected /buckets (no trailing slash); some delete events emit the bare form, so bucket-root events were silently dropped. Pre-normalize BucketsPath once on Run instead of recomputing per event. * perf(s3/lifecycle): pool sha256 hashers in ShardID ShardID runs on every meta-log event before the shard filter; a fresh sha256.New per call produces measurable allocator pressure under load. sync.Pool reuses hashers across calls. * fix(s3/lifecycle): router skips hard deletes and missing-attribute events A hard delete carries no schedule-relevant state — Expiration would hit NOOP_RESOLVED at dispatch and ExpiredObjectDeleteMarker fires from a Create on the latest version. Skip rather than burn a schedule slot. Missing Attributes leaves ModTime at year 0001, which makes ExpirationDays fire immediately at dispatch. Skip the event instead. Drop the unused 'versioned' parameter from buildObjectInfo; the dispatcher's identity-CAS handles version drift in Phase 5. * fix(s3/lifecycle): EntryIdentity.MtimeNs holds true nanoseconds Both computeEntryIdentity (server) and buildIdentity (router) wrote entry.Attributes.Mtime (seconds) into a field named MtimeNs. The CAS worked because both sides agreed, but the encoding contradicted the field name and would break if either side later started using true nanoseconds. Combine Mtime*1e9 + the FuseAttributes.MtimeNs nanosecond component on both sides; the test was updated to match. * fix(s3/lifecycle): dispatcher distinguishes ctx cancel from transport errors A canceled or deadline-exceeded RPC is shutdown, not a transport failure: re-queue the Match at its original DueTime with no retry-budget burn so a quick restart can't escalate it to BLOCKED. 
* fix(s3/lifecycle): reader fallback prefix normalization mirrors Run The fallback path that builds prefix from r.BucketsPath when bucketsPathSlash is empty (test-only entry into extractBucketKey) was appending an unconditional '/', producing '//' if BucketsPath already ended with one. Use the same normalization Run does. * fix(s3/lifecycle): ObjectInfo.ModTime carries the nanosecond component ModTime dropped FuseAttributes.MtimeNs, leaving ExpirationDays one nanosecond off relative to EntryIdentity.MtimeNs. Pass both to time.Unix so the precision matches the CAS witness. |
||
|
|
5c991f38f5 |
feat(s3/lifecycle): dispatcher + per-shard pipeline (Phase 3 PR-D) (#9356)
feat(s3/lifecycle): dispatcher + blocker store + per-shard pipeline
Dispatcher consumes due Matches from the schedule, calls LifecycleDelete, and routes outcomes:
- DONE / NOOP_RESOLVED / SKIPPED_OBJECT_LOCK -> Cursor.Advance
- RETRY_LATER (within budget) -> re-schedule with backoff
- RETRY_LATER (budget exhausted) / BLOCKED -> BlockerStore.Put + Freeze
BlockerStore is a small interface with InMemoryBlockerStore for tests; the filer-backed impl follows when the worker task registration lands.
Pipeline composes Reader + Router + Dispatcher into a single Run loop keyed by shard. Cursor is restored on start, blockers are replayed as freezes, checkpoints write at a configurable cadence, and a final save fires on shutdown. The meta-log itself is the durable buffer for in-flight schedule entries; restart re-derives them from the cursor's MinTsNs. |
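The outcome routing described in this commit can be sketched as a single switch. The `outcome` and `action` types and the `route` function are hypothetical, and the sketch assumes "within budget" means the attempt count is still below the budget; the outcome names themselves come from the commit message.

```go
package main

import "fmt"

// outcome mirrors the outcome names listed in the commit message.
type outcome int

const (
	outcomeDone outcome = iota
	outcomeNoopResolved
	outcomeSkippedObjectLock
	outcomeRetryLater
	outcomeBlocked
)

// action is a hypothetical summary of what the dispatcher does next.
type action int

const (
	actionAdvance    action = iota // Cursor.Advance
	actionReschedule               // re-schedule with backoff
	actionFreeze                   // record the blocker and freeze the cursor
)

// route maps an outcome and the attempts already made to the next action.
func route(o outcome, attempts, budget int) action {
	switch o {
	case outcomeDone, outcomeNoopResolved, outcomeSkippedObjectLock:
		return actionAdvance
	case outcomeRetryLater:
		if attempts < budget { // assumed meaning of "within budget"
			return actionReschedule
		}
		return actionFreeze
	default: // BLOCKED, or any outcome this sketch does not know
		return actionFreeze
	}
}

func main() {
	fmt.Println(route(outcomeRetryLater, 1, 3) == actionReschedule)
	fmt.Println(route(outcomeRetryLater, 3, 3) == actionFreeze)
}
```

Treating unknown outcomes as a freeze is a conservative choice made for this sketch: a stuck cursor is visible to the operator, while a silently advanced one is not.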