Chris Lu d9b86fb495 fix(s3api): clear stale latest-version pointer when .versions dir cleanup is blocked (#9269)
* fix(s3api): clear stale latest-version pointer when post-deletion cleanup is blocked

In versioned buckets, when the only remaining children of a .versions
directory are orphan entries (v_<id> files that lack the version-id
extended attribute - e.g. left behind by older code paths or interrupted
writes), updateLatestVersionAfterDeletion's selectLatestVersion finds
zero promotable versions and falls through to s3a.rm. The non-recursive
rm fails because the orphan blocks the directory deletion. Previously
the code logged the failure and returned, leaving the LatestVersionId
pointer pointing at the version we just deleted.
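The failure mode can be sketched with a toy model. Everything here is hypothetical: `entry` is a simplified stand-in for a filer entry, `versionIDKey` is a placeholder rather than the real attribute name, and `promotableVersions` / `rmNonRecursive` only mimic the behavior described above for selectLatestVersion and the non-recursive s3a.rm.

```go
package main

import (
	"errors"
	"fmt"
)

// entry is a hypothetical, simplified stand-in for a child of a .versions
// directory; only the fields needed for this illustration are kept.
type entry struct {
	Name     string
	Extended map[string][]byte
}

// versionIDKey is a placeholder name for the version-id extended attribute.
const versionIDKey = "version-id"

var errNotEmpty = errors.New("directory not empty")

// promotableVersions mimics the idea behind selectLatestVersion: only
// children carrying the version-id attribute are candidates.
func promotableVersions(children []entry) []entry {
	var out []entry
	for _, c := range children {
		if _, ok := c.Extended[versionIDKey]; ok {
			out = append(out, c)
		}
	}
	return out
}

// rmNonRecursive models a non-recursive delete: it fails while any child,
// tagged or orphaned, is still present.
func rmNonRecursive(children []entry) error {
	if len(children) > 0 {
		return errNotEmpty
	}
	return nil
}

func main() {
	// A single orphan v_<id> file without the version-id attribute remains.
	children := []entry{{Name: "v_orphan", Extended: map[string][]byte{}}}
	fmt.Println(len(promotableVersions(children))) // 0
	fmt.Println(rmNonRecursive(children))          // directory not empty
}
```

Both conditions hold at once: nothing is promotable, yet the directory cannot be removed, so the pointer is left aimed at a version that no longer exists.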

For Veeam-style workloads that GET-PUT-GET-DELETE a small lock object on
every checkpoint, the stale pointer poisons every subsequent run: the
next GET re-enters getLatestObjectVersion's 13x retry loop on the
missing pointed-at version file, plus the self-heal rescan, all to return the same
404. The cycle is self-perpetuating until the orphan is removed by hand.

When rm fails, additionally clear the LatestVersionId / LatestVersionFile
pointer fields on the .versions directory entry. The orphan files stay
in place (an operator can audit and remove them); from the S3 API
perspective the object is now correctly absent and subsequent reads
short-circuit to ErrNotFound on the fast path instead of replaying the
heal cycle.

* fix(s3api): clear stale latest-version pointer on read-side self-heal failure

healStaleLatestVersionPointer is invoked by getLatestObjectVersion when the
pointed-at version file is missing. The rescan path can find no remaining
version-id-tagged entries (e.g. when only orphan v_<id> files lacking the
version-id extended attribute remain). Prior code returned ErrNotFound but
left the stale pointer in place, so every subsequent read replayed the same
13x retry loop on the missing file and re-entered self-heal, all to return
the same 404.

Reuse the same pointer-clear logic introduced for updateLatestVersionAfterDeletion.
The two call sites are now identical, so factor the body out into
clearStaleLatestVersionPointer. The caller parameter carries the source
function name so the log lines operators were already grepping for
(updateLatestVersionAfterDeletion: cleared stale ... and
healStaleLatestVersionPointer: cleared stale ...) keep the same prefix.
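A sketch of the factored helper's logging contract, assuming a simplified signature. Only the `<caller>: cleared stale` prefix comes from the text above; the rest of the message, the function name `clearStalePointer`, and its parameters are illustrative.

```go
package main

import (
	"fmt"
	"log"
)

// clearStalePointer is a hypothetical simplification of the shared helper.
// The caller argument only feeds the log prefix, so both call sites emit
// lines that existing greps on "<caller>: cleared stale" still match.
func clearStalePointer(caller string, extended map[string][]byte, staleID string) string {
	delete(extended, "LatestVersionId")
	delete(extended, "LatestVersionFile")
	// Illustrative wording after the prefix; the real message may differ.
	msg := fmt.Sprintf("%s: cleared stale latest-version pointer %s", caller, staleID)
	log.Print(msg)
	return msg
}

func main() {
	ext := map[string][]byte{"LatestVersionId": []byte("v1"), "LatestVersionFile": []byte("v_v1")}
	clearStalePointer("updateLatestVersionAfterDeletion", ext, "v1")
	clearStalePointer("healStaleLatestVersionPointer", ext, "v1") // second call is a harmless no-op delete
}
```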

* fix(s3api): re-validate before clearing latest-version pointer (CAS)

Reviewer feedback (gemini-code-assist, coderabbitai) on PR #9269 flagged
a TOCTOU in clearStaleLatestVersionPointer: between the caller loading
versionsEntry and this function persisting the cleared map, a concurrent
PUT could promote a fresh version. Persisting the caller's snapshot then
silently overwrites the live pointer and re-introduces the missing-pointer
state on a now-existing object.

Make the persist CAS-style:

  1. Re-scan .versions for any version-id-tagged entry. If one now exists,
     abort - the concurrent writer has populated the directory and either
     already updated the pointer or the next read's self-heal will pick
     up the new entry.
  2. Re-fetch the live .versions directory entry and only proceed if its
     latest-pointer fields still match the stale id the caller observed.
     A concurrent pointer update by another writer is detected here and
     the clear is skipped.
  3. Persist with mkFile using the live Extended map from (2) (minus the
     two pointer fields and the cached metadata) so any other Extended
     fields written concurrently since the caller's load are preserved.

A note on the literal suggestion of mutating updatedEntry.Extended in
the mkFile callback: that does not work because mkFile constructs a
fresh *filer_pb.Entry rather than reading the live entry first
(weed/pb/filer_pb/filer_client.go:247). The callback receives that
fresh entry, whose Extended map is nil at invocation, so a delete on it
would be a no-op and the persisted entry would lose all Extended fields,
not just the two we want to clear. The correct shape - re-fetching the
live entry before mkFile and carrying its Extended map into the
persist - is what this change implements.
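The nil-map pitfall is plain Go semantics and is easy to reproduce. `Entry` and `mkFileLike` below are hypothetical stand-ins that only mimic the described behavior of mkFile building a fresh entry; they are not the filer_pb types or the real mkFile signature.

```go
package main

import "fmt"

// Entry is a hypothetical minimal stand-in for *filer_pb.Entry.
type Entry struct {
	Name     string
	Extended map[string][]byte
}

// mkFileLike builds a brand-new entry and hands it to the callback,
// without reading the live entry first.
func mkFileLike(name string, fn func(*Entry)) *Entry {
	e := &Entry{Name: name}
	if fn != nil {
		fn(e)
	}
	return e
}

func main() {
	live := map[string][]byte{"LatestVersionId": []byte("v1"), "Other": []byte("keep")}

	// Naive: delete on the callback's nil Extended map is a legal no-op,
	// and the persisted entry carries no Extended fields at all.
	naive := mkFileLike(".versions", func(e *Entry) {
		delete(e.Extended, "LatestVersionId")
	})
	fmt.Println(len(naive.Extended)) // 0: "Other" is gone too

	// Fixed: copy the re-fetched live map in, then drop only the pointer.
	fixed := mkFileLike(".versions", func(e *Entry) {
		e.Extended = make(map[string][]byte, len(live))
		for k, v := range live {
			e.Extended[k] = v
		}
		delete(e.Extended, "LatestVersionId")
	})
	fmt.Println(len(fixed.Extended)) // 1
}
```

Because deleting from a nil map is allowed in Go, the naive version fails silently instead of panicking, which is what makes the bug easy to miss in review.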

True atomic CAS would require filer-level conditional update support;
this change narrows the race window from the full caller scope to the
~ms gap between (2) and (3), which is the best we can do without that.
2026-04-28 21:02:52 -07:00

See https://blog.aqwari.net/xml-schema-go/

1. go install aqwari.net/xml/cmd/xsdgen@latest
2. Add an EncodingType element to ListBucketResult in AmazonS3.xsd
3. xsdgen -o s3api_xsd_generated.go -pkg s3api AmazonS3.xsd
4. Remove the empty Grantee struct from s3api_xsd_generated.go
5. Remove the xmlns prefix from the struct tags: sed -i 's|http://s3\.amazonaws\.com/doc/2006-03-01/ ||g' s3api_xsd_generated.go
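If sed flavor differences (GNU vs. BSD `-i`) are a concern, step 5 can also be done with a small Go program. This is a sketch, not part of the repository's tooling; `stripNamespace` is a hypothetical helper, and it assumes the prefix to strip is exactly the namespace URL followed by a single space, as in the sed command.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// s3NS is the namespace prefix that step 5 removes from xml struct tags.
const s3NS = "http://s3.amazonaws.com/doc/2006-03-01/ "

// stripNamespace removes every occurrence of the namespace prefix.
func stripNamespace(src string) string {
	return strings.ReplaceAll(src, s3NS, "")
}

func main() {
	if len(os.Args) < 2 {
		// No file given: demonstrate on a sample struct tag.
		fmt.Println(stripNamespace(`xml:"http://s3.amazonaws.com/doc/2006-03-01/ Name"`)) // xml:"Name"
		return
	}
	path := os.Args[1] // e.g. s3api_xsd_generated.go
	data, err := os.ReadFile(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Rewrite the file in place; the mode only applies if the file is created.
	if err := os.WriteFile(path, []byte(stripNamespace(string(data))), 0o644); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Run it as `go run strip.go s3api_xsd_generated.go` (the file name `strip.go` is arbitrary).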