Files
Chris Lu 6844ec067c fix(s3): cache remote-only source before CopyObject (#9304) (#9305)
* fix(s3): cache remote-only source before CopyObject (#9304)

CopyObject from a remote.mount source whose object lives only upstream
created a destination entry with FileSize > 0 but no chunks/content,
because the resolved source entry has no local chunks and the copy path
fell into the "inline/empty chunks" branch with empty entry.Content.
A subsequent GET returned 500 with "data integrity error: size N
reported but no content available". CopyObjectPart had the same shape
via copyChunksForRange iterating an empty chunk list.

Detect entry.IsInRemoteOnly() right after resolving the source in both
CopyObjectHandler and CopyObjectPartHandler and cache the object to the
local cluster first via a new cacheRemoteObjectForCopy helper (a
copy-time analogue of cacheRemoteObjectForStreaming with a bounded 30s
timeout and version-aware path resolution). If caching fails or
produces no chunks, return 503 with Retry-After: 5 instead of writing
a metadata-only destination, mirroring the GetObject behavior added in
the #7817 cold-cache fix.

Adds TestCopyObjectRemoteOnlySourceDetection pinning the four entry
shapes the fix branches on plus the pre-fix broken-output shape.

* address PR review on remote-only copy fix

- Use the resolved entry's version id when srcVersionId is empty so a
  CopyObject reading the latest object in a versioning-enabled bucket
  caches the correct .versions/v_<id> path instead of getting stuck in
  a 503 retry loop. New helper resolvedSourceVersionId handles the
  fallback for both CopyObject and CopyObjectPart.
- Drop the redundant cachedEntry.IsInRemoteOnly() recheck in both
  handlers; the cache helper now reports success based on local data
  presence, and IsInRemoteOnly does not look at inline Content so
  keeping the check would 503 on small inline-cached objects.
- Treat inline Content as a successful cache result in both
  cacheRemoteObjectForStreaming and cacheRemoteObjectForCopy via a
  shared cachedEntryHasLocalData predicate. The CopyObject inline
  branch already handles entries that have Content but no chunks.
- Extract buildVersionedRemoteObjectPath so the streaming and copy
  cache helpers share path construction.

Adds TestResolvedSourceVersionId and TestCachedEntryHasLocalData to
pin the new helpers' contracts.

* narrow streaming cache contract back to chunks-only

CodeRabbit flagged that cacheRemoteObjectForStreaming's caller in
streamFromVolumeServers (lines 997-1002) still required non-empty
chunks, so content-only cache hits would fall through to a 503 retry
loop instead of being honored.

Resolve by keeping the helper's contract chunks-only: the filer's
caching code only ever writes chunks, the streaming downstream isn't
wired to read inline Content from a cached entry, and a partial range-
aware inline writer here would be overkill for a path that doesn't
actually occur in practice. cacheRemoteObjectForCopy keeps the relaxed
contract since the copy path's inline branch genuinely handles both
chunked and content-only entries.

Document the asymmetry on cachedEntryHasLocalData and on
cacheRemoteObjectForStreaming so a future reader can see why the two
helpers diverge.

* extend version-id resolution to streaming cache path

CodeRabbit flagged that GetObjectHandler still passed the raw query
versionId to cacheRemoteObjectForStreaming. For latest-version reads
in versioning-enabled buckets that stays empty even though the
resolved entry lives at .versions/v_<id>, so remote-only GETs would
keep caching the wrong path and 503-ing forever. Reuse the new
resolvedSourceVersionId helper at the streaming call site too.

Also document on cachedEntryHasLocalData that the zero-byte case
flagged in the same review is handled upstream (IsInRemoteOnly
requires RemoteSize > 0, so the cache helper is never invoked for
empty remote objects -- CopyObject's pre-existing inline branch
writes a correct empty destination directly). Pin this with a new
test case.

* trim verbose comments

Drop tutorial-style and review-history comments. Keep only the WHY
that isn't obvious from identifiers: the #9304 reference on the new
branches in CopyObject / CopyObjectPart, the latest-version-fallback
rationale on resolvedSourceVersionId, and the streaming/copy contract
asymmetry on cachedEntryHasLocalData.

* drop issue references from comments

Issue numbers belong in PR descriptions and commit messages, not in
source comments where they rot. Replace with the underlying invariant
the code is preserving.

* test: drive remote-object cache helpers through real gRPC

Existing tests only re-enacted helper-function branching in test
space, so they could not have caught a handler that consumed a
remote-only entry without going through the cache. Stand up an
in-process filer gRPC server (UnimplementedSeaweedFilerServer +
configurable CacheRemoteObjectToLocalCluster response) and exercise
the two cache helpers end-to-end.

What's pinned:
- cacheRemoteObjectForCopy returns nil when the cache makes no
  progress (response is still remote-only), lets gRPC errors
  through as nil, accepts both chunked and inline-content cache
  hits, and surfaces deadline-exceeded as nil so callers can 503
  instead of holding the request open.
- Versioned source paths route to .versions/v_<id>; non-versioned
  and "null" stay at the bucket-relative path. Captured by reading
  the request the stub server received.
- cacheRemoteObjectForStreaming holds the stricter chunks-only
  contract: a content-only cache hit is not propagated, since
  streamFromVolumeServers' downstream isn't wired to read from
  inline Content there.

Any current or future handler that calls these helpers exercises
the same gRPC path under test, so the bug class is closed for
helper-routed cache calls.

* move remote-only copy test into the integration suite

The previous gRPC-stub test in weed/s3api/ was integration-flavored
but stubbed; relocate the coverage to the existing two-server suite
under test/s3/remote_cache/, which already exercises the real
remote.mount + remote.uncache flow against a primary SeaweedFS plus
a secondary acting as remote storage.

The new test/s3/remote_cache/remote_cache_copy_test.go drives:
- TestRemoteCacheCopyObject: upload to primary, uncache (entry now
  remote-only), CopyObject to a new key, GET the destination. Pre-
  fix the GET returned 500 'data integrity error: size N reported
  but no content'; this pins the fixed behavior over real HTTP
  through the actual handler stack.
- TestRemoteCacheCopyObjectPart: same shape via multipart
  UploadPartCopy on a 6 MiB object split into two parts, exercising
  CopyObjectPartHandler's range-copy path.

Drop weed/s3api/s3api_remote_storage_grpc_test.go: the helper-level
classification tests in s3api_remote_storage_test.go still cover
the contract pieces (cachedEntryHasLocalData, resolvedSourceVersionId,
the remote-only entry shape), and the integration suite covers the
end-to-end behavior that those classifications enable.
2026-05-03 18:52:45 -07:00
..
2025-08-01 15:45:23 -07:00
2026-04-10 17:31:14 -07:00
2026-04-10 17:31:14 -07:00