* fix(s3): cache remote-only source before CopyObject (#9304)

  CopyObject from a remote.mount source whose object lives only upstream created a destination entry with FileSize > 0 but no chunks/content, because the resolved source entry has no local chunks and the copy path fell into the "inline/empty chunks" branch with empty entry.Content. A subsequent GET returned 500 with "data integrity error: size N reported but no content available". CopyObjectPart had the same shape via copyChunksForRange iterating an empty chunk list.

  Detect entry.IsInRemoteOnly() right after resolving the source in both CopyObjectHandler and CopyObjectPartHandler and cache the object to the local cluster first via a new cacheRemoteObjectForCopy helper (a copy-time analogue of cacheRemoteObjectForStreaming with a bounded 30s timeout and version-aware path resolution). If caching fails or produces no chunks, return 503 with Retry-After: 5 instead of writing a metadata-only destination, mirroring the GetObject behavior added in the #7817 cold-cache fix.

  Adds TestCopyObjectRemoteOnlySourceDetection pinning the four entry shapes the fix branches on plus the pre-fix broken-output shape.

* address PR review on remote-only copy fix

  - Use the resolved entry's version id when srcVersionId is empty so a CopyObject reading the latest object in a versioning-enabled bucket caches the correct .versions/v_<id> path instead of getting stuck in a 503 retry loop. New helper resolvedSourceVersionId handles the fallback for both CopyObject and CopyObjectPart.
  - Drop the redundant cachedEntry.IsInRemoteOnly() recheck in both handlers; the cache helper now reports success based on local data presence, and IsInRemoteOnly does not look at inline Content, so keeping the check would 503 on small inline-cached objects.
  - Treat inline Content as a successful cache result in both cacheRemoteObjectForStreaming and cacheRemoteObjectForCopy via a shared cachedEntryHasLocalData predicate. The CopyObject inline branch already handles entries that have Content but no chunks.
  - Extract buildVersionedRemoteObjectPath so the streaming and copy cache helpers share path construction.

  Adds TestResolvedSourceVersionId and TestCachedEntryHasLocalData to pin the new helpers' contracts.

* narrow streaming cache contract back to chunks-only

  CodeRabbit flagged that cacheRemoteObjectForStreaming's caller in streamFromVolumeServers (lines 997-1002) still required non-empty chunks, so content-only cache hits would fall through to a 503 retry loop instead of being honored. Resolve by keeping the helper's contract chunks-only: the filer's caching code only ever writes chunks, the streaming downstream isn't wired to read inline Content from a cached entry, and a partial range-aware inline writer here would be overkill for a path that doesn't actually occur in practice. cacheRemoteObjectForCopy keeps the relaxed contract since the copy path's inline branch genuinely handles both chunked and content-only entries. Document the asymmetry on cachedEntryHasLocalData and on cacheRemoteObjectForStreaming so a future reader can see why the two helpers diverge.

* extend version-id resolution to streaming cache path

  CodeRabbit flagged that GetObjectHandler still passed the raw query versionId to cacheRemoteObjectForStreaming. For latest-version reads in versioning-enabled buckets that stays empty even though the resolved entry lives at .versions/v_<id>, so remote-only GETs would keep caching the wrong path and 503-ing forever. Reuse the new resolvedSourceVersionId helper at the streaming call site too. Also document on cachedEntryHasLocalData that the zero-byte case flagged in the same review is handled upstream (IsInRemoteOnly requires RemoteSize > 0, so the cache helper is never invoked for empty remote objects -- CopyObject's pre-existing inline branch writes a correct empty destination directly). Pin this with a new test case.

* trim verbose comments

  Drop tutorial-style and review-history comments. Keep only the WHY that isn't obvious from identifiers: the #9304 reference on the new branches in CopyObject / CopyObjectPart, the latest-version-fallback rationale on resolvedSourceVersionId, and the streaming/copy contract asymmetry on cachedEntryHasLocalData.

* drop issue references from comments

  Issue numbers belong in PR descriptions and commit messages, not in source comments where they rot. Replace with the underlying invariant the code is preserving.

* test: drive remote-object cache helpers through real gRPC

  Existing tests only re-enacted helper-function branching in test space, so they could not have caught a handler that consumed a remote-only entry without going through the cache. Stand up an in-process filer gRPC server (UnimplementedSeaweedFilerServer + configurable CacheRemoteObjectToLocalCluster response) and exercise the two cache helpers end-to-end. What's pinned:

  - cacheRemoteObjectForCopy returns nil when the cache makes no progress (response is still remote-only), lets gRPC errors through as nil, accepts both chunked and inline-content cache hits, and surfaces deadline-exceeded as nil so callers can 503 instead of holding the request open.
  - Versioned source paths route to .versions/v_<id>; non-versioned and "null" stay at the bucket-relative path. Captured by reading the request the stub server received.
  - cacheRemoteObjectForStreaming holds the stricter chunks-only contract: a content-only cache hit is not propagated, since streamFromVolumeServers' downstream isn't wired to read from inline Content there.

  Any current or future handler that calls these helpers exercises the same gRPC path under test, so the bug class is closed for helper-routed cache calls.

* move remote-only copy test into the integration suite

  The previous gRPC-stub test in weed/s3api/ was integration-flavored but stubbed; relocate the coverage to the existing two-server suite under test/s3/remote_cache/, which already exercises the real remote.mount + remote.uncache flow against a primary SeaweedFS plus a secondary acting as remote storage. The new test/s3/remote_cache/remote_cache_copy_test.go drives:

  - TestRemoteCacheCopyObject: upload to primary, uncache (entry now remote-only), CopyObject to a new key, GET the destination. Pre-fix the GET returned 500 'data integrity error: size N reported but no content'; this pins the fixed behavior over real HTTP through the actual handler stack.
  - TestRemoteCacheCopyObjectPart: same shape via multipart UploadPartCopy on a 6 MiB object split into two parts, exercising CopyObjectPartHandler's range-copy path.

  Drop weed/s3api/s3api_remote_storage_grpc_test.go: the helper-level classification tests in s3api_remote_storage_test.go still cover the contract pieces (cachedEntryHasLocalData, resolvedSourceVersionId, the remote-only entry shape), and the integration suite covers the end-to-end behavior that those classifications enable.
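For orientation, a minimal sketch of the copy-path guard the first commit describes, under stated assumptions: ensureLocalCopySource and cacheToLocal are illustrative stand-ins for the real handler wiring and the cacheRemoteObjectForCopy call, and only the branch structure (detect a remote-only source, cache it first, answer 503 with Retry-After: 5 when caching makes no local progress) reflects the described fix. None of this is the production code.

package sketch

import (
	"net/http"

	"github.com/seaweedfs/seaweedfs/weed/pb/filer_pb"
)

// ensureLocalCopySource mirrors the described flow in CopyObjectHandler /
// CopyObjectPartHandler: a remote-only source is cached to the local cluster
// before any chunks are copied, and a cache attempt that yields neither chunks
// nor inline Content becomes a retryable 503 instead of a metadata-only
// destination entry. Names and plumbing here are hypothetical.
func ensureLocalCopySource(w http.ResponseWriter, entry *filer_pb.Entry,
	cacheToLocal func(*filer_pb.Entry) (*filer_pb.Entry, error)) (*filer_pb.Entry, bool) {

	if !entry.IsInRemoteOnly() {
		return entry, true // local data already present; nothing to do
	}
	cached, err := cacheToLocal(entry) // the real helper bounds this with a 30s timeout
	if err != nil || cached == nil ||
		(len(cached.GetChunks()) == 0 && len(cached.GetContent()) == 0) {
		// No local bytes materialized: tell the client to retry shortly.
		w.Header().Set("Retry-After", "5")
		http.Error(w, "remote object is being cached, retry shortly", http.StatusServiceUnavailable)
		return nil, false
	}
	return cached, true
}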
package s3api

import (
	"strings"
	"testing"

	"github.com/seaweedfs/seaweedfs/weed/filer"
	"github.com/seaweedfs/seaweedfs/weed/pb/filer_pb"
	"github.com/seaweedfs/seaweedfs/weed/s3api/s3_constants"
	"github.com/stretchr/testify/assert"
)

// TestIsInRemoteOnly tests the IsInRemoteOnly method on filer_pb.Entry
func TestIsInRemoteOnly(t *testing.T) {
	tests := []struct {
		name     string
		entry    *filer_pb.Entry
		expected bool
	}{
		{
			name: "remote-only entry with no chunks",
			entry: &filer_pb.Entry{
				Name:   "remote-file.txt",
				Chunks: nil,
				RemoteEntry: &filer_pb.RemoteEntry{
					RemoteSize: 1024,
				},
			},
			expected: true,
		},
		{
			name: "remote entry with chunks (cached)",
			entry: &filer_pb.Entry{
				Name: "cached-file.txt",
				Chunks: []*filer_pb.FileChunk{
					{FileId: "1,abc123", Size: 1024, Offset: 0},
				},
				RemoteEntry: &filer_pb.RemoteEntry{
					RemoteSize: 1024,
				},
			},
			expected: false,
		},
		{
			name: "local file with chunks (not remote)",
			entry: &filer_pb.Entry{
				Name: "local-file.txt",
				Chunks: []*filer_pb.FileChunk{
					{FileId: "1,abc123", Size: 1024, Offset: 0},
				},
				RemoteEntry: nil,
			},
			expected: false,
		},
		{
			name: "empty remote entry (size 0)",
			entry: &filer_pb.Entry{
				Name:   "empty-remote.txt",
				Chunks: nil,
				RemoteEntry: &filer_pb.RemoteEntry{
					RemoteSize: 0,
				},
			},
			expected: false,
		},
		{
			name: "no chunks but nil RemoteEntry",
			entry: &filer_pb.Entry{
				Name:        "empty-local.txt",
				Chunks:      nil,
				RemoteEntry: nil,
			},
			expected: false,
		},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			result := tt.entry.IsInRemoteOnly()
			assert.Equal(t, tt.expected, result,
				"IsInRemoteOnly() for %s should return %v", tt.name, tt.expected)
		})
	}
}

// TestRemoteOnlyEntryDetection tests that the streamFromVolumeServers logic
// correctly distinguishes between remote-only entries and data integrity errors
func TestRemoteOnlyEntryDetection(t *testing.T) {
	tests := []struct {
		name              string
		entry             *filer_pb.Entry
		shouldBeRemote    bool
		shouldBeDataError bool
		shouldBeEmpty     bool
	}{
		{
			name: "remote-only entry (no chunks, has remote entry)",
			entry: &filer_pb.Entry{
				Name:   "remote-file.txt",
				Chunks: nil,
				Attributes: &filer_pb.FuseAttributes{
					FileSize: 1024,
				},
				RemoteEntry: &filer_pb.RemoteEntry{
					RemoteSize: 1024,
				},
			},
			shouldBeRemote:    true,
			shouldBeDataError: false,
			shouldBeEmpty:     false,
		},
		{
			name: "data integrity error (no chunks, no remote, has size)",
			entry: &filer_pb.Entry{
				Name:   "corrupt-file.txt",
				Chunks: nil,
				Attributes: &filer_pb.FuseAttributes{
					FileSize: 1024,
				},
				RemoteEntry: nil,
			},
			shouldBeRemote:    false,
			shouldBeDataError: true,
			shouldBeEmpty:     false,
		},
		{
			name: "empty local file (no chunks, no remote, size 0)",
			entry: &filer_pb.Entry{
				Name:   "empty-file.txt",
				Chunks: nil,
				Attributes: &filer_pb.FuseAttributes{
					FileSize: 0,
				},
				RemoteEntry: nil,
			},
			shouldBeRemote:    false,
			shouldBeDataError: false,
			shouldBeEmpty:     true,
		},
		{
			name: "normal file with chunks",
			entry: &filer_pb.Entry{
				Name: "normal-file.txt",
				Chunks: []*filer_pb.FileChunk{
					{FileId: "1,abc123", Size: 1024, Offset: 0},
				},
				Attributes: &filer_pb.FuseAttributes{
					FileSize: 1024,
				},
				RemoteEntry: nil,
			},
			shouldBeRemote:    false,
			shouldBeDataError: false,
			shouldBeEmpty:     false,
		},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			chunks := tt.entry.GetChunks()
			totalSize := int64(filer.FileSize(tt.entry))

			if len(chunks) == 0 {
				// This mirrors the logic in streamFromVolumeServers
				if tt.entry.IsInRemoteOnly() {
					assert.True(t, tt.shouldBeRemote,
						"Entry should be detected as remote-only")
				} else if totalSize > 0 && len(tt.entry.Content) == 0 {
					assert.True(t, tt.shouldBeDataError,
						"Entry should be detected as data integrity error")
				} else {
					assert.True(t, tt.shouldBeEmpty,
						"Entry should be detected as empty")
				}
			} else {
				assert.False(t, tt.shouldBeRemote, "Entry with chunks should not be remote-only")
				assert.False(t, tt.shouldBeDataError, "Entry with chunks should not be data error")
				assert.False(t, tt.shouldBeEmpty, "Entry with chunks should not be empty")
			}
		})
	}
}

// TestVersionedRemoteObjectPathBuilding tests that the path building logic
// correctly handles versioned objects stored in the .versions/ directory
func TestVersionedRemoteObjectPathBuilding(t *testing.T) {
	bucketsPath := "/buckets"

	tests := []struct {
		name         string
		bucket       string
		object       string
		versionId    string
		expectedDir  string
		expectedName string
	}{
		{
			name:         "non-versioned object (empty versionId)",
			bucket:       "mybucket",
			object:       "myobject.txt",
			versionId:    "",
			expectedDir:  "/buckets/mybucket",
			expectedName: "myobject.txt",
		},
		{
			name:         "null version",
			bucket:       "mybucket",
			object:       "myobject.txt",
			versionId:    "null",
			expectedDir:  "/buckets/mybucket",
			expectedName: "myobject.txt",
		},
		{
			name:         "specific version",
			bucket:       "mybucket",
			object:       "myobject.txt",
			versionId:    "abc123",
			expectedDir:  "/buckets/mybucket/myobject.txt" + s3_constants.VersionsFolder,
			expectedName: "v_abc123",
		},
		{
			name:         "nested object with version",
			bucket:       "mybucket",
			object:       "folder/subfolder/file.txt",
			versionId:    "xyz789",
			expectedDir:  "/buckets/mybucket/folder/subfolder/file.txt" + s3_constants.VersionsFolder,
			expectedName: "v_xyz789",
		},
		{
			name:         "object with leading slash and version",
			bucket:       "mybucket",
			object:       "/path/to/file.txt",
			versionId:    "ver456",
			expectedDir:  "/buckets/mybucket/path/to/file.txt" + s3_constants.VersionsFolder,
			expectedName: "v_ver456",
		},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			var dir, name string

			// This mirrors the logic in cacheRemoteObjectForStreaming
			if tt.versionId != "" && tt.versionId != "null" {
				// Versioned object path
				normalizedObject := strings.TrimPrefix(removeDuplicateSlashesTest(tt.object), "/")
				dir = bucketsPath + "/" + tt.bucket + "/" + normalizedObject + s3_constants.VersionsFolder
				name = "v_" + tt.versionId
			} else {
				// Non-versioned path (simplified - just for testing)
				dir = bucketsPath + "/" + tt.bucket
				normalizedObject := strings.TrimPrefix(removeDuplicateSlashesTest(tt.object), "/")
				if idx := strings.LastIndex(normalizedObject, "/"); idx > 0 {
					dir = dir + "/" + normalizedObject[:idx]
					name = normalizedObject[idx+1:]
				} else {
					name = normalizedObject
				}
			}

			assert.Equal(t, tt.expectedDir, dir, "Directory path should match")
			assert.Equal(t, tt.expectedName, name, "Name should match")
		})
	}
}

// removeDuplicateSlashesTest is a test helper that mirrors production code
func removeDuplicateSlashesTest(s string) string {
	for strings.Contains(s, "//") {
		s = strings.ReplaceAll(s, "//", "/")
	}
	return s
}
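
// buildVersionedRemoteObjectPathMirror is an illustrative sketch, not the
// production code: the commit history says an extracted
// buildVersionedRemoteObjectPath helper shares this path construction between
// the streaming and copy cache paths, but its real signature is assumed here.
// The rules match what TestVersionedRemoteObjectPathBuilding pins above.
func buildVersionedRemoteObjectPathMirror(bucketsPath, bucket, object, versionId string) (dir, name string) {
	normalized := strings.TrimPrefix(removeDuplicateSlashesTest(object), "/")
	if versionId != "" && versionId != "null" {
		// Specific versions live under <object>/.versions as v_<versionId>.
		return bucketsPath + "/" + bucket + "/" + normalized + s3_constants.VersionsFolder, "v_" + versionId
	}
	// Non-versioned (or "null" version) objects stay at the bucket-relative path.
	dir = bucketsPath + "/" + bucket
	if idx := strings.LastIndex(normalized, "/"); idx > 0 {
		return dir + "/" + normalized[:idx], normalized[idx+1:]
	}
	return dir, normalized
}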

// TestResolvedSourceVersionId pins that latest-version reads in a versioning-
// enabled bucket fall back to the entry's recorded version id when the
// request carried none, so cache paths target .versions/v_<id> correctly.
func TestResolvedSourceVersionId(t *testing.T) {
	tests := []struct {
		name      string
		requested string
		entry     *filer_pb.Entry
		expected  string
	}{
		{
			name:      "explicit request versionId wins",
			requested: "abc123",
			entry: &filer_pb.Entry{
				Extended: map[string][]byte{
					s3_constants.ExtVersionIdKey: []byte("ignored"),
				},
			},
			expected: "abc123",
		},
		{
			name:      "empty request falls back to entry version (latest in versioned bucket)",
			requested: "",
			entry: &filer_pb.Entry{
				Extended: map[string][]byte{
					s3_constants.ExtVersionIdKey: []byte("xyz789"),
				},
			},
			expected: "xyz789",
		},
		{
			name:      "empty request and pre-versioning entry stays empty",
			requested: "",
			entry: &filer_pb.Entry{
				Extended: map[string][]byte{},
			},
			expected: "",
		},
		{
			name:      "empty request and nil Extended stays empty",
			requested: "",
			entry:     &filer_pb.Entry{Extended: nil},
			expected:  "",
		},
		{
			name:      "nil entry tolerated when no request version",
			requested: "",
			entry:     nil,
			expected:  "",
		},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			assert.Equal(t, tt.expected, resolvedSourceVersionId(tt.requested, tt.entry))
		})
	}
}
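
// resolvedSourceVersionIdMirror restates the contract pinned above as an
// illustrative sketch; it is not the production implementation. An explicit
// request version always wins; otherwise the version id recorded on the
// resolved entry is used, tolerating nil entries and missing metadata.
func resolvedSourceVersionIdMirror(requested string, entry *filer_pb.Entry) string {
	if requested != "" {
		return requested
	}
	if entry == nil {
		return ""
	}
	// Indexing a nil or empty Extended map yields "", matching the
	// pre-versioning cases in the table above.
	return string(entry.Extended[s3_constants.ExtVersionIdKey])
}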

func TestCachedEntryHasLocalData(t *testing.T) {
	tests := []struct {
		name     string
		entry    *filer_pb.Entry
		expected bool
	}{
		{
			name:     "nil entry is not a hit",
			entry:    nil,
			expected: false,
		},
		{
			name: "entry with chunks is a hit",
			entry: &filer_pb.Entry{
				Chunks: []*filer_pb.FileChunk{{FileId: "1,abc", Size: 10}},
			},
			expected: true,
		},
		{
			name: "entry with inline content is a hit",
			entry: &filer_pb.Entry{
				Content: []byte("small file body"),
			},
			expected: true,
		},
		{
			name:     "entry with neither chunks nor content is not a hit",
			entry:    &filer_pb.Entry{},
			expected: false,
		},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			assert.Equal(t, tt.expected, cachedEntryHasLocalData(tt.entry))
		})
	}
}
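
// cachedEntryHasLocalDataMirror restates the predicate contract pinned above
// as an illustrative sketch, not the production code: a cache attempt counts
// as a hit when the returned entry carries any local bytes, either as chunks
// or as inline Content. Per the commit history, callers apply it
// asymmetrically: the copy path accepts both forms, while the streaming path
// still requires chunks because its downstream is not wired to serve from
// inline Content.
func cachedEntryHasLocalDataMirror(entry *filer_pb.Entry) bool {
	if entry == nil {
		return false
	}
	return len(entry.GetChunks()) > 0 || len(entry.GetContent()) > 0
}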

// TestCopyObjectRemoteOnlySourceDetection guards against regressing the
// remote-only source case: such a source used to fall through to
// CopyObject's inline branch with empty Content, producing a destination
// with FileSize > 0 but no chunks.
func TestCopyObjectRemoteOnlySourceDetection(t *testing.T) {
	tests := []struct {
		name                   string
		entry                  *filer_pb.Entry
		expectRemoteOnly       bool
		expectInlineBranchHit  bool
		expectBrokenWithoutFix bool
	}{
		{
			name: "remote-only object with size and no chunks/content",
			entry: &filer_pb.Entry{
				Name: "file-1234-audio.mp3",
				Attributes: &filer_pb.FuseAttributes{
					FileSize: 16018804,
				},
				Chunks:  nil,
				Content: nil,
				RemoteEntry: &filer_pb.RemoteEntry{
					RemoteSize: 16018804,
				},
			},
			expectRemoteOnly:       true,
			expectInlineBranchHit:  true,
			expectBrokenWithoutFix: true,
		},
		{
			name: "local file with chunks - copy works fine, fix does not engage",
			entry: &filer_pb.Entry{
				Name: "local.bin",
				Attributes: &filer_pb.FuseAttributes{
					FileSize: 1024,
				},
				Chunks: []*filer_pb.FileChunk{
					{FileId: "1,abc", Size: 1024, Offset: 0},
				},
			},
			expectRemoteOnly:       false,
			expectInlineBranchHit:  false,
			expectBrokenWithoutFix: false,
		},
		{
			name: "small inline file (no chunks, has Content) - hits inline branch but not broken",
			entry: &filer_pb.Entry{
				Name: "tiny.txt",
				Attributes: &filer_pb.FuseAttributes{
					FileSize: 5,
				},
				Content: []byte("hello"),
			},
			expectRemoteOnly:       false,
			expectInlineBranchHit:  true,
			expectBrokenWithoutFix: false,
		},
		{
			name: "remote entry already cached (has chunks) - fix does not engage",
			entry: &filer_pb.Entry{
				Name: "cached.dat",
				Attributes: &filer_pb.FuseAttributes{
					FileSize: 2048,
				},
				Chunks: []*filer_pb.FileChunk{
					{FileId: "2,def", Size: 2048, Offset: 0},
				},
				RemoteEntry: &filer_pb.RemoteEntry{
					RemoteSize: 2048,
				},
			},
			expectRemoteOnly:       false,
			expectInlineBranchHit:  false,
			expectBrokenWithoutFix: false,
		},
		{
			// Zero-byte remote objects must not enter the cache branch:
			// IsInRemoteOnly requires RemoteSize > 0, so CopyObject's
			// pre-existing inline branch handles them with no 503.
			name: "zero-byte remote object - fix does not engage, inline branch handles it",
			entry: &filer_pb.Entry{
				Name: "empty-on-remote.txt",
				Attributes: &filer_pb.FuseAttributes{
					FileSize: 0,
				},
				RemoteEntry: &filer_pb.RemoteEntry{
					RemoteSize: 0,
				},
			},
			expectRemoteOnly:       false,
			expectInlineBranchHit:  true,
			expectBrokenWithoutFix: false,
		},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			assert.Equal(t, tt.expectRemoteOnly, tt.entry.IsInRemoteOnly())

			// Mirror the inline branch in s3api_object_handlers_copy.go:
			// if entry.Attributes.FileSize == 0 || len(entry.GetChunks()) == 0
			inlineBranchHit := tt.entry.Attributes != nil &&
				(tt.entry.Attributes.FileSize == 0 || len(tt.entry.GetChunks()) == 0)
			assert.Equal(t, tt.expectInlineBranchHit, inlineBranchHit)

			// The broken shape: inline branch fires, no inline content, FileSize > 0.
			brokenWithoutFix := inlineBranchHit &&
				len(tt.entry.Content) == 0 &&
				tt.entry.Attributes != nil &&
				tt.entry.Attributes.FileSize > 0
			assert.Equal(t, tt.expectBrokenWithoutFix, brokenWithoutFix)
		})
	}
}