Mirror of https://github.com/seaweedfs/seaweedfs.git
* feat(filer.backup): -initialSnapshot seeds destination from live tree

  Replaying the metadata event log on a fresh sync only leaves files that still exist on the source at replay time: any entry that was created and later deleted is replayed as a create/delete pair and never materializes on the destination. Users who wipe the destination and re-run filer.backup therefore see "only new files" instead of a full backup, even when -timeAgo=876000h is passed and the subscription genuinely starts from epoch (ref discussion #8672).

  Add a -initialSnapshot opt-in flag: when set on a fresh sync (no prior checkpoint, -timeAgo unset), walk the live filer tree under -filerPath via TraverseBfs and seed the destination through sink.CreateEntry, then persist the walk-start timestamp as the checkpoint and subscribe from there. Capturing the timestamp before the walk lets the subscription catch any create/update/delete racing with the walk — sink CreateEntry is idempotent across the builtin sinks so replay is safe.

  Honors existing -filerExcludePaths / -filerExcludeFileNames / -filerExcludePathPatterns filters and skips /topics/.system/log the same way the subscription path does. Also log "starting from <t> (no prior checkpoint)" instead of a misleading "resuming from 1970-01-01" when the KV has no stored offset.

* fix(filer.backup): guard initialSnapshot counters under TraverseBfs workers

  TraverseBfs fans the callback out across 5 worker goroutines, so the entryCount / byteCount updates and the 5-second progress-log gate in runInitialSnapshot were racing. Switch the counters to atomic.Int64 and protect the lastLog check/update with a short-scoped mutex so the heavy sink.CreateEntry call stays outside the critical section. Flagged by gemini-code-assist on #9126; verified with go test -race.

* fix(filer.backup): harden initialSnapshot against transient errors and path edge cases

  Three review items from CodeRabbit on #9126:

  1. getOffset errors no longer leave isFreshSync=true. Before, a transient KV read failure would cause runFilerBackup's retry loop to redo the full -initialSnapshot walk on every retry. Treat any offset-read error as "not fresh" so the snapshot only runs when we've verified there really is no prior checkpoint.
  2. initialSnapshotTargetKey now normalizes sourcePath to a trailing-slash base before stripping the prefix, so edge cases where sourceKey equals sourcePath (trailing-slash mismatch or root-entry emission) no longer index past the end. Unit tests cover both forms.
  3. Documented the TraverseBfs-enumerates-excluded-subtrees performance characteristic on runInitialSnapshot, since pruning requires a separate change to TraverseBfs itself.

* fix(filer.backup): retry setOffset after initialSnapshot to avoid full re-walks

  If the snapshot walk finishes but the subsequent setOffset fails, the retry loop in runFilerBackup will re-enter doFilerBackup with an empty checkpoint and run the full BFS again — on a multi-million-entry tree that's hours of wasted work over a 100-byte KV write. Retry the write a handful of times with exponential backoff before giving up, and log loudly at the final failure (with snapshotTsNs + sinkId) so operators recognize the symptom instead of guessing at mysterious repeated walks. Nitpick raised by CodeRabbit on #9126.

* fix(filer.backup): initialSnapshot ignore404, skew margin, exclude dir-entry itself

  Three review items from CodeRabbit on #9126:

  1. ignore404Error now threads into runInitialSnapshot. If a file is listed by TraverseBfs and then deleted before CreateEntry reads its chunks, the follow path already ignores 404s — the snapshot path was aborting and triggering a full re-walk. Treat an ignorable 404 as "skip this entry, continue."
  2. snapshotTsNs now uses `time.Now() - 1min` instead of `time.Now()`. Metadata events are stamped server-side, so a fast backup-host clock could skip events that fire during or right after the walk. Matches the 1-minute margin meta_aggregator.go applies on initial peer traversal; duplicate replay is harmless because CreateEntry is idempotent.
  3. Exclude checks now run against the entry's own full path, not just its parent. A walked directory whose full path matches SystemLogDir or -filerExcludePaths was being seeded to the destination; only its descendants were being skipped. Verified with a manual repro where -filerExcludePaths=/data/skipdir now keeps the skipdir entry itself off the destination.

* refactor(filer): share destKey helper between buildKey and initialSnapshot

  Extract destKey(dataSink, targetPath, sourcePath, sourceKey, mTime) from buildKey in filer_sync.go. Both the event-log path (buildKey) and the initialSnapshot walk (initialSnapshotTargetKey) now go through the same helper, so a walk-seeded file and an event-replayed file always resolve to the same destination key. As a bonus, buildKey picks up the defensive trailing-slash normalization that initialSnapshotTargetKey introduced — no more index-past-end risk when sourceKey happens to equal sourcePath. Also tightens the mTime lookup to guard against nil Attributes (caught by an existing test against buildKey when I first moved the lookup out of the incremental branch).
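The commits above lean on a few small, general patterns: stamp-before-walk ordering, atomic counters with a short-lived lock, a bounded retry, and prefix-safe key building. The sketches below illustrate those patterns only; every identifier in them is made up for the illustration, and the real implementations live in weed/command/filer_backup.go and weed/command/filer_sync.go and may differ in detail.

First, the seed-then-subscribe ordering with the 1-minute skew margin from the first and fifth commits; the walk, per-entry seeding, checkpoint write, and subscription are passed in as closures so the sketch stays self-contained:

package sketch

import (
	"fmt"
	"time"
)

// seedThenSubscribe is a minimal sketch of the initial-snapshot ordering:
// capture the subscribe-from timestamp BEFORE walking the live tree, so any
// create/update/delete racing with the walk is re-delivered afterwards.
func seedThenSubscribe(
	walk func(visit func(path string) error) error, // TraverseBfs-style walk over the live tree
	seedEntry func(path string) error, // CreateEntry on the destination sink for one entry
	persistCheckpoint func(tsNs int64) error, // setOffset-style checkpoint write
	subscribeFrom func(tsNs int64) error, // metadata subscription starting at tsNs
) error {
	// Stamp the checkpoint before the walk, minus a skew margin: metadata
	// events are stamped server-side, so a fast local clock must not cause
	// events fired during the walk to be skipped.
	snapshotTsNs := time.Now().Add(-time.Minute).UnixNano()

	if err := walk(seedEntry); err != nil {
		return fmt.Errorf("initial snapshot walk: %w", err)
	}
	if err := persistCheckpoint(snapshotTsNs); err != nil {
		return fmt.Errorf("persist snapshot checkpoint: %w", err)
	}
	// Anything that changed during the walk is re-delivered from snapshotTsNs
	// and re-applied; that is safe only because seeding is idempotent.
	return subscribeFrom(snapshotTsNs)
}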
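The counter race fix from the second commit is the standard atomics-plus-short-lock shape; a minimal sketch, assuming the 5-second log interval named in the commit (progress and record are illustrative names, not the filer_backup.go identifiers):

package sketch

import (
	"log"
	"sync"
	"sync/atomic"
	"time"
)

type progress struct {
	entryCount atomic.Int64
	byteCount  atomic.Int64

	mu      sync.Mutex
	lastLog time.Time
}

// record is safe to call from several walk workers at once: the counters are
// atomic, and only the cheap "is it time to log?" decision takes the mutex,
// so the expensive per-entry work can stay outside the critical section.
func (p *progress) record(sizeBytes int64) {
	entries := p.entryCount.Add(1)
	total := p.byteCount.Add(sizeBytes)

	p.mu.Lock()
	shouldLog := time.Since(p.lastLog) >= 5*time.Second
	if shouldLog {
		p.lastLog = time.Now()
	}
	p.mu.Unlock()

	if shouldLog {
		log.Printf("initial snapshot progress: %d entries, %d bytes", entries, total)
	}
}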
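The first hardening item is easy to invert, so here is the decision it describes as a tiny sketch; readCheckpoint and timeAgoSet are stand-ins for the real getOffset call and -timeAgo flag wiring, not their actual signatures:

package sketch

// isFreshSyncSketch: only a successful "no checkpoint found" read makes the
// sync fresh; a read error is surfaced instead of being treated as fresh, so a
// transient KV failure cannot retrigger the snapshot walk on every retry.
func isFreshSyncSketch(readCheckpoint func() (tsNs int64, found bool, err error), timeAgoSet bool) (bool, error) {
	_, found, err := readCheckpoint()
	if err != nil {
		return false, err // unknown state: treat as not fresh
	}
	return !found && !timeAgoSet, nil
}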
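The setOffset retry from the fourth commit is a plain bounded exponential backoff around the checkpoint write; a sketch of that pattern with illustrative attempt counts and delays (the values filer_backup.go actually uses may differ):

package sketch

import (
	"fmt"
	"time"
)

// persistWithRetry retries a small checkpoint write a few times before giving
// up, so a transient failure after an hours-long walk does not force the whole
// walk to be redone on the next retry of the backup loop.
func persistWithRetry(write func() error, attempts int) error {
	delay := time.Second
	var err error
	for i := 0; i < attempts; i++ {
		if err = write(); err == nil {
			return nil
		}
		time.Sleep(delay)
		delay *= 2 // exponential backoff between attempts
	}
	return fmt.Errorf("persisting snapshot checkpoint failed after %d attempts: %w", attempts, err)
}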
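Finally, the destination-key contract that the shared destKey helper and TestInitialSnapshotTargetKey (in the file below) describe, written purely from the behavior the test pins down; the real helper in filer_sync.go may be structured differently:

package sketch

import (
	"path"
	"strings"
	"time"
)

// destKeySketch strips the source prefix against a trailing-slash-normalized
// base (so sourceKey == sourcePath yields the bare targetPath instead of
// indexing past the end), and for incremental sinks inserts a local-time
// YYYY-MM-DD partition derived from the entry mtime, matching buildKey.
func destKeySketch(isIncremental bool, targetPath, sourcePath, sourceKey string, mtime int64) string {
	base := strings.TrimSuffix(sourcePath, "/") + "/" // normalize to a trailing-slash base

	rel := ""
	if sourceKey != strings.TrimSuffix(sourcePath, "/") {
		rel = strings.TrimPrefix(sourceKey, base)
	}

	if isIncremental {
		day := time.Unix(mtime, 0).Format("2006-01-02") // local time, as buildKey does
		return path.Join(targetPath, day, rel)
	}
	return path.Join(targetPath, rel)
}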
142 lines
5.8 KiB
Go
package command

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"os"
	"testing"
	"time"

	"github.com/seaweedfs/seaweedfs/weed/pb/filer_pb"
	"github.com/seaweedfs/seaweedfs/weed/replication/sink"
	"github.com/seaweedfs/seaweedfs/weed/replication/source"
	"github.com/seaweedfs/seaweedfs/weed/util"
	util_http "github.com/seaweedfs/seaweedfs/weed/util/http"
)

func TestMain(m *testing.M) {
	util_http.InitGlobalHttpClient()
	os.Exit(m.Run())
}

// readUrlError starts a test HTTP server returning the given status code
// and returns the error produced by ReadUrlAsStream.
//
// The error format is defined in ReadUrlAsStream:
// https://github.com/seaweedfs/seaweedfs/blob/3a765df2ff90839acb9acf910b73513417fa84d1/weed/util/http/http_global_client_util.go#L353
func readUrlError(t *testing.T, statusCode int) error {
	t.Helper()
	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.Error(w, http.StatusText(statusCode), statusCode)
	}))
	defer server.Close()

	_, err := util_http.ReadUrlAsStream(context.Background(),
		server.URL+"/437,03f591a3a2b95e?readDeleted=true", "",
		nil, false, true, 0, 1024, func(data []byte) {})
	if err == nil {
		t.Fatal("expected error from ReadUrlAsStream, got nil")
	}
	return err
}

func TestIsIgnorable404_WrappedErrNotFound(t *testing.T) {
	readErr := readUrlError(t, http.StatusNotFound)
	// genProcessFunction wraps sink errors with %w:
	// https://github.com/seaweedfs/seaweedfs/blob/3a765df2ff90839acb9acf910b73513417fa84d1/weed/command/filer_sync.go#L496
	genErr := fmt.Errorf("create entry1 : %w", readErr)

	if !isIgnorable404(genErr) {
		t.Errorf("expected ignorable, got not: %v", genErr)
	}
}

func TestIsIgnorable404_BrokenUnwrapChain(t *testing.T) {
	readErr := readUrlError(t, http.StatusNotFound)
	// AWS SDK v1 wraps transport errors via awserr.New which uses origErr.Error()
	// instead of %w, so errors.Is cannot unwrap through it:
	// https://github.com/aws/aws-sdk-go/blob/v1.55.8/aws/corehandlers/handlers.go#L173
	// https://github.com/aws/aws-sdk-go/blob/v1.55.8/aws/awserr/types.go#L15
	awsSdkErr := fmt.Errorf("RequestError: send request failed\n"+
		"caused by: Put \"https://s3.amazonaws.com/bucket/key\": %s", readErr.Error())
	genErr := fmt.Errorf("create entry1 : %w", awsSdkErr)

	if !isIgnorable404(genErr) {
		t.Errorf("expected ignorable, got not: %v", genErr)
	}
}

func TestIsIgnorable404_NonIgnorableError(t *testing.T) {
	readErr := readUrlError(t, http.StatusForbidden)
	genErr := fmt.Errorf("create entry1 : %w", readErr)

	if isIgnorable404(genErr) {
		t.Errorf("expected not ignorable, got ignorable: %v", genErr)
	}
}

// stubSink is a minimal ReplicationSink used to exercise initialSnapshotTargetKey
// without standing up a real sink. Only the two methods read by the key builder
// (GetName, IsIncremental) need meaningful behavior; the rest satisfy the interface.
type stubSink struct {
	name          string
	isIncremental bool
}

func (s *stubSink) GetName() string { return s.name }
func (s *stubSink) Initialize(util.Configuration, string) error { return nil }
func (s *stubSink) DeleteEntry(string, bool, bool, []int32) error {
	return nil
}
func (s *stubSink) CreateEntry(string, *filer_pb.Entry, []int32) error { return nil }
func (s *stubSink) UpdateEntry(string, *filer_pb.Entry, string, *filer_pb.Entry, bool, []int32) (bool, error) {
	return false, nil
}
func (s *stubSink) GetSinkToDirectory() string { return "" }
func (s *stubSink) SetSourceFiler(*source.FilerSource) {}
func (s *stubSink) IsIncremental() bool { return s.isIncremental }

var _ sink.ReplicationSink = (*stubSink)(nil)

func TestInitialSnapshotTargetKey(t *testing.T) {
	// Mirror the non-incremental path of buildKey so a refactor of one without
	// the other will fail this test.
	mirror := &stubSink{name: "mirror", isIncremental: false}
	got := initialSnapshotTargetKey(mirror, "/backup", "/data", util.FullPath("/data/sub/file.txt"), &filer_pb.Entry{})
	if got != "/backup/sub/file.txt" {
		t.Errorf("mirror sink: got %q, want %q", got, "/backup/sub/file.txt")
	}

	// Incremental sinks partition by entry mtime, so the seed must use the same
	// YYYY-MM-DD prefix a replayed CreateEntry would produce. buildKey in
	// filer_sync.go formats the date in local time, so compute the expected
	// key the same way to keep the test timezone-independent.
	inc := &stubSink{name: "inc", isIncremental: true}
	mtime := int64(1704196800) // 2024-01-02T12:00:00 UTC — unambiguously Jan 2 in nearly all timezones
	gotInc := initialSnapshotTargetKey(inc, "/backup", "/data", util.FullPath("/data/sub/file.txt"), &filer_pb.Entry{
		Attributes: &filer_pb.FuseAttributes{Mtime: mtime},
	})
	wantInc := "/backup/" + time.Unix(mtime, 0).Format("2006-01-02") + "/sub/file.txt"
	if gotInc != wantInc {
		t.Errorf("incremental sink: got %q, want %q", gotInc, wantInc)
	}

	// Trailing-slash sourcePath still produces a clean relative key.
	gotTrail := initialSnapshotTargetKey(mirror, "/backup", "/data/", util.FullPath("/data/file.txt"), &filer_pb.Entry{})
	if gotTrail != "/backup/file.txt" {
		t.Errorf("trailing-slash sourcePath: got %q, want %q", gotTrail, "/backup/file.txt")
	}

	// Edge cases CodeRabbit called out: sourceKey equal to sourcePath
	// (non-trailing and trailing variants). Real TraverseBfs walks never emit
	// the root itself, but the helper must not panic if something else does.
	if got := initialSnapshotTargetKey(mirror, "/backup", "/data", util.FullPath("/data"), &filer_pb.Entry{}); got != "/backup" {
		t.Errorf("sourceKey == sourcePath (no slash): got %q, want %q", got, "/backup")
	}
	if got := initialSnapshotTargetKey(mirror, "/backup", "/data/", util.FullPath("/data"), &filer_pb.Entry{}); got != "/backup" {
		t.Errorf("sourceKey == sourcePath (trailing slash mismatch): got %q, want %q", got, "/backup")
	}
}