seaweedfs/weed/command
Chris Lu c1ccbe97dd feat(filer.backup): -initialSnapshot seeds destination from live tree (#9126)
* feat(filer.backup): -initialSnapshot seeds destination from live tree

Replaying the metadata event log on a fresh sync only leaves files that
still exist on the source at replay time: any entry that was created and
later deleted is replayed as a create/delete pair and never materializes
on the destination. Users who wipe the destination and re-run
filer.backup therefore see "only new files" instead of a full backup,
even when -timeAgo=876000h is passed and the subscription genuinely
starts from epoch (ref discussion #8672).

Add an opt-in -initialSnapshot flag: when set on a fresh sync (no prior
checkpoint, -timeAgo unset), walk the live filer tree under -filerPath
via TraverseBfs and seed the destination through sink.CreateEntry, then
persist the walk-start timestamp as the checkpoint and subscribe from
there. Capturing the timestamp before the walk lets the subscription
catch any create/update/delete racing with the walk — sink CreateEntry
is idempotent across the builtin sinks so replay is safe.
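
The ordering described above can be sketched with an in-memory stand-in. Everything here (`memSink`, `event`, `seedThenReplay`) is hypothetical and is not the SeaweedFS API; it only illustrates why a checkpoint captured before the walk, combined with an idempotent create, makes overlapping replay safe.

```go
package main

import "fmt"

// event is a hypothetical stand-in for one metadata log event.
type event struct {
	tsNs    int64
	path    string
	deleted bool
}

// memSink is a hypothetical idempotent sink: creating the same path twice
// leaves a single entry.
type memSink map[string]bool

func (s memSink) CreateEntry(path string) { s[path] = true }
func (s memSink) DeleteEntry(path string) { delete(s, path) }

// seedThenReplay seeds the sink from the live tree, then replays every event
// stamped at or after snapshotTsNs, which is assumed to have been captured
// before the walk began.
func seedThenReplay(sink memSink, liveTree []string, log []event, snapshotTsNs int64) {
	for _, p := range liveTree {
		sink.CreateEntry(p)
	}
	for _, ev := range log {
		if ev.tsNs < snapshotTsNs {
			continue // older than the checkpoint: already reflected in the walk
		}
		if ev.deleted {
			sink.DeleteEntry(ev.path)
		} else {
			sink.CreateEntry(ev.path) // may repeat a walked entry; harmless
		}
	}
}

func main() {
	sink := memSink{}
	// /a predates the checkpoint; /b was created during the walk and was also
	// seen by it; /c was walked and then deleted.
	live := []string{"/a", "/b", "/c"}
	log := []event{
		{tsNs: 50, path: "/a"},
		{tsNs: 120, path: "/b"},
		{tsNs: 130, path: "/c", deleted: true},
	}
	seedThenReplay(sink, live, log, 100)
	fmt.Println(len(sink), sink["/a"], sink["/b"], sink["/c"]) // 2 true true false
}
```

The duplicate create for /b and the late delete for /c both converge on the source's final state, which is the property the flag relies on.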

Honors existing -filerExcludePaths / -filerExcludeFileNames /
-filerExcludePathPatterns filters and skips /topics/.system/log the
same way the subscription path does.

Also log "starting from <t> (no prior checkpoint)" instead of a
misleading "resuming from 1970-01-01" when the KV has no stored offset.

* fix(filer.backup): guard initialSnapshot counters under TraverseBfs workers

TraverseBfs fans the callback out across 5 worker goroutines, so the
entryCount / byteCount updates and the 5-second progress-log gate in
runInitialSnapshot were racing. Switch the counters to atomic.Int64 and
protect the lastLog check/update with a short-scoped mutex so the heavy
sink.CreateEntry call stays outside the critical section.

Flagged by gemini-code-assist on #9126; verified with go test -race.

* fix(filer.backup): harden initialSnapshot against transient errors and path edge cases

Three review items from CodeRabbit on #9126:

1. getOffset errors no longer leave isFreshSync=true. Before, a transient
   KV read failure would cause runFilerBackup's retry loop to redo the
   full -initialSnapshot walk on every retry. Treat any offset-read
   error as "not fresh" so the snapshot only runs when we've verified
   there really is no prior checkpoint.

2. initialSnapshotTargetKey now normalizes sourcePath to a trailing-
   slash base before stripping the prefix, so edge cases where
   sourceKey equals sourcePath (trailing-slash mismatch or root-entry
   emission) no longer index past the end. Unit tests cover both
   forms.

3. Documented the TraverseBfs-enumerates-excluded-subtrees performance
   characteristic on runInitialSnapshot, since pruning requires a
   separate change to TraverseBfs itself.
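
The trailing-slash normalization in item 2 can be sketched as follows. `targetKey` is a hypothetical stand-in written for illustration, not initialSnapshotTargetKey itself; it assumes sourceKey is at or under sourcePath and uses prefix trimming instead of slicing, so the equal-path case cannot index out of range.

```go
package main

import (
	"fmt"
	"strings"
)

// targetKey maps sourceKey (a path at or under sourcePath) onto targetPath.
func targetKey(targetPath, sourcePath, sourceKey string) string {
	// Normalize the source root to exactly one trailing slash.
	base := strings.TrimSuffix(sourcePath, "/") + "/"
	// TrimPrefix returns its input unchanged when the prefix is absent, so
	// sourceKey == sourcePath (with or without the slash) is handled safely.
	rel := strings.TrimPrefix(sourceKey+"/", base)
	rel = strings.Trim(rel, "/")
	if rel == "" {
		return targetPath // the root entry maps to the target root
	}
	return strings.TrimSuffix(targetPath, "/") + "/" + rel
}

func main() {
	fmt.Println(targetKey("/backup", "/data", "/data/a/b.txt")) // /backup/a/b.txt
	fmt.Println(targetKey("/backup", "/data/", "/data"))        // /backup
	fmt.Println(targetKey("/backup", "/data", "/data"))         // /backup
}
```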

* fix(filer.backup): retry setOffset after initialSnapshot to avoid full re-walks

If the snapshot walk finishes but the subsequent setOffset fails, the
retry loop in runFilerBackup will re-enter doFilerBackup with an empty
checkpoint and run the full BFS again — on a multi-million-entry tree
that's hours of wasted work over a 100-byte KV write. Retry the write a
handful of times with exponential backoff before giving up, and log
loudly at the final failure (with snapshotTsNs + sinkId) so operators
recognize the symptom instead of guessing at mysterious repeated walks.

Nitpick raised by CodeRabbit on #9126.

* fix(filer.backup): initialSnapshot ignore404, skew margin, exclude dir-entry itself

Three review items from CodeRabbit on #9126:

1. ignore404Error now threads into runInitialSnapshot. If a file is listed
   by TraverseBfs and then deleted before CreateEntry reads its chunks,
   the follow path already ignores 404s — the snapshot path was aborting
   and triggering a full re-walk. Treat an ignorable 404 as "skip this
   entry, continue."

2. snapshotTsNs now uses `time.Now() - 1min` instead of `time.Now()`.
   Metadata events are stamped server-side, so a fast backup-host clock
   could skip events that fire during or right after the walk. Matches
   the 1-minute margin meta_aggregator.go applies on initial peer
   traversal; duplicate replay is harmless because CreateEntry is
   idempotent.

3. Exclude checks now run against the entry's own full path, not just
   its parent. A walked directory whose full path matches SystemLogDir
   or -filerExcludePaths was being seeded to the destination; only its
   descendants were being skipped. Verified with a manual repro where
   -filerExcludePaths=/data/skipdir now keeps the skipdir entry itself
   off the destination.
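
Item 3 comes down to testing the entry's own full path rather than only its parent directory. A minimal sketch, with hypothetical helpers `isUnder` and `shouldSkip` that are not taken from the SeaweedFS source:

```go
package main

import (
	"fmt"
	"strings"
)

// isUnder reports whether fullPath is dir itself or a descendant of dir.
func isUnder(fullPath, dir string) bool {
	dir = strings.TrimSuffix(dir, "/")
	return fullPath == dir || strings.HasPrefix(fullPath, dir+"/")
}

// shouldSkip joins parent and name into the entry's own full path and tests
// that path against each excluded directory.
func shouldSkip(parent, name string, excludes []string) bool {
	fullPath := strings.TrimSuffix(parent, "/") + "/" + name
	for _, ex := range excludes {
		if isUnder(fullPath, ex) {
			return true
		}
	}
	return false
}

func main() {
	excludes := []string{"/data/skipdir"}
	fmt.Println(shouldSkip("/data", "skipdir", excludes))       // true: the directory entry itself
	fmt.Println(shouldSkip("/data/skipdir", "f.txt", excludes)) // true: a descendant
	fmt.Println(shouldSkip("/data", "skipdir2", excludes))      // false: only shares a name prefix
}
```

Comparing against `dir + "/"` rather than a bare prefix keeps sibling names such as skipdir2 from being excluded by accident.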

* refactor(filer): share destKey helper between buildKey and initialSnapshot

Extract destKey(dataSink, targetPath, sourcePath, sourceKey, mTime) from
buildKey in filer_sync.go. Both the event-log path (buildKey) and the
initialSnapshot walk (initialSnapshotTargetKey) now go through the same
helper, so a walk-seeded file and an event-replayed file always resolve
to the same destination key.

As a bonus, buildKey picks up the defensive trailing-slash normalization
that initialSnapshotTargetKey introduced — no more index-past-end risk
when sourceKey happens to equal sourcePath. Also tightens the mTime
lookup to guard against nil Attributes (caught by an existing test
against buildKey when I first moved the lookup out of the incremental
branch).
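
The nil-Attributes guard on the mTime lookup can be illustrated with trimmed-down, hypothetical types; `entry`, `attributes`, and `entryMtime` below are assumptions for the sketch and do not mirror the real protobuf definitions.

```go
package main

import "fmt"

// Hypothetical, trimmed-down versions of the entry types.
type attributes struct{ Mtime int64 }
type entry struct{ Attributes *attributes }

// entryMtime prefers the new entry's mtime, falls back to the old entry's,
// and returns 0 when neither entry carries attributes.
func entryMtime(newEntry, oldEntry *entry) int64 {
	if newEntry != nil && newEntry.Attributes != nil {
		return newEntry.Attributes.Mtime
	}
	if oldEntry != nil && oldEntry.Attributes != nil {
		return oldEntry.Attributes.Mtime
	}
	return 0
}

func main() {
	withAttrs := &entry{Attributes: &attributes{Mtime: 42}}
	bare := &entry{} // Attributes is nil

	fmt.Println(entryMtime(withAttrs, nil)) // 42
	fmt.Println(entryMtime(bare, withAttrs)) // 42, falls back to the old entry
	fmt.Println(entryMtime(bare, nil))       // 0, no dereference of nil Attributes
}
```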
2026-04-17 21:21:32 -07:00