Chris Lu 75dcb97187 filer: bootstrap pre-existing metadata when a new filer joins (#8979)
* filer: bootstrap pre-existing metadata when a new filer joins a cluster

When a filer connects to a peer for the first time (no stored sync
offset), it now does a full BFS traversal of the peer's metadata via
TraverseBfsMetadata before starting the incremental change stream.
This ensures the joining filer (filer2 in the report) sees all data
that existed before it started, fixing the issue where only
post-startup changes were synced.

Closes #8961
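
The order of operations described above can be sketched as follows. This is a minimal illustration, not the filer's actual code: `peerTree`, `bfsPaths`, and `syncPlan` are hypothetical names, and an in-memory map stands in for the peer's metadata that TraverseBfsMetadata would return.

```go
package main

import "fmt"

// peerTree is a hypothetical in-memory stand-in for a peer's metadata:
// it maps a directory path to the names of its children.
type peerTree map[string][]string

// join builds a child path without doubling the slash under the root.
func join(dir, name string) string {
	if dir == "/" {
		return "/" + name
	}
	return dir + "/" + name
}

// bfsPaths visits every path reachable from root in breadth-first order.
func bfsPaths(tree peerTree, root string) []string {
	visited := []string{root}
	queue := []string{root}
	for len(queue) > 0 {
		dir := queue[0]
		queue = queue[1:]
		for _, name := range tree[dir] {
			child := join(dir, name)
			visited = append(visited, child)
			queue = append(queue, child)
		}
	}
	return visited
}

// syncPlan returns the steps to run: a full traversal is needed only
// when no sync offset has been stored for this peer yet.
func syncPlan(hasStoredOffset bool) []string {
	if hasStoredOffset {
		return []string{"stream"}
	}
	return []string{"traverse", "stream"}
}

func main() {
	tree := peerTree{"/": {"a", "b"}, "/a": {"x"}}
	fmt.Println(syncPlan(false), bfsPaths(tree, "/"))
}
```

Breadth-first order means a parent directory is always visited before its children, so the receiving side never sees an entry before the directory that contains it.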

* filer: upsert during bootstrap and persist offset immediately

- Use upsert (insert, then update on conflict) during metadata
  traversal so the bootstrap doesn't fail on the root directory
  or after a partial previous attempt.
- Persist the sync offset right after a successful traversal so
  a retry doesn't redo the full BFS.

* filer: address review feedback on metadata bootstrap

- Use peer-side max Mtime as the streaming cursor instead of local
  time.Now() to avoid missing events due to clock skew between filers.
  traversePeerMetadata now returns the high-water Mtime (nanoseconds)
  observed during BFS traversal.

- Compare Mtime before overwriting during bootstrap: if a local entry
  is newer than the peer's version, skip the update instead of
  clobbering it.

- Only trigger full BFS traversal on ErrKvNotFound (key genuinely
  missing). Transient KvGet errors (connection issues, etc.) are now
  propagated instead of silently falling through to a full re-sync.
  Changed readOffset to use %w so errors.Is works through the chain.

* filer: address review findings on bootstrap sync

- Use wall-clock time with safety margin for stream cursor instead of
  entry Mtime. Mtime is file modification time (can be arbitrary),
  while the metadata stream uses TsNs (event log time). Using
  time.Now() minus 1 minute before traversal ensures no events are
  missed even with clock skew, matching the proven filer.meta.backup
  pattern.

- Pass ExcludedPrefixes=[SystemLogDir] to TraverseBfsMetadata so
  the server prunes internal log entries server-side instead of
  transferring them over the network only to be filtered client-side.

- Fail fast if updateOffset fails after bootstrap. If we can't
  persist the offset, bail out rather than proceeding and potentially
  losing the expensive BFS work on the next retry.
2026-04-07 19:05:45 -07:00