Mirror of https://github.com/seaweedfs/seaweedfs.git (synced 2026-05-14 05:41:29 +00:00)
* Add -verifySync flag to filer.sync for cross-cluster file comparison
Add a verification mode to filer.sync that compares entries between two
filers without performing actual synchronization. Uses directory-level
sorted merge of ListEntries to detect missing files, size mismatches,
and ETag mismatches. Supports -isActivePassive for a unidirectional
check and -modifyTimeAgo to skip recently modified files, tolerating
sync lag.
* Add mtime annotation and JSON output to filer.sync -verifySync
Add automatic mtime relation analysis for SIZE_MISMATCH and
ETAG_MISMATCH diffs, and an NDJSON output mode for external tooling.
mtime classification:
- B_NEWER => "late_updates_skip_likely" hint. Surfaces the case
where target has a stub entry whose mtime is ahead of source's
real file, causing UpdateEntry's mtime guard in filersink to
permanently skip the update.
- A_NEWER => "sync_lag_or_event_miss" hint.
- EQUAL => no hint (chunk-level issue suspected).
Text output example:
[SIZE_MISMATCH] /path (a=996, b=0, B newer +274d [late-updates skip likely])
Add -verifyJsonOutput flag. When set, emits one JSON object per
line (NDJSON) for diffs and a final SUMMARY object, suitable for
piping into external diagnostic pipelines.
Concurrent writes from the directory worker pool are now serialized
via outputMu to keep both text lines and JSON records atomic.
* fix(filer.sync): use shared global semaphore in verifySync to bound goroutine explosion
Replace the per-call local semaphore in compareDirectory with a single
shared semaphore created in runVerifySync. The old per-level semaphore
applied a limit of verifySyncConcurrency only within each directory level,
allowing effective concurrency to grow as verifySyncConcurrency^depth on
deep trees.
The shared semaphore is held only for each directory's I/O phase
(listEntries + merge) and released before recursing into subdirectories,
so a parent never blocks waiting for children to acquire slots, which
would deadlock once tree depth exceeds the semaphore capacity.
Extract the capacity into a named constant (verifySyncConcurrency = 5)
with a comment explaining the memory vs. performance trade-off.
Add unit tests:
- correctness: missing file, only-in-B, size mismatch, active-passive mode
- concurrency bound: peak concurrent listings ≤ verifySyncConcurrency
- no-deadlock: binary tree of depth 10 completes within timeout
* fix(filer.sync): stream directory entries to prevent OOM on large directories
Replace the listEntries helper (which accumulated all entries into a
single []filer_pb.Entry slice) with an entryStream type that pages
through the directory in the background and forwards entries one at a
time through a buffered channel. Memory per directory comparison is now
bounded by the channel buffer (64 entries), regardless of how many
entries the directory contains.
Key design points:
- entryStream wraps a goroutine + buffered channel with a one-entry
lookahead (peek/advance) so the two-pointer sorted merge in
compareDirectory can work without buffering any full listing.
- A child context (mergeCtx) is passed to both stream goroutines so
they are cancelled promptly if compareDirectory returns early (e.g.
on error); the ctx.Done() select arm in the callback prevents
goroutine leaks when the consumer stops reading.
- stream.err is written by the goroutine before close(ch), so it is
safe to read after the channel is exhausted (Go memory model:
channel close happens-before the zero-value receive).
- countMissingRecursive is rewritten to use ReadDirAllEntries with a
direct callback, eliminating its own slice allocation.
- listEntries is removed; it is no longer called anywhere.
* fix(filer.sync): address verifySync review findings
Four real bugs found and fixed; one finding already resolved (shared
semaphore was introduced in a prior commit).
path.Join for child paths (filer_sync_verify.go)
fmt.Sprintf("%s/%s", dir, name) produced "//name" when dir was "/".
Replace all child-path concatenations with path.Join so root-level
walks emit clean paths.
cutoffTime check for ONLY_IN_B entries (filer_sync_verify.go)
The B-only branch ignored -modifyTimeAgo, so files recently written
to B were reported as ONLY_IN_B instead of being skipped. Mirror the
A-side mtime guard: skip and increment skippedRecent when the entry
is newer than cutoffTime.
Summary emitted before error check (filer_sync_verify.go)
A filer I/O error mid-walk still caused a SUMMARY record (or text
summary) to be printed, making partial runs appear complete. Move the
error check to before summary emission; on error, return immediately
without printing any summary.
Return false on verification failure (filer_sync.go)
runVerifySync returned true (exit 0) even when diffs were found or the
walk failed. Return false so the main binary sets exit status 1,
consistent with how all other commands signal failure.
* test(filer.sync): add missing verifySync test coverage
Four new tests covering gaps identified during review:
TestVerifySyncETagMismatch
Verifies that two files with identical size but different Md5 checksums
are counted as etagMismatch (not sizeMismatch). Exercises the second
branch of compareEntries that was previously untested.
TestVerifySyncCutoffTime (4 subtests)
A-only recent — recent file skipped (skippedRecent++), not MISSING
A-only old — old file reported as MISSING
B-only recent — recent file skipped (skippedRecent++), not ONLY_IN_B
B-only old — old file reported as ONLY_IN_B
The B-only subtests specifically cover the cutoffTime fix added in the
previous commit.
TestVerifySyncRootPath
Regression for the path.Join fix: walks from "/" and verifies that the
child directory is reached and compared correctly (the old Sprintf
produced "//data" which would silently produce wrong results).
Asserts dirCount=2 and fileCount=1 to confirm the full tree is walked.
* fix(filer.sync): use os.Exit(2) instead of return false on verify failure
return false triggered weed.go's error handler, which printed the full
command usage. That is appropriate for invalid arguments, not for a
completed verification that found differences. Use os.Exit(2),
consistent with the existing pattern in filer_sync.go (lines 251, 293).
* refactor(filer.sync.verify): split verify into its own command
The verify mode is a one-shot batch operation with a fundamentally
different lifecycle from the long-running sync subscriber, and most of
filer.sync's flags (replication, metrics port, debug pprof, concurrency,
etc.) do not apply to it. Extract it into a sibling command alongside
filer.copy/filer.backup/filer.export rather than a flag mode on
filer.sync.
Also rename modifyTimeAgo to modifiedTimeAgo (grammatical) and shorten
verifyJsonOutput to plain jsonOutput, now that the verify context is
implicit in the command name.
* fix(filer.sync.verify): address review comments
- Bounded worker pool: cap subdirectory goroutines per level via a
jobs channel and min(verifySyncConcurrency, len(subDirs)) workers
instead of spawning one goroutine per child. Wide directories no
longer park ~2KB per queued goroutine.
- Don't gate recursion on a directory's mtime: a fresh child write
bumps the parent mtime, but older files inside should still be
reported as missing. Always recurse for missing-in-B directories
and apply the cutoff per-file inside countMissingRecursive.
- Apply -modifiedTimeAgo symmetrically: matched-name files now skip
the comparison when EITHER side is recently modified, not just A.
This restores lag tolerance when B was just rewritten.
Adds tests for both new behaviors and a shared isTooRecent helper.
---------
Co-authored-by: Chris Lu <chris.lu@gmail.com>
100 lines
1.9 KiB
Go
package command

import (
	"fmt"
	"os"
	"strings"

	flag "github.com/seaweedfs/seaweedfs/weed/util/fla9"
)

var Commands = []*Command{
	cmdAdmin,
	cmdAutocomplete,
	cmdUnautocomplete,
	cmdBackup,
	cmdBenchmark,
	cmdCompact,
	cmdDownload,
	cmdExport,
	cmdFiler,
	cmdFilerBackup,
	cmdFilerCat,
	cmdFilerCopy,
	cmdFilerMetaBackup,
	cmdFilerMetaTail,
	cmdFilerRemoteGateway,
	cmdFilerRemoteSynchronize,
	cmdFilerReplicate,
	cmdFilerSynchronize,
	cmdFilerSyncVerify,
	cmdFix,
	cmdFuse,
	cmdIam,
	cmdMaster,
	cmdMasterFollower,
	cmdMini,
	cmdMount,
	cmdMqAgent,
	cmdMqBroker,
	cmdMqKafkaGateway,
	cmdS3,
	cmdScaffold,
	cmdServer,
	cmdShell,
	cmdUpdate,
	cmdUpload,
	cmdVersion,
	cmdVolume,
	cmdWebDav,
	cmdSftp,
	cmdNfs,
	cmdWorker,
}

type Command struct {
	// Run runs the command.
	// The args are the arguments after the command name.
	Run func(cmd *Command, args []string) bool

	// UsageLine is the one-line usage message.
	// The first word in the line is taken to be the command name.
	UsageLine string

	// Short is the short description shown in the 'go help' output.
	Short string

	// Long is the long message shown in the 'go help <this-command>' output.
	Long string

	// Flag is a set of flags specific to this command.
	Flag flag.FlagSet

	IsDebug *bool
}

// Name returns the command's name: the first word in the usage line.
func (c *Command) Name() string {
	name := c.UsageLine
	i := strings.Index(name, " ")
	if i >= 0 {
		name = name[:i]
	}
	return name
}

func (c *Command) Usage() {
	fmt.Fprintf(os.Stderr, "Example: weed %s\n", c.UsageLine)
	fmt.Fprintf(os.Stderr, "Default Usage:\n")
	c.Flag.PrintDefaults()
	fmt.Fprintf(os.Stderr, "Description:\n")
	fmt.Fprintf(os.Stderr, "  %s\n", strings.TrimSpace(c.Long))
	os.Exit(2)
}

// Runnable reports whether the command can be run; otherwise
// it is a documentation pseudo-command such as importpath.
func (c *Command) Runnable() bool {
	return c.Run != nil
}