78 Commits

Author SHA1 Message Date
Chris Lu
d605feb403 refactor(command): expand "~" in all path-style CLI flags (#9306)
* refactor(command): expand "~" in all path-style CLI flags

Many of weed's path-bearing flags (-s3.config, -s3.iam.config,
-admin.dataDir, -webdav.cacheDir, -volume.dir.idx, TLS cert/key
files, profile output paths, mount cache dirs, sftp key files, ...)
were never run through util.ResolvePath, so a value like "~/iam.json"
was used literally. Tilde only worked when the shell expanded it,
which silently fails for the common -flag=~/path form (bash leaves
the tilde literal in --opt=~/path).

- Extend util.ResolvePath to also handle "~user" / "~user/rest",
  matching shell tilde expansion. Add unit tests.
- Apply util.ResolvePath at the top of each shared start* function
  (s3, webdav, sftp) so mini/server/filer/standalone callers all
  inherit it; resolve at the few one-off use sites (mount cache
  dirs, volume idx folder, mini admin.dataDir, profile paths).
- Drop the duplicate expandHomeDir helper from admin.go in favor of
  the now-equivalent util.ResolvePath.
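
A minimal sketch of the tilde handling described above, not the actual util.ResolvePath code. The names expandTilde, homeLookup, and systemHome are hypothetical; the home lookup is injected so the behaviour can be exercised without depending on the machine's users.

```go
package main

import (
	"fmt"
	"os"
	"os/user"
	"path/filepath"
	"strings"
)

// homeLookup returns the home directory for a user name; "" means the
// current user. The second result is false when the home is unknown.
type homeLookup func(name string) (string, bool)

// expandTilde is a hypothetical stand-in for util.ResolvePath. It handles
// "~", "~/rest", "~user" and "~user/rest"; anything else is returned as-is.
func expandTilde(path string, homeOf homeLookup) string {
	if !strings.HasPrefix(path, "~") {
		return path
	}
	// Split "~name/rest" into the name after the tilde and the remainder.
	name, rest := path[1:], ""
	if i := strings.IndexByte(path, '/'); i >= 0 {
		name, rest = path[1:i], path[i+1:]
	}
	home, ok := homeOf(name)
	if !ok {
		return path // unknown user: keep the literal value
	}
	return filepath.Join(home, rest)
}

// systemHome resolves home directories through the operating system.
func systemHome(name string) (string, bool) {
	if name == "" {
		h, err := os.UserHomeDir()
		return h, err == nil
	}
	u, err := user.Lookup(name)
	if err != nil {
		return "", false
	}
	return u.HomeDir, true
}

func main() {
	for _, p := range []string{"~/iam.json", "~root/certs", "/etc/weed", "data/~x"} {
		fmt.Printf("%-12s -> %s\n", p, expandTilde(p, systemHome))
	}
}
```

Because the flag value is expanded inside the program, the -flag=~/path form no longer depends on the shell performing the expansion.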

* fixup: handle comma-separated -dir flags for tilde expansion

`weed mini -dir`, `weed server -dir`, and `weed volume -dir` accept
comma-separated paths (`dir[,dir]...`). Calling util.ResolvePath on
the whole string mishandled multi-folder values with tilde, e.g.
"~/d1,~/d2" would resolve as if "d1,~/d2" were a single subpath.

- Add util.ResolveCommaSeparatedPaths: split on ",", run each entry
  through ResolvePath, rejoin. Short-circuits when no "~" present.
- Use it for *miniDataFolders (mini.go), *volumeDataFolders (server.go),
  and resolve each entry of v.folders in-place (volume.go) so all
  downstream consumers see resolved paths.
- Add 7-case TestResolveCommaSeparatedPaths covering empty, single,
  multiple, and mixed inputs.
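
The split, resolve, rejoin steps can be sketched as follows. This is an illustration under assumptions rather than the real util.ResolveCommaSeparatedPaths: the function name resolveCommaSeparated and the injected resolve callback are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveCommaSeparated runs every comma-separated entry through resolve
// and rejoins the result. Values without "~" are returned untouched.
func resolveCommaSeparated(paths string, resolve func(string) string) string {
	if !strings.Contains(paths, "~") {
		return paths // short-circuit: nothing to expand
	}
	parts := strings.Split(paths, ",")
	for i, p := range parts {
		parts[i] = resolve(p)
	}
	return strings.Join(parts, ",")
}

func main() {
	// fakeResolve pretends the home directory is /home/demo.
	fakeResolve := func(p string) string {
		if strings.HasPrefix(p, "~/") {
			return "/home/demo/" + p[2:]
		}
		return p
	}
	fmt.Println(resolveCommaSeparated("~/d1,/data/d2,~/d3", fakeResolve))
}
```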

* address PR review: metaFolder + Windows backslash

- master.go: resolve *m.metaFolder at the top of runMaster so
  util.FullPath(*m.metaFolder) on the next line sees an expanded
  path. Drop the now-redundant ResolvePath in TestFolderWritable.
- server.go: same treatment for *masterOptions.metaFolder, paired
  with the existing cpu/mem profile resolves. Drop the redundant
  inner ResolvePath at TestFolderWritable.
- file_util.go: ResolvePath now accepts filepath.Separator as a
  separator after the tilde, so "~\\data" works on Windows. Other
  platforms keep current behaviour (backslash stays literal because
  it is a valid filename character in usernames and paths).
- file_util_test.go: add two cases using filepath.Separator that
  exercise the new code path on Windows and remain a no-op on Unix.

* address PR review: resolve "~" in remaining command path flags

Comprehensive sweep of path-bearing flags across every weed
subcommand, applying util.ResolvePath in-place at the top of each
run* function so all downstream consumers see expanded paths.

- webdav.go: resolve *wo.cacheDir at the top of startWebDav so
  mini/server/filer/standalone callers all inherit it.
- mount_std.go: cpu/mem profile paths.
- filer_sync.go: cpu/mem profile paths.
- mq_broker.go: cpu/mem profile paths.
- benchmark.go: cpuprofile output path.
- backup.go: -dir resolved once at runBackup; drop the duplicated
  inline ResolvePath in NewVolume calls.
- compact.go: -dir resolved at runCompact; drop inline ResolvePath.
- export.go: -dir and -o resolved at runExport; drop inline
  ResolvePath in LoadFromIdx and ScanVolumeFile.
- download.go: -dir resolved at runDownload; drop inline.
- update.go: -dir resolved at runUpdate so filepath.Join uses the
  expanded path; drop inline ResolvePath in TestFolderWritable.
- scaffold.go: -output expanded before filepath.Join.
- worker.go: -workingDir expanded before being passed to runtime.

* address PR review: resolve option-struct paths at run* entry points

server.go:381 propagates s3Options.config to filerOptions.s3ConfigFile
*before* startS3Server runs, which meant the filer-side code saw the
unresolved tilde-prefixed pointer. Same pattern for webdavOptions and
sftpOptions (and equivalent in mini.go / filer.go).

The fix: hoist resolution from the shared start* functions up to the
run* entry points, where every shared pointer is set up before any
propagation happens.

- s3.go, webdav.go, sftp.go: extract a resolvePaths() method on each
  Options struct that runs every path field through util.ResolvePath
  in-place. Idempotent.
- runS3, runWebDav, runSftp: call the standalone struct's resolvePaths
  before starting metrics / loading security config.
- runServer, runMini, runFiler: call resolvePaths on every embedded
  options struct, plus resolve loose flags (serverIamConfig,
  miniS3Config, miniIamConfig, miniMasterOptions.metaFolder, and
  filer's defaultLevelDbDirectory) so they're expanded before any
  pointer copy or use.
- Drop the now-redundant inline ResolvePath at filer's
  defaultLevelDbDirectory composition.

* address PR review: re-resolve mini -dir post-config, cover misc paths

- mini.go: applyConfigFileOptions can overwrite -dir with a literal
  ~/data from mini.options. Re-resolve *miniDataFolders after the
  config-file apply, alongside the other path resolves, so the mini
  filer no longer ends up with a literal ~/data/filerldb2.
- benchmark.go: resolve *b.idListFile (-list).
- filer_sync.go: resolve *syncOptions.aSecurity / .bSecurity
  (-a.security / -b.security) before LoadClientTLSFromFile.
- filer_cat.go: resolve *filerCat.output (-o) before os.OpenFile.
- admin.go: drop trailing blank line at EOF (git diff --check).

* address PR review: resolve -a.security/-b.security/-config before use

Four follow-up fixes:

- filer_sync.go: the -a.security / -b.security resolves were placed
  *after* LoadClientTLSFromFile / LoadHTTPClientFromFile were called,
  so weed filer.sync -a.security=~/a.toml still passed the literal
  tilde path. Hoist the resolves above the security-loading block so
  TLS clients see expanded paths.
- filer_sync_verify.go: same flag pair was never resolved at all in
  the verify command; resolve at the top of runFilerSyncVerify.
- filer_meta_backup.go: -config (the backup_filer.toml path) was
  passed directly to viper. Resolve at the top of runFilerMetaBackup.
- mini.go: master.dir defaulted to the entire comma-joined
  miniDataFolders. With weed mini -dir=~/d1,~/d2 (or any multi-dir
  setup), TestFolderWritable then stat'd the joined string instead
  of a single directory. Default to the first entry via StringSplit
  to mirror the disk-space calculation a few lines below, and drop
  the now-redundant ResolvePath in TestFolderWritable.
2026-05-03 21:46:21 -07:00
Chris Lu
c1ccbe97dd feat(filer.backup): -initialSnapshot seeds destination from live tree (#9126)
* feat(filer.backup): -initialSnapshot seeds destination from live tree

Replaying the metadata event log on a fresh sync only leaves files that
still exist on the source at replay time: any entry that was created and
later deleted is replayed as a create/delete pair and never materializes
on the destination. Users who wipe the destination and re-run
filer.backup therefore see "only new files" instead of a full backup,
even when -timeAgo=876000h is passed and the subscription genuinely
starts from epoch (ref discussion #8672).

Add a -initialSnapshot opt-in flag: when set on a fresh sync (no prior
checkpoint, -timeAgo unset), walk the live filer tree under -filerPath
via TraverseBfs and seed the destination through sink.CreateEntry, then
persist the walk-start timestamp as the checkpoint and subscribe from
there. Capturing the timestamp before the walk lets the subscription
catch any create/update/delete racing with the walk — sink CreateEntry
is idempotent across the builtin sinks so replay is safe.

Honors existing -filerExcludePaths / -filerExcludeFileNames /
-filerExcludePathPatterns filters and skips /topics/.system/log the
same way the subscription path does.

Also log "starting from <t> (no prior checkpoint)" instead of a
misleading "resuming from 1970-01-01" when the KV has no stored offset.

* fix(filer.backup): guard initialSnapshot counters under TraverseBfs workers

TraverseBfs fans the callback out across 5 worker goroutines, so the
entryCount / byteCount updates and the 5-second progress-log gate in
runInitialSnapshot were racing. Switch the counters to atomic.Int64 and
protect the lastLog check/update with a short-scoped mutex so the heavy
sink.CreateEntry call stays outside the critical section.
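
A sketch of that pattern, assuming a simplified progress type (the names progress, add, and shouldLog are illustrative, not taken from the patch). Counters are atomic, and only the cheap lastLog check sits under the mutex.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// progress tracks walk totals that several worker goroutines update.
type progress struct {
	entryCount atomic.Int64
	byteCount  atomic.Int64

	mu      sync.Mutex // guards lastLog only
	lastLog time.Time
}

// add records one entry; safe to call from any goroutine.
func (p *progress) add(size int64) {
	p.entryCount.Add(1)
	p.byteCount.Add(size)
}

// shouldLog reports whether at least `every` has passed since the last
// progress line, and claims the slot if so. The lock is held briefly.
func (p *progress) shouldLog(now time.Time, every time.Duration) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if now.Sub(p.lastLog) < every {
		return false
	}
	p.lastLog = now
	return true
}

func main() {
	var p progress
	var wg sync.WaitGroup
	for w := 0; w < 5; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				// The expensive per-entry work would happen here, outside any lock.
				p.add(10)
				if p.shouldLog(time.Now(), 5*time.Second) {
					fmt.Println("progress:", p.entryCount.Load(), "entries so far")
				}
			}
		}()
	}
	wg.Wait()
	fmt.Println("entries:", p.entryCount.Load(), "bytes:", p.byteCount.Load())
}
```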

Flagged by gemini-code-assist on #9126; verified with go test -race.

* fix(filer.backup): harden initialSnapshot against transient errors and path edge cases

Three review items from CodeRabbit on #9126:

1. getOffset errors no longer leave isFreshSync=true. Before, a transient
   KV read failure would cause runFilerBackup's retry loop to redo the
   full -initialSnapshot walk on every retry. Treat any offset-read
   error as "not fresh" so the snapshot only runs when we've verified
   there really is no prior checkpoint.

2. initialSnapshotTargetKey now normalizes sourcePath to a trailing-
   slash base before stripping the prefix, so edge cases where
   sourceKey equals sourcePath (trailing-slash mismatch or root-entry
   emission) no longer index past the end. Unit tests cover both
   forms.

3. Documented the TraverseBfs-enumerates-excluded-subtrees performance
   characteristic on runInitialSnapshot, since pruning requires a
   separate change to TraverseBfs itself.
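
Item 2 can be illustrated with a small sketch. The function name relativeKey is hypothetical and the real initialSnapshotTargetKey takes more inputs; the point is only that normalizing the base to a trailing slash first avoids slicing past the end of the key.

```go
package main

import (
	"fmt"
	"strings"
)

// relativeKey returns sourceKey relative to sourcePath. The base is first
// normalized to end in exactly one "/" so that a key equal to the source
// root yields "" instead of an out-of-range slice.
func relativeKey(sourcePath, sourceKey string) string {
	base := strings.TrimSuffix(sourcePath, "/") + "/"
	if sourceKey+"/" == base {
		return "" // the key is the source root itself
	}
	return strings.TrimPrefix(sourceKey, base)
}

func main() {
	fmt.Printf("%q\n", relativeKey("/data", "/data/a/b"))
	fmt.Printf("%q\n", relativeKey("/data/", "/data"))
	fmt.Printf("%q\n", relativeKey("/", "/a"))
}
```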

* fix(filer.backup): retry setOffset after initialSnapshot to avoid full re-walks

If the snapshot walk finishes but the subsequent setOffset fails, the
retry loop in runFilerBackup will re-enter doFilerBackup with an empty
checkpoint and run the full BFS again — on a multi-million-entry tree
that's hours of wasted work over a 100-byte KV write. Retry the write a
handful of times with exponential backoff before giving up, and log
loudly at the final failure (with snapshotTsNs + sinkId) so operators
recognize the symptom instead of guessing at mysterious repeated walks.
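
A generic sketch of retrying a small write with exponential backoff. The helper name retryWithBackoff, the attempt count, and the delays are assumptions for illustration; the patch's actual values are not stated here.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient simulates a short-lived KV write failure.
var errTransient = errors.New("transient KV failure")

// retryWithBackoff calls fn up to attempts times, doubling the wait after
// each failure, and wraps the last error when it gives up.
func retryWithBackoff(attempts int, initial time.Duration, fn func() error) error {
	var err error
	wait := initial
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(wait) // back off before the next try
			wait *= 2
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := retryWithBackoff(5, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errTransient // first two writes fail
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```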

Nitpick raised by CodeRabbit on #9126.

* fix(filer.backup): initialSnapshot ignore404, skew margin, exclude dir-entry itself

Three review items from CodeRabbit on #9126:

1. ignore404Error now threads into runInitialSnapshot. If a file is listed
   by TraverseBfs and then deleted before CreateEntry reads its chunks,
   the follow path already ignores 404s — the snapshot path was aborting
   and triggering a full re-walk. Treat an ignorable 404 as "skip this
   entry, continue."

2. snapshotTsNs now uses `time.Now() - 1min` instead of `time.Now()`.
   Metadata events are stamped server-side, so a fast backup-host clock
   could skip events that fire during or right after the walk. Matches
   the 1-minute margin meta_aggregator.go applies on initial peer
   traversal; duplicate replay is harmless because CreateEntry is
   idempotent.

3. Exclude checks now run against the entry's own full path, not just
   its parent. A walked directory whose full path matches SystemLogDir
   or -filerExcludePaths was being seeded to the destination; only its
   descendants were being skipped. Verified with a manual repro where
   -filerExcludePaths=/data/skipdir now keeps the skipdir entry itself
   off the destination.

* refactor(filer): share destKey helper between buildKey and initialSnapshot

Extract destKey(dataSink, targetPath, sourcePath, sourceKey, mTime) from
buildKey in filer_sync.go. Both the event-log path (buildKey) and the
initialSnapshot walk (initialSnapshotTargetKey) now go through the same
helper, so a walk-seeded file and an event-replayed file always resolve
to the same destination key.

As a bonus, buildKey picks up the defensive trailing-slash normalization
that initialSnapshotTargetKey introduced — no more index-past-end risk
when sourceKey happens to equal sourcePath. Also tightens the mTime
lookup to guard against nil Attributes (caught by an existing test
against buildKey when I first moved the lookup out of the incremental
branch).
2026-04-17 21:21:32 -07:00
Chris Lu
e648c76bcf go fmt
2026-04-10 17:31:14 -07:00
Chris Lu
2919bb27e5 fix(sync): use per-cluster TLS for HTTP volume connections in filer.sync (#8974)
* fix(sync): use per-cluster TLS for HTTP volume connections in filer.sync (#8965)

When filer.sync runs with -a.security and -b.security flags, only gRPC
connections received per-cluster TLS configuration. HTTP clients for
volume server reads and uploads used a global singleton with the default
security.toml, causing TLS verification failures when clusters use
different self-signed certificates.

Load per-cluster HTTPS client config from the security files and pass
dedicated HTTP clients to FilerSource (for downloads) and FilerSink
(for uploads) so each direction uses the correct cluster's certificates.

* fix(sync): address review feedback for per-cluster HTTP TLS

- Add insecure_skip_verify support to NewHttpClientWithTLS and read it
  from per-cluster security config via https.client.insecure_skip_verify
- Error on partial mTLS config (cert without key or vice versa)
- Add nil-check for client parameter in DownloadFileWithClient
- Document SetUploader as init-only (same pattern as SetChunkConcurrency)
2026-04-07 14:11:44 -07:00
Chris Lu
9552e80b58 filer.sync: show active chunk transfers when sync progress stalls (#8889)
* filer.sync: show active chunk transfers when sync progress stalls

When the sync watermark is not advancing, print each in-progress chunk
transfer with its file path, bytes received so far, and current status
(downloading, uploading, or waiting with backoff duration). This helps
diagnose which files are blocking progress during replication.

Closes #8542

* filer.sync: include last error in stall diagnostics

* filer.sync: fix data races in ChunkTransferStatus

Add sync.RWMutex to ChunkTransferStatus and lock around all field
mutations in fetchAndWrite. ActiveTransfers now returns value copies
under RLock so callers get immutable snapshots.
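
The locking scheme can be sketched as below. The type and field names are simplified assumptions, not the real ChunkTransferStatus definition; the snapshot is a separate plain struct so that no mutex is ever copied.

```go
package main

import (
	"fmt"
	"sync"
)

// transferSnapshot is a plain value that is safe to hand to callers.
type transferSnapshot struct {
	Path          string
	BytesReceived int64
	Status        string
}

// chunkTransfer guards its mutable fields with an RWMutex.
type chunkTransfer struct {
	mu   sync.RWMutex
	snap transferSnapshot
}

// update mutates the status under the write lock.
func (c *chunkTransfer) update(received int64, status string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.snap.BytesReceived = received
	c.snap.Status = status
}

// snapshot returns a copy taken under the read lock.
func (c *chunkTransfer) snapshot() transferSnapshot {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.snap
}

func main() {
	tr := &chunkTransfer{snap: transferSnapshot{Path: "/buckets/b/big.bin", Status: "downloading"}}
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for i := int64(1); i <= 3; i++ {
			tr.update(i<<20, "downloading")
		}
		tr.update(3<<20, "uploading")
	}()
	wg.Wait()
	s := tr.snapshot()
	fmt.Printf("%s: %d bytes, %s\n", s.Path, s.BytesReceived, s.Status)
}
```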
2026-04-02 13:08:24 -07:00
Chris Lu
8572aae403 filer.sync: support per-cluster mTLS with -a.security and -b.security (#8872)
* filer.sync: support per-cluster mTLS with -a.security and -b.security flags

When syncing between two clusters that use different certificate authorities,
a single security.toml cannot authenticate to both. Add -a.security and
-b.security flags so each filer can use its own security.toml for TLS.

Closes #8481

* security: fatal on failure to read explicitly provided security config

When -a.security or -b.security is specified, falling back to insecure
credentials on read error would silently bypass mTLS. Fatal instead.

* fix(filer.sync): use source filer's fromTsMs flag in initOffsetFromTsMs

A→B was using bFromTsMs and B→A was using aFromTsMs — these were
swapped. Each path should seed the target's offset with the source
filer's starting timestamp.

* security: return error from LoadClientTLSFromFile, resolve relative PEM paths

Change LoadClientTLSFromFile to return (grpc.DialOption, error) so
callers can handle failures explicitly instead of a silent insecure
fallback. Resolve relative PEM paths (grpc.ca, grpc.client.cert,
grpc.client.key) against the config file's directory.
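
Resolving a relative PEM path against the config file's directory can be sketched like this. The helper name resolveAgainstConfig and the sample paths are hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// resolveAgainstConfig leaves empty and absolute paths alone and anchors
// relative ones at the directory that holds the config file.
func resolveAgainstConfig(configFile, pemPath string) string {
	if pemPath == "" || filepath.IsAbs(pemPath) {
		return pemPath
	}
	return filepath.Join(filepath.Dir(configFile), pemPath)
}

func main() {
	cfg := "/etc/seaweedfs/a/security.toml"
	fmt.Println(resolveAgainstConfig(cfg, "certs/ca.pem"))
	fmt.Println(resolveAgainstConfig(cfg, "/opt/pki/ca.pem"))
}
```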
2026-04-01 11:05:43 -07:00
Chris Lu
ced2236cc6 Adjust rename events metadata format (#8854)
* rename metadata events

* fix subscription filter to use NewEntry.Name for rename path matching

The server-side subscription filter constructed the new path using
OldEntry.Name instead of NewEntry.Name when checking if a rename
event's destination matches the subscriber's path prefix. This could
cause events to be incorrectly filtered when a rename changes the
file name.

* fix bucket events to handle rename of bucket directories

onBucketEvents only checked IsCreate and IsDelete. A bucket directory
rename via AtomicRenameEntry now emits a single rename event (both
OldEntry and NewEntry non-nil), which matched neither check. Handle
IsRename by deleting the old bucket and creating the new one.

* fix replicator to handle rename events across directory boundaries

Two issues fixed:

1. The replicator filtered events by checking if the key (old path)
   was under the source directory. Rename events now use the old path
   as key, so renames from outside into the watched directory were
   silently dropped. Now both old and new paths are checked, and
   cross-boundary renames are converted to create or delete.

2. NewParentPath was passed to the sink without remapping to the
   sink's target directory structure, causing the sink to write
   entries at the wrong location. Now NewParentPath is remapped
   alongside the key.

* fix filer sync to handle rename events crossing directory boundaries

The early directory-prefix filter only checked resp.Directory (old
parent). Rename events now carry the old parent as Directory, so
renames from outside the source path into it were dropped before
reaching the existing cross-boundary handling logic. Check both old
and new directories against sourcePath and excludePaths so the
downstream old-key/new-key logic can properly convert these to
create or delete operations.
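
The conversion described above reduces to a small decision table. A sketch, assuming the two membership checks have already been computed (classifyRename and its string results are illustrative names):

```go
package main

import "fmt"

// classifyRename maps a rename onto the operation to apply, given whether
// the old and new paths fall inside the synced source path.
func classifyRename(oldInScope, newInScope bool) string {
	switch {
	case oldInScope && newInScope:
		return "update" // rename stays inside the tree
	case newInScope:
		return "create" // moved in from outside
	case oldInScope:
		return "delete" // moved out of the tree
	default:
		return "skip" // unrelated to the tree
	}
}

func main() {
	fmt.Println(classifyRename(false, true))
	fmt.Println(classifyRename(true, false))
	fmt.Println(classifyRename(true, true))
}
```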

* fix metadata event path matching

* fix metadata event consumers for rename targets

* Fix replication rename target keys

Logical rename events now reach replication sinks with distinct source
and target paths.

Handle non-filer sinks as delete-plus-create on the translated target
key, and make the rename fallback path create at the translated target
key too.

Add focused tests covering non-filer renames, filer rename updates, and
the fallback path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix filer sync rename path scoping

Use directory-boundary matching instead of raw prefix checks when
classifying source and target paths during filer sync.

Also apply excludePaths per side so renames across excluded boundaries
downgrade cleanly to create/delete instead of being misclassified as
in-scope updates.

Add focused tests for boundary matching and rename classification.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix replicator directory boundary checks

Use directory-boundary matching instead of raw prefix checks when
deciding whether a source or target path is inside the watched tree or
an excluded subtree.

This prevents sibling paths such as /foo and /foobar from being
misclassified during rename handling, and preserves the earlier
rename-target-key fix.

Add focused tests for boundary matching and rename classification
across sibling/excluded directories.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix etc-remote rename-out handling

Use boundary-safe source/target directory membership when classifying
metadata events under DirectoryEtcRemote.

This prevents rename-out events from being processed as config updates,
while still treating them as removals where appropriate for the remote
sync and remote gateway command paths.

Add focused tests for update/removal classification and sibling-prefix
handling.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Defer rename events until commit

Queue logical rename metadata events during atomic and streaming
renames and publish them only after the transaction commits
successfully.

This prevents subscribers from seeing delete or logical rename events
for operations that later fail during delete or commit.

Also serialize notification.Queue swaps in rename tests and add
failure-path coverage.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Skip descendant rename target lookups

Avoid redundant target lookups during recursive directory renames once
the destination subtree is known absent.

The recursive move path now inserts known-absent descendants directly,
and the test harness exercises prefixed directory listing so the
optimization is covered by a directory rename regression test.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Tighten rename review tests

Return filer_pb.ErrNotFound from the bucket tracking store test stub so
it follows the FilerStore contract, and add a webhook filter case for
same-name renames across parent directories.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix HardLinkId format verb in InsertEntryKnownAbsent error

HardLinkId is a byte slice. %d prints each byte as a decimal number
which is not useful for an identifier. Use %x to match the log line
two lines above.
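
The difference between the two verbs on a byte slice is easy to see in isolation. The identifier bytes below are made up, and formatID is a hypothetical helper:

```go
package main

import "fmt"

// formatID renders an identifier as compact lowercase hex.
func formatID(id []byte) string {
	return fmt.Sprintf("%x", id)
}

func main() {
	hardLinkId := []byte{0x0a, 0xff, 0x10} // made-up identifier bytes
	fmt.Printf("%d\n", hardLinkId)         // prints [10 255 16]
	fmt.Println(formatID(hardLinkId))      // prints 0aff10
}
```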

* only skip descendant target lookup when source and dest use same store

moveFolderSubEntries unconditionally passed skipTargetLookup=true for
every descendant. This is safe when all paths resolve to the same
underlying store, but with path-specific store configuration a child's
destination may map to a different backend that already holds an entry
at that path. Use FilerStoreWrapper.SameActualStore to check per-child
and fall back to the full CreateEntry path when stores differ.

* add nil and create edge-case tests for metadata event scope helpers

* extract pathIsEqualOrUnder into util.IsEqualOrUnder

Identical implementations existed in both replication/replicator.go and
command/filer_sync.go. Move to util.IsEqualOrUnder (alongside the
existing FullPath.IsUnder) and remove the duplicates.
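
A sketch of what directory-boundary matching means in practice, contrasted with a raw prefix check. The function isEqualOrUnder below is an illustration and may differ from the real util.IsEqualOrUnder in signature and edge cases.

```go
package main

import (
	"fmt"
	"strings"
)

// isEqualOrUnder reports whether path is dir itself or lies beneath it,
// matching only at "/" boundaries so /foobar is not treated as under /foo.
func isEqualOrUnder(path, dir string) bool {
	dir = strings.TrimSuffix(dir, "/")
	if dir == "" {
		return strings.HasPrefix(path, "/") // everything absolute is under root
	}
	return path == dir || strings.HasPrefix(path, dir+"/")
}

func main() {
	fmt.Println(strings.HasPrefix("/foobar/x", "/foo")) // raw prefix check: true
	fmt.Println(isEqualOrUnder("/foobar/x", "/foo"))    // boundary check: false
	fmt.Println(isEqualOrUnder("/foo/x", "/foo"))       // true
}
```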

* use MetadataEventTargetDirectory for new-side directory in filer sync

The new-side directory checks and sourceNewKey computation used
message.NewParentPath directly. If NewParentPath were empty (legacy
events, older filer versions during rolling upgrades), sourceNewKey
would be wrong (/filename instead of /dir/filename) and the
UpdateEntry parent path rewrite would panic on slice bounds.

Derive targetDir once from MetadataEventTargetDirectory, which falls
back to resp.Directory when NewParentPath is empty, and use it
consistently for all new-side checks and the sink parent path.
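
The fallback can be sketched with a simplified event type. The struct renameEvent and both helpers are hypothetical stand-ins for the protobuf message and MetadataEventTargetDirectory.

```go
package main

import (
	"fmt"
	"path"
)

// renameEvent is a simplified stand-in for the metadata event.
type renameEvent struct {
	Directory     string // old parent, always set
	NewParentPath string // may be empty on legacy events
	NewName       string
}

// targetDirectory falls back to the old parent when NewParentPath is empty.
func targetDirectory(e renameEvent) string {
	if e.NewParentPath != "" {
		return e.NewParentPath
	}
	return e.Directory
}

// newKey builds the new-side key from the derived target directory.
func newKey(e renameEvent) string {
	return path.Join(targetDirectory(e), e.NewName)
}

func main() {
	legacy := renameEvent{Directory: "/dir", NewName: "filename"}
	fmt.Println(newKey(legacy)) // /dir/filename rather than /filename
}
```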
2026-03-30 18:25:11 -07:00
Jaehoon Kim
6cf34f2376 Add -filerExcludePathPattern flag and fix nil panic in -filerExcludeFileName (#8756)
* Fix filerExcludeFileName to support directory names and path components

The original implementation only matched excludeFileName against
message.NewEntry.Name, which caused two issues:

1. Nil pointer panic on delete events (NewEntry is nil)
2. Files inside excluded directories were still backed up because
   the parent directory name was not checked

This patch:
- Checks all path components in resp.Directory against the regexp
- Adds nil guard for message.NewEntry before accessing .Name
- Also checks message.OldEntry.Name for rename/delete events

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add -filerExcludePathPattern flag and fix nil panic in filerExcludeFileName

Separate concerns between two exclude mechanisms:
- filerExcludeFileName: matches entry name only (leaf node)
- filerExcludePathPattern (NEW): matches any path component via regexp,
  so files inside matched directories are also excluded

Also fixes nil pointer panic when filerExcludeFileName encounters
delete events where NewEntry is nil.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Refactor exclude logic: per-side exclusion for rename events, reduce duplication

- Extract isEntryExcluded() to compute exclusion per old/new side,
  so rename events crossing an exclude boundary are handled as
  delete + create instead of being entirely skipped
- Extract compileExcludePattern() to deduplicate regexp compilation
- Replace strings.Split with allocation-free pathContainsMatch()
- Check message.NewParentPath (not just resp.Directory) for new side

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Move regexp compilation out of retry loop to fail fast on config errors

compileExcludePattern for -filerExcludeFileName and -filerExcludePathPattern
are configuration-time validations that will never succeed on retry.
Move them to runFilerBackup before the reconnect loop and use glog.Fatalf
on failure, so invalid patterns are caught immediately at startup instead
of being retried every 1.7 seconds indefinitely.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add wildcard matching helpers for path and filename exclusion

* Replace regexp exclude patterns with wildcard-based flags, deprecate -filerExcludeFileName

Add -filerExcludeFileNames and -filerExcludePathPatterns flags that accept
comma-separated wildcard patterns (*, ?) using the existing wildcard library.
Mark -filerExcludeFileName as deprecated but keep its regexp behavior.
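
To illustrate comma-separated wildcard matching, here is a sketch that uses the standard library's path.Match as a stand-in. This is an assumption: the commit uses an existing wildcard library, and path.Match also accepts character classes, which the flags may not. The name matchesAny is hypothetical.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchesAny reports whether name matches any pattern in a comma-separated
// list. Malformed patterns are ignored in this sketch.
func matchesAny(name, patterns string) bool {
	for _, p := range strings.Split(patterns, ",") {
		p = strings.TrimSpace(p)
		if p == "" {
			continue
		}
		if ok, err := path.Match(p, name); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	patterns := "*.tmp,?.bak"
	for _, name := range []string{"report.tmp", "a.bak", "notes.md"} {
		fmt.Println(name, matchesAny(name, patterns))
	}
}
```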

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Chris Lu <chris.lu@gmail.com>
2026-03-26 10:04:06 -07:00
Chris Lu
81369b8a83 improve: large file sync throughput for remote.cache and filer.sync (#8676)
* improve large file sync throughput for remote.cache and filer.sync

Three main throughput improvements:

1. Adaptive chunk sizing for remote.cache: targets ~32 chunks per file
   instead of always starting at 5MB. A 500MB file now uses ~16MB chunks
   (32 chunks) instead of 5MB chunks (100 chunks), reducing per-chunk
   overhead (volume assign, gRPC call, needle write) by 3x.

2. Configurable concurrency at every layer:
   - remote.cache chunk concurrency: -chunkConcurrency flag (default 8)
   - remote.cache S3 download concurrency: -downloadConcurrency flag
     (default raised from 1 to 5 per chunk)
   - filer.sync chunk concurrency: -chunkConcurrency flag (default 32)

3. S3 multipart download concurrency raised from 1 to 5: the S3 manager
   downloader was using Concurrency=1, serializing all part downloads
   within each chunk. This alone can 5x per-chunk download speed.

The concurrency values flow through the gRPC request chain:
  shell command → CacheRemoteObjectToLocalClusterRequest →
  FetchAndWriteNeedleRequest → S3 downloader

Zero values in the request mean "use server defaults", maintaining
full backward compatibility with existing callers.

Ref #8481

* fix: use full maxMB for chunk size cap and remove loop guard

Address review feedback:
- Use full maxMB instead of maxMB/2 for maxChunkSize to avoid
  unnecessarily limiting chunk size for very large files.
- Remove chunkSize < maxChunkSize guard from the safety loop so it
  can always grow past maxChunkSize when needed to stay under 1000
  chunks (e.g., extremely large files with small maxMB).

* address review feedback: help text, validation, naming, docs

- Fix help text for -chunkConcurrency and -downloadConcurrency flags
  to say "0 = server default" instead of advertising specific numeric
  defaults that could drift from the server implementation.
- Validate chunkConcurrency and downloadConcurrency are within int32
  range before narrowing, returning a user-facing error if out of range.
- Rename ReadRemoteErr to readRemoteErr to follow Go naming conventions.
- Add doc comment to SetChunkConcurrency noting it must be called
  during initialization before replication goroutines start.
- Replace doubling loop in chunk size safety check with direct
  ceil(remoteSize/1000) computation to guarantee the 1000-chunk cap.
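
The sizing rules discussed in this PR can be sketched as follows. This is a reading of the commit text under assumptions, not the shipped code: the constant names, the 5 MiB floor, and the exact rounding are guesses for illustration.

```go
package main

import "fmt"

const (
	minChunkSize = 5 << 20 // assumed floor: 5 MiB
	targetChunks = 32      // aim for roughly this many chunks per file
	maxChunks    = 1000    // hard cap on the chunk count
)

// ceilDiv returns ceil(a/b) for positive b.
func ceilDiv(a, b int64) int64 {
	return (a + b - 1) / b
}

// chunkSizeFor picks a chunk size for a remote object of remoteSize bytes.
func chunkSizeFor(remoteSize, maxChunkSize int64) int64 {
	size := ceilDiv(remoteSize, targetChunks)
	if size < minChunkSize {
		size = minChunkSize
	}
	if size > maxChunkSize {
		size = maxChunkSize
	}
	// Grow past maxChunkSize if needed to stay within the chunk cap.
	if floor := ceilDiv(remoteSize, maxChunks); size < floor {
		size = floor
	}
	return size
}

func main() {
	remote := int64(500 << 20) // a 500 MiB object
	size := chunkSizeFor(remote, 64<<20)
	fmt.Println("chunk size:", size, "chunks:", ceilDiv(remote, size))
}
```

Computing the floor directly with ceiling division, rather than doubling in a loop, guarantees the chunk count never exceeds the cap.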

* address Copilot review: clamp concurrency, fix chunk count, clarify proto docs

- Use ceiling division for chunk count check to avoid overcounting
  when file size is an exact multiple of chunk size.
- Clamp chunkConcurrency (max 1024) and downloadConcurrency (max 1024
  at filer, max 64 at volume server) to prevent excessive goroutines.
- Always use ReadFileWithConcurrency when the client supports it,
  falling back to the implementation's default when value is 0.
- Clarify proto comments that download_concurrency only applies when
  the remote storage client supports it (currently S3).
- Include specific server defaults in help text (e.g., "0 = server
  default 8") so users see the actual values in -h output.

* fix data race on executionErr and use %w for error wrapping

- Protect concurrent writes to executionErr in remote.cache worker
  goroutines with a sync.Mutex to eliminate the data race.
- Use %w instead of %v in volume_grpc_remote.go error formatting
  to preserve the error chain for errors.Is/errors.As callers.
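
Why %w matters for callers can be shown in a few lines. The sentinel error and helper names here are made up for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// errRemoteNotFound is a made-up sentinel error for the example.
var errRemoteNotFound = errors.New("remote object not found")

// wrapWithV flattens the cause into text, losing the error chain.
func wrapWithV(err error) error {
	return fmt.Errorf("read remote: %v", err)
}

// wrapWithW keeps the cause reachable through errors.Is / errors.As.
func wrapWithW(err error) error {
	return fmt.Errorf("read remote: %w", err)
}

// isRemoteNotFound checks the chain for the sentinel.
func isRemoteNotFound(err error) bool {
	return errors.Is(err, errRemoteNotFound)
}

func main() {
	fmt.Println(isRemoteNotFound(wrapWithV(errRemoteNotFound))) // false
	fmt.Println(isRemoteNotFound(wrapWithW(errRemoteNotFound))) // true
}
```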
2026-03-17 16:49:56 -07:00
Chris Lu
1261e93ef2 fix: comprehensive go vet error fixes and add CI enforcement (#7861)
* fix: use keyed fields in struct literals

- Replace unsafe reflect.StringHeader/SliceHeader with safe unsafe.String/Slice (weed/query/sqltypes/unsafe.go)
- Add field names to Type_ScalarType struct literals (weed/mq/schema/schema_builder.go)
- Add Duration field name to FlexibleDuration struct literals across test files
- Add field names to bson.D struct literals (weed/filer/mongodb/mongodb_store_kv.go)

Fixes go vet warnings about unkeyed struct literals.

* fix: remove unreachable code

- Remove unreachable return statements after infinite for loops
- Remove unreachable code after if/else blocks where all paths return
- Simplify recursive logic by removing unnecessary for loop (inode_to_path.go)
- Fix Type_ScalarType literal to use enum value directly (schema_builder.go)
- Call onCompletionFn on stream error (subscribe_session.go)

Files fixed:
- weed/query/sqltypes/unsafe.go
- weed/mq/schema/schema_builder.go
- weed/mq/client/sub_client/connect_to_sub_coordinator.go
- weed/filer/redis3/ItemList.go
- weed/mq/client/agent_client/subscribe_session.go
- weed/mq/broker/broker_grpc_pub_balancer.go
- weed/mount/inode_to_path.go
- weed/util/skiplist/name_list.go

* fix: avoid copying lock values in protobuf messages

- Use proto.Merge() instead of direct assignment to avoid copying sync.Mutex in S3ApiConfiguration (iamapi_server.go)
- Add explicit comments noting that channel-received values are already copies before taking addresses (volume_grpc_client_to_master.go)

The protobuf messages contain sync.Mutex fields from the message state, which should not be copied.
Using proto.Merge() properly merges messages without copying the embedded mutex.

* fix: correct byte array size for uint32 bit shift operations

The generateAccountId() function only needs 4 bytes to create a uint32 value.
Changed from allocating 8 bytes to 4 bytes to match the actual usage.

This fixes go vet warning about shifting 8-bit values (bytes) by more than 8 bits.

* fix: ensure context cancellation on all error paths

In broker_client_subscribe.go, ensure subscriberCancel() is called on all error return paths:
- When stream creation fails
- When partition assignment fails
- When sending initialization message fails

This prevents context leaks when an error occurs during subscriber creation.

* fix: ensure subscriberCancel called for CreateFreshSubscriber stream.Send error

Ensure subscriberCancel() is called when stream.Send fails in CreateFreshSubscriber.

* ci: add go vet step to prevent future lint regressions

- Add go vet step to GitHub Actions workflow
- Filter known protobuf lock warnings (MessageState sync.Mutex)
  These are expected in generated protobuf code and are safe
- Prevents accumulation of go vet errors in future PRs
- Step runs before build to catch issues early

* fix: resolve remaining syntax and logic errors in vet fixes

- Fixed syntax errors in filer_sync.go caused by missing closing braces
- Added missing closing brace for if block and function
- Synchronized fixes to match previous commits on branch

* fix: add missing return statements to daemon functions

- Add 'return false' after infinite loops in filer_backup.go and filer_meta_backup.go
- Satisfies declared bool return type signatures
- Maintains consistency with other daemon functions (runMaster, runFilerSynchronize, runWorker)
- Although unreachable, the explicit return satisfies the declared function signature

* fix: add nil check for onCompletionFn in SubscribeMessageRecord

- Check if onCompletionFn is not nil before calling it
- Prevents potential panic if nil function is passed
- Matches pattern used in other callback functions

* docs: clarify unreachable return statements in daemon functions

- Add comments documenting that return statements satisfy function signature
- Explains that these returns follow infinite loops and are unreachable
- Improves code clarity for future maintainers
2025-12-23 14:48:50 -08:00
Chris Lu
ed1da07665 Add consistent -debug and -debug.port flags to commands (#7816)
* Add consistent -debug and -debug.port flags to commands

Add -debug and -debug.port flags to weed master, weed volume, weed s3,
weed mq.broker, and weed filer.sync commands for consistency with
weed filer.

When -debug is enabled, an HTTP server starts on the specified port
(default 6060) serving runtime profiling data at /debug/pprof/.

For mq.broker, replaced the older -port.pprof flag with the new
-debug and -debug.port pattern for consistency.

* Update weed/util/grace/pprof.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-18 17:44:36 -08:00
Chris Lu
84b8a8e010 filer.sync: fix checkpoint not being saved properly (#7719)
* filer.sync: fix race condition on first checkpoint save

Initialize lastWriteTime to time.Now() instead of zero time to prevent
the first checkpoint save from being triggered immediately when the
first event arrives. This gives async jobs time to complete and update
the watermark before the checkpoint is saved.

Previously, the zero time caused lastWriteTime.Add(3s).Before(now) to
be true on the first event, triggering an immediate checkpoint save
attempt. But since jobs are processed asynchronously, the watermark
was still 0 (initial value), causing the save to be skipped due to
the 'if offsetTsNs == 0 { return nil }' check.

Fixes #7717

* filer.sync: save checkpoint on graceful shutdown

Add graceful shutdown handling to save the final checkpoint when
filer.sync is terminated. Previously, any sync progress within the
last 3-second checkpoint interval would be lost on shutdown.

Changes:
- Add syncState struct to track current processor and offset save info
- Add atomic pointers syncStateA2B and syncStateB2A for both directions
- Register grace.OnInterrupt hook to save checkpoints on shutdown
- Modify doSubscribeFilerMetaChanges to update sync state atomically

This ensures that when filer.sync is restarted, it resumes from the
correct position instead of potentially replaying old events.

Fixes #7717
2025-12-11 10:25:02 -08:00
Numblgw
aebf3952b7 filer sync: source path and exclude path support dir suffix (#6268)
filer sync: source path and exclude path support dir suffix

Co-authored-by: liguowei <liguowei@xinye.com>
2024-11-21 08:25:12 -08:00
Max Denushev
a5fe6e21bc feat(filer.backup): add ignore errors option (#6235)
* feat(filer.backup): add ignore errors option

* feat(filer.backup): fix 404 error wrap

* feat(filer.backup): fix wrapping function

* feat(filer.backup): fix wrapping errors in genProcessFunction

* Update weed/command/filer_backup.go

* Update weed/command/filer_backup.go

* Update weed/command/filer_backup.go

---------

Co-authored-by: Max Denushev <denushev@tochka.com>
Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>
2024-11-14 08:40:55 -08:00
chrislu
19d14b4c71 minor 2024-07-16 09:21:01 -07:00
vadimartynov
b796c21fa9 Added loadSecurityConfigOnce (#5792) 2024-07-16 09:15:55 -07:00
chrislu
3a82f5ffad ensure metadata follow a specific folder
fix https://github.com/seaweedfs/seaweedfs/issues/5774
2024-07-12 11:17:30 -07:00
Konstantin Lebedev
d821cb3b18 fix: sync without dir /buckets/some/.uploads/hash_hash (#5402) 2024-03-20 12:54:29 -07:00
chrislu
83e4b02517 fix 2024-01-18 09:16:20 -08:00
chrislu
15b66a6633 refactor 2024-01-18 09:13:14 -08:00
XIAOYQ
be166b434f fix: skip s3 .uploads 2024-01-18 22:13:46 +08:00
Konstantin Lebedev
8d23e36c45 fix: doDeleteFiles deletes files (#5198) 2024-01-12 11:04:29 -08:00
Konstantin Lebedev
1169f94310 Fix filer sync set offset (#5197)
* fix: compose 2mount with sync

* fix: DATA RACE
https://github.com/seaweedfs/seaweedfs/issues/5194
https://github.com/seaweedfs/seaweedfs/issues/5195
2024-01-12 10:57:18 -08:00
Konstantin Lebedev
b9d32d32e1 chore: filer sync add doDeleteFiles option for create only mode (#5166) 2024-01-06 10:02:16 -08:00
Konstantin Lebedev
2b229e98ce fix: add doDeleteFile option for filer backup 2023-11-17 07:37:28 -08:00
Konstantin Lebedev
3c5295a1a6 filer backup add option for exclude file names that match the regexp to sync on filer 2023-11-13 06:23:46 -08:00
chrislu
5db9fcccd4 refactoring 2023-03-21 23:01:49 -07:00
chrislu
81fdf3651b grpc connection to filer add sw-client-id header 2023-01-20 01:48:12 -08:00
chrislu
8a40fa8993 more detailed logs 2022-12-17 13:18:35 -08:00
Jiffs Maverick
4b0430e71d [metrics] Add the ability to control bind ip (#4012) 2022-11-24 10:22:59 -08:00
chrislu
21c0587900 go fmt 2022-09-14 23:06:44 -07:00
Ryan Russell
2c92a9ff74 refactor: DefaultConcurrencyLimit var rename (#3658) 2022-09-14 06:30:32 -07:00
bernardx
228b133afa new 'concurrency' parameter for filer.sync (#3579)
Co-authored-by: XIAOYQ <xiaoyq@eudic.net>
2022-09-02 23:03:23 -07:00
qzh
7fcfaf7bc9 fix(filer.sync): offset may be set to 0 (#3451)
* fix(filer.sync): initializing the offset is related to the path

* fix(filer.sync): the offset may be set to 0.

Co-authored-by: zhihao.qu <zhihao.qu@ly.com>
2022-08-15 23:43:52 -07:00
qzh
400f0c3e5d fix(filer.sync): initializing the offset is related to the path (#3450)
Co-authored-by: zhihao.qu <zhihao.qu@ly.com>
2022-08-15 21:56:47 -07:00
chrislu
67814a5c79 refactor and fix strings.Split 2022-08-07 01:34:32 -07:00
chrislu
1a4bf0dcb5 filer.sync: parallelize the filer.sync 2022-08-07 00:56:15 -07:00
chrislu
0e9478488d filer.sync: fix when excluded paths is empty 2022-08-07 00:55:34 -07:00
chrislu
26dbc6c905 move to https://github.com/seaweedfs/seaweedfs 2022-07-29 00:17:28 -07:00
Konstantin Lebedev
7e09a548a6 exclude directories to sync on filer 2022-07-27 19:22:57 +05:00
chrislu
64f3d6fb6e metadata subscription uses client epoch 2022-07-23 10:50:28 -07:00
chrislu
4a65159250 fix reading time 2022-06-27 12:40:47 -07:00
zhihao.qu
4d0d1848c6 fix(filer.sync): modify clientName format : from -> to 2022-06-15 13:33:20 +08:00
zhihao.qu
42d04c581b feat(filer.sync): add metricsServer in filer.sync.
Metrics include:
(1) the offset of the filer.sync
(2) the last send timestamp of the filer subscription
2022-06-15 11:33:18 +08:00
zhihao.qu
14d82c3dea feat(filer.sync): add offset to path. 2022-06-14 19:46:02 +08:00
zhihao.qu
cd5cca36a4 feat(filer.sync): add fromTsMs. Extract signature from doSubscribeFilerMetaChanges 2022-06-09 10:53:19 +08:00
creeew
02ae102731 fix filer.sync missing source srv uploaded files to target when target down 2022-06-02 01:28:47 +08:00
chrislu
a2b101a737 subscribe metadata between a range 2022-05-30 15:04:19 -07:00
chrislu
1384529eb7 Fix filer.backup deletes files in backup folder in incremental mode
fix https://github.com/chrislusf/seaweedfs/issues/2919
2022-04-14 13:35:01 -07:00
chrislu
202a29d014 refactoring 2022-02-25 01:17:26 -08:00