101 Commits

Author SHA1 Message Date
Chris Lu
24e664d651 fix(shell): don't halt volume.fsck purge on a stuck read-only volume (#9714)
* fix(shell): don't halt volume.fsck purge on a stuck read-only volume

A failed VolumeMarkWritable on one volume aborted the entire fsck purge
run; per-volume errors now log and continue so remaining volumes still
get purged.

* fix(shell): unify volume.fsck per-volume skip logging at the caller

Return the mark-writable error from purgeOneVolume instead of logging
in two places — the caller already prints "skip purging volume N: %v"
and defers still fire on the error return.

* fix(shell): collect volume.fsck purge-skipped volumes and report at end

Track volume IDs whose purge was skipped (mark-writable failure or
other per-volume errors) and print a sorted summary so operators don't
have to scrape the run log to find them. Deletes for those volumes are
already skipped; this just makes them explicit.
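
The log-and-continue behavior with an end-of-run summary can be sketched roughly as follows. This is an illustration only: `purgeAll`, `errMarkWritable`, and the callback shape are hypothetical names, not the actual fsck code.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// errMarkWritable is a hypothetical stand-in for a failed mark-writable call.
var errMarkWritable = errors.New("mark writable failed")

// purgeAll calls purgeOne for every volume, keeps going past failures, and
// returns the sorted ids of the volumes whose purge was skipped.
func purgeAll(volumeIds []uint32, purgeOne func(uint32) error) []uint32 {
	var skipped []uint32
	for _, vid := range volumeIds {
		if err := purgeOne(vid); err != nil {
			// Log at the caller and continue so one stuck volume
			// does not halt the whole run.
			fmt.Printf("skip purging volume %d: %v\n", vid, err)
			skipped = append(skipped, vid)
		}
	}
	sort.Slice(skipped, func(i, j int) bool { return skipped[i] < skipped[j] })
	return skipped
}

func main() {
	purgeOne := func(vid uint32) error {
		if vid == 7 || vid == 3 {
			return errMarkWritable
		}
		return nil
	}
	skipped := purgeAll([]uint32{7, 1, 3, 5}, purgeOne)
	if len(skipped) > 0 {
		fmt.Printf("purge skipped for volumes: %v\n", skipped)
	}
}
```

Returning the error from the per-volume function and logging once at the caller keeps a single place responsible for the "skip purging" message.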
2026-05-27 17:49:35 -07:00
Chris Lu
b9bf45cb2e fix(shell): scope volume.fsck filer walk when -volumeId selects one bucketed collection (#9347)
* fix(shell): scope volume.fsck filer walk to the bucket when -volumeId selects one bucketed collection

Closes #9345.

-volumeId only filtered which volume .idx files were pulled; the filer-side
BFS still walked from "/", printing every directory under -v and making it
look like the flag was ignored. When all requested volumes share a single
non-empty collection that maps to an existing <bucketsPath>/<collection>
directory, restrict the BFS root to that bucket. Empty-collection volumes
or multi-collection selections fall back to the full walk, since chunks
for those can live anywhere.
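
A minimal sketch of the root-selection rule described above, assuming the collections of the requested volumes are already known. `traversalRoot`, the `dirExists` callback, and the `/buckets` path in the example are assumptions made for illustration; the real command resolves these against the filer.

```go
package main

import "fmt"

// traversalRoot picks the directory the filer walk should start from.
// It narrows the walk to <bucketsPath>/<collection> only when every
// requested volume shares one non-empty collection and that directory
// exists; in every other case it falls back to "/".
func traversalRoot(collections []string, bucketsPath string, dirExists func(string) bool) string {
	if len(collections) == 0 || collections[0] == "" {
		return "/"
	}
	first := collections[0]
	for _, c := range collections[1:] {
		if c != first {
			return "/" // multi-collection selection: chunks can live anywhere
		}
	}
	candidate := bucketsPath + "/" + first
	if dirExists(candidate) {
		return candidate
	}
	return "/"
}

func main() {
	exists := func(dir string) bool { return dir == "/buckets/photos" }
	fmt.Println(traversalRoot([]string{"photos", "photos"}, "/buckets", exists))
	fmt.Println(traversalRoot([]string{"photos", "logs"}, "/buckets", exists))
	fmt.Println(traversalRoot([]string{""}, "/buckets", exists))
}
```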

* trim comments

* address review: collapse getCollectFilerFilePath; unshadow receiver in loop
2026-05-07 10:04:01 -07:00
Chris Lu
294f7c3d04 shell: expand ~ in local file path arguments (#9265)
* shell: expand `~` in local file path arguments

The weed shell parses commands itself instead of going through an OS
shell, so a path like `~/Downloads/foo.meta` was passed verbatim to
`os.Open`, which fails because no `~` directory exists. Users had to
spell out absolute home paths in every command.

Add an `expandHomeDir` helper that resolves a leading `~` or `~/...` to
the user's home directory, and run user-supplied local file paths in
the affected shell commands through it:

  fs.meta.load          (positional file)
  fs.meta.save          (-o)
  fs.meta.changeVolumeId (-mapping)
  s3.iam.export         (-file)
  s3.iam.import         (-file)
  s3.policy             (-file)
  s3tables.bucket       (-file)
  s3tables.table        (-file, -metadata)
  volume.fsck           (-tempPath)

Filer-namespace path flags (`-dir`, `-path`, `-locationPrefix`, etc.)
are unaffected; they live in the filer, not on the local FS.

* shell: reuse util.ResolvePath instead of a new helper

util.ResolvePath already does tilde expansion; drop the local
expandHomeDir helper and route every shell call site through it.
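
For illustration, leading-tilde expansion can be sketched as below. `expandHome` is a hypothetical function written for this example; it is not `util.ResolvePath`, whose exact behavior is not shown in this log.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// expandHome replaces a leading "~" or "~/" with home. Anything else,
// including "~user/x" and a "~" in the middle of a path, is returned
// unchanged.
func expandHome(path, home string) string {
	if path == "~" {
		return home
	}
	if strings.HasPrefix(path, "~/") {
		return home + path[1:]
	}
	return path
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Println("cannot determine home directory:", err)
		return
	}
	fmt.Println(expandHome("~/Downloads/foo.meta", home))
}
```

Only local file arguments would go through such a helper; filer-namespace paths are left alone because `~` has no special meaning there.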
2026-04-28 12:30:13 -07:00
Chris Lu
e725eb4079 refactor(shell): run volume.fsck purge once per volume, after all replicas (#9159)
* refactor(shell): run volume.fsck purge once per volume, after all replicas

The purge step in findExtraChunksInVolumeServers was nested inside the
outer `for dataNodeId` loop, so it fired once per data-node iteration
rather than once total. Two consequences:

1. The replica-intersection safety net was broken. The code marks a fid
   "found in all replicas" only after every replica has reported its
   orphans, but the purge ran after the first data node already, so
   fids contributed only by later replicas never got the `true` flag
   in time. Without `-forcePurging` that meant some legitimate orphans
   were never purged; with `-forcePurging` the flag was ignored so the
   bug was hidden.

2. Visible output got noisy: "purging orphan data for volume X..."
   printed 2-3 times per volume (N_datanodes * N_replicas RPCs to the
   same locations) since purgeFileIdsForOneVolume already fans out to
   every replica location via MasterClient.GetLocations.

Split the work into two explicit phases: collect orphans from every
replica first, then purge each volume once. Drop the per-replica loop
around purgeFileIdsForOneVolume since it already handles all replicas
internally. Keep the per-replica mark-writable loop (each replica's
readonly bit has to be flipped before the purge RPC fans out to it).

Also simplify the gating expression — `isSeveralReplicas &&
foundInAllReplicas` is redundant given the preceding `!isSeveralReplicas`
branch — and replace `!(X > 0)` with the more idiomatic `len(X) == 0`.

Related to #9116 follow-up on multiple fsck passes needed to fully
clean a volume.

* address review: per-replica readonly tracking, count-based intersection, defer-per-volume

Three issues raised on the v1:

1. The readonly cleanup stored a single isReadOnlyReplicas[volumeId]=bool
   that flipped true if any replica was read-only, then the defer marked
   every replica in serverReplicas[volumeId] read-only on exit. If a
   volume had mixed replica modes (one RO, one RW), the originally-RW
   replica ended up RO after fsck returned. Track read-only state per
   replica in readOnlyServerReplicas[volumeId] and revert only those.

2. The defer inside the volumeId loop accumulated for the entire fsck
   run, so every volume we processed stayed writable until the whole
   command returned. Split the per-volume logic into purgeOneVolume so
   the defers unwind between volumes.

3. The intersection logic used a sticky bool that treated "seen on any
   2 of 3 replicas" as "seen on all replicas" — a 3+-replica volume
   would get purged for fids only 2 replicas agreed on, which is what
   -forcePurging is supposed to opt into. Switch to a count-based
   map[fid]int compared against volumeReplicaCounts[volumeId], so we
   only purge without -forcePurging when every replica agrees.

Also drop the now-unused serverReplicas map.
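
The count-based intersection from item 3 can be illustrated with a small sketch. The function name and the slice-of-slices input are hypothetical; the idea is only that a fid qualifies without force when its count equals the number of replicas.

```go
package main

import "fmt"

// orphansToPurge returns the fids that may be purged for one volume.
// perReplicaOrphans holds one orphan list per replica. Without force,
// a fid qualifies only when every replica reported it.
func orphansToPurge(perReplicaOrphans [][]string, force bool) []string {
	counts := make(map[string]int)
	var order []string // first-seen order, for stable output
	for _, orphans := range perReplicaOrphans {
		seen := make(map[string]bool) // ignore duplicates within one replica
		for _, fid := range orphans {
			if seen[fid] {
				continue
			}
			seen[fid] = true
			if counts[fid] == 0 {
				order = append(order, fid)
			}
			counts[fid]++
		}
	}
	var result []string
	for _, fid := range order {
		if force || counts[fid] == len(perReplicaOrphans) {
			result = append(result, fid)
		}
	}
	return result
}

func main() {
	replicas := [][]string{
		{"3,01", "3,02", "3,03"},
		{"3,01", "3,02"},
		{"3,01", "3,03"},
	}
	fmt.Println(orphansToPurge(replicas, false)) // [3,01]
	fmt.Println(orphansToPurge(replicas, true))  // [3,01 3,02 3,03]
}
```

A sticky bool would have accepted `3,02` and `3,03` here because two replicas agree on each; comparing the count against the replica count rejects them unless force is set.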
2026-04-20 15:32:47 -07:00
Chris Lu
08a7502b2c fix(shell): error on missing volume id in fsck, mergeVolumes, vacuum (#9158)
* fix(shell): error on missing volume id in fsck, mergeVolumes, vacuum

Three shell commands silently report success when -volumeId /
-fromVolumeId / -toVolumeId names a volume the master doesn't know
about: typos, already-deleted volumes, and stale scripts all look
identical to a clean no-op, which is what made the confusion in #9116
take as long as it did to diagnose.

- volume.fsck: filter at the per-datanode loop drops unknown ids and
  findExtraChunksInVolumeServers ends with totalOrphanChunkCount==0,
  printing "no orphan data".
- fs.mergeVolumes: createMergePlan iterates only known volumes, so an
  unknown -fromVolumeId produces an empty plan and we print just the
  "max volume size: N MB" header (indistinguishable from "nothing to
  merge").
- volume.vacuum: the master's VacuumVolume RPC silently iterates
  matching volumes; a missing id returns success having done nothing.

Validate the requested ids against the current topology up front and
return an explicit "volume(s) not found on master: [X Y]" error. Also
drop a stale duplicate `if err != nil` in volume.fsck.Do left over from
a prior refactor.

Surfaces #9116 follow-up from madalee-com.

* address review: propagate reloadVolumesInfo error; dedupe vacuum missing ids

- fs.mergeVolumes: c.reloadVolumesInfo's return was ignored. If the
  master is unreachable or VolumeList fails, c.volumes stays empty and
  the new validation block reports "fromVolumeId X not found on master"
  — masking the real connection/RPC failure. Return the wrapped error
  instead.

- volume.vacuum: "volume.vacuum -volumeId 5,5,5" on a missing volume 5
  listed [5 5 5] in the error. Collect missing ids in a set so each
  missing id appears once.
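
A rough sketch of validating requested ids against the known topology while listing each missing id once. `missingVolumeIds` and the `known` map are hypothetical; the error text mirrors the message quoted above.

```go
package main

import (
	"fmt"
	"sort"
)

// missingVolumeIds returns the requested ids that are absent from known,
// each listed once and sorted for stable output.
func missingVolumeIds(requested []uint32, known map[uint32]bool) []uint32 {
	missingSet := make(map[uint32]struct{})
	for _, vid := range requested {
		if !known[vid] {
			missingSet[vid] = struct{}{}
		}
	}
	missing := make([]uint32, 0, len(missingSet))
	for vid := range missingSet {
		missing = append(missing, vid)
	}
	sort.Slice(missing, func(i, j int) bool { return missing[i] < missing[j] })
	return missing
}

func main() {
	known := map[uint32]bool{1: true, 2: true}
	if missing := missingVolumeIds([]uint32{5, 5, 5, 2, 9}, known); len(missing) > 0 {
		fmt.Printf("volume(s) not found on master: %v\n", missing)
	}
}
```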

* address review: reject fromVolumeId/toVolumeId values that overflow uint32

flag.Uint produces a uint (64-bit on amd64), and the existing cast to
needle.VolumeId silently truncates to uint32. A typo like
`-fromVolumeId=4294967297` would wrap to volume 1 and slip past every
other validation, so the merge would run against a completely
different volume than the operator intended.

Bail out with an explicit error when the raw flag value exceeds the
uint32 range, before the cast.
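
The truncation and the range check can be shown in a few lines. `toVolumeId` is a hypothetical helper for this example and returns a plain `uint32` rather than `needle.VolumeId`.

```go
package main

import (
	"fmt"
	"math"
)

// toVolumeId converts a raw flag value to a 32-bit volume id and rejects
// values that would otherwise wrap around during the cast.
func toVolumeId(raw uint64) (uint32, error) {
	if raw > math.MaxUint32 {
		return 0, fmt.Errorf("volume id %d exceeds the uint32 range", raw)
	}
	return uint32(raw), nil
}

func main() {
	raw := uint64(4294967297) // 2^32 + 1
	fmt.Println(uint32(raw))  // the unchecked cast wraps to 1
	if _, err := toVolumeId(raw); err != nil {
		fmt.Println("error:", err)
	}
}
```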
2026-04-20 15:32:31 -07:00
Chris Lu
9a6b566fb1 fix(shell): volume.fsck keeps going past a single broken chunk manifest (#9140)
* fix(shell): volume.fsck no longer aborts on a single broken chunk manifest

Previously a single entry whose chunk-manifest could not be read (e.g. the
manifest needle was missing or its sub-chunks pointed at a now-gone volume)
caused collectFilerFileIdAndPaths to return immediately with
"failed to ResolveChunkManifest". The whole fsck run failed, so an operator
with even one corrupted file could not use volume.fsck to find or clean up
unrelated orphan needles on other volumes — they had to locate and delete
the bad entries first, blind, with no help from fsck.

Log the resolution failure with the entry path, fall back to recording the
top-level chunk fids the entry references (data fids and manifest fids
themselves; sub-chunks behind the unresolvable manifest stay unknown), and
keep traversing. Track the count of unresolved entries on the command struct
and refuse -reallyDeleteFromVolume for the run when the count is non-zero,
since the in-use fid set is incomplete and a purge could otherwise delete
live sub-chunks behind the broken manifest. Read-only fsck still produces a
useful (if conservatively over-reported) orphan listing so the operator can
see and fix the broken entries first, then re-run with apply.

Discovered while diagnosing #9116.

* address review: use callback ctx and atomic counter

- Pass the BFS callback's ctx to ResolveChunkManifest so a Ctrl+C / first-error
  cancellation propagates into the manifest fetch instead of using
  context.Background().
- TraverseBfs runs the callback across K=5 worker goroutines (filer_pb/filer_client_bfs.go),
  so the unresolvedManifestEntries field on commandVolumeFsck is shared across
  workers and was racing. Switch it to atomic.Int64 with Add/Load.

* address review: reset counter per Do(), pass through ctx errors

- commandVolumeFsck is a singleton registered in init() and reused across
  shell invocations. Without resetting the unresolved-manifest counter at
  the top of Do(), a single failed run permanently suppressed
  -reallyDeleteFromVolume in the same shell session. Reset to 0 right
  after flag parsing.
- Treating context cancellation as manifest corruption was wrong: a
  Ctrl+C or deadline mid-traversal would inflate the counter and emit
  misleading "manifest broken" warnings for entries that were never
  examined. Detect context.Canceled / context.DeadlineExceeded and
  return the error so the BFS unwinds cleanly.
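
The review points above (atomic counter shared across workers, reset per run, context errors passed through) can be sketched together. Everything named here, including `fsckRun`, `onResolveError`, and `errBrokenManifest`, is hypothetical and only models the described behavior.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

// errBrokenManifest is a hypothetical stand-in for a manifest read failure.
var errBrokenManifest = errors.New("manifest unreadable")

// fsckRun is a hypothetical holder for state shared by traversal workers.
type fsckRun struct {
	unresolved atomic.Int64
}

// reset clears the counter; call it at the start of every run so an
// earlier failed run does not block purging in later ones.
func (r *fsckRun) reset() { r.unresolved.Store(0) }

// onResolveError returns context errors unchanged so the traversal stops,
// and counts every other error so the traversal can continue.
func (r *fsckRun) onResolveError(err error) error {
	if err == nil {
		return nil
	}
	if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
		return err
	}
	r.unresolved.Add(1)
	return nil
}

// purgeAllowed reports whether the in-use fid set is complete enough to purge.
func (r *fsckRun) purgeAllowed() bool { return r.unresolved.Load() == 0 }

// canceledErr returns the error of an already-canceled context.
func canceledErr() error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return ctx.Err()
}

func main() {
	var run fsckRun
	run.reset()
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ { // five workers, as in the BFS described above
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = run.onResolveError(errBrokenManifest)
		}()
	}
	wg.Wait()
	fmt.Println("unresolved:", run.unresolved.Load(), "purge allowed:", run.purgeAllowed())
	fmt.Println("cancellation passes through:", run.onResolveError(canceledErr()))
}
```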

Not changing the findMissingChunksInFiler branch's purgeAbsent /
applyPurging gating: that path checks recorded filer fids against
volume idx files, and a broken-manifest entry's recorded manifest fid
will fail the existence check and get purged — which is the cleanup
the operator wants for those entries. Adding a gate would block the
exact use case the warning points them at.
2026-04-19 23:06:28 -07:00
Chris Lu
cd6832249b Fix volume.fsck crashing on EC volumes and add multi-volume vacuum support (#8406)
* helm: refine openshift-values.yaml to remove hardcoded UIDs

Remove hardcoded runAsUser, runAsGroup, and fsGroup from the
openshift-values.yaml example. This allows OpenShift's admission
controller to automatically assign a valid UID from the namespace's
allocated range, avoiding "forbidden" errors when UID 1000 is
outside the permissible range.

Updates #8381, #8390.

* helm: fix volume.logs and add consistent security context comments

* Update README.md

* fix volume.fsck crashing on EC volumes and add multi-volume vacuum support

* address comments
2026-02-22 22:07:15 -08:00
Chris Lu
a3136c523f Fix volume.fsck 401 Unauthorized by adding JWT to HTTP delete requests (#8306)
* Fix volume.fsck 401 Unauthorized by adding JWT to HTTP delete requests

* Additionally, for performance, consider fetching the jwt.filer_signing.key once before any loops that call httpDelete, rather than inside httpDelete itself, to avoid repeated configuration lookups.
2026-02-11 13:32:56 -08:00
Chris Lu
6a61037333 fix issue #8230: volume.fsck deletion logic to respect purgeAbsent flag (#8266)
* fix issue #8230: volume.fsck deletion logic to respect purgeAbsent flag

This commit fixes two issues in volume.fsck:
1. Missing chunks in existing volumes are now deleted if -reallyDeleteFilerEntries is set.
2. Missing volumes are now properly handled when a -volumeId filter is specified, allowing deletion of filer entries for those volumes.

* address PR feedback for issue #8230

- Ensure volume filter is applied before reporting missing volumes
- Fix potential nil-pointer dereferences in httpDelete method
- Use proper error checking throughout httpDelete

* address second round PR feedback for issue #8230

- Use fmt.Fprintf(c.writer, ...) instead of fmt.Printf
- Add missing newline in "deleting path" log message
2026-02-09 13:23:17 -08:00
Chris Lu
94e0b902f9 shell: update fs.verify and volume.fsck for new BFS signature
Updated dependent commands to match the refactored
doTraverseBfsAndSaving signature and use context for channel sends.
2026-01-29 14:42:10 -08:00
Jaehoon Kim
f2e7af257d Fix volume.fsck -forcePurging -reallyDeleteFromVolume to fail fast on filer traversal errors (#8015)
* Add TraverseBfsWithContext and fix race conditions in error handling

- Add TraverseBfsWithContext function to support context cancellation
- Fix race condition in doTraverseBfsAndSaving using atomic.Bool and sync.Once
- Improve error handling with fail-fast behavior and proper error propagation
- Update command_volume_fsck to use error-returning saveFn callback
- Enhance error messages in readFilerFileIdFile with detailed context

* refactoring

* fix error format

* atomic

* filer_pb: make enqueue return void

* shell: simplify fs.meta.save error handling

* filer_pb: handle enqueue return value

* Revert "atomic"

This reverts commit 712648bc35.

* shell: refine fs.meta.save logic

---------

Co-authored-by: Chris Lu <chris.lu@gmail.com>
2026-01-14 21:37:50 -08:00
Chris Lu
93cca3a96b volume.fsck: increase default cutoffTimeAgo from 5 minutes to 5 hours (#7730)
* volume.fsck: increase default cutoffTimeAgo from 5 minutes to 5 hours

This change makes the fsck check more conservative by only considering
chunks older than 5 hours as potential orphans. A 5 minute window was
too aggressive and could incorrectly flag recently written chunks,
especially in busy systems or during backup operations.

Addresses #7649
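
As a rough illustration of what the cutoff does, the check below treats a chunk as an orphan candidate only when it is older than the cutoff window. `olderThanCutoff` and the use of an append timestamp in Unix nanoseconds are assumptions for this sketch, not the command's actual code.

```go
package main

import (
	"fmt"
	"time"
)

// olderThanCutoff reports whether a chunk appended at appendAtNs (Unix
// nanoseconds) is older than cutoff, measured back from nowNs.
func olderThanCutoff(appendAtNs, nowNs int64, cutoff time.Duration) bool {
	return appendAtNs < nowNs-cutoff.Nanoseconds()
}

func main() {
	now := time.Now()
	cutoff := 5 * time.Hour // the new default described above
	recent := now.Add(-10 * time.Minute).UnixNano()
	old := now.Add(-6 * time.Hour).UnixNano()
	fmt.Println(olderThanCutoff(recent, now.UnixNano(), cutoff)) // false
	fmt.Println(olderThanCutoff(old, now.UnixNano(), cutoff))    // true
}
```

With the earlier 5 minute window, the chunk written 10 minutes ago would already have been a candidate.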

* Update command_volume_fsck.go

* volume.fsck: add help text explaining cutoffTimeAgo parameter

* Update command_volume_fsck.go
2025-12-12 23:42:27 -08:00
Chris Lu
6a8c53bc44 Filer: batch deletion operations to return individual error results (#7382)
* batch deletion operations to return individual error results

Modify batch deletion operations to return individual error results instead of one aggregated error, enabling better tracking of which specific files failed to delete (helping reduce orphan file issues).

* Simplified logging logic

* Optimized nested loop

* handles the edge case where the RPC succeeds but connection cleanup fails

* simplify

* simplify

* ignore 'not found' errors here
2025-10-25 00:09:18 -07:00
Yavor Konstantinov
832df5265f Fix 'NaN%' issue when running volume.fsck (#7368)
* Fix 'NaN%' issue when running volume.fsck

- Running `volume.fsck` on an empty cluster will display 'NaN%'.

* Refactor

- Extract count of orphan chunks in summary to new var.
- Restore handling for 'NaN' for individual volumes. It's not necessary
  because the check is already done.

* Make code more idiomatic
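
The 'NaN%' output comes from a floating-point 0/0 when there are no chunks at all. A minimal sketch of the guard, with `orphanPercent` as a hypothetical helper:

```go
package main

import "fmt"

// orphanPercent returns the share of orphan chunks as a percentage and
// returns 0 for an empty cluster instead of dividing zero by zero.
func orphanPercent(orphan, total uint64) float64 {
	if total == 0 {
		return 0
	}
	return float64(orphan) * 100 / float64(total)
}

func main() {
	zero := 0.0
	fmt.Printf("%.2f%%\n", zero/zero*100)          // unguarded: prints NaN%
	fmt.Printf("%.2f%%\n", orphanPercent(0, 0))    // 0.00%
	fmt.Printf("%.2f%%\n", orphanPercent(25, 200)) // 12.50%
}
```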
2025-10-23 21:44:19 -07:00
Chris Lu
97f3028782 Clean up logs and deprecated functions (#7339)
* less logs

* fix deprecated grpc.Dial
2025-10-17 22:11:50 -07:00
Chris Lu
69553e5ba6 convert error formatting to %w everywhere (#6995) 2025-07-16 23:39:27 -07:00
Aleksey Kosov
283d9e0079 Add context with request (#6824) 2025-05-28 11:34:02 -07:00
Lisandro Pin
0d5393641e Unify usage of shell.EcNode.dc as DataCenterId. (#6258) 2024-11-19 06:33:18 -08:00
chrislu
20929f2a57 adjust resource heavy for volume.fix.replication 2024-09-29 11:32:18 -07:00
chrislu
6564ceda91 skip resource heavy commands from running on master nodes 2024-09-29 10:51:17 -07:00
chrislu
ec30a504ba refactor 2024-09-29 10:38:22 -07:00
chrislu
701abbb9df add IsResourceHeavy() to command interface 2024-09-28 20:23:01 -07:00
Max Denushev
d056c0ddf2 fix(volume): don't persist RO state in specific cases (#6058)
* fix(volume): don't persist RO state in specific cases

* fix(volume): writable always persist
2024-09-24 16:15:54 -07:00
chrislu
8378a5b70b rename 2024-08-01 23:54:42 -07:00
wyang
31b89c1062 fsck: only check the appendNs of deleted needle (#5841)
increase fsck speed

Co-authored-by: Yang Wang <yangwang@weride.ai>
2024-07-31 01:12:57 -07:00
vadimartynov
86d92a42b4 Added tls for http clients (#5766)
* Added global http client

* Added Do func for global http client

* Changed the code to use the global http client

* Fix http client in volume uploader

* Fixed pkg name

* Fixed http util funcs

* Fixed http client for bench_filer_upload

* Fixed http client for stress_filer_upload

* Fixed http client for filer_server_handlers_proxy

* Fixed http client for command_fs_merge_volumes

* Fixed http client for command_fs_merge_volumes and command_volume_fsck

* Fixed http client for s3api_server

* Added init global client for main funcs

* Rename global_client to client

* Changed:
- fixed NewHttpClient;
- added CheckIsHttpsClientEnabled func
- updated security.toml in scaffold

* Reduce the visibility of some functions in the util/http/client pkg

* Added the loadSecurityConfig function

* Use util.LoadSecurityConfiguration() in NewHttpClient func
2024-07-16 23:14:09 -07:00
Taehyung Lim
4744889973 fix issue: sometimes volume.fsck report 'volume not found' (#5537)
* fix issue: sometimes volume.fsck report 'volume not found' when a volume server has multiple disk types

* rename variable

* adjust counters

---------

Co-authored-by: chrislu <chris.lu@gmail.com>
2024-06-11 22:22:57 -07:00
NyaMisty
579ebbdf60 Support concurrent volume.fsck & support disabling -cutoffTimeAgo to improve speed (#5636) 2024-06-02 14:25:42 -07:00
Seyed Mahdi Sadegh Shobeiri
97236389e8 Add modifyTimeAgo to volume.fsck (#5133)
* Add modifyTimeAgo to volume.fsck

* Fix AppendAtNs
2023-12-23 12:17:30 -08:00
Seyed Mahdi Sadegh Shobeiri
54ba2c8868 Fix cutoffTimeAgo in findMissingChunksInFiler (#5132) 2023-12-23 09:18:16 -08:00
zemul
0bf56298d5 fix chunk.ModifiedTsNs (#4264)
* fix

* fix mtime s > ns

---------

Co-authored-by: zemul <zhouzemiao@ihuman.com>
2023-03-02 08:24:36 -08:00
Zachary Walters
ef2f741823 Updated the deprecated ioutil dependency (#4239) 2023-02-21 19:47:33 -08:00
chrislu
e037c71ec3 adjust text 2023-02-10 13:04:29 -08:00
chrislu
67b8c2853a add line return 2023-02-10 12:53:43 -08:00
Chris Lu
d4566d4aaa more solid weed mount (#4089)
* compare chunks by timestamp

* fix slab clearing error

* fix test compilation

* move oldest chunk to sealed, instead of by fullness

* lock on fh.entryViewCache

* remove verbose logs

* revert slab clearing

* less logs

* less logs

* track write and read by timestamp

* remove useless logic

* add entry lock on file handle release

* use mem chunk only, swap file chunk has problems

* comment out code that maybe used later

* add debug mode to compare data read and write

* more efficient readResolvedChunks with linked list

* small optimization

* fix test compilation

* minor fix on writer

* add SeparateGarbageChunks

* group chunks into sections

* turn off debug mode

* fix tests

* fix tests

* tmp enable swap file chunk

* Revert "tmp enable swap file chunk"

This reverts commit 985137ec47.

* simple refactoring

* simple refactoring

* do not re-use swap file chunk. Sealed chunks should not be re-used.

* comment out debugging facilities

* either mem chunk or swap file chunk is fine now

* remove orderedMutex  as *semaphore.Weighted

not found impactful

* optimize size calculation for changing large files

* optimize performance to avoid going through the long list of chunks

* still problems with swap file chunk

* rename

* tiny optimization

* swap file chunk save only successfully read data

* fix

* enable both mem and swap file chunk

* resolve chunks with range

* rename

* fix chunk interval list

* also change file handle chunk group when adding chunks

* pick in-active chunk with time-decayed counter

* fix compilation

* avoid nil with empty fh.entry

* refactoring

* rename

* rename

* refactor visible intervals to *list.List

* refactor chunkViews to *list.List

* add IntervalList for generic interval list

* change visible interval to use IntervalList in generics

* change chunkViews to *IntervalList[*ChunkView]

* use NewFileChunkSection to create

* rename variables

* refactor

* fix renaming leftover

* renaming

* renaming

* add insert interval

* interval list adds lock

* incrementally add chunks to readers

Fixes:
1. set start and stop offset for the value object
2. clone the value object
3. use pointer instead of copy-by-value when passing to interval.Value
4. use insert interval since adding chunk could be out of order

* fix tests compilation

* fix tests compilation
2023-01-02 23:20:45 -08:00
chrislu
70a4c98b00 refactor filer_pb.Entry and filer.Entry to use GetChunks()
for later locking on reading chunks
2022-11-15 06:33:36 -08:00
Konstantin Lebedev
0999f9b7ff [volume.fsck] collect ids without cut off time for finding missing data from volumes (#3934)
collect all file ids from the file without cut off time for finding missing data from volumes
2022-10-31 11:38:12 -07:00
Konstantin Lebedev
a322ba042e [volume.fsck] param volumeId accepts comma-separated volume ids (#3933)
volume.fsck param volumeId accepts comma-separated volume ids

Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>
2022-10-31 11:36:26 -07:00
Konstantin Lebedev
c0deaa4948 [volume.fsck] check needles status from volume server (#3926)
check needles status from volume server
2022-10-31 11:33:04 -07:00
Konstantin Lebedev
bf8a9d2db1 [volume.check.disk] sync of deletions the fix (#3923)
* sync of deletions the fix

* avoid return if only partiallyDeletedNeedles

* refactor sync deletions
2022-10-30 20:32:46 -07:00
chrislu
ea2637734a refactor filer proto chunk variable from mtime to modified_ts_ns 2022-10-28 12:53:19 -07:00
Eric Yang
51d462f204 ADHOC: volume fsck using append at ns (#3906)
* ADHOC: volume fsck using append at ns

* nit

* nit

Co-authored-by: root <root@HQ-10MSTD3EY.roblox.local>
2022-10-24 22:09:38 -07:00
chrislu
377870f4a9 keep system log data 2022-10-24 16:50:39 -07:00
Konstantin Lebedev
7836f7574e [volume.fsck] hotfix apply purging and add option verifyNeedle #3860 (#3861)
* fix apply purging and add verifyNeedle

* common readSourceNeedleBlob

* use consts
2022-10-15 20:38:46 -07:00
Konstantin Lebedev
f19c9e3d9d Volume fsck by volume (#3851)
* refactor

* refactor args verbose and writer

* refactor readFilerFileIdFile

* fix filter by collectMtime

* skip system log collection
2022-10-13 23:30:30 -07:00
Eric Yang
56c94cc08e ADHOC: filter deleted files from idx file binary search (#3763)
* ADHOC: filter deleted files from idx file binary search

* remove unwanted check

Co-authored-by: root <root@HQ-10MSTD3EY.roblox.local>
2022-09-29 12:48:36 -07:00
chrislu
b6d7556dda skip truncation on error
fix https://github.com/seaweedfs/seaweedfs/issues/3746
2022-09-27 09:48:23 -07:00
Eric Yang
ddd6bee970 ADHOC: Volume fsck use a time cutoff param (#3626)
* ADHOC: cut off volume fsck

* more

* fix typo

* add test

* modify name

* fix comment

* fix comments

* nit

* fix typo

* Update weed/shell/command_volume_fsck.go

Co-authored-by: root <root@HQ-10MSTD3EY.roblox.local>
Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>
2022-09-10 15:29:17 -07:00
chrislu
26dbc6c905 move to https://github.com/seaweedfs/seaweedfs 2022-07-29 00:17:28 -07:00
chrislu
271b5aed96 shell: volume.fsck add a note for -reallyDeleteFromVolume option 2022-05-15 11:07:04 -07:00