* fix(shell): don't halt volume.fsck purge on a stuck read-only volume
A failed VolumeMarkWritable on one volume aborted the entire fsck purge
run; per-volume errors now log and continue so remaining volumes still
get purged.
* fix(shell): unify volume.fsck per-volume skip logging at the caller
Return the mark-writable error from purgeOneVolume instead of logging
in two places — the caller already prints "skip purging volume N: %v"
and defers still fire on the error return.
* fix(shell): collect volume.fsck purge-skipped volumes and report at end
Track volume IDs whose purge was skipped (mark-writable failure or
other per-volume errors) and print a sorted summary so operators don't
have to scrape the run log to find them. Deletes for those volumes are
already skipped; this just makes them explicit.
* fix(shell): scope volume.fsck filer walk to the bucket when -volumeId selects one bucketed collection
Closes #9345.
-volumeId only filtered which volume .idx files were pulled; the filer-side
BFS still walked from "/", printing every directory under -v and making it
look like the flag was ignored. When all requested volumes share a single
non-empty collection that maps to an existing <bucketsPath>/<collection>
directory, restrict the BFS root to that bucket. Empty-collection volumes
or multi-collection selections fall back to the full walk, since chunks
for those can live anywhere.
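A sketch of the root-selection rule, assuming the per-volume collection names are already known and that a hypothetical `dirExists` callback stands in for the filer lookup. `traversalRoot` is illustrative, not the function in command_volume_fsck.go.

```go
package main

import (
	"fmt"
	"path"
)

// traversalRoot picks the filer directory where the BFS starts. It narrows
// to <bucketsPath>/<collection> only when every selected volume belongs to
// the same non-empty collection and that bucket directory exists;
// otherwise it falls back to walking from "/".
func traversalRoot(volumeCollections map[uint32]string, bucketsPath string, dirExists func(string) bool) string {
	collection := ""
	for _, c := range volumeCollections {
		if c == "" {
			return "/" // chunks of empty-collection volumes can live anywhere
		}
		if collection != "" && c != collection {
			return "/" // more than one collection selected
		}
		collection = c
	}
	if collection == "" {
		return "/" // no volumes selected
	}
	bucketDir := path.Join(bucketsPath, collection)
	if !dirExists(bucketDir) {
		return "/"
	}
	return bucketDir
}

func main() {
	exists := func(p string) bool { return p == "/buckets/photos" }
	fmt.Println(traversalRoot(map[uint32]string{3: "photos", 4: "photos"}, "/buckets", exists))
	fmt.Println(traversalRoot(map[uint32]string{3: "photos", 5: "logs"}, "/buckets", exists))
}
```

This prints `/buckets/photos` for the single-bucket selection and `/` for the mixed one.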
* trim comments
* address review: collapse getCollectFilerFilePath; unshadow receiver in loop
* shell: expand `~` in local file path arguments
The weed shell parses commands itself instead of going through an OS
shell, so a path like `~/Downloads/foo.meta` was passed verbatim to
`os.Open`, which fails because no `~` directory exists. Users had to
spell out absolute home paths in every command.
Add an `expandHomeDir` helper that resolves a leading `~` or `~/...` to
the user's home directory, and run user-supplied local file paths in
the affected shell commands through it:
fs.meta.load (positional file)
fs.meta.save (-o)
fs.meta.changeVolumeId (-mapping)
s3.iam.export (-file)
s3.iam.import (-file)
s3.policy (-file)
s3tables.bucket (-file)
s3tables.table (-file, -metadata)
volume.fsck (-tempPath)
Filer-namespace path flags (`-dir`, `-path`, `-locationPrefix`, etc.)
are unaffected; they live in the filer, not on the local FS.
* shell: reuse util.ResolvePath instead of a new helper
util.ResolvePath already does tilde expansion; drop the local
expandHomeDir helper and route every shell call site through it.
* refactor(shell): run volume.fsck purge once per volume, after all replicas
The purge step in findExtraChunksInVolumeServers was nested inside the
outer `for dataNodeId` loop, so it fired once per data-node iteration
rather than once total. Two consequences:
1. The replica-intersection safety net was broken. The code marks a fid
"found in all replicas" only after every replica has reported its
orphans, but the purge ran after the first data node already, so
fids contributed only by later replicas never got the `true` flag
in time. Without `-forcePurging` that meant some legitimate orphans
were never purged; with `-forcePurging` the flag was ignored so the
bug was hidden.
2. Visible output got noisy: "purging orphan data for volume X..."
printed 2-3 times per volume (N_datanodes * N_replicas RPCs to the
same locations) since purgeFileIdsForOneVolume already fans out to
every replica location via MasterClient.GetLocations.
Split the work into two explicit phases: collect orphans from every
replica first, then purge each volume once. Drop the per-replica loop
around purgeFileIdsForOneVolume since it already handles all replicas
internally. Keep the per-replica mark-writable loop (each replica's
readonly bit has to be flipped before the purge RPC fans out to it).
Also simplify the gating expression — `isSeveralReplicas &&
foundInAllReplicas` is redundant given the preceding `!isSeveralReplicas`
branch — and replace `!(X > 0)` with the more idiomatic `len(X) == 0`.
Related to the #9116 follow-up about multiple fsck passes being needed to
fully clean a volume.
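The two-phase shape can be sketched as follows. `replicaReport`, `selectOrphans`, and `purgeEachOnce` are hypothetical; the sketch already uses per-fid replica counts (the rule adopted in the later review round) rather than this version's sticky bool, and it assumes each replica reports a given fid at most once.

```go
package main

import (
	"fmt"
	"sort"
)

// replicaReport is a hypothetical per-replica result: the orphan fids one
// data node reported for one volume (each fid listed at most once).
type replicaReport struct {
	volumeID uint32
	orphans  []string
}

// selectOrphans is phase 1. It tallies how many replicas reported each fid
// and, only once every report is in, keeps the fids all replicas agree on,
// or every reported fid when forcePurging is set.
func selectOrphans(reports []replicaReport, replicaCounts map[uint32]int, forcePurging bool) map[uint32][]string {
	seen := make(map[uint32]map[string]int)
	for _, r := range reports {
		if seen[r.volumeID] == nil {
			seen[r.volumeID] = make(map[string]int)
		}
		for _, fid := range r.orphans {
			seen[r.volumeID][fid]++
		}
	}
	selected := make(map[uint32][]string)
	for vid, counts := range seen {
		for fid, n := range counts {
			if forcePurging || n == replicaCounts[vid] {
				selected[vid] = append(selected[vid], fid)
			}
		}
		sort.Strings(selected[vid])
	}
	return selected
}

// purgeEachOnce is phase 2: exactly one purge call per volume. The purge
// callback is expected to fan out to every replica location itself.
func purgeEachOnce(selected map[uint32][]string, purge func(volumeID uint32, fids []string)) {
	vids := make([]uint32, 0, len(selected))
	for vid := range selected {
		vids = append(vids, vid)
	}
	sort.Slice(vids, func(i, j int) bool { return vids[i] < vids[j] })
	for _, vid := range vids {
		purge(vid, selected[vid])
	}
}

func main() {
	reports := []replicaReport{
		{volumeID: 5, orphans: []string{"5,01", "5,02"}}, // replica on node A
		{volumeID: 5, orphans: []string{"5,01"}},         // replica on node B
	}
	selected := selectOrphans(reports, map[uint32]int{5: 2}, false)
	purgeEachOnce(selected, func(vid uint32, fids []string) {
		fmt.Printf("purging orphan data for volume %d: %v\n", vid, fids)
	})
}
```

A fid seen on only 2 of 3 replicas is not selected unless `forcePurging` is set, which is the distinction the later review fix restores.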
* address review: per-replica readonly tracking, count-based intersection, defer-per-volume
Three issues raised in review of v1:
1. The readonly cleanup stored a single isReadOnlyReplicas[volumeId]=bool
that flipped true if any replica was read-only, then the defer marked
every replica in serverReplicas[volumeId] read-only on exit. If a
volume had mixed replica modes (one RO, one RW), the originally-RW
replica ended up RO after fsck returned. Track read-only state per
replica in readOnlyServerReplicas[volumeId] and revert only those.
2. The defer inside the volumeId loop accumulated for the entire fsck
run, so every volume we processed stayed writable until the whole
command returned. Split the per-volume logic into purgeOneVolume so
the defers unwind between volumes.
3. The intersection logic used a sticky bool that treated "seen on any
2 of 3 replicas" as "seen on all replicas" — a 3+-replica volume
would get purged for fids only 2 replicas agreed on, which is what
-forcePurging is supposed to opt into. Switch to a count-based
map[fid]int compared against volumeReplicaCounts[volumeId], so we
only purge without -forcePurging when every replica agrees.
Also drop the now-unused serverReplicas map.
* fix(shell): error on missing volume id in fsck, mergeVolumes, vacuum
Three shell commands silently report success when -volumeId /
-fromVolumeId / -toVolumeId names a volume the master doesn't know
about: typos, already-deleted volumes, and stale scripts all look
identical to a clean no-op, which is what made the confusion in #9116
take as long as it did to diagnose.
- volume.fsck: filter at the per-datanode loop drops unknown ids and
findExtraChunksInVolumeServers ends with totalOrphanChunkCount==0,
printing "no orphan data".
- fs.mergeVolumes: createMergePlan iterates only known volumes, so an
unknown -fromVolumeId produces an empty plan and we print just the
"max volume size: N MB" header (indistinguishable from "nothing to
merge").
- volume.vacuum: the master's VacuumVolume RPC silently iterates
matching volumes; a missing id returns success having done nothing.
Validate the requested ids against the current topology up front and
return an explicit "volume(s) not found on master: [X Y]" error. Also
drop a stale duplicate `if err != nil` in volume.fsck.Do left over from
a prior refactor.
Surfaces #9116 follow-up from madalee-com.
* address review: propagate reloadVolumesInfo error; dedupe vacuum missing ids
- fs.mergeVolumes: c.reloadVolumesInfo's return was ignored. If the
master is unreachable or VolumeList fails, c.volumes stays empty and
the new validation block reports "fromVolumeId X not found on master"
— masking the real connection/RPC failure. Return the wrapped error
instead.
- volume.vacuum: "volume.vacuum -volumeId 5,5,5" on a missing volume 5
listed [5 5 5] in the error. Collect missing ids in a set so each
missing id appears once.
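The up-front validation with de-duplicated missing ids might look like this, assuming the master's topology has been reduced to a set of known volume ids. `missingVolumes` and `checkVolumes` are hypothetical names; the error text matches the one quoted above.

```go
package main

import (
	"fmt"
	"sort"
)

// missingVolumes returns every requested id absent from known, sorted and
// listed once even if it was requested several times.
func missingVolumes(requested []uint32, known map[uint32]bool) []uint32 {
	missing := make(map[uint32]struct{})
	for _, vid := range requested {
		if !known[vid] {
			missing[vid] = struct{}{}
		}
	}
	ids := make([]uint32, 0, len(missing))
	for vid := range missing {
		ids = append(ids, vid)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids
}

// checkVolumes fails loudly instead of letting an unknown id look like a no-op.
func checkVolumes(requested []uint32, known map[uint32]bool) error {
	if ids := missingVolumes(requested, known); len(ids) > 0 {
		return fmt.Errorf("volume(s) not found on master: %v", ids)
	}
	return nil
}

func main() {
	known := map[uint32]bool{1: true, 2: true}
	fmt.Println(checkVolumes([]uint32{5, 5, 5, 1}, known))
}
```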
* address review: reject fromVolumeId/toVolumeId values that overflow uint32
flag.Uint produces a uint (64-bit on amd64), and the existing cast to
needle.VolumeId silently truncates to uint32. A typo like
`-fromVolumeId=4294967297` would wrap to volume 1 and slip past every
other validation, so the merge would run against a completely
different volume than the operator intended.
Bail out with an explicit error when the raw flag value exceeds the
uint32 range, before the cast.
* fix(shell): volume.fsck no longer aborts on a single broken chunk manifest
Previously a single entry whose chunk-manifest could not be read (e.g. the
manifest needle was missing or its sub-chunks pointed at a now-gone volume)
caused collectFilerFileIdAndPaths to return immediately with
"failed to ResolveChunkManifest". The whole fsck run failed, so an operator
with even one corrupted file could not use volume.fsck to find or clean up
unrelated orphan needles on other volumes — they had to locate and delete
the bad entries first, blind, with no help from fsck.
Log the resolution failure with the entry path, fall back to recording the
top-level chunk fids the entry references (data fids and manifest fids
themselves; sub-chunks behind the unresolvable manifest stay unknown), and
keep traversing. Track the count of unresolved entries on the command struct
and refuse -reallyDeleteFromVolume for the run when the count is non-zero,
since the in-use fid set is incomplete and a purge could otherwise delete
live sub-chunks behind the broken manifest. Read-only fsck still produces a
useful (if conservatively over-reported) orphan listing so the operator can
see and fix the broken entries first, then re-run with apply.
Discovered while diagnosing #9116.
* address review: use callback ctx and atomic counter
- Pass the BFS callback's ctx to ResolveChunkManifest so a Ctrl+C / first-error
cancellation propagates into the manifest fetch instead of using
context.Background().
- TraverseBfs runs the callback across K=5 worker goroutines (filer_pb/filer_client_bfs.go),
so the unresolvedManifestEntries field on commandVolumeFsck is shared across
workers and was racing. Switch it to atomic.Int64 with Add/Load.
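Why the counter needed to be atomic: several workers increment it concurrently. This sketch mirrors a five-worker fan-out with a channel and `sync.WaitGroup`; `fsckState` and `countBroken` are hypothetical, and `atomic.Int64` requires Go 1.19 or later.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// fsckState is a hypothetical stand-in for the command-struct field that
// every BFS worker shares.
type fsckState struct {
	unresolvedManifestEntries atomic.Int64
}

// countBroken fans entries out to `workers` goroutines; each increments the
// shared counter with Add, and the total is read with Load after Wait.
func countBroken(broken []bool, workers int) int64 {
	var state fsckState
	entries := make(chan bool)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for isBroken := range entries {
				if isBroken {
					state.unresolvedManifestEntries.Add(1)
				}
			}
		}()
	}
	for _, b := range broken {
		entries <- b
	}
	close(entries)
	wg.Wait()
	return state.unresolvedManifestEntries.Load()
}

func main() {
	broken := make([]bool, 1000)
	for i := range broken {
		broken[i] = i%10 == 0 // every tenth entry has an unreadable manifest
	}
	fmt.Println(countBroken(broken, 5))
}
```

With a plain `int` field and `++`, `go run -race` would flag this pattern; with `Add`/`Load` it prints 100.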
* address review: reset counter per Do(), pass through ctx errors
- commandVolumeFsck is a singleton registered in init() and reused across
shell invocations. Without resetting the unresolved-manifest counter at
the top of Do(), a single failed run permanently suppressed
-reallyDeleteFromVolume in the same shell session. Reset to 0 right
after flag parsing.
- Treating context cancellation as manifest corruption was wrong: a
Ctrl+C or deadline mid-traversal would inflate the counter and emit
misleading "manifest broken" warnings for entries that were never
examined. Detect context.Canceled / context.DeadlineExceeded and
return the error so the BFS unwinds cleanly.
Not changing the findMissingChunksInFiler branch's purgeAbsent /
applyPurging gating: that path checks recorded filer fids against
volume idx files, and a broken-manifest entry's recorded manifest fid
will fail the existence check and get purged — which is the cleanup
the operator wants for those entries. Adding a gate would block the
exact use case the warning points them at.
* helm: refine openshift-values.yaml to remove hardcoded UIDs
Remove hardcoded runAsUser, runAsGroup, and fsGroup from the
openshift-values.yaml example. This allows OpenShift's admission
controller to automatically assign a valid UID from the namespace's
allocated range, avoiding "forbidden" errors when UID 1000 is
outside the permissible range.
Updates #8381, #8390.
* helm: fix volume.logs and add consistent security context comments
* Update README.md
* fix volume.fsck crashing on EC volumes and add multi-volume vacuum support
* address comments
* Fix volume.fsck 401 Unauthorized by adding JWT to HTTP delete requests
* Additionally, for performance, consider fetching the jwt.filer_signing.key once before any loops that call httpDelete, rather than inside httpDelete itself, to avoid repeated configuration lookups.
* fix issue #8230: make volume.fsck deletion logic respect the purgeAbsent flag
This commit fixes two issues in volume.fsck:
1. Missing chunks in existing volumes are now deleted if -reallyDeleteFilerEntries is set.
2. Missing volumes are now properly handled when a -volumeId filter is specified, allowing deletion of filer entries for those volumes.
* address PR feedback for issue #8230
- Ensure volume filter is applied before reporting missing volumes
- Fix potential nil-pointer dereferences in httpDelete method
- Use proper error checking throughout httpDelete
* address second round PR feedback for issue #8230
- Use fmt.Fprintf(c.writer, ...) instead of fmt.Printf
- Add missing newline in "deleting path" log message
* Add TraverseBfsWithContext and fix race conditions in error handling
- Add TraverseBfsWithContext function to support context cancellation
- Fix race condition in doTraverseBfsAndSaving using atomic.Bool and sync.Once
- Improve error handling with fail-fast behavior and proper error propagation
- Update command_volume_fsck to use error-returning saveFn callback
- Enhance error messages in readFilerFileIdFile with detailed context
* refactoring
* fix error format
* atomic
* filer_pb: make enqueue return void
* shell: simplify fs.meta.save error handling
* filer_pb: handle enqueue return value
* Revert "atomic"
This reverts commit 712648bc35.
* shell: refine fs.meta.save logic
---------
Co-authored-by: Chris Lu <chris.lu@gmail.com>
* volume.fsck: increase default cutoffTimeAgo from 5 minutes to 5 hours
This change makes the fsck check more conservative by only considering
chunks older than 5 hours as potential orphans. A 5 minute window was
too aggressive and could incorrectly flag recently written chunks,
especially in busy systems or during backup operations.
Addresses #7649
* Update command_volume_fsck.go
* volume.fsck: add help text explaining cutoffTimeAgo parameter
* Update command_volume_fsck.go
* batch deletion operations to return individual error results
Modify batch deletion operations to return individual error results instead of one aggregated error, enabling better tracking of which specific files failed to delete (helping reduce orphan file issues).
* Simplified logging logic
* Optimized nested loop
* handles the edge case where the RPC succeeds but connection cleanup fails
* simplify
* simplify
* ignore 'not found' errors here
* Fix 'NaN%' issue when running volume.fsck
- Running `volume.fsck` on an empty cluster will display 'NaN%'.
* Refactor
- Extract count of orphan chunks in summary to a new var.
- Restore handling for 'NaN' for individual volumes. It's not necessary
because the check is already done.
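The NaN comes from a floating-point 0/0 when the cluster has no chunks at all. A minimal guard, with a hypothetical `orphanPercent` helper:

```go
package main

import "fmt"

// orphanPercent returns 0 for an empty cluster; without the guard,
// float64(0)/float64(0) is NaN and prints as "NaN%".
func orphanPercent(orphanChunks, totalChunks uint64) float64 {
	if totalChunks == 0 {
		return 0
	}
	return float64(orphanChunks) * 100 / float64(totalChunks)
}

func main() {
	fmt.Printf("%.2f%%\n", orphanPercent(0, 0))
	fmt.Printf("%.2f%%\n", orphanPercent(3, 12))
}
```

This prints `0.00%` and `25.00%`.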
* Make code more idiomatic
* Added global http client
* Added Do func for global http client
* Changed the code to use the global http client
* Fix http client in volume uploader
* Fixed pkg name
* Fixed http util funcs
* Fixed http client for bench_filer_upload
* Fixed http client for stress_filer_upload
* Fixed http client for filer_server_handlers_proxy
* Fixed http client for command_fs_merge_volumes
* Fixed http client for command_fs_merge_volumes and command_volume_fsck
* Fixed http client for s3api_server
* Added init global client for main funcs
* Rename global_client to client
* Changed:
- fixed NewHttpClient;
- added CheckIsHttpsClientEnabled func
- updated security.toml in scaffold
* Reduce the visibility of some functions in the util/http/client pkg
* Added the loadSecurityConfig function
* Use util.LoadSecurityConfiguration() in NewHttpClient func
* fix issue: sometimes volume.fsck reports 'volume not found' when a volume server has multiple disk types
* rename variable
* adjust counters
---------
Co-authored-by: chrislu <chris.lu@gmail.com>
* compare chunks by timestamp
* fix slab clearing error
* fix test compilation
* move oldest chunk to sealed, instead of by fullness
* lock on fh.entryViewCache
* remove verbose logs
* revert slab clearing
* less logs
* less logs
* track write and read by timestamp
* remove useless logic
* add entry lock on file handle release
* use mem chunk only, swap file chunk has problems
* comment out code that may be used later
* add debug mode to compare data read and write
* more efficient readResolvedChunks with linked list
* small optimization
* fix test compilation
* minor fix on writer
* add SeparateGarbageChunks
* group chunks into sections
* turn off debug mode
* fix tests
* fix tests
* tmp enable swap file chunk
* Revert "tmp enable swap file chunk"
This reverts commit 985137ec47.
* simple refactoring
* simple refactoring
* do not re-use swap file chunk. Sealed chunks should not be re-used.
* comment out debugging facilities
* either mem chunk or swap file chunk is fine now
* remove orderedMutex as *semaphore.Weighted
it was not found to be impactful
* optimize size calculation for changing large files
* optimize performance to avoid going through the long list of chunks
* still problems with swap file chunk
* rename
* tiny optimization
* swap file chunk save only successfully read data
* fix
* enable both mem and swap file chunk
* resolve chunks with range
* rename
* fix chunk interval list
* also change file handle chunk group when adding chunks
* pick inactive chunk with time-decayed counter
* fix compilation
* avoid nil with empty fh.entry
* refactoring
* rename
* rename
* refactor visible intervals to *list.List
* refactor chunkViews to *list.List
* add IntervalList for generic interval list
* change visible interval to use IntervalList in generics
* change chunkViews to *IntervalList[*ChunkView]
* use NewFileChunkSection to create
* rename variables
* refactor
* fix renaming leftover
* renaming
* renaming
* add insert interval
* interval list adds lock
* incrementally add chunks to readers
Fixes:
1. set start and stop offset for the value object
2. clone the value object
3. use pointer instead of copy-by-value when passing to interval.Value
4. use insert interval since adding chunk could be out of order
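A sketch of incremental, possibly out-of-order insertion into a generic list kept sorted by start offset, storing pointers as fix 3 describes. `interval` and `intervalList` are hypothetical, built on `container/list`, and omit any overlap handling.

```go
package main

import (
	"container/list"
	"fmt"
)

// interval is a hypothetical [start, stop) range holding a value.
type interval[T any] struct {
	start, stop int64
	value       T
}

// intervalList keeps *interval[T] elements ordered by start offset.
type intervalList[T any] struct {
	items *list.List
}

func newIntervalList[T any]() *intervalList[T] {
	return &intervalList[T]{items: list.New()}
}

// insert places iv after the last element whose start is <= iv.start.
// Scanning from the back keeps the common in-order append cheap while
// still handling chunks that arrive out of order.
func (l *intervalList[T]) insert(iv *interval[T]) {
	for e := l.items.Back(); e != nil; e = e.Prev() {
		if e.Value.(*interval[T]).start <= iv.start {
			l.items.InsertAfter(iv, e)
			return
		}
	}
	l.items.PushFront(iv)
}

// starts lists the start offsets from front to back.
func (l *intervalList[T]) starts() []int64 {
	var out []int64
	for e := l.items.Front(); e != nil; e = e.Next() {
		out = append(out, e.Value.(*interval[T]).start)
	}
	return out
}

func main() {
	l := newIntervalList[string]()
	l.insert(&interval[string]{start: 4096, stop: 8192, value: "chunk B"})
	l.insert(&interval[string]{start: 0, stop: 4096, value: "chunk A"})
	l.insert(&interval[string]{start: 8192, stop: 12288, value: "chunk C"})
	fmt.Println(l.starts())
}
```

This prints `[0 4096 8192]` even though chunk A arrived second.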
* fix tests compilation
* fix tests compilation