924 Commits

Author SHA1 Message Date
Chris Lu
3ff92f797d 4.21 2026-04-19 14:38:29 -07:00
Chris Lu
45578a42e9 fix(volume): keep vacuum running past dangling .idx entries (#9115)
* fix(volume): keep vacuum running past dangling .idx entries

Vacuum compaction aborted entirely on the first .idx entry whose offset
pointed past the end of the .dat file, surfacing as `cannot hydrate
needle from file: EOF` and stalling progress on every other volume.

In both Go and Rust:

- During compaction, skip an unreadable needle and continue. The bytes
  it pointed at were already unreachable via reads, so dropping the
  index reference makes the post-vacuum volume consistent. Real EIO
  still bails out so a disk fault is not silently papered over.

- At volume load, do a single linear scan of the .idx and confirm
  every (offset + actual size) fits inside .dat. The pre-existing
  integrity check only looked at the last 10 entries, so deeper
  corruption (e.g. left over from a crashed batched write) went
  undetected and only surfaced later as a vacuum EOF. A failure now
  marks the volume read-only at load time so an operator can react.

Refs #8928

* fix(volume): only skip permanent-corruption needle reads during vacuum

Address PR review feedback (gemini-code-assist + coderabbit):

The original patch skipped any non-EIO read failure, which would silently
drop needles on transient errors — Windows hardware bad-sector errors
(ERROR_CRC etc.) never surface as syscall.EIO; tiered-storage network
timeouts and EROFS would also slip through and shrink the volume.

Switch to an explicit whitelist of permanent-corruption shapes:

- Add needle.ErrorCorrupted sentinel and wrap CRC and "index out of
  range" errors with %w so callers can match via errors.Is.
- copyDataBasedOnIndexFile now skips only when the read failure is
  io.EOF, io.ErrUnexpectedEOF, ErrorSizeMismatch, ErrorSizeInvalid,
  or ErrorCorrupted. Anything else (real disk faults, environmental
  errors, Windows hardware codes) aborts the compaction so an
  operator notices.
- Mirror the same whitelist in the Rust volume server, matching on
  io::ErrorKind::UnexpectedEof and the NeedleError corruption variants
  (SizeMismatch, CrcMismatch, IndexOutOfRange, TailTooShort).

Also add `defer v.Close()` in TestVerifyIndexFitsInDat so Windows
t.TempDir() cleanup can release the .dat/.idx handles.

Refs #8928

* fix(volume): wrap entry-not-found size-mismatch with ErrorSizeMismatch

Address PR review: the fallback branch in ReadBytes returned an
unwrapped fmt.Errorf, so isSkippableNeedleReadError (and any caller
using errors.Is(..., ErrorSizeMismatch)) could not match it. Wrap
with %w so the whitelist applies, while leaving the existing direct
sentinel return for the OffsetSize==4 / offset<MaxPossibleVolumeSize
retry path unchanged so ReadData's `err == ErrorSizeMismatch` retry
still triggers.

Refs #8928

* fix(volume): integrate dangling-idx check into existing index load walk

Address PR review (gemini-code-assist, medium): the structural .idx
check used to do a second linear scan of the index file at every volume
load, doubling the disk-I/O cost on servers managing many volumes.

Track the largest (offset + actual size) seen during the existing
needle-map load walks (`LoadCompactNeedleMap`, `NewLevelDbNeedleMap`,
`NewSortedFileNeedleMap`'s `newNeedleMapMetricFromIndexFile`,
`DoOffsetLoading`) on a new `MaximumNeedleEnd` field on `mapMetric`,
exposed as `MaxNeedleEnd()` on the NeedleMapper interface.
`volume.load()` then compares `nm.MaxNeedleEnd()` to the .dat size
after the load is complete — pure numeric comparison, no extra I/O.

The standalone `verifyIndexFitsInDat` helper and its caller in
`CheckVolumeDataIntegrity` are removed; the test that used to drive
the helper directly now exercises the new path via
`LoadCompactNeedleMap`.

Mirror the same change in the Rust volume server: track
`max_needle_end` on `NeedleMapMetric`, expose via `max_needle_end()`
on `CompactNeedleMap`, `RedbNeedleMap`, and the `NeedleMap` enum.
The Rust load walk already happens in `load_from_idx` for both map
kinds, so the structural check becomes free.

Refs #8928
2026-04-16 22:01:34 -07:00
Chris Lu
9d15705c16 fix(mini): shut down admin/s3/webdav/filer before volume/master on Ctrl+C (#9112)
* fix(mini): shut down admin/s3/webdav/filer before volume/master on Ctrl+C

Interrupts fired grace hooks in registration order, so master (started
first) shut down before its clients, producing heartbeat-canceled errors
and masterClient reconnection noise during weed mini shutdown. Admin/s3/
webdav had no interrupt hooks at all and were killed at os.Exit.

- grace: execute interrupt hooks in LIFO (defer-style) order so later-
  started services tear down first.
- filer: consolidate the three separate interrupt hooks (gRPC / HTTP /
  DB) into one that runs in order, so filer shutdown stays correct
  independent of FIFO/LIFO semantics.
- mini: add MiniClientsShutdownCtx (separate from test-facing
  MiniClusterCtx) plus an OnMiniClientsShutdown helper. Admin, S3,
  WebDAV and the maintenance worker observe it; runMini registers a
  cancel hook after startup so under LIFO it fires first and waits up to
  10s on a WaitGroup for those services to drain before filer, volume,
  and master shut down.

Resulting order on Ctrl+C: admin/s3/webdav/worker -> filer (gRPC -> HTTP
-> DB) -> volume -> master.

* refactor(mini): group mini-client shutdown into one state struct

The first pass spread the shutdown plumbing across three globals
(MiniClientsShutdownCtx, miniClientsWg, cancelMiniClients) and two
ctx-derivation sites (OnMiniClientsShutdown and startMiniAdminWithWorker).

Group into a private miniClientsState (ctx/cancel/wg) rebuilt per runMini
invocation, and chain its ctx from MiniClusterCtx so clients only observe
one signal. Tests that cancel MiniClusterCtx still trigger client
shutdown via parent-child propagation.

- resetMiniClients() installs fresh state at the top of runMini, so
  in-process test reruns don't inherit stale ctx/wg.
- onMiniClientsShutdown(fn) replaces the exported OnMiniClientsShutdown
  and only observes one ctx.
- trackMiniClient() replaces the manual wg.Add/Done dance for the admin
  goroutine.
- miniClientsCtx() gives the admin startup a ctx without re-deriving.
- triggerMiniClientsShutdown(timeout) is the interrupt hook body.

No behaviour change; existing tests pass.

* refactor: generalize shutdown ctx as an option, not a mini-specific helper

Several service files (s3, webdav, filer, master, volume) observed the
mini-specific MiniClusterCtx or called onMiniClientsShutdown directly.
That leaked mini orchestration into code that also runs under weed s3,
weed webdav, weed filer, weed master, and weed volume standalone.

Replace with a generic `shutdownCtx context.Context` field on each
service's Options struct. When non-nil, the server watches it and shuts
down gracefully; when nil (standalone), the shutdown path is a no-op.

Mini wires the contexts up from a single place (runMini):
 - miniMasterOptions/miniOptions.v/miniFilerOptions.shutdownCtx =
   MiniClusterCtx (drives test-triggered teardown)
 - miniS3Options/miniWebDavOptions.shutdownCtx = miniClientsCtx() (drives
   Ctrl+C teardown before filer/volume/master)

All knowledge of MiniClusterCtx now lives in mini.go.

* fix(mini): stop worker before clients ctx so admin shutdown isn't blocked

Symptom on Ctrl+C of a clean weed mini: mini's Shutting down admin/s3/
webdav hook sat for 10s then logged "timed out". Admin had started its
shutdown but was blocked inside StopWorkerGrpcServer's GracefulStop,
waiting for the still-connected worker stream. That in turn left filer
clients connected and cascaded into filer's own 10s gRPC graceful-stop
timeout.

Two causes, both fixed:

1. worker.Stop() deadlocked on clean shutdown. It sent ActionStop (which
   makes managerLoop `break out` and exit), then called getTaskLoad()
   which sends to the same unbuffered cmd channel — no receiver, hangs
   forever. Reorder Stop() to snapshot the admin client and drain tasks
   BEFORE sending ActionStop, and call Disconnect() via the local
   snapshot afterwards.

2. Worker's taskRequestLoop raced with Disconnect(): RequestTask reads
   from c.incoming, which Disconnect closes, yielding a nil response and
   a panic on response.Message. Handle the closed channel explicitly.

3. Mini now has a preCancel phase (beforeMiniClientsShutdown) that runs
   synchronously BEFORE the clients ctx is cancelled. Register worker
   shutdown there so admin's worker-gRPC GracefulStop finds the worker
   already disconnected and returns immediately, instead of waiting on
   a stream that is about to close anyway.

Observed shutdown of a clean mini: admin/s3/webdav down in <10ms; full
process exit in ~11s (the remaining 10s is a pre-existing filer gRPC
graceful-stop timeout, not cascaded from the clients tier).

* feat(mini): cap filer gRPC graceful stop at 1s under weed mini

Full weed mini shutdown was ~11s on a clean exit, dominated by the
filer's default 10s gRPC GracefulStop timeout while background
SubscribeLocalMetadata streams drained.

Expose the timeout as a FilerOptions.gracefulStopTimeout field (default
10s for standalone weed filer) and set it to 1s in mini. Clean weed mini
shutdown now takes ~2s.
2026-04-16 16:11:01 -07:00
Chris Lu
886d50a6a5 feat(mount): singleflight dedup for concurrent chunk reads (#9100)
* feat(mount): add singleflight deduplication for concurrent chunk reads

When multiple FUSE readers request the same uncached chunk concurrently,
only one network fetch is performed. Other readers wait and share the
downloaded data, reducing redundant volume server traffic under parallel
read workloads.

* fix(util): make singleflight panic-safe with defer cleanup

If the provided function panics, the WaitGroup and map entry are now
cleaned up via defer, preventing other waiters from hanging forever.

* fix(filer): remove singleflight from reader_cache to fix buffer ownership

The singleflight wrapper around chunk fetches returned the same []byte
buffer to concurrent callers. Since each SingleChunkCacher owns and
frees its data buffer in destroy(), sharing the same slice would cause
a use-after-free or double-free with the mem allocator.

The downloaders map already deduplicates in-flight downloads for the
same fileId, so the singleflight was redundant at this layer. The
SingleFlightGroup utility is retained for use elsewhere.
2026-04-16 10:18:05 -07:00
Chris Lu
e1fa4ec756 perf(cache): drop OS page cache after disk cache reads (#9098)
* perf(cache): drop OS page cache after disk cache reads

After reading from the on-disk chunk cache, advise the kernel via
FADV_DONTNEED to release the corresponding page cache pages. This
prevents double-caching the same data in both user-space and kernel
page caches, freeing RAM for other uses on systems with large disk
caches.

* fix(cache): guard dropReadCache against zero length and invalid fd

A zero-length fadvise is interpreted as "to end of file" on Linux,
which would inadvertently drop the page cache for the entire remainder
of the cache volume. Also check fd >= 0 to avoid unnecessary syscalls
when the backend file is closed.

* perf(cache): only apply FADV_DONTNEED for reads >= 1 MiB

For small needle reads the syscall overhead outweighs the memory
savings, and the kernel page cache is more beneficial for warm data.
Restrict fadvise to reads of at least 1 MiB where the freed page
cache is meaningful.
2026-04-16 09:38:42 -07:00
Chris Lu
08d9193fe1 [nfs] Add NFS (#9067)
* add filer inode foundation for nfs

* nfs command skeleton

* add filer inode index foundation for nfs

* make nfs inode index hardlink aware

* add nfs filehandle and inode lookup plumbing

* add read-only nfs frontend foundation

* add nfs namespace mutation support

* add chunk-backed nfs write path

* add nfs protocol integration tests

* add stale handle nfs coverage

* complete nfs hardlink and failover coverage

* add nfs export access controls

* add nfs metadata cache invalidation

* fix nfs chunk read lookup routing

* fix nfs review findings and rename regression

* address pr 9067 review comments

- filer_inode: fail fast if the snowflake sequencer cannot start, and let
  operators override the 10-bit node id via SEAWEEDFS_FILER_SNOWFLAKE_ID
  to avoid multi-filer collisions
- filer_inode: drop the redundant retry loop in nextInode
- filerstore_wrapper: treat inode-index writes/removals as best-effort so
  a primary store success no longer surfaces as an operation failure
- filer_grpc_server_rename: defer overwritten-target chunk deletion until
  after CommitTransaction so a rolled-back rename does not strand live
  metadata pointing at freshly deleted chunks
- command/nfs: default ip.bind to loopback and require an explicit
  filer.path, so the experimental server does not expose the entire
  filer namespace on first run
- nfs integration_test: document why LinkArgs matches go-nfs's on-the-wire
  layout rather than RFC 1813 LINK3args

* mount: pre-allocate inode in Mkdir and Symlink

Mkdir and Symlink used to send filer_pb.CreateEntryRequest with
Attributes.Inode = 0. After PR 9067, the filer's CreateEntry now assigns
its own inode in that case, so the filer-side entry ends up with a
different inode than the one the mount allocates via inodeToPath.Lookup
and returns to the kernel. Once applyLocalMetadataEvent stores the
filer's entry in the meta cache, subsequent GetAttr calls read the
cached entry and hit the setAttrByPbEntry override at line 197 of
weedfs_attr.go, returning the filer-assigned inode instead of the
mount's local one. pjdfstest tests/rename/00.t (subtests 81/87/91)
caught this — it lstat'd a freshly-created directory/symlink, renamed
it, lstat'd again, and saw a different inode the second time.

createRegularFile already pre-allocates via inodeToPath.AllocateInode
and stamps it into the create request. Do the same thing in Mkdir and
Symlink so both sides agree on the object identity from the very first
request, and so GetAttr's cache path returns the same value as Mkdir /
Symlink's initial response.

* sequence: mask snowflake node id on int→uint32 conversion

CodeQL flagged the unchecked uint32(snowflakeId) cast in
NewSnowflakeSequencer as a potential truncation bug when snowflakeId is
sourced from user input (e.g. via SEAWEEDFS_FILER_SNOWFLAKE_ID). Mask
to the 10 bits the snowflake library actually uses so any caller-
supplied int is safely clamped into range.

* add test/nfs integration suite

Boots a real SeaweedFS cluster (master + volume + filer) plus the
experimental `weed nfs` frontend as subprocesses and drives it through
the NFSv3 wire protocol via go-nfs-client, mirroring the layout of
test/sftp. The tests run without a kernel NFS mount, privileged ports,
or any platform-specific tooling.

Coverage includes read/write round-trip, mkdir/rmdir, nested
directories, rename content preservation, overwrite + explicit
truncate, 3 MiB binary file, all-byte binary and empty files, symlink
round-trip, ReadDirPlus listing, missing-path remove, FSInfo sanity,
sequential appends, and readdir-after-remove.

Framework notes:

- Picks ephemeral ports with net.Listen("127.0.0.1:0") and passes
  -port.grpc explicitly so the default port+10000 convention cannot
  overflow uint16 on macOS.
- Pre-creates the /nfs_export directory via the filer HTTP API before
  starting the NFS server — the NFS server's ensureIndexedEntry check
  requires the export root to exist with a real entry, which filer.Root
  does not satisfy when the export path is "/".
- Reuses the same rpc.Client for mount and target so go-nfs-client does
  not try to re-dial via portmapper (which concatenates ":111" onto the
  address).

* ci: add NFS integration test workflow

Mirror test/sftp's workflow for the new test/nfs suite so PRs that touch
the NFS server, the inode filer plumbing it depends on, or the test
harness itself run the 14 NFSv3-over-RPC integration tests on Ubuntu
22.04 via `make test`.

* nfs: use append for buffer growth in Write and Truncate

The previous make+copy pattern reallocated the full buffer on every
extending write or truncate, giving O(N^2) behaviour for sequential
write loops. Switching to `append(f.content, make([]byte, delta)...)`
lets Go's amortized growth strategy absorb the repeated extensions.
Called out by gemini-code-assist on PR 9067.

* filer: honor caller cancellation in collectInodeIndexEntries

Dropping the WithoutCancel wrapper lets DeleteFolderChildren bail out of
the inode-index scan if the client disconnects mid-walk. The cleanup is
already treated as best-effort by the caller (it logs on error and
continues), so a cancelled walk just means the partial index rebuild is
skipped — the same failure mode as any other index write error.
Flagged as a DoS concern by gemini-code-assist on PR 9067.

* nfs: skip filer read on open when O_TRUNC is set

openFile used to unconditionally loadWritableContent for every writable
open and then discard the buffer if O_TRUNC was set. For large files
that is a pointless 64 MiB round-trip. Reorder the branches so we only
fetch existing content when the caller intends to keep it, and mark the
file dirty right away so the subsequent Close still issues the
truncating write. Called out by gemini-code-assist on PR 9067.

* nfs: allow Seek on O_APPEND files and document buffered write cap

Two related cleanups on filesystem.go:

- POSIX only restricts Write on an O_APPEND fd, not lseek. The existing
  Seek error ("append-only file descriptors may only seek to EOF")
  prevented read-and-write workloads that legitimately reposition the
  read cursor. Write already snaps the offset to EOF before persisting
  (see seaweedFile Write), so Seek can unconditionally accept any
  offset. Update the unit test that was asserting the old behaviour.
- Add a doc comment on maxBufferedWriteSize explaining that it is a
  per-file ceiling, the memory footprint it implies, and that the real
  fix for larger whole-file rewrites is streaming / multi-chunk support.

Both changes flagged by gemini-code-assist on PR 9067.

* nfs: guard offset before casting to int in Write

CodeQL flagged `int(f.offset) + len(p)` inside the Write growth path as
a potential overflow on architectures where `int` is 32-bit. The
existing check only bounded the post-cast value, which is too late.
Clamp f.offset against maxBufferedWriteSize before the cast and also
reject negative/overflowed endOffset results. Both branches fall
through to billy.ErrNotSupported, the same behaviour the caller gets
today for any out-of-range buffered write.

* nfs: compute Write endOffset in int64 to satisfy CodeQL

The previous guard bounded f.offset but left len(p) unchecked, so
CodeQL still flagged `int(f.offset) + len(p)` as a possible int-width
overflow path. Bound len(p) against maxBufferedWriteSize first, do the
addition in int64, and only cast down after the total has been clamped
against the buffer ceiling. Behaviour is unchanged: any out-of-range
write still returns billy.ErrNotSupported.

* ci: drop emojis from nfs-tests workflow summary

Plain-text step summary per user preference — no decorative glyphs in
the NFS CI output or checklist.

* nfs: annotate remaining DEV_PLAN TODOs with status

Three of the unchecked items are genuine follow-up PRs rather than
missing work in this one, and one was actually already done:

- Reuse chunk cache and mutation stream helpers without FUSE deps:
  checked off — the NFS server imports weed/filer.ReaderCache and
  weed/util/chunk_cache directly with no weed/mount or go-fuse imports.
- Extract shared read/write helpers from mount/WebDAV/SFTP: annotated
  as deferred to a separate refactor PR (touches four packages).
- Expand direct data-path writes beyond the 64 MiB buffered fallback:
  annotated as deferred — requires a streaming WRITE path.
- Shared lock state + lock tests: annotated as blocked upstream on
  go-nfs's missing NLM/NFSv4 lock state RPCs, matching the existing
  "Current Blockers" note.

* test/nfs: share port+readiness helpers with test/testutil

Drop the per-suite mustPickFreePort and waitForService re-implementations
in favor of testutil.MustAllocatePorts (atomic batch allocation; no
close-then-hope race) and testutil.WaitForPort / SeaweedMiniStartupTimeout.
Pull testutil in via a local replace directive so this standalone
seaweedfs-nfs-tests module can import the in-repo package without a
separate release.

Subprocess startup is still master + volume + filer + nfs — no switch to
weed mini yet, since mini does not know about the nfs frontend.

* nfs: stream writes to volume servers instead of buffering the whole file

Before this change the NFS write path held the full contents of every
writable open in memory:

  - OpenFile(write) called loadWritableContent which read the existing
    file into seaweedFile.content up to maxBufferedWriteSize (64 MiB)
  - each Write() extended content in-place
  - Close() uploaded the whole buffer as a single chunk via
    persistContent + AssignVolume

The 64 MiB ceiling made large NFS writes return NFS3ERR_NOTSUPP, and
even below the cap every Write paid a whole-file-in-memory cost. This
PR rewrites the write path to match how `weed filer` and the S3 gateway
persist data:

  - openFile(write) no longer loads the existing content at all; it
    only issues an UpdateEntry when O_TRUNC is set *and* the file is
    non-empty (so a fresh create+trunc is still zero-RPC)
  - Write() streams the caller's bytes straight to a volume server via
    one AssignVolume + one chunk upload, then atomically appends the
    resulting chunk to the filer entry through mutateEntry. Any
    previously inlined entry.Content is migrated to a chunk in the same
    update so the chunk list becomes the authoritative representation.
  - Truncate() becomes a direct mutateEntry (drop chunks past the new
    size, clip inline content, update FileSize) instead of resizing an
    in-memory buffer.
  - Close() is a no-op because everything was flushed inline.

The small-file fast path that the filer HTTP handler uses is preserved:
if the post-write size still fits in maxInlineWriteSize (4 MiB) and
the file has no existing chunks, we rewrite entry.Content directly and
skip the volume-server round-trip. This keeps single-shot tiny writes
(echo, small edits) cheap while completely removing the 64 MiB cap on
larger files. Read() now always reads through the chunk reader instead
of a local byte slice, so reads inside the same session see the freshly
appended data.

Drops the unused seaweedFile.content / dirty fields, the
maxBufferedWriteSize constant, and the loadWritableContent helper.
Updates TestSeaweedFileSystemSupportsNamespaceMutations expectations
to match the new "no extra O_TRUNC UpdateEntry on an empty file"
behavior (still 3 updates: Write + Chmod + Truncate).

* filer: extract shared gateway upload helper for NFS and WebDAV

Three filer-backed gateways (NFS, WebDAV, and mount) each had a local
saveDataAsChunk that wrapped operation.NewUploader().UploadWithRetry
with near-identical bodies: build AssignVolumeRequest, build
UploadOption, build genFileUrlFn with optional filerProxy rewriting,
call UploadWithRetry, validate the result, and call ToPbFileChunk.
Pull that body into filer.SaveGatewayDataAsChunk with a
GatewayChunkUploadRequest struct so both NFS and WebDAV can delegate
to one implementation.

- NFS's saveDataAsChunk is now a thin adapter that assembles the
  GatewayChunkUploadRequest from server options and calls the helper.
  The chunkUploader interface keeps working for test injection because
  the new GatewayChunkUploader interface is structurally identical.
- WebDAV's saveDataAsChunk is similarly a thin adapter — it drops the
  local operation.NewUploader call plus the AssignVolume/UploadOption
  scaffolding.
- mount is intentionally left alone. mount's saveDataAsChunk has two
  features that do not fit the shared helper (a pre-allocated file-id
  pool used to skip AssignVolume entirely, and a chunkCache
  write-through at offset 0 so future reads hit the mount's local
  cache), both of which are mount-specific.

Marks the Phase 2 "extract shared read/write helpers from mount,
WebDAV, and SFTP" DEV_PLAN item as done. The filer-level chunk read
path (NonOverlappingVisibleIntervals + ViewFromVisibleIntervals +
NewChunkReaderAtFromClient) was already shared.

* nfs: remove DESIGN.md and DEV_PLAN.md

The planning documents have served their purpose — all phase 1 and
phase 2 items are landed, phase 3 streaming writes are landed, phase 2
shared helpers are extracted, and the two remaining phase 4 items
(shared lock state + lock tests) are blocked upstream on
github.com/willscott/go-nfs which exposes no NLM or NFSv4 lock state
RPCs. The running decision log no longer reflects current code and
would just drift. The NFS wiki page
(https://github.com/seaweedfs/seaweedfs/wiki/NFS-Server) now carries
the overview, configuration surface, architecture notes, and known
limitations; the source is the source of truth for the rest.
2026-04-14 20:48:24 -07:00
Chris Lu
50f25bb5cd 4.20 2026-04-13 13:25:13 -07:00
Lars Lehtonen
80db692728 fix(weed/util/chunk_cache): fix dropped errors (#9042) 2026-04-13 01:16:56 -07:00
Chris Lu
edf7d2a074 fix(filer): eliminate redundant disk reads causing memory/CPU regression (#9039)
* fix(filer): eliminate redundant disk reads causing memory/CPU regression (#9035)

Since 4.18, LocalMetaLogBuffer's ReadFromDiskFn was set to
readPersistedLogBufferPosition, causing LoopProcessLogData to call
ReadPersistedLogBuffer on every 250ms health-check tick when a
subscriber encounters ResumeFromDiskError.  Each call creates an
OrderedLogVisitor (ListDirectoryEntries on the filer store), spawns a
readahead goroutine with a 1024-element channel, finds no data, and
returns — 4 times per second even on an idle filer.

This is redundant because SubscribeLocalMetadata already manages disk
reads explicitly with its own shouldReadFromDisk / lastCheckedFlushTsNs
tracking in the outer loop.

Set ReadFromDiskFn back to nil for LocalMetaLogBuffer.  When
LoopProcessLogData encounters ResumeFromDiskError with nil
ReadFromDiskFn, the HasData() guard returns ResumeFromDiskError to the
caller (SubscribeLocalMetadata), which blocks efficiently on
listenersCond.Wait() instead of polling.

* fix(filer): add gap detection for slow consumers after disk-read stall

When a slow consumer falls behind and LoopProcessLogData returns
ResumeFromDiskError with no flush or read-position progress, there may
be a gap between persisted data and in-memory data (e.g. writes stopped
while consumer was still catching up). Without this, the consumer would
block on listenersCond.Wait() forever.

Skip forward to the earliest in-memory time to resume progress, matching
the gap-handling pattern already used in the shouldReadFromDisk path.

* fix(filer): clear stale ResumeFromDiskError after gap-skip to avoid stall

The gap-detection block added in the previous commit skips lastReadTime
forward to GetEarliestTime() and continues the outer loop.  On the next
iteration, shouldReadFromDisk becomes true (currentReadTsNs >
lastDiskReadTsNs), the disk read returns processedTsNs == 0, and the
existing gap handler at the top of the loop runs its own gap check.
That check uses readInMemoryLogErr == ResumeFromDiskError as the entry
condition — but readInMemoryLogErr is still the stale error from two
iterations ago.  GetEarliestTime() now equals lastReadTime.Time (we
already advanced to it), so earliestTime.After(lastReadTime.Time) is
false and the handler falls into listenersCond.Wait() — stuck.

Clear readInMemoryLogErr at the gap-skip point, matching the existing
pattern at the earlier gap handler that already clears it for the same
reason.

* fix(log_buffer): GetEarliestTime must include sealed prev buffers

GetEarliestTime previously returned only logBuffer.startTime (the active
buffer's first timestamp).  That is narrower than ReadFromBuffer's
tsMemory, which is the min across active + prev buffers.  Callers using
GetEarliestTime for gap detection after ResumeFromDiskError (the
SubscribeLocalMetadata outer loop's disk-read path, the new gap-skip in
the in-memory ResumeFromDiskError handler, and MQ HasData) saw a time
that was *newer* than the real earliest in-memory data.

Impact in SubscribeLocalMetadata's slow-consumer path:
  - tsMemory = earliest prev buffer time (T_prev)
  - GetEarliestTime() = active startTime (T_active, later than T_prev)
  - Consumer position = T1, with T_prev < T1 < T_active
  - ReadFromBuffer returns ResumeFromDiskError (T1 < tsMemory)
  - Gap detect: GetEarliestTime().After(T1) = T_active.After(T1) = true
  - Skip forward to T_active -- silently drops the prev-buffer data
  - And when T_active happens to equal the stuck position, gap detect
    evaluates false, and the subscriber stalls on listenersCond.Wait()

This reproduces the TestMetadataSubscribeSlowConsumerKeepsProgressing
failure in CI where the consumer stalled at 10220/20000 after writing
stopped -- the buffer still had data in prev[0..3], but gap detection
was comparing against the active buffer's startTime.

Fix: scan all sealed prev buffers under RLock, return the true minimum
startTime.  Matches the min-of-buffers logic in ReadFromBuffer.

* test(log_buffer): make DiskReadRetry test deterministic

The previous test added the message via AddToBuffer + ForceFlush and
relied on a race: the second disk read had to happen before the data
was delivered through the in-memory path.  Under the race detector or
on a slow CI runner, the reader is woken by AddToBuffer's notification,
finds the data in the active buffer or its prev slot, and returns after
exactly one disk read — failing the >= 2 disk reads assertion even
though the loop behaved correctly.

Reproduced on master with race detector (2/5 failures).

Rewrite the test to deliver the data exclusively through the disk-read
path: no AddToBuffer, no ForceFlush.  The test waits until the reader
has issued at least one no-op disk read, then atomically flips a
"dataReady" flag.  The reader's next iteration through readFromDiskFn
returns the entry.  This deterministically exercises the retry-loop
behavior the test was originally written to protect, and removes the
in-memory delivery race entirely.
2026-04-11 23:12:54 -07:00
Chris Lu
e648c76bcf go fmt 2026-04-10 17:31:14 -07:00
Chris Lu
eb5624233d [filer] fix log buffer idle polling (#9012)
* fix log buffer idle polling

* log_buffer: document notificationHealthCheckInterval tradeoffs

Explain that notifyChan is the primary wakeup path and this interval only
bounds the fallback / state-recheck cadence, so future maintainers don't
tune it without understanding the implications for client-disconnect
detection latency.

* log_buffer: rename waitForNotification to awaitNotificationOrTimeout

The helper returns after either a notification or the health-check
timeout; the old name read like it blocked indefinitely. No behavior
change.

* log_buffer: wake blocked subscribers on shutdown

awaitNotificationOrTimeout previously only returned on notifyChan or the
health-check timeout, so ShutdownLogBuffer on an idle buffer (where
copyToFlush returns nil and loopFlush never fires the post-flush
notification) would leave subscribers parked for up to 250ms before they
noticed IsStopping.

Add an internal shutdownCh closed by ShutdownLogBuffer and select on it
from awaitNotificationOrTimeout, which is now a method on *LogBuffer.
Subscribers wake immediately, re-check IsStopping, and exit. No change
to LoopProcessLogData signatures or any caller (filer metadata
subscribers, MQ broker, local partition subscribe).

* log_buffer: regression tests for flush-notify wake-up

TestLoopFlush_NotifiesSubscribersAfterFlush directly verifies that
loopFlush calls notifySubscribers after processing a flush, so a reader
parked on notifyChan wakes promptly when a flush lands. Verified to fail
if that notification is removed.

TestLoopProcessLogDataWithOffset_WakesOnDataArrival is the end-to-end
counterpart: a real LoopProcessLogDataWithOffset reader parks on
notifyChan via the ResumeFromDiskError branch, then wakes and processes
the entry well under the 250ms fallback once data arrives.

* log_buffer: keep notification-timeout logs at V(4)

Revert the V(4)->V(5) demotion. Now that the shutdown wake-up path
exists and (with the follow-up fix) idle-polling CPU churn is bounded
by the 250ms health check, these timeout logs no longer flood at V=4
the way they did on the 10ms fallback, so the previous verbosity is
appropriate again.

* log_buffer: exit reader loops cleanly on shutdown

awaitNotificationOrTimeout returns true on both data notifications and
shutdown (shutdownCh closed). Without an explicit IsStopping() guard,
the ResumeFromDiskError, offset-based no-data, empty-buffer, and
timestamp-wait paths would either tight-spin against a closed shutdownCh
or, in the offset-based case, return ResumeFromDiskError to the caller
instead of exiting.

Add an IsStopping() check after each awaitNotificationOrTimeout call
that previously continued or returned ResumeFromDiskError, so subscribers
exit promptly with isDone=true and err=nil when ShutdownLogBuffer is
called.

* log_buffer: regression test for shutdown wake-up

Park a real LoopProcessLogDataWithOffset reader on notifyChan via the
ResumeFromDiskError branch, call ShutdownLogBuffer, and assert the
reader exits with isDone=true and err=nil well under the 250ms
fallback. Verified to fail (timeout) if the IsStopping() guards added
in the prior commit are removed.

* log_buffer: bump reader-park sleep to 50ms with rationale

Both wake-path tests use a sleep to give the goroutine time to reach
awaitNotificationOrTimeout before the test triggers the wake-up.
Bump from 20ms to 50ms and document the timing assumption to reduce
flakiness on slow CI. Both paths are race-free either way (a buffered
notification or a closed shutdownCh stays valid until consumed), so
this is purely about exercising the park-then-wake path rather than
the already-pending fast path.
2026-04-09 18:09:57 -07:00
eason
a04c9c7dde fix: close CPU profile file after stopping profiling (#9000)
The file handle from os.Create(cpuProfile) was passed to
pprof.StartCPUProfile but never closed in the OnInterrupt handler.
The block and mutex profile files are correctly closed, but the
main CPU profile file was leaked.

Add f.Close() after pprof.StopCPUProfile() to prevent the file
descriptor leak.

Co-authored-by: easonysliu <easonysliu@tencent.com>
2026-04-08 22:13:02 -07:00
Chris Lu
0bdf9b0683 4.19 2026-04-07 19:21:35 -07:00
Chris Lu
2919bb27e5 fix(sync): use per-cluster TLS for HTTP volume connections in filer.sync (#8974)
* fix(sync): use per-cluster TLS for HTTP volume connections in filer.sync (#8965)

When filer.sync runs with -a.security and -b.security flags, only gRPC
connections received per-cluster TLS configuration. HTTP clients for
volume server reads and uploads used a global singleton with the default
security.toml, causing TLS verification failures when clusters use
different self-signed certificates.

Load per-cluster HTTPS client config from the security files and pass
dedicated HTTP clients to FilerSource (for downloads) and FilerSink
(for uploads) so each direction uses the correct cluster's certificates.

* fix(sync): address review feedback for per-cluster HTTP TLS

- Add insecure_skip_verify support to NewHttpClientWithTLS and read it
  from per-cluster security config via https.client.insecure_skip_verify
- Error on partial mTLS config (cert without key or vice versa)
- Add nil-check for client parameter in DownloadFileWithClient
- Document SetUploader as init-only (same pattern as SetChunkConcurrency)
2026-04-07 14:11:44 -07:00
Chris Lu
f6df7126b6 feat(admin): add profiling options for debugging high memory/CPU usage (#8923)
* feat(admin): add profiling options for debugging high memory/CPU usage

Add -debug, -debug.port, -cpuprofile, and -memprofile flags to the admin
command, matching the profiling support already available in master, volume,
and other server commands. This enables investigation of resource usage
issues like #8919.

* refactor(admin): move profiling flags into AdminOptions struct

Move cpuprofile and memprofile flags from global variables into the
AdminOptions struct and init() function for consistency with other flags.

* fix(debug): bind pprof server to localhost only and document profiling flags

StartDebugServer was binding to all interfaces (0.0.0.0), exposing
runtime profiling data to the network. Restrict to 127.0.0.1 since
this is a development/debugging tool.

Also add a "Debugging and Profiling" section to the admin command's
help text documenting the new flags.
2026-04-04 10:05:19 -07:00
Chris Lu
0798b274dd feat(s3): add concurrent chunk prefetch for large file downloads (#8917)
* feat(s3): add concurrent chunk prefetch for large file downloads

Add a pipe-based prefetch pipeline that overlaps chunk fetching with
response writing during S3 GetObject, SSE downloads, and filer proxy.

While chunk N streams to the HTTP response, fetch goroutines for the
next K chunks establish HTTP connections to volume servers ahead of
time, eliminating the RTT gap between sequential chunk fetches.

Uses io.Pipe for minimal memory overhead (~1MB per download regardless
of chunk size, vs buffering entire chunks). Also increases the
streaming read buffer from 64KB to 256KB to reduce syscall overhead.

Benchmark results (64KB chunks, prefetch=4):
- 0ms latency:  1058 → 2362 MB/s (2.2× faster)
- 5ms latency:  11.0 → 41.7 MB/s (3.8× faster)
- 10ms latency: 5.9  → 23.3 MB/s (4.0× faster)
- 20ms latency: 3.1  → 12.1 MB/s (3.9× faster)

* fix: address review feedback for prefetch pipeline

- Fix data race: use *chunkPipeResult (pointer) on channel to avoid
  copying struct while fetch goroutines write to it. Confirmed clean
  with -race detector.
- Remove concurrent map write: retryWithCacheInvalidation no longer
  updates fileId2Url map. Producer only reads it; consumer never writes.
- Use mem.Allocate/mem.Free for copy buffer to reduce GC pressure.
- Add local cancellable context so consumer errors (client disconnect)
  immediately stop the producer and all in-flight fetch goroutines.

* fix(test): remove dead code and add Range header support in test server

- Remove unused allData variable in makeChunksAndServer
- Add Range header handling to createTestServer for partial chunk
  read coverage (206 Partial Content, 416 Range Not Satisfiable)

* fix: correct retry condition and goroutine leak in prefetch pipeline

- Fix retry condition: use result.fetchErr/result.written instead of
  copied to decide cache-invalidation retry. The old condition wrongly
  triggered retry when the fetch succeeded but the response writer
  failed on the first write (copied==0 despite fetcher having data).
  Now matches the sequential path (stream.go:197) which checks whether
  the fetcher itself wrote zero bytes.

- Fix goroutine leak: when the producer's send to the results channel
  is interrupted by context cancellation, the fetch goroutine was
  already launched but the result was never sent to the channel. The
  drain loop couldn't handle it. Now waits on result.done before
  returning so every fetch goroutine is properly awaited.
2026-04-03 19:57:30 -07:00
Chris Lu
995dfc4d5d chore: remove ~50k lines of unreachable dead code (#8913)
* chore: remove unreachable dead code across the codebase

Remove ~50,000 lines of unreachable code identified by static analysis.

Major removals:
- weed/filer/redis_lua: entire unused Redis Lua filer store implementation
- weed/wdclient/net2, resource_pool: unused connection/resource pool packages
- weed/plugin/worker/lifecycle: unused lifecycle plugin worker
- weed/s3api: unused S3 policy templates, presigned URL IAM, streaming copy,
  multipart IAM, key rotation, and various SSE helper functions
- weed/mq/kafka: unused partition mapping, compression, schema, and protocol functions
- weed/mq/offset: unused SQL storage and migration code
- weed/worker: unused registry, task, and monitoring functions
- weed/query: unused SQL engine, parquet scanner, and type functions
- weed/shell: unused EC proportional rebalance functions
- weed/storage/erasure_coding/distribution: unused distribution analysis functions
- Individual unreachable functions removed from 150+ files across admin,
  credential, filer, iam, kms, mount, mq, operation, pb, s3api, server,
  shell, storage, topology, and util packages

* fix(s3): reset shared memory store in IAM test to prevent flaky failure

TestLoadIAMManagerFromConfig_EmptyConfigWithFallbackKey was flaky because
the MemoryStore credential backend is a singleton registered via init().
Earlier tests that create anonymous identities pollute the shared store,
causing LookupAnonymous() to unexpectedly return true.

Fix by calling Reset() on the memory store before the test runs.

* style: run gofmt on changed files

* fix: restore KMS functions used by integration tests

* fix(plugin): prevent panic on send to closed worker session channel

The Plugin.sendToWorker method could panic with "send on closed channel"
when a worker disconnected while a message was being sent. The race was
between streamSession.close() closing the outgoing channel and sendToWorker
writing to it concurrently.

Add a done channel to streamSession that is closed before the outgoing
channel, and check it in sendToWorker's select to safely detect closed
sessions without panicking.
2026-04-03 16:04:27 -07:00
Chris Lu
6213daf118 4.18 2026-04-01 17:42:41 -07:00
Chris Lu
ced2236cc6 Adjust rename events metadata format (#8854)
* rename metadata events

* fix subscription filter to use NewEntry.Name for rename path matching

The server-side subscription filter constructed the new path using
OldEntry.Name instead of NewEntry.Name when checking if a rename
event's destination matches the subscriber's path prefix. This could
cause events to be incorrectly filtered when a rename changes the
file name.

* fix bucket events to handle rename of bucket directories

onBucketEvents only checked IsCreate and IsDelete. A bucket directory
rename via AtomicRenameEntry now emits a single rename event (both
OldEntry and NewEntry non-nil), which matched neither check. Handle
IsRename by deleting the old bucket and creating the new one.

* fix replicator to handle rename events across directory boundaries

Two issues fixed:

1. The replicator filtered events by checking if the key (old path)
   was under the source directory. Rename events now use the old path
   as key, so renames from outside into the watched directory were
   silently dropped. Now both old and new paths are checked, and
   cross-boundary renames are converted to create or delete.

2. NewParentPath was passed to the sink without remapping to the
   sink's target directory structure, causing the sink to write
   entries at the wrong location. Now NewParentPath is remapped
   alongside the key.

* fix filer sync to handle rename events crossing directory boundaries

The early directory-prefix filter only checked resp.Directory (old
parent). Rename events now carry the old parent as Directory, so
renames from outside the source path into it were dropped before
reaching the existing cross-boundary handling logic. Check both old
and new directories against sourcePath and excludePaths so the
downstream old-key/new-key logic can properly convert these to
create or delete operations.

* fix metadata event path matching

* fix metadata event consumers for rename targets

* Fix replication rename target keys

Logical rename events now reach replication sinks with distinct source and target paths.\n\nHandle non-filer sinks as delete-plus-create on the translated target key, and make the rename fallback path create at the translated target key too.\n\nAdd focused tests covering non-filer renames, filer rename updates, and the fallback path.\n\nCo-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix filer sync rename path scoping

Use directory-boundary matching instead of raw prefix checks when classifying source and target paths during filer sync.\n\nAlso apply excludePaths per side so renames across excluded boundaries downgrade cleanly to create/delete instead of being misclassified as in-scope updates.\n\nAdd focused tests for boundary matching and rename classification.\n\nCo-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix replicator directory boundary checks

Use directory-boundary matching instead of raw prefix checks when deciding whether a source or target path is inside the watched tree or an excluded subtree.\n\nThis prevents sibling paths such as /foo and /foobar from being misclassified during rename handling, and preserves the earlier rename-target-key fix.\n\nAdd focused tests for boundary matching and rename classification across sibling/excluded directories.\n\nCo-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix etc-remote rename-out handling

Use boundary-safe source/target directory membership when classifying metadata events under DirectoryEtcRemote.\n\nThis prevents rename-out events from being processed as config updates, while still treating them as removals where appropriate for the remote sync and remote gateway command paths.\n\nAdd focused tests for update/removal classification and sibling-prefix handling.\n\nCo-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Defer rename events until commit

Queue logical rename metadata events during atomic and streaming renames and publish them only after the transaction commits successfully.\n\nThis prevents subscribers from seeing delete or logical rename events for operations that later fail during delete or commit.\n\nAlso serialize notification.Queue swaps in rename tests and add failure-path coverage.\n\nCo-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Skip descendant rename target lookups

Avoid redundant target lookups during recursive directory renames once the destination subtree is known absent.\n\nThe recursive move path now inserts known-absent descendants directly, and the test harness exercises prefixed directory listing so the optimization is covered by a directory rename regression test.\n\nCo-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Tighten rename review tests

Return filer_pb.ErrNotFound from the bucket tracking store test stub so it follows the FilerStore contract, and add a webhook filter case for same-name renames across parent directories.\n\nCo-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix HardLinkId format verb in InsertEntryKnownAbsent error

HardLinkId is a byte slice. %d prints each byte as a decimal number
which is not useful for an identifier. Use %x to match the log line
two lines above.

* only skip descendant target lookup when source and dest use same store

moveFolderSubEntries unconditionally passed skipTargetLookup=true for
every descendant. This is safe when all paths resolve to the same
underlying store, but with path-specific store configuration a child's
destination may map to a different backend that already holds an entry
at that path. Use FilerStoreWrapper.SameActualStore to check per-child
and fall back to the full CreateEntry path when stores differ.

* add nil and create edge-case tests for metadata event scope helpers

* extract pathIsEqualOrUnder into util.IsEqualOrUnder

Identical implementations existed in both replication/replicator.go and
command/filer_sync.go. Move to util.IsEqualOrUnder (alongside the
existing FullPath.IsUnder) and remove the duplicates.

* use MetadataEventTargetDirectory for new-side directory in filer sync

The new-side directory checks and sourceNewKey computation used
message.NewParentPath directly. If NewParentPath were empty (legacy
events, older filer versions during rolling upgrades), sourceNewKey
would be wrong (/filename instead of /dir/filename) and the
UpdateEntry parent path rewrite would panic on slice bounds.

Derive targetDir once from MetadataEventTargetDirectory, which falls
back to resp.Directory when NewParentPath is empty, and use it
consistently for all new-side checks and the sink parent path.
2026-03-30 18:25:11 -07:00
Chris Lu
92c2fc0d52 Add insecure_skip_verify option for HTTPS client in security.toml (#8781)
* Add -insecureSkipVerify flag and config option for filer.sync HTTPS connections

When using filer.sync between clusters with different CAs (e.g., separate
OpenShift clusters), TLS certificate verification fails with "x509:
certificate signed by unknown authority". This adds two ways to skip TLS
certificate verification:

1. CLI flag: `weed filer.sync -insecureSkipVerify ...`
2. Config option: `insecure_skip_verify = true` under [https.client] in
   security.toml

Closes #8778

* Add insecure_skip_verify option for HTTPS client in security.toml

When using filer.sync between clusters with different CAs (e.g., separate
OpenShift clusters), TLS certificate verification fails. Adding
insecure_skip_verify = true under [https.client] in security.toml allows
skipping TLS certificate verification.

The option is read during global HTTP client initialization so it applies
to all HTTPS connections including filer.sync proxy reads and writes.

Closes #8778

---------

Co-authored-by: Copilot <copilot@github.com>
2026-03-26 11:42:47 -07:00
Chris Lu
2877febd73 S3: fix silent PutObject failure and enforce 1024-byte key limit (#8764)
* S3: add KeyTooLongError error code

Add ErrKeyTooLongError (HTTP 400, code "KeyTooLongError") to match the
standard AWS S3 error for object keys that exceed length limits.

* S3: fix silent PutObject failure when entry name exceeds max_file_name_length

putToFiler called client.CreateEntry() directly and discarded the gRPC
response. The filer embeds application errors like "entry name too long"
in resp.Error (not as gRPC transport errors), so the error was silently
swallowed and clients received HTTP 200 with an ETag for objects that
were never stored.

Switch to the filer_pb.CreateEntry() helper which properly checks
resp.Error, and map "entry name too long" to KeyTooLongError (HTTP 400).

To avoid fragile string parsing across the gRPC boundary, define shared
error message constants in weed/util/constants and use them in both the
filer (producing errors) and S3 API (matching errors). Switch
filerErrorToS3Error to use strings.Contains/HasSuffix with these
constants so matches work regardless of any wrapper prefix. Apply
filerErrorToS3Error to the mkdir path for directory markers.

Fixes #8759

* S3: enforce 1024-byte maximum object key length

AWS S3 limits object keys to 1024 bytes. Add early validation on write
paths (PutObject, CopyObject, CreateMultipartUpload) to reject keys
exceeding the limit with the standard KeyTooLongError (HTTP 400).

The key length check runs before bucket auto-creation to prevent
overlong keys from triggering unnecessary side effects.

Also use filerErrorToS3Error for CopyObject's mkFile error paths so
name-too-long errors from the filer return KeyTooLongError instead of
InternalError.

Ref #8758

* S3: add handler-level tests for key length validation and error mapping

Add tests for filerErrorToS3Error mapping "entry name too long" to
KeyTooLongError, including a regression test for the CreateEntry-prefixed
"existing ... is a directory" form. Add handler-level integration tests
that exercise PutObjectHandler, CopyObjectHandler, and
NewMultipartUploadHandler via httptest, verifying HTTP 400 and
KeyTooLongError XML response for overlong keys and acceptance of keys at
the 1024-byte limit.
2026-03-24 13:35:28 -07:00
Chris Lu
6bf654c25c fix: keep metadata subscriptions progressing (#8730) (#8746)
* fix: keep metadata subscriptions progressing (#8730)

* test: cancel slow metadata writers with parent context

* filer: ignore missing persisted log chunks
2026-03-23 15:26:54 -07:00
Chris Lu
08b79a30f6 Fix lock table shared wait condition (#8707)
* Fix lock table shared wait condition (#8696)

* Refactor lock table waiter check

* Add exclusive lock wait helper test
2026-03-19 16:08:24 -07:00
Chris Lu
b665c329bc fix(replication): resume partial chunk reads on EOF instead of re-downloading (#8607)
* fix(replication): resume partial chunk reads on EOF instead of re-downloading

When replicating chunks and the source connection drops mid-transfer,
accumulate the bytes already received and retry with a Range header
to fetch only the remaining bytes. This avoids re-downloading
potentially large chunks from scratch on each retry, reducing load
on busy source servers and speeding up recovery.

* test(replication): add tests for downloadWithRange including gzip partial reads

Tests cover:
- No offset (no Range header sent)
- With offset (Range header verified)
- Content-Disposition filename extraction
- Partial read + resume: server drops connection mid-transfer, client
  resumes with Range from the offset of received bytes
- Gzip partial read + resume: first response is gzip-encoded (Go auto-
  decompresses), connection drops, resume request gets decompressed data
  (Go doesn't add Accept-Encoding when Range is set, so the server
  decompresses), combined bytes match original

* fix(replication): address PR review comments

- Consolidate downloadWithRange into DownloadFile with optional offset
  parameter (variadic), eliminating code duplication (DRY)
- Validate HTTP response status: require 206 + correct Content-Range
  when offset > 0, reject when server ignores Range header
- Use if/else for fullData assignment for clarity
- Add test for rejected Range (server returns 200 instead of 206)

* refactor(replication): remove unused ReplicationSource interface

The interface was never referenced and its signature didn't match
the actual FilerSource.ReadPart method.

---------

Co-authored-by: Copilot <copilot@github.com>
2026-03-11 22:38:22 -07:00
Chris Lu
4a5243886a 4.17 2026-03-11 02:29:24 -07:00
Chris Lu
8ad58e7002 4.16 2026-03-09 21:52:43 -07:00
Chris Lu
2ec0a67ee3 master: return 503/Unavailable during topology warmup after leader change (#8529)
* master: return 503/Unavailable during topology warmup after leader change

After a master restart or leader change, the topology is empty until
volume servers reconnect and send heartbeats. During this warmup window
(3 heartbeat intervals = 15 seconds), volume lookups that fail now
return 503 Service Unavailable (HTTP) or gRPC Unavailable instead of
404 Not Found, signaling clients to retry with other masters.

* master: skip warmup 503 on fresh start and single-master setups

- Check MaxVolumeId > 0 to distinguish restart from fresh start
  (MaxVolumeId is Raft-persisted, so 0 means no prior data)
- Check peer count > 1 so single-master deployments aren't affected
  (no point suggesting "retry with other masters" if there are none)

* master: address review feedback and block assigns during warmup

- Protect LastLeaderChangeTime with dedicated mutex (fix data race)
- Extract warmup multiplier as WarmupPulseMultiplier constant
- Derive Retry-After header from pulse config instead of hardcoding
- Only trigger warmup 503 for "not found" errors, not parse errors
- Return nil response (not partial) on gRPC Unavailable
- Add doc comments to IsWarmingUp, getter/setter, WarmupDuration
- Block volume assign requests (HTTP and gRPC) during warmup,
  since the topology is incomplete and assignments would be unreliable
- Skip warmup behavior for single-master setups (no peers to retry)

* master: apply warmup to all setups, skip only on fresh start

Single-master restarts still have an empty topology until heartbeats
arrive, so warmup protection should apply there too. The only case
to skip is a fresh cluster start (MaxVolumeId == 0), which already
has no volumes to look up.

- Remove GetMasterCount() > 1 guard from all warmup checks
- Remove now-unused GetMasterCount helper
- Update error messages to "topology is still loading" (not
  "retry with other masters" which doesn't apply to single-master)

* master: add client-side retry on Unavailable for lookup and assign

The server-side 503/Unavailable during warmup needs client cooperation.
Previously, LookupVolumeIds and Assign would immediately propagate the
error without retry.

Now both paths retry with exponential backoff (1s -> 1.5s -> ... up to
6s) when receiving Unavailable, respecting context cancellation. This
covers the warmup window where the master's topology is still loading
after a restart or leader change.

* master: seed warmup timestamp in legacy raft path at setup

The legacy raft path only set lastLeaderChangeTime inside the event
listener callback, which could fire after IsLeader() was already
observed as true in SetRaftServer. Seed the timestamp at setup time
(matching the hashicorp path) so IsWarmingUp() is active immediately.

* master: fix assign retry loop to cover full warmup window

The retry loop used waitTime <= maxWaitTime as a stop condition,
causing it to give up after ~13s while warmup lasts 15s. Now cap
each individual sleep at maxWaitTime but keep retrying until the
context is cancelled.

* master: preserve gRPC status in lookup retry and fix retry window

Return the raw gRPC error instead of wrapping with fmt.Errorf so
status.FromError() can extract the status code. Use proper gRPC
status check (codes.Unavailable) instead of string matching. Also
cap individual sleep at maxWaitTime while retrying until ctx is done.

* master: use gRPC status code instead of string matching in assign retry

Use status.FromError/codes.Unavailable instead of brittle
strings.Contains for detecting retriable gRPC errors in the
assign retry loop.

* master: use remaining warmup duration for Retry-After header

Set Retry-After to the remaining warmup time instead of the full
warmup duration, so clients don't wait longer than necessary.

* master: reset ret.Replicas before populating from assign response

Clear Replicas slice before appending to prevent duplicate entries
when the assign response is retried or when alternative requests
are attempted.

* master: add unit tests for warmup retry behavior

Test that Assign() and LookupVolumeIds() retry on codes.Unavailable
and stop promptly when the context is cancelled.

* master: record leader change time before initialization work

Move SetLastLeaderChangeTime() to fire immediately when the leader
change event is received, before DoBarrier(), EnsureTopologyId(),
and updatePeers(), so the warmup clock starts at the true moment
of leadership transition.

* master: use topology warmup duration in volume growth wait loop

Replace hardcoded constants.VolumePulsePeriod * 2 with
topo.IsWarmingUp() and topo.WarmupDuration() so the growth wait
stays in sync with the configured warmup window. Remove unused
constants import.

* master: resolve master before creating RPC timeout context

Move GetMaster() call before context.WithTimeout() so master
resolution blocking doesn't consume the gRPC call timeout.

* master: use NotFound flag instead of string matching for volume lookup

Add a NotFound field to LookupResult and set it in findVolumeLocation
when a volume is genuinely missing. Update HTTP and gRPC warmup
checks to use this flag instead of strings.Contains on the error
message.

* master: bound assign retry loop to 30s for deadline-free contexts

Without a context deadline, the Unavailable retry loop could spin
forever. Add a maxRetryDuration of 30s so the loop gives up even
when no context deadline is set.

* master: strengthen assign retry cancellation test

Verify the retry loop actually retried (callCount > 1) and that
the returned error is context.DeadlineExceeded, not just any error.

* master: extract shared retry-with-backoff utility

Add util.RetryWithBackoff for context-aware, bounded retry with
exponential backoff. Refactor both Assign() and LookupVolumeIds()
to use it instead of duplicating the retry/sleep/backoff logic.

* master: cap waitTime in RetryWithBackoff to prevent unbounded growth

Cap the backoff waitTime at maxWaitTime so it doesn't grow
indefinitely in long-running retry scenarios.

* master: only return Unavailable during warmup when all lookups failed

For batched LookupVolume requests, return partial results when some
volumes are found. Only return codes.Unavailable when no volumes
were successfully resolved, so clients benefit from partial results
instead of retrying unnecessarily.

* master: set retriable error message in 503 response body

When returning 503 during warmup, replace the "not found" error
in the JSON body with "service warming up, please retry" so
clients don't treat it as a permanent error.

* master: guard empty master address in LookupVolumeIds

If GetMaster() returns empty (no master found or ctx cancelled),
return an appropriate error instead of dialing an empty address.
Returns ctx.Err() if context is done, otherwise codes.Unavailable
to trigger retry.

* master: add comprehensive tests for RetryWithBackoff

Test success after retries, non-retryable error handling, context
cancellation, and maxDuration cap with context.Background().

* master: enforce hard maxDuration bound in RetryWithBackoff

Use a deadline instead of elapsed-time check so the last sleep is
capped to remaining time. This prevents the total retry duration
from overshooting maxDuration by up to one full backoff interval.

* master: respect fresh-start bypass in RemainingWarmupDuration

Check IsWarmingUp() first (which returns false when MaxVolumeId==0)
so RemainingWarmupDuration returns 0 on fresh clusters.

* master: round up Retry-After seconds to avoid underestimating

Use math.Ceil so fractional remaining seconds (e.g. 1.9s) round
up to the next integer (2) instead of flooring down (1).

* master: tighten batch lookup warmup to all-NotFound only

Only return codes.Unavailable when every requested volume ID was
a transient not-found. Mixed cases with non-NotFound errors now
return the response with per-volume error details preserved.

* master: reduce retry log noise and fix timer leak

Lower per-attempt retry log from V(0) to V(1) to reduce noise
during warmup. Replace time.After with time.NewTimer to avoid
lingering timers when context is cancelled.

* master: add per-attempt timeout for assign RPC

Use a 10s per-attempt timeout so a single slow RPC can't consume
the entire 30s retry budget when ctx has no deadline.

* master: share single 30s retry deadline across assign request entries

The Assign() function iterates over primary and fallback requests,
previously giving each its own 30s RetryWithBackoff budget. With a
primary + fallback, the total could reach 60s. Compute one deadline
up front and pass the remaining budget to each RetryWithBackoff call
so the entire Assign() call stays within a single 30s cap.

* master: strengthen context-cancel test with DeadlineExceeded and retry assertions

Assert errors.Is(err, context.DeadlineExceeded) to verify the error
is specifically from the context deadline, and check callCount > 1
to prove retries actually occurred before cancellation. Mirrors the
pattern used in TestAssignStopsOnContextCancel.

* master: bound GetMaster with per-attempt timeout in LookupVolumeIds

GetMaster() calls WaitUntilConnected() which can block indefinitely
if no master is available. Previously it used the outer ctx, so a
slow master resolution could consume the entire RetryWithBackoff
budget in a single attempt. Move the per-attempt timeoutCtx creation
before the GetMaster call so both master resolution and the gRPC
LookupVolume RPC share one grpcTimeout-bounded attempt.

* master: use deadline-aware context for assign retry budget

The shared 30s deadline only limited RetryWithBackoff's internal
wall-clock tracking, but per-attempt contexts were still derived
from the original ctx and could run for up to 10s even when the
budget was nearly exhausted. Create a deadlineCtx from the computed
deadline and derive both RetryWithBackoff and per-attempt timeouts
from it so all operations honor the shared 30s cap.

* master: skip warmup gate for empty lookup requests

When VolumeOrFileIds is empty, notFoundCount == len(req.VolumeOrFileIds)
is 0 == 0 which is true, causing empty lookup batches during warmup to
return codes.Unavailable and be retried endlessly. Add a
len(req.VolumeOrFileIds) > 0 guard so empty requests pass through.

* master: validate request fields before warmup gate in Assign

Move Replication and Ttl parsing before the IsWarmingUp() check so
invalid inputs get a proper validation error instead of being masked
by codes.Unavailable during warmup. Pure syntactic validation does
not depend on topology state and should run first.

* master: check deadline and context before starting retry attempt

RetryWithBackoff only checked the deadline and context after an
attempt completed or during the sleep select. If the deadline
expired or context was canceled during sleep, the next iteration
would still call operation() before detecting it. Add pre-operation
checks so no new attempt starts after the budget is exhausted.

* master: always return ctx.Err() on context cancellation in RetryWithBackoff

When ctx.Err() is non-nil, the pre-operation check was returning
lastErr instead of ctx.Err(). This broke callers checking
errors.Is(err, context.DeadlineExceeded) and contradicted the
documented contract. Always return ctx.Err() so the cancellation
reason is properly surfaced.

* master: handle warmup errors in StreamAssign without killing the stream

StreamAssign was returning codes.Unavailable errors from Assign
directly, which terminates the gRPC stream and breaks pooled
connections. Instead, return transient errors as in-band error
responses so the stream survives warmup periods.

Also reset assignClient in doAssign on Send/Recv failures so a
broken stream doesn't leave the proxy permanently dead.

* master: wait for warmup before slot search in findAndGrow

findEmptySlotsForOneVolume was called before the warmup wait loop,
selecting slots from an incomplete topology. Move the warmup wait
before slot search so volume placement uses the fully warmed-up
topology with all servers registered.

* master: add Retry-After header to /dir/assign warmup response

The /dir/lookup handler already sets Retry-After during warmup but
/dir/assign did not, leaving HTTP clients without guidance on when
to retry. Add the same header using RemainingWarmupDuration().

* master: only seed warmup timestamp on leader at startup

SetLastLeaderChangeTime was called unconditionally for both leader
and follower nodes. Followers don't need warmup state, and the
leader change event listener handles real elections. Move the seed
into the IsLeader() block so only the startup leader gets warmup
initialized.

* master: preserve codes.Unavailable for StreamAssign warmup errors in doAssign

StreamAssign returns transient warmup errors as in-band
AssignResponse.Error messages. doAssign was converting these to plain
fmt.Errorf, losing the codes.Unavailable classification needed for
the caller's retry logic. Detect warmup error messages and wrap them
as status.Error(codes.Unavailable) so RetryWithBackoff can retry.
2026-03-08 16:05:45 -07:00
Chris Lu
540fc97e00 s3/iam: reuse one request id per request (#8538)
* request_id: add shared request middleware

* s3err: preserve request ids in responses and logs

* iam: reuse request ids in XML responses

* sts: reuse request ids in XML responses

* request_id: drop legacy header fallback

* request_id: use AWS-style request id format

* iam: fix AWS-compatible XML format for ErrorResponse and field ordering

- ErrorResponse uses bare <RequestId> at root level instead of
  <ResponseMetadata> wrapper, matching the AWS IAM error response spec
- Move CommonResponse to last field in success response structs so
  <ResponseMetadata> serializes after result elements
- Add randomness to request ID generation to avoid collisions
- Add tests for XML ordering and ErrorResponse format

* iam: remove duplicate error_response_test.go

Test is already covered by responses_test.go.

* address PR review comments

- Guard against typed nil pointers in SetResponseRequestID before
  interface assertion (CodeRabbit)
- Use regexp instead of strings.Index in test helpers for extracting
  request IDs (Gemini)

* request_id: prevent spoofing, fix nil-error branch, thread reqID to error writers

- Ensure() now always generates a server-side ID, ignoring client-sent
  x-amz-request-id headers to prevent request ID spoofing. Uses a
  private context key (contextKey{}) instead of the header string.
- writeIamErrorResponse in both iamapi and embedded IAM now accepts
  reqID as a parameter instead of calling Ensure() internally, ensuring
  a single request ID per request lifecycle.
- The nil-iamError branch in writeIamErrorResponse now writes a 500
  Internal Server Error response instead of returning silently.
- Updated tests to set request IDs via context (not headers) and added
  tests for spoofing prevention and context reuse.

* sts: add request-id consistency assertions to ActionInBody tests

* test: update admin test to expect server-generated request IDs

The test previously sent a client x-amz-request-id header and expected
it echoed back. Since Ensure() now ignores client headers to prevent
spoofing, update the test to verify the server returns a non-empty
server-generated request ID instead.

* iam: add generic WithRequestID helper alongside reflection-based fallback

Add WithRequestID[T] that uses generics to take the address of a value
type, satisfying the pointer receiver on SetRequestId without reflection.

The existing SetResponseRequestID is kept for the two call sites that
operate on interface{} (from large action switches where the concrete
type varies at runtime). Generics cannot replace reflection there since
Go cannot infer type parameters from interface{}.

* Remove reflection and generics from request ID setting

Call SetRequestId directly on concrete response types in each switch
branch before boxing into interface{}, eliminating the need for
WithRequestID (generics) and SetResponseRequestID (reflection).

* iam: return pointer responses in action dispatch

* Fix IAM error handling consistency and ensure request IDs on all responses

- UpdateUser/CreatePolicy error branches: use writeIamErrorResponse instead
  of s3err.WriteErrorResponse to preserve IAM formatting and request ID
- ExecuteAction: accept reqID parameter and generate one if empty, ensuring
  every response carries a RequestId regardless of caller

* Clean up inline policies on DeleteUser and UpdateUser rename

DeleteUser: remove InlinePolicies[userName] from policy storage before
removing the identity, so policies are not orphaned.

UpdateUser: move InlinePolicies[userName] to InlinePolicies[newUserName]
when renaming, so GetUserPolicy/DeleteUserPolicy work under the new name.

Both operations persist the updated policies and return an error if
the storage write fails, preventing partial state.
2026-03-06 15:22:39 -08:00
Chris Lu
b3f7472fd3 4.15 2026-03-04 22:13:57 -08:00
Chris Lu
7799804200 4.14
Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-04 19:22:39 -08:00
Chris Lu
f5c35240be Add volume dir tags and EC placement priority (#8472)
* Add volume dir tags to topology

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add preferred tag config for EC

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Prioritize EC destinations by tags

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add EC placement planner tag tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Refactor EC placement tests to reuse buildActiveTopology

Remove buildActiveTopologyWithDiskTags helper function and consolidate
tag setup inline in test cases. Tests now use UpdateTopology to apply
tags after topology creation, reusing the existing buildActiveTopology
function rather than duplicating its logic.

All tag scenario tests pass:
- TestECPlacementPlannerPrefersTaggedDisks
- TestECPlacementPlannerFallsBackWhenTagsInsufficient

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Consolidate normalizeTagList into shared util package

Extract normalizeTagList from three locations (volume.go,
detection.go, erasure_coding_handler.go) into new weed/util/tag.go
as exported NormalizeTagList function. Replace all duplicate
implementations with imports and calls to util.NormalizeTagList.

This improves code reuse and maintainability by centralizing
tag normalization logic.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add PreferredTags to EC config persistence

Add preferred_tags field to ErasureCodingTaskConfig protobuf with field
number 5. Update GetConfigSpec to include preferred_tags field in the
UI configuration schema. Add PreferredTags to ToTaskPolicy to serialize
config to protobuf. Add PreferredTags to FromTaskPolicy to deserialize
from protobuf with defensive copy to prevent external mutation.

This allows EC preferred tags to be persisted and restored across
worker restarts.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add defensive copy for Tags slice in DiskLocation

Copy the incoming tags slice in NewDiskLocation instead of storing
by reference. This prevents external callers from mutating the
DiskLocation.Tags slice after construction, improving encapsulation
and preventing unexpected changes to disk metadata.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add doc comment to buildCandidateSets method

Document the tiered candidate selection and fallback behavior. Explain
that for a planner with preferredTags, it accumulates disks matching
each tag in order into progressively larger tiers, emits a candidate
set once a tier reaches shardsNeeded, and finally falls back to the
full candidates set if preferred-tag tiers are insufficient.

This clarifies the intended semantics for future maintainers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Apply final PR review fixes

1. Update parseVolumeTags to replicate single tag entry to all folders
   instead of leaving some folders with nil tags. This prevents nil
   pointer dereferences when processing folders without explicit tags.

2. Add defensive copy in ToTaskPolicy for PreferredTags slice to match
   the pattern used in FromTaskPolicy, preventing external mutation of
   the returned TaskPolicy.

3. Add clarifying comment in buildCandidateSets explaining that the
   shardsNeeded <= 0 branch is a defensive check for direct callers,
   since selectDestinations guarantees shardsNeeded > 0.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix nil pointer dereference in parseVolumeTags

Ensure all folder tags are initialized to either normalized tags or
empty slices, not nil. When multiple tag entries are provided and there
are more folders than entries, remaining folders now get empty slices
instead of nil, preventing nil pointer dereference in downstream code.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix NormalizeTagList to return empty slice instead of nil

Change NormalizeTagList to always return a non-nil slice. When all tags
are empty or whitespace after normalization, return an empty slice
instead of nil. This prevents nil pointer dereferences in downstream
code that expects a valid (possibly empty) slice.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add nil safety check for v.tags pointer

Add a safety check to handle the case where v.tags might be nil,
preventing a nil pointer dereference. If v.tags is nil, use an empty
string instead. This is defensive programming to prevent panics in
edge cases.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add volume.tags flag to weed server and weed mini commands

Add the volume.tags CLI option to both the 'weed server' and 'weed mini'
commands. This allows users to specify disk tags when running the
combined server modes, just like they can with 'weed volume'.

The flag uses the same format and description as the volume command:
comma-separated tag groups per data dir with ':' separators
(e.g. fast:ssd,archive).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-01 10:22:00 -08:00
Chris Lu
4f647e1036 Worker set its working directory (#8461)
* set working directory

* consolidate to worker directory

* working directory

* correct directory name

* refactoring to use wildcard matcher

* simplify

* cleaning ec working directory

* fix reference

* clean

* adjust test
2026-02-27 12:22:21 -08:00
Chris Lu
e4b70c2521 go fix 2026-02-20 18:42:00 -08:00
Konstantin Lebedev
01b3125815 [shell]: volume balance capacity by min volume density (#8026)
volume balance by min volume density and active volumes
2026-02-19 13:30:59 -08:00
Chris Lu
38e14a867b fix: cancel volume server requests on client disconnect during S3 downloads (#8373)
* fix: cancel volume server requests on client disconnect during S3 downloads

- Use http.NewRequestWithContext in ReadUrlAsStream so in-flight volume
  server requests are properly aborted when the client disconnects and
  the request context is canceled
- Distinguish context-canceled errors (client disconnect, expected) from
  real server errors in streamFromVolumeServers; log at V(3) instead of
  ERROR to reduce noise from client-side disconnects (e.g. Nginx upstream
  timeout, browser cancel, curl --max-time)

Fixes: streamFromVolumeServers: streamFn failed...context canceled"

* fixup: separate Canceled/DeadlineExceeded log severity in streamFromVolumeServers

- context.Canceled → V(3) Infof "client disconnected" (expected, no noise)
- context.DeadlineExceeded → Warningf "server-side deadline exceeded" (unexpected, needs attention)
- all other errors → Errorf (unchanged)"
2026-02-18 17:14:54 -08:00
Chris Lu
3c3a78d08e 4.13 2026-02-16 17:01:19 -08:00
Chris Lu
1e4f30c56f pb: fix IPv6 double brackets in ServerAddress formatting (#8329)
* pb: fix IPv6 double brackets in ServerAddress formatting

* pb: refactor IPv6 tests into table-driven test

* util: add JoinHostPortStr and use it in pb to avoid unsafe port parsing
2026-02-12 18:11:03 -08:00
Chris Lu
b57429ef2e Switch empty-folder cleanup to bucket policy (#8292)
* Fix Spark _temporary cleanup and add issue #8285 regression test

* Generalize empty folder cleanup for Spark temp artifacts

* Revert synchronous folder pruning and add cleanup diagnostics

* Add actionable empty-folder cleanup diagnostics

* Fix Spark temp marker cleanup in async folder cleaner

* Fix Spark temp cleanup with implicit directory markers

* Keep explicit directory markers non-implicit

* logging

* more logs

* Switch empty-folder cleanup to bucket policy

* Seaweed-X-Amz-Allow-Empty-Folders

* less logs

* go vet

* less logs

* refactoring
2026-02-10 18:38:38 -08:00
Chris Lu
af8273386d 4.12 2026-02-09 18:15:19 -08:00
Chris Lu
cb9e21cdc5 Normalize hashicorp raft peer ids (#8253)
* Normalize raft voter ids

* 4.11

* Update raft_hashicorp.go
2026-02-09 07:46:34 -08:00
Chris Lu
0c89185291 4.10 2026-02-08 21:16:58 -08:00
Chris Lu
5a5cc38692 4.09 2026-02-03 17:56:25 -08:00
Chris Lu
330bd92ddc 4.08 2026-02-02 20:44:13 -08:00
Chris Lu
ba8816e2e1 4.08 2026-02-02 20:36:03 -08:00
Chris Lu
bc853bdee5 4.07 2026-01-18 15:48:09 -08:00
Chris Lu
ce6e9be66b 4.06 2026-01-10 12:08:16 -08:00
Chris Lu
379c032868 Fix chown Input/output error on large file sets (#7996)
* Fix chown Input/output error on large file sets (Fixes #7911)

Implemented retry logic for MySQL/MariaDB backend to handle transient errors like deadlocks and timeouts.

* Fix syntax error: missing closing brace

* Refactor: Use %w for error wrapping and errors.As for extraction

* Fix: Disable retry logic inside transactions
2026-01-09 18:02:59 -08:00
promalert
9012069bd7 chore: execute goimports to format the code (#7983)
* chore: execute goimports to format the code

Signed-off-by: promalert <promalert@outlook.com>

* goimports -w .

---------

Signed-off-by: promalert <promalert@outlook.com>
Co-authored-by: Chris Lu <chris.lu@gmail.com>
2026-01-07 13:06:08 -08:00
Chris Lu
0e9f433ec4 refactoring 2026-01-04 11:40:42 -08:00
Chris Lu
87b71029f7 4.05 2026-01-01 20:39:22 -08:00