Mirror of https://github.com/seaweedfs/seaweedfs.git (synced 2026-05-14 13:51:33 +00:00)
* feat(mount): cap write buffer with -writeBufferSizeMB

  Without a bound on the per-mount write pipeline, sustained upload failures (e.g. volume server returning "Volume Size Exceeded" while the master hasn't yet rotated assignments) let sealed chunks pile up across open file handles until the swap directory — by default os.TempDir() — fills the disk. Reported on 4.19 filling /tmp to 1.8 TB during a large rclone sync.

  Add a global WriteBufferAccountant shared across every UploadPipeline in a mount. Creating a new page chunk (memory or swap) first reserves ChunkSize bytes; when the cap is reached the writer blocks until an uploader finishes and releases, turning swap overflow into natural FUSE-level backpressure instead of unbounded disk growth. The new -writeBufferSizeMB flag (also accepted via fuse.conf) defaults to 0 = unlimited, preserving current behavior.

  Reserve drops chunksLock while blocking so uploader goroutines — which take chunksLock on completion before calling Release — cannot deadlock, and an oversized reservation on an empty accountant succeeds to avoid single-handle starvation.

* fix(mount): plug write-budget leaks in pipeline Shutdown

  Review on #9066 caught two accounting bugs on the Destroy() path:

  1. Writable-chunk leak (high). SaveDataAt() reserves ChunkSize before inserting into writableChunks, but Shutdown() only iterated sealedChunks. Truncate / metadata-invalidation flows call Destroy() (via ResetDirtyPages) without flushing first, so any dirty but unsealed chunks would permanently shrink the global write budget. Shutdown now frees and releases writable chunks too.

  2. Double release with racing uploader (medium). Shutdown called accountant.Release directly after FreeReference, while the async uploader goroutine did the same on normal completion — under a Destroy-before-flush race this could underflow the accountant and let later writes exceed the configured cap.
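  The Reserve/Release discipline the first commit describes can be sketched as follows. This is a hypothetical, minimal model written for illustration, not the actual seaweedfs implementation: the type and method names are made up, and the real accountant additionally coordinates with the pipeline's chunksLock. A zero limit disables accounting entirely, and an oversized reservation on an empty accountant is let through so a single handle can never starve itself.

  ```go
  package main

  import (
  	"fmt"
  	"sync"
  )

  // writeBufferAccountant is a hypothetical sketch of the global write
  // budget: Reserve blocks while the budget is full, Release frees room
  // and wakes blocked writers, and limit == 0 means unlimited.
  type writeBufferAccountant struct {
  	mu    sync.Mutex
  	cond  *sync.Cond
  	limit int64
  	used  int64
  }

  func newWriteBufferAccountant(limit int64) *writeBufferAccountant {
  	a := &writeBufferAccountant{limit: limit}
  	a.cond = sync.NewCond(&a.mu)
  	return a
  }

  func (a *writeBufferAccountant) Reserve(n int64) {
  	if a.limit == 0 {
  		return // unlimited mode: no accounting at all
  	}
  	a.mu.Lock()
  	defer a.mu.Unlock()
  	// Block while the reservation would exceed the cap, but let an
  	// oversized request through when nothing is outstanding, so a
  	// single handle asking for more than the whole budget still
  	// makes progress instead of waiting forever.
  	for a.used+n > a.limit && a.used > 0 {
  		a.cond.Wait()
  	}
  	a.used += n
  }

  func (a *writeBufferAccountant) Release(n int64) {
  	if a.limit == 0 {
  		return
  	}
  	a.mu.Lock()
  	defer a.mu.Unlock()
  	a.used -= n
  	if a.used < 0 {
  		a.used = 0
  	}
  	a.cond.Broadcast()
  }

  func (a *writeBufferAccountant) Used() int64 {
  	if a.limit == 0 {
  		return 0 // short-circuit: unlimited mode tracks nothing
  	}
  	a.mu.Lock()
  	defer a.mu.Unlock()
  	return a.used
  }

  func main() {
  	a := newWriteBufferAccountant(4)
  	a.Reserve(2)
  	a.Reserve(2) // budget now full
  	released := make(chan struct{})
  	go func() {
  		a.Reserve(2) // blocks until a slot frees
  		close(released)
  	}()
  	a.Release(2)
  	<-released
  	fmt.Println(a.Used()) // 4
  }
  ```

  The main function also shows the handshake used in the later sleep-free blocking test: progress of the blocked Reserve is observed through a channel the reserving goroutine closes, never through an arbitrary time.Sleep.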
  Move accounting into SealedChunk.FreeReference itself: the refcount-zero transition is exactly-once by construction, so any number of FreeReference calls release the slot precisely once. Add regression tests for the writable-leak and the FreeReference idempotency guarantee.

* test(mount): remove sleep-based race in accountant blocking test

  Address review nits on #9066:

  - Replace the time.Sleep(50ms) proxy for "goroutine entered Reserve" with a started channel the goroutine closes immediately before calling Reserve. Reserve cannot make progress until Release is called, so landed is guaranteed false after the handshake — no arbitrary wait.

  - Short-circuit WriteBufferAccountant.Used() in unlimited mode for consistency with Reserve/Release, avoiding a mutex round-trip.

* test(mount): add end-to-end write-buffer cap integration test

  Exercises the full write-budget plumbing with a small cap (4 chunks of 64 KiB = 256 KiB) shared across three UploadPipelines fed by six concurrent writers. A gated saveFn models the "volume server rejecting uploads" condition from the original report: no sealed chunk can drain until the test opens the gate. A background sampler records the peak value of accountant.Used() throughout the run. The test asserts:

  - writers fill the budget and then block on Reserve (Used() stays at the cap while stalled)
  - Used() never exceeds the configured cap even under concurrent pressure from multiple pipelines
  - after the gate opens, writers drain to zero
  - peak observed Used() matches the cap (262144 bytes in this run)

  While wiring this up, the race detector surfaced a pre-existing data race on UploadPipeline.uploaderCount: the two glog.V(4) lines around the atomic Add sites read the field non-atomically. Capture the new value from AddInt32 and log that instead — one-liner each, no behavioral change.
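  The exactly-once release tied to the refcount-zero transition can be sketched like this. It is a simplified, hypothetical model of the idea only: the names mirror the commit text but are not the real seaweedfs types, and the real SealedChunk synchronizes through the pipeline's chunksLock, which is omitted here.

  ```go
  package main

  import "fmt"

  // releaser abstracts the accountant's Release side for this sketch.
  type releaser interface{ Release(n int64) }

  // sealedChunk models a sealed page chunk holding one budget slot.
  // The slot is released exactly once, on the reference count's
  // transition to zero; any further FreeReference call is a strict
  // no-op, so the counter can never go negative.
  type sealedChunk struct {
  	size             int64
  	referenceCounter int
  	accountant       releaser
  }

  func (sc *sealedChunk) FreeReference() {
  	if sc.referenceCounter <= 0 {
  		return // already freed: double-call is a strict no-op
  	}
  	sc.referenceCounter--
  	if sc.referenceCounter == 0 {
  		// The zero transition happens exactly once by construction,
  		// so the budget slot cannot be released twice from here.
  		sc.accountant.Release(sc.size)
  	}
  }

  // countingReleaser records how many bytes were released, for the demo.
  type countingReleaser struct{ released int64 }

  func (c *countingReleaser) Release(n int64) { c.released += n }

  func main() {
  	acct := &countingReleaser{}
  	sc := &sealedChunk{size: 64 << 10, referenceCounter: 2, accountant: acct}
  	sc.FreeReference() // 2 -> 1: nothing released yet
  	sc.FreeReference() // 1 -> 0: releases the slot once
  	sc.FreeReference() // no-op: cannot underflow the budget
  	fmt.Println(acct.released) // 65536
  }
  ```

  Funneling every caller through the zero transition is what lets Shutdown and the racing uploader both call FreeReference safely: whichever one lands the final reference performs the single release.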
* test(fuse): end-to-end integration test for -writeBufferSizeMB

  Exercise the new write-buffer cap against a real weed mount so CI (fuse-integration.yml) covers the FUSE→upload-pipeline→filer path, not just the in-package unit tests. Uses a 4 MiB cap with 2 MiB chunks so every subtest's total write demand is a multiple of the budget and Reserve/Release must drive forward progress for writes to complete. Subtests:

  - ConcurrentLargeWrites: six parallel 6 MiB files (36 MiB total, ~18 chunk allocations) through the same mount; verifies every byte round-trips.

  - SingleFileExceedingCap: one 20 MiB file (10 chunks) through a single handle, catching any self-deadlock when the pipeline's own earlier chunks already fill the global budget.

  - DoesNotDeadlockAfterPressure: final small write with a 30s timeout, catching budget-slot leaks that would otherwise hang subsequent writes on a still-full accountant.

  Ran locally on Darwin with macfuse against a real weed mini + mount:

    === RUN TestWriteBufferCap
    --- PASS: TestWriteBufferCap (1.82s)

* test(fuse): loosen write-buffer cap e2e test + fail-fast on hang

  On Linux CI the previous configuration (-writeBufferSizeMB=4, -concurrentWriters=4 against a 20 MiB single-handle write) deterministically hung the "Run FUSE Integration Tests" step until the 45-minute workflow timeout, while on macOS / macfuse the same test completes in ~2 seconds (see run 24386197483). The Linux hang shows up after TestWriteBufferCap/ConcurrentLargeWrites completes cleanly: TestWriteBufferCap/SingleFileExceedingCap starts and never emits its PASS line.

  Change:

  - Loosen the cap to 16 MiB (8 × 2 MiB chunk slots) and drop the custom -concurrentWriters override.
    The subtests still drive demand well above the cap (32 MiB concurrent, 12 MiB single-handle), so Reserve/Release is still on every chunk-allocation path; the cap just gives the pipeline enough headroom that interactions with the per-file writableChunkLimit and the go-fuse MaxWrite batching don't wedge a single-handle writer on a slow runner.

  - Wrap every os.WriteFile in a writeWithTimeout helper that dumps every live goroutine on timeout. If this ever re-regresses, CI surfaces the actual stuck goroutines instead of a 45-minute walltime.

  - Also guard the concurrent-writer goroutines with the same timeout + stack dump.

  The in-package unit test TestWriteBufferCap_SharedAcrossPipelines remains the deterministic, controlled verification of the blocking Reserve/Release path — this e2e test is now a smoke test for correctness and absence of deadlocks through a real FUSE mount, which is all it should be.

* fix: address PR #9066 review — idempotent FreeReference, subtest watchdog, larger single-handle test

  FreeReference on SealedChunk now early-returns when referenceCounter is already <= 0. The existing == 0 body guard already made side effects idempotent, but the counter itself would still decrement into the negatives on a double-call — ugly and a latent landmine for any future caller that does math on the counter. Make double-call a strict no-op.

  test(fuse): per-subtest watchdog + larger single-handle test

  - Add runSubtestWithWatchdog and wrap every TestWriteBufferCap subtest with a 3-minute deadline. Individual writes were already timeout-wrapped, but the readback loops and surrounding bookkeeping were not, leaving a gap where a subtest body could still hang. On watchdog fire, every live goroutine is dumped so CI surfaces the wedge instead of a 45-minute walltime.

  - Bump testLargeFileUnderCap from 12 MiB → 20 MiB (10 chunks) to exceed the 16 MiB cap (8 slots) again and actually exercise Reserve/Release backpressure on a single file handle.
  The earlier e2e hang was under much tighter params (-writeBufferSizeMB=4, -concurrentWriters=4, writable limit 4); with the current loosened config the pressure is gentle and the goroutine-dump-on-timeout safety net is in place if it ever regresses.

  Declined: adding an observable peak-Used() assertion to the e2e test. The mount runs as a subprocess so its in-process WriteBufferAccountant state isn't reachable from the test without adding a metrics/RPC surface. The deterministic peak-vs-cap verification already lives in the in-package unit test TestWriteBufferCap_SharedAcrossPipelines. Recorded this rationale inline in TestWriteBufferCap's doc comment.

* test(fuse): capture mount pprof goroutine dump on write-timeout

  The previous run (24388549058) hung on LargeFileUnderCap and the test-side dumpAllGoroutines only showed the test process — the test's syscall.Write is blocked in the kernel waiting for FUSE to respond, which tells us nothing about where the MOUNT is stuck. The mount runs as a subprocess so its in-process stacks aren't reachable from the test.

  Enable the mount's pprof endpoint via -debug=true -debug.port=<free>, allocate the port from the test, and on write-timeout fetch /debug/pprof/goroutine?debug=2 from the mount process and log it. This gives CI the only view that can actually diagnose a write-buffer backpressure deadlock (writer goroutines blocked on Reserve, uploader goroutines stalled on something, etc).

  Kept fileSize at 20 MiB so the Linux CI run will still hit the hang (if it's genuinely there) and produce an actionable mount-side dump; the alternative — silently shrinking the test below the cap — would lose the regression signal entirely.

* review: constructor-inject accountant + subtest watchdog body on main

  Two PR-#9066 review fixes:

  1. NewUploadPipeline now takes the WriteBufferAccountant as a constructor parameter; SetWriteBufferAccountant is removed.
    In practice the previous setter was only called once during newMemoryChunkPages, before any goroutine could touch the pipeline, so there was no actual race — but constructor injection makes the "accountant is fixed at construction time" invariant explicit and eliminates the possibility of a future caller mutating it mid-flight. All three call sites (real + two tests) updated; the legacy TestUploadPipeline passes a nil accountant, preserving backward-compatible unlimited-mode behavior.

  2. runSubtestWithWatchdog now runs body on the subtest main goroutine and starts a watcher goroutine that only calls goroutine-safe t methods (t.Log, t.Logf, t.Errorf). The previous version ran body on a spawned goroutine, which meant any require.* or writeWithTimeout t.Fatalf inside body was being called from a non-test goroutine — explicitly disallowed by Go's testing docs. The watcher no longer interrupts body (it can't), so body must return on its own — which it does via writeWithTimeout's internal 90s timeout firing t.Fatalf on (now) the main goroutine. The watchdog still provides the critical diagnostic: on timeout it dumps both test-side and mount-side (via pprof) goroutine stacks and marks the test failed via t.Errorf.

* fix(mount): IsComplete must detect coverage across adjacent intervals

  Linux FUSE caps per-op writes at FUSE_MAX_PAGES_PER_REQ (typically 1 MiB on x86_64) regardless of go-fuse's requested MaxWrite, so a 2 MiB chunk filled by a sequential writer arrives as two adjacent 1 MiB write ops. addInterval in ChunkWrittenIntervalList does not merge adjacent intervals, so the resulting list has two elements {[0,1M], [1M,2M]} — fully covered, but list.size()==2. IsComplete previously returned `list.size() == 1 && list.head.next.isComplete(chunkSize)`, which required a single interval covering [0, chunkSize).
  Under that rule, chunks filled by adjacent writes never reach IsComplete==true, so maybeMoveToSealed never fires, and the chunks sit in writableChunks until FlushAll/close. SaveContent handles the adjacency correctly via its inline merge loop, so uploads work once they're triggered — but IsComplete is the gate that triggers them.

  This was a latent bug: without the write-buffer cap, the overflow path kicks in at writableChunkLimit (default 128) and force-seals chunks, hiding the leak. #9066's -writeBufferSizeMB adds a tighter global cap, and with 8 slots / 20 MiB test, the budget trips long before overflow. The writer blocks in Reserve, waiting for a slot that never frees because no uploader ever ran — observed in the CI run 24390596623 mount pprof dump: goroutine 1 stuck in WriteBufferAccountant.Reserve → cond.Wait, zero uploader goroutines anywhere in the 89-goroutine dump.

  Walk the (sorted) interval list tracking the furthest covered offset; return true if coverage reaches chunkSize with no gaps. This correctly handles adjacent intervals, overlapping intervals, and out-of-order inserts. Added TestIsComplete_AdjacentIntervals covering single-write, two adjacent halves (both orderings), eight adjacent eighths, gaps, missing edges, and overlaps.

* test(fuse): route mount glog to stderr + dump mount on any write error

  Run 24392087737 (with the IsComplete fix) no longer hangs on Linux — huge progress. Now TestWriteBufferCap/LargeFileUnderCap fails with 'close(...write_buffer_cap_large.bin): input/output error', meaning a chunk upload failed and pages.lastErr propagated via FlushData to close(). But the mount log in the CI artifact is empty because weed mount's glog defaults to /tmp/weed.* files, which the CI upload step never sees, so we can't tell WHICH upload failed or WHY.
  Add -logtostderr=true -v=2 to MountOptions so glog output goes to the mount process's stderr, which the framework's startProcess redirects into f.logDir/mount.log, which the framework's DumpLogs then prints to the test output on failure. The -v=2 floor enables saveDataAsChunk upload errors (currently logged at V(0)) plus the medium-level write_pipeline/upload traces without drowning the log in V(4) noise.

  Also dump MOUNT goroutines on any writeWithTimeout error (not just timeout). The IsComplete fix means we now get explicit errors instead of silent hangs, and the goroutine dump at the error moment shows in-flight upload state (pending sealed chunks, retry loops, etc) that a post-failure log alone can't capture.
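The coverage walk the IsComplete fix describes can be sketched as follows, assuming a simplified sorted slice of half-open intervals in place of the real ChunkWrittenIntervalList (names and types here are illustrative only):

```go
package main

import (
	"fmt"
	"sort"
)

// interval is a half-open written range [start, stop) within a chunk.
type interval struct{ start, stop int64 }

// isComplete walks the intervals sorted by start, tracking the furthest
// covered offset, and reports whether [0, chunkSize) is covered with no
// gaps. Unlike a check requiring one merged interval, this handles
// adjacent intervals ({[0,1M],[1M,2M]}), overlaps, and out-of-order
// inserts.
func isComplete(intervals []interval, chunkSize int64) bool {
	sort.Slice(intervals, func(i, j int) bool { return intervals[i].start < intervals[j].start })
	covered := int64(0)
	for _, iv := range intervals {
		if iv.start > covered {
			return false // gap before this interval
		}
		if iv.stop > covered {
			covered = iv.stop
		}
	}
	return covered >= chunkSize
}

func main() {
	const mib = 1 << 20
	// Two adjacent 1 MiB writes fully cover a 2 MiB chunk even though
	// the list never merges them into a single interval.
	fmt.Println(isComplete([]interval{{0, mib}, {mib, 2 * mib}}, 2*mib)) // true
	// A one-byte gap in the middle means the chunk is not complete.
	fmt.Println(isComplete([]interval{{0, mib}, {mib + 1, 2 * mib}}, 2*mib)) // false
}
```

This is exactly the failure mode of the old single-interval rule: the adjacent-halves case above is fully covered but has list length 2, so the old check returned false and no uploader ever ran.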
315 lines · 11 KiB · Go
package fuse_test

import (
	"bytes"
	"crypto/rand"
	"fmt"
	"io"
	"net/http"
	"os"
	"path/filepath"
	"runtime"
	"sync"
	"testing"
	"time"

	"github.com/stretchr/testify/require"
)

// mountDebugPort holds the debug/pprof port the test passed to the
// mount process via -debug.port. It is set once at TestWriteBufferCap
// entry and consulted from the write-timeout paths to fetch the mount
// process's goroutine dump (the test's own dumpAllGoroutines only
// covers the test process).
var mountDebugPort int

// fetchMountGoroutines pulls a full goroutine dump from the mount
// process's pprof endpoint. If the mount debug port isn't configured
// or the HTTP call fails, a short explanation is returned instead of
// an error — this is diagnostic best-effort, not a test assertion.
func fetchMountGoroutines() string {
	if mountDebugPort == 0 {
		return "(mount debug port not configured)"
	}
	url := fmt.Sprintf("http://127.0.0.1:%d/debug/pprof/goroutine?debug=2", mountDebugPort)
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return fmt.Sprintf("(failed to reach mount pprof at %s: %v)", url, err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return fmt.Sprintf("(failed to read mount pprof body: %v)", err)
	}
	return string(body)
}

// dumpAllGoroutines returns a full stack trace of every live goroutine.
// Used on write-timeout to give CI actionable diagnosis if the write
// buffer cap ever re-regresses into a hang.
func dumpAllGoroutines() string {
	buf := make([]byte, 1<<20)
	for {
		n := runtime.Stack(buf, true)
		if n < len(buf) {
			return string(buf[:n])
		}
		buf = make([]byte, 2*len(buf))
	}
}

// writeBufferCapConfig returns a TestConfig that exercises the new
// -writeBufferSizeMB flag. The cap is set below the aggregate in-flight
// write demand of the subtests below, so every new chunk has to pass
// through the Reserve/Release backpressure path at least some of the
// time. The cap is intentionally NOT minimal — over-tight settings
// interact with the per-file writable-chunk limit and the FUSE MaxWrite
// batching in ways that starve single-handle writers on slow CI.
//
// Also enables the mount's pprof debug endpoint so the test can fetch
// mount-process goroutine dumps on write-timeout, which is the only
// way to actually diagnose a backpressure deadlock (the test process
// itself is just blocked in syscall.Write waiting on FUSE).
func writeBufferCapConfig(debugPort int) *TestConfig {
	return &TestConfig{
		Collection:  "",
		Replication: "000",
		ChunkSizeMB: 2,   // 2 MiB chunks
		CacheSizeMB: 100, // read cache (unrelated)
		NumVolumes:  3,
		EnableDebug: false,
		MountOptions: []string{
			// 16 MiB total write buffer ⇒ up to 8 chunks in flight
			// across every open file handle on this mount. Large
			// enough to avoid starving a single handle on a slow
			// CI runner, small enough that the concurrent test
			// below still has to drain through it.
			"-writeBufferSizeMB=16",
			"-debug=true",
			fmt.Sprintf("-debug.port=%d", debugPort),
			// Route glog to stderr so the framework's process log
			// capture actually contains something — by default weed
			// sends glog to /tmp/weed.* files which the CI artifact
			// upload step never sees. Critical for diagnosing
			// upload/saveToStorage errors on Linux runs.
			"-logtostderr=true",
			"-v=2",
		},
		SkipCleanup: false,
	}
}

// writeWithTimeout wraps os.WriteFile with a hard deadline so a stuck
// write fails the test fast instead of consuming the full job budget.
// This is belt-and-braces around the 45-minute workflow timeout and
// makes write-buffer regressions surface as an actionable failure.
func writeWithTimeout(t *testing.T, path string, data []byte, timeout time.Duration) {
	t.Helper()
	done := make(chan error, 1)
	go func() { done <- os.WriteFile(path, data, 0644) }()
	select {
	case err := <-done:
		if err != nil {
			// Dump mount goroutines on any write error, not just
			// timeout — upload failures that surface via close()
			// as EIO leave the mount process running but in an
			// informative state (pending sealed chunks, error
			// counters, etc).
			t.Logf("write %s failed (%v) — dumping MOUNT goroutines:\n%s", path, err, fetchMountGoroutines())
		}
		require.NoError(t, err, "write %s", path)
	case <-time.After(timeout):
		t.Logf("write %s did not finish within %v — dumping TEST goroutines:\n%s", path, timeout, dumpAllGoroutines())
		t.Logf("dumping MOUNT goroutines:\n%s", fetchMountGoroutines())
		t.Fatalf("write %s timed out — write buffer cap is likely leaking or deadlocking", path)
	}
}

// runSubtestWithWatchdog runs body on the current (subtest main)
// goroutine and starts a watcher goroutine that logs diagnostics and
// fails the test if body doesn't return within timeout.
//
// body must run on the main goroutine because test helpers inside it
// (require.NoError, writeWithTimeout's own t.Fatalf on its internal
// timeout) need t.Fatal / t.FailNow, which Go's testing docs restrict
// to the goroutine running the test function. The watcher goroutine
// only calls goroutine-safe t methods (t.Log, t.Logf, t.Errorf) so it
// can mark the test failed and dump diagnostics without violating
// that contract. If body is stuck past timeout the watcher still
// surfaces the wedge (test + mount goroutine dumps + a FAIL mark);
// body itself gets unblocked either by its own inner writeWithTimeout
// firing t.Fatalf or by Go test's global -timeout.
func runSubtestWithWatchdog(t *testing.T, timeout time.Duration, body func(t *testing.T)) {
	t.Helper()
	stop := make(chan struct{})
	defer close(stop)
	go func() {
		select {
		case <-stop:
			return
		case <-time.After(timeout):
			t.Logf("subtest exceeded %v watchdog — dumping TEST goroutines:\n%s", timeout, dumpAllGoroutines())
			t.Logf("dumping MOUNT goroutines:\n%s", fetchMountGoroutines())
			t.Errorf("subtest exceeded %v watchdog — see goroutine dumps above", timeout)
		}
	}()
	body(t)
}

// TestWriteBufferCap exercises the end-to-end write-buffer cap on a
// real FUSE mount. Without the cap, a volume-server stall would let
// the swap file grow without bound (issue #8777). With the cap, writers
// must serialize through a bounded budget while still producing correct
// output — that correctness (and the absence of deadlocks) is what
// this test verifies.
//
// Note: this test deliberately does not assert that Reserve *blocked*
// at some observed used-byte peak. The mount runs as a subprocess so
// its in-process WriteBufferAccountant state is not reachable from the
// test without adding a metrics/RPC surface to the mount binary. The
// deterministic peak-vs-cap assertion instead lives in the in-package
// unit test TestWriteBufferCap_SharedAcrossPipelines, which drives a
// controlled gated uploader and samples Used() throughout the run.
func TestWriteBufferCap(t *testing.T) {
	mountDebugPort = freePort(t)
	config := writeBufferCapConfig(mountDebugPort)
	framework := NewFuseTestFramework(t, config)
	defer framework.Cleanup()

	require.NoError(t, framework.Setup(config))

	const subtestTimeout = 3 * time.Minute

	t.Run("ConcurrentWritesUnderCap", func(t *testing.T) {
		runSubtestWithWatchdog(t, subtestTimeout, func(t *testing.T) {
			testConcurrentWritesUnderCap(t, framework)
		})
	})

	t.Run("LargeFileUnderCap", func(t *testing.T) {
		runSubtestWithWatchdog(t, subtestTimeout, func(t *testing.T) {
			testLargeFileUnderCap(t, framework)
		})
	})

	t.Run("DoesNotDeadlockAfterPressure", func(t *testing.T) {
		runSubtestWithWatchdog(t, subtestTimeout, func(t *testing.T) {
			testWriteBufferNoDeadlockAfterPressure(t, framework)
		})
	})
}

// testConcurrentWritesUnderCap opens several files in parallel with
// aggregate demand that exceeds the 16 MiB write buffer cap, then
// verifies every byte survived the round trip.
func testConcurrentWritesUnderCap(t *testing.T, framework *FuseTestFramework) {
	const (
		numFiles = 4
		fileSize = 8 * 1024 * 1024 // 8 MiB per file ⇒ 32 MiB total vs 16 MiB cap
	)

	dir := "write_buffer_cap_concurrent"
	framework.CreateTestDir(dir)

	payloads := make([][]byte, numFiles)
	for i := range payloads {
		buf := make([]byte, fileSize)
		_, err := rand.Read(buf)
		require.NoError(t, err)
		payloads[i] = buf
	}

	start := time.Now()
	var wg sync.WaitGroup
	errs := make(chan error, numFiles)
	timedOut := make(chan struct{}, numFiles)
	for i := 0; i < numFiles; i++ {
		i := i
		wg.Add(1)
		go func() {
			defer wg.Done()
			name := fmt.Sprintf("file_%02d.bin", i)
			path := filepath.Join(framework.GetMountPoint(), dir, name)
			done := make(chan error, 1)
			go func() { done <- os.WriteFile(path, payloads[i], 0644) }()
			select {
			case err := <-done:
				if err != nil {
					errs <- fmt.Errorf("writer %d: %w", i, err)
				}
			case <-time.After(90 * time.Second):
				timedOut <- struct{}{}
				errs <- fmt.Errorf("writer %d: timed out after 90s", i)
			}
		}()
	}
	wg.Wait()
	close(errs)
	// If any writer timed out, dump every live goroutine so CI shows the
	// wedge instead of just a walltime.
	select {
	case <-timedOut:
		t.Logf("at least one concurrent writer timed out — dumping goroutines:\n%s", dumpAllGoroutines())
	default:
	}
	for err := range errs {
		t.Fatal(err)
	}
	t.Logf("wrote %d × %d MiB under 16 MiB cap in %v", numFiles, fileSize/(1024*1024), time.Since(start))

	for i := 0; i < numFiles; i++ {
		name := fmt.Sprintf("file_%02d.bin", i)
		path := filepath.Join(framework.GetMountPoint(), dir, name)
		got, err := os.ReadFile(path)
		require.NoError(t, err, "read %s", name)
		require.Equal(t, len(payloads[i]), len(got), "size mismatch for %s", name)
		if !bytes.Equal(payloads[i], got) {
			t.Fatalf("content mismatch for %s", name)
		}
	}
}

// testLargeFileUnderCap writes a single file whose size exceeds the
// 16 MiB cap through a single handle, verifying that the pipeline
// drains its own earlier chunks and makes forward progress rather than
// self-deadlocking when the global budget is already full of its own
// earlier sealed chunks.
func testLargeFileUnderCap(t *testing.T, framework *FuseTestFramework) {
	const fileSize = 20 * 1024 * 1024 // 20 MiB ⇒ 10 chunks vs 8-slot budget

	payload := make([]byte, fileSize)
	_, err := rand.Read(payload)
	require.NoError(t, err)

	name := "write_buffer_cap_large.bin"
	path := filepath.Join(framework.GetMountPoint(), name)

	start := time.Now()
	writeWithTimeout(t, path, payload, 90*time.Second)
	t.Logf("wrote %d MiB through one handle under 16 MiB cap in %v", fileSize/(1024*1024), time.Since(start))

	got, err := os.ReadFile(path)
	require.NoError(t, err)
	require.Equal(t, len(payload), len(got))
	if !bytes.Equal(payload, got) {
		t.Fatal("content mismatch on large single-handle write")
	}
}

// testWriteBufferNoDeadlockAfterPressure verifies the mount is still
// healthy after being driven against the cap. A budget-slot leak would
// eventually cause every new chunk allocation to hang; a quick canary
// write catches that as a hard failure.
func testWriteBufferNoDeadlockAfterPressure(t *testing.T, framework *FuseTestFramework) {
	name := "write_buffer_cap_canary.txt"
	path := filepath.Join(framework.GetMountPoint(), name)
	content := []byte("write buffer cap canary — mount still healthy")

	writeWithTimeout(t, path, content, 30*time.Second)

	got, err := os.ReadFile(path)
	require.NoError(t, err)
	require.Equal(t, content, got)
}