* fix(shell): fs.mergeVolumes now rewrites manifest chunks for large files
Previously fs.mergeVolumes skipped any chunk whose IsChunkManifest flag was
true, printing "Change volume id for large file is not implemented yet" and
continuing. Because the BFS traversal only looks at top-level
entry.Chunks, sub-chunks referenced inside a manifest were never
considered either. For any file stored as a chunk manifest (large files
go this path), chunks in the source volume stayed put, leaving behind a
few MB of live data that vacuum and volume.deleteEmpty couldn't clean
up.
This change resolves each manifest chunk recursively, moves any
sub-chunk whose volume id is in the merge plan via the existing
moveChunk path, and re-serializes the manifest. If the manifest chunk
itself lives in a source volume, or any sub-chunk moved, the new
manifest blob is uploaded to a freshly assigned file id (the old
needle becomes orphaned and is reclaimed by vacuum like any other
moved chunk).
Fixes#9116.
* address review: batch UpdateEntry, fix dry-run, defer restore, avoid source volumes
- Call UpdateEntry once per entry after the chunk loop instead of once per
moved chunk (gemini nit).
- In dry-run mode, mark anySubChanged when a sub-chunk in the plan is
encountered and return changed=true after printing "rewrite manifest",
so nested manifests also surface their would-rewrites (gemini nit).
- Defer filer_pb.AfterEntryDeserialization so the manifest chunk list is
restored even when proto.Marshal fails (coderabbit nit).
- Reject AssignVolume results whose file id lands on a volume that is a
source in the merge plan, and retry — otherwise the replacement
manifest could be written to the volume being emptied (coderabbit).
* feat(shell): add fs.distributeChunks command for even chunk distribution
Add a new weed shell command that redistributes a file's chunks evenly
across volume server nodes.
Supports three distribution modes via -mode flag:
- primary: balance chunk ownership across nodes (default)
- replica: balance both ownership and replica copies
- round-robin: assign chunks by offset order for sequential read
optimization (chunk[0]->A, chunk[1]->B, chunk[2]->C, ...)
Additional options:
- -nodes=N to target specific number of nodes
- -apply to execute (dry-run by default)
Usage:
fs.distributeChunks -path=/buckets/file.dat
fs.distributeChunks -path=/buckets/file.dat -mode=round-robin -apply
fs.distributeChunks -path=/buckets/file.dat -mode=replica -apply
fs.distributeChunks -path=/buckets/file.dat -nodes=5 -apply
* fix(shell): improve fs.distributeChunks robustness and code quality
- Propagate flag parse errors instead of swallowing them (return err)
- Handle nil chunk.Fid by falling back to legacy FileId string parsing
- Simplify node membership check using slices.Contains
* fix(shell): fix dead round-robin print loop in fs.distributeChunks
The loop was computing targetNode with sc.index%totalNodes (original
chunk index) instead of the sequential position, and discarding it via
_ = targetNode without printing anything. Replace with a correct loop
using pos%totalNodes and actually print the first 12 node assignments.
* fix(shell): compute replication/collection per-chunk in fs.distributeChunks
Previously replication and collection were derived once from chunks[0]
and reused for all moves, causing wrong volume placement for chunks
belonging to different volumes or collections. Now each chunk looks up
its own volumeInfoMap entry immediately before calling operation.Assign.
* fix(shell): prefer assignResult.Auth JWT over local signing key in fs.distributeChunks
When the master returns an Auth token in the Assign response, use it
directly for the upload instead of generating a new JWT from the local
viper signing key. Fall back to local key generation only when Auth is
empty, matching the pattern used by other upload paths.
* fix(shell): add timeout and error handling to delete requests in fs.distributeChunks
The delete loop was ignoring http.NewRequest errors and had no timeout,
risking a nil-request panic or indefinite block. Replace with
http.NewRequestWithContext and a 30s timeout, handle request creation
errors by incrementing deleteFailCount, and cancel the context
immediately after Do returns.
* feat(shell): parallelize chunk moves in fs.distributeChunks using ErrorWaitGroup
Sequential chunk moves are a bottleneck for large LLM model files with
hundreds or thousands of chunks. Use ErrorWaitGroup with
DefaultMaxParallelization (10) to run download/assign/upload concurrently.
Guard movedRecords appends, chunk.Fid updates, and writer output with a
mutex. Individual chunk failures are non-fatal and logged inline; only
successfully moved chunks are included in the metadata update.
* fix(shell): try all replica URLs on download in fs.distributeChunks
Previously only the first volume server URL was attempted, causing chunk
moves to fail if that replica was unreachable. Now iterates through all
URLs returned by LookupVolumeServerUrl and stops at the first success.
* refactor(shell): apply extract method pattern to fs.distributeChunks
Do() was a single ~615-line function. Break it into focused helpers:
- lookupFileEntry: filer entry lookup
- validateChunks: chunk manifest guard
- collectVolumeTopology: master topology query + ownership mapping
- buildDistributionCounts: chunk→node mapping and owner/copy tallies
- selectActiveNodes: target node selection
- printCurrentDistribution: per-node distribution table
- planDistribution: mode-switch planning (primary/replica/round-robin)
- printRedistributionPlan: before/after plan table
- relevantNodes: active-or-occupied node filter
Do() is now ~100 lines of orchestration; each helper has a single
clear responsibility.
* test(shell): add unit tests for fs.distributeChunks algorithms
Cover all three distribution modes and supporting helpers:
- shortName, relevantNodes
- computeOwnerTarget (even/uneven split, inactive node drain)
- buildDistributionCounts (normal + nil Fid fallback)
- selectActiveNodes (all nodes / limited count)
- planOwnerMoves (imbalanced → balanced, already balanced)
- planDistribution primary (chunks balanced, no-op when even)
- planDistribution round-robin (offset ordering, correct assignment)
- planDistribution replica (owner + copy balancing)
- printRedistributionPlan (output format)
* fix(shell): add 5-minute timeout to chunk downloads in fs.distributeChunks
Download requests had no per-request timeout, unlike delete operations
which already use 30s. Replace readUrl() calls with inline
http.NewRequestWithContext + context.WithTimeout(5m) so a hung volume
server cannot block a goroutine indefinitely during redistribution.
* fix(shell): remove redundant deleteOldChunks in fs.distributeChunks
filer.UpdateEntry already calls deleteChunksIfNotNew internally, which
computes the diff between old and new entry chunks and deletes the ones
no longer referenced. Our explicit deleteOldChunks was racing with this
filer-side cleanup, causing spurious 404 warnings on ~75% of deletes.
Remove deleteOldChunks, movedChunkRecord type, and reduce
executeChunkMoves return type to (int, error) for the moved count.
* fix(shell): handle nil chunk.Fid via chunkVolumeId helper in fs.distributeChunks
chunk.Fid.GetVolumeId() silently returns 0 for legacy chunks stored with
a FileId string instead of a Fid struct, causing them to be skipped in
the replica balancing loop and looked up incorrectly in volumeInfoMap.
Introduce chunkVolumeId() that uses Fid when present and falls back to
parsing the legacy FileId string, matching the logic in
buildDistributionCounts. Apply it in the replica-mode copies loop and
in executeChunkMoves' replication/collection lookup.
* fix(shell): use already-parsed oldFid for volumeInfoMap lookup in fs.distributeChunks
chunkVolumeId(chunk) was being called to look up replication/collection
after oldFid had already been parsed and validated. Use oldFid.VolumeId
directly to avoid redundant parsing and guarantee the correct volume ID
regardless of whether chunk.Fid is nil.
* fix(shell): improve correctness and robustness in fs.distributeChunks
- Buffer download body before upload so dlCtx timeout only covers the
GET request; upload runs with context.Background() via bytes.NewReader
- Replace 'before, after := strings.Cut(...)' + '_ = before' with '_'
as the first return value directly
- Clone copiesCount before replica planner mutates it, keeping the
caller's map immutable
- Add nil-entry guard after filer LookupEntry to prevent panic on
unexpected nil response
* feat(shell): support chunk manifests in fs.distributeChunks
Large files stored as chunk manifests were previously rejected. Resolve
manifests up front via filer.ResolveChunkManifest, redistribute the
underlying data chunks, then re-pack through filer.MaybeManifestize
before UpdateEntry. The filer's MinusChunks resolves manifests on both
sides of the diff, so old manifest and inner data chunks are GC'd
automatically.
* fix(shell): match master's SaveDataAsChunkFunctionType 5-param signature
Master added expectedDataSize uint64; ignore it in shell-side saveAsChunk.
---------
Co-authored-by: Chris Lu <chris.lu@gmail.com>
* fix(wdclient,volume): compare master leader with ServerAddress.Equals
Raft leader is advertised as host:httpPort.grpcPort, but clients dial
host:httpPort. Raw string comparison against VolumeLocation.Leader /
HeartbeatResponse.Leader therefore never matches, causing the
masterclient and the volume server heartbeat loop to continuously
"redirect" to the already-connected master, tearing down the stream
and reconnecting.
Use ServerAddress.Equals, which normalizes the grpc-port suffix.
* fix(filer,mq): compare ServerAddress via Equals in two more sites
filer bootstrap skip (MaybeBootstrapFromOnePeer) and the broker's local
partition assignment check both compared a wire-supplied address string
against the local self ServerAddress with raw string equality. Both are
vulnerable to the same plain-vs-host:port.grpcPort mismatch as the
masterclient/volume heartbeat sites: filer would bootstrap from itself,
and the broker would fail to claim a partition it was actually assigned.
Route both through ServerAddress.Equals.
* fix(master,shell): more ServerAddress comparisons via Equals
- raft_server_handlers.go HealthzHandler: s.serverAddr == leader would
skip the child-lock check on the real leader when the two carry
different plain/grpc-suffix forms, returning 200 OK instead of 423.
- master_server.go SetRaftServer leader-change callback: the
Leader() == Name() guard for ensureTopologyId could disagree with
topology.IsLeader() (which already uses Equals), so leader-only
initialization could be skipped after an election.
- command_volume_merge.go isReplicaServer: the -target guard compared
user-supplied host:port against NewServerAddressFromDataNode(...) with
==, letting an existing replica slip through when topology carries
the embedded gRPC port.
All routed through pb.ServerAddress.Equals.
* fix(mq,cluster): more ServerAddress comparisons via Equals
- broker_grpc_lookup.go GetTopicPublishers/GetTopicSubscribers: the
partition ownership check gated listing on raw
LeaderBroker == BrokerAddress().String(), so listings silently omitted
partitions hosted locally when the assignment carried the other
host:port / host:port.grpcPort form.
- lock_client.go: LockHostMovedTo comparison and the seedFiler fallback
guard both used raw string equality against configured filer
addresses (which may be plain host:port while LockHostMovedTo comes
back suffixed), causing spurious host-change churn and blocking the
seed-filer fallback.
* fix(mq): more ServerAddress comparisons via Equals
- pub_balancer/allocate.go EnsureAssignmentsToActiveBrokers: direct
activeBrokers.Get() lookup missed brokers when a persisted assignment
carried a different address encoding than the registered broker key,
triggering a bogus reassignment on every read/write cycle. Added a
findActiveBroker helper that falls back to an Equals-based scan and
canonicalizes the assignment in place so later writes are stable.
- broker_grpc_lookup.go isLockOwner: used raw string equality between
LockOwner() and BrokerAddress().String(), so a lock owner could fail
to recognize itself and proxy local lookup/config/admin RPCs away.
- pub_client/scheduler.go onEachAssignments: reused publisher jobs only
on exact LeaderBroker match, so an encoding flip in lookup results
tore down and recreated a stream to the same broker.
* fix(shell): s3.user.provision handles existing users by attaching policy
Instead of erroring when the user already exists, the command now
creates the policy and attaches it to the existing user via UpdateUser.
Credentials are only generated and displayed for newly created users.
* fix(shell): skip duplicate policy attachment in s3.user.provision
Check if the policy is already attached before appending and calling
UpdateUser, making repeated runs idempotent.
* fix(shell): generate service account ID in s3.serviceaccount.create
The command built a ServiceAccount proto without setting Id, which was
rejected by credential.ValidateServiceAccountId on any real store. Now
generates sa:<parent>:<uuid> matching the format used by the admin UI.
* test(s3): integration tests for s3.* shell commands
Adds TestShell* integration tests covering ~40 previously untested
shell commands: user, accesskey, group, serviceaccount, anonymous,
bucket, policy.attach/detach, config.show, and iam.export/import.
Switches the test cluster's credential store from memory to filer_etc
because the memory store silently drops groups and service accounts
in LoadConfiguration/SaveConfiguration.
* fix(shell): rollback policy on key generation failure in s3.user.provision
If iam.GenerateRandomString or iam.GenerateSecretAccessKey fails after
the policy was persisted, the policy would be left orphaned. Extracts
the rollback logic into a local closure and invokes it on all failure
paths after policy creation for consistency.
* address PR review feedback for s3 shell tests and serviceaccount
- s3.serviceaccount.create: use 16 bytes of randomness (hex-encoded) for
the service account UUID instead of 4 bytes to eliminate collision risk
- s3.serviceaccount.create: print the actual ID and drop the outdated
"server-assigned" note (the ID is now client-generated)
- tests: guard createdAK in accesskey rotate/delete subtests so sibling
failures don't run invalid CLI calls
- tests: requireContains/requireNotContains use t.Fatalf to fail fast
- tests: Provision subtest asserts the "Attached policy" message on the
second provision call for an existing user
- tests: update extractServiceAccountID comment example to match the
sa:<parent>:<uuid> format
- tests: drop redundant saID empty-check (extractServiceAccountID fatals)
* test(s3): use t.Fatalf for precondition check in serviceaccount test
* fix(weed/shell): suppress prompt when stdin or stdout is not a TTY
When piping weed shell output (e.g. `echo "s3.user.list" | weed shell | jq`),
the "> " prompt was written to stdout, breaking JSON parsers.
`liner.TerminalSupported()` only checks platform support, not whether
stdin/stdout are actual TTYs. Add explicit checks using `term.IsTerminal()`
so the shell falls back to the non-interactive scanner path when piped.
Fixes#8962
* fix(weed/shell): suppress informational logs unless -verbose is set
Suppress glog info messages and connection status logs on stderr by
default. Add -verbose flag to opt in to the previous noisy behavior.
This keeps piped output clean (e.g. `echo "s3.user.list" | weed shell | jq`).
* fix(weed/shell): defer liner init until after TTY check
Move liner.NewLiner() and related setup (history, completion, interrupt
handler) inside the interactive block so the terminal is not put into
raw mode when stdout is redirected. Previously, liner would set raw mode
unconditionally at startup, leaving the terminal broken when falling
back to the scanner path.
Addresses review feedback from gemini-code-assist.
* refactor(weed/shell): consolidate verbose logging into single block
Group all verbose stderr output within one conditional block instead of
scattering three separate if-verbose checks around the filer logic.
Addresses review feedback from gemini-code-assist.
* fix(weed/shell): clean up global liner state and suppress logtostderr
- Set line=nil after Close() to prevent stale state if RunShell is
called again (e.g. in tests)
- Add nil check in OnInterrupt handler for non-interactive sessions
- Also set logtostderr=false when not verbose, in case it was enabled
Addresses review feedback from gemini-code-assist.
* refactor(weed/shell): make liner state local to eliminate data race
Replace the package-level `line` variable with a local variable in
RunShell, passing it explicitly to setCompletionHandler, loadHistory,
and saveHistory. This eliminates a data race between the OnInterrupt
goroutine and the defer that previously set the global to nil.
Addresses review feedback from gemini-code-assist.
* rename(weed/shell): rename -verbose flag to -debug
Avoid conflict with -verbose flags already used by individual shell
commands (e.g. ec.encode, volume.fix.replication, volume.check.disk).
* shell: s3.* commands output JSON, connection messages to stderr
All s3.user.* and s3.policy.attach|detach commands now output structured
JSON to stdout instead of human-readable text:
- s3.user.create: {"name","access_key"} (secret key to stderr only)
- s3.user.list: [{name,status,policies,keys}]
- s3.user.show: {name,status,source,account,policies,credentials,...}
- s3.user.delete: {"name"}
- s3.user.enable/disable: {"name","status"}
- s3.policy.attach/detach: {"policy","user"}
Connection startup messages (master/filer) moved to stderr so they
don't pollute structured output when piping.
Closes#8962 (partial — covers merged s3.user/policy commands).
* shell: fix secret leak, duplicate JSON output, and non-interactive prompt
- s3.user.create: only echo secret key to stderr when auto-generated,
never echo caller-supplied secrets
- s3.user.enable/disable: fix duplicate JSON output — remove inner
write in early-return path, keep single write site after gRPC call
- shell_liner: use bufio.Scanner when stdin is not a terminal instead
of liner.Prompt, suppressing the "> " prompt in piped mode
* shell: check scanner error, idempotent enable output, history errors to stderr
- Check scanner.Err() after non-interactive input loop to surface read errors
- s3.user.enable: always emit JSON regardless of current state (idempotent)
- saveHistory: write error messages to stderr instead of stdout
* shell: add s3.iam.*, s3.config.show, s3.user.provision; hide legacy commands
Add import/export, configuration summary, and a convenience provisioning
command:
- s3.iam.export: dump full IAM state as JSON (stdout or file)
- s3.iam.import: replace IAM state from a JSON file
- s3.config.show: human-readable summary (users, policies, service
accounts, groups with status and counts)
- s3.user.provision: one-step user+policy+credentials creation for
common readonly/readwrite/admin roles
Hide legacy commands from help listing:
- s3.configure: still works but hidden from help output
- s3.bucket.access: still works but hidden from help output
Both hidden commands remain fully functional for existing scripts.
Also adds a Hidden command tag and filters it from printGenericHelp.
* shell: address review feedback for s3.iam.*, s3.config.show, s3.user.provision
- Simplify joinMax using strings.Join
- Fix rolePolicies: remove s3:ListBucket from object-level actions
(already covered by bucket-level statement)
- Fix admin role: grant s3:* on bucket resource too
- Return flag parse errors instead of swallowing them
* shell: address missed review feedback for PR 3
- s3.iam.import: require -force flag for destructive IAM overwrite
- s3.config.show: add nil guard for resp.Configuration
- s3.user.provision: check if user exists before creating policy
- s3.user.provision: reject wildcard bucket names (* ?)
* shell: distinguish NotFound from transient errors in provision, use %w wrapping
- s3.user.provision: check gRPC status code on GetUser error — only
proceed on NotFound, abort on transient/network errors
- s3.iam.import: use %w for error wrapping to preserve error chains,
wrap PutConfiguration error with context
* shell: remove duplicate joinMax after PR 8954 merge
command_s3_helpers.go defined joinMax which is already in
command_s3_user_list.go from the merged PR 8954.
* shell: restrict export file permissions, rollback policy on user create failure
- s3.iam.export: use os.OpenFile with mode 0600 instead of os.Create
to protect exported credentials from other users
- s3.user.provision: rollback the created policy if CreateUser fails,
with a warning if the rollback itself fails
* shell: add s3.user.* and s3.policy.attach|detach commands
Add focused IAM shell commands following a noun-verb model:
- s3.user.create: create user with auto-generated or explicit credentials
- s3.user.list: tabular listing with status, policies, key count
- s3.user.show: detailed user view (status, source, policies, credentials)
- s3.user.delete: delete a user
- s3.user.enable: enable a disabled user
- s3.user.disable: disable a user (preserves credentials and policies)
- s3.policy.attach: attach a named policy to a user
- s3.policy.detach: detach a policy from a user
These commands are thin wrappers over the existing IAM gRPC service,
producing human-readable output instead of raw protobuf text.
This is part of a larger effort to replace the monolithic s3.configure
command with a composable set of single-purpose commands.
* shell: address review feedback for s3.user.* and s3.policy.attach|detach
- Return flag parse errors instead of swallowing them (all commands)
- Use GetConfiguration instead of N+1 GetUser calls in s3.user.list
- Add nil check for resp.Identity in s3.user.show
- Fix GetPolicy error masking in s3.policy.attach (wrap original error)
- Simplify joinMax using strings.Join
* shell: add nil identity guards and wrap gRPC errors
- Add nil check for resp.Identity in policy_attach, policy_detach,
user_enable, user_disable
- Wrap GetUser errors with user context for better diagnostics
* shell: add s3.accesskey.*, s3.anonymous.*, s3.serviceaccount.* commands
Add credential, anonymous access, and service account management commands:
Access key commands:
- s3.accesskey.create: add credentials to an existing user
- s3.accesskey.list: list access keys for a user (key ID + status)
- s3.accesskey.delete: remove a specific access key
- s3.accesskey.rotate: atomic create-new + delete-old key rotation
Anonymous access commands:
- s3.anonymous.set: set/remove public access on a bucket
- s3.anonymous.get: show anonymous access for a bucket
- s3.anonymous.list: list all buckets with anonymous access
Service account commands:
- s3.serviceaccount.create: create with optional action subset and expiry
- s3.serviceaccount.list: tabular listing, optionally filtered by parent
- s3.serviceaccount.show: detailed view of a service account
- s3.serviceaccount.delete: remove a service account
These replace the credential and anonymous portions of the monolithic
s3.configure and s3.bucket.access commands.
* shell: address review feedback for s3.accesskey.*, s3.anonymous.*, s3.serviceaccount.*
- Return flag parse errors instead of swallowing them (all commands)
- Add action validation in s3.anonymous.set (Read, Write, List, Tagging, Admin)
- Fix s3.serviceaccount.create output: note to use list for server-assigned ID
since CreateServiceAccountResponse does not return the ID
* shell: fix bucket matching and action validation in s3.anonymous.*
- Use SplitN instead of HasSuffix for bucket name matching to avoid
false positives when one bucket name is a suffix of another
- Make action validation case-insensitive with canonical normalization
* shell: fix nil panics, dedup actions, validate service account actions
- Fix nil-pointer panic in getOrCreateAnonymousUser when GetUser returns
err==nil with nil Identity (status.FromError(nil) returns nil status)
- Add nil Identity guards in s3.anonymous.get and s3.anonymous.list
- Deduplicate action values in s3.anonymous.set (e.g. -access Read,Read)
- Add action validation in s3.serviceaccount.create with case normalization
* shell: dedup actions and reject negative expiry in s3.serviceaccount.create
- Deduplicate -actions values (e.g. Read,read,Read produces one entry)
- Reject negative -expiry values instead of silently treating as no expiration
* volume.tier.move: fulfill target replication before deleting old replicas
When -toReplication is specified, volume.tier.move now creates all
required replicas on the destination tier before deleting old replicas.
This closes the data-loss window where only one copy existed on the
target tier while awaiting volume.fix.replication.
If replication fulfillment fails, old replicas are preserved and marked
writable so the volume remains accessible.
Also extracts replicateVolumeToServer and configureVolumeReplication
helpers to reduce duplication across volume.tier.move and
volume.fix.replication.
Fixes#8937
* volume.tier.move: always fulfill replication before deleting old replicas
When -toReplication is specified, use that replication setting.
Otherwise, read the volume's existing replication from the super block.
In both cases, all required replicas are created on the destination
tier before old replicas are deleted.
If replication fulfillment fails (e.g. not enough destination nodes),
old replicas are preserved and marked writable so no data is lost.
* volume.tier.move: address review feedback on ensureReplicationFulfilled
- Add 5s delay before re-collecting topology to allow master heartbeat
propagation after the move
- Add nil guard for targetTierReplicas to prevent panic if the moved
replica is not yet visible in the topology
- Treat configureVolumeReplication failure as a hard error instead of a
warning, so the rollback logic preserves old replicas
* volume.tier.move: harden replication config error handling
- Make configureVolumeReplication failure on the primary moved replica a
hard error that aborts the move, instead of logging and continuing
- Configure replication metadata on all existing target-tier replicas
(not just newly created ones) when -toReplication is specified
- Deletion of old replicas cannot affect new replicas since the
locations list only contains pre-move servers (verified, no change)
* volume.tier.move: fix cleanup deleting fulfilled replicas and broken recovery
Fix 1: The cleanup loop now preserves pre-existing target-tier replicas
that ensureReplicationFulfilled counted toward the replication target.
Previously, a mixed-tier volume with an existing replica on the target
tier could have that replica deleted right after being counted as
fulfilled, leaving the volume under-replicated.
ensureReplicationFulfilled now returns a preserveServers set that the
deletion loop checks before removing any old replica.
Fix 2: Failure paths after LiveMoveVolume (which deletes the source
replica) now use restoreSurvivingReplicasWritable instead of
markVolumeReplicasWritable. The old helper stopped on first error, so
attempting to mark the already-deleted source writable would prevent
all surviving replicas from being restored. The new helper skips the
deleted source and continues through all remaining locations, logging
per-replica errors instead of aborting.
* volume.tier.move: mark preserved replicas writable, skip nodes with existing volume
Fix 1: Preserved pre-existing target-tier replicas were left read-only
after the move completed. They were marked read-only at the start
(along with all other replicas) but never restored since the old code
deleted them. Now they are explicitly marked writable before cleanup.
Fix 2: The fulfillment loop could pick a candidate node that already
hosts this volume on a different disk type, causing a VolumeCopy
conflict. Added a guard that skips any node already hosting the volume
(on any disk) before attempting replication.
* chore: remove unreachable dead code across the codebase
Remove ~50,000 lines of unreachable code identified by static analysis.
Major removals:
- weed/filer/redis_lua: entire unused Redis Lua filer store implementation
- weed/wdclient/net2, resource_pool: unused connection/resource pool packages
- weed/plugin/worker/lifecycle: unused lifecycle plugin worker
- weed/s3api: unused S3 policy templates, presigned URL IAM, streaming copy,
multipart IAM, key rotation, and various SSE helper functions
- weed/mq/kafka: unused partition mapping, compression, schema, and protocol functions
- weed/mq/offset: unused SQL storage and migration code
- weed/worker: unused registry, task, and monitoring functions
- weed/query: unused SQL engine, parquet scanner, and type functions
- weed/shell: unused EC proportional rebalance functions
- weed/storage/erasure_coding/distribution: unused distribution analysis functions
- Individual unreachable functions removed from 150+ files across admin,
credential, filer, iam, kms, mount, mq, operation, pb, s3api, server,
shell, storage, topology, and util packages
* fix(s3): reset shared memory store in IAM test to prevent flaky failure
TestLoadIAMManagerFromConfig_EmptyConfigWithFallbackKey was flaky because
the MemoryStore credential backend is a singleton registered via init().
Earlier tests that create anonymous identities pollute the shared store,
causing LookupAnonymous() to unexpectedly return true.
Fix by calling Reset() on the memory store before the test runs.
* style: run gofmt on changed files
* fix: restore KMS functions used by integration tests
* fix(plugin): prevent panic on send to closed worker session channel
The Plugin.sendToWorker method could panic with "send on closed channel"
when a worker disconnected while a message was being sent. The race was
between streamSession.close() closing the outgoing channel and sendToWorker
writing to it concurrently.
Add a done channel to streamSession that is closed before the outgoing
channel, and check it in sendToWorker's select to safely detect closed
sessions without panicking.
* fix(s3api): fix AWS Signature V2 format and validation
* fix(s3api): Skip space after "AWS" prefix (+1 offset)
* test(s3api): add unit tests for Signature V2 authentication fix
* fix(s3api): simply comparing signatures
* validation for the colon extraction in expectedAuth
* fix(shell): avoid marking skipped or unplaced volumes as fixed
---------
Co-authored-by: chrislu <chris.lu@gmail.com>
Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>
* Process .ecj deletions during EC decode and vacuum decoded volume (#8798)
When decoding EC volumes back to normal volumes, deletions recorded in
the .ecj journal were not being applied before computing the dat file
size or checking for live needles. This caused the decoded volume to
include data for deleted files and could produce false positives in the
all-deleted check.
- Call RebuildEcxFile before HasLiveNeedles/FindDatFileSize in
VolumeEcShardsToVolume so .ecj deletions are merged into .ecx first
- Vacuum the decoded volume after mounting in ec.decode to compact out
deleted needle data from the .dat file
- Add integration tests for decoding with non-empty .ecj files
* storage: add offline volume compaction helper
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* ec: compact decoded volumes before deleting shards
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* ec: address PR review comments
- Fall back to data directory for .ecx when idx directory lacks it
- Make compaction failure non-fatal during EC decode
- Remove misleading "buffer: 10%" from space check error message
* ec: collect .ecj from all shard locations during decode
Each server's .ecj only contains deletions for needles whose data
resides in shards held by that server. Previously, sources with no
new data shards to contribute were skipped entirely, losing their
.ecj deletion entries. Now .ecj is always appended from every shard
location so RebuildEcxFile sees the full set of deletions.
* ec: add integration tests for .ecj collection during decode
TestEcDecodePreservesDeletedNeedles: verifies that needles deleted
via VolumeEcBlobDelete are excluded from the decoded volume.
TestEcDecodeCollectsEcjFromPeer: regression test for the fix in
collectEcShards. Deletes a needle only on a peer server that holds
no new data shards, then verifies the deletion survives decode via
.ecj collection.
* ec: address review nits in decode and tests
- Remove double error wrapping in mountDecodedVolume
- Check VolumeUnmount error in peer ecj test
- Assert 404 specifically for deleted needles, fail on 5xx
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix ec.balance failing to rebalance when all nodes share all volumes (#8793)
Two bugs in doBalanceEcRack prevented rebalancing:
1. Sorting by freeEcSlot instead of actual shard count caused incorrect
empty/full node selection when nodes have different total capacities.
2. The volume-level check skipped any volume already present on the
target node. When every node has a shard of every volume (common
with many EC volumes across N nodes with N shards each), no moves
were possible.
Fix: sort by actual shard count, and use a two-pass approach - first
prefer moving shards of volumes not on the target (best diversity),
then fall back to moving specific shard IDs not yet on the target.
* add test simulating real cluster topology from issue #8793
Uses the actual node addresses and mixed max capacities (80 vs 33)
from the reporter's 14-node cluster to verify ec.balance correctly
rebalances with heterogeneous node sizes.
* fix pass comments to match 0-indexed loop variable
* Give the `ScrubVolume()` RPC an option to flag found broken volumes as read-only.
Also exposes this option in the shell `volume.scrub` command.
* Remove redundant test in `TestVolumeMarkReadonlyWritableErrorPaths`.
417051bb slightly rearranges the logic for `VolumeMarkReadonly()` and `VolumeMarkWritable()`,
so calling them for invalid volume IDs will actually yield that error, instead of checking
maintnenance mode first.
* shell: add s3.bucket.access command for anonymous access policy (#7738)
Add a new weed shell command to view or change the anonymous access
policy of an S3 bucket without external tools.
Usage:
s3.bucket.access -name <bucket> -access read,list
s3.bucket.access -name <bucket> -access none
Supported permissions: read, write, list. The command writes a standard
bucket policy with Principal "*" and warns if no anonymous IAM identity
exists.
* shell: fix anonymous identity hint in s3.bucket.access warning
The anonymous identity doesn't need IAM actions — the bucket policy
controls what anonymous users can do.
* shell: only warn about anonymous identity when write access is set
Read and list operations use AuthWithPublicRead which evaluates bucket
policies directly without requiring the anonymous identity. Only write
operations go through the normal auth flow that needs it.
* shell: rewrite s3.bucket.access to use IAM actions instead of bucket policies
Replace the bucket policy approach with direct IAM identity actions,
matching the s3.configure pattern. The user is auto-created if it does
not exist.
Usage:
s3.bucket.access -name <bucket> -user anonymous -access Read,List
s3.bucket.access -name <bucket> -user anonymous -access none
s3.bucket.access -name <bucket> -user anonymous
Actions are stored as "Action:bucket" on the identity, same as
s3.configure -actions=Read -buckets=my-bucket.
* shell: return flag parse errors instead of swallowing them
* shell: normalize action names case-insensitively in s3.bucket.access
Accept actions in any case (read, READ, Read) and normalize to canonical
form (Read, Write, List, etc.) before storing. This matches the
case-insensitive handling of "none" and avoids confusing rejections.
* feat(shell): add volume.tier.compact command to reclaim cloud storage space
Adds a new shell command that automates compaction of cloud tier volumes.
When files are deleted from remote-tiered volumes, space is not reclaimed
on the cloud storage. This command orchestrates: download from remote,
compact locally, and re-upload to reclaim deleted space.
Closes#8563
* fix: log cleanup errors in compactVolumeOnServer instead of discarding them
Helps operators diagnose leftover temp files (.cpd/.cpx) if cleanup
fails after a compaction or commit failure.
* fix: return aggregate error from loop and use regex for collection filter
- Track and return error count when one or more volumes fail to compact,
so callers see partial failures instead of always getting nil.
- Use compileCollectionPattern for -collection in -volumeId mode too, so
regex patterns work consistently with the flag description. Empty
pattern (no -collection given) matches all collections.
* improve large file sync throughput for remote.cache and filer.sync
Three main throughput improvements:
1. Adaptive chunk sizing for remote.cache: targets ~32 chunks per file
instead of always starting at 5MB. A 500MB file now uses ~16MB chunks
(32 chunks) instead of 5MB chunks (100 chunks), reducing per-chunk
overhead (volume assign, gRPC call, needle write) by 3x.
2. Configurable concurrency at every layer:
- remote.cache chunk concurrency: -chunkConcurrency flag (default 8)
- remote.cache S3 download concurrency: -downloadConcurrency flag
(default raised from 1 to 5 per chunk)
- filer.sync chunk concurrency: -chunkConcurrency flag (default 32)
3. S3 multipart download concurrency raised from 1 to 5: the S3 manager
downloader was using Concurrency=1, serializing all part downloads
within each chunk. This alone can 5x per-chunk download speed.
The concurrency values flow through the gRPC request chain:
shell command → CacheRemoteObjectToLocalClusterRequest →
FetchAndWriteNeedleRequest → S3 downloader
Zero values in the request mean "use server defaults", maintaining
full backward compatibility with existing callers.
Ref #8481
* fix: use full maxMB for chunk size cap and remove loop guard
Address review feedback:
- Use full maxMB instead of maxMB/2 for maxChunkSize to avoid
unnecessarily limiting chunk size for very large files.
- Remove chunkSize < maxChunkSize guard from the safety loop so it
can always grow past maxChunkSize when needed to stay under 1000
chunks (e.g., extremely large files with small maxMB).
* address review feedback: help text, validation, naming, docs
- Fix help text for -chunkConcurrency and -downloadConcurrency flags
to say "0 = server default" instead of advertising specific numeric
defaults that could drift from the server implementation.
- Validate chunkConcurrency and downloadConcurrency are within int32
range before narrowing, returning a user-facing error if out of range.
- Rename ReadRemoteErr to readRemoteErr to follow Go naming conventions.
- Add doc comment to SetChunkConcurrency noting it must be called
during initialization before replication goroutines start.
- Replace doubling loop in chunk size safety check with direct
ceil(remoteSize/1000) computation to guarantee the 1000-chunk cap.
* address Copilot review: clamp concurrency, fix chunk count, clarify proto docs
- Use ceiling division for chunk count check to avoid overcounting
when file size is an exact multiple of chunk size.
- Clamp chunkConcurrency (max 1024) and downloadConcurrency (max 1024
at filer, max 64 at volume server) to prevent excessive goroutines.
- Always use ReadFileWithConcurrency when the client supports it,
falling back to the implementation's default when value is 0.
- Clarify proto comments that download_concurrency only applies when
the remote storage client supports it (currently S3).
- Include specific server defaults in help text (e.g., "0 = server
default 8") so users see the actual values in -h output.
* fix data race on executionErr and use %w for error wrapping
- Protect concurrent writes to executionErr in remote.cache worker
goroutines with a sync.Mutex to eliminate the data race.
- Use %w instead of %v in volume_grpc_remote.go error formatting
to preserve the error chain for errors.Is/errors.As callers.
* fix(ec): gather shards from all disk locations before rebuild (#8631)
Fix "too few shards given" error during ec.rebuild on multi-disk volume
servers. The root cause has two parts:
1. VolumeEcShardsRebuild only looked at a single disk location for shard
files. On multi-disk servers, the existing local shards could be on one
disk while copied shards were placed on another, causing the rebuild to
see fewer shards than actually available.
2. VolumeEcShardsCopy had a DiskId condition (req.DiskId == 0 &&
len(vs.store.Locations) > 0) that was always true, making the
FindFreeLocation fallback dead code. This meant copies always went to
Locations[0] regardless of where existing shards were.
Changes:
- VolumeEcShardsRebuild now finds the location with the most shards,
then gathers shard files from other locations via hard links (or
symlinks for cross-device) before rebuilding. Gathered files are
cleaned up after rebuild.
- VolumeEcShardsCopy now only uses Locations[DiskId] when DiskId > 0
(explicitly set). Otherwise, it prefers the location that already has
the EC volume, falling back to HDD then any free location.
- generateMissingEcFiles now logs shard counts and provides a clear
error message when not enough shards are found, instead of passing
through to the opaque reedsolomon "too few shards given" error.
* fix(ec): update test to match skip behavior for unrepairable volumes
The test expected an error for volumes with insufficient shards, but
commit 5acb4578a changed unrepairable volumes to be skipped with a log
message instead of returning an error. Update the test to verify the
skip behavior and log output.
* fix(ec): address PR review comments
- Add comment clarifying DiskId=0 means "not specified" (protobuf default),
callers must use DiskId >= 1 to target a specific disk.
- Log warnings on cleanup failures for gathered shard links.
* fix(ec): read shard files from other disks directly instead of linking
Replace the hard link / symlink gathering approach with passing
additional search directories into RebuildEcFiles. The rebuild
function now opens shard files directly from whichever disk they
live on, avoiding filesystem link operations and cleanup.
RebuildEcFiles and RebuildEcFilesWithContext gain a variadic
additionalDirs parameter (backward compatible with existing callers).
* fix(ec): clarify DiskId selection semantics in VolumeEcShardsCopy comment
* fix(ec): avoid empty files on failed rebuild; don't skip ecx-only locations
- generateMissingEcFiles: two-pass approach — first discover present/missing
shards and check reconstructability, only then create output files. This
avoids leaving behind empty truncated shard files when there are too few
shards to rebuild.
- VolumeEcShardsRebuild: compute hasEcx before skipping zero-shard locations.
A location with an .ecx file but no shard files (all shards on other disks)
is now a valid rebuild candidate instead of being silently skipped.
* fix(ec): select ecx-only location as rebuildLocation when none chosen yet
When rebuildLocation is nil and a location has hasEcx=true but
existingShardCount=0 (all shards on other disks), the condition
0 > 0 was false so it was never promoted to rebuildLocation.
Add rebuildLocation == nil to the predicate so the first location
with an .ecx file is always selected as a candidate.
* Fix ec.rebuild failing on unrepairable volumes instead of skipping them
When an EC volume has fewer shards than DataShardsCount, ec.rebuild would
return an error and abort the entire operation. Now it logs a warning and
continues rebuilding the remaining volumes.
Fixes#8630
* Remove duplicate volume ID in unrepairable log message
---------
Co-authored-by: Copilot <copilot@github.com>
* feat(filer): add lazy directory listing for remote mounts
Directory listings on remote mounts previously only queried the local
filer store. With lazy mounts the listing was empty; with eager mounts
it went stale over time.
Add on-demand directory listing that fetches from remote and caches
results with a 5-minute TTL:
- Add `ListDirectory` to `RemoteStorageClient` interface (delimiter-based,
single-level listing, separate from recursive `Traverse`)
- Implement in S3, GCS, and Azure backends using each platform's
hierarchical listing API
- Add `maybeLazyListFromRemote` to filer: before each directory listing,
check if the directory is under a remote mount with an expired cache,
fetch from remote, persist entries to the local store, then let existing
listing logic run on the populated store
- Use singleflight to deduplicate concurrent requests for the same directory
- Skip local-only entries (no RemoteEntry) to avoid overwriting unsynced uploads
- Errors are logged and swallowed (availability over consistency)
* refactor: extract xattr key to constant xattrRemoteListingSyncedAt
* feat: make listing cache TTL configurable per mount via listing_cache_ttl_seconds
Add listing_cache_ttl_seconds field to RemoteStorageLocation protobuf.
When 0 (default), lazy directory listing is disabled for that mount.
When >0, enables on-demand directory listing with the specified TTL.
Expose as -listingCacheTTL flag on remote.mount command.
* refactor: address review feedback for lazy directory listing
- Add context.Context to ListDirectory interface and all implementations
- Capture startTime before remote call for accurate TTL tracking
- Simplify S3 ListDirectory using ListObjectsV2PagesWithContext
- Make maybeLazyListFromRemote return void (errors always swallowed)
- Remove redundant trailing-slash path manipulation in caller
- Update tests to match new signatures
* When an existing entry has Remote != nil, we should merge remote metadata into it rather than replacing it.
* fix(gcs): wrap ListDirectory iterator error with context
The raw iterator error was returned without bucket/path context,
making it harder to debug. Wrap it consistently with the S3 pattern.
* fix(s3): guard against nil pointer dereference in Traverse and ListDirectory
Some S3-compatible backends may return nil for LastModified, Size, or
ETag fields. Check for nil before dereferencing to prevent panics.
* fix(filer): remove blanket 2-minute timeout from lazy listing context
Individual SDK operations (S3, GCS, Azure) already have per-request
timeouts and retry policies. The blanket timeout could cut off large
directory listings mid-operation even though individual pages were
succeeding.
* fix(filer): preserve trace context in lazy listing with WithoutCancel
Use context.WithoutCancel(ctx) instead of context.Background() so
trace/span values from the incoming request are retained for
distributed tracing, while still decoupling cancellation.
* fix(filer): use Store.FindEntry for internal lookups, add Uid/Gid to files, fix updateDirectoryListingSyncedAt
- Use f.Store.FindEntry instead of f.FindEntry for staleness check and
child lookups to avoid unnecessary lazy-fetch overhead
- Set OS_UID/OS_GID on new file entries for consistency with directories
- In updateDirectoryListingSyncedAt, use Store.UpdateEntry for existing
directories instead of CreateEntry to avoid deleteChunksIfNotNew and
NotifyUpdateEvent side effects
* fix(filer): distinguish not-found from store errors in lazy listing
Previously, any error from Store.FindEntry was treated as "not found,"
which could cause entry recreation/overwrite on transient DB failures.
Now check for filer_pb.ErrNotFound explicitly and skip entries or
bail out on real store errors.
* refactor(filer): use errors.Is for ErrNotFound comparisons
* feat(remote): add -noSync flag to skip upfront metadata pull on mount
Made-with: Cursor
* refactor(remote): split mount setup from metadata sync
Extract ensureMountDirectory for create/validate; call pullMetadata
directly when sync is needed. Caller controls sync step for -noSync.
Made-with: Cursor
* fix(remote): validate mount root when -noSync so bad bucket/creds fail fast
When -noSync is used, perform a cheap remote check (ListBuckets and
verify bucket exists) instead of skipping all remote I/O. Invalid
buckets or credentials now fail at mount time.
Made-with: Cursor
* test(remote): add TestRemoteMountNoSync for -noSync mount and persisted mapping
Made-with: Cursor
* test(remote): assert no upfront metadata after -noSync mount
After remote.mount -noSync, run fs.ls on the mount dir and assert empty
listing so the test fails if pullMetadata was invoked eagerly.
Made-with: Cursor
* fix(remote): propagate non-ErrNotFound lookup errors in ensureMountDirectory
Return lookupErr immediately for any LookupDirectoryEntry failure that
is not filer_pb.ErrNotFound, so only the not-found case creates the
entry and other lookup failures are reported to the caller.
Made-with: Cursor
* fix(remote): use errors.Is for ErrNotFound in ensureMountDirectory
Replace fragile strings.Contains(lookupErr.Error(), ...) with
errors.Is(lookupErr, filer_pb.ErrNotFound) before calling CreateEntry.
Made-with: Cursor
* fix(remote): use LookupEntry so ErrNotFound is recognised after gRPC
Raw gRPC LookupDirectoryEntry returns a status error, not the sentinel,
so errors.Is(lookupErr, filer_pb.ErrNotFound) was always false. Use
filer_pb.LookupEntry which normalises not-found to ErrNotFound so the
mount directory is created when missing.
Made-with: Cursor
* test(remote): ignore weed shell banner in TestRemoteMountNoSync fs.ls count
Exclude master/filer and prompt lines from entry count so the assertion
checks only actual fs.ls output for empty -noSync mount.
Made-with: Cursor
* fix(remote.mount): use 0755 for mount dir, document bucket-less early return
Made-with: Cursor
* feat(remote.mount): replace -noSync with -metadataStrategy=lazy|eager
- Add -metadataStrategy flag (eager default, lazy skips upfront metadata pull)
- Accept lazy/eager case-insensitively; reject invalid values with clear error
- Rename TestRemoteMountNoSync to TestRemoteMountMetadataStrategyLazy
- Add TestRemoteMountMetadataStrategyEager and TestRemoteMountMetadataStrategyInvalid
Made-with: Cursor
* fix(remote.mount): validate strategy and remote before creating mount directory
Move strategy validation and validateMountRoot (lazy path) before
ensureMountDirectory so that invalid strategies or bad bucket/credentials
fail without leaving orphaned directory entries in the filer.
* refactor(remote.mount): remove unused remote param from ensureMountDirectory
The remote *RemoteStorageLocation parameter was left over from the old
syncMetadata signature. Only remoteConf.Name is used inside the function.
* doc(remote.mount): add TODO for HeadBucket-style validation
validateMountRoot currently lists all buckets to verify one exists.
Note the need for a targeted BucketExists method in the interface.
* refactor(remote.mount): use MetadataStrategy type and constants
Replace raw string comparisons with a MetadataStrategy type and
MetadataStrategyEager/MetadataStrategyLazy constants for clarity
and compile-time safety.
* refactor(remote.mount): rename MetadataStrategy to MetadataCacheStrategy
More precisely describes the purpose: controlling how metadata is
cached from the remote, not metadata handling in general.
* fix(remote.mount): remove validateMountRoot from lazy path
Lazy mount's purpose is to skip remote I/O. Validating via ListBuckets
contradicts that, especially on accounts with many buckets. Invalid
buckets or credentials will surface on first lazy access instead.
* fix(test): handle shell exit 0 in TestRemoteMountMetadataStrategyInvalid
The weed shell process exits with code 0 even when individual commands
fail — errors appear in stdout. Check output instead of requiring a
non-nil error.
* test(remote.mount): remove metadataStrategy shell integration tests
These tests only verify string output from a shell process that always
exits 0 — they cannot meaningfully validate eager vs lazy behavior
without a real remote backend.
---------
Co-authored-by: Chris Lu <chris.lu@gmail.com>
The log message was comparing against the planned size of the destination
volume (including volumes already planned to merge into it) but only
displaying the raw volume size, making the output confusing when the
displayed sizes clearly didn't add up to exceed the limit.
remote.uncache checks LastLocalSyncTsNs to determine if a file has been
synced to remote. remote.copy.local was not setting this field, leaving
it at 0, which caused uncache to skip all files uploaded via
remote.copy.local.
Fixes#8602
* Enhance volume.merge command with deduplication and disk-based backend
* Fix copyVolume function call with correct argument order and missing bool parameter
* Revert "Fix copyVolume function call with correct argument order and missing bool parameter"
This reverts commit 7b4a190643.
* Fix critical issues: per-replica writable tracking, tail goroutine cancellation via done channel, and debug logging for allocation failures
* Optimize memory usage with watermark approach for duplicate detection
* Fix critical issues: swap copyVolume arguments, increase idle timeout, remove file double-close, use glog for logging
* Replace temporary file with in-memory buffer for needle blob serialization
* test(volume.merge): Add comprehensive unit and integration tests
Add 7 unit tests covering:
- Ordering by timestamp
- Cross-stream duplicate deduplication
- Empty stream handling
- Complex multi-stream deduplication
- Single stream passthrough
- Large needle ID support
- LastModified fallback when timestamp unavailable
Add 2 integration validation tests:
- TestMergeWorkflowValidation: Documents 9-stage merge workflow
- TestMergeEdgeCaseHandling: Validates 10 edge case handling
All tests passing (9/9)
* fix(volume.merge): Use time window for deduplication to handle clock skew
The same needle ID can have different timestamps on different servers due to
clock skew and replication lag. Needles with the same ID within a 5-second
time window are now treated as duplicates (same write with timestamp variance).
Key changes:
- Add mergeDeduplicationWindowNs constant (5 seconds)
- Replace exact timestamp matching with time window comparison
- Use windowInitialized flag to properly detect window transitions
- Add TestMergeNeedleStreamsTimeWindowDeduplication test
This ensures that replicated writes with slight timestamp differences are
properly deduplicated during merge, while separate updates to the same file
ID (outside the window) are preserved.
All tests passing (10/10)
* test: Add volume.merge integration tests with 5 comprehensive test cases
* test: integration tests for volume.merge command
* Fix integration tests: use TripleVolumeCluster for volume.merge testing
- Created new TripleVolumeCluster framework (cluster_triple.go) with 3 volume servers
- Rebuilt weed binary with volume.merge command compiled in
- Updated all 5 integration tests to use TripleVolumeCluster instead of DualVolumeCluster
- Tests now properly allocate volumes on 2 servers and let merge allocate on 3rd
- All 5 integration tests now pass:
- TestVolumeMergeBasic
- TestVolumeMergeReadonly
- TestVolumeMergeRestore
- TestVolumeMergeTailNeedles
- TestVolumeMergeDivergentReplicas
* Refactor test framework: use parameterized server count instead of hardcoded
- Renamed TripleVolumeCluster to MultiVolumeCluster with serverCount parameter
- Replaced hardcoded volumePort0/1/2 with slices for flexible server count
- Updated StartTripleVolumeCluster as backward-compatible wrapper calling StartMultiVolumeCluster(t, profile, 3)
- Made directory creation, port allocation, and server startup loop-based
- Updated accessor methods (VolumeAdminAddress, VolumeGRPCAddress, etc.) to support any server count
- All 5 integration tests continue to pass with new parameterized cluster framework
- Enables future testing with 2, 4, 5+ volume servers by calling StartMultiVolumeCluster directly
* Consolidate cluster frameworks: StartDualVolumeCluster now uses MultiVolumeCluster
- Made DualVolumeCluster a type alias for MultiVolumeCluster
- Updated StartDualVolumeCluster to call StartMultiVolumeCluster(t, profile, 2)
- Removed duplicate code from cluster_dual.go (now just 17 lines)
- All existing tests using StartDualVolumeCluster continue to work without changes
- Backward compatible: existing code continues to use the old function signatures
- Added wrapper functions in cluster_multi.go for StartTripleVolumeCluster
- Enables unified cluster management across all test suites
* Address PR review comments: improve error handling and clean up code
- Replace parse error swallow with proper error return
- Log cleanup and restoration errors instead of silently discarding them
- Remove unused offset field from memoryBackendFile struct
- Fix WriteAt buffer truncation bug to preserve trailing bytes
- All unit tests passing (10/10)
- Code compiles successfully
* Fix PR review findings: test improvements and code quality
- Add timeout to runWeedShell to prevent hanging
- Add server 1 readonly status verification in tests
- Assert merge fails when replicas writable (not just log output)
- Replace sleep with polling for writable restoration check
- Fix WriteAt stale data snapshot bug in memoryBackendFile
- Fix startVolume error logging to show current server log
- Fix volumePubPorts double assignment in port allocation
- Rename test to reflect behavior: DoesNotDeduplicateAcrossWindows
- Fix misleading dedup window comment
Unit tests: 10/10 passing
Binary: Compiles successfully
* Fix test assumption: merge command marks volumes readonly automatically
TestVolumeMergeReadonly was expecting merge to fail on writable volumes, but the
merge command is designed to mark volumes readonly as part of its operation. Fixed
test to verify merge succeeds on writable volumes and properly restores writable
state afterward. Removed redundant Test 2 code that duplicated the new behavior.
* fmt
* Fix deduplication logic to correctly handle same-stream vs cross-stream duplicates
The dedup map previously used only NeedleId as key, causing same-stream
overwrites to be incorrectly skipped as duplicates. Changed to track which
stream first processed each needle ID in the current window:
- Cross-stream duplicates (same ID from different streams, within window) are skipped
- Same-stream duplicates (overwrites from same stream) are kept
- Map now stores: needleId -> streamIndex of first occurrence in window
Added TestMergeNeedleStreamsSameStreamDuplicates to verify same-stream
overwrites are preserved while cross-stream duplicates are skipped.
All unit tests passing (11/11)
Binary compiles successfully
* helm: refine openshift-values.yaml to remove hardcoded UIDs
Remove hardcoded runAsUser, runAsGroup, and fsGroup from the
openshift-values.yaml example. This allows OpenShift's admission
controller to automatically assign a valid UID from the namespace's
allocated range, avoiding "forbidden" errors when UID 1000 is
outside the permissible range.
Updates #8381, #8390.
* helm: fix volume.logs and add consistent security context comments
* Update README.md
* fix volume.fsck crashing on EC volumes and add multi-volume vacuum support
* address comments
* Fix master leader election startup issue
Fixes #error-log-leader-not-selected-yet
* not useful test
* fix(iam): ensure access key status is persisted and defaulted to Active
* make pb
* update tests
* using constants
When the `--files` flag is present, `cluster.status` will scrape file metrics
from volume servers to provide detailed stats on those. The progress indicator
was not being updated properly though, so the command would complete before
it read 100%.
* Fix volume.fsck 401 Unauthorized by adding JWT to HTTP delete requests
* Additionally, for performance, consider fetching the jwt.filer_signing.key once before any loops that call httpDelete, rather than inside httpDelete itself, to avoid repeated configuration lookups.
* fix ec.encode skipping volumes when one replica is on a full disk
This fixes issue #8218. Previously, ec.encode would skip a volume if ANY
of its replicas resided on a disk with low free volume count. Now it
accepts the volume if AT LEAST ONE replica is on a healthy disk.
* refine noFreeDisk counter logic in ec.encode
Ensure noFreeDisk is decremented if a volume initially marked as bad
is later found to have a healthy replica. This ensures accurate
summary statistics.
* defer noFreeDisk counting and refine logging in ec.encode
Updated logging to be replica-scoped and deferred noFreeDisk counting to
the final pass over vidMap. This ensures that the counter only reflects
volumes that are definitively excluded because all replicas are on full
disks.
* filter replicas by free space during ec.encode
Updated doEcEncode to filter out replicas on disks with
FreeVolumeCount < 2 before selecting the best replica for encoding.
This ensures that EC shards are not generated on healthy source
replicas that happen to be on disks with low free space.
* fix issue #8230: volume.fsck deletion logic to respect purgeAbsent flag
This commit fixes two issues in volume.fsck:
1. Missing chunks in existing volumes are now deleted if -reallyDeleteFilerEntries is set.
2. Missing volumes are now properly handled when a -volumeId filter is specified, allowing deletion of filer entries for those volumes.
* address PR feedback for issue #8230
- Ensure volume filter is applied before reporting missing volumes
- Fix potential nil-pointer dereferences in httpDelete method
- Use proper error checking throughout httpDelete
* address second round PR feedback for issue #8230
- Use fmt.Fprintf(c.writer, ...) instead of fmt.Printf
- Add missing newline in "deleting path" log message
* add minCacheAge flag to remote.uncache command #8221
* address code review feedback: add nil check and improve test isolation
* address code review feedback: use consistent timestamp in FileFilter