seaweedfs

mirror of https://github.com/seaweedfs/seaweedfs.git synced 2026-05-17 23:31:31 +00:00

Author	SHA1	Message	Date
Chris Lu	d57fc67022	fix(shell): fs.mergeVolumes now rewrites manifest chunks for large files (#9127 ) * fix(shell): fs.mergeVolumes now rewrites manifest chunks for large files Previously fs.mergeVolumes skipped any chunk whose IsChunkManifest flag was true, printing "Change volume id for large file is not implemented yet" and continuing. Because the BFS traversal only looks at top-level entry.Chunks, sub-chunks referenced inside a manifest were never considered either. For any file stored as a chunk manifest (large files go this path), chunks in the source volume stayed put, leaving behind a few MB of live data that vacuum and volume.deleteEmpty couldn't clean up. This change resolves each manifest chunk recursively, moves any sub-chunk whose volume id is in the merge plan via the existing moveChunk path, and re-serializes the manifest. If the manifest chunk itself lives in a source volume, or any sub-chunk moved, the new manifest blob is uploaded to a freshly assigned file id (the old needle becomes orphaned and is reclaimed by vacuum like any other moved chunk). Fixes #9116. * address review: batch UpdateEntry, fix dry-run, defer restore, avoid source volumes - Call UpdateEntry once per entry after the chunk loop instead of once per moved chunk (gemini nit). - In dry-run mode, mark anySubChanged when a sub-chunk in the plan is encountered and return changed=true after printing "rewrite manifest", so nested manifests also surface their would-rewrites (gemini nit). - Defer filer_pb.AfterEntryDeserialization so the manifest chunk list is restored even when proto.Marshal fails (coderabbit nit). - Reject AssignVolume results whose file id lands on a volume that is a source in the merge plan, and retry — otherwise the replacement manifest could be written to the volume being emptied (coderabbit).	2026-04-17 21:17:51 -07:00
Jaehoon Kim	96af27a131	feat(shell): add fs.distributeChunks command for even chunk distribution (#9117 ) * feat(shell): add fs.distributeChunks command for even chunk distribution Add a new weed shell command that redistributes a file's chunks evenly across volume server nodes. Supports three distribution modes via -mode flag: - primary: balance chunk ownership across nodes (default) - replica: balance both ownership and replica copies - round-robin: assign chunks by offset order for sequential read optimization (chunk[0]->A, chunk[1]->B, chunk[2]->C, ...) Additional options: - -nodes=N to target specific number of nodes - -apply to execute (dry-run by default) Usage: fs.distributeChunks -path=/buckets/file.dat fs.distributeChunks -path=/buckets/file.dat -mode=round-robin -apply fs.distributeChunks -path=/buckets/file.dat -mode=replica -apply fs.distributeChunks -path=/buckets/file.dat -nodes=5 -apply * fix(shell): improve fs.distributeChunks robustness and code quality - Propagate flag parse errors instead of swallowing them (return err) - Handle nil chunk.Fid by falling back to legacy FileId string parsing - Simplify node membership check using slices.Contains * fix(shell): fix dead round-robin print loop in fs.distributeChunks The loop was computing targetNode with sc.index%totalNodes (original chunk index) instead of the sequential position, and discarding it via _ = targetNode without printing anything. Replace with a correct loop using pos%totalNodes and actually print the first 12 node assignments. * fix(shell): compute replication/collection per-chunk in fs.distributeChunks Previously replication and collection were derived once from chunks[0] and reused for all moves, causing wrong volume placement for chunks belonging to different volumes or collections. Now each chunk looks up its own volumeInfoMap entry immediately before calling operation.Assign. 
* fix(shell): prefer assignResult.Auth JWT over local signing key in fs.distributeChunks When the master returns an Auth token in the Assign response, use it directly for the upload instead of generating a new JWT from the local viper signing key. Fall back to local key generation only when Auth is empty, matching the pattern used by other upload paths. * fix(shell): add timeout and error handling to delete requests in fs.distributeChunks The delete loop was ignoring http.NewRequest errors and had no timeout, risking a nil-request panic or indefinite block. Replace with http.NewRequestWithContext and a 30s timeout, handle request creation errors by incrementing deleteFailCount, and cancel the context immediately after Do returns. * feat(shell): parallelize chunk moves in fs.distributeChunks using ErrorWaitGroup Sequential chunk moves are a bottleneck for large LLM model files with hundreds or thousands of chunks. Use ErrorWaitGroup with DefaultMaxParallelization (10) to run download/assign/upload concurrently. Guard movedRecords appends, chunk.Fid updates, and writer output with a mutex. Individual chunk failures are non-fatal and logged inline; only successfully moved chunks are included in the metadata update. * fix(shell): try all replica URLs on download in fs.distributeChunks Previously only the first volume server URL was attempted, causing chunk moves to fail if that replica was unreachable. Now iterates through all URLs returned by LookupVolumeServerUrl and stops at the first success. * refactor(shell): apply extract method pattern to fs.distributeChunks Do() was a single ~615-line function. 
Break it into focused helpers: - lookupFileEntry: filer entry lookup - validateChunks: chunk manifest guard - collectVolumeTopology: master topology query + ownership mapping - buildDistributionCounts: chunk→node mapping and owner/copy tallies - selectActiveNodes: target node selection - printCurrentDistribution: per-node distribution table - planDistribution: mode-switch planning (primary/replica/round-robin) - printRedistributionPlan: before/after plan table - relevantNodes: active-or-occupied node filter Do() is now ~100 lines of orchestration; each helper has a single clear responsibility. * test(shell): add unit tests for fs.distributeChunks algorithms Cover all three distribution modes and supporting helpers: - shortName, relevantNodes - computeOwnerTarget (even/uneven split, inactive node drain) - buildDistributionCounts (normal + nil Fid fallback) - selectActiveNodes (all nodes / limited count) - planOwnerMoves (imbalanced → balanced, already balanced) - planDistribution primary (chunks balanced, no-op when even) - planDistribution round-robin (offset ordering, correct assignment) - planDistribution replica (owner + copy balancing) - printRedistributionPlan (output format) * fix(shell): add 5-minute timeout to chunk downloads in fs.distributeChunks Download requests had no per-request timeout, unlike delete operations which already use 30s. Replace readUrl() calls with inline http.NewRequestWithContext + context.WithTimeout(5m) so a hung volume server cannot block a goroutine indefinitely during redistribution. * fix(shell): remove redundant deleteOldChunks in fs.distributeChunks filer.UpdateEntry already calls deleteChunksIfNotNew internally, which computes the diff between old and new entry chunks and deletes the ones no longer referenced. Our explicit deleteOldChunks was racing with this filer-side cleanup, causing spurious 404 warnings on ~75% of deletes. 
Remove deleteOldChunks, movedChunkRecord type, and reduce executeChunkMoves return type to (int, error) for the moved count. * fix(shell): handle nil chunk.Fid via chunkVolumeId helper in fs.distributeChunks chunk.Fid.GetVolumeId() silently returns 0 for legacy chunks stored with a FileId string instead of a Fid struct, causing them to be skipped in the replica balancing loop and looked up incorrectly in volumeInfoMap. Introduce chunkVolumeId() that uses Fid when present and falls back to parsing the legacy FileId string, matching the logic in buildDistributionCounts. Apply it in the replica-mode copies loop and in executeChunkMoves' replication/collection lookup. * fix(shell): use already-parsed oldFid for volumeInfoMap lookup in fs.distributeChunks chunkVolumeId(chunk) was being called to look up replication/collection after oldFid had already been parsed and validated. Use oldFid.VolumeId directly to avoid redundant parsing and guarantee the correct volume ID regardless of whether chunk.Fid is nil. * fix(shell): improve correctness and robustness in fs.distributeChunks - Buffer download body before upload so dlCtx timeout only covers the GET request; upload runs with context.Background() via bytes.NewReader - Replace 'before, after := strings.Cut(...)' + '_ = before' with '_' as the first return value directly - Clone copiesCount before replica planner mutates it, keeping the caller's map immutable - Add nil-entry guard after filer LookupEntry to prevent panic on unexpected nil response * feat(shell): support chunk manifests in fs.distributeChunks Large files stored as chunk manifests were previously rejected. Resolve manifests up front via filer.ResolveChunkManifest, redistribute the underlying data chunks, then re-pack through filer.MaybeManifestize before UpdateEntry. The filer's MinusChunks resolves manifests on both sides of the diff, so old manifest and inner data chunks are GC'd automatically. 
* fix(shell): match master's SaveDataAsChunkFunctionType 5-param signature Master added expectedDataSize uint64; ignore it in shell-side saveAsChunk. --------- Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-04-17 21:09:36 -07:00
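The round-robin mode described in the commit above (chunk[0]->A, chunk[1]->B, chunk[2]->C, ...) can be illustrated with a minimal Go sketch. The helper name `assignRoundRobin` and the node names are hypothetical and are not taken from the SeaweedFS source; the sketch only shows the position-modulo-node-count mapping that the message describes.

```go
package main

import "fmt"

// assignRoundRobin is a hypothetical helper (not the SeaweedFS implementation).
// It maps the chunk at sequential position pos, with chunks ordered by file
// offset, to nodes[pos%len(nodes)].
func assignRoundRobin(chunkCount int, nodes []string) []string {
	if len(nodes) == 0 || chunkCount <= 0 {
		return nil
	}
	targets := make([]string, chunkCount)
	for pos := 0; pos < chunkCount; pos++ {
		// Use the sequential position, not the original chunk index,
		// so consecutive chunks land on consecutive nodes.
		targets[pos] = nodes[pos%len(nodes)]
	}
	return targets
}

func main() {
	nodes := []string{"A", "B", "C"} // illustrative node names
	for pos, node := range assignRoundRobin(5, nodes) {
		fmt.Printf("chunk[%d] -> %s\n", pos, node)
	}
}
```

With three nodes and five chunks, the fourth chunk wraps back to the first node.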
Chris Lu	979c54f693	fix(wdclient,volume): compare master leader with ServerAddress.Equals (#9089 ) * fix(wdclient,volume): compare master leader with ServerAddress.Equals Raft leader is advertised as host:httpPort.grpcPort, but clients dial host:httpPort. Raw string comparison against VolumeLocation.Leader / HeartbeatResponse.Leader therefore never matches, causing the masterclient and the volume server heartbeat loop to continuously "redirect" to the already-connected master, tearing down the stream and reconnecting. Use ServerAddress.Equals, which normalizes the grpc-port suffix. * fix(filer,mq): compare ServerAddress via Equals in two more sites filer bootstrap skip (MaybeBootstrapFromOnePeer) and the broker's local partition assignment check both compared a wire-supplied address string against the local self ServerAddress with raw string equality. Both are vulnerable to the same plain-vs-host:port.grpcPort mismatch as the masterclient/volume heartbeat sites: filer would bootstrap from itself, and the broker would fail to claim a partition it was actually assigned. Route both through ServerAddress.Equals. * fix(master,shell): more ServerAddress comparisons via Equals - raft_server_handlers.go HealthzHandler: s.serverAddr == leader would skip the child-lock check on the real leader when the two carry different plain/grpc-suffix forms, returning 200 OK instead of 423. - master_server.go SetRaftServer leader-change callback: the Leader() == Name() guard for ensureTopologyId could disagree with topology.IsLeader() (which already uses Equals), so leader-only initialization could be skipped after an election. - command_volume_merge.go isReplicaServer: the -target guard compared user-supplied host:port against NewServerAddressFromDataNode(...) with ==, letting an existing replica slip through when topology carries the embedded gRPC port. All routed through pb.ServerAddress.Equals. 
* fix(mq,cluster): more ServerAddress comparisons via Equals - broker_grpc_lookup.go GetTopicPublishers/GetTopicSubscribers: the partition ownership check gated listing on raw LeaderBroker == BrokerAddress().String(), so listings silently omitted partitions hosted locally when the assignment carried the other host:port / host:port.grpcPort form. - lock_client.go: LockHostMovedTo comparison and the seedFiler fallback guard both used raw string equality against configured filer addresses (which may be plain host:port while LockHostMovedTo comes back suffixed), causing spurious host-change churn and blocking the seed-filer fallback. * fix(mq): more ServerAddress comparisons via Equals - pub_balancer/allocate.go EnsureAssignmentsToActiveBrokers: direct activeBrokers.Get() lookup missed brokers when a persisted assignment carried a different address encoding than the registered broker key, triggering a bogus reassignment on every read/write cycle. Added a findActiveBroker helper that falls back to an Equals-based scan and canonicalizes the assignment in place so later writes are stable. - broker_grpc_lookup.go isLockOwner: used raw string equality between LockOwner() and BrokerAddress().String(), so a lock owner could fail to recognize itself and proxy local lookup/config/admin RPCs away. - pub_client/scheduler.go onEachAssignments: reused publisher jobs only on exact LeaderBroker match, so an encoding flip in lookup results tore down and recreated a stream to the same broker.	2026-04-15 12:29:31 -07:00
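The mismatch this commit addresses (host:httpPort versus host:httpPort.grpcPort) can be shown with a small Go sketch. The functions `httpAddress` and `sameServer` are hypothetical stand-ins written for illustration; they are not the actual pb.ServerAddress.Equals implementation, and the example addresses are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// httpAddress is a hypothetical normalizer: it drops a trailing ".grpcPort"
// from an address of the form "host:httpPort.grpcPort".
func httpAddress(addr string) string {
	colon := strings.LastIndex(addr, ":")
	if colon < 0 {
		return addr // no port part, nothing to normalize
	}
	host, ports := addr[:colon], addr[colon+1:]
	httpPort, _, _ := strings.Cut(ports, ".")
	return host + ":" + httpPort
}

// sameServer compares two addresses after normalizing both sides.
func sameServer(a, b string) bool {
	return httpAddress(a) == httpAddress(b)
}

func main() {
	dialed := "master:9333"           // what a client dials
	advertised := "master:9333.19333" // what the leader advertises
	fmt.Println(dialed == advertised)           // false
	fmt.Println(sameServer(dialed, advertised)) // true
}
```

Raw string equality reports a different server, which is the condition that caused the repeated redirects described above; the normalized comparison does not.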
Chris Lu	10e7f0f2bc	fix(shell): s3.user.provision handles existing users by attaching policy (#9040 ) * fix(shell): s3.user.provision handles existing users by attaching policy Instead of erroring when the user already exists, the command now creates the policy and attaches it to the existing user via UpdateUser. Credentials are only generated and displayed for newly created users. * fix(shell): skip duplicate policy attachment in s3.user.provision Check if the policy is already attached before appending and calling UpdateUser, making repeated runs idempotent. * fix(shell): generate service account ID in s3.serviceaccount.create The command built a ServiceAccount proto without setting Id, which was rejected by credential.ValidateServiceAccountId on any real store. Now generates sa:<parent>:<uuid> matching the format used by the admin UI. * test(s3): integration tests for s3.* shell commands Adds TestShell* integration tests covering ~40 previously untested shell commands: user, accesskey, group, serviceaccount, anonymous, bucket, policy.attach/detach, config.show, and iam.export/import. Switches the test cluster's credential store from memory to filer_etc because the memory store silently drops groups and service accounts in LoadConfiguration/SaveConfiguration. * fix(shell): rollback policy on key generation failure in s3.user.provision If iam.GenerateRandomString or iam.GenerateSecretAccessKey fails after the policy was persisted, the policy would be left orphaned. Extracts the rollback logic into a local closure and invokes it on all failure paths after policy creation for consistency. 
* address PR review feedback for s3 shell tests and serviceaccount - s3.serviceaccount.create: use 16 bytes of randomness (hex-encoded) for the service account UUID instead of 4 bytes to eliminate collision risk - s3.serviceaccount.create: print the actual ID and drop the outdated "server-assigned" note (the ID is now client-generated) - tests: guard createdAK in accesskey rotate/delete subtests so sibling failures don't run invalid CLI calls - tests: requireContains/requireNotContains use t.Fatalf to fail fast - tests: Provision subtest asserts the "Attached policy" message on the second provision call for an existing user - tests: update extractServiceAccountID comment example to match the sa:<parent>:<uuid> format - tests: drop redundant saID empty-check (extractServiceAccountID fatals) * test(s3): use t.Fatalf for precondition check in serviceaccount test	2026-04-11 22:30:51 -07:00
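The sa:<parent>:<uuid> ID format with 16 bytes of hex-encoded randomness, as described in the commit above, might be generated along these lines. The function name `newServiceAccountID` and the parent name are assumptions for illustration; the real command may build the ID differently.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newServiceAccountID is a hypothetical helper that builds an ID of the form
// sa:<parent>:<32 hex characters> from 16 random bytes.
func newServiceAccountID(parent string) (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return "sa:" + parent + ":" + hex.EncodeToString(buf), nil
}

func main() {
	id, err := newServiceAccountID("alice") // "alice" is an illustrative parent user
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(id)
}
```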
Chris Lu	e648c76bcf	go fmt	2026-04-10 17:31:14 -07:00
Chris Lu	b1265de78f	feat(shell): add group management commands (#8993) * feat(shell): add group management commands Add weed shell commands for IAM group management: - s3.group.create -name <group> - s3.group.delete -name <group> - s3.group.list - s3.group.show -name <group> - s3.group.add-user -group <group> -user <user> - s3.group.remove-user -group <group> -user <user> All commands use GetConfiguration/PutConfiguration gRPC pattern, consistent with existing shell commands like s3.user.list. * fix: add nil check for Configuration in group shell commands Guard against nil Configuration response from GetConfiguration gRPC call to prevent potential panics. (Gemini review)	2026-04-08 14:03:26 -07:00
Chris Lu	7f3908297c	fix(weed/shell): suppress prompt when piped (#8990) * fix(weed/shell): suppress prompt when stdin or stdout is not a TTY When piping weed shell output (e.g. `echo "s3.user.list" | weed shell | jq`), the "> " prompt was written to stdout, breaking JSON parsers. `liner.TerminalSupported()` only checks platform support, not whether stdin/stdout are actual TTYs. Add explicit checks using `term.IsTerminal()` so the shell falls back to the non-interactive scanner path when piped. Fixes #8962 * fix(weed/shell): suppress informational logs unless -verbose is set Suppress glog info messages and connection status logs on stderr by default. Add -verbose flag to opt in to the previous noisy behavior. This keeps piped output clean (e.g. `echo "s3.user.list" | weed shell | jq`). * fix(weed/shell): defer liner init until after TTY check Move liner.NewLiner() and related setup (history, completion, interrupt handler) inside the interactive block so the terminal is not put into raw mode when stdout is redirected. Previously, liner would set raw mode unconditionally at startup, leaving the terminal broken when falling back to the scanner path. Addresses review feedback from gemini-code-assist. * refactor(weed/shell): consolidate verbose logging into single block Group all verbose stderr output within one conditional block instead of scattering three separate if-verbose checks around the filer logic. Addresses review feedback from gemini-code-assist. * fix(weed/shell): clean up global liner state and suppress logtostderr - Set line=nil after Close() to prevent stale state if RunShell is called again (e.g. in tests) - Add nil check in OnInterrupt handler for non-interactive sessions - Also set logtostderr=false when not verbose, in case it was enabled Addresses review feedback from gemini-code-assist.
* refactor(weed/shell): make liner state local to eliminate data race Replace the package-level `line` variable with a local variable in RunShell, passing it explicitly to setCompletionHandler, loadHistory, and saveHistory. This eliminates a data race between the OnInterrupt goroutine and the defer that previously set the global to nil. Addresses review feedback from gemini-code-assist. * rename(weed/shell): rename -verbose flag to -debug Avoid conflict with -verbose flags already used by individual shell commands (e.g. ec.encode, volume.fix.replication, volume.check.disk).	2026-04-08 13:07:15 -07:00
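The prompt-suppression logic above depends on detecting whether stdin and stdout are terminals. The commit uses `term.IsTerminal()`; the sketch below swaps in a standard-library approximation (checking os.ModeCharDevice) so that it needs no external module. That check is coarser than a real terminal test (for example, /dev/null is also a character device), and the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// isCharDevice approximates a TTY check using only the standard library.
// Assumption: a character device is treated as a terminal.
func isCharDevice(f *os.File) bool {
	info, err := f.Stat()
	if err != nil {
		return false
	}
	return info.Mode()&os.ModeCharDevice != 0
}

// promptFor returns the interactive prompt only when both ends are terminals,
// so piped output stays free of prompt text.
func promptFor(stdinIsTTY, stdoutIsTTY bool) string {
	if stdinIsTTY && stdoutIsTTY {
		return "> "
	}
	return ""
}

func main() {
	prompt := promptFor(isCharDevice(os.Stdin), isCharDevice(os.Stdout))
	// Report on stderr so stdout stays clean when piped.
	fmt.Fprintf(os.Stderr, "prompt=%q\n", prompt)
}
```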
Chris Lu	74905c4b5d	shell: s3.* commands always output JSON, connection messages to stderr (#8976) * shell: s3.* commands output JSON, connection messages to stderr All s3.user.* and s3.policy.attach|detach commands now output structured JSON to stdout instead of human-readable text: - s3.user.create: {"name","access_key"} (secret key to stderr only) - s3.user.list: [{name,status,policies,keys}] - s3.user.show: {name,status,source,account,policies,credentials,...} - s3.user.delete: {"name"} - s3.user.enable/disable: {"name","status"} - s3.policy.attach/detach: {"policy","user"} Connection startup messages (master/filer) moved to stderr so they don't pollute structured output when piping. Closes #8962 (partial; covers merged s3.user/policy commands). * shell: fix secret leak, duplicate JSON output, and non-interactive prompt - s3.user.create: only echo secret key to stderr when auto-generated, never echo caller-supplied secrets - s3.user.enable/disable: fix duplicate JSON output: remove inner write in early-return path, keep single write site after gRPC call - shell_liner: use bufio.Scanner when stdin is not a terminal instead of liner.Prompt, suppressing the "> " prompt in piped mode * shell: check scanner error, idempotent enable output, history errors to stderr - Check scanner.Err() after non-interactive input loop to surface read errors - s3.user.enable: always emit JSON regardless of current state (idempotent) - saveHistory: write error messages to stderr instead of stdout	2026-04-07 16:27:21 -07:00
Chris Lu	fb0573ffc4	shell: rename -force to -apply in s3.iam.import for consistency	2026-04-07 14:17:07 -07:00
Chris Lu	d50889002b	shell: add s3.iam.*, s3.config.show, s3.user.provision; hide legacy commands (#8956) * shell: add s3.iam.*, s3.config.show, s3.user.provision; hide legacy commands Add import/export, configuration summary, and a convenience provisioning command: - s3.iam.export: dump full IAM state as JSON (stdout or file) - s3.iam.import: replace IAM state from a JSON file - s3.config.show: human-readable summary (users, policies, service accounts, groups with status and counts) - s3.user.provision: one-step user+policy+credentials creation for common readonly/readwrite/admin roles Hide legacy commands from help listing: - s3.configure: still works but hidden from help output - s3.bucket.access: still works but hidden from help output Both hidden commands remain fully functional for existing scripts. Also adds a Hidden command tag and filters it from printGenericHelp. * shell: address review feedback for s3.iam.*, s3.config.show, s3.user.provision - Simplify joinMax using strings.Join - Fix rolePolicies: remove s3:ListBucket from object-level actions (already covered by bucket-level statement) - Fix admin role: grant s3:* on bucket resource too - Return flag parse errors instead of swallowing them * shell: address missed review feedback for PR 3 - s3.iam.import: require -force flag for destructive IAM overwrite - s3.config.show: add nil guard for resp.Configuration - s3.user.provision: check if user exists before creating policy - s3.user.provision: reject wildcard bucket names (* ?) * shell: distinguish NotFound from transient errors in provision, use %w wrapping - s3.user.provision: check gRPC status code on GetUser error: only proceed on NotFound, abort on transient/network errors - s3.iam.import: use %w for error wrapping to preserve error chains, wrap PutConfiguration error with context * shell: remove duplicate joinMax after PR 8954 merge command_s3_helpers.go defined joinMax which is already in command_s3_user_list.go from the merged PR 8954.
* shell: restrict export file permissions, rollback policy on user create failure - s3.iam.export: use os.OpenFile with mode 0600 instead of os.Create to protect exported credentials from other users - s3.user.provision: rollback the created policy if CreateUser fails, with a warning if the rollback itself fails	2026-04-07 14:10:15 -07:00
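The switch from os.Create to os.OpenFile with mode 0600 can be sketched as follows. os.Create requests mode 0666 (before umask), whereas an explicit 0600 keeps a newly created export file readable and writable by its owner only. The helper names, file name, and JSON payload are illustrative assumptions, and the permission check assumes a Unix-like system.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writePrivateFile is a hypothetical helper that creates path with mode 0600.
// The mode only applies when the file is newly created; an existing file
// keeps whatever permissions it already has.
func writePrivateFile(path string, data []byte) error {
	f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o600)
	if err != nil {
		return err
	}
	if _, err := f.Write(data); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// exportToTempFile writes a placeholder export into a temporary directory
// and returns the permission bits of the resulting file.
func exportToTempFile() (os.FileMode, error) {
	dir, err := os.MkdirTemp("", "iam-export")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "iam.json")
	if err := writePrivateFile(path, []byte(`{"identities":[]}`)); err != nil {
		return 0, err
	}
	info, err := os.Stat(path)
	if err != nil {
		return 0, err
	}
	return info.Mode().Perm(), nil
}

func main() {
	mode, err := exportToTempFile()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("export file mode:", mode)
}
```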
Chris Lu	45bf3ad058	shell: add s3.user.* and s3.policy.attach|detach commands (#8954) * shell: add s3.user.* and s3.policy.attach|detach commands Add focused IAM shell commands following a noun-verb model: - s3.user.create: create user with auto-generated or explicit credentials - s3.user.list: tabular listing with status, policies, key count - s3.user.show: detailed user view (status, source, policies, credentials) - s3.user.delete: delete a user - s3.user.enable: enable a disabled user - s3.user.disable: disable a user (preserves credentials and policies) - s3.policy.attach: attach a named policy to a user - s3.policy.detach: detach a policy from a user These commands are thin wrappers over the existing IAM gRPC service, producing human-readable output instead of raw protobuf text. This is part of a larger effort to replace the monolithic s3.configure command with a composable set of single-purpose commands. * shell: address review feedback for s3.user.* and s3.policy.attach|detach - Return flag parse errors instead of swallowing them (all commands) - Use GetConfiguration instead of N+1 GetUser calls in s3.user.list - Add nil check for resp.Identity in s3.user.show - Fix GetPolicy error masking in s3.policy.attach (wrap original error) - Simplify joinMax using strings.Join * shell: add nil identity guards and wrap gRPC errors - Add nil check for resp.Identity in policy_attach, policy_detach, user_enable, user_disable - Wrap GetUser errors with user context for better diagnostics	2026-04-07 11:26:57 -07:00
Chris Lu	d123a2768b	shell: add s3.accesskey.*, s3.anonymous.*, s3.serviceaccount.* commands (#8955) * shell: add s3.accesskey.*, s3.anonymous.*, s3.serviceaccount.* commands Add credential, anonymous access, and service account management commands: Access key commands: - s3.accesskey.create: add credentials to an existing user - s3.accesskey.list: list access keys for a user (key ID + status) - s3.accesskey.delete: remove a specific access key - s3.accesskey.rotate: atomic create-new + delete-old key rotation Anonymous access commands: - s3.anonymous.set: set/remove public access on a bucket - s3.anonymous.get: show anonymous access for a bucket - s3.anonymous.list: list all buckets with anonymous access Service account commands: - s3.serviceaccount.create: create with optional action subset and expiry - s3.serviceaccount.list: tabular listing, optionally filtered by parent - s3.serviceaccount.show: detailed view of a service account - s3.serviceaccount.delete: remove a service account These replace the credential and anonymous portions of the monolithic s3.configure and s3.bucket.access commands.
* shell: address review feedback for s3.accesskey.*, s3.anonymous.*, s3.serviceaccount.* - Return flag parse errors instead of swallowing them (all commands) - Add action validation in s3.anonymous.set (Read, Write, List, Tagging, Admin) - Fix s3.serviceaccount.create output: note to use list for server-assigned ID since CreateServiceAccountResponse does not return the ID * shell: fix bucket matching and action validation in s3.anonymous.* - Use SplitN instead of HasSuffix for bucket name matching to avoid false positives when one bucket name is a suffix of another - Make action validation case-insensitive with canonical normalization * shell: fix nil panics, dedup actions, validate service account actions - Fix nil-pointer panic in getOrCreateAnonymousUser when GetUser returns err==nil with nil Identity (status.FromError(nil) returns nil status) - Add nil Identity guards in s3.anonymous.get and s3.anonymous.list - Deduplicate action values in s3.anonymous.set (e.g. -access Read,Read) - Add action validation in s3.serviceaccount.create with case normalization * shell: dedup actions and reject negative expiry in s3.serviceaccount.create - Deduplicate -actions values (e.g. Read,read,Read produces one entry) - Reject negative -expiry values instead of silently treating as no expiration	2026-04-07 11:20:15 -07:00
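The case-insensitive validation, canonical normalization, and deduplication of actions described above can be sketched in Go. The action set (Read, Write, List, Tagging, Admin) comes from the commit message; the function name `normalizeActions` and the error text are assumptions, not the command's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// canonicalActions maps lower-cased input to the canonical spelling.
var canonicalActions = map[string]string{
	"read":    "Read",
	"write":   "Write",
	"list":    "List",
	"tagging": "Tagging",
	"admin":   "Admin",
}

// normalizeActions is a hypothetical helper: it validates a comma-separated
// action list case-insensitively, converts each entry to its canonical form,
// and drops duplicates while preserving first-seen order.
func normalizeActions(csv string) ([]string, error) {
	seen := make(map[string]bool)
	var out []string
	for _, raw := range strings.Split(csv, ",") {
		name := strings.TrimSpace(raw)
		if name == "" {
			continue
		}
		canonical, ok := canonicalActions[strings.ToLower(name)]
		if !ok {
			return nil, fmt.Errorf("unknown action %q", name)
		}
		if !seen[canonical] {
			seen[canonical] = true
			out = append(out, canonical)
		}
	}
	return out, nil
}

func main() {
	actions, err := normalizeActions("Read,read,LIST")
	fmt.Println(actions, err) // [Read List] <nil>
}
```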
Chris Lu	0fed72d95a	volume.tier.move: fulfill target replication before deleting old replicas (#8950 ) * volume.tier.move: fulfill target replication before deleting old replicas When -toReplication is specified, volume.tier.move now creates all required replicas on the destination tier before deleting old replicas. This closes the data-loss window where only one copy existed on the target tier while awaiting volume.fix.replication. If replication fulfillment fails, old replicas are preserved and marked writable so the volume remains accessible. Also extracts replicateVolumeToServer and configureVolumeReplication helpers to reduce duplication across volume.tier.move and volume.fix.replication. Fixes #8937 * volume.tier.move: always fulfill replication before deleting old replicas When -toReplication is specified, use that replication setting. Otherwise, read the volume's existing replication from the super block. In both cases, all required replicas are created on the destination tier before old replicas are deleted. If replication fulfillment fails (e.g. not enough destination nodes), old replicas are preserved and marked writable so no data is lost. 
* volume.tier.move: address review feedback on ensureReplicationFulfilled - Add 5s delay before re-collecting topology to allow master heartbeat propagation after the move - Add nil guard for targetTierReplicas to prevent panic if the moved replica is not yet visible in the topology - Treat configureVolumeReplication failure as a hard error instead of a warning, so the rollback logic preserves old replicas * volume.tier.move: harden replication config error handling - Make configureVolumeReplication failure on the primary moved replica a hard error that aborts the move, instead of logging and continuing - Configure replication metadata on all existing target-tier replicas (not just newly created ones) when -toReplication is specified - Deletion of old replicas cannot affect new replicas since the locations list only contains pre-move servers (verified, no change) * volume.tier.move: fix cleanup deleting fulfilled replicas and broken recovery Fix 1: The cleanup loop now preserves pre-existing target-tier replicas that ensureReplicationFulfilled counted toward the replication target. Previously, a mixed-tier volume with an existing replica on the target tier could have that replica deleted right after being counted as fulfilled, leaving the volume under-replicated. ensureReplicationFulfilled now returns a preserveServers set that the deletion loop checks before removing any old replica. Fix 2: Failure paths after LiveMoveVolume (which deletes the source replica) now use restoreSurvivingReplicasWritable instead of markVolumeReplicasWritable. The old helper stopped on first error, so attempting to mark the already-deleted source writable would prevent all surviving replicas from being restored. The new helper skips the deleted source and continues through all remaining locations, logging per-replica errors instead of aborting. 
* volume.tier.move: mark preserved replicas writable, skip nodes with existing volume Fix 1: Preserved pre-existing target-tier replicas were left read-only after the move completed. They were marked read-only at the start (along with all other replicas) but never restored since the old code deleted them. Now they are explicitly marked writable before cleanup. Fix 2: The fulfillment loop could pick a candidate node that already hosts this volume on a different disk type, causing a VolumeCopy conflict. Added a guard that skips any node already hosting the volume (on any disk) before attempting replication.	2026-04-06 14:55:37 -07:00
Chris Lu	995dfc4d5d	chore: remove ~50k lines of unreachable dead code (#8913 ) * chore: remove unreachable dead code across the codebase Remove ~50,000 lines of unreachable code identified by static analysis. Major removals: - weed/filer/redis_lua: entire unused Redis Lua filer store implementation - weed/wdclient/net2, resource_pool: unused connection/resource pool packages - weed/plugin/worker/lifecycle: unused lifecycle plugin worker - weed/s3api: unused S3 policy templates, presigned URL IAM, streaming copy, multipart IAM, key rotation, and various SSE helper functions - weed/mq/kafka: unused partition mapping, compression, schema, and protocol functions - weed/mq/offset: unused SQL storage and migration code - weed/worker: unused registry, task, and monitoring functions - weed/query: unused SQL engine, parquet scanner, and type functions - weed/shell: unused EC proportional rebalance functions - weed/storage/erasure_coding/distribution: unused distribution analysis functions - Individual unreachable functions removed from 150+ files across admin, credential, filer, iam, kms, mount, mq, operation, pb, s3api, server, shell, storage, topology, and util packages * fix(s3): reset shared memory store in IAM test to prevent flaky failure TestLoadIAMManagerFromConfig_EmptyConfigWithFallbackKey was flaky because the MemoryStore credential backend is a singleton registered via init(). Earlier tests that create anonymous identities pollute the shared store, causing LookupAnonymous() to unexpectedly return true. Fix by calling Reset() on the memory store before the test runs. * style: run gofmt on changed files * fix: restore KMS functions used by integration tests * fix(plugin): prevent panic on send to closed worker session channel The Plugin.sendToWorker method could panic with "send on closed channel" when a worker disconnected while a message was being sent. 
The race was between streamSession.close() closing the outgoing channel and sendToWorker writing to it concurrently. Add a done channel to streamSession that is closed before the outgoing channel, and check it in sendToWorker's select to safely detect closed sessions without panicking.	2026-04-03 16:04:27 -07:00
qzh	4c72512ea2	fix(shell): avoid marking skipped or unplaced volumes as fixed (#8866 ) * fix(s3api): fix AWS Signature V2 format and validation * fix(s3api): Skip space after "AWS" prefix (+1 offset) * test(s3api): add unit tests for Signature V2 authentication fix * fix(s3api): simply comparing signatures * validation for the colon extraction in expectedAuth * fix(shell): avoid marking skipped or unplaced volumes as fixed --------- Co-authored-by: chrislu <chris.lu@gmail.com> Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>	2026-04-01 01:20:25 -07:00
Chris Lu	af68449a26	Process .ecj deletions during EC decode and vacuum decoded volume (#8863 ) * Process .ecj deletions during EC decode and vacuum decoded volume (#8798) When decoding EC volumes back to normal volumes, deletions recorded in the .ecj journal were not being applied before computing the dat file size or checking for live needles. This caused the decoded volume to include data for deleted files and could produce false positives in the all-deleted check. - Call RebuildEcxFile before HasLiveNeedles/FindDatFileSize in VolumeEcShardsToVolume so .ecj deletions are merged into .ecx first - Vacuum the decoded volume after mounting in ec.decode to compact out deleted needle data from the .dat file - Add integration tests for decoding with non-empty .ecj files * storage: add offline volume compaction helper Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * ec: compact decoded volumes before deleting shards Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * ec: address PR review comments - Fall back to data directory for .ecx when idx directory lacks it - Make compaction failure non-fatal during EC decode - Remove misleading "buffer: 10%" from space check error message * ec: collect .ecj from all shard locations during decode Each server's .ecj only contains deletions for needles whose data resides in shards held by that server. Previously, sources with no new data shards to contribute were skipped entirely, losing their .ecj deletion entries. Now .ecj is always appended from every shard location so RebuildEcxFile sees the full set of deletions. * ec: add integration tests for .ecj collection during decode TestEcDecodePreservesDeletedNeedles: verifies that needles deleted via VolumeEcBlobDelete are excluded from the decoded volume. TestEcDecodeCollectsEcjFromPeer: regression test for the fix in collectEcShards. 
Deletes a needle only on a peer server that holds no new data shards, then verifies the deletion survives decode via .ecj collection. * ec: address review nits in decode and tests - Remove double error wrapping in mountDecodedVolume - Check VolumeUnmount error in peer ecj test - Assert 404 specifically for deleted needles, fail on 5xx --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-01 01:15:26 -07:00
Chris Lu	f256002d0b	fix ec.balance failing to rebalance when all nodes share all volumes (#8796 ) * fix ec.balance failing to rebalance when all nodes share all volumes (#8793) Two bugs in doBalanceEcRack prevented rebalancing: 1. Sorting by freeEcSlot instead of actual shard count caused incorrect empty/full node selection when nodes have different total capacities. 2. The volume-level check skipped any volume already present on the target node. When every node has a shard of every volume (common with many EC volumes across N nodes with N shards each), no moves were possible. Fix: sort by actual shard count, and use a two-pass approach - first prefer moving shards of volumes not on the target (best diversity), then fall back to moving specific shard IDs not yet on the target. * add test simulating real cluster topology from issue #8793 Uses the actual node addresses and mixed max capacities (80 vs 33) from the reporter's 14-node cluster to verify ec.balance correctly rebalances with heterogeneous node sizes. * fix pass comments to match 0-indexed loop variable	2026-03-27 11:14:10 -07:00
Lisandro Pin	e5cf2d2a19	Give the `ScrubVolume()` RPC an option to flag found broken volumes as read-only. (#8360 ) * Give the `ScrubVolume()` RPC an option to flag found broken volumes as read-only. Also exposes this option in the shell `volume.scrub` command. * Remove redundant test in `TestVolumeMarkReadonlyWritableErrorPaths`. `417051bb` slightly rearranges the logic for `VolumeMarkReadonly()` and `VolumeMarkWritable()`, so calling them for invalid volume IDs will actually yield that error, instead of checking maintenance mode first.	2026-03-26 10:20:57 -07:00
Chris Lu	ccc662b90b	shell: add s3.bucket.access command for anonymous access policy (#8774 ) * shell: add s3.bucket.access command for anonymous access policy (#7738) Add a new weed shell command to view or change the anonymous access policy of an S3 bucket without external tools. Usage: s3.bucket.access -name <bucket> -access read,list s3.bucket.access -name <bucket> -access none Supported permissions: read, write, list. The command writes a standard bucket policy with Principal "" and warns if no anonymous IAM identity exists. shell: fix anonymous identity hint in s3.bucket.access warning The anonymous identity doesn't need IAM actions — the bucket policy controls what anonymous users can do. * shell: only warn about anonymous identity when write access is set Read and list operations use AuthWithPublicRead which evaluates bucket policies directly without requiring the anonymous identity. Only write operations go through the normal auth flow that needs it. * shell: rewrite s3.bucket.access to use IAM actions instead of bucket policies Replace the bucket policy approach with direct IAM identity actions, matching the s3.configure pattern. The user is auto-created if it does not exist. Usage: s3.bucket.access -name <bucket> -user anonymous -access Read,List s3.bucket.access -name <bucket> -user anonymous -access none s3.bucket.access -name <bucket> -user anonymous Actions are stored as "Action:bucket" on the identity, same as s3.configure -actions=Read -buckets=my-bucket. * shell: return flag parse errors instead of swallowing them * shell: normalize action names case-insensitively in s3.bucket.access Accept actions in any case (read, READ, Read) and normalize to canonical form (Read, Write, List, etc.) before storing. This matches the case-insensitive handling of "none" and avoids confusing rejections.	2026-03-25 23:09:53 -07:00
Chris Lu	7fbdb9b7b7	feat(shell): add volume.tier.compact command to reclaim cloud storage space (#8715 ) * feat(shell): add volume.tier.compact command to reclaim cloud storage space Adds a new shell command that automates compaction of cloud tier volumes. When files are deleted from remote-tiered volumes, space is not reclaimed on the cloud storage. This command orchestrates: download from remote, compact locally, and re-upload to reclaim deleted space. Closes #8563 * fix: log cleanup errors in compactVolumeOnServer instead of discarding them Helps operators diagnose leftover temp files (.cpd/.cpx) if cleanup fails after a compaction or commit failure. * fix: return aggregate error from loop and use regex for collection filter - Track and return error count when one or more volumes fail to compact, so callers see partial failures instead of always getting nil. - Use compileCollectionPattern for -collection in -volumeId mode too, so regex patterns work consistently with the flag description. Empty pattern (no -collection given) matches all collections.	2026-03-20 23:52:12 -07:00
Chris Lu	81369b8a83	improve: large file sync throughput for remote.cache and filer.sync (#8676 ) * improve large file sync throughput for remote.cache and filer.sync Three main throughput improvements: 1. Adaptive chunk sizing for remote.cache: targets ~32 chunks per file instead of always starting at 5MB. A 500MB file now uses ~16MB chunks (32 chunks) instead of 5MB chunks (100 chunks), reducing per-chunk overhead (volume assign, gRPC call, needle write) by 3x. 2. Configurable concurrency at every layer: - remote.cache chunk concurrency: -chunkConcurrency flag (default 8) - remote.cache S3 download concurrency: -downloadConcurrency flag (default raised from 1 to 5 per chunk) - filer.sync chunk concurrency: -chunkConcurrency flag (default 32) 3. S3 multipart download concurrency raised from 1 to 5: the S3 manager downloader was using Concurrency=1, serializing all part downloads within each chunk. This alone can 5x per-chunk download speed. The concurrency values flow through the gRPC request chain: shell command → CacheRemoteObjectToLocalClusterRequest → FetchAndWriteNeedleRequest → S3 downloader Zero values in the request mean "use server defaults", maintaining full backward compatibility with existing callers. Ref #8481 * fix: use full maxMB for chunk size cap and remove loop guard Address review feedback: - Use full maxMB instead of maxMB/2 for maxChunkSize to avoid unnecessarily limiting chunk size for very large files. - Remove chunkSize < maxChunkSize guard from the safety loop so it can always grow past maxChunkSize when needed to stay under 1000 chunks (e.g., extremely large files with small maxMB). * address review feedback: help text, validation, naming, docs - Fix help text for -chunkConcurrency and -downloadConcurrency flags to say "0 = server default" instead of advertising specific numeric defaults that could drift from the server implementation. 
- Validate chunkConcurrency and downloadConcurrency are within int32 range before narrowing, returning a user-facing error if out of range. - Rename ReadRemoteErr to readRemoteErr to follow Go naming conventions. - Add doc comment to SetChunkConcurrency noting it must be called during initialization before replication goroutines start. - Replace doubling loop in chunk size safety check with direct ceil(remoteSize/1000) computation to guarantee the 1000-chunk cap. * address Copilot review: clamp concurrency, fix chunk count, clarify proto docs - Use ceiling division for chunk count check to avoid overcounting when file size is an exact multiple of chunk size. - Clamp chunkConcurrency (max 1024) and downloadConcurrency (max 1024 at filer, max 64 at volume server) to prevent excessive goroutines. - Always use ReadFileWithConcurrency when the client supports it, falling back to the implementation's default when value is 0. - Clarify proto comments that download_concurrency only applies when the remote storage client supports it (currently S3). - Include specific server defaults in help text (e.g., "0 = server default 8") so users see the actual values in -h output. * fix data race on executionErr and use %w for error wrapping - Protect concurrent writes to executionErr in remote.cache worker goroutines with a sync.Mutex to eliminate the data race. - Use %w instead of %v in volume_grpc_remote.go error formatting to preserve the error chain for errors.Is/errors.As callers.	2026-03-17 16:49:56 -07:00
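Note on the chunk sizing described in the entry above: the arithmetic (about 32 chunks per file, a 5MB starting size, a cap derived from maxMB, and a hard limit of 1000 chunks enforced with a ceiling division) can be sketched as follows. This is an illustration under assumptions, not the code from the commit: the function and constant names are hypothetical, and MB is taken to mean MiB.

```go
package main

import "fmt"

const (
	mib              = int64(1024 * 1024)
	minChunkSize     = 5 * mib // starting chunk size named in the commit message
	targetChunks     = 32      // desired chunks per file
	maxChunksPerFile = 1000    // hard cap on chunk count
)

// ceilDiv returns ceil(a/b) for a >= 0 and b > 0.
func ceilDiv(a, b int64) int64 {
	return (a + b - 1) / b
}

// adaptiveChunkSize is a hypothetical helper: it aims for targetChunks
// chunks, clamps the result to [minChunkSize, maxChunkSize], and then grows
// past maxChunkSize if that is needed to stay within maxChunksPerFile.
func adaptiveChunkSize(fileSize, maxChunkSize int64) int64 {
	if fileSize <= 0 {
		return minChunkSize
	}
	size := ceilDiv(fileSize, targetChunks)
	if size < minChunkSize {
		size = minChunkSize
	}
	if size > maxChunkSize {
		size = maxChunkSize
	}
	// Ceiling division avoids overcounting when fileSize is an exact
	// multiple of size.
	if ceilDiv(fileSize, size) > maxChunksPerFile {
		size = ceilDiv(fileSize, maxChunksPerFile)
	}
	return size
}

func main() {
	fileSize := 500 * mib
	size := adaptiveChunkSize(fileSize, 64*mib)
	fmt.Printf("chunk size: %d bytes, chunks: %d\n", size, ceilDiv(fileSize, size))
}
```

For a 500 MiB file with a 64 MiB cap this yields 32 chunks of roughly 16 MiB each, which matches the example given in the commit message.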
Chris Lu	c4d642b8aa	fix(ec): gather shards from all disk locations before rebuild (#8633 ) * fix(ec): gather shards from all disk locations before rebuild (#8631) Fix "too few shards given" error during ec.rebuild on multi-disk volume servers. The root cause has two parts: 1. VolumeEcShardsRebuild only looked at a single disk location for shard files. On multi-disk servers, the existing local shards could be on one disk while copied shards were placed on another, causing the rebuild to see fewer shards than actually available. 2. VolumeEcShardsCopy had a DiskId condition (req.DiskId == 0 && len(vs.store.Locations) > 0) that was always true, making the FindFreeLocation fallback dead code. This meant copies always went to Locations[0] regardless of where existing shards were. Changes: - VolumeEcShardsRebuild now finds the location with the most shards, then gathers shard files from other locations via hard links (or symlinks for cross-device) before rebuilding. Gathered files are cleaned up after rebuild. - VolumeEcShardsCopy now only uses Locations[DiskId] when DiskId > 0 (explicitly set). Otherwise, it prefers the location that already has the EC volume, falling back to HDD then any free location. - generateMissingEcFiles now logs shard counts and provides a clear error message when not enough shards are found, instead of passing through to the opaque reedsolomon "too few shards given" error. * fix(ec): update test to match skip behavior for unrepairable volumes The test expected an error for volumes with insufficient shards, but commit `5acb4578a` changed unrepairable volumes to be skipped with a log message instead of returning an error. Update the test to verify the skip behavior and log output. * fix(ec): address PR review comments - Add comment clarifying DiskId=0 means "not specified" (protobuf default), callers must use DiskId >= 1 to target a specific disk. - Log warnings on cleanup failures for gathered shard links. 
* fix(ec): read shard files from other disks directly instead of linking Replace the hard link / symlink gathering approach with passing additional search directories into RebuildEcFiles. The rebuild function now opens shard files directly from whichever disk they live on, avoiding filesystem link operations and cleanup. RebuildEcFiles and RebuildEcFilesWithContext gain a variadic additionalDirs parameter (backward compatible with existing callers). * fix(ec): clarify DiskId selection semantics in VolumeEcShardsCopy comment * fix(ec): avoid empty files on failed rebuild; don't skip ecx-only locations - generateMissingEcFiles: two-pass approach — first discover present/missing shards and check reconstructability, only then create output files. This avoids leaving behind empty truncated shard files when there are too few shards to rebuild. - VolumeEcShardsRebuild: compute hasEcx before skipping zero-shard locations. A location with an .ecx file but no shard files (all shards on other disks) is now a valid rebuild candidate instead of being silently skipped. * fix(ec): select ecx-only location as rebuildLocation when none chosen yet When rebuildLocation is nil and a location has hasEcx=true but existingShardCount=0 (all shards on other disks), the condition 0 > 0 was false so it was never promoted to rebuildLocation. Add rebuildLocation == nil to the predicate so the first location with an .ecx file is always selected as a candidate.	2026-03-14 20:59:47 -07:00
Chris Lu	5acb4578ab	Fix ec.rebuild failing on unrepairable volumes instead of skipping (#8632 ) * Fix ec.rebuild failing on unrepairable volumes instead of skipping them When an EC volume has fewer shards than DataShardsCount, ec.rebuild would return an error and abort the entire operation. Now it logs a warning and continues rebuilding the remaining volumes. Fixes #8630 * Remove duplicate volume ID in unrepairable log message --------- Co-authored-by: Copilot <copilot@github.com>	2026-03-14 16:18:29 -07:00
Chris Lu	f3c5ba3cd6	feat(filer): add lazy directory listing for remote mounts (#8615 ) * feat(filer): add lazy directory listing for remote mounts Directory listings on remote mounts previously only queried the local filer store. With lazy mounts the listing was empty; with eager mounts it went stale over time. Add on-demand directory listing that fetches from remote and caches results with a 5-minute TTL: - Add `ListDirectory` to `RemoteStorageClient` interface (delimiter-based, single-level listing, separate from recursive `Traverse`) - Implement in S3, GCS, and Azure backends using each platform's hierarchical listing API - Add `maybeLazyListFromRemote` to filer: before each directory listing, check if the directory is under a remote mount with an expired cache, fetch from remote, persist entries to the local store, then let existing listing logic run on the populated store - Use singleflight to deduplicate concurrent requests for the same directory - Skip local-only entries (no RemoteEntry) to avoid overwriting unsynced uploads - Errors are logged and swallowed (availability over consistency) * refactor: extract xattr key to constant xattrRemoteListingSyncedAt * feat: make listing cache TTL configurable per mount via listing_cache_ttl_seconds Add listing_cache_ttl_seconds field to RemoteStorageLocation protobuf. When 0 (default), lazy directory listing is disabled for that mount. When >0, enables on-demand directory listing with the specified TTL. Expose as -listingCacheTTL flag on remote.mount command. 
* refactor: address review feedback for lazy directory listing - Add context.Context to ListDirectory interface and all implementations - Capture startTime before remote call for accurate TTL tracking - Simplify S3 ListDirectory using ListObjectsV2PagesWithContext - Make maybeLazyListFromRemote return void (errors always swallowed) - Remove redundant trailing-slash path manipulation in caller - Update tests to match new signatures * When an existing entry has Remote != nil, we should merge remote metadata into it rather than replacing it. * fix(gcs): wrap ListDirectory iterator error with context The raw iterator error was returned without bucket/path context, making it harder to debug. Wrap it consistently with the S3 pattern. * fix(s3): guard against nil pointer dereference in Traverse and ListDirectory Some S3-compatible backends may return nil for LastModified, Size, or ETag fields. Check for nil before dereferencing to prevent panics. * fix(filer): remove blanket 2-minute timeout from lazy listing context Individual SDK operations (S3, GCS, Azure) already have per-request timeouts and retry policies. The blanket timeout could cut off large directory listings mid-operation even though individual pages were succeeding. * fix(filer): preserve trace context in lazy listing with WithoutCancel Use context.WithoutCancel(ctx) instead of context.Background() so trace/span values from the incoming request are retained for distributed tracing, while still decoupling cancellation. 
* fix(filer): use Store.FindEntry for internal lookups, add Uid/Gid to files, fix updateDirectoryListingSyncedAt - Use f.Store.FindEntry instead of f.FindEntry for staleness check and child lookups to avoid unnecessary lazy-fetch overhead - Set OS_UID/OS_GID on new file entries for consistency with directories - In updateDirectoryListingSyncedAt, use Store.UpdateEntry for existing directories instead of CreateEntry to avoid deleteChunksIfNotNew and NotifyUpdateEvent side effects * fix(filer): distinguish not-found from store errors in lazy listing Previously, any error from Store.FindEntry was treated as "not found," which could cause entry recreation/overwrite on transient DB failures. Now check for filer_pb.ErrNotFound explicitly and skip entries or bail out on real store errors. * refactor(filer): use errors.Is for ErrNotFound comparisons	2026-03-13 09:36:54 -07:00
Peter Dodd	0e570d6a8f	feat(remote.mount): add -metadataStrategy flag to control metadata caching (#8568 ) * feat(remote): add -noSync flag to skip upfront metadata pull on mount Made-with: Cursor * refactor(remote): split mount setup from metadata sync Extract ensureMountDirectory for create/validate; call pullMetadata directly when sync is needed. Caller controls sync step for -noSync. Made-with: Cursor * fix(remote): validate mount root when -noSync so bad bucket/creds fail fast When -noSync is used, perform a cheap remote check (ListBuckets and verify bucket exists) instead of skipping all remote I/O. Invalid buckets or credentials now fail at mount time. Made-with: Cursor * test(remote): add TestRemoteMountNoSync for -noSync mount and persisted mapping Made-with: Cursor * test(remote): assert no upfront metadata after -noSync mount After remote.mount -noSync, run fs.ls on the mount dir and assert empty listing so the test fails if pullMetadata was invoked eagerly. Made-with: Cursor * fix(remote): propagate non-ErrNotFound lookup errors in ensureMountDirectory Return lookupErr immediately for any LookupDirectoryEntry failure that is not filer_pb.ErrNotFound, so only the not-found case creates the entry and other lookup failures are reported to the caller. Made-with: Cursor * fix(remote): use errors.Is for ErrNotFound in ensureMountDirectory Replace fragile strings.Contains(lookupErr.Error(), ...) with errors.Is(lookupErr, filer_pb.ErrNotFound) before calling CreateEntry. Made-with: Cursor * fix(remote): use LookupEntry so ErrNotFound is recognised after gRPC Raw gRPC LookupDirectoryEntry returns a status error, not the sentinel, so errors.Is(lookupErr, filer_pb.ErrNotFound) was always false. Use filer_pb.LookupEntry which normalises not-found to ErrNotFound so the mount directory is created when missing. 
Made-with: Cursor * test(remote): ignore weed shell banner in TestRemoteMountNoSync fs.ls count Exclude master/filer and prompt lines from entry count so the assertion checks only actual fs.ls output for empty -noSync mount. Made-with: Cursor * fix(remote.mount): use 0755 for mount dir, document bucket-less early return Made-with: Cursor * feat(remote.mount): replace -noSync with -metadataStrategy=lazy\|eager - Add -metadataStrategy flag (eager default, lazy skips upfront metadata pull) - Accept lazy/eager case-insensitively; reject invalid values with clear error - Rename TestRemoteMountNoSync to TestRemoteMountMetadataStrategyLazy - Add TestRemoteMountMetadataStrategyEager and TestRemoteMountMetadataStrategyInvalid Made-with: Cursor * fix(remote.mount): validate strategy and remote before creating mount directory Move strategy validation and validateMountRoot (lazy path) before ensureMountDirectory so that invalid strategies or bad bucket/credentials fail without leaving orphaned directory entries in the filer. * refactor(remote.mount): remove unused remote param from ensureMountDirectory The remote RemoteStorageLocation parameter was left over from the old syncMetadata signature. Only remoteConf.Name is used inside the function. doc(remote.mount): add TODO for HeadBucket-style validation validateMountRoot currently lists all buckets to verify one exists. Note the need for a targeted BucketExists method in the interface. * refactor(remote.mount): use MetadataStrategy type and constants Replace raw string comparisons with a MetadataStrategy type and MetadataStrategyEager/MetadataStrategyLazy constants for clarity and compile-time safety. * refactor(remote.mount): rename MetadataStrategy to MetadataCacheStrategy More precisely describes the purpose: controlling how metadata is cached from the remote, not metadata handling in general. * fix(remote.mount): remove validateMountRoot from lazy path Lazy mount's purpose is to skip remote I/O. 
Validating via ListBuckets contradicts that, especially on accounts with many buckets. Invalid buckets or credentials will surface on first lazy access instead. * fix(test): handle shell exit 0 in TestRemoteMountMetadataStrategyInvalid The weed shell process exits with code 0 even when individual commands fail — errors appear in stdout. Check output instead of requiring a non-nil error. * test(remote.mount): remove metadataStrategy shell integration tests These tests only verify string output from a shell process that always exits 0 — they cannot meaningfully validate eager vs lazy behavior without a real remote backend. --------- Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-03-12 15:21:07 -07:00
Copilot	013362d2d3	fix(shell): show planned size in fs.mergeVolumes log to clarify size limit check (#8553 ) The log message was comparing against the planned size of the destination volume (including volumes already planned to merge into it) but only displaying the raw volume size, making the output confusing when the displayed sizes clearly didn't add up to exceed the limit.	2026-03-11 13:56:13 -07:00
Chris Lu	b799650357	fix(shell): set LastLocalSyncTsNs in remote.copy.local so remote.uncache works (#8604 ) remote.uncache checks LastLocalSyncTsNs to determine if a file has been synced to remote. remote.copy.local was not setting this field, leaving it at 0, which caused uncache to skip all files uploaded via remote.copy.local. Fixes #8602	2026-03-11 12:55:45 -07:00
Chris Lu	e1e5b4a8a6	add admin script worker (#8491 ) * admin: add plugin lock coordination * shell: allow bypassing lock checks * plugin worker: add admin script handler * mini: include admin_script in plugin defaults * admin script UI: drop name and enlarge text * admin script: add default script * admin_script: make run interval configurable * plugin: gate other jobs during admin_script runs * plugin: use last completed admin_script run * admin: backfill plugin config defaults * templ Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * comparable to default version Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * default to run Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * format Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * shell: respect pre-set noLock for fix.replication * shell: add force no-lock mode for admin scripts * volume balance worker already exists Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * admin: expose scheduler status JSON * shell: add sleep command * shell: restrict sleep syntax * Revert "shell: respect pre-set noLock for fix.replication" This reverts commit `2b14e8b826`. * templ Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * fix import Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * less logs Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * Reduce master client logs on canceled contexts * Update mini default job type count --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-03 15:10:40 -08:00
Chris Lu	2dd3944819	Respect -minFreeSpace during ec.decode (#8467 ) * shell: add ec.decode ignoreMinFreeSpace flag Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * shell: respect minFreeSpace in ec.decode Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * shell: rename ec.decode minFreeSpace flag Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * shell: error when ec.decode has no shards Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * shell: select ec.decode target with zero shards Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * shell: adjust free counts across ec.decode Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * unused * Update weed/shell/command_ec_decode.go Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2026-02-27 23:54:30 -08:00
Chris Lu	9b6fc49946	Chart createBuckets config #8368 : Add TTL, Object Lock, and Versioning support (#8375 ) * Chart createBuckets config #8368: Add TTL, Object Lock, and Versioning support * Update weed/shell/command_s3_bucket_versioning.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * address comments * address comments * go fmt * fix failures are still treated like “bucket not found” --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-26 11:56:10 -08:00
Chris Lu	a3cb7fa8cc	go fmt	2026-02-25 10:25:23 -08:00
Chris Lu	b565a0cc86	Adds volume.merge command with deduplication and disk-based backend (#8441 ) * Enhance volume.merge command with deduplication and disk-based backend * Fix copyVolume function call with correct argument order and missing bool parameter * Revert "Fix copyVolume function call with correct argument order and missing bool parameter" This reverts commit `7b4a190643`. * Fix critical issues: per-replica writable tracking, tail goroutine cancellation via done channel, and debug logging for allocation failures * Optimize memory usage with watermark approach for duplicate detection * Fix critical issues: swap copyVolume arguments, increase idle timeout, remove file double-close, use glog for logging * Replace temporary file with in-memory buffer for needle blob serialization * test(volume.merge): Add comprehensive unit and integration tests Add 7 unit tests covering: - Ordering by timestamp - Cross-stream duplicate deduplication - Empty stream handling - Complex multi-stream deduplication - Single stream passthrough - Large needle ID support - LastModified fallback when timestamp unavailable Add 2 integration validation tests: - TestMergeWorkflowValidation: Documents 9-stage merge workflow - TestMergeEdgeCaseHandling: Validates 10 edge case handling All tests passing (9/9) * fix(volume.merge): Use time window for deduplication to handle clock skew The same needle ID can have different timestamps on different servers due to clock skew and replication lag. Needles with the same ID within a 5-second time window are now treated as duplicates (same write with timestamp variance). 
Key changes: - Add mergeDeduplicationWindowNs constant (5 seconds) - Replace exact timestamp matching with time window comparison - Use windowInitialized flag to properly detect window transitions - Add TestMergeNeedleStreamsTimeWindowDeduplication test This ensures that replicated writes with slight timestamp differences are properly deduplicated during merge, while separate updates to the same file ID (outside the window) are preserved. All tests passing (10/10) * test: Add volume.merge integration tests with 5 comprehensive test cases * test: integration tests for volume.merge command * Fix integration tests: use TripleVolumeCluster for volume.merge testing - Created new TripleVolumeCluster framework (cluster_triple.go) with 3 volume servers - Rebuilt weed binary with volume.merge command compiled in - Updated all 5 integration tests to use TripleVolumeCluster instead of DualVolumeCluster - Tests now properly allocate volumes on 2 servers and let merge allocate on 3rd - All 5 integration tests now pass: - TestVolumeMergeBasic - TestVolumeMergeReadonly - TestVolumeMergeRestore - TestVolumeMergeTailNeedles - TestVolumeMergeDivergentReplicas * Refactor test framework: use parameterized server count instead of hardcoded - Renamed TripleVolumeCluster to MultiVolumeCluster with serverCount parameter - Replaced hardcoded volumePort0/1/2 with slices for flexible server count - Updated StartTripleVolumeCluster as backward-compatible wrapper calling StartMultiVolumeCluster(t, profile, 3) - Made directory creation, port allocation, and server startup loop-based - Updated accessor methods (VolumeAdminAddress, VolumeGRPCAddress, etc.) 
to support any server count - All 5 integration tests continue to pass with new parameterized cluster framework - Enables future testing with 2, 4, 5+ volume servers by calling StartMultiVolumeCluster directly * Consolidate cluster frameworks: StartDualVolumeCluster now uses MultiVolumeCluster - Made DualVolumeCluster a type alias for MultiVolumeCluster - Updated StartDualVolumeCluster to call StartMultiVolumeCluster(t, profile, 2) - Removed duplicate code from cluster_dual.go (now just 17 lines) - All existing tests using StartDualVolumeCluster continue to work without changes - Backward compatible: existing code continues to use the old function signatures - Added wrapper functions in cluster_multi.go for StartTripleVolumeCluster - Enables unified cluster management across all test suites * Address PR review comments: improve error handling and clean up code - Replace parse error swallow with proper error return - Log cleanup and restoration errors instead of silently discarding them - Remove unused offset field from memoryBackendFile struct - Fix WriteAt buffer truncation bug to preserve trailing bytes - All unit tests passing (10/10) - Code compiles successfully * Fix PR review findings: test improvements and code quality - Add timeout to runWeedShell to prevent hanging - Add server 1 readonly status verification in tests - Assert merge fails when replicas writable (not just log output) - Replace sleep with polling for writable restoration check - Fix WriteAt stale data snapshot bug in memoryBackendFile - Fix startVolume error logging to show current server log - Fix volumePubPorts double assignment in port allocation - Rename test to reflect behavior: DoesNotDeduplicateAcrossWindows - Fix misleading dedup window comment Unit tests: 10/10 passing Binary: Compiles successfully * Fix test assumption: merge command marks volumes readonly automatically TestVolumeMergeReadonly was expecting merge to fail on writable volumes, but the merge command is designed to mark 
volumes readonly as part of its operation. Fixed test to verify merge succeeds on writable volumes and properly restores writable state afterward. Removed redundant Test 2 code that duplicated the new behavior. * fmt * Fix deduplication logic to correctly handle same-stream vs cross-stream duplicates The dedup map previously used only NeedleId as key, causing same-stream overwrites to be incorrectly skipped as duplicates. Changed to track which stream first processed each needle ID in the current window: - Cross-stream duplicates (same ID from different streams, within window) are skipped - Same-stream duplicates (overwrites from same stream) are kept - Map now stores: needleId -> streamIndex of first occurrence in window Added TestMergeNeedleStreamsSameStreamDuplicates to verify same-stream overwrites are preserved while cross-stream duplicates are skipped. All unit tests passing (11/11) Binary compiles successfully	2026-02-25 10:12:09 -08:00
Chris Lu	da4edb5fe6	Fix live volume move tail timestamp (#8440) * Improve move tail timestamp * Add move tail timestamp integration test * Simulate traffic during move	2026-02-24 20:07:26 -08:00
Chris Lu	cd6832249b	Fix volume.fsck crashing on EC volumes and add multi-volume vacuum support (#8406) * helm: refine openshift-values.yaml to remove hardcoded UIDs Remove hardcoded runAsUser, runAsGroup, and fsGroup from the openshift-values.yaml example. This allows OpenShift's admission controller to automatically assign a valid UID from the namespace's allocated range, avoiding "forbidden" errors when UID 1000 is outside the permissible range. Updates #8381, #8390. * helm: fix volume.logs and add consistent security context comments * Update README.md * fix volume.fsck crashing on EC volumes and add multi-volume vacuum support * address comments	2026-02-22 22:07:15 -08:00
Konstantin Lebedev	01b3125815	[shell]: volume balance capacity by min volume density (#8026) volume balance by min volume density and active volumes	2026-02-19 13:30:59 -08:00
Chris Lu	f44e25b422	fix(iam): ensure access key status is persisted and defaulted to Active (#8341) * Fix master leader election startup issue Fixes #error-log-leader-not-selected-yet * not useful test * fix(iam): ensure access key status is persisted and defaulted to Active * make pb * update tests * using constants	2026-02-13 20:28:41 -08:00
Lisandro Pin	fbe7dd32c2	Implement full scrubbing for regular volumes (#8254) Implement full scrubbing for regular volumes.	2026-02-13 15:47:29 -08:00
Lisandro Pin	221bd237c4	Fix file stat collection metric bug for the `cluster.status` command. (#8302) When the `--files` flag is present, `cluster.status` will scrape file metrics from volume servers to provide detailed stats on those. The progress indicator was not being updated properly though, so the command would complete before it read 100%.	2026-02-11 13:34:20 -08:00
Chris Lu	a3136c523f	Fix volume.fsck 401 Unauthorized by adding JWT to HTTP delete requests (#8306) * Fix volume.fsck 401 Unauthorized by adding JWT to HTTP delete requests * Additionally, for performance, consider fetching the jwt.filer_signing.key once before any loops that call httpDelete, rather than inside httpDelete itself, to avoid repeated configuration lookups.	2026-02-11 13:32:56 -08:00
Lisandro Pin	e657e7d827	Implement local scrubbing for EC volumes. (#8283)	2026-02-11 11:04:08 -08:00
Lisandro Pin	2a73219397	Add weed shell command `volumeServer.state` to query/update volume server state settings. (#8271) Add weed shell command `volumeServer.state` to query/update volume server states.	2026-02-11 11:02:37 -08:00
Lisandro Pin	f400fb44a0	Update `cluster.status` to resolve file details on EC volumes. (#8268 ) Also parallelizes queries for file metrics collections when the `--files` flag is specified, and improves the command's output for readability: ``` > cluster.status --files collecting file stats: 100% cluster: id: topo status: LOCKED nodes: 10 topology: 1 DC, 10 disks on 1 rack volumes: total: 3 volumes, 1 collection max size: 32 GB regular: 1/80 volume on 3 replicas, 3 writable (100%), 0 read-only (0%) EC: 2 EC volumes on 28 shards (14 shards/volume) storage: total: 269 MB (522 MB raw, 193.95%) regular volumes: 91 MB (272 MB raw, 300%) EC volumes: 178 MB (250 MB raw, 140%) files: total: 363 files, 300 readable (82.64%), 63 deleted (17.35%), avg 522 kB per file regular: 168 files, 105 readable (62.5%), 63 deleted (37.5%), avg 540 kB per file EC: 195 files, 195 readable (100%), 0 deleted (0%), avg 506 kB per file ```	2026-02-09 17:52:43 -08:00
Chris Lu	30812b85f3	fix ec.encode skipping volumes when one replica is on a full disk (#8227) * fix ec.encode skipping volumes when one replica is on a full disk This fixes issue #8218. Previously, ec.encode would skip a volume if ANY of its replicas resided on a disk with low free volume count. Now it accepts the volume if AT LEAST ONE replica is on a healthy disk. * refine noFreeDisk counter logic in ec.encode Ensure noFreeDisk is decremented if a volume initially marked as bad is later found to have a healthy replica. This ensures accurate summary statistics. * defer noFreeDisk counting and refine logging in ec.encode Updated logging to be replica-scoped and deferred noFreeDisk counting to the final pass over vidMap. This ensures that the counter only reflects volumes that are definitively excluded because all replicas are on full disks. * filter replicas by free space during ec.encode Updated doEcEncode to filter out replicas on disks with FreeVolumeCount < 2 before selecting the best replica for encoding. This ensures that EC shards are not generated on healthy source replicas that happen to be on disks with low free space.	2026-02-09 14:23:11 -08:00
Chris Lu	6a61037333	fix issue #8230: volume.fsck deletion logic to respect purgeAbsent flag (#8266) * fix issue #8230: volume.fsck deletion logic to respect purgeAbsent flag This commit fixes two issues in volume.fsck: 1. Missing chunks in existing volumes are now deleted if -reallyDeleteFilerEntries is set. 2. Missing volumes are now properly handled when a -volumeId filter is specified, allowing deletion of filer entries for those volumes. * address PR feedback for issue #8230 - Ensure volume filter is applied before reporting missing volumes - Fix potential nil-pointer dereferences in httpDelete method - Use proper error checking throughout httpDelete * address second round PR feedback for issue #8230 - Use fmt.Fprintf(c.writer, ...) instead of fmt.Printf - Add missing newline in "deleting path" log message	2026-02-09 13:23:17 -08:00
Lisandro Pin	63b846b73b	Parallelize operations for the `volume.scrub` and `ec.scrub` commands (#8247) Parallelize operations for the `volume.scrub` and `ec.scrub` commands.	2026-02-09 09:07:06 -08:00
Chris Lu	2ed5a8f65c	add tests	2026-02-09 01:37:56 -08:00
Feng Shao	963398ac8c	use ReadFull (#40) (#8240) * use ReadFull * fix error checking	2026-02-06 20:51:47 -08:00
Lisandro Pin	9d751a7b61	Contrib/volume scrub local (#8226)	2026-02-05 14:44:12 -08:00
Chris Lu	3306abae10	shell: add minCacheAge flag to remote.uncache command (#8225) * add minCacheAge flag to remote.uncache command #8221 * address code review feedback: add nil check and improve test isolation * address code review feedback: use consistent timestamp in FileFilter	2026-02-05 12:57:27 -08:00
Lisandro Pin	2ecbae3611	Add volume.scrub and ec.scrub shell commands to scrub regular & EC volumes on demand. (#8188 ) * Implement RPC skeleton for regular/EC volumes scrubbing. See https://github.com/seaweedfs/seaweedfs/issues/8018 for details. * Add `volume.scrub` and `ec.scrub` shell commands to scrub regular & EC volumes on demand. F.ex: ``` > ec.scrub --full Scrubbing 10.200.17.13:9005 (1/10)... Scrubbing 10.200.17.13:9001 (2/10)... Scrubbing 10.200.17.13:9008 (3/10)... Scrubbing 10.200.17.13:9009 (4/10)... Scrubbing 10.200.17.13:9004 (5/10)... Scrubbing 10.200.17.13:9010 (6/10)... Scrubbing 10.200.17.13:9007 (7/10)... Scrubbing 10.200.17.13:9002 (8/10)... Scrubbing 10.200.17.13:9003 (9/10)... Scrubbing 10.200.17.13:9006 (10/10)... Scrubbed 20 EC files and 20 volumes on 10 nodes Got scrub failures on 1 EC volumes and 2 EC shards :( Affected volumes: 10.200.17.13:9005:1 Details: [10.200.17.13:9005] expected 551041 bytes for needle 6, got 551072 [10.200.17.13:9005] needles in volume file (1) don't match index entries (173) for volume 1 ```	2026-02-04 17:08:31 -08:00
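
The replica selection rule described in the ec.encode fix (#8227) listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from that commit: the `replica` type and the `healthyReplicas` and `canEncode` helpers are hypothetical names introduced here, and only the `FreeVolumeCount < 2` threshold and the "at least one replica on a healthy disk" rule are taken from the commit message.

```go
package main

import "fmt"

// replica is a hypothetical stand-in for one copy of a volume together with
// the number of free volume slots on the disk that holds it.
type replica struct {
	Node            string
	FreeVolumeCount int
}

// minFreeVolumeCount mirrors the "FreeVolumeCount < 2" threshold quoted in
// the commit message; replicas below it are filtered out.
const minFreeVolumeCount = 2

// healthyReplicas returns only the replicas whose disks still have enough
// free volume slots to be considered as an encoding source.
func healthyReplicas(replicas []replica) []replica {
	var healthy []replica
	for _, r := range replicas {
		if r.FreeVolumeCount >= minFreeVolumeCount {
			healthy = append(healthy, r)
		}
	}
	return healthy
}

// canEncode reports whether at least one replica sits on a healthy disk,
// which is the acceptance rule the commit message describes.
func canEncode(replicas []replica) bool {
	return len(healthyReplicas(replicas)) > 0
}

func main() {
	// Hypothetical node addresses, used only for illustration.
	replicas := []replica{
		{Node: "a:8080", FreeVolumeCount: 0}, // full disk, filtered out
		{Node: "b:8080", FreeVolumeCount: 5}, // healthy disk, kept
	}
	fmt.Println("can encode:", canEncode(replicas))
	fmt.Println("candidates:", healthyReplicas(replicas))
}
```

Under the earlier behaviour described in the same commit message, the single full disk in this example would have caused the whole volume to be skipped; with the filter applied first, the volume is accepted and only the replica on the healthy disk remains a candidate.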

1031 Commits