seaweedfs

mirror of https://github.com/seaweedfs/seaweedfs.git synced 2026-05-19 16:21:28 +00:00

Author	SHA1	Message	Date
qzhello	60c76120fc	fix(shell): use exact match for volume.balance -racks/-nodes filter (#9279 ) * fix(shell): correct volume.list -writable filter unit and comparison * fix(shell): correct volume.list -writable filter unit and comparison * chore(shell): fix typo in EC shard helper param names * fix(shell): use exact match for volume.balance -racks/-nodes filter The old strings.Contains-based filter quietly included any id that was a substring of the user-supplied flag value (e.g. -racks=rack10 also matched rack1). Replace it with an exact-match set parsed from the comma-separated flag value, and add regression tests for both -racks and -nodes paths. Also fix a small typo in the "remote storage" error returned by maybeMoveOneVolume. * fix(shell): use exact match for volume.balance -racks/-nodes filter The old strings.Contains-based filter quietly included any id that was a substring of the user-supplied flag value (e.g. -racks=rack10 also matched rack1). Replace it with an exact-match set parsed from the comma-separated flag value, and add regression tests for both -racks and -nodes paths. Also fix a small typo in the "remote storage" error returned by maybeMoveOneVolume. * refactor(shell): drop nil sentinel in splitCSVSet, use len() in callers	2026-04-29 10:19:16 -07:00
qzhello	108e42fb8b	chore(shell): fix typo in EC shard helper param names (#9277 ) * fix(shell): correct volume.list -writable filter unit and comparison * fix(shell): correct volume.list -writable filter unit and comparison * chore(shell): fix typo in EC shard helper param names	2026-04-28 23:09:26 -07:00
Chris Lu	294f7c3d04	shell: expand `~` in local file path arguments (#9265 ) * shell: expand `~` in local file path arguments The weed shell parses commands itself instead of going through an OS shell, so a path like `~/Downloads/foo.meta` was passed verbatim to `os.Open`, which fails because no `~` directory exists. Users had to spell out absolute home paths in every command. Add an `expandHomeDir` helper that resolves a leading `~` or `~/...` to the user's home directory, and run user-supplied local file paths in the affected shell commands through it: fs.meta.load (positional file) fs.meta.save (-o) fs.meta.changeVolumeId (-mapping) s3.iam.export (-file) s3.iam.import (-file) s3.policy (-file) s3tables.bucket (-file) s3tables.table (-file, -metadata) volume.fsck (-tempPath) Filer-namespace path flags (`-dir`, `-path`, `-locationPrefix`, etc.) are unaffected; they live in the filer, not on the local FS. * shell: reuse util.ResolvePath instead of a new helper util.ResolvePath already does tilde expansion; drop the local expandHomeDir helper and route every shell call site through it.	2026-04-28 12:30:13 -07:00
qzh	21fadf5582	fix(shell): correct volume.list -writable filter unit and comparison (#9231 ) * fix(shell): correct volume.list -writable filter unit and comparison * fix(shell): correct volume.list -writable filter unit and comparison	2026-04-26 22:20:46 -07:00
Chris Lu	da2e90aefd	fix(mount): sanitize non-UTF-8 filenames; keep marshal errors per-request (#9207 ) * fix(mount): sanitize non-UTF-8 filenames; keep marshal errors per-request (#9139) A single file with invalid-UTF-8 bytes in its name (e.g. a GNOME Trash "partial" like \x10\x98=\\\x8a\x7f.trashinfo.9a51454f.partial) made every FUSE-initiated filer RPC fail with: rpc error: code = Internal desc = grpc: error while marshaling: string field contains invalid UTF-8 and then produced an avalanche of "connection is closing" errors on unrelated LookupEntry / ReadDirAll / UpdateEntry calls, causing the volume-server QPS dips reported in #9139. Root cause is twofold: 1. Proto3 `string` fields require valid UTF-8, but the FUSE kernel passes raw name bytes. Create/Mknod/Mkdir/Unlink/Rmdir/Rename/Lookup/Link/ Symlink all forwarded those bytes directly into CreateEntryRequest.Name, DeleteEntryRequest.Name, StreamRenameEntryRequest.{Old,New}Name and Entry.Name. saveDataAsChunk also copied the FullPath into AssignVolumeRequest.Path unchecked. 2. When the marshal failed, shouldInvalidateConnection treated the resulting codes.Internal as a connection problem and dropped the shared cached ClientConn — canceling every other in-flight RPC on it. Fix: - Add sanitizeFuseName (strings.ToValidUTF8 with '?' replacement, matching util.FullPath.DirAndName) and make checkName return the sanitized name. Apply at every FUSE entry point that passes a name to the filer RPC, including Unlink/Rmdir (which did not previously call checkName) and both oldName/newName in Rename. Add a backstop scrub for AssignVolumeRequest.Path so async flush paths cannot reintroduce invalid bytes from a pre-sanitization cached FullPath. - In weed/pb.shouldInvalidateConnection, detect client-side marshal errors via the gRPC library's "error while marshaling" prefix and return false: the connection is healthy, only the request is bad. Refs: https://github.com/seaweedfs/seaweedfs/issues/9139#issuecomment-4301184231 * fix(mount,util): use '_' for invalid-UTF-8 replacement (URL-safe) Sanitized filenames flow downstream into HTTP URLs (volume-server uploads, filer HTTP API, S3/WebDAV gateways). '?' is the URL query-string delimiter and would split the path the first time the name lands in one, so swap every invalid-UTF-8 replacement to '_'. This covers the two pre-existing sites in weed/util/fullpath.go as well, keeping all paths sanitized the same way. * refactor(pb): detect client-side marshal errors via errors.As, not substring Replace the raw `strings.Contains(err.Error(), ...)` check with a type-based carve-out: use errors.As against the `GRPCStatus() Status` interface to pull the original Status out of any fmt.Errorf("...: %w") wrapping, then match the library-owned "grpc:" prefix on that Status's Message. Why not errors.Is against a proto-level sentinel: gRPC's encode() collapses the inner proto error with "%v" (stringification) before wrapping it in a Status, so the original error type does not survive into the caller. The Status itself is the structural signal that does survive. Why not status.FromError: when the caller wraps the Status error with fmt.Errorf("...: %w", ...), status.FromError rewrites Status.Message with the full err.Error() of the outermost wrapper, which defeats a prefix check on the library-owned message. errors.As gives us the original Status whose Message is still verbatim from the gRPC library. A new test asserts that a plain errors.New("grpc: error while marshaling: …") — i.e. the same text attached to something that is NOT a gRPC status — does not short-circuit invalidation, so we never silently keep a cached connection alive based on a coincidental substring match. refactor(util): centralize UTF-8 sanitization; add FullPath.Sanitized Addresses review feedback on PR #9207. Nitpick: every invalid-UTF-8 replacement across the codebase (DirAndName, Name, mount.sanitizeFuseName, the weedfs_write.go backstop) now goes through a single util.SanitizeUTF8Name helper, so the replacement char ('_' — URL-safe) is chosen in one place. Outside-diff: three proto fields took raw FullPath strings that could break marshaling if an entry ever carried invalid UTF-8 (CreateEntryRequest.Directory in Mkdir, DeleteEntryRequest.Directory in Unlink, AssignVolumeRequest.Path in command_fs_merge_volumes). The reviewer's suggested fix — using DirAndName() — would have silently changed Directory from parent to grandparent, because DirAndName sanitizes only the trailing component. Added FullPath.Sanitized(), which scrubs every component, and applied it at the three sites. Exposure is narrow in practice (FUSE-boundary sanitization and the gRPC-side isClientSideMarshalError carve-out already cover the #9139 cascade), but the defense-in-depth is cheap and consistent with the existing AssignVolume backstop. New tests in weed/util/fullpath_test.go document: - SanitizeUTF8Name: valid UTF-8 passes through unchanged; invalid bytes become '_' (not '?', which is URL-special). - FullPath.Sanitized: scrubs bytes in any component, not just the last. - FullPath.DirAndName: dir remains raw on purpose — callers needing a clean full path must use Sanitized(). The test pins this behavior so it is not accidentally "fixed" in a way that changes the (dir, name) semantics callers depend on.	2026-04-23 19:17:35 -07:00
Chris Lu	7364f148bd	fix(s3/shell): factor EC volumes into bucket size metrics and collection.list (#9182 ) * fix(s3/shell): include EC volumes in bucket size metrics and collection.list S3 bucket size metrics exported to Prometheus (and fed through stats.UpdateBucketSizeMetrics) are computed by collectCollectionInfoFromTopology, which only walked diskInfo.VolumeInfos. As soon as a volume was encoded to EC it dropped out of every aggregate, so Grafana showed bucket sizes shrinking while physical disk usage kept climbing. The shell helper collectCollectionInfo — used by collection.list and s3.bucket.quota.enforce — had the same gap, with the EC branch left as a commented-out TODO. Fold EC shards into both paths using the same approach the admin dashboard already uses (PR #9093): - PhysicalSize / Size sum across shard holders: EC shards are node-local (not replicas), so per-node TotalSize() and MinusParityShards().TotalSize() sum to the whole-volume physical and logical sizes respectively. - FileCount is deduped via max across reporters (every shard holder reports the same .ecx count; a slow node with a not-yet-loaded .ecx reports 0 and must not pin the aggregate). - DeleteCount is summed (each delete tombstones exactly one node's .ecj). - VolumeCount increments once per unique EC volume id. Adds regression tests covering pure-EC, mixed regular+EC, and the slow-reporter FileCount dedupe case. Refs #9086 * Address PR review feedback: EC size helpers, composite key, VolumeCount dedupe - Add EcShardsTotalSize / EcShardsDataSize helpers in the erasure_coding package that walk the shard bitmap directly instead of materializing a ShardsInfo and copying it via MinusParityShards(). Keeps the DataShardsCount dependency encapsulated in one place and avoids the per-shard allocation/copy overhead in the metrics hot path. - Switch shell collectCollectionInfo ecVolumes map to a composite {collection, volumeId} key, matching the bucket_size_metrics collector and defending against any cross-collection volume id aliasing. - Dedupe VolumeCount in shell addToCollection by volume id so regular volumes aren't counted once per replica presence. Aligns the shell's collection.list output with the S3 metrics collector and the EC branch, all of which now report logical volume counts. - Add unit tests for the new helpers and for the regular-volume VolumeCount dedupe. * Parameterize EcShardsDataSize with dataShards for custom EC ratios Add a dataShards parameter to EcShardsDataSize so forks with per-volume ratio metadata (e.g. the enterprise data_shards field carried on an extended VolumeEcShardInformationMessage) can pass the configured value and get accurate logical sizes under custom EC policies like 6+3 or 16+6. Passing 0 or a negative value falls back to the upstream DataShardsCount default, which is correct for the fixed 10+4 layout — so OSS callers in s3api and shell pass 0 and keep their current behavior. Added table cases covering the custom 6+3 and 16+6 paths so the parameterization is pinned by tests.	2026-04-21 20:17:42 -07:00
Chris Lu	49f62df2cf	fix(shell): mergeVolumes hard-link safety and cleaner cleanup logging (#9163 ) * fix(shell): skip hard-linked entries in fs.mergeVolumes Hard-linked entries share a chunk list with their siblings, but the filer's UpdateEntry only rewrites one entry at a time. Moving a chunk here leaves every other hard-linked sibling pointing at a fid that either gets deleted by the filer's own garbage step after UpdateEntry or by deleteMovedSourceNeedles (#9160) — either way, the siblings end up with dangling references. Skip the entry with a visible log line so operators know the file was bypassed and can handle it explicitly (copy-then-unlink, or dedup before merge). Detected via entry.HardLinkId being non-empty, which is the same signal the filer itself uses (weed/pb/filer_pb/filer.pb.go:451). Flagged by coderabbit on #9160 post-merge. * fix(shell): mergeVolumes suppresses 404 alongside 304 in source cleanup BatchDelete also returns StatusNotFound (404) for an already-deleted needle when ReadVolumeNeedle can't find it in the cookie-check path, not only StatusNotModified (304) from DeleteVolumeNeedle returning size 0. Both are benign races against a concurrent fsck purge or a replica that already reconciled, so don't clutter the output with "delete ... not found" warnings for them. Flagged by coderabbit on #9160 post-merge. * fix(shell): mergeVolumes merges hard-linked files via dedup on HardLinkId Hard-linked siblings share one chunk list through a KV blob keyed by HardLinkId (see weed/filer/filerstore_hardlink.go). UpdateEntry's setHardLink rewrites that blob and maybeReadHardLink overrides per-entry chunks with the blob's on every read, so a single UpdateEntry propagates new fids to every sibling automatically — the previous skip-hardlinks bailout was overly conservative and left hard-linked files stuck on merge-source volumes forever. Process each HardLinkId exactly once per run with a sync.Map so BFS workers in different directories synchronize without a global lock. First sibling carries the chunk move + UpdateEntry; later siblings find the id in the map and return — preventing the real race, which is two siblings trying to re-download an already-moved source needle or double-queue the same fid for deletion. Also address the log-spam review on deleteMovedSourceNeedles: an unreachable volume server returns one error per needle, so collapse multiple failures into a single per-server line with the first error as an example.	2026-04-20 17:55:20 -07:00
Chris Lu	caaa53aee3	fix(shell): ec.encode health-check key mismatch on K8s deployments (#9164 ) Build freeVolumeCountMap using dn.Address so the key matches wdclient.Location.Url during the subsequent lookup. Keying by dn.Id silently filtered out every replica in deployments where dn.Id is a short name (e.g. Kubernetes StatefulSet pod name) while the location Url is a FQDN:port, causing "no healthy replicas" even with ample free capacity. Also filter replicas before marking volumes readonly so that a failed health check no longer strands volumes in readonly state. Fixes #9145	2026-04-20 15:57:30 -07:00
Chris Lu	e725eb4079	refactor(shell): run volume.fsck purge once per volume, after all replicas (#9159 ) * refactor(shell): run volume.fsck purge once per volume, after all replicas The purge step in findExtraChunksInVolumeServers was nested inside the outer `for dataNodeId` loop, so it fired once per data-node iteration rather than once total. Two consequences: 1. The replica-intersection safety net was broken. The code marks a fid "found in all replicas" only after every replica has reported its orphans, but the purge ran after the first data node already, so fids contributed only by later replicas never got the `true` flag in time. Without `-forcePurging` that meant some legitimate orphans were never purged; with `-forcePurging` the flag was ignored so the bug was hidden. 2. Visible output got noisy: "purging orphan data for volume X..." printed 2-3 times per volume (N_datanodes * N_replicas RPCs to the same locations) since purgeFileIdsForOneVolume already fans out to every replica location via MasterClient.GetLocations. Split the work into two explicit phases: collect orphans from every replica first, then purge each volume once. Drop the per-replica loop around purgeFileIdsForOneVolume since it already handles all replicas internally. Keep the per-replica mark-writable loop (each replica's readonly bit has to be flipped before the purge RPC fans out to it). Also simplify the gating expression — `isSeveralReplicas && foundInAllReplicas` is redundant given the preceding `!isSeveralReplicas` branch — and replace `!(X > 0)` with the more idiomatic `len(X) == 0`. Related to #9116 follow-up on multiple fsck passes needed to fully clean a volume. * address review: per-replica readonly tracking, count-based intersection, defer-per-volume Three issues raised on the v1: 1. The readonly cleanup stored a single isReadOnlyReplicas[volumeId]=bool that flipped true if any replica was read-only, then the defer marked every replica in serverReplicas[volumeId] read-only on exit. If a volume had mixed replica modes (one RO, one RW), the originally-RW replica ended up RO after fsck returned. Track read-only state per replica in readOnlyServerReplicas[volumeId] and revert only those. 2. The defer inside the volumeId loop accumulated for the entire fsck run, so every volume we processed stayed writable until the whole command returned. Split the per-volume logic into purgeOneVolume so the defers unwind between volumes. 3. The intersection logic used a sticky bool that treated "seen on any 2 of 3 replicas" as "seen on all replicas" — a 3+-replica volume would get purged for fids only 2 replicas agreed on, which is what -forcePurging is supposed to opt into. Switch to a count-based map[fid]int compared against volumeReplicaCounts[volumeId], so we only purge without -forcePurging when every replica agrees. Also drop the now-unused serverReplicas map.	2026-04-20 15:32:47 -07:00
Chris Lu	08a7502b2c	fix(shell): error on missing volume id in fsck, mergeVolumes, vacuum (#9158 ) * fix(shell): error on missing volume id in fsck, mergeVolumes, vacuum Three shell commands silently report success when -volumeId / -fromVolumeId / -toVolumeId names a volume the master doesn't know about: typos, already-deleted volumes, and stale scripts all look identical to a clean no-op, which is what made the confusion in #9116 take as long as it did to diagnose. - volume.fsck: filter at the per-datanode loop drops unknown ids and findExtraChunksInVolumeServers ends with totalOrphanChunkCount==0, printing "no orphan data". - fs.mergeVolumes: createMergePlan iterates only known volumes, so an unknown -fromVolumeId produces an empty plan and we print just the "max volume size: N MB" header (indistinguishable from "nothing to merge"). - volume.vacuum: the master's VacuumVolume RPC silently iterates matching volumes; a missing id returns success having done nothing. Validate the requested ids against the current topology up front and return an explicit "volume(s) not found on master: [X Y]" error. Also drop a stale duplicate `if err != nil` in volume.fsck.Do left over from a prior refactor. Surfaces #9116 follow-up from madalee-com. * address review: propagate reloadVolumesInfo error; dedupe vacuum missing ids - fs.mergeVolumes: c.reloadVolumesInfo's return was ignored. If the master is unreachable or VolumeList fails, c.volumes stays empty and the new validation block reports "fromVolumeId X not found on master" — masking the real connection/RPC failure. Return the wrapped error instead. - volume.vacuum: "volume.vacuum -volumeId 5,5,5" on a missing volume 5 listed [5 5 5] in the error. Collect missing ids in a set so each missing id appears once. * address review: reject fromVolumeId/toVolumeId values that overflow uint32 flag.Uint produces a uint (64-bit on amd64), and the existing cast to needle.VolumeId silently truncates to uint32. A typo like `-fromVolumeId=4294967297` would wrap to volume 1 and slip past every other validation, so the merge would run against a completely different volume than the operator intended. Bail out with an explicit error when the raw flag value exceeds the uint32 range, before the cast.	2026-04-20 15:32:31 -07:00
Chris Lu	6bdd775963	feat(shell): fs.mergeVolumes deletes source needles after filer update (#9160 ) * feat(shell): fs.mergeVolumes deletes source needles after filer update Before this change, mergeVolumes only copied chunks to the destination volume and updated the filer — the source needle sat untouched on its original volume as a silent orphan. Operators had to run a separate volume.fsck + volume.vacuum pass to actually reclaim the space, and #9116 (comment 4282692876) showed how that pipeline can look exactly like "mergeVolumes did nothing": the source volume keeps reporting its original size even though every chunk has been logically moved out. Clean up the source inline. For each entry, track the pre-move fids as they're captured, and after the UpdateEntry RPC commits, issue BatchDelete on every replica of each source volume. Key invariants: - Source fids are only deleted AFTER UpdateEntry succeeds; if the filer write fails we skip the cleanup for that entry so we never delete data the filer still references. - rewriteManifestChunk grew a fourth return value so nested manifest and sub-chunk moves propagate their moved-source list back to the top-level callsite. The outer manifest itself is recorded at the callsite, since only the callsite sees the pre-rewrite fid. - deleteMovedSourceNeedles logs errors but never returns them. Propagating would abort TraverseBfs mid-merge, stranding remaining entries; logging leaves the fallback path (fsck reconciles later) intact. - StatusNotModified from the volume server is expected whenever a concurrent fsck purge beat us to the delete or a replica already reconciled — don't warn on it. Readonly source volumes are already rejected up front by createMergePlan, so by the time we reach the delete the source is writable. If a replica's readonly bit has flipped since then the delete will fail and get logged; the user can re-run once they've fixed the replica (same failure mode as today's fsck purge). Fixes the space-not-reclaimed half of #9116. Related design discussion: #8589. * address review: cast r.Status to int in StatusNotModified compare http.StatusNotModified is an untyped constant so the compare works as written, but the int32/int mixed-type signal trips static analyzers and PR tooling. Cast explicitly and note why.	2026-04-20 14:37:20 -07:00
Chris Lu	9a6b566fb1	fix(shell): volume.fsck keeps going past a single broken chunk manifest (#9140 ) * fix(shell): volume.fsck no longer aborts on a single broken chunk manifest Previously a single entry whose chunk-manifest could not be read (e.g. the manifest needle was missing or its sub-chunks pointed at a now-gone volume) caused collectFilerFileIdAndPaths to return immediately with "failed to ResolveChunkManifest". The whole fsck run failed, so an operator with even one corrupted file could not use volume.fsck to find or clean up unrelated orphan needles on other volumes — they had to locate and delete the bad entries first, blind, with no help from fsck. Log the resolution failure with the entry path, fall back to recording the top-level chunk fids the entry references (data fids and manifest fids themselves; sub-chunks behind the unresolvable manifest stay unknown), and keep traversing. Track the count of unresolved entries on the command struct and refuse -reallyDeleteFromVolume for the run when the count is non-zero, since the in-use fid set is incomplete and a purge could otherwise delete live sub-chunks behind the broken manifest. Read-only fsck still produces a useful (if conservatively over-reported) orphan listing so the operator can see and fix the broken entries first, then re-run with apply. Discovered while diagnosing #9116. * address review: use callback ctx and atomic counter - Pass the BFS callback's ctx to ResolveChunkManifest so a Ctrl+C / first-error cancellation propagates into the manifest fetch instead of using context.Background(). - TraverseBfs runs the callback across K=5 worker goroutines (filer_pb/filer_client_bfs.go), so the unresolvedManifestEntries field on commandVolumeFsck is shared across workers and was racing. Switch it to atomic.Int64 with Add/Load. * address review: reset counter per Do(), pass through ctx errors - commandVolumeFsck is a singleton registered in init() and reused across shell invocations. Without resetting the unresolved-manifest counter at the top of Do(), a single failed run permanently suppressed -reallyDeleteFromVolume in the same shell session. Reset to 0 right after flag parsing. - Treating context cancellation as manifest corruption was wrong: a Ctrl+C or deadline mid-traversal would inflate the counter and emit misleading "manifest broken" warnings for entries that were never examined. Detect context.Canceled / context.DeadlineExceeded and return the error so the BFS unwinds cleanly. Not changing the findMissingChunksInFiler branch's purgeAbsent / applyPurging gating: that path checks recorded filer fids against volume idx files, and a broken-manifest entry's recorded manifest fid will fail the existence check and get purged — which is the cleanup the operator wants for those entries. Adding a gate would block the exact use case the warning points them at.	2026-04-19 23:06:28 -07:00
Chris Lu	d57fc67022	fix(shell): fs.mergeVolumes now rewrites manifest chunks for large files (#9127 ) * fix(shell): fs.mergeVolumes now rewrites manifest chunks for large files Previously fs.mergeVolumes skipped any chunk whose IsChunkManifest flag was true, printing "Change volume id for large file is not implemented yet" and continuing. Because the BFS traversal only looks at top-level entry.Chunks, sub-chunks referenced inside a manifest were never considered either. For any file stored as a chunk manifest (large files go this path), chunks in the source volume stayed put, leaving behind a few MB of live data that vacuum and volume.deleteEmpty couldn't clean up. This change resolves each manifest chunk recursively, moves any sub-chunk whose volume id is in the merge plan via the existing moveChunk path, and re-serializes the manifest. If the manifest chunk itself lives in a source volume, or any sub-chunk moved, the new manifest blob is uploaded to a freshly assigned file id (the old needle becomes orphaned and is reclaimed by vacuum like any other moved chunk). Fixes #9116. * address review: batch UpdateEntry, fix dry-run, defer restore, avoid source volumes - Call UpdateEntry once per entry after the chunk loop instead of once per moved chunk (gemini nit). - In dry-run mode, mark anySubChanged when a sub-chunk in the plan is encountered and return changed=true after printing "rewrite manifest", so nested manifests also surface their would-rewrites (gemini nit). - Defer filer_pb.AfterEntryDeserialization so the manifest chunk list is restored even when proto.Marshal fails (coderabbit nit). - Reject AssignVolume results whose file id lands on a volume that is a source in the merge plan, and retry — otherwise the replacement manifest could be written to the volume being emptied (coderabbit).	2026-04-17 21:17:51 -07:00
Jaehoon Kim	96af27a131	feat(shell): add fs.distributeChunks command for even chunk distribution (#9117 ) * feat(shell): add fs.distributeChunks command for even chunk distribution Add a new weed shell command that redistributes a file's chunks evenly across volume server nodes. Supports three distribution modes via -mode flag: - primary: balance chunk ownership across nodes (default) - replica: balance both ownership and replica copies - round-robin: assign chunks by offset order for sequential read optimization (chunk[0]->A, chunk[1]->B, chunk[2]->C, ...) Additional options: - -nodes=N to target specific number of nodes - -apply to execute (dry-run by default) Usage: fs.distributeChunks -path=/buckets/file.dat fs.distributeChunks -path=/buckets/file.dat -mode=round-robin -apply fs.distributeChunks -path=/buckets/file.dat -mode=replica -apply fs.distributeChunks -path=/buckets/file.dat -nodes=5 -apply * fix(shell): improve fs.distributeChunks robustness and code quality - Propagate flag parse errors instead of swallowing them (return err) - Handle nil chunk.Fid by falling back to legacy FileId string parsing - Simplify node membership check using slices.Contains * fix(shell): fix dead round-robin print loop in fs.distributeChunks The loop was computing targetNode with sc.index%totalNodes (original chunk index) instead of the sequential position, and discarding it via _ = targetNode without printing anything. Replace with a correct loop using pos%totalNodes and actually print the first 12 node assignments. * fix(shell): compute replication/collection per-chunk in fs.distributeChunks Previously replication and collection were derived once from chunks[0] and reused for all moves, causing wrong volume placement for chunks belonging to different volumes or collections. Now each chunk looks up its own volumeInfoMap entry immediately before calling operation.Assign. * fix(shell): prefer assignResult.Auth JWT over local signing key in fs.distributeChunks When the master returns an Auth token in the Assign response, use it directly for the upload instead of generating a new JWT from the local viper signing key. Fall back to local key generation only when Auth is empty, matching the pattern used by other upload paths. * fix(shell): add timeout and error handling to delete requests in fs.distributeChunks The delete loop was ignoring http.NewRequest errors and had no timeout, risking a nil-request panic or indefinite block. Replace with http.NewRequestWithContext and a 30s timeout, handle request creation errors by incrementing deleteFailCount, and cancel the context immediately after Do returns. * feat(shell): parallelize chunk moves in fs.distributeChunks using ErrorWaitGroup Sequential chunk moves are a bottleneck for large LLM model files with hundreds or thousands of chunks. Use ErrorWaitGroup with DefaultMaxParallelization (10) to run download/assign/upload concurrently. Guard movedRecords appends, chunk.Fid updates, and writer output with a mutex. Individual chunk failures are non-fatal and logged inline; only successfully moved chunks are included in the metadata update. * fix(shell): try all replica URLs on download in fs.distributeChunks Previously only the first volume server URL was attempted, causing chunk moves to fail if that replica was unreachable. Now iterates through all URLs returned by LookupVolumeServerUrl and stops at the first success. * refactor(shell): apply extract method pattern to fs.distributeChunks Do() was a single ~615-line function. Break it into focused helpers: - lookupFileEntry: filer entry lookup - validateChunks: chunk manifest guard - collectVolumeTopology: master topology query + ownership mapping - buildDistributionCounts: chunk→node mapping and owner/copy tallies - selectActiveNodes: target node selection - printCurrentDistribution: per-node distribution table - planDistribution: mode-switch planning (primary/replica/round-robin) - printRedistributionPlan: before/after plan table - relevantNodes: active-or-occupied node filter Do() is now ~100 lines of orchestration; each helper has a single clear responsibility. * test(shell): add unit tests for fs.distributeChunks algorithms Cover all three distribution modes and supporting helpers: - shortName, relevantNodes - computeOwnerTarget (even/uneven split, inactive node drain) - buildDistributionCounts (normal + nil Fid fallback) - selectActiveNodes (all nodes / limited count) - planOwnerMoves (imbalanced → balanced, already balanced) - planDistribution primary (chunks balanced, no-op when even) - planDistribution round-robin (offset ordering, correct assignment) - planDistribution replica (owner + copy balancing) - printRedistributionPlan (output format) * fix(shell): add 5-minute timeout to chunk downloads in fs.distributeChunks Download requests had no per-request timeout, unlike delete operations which already use 30s. Replace readUrl() calls with inline http.NewRequestWithContext + context.WithTimeout(5m) so a hung volume server cannot block a goroutine indefinitely during redistribution. * fix(shell): remove redundant deleteOldChunks in fs.distributeChunks filer.UpdateEntry already calls deleteChunksIfNotNew internally, which computes the diff between old and new entry chunks and deletes the ones no longer referenced. Our explicit deleteOldChunks was racing with this filer-side cleanup, causing spurious 404 warnings on ~75% of deletes. Remove deleteOldChunks, movedChunkRecord type, and reduce executeChunkMoves return type to (int, error) for the moved count. * fix(shell): handle nil chunk.Fid via chunkVolumeId helper in fs.distributeChunks chunk.Fid.GetVolumeId() silently returns 0 for legacy chunks stored with a FileId string instead of a Fid struct, causing them to be skipped in the replica balancing loop and looked up incorrectly in volumeInfoMap. Introduce chunkVolumeId() that uses Fid when present and falls back to parsing the legacy FileId string, matching the logic in buildDistributionCounts. Apply it in the replica-mode copies loop and in executeChunkMoves' replication/collection lookup. * fix(shell): use already-parsed oldFid for volumeInfoMap lookup in fs.distributeChunks chunkVolumeId(chunk) was being called to look up replication/collection after oldFid had already been parsed and validated. Use oldFid.VolumeId directly to avoid redundant parsing and guarantee the correct volume ID regardless of whether chunk.Fid is nil. * fix(shell): improve correctness and robustness in fs.distributeChunks - Buffer download body before upload so dlCtx timeout only covers the GET request; upload runs with context.Background() via bytes.NewReader - Replace 'before, after := strings.Cut(...)' + '_ = before' with '_' as the first return value directly - Clone copiesCount before replica planner mutates it, keeping the caller's map immutable - Add nil-entry guard after filer LookupEntry to prevent panic on unexpected nil response * feat(shell): support chunk manifests in fs.distributeChunks Large files stored as chunk manifests were previously rejected. Resolve manifests up front via filer.ResolveChunkManifest, redistribute the underlying data chunks, then re-pack through filer.MaybeManifestize before UpdateEntry. The filer's MinusChunks resolves manifests on both sides of the diff, so old manifest and inner data chunks are GC'd automatically. * fix(shell): match master's SaveDataAsChunkFunctionType 5-param signature Master added expectedDataSize uint64; ignore it in shell-side saveAsChunk. --------- Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-04-17 21:09:36 -07:00
Chris Lu	979c54f693	fix(wdclient,volume): compare master leader with ServerAddress.Equals (#9089 ) * fix(wdclient,volume): compare master leader with ServerAddress.Equals Raft leader is advertised as host:httpPort.grpcPort, but clients dial host:httpPort. Raw string comparison against VolumeLocation.Leader / HeartbeatResponse.Leader therefore never matches, causing the masterclient and the volume server heartbeat loop to continuously "redirect" to the already-connected master, tearing down the stream and reconnecting. Use ServerAddress.Equals, which normalizes the grpc-port suffix. * fix(filer,mq): compare ServerAddress via Equals in two more sites filer bootstrap skip (MaybeBootstrapFromOnePeer) and the broker's local partition assignment check both compared a wire-supplied address string against the local self ServerAddress with raw string equality. Both are vulnerable to the same plain-vs-host:port.grpcPort mismatch as the masterclient/volume heartbeat sites: filer would bootstrap from itself, and the broker would fail to claim a partition it was actually assigned. Route both through ServerAddress.Equals. * fix(master,shell): more ServerAddress comparisons via Equals - raft_server_handlers.go HealthzHandler: s.serverAddr == leader would skip the child-lock check on the real leader when the two carry different plain/grpc-suffix forms, returning 200 OK instead of 423. - master_server.go SetRaftServer leader-change callback: the Leader() == Name() guard for ensureTopologyId could disagree with topology.IsLeader() (which already uses Equals), so leader-only initialization could be skipped after an election. - command_volume_merge.go isReplicaServer: the -target guard compared user-supplied host:port against NewServerAddressFromDataNode(...) with ==, letting an existing replica slip through when topology carries the embedded gRPC port. All routed through pb.ServerAddress.Equals. * fix(mq,cluster): more ServerAddress comparisons via Equals - broker_grpc_lookup.go GetTopicPublishers/GetTopicSubscribers: the partition ownership check gated listing on raw LeaderBroker == BrokerAddress().String(), so listings silently omitted partitions hosted locally when the assignment carried the other host:port / host:port.grpcPort form. - lock_client.go: LockHostMovedTo comparison and the seedFiler fallback guard both used raw string equality against configured filer addresses (which may be plain host:port while LockHostMovedTo comes back suffixed), causing spurious host-change churn and blocking the seed-filer fallback. * fix(mq): more ServerAddress comparisons via Equals - pub_balancer/allocate.go EnsureAssignmentsToActiveBrokers: direct activeBrokers.Get() lookup missed brokers when a persisted assignment carried a different address encoding than the registered broker key, triggering a bogus reassignment on every read/write cycle. Added a findActiveBroker helper that falls back to an Equals-based scan and canonicalizes the assignment in place so later writes are stable. - broker_grpc_lookup.go isLockOwner: used raw string equality between LockOwner() and BrokerAddress().String(), so a lock owner could fail to recognize itself and proxy local lookup/config/admin RPCs away. - pub_client/scheduler.go onEachAssignments: reused publisher jobs only on exact LeaderBroker match, so an encoding flip in lookup results tore down and recreated a stream to the same broker.	2026-04-15 12:29:31 -07:00
Chris Lu	10e7f0f2bc	fix(shell): s3.user.provision handles existing users by attaching policy (#9040 ) * fix(shell): s3.user.provision handles existing users by attaching policy Instead of erroring when the user already exists, the command now creates the policy and attaches it to the existing user via UpdateUser. Credentials are only generated and displayed for newly created users. * fix(shell): skip duplicate policy attachment in s3.user.provision Check if the policy is already attached before appending and calling UpdateUser, making repeated runs idempotent. * fix(shell): generate service account ID in s3.serviceaccount.create The command built a ServiceAccount proto without setting Id, which was rejected by credential.ValidateServiceAccountId on any real store. Now generates sa:<parent>:<uuid> matching the format used by the admin UI. * test(s3): integration tests for s3.* shell commands Adds TestShell* integration tests covering ~40 previously untested shell commands: user, accesskey, group, serviceaccount, anonymous, bucket, policy.attach/detach, config.show, and iam.export/import. Switches the test cluster's credential store from memory to filer_etc because the memory store silently drops groups and service accounts in LoadConfiguration/SaveConfiguration. * fix(shell): rollback policy on key generation failure in s3.user.provision If iam.GenerateRandomString or iam.GenerateSecretAccessKey fails after the policy was persisted, the policy would be left orphaned. Extracts the rollback logic into a local closure and invokes it on all failure paths after policy creation for consistency. * address PR review feedback for s3 shell tests and serviceaccount - s3.serviceaccount.create: use 16 bytes of randomness (hex-encoded) for the service account UUID instead of 4 bytes to eliminate collision risk - s3.serviceaccount.create: print the actual ID and drop the outdated "server-assigned" note (the ID is now client-generated) - tests: guard createdAK in accesskey rotate/delete subtests so sibling failures don't run invalid CLI calls - tests: requireContains/requireNotContains use t.Fatalf to fail fast - tests: Provision subtest asserts the "Attached policy" message on the second provision call for an existing user - tests: update extractServiceAccountID comment example to match the sa:<parent>:<uuid> format - tests: drop redundant saID empty-check (extractServiceAccountID fatals) * test(s3): use t.Fatalf for precondition check in serviceaccount test	2026-04-11 22:30:51 -07:00
Chris Lu	e648c76bcf	go fmt	2026-04-10 17:31:14 -07:00
Chris Lu	b1265de78f	feat(shell): add group management commands (#8993 ) * feat(shell): add group management commands Add weed shell commands for IAM group management: - s3.group.create -name <group> - s3.group.delete -name <group> - s3.group.list - s3.group.show -name <group> - s3.group.add-user -group <group> -user <user> - s3.group.remove-user -group <group> -user <user> All commands use GetConfiguration/PutConfiguration gRPC pattern, consistent with existing shell commands like s3.user.list. * fix: add nil check for Configuration in group shell commands Guard against nil Configuration response from GetConfiguration gRPC call to prevent potential panics. (Gemini review)	2026-04-08 14:03:26 -07:00
Chris Lu	7f3908297c	fix(weed/shell): suppress prompt when piped (#8990 ) * fix(weed/shell): suppress prompt when stdin or stdout is not a TTY When piping weed shell output (e.g. `echo "s3.user.list" \| weed shell \| jq`), the "> " prompt was written to stdout, breaking JSON parsers. `liner.TerminalSupported()` only checks platform support, not whether stdin/stdout are actual TTYs. Add explicit checks using `term.IsTerminal()` so the shell falls back to the non-interactive scanner path when piped. Fixes #8962 * fix(weed/shell): suppress informational logs unless -verbose is set Suppress glog info messages and connection status logs on stderr by default. Add -verbose flag to opt in to the previous noisy behavior. This keeps piped output clean (e.g. `echo "s3.user.list" \| weed shell \| jq`). * fix(weed/shell): defer liner init until after TTY check Move liner.NewLiner() and related setup (history, completion, interrupt handler) inside the interactive block so the terminal is not put into raw mode when stdout is redirected. Previously, liner would set raw mode unconditionally at startup, leaving the terminal broken when falling back to the scanner path. Addresses review feedback from gemini-code-assist. * refactor(weed/shell): consolidate verbose logging into single block Group all verbose stderr output within one conditional block instead of scattering three separate if-verbose checks around the filer logic. Addresses review feedback from gemini-code-assist. * fix(weed/shell): clean up global liner state and suppress logtostderr - Set line=nil after Close() to prevent stale state if RunShell is called again (e.g. in tests) - Add nil check in OnInterrupt handler for non-interactive sessions - Also set logtostderr=false when not verbose, in case it was enabled Addresses review feedback from gemini-code-assist. * refactor(weed/shell): make liner state local to eliminate data race Replace the package-level `line` variable with a local variable in RunShell, passing it explicitly to setCompletionHandler, loadHistory, and saveHistory. This eliminates a data race between the OnInterrupt goroutine and the defer that previously set the global to nil. Addresses review feedback from gemini-code-assist. * rename(weed/shell): rename -verbose flag to -debug Avoid conflict with -verbose flags already used by individual shell commands (e.g. ec.encode, volume.fix.replication, volume.check.disk).	2026-04-08 13:07:15 -07:00
Chris Lu	74905c4b5d	shell: s3.* commands always output JSON, connection messages to stderr (#8976 ) * shell: s3.* commands output JSON, connection messages to stderr All s3.user.* and s3.policy.attach\|detach commands now output structured JSON to stdout instead of human-readable text: - s3.user.create: {"name","access_key"} (secret key to stderr only) - s3.user.list: [{name,status,policies,keys}] - s3.user.show: {name,status,source,account,policies,credentials,...} - s3.user.delete: {"name"} - s3.user.enable/disable: {"name","status"} - s3.policy.attach/detach: {"policy","user"} Connection startup messages (master/filer) moved to stderr so they don't pollute structured output when piping. Closes #8962 (partial — covers merged s3.user/policy commands). * shell: fix secret leak, duplicate JSON output, and non-interactive prompt - s3.user.create: only echo secret key to stderr when auto-generated, never echo caller-supplied secrets - s3.user.enable/disable: fix duplicate JSON output — remove inner write in early-return path, keep single write site after gRPC call - shell_liner: use bufio.Scanner when stdin is not a terminal instead of liner.Prompt, suppressing the "> " prompt in piped mode * shell: check scanner error, idempotent enable output, history errors to stderr - Check scanner.Err() after non-interactive input loop to surface read errors - s3.user.enable: always emit JSON regardless of current state (idempotent) - saveHistory: write error messages to stderr instead of stdout	2026-04-07 16:27:21 -07:00
Chris Lu	fb0573ffc4	shell: rename -force to -apply in s3.iam.import for consistency	2026-04-07 14:17:07 -07:00
Chris Lu	d50889002b	shell: add s3.iam., s3.config.show, s3.user.provision; hide legacy commands (#8956 ) shell: add s3.iam., s3.config.show, s3.user.provision; hide legacy commands Add import/export, configuration summary, and a convenience provisioning command: - s3.iam.export: dump full IAM state as JSON (stdout or file) - s3.iam.import: replace IAM state from a JSON file - s3.config.show: human-readable summary (users, policies, service accounts, groups with status and counts) - s3.user.provision: one-step user+policy+credentials creation for common readonly/readwrite/admin roles Hide legacy commands from help listing: - s3.configure: still works but hidden from help output - s3.bucket.access: still works but hidden from help output Both hidden commands remain fully functional for existing scripts. Also adds a Hidden command tag and filters it from printGenericHelp. shell: address review feedback for s3.iam., s3.config.show, s3.user.provision - Simplify joinMax using strings.Join - Fix rolePolicies: remove s3:ListBucket from object-level actions (already covered by bucket-level statement) - Fix admin role: grant s3: on bucket resource too - Return flag parse errors instead of swallowing them * shell: address missed review feedback for PR 3 - s3.iam.import: require -force flag for destructive IAM overwrite - s3.config.show: add nil guard for resp.Configuration - s3.user.provision: check if user exists before creating policy - s3.user.provision: reject wildcard bucket names (* ?) * shell: distinguish NotFound from transient errors in provision, use %w wrapping - s3.user.provision: check gRPC status code on GetUser error — only proceed on NotFound, abort on transient/network errors - s3.iam.import: use %w for error wrapping to preserve error chains, wrap PutConfiguration error with context * shell: remove duplicate joinMax after PR 8954 merge command_s3_helpers.go defined joinMax which is already in command_s3_user_list.go from the merged PR 8954. * shell: restrict export file permissions, rollback policy on user create failure - s3.iam.export: use os.OpenFile with mode 0600 instead of os.Create to protect exported credentials from other users - s3.user.provision: rollback the created policy if CreateUser fails, with a warning if the rollback itself fails	2026-04-07 14:10:15 -07:00
Chris Lu	45bf3ad058	shell: add s3.user.* and s3.policy.attach\|detach commands (#8954 ) * shell: add s3.user.* and s3.policy.attach\|detach commands Add focused IAM shell commands following a noun-verb model: - s3.user.create: create user with auto-generated or explicit credentials - s3.user.list: tabular listing with status, policies, key count - s3.user.show: detailed user view (status, source, policies, credentials) - s3.user.delete: delete a user - s3.user.enable: enable a disabled user - s3.user.disable: disable a user (preserves credentials and policies) - s3.policy.attach: attach a named policy to a user - s3.policy.detach: detach a policy from a user These commands are thin wrappers over the existing IAM gRPC service, producing human-readable output instead of raw protobuf text. This is part of a larger effort to replace the monolithic s3.configure command with a composable set of single-purpose commands. * shell: address review feedback for s3.user.* and s3.policy.attach\|detach - Return flag parse errors instead of swallowing them (all commands) - Use GetConfiguration instead of N+1 GetUser calls in s3.user.list - Add nil check for resp.Identity in s3.user.show - Fix GetPolicy error masking in s3.policy.attach (wrap original error) - Simplify joinMax using strings.Join * shell: add nil identity guards and wrap gRPC errors - Add nil check for resp.Identity in policy_attach, policy_detach, user_enable, user_disable - Wrap GetUser errors with user context for better diagnostics	2026-04-07 11:26:57 -07:00
Chris Lu	d123a2768b	shell: add s3.accesskey., s3.anonymous., s3.serviceaccount.* commands (#8955 ) * shell: add s3.accesskey., s3.anonymous., s3.serviceaccount.* commands Add credential, anonymous access, and service account management commands: Access key commands: - s3.accesskey.create: add credentials to an existing user - s3.accesskey.list: list access keys for a user (key ID + status) - s3.accesskey.delete: remove a specific access key - s3.accesskey.rotate: atomic create-new + delete-old key rotation Anonymous access commands: - s3.anonymous.set: set/remove public access on a bucket - s3.anonymous.get: show anonymous access for a bucket - s3.anonymous.list: list all buckets with anonymous access Service account commands: - s3.serviceaccount.create: create with optional action subset and expiry - s3.serviceaccount.list: tabular listing, optionally filtered by parent - s3.serviceaccount.show: detailed view of a service account - s3.serviceaccount.delete: remove a service account These replace the credential and anonymous portions of the monolithic s3.configure and s3.bucket.access commands. * shell: address review feedback for s3.accesskey., s3.anonymous., s3.serviceaccount.* - Return flag parse errors instead of swallowing them (all commands) - Add action validation in s3.anonymous.set (Read, Write, List, Tagging, Admin) - Fix s3.serviceaccount.create output: note to use list for server-assigned ID since CreateServiceAccountResponse does not return the ID * shell: fix bucket matching and action validation in s3.anonymous.* - Use SplitN instead of HasSuffix for bucket name matching to avoid false positives when one bucket name is a suffix of another - Make action validation case-insensitive with canonical normalization * shell: fix nil panics, dedup actions, validate service account actions - Fix nil-pointer panic in getOrCreateAnonymousUser when GetUser returns err==nil with nil Identity (status.FromError(nil) returns nil status) - Add nil Identity guards in s3.anonymous.get and s3.anonymous.list - Deduplicate action values in s3.anonymous.set (e.g. -access Read,Read) - Add action validation in s3.serviceaccount.create with case normalization * shell: dedup actions and reject negative expiry in s3.serviceaccount.create - Deduplicate -actions values (e.g. Read,read,Read produces one entry) - Reject negative -expiry values instead of silently treating as no expiration	2026-04-07 11:20:15 -07:00
Chris Lu	0fed72d95a	volume.tier.move: fulfill target replication before deleting old replicas (#8950 ) * volume.tier.move: fulfill target replication before deleting old replicas When -toReplication is specified, volume.tier.move now creates all required replicas on the destination tier before deleting old replicas. This closes the data-loss window where only one copy existed on the target tier while awaiting volume.fix.replication. If replication fulfillment fails, old replicas are preserved and marked writable so the volume remains accessible. Also extracts replicateVolumeToServer and configureVolumeReplication helpers to reduce duplication across volume.tier.move and volume.fix.replication. Fixes #8937 * volume.tier.move: always fulfill replication before deleting old replicas When -toReplication is specified, use that replication setting. Otherwise, read the volume's existing replication from the super block. In both cases, all required replicas are created on the destination tier before old replicas are deleted. If replication fulfillment fails (e.g. not enough destination nodes), old replicas are preserved and marked writable so no data is lost. * volume.tier.move: address review feedback on ensureReplicationFulfilled - Add 5s delay before re-collecting topology to allow master heartbeat propagation after the move - Add nil guard for targetTierReplicas to prevent panic if the moved replica is not yet visible in the topology - Treat configureVolumeReplication failure as a hard error instead of a warning, so the rollback logic preserves old replicas * volume.tier.move: harden replication config error handling - Make configureVolumeReplication failure on the primary moved replica a hard error that aborts the move, instead of logging and continuing - Configure replication metadata on all existing target-tier replicas (not just newly created ones) when -toReplication is specified - Deletion of old replicas cannot affect new replicas since the locations list only contains pre-move servers (verified, no change) * volume.tier.move: fix cleanup deleting fulfilled replicas and broken recovery Fix 1: The cleanup loop now preserves pre-existing target-tier replicas that ensureReplicationFulfilled counted toward the replication target. Previously, a mixed-tier volume with an existing replica on the target tier could have that replica deleted right after being counted as fulfilled, leaving the volume under-replicated. ensureReplicationFulfilled now returns a preserveServers set that the deletion loop checks before removing any old replica. Fix 2: Failure paths after LiveMoveVolume (which deletes the source replica) now use restoreSurvivingReplicasWritable instead of markVolumeReplicasWritable. The old helper stopped on first error, so attempting to mark the already-deleted source writable would prevent all surviving replicas from being restored. The new helper skips the deleted source and continues through all remaining locations, logging per-replica errors instead of aborting. * volume.tier.move: mark preserved replicas writable, skip nodes with existing volume Fix 1: Preserved pre-existing target-tier replicas were left read-only after the move completed. They were marked read-only at the start (along with all other replicas) but never restored since the old code deleted them. Now they are explicitly marked writable before cleanup. Fix 2: The fulfillment loop could pick a candidate node that already hosts this volume on a different disk type, causing a VolumeCopy conflict. Added a guard that skips any node already hosting the volume (on any disk) before attempting replication.	2026-04-06 14:55:37 -07:00
Chris Lu	995dfc4d5d	chore: remove ~50k lines of unreachable dead code (#8913 ) * chore: remove unreachable dead code across the codebase Remove ~50,000 lines of unreachable code identified by static analysis. Major removals: - weed/filer/redis_lua: entire unused Redis Lua filer store implementation - weed/wdclient/net2, resource_pool: unused connection/resource pool packages - weed/plugin/worker/lifecycle: unused lifecycle plugin worker - weed/s3api: unused S3 policy templates, presigned URL IAM, streaming copy, multipart IAM, key rotation, and various SSE helper functions - weed/mq/kafka: unused partition mapping, compression, schema, and protocol functions - weed/mq/offset: unused SQL storage and migration code - weed/worker: unused registry, task, and monitoring functions - weed/query: unused SQL engine, parquet scanner, and type functions - weed/shell: unused EC proportional rebalance functions - weed/storage/erasure_coding/distribution: unused distribution analysis functions - Individual unreachable functions removed from 150+ files across admin, credential, filer, iam, kms, mount, mq, operation, pb, s3api, server, shell, storage, topology, and util packages * fix(s3): reset shared memory store in IAM test to prevent flaky failure TestLoadIAMManagerFromConfig_EmptyConfigWithFallbackKey was flaky because the MemoryStore credential backend is a singleton registered via init(). Earlier tests that create anonymous identities pollute the shared store, causing LookupAnonymous() to unexpectedly return true. Fix by calling Reset() on the memory store before the test runs. * style: run gofmt on changed files * fix: restore KMS functions used by integration tests * fix(plugin): prevent panic on send to closed worker session channel The Plugin.sendToWorker method could panic with "send on closed channel" when a worker disconnected while a message was being sent. The race was between streamSession.close() closing the outgoing channel and sendToWorker writing to it concurrently. Add a done channel to streamSession that is closed before the outgoing channel, and check it in sendToWorker's select to safely detect closed sessions without panicking.	2026-04-03 16:04:27 -07:00
qzh	4c72512ea2	fix(shell): avoid marking skipped or unplaced volumes as fixed (#8866 ) * fix(s3api): fix AWS Signature V2 format and validation * fix(s3api): Skip space after "AWS" prefix (+1 offset) * test(s3api): add unit tests for Signature V2 authentication fix * fix(s3api): simply comparing signatures * validation for the colon extraction in expectedAuth * fix(shell): avoid marking skipped or unplaced volumes as fixed --------- Co-authored-by: chrislu <chris.lu@gmail.com> Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>	2026-04-01 01:20:25 -07:00
Chris Lu	af68449a26	Process .ecj deletions during EC decode and vacuum decoded volume (#8863 ) * Process .ecj deletions during EC decode and vacuum decoded volume (#8798) When decoding EC volumes back to normal volumes, deletions recorded in the .ecj journal were not being applied before computing the dat file size or checking for live needles. This caused the decoded volume to include data for deleted files and could produce false positives in the all-deleted check. - Call RebuildEcxFile before HasLiveNeedles/FindDatFileSize in VolumeEcShardsToVolume so .ecj deletions are merged into .ecx first - Vacuum the decoded volume after mounting in ec.decode to compact out deleted needle data from the .dat file - Add integration tests for decoding with non-empty .ecj files * storage: add offline volume compaction helper Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * ec: compact decoded volumes before deleting shards Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * ec: address PR review comments - Fall back to data directory for .ecx when idx directory lacks it - Make compaction failure non-fatal during EC decode - Remove misleading "buffer: 10%" from space check error message * ec: collect .ecj from all shard locations during decode Each server's .ecj only contains deletions for needles whose data resides in shards held by that server. Previously, sources with no new data shards to contribute were skipped entirely, losing their .ecj deletion entries. Now .ecj is always appended from every shard location so RebuildEcxFile sees the full set of deletions. * ec: add integration tests for .ecj collection during decode TestEcDecodePreservesDeletedNeedles: verifies that needles deleted via VolumeEcBlobDelete are excluded from the decoded volume. TestEcDecodeCollectsEcjFromPeer: regression test for the fix in collectEcShards. Deletes a needle only on a peer server that holds no new data shards, then verifies the deletion survives decode via .ecj collection. * ec: address review nits in decode and tests - Remove double error wrapping in mountDecodedVolume - Check VolumeUnmount error in peer ecj test - Assert 404 specifically for deleted needles, fail on 5xx --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-01 01:15:26 -07:00
Chris Lu	f256002d0b	fix ec.balance failing to rebalance when all nodes share all volumes (#8796 ) * fix ec.balance failing to rebalance when all nodes share all volumes (#8793) Two bugs in doBalanceEcRack prevented rebalancing: 1. Sorting by freeEcSlot instead of actual shard count caused incorrect empty/full node selection when nodes have different total capacities. 2. The volume-level check skipped any volume already present on the target node. When every node has a shard of every volume (common with many EC volumes across N nodes with N shards each), no moves were possible. Fix: sort by actual shard count, and use a two-pass approach - first prefer moving shards of volumes not on the target (best diversity), then fall back to moving specific shard IDs not yet on the target. * add test simulating real cluster topology from issue #8793 Uses the actual node addresses and mixed max capacities (80 vs 33) from the reporter's 14-node cluster to verify ec.balance correctly rebalances with heterogeneous node sizes. * fix pass comments to match 0-indexed loop variable	2026-03-27 11:14:10 -07:00
Lisandro Pin	e5cf2d2a19	Give the `ScrubVolume()` RPC an option to flag found broken volumes as read-only. (#8360 ) * Give the `ScrubVolume()` RPC an option to flag found broken volumes as read-only. Also exposes this option in the shell `volume.scrub` command. * Remove redundant test in `TestVolumeMarkReadonlyWritableErrorPaths`. `417051bb` slightly rearranges the logic for `VolumeMarkReadonly()` and `VolumeMarkWritable()`, so calling them for invalid volume IDs will actually yield that error, instead of checking maintnenance mode first.	2026-03-26 10:20:57 -07:00
Chris Lu	ccc662b90b	shell: add s3.bucket.access command for anonymous access policy (#8774 ) * shell: add s3.bucket.access command for anonymous access policy (#7738) Add a new weed shell command to view or change the anonymous access policy of an S3 bucket without external tools. Usage: s3.bucket.access -name <bucket> -access read,list s3.bucket.access -name <bucket> -access none Supported permissions: read, write, list. The command writes a standard bucket policy with Principal "" and warns if no anonymous IAM identity exists. shell: fix anonymous identity hint in s3.bucket.access warning The anonymous identity doesn't need IAM actions — the bucket policy controls what anonymous users can do. * shell: only warn about anonymous identity when write access is set Read and list operations use AuthWithPublicRead which evaluates bucket policies directly without requiring the anonymous identity. Only write operations go through the normal auth flow that needs it. * shell: rewrite s3.bucket.access to use IAM actions instead of bucket policies Replace the bucket policy approach with direct IAM identity actions, matching the s3.configure pattern. The user is auto-created if it does not exist. Usage: s3.bucket.access -name <bucket> -user anonymous -access Read,List s3.bucket.access -name <bucket> -user anonymous -access none s3.bucket.access -name <bucket> -user anonymous Actions are stored as "Action:bucket" on the identity, same as s3.configure -actions=Read -buckets=my-bucket. * shell: return flag parse errors instead of swallowing them * shell: normalize action names case-insensitively in s3.bucket.access Accept actions in any case (read, READ, Read) and normalize to canonical form (Read, Write, List, etc.) before storing. This matches the case-insensitive handling of "none" and avoids confusing rejections.	2026-03-25 23:09:53 -07:00
Chris Lu	7fbdb9b7b7	feat(shell): add volume.tier.compact command to reclaim cloud storage space (#8715 ) * feat(shell): add volume.tier.compact command to reclaim cloud storage space Adds a new shell command that automates compaction of cloud tier volumes. When files are deleted from remote-tiered volumes, space is not reclaimed on the cloud storage. This command orchestrates: download from remote, compact locally, and re-upload to reclaim deleted space. Closes #8563 * fix: log cleanup errors in compactVolumeOnServer instead of discarding them Helps operators diagnose leftover temp files (.cpd/.cpx) if cleanup fails after a compaction or commit failure. * fix: return aggregate error from loop and use regex for collection filter - Track and return error count when one or more volumes fail to compact, so callers see partial failures instead of always getting nil. - Use compileCollectionPattern for -collection in -volumeId mode too, so regex patterns work consistently with the flag description. Empty pattern (no -collection given) matches all collections.	2026-03-20 23:52:12 -07:00
Chris Lu	81369b8a83	improve: large file sync throughput for remote.cache and filer.sync (#8676 ) * improve large file sync throughput for remote.cache and filer.sync Three main throughput improvements: 1. Adaptive chunk sizing for remote.cache: targets ~32 chunks per file instead of always starting at 5MB. A 500MB file now uses ~16MB chunks (32 chunks) instead of 5MB chunks (100 chunks), reducing per-chunk overhead (volume assign, gRPC call, needle write) by 3x. 2. Configurable concurrency at every layer: - remote.cache chunk concurrency: -chunkConcurrency flag (default 8) - remote.cache S3 download concurrency: -downloadConcurrency flag (default raised from 1 to 5 per chunk) - filer.sync chunk concurrency: -chunkConcurrency flag (default 32) 3. S3 multipart download concurrency raised from 1 to 5: the S3 manager downloader was using Concurrency=1, serializing all part downloads within each chunk. This alone can 5x per-chunk download speed. The concurrency values flow through the gRPC request chain: shell command → CacheRemoteObjectToLocalClusterRequest → FetchAndWriteNeedleRequest → S3 downloader Zero values in the request mean "use server defaults", maintaining full backward compatibility with existing callers. Ref #8481 * fix: use full maxMB for chunk size cap and remove loop guard Address review feedback: - Use full maxMB instead of maxMB/2 for maxChunkSize to avoid unnecessarily limiting chunk size for very large files. - Remove chunkSize < maxChunkSize guard from the safety loop so it can always grow past maxChunkSize when needed to stay under 1000 chunks (e.g., extremely large files with small maxMB). * address review feedback: help text, validation, naming, docs - Fix help text for -chunkConcurrency and -downloadConcurrency flags to say "0 = server default" instead of advertising specific numeric defaults that could drift from the server implementation. - Validate chunkConcurrency and downloadConcurrency are within int32 range before narrowing, returning a user-facing error if out of range. - Rename ReadRemoteErr to readRemoteErr to follow Go naming conventions. - Add doc comment to SetChunkConcurrency noting it must be called during initialization before replication goroutines start. - Replace doubling loop in chunk size safety check with direct ceil(remoteSize/1000) computation to guarantee the 1000-chunk cap. * address Copilot review: clamp concurrency, fix chunk count, clarify proto docs - Use ceiling division for chunk count check to avoid overcounting when file size is an exact multiple of chunk size. - Clamp chunkConcurrency (max 1024) and downloadConcurrency (max 1024 at filer, max 64 at volume server) to prevent excessive goroutines. - Always use ReadFileWithConcurrency when the client supports it, falling back to the implementation's default when value is 0. - Clarify proto comments that download_concurrency only applies when the remote storage client supports it (currently S3). - Include specific server defaults in help text (e.g., "0 = server default 8") so users see the actual values in -h output. * fix data race on executionErr and use %w for error wrapping - Protect concurrent writes to executionErr in remote.cache worker goroutines with a sync.Mutex to eliminate the data race. - Use %w instead of %v in volume_grpc_remote.go error formatting to preserve the error chain for errors.Is/errors.As callers.	2026-03-17 16:49:56 -07:00
Chris Lu	c4d642b8aa	fix(ec): gather shards from all disk locations before rebuild (#8633 ) * fix(ec): gather shards from all disk locations before rebuild (#8631) Fix "too few shards given" error during ec.rebuild on multi-disk volume servers. The root cause has two parts: 1. VolumeEcShardsRebuild only looked at a single disk location for shard files. On multi-disk servers, the existing local shards could be on one disk while copied shards were placed on another, causing the rebuild to see fewer shards than actually available. 2. VolumeEcShardsCopy had a DiskId condition (req.DiskId == 0 && len(vs.store.Locations) > 0) that was always true, making the FindFreeLocation fallback dead code. This meant copies always went to Locations[0] regardless of where existing shards were. Changes: - VolumeEcShardsRebuild now finds the location with the most shards, then gathers shard files from other locations via hard links (or symlinks for cross-device) before rebuilding. Gathered files are cleaned up after rebuild. - VolumeEcShardsCopy now only uses Locations[DiskId] when DiskId > 0 (explicitly set). Otherwise, it prefers the location that already has the EC volume, falling back to HDD then any free location. - generateMissingEcFiles now logs shard counts and provides a clear error message when not enough shards are found, instead of passing through to the opaque reedsolomon "too few shards given" error. * fix(ec): update test to match skip behavior for unrepairable volumes The test expected an error for volumes with insufficient shards, but commit `5acb4578a` changed unrepairable volumes to be skipped with a log message instead of returning an error. Update the test to verify the skip behavior and log output. * fix(ec): address PR review comments - Add comment clarifying DiskId=0 means "not specified" (protobuf default), callers must use DiskId >= 1 to target a specific disk. - Log warnings on cleanup failures for gathered shard links. * fix(ec): read shard files from other disks directly instead of linking Replace the hard link / symlink gathering approach with passing additional search directories into RebuildEcFiles. The rebuild function now opens shard files directly from whichever disk they live on, avoiding filesystem link operations and cleanup. RebuildEcFiles and RebuildEcFilesWithContext gain a variadic additionalDirs parameter (backward compatible with existing callers). * fix(ec): clarify DiskId selection semantics in VolumeEcShardsCopy comment * fix(ec): avoid empty files on failed rebuild; don't skip ecx-only locations - generateMissingEcFiles: two-pass approach — first discover present/missing shards and check reconstructability, only then create output files. This avoids leaving behind empty truncated shard files when there are too few shards to rebuild. - VolumeEcShardsRebuild: compute hasEcx before skipping zero-shard locations. A location with an .ecx file but no shard files (all shards on other disks) is now a valid rebuild candidate instead of being silently skipped. * fix(ec): select ecx-only location as rebuildLocation when none chosen yet When rebuildLocation is nil and a location has hasEcx=true but existingShardCount=0 (all shards on other disks), the condition 0 > 0 was false so it was never promoted to rebuildLocation. Add rebuildLocation == nil to the predicate so the first location with an .ecx file is always selected as a candidate.	2026-03-14 20:59:47 -07:00
Chris Lu	5acb4578ab	Fix ec.rebuild failing on unrepairable volumes instead of skipping (#8632 ) * Fix ec.rebuild failing on unrepairable volumes instead of skipping them When an EC volume has fewer shards than DataShardsCount, ec.rebuild would return an error and abort the entire operation. Now it logs a warning and continues rebuilding the remaining volumes. Fixes #8630 * Remove duplicate volume ID in unrepairable log message --------- Co-authored-by: Copilot <copilot@github.com>	2026-03-14 16:18:29 -07:00
Chris Lu	f3c5ba3cd6	feat(filer): add lazy directory listing for remote mounts (#8615 ) * feat(filer): add lazy directory listing for remote mounts Directory listings on remote mounts previously only queried the local filer store. With lazy mounts the listing was empty; with eager mounts it went stale over time. Add on-demand directory listing that fetches from remote and caches results with a 5-minute TTL: - Add `ListDirectory` to `RemoteStorageClient` interface (delimiter-based, single-level listing, separate from recursive `Traverse`) - Implement in S3, GCS, and Azure backends using each platform's hierarchical listing API - Add `maybeLazyListFromRemote` to filer: before each directory listing, check if the directory is under a remote mount with an expired cache, fetch from remote, persist entries to the local store, then let existing listing logic run on the populated store - Use singleflight to deduplicate concurrent requests for the same directory - Skip local-only entries (no RemoteEntry) to avoid overwriting unsynced uploads - Errors are logged and swallowed (availability over consistency) * refactor: extract xattr key to constant xattrRemoteListingSyncedAt * feat: make listing cache TTL configurable per mount via listing_cache_ttl_seconds Add listing_cache_ttl_seconds field to RemoteStorageLocation protobuf. When 0 (default), lazy directory listing is disabled for that mount. When >0, enables on-demand directory listing with the specified TTL. Expose as -listingCacheTTL flag on remote.mount command. * refactor: address review feedback for lazy directory listing - Add context.Context to ListDirectory interface and all implementations - Capture startTime before remote call for accurate TTL tracking - Simplify S3 ListDirectory using ListObjectsV2PagesWithContext - Make maybeLazyListFromRemote return void (errors always swallowed) - Remove redundant trailing-slash path manipulation in caller - Update tests to match new signatures * When an existing entry has Remote != nil, we should merge remote metadata into it rather than replacing it. * fix(gcs): wrap ListDirectory iterator error with context The raw iterator error was returned without bucket/path context, making it harder to debug. Wrap it consistently with the S3 pattern. * fix(s3): guard against nil pointer dereference in Traverse and ListDirectory Some S3-compatible backends may return nil for LastModified, Size, or ETag fields. Check for nil before dereferencing to prevent panics. * fix(filer): remove blanket 2-minute timeout from lazy listing context Individual SDK operations (S3, GCS, Azure) already have per-request timeouts and retry policies. The blanket timeout could cut off large directory listings mid-operation even though individual pages were succeeding. * fix(filer): preserve trace context in lazy listing with WithoutCancel Use context.WithoutCancel(ctx) instead of context.Background() so trace/span values from the incoming request are retained for distributed tracing, while still decoupling cancellation. * fix(filer): use Store.FindEntry for internal lookups, add Uid/Gid to files, fix updateDirectoryListingSyncedAt - Use f.Store.FindEntry instead of f.FindEntry for staleness check and child lookups to avoid unnecessary lazy-fetch overhead - Set OS_UID/OS_GID on new file entries for consistency with directories - In updateDirectoryListingSyncedAt, use Store.UpdateEntry for existing directories instead of CreateEntry to avoid deleteChunksIfNotNew and NotifyUpdateEvent side effects * fix(filer): distinguish not-found from store errors in lazy listing Previously, any error from Store.FindEntry was treated as "not found," which could cause entry recreation/overwrite on transient DB failures. Now check for filer_pb.ErrNotFound explicitly and skip entries or bail out on real store errors. * refactor(filer): use errors.Is for ErrNotFound comparisons	2026-03-13 09:36:54 -07:00
Peter Dodd	0e570d6a8f	feat(remote.mount): add -metadataStrategy flag to control metadata caching (#8568 ) * feat(remote): add -noSync flag to skip upfront metadata pull on mount Made-with: Cursor * refactor(remote): split mount setup from metadata sync Extract ensureMountDirectory for create/validate; call pullMetadata directly when sync is needed. Caller controls sync step for -noSync. Made-with: Cursor * fix(remote): validate mount root when -noSync so bad bucket/creds fail fast When -noSync is used, perform a cheap remote check (ListBuckets and verify bucket exists) instead of skipping all remote I/O. Invalid buckets or credentials now fail at mount time. Made-with: Cursor * test(remote): add TestRemoteMountNoSync for -noSync mount and persisted mapping Made-with: Cursor * test(remote): assert no upfront metadata after -noSync mount After remote.mount -noSync, run fs.ls on the mount dir and assert empty listing so the test fails if pullMetadata was invoked eagerly. Made-with: Cursor * fix(remote): propagate non-ErrNotFound lookup errors in ensureMountDirectory Return lookupErr immediately for any LookupDirectoryEntry failure that is not filer_pb.ErrNotFound, so only the not-found case creates the entry and other lookup failures are reported to the caller. Made-with: Cursor * fix(remote): use errors.Is for ErrNotFound in ensureMountDirectory Replace fragile strings.Contains(lookupErr.Error(), ...) with errors.Is(lookupErr, filer_pb.ErrNotFound) before calling CreateEntry. Made-with: Cursor * fix(remote): use LookupEntry so ErrNotFound is recognised after gRPC Raw gRPC LookupDirectoryEntry returns a status error, not the sentinel, so errors.Is(lookupErr, filer_pb.ErrNotFound) was always false. Use filer_pb.LookupEntry which normalises not-found to ErrNotFound so the mount directory is created when missing. Made-with: Cursor * test(remote): ignore weed shell banner in TestRemoteMountNoSync fs.ls count Exclude master/filer and prompt lines from entry count so the assertion checks only actual fs.ls output for empty -noSync mount. Made-with: Cursor * fix(remote.mount): use 0755 for mount dir, document bucket-less early return Made-with: Cursor * feat(remote.mount): replace -noSync with -metadataStrategy=lazy\|eager - Add -metadataStrategy flag (eager default, lazy skips upfront metadata pull) - Accept lazy/eager case-insensitively; reject invalid values with clear error - Rename TestRemoteMountNoSync to TestRemoteMountMetadataStrategyLazy - Add TestRemoteMountMetadataStrategyEager and TestRemoteMountMetadataStrategyInvalid Made-with: Cursor * fix(remote.mount): validate strategy and remote before creating mount directory Move strategy validation and validateMountRoot (lazy path) before ensureMountDirectory so that invalid strategies or bad bucket/credentials fail without leaving orphaned directory entries in the filer. * refactor(remote.mount): remove unused remote param from ensureMountDirectory The remote RemoteStorageLocation parameter was left over from the old syncMetadata signature. Only remoteConf.Name is used inside the function. doc(remote.mount): add TODO for HeadBucket-style validation validateMountRoot currently lists all buckets to verify one exists. Note the need for a targeted BucketExists method in the interface. * refactor(remote.mount): use MetadataStrategy type and constants Replace raw string comparisons with a MetadataStrategy type and MetadataStrategyEager/MetadataStrategyLazy constants for clarity and compile-time safety. * refactor(remote.mount): rename MetadataStrategy to MetadataCacheStrategy More precisely describes the purpose: controlling how metadata is cached from the remote, not metadata handling in general. * fix(remote.mount): remove validateMountRoot from lazy path Lazy mount's purpose is to skip remote I/O. Validating via ListBuckets contradicts that, especially on accounts with many buckets. Invalid buckets or credentials will surface on first lazy access instead. * fix(test): handle shell exit 0 in TestRemoteMountMetadataStrategyInvalid The weed shell process exits with code 0 even when individual commands fail — errors appear in stdout. Check output instead of requiring a non-nil error. * test(remote.mount): remove metadataStrategy shell integration tests These tests only verify string output from a shell process that always exits 0 — they cannot meaningfully validate eager vs lazy behavior without a real remote backend. --------- Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-03-12 15:21:07 -07:00
Copilot	013362d2d3	fix(shell): show planned size in fs.mergeVolumes log to clarify size limit check (#8553 ) The log message was comparing against the planned size of the destination volume (including volumes already planned to merge into it) but only displaying the raw volume size, making the output confusing when the displayed sizes clearly didn't add up to exceed the limit.	2026-03-11 13:56:13 -07:00
Chris Lu	b799650357	fix(shell): set LastLocalSyncTsNs in remote.copy.local so remote.uncache works (#8604 ) remote.uncache checks LastLocalSyncTsNs to determine if a file has been synced to remote. remote.copy.local was not setting this field, leaving it at 0, which caused uncache to skip all files uploaded via remote.copy.local. Fixes #8602	2026-03-11 12:55:45 -07:00
Chris Lu	e1e5b4a8a6	add admin script worker (#8491 ) * admin: add plugin lock coordination * shell: allow bypassing lock checks * plugin worker: add admin script handler * mini: include admin_script in plugin defaults * admin script UI: drop name and enlarge text * admin script: add default script * admin_script: make run interval configurable * plugin: gate other jobs during admin_script runs * plugin: use last completed admin_script run * admin: backfill plugin config defaults * templ Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * comparable to default version Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * default to run Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * format Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * shell: respect pre-set noLock for fix.replication * shell: add force no-lock mode for admin scripts * volume balance worker already exists Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * admin: expose scheduler status JSON * shell: add sleep command * shell: restrict sleep syntax * Revert "shell: respect pre-set noLock for fix.replication" This reverts commit `2b14e8b826`. * templ Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * fix import Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * less logs Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * Reduce master client logs on canceled contexts * Update mini default job type count --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-03 15:10:40 -08:00
Chris Lu	2dd3944819	Respect -minFreeSpace during ec.decode (#8467 ) * shell: add ec.decode ignoreMinFreeSpace flag Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * shell: respect minFreeSpace in ec.decode Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * shell: rename ec.decode minFreeSpace flag Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * shell: error when ec.decode has no shards Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * shell: select ec.decode target with zero shards Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * shell: adjust free counts across ec.decode Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * unused * Update weed/shell/command_ec_decode.go Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2026-02-27 23:54:30 -08:00
Chris Lu	9b6fc49946	Chart createBuckets config #8368 : Add TTL, Object Lock, and Versioning support (#8375 ) * Chart createBuckets config #8368: Add TTL, Object Lock, and Versioning support * Update weed/shell/command_s3_bucket_versioning.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * address comments * address comments * go fmt * fix failures are still treated like “bucket not found” --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-26 11:56:10 -08:00
Chris Lu	a3cb7fa8cc	go fmt	2026-02-25 10:25:23 -08:00
Chris Lu	b565a0cc86	Adds volume.merge command with deduplication and disk-based backend (#8441 ) * Enhance volume.merge command with deduplication and disk-based backend * Fix copyVolume function call with correct argument order and missing bool parameter * Revert "Fix copyVolume function call with correct argument order and missing bool parameter" This reverts commit `7b4a190643`. * Fix critical issues: per-replica writable tracking, tail goroutine cancellation via done channel, and debug logging for allocation failures * Optimize memory usage with watermark approach for duplicate detection * Fix critical issues: swap copyVolume arguments, increase idle timeout, remove file double-close, use glog for logging * Replace temporary file with in-memory buffer for needle blob serialization * test(volume.merge): Add comprehensive unit and integration tests Add 7 unit tests covering: - Ordering by timestamp - Cross-stream duplicate deduplication - Empty stream handling - Complex multi-stream deduplication - Single stream passthrough - Large needle ID support - LastModified fallback when timestamp unavailable Add 2 integration validation tests: - TestMergeWorkflowValidation: Documents 9-stage merge workflow - TestMergeEdgeCaseHandling: Validates 10 edge case handling All tests passing (9/9) * fix(volume.merge): Use time window for deduplication to handle clock skew The same needle ID can have different timestamps on different servers due to clock skew and replication lag. Needles with the same ID within a 5-second time window are now treated as duplicates (same write with timestamp variance). Key changes: - Add mergeDeduplicationWindowNs constant (5 seconds) - Replace exact timestamp matching with time window comparison - Use windowInitialized flag to properly detect window transitions - Add TestMergeNeedleStreamsTimeWindowDeduplication test This ensures that replicated writes with slight timestamp differences are properly deduplicated during merge, while separate updates to the same file ID (outside the window) are preserved. All tests passing (10/10) * test: Add volume.merge integration tests with 5 comprehensive test cases * test: integration tests for volume.merge command * Fix integration tests: use TripleVolumeCluster for volume.merge testing - Created new TripleVolumeCluster framework (cluster_triple.go) with 3 volume servers - Rebuilt weed binary with volume.merge command compiled in - Updated all 5 integration tests to use TripleVolumeCluster instead of DualVolumeCluster - Tests now properly allocate volumes on 2 servers and let merge allocate on 3rd - All 5 integration tests now pass: - TestVolumeMergeBasic - TestVolumeMergeReadonly - TestVolumeMergeRestore - TestVolumeMergeTailNeedles - TestVolumeMergeDivergentReplicas * Refactor test framework: use parameterized server count instead of hardcoded - Renamed TripleVolumeCluster to MultiVolumeCluster with serverCount parameter - Replaced hardcoded volumePort0/1/2 with slices for flexible server count - Updated StartTripleVolumeCluster as backward-compatible wrapper calling StartMultiVolumeCluster(t, profile, 3) - Made directory creation, port allocation, and server startup loop-based - Updated accessor methods (VolumeAdminAddress, VolumeGRPCAddress, etc.) to support any server count - All 5 integration tests continue to pass with new parameterized cluster framework - Enables future testing with 2, 4, 5+ volume servers by calling StartMultiVolumeCluster directly * Consolidate cluster frameworks: StartDualVolumeCluster now uses MultiVolumeCluster - Made DualVolumeCluster a type alias for MultiVolumeCluster - Updated StartDualVolumeCluster to call StartMultiVolumeCluster(t, profile, 2) - Removed duplicate code from cluster_dual.go (now just 17 lines) - All existing tests using StartDualVolumeCluster continue to work without changes - Backward compatible: existing code continues to use the old function signatures - Added wrapper functions in cluster_multi.go for StartTripleVolumeCluster - Enables unified cluster management across all test suites * Address PR review comments: improve error handling and clean up code - Replace parse error swallow with proper error return - Log cleanup and restoration errors instead of silently discarding them - Remove unused offset field from memoryBackendFile struct - Fix WriteAt buffer truncation bug to preserve trailing bytes - All unit tests passing (10/10) - Code compiles successfully * Fix PR review findings: test improvements and code quality - Add timeout to runWeedShell to prevent hanging - Add server 1 readonly status verification in tests - Assert merge fails when replicas writable (not just log output) - Replace sleep with polling for writable restoration check - Fix WriteAt stale data snapshot bug in memoryBackendFile - Fix startVolume error logging to show current server log - Fix volumePubPorts double assignment in port allocation - Rename test to reflect behavior: DoesNotDeduplicateAcrossWindows - Fix misleading dedup window comment Unit tests: 10/10 passing Binary: Compiles successfully * Fix test assumption: merge command marks volumes readonly automatically TestVolumeMergeReadonly was expecting merge to fail on writable volumes, but the merge command is designed to mark volumes readonly as part of its operation. Fixed test to verify merge succeeds on writable volumes and properly restores writable state afterward. Removed redundant Test 2 code that duplicated the new behavior. * fmt * Fix deduplication logic to correctly handle same-stream vs cross-stream duplicates The dedup map previously used only NeedleId as key, causing same-stream overwrites to be incorrectly skipped as duplicates. Changed to track which stream first processed each needle ID in the current window: - Cross-stream duplicates (same ID from different streams, within window) are skipped - Same-stream duplicates (overwrites from same stream) are kept - Map now stores: needleId -> streamIndex of first occurrence in window Added TestMergeNeedleStreamsSameStreamDuplicates to verify same-stream overwrites are preserved while cross-stream duplicates are skipped. All unit tests passing (11/11) Binary compiles successfully	2026-02-25 10:12:09 -08:00
Chris Lu	da4edb5fe6	Fix live volume move tail timestamp (#8440 ) * Improve move tail timestamp * Add move tail timestamp integration test * Simulate traffic during move	2026-02-24 20:07:26 -08:00
Chris Lu	cd6832249b	Fix volume.fsck crashing on EC volumes and add multi-volume vacuum support (#8406 ) * helm: refine openshift-values.yaml to remove hardcoded UIDs Remove hardcoded runAsUser, runAsGroup, and fsGroup from the openshift-values.yaml example. This allows OpenShift's admission controller to automatically assign a valid UID from the namespace's allocated range, avoiding "forbidden" errors when UID 1000 is outside the permissible range. Updates #8381, #8390. * helm: fix volume.logs and add consistent security context comments * Update README.md * fix volume.fsck crashing on EC volumes and add multi-volume vacuum support * address comments	2026-02-22 22:07:15 -08:00
Konstantin Lebedev	01b3125815	[shell]: volume balance capacity by min volume density (#8026 ) volume balance by min volume density and active volumes	2026-02-19 13:30:59 -08:00
Chris Lu	f44e25b422	fix(iam): ensure access key status is persisted and defaulted to Active (#8341 ) * Fix master leader election startup issue Fixes #error-log-leader-not-selected-yet * not useful test * fix(iam): ensure access key status is persisted and defaulted to Active * make pb * update tests * using constants	2026-02-13 20:28:41 -08:00
Lisandro Pin	fbe7dd32c2	Implement full scrubbing for regular volumes (#8254 ) Implement full scrubbing for regular volumes.	2026-02-13 15:47:29 -08:00
Lisandro Pin	221bd237c4	Fix file stat collection metric bug for the `cluster.status` command. (#8302 ) When the `--files` flag is present, `cluster.status` will scrape file metrics from volume servers to provide detailed stats on those. The progress indicator was not being updated properly though, so the command would complete before it read 100%.	2026-02-11 13:34:20 -08:00

1 2 3 4 5 ...

1043 Commits