seaweedfs

mirror of https://github.com/seaweedfs/seaweedfs.git synced 2026-05-21 17:21:34 +00:00

Author	SHA1	Message	Date
Chris Lu	391f543ff2	fix(ec): correct multi-disk disk counting and EC balance shard attribution (#9594) * fix(shell): count physical disks in cluster.status on multi-disk nodes The master keys DataNodeInfo.DiskInfos by disk type, so several same-type physical disks on one node collapse into a single DiskInfo entry. cluster.status (printClusterInfo) and CountTopologyResources counted len(DiskInfos), reporting one disk per node instead of the real physical disk count, while volume.list and the admin ActiveTopology already split per physical disk. Route both counters through DiskInfo.SplitByPhysicalDisk so a node with N same-type disks reports N. Cosmetic/diagnostic only; placement already uses the per-disk activeDisk map. * fix(ec): attribute EC balance source disk per shard and reject same-node moves On multi-disk nodes the EC balance worker built a node-level view that kept only the first physical disk id per (node, volume), so a move of a shard living on a different disk reported the wrong source disk. That source disk drives the per-disk capacity reservation, so the wrong disk drifts the capacity model the EC placement planner relies on. Track shards per physical disk and resolve the actual source disk for every emitted move (dedup, cross-rack, within-rack, global), keeping the per-disk view consistent as simulated moves are applied. Also close a data-loss trap: VolumeEcShardsDelete is node-wide (it removes the shard from every disk on the node) and copyAndMountShard skips the copy when source and target addresses match, so a same-node move would erase a shard it never copied. isDedupPhase now requires the same node AND disk, and Validate / Execute reject same-node cross-disk moves outright. * fix(ec): spread EC balance moves across destination disks Port the shell ec.balance pickBestDiskOnNode heuristic to the EC balance worker so a moved shard is placed on a good physical disk instead of always deferring to the volume server (target disk 0). 
The detection now builds a per-physical-disk view of each node (free slots split from the node total, exact EC shard count, disk type, discovered from both regular volumes and EC shards) and, for each cross-rack, within-rack, and global move, chooses the destination disk by ascending score: - fewer total EC shards on the disk, - far fewer shards of the same volume on the disk (spread a volume's shards across disks for fault tolerance), and - data/parity anti-affinity (a data shard avoids disks holding the volume's parity shards and vice versa). Planned placements are reserved on the in-memory model during a run so multiple shards moved to the same node spread across its disks rather than piling on one. * fix(ec): bring EC balance worker to parity with shell ec.balance The worker's cross-rack and within-rack balancing balanced shards by total count; the shell balances data and parity shards separately with anti-affinity and honors replica placement. Port that logic so the automatic balancer makes the same fault-tolerance-aware decisions as the manual command: - Cross-rack and within-rack now run a two-pass balance: data shards spread first, then parity shards spread while avoiding racks/nodes that already hold the volume's data shards (anti-affinity), mirroring doBalanceEcShardsAcrossRacks and doBalanceEcShardsWithinOneRack. - Optional replica placement: a new replica_placement config (e.g. "020") constrains shards per rack (DiffRackCount) and per node (SameRackCount); empty keeps the previous even-spread behavior. - The data/parity boundary is resolved from a per-collection EC ratio (standard 10+4 here), replacing the previously hardcoded constant at the call sites. Selection is deterministic (sorted keys) to keep behavior reproducible. 
* refactor(ec): extract shared ecbalancer package for shell and worker The EC shard balancing policy was duplicated between the shell ec.balance command and the admin EC balance worker, and the two had drifted (multi-disk handling, data/parity anti-affinity, replica placement). Extract the policy into a new pure package, weed/storage/erasure_coding/ecbalancer, that both callers share so it cannot drift again. - ecbalancer.Plan(topology, options) runs the full policy (dedup, cross-rack and within-rack data/parity two-pass with anti-affinity, global per-rack balance, and diversity-aware disk selection) over a caller-built Topology snapshot and returns the shard Moves. It depends only on erasure_coding and super_block. - The worker builds the Topology from the master topology and turns Moves into task proposals; the shell builds it from its EcNode model and executes Moves via the existing move/delete RPCs. Per-collection EC ratio resolution stays in each caller (passed as Options.Ratio). - Options expose the two genuine policy differences: GlobalUtilizationBased (worker balances by fractional fullness; shell by raw count) and GlobalMaxMovesPerRack (worker moves incrementally across cycles; shell drains in one pass). The shell keeps pickBestDiskOnNode for the evacuate command. Policy tests move to the ecbalancer package; the shell and worker keep their adapter/execution tests. * fix(ec): restore parallelism and per-type/full-range balancing after ecbalancer refactor Address regressions and gaps from the ecbalancer extraction: - Shell ec.balance honors -maxParallelization again: planned moves run phase by phase (preserving cross-phase dependencies) with bounded concurrency within a phase. Apply mode does only the RPCs concurrently; dry-run stays sequential and updates the in-memory model for inspection. 
- Rack and node balancing gate on per-type spread (data and parity separately) instead of combined totals, so a data/parity skew is corrected even when the per-rack/node totals are even. - Global rack balancing iterates the full shard-id space (MaxShardCount) so custom EC ratios with more than the standard total are candidates. - Cross-rack planning decrements the destination node's free slots per planned move, so limited-capacity targets are no longer over-planned. * fix(ec): make EC dedup keeper deterministic and capacity-aware When a shard is duplicated across nodes, keep the copy on the node with the most free slots and delete the duplicates from the more-constrained nodes, relieving capacity pressure where it is tightest. Tie-break on node id so the choice is deterministic. This unifies the shell and worker (the shell previously kept the least-free node, an incidental default) on the more sensible behavior. * fix(ec): restore global volume-diversity and per-volume move serialization Two more behaviors lost in the ecbalancer refactor: - Global rack balancing again prefers moving a shard of a volume the destination does not hold at all before adding another shard of an already-present volume (two-pass, mirroring the old balanceEcRack), keeping each volume's shards spread across nodes. - Shell apply-mode execution serializes a single volume's moves within a phase while still running different volumes in parallel, so concurrent moves of the same volume cannot race on its shared .ecx/.ecj/.vif sidecar files. * fix(ec): key EC balance shards by (collection, volume id) A numeric volume id can be reused across collections, and EC identity is (collection, vid) (see store_ec_attach_reservation.go). The ecbalancer keyed Node.shards by vid alone, so volumes sharing an id across collections merged into one entry — letting dedup delete a "duplicate" that is actually a different collection's shard, and letting moves act across collections. 
Key shards by (collection, vid) throughout so each volume stays distinct. * fix(ec): credit freed capacity from dedup before later balance phases Dedup deletions are simulated only by applyMovesToTopology, which cleared shard bits but did not return the freed disk/node/rack slots. Later phases reject destinations with no free slots, so a slot opened by dedup could not be reused in the same Plan/ec.balance run. applyMovesToTopology now credits the freed disk/node/rack capacity for dedup moves (non-dedup moves still rely on the inline accounting their phase already did). * test(ec): add multi-disk EC balance integration test Cover issue 9593 end-to-end at the unit level the old tests missed: build the master's actual multi-disk wire format (same-type disks collapsed into one DiskInfo, real DiskId only in per-shard records), run it through a real ActiveTopology and the Detection entry point, then replay the planned moves with the volume server's true semantics (node-wide VolumeEcShardsDelete) and assert no EC shard is ever lost. Covers a balanced spread, a one-node-concentrated volume, and a multi-rack spread, and asserts moves are safe (no same-node cross-disk), correctly attributed to the source disk, and redistribute concentrated volumes across both other racks and multiple destination disks. * fix(ec): aggregate per-disk EC shards when verifying multi-disk volumes collectEcNodeShardsInfo overwrote its per-server entry for each EcShardInfo of a volume. A multi-disk node reports one EcShardInfo per physical disk holding shards of the volume, so only the last disk's shards survived — the node looked like it was missing shards it actually had. This made ec.encode's pre-delete verification (and ec.decode) under-count volumes whose shards are spread across disks on one server, falsely aborting the encode on multi-disk clusters. Union the per-disk shard sets per server instead. 
Also make verifyEcShardsBeforeDelete poll briefly: shard relocations reach the master via volume-server heartbeats, so a freshly distributed shard set may not be fully visible the instant the balance returns. Retry before concluding the set is incomplete; genuine loss still fails after the retries are exhausted. * test(ec): end-to-end multi-disk EC balance shard-loss regression Start a real cluster of multi-disk volume servers (3 servers x 4 disks), EC-encode a volume, run ec.balance, and assert hard invariants the prior integration tests only logged: after encode all 14 shards exist, ec.balance loses no shard, shards span more than one disk per node, and cluster.status counts physical disks (not one per node). This reproduces issue 9593 end to end and would have caught the multi-disk shard-aggregation bug fixed alongside it. * fix(ec): bring EC balance worker/plugin path to parity with shell - Per-volume serialization and phase order: key the plugin proposal dedupe by (collection, volume) instead of (volume, shard, source), so the scheduler runs only one of a volume's moves at a time (within a run and against in-flight jobs). Concurrent same-volume moves raced on the volume's .ecx/.ecj/.vif sidecars; and because the planner emits a volume's moves in phase order, they now execute in order across detection cycles, matching the shell. - disk_type "hdd": normalize via ToDiskType (hdd -> "" HardDriveType) while keeping a "filter requested" flag, so disk_type=hdd matches the empty-keyed HDD disks instead of nothing; apply the canonical type to planner options and move params. - Replica placement: expose shard_replica_placement in the admin config form and read it into the worker config, mirroring ec.balance -shardReplicaPlacement. * test(ec): rename worker in-process test (not a real integration test) The worker-package multi-disk tests build a fake master topology and simulate move execution; they are not real-cluster integration tests. 
Rename integration_test.go -> multidisk_detection_test.go and drop the Integration prefix so 'integration' refers only to the real-cluster E2Es in test/erasure_coding. * ci(ec): remove redundant ec-integration workflow ec-integration.yml duplicated EC Integration Tests under the same workflow name but ran only 'go test ec_integration_test.go' (one file), so it never ran new test files (e.g. multidisk_shardloss_test.go) and was a strict, path-filtered subset of ec-integration-tests.yml, which already runs 'go test -v' over the whole test/erasure_coding package on every push/PR. * fix(ec): worker falls back to master default replication for EC balance For strict parity with the shell, the EC balance worker now uses the master's configured default replication as the replica-placement fallback when no explicit shard_replica_placement is set, instead of always defaulting to even spread. The maintenance scanner reads it via GetMasterConfiguration each cycle and passes it through ClusterInfo.DefaultReplicaPlacement; detection resolves the constraint (explicit config wins, else master default, else none) in resolveReplicaPlacement. A zero-replication default (the common 000 case) still means even spread, so the common configuration is unchanged. * fix(ec): plugin path populates master default replication too The plugin worker built ClusterInfo with only ActiveTopology, so the master default replication fallback added for the maintenance path never reached plugin-driven EC balance detection — empty shard_replica_placement still meant even spread there. Fetch the master default via GetMasterConfiguration (new pluginworker.FetchDefaultReplicaPlacement) and set ClusterInfo.DefaultReplicaPlacement so both detection paths resolve replica placement identically to the shell. 
* docs(ec): empty shard replica placement uses master default, not even spread The EC balance config text (admin plugin form, legacy form help text, and the struct/proto field comments) still said an empty shard_replica_placement spreads evenly. The runtime resolves empty to the master default replication (resolveReplicaPlacement), matching shell ec.balance, with even spread only when that default is empty or zero. Update the text to match and regenerate worker_pb for the proto comment change.	2026-05-20 23:31:21 -07:00
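The dedup rule described in the commit above (keep the copy on the node with the most free slots, tie-break on node id) can be sketched as follows. This is a minimal illustration, not the ecbalancer code: the `nodeCopy` type, the `pickKeeper` function, and the choice of "smaller node id wins" for the tie-break are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// nodeCopy is a hypothetical record of one node holding a copy of a
// duplicated EC shard. It is not a SeaweedFS type.
type nodeCopy struct {
	NodeID    string
	FreeSlots int
}

// pickKeeper chooses which duplicate to keep and which to delete.
// The keeper is the node with the most free slots. Ties break on the
// smaller node id (an assumed direction) so the result is deterministic.
// ok is false when there are no copies at all.
func pickKeeper(copies []nodeCopy) (keep nodeCopy, remove []nodeCopy, ok bool) {
	if len(copies) == 0 {
		return nodeCopy{}, nil, false
	}
	// Sort a private copy so the caller's slice is left untouched.
	sorted := append([]nodeCopy(nil), copies...)
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].FreeSlots != sorted[j].FreeSlots {
			return sorted[i].FreeSlots > sorted[j].FreeSlots
		}
		return sorted[i].NodeID < sorted[j].NodeID
	})
	return sorted[0], sorted[1:], true
}

func main() {
	copies := []nodeCopy{
		{NodeID: "node-b", FreeSlots: 3},
		{NodeID: "node-a", FreeSlots: 3},
		{NodeID: "node-c", FreeSlots: 1},
	}
	keep, remove, _ := pickKeeper(copies)
	fmt.Println("keep:", keep.NodeID)
	for _, r := range remove {
		fmt.Println("delete duplicate on:", r.NodeID)
	}
}
```

Sorting on a total order (free slots, then node id) is what makes repeated runs over the same topology produce the same plan.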
Lisandro Pin	221bd237c4	Fix file stat collection metric bug for the `cluster.status` command. (#8302) When the `--files` flag is present, `cluster.status` will scrape file metrics from volume servers to provide detailed stats on those. The progress indicator was not being updated properly though, so the command would complete before it read 100%.	2026-02-11 13:34:20 -08:00
Lisandro Pin	f400fb44a0	Update `cluster.status` to resolve file details on EC volumes. (#8268) Also parallelizes queries for file metrics collections when the `--files` flag is specified, and improves the command's output for readability: ``` > cluster.status --files collecting file stats: 100% cluster: id: topo status: LOCKED nodes: 10 topology: 1 DC, 10 disks on 1 rack volumes: total: 3 volumes, 1 collection max size: 32 GB regular: 1/80 volume on 3 replicas, 3 writable (100%), 0 read-only (0%) EC: 2 EC volumes on 28 shards (14 shards/volume) storage: total: 269 MB (522 MB raw, 193.95%) regular volumes: 91 MB (272 MB raw, 300%) EC volumes: 178 MB (250 MB raw, 140%) files: total: 363 files, 300 readable (82.64%), 63 deleted (17.35%), avg 522 kB per file regular: 168 files, 105 readable (62.5%), 63 deleted (37.5%), avg 540 kB per file EC: 195 files, 195 readable (100%), 0 deleted (0%), avg 506 kB per file ```	2026-02-09 17:52:43 -08:00
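The raw-storage percentages in the sample output above (300% for a volume on 3 replicas, 140% for EC volumes at 14 shards per volume) are what the nominal layouts predict. The sketch below only reproduces that arithmetic. The function names are hypothetical, and the 10 data + 4 parity split is the standard ratio mentioned elsewhere in this log, not something read from the command's code.

```go
package main

import "fmt"

// replicaRawPercent returns raw bytes as a percentage of logical bytes
// for a volume stored as the given number of full replicas.
func replicaRawPercent(replicas int) float64 {
	return float64(100 * replicas)
}

// ecRawPercent returns raw bytes as a percentage of logical bytes for an
// erasure-coded volume with the given data and parity shard counts.
// It returns 0 when dataShards is not positive.
func ecRawPercent(dataShards, parityShards int) float64 {
	if dataShards <= 0 {
		return 0
	}
	return float64(100*(dataShards+parityShards)) / float64(dataShards)
}

func main() {
	fmt.Printf("3 replicas: %.0f%%\n", replicaRawPercent(3))
	fmt.Printf("EC 10+4:    %.0f%%\n", ecRawPercent(10, 4))
}
```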
Chris Lu	e10f11b480	opt: reduce ShardsInfo memory usage with bitmap and sorted slice (#7974) * opt: reduce ShardsInfo memory usage with bitmap and sorted slice - Replace map[ShardId]ShardInfo with sorted []ShardInfo slice - Add ShardBits (uint32) bitmap for O(1) existence checks - Use binary search for O(log n) lookups by shard ID - Maintain sorted order for efficient iteration - Add comprehensive unit tests and benchmarks Memory savings: - Map overhead: ~48 bytes per entry eliminated - Pointers: 8 bytes per entry eliminated - Total: ~56 bytes per shard saved Performance improvements: - Has(): O(1) using bitmap - Size(): O(log n) using binary search (was O(1), acceptable tradeoff) - Count(): O(1) using popcount on bitmap - Iteration: Faster due to cache locality refactor: add methods to ShardBits type - Add Has(), Set(), Clear(), and Count() methods to ShardBits - Simplify ShardsInfo methods by using ShardBits methods - Improves code readability and encapsulation * opt: use ShardBits directly in ShardsCountFromVolumeEcShardInformationMessage Avoid creating a full ShardsInfo object just to count shards. Directly cast vi.EcIndexBits to ShardBits and use Count() method. * opt: use strings.Builder in ShardsInfo.String() for efficiency * refactor: change AsSlice to return []ShardInfo (values instead of pointers) This completes the memory optimization by avoiding unnecessary pointer slices and potential allocations. * refactor: rename ShardsCountFromVolumeEcShardInformationMessage to GetShardCount * fix: prevent deadlock in Add and Subtract methods Copy shards data from 'other' before releasing its lock to avoid potential deadlock when a.Add(b) and b.Add(a) are called concurrently. The previous implementation held other's lock while calling si.Set/Delete, which acquires si's lock. This could deadlock if two goroutines tried to add/subtract each other concurrently. 
* opt: avoid unnecessary locking in constructor functions ShardsInfoFromVolume and ShardsInfoFromVolumeEcShardInformationMessage now build shards slice and bitmap directly without calling Set(), which acquires a lock on every call. Since the object is local and not yet shared, locking is unnecessary and adds overhead. This improves performance during object construction. * fix: rename 'copy' variable to avoid shadowing built-in function The variable name 'copy' in TestShardsInfo_Copy shadowed the built-in copy() function, which is confusing and bad practice. Renamed to 'siCopy'. * opt: use math/bits.OnesCount32 and reorganize types 1. Replace manual popcount loop with math/bits.OnesCount32 for better performance and idiomatic Go code 2. Move ShardSize type definition to ec_shards_info.go for better code organization since it's primarily used there * refactor: Set() now accepts ShardInfo for future extensibility Changed Set(id ShardId, size ShardSize) to Set(shard ShardInfo) to support future additions to ShardInfo without changing the API. This makes the code more extensible as new fields can be added to ShardInfo (e.g., checksum, location, etc.) without breaking the Set API. * refactor: move ShardInfo and ShardSize to separate file Created ec_shard_info.go to hold the basic shard types (ShardInfo and ShardSize) for better code organization and separation of concerns. * refactor: add ShardInfo constructor and helper functions Added NewShardInfo() constructor and IsValid() method to better encapsulate ShardInfo creation and validation. Updated code to use the constructor for cleaner, more maintainable code. * fix: update remaining Set() calls to use NewShardInfo constructor Fixed compilation errors in storage and shell packages where Set() calls were not updated to use the new NewShardInfo() constructor. 
* fix: remove unreachable code in filer backup commands Removed unreachable return statements after infinite loops in filer_backup.go and filer_meta_backup.go to fix compilation errors. * fix: rename 'new' variable to avoid shadowing built-in Renamed 'new' to 'result' in MinusParityShards, Plus, and Minus methods to avoid shadowing Go's built-in new() function. * fix: update remaining test files to use NewShardInfo constructor Fixed Set() calls in command_volume_list_test.go and ec_rebalance_slots_test.go to use NewShardInfo() constructor.	2026-01-06 00:09:52 -08:00
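The bitmap described in the commit above can be illustrated with a small standalone type. This is a sketch of the general technique, assuming value-returning `Set` and `Clear` methods and a `uint` shard id. The method names follow the commit message, but the signatures here are guesses rather than the actual SeaweedFS definitions.

```go
package main

import (
	"fmt"
	"math/bits"
)

// ShardBits is a 32-bit bitmap with one bit per shard id (0..31).
type ShardBits uint32

// Has reports whether the bit for id is set. Ids of 32 or more shift the
// mask to zero in Go, so they always report false.
func (b ShardBits) Has(id uint) bool { return b&(1<<id) != 0 }

// Set returns a copy of b with the bit for id set.
func (b ShardBits) Set(id uint) ShardBits { return b | (1 << id) }

// Clear returns a copy of b with the bit for id cleared.
func (b ShardBits) Clear(id uint) ShardBits { return b &^ (1 << id) }

// Count returns the number of set bits using a hardware-friendly popcount.
func (b ShardBits) Count() int { return bits.OnesCount32(uint32(b)) }

func main() {
	var b ShardBits
	b = b.Set(1).Set(5).Set(9)
	fmt.Println(b.Has(5), b.Has(2), b.Count())
	b = b.Clear(5)
	fmt.Println(b.Has(5), b.Count())
}
```

Existence checks and counts touch a single machine word, which is where the O(1) claims for `Has()` and `Count()` come from. Per-shard data such as sizes still needs the sorted slice and a binary search.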
Lisandro Pin	6b98b52acc	Fix reporting of EC shard sizes from nodes to masters. (#7835) SeaweedFS tracks EC shard sizes on topology data structures, but this information is never relayed to master servers :( The end result is that commands reporting disk usage, such as `volume.list` and `cluster.status`, yield incorrect figures when EC shards are present. As an example for a simple 5-node test cluster, before... ``` > volume.list Topology volumeSizeLimit:30000 MB hdd(volume:6/40 active:6 free:33 remote:0) DataCenter DefaultDataCenter hdd(volume:6/40 active:6 free:33 remote:0) Rack DefaultRack hdd(volume:6/40 active:6 free:33 remote:0) DataNode 192.168.10.111:9001 hdd(volume:1/8 active:1 free:7 remote:0) Disk hdd(volume:1/8 active:1 free:7 remote:0) id:0 volume id:3 size:88967096 file_count:172 replica_placement:2 version:3 modified_at_second:1766349617 ec volume id:1 collection: shards:[1 5] Disk hdd total size:88967096 file_count:172 DataNode 192.168.10.111:9001 total size:88967096 file_count:172 DataCenter DefaultDataCenter hdd(volume:6/40 active:6 free:33 remote:0) Rack DefaultRack hdd(volume:6/40 active:6 free:33 remote:0) DataNode 192.168.10.111:9002 hdd(volume:2/8 active:2 free:6 remote:0) Disk hdd(volume:2/8 active:2 free:6 remote:0) id:0 volume id:2 size:77267536 file_count:166 replica_placement:2 version:3 modified_at_second:1766349617 volume id:3 size:88967096 file_count:172 replica_placement:2 version:3 modified_at_second:1766349617 ec volume id:1 collection: shards:[0 4] Disk hdd total size:166234632 file_count:338 DataNode 192.168.10.111:9002 total size:166234632 file_count:338 DataCenter DefaultDataCenter hdd(volume:6/40 active:6 free:33 remote:0) Rack DefaultRack hdd(volume:6/40 active:6 free:33 remote:0) DataNode 192.168.10.111:9003 hdd(volume:1/8 active:1 free:7 remote:0) Disk hdd(volume:1/8 active:1 free:7 remote:0) id:0 volume id:2 size:77267536 file_count:166 replica_placement:2 version:3 modified_at_second:1766349617 ec volume id:1 collection: 
shards:[2 6] Disk hdd total size:77267536 file_count:166 DataNode 192.168.10.111:9003 total size:77267536 file_count:166 DataCenter DefaultDataCenter hdd(volume:6/40 active:6 free:33 remote:0) Rack DefaultRack hdd(volume:6/40 active:6 free:33 remote:0) DataNode 192.168.10.111:9004 hdd(volume:2/8 active:2 free:6 remote:0) Disk hdd(volume:2/8 active:2 free:6 remote:0) id:0 volume id:2 size:77267536 file_count:166 replica_placement:2 version:3 modified_at_second:1766349617 volume id:3 size:88967096 file_count:172 replica_placement:2 version:3 modified_at_second:1766349617 ec volume id:1 collection: shards:[3 7] Disk hdd total size:166234632 file_count:338 DataNode 192.168.10.111:9004 total size:166234632 file_count:338 DataCenter DefaultDataCenter hdd(volume:6/40 active:6 free:33 remote:0) Rack DefaultRack hdd(volume:6/40 active:6 free:33 remote:0) DataNode 192.168.10.111:9005 hdd(volume:0/8 active:0 free:8 remote:0) Disk hdd(volume:0/8 active:0 free:8 remote:0) id:0 ec volume id:1 collection: shards:[8 9 10 11 12 13] Disk hdd total size:0 file_count:0 Rack DefaultRack total size:498703896 file_count:1014 DataCenter DefaultDataCenter total size:498703896 file_count:1014 total size:498703896 file_count:1014 ``` ...and after: ``` > volume.list Topology volumeSizeLimit:30000 MB hdd(volume:6/40 active:6 free:33 remote:0) DataCenter DefaultDataCenter hdd(volume:6/40 active:6 free:33 remote:0) Rack DefaultRack hdd(volume:6/40 active:6 free:33 remote:0) DataNode 192.168.10.111:9001 hdd(volume:1/8 active:1 free:7 remote:0) Disk hdd(volume:1/8 active:1 free:7 remote:0) id:0 volume id:2 size:81761800 file_count:161 replica_placement:2 version:3 modified_at_second:1766349495 ec volume id:1 collection: shards:[1 5 9] sizes:[1:8.00 MiB 5:8.00 MiB 9:8.00 MiB] total:24.00 MiB Disk hdd total size:81761800 file_count:161 DataNode 192.168.10.111:9001 total size:81761800 file_count:161 DataCenter DefaultDataCenter hdd(volume:6/40 active:6 free:33 remote:0) Rack DefaultRack 
hdd(volume:6/40 active:6 free:33 remote:0) DataNode 192.168.10.111:9002 hdd(volume:1/8 active:1 free:7 remote:0) Disk hdd(volume:1/8 active:1 free:7 remote:0) id:0 volume id:3 size:88678712 file_count:170 replica_placement:2 version:3 modified_at_second:1766349495 ec volume id:1 collection: shards:[11 12 13] sizes:[11:8.00 MiB 12:8.00 MiB 13:8.00 MiB] total:24.00 MiB Disk hdd total size:88678712 file_count:170 DataNode 192.168.10.111:9002 total size:88678712 file_count:170 DataCenter DefaultDataCenter hdd(volume:6/40 active:6 free:33 remote:0) Rack DefaultRack hdd(volume:6/40 active:6 free:33 remote:0) DataNode 192.168.10.111:9003 hdd(volume:2/8 active:2 free:6 remote:0) Disk hdd(volume:2/8 active:2 free:6 remote:0) id:0 volume id:2 size:81761800 file_count:161 replica_placement:2 version:3 modified_at_second:1766349495 volume id:3 size:88678712 file_count:170 replica_placement:2 version:3 modified_at_second:1766349495 ec volume id:1 collection: shards:[0 4 8] sizes:[0:8.00 MiB 4:8.00 MiB 8:8.00 MiB] total:24.00 MiB Disk hdd total size:170440512 file_count:331 DataNode 192.168.10.111:9003 total size:170440512 file_count:331 DataCenter DefaultDataCenter hdd(volume:6/40 active:6 free:33 remote:0) Rack DefaultRack hdd(volume:6/40 active:6 free:33 remote:0) DataNode 192.168.10.111:9004 hdd(volume:2/8 active:2 free:6 remote:0) Disk hdd(volume:2/8 active:2 free:6 remote:0) id:0 volume id:2 size:81761800 file_count:161 replica_placement:2 version:3 modified_at_second:1766349495 volume id:3 size:88678712 file_count:170 replica_placement:2 version:3 modified_at_second:1766349495 ec volume id:1 collection: shards:[2 6 10] sizes:[2:8.00 MiB 6:8.00 MiB 10:8.00 MiB] total:24.00 MiB Disk hdd total size:170440512 file_count:331 DataNode 192.168.10.111:9004 total size:170440512 file_count:331 DataCenter DefaultDataCenter hdd(volume:6/40 active:6 free:33 remote:0) Rack DefaultRack hdd(volume:6/40 active:6 free:33 remote:0) DataNode 192.168.10.111:9005 hdd(volume:0/8 active:0 free:8 
remote:0) Disk hdd(volume:0/8 active:0 free:8 remote:0) id:0 ec volume id:1 collection: shards:[3 7] sizes:[3:8.00 MiB 7:8.00 MiB] total:16.00 MiB Disk hdd total size:0 file_count:0 Rack DefaultRack total size:511321536 file_count:993 DataCenter DefaultDataCenter total size:511321536 file_count:993 total size:511321536 file_count:993 ```	2025-12-28 19:30:42 -08:00
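The "after" listing adds per-shard sizes and a per-volume total to each `ec volume` line. A minimal sketch of producing a line in that shape from a set of shard sizes follows. The `shard` type and the `describe` and `mib` helpers are hypothetical and are not taken from the `volume.list` implementation.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// shard is a hypothetical (id, size in bytes) pair for one EC shard.
type shard struct {
	ID   int
	Size uint64
}

// mib formats a byte count as mebibytes with two decimals.
func mib(n uint64) string {
	return fmt.Sprintf("%.2f MiB", float64(n)/(1<<20))
}

// describe renders shard ids, per-shard sizes, and their sum, ordered by id.
func describe(shards []shard) string {
	// Sort a private copy so the caller's slice is left untouched.
	s := append([]shard(nil), shards...)
	sort.Slice(s, func(i, j int) bool { return s[i].ID < s[j].ID })

	ids := make([]string, len(s))
	sizes := make([]string, len(s))
	var total uint64
	for i, sh := range s {
		ids[i] = strconv.Itoa(sh.ID)
		sizes[i] = fmt.Sprintf("%d:%s", sh.ID, mib(sh.Size))
		total += sh.Size
	}
	return fmt.Sprintf("shards:[%s] sizes:[%s] total:%s",
		strings.Join(ids, " "), strings.Join(sizes, " "), mib(total))
}

func main() {
	const eightMiB = 8 << 20
	fmt.Println(describe([]shard{{9, eightMiB}, {1, eightMiB}, {5, eightMiB}}))
}
```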
Lisandro Pin	6a1b9ce8cd	Give `cluster.status` detailed file metrics for regular volumes (#7791) * Implement a `weed shell` command to return a status overview of the cluster. Detailed file information will be implemented in a follow-up MR. Note also that masters are currently not reporting back EC shard sizes correctly, via `master_pb.VolumeEcShardInformationMessage.shard_sizes`. F.ex: ``` > status cluster: id: topo status: LOCKED nodes: 10 topology: 1 DC(s)s, 1 disk(s) on 1 rack(s) volumes: total: 3 volumes on 1 collections max size: 31457280000 bytes regular: 2/80 volumes on 6 replicas, 6 writable (100.00%), 0 read-only (0.00%) EC: 1 EC volumes on 14 shards (14.00 shards/volume) storage: total: 186024424 bytes regular volumes: 186024424 bytes EC volumes: 0 bytes raw: 558073152 bytes on volume replicas, 0 bytes on EC shard files ``` * Humanize output for `weed.server` by default. Makes things more readable :) ``` > cluster.status cluster: id: topo status: LOCKED nodes: 10 topology: 1 DC, 10 disks on 1 rack volumes: total: 3 volumes, 1 collection max size: 32 GB regular: 2/80 volumes on 6 replicas, 6 writable (100%), 0 read-only (0%) EC: 1 EC volume on 14 shards (14 shards/volume) storage: total: 172 MB regular volumes: 172 MB EC volumes: 0 B raw: 516 MB on volume replicas, 0 B on EC shards ``` ``` > cluster.status --humanize=false cluster: id: topo status: LOCKED nodes: 10 topology: 1 DC(s), 10 disk(s) on 1 rack(s) volumes: total: 3 volume(s), 1 collection(s) max size: 31457280000 byte(s) regular: 2/80 volume(s) on 6 replica(s), 5 writable (83.33%), 1 read-only (16.67%) EC: 1 EC volume(s) on 14 shard(s) (14.00 shards/volume) storage: total: 172128072 byte(s) regular volumes: 172128072 byte(s) EC volumes: 0 byte(s) raw: 516384216 byte(s) on volume replicas, 0 byte(s) on EC shards ``` Also adds unit tests, and reshuffles test files handling for clarity. * `cluster.status`: Add detailed file metrics for regular volumes.	2025-12-17 16:40:27 -08:00
Lisandro Pin	187ef65e8f	Humanize output for `weed.server` by default (#7758) * Implement a `weed shell` command to return a status overview of the cluster. Detailed file information will be implemented in a follow-up MR. Note also that masters are currently not reporting back EC shard sizes correctly, via `master_pb.VolumeEcShardInformationMessage.shard_sizes`. F.ex: ``` > status cluster: id: topo status: LOCKED nodes: 10 topology: 1 DC(s)s, 1 disk(s) on 1 rack(s) volumes: total: 3 volumes on 1 collections max size: 31457280000 bytes regular: 2/80 volumes on 6 replicas, 6 writable (100.00%), 0 read-only (0.00%) EC: 1 EC volumes on 14 shards (14.00 shards/volume) storage: total: 186024424 bytes regular volumes: 186024424 bytes EC volumes: 0 bytes raw: 558073152 bytes on volume replicas, 0 bytes on EC shard files ``` * Humanize output for `weed.server` by default. Makes things more readable :) ``` > cluster.status cluster: id: topo status: LOCKED nodes: 10 topology: 1 DC, 10 disks on 1 rack volumes: total: 3 volumes, 1 collection max size: 32 GB regular: 2/80 volumes on 6 replicas, 6 writable (100%), 0 read-only (0%) EC: 1 EC volume on 14 shards (14 shards/volume) storage: total: 172 MB regular volumes: 172 MB EC volumes: 0 B raw: 516 MB on volume replicas, 0 B on EC shards ``` ``` > cluster.status --humanize=false cluster: id: topo status: LOCKED nodes: 10 topology: 1 DC(s), 10 disk(s) on 1 rack(s) volumes: total: 3 volume(s), 1 collection(s) max size: 31457280000 byte(s) regular: 2/80 volume(s) on 6 replica(s), 5 writable (83.33%), 1 read-only (16.67%) EC: 1 EC volume(s) on 14 shard(s) (14.00 shards/volume) storage: total: 172128072 byte(s) regular volumes: 172128072 byte(s) EC volumes: 0 byte(s) raw: 516384216 byte(s) on volume replicas, 0 byte(s) on EC shards ``` Also adds unit tests, and reshuffles test files handling for clarity.	2025-12-15 11:18:45 -08:00
Lisandro Pin	662a6ac8ee	Implement a `weed shell` command to return a status overview of the cluster. (#7704) Detailed file information will be implemented in a follow-up MR. Note also that masters are currently not reporting back EC shard sizes correctly, via `master_pb.VolumeEcShardInformationMessage.shard_sizes`. F.ex: ``` > cluster.status cluster: id: topo status: LOCKED nodes: 10 topology: 1 DC(s)s, 1 disk(s) on 1 rack(s) volumes: total: 3 volumes on 1 collections max size: 31457280000 bytes regular: 2/80 volumes on 6 replicas, 6 writable (100.00%), 0 read-only (0.00%) EC: 1 EC volumes on 14 shards (14.00 shards/volume) storage: total: 186024424 bytes regular volumes: 186024424 bytes EC volumes: 0 bytes raw: 558073152 bytes on volume replicas, 0 bytes on EC shard files ```	2025-12-12 18:07:59 -08:00

8 Commits