seaweedfs

mirror of https://github.com/seaweedfs/seaweedfs.git synced 2026-05-14 05:41:29 +00:00

Author	SHA1	Message	Date
Chris Lu	1c0e24f06a	fix(balance): don't move remote-tiered volumes; don't fatal on missing .idx (#9335 ) * fix(volume): don't fatal on missing .idx for remote-tiered volume A .vif left behind without its .idx (orphaned by a crashed move, partial copy, or hand-edit) would trip glog.Fatalf in checkIdxFile and take the whole volume server down on boot, killing every healthy volume on it too. For remote-tiered volumes treat it as a per-volume load error so the server can come up and the operator can clean up the stray .vif. Refs #9331. * fix(balance): skip remote-tiered volumes in admin balance detection The admin/worker balance detector had no equivalent of the shell-side guard ("does not move volume in remote storage" in command_volume_balance.go), so it scheduled moves on remote-tiered volumes. The "move" copies .idx/.vif to the destination and then calls Volume.Destroy on the source, which calls backendStorage.DeleteFile — deleting the remote object the destination's new .vif now points at. Populate HasRemoteCopy on the metrics emitted by both the admin maintenance scanner and the worker's master poll, then drop those volumes at the top of Detection. Fixes #9331. * Apply suggestion from @gemini-code-assist[bot] Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * fix(volume): keep remote data on volume-move-driven delete The on-source delete after a volume move (admin/worker balance and shell volume.move) ran Volume.Destroy with no way to opt out of the remote-object cleanup. Volume.Destroy unconditionally calls backendStorage.DeleteFile for remote-tiered volumes, so a successful move would copy .idx/.vif to the destination and then nuke the cloud object the destination's new .vif was already pointing at. Add VolumeDeleteRequest.keep_remote_data and plumb it through Store.DeleteVolume / DiskLocation.DeleteVolume / Volume.Destroy. 
The balance task and shell volume.move set it to true; the post-tier-upload cleanup of other replicas and the over-replication trim in volume.fix.replication also set it to true since the remote object is still referenced. Other real-delete callers keep the default. The delete-before-receive path in VolumeCopy also sets it: the inbound copy carries a .vif that may reference the same cloud object as the existing volume. Refs #9331. * test(storage): in-process remote-tier integration tests Cover the four operations the user is most likely to run against a cloud-tiered volume — balance/move, vacuum, EC encode, EC decode — by registering a local-disk-backed BackendStorage as the "remote" tier and exercising the real Volume / DiskLocation / EC encoder code paths. Locks in: - Destroy(keepRemoteData=true) preserves the remote object (move case) - Destroy(keepRemoteData=false) deletes it (real-delete case) - Vacuum/compact on a remote-tier volume never deletes the remote object - EC encode requires the local .dat (callers must download first) - EC encode + rebuild round-trips after a tier-down Tests run in-process and finish in under a second total — no cluster, binary, or external storage required. * fix(rust-volume): keep remote data on volume-move-driven delete Mirror the Go fix in seaweed-volume: plumb keep_remote_data through grpc volume_delete → Store.delete_volume → DiskLocation.delete_volume → Volume.destroy, and skip the s3-tier delete_file call when the flag is set. The pre-receive cleanup in volume_copy passes true for the same reason as the Go side: the inbound copy carries a .vif that may reference the same cloud object as the existing volume. The Rust loader already warns rather than fataling on a stray .vif without an .idx (volume.rs load_index_inmemory / load_index_redb), so no counterpart to the Go fatal-on-missing-idx fix is needed. Refs #9331. 
* fix(volume): preserve remote tier on IO-error eviction; fix EC test target Two review nits: - Store.MaybeAddVolumes' periodic cleanup pass deleted IO-errored volumes with keepRemoteData=false, so a transient local fault on a remote-tiered volume would also nuke the cloud object. Track the delete reason via a parallel slice and pass keepRemoteData=v.HasRemoteFile() for IO-error evictions; TTL-expired evictions still pass false. - TestRemoteTier_ECEncodeDecode_AfterDownload deleted shards 0..3 but called them "parity" — by the klauspost/reedsolomon convention shards 0..DataShardsCount-1 are data and DataShardsCount..TotalShardsCount-1 are parity. Switch the loop to delete the parity range so the intent matches the indices. --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-05-06 15:19:43 -07:00
Chris Lu	0fed72d95a	volume.tier.move: fulfill target replication before deleting old replicas (#8950) * volume.tier.move: fulfill target replication before deleting old replicas When -toReplication is specified, volume.tier.move now creates all required replicas on the destination tier before deleting old replicas. This closes the data-loss window where only one copy existed on the target tier while awaiting volume.fix.replication. If replication fulfillment fails, old replicas are preserved and marked writable so the volume remains accessible. Also extracts replicateVolumeToServer and configureVolumeReplication helpers to reduce duplication across volume.tier.move and volume.fix.replication. Fixes #8937 * volume.tier.move: always fulfill replication before deleting old replicas When -toReplication is specified, use that replication setting. Otherwise, read the volume's existing replication from the super block. In both cases, all required replicas are created on the destination tier before old replicas are deleted. If replication fulfillment fails (e.g. not enough destination nodes), old replicas are preserved and marked writable so no data is lost. 
* volume.tier.move: address review feedback on ensureReplicationFulfilled - Add 5s delay before re-collecting topology to allow master heartbeat propagation after the move - Add nil guard for targetTierReplicas to prevent panic if the moved replica is not yet visible in the topology - Treat configureVolumeReplication failure as a hard error instead of a warning, so the rollback logic preserves old replicas * volume.tier.move: harden replication config error handling - Make configureVolumeReplication failure on the primary moved replica a hard error that aborts the move, instead of logging and continuing - Configure replication metadata on all existing target-tier replicas (not just newly created ones) when -toReplication is specified - Deletion of old replicas cannot affect new replicas since the locations list only contains pre-move servers (verified, no change) * volume.tier.move: fix cleanup deleting fulfilled replicas and broken recovery Fix 1: The cleanup loop now preserves pre-existing target-tier replicas that ensureReplicationFulfilled counted toward the replication target. Previously, a mixed-tier volume with an existing replica on the target tier could have that replica deleted right after being counted as fulfilled, leaving the volume under-replicated. ensureReplicationFulfilled now returns a preserveServers set that the deletion loop checks before removing any old replica. Fix 2: Failure paths after LiveMoveVolume (which deletes the source replica) now use restoreSurvivingReplicasWritable instead of markVolumeReplicasWritable. The old helper stopped on first error, so attempting to mark the already-deleted source writable would prevent all surviving replicas from being restored. The new helper skips the deleted source and continues through all remaining locations, logging per-replica errors instead of aborting. 
* volume.tier.move: mark preserved replicas writable, skip nodes with existing volume Fix 1: Preserved pre-existing target-tier replicas were left read-only after the move completed. They were marked read-only at the start (along with all other replicas) but never restored since the old code deleted them. Now they are explicitly marked writable before cleanup. Fix 2: The fulfillment loop could pick a candidate node that already hosts this volume on a different disk type, causing a VolumeCopy conflict. Added a guard that skips any node already hosting the volume (on any disk) before attempting replication.	2026-04-06 14:55:37 -07:00
qzh	4c72512ea2	fix(shell): avoid marking skipped or unplaced volumes as fixed (#8866) * fix(s3api): fix AWS Signature V2 format and validation * fix(s3api): Skip space after "AWS" prefix (+1 offset) * test(s3api): add unit tests for Signature V2 authentication fix * fix(s3api): simply comparing signatures * validation for the colon extraction in expectedAuth * fix(shell): avoid marking skipped or unplaced volumes as fixed --------- Co-authored-by: chrislu <chris.lu@gmail.com> Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>	2026-04-01 01:20:25 -07:00
Chris Lu	0a46577700	Fix #8040: Support '_default' keyword in collectionPattern to match default collection (#8046) * Fix #8040: Support 'default' keyword in collectionPattern to match default collection The default collection in SeaweedFS is represented as an empty string internally. Previously, it was impossible to specifically target only the default collection because: - Empty collectionPattern matched ALL collections (filter was skipped) - Using collectionPattern="default" tried to match the literal string "default" This commit adds special handling for the keyword "default" in collectionPattern across multiple shell commands: - volume.tier.move - volume.list - volume.fix.replication - volume.configure.replication Now users can use -collectionPattern="default" to specifically target volumes in the default collection (empty collection name), while maintaining backward compatibility where empty pattern matches all collections. Updated help text to document this feature. * Update compileCollectionPattern to support 'default' keyword This extends the fix to all commands that use regex-based collection pattern matching: - ec.encode - ec.decode - volume.tier.download - volume.balance The compileCollectionPattern function now treats "default" as a special keyword that compiles to the regex "^$" (matching empty strings), making it consistent with the other commands that use filepath.Match. * Use CollectionDefault constant instead of hardcoded "default" string Refactored the collection pattern matching logic to use a central constant CollectionDefault defined in weed/shell/common.go. This improves maintainability and ensures consistency across all shell commands. * Address PR review feedback: simplify logic and use '_default' keyword Changes: 1. Changed CollectionDefault from "default" to "_default" to avoid collision with literal collection names 2. Simplified pattern matching logic to reduce code duplication across all affected commands 3. 
Fixed error handling in command_volume_tier_move.go to properly propagate filepath.Match errors instead of swallowing them 4. Updated documentation to clarify how to match a literal "default" collection using regex patterns like "^default$" This addresses all feedback from PR review comments. * Remove unnecessary documentation about matching literal 'default' Since we changed the keyword to '_default', users can now simply use 'default' to match a literal collection named "default". The previous documentation about using regex patterns was confusing and no longer needed. * Fix error propagation and empty pattern handling 1. command_volume_tier_move.go: Added early termination check after eachDataNode callback to stop processing remaining nodes if a pattern matching error occurred, improving efficiency 2. command_volume_configure_replication.go: Fixed empty pattern handling to match all collections (collectionMatched = true when pattern is empty), mirroring the behavior in other commands These changes address the remaining PR review feedback.	2026-01-16 12:31:48 -08:00
Chris Lu	2b3ff3cd05	verbose mode	2025-12-26 12:42:00 -08:00
steve.wei	5c25df20f2	feat(volume.fix): show all replica locations for misplaced volumes (#7560)	2025-11-27 00:04:45 -08:00
Lisandro Pin	9744382a18	Rework parameters passing for functions within `volume.check.disk`. (#7448) * Rework parameters passing for functions within `volume.check.disk`. We'll need to rework this logic to account for read-only volumes, and there're already way too many parameters shuffled around. Grouping these into a single struct simplifies the overall codebase. * similar fix * Improved Error Handling in Tests * propagate the errors * edge cases * edge case on modified time * clean up --------- Co-authored-by: chrislu <chris.lu@gmail.com>	2025-11-10 16:03:38 -08:00
Lisandro Pin	76e4a51964	Unify the parameter to disable dry-run on weed shell commands to `-apply` (instead of `-force`). (#7450) * Unify the parameter to disable dry-run on weed shell commands to --apply (instead of --force). * lint * refactor * Execution Order Corrected * handle deprecated force flag * fix help messages * Refactoring]: Using flag.FlagSet.Visit() * consistent with other commands * Checks for both flags * fix toml files --------- Co-authored-by: chrislu <chris.lu@gmail.com>	2025-11-09 19:58:38 -08:00
nightcoffee	aed91baa2e	[weed] update volume.fix.replication description (#7340) * [weed] update volume.fix.replication description * Update master-cloud.toml * Update master.toml	2025-10-21 12:38:40 -07:00
Roman Shishkin	b6f5fb4b45	Human-readable processed bytes in volume.fix.replication (#7253)	2025-09-18 06:44:40 -07:00
Lisandro Pin	64198dad83	Paralleize operations for `weed shell`'s `volume.fix.replication`. (#6789) Paralleize operations for `weed shell`s `volume.fix.replication`.	2025-07-30 10:58:30 -07:00
dsd	da2a234b00	[weed] change -n to -force (#6421)	2025-01-08 09:57:18 -08:00
dsd	20cbc9e4eb	skip error while executing volume.fix.replication (#6382)	2024-12-20 07:36:13 -08:00
chrislu	ec155022e7	"golang.org/x/exp/slices" => "slices" and go fmt	2024-12-19 19:25:06 -08:00
Trim21	d43fa07f06	use readable bytes size string in shell output (#6288)	2024-11-25 17:25:17 -08:00
Konstantin Lebedev	a143c888e5	[shell] don't require lock when there are no changes for volume.fix.replication (#6266) * don't require lock when there are no changes * revert takeAction	2024-11-21 08:17:25 -08:00
chrislu	07cf8cf22d	minor	2024-11-19 08:31:33 -08:00
Lisandro Pin	0d5393641e	Unify usage of shell.EcNode.dc as DataCenterId. (#6258)	2024-11-19 06:33:18 -08:00
chrislu	20929f2a57	adjust resource heavy for volume.fix.replication	2024-09-29 11:32:18 -07:00
chrislu	6564ceda91	skip resource heavy commands from running on master nodes	2024-09-29 10:51:17 -07:00
chrislu	ec30a504ba	refactor	2024-09-29 10:38:22 -07:00
chrislu	701abbb9df	add IsResourceHeavy() to command interface	2024-09-28 20:23:01 -07:00
Max Denushev	f1e700ce2f	Fix/copy before delete replication (#6064) * fix(shell): volume.fix.replication misplaced volumes unsatisfying replication factor * fix(shell): simplify replication check * fix(shell): add test for satisfyReplicaCurrentLocation	2024-09-26 08:34:13 -07:00
skycope	6e4b9181f5	fix "volume.fix.replication" move many replications only to one volumeServer (#5522)	2024-04-23 06:33:50 -07:00
steve.wei	67ead9b18f	fix(volume.fix.replication): adjust volume count, not free volume count (#5479)	2024-04-08 07:30:04 -07:00
zehweh	2b9dda7d2e	fix isMisplaced() in command_volume_fix_replication.go (#4988)	2023-11-07 07:58:19 -08:00
Konstantin Lebedev	dffe00a822	fix: logger place msg (#4880)	2023-10-02 08:29:09 -07:00
Konstantin Lebedev	dd580190b4	fix: avoid deleting one replica without sync (#4875) * fix: avoid deleting one replica without sync https://github.com/seaweedfs/seaweedfs/issues/4647 * Update weed/shell/command_volume_fix_replication.go Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com> * fix: revert this existing do option to positive --------- Co-authored-by: Konstantin Lebedev <9497591+kmlebedev@users.noreply.github.co> Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>	2023-09-27 23:12:10 -07:00
Konstantin Lebedev	df4ded758e	fix: avoid deleting more than one replica (#4873) https://github.com/seaweedfs/seaweedfs/issues/4647 Co-authored-by: Konstantin Lebedev <9497591+kmlebedev@users.noreply.github.co>	2023-09-26 00:20:48 -07:00
chrislu	645ae8c57b	Revert "Revert "Merge branch 'master' of https://github.com/seaweedfs/seaweedfs"" This reverts commit `8cb42c39`	2023-09-25 09:35:16 -07:00
chrislu	8cb42c39ad	Revert "Merge branch 'master' of https://github.com/seaweedfs/seaweedfs" This reverts commit `2e5aa06026`, reversing changes made to `4d414f54a2`.	2023-09-18 16:12:50 -07:00
dependabot[bot]	a04bd4d26f	Bump github.com/rclone/rclone from 1.63.1 to 1.64.0 (#4850) * Bump github.com/rclone/rclone from 1.63.1 to 1.64.0 Bumps [github.com/rclone/rclone](https://github.com/rclone/rclone) from 1.63.1 to 1.64.0. - [Release notes](https://github.com/rclone/rclone/releases) - [Changelog](https://github.com/rclone/rclone/blob/master/RELEASE.md) - [Commits](https://github.com/rclone/rclone/compare/v1.63.1...v1.64.0) --- updated-dependencies: - dependency-name: github.com/rclone/rclone dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * API changes * go mod --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com> Co-authored-by: chrislu <chris.lu@gmail.com>	2023-09-18 14:43:05 -07:00
chrislu	3365468d0d	added an error message	2023-08-08 20:35:21 -07:00
Konstantin Lebedev	25535e9c36	Delete volume is empty (#4561) * use onlyEmpty for deleteVolume https://github.com/seaweedfs/seaweedfs/issues/4559 * fix IsEmpty * fix test --------- Co-authored-by: Konstantin Lebedev <9497591+kmlebedev@users.noreply.github.co>	2023-06-12 10:42:44 -07:00
chrislu	214b7cd286	volume.fix.replication: adjust the retry checking times	2023-02-22 10:47:52 -08:00
chrislu	21c0587900	go fmt	2022-09-14 23:06:44 -07:00
Brian	4e3e2b1b82	Add option in volume.fix.replication to only fix under-replication and not delete volumes (#3640)	2022-09-10 08:05:28 -07:00
chrislu	676e27c589	shell: stop long running jobs if lock is lost	2022-08-22 14:12:23 -07:00
chrislu	26dbc6c905	move to https://github.com/seaweedfs/seaweedfs	2022-07-29 00:17:28 -07:00
chrislu	f97acdd489	volume.fix.replication fix retry logic fix https://github.com/chrislusf/seaweedfs/issues/3136	2022-06-03 08:45:29 -07:00
Konstantin Lebedev	44f53ceda6	fix collectionIsMismatch charset	2022-05-16 13:23:23 +05:00
Konstantin Lebedev	10d435f2c2	fix skip loop	2022-05-16 13:16:27 +05:00
Konstantin Lebedev	279053572c	avoid delete volume replica if collection mismatch	2022-05-16 13:07:05 +05:00
justin	3551ca2fcf	enhancement: replace sort.Slice with slices.SortFunc to reduce reflection	2022-04-18 10:35:43 +08:00
chrislu	f18803424a	volume.balance: add delay during tight loop fix https://github.com/chrislusf/seaweedfs/issues/2637	2022-02-08 00:53:55 -08:00
chrislu	9f9ef1340c	use streaming mode for long poll grpc calls streaming mode would create separate grpc connections for each call. this is to ensure the long poll connections are properly closed.	2021-12-26 00:15:03 -08:00
chrislu	a2d3f89c7b	add lock messages	2021-12-10 13:24:38 -08:00
chrislu	e6c026db65	volume.fix.replication: fix misplaced volumes fix https://github.com/chrislusf/seaweedfs/issues/2416	2021-12-05 16:56:25 -08:00
Chris Lu	5435027ff0	volume copy: stream out copying progress and avoid grpc request timeout fix https://github.com/chrislusf/seaweedfs/issues/2386	2021-10-24 02:52:56 -07:00
Chris Lu	e862b2529a	refactor	2021-10-01 12:10:11 -07:00
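
Note on commit 0a46577700: the message says the `_default` keyword compiles to the regex "^$" so that only the default (empty-named) collection matches, while an empty pattern still matches every collection. The following is a minimal sketch of that rule, not the code in weed/shell. The names `collectionDefault`, `compilePattern`, and `matches` are hypothetical and chosen only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// collectionDefault mirrors the "_default" keyword named in the commit
// message. The identifier itself is an assumption made for this sketch.
const collectionDefault = "_default"

// compilePattern is a hypothetical stand-in for the helper the commit
// describes: the keyword becomes "^$", anything else is a plain regex.
func compilePattern(pattern string) (*regexp.Regexp, error) {
	if pattern == collectionDefault {
		return regexp.Compile("^$")
	}
	return regexp.Compile(pattern)
}

// matches reports whether a collection name is selected by the pattern.
// An empty pattern selects every collection, as the commit states.
func matches(pattern, collection string) (bool, error) {
	if pattern == "" {
		return true, nil
	}
	re, err := compilePattern(pattern)
	if err != nil {
		// Propagate the error instead of swallowing it.
		return false, err
	}
	return re.MatchString(collection), nil
}

func main() {
	for _, c := range []string{"", "default", "photos"} {
		ok, err := matches(collectionDefault, c)
		fmt.Printf("collection %q selected=%v err=%v\n", c, ok, err)
	}
}
```

Under these assumptions, only the empty collection name is selected by `_default`, and a collection literally named "default" is left to an ordinary pattern.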


96 Commits