seaweedfs

mirror of https://github.com/seaweedfs/seaweedfs.git synced 2026-06-09 18:32:43 +00:00

Author	SHA1	Message	Date
Chris Lu	a653a7f72a	fix(shell): honor explicit fs.mergeVolumes from/to direction (#10159 ) * fix(shell): honor explicit fs.mergeVolumes from/to direction mergeVolumes only ever merged a smaller volume into a larger one. When the user named both -fromVolumeId and -toVolumeId with the source larger than the target, the planner produced an empty plan and the command printed just "max volume size: N MB" and moved nothing. Build the requested pair directly when both ids are given, instead of routing through the size-descending heuristic. Read-only, empty, and wrong-collection endpoints are rejected with a clear error rather than a silent no-op. * fix(shell): allow fs.mergeVolumes into an empty target volume Merging chunks into an empty volume is valid, e.g. consolidating data into a freshly created or recently vacuumed volume. Only reject an empty source, which has nothing to move. * fix(shell): reject self-map in directed mergeVolumes planner createMergePlan with from == to returned a {vid: vid} self-merge when called directly. Guard it in the planner so it is correct independent of the Do entrypoint.	2026-06-30 13:28:53 -07:00
Aleksey	424cd164e9	s3: invalidate stale reader cache locations on chunk read failure (#10156 ) * s3: invalidate stale reader cache locations on chunk read failure * filer: share the chunk-read self-heal across reader cache and streaming paths The reader cache retry added a third copy of the invalidate-relookup-compare-retry dance already inlined in PrepareStreamContentWithThrottler and duplicated in retryWithCacheInvalidation. Extract retryFetchWithFreshLocations and route all three through it, parameterized by the refetch primitive. * filer: drop redundant completedTimeNew store in reader cache success path startCaching already stamps completedTimeNew unconditionally before the fetchErr branch; the second store inside the success branch is dead. * filer: make NewReaderCache cache invalidator an explicit parameter The variadic ...CacheInvalidator only ever read the first element, so a caller could pass two and silently get one. Take a single explicit argument and have the non-S3 callers pass nil. * filer: inject reader cache chunk fetch as a struct field Replace the process-global readerCacheFetchChunkData test seam with a per-instance fetchChunkDataFn field defaulted in NewReaderCache, matching how lookupFileIdFn is already wired. Tests set the field on the cache instead of swapping a shared global. * filer: log the location count, not full URLs, on self-heal retry --------- Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-06-30 13:27:49 -07:00
Chris Lu	803c5a8dca	fix(filer): use DROP TABLE IF EXISTS in SQL stores (#10158 ) Concurrent bucket deletion across multiple filer replicas races on the per-bucket DROP TABLE. The first replica drops the table; the rest hit an undefined-table error (postgres 42P01, mysql 1051) which propagates out of DeleteFolderChildren and panics the filer. On restart the same pending DROP re-runs and the filer crash-loops. Make the drop idempotent. Same defect class in all SQL backends, so fix postgres, postgres2, mysql, mysql2, and sqlite together.	2026-06-30 10:38:51 -07:00
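The race described in 803c5a8dca can be illustrated with a toy in-memory catalog. This is a minimal sketch, not the filer store code: the `catalog` type and both methods are hypothetical stand-ins for a SQL backend, and they only model the difference between a plain drop and an idempotent one.

```go
package main

import (
	"errors"
	"fmt"
)

// errUndefinedTable stands in for the backend's undefined-table error.
var errUndefinedTable = errors.New("table does not exist")

// catalog is a toy stand-in for a SQL database's table list (hypothetical).
type catalog struct{ tables map[string]bool }

// drop mimics a plain DROP TABLE: it fails if the table is already gone.
func (c *catalog) drop(name string) error {
	if !c.tables[name] {
		return errUndefinedTable
	}
	delete(c.tables, name)
	return nil
}

// dropIfExists mimics DROP TABLE IF EXISTS: a missing table is not an error.
func (c *catalog) dropIfExists(name string) error {
	delete(c.tables, name)
	return nil
}

func main() {
	// Two replicas issuing the same plain drop: the second one errors.
	plain := &catalog{tables: map[string]bool{"bucket_a": true}}
	fmt.Println(plain.drop("bucket_a"), plain.drop("bucket_a"))

	// The idempotent form succeeds no matter which replica runs first.
	idem := &catalog{tables: map[string]bool{"bucket_a": true}}
	fmt.Println(idem.dropIfExists("bucket_a"), idem.dropIfExists("bucket_a"))
}
```

With the idempotent form, a pending drop that is replayed after a restart also becomes harmless, which is what breaks the crash loop the commit describes.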
Chris Lu	8f2a2abae4	fix(ec): correct EC FULL scrub for deleted needles, shard-location cache, and parity coverage (#10152 ) * fix(ec): correct EC FULL scrub for deleted needles + shard-location cache Addresses review findings on the EC FULL distributed scrub: - Remote EC reads now thread Go's (bytes, is_deleted) contract. A runtime EC delete keeps the .ecx size positive (the delete lives in .ecj/memory), so the raw-index walk verifies the needle, and its header interval is usually remote; the peer answers is_deleted with no payload. The scrub zero-fills that interval (so the needle reaches read_bytes -> SizeMismatch{0} -> the delete-state suppression), the serving direct read short-circuits to not-found, and reconstruction EXCLUDES the shard instead of feeding zeros into Reed-Solomon. - The walk skips size.is_deleted() (not just is_tombstone), so a -originalSize .ecx entry (pre-encode delete) can't yield empty intervals or panic parse_header. - Restore Go's < data_shards completeness guard (per-volume, custom-ratio aware) and per-shard merge in the location cache instead of clobber-with-partial. - Abort the scrub with an error on mid-scan unmount instead of a false-CLEAN. - Hoist the refreshed location map once instead of cloning it per needle. Claude-Session: https://claude.ai/code/session_015EE9Sc9EvNp8BCVva4RKdo * feat(scrub): keep RS parity check in EC FULL until CHECKSUM lands The per-needle FULL walk only reads live data-shard intervals, so it can't catch bitrot in a parity shard or an unwalked cold region. Run verify_ec_shards alongside the walk, gated on all-shards-local (single-node EC), via spawn_blocking. A deliberate temporary divergence from Go FULL; moves to mode 4 (CHECKSUM) once the .ecsum subsystem lands. Claude-Session: https://claude.ai/code/session_015EE9Sc9EvNp8BCVva4RKdo	2026-06-30 10:18:31 -07:00
Kwak Byoung Min	a85318111c	admin: restore cluster volume page CSV export (#10155)	2026-06-30 08:31:37 -07:00
Peter Dodd	bb5e6e40df	fix(remote/gcs): tolerate already-deleted object in DeleteFile (#10151)	2026-06-30 07:52:25 -07:00
Chris Lu	d18b85ef61	feat(scrub): EC FULL scrub — distributed local+remote needle walk (#10149 ) * feat(ec): add scrub_ec_volume_distributed (FULL EC scrub, local+remote) Ports Go's Store.ScrubEcVolume: walk the raw .ecx, verify every needle across local AND remote shards without decoding (report faults, don't heal), with the #10130 deleted-needle size-mismatch suppression gated on a force flag. Reuses the read path's lock-drop + no-reconstruct read_remote_ec_shard_interval so no !Send store guard is held across an .await. Walks the unmasked index (scrub_snapshot_under_lock locates from the raw (offset, size), not locate_needle) so logically-deleted-but-present needles are still byte-verified, matching Go. Refreshes shard locations once up front and hard-fails on a master-lookup error rather than retrying per needle. Claude-Session: https://claude.ai/code/session_015EE9Sc9EvNp8BCVva4RKdo * feat(scrub): dispatch EC FULL (mode 2) to the distributed needle walk FULL ran a local-only Reed-Solomon parity check; route it to the per-needle local+remote walk instead, mirroring Go. The handler collects vids under a brief lock then releases it: FULL self-locks per needle (it awaits remote reads), INDEX/LOCAL re-acquire a brief lock. verify_ec_shards is retained but no longer wired to a mode. Claude-Session: https://claude.ai/code/session_015EE9Sc9EvNp8BCVva4RKdo	2026-06-30 03:29:46 -07:00
Chris Lu	53087cb237	admin: remove non-functional EC repair button from UI (#10150) The EC volumes, EC shards, and collection details pages each rendered a repair (wrench) button for incomplete EC volumes. Its handler POSTed to a /repair endpoint that the admin server never registers, so every click returned "404 page not found" (the collection details page only had a placeholder handler). Remove the buttons and their JavaScript handlers, and regenerate the templ output. Manual EC shard recovery remains available from weed shell via ec.rebuild.	2026-06-30 02:38:13 -07:00
Lisandro Pin	cac83bb4a8	Fix scrubbing of deleted needles on EC volumes. (#10130) EC volumes do not propagate deletions to all shard indexes, so scrubbing may run on a volume where a deleted needle is still present in the index, or where a needle deleted from the index is still present on the volume. In either scenario, scrubbing fails with size mismatch errors. This PR reworks the scrubbing logic so that needle size mismatches are ignored in such scenarios. Scrubbing can still be forced to check deleted needles (e.g. to discover index inconsistencies); this option will be exposed in RPCs and `weed shell` in a follow-up PR.	2026-06-30 02:01:14 -07:00
Chris Lu	acbb6f7550	fix(scrub): don't flag offset-0 logical tombstones in volume scrub (#10148 ) * fix(scrub): don't flag offset-0 logical tombstones in volume scrub A remote-tier delete records a tombstone at .idx offset 0 with no physical .dat bytes. Full scrub double-flagged a healthy remote-tiered volume with deletes: scrubVolumeData counted the tombstone's GetActualSize(-1)=32 toward totalRead (want > physical .dat), and CheckIndexFile treated it as occupying [0,31] and flagged the first live needle as overlapping. Skip offset-0 logical tombstones from both the size reconcile and the overlap check; they are still counted for the index-size check. Local deletes (offset != 0) are unaffected. Claude-Session: https://claude.ai/code/session_015EE9Sc9EvNp8BCVva4RKdo * fix(scrub): mirror offset-0 logical tombstone handling into Rust Same fix as the Go volume_checking.go + idx/check.go change: Volume::scrub skips offset-0 logical tombstones from total_read, and check_index_file excludes them from the overlap check (still counted for the index-size check). Claude-Session: https://claude.ai/code/session_015EE9Sc9EvNp8BCVva4RKdo	2026-06-30 02:01:14 -07:00
Chris Lu	c9f2ef9ef7	fix(ec): suppress deleted-needle size mismatch in EC LOCAL scrub (#10147 ) * fix(ec): suppress deleted-needle size mismatch in EC LOCAL scrub EcVolume.ScrubLocal reassembles each fully-local needle and ReadBytes-checks it, but appended every error unconditionally. A needle the .ecx still reports live while its reassembled on-disk header carries size 0 (delete state disagrees between index and header) is not corruption — the LOCAL twin of the #10130 fix for the FULL path. Suppress the ErrorSizeMismatch in that case; genuine (non-zero) size mismatches and CRC/tail errors are still reported. Claude-Session: https://claude.ai/code/session_015EE9Sc9EvNp8BCVva4RKdo * fix(ec): mirror EC LOCAL scrub deleted-needle suppression into Rust Same suppression as the Go EcVolume.ScrubLocal change: a NeedleError::SizeMismatch whose on-disk header size is 0 against a live index entry is a delete-state disagreement, not corruption. Claude-Session: https://claude.ai/code/session_015EE9Sc9EvNp8BCVva4RKdo	2026-06-30 01:58:57 -07:00
Chris Lu	473f7b2367	feat(scrub): EC LOCAL needle walk (split from FULL) (#10144 ) * feat(ec): extract locate_ec_shard_needle_interval Mirrors Go's EcVolume.LocateEcShardNeedleInterval; reused by locate_needle and the upcoming local scrub walk. Claude-Session: https://claude.ai/code/session_015EE9Sc9EvNp8BCVva4RKdo * feat(ec): add EcVolumeShard::to_ec_shard_info Mirrors Go's ToEcShardInfo. Claude-Session: https://claude.ai/code/session_015EE9Sc9EvNp8BCVva4RKdo * feat(ec): add EcVolume::scrub_local Walk the .ecx and verify each needle against the locally-held shards, reading interval-by-interval (reusing one chunk buffer); CRC-check only fully-local needles, report short/unreadable local shards, and abort the scan on a structural size mismatch. Mirrors Go's EcVolume.ScrubLocal. Claude-Session: https://claude.ai/code/session_015EE9Sc9EvNp8BCVva4RKdo * feat(scrub): dispatch EC LOCAL (mode 3) to scrub_local Splits the mode 2\|3 arm: FULL (2) keeps the Reed-Solomon parity check; LOCAL (3) now runs the per-needle local-shard walk. Claude-Session: https://claude.ai/code/session_015EE9Sc9EvNp8BCVva4RKdo	2026-06-30 01:22:46 -07:00
Chris Lu	dccc015a1f	fix(scrub): walk the on-disk .idx in Volume::scrub (source-of-truth parity) (#10143 ) * refactor(scrub): extract open_index_for_scrub shared by scrub_index Mirrors Go's openIndex, shared by ScrubIndex and the upcoming Scrub rewrite. Claude-Session: https://claude.ai/code/session_015EE9Sc9EvNp8BCVva4RKdo * fix(scrub): walk the on-disk .idx in Volume::scrub scrub walked the deduped in-memory map, so total_read undercounted the physical .dat on any volume with overwrites or deletes and the size reconcile falsely flagged healthy volumes broken. Walk every .idx row instead (matching Go's scrubVolumeData): count all rows, CRC-verify live needles, skip deleted, and reconcile against the .dat. Holds one data-file read lock and reads via the unlocked path, like Go's Scrub. Claude-Session: https://claude.ai/code/session_015EE9Sc9EvNp8BCVva4RKdo	2026-06-30 01:17:28 -07:00
adri	1df7a0e653	fix(volume [rust] + ec): search sibling disk locations when rebuilding missing EC shards + .ecx files (#10145 ) * fix(volume [rust] + ec): search sibling disk locations when rebuilding missing EC shards * fix(volume [rust] + ec): apply sibling-disk shard lookup to .ecx rebuild as well * fix(volume [rust] + ec): include rebuild_dir in .ecx rebuild's shard search dirs --------- Co-authored-by: adri <adri@digitalunited.net>	2026-06-30 00:07:44 -07:00
Chris Lu	72009c607b	fix(scrub): align Rust INDEX scrub to Go's idx.CheckIndexFile (#10142 ) * feat(idx): add check_index_file mirroring Go idx.CheckIndexFile Index-only structural check: walk the on-disk index, sort by (offset, size), flag overlapping needles, and verify the file is a whole number of entries. No data-file reads. Claude-Session: https://claude.ai/code/session_015EE9Sc9EvNp8BCVva4RKdo * refactor(ec): use idx::check_index_file in EcVolume::scrub_index Drops the inline walk/sort/overlap copy. Walks a private fd so the structural scan never moves the shared ecx_file cursor (read positionally elsewhere). Claude-Session: https://claude.ai/code/session_015EE9Sc9EvNp8BCVva4RKdo * fix(scrub): make Volume::scrub_index an index-only check on the on-disk .idx INDEX mode walked the deduped in-memory map and read .dat headers — more than the cheap-INDEX contract allows, yet missing Go's overlap and size-multiple structural checks. Route it through idx::check_index_file so it matches Go's Volume.ScrubIndex and the INDEX<LOCAL<FULL cost tiering holds. Ports openIndex's zero-size-index guard (a populated .dat with an empty .idx is corruption) and takes the data-file read lock for a consistent snapshot. Claude-Session: https://claude.ai/code/session_015EE9Sc9EvNp8BCVva4RKdo	2026-06-30 00:00:24 -07:00
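The index-only structural check described in 72009c607b (sort by offset and size, flag overlaps, verify the file is a whole number of entries) can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's `check_index_file`: the 16-byte entry size, the byte-granular offsets, and all names here are hypothetical simplifications.

```go
package main

import (
	"fmt"
	"sort"
)

// entrySize is an assumed fixed on-disk size of one index row.
const entrySize = 16

// entry is a simplified index row: where a needle starts and how long it is.
type entry struct {
	Offset int64
	Size   int64
}

// checkIndex returns a list of structural problems found without reading
// any data file: a trailing partial entry, and overlapping needle extents.
func checkIndex(fileLen int64, entries []entry) []string {
	var problems []string
	if fileLen%entrySize != 0 {
		problems = append(problems,
			fmt.Sprintf("index length %d is not a multiple of %d", fileLen, entrySize))
	}

	// Sort a copy by (offset, size) so overlaps show up in one pass.
	sorted := append([]entry(nil), entries...)
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].Offset != sorted[j].Offset {
			return sorted[i].Offset < sorted[j].Offset
		}
		return sorted[i].Size < sorted[j].Size
	})

	// Track the furthest end seen so far; an entry starting before it overlaps.
	var maxEnd int64
	for i, e := range sorted {
		if i > 0 && e.Offset < maxEnd {
			problems = append(problems,
				fmt.Sprintf("needle at offset %d overlaps previous data ending at %d", e.Offset, maxEnd))
		}
		if end := e.Offset + e.Size; end > maxEnd {
			maxEnd = end
		}
	}
	return problems
}

func main() {
	fmt.Println(checkIndex(32, []entry{{0, 10}, {10, 5}}))
	fmt.Println(checkIndex(33, []entry{{0, 10}, {5, 5}}))
}
```

Tracking the running maximum end, rather than comparing only adjacent rows, also catches a long needle that overlaps a row several positions later in the sorted order.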
Chris Lu	0e293c9b0a	docs(scrub): correct backwards FULL/LOCAL mode comments in Rust (#10141) docs(scrub): correct backwards FULL(2)/LOCAL(3) mode comments The proto enum is FULL=2, LOCAL=3; two comments had them swapped. Claude-Session: https://claude.ai/code/session_015EE9Sc9EvNp8BCVva4RKdo	2026-06-29 23:59:11 -07:00
Chris Lu	270ac332ff	fix(ec): honor wide EC ratios in Rust read_ec_shard_config (#10140 ) * fix(ec): cap EcShardConfig at MAX_SHARD_COUNT, not TOTAL_SHARDS_COUNT read_ec_shard_config rejected any .vif ratio summing past 14 shards and silently fell back to 10/4, so wider EC volumes ran against the wrong shard set. Match Go's MaxShardCount(32) bound. Claude-Session: https://claude.ai/code/session_015EE9Sc9EvNp8BCVva4RKdo * docs(ec): correct stale 0..14 shard-count comments Claude-Session: https://claude.ai/code/session_015EE9Sc9EvNp8BCVva4RKdo	2026-06-29 23:58:38 -07:00
jk2lx	96f93d8e3b	fix(rust-volume): parse master lookup when publicUrl is omitted (#10128 ) Master /dir/lookup JSON omits publicUrl when empty (Go json omitempty). The Rust volume server required the field, so serde failed with "lookup parse failed: error decoding response body" and cross-DC replicated writes failed. Default publicUrl to empty, fall back to url for peer filtering, and normalize addresses with to_http_address before excluding the local peer (so host:port.grpcPort forms do not match self incorrectly).	2026-06-29 14:03:58 -07:00
Chris Lu	9752286fd4	test(seaweed-volume): cover type=replicate fan-out writes on the Rust volume server (#10139 ) * test(seaweed-volume): cover type=replicate fan-out writes A holder must accept a replicated copy and store it locally without re-replicating. Covers raw and multipart bodies, and a multi-copy volume where re-replication would otherwise reach the master. * test(seaweed-volume): use port 0 for the dead-master address Connecting to port 0 is refused at the socket layer immediately, so the plain-write fan-out path fails fast instead of risking a connect-timeout hang where port 1 is filtered.	2026-06-29 13:55:48 -07:00
7y-9	b55a608ae0	feat: add collection pattern to delete empty volumes (#10129 ) * feat: add collection pattern to delete empty volumes Co-authored-by: Codex <noreply@openai.com> * shell: match collection pattern with wildcard matcher Use wildcard.MatchesWildcard in the shared collection-pattern helper, matching command_volume_fix_replication's matchCollectionPattern. The flag only advertises '*' and '?', which is exactly what the matcher supports. --------- Co-authored-by: Codex <noreply@openai.com> Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-06-29 12:43:19 -07:00
dependabot[bot]	cb95b45282	build(deps): bump github.com/redis/go-redis/v9 from 9.20.0 to 9.21.0 (#10136 ) Bumps [github.com/redis/go-redis/v9](https://github.com/redis/go-redis) from 9.20.0 to 9.21.0. - [Release notes](https://github.com/redis/go-redis/releases) - [Changelog](https://github.com/redis/go-redis/blob/master/RELEASE-NOTES.md) - [Commits](https://github.com/redis/go-redis/compare/v9.20.0...v9.21.0) --- updated-dependencies: - dependency-name: github.com/redis/go-redis/v9 dependency-version: 9.21.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-06-29 11:35:11 -07:00
dependabot[bot]	2acae706d2	build(deps): bump gocloud.dev/pubsub/natspubsub from 0.45.0 to 0.46.0 (#10135 ) Bumps [gocloud.dev/pubsub/natspubsub](https://github.com/google/go-cloud) from 0.45.0 to 0.46.0. - [Release notes](https://github.com/google/go-cloud/releases) - [Commits](https://github.com/google/go-cloud/compare/v0.45.0...v0.46.0) --- updated-dependencies: - dependency-name: gocloud.dev/pubsub/natspubsub dependency-version: 0.46.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-06-29 11:35:02 -07:00
dependabot[bot]	6de556a9e3	build(deps): bump github.com/jackc/pgx/v5 from 5.9.2 to 5.10.0 (#10134 ) Bumps [github.com/jackc/pgx/v5](https://github.com/jackc/pgx) from 5.9.2 to 5.10.0. - [Changelog](https://github.com/jackc/pgx/blob/master/CHANGELOG.md) - [Commits](https://github.com/jackc/pgx/compare/v5.9.2...v5.10.0) --- updated-dependencies: - dependency-name: github.com/jackc/pgx/v5 dependency-version: 5.10.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-06-29 11:34:54 -07:00
dependabot[bot]	9bc8260532	build(deps): bump golang.org/x/term from 0.43.0 to 0.44.0 (#10133 ) Bumps [golang.org/x/term](https://github.com/golang/term) from 0.43.0 to 0.44.0. - [Commits](https://github.com/golang/term/compare/v0.43.0...v0.44.0) --- updated-dependencies: - dependency-name: golang.org/x/term dependency-version: 0.44.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-06-29 11:34:45 -07:00
dependabot[bot]	990b168766	build(deps): bump github.com/Azure/azure-sdk-for-go/sdk/azidentity from 1.13.1 to 1.14.0 (#10132 ) build(deps): bump github.com/Azure/azure-sdk-for-go/sdk/azidentity Bumps [github.com/Azure/azure-sdk-for-go/sdk/azidentity](https://github.com/Azure/azure-sdk-for-go) from 1.13.1 to 1.14.0. - [Release notes](https://github.com/Azure/azure-sdk-for-go/releases) - [Commits](https://github.com/Azure/azure-sdk-for-go/compare/sdk/azidentity/v1.13.1...sdk/azcore/v1.14.0) --- updated-dependencies: - dependency-name: github.com/Azure/azure-sdk-for-go/sdk/azidentity dependency-version: 1.14.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-06-29 11:34:37 -07:00
dependabot[bot]	e2c649d56f	build(deps): bump actions/cache from 5 to 6 (#10131 ) Bumps [actions/cache](https://github.com/actions/cache) from 5 to 6. - [Release notes](https://github.com/actions/cache/releases) - [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md) - [Commits](https://github.com/actions/cache/compare/v5...v6) --- updated-dependencies: - dependency-name: actions/cache dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-06-29 11:34:28 -07:00
7y-9	1e42dd77ca	fix: avoid duplicate volume.list parent headers (#10126) Co-authored-by: Codex <noreply@openai.com>	2026-06-29 11:31:45 -07:00
github-actions[bot]	c06a2dca87	4.37	2026-06-29 06:45:55 +00:00
Chris Lu	5797fb24ec	s3: support AWS object form for bucket policy Principal, add NotPrincipal (#10125 ) * s3: support AWS object form for bucket policy Principal, add NotPrincipal Bucket policy statements only accepted a bare string or array of strings for the Principal element, so the AWS-documented object form was rejected: "Principal": { "AWS": "arn:aws:iam::123456789012:root" } "Principal": { "AWS": ["arn:...", "999999999999"] } Add a PolicyPrincipal type that parses the bare string, the bare array (retained for backward compatibility), and the object form keyed by AWS, Service, Federated or CanonicalUser (each value a string or array). All keyed values are flattened for principal matching, and the original JSON is preserved so PutBucketPolicy/GetBucketPolicy returns the exact shape submitted - keeping infrastructure-as-code tools (Terraform, Ansible) idempotent. Also add NotPrincipal support (a statement applies to every principal except the ones named), compiled and evaluated in both policy evaluators, and reject statements that specify both Principal and NotPrincipal. * s3: address review - validate principal object form, honor dynamic NotPrincipal - Reject unsupported Principal object keys (only AWS/Service/Federated/ CanonicalUser) and empty values, so a form like {"AWS":[]} no longer compiles to zero matchers and silently relies on the match-all fallback. - Detect both Principal and NotPrincipal by field presence, not by flattened length, so a present-but-empty field is still rejected. - Honor dynamic (policy-variable) NotPrincipal/Principal patterns in the compiled evaluator; previously a NotPrincipal made only of variables was treated as absent and its exclusion bypassed. - Add regression tests for the object-form validation and dynamic NotPrincipal.	2026-06-27 22:36:26 -07:00
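The three Principal shapes described in 5797fb24ec (bare string, bare array, and object keyed by AWS, Service, Federated or CanonicalUser) could be flattened along these lines. This is a sketch only: `flattenPrincipal` and `stringOrList` are hypothetical names, not the commit's `PolicyPrincipal` type, and preserving the original JSON for round-tripping is left out.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"sort"
)

// allowedKeys lists the object-form keys named in the commit message.
var allowedKeys = map[string]bool{
	"AWS": true, "Service": true, "Federated": true, "CanonicalUser": true,
}

// stringOrList accepts a non-empty JSON string or a non-empty array of strings.
func stringOrList(raw []byte) ([]string, error) {
	var one string
	if err := json.Unmarshal(raw, &one); err == nil && one != "" {
		return []string{one}, nil
	}
	var many []string
	if err := json.Unmarshal(raw, &many); err != nil || len(many) == 0 {
		return nil, errors.New("principal value must be a non-empty string or array of strings")
	}
	return many, nil
}

// flattenPrincipal returns every principal named in any of the three forms.
func flattenPrincipal(raw []byte) ([]string, error) {
	if vals, err := stringOrList(raw); err == nil {
		return vals, nil
	}
	var obj map[string]json.RawMessage
	if err := json.Unmarshal(raw, &obj); err != nil || len(obj) == 0 {
		return nil, errors.New("principal must be a string, an array, or a non-empty object")
	}
	keys := make([]string, 0, len(obj))
	for k := range obj {
		if !allowedKeys[k] {
			return nil, fmt.Errorf("unsupported principal key %q", k)
		}
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic output order across keys
	var out []string
	for _, k := range keys {
		vals, err := stringOrList(obj[k])
		if err != nil {
			return nil, fmt.Errorf("%s: %w", k, err)
		}
		out = append(out, vals...)
	}
	return out, nil
}

func main() {
	fmt.Println(flattenPrincipal([]byte(`{"AWS": ["arn:aws:iam::123456789012:root", "999999999999"]}`)))
	fmt.Println(flattenPrincipal([]byte(`{"AWS": []}`)))
}
```

Rejecting an empty value such as `{"AWS": []}` up front is the point the review round raised: otherwise the statement would compile to zero matchers.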
Rushikesh Deshpande	d0db94c34a	feat(metrics): Add EC rebuild/reconstruct Prometheus metrics (#10124 ) * Review comment removed unnecessary success and failure count * fix: use Gather.Gather() with seeded counter for EC rebuild registration test - Restore Gather.Gather() to verify MustRegister calls as requested in review - Seed VolumeServerECRebuildCounter before gathering because CounterVec only appears after at least one label value is observed - Use correct fully-qualified metric names (SeaweedFS_volumeServer_) fix: remove preflight checkEcVolumeStatus failure from ec_rebuild_total counter ec_rebuild_total should only reflect actual rebuild execution failures (from RebuildEcFiles / RebuildEcxFile), not scan/precheck failures in the volume status loop. The error is still returned to the caller; only the misleading counter increment was removed. * Review comment removed unnecessary observe * label EC rebuild duration histogram by result Without a result label, fast failures pull down the success-latency quantiles shown on the EC Rebuild Duration panel. Make the histogram a HistogramVec keyed by result, record success/failure through one recordEcRebuild helper, and split the Grafana quantiles by (le, result). * reset EC rebuild metric vecs in registration test The HistogramVec needs a child before Gather emits it, so the test must observe once; reset both vecs in cleanup so that sample doesn't leak into other tests. --------- Co-authored-by: Ubuntu User <ubuntu@example.com> Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-06-27 22:01:36 -07:00
Chris Lu	57ffef8543	fix(admin): skip task state files with no task data on load An empty or truncated tasks/*.pb file unmarshals into a TaskStateFile with a nil Task, and protobufToMaintenanceTask dereferenced it immediately, panicking the whole admin process on startup. Guard the nil case so the loader logs a warning and skips the bad file.	2026-06-26 17:36:42 -07:00
Chris Lu	f643893891	fix(master): shed assign load when volume growth is already in flight (#10121 ) Under a herd of concurrent assigns with no writable volume, Assign spun PickForWrite for the full 10s timeout, pinning a goroutine per request and starving the master of the cycles it needs to process growth and answer heartbeats. When growth is the relevant remedy and already in flight, stop spinning: if free space exists, shed with a fast retryable error so clients back off and retry once growth lands; if the cluster is out of space, fail fast with the real out-of-space error instead of masking it as retryable. The gRPC shed uses ResourceExhausted, not Unavailable: operation.Assign retries it, but the client connection layer doesn't treat it as a dead channel, so a per-request shed across a herd doesn't tear down the shared master connection and cancel every other in-flight assign. The HTTP dirAssignHandler sheds with 503 + Retry-After.	2026-06-26 14:23:40 -07:00
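The HTTP side of the shedding behaviour in f643893891 (503 plus Retry-After) might look like the sketch below. The handler, the `growthInFlight` callback, and the one-second Retry-After value are all assumptions for illustration; the real `dirAssignHandler` is not reproduced here.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// assignHandler sheds load with 503 + Retry-After while growth is in flight.
// growthInFlight is a hypothetical hook; the Retry-After value is assumed.
func assignHandler(growthInFlight func() bool) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if growthInFlight() {
			w.Header().Set("Retry-After", "1")
			http.Error(w, "volume growth in progress, retry shortly", http.StatusServiceUnavailable)
			return
		}
		fmt.Fprintln(w, "assigned")
	}
}

// probe runs one request against the handler and reports status and header.
func probe(growing bool) (int, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/dir/assign", nil)
	assignHandler(func() bool { return growing })(rec, req)
	return rec.Code, rec.Header().Get("Retry-After")
}

func main() {
	code, retry := probe(true)
	fmt.Println(code, retry)
	code, retry = probe(false)
	fmt.Println(code, retry)
}
```

Returning immediately instead of spinning until a timeout frees the request goroutine, which is the resource the commit message says was being pinned.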
jk2lx	81ed379884	volume server: route VolumeMarkReadonly to raft leader (#10120 ) * volume server: route VolumeMarkReadonly to raft leader After a master raft election, volume servers may still heartbeat a follower while admin paths such as weed shell volume.mark call notifyMasterVolumeReadonly via vs.GetMaster(). Followers reject VolumeMarkReadonly with NotLeader, which breaks tiering and other mark-readonly workflows until the heartbeat loop reconnects. Resolve the leader through GetMasterConfiguration on configured -master peers (same Leader field filer/master clients already use) before calling VolumeMarkReadonly. When the leader differs from the heartbeat peer, update currentMaster so the heartbeat loop converges faster. Adds operation.LookupRaftLeaderMaster with unit tests. Co-authored-by: Cursor <cursoragent@cursor.com> * fix: address review feedback on volume.mark raft leader routing Do not update currentMaster during leader lookup — heartbeat owns that field and uses stream GetLeader() to reconnect. Try the heartbeat peer first and only resolve the raft leader after a NotLeader rejection. Add ctx.Err() early exit and quieter logging for context cancellation. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(operation): thread the lookup timeout ctx into connection invalidation The 5s timeout drove only the RPC; WithMasterServerClient saw the unbounded outer ctx, so a self-inflicted timeout (slow GetMasterConfiguration during an election) was treated as a stale channel and tore down the shared master connection. Pass the timeout ctx into the helper so its own expiry leaves ctx.Err() set and spares the connection. --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-06-26 14:22:57 -07:00
Chris Lu	7c3c5ed2a4	fix(filer.sync.verify): sort listings client-side before merge (#10117 ) * fix(filer.sync.verify): sort listings client-side before merge The merge walks both filers' directory listings in lockstep and needs them in the same byte order. A filer before 4.32 with a locale SQL collation lists case-insensitively while a 4.32+ peer lists byte-ordered, so comparing two such clusters returns the same names in a different order and the merge desyncs into spurious MISSING / ONLY_IN_B. Buffer and sort each directory client-side so both sides agree on order regardless of filer version or store backend. Trades the streaming source's O(buffer) memory for O(directory) per side, fine for a one-shot verify CLI; both sides still load concurrently. Claude-Session: https://claude.ai/code/session_01BKsBdKYFNCEjeHLjJfumPF * fix(filer.sync.verify): surface listing errors before merging A listing that fails mid-stream leaves a partial, unsorted buffer. Now that both sides are fully buffered anyway, check each side's error right after the loads finish and before the merge, so partial entries can't emit spurious MISSING / ONLY_IN_B before the error aborts the run. Claude-Session: https://claude.ai/code/session_01BKsBdKYFNCEjeHLjJfumPF	2026-06-26 10:27:18 -07:00
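The lockstep merge in 7c3c5ed2a4 depends on both listings sharing one order. A minimal sketch of sorting client-side and then diffing follows; `diffListings` is a hypothetical helper and the listings are plain name slices rather than filer entries.

```go
package main

import (
	"fmt"
	"sort"
)

// diffListings sorts both sides in byte order, then walks them in lockstep
// and returns the names present on only one side.
func diffListings(a, b []string) (onlyA, onlyB []string) {
	a = append([]string(nil), a...) // copy so callers' slices are untouched
	b = append([]string(nil), b...)
	sort.Strings(a) // Go string comparison is bytewise
	sort.Strings(b)

	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] == b[j]:
			i++
			j++
		case a[i] < b[j]:
			onlyA = append(onlyA, a[i])
			i++
		default:
			onlyB = append(onlyB, b[j])
			j++
		}
	}
	onlyA = append(onlyA, a[i:]...)
	onlyB = append(onlyB, b[j:]...)
	return onlyA, onlyB
}

func main() {
	// Side A lists case-insensitively, side B in byte order: same names.
	a := []string{"alpha", "Beta", "gamma"}
	b := []string{"Beta", "alpha", "gamma"}
	fmt.Println(diffListings(a, b))
}
```

Without the two sort calls, the walk would compare "alpha" with "Beta" first and report both as missing even though the directories hold the same names.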
qzhello	378f9a64ff	fix: apply collectionPattern during detection in volume.fix.replication (#10115 ) * fix(shell): correct volume.list -writable filter unit and comparison * fix(shell): correct volume.list -writable filter unit and comparison * chore(shell): fix typo in EC shard helper param names * fix(shell): use exact match for volume.balance -racks/-nodes filter The old strings.Contains-based filter quietly included any id that was a substring of the user-supplied flag value (e.g. -racks=rack10 also matched rack1). Replace it with an exact-match set parsed from the comma-separated flag value, and add regression tests for both -racks and -nodes paths. Also fix a small typo in the "remote storage" error returned by maybeMoveOneVolume. * fix(shell): use exact match for volume.balance -racks/-nodes filter The old strings.Contains-based filter quietly included any id that was a substring of the user-supplied flag value (e.g. -racks=rack10 also matched rack1). Replace it with an exact-match set parsed from the comma-separated flag value, and add regression tests for both -racks and -nodes paths. Also fix a small typo in the "remote storage" error returned by maybeMoveOneVolume. * refactor(shell): drop nil sentinel in splitCSVSet, use len() in callers * fix: apply collectionPattern during detection in volume.fix.replication * use existing wildcard.MatchesWildcard for collection matching It returns a plain bool, so drop the up-front filepath.Match validation and the path/filepath import that only existed to handle its error. * trim verbose comments to terse one-liners * drop redundant per-path collection guards Detection already filters by replicas[0].info.Collection. The repair guard re-checked pickOneReplicaToCopyFrom's collection (a different replica), so a mixed-collection volume could pass detection yet be skipped in repair without decrementing the counter, spinning the -apply loop. deleteOneVolume keeps its collectionIsMismatch safety. 
--------- Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-06-26 00:48:29 -07:00
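The substring bug fixed for the -racks/-nodes filter in 378f9a64ff is easy to reproduce in isolation. The sketch below contrasts the two approaches; `parseIDSet`, `matchesExact` and `matchesSubstring` are hypothetical helpers written for this illustration, not the shell command's code.

```go
package main

import (
	"fmt"
	"strings"
)

// parseIDSet turns a comma-separated flag value into an exact-match set.
func parseIDSet(flagValue string) map[string]struct{} {
	set := make(map[string]struct{})
	for _, part := range strings.Split(flagValue, ",") {
		if part = strings.TrimSpace(part); part != "" {
			set[part] = struct{}{}
		}
	}
	return set
}

// matchesExact reports whether id is one of the comma-separated values.
func matchesExact(flagValue, id string) bool {
	_, ok := parseIDSet(flagValue)[id]
	return ok
}

// matchesSubstring reproduces the old behaviour: any substring of the flag matches.
func matchesSubstring(flagValue, id string) bool {
	return strings.Contains(flagValue, id)
}

func main() {
	flag := "rack10,rack2"
	fmt.Println(matchesSubstring(flag, "rack1")) // old filter: rack1 slips in
	fmt.Println(matchesExact(flag, "rack1"))     // exact set: rack1 is excluded
	fmt.Println(matchesExact(flag, "rack10"))
}
```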
Chris Lu	f475d60fcf	mount: move directory cache state to a side map to shrink InodeEntry (152 to 32 bytes) (#10114 ) mount: move directory cache state to a side map to shrink InodeEntry The mount keeps an InodeEntry alive for every inode the kernel references. On a mount that is almost entirely regular files, each entry carried the full directory readdir-cache bookkeeping (four time.Time fields plus counters), bloating it to 152 bytes whether or not the inode was a directory. Move that state into a dirState held in a side map keyed by inode, and drop the isDirectory bool: an inode is a directory iff it has a dirState. InodeEntry is now just paths + nlookup at 32 bytes, landing in a smaller Go allocator size class; on a mount with tens of millions of cached file inodes that is several GB less resident heap. As a side effect the readdir-cache scan helpers iterate only directories instead of every inode.	2026-06-25 19:17:32 -07:00
Chris Lu	c2668fbc64	fix(volume): make tier-down crash-safe and serve from local (Rust) (#10113 ) * fix(volume): fsync .vif and downloaded tier .dat (Rust) save_volume_info wrote the .vif with a plain write and no fsync, and the tier download never synced the .dat it wrote. Either could be lost on a crash before the tier-down path acts on them. fsync both, matching the Go volume server's util.WriteFile and DownloadFile. * fix(volume): swap to local before deleting remote on tier-down (Rust) The tier-down path deleted the shared remote object before trimming the .vif, so a crash in between left the volume's .vif pointing at a deleted object. It also dropped the remote backend only on the delete path and never opened the downloaded local .dat, so reads broke until reload and a keep-remote download kept serving from the slow remote object. Trim the .vif and swap to the local .dat on both paths, bracketed by directory fsyncs, before removing the remote object; gate only the object removal on keep_remote_dat_file. Matches the Go volume server's crash-safe ordering.	2026-06-25 12:29:21 -07:00
Chris Lu	66620a1ab8	fix(volume): serve reads from remote after tier upload (Rust) (#10112 ) After VolumeTierMoveDatToRemote uploaded the .dat, the volume closed its local backend but never opened the remote one, leaving both dat_file and remote_dat_file empty. The needle read path has no lazy reopen, so reads returned "dat file not open" until the volume reloaded. Switch to the remote backend right after saving the .vif, the same as the Go volume server's LoadRemoteFile, so the volume keeps serving from remote storage immediately after tiering.	2026-06-25 10:55:52 -07:00
Chris Lu	2c2df751f5	Perf CI: benchmark the Rust volume server and report memory usage (#10111 ) * ci: add per-process memory sampler for perf jobs Samples VmRSS once a second into a CSV and records peak VmHWM per process on stop. Linux only; reads /proc/<pid>/status. * ci: run perf benchmarks on the Rust volume server and report memory Matrix the throughput and S3 jobs over go/rust volume servers, using a standalone master (plus filer for S3) and swapping only the volume binary so the two are directly comparable. Sample peak RSS in every job and surface it per impl in the run summary. * ci: harden mem sampler arg handling and peak fallback Guard against missing args under set -u, and fall back to the max RSS sampled when a process exits before VmHWM can be read.	2026-06-25 10:52:23 -07:00
Chris Lu	2efc0e1656	ec: recover EC shards whose .ecx index lives only on a peer server (#10108 ) * ec: recover EC shards whose .ecx index lives only on a peer server A volume server that boots with EC shard files on disk but no .ecx index on any local disk cannot mount the shards, so the master never learns about them. ec.rebuild works off master-registered shards, so it sees the volume as short and gives up even though the shard data is intact. Add an operator-triggered recovery: VolumeEcShardsMount gains a recover_missing_index flag that makes the volume server fetch the missing .ecx (plus .ecj/.vif) from a peer holding it and mount the on-disk shards. ec.rebuild runs this across the cluster before planning, so orphaned shards register and the rebuild sees the true shard set. .ecx is an immutable encode-time index, identical on every holder. .ecj is a per-holder deletion journal that differs across holders, so the recovered node adopts the source peer's deletion view, like a balanced or rebuilt shard does. * ec: mirror missing-index recovery into the Rust volume server Port the #10104 recovery to seaweed-volume so the Rust volume server self-heals the same layout: EC shards on disk with the .ecx index only on a peer. Adds collect_ec_volumes_missing_index / mount_recovered_ec_shards to the store, recover_missing_ec_indexes (master LookupEcVolume + peer CopyFile fetch + mount) to the server, and the recover_missing_index flag on VolumeEcShardsMount. .ecx is the immutable encode-time index, identical on every holder. .ecj is a per-holder deletion journal, so the recovered node adopts the source peer's deletion view, matching the Go path.	2026-06-25 10:38:14 -07:00
adri	130a5dffc3	fix (Volume [Rust]): stream copy_file and volume_incremental_copy instead of buffering the whole file in memory (#10110 ) * fix(volume): stream copy_file from disk instead of buffering whole file copy_file pushed every 2MB chunk into a Vec and only then returned tokio_stream::iter(results), so serving a near-limit volume as a copy source (e.g. during volume.fix.replication) held the entire .dat resident and could OOM the process. Stream chunks through a bounded mpsc channel from a spawn_blocking reader instead; caps memory at ~16MB per transfer with backpressure. * fix(volume): stream volume_incremental_copy from disk instead of buffering Same buffering pattern as copy_file: every 2MB chunk was pushed into a Vec and only then returned via tokio_stream::iter, holding the entire delta resident. Stream the byte range from an owned file handle through a bounded mpsc channel, mirroring the copy_file fix. * test(volume): cover streaming copy_file and volume_incremental_copy Adds a multi-chunk .dat fixture and tests asserting both handlers stream in 2MB chunks (multiple messages), reassemble byte-for-byte, carry modified_ts_ns only on the first copy_file message, and honor stop_offset. * address review: use u64 byte counters; stream local incremental copy without holding the store lock - copy_file/volume_incremental_copy: track remaining bytes and offsets as u64 instead of casting uint64 stop_offset/dat_size through i64 (CodeRabbit). - volume_incremental_copy: for local volumes open the .dat and stream directly with no lock held; only remote/tiered volumes take the per-chunk read_dat_slice path, so a remote S3 read is never performed while holding the store read lock (Gemini). * volume (Rust): stream tiered incremental copy off the store lock, open .dat under it Capture the reader for volume_incremental_copy while the volume lookup is still under the store read lock: an open File for local volumes, a cloned remote backend handle for tiered ones. Then drop the lock and stream with none held. Opening under the lock pins the reader to the volume that exists now, so a concurrent delete/recreate can't stream from the wrong file, and a slow S3 fetch for a tiered .dat no longer blocks store writers (the remote path previously re-took the store lock per chunk). Use a non-uniform copy-test payload so chunk reassembly catches duplicated or reordered chunks a repeated byte would hide. * volume (Rust): return empty when incremental-copy start offset is past the .dat A corrupt needle index could locate an offset beyond the captured .dat size, underflowing the dat_size - start_offset subtraction (panic in debug, wrap in release). Guard it up front like the other empty-delta early returns. --------- Co-authored-by: adri <adri@digitalunited.net> Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-06-25 10:25:42 -07:00
Chris Lu	c01cea8786	docker release: run all platform jobs in one wave, cache rocksdb compile Drop max-parallel so the 13 per-platform builds run together instead of two waves of 8 (rocksdb was queuing behind the cap and starting ~8 min late). Keep cache-to mode=max for rocksdb: its RocksDB static_lib compile is sha-independent, so it caches across releases and stops being the ~16-min long-pole that gates the merge fan-in. go-build variants stay mode=min.	2026-06-25 01:13:32 -07:00
Chris Lu	3f68b19500	docker release: per-platform builds on native runners, drop mode=max cache (#10109 ) docker release: build per-platform on native runners, drop mode=max cache The build job built every platform of a variant on one runner, so 2-4 Go cross-compiles fought over a single 2-vCPU box and arm64 ran in an emulated context. Split the matrix to one platform per job on a native runner (amd64/386 on ubuntu-latest, arm64/arm-v7 on ubuntu-24.04-arm); only arm/v7 still needs QEMU, and only for its final apk stage. Each job pushes by digest, and a new merge job assembles the multi-arch tag with imagetools and mirrors it to Docker Hub. cache-to mode=max -> mode=min: BRANCH=sha cache-busts the heavy go-build layer every release, so writing all intermediate layers to the gha backend spent 3-11 min per variant on a cache the next release's sha can never hit.	2026-06-25 00:37:33 -07:00
jay	d2795de186	fix(admin): volume TTL in dashboard (#10107 ) fix: admin dashboard ttl display Signed-off-by: jayl1e <jayl1e@outlook.com>	2026-06-24 23:42:10 -07:00
Chris Lu	a88acaf061	Add performance CI (profiling, throughput, S3 read/write) (#10105 ) * test: add self-contained S3 read/write load tool Concurrent PUT/GET against the S3 gateway, reporting requests/sec, transfer rate, and latency percentiles. Built on the aws-sdk-go-v2 client the S3 tests already use, so no extra benchmark binary is needed. * ci: add performance workflow Three parallel jobs: cpu/heap pprof of the server under write load, native throughput via weed benchmark plus the Go micro-benchmarks, and an S3 read/write benchmark against the gateway. Runs on push to master and manual dispatch with tunable duration, object count, size, and concurrency.	2026-06-24 22:44:03 -07:00
github-actions[bot]	d0b90d29eb	4.36 4.36	2026-06-25 05:09:40 +00:00
Chris Lu	d65ed3b557	add release version-bump workflow	2026-06-24 22:08:06 -07:00
Chris Lu	3b9e196e5f	sts: enforce session-policy explicit deny during role chaining (#10103 ) * sts: enforce session-policy explicit deny during role chaining A chained AssumeRole caller authenticates with an STS session token whose inline session policy can explicitly deny sts:AssumeRole. The deny check only evaluated the caller's named policies, so such a session could still chain into any role its trust policy admits. Validate the session token in the deny check and honor an explicit Deny in the inline session policy too. * test(sts): integration coverage for AssumeRole authorization Add an end-to-end AssumeRole authorization test (real weed mini + boto3): a non-admin caller assumes a role its trust policy admits, an explicit identity-side deny is blocked, and a session policy's explicit deny blocks role chaining. * sts: skip OIDC tokens and reject revoked sessions in the chaining deny check Review follow-ups on the session-policy deny check: - Guard session validation with !isOIDCToken so a bearer token our STS service cannot validate does not error into a false deny. - Reject a revoked session before evaluating its policy, restoring the revocation enforcement the AssumeRole path lost when it stopped routing through IsActionAllowed.	2026-06-24 21:38:21 -07:00
Chris Lu	88a4a939aa	fix(sts): authorize AssumeRole by the role's trust policy (#10097 ) * fix(sts): authorize AssumeRole by the role's trust policy The role's trust policy already declares who may assume it, but the caller also had to pass an identity-side sts:AssumeRole check that only the Admin action could satisfy — legacy static identities have no way to express sts:AssumeRole on a role. So assuming any role required a full admin identity. Drop the redundant check and let the trust policy be the authority; scope it to specific principals to restrict who can assume. * sts: resolve caller principal ARN for the trust-policy check A legacy static identity can reach AssumeRole without a PrincipalArn set; passing the empty value would miss a trust policy that names a concrete principal. Resolve it to the canonical user ARN, sharing the logic GetCallerIdentity already used inline. * sts: enforce explicit identity-side deny for AssumeRole Authorizing a named role by its trust policy alone dropped identity-side evaluation entirely, so a caller whose attached policy explicitly denies sts:AssumeRole could still assume any role the trust policy admits. Re-check the caller's policies through the IAM manager for an explicit deny (deny-always-wins) without requiring an allow; the trust policy stays the allow authority.	2026-06-24 20:14:26 -07:00
sshhan	a1fff50935	fix(postgres): prevent uint32 underflow & OOM in message parsing (#10099 ) * fix(postgres): prevent uint32 underflow & OOM in message parsing * postgres: drop redundant startup guard, use maxStartupMessageSize const The msgTotalLen < 8 check already guarantees msgLength >= 4, so the extra msgLength < 4 guard before reading the protocol version was unreachable. Point the startup size limit at maxStartupMessageSize instead of a literal. * postgres: trim query terminator safely, cap pre-auth payloads Use strings.TrimSuffix for the simple-query null terminator so a non-null-terminated body isn't silently shortened, matching the auth handlers. Bound password/MD5 reads with a dedicated maxAuthMessageSize (10 KiB) instead of the 100 MiB maxMessageSize, since these payloads are read before authentication. --------- Co-authored-by: shangshuhan <shangshuhan@cmict.chinamobile.com> Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-06-24 20:05:43 -07:00

14343 Commits