seaweedfs

mirror of https://github.com/seaweedfs/seaweedfs.git synced 2026-06-09 18:32:43 +00:00

Author	SHA1	Message	Date
github-actions[bot]	d0b90d29eb	4.36 4.36	2026-06-25 05:09:40 +00:00
Chris Lu	d65ed3b557	add release version-bump workflow	2026-06-24 22:08:06 -07:00
Chris Lu	3b9e196e5f	sts: enforce session-policy explicit deny during role chaining (#10103 ) * sts: enforce session-policy explicit deny during role chaining A chained AssumeRole caller authenticates with an STS session token whose inline session policy can explicitly deny sts:AssumeRole. The deny check only evaluated the caller's named policies, so such a session could still chain into any role its trust policy admits. Validate the session token in the deny check and honor an explicit Deny in the inline session policy too. * test(sts): integration coverage for AssumeRole authorization Add an end-to-end AssumeRole authorization test (real weed mini + boto3): a non-admin caller assumes a role its trust policy admits, an explicit identity-side deny is blocked, and a session policy's explicit deny blocks role chaining. * sts: skip OIDC tokens and reject revoked sessions in the chaining deny check Review follow-ups on the session-policy deny check: - Guard session validation with !isOIDCToken so a bearer token our STS service cannot validate does not error into a false deny. - Reject a revoked session before evaluating its policy, restoring the revocation enforcement the AssumeRole path lost when it stopped routing through IsActionAllowed.	2026-06-24 21:38:21 -07:00
Chris Lu	88a4a939aa	fix(sts): authorize AssumeRole by the role's trust policy (#10097 ) * fix(sts): authorize AssumeRole by the role's trust policy The role's trust policy already declares who may assume it, but the caller also had to pass an identity-side sts:AssumeRole check that only the Admin action could satisfy — legacy static identities have no way to express sts:AssumeRole on a role. So assuming any role required a full admin identity. Drop the redundant check and let the trust policy be the authority; scope it to specific principals to restrict who can assume. * sts: resolve caller principal ARN for the trust-policy check A legacy static identity can reach AssumeRole without a PrincipalArn set; passing the empty value would miss a trust policy that names a concrete principal. Resolve it to the canonical user ARN, sharing the logic GetCallerIdentity already used inline. * sts: enforce explicit identity-side deny for AssumeRole Authorizing a named role by its trust policy alone dropped identity-side evaluation entirely, so a caller whose attached policy explicitly denies sts:AssumeRole could still assume any role the trust policy admits. Re-check the caller's policies through the IAM manager for an explicit deny (deny-always-wins) without requiring an allow; the trust policy stays the allow authority.	2026-06-24 20:14:26 -07:00
sshhan	a1fff50935	fix(postgres): prevent uint32 underflow & OOM in message parsing (#10099 ) * fix(postgres): prevent uint32 underflow & OOM in message parsing * postgres: drop redundant startup guard, use maxStartupMessageSize const The msgTotalLen < 8 check already guarantees msgLength >= 4, so the extra msgLength < 4 guard before reading the protocol version was unreachable. Point the startup size limit at maxStartupMessageSize instead of a literal. * postgres: trim query terminator safely, cap pre-auth payloads Use strings.TrimSuffix for the simple-query null terminator so a non-null-terminated body isn't silently shortened, matching the auth handlers. Bound password/MD5 reads with a dedicated maxAuthMessageSize (10 KiB) instead of the 100 MiB maxMessageSize, since these payloads are read before authentication. --------- Co-authored-by: shangshuhan <shangshuhan@cmict.chinamobile.com> Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-06-24 20:05:43 -07:00
Chris Lu	0f1ec8983d	mount: don't fail close() on a benign FUSE interrupt (#10102 ) A FUSE interrupt is not a process kill. Go's async preemption (SIGURG) makes a close() under load emit an interrupt on nearly every flush, so deriving the metadata-flush context from the FUSE cancel channel turned healthy concurrent close()s into EIO: the interrupt cancelled the in-flight CreateEntry, which surfaced as "input/output error". Bound the flush with a deadline instead. A healthy CreateEntry finishes in well under a second, so the deadline only fires against a genuinely stuck filer -- still keeping close() from hanging forever -- while benign preemption no longer aborts a good flush.	2026-06-24 19:54:03 -07:00
Chris Lu	95427b5573	security: add BearerPrefix constant for Authorization headers (#10101 ) Introduce security.BearerPrefix ("Bearer ", RFC 6750) and use it everywhere an "Authorization: Bearer <token>" header is constructed, replacing the scattered "BEARER "/"Bearer " string literals. SeaweedFS matches the scheme case-insensitively when parsing (security.GetJwt), so behavior is unchanged; this removes the magic string and settles the casing on the standard form. The parser's upper-case comparison stays as is on purpose.	2026-06-24 19:36:42 -07:00
Chris Lu	4d3e5d94a9	filer: mint volume read JWT when proxying chunk reads (#10100 ) The /?proxyChunkId= endpoint forwards the caller's headers to the volume server but never mints a read token, so proxied chunk reads return 401 once jwt.signing.read.key is configured. Generate a fileId-scoped volume token the same way the direct filer read path does, which fixes filer.sync, filer.backup, filerProxy mounts, the MQ broker and the upload gateway in one place.	2026-06-24 19:21:57 -07:00
dependabot[bot]	7c9f61d4dc	build(deps): bump com.fasterxml.jackson.core:jackson-databind from 2.18.6 to 2.22.0 in /test/java/spark (#10094 ) * build(deps): bump com.fasterxml.jackson.core:jackson-databind Bumps [com.fasterxml.jackson.core:jackson-databind](https://github.com/FasterXML/jackson) from 2.18.6 to 2.22.0. - [Commits](https://github.com/FasterXML/jackson/commits) --- updated-dependencies: - dependency-name: com.fasterxml.jackson.core:jackson-databind dependency-version: 2.22.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> * build(deps): pin jackson-annotations to its own 2.22 version jackson-annotations dropped the patch digit in 2.20 and releases on its own line, so 2.22.0 does not exist. Sharing jackson.version broke dependency resolution; give it a dedicated property. --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-06-24 19:12:48 -07:00
Chris Lu	96d2d13efe	s3: replicate by fanning out from the gateway to every holder (#10078 ) * s3: replicate by fanning out from the gateway to every holder The S3 gateway uploaded each chunk to one volume server, which then relayed the copies to the other replica holders. The gateway now uploads each chunk to every holder in parallel (type=replicate), removing the primary volume server's receive-then-resend relay. AssignVolume returns every replica holder (new repeated Location replicas, forwarded from the master assign), the s3api captures them, and the chunked uploader fans out whenever a chunk has more than one holder. Cipher uploads keep the server-driven path since per-call encryption would diverge the replicas. * s3: cancel sibling replica uploads on the first failure * s3: trim replica fan-out comments * s3: roll back successful fan-out chunk copies when a holder fails A failed fan-out records no FileChunk, so copies that landed on the holders that finished before the cancel were leaked as orphans the caller could not see. Track the holders that succeeded and delete the needle from each (type=replicate, local-only) on failure, leaving nothing behind.	2026-06-24 16:31:58 -07:00
os-pradipbabar	d1b1338558	Fix stale cache fallback for empty volume locations in wdclient (#10081 ) fix(wdclient): prevent stale cache fallback for empty volume locations ## Problem During Kubernetes pod restarts, volume servers temporarily disconnect and their locations are removed from vidMap. The deleteLocation function leaves an empty array [] in vid2Locations map instead of removing the key entirely. GetLocations() was checking 'if found && len(locations) > 0', which would fail for empty arrays and fall back to the cache chain, returning STALE locations from before the restart. This caused S3 gateway to try connecting to old pod IPs that no longer exist, resulting in connection timeouts and hanging registry sync jobs. Example timeline: 1. Volume pod at 10.131.1.28:8081 registers volumes 10,12 2. S3 gateway caches: vid2Locations[10] = [10.131.1.28:8081] 3. Pod restarts, gets new IP 10.131.1.65:8081 4. Master sends delete → vid2Locations[10] = [] (empty, but key exists) 5. BUG: GetLocations(10) sees found=true, len=0 → falls back to cache 6. Returns stale 10.131.1.28:8081 instead of waiting for new location 7. S3 requests timeout trying to reach unreachable old IP ## Solution Distinguish between two cases: - found=true, locations=[] : Volume explicitly has no locations (e.g. restart) → Return nil, false (no fallback to cache) - found=false : Volume never seen in current map → Check cache (preserve cache benefits for unknown volumes) An empty array explicitly means 'this volume currently has no locations', which is semantically different from 'volume unknown'. Don't fall back to stale cache for explicitly empty volumes. ## Testing Added comprehensive tests: - TestGetLocationsEmptyArrayNoFallback: Verifies empty arrays don't use cache - TestGetLocationsUnknownVolumeUsesCache: Verifies unknown volumes still use cache - All existing tests pass ## Impact Fixes registry sync job hangs during SeaweedFS upgrades/restarts. 
S3 gateway will now correctly wait for updated volume locations instead of using stale cached IPs. Related: OutSystems.SeaWeedfs Helm chart, vega cluster incident 2026-06-24	2026-06-24 16:31:32 -07:00
Chris Lu	089acfbf36	fix(s3api): apply static config file updates on reload (#10096 ) A config-file reload (SIGHUP) routed through MergeS3ApiConfiguration, which skips identities marked static so dynamic admin/filer updates can't clobber them. That also blocked the config file itself from updating its own identities, so editing a secretKey and reloading had no effect. Thread a fromStaticFile flag from the file-load path into the merge: the authoritative file overwrites its static identities (and reapplies service accounts under them), while dynamic updates still leave them immutable. Mark the rebuilt identities static in the merge so a concurrent RemoveIdentity never observes them as removable mid-reload.	2026-06-24 16:26:35 -07:00
Chris Lu	cd828f6503	s3: propagate IAM changes from standalone weed s3 to peer pods (#10095 ) Standalone weed s3 created a master client and registered the receiving SeaweedS3IamCache gRPC service, but never wrapped its credential store with the propagating store. Only the filer-embedded path called SetMasterClient, so IAM mutations on one s3 pod never reached peers; they served a stale in-memory identity cache and returned InvalidAccessKeyId until restarted. Wrap the credential store with the master client when one is available, mirroring the filer path, so mutations fan out over the existing gRPC cache service.	2026-06-24 16:26:08 -07:00
Chris Lu	c15989387b	s3tables: allow hyphens in namespace and table names (#10093 ) * s3tables: allow hyphens in namespace and table names Iceberg REST clients routinely use hyphenated namespace/table names, but the S3 Tables charset (a-z, 0-9, _) rejected them with 400. Accept '-' as an interior character (names must still start, and namespaces end, with a letter or digit), making the catalog conformant for those clients. A permissive superset of the AWS S3 Tables charset. * s3tables: allow hyphens in table ARN parsing too The ARN regexes still excluded '-', so parseTableFromARN rejected ARNs with hyphenated namespace/table names and existing reject-the-hyphen tests broke. Widen the ARN patterns to match the validator, retarget those tests at a still-invalid leading-hyphen name, and cover ARN parsing with hyphens.	2026-06-24 16:24:45 -07:00
Chris Lu	1c5f8244a4	s3tables: fix create-after-rename overwriting the renamed table (#10091 ) * s3tables: purge decoupled table data without deleting the reused name path A renamed or created-over-leftover table keeps its data at a location that differs from its catalog name path. Drop now purges that data location and clears the marker, instead of recursively deleting the name path, which may still hold another table's data. * iceberg: route a table created over a leftover to a unique location When the default location is occupied by a leftover directory (data kept when another table was renamed to this name), create the new table at a unique location so it cannot overwrite that table's metadata. Common case is unchanged. * iceberg: fail table create when the leftover-path check errors A transient filer lookup error fell through as "not occupied", routing the new table back to the default path and risking the very overwrite this check guards against. Propagate the error and return 500 instead. * s3tables: assert all catalog xattrs cleared on decoupled drop Seed the full marker set so the test catches a regression that leaves the policy, tags, version, or entry-type attribute on the reused name path. * s3tables: refuse to drop a table whose data path is an ancestor Corrupt metadata can resolve the data path to the bucket or namespace root, which the bucket-scope check still admits; a recursive purge there would wipe sibling tables. Reject an ancestor data path before deleting.	2026-06-24 14:37:04 -07:00
Chris Lu	5456f9d695	mount: confirm an empty directory rebuild before caching it (#10092 ) A directory rebuild wiped the cached children, listed the filer once, and published the directory authoritatively cached over whatever came back. A transient empty listing -- a momentary list-stream glitch that ends as a clean EOF with no entries -- then stranded a populated directory cached over an empty store, hiding every file in it until some unrelated event happened to rebuild it: stat returns ENOENT and readdir returns nothing though the files are safe on the filer, and nothing re-triggers a build. Re-read the directory when the listing comes back empty before trusting it. The first re-read is immediate, since the likely transient clears on a fresh stream; later attempts space out. A genuinely empty directory still lists empty every time and caches as before, so only empty listings pay the extra read. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-24 14:25:23 -07:00
Chris Lu	5112da98a2	mount: skip redundant permission checks under default_permissions (#10089 ) With default_permissions (the mount default) the kernel enforces unix permission bits from the getattr/lookup attributes before it ever calls Open, Create, or Mknod. The mount was re-checking permissions in AcquireHandle and createRegularFile anyway, which duplicated the kernel's work and kept the supplementary-group lookup on the per-file hot path. Gate only the mode-bit access check on default_permissions being off, so a non-root copy does no permission work on open/create. createRegularFile still loads the parent to validate it exists, since the create RPC skips the filer-side parent check. With default_permissions off the mount remains the sole enforcer, so the full check still runs.	2026-06-24 14:24:51 -07:00
Chris Lu	ef109fe9e1	mount: don't hang close() when a writer is killed during flush (#10090 ) * operation: bound AssignVolume with a deadline AssignVolume ran on context.Background(), so when the filer is overwhelmed the RPC could block indefinitely and wedge every caller holding the connection. Give it a 30s deadline so a stuck assign fails and the caller's retry/error path runs instead of hanging forever. * mount: abort flush when the FUSE request is interrupted On close(), a killed process blocks in fuse_flush waiting for the mount to answer. doFlush ran its metadata CreateEntry on context.Background() and ignored the kernel interrupt channel, so against an overwhelmed filer the flush never completed and the process stayed in uninterruptible sleep -- making the pod un-killable. Derive a context from the FUSE cancel channel in Flush/Fsync and thread it through doFlush -> flushMetadataToFiler -> streamCreateEntry; the retry loop stops as soon as the context is cancelled. Release and the pre-rename flush keep a non-cancellable context since they must finish regardless. * operation: harden the AssignVolume timeout test Make the test double's signal send non-blocking and bound the receive with a timeout so a regression can't wedge the test instead of failing it.	2026-06-24 14:24:22 -07:00
Jaehoon Kim	a11d81b21f	fix(filer.backup): repair chunk-incomplete and stale destination entries (#10082 ) * fix(filer.backup): repair chunk-incomplete and stale destination entries filer.backup left destinations diverged while metadata advanced — chunk-incomplete (missing/gapped ranges at full attr.file_size) or holding a chunk superseded by a missed overwrite. The skip/repair decision keyed on filer.FileSize (the attr), which a truncated entry keeps full, so it never repaired. Decide from actual chunk state instead: - coversReference: range-by-range containment (scalar byte totals and attr FileSize/Md5 cannot see chunk-level gaps). - hasStaleBackupChunk: a backup-written chunk (SourceFileId) the source no longer lists; ignores out-of-band (rsync/direct) chunks. - destinationMatchesReference: allocation-free positional fast path gating the above so they run only on divergence (the in-sync path stays cheap). - A strictly-newer destination is never repaired, so an older out-of-order replay cannot roll it back. The stale signal is deferred at equal mtime (same-second versions cannot be ordered; reliable S3 sub-second ordering is a separate fix). Tests in filer_sink_test.go. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * filer.backup: verify chunk range in destinationMatchesReference fast path The allocation-free fast path matched a destination chunk to its reference by SourceFileId alone. That is correct today only because replicateOneChunk copies the source chunk's Offset/Size verbatim, so SourceFileId identity implies an identical range — an invariant that lives in another file with no guard linking the two. If replication ever re-chunks (split/coalesce), a chunk with the right SourceFileId but a different range would fast-path as a full match and skip a needed repair (a false positive in the very class this change otherwise prevents). 
Compare Offset/Size alongside SourceFileId so the fast path is self-contained and can only be more conservative (a range mismatch falls through to the precise coversReference/hasStaleBackupChunk checks). Add tests for a shifted offset and a larger size at matching identity. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-24 14:23:38 -07:00
jk2lx	e1f89f85f2	fix(filer): apply -filer.disk default to metadata log assigns (#10080 ) * fix(filer): apply -filer.disk default to metadata log assigns Metadata event log writes call operation.Assign directly and used only FilerConf path rule DiskType. When filer.conf rules were missing or unmatched, the master received an empty DiskType and grew volumes on the built-in hdd layout. Mirror resolveAssignStorageOption: wire FilerOption.DiskType into the Filer, fall back when the matched path rule has no disk type, and return the matched rule from resolveMetadataLogAssignDiskType to avoid duplicate MatchStorageRule lookups. Co-authored-by: Cursor <cursoragent@cursor.com> * mini: fall back to -volume.disk for filer default disk type weed server copies -volume.disk into the filer disk default when -filer.disk is unset; weed mini did not, so metadata-log assigns sent an empty disk type on clusters that only tag volumes (e.g. hot/warm). --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-06-24 10:47:11 -07:00
Chris Lu	d29e6ed98a	deps: replace deleted tyler-smith/go-bip39 with cosmos fork (#10088 ) The tyler-smith/go-bip39 repository was deleted from GitHub, so go mod download fails for anyone resolving it directly (GOPROXY=direct). It only reaches us transitively through rclone's internxt backend, which calls IsMnemonicValid and NewSeed. Point it at cosmos/go-bip39, an API-compatible and maintained fork.	2026-06-24 10:41:43 -07:00
Chris Lu	e744b5f2ee	iceberg: detect table-exists through the wrapped manager error (#10075 ) handleCreateTable used a type assertion that fails through WithFilerClient's 'all filers failed' wrap, so a concurrent create that the pre-check missed fell through instead of returning the existing table. Use errors.As.	2026-06-24 10:22:36 -07:00
patrick	3e2c637858	util: trim minFreeSpace values before parsing (#10083 )	2026-06-24 09:03:38 -07:00
Lisandro Pin	30f2dd5040	Weed shell `ec.rebuild`: Allow targeting rebuild to specific volume IDs. (#10087 )	2026-06-24 08:40:29 -07:00
qzhello	fb168e2a36	fix: avoid reading upload body when writing JSON errors (#10073 ) * fix(shell): correct volume.list -writable filter unit and comparison * fix(shell): correct volume.list -writable filter unit and comparison * chore(shell): fix typo in EC shard helper param names * fix(shell): use exact match for volume.balance -racks/-nodes filter The old strings.Contains-based filter quietly included any id that was a substring of the user-supplied flag value (e.g. -racks=rack10 also matched rack1). Replace it with an exact-match set parsed from the comma-separated flag value, and add regression tests for both -racks and -nodes paths. Also fix a small typo in the "remote storage" error returned by maybeMoveOneVolume. * fix(shell): use exact match for volume.balance -racks/-nodes filter The old strings.Contains-based filter quietly included any id that was a substring of the user-supplied flag value (e.g. -racks=rack10 also matched rack1). Replace it with an exact-match set parsed from the comma-separated flag value, and add regression tests for both -racks and -nodes paths. Also fix a small typo in the "remote storage" error returned by maybeMoveOneVolume. * refactor(shell): drop nil sentinel in splitCSVSet, use len() in callers * fix: avoid reading upload body when writing JSON errors	2026-06-23 20:20:11 -07:00
Chris Lu	c95401b11a	iceberg: support table rename (#10068 ) * s3tables: add RenameTable operation * iceberg: support table rename * iceberg: test table rename * s3tables: keep table data in place on rename rename is catalog-only: drop the source's catalog xattrs in place instead of recursively deleting its directory, which wiped the metadata.json and data files the renamed destination still points at. treat a missing table-metadata xattr as NoSuchTable in GetTable so the soft-deleted source name stops resolving. * s3tables: test rename preserves data make the in-memory filer honor recursive data deletion and seed the source table's metadata/ and data/ children, then assert a rename leaves them intact, the source name resolves to NoSuchTable, and the destination resolves to the preserved location. * iceberg: map rename errors through wrapped manager error * s3tables: authorize rename destination namespace rename moved a table into the destination namespace after only checking the source, letting a source-authorized caller place tables in namespaces they don't control. require CreateTable on the destination namespace and bucket before writing. * s3tables: purge renamed table data on drop * s3tables: test table data dir derivation	2026-06-23 20:18:11 -07:00
Chris Lu	7abed4e517	s3: skip 503 when client disconnects during remote cache wait (#10071 ) s3: don't write 503 to a disconnected client during remote cache wait When the remote-only cache poll returns without chunks, re-check the request context before emitting 503 + Retry-After. A client that disconnected during the wait surfaces as context.Canceled, which the caller already handles silently; writing to the closed connection only produced broken-pipe log noise.	2026-06-23 15:31:08 -07:00
Chris Lu	0403e47ef6	iceberg: support views (#10069 ) * s3tables: tag table entries and exclude views from table listings * s3tables: add view CRUD operations * iceberg: support view create, load, exists, drop, and list * iceberg: support view update * iceberg: test view error classification and metadata round-trip * iceberg: pre-check existence and write view metadata only after create * iceberg: map view namespace-not-found to 404 * iceberg: test view create namespace-404 and duplicate no-clobber * s3tables: tag view metadata and entry type atomically CreateView wrote ExtendedKeyMetadata and ExtendedKeyEntryType in two UpdateEntry calls, so a partial failure could leave a view directory untagged. Add setExtendedAttributes to set both in one UpdateEntry. * iceberg: roll back view registration when metadata write fails The metadata file is written after the catalog registers the view. If that write fails, drop the just-created view so it doesn't linger pointing at a missing metadata.json. Reuse the DeleteView path via a shared dropView helper.	2026-06-23 15:22:31 -07:00
Chris Lu	1ca628d3e9	iceberg: support multi-table transaction commit (#10066 ) * iceberg: support multi-table transaction commit Add handleCommitTransaction for POST /v1/transactions/commit. Validation is atomic across all table-changes (resolve, load, evaluate every requirement before any write); metadata writes and pointer flips are best-effort with rollback, so this is not crash-atomic. * iceberg: route transactions/commit with and without prefix * iceberg: test transaction commit request decoding * iceberg: restore full prior table state on transaction rollback * iceberg: test transaction rollback restores full prior table state * iceberg: only clean up metadata for rolled-back tables	2026-06-23 14:08:03 -07:00
Chris Lu	628ce57625	iceberg: support table register (#10067 ) * s3tables: add RegisterTable op * iceberg: support table register * iceberg: test register table * iceberg: parse engine-written metadata version from location * iceberg: test metadata version parsing for both filename forms * iceberg: map register errors through wrapped manager error * iceberg: validate register metadata-location bucket and reject traversal * iceberg: log register metadata load failure	2026-06-23 14:07:13 -07:00
Chris Lu	63f2f0bef5	s3: keep a file promoted to a directory retrievable as an object (#10070 ) * filer: treat a directory carrying object data as an S3 key object A file promoted to a directory by a child write keeps its chunks, inline content, or remote-tiered entry. Recognize that as a directory key object, not only when a Mime is set, so the object still lists, demotes on delete, and is not reclaimed by cleanup like the object it still is. * filer: keep the empty-folder cleaner from reclaiming a promoted object The cleaner skips directory key objects, but its check only looked at the Mime. Mirror the chunks/content/remote check so a file promoted to a directory is not deleted once its children are gone. * s3: serve ranged GET for a directory that carries object data Reject only zero-size directories so a file promoted to a directory streams range requests instead of returning 404, while empty directories still 404. * s3: return HEAD metadata for a directory that carries object data HEAD now 404s a directory only when it has no data, so a promoted object is retrievable while empty/implicit directories still fall back to LIST.	2026-06-23 14:06:00 -07:00
7y-9	ddd11e44f9	feat: support marking volumes by collection (#9585 ) * feat: add collection.mark shell command Add collection.mark to mark all existing normal volume replicas in a collection as readonly or writable. The command runs in preview mode by default and requires -apply to execute changes. It reuses existing volume mark RPCs, supports default collection aliases, skips EC shards, and adds unit tests for option parsing and target collection logic. * Revert "feat: add collection.mark shell command" This reverts commit `50c2bbf94c`. * feat: support marking volumes by collection Add a -collection option to volume.mark so operators can mark every normal volume replica in a collection using existing topology data and volume mark RPCs. The change keeps the single-volume path unchanged and adds tests for collection target selection, EC shard exclusion, and argument validation. Co-authored-by: Codex <noreply@openai.com> * volume.mark: reuse eachDataNode for collection traversal * volume.mark: continue past per-volume failures and report progress Collection marking aborted on the first failed RPC, leaving the collection half-marked with no record of which volumes succeeded. Mark every reachable volume, print per-volume progress to the writer, and return an aggregated error naming the failures. * volume.mark: let -collection _default target the unnamed collection Other volume commands use the _default sentinel to match volumes with no named collection; volume.mark could not reach them at all. Map _default to the empty collection name in the filter. --------- Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com> Co-authored-by: Codex <noreply@openai.com> Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-06-23 11:27:43 -07:00
msementsov	70d9dd5afe	volume.balance: add -volumesPerExec to cap moves per run Limit the number of volume moves performed in one command execution; re-run to continue. 0 = unlimited.	2026-06-23 10:48:33 -07:00
198wmj	aeaf62fa86	fix: resolve postgres startup message length type mismatch and uint underflow OOM risk (#10065 ) * fix: resolve postgres startup message length type mismatch and uint underflow OOM risk * Update weed/server/postgres/server.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: wangmeijuan <542204218@qq.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-06-23 10:08:26 -07:00
Chris Lu	b0bad761ff	worker: don't leak task goroutines on forced shutdown (#10062 ) * worker: don't leak task goroutines on forced shutdown Stop() drains in-flight tasks for 30s, then terminates the manager loop. A task still running past that deadline later reports completion through w.cmds - getAdmin in completeTask, the ActionIncTask* send, removeTask - but with the loop gone and cmds unbuffered, those sends and the response reads behind getAdmin/getTaskLoad block forever, leaking the goroutine. Close a done channel when the loop exits and route the task-goroutine sends through it so they abort and return zero values instead of blocking. getAdmin can now return nil mid-shutdown, so collapse its double-call sites to a single nil-checked call to avoid a deref. * worker: abort remaining manager-loop sends after shutdown Extend the post-shutdown abort to the sends that still blocked: Stop()'s own ActionStop (so a second Stop, e.g. an admin-shutdown timer racing an explicit one, doesn't hang), setTask, and handleTaskCancellation. Route them through w.done so they return instead of blocking when the loop is gone. Stop is now idempotent.	2026-06-23 10:06:59 -07:00
AlexALei	faa8c3963b	fix(chunk_cache): close data/index files on initialization error (#10057 ) * fix(chunk_cache): close data/index files on initialization error * chunk_cache: assign outer err on the .dat open path The error-path defer keys off the function-level err, but the .dat OpenFile used := and shadowed it, so that path relied on nothing being open yet rather than the cleanup invariant. Assign the outer err so every error return is uniform. * chunk_cache: verify descriptor closure on POSIX, not just Windows os.Remove succeeds on open files on Linux/macOS, so the removal check only proved closure on Windows. Compare the open-fd count before and after the failed load; gate the removal check to Windows. --------- Co-authored-by: Contributor <contributor@example.com> Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-06-23 01:49:35 -07:00
shiftraodd	1e2412e502	fix: enforce XATTR_REPLACE semantics in setxattr (#10059 ) * Fix missing XATTR_REPLACE semantics in weedfs_xattr.go * mount: fix XATTR_CREATE/XATTR_REPLACE flag semantics in setxattr XATTR_CREATE fell through into the XATTR_REPLACE branch: creating a new attribute hit the empty-oldData guard and returned ENODATA instead of creating it, while creating over an existing attribute silently succeeded without the EEXIST that setxattr(2) requires. Drop the fallthrough chain so CREATE returns EEXIST when the attribute already exists, REPLACE returns ENODATA when it is missing, and otherwise the value is written. Test existence via the map lookup so an attribute with an empty value is still treated as present. --------- Co-authored-by: 王郁文 <wangyuwen@cmict.chinamobile.com> Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-06-23 01:31:14 -07:00
patrick	4bcd27fb6f	s3api: preserve equals signs in tag values (#10058 ) * s3api: preserve equals signs in tag values * s3api: decode tag key once in parseTagsHeader --------- Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-06-23 01:25:32 -07:00
mumingl	16c3f5c816	fix: Resolve inconsistent usage of error variables (#10060 ) * fix: Resolve inconsistent usage of error variables * mysql2: guard nil DB on open failure and wrap connect error --------- Co-authored-by: muminglei <muminglei@cmict.chinamobile.com> Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-06-23 01:25:02 -07:00
Chris Lu	6de283ccaa	iceberg: return 400 for invalid namespace/table names (#10051 ) * iceberg: return 400 for invalid namespace/table names The S3 Tables name charset (a-z, 0-9, _) is stricter than the Iceberg REST spec, so clients sending hyphens or uppercase hit a validation error. That error fell through to 500; it's client input, so map it to 400 BadRequestException across the namespace and table handlers. * iceberg: tighten name-validation error matching Match the validator's own phrasings (invalid/must/cannot) instead of a bare "namespace name"/"table name" substring, so an unrelated fault that happens to mention a name isn't misreported as a 400. Lowercase first to stay robust to message capitalization.	2026-06-23 00:42:42 -07:00
Chris Lu	0ded0984a4	iceberg: support namespace property updates (#10052 ) * iceberg: support namespace property updates Add POST /v1/namespaces/{namespace}/properties to the REST catalog. It applies the request's removals and updates and returns the removed/updated/ missing summary the spec defines. A new UpdateNamespace op on the S3 Tables manager rewrites the stored namespace properties; AWS S3 Tables namespaces have no properties, so this is the SeaweedFS-side backing for the catalog. * iceberg: dedup namespace property removals A key repeated in removals was deleted on its first occurrence, then reported as missing on the next — landing in both removed and missing. Skip keys already processed. * iceberg: map namespace-update backend errors to REST statuses UpdateNamespaceProperties returned 500 for every manager failure, masking the namespace being dropped between read and write, or a denied caller. Inspect the typed S3TablesError and answer 404/403 accordingly, 500 only for the rest. Also replaces the GetNamespace not-found string match. * iceberg: test the namespace-properties conflict path Cover the 422 returned when a key appears in both removals and updates. The check runs before any backend call, so it needs no filer.	2026-06-23 00:41:47 -07:00
7y-9	44d575100a	fix(s3api): preserve requested AES256 copy encryption (#10049 ) * fix(s3api): preserve requested AES256 copy encryption Problem CopyObject metadata processing ignored an explicit x-amz-server-side-encryption: AES256 request header. A destination copy could lose the requested SSE-S3 metadata even though KMS requests were handled. Root cause processMetadataBytes only wrote the destination SSE header when the requested algorithm was aws:kms. Any other explicit SSE algorithm fell through to the source-preservation branch. Fix Write the requested SSE algorithm whenever x-amz-server-side-encryption is present, and keep KMS-specific metadata handling limited to aws:kms. Co-authored-by: Codex <noreply@openai.com> * fix(s3api): reject unsupported copy encryption algorithms A mistyped or unsupported x-amz-server-side-encryption value on a copy request slipped past validation and got persisted as the destination's algorithm header, advertising encryption that was never applied. Reject anything other than AES256 or aws:kms up front. * fix(s3api): write SSE key metadata for empty encrypted copies A zero-byte source copied with an explicit SSE request took the no-content branch and never ran the encryption path, leaving the object with a bare algorithm header but no key. HEAD then advertised SSE while the encryption-state machine saw the header as orphaned. Run the inline encryption path when the destination requests encryption so the key metadata is written too. 
* s3api: use SSEAlgorithmKMS constant in copy metadata handling * test(s3api): cover source SSE preservation on copy * test(iam): allow the local client's real source IP in SourceIp tests The aws:SourceIp allow policies hardcoded the loopback CIDRs, but a CI runner reaching the server over localhost can be observed with one of the host's RFC1918 addresses (the S3 endpoint is advertised on a 10.x interface), so the positive-condition PutObject was denied and the allow assertion flaked while the deny path passed trivially. Broaden the allow list to loopback plus private ranges via a shared helper, and log the denial on each failed attempt so any residual failure is diagnosable. --------- Co-authored-by: Codex <noreply@openai.com> Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-06-22 22:19:24 -07:00
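aCuteBegCinner	42ccfc0763	refactor: replace %v with %w in fmt.Errorf to preserve the error chain (#10050 ) Replaced the error formatting in multiple files to wrap the original error with %w, preserving the full error chain so errors are easier to trace when debugging. Co-authored-by: guant <guant@chinaunicom.cn>	2026-06-22 21:31:45 -07:00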
AlexALei	091d953c34	fix(benchmark): close CPU profile file handle after profiling (#10048 ) Co-authored-by: Contributor <contributor@example.com>	2026-06-22 20:33:22 -07:00
patrick	11b7b7247f	util: support IPv6 host port parsing (#10046 ) * util: support IPv6 host port parsing * Update weed/util/parse.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-06-22 20:32:43 -07:00
DanielWu-star	55a54574af	fix: use %w instead of %v in fmt.Errorf to preserve error chain (#10047 ) In ec_task.go, 23 fmt.Errorf calls used %v verb to wrap errors, breaking the error chain introduced in Go 1.13. This prevents callers from using errors.Is() and errors.As() to inspect the underlying error type. Changed all fmt.Errorf calls from %v to %w to properly wrap errors, preserving the error chain for upstream callers. Note: glog.* logging calls and fmt.Sprintf calls intentionally keep %v as they are not error wrapping contexts. Co-authored-by: 吴奇臻 <wuqizhen@cmict.chinamobile.com>	2026-06-22 20:30:37 -07:00
dependabot[bot]	36f2ddcaea	build(deps): bump github.com/apache/iceberg-go from 0.5.0 to 0.6.0 (#10038 ) * build(deps): bump github.com/apache/iceberg-go from 0.5.0 to 0.6.0 Bumps [github.com/apache/iceberg-go](https://github.com/apache/iceberg-go) from 0.5.0 to 0.6.0. - [Release notes](https://github.com/apache/iceberg-go/releases) - [Commits](https://github.com/apache/iceberg-go/compare/v0.5.0...v0.6.0) --- updated-dependencies: - dependency-name: github.com/apache/iceberg-go dependency-version: 0.6.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * iceberg: adapt worker to iceberg-go 0.6.0 API Fields() now yields iter.Seq2 (index, value); SortField.SourceID and PartitionField.SourceID are methods backed by SourceIDs; RemoveSnapshots takes a postCommit flag (false here, file cleanup runs through the filer). --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-06-22 11:51:37 -07:00
qzhello	9de9dbaa83	fix(shell): exclude failed EC shard copies from rebuild recoverability gate (#10043 ) * fix(shell): correct volume.list -writable filter unit and comparison * fix(shell): correct volume.list -writable filter unit and comparison * chore(shell): fix typo in EC shard helper param names * fix(shell): use exact match for volume.balance -racks/-nodes filter The old strings.Contains-based filter quietly included any id that was a substring of the user-supplied flag value (e.g. -racks=rack10 also matched rack1). Replace it with an exact-match set parsed from the comma-separated flag value, and add regression tests for both -racks and -nodes paths. Also fix a small typo in the "remote storage" error returned by maybeMoveOneVolume. * fix(shell): use exact match for volume.balance -racks/-nodes filter The old strings.Contains-based filter quietly included any id that was a substring of the user-supplied flag value (e.g. -racks=rack10 also matched rack1). Replace it with an exact-match set parsed from the comma-separated flag value, and add regression tests for both -racks and -nodes paths. Also fix a small typo in the "remote storage" error returned by maybeMoveOneVolume. * refactor(shell): drop nil sentinel in splitCSVSet, use len() in callers * fix(shell): exclude failed EC shard copies from rebuild recoverability gate prepareDataToRecover incremented the remote-shard counter before the copy RPC, so in apply mode a failed VolumeEcShardsCopy was still counted toward the DataShardsCount recoverability gate. The gate could then pass with fewer real shards than required, deferring the failure to the deeper generateMissingShards/reconstruct step and reporting an inflated shard count in the "not enough shards" error. Count the remote shard only after a successful copy (apply mode) or when planning (dry-run), and rename wouldCopy to recoverableRemoteShards for clarity. Add a regression test covering an apply-mode copy failure. 
* fix(shell): clean up copied EC shards when the recoverability gate fails A runtime copy failure can trip the gate after earlier copies already succeeded, stranding those working shards on the rebuilder. Return the copied shard ids on the error path and run the cleanup defer even when recovery fails, so the temp shards get deleted. --------- Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-06-22 11:23:23 -07:00
MorezMartin	6f1d4af035	fix(filer): propagate proxyChunkId query params to volume server (#10036 ) * fix(filer): propagate proxyChunkId query params to volume server When weed mount reads via filer proxy mode (-volumeServerAccess=filerProxy), the mount adds query params like readDeleted=true to chunk read requests. Two bugs prevented these from working: 1. filer_server_handlers.go extracted fileId from the raw RequestURI, which includes query params, corrupting the fileId (e.g. '6,abc&readDeleted=true'). Fix: use r.URL.Query().Get("proxyChunkId") for clean extraction. 2. filer_server_handlers_proxy.go didn't forward query params to the volume server. The urlStrings from LookupFileId already contain the fileId in the path, so just append the original query string. * filer: match chunk proxy by query param, not URI prefix order Order-dependent prefix slicing missed proxyChunkId when it wasn't the first query param. Gate on root path and read the parsed query value. * filer: drop internal proxyChunkId from proxied volume query Lookup URLs already carry the fileId in the path, so forwarding the raw query duplicated proxyChunkId onto the volume server. Strip it and only append the remaining caller params (e.g. readDeleted). --------- Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-06-22 11:21:29 -07:00
Chris Lu	16ba8af0b7	util/http: lazily init the global HTTP client to fix admin metrics nil panic (#10044 ) util/http: lazily init the global HTTP client GetGlobalHttpClient returned a nil client until InitGlobalHttpClient ran, which only happens in weed.go's main. Anything that starts a command in-process bypasses that: the admin server's metrics goroutine seeds a dashboard sample on startup, reaching fetchPublicUrlMap -> GetGlobalHttpClient().Do, and nil-derefs the receiver in GetHttpScheme. Init the client on first Get via sync.Once so it is never nil regardless of the startup path. InitGlobalHttpClient keeps its eager-init role through the same Once.	2026-06-22 10:20:02 -07:00
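
The lazy-initialization pattern described in 16ba8af0b7 can be sketched as follows. This is a minimal illustration of the sync.Once technique the message names, not the actual SeaweedFS code: globalClient, initClient, InitGlobalClient and GetGlobalClient are hypothetical stand-ins, and a plain *http.Client is assumed in place of the project's own client type.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

// Hypothetical package-level state; the real project uses its own client type.
var (
	globalClient     *http.Client
	globalClientOnce sync.Once
)

// initClient is the single initialization path. sync.Once guarantees the
// function body runs at most once, even under concurrent callers.
func initClient() {
	globalClientOnce.Do(func() {
		globalClient = &http.Client{}
	})
}

// InitGlobalClient keeps the eager-init role (e.g. called from main),
// but goes through the same Once as the lazy path.
func InitGlobalClient() {
	initClient()
}

// GetGlobalClient initializes on first use, so it never returns nil
// regardless of whether InitGlobalClient was called beforehand.
func GetGlobalClient() *http.Client {
	initClient()
	return globalClient
}

func main() {
	// No explicit Init call: this mimics a command started in-process.
	c := GetGlobalClient()
	fmt.Println(c != nil)

	// A later eager init is a no-op and does not replace the client.
	InitGlobalClient()
	fmt.Println(c == GetGlobalClient())
}
```

Routing both the eager and the lazy entry points through one Once means the order in which they are called does not matter, which is the property the commit message relies on.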


14298 Commits