seaweedfs

mirror of https://github.com/seaweedfs/seaweedfs.git synced 2026-05-17 15:21:31 +00:00

Author	SHA1	Message	Date
Chris Lu	0315da9022	fix(s3api): self-heal stale .versions latest-version pointer on read (#9125 ) * fix(s3api): self-heal stale .versions latest-version pointer on read When the `.versions` directory metadata points at a version file that has gone missing (e.g. a crash between deleting the latest version and rewriting the pointer, or a concurrent delete racing with a read), `getLatestObjectVersion` bailed with a hard error that required manual repair. Tagging, ACL, retention, copy-source, and HEAD/GET all surfaced NoSuchKey even though other versions remained on disk. On `ErrNotFound`/`codes.NotFound` from the pointed-at version lookup, rescan the `.versions` directory, pick the newest remaining non-delete- marker entry, persist the repaired pointer (best-effort), and return that entry. If only delete markers (or nothing) remain, the caller still sees an error and the object correctly appears absent. Extracted the selection logic into a pure `selectLatestContentVersion` helper so `updateLatestVersionAfterDeletion` and the new self-heal path share a single implementation. A warning is logged whenever the heal kicks in so stale-pointer incidents remain visible in operator logs. * fix(s3api): self-heal must promote newest version even if delete marker Review feedback (gemini-code-assist): the self-heal path used `selectLatestContentVersion`, which skips delete markers. That had two bugs: 1. If the chronologically newest entry was a delete marker, an older content version would be promoted, effectively "undeleting" an object that was actually deleted. 2. If only delete markers remained, heal returned an error and the caller surfaced a hard 500 instead of the correct 404-with- x-amz-delete-marker response. Add `selectLatestVersion` that picks the newest entry regardless of type (content or delete marker) and use it in `healStaleLatestVersionPointer`. The promoted entry flows back through `doGetLatestObjectVersion` unchanged; downstream handlers already detect `ExtDeleteMarkerKey` on the returned entry and render NoSuchKey + `x-amz-delete-marker: true` (see `s3api_object_handlers.go:722-728`). Kept `selectLatestContentVersion` in place for `updateLatestVersionAfterDeletion`, which deliberately limits the pointer to a live content version in the post-deletion flow — changing that is out of scope for this fix. Added four tests for the new selector: - promotes newest delete marker over older content (the reviewer case) - picks content when content is newest - promotes newest delete marker when only delete markers remain - returns nil latestEntry on empty/untagged input * fix(s3api): paginate .versions scan in self-heal path Review feedback (gemini-code-assist, coderabbitai): the self-heal rescan did a single-shot list(..., 1000). For objects whose version ids use the old raw-timestamp format, filer ordering is lexicographic-ascending = oldest-first. If the .versions directory held more than one page of entries, the first page contained only the oldest, and the heal would promote an older version as the new "latest" — silently surfacing stale data on subsequent reads. Paginate through the whole directory with `filer.PaginationSize`, running `selectLatestVersion` per page and keeping a single running best candidate across pages (via `compareVersionIds`). This mirrors the pagination pattern already used by `getObjectVersionList` in the same file and closes the window for old-format stacks larger than one page. New-format (inverted-timestamp) ids were not affected because their lexicographic order matches newest-first, but paginating is still the right fix. Also updated the function doc to reflect that self-heal now promotes the newest entry regardless of type (content version or delete marker). * fix(s3api): don't resurrect deleted objects; wrap ErrNotFound sentinel Two review findings addressed: 1. `updateLatestVersionAfterDeletion` was using `selectLatestContentVersion` which skipped delete markers. Scenario: PUT v1, DELETE (dm1 written, pointer->dm1), PUT v3 (pointer->v3), user explicitly deletes v3 by versionId. Remaining files: v1, dm1. S3 semantics say the current version is the chronologically newest = dm1 (object appears deleted). Old code would promote v1, "undeleting" the object silently. Switched to `selectLatestVersion`, which picks the newest entry regardless of type. Also paginated the scan the same way the self-heal path does — otherwise a single-page `list(1000)` still mis-selects the latest for old-format version id stacks that exceed one page. Removed the `hasDeleteMarkers \|\| !isLast` branch: with `selectLatestVersion` any delete marker participates in the comparison and shows up as the latest when it is newest, so the "keep the directory on ambiguity" guard becomes unreachable. Full pagination also makes the `!isLast` guard unnecessary — we either see every entry or surface a list error. 2. `healStaleLatestVersionPointer`'s "no remaining version" error was a plain `fmt.Errorf`, so callers could not distinguish genuine object-absence from a scan failure via `errors.Is`. Wrap it with `filer_pb.ErrNotFound` so the sentinel flows through the chain (the outer wrap already uses `%w`). No test additions needed — `TestSelectLatestVersion_PromotesNewestDeleteMarker` already asserts the "newer delete marker beats older content" invariant that drives both fixes. * refactor(s3api): drop unused selectLatestContentVersion after review Review feedback flagged a stale comment claiming selectLatestContentVersion "mirrors" the post-deletion semantics of updateLatestVersionAfterDeletion. That claim became false when the post-deletion path switched to selectLatestVersion in the previous commit. Verified no production callers remain — only the helper's own tests referenced it — so the cleaner fix is to delete the dead code rather than rewrite the comment to explain "this exists but isn't used." - Removed selectLatestContentVersion. - Removed the four TestSelectLatestContentVersion_* tests. - Preserved a renamed TestSelectLatestVersion_MixedFormats so the mixed-format comparator coverage still runs against the active selector.	2026-04-17 14:57:59 -07:00
Chris Lu	bfea14320a	fix(s3): reject unknown POST policy conditions and extra x-amz form fields (#9124 ) * fix(s3): reject unknown POST policy conditions and extra x-amz form fields CheckPostPolicy previously accepted policy conditions with unknown $keys (e.g. "$foo") as satisfied, and only rejected stray X-Amz-Meta-* form fields. Reject unknown condition keys outright, and extend the extra- input-fields check to all X-Amz-* form fields except the reserved auth/signing headers. Matches AWS S3 POST Object behavior. * refactor(s3): drop redundant $x-amz-meta- prefix check in CheckPostPolicy The $x-amz- prefix already subsumes $x-amz-meta-, so the explicit $x-amz-meta- check adds no coverage. Simplify the else-if condition. Addresses gemini-code-assist review on PR #9124. * style(s3): align unknown-key policy error with [op, key, value] trailer Reformat the unknown-condition-key error in CheckPostPolicy to include the same "[op, key, value]" trailer used by the other condition-failed messages. The value slot is empty because no comparison occurs for an unknown key. The descriptive "unknown condition key" suffix is kept so operators can still tell this failure from a mismatched value. * fix(s3): honor starts-with prefix-stem POST policies when checking extras AWS POST policies use ["starts-with","$x-amz-meta-",""] to allow any X-Amz-Meta-* form field. The previous exact-match policyXAmzKeys would flag every X-Amz-Meta-Foo as an "Extra input fields" failure because only the stem X-Amz-Meta- was stored. Track starts-with conditions whose key ends in "-" with an empty value as prefix stems, and accept any X-Amz-* form field matching one of those stems. * fix(s3): validate value prefix for starts-with POST policy stems Drop the policy.Value == "" gate when detecting prefix-stem conditions so that ["starts-with","$x-amz-meta-","pfx-"] is recognized as a prefix rule. Track the required value prefix alongside the name prefix, enforce it against every matching form field in the extras loop, and skip the prefix-stem condition in the main iteration (it has no single form field to evaluate). Also include policy.Value in the unknown-condition error trailer for clearer debugging. Addresses gemini-code-assist review on PR #9124. * fix(s3): check every matching POST policy rule, not just the first The extras loop exited early on exact-key match and broke on the first matching prefix stem. Per AWS, a form field must satisfy every policy condition that applies to it, so an exact-match field must still honor any overlapping starts-with stem's value prefix, and multiple stems on the same field must all hold. Drop both early exits: start matched from the exact-key lookup, iterate all prefix stems, and fail on the first value-prefix violation. Addresses gemini-code-assist review on PR #9124.	2026-04-17 14:55:12 -07:00
Chris Lu	0bddc2652e	fix(s3): propagate validated POST form fields to upload headers (#9123 ) * fix(s3): propagate validated POST form fields to upload headers POST Object form fields like acl, Content-Encoding, x-amz-storage-class, x-amz-tagging, and x-amz-server-side-encryption were validated against the POST policy but never forwarded to the underlying PUT, so the validated values had no effect. Forward all non-reserved x-amz-* fields plus acl (as x-amz-acl) and Content-Encoding. Reserved POST policy mechanism fields (Policy, Signature, Key, etc.) are still excluded. * fix(s3): also forward Content-Language from POST form to upload headers AWS S3 POST Object supports Content-Language; add it to the set of content headers forwarded by applyPostPolicyFormHeaders alongside Cache-Control, Expires, Content-Disposition, and Content-Encoding. Addresses gemini-code-assist review on PR #9123. * refactor(s3): only look up form value in branches that use it applyPostPolicyFormHeaders previously called formValues.Get(k) for every form field, including fields that fall through the switch (non-reserved fields that are neither Acl, a forwarded content header, nor X-Amz-*). Move the lookup inside the switch cases that actually use it.	2026-04-17 14:55:06 -07:00
Chris Lu	2da24cc230	fix(s3): return 403 on POST policy violation instead of 307 redirect (#9122 ) * fix(s3): return 403 on POST policy violation instead of 307 redirect CheckPostPolicy failures previously responded with HTTP 307 Temporary Redirect to the request URL, which causes clients to re-POST and obscures the failure. Return 403 AccessDenied so the client surfaces the error. * test(s3): exercise PostPolicyBucketHandler end-to-end for 403 mapping Replace the shallow ErrAccessDenied tautology test with one that builds a signed POST multipart request whose policy conditions cannot be satisfied, calls PostPolicyBucketHandler directly, and asserts HTTP 403 with no Location redirect header. Addresses gemini-code-assist review on PR #9122. * fix(s3): surface POST policy failure reason in AccessDenied response Add s3err.WriteErrorResponseWithMessage so a caller can keep the standard error code mapping while providing a specific Message. Use it from PostPolicyBucketHandler so the XML body carries the CheckPostPolicy error (e.g. which condition failed or that the policy expired) rather than the generic "Access Denied." description. Addresses gemini-code- assist review on PR #9122. * refactor(s3err): delegate WriteErrorResponse to WriteErrorResponseWithMessage The two helpers shared every line except the Message override. Fold WriteErrorResponse into a one-line delegation that passes an empty message, so the request-id/mux/apiError logic lives in exactly one place. Addresses gemini-code-assist review on PR #9122.	2026-04-17 14:54:58 -07:00
Chris Lu	cce98fcecf	fix(s3): strip client-supplied X-SeaweedFS-Principal/Session-Token in AuthSignatureOnly (#9120 ) * fix(s3): strip client-supplied X-SeaweedFS-Principal/Session-Token in AuthSignatureOnly AuthSignatureOnly is the only auth gate in front of S3Tables routes (incl. CreateTableBucket) and UnifiedPostHandler, but unlike authenticateRequestInternal it did not clear the internal IAM trust headers before running signature verification. S3Tables authorizeIAMAction reads X-SeaweedFS-Principal directly from the request and prefers it over the authenticated identity's PrincipalArn, so a signed low-privilege caller could append that header after signing (unsigned header, SigV4 still verifies) and have IAM policy evaluated against a spoofed principal, bypassing authorization. Clear both X-SeaweedFS-Principal and X-SeaweedFS-Session-Token at the top of AuthSignatureOnly, mirroring the existing guard in authenticateRequestInternal. Add a regression test covering the header-injection path. * refactor(s3): route AuthSignatureOnly through authenticateRequestInternal Addresses review feedback: both entry points were independently maintaining the internal-IAM-header stripping and the auth-type dispatch switch. Collapse AuthSignatureOnly into a thin wrapper around authenticateRequestInternal so the security-critical header scrub and the signature-verify switch live in one place. Post-auth behavior unique to AuthSignatureOnly (AmzAccountId header) stays inline. No functional change beyond two harmless telemetry tweaks that now match authenticateRequestInternal: the per-branch glog verbosity shifts from V(3) to V(4), and the anonymous-found path now sets AmzAuthType. * refactor(s3): centralize X-SeaweedFS-Principal/Session-Token header names Introduce SeaweedFSPrincipalHeader and SeaweedFSSessionTokenHeader in weed/s3api/s3_constants so the trust-header literals are defined once and referenced consistently by the auth scrub, JWT auth path, bucket policy principal resolution, IAM authorization, and S3Tables IAM evaluation. Replace every remaining usage in weed/s3api and weed/s3api/s3tables. This removes the drift risk the reviewer called out: adding another call site with a typo can no longer silently bypass the scrub. Pure rename, no behavior change. No-op integration-test helper in test/s3/iam/s3_iam_framework.go left untouched (separate module, and the server now strips the client-supplied value regardless).	2026-04-17 12:23:21 -07:00
Chris Lu	88ac2d0431	security(s3api): reject unsigned x-amz-* headers in SigV4 requests (#9121 ) A presigned URL holder could attach arbitrary x-amz-* headers to a PUT request (e.g. x-amz-tagging, x-amz-acl, x-amz-storage-class, x-amz-server-side-encryption, x-amz-object-lock-, x-amz-meta-, x-amz-website-redirect-location, x-amz-grant-). Because only the headers declared in SignedHeaders participate in signature verification, the added headers bypass authentication; the PUT handler then persists them into the object's Extended metadata. Match the AWS SigV4 rule: every x-amz-* header present in the request must appear in SignedHeaders. Exempt x-amz-content-sha256 (already tamper-protected via the canonical request's payload-hash line) and, for presigned URLs, the SigV4 protocol parameters that live in the query string (X-Amz-Algorithm/Credential/Date/Expires/SignedHeaders/ Signature) in case they are duplicated as headers. Applies to both header-based and presigned SigV4; non-amz headers are unaffected.	2026-04-17 12:20:28 -07:00
Chris Lu	9554e259dd	fix(iceberg): route catalog clients to the right bucket and vend S3 endpoint (#9109 ) * fix(iceberg): route catalog clients to the right bucket and vend S3 endpoint DuckDB ATTACH 's3://<bucket>/' AS cat (TYPE 'ICEBERG', ...) was failing with "schema does not exist" because GET /v1/config ignored the warehouse query parameter and returned no overrides.prefix, so subsequent requests fell through to the hard-coded "warehouse" default bucket instead of the one the client attached. LoadTable also returned an empty config, forcing clients to discover the S3 endpoint out-of-band and producing 403s on direct iceberg_scan calls. - handleConfig now echoes overrides.prefix = bucket and defaults.warehouse when ?warehouse=s3://<bucket>/ is supplied. - getBucketFromPrefix honors a warehouse query parameter as a fallback for clients that skip the /v1/config handshake. - LoadTable responses advertise s3.endpoint and s3.path-style-access so clients can reach data files without separate configuration. Refs #9103 * address review feedback on iceberg S3 endpoint vending - deriveS3AdvertisedEndpoint is now a method on S3Options; honors externalUrl / S3_EXTERNAL_URL, switches to https when -s3.key.file is set, uses the https port when configured, and brackets IPv6 literals via util.JoinHostPort. - handleCreateTable returns s.buildFileIOConfig() in both its staged and final LoadTableResult branches so create and load flows see the same FileIO hints. - Add unit test coverage for the endpoint derivation scenarios. * address CI and review feedback for #9109 - DuckDB integration test now runs under its own newOAuthTestEnv (with a valid IAM config) so the OAuth2 client_credentials flow DuckDB requires actually works; the shared env has no registered credentials, which was the cause of the CI failure. Helper createTableWithToken was added to create tables via Bearer auth. - Tighten TestIssue9103_LoadTableDoesNotVendS3FileIOCredentials to also assert s3.path-style-access = "true", so a partial regression where the endpoint is vended but path-style is dropped still fails. - deriveS3AdvertisedEndpoint now logs a startup hint when it infers the host from os.Hostname because the bind IP is a wildcard, pointing operators at -s3.externalUrl / S3_EXTERNAL_URL for reverse-proxy deployments where the inferred name is not externally reachable. - handleConfig has a comment explaining that any sub-path in the warehouse URL is dropped because catalog routing is bucket-scoped. * fix(iceberg): make advertised S3 endpoint strictly opt-in; add region The wildcard-bind fallback to os.Hostname() in deriveS3AdvertisedEndpoint was hijacking correctly-configured clients: on the CI runner it produced http://runnervmrc6n4:<port>, which Spark (running in Docker) could not resolve, so Spark iceberg tests began failing after the endpoint started being vended in LoadTable responses. Change the rule so advertising is opt-in and never guesses a host that might not be routable: - -s3.externalUrl / S3_EXTERNAL_URL wins (covers reverse-proxy). - Otherwise, only an explicit, non-wildcard -s3.bindIp is used. - Wildcard / empty bind returns "" so no FileIO endpoint is vended and existing clients keep using their own configuration. buildFileIOConfig additionally vends s3.region (defaulting to the same value baked into table bucket ARNs) whenever it vends an endpoint, so DuckDB's attach does not fail with "No region was provided via the vended credentials" when the operator has opted in. The DuckDB issue-9103 integration test runs under an env with a wildcard bind, so it explicitly sets AWS_REGION in the docker run to pick up the same default. The HTTP-level LoadTable-vending test was dropped because its expectation is now conditional and already covered by unit tests in iceberg_issue_9103_test.go.	2026-04-16 15:51:43 -07:00
Chris Lu	00a2e22478	fix(mount): remove fid pool to stop master over-allocating volumes (#9111 ) * fix(mount): remove fid pool to stop master over-allocating volumes The writeback-cache fid pool pre-allocated file IDs with ExpectedDataSize = ChunkSizeLimit (typically 8+ MB). The master's PickForWrite charges count * expectedDataSize against the volume's effectiveSize, so a full pool refill could charge hundreds of MB against a single volume before any bytes were actually written. That tripped RecordAssign's hard-limit path and eagerly removed volumes from writable, causing the master to grow new volumes even when the real data being written was tiny. Drop the pool entirely. Every chunk upload goes through UploadWithRetry -> AssignVolume with no ExpectedDataSize hint, letting the master fall back to the 1 MB default estimate. The mount->filer grpc connection is already cached in pb.WithGrpcClient (non-streaming mode), so per-chunk AssignVolume is a unary RPC over an existing HTTP/2 stream, not a full dial. Path-based filer.conf storage rules now apply to mount chunk assigns again, which the pool had to skip. Also remove the now-unused operation.UploadWithAssignFunc and its AssignFunc type. * fix(upload): populate ExpectedDataSize from actual chunk bytes UploadWithRetry already buffers the full chunk into `data` before calling AssignVolume, so the real size is known. Previously the assign request went out with ExpectedDataSize=0, making the master fall back to the 1 MB DefaultNeedleSizeEstimate per fid — same over-reservation symptom the pool had, just smaller per call. Stamp ExpectedDataSize = len(data) before the assign RPC when the caller hasn't already set it. This covers mount chunk uploads, filer_copy, filersink, mq/logstore, broker_write, gateway_upload, and nfs — all the UploadWithRetry paths. * fix(assign): pass real ExpectedDataSize at every assign call site After removing the mount fid pool, per-chunk AssignVolume calls went out with ExpectedDataSize=0, making the master fall back to its 1 MB DefaultNeedleSizeEstimate. That's still an over-estimate for small writes. Thread the real payload size through every remaining assign site so RecordAssign charges effectiveSize accurately and stops prematurely marking volumes full. - filer: assignNewFileInfo now takes expectedDataSize and stamps it on both primary and alternate VolumeAssignRequests. Callers pass: - SSE data-to-chunk: len(data) - copy manifest save: len(data) - streamCopyChunk: srcChunk.Size - TUS sub-chunk: bytes read - saveAsChunk (autochunk/manifestize): 0 (small, size unknown until the reader is drained; master uses 1 MB default) - filer gRPC remote fetch-and-write: ExpectedDataSize = chunkSize after the adaptive chunkSize is computed. - ChunkedUploadOption.AssignFunc gains an expectedDataSize parameter; upload_chunked.go passes the buffered dataSize at the call site. S3 PUT assignFunc stamps it on the AssignVolumeRequest. - S3 copy: assignNewVolume / prepareChunkCopy take expectedDataSize; all seven call sites pass the source chunk's Size. - operation.SubmitFiles / FilePart.Upload: derive per-fid size from FileSize (average for batched requests, real per-chunk size for sequential chunk assigns). - benchmark: pass fileSize. - filer append-to-file: pass len(data). * fix(assign): thread size through SaveDataAsChunkFunctionType The saveAsChunk path (autochunk, filer_copy, webdav, mount) ran AssignVolume before the reader was drained, so it had to pass ExpectedDataSize=0 and fall back to the master's 1 MB default. Add an expectedDataSize parameter to SaveDataAsChunkFunctionType. - mergeIntoManifest already has the serialized manifest bytes, so it passes uint64(len(data)) directly. - Mount's saveDataAsChunk ignores the parameter because it uses UploadWithRetry, which already stamps len(data) on the assign after reading the payload. - webdav and filer_copy saveDataAsChunk follow the same UploadWithRetry path and also ignore the hint. - Filer's saveAsChunk (used for manifestize) plumbs the value to assignNewFileInfo so manifest-chunk assigns get a real size. Callers of saveFunc-as-value (weedfs_file_sync, dirty_pages_chunked) pass the chunk size they're about to upload.	2026-04-16 15:51:13 -07:00
Chris Lu	d0a09ea178	fix(s3): honor ChecksumAlgorithm on presigned URL uploads (#9076 ) * fix(s3): honor ChecksumAlgorithm on presigned URL uploads AWS SDK presigners hoist x-amz-sdk-checksum-algorithm (and related checksum headers) into the signed URL's query string, so servers must read either location. detectRequestedChecksumAlgorithm only looked at request headers, so presigned PUTs with ChecksumAlgorithm set validated and stored no additional checksum, and HEAD/GET never returned the x-amz-checksum-* header. Read these parameters from headers first, then fall back to a case-insensitive query-string lookup. Apply the same fallback when comparing an object-level checksum value against the computed one. Fixes #9075 * test(s3): presigned URL checksum integration tests (#9075) Adds test/s3/checksum with end-to-end coverage for flexible-checksum behavior on presigned URL uploads. Tests generate a presigned PUT URL with ChecksumAlgorithm set, upload the body with a plain http.Client (bypassing AWS SDK middleware so the server must honor the query-string hoisted x-amz-sdk-checksum-algorithm), then HEAD/GET with ChecksumMode=ENABLED and assert the stored x-amz-checksum-* header. Covers SHA256, SHA1, and a negative control with no checksum requested. Wires the new directory into s3-go-tests.yml as its own CI job. * perf(s3): parse presigned query once in detectRequestedChecksumAlgorithm Previously, each header fallback called getHeaderOrQuery, which re-parsed r.URL.Query() and allocated a new map on every invocation — up to eight times per PutObject request. Parse the raw query at most once per request (only when non-empty) and pass the pre-parsed url.Values into a new lookupHeaderOrQuery helper. Also drops a redundant strings.ToLower allocation in the case-insensitive query key scan (strings.EqualFold already handles ASCII case folding). Addresses review feedback from gemini-code-assist on PR #9076. * test(s3): honor credential env vars and add presigned upload timeout - init() now reads S3_ACCESS_KEY/S3_SECRET_KEY (and AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_REGION fallbacks) so that `make test-with-server ACCESS_KEY=... SECRET_KEY=...` no longer authenticates with hardcoded defaults while the server has been started with different credentials. - uploadViaPresignedURL uses a dedicated http.Client with a 30s timeout instead of http.DefaultClient, so a stalled server fails fast in CI instead of blocking until the suite's global timeout fires. Addresses review feedback from coderabbitai on PR #9076. * test(s3): pass S3_PORT and credentials through to checksum tests - 'make test' now exports S3_ENDPOINT, S3_ACCESS_KEY, and S3_SECRET_KEY derived from the Makefile variables so the Go test process talks to the same endpoint/credentials that start-server was launched with. - start-server cleans up the background SeaweedFS process and PID file when the readiness poll times out, preventing stale port conflicts on subsequent runs. Addresses review feedback from coderabbitai on PR #9076. * ci(s3): raise checksum tests step timeout make test-with-server builds weed_binary, waits up to 90s for readiness, then runs go test -timeout=10m. The previous 12-minute step timeout only had ~2 minutes of headroom over the Go timeout, risking the Actions runner killing the step before tests reported a real failure. Bumps the job timeout from 15 to 20 minutes and the step timeout from 12 to 16 minutes, matching other S3 integration jobs. Addresses review feedback from coderabbitai on PR #9076. * perf(s3): thread pre-parsed query through putToFiler hot path Parse the request's query string once at the top of putToFiler and reuse the resulting url.Values for both the checksum-algorithm detection and the expected-checksum verification. Previously, the verification path called getHeaderOrQuery which re-parsed r.URL.Query() again on every PutObject, defeating the previous commit's single-parse goal. - Add parseRequestQuery + detectRequestedChecksumAlgorithmQ (the pre-parsed-query variant). detectRequestedChecksumAlgorithm is now a thin wrapper used by callers that do a single lookup. - putToFiler parses once and threads the result through both call sites. - Remove getHeaderOrQuery and update the unit test to use lookupHeaderOrQuery directly. Addresses follow-up review from gemini-code-assist on PR #9076. * test(s3): check io.ReadAll error in uploadViaPresignedURL helper * test(s3): drop SHA1 presigned test case The AWS SDK v2 presigner signs a Content-MD5 header at presign time for SHA1 PutObject requests even when no body is attached (the MD5 of the empty payload gets baked into the signed headers). Uploading the real body via a plain http.Client then trips SeaweedFS's MD5 validation and returns BadDigest — an SDK/presigner quirk, not a SeaweedFS bug. The SHA256 positive case already exercises the server-side query-hoisted algorithm path that issue #9075 is about, and the unit tests in weed/s3api cover each algorithm's header mapping. Drop the SHA1 integration case rather than chase SDK-specific workarounds. * test(s3): provide real Content-MD5 to presigned checksum test AWS SDK v2's flexible-checksum middleware signs a Content-MD5 header at presign time. There is no body to hash at that point, so it seeds the header with MD5 of the empty payload. When the real body is then PUT with a plain http.Client, SeaweedFS's server-side Content-MD5 verification correctly rejects the upload with BadDigest. Pre-compute the MD5 of the test body and thread it into PutObjectInput.ContentMD5 so the signed Content-MD5 matches the body that will actually be uploaded. The test still exercises the server-side path that reads X-Amz-Sdk-Checksum-Algorithm from the query string (the fix that PR #9076 is validating). * test(s3): send the signed Content-MD5 header on presigned upload uploadViaPresignedURL now accepts an extraHeaders map so callers can thread through headers that the presigner signed but the raw http request would otherwise omit. The SHA256 test passes the Content-MD5 it computed, matching what the presigner baked into the signature. Fixes SignatureDoesNotMatch seen in CI after the previous commit set ContentMD5 on the presign input without sending the corresponding header on the actual upload. * test(s3): build presigned URL with the raw v4 signer The AWS SDK v2 s3.PresignClient runs the flexible-checksum middleware for any PutObject input that carries ChecksumAlgorithm. That middleware injects a Content-MD5 header at presign time, and with no body present it seeds MD5-of-empty. Any subsequent upload of a non-empty body through a plain http.Client then trips SeaweedFS's Content-MD5 verification and returns BadDigest — not the code path that issue #9075 is about. Replace the PresignClient usage in the integration test with a direct call to v4.Signer.PresignHTTP, building a canonical URL whose query string already contains x-amz-sdk-checksum-algorithm=SHA256. This is exactly the shape of URL a browser/curl client would receive from any presigner that hoists the algorithm header, and it exercises the server-side fix from PR #9076 without dragging in SDK-specific middleware quirks. * test(s3): set X-Amz-Expires on presigned URL before signing v4.Signer.PresignHTTP does not add X-Amz-Expires on its own — the caller has to seed it into the request's query string so the signer includes it in the canonical query and the server accepts the presigned URL. Without it, SeaweedFS correctly returns AuthorizationQueryParametersError. Also adds a .gitignore for the make-managed test volume data, log file, and PID file so local `make test-with-server` runs do not leave artifacts tracked by git. Verified by running the integration tests locally: make test-with-server → both presigned checksum tests PASS.	2026-04-14 21:52:49 -07:00
Chris Lu	2818251dd5	fix(iceberg): clean stale data before creating a table (#9074 ) (#9077 ) * fix(iceberg): clean stale data before creating a table CREATE TABLE AS from Trino fails against the SeaweedFS Iceberg REST catalog with "Cannot create a table on a non-empty location". The catalog assigns every new table the deterministic <ns>/<table> path, and Trino's pre-write check rejects the CTAS whenever leftover objects live there — typically files from a prior DROP that did not purge the data, or an earlier aborted CTAS. Make the catalog the authority for table existence: before writing any metadata, look up the table in the S3 Tables catalog. If it is already registered, return the existing definition (idempotent create). If it is not registered, any objects still sitting at the target location are stale, so purge them before proceeding. Live tables are never touched — the cleanup path is guarded by the catalog lookup. Fixes #9074 * test(trino): regression for create/drop/recreate table (#9074) Exercises the exact sequence from the reported bug: CREATE TABLE without an explicit location, INSERT, DROP, then CREATE again with the same name, followed by a CTAS on top. Previously the recreate failed with "Cannot create a table on a non-empty location" because stale data files from the dropped table lingered at the deterministic <schema>/<table> path. The test also asserts the recreated table does not see the dropped data and that a drop/recreate CTAS cycle works. * fix(iceberg): purge dropped table location on DROP TABLE Recreate-at-same-location was failing with "Cannot create a table on a non-empty location" because DROP TABLE only removed the catalog entry and left data files behind. Trino's pre-write emptiness check then rejected the subsequent CREATE. Look up the table's storage location before deleting the catalog entry, and after a successful DeleteTable, purge the filer subtree at that location. The lookup uses the catalog — the authoritative owner of the name→location mapping — so cleanup is gated on a live table having existed; failed lookups skip cleanup and leave storage alone. Also pin the regression test to an explicit, fixed location so the recreate genuinely targets the same path the drop was supposed to free. * refactor(iceberg): use errors.As in isNoSuchTableError * fix(iceberg): drop create-preflight cleanup; harden cleanup path - Remove the destructive cleanupStaleTableLocation call from the CreateTable preflight. Storage cleanup now lives exclusively on the DROP path, so CreateTable has no side effects on storage beyond the new table's own metadata write. - Validate tablePath in cleanupStaleTableLocation by rejecting empty, ".", "..", or backslash-bearing segments before joining with the bucket prefix, so a crafted location cannot escape the bucket subtree. path.Clean would silently collapse traversal, so segments are checked raw. * test(trino): defer table drops in recreate test for failure-safe cleanup	2026-04-14 20:34:53 -07:00
Chris Lu	40c1797f8e	fix(s3): allow anonymous ListBuckets with prefix-scoped List action (#9073 ) * fix(s3): allow anonymous ListBuckets with prefix-scoped List action An anonymous identity holding a prefix-scoped action such as "List:prefix-" was denied at the auth middleware before ListBucketsHandler could apply the per-bucket visibility check. The middleware called CanDo with an empty bucket, which never matches a scoped action, so every anonymous ListBuckets request returned 403 even though matching buckets should have been visible. Defer ListBuckets authorization to the handler for the anonymous identity when it actually carries a List action, mirroring the existing behavior for authenticated users. Anonymous identities with no List action continue to be rejected at the global layer, preserving the secure-by-default posture. Fixes #9072 refactor(s3): make hasListAction a method on Identity Addresses PR review — consistent with existing CanDo/isAdmin methods and also treats Admin identities as implicitly having List permission.	2026-04-14 10:52:00 -07:00
Chris Lu	eaf561e86c	perf(s3): add optional shared in-memory chunk cache for GET (#9069 ) Adds the -s3.cacheCapacityMB flag (default 0, disabled) that attaches an in-memory chunk_cache.ChunkCacheInMemory to the server-wide ReaderCache introduced in the previous commit. When enabled, completed chunks are deposited into the shared cache as they are downloaded, so concurrent and repeat GETs of the same object hit memory instead of re-fetching chunks from volume servers. When 0 (the default) the shared ReaderCache still runs — it just attaches a nil chunk cache, so behaviour matches the previous commit exactly. No behaviour change for clusters that don't opt in. Disk-backed TieredChunkCache was evaluated and rejected: its synchronous SetChunk writes regressed cold reads ~12x on loopback because the chunk fetchers block on local disk I/O that is slower than the TCP volume-server fetch it is supposed to accelerate. Memory-only avoids that. Flag registered in all four S3 flag sites (s3.go, server.go, filer.go, mini.go) per the comment on command.S3Options. The chunk size used to convert CacheSizeMB → entry count is encapsulated in the s3ChunkCacheChunkSizeMB constant so it's easy to grep and revisit if the filer default chunk size changes. Measured on weed mini + 1 GiB random object over loopback, single curl on a presigned URL: cacheCapacityMB=0 (off): cold ~2900, warm ~2900 MB/s cacheCapacityMB=4096: cold ~2790, warm ~5050 MB/s (+70%)	2026-04-14 09:24:35 -07:00
Chris Lu	228ed25a01	perf(s3): route GET through ChunkReadAt + per-request ReaderCache (#9068 ) perf(s3): route GET through ChunkReadAt + shared ReaderCache The S3 GET path previously used filer.PrepareStreamContentWithPrefetch, which hands chunk bytes from the volume-server fetch goroutine to the consumer through an io.Pipe. io.Pipe is a synchronous rendezvous, so the prefetch=4 window only overlapped HTTP connection setup — the actual data bytes still flowed one pipe at a time. Switch to the same path WebDAV uses (server/webdav_server.go): build a filer.ChunkReadAt backed by a server-wide filer.ReaderCache. ReaderCache prefetches whole chunks into []byte buffers, so the prefetch window translates into real in-flight bytes and the consumer copies them out as memcpys. The ReaderCache is server-wide (not per-request) for two reasons: 1. ChunkReadAt.Close() destroys the ReaderCache's downloader map. With a per-request cache, the defer on the handler would wait for background chunk downloads that run on context.Background() — so a client disconnect would block handler cleanup on downloads that the client no longer wants, tying up goroutines and memory. 2. Concurrent requests for the same object can share in-flight downloads through the shared downloader map. No persistent ChunkCache is added in this commit — the ReaderCache is constructed with a nil *chunk_cache.TieredChunkCache (all its methods are nil-receiver safe). A follow-up PR wires in an in-memory chunk cache for cross-request warm hits. JWT for volume-server requests is generated internally by util_http.RetriedFetchChunkData from jwtSigningReadKey, so the new path remains compatible with JWT-protected clusters — this is the same mechanism the WebDAV and mount read paths have been using. Measured on weed mini + 1 GiB random object over loopback, cold cache, single-stream curl on a presigned URL: before (io.Pipe): 2100-2200 MB/s after (ChunkReadAt): 2900-3800 MB/s	2026-04-14 07:46:05 -07:00
Chris Lu	7aaa431bb4	s3api: prune bucket-scoped IAM actions on DeleteBucket (#9054 ) * s3api: prune bucket-scoped IAM actions on DeleteBucket DeleteBucket removed the bucket directory and collection but left behind any identity actions configured via s3.configure that were scoped to that bucket (e.g. Read:bucket, Write:bucket/prefix), leaving stale auth metadata that users expected to be cleaned up along with the bucket. After a successful delete, strip actions whose resource is exactly the bucket or a prefix under it, save via the credential manager, and let the existing filer metadata subscription fan the reload out to every S3 server. Wildcarded resources and global actions are preserved since they may cover other buckets; static identities are left untouched. Fixes #5310 * s3api: address review feedback on bucket IAM prune - Apply per-identity updates via credentialManager.UpdateUser instead of a full LoadConfiguration/SaveConfiguration round-trip, so the prune no longer clobbers concurrent IAM edits made by s3.configure or the IAM API during a DeleteBucket. - Use a 30s bounded background context for the post-delete cleanup so it survives client disconnect — the bucket is already gone by then and this is best-effort bookkeeping. - Skip static identities via IsStaticIdentity, since the credential store never persists them and UpdateUser would return NotFound.	2026-04-13 12:13:38 -07:00
Chris Lu	c390448906	fix(s3): preserve exact policy document in embedded IAM put/get-user-policy (#9025 ) * fix(s3): preserve exact policy document in embedded IAM PutUserPolicy/GetUserPolicy (#9008) The embedded IAM implementation (used when IAM requests go through the S3 gateway) discarded the original policy document on PutUserPolicy, storing only the lossy ident.Actions representation. GetUserPolicy then reconstructed the document from these coarse-grained actions, producing wildcard-expanded actions (s3:GetObject → s3:Get), duplicates, and collapsed resources (array → single string). PR #9009 fixed this in the standalone IAM server (weed/iamapi/) but the embedded IAM (weed/s3api/) — which is the code path most users hit — had the same bugs. Changes: - Add InlinePolicyStore optional interface to credential store, with implementations for FilerEtcStore (uses existing PoliciesCollection), MemoryStore, and PropagatingCredentialStore. - Embedded IAM PutUserPolicy now persists the original policy document via CredentialManager.PutUserInlinePolicy for lossless round-trips. - Embedded IAM GetUserPolicy first tries the stored inline policy; only falls back to lossy reconstruction from ident.Actions when no stored document exists (e.g. policies created before this fix). - Fix the fallback reconstruction: add action deduplication and preserve resource paths verbatim (no more spurious / appending). - Update DeleteUserPolicy/ListUserPolicies to use stored inline policies. * fix(s3): address PR review feedback for embedded IAM inline policies - Validate PolicyName is non-empty in PutUserPolicy and DeleteUserPolicy - Add recomputeActions() to aggregate ident.Actions from ALL stored inline policies on put/delete, fixing the issue where a second PutUserPolicy would overwrite the first policy's enforcement - Log errors from GetUserInlinePolicy in the GetUserPolicy fallback instead of silently ignoring them - Add initialization guards to MemoryStore GetUserInlinePolicy and ListUserInlinePolicies for consistency with other read methods * fix(s3): make inline policy persistence fatal and propagate recompute errors Address second round of review feedback: - recomputeActions() now returns ([]string, error) so callers can distinguish store failures from "no stored policies" and abort the mutation on transient errors instead of silently falling back. - PutUserInlinePolicy and DeleteUserInlinePolicy failures are now fatal: the API call returns ServiceFailure instead of logging and continuing, keeping ident.Actions and stored policy state in sync. * chore: gofmt weed/s3api/iceberg/handlers_oauth.go Pre-existing formatting issue from #9017; fixes S3 Tables Format Check CI.	2026-04-10 18:09:22 -07:00
Chris Lu	e648c76bcf	go fmt	2026-04-10 17:31:14 -07:00
Chris Lu	2b8c16160f	feat(iceberg): add OAuth2 token endpoint for DuckDB compatibility (#9017 ) * feat(iceberg): add OAuth2 token endpoint for DuckDB compatibility (#9015) DuckDB's Iceberg connector uses OAuth2 client_credentials flow, hitting POST /v1/oauth/tokens which was not implemented, returning 404. Add the OAuth2 token endpoint that accepts S3 access key / secret key as client_id / client_secret, validates them against IAM, and returns a signed JWT bearer token. The Auth middleware now accepts Bearer tokens in addition to S3 signature auth. * fix(test): use weed shell for table bucket creation with IAM enabled The S3 Tables REST API requires SigV4 auth when IAM is configured. Use weed shell (which bypasses S3 auth) to create table buckets, matching the pattern used by the Trino integration tests. * address review feedback: access key in JWT, full identity in Bearer auth - Include AccessKey in JWT claims so token verification uses the exact credential that signed the token (no ambiguity with multi-key identities) - Return full Identity object from Bearer auth so downstream IAM/policy code sees an authenticated request, not anonymous - Replace GetSecretKeyForIdentity with GetCredentialByAccessKey for unambiguous credential lookup - DuckDB test now tries the full SQL script first (CREATE SECRET + catalog access), falling back to simple CREATE SECRET if needed - Tighten bearer auth test assertion to only accept 200/500 Addresses review comments from coderabbitai and gemini-code-assist. * security: use PostFormValue, bind signing key to access key, fix port conflict - Use r.PostFormValue instead of r.FormValue to prevent credentials from leaking via query string into logs and caches - Reject client_secret in URL query parameters explicitly - Include access key in HMAC signing key derivation to prevent cross-credential token forgery when secrets happen to match - Allocate dedicated webdav port in OAuth test env to avoid port collision with the shared TestMain cluster	2026-04-10 11:18:11 -07:00
Chris Lu	ba90ae5c94	fix(s3): don't count ErrNotFound as filer health failure in failover (#8995 ) * fix(s3): don't count ErrNotFound as filer health failure in failover The S3 gateway's filer client failover was recording ErrNotFound (entry doesn't exist) as a filer health failure. In multi-filer setups where filers have separate metadata stores, normal object lookups that return "not found" accumulated in the circuit breaker, eventually marking healthy filers as unhealthy after just 3 lookups. This caused the distributed lock integration test to fail with 500 InternalError: once a filer was circuit-broken, subsequent lookups could no longer fall back, turning a would-be 412 PreconditionFailed into an unrecoverable internal error. Only record actual transport/server failures in the health tracker. The failover still tries other filers for data locality, but no longer penalizes filers for correctly reporting missing entries. * style: inline isNotFound variable for consistency The variable was only used once; inlining it matches the pattern already used in the failover loop a few lines below.	2026-04-08 17:08:57 -07:00
Chris Lu	e21d7602c3	feat(iam): implement group inline policy actions (#8992 ) * feat(iam): implement group inline policy actions Add PutGroupPolicy, GetGroupPolicy, DeleteGroupPolicy, and ListGroupPolicies to both embedded and standalone IAM servers. The standalone IAM stores group inline policies in a new GroupInlinePolicies field in the Policies JSON, mirroring the existing user inline policy pattern. DeleteGroup now also checks for inline policies before allowing deletion. * fix: address review feedback for group inline policies - Embedded IAM: return NotImplemented for group inline policies instead of silently succeeding as no-ops (Gemini + CodeRabbit) - Standalone IAM: recompute member actions after PutGroupPolicy and DeleteGroupPolicy (Gemini) - Add parameter validation for GroupName/PolicyName/PolicyDocument on PutGroupPolicy, DeleteGroupPolicy, ListGroupPolicies (Gemini) - Add UserName validation for ListUserPolicies in standalone IAM - Call cleanupGroupInlinePolicies from DeleteGroup (Gemini) - Migrate GroupInlinePolicies on group rename in UpdateGroup (CodeRabbit) - Fix integration test cleanup order (CodeRabbit) * fix: persist recomputed actions and improve error handling - Set changed=true for PutGroupPolicy/DeleteGroupPolicy in standalone IAM DoActions so recomputed member actions are persisted (Gemini critical) - Make cleanupGroupInlinePolicies accept policies parameter to avoid redundant I/O, return error (Gemini) - Make migrateGroupInlinePolicies return error, handle in caller (Gemini) * fix: include group policies in action recomputation Extend computeAllActionsForUser to also aggregate group inline policies and group managed policies when s3cfg is provided. Previously, group inline policies were stored but never reflected in member Identity.Actions. (CodeRabbit critical) * perf: use identity index in recomputeActionsForGroupMembers for O(N+M) * fix: skip group inline policy integration test on embedded IAM The embedded IAM returns NotImplemented for group inline policies. Skip TestIAMGroupInlinePolicy when running against embedded mode to avoid CI failures in the group integration test matrix.	2026-04-08 15:57:04 -07:00
Chris Lu	45ee2ab4b9	feat(iam): implement ListUserPolicies API action (#8991 ) * feat(iam): implement ListUserPolicies API action (#8987) Add ListUserPolicies support to both embedded and standalone IAM servers, resolving the NotImplemented error when calling `aws iam list-user-policies`. * fix: address review feedback for ListUserPolicies - Add handleImplicitUsername for ListUserPolicies in both IAM servers so omitting UserName defaults to the calling user (Gemini review) - Assert synthetic policy name in unit test (CodeRabbit) - Use require.True for error type assertion in integration test (CodeRabbit)	2026-04-08 12:27:03 -07:00
Chris Lu	efc7f3936f	fix(s3): handle empty URL path in forwarded prefix signature verification (#8973 ) fix(s3): handle empty URL path in forwarded prefix signature verification (#8966) When S3 is behind a reverse proxy with a forwarded prefix (e.g. /s3), requests with an empty URL path (like ListBuckets) would incorrectly get a trailing slash appended (e.g. /s3/), causing signature verification to fail because the client signs /s3 without the slash.	2026-04-07 13:22:21 -07:00
Chris Lu	79a48256f5	fix(s3): populate s3:prefix from query param for ListObjects policy conditions (#8971 ) * fix(s3): populate s3:prefix from query param for ListObjects policy conditions (#8969) ListObjectsV2/V1 requests with prefix-restricted STS session policies were denied because: 1. s3:prefix was derived from objectKey, which the auth middleware set to the prefix value, but the resource ARN then included the prefix (e.g. arn:aws:s3:::bucket/prefix) instead of staying at bucket level (arn:aws:s3:::bucket) as AWS requires for ListBucket. 2. When objectKey was empty (no middleware propagation), s3:prefix was never populated from the query parameter at all. Now AuthorizeAction extracts the prefix query parameter directly, sets it as s3:prefix in the request context, and uses a bucket-level resource ARN when the objectKey matches the propagated prefix. * fix(s3): use AWS-style wildcard matching for StringLike policy conditions filepath.Match treats * as not matching /, which breaks IAM StringLike conditions on paths (e.g. arn:aws:s3:::bucket/* won't match nested keys). Replace with a case-sensitive variant of AwsWildcardMatch that correctly treats * as matching any character including /. * refactor(s3): replace regex wildcard matching with string-based matcher Use the existing wildcard.MatchesWildcard utility instead of compiling and caching regexes for IAM wildcard matching. Removes the regexCache, its mutex, and the sync import. * refactor(s3): inline and remove AwsWildcardMatch wrapper functions Replace all call sites with direct wildcard.MatchesWildcard calls. * fix(s3): scope s3:prefix condition key to list operations only The s3:prefix logic was running for all actions, so a GetObject on "foo/bar" would wrongly populate s3:prefix. Restrict it to action "List" and always reset resourceObjectKey to "" so the resource ARN stays at bucket level. Also set s3:prefix to "" when no prefix is provided, so policies with StringEquals {"s3:prefix": ""} evaluate correctly.	2026-04-07 13:21:30 -07:00
Chris Lu	761ec7da00	fix(iceberg): use dot separator for namespace paths instead of unit separator (#8960 ) * fix(iceberg): use dot separator for namespace paths instead of unit separator The Iceberg REST Catalog handler was using \x1F (unit separator) to join multi-level namespaces when constructing S3 location and filer paths. The S3 Tables storage layer uses "." (dot) as the namespace separator, causing tables created via the Iceberg REST API to point to different paths than where S3 Tables actually stores them. Fixes #8959 * fix(iceberg): use dot separator in log messages for readable namespace output * fix(iceberg): use path.Join for S3 location path segments Use path.Join to construct the namespace/table path segments in fallback S3 locations for robustness and consistency with handleCreateTable. * test(iceberg): add multi-level namespace integration tests for Spark and Trino Add regression tests for #8959 that create a two-level namespace (e.g. "analytics.daily"), create a table under it, insert data, and query it back. This exercises the dot-separated namespace path construction and verifies that Spark/Trino can actually read the data at the S3 location returned by the Iceberg REST API. * fix(test): enable nested namespace in Trino Iceberg catalog config Trino requires `iceberg.rest-catalog.nested-namespace-enabled=true` to support multi-level namespaces. Without this, CREATE SCHEMA with a dotted name fails with "Nested namespace is not enabled for this catalog". * fix(test): parse Trino COUNT() output as integer instead of substring match Avoids false matches from strings.Contains(output, "3") by parsing the actual numeric result with strconv.Atoi and asserting equality. fix(test): use separate Trino config for nested namespace test The nested-namespace-enabled=true setting in Trino changes how SHOW SCHEMAS works, causing "Internal error" for all tests sharing that catalog config. Move the flag to a dedicated config used only by TestTrinoMultiLevelNamespace. * fix(iceberg): support parent query parameter in ListNamespaces for nested namespaces Add handling for the Iceberg REST spec's `parent` query parameter in handleListNamespaces. When Trino has nested-namespace-enabled=true, it sends `GET /v1/namespaces?parent=<ns>` to list child namespaces. The parent value is decoded from the Iceberg unit separator format and converted to a dot-separated prefix for the S3 Tables layer. Also simplify TestTrinoMultiLevelNamespace to focus on namespace operations (create, list, show tables) rather than data operations, since Trino's REST catalog has a non-empty location check that conflicts with server-side metadata creation. * fix(test): expand Trino multi-level namespace test and merge config helpers - Expand TestTrinoMultiLevelNamespace to create a table with explicit location, insert rows, query them back, and verify the S3 file path contains the dot-separated namespace (not \x1F). This ensures the original #8959 bug would be caught by the Trino integration test. - Merge writeTrinoConfig and writeTrinoNestedNamespaceConfig into a single parameterized function using functional options.	2026-04-07 12:21:22 -07:00
Chris Lu	733517df30	fix(s3): s3:PutObject bucket policy now implicitly allows multipart uploads (#8968 ) * fix(s3): s3:PutObject bucket policy now implicitly allows multipart uploads The PolicyEngine.evaluateStatement() method used raw regex matching for actions, bypassing the multipart-inherits-PutObject logic that only existed in the unused CompiledStatement.MatchesAction() code path. When a bucket policy granted only s3:PutObject, multipart upload operations (CreateMultipartUpload, UploadPart, CompleteMultipartUpload, etc.) were denied, forcing users to explicitly list every multipart action. Fixes https://github.com/seaweedfs/seaweedfs/discussions/8751 * fix(s3): add s3:UploadPartCopy to multipartActionSet and improve test coverage Add missing S3_ACTION_UPLOAD_PART_COPY constant and include it in multipartActionSet so UploadPartCopy is implicitly allowed by s3:PutObject. Also add a bucket-ARN sub-test for ListBucketMultipartUploads to verify that an object-only resource pattern does not match bucket-level requests.	2026-04-07 11:13:29 -07:00
Chris Lu	d1823d3784	fix(s3): include static identities in listing operations (#8903 ) * fix(s3): include static identities in listing operations Static identities loaded from -s3.config file were only stored in the S3 API server's in-memory state. Listing operations (s3.configure shell command, aws iam list-users) queried the credential manager which only returned dynamic identities from the backend store. Register static identities with the credential manager after loading so they are included in LoadConfiguration and ListUsers results, and filtered out before SaveConfiguration to avoid persisting them to the dynamic store. Fixes https://github.com/seaweedfs/seaweedfs/discussions/8896 * fix: avoid mutating caller's config and defensive copies - SaveConfiguration: use shallow struct copy instead of mutating the caller's config.Identities field - SetStaticIdentities: skip nil entries to avoid panics - GetStaticIdentities: defensively copy PolicyNames slice to avoid aliasing the original * fix: filter nil static identities and sync on config reload - SetStaticIdentities: filter nil entries from the stored slice (not just from staticNames) to prevent panics in LoadConfiguration/ListUsers - Extract updateCredentialManagerStaticIdentities helper and call it from both startup and the grace.OnReload handler so the credential manager's static snapshot stays current after config file reloads * fix: add mutex for static identity fields and fix ListUsers for store callers - Add sync.RWMutex to protect staticIdentities/staticNames against concurrent reads during config reload - Revert CredentialManager.ListUsers to return only store users, since internal callers (e.g. DeletePolicy) look up each user in the store and fail on non-existent static entries - Merge static usernames in the filer gRPC ListUsers handler instead, via the new GetStaticUsernames method - Fix CI: TestIAMPolicyManagement/managed_policy_crud_lifecycle was failing because DeletePolicy iterated static users that don't exist in the store * fix: show static identities in admin UI and weed shell The admin UI and weed shell s3.configure command query the filer's credential manager via gRPC, which is a separate instance from the S3 server's credential manager. Static identities were only registered on the S3 server's credential manager, so they never appeared in the filer's responses. - Add CredentialManager.LoadS3ConfigFile to parse a static S3 config file and register its identities - Add FilerOptions.s3ConfigFile so the filer can load the same static config that the S3 server uses - Wire s3ConfigFile through in weed mini and weed server modes - Merge static usernames in filer gRPC ListUsers handler - Add CredentialManager.GetStaticUsernames helper - Add sync.RWMutex to protect concurrent access to static identity fields - Avoid importing weed/filer from weed/credential (which pulled in filer store init() registrations and broke test isolation) - Add docker/compose/s3_static_users_example.json * fix(admin): make static users read-only in admin UI Static users loaded from the -s3.config file should not be editable or deletable through the admin UI since they are managed via the config file. - Add IsStatic field to ObjectStoreUser, set from credential manager - Hide edit, delete, and access key buttons for static users in the users table template - Show a "static" badge next to static user names - Return 403 Forbidden from UpdateUser and DeleteUser API handlers when the target user is a static identity * fix(admin): show details for static users GetObjectStoreUserDetails called credentialManager.GetUser which only queries the dynamic store. For static users this returned ErrUserNotFound. Fall back to GetStaticIdentity when the store lookup fails. * fix(admin): load static S3 identities in admin server The admin server has its own credential manager (gRPC store) which is a separate instance from the S3 server's and filer's. It had no static identity data, so IsStaticIdentity returned false (edit/delete buttons shown) and GetStaticIdentity returned nil (details page failed). Pass the -s3.config file path through to the admin server and call LoadS3ConfigFile on its credential manager, matching the approach used for the filer. * fix: use protobuf is_static field instead of passing config file path The previous approach passed -s3.config file path to every component (filer, admin). This is wrong because the admin server should not need to know about S3 config files. Instead, add an is_static field to the Identity protobuf message. The field is set when static identities are serialized (in GetStaticIdentities and LoadS3ConfigFile). Any gRPC client that loads configuration via GetConfiguration automatically sees which identities are static, without needing the config file. - Add is_static field (tag 8) to iam_pb.Identity proto message - Set IsStatic=true in GetStaticIdentities and LoadS3ConfigFile - Admin GetObjectStoreUsers reads identity.IsStatic from proto - Admin IsStaticUser helper loads config via gRPC to check the flag - Filer GetUser gRPC handler falls back to GetStaticIdentity - Remove s3ConfigFile from AdminOptions and NewAdminServer signature	2026-04-03 20:01:28 -07:00
Chris Lu	0798b274dd	feat(s3): add concurrent chunk prefetch for large file downloads (#8917 ) * feat(s3): add concurrent chunk prefetch for large file downloads Add a pipe-based prefetch pipeline that overlaps chunk fetching with response writing during S3 GetObject, SSE downloads, and filer proxy. While chunk N streams to the HTTP response, fetch goroutines for the next K chunks establish HTTP connections to volume servers ahead of time, eliminating the RTT gap between sequential chunk fetches. Uses io.Pipe for minimal memory overhead (~1MB per download regardless of chunk size, vs buffering entire chunks). Also increases the streaming read buffer from 64KB to 256KB to reduce syscall overhead. Benchmark results (64KB chunks, prefetch=4): - 0ms latency: 1058 → 2362 MB/s (2.2× faster) - 5ms latency: 11.0 → 41.7 MB/s (3.8× faster) - 10ms latency: 5.9 → 23.3 MB/s (4.0× faster) - 20ms latency: 3.1 → 12.1 MB/s (3.9× faster) * fix: address review feedback for prefetch pipeline - Fix data race: use chunkPipeResult (pointer) on channel to avoid copying struct while fetch goroutines write to it. Confirmed clean with -race detector. - Remove concurrent map write: retryWithCacheInvalidation no longer updates fileId2Url map. Producer only reads it; consumer never writes. - Use mem.Allocate/mem.Free for copy buffer to reduce GC pressure. - Add local cancellable context so consumer errors (client disconnect) immediately stop the producer and all in-flight fetch goroutines. fix(test): remove dead code and add Range header support in test server - Remove unused allData variable in makeChunksAndServer - Add Range header handling to createTestServer for partial chunk read coverage (206 Partial Content, 416 Range Not Satisfiable) * fix: correct retry condition and goroutine leak in prefetch pipeline - Fix retry condition: use result.fetchErr/result.written instead of copied to decide cache-invalidation retry. The old condition wrongly triggered retry when the fetch succeeded but the response writer failed on the first write (copied==0 despite fetcher having data). Now matches the sequential path (stream.go:197) which checks whether the fetcher itself wrote zero bytes. - Fix goroutine leak: when the producer's send to the results channel is interrupted by context cancellation, the fetch goroutine was already launched but the result was never sent to the channel. The drain loop couldn't handle it. Now waits on result.done before returning so every fetch goroutine is properly awaited.	2026-04-03 19:57:30 -07:00
Chris Lu	3efe88c718	feat(s3): store and return checksum headers for additional checksum algorithms (#8914 ) * feat(s3): store and return checksum headers for additional checksum algorithms When clients upload with --checksum-algorithm (SHA256, CRC32, etc.), SeaweedFS validated the checksum but discarded it. The checksum was never stored in metadata or returned in PUT/HEAD/GET responses. Now the checksum is computed alongside MD5 during upload, stored in entry extended attributes, and returned as the appropriate x-amz-checksum-* header in all responses. Fixes #8911 * fix(s3): address review feedback and CI failures for checksum support - Gate GET/HEAD checksum response headers on x-amz-checksum-mode: ENABLED per AWS S3 spec, fixing FlexibleChecksumError on ranged GETs and multipart copies - Verify computed checksum against client-provided header value for non-chunked uploads, returning BadDigest on mismatch - Add nil check for getCheckSumWriter to prevent panic - Handle comma-separated values in X-Amz-Trailer header - Use ordered slice instead of map for deterministic checksum header selection; extract shared mappings into package-level vars * fix(s3): skip checksum header for ranged GET responses The stored checksum covers the full object. Returning it for ranged (partial) responses causes SDK checksum validation failures because the SDK validates the header value against the partial content received. Skip emitting x-amz-checksum-* headers when a Range request header is present, fixing PyArrow large file read failures. * fix(s3): reject unsupported checksum algorithm with 400 detectRequestedChecksumAlgorithm now returns an error code when x-amz-sdk-checksum-algorithm or x-amz-checksum-algorithm contains an unsupported value, instead of silently ignoring it. * feat(s3): compute composite checksum for multipart uploads Store the checksum algorithm during CreateMultipartUpload, then during CompleteMultipartUpload compute a composite checksum from per-part checksums following the AWS S3 spec: concatenate raw per-part checksums, hash with the same algorithm, format as "base64-N" where N is part count. The composite checksum is persisted on the final object entry and returned in HEAD/GET responses (gated on x-amz-checksum-mode: ENABLED). Reuses existing per-part checksum storage from putToFiler and the getCheckSumWriter/checksumHeaders infrastructure. * fix(s3): validate checksum algorithm in CreateMultipartUpload, error on missing part checksums - Move detectRequestedChecksumAlgorithm call before mkdir callback so an unsupported algorithm returns 400 before the upload is created - Change computeCompositeChecksum to return an error when a part is missing its checksum (the upload was initiated with a checksum algorithm, so all parts must have checksums) - Propagate the error as ErrInvalidPart in CompleteMultipartUpload * fix(s3): return checksum header in CompleteMultipartUpload response, validate per-part algorithm - Add ChecksumHeaderName/ChecksumValue fields to CompleteMultipartUploadResult and set the x-amz-checksum-* HTTP response header in the handler, matching the AWS S3 CompleteMultipartUpload response spec - Validate that each part's stored checksum algorithm matches the upload's expected algorithm before assembling the composite checksum; return an error if a part was uploaded with a different algorithm	2026-04-03 18:37:54 -07:00
Chris Lu	995dfc4d5d	chore: remove ~50k lines of unreachable dead code (#8913 ) * chore: remove unreachable dead code across the codebase Remove ~50,000 lines of unreachable code identified by static analysis. Major removals: - weed/filer/redis_lua: entire unused Redis Lua filer store implementation - weed/wdclient/net2, resource_pool: unused connection/resource pool packages - weed/plugin/worker/lifecycle: unused lifecycle plugin worker - weed/s3api: unused S3 policy templates, presigned URL IAM, streaming copy, multipart IAM, key rotation, and various SSE helper functions - weed/mq/kafka: unused partition mapping, compression, schema, and protocol functions - weed/mq/offset: unused SQL storage and migration code - weed/worker: unused registry, task, and monitoring functions - weed/query: unused SQL engine, parquet scanner, and type functions - weed/shell: unused EC proportional rebalance functions - weed/storage/erasure_coding/distribution: unused distribution analysis functions - Individual unreachable functions removed from 150+ files across admin, credential, filer, iam, kms, mount, mq, operation, pb, s3api, server, shell, storage, topology, and util packages * fix(s3): reset shared memory store in IAM test to prevent flaky failure TestLoadIAMManagerFromConfig_EmptyConfigWithFallbackKey was flaky because the MemoryStore credential backend is a singleton registered via init(). Earlier tests that create anonymous identities pollute the shared store, causing LookupAnonymous() to unexpectedly return true. Fix by calling Reset() on the memory store before the test runs. * style: run gofmt on changed files * fix: restore KMS functions used by integration tests * fix(plugin): prevent panic on send to closed worker session channel The Plugin.sendToWorker method could panic with "send on closed channel" when a worker disconnected while a message was being sent. The race was between streamSession.close() closing the outgoing channel and sendToWorker writing to it concurrently. Add a done channel to streamSession that is closed before the outgoing channel, and check it in sendToWorker's select to safely detect closed sessions without panicking.	2026-04-03 16:04:27 -07:00
Chris Lu	8fad85aed7	feat(s3): support WEED_S3_SSE_KEY env var for SSE-S3 KEK (#8904 ) * feat(s3): support WEED_S3_SSE_KEY env var for SSE-S3 KEK Add support for providing the SSE-S3 Key Encryption Key (KEK) via the WEED_S3_SSE_KEY environment variable (hex-encoded 256-bit key). This avoids storing the master key in plaintext on the filer at /etc/s3/sse_kek. Key source priority: 1. WEED_S3_SSE_KEY environment variable (recommended) 2. Existing filer KEK at /etc/s3/sse_kek (backward compatible) 3. Auto-generate and save to filer (deprecated for new deployments) Existing deployments with a filer-stored KEK continue to work unchanged. A deprecation warning is logged when auto-generating a new filer KEK. * refactor(s3): derive KEK from any string via HKDF instead of requiring hex Accept any secret string in WEED_S3_SSE_KEY and derive a 256-bit key using HKDF-SHA256 instead of requiring a hex-encoded key. This is simpler for users — no need to generate hex, just set a passphrase. * feat(s3): add WEED_S3_SSE_KEK and WEED_S3_SSE_KEY env vars for KEK Two env vars for providing the SSE-S3 Key Encryption Key: - WEED_S3_SSE_KEK: hex-encoded, same format as /etc/s3/sse_kek. If the filer file also exists, they must match. - WEED_S3_SSE_KEY: any string, 256-bit key derived via HKDF-SHA256. Refuses to start if /etc/s3/sse_kek exists (must delete first). Only one may be set. Existing filer-stored KEKs continue to work. Auto-generating and storing new KEKs on filer is deprecated. * fix(s3): stop auto-generating KEK, fail only when SSE-S3 is used Instead of auto-generating a KEK and storing it on the filer when no key source is configured, simply leave SSE-S3 disabled. Encrypt and decrypt operations return a clear error directing the user to set WEED_S3_SSE_KEK or WEED_S3_SSE_KEY. * refactor(s3): move SSE-S3 KEK config to security.toml Move KEK configuration from standalone env vars to security.toml's new [sse_s3] section, following the same pattern as JWT keys and TLS certs. [sse_s3] kek = "" # hex-encoded 256-bit key (same format as /etc/s3/sse_kek) key = "" # any string, HKDF-derived Viper's WEED_ prefix auto-mapping provides env var support: WEED_SSE_S3_KEK and WEED_SSE_S3_KEY. All existing behavior is preserved: filer KEK fallback, mismatch detection, and HKDF derivation. * refactor(s3): rename SSE-S3 config keys to s3.sse.kek / s3.sse.key Use [s3.sse] section in security.toml, matching the existing naming convention (e.g. [s3.]). Env vars: WEED_S3_SSE_KEK, WEED_S3_SSE_KEY. fix(s3): address code review findings for SSE-S3 KEK - Don't hold mutex during filer retry loop (up to 20s of sleep). Lock only to write filerClient and superKey. - Remove dead generateAndSaveSuperKeyToFiler and unused constants. - Return error from deriveKeyFromSecret instead of ignoring it. - Fix outdated doc comment on InitializeWithFiler. - Use t.Setenv in tests instead of manual os.Setenv/Unsetenv. * fix(s3): don't block startup on filer errors when KEK is configured - When s3.sse.kek is set, a temporarily unreachable filer no longer prevents startup. The filer consistency check becomes best-effort with a warning. - Same treatment for s3.sse.key: filer unreachable logs a warning instead of failing. - Rewrite error messages to suggest migration instead of file deletion, avoiding the risk of orphaning encrypted data. Finding 3 (restore auto-generation) intentionally skipped — auto-gen was removed by design to avoid storing plaintext KEK on filer. * fix(test): set WEED_S3_SSE_KEY in SSE integration test server startup SSE-S3 no longer auto-generates a KEK, so integration tests must provide one. Set WEED_S3_SSE_KEY=test-sse-s3-key in all weed mini invocations in the test Makefile.	2026-04-03 13:01:21 -07:00
Chris Lu	059bee683f	feat(s3): add STS GetFederationToken support (#8891 ) * feat(s3): add STS GetFederationToken support Implement the AWS STS GetFederationToken API, which allows long-term IAM users to obtain temporary credentials scoped down by an optional inline session policy. This is useful for server-side applications that mint per-user temporary credentials. Key behaviors: - Requires SigV4 authentication from a long-term IAM user - Rejects calls from temporary credentials (session tokens) - Name parameter (2-64 chars) identifies the federated user - DurationSeconds supports 900-129600 (15 min to 36 hours, default 12h) - Optional inline session policy for permission scoping - Caller's attached policies are embedded in the JWT token - Returns federated user ARN: arn:aws:sts::<account>:federated-user/<Name> No performance impact on the S3 hot path — credential vending is a separate control-plane operation, and all policy data is embedded in the stateless JWT token. * fix(s3): address GetFederationToken PR review feedback - Fix Name validation: max 32 chars (not 64) per AWS spec, add regex validation for [\w+=,.@-]+ character whitelist - Refactor parseDurationSeconds into parseDurationSecondsWithBounds to eliminate duplicated duration parsing logic - Add sts:GetFederationToken permission check via VerifyActionPermission mirroring the AssumeRole authorization pattern - Change GetPoliciesForUser to return ([]string, error) so callers fail closed on policy-resolution failures instead of silently returning nil - Move temporary-credentials rejection before SigV4 verification for early rejection and proper test coverage - Update tests: verify specific error message for temp cred rejection, add regex validation test cases (spaces, slashes rejected) * refactor(s3): use sts.Action* constants instead of hard-coded strings Replace hard-coded "sts:AssumeRole" and "sts:GetFederationToken" strings in VerifyActionPermission calls with sts.ActionAssumeRole and sts.ActionGetFederationToken package constants. * fix(s3): pass through sts: prefix in action resolver and merge policies Two fixes: 1. mapBaseActionToS3Format now passes through "sts:" prefix alongside "s3:" and "iam:", preventing sts:GetFederationToken from being rewritten to s3:sts:GetFederationToken in VerifyActionPermission. This also fixes the existing sts:AssumeRole permission checks. 2. GetFederationToken policy embedding now merges identity.PolicyNames (from SigV4 identity) with policies from the IAM manager (which may include group-attached policies), deduplicated via a map. Previously the IAM manager lookup was skipped when identity.PolicyNames was non-empty, causing group policies to be omitted from the token. * test(s3): add integration tests for sts: action passthrough and policy merge Action resolver tests: - TestMapBaseActionToS3Format_ServicePrefixPassthrough: verifies s3:, iam:, and sts: prefixed actions pass through unchanged while coarse actions (Read, Write) are mapped to S3 format - TestResolveS3Action_STSActionsPassthrough: verifies sts:AssumeRole, sts:GetFederationToken, sts:GetCallerIdentity pass through ResolveS3Action unchanged with both nil and real HTTP requests Policy merge tests: - TestGetFederationToken_GetPoliciesForUser: tests IAMManager.GetPoliciesForUser with no user store (error), missing user, user with policies, user without - TestGetFederationToken_PolicyMergeAndDedup: tests that identity.PolicyNames and IAM-manager-resolved policies are merged and deduplicated (SharedPolicy appears in both sources, result has 3 unique policies) - TestGetFederationToken_PolicyMergeNoManager: tests that when IAM manager is unavailable, identity.PolicyNames alone are embedded * test(s3): add end-to-end integration tests for GetFederationToken Add integration tests that call GetFederationToken using real AWS SigV4 signed HTTP requests against a running SeaweedFS instance, following the existing pattern in test/s3/iam/s3_sts_assume_role_test.go. Tests: - TestSTSGetFederationTokenValidation: missing name, name too short/long, invalid characters, duration too short/long, malformed policy, anonymous rejection (7 subtests) - TestSTSGetFederationTokenRejectTemporaryCredentials: obtains temp creds via AssumeRole then verifies GetFederationToken rejects them - TestSTSGetFederationTokenSuccess: basic success, custom 1h duration, 36h max duration with expiration time verification - TestSTSGetFederationTokenWithSessionPolicy: creates a bucket, obtains federated creds with GetObject-only session policy, verifies GetObject succeeds and PutObject is denied using the AWS SDK S3 client	2026-04-02 17:37:05 -07:00
Chris Lu	a4b896a224	fix(s3): skip directories before marker in ListObjectVersions pagination (#8890 ) * fix(s3): skip directories before marker in ListObjectVersions pagination ListObjectVersions was re-traversing the entire directory tree from the beginning on every paginated request, only skipping entries at the leaf level. For buckets with millions of objects in deep hierarchies, this caused exponentially slower responses as pagination progressed. Two optimizations: 1. Use keyMarker to compute a startFrom position at each directory level, skipping directly to the relevant entry instead of scanning from the beginning (mirroring how ListObjects uses marker descent). 2. Skip recursing into subdirectories whose keys are entirely before the keyMarker. Changes per-page cost from O(entries_before_marker) to O(tree_depth). * test(s3): add integration test for deep-hierarchy version listing pagination Adds TestVersioningPaginationDeepDirectoryHierarchy which creates objects across 20 subdirectories at depth 6 (mimicking Veeam 365 backup layout) and paginates through them with small maxKeys. Verifies correctness (no duplicates, sorted order, all objects found) and checks that later pages don't take dramatically longer than earlier ones — the symptom of the pre-fix re-traversal bug. Also tests delimiter+pagination interaction across subdirectories. * test(s3): strengthen deep-hierarchy pagination assertions - Replace timing warning (t.Logf) with a failing assertion (t.Errorf) so pagination regressions actually fail the test. - Replace generic count/uniqueness/sort checks on CommonPrefixes with exact equality against the expected prefix slice, catching wrong-but- sorted results. * test(s3): use allKeys for exact assertion in deep-hierarchy pagination test Wire the allKeys slice (previously unused dead code) into the version listing assertion, replacing generic count/uniqueness/sort checks with an exact equality comparison against the keys that were created.	2026-04-02 15:59:52 -07:00
Chris Lu	7c59b639c9	STS: add GetCallerIdentity support (#8893 ) * STS: add GetCallerIdentity support Implement the AWS STS GetCallerIdentity action, which returns the ARN, account ID, and user ID of the caller based on SigV4 authentication. This is commonly used by AWS SDKs and CLI tools (e.g. `aws sts get-caller-identity`) to verify credentials and determine the authenticated identity. * test: remove trivial GetCallerIdentity tests Remove the XML unmarshal test (we don't consume this response as input) and the routing constant test (just asserts a literal equals itself). * fix: route GetCallerIdentity through STS in UnifiedPostHandler and use stable UserId - UnifiedPostHandler only dispatched actions starting with "AssumeRole" to STS, so GetCallerIdentity in a POST body would fall through to the IAM path and get AccessDenied for non-admin users. Add explicit check for GetCallerIdentity. - Use identity.Name as UserId instead of credential.AccessKey, which is a transient value and incorrect for STS assumed-role callers.	2026-04-02 15:59:09 -07:00
Chris Lu	ab4e52ae2f	fix(s3): use recursive delete for .versions directory cleanup (#8887 ) * fix(s3): use recursive delete for .versions directory cleanup When only delete markers remain in a .versions directory, updateLatestVersionAfterDeletion tried to delete it non-recursively, which failed with "fail to delete non-empty folder" because the delete marker entries were still present. Use recursive deletion so the directory and its remaining delete marker entries are cleaned up together. * fix(s3): guard .versions directory deletion against truncated listings When the version listing is truncated (>1000 entries), content versions may exist beyond the first page. Skip the recursive directory deletion in this case to prevent data loss. * fix(s3): preserve delete markers in .versions directory Delete markers must be preserved per S3 semantics — they are only removed by an explicit DELETE with versionId. The previous fix would recursively delete the entire .versions directory (including delete markers) when no content versions were found. Now the logic distinguishes three cases: 1. Content versions exist → update latest version metadata 2. Only delete markers remain (or listing truncated) → keep directory 3. Truly empty → safe to delete directory (non-recursive)	2026-04-02 11:55:13 -07:00
Chris Lu	efbed39e25	S3: map canned ACL to file permissions and add configurable default file mode (#8886 ) * S3: map canned ACL to file permissions and add configurable default file mode S3 uploads were hardcoded to 0660 regardless of ACL headers. Now the X-Amz-Acl header maps to Unix file permissions per-object: - public-read, authenticated-read, bucket-owner-read → 0644 - public-read-write → 0666 - private, bucket-owner-full-control → 0660 Also adds -defaultFileMode / -s3.defaultFileMode flag to set a server-wide default when no ACL header is present. Closes #8874 * Address review feedback for S3 file mode feature - Extract hardcoded 0660 to defaultFileMode constant - Change parseDefaultFileMode to return error instead of calling Fatalf - Add -s3.defaultFileMode flag to filer.go and mini.go (was missing) - Add doc comment to S3Options about updating all four flag sites - Add TestResolveFileMode with 10 test cases covering ACL mapping, server default, and priority ordering	2026-04-02 11:51:54 -07:00
Chris Lu	b3e50bb12f	fix(s3): remove customer encryption key from SSE-C debug log (#8875 ) * fix(s3): remove customer encryption key from SSE-C debug log The debug log in validateAndParseSSECHeaders was logging the raw customer-provided encryption key bytes in hex format (keyBytes=%x), leaking sensitive key material to log output. Remove the key bytes from the log statement while keeping the MD5 hash comparison info. * Apply suggestion from @gemini-code-assist[bot] Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-04-01 23:23:56 -07:00
msementsov	4c13a9ce65	Client disconnects create context cancelled errors, 500x errors and Filer lookup failures (#8845 ) * Update stream.go Client disconnects create context cancelled errors and Filer lookup failures * s3api: handle canceled stream requests cleanly * s3api: address canceled streaming review feedback --------- Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-03-30 12:11:30 -07:00
Lars Lehtonen	5c5d377277	weed/s3api: prune test-only functions (#8840 ) weed/s3api: prune functions that are referenced only from tests and the tests that exercise them.	2026-03-30 09:43:33 -07:00
Chris Lu	e8a6fcaafb	s3api: skip TTL fast-path for versioned buckets (#8823 ) * s3api: skip TTL fast-path for versioned buckets (#8757) PutBucketLifecycleConfiguration was translating Expiration.Days into filer.conf TTL entries for all buckets. For versioned buckets this is wrong: 1. TTL volumes expire as a unit, destroying all data — including noncurrent versions that should be preserved. 2. Filer-backend TTL (RocksDB compaction filter, Redis key expiry) removes entries without triggering chunk deletion, leaving orphaned volume data with 0 deleted bytes. 3. On AWS S3, Expiration.Days on a versioned bucket creates a delete marker — it does not hard-delete data. TTL has no such nuance. Fix: skip the TTL fast-path when the bucket has versioning enabled or suspended. All lifecycle rules are evaluated at scan time by the lifecycle worker instead. Also fix the lifecycle worker to evaluate Expiration rules against the latest version in .versions/ directories, which was previously skipped entirely — only NoncurrentVersionExpiration was handled. * lifecycle worker: handle SeaweedList error in versions dir cleanup Do not assume the directory is empty when the list call fails — log the error and skip the directory to avoid incorrect deletion. * address review feedback - Fetch version file for tag-based rules instead of reading tags from the .versions directory entry where they are not cached. - Handle getBucketVersioningStatus error by failing closed (treat as versioned) to avoid creating TTL entries on transient failures. - Capture and assert deleteExpiredObjects return values in test. - Improve test documentation.	2026-03-29 00:05:53 -07:00
Chris Lu	297cdef1a3	s3api: accept all supported lifecycle rule types (#8813 ) * s3api: accept NoncurrentVersionExpiration, AbortIncompleteMultipartUpload, Expiration.Date Update PutBucketLifecycleConfigurationHandler to accept all newly-supported lifecycle rule types. Only Transition and NoncurrentVersionTransition are still rejected (require storage class tier infrastructure). Changes: - Remove ErrNotImplemented for Expiration.Date (handled by worker at scan time) - Only reject rules with Transition.set or NoncurrentVersionTransition.set - Extract prefix from Filter.And when present - Add comment explaining that non-Expiration.Days rules are evaluated by the lifecycle worker from stored lifecycle XML, not via filer.conf TTL The lifecycle XML is already stored verbatim in bucket metadata, so new rule types are preserved on Get even without explicit handler support. Filer.conf TTL entries are only created for Expiration.Days (fast path). * s3api: skip TTL fast path for rules with tag or size filters Rules with tag or size constraints (Filter.Tag, Filter.And with tags or size bounds, Filter.ObjectSizeGreaterThan/LessThan) must not be lowered to filer.conf TTL entries, because TTL applies unconditionally to all objects under the prefix. These rules are evaluated at scan time by the lifecycle worker which checks each object's tags and size. Only simple Expiration.Days rules with prefix-only filters use the TTL fast path (RocksDB compaction filter). --------- Co-authored-by: Copilot <copilot@github.com>	2026-03-28 19:39:21 -07:00
Chris Lu	782ab84f95	lifecycle worker: drive MPU abort from lifecycle rules (#8812 ) * lifecycle worker: drive MPU abort from lifecycle rules Update the multipart upload abort phase to read AbortIncompleteMultipartUpload.DaysAfterInitiation from the parsed lifecycle rules. Falls back to the worker config abort_mpu_days when no lifecycle XML rule specifies the value. This means per-bucket MPU abort thresholds are now respected when set via PutBucketLifecycleConfiguration, instead of using a single global worker config value for all buckets. * lifecycle worker: only use config AbortMPUDays when no lifecycle XML exists When a bucket has lifecycle XML (useRuleEval=true) but no AbortIncompleteMultipartUpload rule, mpuAbortDays should be 0 (no abort), not the worker config default. The config fallback should only apply to buckets without lifecycle XML. * lifecycle worker: only skip .uploads at bucket root * lifecycle worker: use per-upload rule evaluation for MPU abort Replace the single bucket-wide mpuAbortDays with per-upload evaluation using s3lifecycle.EvaluateMPUAbort, which respects each rule's prefix filter and DaysAfterInitiation threshold. Previously the code took the first enabled abort rule's days value and applied it to all uploads, ignoring prefix scoping and multiple rules with different thresholds. Config fallback (abort_mpu_days) now only applies when lifecycle XML is truly absent (xmlPresent=false), not when XML exists but has no abort rules. Also fix EvaluateMPUAbort to use expectedExpiryTime for midnight-UTC semantics matching other lifecycle cutoffs. --------- Co-authored-by: Copilot <copilot@github.com>	2026-03-28 13:50:33 -07:00
Chris Lu	f52a3c87ce	lifecycle worker: fix ExpiredObjectDeleteMarker to match AWS semantics (#8811 ) * lifecycle worker: add NoncurrentVersionExpiration support Add version-aware scanning to the rule-based execution path. When the walker encounters a .versions directory, processVersionsDirectory(): - Lists all version entries (v_<versionId>) - Sorts by version timestamp (newest first) - Walks non-current versions with ShouldExpireNoncurrentVersion() which handles both NoncurrentDays and NewerNoncurrentVersions - Extracts successor time from version IDs (both old/new format) - Skips delete markers in noncurrent version counting - Falls back to entry Mtime when version ID timestamp is unavailable Helper functions: - sortVersionsByTimestamp: insertion sort by version ID timestamp - getEntryVersionTimestamp: extracts timestamp with Mtime fallback * lifecycle worker: address review feedback for noncurrent versions - Use sentinel errLimitReached in versions directory handler - Set NoncurrentIndex on ObjectInfo for proper NewerNoncurrentVersions evaluation * lifecycle worker: fail closed on XML parse error, guard zero Mtime - Fail closed when lifecycle XML exists but fails to parse, instead of falling back to TTL which could apply broader rules - Guard Mtime > 0 before using time.Unix(mtime, 0) to avoid mapping unset Mtime to 1970, which would misorder versions and cause premature expiration * lifecycle worker: count delete markers toward NoncurrentIndex Noncurrent delete markers should count toward the NewerNoncurrentVersions retention threshold so data versions get the correct position index. Previously, skipping delete markers without incrementing the index could retain too many versions after delete/recreate cycles. * lifecycle worker: fix version ordering, error propagation, and fail-closed scope 1. Use full version ID comparison (CompareVersionIds) for sorting .versions entries, not just decoded timestamps. Two versions with the same timestamp prefix but different random suffixes were previously misordered, potentially treating the newest version as noncurrent and deleting it. 2. Propagate .versions listing failures to the caller instead of swallowing them with (nil, 0). Transient filer errors on a .versions directory now surface in the job result. 3. Narrow the fail-closed path to only malformed lifecycle XML (errMalformedLifecycleXML). Transient filer LookupEntry errors now fall back to TTL with a warning, matching the original intent of "fail closed on bad config, not on network blips." * lifecycle worker: only skip .uploads at bucket root * lifecycle worker: sort.Slice, mixed-format test, XML presence tracking - Replace manual insertion sort with sort.Slice in sortVersionsByVersionId - Add TestCompareVersionIdsMixedFormats covering old/new format ordering - Distinguish "no lifecycle XML" (nil) from "XML present but no effective rules" (non-nil empty slice) so buckets with all-disabled rules don't incorrectly fall back to filer.conf TTL expiration * lifecycle worker: guard nil Attributes, use TrimSuffix in test - Guard entry.Attributes != nil before accessing GetFileSize() and Mtime in both listExpiredObjectsByRules and processVersionsDirectory - Use strings.TrimPrefix/TrimSuffix in TestVersionsDirectoryNaming to match the production code pattern * lifecycle worker: skip TTL scan when XML present, fix test assertions - When lifecycle XML is present but has no effective rules, skip object scanning entirely instead of falling back to TTL path - Test sort output against concrete expected names instead of re-using the same comparator as the sort itself * lifecycle worker: fix ExpiredObjectDeleteMarker to match AWS semantics Rewrite cleanupDeleteMarkers() to only remove delete markers that are the sole remaining version of an object. Previously, delete markers were removed unconditionally which could resurface older versions in versioned buckets. New algorithm: 1. Walk bucket tree looking for .versions directories 2. Check ExtLatestVersionIsDeleteMarker from directory metadata 3. Count versions in the .versions directory 4. Only remove if count == 1 (delete marker is sole version) 5. Require an ExpiredObjectDeleteMarker=true rule (when lifecycle XML rules are present) 6. Remove the empty .versions directory after cleanup This phase runs after NoncurrentVersionExpiration so version counts are accurate. * lifecycle worker: respect prefix filter in ExpiredObjectDeleteMarker rules Previously hasDeleteMarkerRule was a bucket-wide boolean that ignored rule prefixes. A prefix-scoped rule like "logs/" would incorrectly clean up delete markers in all paths. Add matchesDeleteMarkerRule() that checks if a matching enabled ExpiredObjectDeleteMarker rule exists for the specific object key, respecting the rule's prefix filter. Falls back to legacy behavior (allow cleanup) when no lifecycle XML rules are provided. * lifecycle worker: only skip .uploads at bucket root Check dir == bucketPath before skipping directories named .uploads. Previously a user-created directory like data/.uploads/ at any depth would be incorrectly skipped during lifecycle scanning. * lifecycle worker: fix delete marker cleanup with XML-present empty rules 1. matchesDeleteMarkerRule now uses nil check (not len==0) for legacy fallback. A non-nil empty slice means lifecycle XML was present but had no ExpiredObjectDeleteMarker rules, so cleanup is blocked. Previously, an empty slice triggered the legacy true path. 2. Use per-directory removedHere flag instead of cumulative cleaned counter when deciding to remove .versions directories. Previously, after the first successful cleanup anywhere in the bucket, every subsequent .versions directory would be removed even if its own delete marker was not actually deleted. * lifecycle worker: use full filter matching for delete marker rules matchesDeleteMarkerRule now uses s3lifecycle.MatchesFilter (exported) instead of prefix-only matching. This ensures tag and size filters on ExpiredObjectDeleteMarker rules are respected, preventing broader deletions than the configured policy intends. Add TestMatchesDeleteMarkerRule covering: nil rules (legacy), empty rules (XML present), prefix match/mismatch, disabled rules, rules without the flag, and tag-filtered rules against tagless markers. --------- Co-authored-by: Copilot <copilot@github.com>	2026-03-28 13:26:57 -07:00
Lars Lehtonen	b01a74c6bb	Prune Unused Functions from weed/s3api (#8815 ) * weed/s3api: prune calculatePartOffset() * weed/s3api: prune clearCachedListMetadata() * weed/s3api: prune S3ApiServer.isObjectRetentionActive() * weed/s3api: prune S3ApiServer.ensureDirectoryAllEmpty() * weed/s3api: prune s3ApiServer.getEncryptionTypeString() * weed/s3api: prune newStreamError() * weed/s3api: prune S3ApiServer.rotateSSECKey() weed/s3api: prune S3ApiServer.rotateSSEKMSKey() weed/s3api: prune S3ApiServer.rotateSSEKMSMetadataOnly() weed/s3api: prune S3ApiServer.rotateSSECChunks() weed/s3api: prune S3ApiServer.rotateSSEKMSChunks() weed/s3api: prune S3ApiServer.rotateSSECChunk() weed/s3api: prune S3ApiServer.rotateSSEKMSChunk() * weed/s3api: prune addCounterToIV() * weed/s3api: prune minInt() * weed/s3api: prune isMethodActionMismatch() * weed/s3api: prune hasSpecificQueryParameters() * weed/s3api: prune handlePutToFilerError() weed/s3api: prune handlePutToFilerInternalError() weed/s3api: prune logErrorAndReturn() weed/s3api: prune logInternalError weed/s3api: prune handleSSEError() weed/s3api: prune handleSSEInternalError() * weed/s3api: prune encryptionConfigToProto() * weed/s3api: prune S3ApiServer.touch()	2026-03-28 13:24:11 -07:00
Chris Lu	f6ec9941cb	lifecycle worker: NoncurrentVersionExpiration support (#8810 ) * lifecycle worker: add NoncurrentVersionExpiration support Add version-aware scanning to the rule-based execution path. When the walker encounters a .versions directory, processVersionsDirectory(): - Lists all version entries (v_<versionId>) - Sorts by version timestamp (newest first) - Walks non-current versions with ShouldExpireNoncurrentVersion() which handles both NoncurrentDays and NewerNoncurrentVersions - Extracts successor time from version IDs (both old/new format) - Skips delete markers in noncurrent version counting - Falls back to entry Mtime when version ID timestamp is unavailable Helper functions: - sortVersionsByTimestamp: insertion sort by version ID timestamp - getEntryVersionTimestamp: extracts timestamp with Mtime fallback * lifecycle worker: address review feedback for noncurrent versions - Use sentinel errLimitReached in versions directory handler - Set NoncurrentIndex on ObjectInfo for proper NewerNoncurrentVersions evaluation * lifecycle worker: fail closed on XML parse error, guard zero Mtime - Fail closed when lifecycle XML exists but fails to parse, instead of falling back to TTL which could apply broader rules - Guard Mtime > 0 before using time.Unix(mtime, 0) to avoid mapping unset Mtime to 1970, which would misorder versions and cause premature expiration * lifecycle worker: count delete markers toward NoncurrentIndex Noncurrent delete markers should count toward the NewerNoncurrentVersions retention threshold so data versions get the correct position index. Previously, skipping delete markers without incrementing the index could retain too many versions after delete/recreate cycles. * lifecycle worker: fix version ordering, error propagation, and fail-closed scope 1. Use full version ID comparison (CompareVersionIds) for sorting .versions entries, not just decoded timestamps. Two versions with the same timestamp prefix but different random suffixes were previously misordered, potentially treating the newest version as noncurrent and deleting it. 2. Propagate .versions listing failures to the caller instead of swallowing them with (nil, 0). Transient filer errors on a .versions directory now surface in the job result. 3. Narrow the fail-closed path to only malformed lifecycle XML (errMalformedLifecycleXML). Transient filer LookupEntry errors now fall back to TTL with a warning, matching the original intent of "fail closed on bad config, not on network blips." * lifecycle worker: only skip .uploads at bucket root * lifecycle worker: sort.Slice, mixed-format test, XML presence tracking - Replace manual insertion sort with sort.Slice in sortVersionsByVersionId - Add TestCompareVersionIdsMixedFormats covering old/new format ordering - Distinguish "no lifecycle XML" (nil) from "XML present but no effective rules" (non-nil empty slice) so buckets with all-disabled rules don't incorrectly fall back to filer.conf TTL expiration * lifecycle worker: guard nil Attributes, use TrimSuffix in test - Guard entry.Attributes != nil before accessing GetFileSize() and Mtime in both listExpiredObjectsByRules and processVersionsDirectory - Use strings.TrimPrefix/TrimSuffix in TestVersionsDirectoryNaming to match the production code pattern * lifecycle worker: skip TTL scan when XML present, fix test assertions - When lifecycle XML is present but has no effective rules, skip object scanning entirely instead of falling back to TTL path - Test sort output against concrete expected names instead of re-using the same comparator as the sort itself --------- Co-authored-by: Copilot <copilot@github.com>	2026-03-28 12:58:21 -07:00
Chris Lu	54dd4f091d	s3lifecycle: add lifecycle rule evaluator package and extend XML types (#8807 ) * s3api: extend lifecycle XML types with NoncurrentVersionExpiration, AbortIncompleteMultipartUpload Add missing S3 lifecycle rule types to the XML data model: - NoncurrentVersionExpiration with NoncurrentDays and NewerNoncurrentVersions - NoncurrentVersionTransition with NoncurrentDays and StorageClass - AbortIncompleteMultipartUpload with DaysAfterInitiation - Filter.ObjectSizeGreaterThan and ObjectSizeLessThan - And.ObjectSizeGreaterThan and ObjectSizeLessThan - Filter.UnmarshalXML to properly parse Tag, And, and size filter elements Each new type follows the existing set-field pattern for conditional XML marshaling. No behavior changes - these types are not yet wired into handlers or the lifecycle worker. * s3lifecycle: add lifecycle rule evaluator package New package weed/s3api/s3lifecycle/ provides a pure-function lifecycle rule evaluation engine. The evaluator accepts flattened Rule structs and ObjectInfo metadata, and returns the appropriate Action. Components: - evaluator.go: Evaluate() for per-object actions with S3 priority ordering (delete marker > noncurrent version > current expiration), ShouldExpireNoncurrentVersion() with NewerNoncurrentVersions support, EvaluateMPUAbort() for multipart upload rules - filter.go: prefix, tag, and size-based filter matching - tags.go: ExtractTags() extracts S3 tags from filer Extended metadata, HasTagRules() for scan-time optimization - version_time.go: GetVersionTimestamp() extracts timestamps from SeaweedFS version IDs (both old and new format) Comprehensive test coverage: 54 tests covering all action types, filter combinations, edge cases, and version ID formats. * s3api: add UnmarshalXML for Expiration, Transition, ExpireDeleteMarker Add UnmarshalXML methods that set the internal 'set' flag during XML parsing. Previously these flags were only set programmatically, causing XML round-trip to drop elements. This ensures lifecycle configurations stored as XML survive unmarshal/marshal cycles correctly. Add comprehensive XML round-trip tests for all lifecycle rule types including NoncurrentVersionExpiration, AbortIncompleteMultipartUpload, Filter with Tag/And/size constraints, and a complete Terraform-style lifecycle configuration. * s3lifecycle: address review feedback - Fix version_time.go overflow: guard timestampPart > MaxInt64 before the inversion subtraction to prevent uint64 wrap - Make all expiry checks inclusive (!now.Before instead of now.After) so actions trigger at the exact scheduled instant - Add NoncurrentIndex to ObjectInfo so Evaluate() can properly handle NewerNoncurrentVersions via ShouldExpireNoncurrentVersion() - Add test for high-bit overflow version ID * s3lifecycle: guard ShouldExpireNoncurrentVersion against zero SuccessorModTime Add early return when obj.IsLatest or obj.SuccessorModTime.IsZero() to prevent premature expiration of versions with uninitialized successor timestamps (zero value would compute to epoch, always expired). --------- Co-authored-by: Copilot <copilot@github.com>	2026-03-28 11:10:31 -07:00
Chris Lu	7d5cbfd547	s3: support s3:x-amz-server-side-encryption policy condition (#8806 ) * s3: support s3:x-amz-server-side-encryption policy condition (#7680) - Normalize x-amz-server-side-encryption header values to canonical form (aes256 → AES256, aws:kms mixed-case → aws:kms) so StringEquals conditions work regardless of client capitalisation - Exempt UploadPart and UploadPartCopy from SSE Null conditions: these actions inherit SSE from the initial CreateMultipartUpload request and do not re-send the header, so Deny/Null("true") should not block them - Add sse_condition_test.go covering StringEquals, Null, case-insensitive normalisation, and multipart continuation action exemption * s3: address review comments on SSE condition support - Replace "inherited" sentinel in injectSSEForMultipart with "AES256" so that StringEquals/Null conditions evaluate against a meaningful value; add TODO noting that KMS multipart uploads need the actual algorithm looked up from the upload state - Rewrite TestSSECaseInsensitiveNormalization to drive normalisation through EvaluatePolicyForRequest with a real http.Request so regressions in the production code path are caught; split into AES256 and aws:kms variants to cover both normalisation branches s3: plumb real inherited SSE from multipart upload state into policy eval Instead of injecting a static "AES256" sentinel for UploadPart/UploadPartCopy, look up the actual SSE algorithm from the stored CreateMultipartUpload entry and pass it through the evaluation chain. Changes: - PolicyEvaluationArgs gains InheritedSSEAlgorithm string; set by the BucketPolicyEngine wrapper for multipart continuation actions - injectSSEForMultipart(conditions, inheritedSSE) now accepts the real algorithm; empty string means no SSE → Null("true") fires correctly - IsMultipartContinuationAction exported so the s3api wrapper can use it - BucketPolicyEngine gets a MultipartSSELookup callback (set by S3ApiServer) that fetches the upload entry and reads SeaweedFSSSEKMSKeyID / SeaweedFSSSES3Encryption to determine the algorithm - S3ApiServer.getMultipartSSEAlgorithm implements the lookup via getEntry - Tests updated: three multipart cases (AES256, aws:kms, no-SSE-must-deny) plus UploadPartCopy coverage	2026-03-27 23:15:01 -07:00
Chris Lu	e3f052cd84	s3api: preserve lifecycle config responses for Terraform (#8805 ) * s3api: preserve lifecycle configs for terraform * s3api: bound lifecycle config request bodies * s3api: make bucket config updates copy-on-write * s3api: tighten string slice cloning	2026-03-27 22:50:02 -07:00
Chris Lu	0adb78bc6b	s3api: make conditional mutations atomic and AWS-compatible (#8802 ) * s3api: serialize conditional write finalization * s3api: add conditional delete mutation checks * s3api: enforce destination conditions for copy * s3api: revalidate multipart completion under lock * s3api: rollback failed put finalization hooks * s3api: report delete-marker version deletions * s3api: fix copy destination versioning edge cases * s3api: make versioned multipart completion idempotent * test/s3: cover conditional mutation regressions * s3api: rollback failed copy version finalization * s3api: resolve suspended delete conditions via latest entry * s3api: remove copy test null-version injection * s3api: reject out-of-order multipart completions * s3api: preserve multipart replay version metadata * s3api: surface copy destination existence errors * s3api: simplify delete condition target resolution * test/s3: make conditional delete assertions order independent * test/s3: add distributed lock gateway integration * s3api: fail closed multipart versioned completion * s3api: harden copy metadata and overwrite paths * s3api: create delete markers for suspended deletes * s3api: allow duplicate multipart completion parts	2026-03-27 19:22:26 -07:00
Chris Lu	17028fbf59	fix: serialize SSE-KMS metadata when bucket default encryption applies KMS (#8780 ) * fix: serialize SSE-KMS metadata when bucket default encryption applies KMS When a bucket has default SSE-KMS encryption enabled and a file is uploaded without explicit SSE headers, the encryption was applied correctly but the SSE-KMS metadata (x-seaweedfs-sse-kms-key) was not serialized. This caused downloads to fail with "empty SSE-KMS metadata" because the entry's Extended map stored an empty byte slice. The existing code already handled this for SSE-S3 bucket defaults (SerializeSSES3Metadata) but was missing the equivalent call to SerializeSSEKMSMetadata for the KMS path. Fixes seaweedfs/seaweedfs#8776 * ci: add KMS integration tests to GitHub Actions Add a kms-tests.yml workflow that runs on changes to KMS/SSE code with two jobs: 1. KMS provider tests: starts OpenBao via Docker, runs Go integration tests in test/kms/ against a real KMS backend 2. S3 KMS e2e tests: starts OpenBao + weed mini built from source, runs test_s3_kms.sh which covers bucket-default SSE-KMS upload/download (the exact scenario from #8776) Supporting changes: - test/kms/Makefile: add CI targets (test-provider-ci, test-s3-kms-ci) that manage OpenBao via plain Docker and run weed from source - test/kms/s3-config-openbao-template.json: S3 config template with OpenBao KMS provider for weed mini * refactor: combine SSE-S3 and SSE-KMS metadata serialization into else-if SSE-S3 and SSE-KMS bucket default encryption are mutually exclusive, so use a single if/else-if block instead of two independent if blocks. * Update .github/workflows/kms-tests.yml Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * fix(ci): start weed mini from data dir to avoid Docker filer.toml weed mini reads filer.toml from the current working directory first. When running from test/kms/, it picked up the Docker-targeted filer.toml which has dir="/data/filerdb" (a path that doesn't exist in CI), causing a fatal crash at filer store initialization. Fix by cd-ing to the data directory before starting weed mini. Also improve log visibility on failure. --------- Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2026-03-26 14:07:01 -07:00
Chris Lu	28fe92065a	S3: reject part uploads after AbortMultipartUpload (#8768 ) * S3: reject part uploads after AbortMultipartUpload PutObjectPartHandler did not verify that the multipart upload session still exists before accepting parts. After AbortMultipartUpload deleted the upload directory, the ErrNotFound from getEntry was silently ignored (treated as "may be non-SSE upload"), allowing parts to be stored as orphaned files. Now return ErrNoSuchUpload when the upload directory is not found, matching AWS S3 behavior. Fixes #8766 * S3: check upload existence unconditionally in PutObjectPartHandler Move the getEntry call out of the SSE-type conditional so the upload existence check runs for all part uploads, including SSE-C. Previously the SSE-C path skipped the check entirely, allowing parts to be uploaded after abort when SSE-C headers were present. Also flattens the nested SSE branching by one level now that getEntry is called once upfront. * S3: address PR review feedback for PutObjectPartHandler - Log at error level when getEntry fails with an unexpected error, since we return ErrInternalError to the client - Distinguish base IV decode errors from length validation failures with separate, clearer error messages --------- Co-authored-by: Copilot <copilot@github.com>	2026-03-24 18:11:51 -07:00
Chris Lu	0b3867dca3	filer: add structured error codes to CreateEntryResponse (#8767 ) * filer: add FilerError enum and error_code field to CreateEntryResponse Add a machine-readable error code alongside the existing string error field. This follows the precedent set by PublishMessageResponse in the MQ broker proto. The string field is kept for human readability and backward compatibility. Defined codes: OK, ENTRY_NAME_TOO_LONG, PARENT_IS_FILE, EXISTING_IS_DIRECTORY, EXISTING_IS_FILE, ENTRY_ALREADY_EXISTS. * filer: add sentinel errors and error code mapping in filer_pb Define sentinel errors (ErrEntryNameTooLong, ErrParentIsFile, etc.) in the filer_pb package so both the filer and consumers can reference them without circular imports. Add FilerErrorToSentinel() to map proto error codes to sentinels, and update CreateEntryWithResponse() to check error_code first, falling back to the string-based path for backward compatibility with old servers. * filer: return wrapped sentinel errors and set proto error codes Replace fmt.Errorf string errors in filer.CreateEntry, UpdateEntry, and ensureParentDirectoryEntry with wrapped filer_pb sentinel errors (using %w). This preserves errors.Is() traversal on the server side. In the gRPC CreateEntry handler, map sentinel errors to the corresponding FilerError proto codes using errors.Is(), setting both resp.Error (string, for backward compat) and resp.ErrorCode (enum). * S3: use errors.Is() with filer sentinels instead of string matching Replace fragile string-based error matching in filerErrorToS3Error and other S3 API consumers with errors.Is() checks against filer_pb sentinel errors. This works because the updated CreateEntryWithResponse helper reconstructs sentinel errors from the proto FilerError code. Update iceberg stage_create and metadata_files to check resp.ErrorCode instead of parsing resp.Error strings. Update SSE-S3 to use errors.Is() for the already-exists check. String matching is retained only for non-filer errors (gRPC transport errors, checksum validation) that don't go through CreateEntryResponse. * filer: remove backward-compat string fallbacks for error codes Clients and servers are always deployed together, so there is no need for backward-compatibility fallback paths that parse resp.Error strings when resp.ErrorCode is unset. Simplify all consumers to rely solely on the structured error code. * iceberg: ensure unknown non-OK error codes are not silently ignored When FilerErrorToSentinel returns nil for an unrecognized error code, return an error including the code and message rather than falling through to return nil. * filer: fix redundant error message and restore error wrapping in helper Use request path instead of resp.Error in the sentinel error format string to avoid duplicating the sentinel message (e.g. "entry already exists: entry already exists"). Restore %w wrapping with errors.New() in the fallback paths so callers can use errors.Is()/errors.As(). * filer: promote file to directory on path conflict instead of erroring S3 allows both "foo/bar" (object) and "foo/bar/xyzzy" (another object) to coexist because S3 has a flat key space. When ensureParentDirectoryEntry finds a parent path that is a file instead of a directory, promote it to a directory by setting ModeDir while preserving the original content and chunks. Use Store.UpdateEntry directly to bypass the Filer.UpdateEntry type-change guard. This fixes the S3 compatibility test failures where creating overlapping keys (e.g. "foo/bar" then "foo/bar/xyzzy") returned ExistingObjectIsFile.	2026-03-24 17:08:22 -07:00

1 2 3 4 5 ...

1059 Commits