mirror of
https://github.com/seaweedfs/seaweedfs.git
synced 2026-05-17 23:31:31 +00:00
4.21
126 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
d0a09ea178 |
fix(s3): honor ChecksumAlgorithm on presigned URL uploads (#9076)
* fix(s3): honor ChecksumAlgorithm on presigned URL uploads AWS SDK presigners hoist x-amz-sdk-checksum-algorithm (and related checksum headers) into the signed URL's query string, so servers must read either location. detectRequestedChecksumAlgorithm only looked at request headers, so presigned PUTs with ChecksumAlgorithm set validated and stored no additional checksum, and HEAD/GET never returned the x-amz-checksum-* header. Read these parameters from headers first, then fall back to a case-insensitive query-string lookup. Apply the same fallback when comparing an object-level checksum value against the computed one. Fixes #9075 * test(s3): presigned URL checksum integration tests (#9075) Adds test/s3/checksum with end-to-end coverage for flexible-checksum behavior on presigned URL uploads. Tests generate a presigned PUT URL with ChecksumAlgorithm set, upload the body with a plain http.Client (bypassing AWS SDK middleware so the server must honor the query-string hoisted x-amz-sdk-checksum-algorithm), then HEAD/GET with ChecksumMode=ENABLED and assert the stored x-amz-checksum-* header. Covers SHA256, SHA1, and a negative control with no checksum requested. Wires the new directory into s3-go-tests.yml as its own CI job. * perf(s3): parse presigned query once in detectRequestedChecksumAlgorithm Previously, each header fallback called getHeaderOrQuery, which re-parsed r.URL.Query() and allocated a new map on every invocation — up to eight times per PutObject request. Parse the raw query at most once per request (only when non-empty) and pass the pre-parsed url.Values into a new lookupHeaderOrQuery helper. Also drops a redundant strings.ToLower allocation in the case-insensitive query key scan (strings.EqualFold already handles ASCII case folding). Addresses review feedback from gemini-code-assist on PR #9076. * test(s3): honor credential env vars and add presigned upload timeout - init() now reads S3_ACCESS_KEY/S3_SECRET_KEY (and AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_REGION fallbacks) so that `make test-with-server ACCESS_KEY=... SECRET_KEY=...` no longer authenticates with hardcoded defaults while the server has been started with different credentials. - uploadViaPresignedURL uses a dedicated http.Client with a 30s timeout instead of http.DefaultClient, so a stalled server fails fast in CI instead of blocking until the suite's global timeout fires. Addresses review feedback from coderabbitai on PR #9076. * test(s3): pass S3_PORT and credentials through to checksum tests - 'make test' now exports S3_ENDPOINT, S3_ACCESS_KEY, and S3_SECRET_KEY derived from the Makefile variables so the Go test process talks to the same endpoint/credentials that start-server was launched with. - start-server cleans up the background SeaweedFS process and PID file when the readiness poll times out, preventing stale port conflicts on subsequent runs. Addresses review feedback from coderabbitai on PR #9076. * ci(s3): raise checksum tests step timeout make test-with-server builds weed_binary, waits up to 90s for readiness, then runs go test -timeout=10m. The previous 12-minute step timeout only had ~2 minutes of headroom over the Go timeout, risking the Actions runner killing the step before tests reported a real failure. Bumps the job timeout from 15 to 20 minutes and the step timeout from 12 to 16 minutes, matching other S3 integration jobs. Addresses review feedback from coderabbitai on PR #9076. * perf(s3): thread pre-parsed query through putToFiler hot path Parse the request's query string once at the top of putToFiler and reuse the resulting url.Values for both the checksum-algorithm detection and the expected-checksum verification. Previously, the verification path called getHeaderOrQuery which re-parsed r.URL.Query() again on every PutObject, defeating the previous commit's single-parse goal. - Add parseRequestQuery + detectRequestedChecksumAlgorithmQ (the pre-parsed-query variant). detectRequestedChecksumAlgorithm is now a thin wrapper used by callers that do a single lookup. - putToFiler parses once and threads the result through both call sites. - Remove getHeaderOrQuery and update the unit test to use lookupHeaderOrQuery directly. Addresses follow-up review from gemini-code-assist on PR #9076. * test(s3): check io.ReadAll error in uploadViaPresignedURL helper * test(s3): drop SHA1 presigned test case The AWS SDK v2 presigner signs a Content-MD5 header at presign time for SHA1 PutObject requests even when no body is attached (the MD5 of the empty payload gets baked into the signed headers). Uploading the real body via a plain http.Client then trips SeaweedFS's MD5 validation and returns BadDigest — an SDK/presigner quirk, not a SeaweedFS bug. The SHA256 positive case already exercises the server-side query-hoisted algorithm path that issue #9075 is about, and the unit tests in weed/s3api cover each algorithm's header mapping. Drop the SHA1 integration case rather than chase SDK-specific workarounds. * test(s3): provide real Content-MD5 to presigned checksum test AWS SDK v2's flexible-checksum middleware signs a Content-MD5 header at presign time. There is no body to hash at that point, so it seeds the header with MD5 of the empty payload. When the real body is then PUT with a plain http.Client, SeaweedFS's server-side Content-MD5 verification correctly rejects the upload with BadDigest. Pre-compute the MD5 of the test body and thread it into PutObjectInput.ContentMD5 so the signed Content-MD5 matches the body that will actually be uploaded. The test still exercises the server-side path that reads X-Amz-Sdk-Checksum-Algorithm from the query string (the fix that PR #9076 is validating). * test(s3): send the signed Content-MD5 header on presigned upload uploadViaPresignedURL now accepts an extraHeaders map so callers can thread through headers that the presigner signed but the raw http request would otherwise omit. The SHA256 test passes the Content-MD5 it computed, matching what the presigner baked into the signature. Fixes SignatureDoesNotMatch seen in CI after the previous commit set ContentMD5 on the presign input without sending the corresponding header on the actual upload. * test(s3): build presigned URL with the raw v4 signer The AWS SDK v2 s3.PresignClient runs the flexible-checksum middleware for any PutObject input that carries ChecksumAlgorithm. That middleware injects a Content-MD5 header at presign time, and with no body present it seeds MD5-of-empty. Any subsequent upload of a non-empty body through a plain http.Client then trips SeaweedFS's Content-MD5 verification and returns BadDigest — not the code path that issue #9075 is about. Replace the PresignClient usage in the integration test with a direct call to v4.Signer.PresignHTTP, building a canonical URL whose query string already contains x-amz-sdk-checksum-algorithm=SHA256. This is exactly the shape of URL a browser/curl client would receive from any presigner that hoists the algorithm header, and it exercises the server-side fix from PR #9076 without dragging in SDK-specific middleware quirks. * test(s3): set X-Amz-Expires on presigned URL before signing v4.Signer.PresignHTTP does not add X-Amz-Expires on its own — the caller has to seed it into the request's query string so the signer includes it in the canonical query and the server accepts the presigned URL. Without it, SeaweedFS correctly returns AuthorizationQueryParametersError. Also adds a .gitignore for the make-managed test volume data, log file, and PID file so local `make test-with-server` runs do not leave artifacts tracked by git. Verified by running the integration tests locally: make test-with-server → both presigned checksum tests PASS. |
||
|
|
10e7f0f2bc |
fix(shell): s3.user.provision handles existing users by attaching policy (#9040)
* fix(shell): s3.user.provision handles existing users by attaching policy Instead of erroring when the user already exists, the command now creates the policy and attaches it to the existing user via UpdateUser. Credentials are only generated and displayed for newly created users. * fix(shell): skip duplicate policy attachment in s3.user.provision Check if the policy is already attached before appending and calling UpdateUser, making repeated runs idempotent. * fix(shell): generate service account ID in s3.serviceaccount.create The command built a ServiceAccount proto without setting Id, which was rejected by credential.ValidateServiceAccountId on any real store. Now generates sa:<parent>:<uuid> matching the format used by the admin UI. * test(s3): integration tests for s3.* shell commands Adds TestShell* integration tests covering ~40 previously untested shell commands: user, accesskey, group, serviceaccount, anonymous, bucket, policy.attach/detach, config.show, and iam.export/import. Switches the test cluster's credential store from memory to filer_etc because the memory store silently drops groups and service accounts in LoadConfiguration/SaveConfiguration. * fix(shell): rollback policy on key generation failure in s3.user.provision If iam.GenerateRandomString or iam.GenerateSecretAccessKey fails after the policy was persisted, the policy would be left orphaned. Extracts the rollback logic into a local closure and invokes it on all failure paths after policy creation for consistency. * address PR review feedback for s3 shell tests and serviceaccount - s3.serviceaccount.create: use 16 bytes of randomness (hex-encoded) for the service account UUID instead of 4 bytes to eliminate collision risk - s3.serviceaccount.create: print the actual ID and drop the outdated "server-assigned" note (the ID is now client-generated) - tests: guard createdAK in accesskey rotate/delete subtests so sibling failures don't run invalid CLI calls - tests: requireContains/requireNotContains use t.Fatalf to fail fast - tests: Provision subtest asserts the "Attached policy" message on the second provision call for an existing user - tests: update extractServiceAccountID comment example to match the sa:<parent>:<uuid> format - tests: drop redundant saID empty-check (extractServiceAccountID fatals) * test(s3): use t.Fatalf for precondition check in serviceaccount test |
||
|
|
e648c76bcf | go fmt | ||
|
|
e21d7602c3 |
feat(iam): implement group inline policy actions (#8992)
* feat(iam): implement group inline policy actions Add PutGroupPolicy, GetGroupPolicy, DeleteGroupPolicy, and ListGroupPolicies to both embedded and standalone IAM servers. The standalone IAM stores group inline policies in a new GroupInlinePolicies field in the Policies JSON, mirroring the existing user inline policy pattern. DeleteGroup now also checks for inline policies before allowing deletion. * fix: address review feedback for group inline policies - Embedded IAM: return NotImplemented for group inline policies instead of silently succeeding as no-ops (Gemini + CodeRabbit) - Standalone IAM: recompute member actions after PutGroupPolicy and DeleteGroupPolicy (Gemini) - Add parameter validation for GroupName/PolicyName/PolicyDocument on PutGroupPolicy, DeleteGroupPolicy, ListGroupPolicies (Gemini) - Add UserName validation for ListUserPolicies in standalone IAM - Call cleanupGroupInlinePolicies from DeleteGroup (Gemini) - Migrate GroupInlinePolicies on group rename in UpdateGroup (CodeRabbit) - Fix integration test cleanup order (CodeRabbit) * fix: persist recomputed actions and improve error handling - Set changed=true for PutGroupPolicy/DeleteGroupPolicy in standalone IAM DoActions so recomputed member actions are persisted (Gemini critical) - Make cleanupGroupInlinePolicies accept policies parameter to avoid redundant I/O, return error (Gemini) - Make migrateGroupInlinePolicies return error, handle in caller (Gemini) * fix: include group policies in action recomputation Extend computeAllActionsForUser to also aggregate group inline policies and group managed policies when s3cfg is provided. Previously, group inline policies were stored but never reflected in member Identity.Actions. (CodeRabbit critical) * perf: use identity index in recomputeActionsForGroupMembers for O(N+M) * fix: skip group inline policy integration test on embedded IAM The embedded IAM returns NotImplemented for group inline policies. Skip TestIAMGroupInlinePolicy when running against embedded mode to avoid CI failures in the group integration test matrix. |
||
|
|
45ee2ab4b9 |
feat(iam): implement ListUserPolicies API action (#8991)
* feat(iam): implement ListUserPolicies API action (#8987) Add ListUserPolicies support to both embedded and standalone IAM servers, resolving the NotImplemented error when calling `aws iam list-user-policies`. * fix: address review feedback for ListUserPolicies - Add handleImplicitUsername for ListUserPolicies in both IAM servers so omitting UserName defaults to the calling user (Gemini review) - Assert synthetic policy name in unit test (CodeRabbit) - Use require.True for error type assertion in integration test (CodeRabbit) |
||
|
|
fbe758efa8 |
test: consolidate port allocation into shared test/testutil package (#8982)
* test: consolidate port allocation into shared test/testutil package Move duplicated port allocation logic from 15+ test files into a single shared package at test/testutil/. This fixes a port collision bug where independently allocated ports could overlap via the gRPC offset (port+10000), causing weed mini to reject the configuration. The shared package provides: - AllocatePorts: atomic allocation of N unique ports - AllocateMiniPorts/MustFreeMiniPorts: gRPC-offset-aware allocation that prevents port A+10000 == port B collisions - WaitForPort, WaitForService, FindBindIP, WriteIAMConfig, HasDocker * test: address review feedback and fix FUSE build - Revert fuse_integration change: it has its own go.mod and cannot import the shared testutil package - AllocateMiniPorts: hold all listeners open until the entire batch is allocated, preventing race conditions where other processes steal ports - HasDocker: add 5s context timeout to avoid hanging on stalled Docker - WaitForService: only treat 2xx HTTP status codes as ready * test: use global rand in AllocateMiniPorts for better seeding Go 1.20+ auto-seeds the global rand generator. Using it avoids identical sequences when multiple tests call at the same nanosecond. * test: revert WaitForService status code check S3 endpoints return non-2xx (e.g. 403) on bare GET requests, so requiring 2xx caused the S3 integration test to time out. Any HTTP response is sufficient proof that the service is running. * test: fix gofmt formatting in s3tables test files |
||
|
|
a4753b6a3b |
S3: delay empty folder cleanup to prevent Spark write failures (#8970)
* S3: delay empty folder cleanup to prevent Spark write failures (#8963) Empty folders were being cleaned up within seconds, causing Apache Spark (s3a) writes to fail when temporary directories like _temporary/0/task_xxx/ were briefly empty. - Increase default cleanup delay from 5s to 2 minutes - Only process queue items that have individually aged past the delay (previously the entire queue was drained once any item triggered) - Make the delay configurable via filer.toml: [filer.options] s3.empty_folder_cleanup_delay = "2m" * test: increase cleanup wait timeout to match 2m delay The empty folder cleanup delay was increased to 2 minutes, so the Spark integration test needs to wait longer for temporary directories to disappear. * fix: eagerly clean parent directories after empty folder deletion After deleting an empty folder, immediately try to clean its parent rather than relying on cascading metadata events that each re-enter the 2-minute delay queue. This prevents multi-minute waits when cleaning nested temporary directory trees (e.g. Spark's _temporary hierarchy with 3+ levels would take 6m+ vs near-instant). Fixes the CI failure where lingering _temporary parent directories were not cleaned within the test's 3-minute timeout. |
||
|
|
8fad85aed7 |
feat(s3): support WEED_S3_SSE_KEY env var for SSE-S3 KEK (#8904)
* feat(s3): support WEED_S3_SSE_KEY env var for SSE-S3 KEK Add support for providing the SSE-S3 Key Encryption Key (KEK) via the WEED_S3_SSE_KEY environment variable (hex-encoded 256-bit key). This avoids storing the master key in plaintext on the filer at /etc/s3/sse_kek. Key source priority: 1. WEED_S3_SSE_KEY environment variable (recommended) 2. Existing filer KEK at /etc/s3/sse_kek (backward compatible) 3. Auto-generate and save to filer (deprecated for new deployments) Existing deployments with a filer-stored KEK continue to work unchanged. A deprecation warning is logged when auto-generating a new filer KEK. * refactor(s3): derive KEK from any string via HKDF instead of requiring hex Accept any secret string in WEED_S3_SSE_KEY and derive a 256-bit key using HKDF-SHA256 instead of requiring a hex-encoded key. This is simpler for users — no need to generate hex, just set a passphrase. * feat(s3): add WEED_S3_SSE_KEK and WEED_S3_SSE_KEY env vars for KEK Two env vars for providing the SSE-S3 Key Encryption Key: - WEED_S3_SSE_KEK: hex-encoded, same format as /etc/s3/sse_kek. If the filer file also exists, they must match. - WEED_S3_SSE_KEY: any string, 256-bit key derived via HKDF-SHA256. Refuses to start if /etc/s3/sse_kek exists (must delete first). Only one may be set. Existing filer-stored KEKs continue to work. Auto-generating and storing new KEKs on filer is deprecated. * fix(s3): stop auto-generating KEK, fail only when SSE-S3 is used Instead of auto-generating a KEK and storing it on the filer when no key source is configured, simply leave SSE-S3 disabled. Encrypt and decrypt operations return a clear error directing the user to set WEED_S3_SSE_KEK or WEED_S3_SSE_KEY. * refactor(s3): move SSE-S3 KEK config to security.toml Move KEK configuration from standalone env vars to security.toml's new [sse_s3] section, following the same pattern as JWT keys and TLS certs. [sse_s3] kek = "" # hex-encoded 256-bit key (same format as /etc/s3/sse_kek) key = "" # any string, HKDF-derived Viper's WEED_ prefix auto-mapping provides env var support: WEED_SSE_S3_KEK and WEED_SSE_S3_KEY. All existing behavior is preserved: filer KEK fallback, mismatch detection, and HKDF derivation. * refactor(s3): rename SSE-S3 config keys to s3.sse.kek / s3.sse.key Use [s3.sse] section in security.toml, matching the existing naming convention (e.g. [s3.*]). Env vars: WEED_S3_SSE_KEK, WEED_S3_SSE_KEY. * fix(s3): address code review findings for SSE-S3 KEK - Don't hold mutex during filer retry loop (up to 20s of sleep). Lock only to write filerClient and superKey. - Remove dead generateAndSaveSuperKeyToFiler and unused constants. - Return error from deriveKeyFromSecret instead of ignoring it. - Fix outdated doc comment on InitializeWithFiler. - Use t.Setenv in tests instead of manual os.Setenv/Unsetenv. * fix(s3): don't block startup on filer errors when KEK is configured - When s3.sse.kek is set, a temporarily unreachable filer no longer prevents startup. The filer consistency check becomes best-effort with a warning. - Same treatment for s3.sse.key: filer unreachable logs a warning instead of failing. - Rewrite error messages to suggest migration instead of file deletion, avoiding the risk of orphaning encrypted data. Finding 3 (restore auto-generation) intentionally skipped — auto-gen was removed by design to avoid storing plaintext KEK on filer. * fix(test): set WEED_S3_SSE_KEY in SSE integration test server startup SSE-S3 no longer auto-generates a KEK, so integration tests must provide one. Set WEED_S3_SSE_KEY=test-sse-s3-key in all weed mini invocations in the test Makefile. |
||
|
|
059bee683f |
feat(s3): add STS GetFederationToken support (#8891)
* feat(s3): add STS GetFederationToken support Implement the AWS STS GetFederationToken API, which allows long-term IAM users to obtain temporary credentials scoped down by an optional inline session policy. This is useful for server-side applications that mint per-user temporary credentials. Key behaviors: - Requires SigV4 authentication from a long-term IAM user - Rejects calls from temporary credentials (session tokens) - Name parameter (2-64 chars) identifies the federated user - DurationSeconds supports 900-129600 (15 min to 36 hours, default 12h) - Optional inline session policy for permission scoping - Caller's attached policies are embedded in the JWT token - Returns federated user ARN: arn:aws:sts::<account>:federated-user/<Name> No performance impact on the S3 hot path — credential vending is a separate control-plane operation, and all policy data is embedded in the stateless JWT token. * fix(s3): address GetFederationToken PR review feedback - Fix Name validation: max 32 chars (not 64) per AWS spec, add regex validation for [\w+=,.@-]+ character whitelist - Refactor parseDurationSeconds into parseDurationSecondsWithBounds to eliminate duplicated duration parsing logic - Add sts:GetFederationToken permission check via VerifyActionPermission mirroring the AssumeRole authorization pattern - Change GetPoliciesForUser to return ([]string, error) so callers fail closed on policy-resolution failures instead of silently returning nil - Move temporary-credentials rejection before SigV4 verification for early rejection and proper test coverage - Update tests: verify specific error message for temp cred rejection, add regex validation test cases (spaces, slashes rejected) * refactor(s3): use sts.Action* constants instead of hard-coded strings Replace hard-coded "sts:AssumeRole" and "sts:GetFederationToken" strings in VerifyActionPermission calls with sts.ActionAssumeRole and sts.ActionGetFederationToken package constants. * fix(s3): pass through sts: prefix in action resolver and merge policies Two fixes: 1. mapBaseActionToS3Format now passes through "sts:" prefix alongside "s3:" and "iam:", preventing sts:GetFederationToken from being rewritten to s3:sts:GetFederationToken in VerifyActionPermission. This also fixes the existing sts:AssumeRole permission checks. 2. GetFederationToken policy embedding now merges identity.PolicyNames (from SigV4 identity) with policies from the IAM manager (which may include group-attached policies), deduplicated via a map. Previously the IAM manager lookup was skipped when identity.PolicyNames was non-empty, causing group policies to be omitted from the token. * test(s3): add integration tests for sts: action passthrough and policy merge Action resolver tests: - TestMapBaseActionToS3Format_ServicePrefixPassthrough: verifies s3:, iam:, and sts: prefixed actions pass through unchanged while coarse actions (Read, Write) are mapped to S3 format - TestResolveS3Action_STSActionsPassthrough: verifies sts:AssumeRole, sts:GetFederationToken, sts:GetCallerIdentity pass through ResolveS3Action unchanged with both nil and real HTTP requests Policy merge tests: - TestGetFederationToken_GetPoliciesForUser: tests IAMManager.GetPoliciesForUser with no user store (error), missing user, user with policies, user without - TestGetFederationToken_PolicyMergeAndDedup: tests that identity.PolicyNames and IAM-manager-resolved policies are merged and deduplicated (SharedPolicy appears in both sources, result has 3 unique policies) - TestGetFederationToken_PolicyMergeNoManager: tests that when IAM manager is unavailable, identity.PolicyNames alone are embedded * test(s3): add end-to-end integration tests for GetFederationToken Add integration tests that call GetFederationToken using real AWS SigV4 signed HTTP requests against a running SeaweedFS instance, following the existing pattern in test/s3/iam/s3_sts_assume_role_test.go. Tests: - TestSTSGetFederationTokenValidation: missing name, name too short/long, invalid characters, duration too short/long, malformed policy, anonymous rejection (7 subtests) - TestSTSGetFederationTokenRejectTemporaryCredentials: obtains temp creds via AssumeRole then verifies GetFederationToken rejects them - TestSTSGetFederationTokenSuccess: basic success, custom 1h duration, 36h max duration with expiration time verification - TestSTSGetFederationTokenWithSessionPolicy: creates a bucket, obtains federated creds with GetObject-only session policy, verifies GetObject succeeds and PutObject is denied using the AWS SDK S3 client |
||
|
|
a4b896a224 |
fix(s3): skip directories before marker in ListObjectVersions pagination (#8890)
* fix(s3): skip directories before marker in ListObjectVersions pagination ListObjectVersions was re-traversing the entire directory tree from the beginning on every paginated request, only skipping entries at the leaf level. For buckets with millions of objects in deep hierarchies, this caused exponentially slower responses as pagination progressed. Two optimizations: 1. Use keyMarker to compute a startFrom position at each directory level, skipping directly to the relevant entry instead of scanning from the beginning (mirroring how ListObjects uses marker descent). 2. Skip recursing into subdirectories whose keys are entirely before the keyMarker. Changes per-page cost from O(entries_before_marker) to O(tree_depth). * test(s3): add integration test for deep-hierarchy version listing pagination Adds TestVersioningPaginationDeepDirectoryHierarchy which creates objects across 20 subdirectories at depth 6 (mimicking Veeam 365 backup layout) and paginates through them with small maxKeys. Verifies correctness (no duplicates, sorted order, all objects found) and checks that later pages don't take dramatically longer than earlier ones — the symptom of the pre-fix re-traversal bug. Also tests delimiter+pagination interaction across subdirectories. * test(s3): strengthen deep-hierarchy pagination assertions - Replace timing warning (t.Logf) with a failing assertion (t.Errorf) so pagination regressions actually fail the test. - Replace generic count/uniqueness/sort checks on CommonPrefixes with exact equality against the expected prefix slice, catching wrong-but- sorted results. * test(s3): use allKeys for exact assertion in deep-hierarchy pagination test Wire the allKeys slice (previously unused dead code) into the version listing assertion, replacing generic count/uniqueness/sort checks with an exact equality comparison against the keys that were created. |
||
|
|
98714b9f70 |
fix(test): address flaky S3 distributed lock integration test (#8888)
* fix(test): address flaky S3 distributed lock integration test Two root causes: 1. Lock ring convergence race: After waitForFilerCount(2) confirms the master sees both filers, there's a window where filer0's lock ring still only contains itself (master's LockRingUpdate broadcast is delayed by the 1s stabilization timer). During this window filer0 considers itself primary for ALL keys, so both filers can independently grant the same lock. Fix: Add waitForLockRingConverged() that acquires the same lock through both filers and verifies mutual exclusion before proceeding. 2. Hash function mismatch: ownerForObjectLock used util.HashStringToLong (MD5 + modulo) to predict lock owners, but the production DLM uses CRC32 consistent hashing via HashRing. This meant the test could pick keys that route to the same filer, not exercising the cross-filer coordination it intended to test. Fix: Use lock_manager.NewHashRing + GetPrimary() to match production routing exactly. * fix(test): verify lock denial reason in convergence check Ensure the convergence check only returns true when the second lock attempt is denied specifically because the lock is already owned, avoiding false positives from transient errors. * fix(test): check one key per primary filer in convergence wait A single arbitrary key can false-pass: if its real primary is the filer with the stale ring, mutual exclusion holds trivially because that filer IS the correct primary. Generate one test key per distinct primary using the same consistent-hash ring as production, so a stale ring on any filer is caught deterministically. |
||
|
|
d5068b3ee6 | test: harden weed mini readiness checks | ||
|
|
0884acd70c |
ci: add S3 mutation regression coverage (#8804)
* test/s3: stabilize distributed lock regression * ci: add S3 mutation regression workflow * test: fix delete regression readiness probe * test: address mutation regression review comments |
||
|
|
0adb78bc6b |
s3api: make conditional mutations atomic and AWS-compatible (#8802)
* s3api: serialize conditional write finalization * s3api: add conditional delete mutation checks * s3api: enforce destination conditions for copy * s3api: revalidate multipart completion under lock * s3api: rollback failed put finalization hooks * s3api: report delete-marker version deletions * s3api: fix copy destination versioning edge cases * s3api: make versioned multipart completion idempotent * test/s3: cover conditional mutation regressions * s3api: rollback failed copy version finalization * s3api: resolve suspended delete conditions via latest entry * s3api: remove copy test null-version injection * s3api: reject out-of-order multipart completions * s3api: preserve multipart replay version metadata * s3api: surface copy destination existence errors * s3api: simplify delete condition target resolution * test/s3: make conditional delete assertions order independent * test/s3: add distributed lock gateway integration * s3api: fail closed multipart versioned completion * s3api: harden copy metadata and overwrite paths * s3api: create delete markers for suspended deletes * s3api: allow duplicate multipart completion parts |
||
|
|
ba624f1f34 |
Rust volume server implementation with CI (#8539)
* Match Go gRPC client transport defaults
* Honor Go HTTP idle timeout
* Honor maintenanceMBps during volume copy
* Honor images.fix.orientation on uploads
* Honor cpuprofile when pprof is disabled
* Match Go memory status payloads
* Propagate request IDs across gRPC calls
* Format pending Rust source updates
* Match Go stats endpoint payloads
* Serve Go volume server UI assets
* Enforce Go HTTP whitelist guards
* Align Rust metrics admin-port test with Go behavior
* Format pending Rust server updates
* Honor access.ui without per-request JWT checks
* Honor keepLocalDatFile in tier upload shortcut
* Honor Go remote volume write mode
* Load tier backends from master config
* Check master config before loading volumes
* Remove vif files on volume destroy
* Delete remote tier data on volume destroy
* Honor vif version defaults and overrides
* Reject mismatched vif bytes offsets
* Load remote-only tiered volumes
* Report Go tail offsets in sync status
* Stream remote dat in incremental copy
* Honor collection vif for EC shard config
* Persist EC expireAtSec in vif metadata
* Stream remote volume reads through HTTP
* Serve HTTP ranges from backend source
* Match Go ReadAllNeedles scan order
* Match Go CopyFile zero-stop metadata
* Delete EC volumes with collection cleanup
* Drop deleted collection metrics
* Match Go tombstone ReadNeedleMeta
* Match Go TTL parsing: all-digit default to minutes, two-pass fit algorithm
* Match Go needle ID/cookie formatting and name size computation
* Match Go image ext checks: webp resize only, no crop; empty healthz body
* Match Go Prometheus metric names and add missing handler counter constants
* Match Go ReplicaPlacement short string parsing with zero-padding
* Add missing EC constants MAX_SHARD_COUNT and MIN_TOTAL_DISKS
* Add walk_ecx_stats for accurate EC volume file counts and size
* Match Go VolumeStatus dat file size, EC shard stats, and disk pct precision
* Match Go needle map: unconditional delete counter, fix redb idx walk offset
* Add CompactMapSegment overflow panic guard matching Go
* Match Go volume: vif creation, version from superblock, TTL expiry, dedup data_size, garbage_level fallback
* Match Go 304 Not Modified: return bare status with no headers
* Match Go JWT error message: use "wrong jwt" instead of detailed error
* Match Go read handler bare 400, delete error prefix, download throttle timeout
* Match Go pretty JSON 1-space indent and "Deletion Failed:" error prefix
* Match Go heartbeat: keep is_heartbeating on error, add EC shard identification
* Match Go needle ReadBytes V2: tolerate EOF on truncated body
* Match Go volume: cookie check on any existing needle, return DataSize, 128KB meta guard
* Match Go DeleteCollection: propagate destroy errors
* Match Go gRPC: BatchDelete no flag, IncrementalCopy error, FetchAndWrite concurrent, VolumeUnmount/DeleteCollection errors, tail draining, query error code
* Match Go Content-Disposition RFC 6266 formatting with RFC 2231 encoding
* Match Go Guard isWriteActive: combine whitelist and signing key check
* Match Go DeleteCollectionMetrics: use partial label matching
* Match Go heartbeat: send state-only delta on volume state changes
* Match Go ReadNeedleMeta paged I/O: read header+tail only, skip data; add EIO tracking
* Match Go ScrubVolume INDEX mode dispatch; add VolumeCopy preallocation and EC NeedleStatus TODOs
* Add read_ec_shard_needle for full needle reconstruction from local EC shards
* Make heartbeat master config helpers pub for VolumeCopy preallocation
* Match Go gRPC: VolumeCopy preallocation, EC NeedleStatus full read, error message wording
* Match Go HTTP responses: omitempty fields, 2-space JSON indent, JWT JSON error, delete pretty/JSONP, 304 Last-Modified, raw write error
* Match Go WriteNeedleBlob V3 timestamp patching, fix makeup_diff double padding, count==0 read handling
* Add rebuild_ecx_file for EC index reconstruction from data shards
* Match Go gRPC: tail header first-chunk-only, EC cleanup on failure, copy append mode, ecx rebuild, compact cancellation
* Add EC volume read and delete support in HTTP handlers
* Add per-shard EC mount/unmount, location predicate search, idx directory for EC
* Add CheckVolumeDataIntegrity on volume load matching Go
* Match Go gRPC: EC multi-disk placement, per-shard mount/unmount, no auto-mount on reconstruct, streaming ReadAll/EcShardRead, ReceiveFile cleanup, version check, proxy streaming, redirect Content-Type
* Match Go heartbeat metric accounting
* Match Go duplicate UUID heartbeat retries
* Delete expired EC volumes during heartbeat
* Match Go volume heartbeat pruning
* Honor master preallocate in volume max
* Report remote storage info in heartbeats
* Emit EC heartbeat deltas on shard changes
* Match Go throttle boundary: use <= instead of <, fix pretty JSON to 1-space
* Match Go write_needle_blob monotonic appendAtNs via get_append_at_ns
* Match Go VolumeUnmount: idempotent success when volume not found
* Match Go TTL Display: return empty string when unit is Empty
Go checks `t.Unit == Empty` separately and returns "" for TTLs
with nonzero count but Empty unit. Rust only checked is_empty()
(count==0 && unit==0), so count>0 with unit=0 would format as
"5 " instead of "".
* Match Go error behavior for truncated needle data in read_body_v2
Go's readNeedleDataVersion2 returns "index out of range %d" errors
(indices 1-7) when needle body or metadata fields are truncated.
Rust was silently tolerating truncation and returning Ok. Now returns
NeedleError::IndexOutOfRange with the matching index for each field.
* Match Go download throttle: return JSON error instead of plain text
* Match Go crop params: default x1/y1 to 0 when not provided
* Match Go ScrubEcVolume: accumulate total_files from EC shards
* Match Go ScrubVolume: count total_files even on scrub error
* Match Go VolumeEcShardsCopy: set ignore_source_file_not_found for .vif
* Match Go VolumeTailSender: send needle_header on every chunk
* Match Go read_super_block: apply replication override from .vif
* Match Go check_volume_data_integrity: verify all 10 entries, detect trailing corruption
* Match Go WriteNeedleBlob: dedup check before writing during replication
* handlers: use meta-only reads for HEAD
* handlers: align range parsing and responses with Go
* handlers: align upload parsing with Go
* deps: enable webp support
* Make 5bytes the default feature for idx entry compatibility
* Match Go TTL: preserve original unit when count fits in byte
* Fix EC locate_needle: use get_actual_size for full needle size
* Fix raw body POST: only parse multipart when Content-Type contains form-data
* Match Go ReceiveFile: return protocol errors in response body, not gRPC status
* add docs
* Match Go VolumeEcShardsCopy: append to .ecj file instead of truncating
* Match Go ParsePath: support _delta suffix on file IDs for sub-file addressing
* Match Go chunk manifest: add Accept-Ranges, Content-Disposition, filename fallback, MIME detection
* Match Go privateStoreHandler: use proper JSON error for unsupported methods
* Match Go Destroy: add only_empty parameter to reject non-empty volume deletion
* Fix compilation: set_read_only_persist and set_writable return ()
These methods fire-and-forget save_vif internally, so gRPC callers
should not try to chain .map_err() on the unit return type.
* Match Go SaveVolumeInfo: check writability and propagate errors in save_vif
* Match Go VolumeDelete: propagate only_empty to delete_volume for defense in depth
The gRPC VolumeDelete handler had a pre-check for only_empty but then
passed false to store.delete_volume(), bypassing the store-level check.
Go passes req.OnlyEmpty directly to DeleteVolume. Now Rust does the same
for defense in depth against TOCTOU races (though the store write lock
makes this unlikely).
* Match Go ProcessRangeRequest: return full content for empty/oversized ranges
Go returns nil from ProcessRangeRequest when ranges are empty or total
range size exceeds content length, causing the caller to serve the full
content as a normal 200 response. Rust was returning an empty 200 body.
* Match Go Query: quote JSON keys in output records
Go's ToJson produces valid JSON with quoted keys like {"name":"Alice"}.
Rust was producing invalid JSON with unquoted keys like {name:"Alice"}.
* Match Go VolumeCopy: reject when no suitable disk location exists
Go returns ErrVolumeNoSpaceLeft when no location matches the disk type
and has sufficient space. Rust had an unsafe fallback that silently
picked the first location regardless of type or available space.
* Match Go DeleteVolumeNeedle: check noWriteOrDelete before allowing delete
Go checks v.noWriteOrDelete before proceeding with needle deletion,
returning "volume is read only" if true. Rust was skipping this check.
* Match Go ReceiveFile: prefer HardDrive location for EC and use response-level write errors
Two fixes: (1) Go prefers HardDriveType disk location for EC volumes,
falling back to first location. Returns "no storage location available"
when no locations exist. (2) Write failures are now response-level
errors (in response body) instead of gRPC status errors, matching Go.
* Match Go CopyFile: sync EC volume journal to disk before copying
Go calls ecVolume.Sync() before copying EC volume files to ensure the
.ecj journal is flushed to disk. Added sync_to_disk() to EcVolume and
call it in the CopyFile EC branch.
* Match Go readSuperBlock: propagate replication parse errors
Go returns an error when parsing the replication string from the .vif
file fails. Rust was silently ignoring the parse failure and using the
super block's replication as-is.
* Match Go TTL expiry: remove append_at_ns > 0 guard
Go computes TTL expiry from AppendAtNs without guarding against zero.
When append_at_ns is 0, the expiry is epoch + TTL which is in the past,
correctly returning NotFound. Rust's extra guard skipped the check,
incorrectly returning success for such needles.
* Match Go delete_collection: skip volumes with compaction in progress
Go checks !v.isCompactionInProgress.Load() before destroying a volume
during collection deletion, skipping compacting volumes. Also changed
destroy errors to log instead of aborting the entire collection delete.
* Match Go MarkReadonly/MarkWritable: always notify master even on local error
Go always notifies the master regardless of whether the local
set_read_only_persist or set_writable step fails. The Rust code was
using `?` which short-circuited on error, skipping the final master
notification. Save the result and defer the `?` until after the
notify call.
* Match Go PostHandler: return 500 for all write errors
Go returns 500 (InternalServerError) for all write failures. Rust was
returning 404 for volume-not-found and 403 for read-only volumes.
* Match Go makeupDiff: validate .cpd compaction revision is old + 1
Go reads the new .cpd file's super block and verifies the compaction
revision is exactly old + 1. Rust only validated the old revision.
* Match Go VolumeStatus: check data backend before returning status
Go checks v.DataBackend != nil before building the status response,
returning an error if missing. Rust was silently returning size 0.
* Match Go PostHandler: always include mime field in upload response JSON
Go always serializes the mime field even when empty ("mime":""). Rust was
omitting it when empty due to Option<String> with skip_serializing_if.
* Match Go FindFreeLocation: account for EC shards in free slot calculation
Go subtracts EC shard equivalents when computing available volume slots.
Rust was only comparing volume count, potentially over-counting free
slots on locations with many EC shards.
* Match Go privateStoreHandler: use INVALID as metrics label for unsupported methods
Go records the method as INVALID in metrics for unsupported HTTP methods.
Rust was using the actual method name.
* Match Go volume: add commit_compact guard and scrub data size validation
Two fixes: (1) commit_compact now checks/sets is_compacting flag to
prevent concurrent commits, matching Go's CompareAndSwap guard.
(2) scrub now validates total needle sizes against .dat file size.
* Match Go gRPC: fix TailSender error propagation, EcShardsInfo all slots, EcShardRead .ecx check
Three fixes: (1) VolumeTailSender now propagates binary search errors
instead of silently falling back to start. (2) VolumeEcShardsInfo
returns entries for all shard slots including unmounted. (3)
VolumeEcShardRead checks .ecx index for deletions instead of .ecj.
* Match Go metrics: add BuildInfo gauge and connection tracking functions
Go exposes a BuildInfo Prometheus metric with version labels, and tracks
open connections via stats.ConnectionOpen/Close. Added both to Rust.
* Match Go NeedleMap.Delete: use !is_deleted() instead of is_valid()
Go's CompactMap.Delete checks !IsDeleted() not IsValid(), so needles
with size==0 (live but anomalous) can still be deleted. The Rust code
was using is_valid() which returns false for size==0, preventing
deletion of such needles.
* Match Go fitTtlCount: always normalize TTL to coarsest unit
Go's fitTtlCount always converts to seconds first, then finds the
coarsest unit that fits in one byte (e.g., 120m → 2h). Rust had an
early return for count<=255 that skipped normalization, producing
different binary encodings for the same duration.
* Match Go BuildInfo metric: correct name and add missing labels
Go uses SeaweedFS_build_info (Namespace=SeaweedFS, Subsystem=build,
Name=info) with labels [version, commit, sizelimit, goos, goarch].
Rust had SeaweedFS_volumeServer_buildInfo with only [version].
* Match Go HTTP handlers: fix UploadResult fields, DiskStatus JSON, chunk manifest ETag
- UploadResult.mime: add skip_serializing_if to omit empty MIME (Go uses omitempty)
- UploadResult.contentMd5: only include when request provided Content-MD5 header
- Content-MD5 response header: only set when request provided it
- DiskStatuses: use camelCase field names (percentFree, percentUsed, diskType)
to match Go's protobuf JSON marshaling
- Chunk manifest: preserve needle ETag in expanded response headers
* Match Go volume: fix version(), integrity check, scrub, and commit_compact
- version(): use self.version() instead of self.super_block.version in
read_all_needles, check_volume_data_integrity, scan_raw_needles_from
to respect volumeInfo.version override
- check_volume_data_integrity: initialize healthy_index_size to idx_size
(matching Go) and continue on EOF instead of returning error
- scrub(): count deleted needles in total_read since they still occupy
space in the .dat file (matches Go's totalRead += actualSize for deleted)
- commit_compact: clean up .cpd/.cpx files on makeup_diff failure
(matches Go's error path cleanup)
* Match Go write queue: add 4MB batch byte limit
Go's startWorker breaks the batch at either 128 requests or 4MB of
accumulated write data. Rust only had the 128-request limit, allowing
large writes to accumulate unbounded latency.
* Add TTL normalization tests for Go parity verification
Test that fit_ttl_count normalizes 120m→2h, 24h→1d, 7d→1w even
when count fits in a byte, matching Go's fitTtlCount behavior.
* Match Go FindFreeLocation: account for EC shards in free slot calculation
Go's free volume count subtracts both regular volumes and EC volumes
from max_volume_count. Rust was only counting regular volumes, which
could over-report available slots when EC shards are mounted.
* Match Go EC volume: mark deletions in .ecx and replay .ecj at startup
Go's DeleteNeedleFromEcx marks needles as deleted in the .ecx index
in-place (writing TOMBSTONE_FILE_SIZE at the size field) in addition
to appending to the .ecj journal. Go's RebuildEcxFile replays .ecj
entries into .ecx on startup, then removes the .ecj file.
Rust was only appending to .ecj without marking .ecx, which meant
deleted EC needles remained readable via .ecx binary search. This
fix:
- Opens .ecx in read/write mode (was read-only)
- Adds mark_needle_deleted_in_ecx: binary search + in-place write
- Calls it from journal_delete before appending to .ecj
- Adds rebuild_ecx_from_journal: replays .ecj into .ecx on startup
* Match Go check_all_ec_shards_deleted: use MAX_SHARD_COUNT instead of hardcoded 14
Go's TotalShardsCount is DataShardsCount + ParityShardsCount = 14 by
default, but custom EC configs via .vif can have more shards (up to
MaxShardCount = 32). Using MAX_SHARD_COUNT ensures all shard files
are checked regardless of EC configuration.
* Match Go EC locate: subtract 1 from shard size and use datFileSize override
Go's LocateEcShardNeedleInterval passes shard.ecdFileSize-1 to
LocateData (shards are padded, -1 avoids overcounting large block
rows). When datFileSize is known, Go uses datFileSize/DataShards
instead. Rust was passing the raw shard file size without adjustment.
* Fix TTL parsing and DiskStatus field names to match Go exactly
TTL::read: Go's ReadTTL preserves the original unit (7d stays 7d,
not 1w) and errors on count > 255. The previous normalization change
was incorrect — Go only normalizes internally via fitTtlCount, not
during string parsing.
DiskStatus: Go uses encoding/json on protobuf structs, which reads
the json struct tags (snake_case: percent_free, percent_used,
disk_type), not the protobuf JSON names (camelCase). Revert to
snake_case to match Go's actual output.
* Fix heartbeat: check leader != current master before redirect, process duplicated UUIDs first
Match Go's volume_grpc_client_to_master.go behavior:
1. Only trigger leader redirect when the leader address differs from the
current master (prevents unnecessary reconnect loops when master confirms
its own address).
2. Process duplicated_uuids before leader redirect check, matching Go's
ordering where duplicate UUID detection takes priority.
* Remove SetState version check to match Go behavior
Go's SetState unconditionally applies the state without any version
mismatch check. The Rust version had an extra optimistic concurrency
check that would reject valid requests from Go clients that don't
track versions.
* Fix TTL::read() to normalize via fit_ttl_count matching Go's ReadTTL
Go's ReadTTL calls fitTtlCount which converts to seconds and normalizes
to the coarsest unit that fits in a byte count (e.g. 120m->2h, 7d->1w,
24h->1d). The Rust version was preserving the original unit, producing
different binary encodings on disk and in heartbeat messages.
* Always return Content-MD5 header and JSON field on successful writes
Go always sets Content-MD5 in the response regardless of whether the
request included it. The Rust version was conditionally including it
only when the request provided Content-MD5.
* Include name and size in UploadResult JSON even when empty/zero
Go's encoding/json always includes empty strings and zero values in
the upload response. The Rust version was using skip_serializing_if
to omit them, causing JSON structure differences.
* Include deleted needles in scan_raw_needles_from to match Go
Go's ScanVolumeFileFrom visits ALL needles including deleted ones.
Skipping deleted entries during incremental copy would cause tombstones
to not be propagated, making deleted files reappear on the receiving side.
* Match Go NeedleMap.Delete: always write tombstone to idx file
Go's NeedleMap.Delete unconditionally writes a tombstone entry to the
idx file and updates metrics, even if the needle doesn't exist or is
already deleted. This is important for replication where every delete
operation must produce an idx write. The Rust version was skipping the
tombstone write for non-existent or already-deleted needles.
* Limit MIME type to 255 bytes matching Go's CreateNeedleFromRequest
* Title-case Seaweed-* pair keys to match Go HTTP header canonicalization
* Unify DiskType::Hdd into HardDrive to match Go's single HardDriveType
* Skip tombstone entries in walk_ecx_stats total_size matching Go's Raw()
* Return EMPTY TTL when computed seconds is zero matching Go's fitTtlCount
* Include disk-space-low in Volume.is_read_only() matching Go
* Log error on CIDR parse failure in whitelist matching Go's glog.Errorf
* Log cookie mismatch in gRPC Query matching Go's V(0).Infof
* Fix is_expired volume_size comparison to use < matching Go
Go checks `volumeSize < super_block.SuperBlockSize` (strict less-than),
but Rust used `<=`. This meant Rust would fail to expire a volume that
is exactly SUPER_BLOCK_SIZE bytes.
* Apply Go's JWT expiry defaults: 10s write, 60s read
Go calls v.SetDefault("jwt.signing.expires_after_seconds", 10) and
v.SetDefault("jwt.signing.read.expires_after_seconds", 60). Rust
defaulted to 0 for both, which meant tokens would never expire when
security.toml has a signing key but omits expires_after_seconds.
* Stop [grpc.volume].ca from overriding [grpc].ca matching Go
Go reads the gRPC CA file only from config.GetString("grpc.ca"), i.e.
the [grpc] section. The [grpc.volume] section only provides cert and
key. Rust was also reading ca from [grpc.volume] which would silently
override the [grpc].ca value when both were present.
* Fix free_volume_count to use EC shard count matching Go
Was counting EC volumes instead of EC shards, which underestimates EC
space usage. One EC volume with 14 shards uses ~1.4 volume slots, not 1.
Now uses Go's formula: ((max - volumes) * DataShardsCount - ecShardCount) / DataShardsCount.
* Include preallocate in compaction space check matching Go
Go uses max(preallocate, estimatedCompactSize) for the free space check.
Rust was only using the estimated volume size, which could start a
compaction that fails mid-way if preallocate exceeds the volume size.
* Check gzip magic bytes before setting Content-Encoding matching Go
Go checks both Accept-Encoding contains "gzip" AND IsGzippedContent
(data starts with 0x1f 0x8b) before setting Content-Encoding: gzip.
Rust only checked Accept-Encoding, which could incorrectly declare
gzip encoding for non-gzip compressed data.
* Only set upload response name when needle HasName matching Go
Go checks reqNeedle.HasName() before setting ret.Name. Rust always set
the name from the filename variable, which could return the fid portion
of the path as the name for raw PUT requests without a filename.
* Treat MaxVolumeCount==0 as unlimited matching Go's hasFreeDiskLocation
Go's hasFreeDiskLocation returns true immediately when MaxVolumeCount
is 0, treating it as unlimited. Rust was computing effective_free as
<= 0 for max==0, rejecting the location. This could fail volume
creation during early startup before the first heartbeat adjusts max.
* Read lastAppendAtNs from deleted V3 entries in integrity check
Go's doCheckAndFixVolumeData reads AppendAtNs from both live entries
(verifyNeedleIntegrity) and deleted tombstones (verifyDeletedNeedleIntegrity).
Rust was skipping deleted entries, which could result in a stale
last_append_at_ns if the last index entry is a deletion.
* Return empty body for empty/oversized range requests matching Go
Go's ProcessRangeRequest returns nil (empty body, 200 OK) when
parsed ranges are empty or combined range size exceeds total content
size. The Rust buffered path incorrectly returned the full file data
for both cases. The streaming path already handled this correctly.
* Dispatch ScrubEcVolume by mode matching Go's INDEX/LOCAL/FULL
Go's ScrubEcVolume switches on mode: INDEX calls v.ScrubIndex()
(ecx integrity only), LOCAL calls v.ScrubLocal(), FULL calls
vs.store.ScrubEcVolume(). Rust was ignoring the mode and always
running verify_ec_shards. Now INDEX mode checks ecx index integrity
(sorted overlap detection + file size validation) without shard I/O,
while LOCAL/FULL modes run the existing shard verification.
* Fix TTL test expectation: 7d normalizes to 1w matching Go's fitTtlCount
Go's ReadTTL calls fitTtlCount which normalizes to the coarsest unit
that fits: 7 days = 1 week, so "7d" becomes {Count:1, Unit:Week}
which displays as "1w". Both Go and Rust normalize identically.
* Add version mismatch check to SetState matching Go's State.Update
Go's State.Update compares the incoming version with the stored
version and returns "version mismatch" error if they differ. This
provides optimistic concurrency control. The Rust implementation
was accepting any version unconditionally.
* Use unquoted keys in Query JSON output matching Go's json.ToJson
Go's json.ToJson produces records with unquoted keys like
{score:12} not {"score":12}. This is a custom format used
internally by SeaweedFS for query results.
* Fix TTL test expectation in VolumeNeedleStatus: 7d normalizes to 1w
Same normalization as the HTTP test: Go's ReadTTL calls fitTtlCount
which converts 7 days to 1 week.
* Include ETag header in 304 Not Modified responses matching Go behavior
Go sets ETag on the response writer (via SetEtag) before the
If-Modified-Since and If-None-Match conditional checks, so both
304 response paths include the ETag header. The Rust implementation
was only adding ETag to 200 responses.
* Remove needle-name fallback in chunk manifest filename resolution
Go's tryHandleChunkedFile only falls back from URL filename to
manifest name. Rust had an extra fallback to needle.name that
Go does not perform, which could produce different
Content-Disposition filenames for chunk manifests.
* Validate JWT nbf (Not Before) claim matching Go's jwt-go/v5
Go's jwt.ParseWithClaims validates the nbf claim when present,
rejecting tokens whose nbf is in the future. The Rust jsonwebtoken
crate defaults validate_nbf to false, so tokens with future nbf
were incorrectly accepted.
* Set isHeartbeating to true at startup matching Go's VolumeServer init
Go unconditionally sets isHeartbeating: true in the VolumeServer
struct literal. Rust was starting with false when masters are
configured, causing /healthz to return 503 until the first
heartbeat succeeds.
* Call store.close() on shutdown matching Go's Shutdown()
Go's Shutdown() calls vs.store.Close() which closes all volumes
and flushes file handles. The Rust server was relying on process
exit for cleanup, which could leave data unflushed.
* Include server ID in maintenance mode error matching Go's format
Go returns "volume server %s is in maintenance mode" with the
store ID. Rust was returning a generic "maintenance mode" message.
* Fix DiskType test: use HardDrive variant matching Go's HddType=""
Go maps both "" and "hdd" to HardDriveType (empty string). The
Rust enum variant is HardDrive, not Hdd. The test referenced a
nonexistent Hdd variant causing compilation failure.
* Do not include ETag in 304 responses matching Go's GetOrHeadHandler
Go sets ETag at L235 AFTER the If-Modified-Since and If-None-Match
304 return paths, so Go's 304 responses do not include the ETag header.
The Rust code was incorrectly including ETag in both 304 response paths.
* Return 400 on malformed query strings in PostHandler matching Go's ParseForm
Go's r.ParseForm() returns HTTP 400 with "form parse error: ..." when
the query string is malformed. Rust was silently falling back to empty
query params via unwrap_or_default().
* Load EC volume version from .vif matching Go's NewEcVolume
Go sets ev.Version = needle.Version(volumeInfo.Version) from the .vif
file. Rust was always using Version::current() (V3), which would produce
wrong needle actual size calculations for volumes created with V1 or V2.
* Sync .ecx file before close matching Go's EcVolume.Close
Go calls ev.ecxFile.Sync() before closing to ensure in-place deletion
marks are flushed to disk. Without this, deletion marks written via
MarkNeedleDeleted could be lost on crash.
* Validate SuperBlock extra data size matching Go's Bytes() guard
Go checks extraSize > 256*256-2 and calls glog.Fatalf to prevent
corrupt super block headers. Rust was silently truncating via u16 cast,
which would write an incorrect extra_size field.
* Update quinn-proto 0.11.13 -> 0.11.14 to fix GHSA-6xvm-j4wr-6v98
Fixes Dependency Review CI failure: quinn-proto < 0.11.14 is vulnerable
to unauthenticated remote DoS via panic in QUIC transport parameter
parsing.
* Skip TestMultipartUploadUsesFormFieldsForTimestampAndTTL for Go server
Go's r.FormValue() cannot read multipart text fields after
r.MultipartReader() consumes the body, so ts/ttl sent as multipart
form fields only work with the Rust volume server. Skip this test
when VOLUME_SERVER_IMPL != "rust" to fix CI failure.
* Flush .ecx in EC volume sync_to_disk matching Go's Sync()
Go's EcVolume.Sync() flushes both the .ecj journal and the .ecx index
to disk. The Rust version only flushed .ecj, leaving in-place deletion
marks in .ecx unpersisted until close(). This could cause data
inconsistency if the server crashes after marking a needle deleted in
.ecx but before close().
* Remove .vif file in EC volume destroy matching Go's Destroy()
Go's EcVolume.Destroy() removes .ecx, .ecj, and .vif files. The Rust
version only removed .ecx and .ecj, leaving orphaned .vif files on
disk after EC volume destruction (e.g., after TTL expiry).
* Fix is_expired to use <= for SuperBlockSize check matching Go
Go checks contentSize <= SuperBlockSize to detect empty volumes (no
needles). Rust used < which would incorrectly allow a volume with
exactly SuperBlockSize bytes (header only, no data) to proceed to
the TTL expiry check and potentially be marked as expired.
* Fix read_append_at_ns to read timestamps from tombstone entries
Go reads the full needle body for all entries including tombstones
(deleted needles with size=0) to extract the actual AppendAtNs
timestamp. The Rust version returned 0 early for size <= 0 entries,
which would cause the binary search in incremental copy to produce
incorrect results for positions containing deleted needles.
Now uses get_actual_size to compute the on-disk size (which handles
tombstones correctly) and only returns 0 when the actual size is 0.
* Add X-Request-Id response header matching Go's requestIDMiddleware
Go sets both X-Request-Id and x-amz-request-id response headers.
The Rust server only set x-amz-request-id, missing X-Request-Id.
* Add skip_serializing_if for UploadResult name and size fields
Go's UploadResult uses json:"name,omitempty" and json:"size,omitempty",
omitting these fields from JSON when they are zero values (empty
string / 0). The Rust struct always serialized them, producing
"name":"" and "size":0 where Go would omit them.
* Support JSONP/pretty-print for write success responses
Go's writeJsonQuiet checks for callback (JSONP) and pretty query
parameters on all JSON responses including write success. The Rust
write success path used axum::Json directly, bypassing JSONP and
pretty-print support. Now uses json_result_with_query to match Go.
* Include actual limit in file size limit error message
Go returns "file over the limited %d bytes" with the actual limit
value included. Rust returned a generic "file size limit exceeded"
without the limit value, making it harder to debug.
* Extract extension from 2-segment URL paths for image operations
Go's parseURLPath extracts the file extension from all URL formats
including 2-segment paths like /vid,fid.jpg. The Rust version only
handled 3-segment paths (/vid/fid/filename.ext), so extensions in
2-segment paths were lost. This caused image resize/crop operations
requested via query params to be silently skipped for those paths.
* Add size_hint to TrackedBody so throttled downloads get Content-Length
TrackedBody (used for download throttling) did not implement
size_hint(), causing HTTP/1.1 to fall back to chunked transfer
encoding instead of setting Content-Length. Go always sets
Content-Length explicitly for non-range responses.
* Add Last-Modified, pairs, and S3 headers to chunk manifest responses
Go sets Last-Modified, needle pairs, and S3 pass-through headers on
the response writer BEFORE calling tryHandleChunkedFile. Since the
Rust chunk manifest handler created fresh response headers and
returned early, these headers were missing from chunk manifest
responses. Now passes last_modified_str into the chunk manifest
handler and applies pairs and S3 pass-through query params
(response-cache-control, response-content-encoding, etc.) to the
chunk manifest response headers.
* Fix multipart fallback to use first part data when no filename
Go reads the first part's data unconditionally, then looks for a
part with a filename. If none found, Go uses the first part's data
(with empty filename). Rust only captured parts with filenames, so
when no part had a filename it fell back to the raw multipart body
bytes (including boundary delimiters), producing corrupt needle data.
* Set HasName and HasMime flags for empty values matching Go
Go's CreateNeedleFromRequest sets HasName and HasMime flags even when
the filename or MIME type is empty (len < 256 is true for len 0).
Rust skipped empty values, causing the on-disk needle format to
differ: Go-written needles include extra bytes for the empty name/mime
size fields, changing the serialized needle size in the idx entry.
This ensures binary format compatibility between Go and Rust servers.
* Add is_stopping guard to vacuum_volume_commit matching Go
Go's CommitCompactVolume (store_vacuum.go L53-54) checks
s.isStopping before committing compaction to prevent file
swaps during shutdown. The Rust handler was missing this
check, which could allow compaction commits while the
server is stopping.
* Remove disk_type from required status fields since Go omits it
Go's default DiskType is "" (HardDriveType), and protobuf's omitempty
tag causes empty strings to be dropped from JSON output.
* test: honor rust env in dual volume harness
* grpc: notify master after volume lifecycle changes
* http: proxy to replicas before download-limit timeout
* test: pass readMode to rust volume harnesses
* fix store free-location predicate selection
* fix volume copy disk placement and heartbeat notification
* fix chunk manifest delete replication
* fix write replication to survive client disconnects
* fix download limit proxy and wait flow
* fix crop gating for streamed reads
* fix upload limit wait counter behavior
* fix chunk manifest image transforms
* fix has_resize_ops to check width/height > 0 instead of is_some()
Go's shouldResizeImages condition is `width > 0 || height > 0`, so
`?width=0` correctly evaluates to false. Rust was using `is_some()`
which made `?width=0` evaluate to true, unnecessarily disabling
streaming reads for those requests.
* fix Content-MD5 to only compute and return when provided by client
Go only computes the MD5 of uncompressed data when a Content-MD5
header or multipart field is provided. Rust was always computing and
returning it. Also fix the mismatch error message to include size,
matching Go's format.
* fix save_vif to compute ExpireAtSec from TTL
Go's SaveVolumeInfo always computes ExpireAtSec = now + ttlSeconds
when the volume has a TTL. The save_vif path (used by set_read_only
and set_writable) was missing this computation, causing .vif files
to be written without the correct expiration timestamp for TTL volumes.
* fix set_writable to not modify no_write_can_delete
Go's MarkVolumeWritable only sets noWriteOrDelete=false and persists.
Rust was additionally setting no_write_can_delete=has_remote_file,
which could incorrectly change the write mode for remote-file volumes
when the master explicitly asks to make the volume writable.
* fix write_needle_blob_and_index to error on too-small V3 blob
Go returns an error when the needle blob is too small for timestamp
patching. Rust was silently skipping the patch and writing the blob
with a stale/zero timestamp, which could cause data integrity issues
during incremental replication that relies on AppendAtNs ordering.
* fix VolumeEcShardsToVolume to validate dataShards range
Go validates that dataShards is > 0 and <= MaxShardCount before
proceeding with EC-to-volume reconstruction. Without this check,
a zero or excessively large data_shards value could cause confusing
downstream failures.
* fix destroy to use VolumeError::NotEmpty instead of generic Io error
The dedicated NotEmpty variant exists in the enum but was not being
used. This makes error matching consistent with Go's ErrVolumeNotEmpty.
* fix SetState to persist state to disk with rollback on failure
Go's State.Update saves VolumeServerState to a state.pb file after
each SetState call, and rolls back the in-memory state if persistence
fails. Rust was only updating in-memory atomics, so maintenance mode
would be lost on server restart. Now saves protobuf-encoded state.pb
and loads it on startup.
* fix VolumeTierMoveDatToRemote to close local dat backend after upload
Go calls v.LoadRemoteFile() after saving volume info, which closes
the local DataBackend before transitioning to remote storage. Without
this, the volume holds a stale file handle to the deleted local .dat
file, causing reads to fail until server restart.
* fix VolumeTierMoveDatFromRemote to close remote dat backend after download
Go calls v.DataBackend.Close() and sets DataBackend=nil after removing
the remote file reference. Without this, the stale remote backend
state lingers and reads may not discover the newly downloaded local
.dat file until server restart.
* fix redirect to use internal url instead of public_url
Go's proxyReqToTargetServer builds the redirect Location header from
loc.Url (the internal URL), not publicUrl. Using public_url could
cause redirect failures when internal and external URLs differ.
* fix redirect test and add state_file_path to integration test
Update redirect unit test to expect internal url (matching the
previous fix). Add missing state_file_path field to the integration
test VolumeServerState constructor.
* fix FetchAndWriteNeedle to await all writes before checking errors
Go uses a WaitGroup to await all writes (local + replicas) before
checking errors. Rust was short-circuiting on local write failure,
which could leave replica writes in-flight without waiting for
completion.
* fix shutdown to send deregister heartbeat before pre_stop delay
Go's StopHeartbeat() closes stopChan immediately on interrupt, causing
the heartbeat goroutine to send the deregister heartbeat right away,
before the preStopSeconds delay. Rust was only setting is_stopping=true
without waking the heartbeat loop, so the deregister was delayed until
after the pre_stop sleep. Now we call volume_state_notify.notify_one()
to wake the heartbeat immediately.
* fix heartbeat response ordering to check duplicate UUIDs first
Go processes heartbeat responses in this order: DuplicatedUuids first,
then volume options (prealloc/size limit), then leader redirect. Rust
was applying volume options before checking for duplicate UUIDs, which
meant volume option changes would take effect even when the response
contained a duplicate UUID error that should cause an immediate return.
* the test thread was blocked
* fix(deps): update aws-lc-sys 0.38.0 → 0.39.0 to resolve security advisories
Bumps aws-lc-rs 1.16.1 → 1.16.2, pulling in aws-lc-sys 0.39.0 which
fixes GHSA-394x-vwmw-crm3 (X.509 Name Constraints wildcard/unicode
bypass) and GHSA-9f94-5g5w-gf6r (CRL Distribution Point scope check
logic error).
* fix: match Go Content-MD5 mismatch error message format
Go uses "Content-MD5 did not match md5 of file data expected [X]
received [Y] size Z" while Rust had a shorter format. Match the
exact Go error string so clients see identical messages.
* fix: match Go Bearer token length check (> 7, not >= 7)
Go requires len(bearer) > 7 ensuring at least one char after
"Bearer ". Rust used >= 7 which would accept an empty token.
* fix(deps): drop legacy rustls 0.21 to resolve rustls-webpki GHSA-pwjx-qhcg-rvj4
aws-sdk-s3's default "rustls" feature enables tls-rustls in
aws-smithy-runtime, which pulls in legacy-rustls-ring (rustls 0.21
→ rustls-webpki 0.101.7, moderate CRL advisory). Replace with
explicit default-https-client which uses only rustls 0.23 /
rustls-webpki 0.103.9.
* fix: use uploaded filename for auto-compression extension detection
Go extracts the file extension from pu.FileName (the uploaded
filename) for auto-compression decisions. Rust was using the URL
path, which typically has no extension for SeaweedFS file IDs.
* fix: add CRC legacy Value() backward-compat check on needle read
Go double-checks CRC: n.Checksum != crc && uint32(n.Checksum) !=
crc.Value(). The Value() path is a deprecated transform for compat
with seaweed versions prior to commit
|
||
|
|
e5f72077ee |
fix: resolve CORS cache race condition causing stale 404 responses (#8748)
The metadata subscription handler (updateBucketConfigCacheFromEntry) was making a separate RPC call via loadCORSFromBucketContent to load CORS configuration. This created a race window where a slow CreateBucket subscription event could re-cache stale data after PutBucketCors had already cleared the cache, causing subsequent GetBucketCors to return 404 NoSuchCORSConfiguration. Parse CORS directly from the subscription entry's Content field instead of making a separate RPC. Also fix getBucketConfig to parse CORS from the already-fetched entry, eliminating a redundant RPC call. Fix TestCORSCaching to use require.NoError to prevent nil pointer dereference panics when GetBucketCors fails. |
||
|
|
d5ee35c8df |
Fix S3 delete for non-empty directory markers (#8740)
* Fix S3 delete for non-empty directory markers * Address review feedback on directory marker deletes * Stabilize FUSE concurrent directory operations |
||
|
|
d6a872c4b9 |
Preserve explicit directory markers with octet-stream MIME (#8726)
* Preserve octet-stream MIME on explicit directory markers * Run empty directory marker regression in CI * Run S3 Spark workflow for filer changes |
||
|
|
80f3079d2a |
fix(s3): include directory markers in ListObjects without delimiter (#8704)
* fix(s3): include directory markers in ListObjects without delimiter (#8698) Directory key objects (zero-byte objects with keys ending in "/") created via PutObject were omitted from ListObjects/ListObjectsV2 results when no delimiter was specified. AWS S3 includes these as regular keys in Contents. The issue was in doListFilerEntries: when recursing into directories in non-delimiter mode, directory key objects were only emitted when prefixEndsOnDelimiter was true. Added an else branch to emit them in the general recursive case as well. * remove issue reference from inline comment * test: add child-under-marker and paginated listing coverage Extend test 6 to place a child object under the directory marker and paginate with MaxKeys=1 so the emit-then-recurse truncation path is exercised. * fix(test): skip directory markers in Spark temporary artifacts check The listing check now correctly shows directory markers (keys ending in "/") after the ListObjects fix. These 0-byte metadata objects are not data artifacts — filter them from the listing check since the HeadObject-based check already verifies their cleanup with a timeout. |
||
|
|
9984ce7dcb |
fix(s3): omit NotResource:null from bucket policy JSON response (#8658)
* fix(s3): omit NotResource:null from bucket policy JSON response (#8657) Change NotResource from value type to pointer (*StringOrStringSlice) so that omitempty properly omits it when unset, matching the existing Principal field pattern. This prevents IaC tools (Terraform, Ansible) from detecting false configuration drift. Add bucket policy round-trip idempotency integration tests. * simplify JSON comparison in bucket policy idempotency test Use require.JSONEq directly on the raw JSON strings instead of round-tripping through unmarshal/marshal, since JSONEq already handles normalization internally. * fix bucket policy test cases that locked out the admin user The Deny+NotResource test cases used Action:"s3:*" which denied the admin's own GetBucketPolicy call. Scope deny to s3:GetObject only, and add an Allow+NotResource variant instead. * fix(s3): also make Resource a pointer to fix empty string in JSON Apply the same omitempty pointer fix to the Resource field, which was emitting "Resource":"" when only NotResource was set. Add NewStringOrStringSlicePtr helper, make Strings() nil-safe, and handle *StringOrStringSlice in normalizeToStringSliceWithError. * improve bucket policy integration tests per review feedback - Replace time.Sleep with waitForClusterReady using ListBuckets - Use structural hasKey check instead of brittle substring NotContains - Assert specific NoSuchBucketPolicy error code after delete - Handle single-statement policies in hasKey helper |
||
|
|
8cde3d4486 |
Add data file compaction to iceberg maintenance (Phase 2) (#8503)
* Add iceberg_maintenance plugin worker handler (Phase 1) Implement automated Iceberg table maintenance as a new plugin worker job type. The handler scans S3 table buckets for tables needing maintenance and executes operations in the correct Iceberg order: expire snapshots, remove orphan files, and rewrite manifests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Add data file compaction to iceberg maintenance handler (Phase 2) Implement bin-packing compaction for small Parquet data files: - Enumerate data files from manifests, group by partition - Merge small files using parquet-go (read rows, write merged output) - Create new manifest with ADDED/DELETED/EXISTING entries - Commit new snapshot with compaction metadata Add 'compact' operation to maintenance order (runs before expire_snapshots), configurable via target_file_size_bytes and min_input_files thresholds. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Fix memory exhaustion in mergeParquetFiles by processing files sequentially Previously all source Parquet files were loaded into memory simultaneously, risking OOM when a compaction bin contained many small files. Now each file is loaded, its rows are streamed into the output writer, and its data is released before the next file is loaded — keeping peak memory proportional to one input file plus the output buffer. * Validate bucket/namespace/table names against path traversal Reject names containing '..', '/', or '\' in Execute to prevent directory traversal via crafted job parameters. * Add filer address failover in iceberg maintenance handler Try each filer address from cluster context in order instead of only using the first one. This improves resilience when the primary filer is temporarily unreachable. * Add separate MinManifestsToRewrite config for manifest rewrite threshold The rewrite_manifests operation was reusing MinInputFiles (meant for compaction bin file counts) as its manifest count threshold. Add a dedicated MinManifestsToRewrite field with its own config UI section and default value (5) so the two thresholds can be tuned independently. * Fix risky mtime fallback in orphan removal that could delete new files When entry.Attributes is nil, mtime defaulted to Unix epoch (1970), which would always be older than the safety threshold, causing the file to be treated as eligible for deletion. Skip entries with nil Attributes instead, matching the safer logic in operations.go. * Fix undefined function references in iceberg_maintenance_handler.go Use the exported function names (ShouldSkipDetectionByInterval, BuildDetectorActivity, BuildExecutorActivity) matching their definitions in vacuum_handler.go. * Remove duplicated iceberg maintenance handler in favor of iceberg/ subpackage The IcebergMaintenanceHandler and its compaction code in the parent pluginworker package duplicated the logic already present in the iceberg/ subpackage (which self-registers via init()). The old code lacked stale-plan guards, proper path normalization, CAS-based xattr updates, and error-returning parseOperations. Since the registry pattern (default "all") makes the old handler unreachable, remove it entirely. All functionality is provided by iceberg.Handler with the reviewed improvements. * Fix MinManifestsToRewrite clamping to match UI minimum of 2 The clamp reset values below 2 to the default of 5, contradicting the UI's advertised MinValue of 2. Clamp to 2 instead. * Sort entries by size descending in splitOversizedBin for better packing Entries were processed in insertion order which is non-deterministic from map iteration. Sorting largest-first before the splitting loop improves bin packing efficiency by filling bins more evenly. * Add context cancellation check to drainReader loop The row-streaming loop in drainReader did not check ctx between iterations, making long compaction merges uncancellable. Check ctx.Done() at the top of each iteration. * Fix splitOversizedBin to always respect targetSize limit The minFiles check in the split condition allowed bins to grow past targetSize when they had fewer than minFiles entries, defeating the OOM protection. Now bins always split at targetSize, and a trailing runt with fewer than minFiles entries is merged into the previous bin. * Add integration tests for iceberg table maintenance plugin worker Tests start a real weed mini cluster, create S3 buckets and Iceberg table metadata via filer gRPC, then exercise the iceberg.Handler operations (ExpireSnapshots, RemoveOrphans, RewriteManifests) against the live filer. A full maintenance cycle test runs all operations in sequence and verifies metadata consistency. Also adds exported method wrappers (testing_api.go) so the integration test package can call the unexported handler methods. * Fix splitOversizedBin dropping files and add source path to drainReader errors The runt-merge step could leave leading bins with fewer than minFiles entries (e.g. [80,80,10,10] with targetSize=100, minFiles=2 would drop the first 80-byte file). Replace the filter-based approach with an iterative merge that folds any sub-minFiles bin into its smallest neighbor, preserving all eligible files. Also add the source file path to drainReader error messages so callers can identify which Parquet file caused a read/write failure. * Harden integration test error handling - s3put: fail immediately on HTTP 4xx/5xx instead of logging and continuing - lookupEntry: distinguish NotFound (return nil) from unexpected RPC errors (fail the test) - writeOrphan and orphan creation in FullMaintenanceCycle: check CreateEntryResponse.Error in addition to the RPC error * go fmt --------- Co-authored-by: Copilot <copilot@github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
3d9f7f6f81 | go 1.25 | ||
|
|
992db11d2b |
iam: add IAM group management (#8560)
* iam: add Group message to protobuf schema Add Group message (name, members, policy_names, disabled) and add groups field to S3ApiConfiguration for IAM group management support (issue #7742). * iam: add group CRUD to CredentialStore interface and all backends Add group management methods (CreateGroup, GetGroup, DeleteGroup, ListGroups, UpdateGroup) to the CredentialStore interface with implementations for memory, filer_etc, postgres, and grpc stores. Wire group loading/saving into filer_etc LoadConfiguration and SaveConfiguration. * iam: add group IAM response types Add XML response types for group management IAM actions: CreateGroup, DeleteGroup, GetGroup, ListGroups, AddUserToGroup, RemoveUserFromGroup, AttachGroupPolicy, DetachGroupPolicy, ListAttachedGroupPolicies, ListGroupsForUser. * iam: add group management handlers to embedded IAM API Add CreateGroup, DeleteGroup, GetGroup, ListGroups, AddUserToGroup, RemoveUserFromGroup, AttachGroupPolicy, DetachGroupPolicy, ListAttachedGroupPolicies, and ListGroupsForUser handlers with dispatch in ExecuteAction. * iam: add group management handlers to standalone IAM API Add group handlers (CreateGroup, DeleteGroup, GetGroup, ListGroups, AddUserToGroup, RemoveUserFromGroup, AttachGroupPolicy, DetachGroupPolicy, ListAttachedGroupPolicies, ListGroupsForUser) and wire into DoActions dispatch. Also add helper functions for user/policy side effects. * iam: integrate group policies into authorization Add groups and userGroups reverse index to IdentityAccessManagement. Populate both maps during ReplaceS3ApiConfiguration and MergeS3ApiConfiguration. Modify evaluateIAMPolicies to evaluate policies from user's enabled groups in addition to user policies. Update VerifyActionPermission to consider group policies when checking hasAttachedPolicies. * iam: add group side effects on user deletion and rename When a user is deleted, remove them from all groups they belong to. When a user is renamed, update group membership references. Applied to both embedded and standalone IAM handlers. * iam: watch /etc/iam/groups directory for config changes Add groups directory to the filer subscription watcher so group file changes trigger IAM configuration reloads. * admin: add group management page to admin UI Add groups page with CRUD operations, member management, policy attachment, and enable/disable toggle. Register routes in admin handlers and add Groups entry to sidebar navigation. * test: add IAM group management integration tests Add comprehensive integration tests for group CRUD, membership, policy attachment, policy enforcement, disabled group behavior, user deletion side effects, and multi-group membership. Add "group" test type to CI matrix in s3-iam-tests workflow. * iam: address PR review comments for group management - Fix XSS vulnerability in groups.templ: replace innerHTML string concatenation with DOM APIs (createElement/textContent) for rendering member and policy lists - Use userGroups reverse index in embedded IAM ListGroupsForUser for O(1) lookup instead of iterating all groups - Add buildUserGroupsIndex helper in standalone IAM handlers; use it in ListGroupsForUser and removeUserFromAllGroups for efficient lookup - Add note about gRPC store load-modify-save race condition limitation * iam: add defensive copies, validation, and XSS fixes for group management - Memory store: clone groups on store/retrieve to prevent mutation - Admin dash: deep copy groups before mutation, validate user/policy exists - HTTP handlers: translate credential errors to proper HTTP status codes, use *bool for Enabled field to distinguish missing vs false - Groups templ: use data attributes + event delegation instead of inline onclick for XSS safety, prevent stale async responses * iam: add explicit group methods to PropagatingCredentialStore Add CreateGroup, GetGroup, DeleteGroup, ListGroups, and UpdateGroup methods instead of relying on embedded interface fallthrough. Group changes propagate via filer subscription so no RPC propagation needed. * iam: detect postgres unique constraint violation and add groups index Return ErrGroupAlreadyExists when INSERT hits SQLState 23505 instead of a generic error. Add index on groups(disabled) for filtered queries. * iam: add Marker field to group list response types Add Marker string field to GetGroupResult, ListGroupsResult, ListAttachedGroupPoliciesResult, and ListGroupsForUserResult to match AWS IAM pagination response format. * iam: check group attachment before policy deletion Reject DeletePolicy if the policy is attached to any group, matching AWS IAM behavior. Add PolicyArn to ListAttachedGroupPolicies response. * iam: include group policies in IAM authorization Merge policy names from user's enabled groups into the IAMIdentity used for authorization, so group-attached policies are evaluated alongside user-attached policies. * iam: check for name collision before renaming user in UpdateUser Scan identities and inline policies for newUserName before mutating, returning EntityAlreadyExists if a collision is found. Reuse the already-loaded policies instead of loading them again inside the loop. * test: use t.Cleanup for bucket cleanup in group policy test * iam: wrap ErrUserNotInGroup sentinel in RemoveGroupMember error Wrap credential.ErrUserNotInGroup so errors.Is works in groupErrorToHTTPStatus, returning proper 400 instead of 500. * admin: regenerate groups_templ.go with XSS-safe data attributes Regenerated from groups.templ which uses data-group-name attributes instead of inline onclick with string interpolation. * iam: add input validation and persist groups during migration - Validate nil/empty group name in CreateGroup and UpdateGroup - Save groups in migrateToMultiFile so they survive legacy migration * admin: use groupErrorToHTTPStatus in GetGroupMembers and GetGroupPolicies * iam: short-circuit UpdateUser when newUserName equals current name * iam: require empty PolicyNames before group deletion Reject DeleteGroup when group has attached policies, matching the existing members check. Also fix GetGroup error handling in DeletePolicy to only skip ErrGroupNotFound, not all errors. * ci: add weed/pb/** to S3 IAM test trigger paths * test: replace time.Sleep with require.Eventually for propagation waits Use polling with timeout instead of fixed sleeps to reduce flakiness in integration tests waiting for IAM policy propagation. * fix: use credentialManager.GetPolicy for AttachGroupPolicy validation Policies created via CreatePolicy through credentialManager are stored in the credential store, not in s3cfg.Policies (which only has static config policies). Change AttachGroupPolicy to use credentialManager.GetPolicy() for policy existence validation. * feat: add UpdateGroup handler to embedded IAM API Add UpdateGroup action to enable/disable groups and rename groups via the IAM API. This is a SeaweedFS extension (not in AWS SDK) used by tests to toggle group disabled status. * fix: authenticate raw IAM API calls in group tests The embedded IAM endpoint rejects anonymous requests. Replace callIAMAPI with callIAMAPIAuthenticated that uses JWT bearer token authentication via the test framework. * feat: add UpdateGroup handler to standalone IAM API Mirror the embedded IAM UpdateGroup handler in the standalone IAM API for parity. * fix: add omitempty to Marker XML tags in group responses Non-truncated responses should not emit an empty <Marker/> element. * fix: distinguish backend errors from missing policies in AttachGroupPolicy Return ServiceFailure for credential manager errors instead of masking them as NoSuchEntity. Also switch ListGroupsForUser to use s3cfg.Groups instead of in-memory reverse index to avoid stale data. Add duplicate name check to UpdateGroup rename. * fix: standalone IAM AttachGroupPolicy uses persisted policy store Check managed policies from GetPolicies() instead of s3cfg.Policies so dynamically created policies are found. Also add duplicate name check to UpdateGroup rename. * fix: rollback inline policies on UpdateUser PutPolicies failure If PutPolicies fails after moving inline policies to the new username, restore both the identity name and the inline policies map to their original state to avoid a partial-write window. * fix: correct test cleanup ordering for group tests Replace scattered defers with single ordered t.Cleanup in each test to ensure resources are torn down in reverse-creation order: remove membership, detach policies, delete access keys, delete users, delete groups, delete policies. Move bucket cleanup to parent test scope and delete objects before bucket. * fix: move identity nil check before map lookup and refine hasAttachedPolicies Move the nil check on identity before accessing identity.Name to prevent panic. Also refine hasAttachedPolicies to only consider groups that are enabled and have actual policies attached, so membership in a no-policy group doesn't incorrectly trigger IAM authorization. * fix: fail group reload on unreadable or corrupt group files Return errors instead of logging and continuing when group files cannot be read or unmarshaled. This prevents silently applying a partial IAM config with missing group memberships or policies. * fix: use errors.Is for sql.ErrNoRows comparison in postgres group store * docs: explain why group methods skip propagateChange Group changes propagate to S3 servers via filer subscription (watching /etc/iam/groups/) rather than gRPC RPCs, since there are no group-specific RPCs in the S3 cache protocol. * fix: remove unused policyNameFromArn and strings import * fix: update service account ParentUser on user rename When renaming a user via UpdateUser, also update ParentUser references in service accounts to prevent them from becoming orphaned after the next configuration reload. * fix: wrap DetachGroupPolicy error with ErrPolicyNotAttached sentinel Use credential.ErrPolicyNotAttached so groupErrorToHTTPStatus maps it to 400 instead of falling back to 500. * fix: use admin S3 client for bucket cleanup in enforcement test The user S3 client may lack permissions by cleanup time since the user is removed from the group in an earlier subtest. Use the admin S3 client to ensure bucket and object cleanup always succeeds. * fix: add nil guard for group param in propagating store log calls Prevent potential nil dereference when logging group.Name in CreateGroup and UpdateGroup of PropagatingCredentialStore. * fix: validate Disabled field in UpdateGroup handlers Reject values other than "true" or "false" with InvalidInputException instead of silently treating them as false. * fix: seed mergedGroups from existing groups in MergeS3ApiConfiguration Previously the merge started with empty group maps, dropping any static-file groups. Now seeds from existing iam.groups before overlaying dynamic config, and builds the reverse index after merging to avoid stale entries from overridden groups. * fix: use errors.Is for filer_pb.ErrNotFound comparison in group loading Replace direct equality (==) with errors.Is() to correctly match wrapped errors, consistent with the rest of the codebase. * fix: add ErrUserNotFound and ErrPolicyNotFound to groupErrorToHTTPStatus Map these sentinel errors to 404 so AddGroupMember and AttachGroupPolicy return proper HTTP status codes. * fix: log cleanup errors in group integration tests Replace fire-and-forget cleanup calls with error-checked versions that log failures via t.Logf for debugging visibility. * fix: prevent duplicate group test runs in CI matrix The basic lane's -run "TestIAM" regex also matched TestIAMGroup* tests, causing them to run in both the basic and group lanes. Replace with explicit test function names. * fix: add GIN index on groups.members JSONB for membership lookups Without this index, ListGroupsForUser and membership queries require full table scans on the groups table. * fix: handle cross-directory moves in IAM config subscription When a file is moved out of an IAM directory (e.g., /etc/iam/groups), the dir variable was overwritten with NewParentPath, causing the source directory change to be missed. Now also notifies handlers about the source directory for cross-directory moves. * fix: validate members/policies before deleting group in admin handler AdminServer.DeleteGroup now checks for attached members and policies before delegating to credentialManager, matching the IAM handler guards. * fix: merge groups by name instead of blind append during filer load Match the identity loader's merge behavior: find existing group by name and replace, only append when no match exists. Prevents duplicates when legacy and multi-file configs overlap. * fix: check DeleteEntry response error when cleaning obsolete group files Capture and log resp.Error from filer DeleteEntry calls during group file cleanup, matching the pattern used in deleteGroupFile. * fix: verify source user exists before no-op check in UpdateUser Reorder UpdateUser to find the source identity first and return NoSuchEntityException if not found, before checking if the rename is a no-op. Previously a non-existent user renamed to itself would incorrectly return success. * fix: update service account parent refs on user rename in embedded IAM The embedded IAM UpdateUser handler updated group membership but not service account ParentUser fields, unlike the standalone handler. * fix: replay source-side events for all handlers on cross-dir moves Pass nil newEntry to bucket, IAM, and circuit-breaker handlers for the source directory during cross-directory moves, so all watchers can clear caches for the moved-away resource. * fix: don't seed mergedGroups from existing iam.groups in merge Groups are always dynamic (from filer), never static (from s3.config). Seeding from iam.groups caused stale deleted groups to persist. Now only uses config.Groups from the dynamic filer config. * fix: add deferred user cleanup in TestIAMGroupUserDeletionSideEffect Register t.Cleanup for the created user so it gets cleaned up even if the test fails before the inline DeleteUser call. * fix: assert UpdateGroup HTTP status in disabled group tests Add require.Equal checks for 200 status after UpdateGroup calls so the test fails immediately on API errors rather than relying on the subsequent Eventually timeout. * fix: trim whitespace from group name in filer store operations Trim leading/trailing whitespace from group.Name before validation in CreateGroup and UpdateGroup to prevent whitespace-only filenames. Also merge groups by name during multi-file load to prevent duplicates. * fix: add nil/empty group validation in gRPC store Guard CreateGroup and UpdateGroup against nil group or empty name to prevent panics and invalid persistence. * fix: add nil/empty group validation in postgres store Guard CreateGroup and UpdateGroup against nil group or empty name to prevent panics from nil member access and empty-name row inserts. * fix: add name collision check in embedded IAM UpdateUser The embedded IAM handler renamed users without checking if the target name already existed, unlike the standalone handler. * fix: add ErrGroupNotEmpty sentinel and map to HTTP 409 AdminServer.DeleteGroup now wraps conflict errors with ErrGroupNotEmpty, and groupErrorToHTTPStatus maps it to 409 Conflict instead of 500. * fix: use appropriate error message in GetGroupDetails based on status Return "Group not found" only for 404, use "Failed to retrieve group" for other error statuses instead of always saying "Group not found". * fix: use backend-normalized group.Name in CreateGroup response After credentialManager.CreateGroup may normalize the name (e.g., trim whitespace), use group.Name instead of the raw input for the returned GroupData to ensure consistency. * fix: add nil/empty group validation in memory store Guard CreateGroup and UpdateGroup against nil group or empty name to prevent panics from nil pointer dereference on map access. * fix: reorder embedded IAM UpdateUser to verify source first Find the source identity before checking for collisions, matching the standalone handler's logic. Previously a non-existent user renamed to an existing name would get EntityAlreadyExists instead of NoSuchEntity. * fix: handle same-directory renames in metadata subscription Replay a delete event for the old entry name during same-directory renames so handlers like onBucketMetadataChange can clean up stale state for the old name. * fix: abort GetGroups on non-ErrGroupNotFound errors Only skip groups that return ErrGroupNotFound. Other errors (e.g., transient backend failures) now abort the handler and return the error to the caller instead of silently producing partial results. * fix: add aria-label and title to icon-only group action buttons Add accessible labels to View and Delete buttons so screen readers and tooltips provide meaningful context. * fix: validate group name in saveGroup to prevent invalid filenames Trim whitespace and reject empty names before writing group JSON files, preventing creation of files like ".json". * fix: add /etc/iam/groups to filer subscription watched directories The groups directory was missing from the watched directories list, so S3 servers in a cluster would not detect group changes made by other servers via filer. The onIamConfigChange handler already had code to handle group directory changes but it was never triggered. * add direct gRPC propagation for group changes to S3 servers Groups now have the same dual propagation as identities and policies: direct gRPC push via propagateChange + async filer subscription. - Add PutGroup/RemoveGroup proto messages and RPCs - Add PutGroup/RemoveGroup in-memory cache methods on IAM - Add PutGroup/RemoveGroup gRPC server handlers - Update PropagatingCredentialStore to call propagateChange on group mutations * reduce log verbosity for config load summary Change ReplaceS3ApiConfiguration log from Infof to V(1).Infof to avoid noisy output on every config reload. * admin: show user groups in view and edit user modals - Add Groups field to UserDetails and populate from credential manager - Show groups as badges in user details view modal - Add group management to edit user modal: display current groups, add to group via dropdown, remove from group via badge x button * fix: remove duplicate showAlert that broke modal-alerts.js admin.js defined showAlert(type, message) which overwrote the modal-alerts.js version showAlert(message, type), causing broken unstyled alert boxes. Remove the duplicate and swap all callers in admin.js to use the correct (message, type) argument order. * fix: unwrap groups API response in edit user modal The /api/groups endpoint returns {"groups": [...]}, not a bare array. * Update object_store_users_templ.go * test: assert AccessDenied error code in group denial tests Replace plain assert.Error checks with awserr.Error type assertion and AccessDenied code verification, matching the pattern used in other IAM integration tests. * fix: propagate GetGroups errors in ShowGroups handler getGroupsPageData was swallowing errors and returning an empty page with 200 status. Now returns the error so ShowGroups can respond with a proper error status. * fix: reject AttachGroupPolicy when credential manager is nil Previously skipped policy existence validation when credentialManager was nil, allowing attachment of nonexistent policies. Now returns a ServiceFailureException error. * fix: preserve groups during partial MergeS3ApiConfiguration updates UpsertIdentity calls MergeS3ApiConfiguration with a partial config containing only the updated identity (nil Groups). This was wiping all in-memory group state. Now only replaces groups when config.Groups is non-nil (full config reload). * fix: propagate errors from group lookup in GetObjectStoreUserDetails ListGroups and GetGroup errors were silently ignored, potentially showing incomplete group data in the UI. * fix: use DOM APIs for group badge remove button to prevent XSS Replace innerHTML with onclick string interpolation with DOM createElement + addEventListener pattern. Also add aria-label and title to the add-to-group button. * fix: snapshot group policies under RLock to prevent concurrent map access evaluateIAMPolicies was copying the map reference via groupMap := iam.groups under RLock then iterating after RUnlock, while PutGroup mutates the map in-place. Now copies the needed policy names into a slice while holding the lock. * fix: add nil IAM check to PutGroup and RemoveGroup gRPC handlers Match the nil guard pattern used by PutPolicy/DeletePolicy to prevent nil pointer dereference when IAM is not initialized. |
||
|
|
78a3441b30 |
fix: volume balance detection returns multiple tasks per run (#8559)
* fix: volume balance detection now returns multiple tasks per run (#8551) Previously, detectForDiskType() returned at most 1 balance task per disk type, making the MaxJobsPerDetection setting ineffective. The detection loop now iterates within each disk type, planning multiple moves until the imbalance drops below threshold or maxResults is reached. Effective volume counts are adjusted after each planned move so the algorithm correctly re-evaluates which server is overloaded. * fix: factor pending tasks into destination scoring and use UnixNano for task IDs - Use UnixNano instead of Unix for task IDs to avoid collisions when multiple tasks are created within the same second - Adjust calculateBalanceScore to include LoadCount (pending + assigned tasks) in the utilization estimate, so the destination picker avoids stacking multiple planned moves onto the same target disk * test: add comprehensive balance detection tests for complex scenarios Cover multi-server convergence, max-server shifting, destination spreading, pre-existing pending task skipping, no-duplicate-volume invariant, and parameterized convergence verification across different cluster shapes and thresholds. * fix: address PR review findings in balance detection - hasMore flag: compute from len(results) >= maxResults so the scheduler knows more pages may exist, matching vacuum/EC handler pattern - Exhausted server fallthrough: when no eligible volumes remain on the current maxServer (all have pending tasks) or destination planning fails, mark the server as exhausted and continue to the next overloaded server instead of stopping the entire detection loop - Return canonical destination server ID directly from createBalanceTask instead of resolving via findServerIDByAddress, eliminating the fragile address→ID lookup for adjustment tracking - Fix bestScore sentinel: use math.Inf(-1) instead of -1.0 so disks with negative scores (high pending load, same rack/DC) are still selected as the best available destination - Add TestDetection_ExhaustedServerFallsThrough covering the scenario where the top server's volumes are all blocked by pre-existing tasks * test: fix computeEffectiveCounts and add len guard in no-duplicate test - computeEffectiveCounts now takes a servers slice to seed counts for all known servers (including empty ones) and uses an address→ID map from the topology spec instead of scanning metrics, so destination servers with zero initial volumes are tracked correctly - TestDetection_NoDuplicateVolumesAcrossIterations now asserts len > 1 before checking duplicates, so the test actually fails if Detection regresses to returning a single task * fix: remove redundant HasAnyTask check in createBalanceTask The HasAnyTask check in createBalanceTask duplicated the same check already performed in detectForDiskType's volume selection loop. Since detection runs single-threaded (MaxDetectionConcurrency: 1), no race can occur between the two points. * fix: consistent hasMore pattern and remove double-counted LoadCount in scoring - Adopt vacuum_handler's hasMore pattern: over-fetch by 1, check len > maxResults, and truncate — consistent truncation semantics - Remove direct LoadCount penalty in calculateBalanceScore since LoadCount is already factored into effectiveVolumeCount for utilization scoring; bump utilization weight from 40 to 50 to compensate for the removed 10-point load penalty * fix: handle zero maxResults as no-cap, emit trace after trim, seed empty servers - When MaxResults is 0 (omitted), treat as no explicit cap instead of defaulting to 1; only apply the +1 over-fetch probe when caller supplies a positive limit - Move decision trace emission after hasMore/trim so the trace accurately reflects the returned proposals - Seed serverVolumeCounts from ActiveTopology so servers that have a matching disk type but zero volumes are included in the imbalance calculation and MinServerCount check * fix: nil-guard clusterInfo, uncap legacy DetectionFunc, deterministic disk type order - Add early nil guard for clusterInfo in Detection to prevent panics in downstream helpers (detectForDiskType, createBalanceTask) - Change register.go DetectionFunc wrapper from maxResults=1 to 0 (no cap) so the legacy code path returns all detected tasks - Sort disk type keys before iteration so results are deterministic when maxResults spans multiple disk types (HDD/SSD) * fix: don't over-fetch in stateful detection to avoid orphaned pending tasks Detection registers planned moves in ActiveTopology via AddPendingTask, so requesting maxResults+1 would create an extra pending task that gets discarded during trim. Use len(results) >= maxResults as the hasMore signal instead, which is correct since Detection already caps internally. * fix: return explicit truncated flag from Detection instead of approximating Detection now returns (results, truncated, error) where truncated is true only when the loop stopped because it hit maxResults, not when it ran out of work naturally. This eliminates false hasMore signals when detection happens to produce exactly maxResults results by resolving the imbalance. * cleanup: simplify detection logic and remove redundancies - Remove redundant clusterInfo nil check in detectForDiskType since Detection already guards against nil clusterInfo - Remove adjustments loop for destination servers not in serverVolumeCounts — topology seeding ensures all servers with matching disk type are already present - Merge two-loop min/max calculation into a single loop: min across all servers, max only among non-exhausted servers - Replace magic number 100 with len(metrics) for minC initialization in convergence test * fix: accurate truncation flag, deterministic server order, indexed volume lookup - Track balanced flag to distinguish "hit maxResults cap" from "cluster balanced at exactly maxResults" — truncated is only true when there's genuinely more work to do - Sort servers for deterministic iteration and tie-breaking when multiple servers have equal volume counts - Pre-index volumes by server with per-server cursors to avoid O(maxResults * volumes) rescanning on each iteration - Add truncation flag assertions to RespectsMaxResults test: true when capped, false when detection finishes naturally * fix: seed trace server counts from ActiveTopology to match detection logic The decision trace was building serverVolumeCounts only from metrics, missing zero-volume servers seeded from ActiveTopology by Detection. This could cause the trace to report wrong server counts, incorrect imbalance ratios, or spurious "too few servers" messages. Pass activeTopology into the trace function and seed server counts the same way Detection does. * fix: don't exhaust server on per-volume planning failure, sort volumes by ID - When createBalanceTask returns nil, continue to the next volume on the same server instead of marking the entire server as exhausted. The failure may be volume-specific (not found in topology, pending task registration failed) and other volumes on the server may still be viable candidates. - Sort each server's volume slice by VolumeID after pre-indexing so volume selection is fully deterministic regardless of input order. * fix: use require instead of assert to prevent nil dereference panic in CORS test The test used assert.NoError (non-fatal) for GetBucketCors, then immediately accessed getResp.CORSRules. When the API returns an error, getResp is nil causing a panic. Switch to require.NoError/NotNil/Len so the test stops before dereferencing a nil response. * fix: deterministic disk tie-breaking and stronger pre-existing task test - Sort available disks by NodeID then DiskID before scoring so destination selection is deterministic when two disks score equally - Add task count bounds assertion to SkipsPreExistingPendingTasks test: with 15 of 20 volumes already having pending tasks, at most 5 new tasks should be created and at least 1 (imbalance still exists) * fix: seed adjustments from existing pending/assigned tasks to prevent over-scheduling Detection now calls ActiveTopology.GetTaskServerAdjustments() to initialize the adjustments map with source/destination deltas from existing pending and assigned balance tasks. This ensures effectiveCounts reflects in-flight moves, preventing the algorithm from planning additional moves in the same direction when prior moves already address the imbalance. Added GetTaskServerAdjustments(taskType) to ActiveTopology which iterates pending and assigned tasks, decrementing source servers and incrementing destination servers for the given task type. |
||
|
|
df5e8210df |
Implement IAM managed policy operations (#8507)
* feat: Implement IAM managed policy operations (GetPolicy, ListPolicies, DeletePolicy, AttachUserPolicy, DetachUserPolicy) - Add response type aliases in iamapi_response.go for managed policy operations - Implement 6 handler methods in iamapi_management_handlers.go: - GetPolicy: Lookup managed policy by ARN - DeletePolicy: Remove managed policy - ListPolicies: List all managed policies - AttachUserPolicy: Attach managed policy to user, aggregating inline + managed actions - DetachUserPolicy: Detach managed policy from user - ListAttachedUserPolicies: List user's attached managed policies - Add computeAllActionsForUser() to aggregate actions from both inline and managed policies - Wire 6 new DoActions switch cases for policy operations - Add comprehensive tests for all new handlers - Fixes #8506 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address PR review feedback for IAM managed policy operations - Add parsePolicyArn() helper with proper ARN prefix validation, replacing fragile strings.Split parsing in GetPolicy, DeletePolicy, AttachUserPolicy, and DetachUserPolicy - DeletePolicy now detaches the policy from all users and recomputes their aggregated actions, preventing stale permissions after deletion - Set changed=true for DeletePolicy DoActions case so identity updates persist - Make PolicyId consistent: CreatePolicy now uses Hash(&policyName) matching GetPolicy and ListPolicies - Remove redundant nil map checks (Go handles nil map lookups safely) - DRY up action deduplication in computeAllActionsForUser with addUniqueActions closure - Add tests for invalid/empty ARN rejection and DeletePolicy identity cleanup Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add integration tests for managed policy lifecycle (#8506) Add two integration tests covering the user-reported use case where managed policy operations returned 500 errors: - TestS3IAMManagedPolicyLifecycle: end-to-end workflow matching the issue report — CreatePolicy, ListPolicies, GetPolicy, AttachUserPolicy, ListAttachedUserPolicies, idempotent re-attach, DeletePolicy while attached (expects DeleteConflict), DetachUserPolicy, DeletePolicy, and verification that deleted policy is gone - TestS3IAMManagedPolicyErrorCases: covers error paths — nonexistent policy/user for GetPolicy, DeletePolicy, AttachUserPolicy, DetachUserPolicy, and ListAttachedUserPolicies Also fixes DeletePolicy to reject deletion when policy is still attached to a user (AWS-compatible DeleteConflictException), and adds the 409 status code mapping for DeleteConflictException in the error response handler. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: nil map panic in CreatePolicy, add PolicyId test assertions - Initialize policies.Policies map in CreatePolicy if nil (prevents panic when no policies exist yet); also handle filer_pb.ErrNotFound like other callers - Add PolicyId assertions in TestGetPolicy and TestListPolicies to lock in the consistent Hash(&policyName) behavior - Remove redundant time.Sleep calls from new integration tests (startMiniCluster already blocks on waitForS3Ready) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: PutUserPolicy and DeleteUserPolicy now preserve managed policy actions PutUserPolicy and DeleteUserPolicy were calling computeAggregatedActionsForUser (inline-only), overwriting ident.Actions and dropping managed policy actions. Both now call computeAllActionsForUser which unions inline + managed actions. Add TestManagedPolicyActionsPreservedAcrossInlineMutations regression test: attaches a managed policy, adds an inline policy (verifies both actions present), deletes the inline policy, then asserts managed policy actions still persist. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: PutUserPolicy verifies user exists before persisting inline policy Previously the inline policy was written to storage before checking if the target user exists in s3cfg.Identities, leaving orphaned policy data when the user was absent. Now validates the user first, returning NoSuchEntityException immediately if not found. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: prevent stale/lost actions on computeAllActionsForUser failure - PutUserPolicy: on recomputation failure, preserve existing ident.Actions instead of falling back to only the current inline policy's actions - DeleteUserPolicy: on recomputation failure, preserve existing ident.Actions instead of assigning nil (which wiped all permissions) - AttachUserPolicy: roll back ident.PolicyNames and return error if action recomputation fails, keeping identity consistent - DetachUserPolicy: roll back ident.PolicyNames and return error if GetPolicies or action recomputation fails - Add doc comment on newTestIamApiServer noting it only sets s3ApiConfig Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
10a30a83e1 |
s3api: add GetObjectAttributes API support (#8504)
* s3api: add error code and header constants for GetObjectAttributes Add ErrInvalidAttributeName error code and header constants (X-Amz-Object-Attributes, X-Amz-Max-Parts, X-Amz-Part-Number-Marker, X-Amz-Delete-Marker) needed by the S3 GetObjectAttributes API. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: implement GetObjectAttributes handler Add GetObjectAttributesHandler that returns selected object metadata (ETag, Checksum, StorageClass, ObjectSize, ObjectParts) without returning the object body. Follows the same versioning and conditional header patterns as HeadObjectHandler. The handler parses the X-Amz-Object-Attributes header to determine which attributes to include in the XML response, and supports ObjectParts pagination via X-Amz-Max-Parts and X-Amz-Part-Number-Marker. Ref: https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObjectAttributes.html Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: register GetObjectAttributes route Register the GET /{object}?attributes route for the GetObjectAttributes API, placed before other object query routes to ensure proper matching. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: add integration tests for GetObjectAttributes Test coverage: - Basic: simple object with all attribute types - MultipartObject: multipart upload with parts pagination - SelectiveAttributes: requesting only specific attributes - InvalidAttribute: server rejects invalid attribute names - NonExistentObject: returns NoSuchKey for missing objects Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: add versioned object test for GetObjectAttributes Test puts two versions of the same object and verifies that: - GetObjectAttributes returns the latest version by default - GetObjectAttributes with versionId returns the specific version - ObjectSize and VersionId are correct for each version Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: fix combined conditional header evaluation per RFC 7232 Per RFC 7232: - Section 3.4: If-Unmodified-Since MUST be ignored when If-Match is present (If-Match is the more accurate replacement) - Section 3.3: If-Modified-Since MUST be ignored when If-None-Match is present (If-None-Match is the more accurate replacement) Previously, all four conditional headers were evaluated independently. This caused incorrect 412 responses when If-Match succeeded but If-Unmodified-Since failed (should return 200 per AWS S3 behavior). Fix applied to both validateConditionalHeadersForReads (GET/HEAD) and validateConditionalHeaders (PUT) paths. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: add conditional header combination tests for GetObjectAttributes Test the RFC 7232 combined conditional header semantics: - If-Match=true + If-Unmodified-Since=false => 200 (If-Unmodified-Since ignored) - If-None-Match=false + If-Modified-Since=true => 304 (If-Modified-Since ignored) - If-None-Match=true + If-Modified-Since=false => 200 (If-Modified-Since ignored) - If-Match=true + If-Unmodified-Since=true => 200 - If-Match=false => 412 regardless Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: document Checksum attribute as not yet populated Checksum is accepted in validation (so clients requesting it don't get a 400 error, matching AWS behavior for objects without checksums) but SeaweedFS does not yet store S3 checksums. Add a comment explaining this and noting where to populate it when checksum storage is added. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: add s3:GetObjectAttributes IAM action for ?attributes query Previously, GET /{object}?attributes resolved to s3:GetObject via the fallback path since resolveFromQueryParameters had no case for the "attributes" query parameter. Add S3_ACTION_GET_OBJECT_ATTRIBUTES constant ("s3:GetObjectAttributes") and a branch in resolveFromQueryParameters to return it for GET requests with the "attributes" query parameter, so IAM policies can distinguish GetObjectAttributes from GetObject. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: evaluate conditional headers after version resolution Move conditional header evaluation (If-Match, If-None-Match, etc.) to after the version resolution step in GetObjectAttributesHandler. This ensures that when a specific versionId is requested, conditions are checked against the correct version entry rather than always against the latest version. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: use bounded HTTP client in GetObjectAttributes tests Replace http.DefaultClient with a timeout-aware http.Client (10s) in the signedGetObjectAttributes helper and testGetObjectAttributesInvalid to prevent tests from hanging indefinitely. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: check attributes query before versionId in action resolver Move the GetObjectAttributes action check before the versionId check in resolveFromQueryParameters. This fixes GET /bucket/key?attributes&versionId=xyz being incorrectly classified as s3:GetObjectVersion instead of s3:GetObjectAttributes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: add tests for versioned conditional headers and action resolver Add integration test that verifies conditional headers (If-Match, If-None-Match) are evaluated against the requested version entry, not the latest version. This covers the fix in |
||
|
|
3d81d5bef7 |
Fix S3 signature verification behind reverse proxies (#8444)
* Fix S3 signature verification behind reverse proxies When SeaweedFS is deployed behind a reverse proxy (e.g. nginx, Kong, Traefik), AWS S3 Signature V4 verification fails because the Host header the client signed with (e.g. "localhost:9000") differs from the Host header SeaweedFS receives on the backend (e.g. "seaweedfs:8333"). This commit adds a new -s3.externalUrl parameter (and S3_EXTERNAL_URL environment variable) that tells SeaweedFS what public-facing URL clients use to connect. When set, SeaweedFS uses this host value for signature verification instead of the Host header from the incoming request. New parameter: -s3.externalUrl (flag) or S3_EXTERNAL_URL (environment variable) Example: -s3.externalUrl=http://localhost:9000 Example: S3_EXTERNAL_URL=https://s3.example.com The environment variable is particularly useful in Docker/Kubernetes deployments where the external URL is injected via container config. The flag takes precedence over the environment variable when both are set. At startup, the URL is parsed and default ports are stripped to match AWS SDK behavior (port 80 for HTTP, port 443 for HTTPS), so "http://s3.example.com:80" and "http://s3.example.com" are equivalent. Bugs fixed: - Default port stripping was removed by a prior PR, causing signature mismatches when clients connect on standard ports (80/443) - X-Forwarded-Port was ignored when X-Forwarded-Host was not present - Scheme detection now uses proper precedence: X-Forwarded-Proto > TLS connection > URL scheme > "http" - Test expectations for standard port stripping were incorrect - expectedHost field in TestSignatureV4WithForwardedPort was declared but never actually checked (self-referential test) * Add Docker integration test for S3 proxy signature verification Docker Compose setup with nginx reverse proxy to validate that the -s3.externalUrl parameter (or S3_EXTERNAL_URL env var) correctly resolves S3 signature verification when SeaweedFS runs behind a proxy. The test uses nginx proxying port 9000 to SeaweedFS on port 8333, with X-Forwarded-Host/Port/Proto headers set. SeaweedFS is configured with -s3.externalUrl=http://localhost:9000 so it uses "localhost:9000" for signature verification, matching what the AWS CLI signs with. The test can be run with aws CLI on the host or without it by using the amazon/aws-cli Docker image with --network host. Test covers: create-bucket, list-buckets, put-object, head-object, list-objects-v2, get-object, content round-trip integrity, delete-object, and delete-bucket — all through the reverse proxy. * Create s3-proxy-signature-tests.yml * fix CLI * fix CI * Update s3-proxy-signature-tests.yml * address comments * Update Dockerfile * add user * no need for fuse * Update s3-proxy-signature-tests.yml * debug * weed mini * fix health check * health check * fix health checking --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Chris Lu <chris.lu@gmail.com> |
||
|
|
a92e9baddf |
Add integration test for multipart operations inheriting s3:PutObject permissions
TestS3MultipartOperationsInheritPutObjectPermissions verifies that multipart upload operations (CreateMultipartUpload, UploadPart, ListParts, CompleteMultipartUpload, AbortMultipartUpload, ListMultipartUploads) work correctly when a user has only s3:PutObject permission granted. This test validates the behavior where multipart operations are implicitly granted when s3:PutObject is authorized, as multipart upload is an implementation detail of putting objects in S3. |
||
|
|
b565a0cc86 |
Adds volume.merge command with deduplication and disk-based backend (#8441)
* Enhance volume.merge command with deduplication and disk-based backend
* Fix copyVolume function call with correct argument order and missing bool parameter
* Revert "Fix copyVolume function call with correct argument order and missing bool parameter"
This reverts commit
|
||
|
|
e4b70c2521 | go fix | ||
|
|
bd0b1fe9d5 |
S3 IAM: Added ListPolicyVersions and GetPolicyVersion support (#8395)
* test(s3/iam): add managed policy CRUD lifecycle integration coverage * s3/iam: add ListPolicyVersions and GetPolicyVersion support * test(s3/iam): cover ListPolicyVersions and GetPolicyVersion |
||
|
|
2f837c4780 |
Fix error on deleting non-empty bucket (#8376)
* Move check for non-empty bucket deletion out of `WithFilerClient` call * Added proper checking if a bucket has "user" objects |
||
|
|
e9c45144cf |
Implement managed policy storage (#8385)
* Persist managed IAM policies * Add IAM list/get policy integration test * Faster marker lookup and cleanup * Handle delete conflict and improve listing * Add delete-in-use policy integration test * Stabilize policy ID and guard path prefix * Tighten CreatePolicy guard and reload * Add ListPolicyNames to credential store |
||
|
|
7b8df39cf7 |
s3api: add AttachUserPolicy/DetachUserPolicy/ListAttachedUserPolicies (#8379)
* iam: add XML responses for managed user policy APIs * s3api: implement attach/detach/list attached user policies * s3api: add embedded IAM tests for managed user policies * iam: update CredentialStore interface and Manager for managed policies Updated the `CredentialStore` interface to include `AttachUserPolicy`, `DetachUserPolicy`, and `ListAttachedUserPolicies` methods. The `CredentialManager` was updated to delegate these calls to the store. Added common error variables for policy management. * iam: implement managed policy methods in MemoryStore Implemented `AttachUserPolicy`, `DetachUserPolicy`, and `ListAttachedUserPolicies` in the MemoryStore. Also ensured deep copying of identities includes PolicyNames. * iam: implement managed policy methods in PostgresStore Modified Postgres schema to include `policy_names` JSONB column in `users`. Implemented `AttachUserPolicy`, `DetachUserPolicy`, and `ListAttachedUserPolicies`. Updated user CRUD operations to handle policy names persistence. * iam: implement managed policy methods in remaining stores Implemented user policy management in: - `FilerEtcStore` (partial implementation) - `IamGrpcStore` (delegated via GetUser/UpdateUser) - `PropagatingCredentialStore` (to broadcast updates) Ensures cluster-wide consistency for policy attachments. * s3api: refactor EmbeddedIamApi to use managed policy APIs - Refactored `AttachUserPolicy`, `DetachUserPolicy`, and `ListAttachedUserPolicies` to use `e.credentialManager` directly. - Fixed a critical error suppression bug in `ExecuteAction` that always returned success even on failure. - Implemented robust error matching using string comparison fallbacks. - Improved consistency by reloading configuration after policy changes. * s3api: update and refine IAM integration tests - Updated tests to use a real `MemoryStore`-backed `CredentialManager`. - Refined test configuration synchronization using `sync.Once` and manual deep-copying to prevent state corruption. - Improved `extractEmbeddedIamErrorCodeAndMessage` to handle more XML formats robustly. - Adjusted test expectations to match current AWS IAM behavior. * fix compilation * visibility * ensure 10 policies * reload * add integration tests * Guard raft command registration * Allow IAM actions in policy tests * Validate gRPC policy attachments * Revert Validate gRPC policy attachments * Tighten gRPC policy attach/detach * Improve IAM managed policy handling * Improve managed policy filters |
||
|
|
53048ffffb |
Add md5 checksum validation support on PutObject and UploadPart (#8367)
* Add md5 checksum validation support on PutObject and UploadPart Per the S3 specification, when a client sends a Content-MD5 header, the server must compare it against the MD5 of the received body and return BadDigest (HTTP 400) if they don't match. SeaweedFS was silently accepting objects with incorrect Content-MD5 headers, which breaks data integrity verification for clients that rely on this feature (e.g. boto3). The error infrastructure (ErrBadDigest, ErrMsgBadDigest) already existed from PR #7306 but was never wired to an actual check. This commit adds MD5 verification in putToFiler after the body is streamed and the MD5 is computed, and adds Content-MD5 header validation to PutObjectPartHandler (matching PutObjectHandler). Orphaned chunks are cleaned up on mismatch. Refs: https://github.com/seaweedfs/seaweedfs/discussions/3908 * handle SSE, add uploadpart test * s3 integration test: fix typo and add multipart upload checksum test * s3api: move validateContentMd5 after GetBucketAndObject in PutObjectPartHandler * s3api: move validateContentMd5 after GetBucketAndObject in PutObjectHandler * s3api: fix MD5 validation for SSE uploads and logging in putToFiler * add SSE test with checksum validation - mostly ai-generated * Update s3_integration_test.go * Address S3 integration test feedback: fix typos, rename variables, add verification steps, and clean up comments. --------- Co-authored-by: Chris Lu <chris.lu@gmail.com> |
||
|
|
0d8588e3ae |
S3: Implement IAM defaults and STS signing key fallback (#8348)
* S3: Implement IAM defaults and STS signing key fallback logic * S3: Refactor startup order to init SSE-S3 key manager before IAM * S3: Derive STS signing key from KEK using HKDF for security isolation * S3: Document STS signing key fallback in security.toml * fix(s3api): refine anonymous access logic and secure-by-default behavior - Initialize anonymous identity by default in `NewIdentityAccessManagement` to prevent nil pointer exceptions. - Ensure `ReplaceS3ApiConfiguration` preserves the anonymous identity if not present in the new configuration. - Update `NewIdentityAccessManagement` signature to accept `filerClient`. - In legacy mode (no policy engine), anonymous defaults to Deny (no actions), preserving secure-by-default behavior. - Use specific `LookupAnonymous` method instead of generic map lookup. - Update tests to accommodate signature changes and verify improved anonymous handling. * feat(s3api): make IAM configuration optional - Start S3 API server without a configuration file if `EnableIam` option is set. - Default to `Allow` effect for policy engine when no configuration is provided (Zero-Config mode). - Handle empty configuration path gracefully in `loadIAMManagerFromConfig`. - Add integration test `iam_optional_test.go` to verify empty config behavior. * fix(iamapi): fix signature mismatch in NewIdentityAccessManagementWithStore * fix(iamapi): properly initialize FilerClient instead of passing nil * fix(iamapi): properly initialize filer client for IAM management - Instead of passing `nil`, construct a `wdclient.FilerClient` using the provided `Filers` addresses. - Ensure `NewIdentityAccessManagementWithStore` receives a valid `filerClient` to avoid potential nil pointer dereferences or limited functionality. * clean: remove dead code in s3api_server.go * refactor(s3api): improve IAM initialization, safety and anonymous access security * fix(s3api): ensure IAM config loads from filer after client init * fix(s3): resolve test failures in integration, CORS, and tagging tests - Fix CORS tests by providing explicit anonymous permissions config - Fix S3 integration tests by setting admin credentials in init - Align tagging test credentials in CI with IAM defaults - Added goroutine to retry IAM config load in iamapi server * fix(s3): allow anonymous access to health targets and S3 Tables when identities are present * fix(ci): use /healthz for Caddy health check in awscli tests * iam, s3api: expose DefaultAllow from IAM and Policy Engine This allows checking the global "Open by Default" configuration from other components like S3 Tables. * s3api/s3tables: support DefaultAllow in permission logic and handler Updated CheckPermissionWithContext to respect the DefaultAllow flag in PolicyContext. This enables "Open by Default" behavior for unauthenticated access in zero-config environments. Added a targeted unit test to verify the logic. * s3api/s3tables: propagate DefaultAllow through handlers Propagated the DefaultAllow flag to individual handlers for namespaces, buckets, tables, policies, and tagging. This ensures consistent "Open by Default" behavior across all S3 Tables API endpoints. * s3api: wire up DefaultAllow for S3 Tables API initialization Updated registerS3TablesRoutes to query the global IAM configuration and set the DefaultAllow flag on the S3 Tables API server. This completes the end-to-end propagation required for anonymous access in zero-config environments. Added a SetDefaultAllow method to S3TablesApiServer to facilitate this. * s3api: fix tests by adding DefaultAllow to mock IAM integrations The IAMIntegration interface was updated to include DefaultAllow(), breaking several mock implementations in tests. This commit fixes the build errors by adding the missing method to the mocks. * env * ensure ports * env * env * fix default allow * add one more test using non-anonymous user * debug * add more debug * less logs |
||
|
|
25ea48227f |
Fix STS temporary credentials to use ASIA prefix instead of AKIA (#8326)
Temporary credentials from STS AssumeRole were using "AKIA" prefix (permanent IAM user credentials) instead of "ASIA" prefix (temporary security credentials). This violates AWS conventions and may cause compatibility issues with AWS SDKs that validate credential types. Changes: - Rename generateAccessKeyId to generateTemporaryAccessKeyId for clarity - Update function to use ASIA prefix for temporary credentials - Add unit tests to verify ASIA prefix format (weed/iam/sts/credential_prefix_test.go) - Add integration test to verify ASIA prefix in S3 API (test/s3/iam/s3_sts_credential_prefix_test.go) - Ensure AWS-compatible credential format (ASIA + 16 hex chars) The credentials are already deterministic (SHA256-based from session ID) and the SessionToken is correctly set to the JWT token, so this is just a prefix fix to follow AWS standards. Fixes #8312 |
||
|
|
b57429ef2e |
Switch empty-folder cleanup to bucket policy (#8292)
* Fix Spark _temporary cleanup and add issue #8285 regression test * Generalize empty folder cleanup for Spark temp artifacts * Revert synchronous folder pruning and add cleanup diagnostics * Add actionable empty-folder cleanup diagnostics * Fix Spark temp marker cleanup in async folder cleaner * Fix Spark temp cleanup with implicit directory markers * Keep explicit directory markers non-implicit * logging * more logs * Switch empty-folder cleanup to bucket policy * Seaweed-X-Amz-Allow-Empty-Folders * less logs * go vet * less logs * refactoring |
||
|
|
458c12fb99 |
test: add Spark S3 integration regression for issue #8234 (#8249)
* test: add Spark S3 regression integration test for issue 8234 * Update test/s3/spark/setup_test.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> |
||
|
|
c284e51d20 |
fix: multipart upload ETag calculation (#8238)
* fix multipart etag * address comments * clean up * clean up * optimization * address comments * unquoted etag * dedup * upgrade * clean * etag * return quoted tag * quoted etag * debug * s3api: unify ETag retrieval and quoting across handlers Refactor newListEntry to take *S3ApiServer and use getObjectETag, and update setResponseHeaders to use the same logic. This ensures consistent ETags are returned for both listing and direct access. * s3api: implement ListObjects deduplication for versioned buckets Handle duplicate entries between the main path and the .versions directory by prioritizing the latest version when bucket versioning is enabled. * s3api: cleanup stale main file entries during versioned uploads Add explicit deletion of pre-existing "main" files when creating new versions in versioned buckets. This prevents stale entries from appearing in bucket listings and ensures consistency. * s3api: fix cleanup code placement in versioned uploads Correct the placement of rm calls in completeMultipartUpload and putVersionedObject to ensure stale main files are properly deleted during versioned uploads. * s3api: improve getObjectETag fallback for empty ExtETagKey Ensure that when ExtETagKey exists but contains an empty value, the function falls through to MD5/chunk-based calculation instead of returning an empty string. * s3api: fix test files for new newListEntry signature Update test files to use the new newListEntry signature where the first parameter is *S3ApiServer. Created mockS3ApiServer to properly test owner display name lookup functionality. * s3api: use filer.ETag for consistent Md5 handling in getEtagFromEntry Change getEtagFromEntry fallback to use filer.ETag(entry) instead of filer.ETagChunks to ensure legacy entries with Attributes.Md5 are handled consistently with the rest of the codebase. * s3api: optimize list logic and fix conditional header logging - Hoist bucket versioning check out of per-entry callback to avoid repeated getVersioningState calls - Extract appendOrDedup helper function to eliminate duplicate dedup/append logic across multiple code paths - Change If-Match mismatch logging from glog.Errorf to glog.V(3).Infof and remove DEBUG prefix for consistency * s3api: fix test mock to properly initialize IAM accounts Fixed nil pointer dereference in TestNewListEntryOwnerDisplayName by directly initializing the IdentityAccessManagement.accounts map in the test setup. This ensures newListEntry can properly look up account display names without panicking. * cleanup * s3api: remove premature main file cleanup in versioned uploads Removed incorrect cleanup logic that was deleting main files during versioned uploads. This was causing test failures because it deleted objects that should have been preserved as null versions when versioning was first enabled. The deduplication logic in listing is sufficient to handle duplicate entries without deleting files during upload. * s3api: add empty-value guard to getEtagFromEntry Added the same empty-value guard used in getObjectETag to prevent returning quoted empty strings. When ExtETagKey exists but is empty, the function now falls through to filer.ETag calculation instead of returning "". * s3api: fix listing of directory key objects with matching prefix Revert prefix handling logic to use strings.TrimPrefix instead of checking HasPrefix with empty string result. This ensures that when a directory key object exactly matches the prefix (e.g. prefix="dir/", object="dir/"), it is correctly handled as a regular entry instead of being skipped or incorrectly processed as a common prefix. Also fixed missing variable definition. * s3api: refactor list inline dedup to use appendOrDedup helper Refactored the inline deduplication logic in listFilerEntries to use the shared appendOrDedup helper function. This ensures consistent behavior and reduces code duplication. * test: fix port allocation race in s3tables integration test Updated startMiniCluster to find all required ports simultaneously using findAvailablePorts instead of sequentially. This prevents race conditions where the OS reallocates a port that was just released, causing multiple services (e.g. Filer and Volume) to be assigned the same port and fail to start. |
||
|
|
217d977579 | Merge branch 'master' of https://github.com/seaweedfs/seaweedfs | ||
|
|
a3b83f8808 |
test: add Trino Iceberg catalog integration test (#8228)
* test: add Trino Iceberg catalog integration test - Create test/s3/catalog_trino/trino_catalog_test.go with TestTrinoIcebergCatalog - Tests integration between Trino SQL engine and SeaweedFS Iceberg REST catalog - Starts weed mini with all services and Trino in Docker container - Validates Iceberg catalog schema creation and listing operations - Uses native S3 filesystem support in Trino with path-style access - Add workflow job to s3-tables-tests.yml for CI execution * fix: preserve AWS environment credentials when replacing S3 configuration When S3 configuration is loaded from filer/db, it replaces the identities list and inadvertently removes AWS_ACCESS_KEY_ID credentials that were added from environment variables. This caused auth to remain disabled even though valid credentials were present. Fix by preserving environment-based identities when replacing the configuration and re-adding them after the replacement. This ensures environment credentials persist across configuration reloads and properly enable authentication. * fix: use correct ServerAddress format with gRPC port encoding The admin server couldn't connect to master because the master address was missing the gRPC port information. Use pb.NewServerAddress() which properly encodes both HTTP and gRPC ports in the address string. Changes: - weed/command/mini.go: Use pb.NewServerAddress for master address in admin - test/s3/policy/policy_test.go: Store and use gRPC ports for master/filer addresses This fix applies to: 1. Admin server connection to master (mini.go) 2. Test shell commands that need master/filer addresses (policy_test.go) * move * move * fix: always include gRPC port in server address encoding The NewServerAddress() function was omitting the gRPC port from the address string when it matched the port+10000 convention. However, gRPC port allocation doesn't always follow this convention - when the calculated port is busy, an alternative port is allocated. This caused a bug where: 1. Master's gRPC port was allocated as 50661 (sequential, not port+10000) 2. Address was encoded as '192.168.1.66:50660' (gRPC port omitted) 3. Admin client called ToGrpcAddress() which assumed port+10000 offset 4. Admin tried to connect to 60660 but master was on 50661 → connection failed Fix: Always include explicit gRPC port in address format (host:httpPort.grpcPort) unless gRPC port is 0. This makes addresses unambiguous and works regardless of the port allocation strategy used. Impacts: All server-to-server gRPC connections now use properly formatted addresses. * test: fix Iceberg REST API readiness check The Iceberg REST API endpoints require authentication. When checked without credentials, the API returns 403 Forbidden (not 401 Unauthorized). The readiness check now accepts both auth error codes (401/403) as indicators that the service is up and ready, it just needs credentials. This fixes the 'Iceberg REST API did not become ready' test failure. * Fix AWS SigV4 signature verification for base64-encoded payload hashes AWS SigV4 canonical requests must use hex-encoded SHA256 hashes, but the X-Amz-Content-Sha256 header may be transmitted as base64. Changes: - Added normalizePayloadHash() function to convert base64 to hex - Call normalizePayloadHash() in extractV4AuthInfoFromHeader() - Added encoding/base64 import Fixes 403 Forbidden errors on POST requests to Iceberg REST API when clients send base64-encoded content hashes in the header. Impacted services: Iceberg REST API, S3Tables * Fix AWS SigV4 signature verification for base64-encoded payload hashes AWS SigV4 canonical requests must use hex-encoded SHA256 hashes, but the X-Amz-Content-Sha256 header may be transmitted as base64. Changes: - Added normalizePayloadHash() function to convert base64 to hex - Call normalizePayloadHash() in extractV4AuthInfoFromHeader() - Added encoding/base64 import - Removed unused fmt import Fixes 403 Forbidden errors on POST requests to Iceberg REST API when clients send base64-encoded content hashes in the header. Impacted services: Iceberg REST API, S3Tables * pass sigv4 * s3api: fix identity preservation and logging levels - Ensure environment-based identities are preserved during config replacement - Update accessKeyIdent and nameToIdentity maps correctly - Downgrade informational logs to V(2) to reduce noise * test: fix trino integration test and s3 policy test - Pin Trino image version to 479 - Fix port binding to 0.0.0.0 for Docker connectivity - Fix S3 policy test hang by correctly assigning MiniClusterCtx - Improve port finding robustness in policy tests * ci: pre-pull trino image to avoid timeouts - Pull trinodb/trino:479 after Docker setup - Ensure image is ready before integration tests start * iceberg: remove unused checkAuth and improve logging - Remove unused checkAuth method - Downgrade informational logs to V(2) - Ensure loggingMiddleware uses a status writer for accurate reported codes - Narrow catch-all route to avoid interfering with other subsystems * iceberg: fix build failure by removing unused s3api import * Update iceberg.go * use warehouse * Update trino_catalog_test.go |
||
|
|
833bcde9f3 |
test: add Trino Iceberg catalog integration test
- Create test/s3/catalog_trino/trino_catalog_test.go with TestTrinoIcebergCatalog - Tests integration between Trino SQL engine and SeaweedFS Iceberg REST catalog - Starts weed mini with all services and Trino in Docker container - Validates Iceberg catalog schema creation and listing operations - Uses native S3 filesystem support in Trino with path-style access - Add workflow job to s3-tables-tests.yml for CI execution |
||
|
|
dfdace9a13 |
s3tables: enhance test robustness and resilience
Updated random string generation to use crypto/rand in s3tables tests. Increased resilience of IAM distributed tests by adding "connection refused" to retryable errors. |
||
|
|
551a31e156 |
Implement IAM propagation to S3 servers (#8130)
* Implement IAM propagation to S3 servers - Add PropagatingCredentialStore to propagate IAM changes to S3 servers via gRPC - Add Policy management RPCs to S3 proto and S3ApiServer - Update CredentialManager to use PropagatingCredentialStore when MasterClient is available - Wire FilerServer to enable propagation * Implement parallel IAM propagation and fix S3 cluster registration - Parallelized IAM change propagation with 10s timeout. - Refined context usage in PropagatingCredentialStore. - Added S3Type support to cluster node management. - Enabled S3 servers to register with gRPC address to the master. - Ensured IAM configuration reload after policy updates via gRPC. * Optimize IAM propagation with direct in-memory cache updates * Secure IAM propagation: Use metadata to skip persistence only on propagation * pb: refactor IAM and S3 services for unidirectional IAM propagation - Move SeaweedS3IamCache service from iam.proto to s3.proto. - Remove legacy IAM management RPCs and empty SeaweedS3 service from s3.proto. - Enforce that S3 servers only use the synchronization interface. * pb: regenerate Go code for IAM and S3 services Updated generated code following the proto refactoring of IAM synchronization services. * s3api: implement read-only mode for Embedded IAM API - Add readOnly flag to EmbeddedIamApi to reject write operations via HTTP. - Enable read-only mode by default in S3ApiServer. - Handle AccessDenied error in writeIamErrorResponse. - Embed SeaweedS3IamCacheServer in S3ApiServer. * credential: refactor PropagatingCredentialStore for unidirectional IAM flow - Update to use s3_pb.SeaweedS3IamCacheClient for propagation to S3 servers. - Propagate full Identity object via PutIdentity for consistency. - Remove redundant propagation of specific user/account/policy management RPCs. - Add timeout context for propagation calls. * s3api: implement SeaweedS3IamCacheServer for unidirectional sync - Update S3ApiServer to implement the cache synchronization gRPC interface. - Methods (PutIdentity, RemoveIdentity, etc.) now perform direct in-memory cache updates. - Register SeaweedS3IamCacheServer in command/s3.go. - Remove registration for the legacy and now empty SeaweedS3 service. * s3api: update tests for read-only IAM and propagation - Added TestEmbeddedIamReadOnly to verify rejection of write operations in read-only mode. - Update test setup to pass readOnly=false to NewEmbeddedIamApi in routing tests. - Updated EmbeddedIamApiForTest helper with read-only checks matching production behavior. * s3api: add back temporary debug logs for IAM updates Log IAM updates received via: - gRPC propagation (PutIdentity, PutPolicy, etc.) - Metadata configuration reloads (LoadS3ApiConfigurationFromCredentialManager) - Core identity management (UpsertIdentity, RemoveIdentity) * IAM: finalize propagation fix with reduced logging and clarified architecture * Allow configuring IAM read-only mode for S3 server integration tests * s3api: add defensive validation to UpsertIdentity * s3api: fix log message to reference correct IAM read-only flag * test/s3/iam: ensure WaitForS3Service checks for IAM write permissions * test: enable writable IAM in Makefile for integration tests * IAM: add GetPolicy/ListPolicies RPCs to s3.proto * S3: add GetBucketPolicy and ListBucketPolicies helpers * S3: support storing generic IAM policies in IdentityAccessManagement * S3: implement IAM policy RPCs using IdentityAccessManagement * IAM: fix stale user identity on rename propagation |
||
|
|
43229b05ce |
Explicit IAM gRPC APIs for S3 Server (#8126)
* Update IAM and S3 protobuf definitions for explicit IAM gRPC APIs * Refactor s3api: Extract generic ExecuteAction method for IAM operations * Implement explicit IAM gRPC APIs in S3 server * iam: remove deprecated GetConfiguration and PutConfiguration RPCs * iamapi: refactor handlers to use CredentialManager directly * s3api: refactor embedded IAM to use CredentialManager directly * server: remove deprecated configuration gRPC handlers * credential/grpc: refactor configuration calls to return error * shell: update s3.configure to list users instead of full config * s3api: fix CreateServiceAccount gRPC handler to map required fields * s3api: fix UpdateServiceAccount gRPC handler to map fields and safe status * s3api: enforce UserName in embedded IAM ListAccessKeys * test: fix test_config.json structure to match proto definition * Revert "credential/grpc: refactor configuration calls to return error" This reverts commit |
||
|
|
6bf088cec9 |
IAM Policy Management via gRPC (#8109)
* Add IAM gRPC service definition - Add GetConfiguration/PutConfiguration for config management - Add CreateUser/GetUser/UpdateUser/DeleteUser/ListUsers for user management - Add CreateAccessKey/DeleteAccessKey/GetUserByAccessKey for access key management - Methods mirror existing IAM HTTP API functionality * Add IAM gRPC handlers on filer server - Implement IamGrpcServer with CredentialManager integration - Handle configuration get/put operations - Handle user CRUD operations - Handle access key create/delete operations - All methods delegate to CredentialManager for actual storage * Wire IAM gRPC service to filer server - Add CredentialManager field to FilerOption and FilerServer - Import credential store implementations in filer command - Initialize CredentialManager from credential.toml if available - Register IAM gRPC service on filer gRPC server - Enable credential management via gRPC alongside existing filer services * Regenerate IAM protobuf with gRPC service methods * iam_pb: add Policy Management to protobuf definitions * credential: implement PolicyManager in credential stores * filer: implement IAM Policy Management RPCs * shell: add s3.policy command * test: add integration test for s3.policy * test: fix compilation errors in policy_test * pb * fmt * test * weed shell: add -policies flag to s3.configure This allows linking/unlinking IAM policies to/from identities directly from the s3.configure command. * test: verify s3.configure policy linking and fix port allocation - Added test case for linking policies to users via s3.configure - Implemented findAvailablePortPair to ensure HTTP and gRPC ports are both available, avoiding conflicts with randomized port assignments. - Updated assertion to match jsonpb output (policyNames) * credential: add StoreTypeGrpc constant * credential: add IAM gRPC store boilerplate * credential: implement identity methods in gRPC store * credential: implement policy methods in gRPC store * admin: use gRPC credential store for AdminServer This ensures that all IAM and policy changes made through the Admin UI are persisted via the Filer's IAM gRPC service instead of direct file manipulation. * shell: s3.configure use granular IAM gRPC APIs instead of full config patching * shell: s3.configure use granular IAM gRPC APIs * shell: replace deprecated ioutil with os in s3.policy * filer: use gRPC FailedPrecondition for unconfigured credential manager * test: improve s3.policy integration tests and fix error checks * ci: add s3 policy shell integration tests to github workflow * filer: fix LoadCredentialConfiguration error handling * credential/grpc: propagate unmarshal errors in GetPolicies * filer/grpc: improve error handling and validation * shell: use gRPC status codes in s3.configure * credential: document PutPolicy as create-or-replace * credential/postgres: reuse CreatePolicy in PutPolicy to deduplicate logic * shell: add timeout context and strictly enforce flags in s3.policy * iam: standardize policy content field naming in gRPC and proto * shell: extract slice helper functions in s3.configure * filer: map credential store errors to gRPC status codes * filer: add input validation for UpdateUser and CreateAccessKey * iam: improve validation in policy and config handlers * filer: ensure IAM service registration by defaulting credential manager * credential: add GetStoreName method to manager * test: verify policy deletion in integration test |
||
|
|
535be3096b |
Add AWS IAM integration tests and refactor admin authorization (#8098)
* Add AWS IAM integration tests and refactor admin authorization - Added AWS IAM management integration tests (User, AccessKey, Policy) - Updated test framework to support IAM client creation with JWT/OIDC - Refactored s3api authorization to be policy-driven for IAM actions - Removed hardcoded role name checks for admin privileges - Added new tests to GitHub Actions basic test matrix * test(s3/iam): add UpdateUser and UpdateAccessKey tests and fix nil pointer dereference * feat(s3api): add DeletePolicy and update tests with cleanup logic * test(s3/iam): use t.Cleanup for managed policy deletion in CreatePolicy test |
||
|
|
d664ca5ed3 |
fix: IAM authentication with AWS Signature V4 and environment credentials (#8099)
* fix: IAM authentication with AWS Signature V4 and environment credentials Three key fixes for authenticated IAM requests to work: 1. Fix request body consumption before signature verification - iamMatcher was calling r.ParseForm() which consumed POST body - This broke AWS Signature V4 verification on subsequent reads - Now only check query string in matcher, preserving body for verification - File: weed/s3api/s3api_server.go 2. Preserve environment variable credentials across config reloads - After IAM mutations, config reload overwrote env var credentials - Extract env var loading into loadEnvironmentVariableCredentials() - Call after every config reload to persist credentials - File: weed/s3api/auth_credentials.go 3. Add authenticated IAM tests and test infrastructure - New TestIAMAuthenticated suite with AWS SDK + Signature V4 - Dynamic port allocation for independent test execution - Flag reset to prevent state leakage between tests - CI workflow to run S3 and IAM tests separately - Files: test/s3/example/*, .github/workflows/s3-example-integration-tests.yml All tests pass: - TestIAMCreateUser (unauthenticated) - TestIAMAuthenticated (with AWS Signature V4) - S3 integration tests * fmt * chore: rename test/s3/example to test/s3/normal * simplify: CI runs all integration tests in single job * Update s3-example-integration-tests.yml * ci: run each test group separately to avoid raft registry conflicts |
||
|
|
14f44379cb |
test: fix flaky S3 volume encryption test (#8083)
Specifically: - Use bytes.NewReader for binary data instead of strings.NewReader - Increase binary test data from 8 bytes to 1KB to avoid edge cases - Add 50ms delay between subtests to prevent overwhelming the server |