seaweedfs

mirror of https://github.com/seaweedfs/seaweedfs.git synced 2026-05-22 17:51:30 +00:00

Author	SHA1	Message	Date
Mmx233	9b9fdb5b76	fix(s3): sync IAM policies to advanced IAM Manager policy engine (#9577 ) * fix(s3): sync IAM policies to advanced IAM Manager policy engine * test(s3): add unit tests for PutPolicy/DeletePolicy IAM Manager sync * fix(s3): flush loaded policies in SetIAMIntegration, drop extra reload Sync the policies already loaded from the credential store into the IAM Manager's engine from SetIAMIntegration itself, instead of re-running a full LoadS3ApiConfigurationFromCredentialManager after setup. This covers both startup orderings without a second filer round-trip or racing the async loader goroutine: if the load won, the policies are in memory to push; if SetIAMIntegration won, the load's own sync runs afterward. Move the runtime PutPolicy/DeletePolicy sync out of the iam.m write lock so the per-request auth RLock path isn't blocked by the policy recompile. * fix(s3): serialize IAM manager policy resync to avoid stale snapshots SyncRuntimePolicies replaces the manager's full policy set, so applying a policy view captured before a later mutation can resurrect a deleted policy or drop a new one. Funnel every path (PutPolicy, DeletePolicy, SetIAMIntegration, and the credential-manager load) through a single resyncIAMManagerPolicies that serializes on a dedicated mutex and reads iam.policies fresh at apply time, so the live map always wins regardless of interleaving. The load now installs the config into iam.policies before resyncing, closing the window where the manager held policies the map didn't yet have. --------- Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-05-21 00:39:42 -07:00
Chris Lu	37b6a14b0d	feat(s3): add four bucket configuration handlers (#9570 ) * feat(s3): add four bucket configuration handlers - GetBucketPolicyStatus: computes IsPublic from the existing bucket policy - PutBucketRequestPayment: companion writer to the existing GET; accepts only BucketOwner - GetBucketAccelerateConfiguration: returns <Status>Suspended</Status> - GetBucketLogging: returns an empty BucketLoggingStatus Lets AWS SDK probes succeed instead of returning MethodNotAllowed. * review: route GetBucketPolicyStatus through checkBucket Mirrors the existence/auth gating used by other bucket handlers and drops the bespoke filer_pb lookup so NoSuchBucket precedence is consistent across the API surface. * review: cap PutBucketRequestPayment body with MaxBytesReader The body is unmarshalled as RequestPaymentConfiguration, which is a handful of bytes; reject excessively large payloads up front and defer Close immediately after wrapping. * review: gate static getters on checkBucket GetBucketAccelerateConfiguration and GetBucketLogging now run the standard bucket existence check before returning the static Suspended / empty-status response so a missing bucket cannot appear to have valid configuration. * review: share cache helper across misc tests; check io.ReadAll error Accelerate and Logging tests now run through newMiscTestServer like the others so the checkBucket guard sees a cached bucket; the ReadAll error is explicitly checked.	2026-05-19 17:35:08 -07:00
Chris Lu	cee2bf697c	feat(s3): stub bucket configuration list endpoints (#9571 ) * feat(s3): stub bucket configuration list endpoints Adds Get and List handlers for Analytics, Inventory, IntelligentTiering, and Metrics bucket configurations. List returns an empty result with IsTruncated=false; single-get returns NoSuchConfiguration so SDK error parsing remains predictable. * review: gate stubs on bucket existence All eight stub handlers now call checkBucket via stubBucketGuard so NoSuchBucket takes precedence over NoSuchConfiguration / empty-list responses, matching AWS S3 precedence. Tests provide a cached bucket so the guard sees it as present.	2026-05-19 17:34:51 -07:00
Chris Lu	d57de6dc20	fix(s3): keep anonymous access working with EnableIam default (fixes #9557 ) (#9567 ) fix(s3): keep anonymous access working with EnableIam default `docker run seaweedfs` (and `weed mini` with no config) start with EnableIam=true but no IAM config file and no identities. The advanced-IAM init path was failing in 4.25 because of the missing STS signing key, which masked a latent bug: SetIAMIntegration unconditionally flipped isAuthEnabled to true, and isEnabled() also treated a non-nil iamIntegration as auth-on. Once the mini SSE-S3 KEK landed in 4.26 the STS fallback started succeeding, the integration got installed end to end, and every anonymous S3 request bounced as AccessDenied. Separate the two concerns: SetIAMIntegration just plumbs in the OIDC / embedded-IAM machinery, and a new EnableAuthEnforcement opts in to enforcement. The startup path calls it only when -s3.iam.config is actually provided, so operators with explicit IAM configs still get auth (preserves #7726). isEnabled() now reads isAuthEnabled only.	2026-05-19 13:03:30 -07:00
Chris Lu	58c3fa802c	fix(s3): keep host-less bucket catch-all so reverse proxies work (#9540 ) When s3.domainName is set, all bucket-prefix routes were gated on a matching Host header. Requests that arrive via an IP, an unlisted hostname, or a reverse proxy that rewrites Host hit no router and bounce back as 405/404 (and 503 once a proxy maps the upstream error). Register the path-style catch-all unconditionally, after the host-specific routers, so it only fires when no Host matcher applies.	2026-05-18 19:44:19 -07:00
Konstantin Lebedev	7d1b16fbcd	fix: ListBucketsHandler for pathStyleDomains (#9510 )	2026-05-15 13:12:55 -07:00
Chris Lu	f5a4bfb514	fix(s3/versioning): repair dangling latest-version pointer after partial delete (#9460 ) * fix(s3/versioning): repair dangling latest-version pointer after partial delete deleteSpecificObjectVersion did two non-atomic filer ops: rm the version blob, then update the .versions/ pointer. Step 2 failures were silently logged and the client got 204 OK, so any transient blip (filer timeout, process restart between RPCs, lock contention) left the .versions/ directory naming a missing file. Subsequent GETs paid the 10-retry self-heal cost and returned NoSuchKey — surfacing as "Storage not found" to Veeam, which is what triggered this investigation. Three changes: 1. Pre-roll the pointer for the singleton / multi-version-deleting-latest cases. The pointer is repointed (multi) or cleared (singleton) before the blob rm. A failure between leaves a recoverable orphan blob — pointer is consistent, GETs succeed or correctly miss without entering the stale-pointer self-heal path. 2. Wrap the load-bearing filer ops in updateLatestVersionAfterDeletion with bounded retries (~6.3s worst case). When retries are exhausted the function now returns a non-nil error instead of swallowing it. The caller logs at Error level and queues the path for the reconciler. 3. Background reconciler drains stranded .versions/ pointer-to-missing states off the hot path. Bounded in-memory queue with capped retries; read-path heal remains as a last-resort safety net. * fix(s3/versioning): address review on #9460 Four fixes addressing review on PR #9460. All four are correctness; no behavioural change for the happy path. 1. repointLatestBeforeDeletion: discriminate NotFound from transient errors when re-fetching the .versions/ entry. Previously any error returned rolled=true,nil — a transient filer hiccup at that point would cause the caller to skip the post-delete reconciliation AND proceed with the blob rm, producing exactly the dangling pointer state the PR aims to prevent. NotFound stays "vacuously consistent" (directory already gone); other errors surface so the caller aborts before removing the blob. 2. Move the singleton .versions/ teardown out of repointLatestBeforeDeletion (where it ran BEFORE the blob rm and always failed with "non-empty folder") into deleteSpecificObjectVersion AFTER the blob rm. Adds a wasSingleton return value so the caller knows when to run the teardown. Without this, every singleton-version delete in a versioned bucket leaked an empty .versions/ directory. 3. Wrap the list, getEntry, and mkFile calls inside repointLatestBeforeDeletion with retryFilerOp so the pre-roll has the same transient-failure resilience as the post-roll path. Without retries, a single transient blip causes the caller to fall back to the legacy non-atomic flow even when the filer recovers immediately. 4. healVersionsPointer in the reconciler: same NotFound-vs-transient discrimination on both the .versions/ getEntry and the latest-file presence probe. Previously a transient filer error would silently evict the candidate from the queue as "healed", leaving the real stranded state until a client read happened to surface it. Also fixes the gemini-flagged consistency nit: the queued-for-reconciler error log now uses normalizedObject instead of object so it matches the queue entry's key. * fix(s3/versioning): short-circuit terminal errors in retryFilerOp Add isRetryableFilerErr that returns false for filer_pb.ErrNotFound, gRPC NotFound, context.Canceled, and context.DeadlineExceeded. retryFilerOp now bails immediately on a terminal error and returns it unwrapped, so callers like repointLatestBeforeDeletion.getEntry and updateLatestVersionAfterDeletion.rm see the raw NotFound instead of paying the ~6.3 s retry-budget delay AND parsing it out of an "exhausted N retries" wrapper. errors.Is and status.Code already walk the %w chain so today's call sites still work, but the delay was real on the hot DELETE path whenever a key was genuinely absent. Test added covering all five terminal-error shapes — each must run the wrapped fn exactly once and return in under 50 ms.	2026-05-13 10:14:27 -07:00
Chris Lu	c567da7164	feat(s3): register SeaweedS3LifecycleInternal gRPC service (#9359 ) Phase 2 added the LifecycleDelete handler on S3ApiServer but never registered it on a running gRPC server, so workers had no endpoint to dial. Embed UnimplementedSeaweedS3LifecycleInternalServer on S3ApiServer and register it on the s3 command's grpc server alongside SeaweedS3IamCacheServer.	2026-05-07 18:19:42 -07:00
Chris Lu	f8973b3ed6	feat(iam): OIDC provider mutations + multi-client + TLS thumbprints (Phase 2b) (#9320 ) * feat(iam): OIDC provider mutations + multi-client + TLS thumbprints - Mutating IAM actions: CreateOpenIDConnectProvider, DeleteOpenIDConnectProvider, AddClientIDToOpenIDConnectProvider, RemoveClientIDFromOpenIDConnectProvider, UpdateOpenIDConnectProviderThumbprint, TagOpenIDConnectProvider, UntagOpenIDConnectProvider. Each enforces AWS-shape input bounds and the read-only mode rejects all mutations. - Multiple client_ids per provider in OIDCConfig (clientIds list, plural) with full backward compatibility — singular clientId still works and is merged into the audience allowlist. Provider factory accepts both. - AWS-compatible TLS thumbprint pinning: when OIDCConfig.Thumbprints is non-empty, JWKS fetches enforce that the negotiated TLS chain contains a certificate whose SHA-1 hex matches the allowlist. Empty list keeps the existing system-trust path. * fix(iam): factor Tags.member.N parser into a helper CreateOpenIDConnectProvider and TagOpenIDConnectProvider were both walking the AWS Tags.member.N.Key / Tags.member.N.Value query-string convention with copy-pasted loops. Factor into extractTags so the parsing rules and the "no tags present" semantics live in one place. Addresses gemini medium review on PR #9320. * fix(iam): sentinel errors for OIDC provider not-found / already-exists The s3api dispatcher was using strings.Contains(err.Error(), "not found") and "already exists" to map IAM-manager errors back to AWS error codes. Substring matching on a formatted message couples the API error code to the exact wording of the upstream message — touching the message silently changes the IAM API contract. Define ErrOIDCProviderNotFound and ErrOIDCProviderAlreadyExists in the integration package, fmt.Errorf("%w: ...") them at the four return sites in iam_manager.go and oidc_provider_store.go, and use errors.Is at the s3api call sites. Same control flow, no string-match fragility. Addresses gemini medium review on PR #9320. * fix(iam): surface non-NotFound errors from CreateOIDCProvider lookup Previously CreateOIDCProvider only treated GetProviderByARN's success path as "exists" and silently fell through on any error, including transient backend failures. That hid real problems and still attempted a write. Distinguish ErrOIDCProviderNotFound (the only "safe to create" case) from other errors so we don't mask filer outages or partition issues. * fix(iam): enforce 100-client-ID cap on AddClientIDToOIDCProvider CreateOIDCProvider and the implicit update path through validateOIDC- ProviderRecord both reject lists with more than 100 client IDs, but AddClientIDToOIDCProvider could grow the list past that bound one ID at a time. Refuse the add when the list is already at the cap so the invariant holds across every mutation entry point. * feat(iam): IAM-managed OIDC provider live view in STS service Add a separate, mutex-guarded map of admin-managed OIDC providers on the STS service. The map can be atomically replaced via SetIAMManagedOIDCProvidersByIssuer; AssumeRoleWithWebIdentity lookups consult it first and fall back to the existing static-config map, so records persisted through the IAM API can shadow bootstrap entries without a restart. This is the runtime hook the IAM API and the metadata-subscribe path will both call when the OIDCProviderStore changes (next two commits). * feat(iam): refresh STS service runtime view after OIDC mutations Add IAMManager.RefreshOIDCProvidersFromStore: lists every persisted OIDCProviderRecord, builds a runtime OIDCProvider for each, and atomically publishes the issuer-keyed map into the STS service. Each mutating IAM API call (Create / Delete / AddClientID / RemoveClientID / UpdateThumbprints) now triggers this refresh inline so the local instance picks up the change without waiting for a metadata-subscribe round trip. Tag mutations skip the refresh because tags do not affect token validation. Refresh failures only log; the persisted write has already succeeded by that point, so a transient list error must not surface to the API caller. The peer-instance update path (filer metadata subscription) is added in a follow-up commit. * feat(iam): subscribe to OIDC provider changes on the filer Watch /etc/iam/oidc-providers under the existing s3 metadata-subscribe loop and call RefreshOIDCProvidersFromStore on any create / update / delete / rename. This is the cross-instance update path: S3 server A writes via the IAM API, the filer fans out the metadata change, and S3 servers B..N pick up the new runtime view without a restart. Mirrors the existing onIamConfigChange / onCircuitBreakerConfigChange pattern. The handler short-circuits when the path is unrelated, and when no IAMManager is wired in (static-only configurations).	2026-05-05 11:26:08 -07:00
Chris Lu	e77f8ae204	fix(s3api): route STS GetFederationToken to STS handler (#9157 ) (#9167 ) * fix(s3api): route STS GetFederationToken requests to STS handler (#9157) The STS GetFederationToken handler was implemented but never reachable. Three routing gaps sent requests to the S3/IAM path instead of STS: - No explicit mux route for Action=GetFederationToken in the URL query - iamMatcher did not exclude GetFederationToken, so authenticated POSTs with Action in the form body were matched and dispatched to IAM - UnifiedPostHandler only dispatched AssumeRole* and GetCallerIdentity to STS, leaving GetFederationToken to fall through to DoActions and return NotImplemented Add the missing route, the matcher exclusion, and the dispatch branch. Also wire TestSTS, TestAssumeRoleWithWebIdentity, and TestServiceAccount into the s3-iam-tests workflow as a new "sts" matrix entry. Before this change, none of test/s3/iam/s3_sts_get_federation_token_test.go's four test functions ran in CI, which is why this regression shipped. * test(iam): make orphaned STS/service-account tests pass under auth-enabled CI Follow-up to wiring STS tests into CI: fixes several pre-existing issues that made the newly-included tests fail locally. Server fixes: - weed/s3api/s3api_sts.go: handleGetFederationToken no longer 500s when the caller is a legacy S3-config identity (not in the IAM user store). Previously any GetPoliciesForUser error short-circuited to InternalError, which hard-failed every SigV4 caller using keys from -s3.config. - weed/s3api/s3api_embedded_iam.go: CreateServiceAccount now generates IDs in the sa:<parent>:<uuid> format required by credential.ValidateServiceAccountId. The old "sa-XXXXXXXX" format failed the persistence-layer regex and caused every CreateServiceAccount call to return 500 once a filer-backed credential store validated the ID. Test helpers: - test/s3/iam/s3_sts_assume_role_test.go: callSTSAPIWithSigV4 no longer sets req.Header["Host"]. aws-sdk-go v1 v4.Signer already signs Host from req.URL.Host, and a manual Host header made the signer emit host;host in SignedHeaders, producing SignatureDoesNotMatch. Updated missing_role_arn subtest to match the existing SeaweedFS behavior (user-context assumption). - test/s3/iam/s3_service_account_test.go: callIAMAPI now SigV4-signs requests when STS_TEST_{ACCESS,SECRET}_KEY env vars are set. Unsigned IAM writes otherwise fall through to the STS fallback and return InvalidAction. CI matrix: - .github/workflows/s3-iam-tests.yml: skip TestServiceAccountLifecycle/use_service_account_credentials only. The rest of the service-account suite passes; that one subtest depends on a separate credential-reload issue where new ABIA keys briefly register into accessKeyIdent but aren't persisted to the filer, so they vanish on the next reload. Out of scope for the #9157 GetFederationToken fix. * fix(credential): accept AWS IAM username chars in service-account IDs Gemini review on #9167 pointed out that ServiceAccountIdPattern's parent-user segment was more restrictive than an AWS IAM username: `[A-Za-z0-9_-]` vs. IAM's `[\w+=,.@-]`. Realistic usernames with `@`, `.`, `+`, `=`, or `,` (e.g. email-style principals) would fail validation at the filer store even though the embedded IAM API happily created them. Broaden the regex to `[A-Za-z0-9_+=,.@-]` (matching the AWS IAM spec at https://docs.aws.amazon.com/IAM/latest/APIReference/API_User.html) and add a table-driven test that locks the expansion in. * address PR review feedback on #9167 All five review items were valid; changes keyed to review bullets: - weed/s3api/s3api_sts.go: handleGetFederationToken no longer swallows arbitrary policy-lookup failures. Only credential.ErrUserNotFound is treated leniently (the legacy-config SigV4 path); any other error now returns InternalError so we don't mint tokens with an incomplete policy set. - weed/credential/grpc/grpc_identity.go: GetUser translates gRPC NotFound back to credential.ErrUserNotFound so errors.Is(...) above matches for gRPC-backed stores, not just memory/filer-direct. - weed/s3api/s3api_embedded_iam.go: CreateServiceAccount now validates the generated saId against credential.ValidateServiceAccountId before returning. Surfaces a client 400 with the offending ID instead of the opaque 500 that used to bubble up from the persistence layer. - weed/s3api/s3api_server_routing_test.go: seed a routing-test identity with a known AK/SK, sign TestRouting_GetFederationTokenAuthenticatedBody with aws-sdk-go v4.Signer so the request actually passes AuthSignatureOnly. Assert 503 ServiceUnavailable (from STSHandlers with no stsService) instead of just NotEqual(501) — 503 proves the dispatch reached STSHandlers.HandleSTSRequest. - test/s3/iam/s3_service_account_test.go: callIAMAPI signs with service="iam" instead of "s3" (SeaweedFS verifies against whichever service the client signed with, but "iam" is semantically correct). - weed/credential/validation_test.go: add positive rows for an uppercase parent (sa:ALICE:...) and a canonical hyphenated UUID suffix (sa:alice:123e4567-e89b-12d3-a456-426614174000).	2026-04-20 19:33:22 -07:00
Chris Lu	eaf561e86c	perf(s3): add optional shared in-memory chunk cache for GET (#9069 ) Adds the -s3.cacheCapacityMB flag (default 0, disabled) that attaches an in-memory chunk_cache.ChunkCacheInMemory to the server-wide ReaderCache introduced in the previous commit. When enabled, completed chunks are deposited into the shared cache as they are downloaded, so concurrent and repeat GETs of the same object hit memory instead of re-fetching chunks from volume servers. When 0 (the default) the shared ReaderCache still runs — it just attaches a nil chunk cache, so behaviour matches the previous commit exactly. No behaviour change for clusters that don't opt in. Disk-backed TieredChunkCache was evaluated and rejected: its synchronous SetChunk writes regressed cold reads ~12x on loopback because the chunk fetchers block on local disk I/O that is slower than the TCP volume-server fetch it is supposed to accelerate. Memory-only avoids that. Flag registered in all four S3 flag sites (s3.go, server.go, filer.go, mini.go) per the comment on command.S3Options. The chunk size used to convert CacheSizeMB → entry count is encapsulated in the s3ChunkCacheChunkSizeMB constant so it's easy to grep and revisit if the filer default chunk size changes. Measured on weed mini + 1 GiB random object over loopback, single curl on a presigned URL: cacheCapacityMB=0 (off): cold ~2900, warm ~2900 MB/s cacheCapacityMB=4096: cold ~2790, warm ~5050 MB/s (+70%)	2026-04-14 09:24:35 -07:00
Chris Lu	228ed25a01	perf(s3): route GET through ChunkReadAt + per-request ReaderCache (#9068 ) perf(s3): route GET through ChunkReadAt + shared ReaderCache The S3 GET path previously used filer.PrepareStreamContentWithPrefetch, which hands chunk bytes from the volume-server fetch goroutine to the consumer through an io.Pipe. io.Pipe is a synchronous rendezvous, so the prefetch=4 window only overlapped HTTP connection setup — the actual data bytes still flowed one pipe at a time. Switch to the same path WebDAV uses (server/webdav_server.go): build a filer.ChunkReadAt backed by a server-wide filer.ReaderCache. ReaderCache prefetches whole chunks into []byte buffers, so the prefetch window translates into real in-flight bytes and the consumer copies them out as memcpys. The ReaderCache is server-wide (not per-request) for two reasons: 1. ChunkReadAt.Close() destroys the ReaderCache's downloader map. With a per-request cache, the defer on the handler would wait for background chunk downloads that run on context.Background() — so a client disconnect would block handler cleanup on downloads that the client no longer wants, tying up goroutines and memory. 2. Concurrent requests for the same object can share in-flight downloads through the shared downloader map. No persistent ChunkCache is added in this commit — the ReaderCache is constructed with a nil *chunk_cache.TieredChunkCache (all its methods are nil-receiver safe). A follow-up PR wires in an in-memory chunk cache for cross-request warm hits. JWT for volume-server requests is generated internally by util_http.RetriedFetchChunkData from jwtSigningReadKey, so the new path remains compatible with JWT-protected clusters — this is the same mechanism the WebDAV and mount read paths have been using. Measured on weed mini + 1 GiB random object over loopback, cold cache, single-stream curl on a presigned URL: before (io.Pipe): 2100-2200 MB/s after (ChunkReadAt): 2900-3800 MB/s	2026-04-14 07:46:05 -07:00
Chris Lu	2b8c16160f	feat(iceberg): add OAuth2 token endpoint for DuckDB compatibility (#9017 ) * feat(iceberg): add OAuth2 token endpoint for DuckDB compatibility (#9015) DuckDB's Iceberg connector uses OAuth2 client_credentials flow, hitting POST /v1/oauth/tokens which was not implemented, returning 404. Add the OAuth2 token endpoint that accepts S3 access key / secret key as client_id / client_secret, validates them against IAM, and returns a signed JWT bearer token. The Auth middleware now accepts Bearer tokens in addition to S3 signature auth. * fix(test): use weed shell for table bucket creation with IAM enabled The S3 Tables REST API requires SigV4 auth when IAM is configured. Use weed shell (which bypasses S3 auth) to create table buckets, matching the pattern used by the Trino integration tests. * address review feedback: access key in JWT, full identity in Bearer auth - Include AccessKey in JWT claims so token verification uses the exact credential that signed the token (no ambiguity with multi-key identities) - Return full Identity object from Bearer auth so downstream IAM/policy code sees an authenticated request, not anonymous - Replace GetSecretKeyForIdentity with GetCredentialByAccessKey for unambiguous credential lookup - DuckDB test now tries the full SQL script first (CREATE SECRET + catalog access), falling back to simple CREATE SECRET if needed - Tighten bearer auth test assertion to only accept 200/500 Addresses review comments from coderabbitai and gemini-code-assist. * security: use PostFormValue, bind signing key to access key, fix port conflict - Use r.PostFormValue instead of r.FormValue to prevent credentials from leaking via query string into logs and caches - Reject client_secret in URL query parameters explicitly - Include access key in HMAC signing key derivation to prevent cross-credential token forgery when secrets happen to match - Allocate dedicated webdav port in OAuth test env to avoid port collision with the shared TestMain cluster	2026-04-10 11:18:11 -07:00
Chris Lu	d1823d3784	fix(s3): include static identities in listing operations (#8903 ) * fix(s3): include static identities in listing operations Static identities loaded from -s3.config file were only stored in the S3 API server's in-memory state. Listing operations (s3.configure shell command, aws iam list-users) queried the credential manager which only returned dynamic identities from the backend store. Register static identities with the credential manager after loading so they are included in LoadConfiguration and ListUsers results, and filtered out before SaveConfiguration to avoid persisting them to the dynamic store. Fixes https://github.com/seaweedfs/seaweedfs/discussions/8896 * fix: avoid mutating caller's config and defensive copies - SaveConfiguration: use shallow struct copy instead of mutating the caller's config.Identities field - SetStaticIdentities: skip nil entries to avoid panics - GetStaticIdentities: defensively copy PolicyNames slice to avoid aliasing the original * fix: filter nil static identities and sync on config reload - SetStaticIdentities: filter nil entries from the stored slice (not just from staticNames) to prevent panics in LoadConfiguration/ListUsers - Extract updateCredentialManagerStaticIdentities helper and call it from both startup and the grace.OnReload handler so the credential manager's static snapshot stays current after config file reloads * fix: add mutex for static identity fields and fix ListUsers for store callers - Add sync.RWMutex to protect staticIdentities/staticNames against concurrent reads during config reload - Revert CredentialManager.ListUsers to return only store users, since internal callers (e.g. DeletePolicy) look up each user in the store and fail on non-existent static entries - Merge static usernames in the filer gRPC ListUsers handler instead, via the new GetStaticUsernames method - Fix CI: TestIAMPolicyManagement/managed_policy_crud_lifecycle was failing because DeletePolicy iterated static users that don't exist in the store * fix: show static identities in admin UI and weed shell The admin UI and weed shell s3.configure command query the filer's credential manager via gRPC, which is a separate instance from the S3 server's credential manager. Static identities were only registered on the S3 server's credential manager, so they never appeared in the filer's responses. - Add CredentialManager.LoadS3ConfigFile to parse a static S3 config file and register its identities - Add FilerOptions.s3ConfigFile so the filer can load the same static config that the S3 server uses - Wire s3ConfigFile through in weed mini and weed server modes - Merge static usernames in filer gRPC ListUsers handler - Add CredentialManager.GetStaticUsernames helper - Add sync.RWMutex to protect concurrent access to static identity fields - Avoid importing weed/filer from weed/credential (which pulled in filer store init() registrations and broke test isolation) - Add docker/compose/s3_static_users_example.json * fix(admin): make static users read-only in admin UI Static users loaded from the -s3.config file should not be editable or deletable through the admin UI since they are managed via the config file. - Add IsStatic field to ObjectStoreUser, set from credential manager - Hide edit, delete, and access key buttons for static users in the users table template - Show a "static" badge next to static user names - Return 403 Forbidden from UpdateUser and DeleteUser API handlers when the target user is a static identity * fix(admin): show details for static users GetObjectStoreUserDetails called credentialManager.GetUser which only queries the dynamic store. For static users this returned ErrUserNotFound. Fall back to GetStaticIdentity when the store lookup fails. * fix(admin): load static S3 identities in admin server The admin server has its own credential manager (gRPC store) which is a separate instance from the S3 server's and filer's. It had no static identity data, so IsStaticIdentity returned false (edit/delete buttons shown) and GetStaticIdentity returned nil (details page failed). Pass the -s3.config file path through to the admin server and call LoadS3ConfigFile on its credential manager, matching the approach used for the filer. * fix: use protobuf is_static field instead of passing config file path The previous approach passed -s3.config file path to every component (filer, admin). This is wrong because the admin server should not need to know about S3 config files. Instead, add an is_static field to the Identity protobuf message. The field is set when static identities are serialized (in GetStaticIdentities and LoadS3ConfigFile). Any gRPC client that loads configuration via GetConfiguration automatically sees which identities are static, without needing the config file. - Add is_static field (tag 8) to iam_pb.Identity proto message - Set IsStatic=true in GetStaticIdentities and LoadS3ConfigFile - Admin GetObjectStoreUsers reads identity.IsStatic from proto - Admin IsStaticUser helper loads config via gRPC to check the flag - Filer GetUser gRPC handler falls back to GetStaticIdentity - Remove s3ConfigFile from AdminOptions and NewAdminServer signature	2026-04-03 20:01:28 -07:00
Chris Lu	7c59b639c9	STS: add GetCallerIdentity support (#8893 ) * STS: add GetCallerIdentity support Implement the AWS STS GetCallerIdentity action, which returns the ARN, account ID, and user ID of the caller based on SigV4 authentication. This is commonly used by AWS SDKs and CLI tools (e.g. `aws sts get-caller-identity`) to verify credentials and determine the authenticated identity. * test: remove trivial GetCallerIdentity tests Remove the XML unmarshal test (we don't consume this response as input) and the routing constant test (just asserts a literal equals itself). * fix: route GetCallerIdentity through STS in UnifiedPostHandler and use stable UserId - UnifiedPostHandler only dispatched actions starting with "AssumeRole" to STS, so GetCallerIdentity in a POST body would fall through to the IAM path and get AccessDenied for non-admin users. Add explicit check for GetCallerIdentity. - Use identity.Name as UserId instead of credential.AccessKey, which is a transient value and incorrect for STS assumed-role callers.	2026-04-02 15:59:09 -07:00
Chris Lu	efbed39e25	S3: map canned ACL to file permissions and add configurable default file mode (#8886 ) * S3: map canned ACL to file permissions and add configurable default file mode S3 uploads were hardcoded to 0660 regardless of ACL headers. Now the X-Amz-Acl header maps to Unix file permissions per-object: - public-read, authenticated-read, bucket-owner-read → 0644 - public-read-write → 0666 - private, bucket-owner-full-control → 0660 Also adds -defaultFileMode / -s3.defaultFileMode flag to set a server-wide default when no ACL header is present. Closes #8874 * Address review feedback for S3 file mode feature - Extract hardcoded 0660 to defaultFileMode constant - Change parseDefaultFileMode to return error instead of calling Fatalf - Add -s3.defaultFileMode flag to filer.go and mini.go (was missing) - Add doc comment to S3Options about updating all four flag sites - Add TestResolveFileMode with 10 test cases covering ACL mapping, server default, and priority ordering	2026-04-02 11:51:54 -07:00
Chris Lu	7d5cbfd547	s3: support s3:x-amz-server-side-encryption policy condition (#8806 ) * s3: support s3:x-amz-server-side-encryption policy condition (#7680) - Normalize x-amz-server-side-encryption header values to canonical form (aes256 → AES256, aws:kms mixed-case → aws:kms) so StringEquals conditions work regardless of client capitalisation - Exempt UploadPart and UploadPartCopy from SSE Null conditions: these actions inherit SSE from the initial CreateMultipartUpload request and do not re-send the header, so Deny/Null("true") should not block them - Add sse_condition_test.go covering StringEquals, Null, case-insensitive normalisation, and multipart continuation action exemption * s3: address review comments on SSE condition support - Replace "inherited" sentinel in injectSSEForMultipart with "AES256" so that StringEquals/Null conditions evaluate against a meaningful value; add TODO noting that KMS multipart uploads need the actual algorithm looked up from the upload state - Rewrite TestSSECaseInsensitiveNormalization to drive normalisation through EvaluatePolicyForRequest with a real http.Request so regressions in the production code path are caught; split into AES256 and aws:kms variants to cover both normalisation branches s3: plumb real inherited SSE from multipart upload state into policy eval Instead of injecting a static "AES256" sentinel for UploadPart/UploadPartCopy, look up the actual SSE algorithm from the stored CreateMultipartUpload entry and pass it through the evaluation chain. Changes: - PolicyEvaluationArgs gains InheritedSSEAlgorithm string; set by the BucketPolicyEngine wrapper for multipart continuation actions - injectSSEForMultipart(conditions, inheritedSSE) now accepts the real algorithm; empty string means no SSE → Null("true") fires correctly - IsMultipartContinuationAction exported so the s3api wrapper can use it - BucketPolicyEngine gets a MultipartSSELookup callback (set by S3ApiServer) that fetches the upload entry and reads SeaweedFSSSEKMSKeyID / SeaweedFSSSES3Encryption to determine the algorithm - S3ApiServer.getMultipartSSEAlgorithm implements the lookup via getEntry - Tests updated: three multipart cases (AES256, aws:kms, no-SSE-must-deny) plus UploadPartCopy coverage	2026-03-27 23:15:01 -07:00
Chris Lu	0adb78bc6b	s3api: make conditional mutations atomic and AWS-compatible (#8802 ) * s3api: serialize conditional write finalization * s3api: add conditional delete mutation checks * s3api: enforce destination conditions for copy * s3api: revalidate multipart completion under lock * s3api: rollback failed put finalization hooks * s3api: report delete-marker version deletions * s3api: fix copy destination versioning edge cases * s3api: make versioned multipart completion idempotent * test/s3: cover conditional mutation regressions * s3api: rollback failed copy version finalization * s3api: resolve suspended delete conditions via latest entry * s3api: remove copy test null-version injection * s3api: reject out-of-order multipart completions * s3api: preserve multipart replay version metadata * s3api: surface copy destination existence errors * s3api: simplify delete condition target resolution * test/s3: make conditional delete assertions order independent * test/s3: add distributed lock gateway integration * s3api: fail closed multipart versioned completion * s3api: harden copy metadata and overwrite paths * s3api: create delete markers for suspended deletes * s3api: allow duplicate multipart completion parts	2026-03-27 19:22:26 -07:00
Chris Lu	992db11d2b	iam: add IAM group management (#8560 ) * iam: add Group message to protobuf schema Add Group message (name, members, policy_names, disabled) and add groups field to S3ApiConfiguration for IAM group management support (issue #7742). * iam: add group CRUD to CredentialStore interface and all backends Add group management methods (CreateGroup, GetGroup, DeleteGroup, ListGroups, UpdateGroup) to the CredentialStore interface with implementations for memory, filer_etc, postgres, and grpc stores. Wire group loading/saving into filer_etc LoadConfiguration and SaveConfiguration. * iam: add group IAM response types Add XML response types for group management IAM actions: CreateGroup, DeleteGroup, GetGroup, ListGroups, AddUserToGroup, RemoveUserFromGroup, AttachGroupPolicy, DetachGroupPolicy, ListAttachedGroupPolicies, ListGroupsForUser. * iam: add group management handlers to embedded IAM API Add CreateGroup, DeleteGroup, GetGroup, ListGroups, AddUserToGroup, RemoveUserFromGroup, AttachGroupPolicy, DetachGroupPolicy, ListAttachedGroupPolicies, and ListGroupsForUser handlers with dispatch in ExecuteAction. * iam: add group management handlers to standalone IAM API Add group handlers (CreateGroup, DeleteGroup, GetGroup, ListGroups, AddUserToGroup, RemoveUserFromGroup, AttachGroupPolicy, DetachGroupPolicy, ListAttachedGroupPolicies, ListGroupsForUser) and wire into DoActions dispatch. Also add helper functions for user/policy side effects. * iam: integrate group policies into authorization Add groups and userGroups reverse index to IdentityAccessManagement. Populate both maps during ReplaceS3ApiConfiguration and MergeS3ApiConfiguration. Modify evaluateIAMPolicies to evaluate policies from user's enabled groups in addition to user policies. Update VerifyActionPermission to consider group policies when checking hasAttachedPolicies. * iam: add group side effects on user deletion and rename When a user is deleted, remove them from all groups they belong to. When a user is renamed, update group membership references. Applied to both embedded and standalone IAM handlers. * iam: watch /etc/iam/groups directory for config changes Add groups directory to the filer subscription watcher so group file changes trigger IAM configuration reloads. * admin: add group management page to admin UI Add groups page with CRUD operations, member management, policy attachment, and enable/disable toggle. Register routes in admin handlers and add Groups entry to sidebar navigation. * test: add IAM group management integration tests Add comprehensive integration tests for group CRUD, membership, policy attachment, policy enforcement, disabled group behavior, user deletion side effects, and multi-group membership. Add "group" test type to CI matrix in s3-iam-tests workflow. * iam: address PR review comments for group management - Fix XSS vulnerability in groups.templ: replace innerHTML string concatenation with DOM APIs (createElement/textContent) for rendering member and policy lists - Use userGroups reverse index in embedded IAM ListGroupsForUser for O(1) lookup instead of iterating all groups - Add buildUserGroupsIndex helper in standalone IAM handlers; use it in ListGroupsForUser and removeUserFromAllGroups for efficient lookup - Add note about gRPC store load-modify-save race condition limitation * iam: add defensive copies, validation, and XSS fixes for group management - Memory store: clone groups on store/retrieve to prevent mutation - Admin dash: deep copy groups before mutation, validate user/policy exists - HTTP handlers: translate credential errors to proper HTTP status codes, use bool for Enabled field to distinguish missing vs false - Groups templ: use data attributes + event delegation instead of inline onclick for XSS safety, prevent stale async responses iam: add explicit group methods to PropagatingCredentialStore Add CreateGroup, GetGroup, DeleteGroup, ListGroups, and UpdateGroup methods instead of relying on embedded interface fallthrough. Group changes propagate via filer subscription so no RPC propagation needed. * iam: detect postgres unique constraint violation and add groups index Return ErrGroupAlreadyExists when INSERT hits SQLState 23505 instead of a generic error. Add index on groups(disabled) for filtered queries. * iam: add Marker field to group list response types Add Marker string field to GetGroupResult, ListGroupsResult, ListAttachedGroupPoliciesResult, and ListGroupsForUserResult to match AWS IAM pagination response format. * iam: check group attachment before policy deletion Reject DeletePolicy if the policy is attached to any group, matching AWS IAM behavior. Add PolicyArn to ListAttachedGroupPolicies response. * iam: include group policies in IAM authorization Merge policy names from user's enabled groups into the IAMIdentity used for authorization, so group-attached policies are evaluated alongside user-attached policies. * iam: check for name collision before renaming user in UpdateUser Scan identities and inline policies for newUserName before mutating, returning EntityAlreadyExists if a collision is found. Reuse the already-loaded policies instead of loading them again inside the loop. * test: use t.Cleanup for bucket cleanup in group policy test * iam: wrap ErrUserNotInGroup sentinel in RemoveGroupMember error Wrap credential.ErrUserNotInGroup so errors.Is works in groupErrorToHTTPStatus, returning proper 400 instead of 500. * admin: regenerate groups_templ.go with XSS-safe data attributes Regenerated from groups.templ which uses data-group-name attributes instead of inline onclick with string interpolation. * iam: add input validation and persist groups during migration - Validate nil/empty group name in CreateGroup and UpdateGroup - Save groups in migrateToMultiFile so they survive legacy migration * admin: use groupErrorToHTTPStatus in GetGroupMembers and GetGroupPolicies * iam: short-circuit UpdateUser when newUserName equals current name * iam: require empty PolicyNames before group deletion Reject DeleteGroup when group has attached policies, matching the existing members check. Also fix GetGroup error handling in DeletePolicy to only skip ErrGroupNotFound, not all errors. * ci: add weed/pb/** to S3 IAM test trigger paths * test: replace time.Sleep with require.Eventually for propagation waits Use polling with timeout instead of fixed sleeps to reduce flakiness in integration tests waiting for IAM policy propagation. * fix: use credentialManager.GetPolicy for AttachGroupPolicy validation Policies created via CreatePolicy through credentialManager are stored in the credential store, not in s3cfg.Policies (which only has static config policies). Change AttachGroupPolicy to use credentialManager.GetPolicy() for policy existence validation. * feat: add UpdateGroup handler to embedded IAM API Add UpdateGroup action to enable/disable groups and rename groups via the IAM API. This is a SeaweedFS extension (not in AWS SDK) used by tests to toggle group disabled status. * fix: authenticate raw IAM API calls in group tests The embedded IAM endpoint rejects anonymous requests. Replace callIAMAPI with callIAMAPIAuthenticated that uses JWT bearer token authentication via the test framework. * feat: add UpdateGroup handler to standalone IAM API Mirror the embedded IAM UpdateGroup handler in the standalone IAM API for parity. * fix: add omitempty to Marker XML tags in group responses Non-truncated responses should not emit an empty <Marker/> element. * fix: distinguish backend errors from missing policies in AttachGroupPolicy Return ServiceFailure for credential manager errors instead of masking them as NoSuchEntity. Also switch ListGroupsForUser to use s3cfg.Groups instead of in-memory reverse index to avoid stale data. Add duplicate name check to UpdateGroup rename. * fix: standalone IAM AttachGroupPolicy uses persisted policy store Check managed policies from GetPolicies() instead of s3cfg.Policies so dynamically created policies are found. Also add duplicate name check to UpdateGroup rename. * fix: rollback inline policies on UpdateUser PutPolicies failure If PutPolicies fails after moving inline policies to the new username, restore both the identity name and the inline policies map to their original state to avoid a partial-write window. * fix: correct test cleanup ordering for group tests Replace scattered defers with single ordered t.Cleanup in each test to ensure resources are torn down in reverse-creation order: remove membership, detach policies, delete access keys, delete users, delete groups, delete policies. Move bucket cleanup to parent test scope and delete objects before bucket. * fix: move identity nil check before map lookup and refine hasAttachedPolicies Move the nil check on identity before accessing identity.Name to prevent panic. Also refine hasAttachedPolicies to only consider groups that are enabled and have actual policies attached, so membership in a no-policy group doesn't incorrectly trigger IAM authorization. * fix: fail group reload on unreadable or corrupt group files Return errors instead of logging and continuing when group files cannot be read or unmarshaled. This prevents silently applying a partial IAM config with missing group memberships or policies. * fix: use errors.Is for sql.ErrNoRows comparison in postgres group store * docs: explain why group methods skip propagateChange Group changes propagate to S3 servers via filer subscription (watching /etc/iam/groups/) rather than gRPC RPCs, since there are no group-specific RPCs in the S3 cache protocol. * fix: remove unused policyNameFromArn and strings import * fix: update service account ParentUser on user rename When renaming a user via UpdateUser, also update ParentUser references in service accounts to prevent them from becoming orphaned after the next configuration reload. * fix: wrap DetachGroupPolicy error with ErrPolicyNotAttached sentinel Use credential.ErrPolicyNotAttached so groupErrorToHTTPStatus maps it to 400 instead of falling back to 500. * fix: use admin S3 client for bucket cleanup in enforcement test The user S3 client may lack permissions by cleanup time since the user is removed from the group in an earlier subtest. Use the admin S3 client to ensure bucket and object cleanup always succeeds. * fix: add nil guard for group param in propagating store log calls Prevent potential nil dereference when logging group.Name in CreateGroup and UpdateGroup of PropagatingCredentialStore. * fix: validate Disabled field in UpdateGroup handlers Reject values other than "true" or "false" with InvalidInputException instead of silently treating them as false. * fix: seed mergedGroups from existing groups in MergeS3ApiConfiguration Previously the merge started with empty group maps, dropping any static-file groups. Now seeds from existing iam.groups before overlaying dynamic config, and builds the reverse index after merging to avoid stale entries from overridden groups. * fix: use errors.Is for filer_pb.ErrNotFound comparison in group loading Replace direct equality (==) with errors.Is() to correctly match wrapped errors, consistent with the rest of the codebase. * fix: add ErrUserNotFound and ErrPolicyNotFound to groupErrorToHTTPStatus Map these sentinel errors to 404 so AddGroupMember and AttachGroupPolicy return proper HTTP status codes. * fix: log cleanup errors in group integration tests Replace fire-and-forget cleanup calls with error-checked versions that log failures via t.Logf for debugging visibility. * fix: prevent duplicate group test runs in CI matrix The basic lane's -run "TestIAM" regex also matched TestIAMGroup* tests, causing them to run in both the basic and group lanes. Replace with explicit test function names. * fix: add GIN index on groups.members JSONB for membership lookups Without this index, ListGroupsForUser and membership queries require full table scans on the groups table. * fix: handle cross-directory moves in IAM config subscription When a file is moved out of an IAM directory (e.g., /etc/iam/groups), the dir variable was overwritten with NewParentPath, causing the source directory change to be missed. Now also notifies handlers about the source directory for cross-directory moves. * fix: validate members/policies before deleting group in admin handler AdminServer.DeleteGroup now checks for attached members and policies before delegating to credentialManager, matching the IAM handler guards. * fix: merge groups by name instead of blind append during filer load Match the identity loader's merge behavior: find existing group by name and replace, only append when no match exists. Prevents duplicates when legacy and multi-file configs overlap. * fix: check DeleteEntry response error when cleaning obsolete group files Capture and log resp.Error from filer DeleteEntry calls during group file cleanup, matching the pattern used in deleteGroupFile. * fix: verify source user exists before no-op check in UpdateUser Reorder UpdateUser to find the source identity first and return NoSuchEntityException if not found, before checking if the rename is a no-op. Previously a non-existent user renamed to itself would incorrectly return success. * fix: update service account parent refs on user rename in embedded IAM The embedded IAM UpdateUser handler updated group membership but not service account ParentUser fields, unlike the standalone handler. * fix: replay source-side events for all handlers on cross-dir moves Pass nil newEntry to bucket, IAM, and circuit-breaker handlers for the source directory during cross-directory moves, so all watchers can clear caches for the moved-away resource. * fix: don't seed mergedGroups from existing iam.groups in merge Groups are always dynamic (from filer), never static (from s3.config). Seeding from iam.groups caused stale deleted groups to persist. Now only uses config.Groups from the dynamic filer config. * fix: add deferred user cleanup in TestIAMGroupUserDeletionSideEffect Register t.Cleanup for the created user so it gets cleaned up even if the test fails before the inline DeleteUser call. * fix: assert UpdateGroup HTTP status in disabled group tests Add require.Equal checks for 200 status after UpdateGroup calls so the test fails immediately on API errors rather than relying on the subsequent Eventually timeout. * fix: trim whitespace from group name in filer store operations Trim leading/trailing whitespace from group.Name before validation in CreateGroup and UpdateGroup to prevent whitespace-only filenames. Also merge groups by name during multi-file load to prevent duplicates. * fix: add nil/empty group validation in gRPC store Guard CreateGroup and UpdateGroup against nil group or empty name to prevent panics and invalid persistence. * fix: add nil/empty group validation in postgres store Guard CreateGroup and UpdateGroup against nil group or empty name to prevent panics from nil member access and empty-name row inserts. * fix: add name collision check in embedded IAM UpdateUser The embedded IAM handler renamed users without checking if the target name already existed, unlike the standalone handler. * fix: add ErrGroupNotEmpty sentinel and map to HTTP 409 AdminServer.DeleteGroup now wraps conflict errors with ErrGroupNotEmpty, and groupErrorToHTTPStatus maps it to 409 Conflict instead of 500. * fix: use appropriate error message in GetGroupDetails based on status Return "Group not found" only for 404, use "Failed to retrieve group" for other error statuses instead of always saying "Group not found". * fix: use backend-normalized group.Name in CreateGroup response After credentialManager.CreateGroup may normalize the name (e.g., trim whitespace), use group.Name instead of the raw input for the returned GroupData to ensure consistency. * fix: add nil/empty group validation in memory store Guard CreateGroup and UpdateGroup against nil group or empty name to prevent panics from nil pointer dereference on map access. * fix: reorder embedded IAM UpdateUser to verify source first Find the source identity before checking for collisions, matching the standalone handler's logic. Previously a non-existent user renamed to an existing name would get EntityAlreadyExists instead of NoSuchEntity. * fix: handle same-directory renames in metadata subscription Replay a delete event for the old entry name during same-directory renames so handlers like onBucketMetadataChange can clean up stale state for the old name. * fix: abort GetGroups on non-ErrGroupNotFound errors Only skip groups that return ErrGroupNotFound. Other errors (e.g., transient backend failures) now abort the handler and return the error to the caller instead of silently producing partial results. * fix: add aria-label and title to icon-only group action buttons Add accessible labels to View and Delete buttons so screen readers and tooltips provide meaningful context. * fix: validate group name in saveGroup to prevent invalid filenames Trim whitespace and reject empty names before writing group JSON files, preventing creation of files like ".json". * fix: add /etc/iam/groups to filer subscription watched directories The groups directory was missing from the watched directories list, so S3 servers in a cluster would not detect group changes made by other servers via filer. The onIamConfigChange handler already had code to handle group directory changes but it was never triggered. * add direct gRPC propagation for group changes to S3 servers Groups now have the same dual propagation as identities and policies: direct gRPC push via propagateChange + async filer subscription. - Add PutGroup/RemoveGroup proto messages and RPCs - Add PutGroup/RemoveGroup in-memory cache methods on IAM - Add PutGroup/RemoveGroup gRPC server handlers - Update PropagatingCredentialStore to call propagateChange on group mutations * reduce log verbosity for config load summary Change ReplaceS3ApiConfiguration log from Infof to V(1).Infof to avoid noisy output on every config reload. * admin: show user groups in view and edit user modals - Add Groups field to UserDetails and populate from credential manager - Show groups as badges in user details view modal - Add group management to edit user modal: display current groups, add to group via dropdown, remove from group via badge x button * fix: remove duplicate showAlert that broke modal-alerts.js admin.js defined showAlert(type, message) which overwrote the modal-alerts.js version showAlert(message, type), causing broken unstyled alert boxes. Remove the duplicate and swap all callers in admin.js to use the correct (message, type) argument order. * fix: unwrap groups API response in edit user modal The /api/groups endpoint returns {"groups": [...]}, not a bare array. * Update object_store_users_templ.go * test: assert AccessDenied error code in group denial tests Replace plain assert.Error checks with awserr.Error type assertion and AccessDenied code verification, matching the pattern used in other IAM integration tests. * fix: propagate GetGroups errors in ShowGroups handler getGroupsPageData was swallowing errors and returning an empty page with 200 status. Now returns the error so ShowGroups can respond with a proper error status. * fix: reject AttachGroupPolicy when credential manager is nil Previously skipped policy existence validation when credentialManager was nil, allowing attachment of nonexistent policies. Now returns a ServiceFailureException error. * fix: preserve groups during partial MergeS3ApiConfiguration updates UpsertIdentity calls MergeS3ApiConfiguration with a partial config containing only the updated identity (nil Groups). This was wiping all in-memory group state. Now only replaces groups when config.Groups is non-nil (full config reload). * fix: propagate errors from group lookup in GetObjectStoreUserDetails ListGroups and GetGroup errors were silently ignored, potentially showing incomplete group data in the UI. * fix: use DOM APIs for group badge remove button to prevent XSS Replace innerHTML with onclick string interpolation with DOM createElement + addEventListener pattern. Also add aria-label and title to the add-to-group button. * fix: snapshot group policies under RLock to prevent concurrent map access evaluateIAMPolicies was copying the map reference via groupMap := iam.groups under RLock then iterating after RUnlock, while PutGroup mutates the map in-place. Now copies the needed policy names into a slice while holding the lock. * fix: add nil IAM check to PutGroup and RemoveGroup gRPC handlers Match the nil guard pattern used by PutPolicy/DeletePolicy to prevent nil pointer dereference when IAM is not initialized.	2026-03-09 11:54:32 -07:00
Chris Lu	d89eb8267f	s3: use url.PathUnescape for X-Amz-Copy-Source header (#8545 ) * s3: use url.PathUnescape for X-Amz-Copy-Source header (#8544) The X-Amz-Copy-Source header is a URL-encoded path, not a query string. Using url.QueryUnescape incorrectly converts literal '+' characters to spaces, which can cause object key mismatches during copy operations. Switch to url.PathUnescape in CopyObjectHandler, CopyObjectPartHandler, and pathToBucketObjectAndVersion to correctly handle special characters like '!', '+', and other RFC 3986 sub-delimiters that S3 clients may percent-encode (e.g. '!' as %21). * s3: add path validation to CopyObjectPartHandler CopyObjectPartHandler was missing the validateTableBucketObjectPath checks that CopyObjectHandler has, allowing potential path traversal in the source bucket/object of copy part requests. * s3: fix case-sensitive HeadersRegexp for copy source routing The HeadersRegexp for X-Amz-Copy-Source used `%2F` which only matched uppercase hex encoding. RFC 3986 allows both `%2F` and `%2f`, so clients sending lowercase percent-encoding would bypass the copy handler and hit PutObjectHandler instead. Add (?i) flag for case-insensitive matching. Also add test coverage for the versionId branch in pathToBucketObjectAndVersion and for lowercase %2f routing.	2026-03-07 11:10:02 -08:00
Chris Lu	540fc97e00	s3/iam: reuse one request id per request (#8538 ) * request_id: add shared request middleware * s3err: preserve request ids in responses and logs * iam: reuse request ids in XML responses * sts: reuse request ids in XML responses * request_id: drop legacy header fallback * request_id: use AWS-style request id format * iam: fix AWS-compatible XML format for ErrorResponse and field ordering - ErrorResponse uses bare <RequestId> at root level instead of <ResponseMetadata> wrapper, matching the AWS IAM error response spec - Move CommonResponse to last field in success response structs so <ResponseMetadata> serializes after result elements - Add randomness to request ID generation to avoid collisions - Add tests for XML ordering and ErrorResponse format * iam: remove duplicate error_response_test.go Test is already covered by responses_test.go. * address PR review comments - Guard against typed nil pointers in SetResponseRequestID before interface assertion (CodeRabbit) - Use regexp instead of strings.Index in test helpers for extracting request IDs (Gemini) * request_id: prevent spoofing, fix nil-error branch, thread reqID to error writers - Ensure() now always generates a server-side ID, ignoring client-sent x-amz-request-id headers to prevent request ID spoofing. Uses a private context key (contextKey{}) instead of the header string. - writeIamErrorResponse in both iamapi and embedded IAM now accepts reqID as a parameter instead of calling Ensure() internally, ensuring a single request ID per request lifecycle. - The nil-iamError branch in writeIamErrorResponse now writes a 500 Internal Server Error response instead of returning silently. - Updated tests to set request IDs via context (not headers) and added tests for spoofing prevention and context reuse. * sts: add request-id consistency assertions to ActionInBody tests * test: update admin test to expect server-generated request IDs The test previously sent a client x-amz-request-id header and expected it echoed back. Since Ensure() now ignores client headers to prevent spoofing, update the test to verify the server returns a non-empty server-generated request ID instead. * iam: add generic WithRequestID helper alongside reflection-based fallback Add WithRequestID[T] that uses generics to take the address of a value type, satisfying the pointer receiver on SetRequestId without reflection. The existing SetResponseRequestID is kept for the two call sites that operate on interface{} (from large action switches where the concrete type varies at runtime). Generics cannot replace reflection there since Go cannot infer type parameters from interface{}. * Remove reflection and generics from request ID setting Call SetRequestId directly on concrete response types in each switch branch before boxing into interface{}, eliminating the need for WithRequestID (generics) and SetResponseRequestID (reflection). * iam: return pointer responses in action dispatch * Fix IAM error handling consistency and ensure request IDs on all responses - UpdateUser/CreatePolicy error branches: use writeIamErrorResponse instead of s3err.WriteErrorResponse to preserve IAM formatting and request ID - ExecuteAction: accept reqID parameter and generate one if empty, ensuring every response carries a RequestId regardless of caller * Clean up inline policies on DeleteUser and UpdateUser rename DeleteUser: remove InlinePolicies[userName] from policy storage before removing the identity, so policies are not orphaned. UpdateUser: move InlinePolicies[userName] to InlinePolicies[newUserName] when renaming, so GetUserPolicy/DeleteUserPolicy work under the new name. Both operations persist the updated policies and return an error if the storage write fails, preventing partial state.	2026-03-06 15:22:39 -08:00
Chris Lu	10a30a83e1	s3api: add GetObjectAttributes API support (#8504 ) * s3api: add error code and header constants for GetObjectAttributes Add ErrInvalidAttributeName error code and header constants (X-Amz-Object-Attributes, X-Amz-Max-Parts, X-Amz-Part-Number-Marker, X-Amz-Delete-Marker) needed by the S3 GetObjectAttributes API. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: implement GetObjectAttributes handler Add GetObjectAttributesHandler that returns selected object metadata (ETag, Checksum, StorageClass, ObjectSize, ObjectParts) without returning the object body. Follows the same versioning and conditional header patterns as HeadObjectHandler. The handler parses the X-Amz-Object-Attributes header to determine which attributes to include in the XML response, and supports ObjectParts pagination via X-Amz-Max-Parts and X-Amz-Part-Number-Marker. Ref: https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObjectAttributes.html Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: register GetObjectAttributes route Register the GET /{object}?attributes route for the GetObjectAttributes API, placed before other object query routes to ensure proper matching. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: add integration tests for GetObjectAttributes Test coverage: - Basic: simple object with all attribute types - MultipartObject: multipart upload with parts pagination - SelectiveAttributes: requesting only specific attributes - InvalidAttribute: server rejects invalid attribute names - NonExistentObject: returns NoSuchKey for missing objects Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: add versioned object test for GetObjectAttributes Test puts two versions of the same object and verifies that: - GetObjectAttributes returns the latest version by default - GetObjectAttributes with versionId returns the specific version - ObjectSize and VersionId are correct for each version Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: fix combined conditional header evaluation per RFC 7232 Per RFC 7232: - Section 3.4: If-Unmodified-Since MUST be ignored when If-Match is present (If-Match is the more accurate replacement) - Section 3.3: If-Modified-Since MUST be ignored when If-None-Match is present (If-None-Match is the more accurate replacement) Previously, all four conditional headers were evaluated independently. This caused incorrect 412 responses when If-Match succeeded but If-Unmodified-Since failed (should return 200 per AWS S3 behavior). Fix applied to both validateConditionalHeadersForReads (GET/HEAD) and validateConditionalHeaders (PUT) paths. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: add conditional header combination tests for GetObjectAttributes Test the RFC 7232 combined conditional header semantics: - If-Match=true + If-Unmodified-Since=false => 200 (If-Unmodified-Since ignored) - If-None-Match=false + If-Modified-Since=true => 304 (If-Modified-Since ignored) - If-None-Match=true + If-Modified-Since=false => 200 (If-Modified-Since ignored) - If-Match=true + If-Unmodified-Since=true => 200 - If-Match=false => 412 regardless Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: document Checksum attribute as not yet populated Checksum is accepted in validation (so clients requesting it don't get a 400 error, matching AWS behavior for objects without checksums) but SeaweedFS does not yet store S3 checksums. Add a comment explaining this and noting where to populate it when checksum storage is added. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: add s3:GetObjectAttributes IAM action for ?attributes query Previously, GET /{object}?attributes resolved to s3:GetObject via the fallback path since resolveFromQueryParameters had no case for the "attributes" query parameter. Add S3_ACTION_GET_OBJECT_ATTRIBUTES constant ("s3:GetObjectAttributes") and a branch in resolveFromQueryParameters to return it for GET requests with the "attributes" query parameter, so IAM policies can distinguish GetObjectAttributes from GetObject. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: evaluate conditional headers after version resolution Move conditional header evaluation (If-Match, If-None-Match, etc.) to after the version resolution step in GetObjectAttributesHandler. This ensures that when a specific versionId is requested, conditions are checked against the correct version entry rather than always against the latest version. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: use bounded HTTP client in GetObjectAttributes tests Replace http.DefaultClient with a timeout-aware http.Client (10s) in the signedGetObjectAttributes helper and testGetObjectAttributesInvalid to prevent tests from hanging indefinitely. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: check attributes query before versionId in action resolver Move the GetObjectAttributes action check before the versionId check in resolveFromQueryParameters. This fixes GET /bucket/key?attributes&versionId=xyz being incorrectly classified as s3:GetObjectVersion instead of s3:GetObjectAttributes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: add tests for versioned conditional headers and action resolver Add integration test that verifies conditional headers (If-Match, If-None-Match) are evaluated against the requested version entry, not the latest version. This covers the fix in `55c409dec`. Add unit test for ResolveS3Action verifying that the attributes query parameter takes precedence over versionId, so GET ?attributes&versionId resolves to s3:GetObjectAttributes. This covers the fix in `b92c61c95`. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: guard negative chunk indices and rename PartsCount field Add bounds checks for b.StartChunk >= 0 and b.EndChunk >= 0 in buildObjectAttributesParts to prevent panics from corrupted metadata with negative index values. Rename ObjectAttributesParts.PartsCount to TotalPartsCount to match the AWS SDK v2 Go field naming convention, while preserving the XML element name "PartsCount" via the struct tag. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: reject malformed max-parts and part-number-marker headers Return ErrInvalidMaxParts and ErrInvalidPartNumberMarker when the X-Amz-Max-Parts or X-Amz-Part-Number-Marker headers contain non-integer or negative values, matching ListObjectPartsHandler behavior. Previously these were silently ignored with defaults. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 12:52:09 -08:00
blitt001	3d81d5bef7	Fix S3 signature verification behind reverse proxies (#8444 ) * Fix S3 signature verification behind reverse proxies When SeaweedFS is deployed behind a reverse proxy (e.g. nginx, Kong, Traefik), AWS S3 Signature V4 verification fails because the Host header the client signed with (e.g. "localhost:9000") differs from the Host header SeaweedFS receives on the backend (e.g. "seaweedfs:8333"). This commit adds a new -s3.externalUrl parameter (and S3_EXTERNAL_URL environment variable) that tells SeaweedFS what public-facing URL clients use to connect. When set, SeaweedFS uses this host value for signature verification instead of the Host header from the incoming request. New parameter: -s3.externalUrl (flag) or S3_EXTERNAL_URL (environment variable) Example: -s3.externalUrl=http://localhost:9000 Example: S3_EXTERNAL_URL=https://s3.example.com The environment variable is particularly useful in Docker/Kubernetes deployments where the external URL is injected via container config. The flag takes precedence over the environment variable when both are set. At startup, the URL is parsed and default ports are stripped to match AWS SDK behavior (port 80 for HTTP, port 443 for HTTPS), so "http://s3.example.com:80" and "http://s3.example.com" are equivalent. Bugs fixed: - Default port stripping was removed by a prior PR, causing signature mismatches when clients connect on standard ports (80/443) - X-Forwarded-Port was ignored when X-Forwarded-Host was not present - Scheme detection now uses proper precedence: X-Forwarded-Proto > TLS connection > URL scheme > "http" - Test expectations for standard port stripping were incorrect - expectedHost field in TestSignatureV4WithForwardedPort was declared but never actually checked (self-referential test) * Add Docker integration test for S3 proxy signature verification Docker Compose setup with nginx reverse proxy to validate that the -s3.externalUrl parameter (or S3_EXTERNAL_URL env var) correctly resolves S3 signature verification when SeaweedFS runs behind a proxy. The test uses nginx proxying port 9000 to SeaweedFS on port 8333, with X-Forwarded-Host/Port/Proto headers set. SeaweedFS is configured with -s3.externalUrl=http://localhost:9000 so it uses "localhost:9000" for signature verification, matching what the AWS CLI signs with. The test can be run with aws CLI on the host or without it by using the amazon/aws-cli Docker image with --network host. Test covers: create-bucket, list-buckets, put-object, head-object, list-objects-v2, get-object, content round-trip integrity, delete-object, and delete-bucket — all through the reverse proxy. * Create s3-proxy-signature-tests.yml * fix CLI * fix CI * Update s3-proxy-signature-tests.yml * address comments * Update Dockerfile * add user * no need for fuse * Update s3-proxy-signature-tests.yml * debug * weed mini * fix health check * health check * fix health checking --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-02-26 14:20:42 -08:00
Chris Lu	d1fecdface	Fix IAM defaults and S3Tables IAM regression (#8374 ) * Fix IAM defaults and s3tables identities * Refine S3Tables identity tests * Clarify identity tests	2026-02-18 18:20:03 -08:00
Chris Lu	eda4a000cc	Revert "Fix IAM defaults and s3tables identities" This reverts commit `bf71fe0039`.	2026-02-18 16:23:13 -08:00
Chris Lu	bf71fe0039	Fix IAM defaults and s3tables identities	2026-02-18 16:21:48 -08:00
Chris Lu	0d8588e3ae	S3: Implement IAM defaults and STS signing key fallback (#8348 ) * S3: Implement IAM defaults and STS signing key fallback logic * S3: Refactor startup order to init SSE-S3 key manager before IAM * S3: Derive STS signing key from KEK using HKDF for security isolation * S3: Document STS signing key fallback in security.toml * fix(s3api): refine anonymous access logic and secure-by-default behavior - Initialize anonymous identity by default in `NewIdentityAccessManagement` to prevent nil pointer exceptions. - Ensure `ReplaceS3ApiConfiguration` preserves the anonymous identity if not present in the new configuration. - Update `NewIdentityAccessManagement` signature to accept `filerClient`. - In legacy mode (no policy engine), anonymous defaults to Deny (no actions), preserving secure-by-default behavior. - Use specific `LookupAnonymous` method instead of generic map lookup. - Update tests to accommodate signature changes and verify improved anonymous handling. * feat(s3api): make IAM configuration optional - Start S3 API server without a configuration file if `EnableIam` option is set. - Default to `Allow` effect for policy engine when no configuration is provided (Zero-Config mode). - Handle empty configuration path gracefully in `loadIAMManagerFromConfig`. - Add integration test `iam_optional_test.go` to verify empty config behavior. * fix(iamapi): fix signature mismatch in NewIdentityAccessManagementWithStore * fix(iamapi): properly initialize FilerClient instead of passing nil * fix(iamapi): properly initialize filer client for IAM management - Instead of passing `nil`, construct a `wdclient.FilerClient` using the provided `Filers` addresses. - Ensure `NewIdentityAccessManagementWithStore` receives a valid `filerClient` to avoid potential nil pointer dereferences or limited functionality. * clean: remove dead code in s3api_server.go * refactor(s3api): improve IAM initialization, safety and anonymous access security * fix(s3api): ensure IAM config loads from filer after client init * fix(s3): resolve test failures in integration, CORS, and tagging tests - Fix CORS tests by providing explicit anonymous permissions config - Fix S3 integration tests by setting admin credentials in init - Align tagging test credentials in CI with IAM defaults - Added goroutine to retry IAM config load in iamapi server * fix(s3): allow anonymous access to health targets and S3 Tables when identities are present * fix(ci): use /healthz for Caddy health check in awscli tests * iam, s3api: expose DefaultAllow from IAM and Policy Engine This allows checking the global "Open by Default" configuration from other components like S3 Tables. * s3api/s3tables: support DefaultAllow in permission logic and handler Updated CheckPermissionWithContext to respect the DefaultAllow flag in PolicyContext. This enables "Open by Default" behavior for unauthenticated access in zero-config environments. Added a targeted unit test to verify the logic. * s3api/s3tables: propagate DefaultAllow through handlers Propagated the DefaultAllow flag to individual handlers for namespaces, buckets, tables, policies, and tagging. This ensures consistent "Open by Default" behavior across all S3 Tables API endpoints. * s3api: wire up DefaultAllow for S3 Tables API initialization Updated registerS3TablesRoutes to query the global IAM configuration and set the DefaultAllow flag on the S3 Tables API server. This completes the end-to-end propagation required for anonymous access in zero-config environments. Added a SetDefaultAllow method to S3TablesApiServer to facilitate this. * s3api: fix tests by adding DefaultAllow to mock IAM integrations The IAMIntegration interface was updated to include DefaultAllow(), breaking several mock implementations in tests. This commit fixes the build errors by adding the missing method to the mocks. * env * ensure ports * env * env * fix default allow * add one more test using non-anonymous user * debug * add more debug * less logs	2026-02-16 13:59:13 -08:00
Chris Lu	e863767ac7	cleanup(iam): final removal of temporary debug logging from STS and S3 API	2026-02-14 22:15:06 -08:00
Chris Lu	cf8e383e1e	STS: Fallback to Caller Identity when RoleArn is missing in AssumeRole (#8345 ) * s3api: make RoleArn optional in AssumeRole * s3api: address PR feedback for optional RoleArn * iam: add configurable default role for AssumeRole * S3 STS: Use caller identity when RoleArn is missing - Fallback to PrincipalArn/Context in AssumeRole if RoleArn is empty - Handle User ARNs in prepareSTSCredentials - Fix PrincipalArn generation for env var credentials * Test: Add unit test for AssumeRole caller identity fallback * fix(s3api): propagate admin permissions to assumed role session when using caller identity fallback * STS: Fix is_admin propagation and optimize IAM policy evaluation for assumed roles - Restore is_admin propagation via JWT req_ctx - Optimize IsActionAllowed to skip role lookups for admin sessions - Ensure session policies are still applied for downscoping - Remove debug logging - Fix syntax errors in cleanup * fix(iam): resolve STS policy bypass for admin sessions - Fixed IsActionAllowed in iam_manager.go to correctly identify and validate internal STS tokens, ensuring session policies are enforced. - Refactored VerifyActionPermission in auth_credentials.go to properly handle session tokens and avoid legacy authorization short-circuits. - Added debug logging for better tracing of policy evaluation and session validation.	2026-02-14 22:00:59 -08:00
Chris Lu	7799915e50	Fix IAM identity loss on S3 restart migration (#8343 ) * Fix IAM reload after legacy config migration Handle legacy identity.json metadata events by reloading from the credential manager instead of parsing event content, and watch the correct /etc/iam multi-file directories so identity changes are applied. Add regression tests for legacy deletion and /etc/iam/identities change events. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix auth_credentials_subscribe_test helper to not pollute global memory store The SaveConfiguration call was affecting other tests. Use local credential manager and ReplaceS3ApiConfiguration instead. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix IAM event watching: subscribe to IAM directories and improve directory matching - Add /etc/iam and its subdirectories (identities, policies, service_accounts) to directoriesToWatch - Fix directory matching to avoid false positives from sibling directories - Use exact match or prefix with trailing slash instead of plain HasPrefix - Prevents matching hypothetical /etc/iam/identities_backup directory This ensures IAM config change events are actually delivered to the handler. * fix tests --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-02-13 22:49:27 -08:00
Chris Lu	49a64f50f1	Add session policy support to IAM (#8338 ) * Add session policy support to IAM - Implement policy evaluation for session tokens in policy_engine.go - Add session_policy field to session claims for tracking applied policies - Update STS service to include session policies in token generation - Add IAM integration tests for session policy validation - Update IAM manager to support policy attachment to sessions - Extend S3 API STS endpoint to handle session policy restrictions * fix: optimize session policy evaluation and add documentation * sts: add NormalizeSessionPolicy helper for inline session policies * sts: support inline session policies for AssumeRoleWithWebIdentity and credential-based flows * s3api: parse and normalize Policy parameter for STS HTTP handlers * tests: add session policy unit tests and integration tests for inline policy downscoping * tests: add s3tables STS inline policy integration * iam: handle user principals and validate tokens * sts: enforce inline session policy size limit * tests: harden s3tables STS integration config * iam: clarify principal policy resolution errors * tests: improve STS integration endpoint selection	2026-02-13 13:58:22 -08:00
Chris Lu	4e1065e485	Fix: preserve request body for STS signature verification (#8324 ) * Fix: preserve request body for STS signature verification - Save and restore request body in UnifiedPostHandler after ParseForm() - This allows STS handler to verify signatures correctly - Fixes 'invalid AWS signature: 53' error (ErrContentSHA256Mismatch) - ParseForm() consumes the body, so we need to restore it for downstream handlers * Improve error handling in UnifiedPostHandler - Add http.MaxBytesReader to limit body size to 10 MiB (iamRequestBodyLimit) - Add proper error handling for io.ReadAll failures - Log errors when body reading fails - Prevents DoS attacks from oversized request bodies - Addresses code review feedback	2026-02-12 13:28:12 -08:00
Chris Lu	c1a9263e37	Fix STS AssumeRole with POST body param (#8320 ) * Fix STS AssumeRole with POST body param and add integration test * Add STS integration test to CI workflow * Address code review feedback: fix HPP vulnerability and style issues * Refactor: address code review feedback - Fix HTTP Parameter Pollution vulnerability in UnifiedPostHandler - Refactor permission check logic for better readability - Extract test helpers to testutil/docker.go to reduce duplication - Clean up imports and simplify context setting * Add SigV4-style test variant for AssumeRole POST body routing - Added ActionInBodyWithSigV4Style test case to validate real-world scenario - Test confirms routing works correctly for AWS SigV4-signed requests - Addresses code review feedback about testing with SigV4 signatures * Fix: always set identity in context when non-nil - Ensure UnifiedPostHandler always calls SetIdentityInContext when identity is non-nil - Only call SetIdentityNameInContext when identity.Name is non-empty - This ensures downstream handlers (embeddedIam.DoActions) always have access to identity - Addresses potential issue where empty identity.Name would skip context setting	2026-02-12 12:04:07 -08:00
Chris Lu	be6b5db65a	s3: fix health check endpoints returning 404 for HEAD requests #8243 (#8248 ) * Fix disk errors handling in vacuum compaction When a disk reports IO errors during vacuum compaction (e.g., 'read /mnt/d1/weed/oc_xyz.dat: input/output error'), the vacuum task should signal the error to the master so it can: 1. Drop the faulty volume replica 2. Rebuild the replica from healthy copies Changes: - Add checkReadWriteError() calls in vacuum read paths (ReadNeedleBlob, ReadData, ScanVolumeFile) to flag EIO errors in volume.lastIoError - Preserve error wrapping using %w format instead of %v so EIO propagates correctly - The existing heartbeat logic will detect lastIoError and remove the bad volume Fixes issue #8237 * error * s3: fix health check endpoints returning 404 for HEAD requests #8243	2026-02-08 19:08:10 -08:00
Chris Lu	1274cf038c	s3: enforce authentication and JSON error format for Iceberg REST Catalog (#8192 ) * s3: enforce authentication and JSON error format for Iceberg REST Catalog * s3/iceberg: align error exception types with OpenAPI spec examples * s3api: refactor AuthenticateRequest to return identity object * s3/iceberg: propagate full identity object to request context * s3/iceberg: differentiate NotAuthorizedException and ForbiddenException * s3/iceberg: reject requests if authenticator is nil to prevent auth bypass * s3/iceberg: refactor Auth middleware to build context incrementally and use switch for error mapping * s3api: update misleading comment for authRequestWithAuthType * s3api: return ErrAccessDenied if IAM is not configured to prevent auth bypass * s3/iceberg: optimize context update in Auth middleware * s3api: export CanDo for external authorization use * s3/iceberg: enforce identity-based authorization in all API handlers * s3api: fix compilation errors by updating internal CanDo references * s3/iceberg: robust identity validation and consistent action usage in handlers * s3api: complete CanDo rename across tests and policy engine integration * s3api: fix integration tests by allowing admin access when auth is disabled and explicit gRPC ports * duckdb * create test bucket	2026-02-03 11:55:12 -08:00
Chris Lu	f1e27b8f30	s3: change s3 tables to use RESTful API (#8169 ) * s3: refactor s3 tables to use RESTful API * test/s3tables: guard empty namespaces * s3api: document tag parsing and validate get-table * s3api: limit S3Tables REST body size * Update weed/s3api/s3api_tables.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update weed/s3api/s3tables/handler.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * s3api: accept encoded table bucket ARNs * s3api: validate namespaces and close body * s3api: match encoded table bucket ARNs * s3api: scope table bucket ARN routes * s3api: dedupe table bucket request builders * test/s3tables: allow list tables without namespace * s3api: validate table params and tag ARN * s3api: tighten tag handling and get-table params * s3api: loosen tag ARN route matching * Fix S3 Tables REST routing and tests * Adjust S3 Tables request parsing * Gate S3 Tables target routing * Avoid double decoding namespaces --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-01-30 10:37:34 -08:00
Chris Lu	6a12438351	s3api: register S3 Tables routes in API server - Add S3 Tables route registration in s3api_server.go registerRouter method - Enable S3 Tables API operations to be routed through S3 API server - Routes handled by s3api_tables.go integration layer - Minimal changes to existing S3 API structure	2026-01-28 00:55:39 -08:00
Chris Lu	551a31e156	Implement IAM propagation to S3 servers (#8130 ) * Implement IAM propagation to S3 servers - Add PropagatingCredentialStore to propagate IAM changes to S3 servers via gRPC - Add Policy management RPCs to S3 proto and S3ApiServer - Update CredentialManager to use PropagatingCredentialStore when MasterClient is available - Wire FilerServer to enable propagation * Implement parallel IAM propagation and fix S3 cluster registration - Parallelized IAM change propagation with 10s timeout. - Refined context usage in PropagatingCredentialStore. - Added S3Type support to cluster node management. - Enabled S3 servers to register with gRPC address to the master. - Ensured IAM configuration reload after policy updates via gRPC. * Optimize IAM propagation with direct in-memory cache updates * Secure IAM propagation: Use metadata to skip persistence only on propagation * pb: refactor IAM and S3 services for unidirectional IAM propagation - Move SeaweedS3IamCache service from iam.proto to s3.proto. - Remove legacy IAM management RPCs and empty SeaweedS3 service from s3.proto. - Enforce that S3 servers only use the synchronization interface. * pb: regenerate Go code for IAM and S3 services Updated generated code following the proto refactoring of IAM synchronization services. * s3api: implement read-only mode for Embedded IAM API - Add readOnly flag to EmbeddedIamApi to reject write operations via HTTP. - Enable read-only mode by default in S3ApiServer. - Handle AccessDenied error in writeIamErrorResponse. - Embed SeaweedS3IamCacheServer in S3ApiServer. * credential: refactor PropagatingCredentialStore for unidirectional IAM flow - Update to use s3_pb.SeaweedS3IamCacheClient for propagation to S3 servers. - Propagate full Identity object via PutIdentity for consistency. - Remove redundant propagation of specific user/account/policy management RPCs. - Add timeout context for propagation calls. * s3api: implement SeaweedS3IamCacheServer for unidirectional sync - Update S3ApiServer to implement the cache synchronization gRPC interface. - Methods (PutIdentity, RemoveIdentity, etc.) now perform direct in-memory cache updates. - Register SeaweedS3IamCacheServer in command/s3.go. - Remove registration for the legacy and now empty SeaweedS3 service. * s3api: update tests for read-only IAM and propagation - Added TestEmbeddedIamReadOnly to verify rejection of write operations in read-only mode. - Update test setup to pass readOnly=false to NewEmbeddedIamApi in routing tests. - Updated EmbeddedIamApiForTest helper with read-only checks matching production behavior. * s3api: add back temporary debug logs for IAM updates Log IAM updates received via: - gRPC propagation (PutIdentity, PutPolicy, etc.) - Metadata configuration reloads (LoadS3ApiConfigurationFromCredentialManager) - Core identity management (UpsertIdentity, RemoveIdentity) * IAM: finalize propagation fix with reduced logging and clarified architecture * Allow configuring IAM read-only mode for S3 server integration tests * s3api: add defensive validation to UpsertIdentity * s3api: fix log message to reference correct IAM read-only flag * test/s3/iam: ensure WaitForS3Service checks for IAM write permissions * test: enable writable IAM in Makefile for integration tests * IAM: add GetPolicy/ListPolicies RPCs to s3.proto * S3: add GetBucketPolicy and ListBucketPolicies helpers * S3: support storing generic IAM policies in IdentityAccessManagement * S3: implement IAM policy RPCs using IdentityAccessManagement * IAM: fix stale user identity on rename propagation	2026-01-26 22:59:43 -08:00
Chris Lu	6394e2f6a5	Fix IAM OIDC role mapping and OIDC claims in trust policy (#8104 ) * Fix IAM OIDC role mapping and OIDC claims in trust policy * Address PR review: Add config safety checks and refactor tests	2026-01-23 21:35:26 -08:00
Chris Lu	f6318edbc9	Refactor Admin UI to use unified IAM storage and add MultipleFileStore (#8101 ) * Refactor Admin UI to use unified IAM storage and add MultipleFileStore * Address PR feedback: fix renames, error handling, and sync logic in FilerMultipleStore * Address refined PR feedback: safe rename order, rollback logic, and structural sync refinement * Optimize LoadConfiguration: use streaming callback for memory efficiency * Refactor UpdateUser: log rollback failures during rename * Implement PolicyManager for FilerMultipleStore * include the filer_multiple backend configuration * Implement cross-S3 synchronization and proper shutdown for all IAM backends * Extract Admin UI refactoring to a separate PR	2026-01-23 20:12:59 -08:00
Chris Lu	d664ca5ed3	fix: IAM authentication with AWS Signature V4 and environment credentials (#8099 ) * fix: IAM authentication with AWS Signature V4 and environment credentials Three key fixes for authenticated IAM requests to work: 1. Fix request body consumption before signature verification - iamMatcher was calling r.ParseForm() which consumed POST body - This broke AWS Signature V4 verification on subsequent reads - Now only check query string in matcher, preserving body for verification - File: weed/s3api/s3api_server.go 2. Preserve environment variable credentials across config reloads - After IAM mutations, config reload overwrote env var credentials - Extract env var loading into loadEnvironmentVariableCredentials() - Call after every config reload to persist credentials - File: weed/s3api/auth_credentials.go 3. Add authenticated IAM tests and test infrastructure - New TestIAMAuthenticated suite with AWS SDK + Signature V4 - Dynamic port allocation for independent test execution - Flag reset to prevent state leakage between tests - CI workflow to run S3 and IAM tests separately - Files: test/s3/example/, .github/workflows/s3-example-integration-tests.yml All tests pass: - TestIAMCreateUser (unauthenticated) - TestIAMAuthenticated (with AWS Signature V4) - S3 integration tests fmt * chore: rename test/s3/example to test/s3/normal * simplify: CI runs all integration tests in single job * Update s3-example-integration-tests.yml * ci: run each test group separately to avoid raft registry conflicts	2026-01-23 16:27:42 -08:00
Chris Lu	ee3813787e	feat(s3api): Implement S3 Policy Variables (#8039 ) * feat: Add AWS IAM Policy Variables support to S3 API Implements policy variables for dynamic access control in bucket policies. Supported variables: - aws:username - Extracted from principal ARN - aws:userid - User identifier (same as username in SeaweedFS) - aws:principaltype - IAMUser, IAMRole, or AssumedRole - jwt:* - Any JWT claim (e.g., jwt:preferred_username, jwt:sub) Key changes: - Added PolicyVariableRegex to detect ${...} patterns - Extended CompiledStatement with DynamicResourcePatterns, DynamicPrincipalPatterns, DynamicActionPatterns - Added Claims field to PolicyEvaluationArgs for JWT claim access - Implemented SubstituteVariables() for variable replacement from context and JWT claims - Implemented extractPrincipalVariables() for ARN parsing - Updated EvaluateConditions() to support variable substitution - Comprehensive unit and integration tests Resolves #8037 * feat: Add LDAP and PrincipalAccount variable support Completes future enhancements for policy variables: - Added ldap:* variable support for LDAP claims - ldap:username - LDAP username from claims - ldap:dn - LDAP distinguished name from claims - ldap:* - Any LDAP claim - Added aws:PrincipalAccount extraction from ARN - Extracts account ID from principal ARN - Available as ${aws:PrincipalAccount} in policies Updated SubstituteVariables() to check LDAP claims Updated extractPrincipalVariables() to extract account ID Added comprehensive tests for new variables * feat(s3api): implement IAM policy variables core logic and optimization * feat(s3api): integrate policy variables with S3 authentication and handlers * test(s3api): add integration tests for policy variables * cleanup: remove unused policy conversion files * Add S3 policy variables integration tests and path support - Add comprehensive integration tests for policy variables - Test username isolation, JWT claims, LDAP claims - Add support for IAM paths in principal ARN parsing - Add tests for principals with paths * Fix IAM Role principal variable extraction IAM Roles should not have aws:userid or aws:PrincipalAccount according to AWS behavior. Only IAM Users and Assumed Roles should have these variables. Fixes TestExtractPrincipalVariables test failures. * Security fixes and bug fixes for S3 policy variables SECURITY FIXES: - Prevent X-SeaweedFS-Principal header spoofing by clearing internal headers at start of authentication (auth_credentials.go) - Restrict policy variable substitution to safe allowlist to prevent client header injection (iam/policy/policy_engine.go) - Add core policy validation before storing bucket policies BUG FIXES: - Remove unused sid variable in evaluateStatement - Fix LDAP claim lookup to check both prefixed and unprefixed keys - Add ValidatePolicy call in PutBucketPolicyHandler These fixes prevent privilege escalation via header injection and ensure only validated identity claims are used in policy evaluation. * Additional security fixes and code cleanup SECURITY FIXES: - Fixed X-Forwarded-For spoofing by only trusting proxy headers from private/localhost IPs (s3_iam_middleware.go) - Changed context key from "sourceIP" to "aws:SourceIp" for proper policy variable substitution CODE IMPROVEMENTS: - Kept aws:PrincipalAccount for IAM Roles to support condition evaluations - Removed redundant STS principaltype override - Removed unused service variable - Cleaned up commented-out debug logging statements - Updated tests to reflect new IAM Role behavior These changes prevent IP spoofing attacks and ensure policy variables work correctly with the safe allowlist. * Add security documentation for ParseJWTToken Added comprehensive security comments explaining that ParseJWTToken is safe despite parsing without verification because: - It's only used for routing to the correct verification method - All code paths perform cryptographic verification before trusting claims - OIDC tokens: validated via validateExternalOIDCToken - STS tokens: validated via ValidateSessionToken Enhanced function documentation with clear security warnings about proper usage to prevent future misuse. * Fix IP condition evaluation to use aws:SourceIp key Fixed evaluateIPCondition in IAM policy engine to use "aws:SourceIp" instead of "sourceIP" to match the updated extractRequestContext. This fixes the failing IP-restricted role test where IP-based policy conditions were not being evaluated correctly. Updated all test cases to use the correct "aws:SourceIp" key. * Address code review feedback: optimize and clarify PERFORMANCE IMPROVEMENT: - Optimized expandPolicyVariables to use regexp.ReplaceAllStringFunc for single-pass variable substitution instead of iterating through all safe variables. This improves performance from O(nm) to O(m) where n is the number of safe variables and m is the pattern length. CODE CLARITY: - Added detailed comment explaining LDAP claim fallback mechanism (checks both prefixed and unprefixed keys for compatibility) - Enhanced TODO comment for trusted proxy configuration with rationale and recommendations for supporting cloud load balancers, CDNs, and complex network topologies All tests passing. Address Copilot code review feedback BUG FIXES: - Fixed type switch for int/int32/int64 - separated into individual cases since interface type switches only match the first type in multi-type cases - Fixed grammatically incorrect error message in types.go CODE QUALITY: - Removed duplicate Resource/NotResource validation (already in ValidateStatement) - Added comprehensive comment explaining isEnabled() logic and security implications - Improved trusted proxy NOTE comment to be more concise while noting limitations All tests passing. * Fix test failures after extractSourceIP security changes Updated tests to work with the security fix that only trusts X-Forwarded-For/X-Real-IP headers from private IP addresses: - Set RemoteAddr to 127.0.0.1 in tests to simulate trusted proxy - Changed context key from "sourceIP" to "aws:SourceIp" - Added test case for untrusted proxy (public RemoteAddr) - Removed invalid ValidateStatement call (validation happens in ValidatePolicy) All tests now passing. * Address remaining Gemini code review feedback CODE SAFETY: - Deep clone Action field in CompileStatement to prevent potential data races if the original policy document is modified after compilation TEST CLEANUP: - Remove debug logging (fmt.Fprintf) from engine_notresource_test.go - Remove unused imports in engine_notresource_test.go All tests passing. * Fix insecure JWT parsing in IAM auth flow SECURITY FIX: - Renamed ParseJWTToken to ParseUnverifiedJWTToken with explicit security warnings. - Refactored AuthenticateJWT to use the trusted SessionInfo returned by ValidateSessionToken instead of relying on unverified claims from the initial parse. - Refactored ValidatePresignedURLWithIAM to reuse the robust AuthenticateJWT logic, removing duplicated and insecure manual token parsing. This ensures all identity information (Role, Principal, Subject) used for authorization decisions is derived solely from cryptographically verified tokens. * Security: Fix insecure JWT claim extraction in policy engine - Refactored EvaluatePolicy to accept trusted claims from verified Identity instead of parsing unverified tokens - Updated AuthenticateJWT to populate Claims in IAMIdentity from verified sources (SessionInfo/ExternalIdentity) - Updated s3api_server and handlers to pass claims correctly - Improved isPrivateIP to support IPv6 loopback, link-local, and ULA - Fixed flaky distributed_session_consistency test with retry logic * fix(iam): populate Subject in STSSessionInfo to ensure correct identity propagation This fixes the TestS3IAMAuthentication/valid_jwt_token_authentication failure by ensuring the session subject (sub) is correctly mapped to the internal SessionInfo struct, allowing bucket ownership validation to succeed. * Optimized isPrivateIP * Create s3-policy-tests.yml * fix tests * fix tests * tests(s3/iam): simplify policy to resource-based \ (step 1) * tests(s3/iam): add explicit Deny NotResource for isolation (step 2) * fixes * policy: skip resource matching for STS trust policies to allow AssumeRole evaluation * refactor: remove debug logging and hoist policy variables for performance * test: fix TestS3IAMBucketPolicyIntegration cleanup to handle per-subtest object lifecycle * test: fix bucket name generation to comply with S3 63-char limit * test: skip TestS3IAMPolicyEnforcement until role setup is implemented * test: use weed mini for simpler test server deployment Replace 'weed server' with 'weed mini' for IAM tests to avoid port binding issues and simplify the all-in-one server deployment. This improves test reliability and execution time. * security: prevent allocation overflow in policy evaluation Add maxPoliciesForEvaluation constant to cap the number of policies evaluated in a single request. This prevents potential integer overflow when allocating slices for policy lists that may be influenced by untrusted input. Changes: - Add const maxPoliciesForEvaluation = 1024 to set an upper bound - Validate len(policies) < maxPoliciesForEvaluation before appending bucket policy - Use append() instead of make([]string, len+1) to avoid arithmetic overflow - Apply fix to both IsActionAllowed policy evaluation paths	2026-01-16 11:12:28 -08:00
Feng Shao	8abcdc6d00	use "s" flag of regexp to let . match \n (#8024 ) * use "s" flag of regexp to let . match \n the partten "/{object:.+}" cause the mux failed to match URI of object with new line char, and the request fall thru into bucket handlers. * refactor --------- Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-01-14 13:49:49 -08:00
Chris Lu	06391701ed	Add AssumeRole and AssumeRoleWithLDAPIdentity STS actions (#8003 ) * test: add integration tests for AssumeRole and AssumeRoleWithLDAPIdentity STS actions - Add s3_sts_assume_role_test.go with comprehensive tests for AssumeRole: * Parameter validation (missing RoleArn, RoleSessionName, invalid duration) * AWS SigV4 authentication with valid/invalid credentials * Temporary credential generation and usage - Add s3_sts_ldap_test.go with tests for AssumeRoleWithLDAPIdentity: * Parameter validation (missing LDAP credentials, RoleArn) * LDAP authentication scenarios (valid/invalid credentials) * Integration with LDAP server (when configured) - Update Makefile with new test targets: * test-sts: run all STS tests * test-sts-assume-role: run AssumeRole tests only * test-sts-ldap: run LDAP STS tests only * test-sts-suite: run tests with full service lifecycle - Enhance setup_all_tests.sh: * Add OpenLDAP container setup for LDAP testing * Create test LDAP users (testuser, ldapadmin) * Set LDAP environment variables for tests * Update cleanup to remove LDAP container - Fix setup_keycloak.sh: * Enable verbose error logging for realm creation * Improve error diagnostics Tests use fail-fast approach (t.Fatal) when server not configured, ensuring clear feedback when infrastructure is missing. * feat: implement AssumeRole and AssumeRoleWithLDAPIdentity STS actions Implement two new STS actions to match MinIO's STS feature set: AssumeRole Implementation: - Add handleAssumeRole with full AWS SigV4 authentication - Integrate with existing IAM infrastructure via verifyV4Signature - Validate required parameters (RoleArn, RoleSessionName) - Validate DurationSeconds (900-43200 seconds range) - Generate temporary credentials with expiration - Return AWS-compatible XML response AssumeRoleWithLDAPIdentity Implementation: - Add handleAssumeRoleWithLDAPIdentity handler (stub) - Validate LDAP-specific parameters (LDAPUsername, LDAPPassword) - Validate common STS parameters (RoleArn, RoleSessionName, DurationSeconds) - Return proper error messages for missing LDAP provider - Ready for LDAP provider integration Routing Fixes: - Add explicit routes for AssumeRole and AssumeRoleWithLDAPIdentity - Prevent IAM handler from intercepting authenticated STS requests - Ensure proper request routing priority Handler Infrastructure: - Add IAM field to STSHandlers for SigV4 verification - Update NewSTSHandlers to accept IAM reference - Add STS-specific error codes and response types - Implement writeSTSErrorResponse for AWS-compatible errors The AssumeRole action is fully functional and tested. AssumeRoleWithLDAPIdentity requires LDAP provider implementation. * fix: update IAM matcher to exclude STS actions from interception Update the IAM handler matcher to check for STS actions (AssumeRole, AssumeRoleWithWebIdentity, AssumeRoleWithLDAPIdentity) and exclude them from IAM handler processing. This allows STS requests to be handled by the STS fallback handler even when they include AWS SigV4 authentication. The matcher now parses the form data to check the Action parameter and returns false for STS actions, ensuring they are routed to the correct handler. Note: This is a work-in-progress fix. Tests are still showing some routing issues that need further investigation. * fix: address PR review security issues for STS handlers This commit addresses all critical security issues from PR review: Security Fixes: - Use crypto/rand for cryptographically secure credential generation instead of time.Now().UnixNano() (fixes predictable credentials) - Add sts:AssumeRole permission check via VerifyActionPermission to prevent unauthorized role assumption - Generate proper session tokens using crypto/rand instead of placeholder strings Code Quality Improvements: - Refactor DurationSeconds parsing into reusable parseDurationSeconds() helper function used by all three STS handlers - Create generateSecureCredentials() helper for consistent and secure temporary credential generation - Fix iamMatcher to check query string as fallback when Action not found in form data LDAP Provider Implementation: - Add go-ldap/ldap/v3 dependency - Create LDAPProvider implementing IdentityProvider interface with full LDAP authentication support (connect, bind, search, groups) - Update ProviderFactory to create real LDAP providers - Wire LDAP provider into AssumeRoleWithLDAPIdentity handler Test Infrastructure: - Add LDAP user creation verification step in setup_all_tests.sh * fix: address PR feedback (Round 2) - config validation & provider improvements - Implement `validateLDAPConfig` in `ProviderFactory` - Improve `LDAPProvider.Initialize`: - Support `connectionTimeout` parsing (string/int/float) from config map - Warn if `BindDN` is present but `BindPassword` is empty - Improve `LDAPProvider.GetUserInfo`: - Add fallback to `searchUserGroups` if `memberOf` returns no groups (consistent with Authenticate) * fix: address PR feedback (Round 3) - LDAP connection improvements & build fix - Improve `LDAPProvider` connection handling: - Use `net.Dialer` with configured timeout for connection establishment - Enforce TLS 1.2+ (`MinVersion: tls.VersionTLS12`) for both LDAPS and StartTLS - Fix build error in `s3api_sts.go` (format verb for ErrorCode) * fix: address PR feedback (Round 4) - LDAP hardening, Authz check & Routing fix - LDAP Provider Hardening: - Prevent re-initialization - Enforce single user match in `GetUserInfo` (was explicit only in Authenticate) - Ensure connection closure if StartTLS fails - STS Handlers: - Add robust provider detection using type assertion - Security: Implement authorization check (`VerifyActionPermission`) after LDAP authentication - Routing: - Update tests to reflect that STS actions are handled by STS handler, not generic IAM * fix: address PR feedback (Round 5) - JWT tokens, ARN formatting, PrincipalArn CRITICAL FIXES: - Replace standalone credential generation with STS service JWT tokens - handleAssumeRole now generates proper JWT session tokens - handleAssumeRoleWithLDAPIdentity now generates proper JWT session tokens - Session tokens can be validated across distributed instances - Fix ARN formatting in responses - Extract role name from ARN using utils.ExtractRoleNameFromArn() - Prevents malformed ARNs like "arn:aws:sts::assumed-role/arn:aws:iam::..." - Add configurable AccountId for federated users - Add AccountId field to STSConfig (defaults to "111122223333") - PrincipalArn now uses configured account ID instead of hardcoded "aws" - Enables proper trust policy validation IMPROVEMENTS: - Sanitize LDAP authentication error messages (don't leak internal details) - Remove duplicate comment in provider detection - Add utils import for ARN parsing utilities * feat: implement LDAP connection pooling to prevent resource exhaustion PERFORMANCE IMPROVEMENT: - Add connection pool to LDAPProvider (default size: 10 connections) - Reuse LDAP connections across authentication requests - Prevent file descriptor exhaustion under high load IMPLEMENTATION: - connectionPool struct with channel-based connection management - getConnection(): retrieves from pool or creates new connection - returnConnection(): returns healthy connections to pool - createConnection(): establishes new LDAP connection with TLS support - Close(): cleanup method to close all pooled connections - Connection health checking (IsClosing()) before reuse BENEFITS: - Reduced connection overhead (no TCP handshake per request) - Better resource utilization under load - Prevents "too many open files" errors - Non-blocking pool operations (creates new conn if pool empty) * fix: correct TokenGenerator access in STS handlers CRITICAL FIX: - Make TokenGenerator public in STSService (was private tokenGenerator) - Update all references from Config.TokenGenerator to TokenGenerator - Remove TokenGenerator from STSConfig (it belongs in STSService) This fixes the "NotImplemented" errors in distributed and Keycloak tests. The issue was that Round 5 changes tried to access Config.TokenGenerator which didn't exist - TokenGenerator is a field in STSService, not STSConfig. The TokenGenerator is properly initialized in STSService.Initialize() and is now accessible for JWT token generation in AssumeRole handlers. * fix: update tests to use public TokenGenerator field Following the change to make TokenGenerator public in STSService, this commit updates the test files to reference the correct public field name. This resolves compilation errors in the IAM STS test suite. * fix: update distributed tests to use valid Keycloak users Updated s3_iam_distributed_test.go to use 'admin-user' and 'read-user' which exist in the standard Keycloak setup provided by setup_keycloak.sh. This resolves 'unknown test user' errors in distributed integration tests. * fix: ensure iam_config.json exists in setup target for CI The GitHub Actions workflow calls 'make setup' which was not creating iam_config.json, causing the server to start without IAM integration enabled (iamIntegration = nil), resulting in NotImplemented errors. Now 'make setup' copies iam_config.local.json to iam_config.json if it doesn't exist, ensuring IAM is properly configured in CI. * fix(iam/ldap): fix connection pool race and rebind corruption - Add atomic 'closed' flag to connection pool to prevent racing on Close() - Rebind authenticated user connections back to service account before returning to pool - Close connections on error instead of returning potentially corrupted state to pool * fix(iam/ldap): populate standard TokenClaims fields in ValidateToken - Set Subject, Issuer, Audience, IssuedAt, and ExpiresAt to satisfy the interface - Use time.Time for timestamps as required by TokenClaims struct - Default to 1 hour TTL for LDAP tokens * fix(s3api): include account ID in STS AssumedRoleUser ARN - Consistent with AWS, include the account ID in the assumed-role ARN - Use the configured account ID from STS service if available, otherwise default to '111122223333' - Apply to both AssumeRole and AssumeRoleWithLDAPIdentity handlers - Also update .gitignore to ignore IAM test environment files * refactor(s3api): extract shared STS credential generation logic - Move common logic for session claims and credential generation to prepareSTSCredentials - Update handleAssumeRole and handleAssumeRoleWithLDAPIdentity to use the helper - Remove stale comments referencing outdated line numbers * feat(iam/ldap): make pool size configurable and add audience support - Add PoolSize to LDAPConfig (default 10) - Add Audience to LDAPConfig to align with OIDC validation - Update initialization and ValidateToken to use new fields * update tests * debug * chore(iam): cleanup debug prints and fix test config port * refactor(iam): use mapstructure for LDAP config parsing * feat(sts): implement strict trust policy validation for AssumeRole * test(iam): refactor STS tests to use AWS SDK signer * test(s3api): implement ValidateTrustPolicyForPrincipal in MockIAMIntegration * fix(s3api): ensure IAM matcher checks query string on ParseForm error * fix(sts): use crypto/rand for secure credentials and extract constants * fix(iam): fix ldap connection leaks and add insecure warning * chore(iam): improved error wrapping and test parameterization * feat(sts): add support for LDAPProviderName parameter * Update weed/iam/ldap/ldap_provider.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update weed/s3api/s3api_sts.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix(sts): use STSErrSTSNotReady when LDAP provider is missing * fix(sts): encapsulate TokenGenerator in STSService and add getter --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-01-12 10:45:24 -08:00
Chris Lu	e3db95e0c1	Fix: Route unauthenticated specific STS requests to STS handler correctly (#7920 ) * Fix STS Access Denied for AssumeRoleWithWebIdentity (Issue #7917) * Fix logging regression: ensure IAM status is logged even if STS is enabled * Address PR feedback: fix duplicate log, clarify comments, add comprehensive routing tests * Add edge case test: authenticated STS action routes to IAM (auth precedence)	2025-12-30 16:34:05 -08:00
Chris Lu	7a18c3a16f	Fix critical authentication bypass vulnerability (#7912 ) (#7915 ) * Fix critical authentication bypass vulnerability (#7912) The isRequestPostPolicySignatureV4() function was incorrectly returning true for ANY POST request with multipart/form-data content type, causing all such requests to bypass authentication in authRequest(). This allowed unauthenticated access to S3 API endpoints, as reported in issue #7912 where any credentials (or no credentials) were accepted. The fix removes isRequestPostPolicySignatureV4() entirely, preventing authTypePostPolicy from ever being set. PostPolicy signature verification is still properly handled in PostPolicyBucketHandler via doesPolicySignatureMatch(). Fixes #7912 * add AuthPostPolicy * refactor * Optimizing Auth Credentials * Update auth_credentials.go * Update auth_credentials.go	2025-12-30 12:40:59 -08:00
Chris Lu	ae9a943ef6	IAM: Add Service Account Support (#7744 ) (#7901 ) * iam: add ServiceAccount protobuf schema Add ServiceAccount message type to iam.proto with support for: - Unique ID and parent user linkage - Optional expiration timestamp - Separate credentials (access key/secret) - Action restrictions (subset of parent) - Enable/disable status This is the first step toward implementing issue #7744 (IAM Service Account Support). * iam: add service account response types Add IAM API response types for service account operations: - ServiceAccountInfo struct for marshaling account details - CreateServiceAccountResponse - DeleteServiceAccountResponse - ListServiceAccountsResponse - GetServiceAccountResponse - UpdateServiceAccountResponse Also add type aliases in iamapi package for backwards compatibility. Part of issue #7744 (IAM Service Account Support). * iam: implement service account API handlers Add CRUD operations for service accounts: - CreateServiceAccount: Creates service account with ABIA key prefix - DeleteServiceAccount: Removes service account and parent linkage - ListServiceAccounts: Lists all or filtered by parent user - GetServiceAccount: Retrieves service account details - UpdateServiceAccount: Modifies status, description, expiration Service accounts inherit parent user's actions by default and support optional expiration timestamps. Part of issue #7744 (IAM Service Account Support). * sts: add AssumeRoleWithWebIdentity HTTP endpoint Add STS API HTTP endpoint for AWS SDK compatibility: - Create s3api_sts.go with HTTP handlers matching AWS STS spec - Support AssumeRoleWithWebIdentity action with JWT token - Return XML response with temporary credentials (AccessKeyId, SecretAccessKey, SessionToken) matching AWS format - Register STS route at POST /?Action=AssumeRoleWithWebIdentity This enables AWS SDKs (boto3, AWS CLI, etc.) to obtain temporary S3 credentials using OIDC/JWT tokens. Part of issue #7744 (IAM Service Account Support). * test: add service account and STS integration tests Add integration tests for new IAM features: s3_service_account_test.go: - TestServiceAccountLifecycle: Create, Get, List, Update, Delete - TestServiceAccountValidation: Error handling for missing params s3_sts_test.go: - TestAssumeRoleWithWebIdentityValidation: Parameter validation - TestAssumeRoleWithWebIdentityWithMockJWT: JWT token handling Tests skip gracefully when SeaweedFS is not running or when IAM features are not configured. Part of issue #7744 (IAM Service Account Support). * iam: address code review comments - Add constants for service account ID and key lengths - Use strconv.ParseInt instead of fmt.Sscanf for better error handling - Allow clearing descriptions by checking key existence in url.Values - Replace magic numbers (12, 20, 40) with named constants Addresses review comments from gemini-code-assist[bot] * test: add proper error handling in service account tests Use require.NoError(t, err) for io.ReadAll and xml.Unmarshal to prevent silent failures and ensure test reliability. Addresses review comment from gemini-code-assist[bot] * test: add proper error handling in STS tests Use require.NoError(t, err) for io.ReadAll and xml.Unmarshal to prevent silent failures and ensure test reliability. Repeated this fix throughout the file. Addresses review comment from gemini-code-assist[bot] in PR #7901. * iam: address additional code review comments - Specific error code mapping for STS service errors - Distinguish between Sender and Receiver error types in STS responses - Add nil checks for credentials in List/GetServiceAccount - Validate expiration date is in the future - Improve integration test error messages (include response body) - Add credential verification step in service account tests Addresses remaining review comments from gemini-code-assist[bot] across multiple files. * iam: fix shared slice reference in service account creation Copy parent's actions to create an independent slice for the service account instead of sharing the underlying array. This prevents unexpected mutations when the parent's actions are modified later. Addresses review comment from coderabbitai[bot] in PR #7901. * iam: remove duplicate unused constant Removed redundant iamServiceAccountKeyPrefix as ServiceAccountKeyPrefix is already defined and used. Addresses remaining cleanup task. * sts: document limitation of string-based error mapping Added TODO comment explaining that the current string-based error mapping approach is fragile and should be replaced with typed errors from the STS service in a future refactoring. This addresses the architectural concern raised in code review while deferring the actual implementation to a separate PR to avoid scope creep in the current service account feature addition. * iam: fix remaining review issues - Add future-date validation for expiration in UpdateServiceAccount - Reorder tests so credential verification happens before deletion - Fix compilation error by using correct JWT generation methods Addresses final review comments from coderabbitai[bot]. * iam: fix service account access key length The access key IDs were incorrectly generated with 24 characters instead of the AWS-standard 20 characters. This was caused by generating 20 random characters and then prepending the 4-character ABIA prefix. Fixed by subtracting the prefix length from AccessKeyLength, so the final key is: ABIA (4 chars) + random (16 chars) = 20 chars total. This ensures compatibility with S3 clients that validate key length. * test: add comprehensive service account security tests Added comprehensive integration tests for service account functionality: - TestServiceAccountS3Access: Verify SA credentials work for S3 operations - TestServiceAccountExpiration: Test expiration date validation and enforcement - TestServiceAccountInheritedPermissions: Verify parent-child relationship - TestServiceAccountAccessKeyFormat: Validate AWS-compatible key format (ABIA prefix, 20 char length) These tests ensure SeaweedFS service accounts are compatible with AWS conventions and provide robust security coverage. * iam: remove unused UserAccessKeyPrefix constant Code cleanup to remove unused constants. * iam: remove unused iamCommonResponse type alias Code cleanup to remove unused type aliases. * iam: restore and use UserAccessKeyPrefix constant Restored UserAccessKeyPrefix constant and updated s3api tests to use it instead of hardcoded strings for better maintainability and consistency. * test: improve error handling in service account security tests Added explicit error checking for io.ReadAll and xml.Unmarshal in TestServiceAccountExpiration to ensure failures are reported correctly and cleanup is performed only when appropriate. Also added logging for failed responses. * test: use t.Cleanup for reliable resource cleanup Replaced defer with t.Cleanup to ensure service account cleanup runs even when require.NoError fails. Also switched from manual error checking to require.NoError for more idiomatic testify usage. * iam: add CreatedBy field and optimize identity lookups - Added createdBy parameter to CreateServiceAccount to track who created each service account - Extract creator identity from request context using GetIdentityNameFromContext - Populate created_by field in ServiceAccount protobuf - Added findIdentityByName helper function to optimize identity lookups - Replaced nested loops with O(n) helper function calls in CreateServiceAccount and DeleteServiceAccount This addresses code review feedback for better auditing and performance. * iam: prevent user deletion when service accounts exist Following AWS IAM behavior, prevent deletion of users that have active service accounts. This ensures explicit cleanup and prevents orphaned service account resources with invalid ParentUser references. Users must delete all associated service accounts before deleting the parent user, providing safer resource management. * sts: enhance TODO with typed error implementation guidance Updated TODO comment with detailed implementation approach for replacing string-based error matching with typed errors using errors.Is(). This provides a clear roadmap for a follow-up PR to improve error handling robustness and maintainability. * iam: add operational limits for service account creation Added AWS IAM-compatible safeguards to prevent resource exhaustion: - Maximum 100 service accounts per user (LimitExceededException) - Maximum 1000 character description length (InvalidInputException) These limits prevent accidental or malicious resource exhaustion while not impacting legitimate use cases. * iam: add missing operational limit constants Added MaxServiceAccountsPerUser and MaxDescriptionLength constants that were referenced in the previous commit but not defined. * iam: enforce service account expiration during authentication CRITICAL SECURITY FIX: Expired service account credentials were not being rejected during authentication, allowing continued access after expiration. Changes: - Added Expiration field to Credential struct - Populate expiration when loading service accounts from configuration - Check expiration in all authentication paths (V2 and V4 signatures) - Return ErrExpiredToken for expired credentials This ensures expired service accounts are properly rejected at authentication time, matching AWS IAM behavior and preventing unauthorized access. * iam: fix error code for expired service account credentials Use ErrAccessDenied instead of non-existent ErrExpiredToken for expired service account credentials. This provides appropriate access denial for expired credentials while maintaining AWS-compatible error responses. * iam: fix remaining ErrExpiredToken references Replace all remaining instances of non-existent ErrExpiredToken with ErrAccessDenied for expired service account credentials. * iam: apply AWS-standard key format to user access keys Updated CreateAccessKey to generate AWS-standard 20-character access keys with AKIA prefix for regular users, matching the format used for service accounts. This ensures consistency across all access key types and full AWS compatibility. - Access keys: AKIA + 16 random chars = 20 total (was 21 chars, no prefix) - Secret keys: 40 random chars (was 42, now matches AWS standard) - Uses AccessKeyLength and UserAccessKeyPrefix constants * sts: replace fragile string-based error matching with typed errors Implemented robust error handling using typed errors and errors.Is() instead of fragile strings.Contains() matching. This decouples the HTTP layer from service implementation details and prevents errors from being miscategorized if error messages change. Changes: - Added typed error variables to weed/iam/sts/constants.go: * ErrTypedTokenExpired * ErrTypedInvalidToken * ErrTypedInvalidIssuer * ErrTypedInvalidAudience * ErrTypedMissingClaims - Updated STS service to wrap provider authentication errors with typed errors - Replaced strings.Contains() with errors.Is() in HTTP layer for error checking - Removed TODO comment as the improvement is now implemented This makes error handling more maintainable and reliable. * sts: eliminate all string-based error matching with provider-level typed errors Completed the typed error implementation by adding provider-level typed errors and updating provider implementations to return them. This eliminates ALL fragile string matching throughout the entire error handling stack. Changes: - Added typed error definitions to weed/iam/providers/errors.go: * ErrProviderTokenExpired * ErrProviderInvalidToken * ErrProviderInvalidIssuer * ErrProviderInvalidAudience * ErrProviderMissingClaims - Updated OIDC provider to wrap JWT validation errors with typed provider errors - Replaced strings.Contains() with errors.Is() in STS service for error mapping - Complete error chain: Provider -> STS -> HTTP layer, all using errors.Is() This provides: - Reliable error classification independent of error message content - Type-safe error checking throughout the stack - No order-dependent string matching - Maintainable error handling that won't break with message changes * oidc: use jwt.ErrTokenExpired instead of string matching Replaced the last remaining string-based error check with the JWT library's exported typed error. This makes the error detection independent of error message content and more robust against library updates. Changed from: strings.Contains(errMsg, "expired") To: errors.Is(err, jwt.ErrTokenExpired) This completes the elimination of ALL string-based error matching throughout the entire authentication stack. * iam: add description length validation to UpdateServiceAccount Fixed inconsistency where UpdateServiceAccount didn't validate description length against MaxDescriptionLength, allowing operational limits to be bypassed during updates. Now validates that updated descriptions don't exceed 1000 characters, matching the validation in CreateServiceAccount. * iam: refactor expiration check into helper method Extracted duplicated credential expiration check logic into a helper method to reduce code duplication and improve maintainability. Added Credential.isCredentialExpired() method and replaced 5 instances of inline expiration checks across auth_signature_v2.go and auth_signature_v4.go. * iam: address critical Copilot security and consistency feedback Fixed three critical issues identified by Copilot code review: 1. SECURITY: Prevent loading disabled service account credentials - Added check to skip disabled service accounts during credential loading - Disabled accounts can no longer authenticate 2. Add DurationSeconds validation for STS AssumeRoleWithWebIdentity - Enforce AWS-compatible range: 900-43200 seconds (15 min - 12 hours) - Returns proper error for out-of-range values 3. Fix expiration update consistency in UpdateServiceAccount - Added key existence check like Description field - Allows explicit clearing of expiration by setting to empty string - Distinguishes between "not updating" and "clearing expiration" * sts: remove unused durationSecondsStr variable Fixed build error from unused variable after refactoring duration parsing. * iam: address remaining Copilot feedback and remove dead code Completed remaining Copilot code review items: 1. Remove unused getPermission() method (dead code) - Method was defined but never called anywhere 2. Improve slice modification safety in DeleteServiceAccount - Replaced append-with-slice-operations with filter pattern - Avoids potential issues from mutating slice during iteration 3. Fix route registration order - Moved STS route registration BEFORE IAM route - Prevents IAM route from intercepting STS requests - More specific route (with query parameter) now registered first * iam: improve expiration validation and test cleanup robustness Addressed additional Copilot feedback: 1. Make expiration validation more explicit - Added explicit check for negative values - Added comment clarifying that 0 is allowed to clear expiration - Improves code readability and intent 2. Fix test cleanup order in s3_service_account_test.go - Track created service accounts in a slice - Delete all service accounts before deleting parent user - Prevents DeleteConflictException during cleanup - More robust cleanup even if test fails mid-execution Note: s3_service_account_security_test.go already had correct cleanup order due to LIFO defer execution. * test: remove redundant variable assignments Removed duplicate assignments of createdSAId, createdAccessKeyId, and createdSecretAccessKey on lines 148-150 that were already assigned on lines 132-134.	2025-12-29 20:17:23 -08:00
Chris Lu	8d6bcddf60	Add S3 volume encryption support with -s3.encryptVolumeData flag (#7890 ) * Add S3 volume encryption support with -s3.encryptVolumeData flag This change adds volume-level encryption support for S3 uploads, similar to the existing -filer.encryptVolumeData option. Each chunk is encrypted with its own auto-generated CipherKey when the flag is enabled. Changes: - Add -s3.encryptVolumeData flag to weed s3, weed server, and weed mini - Wire Cipher option through S3ApiServer and ChunkedUploadOption - Add integration tests for multi-chunk range reads with encryption - Tests verify encryption works across chunk boundaries Usage: weed s3 -encryptVolumeData weed server -s3 -s3.encryptVolumeData weed mini -s3.encryptVolumeData Integration tests: go test -v -tags=integration -timeout 5m ./test/s3/sse/... * Add GitHub Actions CI for S3 volume encryption tests - Add test-volume-encryption target to Makefile that starts server with -s3.encryptVolumeData - Add s3-volume-encryption job to GitHub Actions workflow - Tests run with integration build tag and 10m timeout - Server logs uploaded on failure for debugging * Fix S3 client credentials to use environment variables The test was using hardcoded credentials "any"/"any" but the Makefile sets AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY to "some_access_key1"/ "some_secret_key1". Updated getS3Client() to read from environment variables with fallback to "any"/"any" for manual testing. * Change bucket creation errors from skip to fatal Tests should fail, not skip, when bucket creation fails. This ensures that credential mismatches and other configuration issues are caught rather than silently skipped. * Make copy and multipart test jobs fail instead of succeed Changed exit 0 to exit 1 for s3-sse-copy-operations and s3-sse-multipart jobs. These jobs document known limitations but should fail to ensure the issues are tracked and addressed, not silently ignored. * Hardcode S3 credentials to match Makefile Changed from environment variables to hardcoded credentials "some_access_key1"/"some_secret_key1" to match the Makefile configuration. This ensures tests work reliably. * fix Double Encryption * fix Chunk Size Mismatch * Added IsCompressed * is gzipped * fix copying * only perform HEAD request when len(cipherKey) > 0 * Revert "Make copy and multipart test jobs fail instead of succeed" This reverts commit `bc34a7eb3c`. * fix security vulnerability * fix security * Update s3api_object_handlers_copy.go * Update s3api_object_handlers_copy.go * jwt to get content length	2025-12-27 00:09:14 -08:00
G-OD	504b258258	s3: fix remote object not caching (#7790 ) * s3: fix remote object not caching * s3: address review comments for remote object caching - Fix leading slash in object name by using strings.TrimPrefix - Return cached entry from CacheRemoteObjectToLocalCluster to get updated local chunk locations - Reuse existing helper function instead of inline gRPC call * s3/filer: add singleflight deduplication for remote object caching - Add singleflight.Group to FilerServer to deduplicate concurrent cache operations - Wrap CacheRemoteObjectToLocalCluster with singleflight to ensure only one caching operation runs per object when multiple clients request the same file - Add early-return check for already-cached objects - S3 API calls filer gRPC with timeout and graceful fallback on error - Clear negative bucket cache when bucket is created via weed shell - Add integration tests for remote cache with singleflight deduplication This benefits all clients (S3, HTTP, Hadoop) accessing remote-mounted objects by preventing redundant cache operations and improving concurrent access performance. Fixes: https://github.com/seaweedfs/seaweedfs/discussions/7599 * fix: data race in concurrent remote object caching - Add mutex to protect chunks slice from concurrent append - Add mutex to protect fetchAndWriteErr from concurrent read/write - Fix incorrect error check (was checking assignResult.Error instead of parseErr) - Rename inner variable to avoid shadowing fetchAndWriteErr * fix: address code review comments - Remove duplicate remote caching block in GetObjectHandler, keep only singleflight version - Add mutex protection for concurrent chunk slice and error access (data race fix) - Use lazy initialization for S3 client in tests to avoid panic during package load - Fix markdown linting: add language specifier to code fence, blank lines around tables - Add 'all' target to Makefile as alias for test-with-server - Remove unused 'util' import * style: remove emojis from test files * fix: add defensive checks and sort chunks by offset - Add nil check and type assertion check for singleflight result - Sort chunks by offset after concurrent fetching to maintain file order * fix: improve test diagnostics and path normalization - runWeedShell now returns error for better test diagnostics - Add all targets to .PHONY in Makefile (logs-primary, logs-remote, health) - Strip leading slash from normalizedObject to avoid double slashes in path --------- Co-authored-by: chrislu <chris.lu@gmail.com> Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>	2025-12-16 12:41:04 -08:00
Chris Lu	f5c666052e	feat: add S3 bucket size and object count metrics (#7776 ) * feat: add S3 bucket size and object count metrics Adds periodic collection of bucket size metrics: - SeaweedFS_s3_bucket_size_bytes: logical size (deduplicated across replicas) - SeaweedFS_s3_bucket_physical_size_bytes: physical size (including replicas) - SeaweedFS_s3_bucket_object_count: object count (deduplicated) Collection runs every 1 minute via background goroutine that queries filer Statistics RPC for each bucket's collection. Also adds Grafana dashboard panels for: - S3 Bucket Size (logical vs physical) - S3 Bucket Object Count * address PR comments: fix bucket size metrics collection 1. Fix collectCollectionInfoFromMaster to use master VolumeList API - Now properly queries master for topology info - Uses WithMasterClient to get volume list from master - Correctly calculates logical vs physical size based on replication 2. Return error when filerClient is nil to trigger fallback - Changed from 'return nil, nil' to 'return nil, error' - Ensures fallback to filer stats is properly triggered 3. Implement pagination in listBucketNames - Added listBucketPageSize constant (1000) - Uses StartFromFileName for pagination - Continues fetching until fewer entries than limit returned 4. Handle NewReplicaPlacementFromByte error and prevent division by zero - Check error return from NewReplicaPlacementFromByte - Default to 1 copy if error occurs - Add explicit check for copyCount == 0 * simplify bucket size metrics: remove filer fallback, align with quota enforcement - Remove fallback to filer Statistics RPC - Use only master topology for collection info (same as s3.bucket.quota.enforce) - Updated comments to clarify this runs the same collection logic as quota enforcement - Simplified code by removing collectBucketSizeFromFilerStats * use s3a.option.Masters directly instead of querying filer * address PR comments: fix dashboard overlaps and improve metrics collection Grafana dashboard fixes: - Fix overlapping panels 55 and 59 in grafana_seaweedfs.json (moved 59 to y=30) - Fix grid collision in k8s dashboard (moved panel 72 to y=48) - Aggregate bucket metrics with max() by (bucket) for multi-instance S3 gateways Go code improvements: - Add graceful shutdown support via context cancellation - Use ticker instead of time.Sleep for better shutdown responsiveness - Distinguish EOF from actual errors in stream handling * improve bucket size metrics: multi-master failover and proper error handling - Initial delay now respects context cancellation using select with time.After - Use WithOneOfGrpcMasterClients for multi-master failover instead of hardcoding Masters[0] - Properly propagate stream errors instead of just logging them (EOF vs real errors) * improve bucket size metrics: distributed lock and volume ID deduplication - Add distributed lock (LiveLock) so only one S3 instance collects metrics at a time - Add IsLocked() method to LiveLock for checking lock status - Fix deduplication: use volume ID tracking instead of dividing by copyCount - Previous approach gave wrong results if replicas were missing - Now tracks seen volume IDs and counts each volume only once - Physical size still includes all replicas for accurate disk usage reporting * rename lock to s3.leader * simplify: remove StartBucketSizeMetricsCollection wrapper function * fix data race: use atomic operations for LiveLock.isLocked field - Change isLocked from bool to int32 - Use atomic.LoadInt32/StoreInt32 for all reads/writes - Sync shared isLocked field in StartLongLivedLock goroutine * add nil check for topology info to prevent panic * fix bucket metrics: use Ticker for consistent intervals, fix pagination logic - Use time.Ticker instead of time.After for consistent interval execution - Fix pagination: count all entries (not just directories) for proper termination - Update lastFileName for all entries to prevent pagination issues * address PR comments: remove redundant atomic store, propagate context - Remove redundant atomic.StoreInt32 in StartLongLivedLock (AttemptToLock already sets it) - Propagate context through metrics collection for proper cancellation on shutdown - collectAndUpdateBucketSizeMetrics now accepts ctx - collectCollectionInfoFromMaster uses ctx for VolumeList RPC - listBucketNames uses ctx for ListEntries RPC	2025-12-15 19:23:25 -08:00

1 2 3 4

158 Commits