seaweedfs/weed
Chris Lu 4ded97a321 feat(iam): OIDC provider store + read-only IAM API (Phase 2a) (#9319)
* feat(iam): STS web-identity AWS-fidelity polish

- OIDC discovery via .well-known/openid-configuration; falls back to
  /.well-known/jwks.json when discovery is absent. Reject discovery docs
  whose issuer claim does not match the configured issuer to defend
  against issuer-substitution.
- ComputeParentUser derives a stable per-identity hash from (sub, iss).
  Surface as aws:userid in the request context and as a parent_user
  claim in the session JWT so per-user state survives token rotation.
- Per-role MaxSessionDuration (3600..43200) clamps requested
  DurationSeconds before the STS service applies its own caps.
- Tighten RoleSessionName to the AWS contract: 2..64 chars from
  [\w+=,.@-].
- Populate PackedPolicySize in AssumeRole / AssumeRoleWithWebIdentity /
  AssumeRoleWithLDAPIdentity responses as a percentage of the 2048-byte
  inline session policy budget.
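
A minimal sketch of three of the checks listed above. The helper names,
the SHA-256 choice for the parent-user hash, and the integer rounding of
the percentage are assumptions for illustration; only the character
class, the 2..64 length bound, the (sub, iss) inputs, and the 2048-byte
budget come from this commit message.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"regexp"
)

// roleSessionNameRe encodes the contract quoted above: 2..64 characters
// drawn from [\w+=,.@-]. In Go's regexp, \w is ASCII-only.
var roleSessionNameRe = regexp.MustCompile(`^[\w+=,.@-]{2,64}$`)

// validRoleSessionName is a hypothetical helper name.
func validRoleSessionName(name string) bool {
	return roleSessionNameRe.MatchString(name)
}

// sessionPolicyBudgetBytes is the inline session policy budget named above.
const sessionPolicyBudgetBytes = 2048

// packedPolicySizePercent reports the policy size as a percentage of the
// budget. Truncating integer division is an assumption of this sketch.
func packedPolicySizePercent(policy string) int {
	return len(policy) * 100 / sessionPolicyBudgetBytes
}

// parentUserSketch derives a stable identifier from (sub, iss).
// Assumption: SHA-256 over sub, a NUL separator, then iss, hex-encoded.
// The real ComputeParentUser may use a different hash or encoding.
func parentUserSketch(sub, iss string) string {
	h := sha256.New()
	h.Write([]byte(sub))
	h.Write([]byte{0}) // separator so ("ab","c") and ("a","bc") differ
	h.Write([]byte(iss))
	return hex.EncodeToString(h.Sum(nil))
}
```

Because the hash depends only on (sub, iss), it stays the same across
token rotation, which is the property the parent_user claim relies on.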

* fix(iam): leave omitted DurationSeconds nil so STS default applies

capDurationByRole was substituting the role's MaxSessionDuration
when the caller omitted DurationSeconds entirely. AWS returns the
configured default (typically 1 hour) in that case, not the role's
upper bound — a 12h MaxSessionDuration shouldn't silently make every
no-duration assume-role mint a 12h session.

Return nil when requested is nil; let the downstream
calculateSessionDuration in the STS service apply its TokenDuration
default. The role-max upper bound still clamps when the request
arrives with a concrete value above the cap.
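
The intended behaviour can be sketched as below. The function name and
signature are hypothetical; the actual capDurationByRole may take
different parameters.

```go
package main

// capDurationSketch mirrors the behaviour described above:
//   - requested == nil  -> return nil so the STS default applies downstream
//   - requested > max   -> clamp to the role's MaxSessionDuration
//   - otherwise         -> pass the requested value through unchanged
//
// Treating roleMaxSeconds <= 0 as "no per-role cap" is an assumption.
func capDurationSketch(requested *int64, roleMaxSeconds int64) *int64 {
	if requested == nil {
		return nil
	}
	if roleMaxSeconds > 0 && *requested > roleMaxSeconds {
		capped := roleMaxSeconds // copy so the caller's value is not mutated
		return &capped
	}
	return requested
}
```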

Addresses gemini high-priority review on PR #9318.

* fix(iam): synchronize OIDCProvider JWKS cache fields

jwksCache, jwksFetchedAt, resolvedJWKSUri, and discoveryFailed are
mutated lazily on the first token-validate call and refreshed
afterwards on TTL expiry. Multiple S3 requests can land here in
parallel, so those writes raced with concurrent reads on other
goroutines. resolvedJWKSUri/discoveryFailed inherited the same
unprotected pattern when discovery shipped.

Add sync.RWMutex; getPublicKey takes the read lock for the
common cache-hit path and promotes to the write lock for misses
+ refreshes. fetchJWKSLocked / resolveJWKSUriLocked assume the
write lock is held by the caller; fetchJWKS keeps the
test-friendly entry point that acquires the lock itself.
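
A sketch of the locking pattern, with a hypothetical keyCache type and an
injected fetch function standing in for the real JWKS fetch. Note that
sync.RWMutex cannot upgrade a read lock in place, so "promotes" here
means releasing the read lock, taking the write lock, and re-checking.

```go
package main

import (
	"sync"
	"time"
)

// keyCache is a hypothetical stand-in for the OIDCProvider cache fields.
type keyCache struct {
	mu        sync.RWMutex
	keys      map[string]string // kid -> key material (simplified)
	fetchedAt time.Time
	ttl       time.Duration
	fetch     func() (map[string]string, error)
}

// get serves cache hits under the read lock and refreshes under the
// write lock when the cache is empty or older than ttl.
func (c *keyCache) get(kid string) (string, bool, error) {
	c.mu.RLock()
	if c.keys != nil && time.Since(c.fetchedAt) < c.ttl {
		k, ok := c.keys[kid]
		c.mu.RUnlock()
		return k, ok, nil
	}
	c.mu.RUnlock()

	c.mu.Lock()
	defer c.mu.Unlock()
	// Re-check: another goroutine may have refreshed while we waited.
	if c.keys == nil || time.Since(c.fetchedAt) >= c.ttl {
		if err := c.refreshLocked(); err != nil {
			return "", false, err
		}
	}
	k, ok := c.keys[kid]
	return k, ok, nil
}

// refreshLocked assumes the caller holds the write lock.
func (c *keyCache) refreshLocked() error {
	keys, err := c.fetch()
	if err != nil {
		return err
	}
	c.keys = keys
	c.fetchedAt = time.Now()
	return nil
}
```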

Addresses gemini high-priority review on PR #9318.

* fix(iam): trim trailing slash + retry discovery after transient failure

Two OIDC discovery edge cases reviewers flagged:

1. Issuer comparison was sensitive to trailing slashes. resolveJWKSUri
   trims them when building the discovery URL, but the doc.Issuer ↔
   p.config.Issuer check did not, so an IDP whose issuer claim drops or
   adds the slash relative to the configured value would be falsely
   rejected. Trim a single trailing slash on each side before comparing.

2. discoveryFailed flipped to true on any error and stayed there for the
   process lifetime. A transient 5xx at startup permanently locked the
   provider into the /.well-known/jwks.json fallback. Reset the flag at
   the top of fetchJWKSLocked when no URI has been cached yet, so each
   JWKS refresh (typically once per TTL = 1h) reattempts discovery.
   Successful discovery remains cached via resolvedJWKSUri so we don't
   pay the discovery RTT on every refresh.
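
The retry behaviour in item 2 can be sketched as follows. The type, the
method name, and the injected discover callback are hypothetical; only
the resolvedJWKSUri / discoveryFailed semantics and the
/.well-known/jwks.json fallback come from this change.

```go
package main

import "strings"

// providerSketch is a hypothetical reduction of the provider state.
type providerSketch struct {
	issuer          string
	resolvedJWKSURI string
	discoveryFailed bool
	// discover stands in for the real discovery request (assumption).
	discover func(issuer string) (jwksURI string, ok bool)
}

// jwksURIForRefreshLocked picks the URL for one JWKS refresh. The caller
// is assumed to hold the write lock.
func (p *providerSketch) jwksURIForRefreshLocked() string {
	// A successful discovery is cached, so later refreshes skip it.
	if p.resolvedJWKSURI != "" {
		return p.resolvedJWKSURI
	}
	// Nothing cached yet: clear the failure flag so this refresh retries.
	p.discoveryFailed = false
	if uri, ok := p.discover(p.issuer); ok {
		p.resolvedJWKSURI = uri
		return uri
	}
	p.discoveryFailed = true
	return strings.TrimSuffix(p.issuer, "/") + "/.well-known/jwks.json"
}
```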

Addresses gemini security-medium + medium reviews on PR #9318.

* fix(iam): require non-empty issuer in OIDC discovery doc

The previous "doc.Issuer != "" && ..." guard let a discovery document
that omitted the issuer field bypass the issuer-mismatch check
entirely, letting the doc steer fetchJWKS at any URL it provided.
OIDC Discovery 1.0 §3 mandates the issuer field; treat a missing
issuer as a hard failure, the same as a mismatched one. Trailing-slash
equivalence still applies.
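
A sketch of the resulting check, assuming a hypothetical helper that
takes the raw discovery body. The JSON field names issuer and jwks_uri
are the ones defined by OIDC Discovery; the function name and error
texts are illustrative only.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// discoveryDoc holds the two discovery fields this check needs.
type discoveryDoc struct {
	Issuer  string `json:"issuer"`
	JWKSURI string `json:"jwks_uri"`
}

// jwksURIFromDiscovery returns jwks_uri only when the document's issuer
// is present and equals the configured issuer, ignoring a single
// trailing slash on either side.
func jwksURIFromDiscovery(body []byte, configuredIssuer string) (string, error) {
	var doc discoveryDoc
	if err := json.Unmarshal(body, &doc); err != nil {
		return "", fmt.Errorf("parse discovery document: %w", err)
	}
	// A missing issuer is rejected outright rather than skipping the check.
	if doc.Issuer == "" {
		return "", errors.New("discovery document omits issuer")
	}
	got := strings.TrimSuffix(doc.Issuer, "/")
	want := strings.TrimSuffix(configuredIssuer, "/")
	if got != want {
		return "", fmt.Errorf("issuer mismatch: discovery says %q, configured %q",
			doc.Issuer, configuredIssuer)
	}
	if doc.JWKSURI == "" {
		return "", errors.New("discovery document omits jwks_uri")
	}
	return doc.JWKSURI, nil
}
```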

Adds TestDiscoveryRejectsMissingIssuer alongside the existing
TestDiscoveryRejectsIssuerMismatch via a new omitDiscoveryIssuer
toggle on fakeIDP.

* feat(iam): OIDC provider store + read-only IAM API

Add OIDCProviderRecord — the persisted, IAM-managed view of an OIDC
identity provider — and an OIDCProviderStore interface with memory and
filer implementations mirroring the existing role-store pattern.

The store is hydrated at boot from the static STS.Providers list so the
new IAM API surfaces the same set the STS service already validates
against. Two read-only actions land now:

- ListOpenIDConnectProviders -> ARN-only list, AWS-shape XML.
- GetOpenIDConnectProvider   -> URL, ClientIDList, ThumbprintList,
                                Tags, CreateDate.

Mutations (Create/Delete/Add-Remove ClientID/Update Thumbprint), multiple
client_ids per provider, and TLS thumbprint pinning come in Phase 2b.

* fix(iam): preserve CreatedAt across boots + paginate ListProviders

Two medium-priority issues gemini flagged on the read-only IAM API:

1. The static-config bootstrap was setting CreatedAt = time.Now() on
   every server start, so the IAM GetOpenIDConnectProvider response's
   CreateDate shifted on each restart even when backed by a persistent
   store. Look up the existing record via GetProviderByARN first and
   preserve its CreatedAt; only the UpdatedAt advances.

2. FilerOIDCProviderStore.ListProviders had a hardcoded Limit: 1000
   and silently truncated the result when more than 1000 providers
   existed. Stream-paginate via StartFromFileName, treating io.EOF as
   the normal end of the stream and surfacing all other errors instead
   of swallowing them.
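
Both fixes can be sketched in isolation. The record and store types, the
bootstrap and listAll helpers, and the listPage callback (assumed to
return names strictly after startFrom, up to limit) are hypothetical
simplifications, not the filer client API.

```go
package main

import "time"

// record is a reduced, hypothetical view of OIDCProviderRecord.
type record struct {
	ARN       string
	CreatedAt time.Time
	UpdatedAt time.Time
}

// memStore is a toy in-memory store keyed by ARN.
type memStore map[string]record

// bootstrap upserts a provider from static config. An existing record
// keeps its CreatedAt; only UpdatedAt advances.
func (s memStore) bootstrap(arn string, now time.Time) record {
	rec := record{ARN: arn, CreatedAt: now, UpdatedAt: now}
	if old, ok := s[arn]; ok {
		rec.CreatedAt = old.CreatedAt
	}
	s[arn] = rec
	return rec
}

// listAll pages through a listing until a short page signals the end,
// resuming each page after the last name seen. Errors are returned to
// the caller rather than swallowed.
func listAll(pageSize int, listPage func(startFrom string, limit int) ([]string, error)) ([]string, error) {
	if pageSize <= 0 {
		pageSize = 1000 // fall back to a sane page size
	}
	var all []string
	startFrom := ""
	for {
		page, err := listPage(startFrom, pageSize)
		if err != nil {
			return nil, err
		}
		all = append(all, page...)
		if len(page) < pageSize {
			return all, nil
		}
		startFrom = page[len(page)-1]
	}
}
```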

Addresses two gemini medium reviews on PR #9319.
2026-05-04 22:15:03 -07:00