Files
seaweedfs/test/s3tables
Chris Lu 9554e259dd fix(iceberg): route catalog clients to the right bucket and vend S3 endpoint (#9109)
* fix(iceberg): route catalog clients to the right bucket and vend S3 endpoint

DuckDB ATTACH 's3://<bucket>/' AS cat (TYPE 'ICEBERG', ...) was failing
with "schema does not exist" because GET /v1/config ignored the warehouse
query parameter and returned no overrides.prefix, so subsequent requests
fell through to the hard-coded "warehouse" default bucket instead of the
one the client attached. LoadTable also returned an empty config, forcing
clients to discover the S3 endpoint out-of-band and producing 403s on
direct iceberg_scan calls.

- handleConfig now echoes overrides.prefix = bucket and defaults.warehouse
  when ?warehouse=s3://<bucket>/ is supplied.
- getBucketFromPrefix honors a warehouse query parameter as a fallback for
  clients that skip the /v1/config handshake.
- LoadTable responses advertise s3.endpoint and s3.path-style-access so
  clients can reach data files without separate configuration.

Refs #9103

* address review feedback on iceberg S3 endpoint vending

- deriveS3AdvertisedEndpoint is now a method on S3Options; honors
  externalUrl / S3_EXTERNAL_URL, switches to https when -s3.key.file is
  set, uses the https port when configured, and brackets IPv6 literals
  via util.JoinHostPort.
- handleCreateTable returns s.buildFileIOConfig() in both its staged and
  final LoadTableResult branches so create and load flows see the same
  FileIO hints.
- Add unit test coverage for the endpoint derivation scenarios.

* address CI and review feedback for #9109

- DuckDB integration test now runs under its own newOAuthTestEnv (with a
  valid IAM config) so the OAuth2 client_credentials flow DuckDB requires
  actually works; the shared env has no registered credentials, which was
  the cause of the CI failure. Helper createTableWithToken was added to
  create tables via Bearer auth.
- Tighten TestIssue9103_LoadTableDoesNotVendS3FileIOCredentials to also
  assert s3.path-style-access = "true", so a partial regression where the
  endpoint is vended but path-style is dropped still fails.
- deriveS3AdvertisedEndpoint now logs a startup hint when it infers the
  host from os.Hostname because the bind IP is a wildcard, pointing
  operators at -s3.externalUrl / S3_EXTERNAL_URL for reverse-proxy
  deployments where the inferred name is not externally reachable.
- handleConfig has a comment explaining that any sub-path in the
  warehouse URL is dropped because catalog routing is bucket-scoped.

* fix(iceberg): make advertised S3 endpoint strictly opt-in; add region

The wildcard-bind fallback to os.Hostname() in deriveS3AdvertisedEndpoint
was hijacking correctly-configured clients: on the CI runner it produced
http://runnervmrc6n4:<port>, which Spark (running in Docker) could not
resolve, so Spark iceberg tests began failing after the endpoint started
being vended in LoadTable responses.

Change the rule so advertising is opt-in and never guesses a host that
might not be routable:
  - -s3.externalUrl / S3_EXTERNAL_URL wins (covers reverse-proxy).
  - Otherwise, only an explicit, non-wildcard -s3.bindIp is used.
  - Wildcard / empty bind returns "" so no FileIO endpoint is vended and
    existing clients keep using their own configuration.

buildFileIOConfig additionally vends s3.region (defaulting to the same
value baked into table bucket ARNs) whenever it vends an endpoint, so
DuckDB's attach does not fail with "No region was provided via the
vended credentials" when the operator has opted in.

The DuckDB issue-9103 integration test runs under an env with a
wildcard bind, so it explicitly sets AWS_REGION in the docker run to
pick up the same default. The HTTP-level LoadTable-vending test was
dropped because its expectation is now conditional and already covered
by unit tests in iceberg_issue_9103_test.go.
2026-04-16 15:51:43 -07:00
..
2026-02-08 20:06:32 -08:00

Populate data run:

  - make -C test/s3tables help
  - make -C test/s3tables populate-trino
  - make -C test/s3tables populate-spark

  Run:

  - make -C test/s3tables populate
  - If your account id differs, override: make -C test/s3tables populate
    TABLE_ACCOUNT_ID=000000000000