* test(s3tables): add Unity Catalog OSS integration test against SeaweedFS

  Mirrors the configuration used by the upstream playground at
  data-engineering-helpers/mds-in-a-box/unitycatalog-playground.

  Three test variants under test/s3tables/unity_catalog:

  - TestUnityCatalogDeltaIntegration: aws.masterRoleArn empty / static keys;
    catalog/schema/EXTERNAL Delta CRUD + temporary-table-credentials S3
    round-trip (the playground's working configuration).
  - TestUnityCatalogMasterRoleIntegration: aws.masterRoleArn set to a
    SeaweedFS-side role with a permissive trust policy; UC's StsClient is
    pinned at SeaweedFS via AWS_ENDPOINT_URL_STS, and the test asserts the
    vended creds carry a session_token and a non-static access key, proving
    the role-vended path the playground notes as not-yet-working actually
    does work today.
  - TestUnityCatalogDeltaRsRoundTrip: writes/reads a real Delta table at the
    registered storage_location using delta-rs in a slim Python container,
    with temporary credentials fetched from UC.

  All three self-skip without Docker or a weed binary, matching the sibling
  lakekeeper / polaris tests.

* test(s3tables): tighten Unity Catalog tests against actual UC OSS behavior

  After running the suite locally, ground the assertions in what the upstream
  UC OSS Docker image actually does against SeaweedFS today.

  - Static-key playground configuration (TestUnityCatalogDeltaIntegration):
    catalog/schema/EXTERNAL Delta CRUD pass against the SeaweedFS-backed
    warehouse. The temporary-table-credentials subtest is renamed and
    inverted to assert the failure mode the playground reports -- UC's
    AwsCredentialVendor falls through to an internal StsClient.assumeRole
    when masterRoleArn and sessionToken are both empty, which has no real
    STS to talk to. Bucket path is also fixed to match UC's getStorageBase()
    lookup (s3://lakehouse vs the playground's s3://lakehouse/warehouse,
    which the upstream code never matches).
  - Master-role variant (TestUnityCatalogMasterRoleIntegration): split into
    two passing slices. Slice 1 proves SeaweedFS' STS endpoint vending
    UnityCatalogVendedRole works via the Go AWS SDK and the vended creds
    round-trip on S3. Slice 2 boots UC with aws.masterRoleArn set and
    verifies catalog/schema/Delta CRUD. The third hop -- UC's Java StsClient
    actually reaching SeaweedFS' STS handler during
    /temporary-table-credentials -- is logged but not asserted, since the
    AWS Java SDK's STS request currently lands on a SeaweedFS S3 path rather
    than the STS handler.
  - Delta-RS round-trip (TestUnityCatalogDeltaRsRoundTrip): gated on
    UC_DELTA_RS_RUN=1 since it depends on the master-role STS handoff above.
    The Dockerfile / writer script stay in tree so the test runs end-to-end
    the moment that hop is fixed.

  README rewritten to be explicit about what each test validates today and
  what is still pending.

  Result: `go test -run TestUnityCatalog ./test/s3tables/unity_catalog/...`
  passes cleanly with weed + Docker available, and self-skips otherwise.

* test(s3tables): exercise unity catalog integrations

* ci: run Unity Catalog integration tests on PRs

  Adds a unity-catalog-integration-tests job to s3-tables-tests.yml, modeled
  on the existing lakekeeper / dremio jobs. Pre-pulls the UC image and
  python:3.11-slim (used by the delta-rs writer container) and runs
  `go test ./test/s3tables/unity_catalog`. Format-check and go-vet jobs
  already recurse into ./test/s3tables/... so the new package is covered
  there too.

* test/ci: address PR review

  Tighten the UC readiness probe to require 200, not <500, so a 401/403/404
  during startup surfaces immediately instead of being treated as ready
  (CodeRabbit).

  Pin the UC image to v0.4.0 in both the workflow and the test default,
  matching the pinned-tag convention the rest of s3-tables-tests.yml uses
  (CodeRabbit). Use UC_IMAGE=unitycatalog/unitycatalog:main to re-test
  against current upstream.

* docs: separate UC static-key vs master-role failure modes

  The README mixed the two together. Static-key empty-sessionToken
  short-circuits with "S3 bucket configuration not found." before UC even
  fires an STS call; the AccessDenied I described is what happens in the
  master-role variant where UC's Java StsClient actually reaches SeaweedFS.
  Cross-link the playground PR that fixes the static-key vending side.

  Also drop the "what most playground users actually run" hand-wave under
  MANAGED tables.

* docs: trim README

  Drop the playground cross-reference and the "two layers fail
  independently" framing.

* docs: pin down what's actually pending

  Investigated the master-role STS handoff with a sniffer in front of
  SeaweedFS' STS port. UC's StsClient is constructed without an
  endpointOverride and never reads aws.endpoint or AWS_ENDPOINT_URL_STS;
  verified by pointing AWS_ENDPOINT_URL_STS at port 1 and seeing the same
  real-AWS InvalidClientTokenId 403 with zero traffic to SeaweedFS. The fix
  is upstream in UC.

  Updated the README and the master-role test's t.Logf to say so precisely,
  and dropped the stale "Spark client" bullet (delta-rs covers that path).

* test(s3tables): use BaseEndpoint instead of deprecated resolver

  EndpointResolverWithOptions is deprecated in aws-sdk-go-v2; the supported
  way to override a service endpoint is via the per-service
  Options.BaseEndpoint. Switch the assume-role helper to that pattern so the
  test stops compiling against deprecated API and the resolver boilerplate
  disappears.

  Addresses gemini review on PR #9308.

* test(s3tables): drop unused splitS3URI helper

  Helper had no callers; gemini caught it on PR #9308. Easy to bring back
  from git history if needed.

* test(s3tables): extract last token of docker run output as container ID

  docker run -d may prefix the container ID with image-pull progress when
  the image isn't cached locally. strings.TrimSpace on the whole output then
  gave a multi-line string, not the ID. Take the last whitespace-separated
  token so the ID survives a fresh CI runner.

  Addresses gemini review on PR #9308.

* test(s3tables): cap Unity Catalog response body reads at 10 MiB

  io.ReadAll without a limit could OOM the test runner if the UC container
  hands back an unexpectedly large body. 10 MiB is well above any
  well-formed catalog response and turns a misbehaving server into a test
  failure instead of a runner crash.

  Addresses gemini review on PR #9308.

* docs: link UC fix PR and call out UC's mocked-Sts test pattern

  UC's own credential-vending tests substitute StsClient with an in-process
  EchoAwsStsClient (BaseCRUDTestWithMockCredentials) or Mockito.mockStatic
  (CloudCredentialVendorTest), so the wire path between UC's Java SDK and a
  real STS server is untested -- which is why the missing endpointOverride
  slipped through upstream. Linked the upstream fix at
  unitycatalog/unitycatalog#1532.

Unity Catalog OSS integration tests
These tests run Unity Catalog OSS in Docker against an embedded SeaweedFS
S3 endpoint. The server.properties mirrors the upstream playground at
mds-in-a-box/unitycatalog-playground.
| Test | Variant | Status |
|---|---|---|
| TestUnityCatalogDeltaIntegration | static keys, aws.masterRoleArn= empty | passes; covers catalog/schema/EXTERNAL Delta CRUD against SeaweedFS-backed warehouse and asserts that UC's /temporary-table-credentials cannot vend usable creds with this configuration -- exactly the gap the playground reports. |
| TestUnityCatalogMasterRoleIntegration | aws.masterRoleArn=arn:aws:iam::000000000000:role/UnityCatalogVendedRole | passes; proves SeaweedFS' STS endpoint accepts sts:AssumeRole for the role UC would use (Go SDK round-trip), and that UC starts and accepts CRUD when wired with the master-role config. UC's own StsClient still talks to real AWS regardless of aws.endpoint / AWS_ENDPOINT_URL_STS (UC bug, see below); that hop is logged via t.Logf rather than asserted. |
| TestUnityCatalogDeltaRsRoundTrip | static keys + delta-rs Python client | passes; resolves table metadata through UC and writes/reads a real Delta table at the registered storage_location using python:3.11-slim + deltalake with the SeaweedFS test credentials. |
Prerequisites
- Docker available locally (the tests call docker run / docker build
  directly).
- A weed binary at the repo root (weed/weed) or on $PATH.
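Because the tests shell out to docker run directly, the container ID has to be parsed from command output, and the commit notes mention that image-pull progress can precede the ID on a fresh runner. The following is a minimal sketch of the "take the last whitespace-separated token" approach. The function name and the sample output are illustrative assumptions, not the helper used in the test package.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// containerIDFromOutput returns the last whitespace-separated token of the
// output captured from `docker run -d`. Trimming the whole string is not
// enough when pull progress is printed before the ID.
// Hypothetical helper name; the real test helper may differ.
func containerIDFromOutput(out string) (string, error) {
	fields := strings.Fields(out)
	if len(fields) == 0 {
		return "", errors.New("docker run produced no output")
	}
	return fields[len(fields)-1], nil
}

func main() {
	// Illustrative output only: pull progress followed by a made-up ID.
	out := "Unable to find image 'example:tag' locally\n" +
		"Status: Downloaded newer image for example:tag\n" +
		"0123456789abcdef\n"

	id, err := containerIDFromOutput(out)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(id)
}
```

The empty-output case returns an error so a failed docker run is not mistaken for a container with an empty ID.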
Run
go test -timeout 15m \
-run 'TestUnityCatalog' \
./test/s3tables/unity_catalog/...
Override the Unity Catalog image (the default is
unitycatalog/unitycatalog:v0.4.0), for example to re-test against current upstream:
UC_IMAGE=unitycatalog/unitycatalog:main \
go test -timeout 15m -run TestUnityCatalogDeltaIntegration \
./test/s3tables/unity_catalog/...
The tests self-skip when Docker is unavailable or no weed binary is found at
weed/weed or on $PATH; running under -short also skips them.
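The self-skip rules above can be sketched as a small prerequisite check. This is an assumption-laden illustration: skipReason, findWeed and the injected lookPath parameter are hypothetical names, and the real tests may detect Docker differently (for example by running a docker command rather than looking it up on $PATH).

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

var errNoWeed = errors.New("weed binary not found at weed/weed or on $PATH")

// findWeed prefers <repoRoot>/weed/weed and falls back to a $PATH lookup.
// lookPath is injected so the fallback can be exercised without a real binary.
func findWeed(repoRoot string, lookPath func(string) (string, error)) (string, error) {
	candidate := filepath.Join(repoRoot, "weed", "weed")
	if info, err := os.Stat(candidate); err == nil && !info.IsDir() {
		return candidate, nil
	}
	if p, err := lookPath("weed"); err == nil {
		return p, nil
	}
	return "", errNoWeed
}

// skipReason returns a non-empty message when the integration tests should
// be skipped, and "" when all prerequisites are met.
func skipReason(repoRoot string, short bool, lookPath func(string) (string, error)) string {
	if short {
		return "skipping in -short mode"
	}
	if _, err := lookPath("docker"); err != nil {
		return "docker not available"
	}
	if _, err := findWeed(repoRoot, lookPath); err != nil {
		return err.Error()
	}
	return ""
}

func main() {
	if reason := skipReason(".", false, exec.LookPath); reason != "" {
		fmt.Println("would skip:", reason)
		return
	}
	fmt.Println("prerequisites satisfied")
}
```

Inside a test function the same check would typically feed t.Skip, with testing.Short() supplying the short flag.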
Why the static-key path can't vend usable creds
UC OSS' AwsCredentialVendor.createPerBucketCredentialGenerator:
if (config.getSessionToken() != null && !config.getSessionToken().isEmpty()) {
  return new AwsCredentialGenerator.StaticAwsCredentialGenerator(config);
}
return createStsCredentialGenerator(config);
With aws.masterRoleArn= empty and s3.sessionToken.0= empty (this
test's configuration), /temporary-table-credentials short-circuits with
"S3 bucket configuration not found." before UC fires any STS call.
Setting a stub s3.sessionToken.0 switches UC to
StaticAwsCredentialGenerator and the endpoint returns the static keys,
but the response carries that stub session token -- SeaweedFS won't
recognize it on the next S3 call, so the vended creds aren't usable for
table I/O. Clients have to fall back to the static keys directly.
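To make the distinction concrete, here is a minimal sketch of a check that separates role-vended credentials from static keys echoed back with a stub token. It follows the criteria named in the commit notes (a session token is present and the access key is not the static one). The vendedCreds type and its field names are hypothetical and do not mirror UC's response schema.

```go
package main

import (
	"errors"
	"fmt"
)

// vendedCreds is a hypothetical, trimmed-down view of the AWS portion of a
// temporary-credentials response. Field names are assumptions.
type vendedCreds struct {
	AccessKeyID     string
	SecretAccessKey string
	SessionToken    string
}

var (
	errNoSessionToken = errors.New("no session token: not temporary credentials")
	errStaticKeyEcho  = errors.New("access key equals the static key: credentials were echoed, not vended")
)

// checkStsVended reports whether creds look like they came from an
// AssumeRole call rather than from the static server configuration.
func checkStsVended(c vendedCreds, staticAccessKey string) error {
	if c.SessionToken == "" {
		return errNoSessionToken
	}
	if c.AccessKeyID == staticAccessKey {
		return errStaticKeyEcho
	}
	return nil
}

func main() {
	const staticKey = "static-access-key" // placeholder value

	// Stub-token case: static keys plus a made-up session token.
	echoed := vendedCreds{AccessKeyID: staticKey, SecretAccessKey: "s", SessionToken: "stub"}
	fmt.Println("echoed:", checkStsVended(echoed, staticKey))

	// Role-vended case: fresh access key and a session token.
	vended := vendedCreds{AccessKeyID: "temporary-key", SecretAccessKey: "s", SessionToken: "token"}
	fmt.Println("vended:", checkStsVended(vended, staticKey))
}
```

A non-empty session token alone is not sufficient, which is why the stub-token configuration fails the second condition.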
With aws.masterRoleArn set, UC's AwsCredentialGenerator.StsAwsCredentialGenerator
builds the StsClient with only .region(...) and .credentialsProvider(...) --
no .endpointOverride(). The SDK's generic env-var resolution doesn't kick in
for that builder shape, so even with AWS_ENDPOINT_URL_STS=... (or the
matching aws.endpointUrlSts Java property, or the catch-all
AWS_ENDPOINT_URL=...) the StsClient still targets real AWS and gets back
InvalidClientTokenId. Verified by pointing the env var at port 1: UC reports
the same AWS-issued 403 as when the variable points at SeaweedFS, and a sniffer
in front of SeaweedFS' STS port records zero traffic. SeaweedFS' STS handler
itself works -- the Go SDK round-trip in assumeRoleViaSeaweedFS proves that
against the same SeaweedFS instance.
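The "zero traffic" observation can be reproduced with something much simpler than a packet sniffer. The sketch below is a plain TCP connection counter on a loopback port, not the tool used in the investigation and not a forwarding proxy. All names are hypothetical. Pointing a client's endpoint setting at its address and seeing the count stay at zero suggests the client never tried to connect there.

```go
package main

import (
	"fmt"
	"net"
	"sync/atomic"
	"time"
)

// connCounter accepts TCP connections on a loopback port and counts them.
// It closes each connection immediately and forwards nothing.
type connCounter struct {
	ln    net.Listener
	count atomic.Int64
}

func newConnCounter() (*connCounter, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0") // port 0: let the OS choose
	if err != nil {
		return nil, err
	}
	c := &connCounter{ln: ln}
	go func() {
		for {
			conn, err := ln.Accept()
			if err != nil {
				return // listener closed
			}
			c.count.Add(1)
			conn.Close()
		}
	}()
	return c, nil
}

func (c *connCounter) addr() string { return c.ln.Addr().String() }
func (c *connCounter) close() error { return c.ln.Close() }

// waitForCount polls until at least n connections were seen or the timeout
// elapses; Accept runs in another goroutine, so the count lags the dial.
func (c *connCounter) waitForCount(n int64, timeoutMillis int) bool {
	deadline := time.Now().Add(time.Duration(timeoutMillis) * time.Millisecond)
	for time.Now().Before(deadline) {
		if c.count.Load() >= n {
			return true
		}
		time.Sleep(5 * time.Millisecond)
	}
	return c.count.Load() >= n
}

// poke opens and closes one connection, standing in for a client that does
// honor the endpoint setting.
func poke(addr string) error {
	conn, err := net.DialTimeout("tcp", addr, time.Second)
	if err != nil {
		return err
	}
	return conn.Close()
}

func main() {
	c, err := newConnCounter()
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	defer c.close()

	fmt.Println("connections before:", c.count.Load())
	if err := poke(c.addr()); err != nil {
		fmt.Println("dial failed:", err)
		return
	}
	c.waitForCount(1, 1000)
	fmt.Println("connections after:", c.count.Load())
}
```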
UC's own AWS credential-vending tests don't catch this because they mock
StsClient away entirely -- BaseCRUDTestWithMockCredentials injects a
custom stsClientBuilderSupplier returning an EchoAwsStsClient that
synthesizes credentials in-process, and CloudCredentialVendorTest uses
Mockito.mockStatic(StsClient.class). No test ever exercises the wire
path between UC's Java SDK and a real STS endpoint, so the missing
endpointOverride slipped through.
The fix is upstream in
unitycatalog/unitycatalog#1532,
which adds an aws.endpoint property and applies it to both the StsClient
and the S3Client builders. Until that lands, the master-role test logs
the failure but does not assert it.
What the tests actually validate today
- Unity Catalog accepts a SeaweedFS-backed server.properties and starts.
- Catalog / schema / EXTERNAL Delta table CRUD all work against the
  SeaweedFS warehouse via the UC REST API.
- SeaweedFS' STS endpoint correctly issues sts:AssumeRole credentials for
  the UnityCatalogVendedRole and those credentials are accepted on S3
  round-trips (Go AWS SDK).
- Delta-RS resolves a UC table's storage_location and can write/read Delta
  data through the SeaweedFS S3 endpoint with the test credentials.
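The first item depends on deciding when UC has actually started. The commit notes say the readiness probe was tightened to require HTTP 200 rather than any status below 500. A minimal sketch of such a probe follows, run here against a local stub server. The function names, the polling interval and the stub are assumptions, and the URL the real test polls is not shown.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// waitReady polls url until it answers exactly 200 OK or ctx expires.
// Any other status (401, 403, 404, 5xx) counts as "not ready".
func waitReady(ctx context.Context, client *http.Client, url string, interval time.Duration) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	lastStatus := 0
	for {
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
		if err != nil {
			return err
		}
		if resp, err := client.Do(req); err == nil {
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				return nil
			}
			lastStatus = resp.StatusCode
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("not ready (last status %d): %w", lastStatus, ctx.Err())
		case <-ticker.C:
		}
	}
}

// newStubServer answers 404 for the first notReady requests, then 200.
// It stands in for a server that is still starting up.
func newStubServer(notReady int64) *httptest.Server {
	var served atomic.Int64
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if served.Add(1) <= notReady {
			w.WriteHeader(http.StatusNotFound)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
}

// probeStub runs waitReady against a fresh stub server.
func probeStub(notReady int64, timeoutMillis int) error {
	srv := newStubServer(notReady)
	defer srv.Close()
	ctx, cancel := context.WithTimeout(context.Background(),
		time.Duration(timeoutMillis)*time.Millisecond)
	defer cancel()
	return waitReady(ctx, srv.Client(), srv.URL, 10*time.Millisecond)
}

func main() {
	fmt.Println("becomes ready:", probeStub(2, 2000))
	fmt.Println("never ready:", probeStub(1<<30, 100))
}
```

With a "below 500" rule the 404 responses from the stub would have been accepted as ready on the first request.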
What is still pending
Nothing on the SeaweedFS side. The remaining gap (UC's StsClient ignoring endpoint config) needs a UC OSS patch upstream.
MANAGED tables
Not exercised. UC OSS gates them behind server.managed-table.enabled=true
and a two-step staging flow (POST /staging-tables then POST /tables);
EXTERNAL Delta is the simpler path and what these tests cover.