mirror of
https://github.com/seaweedfs/seaweedfs.git
synced 2026-05-14 05:41:29 +00:00
* test(s3tables): add Unity Catalog OSS integration test against SeaweedFS Mirrors the configuration used by the upstream playground at data-engineering-helpers/mds-in-a-box/unitycatalog-playground. Three test variants under test/s3tables/unity_catalog: - TestUnityCatalogDeltaIntegration: aws.masterRoleArn empty / static keys; catalog/schema/EXTERNAL Delta CRUD + temporary-table-credentials S3 round-trip (the playground's working configuration). - TestUnityCatalogMasterRoleIntegration: aws.masterRoleArn set to a SeaweedFS-side role with a permissive trust policy; UC's StsClient is pinned at SeaweedFS via AWS_ENDPOINT_URL_STS, and the test asserts the vended creds carry a session_token and a non-static access key, proving the role-vended path the playground notes as not-yet-working actually does work today. - TestUnityCatalogDeltaRsRoundTrip: writes/reads a real Delta table at the registered storage_location using delta-rs in a slim Python container, with temporary credentials fetched from UC. All three self-skip without Docker or a weed binary, matching the sibling lakekeeper / polaris tests. * test(s3tables): tighten Unity Catalog tests against actual UC OSS behavior After running the suite locally, ground the assertions in what the upstream UC OSS Docker image actually does against SeaweedFS today. - Static-key playground configuration (TestUnityCatalogDeltaIntegration): catalog/schema/EXTERNAL Delta CRUD pass against the SeaweedFS-backed warehouse. The temporary-table- credentials subtest is renamed and inverted to assert the failure mode the playground reports -- UC's AwsCredentialVendor falls through to an internal StsClient.assumeRole when masterRoleArn and sessionToken are both empty, which has no real STS to talk to. Bucket path is also fixed to match UC's getStorageBase() lookup (s3://lakehouse vs the playground's s3://lakehouse/warehouse, which the upstream code never matches). - Master-role variant (TestUnityCatalogMasterRoleIntegration): split into two passing slices. Slice 1 proves SeaweedFS' STS endpoint vending UnityCatalogVendedRole works via the Go AWS SDK and the vended creds round-trip on S3. Slice 2 boots UC with aws.masterRoleArn set and verifies catalog/schema/Delta CRUD. The third hop -- UC's Java StsClient actually reaching SeaweedFS' STS handler during /temporary-table-credentials -- is logged but not asserted, since the AWS Java SDK's STS request currently lands on a SeaweedFS S3 path rather than the STS handler. - Delta-RS round-trip (TestUnityCatalogDeltaRsRoundTrip): gated on UC_DELTA_RS_RUN=1 since it depends on the master-role STS handoff above. The Dockerfile / writer script stay in tree so the test runs end-to-end the moment that hop is fixed. README rewritten to be explicit about what each test validates today and what is still pending. Result: `go test -run TestUnityCatalog ./test/s3tables/unity_catalog/...` passes cleanly with weed + Docker available, and self-skips otherwise. * test(s3tables): exercise unity catalog integrations * ci: run Unity Catalog integration tests on PRs Adds a unity-catalog-integration-tests job to s3-tables-tests.yml, modeled on the existing lakekeeper / dremio jobs. Pre-pulls the UC image and python:3.11-slim (used by the delta-rs writer container) and runs `go test ./test/s3tables/unity_catalog`. Format-check and go-vet jobs already recurse into ./test/s3tables/... so the new package is covered there too. * test/ci: address PR review Tighten the UC readiness probe to require 200, not <500, so a 401/403/404 during startup surfaces immediately instead of being treated as ready (CodeRabbit). Pin the UC image to v0.4.0 in both the workflow and the test default, matching the pinned-tag convention the rest of s3-tables-tests.yml uses (CodeRabbit). Use UC_IMAGE=unitycatalog/unitycatalog:main to re-test against current upstream. * docs: separate UC static-key vs master-role failure modes The README mixed the two together. Static-key empty-sessionToken short-circuits with "S3 bucket configuration not found." before UC even fires an STS call; the AccessDenied I described is what happens in the master-role variant where UC's Java StsClient actually reaches SeaweedFS. Cross-link the playground PR that fixes the static-key vending side. Also drop the "what most playground users actually run" hand-wave under MANAGED tables. * docs: trim README Drop the playground cross-reference and the "two layers fail independently" framing. * docs: pin down what's actually pending Investigated the master-role STS handoff with a sniffer in front of SeaweedFS' STS port. UC's StsClient is constructed without an endpointOverride and never reads aws.endpoint or AWS_ENDPOINT_URL_STS; verified by pointing AWS_ENDPOINT_URL_STS at port 1 and seeing the same real-AWS InvalidClientTokenId 403 with zero traffic to SeaweedFS. The fix is upstream in UC. Updated the README and the master-role test's t.Logf to say so precisely, and dropped the stale "Spark client" bullet (delta-rs covers that path). * test(s3tables): use BaseEndpoint instead of deprecated resolver EndpointResolverWithOptions is deprecated in aws-sdk-go-v2; the supported way to override a service endpoint is via the per-service Options.BaseEndpoint. Switch the assume-role helper to that pattern so the test stops compiling against deprecated API and the resolver boilerplate disappears. Addresses gemini review on PR #9308. * test(s3tables): drop unused splitS3URI helper Helper had no callers; gemini caught it on PR #9308. Easy to bring back from git history if needed. * test(s3tables): extract last token of docker run output as container ID docker run -d may prefix the container ID with image-pull progress when the image isn't cached locally. strings.TrimSpace on the whole output then gave a multi-line string, not the ID. Take the last whitespace-separated token so the ID survives a fresh CI runner. Addresses gemini review on PR #9308. * test(s3tables): cap Unity Catalog response body reads at 10 MiB io.ReadAll without a limit could OOM the test runner if the UC container hands back an unexpectedly large body. 10 MiB is well above any well-formed catalog response and turns a misbehaving server into a test failure instead of a runner crash. Addresses gemini review on PR #9308. * docs: link UC fix PR and call out UC's mocked-Sts test pattern UC's own credential-vending tests substitute StsClient with an in-process EchoAwsStsClient (BaseCRUDTestWithMockCredentials) or Mockito.mockStatic (CloudCredentialVendorTest), so the wire path between UC's Java SDK and a real STS server is untested -- which is why the missing endpointOverride slipped through upstream. Linked the upstream fix at unitycatalog/unitycatalog#1532.