fix(s3): stop S3 Tables routes from swallowing buckets named "buckets" or "get-table" (#9566)

* fix(s3): stop S3 Tables routes from swallowing buckets named "buckets" or "get-table"

The S3 Tables REST endpoints share top-level paths with the regular S3
API (/buckets for ListTableBuckets/CreateTableBucket, /get-table for
GetTable). They are registered first on the same router as the bucket
subrouter, so a path-style request such as GET /buckets?list-type=2 on
a bucket actually named "buckets" matched ListTableBuckets and returned
JSON. AWS SDK V2 (and Hadoop s3a / Spark) then failed XML parsing with
"Unexpected character '{' (code 123) in prolog".

Disambiguate by requiring the AWS V4 credential scope to name the
s3tables service on the colliding routes. Regular S3 SDKs sign with
service=s3, S3 Tables SDKs sign with service=s3tables, and the scope is
present in both the Authorization header and the X-Amz-Credential query
parameter for presigned URLs, so the matcher works for both flavors.

ARN-bearing S3 Tables routes (/buckets/<arn>, /namespaces/<arn>, etc.)
already cannot collide because colons are not valid in bucket names, so
they are left untouched.

* fix(s3): accept AWS JSON RPC content type as S3 Tables intent signal

The Iceberg catalog integration tests send unsigned PUT /buckets with
Content-Type: application/x-amz-json-1.1 to create table buckets. With
only the credential-scope check, those requests fell through to the
regular S3 CreateBucket handler and the suite went red on this branch.

Extend the matcher so a request is recognized as S3 Tables when either:

  - its AWS V4 credential scope names SERVICE=s3tables; or
  - it carries the canonical AWS JSON RPC 1.1 content type and is
    unsigned (a request explicitly signed for SERVICE=s3 still wins).

The regular S3 SDKs do not send application/x-amz-json-1.1, so the
signal is safe for the colliding paths (/buckets, /get-table).

Also add an AWS SDK V2 for Go integration test under
test/s3/sdk_v2_routing/ that drives the SDK's own XML deserializer
against buckets literally named "buckets" and "get-table". The SDK
errors before the test asserts if the server returns the wrong body
shape. Wired up via .github/workflows/s3-sdk-v2-routing-tests.yml,
mirroring the etag/acl workflow.
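
To illustrate why a wrong body shape fails before any assertion runs,
here is a small sketch that uses Go's encoding/xml as a stand-in for the
SDK's deserializer. The struct is a trimmed, illustrative subset of a
ListObjectsV2 response, and the JSON body is an assumed example of what
a ListTableBuckets response might look like. Neither is taken from the
test in this commit.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// listBucketResult is a trimmed, illustrative subset of the XML body a
// ListObjectsV2 call returns. It is not the SDK's real type.
type listBucketResult struct {
	XMLName xml.Name `xml:"ListBucketResult"`
	Name    string   `xml:"Name"`
}

// parseListing decodes a listing body as XML and returns the bucket name.
func parseListing(body []byte) (string, error) {
	var out listBucketResult
	if err := xml.Unmarshal(body, &out); err != nil {
		return "", err
	}
	return out.Name, nil
}

func main() {
	xmlBody := []byte(`<ListBucketResult><Name>buckets</Name></ListBucketResult>`)
	jsonBody := []byte(`{"tableBuckets":[]}`) // assumed shape of the wrong response

	name, err := parseListing(xmlBody)
	fmt.Println(name, err)

	// The JSON body never yields a start element, so decoding fails.
	_, err = parseListing(jsonBody)
	fmt.Println(err != nil)
}
```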

* s3api: extend service matcher to all S3 Tables routes; simplify scope check

- Apply serviceMatcher to every S3 Tables route, not just the bare-path
  ones. ARN-bearing paths could otherwise be hit by an S3 object key
  that starts with arn:aws:s3tables:..., inside a bucket named
  "buckets", "namespaces", "tables", or "tag". One matcher everywhere
  closes both collision classes.
- Replace strings.Split + index lookup with strings.Contains for the
  credential-scope check. The scope shape is fixed at
  AK/DATE/REGION/SERVICE/aws4_request, slashes only delimit components,
  and access keys are alphanumeric — so /s3tables/ matches iff SERVICE
  is exactly s3tables. Existing unit cases (including the
  access-key-substring case) still pass.
- Read the GetObject body in the SDK v2 routing test with io.ReadAll;
  the single Read could return short and make the equality check flaky.
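
The scope-check simplification can be sketched as follows. Both function
names are hypothetical and neither is copied from the s3api package. The
sketch only compares the substring form against a split-and-index form
on sample credentials, including an access key that contains "s3tables".

```go
package main

import (
	"fmt"
	"strings"
)

// scopeNamesS3TablesContains is the substring form: the scope shape is
// AK/DATE/REGION/SERVICE/aws4_request, so "/s3tables/" can only appear
// as a whole slash-delimited component.
func scopeNamesS3TablesContains(credential string) bool {
	return strings.Contains(credential, "/s3tables/")
}

// scopeNamesS3TablesSplit is the longer split-and-index form, shown only
// for comparison.
func scopeNamesS3TablesSplit(credential string) bool {
	parts := strings.Split(credential, "/")
	return len(parts) == 5 && parts[3] == "s3tables"
}

func main() {
	samples := []string{
		"AKID/20260519/us-east-1/s3tables/aws4_request",
		"AKID/20260519/us-east-1/s3/aws4_request",
		"AKs3tablesKEY/20260519/us-east-1/s3/aws4_request", // access-key substring case
	}
	for _, c := range samples {
		fmt.Println(c, scopeNamesS3TablesContains(c), scopeNamesS3TablesSplit(c))
	}
}
```

The two forms agree on these samples. They would differ only if another
component, such as REGION, were literally "s3tables", which the fixed
scope shape is assumed to rule out in practice.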

* s3api: drop content-type fallback; sign s3 tables harness traffic instead

The content-type fallback in isS3TablesSignedRequest let an anonymous
regular-S3 request whose body type is application/x-amz-json-1.1 hit
an S3 Tables route when the path-style object key happened to be
shaped like an S3 Tables ARN (e.g. PutObject on bucket "buckets"
with key arn:aws:s3tables:.../bucket/foo/policy). Narrow the matcher
back to the AWS V4 credential scope so only requests signed for
SERVICE=s3tables match the S3 Tables routes.

Update the Iceberg catalog test harness — the only caller still
sending unsigned PUT /buckets — to sign with SERVICE=s3tables. The
mini instance runs in default-allow mode, so the signature itself is
not verified; only the credential scope matters for the route match.

Drop the stale unit cases for the JSON-RPC content-type signal and
the routing test that exercised unsigned harness traffic.
Chris Lu
2026-05-19 14:24:25 -07:00
committed by GitHub
parent cfc08fbf6c
commit f72983c1fd
6 changed files with 616 additions and 24 deletions


@@ -0,0 +1,110 @@
name: "S3 SDK V2 Route Disambiguation Tests"
on:
  push:
    branches: [ master ]
    paths:
      - 'weed/s3api/**'
      - 'test/s3/sdk_v2_routing/**'
      - '.github/workflows/s3-sdk-v2-routing-tests.yml'
  pull_request:
    branches: [ master ]
    paths:
      - 'weed/s3api/**'
      - 'test/s3/sdk_v2_routing/**'
      - '.github/workflows/s3-sdk-v2-routing-tests.yml'
concurrency:
  group: ${{ github.head_ref || github.ref }}/s3-sdk-v2-routing-tests
  cancel-in-progress: true
permissions:
  contents: read
jobs:
  s3-sdk-v2-routing-tests:
    name: S3 SDK V2 Routing Tests
    runs-on: ubuntu-22.04
    timeout-minutes: 10
    steps:
      - name: Check out code
        uses: actions/checkout@v6
      - name: Set up Go
        uses: actions/setup-go@v6
        with:
          go-version-file: 'go.mod'
      - name: Install SeaweedFS
        run: |
          cd weed && go install -buildvcs=false
      - name: Start weed mini (S3 on :8333)
        # Pins the regression for issue #9559: AWS SDK V2 / Hadoop s3a
        # listing a bucket literally named "buckets" must get an XML
        # ListObjectsV2 response, not the JSON ListTableBuckets body
        # served by the S3 Tables REST endpoint on the same path.
        run: |
          mkdir -p /tmp/seaweedfs-sdk-v2-routing
          cat > /tmp/seaweedfs-sdk-v2-routing-s3.json <<'JSON'
          {
            "identities": [
              {
                "name": "admin",
                "credentials": [
                  {"accessKey": "some_access_key1", "secretKey": "some_secret_key1"}
                ],
                "actions": ["Admin", "Read", "Write"]
              }
            ]
          }
          JSON
          AWS_ACCESS_KEY_ID=some_access_key1 \
          AWS_SECRET_ACCESS_KEY=some_secret_key1 \
          weed mini \
            -dir=/tmp/seaweedfs-sdk-v2-routing \
            -s3.port=8333 \
            -s3.config=/tmp/seaweedfs-sdk-v2-routing-s3.json \
            -ip=127.0.0.1 \
            > /tmp/weed-mini.log 2>&1 &
          echo $! > /tmp/weed-mini.pid
          for i in $(seq 1 30); do
            if curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1:8333/ | grep -qE "^(200|403)$"; then
              echo "weed mini is ready"
              exit 0
            fi
            sleep 1
          done
          echo "weed mini failed to start within 30s"
          tail -50 /tmp/weed-mini.log
          exit 1
      - name: Run SDK V2 routing tests
        env:
          S3_ENDPOINT: http://127.0.0.1:8333
          AWS_ACCESS_KEY_ID: some_access_key1
          AWS_SECRET_ACCESS_KEY: some_secret_key1
          AWS_REGION: us-east-1
        run: go test -v -timeout=5m ./test/s3/sdk_v2_routing/...
      - name: Stop weed mini
        if: always()
        run: |
          if [ -f /tmp/weed-mini.pid ]; then
            kill "$(cat /tmp/weed-mini.pid)" 2>/dev/null || true
          fi
      - name: Show server log on failure
        if: failure()
        run: |
          echo "=== weed mini log (last 200 lines) ==="
          tail -n 200 /tmp/weed-mini.log 2>/dev/null || echo "no log available"
      - name: Archive log
        if: failure()
        uses: actions/upload-artifact@v7
        with:
          name: s3-sdk-v2-routing-server-log
          path: /tmp/weed-mini.log
          retention-days: 3