Files
seaweedfs/test
Chris Lu fc75f16c30 test(s3tables): expand Dremio Iceberg catalog test coverage (#9303)
* test(s3tables): expand Dremio Iceberg catalog test coverage

Restructure TestDremioIcebergCatalog into subtests and add three new
checks that go beyond a connectivity smoke test:

- ColumnProjection: SELECT id, label proves Dremio parsed the schema
  served by the SeaweedFS REST catalog (the previous SELECT COUNT(*)
  passed without exercising any column metadata).
- InformationSchemaColumns: verifies the table's columns are listed in
  Dremio's INFORMATION_SCHEMA.COLUMNS in the expected ordinal order.
- InformationSchemaTables: verifies the table is registered in
  INFORMATION_SCHEMA.TABLES.

All subtests share a single Dremio container startup, so total
runtime is unchanged.

* test(s3tables): exercise multi-level Iceberg namespaces from Dremio

Seed a 2-level Iceberg namespace (and a table inside it) via the REST
catalog before bootstrapping Dremio, then add a MultiLevelNamespace
subtest that scans the nested table by its dot-separated reference.

This relies on isRecursiveAllowedNamespaces=true (already set in the
Dremio source config) to surface the nested levels as folders. A
regression in either the SeaweedFS namespace path encoding (#8959-style)
or Dremio's recursive-namespace discovery would surface here.

Adds two helpers to keep the existing single-level call sites unchanged:

- createIcebergNamespaceLevels: namespace creation with []string levels
- createIcebergTableInLevels: table creation with []string levels and
  unit-separator (0x1F) URL encoding for the namespace path component

* test(s3tables): verify Dremio reads PyIceberg-written rows

The previous Dremio subtests only scanned empty tables, so they did not
exercise the data path - just the catalog/metadata path. Add a
PyIceberg-based writer that materializes parquet files plus a snapshot
on a separate table before Dremio bootstraps, and two new subtests:

- ReadWrittenDataCount: SELECT COUNT(*) returns 3.
- ReadWrittenDataValues: SELECT id, label ORDER BY id returns the three
  written rows with the expected (id, label) pairs.

The writer runs in a small image (Dockerfile.writer) built locally on
demand. It pip-installs pyiceberg+pyarrow once and reuses the layer
cache on subsequent runs. The CI workflow pre-pulls python:3.11-slim
to keep cold runs predictable.

The writer authenticates via the OAuth2 client_credentials flow that
SeaweedFS already exposes at /v1/oauth/tokens, mirroring the Go-side
helper used for REST-API table creation.

* test(s3tables): fix Dremio writer required-field schema mismatch

PyIceberg's append() compatibility check rejects an arrow column whose
nullability does not match the Iceberg field. The table schema declares
id as `required long`, but the default pyarrow int64 column is nullable
- so the writer failed with:

    1: id: required long  vs.  1: id: optional long

Declare an explicit pyarrow schema with nullable=False on id and
nullable=True on label to match the Iceberg side.
2026-05-03 00:17:16 -07:00
..
2026-03-09 23:10:27 -07:00
2026-04-10 17:31:14 -07:00
2026-04-10 17:31:14 -07:00
2026-03-09 11:12:05 -07:00
2023-11-13 08:23:53 -08:00