at-container-registry/docs/CONFIG_BLOB_STORAGE.md
2026-02-11 20:44:07 -06:00

Config Blob Storage Decision

Background

OCI image manifests reference two types of blobs:

  1. Layers — filesystem diffs (tar+gzip), typically large, content-addressed and shared across users
  2. Config blob — small JSON (~2-15KB) containing image metadata: architecture, OS, environment variables, entrypoint, Dockerfile build history, and labels

In ATCR, manifests are stored in the user's PDS while all blobs (layers and config) are stored in S3 via the hold service. The hold tracks layers with io.atcr.hold.layer records but has no equivalent tracking for config blobs.
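As a rough illustration of the two blob types, the sketch below parses an OCI image manifest and collects every blob digest it references: the config digest first, then the layers. The struct names, the helper `blobDigests`, and the digests and sizes in `exampleManifest` are assumptions made for this example; they are not taken from the ATCR codebase.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Descriptor mirrors the OCI content descriptor fields used here.
type Descriptor struct {
	MediaType string `json:"mediaType"`
	Digest    string `json:"digest"`
	Size      int64  `json:"size"`
}

// Manifest holds only the parts of an OCI image manifest that point at blobs.
type Manifest struct {
	SchemaVersion int          `json:"schemaVersion"`
	Config        Descriptor   `json:"config"`
	Layers        []Descriptor `json:"layers"`
}

// blobDigests returns every blob digest the manifest references:
// the config digest first, then the layer digests in order.
func blobDigests(raw []byte) ([]string, error) {
	var m Manifest
	if err := json.Unmarshal(raw, &m); err != nil {
		return nil, err
	}
	digests := []string{m.Config.Digest}
	for _, l := range m.Layers {
		digests = append(digests, l.Digest)
	}
	return digests, nil
}

// exampleManifest uses made-up (shortened) digests and sizes for illustration.
const exampleManifest = `{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:c0ffee",
    "size": 7023
  },
  "layers": [
    {"mediaType": "application/vnd.oci.image.layer.v1.tar+gzip", "digest": "sha256:1a7e21", "size": 32654012},
    {"mediaType": "application/vnd.oci.image.layer.v1.tar+gzip", "digest": "sha256:2b8f32", "size": 16724}
  ]
}`

func main() {
	digests, err := blobDigests([]byte(exampleManifest))
	if err != nil {
		panic(err)
	}
	for _, d := range digests {
		fmt.Println(d)
	}
}
```

In this picture the manifest JSON itself is what lives in the PDS, while every digest returned by `blobDigests` resolves to a blob held in S3.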

Considered: Storing Config Blobs in PDS

Config blobs are unique per image build. Unlike layers, which are deduplicated across users, a config blob contains the specific Dockerfile history, env vars, and labels for that build. This makes them conceptually "user data" that could belong in the user's PDS alongside the manifest.

The proposal was to add a ConfigBlob field to ManifestRecord, uploading the config blob to PDS during push (the data is already fetched from S3 for label extraction). The config would remain in S3 as well since the distribution library puts it there during the blob push phase.

Potential benefits:

  • Manifests become more self-contained in PDS
  • Config metadata (entrypoint, env, history) available without S3 access (e.g., for web UI)
  • Aligns with the principle that user-specific data belongs in the user's PDS

Decision: Keep Config Blobs in S3 Only

Config blobs can contain sensitive data:

  • Environment variables: ENV DATABASE_URL=..., ENV API_KEY=... set in Dockerfiles
  • Build history: history[].created_by reveals exact Dockerfile commands, internal registry URLs, build arguments
  • Labels: may contain internal metadata not intended for public consumption
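To make the exposure concrete, here is a minimal sketch that parses an OCI image config and lists the strings that would become publicly readable if the blob were copied into a PDS. The field names follow the OCI image config JSON (`config.Env`, `config.Labels`, `history[].created_by`); the function `exposedStrings` and every value in `exampleConfig` are invented for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// ImageConfig holds the subset of the OCI image config discussed above.
type ImageConfig struct {
	Config struct {
		Env    []string          `json:"Env"`
		Labels map[string]string `json:"Labels"`
	} `json:"config"`
	History []struct {
		CreatedBy string `json:"created_by"`
	} `json:"history"`
}

// exposedStrings lists env vars, build history commands, and labels
// found in a config blob, each prefixed with where it came from.
func exposedStrings(raw []byte) ([]string, error) {
	var c ImageConfig
	if err := json.Unmarshal(raw, &c); err != nil {
		return nil, err
	}
	var out []string
	for _, e := range c.Config.Env {
		out = append(out, "env: "+e)
	}
	for _, h := range c.History {
		if h.CreatedBy != "" {
			out = append(out, "history: "+h.CreatedBy)
		}
	}
	// Go map iteration order is random; sort label keys for stable output.
	keys := make([]string, 0, len(c.Config.Labels))
	for k := range c.Config.Labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		out = append(out, "label: "+k+"="+c.Config.Labels[k])
	}
	return out, nil
}

// exampleConfig is a made-up config blob; all hosts and secrets are fictional.
const exampleConfig = `{
  "architecture": "amd64",
  "os": "linux",
  "config": {
    "Env": [
      "PATH=/usr/local/bin:/usr/bin",
      "DATABASE_URL=postgres://app:s3cret@db.internal.example/app"
    ],
    "Labels": {"com.example.team": "payments"}
  },
  "history": [
    {"created_by": "ENV DATABASE_URL=postgres://app:s3cret@db.internal.example/app"},
    {"created_by": "RUN pip install --index-url https://pypi.internal.example/simple -r requirements.txt"}
  ]
}`

func main() {
	lines, err := exposedStrings([]byte(exampleConfig))
	if err != nil {
		panic(err)
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

Note that the same secret shows up twice in this example, once in `Env` and once in the build history, so redacting a single field would not be enough.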

ATProto has no private data: every record stored in a PDS is publicly readable. The current storage split therefore creates a useful privacy boundary:

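| Storage | Visibility      | Contains                                          |
|---------|-----------------|---------------------------------------------------|
| PDS     | Public (anyone) | Manifest structure, tags, repo names, annotations |
| Hold/S3 | Auth-gated      | Layers + config (actual image content)            |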

This boundary enables semi-private repos: the public PDS metadata tells you what images exist (names, tags, sizes), but you cannot reconstruct or run the image without hold access. Storing config in PDS would break this — build secrets and Dockerfile history would be publicly readable even when the hold restricts blob access.

We considered making PDS storage optional (only for fully public holds or allow-all-crew holds), but an optional field that can't be relied upon adds complexity without clear benefit — the config must live in S3 regardless for the pull path.
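The cost of an optional field can be sketched as follows. This is a hypothetical, trimmed-down `ManifestRecord` with an optional inline `ConfigBlob`; the real record has other fields, and `holdFetcher`, `loadConfig`, `allowAll`, and `denyAll` are names invented for this example. The point is only that every reader still needs the hold path.

```go
package main

import (
	"errors"
	"fmt"
)

// ManifestRecord is a hypothetical, trimmed-down record for illustration.
// ConfigBlob would be nil whenever the hold is not fully public.
type ManifestRecord struct {
	ConfigDigest string
	ConfigBlob   []byte
}

// holdFetcher stands in for an auth-gated read of a blob from the hold (S3).
type holdFetcher func(digest string) ([]byte, error)

// loadConfig returns the config bytes and where they came from.
// Because ConfigBlob is optional, the hold fallback can never be removed.
func loadConfig(rec ManifestRecord, fetch holdFetcher) ([]byte, string, error) {
	if rec.ConfigBlob != nil {
		return rec.ConfigBlob, "pds", nil
	}
	data, err := fetch(rec.ConfigDigest)
	if err != nil {
		return nil, "", fmt.Errorf("fetch config %s from hold: %w", rec.ConfigDigest, err)
	}
	return data, "hold", nil
}

// allowAll simulates a caller with hold access.
func allowAll(digest string) ([]byte, error) {
	return []byte(`{"os":"linux"}`), nil
}

// denyAll simulates a caller without hold access.
func denyAll(digest string) ([]byte, error) {
	return nil, errors.New("unauthorized")
}

func main() {
	inline := ManifestRecord{ConfigDigest: "sha256:c0ffee", ConfigBlob: []byte(`{"os":"linux"}`)}
	holdOnly := ManifestRecord{ConfigDigest: "sha256:c0ffee"}

	_, src, _ := loadConfig(inline, denyAll)
	fmt.Println("inline record read from:", src)

	_, src, _ = loadConfig(holdOnly, allowAll)
	fmt.Println("hold-only record read from:", src)

	_, _, err := loadConfig(holdOnly, denyAll)
	fmt.Println("hold-only without access:", err)
}
```

Under these assumptions, a consumer such as a web UI gains nothing it can depend on: it has to implement the hold path anyway, and the inline copy only adds a second code path to keep consistent.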

Current Status

Config blobs remain in S3 behind hold authorization. Garbage collection (GC) includes config digests in the referenced set alongside layer digests, so a config blob that a manifest still references is never deleted as an orphan.
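The GC behaviour described above can be sketched as a simple mark-and-sweep over digests. This is an illustration under assumed names (`manifestRefs`, `referencedSet`, `orphans`), not the hold's actual GC code; the digests are made up.

```go
package main

import "fmt"

// manifestRefs is a hypothetical view of the digests one manifest references.
type manifestRefs struct {
	ConfigDigest string
	LayerDigests []string
}

// referencedSet marks every digest reachable from the given manifests.
// The config digest is added alongside the layer digests.
func referencedSet(manifests []manifestRefs) map[string]struct{} {
	refs := make(map[string]struct{})
	for _, m := range manifests {
		if m.ConfigDigest != "" {
			refs[m.ConfigDigest] = struct{}{}
		}
		for _, d := range m.LayerDigests {
			refs[d] = struct{}{}
		}
	}
	return refs
}

// orphans returns the stored digests that no manifest references,
// preserving the order of the stored list.
func orphans(stored []string, referenced map[string]struct{}) []string {
	var out []string
	for _, d := range stored {
		if _, ok := referenced[d]; !ok {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	manifests := []manifestRefs{
		{ConfigDigest: "sha256:cfg1", LayerDigests: []string{"sha256:layerA", "sha256:layerB"}},
	}
	stored := []string{"sha256:cfg1", "sha256:layerA", "sha256:layerB", "sha256:stale"}

	// Only sha256:stale is unreferenced; sha256:cfg1 survives because
	// config digests are part of the referenced set.
	fmt.Println(orphans(stored, referencedSet(manifests)))
}
```

If `referencedSet` skipped `ConfigDigest`, the sweep would report `sha256:cfg1` as an orphan even though a live manifest depends on it, which is the failure this rule prevents.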

Revisit If

  • ATProto adds private data support
  • A concrete use case emerges that requires PDS-native config access