at-container-registry/docs/REMOVING_DISTRIBUTION.md
Evan Jarrett de02e1f046 remove distribution from hold, add vulnerability scanning in appview.
1. Removing distribution/distribution from the Hold Service (biggest change)
  The hold service previously used distribution's StorageDriver interface for all blob operations. This replaces it with direct AWS SDK v2 calls through ATCR's own pkg/s3.S3Service:
  - New S3Service methods: Stat(), PutBytes(), Move(), Delete(), WalkBlobs(), ListPrefix() added to pkg/s3/types.go
  - Pull zone fix: Presigned URLs are now generated against the real S3 endpoint, then the host is swapped to the CDN URL post-signing (previously the CDN URL was set as the endpoint, which broke SigV4 signatures)
  - All hold subsystems migrated: GC, OCI uploads, XRPC handlers, profile uploads, scan broadcaster, manifest posts — all now use *s3.S3Service instead of storagedriver.StorageDriver
  - Config simplified: Removed configuration.Storage type and buildStorageConfigFromFields(); replaced with a simple S3Params() method
  - Mock expanded: MockS3Client gains an in-memory object store + 5 new methods, replacing duplicate mockStorageDriver implementations in tests (~160 lines deleted from each test file)
2. Vulnerability Scan UI in AppView (new feature)
  Displays scan results from the hold's PDS on the repository page:
  - New lexicon: io/atcr/hold/scan.json with vulnReportBlob field for storing full Grype reports
  - Two new HTMX endpoints: /api/scan-result (badge) and /api/vuln-details (modal with CVE table)
  - New templates: vuln-badge.html (severity count chips) and vuln-details.html (full CVE table with NVD/GHSA links)
  - Repository page: Lazy-loads scan badges per manifest via HTMX
  - Tests: ~590 lines of test coverage for both handlers
3. S3 Diagnostic Tool
  New cmd/s3-test/main.go (418 lines) — tests S3 connectivity with both SDK v1 and v2, including presigned URL generation, pull zone host swapping, and verbose signing debug output.
4. Deployment Tooling
  - New syncServiceUnit() for comparing/updating systemd units on servers
  - Update command now syncs config keys (adds missing keys from template) and service units with daemon-reload
5. DB Migration
  0011_fix_captain_successor_column.yaml — rebuilds hold_captain_records to add the successor column that was missed in a previous migration.
6. Documentation
  - APPVIEW-UI-FUTURE.md rewritten as a status-tracked feature inventory
  - DISTRIBUTION.md renamed to CREDENTIAL_HELPER.md
  - New REMOVING_DISTRIBUTION.md — 480-line analysis of fully removing distribution from the appview side
7. go.mod
  aws-sdk-go v1 moved from indirect to direct (needed by cmd/s3-test).
2026-02-13 15:26:24 -06:00


Removing distribution/distribution

This document analyzes what it would take to remove the github.com/distribution/distribution/v3 library and implement ATCR's own OCI Distribution Spec HTTP endpoints.

Why Consider Removing It

  1. Impedance mismatch -- Distribution assumes manifests and blobs live in the same storage backend. ATCR routes manifests to ATProto PDS and blobs to hold/S3. Every storage interface is overridden.
  2. Context value workaround -- Repository() receives only context.Context from distribution's interface, forcing auth/identity data through context keys into RegistryContext.
  3. Per-request repository creation -- RoutingRepository is recreated on every request because distribution's caching assumptions conflict with ATCR's OAuth session model.
  4. Stale transitive dependencies -- Distribution pulls in AWS SDK v1 (EOL) via its S3 storage driver, even though ATCR doesn't use that driver.
  5. Unused features -- GC, notifications, storage drivers, replication -- none are used. ATCR has its own GC, its own event dispatch (processManifest XRPC), and its own S3 integration.
  6. Upstream maintenance pace -- Slow to merge dependency updates and bug fixes.

What Distribution Currently Provides

Only these pieces are actually used:

| What | Distribution Package | ATCR Usage |
|---|---|---|
| HTTP endpoint routing | registry/handlers | handlers.NewApp() creates the /v2/ handler |
| OCI error responses | registry/api/errcode | ErrorCodeUnauthorized, ErrorCodeDenied, ErrorCodeUnsupported |
| Middleware registration | registry/middleware/registry | Register("atproto-resolver", ...) |
| Repository interface | distribution (root) | Repository, ManifestService, BlobStore, TagService |
| Reference parsing | distribution/reference | reference.Named for identity/image parsing |
| Token auth | registry/auth/token | Blank import for registration |
| In-memory driver | registry/storage/driver/inmemory | Blank import; placeholder since real storage is external |
| Configuration types | configuration | configuration.Configuration struct |

Everything else (S3 driver, GC, notifications, replication, schema validation) is dead weight.

Files That Import Distribution

All in pkg/appview/ -- hold and scanner are unaffected.

Core implementation (8 files):

  • storage/routing_repository.go -- distribution.Repository wrapper
  • storage/manifest_store.go -- distribution.ManifestService impl
  • storage/proxy_blob_store.go -- distribution.BlobStore + BlobWriter impl
  • storage/tag_store.go -- distribution.TagService impl
  • middleware/registry.go -- distribution.Namespace + middleware registration
  • config.go -- Builds configuration.Configuration
  • server.go -- handlers.NewApp(), errcode for error responses
  • cmd/appview/main.go -- Blank imports for driver/auth registration

Tests (5 files):

  • storage/routing_repository_test.go
  • storage/manifest_store_test.go
  • storage/proxy_blob_store_test.go
  • storage/tag_store_test.go
  • middleware/registry_test.go

OCI Distribution Spec Endpoints to Implement

The spec defines these HTTP endpoints. ATCR would need handlers for each.

Version Check

GET /v2/
200 OK  (confirms OCI compliance)
401 Unauthorized  (triggers auth flow)

Docker clients hit this first. Must return 200 for authenticated requests. A 401 response with WWW-Authenticate header triggers the Docker auth handshake.

Manifests

GET    /v2/<name>/manifests/<reference>   -> 200 + manifest body
HEAD   /v2/<name>/manifests/<reference>   -> 200 + headers only
PUT    /v2/<name>/manifests/<reference>   -> 201 Created
DELETE /v2/<name>/manifests/<reference>   -> 202 Accepted

<reference> is either a tag (latest) or digest (sha256:abc...).

Required headers:

  • Request Accept: manifest media types the client supports
  • Response Content-Type: actual manifest media type
  • Response Docker-Content-Digest: canonical digest of manifest

Media types to support:

  • application/vnd.oci.image.manifest.v1+json
  • application/vnd.oci.image.index.v1+json
  • application/vnd.docker.distribution.manifest.v2+json
  • application/vnd.docker.distribution.manifest.list.v2+json

Blobs

GET    /v2/<name>/blobs/<digest>          -> 200 + blob body (or 307 redirect)
HEAD   /v2/<name>/blobs/<digest>          -> 200 + headers only
DELETE /v2/<name>/blobs/<digest>          -> 202 Accepted

ATCR already redirects to presigned S3 URLs via ServeBlob() -- this would become a direct 307 redirect in the handler.

Blob Uploads (Chunked/Resumable)

Initiate:

POST /v2/<name>/blobs/uploads/
202 Accepted
Location: /v2/<name>/blobs/uploads/<uuid>

Monolithic (single request):

POST /v2/<name>/blobs/uploads/?digest=sha256:...
Content-Type: application/octet-stream
Body: <entire blob>
201 Created

Chunked:

PATCH /v2/<name>/blobs/uploads/<uuid>
Content-Type: application/octet-stream
Content-Range: <start>-<end>
Body: <chunk data>
202 Accepted
Range: 0-<end>

(repeat PATCH for each chunk)

PUT /v2/<name>/blobs/uploads/<uuid>?digest=sha256:...
201 Created
Location: /v2/<name>/blobs/<digest>

Check progress:

GET /v2/<name>/blobs/uploads/<uuid>
204 No Content
Range: 0-<bytes received>

Cancel:

DELETE /v2/<name>/blobs/uploads/<uuid>
204 No Content

Cross-repo mount:

POST /v2/<name>/blobs/uploads/?mount=<digest>&from=<other-repo>
201 Created  (if blob exists in source repo)
202 Accepted  (fall back to regular upload)

Tags

GET /v2/<name>/tags/list
200 OK
{
  "name": "<name>",
  "tags": ["latest", "v1.0"]
}

Supports pagination via n (count) and last (cursor) query params.

Referrers (OCI v1.1)

GET /v2/<name>/referrers/<digest>
200 OK
Content-Type: application/vnd.oci.image.index.v1+json
Body: image index of referring manifests

Supports artifactType query filter. Returns manifests whose subject field points to the given digest.

Catalog (Optional)

GET /v2/_catalog
200 OK
{ "repositories": ["alice/app", "bob/tool"] }

Pagination via n and last. ATCR may choose not to implement this (many registries don't).

Error Response Format

All 4xx/5xx responses must use the OCI error envelope:

{
  "errors": [
    {
      "code": "MANIFEST_UNKNOWN",
      "message": "manifest not found",
      "detail": { "tag": "latest" }
    }
  ]
}

Standard error codes:

| Code | HTTP Status | Meaning |
|---|---|---|
| BLOB_UNKNOWN | 404 | Blob not found |
| BLOB_UPLOAD_INVALID | 400 | Bad digest or size mismatch |
| BLOB_UPLOAD_UNKNOWN | 404 | Upload session expired/missing |
| DIGEST_INVALID | 400 | Digest doesn't match content |
| MANIFEST_BLOB_UNKNOWN | 404 | Manifest references missing blob |
| MANIFEST_INVALID | 400 | Malformed manifest |
| MANIFEST_UNKNOWN | 404 | Manifest not found |
| NAME_INVALID | 400 | Bad repository name |
| NAME_UNKNOWN | 404 | Repository doesn't exist |
| SIZE_INVALID | 400 | Content-Length mismatch |
| UNAUTHORIZED | 401 | Authentication required |
| DENIED | 403 | Permission denied |
| UNSUPPORTED | 405 | Operation not supported |
| TOOMANYREQUESTS | 429 | Rate limited |

What Exists Today vs What's New

For each handler, this breaks down what logic already exists in the storage layer (and just needs to be called) vs what new HTTP glue code must be written. Distribution's handler layer currently handles all the HTTP parsing, header validation, content negotiation, and response formatting -- all of that becomes our responsibility.

Shared New Code

Error helpers (~50 lines, new): OCI error envelope formatting. Currently provided by errcode.ErrorCodeUnauthorized etc.

type RegistryError struct {
    Code    string      `json:"code"`
    Message string      `json:"message"`
    Detail  interface{} `json:"detail,omitempty"`
}

func WriteError(w http.ResponseWriter, status int, code, message string) {
    w.Header().Set("Content-Type", "application/json")
    w.WriteHeader(status)
    json.NewEncoder(w).Encode(struct {
        Errors []RegistryError `json:"errors"`
    }{Errors: []RegistryError{{Code: code, Message: message}}})
}

Auth middleware (~80 lines, mostly exists): ExtractAuthMethod() already exists in middleware/registry.go. Needs adaptation to work standalone (currently wraps distribution's app). Must also generate WWW-Authenticate header for 401 responses -- distribution's token auth handler currently does this via blank import of registry/auth/token.

Identity resolution middleware (~250 lines, exists): NamespaceResolver.Repository() in middleware/registry.go does identity resolution, hold discovery, service token acquisition, and ATProto client creation. This logic moves into an HTTP middleware but the code is the same -- resolves DID, finds hold, gets service token, builds RegistryContext. The validation cache (concurrent service token deduplication) comes along as-is.

Router (~30 lines, new). One wrinkle: Go 1.22's ServeMux only allows a multi-segment wildcard ({name...}) as the final pattern segment, and repository names contain slashes, so the registry routes go through a small dispatcher that splits the path itself:

mux.HandleFunc("GET /v2/{$}", handleVersionCheck)
// <name> can contain slashes, so /v2/<name>/manifests/<ref> etc. are
// matched by a thin dispatcher that parses the path, not mux wildcards.
mux.HandleFunc("/v2/", routeV2)

Handler-by-Handler Breakdown


handleVersionCheck -- GET /v2/

Existing logic: None needed -- this is just a 200 OK response
New code (~10 lines): Return 200 with a Docker-Distribution-API-Version: registry/2.0 header. If unauthenticated, return 401 with a WWW-Authenticate header to trigger Docker's auth flow

handleManifestGet -- GET /v2/<name>/manifests/<reference>

Existing logic: ManifestStore.Get() fetches the manifest from the PDS (record lookup, optional blob download for new-format records) and returns media type + raw bytes; it also fires an async pull notification to the hold for stats. TagStore.Get() resolves tag → digest when the reference is a tag
New code (~40 lines): Parse <reference> to determine tag vs digest. If it's a tag, call TagStore.Get() first to resolve the digest. Call ManifestStore.Get(). Set response headers: Content-Type (manifest media type), Docker-Content-Digest (canonical digest), Content-Length. Write the body. Handle 404 (manifest not found → MANIFEST_UNKNOWN error)
Subtle: Content negotiation -- must check the client's Accept header against the manifest's actual media type. Distribution handles this transparently. If the client doesn't accept the type, return 404. In practice most clients accept everything, but crane and skopeo can be picky
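A minimal Accept-header check along those lines (stdlib only; naming is illustrative, and wildcard subtype forms like application/* are ignored for brevity):

```go
package main

import "strings"

// acceptsMediaType reports whether a (possibly comma-separated) Accept
// header admits the manifest's actual media type. An empty Accept
// header is treated as "accept anything", matching common client behavior.
func acceptsMediaType(accept, mediaType string) bool {
	if accept == "" {
		return true
	}
	for _, part := range strings.Split(accept, ",") {
		part = strings.TrimSpace(part)
		// Strip parameters such as ";q=0.9".
		if i := strings.IndexByte(part, ';'); i >= 0 {
			part = strings.TrimSpace(part[:i])
		}
		if part == "*/*" || part == mediaType {
			return true
		}
	}
	return false
}
```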

handleManifestHead -- HEAD /v2/<name>/manifests/<reference>

Existing logic: ManifestStore.Exists() checks PDS record existence; ManifestStore.Get() is needed for full headers
New code (~30 lines): Same as GET but write headers only, no body. Needs Content-Type, Docker-Content-Digest, Content-Length. Could call Exists() for a fast path and Get() for full header population, or just call Get() and skip the body write
Note: Some clients (Docker) use HEAD to check existence before pulling; must return the same headers as GET

handleManifestPut -- PUT /v2/<name>/manifests/<reference>

Existing logic: ManifestStore.Put() does a lot: calculates the digest, uploads the manifest bytes as a blob to the PDS, creates a ManifestRecord with structured metadata, validates manifest list child references, extracts config labels, fetches README/icon, creates the tag record, fires async notifications to the hold, creates repo page records, and handles successor migration
New code (~50 lines): Read the request body. Extract the Content-Type header as the media type. Parse <reference> to determine if this is a tag push. Call ManifestStore.Put() with payload, media type, and optional tag. Set response headers: Location (/v2/<name>/manifests/<digest>), Docker-Content-Digest. Return 201 Created. Handle errors: MANIFEST_INVALID (bad JSON), MANIFEST_BLOB_UNKNOWN (missing child manifest in a manifest list)
Subtle: Distribution currently wraps the manifest in a distribution.Manifest interface (with Payload() and References() methods) before passing it to Put(). Without distribution, we'd change Put() to accept raw []byte + mediaType + optional tag directly -- simpler, but it requires updating the method signature and its internals

handleManifestDelete -- DELETE /v2/<name>/manifests/<reference>

Existing logic: ManifestStore.Delete() calls ATProtoClient.DeleteRecord()
New code (~15 lines): Parse the digest from <reference>. Call ManifestStore.Delete(). Return 202 Accepted. Handle 404

handleBlobGet -- GET /v2/<name>/blobs/<digest>

Existing logic: ProxyBlobStore.ServeBlob() checks read access, gets a presigned URL from the hold, and issues a 307 redirect. This is already essentially an HTTP handler
New code (~20 lines): Parse the digest from the path. Call the presigned URL logic (read access check + hold XRPC call). Write a 307 redirect with a Location header pointing to the presigned S3 URL
Note: ServeBlob() currently takes http.ResponseWriter and *http.Request -- it's already doing the HTTP work. This handler is mostly just calling it and could almost be used as-is

handleBlobHead -- HEAD /v2/<name>/blobs/<digest>

Existing logic: ProxyBlobStore.Stat() checks read access, gets a presigned HEAD URL, makes a HEAD request to S3, and returns the size
New code (~20 lines): Parse the digest. Call Stat(). Set Content-Length, Docker-Content-Digest, Content-Type: application/octet-stream. Return 200. Handle 404 (BLOB_UNKNOWN)

handleBlobUploadInit -- POST /v2/<name>/blobs/uploads/

Existing logic: ProxyBlobStore.Create() checks write access, generates an upload ID, calls the startMultipartUpload() XRPC to the hold, creates a ProxyBlobWriter, and stores it in the globalUploads map
New code (~50 lines): Check for ?mount=<digest>&from=<repo> query params (cross-repo mount). Check for ?digest=<digest> (monolithic upload -- read the body, write to the store, complete in one shot). Otherwise, call Create() to start a new upload session. Return 202 Accepted with a Location: /v2/<name>/blobs/uploads/<uuid> header and a Docker-Upload-UUID header
Subtle: Monolithic upload (a single POST with digest and body) is a shortcut some clients use. Distribution handles this transparently; we'd need to handle it explicitly: read body, create writer, write, commit. Cross-repo mount is also handled here -- check whether the blob exists in the source repo and skip the upload if so

handleBlobUploadChunk -- PATCH /v2/<name>/blobs/uploads/<uuid>

Existing logic: ProxyBlobWriter.Write() buffers data and auto-flushes 10MB chunks to S3 via presigned URLs. flushPart() handles the XRPC call to the hold for part upload URLs and ETag tracking
New code (~40 lines): Look up the writer in globalUploads by UUID. Parse the Content-Range header (format: <start>-<end>). Read the request body. Call writer.Write(body). Return 202 Accepted with a Location header (same upload URL) and a Range: 0-<total bytes received> header. Handle a missing upload (BLOB_UPLOAD_UNKNOWN)
Subtle: Content-Range validation -- must verify the start offset matches the current writer position (no gaps, no out-of-order chunks). Return 416 Range Not Satisfiable if misaligned. Distribution handles this; we'd need to track and validate it
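A sketch of that Content-Range parsing and offset check (format per the protocol above; names are illustrative):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseContentRange parses the "<start>-<end>" form used by chunked
// blob uploads and checks that the chunk starts exactly at offset
// (the writer's current position): no gaps, no out-of-order chunks.
// The caller maps an offset mismatch to 416 Range Not Satisfiable.
func parseContentRange(header string, offset int64) (start, end int64, err error) {
	parts := strings.SplitN(header, "-", 2)
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("malformed Content-Range %q", header)
	}
	start, err = strconv.ParseInt(parts[0], 10, 64)
	if err != nil {
		return 0, 0, err
	}
	end, err = strconv.ParseInt(parts[1], 10, 64)
	if err != nil {
		return 0, 0, err
	}
	if start != offset {
		return 0, 0, fmt.Errorf("chunk starts at %d, writer is at %d", start, offset)
	}
	return start, end, nil
}
```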

handleBlobUploadComplete -- PUT /v2/<name>/blobs/uploads/<uuid>?digest=sha256:...

Existing logic: ProxyBlobWriter.Commit() flushes the remaining buffer, calls the completeMultipartUpload() XRPC to the hold, and removes the writer from globalUploads
New code (~40 lines): Look up the writer in globalUploads. Parse the ?digest= query param. If the request has a body, write it to the writer (the final chunk can arrive in the PUT). Call writer.Commit() with the digest descriptor. Return 201 Created with Location: /v2/<name>/blobs/<digest> and a Docker-Content-Digest header. Handle errors: DIGEST_INVALID (provided digest doesn't match), BLOB_UPLOAD_UNKNOWN (expired session)
Subtle: Digest validation -- distribution verifies the provided digest matches what was actually uploaded. Our writer doesn't currently track a running digest hash; Commit() just passes the digest through to the hold. Need to decide: trust the hold to validate, or add client-side validation. Currently the hold does the final validation since it has all the parts

handleBlobUploadStatus -- GET /v2/<name>/blobs/uploads/<uuid>

Existing logic: ProxyBlobWriter.Size() returns the total bytes written
New code (~15 lines): Look up the writer in globalUploads. Return 204 No Content with Range: 0-<size - 1>, Docker-Upload-UUID, and Location headers. Handle a missing upload

handleBlobUploadCancel -- DELETE /v2/<name>/blobs/uploads/<uuid>

Existing logic: ProxyBlobWriter.Cancel() calls the abortMultipartUpload() XRPC to the hold and removes the writer from globalUploads
New code (~15 lines): Look up the writer. Call Cancel(). Return 204 No Content. Handle a missing upload

handleTagsList -- GET /v2/<name>/tags/list

Existing logic: TagStore.All() lists all tag records from the PDS and filters by repository
New code (~30 lines): Call TagStore.All(). Parse the ?n= and ?last= query params for pagination (slice the results). Return JSON: {"name": "<name>", "tags": [...]}. Set a Link header for pagination if there are more results
Note: Distribution handles pagination; we'd need to implement it ourselves -- sort tags, apply the cursor, and set a Link header with the next page URL
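The pagination step might look like this (a sketch; n <= 0 stands for an absent ?n= param, and the returned more flag decides whether to emit a Link header):

```go
package main

import "sort"

// paginateTags applies the ?n=/?last= semantics: sort lexically,
// start strictly after last, return at most n tags plus whether
// more remain.
func paginateTags(tags []string, n int, last string) (page []string, more bool) {
	sorted := append([]string(nil), tags...)
	sort.Strings(sorted)
	start := 0
	if last != "" {
		start = sort.SearchStrings(sorted, last)
		if start < len(sorted) && sorted[start] == last {
			start++ // cursor is exclusive
		}
	}
	if n <= 0 || start+n >= len(sorted) {
		return sorted[start:], false
	}
	return sorted[start : start+n], true
}
```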

handleReferrers -- GET /v2/<name>/referrers/<digest>

Existing logic: Not currently implemented in ATCR's storage layer. Distribution may return an empty index
New code (~30 lines): Query manifests that have a subject field pointing to the given digest. Return an OCI image index containing descriptors for each referrer. Support the ?artifactType= filter. If there are no referrers, return an empty index
Note: This is new functionality either way. ATCR would need to query the PDS for manifests with matching subject digests. Could defer this (return an empty index) and implement it properly later

Interface Changes to Storage Layer

The existing stores would need their method signatures simplified. This is mostly mechanical -- removing distribution wrapper types:

ManifestStore changes:

  • Get(): returns (distribution.Manifest, error) → returns (mediaType string, payload []byte, err error)
  • Put(): accepts distribution.Manifest + ...distribution.ManifestServiceOption → accepts payload []byte, mediaType string, tag string
  • Exists() and Delete(): signatures stay roughly the same (just digest.Digest in, error out)
  • Remove rawManifest struct (wrapper implementing distribution.Manifest interface)
  • Remove distribution.WithTagOption extraction logic in Put()

ProxyBlobStore changes:

  • Stat(): returns distribution.Descriptor → returns (size int64, err error)
  • Get(): stays the same (returns []byte)
  • ServeBlob(): already takes http.ResponseWriter/*http.Request -- could become the handler itself
  • Create(): returns distribution.BlobWriter → returns *ProxyBlobWriter directly
  • Resume(): same change
  • Remove distribution.BlobCreateOption / distribution.CreateOptions parsing
  • ProxyBlobWriter.Commit(): accepts distribution.Descriptor → accepts digest string, size int64

TagStore changes:

  • Get(): returns distribution.Descriptor → returns (digest string, err error)
  • Tag(): accepts distribution.Descriptor → accepts digest string
  • All(), Untag(), Lookup(): minimal changes
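To make the direction concrete, here is a sketch of the simplified TagStore shape plus a toy in-memory implementation (signatures are assumptions, not the final API -- the point is plain strings instead of distribution.Descriptor):

```go
package main

import "errors"

// TagStore is the post-migration shape: no distribution types.
type TagStore interface {
	Get(tag string) (digest string, err error)
	Tag(tag, digest string) error
}

// memTagStore is a toy in-memory implementation for illustration only.
type memTagStore struct{ tags map[string]string }

func newMemTagStore() *memTagStore {
	return &memTagStore{tags: map[string]string{}}
}

func (s *memTagStore) Tag(tag, digest string) error {
	s.tags[tag] = digest
	return nil
}

func (s *memTagStore) Get(tag string) (string, error) {
	d, ok := s.tags[tag]
	if !ok {
		return "", errors.New("tag unknown")
	}
	return d, nil
}
```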

RoutingRepository:

  • Removed entirely. Handlers call stores directly. The lazy initialization via sync.Once goes away since there's no interface requiring a Repository object.

Estimated interface change work: ~150 lines changed across storage files + ~150 lines changed across test files.

What Stays

These dependencies are used directly and stay regardless:

  • github.com/opencontainers/go-digest -- Digest parsing/validation (standard, lightweight)
  • github.com/opencontainers/image-spec -- OCI manifest/index structs (optional but useful for validation)
  • github.com/distribution/reference -- Could stay (lightweight, no heavy transitive deps) or replace with string splitting since ATCR's name format is always <identity>/<image>

Revised Effort Estimate

| Component | New Lines | Changed Lines | Notes |
|---|---|---|---|
| Router + version check | ~40 | 0 | Trivial |
| Error helpers | ~50 | 0 | OCI error envelope, error code constants |
| Auth middleware adaptation | ~30 | ~50 | WWW-Authenticate header generation is new; ExtractAuthMethod moves |
| Identity resolution middleware | ~20 | ~30 | NamespaceResolver.Repository() logic moves to HTTP middleware; code is the same |
| Manifest handlers (GET/HEAD/PUT/DELETE) | ~135 | 0 | Content negotiation, header writing, tag vs digest parsing |
| Blob handlers (GET/HEAD/DELETE) | ~55 | 0 | Presigned URL redirect, stat, delete stub |
| Blob upload handlers (POST/PATCH/PUT/GET/DELETE) | ~160 | 0 | Chunked upload protocol, Content-Range validation, monolithic upload, cross-repo mount |
| Tags list handler | ~30 | 0 | Pagination logic |
| Referrers handler | ~30 | 0 | Could defer with empty index |
| Storage interface changes | 0 | ~150 | Remove distribution types from method signatures |
| Test updates | 0 | ~150 | Update mocks and assertions for new signatures |
| Config cleanup | 0 | ~80 | Remove buildDistributionConfig(), blank imports |
| Total | ~550 | ~460 | ~1010 lines total |

This is not a trivial migration. The ~550 new lines are genuine new HTTP handler code that doesn't exist today -- distribution's handler layer provides all of it currently. The changed lines are mostly mechanical (removing distribution type wrappers) but still need care and test updates.

Risk Assessment

Low risk:

  • Storage logic is unchanged -- same PDS calls, same hold XRPC calls, same presigned URLs
  • Auth flow is unchanged -- same JWT validation, same OAuth refresh
  • Tests can be adapted incrementally

Medium risk:

  • Subtle OCI spec compliance gaps (edge cases in content negotiation, digest validation, chunked upload semantics)
  • Docker client compatibility -- different clients (Docker, Podman, crane, skopeo) may exercise different code paths

Mitigation:

  • Use OCI conformance tests to validate
  • Test against Docker, Podman, crane, and skopeo before shipping
  • Can be done incrementally: build new router, test alongside distribution handler, swap when ready

Dependencies Removed

Removing distribution eliminates ~30-40 transitive packages, notably:

  • github.com/aws/aws-sdk-go (v1, EOL)
  • Azure cloud SDK packages
  • Google Cloud Storage packages
  • Distribution-specific logging/metrics
  • Unused storage driver registrations

Most other transitive deps (gRPC, protobuf, OpenTelemetry, logrus) are also pulled by bluesky-social/indigo and would remain.