1. Removing distribution/distribution from the Hold Service (biggest change)

The hold service previously used distribution's StorageDriver interface for all blob operations. This replaces it with direct AWS SDK v2 calls through ATCR's own pkg/s3.S3Service:

- New S3Service methods: Stat(), PutBytes(), Move(), Delete(), WalkBlobs(), ListPrefix() added to pkg/s3/types.go
- Pull zone fix: presigned URLs are now generated against the real S3 endpoint, then the host is swapped to the CDN URL post-signing (previously the CDN URL was set as the endpoint, which broke SigV4 signatures)
- All hold subsystems migrated: GC, OCI uploads, XRPC handlers, profile uploads, scan broadcaster, manifest posts -- all now use *s3.S3Service instead of storagedriver.StorageDriver
- Config simplified: removed the configuration.Storage type and buildStorageConfigFromFields(); replaced with a simple S3Params() method
- Mock expanded: MockS3Client gains an in-memory object store plus 5 new methods, replacing duplicate mockStorageDriver implementations in tests (~160 lines deleted from each test file)

2. Vulnerability Scan UI in AppView (new feature)

Displays scan results from the hold's PDS on the repository page:

- New lexicon: io/atcr/hold/scan.json with a vulnReportBlob field for storing full Grype reports
- Two new HTMX endpoints: /api/scan-result (badge) and /api/vuln-details (modal with CVE table)
- New templates: vuln-badge.html (severity count chips) and vuln-details.html (full CVE table with NVD/GHSA links)
- Repository page: lazy-loads scan badges per manifest via HTMX
- Tests: ~590 lines of test coverage for both handlers

3. S3 Diagnostic Tool

New cmd/s3-test/main.go (418 lines) -- tests S3 connectivity with both SDK v1 and v2, including presigned URL generation, pull zone host swapping, and verbose signing debug output.

4. Deployment Tooling

- New syncServiceUnit() for comparing/updating systemd units on servers
- The update command now syncs config keys (adds missing keys from the template) and service units with daemon-reload

5. DB Migration

0011_fix_captain_successor_column.yaml -- rebuilds hold_captain_records to add the successor column that was missed in a previous migration.

6. Documentation

- APPVIEW-UI-FUTURE.md rewritten as a status-tracked feature inventory
- DISTRIBUTION.md renamed to CREDENTIAL_HELPER.md
- New REMOVING_DISTRIBUTION.md -- 480-line analysis of fully removing distribution from the appview side

7. go.mod

aws-sdk-go v1 moved from indirect to direct (needed by cmd/s3-test).
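The pull zone fix in item 1 -- sign against the real S3 endpoint first, then rewrite only the host -- can be sketched as below. The helper name swapPresignedHost is hypothetical (not ATCR's actual function); the key point is that the SigV4 query parameters are left untouched, and the CDN is expected to forward to the origin, where the signature is validated.

```go
package main

import (
	"fmt"
	"net/url"
)

// swapPresignedHost rewrites a presigned S3 URL to point at the CDN (pull
// zone) after SigV4 signing has completed. Only the host changes; the signed
// path and query parameters are preserved exactly.
func swapPresignedHost(presigned, cdnHost string) (string, error) {
	u, err := url.Parse(presigned)
	if err != nil {
		return "", err
	}
	u.Host = cdnHost
	return u.String(), nil
}

func main() {
	out, _ := swapPresignedHost(
		"https://s3.example.com/bucket/blob?X-Amz-Signature=abc123",
		"cdn.example.net")
	fmt.Println(out) // https://cdn.example.net/bucket/blob?X-Amz-Signature=abc123
}
```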
Removing distribution/distribution
This document analyzes what it would take to remove the github.com/distribution/distribution/v3 library and implement ATCR's own OCI Distribution Spec HTTP endpoints.
Why Consider Removing It
- Impedance mismatch -- Distribution assumes manifests and blobs live in the same storage backend. ATCR routes manifests to ATProto PDS and blobs to hold/S3. Every storage interface is overridden.
- Context value workaround -- Repository() receives only context.Context from distribution's interface, forcing auth/identity data through context keys into RegistryContext.
- Per-request repository creation -- RoutingRepository is recreated on every request because distribution's caching assumptions conflict with ATCR's OAuth session model.
- Stale transitive dependencies -- Distribution pulls in AWS SDK v1 (EOL) via its S3 storage driver, even though ATCR doesn't use that driver.
- Unused features -- GC, notifications, storage drivers, replication -- none are used. ATCR has its own GC, its own event dispatch (processManifestXRPC), and its own S3 integration.
- Upstream maintenance pace -- Slow to merge dependency updates and bug fixes.
What Distribution Currently Provides
Only these pieces are actually used:
| What | Distribution Package | ATCR Usage |
|---|---|---|
| HTTP endpoint routing | registry/handlers | handlers.NewApp() creates the /v2/ handler |
| OCI error responses | registry/api/errcode | ErrorCodeUnauthorized, ErrorCodeDenied, ErrorCodeUnsupported |
| Middleware registration | registry/middleware/registry | Register("atproto-resolver", ...) |
| Repository interface | distribution (root) | Repository, ManifestService, BlobStore, TagService |
| Reference parsing | distribution/reference | reference.Named for identity/image parsing |
| Token auth | registry/auth/token | Blank import for registration |
| In-memory driver | registry/storage/driver/inmemory | Blank import; placeholder since real storage is external |
| Configuration types | configuration | configuration.Configuration struct |
Everything else (S3 driver, GC, notifications, replication, schema validation) is dead weight.
Files That Import Distribution
All in pkg/appview/ -- hold and scanner are unaffected.
Core implementation (8 files):
- storage/routing_repository.go -- distribution.Repository wrapper
- storage/manifest_store.go -- distribution.ManifestService impl
- storage/proxy_blob_store.go -- distribution.BlobStore + BlobWriter impl
- storage/tag_store.go -- distribution.TagService impl
- middleware/registry.go -- distribution.Namespace + middleware registration
- config.go -- builds configuration.Configuration
- server.go -- handlers.NewApp(), errcode for error responses
- cmd/appview/main.go -- blank imports for driver/auth registration
Tests (6 files):
- storage/routing_repository_test.go
- storage/manifest_store_test.go
- storage/proxy_blob_store_test.go
- storage/tag_store_test.go
- middleware/registry_test.go
OCI Distribution Spec Endpoints to Implement
The spec defines these HTTP endpoints. ATCR would need handlers for each.
Version Check
GET /v2/
200 OK (confirms OCI compliance)
401 Unauthorized (triggers auth flow)
Docker clients hit this first. Must return 200 for authenticated requests. A 401 response with WWW-Authenticate header triggers the Docker auth handshake.
Manifests
GET /v2/<name>/manifests/<reference> -> 200 + manifest body
HEAD /v2/<name>/manifests/<reference> -> 200 + headers only
PUT /v2/<name>/manifests/<reference> -> 201 Created
DELETE /v2/<name>/manifests/<reference> -> 202 Accepted
<reference> is either a tag (latest) or digest (sha256:abc...).
Required headers:
- Request Accept: manifest media types the client supports
- Response Content-Type: actual manifest media type
- Response Docker-Content-Digest: canonical digest of the manifest
Media types to support:
- application/vnd.oci.image.manifest.v1+json
- application/vnd.oci.image.index.v1+json
- application/vnd.docker.distribution.manifest.v2+json
- application/vnd.docker.distribution.manifest.list.v2+json
Blobs
GET /v2/<name>/blobs/<digest> -> 200 + blob body (or 307 redirect)
HEAD /v2/<name>/blobs/<digest> -> 200 + headers only
DELETE /v2/<name>/blobs/<digest> -> 202 Accepted
ATCR already redirects to presigned S3 URLs via ServeBlob() -- this would become a direct 307 redirect in the handler.
Blob Uploads (Chunked/Resumable)
Initiate:
POST /v2/<name>/blobs/uploads/
202 Accepted
Location: /v2/<name>/blobs/uploads/<uuid>
Monolithic (single request):
POST /v2/<name>/blobs/uploads/?digest=sha256:...
Content-Type: application/octet-stream
Body: <entire blob>
201 Created
Chunked:
PATCH /v2/<name>/blobs/uploads/<uuid>
Content-Type: application/octet-stream
Content-Range: <start>-<end>
Body: <chunk data>
202 Accepted
Range: 0-<end>
(repeat PATCH for each chunk)
PUT /v2/<name>/blobs/uploads/<uuid>?digest=sha256:...
201 Created
Location: /v2/<name>/blobs/<digest>
Check progress:
GET /v2/<name>/blobs/uploads/<uuid>
204 No Content
Range: 0-<bytes received>
Cancel:
DELETE /v2/<name>/blobs/uploads/<uuid>
204 No Content
Cross-repo mount:
POST /v2/<name>/blobs/uploads/?mount=<digest>&from=<other-repo>
201 Created (if blob exists in source repo)
202 Accepted (fall back to regular upload)
Tags
GET /v2/<name>/tags/list
200 OK
{
"name": "<name>",
"tags": ["latest", "v1.0"]
}
Supports pagination via n (count) and last (cursor) query params.
Referrers (OCI v1.1)
GET /v2/<name>/referrers/<digest>
200 OK
Content-Type: application/vnd.oci.image.index.v1+json
Body: image index of referring manifests
Supports artifactType query filter. Returns manifests whose subject field points to the given digest.
Catalog (Optional)
GET /v2/_catalog
200 OK
{ "repositories": ["alice/app", "bob/tool"] }
Pagination via n and last. ATCR may choose not to implement this (many registries don't).
Error Response Format
All 4xx/5xx responses must use the OCI error envelope:
{
"errors": [
{
"code": "MANIFEST_UNKNOWN",
"message": "manifest not found",
"detail": { "tag": "latest" }
}
]
}
Standard error codes:
| Code | HTTP Status | Meaning |
|---|---|---|
| BLOB_UNKNOWN | 404 | Blob not found |
| BLOB_UPLOAD_INVALID | 400 | Bad digest or size mismatch |
| BLOB_UPLOAD_UNKNOWN | 404 | Upload session expired/missing |
| DIGEST_INVALID | 400 | Digest doesn't match content |
| MANIFEST_BLOB_UNKNOWN | 404 | Manifest references missing blob |
| MANIFEST_INVALID | 400 | Malformed manifest |
| MANIFEST_UNKNOWN | 404 | Manifest not found |
| NAME_INVALID | 400 | Bad repository name |
| NAME_UNKNOWN | 404 | Repository doesn't exist |
| SIZE_INVALID | 400 | Content-Length mismatch |
| UNAUTHORIZED | 401 | Authentication required |
| DENIED | 403 | Permission denied |
| UNSUPPORTED | 405 | Operation not supported |
| TOOMANYREQUESTS | 429 | Rate limited |
What Exists Today vs What's New
For each handler, this breaks down what logic already exists in the storage layer (and just needs to be called) vs what new HTTP glue code must be written. Distribution's handler layer currently handles all the HTTP parsing, header validation, content negotiation, and response formatting -- all of that becomes our responsibility.
Shared New Code
Error helpers (~50 lines, new):
OCI error envelope formatting. Currently provided by errcode.ErrorCodeUnauthorized etc.
```go
import (
	"encoding/json"
	"net/http"
)

// RegistryError is one entry in the OCI error envelope.
type RegistryError struct {
	Code    string      `json:"code"`
	Message string      `json:"message"`
	Detail  interface{} `json:"detail,omitempty"`
}

// WriteError writes a single-error OCI envelope with the given status.
func WriteError(w http.ResponseWriter, status int, code, message string) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	json.NewEncoder(w).Encode(struct {
		Errors []RegistryError `json:"errors"`
	}{Errors: []RegistryError{{Code: code, Message: message}}})
}
```
Auth middleware (~80 lines, mostly exists):
ExtractAuthMethod() already exists in middleware/registry.go. Needs adaptation to work standalone (currently wraps distribution's app). Must also generate WWW-Authenticate header for 401 responses -- distribution's token auth handler currently does this via blank import of registry/auth/token.
Identity resolution middleware (~250 lines, exists):
NamespaceResolver.Repository() in middleware/registry.go does identity resolution, hold discovery, service token acquisition, and ATProto client creation. This logic moves into an HTTP middleware but the code is the same -- resolves DID, finds hold, gets service token, builds RegistryContext. The validation cache (concurrent service token deduplication) comes along as-is.
Router (~30 lines, new). One caveat: net/http's ServeMux (Go 1.22+) only allows the {name...} wildcard as the final path segment, so a multi-segment repository name can't be captured mid-path. Since ATCR names are always <identity>/<image>, two single-segment wildcards work:

```go
mux.HandleFunc("GET /v2/{$}", handleVersionCheck)
mux.HandleFunc("GET /v2/{identity}/{image}/manifests/{reference}", handleManifestGet)
// ... etc
```
Handler-by-Handler Breakdown
handleVersionCheck -- GET /v2/
| Existing logic | None needed -- this is just a 200 OK response |
| New code | ~10 lines. Return 200 with Docker-Distribution-API-Version: registry/2.0 header. If unauthenticated, return 401 with WWW-Authenticate header to trigger Docker's auth flow |
handleManifestGet -- GET /v2/<name>/manifests/<reference>
| Existing logic | ManifestStore.Get() fetches manifest from PDS (record lookup, optional blob download for new-format records). Returns media type + raw bytes. Also fires async pull notification to hold for stats. TagStore.Get() resolves tag → digest when reference is a tag |
| New code (~40 lines) | Parse <reference> to determine tag vs digest. If tag, call TagStore.Get() first to resolve digest. Call ManifestStore.Get(). Set response headers: Content-Type (manifest media type), Docker-Content-Digest (canonical digest), Content-Length. Write body. Handle 404 (manifest not found → MANIFEST_UNKNOWN error) |
| Subtle | Content negotiation: must check client's Accept header against the manifest's actual media type. Distribution handles this transparently. If client doesn't accept the type, return 404. In practice most clients accept everything, but crane and skopeo can be picky |
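Distribution does this negotiation transparently; a standalone handler needs an explicit check. A minimal sketch, deliberately ignoring q-values and partial wildcards like application/*, which registry clients rarely send:

```go
package main

import (
	"fmt"
	"strings"
)

// acceptsMediaType reports whether the client's Accept header values allow a
// manifest's actual media type. No Accept header at all means accept
// anything, matching how most clients behave in practice.
func acceptsMediaType(accept []string, mediaType string) bool {
	if len(accept) == 0 {
		return true
	}
	for _, header := range accept {
		for _, part := range strings.Split(header, ",") {
			// Drop any ;q= parameters and surrounding whitespace.
			v := strings.TrimSpace(strings.SplitN(part, ";", 2)[0])
			if v == mediaType || v == "*/*" {
				return true
			}
		}
	}
	return false
}

func main() {
	fmt.Println(acceptsMediaType(
		[]string{"application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json"},
		"application/vnd.oci.image.index.v1+json")) // true
}
```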
handleManifestHead -- HEAD /v2/<name>/manifests/<reference>
| Existing logic | ManifestStore.Exists() checks PDS record existence. ManifestStore.Get() needed for full headers |
| New code (~30 lines) | Same as GET but write headers only, no body. Needs Content-Type, Docker-Content-Digest, Content-Length. Could call Exists() for a fast path and Get() for full header population, or just call Get() and skip the body write |
| Note | Some clients (Docker) use HEAD to check existence before pulling. Must return same headers as GET |
handleManifestPut -- PUT /v2/<name>/manifests/<reference>
| Existing logic | ManifestStore.Put() does a LOT: calculates digest, uploads manifest bytes as blob to PDS, creates ManifestRecord with structured metadata, validates manifest list child references, extracts config labels, fetches README/icon, creates tag record, fires async notifications to hold, creates repo page records, handles successor migration |
| New code (~50 lines) | Read request body. Extract Content-Type header as media type. Parse <reference> to determine if this is a tag push. Call ManifestStore.Put() with payload, media type, and optional tag. Set response headers: Location (/v2/<name>/manifests/<digest>), Docker-Content-Digest. Return 201 Created. Handle errors: MANIFEST_INVALID (bad JSON), MANIFEST_BLOB_UNKNOWN (missing child manifest in manifest list) |
| Subtle | Distribution currently wraps the manifest in a distribution.Manifest interface (with Payload() and References() methods) before passing to Put(). Without distribution, we'd change Put() to accept raw []byte + mediaType + optional tag directly -- simpler but requires updating the method signature and its internals |
handleManifestDelete -- DELETE /v2/<name>/manifests/<reference>
| Existing logic | ManifestStore.Delete() calls ATProtoClient.DeleteRecord() |
| New code (~15 lines) | Parse digest from <reference>. Call ManifestStore.Delete(). Return 202 Accepted. Handle 404 |
handleBlobGet -- GET /v2/<name>/blobs/<digest>
| Existing logic | ProxyBlobStore.ServeBlob() checks read access, gets presigned URL from hold, and issues 307 redirect. This is already essentially an HTTP handler |
| New code (~20 lines) | Parse digest from path. Call the presigned URL logic (read access check + hold XRPC call). Write 307 redirect with Location header pointing to presigned S3 URL |
| Note | ServeBlob() currently takes http.ResponseWriter and *http.Request -- it's already doing the HTTP work. This handler is mostly just calling it. Could almost be used as-is |
handleBlobHead -- HEAD /v2/<name>/blobs/<digest>
| Existing logic | ProxyBlobStore.Stat() checks read access, gets presigned HEAD URL, makes HEAD request to S3, returns size |
| New code (~20 lines) | Parse digest. Call Stat(). Set Content-Length, Docker-Content-Digest, Content-Type: application/octet-stream. Return 200. Handle 404 (BLOB_UNKNOWN) |
handleBlobUploadInit -- POST /v2/<name>/blobs/uploads/
| Existing logic | ProxyBlobStore.Create() checks write access, generates upload ID, calls startMultipartUpload() XRPC to hold, creates ProxyBlobWriter, stores in globalUploads map |
| New code (~50 lines) | Check for ?mount=<digest>&from=<repo> query params (cross-repo mount). Check for ?digest=<digest> (monolithic upload -- read body, write to store, complete in one shot). Otherwise, call Create() to start a new upload session. Return 202 Accepted with Location: /v2/<name>/blobs/uploads/<uuid> header, Docker-Upload-UUID header |
| Subtle | Monolithic upload (single POST with digest and body) is a shortcut some clients use. Distribution handles this transparently. We'd need to handle it explicitly: read body, create writer, write, commit. Cross-repo mount is also handled here -- check if blob exists in source repo, skip upload if so |
handleBlobUploadChunk -- PATCH /v2/<name>/blobs/uploads/<uuid>
| Existing logic | ProxyBlobWriter.Write() buffers data and auto-flushes 10MB chunks to S3 via presigned URLs. flushPart() handles the XRPC call to hold for part upload URLs and ETag tracking |
| New code (~40 lines) | Look up writer from globalUploads by UUID. Parse Content-Range header (format: <start>-<end>). Read request body. Call writer.Write(body). Return 202 Accepted with Location header (same upload URL), Range: 0-<total bytes received> header. Handle missing upload (BLOB_UPLOAD_UNKNOWN) |
| Subtle | Content-Range validation: must verify start offset matches current writer position (no gaps, no out-of-order). Return 416 Range Not Satisfiable if misaligned. Distribution handles this; we'd need to track and validate |
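The tracking-and-validation piece is small; a sketch of the parse plus the offset check that maps to 416:

```go
package main

import "fmt"

// parseContentRange parses the <start>-<end> form used by chunked blob
// uploads and rejects chunks that don't start at the writer's current
// offset (the gap/out-of-order case that should produce 416).
func parseContentRange(header string, currentOffset int64) (start, end int64, err error) {
	if _, err = fmt.Sscanf(header, "%d-%d", &start, &end); err != nil {
		return 0, 0, fmt.Errorf("malformed Content-Range %q", header)
	}
	if end < start {
		return 0, 0, fmt.Errorf("range end %d before start %d", end, start)
	}
	if start != currentOffset {
		return 0, 0, fmt.Errorf("range start %d does not match current offset %d", start, currentOffset)
	}
	return start, end, nil
}

func main() {
	s, e, err := parseContentRange("100-199", 100)
	fmt.Println(s, e, err) // 100 199 <nil>
}
```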
handleBlobUploadComplete -- PUT /v2/<name>/blobs/uploads/<uuid>?digest=sha256:...
| Existing logic | ProxyBlobWriter.Commit() flushes remaining buffer, calls completeMultipartUpload() XRPC to hold, removes writer from globalUploads |
| New code (~40 lines) | Look up writer from globalUploads. Parse ?digest= query param. If request has body, write it to the writer (final chunk can be in the PUT). Call writer.Commit() with digest descriptor. Return 201 Created with Location: /v2/<name>/blobs/<digest>, Docker-Content-Digest header. Handle errors: DIGEST_INVALID (provided digest doesn't match), BLOB_UPLOAD_UNKNOWN (expired session) |
| Subtle | Digest validation: distribution verifies the provided digest matches what was actually uploaded. Our writer doesn't currently track a running digest hash -- Commit() just passes the digest through to hold. Need to decide: trust the hold to validate, or add client-side validation. Currently hold does the final validation since it has all the parts |
handleBlobUploadStatus -- GET /v2/<name>/blobs/uploads/<uuid>
| Existing logic | ProxyBlobWriter.Size() returns total bytes written |
| New code (~15 lines) | Look up writer from globalUploads. Return 204 No Content with Range: 0-<size - 1>, Docker-Upload-UUID, Location headers. Handle missing upload |
handleBlobUploadCancel -- DELETE /v2/<name>/blobs/uploads/<uuid>
| Existing logic | ProxyBlobWriter.Cancel() calls abortMultipartUpload() XRPC to hold, removes from globalUploads |
| New code (~15 lines) | Look up writer. Call Cancel(). Return 204 No Content. Handle missing upload |
handleTagsList -- GET /v2/<name>/tags/list
| Existing logic | TagStore.All() lists all tag records from PDS, filters by repository |
| New code (~30 lines) | Call TagStore.All(). Parse ?n= and ?last= query params for pagination (slice the results). Return JSON: {"name": "<name>", "tags": [...]}. Set Link header for pagination if there are more results |
| Note | Distribution handles pagination. We'd need to implement it ourselves -- sort tags, apply cursor, set Link header with next page URL |
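The cursor logic is a few lines; a sketch of the sort-then-slice semantics (the function name is illustrative):

```go
package main

import (
	"fmt"
	"sort"
)

// paginateTags applies OCI tags/list pagination: sort lexically, resume
// after the exclusive `last` cursor, return at most n entries (n <= 0
// means return everything).
func paginateTags(tags []string, n int, last string) []string {
	sorted := append([]string(nil), tags...) // don't mutate the caller's slice
	sort.Strings(sorted)
	start := 0
	if last != "" {
		start = sort.SearchStrings(sorted, last)
		if start < len(sorted) && sorted[start] == last {
			start++ // `last` itself is excluded from the next page
		}
	}
	out := sorted[start:]
	if n > 0 && n < len(out) {
		out = out[:n]
	}
	return out
}

func main() {
	tags := []string{"v1.0", "latest", "v2.0"}
	fmt.Println(paginateTags(tags, 2, ""))     // [latest v1.0]
	fmt.Println(paginateTags(tags, 2, "v1.0")) // [v2.0]
}
```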
handleReferrers -- GET /v2/<name>/referrers/<digest>
| Existing logic | Not currently implemented in ATCR's storage layer. Distribution may return an empty index |
| New code (~30 lines) | Query manifests that have a subject field pointing to the given digest. Return an OCI image index containing descriptors for each referrer. Support ?artifactType= filter. If no referrers, return empty index |
| Note | This is new functionality either way. ATCR would need to query PDS for manifests with matching subject digests. Could defer this (return empty index) and implement properly later |
Interface Changes to Storage Layer
The existing stores would need their method signatures simplified. This is mostly mechanical -- removing distribution wrapper types:
ManifestStore changes:
- Get(): returns (distribution.Manifest, error) → returns (mediaType string, payload []byte, err error)
- Put(): accepts distribution.Manifest + ...distribution.ManifestServiceOption → accepts payload []byte, mediaType string, tag string
- Exists() and Delete(): signatures stay roughly the same (just digest.Digest in, error out)
- Remove the rawManifest struct (wrapper implementing the distribution.Manifest interface)
- Remove the distribution.WithTagOption extraction logic in Put()
ProxyBlobStore changes:
- Stat(): returns distribution.Descriptor → returns (size int64, err error)
- Get(): stays the same (returns []byte)
- ServeBlob(): already takes http.ResponseWriter / *http.Request -- could become the handler itself
- Create(): returns distribution.BlobWriter → returns *ProxyBlobWriter directly
- Resume(): same change
- Remove distribution.BlobCreateOption / distribution.CreateOptions parsing
- ProxyBlobWriter.Commit(): accepts distribution.Descriptor → accepts digest string, size int64
TagStore changes:
- Get(): returns distribution.Descriptor → returns (digest string, err error)
- Tag(): accepts distribution.Descriptor → accepts digest string
- All(), Untag(), Lookup(): minimal changes
RoutingRepository:
- Removed entirely. Handlers call stores directly. The lazy initialization via sync.Once goes away since there's no interface requiring a Repository object.
Estimated interface change work: ~150 lines changed across storage files + ~150 lines changed across test files.
What Stays
These dependencies are used directly and stay regardless:
- github.com/opencontainers/go-digest -- digest parsing/validation (standard, lightweight)
- github.com/opencontainers/image-spec -- OCI manifest/index structs (optional but useful for validation)
- github.com/distribution/reference -- could stay (lightweight, no heavy transitive deps) or be replaced with string splitting, since ATCR's name format is always <identity>/<image>
Revised Effort Estimate
| Component | New Lines | Changed Lines | Notes |
|---|---|---|---|
| Router + version check | ~40 | 0 | Trivial |
| Error helpers | ~50 | 0 | OCI error envelope, error code constants |
| Auth middleware adaptation | ~30 | ~50 | WWW-Authenticate header generation is new; ExtractAuthMethod moves |
| Identity resolution middleware | ~20 | ~30 | NamespaceResolver.Repository() logic moves to HTTP middleware; code is the same |
| Manifest handlers (GET/HEAD/PUT/DELETE) | ~135 | 0 | Content negotiation, header writing, tag vs digest parsing |
| Blob handlers (GET/HEAD/DELETE) | ~55 | 0 | Presigned URL redirect, stat, delete stub |
| Blob upload handlers (POST/PATCH/PUT/GET/DELETE) | ~160 | 0 | Chunked upload protocol, Content-Range validation, monolithic upload, cross-repo mount |
| Tags list handler | ~30 | 0 | Pagination logic |
| Referrers handler | ~30 | 0 | Could defer with empty index |
| Storage interface changes | 0 | ~150 | Remove distribution types from method signatures |
| Test updates | 0 | ~150 | Update mocks and assertions for new signatures |
| Config cleanup | 0 | ~80 | Remove buildDistributionConfig(), blank imports |
| Total | ~550 new | ~460 changed | ~1010 lines total |
This is not a trivial migration. The ~550 new lines are genuine new HTTP handler code that doesn't exist today -- distribution's handler layer provides all of it currently. The changed lines are mostly mechanical (removing distribution type wrappers) but still need care and test updates.
Risk Assessment
Low risk:
- Storage logic is unchanged -- same PDS calls, same hold XRPC calls, same presigned URLs
- Auth flow is unchanged -- same JWT validation, same OAuth refresh
- Tests can be adapted incrementally
Medium risk:
- Subtle OCI spec compliance gaps (edge cases in content negotiation, digest validation, chunked upload semantics)
- Docker client compatibility -- different clients (Docker, Podman, crane, skopeo) may exercise different code paths
Mitigation:
- Use OCI conformance tests to validate
- Test against Docker, Podman, crane, and skopeo before shipping
- Can be done incrementally: build new router, test alongside distribution handler, swap when ready
Dependencies Removed
Removing distribution eliminates ~30-40 transitive packages, notably:
- github.com/aws/aws-sdk-go (v1, EOL)
- Azure cloud SDK packages
- Google Cloud Storage packages
- Distribution-specific logging/metrics
- Unused storage driver registrations
Most other transitive deps (gRPC, protobuf, OpenTelemetry, logrus) are also pulled by bluesky-social/indigo and would remain.