at-container-registry/docs/EMBEDDED_PDS.md
2025-10-14 20:25:08 -05:00

Embedded PDS Architecture for Hold Services

This document explores the evolution of ATCR's hold service architecture toward becoming an embedded ATProto PDS (Personal Data Server).

Motivation

Comparison to Other ATProto Projects

Several ATProto projects face similar challenges with large data storage:

Project        Large Data        Metadata                     Current Solution
tangled.org    Git objects       Issues, PRs, comments        External knot storage
stream.place   Video segments    Stream info, chat            Embedded "static PDS"
ATCR           Container blobs   Manifests, comments, builds  External hold service

Common problem: Large binary data can't realistically live in user PDSs, but interaction metadata gets fragmented across different users' PDSs.

Emerging pattern: Application-specific storage services with embedded minimal PDS implementations.

The Fragmentation Problem

Tangled.org Example

user/myproject repository
├── Git data → Knot (external storage)
├── Issues → Created by @alice → Lives in alice's PDS
├── PRs → Created by @bob → Lives in bob's PDS
└── Comments → Created by @charlie → Lives in charlie's PDS

Problems:

  • Repo owner can't export all issues/PRs easily
  • No single source of truth for repo metadata
  • Interaction history fragmented across PDSs
  • Can't encrypt repo data while maintaining collaboration

ATCR's Similar Challenge

atcr.io/alice/myapp
├── Manifests → alice's PDS
├── Blobs → Hold service (external)
└── Future: Comments, builds, attestations → Where?

Stream.place's Approach

Stream.place built a minimal "static PDS" embedded in their application with just the XRPC endpoints they need:

  • com.atproto.repo.describeRepo
  • com.atproto.sync.subscribeRepos
  • Minimal read methods

Why: Avoid rate-limiting Bluesky's infrastructure with video segments while staying ATProto-native.

Current Hold Service Architecture

The current hold service is intentionally minimal:

Hold Service =
  - OAuth token validation (call user's PDS)
  - Generate presigned S3 URLs
  - Return HTTP redirects
  - Optional crew membership checks

Endpoints:

  • POST /get-presigned-url → S3 download URL
  • POST /put-presigned-url → S3 upload URL
  • GET /blobs/{digest} → Proxy fallback
  • PUT /blobs/{digest} → Proxy fallback
  • GET /health → Health check

Resource footprint:

  • Single Go binary (~20MB)
  • No database (stateless)
  • No PDS (validates against user's PDS)
  • Minimal memory/CPU (just signing URLs)
  • S3 does all the heavy lifting

This is already as cheap as possible for what it does - just an OAuth validation + URL signing service.

Why Not Force Blobs into User PDSs?

Size Considerations

PDS blob limits: Default ~50MB (Bluesky may be lower)

Container layer sizes:

  • Alpine base: ~5MB ✓
  • Config blobs: ~1-5KB ✓
  • Small Go binaries: 10-30MB ✓
  • Node.js base: 100-200MB ✗
  • Python base: 50-100MB ✗
  • ML models: 500MB - 10GB ✗
  • Large datasets: huge ✗

Reality: Many/most layers exceed 50MB. A split-brain approach would be the norm, not the exception.
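That threshold arithmetic can be captured in a one-line check (the ~50MB cap is the default quoted above; `wouldSplitBrain` is an illustrative name, not real code):

```go
package main

import "fmt"

// pdsBlobLimit reflects the ~50MB default PDS blob cap quoted above.
const pdsBlobLimit = 50 * 1024 * 1024

// wouldSplitBrain reports whether a layer is too large for a user PDS and
// would have to spill into a hold under a split-brain scheme.
func wouldSplitBrain(layerSize int64) bool {
	return layerSize > pdsBlobLimit
}

func main() {
	fmt.Println(wouldSplitBrain(5 * 1024 * 1024))   // Alpine base: false
	fmt.Println(wouldSplitBrain(150 * 1024 * 1024)) // Node.js base: true
}
```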

Split-Brain Complexity

func (s *SplitBlobStore) Create(ctx context.Context, options ...distribution.BlobCreateOption) (distribution.BlobWriter, error) {
    // Challenges:
    // 1. Monolithic uploads: Size known upfront ✓
    // 2. Chunked uploads: Size unknown until complete ✗
    // 3. Resumable uploads: State management across PDS/hold ✗
    // 4. Mount/cross-repo: Which backend to check? ✗
}

Detection works for simple cases but breaks down with:

  • Multipart/chunked uploads (no size until complete)
  • Resumable uploads (stateful across boundaries)
  • Cross-repository blob mounts (which backend?)

Pragmatic Decision

Accept the trade-off:

  • Blobs in holds (practical for large data)
  • Manifests in user's PDS (ownership of metadata)
  • Focus on making holds easy to deploy and migrate

Users still own the important part - the manifest is the source of truth for what the image is.

Embedded PDS Vision

Key Insight: Hold is the PDS

Because blobs are content-addressed and deduplicated globally, there isn't a singular owner of blob data. Multiple images share the same base layer blobs.

Therefore: The hold itself is the PDS (with identity did:web:hold1.example.com), not individual image repositories.

Proposed Architecture

Hold Service = Minimal PDS (did:web:hold1.example.com)
├── Standard ATProto blob endpoints:
│   ├── com.atproto.sync.uploadBlob
│   ├── com.atproto.sync.getBlob
│   └── Blob storage → S3 (like normal PDS)
├── Custom XRPC methods:
│   ├── io.atcr.hold.delegateAccess (IAM)
│   ├── io.atcr.hold.getUploadUrl (optimization)
│   ├── io.atcr.hold.getDownloadUrl (optimization)
│   ├── io.atcr.hold.exportImage (data portability)
│   └── io.atcr.hold.getStats (metadata)
└── Records (hold's own PDS):
    ├── io.atcr.hold.crew (crew membership)
    └── io.atcr.hold.config (hold configuration)

Benefits

  1. ATProto-native: Uses standard XRPC, not custom REST API
  2. Discoverable: Hold's DID document advertises capabilities
  3. Portable: Users can export images via XRPC
  4. Standardized: Blob operations use ATProto conventions
  5. Future-proof: Can add more XRPC methods as needed
  6. Interoperable: Works with ATProto tooling

Implementation Details

1. SHA256 to CID Mapping

ATProto uses CIDs (Content Identifiers) for blobs, while OCI uses SHA256 digests. Conveniently, CIDs support SHA256 as the hash function.

Key insight: We can construct CIDs directly from SHA256 digests with no additional storage needed!

// pkg/hold/cid.go
func DigestToCID(digest string) (cid.Cid, error) {
    // "sha256:abc123..." → raw hash bytes
    hash, err := hex.DecodeString(strings.TrimPrefix(digest, "sha256:"))
    if err != nil {
        return cid.Undef, err
    }

    // Wrap the hash in a multihash, then build a CIDv1 (raw codec, sha2-256)
    mh, err := multihash.Encode(hash, multihash.SHA2_256)
    if err != nil {
        return cid.Undef, err
    }
    return cid.NewCidV1(cid.Raw, mh), nil
}

func CIDToDigest(c cid.Cid) string {
    // Decode the multihash, then format the digest bytes → "sha256:abc..."
    dec, _ := multihash.Decode(c.Hash())
    return fmt.Sprintf("sha256:%x", dec.Digest)
}

Mapping:

OCI digest:     sha256:abc123...
ATProto CID:    bafkrei... (CIDv1, raw codec, sha256, base32 encoded)
Storage path:   s3://bucket/blobs/sha256/ab/abc123...

Blobs stay in distribution's layout; we just compute the CID on the fly. No mapping records needed.

2. Storage: Distribution Layout with PDS Interface

The hold's blob storage uses distribution's driver directly - no encoding or transformation:

type HoldBlobStore struct {
    storageDriver storagedriver.StorageDriver  // S3, filesystem, etc
}

// Implements ATProto blob interface
func (h *HoldBlobStore) UploadBlob(ctx context.Context, data io.Reader) (cid.Cid, error) {
    // 1. Buffer the blob while computing its sha256
    //    (a production version would stream via the driver's Writer instead)
    var buf bytes.Buffer
    hasher := sha256.New()
    if _, err := io.Copy(io.MultiWriter(&buf, hasher), data); err != nil {
        return cid.Undef, err
    }
    digest := fmt.Sprintf("sha256:%x", hasher.Sum(nil))

    // 2. Store at distribution's path: blobs/sha256/ab/abc123...
    path := h.blobPath(digest)
    if err := h.storageDriver.PutContent(ctx, path, buf.Bytes()); err != nil {
        return cid.Undef, err
    }

    // 3. Return CID (computed from the sha256 digest)
    return DigestToCID(digest)
}

func (h *HoldBlobStore) GetBlob(ctx context.Context, c cid.Cid) (io.ReadCloser, error) {
    // 1. Convert CID → sha256 digest
    digest := CIDToDigest(c)

    // 2. Stream from distribution's path (offset 0 reads the whole blob)
    path := h.blobPath(digest)
    return h.storageDriver.Reader(ctx, path, 0)
}

Storage continues to use distribution's existing S3 layout. The PDS interface is just a wrapper.
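The `blobPath` helper used above can be sketched as a pure function. It mirrors the simplified "blobs/sha256/ab/..." layout shown in this document (distribution's actual on-disk layout nests under /docker/registry/v2/ and ends in a /data element):

```go
package main

import (
	"fmt"
	"strings"
)

// blobPath maps an OCI digest to the content-addressed layout: the first
// two hex characters of the digest become a fan-out directory.
func blobPath(digest string) (string, error) {
	parts := strings.SplitN(digest, ":", 2)
	if len(parts) != 2 || len(parts[1]) < 2 {
		return "", fmt.Errorf("malformed digest: %q", digest)
	}
	algo, hexPart := parts[0], parts[1]
	return fmt.Sprintf("blobs/%s/%s/%s", algo, hexPart[:2], hexPart), nil
}

func main() {
	p, _ := blobPath("sha256:abc123def456")
	fmt.Println(p) // blobs/sha256/ab/abc123def456
}
```

The fan-out directory keeps any single directory from accumulating millions of entries while staying fully derivable from the digest.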

3. Authentication & IAM

Challenge: ATProto operations are authenticated AS the account owner. For hold operations, we need actions to be performed AS the hold (not individual users), but authorized BY crew members.

Important context: AppView manages the user's OAuth session. When users authenticate via the credential helper, they actually authenticate through AppView's web interface. AppView obtains and stores the user's OAuth token and DPoP key. The credential helper only receives a registry JWT.

Proposed: DPoP Proof Delegation (Standard ATProto Federation)

1. User authenticates via AppView (OAuth flow)
   - AppView obtains: OAuth token, refresh token, DPoP key, DID
   - AppView stores these in its token storage
   - Credential helper receives: Registry JWT only

2. When AppView needs blob access, it calls hold:
   POST /xrpc/io.atcr.hold.delegateAccess
   Headers: Authorization: DPoP <user-oauth-token>
            DPoP: <proof-signed-with-user-dpop-key>
   Body: {
     "userDid": "did:plc:alice123",
     "purpose": "blob-upload",
     "duration": 900
   }

3. Hold validates (standard ATProto token validation):
   - Verify DPoP proof signature matches token's bound key
   - Call user's PDS: com.atproto.server.getSession (validates token)
   - Extract user's DID from validated session
   - Check user's DID in hold's crew records
   - If authorized, issue temporary token for blob operations

4. AppView uses delegated token for blob operations:
   POST /xrpc/com.atproto.sync.uploadBlob
   Headers: Authorization: DPoP <hold-token>
            DPoP: <proof>

This is standard ATProto federation - services pass OAuth tokens with DPoP proofs between each other. Hold independently validates tokens against the user's PDS, so there's no trust relationship required.

Crew records stored in hold's PDS:

{
  "$type": "io.atcr.hold.crew",
  "member": "did:plc:alice123",
  "role": "admin",
  "permissions": ["blob:read", "blob:write", "crew:manage"],
  "addedAt": "2025-10-14T..."
}

Security considerations:

  • User's OAuth token is exposed to hold during delegation
  • However, hold independently validates it (can't be forged)
  • Tokens are short-lived (15min typical)
  • Hold only accepts tokens for crew members
  • Hold validates DPoP binding (requires private key)
  • Standard ATProto security model
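Once the token has been validated against the user's PDS and the DID extracted, the crew check in step 3 reduces to a small pure function. A sketch with illustrative types (field names follow the crew record above; the purpose-to-permission mapping is an assumption):

```go
package main

import "fmt"

// crewEntry mirrors the io.atcr.hold.crew record fields used above.
type crewEntry struct {
	Member      string
	Permissions []string
}

// authorizeDelegation sketches the crew check: map the requested purpose
// to a permission and look for it in the member's crew record. Unknown
// purposes and non-members are denied by default.
func authorizeDelegation(userDID, purpose string, crew []crewEntry) bool {
	needed := map[string]string{
		"blob-upload":   "blob:write",
		"blob-download": "blob:read",
	}[purpose]
	if needed == "" {
		return false // unknown purpose: deny by default
	}
	for _, c := range crew {
		if c.Member != userDID {
			continue
		}
		for _, p := range c.Permissions {
			if p == needed {
				return true
			}
		}
	}
	return false
}

func main() {
	crew := []crewEntry{{Member: "did:plc:alice123", Permissions: []string{"blob:read", "blob:write"}}}
	fmt.Println(authorizeDelegation("did:plc:alice123", "blob-upload", crew)) // true
	fmt.Println(authorizeDelegation("did:plc:eve999", "blob-upload", crew))   // false
}
```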

4. Presigned URLs for Optimized Egress

While standard ATProto blob endpoints work, direct S3 access is more efficient. Hold can expose custom XRPC methods:

// io.atcr.hold.getUploadUrl - Get presigned upload URL
type GetUploadUrlRequest struct {
    Digest string  // sha256:abc...
    Size   int64
}

type GetUploadUrlResponse struct {
    UploadURL string     // Presigned S3 URL
    ExpiresAt time.Time
}

// io.atcr.hold.getDownloadUrl - Get presigned download URL
type GetDownloadUrlRequest struct {
    Digest string
}

type GetDownloadUrlResponse struct {
    DownloadURL string   // Presigned S3 URL
    ExpiresAt   time.Time
}

AppView uses optimized path:

func (a *ATProtoBlobStore) ServeBlob(ctx context.Context, w http.ResponseWriter, r *http.Request, dgst digest.Digest) error {
    // Try the optimized presigned URL endpoint first
    resp, err := a.client.GetDownloadUrl(ctx, dgst)
    if err == nil {
        // Redirect the client directly to S3
        http.Redirect(w, r, resp.DownloadURL, http.StatusTemporaryRedirect)
        return nil
    }

    // Fallback: standard ATProto blob endpoint (proxied through AppView)
    c, err := DigestToCID(dgst.String())
    if err != nil {
        return err
    }
    reader, err := a.client.GetBlob(ctx, a.holdDID, c)
    if err != nil {
        return err
    }
    defer reader.Close()
    _, err = io.Copy(w, reader)
    return err
}

Best of both worlds: Standard ATProto interface + S3 optimization for bandwidth efficiency.

5. Image Export for Portability

Custom XRPC method enables users to export entire images:

// io.atcr.hold.exportImage - Export all blobs for an image
type ExportImageRequest struct {
    Manifest *oci.Manifest  // User provides manifest
}

type ExportImageResponse struct {
    ArchiveURL string       // Presigned S3 URL to tar.gz
    ExpiresAt  time.Time
}

// Implementation:
// 1. Extract all blob digests from manifest (config + layers)
// 2. Create tar.gz with all blobs
// 3. Upload to S3 temp location
// 4. Return presigned download URL (15min expiry)

Users can request all blobs for their images and migrate to different holds.
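Step 1 of the implementation sketch (collecting digests from a manifest) is a simple walk over config and layers, deduplicated since images can repeat a layer. Illustrative types stand in for the real OCI structs:

```go
package main

import "fmt"

// Minimal stand-ins for the OCI manifest types (illustrative only).
type descriptor struct {
	Digest string
	Size   int64
}

type manifest struct {
	Config descriptor
	Layers []descriptor
}

// exportDigests collects every blob digest referenced by a manifest
// (config first, then layers), skipping duplicates.
func exportDigests(m manifest) []string {
	seen := map[string]bool{}
	var out []string
	for _, d := range append([]descriptor{m.Config}, m.Layers...) {
		if d.Digest != "" && !seen[d.Digest] {
			seen[d.Digest] = true
			out = append(out, d.Digest)
		}
	}
	return out
}

func main() {
	m := manifest{
		Config: descriptor{Digest: "sha256:cfg"},
		Layers: []descriptor{{Digest: "sha256:l1"}, {Digest: "sha256:l1"}, {Digest: "sha256:l2"}},
	}
	fmt.Println(exportDigests(m)) // [sha256:cfg sha256:l1 sha256:l2]
}
```

Steps 2-4 (tar.gz assembly, S3 upload, presigned URL) then operate on this digest list.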

Changes Required

AppView Changes

Current:

type ProxyBlobStore struct {
    holdURL string  // HTTP endpoint
}

func (p *ProxyBlobStore) ServeBlob(...) {
    // POST /put-presigned-url
    // Return redirect
}

New:

type ATProtoBlobStore struct {
    holdDID string              // did:web:hold1.example.com
    holdURL string              // Resolved from DID document
    client  *atproto.Client     // XRPC client
    delegatedToken string       // From io.atcr.hold.delegateAccess
}

func (a *ATProtoBlobStore) ServeBlob(ctx context.Context, w http.ResponseWriter, r *http.Request, dgst digest.Digest) error {
    // Try optimized: io.atcr.hold.getDownloadUrl
    // Fallback: com.atproto.sync.getBlob
    return nil // stub - see the full ServeBlob sketch earlier
}

Hold Service Changes

Transform from simple HTTP server to minimal PDS:

// cmd/hold/main.go
func main() {
    // Storage driver (unchanged)
    storageDriver := buildStorageDriver()

    // NEW: Embedded PDS
    pds := hold.NewEmbeddedPDS(hold.Config{
        DID:       "did:web:hold1.example.com",
        BlobStore: storageDriver,
        Collections: []string{
            "io.atcr.hold.crew",
            "io.atcr.hold.config",
        },
    })

    // Serve XRPC endpoints
    mux := http.NewServeMux()
    mux.Handle("/xrpc/", pds.Handler())

    // Legacy endpoints (optional for backwards compat)
    // mux.Handle("/get-presigned-url", legacyHandler)

    log.Fatal(http.ListenAndServe(":8080", mux))
}

Open Questions

1. Docker Hub Size Limits

Research findings: Docker Hub has soft limits around 10-20GB per layer, with practical issues beyond that. No hard-coded enforcement.

For ATCR: Hold services can support larger blobs in principle, as long as S3 and the network infrastructure allow it. We may want configurable limits to prevent abuse.

2. Token Delegation Security Model

Recommended approach: DPoP proof delegation (standard ATProto federation pattern)

Open questions:

  • How long should delegated tokens last? (15min like presigned URLs?)
  • Should delegation be per-operation or session-based?
  • Do we need audit logs for delegated operations?
  • Can AppView cache delegated tokens across requests?
  • Should we implement token refresh for long-running operations?

3. Migration Path

  • Do we support both HTTP and XRPC APIs during transition?
  • How do existing manifests with holdEndpoint: "https://..." migrate to holdDid: "did:web:..."?
  • Can AppView auto-detect if hold supports XRPC vs legacy?

4. PDS Implementation Scope

Minimal endpoints needed:

  • com.atproto.sync.uploadBlob
  • com.atproto.sync.getBlob
  • com.atproto.repo.describeRepo (discovery)
  • Custom XRPC methods (delegation, presigned URLs, export)

Not needed:

  • com.atproto.repo.* (no user repos)
  • com.atproto.server.* (no user sessions)
  • Most sync/admin endpoints

Can we build a reusable "static PDS" library for apps like ATCR, tangled.org, stream.place?

5. Crew Management

  • How are crew members added/removed?
  • UI in AppView? CLI tool? Direct XRPC calls?
  • Can crew members delegate to other crew members?
  • Role hierarchy (owner > admin > member)?
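One possible answer to the hierarchy question is a ranked-role model where each role implies everything below it; a minimal sketch (names and ranks are assumptions, not decided):

```go
package main

import "fmt"

// One possible role hierarchy: a higher rank implies every lower rank's
// permissions. Unknown roles get rank 0 and never satisfy anything.
var roleRank = map[string]int{"member": 1, "admin": 2, "owner": 3}

// atLeast reports whether role `have` satisfies required role `want`.
func atLeast(have, want string) bool {
	return roleRank[have] >= roleRank[want] && roleRank[have] > 0
}

func main() {
	fmt.Println(atLeast("owner", "admin"))  // true
	fmt.Println(atLeast("member", "admin")) // false
}
```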

6. Hold Discovery & Registration

Current: Hold registers by creating records in owner's PDS
New: Hold is its own identity - how does AppView discover available holds?

Possibilities:

  • Holds publish to feeds
  • AppView maintains directory
  • DIDs are manually configured
  • ATProto directory service

7. Multi-Tenancy

Could one hold PDS serve multiple "logical holds" for different organizations?

did:web:hold-provider.com/org1
did:web:hold-provider.com/org2

Or should each hold be a separate deployment?

8. Blob Deduplication

Current behavior: Global deduplication (same layer shared across all images).

With embedded PDS:

  • Does dedup stay global across all crew/users?
  • Or is it per-hold (isolated storage)?
  • How do we track blob references for garbage collection?

9. Cost Model

  • Who pays for S3 storage/egress?
  • Hold operator? Image owner? Per-pull?
  • How to implement metering/billing via XRPC?

10. Disaster Recovery

  • How to backup hold's PDS (crew records, config)?
  • Can holds replicate to other holds?
  • Image export handles blobs - what about metadata?

Implementation Plan

Phase 1: Basic PDS with Carstore (COMPLETED)

Implementation: Using indigo's carstore with SQLite + DeltaSession

import (
    "github.com/bluesky-social/indigo/carstore"
    "github.com/bluesky-social/indigo/models"
    "github.com/bluesky-social/indigo/repo"
)

type HoldPDS struct {
    did      string
    carstore carstore.CarStore
    session  *carstore.DeltaSession  // Provides blockstore interface
    repo     *repo.Repo
    dbPath   string
    uid      models.Uid              // User ID for carstore (fixed: 1)
}

func NewHoldPDS(ctx context.Context, did, dbPath string) (*HoldPDS, error) {
    // Create SQLite-backed carstore
    sqlStore, err := carstore.NewSqliteStore(dbPath)
    if err != nil {
        return nil, err
    }
    if err := sqlStore.Open(dbPath); err != nil {
        return nil, err
    }
    cs := sqlStore.CarStore()

    // For single-hold use, a fixed UID
    uid := models.Uid(1)

    // Create DeltaSession (provides the blockstore interface)
    session, err := cs.NewDeltaSession(ctx, uid, nil)
    if err != nil {
        return nil, err
    }

    // Create repo with the session as blockstore
    r := repo.NewRepo(ctx, did, session)

    return &HoldPDS{
        did:      did,
        carstore: cs,
        session:  session,
        repo:     r,
        dbPath:   dbPath,
        uid:      uid,
    }, nil
}

Key learnings:

  • Carstore provides blockstore via DeltaSession (not direct access)
  • models.Uid is the user ID type (we use fixed UID(1))
  • DeltaSession needs to be a pointer (*carstore.DeltaSession)
  • repo.NewRepo() accepts the session directly as blockstore

Storage:

  • Single file: /var/lib/atcr-hold/hold.db (SQLite)
  • Contains MST nodes, records, commits in carstore tables
  • Proper indigo repo/MST implementation (production-tested)

Why SQLite carstore:

  • Single file persistence (like appview's SQLite)
  • Official indigo storage backend
  • Handles compaction/cleanup automatically
  • Migration path to Postgres/Scylla if needed
  • Easy to replicate (Litestream, LiteFS, rsync)
  • CAR import/export support built-in

Scale considerations:

  • SQLite carstore marked "experimental" but suitable for single-hold use
  • MST designed for massive scale (O(log n) operations)
  • 1000 crew records = ~1-2MB database (trivial)
  • Bluesky PDSs use carstore for millions of records
  • If needed: migrate to Postgres-backed carstore (same API)

Hold as Proper ATProto User

Decision: Make holds full ATProto actors for discoverability and ecosystem integration.

What this enables:

  • Hold becomes discoverable via ATProto directory
  • Can have profile (app.bsky.actor.profile)
  • Can post status updates (app.bsky.feed.post)
  • Users can follow holds
  • Social proof/reputation via ATProto social graph

MVP Scope: We're building the minimal PDS needed for discoverability, not a full social client:

  • Signing keys (ES256K via atproto/atcrypto)
  • DID document (did:web at /.well-known/did.json)
  • Standard XRPC endpoints (describeRepo, getRecord, listRecords)
  • Profile record (app.bsky.actor.profile)
  • ⏸️ Posting functionality (later - other services can read our records)

Key insight: Other ATProto services will "just work" as long as they can retrieve records from the hold's PDS. We don't need to implement full social features for the hold to participate in the ecosystem.
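For discoverability, the hold's did:web document at /.well-known/did.json might look roughly like this (a hedged sketch: the Multikey verification method carries the ES256K signing key mentioned above, the service entry follows ATProto's #atproto_pds convention, and the key value is a placeholder):

```json
{
  "@context": [
    "https://www.w3.org/ns/did/v1",
    "https://w3id.org/security/multikey/v1"
  ],
  "id": "did:web:hold1.example.com",
  "verificationMethod": [{
    "id": "did:web:hold1.example.com#atproto",
    "type": "Multikey",
    "controller": "did:web:hold1.example.com",
    "publicKeyMultibase": "zQ3sh..."
  }],
  "service": [{
    "id": "#atproto_pds",
    "type": "AtprotoPersonalDataServer",
    "serviceEndpoint": "https://hold1.example.com"
  }]
}
```

Any ATProto client that resolves this document can find the hold's XRPC endpoint and signing key without ATCR-specific knowledge.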

Crew Management: Individual Records

Decision: Individual crew record per user (remove wildcard logic)

// io.atcr.hold.crew/{rkey}
{
  "$type": "io.atcr.hold.crew",
  "member": "did:plc:alice123",
  "role": "admin",  // or "member"
  "permissions": ["blob:read", "blob:write"],
  "addedAt": "2025-10-14T..."
}

// io.atcr.hold.config/policy
{
  "$type": "io.atcr.hold.config",
  "access": "public",     // or "allowlist"
  "allowAny": true,       // public: allow any authenticated user
  "requireAuth": true,    // require authentication (no anonymous)
  "maxUsers": 1000        // optional limit
}

Authorization logic:

func (p *HoldPDS) CheckAccess(ctx context.Context, userDID string) (bool, error) {
    policy := p.GetPolicy(ctx)

    if policy.Access == "public" && policy.AllowAny {
        // Public hold - any authenticated ATCR user allowed
        // No individual crew record needed
        return true, nil
    }

    if policy.Access == "allowlist" {
        // Check explicit crew membership
        _, err := p.GetCrewMember(ctx, userDID)
        return err == nil, nil
    }

    return false, nil
}

Benefits of individual records:

  • Auditability (track who has access)
  • Per-user permissions (admin vs member)
  • Explicit revocation capabilities
  • Analytics (usage tracking)
  • Rate limiting (per-user quotas)
  • subscribeRepos events on crew changes

Use cases:

  • Public community hold: access: "public", allowAny: true - no crew records needed
  • Private team hold: access: "allowlist" - explicit crew membership
  • Hybrid: Public access + explicit admin crew records for elevated permissions

Next Steps

  1. Add indigo dependencies - carstore, repo, MST
  2. Implement HoldPDS with carstore - Create pkg/hold/pds
  3. Add crew management - CRUD operations for crew records
  4. Implement standard PDS endpoints - describeServer, describeRepo, getRecord, listRecords
  5. Add DID document - did:web identity generation
  6. Custom XRPC methods - getUploadUrl, getDownloadUrl (presigned URLs)
  7. Wire up in cmd/hold - Serve XRPC alongside existing HTTP
  8. Test basic operations - Add/list crew, policy checks
  9. Design delegation/IAM - Token exchange for authenticated operations
  10. Implement AppView XRPC client - Support PDS-based holds

References