at-container-registry/docs/VALKEY_MIGRATION.md

Analysis: AppView SQL Database Usage

Overview

The AppView uses SQLite with 19 tables. The key finding: most data is a cache of ATProto records that could theoretically be rebuilt from users' PDS instances.

Data Categories

1. MUST PERSIST (Local State Only)

These tables contain data that cannot be reconstructed from external sources:

| Table | Purpose | Why It Must Persist |
|---|---|---|
| oauth_sessions | OAuth tokens | Refresh tokens are stateful; losing them forces users to re-auth |
| ui_sessions | Web browser sessions | Session continuity for logged-in users |
| devices | Approved devices + bcrypt secrets | User authorization decisions; secrets are one-way hashed |
| pending_device_auth | In-flight auth flows | Short-lived (10 min) but critical during auth |
| oauth_auth_requests | OAuth flow state | Short-lived but required for auth completion |
| repository_stats | Pull/push counts | Locally tracked metrics, not stored in ATProto |

2. CACHED FROM PDS (Rebuildable)

These tables are essentially a read-through cache of ATProto data:

| Table | Source | ATProto Collection |
|---|---|---|
| users | User's PDS | app.bsky.actor.profile + DID document |
| manifests | User's PDS | io.atcr.manifest records |
| tags | User's PDS | io.atcr.tag records |
| layers | Derived from manifests | Parsed from manifest content |
| manifest_references | Derived from manifest lists | Parsed from multi-arch manifests |
| repository_annotations | Manifest config blob | OCI annotations from config |
| repo_pages | User's PDS | io.atcr.repo.page records |
| stars | User's PDS | io.atcr.sailor.star records (synced via Jetstream) |
| hold_captain_records | Hold's embedded PDS | io.atcr.hold.captain records |
| hold_crew_approvals | Hold's embedded PDS | io.atcr.hold.crew records |
| hold_crew_denials | Local authorization cache | Could be re-checked on demand |

3. OPERATIONAL

| Table | Purpose |
|---|---|
| schema_migrations | Migration tracking |
| firehose_cursor | Jetstream position (can restart from 0) |

Key Insights

What's Actually Unique to AppView?

  1. Authentication state - OAuth sessions, devices, UI sessions
  2. Engagement metrics - Pull/push counts (locally tracked, not in ATProto)

What Could Be Eliminated?

If ATCR fully embraced the ATProto model:

  1. users - Query PDS on demand (with caching)
  2. manifests, tags, layers - Query PDS on demand (with caching)
  3. repository_annotations - Fetch manifest config on demand
  4. repo_pages - Query PDS on demand
  5. hold_* tables - Query hold's PDS on demand

Trade-offs

Current approach (heavy caching):

  • Pro: Fast queries for the UI (search, browse, stats)
  • Pro: Offline resilience (a PDS outage doesn't break the UI)
  • Con: Complex sync logic (Jetstream consumer, backfill)
  • Con: State can diverge from the source of truth

Lighter approach (query on demand):

  • Pro: Always-fresh data
  • Pro: Simpler codebase (no sync logic)
  • Con: Slower queries (network round-trips)
  • Con: Depends on PDS availability

Current Limitation: No Cache-Miss Queries

Finding: There's no "query PDS on cache miss" logic. Users/manifests only enter the DB via:

  1. OAuth login (user authenticates)
  2. Jetstream events (firehose activity)

Problem: If someone visits atcr.io/alice/myapp before alice is indexed → 404

Where this happens:

  • pkg/appview/handlers/repository.go:50-53: If db.GetUserByDID() returns nil → 404
  • No fallback to atproto.Client.ListRecords() or similar

This matters for Valkey migration: If cache is ephemeral and restarts clear it, you need cache-miss logic to repopulate on demand. Otherwise:

  • Restart Valkey → all users/manifests gone
  • Wait for Jetstream to re-index OR implement cache-miss queries

Cache-miss implementation design:

Existing code to reuse: pkg/appview/jetstream/processor.go:43-97 (EnsureUser)

```go
// New: pkg/appview/cache/loader.go

type Loader struct {
    cache  Cache  // Valkey interface
    client *atproto.Client
}

// GetUser with cache-miss fallback
func (l *Loader) GetUser(ctx context.Context, did string) (*User, error) {
    // 1. Try cache
    if user := l.cache.GetUser(did); user != nil {
        return user, nil
    }

    // 2. Cache miss - resolve identity (already queries the network)
    _, handle, pdsEndpoint, err := atproto.ResolveIdentity(ctx, did)
    if err != nil {
        return nil, err  // User doesn't exist in the network
    }

    // 3. Fetch profile for avatar
    client := atproto.NewClient(pdsEndpoint, "", "")
    profile, _ := client.GetProfileRecord(ctx, did)
    avatarURL := ""
    if profile != nil && profile.Avatar != nil {
        avatarURL = atproto.BlobCDNURL(did, profile.Avatar.Ref.Link)
    }

    // 4. Cache and return
    user := &User{DID: did, Handle: handle, PDSEndpoint: pdsEndpoint, Avatar: avatarURL}
    l.cache.SetUser(user, 1*time.Hour)
    return user, nil
}

// GetManifestsForRepo with cache-miss fallback
func (l *Loader) GetManifestsForRepo(ctx context.Context, did, repo string) ([]Manifest, error) {
    cacheKey := fmt.Sprintf("manifests:%s:%s", did, repo)

    // 1. Try cache
    if cached := l.cache.Get(cacheKey); cached != nil {
        return cached.([]Manifest), nil
    }

    // 2. Cache miss - get the user's PDS endpoint
    user, err := l.GetUser(ctx, did)
    if err != nil {
        return nil, err
    }

    // 3. Query PDS for manifests
    client := atproto.NewClient(user.PDSEndpoint, "", "")
    records, _, err := client.ListRecordsForRepo(ctx, did, atproto.ManifestCollection, 100, "")
    if err != nil {
        return nil, err
    }

    // 4. Filter by repository and parse
    var manifests []Manifest
    for _, rec := range records {
        var m atproto.ManifestRecord
        if err := json.Unmarshal(rec.Value, &m); err != nil {
            continue
        }
        if m.Repository == repo {
            manifests = append(manifests, convertManifest(m))
        }
    }

    // 5. Cache and return
    l.cache.Set(cacheKey, manifests, 10*time.Minute)
    return manifests, nil
}
```

Handler changes:

```go
// Before (repository.go:45-53):
owner, err := db.GetUserByDID(h.DB, did)
if owner == nil {
    RenderNotFound(w, r, h.Templates, h.RegistryURL)
    return
}

// After:
owner, err := h.Loader.GetUser(r.Context(), did)
if err != nil {
    RenderNotFound(w, r, h.Templates, h.RegistryURL)
    return
}
```

Performance considerations:

  • Cache hit: ~1ms (Valkey lookup)
  • Cache miss: ~200-500ms (PDS round-trip)
  • First request after restart: slower but correct
  • Jetstream still useful for proactive warming

Proposed Architecture: Valkey + ATProto

Goal

Replace SQLite with Valkey (a Redis-compatible store) for ephemeral state, and push the remaining persistent data to ATProto records.

What goes to Valkey (ephemeral, TTL-based)

| Current Table | Valkey Key Pattern | TTL | Notes |
|---|---|---|---|
| oauth_sessions | oauth:{did}:{session_id} | 90 days | Lost on restart = re-auth |
| ui_sessions | ui:{session_id} | Session duration | Lost on restart = re-login |
| oauth_auth_requests | authreq:{state} | 10 min | In-flight flows |
| pending_device_auth | pending:{device_code} | 10 min | In-flight flows |
| firehose_cursor | cursor:jetstream | None | Can restart from 0 |
| All PDS cache tables | cache:{collection}:{did}:{rkey} | 10-60 min | Query PDS on miss |

Benefits:

  • Multi-instance ready (shared Valkey)
  • No schema migrations
  • Natural TTL expiry
  • Simpler code (no SQL)

What could become ATProto records

| Current Table | Proposed Collection | Where Stored | Open Questions |
|---|---|---|---|
| devices | io.atcr.sailor.device | User's PDS | Privacy: are IP and user-agent too sensitive? |
| repository_stats | io.atcr.repo.stats | Hold's PDS or user's PDS | Who owns the stats? |

Devices → Valkey:

  • Move current device table to Valkey
  • Key: device:{did}:{device_id} → hash of {name, secret_hash, ip, user_agent, created_at, last_used}
  • TTL: Long (1 year?) or no expiry
  • Device list: devices:{did} → Set of device IDs
  • Secret validation works the same, just different backend
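The layout above can be sketched with an in-memory map standing in for Valkey. Note this is an assumption-laden sketch: the real store would hash secrets with bcrypt (as the current device table does); sha256 is used here only to keep the example stdlib-only, and all names are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// deviceStore models the proposed Valkey layout: one hash per device
// plus a per-user set of device IDs. A real implementation would use
// a Valkey client; maps stand in here so the sketch is runnable.
type deviceStore struct {
	hashes  map[string]map[string]string // device:{did}:{device_id} -> fields
	devices map[string][]string          // devices:{did} -> device IDs
}

func deviceKey(did, deviceID string) string {
	return fmt.Sprintf("device:%s:%s", did, deviceID)
}

func (s *deviceStore) Add(did, deviceID, name, secret string) {
	sum := sha256.Sum256([]byte(secret)) // real code: bcrypt
	s.hashes[deviceKey(did, deviceID)] = map[string]string{
		"name":        name,
		"secret_hash": hex.EncodeToString(sum[:]),
	}
	s.devices[did] = append(s.devices[did], deviceID)
}

func (s *deviceStore) Validate(did, deviceID, secret string) bool {
	fields, ok := s.hashes[deviceKey(did, deviceID)]
	if !ok {
		return false
	}
	sum := sha256.Sum256([]byte(secret))
	return subtle.ConstantTimeCompare(
		[]byte(fields["secret_hash"]),
		[]byte(hex.EncodeToString(sum[:]))) == 1
}

func main() {
	s := &deviceStore{hashes: map[string]map[string]string{}, devices: map[string][]string{}}
	s.Add("did:plc:alice123", "dev1", "laptop", "s3cret")
	fmt.Println(s.Validate("did:plc:alice123", "dev1", "s3cret")) // true
	fmt.Println(s.Validate("did:plc:alice123", "dev1", "wrong"))  // false
}
```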

Service auth exploration (future): The challenge with pure ATProto service auth is that the AppView still needs the user's OAuth session to write manifests to their PDS. The current flow:

  1. User authenticates via OAuth → AppView gets OAuth tokens
  2. AppView issues registry JWT to credential helper
  3. Credential helper presents JWT on each push/pull
  4. AppView uses OAuth session to write to user's PDS

Service auth could work for the hold side (AppView → Hold), but not for the user's OAuth session.

Repository stats → Hold's PDS:

Challenge discovered: The hold's getBlob endpoint only receives did + cid, not the repository name.

Current flow (proxy_blob_store.go:358-362):

```go
xrpcURL := fmt.Sprintf("%s%s?did=%s&cid=%s&method=%s",
    p.holdURL, atproto.SyncGetBlob, p.ctx.DID, dgst.String(), operation)
```

Implementation options:

Option A: Add repository parameter to getBlob (recommended)

```go
// Modified AppView call:
xrpcURL := fmt.Sprintf("%s%s?did=%s&cid=%s&method=%s&repo=%s",
    p.holdURL, atproto.SyncGetBlob, p.ctx.DID, dgst.String(), operation, p.ctx.Repository)

// Modified hold handler (xrpc.go:969):
func (h *XRPCHandler) HandleGetBlob(w http.ResponseWriter, r *http.Request) {
    did := r.URL.Query().Get("did")
    cidOrDigest := r.URL.Query().Get("cid")
    repo := r.URL.Query().Get("repo")  // NEW

    // ... existing blob handling ...

    // Increment stats if repo provided. Use a detached context:
    // r.Context() is canceled when the handler returns, which would
    // abort the async write.
    if repo != "" {
        go h.pds.IncrementPullCount(context.Background(), did, repo)  // Async, non-blocking
    }
}
```

Stats record structure:

Collection: io.atcr.hold.stats
Rkey: base64(did:repository)  // Deterministic, unique

```json
{
  "$type": "io.atcr.hold.stats",
  "did": "did:plc:alice123",
  "repository": "myapp",
  "pullCount": 1542,
  "pushCount": 47,
  "lastPull": "2025-01-15T...",
  "lastPush": "2025-01-10T...",
  "createdAt": "2025-01-01T..."
}
```

Hold-side implementation:

```go
// New file: pkg/hold/pds/stats.go

func (p *HoldPDS) IncrementPullCount(ctx context.Context, did, repo string) error {
    rkey := statsRecordKey(did, repo)

    // Get or create stats record
    stats, err := p.GetStatsRecord(ctx, rkey)
    if err != nil || stats == nil {
        stats = &atproto.StatsRecord{
            Type:       atproto.StatsCollection,
            DID:        did,
            Repository: repo,
            PullCount:  0,
            PushCount:  0,
            CreatedAt:  time.Now(),
        }
    }

    // Increment and update. Note: this read-modify-write is not atomic;
    // concurrent blob fetches could lose increments unless serialized
    // (e.g. a per-rkey mutex).
    stats.PullCount++
    stats.LastPull = time.Now()

    _, err = p.repomgr.UpdateRecord(ctx, p.uid, atproto.StatsCollection, rkey, stats)
    return err
}
```

Query endpoint (new XRPC):

```
GET /xrpc/io.atcr.hold.getStats?did={userDID}&repo={repository}
→ Returns JSON: { pullCount, pushCount, lastPull, lastPush }

GET /xrpc/io.atcr.hold.listStats?did={userDID}
→ Returns all stats for a user across all repos on this hold
```

AppView aggregation:

```go
func (l *Loader) GetAggregatedStats(ctx context.Context, did, repo string) (*Stats, error) {
    // 1. Get all holds that have served this repo
    holdDIDs, _ := l.cache.GetHoldDIDsForRepo(did, repo)

    // 2. Query each hold for stats
    var total Stats
    for _, holdDID := range holdDIDs {
        holdURL := resolveHoldDID(holdDID)
        stats, _ := queryHoldStats(ctx, holdURL, did, repo)
        total.PullCount += stats.PullCount
        total.PushCount += stats.PushCount
    }

    return &total, nil
}
```

Files to modify:

  • pkg/atproto/lexicon.go - Add StatsCollection + StatsRecord
  • pkg/hold/pds/stats.go - New file for stats operations
  • pkg/hold/pds/xrpc.go - Add repo param to getBlob, add stats endpoints
  • pkg/appview/storage/proxy_blob_store.go - Pass repository to getBlob
  • pkg/appview/cache/loader.go - Aggregation logic

Migration Path

Phase 1: Add Valkey infrastructure

  • Add Valkey client to AppView
  • Create store interfaces that abstract SQLite vs Valkey
  • Dual-write OAuth sessions to both

Phase 2: Migrate sessions to Valkey

  • OAuth sessions, UI sessions, auth requests, pending device auth
  • Remove SQLite session tables
  • Test: restart AppView, users get logged out (acceptable)

Phase 3: Migrate devices to Valkey

  • Move device store to Valkey
  • Same data structure, different backend
  • Consider device expiry policy

Phase 4: Implement hold-side stats

  • Add io.atcr.hold.stats collection to hold's embedded PDS
  • Hold increments stats on blob access
  • Add XRPC endpoint: io.atcr.hold.getStats

Phase 5: AppView stats aggregation

  • Track holdDids per repo in Valkey cache
  • Query holds for stats, aggregate
  • Cache aggregated stats with TTL

Phase 6: Remove SQLite (optional)

  • Keep SQLite as optional cache layer for UI queries
  • Or: Query PDS on demand with Valkey caching
  • Jetstream still useful for real-time updates

Summary Table

| Category | Tables | % of Schema | Truly Persistent? |
|---|---|---|---|
| Auth & sessions + metrics | 6 | 32% | Yes |
| PDS cache | 11 | 58% | No (rebuildable) |
| Operational | 2 | 10% | No |

~58% of the database is cached ATProto data that could be rebuilt from PDSes.