# Analysis: AppView SQL Database Usage

## Overview

The AppView uses SQLite with 19 tables. The key finding: **most data is a cache of ATProto records** that could theoretically be rebuilt from users' PDS instances.

## Data Categories

### 1. MUST PERSIST (Local State Only)

These tables contain data that **cannot be reconstructed** from external sources:

| Table | Purpose | Why It Must Persist |
|-------|---------|---------------------|
| `oauth_sessions` | OAuth tokens | Refresh tokens are stateful; losing them = users must re-auth |
| `ui_sessions` | Web browser sessions | Session continuity for logged-in users |
| `devices` | Approved devices + bcrypt secrets | User authorization decisions; secrets are one-way hashed |
| `pending_device_auth` | In-flight auth flows | Short-lived (10 min) but critical during auth |
| `oauth_auth_requests` | OAuth flow state | Short-lived but required for auth completion |
| `repository_stats` | Pull/push counts | **Locally tracked metrics** - not stored in ATProto |
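The `devices` row is a good illustration of why this category cannot be rebuilt: only a one-way hash of each device secret is stored, so the plaintext exists nowhere but on the device itself. A minimal sketch of that check follows; `HashDeviceSecret` and `VerifyDeviceSecret` are hypothetical names, and SHA-256 with a constant-time compare stands in for the bcrypt the real table uses, purely to keep the sketch dependency-free.

```go
// Sketch: why `devices` rows are unrecoverable. Only a one-way hash of
// the device secret is persisted; losing the row means re-approving the
// device, since no external source holds the plaintext.
//
// NOTE: the real code uses bcrypt (salted, slow); SHA-256 is used here
// only so the example needs nothing outside the standard library.
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

// HashDeviceSecret (hypothetical helper) returns the one-way hash that
// would be stored in the `devices` table.
func HashDeviceSecret(secret string) []byte {
	h := sha256.Sum256([]byte(secret))
	return h[:]
}

// VerifyDeviceSecret checks a presented secret against the stored hash
// in constant time.
func VerifyDeviceSecret(stored []byte, presented string) bool {
	h := sha256.Sum256([]byte(presented))
	return subtle.ConstantTimeCompare(stored, h[:]) == 1
}

func main() {
	stored := HashDeviceSecret("s3cret")
	fmt.Println(VerifyDeviceSecret(stored, "s3cret")) // true
	fmt.Println(VerifyDeviceSecret(stored, "wrong"))  // false
}
```

Because the comparison is one-way, migrating this table (to Valkey or anywhere else) means moving the hashes as-is; they can never be re-derived.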
### 2. CACHED FROM PDS (Rebuildable)

These tables are essentially a **read-through cache** of ATProto data:

| Table | Source | ATProto Collection |
|-------|--------|-------------------|
| `users` | User's PDS profile | `app.bsky.actor.profile` + DID document |
| `manifests` | User's PDS | `io.atcr.manifest` records |
| `tags` | User's PDS | `io.atcr.tag` records |
| `layers` | Derived from manifests | Parsed from manifest content |
| `manifest_references` | Derived from manifest lists | Parsed from multi-arch manifests |
| `repository_annotations` | Manifest config blob | OCI annotations from config |
| `repo_pages` | User's PDS | `io.atcr.repo.page` records |
| `stars` | User's PDS | `io.atcr.sailor.star` records (synced via Jetstream) |
| `hold_captain_records` | Hold's embedded PDS | `io.atcr.hold.captain` records |
| `hold_crew_approvals` | Hold's embedded PDS | `io.atcr.hold.crew` records |
| `hold_crew_denials` | Local authorization cache | Could re-check on demand |

### 3. OPERATIONAL

| Table | Purpose |
|-------|---------|
| `schema_migrations` | Migration tracking |
| `firehose_cursor` | Jetstream position (can restart from 0) |

## Key Insights

### What's Actually Unique to AppView?

1. **Authentication state** - OAuth sessions, devices, UI sessions
2. **Engagement metrics** - Pull/push counts (locally tracked, not in ATProto)

### What Could Be Eliminated?

If ATCR fully embraced the ATProto model:

1. **`users`** - Query PDS on demand (with caching)
2. **`manifests`, `tags`, `layers`** - Query PDS on demand (with caching)
3. **`repository_annotations`** - Fetch manifest config on demand
4. **`repo_pages`** - Query PDS on demand
5. **`hold_*` tables** - Query hold's PDS on demand

### Trade-offs

**Current approach (heavy caching):**

- Fast queries for UI (search, browse, stats)
- Offline resilience (a PDS being down doesn't break the UI)
- Complex sync logic (Jetstream consumer, backfill)
- State can diverge from the source of truth

**Lighter approach (query on demand):**

- Always fresh data
- Simpler codebase (no sync)
- Slower queries (network round-trips)
- Depends on PDS availability

## Current Limitation: No Cache-Miss Queries

**Finding:** There's no "query PDS on cache miss" logic. Users/manifests only enter the DB via:

1. OAuth login (user authenticates)
2. Jetstream events (firehose activity)

**Problem:** If someone visits `atcr.io/alice/myapp` before alice is indexed → 404

**Where this happens:**

- `pkg/appview/handlers/repository.go:50-53`: If `db.GetUserByDID()` returns nil → 404
- No fallback to `atproto.Client.ListRecords()` or similar

**This matters for the Valkey migration:** If the cache is ephemeral and restarts clear it, you need cache-miss logic to repopulate on demand. Otherwise:

- Restart Valkey → all users/manifests gone
- Wait for Jetstream to re-index OR implement cache-miss queries

**Cache-miss implementation design:**

Existing code to reuse: `pkg/appview/jetstream/processor.go:43-97` (`EnsureUser`)

```go
// New: pkg/appview/cache/loader.go

type Loader struct {
    cache  Cache // Valkey interface
    client *atproto.Client
}

// GetUser with cache-miss fallback
func (l *Loader) GetUser(ctx context.Context, did string) (*User, error) {
    // 1. Try cache
    if user := l.cache.GetUser(did); user != nil {
        return user, nil
    }

    // 2. Cache miss - resolve identity (already queries the network)
    _, handle, pdsEndpoint, err := atproto.ResolveIdentity(ctx, did)
    if err != nil {
        return nil, err // User doesn't exist in the network
    }
    // 3. Fetch profile for avatar
    client := atproto.NewClient(pdsEndpoint, "", "")
    profile, _ := client.GetProfileRecord(ctx, did)
    avatarURL := ""
    if profile != nil && profile.Avatar != nil {
        avatarURL = atproto.BlobCDNURL(did, profile.Avatar.Ref.Link)
    }

    // 4. Cache and return
    user := &User{DID: did, Handle: handle, PDSEndpoint: pdsEndpoint, Avatar: avatarURL}
    l.cache.SetUser(user, 1*time.Hour)
    return user, nil
}

// GetManifestsForRepo with cache-miss fallback
func (l *Loader) GetManifestsForRepo(ctx context.Context, did, repo string) ([]Manifest, error) {
    cacheKey := fmt.Sprintf("manifests:%s:%s", did, repo)

    // 1. Try cache
    if cached := l.cache.Get(cacheKey); cached != nil {
        return cached.([]Manifest), nil
    }

    // 2. Cache miss - get user's PDS endpoint
    user, err := l.GetUser(ctx, did)
    if err != nil {
        return nil, err
    }

    // 3. Query PDS for manifests
    client := atproto.NewClient(user.PDSEndpoint, "", "")
    records, _, err := client.ListRecordsForRepo(ctx, did, atproto.ManifestCollection, 100, "")
    if err != nil {
        return nil, err
    }

    // 4. Filter by repository and parse
    var manifests []Manifest
    for _, rec := range records {
        var m atproto.ManifestRecord
        if err := json.Unmarshal(rec.Value, &m); err != nil {
            continue
        }
        if m.Repository == repo {
            manifests = append(manifests, convertManifest(m))
        }
    }
    // 5. Cache and return
    l.cache.Set(cacheKey, manifests, 10*time.Minute)
    return manifests, nil
}
```

**Handler changes:**

```go
// Before (repository.go:45-53):
owner, err := db.GetUserByDID(h.DB, did)
if owner == nil {
    RenderNotFound(w, r, h.Templates, h.RegistryURL)
    return
}

// After:
owner, err := h.Loader.GetUser(r.Context(), did)
if err != nil {
    RenderNotFound(w, r, h.Templates, h.RegistryURL)
    return
}
```

**Performance considerations:**

- Cache hit: ~1ms (Valkey lookup)
- Cache miss: ~200-500ms (PDS round-trip)
- First request after restart: slower but correct
- Jetstream still useful for proactive warming

---

## Proposed Architecture: Valkey + ATProto

### Goal

Replace SQLite with Valkey (Redis-compatible) for ephemeral state, and push the remaining persistent data to ATProto.

### What goes to Valkey (ephemeral, TTL-based)

| Current Table | Valkey Key Pattern | TTL | Notes |
|---------------|-------------------|-----|-------|
| `oauth_sessions` | `oauth:{did}:{session_id}` | 90 days | Lost on restart = re-auth |
| `ui_sessions` | `ui:{session_id}` | Session duration | Lost on restart = re-login |
| `oauth_auth_requests` | `authreq:{state}` | 10 min | In-flight flows |
| `pending_device_auth` | `pending:{device_code}` | 10 min | In-flight flows |
| `firehose_cursor` | `cursor:jetstream` | None | Can restart from 0 |
| All PDS cache tables | `cache:{collection}:{did}:{rkey}` | 10-60 min | Query PDS on miss |

**Benefits:**

- Multi-instance ready (shared Valkey)
- No schema migrations
- Natural TTL expiry
- Simpler code (no SQL)

### What could become ATProto records

| Current Table | Proposed Collection | Where Stored | Open Questions |
|---------------|---------------------|--------------|----------------|
| `devices` | `io.atcr.sailor.device` | User's PDS | Privacy: IP, user-agent sensitive? |
| `repository_stats` | `io.atcr.repo.stats` | Hold's PDS or User's PDS | Who owns the stats? |
**Devices → Valkey:**

- Move the current device table to Valkey
- Key: `device:{did}:{device_id}` → `{name, secret_hash, ip, user_agent, created_at, last_used}`
- TTL: long (1 year?) or no expiry
- Device list: `devices:{did}` → set of device IDs
- Secret validation works the same, just against a different backend

**Service auth exploration (future):**

The challenge with pure ATProto service auth is that the AppView still needs the user's OAuth session to write manifests to their PDS. The current flow:

1. User authenticates via OAuth → AppView gets OAuth tokens
2. AppView issues a registry JWT to the credential helper
3. Credential helper presents the JWT on each push/pull
4. AppView uses the OAuth session to write to the user's PDS

Service auth could work for the hold side (AppView → Hold), but not for the user's OAuth session.

**Repository stats → Hold's PDS:**

**Challenge discovered:** The hold's `getBlob` endpoint only receives `did` + `cid`, not the repository name. Current flow (`proxy_blob_store.go:358-362`):

```go
xrpcURL := fmt.Sprintf("%s%s?did=%s&cid=%s&method=%s",
    p.holdURL, atproto.SyncGetBlob, p.ctx.DID, dgst.String(), operation)
```

**Implementation options:**

**Option A: Add repository parameter to getBlob (recommended)**

```go
// Modified AppView call:
xrpcURL := fmt.Sprintf("%s%s?did=%s&cid=%s&method=%s&repo=%s",
    p.holdURL, atproto.SyncGetBlob, p.ctx.DID, dgst.String(), operation, p.ctx.Repository)
```

```go
// Modified hold handler (xrpc.go:969):
func (h *XRPCHandler) HandleGetBlob(w http.ResponseWriter, r *http.Request) {
    did := r.URL.Query().Get("did")
    cidOrDigest := r.URL.Query().Get("cid")
    repo := r.URL.Query().Get("repo") // NEW

    // ... existing blob handling ...
    // Increment stats if repo provided
    if repo != "" {
        // Async, non-blocking; background context since the request
        // context may be cancelled before the write completes
        go h.pds.IncrementPullCount(context.Background(), did, repo)
    }
}
```

**Stats record structure:**

```
Collection: io.atcr.hold.stats
Rkey: base64(did:repository)  // Deterministic, unique

{
  "$type": "io.atcr.hold.stats",
  "did": "did:plc:alice123",
  "repository": "myapp",
  "pullCount": 1542,
  "pushCount": 47,
  "lastPull": "2025-01-15T...",
  "lastPush": "2025-01-10T...",
  "createdAt": "2025-01-01T..."
}
```

**Hold-side implementation:**

```go
// New file: pkg/hold/pds/stats.go
func (p *HoldPDS) IncrementPullCount(ctx context.Context, did, repo string) error {
    rkey := statsRecordKey(did, repo)

    // Get or create the stats record
    stats, err := p.GetStatsRecord(ctx, rkey)
    if err != nil || stats == nil {
        stats = &atproto.StatsRecord{
            Type:       atproto.StatsCollection,
            DID:        did,
            Repository: repo,
            PullCount:  0,
            PushCount:  0,
            CreatedAt:  time.Now(),
        }
    }

    // Increment and update
    stats.PullCount++
    stats.LastPull = time.Now()
    _, err = p.repomgr.UpdateRecord(ctx, p.uid, atproto.StatsCollection, rkey, stats)
    return err
}
```

**Query endpoint (new XRPC):**

```
GET /xrpc/io.atcr.hold.getStats?did={userDID}&repo={repository}
→ Returns JSON: { pullCount, pushCount, lastPull, lastPush }

GET /xrpc/io.atcr.hold.listStats?did={userDID}
→ Returns all stats for a user across all repos on this hold
```

**AppView aggregation:**

```go
func (l *Loader) GetAggregatedStats(ctx context.Context, did, repo string) (*Stats, error) {
    // 1. Get all holds that have served this repo
    holdDIDs, _ := l.cache.GetHoldDIDsForRepo(did, repo)
    // 2. Query each hold for stats
    var total Stats
    for _, holdDID := range holdDIDs {
        holdURL := resolveHoldDID(holdDID)
        stats, err := queryHoldStats(ctx, holdURL, did, repo)
        if err != nil || stats == nil {
            continue // Skip unreachable holds rather than failing the aggregate
        }
        total.PullCount += stats.PullCount
        total.PushCount += stats.PushCount
    }
    return &total, nil
}
```

**Files to modify:**

- `pkg/atproto/lexicon.go` - Add `StatsCollection` + `StatsRecord`
- `pkg/hold/pds/stats.go` - New file for stats operations
- `pkg/hold/pds/xrpc.go` - Add `repo` param to getBlob, add stats endpoints
- `pkg/appview/storage/proxy_blob_store.go` - Pass repository to getBlob
- `pkg/appview/cache/loader.go` - Aggregation logic

### Migration Path

**Phase 1: Add Valkey infrastructure**

- Add a Valkey client to the AppView
- Create store interfaces that abstract SQLite vs Valkey
- Dual-write OAuth sessions to both

**Phase 2: Migrate sessions to Valkey**

- OAuth sessions, UI sessions, auth requests, pending device auth
- Remove the SQLite session tables
- Test: restart the AppView, users get logged out (acceptable)

**Phase 3: Migrate devices to Valkey**

- Move the device store to Valkey
- Same data structure, different backend
- Consider a device expiry policy

**Phase 4: Implement hold-side stats**

- Add the `io.atcr.hold.stats` collection to the hold's embedded PDS
- Hold increments stats on blob access
- Add XRPC endpoint: `io.atcr.hold.getStats`

**Phase 5: AppView stats aggregation**

- Track holdDIDs per repo in the Valkey cache
- Query holds for stats, aggregate
- Cache aggregated stats with a TTL

**Phase 6: Remove SQLite (optional)**

- Keep SQLite as an optional cache layer for UI queries
- Or: query the PDS on demand with Valkey caching
- Jetstream still useful for real-time updates

## Summary Table

| Category | Tables | % of Schema | Truly Persistent? |
|----------|--------|-------------|-------------------|
| Auth & Sessions + Metrics | 6 | 32% | Yes |
| PDS Cache | 11 | 58% | No (rebuildable) |
| Operational | 2 | 10% | No |

**~58% of the database is cached ATProto data that could be rebuilt from PDSes.**
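Phase 1's "store interfaces that abstract SQLite vs Valkey" could look roughly like the sketch below: an interface both backends satisfy, plus a dual-writing wrapper so OAuth sessions land in both during migration. The `SessionStore`, `MemStore`, and `DualWriteStore` names are hypothetical, and in-memory maps stand in for both real backends.

```go
// Sketch of the Phase 1 dual-write idea: writes go to both backends,
// reads prefer the primary (Valkey) and fall back to the secondary
// (SQLite). Illustrative names only, not the project's actual types.
package main

import "fmt"

type SessionStore interface {
	SaveSession(id, data string) error
	GetSession(id string) (string, bool)
}

// MemStore stands in for either backend in this sketch.
type MemStore struct{ m map[string]string }

func NewMemStore() *MemStore { return &MemStore{m: map[string]string{}} }

func (s *MemStore) SaveSession(id, data string) error { s.m[id] = data; return nil }

func (s *MemStore) GetSession(id string) (string, bool) { v, ok := s.m[id]; return v, ok }

// DualWriteStore writes to both backends, reads primary-first.
type DualWriteStore struct {
	primary, secondary SessionStore
}

func (d *DualWriteStore) SaveSession(id, data string) error {
	if err := d.primary.SaveSession(id, data); err != nil {
		return err
	}
	return d.secondary.SaveSession(id, data)
}

func (d *DualWriteStore) GetSession(id string) (string, bool) {
	if v, ok := d.primary.GetSession(id); ok {
		return v, true
	}
	return d.secondary.GetSession(id)
}

func main() {
	valkey, sqlite := NewMemStore(), NewMemStore()
	store := &DualWriteStore{primary: valkey, secondary: sqlite}
	store.SaveSession("sess1", "oauth-token-blob")
	_, inValkey := valkey.GetSession("sess1")
	_, inSQLite := sqlite.GetSession("sess1")
	fmt.Println(inValkey, inSQLite) // true true
}
```

Once Phase 2 confirms Valkey is authoritative, the wrapper is deleted and the handlers keep talking to the same `SessionStore` interface, which is what makes the SQLite removal in Phase 6 optional rather than forced.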