mirror of
https://tangled.org/evan.jarrett.net/at-container-registry
synced 2026-04-19 16:15:01 +00:00
# Running an ATProto Relay for ATCR Hold Discovery

This document explains what it takes to run an ATProto relay for indexing ATCR hold records, including infrastructure requirements, configuration, and trade-offs.

## Overview

### What is an ATProto Relay?

An ATProto relay is a service that:

- **Subscribes to multiple PDS hosts** and aggregates their data streams
- **Outputs a combined "firehose"** event stream for real-time network updates
- **Validates data integrity** and identity signatures
- **Provides discovery endpoints** like `com.atproto.sync.listReposByCollection`

The relay acts as a network-wide indexer, making it possible to discover which DIDs have records of specific types (collections).

### Why ATCR Needs a Relay

ATCR uses hold captain records (`io.atcr.hold.captain`) stored in hold PDSs to enable hold discovery. The `listReposByCollection` endpoint allows AppViews to efficiently discover all holds in the network without crawling every PDS individually.

**The problem**: Standard Bluesky relays appear to only index collections from `did:plc` DIDs, not `did:web` DIDs. Since ATCR holds use `did:web` (e.g., `did:web:hold01.atcr.io`), they aren't discoverable via Bluesky's public relays.

## Recommended Approach: Phased Implementation

ATCR's discovery needs evolve as the network grows. Start simple, scale as needed.

## MVP: Minimal Discovery Service

For initial deployment with a small number of holds (dozens, not thousands), build a **lightweight custom discovery service** focused solely on `io.atcr.*` collections.

### Why Minimal Service for MVP?

- **Scope**: Only index `io.atcr.*` collections (manifests, tags, captain/crew, sailor profiles)
- **Opt-in**: Only crawls PDSs that explicitly call `requestCrawl`
- **Small scale**: Dozens of holds, not millions of users
- **Simple storage**: SQLite sufficient for current scale
- **Cost-effective**: $5-10/month VPS

### Architecture

**Inbound endpoints:**

```
POST /xrpc/com.atproto.sync.requestCrawl
  → Hold registers itself for crawling

GET /xrpc/com.atproto.sync.listReposByCollection?collection=io.atcr.hold.captain
  → AppView discovers holds
```

**Outbound (client to PDS):**

```
1. com.atproto.repo.describeRepo → verify PDS exists
2. com.atproto.sync.getRepo → fetch full CAR file (initial backfill)
3. com.atproto.sync.subscribeRepos → WebSocket for real-time updates
4. Parse events → extract io.atcr.* records → index in SQLite
```

**Data flow:**

**Initial crawl (on requestCrawl):**

```
1. Hold POSTs requestCrawl → service queues crawl job
2. Service fetches getRepo (CAR file) from hold's PDS for backfill
3. Service parses CAR using indigo libraries
4. Service extracts io.atcr.* records (captain, crew, manifests, etc.)
5. Service stores (did, collection, rkey, record_data) in SQLite
6. Service opens WebSocket to subscribeRepos for this DID
7. Service stores cursor for reconnection handling
```

**Ongoing updates (WebSocket):**

```
1. Receive commit events via subscribeRepos WebSocket
2. Parse event, filter to io.atcr.* collections only
3. Update indexed_records incrementally (insert/update/delete)
4. Update cursor after processing each event
5. On disconnect: reconnect with stored cursor to resume
```

**Discovery (AppView query):**

```
1. AppView GETs listReposByCollection?collection=io.atcr.hold.captain
2. Service queries SQLite WHERE collection='io.atcr.hold.captain'
3. Service returns list of DIDs with that collection
```

### Implementation Requirements

**Technologies:**

- Go (reuse indigo libraries for CAR parsing and WebSocket)
- SQLite (sufficient for dozens/hundreds of holds)
- Standard HTTP server + WebSocket client

**Core components:**

1. **HTTP handlers** (`cmd/atcr-discovery/handlers/`):
   - `requestCrawl` - queue crawl jobs
   - `listReposByCollection` - query indexed collections

2. **Crawler** (`pkg/discovery/crawler.go`):
   - Fetch CAR files from PDSs for initial backfill
   - Parse with `github.com/bluesky-social/indigo/repo`
   - Extract records, filter to `io.atcr.*` only

3. **WebSocket subscriber** (`pkg/discovery/subscriber.go`):
   - WebSocket client for `com.atproto.sync.subscribeRepos`
   - Event parsing and filtering
   - Cursor management and persistence
   - Automatic reconnection with resume

4. **Storage** (`pkg/discovery/storage.go`):
   - SQLite schema for indexed records
   - Indexes on (collection, did) for fast queries
   - Cursor storage for reconnection

5. **Worker** (`pkg/discovery/worker.go`):
   - Background crawl job processor
   - WebSocket connection manager
   - Health monitoring for subscriptions

**Database schema:**

```sql
CREATE TABLE indexed_records (
    did TEXT NOT NULL,
    collection TEXT NOT NULL,
    rkey TEXT NOT NULL,
    record_data TEXT NOT NULL, -- JSON
    indexed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (did, collection, rkey)
);

CREATE INDEX idx_collection ON indexed_records(collection);
CREATE INDEX idx_did ON indexed_records(did);

CREATE TABLE crawl_queue (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    hostname TEXT NOT NULL UNIQUE,
    did TEXT,
    status TEXT DEFAULT 'pending', -- pending, in_progress, subscribed, failed
    last_crawled_at TIMESTAMP,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE subscriptions (
    did TEXT PRIMARY KEY,
    hostname TEXT NOT NULL,
    cursor INTEGER, -- last processed sequence number
    status TEXT DEFAULT 'active', -- active, disconnected, failed
    last_event_at TIMESTAMP,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```

**Leveraging indigo libraries** (a sketch; exact signatures vary between indigo versions, so treat this as illustrative):

```go
import (
	"bytes"
	"fmt"
	"strings"

	comatproto "github.com/bluesky-social/indigo/api/atproto"
	"github.com/bluesky-social/indigo/events"
	"github.com/bluesky-social/indigo/events/schedulers/sequential"
	"github.com/bluesky-social/indigo/repo"
	"github.com/gorilla/websocket"
	"github.com/ipfs/go-cid"
)

// Initial backfill: parse the CAR file fetched via com.atproto.sync.getRepo
r, err := repo.ReadRepoFromCar(ctx, bytes.NewReader(carData))
if err != nil {
	return err
}

// Iterate records
err = r.ForEach(ctx, "", func(path string, nodeCid cid.Cid) error {
	// Paths look like "io.atcr.hold.captain/self" (collection/rkey)
	parts := strings.SplitN(path, "/", 2)
	if len(parts) != 2 {
		return nil // skip invalid paths
	}
	collection, rkey := parts[0], parts[1]

	// Filter to io.atcr.* only
	if !strings.HasPrefix(collection, "io.atcr.") {
		return nil
	}

	// Get the raw record bytes (signature varies across indigo versions)
	_, recordBytes, err := r.GetRecordBytes(ctx, path)
	if err != nil {
		return err
	}

	// Store in database
	return store.IndexRecord(did, collection, rkey, *recordBytes)
})

// WebSocket subscription: listen for updates
wsURL := fmt.Sprintf("wss://%s/xrpc/com.atproto.sync.subscribeRepos", hostname)
conn, _, err := websocket.DefaultDialer.Dial(wsURL, nil)
if err != nil {
	return err
}

// Read events. Commit ops carry a Path ("collection/rkey") and an Action;
// record bytes for creates/updates travel in evt.Blocks as a CAR slice.
rsc := &events.RepoStreamCallbacks{
	RepoCommit: func(evt *comatproto.SyncSubscribeRepos_Commit) error {
		for _, op := range evt.Ops {
			parts := strings.SplitN(op.Path, "/", 2)
			if len(parts) != 2 || !strings.HasPrefix(parts[0], "io.atcr.") {
				continue
			}
			collection, rkey := parts[0], parts[1]

			// Process create/update/delete operations
			switch op.Action {
			case "create", "update":
				// Extract the record from the event's block data
				rr, err := repo.ReadRepoFromCar(ctx, bytes.NewReader(evt.Blocks))
				if err != nil {
					return err
				}
				_, recordBytes, err := rr.GetRecordBytes(ctx, op.Path)
				if err != nil {
					return err
				}
				if err := store.IndexRecord(evt.Repo, collection, rkey, *recordBytes); err != nil {
					return err
				}
			case "delete":
				if err := store.DeleteRecord(evt.Repo, collection, rkey); err != nil {
					return err
				}
			}
		}

		// Update cursor so reconnects can resume
		return store.UpdateCursor(evt.Repo, evt.Seq)
	},
}

// Process the stream with a sequential scheduler (newer indigo versions
// also take a logger argument in HandleRepoStream)
scheduler := sequential.NewScheduler("discovery-worker", rsc.EventHandler)
return events.HandleRepoStream(ctx, conn, scheduler)
```

### Infrastructure Requirements

**Minimum specs:**

- 1 vCPU
- 1-2GB RAM
- 20GB SSD
- Minimal bandwidth (<1GB/day for dozens of holds)

**Estimated cost:**

- Hetzner CX11: €4.15/month (~$5/month)
- DigitalOcean Basic: $6/month
- Fly.io: ~$5-10/month

**Deployment:**

```bash
# Build
go build -o atcr-discovery ./cmd/atcr-discovery

# Run
export DATABASE_PATH="/var/lib/atcr-discovery/discovery.db"
export HTTP_ADDR=":8080"
./atcr-discovery
```

### Limitations

**What it does NOT do:**

- ❌ Serve an outbound `subscribeRepos` firehose (AppViews query via `listReposByCollection` instead)
- ❌ Full MST validation (it trusts PDS validation)
- ❌ Scale to millions of accounts (SQLite limits)
- ❌ Multi-instance deployment (single process with SQLite)

**When to migrate to a full relay:** when you have 1000+ holds, need PostgreSQL, or require multi-instance deployment.

## Future Scale: Full Relay (Sync v1.1)

When ATCR grows beyond dozens of holds and needs real-time indexing at scale, migrate to Bluesky's relay v1.1 implementation.

### When to Upgrade

**Indicators:**

- 100+ holds requesting frequent crawls
- Need for real-time updates (re-crawl latency too high)
- Multiple AppView instances need coordinated discovery
- SQLite performance becomes a bottleneck

### Relay v1.1 Characteristics

Released May 2025, this is Bluesky's current reference implementation.

**Key features:**

- **Non-archival**: Doesn't mirror full repository data, only processes the firehose
- **WebSocket subscriptions**: Real-time updates from PDSs
- **Scalable**: 2 vCPU, 12GB RAM handles ~100M accounts
- **PostgreSQL**: Required for production scale
- **Admin UI**: Web dashboard for management

**Source**: `github.com/bluesky-social/indigo/cmd/relay`

### Migration Path

**Step 1: Deploy relay v1.1**

```bash
git clone https://github.com/bluesky-social/indigo.git
cd indigo
go build -o relay ./cmd/relay

export DATABASE_URL="postgres://relay:password@localhost:5432/atcr_relay"
./relay --admin-password="secure-password"
```

**Step 2: Migrate data**

- Export indexed records from SQLite
- Trigger crawls in the relay for all known holds
- Verify the relay indexes correctly

**Step 3: Update AppView configuration**

```bash
# Point to new relay
export ATCR_RELAY_ENDPOINT="https://relay.atcr.io"
```

**Step 4: Decommission minimal service**

- Monitor the relay for stability
- Shut down the old discovery service

### Infrastructure Requirements (Full Relay)

**Minimum specs:**

- 2 vCPU cores
- 12GB RAM
- 100GB SSD
- 30 Mbps bandwidth

**Estimated cost:**

- Hetzner: ~$30-40/month
- DigitalOcean: ~$50/month (with managed PostgreSQL)
- Fly.io: ~$35-50/month
## Collection Indexing: The `collectiondir` Microservice
|
|
|
|
The `com.atproto.sync.listReposByCollection` endpoint is **not part of the relay core**. It's provided by a separate microservice called **`collectiondir`**.
|
|
|
|
### What is collectiondir?
|
|
|
|
- **Separate service** that indexes collections for efficient discovery
|
|
- **Optional**: Not required by the ATProto spec, but very useful for AppViews
|
|
- **Deployed alongside relay** by Bluesky's public instances
|
|
|
|
### Current Limitation: did:plc Only?
|
|
|
|
Based on testing, Bluesky's public relays (with collectiondir) appear to:
|
|
- ✅ Index `io.atcr.*` collections from `did:plc` DIDs
|
|
- ❌ NOT index `io.atcr.*` collections from `did:web` DIDs
|
|
|
|
This means:
|
|
- ATCR manifests from users (did:plc) are discoverable
|
|
- ATCR hold captain records (did:web) are NOT discoverable
|
|
- The relay still **stores** all data (CAR file includes did:web records)
|
|
- The issue is specifically with **indexing** for `listReposByCollection`
|
|
|
|
### Configuring collectiondir
|
|
|
|
Documentation on configuring collectiondir is sparse. Possible approaches:
|
|
|
|
1. **Fork and modify**: Clone indigo repo, modify collectiondir to index all DIDs
|
|
2. **Configuration file**: Check if collectiondir accepts whitelist/configuration for indexed collections
|
|
3. **No filtering**: Default behavior might be to index everything, but Bluesky's deployment filters
|
|
|
|
**Action item**: Review `indigo/cmd/collectiondir` source code to understand configuration options.

## Multi-Relay Strategy

Holds can request crawls from **multiple relays** simultaneously. This enables scenarios like the following:

### Scenario: Bluesky + ATCR Relays

**Setup:**

1. Hold deploys with an embedded PDS at `did:web:hold01.atcr.io`
2. Hold creates a captain record (`io.atcr.hold.captain/self`)
3. Hold requests a crawl from **both**:
   - Bluesky relay: `https://bsky.network/xrpc/com.atproto.sync.requestCrawl`
   - ATCR relay: `https://relay.atcr.io/xrpc/com.atproto.sync.requestCrawl`

**Result:**

- ✅ Bluesky relay indexes social posts (if the hold owner posts)
- ✅ ATCR relay indexes hold captain records
- ✅ AppViews query the ATCR relay for hold discovery
- ✅ Independent networks - Bluesky posts work regardless of the ATCR relay

### Request Crawl Script

The existing script can be modified to support multiple relays:

```bash
#!/bin/bash
# deploy/request-crawl.sh
set -euo pipefail

HOSTNAME=$1
BLUESKY_RELAY=${2:-"https://bsky.network"}
ATCR_RELAY=${3:-"https://relay.atcr.io"}

echo "Requesting crawl for $HOSTNAME from Bluesky relay..."
curl -X POST "$BLUESKY_RELAY/xrpc/com.atproto.sync.requestCrawl" \
  -H "Content-Type: application/json" \
  -d "{\"hostname\": \"$HOSTNAME\"}"

echo "Requesting crawl for $HOSTNAME from ATCR relay..."
curl -X POST "$ATCR_RELAY/xrpc/com.atproto.sync.requestCrawl" \
  -H "Content-Type: application/json" \
  -d "{\"hostname\": \"$HOSTNAME\"}"
```

Usage:

```bash
./deploy/request-crawl.sh hold01.atcr.io
```

## Deployment: Minimal Discovery Service

### 1. Infrastructure Setup

**Provision a VPS:**

- Hetzner CX11, DigitalOcean Basic, or Fly.io
- Public domain (e.g., `discovery.atcr.io`)
- TLS certificate (Let's Encrypt)

**Configure a reverse proxy (optional - nginx):**

```nginx
upstream discovery {
    server 127.0.0.1:8080;
}

server {
    listen 443 ssl http2;
    server_name discovery.atcr.io;

    ssl_certificate /etc/letsencrypt/live/discovery.atcr.io/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/discovery.atcr.io/privkey.pem;

    location / {
        proxy_pass http://discovery;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```

### 2. Build and Deploy

```bash
# Clone ATCR repo
git clone https://github.com/atcr-io/atcr.git
cd atcr

# Build discovery service
go build -o atcr-discovery ./cmd/atcr-discovery

# Run
export DATABASE_PATH="/var/lib/atcr-discovery/discovery.db"
export HTTP_ADDR=":8080"
export CRAWL_INTERVAL="12h"
./atcr-discovery
```

### 3. Update Hold Startup

Each hold should request a crawl on startup:

```bash
# In hold startup script or environment
export ATCR_DISCOVERY_URL="https://discovery.atcr.io"

# Request crawl from both Bluesky and ATCR
curl -X POST "https://bsky.network/xrpc/com.atproto.sync.requestCrawl" \
  -H "Content-Type: application/json" \
  -d "{\"hostname\": \"$HOLD_PUBLIC_URL\"}"

curl -X POST "$ATCR_DISCOVERY_URL/xrpc/com.atproto.sync.requestCrawl" \
  -H "Content-Type: application/json" \
  -d "{\"hostname\": \"$HOLD_PUBLIC_URL\"}"
```

### 4. Update AppView Configuration

Point the AppView discovery worker at the discovery service:

```bash
# In .env.appview or environment
export ATCR_RELAY_ENDPOINT="https://discovery.atcr.io"
export ATCR_HOLD_DISCOVERY_ENABLED="true"
export ATCR_HOLD_DISCOVERY_INTERVAL="6h"
```

### 5. Monitor and Maintain

**Monitoring:**

- Check crawl queue status
- Monitor SQLite database size
- Track failed crawls

**Maintenance:**

- Re-crawl on a schedule (every 6-24 hours)
- Prune stale records (>7 days old)
- Back up the SQLite database regularly

## Trade-Offs and Considerations

### Running Your Own Relay

**Pros:**

- ✅ Full control over indexing (can index `did:web` holds)
- ✅ No dependency on third-party relay policies
- ✅ Can customize collection filters for ATCR-specific needs
- ✅ Relatively lightweight with the modern relay implementation

**Cons:**

- ❌ Infrastructure cost (~$30-50/month minimum)
- ❌ Operational overhead (monitoring, updates, backups)
- ❌ Needs ongoing maintenance as the network grows
- ❌ Single point of failure for discovery (unless multi-relay)

### Alternatives to Running a Relay

#### 1. Direct Registration API

Holds POST to the AppView on startup to register themselves:

**Pros:**

- ✅ Simplest implementation
- ✅ No relay infrastructure needed
- ✅ Immediate registration (no crawl delay)

**Cons:**

- ❌ Ties holds to specific AppView instances
- ❌ Breaks the decentralized discovery model
- ❌ Each AppView has a different hold registry

#### 2. Static Discovery File

Maintain `https://atcr.io/.well-known/holds.json`:

**Pros:**

- ✅ No infrastructure beyond static hosting
- ✅ All AppViews share the same registry
- ✅ Simple to implement

**Cons:**

- ❌ Manual process (PRs/issues to add holds)
- ❌ Not real-time discovery
- ❌ Centralized control point

#### 3. Hybrid Approach

Combine multiple discovery mechanisms (a sketch; the helper functions are illustrative):

```go
func (w *HoldDiscoveryWorker) DiscoverHolds(ctx context.Context) error {
	// 1. Fetch the static registry
	staticHolds := w.fetchStaticRegistry()

	// 2. Query the relay (if available)
	relayHolds := w.queryRelay(ctx)

	// 3. Accept direct registrations
	registeredHolds := w.getDirectRegistrations()

	// Merge and deduplicate
	allHolds := mergeHolds(staticHolds, relayHolds, registeredHolds)

	// Cache in the database
	for _, hold := range allHolds {
		if err := w.cacheHold(hold); err != nil {
			return err
		}
	}
	return nil
}
```

**Pros:**

- ✅ Multiple discovery paths (resilient)
- ✅ Gradual migration to relay-based discovery
- ✅ Supports both centralized bootstrap and decentralized growth

**Cons:**

- ❌ More complex implementation
- ❌ Potential for stale data if sources conflict

## Recommendations for ATCR

### Phase 1: MVP (Now - 1000 holds)

**Build the minimal discovery service with WebSocket support** (~$5-10/month):

1. Implement the `requestCrawl` + `listReposByCollection` endpoints
2. Initial backfill via `getRepo` (CAR file parsing)
3. Real-time updates via the `subscribeRepos` WebSocket
4. SQLite storage with cursor management
5. Filter to `io.atcr.*` collections only

**Deliverables:**

- `cmd/atcr-discovery` service
- SQLite schema with cursor storage
- CAR file parser (indigo libraries)
- WebSocket subscriber with reconnection
- Deployment scripts

**Cost**: ~$5-10/month VPS

**Why**: Minimal infrastructure, real-time updates, full control over indexing, sufficient for hundreds of holds.

### Phase 2: Migrate to Full Relay (1000+ holds)

**Deploy Bluesky relay v1.1** when scaling is needed (~$30-50/month):

1. Set up a PostgreSQL database
2. Deploy the indigo relay with admin UI
3. Migrate indexed data from SQLite
4. Configure `io.atcr.*` collection filtering (if possible)
5. Handle thousands of concurrent WebSocket connections

**Cost**: ~$30-50/month

**Why**: Proven scalability to 100M+ accounts, standardized protocol, community support, production-ready infrastructure.

### Phase 3: Multi-Relay Federation (Future)

**Decentralized relay network:**

1. Multiple ATCR relays operated independently
2. AppViews query multiple relays (fallback/redundancy)
3. Holds request crawls from all known ATCR relays
4. Cross-relay synchronization (optional)

**Why**: No single point of failure, fully decentralized discovery, geographic distribution.
## Next Steps
|
|
|
|
### For MVP Implementation
|
|
|
|
1. **Create `cmd/atcr-discovery` package structure**
|
|
- HTTP handlers for XRPC endpoints (`requestCrawl`, `listReposByCollection`)
|
|
- Crawler with indigo CAR parsing for initial backfill
|
|
- WebSocket subscriber for real-time updates
|
|
- SQLite storage layer with cursor management
|
|
- Background worker for managing subscriptions
|
|
|
|
2. **Database schema**
|
|
- `indexed_records` table for collection data
|
|
- `crawl_queue` table for crawl job management
|
|
- `subscriptions` table for WebSocket cursor tracking
|
|
- Indexes for efficient queries
|
|
|
|
3. **WebSocket implementation**
|
|
- Use `github.com/bluesky-social/indigo/events` for event handling
|
|
- Implement reconnection logic with cursor resume
|
|
- Filter events to `io.atcr.*` collections only
|
|
- Health monitoring for active subscriptions
|
|
|
|
4. **Testing strategy**
|
|
- Unit tests for CAR parsing
|
|
- Unit tests for event filtering
|
|
- Integration tests with mock PDSs and WebSocket
|
|
- Connection failure and reconnection testing
|
|
- Load testing with SQLite
|
|
|
|
5. **Deployment**
|
|
- Dockerfile for discovery service
|
|
- Deployment scripts (systemd, docker-compose)
|
|
- Monitoring setup (logs, metrics, WebSocket health)
|
|
- Alert on subscription failures
|
|
|
|
6. **Documentation**
|
|
- API documentation for XRPC endpoints
|
|
- Deployment guide
|
|
- Troubleshooting guide (WebSocket connection issues)
|
|
|
|
### Open Questions
|
|
|
|
1. **CAR parsing edge cases**: How to handle malformed CAR files or invalid records?
|
|
2. **WebSocket reconnection**: What's the optimal backoff strategy for reconnection attempts?
|
|
3. **Subscription management**: How many concurrent WebSocket connections can SQLite handle?
|
|
4. **Rate limiting**: Should discovery service rate-limit requestCrawl to prevent abuse?
|
|
5. **Authentication**: Should requestCrawl require authentication, or remain open?
|
|
6. **Cursor storage**: Should cursors be persisted immediately or batched for performance?
|
|
7. **Monitoring**: What metrics are most important for operational visibility (active subs, event rate, lag)?
|
|
8. **Error handling**: When a WebSocket dies, should we re-backfill via getRepo or trust cursor resume?
|
|
|
|
## References
|
|
|
|
### ATProto Specifications
|
|
- [ATProto Sync Specification](https://atproto.com/specs/sync)
|
|
- [Repository Specification](https://atproto.com/specs/repository)
|
|
- [CAR File Format](https://ipld.io/specs/transport/car/)
|
|
|
|
### Indigo Libraries
|
|
- [Indigo Repository](https://github.com/bluesky-social/indigo)
|
|
- [Indigo Repo Package](https://pkg.go.dev/github.com/bluesky-social/indigo/repo)
|
|
- [Indigo ATProto Package](https://pkg.go.dev/github.com/bluesky-social/indigo/atproto)
|
|
|
|
### Relay Reference (Future)
|
|
- [Relay v1.1 Updates](https://docs.bsky.app/blog/relay-sync-updates)
|
|
- [Indigo Relay Implementation](https://github.com/bluesky-social/indigo/tree/main/cmd/relay)
|
|
- [Running a Full-Network Relay](https://whtwnd.com/bnewbold.net/3kwzl7tye6u2y)
|