# Bring Your Own Storage (BYOS)

## Overview

ATCR supports "Bring Your Own Storage" (BYOS) for blob storage. This allows users to:

- Deploy their own storage service backed by S3/Storj/MinIO/filesystem
- Control who can use their storage (public or private)
- Keep blob data in their own infrastructure while manifests remain in their ATProto PDS

## Architecture

```
┌─────────────────────────────────────────────┐
│             ATCR AppView (API)              │
│  - Manifests → ATProto PDS                  │
│  - Auth & token validation                  │
│  - Blob routing (issues redirects)          │
│  - Profile management                       │
└─────────────────┬───────────────────────────┘
                  │
                  │ Hold discovery priority:
                  │ 1. io.atcr.sailor.profile.defaultHold
                  │ 2. io.atcr.hold records
                  │ 3. AppView default_storage_endpoint
                  ▼
┌─────────────────────────────────────────────┐
│                 User's PDS                  │
│  - io.atcr.sailor.profile (hold preference) │
│  - io.atcr.hold records (own holds)         │
│  - io.atcr.manifest records (with holdEP)   │
└─────────────────┬───────────────────────────┘
                  │
                  │ Redirects to hold
                  ▼
┌─────────────────────────────────────────────┐
│           Storage Service (Hold)            │
│  - Blob storage (S3/Storj/MinIO/filesystem) │
│  - Presigned URL generation                 │
│  - Authorization (DID-based)                │
└─────────────────────────────────────────────┘
```

## ATProto Records

### io.atcr.sailor.profile

**NEW:** User profile for hold selection preferences. Created automatically on first authentication.

```json
{
  "$type": "io.atcr.sailor.profile",
  "defaultHold": "https://team-hold.example.com",
  "createdAt": "2025-10-02T12:00:00Z",
  "updatedAt": "2025-10-02T12:00:00Z"
}
```

**Record key:** Always `"self"` (only one profile per user)

**Behavior:**
- Created automatically when the user first authenticates (OAuth or Basic Auth)
- If the AppView has `default_storage_endpoint` configured, the profile gets that value as its initial `defaultHold`
- The user can update it to join a shared hold or to use their own hold
- Set `defaultHold` to `null` to clear the profile preference; discovery then falls through to the user's own `io.atcr.hold` records and finally to the AppView default

**This solves the multi-hold problem:** Users who are crew members of multiple holds can explicitly choose which one to use via their profile.

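The discovery priority described above can be sketched in Go. This is a minimal illustration, not ATCR's actual resolver: the `Profile` type, the `resolveHold` function, and the choice to take the first of several own holds are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// Profile is a hypothetical in-memory view of io.atcr.sailor.profile.
// A nil DefaultHold stands for `"defaultHold": null`.
type Profile struct {
	DefaultHold *string
}

// resolveHold applies the documented priority:
//  1. the profile's defaultHold,
//  2. the user's own io.atcr.hold records,
//  3. the AppView's default_storage_endpoint.
//
// Picking ownHolds[0] when several exist is an assumption of this sketch.
func resolveHold(profile *Profile, ownHolds []string, appViewDefault string) (string, error) {
	if profile != nil && profile.DefaultHold != nil && *profile.DefaultHold != "" {
		return *profile.DefaultHold, nil
	}
	if len(ownHolds) > 0 {
		return ownHolds[0], nil
	}
	if appViewDefault != "" {
		return appViewDefault, nil
	}
	return "", errors.New("no hold available for this user")
}

func main() {
	team := "https://team-hold.example.com"
	hold, err := resolveHold(
		&Profile{DefaultHold: &team},
		[]string{"https://alice-storage.example.com"},
		"https://storage.atcr.io",
	)
	fmt.Println(hold, err)
}
```

With a `null` profile preference, the same call would return the user's own hold, and with no own holds it would return the AppView default.
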
### io.atcr.hold

Users create a hold record in their PDS to configure their own storage:

```json
{
  "$type": "io.atcr.hold",
  "endpoint": "https://alice-storage.example.com",
  "owner": "did:plc:alice123",
  "public": false,
  "createdAt": "2025-10-01T12:00:00Z"
}
```

### io.atcr.hold.crew

Hold owners can add crew members (for shared storage):

```json
{
  "$type": "io.atcr.hold.crew",
  "hold": "at://did:plc:alice/io.atcr.hold/my-storage",
  "member": "did:plc:bob456",
  "role": "write",
  "addedAt": "2025-10-01T12:00:00Z"
}
```

**Note:** Crew records are stored in the **hold owner's PDS**, not the crew member's PDS. This ensures the hold owner maintains full control over access.

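For readers consuming these records from Go, the JSON shapes above map naturally onto structs. The type names and the `parseCrew` helper below are hypothetical; only the field names and `$type` values come from the examples above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// HoldRecord mirrors the io.atcr.hold example above.
type HoldRecord struct {
	Type      string    `json:"$type"`
	Endpoint  string    `json:"endpoint"`
	Owner     string    `json:"owner"`
	Public    bool      `json:"public"`
	CreatedAt time.Time `json:"createdAt"`
}

// CrewRecord mirrors the io.atcr.hold.crew example above.
type CrewRecord struct {
	Type    string    `json:"$type"`
	Hold    string    `json:"hold"`   // AT-URI of the hold record
	Member  string    `json:"member"` // DID of the crew member
	Role    string    `json:"role"`
	AddedAt time.Time `json:"addedAt"`
}

// parseCrew decodes a crew record and rejects any other record type.
func parseCrew(data []byte) (CrewRecord, error) {
	var rec CrewRecord
	if err := json.Unmarshal(data, &rec); err != nil {
		return CrewRecord{}, err
	}
	if rec.Type != "io.atcr.hold.crew" {
		return CrewRecord{}, fmt.Errorf("unexpected $type %q", rec.Type)
	}
	return rec, nil
}

func main() {
	raw := []byte(`{
		"$type": "io.atcr.hold.crew",
		"hold": "at://did:plc:alice/io.atcr.hold/my-storage",
		"member": "did:plc:bob456",
		"role": "write",
		"addedAt": "2025-10-01T12:00:00Z"
	}`)
	rec, err := parseCrew(raw)
	fmt.Println(rec.Member, rec.Role, err)
}
```
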
## Storage Service

### Deployment

The storage service is a lightweight HTTP server that:
1. Accepts presigned URL requests
2. Verifies DID authorization
3. Generates presigned URLs for S3/Storj/etc
4. Returns URLs to AppView for client redirect

### Configuration

The hold service is configured entirely via environment variables. See `.env.example` for all options.

**Required environment variables:**

```bash
# Hold service public URL (REQUIRED)
HOLD_PUBLIC_URL=https://storage.example.com

# Storage driver type
STORAGE_DRIVER=s3

# For S3/MinIO
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_REGION=us-east-1
S3_BUCKET=my-blobs

# For Storj (optional - custom S3 endpoint)
# S3_ENDPOINT=https://gateway.storjshare.io

# For filesystem storage
# STORAGE_DRIVER=filesystem
# STORAGE_ROOT_DIR=/var/lib/atcr-storage
```

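A service configured this way typically validates its environment at startup. The sketch below shows one way to do that in Go. The `Config` type and `loadConfig` function are hypothetical, and the per-driver checks (requiring `S3_BUCKET` for `s3` and `STORAGE_ROOT_DIR` for `filesystem`) are assumptions; only `HOLD_PUBLIC_URL` is marked as required above.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Config holds the settings read from the environment variables listed above.
type Config struct {
	PublicURL string // HOLD_PUBLIC_URL
	Driver    string // STORAGE_DRIVER
	Bucket    string // S3_BUCKET
	Region    string // AWS_REGION
	Endpoint  string // S3_ENDPOINT (optional, e.g. a Storj gateway)
	RootDir   string // STORAGE_ROOT_DIR
}

// loadConfig reads settings through getenv so that it can be exercised
// without touching the real process environment.
func loadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{
		PublicURL: getenv("HOLD_PUBLIC_URL"),
		Driver:    getenv("STORAGE_DRIVER"),
		Bucket:    getenv("S3_BUCKET"),
		Region:    getenv("AWS_REGION"),
		Endpoint:  getenv("S3_ENDPOINT"),
		RootDir:   getenv("STORAGE_ROOT_DIR"),
	}
	if cfg.PublicURL == "" {
		return Config{}, errors.New("HOLD_PUBLIC_URL is required")
	}
	switch cfg.Driver {
	case "":
		return Config{}, errors.New("STORAGE_DRIVER is required")
	case "s3":
		if cfg.Bucket == "" {
			return Config{}, errors.New("S3_BUCKET is required for the s3 driver")
		}
	case "filesystem":
		if cfg.RootDir == "" {
			return Config{}, errors.New("STORAGE_ROOT_DIR is required for the filesystem driver")
		}
	}
	// Other drivers pass through unvalidated in this sketch.
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "config error:", err)
		os.Exit(1)
	}
	fmt.Printf("driver=%s public_url=%s\n", cfg.Driver, cfg.PublicURL)
}
```
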
**Authorization:**

ATCR follows ATProto's public-by-default model with gated anonymous access:

**Read Access:**
- **Public hold** (`HOLD_PUBLIC=true`): Anonymous reads allowed (no authentication)
- **Private hold** (`HOLD_PUBLIC=false`): Requires authentication (any ATCR user with sailor.profile)

**Write Access:**
- Always requires authentication
- Must be hold owner OR crew member (verified via `io.atcr.hold.crew` records in owner's PDS)

**Key Points:**
- "Private" just means "no anonymous access" - not "limited user access"
- Any authenticated ATCR user can read from private holds
- Crew membership only controls WRITE access, not READ access
- This aligns with ATProto's public records model (no private PDS records yet)

### Running

```bash
# Build
go build -o atcr-hold ./cmd/hold

# Set environment variables (or use .env file)
export HOLD_PUBLIC_URL=https://storage.example.com
export STORAGE_DRIVER=s3
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_REGION=us-east-1
export S3_BUCKET=my-blobs

# Run
./atcr-hold
```

**Registration (required):**

The hold service must be registered in a PDS to be discoverable by the AppView.

**Standard registration workflow:**

1. Set `HOLD_OWNER` to your DID:
   ```bash
   export HOLD_OWNER=did:plc:your-did-here
   ```

2. Start the hold service:
   ```bash
   ./atcr-hold
   ```

3. **Check the logs** for the OAuth authorization URL:
   ```
   ================================================================================
   OAUTH AUTHORIZATION REQUIRED
   ================================================================================

   Please visit this URL to authorize the hold service:

     https://bsky.app/authorize?client_id=...

   Waiting for authorization...
   ================================================================================
   ```

4. Visit the URL in your browser and authorize

5. The hold service will:
   - Exchange the authorization code for a token
   - Create `io.atcr.hold` record in your PDS
   - Create `io.atcr.hold.crew` record (making you the owner)
   - Save registration state

6. On subsequent runs, the service checks if already registered and skips OAuth

**Alternative methods:**

- **Manual API registration**: Call `POST /register` with your own OAuth token
- **Completely manual**: Create PDS records yourself using any ATProto client

### Deploy to Fly.io

```bash
# Create fly.toml
cat > fly.toml <<EOF
app = "my-atcr-hold"
primary_region = "ord"

[env]
HOLD_PUBLIC_URL = "https://my-atcr-hold.fly.dev"
HOLD_SERVER_ADDR = ":8080"
STORAGE_DRIVER = "s3"
AWS_REGION = "us-east-1"
S3_BUCKET = "my-blobs"
HOLD_PUBLIC = "false"

[http_service]
internal_port = 8080
force_https = true
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 0

[[vm]]
cpu_kind = "shared"
cpus = 1
memory_mb = 256
EOF

# Create the app from fly.toml without deploying yet
fly launch --no-deploy

# Set secrets before the first deploy so the service starts with credentials
fly secrets set AWS_ACCESS_KEY_ID=...
fly secrets set AWS_SECRET_ACCESS_KEY=...
fly secrets set HOLD_OWNER=did:plc:your-did-here

# Deploy
fly deploy

# Check logs for OAuth URL on first run
fly logs

# Visit the OAuth URL shown in logs to authorize
# The hold service will register itself in your PDS
```

## Request Flow

### Push with BYOS

1. **Docker push** `atcr.io/alice/myapp:latest`
2. **AppView** resolves `alice` → `did:plc:alice123`
3. **AppView** discovers hold via priority logic:
   - Check alice's `io.atcr.sailor.profile` for `defaultHold`
   - If not set, check alice's `io.atcr.hold` records
   - Fall back to AppView's `default_storage_endpoint`
4. **Found:** `alice.profile.defaultHold = "https://team-hold.example.com"`
5. **AppView** → team-hold: POST `/put-presigned-url`
   ```json
   {
     "did": "did:plc:alice123",
     "digest": "sha256:abc123...",
     "size": 1048576
   }
   ```
6. **Hold service**:
   - Verifies alice is authorized (checks crew records)
   - Generates S3 presigned upload URL (15min expiry)
   - Returns: `{"url": "https://s3.../blob?signature=..."}`
7. **AppView** → Docker: `307 Redirect` to presigned URL
8. **Docker** → S3: PUT blob directly (no proxy)
9. **Manifest** stored in alice's PDS with `holdEndpoint: "https://team-hold.example.com"`

### Pull with BYOS

1. **Docker pull** `atcr.io/alice/myapp:latest`
2. **AppView** fetches manifest from alice's PDS
3. **Manifest** contains `holdEndpoint: "https://team-hold.example.com"`
4. **AppView** caches: `(alice's DID, "myapp") → "https://team-hold.example.com"` (10min TTL)
5. **Docker** requests blobs: GET `/v2/alice/myapp/blobs/sha256:abc123`
6. **AppView** uses **cached hold from manifest** (not re-discovered)
7. **AppView** → team-hold: POST `/get-presigned-url`
8. **Hold service** returns presigned download URL
9. **AppView** → Docker: `307 Redirect`
10. **Docker** → S3: GET blob directly

**Key insight:** Pull uses the historical `holdEndpoint` from the manifest, ensuring blobs are fetched from where they were originally pushed, even if alice later changes her profile's `defaultHold`.

## Default Registry

The AppView can run its own storage service as the default:

### AppView config

```yaml
middleware:
  - name: registry
    options:
      atproto-resolver:
        default_storage_endpoint: https://storage.atcr.io
```

### Default hold service config

```bash
# Accept any authenticated DID
HOLD_PUBLIC=false  # Requires authentication

# Or allow public reads
HOLD_PUBLIC=true   # Public reads, auth required for writes
```

This provides free-tier shared storage for users who don't want to deploy their own.

## Storage Drivers Supported

The storage service uses distribution's storage drivers:

- **S3** - AWS S3, MinIO, Storj (via S3 gateway)
- **Filesystem** - Local disk (for testing)
- **Azure** - Azure Blob Storage
- **GCS** - Google Cloud Storage
- **Swift** - OpenStack Swift
- **OSS** - Alibaba Cloud OSS

## Quotas

Quotas are NOT implemented in the storage service. Instead, use:

- **S3**: Bucket policies, lifecycle rules
- **Storj**: Project limits in Storj dashboard
- **MinIO**: Quota enforcement features
- **Filesystem**: Disk quotas at OS level

## Security

### Authorization

Authorization is based on ATProto's public-by-default model:

**Read Authorization:**
- **Public hold** (`public: true` in hold record):
  - Anonymous users: ✅ Allowed
  - Any authenticated user: ✅ Allowed

- **Private hold** (`public: false` in hold record):
  - Anonymous users: ❌ 401 Unauthorized
  - Any authenticated ATCR user: ✅ Allowed (no crew membership required)

**Write Authorization:**
- Anonymous users: ❌ 401 Unauthorized
- Authenticated non-crew: ❌ 403 Forbidden
- Authenticated crew member: ✅ Allowed
- Hold owner: ✅ Allowed

**Implementation:**
- Hold service queries owner's PDS for `io.atcr.hold.crew` records
- Crew records are public ATProto records (read without authentication)
- "Private" holds only gate anonymous access, not authenticated user access
- This reflects ATProto's current limitation: no private PDS records

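The rules above reduce to two small decision functions. The sketch below encodes them in Go with hypothetical names (`Caller`, `readStatus`, `writeStatus`); it assumes that authentication and the owner/crew lookup have already happened elsewhere and that allowed requests map to `200 OK`.

```go
package main

import (
	"fmt"
	"net/http"
)

// Caller describes what is already known about the requester.
type Caller struct {
	Authenticated bool // false for anonymous requests
	OwnerOrCrew   bool // hold owner, or listed in an io.atcr.hold.crew record
}

// readStatus returns the HTTP status for a read against a hold.
// Public holds allow everyone; private holds only reject anonymous callers.
func readStatus(holdPublic bool, c Caller) int {
	if holdPublic || c.Authenticated {
		return http.StatusOK
	}
	return http.StatusUnauthorized
}

// writeStatus returns the HTTP status for a write against a hold.
// Writes never depend on the hold's public flag.
func writeStatus(c Caller) int {
	switch {
	case !c.Authenticated:
		return http.StatusUnauthorized
	case !c.OwnerOrCrew:
		return http.StatusForbidden
	default:
		return http.StatusOK
	}
}

func main() {
	anonymous := Caller{}
	stranger := Caller{Authenticated: true}
	crew := Caller{Authenticated: true, OwnerOrCrew: true}

	fmt.Println("private read, anonymous:", readStatus(false, anonymous))
	fmt.Println("private read, stranger: ", readStatus(false, stranger))
	fmt.Println("write, stranger:        ", writeStatus(stranger))
	fmt.Println("write, crew:            ", writeStatus(crew))
}
```
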
### Presigned URLs

- 15 minute expiry
- Client uploads/downloads directly to storage
- With presigned URLs, blob data bypasses both the AppView and the hold service

### Private Holds

"Private" holds gate anonymous access while remaining accessible to authenticated users:

**What "Private" Means:**
- `HOLD_PUBLIC=false` prevents anonymous reads
- Any authenticated ATCR user can still read
- This aligns with ATProto's public records model

**Write Control:**
- Only hold owner and crew members can write
- Crew membership managed via `io.atcr.hold.crew` records in owner's PDS
- Removing a crew member's record revokes their write access immediately

**Future: True Private Access**
- When ATProto adds private PDS records, ATCR can support truly private repos
- For now, "private" = "authenticated-only access"

## Example: Personal Storage

Alice wants to use her own Storj account:

1. **Set environment variables**:
   ```bash
   export HOLD_PUBLIC_URL=https://alice-storage.fly.dev
   export HOLD_OWNER=did:plc:alice123
   export STORAGE_DRIVER=s3
   export AWS_ACCESS_KEY_ID=your_storj_access_key
   export AWS_SECRET_ACCESS_KEY=your_storj_secret_key
   export S3_ENDPOINT=https://gateway.storjshare.io
   export S3_BUCKET=alice-blobs
   ```

2. **Deploy hold service** to Fly.io - auto-registration creates hold + crew record

3. **Push images** - AppView automatically routes to her storage

## Example: Team Hold

A company wants shared storage for their team:

1. **Deploy hold service** with S3 credentials and auto-registration:
   ```bash
   export HOLD_PUBLIC_URL=https://company-hold.fly.dev
   export HOLD_OWNER=did:plc:admin
   export HOLD_PUBLIC=false
   export STORAGE_DRIVER=s3
   export AWS_ACCESS_KEY_ID=...
   export AWS_SECRET_ACCESS_KEY=...
   export S3_BUCKET=company-blobs
   ```

2. **Hold service auto-registers** on first run, creating:
   - Hold record in admin's PDS
   - Crew record making admin the owner

3. **Admin adds crew members** via ATProto client or manually:
   ```bash
   # Using atproto client
   atproto put-record \
     --collection io.atcr.hold.crew \
     --rkey "company-did:plc:engineer1" \
     --value '{
       "$type": "io.atcr.hold.crew",
       "hold": "at://did:plc:admin/io.atcr.hold/company",
       "member": "did:plc:engineer1",
       "role": "write"
     }'
   ```

4. **Team members set their profile** to use the shared hold:
   ```bash
   # Engineer updates their sailor profile
   atproto put-record \
     --collection io.atcr.sailor.profile \
     --rkey "self" \
     --value '{
       "$type": "io.atcr.sailor.profile",
       "defaultHold": "https://company-hold.fly.dev"
     }'
   ```

5. **Hold service queries PDS** for crew records to authorize writes
6. **Engineers push/pull** using `atcr.io/engineer1/myapp` - blobs go to company hold

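Step 5 amounts to matching the pusher's DID against the crew records fetched from the owner's PDS. A minimal Go sketch of that check follows; `canWrite` and the trimmed `CrewRecord` type are hypothetical, the records are assumed to be fetched already, and the `role` field is carried along but not interpreted because its semantics are not specified here.

```go
package main

import "fmt"

// CrewRecord holds the fields of io.atcr.hold.crew that matter for the check.
type CrewRecord struct {
	Hold   string // AT-URI of the hold record
	Member string // DID of the crew member
	Role   string // not interpreted in this sketch
}

// canWrite reports whether did may write to the hold identified by holdURI.
// The owner is always allowed; everyone else needs a matching crew record.
func canWrite(ownerDID, holdURI, did string, crew []CrewRecord) bool {
	if did == "" {
		return false // anonymous callers never write
	}
	if did == ownerDID {
		return true
	}
	for _, rec := range crew {
		if rec.Hold == holdURI && rec.Member == did {
			return true
		}
	}
	return false
}

func main() {
	holdURI := "at://did:plc:admin/io.atcr.hold/company"
	crew := []CrewRecord{
		{Hold: holdURI, Member: "did:plc:engineer1", Role: "write"},
	}
	fmt.Println(canWrite("did:plc:admin", holdURI, "did:plc:engineer1", crew))
	fmt.Println(canWrite("did:plc:admin", holdURI, "did:plc:stranger", crew))
}
```
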
## Limitations

1. **No resume/partial uploads** - Storage service doesn't track upload state
2. **No advanced features** - Just basic put/get, no deduplication logic
3. **In-memory cache** - Hold endpoint cache is in-memory (for production, use Redis)
4. **Manual profile updates** - No UI for updating sailor profile (must use ATProto client)

## Performance Optimization: S3 Presigned URLs

**Status:** Planned implementation (see [PRESIGNED_URLS.md](./PRESIGNED_URLS.md))

Currently, hold services act as proxies for blob data. With presigned URLs:

- **Downloads:** Docker → S3 direct (via 307 redirect)
- **Uploads:** Docker → AppView → S3 (via presigned URL)
- **Hold service bandwidth:** Reduced by 99.98% (only orchestration)

**Benefits:**
- Hold services can run on minimal infrastructure ($5/month instances)
- Direct S3 transfers at maximum speed
- Scales to arbitrarily large images
- Works with Storj, MinIO, Backblaze B2, Cloudflare R2

See [PRESIGNED_URLS.md](./PRESIGNED_URLS.md) for complete technical details and implementation guide.

## Future Improvements

1. **S3 Presigned URLs** - Implement direct S3 URLs (see [PRESIGNED_URLS.md](./PRESIGNED_URLS.md))
2. **Automatic failover** - Multiple storage endpoints, fallback to default
3. **Storage analytics** - Track usage per DID
4. **Quota integration** - Optional quota tracking in storage service
5. **Profile management UI** - Web interface for users to manage their sailor profile
6. **Distributed cache** - Redis/Memcached for hold endpoint cache in multi-instance deployments

## Comparison to Default Storage

| Feature | Default (Shared S3) | BYOS |
|---------|---------------------|------|
| Setup | None required | Deploy storage service |
| Cost | Free (with quota) | User pays for S3/Storj |
| Control | Limited | Full control |
| Performance | Shared | Dedicated |
| Quotas | Enforced by AppView | User managed |
| Privacy | Blobs in shared bucket | Blobs in user's bucket |

## References

- [ATProto Lexicon Spec](https://atproto.com/specs/lexicon)
- [Distribution Storage Drivers](https://distribution.github.io/distribution/storage-drivers/)
- [S3 Presigned URLs](https://docs.aws.amazon.com/AmazonS3/latest/userguide/PresignedUrlUploadObject.html)
- [Storj Documentation](https://docs.storj.io/)