# ATCR UpCloud Deployment Guide
This guide walks you through deploying ATCR on UpCloud with Rocky Linux.
## Architecture
- **AppView** (atcr.io) - OCI registry API + web UI
- **Hold Service** (hold01.atcr.io) - Presigned URL generator for blob storage
- **Caddy** - Reverse proxy with automatic HTTPS
- **UpCloud Object Storage** (blobs.atcr.io) - S3-compatible blob storage
## Prerequisites
### 1. UpCloud Account
- Active UpCloud account
- Object Storage enabled
- Billing configured
### 2. Domain Names
You need three DNS records:
- `atcr.io` (or your domain) - AppView
- `hold01.atcr.io` - Hold service
- `blobs.atcr.io` - S3 storage (CNAME)
### 3. ATProto Account
- Bluesky/ATProto account
- Your DID (get from: `https://bsky.social/xrpc/com.atproto.identity.resolveHandle?handle=yourhandle.bsky.social`)
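To pull the DID out of that JSON response, something like the following works. A canned response is used here for illustration; in practice, assign `RESPONSE` from the `curl` output instead:

```shell
# Canned example response; on a real system use:
#   RESPONSE=$(curl -s "https://bsky.social/xrpc/com.atproto.identity.resolveHandle?handle=yourhandle.bsky.social")
RESPONSE='{"did":"did:plc:abc123example"}'

# Extract the "did" field without requiring jq
DID=$(printf '%s' "$RESPONSE" | sed -n 's/.*"did":"\([^"]*\)".*/\1/p')
echo "$DID"
```

The extracted value is what goes into `HOLD_OWNER` in Step 4.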
### 4. UpCloud Object Storage Bucket
Create an S3 bucket in UpCloud Object Storage:
1. Go to UpCloud Console → Storage → Object Storage
2. Create new bucket (e.g., `atcr-blobs`)
3. Note the region (e.g., `us-chi1`)
4. Generate access credentials (Access Key ID + Secret)
5. Note the endpoint (e.g., `s3.us-chi1.upcloudobjects.com`)
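The values from steps 3-5 follow a predictable pattern. A small sketch using this guide's example bucket and region (substitute your own; the pattern is an assumption based on the examples above):

```shell
# Example values from this guide; substitute your own bucket and region.
S3_BUCKET=atcr-blobs
AWS_REGION=us-chi1

# S3 API endpoint (what goes in S3_ENDPOINT if you skip the custom domain):
S3_ENDPOINT="https://s3.${AWS_REGION}.upcloudobjects.com"

# Bucket hostname, used as the CNAME target for blobs.atcr.io in Step 1:
CNAME_TARGET="${S3_BUCKET}.${AWS_REGION}.upcloudobjects.com"

echo "$S3_ENDPOINT"
echo "$CNAME_TARGET"
```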
## Deployment Steps
### Step 1: Configure DNS
Set up DNS records (using Cloudflare or your DNS provider):
```
Type    Name             Value                                    Proxy
────────────────────────────────────────────────────────────────────────────
A       atcr.io          [your-upcloud-ip]                        ☁️ DISABLED
A       hold01.atcr.io   [your-upcloud-ip]                        ☁️ DISABLED
CNAME   blobs.atcr.io    atcr-blobs.us-chi1.upcloudobjects.com    ☁️ DISABLED
```
**IMPORTANT:**
- **DISABLE Cloudflare proxy** (gray cloud, not orange) for all three domains
- Proxied connections break the Docker registry protocol and presigned URLs
- You'll still get HTTPS via Caddy's Let's Encrypt integration
Wait for DNS propagation (5-30 minutes). Verify with:
```bash
dig atcr.io
dig hold01.atcr.io
dig blobs.atcr.io
```
### Step 2: Create UpCloud Server
1. Go to UpCloud Console → Servers → Deploy a new server
2. Select location (match your S3 region if possible)
3. Select **Rocky Linux 9** operating system
4. Choose plan (minimum: 2 GB RAM, 1 CPU)
5. Configure hostname: `atcr`
6. Enable IPv4 public networking
7. **Optional:** Enable IPv6
8. **User data:** Paste contents of `deploy/init-upcloud.sh`
    - Update `ATCR_REPO` variable with your git repository URL
    - Or leave empty and manually copy files later
9. Create SSH key or use password authentication
10. Click **Deploy**
### Step 3: Wait for Initialization
The init script will:
- Update system packages (~2-5 minutes)
- Install Docker and Docker Compose
- Configure firewall
- Clone repository (if ATCR_REPO configured)
- Create systemd service
- Create helper scripts
Monitor progress:
```bash
# SSH into server
ssh root@[your-upcloud-ip]
# Check cloud-init logs
tail -f /var/log/cloud-init-output.log
```
Wait for the completion message in the logs.
### Step 4: Configure Environment
Edit the environment configuration:
```bash
# SSH into server
ssh root@[your-upcloud-ip]
# Edit environment file
cd /opt/atcr
nano .env
```
**Required configuration:**
```bash
# Domains
APPVIEW_DOMAIN=atcr.io
HOLD_DOMAIN=hold01.atcr.io
# Your ATProto DID
HOLD_OWNER=did:plc:your-did-here
# UpCloud S3 credentials
AWS_ACCESS_KEY_ID=your-access-key-id
AWS_SECRET_ACCESS_KEY=your-secret-access-key
AWS_REGION=us-chi1
S3_BUCKET=atcr-blobs
# S3 endpoint (choose one):
# Option 1: Custom domain (recommended)
S3_ENDPOINT=https://blobs.atcr.io
# Option 2: Direct UpCloud endpoint
# S3_ENDPOINT=https://s3.us-chi1.upcloudobjects.com
# Public access (optional)
HOLD_PUBLIC=false # Set to true to allow anonymous pulls
```
Save and exit (Ctrl+X, Y, Enter).
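Before moving on, it can be worth sanity-checking that every required key is set. A hedged sketch: a temporary sample file is used here for illustration, so point `ENV_FILE` at `/opt/atcr/.env` on the server instead:

```shell
# Illustration only: create a sample .env; on the server use ENV_FILE=/opt/atcr/.env
ENV_FILE="$(mktemp)"
cat > "$ENV_FILE" <<'EOF'
APPVIEW_DOMAIN=atcr.io
HOLD_DOMAIN=hold01.atcr.io
HOLD_OWNER=did:plc:your-did-here
AWS_ACCESS_KEY_ID=your-access-key-id
AWS_SECRET_ACCESS_KEY=your-secret-access-key
AWS_REGION=us-chi1
S3_BUCKET=atcr-blobs
S3_ENDPOINT=https://blobs.atcr.io
EOF

# Report any required key that is missing or empty
missing=0
for key in APPVIEW_DOMAIN HOLD_DOMAIN HOLD_OWNER AWS_ACCESS_KEY_ID \
           AWS_SECRET_ACCESS_KEY AWS_REGION S3_BUCKET S3_ENDPOINT; do
  if ! grep -q "^${key}=." "$ENV_FILE"; then
    echo "MISSING: $key"
    missing=1
  fi
done
[ "$missing" -eq 0 ] && echo "env OK"
```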
### Step 5: Start ATCR
```bash
# Start services
systemctl start atcr
# Check status
systemctl status atcr
# Verify containers are running
docker ps
```
You should see three containers:
- `atcr-caddy`
- `atcr-appview`
- `atcr-hold`
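A quick scripted version of that check might look like this. Canned `docker ps` output is used here for illustration; on the server, set `PS_OUTPUT="$(docker ps --format '{{.Names}}')"` instead:

```shell
# Canned container names for illustration; on the server use:
#   PS_OUTPUT="$(docker ps --format '{{.Names}}')"
PS_OUTPUT="atcr-caddy
atcr-appview
atcr-hold"

# Flag any expected container that is not running
STATUS=""
for name in atcr-caddy atcr-appview atcr-hold; do
  if printf '%s\n' "$PS_OUTPUT" | grep -qx "$name"; then
    STATUS="$STATUS $name:running"
  else
    STATUS="$STATUS $name:MISSING"
  fi
done
echo "$STATUS"
```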
### Step 6: Complete Hold OAuth Registration
The hold service needs to register itself with your PDS:
```bash
# Get OAuth URL from logs
/opt/atcr/get-hold-oauth.sh
```
Look for output like:
```
Visit this URL to authorize: https://bsky.social/oauth/authorize?...
```
1. Copy the URL and open in your browser
2. Log in with your ATProto account
3. Authorize the hold service
4. Return to terminal
The hold service will create records in your PDS:
- `io.atcr.hold` - Hold definition
- `io.atcr.hold.crew` - Your membership as captain
Verify registration:
```bash
docker logs atcr-hold | grep -i "success\|registered\|created"
```
### Step 7: Test the Registry
#### Test 1: Check endpoints
```bash
# AppView (should return {})
curl https://atcr.io/v2/
# Hold service (should return {"status":"ok"})
curl https://hold01.atcr.io/health
```
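Right after startup the services may need a moment before they answer, so a small retry wrapper can help. In this sketch `check` is a stand-in for the real `curl -fsS https://hold01.atcr.io/health` call:

```shell
# `check` stands in for: curl -fsS https://hold01.atcr.io/health
check() { echo '{"status":"ok"}'; }

# Retry up to 5 times with a short pause between attempts
tries=0
until out="$(check)"; do
  tries=$((tries + 1))
  if [ "$tries" -ge 5 ]; then
    echo "health check failed after $tries tries" >&2
    break
  fi
  sleep 2
done
echo "$out"
```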
#### Test 2: Configure Docker client
On your local machine:
```bash
# Install credential helper
# (Build from source or download release)
go install atcr.io/cmd/docker-credential-atcr@latest
```

Configure Docker to use the credential helper by adding the following to `~/.docker/config.json`:

```json
{
  "credHelpers": {
    "atcr.io": "atcr"
  }
}
```
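If you don't have a Docker config file yet, the file can be created from the command line. A sketch that writes to a temp path so a real config isn't touched; on your machine, write to `~/.docker/config.json` instead (and merge by hand if the file already exists, since this overwrites it):

```shell
# Temp path for illustration; use "$HOME/.docker/config.json" for real
CONFIG="$(mktemp)"

cat > "$CONFIG" <<'EOF'
{
  "credHelpers": {
    "atcr.io": "atcr"
  }
}
EOF

# Confirm the helper mapping is present
grep -o '"atcr.io": "atcr"' "$CONFIG"
```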
#### Test 3: Push a test image
```bash
# Tag an image
docker tag alpine:latest atcr.io/yourhandle/test:latest
# Push to ATCR
docker push atcr.io/yourhandle/test:latest
# Pull from ATCR
docker pull atcr.io/yourhandle/test:latest
```
### Step 8: Monitor and Maintain
#### View logs
```bash
# All services
/opt/atcr/logs.sh
# Specific service
/opt/atcr/logs.sh atcr-appview
/opt/atcr/logs.sh atcr-hold
/opt/atcr/logs.sh atcr-caddy
# Or use docker directly
docker logs -f atcr-appview
```
#### Enable debug logging
Toggle debug logging at runtime without restarting the container:
```bash
# Enable debug logging (auto-reverts after 30 minutes)
docker kill -s SIGUSR1 atcr-appview
docker kill -s SIGUSR1 atcr-hold
# Send the signal again to disable debug before the auto-revert
docker kill -s SIGUSR1 atcr-appview
```
When toggled, you'll see:
```
level=INFO msg="Log level changed" from=INFO to=DEBUG trigger=SIGUSR1 auto_revert_in=30m0s
```
**Note:** Despite the command name, `docker kill -s SIGUSR1` does NOT stop the container. It sends a user-defined signal that the application handles to toggle debug mode.
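The non-destructive toggle behavior is easy to demonstrate in plain shell: a `trap` handler runs each time `SIGUSR1` arrives, flips a flag, and the process keeps running (this is an illustration of the pattern, not ATCR's actual handler):

```shell
# A trap flips the DEBUG flag on each SIGUSR1; the process is not killed.
DEBUG=0
trap 'DEBUG=$((1 - DEBUG)); echo "debug=$DEBUG"' USR1

kill -USR1 $$   # toggles debug on
kill -USR1 $$   # toggles debug off again
echo "still running, debug=$DEBUG"
```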
#### Restart services
```bash
# Restart all
systemctl restart atcr
# Or use docker-compose
cd /opt/atcr
docker compose -f deploy/docker-compose.prod.yml restart
```
#### Rebuild after code changes
```bash
/opt/atcr/rebuild.sh
```
#### Update configuration
```bash
# Edit environment
nano /opt/atcr/.env
# Restart services
systemctl restart atcr
```
## Architecture Details
### Service Communication
```
Internet
    │
Caddy (443) ───────────┐
    ├─→ atcr-appview:5000 (Registry API + Web UI)
    └─→ atcr-hold:8080    (Presigned URL generator)
                       │
         UpCloud S3 (blobs.atcr.io)
```
### Data Flow: Push
```
1. docker push atcr.io/user/image:tag
2. AppView ← Docker client (manifest + blob metadata)
3. AppView → ATProto PDS (store manifest record)
4. Hold ← Docker client (request presigned URL)
5. Hold → UpCloud S3 API (generate presigned URL)
6. Hold → Docker client (return presigned URL)
7. UpCloud S3 ← Docker client (upload blob directly)
```
### Data Flow: Pull
```
1. docker pull atcr.io/user/image:tag
2. AppView ← Docker client (get manifest)
3. AppView → ATProto PDS (fetch manifest record)
4. AppView → Docker client (return manifest with holdEndpoint)
5. Hold ← Docker client (request presigned URL)
6. Hold → UpCloud S3 API (generate presigned URL)
7. Hold → Docker client (return presigned URL)
8. UpCloud S3 ← Docker client (download blob directly)
```
**Key insight:** The hold service only generates presigned URLs. Actual data transfer happens directly between Docker clients and S3, minimizing bandwidth costs.
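For reference, a presigned URL is just the blob's object URL plus standard AWS Signature V4 query parameters. The shape below is illustrative only: the object path, expiry, and dates are assumptions, while the `X-Amz-*` parameter names are the standard SigV4 query parameters:

```
https://blobs.atcr.io/<object-path>
  ?X-Amz-Algorithm=AWS4-HMAC-SHA256
  &X-Amz-Credential=<access-key>%2F<date>%2Fus-chi1%2Fs3%2Faws4_request
  &X-Amz-Date=<timestamp>
  &X-Amz-Expires=900
  &X-Amz-SignedHeaders=host
  &X-Amz-Signature=<hex-signature>
```

Anyone holding the URL can fetch that one object until the expiry, which is why the hold service can hand it straight to the Docker client and step out of the data path.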
## Troubleshooting
### Issue: "Cannot connect to registry"
**Check DNS:**
```bash
dig atcr.io
dig hold01.atcr.io
```
**Check Caddy logs:**
```bash
docker logs atcr-caddy
```
**Check firewall:**
```bash
firewall-cmd --list-all
```
### Issue: "Certificate errors"
**Verify DNS is propagated:**
```bash
curl -I https://atcr.io
```
**Check Caddy is obtaining certificates:**
```bash
docker logs atcr-caddy | grep -i certificate
```
**Common causes:**
- DNS not propagated (wait 30 minutes)
- Cloudflare proxy enabled (must be disabled)
- Port 80/443 blocked by firewall
### Issue: "Presigned URLs fail"
**Check S3 endpoint configuration:**
```bash
docker exec atcr-hold env | grep S3
```
**Verify custom domain CNAME:**
```bash
dig blobs.atcr.io CNAME
```
**Test S3 connectivity:**
```bash
docker exec atcr-hold wget -O- https://blobs.atcr.io/
```
**Common causes:**
- Cloudflare proxy enabled on blobs.atcr.io
- S3_ENDPOINT misconfigured
- AWS credentials invalid
### Issue: "Hold registration fails"
**Check hold owner DID:**
```bash
docker exec atcr-hold env | grep HOLD_OWNER
```
**Verify OAuth flow:**
```bash
/opt/atcr/get-hold-oauth.sh
```
**Manual registration:**
```bash
# Get fresh OAuth URL
docker restart atcr-hold
docker logs -f atcr-hold
```
### Issue: "High bandwidth usage"
Presigned URLs should keep blob traffic off the hold service entirely. If you still see high bandwidth usage:
**Verify presigned URLs are enabled:**
```bash
docker logs atcr-hold | grep -i presigned
```
**Check S3 configuration:**
```bash
docker exec atcr-hold env | grep S3_BUCKET
# Should show your S3 bucket name
```
**Verify direct S3 access:**
```bash
# Push should show 307 redirects in logs
docker logs -f atcr-hold
# Then push an image
```
### Automatic Updates
```bash
# Install automatic updates
dnf install -y dnf-automatic
# Enable timer
systemctl enable --now dnf-automatic.timer
```
### Monitoring
```bash
# Install monitoring tools
dnf install -y htop iotop nethogs
# Monitor resources
htop
# Monitor Docker
docker stats
```
### Backups
Critical data to backup:
- `/opt/atcr/.env` - Configuration
- Docker volumes:
  - `atcr-appview-data` - Auth keys, UI database, OAuth tokens
  - `caddy_data` - TLS certificates
```bash
# Backup volumes
docker run --rm \
  -v atcr-appview-data:/data \
  -v /backup:/backup \
  alpine tar czf /backup/atcr-appview-data.tar.gz /data
```
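Restoring is the inverse operation: on the server, roughly the same `docker run` with `tar xzf ... -C /` instead of `tar czf` (verify the paths against your archive before running). The tar round-trip itself looks like this, demonstrated against temp directories so no real volume is touched:

```shell
# Temp directories stand in for the real volume and /backup
DATA_DIR="$(mktemp -d)"     # stands in for the atcr-appview-data volume
BACKUP_DIR="$(mktemp -d)"   # stands in for /backup
RESTORE_DIR="$(mktemp -d)"

echo "secret-key-material" > "$DATA_DIR/keys.txt"

# Backup: archive the data directory (mirrors the docker run above)
tar czf "$BACKUP_DIR/atcr-appview-data.tar.gz" -C "$DATA_DIR" .

# Restore: extract the archive back into an empty data directory
tar xzf "$BACKUP_DIR/atcr-appview-data.tar.gz" -C "$RESTORE_DIR"

cat "$RESTORE_DIR/keys.txt"
```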
## Scaling Considerations
### Single Server (Current Setup)
- Suitable for: 100-1000 users
- Bottleneck: AppView CPU (manifest queries)
- Storage: Unlimited (S3)
### Multi-Server (Future)
- Multiple AppView instances behind load balancer
- Shared Redis for hold cache (replace in-memory cache)
- PostgreSQL for UI database (replace SQLite)
- Multiple hold services (geo-distributed)
## Support
- Documentation: https://tangled.org/evan.jarrett.net/at-container-registry
- Issues: https://tangled.org/evan.jarrett.net/at-container-registry/issues
- Bluesky: @evan.jarrett.net