Greenfield Go multi-tenant IPFS Pinning Service wire-compatible with the
IPFS Pinning Services API spec. Paired 1:1 with Kubo over localhost RPC,
clustered via embedded NATS JetStream, Postgres source-of-truth with
RLS-enforced tenancy, Fiber + huma v2 for the HTTP surface, Authentik
OIDC for session login with kid-rotated HS256 JWT API tokens.

Feature-complete against the 22-milestone build plan, including the
ship-it v1.0 gap items:

  * admin CLIs: drain/uncordon, maintenance, mint-token, rotate-key,
    prune-denylist, rebalance --dry-run, cache-stats, cluster-presences
  * TTL leader election via NATS KV, fence tokens, JetStream dedup
  * rebalancer (plan/apply split), reconciler, requeue sweeper
  * ristretto caches with NATS-backed cross-node invalidation
    (placements live-nodes + token denylist)
  * maintenance watchdog for stuck cluster-pause flag
  * Prometheus /metrics with CIDR ACL, HTTP/pin/scheduler/cache gauges
  * rate limiting: session (10/min) + anonymous global (120/min)
  * integration tests: rebalance, refcount multi-org, RLS belt
  * goreleaser (tar + deb/rpm/apk + Alpine Docker) targeting Gitea

Stack: Cobra/Viper, Fiber v2 + huma v2, embedded NATS JetStream,
pgx/sqlc/golang-migrate, ristretto, TypeID, prometheus/client_golang,
testcontainers-go.

anchorage architecture

One-paragraph summary

anchorage is a horizontally-scalable IPFS Pinning Service. Each instance is paired 1:1 with its own Kubo daemon and runs an embedded NATS server that joins the cluster via gossip. Postgres is the single source of truth for pins, placements, refcounts, orgs, users, tokens, and audit log. NATS carries only signaling: per-node work queues (pin.jobs.<nodeID>), pin status fan-out (pin.events.<orgID>.<requestID>), heartbeats, cache-invalidation pubsub, and a TTL-based leader-election KV key.

Layers

| Layer           | What it owns |
| --------------- | ------------ |
| Kubo (per node) | Physical pinset on that node's local IPFS repo. |
| Postgres        | Logical state: pins, pin_placements (per-node), pin_refcount (per (node, cid)), orgs, users, memberships, tokens, denylist, nodes, audit. |
| NATS            | Non-authoritative signaling. Everything in NATS is reconstructable from Postgres. |

Request lifecycle (POST /v1/pins)

  1. LB routes to some anchorage instance (not necessarily one that will hold a replica).
  2. One Postgres transaction:
    • Insert pins row (status=queued).
    • Compute placements via rendezvous hash of (orgID, cid, nodeID) over live nodes.
    • Insert pin_placements rows with fence=1.
    • Increment pin_refcount per target (node, cid).
    • Write an audit_log row.
  3. After commit, publish one pin.jobs.<targetNodeID> message per placement with Nats-Msg-Id = <requestID>:<nodeID>:<fence> for JetStream dedup (sketched after this list).
  4. Target nodes pull from pin.jobs.<myNodeID>, call Kubo, UPDATE the placement row WHERE fence = $n, publish pin.events.<orgID>.<requestID>, then ack.
  5. WebSocket clients subscribed to the org (and optionally a specific requestID) see status frames.
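
Steps 2-3 lend themselves to a short sketch. The Go below is a minimal illustration, not anchorage's actual code: the helper names (score, placements, publishJobs), the payload, and the replica count are assumptions; only the hash inputs, the pin.jobs.<nodeID> subject, and the Nats-Msg-Id format come from this doc.

```go
// Sketch of steps 2-3; helper names and signatures are assumptions.
package sketch

import (
	"fmt"
	"hash/fnv"
	"sort"

	"github.com/nats-io/nats.go"
)

// score ranks one candidate node for (orgID, cid) via rendezvous (HRW) hashing.
func score(orgID, cid, nodeID string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(orgID + "|" + cid + "|" + nodeID))
	return h.Sum64()
}

// placements returns the top-`replicas` live nodes by score.
func placements(orgID, cid string, liveNodes []string, replicas int) []string {
	ranked := append([]string(nil), liveNodes...)
	sort.Slice(ranked, func(i, j int) bool {
		return score(orgID, cid, ranked[i]) > score(orgID, cid, ranked[j])
	})
	if replicas > len(ranked) {
		replicas = len(ranked)
	}
	return ranked[:replicas]
}

// publishJobs fans out one message per placement after the Postgres commit.
// Nats-Msg-Id = <requestID>:<nodeID>:<fence> lets JetStream drop retries.
func publishJobs(js nats.JetStreamContext, requestID string, fence int64, targets []string) error {
	for _, nodeID := range targets {
		subject := fmt.Sprintf("pin.jobs.%s", nodeID)
		msgID := fmt.Sprintf("%s:%s:%d", requestID, nodeID, fence)
		if _, err := js.Publish(subject, []byte(requestID), nats.MsgId(msgID)); err != nil {
			return err // a dropped publish is later recovered by the Requeue Sweeper
		}
	}
	return nil
}
```

Rendezvous hashing keeps placement deterministic for a given (orgID, cid) and, when a node joins or leaves, only remaps the pins for which that node enters or leaves the top-N ranking.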

Key invariants

  • Postgres is the commit point. NATS publish is follow-on; a dropped publish is recovered by the Requeue Sweeper.
  • Fence tokens prevent zombie writes. A zombie node's late ack UPDATE affects 0 rows because its fence was bumped during rebalance.
  • Publisher-side dedup absorbs retry storms. JetStream's Duplicates: 5m plus Nats-Msg-Id means the same logical job cannot be enqueued twice within the window.
  • Per-(node, cid) refcounts let two orgs pinning the same CID on the same node share one Kubo pin; unpin only fires when the refcount hits zero.
  • RLS is belt-and-suspenders. Every tenant-scoped table enables Postgres row-level security keyed on the anchorage.org_id GUC so a Go-layer bug can't bleed rows.
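
To make the fence and refcount invariants concrete, here is a hedged sketch: the table names follow this doc, but the column names (status, refs, request_id, node_id) and exact SQL are assumptions about the real schema.

```go
// Sketch only: column names are assumptions based on the table names in this doc.
package sketch

import (
	"context"

	"github.com/jackc/pgx/v5/pgxpool"
)

// ackPlacement marks a placement pinned only while its fence is still current.
// A zombie node whose fence was bumped during rebalance matches zero rows.
func ackPlacement(ctx context.Context, db *pgxpool.Pool, requestID, nodeID string, fence int64) (bool, error) {
	tag, err := db.Exec(ctx,
		`UPDATE pin_placements
		    SET status = 'pinned'
		  WHERE request_id = $1 AND node_id = $2 AND fence = $3`,
		requestID, nodeID, fence)
	if err != nil {
		return false, err
	}
	return tag.RowsAffected() == 1, nil
}

// releaseRef decrements the per-(node, cid) refcount and reports whether the
// physical Kubo unpin should fire, i.e. the refcount reached zero.
func releaseRef(ctx context.Context, db *pgxpool.Pool, nodeID, cid string) (bool, error) {
	var remaining int
	err := db.QueryRow(ctx,
		`UPDATE pin_refcount
		    SET refs = refs - 1
		  WHERE node_id = $1 AND cid = $2
		  RETURNING refs`,
		nodeID, cid).Scan(&remaining)
	if err != nil {
		return false, err
	}
	return remaining <= 0, nil
}
```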

Clustering patterns

Adopted from the sibling kanrisha project:

  • TTL-based leader election via JetStream KV (ANCHORAGE_LEADER bucket, 5s TTL, kv.Create as CAS).
  • Fence tokens on every dispatched unit of work.
  • Publisher dedup via Nats-Msg-Id + stream Duplicates: 5m.
  • Write-ahead journaling — Postgres is the atomic commit point; NATS is signal only.
  • Graceful shutdown order — HTTP → consumers → NATS drain → pgxpool close.
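
A minimal sketch of the TTL leader-election pattern above, using the nats.go KV API; the bucket name and 5s TTL come from this doc, while the key name, renewal cadence, and error handling are assumptions.

```go
// Sketch of TTL-based leader election: kv.Create acts as the CAS claim, the
// bucket TTL expires a dead leader's key, and Update keyed on the revision
// keeps a stale node from clobbering a newer leader.
package sketch

import (
	"time"

	"github.com/nats-io/nats.go"
)

func runElection(js nats.JetStreamContext, nodeID string, onElected func()) error {
	kv, err := js.CreateKeyValue(&nats.KeyValueConfig{
		Bucket: "ANCHORAGE_LEADER",
		TTL:    5 * time.Second, // key vanishes if the leader stops renewing
	})
	if err != nil {
		return err
	}
	for {
		// Create succeeds only if the key is absent: a compare-and-set claim.
		rev, err := kv.Create("leader", []byte(nodeID))
		if err != nil {
			time.Sleep(2 * time.Second) // someone else holds the lease; retry later
			continue
		}
		onElected()
		// Renew well inside the TTL until the update races or we lose the key.
		for {
			time.Sleep(2 * time.Second)
			if rev, err = kv.Update("leader", []byte(nodeID), rev); err != nil {
				break // lost leadership; go back to campaigning
			}
		}
	}
}
```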

Maintenance mode (see docs/cluster-ops.md)

Two independent toggles:

  1. Per-node drain: nodes.status = 'drained'. The node stops pulling jobs, /v1/ready returns 503, and the rebalancer moves its placements off (with reason=drain audit rows).
  2. Cluster-wide pause: ANCHORAGE_CLUSTER.maintenance=true in NATS KV. The rebalancer and Requeue Sweeper no-op; the API keeps serving. Safety rail: cluster.maintenance.maxDuration (default 1h) warns loudly about forgotten flags (a watchdog sketch follows).
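
A hedged sketch of the watchdog behind that safety rail; the bucket name and default window come from this doc, while the key name, the 'true' value, and the check interval are assumptions.

```go
// Sketch of the maintenance watchdog: warn when the cluster-pause flag has
// been set longer than maxDuration. Key name and value are assumptions.
package sketch

import (
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func watchMaintenance(js nats.JetStreamContext, maxDuration time.Duration) error {
	kv, err := js.KeyValue("ANCHORAGE_CLUSTER")
	if err != nil {
		return err
	}
	for range time.Tick(time.Minute) {
		entry, err := kv.Get("maintenance")
		if err != nil || string(entry.Value()) != "true" {
			continue // flag absent or already cleared
		}
		if age := time.Since(entry.Created()); age > maxDuration {
			log.Printf("cluster maintenance flag set for %s (max %s): forgotten?",
				age.Round(time.Second), maxDuration)
		}
	}
	return nil
}
```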