Greenfield Go multi-tenant IPFS Pinning Service wire-compatible with the
IPFS Pinning Services API spec. Paired 1:1 with Kubo over localhost RPC,
clustered via embedded NATS JetStream, Postgres source-of-truth with
RLS-enforced tenancy, Fiber + huma v2 for the HTTP surface, Authentik
OIDC for session login with kid-rotated HS256 JWT API tokens.
Feature-complete against the 22-milestone build plan, including the
ship-it v1.0 gap items:
* admin CLIs: drain/uncordon, maintenance, mint-token, rotate-key,
prune-denylist, rebalance --dry-run, cache-stats, cluster-presences
* TTL leader election via NATS KV, fence tokens, JetStream dedup
* rebalancer (plan/apply split), reconciler, requeue sweeper
* ristretto caches with NATS-backed cross-node invalidation
(placements live-nodes + token denylist)
* maintenance watchdog for stuck cluster-pause flag
* Prometheus /metrics with CIDR ACL, HTTP/pin/scheduler/cache gauges
* rate limiting: session (10/min) + anonymous global (120/min)
* integration tests: rebalance, refcount multi-org, RLS belt
* goreleaser (tar + deb/rpm/apk + Alpine Docker) targeting Gitea
Stack: Cobra/Viper, Fiber v2 + huma v2, embedded NATS JetStream,
pgx/sqlc/golang-migrate, ristretto, TypeID, prometheus/client_golang,
testcontainers-go.
anchorage architecture
One-paragraph summary
anchorage is a horizontally-scalable IPFS Pinning Service. Each instance is paired 1:1 with its own Kubo daemon and runs an embedded NATS server that joins the cluster via gossip. Postgres is the single source of truth for pins, placements, refcounts, orgs, users, tokens, and audit log. NATS carries only signaling: per-node work queues (pin.jobs.<nodeID>), pin status fan-out (pin.events.<orgID>.<requestID>), heartbeats, cache-invalidation pubsub, and a TTL-based leader-election KV key.
Layers
| Layer | What it owns |
|---|---|
| Kubo (per node) | Physical pinset on that node's local IPFS repo. |
| Postgres | Logical state: pins, pin_placements (per-node), pin_refcount (per (node,cid)), orgs, users, memberships, tokens, denylist, nodes, audit. |
| NATS | Non-authoritative signaling. Everything in NATS is reconstructable from Postgres. |
Request lifecycle (POST /v1/pins)
- LB routes to some anchorage instance (not necessarily one that will hold a replica).
- One Postgres transaction:
  - Insert a `pins` row (`status=queued`).
  - Compute placements via rendezvous hash of `(orgID, cid, nodeID)` over live nodes.
  - Insert `pin_placements` rows with `fence=1`.
  - Increment `pin_refcount` per target `(node, cid)`.
  - Write an `audit_log` row.
- After commit, publish one `pin.jobs.<targetNodeID>` message per placement with `Nats-Msg-Id = <requestID>:<nodeID>:<fence>` for JetStream dedup.
- Target nodes pull from `pin.jobs.<myNodeID>`, call Kubo, UPDATE the placement row `WHERE fence = $n`, publish `pin.events.<orgID>.<requestID>`, then ack.
- WebSocket clients subscribed to the org + optional `requestid` see status frames.
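The placement step above is plain rendezvous (highest-random-weight) hashing: every live node gets a deterministic score for the `(orgID, cid)` pair, and the top N scores win. A minimal sketch, assuming SHA-256 scoring and a replica count of 2 — the actual scoring function and replica policy are not specified in this doc:

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"sort"
)

// score hashes (orgID, cid, nodeID) to a 64-bit weight; the nodes with
// the highest weights win the placement (rendezvous / HRW hashing).
func score(orgID, cid, nodeID string) uint64 {
	h := sha256.Sum256([]byte(orgID + "\x00" + cid + "\x00" + nodeID))
	return binary.BigEndian.Uint64(h[:8])
}

// placements returns the `replicas` live nodes with the highest scores
// for this (orgID, cid). The result is deterministic and independent of
// the input order, so every instance computes the same answer.
func placements(orgID, cid string, liveNodes []string, replicas int) []string {
	nodes := append([]string(nil), liveNodes...)
	sort.Slice(nodes, func(i, j int) bool {
		return score(orgID, cid, nodes[i]) > score(orgID, cid, nodes[j])
	})
	if replicas > len(nodes) {
		replicas = len(nodes)
	}
	return nodes[:replicas]
}

func main() {
	live := []string{"node-a", "node-b", "node-c", "node-d"}
	fmt.Println(placements("org_1", "bafyexamplecid", live, 2))
}
```

The useful property here is minimal disruption: when a node drains, only the CIDs for which it was a winner move, and they move to the next-highest scorer rather than reshuffling the whole pinset.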
Key invariants
- Postgres is the commit point. NATS publish is follow-on; a dropped publish is recovered by the Requeue Sweeper.
- Fence tokens prevent zombie writes. A zombie node's late ack UPDATE affects 0 rows because its fence was bumped during rebalance.
- Publisher-side dedup absorbs retry storms. JetStream's `Duplicates: 5m` plus `Nats-Msg-Id` means the same logical job cannot be processed twice within the window.
- Per-`(node, cid)` refcounts let two orgs pinning the same CID on the same node share one Kubo pin; unpin only fires when the refcount hits zero.
- RLS is belt-and-suspenders. Every tenant-scoped table enables Postgres row-level security keyed on the `anchorage.org_id` GUC so a Go-layer bug can't bleed rows.
Clustering patterns
Adopted from the sibling kanrisha project:
- TTL-based leader election via JetStream KV (`ANCHORAGE_LEADER` bucket, 5s TTL, `kv.Create` as CAS).
- Fence tokens on every dispatched unit of work.
- Publisher dedup via `Nats-Msg-Id` + stream `Duplicates: 5m`.
- Write-ahead journaling — Postgres is the atomic commit point; NATS is signal only.
- Graceful shutdown order — HTTP → consumers → NATS drain → pgxpool close.
Maintenance mode (see docs/cluster-ops.md)
Two independent toggles:
- Per-node drain — `nodes.status = 'drained'`. Node stops pulling jobs, `/v1/ready` returns 503, rebalancer moves its placements off (with `reason=drain` audit rows).
- Cluster-wide pause — `ANCHORAGE_CLUSTER.maintenance=true` in NATS KV. Rebalancer and Requeue Sweeper no-op; API keeps serving. Safety rail: `cluster.maintenance.maxDuration` (default 1h) warns loudly on forgotten flags.