Greenfield Go multi-tenant IPFS Pinning Service wire-compatible with the
IPFS Pinning Services API spec. Paired 1:1 with Kubo over localhost RPC,
clustered via embedded NATS JetStream, Postgres source-of-truth with
RLS-enforced tenancy, Fiber + huma v2 for the HTTP surface, Authentik
OIDC for session login with kid-rotated HS256 JWT API tokens.
Feature-complete against the 22-milestone build plan, including the
ship-it v1.0 gap items:
* admin CLIs: drain/uncordon, maintenance, mint-token, rotate-key,
prune-denylist, rebalance --dry-run, cache-stats, cluster-presences
* TTL leader election via NATS KV, fence tokens, JetStream dedup
* rebalancer (plan/apply split), reconciler, requeue sweeper
* ristretto caches with NATS-backed cross-node invalidation
(placements live-nodes + token denylist)
* maintenance watchdog for stuck cluster-pause flag
* Prometheus /metrics with CIDR ACL, HTTP/pin/scheduler/cache gauges
* rate limiting: session (10/min) + anonymous global (120/min)
* integration tests: rebalance, refcount multi-org, RLS belt
* goreleaser (tar + deb/rpm/apk + Alpine Docker) targeting Gitea
Stack: Cobra/Viper, Fiber v2 + huma v2, embedded NATS JetStream,
pgx/sqlc/golang-migrate, ristretto, TypeID, prometheus/client_golang,
testcontainers-go.
## Cluster operations

### Drain a single node (hardware replacement, etc.)

```sh
anchorage admin drain nod_01h7rfxv...

# verify
curl -sf http://node-under-drain:8080/v1/ready   # now 503
```
The drained node:

- Returns 503 from `/v1/ready` so the LB stops routing new traffic.
- Cancels its `pin.jobs.<nodeID>` consumer after `cluster.drainGracePeriod` (default 2m) once in-flight jobs finish.
- Keeps publishing `node.heartbeat.<nodeID>` so peers know it is intentionally out of rotation.
- Gets its placements moved onto healthy nodes by the rebalancer, with `reason=drain` in the audit log.
When the work is done, restore the node:

```sh
anchorage admin uncordon nod_01h7rfxv...
```

Pins that were already migrated off stay put; the uncordoned node re-acquires load gradually as new pins land.
### Cluster-wide maintenance (rolling upgrade)

Before bouncing each anchorage node in turn:

```sh
anchorage admin maintenance on --reason "upgrade to v0.3.1" --ttl 30m
```

While this flag is set in the `ANCHORAGE_CLUSTER` NATS KV bucket:
- The rebalancer no-ops, so nodes briefly going down during restarts don't trigger replacement placements.
- The requeue sweeper no-ops, for the same reason.
- The API continues serving on live nodes; new pins are accepted and placed on the live set.
- `/v1/ready` keeps returning 200 on live nodes.
After the fleet has finished bouncing:

```sh
anchorage admin maintenance off
```

As a safety rail, `cluster.maintenance.maxDuration` (default 1h) ensures a forgotten flag produces loud warning logs, so incidents aren't silently masked.
### Inspecting state

```sh
anchorage admin maintenance status
anchorage migrate status
```