Files
at-container-registry/docs/REPOMGR_MIGRATION.md

7.7 KiB
Raw Permalink Blame History

Incremental Migration: Vendored repomgr → Direct indigo/repo

Context

The hold PDS uses a vendored copy of indigo's repomgr (~1450 lines in pkg/hold/pds/repomgr.go). Upstream, repomgr is soft-deprecated — bnewbold recommends using indigo/repo directly (as cocoon does). The vendored copy already has custom patches (PutRecord, UpsertRecord, prevData) and will continue to drift. This migration defines a clean interface, then swaps the implementation behind it.

Phase 0: Save plan to docs, remove dead code, define interface

Goal: Persist this migration plan as a reference doc. Shrink repomgr.go from ~1450 lines to ~750 by removing dead code. Fix import.go encapsulation. Define RepoOperator interface so all code accesses repomgr through it. No behavior change.

Completed 2026-02-28. repomgr.go: 1453 → 871 lines (-40%). import.go: 189 → 88 lines (-53%). New repo_operator.go: 82 lines.

Step 1: Save plan to docs

  • Write this plan to docs/REPOMGR_MIGRATION.md

Step 2: Dead code deleted from repomgr.go

  • HandleExternalUserEvent() + handleExternalUserEventNoArchive() + handleExternalUserEventArchive()
  • ImportNewRepo() + processNewRepo() + walkTree() + processOp() + stringOrNil()
  • CheckRepoSig()
  • TakeDownRepo(), ResetRepo(), VerifyRepo()
  • GetProfile()
  • CarStore()
  • NextTID() / nextTID() (removed entirely — BatchWrite uses rm.clk.Next() directly)
  • noArchive field removed from struct
  • 12 unused imports cleaned up

Step 3: Fixed import.go encapsulation

Added BulkUpsert() method to RepoManager. Rewrote ImportFromCAR to call p.repomgr.BulkUpsert() instead of reaching into p.repomgr.lockUser, p.repomgr.cs, p.repomgr.kmgr, p.repomgr.events. Removed the 88-line bulkImportRecords() private method.

Step 4: Defined RepoOperator interface

New file: pkg/hold/pds/repo_operator.go — interface with 16 methods, compile-time check, shared types (RepoEvent, RepoOp, EventKind, BulkRecord).

Step 5: Updated callers to use interface

  • pkg/hold/pds/server.gorepomgr *RepoManagerrepomgr RepoOperator, RepomgrRef() returns RepoOperator
  • Downstream callers (hold/server.go, admin/handlers_relays.go, tests) unchanged — they go through RepomgrRef() which returns the interface
  • var _ RepoOperator = (*RepoManager)(nil) compile-time check in repo_operator.go

Verification:

  • make lint — 0 issues (also fixed pre-existing unchecked error in events.go)
  • make test — all tests pass

Phase 1: Test hardening against the interface

Goal: Write tests against RepoOperator that verify current behavior while RepoManager is the only implementation. These become the regression safety net when swapping to the new implementation in Phase 3.

Completed 2026-02-28. New repo_operator_test.go: 38 subtests covering all CRUD, read, event emission, error paths, and edge cases. runRepoOperatorTests(t, setup) pattern ready for Phase 2's DirectRepoOperator.

Gaps covered

  • CreateRecord — round-trip, TID 13-char rkey format, no-panic without event handler
  • UpdateRecord — CID changes, new data returned, non-existent record error, hydrated events
  • PutRecord — explicit rkey, duplicate rkey error, hydrated events
  • UpsertRecord — create path (created=true), update path (created=false, CID changes)
  • DeleteRecord — delete + verify gone, non-existent rkey error
  • BatchWrite — create+delete batch, update write type, auto-rkey (nil Rkey), delete-not-found error, empty write elem error, multi-op event emission with hydration, update hydration
  • BulkUpsert — create + re-upsert with changed data, multi-op event emission
  • GetRecord — CID match, CID mismatch error, not-found error
  • GetRecordProof — head CID + proof blocks, not-found error, no-repo error
  • GetRepoRoot — defined CID after init, changes after write
  • GetRepoRev — non-empty, changes after write
  • ReadRepo — non-empty CAR output, incremental export with since
  • InitNewActor — empty DID error, zero user error, event emission with hydration
  • Event emission — create/update/delete events verified: prevData, ops, oldRoot, newRoot, rev, since, repoSlice, hydration

Coverage

All RepoOperator methods 81100%. Remaining uncovered lines are internal infrastructure error guards (GetUserRepoRev, NewDeltaSession, OpenRepo, Commit, CloseWithRoot) — not reachable without mocking the carstore.

Files modified

  • pkg/hold/pds/repo_operator_test.go — new file, 38 subtests

Verification

  • make lint — 0 issues
  • make test — all tests pass

Phase 2: Build new implementation

Goal: Create DirectRepoOperator using indigo/repo directly (cocoon pattern).

Completed 2026-02-28. New pkg/hold/pds/repo.go: 548 lines (vs 852 in repomgr.go, ~36% reduction). All 37 subtests pass identically for both implementations. Race detector, shuffled order, and parallel execution all clean.

New file: pkg/hold/pds/repo.go

Key differences from vendored repomgr:

  • Single sync.Mutex instead of per-user lock map (lklk + userLocks + userLock struct + reference counting)
  • No OpenTelemetry tracing (otel.Tracer calls removed)
  • No gorm dependency (RepoHead struct removed, gorm.io/gorm dropped from go.mod)
  • openWriteSession / commitWrite helpers extract the repeated 6-step write pattern

Core mutation pattern (same as current, just cleaner):

  1. Lock → get rev → open delta session → open repo
  2. Capture r.DataCid() for prevData
  3. Perform operation(s)
  4. r.Commit()ds.CloseWithRoot() → emit event → unlock

Shared types moved to repo_operator.go

  • KeyManager interface and ActorInfo struct moved from repomgr.go
  • Both implementations import from the same location

Test wiring

  • setupTestDirectRepoOperator — creates carstore + key manager directly (no NewHoldPDS)
  • runRepoOperatorTests refactored to accept optional freshSetup for InitNewActor_EventEmission
  • TestDirectRepoOperator runs all 37 subtests identically

Verification

  • go build ./cmd/hold — compiles
  • TestRepoManager — 37/37 subtests pass
  • TestDirectRepoOperator — 37/37 subtests pass
  • -race -shuffle=on -count=5 -parallel=8 — all clean
  • make lint — 0 issues
  • make test — all tests pass

Phase 3: Config flag + switchover

Goal: Feature flag to select implementation, default old.

Changes:

  • pkg/hold/config.go — add UseDirectRepo bool to DatabaseConfig
  • pkg/hold/pds/server.go — select implementation based on config in NewHoldPDS/NewHoldPDSWithDB
  • Regenerate example configs

Verification:

  • Deploy with use_direct_repo: false (default)
  • Test with use_direct_repo: true in staging
  • make lint && make test

Phase 4: Remove vendored repomgr

Goal: After production validation, delete the old code.

  • Delete repomgr.go
  • Remove config flag, make DirectRepoOperator the only implementation
  • Rename repo_direct.gorepo_operator_impl.go
  • Regenerate example configs
  • make lint && make test

Decision log

  • indigo/repo over atproto/repo: atproto/repo has the MST primitives (Insert, Remove, ApplyOp) but no high-level PDS API (OpenRepo, CreateRecord, Commit(signFn)). Its own doc.go says "does not yet work for implementing a repository host (PDS)." indigo/repo is what the reference PDS and cocoon use. The RepoOperator interface means we can swap later if atproto/repo adds PDS support.