Files
at-container-registry/docs/REPOMGR_MIGRATION.md

149 lines
7.7 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Incremental Migration: Vendored repomgr → Direct `indigo/repo`
## Context
The hold PDS uses a vendored copy of indigo's `repomgr` (~1450 lines in `pkg/hold/pds/repomgr.go`). Upstream, repomgr is [soft-deprecated](https://github.com/bluesky-social/indigo/pull/1102#issuecomment-2985956040) — bnewbold recommends using `indigo/repo` directly (as [cocoon](https://github.com/haileyok/cocoon) does). The vendored copy already has custom patches (PutRecord, UpsertRecord, prevData) and will continue to drift. This migration defines a clean interface, then swaps the implementation behind it.
## Phase 0: Save plan to docs, remove dead code, define interface ✅
**Goal:** Persist this migration plan as a reference doc. Shrink repomgr.go from ~1450 lines to ~750 by removing dead code. Fix import.go encapsulation. Define `RepoOperator` interface so all code accesses repomgr through it. No behavior change.
**Completed 2026-02-28.** repomgr.go: 1453 → 871 lines (-40%). import.go: 189 → 88 lines (-53%). New repo_operator.go: 82 lines.
### Step 1: Save plan to docs ✅
- Write this plan to `docs/REPOMGR_MIGRATION.md`
### Step 2: Dead code deleted from `repomgr.go` ✅
- `HandleExternalUserEvent()` + `handleExternalUserEventNoArchive()` + `handleExternalUserEventArchive()`
- `ImportNewRepo()` + `processNewRepo()` + `walkTree()` + `processOp()` + `stringOrNil()`
- `CheckRepoSig()`
- `TakeDownRepo()`, `ResetRepo()`, `VerifyRepo()`
- `GetProfile()`
- `CarStore()`
- `NextTID()` / `nextTID()` (removed entirely — `BatchWrite` uses `rm.clk.Next()` directly)
- `noArchive` field removed from struct
- 12 unused imports cleaned up
### Step 3: Fixed `import.go` encapsulation ✅
Added `BulkUpsert()` method to `RepoManager`. Rewrote `ImportFromCAR` to call `p.repomgr.BulkUpsert()` instead of reaching into `p.repomgr.lockUser`, `p.repomgr.cs`, `p.repomgr.kmgr`, `p.repomgr.events`. Removed the 88-line `bulkImportRecords()` private method.
### Step 4: Defined `RepoOperator` interface ✅
New file: `pkg/hold/pds/repo_operator.go` — interface with 16 methods, compile-time check, shared types (`RepoEvent`, `RepoOp`, `EventKind`, `BulkRecord`).
### Step 5: Updated callers to use interface ✅
- `pkg/hold/pds/server.go``repomgr *RepoManager``repomgr RepoOperator`, `RepomgrRef()` returns `RepoOperator`
- Downstream callers (`hold/server.go`, `admin/handlers_relays.go`, tests) unchanged — they go through `RepomgrRef()` which returns the interface
- `var _ RepoOperator = (*RepoManager)(nil)` compile-time check in `repo_operator.go`
### Verification: ✅
- `make lint` — 0 issues (also fixed pre-existing unchecked error in `events.go`)
- `make test` — all tests pass
---
## Phase 1: Test hardening against the interface ✅
**Goal:** Write tests against `RepoOperator` that verify current behavior while `RepoManager` is the only implementation. These become the regression safety net when swapping to the new implementation in Phase 3.
**Completed 2026-02-28.** New `repo_operator_test.go`: 38 subtests covering all CRUD, read, event emission, error paths, and edge cases. `runRepoOperatorTests(t, setup)` pattern ready for Phase 2's `DirectRepoOperator`.
### Gaps covered ✅
- `CreateRecord` — round-trip, TID 13-char rkey format, no-panic without event handler
- `UpdateRecord` — CID changes, new data returned, non-existent record error, hydrated events
- `PutRecord` — explicit rkey, duplicate rkey error, hydrated events
- `UpsertRecord` — create path (created=true), update path (created=false, CID changes)
- `DeleteRecord` — delete + verify gone, non-existent rkey error
- `BatchWrite` — create+delete batch, update write type, auto-rkey (nil Rkey), delete-not-found error, empty write elem error, multi-op event emission with hydration, update hydration
- `BulkUpsert` — create + re-upsert with changed data, multi-op event emission
- `GetRecord` — CID match, CID mismatch error, not-found error
- `GetRecordProof` — head CID + proof blocks, not-found error, no-repo error
- `GetRepoRoot` — defined CID after init, changes after write
- `GetRepoRev` — non-empty, changes after write
- `ReadRepo` — non-empty CAR output, incremental export with `since`
- `InitNewActor` — empty DID error, zero user error, event emission with hydration
- Event emission — create/update/delete events verified: prevData, ops, oldRoot, newRoot, rev, since, repoSlice, hydration
### Coverage ✅
All RepoOperator methods 81100%. Remaining uncovered lines are internal infrastructure error guards (`GetUserRepoRev`, `NewDeltaSession`, `OpenRepo`, `Commit`, `CloseWithRoot`) — not reachable without mocking the carstore.
### Files modified ✅
- `pkg/hold/pds/repo_operator_test.go` — new file, 38 subtests
### Verification ✅
- `make lint` — 0 issues
- `make test` — all tests pass
---
## Phase 2: Build new implementation ✅
**Goal:** Create `DirectRepoOperator` using `indigo/repo` directly (cocoon pattern).
**Completed 2026-02-28.** New `pkg/hold/pds/repo.go`: 548 lines (vs 852 in repomgr.go, ~36% reduction). All 37 subtests pass identically for both implementations. Race detector, shuffled order, and parallel execution all clean.
### New file: `pkg/hold/pds/repo.go` ✅
Key differences from vendored repomgr:
- **Single `sync.Mutex`** instead of per-user lock map (`lklk` + `userLocks` + `userLock` struct + reference counting)
- **No OpenTelemetry tracing** (`otel.Tracer` calls removed)
- **No `gorm` dependency** (`RepoHead` struct removed, `gorm.io/gorm` dropped from go.mod)
- **`openWriteSession` / `commitWrite` helpers** extract the repeated 6-step write pattern
Core mutation pattern (same as current, just cleaner):
1. Lock → get rev → open delta session → open repo
2. Capture `r.DataCid()` for prevData
3. Perform operation(s)
4. `r.Commit()``ds.CloseWithRoot()` → emit event → unlock
### Shared types moved to `repo_operator.go` ✅
- `KeyManager` interface and `ActorInfo` struct moved from `repomgr.go`
- Both implementations import from the same location
### Test wiring ✅
- `setupTestDirectRepoOperator` — creates carstore + key manager directly (no `NewHoldPDS`)
- `runRepoOperatorTests` refactored to accept optional `freshSetup` for `InitNewActor_EventEmission`
- `TestDirectRepoOperator` runs all 37 subtests identically
### Verification ✅
- `go build ./cmd/hold` — compiles
- `TestRepoManager` — 37/37 subtests pass
- `TestDirectRepoOperator` — 37/37 subtests pass
- `-race -shuffle=on -count=5 -parallel=8` — all clean
- `make lint` — 0 issues
- `make test` — all tests pass
---
## Phase 3: Config flag + switchover
**Goal:** Feature flag to select implementation, default old.
### Changes:
- `pkg/hold/config.go` — add `UseDirectRepo bool` to DatabaseConfig
- `pkg/hold/pds/server.go` — select implementation based on config in `NewHoldPDS`/`NewHoldPDSWithDB`
- Regenerate example configs
### Verification:
- Deploy with `use_direct_repo: false` (default)
- Test with `use_direct_repo: true` in staging
- `make lint && make test`
---
## Phase 4: Remove vendored repomgr
**Goal:** After production validation, delete the old code.
- Delete `repomgr.go`
- Remove config flag, make `DirectRepoOperator` the only implementation
- Rename `repo_direct.go``repo_operator_impl.go`
- Regenerate example configs
- `make lint && make test`
---
## Decision log
- **`indigo/repo` over `atproto/repo`**: `atproto/repo` has the MST primitives (`Insert`, `Remove`, `ApplyOp`) but no high-level PDS API (`OpenRepo`, `CreateRecord`, `Commit(signFn)`). Its own doc.go says "does not yet work for implementing a repository host (PDS)." `indigo/repo` is what the reference PDS and cocoon use. The `RepoOperator` interface means we can swap later if `atproto/repo` adds PDS support.