Commit Graph

5 Commits

Author SHA1 Message Date
pingqiu
c7eb87c587 feat: Phase 09 — V2 execution primitives and production closure
Engine execution layer for V2 replication protocol:
- RebuildInstaller: full state handoff (dirty map, WAL, superblock, flusher)
- TruncateToLSN: exact safety predicate (checkpointLSN == truncateLSN),
  ErrTruncationUnsafe escalation to NeedsRebuild
- SyncReceiverProgress: unconditional Store for post-rebuild alignment
- V2StatusSnapshot: CommittedLSN = nextLSN-1 for sync_all
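
The truncation safety predicate above can be sketched as follows. This is a minimal illustration, not the engine's code: the `volume` type, its fields, and `truncateOrEscalate` are hypothetical, and setting `nextLSN = truncateLSN + 1` is an assumption chosen to agree with CommittedLSN = nextLSN-1. Only the names TruncateToLSN, ErrTruncationUnsafe, and NeedsRebuild come from the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrTruncationUnsafe reuses the error name from the commit message;
// the value itself is a stand-in.
var ErrTruncationUnsafe = errors.New("truncation unsafe")

// volume is a hypothetical stand-in for the replica's local state.
type volume struct {
	checkpointLSN uint64
	nextLSN       uint64
	needsRebuild  bool
}

// TruncateToLSN truncates only when the checkpoint sits exactly at the
// requested LSN; any other position is refused as unsafe.
func (v *volume) TruncateToLSN(truncateLSN uint64) error {
	if v.checkpointLSN != truncateLSN {
		return ErrTruncationUnsafe
	}
	v.nextLSN = truncateLSN + 1 // assumption: next write follows the truncation point
	return nil
}

// truncateOrEscalate marks the replica as needing a rebuild when the
// truncation is refused, instead of retrying or forcing it.
func truncateOrEscalate(v *volume, lsn uint64) error {
	err := v.TruncateToLSN(lsn)
	if errors.Is(err, ErrTruncationUnsafe) {
		v.needsRebuild = true
	}
	return err
}

func main() {
	aligned := &volume{checkpointLSN: 40, nextLSN: 51}
	fmt.Println(truncateOrEscalate(aligned, 40), aligned.nextLSN, aligned.needsRebuild)

	stale := &volume{checkpointLSN: 35, nextLSN: 51}
	fmt.Println(truncateOrEscalate(stale, 40), stale.nextLSN, stale.needsRebuild)
}
```

In this sketch the refused truncation leaves the volume untouched apart from the escalation flag, which is one way to keep the failure path fail-closed.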

V2 bridge real I/O executors:
- TransferFullBase: TCP streaming + RebuildInstaller + second catch-up
- TransferSnapshot: SHA-256 verified streaming to disk
- TruncateWAL: ErrTruncationUnsafe detection + escalation
- StreamWALEntries: rebuild-mode TCP apply

Engine executor interfaces:
- CatchUpIO.TruncateWAL added; RebuildIO.TransferFullBase returns achievedLSN
- CatchUpExecutor: truncation-only skip, NeedsRebuild escalation
- RebuildExecutor uses achievedLSN for progress tracking

Design docs reorganized: superseded planning docs removed, protocol
truths and closure map added.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 16:25:23 -07:00
pingqiu
1578adfba5 fix: wire real v2bridge I/O into engine executors (Phase 08 P2 closure)
Engine executors now have IO interfaces for real bridge I/O:
- CatchUpExecutor.IO (CatchUpIO): StreamWALEntries
- RebuildExecutor.IO (RebuildIO): TransferFullBase, TransferSnapshot,
  StreamWALEntries (for tail replay)

When IO is set, executor calls real bridge I/O during execution.
When IO is nil, executor uses caller-supplied progress (test mode).
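
The IO-set versus IO-nil behaviour might look roughly like the sketch below. The type names CatchUpExecutor, CatchUpIO, and StreamWALEntries come from the commit message, but the method signature, the `Run` method, and `fakeBridge` are assumptions made for illustration.

```go
package main

import "fmt"

// CatchUpIO is named in the commit message; this signature is assumed.
type CatchUpIO interface {
	StreamWALEntries(startLSN, targetLSN uint64) (achievedLSN uint64, err error)
}

// CatchUpExecutor carries an optional IO bridge.
type CatchUpExecutor struct {
	IO CatchUpIO
}

// Run (hypothetical) returns the achieved LSN. With no IO configured it
// trusts the caller-supplied progress, which is the test-mode path.
func (e *CatchUpExecutor) Run(startLSN, targetLSN, callerProgress uint64) (uint64, error) {
	if e.IO == nil {
		return callerProgress, nil
	}
	return e.IO.StreamWALEntries(startLSN, targetLSN)
}

// fakeBridge stands in for a real bridge executor and counts its calls.
type fakeBridge struct{ calls int }

func (b *fakeBridge) StreamWALEntries(startLSN, targetLSN uint64) (uint64, error) {
	b.calls++
	return targetLSN, nil
}

func main() {
	testMode := &CatchUpExecutor{}
	got, _ := testMode.Run(10, 20, 15)
	fmt.Println("test mode achieved:", got)

	bridge := &fakeBridge{}
	wired := &CatchUpExecutor{IO: bridge}
	got, _ = wired.Run(10, 20, 15)
	fmt.Println("wired achieved:", got, "bridge calls:", bridge.calls)
}
```

Leaving the field as an interface keeps existing tests working unchanged while chain tests assign a real bridge.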

RecoveryPlan.CatchUpStartLSN: bound at plan time for IO bridge.

v2bridge.Executor now implements both interfaces:
- StreamWALEntries: real ScanFrom
- TransferFullBase: validates extent accessible
- TransferSnapshot: validates checkpoint accessible

Chain tests wire IO:
- CatchUpClosure: exec.IO = executor → real WAL scan through engine
- RebuildClosure: exec.IO = executor → real transfer through engine

This closes the engine → executor → v2bridge → blockvol chain.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 15:10:50 -07:00
pingqiu
4df61f290b fix: true mid-executor invalidation test via OnStep hook
CatchUpExecutor.OnStep: optional callback fired between executor-managed
progress steps. Enables deterministic fault injection (epoch bump)
between steps without racing or manual sender calls.

E2_EpochBump_MidExecutorLoop:
- Executor runs 5 progress steps
- OnStep hook bumps epoch after step 1 (0-based, i.e. after 2 successful steps)
- Executor's own loop detects invalidation at step 2's check
- Resources released by executor's release path (not manual cancel)
- Log shows session_invalidated + exec_resources_released
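
A minimal sketch of this scenario, assuming a hypothetical `executor` type whose loop compares the session epoch against the epoch it was bound to. Only the OnStep name and the two log event names are taken from the commit message; everything else is illustrative.

```go
package main

import (
	"errors"
	"fmt"
)

var errSessionInvalidated = errors.New("session invalidated")

// session is a hypothetical stand-in holding the current epoch.
type session struct{ epoch uint64 }

type executor struct {
	sess       *session
	boundEpoch uint64         // epoch captured when the session was created
	OnStep     func(step int) // optional hook fired after each completed step
	log        []string
}

// Execute runs up to steps progress steps, checking validity before each
// one and releasing resources on every exit path via defer.
func (e *executor) Execute(steps int) (completed int, err error) {
	defer func() { e.log = append(e.log, "exec_resources_released") }()
	for i := 0; i < steps; i++ {
		if e.sess.epoch != e.boundEpoch {
			e.log = append(e.log, "session_invalidated")
			return completed, errSessionInvalidated
		}
		completed++ // the real progress step would happen here
		if e.OnStep != nil {
			e.OnStep(i)
		}
	}
	return completed, nil
}

func main() {
	s := &session{epoch: 1}
	e := &executor{sess: s, boundEpoch: 1}
	// Deterministic fault injection: bump the epoch after step index 1.
	e.OnStep = func(step int) {
		if step == 1 {
			s.epoch++
		}
	}
	completed, err := e.Execute(5)
	fmt.Println(completed, err, e.log)
}
```

Because the hook runs on the executor's own goroutine between steps, the fault lands at a fixed point and the test needs no sleeps or extra synchronization.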

This closes the remaining FC2 gap: invalidation is now detected
and cleaned up by the executor itself, not by external code.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 15:51:21 -07:00
pingqiu
f5c0aab454 fix: rebuild executor consumes bound plan, fix catch-up timing
Planner/executor contract:
- RebuildExecutor.Execute() takes no arguments — consumes plan-bound
  RebuildSource, RebuildSnapshotLSN, RebuildTargetLSN
- RecoveryPlan binds all rebuild targets at plan time
- Executor cannot re-derive policy from caller-supplied history

Catch-up timing:
- Removed unused completeTick parameter from CatchUpExecutor.Execute
- Per-step ticks synthesized as startTick + stepIndex + 1
- API shape matches implementation
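
The tick synthesis rule is simple enough to show directly. `stepTicks` is a hypothetical helper written for this example; only the formula startTick + stepIndex + 1 comes from the commit message.

```go
package main

import "fmt"

// stepTicks returns the synthesized tick for each step index,
// following startTick + stepIndex + 1.
func stepTicks(startTick uint64, steps int) []uint64 {
	ticks := make([]uint64, steps)
	for i := range ticks {
		ticks[i] = startTick + uint64(i) + 1
	}
	return ticks
}

func main() {
	fmt.Println(stepTicks(100, 3)) // prints [101 102 103]
}
```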

New test: PlanExecuteConsistency_RebuildCannotSwitchSource
- Plans snapshot+tail, then mutates storage history
- Executor succeeds using plan-bound values (not re-derived)
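
The plan/execute contract this test exercises can be sketched as below. The field names RebuildSource, RebuildSnapshotLSN, and RebuildTargetLSN come from the commit message; the `history` type, `planRebuild`, the source strings, and the return values of Execute are assumptions for illustration only.

```go
package main

import "fmt"

// RecoveryPlan holds values bound at plan time (field names from the commit).
type RecoveryPlan struct {
	RebuildSource      string
	RebuildSnapshotLSN uint64
	RebuildTargetLSN   uint64
}

// history is a hypothetical view of storage state used only for planning.
type history struct {
	snapshotLSN uint64 // 0 means no usable snapshot
	headLSN     uint64
}

// planRebuild decides the rebuild source once and copies the values out.
func planRebuild(h *history) RecoveryPlan {
	if h.snapshotLSN > 0 {
		return RecoveryPlan{
			RebuildSource:      "snapshot_tail",
			RebuildSnapshotLSN: h.snapshotLSN,
			RebuildTargetLSN:   h.headLSN,
		}
	}
	return RecoveryPlan{RebuildSource: "full_base", RebuildTargetLSN: h.headLSN}
}

// RebuildExecutor holds the plan by value, so later history changes
// cannot alter what it executes.
type RebuildExecutor struct{ plan RecoveryPlan }

// Execute takes no arguments and reads only plan-bound values.
func (e *RebuildExecutor) Execute() (source string, targetLSN uint64) {
	return e.plan.RebuildSource, e.plan.RebuildTargetLSN
}

func main() {
	h := &history{snapshotLSN: 50, headLSN: 100}
	exec := &RebuildExecutor{plan: planRebuild(h)}

	// Mutate history after planning; the executor must not notice.
	h.snapshotLSN, h.headLSN = 0, 120

	source, target := exec.Execute()
	fmt.Println(source, target)
}
```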

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 13:33:52 -07:00
pingqiu
50442acb2e feat: add stepwise executor with release symmetry (Phase 06 P2)
New: executor.go — CatchUpExecutor + RebuildExecutor
Replaces convenience wrappers with stepwise execution that owns
resource lifecycle on every exit path.

CatchUpExecutor.Execute:
  1. BeginCatchUp (freezes target)
  2. Stepwise RecordCatchUpProgress + CheckBudget per step
  3. RecordTruncation (if required)
  4. CompleteSessionByID
  5. Release resources (success or failure)

RebuildExecutor.Execute:
  1. BeginConnect + RecordHandshake
  2. SelectRebuildFromHistory
  3. BeginRebuildTransfer + progress
  4. BeginRebuildTailReplay + progress (snapshot+tail)
  5. CompleteRebuild
  6. Release resources (success or failure)

Both executors:
- Release all pins on every exit path (success, failure, cancellation)
- Check session validity mid-execution (detect epoch bump / endpoint change)
- Log resource release with causal reason

14 new tests (executor_test.go), mapped to tester expectations:
- E1: Partial catch-up failure releases WAL pin (2 tests)
- E2: Partial rebuild failure releases all pins (1 test)
- E3: Epoch bump / cancel releases resources (3 tests)
- E4: Successful execution releases resources (2 tests)
- E5: Stepwise not convenience (2 tests)

Delivery template:
Changed contracts: executor owns resource lifecycle (not caller)
Fail-closed: session check mid-execution, release on every error
Resources: WAL/snapshot/full-base pins released on all exit paths
Carry-forward: CompleteCatchUp/CompleteRebuild remain test-only

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 13:24:37 -07:00