Commit Graph

3 Commits

Author SHA1 Message Date
pingqiu
44103a1bd7 feat: Phase 20 acceptance fixes + sw-test-runner suite mode
Acceptance rows closed:
- WriteLBA/SyncCache contract: code comments document write-back vs
  durability fence semantics
- RF=2 stable identity: v2bridge always uses SetReplicaAddrs (preserves
  ServerID); blockcmd dispatcher also fixed to use setupPrimaryReplicationMulti;
  test asserts exact expected ReplicaID="vs-2" (not just non-empty)
- Tests treating WriteLBA as commit: replica_read_test rewritten with
  SyncCache as durability fence
- publish_healthy contract: 3 gate tests with hard assertions including
  gate 3 (PrimaryShipperConnected)
- SetReplicaAddr deprecation warning added
- WALShipper.ReplicaID() getter added for identity verification
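
The write-back vs durability fence contract above can be illustrated with a small in-memory model. This is only a sketch: `WriteLBA` and `SyncCache` are the names used in this commit, but `sketchVolume`, `Crash`, `ReadDurable`, and every signature below are hypothetical and not taken from the repository.

```go
package main

// sketchVolume is a hypothetical in-memory stand-in for a block volume.
type sketchVolume struct {
	cache   map[uint64][]byte // acknowledged writes that are not yet durable
	durable map[uint64][]byte // state that survives a simulated crash
}

func newSketchVolume() *sketchVolume {
	return &sketchVolume{
		cache:   map[uint64][]byte{},
		durable: map[uint64][]byte{},
	}
}

// WriteLBA acknowledges the write once it sits in the cache (write-back).
// It makes no durability promise on its own.
func (v *sketchVolume) WriteLBA(lba uint64, data []byte) {
	v.cache[lba] = append([]byte(nil), data...) // copy so the caller may reuse data
}

// SyncCache is the durability fence: every write acknowledged before it
// returns is moved into durable state.
func (v *sketchVolume) SyncCache() {
	for lba, data := range v.cache {
		v.durable[lba] = data
		delete(v.cache, lba)
	}
}

// Crash simulates power loss by dropping everything that was never synced.
func (v *sketchVolume) Crash() {
	v.cache = map[uint64][]byte{}
}

// ReadDurable reports what would be readable after a restart.
func (v *sketchVolume) ReadDurable(lba uint64) ([]byte, bool) {
	data, ok := v.durable[lba]
	return data, ok
}
```

Under this model, a test that calls `WriteLBA` and then asserts on post-crash contents is only valid if a `SyncCache` call sits between the two, which is the pattern the rewritten replica_read_test is described as following.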

Test runner enhancements:
- sw-test-runner suite command: build → deploy → run N scenarios in one
  invocation with --skip-deploy support
- Suite YAML definitions for T6 Stage 0 and Stage 1
- deploy action: kill stale processes, clean dirs, cross-compile, upload
- run-phase20-t6.ps1 PowerShell script (superseded by the suite command)

Engine/runtime fixes:
- Recovery executor: nil-safety improvements
- BuildRecoveryBundle: defensive checks added
- ShipperGroup: MinReplicaFlushedLSNAll exposed
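
As a rough illustration of what a "minimum flushed LSN across all replicas" query computes, here is a minimal sketch. The function name mirrors `MinReplicaFlushedLSNAll`, but the map-based input, the `(value, ok)` return shape, and the handling of an empty group are assumptions; the real method's signature is not shown in this log.

```go
package main

// minReplicaFlushedLSNAll returns the smallest flushed LSN reported by any
// replica. ok is false when no replica has reported, so callers can tell
// "no data" apart from a genuine LSN of zero.
// Hypothetical shape: keys are replica IDs, values are flushed LSNs.
func minReplicaFlushedLSNAll(flushed map[string]uint64) (lowest uint64, ok bool) {
	for _, lsn := range flushed {
		if !ok || lsn < lowest {
			lowest, ok = lsn, true
		}
	}
	return lowest, ok
}
```

For example, with replicas reporting 120 and 95, the result is 95: the primary can only treat data up to that LSN as flushed everywhere.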

Docs: acceptance checklist refined, test matrix updated, T6 runbook,
engine maintainer tutorial, design README updated.

26 files changed, ~1600 insertions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 11:30:54 -07:00
pingqiu
16ba70f856 refactor: make bounded recovery observation events replica-scoped
Carry replica-scoped addressing through bounded recovery planning and completion events so the core no longer depends on a volume-only observation seam. This preserves the current single-replica catch-up and rebuilding behavior while aligning the observation side with the replica-scoped command path.

Made-with: Cursor
2026-04-04 09:18:07 -07:00
pingqiu
e200df7791 feat: Task I — recovery execution helpers extracted to sw-block runtime
New reusable execution helpers in sw-block/engine/replication/runtime:
- ExecuteCatchUpPlan: drives catch-up execution, notifies host via callback
- ExecuteRebuildPlan: drives rebuild execution, notifies host via callback
- RecoveryCallbacks interface: host-side OnCatchUpCompleted/OnRebuildCompleted

The host (weed/server/block_recovery.go) supplies concrete IO bindings and
receives completion notifications. The reusable execution logic no longer
requires weed/server ownership.

4 tests prove boundary behavior:
- catch-up callback receives achievedLSN matching plan target
- catch-up with plan-derived target works correctly
- rebuild callback receives plan reference
- nil callbacks don't panic
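
The helper/callback boundary described above might look roughly like the following. `ExecuteCatchUpPlan`, `ExecuteRebuildPlan`, `RecoveryCallbacks`, `OnCatchUpCompleted`, and `OnRebuildCompleted` are names from this commit message; the plan structs, the `CatchUpIO`/`RebuildIO` function types, `recordingHost`, and all signatures are hypothetical and only illustrate the shape.

```go
package main

import "errors"

// CatchUpPlan and RebuildPlan are hypothetical plan shapes.
type CatchUpPlan struct {
	ReplicaID string
	TargetLSN uint64
}

type RebuildPlan struct {
	ReplicaID string
}

// RecoveryCallbacks is implemented by the host. Method names come from the
// commit message; the parameter lists are assumptions.
type RecoveryCallbacks interface {
	OnCatchUpCompleted(replicaID string, achievedLSN uint64)
	OnRebuildCompleted(plan *RebuildPlan)
}

// CatchUpIO and RebuildIO stand in for the concrete IO bindings the host supplies.
type CatchUpIO func(replicaID string, targetLSN uint64) (achievedLSN uint64, err error)
type RebuildIO func(replicaID string) error

// ExecuteCatchUpPlan drives catch-up and notifies the host on success.
// A nil callbacks value is tolerated. Note that a typed nil pointer stored
// in the interface would not be caught by this check.
func ExecuteCatchUpPlan(plan *CatchUpPlan, ship CatchUpIO, cb RecoveryCallbacks) error {
	if plan == nil || ship == nil {
		return errors.New("catch-up: plan and IO binding are required")
	}
	achieved, err := ship(plan.ReplicaID, plan.TargetLSN)
	if err != nil {
		return err
	}
	if cb != nil {
		cb.OnCatchUpCompleted(plan.ReplicaID, achieved)
	}
	return nil
}

// ExecuteRebuildPlan drives a rebuild and hands the plan back to the host.
func ExecuteRebuildPlan(plan *RebuildPlan, rebuild RebuildIO, cb RecoveryCallbacks) error {
	if plan == nil || rebuild == nil {
		return errors.New("rebuild: plan and IO binding are required")
	}
	if err := rebuild(plan.ReplicaID); err != nil {
		return err
	}
	if cb != nil {
		cb.OnRebuildCompleted(plan)
	}
	return nil
}

// recordingHost is an example host-side implementation that records what it saw.
type recordingHost struct {
	achievedLSN uint64
	rebuilt     *RebuildPlan
}

func (h *recordingHost) OnCatchUpCompleted(_ string, achievedLSN uint64) { h.achievedLSN = achievedLSN }
func (h *recordingHost) OnRebuildCompleted(plan *RebuildPlan)            { h.rebuilt = plan }
```

Passing the IO bindings in as plain function values keeps the execution logic free of any dependency on the host package, which is the ownership split this commit describes.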

weed/server rebinding to use these helpers deferred to Task J
(legacy isolation).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 01:03:37 -07:00