feat: Phase 13 CP13-1 — frozen test-first baseline for sync replication gaps

Baseline report (phase-13-cp1-baseline.md) from running 44 existing
replication-gap tests on current code with zero protocol changes:

  37 PASS / 4 FAIL / 3 PASS*

4 FAILs expose real gaps:
- ReconnectUsesHandshakeNotBootstrap: degraded shipper doesn't catch up (CP13-5)
- CatchupMultipleDisconnects: repeated reconnect cycles don't recover (CP13-5)
- NeedsRebuildBlocksAllPaths: stays Degraded after large gap (CP13-5+7)
- CatchupDoesNotOverwriteNewerData: catch-up fails at barrier (CP13-5)

3 PASS* are witness-only (pass but don't prove the property):
- Bug3_ReplicaAddr: documents gap, not fix (CP13-2)
- GapBeyondRetainedWal: asserts barrier failure, not NeedsRebuild (CP13-7)
- MaxBytesTriggersNeedsRebuild: logs "not implemented" (CP13-6)

No protocol code changed. Baseline is test-first evidence only.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
pingqiu
2026-04-02 17:07:21 -07:00
parent c0a805184f
commit 600dac6029
2 changed files with 415 additions and 0 deletions

View File

@@ -0,0 +1,111 @@
# CP13-1 Baseline Report
Date: 2026-04-02
Commit: c0a805184 (feature/sw-block HEAD)
Runner: `go test ./weed/storage/blockvol/ -v -count=1 -timeout 120s`
Protocol changes in this checkpoint: NONE — test-first baseline only
## Category 1: Address Truth
| Result | Test | Reason |
|--------|------|--------|
| PASS | `TestCanonicalizeAddr_WildcardIPv4_UsesAdvertised` | canonicalization infra works |
| PASS | `TestCanonicalizeAddr_WildcardIPv6_UsesAdvertised` | canonicalization infra works |
| PASS | `TestCanonicalizeAddr_NilIP_UsesAdvertised` | canonicalization infra works |
| PASS | `TestCanonicalizeAddr_AlreadyCanonical_Unchanged` | no-op on canonical input |
| PASS | `TestCanonicalizeAddr_Loopback_Unchanged` | loopback preserved intentionally |
| PASS | `TestCanonicalizeAddr_NoAdvertised_FallsBackToOutbound` | fallback path works |
| PASS* | `TestBug3_ReplicaAddr_MustBeIPPort_WildcardBind` | documents gap: ReplicaReceiver may return `:port` not `ip:port` on wildcard bind; test passes as documentation, not as proof of fix → CP13-2 |
## Category 2: Durable Progress Truth
| Result | Test | Reason |
|--------|------|--------|
| PASS | `TestReplicaProgress_BarrierUsesFlushedLSN` | barrier now gates on replicaFlushedLSN (CP13-3 done) |
| PASS | `TestReplicaProgress_FlushedLSNMonotonicWithinEpoch` | flushedLSN monotonic within epoch (CP13-3 done) |
| PASS | `TestBarrier_RejectsReplicaNotInSync` | barrier rejects non-InSync replica |
| PASS | `TestBarrier_EpochMismatchRejected` | barrier rejects epoch mismatch |
| PASS | `TestBarrier_DuringCatchup_Rejected` | barrier rejected during CatchingUp state (CP13-4 done) |
| PASS | `TestBarrier_ReplicaSlowFsync_Timeout` | barrier timeout on slow replica |
| PASS | `TestBarrierResp_FlushedLSN_Roundtrip` | barrier response wire format carries flushedLSN |
| PASS | `TestBarrierResp_BackwardCompat_1Byte` | backward compat with old 1-byte response |
| PASS | `TestReplica_FlushedLSN_OnlyAfterSync` | flushedLSN only updated after fdatasync |
| PASS | `TestReplica_FlushedLSN_NotOnReceive` | flushedLSN not updated on entry receive |
| PASS | `TestShipper_ReplicaFlushedLSN_UpdatedOnBarrier` | shipper tracks replica flushedLSN from barrier |
| PASS | `TestShipper_ReplicaFlushedLSN_Monotonic` | tracked flushedLSN is monotonic |
| PASS | `TestShipperGroup_MinReplicaFlushedLSN` | group computes min flushedLSN across replicas |
| PASS | `TestDistSync_SyncAll_NilGroup_Succeeds` | sync_all with no replicas succeeds locally |
| PASS | `TestDistSync_SyncAll_AllDegraded_Fails` | sync_all fails when all replicas degraded |
| PASS | `TestBug2_SyncAll_SyncCache_AfterDegradedShipperRecovers` | after recovery, catch-up + barrier succeeds (CP13-5 done) |
| PASS | `TestBug1_SyncAll_WriteDuringDegraded_SyncCacheMustFail` | SyncCache correctly fails during degraded |
## Category 3: Reconnect / Catch-up
| Result | Test | Reason |
|--------|------|--------|
| PASS | `TestReconnect_CatchupFromRetainedWal` | reconnect + WAL catch-up works (CP13-5 done) |
| PASS* | `TestReconnect_GapBeyondRetainedWal_NeedsRebuild` | correctly fails SyncCache after large gap, but does NOT assert NeedsRebuild state transition — asserts barrier failure only → CP13-5+CP13-7 |
| PASS | `TestReconnect_EpochChangeDuringCatchup_Aborts` | catch-up aborts on epoch change |
| PASS | `TestReconnect_CatchupTimeout_TransitionsDegraded` | catch-up timeout → degraded |
| PASS | `TestAdversarial_FreshShipperUsesBootstrapNotReconnect` | fresh shipper uses bootstrap path |
| FAIL | `TestAdversarial_ReconnectUsesHandshakeNotBootstrap` | **gap: degraded shipper with prior flushed progress reconnects but barrier fails** — shipper does not catch up before attempting barrier → CP13-5 |
| PASS | `TestAdversarial_ReplicaRejectsDuplicateLSN` | replica rejects duplicate LSN |
| PASS | `TestAdversarial_ReplicaRejectsGapLSN` | replica rejects LSN gap |
| FAIL | `TestAdversarial_CatchupMultipleDisconnects` | **gap: catch-up across multiple disconnect/reconnect cycles fails** — first reconnect barrier fails, subsequent cycles never recover → CP13-5 |
| PASS | `TestAdversarial_ConcurrentBarrierDoesNotCorruptCatchupFailures` | concurrent barriers don't corrupt counter |
## Category 4: Retention / Rebuild Boundary
| Result | Test | Reason |
|--------|------|--------|
| PASS | `TestWalRetention_RequiredReplicaBlocksReclaim` | replica-aware WAL retention works (CP13-6 done) |
| PASS | `TestWalRetention_TimeoutTriggersNeedsRebuild` | retention timeout → NeedsRebuild (CP13-6 done) |
| PASS* | `TestWalRetention_MaxBytesTriggersNeedsRebuild` | passes but logs "max-bytes retention trigger not implemented yet" — shipper stays Degraded, does not transition to NeedsRebuild → CP13-6 |
| FAIL | `TestAdversarial_NeedsRebuildBlocksAllPaths` | **gap: after large WAL gap, shipper stays Degraded instead of NeedsRebuild; Ship/Barrier not blocked** → CP13-5+CP13-7 |
| FAIL | `TestAdversarial_CatchupDoesNotOverwriteNewerData` | **gap: catch-up after disconnect fails at barrier level** — catch-up doesn't complete, so newer-data safety not actually exercised → CP13-5 |
| PASS | `TestHeartbeat_ReportsPerReplicaState` | heartbeat reports per-replica shipper state |
| PASS | `TestHeartbeat_ReportsNeedsRebuild` | heartbeat reports NeedsRebuild per-replica |
| PASS | `TestReplicaState_RebuildComplete_ReentersInSync` | full rebuild cycle: NeedsRebuild → rebuild → InSync |
| PASS | `TestRebuild_AbortOnEpochChange` | rebuild aborts on epoch change |
| PASS | `TestRebuild_PostRebuild_FlushedLSN_IsCheckpoint` | post-rebuild flushedLSN = checkpoint |
## Summary
| Category | PASS | FAIL | PASS* | Total |
|----------|------|------|-------|-------|
| 1. Address Truth | 6 | 0 | 1 | 7 |
| 2. Durable Progress Truth | 17 | 0 | 0 | 17 |
| 3. Reconnect / Catch-up | 7 | 2 | 1 | 10 |
| 4. Retention / Rebuild | 7 | 2 | 1 | 10 |
| **Total** | **37** | **4** | **3** | **44** |
## Failure → Checkpoint Mapping
| FAIL Test | Root Cause | Closes In |
|-----------|-----------|-----------|
| `TestAdversarial_ReconnectUsesHandshakeNotBootstrap` | degraded shipper reconnects but doesn't catch up before barrier | CP13-5 |
| `TestAdversarial_CatchupMultipleDisconnects` | repeated disconnect/reconnect cycles don't recover | CP13-5 |
| `TestAdversarial_NeedsRebuildBlocksAllPaths` | shipper stays Degraded after large gap, should be NeedsRebuild | CP13-5 + CP13-7 |
| `TestAdversarial_CatchupDoesNotOverwriteNewerData` | catch-up fails, so newer-data safety not exercised | CP13-5 |
## PASS* → Checkpoint Mapping
| PASS* Test | Why Not Full Proof | Closes In |
|------------|-------------------|-----------|
| `TestBug3_ReplicaAddr_MustBeIPPort_WildcardBind` | documents gap, doesn't prove fix | CP13-2 |
| `TestReconnect_GapBeyondRetainedWal_NeedsRebuild` | asserts barrier failure, not NeedsRebuild state | CP13-5 + CP13-7 |
| `TestWalRetention_MaxBytesTriggersNeedsRebuild` | logs "not implemented", shipper stays Degraded | CP13-6 |
## What Was NOT Changed
This baseline was captured on current code without any protocol modifications:
- No reconnect handshake changes
- No WAL catch-up logic changes
- No retention policy changes
- No rebuild behavior changes
- No barrier protocol changes
- No state machine changes
- No new protocol code of any kind
All 4 FAILs and 3 PASS* entries expose real gaps that exist in the current codebase.

View File

@@ -0,0 +1,304 @@
Purpose: append-only technical pack and delivery log for `Phase 13` sync replication correctness.
---
### `CP13-1` Technical Pack
Date: 2026-04-02
Goal: freeze a focused test-first baseline for sync replication correctness before major implementation work so `Phase 13` closes real gaps rather than validating against moving expectations
#### Layer 1: Semantic Core
##### Problem statement
`Phase 12` accepted bounded hardening and a first launch envelope on the chosen path.
That acceptance did not prove that cross-machine `RF=2 sync_all` already has a fully explicit replicated-durability model for:
1. reconnect after outage
2. catch-up from retained WAL
3. durable-progress truth at barrier time
4. retention vs rebuild boundary
The first checkpoint therefore accepts only one bounded thing:
1. a frozen failing/passing baseline for the replication gaps that `Phase 13` will close
It does not accept:
1. protocol implementation by implication
2. broad performance or rollout claims
3. happy-path-only validation
##### State / contract
`CP13-1` must make these truths explicit:
1. the target replication gaps are named before implementation
2. at least one current-code failure exists for each major missing protocol property
3. already-correct behavior may remain green and should be recorded as such
4. later checkpoints must refer back to this baseline rather than redefining success after the fact
##### Reject shapes
Reject before implementation if the checkpoint:
1. adds tests only after protocol code lands
2. reports “some failures happened” without mapping failures to named gaps
3. mixes proxy coverage and true proof coverage without distinction
4. quietly turns the baseline into broad workload benchmarking
#### Layer 2: Execution Core
##### Current gaps `CP13-1` must expose
1. canonical replica endpoint truth may be weaker than real cross-machine requirements
2. barrier correctness may still depend on sender-side progress rather than flushed durability truth
3. reconnect / catch-up behavior may fail or degrade unclearly after outage
4. retention / rebuild boundary may be implicit instead of explicit
##### Suggested file targets
1. `weed/storage/blockvol/blockvol_test.go`
2. `weed/storage/blockvol/replica_test.go`
3. `weed/storage/blockvol/dist_group_commit_test.go`
4. `weed/storage/blockvol/wal_shipper_test.go`
5. `weed/storage/blockvol/test/component/`
6. `weed/storage/blockvol/testrunner/scenarios/internal/` for bounded real-node scenarios when justified
##### Validation focus
Required proofs:
1. baseline-freeze proof
- the focused tests are added before the major protocol checkpoints land
2. gap-visibility proof
- named protocol gaps fail clearly on current code or are marked as bounded witness coverage
3. boundedness proof
- the checkpoint remains test-first baseline work, not hidden implementation
Reject if:
1. tests are too indirect to say which gap they expose
2. failing behavior is captured only in chat or terminal output, not in a baseline artifact
3. baseline wording already claims the later protocol is fixed
##### Suggested first cut
1. prepare a compact test inventory grouped by:
- address truth
- durable progress truth
- reconnect / catch-up
- retention / rebuild boundary
2. run on current code
3. freeze one baseline report with explicit categories:
- `FAIL`
- `PASS`
- `PASS*`
##### Assignment For `tester`
1. Goal
- add and run the focused replication-gap tests before `sw` starts major protocol work
2. Required outputs
- one frozen baseline report
- one explicit list of current expected failures
- one explicit list of already-green behaviors
3. Hard rules
- do not strengthen the current implementation first
- do not let component or real-node tests replace the smaller protocol-gap tests
- do not turn proxy passes into full-proof claims
##### Assignment For `sw`
1. Goal
- start `Phase 13` immediately by making the baseline package real without pre-solving the protocol gaps
2. Allowed work before baseline freeze
- test harness support that does not change protocol behavior
- small cleanup required to make the baseline runnable
- test additions or renames that make the current gaps explicit
3. Hard rules
- do not pre-fix reconnect / catch-up / retention semantics before the baseline is captured
- do not weaken current degraded-mode signaling just to make tests pass
##### `P1` Start Pack For `sw`
`sw` may start now, but only inside this bounded `CP13-1` package.
---
###### Task 1: Baseline Inventory Freeze
Collect the existing test inventory and classify each test. The inventory below is the frozen starting point — `sw` validates it against current code, fixes classification errors, and adds missing entries only.
**Category 1: Address Truth**
| Test | File | Status | Classification |
|------|------|--------|----------------|
| `TestCanonicalizeAddr_WildcardIPv4_UsesAdvertised` | `net_util_test.go` | PASS | existing, reusable |
| `TestCanonicalizeAddr_WildcardIPv6_UsesAdvertised` | `net_util_test.go` | PASS | existing, reusable |
| `TestCanonicalizeAddr_NilIP_UsesAdvertised` | `net_util_test.go` | PASS | existing, reusable |
| `TestCanonicalizeAddr_AlreadyCanonical_Unchanged` | `net_util_test.go` | PASS | existing, reusable |
| `TestCanonicalizeAddr_Loopback_Unchanged` | `net_util_test.go` | PASS | existing, reusable |
| `TestCanonicalizeAddr_NoAdvertised_FallsBackToOutbound` | `net_util_test.go` | PASS | existing, reusable |
| `TestBug3_ReplicaAddr_MustBeIPPort_WildcardBind` | `sync_all_bug_test.go` | PASS* | documents gap: ReplicaReceiver may return `:port` not `ip:port` |
**Category 2: Durable Progress Truth**
| Test | File | Status | Classification |
|------|------|--------|----------------|
| `TestReplicaProgress_BarrierUsesFlushedLSN` | `sync_all_protocol_test.go` | FAIL expected | gap: barrier doesn't gate on replicaFlushedLSN |
| `TestReplicaProgress_FlushedLSNMonotonicWithinEpoch` | `sync_all_protocol_test.go` | FAIL expected | gap: replicaFlushedLSN API missing |
| `TestBarrier_EpochMismatchRejected` | `sync_all_protocol_test.go` | FAIL expected | gap: barrier doesn't check epoch on replica |
| `TestBarrier_ReplicaSlowFsync_Timeout` | `sync_all_protocol_test.go` | FAIL expected | gap: barrier timeout is hardcoded |
| `TestBarrier_RejectsReplicaNotInSync` | `sync_all_protocol_test.go` | verify | existing, needs verification |
| `TestBarrierResp_FlushedLSN_Roundtrip` | `sync_all_protocol_test.go` | verify | existing, needs verification |
| `TestBarrierResp_BackwardCompat_1Byte` | `sync_all_protocol_test.go` | verify | existing, needs verification |
| `TestReplica_FlushedLSN_OnlyAfterSync` | `sync_all_protocol_test.go` | verify | existing, needs verification |
| `TestReplica_FlushedLSN_NotOnReceive` | `sync_all_protocol_test.go` | verify | existing, needs verification |
| `TestShipper_ReplicaFlushedLSN_UpdatedOnBarrier` | `sync_all_protocol_test.go` | verify | existing, needs verification |
| `TestShipper_ReplicaFlushedLSN_Monotonic` | `sync_all_protocol_test.go` | verify | existing, needs verification |
| `TestShipperGroup_MinReplicaFlushedLSN` | `sync_all_protocol_test.go` | verify | existing, needs verification |
| `TestDistSync_SyncAll_NilGroup_Succeeds` | `dist_group_commit_test.go` | PASS | existing, reusable |
| `TestDistSync_SyncAll_AllDegraded_Fails` | `dist_group_commit_test.go` | PASS | existing, reusable |
| `TestBug2_SyncAll_SyncCache_AfterDegradedShipperRecovers` | `sync_all_bug_test.go` | FAIL expected | gap: catch-up not implemented, barrier hangs after recovery |
**Category 3: Reconnect / Catch-up**
| Test | File | Status | Classification |
|------|------|--------|----------------|
| `TestReconnect_CatchupFromRetainedWal` | `sync_all_protocol_test.go` | FAIL expected | gap: no reconnect handshake or WAL catch-up |
| `TestReconnect_GapBeyondRetainedWal_NeedsRebuild` | `sync_all_protocol_test.go` | FAIL expected | gap: no retention tracking, no NeedsRebuild transition |
| `TestReconnect_EpochChangeDuringCatchup_Aborts` | `sync_all_protocol_test.go` | FAIL expected | gap: no CatchingUp state, no epoch-aware abort |
| `TestReconnect_CatchupTimeout_TransitionsDegraded` | `sync_all_protocol_test.go` | FAIL expected | gap: no catch-up timeout |
| `TestBarrier_DuringCatchup_Rejected` | `sync_all_protocol_test.go` | FAIL expected | gap: no CatchingUp state |
| `TestAdversarial_FreshShipperUsesBootstrapNotReconnect` | `sync_all_adversarial_test.go` | verify | existing, needs verification |
| `TestAdversarial_ReconnectUsesHandshakeNotBootstrap` | `sync_all_adversarial_test.go` | FAIL expected | gap: handshake protocol missing |
| `TestAdversarial_ReplicaRejectsDuplicateLSN` | `sync_all_adversarial_test.go` | verify | existing, needs verification |
| `TestAdversarial_ReplicaRejectsGapLSN` | `sync_all_adversarial_test.go` | verify | existing, needs verification |
| `TestAdversarial_CatchupMultipleDisconnects` | `sync_all_adversarial_test.go` | FAIL expected | gap: no catch-up protocol |
| `TestAdversarial_ConcurrentBarrierDoesNotCorruptCatchupFailures` | `sync_all_adversarial_test.go` | verify | existing, needs verification |
**Category 4: Retention / Rebuild Boundary**
| Test | File | Status | Classification |
|------|------|--------|----------------|
| `TestWalRetention_RequiredReplicaBlocksReclaim` | `sync_all_protocol_test.go` | FAIL expected | gap: WAL reclaim not replica-aware |
| `TestWalRetention_TimeoutTriggersNeedsRebuild` | `sync_all_protocol_test.go` | FAIL expected | gap: no retention timeout |
| `TestWalRetention_MaxBytesTriggersNeedsRebuild` | `sync_all_protocol_test.go` | FAIL expected | gap: no max-bytes retention |
| `TestAdversarial_NeedsRebuildBlocksAllPaths` | `sync_all_adversarial_test.go` | FAIL expected | gap: NeedsRebuild state incomplete |
| `TestAdversarial_CatchupDoesNotOverwriteNewerData` | `sync_all_adversarial_test.go` | verify | existing, needs verification |
| `TestHeartbeat_ReportsPerReplicaState` | `rebuild_v1_test.go` | verify | existing, needs verification |
| `TestHeartbeat_ReportsNeedsRebuild` | `rebuild_v1_test.go` | verify | existing, needs verification |
| `TestReplicaState_RebuildComplete_ReentersInSync` | `rebuild_v1_test.go` | verify | existing, needs verification |
| `TestRebuild_AbortOnEpochChange` | `rebuild_v1_test.go` | verify | existing, needs verification |
| `TestRebuild_PostRebuild_FlushedLSN_IsCheckpoint` | `rebuild_v1_test.go` | verify | existing, needs verification |
###### Task 2: Runnable Baseline Harness
Fix only the minimum harness friction needed to run the baseline cleanly:
- if any test cannot compile or run due to missing test helpers, add the helpers
- if any test panics on setup (not on the gap itself), fix the setup
- do NOT change protocol behavior to make failing tests pass
- do NOT add new protocol code (reconnect, retention, rebuild)
Scope guard: if a fix touches `wal_shipper.go`, `replica_apply.go`, `dist_group_commit.go`, or `blockvol.go` beyond test-helper support, it is out of bounds for Task 2.
###### Task 3: Focused Gap Tests
Add or tighten the smallest set of tests that exposes the current gap on present code:
- if the inventory has a `verify` entry that turns out to be proxy coverage (passes but doesn't actually prove the property), reclassify it as `PASS*`
- if a gap has no test at all, add the minimum test that fails on current code
- prefer unit/protocol tests; add component tests only where the unit test cannot expose the gap
- each new test must have a comment naming which CP13 checkpoint it maps to
Hard rule: do NOT add tests that only pass after protocol work. The baseline must fail cleanly on current code.
###### Task 4: Frozen Baseline Report
Run the full inventory on current code and produce one explicit report:
```
CP13-1 Baseline Report
Date: YYYY-MM-DD
Commit: <hash>
Category 1: Address Truth
PASS TestCanonicalizeAddr_WildcardIPv4_UsesAdvertised
PASS TestCanonicalizeAddr_...
PASS* TestBug3_ReplicaAddr_MustBeIPPort_WildcardBind — documents gap, not proof
...
Category 2: Durable Progress Truth
FAIL TestReplicaProgress_BarrierUsesFlushedLSN — barrier doesn't gate on flushedLSN
PASS TestDistSync_SyncAll_NilGroup_Succeeds
...
Category 3: Reconnect / Catch-up
FAIL TestReconnect_CatchupFromRetainedWal — no catch-up protocol
...
Category 4: Retention / Rebuild Boundary
FAIL TestWalRetention_RequiredReplicaBlocksReclaim — WAL reclaim not replica-aware
PASS TestHeartbeat_ReportsPerReplicaState
...
Summary: X PASS / Y FAIL / Z PASS* out of N total
```
The report must be saved to `sw-block/.private/phase/phase-13-cp1-baseline.md`.
---
###### Required output from `sw`
1. one delivery note naming:
- files changed (test helpers, test additions, renames)
- tests added or strengthened
- which gaps are now exposed by the baseline
2. one frozen result summary (`phase-13-cp1-baseline.md`)
3. one explicit statement of what `sw` did NOT fix
###### Hard boundary
`sw` may do Tasks 1-4 now. `sw` may NOT:
1. implement reconnect handshake protocol (→ CP13-5)
2. implement WAL retention policy (→ CP13-6)
3. implement rebuild fallback behavior (→ CP13-7)
4. change barrier protocol to return flushedLSN (→ CP13-3)
5. add CatchingUp or NeedsRebuild state transitions (→ CP13-4, CP13-5)
6. change `wal_shipper.go` Ship/Connect behavior beyond test-helper wiring
These are explicitly reserved for CP13-2 through CP13-7. The baseline must expose the gaps without closing them.
###### Reject if `sw`
1. starts implementing reconnect protocol, retention policy, or rebuild behavior before the baseline is frozen
2. buries the real gap under large harness churn
3. upgrades witness coverage into proof coverage without saying so
4. turns `CP13-1` into `CP13-2+` by stealth
---
#### Gap → Checkpoint Mapping
| Gap | Exposed by baseline tests | Closed by checkpoint |
|-----|--------------------------|---------------------|
| ReplicaReceiver returns `:port` not `ip:port` | TestBug3 (PASS*) | CP13-2 |
| Barrier doesn't gate on replicaFlushedLSN | TestReplicaProgress_BarrierUsesFlushedLSN (FAIL) | CP13-3 |
| No CatchingUp state, no barrier rejection during catch-up | TestBarrier_DuringCatchup_Rejected (FAIL) | CP13-4 |
| No reconnect handshake or WAL catch-up replay | TestReconnect_CatchupFromRetainedWal (FAIL) | CP13-5 |
| WAL reclaim not replica-aware | TestWalRetention_RequiredReplicaBlocksReclaim (FAIL) | CP13-6 |
| No NeedsRebuild transition on WAL gap | TestReconnect_GapBeyondRetainedWal_NeedsRebuild (FAIL) | CP13-5 + CP13-7 |
| Post-reconnect barrier hangs | TestBug2 (FAIL) | CP13-5 |
#### Short judgment
`CP13-1` is acceptable when:
1. the phase has a frozen, named failing baseline
2. the failing baseline maps cleanly to later checkpoints
3. already-correct behavior is distinguished from true gaps
4. no implementation overclaim sneaks into the checkpoint