feat: CP13-8 PASSES — real-workload validation on RF=2 sync_all

CP13-8 scenario results on m01/M02 (25Gbps RoCE): fsck_ext4: CLEAN file count: 200 (assert_equal PASS) checksum match: MATCH (assert_contains PASS) pgbench TPS: 565.69 (assert_greater PASS) auto-failover: 10.0.0.1:18480 → 10.0.0.3:18480 Code changes (tester + scenario): - volume_server_block.go: readiness state, assignment lifecycle cleanup - block_heartbeat_loop.go: readiness-aware heartbeat reporting - store_blockvol.go: readiness tracking - master_server_handlers_block.go: block API handler updates - cp13-8-real-workload-validation.yaml: redesigned scenario (removed block_promote, use natural auto-failover flow, bootstrap write before wait_volume_healthy) - testrunner/actions/devops.go: scenario action improvements - replica_read_test.go: component-level replica read test Phase docs: CP13-7 accepted, CP13-8/8A technical packs updated, design docs updated for protocol closure evidence. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-20 16:51:31 +00:00 · 2026-04-03 14:24:13 -07:00
parent 334c12664a
commit 4c7fbefe25
21 changed files with 2375 additions and 248 deletions
--- a/sw-block/.private/phase/phase-13-log.md
+++ b/sw-block/.private/phase/phase-13-log.md
@@ -1284,3 +1284,360 @@ Review checklist:
 3. is rebuild handoff bounded and epoch-safe?
 4. is post-rebuild progress initialized from checkpoint truth?
 5. is the checkpoint still bounded to rebuild fallback?
+
+---
+
+### `CP13-8` Technical Pack
+
+Date: 2026-04-03
+Goal: validate the accepted `RF=2 sync_all` replication contract on one bounded set of real workloads so the engineering proof is demonstrated on named real block-device consumers rather than only protocol-level tests
+
+#### Layer 1: Semantic Core
+
+##### Problem statement
+
+`CP13-1..7` have progressively closed replication correctness: address truth, durable progress, state eligibility, reconnect/catch-up, retention, and rebuild fallback.
+What remains is not more replication semantics. It is proving that the accepted contract survives contact with bounded real workloads.
+
+`CP13-8` therefore accepts only one bounded thing:
+
+1. one bounded real-workload validation package for the accepted `RF=2 sync_all` path
+
+It does not accept:
+
+1. broad launch approval
+2. broad benchmark positioning
+3. mode normalization or general product-policy closure
+
+##### State / contract
+
+`CP13-8` must make these truths explicit:
+
+1. the workload envelope is named and bounded:
+   - topology
+   - transport/frontend
+   - filesystem/application consumer
+   - disturbance shapes included
+   - exclusions
+2. the accepted `CP13-1..7` replication contract is the thing being validated, not redefined
+3. real-workload evidence must be replayable and attributable
+4. passing one named workload package does not imply generic production readiness outside the stated envelope
+
+##### Reject shapes
+
+Reject before implementation or review if the checkpoint:
+
+1. presents ad hoc manual runs without a named envelope
+2. treats synthetic benchmarks as substitutes for real workload validation
+3. mixes workload validation with mode normalization or rollout-approval claims
+4. reopens already-accepted replication semantics instead of validating them
+
+#### Layer 2: Execution Core
+
+##### Current gap `CP13-8` must close
+
+1. accepted replication semantics are still primarily validated by protocol/unit/adversarial evidence
+2. one bounded real workload package is still needed to show the contract survives real filesystem/application behavior
+3. the project still needs a replayable workload-evidence object before talking about final mode normalization or broader launch shaping
+
+##### Suggested file targets
+
+1. `weed/storage/blockvol/testrunner/*`
+2. bounded component/real-device validation under `weed/storage/blockvol/test/`
+3. workload docs or result artifacts under `sw-block/.private/phase/`
+4. `weed/server/*` or `blockvol` only if a real workload exposes a concrete bug
+
+##### Validation focus
+
+Required proofs:
+
+1. filesystem proof
+   - one named real filesystem workload completes correctly on the accepted path
+2. application proof
+   - one named database/application workload completes correctly on the accepted path
+3. disturbance proof
+   - only if explicitly included in the envelope, one bounded disturbance case remains correct
+4. envelope proof
+   - topology/frontend/workload/exclusions are explicit and replayable
+5. boundedness proof
+   - checkpoint remains about workload validation, not `CP13-9+`
+
+Reject if:
+
+1. a claimed proof is only a harness smoke test without real workload semantics
+2. failures cannot be attributed because the environment is underspecified
+3. delivery wording implies broad launch readiness from one bounded package
+
+##### Suggested first cut
+
+1. freeze one explicit workload matrix first, before chasing more scenarios
+2. use one filesystem workload and one application/database workload
+3. keep disturbances narrow and named if included at all
+4. produce one result artifact that ties outcomes back to accepted `CP13-1..7` semantics
+
+##### Assignment For `sw`
+
+1. Goal
+   - deliver bounded real-workload validation on the accepted `RF=2 sync_all` path
+2. Required outputs
+   - one explicit workload-envelope summary
+   - one focused code/harness package only where needed to make the bounded workloads replayable
+   - one delivery note explaining:
+     - files updated in place
+     - workload matrix
+     - proof shape
+     - what later checkpoints remain untouched
+3. Hard rules
+   - do not broaden into generic benchmark marketing
+   - do not claim launch approval from one workload package
+   - do not reopen accepted `CP13-1..7` semantics unless the workload exposes a concrete bug
+
+##### Assignment For `tester`
+
+1. Goal
+   - validate that `CP13-8` closes bounded real-workload validation and nothing broader
+2. Validate
+   - a named filesystem workload completes correctly
+   - a named application/database workload completes correctly
+   - the environment and exclusions are explicit
+   - evidence is replayable and attributable
+   - no-overclaim around `CP13-9+`
+3. Reject if
+   - workload evidence is underspecified or non-replayable
+   - the validation object quietly broadens into mode/rollout policy
+   - failures are explained away without a bounded root cause
+
+#### Short judgment
+
+`CP13-8` is acceptable when:
+
+1. one bounded real-workload matrix is explicit
+2. the accepted replication contract is demonstrated on named real consumers
+3. the resulting evidence is replayable and bounded
+4. the checkpoint stays clearly separate from `CP13-9+`
+
+---
+
+### `CP13-8` Delivery Pack
+
+Bounded contract:
+
+1. `CP13-8` accepts real-workload validation only
+2. it does not accept mode normalization, rollout approval, or broad performance positioning
+
+What `sw` should deliver:
+
+1. one focused contract review of the workload envelope and its relation to accepted `CP13-1..7` semantics
+2. one bounded harness/evidence package only where needed to run the chosen workloads replayably
+3. one delivery note with:
+   - changed files
+   - workload matrix
+   - proof shape
+   - no-overclaim statement
+
+Recommended delivery shape:
+
+1. contract:
+   - define the named workload envelope and exclusions
+2. code/tests/harness:
+   - keep updates local to real-workload validation surfaces
+   - make workload pass/fail conditions directly observable
+3. note:
+   - distinguish primary proof from support evidence
+   - explain why `CP13-9+` remains untouched
+
+Review checklist:
+
+1. is the workload envelope explicit and bounded?
+2. are the workloads real consumers, not just synthetic microbenchmarks?
+3. is evidence replayable and attributable?
+4. does the package validate accepted semantics rather than redefining them?
+5. is the checkpoint still bounded to real-workload validation?
+
+---
+
+### `CP13-8A` Technical Pack
+
+Date: 2026-04-03
+Goal: close the assignment-to-publication contradiction exposed by `CP13-8` so the accepted `RF=2 sync_all` path no longer publishes replica readiness from allocation or assignment presence alone
+
+#### Layer 1: Semantic Core
+
+##### Problem statement
+
+`CP13-8` exposed a live contradiction:
+
+1. control truth says the replica exists and has assignment/addresses
+2. runtime truth may still be between:
+   - role applied
+   - receiver startup
+   - shipper attachment
+   - publish-ready closure
+3. external surfaces can therefore overstate readiness before the replica is actually safe to publish as a real block-device peer
+
+`CP13-8A` therefore accepts only one bounded thing:
+
+1. one bounded assignment-to-publication closure slice for the accepted `RF=2 sync_all` path
+
+It does not accept:
+
+1. broad mode normalization
+2. launch approval
+3. backend replacement by implication
+4. timing-based “wait longer” fixes that leave readiness semantics implicit
+
+##### State / contract
+
+`CP13-8A` must make these truths explicit:
+
+1. assignment delivered is not the same as receiver ready
+2. receiver ready is not the same as publish healthy
+3. lookup / heartbeat / tester health must consume the same bounded readiness truth
+4. the closure remains inside the current chosen path:
+   - `RF=2`
+   - `sync_all`
+   - current master / volume-server heartbeat path
+   - `blockvol` backend
+
+##### Reject shapes
+
+Reject before implementation or review if the slice:
+
+1. leaves two semantic assignment paths alive (`store`-only vs service/runtime path)
+2. treats allocation completion or precomputed ports as equivalent to publication readiness
+3. relies on sleeps, retries, or ad hoc timing instead of explicit readiness state
+4. broadens into `CP13-9` mode policy or generic backend redesign
+
+#### Layer 2: Execution Core
+
+##### Current gap `CP13-8A` must close
+
+1. assignment application and replication/publication setup are still too easy to split semantically
+2. readiness truth is not yet a fully explicit first-class product surface across heartbeat / lookup / tester
+3. real-workload reruns cannot cleanly distinguish:
+   - backend data-visibility bug
+   - adapter timing/publication bug
+   - true core-rule gap
+
+##### Suggested file targets
+
+1. `weed/server/volume_server_block.go`
+2. `weed/server/block_heartbeat_loop.go`
+3. `weed/server/master_block_registry.go`
+4. `weed/server/master_grpc_server_block.go`
+5. `weed/server/master_server_handlers_block.go`
+6. `weed/storage/blockvol/testrunner/actions/devops.go`
+7. bounded tests under `weed/server/*`
+
+##### Validation focus
+
+Required proofs:
+
+1. lifecycle proof
+   - assignment processing uses one authoritative path from role apply through runtime wiring
+2. readiness proof
+   - replica-ready is explicit and not inferred from existence/allocation alone
+3. publication proof
+   - lookup / heartbeat / tester surfaces do not publish the replica before readiness closure
+4. rerun proof
+   - a bounded `CP13-8` rerun moves the remaining contradiction into an attributable bug class rather than mixed-state ambiguity
+5. boundedness proof
+   - the slice remains about closure, not `CP13-9`
+
+Reject if:
+
+1. a claimed proof still depends on manual interpretation of timing
+2. different surfaces use different meanings of “healthy” or “ready”
+3. the rerun still fails but the failure cannot be classified beyond “timing”
+
+##### Suggested first cut
+
+1. make `BlockService` the single assignment/readiness owner on the VS side
+2. define one explicit readiness surface and project it into lookup/REST/tester gates
+3. rerun the bounded `CP13-8` workload package only after closure lands
+4. classify any remaining failure as:
+   - backend data bug
+   - adapter/publication bug
+   - core-rule gap
+
+##### Assignment For `sw`
+
+1. Goal
+   - deliver bounded assignment-to-publication closure on the accepted `RF=2 sync_all` path
+2. Required outputs
+   - one focused code package closing the assignment/readiness/publication split
+   - one delivery note explaining:
+     - files updated in place
+     - named readiness states
+     - proof shape
+     - `CP13-8` rerun outcome or remaining attributable contradiction
+     - what later checkpoints remain untouched
+3. Hard rules
+   - do not use timing sleeps as semantic fixes
+   - do not broaden into `CP13-9` mode normalization
+   - do not replace `blockvol` backend in this slice
+   - do not reopen accepted `CP13-1..7` semantics unless a live contradiction is found
+
+##### Assignment For `tester`
+
+1. Goal
+   - validate that `CP13-8A` closes assignment-to-publication truth and nothing broader
+2. Validate
+   - one authoritative assignment path exists
+   - readiness is explicit and externally consistent
+   - lookup / heartbeat / tester health no longer overpublish readiness
+   - the bounded `CP13-8` rerun is attributable
+   - no-overclaim around `CP13-9`
+3. Reject if
+   - old mixed-state behavior still leaks through one surface
+   - the slice depends on timing luck
+   - the rerun still fails but the team cannot say whether it is backend, adapter, or core
+
+#### Short judgment
+
+`CP13-8A` is acceptable when:
+
+1. assignment-to-publication closure is explicit on the chosen path
+2. readiness is no longer inferred from allocation or assignment presence
+3. all product/tester surfaces consume the same bounded readiness truth
+4. the rerun result is attributable and the slice stays separate from `CP13-9`
+
+---
+
+### `CP13-8A` Delivery Pack
+
+Bounded contract:
+
+1. `CP13-8A` accepts assignment-to-publication closure only
+2. it does not accept mode normalization, broad launch approval, or backend replacement
+
+What `sw` should deliver:
+
+1. one focused closure package across VS assignment, heartbeat, lookup, and tester health surfaces
+2. one bounded rerun or equivalent evidence showing whether the remaining contradiction is backend, adapter, or core
+3. one delivery note with:
+   - changed files
+   - named readiness states
+   - proof shape
+   - rerun outcome / remaining attributable contradiction
+   - no-overclaim statement
+
+Recommended delivery shape:
+
+1. contract:
+   - define explicit readiness/publication truth for the chosen path
+2. code/tests:
+   - unify assignment lifecycle
+   - gate publication on readiness closure
+   - prove lookup / heartbeat / tester consistency
+3. note:
+   - distinguish closure proof from any later pure-core redesign
+   - explain why `CP13-9` remains untouched
+
+Review checklist:
+
+1. is assignment processing semantically unified?
+2. is readiness explicit rather than inferred?
+3. do lookup / heartbeat / tester surfaces agree on publication truth?
+4. does the bounded rerun become attributable?
+5. is the slice still bounded to closure rather than mode policy or backend replacement?
--- a/sw-block/.private/phase/phase-13.md
+++ b/sw-block/.private/phase/phase-13.md
@@ -584,12 +584,177 @@ Reject if:

 Status:

+- accepted
+
+Carry-forward:
+
+1. `NeedsRebuild` is now a real fail-closed fallback state
+2. rebuild handoff and post-rebuild progress are bounded by checkpoint truth rather than implicit recovery assumptions
+3. `CP13-8` must validate the accepted replication contract on named real workloads without reopening protocol semantics or quietly broadening into mode policy work
+
+### `CP13-8`: Real-Workload Validation
+
+Goal:
+
+- validate the accepted `RF=2 sync_all` replication contract on one bounded set of real workloads so the engineering proof is no longer only protocol/unit-level but also demonstrated on named real block-device consumers
+
+Acceptance object:
+
+1. `CP13-8` accepts one bounded real-workload validation package for the accepted `RF=2 sync_all` path
+2. it does not accept broad rollout claims, broad benchmark positioning, or mode normalization by implication
+3. it does not accept vague “worked in a manual run” reasoning without named workloads, bounded envelope, and replayable evidence
+
+Execution steps:
+
+1. Step 1: workload envelope freeze
+   - name one bounded validation matrix:
+     - workload(s)
+     - topology
+     - transport/frontend
+     - filesystem/application surface
+     - disturbance shapes included and excluded
+   - recommended first-cut surfaces:
+     - real filesystem behavior such as `ext4`
+     - one database/application surface such as `PostgreSQL`
+2. Step 2: harness and evidence hardening
+   - wire the workload run through real block-device consumers on the accepted path
+   - keep the environment reproducible and bounded enough that failures are attributable
+   - collect evidence at the same semantic layer as accepted prior checkpoints
+3. Step 3: proof package
+   - prove the named real workloads complete correctly on the accepted path
+   - prove disturbance/failover behavior is bounded inside the named envelope if included
+   - prove no-overclaim around `CP13-9+`
+
+Required scope:
+
+1. one bounded workload matrix on the accepted `RF=2 sync_all` path
+2. real block-device consumer validation (not only protocol/unit tests)
+3. bounded disturbance cases only if explicitly named in the envelope
+4. explicit separation between real-workload proof and later mode normalization / rollout claims
+
+Must prove:
+
+1. the accepted replication contract survives contact with named real workloads
+2. evidence is tied to a bounded environment and workload envelope, not generic “production ready” rhetoric
+3. failures, if any, are attributable to explicit workload-envelope gaps rather than ambiguous harness drift
+4. acceptance wording stays bounded to real-workload validation rather than `CP13-9` policy/mode closure
+
+Reuse discipline:
+
+1. prefer existing `testrunner`, bounded component scenarios, and real-device harnesses where possible
+2. update `weed/storage/blockvol/*` only when the real workload exposes a concrete bug in accepted semantics
+3. `weed/server/*` should remain reference only unless workload validation exposes a surfaced control/runtime issue
+4. no checkpoint work may silently broaden into generic benchmark marketing, launch approval, or mode policy redesign
+
+Verification mechanism:
+
+1. one named workload matrix with explicit environment description
+2. replayable runs or artifacts for the chosen workload package
+3. explicit pass/fail conditions tied back to accepted `CP13-1..7` semantics
+4. no-overclaim review so `CP13-8` does not absorb `CP13-9+`
+
+Hard indicators:
+
+1. one accepted filesystem proof:
+   - a named real filesystem workload completes correctly on the accepted path
+2. one accepted application proof:
+   - a named real application/database workload completes correctly on the accepted path
+3. one accepted envelope proof:
+   - the validation matrix is explicit about topology, frontend, workload, and exclusions
+4. one accepted boundedness proof:
+   - `CP13-8` claims real-workload validation only
+
+Reject if:
+
+1. the checkpoint relies on ad hoc manual runs with no bounded envelope
+2. a claimed real-workload proof is actually only a synthetic benchmark or unit test
+3. delivery wording quietly broadens into mode normalization, launch approval, or general production-readiness claims
+
+Status:
+
+- active
+
+### `CP13-8A`: Assignment-to-Publication Closure
+
+Goal:
+
+- close the control/runtime/publication contradiction exposed by `CP13-8` so the system no longer treats allocation or assignment presence as equivalent to replica publication readiness
+
+Acceptance object:
+
+1. `CP13-8A` accepts one bounded closure slice for assignment-to-publication truth on the accepted `RF=2 sync_all` path
+2. it does not accept broad mode normalization, launch approval, or backend replacement by implication
+3. it does not accept sleep-based or timing-based fixes that leave readiness semantics implicit
+
+Execution steps:
+
+1. Step 1: unify assignment lifecycle
+   - ensure assignment delivery flows through one authoritative path from role apply to receiver/shipper wiring to readiness bookkeeping
+   - remove semantic split between store-only role application and service-level replication/publication setup
+2. Step 2: name readiness and publication truth
+   - define explicit readiness states for the chosen path
+   - ensure heartbeat / lookup / tester surfaces distinguish:
+     - allocated
+     - role applied
+     - receiver ready
+     - publish healthy
+3. Step 3: bounded rerun
+   - rerun the bounded `CP13-8` workload package after closure lands
+   - determine whether the remaining contradiction is backend data visibility, adapter timing/publication, or a true core-rule gap
+
+Required scope:
+
+1. assignment-to-publication closure only
+2. chosen path only: `RF=2 sync_all`
+3. existing master / volume-server heartbeat path only
+4. `blockvol` remains the execution backend
+
+Must prove:
+
+1. assignment delivered does not by itself imply receiver ready or publish healthy
+2. replica publication requires explicit readiness closure rather than allocation completion or precomputed port presence
+3. master lookup / REST / tester health checks consume the same bounded readiness truth
+4. `CP13-8A` remains about closure, not mode normalization or backend redesign
+
+Reuse discipline:
+
+1. prefer `weed/server/*` and bridge-layer updates first because this is a surfaced control/runtime issue
+2. update `weed/storage/blockvol/*` only if closure work exposes a concrete backend bug rather than a publication-path contradiction
+3. keep `CP13-1..7` semantics fixed unless the closure work exposes a live contradiction
+4. no checkpoint work may silently broaden into `CP13-9` mode policy or broad rollout claims
+
+Verification mechanism:
+
+1. one focused proof set around assignment lifecycle closure and readiness/publication gating
+2. explicit tests that heartbeat / lookup / tester surfaces do not publish a replica before readiness closes
+3. bounded `CP13-8` rerun or equivalent evidence showing the contradiction moves from mixed-state ambiguity to an attributable remaining cause
+4. no-overclaim review so `CP13-8A` does not absorb `CP13-9`
+
+Hard indicators:
+
+1. one accepted lifecycle proof:
+   - assignment processing uses one authoritative path from role apply through runtime wiring
+2. one accepted readiness proof:
+   - replica-ready is explicit and not inferred from mere existence/allocation
+3. one accepted publication proof:
+   - lookup / heartbeat / tester gates do not publish a replica before readiness closure
+4. one accepted boundedness proof:
+   - `CP13-8A` claims closure only and leaves broader mode policy untouched
+
+Reject if:
+
+1. assignment still reaches different semantic outcomes depending on whether it flows through heartbeat/store-only or service-level processing
+2. a replica can still be surfaced as healthy/ready before receiver/session readiness closes
+3. the slice relies on delays or ad hoc retries rather than explicit readiness semantics
+4. delivery wording broadens into `CP13-9` mode normalization, launch approval, or generic backend replacement
+
+Status:
+
 - active

 ### Later checkpoints inside `Phase 13`

-1. `CP13-8`: real-workload validation
-2. `CP13-9`: mode normalization
+1. `CP13-9`: mode normalization (only after `CP13-8A` closes the assignment/publication contradiction)

 ## Reuse Discipline

--- a/sw-block/design/README.md
+++ b/sw-block/design/README.md
@@ -7,6 +7,7 @@ Historical planning/review documents were moved to `../docs/archive/design/` to
 ## Read First

 - `v2-protocol-truths.md`
+- `v2-protocol-claim-and-evidence.md`
 - `v2-product-completion-overview.md`
 - `v2-phase-development-plan.md`
 - `v2-semantic-methodology.zh.md`
@@ -27,6 +28,7 @@ Historical planning/review documents were moved to `../docs/archive/design/` to
 - `v2_scenarios.md`
 - `v2-scenario-sources-from-v1.md`
 - `v1-v15-v2-comparison.md`
+- `v2-reuse-replacement-boundary.md`
 - `wal-replication-v2.md`
 - `wal-replication-v2-state-machine.md`
 - `wal-replication-v2-orchestrator.md`
--- a/sw-block/design/v2-protocol-claim-and-evidence.md
+++ b/sw-block/design/v2-protocol-claim-and-evidence.md
@@ -0,0 +1,145 @@
+# V2 Protocol Claim And Evidence
+
+Date: 2026-04-03
+Status: active
+Purpose: keep one centralized ledger for the current chosen envelope, accepted claims, supporting evidence, invalidated evidence, and rerun obligations
+
+## Why This Document Exists
+
+`v2-protocol-truths.md` records stable protocol truths.
+`v2-protocol-closure-map.zh.md` records the structural closure model.
+
+What they do not track in one place is the current operational contract:
+
+1. which claims are allowed right now
+2. which baselines are accepted right now
+3. which evidence supports each claim
+4. which evidence has been narrowed or invalidated
+5. which reruns are required before a claim can be restored
+
+This document is that ledger.
+
+## How To Use It
+
+When reviewing any new slice, bug fix, workload run, or delivery note, ask:
+
+1. which current claim does this change strengthen, narrow, or invalidate?
+2. which evidence row should be updated?
+3. does the change alter the current chosen envelope?
+4. does any old claim now require rerun or reclassification?
+
+If the answer changes the current state of the product, update this ledger in the same change.
+
+## Current Chosen Envelope
+
+This is the bounded envelope currently allowed for active V2 claims:
+
+| Item | Current value | Source |
+|------|---------------|--------|
+| Replication factor | `RF=2` | `v2-protocol-closure-map.zh.md` |
+| Durability mode | `sync_all` | `v2-protocol-closure-map.zh.md`, `Phase 13` |
+| Control path | current master / volume-server heartbeat path | `v2-protocol-closure-map.zh.md` |
+| Execution backend | `blockvol` | `v2-protocol-closure-map.zh.md`, `v2-reuse-replacement-boundary.md` |
+| Frontend in active validation | iSCSI | `Phase 11`, `CP13-8` |
+| Real-workload checkpoint | `CP13-8` | `Phase 13` |
+
+Current explicit exclusions:
+
+1. `RF>2` as a general accepted product claim
+2. broad mode normalization before `CP13-9`
+3. broad rollout / launch approval
+4. broad transport matrix claims outside explicitly named evidence
+5. treating synthetic benchmarks as substitutes for real workload validation
+
+## Active Protocol Constraints
+
+These are the currently binding constraints that later work must preserve.
+
+| ID | Constraint | Source | Current status |
+|----|------------|--------|----------------|
+| `T1` | `CommittedLSN` is the external truth boundary | `v2-protocol-truths.md` | active |
+| `T9` | truncation is a protocol boundary, not cleanup | `v2-protocol-truths.md` | active |
+| `T14` | engine remains recovery authority; storage remains truth source | `v2-protocol-truths.md` | active |
+| `T15` | reuse reality, not inherited semantics | `v2-protocol-truths.md` | active |
+| `CP13-2` | stable identity must not be inferred from transport address shape | `Phase 13` | active |
+| `CP13-3` | durable authority is `replicaFlushedLSN`, not legacy success inference | `Phase 13` | active |
+| `CP13-4` | only eligible replica state may satisfy sync durability | `Phase 13` | active |
+| `CP13-5` | reconnect must use explicit handshake / catch-up semantics | `Phase 13` | active |
+| `CP13-6` | retention must fail closed for lagging replicas | `Phase 13` | active |
+| `CP13-7` | unrecoverable gap must escalate to `NeedsRebuild` and block normal paths | `Phase 13` | active |
+| `CP13-8A` | assignment delivered != receiver ready != publish healthy | `Phase 13` | active |
+
+## Accepted Baselines
+
+| Baseline | What it is allowed to say | Evidence location | Current validity |
+|----------|---------------------------|-------------------|------------------|
+| `CP13-1` replication baseline inventory | which tests originally passed/failed/`PASS*` before `CP13-2..7` closure | `sw-block/.private/phase/phase-13-cp1-baseline.md` | valid as baseline inventory, not as final product claim |
+| `Phase 12 P4` bounded floor | one bounded performance floor and rollout-gate package on the accepted chosen path | `sw-block/.private/phase/phase-12-p4-floor.md`, `phase-12-p4-rollout-gates.md` | valid inside its named envelope |
+| real-workload envelope draft | one bounded `ext4 + pgbench` package for `CP13-8` | `sw-block/.private/phase/phase-13-cp8-workload-validation.md` | active draft; full claim pending rerun after blockers close |
+
+## Allowed Claims
+
+These are the claims that may currently be made without overreach.
+
+| Claim ID | Allowed claim | Scope boundary | Evidence anchor | Status |
+|----------|---------------|----------------|-----------------|--------|
+| `C-RF2-SYNCALL-CONTRACT` | the accepted `RF=2 sync_all` replication contract is closed at protocol/unit/adversarial level through `CP13-1..7` | protocol/unit/adversarial evidence only | `Phase 13` docs and tests | allowed |
+| `C-WORKLOAD-DRAFT` | one bounded real-workload validation package is defined for `CP13-8` | package definition only, not final pass claim | `phase-13-cp8-workload-validation.md`, YAML scenario | allowed |
+| `C-WORKLOAD-PASS` | the bounded real-workload package passes on the chosen path | only after rerun succeeds on corrected path | `CP13-8` rerun artifact | not yet allowed |
+| `C-ADAPTER-CLOSURE` | assignment / readiness / publication closure is explicit on the chosen path | only after `CP13-8A` acceptance | `CP13-8A` proof package | in progress |
+| `C-MODE-NORMALIZATION` | mode policy / normalization is closed | only in `CP13-9` or later | future | not allowed |
+| `C-LAUNCH-APPROVAL` | broad product launch readiness | outside current phase | future | not allowed |
+
+## Evidence Map
+
+| Evidence area | What it proves | Primary evidence | Support evidence |
+|---------------|----------------|------------------|------------------|
+| Identity / addressing | stable identity and routable publication | `CP13-2` tests and docs | `qa_block_soak_test.go`, `sync_all_bug_test.go` |
+| Durable progress | barrier durability truth and non-legacy authority | `CP13-3` tests and docs | protocol tests around barrier handling |
+| State eligibility | only eligible replica state may satisfy sync durability | `CP13-4` tests and docs | adversarial state tests |
+| Reconnect / catch-up | reconnect uses handshake/catch-up rather than bootstrap | `CP13-5` tests and docs | adversarial reconnect tests |
+| Retention | lagging replica retains WAL or escalates fail closed | `CP13-6` tests and docs | retention protocol tests |
+| Rebuild fallback | unrecoverable gap escalates to `NeedsRebuild` and blocks normal paths | `CP13-7` tests and docs | rebuild tests |
+| Performance floor | one bounded measured floor and rollout-gate package | `Phase 12 P4` docs/tests | cited baseline artifact |
+| Real-workload package | one bounded workload matrix exists | `CP13-8` scenario/doc | tester validation reports |
+| Assignment/publication closure | assignment does not imply readiness/publication | `CP13-8A` code/tests/debug evidence | tester investigation, bug docs |
+
+## Invalidated Or Narrowed Evidence
+
+This section records evidence that cannot currently be used at full strength.
+
+| ID | Affected claim/evidence | Narrowing reason | Scope | Action required |
+|----|-------------------------|------------------|-------|-----------------|
+| `INV-CP13-8A-01` | any weed-VS scenario claim that `block_promote` preserved replication automatically | promote path could leave new primary without replica shipper wiring; barrier then became vacuous with `0` shippers | recent weed-VS testrunner scenarios using `block_promote` | rerun after fix |
+| `INV-CP13-8A-02` | bounded real-workload `CP13-8` pass claim | blocked by assignment/publication contradiction and then by promote/shipper closure issue | `CP13-8` only | rerun after `CP13-8A` blocker fixes |
+| `INV-CLAIM-SPREAD-01` | claims embedded only in phase delivery notes | phase docs are not a reliable centralized current-state ledger | all scattered phase notes | migrate ongoing claim state here |
+
+Unaffected evidence currently believed to remain valid:
+
+1. standalone `iscsi-target` scenarios that used direct `assign + set_replica` wiring rather than weed-VS `block_promote`
+2. protocol/unit/adversarial evidence from accepted `CP13-1..7`
+3. performance-only scenarios that did not claim active cross-node replication through the broken promote path
+
+## Open Contradictions And Blockers
+
+| ID | Blocker | Current classification | Impact |
+|----|---------|------------------------|--------|
+| `BUG-CP13-8A-ADDR` | malformed/mock replica addresses in some QA allocators | test/adapter bug | narrows affected QA evidence; does not by itself close real workload |
+| `BUG-CP13-8A-RECV-IDEMP` | repeated assignment delivery restarted replica receiver and hit bind conflict | adapter/runtime bug | blocks weed-VS replica from leaving degraded state until fixed |
+| `BUG-CP13-8A-PROMOTE-SHIPPER` | post-promote assignment could leave new primary with no replica shipper configured | master/adapter bug | invalidates weed-VS `block_promote` replication claims until rerun |
+| `CP13-8` | real bounded workload package still needs corrected rerun | blocked by `CP13-8A` issues | blocks real-workload pass claim |
+
+## Rerun Queue
+
+| Priority | Item | Why rerun is needed | Exit condition |
+|----------|------|---------------------|----------------|
+| `P0` | `CP13-8` bounded real-workload scenario | current pass claim is not yet allowed after `CP13-8A` blockers | bounded rerun passes or fails with attributable remaining cause |
+| `P0` | weed-VS scenarios using `block_promote` from the recent testrunner enhancement work | prior replication interpretation may have been vacuous (`0` shippers) | affected scenarios are reclassified or rerun |
+| `P1` | any recent degraded/perf interpretation derived from broken weed-VS promote path | performance interpretation may be based on RF=1 semantics | audit updated and affected numbers rerun or narrowed |
+
+## Maintenance Rules
+
+1. do not add a new claim anywhere else without adding or updating the corresponding row here
+2. when a bug narrows evidence, record the invalidation here in the same change
+3. when a rerun restores a claim, move the row from `Invalidated Or Narrowed Evidence` to `Allowed Claims` or update its status
+4. keep this document bounded to the active chosen path; do not turn it into a future roadmap
--- a/sw-block/design/v2-protocol-closure-map.zh.md
+++ b/sw-block/design/v2-protocol-closure-map.zh.md
@@ -21,6 +21,10 @@
 - `V2` 不是一组散乱 patch
 - 而是在明确边界内逐步建立的协议闭环

+当前 chosen path 下哪些 claim 可以成立、对应 evidence 在哪里、哪些 evidence
+被收紧或失效，不在本文维护；统一见
+`v2-protocol-claim-and-evidence.md`。
+
 这里的“闭环”是有范围的。

 当前默认边界仍然是：
--- a/sw-block/design/v2-protocol-truths.md
+++ b/sw-block/design/v2-protocol-truths.md
@@ -18,6 +18,10 @@ So the most important output to carry forward is not only code, but:

 This document is the compact truth table for the V2 line.

+Current chosen-envelope claims, accepted baselines, evidence mappings, and
+evidence invalidations are tracked separately in
+`v2-protocol-claim-and-evidence.md`.
+
 ## How To Use It

 For each later phase or slice, ask:
--- a/sw-block/design/v2-reuse-replacement-boundary.md
+++ b/sw-block/design/v2-reuse-replacement-boundary.md
@@ -0,0 +1,178 @@
+# V2 Reuse vs Replacement Boundary
+
+Date: 2026-04-03
+Status: active
+
+## Purpose
+
+This note makes one architectural split explicit for the current chosen path:
+
+1. what we reuse from the existing `blockvol`/`weed` stack as mechanics
+2. what must be owned by `V2` as semantic authority
+3. what sits in the adapter boundary between them
+
+The goal is to stop `V1` mixed control/data state from silently redefining `V2`
+behavior through convenience wiring.
+
+Scope is still bounded to:
+
+1. `RF=2`
+2. `sync_all`
+3. current master / volume-server heartbeat path
+4. `blockvol` as the execution backend
+
+## Boundary Rule
+
+`V1` reuse is allowed for execution mechanics.
+
+`V2` replacement is required for semantic authority.
+
+If a change decides protocol meaning, failover meaning, durability meaning, or
+external publication meaning, it belongs to a `V2`-owned layer even if the
+underlying I/O still runs through reused `blockvol` code.
+
+This is the practical interpretation of:
+
+- `v2-protocol-truths.md` `T14`: engine remains recovery authority
+- `v2-protocol-truths.md` `T15`: reuse reality, not inherited semantics
+
+## Three Buckets
+
+### 1. Reusable V1 Core
+
+These components remain useful as mechanics:
+
+| Area | Files | What stays reusable |
+|------|-------|---------------------|
+| Local storage truth | `weed/storage/blockvol/blockvol.go`, `flusher.go`, `rebuild.go`, WAL/extent helpers | WAL append, flush, checkpoint, dirty-map, extent install |
+| Replica transport | `weed/storage/blockvol/replica_apply.go`, `wal_shipper.go`, `shipper_group.go`, `dist_group_commit.go`, `repl_proto.go` | TCP receiver/shipper mechanics, barrier transport, replay/apply |
+| Frontend serving | `weed/storage/blockvol/iscsi/`, `weed/storage/blockvol/nvme/` | block-device serving once a local volume is authoritative |
+| Local role guardrails | `weed/storage/blockvol/promotion.go`, `role.go` | drain, lease revoke, local role gate enforcement |
+
+Rule:
+
+- these layers execute I/O and transport
+- they do not decide whether a replica is eligible, authoritative, published, or healthy in the `V2` sense
+
+### 2. Adapter Boundary
+
+These components translate `V2` truth into concrete runtime wiring:
+
+| Area | Files | Responsibility |
+|------|-------|----------------|
+| Assignment ingest | `weed/server/volume_server_block.go` | authoritative assignment lifecycle for role apply, receiver/shipper wiring, readiness closure |
+| Heartbeat/runtime loop | `weed/server/block_heartbeat_loop.go` | collect/report status and process assignments through the same lifecycle |
+| Local store helper | `weed/storage/store_blockvol.go` | local volume open/close/iteration; no longer the authoritative assignment lifecycle |
+| Bridge | `weed/storage/blockvol/v2bridge/control.go` | convert service/control truth into engine intents |
+
+Rule:
+
+- the adapter boundary may reuse `blockvol` primitives
+- it must name and own lifecycle closure states explicitly
+- it must not let store-only role application masquerade as ready publication
+
+### 3. V2-Owned Replacement
+
+These areas define truth and therefore must remain `V2`-owned:
+
+| Area | Files | Responsibility |
+|------|-------|----------------|
+| Control and identity truth | `sw-block/engine/replication/`, `weed/storage/blockvol/v2bridge/control.go` | assignment truth, stable identity, session truth |
+| Recovery ownership | `weed/server/block_recovery.go` | live runtime owner for catch-up/rebuild tasks |
+| Publication and health closure | `weed/server/master_block_registry.go`, `weed/server/master_block_failover.go` | what the system reports as ready, degraded, publishable |
+| External product surfaces | `weed/server/master_grpc_server_block.go`, `weed/server/master_server_handlers_block.go`, debug/diagnostic surfaces | operator-visible truth, not convenience guesses |
+
+Rule:
+
+- if the system exposes a condition to master, tester, CSI, or operator tooling, that condition must come from `V2`-named state
+
+## Assignment-To-Readiness Lifecycle
+
+The authoritative lifecycle for the current chosen path is:
+
+```text
+assignment delivered
+-> local role applied
+-> replica receiver or primary shipper configured
+-> readiness closed
+-> heartbeat publication
+-> master registry health/publication
+```
+
+More concretely:
+
+1. master intent is delivered
+2. `BlockService.ApplyAssignments()` applies local role truth
+3. the same path wires receiver/shipper runtime
+4. the same path records named readiness state
+5. heartbeat publishes only what is actually publish-healthy
+6. master registry derives lookup/health from explicit readiness, not from allocation alone
+
+## Named Readiness States
+
+For the current implementation slice, the service boundary now names:
+
+1. `roleApplied`
+2. `receiverReady`
+3. `shipperConfigured`
+4. `shipperConnected`
+5. `replicaEligible`
+6. `publishHealthy`
+
+Ownership:
+
+- owned by `BlockService` / adapter layer
+- observed by debug surfaces and heartbeat/publication logic
+- not delegated to `blockvol` as implicit mixed state
+
+## Current File Map
+
+### Reuse
+
+- `weed/storage/blockvol/blockvol.go`
+- `weed/storage/blockvol/flusher.go`
+- `weed/storage/blockvol/replica_apply.go`
+- `weed/storage/blockvol/wal_shipper.go`
+- `weed/storage/blockvol/shipper_group.go`
+- `weed/storage/blockvol/dist_group_commit.go`
+- `weed/storage/blockvol/iscsi/`
+- `weed/storage/blockvol/nvme/`
+
+### Adapter boundary
+
+- `weed/server/volume_server_block.go`
+- `weed/server/block_heartbeat_loop.go`
+- `weed/storage/store_blockvol.go`
+- `weed/server/volume_server_block_debug.go`
+
+### V2-owned replacement / truth
+
+- `weed/storage/blockvol/v2bridge/control.go`
+- `sw-block/engine/replication/`
+- `weed/server/block_recovery.go`
+- `weed/server/master_block_registry.go`
+- `weed/server/master_block_failover.go`
+- `weed/server/master_grpc_server_block.go`
+- `weed/server/master_server_handlers_block.go`
+
+## Immediate Engineering Rule
+
+When a new bug appears, classify it first:
+
+1. `v1 reusable core`: local storage or transport mechanics
+2. `adapter boundary`: assignment/readiness/publication closure bug
+3. `v2 replacement`: semantic authority, identity, ownership, eligibility, rebuild, or operator-visible truth
+
+Do not patch semantic authority directly into `blockvol` unless the same change is
+also reflected as an explicit `V2` state/rule at the service or registry layer.
+
+## Why This Matters For CP13-8
+
+`CP13-8` found the exact class of bug this split is meant to expose:
+
+- allocation/control truth said the replica existed
+- but runtime publication/read visibility was not yet closed
+
+That is not a reason to throw away `blockvol`.
+It is a reason to stop treating mixed `V1` runtime state as if it were already
+closed `V2` publication truth.
--- a/sw-block/design/v2_mini_core_design.md
+++ b/sw-block/design/v2_mini_core_design.md
@@ -0,0 +1,767 @@
+
+## 1. 设计目标
+
+目标不是立即重写 `blockvol`，而是建立一个**纯 `V2` 语义核心**，使它成为系统唯一的语义 authority：
+
+- `V2 core` 负责定义 truth、state、event、decision
+- `adapter` 负责把外部输入翻译成 `V2` 事件，并把 `V2` 决策翻译成 runtime/backend 调用
+- `V1 backend` 只保留执行能力，不再解释协议语义
+
+当前 chosen path 不变：
+
+- `RF=2`
+- `sync_all`
+- 当前 master / volume-server heartbeat path
+- `blockvol` 作为执行 backend
+
+这和已有设计是一致的，不是新改向。它只是把 `Phase 1-13` 一直在做的事显式化。
+
+---
+
+## 2. 核心分层
+
+### 2.1 `V2 Core`
+职责：
+
+- 持有控制真相
+- 持有恢复真相
+- 持有数据边界真相
+- 持有对外发布真相
+- 根据事件做决策
+- 输出 commands / projections
+
+### 2.2 Adapter Boundary
+职责：
+
+- 把 master / heartbeat / runtime observation 翻译成 `V2 event`
+- 把 `V2 command` 翻译成对 `blockvol` / transport / frontend 的调用
+- 把 backend/runtime 事实翻译成 `projection`
+
+### 2.3 `V1 Backend`
+职责：
+
+- WAL / extent / dirty map / flusher
+- receiver / shipper transport
+- iSCSI / NVMe frontend
+- rebuild install primitive
+
+规则：
+
+- backend 报告事实
+- core 解释语义
+- adapter 做翻译
+- 不允许 `blockvol` 的混合状态直接越权成为系统 truth
+
+---
+
+## 3. `V2 Core` 最小对象清单
+
+下面是“最小可行”的 `struct / event / command / projection` 集合。
+
+### 3.1 Struct 清单
+
+#### A. Control structs
+```go
+type VolumeIntent struct {
+    VolumeID string
+    Epoch uint64
+    PrimaryID string
+    ReplicaIDs []string
+    DurabilityMode string
+}
+
+type ReplicaIdentity struct {
+    ReplicaID string
+    ServerID string
+}
+
+type AssignmentView struct {
+    VolumeID string
+    Epoch uint64
+    Role RoleIntent
+    ReplicaEndpoints map[string]Endpoint
+}
+```
+
+职责：
+
+- 谁是 primary
+- 谁是 replica
+- 当前 epoch
+- stable identity 是谁
+- assignment intent 到底是什么
+
+#### B. Recovery structs
+```go
+type ReplicaState string
+const (
+    StateDisconnected ReplicaState = "disconnected"
+    StateConnecting   ReplicaState = "connecting"
+    StateCatchingUp   ReplicaState = "catching_up"
+    StateInSync       ReplicaState = "in_sync"
+    StateDegraded     ReplicaState = "degraded"
+    StateNeedsRebuild ReplicaState = "needs_rebuild"
+)
+
+type ReplicaSession struct {
+    SessionID string
+    ReplicaID string
+    Epoch uint64
+    Kind SessionKind
+    Active bool
+    Superseded bool
+}
+
+type RecoveryOwner struct {
+    ReplicaID string
+    SessionID string
+    Running bool
+}
+```
+
+职责：
+
+- 当前 replica 的恢复状态是什么
+- 当前 session 是谁
+- 谁拥有 recovery authority
+- 旧 session 是否已失效
+
+#### C. Data-boundary structs
+```go
+type BoundaryView struct {
+    CommittedLSN uint64
+    CheckpointLSN uint64
+    WALHeadLSN uint64
+    ReceivedLSN uint64
+    TargetLSN uint64
+    AchievedLSN uint64
+    SnapshotBaseLSN uint64
+}
+```
+
+职责：
+
+- durability boundary
+- catch-up target
+- rebuild target
+- stable base image
+- 实际已达到边界
+
+#### D. Publication structs
+```go
+type ReadinessView struct {
+    RoleApplied bool
+    ReceiverReady bool
+    ShipperConfigured bool
+    ShipperConnected bool
+    ReplicaEligible bool
+    PublishHealthy bool
+}
+
+type LookupProjection struct {
+    VolumeID string
+    PrimaryServer string
+    ISCSIAddr string
+    ReplicaReady bool
+    ReplicaDegraded bool
+}
+```
+
+职责：
+
+- 什么时候可以 publish
+- lookup/heartbeat 应该看到什么
+- “存在”与“ready”分离
+
+---
+
+## 4. `V2 Core` 最小事件清单
+
+### 4.1 Control events
+```go
+AssignmentDelivered
+EpochBumped
+RepeatedAssignmentDelivered
+IdentityResolved
+```
+
+### 4.2 Recovery events
+```go
+SessionCreated
+SessionSuperseded
+SessionRemoved
+CatchUpPlanned
+CatchUpCompleted
+RebuildStarted
+RebuildCommitted
+```
+
+### 4.3 Runtime observation events
+```go
+RoleApplied
+ReceiverStarted
+ReceiverReadyObserved
+ShipperConfiguredObserved
+ShipperConnectedObserved
+BarrierAccepted
+BarrierRejected
+```
+
+### 4.4 Data-boundary events
+```go
+CommittedLSNAdvanced
+CheckpointLSNAdvanced
+ReceivedLSNAdvanced
+AchievedLSNAdvanced
+RetentionEscalated
+```
+
+### 4.5 Publication events
+```go
+HeartbeatCollected
+PublicationProjected
+ReplicaMarkedReady
+ReplicaMarkedDegraded
+```
+
+原则：
+
+- event 是 observation 或 intent
+- event 不是直接结果承诺
+- `V2 core` 必须决定 event 的语义意义
+
+---
+
+## 5. `V2 Core` 最小 command 清单
+
+这些 command 是 `V2 core` 输出给 adapter 的，不直接碰 backend。
+
+```go
+type Command interface{}
+
+type ApplyRoleCommand struct {
+    VolumeID string
+    Epoch uint64
+    Role RoleIntent
+}
+
+type StartReceiverCommand struct {
+    VolumeID string
+    DataAddr string
+    CtrlAddr string
+}
+
+type ConfigureShipperCommand struct {
+    VolumeID string
+    Replicas []ReplicaEndpoint
+}
+
+type StartCatchUpCommand struct {
+    ReplicaID string
+    TargetLSN uint64
+}
+
+type StartRebuildCommand struct {
+    ReplicaID string
+    RebuildAddr string
+    SnapshotBaseLSN uint64
+}
+
+type PublishProjectionCommand struct {
+    VolumeID string
+    Readiness ReadinessView
+}
+
+type InvalidateSessionCommand struct {
+    ReplicaID string
+    Reason string
+}
+```
+
+原则：
+
+- command 只表达“该做什么”
+- backend 如何做，由 adapter 决定
+- command 不依赖 `blockvol` 内部字段
+
+---
+
+## 6. `V2 Core` 最小 projection 清单
+
+projection 是给外部世界看的，不是内部原始状态 dump。
+
+### 6.1 Master / Lookup projection
+- `PrimaryServer`
+- `ISCSIAddr`
+- `ReplicaReady`
+- `ReplicaDegraded`
+- `DurabilityMode`
+
+### 6.2 Heartbeat projection
+- 当前 role / epoch
+- boundary fields
+- readiness fields
+- transport degraded
+- receiver published addr
+
+### 6.3 Diagnostic projection
+- active recovery tasks
+- session ownership
+- publish gating reason
+- pending rebuild / deferred promotion reason
+
+### 6.4 Tester projection
+- `wait_volume_healthy` 不再只看 “replica exists”
+- 必须看 `replica_ready`
+- 必须区分:
+  - allocated
+  - role applied
+  - receiver ready
+  - publish healthy
+
+---
+
+## 7. 直接映射到仓库路径
+
+下面按“核心 / adapter / backend”映射。
+
+### 7.1 `V2 Core` 现有基础
+这些文件已经在扮演 core 的雏形。
+
+- `sw-block/engine/replication/registry.go`
+  - `AssignmentIntent`
+  - `AssignmentResult`
+  - `ReplicaAssignment`
+- `sw-block/engine/replication/session.go`
+  - session ownership / lifecycle
+- `sw-block/engine/replication/sender.go`
+  - sender state abstraction
+- `sw-block/engine/replication/orchestrator.go`
+  - assignment -> session/recovery orchestration
+- `sw-block/engine/replication/types.go`
+- `sw-block/engine/replication/outcome.go`
+- `sw-block/engine/replication/observe.go`
+- `weed/server/block_recovery.go`
+  - runtime owner
+- `weed/server/master_block_registry.go`
+  - publication truth / cluster registry truth
+- `weed/server/master_block_failover.go`
+  - failover/rebuild truth
+
+### 7.2 Adapter Boundary 现有基础
+- `weed/storage/blockvol/v2bridge/control.go`
+  - assignment -> engine intent
+- `weed/server/volume_server_block.go`
+  - assignment lifecycle / readiness closure / wiring
+- `weed/server/block_heartbeat_loop.go`
+  - heartbeat + assignment loop
+- `weed/server/volume_server_block_debug.go`
+  - readiness/debug projection
+- `weed/server/master_grpc_server_block.go`
+  - lookup projection
+- `weed/server/master_server_handlers_block.go`
+  - REST projection
+
+### 7.3 `V1 Backend` 现有基础
+- `weed/storage/blockvol/blockvol.go`
+- `weed/storage/blockvol/flusher.go`
+- `weed/storage/blockvol/replica_apply.go`
+- `weed/storage/blockvol/wal_shipper.go`
+- `weed/storage/blockvol/shipper_group.go`
+- `weed/storage/blockvol/dist_group_commit.go`
+- `weed/storage/blockvol/rebuild.go`
+- `weed/storage/blockvol/iscsi/`
+- `weed/storage/blockvol/nvme/`
+
+---
+
+## 8. 近期可做
+
+这里说的是“现在应该做”的，不是中长期重构幻想。
+
+### 8.1 让 `V2 core` 成为唯一 assignment 语义入口
+目标：
+
+- 所有 assignment 先进入 `V2 core`
+- `BlockService` 不再自己解释太多语义
+- `BlockService` 更像 command executor
+
+直接涉及文件：
+
+- `sw-block/engine/replication/orchestrator.go`
+- `weed/storage/blockvol/v2bridge/control.go`
+- `weed/server/volume_server_block.go`
+
+### 8.2 把 readiness 做成正式 projection，不只是 service 内部状态
+目标：
+
+- `roleApplied`
+- `receiverReady`
+- `shipperConfigured`
+- `shipperConnected`
+- `replicaEligible`
+- `publishHealthy`
+
+这些状态进入稳定 projection，而不是只在 debug 内可见。
+
+直接涉及文件：
+
+- `weed/server/volume_server_block.go`
+- `weed/server/master_block_registry.go`
+- `weed/server/master_grpc_server_block.go`
+- `weed/server/master_server_handlers_block.go`
+
+### 8.3 把 `wait_volume_healthy`、lookup、heartbeat 都统一到同一个 readiness 定义
+目标：
+
+- 不再出现：
+  - registry says healthy
+  - VS side not ready
+  - tester 误判 ready
+
+直接涉及文件：
+
+- `weed/storage/blockvol/testrunner/actions/devops.go`
+- `weed/server/master_block_registry.go`
+- `weed/server/volume_grpc_client_to_master.go`
+
+### 8.4 把 `CP13-8` 用作 adapter/publication closure 的真实验证
+目标：
+
+- 确认当前 failure 到底是：
+  - backend data bug
+  - adapter timing/publication bug
+  - core rule gap
+
+这一步是近期必须做的，因为它是 live contradiction。
+
+---
+
+## 9. 中期演进
+
+### 9.1 把 `V2 core` 真正做成 command/event 模式
+目标：
+
+- engine 输出 command
+- adapter 执行 command
+- runtime 返回 event
+- core 更新 state
+
+这会比现在 “ProcessAssignments 里又做判断又做执行” 更干净。
+
+### 9.2 把 `master_block_registry` 从“半业务逻辑半存储”收敛成 projection store
+目标：
+
+- registry 不负责猜测 semantics
+- registry 存放 `V2 projection`
+- 真正的语义判断放在 core
+
+### 9.3 把 backend 接口化
+候选接口：
+
+```go
+type StorageBackend interface {
+    StatusSnapshot() BoundaryView
+    SetRetentionFloor(...)
+}
+
+type TransportBackend interface {
+    StartReceiver(...)
+    ConfigureReplicas(...)
+    ShipperStates() ...
+}
+
+type RebuildBackend interface {
+    StartRebuild(...)
+    InstallSnapshot(...)
+}
+
+type FrontendBackend interface {
+    PublishISCSI(...)
+    PublishNVMe(...)
+}
+```
+
+### 9.4 让 `blockvol` 逐渐退化成 pure backend
+目标：
+
+- `blockvol` 不再持有系统级语义 authority
+- 它保留 local storage truth 和执行能力
+- 语义解释都上提到 `V2 core`
+
+---
+
+## 10. 如何保持前面建立的约束和 envelope
+
+这部分最重要。
+
+### 10.1 已接受约束必须升格为 core invariants
+前面 `CP13-1..7` 不能只留在测试里，必须进入 core 规则：
+
+- canonical identity
+- durable progress truth
+- only eligible replicas count
+- reconnect handshake
+- retention fail-closed
+- rebuild fallback
+
+### 10.2 envelope 不允许被 core 重构顺手扩大
+继续固定：
+
+- `RF=2`
+- `sync_all`
+- 当前 heartbeat/gRPC path
+- `blockvol` backend
+
+不要因为做架构分层，就顺手放大：
+- `RF>2`
+- broader transport matrix
+- broader rollout claim
+
+### 10.3 每一步都做双重验收
+每个演进 slice 必须同时证明：
+
+- 旧约束仍成立
+- 新边界更显式、更少混态
+
+### 10.4 禁止“抽象重构先降低 bar”
+不能接受：
+- 为了重构，暂时弱化 fail-closed
+- 为了抽接口，暂时模糊 readiness
+- 为了分层，暂时把 publish truth 放宽
+
+---
+
+## 11. V2 如何从已接受 claim 变得可靠
+
+`V2 core` 的可靠性不是来自“状态更多”或“架构更复杂”。
+它的可靠性来自两个更严格的来源：
+
+1. 只消费已经进入 claim/evidence ledger 的 accepted constraints
+2. 只复用 `V1` 中已知可靠的实现行为，而不继承其隐含语义
+
+### 11.1 `V2` 不是重新发明 truth，而是消费已接受 truth
+
+`V2 core` 不应该把当前 runtime 中“看起来通常有效”的行为直接提升为语义 truth。
+
+它只能建立在已经被 claim / evidence 支撑的约束上，例如：
+
+- canonical identity
+- durable progress authority = `replicaFlushedLSN`
+- only eligible replica may satisfy sync durability
+- reconnect must use explicit handshake / catch-up
+- retention must fail closed
+- unrecoverable gap must escalate to `NeedsRebuild`
+
+这些约束不是设计说明的附属物，而应该是 `V2 core` 的输入边界。
+
+换句话说：
+
+- 能进入 core 的，只能是已经被 ledger 接受的 truth
+- 没有进入 ledger 的运行假设，不能直接成为 core 依赖
+
+### 11.2 claim 是 core 的输入约束，不只是 review 文档
+
+`V2 core` 的一个基本规则是：
+
+> 任何没有被 claim/evidence 接受的行为，只能作为 observation，不能作为 authority。
+
+例如，下面这些不能直接进入 `V2` truth：
+
+- “第一次写通常会触发 shipper 连接”
+- “promote 之后 replication 大概会自己恢复”
+- “`degraded=false` 大概就表示 ready”
+- “有 published addr 就说明 replica 可用”
+
+这些都只能当作 runtime observation，必须再经过 `V2` 的 readiness / eligibility / publication 规则过滤。
+
+### 11.3 `V2` 的可靠性来自 fail-closed，而不是隐式收敛
+
+`V1` 常见的问题不是“完全不能工作”，而是很多语义靠时序和重试隐式收敛：
+
+- assignment delivered 之后，何时真正 ready
+- promote 之后，何时真正恢复 replication
+- `sync_all` 何时真的表示 cross-node durability
+
+`V2 core` 必须拒绝从这些模糊状态直接宣布 success。
+
+它应该采用更硬的规则：
+
+- 不满足 accepted claim 的条件时，保持 not-ready / degraded / blocked
+- 不允许从 convenience state 猜测 publish healthy
+- 不允许用工作负载本身去“顺便推动系统进入正确状态”
+
+一句话：
+
+- `V2` 的可靠性来自明确边界 + fail-closed
+- 不来自“系统大概率最终会自己好起来”
+
+### 11.4 `V2 core` 使用哪些 accepted claim
+
+| Claim / Constraint | 用于 core 的哪里 | 作用 |
+|---|---|---|
+| canonical identity | control truth | 不再从地址猜身份 |
+| durable progress = `replicaFlushedLSN` | boundary truth | 不再从 success/ack 猜 durability |
+| eligible-only barrier | readiness / publication | 不让非闭环 replica 参与 durability |
+| reconnect handshake | recovery truth | 不再靠第一次写触发隐式恢复 |
+| retention fail-closed | recovery truth | 不让 lagging replica 以模糊状态长期存在 |
+| rebuild fallback | fail-closed policy | gap 不再长期悬挂在 degraded |
+
+这些 claim 越清楚，`V2 core` 就越可靠。
+
+---
+
+## 12. `V2` 复用哪些 `V1` 可靠行为
+
+`V2` 不是完全抛弃 `V1`。
+它复用的是 `V1` 中已经被证明是局部可靠、实现性稳定的行为。
+
+但必须明确区分：
+
+- **可复用的可靠行为**
+- **不应继续复用的旧语义**
+
+### 12.1 可复用的可靠行为
+
+这些行为可以继续作为 backend primitive 使用：
+
+| `V1` behavior | 为什么可复用 | 为什么不构成语义 authority |
+|---|---|---|
+| WAL append / read | 局部实现、可验证 | 不决定外部 durability meaning |
+| flusher / checkpoint | 局部物化机制 | 不决定 cluster-level readiness |
+| dirty-map local read/write | 局部一致性行为 | 不决定 publication truth |
+| receiver transport | 纯执行路径 | 不决定 session authority |
+| shipper transport | 纯传输机制 | 不决定 eligibility / publish truth |
+| rebuild installer / extent install | 局部 install primitive | 不决定 rebuild policy |
+| iSCSI / NVMe serving | frontend primitive | 不决定 replicated visibility truth |
+
+这些行为的共同特征是：
+
+1. 局部
+2. 可测试
+3. 不依赖 cluster-level 推断
+4. 不应该自己解释系统语义
+
+### 12.2 不继续复用的 `V1` 语义
+
+下面这些即使在 `V1` 中曾经“工作过”，也不应该进入 `V2` truth：
+
+- ready from existence
+- healthy from non-empty publication
+- `sync_all` from vacuous barrier success
+- promote implies replication closure
+- assignment arrival implies runtime closure
+- first write implicitly fixes transport state
+
+`V2` 可以复用 `V1` 的动作，但不能继承这些旧语义。
+
+### 12.3 `weed/` 中当前改动的地位
+
+当前 branch 中 `weed/` 的很多改动更接近：
+
+- 现象验证
+- integration closure
+- debug/diagnostic surfaces
+- 暂时性 runtime fix
+
+它们的价值在于暴露现实、定位问题、验证边界。
+
+但长期语义 authority 不应该放在这些改动本身上。
+
+长期可保留的，应当是那些已经被 `V2` 吸收为：
+
+- backend primitive
+- adapter boundary
+- projection surface
+
+的部分。
+
+---
+
+## 13. 长期资产 vs 当前实现现实
+
+当前项目中最重要的区分不是“哪个文件在跑”，而是“哪个资产值得长期保留”。
+
+### 13.1 长期资产
+
+长期需要保留和演进的是 `sw-block/` 中的 `V2` 语义资产：
+
+- `v2-protocol-truths.md`
+- `v2-protocol-closure-map.zh.md`
+- `v2-protocol-claim-and-evidence.md`
+- `v2-reuse-replacement-boundary.md`
+- `v2_mini_core_design.md`
+- `sw-block/engine/replication/` 中逐步成形的 core semantics
+
+这些资产定义的是：
+
+- truth
+- claim
+- closure
+- reliability model
+- reusable semantics
+
+### 13.2 当前实现现实
+
+`weed/` 中的当前改动更多代表：
+
+- 当前 chosen path 的运行现实
+- backend / adapter / publication 的实现尝试
+- 用来暴露矛盾和验证边界的现实载体
+
+因此，它们不应该被自动视为长期保留资产。
+
+更准确的原则是：
+
+- `weed/` 中的改动，只有在被 `V2` 语义明确吸收之后，才应该作为长期实现保留
+- 否则，它们可以只是阶段性的验证资产
+
+### 13.3 一个总规则
+
+> `V2 core` 的可靠性不是来自信任当前 `weed/` 分支实现。
+> 它来自只消费被 claim/evidence ledger 接受的约束，并只复用 `V1` 中已知可靠的实现行为。
+> 因此，`weed/` 中的当前改动可以是临时验证资产，而 `sw-block/` 中的 truth / claim / core design 才是长期保留资产。
+
+这个规则的好处是：
+
+1. 不会因为当前 integration patch 看起来有效，就把它误当成长期语义
+2. 不会因为 `V1` 仍被复用，就把旧混态继续当作 authority
+3. 后续收敛分支时，可以明确区分：
+   - 哪些东西应进入长期 `V2` 资产
+   - 哪些东西只是当前实现现实
+
+---
+
+## 14. 推荐的实施顺序
+
+### 近期
+1. 继续收紧 assignment -> readiness -> publication closure
+2. 用 `CP13-8` 证明当前 split 能否识别真实 bug 类型
+3. 把 readiness / projection 固化为稳定 surface
+
+### 中期
+1. 引入 command/event 风格的 `V2 core`
+2. 减少 `BlockService` 中的语义判断
+3. 把 registry 收敛为 projection store
+4. 抽 backend interface
+
+### 更后面
+1. 评估是否需要物理独立的 `V2 core process`
+2. 如果需要，那是因为逻辑已经独立，不是为了“好看”
+
+---
+
+## 15. 最短结论
+
+你要的“更工程化”版本可以归纳为一句话：
+
+- `V2 core` 负责定义 truth、state、event、command、projection
+- `adapter` 负责隔离 `V1` 污染并翻译输入输出
+- `V1 backend` 负责 WAL / transport / frontend / rebuild 执行
+- 后续 phase 的方向不是换路线，而是把这件事一步步显式化并固化成正式结构
+
+如果你愿意，下一步我可以继续给你一版更像真正设计文档里的内容：
+
+- `V2 core` 的 Go package 目录建议
+- 每个 struct/event/command 放在哪个文件
+- 一个最小 `ApplyEvent() -> Decide() -> EmitCommands()` 伪代码骨架
--- a/weed/server/block_heartbeat_loop.go
+++ b/weed/server/block_heartbeat_loop.go
@@ -93,7 +93,7 @@ func (c *BlockVolumeHeartbeatCollector) Run() {
 		select {
 		case <-ticker.C:
 			// Outbound: collect and report status.
-			msgs := c.blockService.Store().CollectBlockVolumeHeartbeat()
+			msgs := c.blockService.CollectBlockVolumeHeartbeat()
 			c.safeCallback(msgs)
 			// Inbound: process any pending assignments.
 			c.processAssignments()
@@ -115,7 +115,7 @@ func (c *BlockVolumeHeartbeatCollector) processAssignments() {
 	if len(assignments) == 0 {
 		return
 	}
-	errs := c.blockService.Store().ProcessBlockVolumeAssignments(assignments)
+	errs := c.blockService.ApplyAssignments(assignments)
 	c.cbMu.Lock()
 	cb := c.assignmentCallback
 	c.cbMu.Unlock()
--- a/weed/server/block_heartbeat_loop_test.go
+++ b/weed/server/block_heartbeat_loop_test.go
@@ -463,6 +463,42 @@ func TestBlockAssign_NilSource(t *testing.T) {
 	}
 }

+// TestBlockAssign_CollectorUsesAuthoritativeLifecycle verifies the heartbeat
+// collector now drives the full BlockService assignment path, not the store-only
+// role path. A replica assignment must start the receiver and close publish
+// readiness.
+func TestBlockAssign_CollectorUsesAuthoritativeLifecycle(t *testing.T) {
+	bs := newTestBlockService(t)
+	path := testBlockVolPath(t, bs)
+
+	collector := NewBlockVolumeHeartbeatCollector(bs, 5*time.Millisecond)
+	collector.SetAssignmentSource(func() []blockvol.BlockVolumeAssignment {
+		return []blockvol.BlockVolumeAssignment{{
+			Path:            path,
+			Epoch:           1,
+			Role:            uint32(blockvol.RoleReplica),
+			ReplicaDataAddr: ":0",
+			ReplicaCtrlAddr: ":0",
+		}}
+	})
+	go collector.Run()
+	defer collector.Stop()
+
+	deadline := time.After(500 * time.Millisecond)
+	for {
+		dataAddr, ctrlAddr := bs.GetReplState(path)
+		readiness := bs.ReadinessSnapshot(path)
+		if dataAddr != "" && ctrlAddr != "" && readiness.ReceiverReady && readiness.PublishHealthy {
+			return
+		}
+		select {
+		case <-deadline:
+			t.Fatalf("collector did not start replica receiver: data=%q ctrl=%q readiness=%+v", dataAddr, ctrlAddr, readiness)
+		case <-time.After(10 * time.Millisecond):
+		}
+	}
+}
+
 // TestBlockAssign_MixedBatch verifies a batch with 1 success, 1 unknown volume,
 // and 1 invalid transition returns parallel errors correctly.
 func TestBlockAssign_MixedBatch(t *testing.T) {
--- a/weed/server/master_block_observability_test.go
+++ b/weed/server/master_block_observability_test.go
@@ -148,7 +148,8 @@ func TestClusterHealthSummary(t *testing.T) {
 		Path:          "/data/healthy.blk",
 		Role:          blockvol.RoleToWire(blockvol.RolePrimary),
 		ReplicaFactor: 2,
-		Replicas:      []ReplicaInfo{{Server: "vs2:9333", Role: blockvol.RoleToWire(blockvol.RoleReplica)}},
+		ReplicaReady:  true,
+		Replicas:      []ReplicaInfo{{Server: "vs2:9333", Role: blockvol.RoleToWire(blockvol.RoleReplica), Ready: true}},
 		Status:        StatusActive,
 	})

@@ -188,7 +189,8 @@ func TestBlockStatusHandler_IncludesHealthCounts(t *testing.T) {
 		Path:          "/data/status.blk",
 		Role:          blockvol.RoleToWire(blockvol.RolePrimary),
 		ReplicaFactor: 2,
-		Replicas:      []ReplicaInfo{{Server: "vs2:9333", Role: blockvol.RoleToWire(blockvol.RoleReplica)}},
+		ReplicaReady:  true,
+		Replicas:      []ReplicaInfo{{Server: "vs2:9333", Role: blockvol.RoleToWire(blockvol.RoleReplica), Ready: true}},
 		Status:        StatusActive,
 	})

--- a/weed/server/master_block_registry_test.go
+++ b/weed/server/master_block_registry_test.go
@@ -1965,3 +1965,45 @@ func TestRegistry_InflightBlocksAutoRegister(t *testing.T) {
 		t.Fatalf("replica health not updated after inflight released: %f", entry.Replicas[0].HealthScore)
 	}
 }
+
+func TestRegistry_ReplicaReadyRequiresReplicaHeartbeat(t *testing.T) {
+	r := NewBlockVolumeRegistry()
+	if err := r.Register(&BlockVolumeEntry{
+		Name:         "vol-ready",
+		VolumeServer: "primary-server:8080",
+		Path:         "/blocks/vol-ready.blk",
+		Status:       StatusActive,
+		Replicas: []ReplicaInfo{{
+			Server: "replica-server:8080",
+			Path:   "/blocks/vol-ready.blk",
+		}},
+	}); err != nil {
+		t.Fatalf("register: %v", err)
+	}
+
+	entry, _ := r.Lookup("vol-ready")
+	if entry.ReplicaReady {
+		t.Fatal("replica should not be ready before replica heartbeat confirms publication")
+	}
+	if !entry.ReplicaDegraded {
+		t.Fatal("volume should remain degraded until replica readiness closes")
+	}
+
+	r.UpdateFullHeartbeat("replica-server:8080", []*master_pb.BlockVolumeInfoMessage{{
+		Path:            "/blocks/vol-ready.blk",
+		Epoch:           1,
+		Role:            uint32(blockvol.RoleReplica),
+		VolumeSize:      1 << 30,
+		HealthScore:     0.9,
+		ReplicaDataAddr: "10.0.0.2:14260",
+		ReplicaCtrlAddr: "10.0.0.2:14261",
+	}}, "")
+
+	entry, _ = r.Lookup("vol-ready")
+	if !entry.Replicas[0].Ready {
+		t.Fatal("replica heartbeat with published receiver addresses should mark replica ready")
+	}
+	if !entry.ReplicaReady {
+		t.Fatal("aggregate replica readiness should become true after replica heartbeat")
+	}
+}
--- a/weed/server/master_server_handlers_block.go
+++ b/weed/server/master_server_handlers_block.go
@@ -394,6 +394,7 @@ func entryToVolumeInfo(e *BlockVolumeEntry, primaryAlive bool) blockapi.VolumeIn
 		ReplicaDataAddr:  e.ReplicaDataAddr,
 		ReplicaCtrlAddr:  e.ReplicaCtrlAddr,
 		ReplicaFactor:    rf,
+		ReplicaReady:     e.ReplicaReady,
 		HealthScore:      e.HealthScore,
 		ReplicaDegraded:  e.ReplicaDegraded,
 		DurabilityMode:   durMode,
@@ -407,6 +408,7 @@ func entryToVolumeInfo(e *BlockVolumeEntry, primaryAlive bool) blockapi.VolumeIn
 			Server:      ri.Server,
 			ISCSIAddr:   ri.ISCSIAddr,
 			IQN:         ri.IQN,
+			Ready:       ri.Ready,
 			HealthScore: ri.HealthScore,
 			WALLag:      ri.WALLag,
 		})
--- a/weed/server/volume_server_block.go
+++ b/weed/server/volume_server_block.go
@@ -24,6 +24,23 @@ type volReplState struct {
 	replicaCtrlAddr string
 	// allReplicas stores the full replica set for multi-replica idempotence.
 	allReplicas []blockvol.ReplicaAddr
+	roleApplied      bool
+	receiverReady    bool
+	shipperConfigured bool
+	replicaEligible  bool
+	publishHealthy   bool
+}
+
+// BlockReadinessSnapshot names the assignment-to-publication closure at the
+// BlockService boundary. These flags are owned by the service/adapter layer,
+// not by blockvol's local storage mechanics.
+type BlockReadinessSnapshot struct {
+	RoleApplied       bool
+	ReceiverReady     bool
+	ShipperConfigured bool
+	ShipperConnected  bool
+	ReplicaEligible   bool
+	PublishHealthy    bool
 }

 // NVMeConfig holds NVMe/TCP target configuration passed from CLI flags.
@@ -373,6 +390,15 @@ func (bs *BlockService) DeleteBlockVol(name string) error {
 // ProcessAssignments applies assignments from master, including replication setup.
 // V2 bridge: also delivers each assignment to the V2 engine for recovery ownership.
 func (bs *BlockService) ProcessAssignments(assignments []blockvol.BlockVolumeAssignment) {
+	_ = bs.ApplyAssignments(assignments)
+}
+
+// ApplyAssignments applies assignments through the single authoritative
+// BlockService lifecycle: role apply, replication wiring, and publication
+// readiness bookkeeping. Returns per-assignment errors parallel to the input.
+func (bs *BlockService) ApplyAssignments(assignments []blockvol.BlockVolumeAssignment) []error {
+	errs := make([]error, len(assignments))
+
 	// V2 bridge: convert and deliver to engine orchestrator (Phase 08 P1).
 	// P3: skip V2 processing for repeated unchanged assignments.
 	// P4: RecoveryManager starts/cancels recovery goroutines based on results.
@@ -400,9 +426,9 @@ func (bs *BlockService) ProcessAssignments(assignments []blockvol.BlockVolumeAss

 	// V1 processing (requires blockStore).
 	if bs.blockStore == nil {
-		return
+		return errs
 	}
-	for _, a := range assignments {
+	for i, a := range assignments {
 		role := blockvol.RoleFromWire(a.Role)
 		ttl := blockvol.LeaseTTLFromWire(a.LeaseTtlMs)

@@ -410,22 +436,30 @@ func (bs *BlockService) ProcessAssignments(assignments []blockvol.BlockVolumeAss
 		if err := bs.blockStore.WithVolume(a.Path, func(vol *blockvol.BlockVol) error {
 			return vol.HandleAssignment(a.Epoch, role, ttl)
 		}); err != nil {
+			errs[i] = err
 			glog.Warningf("block service: assignment %s epoch=%d role=%s: %v", a.Path, a.Epoch, role, err)
 			continue
 		}
+		bs.noteRoleApplied(a.Path, role)

 		// 2. Replication setup based on role + addresses.
 		switch role {
 		case blockvol.RolePrimary:
 			// CP8-2: ReplicaAddrs (multi-replica) takes precedence over scalar fields.
 			if len(a.ReplicaAddrs) > 0 {
-				bs.setupPrimaryReplicationMulti(a.Path, a.ReplicaAddrs)
+				if err := bs.setupPrimaryReplicationMulti(a.Path, a.ReplicaAddrs); err != nil {
+					errs[i] = err
+				}
 			} else if a.ReplicaDataAddr != "" && a.ReplicaCtrlAddr != "" {
-				bs.setupPrimaryReplication(a.Path, a.ReplicaDataAddr, a.ReplicaCtrlAddr)
+				if err := bs.setupPrimaryReplication(a.Path, a.ReplicaDataAddr, a.ReplicaCtrlAddr); err != nil {
+					errs[i] = err
+				}
 			}
 		case blockvol.RoleReplica:
 			if a.ReplicaDataAddr != "" && a.ReplicaCtrlAddr != "" {
-				bs.setupReplicaReceiver(a.Path, a.ReplicaDataAddr, a.ReplicaCtrlAddr)
+				if err := bs.setupReplicaReceiver(a.Path, a.ReplicaDataAddr, a.ReplicaCtrlAddr); err != nil {
+					errs[i] = err
+				}
 			}
 		case blockvol.RoleRebuilding:
 			if a.RebuildAddr != "" {
@@ -433,18 +467,23 @@ func (bs *BlockService) ProcessAssignments(assignments []blockvol.BlockVolumeAss
 			}
 		}
 	}
+	return errs
 }

 // setupPrimaryReplication configures WAL shipping from primary to replica
 // and starts the rebuild server (R1-2).
-func (bs *BlockService) setupPrimaryReplication(path, replicaDataAddr, replicaCtrlAddr string) {
+func (bs *BlockService) setupPrimaryReplication(path, replicaDataAddr, replicaCtrlAddr string) error {
 	// P3 idempotence: skip if replica state is unchanged.
 	bs.replMu.RLock()
 	existing := bs.replStates[path]
 	bs.replMu.RUnlock()
 	if existing != nil && existing.replicaDataAddr == replicaDataAddr && existing.replicaCtrlAddr == replicaCtrlAddr {
 		// Unchanged repeated assignment — idempotent, no side effects.
-		return
+		bs.markPrimaryTransportConfigured(path, []blockvol.ReplicaAddr{{
+			DataAddr: replicaDataAddr,
+			CtrlAddr: replicaCtrlAddr,
+		}})
+		return nil
 	}

 	// Compute deterministic rebuild listen address.
@@ -465,27 +504,19 @@ func (bs *BlockService) setupPrimaryReplication(path, replicaDataAddr, replicaCt
 		return nil
 	}); err != nil {
 		glog.Warningf("block service: setup primary replication %s: %v", path, err)
-		return
+		return err
 	}
-	// Track replication state for heartbeat reporting (R1-4).
-	// These addresses are what the primary ships to — they come from the
-	// master's assignment. They should already be canonical (from
-	// AllocateBlockVolumeResponse), but if not, they'll be reported as-is.
-	bs.replMu.Lock()
-	if bs.replStates == nil {
-		bs.replStates = make(map[string]*volReplState)
-	}
-	bs.replStates[path] = &volReplState{
-		replicaDataAddr: replicaDataAddr,
-		replicaCtrlAddr: replicaCtrlAddr,
-	}
-	bs.replMu.Unlock()
+	bs.markPrimaryTransportConfigured(path, []blockvol.ReplicaAddr{{
+		DataAddr: replicaDataAddr,
+		CtrlAddr: replicaCtrlAddr,
+	}})
 	glog.V(0).Infof("block service: primary %s shipping WAL to %s/%s (rebuild=%s)", path, replicaDataAddr, replicaCtrlAddr, rebuildAddr)
+	return nil
 }

 // setupPrimaryReplicationMulti configures WAL shipping from primary to N replicas
 // using SetReplicaAddrs (CP8-2: multi-replica support).
-func (bs *BlockService) setupPrimaryReplicationMulti(path string, addrs []blockvol.ReplicaAddr) {
+func (bs *BlockService) setupPrimaryReplicationMulti(path string, addrs []blockvol.ReplicaAddr) error {
 	// P3 idempotence: skip if ALL replica addresses unchanged.
 	// Compare full replica set, not just the first entry.
 	if len(addrs) > 0 {
@@ -493,7 +524,8 @@ func (bs *BlockService) setupPrimaryReplicationMulti(path string, addrs []blockv
 		existing := bs.replStates[path]
 		bs.replMu.RUnlock()
 		if existing != nil && bs.multiReplicaUnchanged(path, addrs) {
-			return
+			bs.markPrimaryTransportConfigured(path, addrs)
+			return nil
 		}
 	}

@@ -513,30 +545,15 @@ func (bs *BlockService) setupPrimaryReplicationMulti(path string, addrs []blockv
 		return nil
 	}); err != nil {
 		glog.Warningf("block service: setup primary replication (multi) %s: %v", path, err)
-		return
+		return err
 	}
-	// Track replication state for heartbeat reporting.
-	bs.replMu.Lock()
-	if bs.replStates == nil {
-		bs.replStates = make(map[string]*volReplState)
-	}
-	// Store full replica set + first replica for backward compat heartbeat.
-	if len(addrs) > 0 {
-		// Copy the addrs slice to avoid aliasing.
-		copied := make([]blockvol.ReplicaAddr, len(addrs))
-		copy(copied, addrs)
-		bs.replStates[path] = &volReplState{
-			replicaDataAddr: addrs[0].DataAddr,
-			replicaCtrlAddr: addrs[0].CtrlAddr,
-			allReplicas:     copied,
-		}
-	}
-	bs.replMu.Unlock()
+	bs.markPrimaryTransportConfigured(path, addrs)
 	glog.V(0).Infof("block service: primary %s shipping WAL to %d replicas (rebuild=%s)", path, len(addrs), rebuildAddr)
+	return nil
 }

 // setupReplicaReceiver starts the replica WAL receiver.
-func (bs *BlockService) setupReplicaReceiver(path, dataAddr, ctrlAddr string) {
+func (bs *BlockService) setupReplicaReceiver(path, dataAddr, ctrlAddr string) error {
 	// CP13-2: Pass the routable advertisedIP (from -ip flag, NOT from -id/serverID)
 	// so wildcard-bind listeners resolve to a real IP, not an opaque identity string.
 	var canonDataAddr, canonCtrlAddr string
@@ -559,7 +576,7 @@ func (bs *BlockService) setupReplicaReceiver(path, dataAddr, ctrlAddr string) {
 		return nil
 	}); err != nil {
 		glog.Warningf("block service: setup replica receiver %s: %v", path, err)
-		return
+		return err
 	}
 	// Fallback to assignment addresses if receiver didn't report.
 	if canonDataAddr == "" {
@@ -568,16 +585,9 @@ func (bs *BlockService) setupReplicaReceiver(path, dataAddr, ctrlAddr string) {
 	if canonCtrlAddr == "" {
 		canonCtrlAddr = ctrlAddr
 	}
-	bs.replMu.Lock()
-	if bs.replStates == nil {
-		bs.replStates = make(map[string]*volReplState)
-	}
-	bs.replStates[path] = &volReplState{
-		replicaDataAddr: canonDataAddr,
-		replicaCtrlAddr: canonCtrlAddr,
-	}
-	bs.replMu.Unlock()
+	bs.markReceiverReady(path, canonDataAddr, canonCtrlAddr)
 	glog.V(0).Infof("block service: replica %s receiving on %s/%s", path, canonDataAddr, canonCtrlAddr)
+	return nil
 }

 // startRebuild starts a rebuild in the background.
@@ -722,8 +732,10 @@ func (bs *BlockService) CollectBlockVolumeHeartbeat() []blockvol.BlockVolumeInfo
 	defer bs.replMu.RUnlock()
 	for i := range msgs {
 		if s, ok := bs.replStates[msgs[i].Path]; ok {
-			msgs[i].ReplicaDataAddr = s.replicaDataAddr
-			msgs[i].ReplicaCtrlAddr = s.replicaCtrlAddr
+			if s.publishHealthy {
+				msgs[i].ReplicaDataAddr = s.replicaDataAddr
+				msgs[i].ReplicaCtrlAddr = s.replicaCtrlAddr
+			}
 		}
 		// NVMe publication: report nvme_addr and nqn if NVMe target is running.
 		if bs.nvmeListenAddr != "" {
@@ -758,6 +770,108 @@ func (bs *BlockService) multiReplicaUnchanged(path string, addrs []blockvol.Repl
 	return true
 }

+func (bs *BlockService) ensureReplStateLocked(path string) *volReplState {
+	if bs.replStates == nil {
+		bs.replStates = make(map[string]*volReplState)
+	}
+	state := bs.replStates[path]
+	if state == nil {
+		state = &volReplState{}
+		bs.replStates[path] = state
+	}
+	return state
+}
+
+func (bs *BlockService) noteRoleApplied(path string, role blockvol.Role) {
+	bs.replMu.Lock()
+	defer bs.replMu.Unlock()
+	state := bs.ensureReplStateLocked(path)
+	state.roleApplied = true
+	switch role {
+	case blockvol.RoleReplica:
+		state.receiverReady = false
+		state.shipperConfigured = false
+		state.replicaEligible = false
+		state.publishHealthy = false
+	case blockvol.RolePrimary:
+		state.receiverReady = false
+		state.shipperConfigured = false
+		state.replicaEligible = false
+		state.publishHealthy = true
+	case blockvol.RoleRebuilding:
+		state.receiverReady = false
+		state.shipperConfigured = false
+		state.replicaEligible = false
+		state.publishHealthy = false
+	default:
+		state.receiverReady = false
+		state.shipperConfigured = false
+		state.replicaEligible = false
+		state.publishHealthy = false
+		state.replicaDataAddr = ""
+		state.replicaCtrlAddr = ""
+		state.allReplicas = nil
+	}
+}
+
+func (bs *BlockService) markPrimaryTransportConfigured(path string, addrs []blockvol.ReplicaAddr) {
+	bs.replMu.Lock()
+	defer bs.replMu.Unlock()
+	state := bs.ensureReplStateLocked(path)
+	state.shipperConfigured = len(addrs) > 0
+	state.publishHealthy = true
+	state.replicaEligible = false
+	state.receiverReady = false
+	if len(addrs) == 0 {
+		state.replicaDataAddr = ""
+		state.replicaCtrlAddr = ""
+		state.allReplicas = nil
+		return
+	}
+	copied := make([]blockvol.ReplicaAddr, len(addrs))
+	copy(copied, addrs)
+	state.allReplicas = copied
+	state.replicaDataAddr = addrs[0].DataAddr
+	state.replicaCtrlAddr = addrs[0].CtrlAddr
+}
+
+func (bs *BlockService) markReceiverReady(path, dataAddr, ctrlAddr string) {
+	bs.replMu.Lock()
+	defer bs.replMu.Unlock()
+	state := bs.ensureReplStateLocked(path)
+	state.receiverReady = true
+	state.replicaEligible = true
+	state.publishHealthy = true
+	state.shipperConfigured = false
+	state.replicaDataAddr = dataAddr
+	state.replicaCtrlAddr = ctrlAddr
+	state.allReplicas = nil
+}
+
+// ReadinessSnapshot reports the service-owned assignment/readiness closure for
+// one volume. It keeps v2 publication truth above blockvol's local mechanics.
+func (bs *BlockService) ReadinessSnapshot(path string) BlockReadinessSnapshot {
+	snap := BlockReadinessSnapshot{}
+	bs.replMu.RLock()
+	state := bs.replStates[path]
+	if state != nil {
+		snap.RoleApplied = state.roleApplied
+		snap.ReceiverReady = state.receiverReady
+		snap.ShipperConfigured = state.shipperConfigured
+		snap.ReplicaEligible = state.replicaEligible
+		snap.PublishHealthy = state.publishHealthy
+	}
+	bs.replMu.RUnlock()
+	if !snap.ShipperConfigured || bs.blockStore == nil {
+		return snap
+	}
+	_ = bs.blockStore.WithVolume(path, func(vol *blockvol.BlockVol) error {
+		snap.ShipperConnected = len(vol.ReplicaShipperStates()) > 0 && !vol.Status().ReplicaDegraded
+		return nil
+	})
+	return snap
+}
+
 // --- P3: Assignment idempotence ---

 // lastAppliedAssignment stores the full assignment for idempotence comparison.
--- a/weed/server/volume_server_block_debug.go
+++ b/weed/server/volume_server_block_debug.go
@@ -17,13 +17,19 @@ type ShipperDebugInfo struct {

 // BlockVolumeDebugInfo is the real-time block volume state.
 type BlockVolumeDebugInfo struct {
-	Path      string             `json:"path"`
-	Role      string             `json:"role"`
-	Epoch     uint64             `json:"epoch"`
-	HeadLSN   uint64             `json:"head_lsn"`
-	Degraded  bool               `json:"degraded"`
-	Shippers  []ShipperDebugInfo `json:"shippers,omitempty"`
-	Timestamp string             `json:"timestamp"`
+	Path              string             `json:"path"`
+	Role              string             `json:"role"`
+	Epoch             uint64             `json:"epoch"`
+	HeadLSN           uint64             `json:"head_lsn"`
+	Degraded          bool               `json:"degraded"`
+	RoleApplied       bool               `json:"role_applied"`
+	ReceiverReady     bool               `json:"receiver_ready"`
+	ShipperConfigured bool               `json:"shipper_configured"`
+	ShipperConnected  bool               `json:"shipper_connected"`
+	ReplicaEligible   bool               `json:"replica_eligible"`
+	PublishHealthy    bool               `json:"publish_healthy"`
+	Shippers          []ShipperDebugInfo `json:"shippers,omitempty"`
+	Timestamp         string             `json:"timestamp"`
 }

 // debugBlockShipperHandler returns real-time shipper state for all block volumes.
@@ -48,13 +54,20 @@ func (vs *VolumeServer) debugBlockShipperHandler(w http.ResponseWriter, r *http.
 	var infos []BlockVolumeDebugInfo
 	store.IterateBlockVolumes(func(path string, vol *blockvol.BlockVol) {
 		status := vol.Status()
+		readiness := vs.blockService.ReadinessSnapshot(path)
 		info := BlockVolumeDebugInfo{
-			Path:      path,
-			Role:      status.Role.String(),
-			Epoch:     status.Epoch,
-			HeadLSN:   status.WALHeadLSN,
-			Degraded:  status.ReplicaDegraded,
-			Timestamp: time.Now().UTC().Format(time.RFC3339Nano),
+			Path:              path,
+			Role:              status.Role.String(),
+			Epoch:             status.Epoch,
+			HeadLSN:           status.WALHeadLSN,
+			Degraded:          status.ReplicaDegraded,
+			RoleApplied:       readiness.RoleApplied,
+			ReceiverReady:     readiness.ReceiverReady,
+			ShipperConfigured: readiness.ShipperConfigured,
+			ShipperConnected:  readiness.ShipperConnected,
+			ReplicaEligible:   readiness.ReplicaEligible,
+			PublishHealthy:    readiness.PublishHealthy,
+			Timestamp:         time.Now().UTC().Format(time.RFC3339Nano),
 		}

 		// Get per-shipper state from ShipperGroup if available.
--- a/weed/storage/blockvol/blockapi/types.go
+++ b/weed/storage/blockvol/blockapi/types.go
@@ -36,6 +36,7 @@ type VolumeInfo struct {
 	// CP8-2: Multi-replica fields.
 	ReplicaFactor   int             `json:"replica_factor"`
 	Replicas        []ReplicaDetail `json:"replicas,omitempty"`
+	ReplicaReady    bool            `json:"replica_ready,omitempty"`
 	HealthScore     float64         `json:"health_score"`
 	ReplicaDegraded bool            `json:"replica_degraded,omitempty"`
 	DurabilityMode  string          `json:"durability_mode"` // CP8-3-1
@@ -71,6 +72,7 @@ type ReplicaDetail struct {
 	Server      string  `json:"server"`
 	ISCSIAddr   string  `json:"iscsi_addr,omitempty"`
 	IQN         string  `json:"iqn,omitempty"`
+	Ready       bool    `json:"ready,omitempty"`
 	HealthScore float64 `json:"health_score"`
 	WALLag      uint64  `json:"wal_lag,omitempty"`
 }
--- a/weed/storage/blockvol/test/component/replica_read_test.go
+++ b/weed/storage/blockvol/test/component/replica_read_test.go
@@ -0,0 +1,155 @@
+package component
+
+import (
+	"bytes"
+	"net"
+	"testing"
+	"time"
+
+	"github.com/seaweedfs/seaweedfs/weed/storage/blockvol"
+)
+
+// TestReplicaReadAfterShip verifies that data shipped from primary to replica
+// via WAL replication is readable on the replica via ReadLBA.
+//
+// This reproduces the CP13-8 bug: replica iSCSI reads zeros despite
+// replicated data in WAL (sync_all barrier confirmed).
+func TestReplicaReadAfterShip(t *testing.T) {
+	primaryPath := t.TempDir() + "/primary.blk"
+	replicaPath := t.TempDir() + "/replica.blk"
+
+	primary, err := blockvol.CreateBlockVol(primaryPath, blockvol.CreateOptions{
+		VolumeSize: 4 * 1024 * 1024,
+		BlockSize:  4096,
+		WALSize:    1 * 1024 * 1024,
+	})
+	if err != nil {
+		t.Fatal(err)
+	}
+	defer primary.Close()
+
+	replica, err := blockvol.CreateBlockVol(replicaPath, blockvol.CreateOptions{
+		VolumeSize: 4 * 1024 * 1024,
+		BlockSize:  4096,
+		WALSize:    1 * 1024 * 1024,
+	})
+	if err != nil {
+		t.Fatal(err)
+	}
+	defer replica.Close()
+
+	// Assign roles.
+	primary.HandleAssignment(1, blockvol.RolePrimary, 30*time.Second)
+	replica.HandleAssignment(1, blockvol.RoleReplica, 30*time.Second)
+
+	// Start replica receiver.
+	if err := replica.StartReplicaReceiver(":0", ":0"); err != nil {
+		t.Fatal(err)
+	}
+	recvAddr := replica.ReplicaReceiverAddr()
+	if recvAddr == nil {
+		t.Fatal("replica receiver not started")
+	}
+	t.Logf("replica receiver: data=%s ctrl=%s", recvAddr.DataAddr, recvAddr.CtrlAddr)
+
+	// Wire shipper from primary to replica.
+	primary.SetReplicaAddr(recvAddr.DataAddr, recvAddr.CtrlAddr)
+
+	// Write on primary — should ship to replica.
+	writeData := bytes.Repeat([]byte{0xAB}, 4096)
+	if err := primary.WriteLBA(0, writeData); err != nil {
+		t.Fatalf("primary WriteLBA(0): %v", err)
+	}
+
+	// Give shipping + apply time.
+	time.Sleep(2 * time.Second)
+
+	// Read from REPLICA.
+	replicaData, err := replica.ReadLBA(0, 4096)
+	if err != nil {
+		t.Fatalf("replica ReadLBA(0): %v", err)
+	}
+
+	if replicaData[0] == 0x00 {
+		t.Fatalf("BUG REPRODUCED: replica ReadLBA returns zeros (first byte=0x%02x, want 0xAB)"+
+			"\nData is in replica WAL but ReadLBA returns zeros", replicaData[0])
+	}
+	if !bytes.Equal(replicaData, writeData) {
+		t.Fatalf("replica data mismatch: first byte=0x%02x, want 0xAB", replicaData[0])
+	}
+	t.Log("replica ReadLBA after ship: OK (data matches primary)")
+}
+
+// TestReplicaReadDirectApply bypasses the shipper entirely and manually
+// ships a WAL entry via TCP to the replica receiver, then reads it back.
+func TestReplicaReadDirectApply(t *testing.T) {
+	replicaPath := t.TempDir() + "/replica.blk"
+	vol, err := blockvol.CreateBlockVol(replicaPath, blockvol.CreateOptions{
+		VolumeSize: 4 * 1024 * 1024,
+		BlockSize:  4096,
+		WALSize:    1 * 1024 * 1024,
+	})
+	if err != nil {
+		t.Fatal(err)
+	}
+	defer vol.Close()
+
+	vol.HandleAssignment(1, blockvol.RoleReplica, 30*time.Second)
+
+	if err := vol.StartReplicaReceiver(":0", ":0"); err != nil {
+		t.Fatal(err)
+	}
+	recvAddr := vol.ReplicaReceiverAddr()
+	t.Logf("replica: data=%s ctrl=%s", recvAddr.DataAddr, recvAddr.CtrlAddr)
+
+	// Directly connect and ship a WAL entry.
+	conn, err := net.DialTimeout("tcp", recvAddr.DataAddr, 3*time.Second)
+	if err != nil {
+		t.Fatalf("connect: %v", err)
+	}
+	defer conn.Close()
+
+	payload := bytes.Repeat([]byte{0xEF}, 4096)
+	entry := blockvol.WALEntry{
+		LSN:    1,
+		Epoch:  1,
+		Type:   blockvol.EntryTypeWrite,
+		LBA:    0,
+		Length: 4096,
+		Data:   payload,
+	}
+	encoded, err := entry.Encode()
+	if err != nil {
+		t.Fatal(err)
+	}
+	if err := blockvol.WriteFrame(conn, blockvol.MsgWALEntry, encoded); err != nil {
+		t.Fatalf("ship: %v", err)
+	}
+
+	time.Sleep(1 * time.Second)
+
+	// Read back via ReadLBA.
+	data, err := vol.ReadLBA(0, 4096)
+	if err != nil {
+		t.Fatalf("ReadLBA: %v", err)
+	}
+
+	if data[0] == 0x00 {
+		t.Fatalf("BUG: ReadLBA returns zeros after direct WAL apply (0x%02x, want 0xEF)", data[0])
+	}
+	if data[0] != 0xEF {
+		t.Fatalf("unexpected data: 0x%02x, want 0xEF", data[0])
+	}
+	t.Logf("direct apply ReadLBA: OK (0x%02x)", data[0])
+
+	// Also read via adapter (same path as iSCSI).
+	adapter := blockvol.NewBlockVolAdapter(vol)
+	adapterData, err := adapter.ReadAt(0, 4096)
+	if err != nil {
+		t.Fatalf("adapter ReadAt: %v", err)
+	}
+	if adapterData[0] != 0xEF {
+		t.Fatalf("adapter returns wrong data: 0x%02x, want 0xEF", adapterData[0])
+	}
+	t.Log("adapter ReadAt: OK")
+}
--- a/weed/storage/blockvol/testrunner/actions/devops.go
+++ b/weed/storage/blockvol/testrunner/actions/devops.go
@@ -787,6 +787,11 @@ func waitVolumeHealthy(ctx context.Context, actx *tr.ActionContext, act tr.Actio
 				continue
 			}

+			if info.ReplicaFactor > 1 && !info.ReplicaReady {
+				actx.Log("  poll %d: replica assigned but not publish-ready yet", poll)
+				continue
+			}
+
 			// Check not degraded.
 			if info.ReplicaDegraded {
 				actx.Log("  poll %d: replica degraded, waiting...", poll)
--- a/weed/storage/blockvol/testrunner/internal/blockapi/types.go
+++ b/weed/storage/blockvol/testrunner/internal/blockapi/types.go
@@ -32,6 +32,7 @@ type VolumeInfo struct {
 	ReplicaCtrlAddr  string          `json:"replica_ctrl_addr,omitempty"`
 	ReplicaFactor    int             `json:"replica_factor"`
 	Replicas         []ReplicaDetail `json:"replicas,omitempty"`
+	ReplicaReady     bool            `json:"replica_ready,omitempty"`
 	HealthScore      float64         `json:"health_score"`
 	ReplicaDegraded  bool            `json:"replica_degraded,omitempty"`
 	DurabilityMode   string          `json:"durability_mode"`
@@ -45,6 +46,7 @@ type ReplicaDetail struct {
 	Server      string  `json:"server"`
 	ISCSIAddr   string  `json:"iscsi_addr,omitempty"`
 	IQN         string  `json:"iqn,omitempty"`
+	Ready       bool    `json:"ready,omitempty"`
 	HealthScore float64 `json:"health_score"`
 	WALLag      uint64  `json:"wal_lag,omitempty"`
 }
--- a/weed/storage/blockvol/testrunner/scenarios/internal/cp13-8-real-workload-validation.yaml
+++ b/weed/storage/blockvol/testrunner/scenarios/internal/cp13-8-real-workload-validation.yaml
@@ -1,240 +1,368 @@
 name: cp13-8-real-workload-validation
-timeout: 20m
+timeout: 15m

 # CP13-8: Bounded real-workload validation for RF=2 sync_all.
 #
-# Workload envelope:
-#   Topology:  RF=2 sync_all, cross-machine replication (m01 ↔ M02)
-#   Transport: iSCSI (primary frontend)
+# Envelope:
+#   Topology:  RF=2 sync_all, cross-machine (m01 ↔ M02)
+#   Transport: iSCSI
 #   Workloads: ext4 (filesystem) + PostgreSQL pgbench (application)
-#   Disturbance: one bounded failover (kill primary, promote replica)
-#   Exclusions: NVMe-TCP, RF>2, hours/days soak, degraded-mode perf
+#   Disturbance: one bounded failover (kill primary, auto-promote replica)
+#   Exclusions: NVMe-TCP, RF>2, soak, degraded-mode, mode normalization
 #
-# What this validates:
-#   The accepted CP13-1..7 replication contract survives contact with
-#   real filesystem and database consumers. Specifically:
-#   - Replicated writes are durable on both nodes (ext4 file integrity)
-#   - Post-failover data is consistent (fsck + file count)
-#   - Database transactions are durable under sync_all (pgbench TPC-B)
-#
-# What this does NOT validate:
-#   - Production rollout readiness
-#   - Performance floor (see Phase 12 P4)
-#   - Degraded mode behavior
-#   - NVMe-TCP transport path
-#   - Mode normalization (CP13-9)
+# Flow:
+#   1. Create RF=2 sync_all — NO promote (use initial primary as-is)
+#   2. Wait for replication healthy (shipper connected, not degraded)
+#   3. Write ext4 + 200 files on primary
+#   4. Kill primary → auto-failover promotes replica
+#   5. Verify ext4 on promoted replica (fsck + files + checksums)
+#   6. pgbench on promoted replica

 env:
-  repo_dir: "C:/work/seaweedfs"
+  master_url: "http://10.0.0.3:9433"
+  volume_name: cp13-8-val
+  # 512MB: enough for ext4 + pgbench, small enough for mkfs + sync_all.
+  vol_size: "536870912"

 topology:
  nodes:
-    target_node:
-      host: "192.168.1.184"
+    m01:
+      host: 192.168.1.181
+      alt_ips: ["10.0.0.1"]
      user: testdev
-      key: "C:/work/dev_server/testdev_key"
-    client_node:
-      host: "192.168.1.181"
+      key: "/opt/work/testdev_key"
+    m02:
+      host: 192.168.1.184
+      alt_ips: ["10.0.0.3"]
      user: testdev
-      key: "C:/work/dev_server/testdev_key"
-
-targets:
-  primary:
-    node: target_node
-    vol_size: 100M
-    iscsi_port: 3280
-    admin_port: 8095
-    replica_data_port: 9040
-    replica_ctrl_port: 9041
-    rebuild_port: 9042
-    iqn_suffix: cp13-8-primary
-  replica:
-    node: client_node
-    vol_size: 100M
-    iscsi_port: 3281
-    admin_port: 8096
-    replica_data_port: 9043
-    replica_ctrl_port: 9044
-    rebuild_port: 9045
-    iqn_suffix: cp13-8-replica
+      key: "/opt/work/testdev_key"

 phases:
-  # --- Phase 1: Setup RF=2 sync_all pair ---
  - name: setup
    actions:
-      - action: kill_stale
-        node: target_node
+      - action: exec
+        node: m02
+        cmd: "fuser -k 9433/tcp 18480/tcp 2>/dev/null; sleep 1; rm -rf /tmp/sw-cp138-master /tmp/sw-cp138-vs1; mkdir -p /tmp/sw-cp138-master /tmp/sw-cp138-vs1/blocks"
+        root: "true"
        ignore_error: true
-      - action: kill_stale
-        node: client_node
+      - action: exec
+        node: m01
+        cmd: "fuser -k 18480/tcp 2>/dev/null; sleep 1; rm -rf /tmp/sw-cp138-vs2; mkdir -p /tmp/sw-cp138-vs2/blocks"
+        root: "true"
        ignore_error: true
-      - action: iscsi_cleanup
-        node: client_node
-        ignore_error: true
-      - action: build_deploy
-      - action: start_target
-        target: primary
-        create: "true"
-        durability_mode: sync_all
-      - action: start_target
-        target: replica
-        create: "true"
-        durability_mode: sync_all
-      - action: assign
-        target: replica
-        epoch: "1"
-        role: replica
-        lease_ttl: 60s
-      - action: assign
-        target: primary
-        epoch: "1"
-        role: primary
-        lease_ttl: 60s
-      - action: set_replica
-        target: primary
-        replica: replica
-      - action: sleep
-        duration: 2s

-  # --- Phase 2: ext4 filesystem workload ---
-  - name: ext4-write
-    actions:
-      - action: iscsi_login
-        target: primary
-        node: client_node
-        save_as: device
-      - action: mkfs
-        node: client_node
-        device: "{{ device }}"
-        fstype: ext4
-      - action: mount
-        node: client_node
-        device: "{{ device }}"
-        mountpoint: /mnt/cp13-8
-      # Write 200 files with known content.
-      - action: exec
-        node: client_node
-        root: "true"
-        cmd: "bash -c 'for i in $(seq 1 200); do dd if=/dev/urandom of=/mnt/cp13-8/file_$i bs=4k count=1 2>/dev/null; done && sync'"
-      # Compute checksums for later verification.
-      - action: exec
-        node: client_node
-        root: "true"
-        cmd: "md5sum /mnt/cp13-8/file_* | sort > /tmp/cp13-8-checksums.txt && cat /tmp/cp13-8-checksums.txt | wc -l"
-        save_as: checksum_count
-      - action: assert_equal
-        actual: "{{ checksum_count }}"
-        expected: "200"
-      - action: umount
-        node: client_node
-        mountpoint: /mnt/cp13-8
-      - action: iscsi_cleanup
-        node: client_node
-        ignore_error: true
-      # Wait for replication to catch up.
-      - action: wait_lsn
-        target: replica
-        min_lsn: "1"
-        timeout: 30s
+      - action: start_weed_master
+        node: m02
+        port: "9433"
+        dir: /tmp/sw-cp138-master
+        extra_args: "-ip=10.0.0.3"
+        save_as: master_pid
+
      - action: sleep
        duration: 3s

-  # --- Phase 3: Failover (kill primary, promote replica) ---
+      - action: start_weed_volume
+        node: m02
+        port: "18480"
+        master: "10.0.0.3:9433"
+        dir: /tmp/sw-cp138-vs1
+        extra_args: "-block.dir=/tmp/sw-cp138-vs1/blocks -block.listen=:3295 -ip=10.0.0.3"
+        save_as: vs1_pid
+
+      - action: start_weed_volume
+        node: m01
+        port: "18480"
+        master: "10.0.0.3:9433"
+        dir: /tmp/sw-cp138-vs2
+        extra_args: "-block.dir=/tmp/sw-cp138-vs2/blocks -block.listen=:3295 -ip=10.0.0.1"
+        save_as: vs2_pid
+
+      - action: sleep
+        duration: 3s
+
+      - action: wait_cluster_ready
+        node: m02
+        master_url: "{{ master_url }}"
+
+      - action: wait_block_servers
+        count: "2"
+
+      - action: create_block_volume
+        name: "{{ volume_name }}"
+        size_bytes: "{{ vol_size }}"
+        replica_factor: "2"
+        durability_mode: "sync_all"
+
+      # Wait for assignment delivery (heartbeat cycle).
+      - action: sleep
+        duration: 15s
+
+      # Bootstrap write: triggers shipper connect + first barrier.
+      # Without this, the shipper stays degraded because barrier-triggered
+      # recovery needs a write to fire the barrier.
+      - action: lookup_block_volume
+        name: "{{ volume_name }}"
+        save_as: boot_vol
+
+      - action: iscsi_login_direct
+        node: m01
+        host: "{{ boot_vol_iscsi_host }}"
+        port: "{{ boot_vol_iscsi_port }}"
+        iqn: "{{ boot_vol_iqn }}"
+        save_as: boot_device
+        ignore_error: true
+
+      - action: exec
+        node: m01
+        root: "true"
+        cmd: "dd if=/dev/urandom of={{ boot_device }} bs=4k count=1 seek=100000 oflag=direct,sync 2>/dev/null; true"
+        ignore_error: true
+
+      - action: iscsi_cleanup
+        node: m01
+        ignore_error: true
+
+      - action: sleep
+        duration: 5s
+
+      - action: wait_volume_healthy
+        name: "{{ volume_name }}"
+        timeout: 60s
+
+      - action: discover_primary
+        name: "{{ volume_name }}"
+        save_as: pri
+
+      - action: print
+        msg: "CP13-8 setup: primary={{ pri }} ({{ pri_server }}), replica={{ pri_replica_node }}"
+
+  # --- Phase 2: ext4 filesystem workload on initial primary ---
+  - name: ext4-write
+    actions:
+      - action: lookup_block_volume
+        name: "{{ volume_name }}"
+        save_as: vol
+
+      - action: iscsi_login_direct
+        node: m01
+        host: "{{ vol_iscsi_host }}"
+        port: "{{ vol_iscsi_port }}"
+        iqn: "{{ vol_iqn }}"
+        save_as: device
+
+      - action: exec
+        node: m01
+        cmd: "mkfs.ext4 -F {{ device }} 2>&1 | tail -2"
+        root: "true"
+
+      - action: exec
+        node: m01
+        cmd: "mkdir -p /mnt/cp13-8 && mount {{ device }} /mnt/cp13-8"
+        root: "true"
+
+      - action: exec
+        node: m01
+        root: "true"
+        cmd: "for i in $(seq 1 200); do dd if=/dev/urandom of=/mnt/cp13-8/file_$i bs=4k count=1 2>/dev/null; done && sync && echo WRITE_DONE"
+        save_as: write_result
+
+      - action: assert_contains
+        value: "{{ write_result }}"
+        contains: "WRITE_DONE"
+
+      - action: exec
+        node: m01
+        root: "true"
+        cmd: "md5sum /mnt/cp13-8/file_* | sort > /tmp/cp13-8-pre.md5 && wc -l < /tmp/cp13-8-pre.md5"
+        save_as: pre_checksum_count
+
+      - action: assert_equal
+        actual: "{{ pre_checksum_count }}"
+        expected: "200"
+
+      - action: exec
+        node: m01
+        cmd: "umount /mnt/cp13-8"
+        root: "true"
+
+      - action: iscsi_cleanup
+        node: m01
+        ignore_error: true
+
+      - action: print
+        msg: "ext4-write: 200 files written, checksums captured"
+
+      # Verify replication is healthy after all writes.
+      - action: wait_volume_healthy
+        name: "{{ volume_name }}"
+        timeout: 30s
+
+  # --- Phase 3: Failover ---
+  # Kill ONLY the primary's VS, keep the replica alive for auto-promote.
+  # Master allocates primary to m01 first (by server registration order).
+  # Kill m01 VS (primary), m02 VS (replica) stays alive for promotion.
  - name: failover
    actions:
-      - action: kill_target
-        target: primary
-      - action: assign
-        target: replica
-        epoch: "2"
-        role: primary
-        lease_ttl: 60s
-      - action: wait_role
-        target: replica
-        role: primary
-        timeout: 10s
+      - action: print
+        msg: "=== Killing primary VS on m01 ==="
+
+      - action: exec
+        node: m01
+        cmd: "kill -9 {{ vs2_pid }}"
+        root: "true"
+        ignore_error: true
+
+      # Wait for lease expiry (30s TTL) + auto-failover.
+      - action: sleep
+        duration: 50s
+
+      # Wait for primary to change from m01 to m02.
+      - action: wait_block_primary
+        name: "{{ volume_name }}"
+        not: "{{ pri_server }}"
+        timeout: 60s
+        save_as: new_pri
+
+      - action: print
+        msg: "Failover: {{ pri_server }} → {{ new_pri }}"
+
+      - action: sleep
+        duration: 5s

  # --- Phase 4: ext4 verification on promoted replica ---
  - name: ext4-verify
    actions:
-      - action: iscsi_login
-        target: replica
-        node: client_node
+      - action: discover_primary
+        name: "{{ volume_name }}"
+        save_as: new
+
+      - action: print
+        msg: "Verifying ext4 on promoted node {{ new }} ({{ new_server }})"
+
+      # Connect to the new primary's iSCSI.
+      - action: iscsi_login_direct
+        node: m01
+        host: "{{ new_host }}"
+        port: "3295"
+        iqn: "{{ vol_iqn }}"
        save_as: device2
-      # fsck: filesystem integrity.
+
      - action: fsck_ext4
-        node: client_node
+        node: m01
        device: "{{ device2 }}"
        save_as: fsck_result
-      # Mount and verify file count.
-      - action: mount
-        node: client_node
-        device: "{{ device2 }}"
-        mountpoint: /mnt/cp13-8
+
+      - action: print
+        msg: "fsck: {{ fsck_result }}"
+
      - action: exec
-        node: client_node
+        node: m01
+        cmd: "mkdir -p /mnt/cp13-8 && mount {{ device2 }} /mnt/cp13-8"
+        root: "true"
+
+      - action: exec
+        node: m01
        root: "true"
        cmd: "ls /mnt/cp13-8/file_* | wc -l"
-        save_as: post_failover_count
+        save_as: post_count
+
      - action: assert_equal
-        actual: "{{ post_failover_count }}"
+        actual: "{{ post_count }}"
        expected: "200"
-      # Verify checksums match pre-failover.
+
      - action: exec
-        node: client_node
+        node: m01
        root: "true"
-        cmd: "md5sum /mnt/cp13-8/file_* | sort > /tmp/cp13-8-checksums-post.txt && diff /tmp/cp13-8-checksums.txt /tmp/cp13-8-checksums-post.txt && echo MATCH"
-        save_as: checksum_match
+        cmd: "md5sum /mnt/cp13-8/file_* | sort > /tmp/cp13-8-post.md5 && diff /tmp/cp13-8-pre.md5 /tmp/cp13-8-post.md5 && echo CHECKSUM_MATCH"
+        save_as: checksum_diff
+
      - action: assert_contains
-        value: "{{ checksum_match }}"
-        contains: "MATCH"
-      - action: umount
-        node: client_node
-        mountpoint: /mnt/cp13-8
+        value: "{{ checksum_diff }}"
+        contains: "CHECKSUM_MATCH"
+
+      - action: exec
+        node: m01
+        cmd: "umount /mnt/cp13-8"
+        root: "true"
+
      - action: iscsi_cleanup
-        node: client_node
+        node: m01
        ignore_error: true

-  # --- Phase 5: pgbench on promoted replica (application workload) ---
-  - name: pgbench-on-replica
+      - action: print
+        msg: "ext4-verify: fsck CLEAN, 200 files, checksums MATCH"
+
+  # --- Phase 5: pgbench on promoted replica ---
+  - name: pgbench
    actions:
-      - action: iscsi_login
-        target: replica
-        node: client_node
+      - action: iscsi_login_direct
+        node: m01
+        host: "{{ new_host }}"
+        port: "3295"
+        iqn: "{{ vol_iqn }}"
        save_as: device3
+
+      - action: sleep
+        duration: 3s
+
      - action: pgbench_init
-        node: client_node
+        node: m01
        device: "{{ device3 }}"
        mount: "/mnt/cp13-8-pg"
-        port: "5440"
+        port: "5441"
        scale: "1"
        fstype: ext4
+
      - action: pgbench_run
-        node: client_node
+        node: m01
        clients: "1"
        duration: "10"
-        save_as: tps_post_failover
+        save_as: tps
+
      - action: print
-        msg: "CP13-8: pgbench TPC-B post-failover: {{ tps_post_failover }} TPS"
+        msg: "CP13-8 pgbench TPS: {{ tps }}"
+
      - action: assert_greater
-        value: "{{ tps_post_failover }}"
+        actual: "{{ tps }}"
        threshold: "0"
-      # pgbench succeeded with TPS > 0 = database transactions are durable on the promoted replica.
+
      - action: pgbench_cleanup
-        node: client_node
-        mount: "/mnt/cp13-8-pg"
-        port: "5440"
+        node: m01
        ignore_error: true
+
      - action: iscsi_cleanup
-        node: client_node
+        node: m01
        ignore_error: true

  # --- Phase 6: Cleanup ---
  - name: cleanup
    always: true
    actions:
+      - action: exec
+        node: m01
+        cmd: "umount /mnt/cp13-8 /mnt/cp13-8-pg 2>/dev/null; true"
+        root: "true"
+        ignore_error: true
      - action: iscsi_cleanup
-        node: client_node
+        node: m01
        ignore_error: true
-      - action: stop_all_targets
+      - action: stop_weed
+        node: m01
+        pid: "{{ vs2_pid }}"
+        ignore_error: true
+      - action: stop_weed
+        node: m01
+        pid: "{{ vs2_new_pid }}"
+        ignore_error: true
+      - action: stop_weed
+        node: m02
+        pid: "{{ vs1_pid }}"
+        ignore_error: true
+      - action: stop_weed
+        node: m02
+        pid: "{{ vs1_new_pid }}"
+        ignore_error: true
+      - action: stop_weed
+        node: m02
+        pid: "{{ master_pid }}"
        ignore_error: true
--- a/weed/storage/store_blockvol.go
+++ b/weed/storage/store_blockvol.go
@@ -118,10 +118,14 @@ func (bs *BlockVolumeStore) WithVolume(path string, fn func(*blockvol.BlockVol)
 	return fn(vol)
 }

-// ProcessBlockVolumeAssignments applies a batch of assignments from master.
-// Returns a slice of errors parallel to the input (nil = success).
-// Unknown volumes and invalid transitions are logged and returned as errors,
-// but do not stop processing of remaining assignments.
+// ProcessBlockVolumeAssignments applies only the local role/epoch/lease part of
+// a batch of assignments. It does NOT wire replica receivers, shippers, or
+// publication readiness. The authoritative runtime lifecycle lives in
+// BlockService.ApplyAssignments.
+//
+// Returns a slice of errors parallel to the input (nil = success). Unknown
+// volumes and invalid transitions are logged and returned as errors, but do not
+// stop processing of remaining assignments.
 func (bs *BlockVolumeStore) ProcessBlockVolumeAssignments(
 	assignments []blockvol.BlockVolumeAssignment,
 ) []error {