After rejoin, the shipper is configured but no I/O triggers Ship(),
so the shipper stays Disconnected and the core stays at
awaiting_shipper_connected indefinitely.
Fix: observePrimaryShipperConnectivity now calls TryReconnectShippers
when ShipperConfigured=true but ShipperConnected=false. This triggers
the full reconnect protocol (dial + handshake + bounded catch-up)
proactively, bringing the replica current without waiting for I/O.
Option B: reuse the same reconnect path as Barrier() rather than a
fake write or a bare dial probe. CatchUpTo(headLSN) replays any retained
WAL entries, bringing the replica fully current.
New methods:
- WALShipper.TryReconnect(): full reconnect without foreground I/O
- ShipperGroup.TryReconnectAll(): probes all disconnected shippers
- BlockVol.TryReconnectShippers(): volume-level entry point
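A minimal Go sketch of the observer fix follows. It assumes a simplified shipper that models only the configured/connected flags. `shipper`, `observeConnectivity`, and the injected `reconnect` func are hypothetical stand-ins for WALShipper, observePrimaryShipperConnectivity, and the dial + handshake + CatchUpTo protocol. They are not the actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// shipper is a hypothetical stand-in for WALShipper that models only the
// configured/connected flags named in the commit.
type shipper struct {
	configured bool
	connected  bool
	// reconnect stands in for dial + handshake + CatchUpTo(headLSN); it is
	// injected so the sketch runs without a network.
	reconnect func() error
}

// TryReconnect runs the reconnect path without any foreground write.
func (s *shipper) TryReconnect() error {
	if err := s.reconnect(); err != nil {
		return err
	}
	s.connected = true
	return nil
}

// observeConnectivity proactively reconnects every configured-but-disconnected
// shipper instead of waiting for the next Ship() call, and returns how many
// reconnects succeeded. Failed shippers stay disconnected for the next pass.
func observeConnectivity(group []*shipper) int {
	reconnected := 0
	for _, s := range group {
		if s.configured && !s.connected && s.TryReconnect() == nil {
			reconnected++
		}
	}
	return reconnected
}

func main() {
	up := &shipper{configured: true, reconnect: func() error { return nil }}
	down := &shipper{configured: true, reconnect: func() error { return errors.New("dial refused") }}
	unset := &shipper{reconnect: func() error { return nil }}
	n := observeConnectivity([]*shipper{up, down, unset})
	fmt.Println(n, up.connected, down.connected, unset.connected) // 1 true false false
}
```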
Also fix pre-existing test expectation: engine now emits
start_recovery_task on primary assignment with replicas.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix recover path TOCTOU: re-Lookup after AddReplica so the primary
refresh assignment includes the freshly added replica addresses.
Previously, Lookup (which returns a copy) was called before AddReplica
modified the registry, so entry.Replicas was empty → the primary got
replicas=0 → the shipper was never configured.
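The sketch below shows the ordering fix. It assumes a registry whose Lookup returns a copy. `registry`, `entry`, and `recoverVolume` are hypothetical names. The point is that the primary refresh assignment must be built from a Lookup taken after AddReplica, not from the earlier copy.

```go
package main

import "fmt"

// entry is a hypothetical registry entry; Lookup hands out copies of it.
type entry struct {
	Name     string
	Replicas []string
}

type registry struct{ entries map[string]*entry }

// Lookup returns a copy (including a copied Replicas slice), so callers hold
// a snapshot that does not see later mutations.
func (r *registry) Lookup(name string) (entry, bool) {
	e, ok := r.entries[name]
	if !ok {
		return entry{}, false
	}
	cp := *e
	cp.Replicas = append([]string(nil), e.Replicas...)
	return cp, true
}

func (r *registry) AddReplica(name, addr string) {
	if e, ok := r.entries[name]; ok {
		e.Replicas = append(e.Replicas, addr)
	}
}

// recoverVolume mutates first, then re-Lookups so the assignment sees the
// freshly added replica. It returns the replica count the primary would get.
func recoverVolume(r *registry, name, replicaAddr string) (int, error) {
	if _, ok := r.Lookup(name); !ok {
		return 0, fmt.Errorf("volume %q not found", name)
	}
	r.AddReplica(name, replicaAddr)
	fresh, ok := r.Lookup(name) // re-Lookup: any earlier copy is stale
	if !ok {
		return 0, fmt.Errorf("volume %q vanished", name)
	}
	return len(fresh.Replicas), nil
}

func main() {
	r := &registry{entries: map[string]*entry{"vol1": {Name: "vol1"}}}
	stale, _ := r.Lookup("vol1")
	n, _ := recoverVolume(r, "vol1", "10.0.0.2:3260")
	fmt.Println(len(stale.Replicas), n) // 0 1
}
```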
Add 2 WAL pressure edge case tests:
- ShipperCatchUpOrEscalate: 64KB WAL, 200 writes, aggressive flusher.
Proves no hang/deadlock/corruption. Shipper either keeps up or
correctly escalates to NeedsRebuild.
- RebuildWithPinWhilePrimaryWrites: rebuild session active while
primary writes 7600+ blocks in 2s. Proves the primary never freezes:
the rebuild pin is on the replica only, so the primary WAL recycles freely.
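The catch-up-or-escalate outcome these tests check can be sketched as a pure decision. The sketch assumes contiguous LSNs and a retained WAL window [tailLSN, headLSN]. `classify` and the decision names are hypothetical, and the real escalation logic may weigh more inputs.

```go
package main

import "fmt"

type decision string

const (
	keepUp       decision = "keep_up"
	catchUp      decision = "catch_up"
	needsRebuild decision = "needs_rebuild"
)

// classify decides whether a replica that has applied up to replicaLSN can
// still be brought current from the primary's retained WAL. Assumption:
// tailLSN is the oldest entry the flusher has not recycled yet.
func classify(replicaLSN, tailLSN, headLSN uint64) decision {
	switch {
	case replicaLSN >= headLSN:
		return keepUp
	case replicaLSN+1 >= tailLSN:
		return catchUp // the next needed entry is still retained
	default:
		return needsRebuild // the gap was recycled: escalate instead of hanging
	}
}

func main() {
	fmt.Println(classify(100, 50, 100)) // keep_up
	fmt.Println(classify(60, 50, 100))  // catch_up
	fmt.Println(classify(10, 50, 100))  // needs_rebuild
}
```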
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All 43 actions pass on m01/m02 hardware. Auto-failover PASS.
dd_write: 30s → 123ms. Post-failover write: 33,621 IOPS.
1. WAL retention: remove keepup retention floor (MinShippedLSN).
WAL cannot be pinned during sustained async writes: any pin
strategy either fills the WAL (blocking writes) or over-recycles
(breaking catch-up). The flusher now recycles freely; a future LBA map
will provide catch-up without WAL retention.
MinShippedLSN on ShipperGroup is retained as a diagnostic surface.
2. Registry stale-cleanup race: add RegisteredAt grace period.
Race: master registers volume → next VS heartbeat arrives before
VS discovers the volume → stale cleanup deletes the entry →
failover finds 0 entries. Fix: skip stale cleanup for entries
registered within 30s (> 2 heartbeat intervals).
2 new tests: grace protects new entry, old entry still cleaned.
3. Shutdown heartbeat: VS disconnect heartbeat no longer claims
block inventory authority. Previously, the shutdown beat's
empty inventory triggered stale cleanup, deleting the entry
before failover could use it.
Scenario fix: recovery-baseline-failover.yaml now kills the
correct node (the discovered primary, not a hardcoded one) and connects
to the correct new primary for post-failover verification.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wire protocol messages and transport handlers for the rebuild MVP:
Protocol messages (rebuild_transport.go):
- SessionControlMsg: epoch, sessionID, command, baseLSN, targetLSN,
snapshotID. Encode/Decode with fixed 37-byte wire format.
- SessionAckMsg: epoch, sessionID, phase, walAppliedLSN, baseComplete,
achievedLSN. Encode/Decode with fixed 34-byte wire format.
- MsgSessionControl (0x10) and MsgSessionAck (0x11) on control channel.
- SendSessionControl/SendSessionAck convenience functions.
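The following is one possible encoding that matches the stated sizes. It assumes big-endian fields in the listed order, with 8-byte epoch/ID/LSN fields, a 1-byte command/phase, a 1-byte baseComplete flag, and a 4-byte snapshotID (8+8+1+8+8+4 = 37; 8+8+1+8+1+8 = 34). The actual widths and byte order in rebuild_transport.go are not stated here, and DecodeSessionControl/DecodeSessionAck are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Assumed layouts: field order follows the commit; widths are chosen so the
// totals match 37 and 34 bytes.
const (
	sessionControlSize = 37 // 8+8+1+8+8+4
	sessionAckSize     = 34 // 8+8+1+8+1+8
)

type SessionControlMsg struct {
	Epoch, SessionID   uint64
	Command            uint8
	BaseLSN, TargetLSN uint64
	SnapshotID         uint32
}

func (m SessionControlMsg) Encode() []byte {
	b := make([]byte, sessionControlSize)
	binary.BigEndian.PutUint64(b[0:], m.Epoch)
	binary.BigEndian.PutUint64(b[8:], m.SessionID)
	b[16] = m.Command
	binary.BigEndian.PutUint64(b[17:], m.BaseLSN)
	binary.BigEndian.PutUint64(b[25:], m.TargetLSN)
	binary.BigEndian.PutUint32(b[33:], m.SnapshotID)
	return b
}

func DecodeSessionControl(b []byte) (SessionControlMsg, error) {
	if len(b) != sessionControlSize {
		return SessionControlMsg{}, errors.New("session control: bad length")
	}
	return SessionControlMsg{
		Epoch:      binary.BigEndian.Uint64(b[0:]),
		SessionID:  binary.BigEndian.Uint64(b[8:]),
		Command:    b[16],
		BaseLSN:    binary.BigEndian.Uint64(b[17:]),
		TargetLSN:  binary.BigEndian.Uint64(b[25:]),
		SnapshotID: binary.BigEndian.Uint32(b[33:]),
	}, nil
}

type SessionAckMsg struct {
	Epoch, SessionID uint64
	Phase            uint8
	WALAppliedLSN    uint64
	BaseComplete     bool
	AchievedLSN      uint64
}

func (m SessionAckMsg) Encode() []byte {
	b := make([]byte, sessionAckSize)
	binary.BigEndian.PutUint64(b[0:], m.Epoch)
	binary.BigEndian.PutUint64(b[8:], m.SessionID)
	b[16] = m.Phase
	binary.BigEndian.PutUint64(b[17:], m.WALAppliedLSN)
	if m.BaseComplete {
		b[25] = 1
	}
	binary.BigEndian.PutUint64(b[26:], m.AchievedLSN)
	return b
}

func DecodeSessionAck(b []byte) (SessionAckMsg, error) {
	if len(b) != sessionAckSize {
		return SessionAckMsg{}, errors.New("session ack: bad length")
	}
	return SessionAckMsg{
		Epoch:         binary.BigEndian.Uint64(b[0:]),
		SessionID:     binary.BigEndian.Uint64(b[8:]),
		Phase:         b[16],
		WALAppliedLSN: binary.BigEndian.Uint64(b[17:]),
		BaseComplete:  b[25] != 0,
		AchievedLSN:   binary.BigEndian.Uint64(b[26:]),
	}, nil
}

func main() {
	enc := SessionControlMsg{Epoch: 3, SessionID: 7, Command: 1, BaseLSN: 100, TargetLSN: 250}.Encode()
	msg, err := DecodeSessionControl(enc)
	fmt.Println(len(enc), msg.TargetLSN, err) // 37 250 <nil>
}
```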
Transport handlers:
- RebuildTransportServer: primary-side, streams all extent blocks as
MsgRebuildExtent frames (reusing existing rebuild message type),
ends with MsgRebuildDone.
- RebuildTransportClient: replica-side, receives base blocks and
routes through vol.ApplyRebuildSessionBaseBlock, marks base
complete on MsgRebuildDone.
4 transport tests:
- SessionControl wire round-trip
- SessionAck wire round-trip
- BaseBlockStreaming: full TCP loop, 1024 blocks streamed and verified
- SessionControlOverTCP: real TCP send/receive with accepted ack
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add BlockService replica-side rebuild routing API that bridges the
transport/host layer to the BlockVol session surface:
- StartReplicaRebuildSession(path, config)
- ApplyReplicaRebuildWALEntry(path, sessionID, entry)
- ApplyReplicaRebuildBaseBlock(path, sessionID, lba, data)
- MarkReplicaRebuildBaseComplete(path, sessionID, totalBlocks)
- TryCompleteReplicaRebuildSession(path, sessionID)
- CancelReplicaRebuildSession(path, sessionID, reason)
- ReplicaRebuildSession(path) → snapshot
Each method does one thing: validate → WithVolume → delegate to BlockVol.
No wire decoding, no protocol decisions, no state invention. Transport
wiring (sessionControl/walData/sessionData handlers) is the next step.
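The validate → WithVolume → delegate shape can be sketched for one method. The sketch assumes simplified types. `blockService`, `blockVol`, and their fields are hypothetical stand-ins, and the argument types (uint64 IDs and LBAs) are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// blockVol is a hypothetical stand-in for BlockVol's rebuild-session surface;
// it owns the session state and rejects stale session IDs.
type blockVol struct {
	activeSession uint64
	applied       []uint64 // LBAs applied as base blocks
}

func (v *blockVol) ApplyRebuildSessionBaseBlock(sessionID, lba uint64, data []byte) error {
	if sessionID != v.activeSession {
		return fmt.Errorf("stale session %d (active %d)", sessionID, v.activeSession)
	}
	v.applied = append(v.applied, lba)
	return nil
}

// blockService models only the volume-lookup half of BlockService.
type blockService struct{ vols map[string]*blockVol }

// WithVolume runs fn against the volume at path, or fails if it is unknown.
func (bs *blockService) WithVolume(path string, fn func(*blockVol) error) error {
	v, ok := bs.vols[path]
	if !ok {
		return fmt.Errorf("volume %q not found", path)
	}
	return fn(v)
}

// ApplyReplicaRebuildBaseBlock: validate → WithVolume → delegate. No wire
// decoding and no protocol decisions happen here.
func (bs *blockService) ApplyReplicaRebuildBaseBlock(path string, sessionID, lba uint64, data []byte) error {
	if path == "" || sessionID == 0 {
		return errors.New("invalid path or session ID")
	}
	return bs.WithVolume(path, func(v *blockVol) error {
		return v.ApplyRebuildSessionBaseBlock(sessionID, lba, data)
	})
}

func main() {
	bs := &blockService{vols: map[string]*blockVol{"/v1": {activeSession: 5}}}
	fmt.Println(bs.ApplyReplicaRebuildBaseBlock("/v1", 5, 0, make([]byte, 4096)) == nil) // true
	fmt.Println(bs.ApplyReplicaRebuildBaseBlock("/v1", 4, 1, nil))                      // stale session 4 (active 5)
}
```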
2 focused tests: skeleton routes correctly, stale session ID rejected.
Updated v2-rebuild-mvp-session-protocol.md with server skeleton section.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tighten acceptance matrix with explicit per-boundary rows, a signoff
reading split into hard blockers vs product hardening, and a clear
rule: architecture-complete ≠ product-complete.
6 hard blockers before T6/T7:
1. WriteLBA/SyncCache/sync_all contract closure
2. Fresh replica bounded catch-up before live tail
3. Timeout/retention-loss classification for catch-up
4. publish_healthy alignment with one protocol contract
5. RF=2 stable identity on all shipping paths
6. Test audit for incorrect WriteLBA==commit assumptions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
7-area acceptance matrix mapping current state vs product requirements:
write/durability contract, fresh replica bootstrap, host observation
completeness, serving/publish alignment, snapshot/rebuild convergence,
adapter consistency, test contract alignment.
Each item marked with: current state, required for product, blocks
T6/T7, best test level. Priority ordered into must-close-before-Stage-1,
should-close-before-Stage-2, and can-close-after-T6/T7.
Key diagnosis: architecture-complete, execution-incomplete. The engine
thinks like a product; the data plane still behaves partly like a
prototype. The gap is end-to-end contract closure.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add host-side protocol state seam that derives per-replica execution
state from V2 sender/session snapshots and blocks live-tail WAL
shipping while an active recovery session is in progress.
New file: weed/server/block_protocol_state.go
- replicaProtocolExecutionState derived from engine snapshots
- LiveEligible=false during active catch-up/rebuild sessions
- bindProtocolExecutionPolicy wires policy into BlockVol
- syncProtocolExecutionState called after assignments + core events
Data plane changes:
- WALShipper.Ship() checks liveShippingPolicy before dial/send
- BlockVol.SetLiveShippingPolicy persists across shipper group rebuilds
- ShipperGroup propagates policy to all shippers
Design contract: sw-block/design/v2-protocol-aware-execution.md
Scope: WAL-first rollout only. Prevents illegal live-tail delivery
during active recovery. Does not change snapshot/build behavior or
move backlog. Next wave: bounded WAL catch-up under same contract.
Tests: 4 unit/component tests for phase gate behavior, plus bootstrap
seam tests that confirmed the two pre-existing bugs locally.
13 files changed, 900 insertions, 69 deletions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update coverage reading to reflect 49 tests (6 new component tests).
Add full roster status table with per-item strong/bounded/missing
marking and mapped test function names.
Unit+component: 32 of 33 items strong (T4-C7 NVMe bounded).
Integration: 6 of 10 missing (Tier 2 next).
Hardware: 4 of 4 missing (T6/T7 staged plan).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add detailed coverage mapping of 43 existing tests against the test
roster. Identify 7 missing component tests and 3 missing integration
tests with concrete scenarios, file placement, and must-prove criteria.
Key finding: every tester-found bug during T1-T5 was a wiring bug caught
by reviewing the production path, not by unit tests on pure logic. This
confirms component tests are the highest-value gap for CI/CD protection.
Priority order: Tier 1 (7 component tests, do now), Tier 2 (3 integration
tests, do before hardware), Tier 3 (4 hardware scenarios, T6/T7).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add ClusterReplicationMode and EngineProjectionMode to
FailoverVolumeState so each volume in the failover diagnostic
carries its cluster/engine mode at diagnosis time.
FailoverDiagnosticSnapshot() enriches volume entries by looking up
the registry entry for each volume. This covers both the block
volume API (GET /block/volume/{name}) and the failover diagnostic
snapshot surface.
Update phase doc to reflect actual exposure paths.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix three tester findings on T5:
1. RF2 with missing replicas now reports "degraded" instead of
"no_replicas". Only RF=1 with no replicas returns "no_replicas".
Missing replica in an RF2 set is a degraded cluster state.
2. TransportDegraded signal now incorporated: if master-observed
transport is degraded, ClusterReplicationMode is at least
"degraded" regardless of individual replica health.
3. API surface exposure: EngineProjectionMode and
ClusterReplicationMode now appear on blockapi.VolumeInfo and are
populated in entryToVolumeInfo(). Operators can consume both
through GET /block/volume/{name} with distinct JSON field names.
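The three rules can be sketched on top of an already-computed worst per-replica mode. The signature of `clusterMode` and the mode ranking are assumptions for illustration, not the registry's actual code.

```go
package main

import "fmt"

// modeRank orders modes from best to worst (an assumed ordering); "worst
// dominates" means keeping the higher rank.
var modeRank = map[string]int{
	"keepup": 1, "catching_up": 2, "degraded": 3, "needs_rebuild": 4,
}

// atLeast returns floor when it is worse than mode, otherwise mode.
func atLeast(mode, floor string) string {
	if modeRank[floor] > modeRank[mode] {
		return floor
	}
	return mode
}

// clusterMode applies the rules to replicaWorst, the worst per-replica mode
// among present replicas ("" when none are present).
func clusterMode(rf, presentReplicas int, replicaWorst string, transportDegraded bool) string {
	if rf <= 1 {
		return "no_replicas" // only RF=1 may report no_replicas
	}
	mode := replicaWorst
	if presentReplicas < rf-1 {
		mode = atLeast(mode, "degraded") // missing replica in an RF>=2 set
	}
	if transportDegraded {
		mode = atLeast(mode, "degraded") // master-observed transport floor
	}
	return mode
}

func main() {
	fmt.Println(clusterMode(1, 0, "", false))             // no_replicas
	fmt.Println(clusterMode(2, 0, "", false))             // degraded
	fmt.Println(clusterMode(2, 1, "keepup", true))        // degraded
	fmt.Println(clusterMode(2, 1, "needs_rebuild", true)) // needs_rebuild
}
```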
12 tests: keepup, catching_up, stale degraded, LSN gap needs_rebuild,
rebuilding role, RF1 no_replicas, RF2 missing degraded, transport
degraded, distinctness, heartbeat update, worst dominates, API
surface distinct naming.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add ClusterReplicationMode as a distinct master-owned cluster-level
replication health judgment, computed from multi-replica facts:
replica LSN lag, heartbeat freshness, role state. Monotonic: worst
replica state dominates.
Modes: "no_replicas" (RF=1), "keepup" (all healthy), "catching_up"
(replica behind but recoverable), "degraded" (stale heartbeat or
barrier failure), "needs_rebuild" (unrecoverable gap or rebuilding
role).
Distinct from EngineProjectionMode (VS-local engine truth) and
VolumeMode (legacy). They answer different questions, live in
different fields, and have different names. Tests explicitly prove that
ClusterReplicationMode and EngineProjectionMode can differ without conflict.
Computed in recomputeReplicaState() alongside existing VolumeMode.
Updated on every heartbeat that touches the entry.
9 tests: keepup, catching_up, stale degraded, LSN gap needs_rebuild,
rebuilding role, no_replicas, distinctness from EngineProjectionMode,
heartbeat-driven update, worst-replica-dominates (RF3).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix two tester findings:
1. Missing engine projection now fails closed: if v2Core is active but
CoreProjection(path) is missing, gate locally with reason
"missing_engine_projection". Mirrors T2's fail-closed posture.
Only skips enforcement when V2 core is entirely absent.
2. NVMe/TCP now gated alongside iSCSI: gateServing() calls both
targetServer.DisconnectVolume() and nvmeServer.RemoveVolume().
ungateServing() re-registers with both iSCSI and NVMe. A gated
volume is unreachable through all frontend paths.
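Both fixes can be sketched together. The sketch assumes each frontend exposes an export/unexport pair. The `frontend` interface, `gateReason`, and the `Export`/`Unexport` names are hypothetical. They stand in for DisconnectVolume/RemoveVolume and the matching re-registration calls.

```go
package main

import "fmt"

// frontend is a hypothetical common surface over the iSCSI and NVMe/TCP servers.
type frontend interface {
	Unexport(vol string) // e.g. disconnect sessions and drop the target
	Export(vol string)   // re-register the volume
}

// fakeFrontend records which volumes are currently exported.
type fakeFrontend struct{ exports map[string]bool }

func (f *fakeFrontend) Unexport(vol string) { delete(f.exports, vol) }
func (f *fakeFrontend) Export(vol string)   { f.exports[vol] = true }

// gateReason returns "" when serving may continue. coreActive reports whether
// a V2 core is installed; hasProjection whether it has a projection for the path.
func gateReason(coreActive, hasProjection bool) string {
	if !coreActive {
		return "" // no V2 core at all: enforcement is skipped
	}
	if !hasProjection {
		return "missing_engine_projection" // core present, projection missing: fail closed
	}
	return "" // mode-based gating is evaluated elsewhere
}

// gateServing and ungateServing touch every frontend, so a gated volume has
// no open path through any protocol.
func gateServing(fes []frontend, vol string) {
	for _, f := range fes {
		f.Unexport(vol)
	}
}

func ungateServing(fes []frontend, vol string) {
	for _, f := range fes {
		f.Export(vol)
	}
}

func main() {
	iscsi := &fakeFrontend{exports: map[string]bool{"vol1": true}}
	nvme := &fakeFrontend{exports: map[string]bool{"vol1": true}}
	fes := []frontend{iscsi, nvme}
	if reason := gateReason(true, false); reason != "" {
		gateServing(fes, "vol1")
		fmt.Println(reason, iscsi.exports["vol1"], nvme.exports["vol1"]) // missing_engine_projection false false
	}
}
```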
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix three tester findings on T4 activation gate:
1. Real serving enforcement: evaluateActivationGate now calls
gateServing() → DisconnectVolume(iqn) on gate (terminates active
iSCSI sessions, removes volume from target). ungateServing() →
AddVolume(iqn, adapter) on clear (re-registers volume). This is
actual serving enforcement, not just bookkeeping.
2. Wire propagation: add activation_gated (field 25) and
activation_gate_reason (field 26) to proto BlockVolumeInfoMessage.
Add generated Go fields + getters. Add proto conversion in
InfoMessageToProto/InfoMessageFromProto. Gate state now rides the
real VS→master heartbeat wire.
3. Runtime ungate: evaluateActivationGate() now also runs in
applyCoreEvent() (the observation-driven path), not just
applyCoreAssignmentEvent(). Recovery/catch-up completion that
transitions the projection to publish_healthy/replica_ready now
clears the gate and re-registers the volume automatically.
ClearActivationGate() remains as an explicit override for edge cases
but is no longer the primary ungate path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After assignment executes through V2 core, evaluateActivationGate()
checks the resulting projection locally. If mode is degraded,
needs_rebuild, bootstrap_pending, or allocated_only, the volume is
gated from serving. Gate is enforced immediately after assignment,
before the next heartbeat round-trip.
Gate cleared only when projection reaches publish_healthy or
replica_ready. IsActivationGated() provides the query surface for
iSCSI/NVMe adapter enforcement. Heartbeat carries ActivationGated
and ActivationGateReason fields so master can observe the gated state
(report path, not enforcement path).
activationGated map on BlockService tracks per-volume gate state.
Initialized in constructor. Test helper updated to include it.
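The following is a minimal sketch of the gate transitions. It assumes that modes outside both named sets leave the current gate state unchanged, since the change only names the gating and clearing modes. It also assumes the recorded reason is the gating mode itself. `activationGate` and `evaluate` are hypothetical.

```go
package main

import "fmt"

var gatingModes = map[string]bool{
	"degraded": true, "needs_rebuild": true, "bootstrap_pending": true, "allocated_only": true,
}
var clearingModes = map[string]bool{"publish_healthy": true, "replica_ready": true}

// activationGate tracks per-volume gate state (path -> reason).
type activationGate struct{ gated map[string]string }

// evaluate applies a projection mode locally, right after assignment and
// before any heartbeat, and reports whether the volume is gated afterwards.
func (g *activationGate) evaluate(path, mode string) bool {
	switch {
	case gatingModes[mode]:
		g.gated[path] = mode
	case clearingModes[mode]:
		delete(g.gated, path)
	}
	_, gated := g.gated[path]
	return gated
}

// IsActivationGated is the query surface adapters would consult.
func (g *activationGate) IsActivationGated(path string) (bool, string) {
	reason, ok := g.gated[path]
	return ok, reason
}

func main() {
	g := &activationGate{gated: map[string]string{}}
	for _, m := range []string{"degraded", "catching_up", "publish_healthy"} {
		fmt.Println(m, g.evaluate("/vol1", m))
	}
}
```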
6 tests: degraded gates, needs_rebuild gates, healthy clears gate,
gate enforced before heartbeat, recovery re-enables, assignment with
degraded projection triggers gate.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace misleading V2PromotionEnabled/V2PromotionReady booleans with
single V2PromotionMode string: "disabled", "placeholder_fail_closed",
or "transport_ready".
Previous V2PromotionReady was true whenever any querier was installed,
including the placeholder that always returns error. Now the diagnostic
accurately distinguishes placeholder (fail-closed until proto regen)
from real gRPC transport.
blockV2EvidenceTransport bool on MasterServer tracks whether the real
transport querier is installed. Currently always false (placeholder).
Set to true only when real gRPC querier replaces the placeholder after
proto regen.
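The mode derivation can be sketched as a pure function of the rollout flag and the real-transport bit. The sketch assumes nothing else feeds into it. `v2PromotionMode` is a hypothetical name.

```go
package main

import "fmt"

// v2PromotionMode derives the single diagnostic string. enabled mirrors the
// rollout flag; realTransport mirrors blockV2EvidenceTransport.
func v2PromotionMode(enabled, realTransport bool) string {
	switch {
	case !enabled:
		return "disabled"
	case !realTransport:
		return "placeholder_fail_closed" // placeholder querier always returns an error
	default:
		return "transport_ready"
	}
}

func main() {
	for _, c := range [][2]bool{{false, false}, {true, false}, {true, true}} {
		fmt.Println(c, v2PromotionMode(c[0], c[1]))
	}
}
```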
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
FailoverDiagnostic now carries V2PromotionEnabled and V2PromotionReady
fields. MasterServer.FailoverDiagnosticSnapshot() enriches the failover
state diagnostic with rollout gate visibility so operators can confirm
whether the master is on V1, V2, or V2-fail-closed-placeholder mode.
Update phase-20.md: document default=false rollout policy (safe default
until proto regen enables evidence RPC, then flip to default true).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wire V2 promotion into production binary:
- Add --block.v2Promotion CLI flag on weed master (default false)
- MasterOption.BlockV2Promotion → NewMasterServer wires flag + querier
- defaultBlockVSQueryEvidence placeholder (returns explicit error until
proto regen on M01 enables gRPC evidence RPC)
Fix three fail-closed violations found by the tester:
1. blockV2Promotion=true + nil querier now fails closed with an explicit
log instead of silently falling back to V1
2. Partial evidence (any candidate query failed) now fails closed: an
unreachable candidate may be the most durable, so promoting from
incomplete evidence would violate durability-first ordering
3. Clear EngineProjectionMode in applyPromotionLocked (already in
previous commit, verified in tests here)
2 new tests: NilQuerier_FailsClosed, PartialEvidenceFailure_FailsClosed.
Total T3 tests: 7, all pass. Existing V1 failover tests unaffected.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wire V2 promotion into the real master failover decision path:
promoteReplica() now dispatches to promoteReplicaV2() when
blockV2Promotion flag is true. V2 path queries each candidate for
fresh evidence via pluggable BlockPromotionEvidenceQuerier, selects
by CommittedLSN (durability-first), and fail-closes when no eligible
candidate exists. No silent fallback to V1.
Feature flag: blockV2Promotion bool on MasterServer. When false,
existing promoteReplicaV1() (health-score-first) is used unchanged.
Flag is explicit and observable, not a hidden rescue path.
Registry: add PromoteReplicaByServer() for V2 path where master
already knows the winner. Clear stale EngineProjectionMode in
applyPromotionLocked (complements T1 turnover fix).
T2 fix: fail closed when the V2 core projection is absent, i.e.
Eligible=false with reason "missing_engine_projection". CommittedLSN
from the core is used unconditionally (no WALHeadLSN overstatement).
5 T3 integration tests: higher CommittedLSN wins, all-ineligible
fail-closed, evidence-failure fail-closed, flag-off uses legacy,
epoch bump + assignment enqueue only after selection.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
VS-side evidence handler (QueryBlockPromotionEvidence) reads live
blockvol.Status() + V2 core projection at call time. Fail-closed:
no core projection → ineligible with reason "missing_engine_projection".
Engine CommittedLSN used unconditionally when core present (no WALHeadLSN
overstatement). Eligibility owned by local V2 engine, not master.
Master-side selection (selectDurabilityFirstCandidate): durability-first
ordering by CommittedLSN, tie-break WALHeadLSN then HealthScore. All
ineligible → fail-closed, no promotion. Pluggable querier
(BlockPromotionEvidenceQuerier) for T3 wiring.
Proto messages added to volume_server.proto. gRPC transport binding
pending proto regen on M01 — this commit delivers evidence semantics
and selection substrate, not full end-to-end RPC closure.
Phase 20 doc updated with T2-T5 reviewer packs and cross-task guardrails.
13 tests: live facts, core projection mode, fail-closed no-core, 4 gated
modes, missing volume, epoch mismatch, CommittedLSN ordering, WALHeadLSN
tie-break, HealthScore tie-break, all-ineligible, mixed collection.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add engine_projection_mode as a distinct proto/wire/registry field
that carries pure V2 engine-derived local projection mode from VS
to master. Reads ONLY from CoreProjection, with no ad-hoc fallback.
Separate from existing VolumeMode: EngineProjectionMode is VS-local
V2 engine truth, VolumeMode is the existing field that conflates V2
and V1 paths. Both exist during transition; only EngineProjectionMode
is V2-authoritative.
Clears stale value on primary turnover: when a newly promoted primary
heartbeats without the field, the old primary's projection is not
preserved (prevents synthetic master-side truth).
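The turnover rule can be sketched as follows. The sketch models the optional field as a *string (nil means not sent) and assumes a registry entry that records the current primary. `heartbeat`, `regEntry`, `applyPrimaryHeartbeat`, and `strPtr` are hypothetical.

```go
package main

import "fmt"

type heartbeat struct {
	Server               string
	EngineProjectionMode *string // nil when the VS did not send the field
}

type regEntry struct {
	PrimaryServer        string
	EngineProjectionMode string
}

func strPtr(v string) *string { return &v }

// applyPrimaryHeartbeat adopts the field when present. On turnover without
// the field, the previous primary's value is cleared rather than carried
// over; with the same primary and no field, the stored value is kept.
func applyPrimaryHeartbeat(e *regEntry, hb heartbeat) {
	turnover := e.PrimaryServer != hb.Server
	e.PrimaryServer = hb.Server
	switch {
	case hb.EngineProjectionMode != nil:
		e.EngineProjectionMode = *hb.EngineProjectionMode
	case turnover:
		e.EngineProjectionMode = ""
	}
}

func main() {
	e := &regEntry{PrimaryServer: "vs1", EngineProjectionMode: "publish_healthy"}
	applyPrimaryHeartbeat(e, heartbeat{Server: "vs2"}) // promoted primary, field absent
	fmt.Printf("%q\n", e.EngineProjectionMode)         // ""
}
```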
5 focused tests: propagation, distinctness (hard assertion), backward
compat preservation, turnover-clears, turnover-with-field.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Live HTTP evidence transport, continuous Loop2 service, bounded auto
failover trigger, runtime-managed frontend export, bounded replica
repair, end-to-end RF2 handoff with continued I/O on new primary,
bounded operator HTTP surface, and CSI V2 runtime backend adapter.
11 new proof tests covering the full M6-M10 chain plus CSI create/
lookup/publish through the V2 runtime path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Freeze the first bounded pilot/preflight/stop/rollout-review artifact set and sync the global product ledgers so productionization can start from an explicit chosen-envelope discipline instead of ad hoc rollout judgment.
Made-with: Cursor
Freeze the first Phase 17 branch/contract/policy/envelope package, add review and supported-matrix artifacts, and sync the product-completion and claim-evidence ledgers to the new bounded post-Phase-16 checkpoint.
Made-with: Cursor
Bind non-authoritative inventory, restart primary-truth rebasing, and sparse replica readiness retention into the heartbeat/master seam, and package the bounded finish-line checkpoint with explicit claims, non-claims, and proof commands.
Made-with: Cursor
Carry explicit volume_mode_reason across the heartbeat/master/API seam so outward surfaces retain the bounded core-owned explanation behind mode transitions.
Made-with: Cursor
Use ReplicaEligible instead of PublishHealthy in the heartbeat collector test now that publish health is rebound to publication truth rather than receiver readiness.
Made-with: Cursor
Make the heartbeat/master boundary preserve explicit volume_mode truth so the master's consume path no longer reconstructs outward mode only from secondary heartbeat signals. Keep backward compatibility by falling back to the previous reconstruction when older heartbeats do not send the field.
Made-with: Cursor
Make the heartbeat/master boundary preserve explicit publish_healthy truth so the master's consume path no longer reconstructs healthy publication only from secondary readiness and degraded heuristics. Keep backward compatibility by falling back to the previous reconstruction when older heartbeats do not send the field.
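The explicit-field-with-fallback pattern shared by these heartbeat changes can be sketched for publish_healthy. The sketch assumes the optional field is a *bool and that the legacy reconstruction is "replica ready and not degraded". `primaryBeat` and `publishHealthy` are hypothetical, and the real heuristic may use more signals.

```go
package main

import "fmt"

// primaryBeat is a hypothetical heartbeat shape; a nil PublishHealthy means
// an older VS that does not send the field.
type primaryBeat struct {
	PublishHealthy *bool
	ReplicaReady   bool
	Degraded       bool
}

// publishHealthy prefers the explicit field and uses the legacy
// reconstruction only when the field is absent.
func publishHealthy(hb primaryBeat) bool {
	if hb.PublishHealthy != nil {
		return *hb.PublishHealthy
	}
	return hb.ReplicaReady && !hb.Degraded
}

func main() {
	f := false
	fmt.Println(publishHealthy(primaryBeat{PublishHealthy: &f, ReplicaReady: true})) // false
	fmt.Println(publishHealthy(primaryBeat{ReplicaReady: true}))                     // true
}
```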
Made-with: Cursor
Make the heartbeat/master boundary preserve explicit needs_rebuild truth so the primary-heartbeat consume path no longer collapses that stronger mode into a generic degraded signal. Keep backward compatibility by falling back to the previous heuristic when older heartbeats do not send the field.
Made-with: Cursor
Make the heartbeat/master boundary carry explicit replica readiness truth so the registry no longer depends only on replica transport-address presence as a readiness proxy. Keep backward compatibility by falling back to the old address heuristic when older heartbeats do not send the field.
Made-with: Cursor
Move removed-replica drain and replica-scoped invalidation onto explicit core-command paths so the widened multi-replica runtime no longer depends on coarse host-side recovery handling.
Made-with: Cursor
Emit one core-owned start_recovery_task per primary catch-up replica so the bounded multi-replica startup path no longer depends on a single-replica assumption.
Made-with: Cursor
Track catch-up observations per replica so the volume-level recovery view stays in catching_up until all bounded replicas complete. This preserves the current bounded semantics while removing an overclaim that would block later multi-replica startup ownership work.
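Per-replica catch-up tracking can be sketched as follows. The sketch assumes the volume leaves catching_up for keepup once every tracked replica completes. `recoveryView` and its methods are hypothetical.

```go
package main

import "fmt"

// recoveryView is a hypothetical per-volume view that remembers which
// replicas are still catching up.
type recoveryView struct {
	pending map[string]bool
}

func newRecoveryView(replicas ...string) *recoveryView {
	v := &recoveryView{pending: map[string]bool{}}
	for _, r := range replicas {
		v.pending[r] = true
	}
	return v
}

// complete records one replica's catch-up completion observation.
func (v *recoveryView) complete(replica string) { delete(v.pending, replica) }

// volumeMode stays catching_up while any tracked replica is pending; the
// "keepup" result afterwards is an assumption of this sketch.
func (v *recoveryView) volumeMode() string {
	if len(v.pending) > 0 {
		return "catching_up"
	}
	return "keepup"
}

func main() {
	v := newRecoveryView("r1", "r2")
	v.complete("r1")
	fmt.Println(v.volumeMode()) // catching_up
	v.complete("r2")
	fmt.Println(v.volumeMode()) // keepup
}
```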
Made-with: Cursor
Carry replica-scoped addressing through bounded recovery planning and completion events so the core no longer depends on a volume-only observation seam. This preserves the current single-replica catch-up and rebuilding behavior while aligning the observation side with the replica-scoped command path.
Made-with: Cursor
Replace the remaining volume-scoped recovery command and pending slot
with replica-scoped addressing on the bounded core-present path. This
preserves the current single-replica catch-up and rebuilding behavior
while removing the structural blocker for later multi-replica startup
ownership.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move dispatcher-facing host effects out of volume_server_block.go into
blockcmd while keeping server-owned cache/state semantics in weed/server.
Document Batch 10 delivery and Batch 11 stop-line review so the
separation line closes without over-extracting readiness-state mutation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>