Commit Graph

13225 Commits

Author SHA1 Message Date
pingqiu
f90ccf5bfd fix: proactive shipper reconnect on rejoin (Bug 5)
After rejoin, the shipper is configured but no I/O triggers Ship(),
so the shipper stays Disconnected and the core stays at
awaiting_shipper_connected indefinitely.

Fix: observePrimaryShipperConnectivity now calls TryReconnectShippers
when ShipperConfigured=true but ShipperConnected=false. This triggers
the full reconnect protocol (dial + handshake + bounded catch-up)
proactively, bringing the replica current without waiting for I/O.

Option B approach: uses the same reconnect path as Barrier() — not a
fake write or bare dial probe. CatchUpTo(headLSN) replays any retained
WAL entries, bringing the replica fully current.

New methods:
- WALShipper.TryReconnect(): full reconnect without foreground I/O
- ShipperGroup.TryReconnectAll(): probes all disconnected shippers
- BlockVol.TryReconnectShippers(): volume-level entry point

Also fix pre-existing test expectation: engine now emits
start_recovery_task on primary assignment with replicas.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 00:14:46 -07:00
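Note on f90ccf5bfd: the trigger condition described above can be sketched as follows. This is a minimal illustration, not the repository's code. The shipperStatus struct, the reconnector interface, and the observeConnectivity helper are hypothetical; only the flag names and TryReconnectShippers come from the commit message.

```go
package main

import "fmt"

// shipperStatus mirrors the two flags named in the commit message.
// The struct itself is hypothetical.
type shipperStatus struct {
	ShipperConfigured bool
	ShipperConnected  bool
}

// reconnector is a hypothetical stand-in for the volume-level entry point.
type reconnector interface {
	TryReconnectShippers() error
}

// observeConnectivity starts a reconnect only when a shipper is configured
// but not connected. It reports whether a reconnect was attempted.
func observeConnectivity(s shipperStatus, r reconnector) (bool, error) {
	if s.ShipperConfigured && !s.ShipperConnected {
		return true, r.TryReconnectShippers()
	}
	return false, nil
}

// countingReconnector records how often a reconnect was requested.
type countingReconnector struct{ calls int }

func (c *countingReconnector) TryReconnectShippers() error {
	c.calls++
	return nil
}

func main() {
	r := &countingReconnector{}
	states := []shipperStatus{
		{ShipperConfigured: false, ShipperConnected: false},
		{ShipperConfigured: true, ShipperConnected: true},
		{ShipperConfigured: true, ShipperConnected: false},
	}
	for _, s := range states {
		attempted, err := observeConnectivity(s, r)
		fmt.Println(attempted, err)
	}
	fmt.Println("reconnect calls:", r.calls)
}
```

Only the third state (configured, not connected) reaches the reconnect path, so no foreground I/O is needed to leave the disconnected state.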
pingqiu
53246d2780 fix: recover TOCTOU + WAL pressure edge case tests
Fix recover path TOCTOU: re-Lookup after AddReplica so the primary
refresh assignment includes the freshly added replica addresses.
Previously, Lookup (copy) was called before AddReplica modified the
registry, so entry.Replicas was empty → primary got replicas=0 →
shipper never configured.

Add 2 WAL pressure edge case tests:
- ShipperCatchUpOrEscalate: 64KB WAL, 200 writes, aggressive flusher.
  Proves no hang/deadlock/corruption. Shipper either keeps up or
  correctly escalates to NeedsRebuild.
- RebuildWithPinWhilePrimaryWrites: rebuild session active while
  primary writes 7600+ blocks in 2s. Proves primary never freezes
  — rebuild pin is on replica only, primary WAL recycles freely.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 23:56:26 -07:00
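Note on 53246d2780: the TOCTOU pattern above, where a copy taken by Lookup goes stale once AddReplica mutates the registry, can be reproduced in a few lines. The registry type below is a hypothetical simplification; only the Lookup and AddReplica names and the "Lookup returns a copy" behavior are taken from the commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// entry is a hypothetical, reduced registry entry.
type entry struct {
	Name     string
	Replicas []string
}

// registry is a hypothetical stand-in for the block volume registry.
type registry struct {
	mu      sync.Mutex
	entries map[string]*entry
}

func newRegistry() *registry {
	return &registry{entries: map[string]*entry{}}
}

func (r *registry) Register(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.entries[name] = &entry{Name: name}
}

// Lookup returns a copy, so later registry mutations are not visible in it.
func (r *registry) Lookup(name string) (entry, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	e, ok := r.entries[name]
	if !ok {
		return entry{}, false
	}
	return entry{Name: e.Name, Replicas: append([]string(nil), e.Replicas...)}, true
}

func (r *registry) AddReplica(name, addr string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	e, ok := r.entries[name]
	if !ok {
		return false
	}
	e.Replicas = append(e.Replicas, addr)
	return true
}

// replicasForRefresh applies the fix: mutate first, then look up again.
func replicasForRefresh(r *registry, name, addr string) []string {
	r.AddReplica(name, addr)
	fresh, _ := r.Lookup(name)
	return fresh.Replicas
}

func main() {
	r := newRegistry()
	r.Register("vol1")
	stale, _ := r.Lookup("vol1") // copy taken before the mutation
	fresh := replicasForRefresh(r, "vol1", "10.0.0.2:9000")
	fmt.Println("stale copy replicas:", len(stale.Replicas))
	fmt.Println("re-lookup replicas:", len(fresh))
}
```

The stale copy still reports zero replicas after AddReplica, which is the condition that left the shipper unconfigured before the fix.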
pingqiu
e0116fc631 fix: three hardware blockers — WAL retention + registry race + shutdown beat
All 43 actions pass on m01/m02 hardware. Auto-failover PASS.
dd_write: 30s → 123ms. Post-failover write: 33,621 IOPS.

1. WAL retention: remove keepup retention floor (MinShippedLSN).
   WAL cannot be pinned during sustained async writes — any pin
   strategy either fills WAL (blocking writes) or over-recycles
   (breaking catch-up). Flusher recycles freely. Future LBA map
   will provide catch-up without WAL retention.
   MinShippedLSN on ShipperGroup retained as diagnostic surface.

2. Registry stale-cleanup race: add RegisteredAt grace period.
   Race: master registers volume → next VS heartbeat arrives before
   VS discovers the volume → stale cleanup deletes the entry →
   failover finds 0 entries. Fix: skip stale cleanup for entries
   registered within 30s (> 2 heartbeat intervals).
   2 new tests: grace protects new entry, old entry still cleaned.

3. Shutdown heartbeat: VS disconnect heartbeat no longer claims
   block inventory authority. Previously, the shutdown beat's
   empty inventory triggered stale cleanup, deleting the entry
   before failover could use it.

Scenario fix: recovery-baseline-failover.yaml now kills the
correct node (discovered primary, not hardcoded), connects to
the correct new primary for post-failover verification.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 22:59:46 -07:00
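Note on e0116fc631, item 2: a grace-period check of the kind described above might look like the sketch below. The volumeEntry type, the staleCandidates helper, and the shape of the reported inventory are assumptions made for illustration; the 30s window and the RegisteredAt field name come from the commit message.

```go
package main

import (
	"fmt"
	"time"
)

// registrationGrace is the 30s window from the commit message.
const registrationGrace = 30 * time.Second

// volumeEntry is a hypothetical, reduced registry entry.
type volumeEntry struct {
	Name         string
	RegisteredAt time.Time
}

// staleCandidates returns entries that are missing from the reported
// inventory and were registered longer ago than the grace period.
func staleCandidates(entries []volumeEntry, reported map[string]bool, now time.Time) []string {
	var stale []string
	for _, e := range entries {
		if reported[e.Name] {
			continue // still reported by the volume server
		}
		if now.Sub(e.RegisteredAt) < registrationGrace {
			continue // too new: the volume server may not have discovered it yet
		}
		stale = append(stale, e.Name)
	}
	return stale
}

func main() {
	now := time.Now()
	entries := []volumeEntry{
		{Name: "new-vol", RegisteredAt: now.Add(-5 * time.Second)},
		{Name: "old-vol", RegisteredAt: now.Add(-5 * time.Minute)},
	}
	// An empty inventory models a heartbeat that arrives before discovery.
	fmt.Println(staleCandidates(entries, map[string]bool{}, now))
}
```

With an empty inventory, only the entry older than the grace window is eligible for cleanup, which matches the two tests the commit mentions.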
pingqiu
39f1232fe2 feat: validation matrix closure — Rebuild Ready 12/12, Restore Ready 10/10
Close all Rebuild Ready and Restore Ready matrix gaps. V2 Ready at 10/14
(2 partial, 2 missing — honest assessment).

New tests (tester-written):
- R1: syncAck-driven trigger via protocol engine decision
- R3: stale replica restart beyond WAL → rebuild converges
- R5: connection drop mid-base → cancel → fresh rebuild converges
- R10: failover-rejoin with forced WAL recycling, strict rebuild assert
- R11: divergent replica full overwrite convergence
- R12: crash mid-rebuild → fresh session converges (not resume)
- S2: corrupt WAL entry + corrupt base block both rejected
- S5: snapshot-tail rebuild (base + WAL tail replay)
- S7: crash between base install and tail replay
- S8: snapshot under concurrent writes
- V5: rebuild complete without DurableLSN blocks publish_healthy
- V9: mixed replica health aggregate projection
- V14: negative fail-closed matrix (epoch, kind, stale)

Bug fix: StartRebuildSession now clears stale dirty map + resets WAL +
updates checkpoint AFTER safety check but BEFORE session.Start(). Fixes
stale extent data shadowing rebuild base blocks on reopened replicas.

Cleanup: remove 14 obsolete design docs (migration batches, old WAL-v2
specs, simulator goals) — all superseded by current protocol docs.

34 component tests + 8 protocol engine tests + server tests all pass.
1GB CRC validation passes in 19s.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:31:55 -07:00
pingqiu
59a36013d4 feat: rebuild hardening A1-A5 + session-controlled execution path
A1 Engine kind-routing fix:
  SessionProgressObserved/Completed/Failed now respect active session
  Kind. Rebuild progress no longer leaks into catch-up aggregate.
  sessionKindMismatch guard + observeRebuildProgress helper.
  2 regression tests lock kind isolation.

A2 Retention pin:
  Rebuild session ack drives progress-based WAL retention floor.
  Pin installed at base_lsn on accepted, advances with wal_applied_lsn,
  released on completed/failed/cancelled. rebuildProgressPinFloor
  returns min across all active replicas.
  Retention pin test: 100 blocks fill WAL, 5 flusher cycles with
  20 pinned rebuild entries — all verified correct.

A3 Progress ack emission:
  Automatic sessionAck(running/base_complete/completed/failed) emitted
  from rebuild session lifecycle transitions. sessionAckLocked builds
  ack under session lock. emitRebuildSessionAck callback wired through
  SetOnRebuildSessionAck on BlockVol.
  ObserveReplicaRebuildSessionAck maps acks to core engine events.
  WireLocalReplicaRebuildSessionAcks bridges local callback to server.
  5 server tests proving ack→core, pin advance, pin cleanup.

A4 Deadline/timeout:
  rebuildAckWatch watchdog: armed on accepted/running/base_complete,
  refreshed on each ack, cleared on completed/failed. Timeout
  cancels local session + clears pin + fail-closes.
  2 tests: timeout→fail-close, progress→refresh.

A5 Session-controlled execution path:
  v2bridge.Executor.TransferFullBase now uses session-controlled loop:
  beginControlledFullBase → real sessionControl over TCP →
  transferExtentToSession via RebuildTransportClient →
  PrepareFullBaseRebuild → TryCompleteRebuildSession.
  ReplicaReceiver control channel handles MsgSessionControl alongside
  MsgBarrierReq. Session acks written back on same TCP connection.
  RebuildSessionBase request type separates new per-block stream from
  legacy raw extent stream. Full-base cleanup deferred until success.
  Deadlock fix: ApplyBaseBlock releases session lock before ioMu.
  Hydration skip for full-base sessions.

23 rebuild component tests (all pass):
  11 kernel correctness, 8 transport/runtime, 3 scenario-scale,
  including 1GB primary-initiated with CRC validation.

29 files changed, ~2500 insertions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 14:39:11 -07:00
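Note on 59a36013d4, item A2: the pin lifecycle (install at base_lsn, advance with wal_applied_lsn, release on a terminal state, floor equal to the minimum across active replicas) can be sketched as below. The pinTable type and its method names are hypothetical, and the rule that a pin never moves backwards is an assumption rather than something the commit states.

```go
package main

import "fmt"

// pinTable is a hypothetical per-replica table of rebuild retention pins.
type pinTable struct {
	pins map[string]uint64 // replica ID -> pinned LSN
}

func newPinTable() *pinTable {
	return &pinTable{pins: map[string]uint64{}}
}

// Accept installs the pin at the session's base LSN.
func (p *pinTable) Accept(replica string, baseLSN uint64) {
	p.pins[replica] = baseLSN
}

// Advance moves the pin forward as the replica reports applied WAL.
// Assumption: a pin never moves backwards and unknown replicas are ignored.
func (p *pinTable) Advance(replica string, walAppliedLSN uint64) {
	if cur, ok := p.pins[replica]; ok && walAppliedLSN > cur {
		p.pins[replica] = walAppliedLSN
	}
}

// Release drops the pin on completed, failed, or cancelled.
func (p *pinTable) Release(replica string) {
	delete(p.pins, replica)
}

// Floor returns the minimum pinned LSN across active replicas.
// The second result is false when no pin is active.
func (p *pinTable) Floor() (uint64, bool) {
	found := false
	var floor uint64
	for _, lsn := range p.pins {
		if !found || lsn < floor {
			floor = lsn
			found = true
		}
	}
	return floor, found
}

func main() {
	p := newPinTable()
	p.Accept("r1", 100)
	p.Accept("r2", 250)
	p.Advance("r1", 180)
	fmt.Println(p.Floor())
	p.Release("r1")
	fmt.Println(p.Floor())
	p.Release("r2")
	fmt.Println(p.Floor())
}
```

Returning an explicit "no active pin" flag keeps a zero LSN from being mistaken for a real floor once every session has finished.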
pingqiu
342f8baa69 feat: rebuild transport wiring — session control + base block streaming
Wire protocol messages and transport handlers for the rebuild MVP:

Protocol messages (rebuild_transport.go):
- SessionControlMsg: epoch, sessionID, command, baseLSN, targetLSN,
  snapshotID. Encode/Decode with fixed 37-byte wire format.
- SessionAckMsg: epoch, sessionID, phase, walAppliedLSN, baseComplete,
  achievedLSN. Encode/Decode with fixed 34-byte wire format.
- MsgSessionControl (0x10) and MsgSessionAck (0x11) on control channel.
- SendSessionControl/SendSessionAck convenience functions.

Transport handlers:
- RebuildTransportServer: primary-side, streams all extent blocks as
  MsgRebuildExtent frames (reusing existing rebuild message type),
  ends with MsgRebuildDone.
- RebuildTransportClient: replica-side, receives base blocks and
  routes through vol.ApplyRebuildSessionBaseBlock, marks base
  complete on MsgRebuildDone.

4 transport tests:
- SessionControl wire round-trip
- SessionAck wire round-trip
- BaseBlockStreaming: full TCP loop, 1024 blocks streamed and verified
- SessionControlOverTCP: real TCP send/receive with accepted ack

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 14:57:43 -07:00
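Note on 342f8baa69: one field layout that adds up to the fixed 37-byte SessionControlMsg format is sketched below. The commit names the fields and the total size but not the widths or byte order, so the widths (8+8+1+8+8+4), the big-endian encoding, and the type and function names are all assumptions; the real rebuild_transport.go may differ.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// sessionControlSize is the fixed frame size given in the commit message.
const sessionControlSize = 37

// sessionControl is a hypothetical layout: 8+8+1+8+8+4 = 37 bytes.
type sessionControl struct {
	Epoch      uint64
	SessionID  uint64
	Command    uint8
	BaseLSN    uint64
	TargetLSN  uint64
	SnapshotID uint32 // width assumed
}

var errBadLength = errors.New("session control: unexpected frame length")

// encode writes the fields at fixed offsets (big-endian is an assumption).
func (m sessionControl) encode() []byte {
	buf := make([]byte, sessionControlSize)
	binary.BigEndian.PutUint64(buf[0:8], m.Epoch)
	binary.BigEndian.PutUint64(buf[8:16], m.SessionID)
	buf[16] = m.Command
	binary.BigEndian.PutUint64(buf[17:25], m.BaseLSN)
	binary.BigEndian.PutUint64(buf[25:33], m.TargetLSN)
	binary.BigEndian.PutUint32(buf[33:37], m.SnapshotID)
	return buf
}

// decodeSessionControl rejects any frame that is not exactly 37 bytes.
func decodeSessionControl(buf []byte) (sessionControl, error) {
	if len(buf) != sessionControlSize {
		return sessionControl{}, errBadLength
	}
	return sessionControl{
		Epoch:      binary.BigEndian.Uint64(buf[0:8]),
		SessionID:  binary.BigEndian.Uint64(buf[8:16]),
		Command:    buf[16],
		BaseLSN:    binary.BigEndian.Uint64(buf[17:25]),
		TargetLSN:  binary.BigEndian.Uint64(buf[25:33]),
		SnapshotID: binary.BigEndian.Uint32(buf[33:37]),
	}, nil
}

func main() {
	in := sessionControl{Epoch: 7, SessionID: 42, Command: 1, BaseLSN: 1000, TargetLSN: 2000, SnapshotID: 3}
	wire := in.encode()
	out, err := decodeSessionControl(wire)
	fmt.Println(len(wire), out == in, err)
}
```

A fixed-size frame lets the receiver validate the length before reading any field, which is what the round-trip test in the commit would exercise.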
pingqiu
49845dd509 feat: server-layer rebuild session skeleton — host routing for MVP
Add BlockService replica-side rebuild routing API that bridges
transport/host layer to BlockVol session surface:

  StartReplicaRebuildSession(path, config)
  ApplyReplicaRebuildWALEntry(path, sessionID, entry)
  ApplyReplicaRebuildBaseBlock(path, sessionID, lba, data)
  MarkReplicaRebuildBaseComplete(path, sessionID, totalBlocks)
  TryCompleteReplicaRebuildSession(path, sessionID)
  CancelReplicaRebuildSession(path, sessionID, reason)
  ReplicaRebuildSession(path) → snapshot

Each method does one thing: validate → WithVolume → delegate to BlockVol.
No wire decoding, no protocol decisions, no state invention. Transport
wiring (sessionControl/walData/sessionData handlers) is the next step.

2 focused tests: skeleton routes correctly, stale session ID rejected.

Updated v2-rebuild-mvp-session-protocol.md with server skeleton section.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 14:53:32 -07:00
pingqiu
d2d57851b0 feat: rebuild MVP — dual-lane session with bitmap protection
Rebuild session protocol implementation for v2-rebuild-mvp-session-protocol.md.

New files:
- rebuild_bitmap.go: RebuildBitmap — session-scoped dense bitset for
  WAL-applied LBA tracking. MarkApplied on local WAL write (not receive).
  ShouldApplyBase returns false for WAL-covered LBAs (WAL always wins).

- rebuild_session.go: RebuildSession — replica-side two-line rebuild.
  WAL lane (ApplyWALEntry) + base lane (ApplyBaseBlock) with bitmap
  conflict resolution. TryComplete requires BOTH base_complete AND
  wal_applied_lsn >= target_lsn. Volume-level control surface:
  StartRebuildSession, ApplyRebuildSessionWALEntry/BaseBlock,
  MarkRebuildSessionBaseComplete, TryCompleteRebuildSession,
  CancelRebuildSession, ActiveRebuildSession.

- rebuild_mvp_test.go: 4 correctness tests — base+WAL converge,
  WAL-applied never overwritten by base, bitmap set on applied not
  received, control surface start/supersede/complete.

- rebuild_transport_test.go: 2 transport-level tests — two-line with
  real WAL shipping, live writes during base copy with bitmap conflict.

Design docs:
- v2-rebuild-mvp-session-protocol.md: MVP spec with message set, apply
  rules, completion/failure/crash rules, test matrix
- v2-sync-recovery-protocol.md: full protocol context (keepup/catchup/
  rebuild unified design, primary decision logic, two-line model)
- v2-session-protocol-shape.md: protocol shape overview

Protocol engine (reference, not production):
- sw-block/protocol/: 7-event engine with ~300 lines, 13 tests

6 rebuild tests pass, all existing component tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 14:30:34 -07:00
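Note on d2d57851b0: the bitmap rule (WAL always wins) and the completion rule (base complete AND wal_applied_lsn >= target_lsn) can be sketched together. The types below are hypothetical reductions: they do no I/O, and the field and constructor names are assumptions. Only MarkApplied, ShouldApplyBase, and the two rules come from the commit message.

```go
package main

import "fmt"

// rebuildBitmap is a hypothetical dense bitset with one bit per LBA.
type rebuildBitmap struct {
	words []uint64
	total uint64
}

func newRebuildBitmap(totalBlocks uint64) *rebuildBitmap {
	return &rebuildBitmap{words: make([]uint64, (totalBlocks+63)/64), total: totalBlocks}
}

// MarkApplied records that a WAL entry for this LBA was written locally.
func (b *rebuildBitmap) MarkApplied(lba uint64) {
	if lba < b.total {
		b.words[lba/64] |= uint64(1) << (lba % 64)
	}
}

// ShouldApplyBase is false for WAL-covered LBAs, so base data never
// overwrites newer WAL data. Out-of-range LBAs are rejected.
func (b *rebuildBitmap) ShouldApplyBase(lba uint64) bool {
	if lba >= b.total {
		return false
	}
	return b.words[lba/64]&(uint64(1)<<(lba%64)) == 0
}

// sessionState holds the two completion inputs (names assumed).
type sessionState struct {
	BaseComplete  bool
	WALAppliedLSN uint64
	TargetLSN     uint64
}

// canComplete requires both conditions from the commit message.
func (s sessionState) canComplete() bool {
	return s.BaseComplete && s.WALAppliedLSN >= s.TargetLSN
}

func main() {
	bm := newRebuildBitmap(128)
	bm.MarkApplied(70)
	fmt.Println(bm.ShouldApplyBase(70), bm.ShouldApplyBase(71))

	s := sessionState{BaseComplete: true, WALAppliedLSN: 90, TargetLSN: 100}
	fmt.Println(s.canComplete())
	s.WALAppliedLSN = 100
	fmt.Println(s.canComplete())
}
```

Setting the bit when the entry is applied locally, not when it is received, keeps the bitmap from hiding a base block behind a WAL entry that never reached the extent.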
pingqiu
55013e103b feat: Phase 20 Stage 0+1 closure — bootstrap + sustained workload on hardware
Stage 0 (bootstrap closure): PASS on m01/m02
  - create RF=2 sync_all → 10s shipper wait → 4k fsync → publish_healthy
  - Proves: BarrierAccepted observation, ShipperConnected, DurableLSN > 0

Stage 1 (sustained workload): 32/33 actions PASS
  - bootstrap → fio 10s randwrite → dd_write 1M×2 fsync → data checksum
  - Remaining: auto-failover promotion (separate issue)

Key fixes:
  - BarrierAccepted callback: SyncCache success → core DurableLSN update
  - BarrierRejected callback: barrier failures surface to core with reason
  - Shipper state callback for new volumes (not just startup volumes)
  - CatchUpTo ctrl conn reset: prevents stale control channel after recovery
  - CP13-6 max-bytes budget suspended: uses replicaFlushedLSN which can't
    advance without barrier; kills healthy shippers during async writes.
    Will be replaced by v2 negotiated sync/recovery protocol.
  - Barrier diagnostic logging: start/fail/success with reason and LSN
  - Scenario restructured: Stage 0 (bootstrap-closure) + Stage 1 (failover)
  - dd_write: sync_mode param + real stderr capture
  - sw-test-runner suite command: deploy once, run N scenarios
  - WAL size plumbing: proto + API + handler (forward-compatible)

Known: 6 blockvol/server test failures from Barrier() path change
(bounded catch-up in Barrier). Need test updates to match new semantics.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 19:55:12 -07:00
pingqiu
44103a1bd7 feat: Phase 20 acceptance fixes + sw-test-runner suite mode
Acceptance rows closed:
- WriteLBA/SyncCache contract: code comments document write-back vs
  durability fence semantics
- RF=2 stable identity: v2bridge always uses SetReplicaAddrs (preserves
  ServerID); blockcmd dispatcher also fixed to use setupPrimaryReplicationMulti;
  test asserts exact expected ReplicaID="vs-2" (not just non-empty)
- Tests treating WriteLBA as commit: replica_read_test rewritten with
  SyncCache as durability fence
- publish_healthy contract: 3 gate tests with hard assertions including
  gate 3 (PrimaryShipperConnected)
- SetReplicaAddr deprecation warning added
- WALShipper.ReplicaID() getter added for identity verification

Test runner enhancements:
- sw-test-runner suite command: build → deploy → run N scenarios in one
  invocation with --skip-deploy support
- Suite YAML definitions for T6 Stage 0 and Stage 1
- deploy action: kill stale processes, clean dirs, cross-compile, upload
- run-phase20-t6.ps1 PowerShell script (deprecated by suite command)

Engine/runtime fixes:
- Recovery executor nil-safety improvements
- Recovery bundle BuildRecoveryBundle defensive checks
- ShipperGroup MinReplicaFlushedLSNAll surface

Docs: acceptance checklist refined, test matrix updated, T6 runbook,
engine maintainer tutorial, design README updated.

26 files changed, ~1600 insertions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 11:30:54 -07:00
pingqiu
275c3ee1c7 docs: Phase 20 acceptance checklist — architect-refined signoff matrix
Tighten acceptance matrix with explicit per-boundary rows, signoff
reading split into hard blockers vs product hardening, and clear
rule: architecture-complete ≠ product-complete.

6 hard blockers before T6/T7:
1. WriteLBA/SyncCache/sync_all contract closure
2. Fresh replica bounded catch-up before live tail
3. Timeout/retention-loss classification for catch-up
4. publish_healthy alignment with one protocol contract
5. RF=2 stable identity on all shipping paths
6. Test audit for incorrect WriteLBA==commit assumptions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 00:12:32 -07:00
pingqiu
58aa842802 docs: Phase 20 product acceptance checklist
7-area acceptance matrix mapping current state vs product requirements:
write/durability contract, fresh replica bootstrap, host observation
completeness, serving/publish alignment, snapshot/rebuild convergence,
adapter consistency, test contract alignment.

Each item marked with: current state, required for product, blocks
T6/T7, best test level. Priority ordered into must-close-before-Stage-1,
should-close-before-Stage-2, and can-close-after-T6/T7.

Key diagnosis: architecture-complete, execution-incomplete. The engine
thinks like a product; the data plane still behaves partly like a
prototype. The gap is end-to-end contract closure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 00:05:22 -07:00
pingqiu
d1a16fac03 feat: protocol-aware execution wave — phase gate for live WAL shipping
Add host-side protocol state seam that derives per-replica execution
state from V2 sender/session snapshots and blocks live-tail WAL
shipping while an active recovery session is in progress.

New file: weed/server/block_protocol_state.go
  - replicaProtocolExecutionState derived from engine snapshots
  - LiveEligible=false during active catch-up/rebuild sessions
  - bindProtocolExecutionPolicy wires policy into BlockVol
  - syncProtocolExecutionState called after assignments + core events

Data plane changes:
  - WALShipper.Ship() checks liveShippingPolicy before dial/send
  - BlockVol.SetLiveShippingPolicy persists across shipper group rebuilds
  - ShipperGroup propagates policy to all shippers

Design contract: sw-block/design/v2-protocol-aware-execution.md

Scope: WAL-first rollout only. Prevents illegal live-tail delivery
during active recovery. Does not change snapshot/build behavior or
move backlog. Next wave: bounded WAL catch-up under same contract.

Tests: 4 unit/component tests for phase gate behavior, plus bootstrap
seam tests that confirmed the two pre-existing bugs locally.

13 files changed, 900 insertions, 69 deletions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 23:47:07 -07:00
pingqiu
f8e8c2c4d1 docs: fix Phase 20 test count — 48 not 49
Verified by counting: T1(5) + T2(12) + T3(8) + T4(8) + T5(13) + Proto(2) = 48.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 21:06:00 -07:00
pingqiu
c7dd90c623 docs: Phase 20 test matrix — update with Tier 1 results + full roster status
Update coverage reading to reflect 49 tests (6 new component tests).
Add full roster status table with per-item strong/bounded/missing
marking and mapped test function names.

Unit+component: 32 of 33 items strong (T4-C7 NVMe bounded).
Integration: 6 of 10 missing (Tier 2 next).
Hardware: 4 of 4 missing (T6/T7 staged plan).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 19:42:13 -07:00
pingqiu
6bf9a6c283 test: Phase 20 Tier 1 component tests — wiring proof for CI/CD
6 new component tests closing gaps identified in the test matrix audit:

P20-T4-C3: Missing projection with active V2 core fails closed
  - v2Core != nil, no projection cached → gate with "missing_engine_projection"

P20-T4-C6: Gate actually removes iSCSI target (enforcement)
  - real TargetServer → HasTarget(iqn)==true before gate
  - gate → HasTarget(iqn)==false (DisconnectVolume called)
  - ungate → HasTarget(iqn)==true (AddVolume restores)

P20-T5-C3: FailoverDiagnosticSnapshot carries both mode fields
  - register volume with EngineProjectionMode + ClusterReplicationMode
  - trigger pending rebuild → volume appears in diagnostic
  - diagnostic entry carries both modes from registry lookup

P20-T3-C5: V2PromotionMode diagnostic tri-state
  - disabled / placeholder_fail_closed / transport_ready
  - all three configurations produce correct diagnostic value

P20-T1-C3: EngineProjectionMode proto round-trip
  - set value survives InfoMessageToProto → InfoMessageFromProto
  - empty value produces nil proto field (presence semantics)

P20-T4-C8: ActivationGated proto round-trip
  - gated=true + reason survives round-trip
  - not-gated produces no spurious reason

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 19:39:00 -07:00
pingqiu
1c7154a11a docs: Phase 20 test matrix — gap inventory + component test specs
Add detailed coverage mapping of 43 existing tests against the test
roster. Identify 7 missing component tests and 3 missing integration
tests with concrete scenarios, file placement, and must-prove criteria.

Key finding: every tester-found bug during T1-T5 was a wiring bug caught
by reviewing the production path, not by unit tests on pure logic. This
confirms component tests are the highest-value gap for CI/CD protection.

Priority order: Tier 1 (7 component tests, do now), Tier 2 (3 integration
tests, do before hardware), Tier 3 (4 hardware scenarios, T6/T7).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 19:33:55 -07:00
pingqiu
3e6155c18e docs: Phase 20 T5 — wire ClusterReplicationMode into diagnostic surface
Add ClusterReplicationMode and EngineProjectionMode to
FailoverVolumeState so each volume in the failover diagnostic
carries its cluster/engine mode at diagnosis time.

FailoverDiagnosticSnapshot() enriches volume entries by looking up
the registry entry for each volume. This covers both the block
volume API (GET /block/volume/{name}) and the failover diagnostic
snapshot surface.

Update phase doc to reflect actual exposure paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 18:56:40 -07:00
pingqiu
ceb68cc66b fix: Phase 20 T5 — RF2 missing replica degraded + transport signal + API surface
Fix three tester findings on T5:

1. RF2 with missing replicas now reports "degraded" instead of
   "no_replicas". Only RF=1 with no replicas returns "no_replicas".
   Missing replica in an RF2 set is a degraded cluster state.

2. TransportDegraded signal now incorporated: if master-observed
   transport is degraded, ClusterReplicationMode is at least
   "degraded" regardless of individual replica health.

3. API surface exposure: EngineProjectionMode and
   ClusterReplicationMode now appear on blockapi.VolumeInfo and are
   populated in entryToVolumeInfo(). Operators can consume both
   through GET /block/volume/{name} with distinct JSON field names.

12 tests: keepup, catching_up, stale degraded, LSN gap needs_rebuild,
rebuilding role, RF1 no_replicas, RF2 missing degraded, transport
degraded, distinctness, heartbeat update, worst dominates, API
surface distinct naming.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 18:49:37 -07:00
pingqiu
013f3e7ccb feat: Phase 20 T5 — ClusterReplicationMode on master
Add ClusterReplicationMode as a distinct master-owned cluster-level
replication health judgment, computed from multi-replica facts:
replica LSN lag, heartbeat freshness, role state. Monotonic: worst
replica state dominates.

Modes: "no_replicas" (RF=1), "keepup" (all healthy), "catching_up"
(replica behind but recoverable), "degraded" (stale heartbeat or
barrier failure), "needs_rebuild" (unrecoverable gap or rebuilding
role).

Distinct from EngineProjectionMode (VS-local engine truth) and
VolumeMode (legacy). They answer different questions, live in
different fields, have different names. Tests explicitly prove the
two can differ without conflict.

Computed in recomputeReplicaState() alongside existing VolumeMode.
Updated on every heartbeat that touches the entry.

9 tests: keepup, catching_up, stale degraded, LSN gap needs_rebuild,
rebuilding role, no_replicas, distinctness from EngineProjectionMode,
heartbeat-driven update, worst-replica-dominates (RF3).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 18:41:44 -07:00
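Note on 013f3e7ccb: the "worst replica state dominates" rule can be sketched as a severity ranking. The ranking order below follows the order in which the commit lists the modes, which is an assumption, as are the function name and the choice to treat an unrecognized state as the most severe mode.

```go
package main

import "fmt"

// severity ranks per-replica states from healthiest to worst.
// The order is assumed from the order the commit lists the modes in.
var severity = map[string]int{
	"keepup":        0,
	"catching_up":   1,
	"degraded":      2,
	"needs_rebuild": 3,
}

// clusterMode reduces per-replica states to one cluster-level mode.
// Hypothetical helper: the real logic lives in recomputeReplicaState().
func clusterMode(replicaStates []string) string {
	if len(replicaStates) == 0 {
		return "no_replicas"
	}
	worst := "keepup"
	for _, s := range replicaStates {
		rank, ok := severity[s]
		if !ok {
			// Assumption: an unknown state is treated as the worst case.
			return "needs_rebuild"
		}
		if rank > severity[worst] {
			worst = s
		}
	}
	return worst
}

func main() {
	fmt.Println(clusterMode(nil))
	fmt.Println(clusterMode([]string{"keepup", "keepup"}))
	fmt.Println(clusterMode([]string{"keepup", "catching_up"}))
	fmt.Println(clusterMode([]string{"degraded", "needs_rebuild", "keepup"}))
}
```

Because the reduction only ever moves toward a worse rank, adding a healthy replica to the set can never improve the reported mode.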
pingqiu
9cead1b502 fix: Phase 20 T4 — fail-closed on missing projection + NVMe gate
Fix two tester findings:

1. Missing engine projection now fails closed: if v2Core is active but
   CoreProjection(path) is missing, gate locally with reason
   "missing_engine_projection". Mirrors T2's fail-closed posture.
   Only skips enforcement when V2 core is entirely absent.

2. NVMe/TCP now gated alongside iSCSI: gateServing() calls both
   targetServer.DisconnectVolume() and nvmeServer.RemoveVolume().
   ungateServing() re-registers with both iSCSI and NVMe. A gated
   volume is unreachable through all frontend paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 18:29:58 -07:00
pingqiu
46f72572c5 fix: Phase 20 T4 — real serving enforcement + wire propagation + runtime ungate
Fix three tester findings on T4 activation gate:

1. Real serving enforcement: evaluateActivationGate now calls
   gateServing() → DisconnectVolume(iqn) on gate (terminates active
   iSCSI sessions, removes volume from target). ungateServing() →
   AddVolume(iqn, adapter) on clear (re-registers volume). This is
   actual serving enforcement, not just bookkeeping.

2. Wire propagation: add activation_gated (field 25) and
   activation_gate_reason (field 26) to proto BlockVolumeInfoMessage.
   Add generated Go fields + getters. Add proto conversion in
   InfoMessageToProto/InfoMessageFromProto. Gate state now rides the
   real VS→master heartbeat wire.

3. Runtime ungate: evaluateActivationGate() now also runs in
   applyCoreEvent() (the observation-driven path), not just
   applyCoreAssignmentEvent(). Recovery/catch-up completion that
   transitions the projection to publish_healthy/replica_ready now
   clears the gate and re-registers the volume automatically.

ClearActivationGate() remains as an explicit override for edge cases
but is no longer the primary ungate path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 18:22:49 -07:00
pingqiu
a27569358b feat: Phase 20 T4 — local activation gate on promoted primary
After assignment executes through V2 core, evaluateActivationGate()
checks the resulting projection locally. If mode is degraded,
needs_rebuild, bootstrap_pending, or allocated_only, the volume is
gated from serving. Gate is enforced immediately after assignment,
before the next heartbeat round-trip.

Gate cleared only when projection reaches publish_healthy or
replica_ready. IsActivationGated() provides the query surface for
iSCSI/NVMe adapter enforcement. Heartbeat carries ActivationGated
and ActivationGateReason fields so master can observe the gated state
(report path, not enforcement path).

activationGated map on BlockService tracks per-volume gate state.
Initialized in constructor. Test helper updated to include it.

6 tests: degraded gates, needs_rebuild gates, healthy clears gate,
gate enforced before heartbeat, recovery re-enables, assignment with
degraded projection triggers gate.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 18:13:20 -07:00
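Note on a27569358b: the gate transition described above can be written as a small pure function. The mode strings are taken from the commit message; the function name, the use of the mode as the gate reason, and the rule that any other mode leaves the gate unchanged are assumptions.

```go
package main

import "fmt"

// nextGate computes the gate state after observing a projection mode.
// It is a hypothetical reduction of evaluateActivationGate().
func nextGate(gated bool, reason string, mode string) (bool, string) {
	switch mode {
	case "degraded", "needs_rebuild", "bootstrap_pending", "allocated_only":
		// Gated modes: stop serving, record why (reason format assumed).
		return true, mode
	case "publish_healthy", "replica_ready":
		// The only modes that clear the gate.
		return false, ""
	default:
		// Assumption: other modes neither set nor clear the gate.
		return gated, reason
	}
}

func main() {
	gated, reason := false, ""
	for _, mode := range []string{"degraded", "catching_up", "publish_healthy"} {
		gated, reason = nextGate(gated, reason, mode)
		fmt.Printf("%s -> gated=%v reason=%q\n", mode, gated, reason)
	}
}
```

Keeping the decision free of side effects makes it straightforward to test separately from the iSCSI and NVMe enforcement that later commits add.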
pingqiu
f825f08680 fix: Phase 20 T3 — correct V2 promotion observability to tri-state mode
Replace misleading V2PromotionEnabled/V2PromotionReady booleans with
single V2PromotionMode string: "disabled", "placeholder_fail_closed",
or "transport_ready".

Previous V2PromotionReady was true whenever any querier was installed,
including the placeholder that always returns error. Now the diagnostic
accurately distinguishes placeholder (fail-closed until proto regen)
from real gRPC transport.

blockV2EvidenceTransport bool on MasterServer tracks whether the real
transport querier is installed. Currently always false (placeholder).
Set to true only when real gRPC querier replaces the placeholder after
proto regen.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 16:29:12 -07:00
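Note on f825f08680: the tri-state value can be derived from two booleans, as sketched below. The three mode strings come from the commit message. The mapping of "disabled" to the feature flag being off and the function and parameter names are assumptions.

```go
package main

import "fmt"

// promotionMode derives the diagnostic string from the rollout flag and
// from whether a real transport querier (not the placeholder) is installed.
// Hypothetical helper; parameter names are illustrative.
func promotionMode(v2Enabled, realTransport bool) string {
	switch {
	case !v2Enabled:
		return "disabled"
	case !realTransport:
		return "placeholder_fail_closed"
	default:
		return "transport_ready"
	}
}

func main() {
	fmt.Println(promotionMode(false, false))
	fmt.Println(promotionMode(true, false))
	fmt.Println(promotionMode(true, true))
}
```

A single string avoids the earlier ambiguity in which a "ready" boolean was true even though the installed querier always returned an error.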
pingqiu
2b97cd04b8 fix: Phase 20 T3 — add V2 promotion observability to FailoverDiagnostic
FailoverDiagnostic now carries V2PromotionEnabled and V2PromotionReady
fields. MasterServer.FailoverDiagnosticSnapshot() enriches the failover
state diagnostic with rollout gate visibility so operators can confirm
whether the master is on V1, V2, or V2-fail-closed-placeholder mode.

Update phase-20.md: document default=false rollout policy (safe default
until proto regen enables evidence RPC, then flip to default true).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 16:27:02 -07:00
pingqiu
43016e6645 fix: Phase 20 T3 — production wiring + fail-closed on partial evidence
Wire V2 promotion into production binary:
- Add --block.v2Promotion CLI flag on weed master (default false)
- MasterOption.BlockV2Promotion → NewMasterServer wires flag + querier
- defaultBlockVSQueryEvidence placeholder (returns explicit error until
  proto regen on M01 enables gRPC evidence RPC)

Fix three fail-closed violations found by tester:
1. blockV2Promotion=true + nil querier now fails closed with explicit
   log instead of silently falling back to V1
2. Partial evidence (any candidate query failed) now fails closed —
   unreachable candidate may be the most durable, promoting from
   incomplete evidence violates durability-first ordering
3. Clear EngineProjectionMode in applyPromotionLocked (already in
   previous commit, verified in tests here)

2 new tests: NilQuerier_FailsClosed, PartialEvidenceFailure_FailsClosed.
Total T3 tests: 7, all pass. Existing V1 failover tests unaffected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 16:23:35 -07:00
pingqiu
59b2e2d8f9 feat: Phase 20 T3 — durability-first V2 promotion in real failover path
Wire V2 promotion into the real master failover decision path:
promoteReplica() now dispatches to promoteReplicaV2() when
blockV2Promotion flag is true. V2 path queries each candidate for
fresh evidence via pluggable BlockPromotionEvidenceQuerier, selects
by CommittedLSN (durability-first), and fail-closes when no eligible
candidate exists. No silent fallback to V1.

Feature flag: blockV2Promotion bool on MasterServer. When false,
existing promoteReplicaV1() (health-score-first) is used unchanged.
Flag is explicit and observable, not a hidden rescue path.

Registry: add PromoteReplicaByServer() for V2 path where master
already knows the winner. Clear stale EngineProjectionMode in
applyPromotionLocked (complements T1 turnover fix).

T2 fix: fail-closed when V2 core projection is absent —
Eligible=false with reason "missing_engine_projection". CommittedLSN
from core used unconditionally (no WALHeadLSN overstatement).

5 T3 integration tests: higher CommittedLSN wins, all-ineligible
fail-closed, evidence-failure fail-closed, flag-off uses legacy,
epoch bump + assignment enqueue only after selection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 16:15:54 -07:00
pingqiu
1ca13143b6 feat: Phase 20 T2 — promotion evidence semantics + selection substrate
VS-side evidence handler (QueryBlockPromotionEvidence) reads live
blockvol.Status() + V2 core projection at call time. Fail-closed:
no core projection → ineligible with reason "missing_engine_projection".
Engine CommittedLSN used unconditionally when core present (no WALHeadLSN
overstatement). Eligibility owned by local V2 engine, not master.

Master-side selection (selectDurabilityFirstCandidate): durability-first
ordering by CommittedLSN, tie-break WALHeadLSN then HealthScore. All
ineligible → fail-closed, no promotion. Pluggable querier
(BlockPromotionEvidenceQuerier) for T3 wiring.

Proto messages added to volume_server.proto. gRPC transport binding
pending proto regen on M01 — this commit delivers evidence semantics
and selection substrate, not full end-to-end RPC closure.

Phase 20 doc updated with T2-T5 reviewer packs and cross-task guardrails.

13 tests: live facts, core projection mode, fail-closed no-core, 4 gated
modes, missing volume, epoch mismatch, CommittedLSN ordering, WALHeadLSN
tie-break, HealthScore tie-break, all-ineligible, mixed collection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 16:10:57 -07:00
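Note on 1ca13143b6: the durability-first ordering (CommittedLSN, then WALHeadLSN, then HealthScore, with fail-closed behavior when nothing is eligible) can be sketched with a stable sort. The evidence struct and function name are hypothetical, and the assumption that higher values win at every level is inferred from the test names in the commit message.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// evidence is a hypothetical, reduced promotion evidence record.
type evidence struct {
	Server       string
	Eligible     bool
	CommittedLSN uint64
	WALHeadLSN   uint64
	HealthScore  float64
}

var errNoEligible = errors.New("no eligible promotion candidate")

// selectDurabilityFirst picks the eligible candidate with the highest
// CommittedLSN, breaking ties by WALHeadLSN and then HealthScore.
// It fails closed when no candidate is eligible.
func selectDurabilityFirst(cands []evidence) (evidence, error) {
	eligible := make([]evidence, 0, len(cands))
	for _, c := range cands {
		if c.Eligible {
			eligible = append(eligible, c)
		}
	}
	if len(eligible) == 0 {
		return evidence{}, errNoEligible
	}
	sort.SliceStable(eligible, func(i, j int) bool {
		a, b := eligible[i], eligible[j]
		if a.CommittedLSN != b.CommittedLSN {
			return a.CommittedLSN > b.CommittedLSN
		}
		if a.WALHeadLSN != b.WALHeadLSN {
			return a.WALHeadLSN > b.WALHeadLSN
		}
		return a.HealthScore > b.HealthScore
	})
	return eligible[0], nil
}

func main() {
	winner, err := selectDurabilityFirst([]evidence{
		{Server: "vs-1", Eligible: true, CommittedLSN: 90, WALHeadLSN: 120, HealthScore: 1.0},
		{Server: "vs-2", Eligible: true, CommittedLSN: 100, WALHeadLSN: 100, HealthScore: 0.5},
		{Server: "vs-3", Eligible: false, CommittedLSN: 500, WALHeadLSN: 500, HealthScore: 1.0},
	})
	fmt.Println(winner.Server, err)
}
```

Filtering on eligibility before sorting means an ineligible node can never win on LSN alone, and an empty eligible set yields an error instead of a fallback choice.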
pingqiu
85dad8e0c9 feat: Phase 20 T1 — EngineProjectionMode in heartbeat
Add engine_projection_mode as a distinct proto/wire/registry field
that carries pure V2 engine-derived local projection mode from VS
to master. Reads ONLY from CoreProjection — no ad-hoc fallback.

Separate from existing VolumeMode: EngineProjectionMode is VS-local
V2 engine truth, VolumeMode is the existing field that conflates V2
and V1 paths. Both exist during transition; only EngineProjectionMode
is V2-authoritative.

Clears stale value on primary turnover: when a newly promoted primary
heartbeats without the field, the old primary's projection is not
preserved (prevents synthetic master-side truth).

5 focused tests: propagation, distinctness (hard assertion), backward
compat preservation, turnover-clears, turnover-with-field.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 15:45:26 -07:00
pingqiu
044a6d770b feat: Phase 19 — bounded working RF2 block path
Live HTTP evidence transport, continuous Loop2 service, bounded auto
failover trigger, runtime-managed frontend export, bounded replica
repair, end-to-end RF2 handoff with continued I/O on new primary,
bounded operator HTTP surface, and CSI V2 runtime backend adapter.

11 new proof tests covering the full M6-M10 chain plus CSI create/
lookup/publish through the V2 runtime path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 15:12:00 -07:00
pingqiu
5aedada53a feat: Phase 18 M3 — replicated data continuity closure
M3 milestone: write → Loop2 observe → failover → readback verify.

Continuity runtime (continuity_runtime.go):
- ExecuteReplicatedContinuity: composes mirror write + sync + Loop2
  observation + failover + readback verify into one bounded path
- ReplicatedContinuityResult: captures pre-failover Loop2 snapshot,
  failover result, selected primary, readback length, data match

Runtime manager extensions:
- Local node registry for write/readback during continuity verification
- RegisterNode now stores node reference for local I/O access

Tests prove two paths:
- Happy: write on source → failover → promoted node reads correct data
- Gated: degraded peer → failover gate stops → continuity reports failure

Phase 18 docs: M3 delivered, M4 next.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 14:10:05 -07:00
pingqiu
cae07c0bf1 feat: Phase 18 M2 — active Loop 2 replication runtime
M2 milestone: bounded summary-driven active Loop 2 runtime.

Loop 2 runtime session (loop2_runtime.go):
- Loop2RuntimeSession: primary-led active observation of replica set
- ObserveOnce: collects replica summaries via transport seam, evaluates
  runtime mode (keepup / catching_up / degraded / needs_rebuild)
- Fail-closed severity escalation: mode only degrades, never reverts
- Detection: epoch mismatch, barrier failure, peer behind primary,
  recovery in progress, needs_rebuild sticky

Runtime manager integration:
- NewLoop2RuntimeSession, ObserveLoop2, LastLoop2Snapshot, Loop2Snapshot
- Runtime manager now retains active Loop 2 snapshots alongside failover

Tests prove three paths:
- healthy replica set → keepup
- peer behind → catching_up
- peer needs_rebuild → needs_rebuild (fail-closed)

Phase 18 docs updated: M2 delivered, M3 next.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 14:02:13 -07:00
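Note on cae07c0bf1: the "mode only degrades, never reverts" rule can be illustrated with a small monotonic escalation helper. This is a sketch under assumptions, not the code in loop2_runtime.go. The Mode type, the escalate function, and the severity order (taken from the order in which the commit lists the modes) are hypothetical.

```go
package main

import "fmt"

// Mode is a hypothetical severity-ordered runtime mode.
// Assumption: severity increases in the order the commit lists the modes.
type Mode int

const (
	ModeKeepup Mode = iota
	ModeCatchingUp
	ModeDegraded
	ModeNeedsRebuild
)

func (m Mode) String() string {
	switch m {
	case ModeKeepup:
		return "keepup"
	case ModeCatchingUp:
		return "catching_up"
	case ModeDegraded:
		return "degraded"
	case ModeNeedsRebuild:
		return "needs_rebuild"
	}
	return "unknown"
}

// escalate returns the more severe of the current and observed modes,
// so a later healthy observation cannot lower the recorded mode.
func escalate(current, observed Mode) Mode {
	if observed > current {
		return observed
	}
	return current
}

func main() {
	mode := ModeKeepup
	for _, obs := range []Mode{ModeCatchingUp, ModeNeedsRebuild, ModeKeepup} {
		mode = escalate(mode, obs)
		fmt.Println(mode)
	}
}
```

Under these assumptions the final keepup observation leaves the mode at needs_rebuild, which matches the sticky behavior the commit describes.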
pingqiu
b82df09856 feat: Phase 18 M1 — transport-backed RF2 failover runtime
M1 milestone: failover evidence crosses transport/session seam.

Adapter seam (failover_adapter.go):
- FailoverEvidenceAdapter: query-side (promotion evidence + replica summary)
- FailoverTakeoverAdapter: execution-side (prepare + gate)
- FailoverTarget: binds NodeID + both adapters
- NewInProcessFailoverTarget: factory for in-process case

Transport seam (failover_evidence_transport.go):
- FailoverEvidenceTransport: request/response interface with nodeID routing
- FailoverEvidenceHandler: server-side registration
- InMemoryFailoverEvidenceTransport: first transport impl (in-memory)
- NewHybridInProcessFailoverTarget: transport-backed evidence + local takeover

Runtime manager (runtime_manager.go):
- InProcessRuntimeManager: participant registry + ExecuteFailover entry point
- Persisted failover snapshots/results per-volume and global-last

All failover paths (session/driver/manager) now go through adapter seam.
Old FailoverParticipant preserved as compatibility wrapper only.

Phase 18 docs: phase-18.md (M1-M5 structure), log, decisions.
Design docs updated: kernel-closure-review, claim-and-evidence ledger.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 13:57:08 -07:00
pingqiu
b8c6944e3f feat: V2 MVP milestone — masterv2 + volumev2 + in-process failover
V2 runtime packages:
- sw-block/runtime/masterv2: identity authority (desired state,
  heartbeat handling, promotion arbitration via SelectPromotionCandidate)
- sw-block/runtime/volumev2: per-volume micro-cluster shell (node,
  orchestrator, control session, iSCSI frontend, takeover gate,
  failover session + driver, replica summary reconstruction)
- sw-block/runtime/purev2: RF1 execution shell (engine + store +
  dispatcher + local boundary observations)
- sw-block/runtime/protocolv2: three-channel separation
  (heartbeat/assignment/query + replica summary)

V2 binaries:
- sw-block/cmd/v2singleblock: single-node RF1 block server
- sw-block/cmd/purev2rf1: minimal RF1 runtime binary

Milestone capabilities:
- RF1 write/read/sync with engine-driven mode projection
- masterv2 ↔ volumev2 heartbeat convergence + assignment reissue
- Promotion query with fresh CommittedLSN/WALHeadLSN evidence
- Replica summary for bounded takeover reconstruction
- Primary-loss reconstruction from peer summaries (fail-closed gate)
- In-process failover driver with session observability
- Local boundary observations feed engine (Committed/Durable/Checkpoint)

Design docs:
- v2-two-loop-protocol.md: identity vs data-control separation
- v2-automata-ownership-map.md: event/command ownership split
- v2-loop1-surface-draft.md: heartbeat/query/assignment field spec
- v2-volumev2-single-node-mvp.md: target layering
- v2-kernel-closure-review.md: per-volume micro-cluster principle
- v2-pure-runtime-rf1-bootstrap.md, v2-capability-map.md,
  v2-proof-and-retest-pyramid.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 13:08:02 -07:00
pingqiu
cf16e53b04 feat: Phase 16M/17 + promote fixes + testrunner updates
Phase 16M: explicit replica readiness on heartbeat seam
- master.proto: optional bool replica_ready = 19 (proto regenerated on M01)
- block_heartbeat_proto.go: write/read ReplicaReady with presence semantics
- master_block_registry.go: replicaReadyObservedFromHeartbeat prefers
  explicit proto field, falls back to address heuristic when absent
- volume_server_block.go: heartbeat emits ReplicaReady from core projection

Phase 17: host effects extraction + stop line
- phase-17-log.md: Batch 10/11 delivery notes

Promote fixes:
- master_block_failover.go: deterministic replica addrs from path hash
- qa_promote_replication_test.go: address-upgrade trigger test
- qa_promote_rejoin_live_test.go: new live rejoin test

Testrunner:
- devops.go: action improvements
- recovery-baseline-failover.yaml, suite-ha-failover.yaml: scenario updates
- cp11b3-manual-promote.yaml: promote scenario alignment
- fresh_volume_write_test.go: new component test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 11:38:05 -07:00
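Note on cf16e53b04: the presence semantics for replica_ready (prefer the explicit field, fall back to the address heuristic when it is absent) might look roughly like the sketch below. The heartbeat struct, its field names, and replicaReadyObserved are simplified, hypothetical stand-ins for the real proto message and for replicaReadyObservedFromHeartbeat.

```go
package main

import "fmt"

// heartbeat is a hypothetical, trimmed-down view of the block heartbeat.
// A nil ReplicaReady models an older sender that does not set the field.
type heartbeat struct {
	ReplicaReady    *bool
	ReplicaDataAddr string
}

// replicaReadyObserved prefers the explicit field when present and otherwise
// falls back to treating a non-empty replica address as "ready".
func replicaReadyObserved(hb heartbeat) bool {
	if hb.ReplicaReady != nil {
		return *hb.ReplicaReady
	}
	return hb.ReplicaDataAddr != ""
}

func boolPtr(b bool) *bool { return &b }

func main() {
	// Explicit false wins even though an address is present.
	fmt.Println(replicaReadyObserved(heartbeat{ReplicaReady: boolPtr(false), ReplicaDataAddr: "10.0.0.2:4000"}))
	// Field absent: the address heuristic applies.
	fmt.Println(replicaReadyObserved(heartbeat{ReplicaDataAddr: "10.0.0.2:4000"}))
}
```

A pointer is used here only to distinguish "field not sent" from "field sent as false"; generated proto code may expose presence differently.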
pingqiu
7855d5240c docs: add bounded productionization pilot artifacts
Freeze the first bounded pilot/preflight/stop/rollout-review artifact set and sync the global product ledgers so productionization can start from an explicit chosen-envelope discipline instead of ad hoc rollout judgment.

Made-with: Cursor
2026-04-04 19:01:56 -07:00
pingqiu
4f95a1e868 docs: package phase 17 product claim checkpoint
Freeze the first Phase 17 branch/contract/policy/envelope package, add review and supported-matrix artifacts, and sync the product-completion and claim-evidence ledgers to the new bounded post-Phase-16 checkpoint.

Made-with: Cursor
2026-04-04 18:21:16 -07:00
pingqiu
0f72c8d062 refactor: close bounded phase 16 restart truth seams
Bind non-authoritative inventory, restart primary-truth rebasing, and sparse replica readiness retention into the heartbeat/master seam, and package the bounded finish-line checkpoint with explicit claims, non-claims, and proof commands.

Made-with: Cursor
2026-04-04 16:13:06 -07:00
pingqiu
10833c8b68 refactor: preserve bounded volume mode reason heartbeat truth
Carry explicit volume_mode_reason across the heartbeat/master/API seam so outward surfaces retain the bounded core-owned explanation behind mode transitions.

Made-with: Cursor
2026-04-04 14:21:31 -07:00
pingqiu
f20ec2ef79 test: align collector readiness check with replica eligibility
Use ReplicaEligible instead of PublishHealthy in the heartbeat collector test now that publish health is rebound to publication truth rather than receiver readiness.

Made-with: Cursor
2026-04-04 14:03:21 -07:00
pingqiu
6cad5bb8e1 refactor: rebind bounded volume mode heartbeat truth
Make the heartbeat/master boundary preserve explicit volume_mode truth so the master's heartbeat-consume path no longer reconstructs the outward mode only from secondary heartbeat signals. Keep backward compatibility by falling back to the previous reconstruction when older heartbeats do not send the field.

Made-with: Cursor
2026-04-04 13:56:41 -07:00
pingqiu
6794f79df9 refactor: preserve bounded publish healthy heartbeat truth
Make the heartbeat/master boundary preserve explicit publish_healthy truth so the master's heartbeat-consume path no longer reconstructs healthy publication only from secondary readiness and degraded heuristics. Keep backward compatibility by falling back to the previous reconstruction when older heartbeats do not send the field.

Made-with: Cursor
2026-04-04 13:43:19 -07:00
pingqiu
eb610deb92 refactor: preserve bounded needs_rebuild heartbeat truth
Make the heartbeat/master boundary preserve explicit needs_rebuild truth so the master's consume path for primary heartbeats no longer collapses that stronger mode into a generic degraded signal. Keep backward compatibility by falling back to the previous heuristic when older heartbeats do not send the field.

Made-with: Cursor
2026-04-04 13:11:42 -07:00
pingqiu
69b41a7f16 refactor: rebind bounded replica-ready heartbeat truth
Make the heartbeat/master boundary carry explicit replica readiness truth so the registry no longer depends only on replica transport-address presence as a readiness proxy. Keep backward compatibility by falling back to the old address heuristic when older heartbeats do not send the field.

Made-with: Cursor
2026-04-04 12:06:53 -07:00
pingqiu
43dbebfa04 refactor: close bounded recovery drain and invalidation seams
Move removed-replica drain and replica-scoped invalidation onto explicit core-command paths so the widened multi-replica runtime no longer depends on coarse host-side recovery handling.

Made-with: Cursor
2026-04-04 11:01:12 -07:00
pingqiu
5fd9ec0edf refactor: widen bounded multi-replica catch-up startup ownership
Emit one core-owned start_recovery_task per primary catch-up replica so the bounded multi-replica startup path no longer depends on a single-replica assumption.

Made-with: Cursor
2026-04-04 10:21:28 -07:00
pingqiu
92c006eb29 refactor: aggregate bounded multi-replica catch-up conservatively
Track catch-up observations per replica so the volume-level recovery view stays in catching_up until all bounded replicas complete. This preserves the current bounded semantics while removing an overclaim that would block later multi-replica startup ownership work.

Made-with: Cursor
2026-04-04 09:27:03 -07:00
pingqiu
16ba70f856 refactor: make bounded recovery observation events replica-scoped
Carry replica-scoped addressing through bounded recovery planning and completion events so the core no longer depends on a volume-only observation seam. This preserves the current single-replica catch-up and rebuilding behavior while aligning the observation side with the replica-scoped command path.

Made-with: Cursor
2026-04-04 09:18:07 -07:00
pingqiu
b304b8e212 refactor: make bounded recovery command addressing replica-scoped
Replace the remaining volume-scoped recovery command and pending slot
with replica-scoped addressing on the bounded core-present path. This
preserves the current single-replica catch-up and rebuilding behavior
while removing the structural blocker for later multi-replica startup
ownership.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 09:05:36 -07:00
pingqiu
1453274988 refactor: extract host effects adapter and define Phase 17 stop line
Move dispatcher-facing host effects out of volume_server_block.go into
blockcmd while keeping server-owned cache/state semantics in weed/server.
Document Batch 10 delivery and Batch 11 stop-line review so the
separation line closes without over-extracting readiness-state mutation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 08:43:21 -07:00