seaweedfs/sw-block/engine
pingqiu 59a36013d4 feat: rebuild hardening A1-A5 + session-controlled execution path
A1 Engine kind-routing fix:
  SessionProgressObserved/Completed/Failed now respect active session
  Kind. Rebuild progress no longer leaks into catch-up aggregate.
  sessionKindMismatch guard + observeRebuildProgress helper.
  2 regression tests lock kind isolation.
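
The kind-isolation rule in A1 can be sketched as follows. This is a minimal illustration, not the engine's actual code: only the name `sessionKindMismatch` comes from the commit message. The `replicaState` type, its fields, and `observeProgress` are hypothetical, as is the assumption that each event carries the kind of the session that produced it.

```go
package main

// SessionKind distinguishes the two recovery session types named in the commit.
type SessionKind int

const (
	KindCatchUp SessionKind = iota
	KindRebuild
)

// replicaState is a hypothetical per-replica record kept by the engine.
type replicaState struct {
	activeKind     SessionKind
	catchUpApplied uint64 // progress counted toward the catch-up aggregate
	rebuildApplied uint64 // progress counted toward the rebuild session only
}

// sessionKindMismatch reports whether an event of kind eventKind must be
// ignored because it does not belong to the replica's active session.
func sessionKindMismatch(r *replicaState, eventKind SessionKind) bool {
	return r.activeKind != eventKind
}

// observeProgress applies a progress event only to the counter that matches
// the active session kind. It returns false when the event was dropped.
func observeProgress(r *replicaState, eventKind SessionKind, lsn uint64) bool {
	if sessionKindMismatch(r, eventKind) {
		return false
	}
	switch eventKind {
	case KindRebuild:
		if lsn > r.rebuildApplied {
			r.rebuildApplied = lsn
		}
	case KindCatchUp:
		if lsn > r.catchUpApplied {
			r.catchUpApplied = lsn
		}
	}
	return true
}
```

With a rebuild session active, a catch-up progress event is dropped and the catch-up counter stays untouched, which is the behavior the two regression tests are described as locking in.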

A2 Retention pin:
  Rebuild session ack drives progress-based WAL retention floor.
  Pin installed at base_lsn on accepted, advances with wal_applied_lsn,
  released on completed/failed/cancelled. rebuildProgressPinFloor
  returns min across all active replicas.
  Retention pin test: 100 blocks fill WAL, 5 flusher cycles with
  20 pinned rebuild entries — all verified correct.

A3 Progress ack emission:
  Automatic sessionAck(running/base_complete/completed/failed) emitted
  from rebuild session lifecycle transitions. sessionAckLocked builds
  ack under session lock. emitRebuildSessionAck callback wired through
  SetOnRebuildSessionAck on BlockVol.
  ObserveReplicaRebuildSessionAck maps acks to core engine events.
  WireLocalReplicaRebuildSessionAcks bridges local callback to server.
  5 server tests proving ack→core, pin advance, pin cleanup.
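
As a rough illustration of A3, the sketch below builds the ack while the session lock is held and invokes the callback after the lock is released. The phase names follow the commit message; the `rebuildSession` struct, its fields, and the `transition` method are assumptions made for this example, and invoking the callback outside the lock is a design choice of the sketch rather than something the commit states.

```go
package main

import "sync"

type ackPhase string

const (
	phaseRunning      ackPhase = "running"
	phaseBaseComplete ackPhase = "base_complete"
	phaseCompleted    ackPhase = "completed"
	phaseFailed       ackPhase = "failed"
)

// sessionAck is a hypothetical snapshot sent to the primary.
type sessionAck struct {
	SessionID     uint64
	Phase         ackPhase
	WALAppliedLSN uint64
}

// rebuildSession is a hypothetical stand-in for the real session object.
type rebuildSession struct {
	mu            sync.Mutex
	id            uint64
	phase         ackPhase
	walAppliedLSN uint64
	onAck         func(sessionAck) // set once before the session starts
}

// ackLocked builds an ack from current state. Caller must hold s.mu.
func (s *rebuildSession) ackLocked() sessionAck {
	return sessionAck{SessionID: s.id, Phase: s.phase, WALAppliedLSN: s.walAppliedLSN}
}

// transition records a lifecycle change and emits one ack for it.
// The callback runs after the lock is dropped so it may call back
// into the session without deadlocking.
func (s *rebuildSession) transition(phase ackPhase, lsn uint64) {
	s.mu.Lock()
	s.phase = phase
	if lsn > s.walAppliedLSN {
		s.walAppliedLSN = lsn
	}
	ack := s.ackLocked()
	cb := s.onAck
	s.mu.Unlock()

	if cb != nil {
		cb(ack)
	}
}
```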

A4 Deadline/timeout:
  rebuildAckWatch watchdog: armed on accepted/running/base_complete,
  refreshed on each ack, cleared on completed/failed. Timeout
  cancels local session + clears pin + fail-closes.
  2 tests: timeout→fail-close, progress→refresh.

A5 Session-controlled execution path:
  v2bridge.Executor.TransferFullBase now uses session-controlled loop:
  beginControlledFullBase → real sessionControl over TCP →
  transferExtentToSession via RebuildTransportClient →
  PrepareFullBaseRebuild → TryCompleteRebuildSession.
  ReplicaReceiver control channel handles MsgSessionControl alongside
  MsgBarrierReq. Session acks written back on same TCP connection.
  RebuildSessionBase request type separates new per-block stream from
  legacy raw extent stream. Full-base cleanup deferred until success.
  Deadlock fix: ApplyBaseBlock releases session lock before ioMu.
  Hydration skip for full-base sessions.
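
The deadlock fix noted in A5 amounts to never holding the session lock while acquiring `ioMu`. The sketch below shows that pattern in isolation. Only the names `ApplyBaseBlock` and `ioMu` appear in the commit; the `vol` and `session` types, the in-memory block map, and the error value are assumptions made so the example is self-contained.

```go
package main

import (
	"errors"
	"sync"
)

var errSessionInactive = errors.New("rebuild session not active")

// vol is a hypothetical volume guarded by an I/O mutex.
type vol struct {
	ioMu   sync.Mutex
	blocks map[uint64][]byte
}

// session is a hypothetical rebuild session with its own lock.
type session struct {
	mu      sync.Mutex
	active  bool
	applied uint64
}

// applyBaseBlock checks session state under s.mu, releases s.mu, and only
// then takes v.ioMu. The two locks are never held together, so a path that
// holds ioMu and then needs the session lock cannot deadlock against it.
func applyBaseBlock(s *session, v *vol, lba uint64, data []byte) error {
	s.mu.Lock()
	active := s.active
	s.mu.Unlock() // released before ioMu is acquired

	if !active {
		return errSessionInactive
	}

	v.ioMu.Lock()
	v.blocks[lba] = append([]byte(nil), data...) // copy so the caller may reuse data
	v.ioMu.Unlock()

	s.mu.Lock()
	s.applied++
	s.mu.Unlock()
	return nil
}
```

The trade-off of this pattern is that the session may be cancelled between the state check and the write, so a real implementation would need to tolerate or re-validate a late block.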

23 rebuild component tests (all pass):
  11 kernel correctness, 8 transport/runtime, 3 scenario-scale,
  including 1GB primary-initiated with CRC validation.

29 files changed, ~2500 insertions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 14:39:11 -07:00