After rejoin, the shipper is configured but no I/O triggers Ship(),
so the shipper stays Disconnected and the core stays at
awaiting_shipper_connected indefinitely.
Fix: observePrimaryShipperConnectivity now calls TryReconnectShippers
when ShipperConfigured=true but ShipperConnected=false. This triggers
the full reconnect protocol (dial + handshake + bounded catch-up)
proactively, bringing the replica current without waiting for I/O.
Option B approach: uses the same reconnect path as Barrier() — not a
fake write or bare dial probe. CatchUpTo(headLSN) replays any retained
WAL entries, bringing the replica fully current.
New methods:
- WALShipper.TryReconnect(): full reconnect without foreground I/O
- ShipperGroup.TryReconnectAll(): probes all disconnected shippers
- BlockVol.TryReconnectShippers(): volume-level entry point
Also fix pre-existing test expectation: engine now emits
start_recovery_task on primary assignment with replicas.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>