From fbcfe89e2461d42158cdc5e678d64e281ec7d2c8 Mon Sep 17 00:00:00 2001
From: pingqiu
Date: Sun, 26 Apr 2026 10:41:02 -0700
Subject: [PATCH] =?UTF-8?q?G5-4=20bring-up=20hand-off=20v0.2=20=E2=80=94?=
 =?UTF-8?q?=20RESOLVED=20via=20local=20debug?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Root cause for the "volume not ready" gate: missing
--expected-slots-per-volume 2 flag on blockmaster. The default is 3;
QA's 2-node topology had 2 slots; the controller silently rejected the
observation snapshot (cmd/blockmaster/main.go:39).

Fix verified locally on Windows (single-node, no m01/M02 needed):

- Add --expected-slots-per-volume 2 to the blockmaster command
- Primary reaches Healthy=true with epoch=1
- assignment-received fires; durable storage opens; the status endpoint
  serves {"Healthy":true}

Lesson learned (process improvement): for V3-internal bring-up debug,
try single-node local reproduction FIRST. The cluster bring-up gate is
V3 logic, not network topology. It reproduces in seconds locally with
full source-code access; m01/M02 is only needed for cross-node-specific
scenarios (real network conditions, iptables, multi-host wire).

Secondary finding: replica r2 sees primary r1's assignment but records
"supersede, not applying to adapter" because the T1 HealthyPathExecutor
only handles the primary case. For G5-4 replica bring-up, sw needs to
wire the T4a-T4d ReplicationVolume + ReplicaPeer + ReplicaListener
stack (not just the --t1-readiness flag). This is the actual next gap
for G5-4.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .../v3-phase-15-g5-m0102-bringup-handoff.md | 63 ++++++++++++++++++-
 1 file changed, 60 insertions(+), 3 deletions(-)

diff --git a/sw-block/design/v3-phase-15-g5-m0102-bringup-handoff.md b/sw-block/design/v3-phase-15-g5-m0102-bringup-handoff.md
index d64781a3c..0e2555612 100644
--- a/sw-block/design/v3-phase-15-g5-m0102-bringup-handoff.md
+++ b/sw-block/design/v3-phase-15-g5-m0102-bringup-handoff.md
@@ -1,13 +1,70 @@
 # G5-4 m01+M02 Cluster Bring-Up — Hand-off to sw
 
-**Date**: 2026-04-26
-**Status**: ⏸ blocked on V3-internal bring-up sequence question
-**From**: QA (round 2026-04-26 cross-node smoke attempt)
+**Date**: 2026-04-26 (v0.2 — RESOLVED via local single-node debug round)
+**Status**: ✅ RESOLVED — the root cause was a missing `--expected-slots-per-volume 2` flag on blockmaster (default is 3); also documents a secondary finding: a non-primary replica does not reach "Healthy" via the T1 path
+**From**: QA (round 2026-04-26 cross-node smoke attempt → pivoted to local Windows debug)
 **To**: sw (G5-4 framework owner)
 **Context**: G5-4 m01 hardware first-light per [g5-kickoff §3 batch G5-4](v3-phase-15-g5-kickoff.md). Skeleton script committed at `seaweed_block@eabafe8` (`scripts/iterate-m01-replicated-write.sh`).
 
 ---
 
+## §0 Resolution (added 2026-04-26 v0.2)
+
+**Root cause** (found via local Windows reproduction — m01/M02 was unnecessary for this debug):
+
+`cmd/blockmaster/main.go:39`:
+```
+fs.IntVar(&f.expectedSlotsPerVol, "expected-slots-per-volume", 3,
+	"RF/expected slot count per volume; the controller rejects observation "+
+		"snapshots whose slot count differs (default 3, set to 2 for 2-node smoke clusters)")
+```
+
+QA's 2-node topology (r1+r2) had **2 slots**, but the default `expected-slots-per-volume` is 3. The controller silently rejected the observation snapshot → no assignment minted → volumes stuck at "volume not ready".
+
+**Fix**: add `--expected-slots-per-volume 2` to the blockmaster command for 2-node test clusters.
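+
+A minimal sketch of the change (only `--expected-slots-per-volume` is confirmed from `cmd/blockmaster/main.go:39`, and 127.0.0.1:9180 comes from the master log below; `--listen` is a placeholder flag name, so match it to the real blockmaster CLI):
+
+```
+# Before: default expected-slots-per-volume=3 silently rejects the 2-slot snapshot.
+blockmaster --listen 127.0.0.1:9180
+
+# After: size the expectation to the 2-node topology.
+blockmaster --listen 127.0.0.1:9180 --expected-slots-per-volume 2
+```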
+
+**Verification (local single-node, 3rd attempt):**
+
+```
+---master log---
+{"component":"blockmaster","phase":"listening","addr":"127.0.0.1:9180"}
+---primary log---
+{"component":"blockvolume","phase":"status-listening","status_addr":"127.0.0.1:9290"}
+{"component":"blockvolume","phase":"assignment-received","volume_id":"v1","replica_id":"r1","epoch":1,"endpoint_version":1}
+2026/04/26 10:39:35 storage: recovery defensive scan (head==tail=0 checkpoint=0)
+blockvolume: durable recovered: recovered LSN=0
+---primary status---
+{"VolumeID":"v1","ReplicaID":"r1","Epoch":1,"EndpointVersion":1,"Healthy":true}
+```
+
+✅ Primary fully Healthy, assignment received, durable storage opened.
+
+**Lesson learned (added to QA process)**: for V3 bring-up debug, **try single-node local reproduction first** before ssh'ing to m01/M02. The "cluster bring-up gate" is V3 logic, not network topology — a local run reproduces the same failure mode in seconds, with full Read/Edit/grep access to the source code.
+
+---
+
+## §0a Secondary finding — non-primary replica doesn't reach "Healthy" via T1 path
+
+When the topology has 2 slots (r1 primary, r2 replica), the replica's `--t1-readiness` HealthyPathExecutor sees the assignment but logs:
+
+```
+blockvolume: volume v1 authority is now r1@1 (not this replica r2);
+    recording supersede, not applying to adapter
+```
+
+→ the replica's adapter projection never reaches `Healthy=true`, so the replica stays at `volume not ready`.
+
+This is **architecturally correct** for the T1 minimum-readiness scope (T1 handles only the primary case, per the `core/host/volume/healthy_executor.go:10-28` godoc). For G5-4's replica-side bring-up, the script must wire the actual T4a-T4d **ReplicationVolume + ReplicaPeer + ReplicaListener** stack — NOT just the `--t1-readiness` HealthyPathExecutor.
+
+This is a **G5-4 implementation question for sw**: what is the equivalent flag/setup to bring up a replica that participates in the replication path (vs T1's primary-only path)? It likely needs:
+
+- The full ReplicationVolume binding (already done via T4d-4 part B `ReplicationVolume↔adapter` wiring)
+- A different executor than `HealthyPathExecutor` — or `HealthyPathExecutor` extended to handle the secondary case
+- Possibly an `--enable-replication` flag or similar
+
+This is the **next gap to surface** for G5-4. The 2-node bring-up itself works (proven above) — the replica just doesn't reach Healthy via T1; the real T4d engine-driven path needs different wiring.
+
+---
+
 ## §1 What I'm asking sw to answer
 
 **Question**: what's the canonical V3 flow to bring a 2-node cluster from cold-start to "primary + replica both healthy"?
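+
+For concreteness, the cold-start flow QA now runs after the §0 fix, as a sketch. Apart from `--expected-slots-per-volume` and `--t1-readiness`, every flag name below (and the `/status` path) is a placeholder guess, not the real CLI:
+
+```
+# 1. Master first, sized for the 2-slot topology (flag confirmed in §0).
+blockmaster --listen 127.0.0.1:9180 --expected-slots-per-volume 2 &
+
+# 2. Primary replica r1; --master/--status-addr/--replica-id are guessed names,
+#    127.0.0.1:9290 is the status address seen in the §0 verification log.
+blockvolume --master 127.0.0.1:9180 --status-addr 127.0.0.1:9290 \
+  --replica-id r1 --t1-readiness &
+
+# 3. Second replica r2 (status port assumed); per §0a it records a supersede
+#    and stays "volume not ready" under the T1 path.
+blockvolume --master 127.0.0.1:9180 --status-addr 127.0.0.1:9291 \
+  --replica-id r2 --t1-readiness &
+
+# 4. Poll the primary's status endpoint until it serves {"Healthy":true}.
+curl -s http://127.0.0.1:9290/status
+```
+
+Per §0a, step 3 never reaches Healthy with T1 alone; the replica-side answer presumably involves the T4a-T4d stack instead of `--t1-readiness`.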