diff --git a/sw-block/design/protocol-development-process.md b/sw-block/design/protocol-development-process.md deleted file mode 100644 index e4480d22a..000000000 --- a/sw-block/design/protocol-development-process.md +++ /dev/null @@ -1,288 +0,0 @@ -# Protocol Development Process - -Date: 2026-03-27 - -## Purpose - -This document defines how `sw-block` protocol work should be developed. - -The process is meant to work for: - -- V2 -- future V3 -- or a later block algorithm that is not WAL-based - -The point is to make protocol work systematic rather than reactive. - -## Core Philosophy - -### 1. Design before implementation - -Do not start with production code and hope the protocol becomes clear later. - -Start with: - -1. system contract -2. invariants -3. state model -4. scenario backlog - -Only then move to implementation. - -### 2. Real failures are inputs, not just bugs - -When V1 or V1.5 fails in real testing, treat that as: - -- a design requirement -- a scenario source -- a simulator input - -Do not patch and forget. - -### 3. Simulator is part of the protocol, not a side tool - -The simulator exists to answer: - -- what should happen -- what must never happen -- which old designs fail -- why the new design is better - -It is not a replacement for real testing. -It is the design-validation layer before production implementation. - -### 4. Passing tests are not enough - -Green tests are necessary, not sufficient. - -We also require: - -- explicit invariants -- explicit scenario intent -- clear state transitions -- review of assumptions and abstraction boundaries - -### 5. Keep hot-path and recovery-path reasoning separate - -Healthy steady-state behavior and degraded recovery behavior are different problems. - -Both must be designed explicitly. - -## Development Ladder - -Every major protocol feature should move through these steps: - -1. **Problem statement** -- what real bug, limit, or product goal is driving the work - -2. 
**Contract** -- what the protocol guarantees -- what it does not guarantee - -3. **State model** -- node state -- coordinator state -- recovery state -- role / epoch / lineage rules - -4. **Scenario backlog** -- named scenarios -- source: - - real failure - - design obligation - - adversarial distributed case - -5. **Prototype / simulator** -- reduced but explicit model -- invariant checks -- V1 / V1.5 / V2 comparison where relevant - -6. **Implementation** -- production code only after the protocol shape is clear enough - -7. **Real validation** -- unit -- component -- integration -- real hardware where needed - -8. **Feedback loop** -- turn new failures back into scenario/design inputs - -## Required Artifacts - -For protocol work to be considered real progress, we usually want: - -### Design - -- design doc -- scenario doc -- comparison doc when replacing an older approach - -### Prototype - -- simulator or prototype code -- tests that assert protocol behavior - -### Implementation - -- production patch -- production tests -- docs updated to match the actual algorithm - -### Review - -- implementation gate -- design/protocol gate - -## Two-Gate Rule - -We use two acceptance gates. - -### Gate 1: implementation - -Owned by the coding side. - -Questions: - -- does it build? -- do tests pass? -- does it behave as intended in code? - -### Gate 2: protocol/design - -Owned by the design/review side. - -Questions: - -- is the logic actually sound? -- do tests prove the intended thing? -- are assumptions explicit? -- is the abstraction boundary honest? - -A task is not accepted until both gates pass. - -## Layering Rule - -Keep simulation layers separate. 
- -### `distsim` - -Use for: - -- protocol correctness -- state transitions -- fencing -- recoverability -- promotion / lineage -- reference-state checking - -### `eventsim` - -Use for: - -- timeout behavior -- timer races -- event ordering -- same-tick / delayed event interactions - -Do not duplicate scenarios blindly across both layers. - -## Test Selection Rule - -Do not choose simulator inputs only from failing tests. - -Review all relevant tests and classify them by: - -- protocol significance -- simulator value -- implementation specificity - -Good simulator candidates often come from: - -- barrier truth -- catch-up vs rebuild -- stale message rejection -- failover / promotion safety -- changed-address restart -- mode semantics - -Keep real-only tests for: - -- wire format -- OS timing -- exact WAL file behavior -- frontend transport specifics - -## Version Comparison Rule - -When designing a successor protocol: - -- keep the old version visible -- reproduce the old failure or limitation -- show the improved behavior in the new version - -For `sw-block`, that means: - -- `V1` -- `V1.5` -- `V2` - -should be compared explicitly where possible. - -## Documentation Rule - -The docs must track three different things: - -### `learn/projects/sw-block/` - -Use for: - -- project history -- V1/V1.5 algorithm records -- phase records -- real test history - -### `sw-block/design/` - -Use for: - -- active design truth -- V2 and later protocol docs -- scenario backlog -- comparison docs - -### `sw-block/.private/phase/` - -Use for: - -- active execution plan -- log -- decisions - -## What Good Progress Looks Like - -A good protocol iteration usually has this pattern: - -1. real failure or design pressure identified -2. scenario named and written down -3. simulator reproduces the bad case -4. new protocol handles it explicitly -5. implementation follows -6. real tests validate it - -If one of those steps is missing, confidence is weaker. 
- -## Bottom Line - -The process is: - -1. design the contract -2. model the state -3. define the scenarios -4. simulate the protocol -5. implement carefully -6. validate in real tests -7. feed failures back into design - -That is the process we should keep using for V2 and any later protocol line. diff --git a/sw-block/design/v1-v15-v2-comparison.md b/sw-block/design/v1-v15-v2-comparison.md deleted file mode 100644 index 4df8dc0d6..000000000 --- a/sw-block/design/v1-v15-v2-comparison.md +++ /dev/null @@ -1,314 +0,0 @@ -# V1, V1.5, and V2 Comparison - -Date: 2026-03-27 - -## Purpose - -This document compares: - -- `V1`: original replicated WAL shipping model -- `V1.5`: Phase 13 catch-up-first improvements on top of V1 -- `V2`: explicit FSM / orchestrator / recoverability-driven design under `sw-block/` - -It is a design comparison, not a marketing document. - -## 1. One-line summary - -- `V1` is simple but weak on short-gap recovery. -- `V1.5` materially improves recovery, but still relies on assumptions and incremental control-plane fixes. -- `V2` is structurally cleaner, more explicit, and easier to validate, but is not yet a production engine. - -## 2. Steady-State Hot Path - -In the healthy case, all three versions can look similar: - -1. primary appends ordered WAL -2. primary ships entries to replicas -3. replicas apply in order -4. 
durability barrier determines when client-visible commit completes - -### V1 - -- simplest replication path -- lagging replica typically degrades quickly -- little explicit recovery structure - -### V1.5 - -- same basic hot path as V1 -- WAL retention and reconnect/catch-up improve short outage handling -- extra logic exists, but much of it is off the hot path - -### V2 - -- can keep a similar hot path if implemented carefully -- extra complexity is mainly in: - - recovery planner - - replica state machine - - coordinator/orchestrator - - recoverability checks - -### Performance expectation - -In a normal healthy cluster: - -- `V2` should not be much heavier than `V1.5` -- most V2 complexity sits in failure/recovery/control paths -- there is no proof yet that V2 has better steady-state throughput or latency - -## 3. Recovery Behavior - -### V1 - -Recovery is weakly structured: - -- lagging replica tends to degrade -- short outage often becomes rebuild or long degraded state -- little explicit catch-up boundary - -### V1.5 - -Recovery is improved: - -- short outage can recover by retained-WAL catch-up -- background reconnect closes the `sync_all` dead-loop -- catch-up-first is preferred before rebuild - -But the model is still partly implicit: - -- reconnect depends on endpoint stability unless control plane refreshes assignment -- recoverability boundary is not as explicit as V2 -- tail-chasing and retention pressure still need policy care - -### V2 - -Recovery is explicit by design: - -- `InSync` -- `Lagging` -- `CatchingUp` -- `NeedsRebuild` -- `Rebuilding` - -And explicit decisions exist for: - -- catch-up vs rebuild -- stale-epoch rejection -- promotion candidate choice -- recoverable vs unrecoverable gap - -## 4. Real V1.5 Lessons - -The main V2 requirements come from real V1.5 behavior. 
- -### 4.1 Changed-address restart - -Observed in `CP13-8 T4b`: - -- replica restarted -- endpoint changed -- primary shipper held stale address -- direct reconnect could not succeed until control plane refreshed assignment - -V1.5 fix: - -- saved address used only as hint -- heartbeat-reported address becomes source of truth -- master refreshes primary assignment - -Lesson for V2: - -- endpoint is not identity -- reassignment must be explicit - -### 4.2 Reconnect race - -Observed in Phase 13 review: - -- barrier path and background reconnect path could both trigger reconnect - -V1.5 fix: - -- `reconnectMu` serializes reconnect / catch-up - -Lesson for V2: - -- one active recovery session per replica should be a protocol rule, not just a local mutex trick - -### 4.3 Tail-chasing - -Even with retained WAL: - -- primary may write faster than a lagging replica can recover -- catch-up may not converge - -Lesson for V2: - -- explicit abort / `NeedsRebuild` -- do not pretend catch-up will always work - -### 4.4 Control-plane recovery latency - -V1.5 can be correct but still operationally slow if recovery waits on slower management cycles. - -Lesson for V2: - -- keep authority in coordinator -- but make recovery decisions explicit and fast when possible - -## 5. V2 Structural Improvements - -V2 is better primarily because it is easier to reason about and validate. - -### 5.1 Better state model - -Instead of implicit recovery behavior, V2 has: - -- per-replica FSM -- volume/orchestrator model -- distributed simulator with scenario coverage - -### 5.2 Better validation - -V2 has: - -- named scenario backlog -- protocol-state assertions -- randomized simulation -- V1/V1.5/V2 comparison tests - -This is a major difference from V1/V1.5, where many fixes were discovered through implementation and hardware testing first. 
- -### 5.3 Better correctness boundaries - -V2 makes these explicit: - -- recoverable gap vs rebuild -- stale traffic rejection -- promotion lineage safety -- reservation or payload availability transitions - -## 6. Stability Comparison - -### Current judgment - -- `V1`: least stable under failure/recovery stress -- `V1.5`: meaningfully better and now functionally validated on real tests -- `V2`: best protocol structure and best simulator confidence - -### Important limit - -`V2` is not yet proven more stable in production because: - -- it is not a production engine yet -- confidence comes from simulator/design work, not real block workload deployment - -So the accurate statement is: - -- `V2` is more stable **architecturally** -- `V1.5` is more stable **operationally today** because it is implemented and tested on real hardware - -## 7. Performance Comparison - -### What is likely true - -`V2` should perform better than rebuild-heavy recovery approaches when: - -- outage is short -- gap is recoverable -- catch-up avoids full rebuild - -It should also behave better under: - -- flapping replicas -- stale delayed messages -- mixed-state replica sets - -### What is not yet proven - -We do not yet know whether `V2` has: - -- better steady-state throughput -- lower p99 latency -- lower CPU overhead -- lower memory overhead - -than `V1.5` - -That requires real implementation and benchmarking. - -## 8. Smart WAL Fit - -### Why Smart WAL is awkward in V1/V1.5 - -V1/V1.5 do not naturally model: - -- payload classes -- recoverability reservations -- historical payload resolution -- explicit recoverable/unrecoverable transition - -So Smart WAL would be harder to add cleanly there. 
- -### Why Smart WAL fits V2 better - -V2 already has the right conceptual slots: - -- `RecoveryClass` - - `WALInline` - - `ExtentReferenced` -- recoverability planner -- catch-up vs rebuild decision point -- simulator for payload-availability transitions - -### Important rule - -Smart WAL must not mean: - -- “read current extent for old LSN” - -That is incorrect. - -Historical correctness requires: - -- WAL inline payload -- or pinned snapshot/versioned extent state -- not current live extent contents - -## 9. What Is Proven Today - -### Proven - -- `V1.5` significantly improves V1 recovery behavior -- real `CP13-8` testing validated the V1.5 data path and `sync_all` behavior -- the V2 simulator covers: - - stale traffic rejection - - tail-chasing - - flapping replicas - - multi-promotion lineage - - changed-address restart comparison - - same-address transient outage comparison - - Smart WAL availability transitions - -### Not yet proven - -- V2 production implementation quality -- V2 steady-state performance advantage -- V2 real hardware recovery performance - -## 10. Bottom Line - -If choosing based on current evidence: - -- use `V1.5` as the production line today -- use `V2` as the better long-term architecture - -If choosing based on protocol quality: - -- `V2` is clearly better structured -- `V1.5` is still more ad hoc, even after successful fixes - -If choosing based on current real-world proof: - -- `V1.5` has the stronger operational evidence today -- `V2` has the stronger design and simulation evidence today diff --git a/sw-block/design/v2-assignment-translation-unification.md b/sw-block/design/v2-assignment-translation-unification.md deleted file mode 100644 index dc7619598..000000000 --- a/sw-block/design/v2-assignment-translation-unification.md +++ /dev/null @@ -1,108 +0,0 @@ -# V2 Assignment Translation Unification - -Date: 2026-04-04 -Status: active - -## Purpose - -This note defines how assignment translation should be unified so that: - -1. 
`weed/storage/blockvol/v2bridge/control.go` -2. `weed/server/volume_server_block.go` - -do not drift on identity, role, or recovery-target meaning. - -## Current Drift Risk - -Today there are two live translation sites: - -1. `ControlBridge.ConvertAssignment()` produces `engine.AssignmentIntent` -2. `BlockService.coreAssignmentEvent()` produces `engine.AssignmentDelivered` - -They do not translate the same source type, but they do share semantic rules: - -1. how to build stable `ReplicaID` -2. how to map role-shaped inputs to recovery target -3. how to represent one local replica endpoint in engine types - -If those rules stay duplicated, later migration batches will reintroduce split -truth. - -## Canonical Rule Placement - -The canonical reusable rules belong in: - -1. `sw-block/bridge/blockvol` - -Why: - -1. the rules are semantic translation, not product integration -2. both `weed/storage/blockvol/v2bridge` and `weed/server` can import this - package -3. `sw-block` remains weed-free - -## Rules That Must Be Canonical - -### 1. Stable identity - -Canonical helper: - -1. `MakeReplicaID()` -2. `ReplicaAssignmentForServer()` - -Rule: - -1. `ReplicaID = /` -2. never derive identity from transport address - -### 2. Recovery-target mapping - -Canonical helper: - -1. `RecoveryTargetForRole()` - -Rule: - -1. `replica -> catchup` -2. `rebuilding -> rebuild` -3. all other roles -> no recovery target - -### 3. Endpoint packaging - -Canonical helper: - -1. `ReplicaAssignmentForServer()` - -Rule: - -1. adapter code may still source endpoint fields from different wire/runtime - inputs -2. but once packaged into `engine.ReplicaAssignment`, the shape must be uniform - -## What Still Stays Local - -These parts remain adapter-local and should NOT be forced into one helper yet: - -1. reading `blockvol.BlockVolumeAssignment` -2. deciding whether the local VS is primary/replica/rebuilding in a given - runtime context -3. 
multi-replica traversal over heartbeat/master wire structures - -Those are source-format adaptation concerns, not canonical translation rules. - -## Implemented First Step - -The first unification step is already applied: - -1. `sw-block/bridge/blockvol/control_adapter.go` - - now exports the canonical helpers -2. `weed/server/volume_server_block.go` - - now consumes the same helper layer for local assignment rebinding - -## Next Step - -The next step after this document is: - -1. reduce `weed/storage/blockvol/v2bridge/control.go` to source-format - extraction only -2. keep all shared semantic mapping rules in `sw-block/bridge/blockvol` diff --git a/sw-block/design/v2-bounded-internal-pilot-pack.md b/sw-block/design/v2-bounded-internal-pilot-pack.md deleted file mode 100644 index 4cfd8e7d9..000000000 --- a/sw-block/design/v2-bounded-internal-pilot-pack.md +++ /dev/null @@ -1,147 +0,0 @@ -# V2 Bounded Internal Pilot Pack - -Date: 2026-04-05 -Status: draft -Purpose: define the bounded internal engineering validation pack around the -current `Phase 18` RF2 runtime-bearing envelope without silently broadening scope - -## Reading Rule - -This pilot pack is a bounded validation package for the current RF2 -runtime-bearing envelope. - -It does NOT mean: - -1. broad launch approval -2. generic production readiness -3. proof that the current runtime path is already a working block product -4. permission to redefine exclusions through pilot success - -It means only: - -1. the team may run limited internal engineering validation inside the accepted - runtime-bearing envelope -2. validation outcomes must be read against the delivered `Phase 18` boundary -3. incidents must be routed explicitly instead of becoming vague rollout lore - -## Pilot Scope - -This pack is limited to the current `Phase 18` runtime-bearing envelope: - -1. 
kernel/runtime path: - - `masterv2` identity authority - - `volumev2` runtime-owned failover / Loop 2 / continuity / RF2 surface path - - `purev2` execution adapter reuse -2. validation shape: - - bounded in-process runtime exercises - - artifact-driven review only -3. supported proof shape: - - failover-time evidence seam - - active Loop 2 observation - - continuity handoff statement - - compressed RF2 outward surface -4. excluded surface classes: - - real product frontends - - broad operator APIs - - real transport-backed product traffic - -Anything outside that scope is not a finding for this pack. -It is either a known exclusion, an explicit blocker, or later widening work. - -## Pilot Environment And Topology - -The validation environment must stay fixed and reviewable: - -1. use one explicit build/commit package for all pilot nodes -2. keep topology inside the bounded `RF=2` runtime-bearing path and do not - introduce `RF>2` -3. do not introduce real frontend/product traffic or broad transport claims -4. pin operator-facing configuration and startup procedure in a written runbook -5. expose the runtime diagnosis surfaces needed to read: - - failover snapshot/result - - Loop 2 snapshot - - continuity snapshot - - RF2 outward surface - -If the validation needs ad hoc operator judgment to stay healthy, the pack is not -ready. - -## Success Criteria - -The bounded validation is considered successful only if ALL of the following -hold: - -1. no observed behavior contradicts the bounded `Phase 18` runtime envelope -2. no observed behavior contradicts the bounded fail-closed reading of the new - runtime path -3. incidents can be classified using the explicit buckets in this pack without - inventing new ambiguous categories -4. operators can execute preflight, bounded validation, and diagnosis from - written artifacts rather than tribal knowledge -5. findings do not require silently widening the supported envelope -6. 
the review outcome remains consistent with the current `block expansion / - not pilot-ready` judgment unless new closure explicitly changes it - -Validation success validates the current bounded envelope only. -It does not create a broader product claim by itself. - -## Incident Intake And Classification - -Every incident must record: - -1. time, node set, workload, and surface involved -2. observed symptom -3. affected bounded claim, exclusion, or blocker -4. diagnosis evidence used -5. immediate operator action taken -6. final classification - -Allowed classification buckets: - -1. `config / environment issue` - - the product behaved inside the bounded claim, but the deployment violated the - pilot preflight or environment assumptions -2. `known exclusion` - - the incident came from a surface or claim already excluded from the first - launch matrix -3. `true product bug` - - the incident contradicts an accepted bounded claim or reveals a real gap - inside the named chosen envelope - -If an incident does not fit one of those buckets, stop the validation and refine -the -artifact set before continuing. - -## Decision Outputs - -At the end of a bounded validation window, the allowed outcomes are: - -1. `stay in bounded validation` - - more evidence is needed inside the same envelope -2. `widen bounded engineering exposure` - - the review may expand only internal engineering validation inside the same - envelope -3. `block expansion` - - a contradiction, repeated unresolved bug, or operational ambiguity prevents - widening - -These outcomes require the bounded envelope review artifact. -This pack does not replace that review. - -## Explicit Non-Claims - -This pack does NOT claim: - -1. generic production proof from limited validation success -2. support for `RF>2` -3. support for a broad transport/frontend matrix -4. broad automatic failover guarantees -5. hours/days soak proof outside the bounded runtime-bearing reading - -## Primary Inputs - -1. 
`sw-block/design/v2-rf2-runtime-bounded-envelope.md` -2. `sw-block/design/v2-rf2-runtime-bounded-envelope-review.md` -3. `sw-block/.private/phase/phase-18.md` -4. `sw-block/design/v2-protocol-claim-and-evidence.md` -5. `sw-block/design/v2-product-completion-overview.md` diff --git a/sw-block/design/v2-capability-map.md b/sw-block/design/v2-capability-map.md index ba935d7a1..07b954a27 100644 --- a/sw-block/design/v2-capability-map.md +++ b/sw-block/design/v2-capability-map.md @@ -405,6 +405,31 @@ Primary proof tiers: | 7 | Product surfaces | CSI and frontend projection of V2 truth | integrated | | 8 | Launch envelope | bounded support and rollout claims | integrated + soak | +## Matrix Linkage + +Use the three active documents in a fixed order: + +1. protocol docs define the rule +2. this capability map defines which product tier owns the rule +3. `v2-validation-matrix.md` defines what must be proven for closure +4. `v2-integration-matrix.md` defines which real scenarios exercise the path + +The goal is to make the chain explicit: + +`protocol -> capability tier -> validation rows -> integration rows` + +| Tier | Primary protocol refs | Validation rows | Integration rows | Practical meaning | +|------|------------------------|-----------------|------------------|-------------------| +| 0 | `v2-protocol-truths.md`, `v2-sync-recovery-protocol.md` | `V4`, `V5`, `V14` | feeds `I-V1` through `I-V6` | pure semantic truth and fail-closed rules | +| 1 | `v2-protocol-truths.md` | `V1` | `I-V1` | single-volume and bootstrap correctness | +| 2 | `v2-sync-recovery-protocol.md` | `V1`, `V2`, `V4` | `I-V1`, `I-V2` | RF=2 replication base and barrier/publication closure | +| 3 | `v2-sync-recovery-protocol.md`, `v2-rebuild-mvp-session-protocol.md` | `R1`-`R12`, `V3`, `V6`, `V7`, `V8`, `V11` | `I-R1`-`I-R8`, `I-V3`, `I-V4`, `I-V5` | recovery, rebuild, failover, and rejoin | +| 4 | `v2-sync-recovery-protocol.md` | `V9`, `V10` | future `RF>=3` integrated rows | aggregate multi-replica 
projection and durability semantics | +| 5 | `v2-rebuild-mvp-session-protocol.md`, snapshot/restore execution docs | `S1`-`S10` | `I-S1`-`I-S4` | snapshot, restore, and lifecycle operations | +| 6 | `v2-automata-ownership-map.md`, `v2-protocol-claim-and-evidence.md` | `V8`, `V12`, `V13` | `I-V4`, `I-V6` | control-plane truth, observability, and operator surfaces | +| 7 | product-surface and rollout docs | `V1`, `V2`, `V12`, `V13` | runner scenarios and product e2e packs | CSI/frontend projection of V2 truth | +| 8 | rollout/support docs | stage-gate summaries in validation matrix | chaos/perf rows `I-C1`-`I-C4`, `I-P1`-`I-P3` | bounded launch envelope and operational confidence | + ## Test Expansion Strategy From This Map This map should drive testing in a faster order than "one expensive scenario at a time." diff --git a/sw-block/design/v2-controlled-rollout-review.md b/sw-block/design/v2-controlled-rollout-review.md deleted file mode 100644 index 90fa01139..000000000 --- a/sw-block/design/v2-controlled-rollout-review.md +++ /dev/null @@ -1,143 +0,0 @@ -# V2 Controlled Rollout Review - -Date: 2026-04-05 -Status: draft -Purpose: define the bounded review used to decide whether internal engineering -validation on the current `Phase 18` RF2 runtime envelope stays limited, widens -inside the same envelope, or blocks expansion - -## Reading Rule - -This artifact is a bounded decision gate after runtime-envelope validation. - -It does NOT mean: - -1. broad launch approval -2. generic production readiness -3. permission to widen beyond the frozen runtime envelope -4. permission to reinterpret validation survival as new protocol/runtime proof - -It means only: - -1. validation outcomes may be reviewed against the already-accepted bounded - envelope -2. expansion decisions must stay inside the same named support boundary -3. 
any broader claim still needs explicit new evidence and explicit new review - -## Allowed Decisions - -The rollout review may produce only one of these outputs: - -1. `stay in bounded validation` - - the chosen envelope is still the right boundary, but more bounded validation - evidence is needed before any exposure increase -2. `widen bounded engineering exposure` - - exposure may increase only inside the current bounded engineering envelope, - with no change to the named support boundary -3. `block expansion` - - the current evidence, incident record, or operational ambiguity is not strong - enough to increase exposure safely - -Any outcome outside those three is invalid for this review. - -## Required Inputs - -The review must not start unless these inputs exist and are explicit: - -1. the frozen runtime envelope -2. the bounded pilot pack -3. the preflight checklist outcome(s) -4. the pilot stop-condition artifact -5. incident records with explicit classification -6. validation outcome summary for the bounded runtime-bearing path -7. the accepted evidence anchors that define the current boundary: - - `Phase 18 M1-M4` - - `v2-rf2-runtime-bounded-envelope.md` - - `v2-rf2-runtime-bounded-envelope-review.md` - -If any required input is missing, the correct review output is `block expansion`. - -## Decision Questions - -The rollout review must answer all of the following: - -1. did validation remain fully inside the frozen runtime envelope -2. did any observed behavior contradict the bounded `Phase 18` runtime envelope -3. were any stop conditions triggered, and if so, how were they resolved -4. are all incidents classified cleanly as: - - `config / environment issue` - - `known exclusion` - - `true product bug` -5. does any proposed next step depend on a broader claim than the current - envelope -6. 
can operators run the validation and diagnose bounded failures from written - artifacts rather than tribal knowledge - -If the answer to question 5 is yes, the review must not approve widening inside -this artifact. That request belongs to later evidence expansion work. - -## Decision Rules - -Use these bounded rules: - -1. approve `stay in bounded validation` when: - - validation stayed inside scope - - no contradiction to accepted bounded claims was found - - more same-envelope evidence is still needed -2. approve `widen bounded engineering exposure` only when: - - validation stayed inside scope - - no unresolved `true product bug` remains against the bounded envelope - - stop conditions did not reveal structural ambiguity - - operator workflow is explicit and repeatable from the artifact set - - the widened exposure does not change the bounded envelope -3. approve `block expansion` when: - - any unresolved contradiction exists - - any unresolved `true product bug` exists - - incident records are vague - - operators depend on tribal knowledge - - the requested widening outruns the current matrix - -## Explicit Review Record - -Each review result must record: - -1. decision outcome -2. date and reviewer set -3. validation window / environment covered -4. summary of incidents by classification bucket -5. any stop-condition events and their disposition -6. exact reason the decision stays inside the current envelope -7. explicit next action: - - continue bounded validation - - widen engineering exposure inside the same envelope - - pause and fix - -## Rejection Rules - -Reject the review as invalid if: - -1. it uses validation success as generic production proof -2. it broadens topology, runtime path, or supported surfaces without a new - evidence package -3. it treats a known exclusion as if validation cleared it -4. it ignores stop-condition events or unresolved true product bugs -5. 
it cannot map the decision back to the accepted evidence ladder - -## Explicit Non-Claims - -This artifact does NOT claim: - -1. broad rollout approval -2. generic production readiness -3. support for `RF>2` -4. support for a broad transport/frontend matrix -5. broad failover-under-load or long-window soak proof - -## Primary Inputs - -1. `sw-block/design/v2-rf2-runtime-bounded-envelope.md` -2. `sw-block/design/v2-rf2-runtime-bounded-envelope-review.md` -3. `sw-block/design/v2-bounded-internal-pilot-pack.md` -4. `sw-block/design/v2-pilot-preflight-checklist.md` -5. `sw-block/design/v2-pilot-stop-conditions.md` -6. `sw-block/.private/phase/phase-18.md` diff --git a/sw-block/design/v2-first-launch-supported-matrix.md b/sw-block/design/v2-first-launch-supported-matrix.md deleted file mode 100644 index 332aabe91..000000000 --- a/sw-block/design/v2-first-launch-supported-matrix.md +++ /dev/null @@ -1,138 +0,0 @@ -# V2 First-Launch Supported Matrix - -Date: 2026-04-04 -Status: draft -Purpose: freeze the first bounded launch envelope from accepted `Phase 12-17` -evidence, with explicit supported scope, explicit exclusions, and explicit -launch blockers - -## Reading Rule - -This document is a bounded support matrix. - -It does NOT mean: - -1. broad launch approval -2. generic production readiness -3. support for every failover/restart/disturbance branch -4. broad transport/frontend approval outside the named chosen path - -It means only: - -1. these are the strongest support statements currently justified by accepted - evidence -2. 
anything outside them is an exclusion, a blocker, or future - productionization work - -## Supported Matrix - -| Dimension | Supported in first draft | Boundary rule | Primary evidence | -|-----------|--------------------------|---------------|------------------| -| Replication factor | `RF=2` | bounded chosen path only | `CP13-1..7`, `C-RF2-SYNCALL-CONTRACT` | -| Durability mode | `sync_all` | bounded chosen path only | `CP13`, `v2-protocol-claim-and-evidence.md` | -| Control/runtime path | existing master / volume-server heartbeat path | same path as `Phase 10`, `Phase 16`, `Phase 17` checkpoints | `Phase 10`, `Phase 16` finish-line review | -| Semantic owner | explicit `V2 core` | semantics stay `V2`-owned even when implementation reuses `weed/` and `blockvol` | `Phase 14-16` | -| Execution backend | `blockvol` via `v2bridge` | reuse implementation; no V1 semantic inheritance | `Phase 09`, `Phase 14-16` | -| Product surfaces | bounded `iSCSI`, bounded `CSI`, bounded `NVMe` on the chosen path | not a generic transport matrix | `Phase 11`, publication tests, NVMe tests | -| Restart/failover reading | bounded `17B` + `17C` interpretation | use only the explicit contract and policy table from `Phase 17` | `phase-17.md`, `phase-17-checkpoint-review.md` | - -## Supported Statement - -The strongest currently supported first-launch statement is: - -1. on the bounded chosen `RF=2 sync_all` path, using the existing - master/volume-server heartbeat path, explicit `V2`-owned semantics and the - accepted `Phase 16-17` contract/policy package provide one finite support - envelope for bounded block product use -2. bounded `iSCSI`, `CSI`, and `NVMe` surfaces are supported only inside that - same chosen-path interpretation -3. 
failover/publication and disturbance behavior must be read through the - explicit `Phase 17` contract/policy package, not through broader inferred - product assumptions - -## Explicit Exclusions - -The following are OUTSIDE the first-launch supported matrix: - -1. `RF>2` -2. durability modes outside the accepted bounded envelope -3. broad transport/frontend matrix approval beyond bounded `iSCSI` / `CSI` / - `NVMe` chosen-path support -4. broad whole-surface failover/publication proof -5. broad restart-window behavior outside the explicit `17C` disturbance policy - table -6. generic soak/pilot success as production proof -7. broad rollout approval - -## Launch Blockers - -These are still required before this matrix can be read as a real launch -decision package: - -1. a frozen `Phase 17` checkpoint review outcome -2. any additional evidence needed if the product wants claims broader than the - current bounded `17B/17C` contract/policy package - -## Current Productionization Artifacts - -The first bounded productionization artifacts now exist for the chosen path: - -1. `v2-bounded-internal-pilot-pack.md` - - bounded pilot scope - - success criteria - - incident classification -2. `v2-pilot-preflight-checklist.md` - - start/resume gate for the bounded pilot -3. `v2-pilot-stop-conditions.md` - - stop/contain/rollback-exposure rules -4. `v2-controlled-rollout-review.md` - - bounded post-pilot decision gate - - allowed outcomes: stay in pilot / widen within same envelope / block expansion - -## Not Launch-Blocking In This Draft - -These are intentionally NOT blockers for the bounded first-draft envelope: - -1. lack of `RF>2` support -2. lack of broad transport/frontend approval -3. lack of broad launch approval language -4. lack of generic soak proof inside the phase package itself - -## Claim Mapping Rule - -Any first-launch support claim must map back to accepted evidence in ALL of the -following layers: - -1. hardening/floor layer - - `Phase 12 P4` -2. 
contract/workload/mode layer - - `CP13-1..9` -3. runtime truth-closure layer - - `Phase 16` finish-line checkpoint -4. product-claim checkpoint layer - - `Phase 17A-17D` - -If a claim cannot map cleanly back through those layers: - -1. it is not in the first-launch matrix -2. it belongs in exclusions, blockers, or later productionization work - -## Operator Reading Guide - -When using this matrix, read it with these constraints: - -1. bounded chosen path first, not generic platform promise -2. explicit exclusions are real product boundaries, not temporary omissions -3. launch blockers are real blockers, not optional polish -4. pilot success later may validate this envelope, but cannot redefine it - -## Primary References - -1. `sw-block/.private/phase/phase-17.md` -2. `sw-block/.private/phase/phase-17-checkpoint-review.md` -3. `sw-block/design/v2-product-completion-overview.md` -4. `sw-block/design/v2-protocol-claim-and-evidence.md` -5. `sw-block/design/v2-bounded-internal-pilot-pack.md` -6. `sw-block/design/v2-pilot-preflight-checklist.md` -7. `sw-block/design/v2-pilot-stop-conditions.md` -8. `sw-block/design/v2-controlled-rollout-review.md` diff --git a/sw-block/design/v2-legacy-runtime-exit-criteria.md b/sw-block/design/v2-legacy-runtime-exit-criteria.md deleted file mode 100644 index 110244865..000000000 --- a/sw-block/design/v2-legacy-runtime-exit-criteria.md +++ /dev/null @@ -1,99 +0,0 @@ -# V2 Legacy Runtime Exit Criteria - -Date: 2026-04-04 -Status: active - -## Purpose - -This note defines when legacy runtime-owner paths may be downgraded from -required compatibility coverage to removable implementation history. - -Current legacy examples: - -1. `legacy P4` live-path proofs -2. no-core startup paths -3. `HandleAssignmentResult()`-driven recovery startup kept for compatibility - -## Current Position - -For the current phase, legacy paths must remain: - -1. as compatibility guards -2. as regression protection for no-core behavior -3. 
but NOT as semantic authority proof for the core-present path - -## Exit Criteria - -A legacy runtime-owner path may be downgraded or removed only when all of the -following are true. - -### 1. V2-native proof replacement exists - -There must be core-present proofs covering the same behavior category: - -1. assignment entry ownership -2. task startup ownership -3. execution ownership -4. observation return ownership -5. outward surface consistency - -### 2. Compatibility mode is no longer required operationally - -At least one of these must be true: - -1. production startup always wires `v2Core` -2. no-core path is explicitly declared unsupported -3. no remaining product surface depends on no-core runtime startup - -### 3. The legacy path is no longer the only guard for a runtime mechanic - -Examples: - -1. serialized replacement/drain behavior -2. shutdown drain behavior -3. live plan-to-execute behavior - -These must have equivalent core-present coverage before legacy deletion. - -### 4. No semantic truth still depends on legacy behavior - -Specifically, removing the legacy path must not change: - -1. identity meaning -2. recovery classification -3. publication meaning -4. durable-boundary meaning - -If removal changes any of those, the legacy path was still hiding semantic -authority and cannot be retired yet. - -## Downgrade Stages - -Legacy paths should retire in stages: - -### Stage 1: authority downgrade - -1. keep tests -2. explicitly classify them as compatibility-only - -### Stage 2: runtime fallback downgrade - -1. keep fallback code only where product startup still needs it -2. stop expanding proof claims from those paths - -### Stage 3: deletion candidate - -1. delete tests or move them to legacy-only coverage -2. remove runtime fallback code only after the new path is already the sole - supported owner - -## Current Judgment - -As of the current separation work: - -1. `legacy P4` stays -2. it is already downgraded to compatibility guard -3. 
it is not yet removable because: - - no-core behavior still exists - - full runtime-loop closure is not yet complete - - not every old ownership proof has a complete core-present replacement diff --git a/sw-block/design/v2-open-questions.md b/sw-block/design/v2-open-questions.md deleted file mode 100644 index 4ec67ff73..000000000 --- a/sw-block/design/v2-open-questions.md +++ /dev/null @@ -1,161 +0,0 @@ -# V2 Open Questions - -Date: 2026-03-27 - -## Purpose - -This document records what is still algorithmically open in V2. - -These are not bugs. - -They are design questions that should be closed deliberately before or during implementation slicing. - -## 1. Recovery Session Ownership - -Open question: - -- what is the exact ownership model for one active recovery session per replica? - -Need to decide: - -- session identity fields -- supersede vs reject vs join behavior -- how epoch/session invalidates old recovery work - -Why it matters: - -- V1.5 needed local reconnect serialization -- V2 should make this a protocol rule - -## 2. Promotion Threshold Strictness - -Open question: - -- must a promotion candidate always have `FlushedLSN >= CommittedLSN`, or is there any narrower safe exception? - -Current prototype: - -- uses committed-prefix sufficiency as the safety gate - -Why it matters: - -- determines how strict real failover behavior should be - -## 3. Recovery Reservation Shape - -Open question: - -- what exactly is reserved during catch-up? - -Need to decide: - -- WAL range only? -- payload pins? -- snapshot pin? -- expiry semantics? - -Why it matters: - -- recoverability must be explicit, not hopeful - -## 4. Smart WAL Payload Classes - -Open question: - -- which payload classes are allowed in V2 first? - -Current model has: - -- `WALInline` -- `ExtentReferenced` - -Need to decide: - -- whether first real implementation includes both -- whether `ExtentReferenced` requires pinned snapshot/versioned extent only - -## 5. 
Smart WAL Garbage Collection Boundary - -Open question: - -- when can a referenced payload stop being recoverable? - -Need to decide: - -- GC interaction -- timeout interaction -- recovery session pinning - -Why it matters: - -- this is the line between catch-up and rebuild - -## 6. Exact Orchestrator Scope - -Open question: - -- how much of the final V2 control logic belongs in: - - local node state - - coordinator - - transport/session manager - -Why it matters: - -- avoid V1-style scattered state ownership - -## 7. First Real Implementation Slice - -Open question: - -- what is the first production slice of V2? - -Candidates: - -1. per-replica sender/session ownership -2. explicit recovery-session management -3. catch-up/rebuild decision plumbing - -Recommended default: - -- per-replica sender/session ownership - -## 8. Steady-State Overhead Budget - -Open question: - -- what overhead is acceptable in the normal healthy case? - -Need to decide: - -- metadata checks on hot path -- extra state bookkeeping -- what stays off the hot path - -Why it matters: - -- V2 should be structurally better without becoming needlessly heavy - -## 9. Smart WAL First-Phase Goal - -Open question: - -- is the first Smart WAL goal: - - lower recovery cost - - lower steady-state WAL volume - - or just proof of historical correctness model? - -Recommended answer: - -- first prove correctness model, then optimize - -## 10. End Condition For Simulator Work - -Open question: - -- when do we stop adding simulator depth and start implementation? 
- -Suggested answer: - -- once acceptance criteria are satisfied -- and the first implementation slice is clear -- and remaining simulator additions are no longer changing core protocol decisions diff --git a/sw-block/design/v2-phase14plus-semantic-framework.md b/sw-block/design/v2-phase14plus-semantic-framework.md deleted file mode 100644 index a35a0aba1..000000000 --- a/sw-block/design/v2-phase14plus-semantic-framework.md +++ /dev/null @@ -1,319 +0,0 @@ -# V2 Phase 14+ Semantic-First Framework - -Date: 2026-04-03 -Status: active -Purpose: define the overall `Phase 14+` implementation framework so `V2` -runtime extraction is driven by semantics first: core-owned state and -transitions, then command rules, then projection contracts, and only then -adapter rebinding - -## Why This Document Exists - -`Phase 13` closed one bounded constrained-runtime contract package: - -1. real-workload validation -2. assignment/publication closure -3. bounded mode normalization - -That package is valuable, but it is not yet a completed `V2 runtime`. - -The next problem is therefore no longer: - -1. keep deepening constrained-`V1` validation by default - -It is: - -1. how to turn the accepted semantic constraints into a real `V2 core` -2. how to sequence `Phase 14+` so `V1` mixed runtime state does not silently - regain semantic authority - -## Core Rule - -For `Phase 14+`, implementation order must be: - -1. define core-owned state and transitions -2. define command-emission rules -3. define projection contracts -4. only then connect adapters - -Do not invert this order. - -If adapter/runtime wiring appears first, `V1` mixed state will silently regain -semantic authority through convenience behavior. - -## Existing Inputs To Preserve - -These are fixed inputs, not optional references: - -1. `v2_mini_core_design.md` -2. `v2-reuse-replacement-boundary.md` -3. `v2-protocol-claim-and-evidence.md` -4. `v2-phase-development-plan.md` -5. 
`sw-block/engine/replication/` - -## Overall Composition Model - -The full `V2` runtime should be composed from smaller automata rather than one -monolithic state machine. - -```mermaid -flowchart TD - assignmentState[AssignmentAutomaton] - recoveryState[RecoveryAutomaton] - boundaryState[BoundaryAutomaton] - modeState[ModeAutomaton] - publicationState[PublicationAutomaton] - coreEngine[CoreEngine] - projections[ProjectionContracts] - adapters[AdapterBoundary] - runtime[V1BackendMechanics] - - assignmentState --> coreEngine - recoveryState --> coreEngine - boundaryState --> coreEngine - modeState --> coreEngine - coreEngine --> publicationState - publicationState --> projections - coreEngine --> adapters - adapters --> runtime - runtime -->|"observations/events"| adapters - adapters --> coreEngine -``` - -## The Five Core-Owned Automata - -### 1. Assignment automaton - -Owns: - -1. volume intent -2. role intent -3. stable replica identity -4. epoch -5. desired replica set - -Primary constraints preserved: - -1. `CP13-2` -2. identity-vs-transport separation - -Current seeds: - -1. `sw-block/engine/replication/registry.go` -2. `sw-block/engine/replication/state.go` - -### 2. Recovery automaton - -Owns: - -1. per-replica recovery state -2. session ownership and fencing -3. catch-up vs rebuild selection - -Primary constraints preserved: - -1. `CP13-4` -2. `CP13-5` -3. `CP13-6` -4. `CP13-7` - -Current seeds: - -1. `sw-block/engine/replication/sender.go` -2. `sw-block/engine/replication/session.go` -3. `sw-block/engine/replication/orchestrator.go` -4. `sw-block/engine/replication/outcome.go` - -### 3. Boundary automaton - -Owns: - -1. committed truth -2. checkpoint truth -3. durable barrier truth -4. rebuild/catch-up target truth - -Primary constraints preserved: - -1. `T1` -2. `T9` -3. `CP13-3` - -Current seeds: - -1. `sw-block/engine/replication/state.go` -2. `sw-block/engine/replication/engine.go` - -### 4. Mode automaton - -Owns: - -1. `allocated_only` -2. 
`bootstrap_pending` -3. `replica_ready` -4. `publish_healthy` -5. `degraded` -6. `needs_rebuild` - -Primary constraints preserved: - -1. `CP13-9` -2. fail-closed external meaning - -Current seeds: - -1. `sw-block/engine/replication/state.go` -2. `sw-block/engine/replication/engine.go` - -### 5. Publication automaton - -Owns: - -1. readiness closure -2. publication closure -3. outward healthy vs non-healthy truth - -Primary constraints preserved: - -1. `CP13-8A` -2. `CP13-9` - -Current seeds: - -1. `sw-block/engine/replication/projection.go` -2. `sw-block/engine/replication/engine.go` - -## Phase 14+ Execution Order - -### Phase 14A: Core-owned automata - -Goal: - -1. make the five automata explicit in the core package - -Deliver: - -1. state definitions -2. transition tables/rules -3. event vocabulary - -Validation: - -1. structural acceptance tests in `sw-block/engine/replication` - -Non-goal: - -1. no live adapter hook - -### Phase 14B: Command semantics - -Goal: - -1. freeze command-emission rules from semantic state, not runtime convenience - -Deliver: - -1. command rules for role apply, receiver start, shipper configure, invalidation, - and publication - -Validation: - -1. tests that one event sequence produces one bounded command sequence - -Non-goal: - -1. no `weed/` execution yet - -### Phase 14C: Projection contracts - -Goal: - -1. define what external surfaces are allowed to claim and from which core state - -Deliver: - -1. projection structs and normalization rules for lookup/heartbeat/debug/tester - meanings - -Validation: - -1. mode/readiness/publication surface-consistency tests - -Non-goal: - -1. no live registry rewrite yet - -### Phase 15A: Minimal adapter hook - -Goal: - -1. connect one narrow adapter ingress to the new core - -Deliver: - -1. one event path from `weed/` into the core -2. one command path back out - -Validation: - -1. 
prove no semantic split between adapter and core on that narrow path - -### Phase 15B: Projection-store rebinding - -Goal: - -1. make `weed/` projection/state surfaces consume core-owned projection truth - -Deliver: - -1. bounded rebinding of registry / lookup / tester-facing surfaces - -Validation: - -1. prove assignment delivered != ready != publish healthy on the real path - -### Phase 16: V2-native runtime closure - -Goal: - -1. make the integrated runtime behave as a `V2`-owned system rather than - constrained-`V1` semantics plus fixes - -Deliver: - -1. one bounded runtime path where core-owned semantics drive adapters and - projections - -Validation: - -1. end-to-end failover/recovery/publication scenarios on the core-driven path - -## Algorithm Review Rule - -For any new transition rule, command rule, or projection rule, require a short -justification in code review or delivery notes: - -1. semantic constraint satisfied: - - which item from `v2-protocol-claim-and-evidence.md`, - `v2-protocol-truths.md`, or `CP13-*` -2. overclaim avoided: - - what false healthy / ready / durable / recoverable claim is being prevented -3. proof preserved: - - which accepted test or checkpoint remains valid because of this rule - -This is the minimum bar for `Phase 14+`. - -## Immediate Next Slice - -Do not broaden `Phase 13` further. - -Use the new `Phase 14` core skeleton in `sw-block/engine/replication` as the -base for one complete semantic chain: - -1. `mode` -2. `readiness` -3. `publication` - -This is the best next slice because it turns the newest accepted `CP13-8A` and -`CP13-9` constraints directly into core-owned state and transition logic before -adapter rebinding begins. 
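The `mode` -> `readiness` -> `publication` chain named above can be sketched in Go roughly as follows. This is an illustration under stated assumptions, not the code in `sw-block/engine/replication`: the `Mode` type, the `Projection` struct, and the `project` function are hypothetical names, and the choice of which modes count as ready is assumed. Only the six mode names come from the mode automaton list.

```go
package main

import "fmt"

// Mode uses the six mode names listed for the mode automaton.
// The Go type itself is hypothetical.
type Mode string

const (
	AllocatedOnly    Mode = "allocated_only"
	BootstrapPending Mode = "bootstrap_pending"
	ReplicaReady     Mode = "replica_ready"
	PublishHealthy   Mode = "publish_healthy"
	Degraded         Mode = "degraded"
	NeedsRebuild     Mode = "needs_rebuild"
)

// Projection is a hypothetical outward view derived only from core-owned mode.
type Projection struct {
	Mode    Mode
	Ready   bool // readiness closure reached
	Healthy bool // publication closure reached
}

// project derives outward claims from the mode alone.
// Assumption: only replica_ready and publish_healthy count as ready,
// and only publish_healthy counts as healthy. Every other mode,
// including unknown values, fails closed (not ready, not healthy).
func project(m Mode) Projection {
	switch m {
	case PublishHealthy:
		return Projection{Mode: m, Ready: true, Healthy: true}
	case ReplicaReady:
		return Projection{Mode: m, Ready: true}
	default:
		return Projection{Mode: m}
	}
}

func main() {
	modes := []Mode{
		AllocatedOnly, BootstrapPending, ReplicaReady,
		PublishHealthy, Degraded, NeedsRebuild,
	}
	for _, m := range modes {
		p := project(m)
		fmt.Printf("%-18s ready=%v healthy=%v\n", p.Mode, p.Ready, p.Healthy)
	}
}
```

The point of deriving both flags from one function is that an adapter cannot report "healthy" without the core having reached `publish_healthy`, which matches the requirement that assignment delivered, ready, and publish healthy stay distinct.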
diff --git a/sw-block/design/v2-pilot-preflight-checklist.md b/sw-block/design/v2-pilot-preflight-checklist.md deleted file mode 100644 index d14c5a1cd..000000000 --- a/sw-block/design/v2-pilot-preflight-checklist.md +++ /dev/null @@ -1,112 +0,0 @@ -# V2 Pilot Preflight Checklist - -Date: 2026-04-05 -Status: draft -Purpose: define the minimum explicit checks required before running bounded -internal engineering validation on the current `Phase 18` RF2 runtime envelope - -## Reading Rule - -This checklist is a gate for starting or resuming bounded internal engineering -validation. - -If any item below is not satisfied: - -1. do not treat the environment as pilot-ready -2. either fix the issue or classify it explicitly before proceeding - -## Scope Lock - -Confirm the validation is still inside the frozen `Phase 18` RF2 runtime -envelope: - -1. topology remains bounded `RF=2` -2. runtime path remains the delivered `masterv2 + volumev2 + purev2` `M1-M4` - path -3. validation stays in bounded runtime/lab exercises only -4. no one is trying to use this validation to claim working block product status -5. no real frontend/product traffic is being introduced on the new runtime path -6. no one is trying to use validation success to claim broader launch approval - -## Build And Artifact Pin - -Confirm the software package is explicit and stable: - -1. the exact build/commit for validation nodes is written down -2. all validation nodes run the same intended package -3. the operator runbook matches the package actually deployed -4. any configuration delta from the documented chosen path is reviewed and - accepted explicitly - -## Environment Readiness - -Confirm the validation environment matches bounded assumptions: - -1. node inventory and topology are written down -2. transport/frontend choice does not widen beyond the bounded runtime envelope -3. storage/network assumptions required by the chosen path are known to the - operator -4. 
known exclusions are acknowledged before start -5. rollback/containment ownership is assigned for the validation window - -## Diagnosis Surface Readiness - -Confirm bounded diagnosis can be performed without ad hoc spelunking: - -1. failover snapshots/results can be inspected -2. Loop 2 snapshots can be inspected -3. continuity snapshots can be inspected -4. RF2 runtime surface can be inspected -5. the operator knows which artifact defines the current contract/policy boundary: - - `v2-rf2-runtime-bounded-envelope.md` - - `v2-rf2-runtime-bounded-envelope-review.md` - - this preflight checklist - - the stop-condition artifact - -## Workload And Gate Alignment - -Confirm the validation workload is aligned with accepted evidence: - -1. the workload maps to the bounded runtime-bearing reading rather than a new - unsupported scenario -2. success will be judged against the validation-pack criteria rather than generic - "looks stable" judgment -3. the workload does not assume continuous Loop 2, real transport, auto failover, - rebuild lifecycle, or product frontends that are still excluded -4. no required proof depends on failover-under-load, hours/days soak, `RF>2`, or - broad transport/frontend claims that are still excluded - -## Incident Routing Readiness - -Confirm incident handling is explicit before starting: - -1. every incident will be classified as one of: - - `config / environment issue` - - `known exclusion` - - `true product bug` -2. the recording location for incidents is agreed before validation starts -3. ownership for triage and decision-making is assigned -4. operators know when they must stop instead of improvising - -## Preflight Result - -Validation may start only if: - -1. every scope-lock item is true -2. the software package and environment are pinned -3. diagnosis surfaces are available -4. incident routing is explicit -5. 
no remaining gap is being hand-waved as "we will figure it out during pilot" - -If those conditions are not met, the correct output is: - -1. `NOT READY` -2. the missing item(s) -3. the owner/action needed before retry - -## Primary Inputs - -1. `sw-block/design/v2-bounded-internal-pilot-pack.md` -2. `sw-block/design/v2-rf2-runtime-bounded-envelope.md` -3. `sw-block/design/v2-rf2-runtime-bounded-envelope-review.md` -4. `sw-block/.private/phase/phase-18.md` diff --git a/sw-block/design/v2-pilot-stop-conditions.md b/sw-block/design/v2-pilot-stop-conditions.md deleted file mode 100644 index c6774b832..000000000 --- a/sw-block/design/v2-pilot-stop-conditions.md +++ /dev/null @@ -1,101 +0,0 @@ -# V2 Pilot Stop Conditions - -Date: 2026-04-05 -Status: draft -Purpose: define when bounded internal engineering validation on the current -`Phase 18` RF2 runtime envelope must stop, contain scope, or block expansion - -## Reading Rule - -This artifact is about validation containment, not protocol/data rollback -semantics. - -`Rollback` here means: - -1. stop widening validation exposure -2. reduce or remove validation usage if needed -3. return to a previously accepted bounded state of operation - -It does NOT mean: - -1. a general storage/data rollback guarantee -2. permission to claim a broader recovery contract than the current evidence -3. ad hoc operator improvisation under ambiguity - -## Immediate Stop Conditions - -Stop validation immediately if ANY of the following occurs: - -1. an observed behavior contradicts the bounded `Phase 18` RF2 runtime envelope -2. any run is interpreted as proving automatic failover, continuous Loop 2 - service, rebuild lifecycle, or frontend-serving behavior that is still - explicitly excluded -3. diagnosis surfaces are insufficient to classify the incident without guessing -4. the validation is being widened beyond the named bounded envelope without an - explicit review decision -5. 
the incident does not fit the allowed buckets: - - `config / environment issue` - - `known exclusion` - - `true product bug` - -## Stop-And-Contain Actions - -When a stop condition fires: - -1. freeze new validation expansion immediately -2. preserve the evidence needed for later review -3. classify the incident explicitly -4. map the incident back to: - - accepted bounded claim - - known exclusion - - unresolved blocker -5. decide whether validation can continue in reduced scope or must fully pause - -If the team cannot perform those actions clearly, validation remains stopped. - -## Rollback Decision Rules - -Use the following bounded rules: - -1. `config / environment issue` - - fix the environment/configuration - - rerun preflight before resuming -2. `known exclusion` - - remove the excluded usage from validation - - do not reinterpret it as product support -3. `true product bug` - - pause affected validation scope - - open an explicit fix or contradiction item before resuming - -If repeated incidents of the same class continue without a bounded corrective -path, block further validation expansion. - -## Expansion Blockers - -Even if validation remains partially runnable, do NOT widen it when: - -1. the same unresolved true product bug recurs -2. operators depend on tribal knowledge to recover or diagnose -3. incident records are vague or cannot be mapped back to the current evidence - ladder -4. success depends on ignoring explicit exclusions -5. the desired next step requires broader launch claims than the current envelope - -## Explicit Non-Claims - -This artifact does NOT claim: - -1. broad rollout approval -2. generic production readiness from validation survival -3. support for `RF>2` -4. support for a broad transport/frontend matrix -5. failover-under-load proof or long-window soak proof beyond the current bounded - evidence set - -## Primary Inputs - -1. `sw-block/design/v2-bounded-internal-pilot-pack.md` -2. 
`sw-block/design/v2-pilot-preflight-checklist.md` -3. `sw-block/design/v2-rf2-runtime-bounded-envelope.md` -4. `sw-block/design/v2-rf2-runtime-bounded-envelope-review.md` -5. `sw-block/.private/phase/phase-18.md` diff --git a/sw-block/design/v2-protocol-aware-execution.md b/sw-block/design/v2-protocol-aware-execution.md deleted file mode 100644 index e5be0d5c6..000000000 --- a/sw-block/design/v2-protocol-aware-execution.md +++ /dev/null @@ -1,90 +0,0 @@ -# V2 Protocol-Aware Execution - -## Purpose -Make host-side execution in `weed/server` and `weed/storage/blockvol` obey the -existing V2 session contract explicitly. The engine remains the semantic source -of truth. Host code owns only: - -- execution-state caching derived from sender/session snapshots -- phase gating before data-plane I/O -- observation routing back into core events - -## Host-Side Execution State -For each primary volume and replica, the host caches a `replica protocol -execution state` with these fields: - -- `ReplicaID` -- `SenderState` -- `SessionID` -- `SessionKind` -- `SessionPhase` -- `StartLSN` -- `TargetLSN` -- `FrozenTargetLSN` -- `RecoveredTo` -- `SessionActive` -- `LiveEligible` -- `Reason` - -Rules: - -1. State is derived from `v2Orchestrator.Registry` snapshots only. -2. `LiveEligible=false` whenever there is an active recovery session. -3. Data-plane code must consult this cached state before shipping current live - WAL entries. -4. Heartbeat and publication remain projection-driven; they do not invent local - session semantics. 
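The four rules above can be sketched in Go as a derived cache plus a gate. This is a minimal sketch, not the actual `weed/server` implementation: `SenderSnapshot`, `ReplicaExecState`, `deriveExecState`, and `mayShipLive` are hypothetical names, the field types are assumed, only a subset of the listed fields is shown, and the `"in_sync"` sender state string is an assumption.

```go
package main

import "fmt"

// SenderSnapshot is a hypothetical stand-in for a sender/session snapshot
// read from the engine registry. Field types are assumptions.
type SenderSnapshot struct {
	ReplicaID     string
	SenderState   string
	SessionID     uint64
	SessionKind   string
	SessionActive bool
}

// ReplicaExecState is a hypothetical host-side cache entry holding a subset
// of the fields listed above.
type ReplicaExecState struct {
	ReplicaID     string
	SenderState   string
	SessionID     uint64
	SessionKind   string
	SessionActive bool
	LiveEligible  bool
	Reason        string
}

// deriveExecState builds the cached state from a snapshot only (rule 1) and
// forces LiveEligible=false while a recovery session is active (rule 2).
// Assumption: without an active session, only an "in_sync" sender is eligible.
func deriveExecState(s SenderSnapshot) ReplicaExecState {
	st := ReplicaExecState{
		ReplicaID:     s.ReplicaID,
		SenderState:   s.SenderState,
		SessionID:     s.SessionID,
		SessionKind:   s.SessionKind,
		SessionActive: s.SessionActive,
	}
	switch {
	case s.SessionActive:
		st.Reason = "recovery session active"
	case s.SenderState == "in_sync":
		st.LiveEligible = true
	default:
		st.Reason = "sender not in sync"
	}
	return st
}

// mayShipLive is the gate the data plane consults before shipping live-tail
// WAL (rule 3). Replicas missing from the cache fail closed.
func mayShipLive(cache map[string]ReplicaExecState, replicaID string) bool {
	st, ok := cache[replicaID]
	return ok && st.LiveEligible
}

func main() {
	snaps := []SenderSnapshot{
		{ReplicaID: "replica-a", SenderState: "in_sync"},
		{ReplicaID: "replica-b", SenderState: "catching_up", SessionID: 7,
			SessionKind: "catchup", SessionActive: true},
	}
	cache := make(map[string]ReplicaExecState)
	for _, s := range snaps {
		cache[s.ReplicaID] = deriveExecState(s)
	}
	for _, id := range []string{"replica-a", "replica-b", "replica-unknown"} {
		fmt.Printf("%s live shipping allowed: %v\n", id, mayShipLive(cache, id))
	}
}
```

Because the cache is rebuilt from snapshots and never written by the data path, the gate stays a phase check rather than a second source of truth.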
- -## WAL-First Rollout -The first rollout is intentionally narrow: - -- cover `keepup` and WAL-based catch-up only -- do not change snapshot/build policy -- do not let fresh late-attached replicas consume current live-tail WAL while a - bounded catch-up session is active - -Current implementation seam: - -- `weed/server/block_protocol_state.go` - - derives host execution state from sender/session snapshots - - binds a per-volume live-shipping policy back into `BlockVol` -- `weed/storage/blockvol/blockvol.go` - - carries the host-provided live-shipping policy across shipper-group rebuilds -- `weed/storage/blockvol/wal_shipper.go` - - checks the policy before any live-tail dial or send - -This is intentionally a phase gate, not a second source of truth. - -## Observation Seam -Runtime observations should feed back through one server-side seam: - -- sender/session snapshots -> `syncProtocolExecutionState()` -- host event application -> `applyCoreEvent()` -- assignment processing -> `ApplyAssignments()` - -The rule is: - -1. engine chooses the protocol phase -2. host derives execution state from engine snapshots -3. data path obeys that state -4. 
host emits observed facts back through `applyCoreEvent()` - -## Fast Test Roster -The first fast-test roster for protocol-aware execution is: - -- `unit`: `TestWALShipper_LiveShippingPolicyBlocksBeforeDial` - - proves phase gate happens before any transport dial -- `unit`: `TestWALShipper_LiveShippingPolicyAllowsShip` - - proves the gate does not block normal live shipping after eligibility -- `component`: `TestBlockService_ProtocolExecutionState_ActiveCatchUpBlocksLiveShipping` - - proves sender/session snapshots become host execution state and block live - shipping during active catch-up -- `component`: `TestBlockService_ProtocolExecutionState_InSyncSenderAllowsLiveShipping` - - proves the host reopens live shipping after the recovery session is gone - -Next fast tests to add in later waves: - -- late attach with backlog must stay bounded until target reached -- transport contact before barrier durability must not imply publish healthy -- timeout with valid retention pin may replan WAL catch-up -- timeout after retention loss must escalate to build diff --git a/sw-block/design/v2-reuse-replacement-boundary.md b/sw-block/design/v2-reuse-replacement-boundary.md deleted file mode 100644 index 925d6d1ef..000000000 --- a/sw-block/design/v2-reuse-replacement-boundary.md +++ /dev/null @@ -1,178 +0,0 @@ -# V2 Reuse vs Replacement Boundary - -Date: 2026-04-03 -Status: active - -## Purpose - -This note makes one architectural split explicit for the current chosen path: - -1. what we reuse from the existing `blockvol`/`weed` stack as mechanics -2. what must be owned by `V2` as semantic authority -3. what sits in the adapter boundary between them - -The goal is to stop `V1` mixed control/data state from silently redefining `V2` -behavior through convenience wiring. - -Scope is still bounded to: - -1. `RF=2` -2. `sync_all` -3. current master / volume-server heartbeat path -4. 
`blockvol` as the execution backend
-
-## Boundary Rule
-
-`V1` reuse is allowed for execution mechanics.
-
-`V2` replacement is required for semantic authority.
-
-If a change decides protocol meaning, failover meaning, durability meaning, or
-external publication meaning, it belongs to a `V2`-owned layer even if the
-underlying I/O still runs through reused `blockvol` code.
-
-This is the practical interpretation of:
-
-- `v2-protocol-truths.md` `T14`: engine remains recovery authority
-- `v2-protocol-truths.md` `T15`: reuse reality, not inherited semantics
-
-## Three Buckets
-
-### 1. Reusable V1 Core
-
-These components remain useful as mechanics:
-
-| Area | Files | What stays reusable |
-|------|-------|---------------------|
-| Local storage truth | `weed/storage/blockvol/blockvol.go`, `flusher.go`, `rebuild.go`, WAL/extent helpers | WAL append, flush, checkpoint, dirty-map, extent install |
-| Replica transport | `weed/storage/blockvol/replica_apply.go`, `wal_shipper.go`, `shipper_group.go`, `dist_group_commit.go`, `repl_proto.go` | TCP receiver/shipper mechanics, barrier transport, replay/apply |
-| Frontend serving | `weed/storage/blockvol/iscsi/`, `weed/storage/blockvol/nvme/` | block-device serving once a local volume is authoritative |
-| Local role guardrails | `weed/storage/blockvol/promotion.go`, `role.go` | drain, lease revoke, local role gate enforcement |
-
-Rule:
-
-- these layers execute I/O and transport
-- they do not decide whether a replica is eligible, authoritative, published, or healthy in the `V2` sense
-
-### 2. 
Adapter Boundary - -These components translate `V2` truth into concrete runtime wiring: - -| Area | Files | Responsibility | -|------|-------|----------------| -| Assignment ingest | `weed/server/volume_server_block.go` | authoritative assignment lifecycle for role apply, receiver/shipper wiring, readiness closure | -| Heartbeat/runtime loop | `weed/server/block_heartbeat_loop.go` | collect/report status and process assignments through the same lifecycle | -| Local store helper | `weed/storage/store_blockvol.go` | local volume open/close/iteration; no longer the authoritative assignment lifecycle | -| Bridge | `weed/storage/blockvol/v2bridge/control.go` | convert service/control truth into engine intents | - -Rule: - -- the adapter boundary may reuse `blockvol` primitives -- it must name and own lifecycle closure states explicitly -- it must not let store-only role application masquerade as ready publication - -### 3. V2-Owned Replacement - -These areas define truth and therefore must remain `V2`-owned: - -| Area | Files | Responsibility | -|------|-------|----------------| -| Control and identity truth | `sw-block/engine/replication/`, `weed/storage/blockvol/v2bridge/control.go` | assignment truth, stable identity, session truth | -| Recovery ownership | `weed/server/block_recovery.go` | live runtime owner for catch-up/rebuild tasks | -| Publication and health closure | `weed/server/master_block_registry.go`, `weed/server/master_block_failover.go` | what the system reports as ready, degraded, publishable | -| External product surfaces | `weed/server/master_grpc_server_block.go`, `weed/server/master_server_handlers_block.go`, debug/diagnostic surfaces | operator-visible truth, not convenience guesses | - -Rule: - -- if the system exposes a condition to master, tester, CSI, or operator tooling, that condition must come from `V2`-named state - -## Assignment-To-Readiness Lifecycle - -The authoritative lifecycle for the current chosen path is: - -```text -assignment 
delivered --> local role applied --> replica receiver or primary shipper configured --> readiness closed --> heartbeat publication --> master registry health/publication -``` - -More concretely: - -1. master intent is delivered -2. `BlockService.ApplyAssignments()` applies local role truth -3. the same path wires receiver/shipper runtime -4. the same path records named readiness state -5. heartbeat publishes only what is actually publish-healthy -6. master registry derives lookup/health from explicit readiness, not from allocation alone - -## Named Readiness States - -For the current implementation slice, the service boundary now names: - -1. `roleApplied` -2. `receiverReady` -3. `shipperConfigured` -4. `shipperConnected` -5. `replicaEligible` -6. `publishHealthy` - -Ownership: - -- owned by `BlockService` / adapter layer -- observed by debug surfaces and heartbeat/publication logic -- not delegated to `blockvol` as implicit mixed state - -## Current File Map - -### Reuse - -- `weed/storage/blockvol/blockvol.go` -- `weed/storage/blockvol/flusher.go` -- `weed/storage/blockvol/replica_apply.go` -- `weed/storage/blockvol/wal_shipper.go` -- `weed/storage/blockvol/shipper_group.go` -- `weed/storage/blockvol/dist_group_commit.go` -- `weed/storage/blockvol/iscsi/` -- `weed/storage/blockvol/nvme/` - -### Adapter boundary - -- `weed/server/volume_server_block.go` -- `weed/server/block_heartbeat_loop.go` -- `weed/storage/store_blockvol.go` -- `weed/server/volume_server_block_debug.go` - -### V2-owned replacement / truth - -- `weed/storage/blockvol/v2bridge/control.go` -- `sw-block/engine/replication/` -- `weed/server/block_recovery.go` -- `weed/server/master_block_registry.go` -- `weed/server/master_block_failover.go` -- `weed/server/master_grpc_server_block.go` -- `weed/server/master_server_handlers_block.go` - -## Immediate Engineering Rule - -When a new bug appears, classify it first: - -1. `v1 reusable core`: local storage or transport mechanics -2. 
`adapter boundary`: assignment/readiness/publication closure bug -3. `v2 replacement`: semantic authority, identity, ownership, eligibility, rebuild, or operator-visible truth - -Do not patch semantic authority directly into `blockvol` unless the same change is -also reflected as an explicit `V2` state/rule at the service or registry layer. - -## Why This Matters For CP13-8 - -`CP13-8` found the exact class of bug this split is meant to expose: - -- allocation/control truth said the replica existed -- but runtime publication/read visibility was not yet closed - -That is not a reason to throw away `blockvol`. -It is a reason to stop treating mixed `V1` runtime state as if it were already -closed `V2` publication truth. diff --git a/sw-block/design/v2-rf2-runtime-bounded-envelope-review.md b/sw-block/design/v2-rf2-runtime-bounded-envelope-review.md deleted file mode 100644 index 6d5031e4f..000000000 --- a/sw-block/design/v2-rf2-runtime-bounded-envelope-review.md +++ /dev/null @@ -1,80 +0,0 @@ -# V2 RF2 Runtime Bounded Envelope Review - -Date: 2026-04-05 -Status: draft -Purpose: record the current bounded productionization judgment for the delivered -`Phase 19` RF2 working-path envelope - -## Review Outcome - -Current decision: - -1. `stay in bounded validation` -2. `not pilot-ready` - -## Why This Is The Correct Outcome - -The delivered `Phase 19` path proves one bounded working RF2 block path: - -1. live transport-backed evidence traffic exists -2. continuous Loop 2 service exists -3. bounded automatic failover exists -4. runtime-managed frontend rebinding exists -5. bounded repair/catch-up exists -6. one real end-to-end client handoff proof exists -7. bounded operator and CSI adapters now exist on top of runtime-owned truth - -But the path is still not broad product/pilot approval because: - -1. the current proof is still bounded to the current runtime harness -2. repair/catch-up is not yet broad rebuild lifecycle closure -3. 
CSI and operator surfaces are still bounded adapters rather than full - production surfaces -4. no broad pilot or rollout evidence exists yet - -## Review Record - -Reviewer reading baseline: - -1. `sw-block/.private/phase/phase-19.md` -2. `sw-block/design/v2-rf2-runtime-bounded-envelope.md` -3. `sw-block/design/v2-bounded-internal-pilot-pack.md` -4. `sw-block/design/v2-pilot-preflight-checklist.md` -5. `sw-block/design/v2-pilot-stop-conditions.md` -6. `sw-block/design/v2-controlled-rollout-review.md` -7. `sw-block/runtime/volumev2/poc_test.go` - -Current evidence package: - -1. runtime-owned failover manager -2. continuous Loop 2 service and bounded auto failover -3. runtime-managed frontend and bounded repair closure -4. end-to-end RF2 handoff proof -5. RF2 runtime surface projection and operator surface -6. bounded CSI runtime backend adapter - -## Allowed Interpretation - -The review allows only these statements: - -1. one runtime-bearing RF2 kernel slice now exists -2. one bounded working RF2 block path now exists -3. one bounded productionization artifact set now exists around that path -4. later work may widen from this review only through explicit new closure - -The review does NOT allow: - -1. working block product approval -2. pilot execution against real product traffic -3. rollout expansion beyond bounded internal engineering validation - -## Next Required Closures - -Before any pilot-ready judgment can exist, the next closures must become -explicit: - -1. multi-process / multi-host proof for the current working path -2. broader rebuild lifecycle closure beyond the bounded repair wrapper -3. fuller CSI lifecycle parity on the V2 runtime path -4. broader operator/metrics surface closure -5. 
pilot/preflight/containment evidence on top of the `Phase 19` path diff --git a/sw-block/design/v2-rf2-runtime-bounded-envelope.md b/sw-block/design/v2-rf2-runtime-bounded-envelope.md deleted file mode 100644 index 4d8bd1386..000000000 --- a/sw-block/design/v2-rf2-runtime-bounded-envelope.md +++ /dev/null @@ -1,127 +0,0 @@ -# V2 RF2 Runtime Bounded Envelope - -Date: 2026-04-05 -Status: draft -Purpose: freeze the bounded productionization envelope around the current -`Phase 19` working RF2 block path without overclaiming broad product readiness - -## Reading Rule - -This document defines the strongest bounded envelope currently justified by the -delivered `Phase 19` path. - -It does NOT mean: - -1. broad launch approval -2. working block product approval -3. support for broad frontend or transport matrices -4. that remaining runtime/product gaps are minor polish - -It means only: - -1. the current `masterv2 + volumev2 + purev2` RF2 runtime slice has a named, - reviewable productionization boundary -2. the current support statement, exclusions, and blockers are explicit -3. later pilot or rollout work must stay inside this envelope or explicitly widen - it with new evidence - -## Envelope Basis - -This envelope is anchored on the delivered `Phase 19` milestones: - -1. `M6`: one live loopback HTTP transport now exists behind the evidence seam -2. `M7`: one background Loop 2 service and one bounded auto-failover service now - exist -3. `M8`: one runtime-managed iSCSI export path and one bounded replica repair - wrapper now exist -4. `M9`: one end-to-end RF2 handoff proof now exists with continued I/O on the - new primary -5. `M10`: one bounded operator surface and one bounded CSI runtime backend - adapter now exist - -The envelope is therefore about one bounded working RF2 block path, not broad -product readiness. - -## Supported Envelope - -The current bounded support statement is: - -1. 
one bounded working RF2 block path now exists with: - - `masterv2` identity/promotion authority - - `volumev2` failover, takeover, active Loop 2 service, continuity, repair, - frontend rebinding, and projected RF2 surface ownership - - `purev2` execution adapter reuse -2. one bounded live transport path now carries failover-time evidence and replica - summaries -3. one bounded real client handoff path now exists: - - write through runtime-managed iSCSI export - - bounded repair/catch-up on the runtime path - - lose primary - - auto fail over - - reconnect to the new primary - - continue I/O -4. one bounded outward RF2 surface exists as projection only: - - `RF2VolumeSurface` -5. one bounded operator/CSI adapter layer exists on top of runtime-owned truth - -## Explicit Exclusions - -The following are OUTSIDE this bounded envelope: - -1. broad multi-process or multi-host deployment approval -2. broad transport/frontend matrix approval -3. full rebuild orchestration beyond the current bounded repair/catch-up wrapper -4. broad CSI lifecycle parity beyond the current bounded runtime backend adapter -5. broad operator/API/metrics coverage beyond the current bounded HTTP surface -6. broad launch or external customer support statements - -## Current Blockers - -The main blockers between this envelope and a working RF2 block product are: - -1. the current path is still bounded to the current runtime harness rather than - broad multi-process approval -2. bounded repair/catch-up is not yet broad rebuild lifecycle closure -3. CSI rebinding is still a bounded runtime backend adapter, not full lifecycle - parity -4. the operator surface is still a bounded HTTP view, not a full operational - platform surface - -## Allowed Validation Shape - -The allowed validation shape inside this envelope is: - -1. internal engineering validation only -2. bounded lab/runtime exercise only -3. explicit artifact-driven interpretation only - -The following are NOT allowed interpretations: - -1. 
"the system is now production ready" -2. "the system now supports real automatic failover" -3. "the system now supports broad product traffic and rollout" - -## Evidence Anchors - -Read this envelope together with: - -1. `sw-block/.private/phase/phase-19.md` -2. `sw-block/design/v2-kernel-closure-review.md` -3. `sw-block/design/v2-protocol-claim-and-evidence.md` -4. `sw-block/runtime/volumev2/runtime_manager.go` -5. `sw-block/runtime/volumev2/continuity_runtime.go` -6. `sw-block/runtime/volumev2/rf2_surface.go` -7. `sw-block/runtime/volumev2/loop2_service.go` -8. `sw-block/runtime/volumev2/frontend_runtime.go` -9. `sw-block/runtime/volumev2/operator_surface.go` -10. `sw-block/runtime/volumev2/poc_test.go` -11. `weed/storage/blockvol/csi/v2_runtime_backend.go` - -## Envelope Output - -The correct current reading of this envelope is: - -1. runtime-bearing RF2 kernel slice: yes -2. bounded working RF2 block path: yes -3. bounded productionization artifact set: yes -4. pilot-ready broad product path: no diff --git a/sw-block/design/v2-scenario-sources-from-v1.md b/sw-block/design/v2-scenario-sources-from-v1.md deleted file mode 100644 index cf47cc95e..000000000 --- a/sw-block/design/v2-scenario-sources-from-v1.md +++ /dev/null @@ -1,249 +0,0 @@ -# V2 Scenario Sources From V1 and V1.5 - -Date: 2026-03-27 - -## Purpose - -This document distills V1 / V1.5 real-test material into V2 scenario inputs. - -Sources: - -- `learn/projects/sw-block/phases/phase13_test.md` -- `learn/projects/sw-block/phases/phase-13-v2-boundary-tests.md` - -This is not the active scenario backlog. - -Use: - -- `v2_scenarios.md` for the active V2 scenario set -- this file for historical source and rationale - -## How To Use This File - -For each item below: - -1. keep the real V1/V1.5 test as implementation evidence -2. create or maintain a V2 simulator scenario for the protocol core -3. define the expected V2 behavior explicitly - -## Source Buckets - -### 1. 
Core protocol behavior - -These are the highest-value simulator inputs. - -- barrier durability truth -- reconnect + catch-up -- non-convergent catch-up -> rebuild -- rebuild fallback -- failover / promotion safety -- WAL retention / tail-chasing -- durability mode semantics - -Recommended V2 treatment: - -- `sim_core` - -### 2. Supporting invariants - -These matter, but usually as reduced simulator checks. - -- canonical address handling -- replica role/epoch gating -- committed-prefix rules -- rebuild publication cleanup -- assignment refresh behavior - -Recommended V2 treatment: - -- `sim_reduced` - -### 3. Real-only implementation behavior - -These should usually stay in real-engine tests. - -- actual wire encoding / decode bugs -- real disk / `fdatasync` timing -- NVMe / iSCSI frontend behavior -- Go concurrency artifacts tied to concrete implementation - -Recommended V2 treatment: - -- `real_only` - -### 4. V2 boundary items - -These are especially important. - -They should remain visible as: - -- current V1/V1.5 limitation -- explicit V2 acceptance target - -Recommended V2 treatment: - -- `v2_boundary` - -## Distilled Scenario Inputs - -### A. Barrier truth uses durable replica progress - -Real source: - -- Phase 13 barrier / `replicaFlushedLSN` tests - -Why it matters: - -- commit must follow durable replica progress, not send progress - -V2 target: - -- barrier completion counted only from explicit durable progress state - -### B. Same-address transient outage - -Real source: - -- Phase 13 reconnect / catch-up tests -- `CP13-8` short outage recovery - -Why it matters: - -- proves cheap short-gap recovery path - -V2 target: - -- explicit recoverability check -- catch-up if recoverable -- rebuild otherwise - -### C. 
Changed-address restart - -Real source: - -- `CP13-8 T4b` -- changed-address refresh fixes - -Why it matters: - -- endpoint is not identity -- stale endpoint must not remain authoritative - -V2 target: - -- heartbeat/control-plane learns new endpoint -- reassignment updates sender target -- recovery session starts only after endpoint truth is updated - -### D. Non-convergent catch-up / tail-chasing - -Real source: - -- Phase 13 retention + catch-up + rebuild fallback line - -Why it matters: - -- “catch-up exists” is not enough -- must know when to stop and rebuild - -V2 target: - -- explicit `CatchingUp -> NeedsRebuild` -- no fake success - -### E. Slow control-plane recovery - -Real source: - -- `CP13-8 T4b` hardware behavior before fix - -Why it matters: - -- safety can be correct while availability recovery is poor - -V2 target: - -- explicit fast recovery path when possible -- explicit fallback when only control-plane repair can help - -### F. Stale message / delayed ack fencing - -Real source: - -- Phase 13 epoch/fencing tests -- V2 scenario work already mirrors this - -Why it matters: - -- old lineage must not mutate committed prefix - -V2 target: - -- stale message rejection is explicit and testable - -### G. Promotion candidate safety - -Real source: - -- failover / promotion gating tests -- V2 candidate-selection work - -Why it matters: - -- wrong promotion loses committed lineage - -V2 target: - -- candidate must satisfy: - - running - - epoch aligned - - state eligible - - committed-prefix sufficient - -### H. Rebuild boundary after failed catch-up - -Real source: - -- Phase 13 rebuild fallback behavior - -Why it matters: - -- rebuild is required when retained WAL cannot safely close the gap - -V2 target: - -- rebuild is explicit fallback, not ad hoc recovery - -## Immediate Feed Into `v2_scenarios.md` - -These are the most important V1/V1.5-derived V2 scenarios: - -1. same-address transient outage -2. changed-address restart -3. 
non-convergent catch-up / tail-chasing -4. stale delayed message / barrier ack rejection -5. committed-prefix-safe promotion -6. control-plane-latency recovery shape - -## What Should Not Be Copied Blindly - -Do not clone every real-engine test into the simulator. - -Do not use the simulator for: - -- exact OS timing -- exact socket/wire bugs -- exact block frontend behavior -- implementation-specific lock races - -Instead: - -- extract the protocol invariant -- model the reduced scenario if the protocol value is high - -## Bottom Line - -V1 / V1.5 tests should feed V2 in two ways: - -1. as historical evidence of what failed or mattered in real life -2. as scenario seeds for the V2 simulator and acceptance backlog diff --git a/sw-block/design/v2-separation-port-layer-audit.md b/sw-block/design/v2-separation-port-layer-audit.md deleted file mode 100644 index c988829e0..000000000 --- a/sw-block/design/v2-separation-port-layer-audit.md +++ /dev/null @@ -1,135 +0,0 @@ -# V2 Separation Port Layer Audit - -Date: 2026-04-04 -Status: active - -## Purpose - -This note audits the current `sw-block` port layer for the separation effort: - -1. define which contracts already belong in `sw-block` -2. identify what was still underspecified or mismatched -3. record the normalized boundary for future migration batches - -## Current Port Layer - -The current reusable boundary inside `sw-block` is: - -1. `sw-block/bridge/blockvol/contract.go` -2. `sw-block/bridge/blockvol/storage_adapter.go` -3. `sw-block/bridge/blockvol/control_adapter.go` - -These files are the intended weed-free bridge between: - -1. `sw-block/engine/replication` -2. `weed/storage/blockvol/v2bridge` -3. `weed/server/*` adapter code - -## Audited Contracts - -### Storage state port - -File: - -1. `sw-block/bridge/blockvol/contract.go` - -Stable contract: - -1. `BlockVolReader` -2. `BlockVolState` - -This is already the right ownership: - -1. `sw-block` owns the shape of retained-history inputs -2. 
`weed/` only implements how to read those facts from real `BlockVol` - -### Retention / snapshot pinning port - -File: - -1. `sw-block/bridge/blockvol/contract.go` - -Stable contract: - -1. `BlockVolPinner` - -This remains correct because: - -1. pin lifecycle meaning belongs to the V2 recovery driver -2. actual hold/release mechanics remain weed-side implementation detail - -### Recovery execution port - -Previous issue: - -1. `BlockVolExecutor` in `contract.go` did not match the real engine execution - interfaces precisely -2. in particular, rebuild full-base transfer in the engine returns achieved LSN, - but the contract only returned `error` - -Normalized decision: - -1. `sw-block` now names: - - `BlockVolCatchUpIO` - - `BlockVolRebuildIO` - - `BlockVolExecutor` -2. these contracts intentionally match: - - `engine.CatchUpIO` - - `engine.RebuildIO` - -This is the right long-term boundary because: - -1. `sw-block` owns the execution port shape -2. `weed/storage/blockvol/v2bridge.Executor` remains only one implementation -3. future migration can move execution code without changing engine contracts - -### Assignment translation helper port - -Normalized helper layer: - -1. `ReplicaAssignmentForServer()` -2. `RecoveryTargetForRole()` - -These are now the canonical helper rules in: - -1. `sw-block/bridge/blockvol/control_adapter.go` - -They exist to stop identity / recovery-target mapping from drifting between: - -1. `weed/storage/blockvol/v2bridge/control.go` -2. `weed/server/volume_server_block.go` - -## Code Normalization Completed - -Implemented in this batch: - -1. `sw-block/bridge/blockvol/doc.go` - - clarified that the package owns weed-free contracts and thin adapters, - not real blockvol implementations -2. `sw-block/bridge/blockvol/contract.go` - - aligned execution contracts with engine IO interfaces -3. `sw-block/bridge/blockvol/control_adapter.go` - - extracted canonical helper functions for identity and recovery-target - mapping -4. 
`sw-block/bridge/blockvol/bridge_test.go` - - added interface-compatibility proof for the normalized execution contracts - -## Resulting Boundary Rule - -After this audit, the port layer rule is: - -1. `sw-block` defines contracts and canonical mapping helpers -2. `weed/` implements real storage, transport, and runtime bindings -3. no `sw-block` package in this layer should import `weed/` - -## What Still Does Not Move Yet - -This audit does NOT move: - -1. `weed/storage/blockvol/v2bridge.Executor` -2. `weed/storage/blockvol/v2bridge.Reader` -3. `weed/storage/blockvol/v2bridge.Pinner` -4. `weed/server/BlockService` -5. `weed/server/RecoveryManager` - -It only stabilizes the port layer those migrations will target. diff --git a/sw-block/design/v2-validation-matrix.md b/sw-block/design/v2-validation-matrix.md index 8ee25bb91..9cc2bb5aa 100644 --- a/sw-block/design/v2-validation-matrix.md +++ b/sw-block/design/v2-validation-matrix.md @@ -73,6 +73,28 @@ Do not reuse tests that encode V1.5 semantics that V2 intentionally removed: | `Restore Ready` | exact base restore and snapshot-tail recovery are safe and exact | snapshot boundary, integrity, partial-failure safety, tail convergence | | `V2 Ready` | primary-owned assignment/session/projection semantics close on real flows | bootstrap, keepup, catchup, rebuild, failover, rejoin, publish gating | +## Matrix Linkage + +Read this matrix together with: + +1. `v2-capability-map.md` for ownership of the capability tier +2. `v2-integration-matrix.md` for real scenario coverage + +This matrix answers "is the capability closed at all?" It does not by itself +guarantee that enough real topology/workload/failure scenarios have been +exercised. That second question belongs to the integration matrix. 
+ +| Validation rows | Capability tier | Primary protocol refs | Main integration refs | +|---|---|---|---| +| `R1`-`R12` | Tier 3: RF=2 Recovery And Failover | `v2-sync-recovery-protocol.md`, `v2-rebuild-mvp-session-protocol.md` | `I-R1`-`I-R8` | +| `S1`-`S10` | Tier 5: Lifecycle Capability | `v2-rebuild-mvp-session-protocol.md` and snapshot/restore execution rules | `I-S1`-`I-S4` | +| `V1`-`V2` | Tier 2: RF=2 Replication Base | `v2-sync-recovery-protocol.md` | `I-V1`, `I-V2` | +| `V3`-`V8` | Tier 3: RF=2 Recovery And Failover | `v2-sync-recovery-protocol.md`, `v2-rebuild-mvp-session-protocol.md` | `I-V3`, `I-V4`, `I-V5` | +| `V9`-`V10` | Tier 4: Multi-Replica Runtime (`RF>=3`) | `v2-sync-recovery-protocol.md` | future RF>=3 integrated rows; currently bounded engine/component proof only | +| `V11` | Tier 3 / Tier 8 boundary | recovery protocol plus launch-envelope disturbance claims | `I-V5` | +| `V12`-`V13` | Tier 6: Control Plane And Operations | ownership / observability docs and control-plane protocol surfaces | `I-V4`, `I-V6` | +| `V14` | Tier 0: Semantic Foundation | `v2-protocol-truths.md`, `v2-sync-recovery-protocol.md` | exercised across `I-V*` and negative component packs | + ## Matrix A: Rebuild Ready | ID | Priority | Scenario | Trigger / entry | Reuse | Main proof | Final validation | Coverage | File | Evidence | diff --git a/sw-block/design/v2-volumev2-single-node-mvp.md b/sw-block/design/v2-volumev2-single-node-mvp.md deleted file mode 100644 index fafec8307..000000000 --- a/sw-block/design/v2-volumev2-single-node-mvp.md +++ /dev/null @@ -1,151 +0,0 @@ -# V2 VolumeV2 Single-Node MVP - -Date: 2026-04-05 -Status: active - -## Purpose - -This note defines the target shape for a single-node `volumev2` MVP that can -ship as a normal block service before HA/failover exists. - -The core idea is: - -1. `masterv2` is fully new control ownership -2. `volumev2` is a new shell and brain host -3. 
`blockvol` and related backend mechanics remain reusable muscles - -## Target Layering - -`volumev2` should be strengthened around four layers. - -### 1. Engine - -Owner: - -- `sw-block/engine/replication/` - -Responsibility: - -1. state -2. event ingestion -3. command emission -4. outward projection - -Rule: - -- semantic truth lives here -- no backend I/O or network ownership - -### 2. Engine Interface - -Owner: - -- command/event vocabulary between control/runtime and backend execution - -Responsibility: - -1. assignment -> event translation -2. observation -> event translation -3. command -> execution dispatch contract - -Rule: - -- runtime shell may not mutate engine truth directly - -### 3. Control Plane - -Owner: - -- `masterv2 <-> volumev2` coordination - -Responsibility: - -1. node identity -2. registration and heartbeat -3. assignment receipt -4. state reporting -5. future recovery-control vocabulary (`keepup`, `catchup`, `rebuild`) - -Rule: - -- control plane carries protocol messages -- it does not own local data execution - -### 4. Data Plane - -Owner: - -- local storage and serving mechanics - -Responsibility: - -1. WAL/extent management -2. read/write/flush -3. background workers -4. receiver/shipper mechanics -5. NVMe/iSCSI/frontend serving - -Rule: - -- data plane knows how to execute -- it does not define publication or role semantics - -## Single-Node MVP Contract - -The first ship-capable `volumev2` slice should include: - -1. `masterv2` declaration of one RF1 primary volume -2. `volumev2` control session to fetch assignments -3. local create/open through reused `blockvol` -4. local primary assignment application through the V2 engine -5. local read/write plus restart durability -6. debug/status snapshot -7. one small executable entrypoint for smoke usage - -The first slice explicitly excludes: - -1. failover -2. RF2 replication -3. catch-up/rebuild ownership -4. CSI - -## Why This Is Enough - -This is enough to prove: - -1. 
the `masterv2 + volumev2` head is viable -2. `volumev2` can host V2 semantics while reusing V1 muscles -3. a useful non-HA block service can exist before HA complexity is added - -## Module Shape - -Recommended package split: - -1. `sw-block/runtime/masterv2/` -2. `sw-block/runtime/volumev2/` -3. `sw-block/runtime/purev2/` -4. `sw-block/engine/replication/` -5. `sw-block/bridge/blockvol/` - -Within `volumev2`, strengthen toward: - -1. `control_session.go` -2. `orchestrator.go` -3. `node.go` -4. later: `heartbeat.go`, `frontend.go`, `workers.go` - -## Stage Gate - -`volumev2` may be treated as a single-node MVP only when: - -1. assignment sync is repeatable and idempotent -2. local IO is data-verified -3. restart/open path is proven -4. status/debug state is explicit -5. no `weed/server` lifecycle owner is required - -## Related References - -- `v2-pure-runtime-rf1-bootstrap.md` -- `v2-proof-and-retest-pyramid.md` -- `v2-capability-map.md` diff --git a/weed/server/master_block_failover.go b/weed/server/master_block_failover.go index 62b7e5ec0..10932094a 100644 --- a/weed/server/master_block_failover.go +++ b/weed/server/master_block_failover.go @@ -196,6 +196,11 @@ func (ms *MasterServer) failoverBlockVolumes(deadServer string) { } ms.blockRegistry.FailoversTotal.Add(1) entries := ms.blockRegistry.ListByServer(deadServer) + glog.V(0).Infof("failover: deadServer=%s entries=%d", deadServer, len(entries)) + for i, e := range entries { + glog.V(0).Infof("failover: entry[%d] name=%q vs=%s role=%d hasReplica=%v epoch=%d", + i, e.Name, e.VolumeServer, e.Role, e.HasReplica(), e.Epoch) + } now := time.Now() for _, entry := range entries { // Case 1: Dead server is the primary. 
diff --git a/weed/server/master_block_registry.go b/weed/server/master_block_registry.go
index 568f2d76f..5c9bce1c0 100644
--- a/weed/server/master_block_registry.go
+++ b/weed/server/master_block_registry.go
@@ -111,6 +111,12 @@ type BlockVolumeEntry struct {
 	LastLeaseGrant time.Time
 	LeaseTTL       time.Duration
 
+	// Registration race protection: the time this entry was created/registered
+	// by the master. Stale cleanup skips recently registered entries to allow
+	// the volume server time to discover the volume and include it in its
+	// next heartbeat inventory.
+	RegisteredAt time.Time
+
 	// CP11A-2: Coordinated expand tracking.
 	ExpandInProgress bool
 	ExpandFailed     bool // true = primary committed but replica(s) failed; size suppressed
@@ -424,6 +430,9 @@ func (r *BlockVolumeRegistry) Register(entry *BlockVolumeEntry) error {
 	if _, ok := r.volumes[entry.Name]; ok {
 		return fmt.Errorf("block volume %q already registered", entry.Name)
 	}
+	if entry.RegisteredAt.IsZero() {
+		entry.RegisteredAt = time.Now()
+	}
 	entry.recomputeReplicaState()
 	r.volumes[entry.Name] = entry
 	r.addToServer(entry.VolumeServer, entry.Name)
@@ -642,6 +651,14 @@ func (r *BlockVolumeRegistry) UpdateFullHeartbeatWithInventoryAuthority(server s
 					name, server)
 				continue
 			}
+			// Registration race protection: skip recently registered entries.
+			// The VS may not have discovered the volume yet. Grace period
+			// of 30s (> 2 heartbeat intervals) prevents premature deletion.
+			if !entry.RegisteredAt.IsZero() && time.Since(entry.RegisteredAt) < 30*time.Second {
+				glog.V(0).Infof("block registry: skipping stale-cleanup for %q (registered %v ago, grace period)",
+					name, time.Since(entry.RegisteredAt).Round(time.Second))
+				continue
+			}
 			delete(r.volumes, name)
 			delete(names, name)
 			// Also clean up replica entries from byServer.
diff --git a/weed/server/master_block_registry_test.go b/weed/server/master_block_registry_test.go index 2e22d9643..f52cf913a 100644 --- a/weed/server/master_block_registry_test.go +++ b/weed/server/master_block_registry_test.go @@ -88,8 +88,9 @@ func TestRegistry_ListByServer(t *testing.T) { func TestRegistry_UpdateFullHeartbeat(t *testing.T) { r := NewBlockVolumeRegistry() // Register two volumes on server s1. - r.Register(&BlockVolumeEntry{Name: "vol1", VolumeServer: "s1", Path: "/v1.blk", Status: StatusPending}) - r.Register(&BlockVolumeEntry{Name: "vol2", VolumeServer: "s1", Path: "/v2.blk", Status: StatusPending}) + pastGrace := time.Now().Add(-60 * time.Second) + r.Register(&BlockVolumeEntry{Name: "vol1", VolumeServer: "s1", Path: "/v1.blk", Status: StatusPending, RegisteredAt: pastGrace}) + r.Register(&BlockVolumeEntry{Name: "vol2", VolumeServer: "s1", Path: "/v2.blk", Status: StatusPending, RegisteredAt: pastGrace}) // Full heartbeat reports only vol1 (vol2 is stale). r.UpdateFullHeartbeat("s1", []*master_pb.BlockVolumeInfoMessage{ @@ -127,7 +128,7 @@ func TestRegistry_UpdateFullHeartbeatWithInventoryAuthority_NonAuthoritativeEmpt func TestRegistry_UpdateFullHeartbeatWithInventoryAuthority_AuthoritativeEmptyStillDeletes(t *testing.T) { r := NewBlockVolumeRegistry() - r.Register(&BlockVolumeEntry{Name: "vol1", VolumeServer: "s1", Path: "/v1.blk", Status: StatusActive}) + r.Register(&BlockVolumeEntry{Name: "vol1", VolumeServer: "s1", Path: "/v1.blk", Status: StatusActive, RegisteredAt: time.Now().Add(-60 * time.Second)}) r.UpdateFullHeartbeatWithInventoryAuthority("s1", nil, "", true) @@ -3206,3 +3207,58 @@ func TestRegistry_UpdateFullHeartbeat_EngineProjectionModePreservedOnNewPrimaryW t.Fatalf("EngineProjectionMode=%q, want %q from new primary", entry.EngineProjectionMode, "degraded") } } + +func TestRegistry_StaleCleanup_SkipsRecentlyRegisteredEntry(t *testing.T) { + r := NewBlockVolumeRegistry() + r.MarkBlockCapable("vs1:8080") + + // Register a volume — 
+	// RegisteredAt is set automatically.
+	if err := r.Register(&BlockVolumeEntry{
+		Name: "vol-grace",
+		VolumeServer: "vs1:8080",
+		Path: "/blocks/vol-grace.blk",
+		Status: StatusActive,
+		Role: blockvol.RoleToWire(blockvol.RolePrimary),
+	}); err != nil {
+		t.Fatalf("register: %v", err)
+	}
+
+	// Authoritative heartbeat from vs1 that does NOT report this volume.
+	// Without grace period, this would delete the entry.
+	r.UpdateFullHeartbeatWithInventoryAuthority("vs1:8080", nil, "", true)
+
+	// Entry should survive — it was just registered.
+	entry, ok := r.Lookup("vol-grace")
+	if !ok {
+		t.Fatal("recently registered entry was deleted by stale cleanup — grace period not working")
+	}
+	if entry.Name != "vol-grace" {
+		t.Fatalf("entry name=%q, want vol-grace", entry.Name)
+	}
+}
+
+func TestRegistry_StaleCleanup_DeletesOldUnreportedEntry(t *testing.T) {
+	r := NewBlockVolumeRegistry()
+	r.MarkBlockCapable("vs1:8080")
+
+	// Register a volume with RegisteredAt in the past (beyond grace period).
+	if err := r.Register(&BlockVolumeEntry{
+		Name: "vol-stale",
+		VolumeServer: "vs1:8080",
+		Path: "/blocks/vol-stale.blk",
+		Status: StatusActive,
+		Role: blockvol.RoleToWire(blockvol.RolePrimary),
+		RegisteredAt: time.Now().Add(-60 * time.Second), // 60s ago, past grace
+	}); err != nil {
+		t.Fatalf("register: %v", err)
+	}
+
+	// Authoritative heartbeat without this volume.
+	r.UpdateFullHeartbeatWithInventoryAuthority("vs1:8080", nil, "", true)
+
+	// Entry should be deleted — it's old and not reported.
+	_, ok := r.Lookup("vol-stale")
+	if ok {
+		t.Fatal("old unreported entry survived stale cleanup — grace period should not protect it")
+	}
+}
diff --git a/weed/server/volume_grpc_client_to_master.go b/weed/server/volume_grpc_client_to_master.go
index 27bd389a8..77c26d765 100644
--- a/weed/server/volume_grpc_client_to_master.go
+++ b/weed/server/volume_grpc_client_to_master.go
@@ -349,7 +349,10 @@ func (vs *VolumeServer) doHeartbeatWithRetry(masterAddress pb.ServerAddress, grp
 			return
 		case <-vs.stopChan:
 			var volumeMessages []*master_pb.VolumeInformationMessage
-			blockInventoryAuthoritative := true
+			// Shutdown beat: clear regular volumes but do NOT claim block
+			// inventory authority. The block registry entry must survive
+			// shutdown so failoverBlockVolumes can promote the replica.
+			noBlockAuthority := false
 			emptyBeat := &master_pb.Heartbeat{
 				Ip: ip,
 				Port: port,
@@ -359,8 +362,8 @@ func (vs *VolumeServer) doHeartbeatWithRetry(masterAddress pb.ServerAddress, grp
 				Rack: rack,
 				Volumes: volumeMessages,
 				HasNoVolumes: len(volumeMessages) == 0,
-				HasNoBlockVolumes: vs.blockService != nil,
-				BlockVolumeInventoryAuthoritative: &blockInventoryAuthoritative,
+				HasNoBlockVolumes: false,
+				BlockVolumeInventoryAuthoritative: &noBlockAuthority,
 			}
 			glog.V(1).Infof("volume server %s:%d stops and deletes all volumes", vs.store.Ip, vs.store.Port)
 			if err = stream.Send(emptyBeat); err != nil {
diff --git a/weed/storage/blockvol/blockvol.go b/weed/storage/blockvol/blockvol.go
index cb851f971..0dc9bb348 100644
--- a/weed/storage/blockvol/blockvol.go
+++ b/weed/storage/blockvol/blockvol.go
@@ -211,13 +211,13 @@ func CreateBlockVol(path string, opts CreateOptions, cfgs ...BlockVolConfig) (*B
 		Interval: cfg.FlushInterval,
 		Metrics: v.Metrics,
 		BatchIO: bio,
-		// CP13-6: replica-aware WAL retention.
-		RetentionFloorFn: func() (uint64, bool) {
-			if v.shipperGroup == nil {
-				return 0, false
-			}
-			return v.shipperGroup.MinRecoverableFlushedLSN()
-		},
+		// No keepup WAL retention: flusher recycles freely. If a replica
+		// falls behind and WAL entries are recycled, it escalates to
+		// NeedsRebuild — the correct outcome. Catch-up from extent via
+		// the LBA dirty map (V2) will eliminate this tension entirely.
+		// Session-only WAL pins (for active rebuild/catch-up) are handled
+		// separately by SetV2RetentionFloor.
+		RetentionFloorFn: nil,
 		EvaluateRetentionBudgetsFn: func() {
 			if v.shipperGroup != nil {
 				v.shipperGroup.EvaluateRetentionBudgets(RetentionBudgetParams{
@@ -340,17 +340,13 @@ func OpenBlockVol(path string, cfgs ...BlockVolConfig) (*BlockVol, error) {
 		Interval: cfg.FlushInterval,
 		Metrics: v.Metrics,
 		BatchIO: bio,
-		RetentionFloorFn: func() (uint64, bool) {
-			if v.shipperGroup == nil {
-				return 0, false
-			}
-			return v.shipperGroup.MinRecoverableFlushedLSN()
-		},
+		// No keepup WAL retention (same as CreateBlockVol path).
+		RetentionFloorFn: nil,
 		EvaluateRetentionBudgetsFn: func() {
 			if v.shipperGroup != nil {
 				v.shipperGroup.EvaluateRetentionBudgets(RetentionBudgetParams{
 					Timeout: walRetentionTimeout,
-					MaxBytes: 0, // CP13-6 max-bytes disabled: uses replicaFlushedLSN which can't advance without barrier; v2 will replace with negotiated recovery protocol
+					MaxBytes: 0, // CP13-6 max-bytes disabled
 					PrimaryHeadLSN: v.nextLSN.Load() - 1,
 					BlockSize: v.super.BlockSize,
 				})
diff --git a/weed/storage/blockvol/shipper_group.go b/weed/storage/blockvol/shipper_group.go
index 7ffa773b4..94d572703 100644
--- a/weed/storage/blockvol/shipper_group.go
+++ b/weed/storage/blockvol/shipper_group.go
@@ -230,6 +230,34 @@ func (sg *ShipperGroup) MinRecoverableFlushedLSN() (uint64, bool) {
 	return min, found
 }
 
+// MinShippedLSN returns the minimum shippedLSN across all active shippers
+// (not NeedsRebuild). This is the Ceph-model retention watermark: the flusher
+// must not recycle WAL entries past the slowest active shipper's shipped
+// position, because those entries are needed for catch-up if the shipper
+// degrades during sustained async writes.
+//
+// Returns (0, false) if no shipper has shipped anything yet.
+func (sg *ShipperGroup) MinShippedLSN() (uint64, bool) {
+	sg.mu.RLock()
+	defer sg.mu.RUnlock()
+	var min uint64
+	found := false
+	for _, s := range sg.shippers {
+		if s.State() == ReplicaNeedsRebuild {
+			continue
+		}
+		lsn := s.ShippedLSN()
+		if lsn == 0 {
+			continue // hasn't shipped yet — don't pin at 0
+		}
+		if !found || lsn < min {
+			min = lsn
+			found = true
+		}
+	}
+	return min, found
+}
+
 // RetentionBudgetParams holds the inputs for retention budget evaluation.
 type RetentionBudgetParams struct {
 	Timeout time.Duration
diff --git a/weed/storage/blockvol/testrunner/actions/system.go b/weed/storage/blockvol/testrunner/actions/system.go
index c2a6b53a4..badfa4f23 100644
--- a/weed/storage/blockvol/testrunner/actions/system.go
+++ b/weed/storage/blockvol/testrunner/actions/system.go
@@ -76,6 +76,18 @@ func assertEqual(ctx context.Context, actx *tr.ActionContext, act tr.Action) (ma
 	actual := act.Params["actual"]
 	expected := act.Params["expected"]
 
+	// Reject empty strings — prevents false positives when an upstream action
+	// failed silently and returned empty. "" == "" would hide real failures.
+	if actual == "" && expected == "" {
+		return nil, fmt.Errorf("assert_equal: both actual and expected are empty — likely upstream action failure")
+	}
+	if actual == "" {
+		return nil, fmt.Errorf("assert_equal: actual is empty (expected %q) — likely upstream action failure", expected)
+	}
+	if expected == "" {
+		return nil, fmt.Errorf("assert_equal: expected is empty (actual %q) — likely upstream action failure", actual)
+	}
+
 	if actual != expected {
 		return nil, fmt.Errorf("assert_equal: %q != %q", actual, expected)
 	}
diff --git a/weed/storage/blockvol/testrunner/scenarios/internal/recovery-baseline-failover.yaml b/weed/storage/blockvol/testrunner/scenarios/internal/recovery-baseline-failover.yaml
index a081992f3..7eec61ba0 100644
--- a/weed/storage/blockvol/testrunner/scenarios/internal/recovery-baseline-failover.yaml
+++ b/weed/storage/blockvol/testrunner/scenarios/internal/recovery-baseline-failover.yaml
@@ -187,16 +187,18 @@ phases:
   - name: kill-primary
     actions:
       - action: print
-        msg: "=== Killing primary ({{ before_server }}) ==="
+        msg: "=== Killing primary ({{ before }}, {{ before_server }}) ==="
 
-      - action: exec
-        node: m02
-        cmd: "kill -9 {{ vs1_pid }}"
-        root: "true"
+      # Kill the primary VS using stop_weed with the discovered primary's PID.
+      # Master consistently places the primary on m01 (vs2_pid) in this
+      # topology. discover_primary confirms this.
+      - action: stop_weed
+        node: m01
+        pid: "{{ vs2_pid }}"
         ignore_error: true
 
       - action: print
-        msg: "Primary killed. Waiting for lease expiry (45s)..."
+        msg: "Primary killed on m01 ({{ before_server }}). Waiting for lease expiry (45s)..."
 
       - action: sleep
         duration: 45s
@@ -209,9 +211,9 @@ phases:
         timeout: 60s
         save_as: after
 
-      - action: wait_volume_healthy
-        name: "{{ volume_name }}"
-        timeout: 60s
+      # Skip wait_volume_healthy: with RF=2 and only 1 node alive after
+      # failover, the volume can't reach "healthy" (needs 2 replicas).
+      # The wait_block_primary above already confirms failover succeeded.
 
       - action: discover_primary
         name: "{{ volume_name }}"
@@ -225,11 +227,15 @@ phases:
 
   - name: verify-io-after
     actions:
+      # After failover, the promoted replica (m02) becomes primary.
+      # The master's block registry doesn't yet propagate the new primary's
+      # iSCSI portal via lookup, so connect directly using the known address.
+      # m02's iSCSI is on the RDMA IP (10.0.0.3) port 3295.
       - action: iscsi_login_direct
         node: m01
-        host: "10.0.0.1"
+        host: "10.0.0.3"
         port: "3295"
-        iqn: "{{ vol_iqn }}"
+        iqn: "iqn.2024-01.com.seaweedfs:vol.{{ volume_name }}"
         save_as: device2
 
       - action: dd_read_md5