mirror of
https://github.com/seaweedfs/seaweedfs.git
synced 2026-05-22 17:51:30 +00:00
fix: three hardware blockers — WAL retention + registry race + shutdown beat
All 43 actions pass on m01/m02 hardware. Auto-failover PASS.
dd_write: 30s → 123ms. Post-failover write: 33,621 IOPS.

1. WAL retention: remove keepup retention floor (MinShippedLSN).
   WAL cannot be pinned during sustained async writes: any pin strategy
   either fills WAL (blocking writes) or over-recycles (breaking catch-up).
   Flusher recycles freely. Future LBA map will provide catch-up without
   WAL retention. MinShippedLSN on ShipperGroup retained as diagnostic surface.

2. Registry stale-cleanup race: add RegisteredAt grace period.
   Race: master registers volume → next VS heartbeat arrives before VS
   discovers the volume → stale cleanup deletes the entry → failover finds
   0 entries. Fix: skip stale cleanup for entries registered within 30s
   (> 2 heartbeat intervals). 2 new tests: grace protects new entry, old
   entry still cleaned.

3. Shutdown heartbeat: VS disconnect heartbeat no longer claims block
   inventory authority. Previously, the shutdown beat's empty inventory
   triggered stale cleanup, deleting the entry before failover could use it.

Scenario fix: recovery-baseline-failover.yaml now kills the correct node
(discovered primary, not hardcoded), connects to the correct new primary
for post-failover verification.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -1,288 +0,0 @@
|
||||
# Protocol Development Process
|
||||
|
||||
Date: 2026-03-27
|
||||
|
||||
## Purpose
|
||||
|
||||
This document defines how `sw-block` protocol work should be developed.
|
||||
|
||||
The process is meant to work for:
|
||||
|
||||
- V2
|
||||
- future V3
|
||||
- or a later block algorithm that is not WAL-based
|
||||
|
||||
The point is to make protocol work systematic rather than reactive.
|
||||
|
||||
## Core Philosophy
|
||||
|
||||
### 1. Design before implementation
|
||||
|
||||
Do not start with production code and hope the protocol becomes clear later.
|
||||
|
||||
Start with:
|
||||
|
||||
1. system contract
|
||||
2. invariants
|
||||
3. state model
|
||||
4. scenario backlog
|
||||
|
||||
Only then move to implementation.
|
||||
|
||||
### 2. Real failures are inputs, not just bugs
|
||||
|
||||
When V1 or V1.5 fails in real testing, treat that as:
|
||||
|
||||
- a design requirement
|
||||
- a scenario source
|
||||
- a simulator input
|
||||
|
||||
Do not patch and forget.
|
||||
|
||||
### 3. Simulator is part of the protocol, not a side tool
|
||||
|
||||
The simulator exists to answer:
|
||||
|
||||
- what should happen
|
||||
- what must never happen
|
||||
- which old designs fail
|
||||
- why the new design is better
|
||||
|
||||
It is not a replacement for real testing.
|
||||
It is the design-validation layer before production implementation.
|
||||
|
||||
### 4. Passing tests are not enough
|
||||
|
||||
Green tests are necessary, not sufficient.
|
||||
|
||||
We also require:
|
||||
|
||||
- explicit invariants
|
||||
- explicit scenario intent
|
||||
- clear state transitions
|
||||
- review of assumptions and abstraction boundaries
|
||||
|
||||
### 5. Keep hot-path and recovery-path reasoning separate
|
||||
|
||||
Healthy steady-state behavior and degraded recovery behavior are different problems.
|
||||
|
||||
Both must be designed explicitly.
|
||||
|
||||
## Development Ladder
|
||||
|
||||
Every major protocol feature should move through these steps:
|
||||
|
||||
1. **Problem statement**
   - what real bug, limit, or product goal is driving the work

2. **Contract**
   - what the protocol guarantees
   - what it does not guarantee

3. **State model**
   - node state
   - coordinator state
   - recovery state
   - role / epoch / lineage rules

4. **Scenario backlog**
   - named scenarios
   - source:
     - real failure
     - design obligation
     - adversarial distributed case

5. **Prototype / simulator**
   - reduced but explicit model
   - invariant checks
   - V1 / V1.5 / V2 comparison where relevant

6. **Implementation**
   - production code only after the protocol shape is clear enough

7. **Real validation**
   - unit
   - component
   - integration
   - real hardware where needed

8. **Feedback loop**
   - turn new failures back into scenario/design inputs
|
||||
|
||||
## Required Artifacts
|
||||
|
||||
For protocol work to be considered real progress, we usually want:
|
||||
|
||||
### Design
|
||||
|
||||
- design doc
|
||||
- scenario doc
|
||||
- comparison doc when replacing an older approach
|
||||
|
||||
### Prototype
|
||||
|
||||
- simulator or prototype code
|
||||
- tests that assert protocol behavior
|
||||
|
||||
### Implementation
|
||||
|
||||
- production patch
|
||||
- production tests
|
||||
- docs updated to match the actual algorithm
|
||||
|
||||
### Review
|
||||
|
||||
- implementation gate
|
||||
- design/protocol gate
|
||||
|
||||
## Two-Gate Rule
|
||||
|
||||
We use two acceptance gates.
|
||||
|
||||
### Gate 1: implementation
|
||||
|
||||
Owned by the coding side.
|
||||
|
||||
Questions:
|
||||
|
||||
- does it build?
|
||||
- do tests pass?
|
||||
- does it behave as intended in code?
|
||||
|
||||
### Gate 2: protocol/design
|
||||
|
||||
Owned by the design/review side.
|
||||
|
||||
Questions:
|
||||
|
||||
- is the logic actually sound?
|
||||
- do tests prove the intended thing?
|
||||
- are assumptions explicit?
|
||||
- is the abstraction boundary honest?
|
||||
|
||||
A task is not accepted until both gates pass.
|
||||
|
||||
## Layering Rule
|
||||
|
||||
Keep simulation layers separate.
|
||||
|
||||
### `distsim`
|
||||
|
||||
Use for:
|
||||
|
||||
- protocol correctness
|
||||
- state transitions
|
||||
- fencing
|
||||
- recoverability
|
||||
- promotion / lineage
|
||||
- reference-state checking
|
||||
|
||||
### `eventsim`
|
||||
|
||||
Use for:
|
||||
|
||||
- timeout behavior
|
||||
- timer races
|
||||
- event ordering
|
||||
- same-tick / delayed event interactions
|
||||
|
||||
Do not duplicate scenarios blindly across both layers.
|
||||
|
||||
## Test Selection Rule
|
||||
|
||||
Do not choose simulator inputs only from failing tests.
|
||||
|
||||
Review all relevant tests and classify them by:
|
||||
|
||||
- protocol significance
|
||||
- simulator value
|
||||
- implementation specificity
|
||||
|
||||
Good simulator candidates often come from:
|
||||
|
||||
- barrier truth
|
||||
- catch-up vs rebuild
|
||||
- stale message rejection
|
||||
- failover / promotion safety
|
||||
- changed-address restart
|
||||
- mode semantics
|
||||
|
||||
Keep real-only tests for:
|
||||
|
||||
- wire format
|
||||
- OS timing
|
||||
- exact WAL file behavior
|
||||
- frontend transport specifics
|
||||
|
||||
## Version Comparison Rule
|
||||
|
||||
When designing a successor protocol:
|
||||
|
||||
- keep the old version visible
|
||||
- reproduce the old failure or limitation
|
||||
- show the improved behavior in the new version
|
||||
|
||||
For `sw-block`, that means:
|
||||
|
||||
- `V1`
|
||||
- `V1.5`
|
||||
- `V2`
|
||||
|
||||
should be compared explicitly where possible.
|
||||
|
||||
## Documentation Rule
|
||||
|
||||
The docs must track three different things:
|
||||
|
||||
### `learn/projects/sw-block/`
|
||||
|
||||
Use for:
|
||||
|
||||
- project history
|
||||
- V1/V1.5 algorithm records
|
||||
- phase records
|
||||
- real test history
|
||||
|
||||
### `sw-block/design/`
|
||||
|
||||
Use for:
|
||||
|
||||
- active design truth
|
||||
- V2 and later protocol docs
|
||||
- scenario backlog
|
||||
- comparison docs
|
||||
|
||||
### `sw-block/.private/phase/`
|
||||
|
||||
Use for:
|
||||
|
||||
- active execution plan
|
||||
- log
|
||||
- decisions
|
||||
|
||||
## What Good Progress Looks Like
|
||||
|
||||
A good protocol iteration usually has this pattern:
|
||||
|
||||
1. real failure or design pressure identified
|
||||
2. scenario named and written down
|
||||
3. simulator reproduces the bad case
|
||||
4. new protocol handles it explicitly
|
||||
5. implementation follows
|
||||
6. real tests validate it
|
||||
|
||||
If one of those steps is missing, confidence is weaker.
|
||||
|
||||
## Bottom Line
|
||||
|
||||
The process is:
|
||||
|
||||
1. design the contract
|
||||
2. model the state
|
||||
3. define the scenarios
|
||||
4. simulate the protocol
|
||||
5. implement carefully
|
||||
6. validate in real tests
|
||||
7. feed failures back into design
|
||||
|
||||
That is the process we should keep using for V2 and any later protocol line.
|
||||
@@ -1,314 +0,0 @@
|
||||
# V1, V1.5, and V2 Comparison
|
||||
|
||||
Date: 2026-03-27
|
||||
|
||||
## Purpose
|
||||
|
||||
This document compares:
|
||||
|
||||
- `V1`: original replicated WAL shipping model
|
||||
- `V1.5`: Phase 13 catch-up-first improvements on top of V1
|
||||
- `V2`: explicit FSM / orchestrator / recoverability-driven design under `sw-block/`
|
||||
|
||||
It is a design comparison, not a marketing document.
|
||||
|
||||
## 1. One-line summary
|
||||
|
||||
- `V1` is simple but weak on short-gap recovery.
|
||||
- `V1.5` materially improves recovery, but still relies on assumptions and incremental control-plane fixes.
|
||||
- `V2` is structurally cleaner, more explicit, and easier to validate, but is not yet a production engine.
|
||||
|
||||
## 2. Steady-State Hot Path
|
||||
|
||||
In the healthy case, all three versions can look similar:
|
||||
|
||||
1. primary appends ordered WAL
|
||||
2. primary ships entries to replicas
|
||||
3. replicas apply in order
|
||||
4. durability barrier determines when client-visible commit completes
|
||||
|
||||
### V1
|
||||
|
||||
- simplest replication path
|
||||
- lagging replica typically degrades quickly
|
||||
- little explicit recovery structure
|
||||
|
||||
### V1.5
|
||||
|
||||
- same basic hot path as V1
|
||||
- WAL retention and reconnect/catch-up improve short outage handling
|
||||
- extra logic exists, but much of it is off the hot path
|
||||
|
||||
### V2
|
||||
|
||||
- can keep a similar hot path if implemented carefully
- extra complexity is mainly in:
  - recovery planner
  - replica state machine
  - coordinator/orchestrator
  - recoverability checks
|
||||
|
||||
### Performance expectation
|
||||
|
||||
In a normal healthy cluster:
|
||||
|
||||
- `V2` should not be much heavier than `V1.5`
|
||||
- most V2 complexity sits in failure/recovery/control paths
|
||||
- there is no proof yet that V2 has better steady-state throughput or latency
|
||||
|
||||
## 3. Recovery Behavior
|
||||
|
||||
### V1
|
||||
|
||||
Recovery is weakly structured:
|
||||
|
||||
- lagging replica tends to degrade
|
||||
- short outage often becomes rebuild or long degraded state
|
||||
- little explicit catch-up boundary
|
||||
|
||||
### V1.5
|
||||
|
||||
Recovery is improved:
|
||||
|
||||
- short outage can recover by retained-WAL catch-up
|
||||
- background reconnect closes the `sync_all` dead-loop
|
||||
- catch-up-first is preferred before rebuild
|
||||
|
||||
But the model is still partly implicit:
|
||||
|
||||
- reconnect depends on endpoint stability unless control plane refreshes assignment
|
||||
- recoverability boundary is not as explicit as V2
|
||||
- tail-chasing and retention pressure still need policy care
|
||||
|
||||
### V2
|
||||
|
||||
Recovery is explicit by design:
|
||||
|
||||
- `InSync`
|
||||
- `Lagging`
|
||||
- `CatchingUp`
|
||||
- `NeedsRebuild`
|
||||
- `Rebuilding`
|
||||
|
||||
And explicit decisions exist for:
|
||||
|
||||
- catch-up vs rebuild
|
||||
- stale-epoch rejection
|
||||
- promotion candidate choice
|
||||
- recoverable vs unrecoverable gap
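
The state list and the catch-up vs rebuild decision can be sketched in Go as follows. This is an illustrative model only, not the V2 engine: the state names come from the list above, while `planRecovery` and its three LSN parameters are assumptions introduced for the example.

```go
package main

import "fmt"

// ReplicaState lists the recovery states named in this section.
type ReplicaState int

const (
	InSync ReplicaState = iota
	Lagging
	CatchingUp
	NeedsRebuild
	Rebuilding
)

var stateNames = [...]string{"InSync", "Lagging", "CatchingUp", "NeedsRebuild", "Rebuilding"}

func (s ReplicaState) String() string { return stateNames[s] }

// planRecovery is a hypothetical decision helper.
//   replicaLSN:   last LSN the replica has applied
//   retainedFrom: oldest LSN the primary can still supply
//   headLSN:      newest LSN on the primary
func planRecovery(replicaLSN, retainedFrom, headLSN uint64) ReplicaState {
	switch {
	case replicaLSN >= headLSN:
		return InSync
	case retainedFrom <= replicaLSN+1:
		// Every missing entry (replicaLSN+1 .. headLSN) is still available:
		// the gap is recoverable.
		return CatchingUp
	default:
		// Part of the gap has been recycled: the gap is unrecoverable.
		return NeedsRebuild
	}
}

func main() {
	fmt.Println(planRecovery(100, 50, 100)) // InSync
	fmt.Println(planRecovery(90, 50, 100))  // CatchingUp
	fmt.Println(planRecovery(10, 50, 100))  // NeedsRebuild
}
```

The point of the sketch is that the recoverable/unrecoverable boundary is a single explicit comparison rather than an emergent property of reconnect behavior.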
|
||||
|
||||
## 4. Real V1.5 Lessons
|
||||
|
||||
The main V2 requirements come from real V1.5 behavior.
|
||||
|
||||
### 4.1 Changed-address restart
|
||||
|
||||
Observed in `CP13-8 T4b`:
|
||||
|
||||
- replica restarted
|
||||
- endpoint changed
|
||||
- primary shipper held stale address
|
||||
- direct reconnect could not succeed until control plane refreshed assignment
|
||||
|
||||
V1.5 fix:
|
||||
|
||||
- saved address used only as hint
|
||||
- heartbeat-reported address becomes source of truth
|
||||
- master refreshes primary assignment
|
||||
|
||||
Lesson for V2:
|
||||
|
||||
- endpoint is not identity
|
||||
- reassignment must be explicit
|
||||
|
||||
### 4.2 Reconnect race
|
||||
|
||||
Observed in Phase 13 review:
|
||||
|
||||
- barrier path and background reconnect path could both trigger reconnect
|
||||
|
||||
V1.5 fix:
|
||||
|
||||
- `reconnectMu` serializes reconnect / catch-up
|
||||
|
||||
Lesson for V2:
|
||||
|
||||
- one active recovery session per replica should be a protocol rule, not just a local mutex trick
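
One way to read this lesson is that a recovery session becomes an explicit, identified object rather than a side effect of holding a lock. The Go sketch below is a reduced model under that assumption: `sessionTable`, `Begin`, and `End` are hypothetical names, and the mutex here only protects the table, while the rule itself is expressed through session identity.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrSessionActive is returned when a replica already has a recovery session.
var ErrSessionActive = errors.New("recovery session already active for replica")

// sessionTable tracks at most one recovery session per replica ID.
type sessionTable struct {
	mu     sync.Mutex
	nextID uint64
	active map[string]uint64 // replica ID -> session ID
}

func newSessionTable() *sessionTable {
	return &sessionTable{active: make(map[string]uint64)}
}

// Begin opens a session, or fails if one is already active for the replica.
func (t *sessionTable) Begin(replicaID string) (uint64, error) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if _, ok := t.active[replicaID]; ok {
		return 0, ErrSessionActive
	}
	t.nextID++
	t.active[replicaID] = t.nextID
	return t.nextID, nil
}

// End closes a session only when the caller presents the current session ID,
// so a stale path cannot end a session it does not own.
func (t *sessionTable) End(replicaID string, id uint64) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	cur, ok := t.active[replicaID]
	if !ok || cur != id {
		return false
	}
	delete(t.active, replicaID)
	return true
}

func main() {
	table := newSessionTable()
	id, _ := table.Begin("vol1/vs-a")  // e.g. the barrier path starts recovery
	_, err := table.Begin("vol1/vs-a") // e.g. background reconnect is rejected
	fmt.Println(id, err)
	fmt.Println(table.End("vol1/vs-a", id))
}
```

In this model the second caller gets an explicit rejection it can log or act on, instead of silently blocking on a lock.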
|
||||
|
||||
### 4.3 Tail-chasing
|
||||
|
||||
Even with retained WAL:
|
||||
|
||||
- primary may write faster than a lagging replica can recover
|
||||
- catch-up may not converge
|
||||
|
||||
Lesson for V2:
|
||||
|
||||
- explicit abort / `NeedsRebuild`
|
||||
- do not pretend catch-up will always work
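
A simple way to make the abort explicit is to watch whether the replication gap is actually shrinking. The sketch below is one possible policy, not the V2 rule: `catchUpMonitor`, `maxStalls`, and the "gap did not shrink" criterion are assumptions chosen for illustration.

```go
package main

import "fmt"

// catchUpMonitor counts consecutive observations in which the gap
// (primary head LSN minus replica LSN) failed to shrink.
type catchUpMonitor struct {
	maxStalls int
	stalls    int
	lastGap   uint64
	seen      bool
}

// Observe records one gap sample and reports whether catch-up should be
// aborted in favor of NeedsRebuild.
func (m *catchUpMonitor) Observe(gap uint64) (abort bool) {
	if m.seen && gap >= m.lastGap {
		m.stalls++ // the primary is writing at least as fast as we recover
	} else {
		m.stalls = 0 // progress was made (or this is the first sample)
	}
	m.seen = true
	m.lastGap = gap
	return m.stalls >= m.maxStalls
}

func main() {
	m := &catchUpMonitor{maxStalls: 3}
	for _, gap := range []uint64{1000, 900, 950, 990, 1200} {
		if m.Observe(gap) {
			fmt.Println("abort catch-up at gap", gap, "-> NeedsRebuild")
			return
		}
	}
	fmt.Println("catch-up still converging")
}
```

A real policy would likely also bound total catch-up time and retention pressure; the sketch only shows that non-convergence can be a detected condition rather than an assumption.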
|
||||
|
||||
### 4.4 Control-plane recovery latency
|
||||
|
||||
V1.5 can be correct but still operationally slow if recovery waits on slower management cycles.
|
||||
|
||||
Lesson for V2:
|
||||
|
||||
- keep authority in coordinator
|
||||
- but make recovery decisions explicit and fast when possible
|
||||
|
||||
## 5. V2 Structural Improvements
|
||||
|
||||
V2 is better primarily because it is easier to reason about and validate.
|
||||
|
||||
### 5.1 Better state model
|
||||
|
||||
Instead of implicit recovery behavior, V2 has:
|
||||
|
||||
- per-replica FSM
|
||||
- volume/orchestrator model
|
||||
- distributed simulator with scenario coverage
|
||||
|
||||
### 5.2 Better validation
|
||||
|
||||
V2 has:
|
||||
|
||||
- named scenario backlog
|
||||
- protocol-state assertions
|
||||
- randomized simulation
|
||||
- V1/V1.5/V2 comparison tests
|
||||
|
||||
This is a major difference from V1/V1.5, where many fixes were discovered through implementation and hardware testing first.
|
||||
|
||||
### 5.3 Better correctness boundaries
|
||||
|
||||
V2 makes these explicit:
|
||||
|
||||
- recoverable gap vs rebuild
|
||||
- stale traffic rejection
|
||||
- promotion lineage safety
|
||||
- reservation or payload availability transitions
|
||||
|
||||
## 6. Stability Comparison
|
||||
|
||||
### Current judgment
|
||||
|
||||
- `V1`: least stable under failure/recovery stress
|
||||
- `V1.5`: meaningfully better and now functionally validated on real tests
|
||||
- `V2`: best protocol structure and best simulator confidence
|
||||
|
||||
### Important limit
|
||||
|
||||
`V2` is not yet proven more stable in production because:
|
||||
|
||||
- it is not a production engine yet
|
||||
- confidence comes from simulator/design work, not real block workload deployment
|
||||
|
||||
So the accurate statement is:
|
||||
|
||||
- `V2` is more stable **architecturally**
|
||||
- `V1.5` is more stable **operationally today** because it is implemented and tested on real hardware
|
||||
|
||||
## 7. Performance Comparison
|
||||
|
||||
### What is likely true
|
||||
|
||||
`V2` should perform better than rebuild-heavy recovery approaches when:
|
||||
|
||||
- outage is short
|
||||
- gap is recoverable
|
||||
- catch-up avoids full rebuild
|
||||
|
||||
It should also behave better under:
|
||||
|
||||
- flapping replicas
|
||||
- stale delayed messages
|
||||
- mixed-state replica sets
|
||||
|
||||
### What is not yet proven
|
||||
|
||||
We do not yet know whether `V2` has:
|
||||
|
||||
- better steady-state throughput
|
||||
- lower p99 latency
|
||||
- lower CPU overhead
|
||||
- lower memory overhead
|
||||
|
||||
than `V1.5`.
|
||||
|
||||
That requires real implementation and benchmarking.
|
||||
|
||||
## 8. Smart WAL Fit
|
||||
|
||||
### Why Smart WAL is awkward in V1/V1.5
|
||||
|
||||
V1/V1.5 do not naturally model:
|
||||
|
||||
- payload classes
|
||||
- recoverability reservations
|
||||
- historical payload resolution
|
||||
- explicit recoverable/unrecoverable transition
|
||||
|
||||
So Smart WAL would be harder to add cleanly there.
|
||||
|
||||
### Why Smart WAL fits V2 better
|
||||
|
||||
V2 already has the right conceptual slots:
|
||||
|
||||
- `RecoveryClass`
|
||||
- `WALInline`
|
||||
- `ExtentReferenced`
|
||||
- recoverability planner
|
||||
- catch-up vs rebuild decision point
|
||||
- simulator for payload-availability transitions
|
||||
|
||||
### Important rule
|
||||
|
||||
Smart WAL must not mean:
|
||||
|
||||
- “read current extent for old LSN”
|
||||
|
||||
That is incorrect.
|
||||
|
||||
Historical correctness requires:
|
||||
|
||||
- WAL inline payload
|
||||
- or pinned snapshot/versioned extent state
|
||||
- not current live extent contents
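
The rule can be made concrete with a small Go sketch of payload resolution. It assumes, for illustration only, that `WALInline` and `ExtentReferenced` are values of `RecoveryClass`; the `record` type, the `pinned` map, and `resolvePayload` are hypothetical and do not describe the actual V2 types.

```go
package main

import (
	"errors"
	"fmt"
)

// RecoveryClass says where the historical payload for an LSN lives.
type RecoveryClass int

const (
	WALInline        RecoveryClass = iota // payload stored in the WAL record
	ExtentReferenced                      // payload lives in a pinned extent version
)

// ErrUnrecoverable means the historical payload is no longer available.
var ErrUnrecoverable = errors.New("historical payload unavailable: rebuild required")

type record struct {
	LSN    uint64
	Class  RecoveryClass
	Inline []byte
}

// resolvePayload returns the bytes that were written at r.LSN.
// pinned maps an LSN to the extent contents pinned for that LSN.
// There is deliberately no fallback to the current live extent.
func resolvePayload(r record, pinned map[uint64][]byte) ([]byte, error) {
	switch r.Class {
	case WALInline:
		return r.Inline, nil
	case ExtentReferenced:
		if p, ok := pinned[r.LSN]; ok {
			return p, nil
		}
	}
	return nil, ErrUnrecoverable
}

func main() {
	pinned := map[uint64][]byte{7: []byte("old-extent-bytes")}
	for _, r := range []record{
		{LSN: 5, Class: WALInline, Inline: []byte("inline")},
		{LSN: 7, Class: ExtentReferenced},
		{LSN: 9, Class: ExtentReferenced}, // pin was released
	} {
		p, err := resolvePayload(r, pinned)
		fmt.Printf("lsn=%d payload=%q err=%v\n", r.LSN, p, err)
	}
}
```

When the pin is gone, the helper returns an error instead of reading whatever the extent holds now, which is the recoverable-to-unrecoverable transition the planner has to see.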
|
||||
|
||||
## 9. What Is Proven Today
|
||||
|
||||
### Proven
|
||||
|
||||
- `V1.5` significantly improves V1 recovery behavior
- real `CP13-8` testing validated the V1.5 data path and `sync_all` behavior
- the V2 simulator covers:
  - stale traffic rejection
  - tail-chasing
  - flapping replicas
  - multi-promotion lineage
  - changed-address restart comparison
  - same-address transient outage comparison
  - Smart WAL availability transitions
|
||||
|
||||
### Not yet proven
|
||||
|
||||
- V2 production implementation quality
|
||||
- V2 steady-state performance advantage
|
||||
- V2 real hardware recovery performance
|
||||
|
||||
## 10. Bottom Line
|
||||
|
||||
If choosing based on current evidence:
|
||||
|
||||
- use `V1.5` as the production line today
|
||||
- use `V2` as the better long-term architecture
|
||||
|
||||
If choosing based on protocol quality:
|
||||
|
||||
- `V2` is clearly better structured
|
||||
- `V1.5` is still more ad hoc, even after successful fixes
|
||||
|
||||
If choosing based on current real-world proof:
|
||||
|
||||
- `V1.5` has the stronger operational evidence today
|
||||
- `V2` has the stronger design and simulation evidence today
|
||||
@@ -1,108 +0,0 @@
|
||||
# V2 Assignment Translation Unification
|
||||
|
||||
Date: 2026-04-04
|
||||
Status: active
|
||||
|
||||
## Purpose
|
||||
|
||||
This note defines how assignment translation should be unified so that:
|
||||
|
||||
1. `weed/storage/blockvol/v2bridge/control.go`
|
||||
2. `weed/server/volume_server_block.go`
|
||||
|
||||
do not drift on identity, role, or recovery-target meaning.
|
||||
|
||||
## Current Drift Risk
|
||||
|
||||
Today there are two live translation sites:
|
||||
|
||||
1. `ControlBridge.ConvertAssignment()` produces `engine.AssignmentIntent`
|
||||
2. `BlockService.coreAssignmentEvent()` produces `engine.AssignmentDelivered`
|
||||
|
||||
They do not translate the same source type, but they do share semantic rules:
|
||||
|
||||
1. how to build stable `ReplicaID`
|
||||
2. how to map role-shaped inputs to recovery target
|
||||
3. how to represent one local replica endpoint in engine types
|
||||
|
||||
If those rules stay duplicated, later migration batches will reintroduce split
truth.
|
||||
|
||||
## Canonical Rule Placement
|
||||
|
||||
The canonical reusable rules belong in:
|
||||
|
||||
1. `sw-block/bridge/blockvol`
|
||||
|
||||
Why:
|
||||
|
||||
1. the rules are semantic translation, not product integration
2. both `weed/storage/blockvol/v2bridge` and `weed/server` can import this
   package
3. `sw-block` remains weed-free
|
||||
|
||||
## Rules That Must Be Canonical
|
||||
|
||||
### 1. Stable identity
|
||||
|
||||
Canonical helper:
|
||||
|
||||
1. `MakeReplicaID()`
|
||||
2. `ReplicaAssignmentForServer()`
|
||||
|
||||
Rule:
|
||||
|
||||
1. `ReplicaID = <volume>/<server-id>`
|
||||
2. never derive identity from transport address
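
The identity rule can be shown with a short Go sketch. The real `MakeReplicaID()` signature is not given in this note, so the lowercase `makeReplicaID` below is a hypothetical stand-in, and the `endpoint` type and addresses are invented for the example.

```go
package main

import "fmt"

// makeReplicaID is a hypothetical stand-in for the canonical helper.
// It builds the ID from volume and server ID only, never from an address.
func makeReplicaID(volume, serverID string) string {
	return volume + "/" + serverID
}

// endpoint is an illustrative view of one replica's transport details.
type endpoint struct {
	ServerID string
	DataAddr string
}

func main() {
	// The same server restarts and comes back on a different address.
	before := endpoint{ServerID: "vs-2", DataAddr: "10.0.0.7:4410"}
	after := endpoint{ServerID: "vs-2", DataAddr: "10.0.0.9:4410"}

	idBefore := makeReplicaID("vol-42", before.ServerID)
	idAfter := makeReplicaID("vol-42", after.ServerID)
	fmt.Println(idBefore, idBefore == idAfter) // vol-42/vs-2 true
}
```

Because the address never enters the ID, a changed-address restart updates the endpoint of an existing replica instead of creating a new identity.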
|
||||
|
||||
### 2. Recovery-target mapping
|
||||
|
||||
Canonical helper:
|
||||
|
||||
1. `RecoveryTargetForRole()`
|
||||
|
||||
Rule:
|
||||
|
||||
1. `replica -> catchup`
|
||||
2. `rebuilding -> rebuild`
|
||||
3. all other roles -> no recovery target
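
As a sketch, the mapping is a three-way switch. The real `RecoveryTargetForRole()` signature and its role and target types are not shown in this note, so the string-based `recoveryTargetForRole` and the `RecoveryTarget` constants below are assumptions made for illustration.

```go
package main

import "fmt"

// RecoveryTarget is an illustrative type; the empty value means
// "no recovery target".
type RecoveryTarget string

const (
	TargetNone    RecoveryTarget = ""
	TargetCatchUp RecoveryTarget = "catchup"
	TargetRebuild RecoveryTarget = "rebuild"
)

// recoveryTargetForRole is a hypothetical stand-in for the canonical helper.
func recoveryTargetForRole(role string) RecoveryTarget {
	switch role {
	case "replica":
		return TargetCatchUp
	case "rebuilding":
		return TargetRebuild
	default:
		return TargetNone // primary and any other role
	}
}

func main() {
	for _, role := range []string{"primary", "replica", "rebuilding"} {
		fmt.Printf("%s -> %q\n", role, recoveryTargetForRole(role))
	}
}
```

Keeping this in one function is what prevents the two translation sites from disagreeing about what a role implies.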
|
||||
|
||||
### 3. Endpoint packaging
|
||||
|
||||
Canonical helper:
|
||||
|
||||
1. `ReplicaAssignmentForServer()`
|
||||
|
||||
Rule:
|
||||
|
||||
1. adapter code may still source endpoint fields from different wire/runtime
   inputs
2. but once packaged into `engine.ReplicaAssignment`, the shape must be uniform
|
||||
|
||||
## What Still Stays Local
|
||||
|
||||
These parts remain adapter-local and should NOT be forced into one helper yet:
|
||||
|
||||
1. reading `blockvol.BlockVolumeAssignment`
2. deciding whether the local VS is primary/replica/rebuilding in a given
   runtime context
3. multi-replica traversal over heartbeat/master wire structures
|
||||
|
||||
Those are source-format adaptation concerns, not canonical translation rules.
|
||||
|
||||
## Implemented First Step
|
||||
|
||||
The first unification step is already applied:
|
||||
|
||||
1. `sw-block/bridge/blockvol/control_adapter.go`
   - now exports the canonical helpers
2. `weed/server/volume_server_block.go`
   - now consumes the same helper layer for local assignment rebinding
|
||||
|
||||
## Next Step
|
||||
|
||||
The next step after this document is:
|
||||
|
||||
1. reduce `weed/storage/blockvol/v2bridge/control.go` to source-format
   extraction only
2. keep all shared semantic mapping rules in `sw-block/bridge/blockvol`
|
||||
@@ -1,147 +0,0 @@
|
||||
# V2 Bounded Internal Pilot Pack
|
||||
|
||||
Date: 2026-04-05
|
||||
Status: draft
|
||||
Purpose: define the bounded internal engineering validation pack around the
current `Phase 18` RF2 runtime-bearing envelope without silently broadening scope
|
||||
|
||||
## Reading Rule
|
||||
|
||||
This pilot pack is a bounded validation package for the current RF2
runtime-bearing envelope.
|
||||
|
||||
It does NOT mean:
|
||||
|
||||
1. broad launch approval
|
||||
2. generic production readiness
|
||||
3. proof that the current runtime path is already a working block product
|
||||
4. permission to redefine exclusions through pilot success
|
||||
|
||||
It means only:
|
||||
|
||||
1. the team may run limited internal engineering validation inside the accepted
   runtime-bearing envelope
2. validation outcomes must be read against the delivered `Phase 18` boundary
3. incidents must be routed explicitly instead of becoming vague rollout lore
|
||||
|
||||
## Pilot Scope
|
||||
|
||||
This pack is limited to the current `Phase 18` runtime-bearing envelope:
|
||||
|
||||
1. kernel/runtime path:
   - `masterv2` identity authority
   - `volumev2` runtime-owned failover / Loop 2 / continuity / RF2 surface path
   - `purev2` execution adapter reuse
2. validation shape:
   - bounded in-process runtime exercises
   - artifact-driven review only
3. supported proof shape:
   - failover-time evidence seam
   - active Loop 2 observation
   - continuity handoff statement
   - compressed RF2 outward surface
4. excluded surface classes:
   - real product frontends
   - broad operator APIs
   - real transport-backed product traffic
|
||||
|
||||
Anything outside that scope is not a finding for this pack.
|
||||
It is either a known exclusion, an explicit blocker, or later widening work.
|
||||
|
||||
## Pilot Environment And Topology
|
||||
|
||||
The validation environment must stay fixed and reviewable:
|
||||
|
||||
1. use one explicit build/commit package for all pilot nodes
2. keep topology inside the bounded `RF=2` runtime-bearing path and do not
   introduce `RF>2`
3. do not introduce real frontend/product traffic or broad transport claims
4. pin operator-facing configuration and startup procedure in a written runbook
5. expose the runtime diagnosis surfaces needed to read:
   - failover snapshot/result
   - Loop 2 snapshot
   - continuity snapshot
   - RF2 outward surface
|
||||
|
||||
If the validation needs ad hoc operator judgment to stay healthy, the pack is not
ready.
|
||||
|
||||
## Success Criteria
|
||||
|
||||
The bounded validation is considered successful only if ALL of the following
hold:
|
||||
|
||||
1. no observed behavior contradicts the bounded `Phase 18` runtime envelope
2. no observed behavior contradicts the bounded fail-closed reading of the new
   runtime path
3. incidents can be classified using the explicit buckets in this pack without
   inventing new ambiguous categories
4. operators can execute preflight, bounded validation, and diagnosis from
   written artifacts rather than tribal knowledge
5. findings do not require silently widening the supported envelope
6. the review outcome remains consistent with the current
   `block expansion / not pilot-ready` judgment unless new closure explicitly
   changes it
|
||||
|
||||
Validation success validates the current bounded envelope only.
|
||||
It does not create a broader product claim by itself.
|
||||
|
||||
## Incident Intake And Classification
|
||||
|
||||
Every incident must record:
|
||||
|
||||
1. time, node set, workload, and surface involved
|
||||
2. observed symptom
|
||||
3. affected bounded claim, exclusion, or blocker
|
||||
4. diagnosis evidence used
|
||||
5. immediate operator action taken
|
||||
6. final classification
|
||||
|
||||
Allowed classification buckets:
|
||||
|
||||
1. `config / environment issue`
   - the product behaved inside the bounded claim, but the deployment violated the
     pilot preflight or environment assumptions
2. `known exclusion`
   - the incident came from a surface or claim already excluded from the first
     launch matrix
3. `true product bug`
   - the incident contradicts an accepted bounded claim or reveals a real gap
     inside the named chosen envelope
|
||||
|
||||
If an incident does not fit one of those buckets, stop the validation and refine
the artifact set before continuing.
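
The closed set of buckets, together with the stop rule, can be expressed as a small validation helper. This Go sketch is only an illustration of the rule: the `Bucket` type, `classify`, and `ErrStopValidation` are hypothetical names, and the bucket strings are copied from the list above.

```go
package main

import (
	"errors"
	"fmt"
)

// Bucket is one of the three allowed incident classifications.
type Bucket string

const (
	ConfigEnvironment Bucket = "config / environment issue"
	KnownExclusion    Bucket = "known exclusion"
	TrueProductBug    Bucket = "true product bug"
)

// ErrStopValidation signals that an incident fits no allowed bucket.
var ErrStopValidation = errors.New("incident fits no bucket: stop validation and refine the artifact set")

// classify accepts only the three allowed labels; anything else is an
// explicit stop condition rather than a new ad hoc category.
func classify(label string) (Bucket, error) {
	switch b := Bucket(label); b {
	case ConfigEnvironment, KnownExclusion, TrueProductBug:
		return b, nil
	default:
		return "", ErrStopValidation
	}
}

func main() {
	for _, label := range []string{"known exclusion", "flaky, probably fine"} {
		b, err := classify(label)
		fmt.Printf("%q -> bucket=%q err=%v\n", label, b, err)
	}
}
```

Rejecting unknown labels at intake is what keeps ambiguous categories from accumulating in the incident record.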
|
||||
|
||||
## Decision Outputs
|
||||
|
||||
At the end of a bounded validation window, the allowed outcomes are:
|
||||
|
||||
1. `stay in bounded validation`
   - more evidence is needed inside the same envelope
2. `widen bounded engineering exposure`
   - the review may expand only internal engineering validation inside the same
     envelope
3. `block expansion`
   - a contradiction, repeated unresolved bug, or operational ambiguity prevents
     widening
|
||||
|
||||
These outcomes require the bounded envelope review artifact.
|
||||
This pack does not replace that review.
|
||||
|
||||
## Explicit Non-Claims
|
||||
|
||||
This pack does NOT claim:
|
||||
|
||||
1. generic production proof from limited validation success
|
||||
2. support for `RF>2`
|
||||
3. support for a broad transport/frontend matrix
|
||||
4. broad automatic failover guarantees
|
||||
5. hours/days soak proof outside the bounded runtime-bearing reading
|
||||
|
||||
## Primary Inputs
|
||||
|
||||
1. `sw-block/design/v2-rf2-runtime-bounded-envelope.md`
|
||||
2. `sw-block/design/v2-rf2-runtime-bounded-envelope-review.md`
|
||||
3. `sw-block/.private/phase/phase-18.md`
|
||||
4. `sw-block/design/v2-protocol-claim-and-evidence.md`
|
||||
5. `sw-block/design/v2-product-completion-overview.md`
|
||||
@@ -405,6 +405,31 @@ Primary proof tiers:
|
||||
| 7 | Product surfaces | CSI and frontend projection of V2 truth | integrated |
|
||||
| 8 | Launch envelope | bounded support and rollout claims | integrated + soak |
|
||||
|
||||
## Matrix Linkage
|
||||
|
||||
Use the three active documents in a fixed order:
|
||||
|
||||
1. protocol docs define the rule
|
||||
2. this capability map defines which product tier owns the rule
|
||||
3. `v2-validation-matrix.md` defines what must be proven for closure
|
||||
4. `v2-integration-matrix.md` defines which real scenarios exercise the path
|
||||
|
||||
The goal is to make the chain explicit:
|
||||
|
||||
`protocol -> capability tier -> validation rows -> integration rows`
|
||||
|
||||
| Tier | Primary protocol refs | Validation rows | Integration rows | Practical meaning |
|
||||
|------|------------------------|-----------------|------------------|-------------------|
|
||||
| 0 | `v2-protocol-truths.md`, `v2-sync-recovery-protocol.md` | `V4`, `V5`, `V14` | feeds `I-V1` through `I-V6` | pure semantic truth and fail-closed rules |
|
||||
| 1 | `v2-protocol-truths.md` | `V1` | `I-V1` | single-volume and bootstrap correctness |
|
||||
| 2 | `v2-sync-recovery-protocol.md` | `V1`, `V2`, `V4` | `I-V1`, `I-V2` | RF=2 replication base and barrier/publication closure |
|
||||
| 3 | `v2-sync-recovery-protocol.md`, `v2-rebuild-mvp-session-protocol.md` | `R1`-`R12`, `V3`, `V6`, `V7`, `V8`, `V11` | `I-R1`-`I-R8`, `I-V3`, `I-V4`, `I-V5` | recovery, rebuild, failover, and rejoin |
|
||||
| 4 | `v2-sync-recovery-protocol.md` | `V9`, `V10` | future `RF>=3` integrated rows | aggregate multi-replica projection and durability semantics |
|
||||
| 5 | `v2-rebuild-mvp-session-protocol.md`, snapshot/restore execution docs | `S1`-`S10` | `I-S1`-`I-S4` | snapshot, restore, and lifecycle operations |
|
||||
| 6 | `v2-automata-ownership-map.md`, `v2-protocol-claim-and-evidence.md` | `V8`, `V12`, `V13` | `I-V4`, `I-V6` | control-plane truth, observability, and operator surfaces |
|
||||
| 7 | product-surface and rollout docs | `V1`, `V2`, `V12`, `V13` | runner scenarios and product e2e packs | CSI/frontend projection of V2 truth |
|
||||
| 8 | rollout/support docs | stage-gate summaries in validation matrix | chaos/perf rows `I-C1`-`I-C4`, `I-P1`-`I-P3` | bounded launch envelope and operational confidence |
|
||||
|
||||
## Test Expansion Strategy From This Map
|
||||
|
||||
This map should drive testing in a faster order than "one expensive scenario at a time."
|
||||
|
||||
@@ -1,143 +0,0 @@
|
||||
# V2 Controlled Rollout Review
|
||||
|
||||
Date: 2026-04-05
|
||||
Status: draft
|
||||
Purpose: define the bounded review used to decide whether internal engineering
validation on the current `Phase 18` RF2 runtime envelope stays limited, widens
inside the same envelope, or blocks expansion
|
||||
|
||||
## Reading Rule
|
||||
|
||||
This artifact is a bounded decision gate after runtime-envelope validation.
|
||||
|
||||
It does NOT mean:
|
||||
|
||||
1. broad launch approval
|
||||
2. generic production readiness
|
||||
3. permission to widen beyond the frozen runtime envelope
|
||||
4. permission to reinterpret validation survival as new protocol/runtime proof
|
||||
|
||||
It means only:
|
||||
|
||||
1. validation outcomes may be reviewed against the already-accepted bounded
   envelope
2. expansion decisions must stay inside the same named support boundary
3. any broader claim still needs explicit new evidence and explicit new review
|
||||
|
||||
## Allowed Decisions
|
||||
|
||||
The rollout review may produce only one of these outputs:
|
||||
|
||||
1. `stay in bounded validation`
   - the chosen envelope is still the right boundary, but more bounded validation
     evidence is needed before any exposure increase
2. `widen bounded engineering exposure`
   - exposure may increase only inside the current bounded engineering envelope,
     with no change to the named support boundary
3. `block expansion`
   - the current evidence, incident record, or operational ambiguity is not strong
     enough to increase exposure safely
|
||||
|
||||
Any outcome outside those three is invalid for this review.
|
||||
|
||||
## Required Inputs
|
||||
|
||||
The review must not start unless these inputs exist and are explicit:
|
||||
|
||||
1. the frozen runtime envelope
|
||||
2. the bounded pilot pack
|
||||
3. the preflight checklist outcome(s)
|
||||
4. the pilot stop-condition artifact
|
||||
5. incident records with explicit classification
|
||||
6. validation outcome summary for the bounded runtime-bearing path
|
||||
7. the accepted evidence anchors that define the current boundary:
|
||||
- `Phase 18 M1-M4`
|
||||
- `v2-rf2-runtime-bounded-envelope.md`
|
||||
- `v2-rf2-runtime-bounded-envelope-review.md`
|
||||
|
||||
If any required input is missing, the correct review output is `block expansion`.
|
||||
|
||||
## Decision Questions
|
||||
|
||||
The rollout review must answer all of the following:
|
||||
|
||||
1. did validation remain fully inside the frozen runtime envelope
|
||||
2. did any observed behavior contradict the bounded `Phase 18` runtime envelope
|
||||
3. were any stop conditions triggered, and if so, how were they resolved
|
||||
4. are all incidents classified cleanly as:
|
||||
- `config / environment issue`
|
||||
- `known exclusion`
|
||||
- `true product bug`
|
||||
5. does any proposed next step depend on a broader claim than the current
|
||||
envelope
|
||||
6. can operators run the validation and diagnose bounded failures from written
|
||||
artifacts rather than tribal knowledge
|
||||
|
||||
If the answer to question 5 is yes, the review must not approve widening inside
|
||||
this artifact. That request belongs to later evidence expansion work.
|
||||
|
||||
## Decision Rules
|
||||
|
||||
Use these bounded rules:
|
||||
|
||||
1. approve `stay in bounded validation` when:
|
||||
- validation stayed inside scope
|
||||
- no contradiction to accepted bounded claims was found
|
||||
- more same-envelope evidence is still needed
|
||||
2. approve `widen bounded engineering exposure` only when:
|
||||
- validation stayed inside scope
|
||||
- no unresolved `true product bug` remains against the bounded envelope
|
||||
- stop conditions did not reveal structural ambiguity
|
||||
- operator workflow is explicit and repeatable from the artifact set
|
||||
- the widened exposure does not change the bounded envelope
|
||||
3. approve `block expansion` when:
|
||||
- any unresolved contradiction exists
|
||||
- any unresolved `true product bug` exists
|
||||
- incident records are vague
|
||||
- operators depend on tribal knowledge
|
||||
- the requested widening outruns the current matrix
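
The decision rules above can be read as a small fail-closed function. The following is a minimal sketch, not part of the review artifact: the `ReviewInputs` struct, its field names, and `Decide` are hypothetical, and two points are assumptions of this sketch rather than statements in the rules: validation leaving scope is treated as `block expansion`, and structural ambiguity without any other blocker falls back to `stay in bounded validation`.

```go
package main

import "fmt"

// Outcome is one of the three outputs the rollout review may produce.
type Outcome string

const (
	StayInBoundedValidation Outcome = "stay in bounded validation"
	WidenBoundedExposure    Outcome = "widen bounded engineering exposure"
	BlockExpansion          Outcome = "block expansion"
)

// ReviewInputs is a hypothetical summary of the evidence a reviewer holds.
// Field names are illustrative only.
type ReviewInputs struct {
	RequiredInputsPresent   bool // all items under "Required Inputs" exist
	StayedInsideScope       bool // validation stayed inside the frozen envelope
	UnresolvedContradiction bool
	UnresolvedProductBug    bool
	IncidentRecordsVague    bool
	TribalKnowledgeNeeded   bool
	WideningOutrunsMatrix   bool
	StructuralAmbiguity     bool // revealed by stop conditions
	WideningRequested       bool // reviewers are asking for more exposure
}

// Decide applies the blocking rules first, so any doubt resolves to
// BlockExpansion before widening is even considered.
func Decide(in ReviewInputs) Outcome {
	if !in.RequiredInputsPresent || !in.StayedInsideScope ||
		in.UnresolvedContradiction || in.UnresolvedProductBug ||
		in.IncidentRecordsVague || in.TribalKnowledgeNeeded ||
		in.WideningOutrunsMatrix {
		return BlockExpansion
	}
	if in.WideningRequested && !in.StructuralAmbiguity {
		return WidenBoundedExposure
	}
	return StayInBoundedValidation
}

func main() {
	in := ReviewInputs{RequiredInputsPresent: true, StayedInsideScope: true}
	fmt.Println(Decide(in))
}
```

Ordering the checks this way means a missing required input can never produce a widening decision, which matches the rule that a missing input yields `block expansion`.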
|
||||
|
||||
## Explicit Review Record
|
||||
|
||||
Each review result must record:
|
||||
|
||||
1. decision outcome
|
||||
2. date and reviewer set
|
||||
3. validation window / environment covered
|
||||
4. summary of incidents by classification bucket
|
||||
5. any stop-condition events and their disposition
|
||||
6. exact reason the decision stays inside the current envelope
|
||||
7. explicit next action:
|
||||
- continue bounded validation
|
||||
- widen engineering exposure inside the same envelope
|
||||
- pause and fix
|
||||
|
||||
## Rejection Rules
|
||||
|
||||
Reject the review as invalid if:
|
||||
|
||||
1. it uses validation success as generic production proof
|
||||
2. it broadens topology, runtime path, or supported surfaces without a new
|
||||
evidence package
|
||||
3. it treats a known exclusion as if validation cleared it
|
||||
4. it ignores stop-condition events or unresolved true product bugs
|
||||
5. it cannot map the decision back to the accepted evidence ladder
|
||||
|
||||
## Explicit Non-Claims
|
||||
|
||||
This artifact does NOT claim:
|
||||
|
||||
1. broad rollout approval
|
||||
2. generic production readiness
|
||||
3. support for `RF>2`
|
||||
4. support for a broad transport/frontend matrix
|
||||
5. broad failover-under-load or long-window soak proof
|
||||
|
||||
## Primary Inputs
|
||||
|
||||
1. `sw-block/design/v2-rf2-runtime-bounded-envelope.md`
|
||||
2. `sw-block/design/v2-rf2-runtime-bounded-envelope-review.md`
|
||||
3. `sw-block/design/v2-bounded-internal-pilot-pack.md`
|
||||
4. `sw-block/design/v2-pilot-preflight-checklist.md`
|
||||
5. `sw-block/design/v2-pilot-stop-conditions.md`
|
||||
6. `sw-block/.private/phase/phase-18.md`
|
||||
@@ -1,138 +0,0 @@
|
||||
# V2 First-Launch Supported Matrix
|
||||
|
||||
Date: 2026-04-04
|
||||
Status: draft
|
||||
Purpose: freeze the first bounded launch envelope from accepted `Phase 12-17`
evidence, with explicit supported scope, explicit exclusions, and explicit
launch blockers
|
||||
|
||||
## Reading Rule
|
||||
|
||||
This document is a bounded support matrix.
|
||||
|
||||
It does NOT mean:
|
||||
|
||||
1. broad launch approval
|
||||
2. generic production readiness
|
||||
3. support for every failover/restart/disturbance branch
|
||||
4. broad transport/frontend approval outside the named chosen path
|
||||
|
||||
It means only:
|
||||
|
||||
1. these are the strongest support statements currently justified by accepted
|
||||
evidence
|
||||
2. anything outside them is an exclusion, a blocker, or future
|
||||
productionization work
|
||||
|
||||
## Supported Matrix
|
||||
|
||||
| Dimension | Supported in first draft | Boundary rule | Primary evidence |
|-----------|--------------------------|---------------|------------------|
| Replication factor | `RF=2` | bounded chosen path only | `CP13-1..7`, `C-RF2-SYNCALL-CONTRACT` |
| Durability mode | `sync_all` | bounded chosen path only | `CP13`, `v2-protocol-claim-and-evidence.md` |
| Control/runtime path | existing master / volume-server heartbeat path | same path as `Phase 10`, `Phase 16`, `Phase 17` checkpoints | `Phase 10`, `Phase 16` finish-line review |
| Semantic owner | explicit `V2 core` | semantics stay `V2`-owned even when implementation reuses `weed/` and `blockvol` | `Phase 14-16` |
| Execution backend | `blockvol` via `v2bridge` | reuse implementation; no V1 semantic inheritance | `Phase 09`, `Phase 14-16` |
| Product surfaces | bounded `iSCSI`, bounded `CSI`, bounded `NVMe` on the chosen path | not a generic transport matrix | `Phase 11`, publication tests, NVMe tests |
| Restart/failover reading | bounded `17B` + `17C` interpretation | use only the explicit contract and policy table from `Phase 17` | `phase-17.md`, `phase-17-checkpoint-review.md` |
|
||||
|
||||
## Supported Statement
|
||||
|
||||
The strongest currently supported first-launch statement is:
|
||||
|
||||
1. on the bounded chosen `RF=2 sync_all` path, using the existing
   master/volume-server heartbeat path, explicit `V2`-owned semantics and the
   accepted `Phase 16-17` contract/policy package provide one finite support
   envelope for bounded block product use
2. bounded `iSCSI`, `CSI`, and `NVMe` surfaces are supported only inside that
   same chosen-path interpretation
3. failover/publication and disturbance behavior must be read through the
   explicit `Phase 17` contract/policy package, not through broader inferred
   product assumptions
|
||||
|
||||
## Explicit Exclusions
|
||||
|
||||
The following are OUTSIDE the first-launch supported matrix:
|
||||
|
||||
1. `RF>2`
|
||||
2. durability modes outside the accepted bounded envelope
|
||||
3. broad transport/frontend matrix approval beyond bounded `iSCSI` / `CSI` /
|
||||
`NVMe` chosen-path support
|
||||
4. broad whole-surface failover/publication proof
|
||||
5. broad restart-window behavior outside the explicit `17C` disturbance policy
|
||||
table
|
||||
6. generic soak/pilot success as production proof
|
||||
7. broad rollout approval
|
||||
|
||||
## Launch Blockers
|
||||
|
||||
These are still required before this matrix can be read as a real launch
|
||||
decision package:
|
||||
|
||||
1. a frozen `Phase 17` checkpoint review outcome
|
||||
2. any additional evidence needed if the product wants claims broader than the
|
||||
current bounded `17B/17C` contract/policy package
|
||||
|
||||
## Current Productionization Artifacts
|
||||
|
||||
The first bounded productionization artifacts now exist for the chosen path:
|
||||
|
||||
1. `v2-bounded-internal-pilot-pack.md`
|
||||
- bounded pilot scope
|
||||
- success criteria
|
||||
- incident classification
|
||||
2. `v2-pilot-preflight-checklist.md`
|
||||
- start/resume gate for the bounded pilot
|
||||
3. `v2-pilot-stop-conditions.md`
|
||||
- stop/contain/rollback-exposure rules
|
||||
4. `v2-controlled-rollout-review.md`
|
||||
- bounded post-pilot decision gate
|
||||
- allowed outcomes: `stay in bounded validation` / `widen bounded engineering exposure` / `block expansion`
|
||||
|
||||
## Not Launch-Blocking In This Draft
|
||||
|
||||
These are intentionally NOT blockers for the bounded first-draft envelope:
|
||||
|
||||
1. lack of `RF>2` support
|
||||
2. lack of broad transport/frontend approval
|
||||
3. lack of broad launch approval language
|
||||
4. lack of generic soak proof inside the phase package itself
|
||||
|
||||
## Claim Mapping Rule
|
||||
|
||||
Any first-launch support claim must map back to accepted evidence in ALL of the
|
||||
following layers:
|
||||
|
||||
1. hardening/floor layer
|
||||
- `Phase 12 P4`
|
||||
2. contract/workload/mode layer
|
||||
- `CP13-1..9`
|
||||
3. runtime truth-closure layer
|
||||
- `Phase 16` finish-line checkpoint
|
||||
4. product-claim checkpoint layer
|
||||
- `Phase 17A-17D`
|
||||
|
||||
If a claim cannot map cleanly back through those layers:
|
||||
|
||||
1. it is not in the first-launch matrix
|
||||
2. it belongs in exclusions, blockers, or later productionization work
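
The claim mapping rule can be checked mechanically. The sketch below is illustrative only: the `Claim` type, the `Layer` names as Go constants, and the helper functions are hypothetical. The layer labels and the evidence strings used in `main` are the ones listed above.

```go
package main

import "fmt"

// Layer names one of the four evidence layers a claim must map back to.
type Layer string

const (
	HardeningFloor Layer = "hardening/floor"
	ContractMode   Layer = "contract/workload/mode"
	RuntimeClosure Layer = "runtime truth-closure"
	ProductClaim   Layer = "product-claim checkpoint"
)

var requiredLayers = []Layer{HardeningFloor, ContractMode, RuntimeClosure, ProductClaim}

// Claim is a hypothetical record of a support statement and the accepted
// evidence cited for it, keyed by layer.
type Claim struct {
	Statement string
	Evidence  map[Layer][]string
}

// MissingLayers returns every required layer with no cited evidence.
func MissingLayers(c Claim) []Layer {
	var missing []Layer
	for _, l := range requiredLayers {
		if len(c.Evidence[l]) == 0 {
			missing = append(missing, l)
		}
	}
	return missing
}

// InFirstLaunchMatrix is true only when the claim maps to ALL layers.
func InFirstLaunchMatrix(c Claim) bool {
	return len(MissingLayers(c)) == 0
}

func main() {
	c := Claim{
		Statement: "RF=2 sync_all on the chosen path",
		Evidence: map[Layer][]string{
			HardeningFloor: {"Phase 12 P4"},
			ContractMode:   {"CP13-1..9"},
			RuntimeClosure: {"Phase 16 finish-line checkpoint"},
			ProductClaim:   {"Phase 17A-17D"},
		},
	}
	fmt.Println(InFirstLaunchMatrix(c), MissingLayers(c))
}
```

A claim with any empty layer is reported with the specific gaps, which makes it easy to file it under exclusions, blockers, or later productionization work.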
|
||||
|
||||
## Operator Reading Guide
|
||||
|
||||
When using this matrix, read it with these constraints:
|
||||
|
||||
1. bounded chosen path first, not generic platform promise
|
||||
2. explicit exclusions are real product boundaries, not temporary omissions
|
||||
3. launch blockers are real blockers, not optional polish
|
||||
4. pilot success later may validate this envelope, but cannot redefine it
|
||||
|
||||
## Primary References
|
||||
|
||||
1. `sw-block/.private/phase/phase-17.md`
|
||||
2. `sw-block/.private/phase/phase-17-checkpoint-review.md`
|
||||
3. `sw-block/design/v2-product-completion-overview.md`
|
||||
4. `sw-block/design/v2-protocol-claim-and-evidence.md`
|
||||
5. `sw-block/design/v2-bounded-internal-pilot-pack.md`
|
||||
6. `sw-block/design/v2-pilot-preflight-checklist.md`
|
||||
7. `sw-block/design/v2-pilot-stop-conditions.md`
|
||||
8. `sw-block/design/v2-controlled-rollout-review.md`
|
||||
@@ -1,99 +0,0 @@
|
||||
# V2 Legacy Runtime Exit Criteria
|
||||
|
||||
Date: 2026-04-04
|
||||
Status: active
|
||||
|
||||
## Purpose
|
||||
|
||||
This note defines when legacy runtime-owner paths may be downgraded from
|
||||
required compatibility coverage to removable implementation history.
|
||||
|
||||
Current legacy examples:
|
||||
|
||||
1. `legacy P4` live-path proofs
|
||||
2. no-core startup paths
|
||||
3. `HandleAssignmentResult()`-driven recovery startup kept for compatibility
|
||||
|
||||
## Current Position
|
||||
|
||||
For the current phase, legacy paths must remain:
|
||||
|
||||
1. as compatibility guards
|
||||
2. as regression protection for no-core behavior
|
||||
3. but NOT as semantic authority proof for the core-present path
|
||||
|
||||
## Exit Criteria
|
||||
|
||||
A legacy runtime-owner path may be downgraded or removed only when all of the
|
||||
following are true.
|
||||
|
||||
### 1. V2-native proof replacement exists
|
||||
|
||||
There must be core-present proofs covering the same behavior category:
|
||||
|
||||
1. assignment entry ownership
|
||||
2. task startup ownership
|
||||
3. execution ownership
|
||||
4. observation return ownership
|
||||
5. outward surface consistency
|
||||
|
||||
### 2. Compatibility mode is no longer required operationally
|
||||
|
||||
At least one of these must be true:
|
||||
|
||||
1. production startup always wires `v2Core`
|
||||
2. no-core path is explicitly declared unsupported
|
||||
3. no remaining product surface depends on no-core runtime startup
|
||||
|
||||
### 3. The legacy path is no longer the only guard for a runtime mechanic
|
||||
|
||||
Examples:
|
||||
|
||||
1. serialized replacement/drain behavior
|
||||
2. shutdown drain behavior
|
||||
3. live plan-to-execute behavior
|
||||
|
||||
These must have equivalent core-present coverage before legacy deletion.
|
||||
|
||||
### 4. No semantic truth still depends on legacy behavior
|
||||
|
||||
Specifically, removing the legacy path must not change:
|
||||
|
||||
1. identity meaning
|
||||
2. recovery classification
|
||||
3. publication meaning
|
||||
4. durable-boundary meaning
|
||||
|
||||
If removal changes any of those, the legacy path was still hiding semantic
|
||||
authority and cannot be retired yet.
|
||||
|
||||
## Downgrade Stages
|
||||
|
||||
Legacy paths should retire in stages:
|
||||
|
||||
### Stage 1: authority downgrade
|
||||
|
||||
1. keep tests
|
||||
2. explicitly classify them as compatibility-only
|
||||
|
||||
### Stage 2: runtime fallback downgrade
|
||||
|
||||
1. keep fallback code only where product startup still needs it
|
||||
2. stop expanding proof claims from those paths
|
||||
|
||||
### Stage 3: deletion candidate
|
||||
|
||||
1. delete tests or move them to legacy-only coverage
|
||||
2. remove runtime fallback code only after the new path is already the sole
|
||||
supported owner
|
||||
|
||||
## Current Judgment
|
||||
|
||||
As of the current separation work:
|
||||
|
||||
1. `legacy P4` stays
|
||||
2. it is already downgraded to compatibility guard
|
||||
3. it is not yet removable because:
|
||||
- no-core behavior still exists
|
||||
- full runtime-loop closure is not yet complete
|
||||
- not every old ownership proof has a complete core-present replacement
|
||||
@@ -1,161 +0,0 @@
|
||||
# V2 Open Questions
|
||||
|
||||
Date: 2026-03-27
|
||||
|
||||
## Purpose
|
||||
|
||||
This document records what is still algorithmically open in V2.
|
||||
|
||||
These are not bugs.
|
||||
|
||||
They are design questions that should be closed deliberately before or during implementation slicing.
|
||||
|
||||
## 1. Recovery Session Ownership
|
||||
|
||||
Open question:
|
||||
|
||||
- what is the exact ownership model for one active recovery session per replica?
|
||||
|
||||
Need to decide:
|
||||
|
||||
- session identity fields
|
||||
- supersede vs reject vs join behavior
|
||||
- how epoch/session invalidates old recovery work
|
||||
|
||||
Why it matters:
|
||||
|
||||
- V1.5 needed local reconnect serialization
|
||||
- V2 should make this a protocol rule
|
||||
|
||||
## 2. Promotion Threshold Strictness
|
||||
|
||||
Open question:
|
||||
|
||||
- must a promotion candidate always have `FlushedLSN >= CommittedLSN`, or is there any narrower safe exception?
|
||||
|
||||
Current prototype:
|
||||
|
||||
- uses committed-prefix sufficiency as the safety gate
|
||||
|
||||
Why it matters:
|
||||
|
||||
- determines how strict real failover behavior should be
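
For concreteness, the strict reading of the question can be sketched as a gate on `FlushedLSN >= CommittedLSN`. This is only an illustration of the strict form, since the question itself is still open; the `Candidate` type, `EligibleForPromotion`, and `PickCandidate` are hypothetical names, and preferring the highest flushed LSN among eligible candidates is an assumption of this sketch.

```go
package main

import "fmt"

// LSN is a log sequence number.
type LSN uint64

// Candidate is a hypothetical view of one replica considered for promotion.
type Candidate struct {
	ReplicaID  string
	FlushedLSN LSN
}

// EligibleForPromotion applies the strict gate: the candidate must have
// durably flushed at least the committed prefix.
func EligibleForPromotion(c Candidate, committed LSN) bool {
	return c.FlushedLSN >= committed
}

// PickCandidate returns the eligible candidate with the highest FlushedLSN.
// The boolean is false when no candidate passes the gate.
func PickCandidate(cs []Candidate, committed LSN) (Candidate, bool) {
	var best Candidate
	found := false
	for _, c := range cs {
		if !EligibleForPromotion(c, committed) {
			continue
		}
		if !found || c.FlushedLSN > best.FlushedLSN {
			best = c
			found = true
		}
	}
	return best, found
}

func main() {
	cs := []Candidate{{"r1", 90}, {"r2", 120}, {"r3", 100}}
	best, ok := PickCandidate(cs, 100)
	fmt.Println(best.ReplicaID, ok)
}
```

Any narrower safe exception would have to be expressed as an explicit additional branch in `EligibleForPromotion`, which makes the cost of relaxing the rule visible in review.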
|
||||
|
||||
## 3. Recovery Reservation Shape
|
||||
|
||||
Open question:
|
||||
|
||||
- what exactly is reserved during catch-up?
|
||||
|
||||
Need to decide:
|
||||
|
||||
- WAL range only?
|
||||
- payload pins?
|
||||
- snapshot pin?
|
||||
- expiry semantics?
|
||||
|
||||
Why it matters:
|
||||
|
||||
- recoverability must be explicit, not hopeful
|
||||
|
||||
## 4. Smart WAL Payload Classes
|
||||
|
||||
Open question:
|
||||
|
||||
- which payload classes are allowed in V2 first?
|
||||
|
||||
Current model has:
|
||||
|
||||
- `WALInline`
|
||||
- `ExtentReferenced`
|
||||
|
||||
Need to decide:
|
||||
|
||||
- whether first real implementation includes both
|
||||
- whether `ExtentReferenced` requires pinned snapshot/versioned extent only
|
||||
|
||||
## 5. Smart WAL Garbage Collection Boundary
|
||||
|
||||
Open question:
|
||||
|
||||
- when can a referenced payload stop being recoverable?
|
||||
|
||||
Need to decide:
|
||||
|
||||
- GC interaction
|
||||
- timeout interaction
|
||||
- recovery session pinning
|
||||
|
||||
Why it matters:
|
||||
|
||||
- this is the line between catch-up and rebuild
|
||||
|
||||
## 6. Exact Orchestrator Scope
|
||||
|
||||
Open question:
|
||||
|
||||
- how much of the final V2 control logic belongs in:
|
||||
- local node state
|
||||
- coordinator
|
||||
- transport/session manager
|
||||
|
||||
Why it matters:
|
||||
|
||||
- avoid V1-style scattered state ownership
|
||||
|
||||
## 7. First Real Implementation Slice
|
||||
|
||||
Open question:
|
||||
|
||||
- what is the first production slice of V2?
|
||||
|
||||
Candidates:
|
||||
|
||||
1. per-replica sender/session ownership
|
||||
2. explicit recovery-session management
|
||||
3. catch-up/rebuild decision plumbing
|
||||
|
||||
Recommended default:
|
||||
|
||||
- per-replica sender/session ownership
|
||||
|
||||
## 8. Steady-State Overhead Budget
|
||||
|
||||
Open question:
|
||||
|
||||
- what overhead is acceptable in the normal healthy case?
|
||||
|
||||
Need to decide:
|
||||
|
||||
- metadata checks on hot path
|
||||
- extra state bookkeeping
|
||||
- what stays off the hot path
|
||||
|
||||
Why it matters:
|
||||
|
||||
- V2 should be structurally better without becoming needlessly heavy
|
||||
|
||||
## 9. Smart WAL First-Phase Goal
|
||||
|
||||
Open question:
|
||||
|
||||
- is the first Smart WAL goal:
|
||||
- lower recovery cost
|
||||
- lower steady-state WAL volume
|
||||
- or just proof of the historical correctness model?
|
||||
|
||||
Recommended answer:
|
||||
|
||||
- first prove correctness model, then optimize
|
||||
|
||||
## 10. End Condition For Simulator Work
|
||||
|
||||
Open question:
|
||||
|
||||
- when do we stop adding simulator depth and start implementation?
|
||||
|
||||
Suggested answer:
|
||||
|
||||
- once acceptance criteria are satisfied
|
||||
- and the first implementation slice is clear
|
||||
- and remaining simulator additions are no longer changing core protocol decisions
|
||||
@@ -1,319 +0,0 @@
|
||||
# V2 Phase 14+ Semantic-First Framework
|
||||
|
||||
Date: 2026-04-03
|
||||
Status: active
|
||||
Purpose: define the overall `Phase 14+` implementation framework so `V2`
runtime extraction is driven by semantics first: core-owned state and
transitions, then command rules, then projection contracts, and only then
adapter rebinding
|
||||
|
||||
## Why This Document Exists
|
||||
|
||||
`Phase 13` closed one bounded constrained-runtime contract package:
|
||||
|
||||
1. real-workload validation
|
||||
2. assignment/publication closure
|
||||
3. bounded mode normalization
|
||||
|
||||
That package is valuable, but it is not yet a completed `V2 runtime`.
|
||||
|
||||
The next problem is therefore no longer:
|
||||
|
||||
1. keep deepening constrained-`V1` validation by default
|
||||
|
||||
It is:
|
||||
|
||||
1. how to turn the accepted semantic constraints into a real `V2 core`
|
||||
2. how to sequence `Phase 14+` so `V1` mixed runtime state does not silently
|
||||
regain semantic authority
|
||||
|
||||
## Core Rule
|
||||
|
||||
For `Phase 14+`, implementation order must be:
|
||||
|
||||
1. define core-owned state and transitions
|
||||
2. define command-emission rules
|
||||
3. define projection contracts
|
||||
4. only then connect adapters
|
||||
|
||||
Do not invert this order.
|
||||
|
||||
If adapter/runtime wiring appears first, `V1` mixed state will silently regain
|
||||
semantic authority through convenience behavior.
|
||||
|
||||
## Existing Inputs To Preserve
|
||||
|
||||
These are fixed inputs, not optional references:
|
||||
|
||||
1. `v2_mini_core_design.md`
|
||||
2. `v2-reuse-replacement-boundary.md`
|
||||
3. `v2-protocol-claim-and-evidence.md`
|
||||
4. `v2-phase-development-plan.md`
|
||||
5. `sw-block/engine/replication/`
|
||||
|
||||
## Overall Composition Model
|
||||
|
||||
The full `V2` runtime should be composed from smaller automata rather than one
|
||||
monolithic state machine.
|
||||
|
||||
```mermaid
flowchart TD
    assignmentState[AssignmentAutomaton]
    recoveryState[RecoveryAutomaton]
    boundaryState[BoundaryAutomaton]
    modeState[ModeAutomaton]
    publicationState[PublicationAutomaton]
    coreEngine[CoreEngine]
    projections[ProjectionContracts]
    adapters[AdapterBoundary]
    runtime[V1BackendMechanics]

    assignmentState --> coreEngine
    recoveryState --> coreEngine
    boundaryState --> coreEngine
    modeState --> coreEngine
    coreEngine --> publicationState
    publicationState --> projections
    coreEngine --> adapters
    adapters --> runtime
    runtime -->|"observations/events"| adapters
    adapters --> coreEngine
```
|
||||
|
||||
## The Five Core-Owned Automata
|
||||
|
||||
### 1. Assignment automaton
|
||||
|
||||
Owns:
|
||||
|
||||
1. volume intent
|
||||
2. role intent
|
||||
3. stable replica identity
|
||||
4. epoch
|
||||
5. desired replica set
|
||||
|
||||
Primary constraints preserved:
|
||||
|
||||
1. `CP13-2`
|
||||
2. identity-vs-transport separation
|
||||
|
||||
Current seeds:
|
||||
|
||||
1. `sw-block/engine/replication/registry.go`
|
||||
2. `sw-block/engine/replication/state.go`
|
||||
|
||||
### 2. Recovery automaton
|
||||
|
||||
Owns:
|
||||
|
||||
1. per-replica recovery state
|
||||
2. session ownership and fencing
|
||||
3. catch-up vs rebuild selection
|
||||
|
||||
Primary constraints preserved:
|
||||
|
||||
1. `CP13-4`
|
||||
2. `CP13-5`
|
||||
3. `CP13-6`
|
||||
4. `CP13-7`
|
||||
|
||||
Current seeds:
|
||||
|
||||
1. `sw-block/engine/replication/sender.go`
|
||||
2. `sw-block/engine/replication/session.go`
|
||||
3. `sw-block/engine/replication/orchestrator.go`
|
||||
4. `sw-block/engine/replication/outcome.go`
|
||||
|
||||
### 3. Boundary automaton
|
||||
|
||||
Owns:
|
||||
|
||||
1. committed truth
|
||||
2. checkpoint truth
|
||||
3. durable barrier truth
|
||||
4. rebuild/catch-up target truth
|
||||
|
||||
Primary constraints preserved:
|
||||
|
||||
1. `T1`
|
||||
2. `T9`
|
||||
3. `CP13-3`
|
||||
|
||||
Current seeds:
|
||||
|
||||
1. `sw-block/engine/replication/state.go`
|
||||
2. `sw-block/engine/replication/engine.go`
|
||||
|
||||
### 4. Mode automaton
|
||||
|
||||
Owns:
|
||||
|
||||
1. `allocated_only`
|
||||
2. `bootstrap_pending`
|
||||
3. `replica_ready`
|
||||
4. `publish_healthy`
|
||||
5. `degraded`
|
||||
6. `needs_rebuild`
|
||||
|
||||
Primary constraints preserved:
|
||||
|
||||
1. `CP13-9`
|
||||
2. fail-closed external meaning
|
||||
|
||||
Current seeds:
|
||||
|
||||
1. `sw-block/engine/replication/state.go`
|
||||
2. `sw-block/engine/replication/engine.go`
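
One way to picture the mode automaton's fail-closed external meaning is an enumeration whose outward projection reports healthy for exactly one mode. The sketch below is hypothetical and is not taken from `state.go` or `engine.go`; the Go identifiers and the `OutwardHealthy` helper are assumptions, while the six mode names are the ones listed above.

```go
package main

import "fmt"

// Mode enumerates the six modes owned by the mode automaton.
type Mode int

const (
	AllocatedOnly Mode = iota
	BootstrapPending
	ReplicaReady
	PublishHealthy
	Degraded
	NeedsRebuild
)

// String returns the mode name; unknown values are reported as "unknown".
func (m Mode) String() string {
	switch m {
	case AllocatedOnly:
		return "allocated_only"
	case BootstrapPending:
		return "bootstrap_pending"
	case ReplicaReady:
		return "replica_ready"
	case PublishHealthy:
		return "publish_healthy"
	case Degraded:
		return "degraded"
	case NeedsRebuild:
		return "needs_rebuild"
	}
	return "unknown"
}

// OutwardHealthy is fail-closed: only publish_healthy may be projected as
// healthy. Every other mode, including unknown values, is non-healthy.
func OutwardHealthy(m Mode) bool {
	return m == PublishHealthy
}

func main() {
	for m := AllocatedOnly; m <= NeedsRebuild; m++ {
		fmt.Printf("%s healthy=%v\n", m, OutwardHealthy(m))
	}
}
```

Because the healthy check is an allow-list of one, adding a new mode later cannot accidentally widen what external surfaces are allowed to claim.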
|
||||
|
||||
### 5. Publication automaton
|
||||
|
||||
Owns:
|
||||
|
||||
1. readiness closure
|
||||
2. publication closure
|
||||
3. outward healthy vs non-healthy truth
|
||||
|
||||
Primary constraints preserved:
|
||||
|
||||
1. `CP13-8A`
|
||||
2. `CP13-9`
|
||||
|
||||
Current seeds:
|
||||
|
||||
1. `sw-block/engine/replication/projection.go`
|
||||
2. `sw-block/engine/replication/engine.go`
|
||||
|
||||
## Phase 14+ Execution Order
|
||||
|
||||
### Phase 14A: Core-owned automata
|
||||
|
||||
Goal:
|
||||
|
||||
1. make the five automata explicit in the core package
|
||||
|
||||
Deliver:
|
||||
|
||||
1. state definitions
|
||||
2. transition tables/rules
|
||||
3. event vocabulary
|
||||
|
||||
Validation:
|
||||
|
||||
1. structural acceptance tests in `sw-block/engine/replication`
|
||||
|
||||
Non-goal:
|
||||
|
||||
1. no live adapter hook
|
||||
|
||||
### Phase 14B: Command semantics
|
||||
|
||||
Goal:
|
||||
|
||||
1. freeze command-emission rules from semantic state, not runtime convenience
|
||||
|
||||
Deliver:
|
||||
|
||||
1. command rules for role apply, receiver start, shipper configure, invalidation,
|
||||
and publication
|
||||
|
||||
Validation:
|
||||
|
||||
1. tests that one event sequence produces one bounded command sequence
|
||||
|
||||
Non-goal:
|
||||
|
||||
1. no `weed/` execution yet
|
||||
|
||||
### Phase 14C: Projection contracts
|
||||
|
||||
Goal:
|
||||
|
||||
1. define what external surfaces are allowed to claim and from which core state
|
||||
|
||||
Deliver:
|
||||
|
||||
1. projection structs and normalization rules for lookup/heartbeat/debug/tester
|
||||
meanings
|
||||
|
||||
Validation:
|
||||
|
||||
1. mode/readiness/publication surface-consistency tests
|
||||
|
||||
Non-goal:
|
||||
|
||||
1. no live registry rewrite yet
|
||||
|
||||
### Phase 15A: Minimal adapter hook
|
||||
|
||||
Goal:
|
||||
|
||||
1. connect one narrow adapter ingress to the new core
|
||||
|
||||
Deliver:
|
||||
|
||||
1. one event path from `weed/` into the core
|
||||
2. one command path back out
|
||||
|
||||
Validation:
|
||||
|
||||
1. prove no semantic split between adapter and core on that narrow path
|
||||
|
||||
### Phase 15B: Projection-store rebinding
|
||||
|
||||
Goal:
|
||||
|
||||
1. make `weed/` projection/state surfaces consume core-owned projection truth
|
||||
|
||||
Deliver:
|
||||
|
||||
1. bounded rebinding of registry / lookup / tester-facing surfaces
|
||||
|
||||
Validation:
|
||||
|
||||
1. prove that `assignment delivered`, `ready`, and `publish healthy` remain three distinct states on the real path
|
||||
|
||||
### Phase 16: V2-native runtime closure
|
||||
|
||||
Goal:
|
||||
|
||||
1. make the integrated runtime behave as a `V2`-owned system rather than
|
||||
constrained-`V1` semantics plus fixes
|
||||
|
||||
Deliver:
|
||||
|
||||
1. one bounded runtime path where core-owned semantics drive adapters and
|
||||
projections
|
||||
|
||||
Validation:
|
||||
|
||||
1. end-to-end failover/recovery/publication scenarios on the core-driven path
|
||||
|
||||
## Algorithm Review Rule
|
||||
|
||||
For any new transition rule, command rule, or projection rule, require a short
|
||||
justification in code review or delivery notes:
|
||||
|
||||
1. semantic constraint satisfied:
   - which item from `v2-protocol-claim-and-evidence.md`,
     `v2-protocol-truths.md`, or `CP13-*`
2. overclaim avoided:
   - what false healthy / ready / durable / recoverable claim is being prevented
3. proof preserved:
   - which accepted test or checkpoint remains valid because of this rule
|
||||
|
||||
This is the minimum bar for `Phase 14+`.
|
||||
|
||||
## Immediate Next Slice
|
||||
|
||||
Do not broaden `Phase 13` further.
|
||||
|
||||
Use the new `Phase 14` core skeleton in `sw-block/engine/replication` as the
|
||||
base for one complete semantic chain:
|
||||
|
||||
1. `mode`
|
||||
2. `readiness`
|
||||
3. `publication`
|
||||
|
||||
This is the best next slice because it turns the newest accepted `CP13-8A` and
|
||||
`CP13-9` constraints directly into core-owned state and transition logic before
|
||||
adapter rebinding begins.
|
||||
@@ -1,112 +0,0 @@
|
||||
# V2 Pilot Preflight Checklist
|
||||
|
||||
Date: 2026-04-05
|
||||
Status: draft
|
||||
Purpose: define the minimum explicit checks required before running bounded
internal engineering validation on the current `Phase 18` RF2 runtime envelope
|
||||
|
||||
## Reading Rule
|
||||
|
||||
This checklist is a gate for starting or resuming bounded internal engineering
|
||||
validation.
|
||||
|
||||
If any item below is not satisfied:
|
||||
|
||||
1. do not treat the environment as pilot-ready
|
||||
2. either fix the issue or classify it explicitly before proceeding
|
||||
|
||||
## Scope Lock
|
||||
|
||||
Confirm the validation is still inside the frozen `Phase 18` RF2 runtime
|
||||
envelope:
|
||||
|
||||
1. topology remains bounded `RF=2`
|
||||
2. runtime path remains the delivered `masterv2 + volumev2 + purev2` `M1-M4`
|
||||
path
|
||||
3. validation stays in bounded runtime/lab exercises only
|
||||
4. no one is trying to use this validation to claim working block product status
|
||||
5. no real frontend/product traffic is being introduced on the new runtime path
|
||||
6. no one is trying to use validation success to claim broader launch approval
|
||||
|
||||
## Build And Artifact Pin
|
||||
|
||||
Confirm the software package is explicit and stable:
|
||||
|
||||
1. the exact build/commit for validation nodes is written down
|
||||
2. all validation nodes run the same intended package
|
||||
3. the operator runbook matches the package actually deployed
|
||||
4. any configuration delta from the documented chosen path is reviewed and
|
||||
accepted explicitly
|
||||
|
||||
## Environment Readiness
|
||||
|
||||
Confirm the validation environment matches bounded assumptions:
|
||||
|
||||
1. node inventory and topology are written down
|
||||
2. transport/frontend choice does not widen beyond the bounded runtime envelope
|
||||
3. storage/network assumptions required by the chosen path are known to the
|
||||
operator
|
||||
4. known exclusions are acknowledged before start
|
||||
5. rollback/containment ownership is assigned for the validation window
|
||||
|
||||
## Diagnosis Surface Readiness
|
||||
|
||||
Confirm bounded diagnosis can be performed without ad hoc spelunking:
|
||||
|
||||
1. failover snapshots/results can be inspected
|
||||
2. Loop 2 snapshots can be inspected
|
||||
3. continuity snapshots can be inspected
|
||||
4. RF2 runtime surface can be inspected
|
||||
5. the operator knows which artifact defines the current contract/policy boundary:
|
||||
- `v2-rf2-runtime-bounded-envelope.md`
|
||||
- `v2-rf2-runtime-bounded-envelope-review.md`
|
||||
- this preflight checklist
|
||||
- the stop-condition artifact
|
||||
|
||||
## Workload And Gate Alignment
|
||||
|
||||
Confirm the validation workload is aligned with accepted evidence:
|
||||
|
||||
1. the workload maps to the bounded runtime-bearing reading rather than a new
|
||||
unsupported scenario
|
||||
2. success will be judged against the validation-pack criteria rather than generic
|
||||
"looks stable" judgment
|
||||
3. the workload does not assume continuous Loop 2, real transport, auto failover,
|
||||
rebuild lifecycle, or product frontends that are still excluded
|
||||
4. no required proof depends on failover-under-load, hours/days soak, `RF>2`, or
|
||||
broad transport/frontend claims that are still excluded
|
||||
|
||||
## Incident Routing Readiness
|
||||
|
||||
Confirm incident handling is explicit before starting:
|
||||
|
||||
1. every incident will be classified as one of:
|
||||
- `config / environment issue`
|
||||
- `known exclusion`
|
||||
- `true product bug`
|
||||
2. the recording location for incidents is agreed before validation starts
|
||||
3. ownership for triage and decision-making is assigned
|
||||
4. operators know when they must stop instead of improvising
|
||||
|
||||
## Preflight Result
|
||||
|
||||
Validation may start only if:
|
||||
|
||||
1. every scope-lock item is true
|
||||
2. the software package and environment are pinned
|
||||
3. diagnosis surfaces are available
|
||||
4. incident routing is explicit
|
||||
5. no remaining gap is being hand-waved as "we will figure it out during pilot"
|
||||
|
||||
If those conditions are not met, the correct output is:
|
||||
|
||||
1. `NOT READY`
|
||||
2. the missing item(s)
|
||||
3. the owner/action needed before retry
|
||||
|
||||
## Primary Inputs
|
||||
|
||||
1. `sw-block/design/v2-bounded-internal-pilot-pack.md`
|
||||
2. `sw-block/design/v2-rf2-runtime-bounded-envelope.md`
|
||||
3. `sw-block/design/v2-rf2-runtime-bounded-envelope-review.md`
|
||||
4. `sw-block/.private/phase/phase-18.md`
|
||||
@@ -1,101 +0,0 @@
|
||||
# V2 Pilot Stop Conditions
|
||||
|
||||
Date: 2026-04-05
|
||||
Status: draft
|
||||
Purpose: define when bounded internal engineering validation on the current
`Phase 18` RF2 runtime envelope must stop, contain scope, or block expansion
|
||||
|
||||
## Reading Rule
|
||||
|
||||
This artifact is about validation containment, not protocol/data rollback
|
||||
semantics.
|
||||
|
||||
`Rollback` here means:
|
||||
|
||||
1. stop widening validation exposure
|
||||
2. reduce or remove validation usage if needed
|
||||
3. return to a previously accepted bounded state of operation
|
||||
|
||||
It does NOT mean:
|
||||
|
||||
1. a general storage/data rollback guarantee
|
||||
2. permission to claim a broader recovery contract than the current evidence supports
|
||||
3. ad hoc operator improvisation under ambiguity
|
||||
|
||||
## Immediate Stop Conditions
|
||||
|
||||
Stop validation immediately if ANY of the following occurs:
|
||||
|
||||
1. an observed behavior contradicts the bounded `Phase 18` RF2 runtime envelope
|
||||
2. any run is interpreted as proving automatic failover, continuous Loop 2
|
||||
service, rebuild lifecycle, or frontend-serving behavior that is still
|
||||
explicitly excluded
|
||||
3. diagnosis surfaces are insufficient to classify the incident without guessing
|
||||
4. the validation is being widened beyond the named bounded envelope without an
|
||||
explicit review decision
|
||||
5. the incident does not fit any of the allowed buckets:
|
||||
- `config / environment issue`
|
||||
- `known exclusion`
|
||||
- `true product bug`
|
||||
|
||||
## Stop-And-Contain Actions
|
||||
|
||||
When a stop condition fires:
|
||||
|
||||
1. freeze new validation expansion immediately
|
||||
2. preserve the evidence needed for later review
|
||||
3. classify the incident explicitly
|
||||
4. map the incident back to:
|
||||
- accepted bounded claim
|
||||
- known exclusion
|
||||
- unresolved blocker
|
||||
5. decide whether validation can continue in reduced scope or must fully pause
|
||||
|
||||
If the team cannot perform those actions clearly, validation remains stopped.
|
||||
|
||||
## Rollback Decision Rules
|
||||
|
||||
Use the following bounded rules:
|
||||
|
||||
1. `config / environment issue`
|
||||
- fix the environment/configuration
|
||||
- rerun preflight before resuming
|
||||
2. `known exclusion`
|
||||
- remove the excluded usage from validation
|
||||
- do not reinterpret it as product support
|
||||
3. `true product bug`
|
||||
- pause affected validation scope
|
||||
- open an explicit fix or contradiction item before resuming
|
||||
|
||||
If repeated incidents of the same class continue without a bounded corrective
|
||||
path, block further validation expansion.
|
||||
|
||||
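The three bounded rules above amount to a lookup from incident class to required actions. The following is a hedged sketch; `IncidentClass` and `RollbackActions` are hypothetical names, and the action strings are copied from the rules above.

```go
package main

import "fmt"

// IncidentClass is one of the three allowed incident buckets.
type IncidentClass string

const (
	ConfigEnvironment IncidentClass = "config / environment issue"
	KnownExclusion    IncidentClass = "known exclusion"
	TrueProductBug    IncidentClass = "true product bug"
)

// RollbackActions returns the bounded actions for a class.
// ok is false when the class is not an allowed bucket, which is
// itself an immediate stop condition.
func RollbackActions(c IncidentClass) (actions []string, ok bool) {
	switch c {
	case ConfigEnvironment:
		return []string{"fix the environment/configuration", "rerun preflight before resuming"}, true
	case KnownExclusion:
		return []string{"remove the excluded usage from validation", "do not reinterpret it as product support"}, true
	case TrueProductBug:
		return []string{"pause affected validation scope", "open an explicit fix or contradiction item before resuming"}, true
	default:
		return nil, false
	}
}

func main() {
	for _, c := range []IncidentClass{ConfigEnvironment, KnownExclusion, TrueProductBug, "other"} {
		actions, ok := RollbackActions(c)
		if !ok {
			fmt.Printf("%q: not an allowed bucket, validation stays stopped\n", c)
			continue
		}
		fmt.Printf("%q: %v\n", c, actions)
	}
}
```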
## Expansion Blockers
|
||||
|
||||
Even if validation remains partially runnable, do NOT widen it when:
|
||||
|
||||
1. the same unresolved true product bug recurs
|
||||
2. operators depend on tribal knowledge to recover or diagnose
|
||||
3. incident records are vague or cannot be mapped back to the current evidence
|
||||
ladder
|
||||
4. success depends on ignoring explicit exclusions
|
||||
5. the desired next step requires broader launch claims than the current envelope allows
|
||||
|
||||
## Explicit Non-Claims
|
||||
|
||||
This artifact does NOT claim:
|
||||
|
||||
1. broad rollout approval
|
||||
2. generic production readiness from validation survival
|
||||
3. support for `RF>2`
|
||||
4. support for a broad transport/frontend matrix
|
||||
5. failover-under-load proof or long-window soak proof beyond the current bounded
|
||||
evidence set
|
||||
|
||||
## Primary Inputs
|
||||
|
||||
1. `sw-block/design/v2-bounded-internal-pilot-pack.md`
|
||||
2. `sw-block/design/v2-pilot-preflight-checklist.md`
|
||||
3. `sw-block/design/v2-rf2-runtime-bounded-envelope.md`
|
||||
4. `sw-block/design/v2-rf2-runtime-bounded-envelope-review.md`
|
||||
5. `sw-block/.private/phase/phase-18.md`
|
||||
@@ -1,90 +0,0 @@
|
||||
# V2 Protocol-Aware Execution
|
||||
|
||||
## Purpose
|
||||
Make host-side execution in `weed/server` and `weed/storage/blockvol` obey the
|
||||
existing V2 session contract explicitly. The engine remains the semantic source
|
||||
of truth. Host code owns only:
|
||||
|
||||
- execution-state caching derived from sender/session snapshots
|
||||
- phase gating before data-plane I/O
|
||||
- observation routing back into core events
|
||||
|
||||
## Host-Side Execution State
|
||||
For each primary volume and replica, the host caches a `replica protocol
|
||||
execution state` with these fields:
|
||||
|
||||
- `ReplicaID`
|
||||
- `SenderState`
|
||||
- `SessionID`
|
||||
- `SessionKind`
|
||||
- `SessionPhase`
|
||||
- `StartLSN`
|
||||
- `TargetLSN`
|
||||
- `FrozenTargetLSN`
|
||||
- `RecoveredTo`
|
||||
- `SessionActive`
|
||||
- `LiveEligible`
|
||||
- `Reason`
|
||||
|
||||
Rules:
|
||||
|
||||
1. State is derived from `v2Orchestrator.Registry` snapshots only.
|
||||
2. `LiveEligible=false` whenever there is an active recovery session.
|
||||
3. Data-plane code must consult this cached state before shipping current live
|
||||
WAL entries.
|
||||
4. Heartbeat and publication remain projection-driven; they do not invent local
|
||||
session semantics.
|
||||
|
||||
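As an illustration of rules 1 and 2, the cached state could be shaped roughly as below. The field names come from the list above; the Go types, the `SessionSnapshot` stand-in, the `DeriveState` helper, and the `"in_sync"` sender-state string are all assumptions made for this sketch.

```go
package main

import "fmt"

// ReplicaProtocolExecutionState lists the cached fields named above.
// The Go types are assumptions; the source names the fields only.
type ReplicaProtocolExecutionState struct {
	ReplicaID       string
	SenderState     string
	SessionID       uint64
	SessionKind     string
	SessionPhase    string
	StartLSN        uint64
	TargetLSN       uint64
	FrozenTargetLSN uint64
	RecoveredTo     uint64
	SessionActive   bool
	LiveEligible    bool
	Reason          string
}

// SessionSnapshot is a hypothetical stand-in for a registry snapshot.
type SessionSnapshot struct {
	ReplicaID, SenderState, Kind, Phase               string
	SessionID                                         uint64
	StartLSN, TargetLSN, FrozenTargetLSN, RecoveredTo uint64
	Active                                            bool
}

// DeriveState builds the cached state from a snapshot only (rule 1)
// and forces LiveEligible=false while a recovery session is active
// (rule 2). Treating "in_sync" as the eligible sender state is an
// assumption of this sketch.
func DeriveState(s SessionSnapshot) ReplicaProtocolExecutionState {
	st := ReplicaProtocolExecutionState{
		ReplicaID: s.ReplicaID, SenderState: s.SenderState,
		SessionID: s.SessionID, SessionKind: s.Kind, SessionPhase: s.Phase,
		StartLSN: s.StartLSN, TargetLSN: s.TargetLSN,
		FrozenTargetLSN: s.FrozenTargetLSN, RecoveredTo: s.RecoveredTo,
		SessionActive: s.Active,
	}
	switch {
	case s.Active:
		st.Reason = "recovery session active"
	case s.SenderState == "in_sync":
		st.LiveEligible = true
	default:
		st.Reason = "sender not in sync"
	}
	return st
}

func main() {
	st := DeriveState(SessionSnapshot{ReplicaID: "r1", SenderState: "catching_up", Kind: "catchup", Active: true})
	fmt.Printf("live eligible: %v (%s)\n", st.LiveEligible, st.Reason)
}
```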
## WAL-First Rollout
|
||||
The first rollout is intentionally narrow:
|
||||
|
||||
- cover `keepup` and WAL-based catch-up only
|
||||
- do not change snapshot/build policy
|
||||
- do not let fresh late-attached replicas consume current live-tail WAL while a
|
||||
bounded catch-up session is active
|
||||
|
||||
Current implementation seam:
|
||||
|
||||
- `weed/server/block_protocol_state.go`
|
||||
- derives host execution state from sender/session snapshots
|
||||
- binds a per-volume live-shipping policy back into `BlockVol`
|
||||
- `weed/storage/blockvol/blockvol.go`
|
||||
- carries the host-provided live-shipping policy across shipper-group rebuilds
|
||||
- `weed/storage/blockvol/wal_shipper.go`
|
||||
- checks the policy before any live-tail dial or send
|
||||
|
||||
This is intentionally a phase gate, not a second source of truth.
|
||||
|
||||
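The phase gate described above can be sketched as a policy callback that the shipper consults before it dials or sends. This is a toy model under stated assumptions: `LiveShippingPolicy`, `Shipper`, and `ShipLive` are hypothetical names and do not reproduce the real `wal_shipper.go` types.

```go
package main

import (
	"errors"
	"fmt"
)

// LiveShippingPolicy reports whether live-tail shipping is currently
// allowed. In this sketch the host provides it; the shipper only obeys.
type LiveShippingPolicy func() bool

// ErrLiveShippingBlocked is returned when the gate is closed.
var ErrLiveShippingBlocked = errors.New("live shipping blocked by protocol phase")

// Shipper is a toy stand-in for a WAL shipper.
type Shipper struct {
	Policy LiveShippingPolicy
	Dial   func() error
	Send   func(lsn uint64) error
	dialed bool
}

// ShipLive checks the policy before any dial or send, so a blocked
// replica causes no transport activity at all.
func (s *Shipper) ShipLive(lsn uint64) error {
	if s.Policy != nil && !s.Policy() {
		return ErrLiveShippingBlocked
	}
	if !s.dialed {
		if err := s.Dial(); err != nil {
			return err
		}
		s.dialed = true
	}
	return s.Send(lsn)
}

func main() {
	dials, allowed := 0, false
	s := &Shipper{
		Policy: func() bool { return allowed },
		Dial:   func() error { dials++; return nil },
		Send:   func(uint64) error { return nil },
	}
	err := s.ShipLive(1)
	fmt.Println("blocked:", err, "dials:", dials)
	allowed = true
	err = s.ShipLive(2)
	fmt.Println("allowed:", err, "dials:", dials)
}
```

Keeping the policy as a callback means the shipper holds no session semantics of its own, which matches the statement that the gate is not a second source of truth.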
## Observation Seam
|
||||
Runtime observations should feed back through one server-side seam:
|
||||
|
||||
- sender/session snapshots -> `syncProtocolExecutionState()`
|
||||
- host event application -> `applyCoreEvent()`
|
||||
- assignment processing -> `ApplyAssignments()`
|
||||
|
||||
The rule is:
|
||||
|
||||
1. engine chooses the protocol phase
|
||||
2. host derives execution state from engine snapshots
|
||||
3. data path obeys that state
|
||||
4. host emits observed facts back through `applyCoreEvent()`
|
||||
|
||||
## Fast Test Roster
|
||||
The first fast-test roster for protocol-aware execution is:
|
||||
|
||||
- `unit`: `TestWALShipper_LiveShippingPolicyBlocksBeforeDial`
|
||||
- proves the phase gate runs before any transport dial
|
||||
- `unit`: `TestWALShipper_LiveShippingPolicyAllowsShip`
|
||||
- proves the gate does not block normal live shipping after eligibility
|
||||
- `component`: `TestBlockService_ProtocolExecutionState_ActiveCatchUpBlocksLiveShipping`
|
||||
- proves sender/session snapshots become host execution state and block live
|
||||
shipping during active catch-up
|
||||
- `component`: `TestBlockService_ProtocolExecutionState_InSyncSenderAllowsLiveShipping`
|
||||
- proves the host reopens live shipping after the recovery session is gone
|
||||
|
||||
Next fast tests to add in later waves:
|
||||
|
||||
- late attach with backlog must stay bounded until target reached
|
||||
- transport contact before barrier durability must not imply publish healthy
|
||||
- timeout with valid retention pin may replan WAL catch-up
|
||||
- timeout after retention loss must escalate to build
|
||||
@@ -1,178 +0,0 @@
|
||||
# V2 Reuse vs Replacement Boundary
|
||||
|
||||
Date: 2026-04-03
|
||||
Status: active
|
||||
|
||||
## Purpose
|
||||
|
||||
This note makes one architectural split explicit for the current chosen path:
|
||||
|
||||
1. what we reuse from the existing `blockvol`/`weed` stack as mechanics
|
||||
2. what must be owned by `V2` as semantic authority
|
||||
3. what sits in the adapter boundary between them
|
||||
|
||||
The goal is to stop `V1` mixed control/data state from silently redefining `V2`
|
||||
behavior through convenience wiring.
|
||||
|
||||
Scope is still bounded to:
|
||||
|
||||
1. `RF=2`
|
||||
2. `sync_all`
|
||||
3. current master / volume-server heartbeat path
|
||||
4. `blockvol` as the execution backend
|
||||
|
||||
## Boundary Rule
|
||||
|
||||
`V1` reuse is allowed for execution mechanics.
|
||||
|
||||
`V2` replacement is required for semantic authority.
|
||||
|
||||
If a change decides protocol meaning, failover meaning, durability meaning, or
|
||||
external publication meaning, it belongs to a `V2`-owned layer even if the
|
||||
underlying I/O still runs through reused `blockvol` code.
|
||||
|
||||
This is the practical interpretation of:
|
||||
|
||||
- `v2-protocol-truths.md` `T14`: engine remains recovery authority
|
||||
- `v2-protocol-truths.md` `T15`: reuse reality, not inherited semantics
|
||||
|
||||
## Three Buckets
|
||||
|
||||
### 1. Reusable V1 Core
|
||||
|
||||
These components remain useful as mechanics:
|
||||
|
||||
| Area | Files | What stays reusable |
|
||||
|------|-------|---------------------|
|
||||
| Local storage truth | `weed/storage/blockvol/blockvol.go`, `flusher.go`, `rebuild.go`, WAL/extent helpers | WAL append, flush, checkpoint, dirty-map, extent install |
|
||||
| Replica transport | `weed/storage/blockvol/replica_apply.go`, `wal_shipper.go`, `shipper_group.go`, `dist_group_commit.go`, `repl_proto.go` | TCP receiver/shipper mechanics, barrier transport, replay/apply |
|
||||
| Frontend serving | `weed/storage/blockvol/iscsi/`, `weed/storage/blockvol/nvme/` | block-device serving once a local volume is authoritative |
|
||||
| Local role guardrails | `weed/storage/blockvol/promotion.go`, `role.go` | drain, lease revoke, local role gate enforcement |
|
||||
|
||||
Rule:
|
||||
|
||||
- these layers execute I/O and transport
|
||||
- they do not decide whether a replica is eligible, authoritative, published, or healthy in the `V2` sense
|
||||
|
||||
### 2. Adapter Boundary
|
||||
|
||||
These components translate `V2` truth into concrete runtime wiring:
|
||||
|
||||
| Area | Files | Responsibility |
|
||||
|------|-------|----------------|
|
||||
| Assignment ingest | `weed/server/volume_server_block.go` | authoritative assignment lifecycle for role apply, receiver/shipper wiring, readiness closure |
|
||||
| Heartbeat/runtime loop | `weed/server/block_heartbeat_loop.go` | collect/report status and process assignments through the same lifecycle |
|
||||
| Local store helper | `weed/storage/store_blockvol.go` | local volume open/close/iteration; no longer the authoritative assignment lifecycle |
|
||||
| Bridge | `weed/storage/blockvol/v2bridge/control.go` | convert service/control truth into engine intents |
|
||||
|
||||
Rule:
|
||||
|
||||
- the adapter boundary may reuse `blockvol` primitives
|
||||
- it must name and own lifecycle closure states explicitly
|
||||
- it must not let store-only role application masquerade as ready publication
|
||||
|
||||
### 3. V2-Owned Replacement
|
||||
|
||||
These areas define truth and therefore must remain `V2`-owned:
|
||||
|
||||
| Area | Files | Responsibility |
|
||||
|------|-------|----------------|
|
||||
| Control and identity truth | `sw-block/engine/replication/`, `weed/storage/blockvol/v2bridge/control.go` | assignment truth, stable identity, session truth |
|
||||
| Recovery ownership | `weed/server/block_recovery.go` | live runtime owner for catch-up/rebuild tasks |
|
||||
| Publication and health closure | `weed/server/master_block_registry.go`, `weed/server/master_block_failover.go` | what the system reports as ready, degraded, publishable |
|
||||
| External product surfaces | `weed/server/master_grpc_server_block.go`, `weed/server/master_server_handlers_block.go`, debug/diagnostic surfaces | operator-visible truth, not convenience guesses |
|
||||
|
||||
Rule:
|
||||
|
||||
- if the system exposes a condition to master, tester, CSI, or operator tooling, that condition must come from `V2`-named state
|
||||
|
||||
## Assignment-To-Readiness Lifecycle
|
||||
|
||||
The authoritative lifecycle for the current chosen path is:
|
||||
|
||||
```text
|
||||
assignment delivered
|
||||
-> local role applied
|
||||
-> replica receiver or primary shipper configured
|
||||
-> readiness closed
|
||||
-> heartbeat publication
|
||||
-> master registry health/publication
|
||||
```
|
||||
|
||||
More concretely:
|
||||
|
||||
1. master intent is delivered
|
||||
2. `BlockService.ApplyAssignments()` applies local role truth
|
||||
3. the same path wires receiver/shipper runtime
|
||||
4. the same path records named readiness state
|
||||
5. heartbeat publishes only what is actually publish-healthy
|
||||
6. master registry derives lookup/health from explicit readiness, not from allocation alone
|
||||
|
||||
## Named Readiness States
|
||||
|
||||
For the current implementation slice, the service boundary now names:
|
||||
|
||||
1. `roleApplied`
|
||||
2. `receiverReady`
|
||||
3. `shipperConfigured`
|
||||
4. `shipperConnected`
|
||||
5. `replicaEligible`
|
||||
6. `publishHealthy`
|
||||
|
||||
Ownership:
|
||||
|
||||
- owned by `BlockService` / adapter layer
|
||||
- observed by debug surfaces and heartbeat/publication logic
|
||||
- not delegated to `blockvol` as implicit mixed state
|
||||
|
||||
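One way to picture the named readiness states is as explicit flags owned by the service layer. The sketch below is illustrative only: the `Readiness` struct and its methods are hypothetical, and it does not model which states apply to a primary versus a replica.

```go
package main

import "fmt"

// Readiness holds the six named states as explicit flags.
// Names follow the list above; the struct is a hypothetical sketch.
type Readiness struct {
	RoleApplied       bool
	ReceiverReady     bool
	ShipperConfigured bool
	ShipperConnected  bool
	ReplicaEligible   bool
	PublishHealthy    bool
}

// Open returns the names of states that are not yet closed, in the
// order listed above. A debug surface could print this directly.
func (r Readiness) Open() []string {
	states := []struct {
		name   string
		closed bool
	}{
		{"roleApplied", r.RoleApplied},
		{"receiverReady", r.ReceiverReady},
		{"shipperConfigured", r.ShipperConfigured},
		{"shipperConnected", r.ShipperConnected},
		{"replicaEligible", r.ReplicaEligible},
		{"publishHealthy", r.PublishHealthy},
	}
	var open []string
	for _, s := range states {
		if !s.closed {
			open = append(open, s.name)
		}
	}
	return open
}

// ShouldPublish reflects the rule that heartbeat publishes only what
// is actually publish-healthy, not what was merely allocated.
func (r Readiness) ShouldPublish() bool { return r.PublishHealthy }

func main() {
	r := Readiness{RoleApplied: true, ReceiverReady: true}
	fmt.Println("publish:", r.ShouldPublish(), "open:", r.Open())
}
```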
## Current File Map
|
||||
|
||||
### Reuse
|
||||
|
||||
- `weed/storage/blockvol/blockvol.go`
|
||||
- `weed/storage/blockvol/flusher.go`
|
||||
- `weed/storage/blockvol/replica_apply.go`
|
||||
- `weed/storage/blockvol/wal_shipper.go`
|
||||
- `weed/storage/blockvol/shipper_group.go`
|
||||
- `weed/storage/blockvol/dist_group_commit.go`
|
||||
- `weed/storage/blockvol/iscsi/`
|
||||
- `weed/storage/blockvol/nvme/`
|
||||
|
||||
### Adapter boundary
|
||||
|
||||
- `weed/server/volume_server_block.go`
|
||||
- `weed/server/block_heartbeat_loop.go`
|
||||
- `weed/storage/store_blockvol.go`
|
||||
- `weed/server/volume_server_block_debug.go`
|
||||
|
||||
### V2-owned replacement / truth
|
||||
|
||||
- `weed/storage/blockvol/v2bridge/control.go`
|
||||
- `sw-block/engine/replication/`
|
||||
- `weed/server/block_recovery.go`
|
||||
- `weed/server/master_block_registry.go`
|
||||
- `weed/server/master_block_failover.go`
|
||||
- `weed/server/master_grpc_server_block.go`
|
||||
- `weed/server/master_server_handlers_block.go`
|
||||
|
||||
## Immediate Engineering Rule
|
||||
|
||||
When a new bug appears, classify it first:
|
||||
|
||||
1. `v1 reusable core`: local storage or transport mechanics
|
||||
2. `adapter boundary`: assignment/readiness/publication closure bug
|
||||
3. `v2 replacement`: semantic authority, identity, ownership, eligibility, rebuild, or operator-visible truth
|
||||
|
||||
Do not patch semantic authority directly into `blockvol` unless the same change is
|
||||
also reflected as an explicit `V2` state/rule at the service or registry layer.
|
||||
|
||||
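As a first hint when triaging, the file map above can be turned into a lookup from touched file to bucket. This is a sketch, not project tooling: `Bucket` and `BucketForFile` are hypothetical names, and the classification rule itself is about what the bug decides, so the file path is only a starting point.

```go
package main

import (
	"fmt"
	"strings"
)

// Bucket is one of the three classification buckets.
type Bucket string

const (
	ReusableCore    Bucket = "v1 reusable core"
	AdapterBoundary Bucket = "adapter boundary"
	V2Replacement   Bucket = "v2 replacement"
	Unmapped        Bucket = "unmapped"
)

// fileMap copies the "Current File Map" section. Entries ending in
// "/" match a directory prefix; all other entries match exactly.
var fileMap = map[Bucket][]string{
	ReusableCore: {
		"weed/storage/blockvol/blockvol.go", "weed/storage/blockvol/flusher.go",
		"weed/storage/blockvol/replica_apply.go", "weed/storage/blockvol/wal_shipper.go",
		"weed/storage/blockvol/shipper_group.go", "weed/storage/blockvol/dist_group_commit.go",
		"weed/storage/blockvol/iscsi/", "weed/storage/blockvol/nvme/",
	},
	AdapterBoundary: {
		"weed/server/volume_server_block.go", "weed/server/block_heartbeat_loop.go",
		"weed/storage/store_blockvol.go", "weed/server/volume_server_block_debug.go",
	},
	V2Replacement: {
		"weed/storage/blockvol/v2bridge/control.go", "sw-block/engine/replication/",
		"weed/server/block_recovery.go", "weed/server/master_block_registry.go",
		"weed/server/master_block_failover.go", "weed/server/master_grpc_server_block.go",
		"weed/server/master_server_handlers_block.go",
	},
}

// BucketForFile returns the bucket that lists the given path, or
// Unmapped when the path does not appear in the file map.
func BucketForFile(path string) Bucket {
	for bucket, entries := range fileMap {
		for _, e := range entries {
			if path == e || (strings.HasSuffix(e, "/") && strings.HasPrefix(path, e)) {
				return bucket
			}
		}
	}
	return Unmapped
}

func main() {
	fmt.Println(BucketForFile("weed/storage/blockvol/wal_shipper.go"))
	fmt.Println(BucketForFile("weed/server/master_block_registry.go"))
}
```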
## Why This Matters For CP13-8
|
||||
|
||||
`CP13-8` found the exact class of bug this split is meant to expose:
|
||||
|
||||
- allocation/control truth said the replica existed
|
||||
- but runtime publication/read visibility was not yet closed
|
||||
|
||||
That is not a reason to throw away `blockvol`.
|
||||
It is a reason to stop treating mixed `V1` runtime state as if it were already
|
||||
closed `V2` publication truth.
|
||||
@@ -1,80 +0,0 @@
|
||||
# V2 RF2 Runtime Bounded Envelope Review
|
||||
|
||||
Date: 2026-04-05
|
||||
Status: draft
|
||||
Purpose: record the current bounded productionization judgment for the delivered
|
||||
`Phase 19` RF2 working-path envelope
|
||||
|
||||
## Review Outcome
|
||||
|
||||
Current decision:
|
||||
|
||||
1. `stay in bounded validation`
|
||||
2. `not pilot-ready`
|
||||
|
||||
## Why This Is The Correct Outcome
|
||||
|
||||
The delivered `Phase 19` path proves one bounded working RF2 block path:
|
||||
|
||||
1. live transport-backed evidence traffic exists
|
||||
2. continuous Loop 2 service exists
|
||||
3. bounded automatic failover exists
|
||||
4. runtime-managed frontend rebinding exists
|
||||
5. bounded repair/catch-up exists
|
||||
6. one real end-to-end client handoff proof exists
|
||||
7. bounded operator and CSI adapters now exist on top of runtime-owned truth
|
||||
|
||||
But the path is still not broad product/pilot approval because:
|
||||
|
||||
1. the current proof is still bounded to the current runtime harness
|
||||
2. repair/catch-up is not yet broad rebuild lifecycle closure
|
||||
3. CSI and operator surfaces are still bounded adapters rather than full
|
||||
production surfaces
|
||||
4. no broad pilot or rollout evidence exists yet
|
||||
|
||||
## Review Record
|
||||
|
||||
Reviewer reading baseline:
|
||||
|
||||
1. `sw-block/.private/phase/phase-19.md`
|
||||
2. `sw-block/design/v2-rf2-runtime-bounded-envelope.md`
|
||||
3. `sw-block/design/v2-bounded-internal-pilot-pack.md`
|
||||
4. `sw-block/design/v2-pilot-preflight-checklist.md`
|
||||
5. `sw-block/design/v2-pilot-stop-conditions.md`
|
||||
6. `sw-block/design/v2-controlled-rollout-review.md`
|
||||
7. `sw-block/runtime/volumev2/poc_test.go`
|
||||
|
||||
Current evidence package:
|
||||
|
||||
1. runtime-owned failover manager
|
||||
2. continuous Loop 2 service and bounded auto failover
|
||||
3. runtime-managed frontend and bounded repair closure
|
||||
4. end-to-end RF2 handoff proof
|
||||
5. RF2 runtime surface projection and operator surface
|
||||
6. bounded CSI runtime backend adapter
|
||||
|
||||
## Allowed Interpretation
|
||||
|
||||
The review allows only these statements:
|
||||
|
||||
1. one runtime-bearing RF2 kernel slice now exists
|
||||
2. one bounded working RF2 block path now exists
|
||||
3. one bounded productionization artifact set now exists around that path
|
||||
4. later work may widen from this review only through explicit new closure
|
||||
|
||||
The review does NOT allow:
|
||||
|
||||
1. working block product approval
|
||||
2. pilot execution against real product traffic
|
||||
3. rollout expansion beyond bounded internal engineering validation
|
||||
|
||||
## Next Required Closures
|
||||
|
||||
Before any pilot-ready judgment can exist, the next closures must become
|
||||
explicit:
|
||||
|
||||
1. multi-process / multi-host proof for the current working path
|
||||
2. broader rebuild lifecycle closure beyond the bounded repair wrapper
|
||||
3. fuller CSI lifecycle parity on the V2 runtime path
|
||||
4. broader operator/metrics surface closure
|
||||
5. pilot/preflight/containment evidence on top of the `Phase 19` path
|
||||
@@ -1,127 +0,0 @@
|
||||
# V2 RF2 Runtime Bounded Envelope
|
||||
|
||||
Date: 2026-04-05
|
||||
Status: draft
|
||||
Purpose: freeze the bounded productionization envelope around the current
|
||||
`Phase 19` working RF2 block path without overclaiming broad product readiness
|
||||
|
||||
## Reading Rule
|
||||
|
||||
This document defines the strongest bounded envelope currently justified by the
|
||||
delivered `Phase 19` path.
|
||||
|
||||
It does NOT mean:
|
||||
|
||||
1. broad launch approval
|
||||
2. working block product approval
|
||||
3. support for broad frontend or transport matrices
|
||||
4. that remaining runtime/product gaps are minor polish
|
||||
|
||||
It means only:
|
||||
|
||||
1. the current `masterv2 + volumev2 + purev2` RF2 runtime slice has a named,
|
||||
reviewable productionization boundary
|
||||
2. the current support statement, exclusions, and blockers are explicit
|
||||
3. later pilot or rollout work must stay inside this envelope or explicitly widen
|
||||
it with new evidence
|
||||
|
||||
## Envelope Basis
|
||||
|
||||
This envelope is anchored on the delivered `Phase 19` milestones:
|
||||
|
||||
1. `M6`: one live loopback HTTP transport now exists behind the evidence seam
|
||||
2. `M7`: one background Loop 2 service and one bounded auto-failover service now
|
||||
exist
|
||||
3. `M8`: one runtime-managed iSCSI export path and one bounded replica repair
|
||||
wrapper now exist
|
||||
4. `M9`: one end-to-end RF2 handoff proof now exists with continued I/O on the
|
||||
new primary
|
||||
5. `M10`: one bounded operator surface and one bounded CSI runtime backend
|
||||
adapter now exist
|
||||
|
||||
The envelope is therefore about one bounded working RF2 block path, not broad
|
||||
product readiness.
|
||||
|
||||
## Supported Envelope
|
||||
|
||||
The current bounded support statement is:
|
||||
|
||||
1. one bounded working RF2 block path now exists with:
|
||||
- `masterv2` identity/promotion authority
|
||||
- `volumev2` failover, takeover, active Loop 2 service, continuity, repair,
|
||||
frontend rebinding, and projected RF2 surface ownership
|
||||
- `purev2` execution adapter reuse
|
||||
2. one bounded live transport path now carries failover-time evidence and replica
|
||||
summaries
|
||||
3. one bounded real client handoff path now exists:
|
||||
- write through runtime-managed iSCSI export
|
||||
- bounded repair/catch-up on the runtime path
|
||||
- lose primary
|
||||
- auto fail over
|
||||
- reconnect to the new primary
|
||||
- continue I/O
|
||||
4. one bounded outward RF2 surface exists as projection only:
|
||||
- `RF2VolumeSurface`
|
||||
5. one bounded operator/CSI adapter layer exists on top of runtime-owned truth
|
||||
|
||||
## Explicit Exclusions
|
||||
|
||||
The following are OUTSIDE this bounded envelope:
|
||||
|
||||
1. broad multi-process or multi-host deployment approval
|
||||
2. broad transport/frontend matrix approval
|
||||
3. full rebuild orchestration beyond the current bounded repair/catch-up wrapper
|
||||
4. broad CSI lifecycle parity beyond the current bounded runtime backend adapter
|
||||
5. broad operator/API/metrics coverage beyond the current bounded HTTP surface
|
||||
6. broad launch or external customer support statements
|
||||
|
||||
## Current Blockers
|
||||
|
||||
The main blockers between this envelope and a working RF2 block product are:
|
||||
|
||||
1. the current path is still bounded to the current runtime harness rather than
|
||||
broad multi-process approval
|
||||
2. bounded repair/catch-up is not yet broad rebuild lifecycle closure
|
||||
3. CSI rebinding is still a bounded runtime backend adapter, not full lifecycle
|
||||
parity
|
||||
4. the operator surface is still a bounded HTTP view, not a full operational
|
||||
platform surface
|
||||
|
||||
## Allowed Validation Shape
|
||||
|
||||
The allowed validation shape inside this envelope is:
|
||||
|
||||
1. internal engineering validation only
|
||||
2. bounded lab/runtime exercise only
|
||||
3. explicit artifact-driven interpretation only
|
||||
|
||||
The following are NOT allowed interpretations:
|
||||
|
||||
1. "the system is now production ready"
|
||||
2. "the system now supports real automatic failover"
|
||||
3. "the system now supports broad product traffic and rollout"
|
||||
|
||||
## Evidence Anchors
|
||||
|
||||
Read this envelope together with:
|
||||
|
||||
1. `sw-block/.private/phase/phase-19.md`
|
||||
2. `sw-block/design/v2-kernel-closure-review.md`
|
||||
3. `sw-block/design/v2-protocol-claim-and-evidence.md`
|
||||
4. `sw-block/runtime/volumev2/runtime_manager.go`
|
||||
5. `sw-block/runtime/volumev2/continuity_runtime.go`
|
||||
6. `sw-block/runtime/volumev2/rf2_surface.go`
|
||||
7. `sw-block/runtime/volumev2/loop2_service.go`
|
||||
8. `sw-block/runtime/volumev2/frontend_runtime.go`
|
||||
9. `sw-block/runtime/volumev2/operator_surface.go`
|
||||
10. `sw-block/runtime/volumev2/poc_test.go`
|
||||
11. `weed/storage/blockvol/csi/v2_runtime_backend.go`
|
||||
|
||||
## Envelope Output
|
||||
|
||||
The correct current reading of this envelope is:
|
||||
|
||||
1. runtime-bearing RF2 kernel slice: yes
|
||||
2. bounded working RF2 block path: yes
|
||||
3. bounded productionization artifact set: yes
|
||||
4. pilot-ready broad product path: no
|
||||
@@ -1,249 +0,0 @@
|
||||
# V2 Scenario Sources From V1 and V1.5
|
||||
|
||||
Date: 2026-03-27
|
||||
|
||||
## Purpose
|
||||
|
||||
This document distills V1 / V1.5 real-test material into V2 scenario inputs.
|
||||
|
||||
Sources:
|
||||
|
||||
- `learn/projects/sw-block/phases/phase13_test.md`
|
||||
- `learn/projects/sw-block/phases/phase-13-v2-boundary-tests.md`
|
||||
|
||||
This is not the active scenario backlog.
|
||||
|
||||
Use:
|
||||
|
||||
- `v2_scenarios.md` for the active V2 scenario set
|
||||
- this file for historical source and rationale
|
||||
|
||||
## How To Use This File
|
||||
|
||||
For each item below:
|
||||
|
||||
1. keep the real V1/V1.5 test as implementation evidence
|
||||
2. create or maintain a V2 simulator scenario for the protocol core
|
||||
3. define the expected V2 behavior explicitly
|
||||
|
||||
## Source Buckets
|
||||
|
||||
### 1. Core protocol behavior
|
||||
|
||||
These are the highest-value simulator inputs.
|
||||
|
||||
- barrier durability truth
|
||||
- reconnect + catch-up
|
||||
- non-convergent catch-up -> rebuild
|
||||
- rebuild fallback
|
||||
- failover / promotion safety
|
||||
- WAL retention / tail-chasing
|
||||
- durability mode semantics
|
||||
|
||||
Recommended V2 treatment:
|
||||
|
||||
- `sim_core`
|
||||
|
||||
### 2. Supporting invariants
|
||||
|
||||
These matter, but usually as reduced simulator checks.
|
||||
|
||||
- canonical address handling
|
||||
- replica role/epoch gating
|
||||
- committed-prefix rules
|
||||
- rebuild publication cleanup
|
||||
- assignment refresh behavior
|
||||
|
||||
Recommended V2 treatment:
|
||||
|
||||
- `sim_reduced`
|
||||
|
||||
### 3. Real-only implementation behavior
|
||||
|
||||
These should usually stay in real-engine tests.
|
||||
|
||||
- actual wire encoding / decode bugs
|
||||
- real disk / `fdatasync` timing
|
||||
- NVMe / iSCSI frontend behavior
|
||||
- Go concurrency artifacts tied to concrete implementation
|
||||
|
||||
Recommended V2 treatment:
|
||||
|
||||
- `real_only`
|
||||
|
||||
### 4. V2 boundary items
|
||||
|
||||
These are especially important.
|
||||
|
||||
They should remain visible as:
|
||||
|
||||
- current V1/V1.5 limitation
|
||||
- explicit V2 acceptance target
|
||||
|
||||
Recommended V2 treatment:
|
||||
|
||||
- `v2_boundary`
|
||||
|
||||
## Distilled Scenario Inputs
|
||||
|
||||
### A. Barrier truth uses durable replica progress
|
||||
|
||||
Real source:
|
||||
|
||||
- Phase 13 barrier / `replicaFlushedLSN` tests
|
||||
|
||||
Why it matters:
|
||||
|
||||
- commit must follow durable replica progress, not send progress
|
||||
|
||||
V2 target:
|
||||
|
||||
- barrier completion counted only from explicit durable progress state
|
||||
|
||||
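The distinction between send progress and durable progress can be illustrated with a small check. This is a sketch under assumptions: the `ReplicaProgress` struct and `BarrierSatisfied` function are hypothetical, and requiring every replica to be durable assumes a `sync_all`-style mode.

```go
package main

import "fmt"

// ReplicaProgress separates what the primary has sent from what the
// replica has reported as durably flushed. Field names are assumptions.
type ReplicaProgress struct {
	SentLSN    uint64
	FlushedLSN uint64
}

// BarrierSatisfied reports whether every replica has durably flushed
// up to barrierLSN. SentLSN is deliberately ignored: sending an entry
// is not evidence that it survived on the replica.
func BarrierSatisfied(barrierLSN uint64, replicas []ReplicaProgress) bool {
	for _, r := range replicas {
		if r.FlushedLSN < barrierLSN {
			return false
		}
	}
	return true
}

func main() {
	replicas := []ReplicaProgress{{SentLSN: 100, FlushedLSN: 90}}
	fmt.Println("barrier at 100 satisfied:", BarrierSatisfied(100, replicas))
	replicas[0].FlushedLSN = 100
	fmt.Println("barrier at 100 satisfied:", BarrierSatisfied(100, replicas))
}
```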
### B. Same-address transient outage
|
||||
|
||||
Real source:
|
||||
|
||||
- Phase 13 reconnect / catch-up tests
|
||||
- `CP13-8` short outage recovery
|
||||
|
||||
Why it matters:
|
||||
|
||||
- proves cheap short-gap recovery path
|
||||
|
||||
V2 target:
|
||||
|
||||
- explicit recoverability check
|
||||
- catch-up if recoverable
|
||||
- rebuild otherwise
|
||||
|
||||
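The V2 target above can be sketched as a single decision function. The recoverability rule used here (the next entry the replica needs must still be retained in the primary's WAL) is an assumption for illustration, as are the names `RecoveryPlan` and `PlanRecovery`.

```go
package main

import "fmt"

// RecoveryPlan is the outcome of the recoverability check.
type RecoveryPlan string

const (
	NoRecovery RecoveryPlan = "none"
	CatchUp    RecoveryPlan = "catch-up"
	Rebuild    RecoveryPlan = "rebuild"
)

// PlanRecovery decides how a returning replica is brought back.
// replicaLSN is the last LSN the replica holds durably; the primary
// retains WAL entries in [walTailLSN, walHeadLSN]. Catch-up is chosen
// only if entry replicaLSN+1 is still retained (assumed rule).
func PlanRecovery(replicaLSN, walTailLSN, walHeadLSN uint64) RecoveryPlan {
	if replicaLSN >= walHeadLSN {
		return NoRecovery
	}
	if replicaLSN+1 >= walTailLSN {
		return CatchUp
	}
	return Rebuild
}

func main() {
	fmt.Println(PlanRecovery(100, 50, 120))  // gap fully retained
	fmt.Println(PlanRecovery(100, 110, 120)) // entries 101..109 recycled
}
```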
### C. Changed-address restart
|
||||
|
||||
Real source:
|
||||
|
||||
- `CP13-8 T4b`
|
||||
- changed-address refresh fixes
|
||||
|
||||
Why it matters:
|
||||
|
||||
- endpoint is not identity
|
||||
- stale endpoint must not remain authoritative
|
||||
|
||||
V2 target:
|
||||
|
||||
- heartbeat/control-plane learns new endpoint
|
||||
- reassignment updates sender target
|
||||
- recovery session starts only after endpoint truth is updated
|
||||
|
||||
### D. Non-convergent catch-up / tail-chasing
|
||||
|
||||
Real source:
|
||||
|
||||
- Phase 13 retention + catch-up + rebuild fallback line
|
||||
|
||||
Why it matters:
|
||||
|
||||
- “catch-up exists” is not enough
|
||||
- must know when to stop and rebuild
|
||||
|
||||
V2 target:
|
||||
|
||||
- explicit `CatchingUp -> NeedsRebuild`
|
||||
- no fake success
|
||||
|
||||
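A bounded catch-up loop with an explicit escalation can be sketched as follows. The round budget is an assumed stopping rule chosen for illustration; the source names the `CatchingUp -> NeedsRebuild` transition but not its trigger. `AfterRounds` and the `InSync` state name are hypothetical.

```go
package main

import "fmt"

// State is the replica recovery state in this sketch.
type State string

const (
	CatchingUp   State = "CatchingUp"
	InSync       State = "InSync"
	NeedsRebuild State = "NeedsRebuild"
)

// AfterRounds takes the remaining gap observed after each catch-up
// round and returns the resulting state. If the gap has not closed
// within maxRounds, the replica is declared NeedsRebuild rather than
// being left to chase the tail indefinitely.
func AfterRounds(gaps []uint64, maxRounds int) State {
	for i, gap := range gaps {
		if gap == 0 {
			return InSync
		}
		if i+1 >= maxRounds {
			return NeedsRebuild
		}
	}
	return CatchingUp
}

func main() {
	fmt.Println(AfterRounds([]uint64{50, 20, 0}, 5))  // converges
	fmt.Println(AfterRounds([]uint64{50, 60, 70}, 3)) // tail-chasing
}
```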
### E. Slow control-plane recovery
|
||||
|
||||
Real source:
|
||||
|
||||
- `CP13-8 T4b` hardware behavior before fix
|
||||
|
||||
Why it matters:
|
||||
|
||||
- safety can be correct while availability recovery is poor
|
||||
|
||||
V2 target:
|
||||
|
||||
- explicit fast recovery path when possible
|
||||
- explicit fallback when only control-plane repair can help
|
||||
|
||||
### F. Stale message / delayed ack fencing
|
||||
|
||||
Real source:
|
||||
|
||||
- Phase 13 epoch/fencing tests
|
||||
- V2 scenario work already mirrors this
|
||||
|
||||
Why it matters:
|
||||
|
||||
- old lineage must not mutate committed prefix
|
||||
|
||||
V2 target:
|
||||
|
||||
- stale message rejection is explicit and testable
|
||||
|
||||
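Explicit rejection can be illustrated by checking an ack's lineage before it is allowed to change any progress state. In this sketch lineage is modeled as epoch plus session ID, which is an assumption; `Ack`, `Sender`, and `ApplyAck` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrStale marks a message from an old lineage.
var ErrStale = errors.New("stale message rejected")

// Ack is a barrier acknowledgment tagged with its lineage.
type Ack struct {
	Epoch      uint64
	SessionID  uint64
	FlushedLSN uint64
}

// Sender tracks the current lineage and the durable progress of one
// replica as seen by the primary.
type Sender struct {
	Epoch      uint64
	SessionID  uint64
	DurableLSN uint64
}

// ApplyAck rejects any ack whose epoch or session does not match the
// current lineage. Only a matching ack may advance DurableLSN, and
// progress never moves backwards.
func (s *Sender) ApplyAck(a Ack) error {
	if a.Epoch != s.Epoch || a.SessionID != s.SessionID {
		return ErrStale
	}
	if a.FlushedLSN > s.DurableLSN {
		s.DurableLSN = a.FlushedLSN
	}
	return nil
}

func main() {
	s := &Sender{Epoch: 7, SessionID: 3, DurableLSN: 100}
	err := s.ApplyAck(Ack{Epoch: 6, SessionID: 3, FlushedLSN: 500})
	fmt.Println(err, s.DurableLSN)
	err = s.ApplyAck(Ack{Epoch: 7, SessionID: 3, FlushedLSN: 120})
	fmt.Println(err, s.DurableLSN)
}
```

Returning a distinct error value makes the rejection observable in tests, which is what "explicit and testable" asks for.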
### G. Promotion candidate safety
|
||||
|
||||
Real source:
|
||||
|
||||
- failover / promotion gating tests
|
||||
- V2 candidate-selection work
|
||||
|
||||
Why it matters:
|
||||
|
||||
- wrong promotion loses committed lineage
|
||||
|
||||
V2 target:
|
||||
|
||||
- candidate must satisfy:
|
||||
- running
|
||||
- epoch aligned
|
||||
- state eligible
|
||||
- committed-prefix sufficient
|
||||
|
||||
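The four candidate conditions above can be written as one check that also reports why a candidate was refused. This is a hedged sketch: the `Candidate` fields, the `PromotionRejections` name, and the reading of "committed-prefix sufficient" as "durable LSN is at least the committed LSN" are assumptions.

```go
package main

import "fmt"

// Candidate is a replica being considered for promotion.
type Candidate struct {
	Running       bool
	Epoch         uint64
	StateEligible bool
	DurableLSN    uint64
}

// PromotionRejections returns the reasons a candidate fails the four
// conditions above. An empty result means the candidate is eligible.
func PromotionRejections(c Candidate, currentEpoch, committedLSN uint64) []string {
	var reasons []string
	if !c.Running {
		reasons = append(reasons, "not running")
	}
	if c.Epoch != currentEpoch {
		reasons = append(reasons, "epoch not aligned")
	}
	if !c.StateEligible {
		reasons = append(reasons, "state not eligible")
	}
	if c.DurableLSN < committedLSN {
		reasons = append(reasons, "committed prefix insufficient")
	}
	return reasons
}

func main() {
	good := Candidate{Running: true, Epoch: 4, StateEligible: true, DurableLSN: 200}
	lagging := Candidate{Running: true, Epoch: 4, StateEligible: true, DurableLSN: 150}
	fmt.Println(PromotionRejections(good, 4, 200))
	fmt.Println(PromotionRejections(lagging, 4, 200))
}
```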
### H. Rebuild boundary after failed catch-up
|
||||
|
||||
Real source:
|
||||
|
||||
- Phase 13 rebuild fallback behavior
|
||||
|
||||
Why it matters:
|
||||
|
||||
- rebuild is required when retained WAL cannot safely close the gap
|
||||
|
||||
V2 target:
|
||||
|
||||
- rebuild is explicit fallback, not ad hoc recovery
|
||||
|
||||
## Immediate Feed Into `v2_scenarios.md`
|
||||
|
||||
These are the most important V1/V1.5-derived V2 scenarios:
|
||||
|
||||
1. same-address transient outage
|
||||
2. changed-address restart
|
||||
3. non-convergent catch-up / tail-chasing
|
||||
4. stale delayed message / barrier ack rejection
|
||||
5. committed-prefix-safe promotion
|
||||
6. control-plane-latency recovery shape
|
||||
|
||||
## What Should Not Be Copied Blindly
|
||||
|
||||
Do not clone every real-engine test into the simulator.
|
||||
|
||||
Do not use the simulator for:
|
||||
|
||||
- exact OS timing
|
||||
- exact socket/wire bugs
|
||||
- exact block frontend behavior
|
||||
- implementation-specific lock races
|
||||
|
||||
Instead:
|
||||
|
||||
- extract the protocol invariant
|
||||
- model the reduced scenario if the protocol value is high
|
||||
|
||||
## Bottom Line
|
||||
|
||||
V1 / V1.5 tests should feed V2 in two ways:
|
||||
|
||||
1. as historical evidence of what failed or mattered in real life
|
||||
2. as scenario seeds for the V2 simulator and acceptance backlog
|
||||
@@ -1,135 +0,0 @@
|
||||
# V2 Separation Port Layer Audit
|
||||
|
||||
Date: 2026-04-04
|
||||
Status: active
|
||||
|
||||
## Purpose
|
||||
|
||||
This note audits the current `sw-block` port layer for the separation effort:
|
||||
|
||||
1. define which contracts already belong in `sw-block`
|
||||
2. identify what was still underspecified or mismatched
|
||||
3. record the normalized boundary for future migration batches
|
||||
|
||||
## Current Port Layer
|
||||
|
||||
The current reusable boundary inside `sw-block` is:
|
||||
|
||||
1. `sw-block/bridge/blockvol/contract.go`
|
||||
2. `sw-block/bridge/blockvol/storage_adapter.go`
|
||||
3. `sw-block/bridge/blockvol/control_adapter.go`
|
||||
|
||||
These files are the intended weed-free bridge between:
|
||||
|
||||
1. `sw-block/engine/replication`
|
||||
2. `weed/storage/blockvol/v2bridge`
|
||||
3. `weed/server/*` adapter code
|
||||
|
||||
## Audited Contracts
|
||||
|
||||
### Storage state port
|
||||
|
||||
File:
|
||||
|
||||
1. `sw-block/bridge/blockvol/contract.go`
|
||||
|
||||
Stable contract:
|
||||
|
||||
1. `BlockVolReader`
|
||||
2. `BlockVolState`
|
||||
|
||||
This is already the right ownership:
|
||||
|
||||
1. `sw-block` owns the shape of retained-history inputs
|
||||
2. `weed/` only implements how to read those facts from real `BlockVol`
|
||||
|
||||
### Retention / snapshot pinning port
|
||||
|
||||
File:
|
||||
|
||||
1. `sw-block/bridge/blockvol/contract.go`
|
||||
|
||||
Stable contract:
|
||||
|
||||
1. `BlockVolPinner`
|
||||
|
||||
This remains correct because:
|
||||
|
||||
1. pin lifecycle meaning belongs to the V2 recovery driver
|
||||
2. actual hold/release mechanics remain weed-side implementation detail
|
||||
|
||||
### Recovery execution port
|
||||
|
||||
Previous issue:
|
||||
|
||||
1. `BlockVolExecutor` in `contract.go` did not match the real engine execution
|
||||
interfaces precisely
|
||||
2. in particular, rebuild full-base transfer in the engine returns achieved LSN,
|
||||
but the contract only returned `error`
|
||||
|
||||
Normalized decision:
|
||||
|
||||
1. `sw-block` now names:
|
||||
- `BlockVolCatchUpIO`
|
||||
- `BlockVolRebuildIO`
|
||||
- `BlockVolExecutor`
|
||||
2. these contracts intentionally match:
|
||||
- `engine.CatchUpIO`
|
||||
- `engine.RebuildIO`
|
||||
|
||||
This is the right long-term boundary because:
|
||||
|
||||
1. `sw-block` owns the execution port shape
|
||||
2. `weed/storage/blockvol/v2bridge.Executor` remains only one implementation
|
||||
3. future migration can move execution code without changing engine contracts
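
The interface-compatibility idea behind this decision can be sketched in Go. This is a hypothetical illustration: the method names and signatures (`StreamWALEntries`, `TransferFullBase`) and the `Engine*` interface names are assumptions, not the real `engine` or `sw-block` definitions. It shows only the mechanism: compile-time assertions that the bridge-side contracts satisfy the engine-side ports, including a rebuild transfer that returns the achieved LSN instead of only `error`.

```go
package main

import (
	"errors"
	"fmt"
)

// Engine-side ports. Shapes are assumed for illustration only.
type EngineCatchUpIO interface {
	StreamWALEntries(startExclusive, endInclusive uint64) (uint64, error)
}

type EngineRebuildIO interface {
	// Returns the achieved LSN, not only an error.
	TransferFullBase(committedLSN uint64) (uint64, error)
}

// Bridge-side contracts, declared to mirror the engine ports exactly.
type BlockVolCatchUpIO interface {
	StreamWALEntries(startExclusive, endInclusive uint64) (uint64, error)
}

type BlockVolRebuildIO interface {
	TransferFullBase(committedLSN uint64) (uint64, error)
}

type BlockVolExecutor interface {
	BlockVolCatchUpIO
	BlockVolRebuildIO
}

// Compile-time proofs: if a bridge contract drifts from the engine port,
// these assignments stop compiling.
var (
	_ EngineCatchUpIO = BlockVolExecutor(nil)
	_ EngineRebuildIO = BlockVolExecutor(nil)
)

// fakeExecutor stands in for one implementation of the contract.
type fakeExecutor struct{ head uint64 }

func (f *fakeExecutor) StreamWALEntries(startExclusive, endInclusive uint64) (uint64, error) {
	if endInclusive < startExclusive {
		return 0, errors.New("invalid LSN range")
	}
	return endInclusive, nil
}

func (f *fakeExecutor) TransferFullBase(committedLSN uint64) (uint64, error) {
	return f.head, nil
}

var _ BlockVolExecutor = (*fakeExecutor)(nil)

func main() {
	var port EngineRebuildIO = &fakeExecutor{head: 42}
	achieved, err := port.TransferFullBase(40)
	fmt.Println(achieved, err)
}
```

Because the assertions are package-level `var _` declarations, a signature mismatch such as a contract that returns only `error` fails at build time rather than in a running recovery.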
|
||||
|
||||
### Assignment translation helper port
|
||||
|
||||
Normalized helper layer:
|
||||
|
||||
1. `ReplicaAssignmentForServer()`
|
||||
2. `RecoveryTargetForRole()`
|
||||
|
||||
These are now the canonical helper rules in:
|
||||
|
||||
1. `sw-block/bridge/blockvol/control_adapter.go`
|
||||
|
||||
They exist to stop identity / recovery-target mapping from drifting between:
|
||||
|
||||
1. `weed/storage/blockvol/v2bridge/control.go`
|
||||
2. `weed/server/volume_server_block.go`
|
||||
|
||||
## Code Normalization Completed
|
||||
|
||||
Implemented in this batch:
|
||||
|
||||
1. `sw-block/bridge/blockvol/doc.go`
|
||||
- clarified that the package owns weed-free contracts and thin adapters,
|
||||
not real blockvol implementations
|
||||
2. `sw-block/bridge/blockvol/contract.go`
|
||||
- aligned execution contracts with engine IO interfaces
|
||||
3. `sw-block/bridge/blockvol/control_adapter.go`
|
||||
- extracted canonical helper functions for identity and recovery-target
|
||||
mapping
|
||||
4. `sw-block/bridge/blockvol/bridge_test.go`
|
||||
- added interface-compatibility proof for the normalized execution contracts
|
||||
|
||||
## Resulting Boundary Rule
|
||||
|
||||
After this audit, the port layer rule is:
|
||||
|
||||
1. `sw-block` defines contracts and canonical mapping helpers
|
||||
2. `weed/` implements real storage, transport, and runtime bindings
|
||||
3. no `sw-block` package in this layer should import `weed/`
|
||||
|
||||
## What Still Does Not Move Yet
|
||||
|
||||
This audit does NOT move:
|
||||
|
||||
1. `weed/storage/blockvol/v2bridge.Executor`
|
||||
2. `weed/storage/blockvol/v2bridge.Reader`
|
||||
3. `weed/storage/blockvol/v2bridge.Pinner`
|
||||
4. `weed/server/BlockService`
|
||||
5. `weed/server/RecoveryManager`
|
||||
|
||||
It only stabilizes the port layer those migrations will target.
|
||||
@@ -73,6 +73,28 @@ Do not reuse tests that encode V1.5 semantics that V2 intentionally removed:
|
||||
| `Restore Ready` | exact base restore and snapshot-tail recovery are safe and exact | snapshot boundary, integrity, partial-failure safety, tail convergence |
|
||||
| `V2 Ready` | primary-owned assignment/session/projection semantics close on real flows | bootstrap, keepup, catchup, rebuild, failover, rejoin, publish gating |
|
||||
|
||||
## Matrix Linkage
|
||||
|
||||
Read this matrix together with:
|
||||
|
||||
1. `v2-capability-map.md` for ownership of the capability tier
|
||||
2. `v2-integration-matrix.md` for real scenario coverage
|
||||
|
||||
This matrix answers "is the capability closed at all?" It does not by itself
|
||||
guarantee that enough real topology/workload/failure scenarios have been
|
||||
exercised. That second question belongs to the integration matrix.
|
||||
|
||||
| Validation rows | Capability tier | Primary protocol refs | Main integration refs |
|
||||
|---|---|---|---|
|
||||
| `R1`-`R12` | Tier 3: RF=2 Recovery And Failover | `v2-sync-recovery-protocol.md`, `v2-rebuild-mvp-session-protocol.md` | `I-R1`-`I-R8` |
|
||||
| `S1`-`S10` | Tier 5: Lifecycle Capability | `v2-rebuild-mvp-session-protocol.md` and snapshot/restore execution rules | `I-S1`-`I-S4` |
|
||||
| `V1`-`V2` | Tier 2: RF=2 Replication Base | `v2-sync-recovery-protocol.md` | `I-V1`, `I-V2` |
|
||||
| `V3`-`V8` | Tier 3: RF=2 Recovery And Failover | `v2-sync-recovery-protocol.md`, `v2-rebuild-mvp-session-protocol.md` | `I-V3`, `I-V4`, `I-V5` |
|
||||
| `V9`-`V10` | Tier 4: Multi-Replica Runtime (`RF>=3`) | `v2-sync-recovery-protocol.md` | future RF>=3 integrated rows; currently bounded engine/component proof only |
|
||||
| `V11` | Tier 3 / Tier 8 boundary | recovery protocol plus launch-envelope disturbance claims | `I-V5` |
|
||||
| `V12`-`V13` | Tier 6: Control Plane And Operations | ownership / observability docs and control-plane protocol surfaces | `I-V4`, `I-V6` |
|
||||
| `V14` | Tier 0: Semantic Foundation | `v2-protocol-truths.md`, `v2-sync-recovery-protocol.md` | exercised across `I-V*` and negative component packs |
|
||||
|
||||
## Matrix A: Rebuild Ready
|
||||
|
||||
| ID | Priority | Scenario | Trigger / entry | Reuse | Main proof | Final validation | Coverage | File | Evidence |
|
||||
|
||||
@@ -1,151 +0,0 @@
|
||||
# V2 VolumeV2 Single-Node MVP
|
||||
|
||||
Date: 2026-04-05
|
||||
Status: active
|
||||
|
||||
## Purpose
|
||||
|
||||
This note defines the target shape for a single-node `volumev2` MVP that can
|
||||
ship as a normal block service before HA/failover exists.
|
||||
|
||||
The core idea is:
|
||||
|
||||
1. `masterv2` is fully new control ownership
|
||||
2. `volumev2` is a new shell and brain host
|
||||
3. `blockvol` and related backend mechanics remain reusable muscles
|
||||
|
||||
## Target Layering
|
||||
|
||||
`volumev2` should be strengthened around four layers.
|
||||
|
||||
### 1. Engine
|
||||
|
||||
Owner:
|
||||
|
||||
- `sw-block/engine/replication/`
|
||||
|
||||
Responsibility:
|
||||
|
||||
1. state
|
||||
2. event ingestion
|
||||
3. command emission
|
||||
4. outward projection
|
||||
|
||||
Rule:
|
||||
|
||||
- semantic truth lives here
|
||||
- no backend I/O or network ownership
|
||||
|
||||
### 2. Engine Interface
|
||||
|
||||
Owner:
|
||||
|
||||
- command/event vocabulary between control/runtime and backend execution
|
||||
|
||||
Responsibility:
|
||||
|
||||
1. assignment -> event translation
|
||||
2. observation -> event translation
|
||||
3. command -> execution dispatch contract
|
||||
|
||||
Rule:
|
||||
|
||||
- runtime shell may not mutate engine truth directly
|
||||
|
||||
### 3. Control Plane
|
||||
|
||||
Owner:
|
||||
|
||||
- `masterv2 <-> volumev2` coordination
|
||||
|
||||
Responsibility:
|
||||
|
||||
1. node identity
|
||||
2. registration and heartbeat
|
||||
3. assignment receipt
|
||||
4. state reporting
|
||||
5. future recovery-control vocabulary (`keepup`, `catchup`, `rebuild`)
|
||||
|
||||
Rule:
|
||||
|
||||
- control plane carries protocol messages
|
||||
- it does not own local data execution
|
||||
|
||||
### 4. Data Plane
|
||||
|
||||
Owner:
|
||||
|
||||
- local storage and serving mechanics
|
||||
|
||||
Responsibility:
|
||||
|
||||
1. WAL/extent management
|
||||
2. read/write/flush
|
||||
3. background workers
|
||||
4. receiver/shipper mechanics
|
||||
5. NVMe/iSCSI/frontend serving
|
||||
|
||||
Rule:
|
||||
|
||||
- data plane knows how to execute
|
||||
- it does not define publication or role semantics
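
The layering above can be sketched as a small Go program. Every type and name here (`Event`, `Command`, `Projection`, `Dispatch`, the `"assignment"` kind, the `open-as-` command prefix) is hypothetical and chosen for illustration; none is taken from the `sw-block` code. The sketch assumes the engine is a pure state machine, the runtime shell only forwards events and commands, and the data plane only executes.

```go
package main

import "fmt"

// Event is what the engine-interface layer feeds into the engine.
type Event struct {
	Kind   string // hypothetical vocabulary: "assignment" or "observation"
	Volume string
	Role   string
	Epoch  uint64
}

// Command is what the engine emits for the data plane to execute.
type Command struct {
	Op     string
	Volume string
}

// Projection is the engine's outward, read-only view of one volume.
type Projection struct {
	Role  string
	Epoch uint64
}

// Engine holds semantic truth only; it performs no backend I/O or network calls.
type Engine struct {
	state map[string]Projection
}

func NewEngine() *Engine { return &Engine{state: map[string]Projection{}} }

// Apply ingests one event, updates state, and returns the commands to run.
// A stale or repeated assignment (epoch not newer) yields no commands.
func (e *Engine) Apply(ev Event) []Command {
	if ev.Kind != "assignment" {
		return nil
	}
	if cur, ok := e.state[ev.Volume]; ok && ev.Epoch <= cur.Epoch {
		return nil
	}
	e.state[ev.Volume] = Projection{Role: ev.Role, Epoch: ev.Epoch}
	return []Command{{Op: "open-as-" + ev.Role, Volume: ev.Volume}}
}

// Project exposes state without allowing mutation.
func (e *Engine) Project(volume string) (Projection, bool) {
	p, ok := e.state[volume]
	return p, ok
}

// Executor is the data plane: it knows how to execute, not what roles mean.
type Executor interface {
	Execute(Command) error
}

type logExecutor struct{ log []string }

func (l *logExecutor) Execute(c Command) error {
	l.log = append(l.log, c.Op+":"+c.Volume)
	return nil
}

// Dispatch is the runtime shell: it never mutates engine state directly.
func Dispatch(e *Engine, x Executor, ev Event) error {
	for _, c := range e.Apply(ev) {
		if err := x.Execute(c); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	eng := NewEngine()
	exec := &logExecutor{}
	ev := Event{Kind: "assignment", Volume: "vol1", Role: "primary", Epoch: 1}
	_ = Dispatch(eng, exec, ev)
	_ = Dispatch(eng, exec, ev) // a repeated assignment emits no new command
	p, _ := eng.Project("vol1")
	fmt.Println(p.Role, p.Epoch, len(exec.log))
}
```

Keeping `Apply` free of I/O is what lets the same assignment be replayed safely: the second dispatch changes neither the projection nor the executor log.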
|
||||
|
||||
## Single-Node MVP Contract
|
||||
|
||||
The first ship-capable `volumev2` slice should include:
|
||||
|
||||
1. `masterv2` declaration of one RF1 primary volume
|
||||
2. `volumev2` control session to fetch assignments
|
||||
3. local create/open through reused `blockvol`
|
||||
4. local primary assignment application through the V2 engine
|
||||
5. local read/write plus restart durability
|
||||
6. debug/status snapshot
|
||||
7. one small executable entrypoint for smoke usage
|
||||
|
||||
The first slice explicitly excludes:
|
||||
|
||||
1. failover
|
||||
2. RF2 replication
|
||||
3. catch-up/rebuild ownership
|
||||
4. CSI
|
||||
|
||||
## Why This Is Enough
|
||||
|
||||
This is enough to prove:
|
||||
|
||||
1. the `masterv2 + volumev2` head is viable
|
||||
2. `volumev2` can host V2 semantics while reusing V1 muscles
|
||||
3. a useful non-HA block service can exist before HA complexity is added
|
||||
|
||||
## Module Shape
|
||||
|
||||
Recommended package split:
|
||||
|
||||
1. `sw-block/runtime/masterv2/`
|
||||
2. `sw-block/runtime/volumev2/`
|
||||
3. `sw-block/runtime/purev2/`
|
||||
4. `sw-block/engine/replication/`
|
||||
5. `sw-block/bridge/blockvol/`
|
||||
|
||||
Within `volumev2`, strengthen toward:
|
||||
|
||||
1. `control_session.go`
|
||||
2. `orchestrator.go`
|
||||
3. `node.go`
|
||||
4. later: `heartbeat.go`, `frontend.go`, `workers.go`
|
||||
|
||||
## Stage Gate
|
||||
|
||||
`volumev2` may be treated as a single-node MVP only when:
|
||||
|
||||
1. assignment sync is repeatable and idempotent
|
||||
2. local IO is data-verified
|
||||
3. restart/open path is proven
|
||||
4. status/debug state is explicit
|
||||
5. no `weed/server` lifecycle owner is required
|
||||
|
||||
## Related References
|
||||
|
||||
- `v2-pure-runtime-rf1-bootstrap.md`
|
||||
- `v2-proof-and-retest-pyramid.md`
|
||||
- `v2-capability-map.md`
|
||||
@@ -196,6 +196,11 @@ func (ms *MasterServer) failoverBlockVolumes(deadServer string) {
|
||||
}
|
||||
ms.blockRegistry.FailoversTotal.Add(1)
|
||||
entries := ms.blockRegistry.ListByServer(deadServer)
|
||||
glog.V(0).Infof("failover: deadServer=%s entries=%d", deadServer, len(entries))
|
||||
for i, e := range entries {
|
||||
glog.V(0).Infof("failover: entry[%d] name=%q vs=%s role=%d hasReplica=%v epoch=%d",
|
||||
i, e.Name, e.VolumeServer, e.Role, e.HasReplica(), e.Epoch)
|
||||
}
|
||||
now := time.Now()
|
||||
for _, entry := range entries {
|
||||
// Case 1: Dead server is the primary.
|
||||
|
||||
@@ -111,6 +111,12 @@ type BlockVolumeEntry struct {
|
||||
LastLeaseGrant time.Time
|
||||
LeaseTTL time.Duration
|
||||
|
||||
// Registration race protection: the time this entry was created/registered
|
||||
// by the master. Stale cleanup skips recently registered entries to allow
|
||||
// the volume server time to discover the volume and include it in its
|
||||
// next heartbeat inventory.
|
||||
RegisteredAt time.Time
|
||||
|
||||
// CP11A-2: Coordinated expand tracking.
|
||||
ExpandInProgress bool
|
||||
ExpandFailed bool // true = primary committed but replica(s) failed; size suppressed
|
||||
@@ -424,6 +430,9 @@ func (r *BlockVolumeRegistry) Register(entry *BlockVolumeEntry) error {
|
||||
if _, ok := r.volumes[entry.Name]; ok {
|
||||
return fmt.Errorf("block volume %q already registered", entry.Name)
|
||||
}
|
||||
if entry.RegisteredAt.IsZero() {
|
||||
entry.RegisteredAt = time.Now()
|
||||
}
|
||||
entry.recomputeReplicaState()
|
||||
r.volumes[entry.Name] = entry
|
||||
r.addToServer(entry.VolumeServer, entry.Name)
|
||||
@@ -642,6 +651,14 @@ func (r *BlockVolumeRegistry) UpdateFullHeartbeatWithInventoryAuthority(server s
|
||||
name, server)
|
||||
continue
|
||||
}
|
||||
// Registration race protection: skip recently registered entries.
|
||||
// The VS may not have discovered the volume yet. Grace period
|
||||
// of 30s (> 2 heartbeat intervals) prevents premature deletion.
|
||||
if !entry.RegisteredAt.IsZero() && time.Since(entry.RegisteredAt) < 30*time.Second {
|
||||
glog.V(0).Infof("block registry: skipping stale-cleanup for %q (registered %v ago, grace period)",
|
||||
name, time.Since(entry.RegisteredAt).Round(time.Second))
|
||||
continue
|
||||
}
|
||||
delete(r.volumes, name)
|
||||
delete(names, name)
|
||||
// Also clean up replica entries from byServer.
|
||||
|
||||
@@ -88,8 +88,9 @@ func TestRegistry_ListByServer(t *testing.T) {
|
||||
func TestRegistry_UpdateFullHeartbeat(t *testing.T) {
|
||||
r := NewBlockVolumeRegistry()
|
||||
// Register two volumes on server s1.
|
||||
-r.Register(&BlockVolumeEntry{Name: "vol1", VolumeServer: "s1", Path: "/v1.blk", Status: StatusPending})
-r.Register(&BlockVolumeEntry{Name: "vol2", VolumeServer: "s1", Path: "/v2.blk", Status: StatusPending})
+pastGrace := time.Now().Add(-60 * time.Second)
+r.Register(&BlockVolumeEntry{Name: "vol1", VolumeServer: "s1", Path: "/v1.blk", Status: StatusPending, RegisteredAt: pastGrace})
+r.Register(&BlockVolumeEntry{Name: "vol2", VolumeServer: "s1", Path: "/v2.blk", Status: StatusPending, RegisteredAt: pastGrace})
|
||||
|
||||
// Full heartbeat reports only vol1 (vol2 is stale).
|
||||
r.UpdateFullHeartbeat("s1", []*master_pb.BlockVolumeInfoMessage{
|
||||
@@ -127,7 +128,7 @@ func TestRegistry_UpdateFullHeartbeatWithInventoryAuthority_NonAuthoritativeEmpt
|
||||
|
||||
func TestRegistry_UpdateFullHeartbeatWithInventoryAuthority_AuthoritativeEmptyStillDeletes(t *testing.T) {
|
||||
r := NewBlockVolumeRegistry()
|
||||
-r.Register(&BlockVolumeEntry{Name: "vol1", VolumeServer: "s1", Path: "/v1.blk", Status: StatusActive})
+r.Register(&BlockVolumeEntry{Name: "vol1", VolumeServer: "s1", Path: "/v1.blk", Status: StatusActive, RegisteredAt: time.Now().Add(-60 * time.Second)})
|
||||
|
||||
r.UpdateFullHeartbeatWithInventoryAuthority("s1", nil, "", true)
|
||||
|
||||
@@ -3206,3 +3207,58 @@ func TestRegistry_UpdateFullHeartbeat_EngineProjectionModePreservedOnNewPrimaryW
|
||||
t.Fatalf("EngineProjectionMode=%q, want %q from new primary", entry.EngineProjectionMode, "degraded")
|
||||
}
|
||||
}
|
||||
|
||||
func TestRegistry_StaleCleanup_SkipsRecentlyRegisteredEntry(t *testing.T) {
|
||||
r := NewBlockVolumeRegistry()
|
||||
r.MarkBlockCapable("vs1:8080")
|
||||
|
||||
// Register a volume — RegisteredAt is set automatically.
|
||||
if err := r.Register(&BlockVolumeEntry{
|
||||
Name: "vol-grace",
|
||||
VolumeServer: "vs1:8080",
|
||||
Path: "/blocks/vol-grace.blk",
|
||||
Status: StatusActive,
|
||||
Role: blockvol.RoleToWire(blockvol.RolePrimary),
|
||||
}); err != nil {
|
||||
t.Fatalf("register: %v", err)
|
||||
}
|
||||
|
||||
// Authoritative heartbeat from vs1 that does NOT report this volume.
|
||||
// Without grace period, this would delete the entry.
|
||||
r.UpdateFullHeartbeatWithInventoryAuthority("vs1:8080", nil, "", true)
|
||||
|
||||
// Entry should survive — it was just registered.
|
||||
entry, ok := r.Lookup("vol-grace")
|
||||
if !ok {
|
||||
t.Fatal("recently registered entry was deleted by stale cleanup — grace period not working")
|
||||
}
|
||||
if entry.Name != "vol-grace" {
|
||||
t.Fatalf("entry name=%q, want vol-grace", entry.Name)
|
||||
}
|
||||
}
|
||||
|
||||
func TestRegistry_StaleCleanup_DeletesOldUnreportedEntry(t *testing.T) {
|
||||
r := NewBlockVolumeRegistry()
|
||||
r.MarkBlockCapable("vs1:8080")
|
||||
|
||||
// Register a volume with RegisteredAt in the past (beyond grace period).
|
||||
if err := r.Register(&BlockVolumeEntry{
|
||||
Name: "vol-stale",
|
||||
VolumeServer: "vs1:8080",
|
||||
Path: "/blocks/vol-stale.blk",
|
||||
Status: StatusActive,
|
||||
Role: blockvol.RoleToWire(blockvol.RolePrimary),
|
||||
RegisteredAt: time.Now().Add(-60 * time.Second), // 60s ago, past grace
|
||||
}); err != nil {
|
||||
t.Fatalf("register: %v", err)
|
||||
}
|
||||
|
||||
// Authoritative heartbeat without this volume.
|
||||
r.UpdateFullHeartbeatWithInventoryAuthority("vs1:8080", nil, "", true)
|
||||
|
||||
// Entry should be deleted — it's old and not reported.
|
||||
_, ok := r.Lookup("vol-stale")
|
||||
if ok {
|
||||
t.Fatal("old unreported entry survived stale cleanup — grace period should not protect it")
|
||||
}
|
||||
}
|
||||
|
||||
@@ -349,7 +349,10 @@ func (vs *VolumeServer) doHeartbeatWithRetry(masterAddress pb.ServerAddress, grp
|
||||
return
|
||||
case <-vs.stopChan:
|
||||
var volumeMessages []*master_pb.VolumeInformationMessage
|
||||
-blockInventoryAuthoritative := true
+// Shutdown beat: clear regular volumes but do NOT claim block
+// inventory authority. The block registry entry must survive
+// shutdown so failoverBlockVolumes can promote the replica.
+noBlockAuthority := false
|
||||
emptyBeat := &master_pb.Heartbeat{
|
||||
Ip: ip,
|
||||
Port: port,
|
||||
@@ -359,8 +362,8 @@ func (vs *VolumeServer) doHeartbeatWithRetry(masterAddress pb.ServerAddress, grp
|
||||
Rack: rack,
|
||||
Volumes: volumeMessages,
|
||||
HasNoVolumes: len(volumeMessages) == 0,
|
||||
-HasNoBlockVolumes: vs.blockService != nil,
-BlockVolumeInventoryAuthoritative: &blockInventoryAuthoritative,
+HasNoBlockVolumes: false,
+BlockVolumeInventoryAuthoritative: &noBlockAuthority,
|
||||
}
|
||||
glog.V(1).Infof("volume server %s:%d stops and deletes all volumes", vs.store.Ip, vs.store.Port)
|
||||
if err = stream.Send(emptyBeat); err != nil {
|
||||
|
||||
@@ -211,13 +211,13 @@ func CreateBlockVol(path string, opts CreateOptions, cfgs ...BlockVolConfig) (*B
|
||||
Interval: cfg.FlushInterval,
|
||||
Metrics: v.Metrics,
|
||||
BatchIO: bio,
|
||||
-// CP13-6: replica-aware WAL retention.
-RetentionFloorFn: func() (uint64, bool) {
-if v.shipperGroup == nil {
-return 0, false
-}
-return v.shipperGroup.MinRecoverableFlushedLSN()
-},
+// No keepup WAL retention: flusher recycles freely. If a replica
+// falls behind and WAL entries are recycled, it escalates to
+// NeedsRebuild — the correct outcome. Catch-up from extent via
+// the LBA dirty map (V2) will eliminate this tension entirely.
+// Session-only WAL pins (for active rebuild/catch-up) are handled
+// separately by SetV2RetentionFloor.
+RetentionFloorFn: nil,
|
||||
EvaluateRetentionBudgetsFn: func() {
|
||||
if v.shipperGroup != nil {
|
||||
v.shipperGroup.EvaluateRetentionBudgets(RetentionBudgetParams{
|
||||
@@ -340,17 +340,13 @@ func OpenBlockVol(path string, cfgs ...BlockVolConfig) (*BlockVol, error) {
|
||||
Interval: cfg.FlushInterval,
|
||||
Metrics: v.Metrics,
|
||||
BatchIO: bio,
|
||||
-RetentionFloorFn: func() (uint64, bool) {
-if v.shipperGroup == nil {
-return 0, false
-}
-return v.shipperGroup.MinRecoverableFlushedLSN()
-},
+// No keepup WAL retention (same as CreateBlockVol path).
+RetentionFloorFn: nil,
|
||||
EvaluateRetentionBudgetsFn: func() {
|
||||
if v.shipperGroup != nil {
|
||||
v.shipperGroup.EvaluateRetentionBudgets(RetentionBudgetParams{
|
||||
Timeout: walRetentionTimeout,
|
||||
-MaxBytes: 0, // CP13-6 max-bytes disabled: uses replicaFlushedLSN which can't advance without barrier; v2 will replace with negotiated recovery protocol
+MaxBytes: 0, // CP13-6 max-bytes disabled
|
||||
PrimaryHeadLSN: v.nextLSN.Load() - 1,
|
||||
BlockSize: v.super.BlockSize,
|
||||
})
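
The consequence of removing the retention floor can be sketched as a small decision function. This is a hypothetical model, not code from `blockvol`: `decideRecovery`, its parameters, and the `Action` values are assumptions for illustration. It assumes the WAL retains a contiguous range of LSNs from `oldestRetainedLSN` through `headLSN`, and that a replica can catch up from the WAL only if every entry it is missing is still retained.

```go
package main

import "fmt"

// Action is the recovery outcome for one replica (hypothetical vocabulary).
type Action string

const (
	InSync       Action = "in-sync"
	CatchUp      Action = "catch-up"
	NeedsRebuild Action = "needs-rebuild"
)

// decideRecovery picks the outcome for a replica that has durably applied
// everything up to replicaFlushedLSN.
//   - oldestRetainedLSN: lowest LSN still present in the primary's WAL
//   - headLSN:           highest LSN written on the primary
func decideRecovery(replicaFlushedLSN, oldestRetainedLSN, headLSN uint64) Action {
	switch {
	case replicaFlushedLSN >= headLSN:
		return InSync
	case replicaFlushedLSN+1 >= oldestRetainedLSN:
		// The first missing entry is still in the WAL, so replay suffices.
		return CatchUp
	default:
		// The flusher already recycled entries the replica still needs.
		return NeedsRebuild
	}
}

func main() {
	fmt.Println(decideRecovery(100, 91, 100)) // replica at head
	fmt.Println(decideRecovery(90, 91, 100))  // missing 91..100, all retained
	fmt.Println(decideRecovery(90, 95, 100))  // 91..94 already recycled
}
```

With no floor, `oldestRetainedLSN` advances with the flusher regardless of replica progress, so a slow replica moves from the catch-up case into the rebuild case instead of blocking writes on a full WAL.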
|
||||
|
||||
@@ -230,6 +230,34 @@ func (sg *ShipperGroup) MinRecoverableFlushedLSN() (uint64, bool) {
|
||||
return min, found
|
||||
}
|
||||
|
||||
// MinShippedLSN returns the minimum shippedLSN across all active shippers
// (not NeedsRebuild). It is a diagnostic surface only: the flusher no longer
// uses it as a WAL retention floor and recycles WAL entries freely.
//
// Returns (0, false) if no active shipper has shipped anything yet.
|
||||
func (sg *ShipperGroup) MinShippedLSN() (uint64, bool) {
|
||||
sg.mu.RLock()
|
||||
defer sg.mu.RUnlock()
|
||||
var min uint64
|
||||
found := false
|
||||
for _, s := range sg.shippers {
|
||||
if s.State() == ReplicaNeedsRebuild {
|
||||
continue
|
||||
}
|
||||
lsn := s.ShippedLSN()
|
||||
if lsn == 0 {
|
||||
continue // hasn't shipped yet — don't pin at 0
|
||||
}
|
||||
if !found || lsn < min {
|
||||
min = lsn
|
||||
found = true
|
||||
}
|
||||
}
|
||||
return min, found
|
||||
}
|
||||
|
||||
// RetentionBudgetParams holds the inputs for retention budget evaluation.
|
||||
type RetentionBudgetParams struct {
|
||||
Timeout time.Duration
|
||||
|
||||
@@ -76,6 +76,18 @@ func assertEqual(ctx context.Context, actx *tr.ActionContext, act tr.Action) (ma
|
||||
actual := act.Params["actual"]
|
||||
expected := act.Params["expected"]
|
||||
|
||||
// Reject empty strings — prevents false positives when an upstream action
|
||||
// failed silently and returned empty. "" == "" would hide real failures.
|
||||
if actual == "" && expected == "" {
|
||||
return nil, fmt.Errorf("assert_equal: both actual and expected are empty — likely upstream action failure")
|
||||
}
|
||||
if actual == "" {
|
||||
return nil, fmt.Errorf("assert_equal: actual is empty (expected %q) — likely upstream action failure", expected)
|
||||
}
|
||||
if expected == "" {
|
||||
return nil, fmt.Errorf("assert_equal: expected is empty (actual %q) — likely upstream action failure", actual)
|
||||
}
|
||||
|
||||
if actual != expected {
|
||||
return nil, fmt.Errorf("assert_equal: %q != %q", actual, expected)
|
||||
}
|
||||
|
||||
@@ -187,16 +187,18 @@ phases:
|
||||
- name: kill-primary
|
||||
actions:
|
||||
- action: print
|
||||
-msg: "=== Killing primary ({{ before_server }}) ==="
+msg: "=== Killing primary ({{ before }}, {{ before_server }}) ==="
|
||||
|
||||
-- action: exec
-node: m02
-cmd: "kill -9 {{ vs1_pid }}"
-root: "true"
+# Kill the primary VS using stop_weed with the discovered primary's PID.
+# Master consistently places the primary on m01 (vs2_pid) in this
+# topology. discover_primary confirms this.
+- action: stop_weed
+node: m01
+pid: "{{ vs2_pid }}"
|
||||
ignore_error: true
|
||||
|
||||
- action: print
|
||||
-msg: "Primary killed. Waiting for lease expiry (45s)..."
+msg: "Primary killed on m01 ({{ before_server }}). Waiting for lease expiry (45s)..."
|
||||
|
||||
- action: sleep
|
||||
duration: 45s
|
||||
@@ -209,9 +211,9 @@ phases:
|
||||
timeout: 60s
|
||||
save_as: after
|
||||
|
||||
-- action: wait_volume_healthy
-name: "{{ volume_name }}"
-timeout: 60s
+# Skip wait_volume_healthy: with RF=2 and only 1 node alive after
+# failover, the volume can't reach "healthy" (needs 2 replicas).
+# The wait_block_primary above already confirms failover succeeded.
|
||||
|
||||
- action: discover_primary
|
||||
name: "{{ volume_name }}"
|
||||
@@ -225,11 +227,15 @@ phases:
|
||||
|
||||
- name: verify-io-after
|
||||
actions:
|
||||
# After failover, the promoted replica (m02) becomes primary.
|
||||
# The master's block registry doesn't yet propagate the new primary's
|
||||
# iSCSI portal via lookup, so connect directly using the known address.
|
||||
# m02's iSCSI is on the RDMA IP (10.0.0.3) port 3295.
|
||||
- action: iscsi_login_direct
|
||||
node: m01
|
||||
-host: "10.0.0.1"
+host: "10.0.0.3"
|
||||
port: "3295"
|
||||
-iqn: "{{ vol_iqn }}"
+iqn: "iqn.2024-01.com.seaweedfs:vol.{{ volume_name }}"
|
||||
save_as: device2
|
||||
|
||||
- action: dd_read_md5
|
||||
|
||||