mirror of
https://github.com/seaweedfs/seaweedfs.git
synced 2026-05-21 01:01:29 +00:00
chore: archive superseded V2 design docs
Copies of design docs removed in Phase 09, preserved in sw-block/docs/archive/ for historical reference.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
28
sw-block/docs/archive/design/README.md
Normal file
@@ -0,0 +1,28 @@
|
||||
# Design Archive
|
||||
|
||||
This directory contains historical `sw-block` design/planning documents that are still worth keeping as references, but are no longer the main entrypoints for current work.
|
||||
|
||||
Use `sw-block/design/` for active design and process documents.
|
||||
Use `sw-block/.private/phase/` for current phase contracts, logs, and slice-level execution packages.
|
||||
|
||||
## Archived Here
|
||||
|
||||
- `v2-production-roadmap.md`
|
||||
- `v2-engine-readiness-review.md`
|
||||
- `v2-engine-slicing-plan.md`
|
||||
- `v2-prototype-roadmap-and-gates.md`
|
||||
- `phase-07-service-slice-plan.md`
|
||||
- `phase-08-engine-skeleton-map.md`
|
||||
- `v2-first-slice-session-ownership.md`
|
||||
- `v2-first-slice-sender-ownership.md`
|
||||
- `a5-a8-traceability.md`
|
||||
|
||||
## Why Archived
|
||||
|
||||
These documents are useful for:
|
||||
|
||||
1. historical decision context
|
||||
2. earlier slice/phase rationale
|
||||
3. traceability for passed reviews and planning gates
|
||||
|
||||
They are not the canonical source for the current phase roadmap.
|
||||
117
sw-block/docs/archive/design/a5-a8-traceability.md
Normal file
@@ -0,0 +1,117 @@
|
||||
# A5-A8 Acceptance Traceability
|
||||
|
||||
Date: 2026-03-29
|
||||
Status: historical evidence traceability
|
||||
|
||||
## Purpose
|
||||
|
||||
Map each acceptance criterion to specific executable evidence.
|
||||
Two evidence layers:
|
||||
- **Simulator** (distsim): protocol-level proof
|
||||
- **Prototype** (enginev2): ownership/session-level proof
|
||||
|
||||
---
|
||||
|
||||
## A5: Non-Convergent Catch-Up Escalates Explicitly
|
||||
|
||||
**Must prove**: tail-chasing or failed catch-up does not pretend success.
|
||||
|
||||
**Pass condition**: explicit `CatchingUp → NeedsRebuild` transition.
|
||||
|
||||
| Evidence | Test | File | Layer | Status |
|----------|------|------|-------|--------|
| Tail-chasing converges or aborts | `TestS6_TailChasing_ConvergesOrAborts` | `cluster_test.go` | distsim | PASS |
| Tail-chasing non-convergent → NeedsRebuild | `TestS6_TailChasing_NonConvergent_EscalatesToNeedsRebuild` | `phase02_advanced_test.go` | distsim | PASS |
| Catch-up timeout → NeedsRebuild | `TestP03_CatchupTimeout_EscalatesToNeedsRebuild` | `phase03_timeout_test.go` | distsim | PASS |
| Reservation expiry aborts catch-up | `TestReservationExpiryAbortsCatchup` | `cluster_test.go` | distsim | PASS |
| Flapping budget exceeded → NeedsRebuild | `TestP02_S5_FlappingExceedsBudget_EscalatesToNeedsRebuild` | `phase02_advanced_test.go` | distsim | PASS |
| Catch-up converges or escalates (I3) | `TestI3_CatchUpConvergesOrEscalates` | `phase045_crash_test.go` | distsim | PASS |
| Catch-up timeout in enginev2 | `TestE2E_NeedsRebuild_Escalation` | `p2_test.go` | enginev2 | PASS |
|
||||
|
||||
**Verdict**: A5 is well-covered. Both simulator and prototype prove explicit escalation. No pretend-success path exists.
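
The A5 pass condition can be illustrated with a minimal sketch. It is not the distsim or enginev2 code: the `catchUp` function, its round budget, and the `InSync` state name are assumptions made for illustration; only `CatchingUp` and `NeedsRebuild` appear in this document. The point is that a catch-up whose gap never closes ends in an explicit `NeedsRebuild`, not in a silent success.

```go
package main

import "fmt"

// State is a hypothetical replica recovery state. Only CatchingUp and
// NeedsRebuild are named in this document; InSync is an assumed name.
type State string

const (
	CatchingUp   State = "CatchingUp"
	InSync       State = "InSync"
	NeedsRebuild State = "NeedsRebuild"
)

// catchUp simulates bounded catch-up. Each round the replica replays
// replayPerRound entries while the primary appends appendPerRound new ones.
// If the gap is still open after maxRounds, the result is an explicit
// NeedsRebuild.
func catchUp(gap, replayPerRound, appendPerRound, maxRounds int) State {
	for round := 0; round < maxRounds; round++ {
		if gap <= 0 {
			return InSync
		}
		gap += appendPerRound - replayPerRound
	}
	if gap <= 0 {
		return InSync
	}
	return NeedsRebuild
}

func main() {
	fmt.Println(CatchingUp, "->", catchUp(100, 50, 10, 5)) // replica outpaces the primary
	fmt.Println(CatchingUp, "->", catchUp(100, 10, 10, 5)) // tail-chasing: the gap never shrinks
}
```

A round budget is only one possible escalation trigger; the tests above also cover timeouts, reservation expiry, and flapping budgets.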
|
||||
|
||||
---
|
||||
|
||||
## A6: Recoverability Boundary Is Explicit
|
||||
|
||||
**Must prove**: recoverable vs unrecoverable gap is decided explicitly.
|
||||
|
||||
**Pass condition**: recovery aborts when reservation/payload availability is lost; rebuild is explicit fallback.
|
||||
|
||||
| Evidence | Test | File | Layer | Status |
|----------|------|------|-------|--------|
| Reservation expiry aborts catch-up | `TestReservationExpiryAbortsCatchup` | `cluster_test.go` | distsim | PASS |
| WAL GC beyond replica → NeedsRebuild | `TestI5_CheckpointGC_PreservesAckedBoundary` | `phase045_crash_test.go` | distsim | PASS |
| Rebuild from snapshot + tail | `TestReplicaRebuildFromSnapshotAndTail` | `cluster_test.go` | distsim | PASS |
| Smart WAL: resolvable → unresolvable | `TestP02_SmartWAL_RecoverableThenUnrecoverable` | `phase02_advanced_test.go` | distsim | PASS |
| Time-varying payload availability | `TestP02_SmartWAL_TimeVaryingAvailability` | `phase02_advanced_test.go` | distsim | PASS |
| RecoverableLSN is replayability proof | `RecoverableLSN()` in `storage.go` | `storage.go` | distsim | Implemented |
| Handshake outcome: NeedsRebuild | `TestExec_HandshakeOutcome_NeedsRebuild_InvalidatesSession` | `execution_test.go` | enginev2 | PASS |
|
||||
|
||||
**Verdict**: A6 is covered. Recovery boundary is decided by explicit reservation + recoverability check, not by optimistic assumption. `RecoverableLSN()` verifies contiguous WAL coverage.
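
To make "contiguous WAL coverage" concrete, here is a minimal sketch. It is an illustrative stand-in, not the `RecoverableLSN()` in `storage.go`: the `recoverableLSN` function, its signature, and the map-based retained set are assumptions. It returns the highest LSN reachable from the checkpoint through an unbroken run of retained entries.

```go
package main

import "fmt"

// recoverableLSN returns the highest LSN reachable from checkpointLSN through
// an unbroken run of retained WAL entries. retained is the set of LSNs whose
// payload is still available. Hypothetical helper for illustration only.
func recoverableLSN(checkpointLSN uint64, retained map[uint64]bool) uint64 {
	lsn := checkpointLSN
	for retained[lsn+1] {
		lsn++
	}
	return lsn
}

func main() {
	// LSN 14 is missing, so nothing past 13 is replayable even though
	// 15 is still retained.
	retained := map[uint64]bool{11: true, 12: true, 13: true, 15: true}
	fmt.Println(recoverableLSN(10, retained))
}
```

A gap anywhere in the run caps what can be replayed, which is why the boundary has to be checked explicitly instead of inferred from the newest retained entry.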
|
||||
|
||||
---
|
||||
|
||||
## A7: Historical Data Correctness Holds
|
||||
|
||||
**Must prove**: recovered data for target LSN is historically correct; current extent cannot fake old history.
|
||||
|
||||
**Pass condition**: snapshot + tail rebuild matches reference; current-extent reconstruction of old LSN fails correctness.
|
||||
|
||||
| Evidence | Test | File | Layer | Status |
|----------|------|------|-------|--------|
| Snapshot + tail matches reference | `TestReplicaRebuildFromSnapshotAndTail` | `cluster_test.go` | distsim | PASS |
| Historical state not reconstructable after GC | `TestA7_HistoricalState_NotReconstructableAfterGC` | `phase045_crash_test.go` | distsim | PASS |
| `CanReconstructAt()` rejects faked history | `CanReconstructAt()` in `storage.go` | `storage.go` | distsim | Implemented |
| Checkpoint does not leak applied state | `TestI2_CheckpointDoesNotLeakAppliedState` | `phase045_crash_test.go` | distsim | PASS |
| Extent-referenced resolvable records | `TestExtentReferencedResolvableRecordsAreRecoverable` | `cluster_test.go` | distsim | PASS |
| Extent-referenced unresolvable → rebuild | `TestExtentReferencedUnresolvableForcesRebuild` | `cluster_test.go` | distsim | PASS |
| ACK'd flush recoverable after crash (I1) | `TestI1_AckedFlush_RecoverableAfterPrimaryCrash` | `phase045_crash_test.go` | distsim | PASS |
|
||||
|
||||
**Verdict**: A7 is now covered with the Phase 4.5 crash-consistency additions. The critical gap ("current extent cannot fake old history") is proven by `CanReconstructAt()` + `TestA7_HistoricalState_NotReconstructableAfterGC`.
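
The "current extent cannot fake old history" rule can be sketched as a predicate over retained history. This is an assumption-laden illustration, not the `CanReconstructAt()` in `storage.go`: the `history` struct, its field names, and the single-snapshot model are hypothetical. The live extent is deliberately left out of the check because it reflects the WAL head, not the target LSN.

```go
package main

import "fmt"

// history is a hypothetical view of what a node still retains.
type history struct {
	snapshotLSN uint64 // LSN captured by the retained snapshot
	walTailLSN  uint64 // oldest WAL entry still retained (moves up after GC)
	walHeadLSN  uint64 // newest WAL entry
}

// canReconstructAt reports whether the state as of target can be rebuilt from
// retained history alone: a snapshot at or before target plus every WAL entry
// in (snapshotLSN, target]. The current extent is never consulted.
func canReconstructAt(h history, target uint64) bool {
	if target < h.snapshotLSN || target > h.walHeadLSN {
		return false
	}
	if target == h.snapshotLSN {
		return true
	}
	return h.walTailLSN <= h.snapshotLSN+1
}

func main() {
	before := history{snapshotLSN: 100, walTailLSN: 101, walHeadLSN: 200}
	afterGC := history{snapshotLSN: 100, walTailLSN: 160, walHeadLSN: 200}
	fmt.Println(canReconstructAt(before, 150))  // tail covers (100, 150]
	fmt.Println(canReconstructAt(afterGC, 150)) // entries 101..159 were reclaimed
}
```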
|
||||
|
||||
---
|
||||
|
||||
## A8: Durability Mode Semantics Are Correct
|
||||
|
||||
**Must prove**: best_effort, sync_all, sync_quorum behave as intended under mixed replica states.
|
||||
|
||||
**Pass condition**: sync_all strict, sync_quorum commits only with true durable quorum, invalid topology rejected.
|
||||
|
||||
| Evidence | Test | File | Layer | Status |
|----------|------|------|-------|--------|
| sync_quorum continues with one lagging | `TestSyncQuorumContinuesWithOneLaggingReplica` | `cluster_test.go` | distsim | PASS |
| sync_all blocks with one lagging | `TestSyncAllBlocksWithOneLaggingReplica` | `cluster_test.go` | distsim | PASS |
| sync_quorum mixed states | `TestSyncQuorumWithMixedReplicaStates` | `cluster_test.go` | distsim | PASS |
| sync_all mixed states | `TestSyncAllBlocksWithMixedReplicaStates` | `cluster_test.go` | distsim | PASS |
| Barrier timeout: sync_all blocked | `TestP03_BarrierTimeout_SyncAll_Blocked` | `phase03_timeout_test.go` | distsim | PASS |
| Barrier timeout: sync_quorum commits | `TestP03_BarrierTimeout_SyncQuorum_StillCommits` | `phase03_timeout_test.go` | distsim | PASS |
| Promotion uses RecoverableLSN | `EvaluateCandidateEligibility()` | `cluster.go` | distsim | Implemented |
| Promoted replica has committed prefix (I4) | `TestI4_PromotedReplica_HasCommittedPrefix` | `phase045_crash_test.go` | distsim | PASS |
|
||||
|
||||
**Verdict**: A8 is well-covered. sync_all is strict (blocks on lagging), sync_quorum uses true durable quorum (not connection count). Promotion now uses `RecoverableLSN()` for committed-prefix check.
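
The difference between the three modes can be sketched as a commit predicate. This is a hedged illustration, not the simulator code: `canCommit`, the `Mode` type, and the majority rule `rf/2+1` are assumptions. What it shows is that the quorum is counted over replicas whose durable LSN has reached the write, not over open connections.

```go
package main

import "fmt"

// Mode names follow the durability modes listed in this document.
type Mode string

const (
	BestEffort Mode = "best_effort"
	SyncQuorum Mode = "sync_quorum"
	SyncAll    Mode = "sync_all"
)

// canCommit decides whether a write at lsn may be acknowledged.
// replicaDurable holds each replica's durable (flushed) LSN; the primary is
// assumed to be durable at lsn already. rf is the replication factor
// including the primary. Majority quorum is an assumption of this sketch.
func canCommit(mode Mode, lsn uint64, replicaDurable []uint64, rf int) bool {
	durable := 1 // the primary itself
	for _, d := range replicaDurable {
		if d >= lsn {
			durable++
		}
	}
	switch mode {
	case BestEffort:
		return true
	case SyncQuorum:
		return durable >= rf/2+1
	case SyncAll:
		return durable == rf
	}
	return false // unknown mode is rejected
}

func main() {
	// RF=3: one replica is durable at LSN 10, the other lags at LSN 7.
	replicas := []uint64{10, 7}
	fmt.Println(canCommit(SyncQuorum, 10, replicas, 3)) // 2 of 3 durable
	fmt.Println(canCommit(SyncAll, 10, replicas, 3))    // lagging replica blocks
}
```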
|
||||
|
||||
---
|
||||
|
||||
## Summary
|
||||
|
||||
| Criterion | Simulator Evidence | Prototype Evidence | Status |
|-----------|-------------------|-------------------|--------|
| A5 (catch-up escalation) | 6 tests | 1 test | **Strong** |
| A6 (recoverability boundary) | 6 tests + RecoverableLSN() | 1 test | **Strong** |
| A7 (historical correctness) | 7 tests + CanReconstructAt() | — | **Strong** (new in Phase 4.5) |
| A8 (durability modes) | 7 tests + RecoverableLSN() | — | **Strong** |
|
||||
|
||||
**Total executable evidence**: 26 simulator tests + 2 prototype tests + 2 new storage methods.
|
||||
|
||||
All A5-A8 acceptance criteria have direct test evidence. No criterion depends solely on design-doc claims.
|
||||
|
||||
---
|
||||
|
||||
## Still Open (Not Blocking)
|
||||
|
||||
| Item | Priority | Why not blocking |
|------|----------|-----------------|
| Predicate exploration / adversarial search | P2 | Manual scenarios already cover known failure classes |
| Catch-up convergence under sustained load | P2 | I3 proves escalation; load-rate modeling is optimization |
| A5-A8 in a single grouped runner view | P3 | Traceability doc serves as grouped evidence for now |
|
||||
403
sw-block/docs/archive/design/phase-07-service-slice-plan.md
Normal file
@@ -0,0 +1,403 @@
|
||||
# Phase 07 Service-Slice Plan
|
||||
|
||||
Date: 2026-03-30
|
||||
Status: historical phase-planning artifact
|
||||
Scope: `Phase 07 P0`
|
||||
|
||||
## Purpose
|
||||
|
||||
Define the first real-system service slice that will host the V2 engine, choose the first concrete integration path in the existing codebase, and map engine adapters onto real modules.
|
||||
|
||||
This is a planning document. It does not claim the integration already works.
|
||||
|
||||
## Decision
|
||||
|
||||
The first service slice should be:
|
||||
|
||||
- a single `blockvol` primary on a real volume server
|
||||
- with one replica target (`RF=2` path)
|
||||
- driven by the existing master heartbeat / assignment loop
|
||||
- using the V2 engine only for replication recovery ownership / planning / execution
|
||||
|
||||
This is the narrowest real-system slice that still exercises:
|
||||
|
||||
1. real assignment delivery
|
||||
2. real epoch and failover signals
|
||||
3. real volume-server lifecycle
|
||||
4. real WAL/checkpoint/base-image truth
|
||||
5. real changed-address / reconnect behavior
|
||||
|
||||
It is narrow enough to avoid reopening the whole system, but real enough to stop hiding behind engine-local mocks.
|
||||
|
||||
## Why This Slice
|
||||
|
||||
This slice is the right first integration target because:
|
||||
|
||||
1. `weed/server/master_grpc_server.go` already delivers block-volume assignments over heartbeat
|
||||
2. `weed/server/master_block_failover.go` already owns failover / promotion / pending rebuild decisions
|
||||
3. `weed/storage/blockvol/blockvol.go` already owns the current replication runtime (`shipperGroup`, receiver, WAL retention, checkpoint state)
|
||||
4. the existing V1/V1.5 failure history is concentrated in exactly this master <-> volume-server <-> blockvol path
|
||||
|
||||
So this slice gives maximum validation value with minimum new surface.
|
||||
|
||||
## First Concrete Integration Path
|
||||
|
||||
The first integration path should be:
|
||||
|
||||
1. master receives volume-server heartbeat
|
||||
2. master updates block registry and emits `BlockVolumeAssignment`
|
||||
3. volume server receives assignment
|
||||
4. block volume adapter converts assignment + local storage state into V2 engine inputs
|
||||
5. V2 engine drives sender/session/recovery state
|
||||
6. existing block-volume runtime executes the actual data-path work under engine decisions
|
||||
|
||||
In code, that path starts here:
|
||||
|
||||
- master side:
  - `weed/server/master_grpc_server.go`
  - `weed/server/master_block_failover.go`
  - `weed/server/master_block_registry.go`
- volume / storage side:
  - `weed/storage/blockvol/blockvol.go`
  - `weed/storage/blockvol/recovery.go`
  - `weed/storage/blockvol/wal_shipper.go`
  - assignment-handling code under `weed/storage/blockvol/`
- V2 engine side:
  - `sw-block/engine/replication/`
|
||||
|
||||
## Service-Slice Boundaries
|
||||
|
||||
### In-process placement
|
||||
|
||||
The V2 engine should initially live:
|
||||
|
||||
- in-process with the volume server / `blockvol` runtime
|
||||
- not in master
|
||||
- not as a separate service yet
|
||||
|
||||
Reason:
|
||||
|
||||
- the engine needs local access to storage truth and local recovery execution
|
||||
- master should remain control-plane authority, not recovery executor
|
||||
|
||||
### Control-plane boundary
|
||||
|
||||
Master remains authoritative for:
|
||||
|
||||
1. epoch
|
||||
2. role / assignment
|
||||
3. promotion / failover decision
|
||||
4. replica membership
|
||||
|
||||
The engine consumes these as control inputs. It does not replace master failover policy in `Phase 07`.
|
||||
|
||||
### Control-Over-Heartbeat Upgrade Path
|
||||
|
||||
For the first V2 product path, the recommended direction is:
|
||||
|
||||
- reuse the existing master <-> volume-server heartbeat path as the control carrier
|
||||
- upgrade the block-specific control semantics carried on that path
|
||||
- do not immediately invent a separate control service or assignment channel
|
||||
|
||||
Why:
|
||||
|
||||
1. this is the real Seaweed path already carrying block assignments and confirmations today
|
||||
2. this gives the fastest route to a real integrated control path
|
||||
3. it preserves compatibility with existing Seaweed master/volume-server semantics while V2 hardens its own control truth
|
||||
|
||||
Concretely, the current V1 path already provides:
|
||||
|
||||
1. block assignments delivered in heartbeat responses from `weed/server/master_grpc_server.go`
|
||||
2. assignment application on the volume server in `weed/server/volume_grpc_client_to_master.go` and `weed/server/volume_server_block.go`
|
||||
3. assignment confirmation and address-change refresh driven by later heartbeats in `weed/server/master_grpc_server.go` and `weed/server/master_block_registry.go`
|
||||
4. immediate block heartbeat on selected shipper state changes in `weed/server/volume_grpc_client_to_master.go`
|
||||
|
||||
What should be upgraded for V2 is not mainly the transport, but the control contract carried on it:
|
||||
|
||||
1. stable `ReplicaID`
|
||||
2. explicit `Epoch`
|
||||
3. explicit role / assignment authority
|
||||
4. explicit apply/confirm semantics
|
||||
5. explicit stale assignment rejection
|
||||
6. explicit address-change refresh as endpoint change, not identity change
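
A minimal sketch of items 2, 4, and 5 above (explicit `Epoch`, apply/confirm, stale rejection). The `Assignment` struct, the `applier` type, and `ErrStaleEpoch` are hypothetical names invented for this illustration; they are not the `BlockVolumeAssignment` message or any existing Seaweed API.

```go
package main

import (
	"errors"
	"fmt"
)

// Assignment is a hypothetical, trimmed-down control message.
type Assignment struct {
	ReplicaID string
	Epoch     uint64
	Role      string
	DataAddr  string
}

// ErrStaleEpoch is returned when an assignment is older than the applied one.
var ErrStaleEpoch = errors.New("stale assignment: epoch older than applied epoch")

// applier tracks the last applied assignment per replica.
type applier struct {
	applied map[string]Assignment
}

// Apply rejects assignments from an older epoch and otherwise records the
// assignment. The returned epoch is what a later heartbeat would confirm.
// A same-epoch assignment with a new address is accepted as an endpoint
// refresh, not as a new identity.
func (a *applier) Apply(in Assignment) (uint64, error) {
	cur, ok := a.applied[in.ReplicaID]
	if ok && in.Epoch < cur.Epoch {
		return cur.Epoch, ErrStaleEpoch
	}
	a.applied[in.ReplicaID] = in
	return in.Epoch, nil
}

func main() {
	ap := &applier{applied: map[string]Assignment{}}
	ap.Apply(Assignment{ReplicaID: "vol1/vs-b", Epoch: 5, Role: "replica", DataAddr: "10.0.0.2:4000"})
	_, err := ap.Apply(Assignment{ReplicaID: "vol1/vs-b", Epoch: 4, Role: "primary", DataAddr: "10.0.0.2:4000"})
	fmt.Println(errors.Is(err, ErrStaleEpoch))
}
```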
|
||||
|
||||
Current cadence note:
|
||||
|
||||
- the block volume heartbeat is periodic (`5 * sleepInterval`) with some immediate state-change heartbeats
|
||||
- this is acceptable as the first hardening carrier
|
||||
- it should not be assumed to be the final control responsiveness model
|
||||
|
||||
Deferred design decision:
|
||||
|
||||
- whether block control should eventually move beyond heartbeat-only carriage into a more explicit control/assignment channel should be decided only after the `Phase 08 P1` real control-delivery path exists and can be measured
|
||||
|
||||
That later decision should be based on:
|
||||
|
||||
1. failover / reassignment responsiveness
|
||||
2. assignment confirmation precision
|
||||
3. operational complexity
|
||||
4. whether heartbeat carriage remains too coarse for the block-control path
|
||||
|
||||
Until then, the preferred direction is:
|
||||
|
||||
- strengthen block control semantics over the existing heartbeat path
|
||||
- do not prematurely create a second control plane
|
||||
|
||||
### Storage boundary
|
||||
|
||||
`blockvol` remains authoritative for:
|
||||
|
||||
1. WAL head / retention reality
|
||||
2. checkpoint/base-image reality
|
||||
3. actual catch-up streaming
|
||||
4. actual rebuild transfer / restore operations
|
||||
|
||||
The engine consumes these as storage truth and recovery execution capabilities. It does not replace the storage backend in `Phase 07`.
|
||||
|
||||
## First-Slice Identity Mapping
|
||||
|
||||
This must be explicit in the first integration slice.
|
||||
|
||||
For `RF=2` on the existing master / block registry path:
|
||||
|
||||
- stable engine `ReplicaID` should be derived from:
  - `<volume-name>/<replica-server-id>`
- not from:
  - `DataAddr`
  - `CtrlAddr`
  - heartbeat transport endpoint
|
||||
|
||||
For this slice, the adapter should map:
|
||||
|
||||
1. `ReplicaID`
   - from master/block-registry identity for the replica host entry

2. `Endpoint`
   - from the current replica receiver/data/control addresses reported by the real runtime

3. `Epoch`
   - from the confirmed master assignment for the volume

4. `SessionKind`
   - from master-driven recovery intent / role transition outcome
|
||||
|
||||
This is a hard first-slice requirement because address refresh must not collapse identity back into endpoint-shaped keys.
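
The identity rule can be sketched as follows. The `ReplicaID` format and the `DataAddr` / `CtrlAddr` names come from this document; the `registry` type, `makeReplicaID`, and `refresh` are hypothetical helpers, not existing engine or bridge code. The sketch keys state by stable identity so that an address change updates one entry instead of creating a second replica.

```go
package main

import "fmt"

// ReplicaID is a stable identity of the form <volume-name>/<replica-server-id>.
type ReplicaID string

// Endpoint holds the mutable addresses of a replica.
type Endpoint struct {
	DataAddr string
	CtrlAddr string
}

// makeReplicaID derives identity from names only, never from addresses.
func makeReplicaID(volumeName, serverID string) ReplicaID {
	return ReplicaID(volumeName + "/" + serverID)
}

// registry is a hypothetical engine-side table keyed by stable identity.
type registry map[ReplicaID]Endpoint

// refresh stores the endpoint for id and reports whether it changed.
func (r registry) refresh(id ReplicaID, ep Endpoint) bool {
	old, ok := r[id]
	r[id] = ep
	return !ok || old != ep
}

func main() {
	reg := registry{}
	id := makeReplicaID("vol1", "vs-b")
	reg.refresh(id, Endpoint{DataAddr: "10.0.0.2:4000", CtrlAddr: "10.0.0.2:4001"})
	// The replica restarts on a new address: same identity, new endpoint.
	changed := reg.refresh(id, Endpoint{DataAddr: "10.0.0.9:4000", CtrlAddr: "10.0.0.9:4001"})
	fmt.Println(id, changed, len(reg))
}
```

Had the table been keyed by `DataAddr`, the restart would have produced two entries and orphaned any session state attached to the old one.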
|
||||
|
||||
## Adapter Mapping
|
||||
|
||||
### 1. ControlPlaneAdapter
|
||||
|
||||
Engine interface today:
|
||||
|
||||
- `HandleHeartbeat(serverID, volumes)`
|
||||
- `HandleFailover(deadServerID)`
|
||||
|
||||
Real mapping should be:
|
||||
|
||||
- master-side source:
  - `weed/server/master_grpc_server.go`
  - `weed/server/master_block_failover.go`
  - `weed/server/master_block_registry.go`
- volume-server side sink:
  - assignment receive/apply path in `weed/storage/blockvol/`
|
||||
|
||||
Recommended real shape:
|
||||
|
||||
- do not literally push raw heartbeat messages into the engine
- instead introduce a thin adapter that converts confirmed master assignment state into:
  - stable `ReplicaID`
  - endpoint set
  - epoch
  - recovery target kind
|
||||
|
||||
That keeps master as control owner and the engine as execution owner.
|
||||
|
||||
Important note:
|
||||
|
||||
- the adapter should treat heartbeat as the transport carrier, not as the final protocol shape
|
||||
- block-control semantics should be made explicit over that carrier
|
||||
- if a later phase concludes that heartbeat-only carriage is too coarse, that should be a separate design decision after the real hardening path is measured
|
||||
|
||||
### 2. StorageAdapter
|
||||
|
||||
Engine interface today:
|
||||
|
||||
- `GetRetainedHistory()`
|
||||
- `PinSnapshot(lsn)` / `ReleaseSnapshot(pin)`
|
||||
- `PinWALRetention(startLSN)` / `ReleaseWALRetention(pin)`
|
||||
- `PinFullBase(committedLSN)` / `ReleaseFullBase(pin)`
|
||||
|
||||
Real mapping should be:
|
||||
|
||||
- retained history source:
  - current WAL head/tail/checkpoint state from `weed/storage/blockvol/blockvol.go`
  - recovery helpers in `weed/storage/blockvol/recovery.go`
- WAL retention pin:
  - existing retention-floor / replica-aware WAL retention machinery around `shipperGroup`
- snapshot pin:
  - existing snapshot/checkpoint artifacts in `blockvol`
- full-base pin:
  - explicit pinned full-extent export or equivalent consistent base handle from `blockvol`
|
||||
|
||||
Important constraint:
|
||||
|
||||
- `Phase 07` must not fake this by reconstructing `RetainedHistory` from tests or metadata alone
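
The pin/release contract behind `PinWALRetention(startLSN)` / `ReleaseWALRetention(pin)` can be sketched with a small in-memory tracker. This only illustrates the semantics (the lowest active pin sets the retention floor); the `walPins` type and its methods are hypothetical, and a real adapter would delegate to the existing `blockvol` retention machinery instead of keeping its own map.

```go
package main

import "fmt"

// walPins tracks active WAL retention pins. The retention floor is the
// lowest pinned start LSN; WAL at or above the floor must not be reclaimed.
type walPins struct {
	next int
	pins map[int]uint64 // pin handle -> startLSN
}

func newWALPins() *walPins { return &walPins{pins: map[int]uint64{}} }

// Pin registers a retention requirement and returns a handle for release.
func (p *walPins) Pin(startLSN uint64) int {
	p.next++
	p.pins[p.next] = startLSN
	return p.next
}

// Release drops a pin; releasing an unknown handle is a no-op.
func (p *walPins) Release(pin int) { delete(p.pins, pin) }

// Floor returns the lowest pinned LSN, or false when nothing is pinned.
func (p *walPins) Floor() (uint64, bool) {
	first := true
	var floor uint64
	for _, lsn := range p.pins {
		if first || lsn < floor {
			floor, first = lsn, false
		}
	}
	return floor, !first
}

func main() {
	pins := newWALPins()
	catchUp := pins.Pin(120)
	rebuildTail := pins.Pin(80)
	fmt.Println(pins.Floor()) // floor follows the lowest pin
	pins.Release(rebuildTail)
	fmt.Println(pins.Floor()) // floor rises once the lower pin is released
	pins.Release(catchUp)
}
```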
|
||||
|
||||
### 3. Execution Driver / Executor hookup
|
||||
|
||||
Engine side already has:
|
||||
|
||||
- planner/executor split in `sw-block/engine/replication/driver.go`
|
||||
- stepwise executors in `sw-block/engine/replication/executor.go`
|
||||
|
||||
Real mapping should be:
|
||||
|
||||
- engine planner decides:
  - zero-gap / catch-up / rebuild
  - trusted-base requirement
  - replayable-tail requirement
- blockvol runtime performs:
  - actual WAL catch-up transport
  - actual snapshot/base transfer
  - actual truncation / apply operations
|
||||
|
||||
Recommended split:
|
||||
|
||||
- engine owns contract and state transitions
|
||||
- blockvol adapter owns concrete I/O work
|
||||
|
||||
## First-Slice Acceptance Rule
|
||||
|
||||
For the first integration slice, this is a hard rule:
|
||||
|
||||
- `blockvol` may execute recovery I/O
|
||||
- `blockvol` must not own recovery policy
|
||||
|
||||
Concretely, `blockvol` must not decide:
|
||||
|
||||
1. zero-gap vs catch-up vs rebuild
|
||||
2. trusted-base validity
|
||||
3. replayable-tail sufficiency
|
||||
4. whether rebuild fallback is required
|
||||
|
||||
Those decisions must remain in the V2 engine.
|
||||
|
||||
The bridge may translate engine decisions into concrete blockvol actions, but it must not re-decide recovery policy underneath the engine.
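
A minimal sketch of the policy decision that must stay in the engine. The three inputs and the `plan` function are assumptions made for illustration and do not mirror the actual planner in `sw-block/engine/replication/`; the sketch also assumes the replica is never ahead of the committed LSN, so truncation is not modelled.

```go
package main

import "fmt"

// Outcome is the recovery decision; names follow the three outcomes in the text.
type Outcome string

const (
	ZeroGap      Outcome = "zero-gap"
	CatchUp      Outcome = "catch-up"
	NeedsRebuild Outcome = "needs-rebuild"
)

// plan picks the recovery outcome from three hypothetical inputs:
// replicaLSN (last LSN durable on the replica), committedLSN (the primary's
// committed LSN), and walTailLSN (oldest LSN still replayable on the primary).
// Assumes replicaLSN <= committedLSN.
func plan(replicaLSN, committedLSN, walTailLSN uint64) Outcome {
	switch {
	case replicaLSN >= committedLSN:
		return ZeroGap
	case replicaLSN+1 >= walTailLSN:
		// Every missing entry is still in the retained WAL tail.
		return CatchUp
	default:
		// Part of the gap was reclaimed; replay alone cannot close it.
		return NeedsRebuild
	}
}

func main() {
	fmt.Println(plan(100, 100, 50))
	fmt.Println(plan(90, 100, 50))
	fmt.Println(plan(40, 100, 50))
}
```

Under the rule above, the bridge would receive only the resulting `Outcome` and carry it out; it would not recompute it from `blockvol` state.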
|
||||
|
||||
## First Product Path
|
||||
|
||||
The first product path should be:
|
||||
|
||||
- `RF=2` block volume replication on the existing heartbeat/assignment loop
|
||||
- primary + one replica
|
||||
- failover / reconnect / changed-address handling
|
||||
- rebuild as the formal non-catch-up recovery path
|
||||
|
||||
This is the right first path because it exercises the core correctness boundary without introducing N-replica coordination complexity too early.
|
||||
|
||||
## What Must Be Replaced First
|
||||
|
||||
Current engine-stage pieces that are still mock/test-only or too abstract:
|
||||
|
||||
### Replace first
|
||||
|
||||
1. `mockStorage` in engine tests
   - replace with a real `blockvol`-backed `StorageAdapter`

2. synthetic control events in engine tests
   - replace with assignment-driven events from the real master/volume-server path

3. convenience recovery completion wrappers
   - keep them test-only
   - real integration should use planner + executor + storage work loop
|
||||
|
||||
### Can remain temporarily abstract in Phase 07 P0/P1
|
||||
|
||||
1. `ControlPlaneAdapter` exact public shape
   - can remain thin while the integration path is being chosen

2. async production scheduler details
   - executor can still be driven by a service loop before full background-task architecture is finalized
|
||||
|
||||
## Recommended Concrete Modules
|
||||
|
||||
### Engine stays here
|
||||
|
||||
- `sw-block/engine/replication/`
|
||||
|
||||
### First real adapter package should be added near blockvol
|
||||
|
||||
Recommended initial location:
|
||||
|
||||
- `weed/storage/blockvol/v2bridge/`
|
||||
|
||||
Reason:
|
||||
|
||||
- keeps V2 engine independent under `sw-block/`
|
||||
- keeps real-system glue close to blockvol storage truth
|
||||
- avoids copying engine logic into `weed/`
|
||||
|
||||
Suggested contents:
|
||||
|
||||
1. `control_adapter.go`
   - convert master assignment / local apply path into engine intents

2. `storage_adapter.go`
   - expose retained history, pin/release, trusted-base export handles from real blockvol state

3. `executor_bridge.go`
   - translate engine executor steps into actual blockvol recovery actions

4. `observe_adapter.go`
   - map engine status/logs into service-visible diagnostics
|
||||
|
||||
## First Failure Replay Set For Phase 07
|
||||
|
||||
The first real-system replay set should be:
|
||||
|
||||
1. changed-address restart
   - current risk: old identity/address coupling reappears in service glue

2. stale epoch / stale result after failover
   - current risk: master and engine disagree on authority timing

3. unreplayable-tail rebuild fallback
   - current risk: service glue over-trusts checkpoint/base availability

4. plan/execution cleanup after resource failure
   - current risk: blockvol-side resource failures leave engine or service state dangling

5. primary failover to replica with rebuild pending on old primary reconnect
   - current risk: old V1/V1.5 semantics leak back into reconnect handling
|
||||
|
||||
## Non-Goals For This Slice
|
||||
|
||||
Do not use `Phase 07` to:
|
||||
|
||||
1. widen catch-up semantics
|
||||
2. add smart rebuild optimizations
|
||||
3. redesign all blockvol internals
|
||||
4. replace the full V1 runtime in one move
|
||||
5. claim production readiness
|
||||
|
||||
## Deliverables For Phase 07 P0
|
||||
|
||||
A good `P0` delivery should include:
|
||||
|
||||
1. chosen service slice
|
||||
2. chosen integration path in the current repo
|
||||
3. adapter-to-module mapping
|
||||
4. list of test-only adapters to replace first
|
||||
5. first failure replay set
|
||||
6. explicit note of what remains outside this first slice
|
||||
|
||||
## Short Form
|
||||
|
||||
`Phase 07 P0` should start with:
|
||||
|
||||
- engine in `sw-block/engine/replication/`
|
||||
- bridge in `weed/storage/blockvol/v2bridge/`
|
||||
- first real slice = blockvol primary + one replica on the existing master heartbeat / assignment path
|
||||
- `ReplicaID = <volume-name>/<replica-server-id>` for the first slice
|
||||
- `blockvol` executes I/O but does not own recovery policy
|
||||
- first product path = `RF=2` failover/reconnect/rebuild correctness
|
||||
301
sw-block/docs/archive/design/phase-08-engine-skeleton-map.md
Normal file
@@ -0,0 +1,301 @@
|
||||
# Phase 08 Engine Skeleton Map
|
||||
|
||||
Date: 2026-03-31
|
||||
Status: historical phase map
|
||||
Purpose: provide a short structural map for the `Phase 08` hardening path so implementation can move faster without reopening accepted V2 boundaries
|
||||
|
||||
## Scope
|
||||
|
||||
This is not the final standalone `sw-block` architecture.
|
||||
|
||||
It is the shortest useful engine skeleton for the accepted `Phase 08` hardening path:
|
||||
|
||||
- `RF=2`
|
||||
- `sync_all`
|
||||
- existing `Seaweed` master / volume-server heartbeat path
|
||||
- V2 engine owns recovery policy
|
||||
- `blockvol` remains the execution backend
|
||||
|
||||
## Module Map
|
||||
|
||||
### 1. Control plane
|
||||
|
||||
Role:
|
||||
|
||||
- authoritative control truth
|
||||
|
||||
Primary sources:
|
||||
|
||||
- `weed/server/master_grpc_server.go`
|
||||
- `weed/server/master_block_registry.go`
|
||||
- `weed/server/master_block_failover.go`
|
||||
- `weed/server/volume_grpc_client_to_master.go`
|
||||
|
||||
What it produces:
|
||||
|
||||
- confirmed assignment
|
||||
- `Epoch`
|
||||
- target `Role`
|
||||
- failover / promotion / reassignment result
|
||||
- stable server identity
|
||||
|
||||
### 2. Control bridge
|
||||
|
||||
Role:
|
||||
|
||||
- translate real control truth into V2 engine intent
|
||||
|
||||
Primary files:
|
||||
|
||||
- `weed/storage/blockvol/v2bridge/control.go`
|
||||
- `sw-block/bridge/blockvol/control_adapter.go`
|
||||
- entry path in `weed/server/volume_server_block.go`
|
||||
|
||||
What it produces:
|
||||
|
||||
- `AssignmentIntent`
|
||||
- stable `ReplicaID`
|
||||
- `Endpoint`
|
||||
- `SessionKind`
|
||||
|
||||
### 3. Engine runtime
|
||||
|
||||
Role:
|
||||
|
||||
- recovery-policy core
|
||||
|
||||
Primary files:
|
||||
|
||||
- `sw-block/engine/replication/orchestrator.go`
|
||||
- `sw-block/engine/replication/driver.go`
|
||||
- `sw-block/engine/replication/executor.go`
|
||||
- `sw-block/engine/replication/sender.go`
|
||||
- `sw-block/engine/replication/history.go`
|
||||
|
||||
What it decides:
|
||||
|
||||
- zero-gap / catch-up / needs-rebuild
|
||||
- sender/session ownership
|
||||
- stale authority rejection
|
||||
- resource acquisition / release
|
||||
- rebuild source selection
|
||||
|
||||
### 4. Storage bridge
|
||||
|
||||
Role:
|
||||
|
||||
- translate real blockvol storage truth and execution capability into engine-facing adapters
|
||||
|
||||
Primary files:
|
||||
|
||||
- `weed/storage/blockvol/v2bridge/reader.go`
|
||||
- `weed/storage/blockvol/v2bridge/pinner.go`
|
||||
- `weed/storage/blockvol/v2bridge/executor.go`
|
||||
- `sw-block/bridge/blockvol/storage_adapter.go`
|
||||
|
||||
What it provides:
|
||||
|
||||
- `RetainedHistory`
|
||||
- WAL retention pin / release
|
||||
- snapshot pin / release
|
||||
- full-base pin / release
|
||||
- WAL scan execution
|
||||
|
||||
### 5. Block runtime
|
||||
|
||||
Role:
|
||||
|
||||
- execute real I/O
|
||||
|
||||
Primary files:
|
||||
|
||||
- `weed/storage/blockvol/blockvol.go`
|
||||
- `weed/storage/blockvol/replica_apply.go`
|
||||
- `weed/storage/blockvol/replica_barrier.go`
|
||||
- `weed/storage/blockvol/recovery.go`
|
||||
- `weed/storage/blockvol/rebuild.go`
|
||||
- `weed/storage/blockvol/wal_shipper.go`
|
||||
|
||||
What it owns:
|
||||
|
||||
- WAL
|
||||
- extent
|
||||
- flusher
|
||||
- checkpoint / superblock
|
||||
- receiver / shipper
|
||||
- rebuild server
|
||||
|
||||
## Execution Order
|
||||
|
||||
### Control path
|
||||
|
||||
```text
master heartbeat / failover truth
  -> BlockVolumeAssignment
  -> volume server ProcessAssignments
  -> v2bridge control conversion
  -> engine ProcessAssignment
  -> sender/session state updated
```
|
||||
|
||||
### Catch-up path
|
||||
|
||||
```text
assignment accepted
  -> engine reads retained history
  -> engine plans catch-up
  -> storage bridge pins WAL retention
  -> engine executor drives v2bridge executor
  -> blockvol scans WAL / ships entries
  -> engine completes session
```
|
||||
|
||||
### Rebuild path
|
||||
|
||||
```text
assignment accepted
  -> engine detects NeedsRebuild
  -> engine selects rebuild source
  -> storage bridge pins snapshot/full-base/tail
  -> executor drives transfer path
  -> blockvol performs restore / replay work
  -> engine completes rebuild
```
|
||||
|
||||
### Local durability path
|
||||
|
||||
```text
WriteLBA / Trim
  -> WAL append
  -> shipping / barrier
  -> client-visible durability decision
  -> flusher writes extent
  -> checkpoint advances
  -> retention floor decides WAL reclaimability
```
|
||||
|
||||
## Interim Fields
|
||||
|
||||
These are currently acceptable only as explicit hardening carry-forwards:
|
||||
|
||||
### `localServerID`
|
||||
|
||||
Current source:
|
||||
|
||||
- `BlockService.listenAddr`
|
||||
|
||||
Meaning:
|
||||
|
||||
- temporary local identity source for replica/rebuild-side assignment translation
|
||||
|
||||
Status:
|
||||
|
||||
- interim only
|
||||
- should become registry-assigned stable server identity later
|
||||
|
||||
### `CommittedLSN = CheckpointLSN`
|
||||
|
||||
Current source:
|
||||
|
||||
- `v2bridge.Reader` / `BlockVol.StatusSnapshot()`
|
||||
|
||||
Meaning:
|
||||
|
||||
- current V1-style interim mapping where committed truth collapses to local checkpoint truth
|
||||
|
||||
Status:
|
||||
|
||||
- not final V2 truth
|
||||
- must become a gate decision before a production-candidate phase
|
||||
|
||||
### heartbeat as control carrier
|
||||
|
||||
Current source:
|
||||
|
||||
- existing master <-> volume-server heartbeat path
|
||||
|
||||
Meaning:
|
||||
|
||||
- current transport for assignment/control delivery
|
||||
|
||||
Status:
|
||||
|
||||
- acceptable as current carrier
|
||||
- not yet a final proof that no separate control channel will ever be needed
|
||||
|
||||
## Hard Gates
|
||||
|
||||
These should remain explicit in `Phase 08`:
|
||||
|
||||
### Gate 1: committed truth
|
||||
|
||||
Before production-candidate:
|
||||
|
||||
- either separate `CommittedLSN` from `CheckpointLSN`
|
||||
- or explicitly bound the first candidate path to currently proven pre-checkpoint replay behavior
|
||||
|
||||
### Gate 2: live control delivery
|
||||
|
||||
Required:
|
||||
|
||||
- real assignment delivery must reach the engine on the live path
|
||||
- not only converter-level proof
|
||||
|
||||
### Gate 3: integrated catch-up closure
|
||||
|
||||
Required:
|
||||
|
||||
- engine -> executor -> `v2bridge` -> blockvol must be proven as one live chain
|
||||
- not planner proof plus direct WAL-scan proof as separate evidence
|
||||
|
||||
### Gate 4: first rebuild execution path
|
||||
|
||||
Required:
|
||||
|
||||
- rebuild must not remain only a detection outcome
|
||||
- the chosen product path needs one real executable rebuild closure
|
||||
|
||||
### Gate 5: unified replay
|
||||
|
||||
Required:
|
||||
|
||||
- after control and execution closure land, rerun the accepted failure-class set on the unified live path
|
||||
|
||||
## Reuse Map
|
||||
|
||||
### Reuse directly
|
||||
|
||||
- `weed/server/master_grpc_server.go`
|
||||
- `weed/server/volume_grpc_client_to_master.go`
|
||||
- `weed/server/volume_server_block.go`
|
||||
- `weed/server/master_block_registry.go`
|
||||
- `weed/server/master_block_failover.go`
|
||||
- `weed/storage/blockvol/blockvol.go`
|
||||
- `weed/storage/blockvol/replica_apply.go`
|
||||
- `weed/storage/blockvol/replica_barrier.go`
|
||||
- `weed/storage/blockvol/v2bridge/`
|
||||
|
||||
### Reuse as implementation reality, not truth
|
||||
|
||||
- `shipperGroup`
|
||||
- `RetentionFloorFn`
|
||||
- `ReplicaReceiver`
|
||||
- checkpoint/superblock machinery
|
||||
- existing failover heuristics
|
||||
|
||||
### Do not inherit as V2 semantics
|
||||
|
||||
- address-shaped identity
|
||||
- old degraded/catch-up intuition from V1/V1.5
|
||||
- `CommittedLSN = CheckpointLSN` as final truth
|
||||
- blockvol-side recovery policy decisions
|
||||
|
||||
## Short Rule
|
||||
|
||||
Use this skeleton as:
|
||||
|
||||
- a hardening map for the current product path
|
||||
|
||||
Do not mistake it for:
|
||||
|
||||
- the final standalone `sw-block` architecture
|
||||
170
sw-block/docs/archive/design/v2-engine-readiness-review.md
Normal file
@@ -0,0 +1,170 @@
|
||||
# V2 Engine Readiness Review
|
||||
|
||||
Date: 2026-03-29
|
||||
Status: historical readiness review
|
||||
Purpose: record the decision on whether the current V2 design + prototype + simulator stack is strong enough to begin real V2 engine slicing
|
||||
|
||||
## Decision
|
||||
|
||||
Current judgment:
|
||||
|
||||
- proceed to real V2 engine planning
|
||||
- do not open a `V2.5` redesign track at this time
|
||||
|
||||
This is a planning-readiness decision, not a production-readiness claim.
|
||||
|
||||
## Why This Review Exists
|
||||
|
||||
The project has now completed:
|
||||
|
||||
1. design/FSM closure for the V2 line
2. protocol simulation closure for:
   - V1 / V1.5 / V2 comparison
   - timeout/race behavior
   - ownership/session semantics
3. standalone prototype closure for:
   - sender/session ownership
   - execution authority
   - recovery branching
   - minimal historical-data proof
   - prototype scenario closure
4. `Phase 4.5` hardening for:
   - bounded `CatchUp`
   - first-class `Rebuild`
   - crash-consistency / restart-recoverability
   - `A5-A8` stronger evidence
|
||||
|
||||
So the question is no longer:
|
||||
|
||||
- "can the prototype be made richer?"
|
||||
|
||||
The question is:
|
||||
|
||||
- "is the evidence now strong enough to begin real engine slicing?"
|
||||
|
||||
## Evidence Summary
|
||||
|
||||
### 1. Design / Protocol
|
||||
|
||||
Primary docs:
|
||||
|
||||
- `sw-block/design/v2-acceptance-criteria.md`
|
||||
- `sw-block/design/v2-open-questions.md`
|
||||
- `sw-block/design/v2_scenarios.md`
|
||||
- `sw-block/design/v1-v15-v2-comparison.md`
|
||||
- `sw-block/docs/archive/design/v2-prototype-roadmap-and-gates.md`
|
||||
|
||||
Judgment:
|
||||
|
||||
- protocol story is coherent
|
||||
- acceptance set exists
|
||||
- major V1 / V1.5 failures are mapped into V2 scenarios
|
||||
|
||||
### 2. Simulator
|
||||
|
||||
Primary code/tests:
|
||||
|
||||
- `sw-block/prototype/distsim/`
|
||||
- `sw-block/prototype/distsim/eventsim.go`
|
||||
- `learn/projects/sw-block/test/results/v2-simulation-review.md`
|
||||
|
||||
Judgment:
|
||||
|
||||
- strong enough for protocol/design validation
|
||||
- strong enough to challenge crash-consistency and liveness assumptions
|
||||
- not a substitute for real engine / hardware proof
|
||||
|
||||
### 3. Prototype
|
||||
|
||||
Primary code/tests:
|
||||
|
||||
- `sw-block/prototype/enginev2/`
|
||||
- `sw-block/prototype/enginev2/acceptance_test.go`
|
||||
|
||||
Judgment:
|
||||
|
||||
- ownership is explicit and fenced
|
||||
- execution authority is explicit and fenced
|
||||
- bounded `CatchUp` is semantic, not documentary
|
||||
- `Rebuild` is a first-class sender-owned path
|
||||
- historical-data and recoverability reasoning are executable
|
||||
|
||||
### 4. `A5-A8` Double Evidence
|
||||
|
||||
Prototype-side grouped evidence:
|
||||
|
||||
- `sw-block/prototype/enginev2/acceptance_test.go`
|
||||
|
||||
Simulator-side grouped evidence:
|
||||
|
||||
- `sw-block/docs/archive/design/a5-a8-traceability.md`
|
||||
- `sw-block/prototype/distsim/`
|
||||
|
||||
Judgment:
|
||||
|
||||
- the critical acceptance items that most affect engine risk now have materially stronger proof on both sides
|
||||
|
||||
## What Is Good Enough Now
|
||||
|
||||
The following are good enough to begin engine slicing:
|
||||
|
||||
1. sender/session ownership model
|
||||
2. stale authority fencing
|
||||
3. recovery orchestration shape
|
||||
4. bounded `CatchUp` contract
|
||||
5. `Rebuild` as formal path
|
||||
6. committed/recoverable boundary thinking
|
||||
7. crash-consistency / restart-recoverability proof style
|
||||
|
||||
## What Is Still Not Proven
|
||||
|
||||
The following still require real engine work and later real-system validation:
|
||||
|
||||
1. actual engine lifecycle integration
|
||||
2. real storage/backend implementation
|
||||
3. real control-plane integration
|
||||
4. real durability / fsync behavior under the actual engine
|
||||
5. real hardware timing / performance
|
||||
6. final production observability and failure handling
|
||||
|
||||
These are expected gaps. They do not block engine planning.
|
||||
|
||||
## Open Risks To Carry Forward
|
||||
|
||||
These are not blockers, but they should remain explicit:
|
||||
|
||||
1. prototype and simulator are still reduced models
|
||||
2. rebuild-source quality in the real engine will depend on actual checkpoint/base-image mechanics
|
||||
3. durability truth in the real engine must still be re-proven against actual persistence behavior
|
||||
4. predicate exploration can still grow, but should not block engine slicing
|
||||
|
||||
## Engine-Planning Decision
|
||||
|
||||
Decision:
|
||||
|
||||
- start real V2 engine planning
|
||||
|
||||
Reason:
|
||||
|
||||
1. no current evidence points to a structural flaw requiring `V2.5`
|
||||
2. the remaining gaps are implementation/system gaps, not prototype ambiguity
|
||||
3. continuing to extend prototype/simulator breadth would have diminishing returns
|
||||
|
||||
## Required Outputs After This Review
|
||||
|
||||
1. `sw-block/docs/archive/design/v2-engine-slicing-plan.md`
|
||||
2. first real engine slice definition
|
||||
3. explicit non-goals for first engine stage
|
||||
4. explicit validation plan for engine slices
|
||||
|
||||
## Non-Goals Of This Review
|
||||
|
||||
This review does not claim:
|
||||
|
||||
1. V2 is production-ready
|
||||
2. V2 should replace V1 immediately
|
||||
3. all design questions are forever closed
|
||||
|
||||
It only claims:
|
||||
|
||||
- the project now has enough evidence to begin disciplined real engine slicing
|
||||
191
sw-block/docs/archive/design/v2-engine-slicing-plan.md
Normal file
@@ -0,0 +1,191 @@
|
||||
# V2 Engine Slicing Plan
|
||||
|
||||
Date: 2026-03-29
|
||||
Status: historical slicing plan
|
||||
Purpose: define the first real V2 engine slices after prototype and `Phase 4.5` closure
|
||||
|
||||
## Goal
|
||||
|
||||
Move from:
|
||||
|
||||
- standalone design/prototype truth under `sw-block/prototype/`
|
||||
|
||||
to:
|
||||
|
||||
- a real V2 engine core under `sw-block/`
|
||||
|
||||
without dragging V1.5 lifecycle assumptions into the implementation.
|
||||
|
||||
## Planning Rules
|
||||
|
||||
1. reuse V1 ideas and tests selectively, not structurally
|
||||
2. prefer narrow vertical slices over broad skeletons
|
||||
3. each slice must preserve the accepted V2 ownership/fencing model
|
||||
4. keep simulator/prototype as validation support, not as the implementation itself
|
||||
5. do not mix V2 engine work into `weed/storage/blockvol/`
|
||||
|
||||
## First Engine Stage
|
||||
|
||||
The first engine stage should build the control/recovery core, not the full storage engine.
|
||||
|
||||
That means:
|
||||
|
||||
1. per-replica sender identity
2. one active recovery session per replica per epoch
3. sender-owned execution authority
4. explicit recovery outcomes:
   - zero gap
   - bounded catch-up
   - rebuild
5. rebuild execution shell only
   - do not hard-code final snapshot + tail vs full base decision logic yet
   - keep real rebuild-source choice tied to Slice 3 recoverability inputs
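
The three explicit recovery outcomes in item 4 can be sketched as one classification function. This is only an illustration: the inputs (`replicaLSN`, `primaryLSN`, `retainedFrom`, `maxCatchUp`) and the rule that an oversized gap escalates to rebuild are assumptions; the source names the outcomes but not the decision inputs.

```go
package main

import "fmt"

// Outcome is the explicit result of a recovery decision.
type Outcome int

const (
	ZeroGap Outcome = iota
	BoundedCatchUp
	Rebuild
)

func (o Outcome) String() string {
	switch o {
	case ZeroGap:
		return "zero gap"
	case BoundedCatchUp:
		return "bounded catch-up"
	default:
		return "rebuild"
	}
}

// classify picks one outcome (all parameters are hypothetical inputs):
//   replicaLSN   - last LSN known durable on the replica
//   primaryLSN   - current head LSN on the primary
//   retainedFrom - oldest LSN still retained in the primary's WAL
//   maxCatchUp   - largest gap the bounded catch-up contract allows
func classify(replicaLSN, primaryLSN, retainedFrom, maxCatchUp uint64) Outcome {
	if replicaLSN >= primaryLSN {
		return ZeroGap
	}
	if replicaLSN+1 < retainedFrom {
		return Rebuild // the history the replica needs is no longer retained
	}
	if primaryLSN-replicaLSN > maxCatchUp {
		return Rebuild // the gap exceeds the catch-up bound
	}
	return BoundedCatchUp
}

func main() {
	fmt.Println(classify(100, 100, 50, 1000)) // no gap
	fmt.Println(classify(90, 100, 50, 1000))  // small gap, history retained
	fmt.Println(classify(10, 100, 50, 1000))  // needed history already reclaimed
}
```

Returning exactly one outcome per decision keeps catch-up and rebuild mutually exclusive, which matches the exclusivity required later in Slice 2.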
|
||||
|
||||
## Recommended Slice Order
|
||||
|
||||
### Slice 1: Engine Ownership Core
|
||||
|
||||
Purpose:
|
||||
|
||||
- carry the accepted `enginev2` ownership/fencing model into the real engine core
|
||||
|
||||
Scope:
|
||||
|
||||
1. stable per-replica sender object
|
||||
2. stable recovery-session object
|
||||
3. session identity fencing
|
||||
4. endpoint / epoch invalidation
|
||||
5. sender-group or equivalent ownership registry
|
||||
|
||||
Acceptance:
|
||||
|
||||
1. stale session results cannot mutate current authority
|
||||
2. changed-address and epoch-bump invalidation work in engine code
|
||||
3. the 4 V2-boundary ownership themes remain provable
|
||||
|
||||
### Slice 2: Engine Recovery Execution Core
|
||||
|
||||
Purpose:
|
||||
|
||||
- move the prototype execution APIs into real engine behavior
|
||||
|
||||
Scope:
|
||||
|
||||
1. connect / handshake / catch-up flow
|
||||
2. bounded `CatchUp`
|
||||
3. explicit `NeedsRebuild`
|
||||
4. sender-owned rebuild execution path
|
||||
5. rebuild execution shell without final trusted-base selection policy
|
||||
|
||||
Acceptance:
|
||||
|
||||
1. bounded catch-up does not chase indefinitely
|
||||
2. rebuild is exclusive from catch-up
|
||||
3. session completion rules are explicit and fenced
|
||||
|
||||
### Slice 3: Engine Data / Recoverability Core
|
||||
|
||||
Purpose:
|
||||
|
||||
- connect recovery behavior to real retained-history / checkpoint mechanics
|
||||
|
||||
Scope:
|
||||
|
||||
1. real recoverability decision inputs
|
||||
2. trusted-base decision for rebuild source
|
||||
3. minimal real checkpoint/base-image integration
|
||||
4. real truncation / safe-boundary handling
|
||||
|
||||
This is the first slice that should decide, from real engine inputs, between:
|
||||
|
||||
1. `snapshot + tail`
|
||||
2. `full base`
|
||||
|
||||
Acceptance:
|
||||
|
||||
1. engine can explain why recovery is allowed
|
||||
2. rebuild-source choice is explicit and testable
|
||||
3. historical correctness and truncation rules remain intact
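
One way to make the rebuild-source choice "explicit and testable" is to return the decision together with its reason. The sketch below is hypothetical: the `Inputs` fields and the coverage rule (the retained WAL must start no later than the entry right after the snapshot) are assumptions for illustration, not the engine's actual recoverability inputs.

```go
package main

import "fmt"

// RebuildSource is the chosen base for a rebuild.
type RebuildSource int

const (
	SnapshotTail RebuildSource = iota // snapshot + tail
	FullBase                          // full base
)

// Inputs are hypothetical recoverability inputs for the decision.
type Inputs struct {
	HasTrustedSnapshot bool
	SnapshotLSN        uint64 // LSN captured by the snapshot
	WALRetainedFrom    uint64 // oldest LSN still retained in the WAL
}

// chooseSource returns the source and a human-readable reason, so the
// engine can explain why a given rebuild path was selected.
func chooseSource(in Inputs) (RebuildSource, string) {
	if !in.HasTrustedSnapshot {
		return FullBase, "no trusted snapshot available"
	}
	if in.WALRetainedFrom > in.SnapshotLSN+1 {
		return FullBase, "retained WAL does not cover the tail after the snapshot"
	}
	return SnapshotTail, "snapshot plus retained WAL tail covers the gap"
}

func main() {
	src, why := chooseSource(Inputs{HasTrustedSnapshot: true, SnapshotLSN: 100, WALRetainedFrom: 101})
	fmt.Println(src == SnapshotTail, why)
	src, why = chooseSource(Inputs{HasTrustedSnapshot: true, SnapshotLSN: 100, WALRetainedFrom: 150})
	fmt.Println(src == FullBase, why)
}
```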
|
||||
|
||||
### Slice 4: Engine Integration Closure
|
||||
|
||||
Purpose:
|
||||
|
||||
- bind engine control/recovery core to real orchestration and validation surfaces
|
||||
|
||||
Scope:
|
||||
|
||||
1. real assignment/control intent entry path
|
||||
2. engine-facing observability
|
||||
3. focused real-engine tests for V2-boundary cases
|
||||
4. first integration review against real failure classes
|
||||
|
||||
Acceptance:
|
||||
|
||||
1. key V2-boundary failures are reproduced and closed in engine tests
|
||||
2. engine observability is good enough to debug ownership/recovery failures
|
||||
3. remaining gaps are system/performance gaps, not control-model ambiguity
|
||||
|
||||
## What To Reuse
|
||||
|
||||
Good reuse candidates:
|
||||
|
||||
1. tests and failure cases from V1 / V1.5
|
||||
2. narrow utility/data helpers where not coupled to V1 lifecycle
|
||||
3. selected WAL/history concepts if they fit V2 ownership boundaries
|
||||
|
||||
Do not structurally reuse:
|
||||
|
||||
1. V1/V1.5 shipper lifecycle
|
||||
2. address-based identity assumptions
|
||||
3. `SetReplicaAddrs`-style behavior
|
||||
4. old recovery control structure
|
||||
|
||||
## Where The Work Should Live
|
||||
|
||||
Real V2 engine work should continue under:
|
||||
|
||||
- `sw-block/`
|
||||
|
||||
Recommended next area:
|
||||
|
||||
- `sw-block/core/`
|
||||
or
|
||||
- `sw-block/engine/`
|
||||
|
||||
Exact path can be chosen later, but it should remain separate from:
|
||||
|
||||
- `sw-block/prototype/`
|
||||
- `weed/storage/blockvol/`
|
||||
|
||||
## Validation Plan For Engine Slices
|
||||
|
||||
Each engine slice should be validated at three levels:
|
||||
|
||||
1. prototype alignment
|
||||
- does engine behavior preserve the accepted prototype invariant?
|
||||
|
||||
2. focused engine tests
|
||||
- does the real engine slice enforce the same contract?
|
||||
|
||||
3. scenario mapping
|
||||
- does at least one important V1/V1.5 failure class remain closed?
|
||||
|
||||
## Non-Goals For First Engine Stage
|
||||
|
||||
Do not try to do these immediately:
|
||||
|
||||
1. full Smart WAL expansion
|
||||
2. performance optimization
|
||||
3. V1 replacement/migration plan
|
||||
4. full product integration
|
||||
5. all storage/backend redesign at once
|
||||
|
||||
## Immediate Next Assignment
|
||||
|
||||
The first concrete engine-planning task should be:
|
||||
|
||||
1. choose the real V2 engine module location under `sw-block/`
|
||||
2. define Slice 1 file/module boundaries
|
||||
3. write a short engine ownership-core spec
|
||||
4. map 3-5 acceptance scenarios directly onto Slice 1 expectations
|
||||
159
sw-block/docs/archive/design/v2-first-slice-sender-ownership.md
Normal file
@@ -0,0 +1,159 @@
|
||||
# V2 First Slice: Per-Replica Sender/Session Ownership
|
||||
|
||||
Date: 2026-03-27
|
||||
Status: historical first-slice note
|
||||
Depends-on: Q1 (recovery session), Q6 (orchestrator scope), Q7 (first slice)
|
||||
|
||||
## Problem
|
||||
|
||||
`SetReplicaAddrs()` replaces the entire `ShipperGroup` atomically. This causes:
|
||||
|
||||
1. **State loss on topology change.** All shippers are destroyed and recreated. Recovery state (`replicaFlushedLSN`, `lastContactTime`, catch-up progress) is lost. After a changed-address restart, the new shipper starts from scratch.

2. **No per-replica identity.** Shippers are identified by array index. The master cannot target a specific replica for rebuild/catch-up; it must re-issue the entire address set.

3. **Background reconnect races.** A reconnect cycle may be in progress when `SetReplicaAddrs` replaces the group. The in-progress reconnect's connection objects become orphaned.
|
||||
|
||||
## Design
|
||||
|
||||
### Per-replica sender identity
|
||||
|
||||
`ShipperGroup` changes from `[]*WALShipper` to `map[string]*WALShipper`, keyed by the replica's canonical data address. Each shipper stores its own `ReplicaID`.
|
||||
|
||||
```go
type WALShipper struct {
	ReplicaID string // canonical data address; identity across reconnects
	// ... existing fields
}

type ShipperGroup struct {
	mu       sync.RWMutex
	shippers map[string]*WALShipper // keyed by ReplicaID
}
```
|
||||
|
||||
### ReconcileReplicas replaces SetReplicaAddrs
|
||||
|
||||
Instead of replacing the entire group, `ReconcileReplicas` diffs old vs new:
|
||||
|
||||
```
ReconcileReplicas(newAddrs []ReplicaAddr):
  for each existing shipper:
    if NOT in newAddrs → Stop and remove
  for each newAddr:
    if matching shipper exists → keep (preserve state)
    if no match → create new shipper
```
|
||||
|
||||
This preserves `replicaFlushedLSN`, `lastContactTime`, catch-up progress, and background reconnect goroutines for replicas that stay in the set.
|
||||
|
||||
`SetReplicaAddrs` becomes a wrapper:
|
||||
```go
func (v *BlockVol) SetReplicaAddrs(addrs []ReplicaAddr) {
	if v.shipperGroup == nil {
		v.shipperGroup = NewShipperGroup(nil)
	}
	v.shipperGroup.ReconcileReplicas(addrs, v.makeShipperFactory())
}
```
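
The diff-and-preserve behavior described above can be sketched in a self-contained form. The types here are reduced stand-ins rather than the real `blockvol` types: `ReplicaAddr.DataAddr`, the `stopped` flag, `newShipperGroup`, and the factory signature are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// ReplicaAddr is a reduced stand-in; DataAddr is a hypothetical field name.
type ReplicaAddr struct {
	DataAddr string
}

// WALShipper is a reduced stand-in carrying only what the sketch needs.
type WALShipper struct {
	ReplicaID         string
	replicaFlushedLSN uint64
	stopped           bool
}

// Stop marks the shipper as stopped (the real shipper would tear down I/O).
func (s *WALShipper) Stop() { s.stopped = true }

type ShipperGroup struct {
	mu       sync.RWMutex
	shippers map[string]*WALShipper // keyed by ReplicaID
}

func newShipperGroup() *ShipperGroup {
	return &ShipperGroup{shippers: make(map[string]*WALShipper)}
}

// ReconcileReplicas stops shippers that left the set, keeps shippers that
// stayed (preserving their state), and creates shippers for new addresses.
func (g *ShipperGroup) ReconcileReplicas(newAddrs []ReplicaAddr, factory func(ReplicaAddr) *WALShipper) {
	g.mu.Lock()
	defer g.mu.Unlock()

	wanted := make(map[string]ReplicaAddr, len(newAddrs))
	for _, a := range newAddrs {
		wanted[a.DataAddr] = a
	}
	for id, s := range g.shippers {
		if _, ok := wanted[id]; !ok {
			s.Stop()
			delete(g.shippers, id) // deleting during range is safe in Go
		}
	}
	for id, a := range wanted {
		if _, ok := g.shippers[id]; !ok {
			g.shippers[id] = factory(a)
		}
	}
}

func main() {
	mk := func(a ReplicaAddr) *WALShipper { return &WALShipper{ReplicaID: a.DataAddr} }
	g := newShipperGroup()
	g.ReconcileReplicas([]ReplicaAddr{{DataAddr: "r1:9000"}, {DataAddr: "r2:9000"}}, mk)
	g.shippers["r1:9000"].replicaFlushedLSN = 42

	// r2 is replaced by r3; r1 stays and must keep its progress.
	g.ReconcileReplicas([]ReplicaAddr{{DataAddr: "r1:9000"}, {DataAddr: "r3:9000"}}, mk)
	fmt.Println(g.shippers["r1:9000"].replicaFlushedLSN, len(g.shippers))
}
```

Keying by address means a replica that changes address is treated as a removal plus an addition, which is the behavior the changed-address flow describes.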
|
||||
|
||||
### Changed-address restart flow
|
||||
|
||||
1. The replica restarts on a new port, and its heartbeat reports the new address.
2. The master detects the endpoint change (the address differs for the same volume).
3. The master sends an assignment update to the primary with the new replica address.
4. The primary's `ReconcileReplicas` receives `[oldAddr1, newAddr2]`.
5. The old shipper for the changed replica is stopped, because its old address is no longer in the set.
6. A new shipper is created for the new address. Because shippers are keyed by address, this is a fresh shipper with no carried-over state.
7. The new shipper bootstraps: Disconnected → Connecting → CatchingUp → InSync.

The improvement over V1.5 is that the **other** replicas in the set are not disturbed. Only the changed replica gets a fresh shipper; recovery state for stable replicas is preserved.
|
||||
|
||||
### Recovery session
|
||||
|
||||
Each `WALShipper` already contains the recovery state machine:
- `state` (Disconnected → Connecting → CatchingUp → InSync → Degraded → NeedsRebuild)
- `replicaFlushedLSN` (authoritative progress)
- `lastContactTime` (retention budget)
- `catchupFailures` (escalation counter)
- background reconnect goroutine

No separate `RecoverySession` object is needed: the `WALShipper` is the per-replica recovery session, and its state machine already tracks the session lifecycle.

What changes: the session is no longer destroyed on topology change unless the replica itself is removed from the set.
|
||||
|
||||
### Coordinator vs primary responsibilities
|
||||
|
||||
| Responsibility | Owner |
|---------------|-------|
| Endpoint truth (canonical address) | Coordinator (master) |
| Assignment updates (add/remove replicas) | Coordinator |
| Epoch authority | Coordinator |
| Session creation trigger | Coordinator (via assignment) |
| Session execution (reconnect, catch-up, barrier) | Primary (via WALShipper) |
| Timeout enforcement | Primary |
| Ordered receive/apply | Replica |
| Barrier ack | Replica |
| Heartbeat reporting | Replica |
|
||||
|
||||
### Migration from current code
|
||||
|
||||
| Current | V2 |
|---------|-----|
| `ShipperGroup.shippers []*WALShipper` | `ShipperGroup.shippers map[string]*WALShipper` |
| `SetReplicaAddrs()` creates all new | `ReconcileReplicas()` diffs and preserves |
| `StopAll()` in demote | `StopAll()` unchanged (stops all) |
| `ShipAll(entry)` iterates slice | `ShipAll(entry)` iterates map values |
| `BarrierAll(lsn)` parallel slice | `BarrierAll(lsn)` parallel map values |
| `MinReplicaFlushedLSN()` iterates slice | Same, iterates map values |
| `ShipperStates()` iterates slice | Same, iterates map values |
| No per-shipper identity | `WALShipper.ReplicaID` = canonical data addr |
|
||||
|
||||
### Files changed
|
||||
|
||||
| File | Change |
|------|--------|
| `wal_shipper.go` | Add `ReplicaID` field, pass in constructor |
| `shipper_group.go` | `map[string]*WALShipper`, `ReconcileReplicas`, update iterators |
| `blockvol.go` | `SetReplicaAddrs` calls `ReconcileReplicas`, shipper factory |
| `promotion.go` | No change (StopAll unchanged) |
| `dist_group_commit.go` | No change (uses ShipperGroup API) |
| `block_heartbeat.go` | No change (uses ShipperStates) |
|
||||
|
||||
### Acceptance bar
|
||||
|
||||
The following existing tests must continue to pass:
|
||||
- All CP13-1 through CP13-7 protocol tests (sync_all_protocol_test.go)
|
||||
- All adversarial tests (sync_all_adversarial_test.go)
|
||||
- All baseline tests (sync_all_bug_test.go)
|
||||
- All rebuild tests (rebuild_v1_test.go)
|
||||
|
||||
The following CP13-8 tests validate the V2 improvement:
|
||||
- `TestCP13_SyncAll_ReplicaRestart_Rejoin` — changed-address recovery
|
||||
- `TestAdversarial_ReconnectUsesHandshakeNotBootstrap` — V2 reconnect protocol
|
||||
- `TestAdversarial_CatchupMultipleDisconnects` — state preservation across reconnects
|
||||
|
||||
New tests to add:
|
||||
- `TestReconcileReplicas_PreservesExistingShipper` — stable replica keeps state
|
||||
- `TestReconcileReplicas_RemovesStaleShipper` — removed replica stopped
|
||||
- `TestReconcileReplicas_AddsNewShipper` — new replica bootstraps
|
||||
- `TestReconcileReplicas_MixedUpdate` — one kept, one removed, one added
|
||||
|
||||
## Non-goals for this slice
|
||||
|
||||
- Smart WAL payload classes
|
||||
- Recovery reservation protocol
|
||||
- Full coordinator orchestration
|
||||
- New transport layer
|
||||
194
sw-block/docs/archive/design/v2-first-slice-session-ownership.md
Normal file
@@ -0,0 +1,194 @@
|
||||
# V2 First Slice: Per-Replica Sender and Recovery Session Ownership
|
||||
|
||||
Date: 2026-03-27
|
||||
Status: historical first-slice note
|
||||
|
||||
## Purpose
|
||||
|
||||
This document defines the first real V2 implementation slice.
|
||||
|
||||
The slice is intentionally narrow:
|
||||
|
||||
- per-replica sender ownership
|
||||
- explicit recovery session ownership
|
||||
- clear coordinator vs primary responsibility
|
||||
|
||||
This is the first step toward a standalone V2 block engine under `sw-block/`.
|
||||
|
||||
## Why This Slice First
|
||||
|
||||
It directly addresses the clearest V1.5 structural limits:
|
||||
|
||||
- sender identity loss when replica sets are refreshed
|
||||
- changed-address restart recovery complexity
|
||||
- repeated reconnect cycles without stable per-replica ownership
|
||||
- adversarial Phase 13 boundary tests that V1.5 cannot cleanly satisfy
|
||||
|
||||
It also avoids jumping too early into:
|
||||
|
||||
- Smart WAL
|
||||
- new backend storage layout
|
||||
- full production transport redesign
|
||||
|
||||
## Core Decision
|
||||
|
||||
Use:
|
||||
|
||||
- **one sender owner per replica**
|
||||
- **at most one active recovery session per replica per epoch**
|
||||
|
||||
Healthy replicas may only need their steady sender object.
|
||||
|
||||
Degraded / reconnecting replicas gain an explicit recovery session owned by the primary.
|
||||
|
||||
## Ownership Split
|
||||
|
||||
### Coordinator
|
||||
|
||||
Owns:
|
||||
|
||||
- replica identity / endpoint truth
|
||||
- assignment updates
|
||||
- epoch authority
|
||||
- session creation / destruction intent
|
||||
|
||||
Does not own:
|
||||
|
||||
- byte-by-byte catch-up execution
|
||||
- local sender loop scheduling
|
||||
|
||||
### Primary
|
||||
|
||||
Owns:
|
||||
|
||||
- per-replica sender objects
|
||||
- per-replica recovery session execution
|
||||
- reconnect / catch-up progress
|
||||
- timeout enforcement for active session
|
||||
- transition from:
  - normal sender
  - to recovery session
  - back to normal sender
|
||||
|
||||
### Replica
|
||||
|
||||
Owns:
|
||||
|
||||
- receive/apply path
|
||||
- barrier ack
|
||||
- heartbeat/reporting
|
||||
|
||||
Replica remains passive from the recovery-orchestration point of view.
|
||||
|
||||
## Data Model
|
||||
|
||||
### Sender Owner
|
||||
|
||||
Per replica, maintain a stable sender owner with:
|
||||
|
||||
- replica logical ID
|
||||
- current endpoint
|
||||
- current epoch view
|
||||
- steady-state health/status
|
||||
- optional active recovery session reference
|
||||
|
||||
### Recovery Session
|
||||
|
||||
Per replica, per epoch:
|
||||
|
||||
- `ReplicaID`
- `Epoch`
- `EndpointVersion` or equivalent endpoint truth
- `State`
  - `connecting`
  - `catching_up`
  - `in_sync`
  - `needs_rebuild`
- `StartLSN`
- `TargetLSN`
- timeout / deadline metadata
|
||||
|
||||
## Session Rules
|
||||
|
||||
1. only one active session per replica per epoch
2. new assignment for same replica:
   - supersedes old session only if epoch/session generation is newer
3. stale session must not continue after:
   - epoch bump
   - endpoint truth change
   - explicit coordinator replacement
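
These session rules amount to a fencing check on every assignment and every session result. The following is a minimal sketch under stated assumptions: `SessionID`, `Sender.Assign`, `Sender.UpdateEndpoint`, and `Sender.Accept` are hypothetical names, and ordering sessions by (epoch, generation) is one possible reading of "epoch/session generation is newer".

```go
package main

import "fmt"

// SessionID orders sessions by epoch first, then by generation.
type SessionID struct {
	Epoch      uint64
	Generation uint64
}

func (a SessionID) newerThan(b SessionID) bool {
	if a.Epoch != b.Epoch {
		return a.Epoch > b.Epoch
	}
	return a.Generation > b.Generation
}

type Session struct {
	ID              SessionID
	EndpointVersion uint64
}

// Sender is the stable per-replica owner; it holds at most one active session.
type Sender struct {
	ReplicaID       string
	Epoch           uint64
	EndpointVersion uint64
	active          *Session
}

// Assign installs a session only if it is strictly newer than the current one
// and does not move the epoch backwards.
func (s *Sender) Assign(id SessionID, endpointVersion uint64) bool {
	if id.Epoch < s.Epoch {
		return false
	}
	if s.active != nil && !id.newerThan(s.active.ID) {
		return false
	}
	s.Epoch = id.Epoch
	s.EndpointVersion = endpointVersion
	s.active = &Session{ID: id, EndpointVersion: endpointVersion}
	return true
}

// UpdateEndpoint records new endpoint truth and invalidates the active session.
func (s *Sender) UpdateEndpoint(version uint64) {
	if version != s.EndpointVersion {
		s.EndpointVersion = version
		s.active = nil
	}
}

// Accept reports whether a result from sess may still mutate sender state.
func (s *Sender) Accept(sess *Session) bool {
	return sess != nil && s.active != nil &&
		sess.ID == s.active.ID &&
		sess.EndpointVersion == s.EndpointVersion
}

func main() {
	s := &Sender{ReplicaID: "r1"}
	s.Assign(SessionID{Epoch: 1, Generation: 1}, 1)
	old := s.active
	s.Assign(SessionID{Epoch: 2, Generation: 1}, 1) // epoch bump
	fmt.Println(s.Accept(old), s.Accept(s.active))
}
```

Results from a superseded session are rejected at `Accept`, so a late completion cannot mutate current authority.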
|
||||
|
||||
## Minimal State Transitions
|
||||
|
||||
### Healthy path
|
||||
|
||||
1. replica sender exists
|
||||
2. sender ships normally
|
||||
3. replica remains `InSync`
|
||||
|
||||
### Recovery path
|
||||
|
||||
1. sender detects or is told replica is not healthy
|
||||
2. coordinator provides valid assignment/endpoint truth
|
||||
3. primary creates recovery session
|
||||
4. session connects
|
||||
5. session catches up if recoverable
|
||||
6. on success:
   - session closes
   - steady sender resumes normal state
|
||||
|
||||
### Rebuild path
|
||||
|
||||
1. session determines catch-up is not sufficient
|
||||
2. session transitions to `needs_rebuild`
|
||||
3. higher layer rebuild flow takes over
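
The recovery and rebuild paths above imply a small transition table over the session states listed in the data model. The sketch below is one possible encoding; allowing `connecting` to go straight to `in_sync` (a zero-gap reconnect) is an assumption, and `canTransition` is a hypothetical helper.

```go
package main

import "fmt"

// State mirrors the session states named in the data model.
type State string

const (
	Connecting   State = "connecting"
	CatchingUp   State = "catching_up"
	InSync       State = "in_sync"
	NeedsRebuild State = "needs_rebuild"
)

// allowed lists the legal next states. in_sync and needs_rebuild have no
// entries: the session closes, or a higher-layer rebuild flow takes over.
var allowed = map[State][]State{
	Connecting: {CatchingUp, InSync, NeedsRebuild},
	CatchingUp: {InSync, NeedsRebuild},
}

// canTransition reports whether a session may move from one state to another.
func canTransition(from, to State) bool {
	for _, next := range allowed[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition(Connecting, CatchingUp))
	fmt.Println(canTransition(CatchingUp, NeedsRebuild))
	fmt.Println(canTransition(InSync, CatchingUp))
}
```

Treating `in_sync` and `needs_rebuild` as terminal for the session keeps a new recovery attempt tied to a new session rather than to reuse of an old one.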
|
||||
|
||||
## What This Slice Does Not Include
|
||||
|
||||
Not in the first slice:
|
||||
|
||||
- Smart WAL payload classes in production
|
||||
- snapshot pinning / GC logic
|
||||
- new on-disk engine
|
||||
- frontend publication changes
|
||||
- full production event scheduler
|
||||
|
||||
## Proposed V2 Workspace Target
|
||||
|
||||
Do this under `sw-block/`, not `weed/storage/blockvol/`.
|
||||
|
||||
Suggested area:
|
||||
|
||||
- `sw-block/prototype/enginev2/`
|
||||
|
||||
Suggested first files:
|
||||
|
||||
- `sw-block/prototype/enginev2/session.go`
|
||||
- `sw-block/prototype/enginev2/sender.go`
|
||||
- `sw-block/prototype/enginev2/group.go`
|
||||
- `sw-block/prototype/enginev2/session_test.go`
|
||||
|
||||
The first code does not need full storage I/O.
|
||||
It should prove ownership and transition shape first.
|
||||
|
||||
## Acceptance For This Slice
|
||||
|
||||
The slice is good enough when:
|
||||
|
||||
1. sender identity is stable per replica
|
||||
2. changed-address reassignment updates the right sender owner
|
||||
3. multiple reconnect cycles do not lose recovery ownership
|
||||
4. stale session does not survive epoch bump
|
||||
5. the 4 Phase 13 V2-boundary tests have a clear path to become satisfiable
|
||||
|
||||
## Relationship To Existing Simulator
|
||||
|
||||
This slice should align with:
|
||||
|
||||
- `v2-acceptance-criteria.md`
|
||||
- `v2-open-questions.md`
|
||||
- `v1-v15-v2-comparison.md`
|
||||
- `distsim` / `eventsim` behavior
|
||||
|
||||
The simulator remains the design oracle.
|
||||
The first implementation slice should not contradict it.
|
||||
199
sw-block/docs/archive/design/v2-production-roadmap.md
Normal file
@@ -0,0 +1,199 @@
|
||||
# V2 Production Roadmap
|
||||
|
||||
Date: 2026-03-30
|
||||
Status: historical roadmap
|
||||
Purpose: define the path from the accepted V2 engine core to a production candidate
|
||||
|
||||
## Current Position
|
||||
|
||||
Completed:
|
||||
|
||||
1. design / FSM closure
|
||||
2. simulator / protocol validation
|
||||
3. prototype closure
|
||||
4. evidence hardening
|
||||
5. engine core slices:
   - Slice 1 ownership core
   - Slice 2 recovery execution core
   - Slice 3 data / recoverability core
   - Slice 4 integration closure
|
||||
|
||||
Current stage:
|
||||
|
||||
- entering broader engine implementation
|
||||
|
||||
This means the main risk is no longer:
|
||||
|
||||
- whether the V2 idea stands up
|
||||
|
||||
The main risk is:
|
||||
|
||||
- whether the accepted engine core can be turned into a real system without reintroducing V1/V1.5 structure and semantics
|
||||
|
||||
## Roadmap Summary
|
||||
|
||||
1. Phase 06: broader engine implementation stage
|
||||
2. Phase 07: real-system integration / product-path decision
|
||||
3. Phase 08: pre-production hardening
|
||||
4. Phase 09: performance / scale / soak validation
|
||||
5. Phase 10: production candidate and rollout gate
|
||||
|
||||
## Phase 06
|
||||
|
||||
### Goal
|
||||
|
||||
Connect the accepted engine core to:
|
||||
|
||||
1. real control truth
|
||||
2. real storage truth
|
||||
3. explicit engine execution steps
|
||||
|
||||
### Outputs
|
||||
|
||||
1. control-plane adapter into the engine core
|
||||
2. storage/base/recoverability adapters
|
||||
3. explicit execution-driver model where synchronous helpers are no longer sufficient
|
||||
4. validation against selected real failure classes
|
||||
|
||||
### Gate
|
||||
|
||||
At the end of Phase 06, the project should be able to say:
|
||||
|
||||
- the engine core can live inside a real system shape
|
||||
|
||||
## Phase 07
|
||||
|
||||
### Goal
|
||||
|
||||
Move from engine-local correctness to a real runnable subsystem.
|
||||
|
||||
### Outputs
|
||||
|
||||
1. service-style runnable engine slice
|
||||
2. integration with real control and storage surfaces
|
||||
3. crash/failover/restart integration tests
|
||||
4. decision on the first viable product path
|
||||
|
||||
### Gate
|
||||
|
||||
At the end of Phase 07, the project should be able to say:
|
||||
|
||||
- the engine can run as a real subsystem, not only as an isolated core
|
||||
|
||||
## Phase 08
|
||||
|
||||
### Goal
|
||||
|
||||
Turn correctness into operational safety.
|
||||
|
||||
### Outputs
|
||||
|
||||
1. observability hardening
|
||||
2. operator/debug flows
|
||||
3. recovery/runbook procedures
|
||||
4. config surface cleanup
|
||||
5. realistic durability/restart validation
|
||||
|
||||
### Gate
|
||||
|
||||
At the end of Phase 08, the project should be able to say:
|
||||
|
||||
- operators can run, debug, and recover the system safely
|
||||
|
||||
## Phase 09
|
||||
|
||||
### Goal
|
||||
|
||||
Prove viability under load and over time.
|
||||
|
||||
### Outputs
|
||||
|
||||
1. throughput / latency baselines
|
||||
2. rebuild / catch-up cost characterization
|
||||
3. steady-state overhead measurement
|
||||
4. soak testing
|
||||
5. scale and failure-under-load validation
|
||||
|
||||
### Gate
|
||||
|
||||
At the end of Phase 09, the project should be able to say:
|
||||
|
||||
- the design is not only correct, but viable at useful scale and duration
|
||||
|
||||
## Phase 10

### Goal

Produce a controlled production candidate.

### Outputs

1. feature-gated production candidate
2. rollback strategy
3. migration/coexistence plan with V1
4. staged rollout plan
5. production acceptance checklist

### Gate

At the end of Phase 10, the project should be able to say:

- the system is ready for a controlled production rollout

## Cross-Phase Rules

### Rule 1: Do not reopen protocol shape casually

The accepted core should remain stable unless new implementation evidence forces a change.

### Rule 2: Use V1 as validation source, not design template

Use:

1. `learn/projects/sw-block/`
2. `weed/storage/block*`

for:

1. failure gates
2. constraints
3. integration references

Do not use them as the default V2 architecture template.

### Rule 3: Keep `CatchUp` narrow

Do not let later implementation phases re-expand `CatchUp` into a broad, optimistic, long-lived recovery mode.

### Rule 4: Keep evidence quality ahead of object growth

New work should preferentially improve:

1. traceability
2. diagnosability
3. real-failure validation
4. operational confidence

not simply add new objects, states, or mechanisms.

## Production Readiness Ladder

The project should move through this ladder explicitly:

1. proof-of-design
2. proof-of-engine-shape
3. proof-of-runnable-engine-stage
4. proof-of-operable-system
5. proof-of-viable-production-candidate

Current ladder position:

- between `2` and `3`
- engine core accepted; broader runnable engine stage underway

## Next Documents To Maintain

1. `sw-block/.private/phase/phase-06.md`
2. `sw-block/docs/archive/design/v2-engine-readiness-review.md`
3. `sw-block/docs/archive/design/v2-engine-slicing-plan.md`
4. this roadmap

239
sw-block/docs/archive/design/v2-prototype-roadmap-and-gates.md
Normal file
@@ -0,0 +1,239 @@
# V2 Prototype Roadmap And Gates

Date: 2026-03-27

Status: historical prototype roadmap

Purpose: define the remaining prototype roadmap, the validation gates between stages, and the decision point between real V2 engine work and possible V2.5 redesign

## Current Position

V2 design/FSM/simulator work is sufficiently closed for serious prototyping, but not frozen against later `V2.5` adjustments.

Current state:

- design proof: high
- execution proof: medium
- data/recovery proof: low
- prototype end-to-end proof: low

Rough prototype progress:

- `25%` to `35%`

This is an early executable prototype, not an engine-ready prototype.

## Roadmap Goal

Answer this question with prototype evidence:

- can V2 become a real engine path?
- or should it become `V2.5` before real implementation begins?

## Step 1: Execution Authority Closure

Purpose:

- finish the sender / recovery-session authority model so stale work is unambiguously rejected

Scope:

1. ownership-only `AttachSession()` / `SupersedeSession()`
2. execution begins only through execution APIs
3. stale handshake / progress / completion fenced by `sessionID`
4. endpoint bump / epoch bump invalidate execution authority
5. sender-group preserve-or-kill behavior is explicit

Done when:

1. all execution APIs are sender-gated and reject stale `sessionID`
2. session creation is separated from execution start
3. phase ordering is enforced
4. endpoint bump / epoch bump invalidate execution authority correctly
5. mixed add/remove/update reconciliation preserves or kills state exactly as intended

Main files:

- `sw-block/prototype/enginev2/`
- `sw-block/prototype/distsim/`
- `learn/projects/sw-block/phases/phase-13-v2-boundary-tests.md`

Key gate:

- old recovery work cannot mutate current sender state at any execution stage

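
The fencing rules in Step 1 can be sketched in Go as follows. This is only an illustration under stated assumptions, not the `enginev2` code: the `Sender` and `session` types, the field names, and the `BeginExecution`, `RecordProgress`, and `BumpEpoch` methods are hypothetical. Only `AttachSession()` and `sessionID` are named in this roadmap. The sketch shows one way to keep session creation separate from execution start and to route every execution call through a single authority check.

```go
package main

import "errors"

// ErrStaleSession is returned when a call carries a sessionID (or epoch)
// that no longer holds execution authority.
var ErrStaleSession = errors.New("stale session: execution authority revoked")

// ErrNotStarted is returned when progress arrives before execution began.
var ErrNotStarted = errors.New("session attached but execution not started")

// session is a hypothetical recovery-session record owned by a sender.
type session struct {
	id      uint64
	epoch   uint64
	started bool
	lastLSN uint64
}

// Sender is a hypothetical per-replica sender; all names are illustrative.
type Sender struct {
	epoch   uint64
	nextID  uint64
	current *session
}

// AttachSession only establishes ownership; it does not start execution.
// Any previously attached session is superseded and loses authority.
func (s *Sender) AttachSession() uint64 {
	s.nextID++
	s.current = &session{id: s.nextID, epoch: s.epoch}
	return s.nextID
}

// BumpEpoch models an epoch (or endpoint) change: the current session is dropped.
func (s *Sender) BumpEpoch() {
	s.epoch++
	s.current = nil
}

// authorize is the single fence that every execution API goes through.
func (s *Sender) authorize(sessionID uint64) (*session, error) {
	c := s.current
	if c == nil || c.id != sessionID || c.epoch != s.epoch {
		return nil, ErrStaleSession
	}
	return c, nil
}

// BeginExecution starts execution for the session that holds authority.
func (s *Sender) BeginExecution(sessionID uint64) error {
	c, err := s.authorize(sessionID)
	if err != nil {
		return err
	}
	c.started = true
	return nil
}

// RecordProgress enforces phase ordering: progress is valid only after BeginExecution.
func (s *Sender) RecordProgress(sessionID, lsn uint64) error {
	c, err := s.authorize(sessionID)
	if err != nil {
		return err
	}
	if !c.started {
		return ErrNotStarted
	}
	if lsn > c.lastLSN {
		c.lastLSN = lsn
	}
	return nil
}
```

In this sketch a superseded session and a session from before an epoch bump fail the same check, so a late progress or completion call from old recovery work never reaches sender state.
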
## Step 2: Orchestrated Recovery Prototype

Purpose:

- move from good local sender APIs to an actual prototype recovery flow driven by assignment/update intent

Scope:

1. assignment/update intent creates or supersedes recovery attempts
2. reconnect / reassignment / catch-up / rebuild decision path
3. sender-group becomes orchestration entry point
4. explicit outcome branching:
   - zero-gap fast completion
   - positive-gap catch-up
   - unrecoverable gap -> `NeedsRebuild`

Done when:

1. the prototype expresses a realistic recovery flow from topology/control intent
2. sender-group drives recovery creation, not only unit helpers
3. recovery outcomes are explicit and testable
4. orchestrator responsibility is clear enough to narrow `v2-open-questions.md` item 6

Key gate:

- recovery control is no longer scattered across helper calls; it has one clear orchestration path

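
The three-way outcome branching in Step 2 can be made explicit and testable with a small decision function. The Go sketch below is illustrative only: the `Outcome` constants other than `NeedsRebuild`, the `Probe` struct, and the `Decide` function are hypothetical names, and the `GapRetained` flag stands in for whatever recoverability check the prototype actually performs.

```go
package main

// Outcome is a hypothetical enumeration of the explicit recovery results.
type Outcome int

const (
	ZeroGapComplete Outcome = iota // replica is already at the target LSN
	CatchUp                        // positive gap that retained history can cover
	NeedsRebuild                   // gap that cannot be served from retained history
)

// String makes outcomes readable in logs and test failures.
func (o Outcome) String() string {
	switch o {
	case ZeroGapComplete:
		return "ZeroGapComplete"
	case CatchUp:
		return "CatchUp"
	case NeedsRebuild:
		return "NeedsRebuild"
	default:
		return "Unknown"
	}
}

// Probe is a hypothetical handshake result for one replica.
type Probe struct {
	ReplicaLSN  uint64 // last LSN the replica reports as applied
	TargetLSN   uint64 // LSN the replica must reach
	GapRetained bool   // assumed input: every missing entry is still available
}

// Decide maps a probe onto exactly one outcome, so no recovery attempt
// can proceed without an explicit branch being chosen.
func Decide(p Probe) Outcome {
	switch {
	case p.ReplicaLSN >= p.TargetLSN:
		return ZeroGapComplete
	case p.GapRetained:
		return CatchUp
	default:
		return NeedsRebuild
	}
}
```

Keeping the decision in one pure function means the orchestration path has a single place where the branch is taken, which is what makes the outcomes straightforward to test in isolation.
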
## Step 3: Minimal Historical Data Prototype

Purpose:

- prove the recovery model against real data-history assumptions, not only control logic

Scope:

1. minimal WAL/history model, not full engine
2. enough to exercise:
   - catch-up range
   - retained prefix/window
   - rebuild fallback
   - historical correctness at target LSN
3. enough reservation/recoverability state to make recovery explicit

Done when:

1. the prototype can prove why a gap is recoverable or unrecoverable
2. catch-up and rebuild decisions are backed by minimal data/history state
3. `v2-open-questions.md` items 3, 4, 5 are closed or sharply narrowed
4. prototype evidence strengthens acceptance criteria `A5`, `A6`, and `A7`

Key gate:

- the prototype must explain why recovery is allowed, not just that policy says it is

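
One way to make the prototype explain why recovery is allowed is to return the evidence together with the verdict. The Go sketch below assumes a deliberately simple history model in which the retained WAL is a single contiguous, non-empty LSN window. The `Window` and `Verdict` types and the `Check` function are hypothetical and are not taken from the prototype.

```go
package main

import "fmt"

// Window is a hypothetical view of retained WAL history: entries with an
// LSN in [OldestLSN, NewestLSN] can still be replayed. It is assumed non-empty.
type Window struct {
	OldestLSN uint64
	NewestLSN uint64
}

// Verdict carries the decision and the reason behind it.
type Verdict struct {
	Recoverable bool
	From, To    uint64 // inclusive catch-up range; meaningful only for a positive, recoverable gap
	Reason      string
}

// Check decides whether a replica at replicaLSN can reach targetLSN
// using only entries inside the retained window.
func Check(w Window, replicaLSN, targetLSN uint64) Verdict {
	if replicaLSN >= targetLSN {
		return Verdict{Recoverable: true, Reason: "zero gap: replica already at target"}
	}
	from := replicaLSN + 1 // first entry the replica is missing
	switch {
	case targetLSN > w.NewestLSN:
		return Verdict{Reason: fmt.Sprintf(
			"target LSN %d is beyond newest retained LSN %d", targetLSN, w.NewestLSN)}
	case from < w.OldestLSN:
		return Verdict{Reason: fmt.Sprintf(
			"replica needs LSN %d but oldest retained LSN is %d", from, w.OldestLSN)}
	default:
		return Verdict{
			Recoverable: true,
			From:        from,
			To:          targetLSN,
			Reason: fmt.Sprintf(
				"range [%d, %d] lies inside retained window [%d, %d]",
				from, targetLSN, w.OldestLSN, w.NewestLSN),
		}
	}
}
```

With a window of `[100, 200]`, a replica at LSN 150 gets the catch-up range `[151, 200]`, while a replica at LSN 50 is unrecoverable because entry 51 has already left the window. A real model would also need reservation state so the window cannot move while catch-up is in flight, which this sketch leaves out.
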
## Step 4: Prototype Scenario Closure

Purpose:

- make the prototype itself demonstrate the V2 story end-to-end

Scope:

1. map key V2 scenarios onto the prototype
2. express the 4 V2-boundary cases against prototype behavior
3. add one small end-to-end harness inside `sw-block/prototype/`
4. align prototype evidence with acceptance criteria

Done when:

1. prototype behavior can be reviewed scenario-by-scenario
2. key V1/V1.5 failures have prototype equivalents
3. prototype outcomes match intended V2 design claims
4. remaining gaps are clearly real-engine gaps, not protocol/prototype ambiguity

Key gate:

- a reviewer can trace:
  - acceptance criteria -> scenario -> prototype behavior
  without hand-waving

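
The traceability gate in Step 4 can be checked mechanically by keeping the criterion -> scenario -> prototype-test chain as data. The Go sketch below is a hypothetical illustration: the `Trace` type and the `Untraced` function are invented here, and any scenario or test names passed to them are placeholders rather than names from the repository. Only the criterion labels such as `A5` come from this roadmap.

```go
package main

// Trace links one acceptance criterion to a scenario and to the prototype
// test that exercises that scenario. All three fields are free-form labels.
type Trace struct {
	Criterion string
	Scenario  string
	Test      string
}

// Untraced returns, in input order, the criteria that have no complete
// criterion -> scenario -> test chain.
func Untraced(criteria []string, traces []Trace) []string {
	covered := make(map[string]bool)
	for _, t := range traces {
		// A chain counts only when both the scenario and the test are named.
		if t.Scenario != "" && t.Test != "" {
			covered[t.Criterion] = true
		}
	}
	var missing []string
	for _, c := range criteria {
		if !covered[c] {
			missing = append(missing, c)
		}
	}
	return missing
}
```

Run over the criteria list, an empty result means every criterion can be followed down to executable behavior; any returned criterion marks a place where a reviewer would still have to take the claim on trust.
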
## Gates

### Gate 1: Design Closed Enough

Status:

- mostly passed

Meaning:

1. acceptance criteria exist
2. core simulator exists
3. ownership gap from V1.5 is understood

### Gate 2: Execution Authority Closed

Passes after Step 1.

Meaning:

- stale execution results cannot mutate current authority

### Gate 3: Orchestrated Recovery Closed

Passes after Step 2.

Meaning:

- recovery flow is controlled by one coherent orchestration model

### Gate 4: Historical Data Model Closed

Passes after Step 3.

Meaning:

- catch-up vs rebuild is backed by executable data-history logic

### Gate 5: Prototype Convincing

Passes after Step 4.

Meaning:

- enough evidence exists to choose:
  - real V2 engine path
  - or `V2.5` redesign

## Decision Gate After Step 4

### Path A: Real V2 Engine Planning

Choose this if:

1. prototype control logic is coherent
2. recovery boundary is explicit
3. boundary cases are convincing
4. no major structural flaw remains

Outputs:

1. real engine slicing plan
2. migration/integration plan into future standalone `sw-block`
3. explicit non-goals for first production version

### Path B: V2.5 Redesign

Choose this if the prototype reveals:

1. ownership/orchestration still too fragile
2. recovery boundary still too implicit
3. historical correctness model too costly or too unclear
4. too much complexity leaks into the hot path

Output:

- write `V2.5` as a design/prototype correction before engine work

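
The decision gate can be written down as a small checklist function so the choice is recorded together with its reasons. The Go sketch below rests on an assumption the roadmap does not state outright: that Path A requires all four of its conditions and that any failed condition leads to Path B. The `Findings` type, the `Path` constants, and the `Choose` function are hypothetical.

```go
package main

// Findings is a hypothetical summary of what the prototype showed after Step 4.
type Findings struct {
	ControlLogicCoherent     bool
	RecoveryBoundaryExplicit bool
	BoundaryCasesConvincing  bool
	NoMajorStructuralFlaw    bool
}

// Path names the two exits of the decision gate.
type Path string

const (
	PathA Path = "Path A: real V2 engine planning"
	PathB Path = "Path B: V2.5 redesign"
)

// Choose returns PathA only when every Path A condition holds (an assumed
// reading of the gate). Otherwise it returns PathB plus the failed conditions.
func Choose(f Findings) (Path, []string) {
	var failed []string
	if !f.ControlLogicCoherent {
		failed = append(failed, "prototype control logic is coherent")
	}
	if !f.RecoveryBoundaryExplicit {
		failed = append(failed, "recovery boundary is explicit")
	}
	if !f.BoundaryCasesConvincing {
		failed = append(failed, "boundary cases are convincing")
	}
	if !f.NoMajorStructuralFlaw {
		failed = append(failed, "no major structural flaw remains")
	}
	if len(failed) == 0 {
		return PathA, nil
	}
	return PathB, failed
}
```

Returning the failed conditions alongside the path gives a `V2.5` correction a concrete starting list rather than a bare verdict.
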
## What Not To Do Yet

1. no Smart WAL expansion beyond what Step 3 minimally needs
2. no backend/storage-engine redesign
3. no V1 production integration
4. no frontend/wire protocol work
5. no performance optimization as a primary goal

## Practical Summary

Current sequence:

1. finish execution authority
2. build orchestrated recovery
3. add minimal historical-data proof
4. close key scenarios against the prototype
5. decide:
   - V2 engine
   - or `V2.5`