Commit Graph

2 Commits

Author SHA1 Message Date
Ping Qiu
da1b81d1c9 feat: CP8-3-1 durability modes + testrunner platform + 21 adversarial tests
Durability mode implementation (sync_all, sync_quorum, best_effort):
- DurabilityMode type with superblock persistence, parse/validate/string
- MakeDistributedSync mode-aware barrier enforcement in dist_group_commit
- blockerr sentinel package (ErrDurabilityBarrierFailed, ErrDurabilityQuorumLost)
- gRPC create path: mode validation, idempotent create consistency, partial cleanup
- F1: strict mode rejects partial replica provisioning with cleanup
- F3: empty heartbeat does not overwrite persisted strict mode
- F4: SCSI error mapping uses errors.Is sentinels (not string matching)
- Proto/wire/blockapi/CLI/UI plumbing for durability_mode field
- Observability dashboard: cluster health cards + per-volume columns

Testrunner platform (YAML-driven integration test framework):
- Engine, parser, registry, reporter (JUnit XML + HTML), metrics scraping
- 52 registered actions: block, iSCSI, I/O, fault injection, assertions
- Baseline regression framework with 7 hard-fail conditions
- 15 YAML scenarios (smoke, crash, HA, fault, consistency, snapshot)
- 49 unit tests for testrunner internals

QA adversarial suite (21 tests, all PASS):
- Idempotent create mode/RF mismatch detection
- Heartbeat mode downgrade prevention (F3)
- sync_all/sync_quorum partial replica enforcement (F1)
- Concurrent create race safety
- Failover/expand mode preservation
- Cleanup resilience when delete fails
- Master restart auto-register mode handling
- Superblock roundtrip all 3 modes
- Validate edge cases (mode×RF matrix)
- RequiredReplicas quorum math verification
- Sentinel error categorization
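
A rough sketch of the kind of quorum math the RequiredReplicas test might exercise. The rules here are assumptions, not taken from the code: sync_all waits for every copy, sync_quorum waits for a strict majority, best_effort needs only the local copy, and the count includes the primary. The signature is hypothetical as well.

```go
package main

import "fmt"

// RequiredReplicas returns how many copies (primary included) must
// acknowledge a write before the barrier is considered satisfied.
// ASSUMPTION: the rules below are illustrative, not the committed logic.
func RequiredReplicas(mode string, rf int) (int, error) {
	if rf < 1 {
		return 0, fmt.Errorf("replication factor must be >= 1, got %d", rf)
	}
	switch mode {
	case "best_effort":
		return 1, nil // local copy only
	case "sync_quorum":
		return rf/2 + 1, nil // strict majority of rf copies
	case "sync_all":
		return rf, nil // every copy must acknowledge
	default:
		return 0, fmt.Errorf("unknown durability mode %q", mode)
	}
}
```

Under these assumptions sync_quorum at RF=2 requires both copies, so it behaves like sync_all there, which is one reason a mode×RF validation matrix is worth testing.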

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 01:06:51 -08:00
Ping Qiu
979a9b496c feat: Phase 8 CP8-1/2/3/4 -- ops control plane, multi-replica, CSI snapshots, observability
CP8-1: HTTP REST API (create/delete/lookup/list/assign/servers), blockapi Go
client with multi-master failover, 5 shell commands, HTML dashboard at /block/.

CP8-2: RF=2/RF=3 multi-replica support -- ShipperGroup fan-out, distributed
sync, health scoring, segment-based scrub, gated promotion (heartbeat
freshness + WAL LSN + role checks), failover/rebuild for N>2 replicas.

CP8-3: CSI snapshot + expansion -- CreateSnapshot/DeleteSnapshot/ListSnapshots
RPCs, NodeExpandVolume with iSCSI rescan, snapshot ID helpers, 20 adversarial
tests covering concurrent ops, edge cases, and error injection.

CP8-4: Observability -- EngineMetrics atomic counters for flusher/group-commit/
WAL-shipper/scrub, 10 new Prometheus metrics, barrier_lag_lsn SLO gauge,
failover/promotion/rebuild counters, request ID correlation in master gRPC
logs, baseline regression framework with 7 hard-fail conditions.
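
A small sketch of the atomic-counter approach described for EngineMetrics. The field names and the lag definition (primary LSN minus the acknowledged LSN, clamped at zero) are assumptions; only the EngineMetrics name and the barrier_lag_lsn gauge come from this message. It uses the typed atomics from sync/atomic, which require Go 1.19 or later.

```go
package main

import "sync/atomic"

// EngineMetrics holds lock-free counters that hot paths can update
// without contending on a mutex. Field names are hypothetical.
type EngineMetrics struct {
	FlushCount    atomic.Uint64 // flusher passes completed
	GroupCommits  atomic.Uint64 // group-commit batches committed
	BarrierLagLSN atomic.Uint64 // gauge backing barrier_lag_lsn
}

// ObserveBarrier records how far the acknowledged LSN trails the primary.
// ASSUMPTION: lag is primaryLSN - ackedLSN, clamped at zero so a replica
// that reports a newer LSN does not cause unsigned underflow.
func (m *EngineMetrics) ObserveBarrier(primaryLSN, ackedLSN uint64) {
	var lag uint64
	if primaryLSN > ackedLSN {
		lag = primaryLSN - ackedLSN
	}
	m.BarrierLagLSN.Store(lag)
}
```

A Prometheus collector could then read these values with Load at scrape time, so the write path never waits on the metrics exporter.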

Total: 63 files, ~11.2K LOC, 160+ new tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 00:05:17 -08:00