Closes the four deferred items from project_chirp_v2_pr_f_review_followups
that were carved out of 51a94f0 to keep that diff narrow.
A. TB doppler_bin / dbg_doppler_bin / dbg_range_bin were still 5 / 6 bits
wide, although PR-F widened the corresponding ports to 6 / 9 bits:
- tb/tb_doppler_cosim.v
- tb/tb_doppler_frame_start_gate.v
- tb/tb_system_e2e.v
- tb/radar_system_tb.v
- tb/tb_radar_receiver_final.v
All five files now include radar_params.vh and use
`RP_DOPPLER_BIN_WIDTH / `RP_RANGE_BIN_WIDTH_MAX. tb_doppler_cosim.v
was already structured around CHIRPS=32 and would have stalled
forever against the new 48-chirp default — added explicit parameter
overrides (CHIRPS_PER_FRAME=32, CHIRPS_PER_SUBFRAME=16, RANGE_BINS=512)
to keep its legacy 2-subframe golden vectors valid, mirroring the
pattern already used by tb_doppler_realdata / tb_fullchain_realdata.
B. tb_radar_receiver_final hardcoded NUM_DOPPLER_BINS=32 across the
golden buffer, the per-range bitmap, the duplicate-detect mask, the
gidx multiplier, and the S5/S6/S7/B3/B4 expected counts. All bumped
to `RP_NUM_DOPPLER_BINS (=48) via NUM_DBINS / NUM_RBINS / GOLDEN_ENTRIES
localparams; per-range index_seen widened to 64-bit so
`(64'd1 << doppler_bin)` covers bins 32..47. Note: under iverilog the
doppler-frame checks (S4-S9, B2a, B3, B4, G1) remain gated on
FFT_USE_XILINX_IP — the in-house fft_engine is too slow to land a
48-chirp Doppler frame in 20 ms sim; under XSim with the IP the
widened logic now exercises the full 24576-cell output (was 16384).
The 8-test active subset under iverilog is unchanged.
C. radar_system_top_50t.v adds an `include "radar_params.vh" directive,
which is needed for the [`RP_DOPPLER_BIN_WIDTH-1:0] range added in PR-F.
It previously compiled only because alphabetical Vivado file ordering
processes radar_system_top.v (which does include the header) first and the
macros leak across the same compilation unit. While here, also bumps
the dbg_range_bin_nc tie-off wire from a literal [5:0] to
[`RP_RANGE_BIN_WIDTH_MAX-1:0] so the wrapper width matches the port.
D. usb_data_interface_ft2232h.v:392 stale comment ("FRAME_CELLS = 24576
< 32768") rewritten to reflect that PR-F's pad-to-power-of-2 makes
FRAME_CELLS = NUM_RANGE_BINS * (1<<DOPPLER_BIN_BITS) = 32768 (the
full 15-bit address space).
Tests (parity with PR-F baseline numbers in 7862f4d / 51a94f0):
- tb_doppler_cosim (3 scenarios): 14/14 each + Python golden compare PASS
- tb_doppler_frame_start_gate: 21/21 PASS
- tb_doppler_realdata: 2056/2056 PASS
- tb_cfar_ca: 24/0 PASS
- tb_chirp_controller: 43/43 PASS
- tb_chirp_contract: 10/10 PASS
- tb_mti_canceller: 43/43 PASS
- tb_radar_receiver_final: 8/8 PASS
- tb_system_e2e: 33/49 PASS
- radar_system_tb (USB_MODE=1): smoke (no PASS/FAIL markers; runs to $finish)
Lint (iverilog -Wall on full PROD_RTL + 50t wrapper): no new
width / Padding / Truncating warnings introduced.
Two issues caught re-reviewing 7862f4d:
1. doppler_processor.v: at sub_frame = NUM_SUBFRAMES-1 (=2 in production),
the read-ahead pointer was advanced one cycle past the last useful chirp,
producing an out-of-range mem_read_addr (chirps 48/49 in a 48-chirp frame)
on the BRAM read port. The result was never consumed — counter > CPS-1
blocks the multiply — so the OOB read had no functional effect, but it
still drives mem_mem[OOB_idx] every frame and would trigger Vivado synth
range warnings. Gate the read_doppler_index advance on
fft_sample_counter <= CHIRPS_PER_SUBFRAME - 3 so the last NBA at
counter = CPS-3 schedules the data needed at counter = CPS-1 and no more.
For sub_frame < NUM_SUBFRAMES-1 this just replaces previously-wasted
forward reads with redundant reads of the same address; outputs are
bit-exact.
2. radar_system_top.v: cfar_detect_class, cfar_detect_threshold_soft, and
cfar_detect_count_cand were declared and connected to cfar_inst but went
nowhere downstream. They will be wired to USB / telemetry in PR-G; until
then they show up as dangling wires that Vivado optimises away with
noisy warnings. Drop the wire decls and leave the cfar_ca output ports
unconnected. The soft-tier comparison is still synthesized because the
1-bit detect_flag (which IS wired) depends on noise_product_soft via the
`else if (cur > thr_soft)` branch, so the candidate logic is preserved
in the netlist — only the class / soft-thr / cand-count rails are gone.
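The read-ahead fix in item 1 can be illustrated with a toy cycle model. It
is a sketch under assumptions, not the RTL: it assumes a 1-cycle BRAM read
latency (the address presented while the counter is c returns the sample
needed at c + 1) and that read_doppler_index is primed to chirp_base + 1
before counter 0; presented_addresses() is a hypothetical helper. Under
those assumptions the ungated pointer reaches chirp 48 on the last
sub-frame, the gated one stops at chirp 47, and every useful read is
unchanged.

```python
# Toy cycle model of the doppler_processor read-ahead pointer (a sketch,
# not the RTL). Assumes 1-cycle read latency and a pointer primed to
# chirp_base + 1 before fft_sample_counter == 0.

CHIRPS_PER_SUBFRAME = 16  # CPS
NUM_SUBFRAMES = 3         # 48-chirp production frame


def presented_addresses(chirp_base, cps, gated):
    """Return the read address driven on each cycle, counter = 0..cps-1."""
    read_index = chirp_base + 1  # primed before the first counter value
    addrs = []
    for counter in range(cps):
        addrs.append(read_index)  # address on the BRAM port this cycle
        # Non-blocking update: the new value is seen on the next cycle.
        if not gated or counter <= cps - 3:
            read_index += 1
    return addrs


last_base = (NUM_SUBFRAMES - 1) * CHIRPS_PER_SUBFRAME  # 32
ungated = presented_addresses(last_base, CHIRPS_PER_SUBFRAME, gated=False)
gated = presented_addresses(last_base, CHIRPS_PER_SUBFRAME, gated=True)
print(max(ungated), max(gated))  # 48 47: chirp 48 is out of range
```

In this model the only difference between the two variants is the final
cycle, where the gated pointer re-reads chirp base + CPS - 1 instead of
stepping past it.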
Tests (parity with the PR-F numbers in 7862f4d):
- tb_chirp_controller: 43/43 PASS
- tb_chirp_contract: 10/10 PASS
- tb_cfar_ca: 24/0 PASS
- tb_mti_canceller: 43/43 PASS
- tb_doppler_realdata: 2056/2056 PASS
- tb_doppler_frame_start_gate: 21/21 PASS
- tb_system_e2e: 33/49 PASS (PR-F baseline parity)
Bumps RP_CHIRPS_PER_FRAME 32 -> 48 (= 3 sub-frames × 16 chirps), widens
doppler_bin from 5 to 6 bits ({sub_frame[1:0], bin[3:0]}), and replaces the
1-bit detect_flag rail with a 2-bit detect_class (NONE / CANDIDATE /
CONFIRMED) sourced from a soft+confirm CFAR threshold pair.
doppler_processor:
Generalised the 2-subframe FSM to NUM_SUBFRAMES = CHIRPS_PER_FRAME /
CHIRPS_PER_SUBFRAME (=3 in production, =2 when TBs override). S_OUTPUT
walks current_sub_frame 0..NUM_SUBFRAMES-1 then advances range_bin;
the chirp_base * CHIRPS_PER_SUBFRAME formula replaces the if/else split.
write_chirp_index, read_doppler_index, sub_frame, current_sub_frame all
widened to 6/2 bits accordingly. doppler_bin packing {current_sub_frame[1:0],
fft_sample_counter[3:0]} naturally yields 6 bits.
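The {current_sub_frame[1:0], fft_sample_counter[3:0]} packing can be
checked with a short sketch. The helper names are hypothetical (the RTL is
a plain concatenation); the sketch shows that 3 sub-frames of 16 bins give
contiguous codes 0..47, which fit in 6 bits.

```python
# Sketch of the 6-bit doppler_bin packing {sub_frame[1:0], bin[3:0]}.

CHIRPS_PER_FRAME = 48
CHIRPS_PER_SUBFRAME = 16
NUM_SUBFRAMES = CHIRPS_PER_FRAME // CHIRPS_PER_SUBFRAME  # 3 in production


def pack_doppler_bin(sub_frame, fft_bin):
    assert 0 <= sub_frame < NUM_SUBFRAMES and 0 <= fft_bin < CHIRPS_PER_SUBFRAME
    return (sub_frame << 4) | fft_bin  # {sub_frame[1:0], bin[3:0]}


def unpack_doppler_bin(doppler_bin):
    return doppler_bin >> 4, doppler_bin & 0xF


bins = [pack_doppler_bin(s, b) for s in range(NUM_SUBFRAMES) for b in range(16)]
print(len(bins), min(bins), max(bins))  # 48 0 47
```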
cfar_ca:
Adds cfg_alpha_soft input + r_alpha_soft register (default
RP_DEF_CFAR_ALPHA_SOFT = 0x18 = 1.5 in Q4.4 → Pfa_soft ≈ 1e-5). ST_CFAR_MUL
computes both noise_product (alpha) and noise_product_soft (alpha_soft) in
parallel DSPs; ST_CFAR_CMP emits detect_class = CONFIRMED when cur > thr,
CANDIDATE when cur > thr_soft (and not CONFIRMED), NONE otherwise.
detect_flag is preserved as (class != NONE) for backward compat.
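The two-tier comparison in ST_CFAR_CMP can be sketched as follows. This is
an illustration under assumptions, not the RTL: it assumes the thresholds
are (alpha × noise) >> 4 to undo the Q4.4 scale, uses α = 3.0 (the confirm
figure quoted in PR-A) for the confirm tier, and the numeric class codes
0/1/2 for NONE/CANDIDATE/CONFIRMED are illustrative.

```python
# Two-tier CFAR classification sketch (Q4.4 alphas, hypothetical scaling).

DETECT_NONE, DETECT_CANDIDATE, DETECT_CONFIRMED = 0, 1, 2

ALPHA_SOFT_Q44 = 0x18  # 1.5 in Q4.4 (RP_DEF_CFAR_ALPHA_SOFT)
ALPHA_Q44 = 0x30       # 3.0 in Q4.4 (confirm tier)


def classify(cur, noise, alpha=ALPHA_Q44, alpha_soft=ALPHA_SOFT_Q44):
    thr = (alpha * noise) >> 4
    thr_soft = (alpha_soft * noise) >> 4
    if cur > thr:
        detect_class = DETECT_CONFIRMED
    elif cur > thr_soft:
        detect_class = DETECT_CANDIDATE
    else:
        detect_class = DETECT_NONE
    detect_flag = detect_class != DETECT_NONE  # backward-compatible 1-bit rail
    return detect_class, detect_flag


print(classify(cur=400, noise=100))  # (2, True): above 3.0x noise
print(classify(cur=200, noise=100))  # (1, True): between 1.5x and 3.0x
print(classify(cur=120, noise=100))  # (0, False)
```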
Address packing now pads doppler axis to next power-of-2 (DOPPLER_PAD =
1 << ceil(log2(NUM_DOPPLER))) so {range, doppler} packs contiguously
for both NUM_DOPPLER=32 (legacy TB) and NUM_DOPPLER=48 (production).
Mag-BRAM grows from ~16 to ~30 RAMB18 on 50T (acceptable on the budget).
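The padding rule can be reproduced in a few lines. The helper names below
are hypothetical; the sketch shows that padding 48 Doppler bins to 64 turns
the {range, doppler} address into a plain shift-and-OR and yields the
15-bit, 32768-cell space quoted for the 50T build.

```python
# Sketch: pad the Doppler axis to the next power of two so that
# {range, doppler} is a plain bit concatenation.


def doppler_pad(num_doppler):
    return 1 << (num_doppler - 1).bit_length()  # 1 << ceil(log2(n)), n >= 1


def cell_addr(range_bin, doppler_bin, num_doppler):
    pad = doppler_pad(num_doppler)
    dop_bits = pad.bit_length() - 1
    assert 0 <= doppler_bin < num_doppler
    return (range_bin << dop_bits) | doppler_bin  # == range_bin * pad + doppler_bin


NUM_RANGE_BINS = 512  # 50T
for n in (32, 48):
    pad = doppler_pad(n)
    cells = NUM_RANGE_BINS * pad
    print(n, pad, cells, (cells - 1).bit_length())
# 32 32 16384 14  (legacy TB)
# 48 64 32768 15  (production)
```

Only 48 of every 64 slots hold data, so 24576 of the 32768 cells are live;
the rest are the price of avoiding a multiplier in the address path.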
usb_data_interface_ft2232h:
doppler_bin_in widened to 6 bits. FRAME_CELLS pads to next power of two
(32K) so {range, doppler[5:0]} concatenation lands cleanly. Address regs
bumped: mag_wr/rd_addr 14→15, detect_byte_addr 11→12, detect_clear bit-
counter 14→15. Detect-bit BRAM grows 2K→4K bytes. Wire-protocol byte
counts auto-scale with FRAME_CELLS / DOPPLER_MAG_SECTION_BYTES; PR-G
bumps the bulk-frame protocol version so the host parser knows.
Other:
- radar_params.vh: RP_CHIRPS_PER_FRAME 32→48, RP_NUM_DOPPLER_BINS 32→48,
RP_DOPPLER_MEM_ADDR_W 14→15 (50T) / 17→18 (200T), RP_CFAR_MAG_ADDR_W
likewise. Other macros (RP_DOPPLER_BIN_WIDTH=6, RP_DETECT_CLASS_WIDTH=2,
RP_DEF_CFAR_ALPHA_SOFT=0x18, RP_NUM_SUBFRAMES=3) were already in place
from PR-A.
- radar_system_top: rx_doppler_bin / dbg_doppler_bin widened. Adds
host_cfar_alpha_soft register (default RP_DEF_CFAR_ALPHA_SOFT). USB
opcode mapping deferred to PR-G.
- radar_system_top_50t: dbg_doppler_bin_nc width.
- radar_receiver_final: doppler_bin port width.
Test summary:
- tb_chirp_controller_v2: 43/43 PASS
- tb_chirp_contract: 10/10 PASS
- tb_cfar_ca: 24/0 PASS
- tb_mti_canceller: 43/43 PASS
- tb_rxb_fullchain: peak 24033 ~80x (parity with PR-D/E)
- tb_doppler_realdata: 2056/2056 PASS (had been broken pre-PR-F due
to missing RANGE_BINS=64 override; this PR fixes
the parameter override along with the widening)
- tb_system_e2e: 33/49 PASS — identical to PR-E baseline; the
one new fail vs PR-D (G2.2) carries over.
- tb_radar_receiver_final: still finishing in background (~10 min).
Replaces plfm_chirp_controller_enhanced (5-state FSM with hardcoded
LONG/SHORT timings + 60-entry inline short LUT) with plfm_chirp_controller_v2,
a pure DAC playback driver: IDLE -> CHIRP -> IDLE keyed off a 1-cycle
dst_chirp_valid pulse, with sample count selected by dst_wave_sel
(SHORT=120 / MEDIUM=600 / LONG=3600). Inter-chirp timing (LISTEN, GUARD,
frame boundaries) is now owned exclusively by chirp_scheduler.
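The IDLE -> CHIRP -> IDLE contract can be modeled behaviourally. This is a
sketch, not plfm_chirp_controller_v2.v: wave_sel is represented by names
rather than 2-bit codes, and playback() is a hypothetical helper. The LUT
lengths are the ones quoted above; at fs_dac = 120 MHz they correspond to
1 / 5 / 30 µs.

```python
# Behavioural sketch of a pulse-triggered DAC playback driver.

WAVE_SAMPLES = {"SHORT": 120, "MEDIUM": 600, "LONG": 3600}


def playback(events):
    """events: per-cycle (chirp_valid, wave_sel) tuples.
    Returns the number of DAC samples emitted for each chirp."""
    state, remaining, emitted, lengths = "IDLE", 0, 0, []
    for chirp_valid, wave_sel in events:
        if state == "IDLE":
            if chirp_valid:
                state, remaining, emitted = "CHIRP", WAVE_SAMPLES[wave_sel], 0
        else:  # CHIRP: one LUT sample per clock; new pulses are ignored
            emitted += 1
            remaining -= 1
            if remaining == 0:
                lengths.append(emitted)
                state = "IDLE"
    return lengths


events = ([(True, "SHORT")] + [(False, "SHORT")] * 200
          + [(True, "LONG")] + [(False, "LONG")] * 4000)
print(playback(events))  # [120, 3600]
```

Because the driver only counts samples, all LISTEN / GUARD spacing has to
come from when the scheduler raises the next pulse.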
Scheduler -> TX bridge: cdc_async_fifo (Cummings style #2, WIDTH=2 DEPTH=4)
crosses {wave_sel} from clk_100m to clk_120m_dac, with chirp_pulse as
src_valid. frame_pulse rides a separate toggle CDC for chirp_counter
clear and the new_chirp_frame status output. mixers_enable now also gates
the scheduler so it stays in S_IDLE while the radar is "off" — without
this gate the first chirp_pulse fires at reset and gets dropped before
mixers come up.
Files:
- NEW plfm_chirp_controller_v2.v: DAC playback driver (3 LUTs, FSM)
- DEL plfm_chirp_controller.v: legacy controller (382 lines)
- DEL long_chirp_lut.mem: legacy LUT (3600 lines), replaced by
  tx_long_lut.mem from PR-B
- chirp_scheduler.v: adds mixers_enable input (master quiesce)
- radar_receiver_final.v: adds sched_*_out output ports and
  mixers_enable_100m
- radar_system_top.v: wires sched_*_out -> tx_inst.sched_*; passes
  stm32_mixers_enable_100m to rx_inst
- radar_transmitter.v: full rewrite; drops the new_chirp edge detector and
  toggle CDC, instantiates cdc_async_fifo for {wave_sel} and a toggle CDC
  for frame_pulse, uses plfm_chirp_controller_v2 in place of _enhanced
- tb/tb_chirp_controller.v, tb/tb_chirp_contract.v: rewritten for the v2
  contract (43/43 unit + 10/10 contract green)
- tb/tb_radar_receiver_final.v: adds .mixers_enable_100m(1'b1) pin
- run_regression.sh, scripts/200t/build_200t.tcl: file lists bumped
Test summary:
- tb_chirp_controller_v2: 43/43 PASS
- tb_chirp_contract: 10/10 contracts upheld
- tb_rxb_fullchain: peak 24033 ~80x (parity with PR-D)
- tb_mti_canceller: 43/43 PASS
- tb_system_e2e: 33/49 PASS (one new failure vs the 34/49 PR-D
  baseline: G2.2 new_chirp_frame. This is the intentional v2
  frame-pulse semantics, which fire once per Doppler frame instead
  of once per STM32 chirp toggle; the TB wait must be widened in
  PR-H to cover the full frame.)
Single 100 MHz scheduler emits wave_sel[1:0] and chirp_pulse natively. Modes
00 (STM32 pass-through), 01 (auto-scan over SHORT/MEDIUM/LONG sub-frames),
10 (single-chirp debug), 11 (track dwell with watchdog scan-fallback after
RP_DEF_TRACK_WATCHDOG_FRAMES=5 idle frames). Sub-frame mask lets ops drop a
waveform without recompiling.
Drops the receiver_final wave_sel shim added in PR-C: wave_sel comes
straight from the scheduler; chirp_pulse replaces the old mc_new_chirp
toggle + XOR edge converter. matched_filter_multi_segment and mti_canceller
take wave_sel[1:0] and chirp_pulse directly — no parallel paths.
multi_segment also bumped: SHORT_CHIRP_SAMPLES 50 -> 100 (V2 1 us SHORT)
and MEDIUM_CHIRP_SAMPLES = 500 (5 us). LONG path unchanged. Dead
mc_new_elevation/azimuth XOR converters removed.
Deletes radar_mode_controller.v, formal/fv_radar_mode_controller.v, and
tb/tb_radar_mode_controller.v. Build manifests (run_regression.sh,
scripts/200t/build_200t.tcl) updated. Receiver_final pins medium/track/
subframe_enable inputs to RP_DEF_* defaults until PR-G plumbs USB opcodes.
Verification:
- tb_rxb_fullchain_latency: peak |I|+|Q|=24033 at bin 0, ~80x peak/mean
(up from PR-C's 15115 since matched filter now uses full 100 SHORT samples)
- tb_mti_canceller: 43/43 PASS with new wave_sel[1:0] input
- tb_radar_receiver_final: 8/8 PASS, ALL TESTS PASSED
- tb_system_e2e: 34/49 PASS - identical to pre-PR-D baseline (15 failures
are pre-existing matched-filter cycle-budget skips); G8.2/G8.3 chirp_scheduler
probes PASS
- tb_multiseg_cosim: 16/32 - same as pre-PR-D baseline
Drop the chirp-v1 1-bit use_long_chirp memory loader and its 6 .mem files;
introduce chirp_reference_rom — wave_sel-native, single 8192x16 BRAM array
per Q15 lane, 4-region init (SHORT, MEDIUM, LONG seg0/seg1) loaded from the
PR-B mem files. Same 1-clk read latency as the legacy module so the RX-B
autocorrelation alignment fix carries through unchanged.
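With four 2048-entry regions, the 8192-deep ROM address is simply
{region[1:0], sample[10:0]}. The sketch below assumes the regions are laid
out in the order listed above (SHORT, MEDIUM, LONG seg0, LONG seg1); the
helper name is hypothetical and the actual chirp_reference_rom.v mapping may
differ.

```python
# Sketch of a 4-region, 8192 x 16 reference-ROM address map.

REGION_DEPTH = 2048  # matches the 2048-line PR-B RX .mem files
REGIONS = ("SHORT", "MEDIUM", "LONG_SEG0", "LONG_SEG1")


def rom_addr(region, sample):
    assert 0 <= sample < REGION_DEPTH
    # {region[1:0], sample[10:0]} == region_index * 2048 + sample
    return REGIONS.index(region) * REGION_DEPTH + sample


print(rom_addr("SHORT", 0), rom_addr("MEDIUM", 0), rom_addr("LONG_SEG1", 2047))
# 0 2048 8191
```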
Receiver-side wave_sel shim added in radar_receiver_final.v:
wire [1:0] wave_sel = use_long_chirp ? RP_WAVE_LONG : RP_WAVE_SHORT;
This is a 1-line transitional bridge while radar_mode_controller still
emits 1-bit use_long_chirp; PR-D deletes the shim and wires chirp_scheduler
straight through. MEDIUM is loaded into the ROM but unreachable through
the production path until PR-D.
BRAM cost: 8 RAMB18 (was 6 in chirp-v1). +2 BRAM is the cost of adding
MEDIUM to the waveform set; not avoidable.
Files added:
- chirp_reference_rom.v
Files removed:
- chirp_memory_loader_param.v
- long_chirp_seg{0,1}_{i,q}.mem (4 files)
- short_chirp_{i,q}.mem (2 files)
- tb/cosim/validate_mem_files.py (legacy file-set validator; replaced by
gen_chirp_mem.py's internal verify_phase_match)
- tb/cosim/analyze_short_chirp_mismatch.py (one-shot tool from the
chirp-v1 TX-I investigation; finding incorporated, references the
deleted short_chirp_*.mem files)
Files updated for module rename:
- radar_receiver_final.v — instance, comments, wave_sel shim
- radar_mode_controller.v — header comment
- matched_filter_processing_chain.v — header comment
- scripts/200t/build_200t.tcl — explicit RTL list
- run_regression.sh — 5 spots
- tb/tb_rxb_fullchain_latency.v — instance, wave_sel shim, mem filenames,
SHORT_LEN 50 → 100 (1 µs at 100 MHz)
- tb/tb_system_e2e.v — header comment
Verification:
- chirp_reference_rom standalone iverilog compile: clean
- Full receiver chain compile (21 RTL files): clean
- tb_rxb_fullchain_latency runs end-to-end with new ROM + new mem files
+ 100-sample SHORT chirp; autocorrelation peak at bin 0, peak |I|+|Q|
= 15115. Confirms 1-clk ROM read latency is preserved and the RX-B
direct-wire-with-1-FF alignment still holds.
- 50T build script (scripts/50t/build_50t.tcl) uses glob *.v — no edit
needed; it picks up the new file automatically.
Rewrite gen_chirp_mem.py to emit the SHORT (1 µs), MEDIUM (5 µs), and LONG
(30 µs) waveform set on both TX and RX paths. The script is now the single
source for every chirp .mem file; the legacy 6-file set on disk
(long_chirp_lut.mem, long_chirp_seg{0,1}_{i,q}.mem, short_chirp_{i,q}.mem)
is no longer regenerated and gets deleted in PR-C/PR-E when its consumer
modules are removed.
Generated artifacts (committed):
TX (8-bit unsigned offset-binary, fs_dac = 120 MHz):
tx_short_lut.mem 120 lines
tx_medium_lut.mem 600 lines
tx_long_lut.mem 3600 lines
RX (Q15 I/Q hex, fs_sys = 100 MHz, all 2048 lines for uniform BRAM sizing):
rx_short_i.mem / rx_short_q.mem 100 active + 1948 zero-pad
rx_medium_i.mem / rx_medium_q.mem 500 active + 1548 zero-pad
rx_long_seg0_i.mem / rx_long_seg0_q.mem 2048 (samples [0..2047])
rx_long_seg1_i.mem / rx_long_seg1_q.mem 952 active + 1096 zero-pad
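The line counts above follow directly from the durations and sample rates.
The sketch below is a consistency check written for illustration, not
gen_chirp_mem.py itself; it uses integer µs × MHz arithmetic so no rounding
is involved.

```python
# Reproduce the .mem line counts from durations and sample rates.

FS_DAC_MHZ = 120  # TX LUT rate
FS_SYS_MHZ = 100  # RX reference rate
RX_DEPTH = 2048   # every RX file is padded to this many lines
DURATIONS_US = {"short": 1, "medium": 5, "long": 30}

tx_lines = {k: us * FS_DAC_MHZ for k, us in DURATIONS_US.items()}
rx_active = {k: us * FS_SYS_MHZ for k, us in DURATIONS_US.items()}

# LONG (3000 samples) spills across two 2048-line segments.
long_seg0 = min(rx_active["long"], RX_DEPTH)
long_seg1 = rx_active["long"] - long_seg0
pads = {
    "short": RX_DEPTH - rx_active["short"],
    "medium": RX_DEPTH - rx_active["medium"],
    "long_seg1": RX_DEPTH - long_seg1,
}

print(tx_lines)  # {'short': 120, 'medium': 600, 'long': 3600}
print(long_seg0, long_seg1, pads)
# 2048 952 {'short': 1948, 'medium': 1548, 'long_seg1': 1096}
```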
Phase model unchanged from chirp-v1: phi(n) = 2π·F_BASEBAND_LOW·t +
π·(BW/T)·t², with t = n/fs, T the chirp duration, F_BASEBAND_LOW = 10 MHz
and BW = 20 MHz. The same formula now runs three durations and two sample
rates from one helper.
rx_long_seg0_i.mem is bit-exact to the legacy long_chirp_seg0_i.mem on disk
(diff -q reports identical) — proves the SHORT/MEDIUM additions did not
perturb the LONG path.
Verification:
- all 11 files have correct line counts (above)
- script is idempotent (re-run produces byte-identical output)
- ruff clean (one E501 line-length + two RUF046 redundant-int casts fixed)
- phase regression at long-seg0 against pre-chirp-v2 reference: bit-exact
No RTL or testbench changes. The legacy .mem files remain on disk for the
existing chirp_memory_loader_param.v / plfm_chirp_controller.v consumers
until PR-C and PR-E delete those modules. No module references the new
files yet.
Establishes the macro vocabulary for the SHORT/MEDIUM/LONG waveform ladder,
3-subframe Doppler layout, track-mode dwell, and 2-class CFAR detection.
PR-A is purely additive — no module references the new macros yet.
Subsequent PRs (B–H) progressively replace the old chirp logic; this one
puts the names in place so each follow-on PR is mechanical.
Added:
- Waveform identity: RP_WAVE_{SHORT,MEDIUM,LONG,RESERVED} (2-bit selector)
- Sub-frame layout: RP_NUM_SUBFRAMES=3, RP_DOPPLER_BIN_WIDTH=6,
RP_SUBFRAME_ID_WIDTH=2
- Track mode: RP_DOPPLER_FFT_SIZE_TRACK=64, RP_MODE_TRACK=2'b11,
RP_DEF_TRACK_CHIRP_COUNT=64, RP_DEF_TRACK_WATCHDOG_FRAMES=5
- Detection class: RP_DETECT_{NONE,CANDIDATE,CONFIRMED,RSVD}
- 3-ladder timing defaults (V2 suffix to coexist with legacy in this PR):
SHORT 100 cyc (1 µs), MEDIUM 500 cyc (5 µs), LONG 3000 cyc (30 µs)
- Soft-CFAR alpha default: RP_DEF_CFAR_ALPHA_SOFT=0x18 (1.5 Q4.4,
Pfa_soft ≈ 10⁻⁵; confirm Pfa ≈ 10⁻⁶ at α=3.0)
- host_subframe_enable default: RP_DEF_SUBFRAME_ENABLE=3'b111
Marked LEGACY (deleted in the noted PR):
- RP_CHIRPS_PER_FRAME=32, RP_NUM_DOPPLER_BINS=32 (PR-F)
- RP_DEF_SHORT_CHIRP_CYCLES=50 (PR-E switches to 100)
- RP_DEF_CHIRPS_PER_ELEV=32 (PR-F)
Verified: iverilog preprocess clean. Sweep across 9_2_FPGA confirms no
module references the new macros yet — the PR is fully isolated.
Revert tag pre-chirp-v2 placed at 4f898ae for the chirp-v2 series.
The header still described the legacy in-house Radix-2 DIT fft_engine and a
FWD/REF/INV BITREV+BUTTERFLY state list that no longer matches reality.
Since RX-NEW-3 (commit 5c8cc8c), the chain instantiates fft_engine_axi_bridge,
which wraps xfft_2048 — LogiCORE FFT v9.1 (Pipelined Streaming) in synth/XSim
when FFT_USE_XILINX_IP is defined, in-house fft_engine fallback in iverilog.
Bit-reversal is now handled inside the IP (and the fallback), so the FSM has
COLLECT → SIG_FFT/CAP → REF_FFT/CAP → MULTIPLY → INV_FFT/CAP → OUTPUT → DONE.
No RTL changes. Header comment updates only.
cdc_adc_to_processing carries multi-bit data across 400→100 MHz via
TWO independent synchronizer chains (data Gray-encoded + a separate
2-bit toggle). Under metastability, the chains can resolve on
different cycles, letting the destination latch a half-resolved Gray
word that decodes to an arbitrary value. Audit C-11. Practical MTBF
is years per event but the design is non-conformant for arbitrary
multi-bit data — Gray code's single-bit-flip protection only holds
for ±1 transitions, not for CIC samples that can change by hundreds
of LSBs.
Replace with cdc_async_fifo, a Cummings SNUG-2002 style #2 async
FIFO. Data does NOT cross domains; it sits in dual-clock distRAM
(write port src_clk, read port dst_clk). Only the read/write
Gray-coded POINTERS cross — and pointers genuinely change ±1 per
increment, so Gray code's protection is correct by construction.
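The pointer argument can be checked exhaustively for a small FIFO. The
sketch below uses DEPTH = 4 (the size of the scheduler -> TX bridge
instance) and the usual style #2 full/empty comparisons; the function names
are illustrative and are not taken from cdc_async_fifo.v.

```python
# Gray-pointer arithmetic in the style of a Cummings #2 async FIFO.

ADDR_W = 2          # DEPTH = 4
PTR_W = ADDR_W + 1  # extra wrap bit distinguishes full from empty
DEPTH = 1 << ADDR_W


def bin_to_gray(b):
    return b ^ (b >> 1)


def gray_to_bin(g):
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


def is_empty(rgray, wgray_sync):
    return rgray == wgray_sync


def is_full(wgray, rgray_sync):
    # Write pointer exactly DEPTH ahead <=> Gray values equal except
    # for the two MSBs.
    return wgray == rgray_sync ^ (0b11 << (PTR_W - 2))


# Every +1 step (including the wrap) flips exactly one Gray bit.
steps = [bin(bin_to_gray(p) ^ bin_to_gray((p + 1) % (1 << PTR_W))).count("1")
         for p in range(1 << PTR_W)]
print(steps)  # [1, 1, 1, 1, 1, 1, 1, 1]

# A data-like jump of hundreds of LSBs flips many Gray bits at once.
print(bin(bin_to_gray(5) ^ bin_to_gray(300)).count("1"))  # 7
```

The second print is the C-11 failure mode in miniature: Gray encoding only
protects values that move by ±1, which pointers do and CIC samples do not.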
Home-grown rather than XPM_FIFO_ASYNC: vendor-neutral (iverilog can
simulate it directly, no SIM stub), keeps the project's existing
home-grown CDC convention (3 sibling primitives in cdc_modules.v),
and avoids XPM library version skew.
Port shape is preserved (same WIDTH=18, same dst_data/dst_valid/
overrun semantics — 1-cycle pulse per read in steady state) so the
swap is local to two instantiations in ddc_400m.v. Sticky-overrun
aggregation downstream is unchanged.
XDC: project already has blanket set_false_path on
clk_100m ↔ adc_dco_p, which covers both new pointer crossings.
Synchronizer FFs carry ASYNC_REG="TRUE" for placement-aware MTBF.
No XDC change needed.
New TB tb_cdc_async_fifo.v exercises 7 groups (28 checks): reset,
single-sample passthrough, multi-Gray-bit-flip (0x00000 ↔ 0x3FFFF —
audit's recommended coverage point, asserts NO intermediate values
appear at dst_data), matched-rate continuous stream, sustained-burst
overrun, drain-to-empty, and mid-stream reset.
Resource: 8 LUTRAMs per instance × 2 instances = 16 LUTRAMs (~0.05%
of XC7A50T budget).
Verified: full FPGA regression 42/42 PASS (was 41/41; +1 new test,
0 regressions in DDC Chain / Doppler Co-Sim / Full-Chain Real-Data
/ Receiver Integration / System Top / System E2E / MF Co-Sim — all
of which exercise the swap path through the production signal
chain). 0 lint errors.
The coefficient ROM has a deliberate positive DC pre-emphasis. Sum of
32 signed coefficients = 231,944; with the output slice at
accumulator[34:17] (effective Q17), DC gain = 231944 / 2^17 = 1.7696
= +4.96 dB. Bit-exact against the in-header golden-model line
(DC=5000 → 8847).
The +4.96 dB pre-emphasis compensates the upstream 4-stage CIC's
~3-4 dB passband droop. Without this note in the header, a future
engineer rebuilding the filter from a clean FIR design tool would
silently lose the pre-emphasis; AGC/saturation budgets in downstream
stages must also account for the +4.96 dB rather than assume 0 dB.
Audit's original "+7 dB" estimate was directionally correct but
quantitatively wrong (no Q-format reconciles to +7 dB; Q15 → +17 dB,
Q16 → +11 dB, Q17 → +4.96 dB). Documented at the verified +4.96 dB.
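The DC-gain arithmetic above can be re-run directly. The sketch models the
accumulator[34:17] slice as a floor shift right by 17, an assumption about
the RTL's rounding that reproduces the 8847 golden value.

```python
import math

COEFF_SUM = 231_944  # sum of the 32 signed FIR coefficients


def dc_out(x, frac_bits=17):
    # accumulator[34:17] slice modeled as an arithmetic shift right by 17
    return (x * COEFF_SUM) >> frac_bits


for q in (15, 16, 17):
    gain = COEFF_SUM / 2**q
    print(f"Q{q}: gain {gain:.4f} = {20 * math.log10(gain):+.2f} dB")
# roughly +17.0, +11.0 and +4.96 dB

print(dc_out(5000))  # 8847, matching the in-header golden-model line
```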
No coefficient or RTL change. Verified: full FPGA regression
41/41 PASS, 0 lint errors (FIR Lowpass: 13 checks PASS).
`output_bin_count` is declared `reg [RP_RANGE_BIN_WIDTH_MAX-1:0]`
(9 bits on 50T, 12 bits on 200T), but the reset and ST_IDLE assignments
used the literal `9'd0`. Vivado zero-extends with a width-mismatch
warning on 200T. The FORMAL port `fv_output_bin_count` was also
hardcoded `[8:0]`.
Replace all three sites with `{RP_RANGE_BIN_WIDTH_MAX{1'b0}}` /
parameterized port width — same pattern already used for the
`range_bin_index` reset in this module.
No functional change. Verified by full FPGA regression: 41/41 PASS,
0 lint errors (Range Bin Decimator: 63 checks PASS).
cfar_ca.v's GO/SO modes correctly cross-multiply to pick the side with
the greater (GO) or lesser (SO) per-cell average, but return that
side's RAW SUM as the noise estimate -- not the average. Combined with
alpha being pre-baked for the interior training-cell count, this means
at edges where the picked side is truncated, effective Pfa shifts by
the count ratio (up to ~2x in the first/last r_train bins). CA mode's
edge behavior was already documented; GO/SO's was not.
Documentation only -- no RTL behavior change. The audit's preferred
fix (divide noise_sum by selected_count) is explicitly NOT applied:
per-CUT integer divide is expensive in 50T fabric and the affected
bins are platform clutter (0..60 m) or noise floor (3012..3072 m)
where edge errors are masked by other effects. Operators tuning Pfa
have three documented options: (a) accept the asymmetry, (b) host-side
skip GO/SO outside r_train..NRANGE-r_train and fall back to CA there,
(c) hand-tune alpha per-mode based on observed Pfa drift.
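The documented GO/SO behavior can be sketched in a few lines. This is a
model of the description above, not cfar_ca.v: go_so_noise() is a
hypothetical helper, ties are broken arbitrarily, and the sums/counts are
made-up numbers chosen to show an edge CUT with only 8 cells on one side.

```python
def go_so_noise(sum_left, n_left, sum_right, n_right, mode):
    """Select a side by per-cell AVERAGE (cross-multiplied, no divider)
    but return that side's raw SUM. Assumes both counts are > 0."""
    left_avg_greater = sum_left * n_right > sum_right * n_left
    if mode == "GO":
        return sum_left if left_avg_greater else sum_right
    if mode == "SO":
        return sum_right if left_avg_greater else sum_left
    raise ValueError(mode)


R_TRAIN = 16
# Interior CUT: both sides full (16 cells), left averages 120, right 100.
interior = go_so_noise(120 * R_TRAIN, R_TRAIN, 100 * R_TRAIN, R_TRAIN, "GO")
# Edge CUT: left side truncated to 8 cells with the same 120 average.
edge = go_so_noise(120 * 8, 8, 100 * R_TRAIN, R_TRAIN, "GO")
print(interior, edge)  # 1920 960
```

GO picks the 120-average side both times, but the returned sum halves at
the edge, so alpha × noise halves too: the ~2x Pfa shift described above.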
Changes:
- cfar_ca.v header "CFAR Modes" table: GO/SO now explicitly note that
selection is by average but return value is raw sum.
- cfar_ca.v header "Edge handling": new GO/SO caveat paragraph.
- cfar_ca.v ST_CFAR_THR mode 2'b01/2'b10 selectors: inline AUDIT-C7
comment pointing to header.
Verification: full regression 41/41 PASS, 0 lint regressions.
AUDIT-S10 (commit `58154a6`) split the FPGA's six-flag aggregate
gpio_dig5 into two MCU-visible bits: gpio_dig5 keeps signal-saturation
(AGC reacts), gpio_dig7 (PD15) carries control-fault classes
(range_decim_watchdog | cic_fir_overrun). Until now the MCU did NOT
poll PD15, so DSP control faults were invisible to the recovery
dispatcher.
Changes:
- New `ERROR_FPGA_DSP_STALL` enum value placed AFTER ERROR_WATCHDOG_TIMEOUT
so the dispatcher routes to attemptErrorRecovery (FPGA reset pulse) not
Emergency_Stop. Updated error_strings[] in lockstep (static_assert
enforces).
- checkSystemHealth section 10 polls PD15 at 1 Hz with 2-sample debounce.
`last_dsp_check` is committed BEFORE the early return per AUDIT-CAL
pattern, so a flapping fault never bypasses the rate-limit. Streak
counter resets to 0 after firing (armed for next post-recovery
assertion) AND resets naturally when PD15 returns LOW.
- attemptErrorRecovery: ERROR_FPGA_DSP_STALL fans into the existing
ERROR_FPGA_COMM PD12 reset case (stacked case labels, same body). No
MCU-driven reset_monitors path exists; full bitstream reload clears
all sticky monitors as a side effect.
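The section-10 poll logic can be modeled on the host side. This is a
Python sketch of the behavior described above, not the main.cpp code: the
class and method names are hypothetical, and timestamps are treated as
uint32 HAL_GetTick() values.

```python
POLL_PERIOD_MS = 1000  # 1 Hz
DEBOUNCE = 2           # consecutive HIGH samples required


class DspStallPoller:
    """Hypothetical model of the PD15 poll with rate limit and debounce."""

    def __init__(self):
        self.last_dsp_check = 0
        self.streak = 0

    def poll(self, now_ms, pd15_high):
        """Return True when ERROR_FPGA_DSP_STALL would be raised."""
        # Unsigned 32-bit subtraction stays correct across a tick wrap.
        if ((now_ms - self.last_dsp_check) & 0xFFFFFFFF) < POLL_PERIOD_MS:
            return False
        self.last_dsp_check = now_ms  # committed before any early return
        if not pd15_high:
            self.streak = 0           # fault cleared: streak resets
            return False
        self.streak += 1
        if self.streak < DEBOUNCE:
            return False
        self.streak = 0               # re-arm for the next assertion
        return True
```

In this model a single-sample glitch never fires, and a fault held HIGH
fires at most once every two polls.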
Tests:
- tests/test_audit_s10_dsp_stall_polling.c (NEW, 7 scenarios, 7/7 PASS):
T1 healthy 60s, T2 single-sample glitch blocked by debounce, T3
sustained fault fires once, T4 post-fire rate-limit holds within
window, T5 sustained fault rate bounded (29 errors / 60s -- MCU-N1
latch at error_count>10 fires in ~22s, gives operator time to
intervene), T6 counter-test demos no-debounce false-positive on
glitch, T7 HAL_GetTick 32-bit wrap.
- MCU host suite 35/35 PASS (was 34/34; +1 new, 0 regressions).
Pre-fix Tests 1/2/4 in fpga_self_test.v gave false PASS even on broken
silicon:
S-19 Test 1 (CIC): `result_flags[1] <= 1'b1` unconditional, comment
admitted "always true for simple check".
S-20 Test 2 (FFT): `(16'sd100+16'sd100 == 16'sd200) && (...)` —
both predicates compile-time-fold to 1; synth reduces to a
constant write.
S-21 Test 4 (ADC): PASS once N samples land, regardless of value.
A stuck-at-0 / stuck-at-MAX / dead LVDS link still PASSed
provided adc_valid_in toggled.
Fixes:
Test 1: drive impulse {5,0,0,0,0,0,0} through registered integrator
y[n]=y[n-1]+x[n]; require accumulator==5 after step
response. Real adder + register path; sign-extension
exercised. Detail = 0xC1 on fail.
Test 2: real radix-2 butterfly with twiddle multiply across 4 FSM
states. A=8, B=4 (real), W=2+3j -> WB=(8,12), A'=(16,12),
B'=(0,-12). Forces synth to instantiate signed multiplier
(DSP slice) + 17-bit signed add/sub. Detail = 0xF2 on fail.
Test 4: track min/max across 256-sample capture, require
(max - min) > ADC_RANGE_THRESHOLD (10 LSB). Catches stuck-at
faults. Does NOT distinguish AD9484 format mismatches
(audit's per-mode mean check requires SPI, impossible per
AUDIT-C13). Detail = 0xAD on fail.
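The expected values for the three fixed tests can be cross-checked with a
small reference model. This is a sketch of the arithmetic only: it does
not model the FSM, register timing, or 17-bit widths, and the function
names are hypothetical.

```python
def cic_impulse_check(impulse=(5, 0, 0, 0, 0, 0, 0), expected=5):
    acc = 0  # integrator register
    for x in impulse:
        acc = acc + x  # y[n] = y[n-1] + x[n]
    return acc == expected  # Test 1 (detail 0xC1 on fail)


def butterfly(a, b, w):
    """Radix-2 butterfly on (re, im) integer pairs: A' = A + WB, B' = A - WB."""
    wb = (w[0] * b[0] - w[1] * b[1], w[0] * b[1] + w[1] * b[0])
    return (a[0] + wb[0], a[1] + wb[1]), (a[0] - wb[0], a[1] - wb[1])


def adc_range_check(samples, threshold=10):
    return (max(samples) - min(samples)) > threshold  # Test 4 (0xAD on fail)


print(butterfly((8, 0), (4, 0), (2, 3)))  # ((16, 12), (0, -12))
```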
Tests:
- tb_fpga_self_test.v existing Group 1-4 (16 PASS) still pass: varied
ADC counter input gives range >> 10.
- New Group 5: drive constant 0 -> expect Test 4 FAIL + detail=0xAD.
- New Group 6: drive constant 0x7FFF -> expect Test 4 FAIL + detail=0xAD.
- Regression: 41/41 PASS; fpga_self_test 22/22 (was 16/16).
Pre-fix usb_data_interface.v hardcoded `localparam [14:0] NUM_CELLS =
15'd16384` for the 50T 512-range x 32-doppler layout. On 200T builds
with SUPPORT_LONG_RANGE defined, RP_MAX_OUTPUT_BINS=4096 makes a real
frame 131072 cells, so the fixed value caused two distinct defects:
(a) value: counter wrapped 8x per real frame; bit-7 frame-start
marker fired 8x at incorrect host-frame offsets, silently
desyncing the GUI parser
(b) width: 15 bits cannot index 131072 cells (a 0..131071 counter
needs 17 bits)
Fix: derive NUM_CELLS = RP_MAX_OUTPUT_BINS * RP_NUM_DOPPLER_BINS and
counter width = RP_DOPPLER_MEM_ADDR_W (14 on 50T, 17 on 200T) from
radar_params.vh, so both scale together with the build define.
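Both defects fall out of a simple counter model. The sketch assumes the
frame-start marker fires whenever the counter is 0; cell_counter_markers()
is a hypothetical helper, not the RTL.

```python
def cell_counter_markers(num_cells, ticks):
    """Tick a cell counter that wraps at num_cells - 1; return the tick
    indices at which a frame-start marker (counter == 0) fires."""
    ctr, markers = 0, []
    for tick in range(ticks):
        if ctr == 0:
            markers.append(tick)
        ctr = 0 if ctr == num_cells - 1 else ctr + 1
    return markers


REAL_FRAME_200T = 4096 * 32  # RP_MAX_OUTPUT_BINS * RP_NUM_DOPPLER_BINS
# Pre-fix: hardcoded 16384 against a 131072-cell frame.
print(len(cell_counter_markers(512 * 32, REAL_FRAME_200T)))  # 8
# Post-fix: derived NUM_CELLS, two frames -> two markers.
print(cell_counter_markers(REAL_FRAME_200T, 2 * REAL_FRAME_200T))  # [0, 131072]
print((REAL_FRAME_200T - 1).bit_length())  # 17-bit counter
```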
Tests:
- tb_audit_c16_num_cells.v: standalone counter-block exerciser (T1
reset, T2 increment, T3 wrap at NUM_CELLS-1, T4 exactly 2 markers
across 2*NUM_CELLS ticks, T5 top-bit observability) -- 6/6 PASS at
both 50T (NUM_CELLS=16384, CTR_W=14) and 200T (131072, 17).
- tb_usb_data_interface.v: existing test 7-8 retargeted from the old
hardcoded `>=15` / `==15'd16384` invariant to the new parameterized
one (`==RP_DOPPLER_MEM_ADDR_W` / `==RP_MAX_OUTPUT_BINS*RP_NUM_DOPPLER_BINS`).
Regression: 41/41 PASS (+2 new entries: 50T default + 200T
`+define+SUPPORT_LONG_RANGE`).
checkSystemHealth() had three watchdog blocks with the identical
"last_X_check not updated on error path" bug — same root cause as
AUDIT-CAL (BMP180 fix in commit 95aed35), distinct sites:
AD9523 clock check (5 s) main.cpp:693-705
ADAR1000 comm check (2 s) main.cpp:729-749
IMU comm check (10 s) main.cpp:752-760
Pre-fix, each block placed `last_X_check = HAL_GetTick();` below the
early-return path, so once the underlying check (STATUS0/1 RESET,
SCRATCHPAD verify fail, GY85_Update false) started failing, the
rate-limit window never engaged. Every subsequent iteration of the
main while(1) loop re-fired the corresponding ERROR_*. With
error_count > 10 latching system_emergency_state per MCU-N1, the
radar would trip into SAFE-MODE within ~10 main-loop iterations of
the first transient — far short of the intended ~100-150 s grace
window meant for operator intervention or attemptErrorRecovery
to succeed. ADAR1000 comm-failure also re-ran the 16 ms blocking
SPI verify (4 devices × 4 ms HAL_Delay) per iteration → chirp jitter.
Fix at all three sites: move the timestamp update INTO the if-block
and BEFORE any sub-check call. Mirrors the AUDIT-CAL post-fix
BMP180 block at main.cpp:771-780. ADAR1000 overtemp check stays
per-loop (unchanged) — over-temperature must remain responsive.
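The difference between the two timestamp placements shows up clearly in a
host-side model. This Python sketch mirrors the control flow described
above, not the main.cpp source; run_loop() is hypothetical and the 10 ms
main-loop period is an assumption, so only the ratio is meaningful.

```python
CHECK_PERIOD_MS = 10_000  # IMU check (10 s)


def run_loop(ticks_ms, check_ok, fixed):
    """Count errors raised by one rate-limited health-check block.
    fixed=False: timestamp updated only on the success path (pre-fix).
    fixed=True: timestamp committed before the sub-check (post-fix)."""
    last_check, errors = 0, 0
    for now in ticks_ms:
        if now - last_check < CHECK_PERIOD_MS:
            continue
        if fixed:
            last_check = now  # rate limit engages even on failure
        if not check_ok(now):
            errors += 1       # ERROR_* raised, early return
            continue
        if not fixed:
            last_check = now  # pre-fix: reached only on success
    return errors


ticks = range(0, 60_000, 10)  # 60 s of a hypothetical 10 ms main loop
always_fail = lambda now: False
print(run_loop(ticks, always_fail, fixed=False),
      run_loop(ticks, always_fail, fixed=True))  # 5000 5
```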
Test: tests/test_audit_imu_watchdog_cadence.c (6 tests, 6/6 PASS)
exercises the post-fix predicate against simulated HAL_GetTick()
ticks and a controllable GY85_Update() mock; counter-test runs the
pre-fix predicate to demonstrate the regression. Test uses IMU as
representative; AD9523 (5 s) and ADAR1000 (2 s) sites have identical
control flow.
Verification: full MCU host suite 34/34 PASS (was 33/33; +1 new test,
0 regressions).
Two stale-baseline events were never captured in earlier commits:
1. The FFT-1024 -> FFT-2048 merge (c668652) updated the testbench and
gen_mf_cosim_golden.py but left radar_scene.py FFT_SIZE=1024. When
FFT_SIZE was later bumped to 2048, the input vectors written by
generate_baseband_samples (bb_mf_test_*.hex, ref_chirp_*.hex) grew
from 1024 to 2048 samples but were never re-exported.
2. The TX-I matched-filter realignment (5ff5671) changed the ADC chirp
phase from 2*pi*F_IF*t to 2*pi*(F_IF+F_BASEBAND_LOW)*t. ADC sample
values shifted from sample ~1336 onward but adc_*.hex was never
re-exported.
Result: every regression run produced a "dirty" working tree as the
regen reproduced post-merge values that disagreed with the committed
baselines. Two consecutive regen runs are bit-exact identical
(LCG seed=42 + deterministic chirp math) — verified via diff -q on
two output dirs. There is no actual non-determinism; only stale
artifacts.
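A determinism check equivalent to diff -q on two output directories can be
scripted as below. This is an illustrative sketch using only the standard
library; the function names are hypothetical and not part of the
regression scripts.

```python
import hashlib
from pathlib import Path


def tree_digest(root):
    """Map each file path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def regen_is_deterministic(run_a, run_b):
    """True when two regen output directories are byte-identical."""
    return tree_digest(run_a) == tree_digest(run_b)
```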
This commit refreshes all 15 affected files in one shot:
- 6 input hex (adc_*_target.hex, bb_mf_test_*.hex, ref_chirp_*.hex)
- 5 RTL output csv (rtl_*.csv from current RTL)
- 4 compare csv (compare_mf_*.csv = py vs rtl side-by-side)
Verification: full regression 39/39 PASS on the refreshed inputs.
After this commit, regression runs should leave the working tree clean.
gpio_dig5 (PD13) previously OR'd six flags — four signal-saturation
classes (AGC, DDC overflow, DDC saturation, MTI saturation) and two
control-fault classes (range-decimator watchdog from F-6.4, CIC->FIR
CDC overrun from F-1.2). The MCU outer-loop AGC reduces RF gain on
PD13 assertion, which is the wrong response to a watchdog or CDC
stall — it just hides the stall behind a quiet receive chain. gpio_dig7
(PD15) was tied 1'b0 as "reserved".
Split:
gpio_dig5 = signal-saturation only (AGC continues to react correctly)
gpio_dig7 = control-fault classes
Telemetry: status_words[5][6:5] now exposes the two control-fault
classes in BOTH legacy (FT601) and FT2232H USB variants, with 2-FF
level CDC sync from clk_100m to ft601_clk_in / ft_clk. Bit [7] is
reserved. AUDIT-C12's frame_drop_count at [31:25] is preserved.
The 50T XDC already assigns pin H12 -> gpio_dig7 (AUDIT-C15 era), so no
XDC change is needed.
Test: tb/tb_audit_s10_gpio_split.v 17/17 PASS — exercises both the
combinational GPIO split and the CDC status-word packing path.
Regression: 39/39 PASS (was 34/34).
`radar_receiver_final.v:246` had `assign adc_pwdn = 1'b0;` -- the AD9484
PWDN pin was hard-tied LOW with no path for the host or MCU to assert
it. Combined with AUDIT-C13 (CSB hard-tied HIGH on the production board,
no SPI access to the AD9484), the ADC was fully un-recoverable from a
stuck state without dropping main power -- which also drops the
VBAT-backed BKPSRAM persistence (MCU-A4 OCXO warmup, MCU-A7 emergency
flag) and forces a 180 s warmup soak.
Opcode 0x32 was reserved during the AUDIT-C3 fix (commit 24ef5e7) for
exactly this purpose. Wire it through:
- `radar_system_top.v` adds `reg host_adc_pwdn` next to `host_adc_format`,
resets to 1'b0 (matches historical hard-tied state -- preserves
bringup behavior), latches `usb_cmd_value[0]` on opcode 0x32, drives
the new receiver input port.
- `radar_receiver_final.v` adds `input wire host_adc_pwdn`, replaces the
hard-coded `assign adc_pwdn = 1'b0` with `assign adc_pwdn = host_adc_pwdn`.
- No CDC: `host_adc_pwdn` is a stable single-bit level driven from the
clk_100m register straight to the I/O pad. AD9484 PWDN is asynchronous
w.r.t. the ADC clock; the chip re-acquires its DLL on PWDN deassert.
XDC pin assignments were already in place from AUDIT-C15 (50T:T5,
200T:P20, both LVCMOS25 driving the AD9484 PWDN net via the R36/R37
divider on the Main Board).
Verification:
- new tb/tb_adc_pwdn_opcode.v, 15/15 PASS:
T1 reset -> host_adc_pwdn=0, adc_pwdn pin=0 (ADC powered up)
T2 opcode 0x32 val=1 -> host_adc_pwdn=1, pin=1 (PWDN asserted)
T3 opcode 0x32 val=0 -> cleared
T4 only bit[0] consumed (upper bits ignored)
T5 unrelated opcodes (0x33, 0x01) don't disturb host_adc_pwdn
T6 cmd_valid_100m gating works
- Quick regression 33/33 PASS (was 32/32; +1 new test, 0 regressions)
- Lint: 0 errors
The BMP180 driver had no public init method and never called
readCalibrationCoefficients() from anywhere -- _calCoeff stayed at its
C++ in-class member-initializer defaults (all zeros) at runtime.
Consequence chain:
- computeB5(UT) collapsed to 0 via a 0/0 divide (Cortex-M7 SDIV with
SCB->CCR.DIV_0_TRP=0 returns 0 silently -- system_stm32f7xx.c does
not enable the trap)
- getPressure() always tripped the `if (B4 == 0)` guard, returning
the I2C-error sentinel (post-AUDIT-C17: INT32_MIN; pre-: 255)
- health watchdog at main.cpp:758 fired ERROR_BMP180_COMM every
main-loop iteration because last_bmp_check was only updated on the
success path, so the 15 s rate-limit never engaged once the check
started failing
- error_count > 10 latched system_emergency_state = true (per the
MCU-N1 fix), driving SAFE-MODE within ~25 s of every boot
Fix:
- Added BMP180::begin() public method: probes chip ID, then reads the
11 factory cal coefficients (registers 0xAA..0xBE step 2). Returns
true only on full success; false on chip-ID mismatch or any I2C
failure mid-loop.
- main.cpp BAROMETER INIT calls myBMP.begin() with up to 3 retries
(50 ms backoff) and sets a file-scope bmp180_operational flag.
Altitude-baseline loop now gated on success -- failure path leaves
RADAR_Altitude at 0.0f instead of letting pow(negative, fractional)
propagate NaN into gps_data telemetry.
- Health watchdog gates BMP180 check on bmp180_operational AND
updates last_bmp_check regardless of the error path. A single bad
pressure reading no longer tight-loops into SAFE-MODE; a genuine sensor
failure now takes the intended ~150 s (10 errors x 15 s) before
the MCU-N1 latch trips, giving the operator time to intervene.
Verification:
- new test_audit_cal_bmp180_begin.c, 3/3 PASS:
T1 every coefficient loaded in order with correct signed/unsigned types
T2 chip-mismatch and I2C-fail short-circuit semantics correct
T3 regression demo: zero-cal computeB5 returns 0 for any UT (the
silent-fail mode); datasheet cal reproduces 15.0 C
- full MCU regression 33/33 PASS (was 32/32; +1 new test, 0 regressions)
Bug introduced in 5fbe97f (initial upload of the driver from the
Arduino enjoyneering79 BMP180 library -- the begin()/init pattern from
the upstream Arduino version was lost in the STM32 port). Latent until
this audit cycle.
BMP180_ERROR=255 was an in-band sentinel returned by uint16_t I/O helpers
(read16, readRawTemperature) on I2C failure. 255 is also a valid uint16
register reading (0x00FF appears across the calibration block and is
reachable as a raw temperature/pressure sample), so a sensor failure was
indistinguishable from a real reading.
getTemperature() additionally narrowed the uint16_t raw read to int16_t
before passing to computeB5(). Raw bit-patterns >= 0x8000 (reachable across
the BMP180 -40..+85 C operating window) flipped to negative int16_t and
sign-extended into computeB5(), producing temperature errors of order
100s of C (e.g. -347 C instead of +51 C for raw UT = 0x8000).
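The arithmetic can be reproduced off-target with a short Python model of
the datasheet integer compensation, using the datasheet's example
calibration (AC5=32757, AC6=23153, MC=-8711, MD=2868). c_div is a
hypothetical helper that mimics C truncating division and, per the SDIV
note above, returns 0 for a zero divisor. It shows the datasheet 15.0 C
point, the +51.1 C vs -347.3 C split for UT = 0x8000, and the all-zero
calibration collapsing to 0.

```python
def c_div(a: int, b: int) -> int:
    """C-style signed division (truncates toward zero). A zero divisor returns 0,
    mimicking Cortex-M7 SDIV with SCB->CCR.DIV_0_TRP=0."""
    if b == 0:
        return 0
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def bmp180_temp_decidegc(ut: int, ac5: int, ac6: int, mc: int, md: int) -> int:
    """BMP180 datasheet integer temperature compensation, in units of 0.1 C."""
    x1 = ((ut - ac6) * ac5) >> 15      # Python >> on negatives is arithmetic, like ARM ASR
    x2 = c_div(mc * 2048, x1 + md)     # MC * 2^11 / (X1 + MD)
    b5 = x1 + x2
    return (b5 + 8) >> 4


def narrow_to_int16(raw: int) -> int:
    """Model of the old (int16_t) cast: bit patterns >= 0x8000 become negative."""
    return raw - 0x10000 if raw >= 0x8000 else raw


CAL = dict(ac5=32757, ac6=23153, mc=-8711, md=2868)  # datasheet example coefficients

print(bmp180_temp_decidegc(27898, **CAL))                    # 150   (15.0 C)
print(bmp180_temp_decidegc(0x8000, **CAL))                   # 511   (+51.1 C, widened)
print(bmp180_temp_decidegc(narrow_to_int16(0x8000), **CAL))  # -3473 (-347.3 C, narrowed)
print(bmp180_temp_decidegc(0x8000, 0, 0, 0, 0))              # 0     (all-zero cal)
```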
Fix:
- Internal I/O helpers (read8/read16/readRawTemperature/readRawPressure)
now return bool and pass the value through an out-param. None of the
new sentinels collide with valid sensor output:
* getTemperature -> NaN on error
* getPressure -> INT32_MIN on error
* getSeaLevelPressure -> INT32_MIN on error
- getTemperature() keeps raw as uint16_t and widens it via (int32_t)raw,
which preserves the value, before computeB5().
- readRawPressure() reads XLSB through the bool-out-param contract;
previously OR'd in 0xFF on I2C fail, silently corrupting the LSB.
Verification: test_audit_c17_bmp180_sentinel_and_cast 4/4 PASS, including
datasheet UT=27898 -> 15.0 C reproduction and 64/64 finite outputs across
a full uint16 sweep (vs. all 32 upper-half points collapsing under the buggy
narrowing). Full MCU regression 32/32 PASS.
Caller-side: no external code references BMP180_ERROR; main.cpp's existing
range check at the health-watchdog catches INT32_MIN via the < 30000.0
branch.
Pre-fix S_IDLE had two independent if-branches: one for frame_start_pulse
(resets pointers) and one for data_valid (transitions to S_ACCUMULATE).
A data_valid arriving before frame_start_pulse would advance the FSM with
whatever pointers happened to be live, and the BRAM write block would write
the sample into mem_write_addr = (write_chirp_index*RANGE_BINS) + 0.
In current operation the race is benign: end-of-S_ACCUMULATE always zeros
write_chirp_index/write_range_bin (lines 287-288) and the MF pipeline latency
(~165 µs) is millions of cycles longer than the frame_start CDC latency
(~50 ns), so frame_start always arrives first. But the FSM relies on an
undocumented system-level invariant; a future code path that leaves
pointers stale on entry to S_IDLE would silently corrupt the first sample.
Fix: add a `frame_armed` register set when frame_start_pulse arrives in
S_IDLE, cleared on transition to S_ACCUMULATE. Both the FSM transition and
the BRAM write block gate on `(frame_start_pulse || frame_armed)`. The OR
admits the same-cycle case where both arrive together (write to addr 0
still resolves correctly because both blocks use the same gate).
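A cycle-level Python sketch of the gate, to make the early-data and
same-cycle cases explicit. The class and method names are hypothetical and
only S_IDLE is modeled; it is a behavioral illustration, not a
transcription of the Verilog.

```python
class IdleGateModel:
    """Behavioral model of the S_IDLE frame_armed gate (hypothetical names)."""

    def __init__(self):
        self.state = "S_IDLE"
        self.frame_armed = False

    def clock(self, frame_start_pulse: bool, data_valid: bool) -> bool:
        """Advance one cycle in S_IDLE; return True if the sample is written."""
        assert self.state == "S_IDLE", "only S_IDLE is modeled"
        gate = frame_start_pulse or self.frame_armed  # OR admits the same-cycle case
        write = data_valid and gate                   # BRAM write uses the same gate
        if write:
            self.state = "S_ACCUMULATE"               # FSM transition uses the same gate
            self.frame_armed = False                  # cleared on the transition
        elif frame_start_pulse:
            self.frame_armed = True                   # remember the frame start
        return write


early = IdleGateModel()
print(early.clock(frame_start_pulse=False, data_valid=True))  # False: held in S_IDLE
print(early.clock(frame_start_pulse=True, data_valid=False))  # False: now armed
print(early.clock(frame_start_pulse=False, data_valid=True))  # True: written
```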
Verification: tb_doppler_frame_start_gate 21/21 PASS, quick regression
32/32 PASS (was 31/31; +1 new test, 0 regressions). tb_doppler_realdata
(full FFT pipeline) still passes — gate transparent to normal operation.
Bug: 16-bit detect_count was reset only on power-on; increments at three
sites (ST_IDLE/ST_BUFFER simple-threshold paths and ST_CFAR_CMP) accumulate
across frames. At 178 fps, even 2-3 average detections per frame wrap the
counter in roughly 2-3 minutes (65536 / (178 x 3) = ~123 s;
65536 / (178 x 2) = ~184 s), breaking any rate-based host telemetry
or health check that reads it.
Fix: add `detect_count <= 16'd0` in ST_DONE so the counter represents
"detections this frame" instead of cumulative-since-boot. Updated $display
wording from "total detections" to "frame detections".
T13 flipped from "count keeps growing" to "identical-scene frames produce
identical counts" (the actual contract a per-frame counter must satisfy).
TB snapshots detect_count during ST_DONE because cfar_busy only goes low
on ST_IDLE entry — after the reset has fired.
Verification: tb_cfar_ca 24/24 PASS, quick regression 31/31 PASS.
Note: detect_count output port is now "live" (accumulates during frame,
0 between frames). Audit confirmed no current host telemetry consumes
this port. If future host code needs a stable last-frame total, add a
detect_count_last_frame snapshot register then.
AUDIT-C12: usb_data_interface_ft2232h had a misleading single-buffer comment
that overstated the timing slack and referenced a frame_ack_toggle CDC that
was never implemented. Re-verified actual numbers: at 178 fps the slack is
1.14 ms (20%), not "much shorter than gap". No data corruption today (write
order matches read order, addresses don't collide), but frame_complete
firing while WR_FSM is still draining the previous frame causes silent
frame drops via the missed frame_ready_toggle edge.
Fix is instrumentation, not architectural rework: add wr_done_toggle
(ft_clk -> clk CDC) on WR_DONE -> WR_IDLE, track frame_pending in clk
domain, count drops in 7-bit saturating frame_drop_count, surface in
unused upper 7 bits of status_words[5]. Host now has visibility into the
failure mode if margin ever shrinks (faster frame rate or USB bandwidth
shortfall). Replaced misleading comment with corrected timing breakdown.
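The clk-domain bookkeeping can be sketched in Python as below. Each CDC'd
event is modeled as one method call, and the drop rule (a frame_complete
that arrives while the previous frame is still pending counts as one drop)
is an assumption of this sketch rather than a reading of the RTL.

```python
class FrameDropMonitor:
    """clk-domain bookkeeping; each CDC'd event is one method call."""

    DROP_MAX = 127  # 7-bit saturating counter

    def __init__(self):
        self.frame_pending = False
        self.frame_drop_count = 0

    def wr_done(self) -> None:
        """wr_done_toggle edge (ft_clk -> clk): WR_FSM returned to WR_IDLE."""
        self.frame_pending = False

    def frame_complete(self) -> None:
        """New frame ready; if the writer is still busy, this frame is dropped."""
        if self.frame_pending:
            self.frame_drop_count = min(self.frame_drop_count + 1, self.DROP_MAX)
        else:
            self.frame_pending = True

    def status_word5_upper(self) -> int:
        """frame_drop_count placed in status_words[5][31:25]."""
        return self.frame_drop_count << 25
```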
AUDIT-S22: cfar_ca emits one detection per 3 cycles (THR/MUL/CMP); the
detection RMW takes 3 cycles. Match by construction today, fragile against
any CFAR speedup. Added a header comment in cfar_ca.v documenting the
dependency, and a SIMULATION-only assertion in usb_data_interface_ft2232h.v
that fires [ASSERT FAIL] AUDIT-S22 if cfar_valid arrives while RMW busy.
Catches silent-drop regressions in the test suite.
Verification: new tb_ft2232h_frame_drop.v with 5 scenarios (no drops /
stalled drops / multi-drop / recovery / saturation at 127) - 10/10 PASS.
Quick regression 31/31 PASS (was 30/30; +1 new test, 0 regressions).
Closes AUDIT-C15. The 200T XDC `adc_or_p/n` PACKAGE_PIN was a TODO
placeholder that blocked all 200T synth/impl with unplaced-IO errors.
Pins U20/V20 are L11P/L11N_T1_SRCC_14 — same T1 clock tile as adc_dco_p
on L12_MRCC (W19/W20), so OR captures with the same IBUFDS->BUFIO->IDDR
source-synchronous topology as adc_d_p[*]. A Vivado get_package_pins query
against xc7a200tfbg484-2 confirms the pair is free.
Adds DIFF_TERM TRUE on adc_or_n (was only on the p-side; explicit on
both is safer). Adds input_delay constraints mirroring adc_d_p
(max 1.0 ns / min 0.2 ns on both edges).
Header pin counts updated: Bank 14 21/50 used, total 184/285.
This is the FPGA-team RECOMMENDATION for the production PCB (NEW
design); the PCB designer must route AD9484 OR+ -> U20 and OR- -> V20.
Validation:
- read_xdc + link_design -part xc7a200tfbg484-2 -> READ OK on both
xc7a200t_fbg484.xdc and adc_clk_mmcm.xdc; no PACKAGE_PIN errors.
- ./run_regression.sh --quick: 29/29 PASS (RTL untouched).
The GUI's radar_protocol.py parsed 11-byte legacy packets only. The
production board (50T, USB_MODE=1) emits ~35 KB bulk frames from
usb_data_interface_ft2232h.v, so the legacy parser saw a random walk
of false 11-byte boundaries through bulk data — no usable display on
production hardware.
Bulk parser added (radar_protocol.py):
- parse_bulk_frame validates header, reserved bits, n_range=512,
n_doppler=32, footer-at-flag-derived-offset; unpacks range_profile
/ doppler_mag / cfar_dense per the format-flags byte.
- find_bulk_frame_boundaries is the bulk counterpart of
find_packet_boundaries; status packets (0xBB) are handled in the same
stream, since the FT2232H emits them too.
- RadarAcquisition dispatches on isinstance(conn, FT2232HConnection):
bulk path skips the per-sample state machine and fills RadarFrame
in one shot. FT601 / 200T keeps legacy 11-byte (USB 3.0 has 50x
bandwidth headroom; per-sample format is correct and already works).
- RadarFrame.mag_only flag carries the wire's mag_only bit so
downstream consumers can skip I/Q panels cleanly.
- FT2232HConnection._mock_read now emits synthetic bulk frames
(was misleading legacy 11-byte).
RTL alignment (AUDIT-C9 RTL stub option):
- usb_data_interface_ft2232h.v header no longer promises the
unimplemented mag_only=0 (full-I/Q) and sparse_det=1 paths;
explicit INERT FLAGS note distinguishes the two reasons:
* Full-I/Q is constrained by hardware — needs ~28-BRAM18 I/Q
buffer (50T currently 78% BRAM utilised after FFT IP) AND
USB 2.0 bandwidth (12.21 MB/s vs 8 MB/s conservative budget).
* Sparse-list is feasible — smaller than dense for typical
scenes (<341 detections), ~1 BRAM18 cost. Just unimplemented
RTL work (small list BRAM + new WR_DETECT_SPARSE state).
- New SIMULATION-only assertion fires if stream_mag_only ever
becomes 0 or stream_sparse_det ever becomes 1 — backstop for
any future regression that bypasses the host-register clamp.
- radar_system_top.v opcode 0x04 force-clamps mag_only=1 and
sparse_det=0 in host_stream_control when USB_MODE=1, so a
Custom-Command host write can't push the FPGA into a wire-format
vs FSM divergence.
Bandwidth math (verified for 27c9c22+):
Frame rate = 1 / (16x167 us + 175.4 us + 16x175 us) = ~178 fps
Mag-only frame = 8+1024+32768+2048+1 = 35849 B = 6.38 MB/s
FT2232H 245-Sync-FIFO sustained budget (FTDI AN_232B-04
conservative): 8 MB/s. Headroom 20%.
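The same budget as a quick Python check, taking the ~178 fps figure above
as given rather than recomputing it. It also reproduces the 1.14 ms
single-buffer slack quoted for AUDIT-C12 (frame period minus drain time at
the 8 MB/s budget).

```python
FRAME_RATE_FPS = 178       # ~178 fps, as quoted above
BUDGET_BPS = 8_000_000     # conservative FT2232H sustained budget, bytes/s

frame_bytes = 8 + 1024 + 32768 + 2048 + 1     # mag-only bulk frame
throughput = frame_bytes * FRAME_RATE_FPS      # bytes/s on the wire
headroom = 1 - throughput / BUDGET_BPS
period_ms = 1000 / FRAME_RATE_FPS              # time between frames
drain_ms = 1000 * frame_bytes / BUDGET_BPS     # time to push one frame out
slack_ms = period_ms - drain_ms

print(frame_bytes)                  # 35849
print(round(throughput / 1e6, 2))   # 6.38
print(round(100 * headroom))        # 20
print(round(slack_ms, 2))           # 1.14
```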
Tests: test_GUI_V65_Tk.py TestBulkFrameParser — 18 new cases covering
round-trip per stream-flag combo, header/footer/n_range/n_doppler/
reserved-bit/truncation rejection, multi-frame boundaries, bulk+status
mixed streams, byte-drop resync, dispatch-by-connection-type,
ingest-to-RadarFrame end-to-end. GUI 117/117 PASS, v7 83/83 PASS,
FPGA quick regression 29/29 PASS, ruff clean.
Refs: AUDIT-C9 (GUI parses legacy 11-byte vs FT2232H bulk).
Follow-ups (separate patches):
- Sparse-detection write FSM (~1 BRAM18 + ~100 RTL lines).
Bandwidth- and memory-feasible; just unimplemented work.
- Full-I/Q write FSM. Constrained: needs ~28-BRAM18 I/Q buffer
AND USB 2.0 bandwidth headroom (50T post-FFT-IP at 78% BRAM).
The DDC hard-coded an offset-binary->2C subtract on the AD9484 path. The
chip's output format is selected by the SCLK/DFS strap (jumper SJ1 on
RADAR_Main_Board.sch), and CSB is hard-tied HIGH so SPI cannot be used
to confirm or change it from firmware. If the board is assembled with
SJ1 on pins 2-3 (two's-complement), the existing RTL silently mis-
converts every sample.
Add a 2-bit adc_format input to ddc_400m_enhanced (2-FF synchronized
clk_100m -> clk_400m, ASYNC_REG attribute), drive it from a new top-
level register host_adc_format written by host opcode 0x33, and wire
it through radar_receiver_final. Default 2'b00 matches the SJ1 default
strap (offset-binary) and preserves pre-patch behavior. Opcode 0x32 is
intentionally left unused; reserved for the future S-25 fix
(host-driven adc_pwdn).
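A Python reference model of the conversion the DDC now selects, handy for
checking the Test Group 5 vectors by hand. Treating 2'b01 as two's
complement and every other code as an offset-binary fallback are
assumptions of this sketch; the log only states that 2'b00 is offset
binary and that 2'b10 is a reserved fallback.

```python
def adc_code_to_signed(code: int, adc_format: int) -> int:
    """Map an 8-bit AD9484 output code to a signed sample.

    2'b00: offset binary (SJ1 default strap, pre-patch behavior).
    2'b01: two's complement (ASSUMED encoding).
    other: reserved; ASSUMED to fall back to offset binary.
    """
    code &= 0xFF
    if adc_format == 0b01:
        return code - 0x100 if code & 0x80 else code  # sign-extend bit 7
    return code - 0x80                                # offset binary: subtract mid-scale


print([adc_code_to_signed(c, 0b00) for c in (0x80, 0x00, 0xFF)])  # [0, -128, 127]
print([adc_code_to_signed(c, 0b01) for c in (0x00, 0x80, 0x7F)])  # [0, -128, 127]
```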
Tests: tb/tb_ddc_400m.v Test Group 5 — 7 new assertions covering
offset-binary at {0x80, 0x00, 0xFF}, two's-complement at
{0x00, 0x80, 0x7F}, and reserved 2'b10 fallback. 14/14 PASS.
Refs: AUDIT-C3 (DDC offset-binary hardcoded).
Schematic ref: RADAR_Main_Board.sch:46719 (CSB on +1V8_CLOCK_F),
:46845 (SCLK/DFS via SJ1).
tb_radar_receiver_final had three pre-existing issues that all surfaced as
fails in regression (32 passed, 2 failed before; 34 passed, 0 after):
1. host_range_mode was undriven (floating 2'bzz); rmc log confirmed
"Auto-scan starting, range_mode=z". Add explicit 2'b01 (long-range
dual-chirp) for the test scenario.
2. DDC_MAX_ENERGY threshold (2^56) was sized for an unspecified earlier
stimulus; the test feeds a deliberately-loud 120 MHz sawtooth that
produces ~1.27e17 energy over 2M samples. Raised to 2^60 (~10x
observed) so B1b catches true overflow without false-firing.
3. The 9 doppler-frame-dependent checks (S4-S9, G1, B2a, B3, B4) need
~108 ms simulated time to fill a 32-chirp Doppler frame because the
in-house fft_engine takes ~340 K cycles per multi-segment chirp
(RX-NEW-3, commit 5c8cc8c). Iverilog can't elaborate the Xilinx FFT IP
that would make this tractable. Guard those checks behind
`ifdef FFT_USE_XILINX_IP` so iverilog cleanly SKIPs them with an
explanatory line; XSim with the IP runs them normally.
Also tightens run_regression.sh's pass/fail regex from
^\[(PASS|FAIL)([^]]*)\] to ^\[(PASS|FAIL)( [0-9]+)?\] so informational
tags like [FAIL-INFO] (used to document the known RX-NEW-1 fft_engine
bin-shift in tb_matched_filter_processing_chain.v) no longer false-fire
as real failures. The Matched Filter Chain test goes from FAIL (40 pass,
2 false-fails) to PASS (40 checks).
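The effect of the regex change, translated to Python's re syntax (the
bracket expression is written [^\]] here so Python reads the ']'
literally). The sample lines are illustrative.

```python
import re

OLD = re.compile(r"^\[(PASS|FAIL)([^\]]*)\]")
NEW = re.compile(r"^\[(PASS|FAIL)( [0-9]+)?\]")

lines = ["[PASS] reset", "[FAIL 3] range bin", "[FAIL-INFO] RX-NEW-1 bin shift"]
print([bool(OLD.match(s)) for s in lines])  # [True, True, True]
print([bool(NEW.match(s)) for s in lines])  # [True, True, False]
```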
Regression: 34 passed, 0 failed.
The DAC short/long chirp LUTs are 10..30 MHz upchirps (Hilbert-confirmed).
With TX_LO=10.500 GHz, RX_LO=10.380 GHz (adf4382a_manager.h) and the
120 MHz DDC NCO (ddc_400m.v), high-side mixing places the post-DDC echo
at 10..30 MHz baseband. The matched-filter reference (gen_chirp_mem.py)
was generating 0..20 MHz, implicitly assuming the chirp's low edge mixed
to DC. This caused a 10 MHz spectral offset and ~5 dB matched-filter loss.
Adds F_BASEBAND_LOW=10e6 in both gen_chirp_mem.py and radar_scene.py,
with phase formula 2*pi*F_BASEBAND_LOW*t + pi*rate*t^2 in all chirp
generators. Regenerates the 6 .mem files. Adds analyze_short_chirp_mismatch.py
for the Hilbert-based diagnosis. Fixes the misleading "30MHz to 10MHz"
comment in plfm_chirp_controller.v and adds an end-to-end frequency plan
in the LUT header.
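A Python sketch of the frequency plan and the corrected reference phase.
The LO and NCO values come from this log; the 10 us chirp length is a
hypothetical placeholder (only the 20 MHz sweep sets the endpoints), and
high-side mixing at both mixers is the sideband assumption discussed below.

```python
import math

# Frequency plan (integer Hz): TX_LO, RX_LO from adf4382a_manager.h, NCO from ddc_400m.v
TX_LO, RX_LO, DDC_NCO = 10_500_000_000, 10_380_000_000, 120_000_000


def post_ddc_hz(f_dac_hz: int) -> int:
    """Baseband frequency of a DAC tone, ASSUMING high-side mixing at both mixers."""
    rf = TX_LO + f_dac_hz           # upper sideband at the TX mixer
    intermediate = rf - RX_LO       # RX LO sits below the RF band
    return intermediate - DDC_NCO   # DDC shifts the 120 MHz IF down to baseband


F_BASEBAND_LOW = 10e6        # Hz
SWEEP_HZ = 20e6              # 10..30 MHz upchirp
T_CHIRP = 10e-6              # s, HYPOTHETICAL length for illustration
RATE = SWEEP_HZ / T_CHIRP    # Hz/s


def ref_phase(t: float) -> float:
    """Reference phase: 2*pi*F_BASEBAND_LOW*t + pi*rate*t^2."""
    return 2 * math.pi * F_BASEBAND_LOW * t + math.pi * RATE * t * t


def inst_freq_hz(t: float, dt: float = 1e-9) -> float:
    """Instantaneous frequency via a central difference (exact for a quadratic phase)."""
    return (ref_phase(t + dt) - ref_phase(t - dt)) / (4 * math.pi * dt)


print(post_ddc_hz(10_000_000), post_ddc_hz(30_000_000))        # 10000000 30000000
print(round(inst_freq_hz(0.0)), round(inst_freq_hz(T_CHIRP)))  # 10000000 30000000
```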
Sideband orientation (high-side at both mixers) is the conventional choice
and consistent with antenna match (10.25..10.75 GHz, 8x16 patch designed
at 10.5 GHz). Loopback capture would settle definitively; if either mixer
is low-side the F_BASEBAND_LOW sign flips and/or chirp direction reverses.
latency_buffer.v has had zero non-tb instantiations since RX-B (2026-04-23)
replaced its hookup in radar_receiver_final with a 1-FF alignment register.
The module was being kept "for potential future use" — exactly the kind of
dead weight the codebase does not need. Deleted, along with all build /
test infrastructure that dragged it along:
- 9_Firmware/9_2_FPGA/latency_buffer.v
- 9_Firmware/9_2_FPGA/tb/tb_latency_buffer.v
- run_regression.sh: removed from RTL_FILES and RECEIVER_RTL
- scripts/200t/build_200t.tcl: removed from synthesis source list
- tb/tb_system_e2e.v: removed from header compile-string example
- tb/cosim/validate_mem_files.py: deleted test_latency_buffer() (~75 lines),
its call site, and the corresponding entry in the module docstring
Historical RX-B comments referencing latency_buffer in radar_receiver_final.v,
tb_rxb_fullchain_latency.v, and tb_rxb_latency_measure.v are kept — they
explain WHY the module was removed, which is still useful design archaeology.
Two doc-only housekeeping touches bundled in:
- plfm_chirp_controller.v: replaced two empty "CRITICAL FIX: Generate
valid signal" labels at LONG_CHIRP and SHORT_CHIRP with one shared
chirp_valid policy comment block above LONG_CHIRP that explains the
actual rationale (downstream FIFO underrun on trailing samples).
- v7/models.py: replaced the "range_resolution and velocity_resolution
should be calibrated" docstring (sounded like an open TODO but was a
documented placeholder) with a clear pointer to the GUI-C3 fix in
workers.py:RadarDataWorker so future readers know the live path
derives correct values from WaveformConfig.
FPGA quick regression unchanged: 28/29 (1 fail is the unrelated iverilog/
Xilinx-IP RX-NEW-3 gap). GUI suite 180/180. Ruff clean.
Cross-verified status word 4 bit positions against the FPGA word builder
(usb_data_interface.v:376-380, usb_data_interface_ft2232h.v:675-679) and
the GUI parser (radar_protocol.py:252-257) — all positions match. No
production change needed; the gap was that nothing in the test suite
caught a future drift between the two sides.
Added test_parse_status_word4_layout_co_spec: a single canonical layout
table is the source of truth; for each field the test sets only that
field to its max value, builds the status packet via the existing
FPGA-builder-mirror _make_status_packet, parses, and asserts the field
round-trips exactly AND every other field reads back zero. Catches both
LSB drift and width drift on either side of the wire. Pre-checks that
widths plus the reserved [9:2] gap sum to 32 and that no two fields
overlap.
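The co-spec pattern in miniature, as a Python sketch. Only
chirps_mismatch at [10] and the reserved [9:2] gap come from this log;
field_hi and field_lo are hypothetical placeholders, and pack() stands in
for the FPGA-builder mirror _make_status_packet.

```python
LAYOUT = {                      # name: (lsb, width); HYPOTHETICAL except chirps_mismatch
    "field_hi": (11, 21),
    "chirps_mismatch": (10, 1),
    "field_lo": (0, 2),
}
RESERVED = (2, 8)               # bits [9:2]


def pack(fields: dict) -> int:
    word = 0
    for name, value in fields.items():
        lsb, width = LAYOUT[name]
        word |= (value & ((1 << width) - 1)) << lsb
    return word


def parse(word: int) -> dict:
    return {n: (word >> lsb) & ((1 << w) - 1) for n, (lsb, w) in LAYOUT.items()}


def check_layout() -> None:
    spans = list(LAYOUT.values()) + [RESERVED]
    assert sum(w for _, w in spans) == 32, "widths + reserved gap must cover 32 bits"
    used = 0
    for lsb, width in spans:
        mask = ((1 << width) - 1) << lsb
        assert (used & mask) == 0, "overlapping fields"
        used |= mask
    for name, (_, width) in LAYOUT.items():
        got = parse(pack({name: (1 << width) - 1}))   # one field at max, rest zero
        assert got[name] == (1 << width) - 1          # catches width/LSB drift
        assert all(v == 0 for other, v in got.items() if other != name)
```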
Also fixed test_default_shapes — stale (64,32)/(64,) literals predated
the GUI-C1 / Q3 alignment that bumped NUM_RANGE_BINS 64 -> 512. test_v7
was updated at the time, this one was missed. Replaced with references
to NUM_RANGE_BINS/NUM_DOPPLER_BINS so any future bin-count change
auto-updates the assertion. Suite now 180/180 PASS.
cmd_data / cmd_opcode / cmd_addr / cmd_value feed downstream CDC sync
chains; the safety property is that they only change on the cycle
cmd_valid rises (RD_PROCESS), and stay held on every other cycle so the
receiver's 2-FF synchronizer sees a clean payload regardless of where
its sample window lands. The FSM satisfies this implicitly today, but
nothing flagged a regression that introduced a stray write somewhere
in the same always block.
Added an `ifdef SIMULATION block at the bottom of both
usb_data_interface.v (FT601 / ft601_clk_in / ft601_reset_n) and
usb_data_interface_ft2232h.v (FT2232H / ft_clk / ft_reset_n). It
snapshots the payload + cmd_valid each cycle and fires
[ASSERT FAIL] TX-N9: cmd_<field> changed while cmd_valid=0 (old -> new)
on any payload change while cmd_valid is low. Local regs suffixed _n9
to avoid future name collisions. Synthesis-inert.
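A software analogue of the check, as a Python sketch. The class name is
hypothetical; the message follows the format quoted above, while the real
check lives in the Verilog SIMULATION blocks.

```python
class PayloadStabilityCheck:
    """Flags any cmd_* payload change on a cycle where cmd_valid is low."""

    FIELDS = ("cmd_data", "cmd_opcode", "cmd_addr", "cmd_value")

    def __init__(self):
        self.snapshot = None   # payload seen on the previous cycle
        self.failures = []

    def clock(self, cmd_valid: bool, payload: dict) -> None:
        if self.snapshot is not None and not cmd_valid:
            for field in self.FIELDS:
                old, new = self.snapshot[field], payload[field]
                if old != new:
                    self.failures.append(
                        f"[ASSERT FAIL] TX-N9: {field} changed while cmd_valid=0 "
                        f"({old:#x} -> {new:#x})"
                    )
        self.snapshot = dict(payload)
```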
Quick FPGA regression unchanged: USB Data Interface 91/91 PASS, overall
28/29 (same baseline; the 1 fail is the pre-existing iverilog/Xilinx-IP
RX-NEW-3 gap).
Every boot waited the full 180 s OCXO warmup soak — even an
IWDG/SYSRESETREQ reset that takes seconds and leaves the OCXO oven hot
lost three minutes of bringup time.
Added BKPSRAM slot 3 (magic 0xCA1C1F1E) with warmup_persist_set/check
helpers next to the existing MCU-A2/A7 BKPSRAM block. Cold-boot path
now arms the flag at the end of the full 180 s soak; subsequent boots
that find the flag still set know the OCXO oven is still hot and the
crystal is settled, so they wait 5 s and move on. Power-cycle clears
BKPSRAM and forces the full soak again — safe default, operator can't
accidentally skip the warmup by yanking and re-applying power.
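The boot-path decision fits in a few lines of Python. ToyBkpsram is a toy
(slots as 32-bit words, all cleared on power-cycle) and the soak is
collapsed into a returned wait time; only the slot number, magic, and the
180 s / 5 s figures come from this log.

```python
WARMUP_SLOT, WARMUP_MAGIC = 3, 0xCA1C1F1E
COLD_SOAK_S, WARM_SOAK_S = 180, 5


class ToyBkpsram:
    """Survives IWDG/SYSRESETREQ resets; cleared only by main-power removal."""

    def __init__(self):
        self.slots = [0] * 4

    def power_cycle(self):
        self.slots = [0] * 4


def boot_warmup_seconds(sram: ToyBkpsram) -> int:
    """Return this boot's OCXO wait, arming the flag after a cold soak."""
    if sram.slots[WARMUP_SLOT] == WARMUP_MAGIC:  # warm reset: oven still hot
        return WARM_SOAK_S
    sram.slots[WARMUP_SLOT] = WARMUP_MAGIC       # armed at the end of the full soak
    return COLD_SOAK_S


sram = ToyBkpsram()
waits = [boot_warmup_seconds(sram) for _ in range(3)]  # cold boot, then two warm resets
sram.power_cycle()
waits.append(boot_warmup_seconds(sram))
print(waits)  # [180, 5, 5, 180]
```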
Added test_mcu_a4_ocxo_warm_restart (7 cases): cold boot soaks 180 s
and sets the flag; warm reset is 5 s; 5 consecutive warm resets stay
fast; power-cycle restores the cold path; cold-after-power-cycle
re-arms the bypass; pre-fix regression confirms 10 warm restarts save
1750 s vs the old always-180-s path. MCU regression now 82/82.
The magnetometer yaw correction used a hardcoded -0.61 deg literal baked
in for one deployment site. Yaw_Sensor was wrong by (site_decl + 0.61)
deg at every other site whenever the UM982 dual-antenna heading was
unavailable.
Backed the value with BKPSRAM (slots 1+2 — slot 0 is the MCU-A7
emergency flag) and exposed set_mag_declination_deg / get_mag_declination_deg.
Default returns the legacy -0.61 deg when no override has been written so
the original site stays correct out of the box; a host command (or
future GPS-derived auto-calibration) writes the new site value once and
it persists across every reset path until main-power removal.
Hardened with a +/-30 deg range clamp on both write AND read paths — real
magnetic declinations are roughly +/-25 deg worldwide, so a wider value
indicates a calibration error or BKPSRAM corruption (VBAT brown-out, bit
flip) rather than a legitimate site. Defensive read-side clamp prevents
a corrupted slot from propagating a wild heading offset.
Replaced the single use site at the magnetometer yaw computation with
the getter; legacy global Mag_Declination retained and kept in sync by
the setter for any external linkage.
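A Python sketch of the getter/setter contract. The slot layout (slot 1 =
magic, slot 2 = float32 bits) and the DECL_MAGIC value are hypothetical,
and treating a non-finite stored value as corruption is an extra
assumption of this sketch.

```python
import math
import struct

DEFAULT_DECL_DEG = -0.61   # legacy site value, used when no override exists
LIMIT_DEG = 30.0           # plausibility window, enforced on write AND read
DECL_MAGIC = 0x0DEC11A7    # HYPOTHETICAL magic; the real value is not in this log

bkpsram = {}               # toy store: slot 1 = magic, slot 2 = float32 bits (assumed)


def _clamp(deg: float) -> float:
    return max(-LIMIT_DEG, min(LIMIT_DEG, deg))


def set_mag_declination_deg(deg: float) -> None:
    bkpsram[2] = struct.unpack("<I", struct.pack("<f", _clamp(deg)))[0]
    bkpsram[1] = DECL_MAGIC


def get_mag_declination_deg() -> float:
    if bkpsram.get(1) != DECL_MAGIC:          # never written, or wrong magic
        return DEFAULT_DECL_DEG
    deg = struct.unpack("<f", struct.pack("<I", bkpsram[2]))[0]
    if not math.isfinite(deg):                # assumption: NaN/inf means corruption
        return DEFAULT_DECL_DEG
    return _clamp(deg)                        # defensive read-side clamp
```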
Added test_mcu_a2_mag_declination (10 cases): default, set/get,
persistence across reset, power-cycle clear, write-side clamp both
directions, plausible-site passthrough, defensive read-side clamp on
corruption, wrong-magic fallback, pre-fix bearing-error regression.
MCU regression now 81/81.
attemptErrorRecovery() previously fell through to the default log-only
branch for both ERROR_AD9523_CLOCK and ERROR_FPGA_COMM. checkSystemHealth
keeps re-firing the same error every pass with no recovery action ever
attempted, so the system limps along until escalation kicks in.
ERROR_AD9523_CLOCK: AD9523_RESET_ASSERT, 10 ms settle, then re-run
configure_ad9523() (releases reset, selects REFB, reprograms, waits for
lock). On second failure we log and let the next health pass re-fire so
a transient brown-out on the 100 MHz reference does not drop straight
into Emergency_Stop.
ERROR_FPGA_COMM: pulse PD12 LOW->10 ms->HIGH (matches the boot reset
pattern). PA rails left untouched at runtime; brief adar_tr_x undefined
window is acceptable vs. losing the radar entirely.
Added test_mcu_a6_recovery_dispatch (11 cases) covering both new
handlers, all existing routes, the default branch, a pre-fix regression
check, and an explicit assertion that RF_PA_OVERCURRENT escalates
upstream (handleSystemError) rather than recovering inline. MCU
regression now 80/80.
The boot-time Idq calibration walks DAC_val from 126 down toward the
1.680 A target. Mid-walk readings sit well above the 2.5 A overcurrent
threshold by design, and a channel that hits the safety_counter timeout
(50 iters) can be left above the window. Without a gate, the next
checkSystemHealth() pass would trip ERROR_RF_PA_OVERCURRENT and route
straight into Emergency_Stop, killing the system mid-bringup.
Added a `pa_calibration_in_progress` flag set TRUE around both DAC1 and
DAC2 cal walks. checkSystemHealth's Idq window short-circuits while the
flag is set; bias-fault and overcurrent thresholds remain fully active
once the walk completes, so any genuinely stuck-high channel surfaces on
the very next health pass and routes through the normal handler.
Other health checks (lock, comm, temperature, watchdog) stay live during
cal — no behavioural change to anything except the Idq window.
Added test_mcu_a5_pa_cal_gate (7 cases): mid-walk masking, post-cal
re-arming, stuck-high channel surfacing after gate clears, bias-fault
gating, PowerAmplifier=false short-circuit, and a pre-fix regression
case showing the buggy path would have tripped overcurrent mid-walk.
MCU regression now 79/79.
Emergency_Stop's hold loop refreshed IWDG forever, so any reset path that
DID fire (SYSRESETREQ from another fault, brown-out) would re-run
startup and re-energize the PA rails — there was no record that the
system had been in emergency state. Watchdog defeat in the hold loop
masked the problem.
BKPSRAM gives us a flag that survives every reset path but is lost on
main-power removal — exactly the recovery semantics we want:
power-cycle is the deliberate operator action that clears emergency,
every other reset stays in safe-hold.
- Added emergency_persist_set/check helpers (BKPSRAM @ 0x40024000,
magic 0xDEAD5A5A); enable PWR + backup-access + BKPSRAM clock.
- Emergency_Stop now writes the flag BEFORE the rail-cut sequence so
even an interrupted shutdown still leaves the persisted state set.
- main() checks the flag immediately after MX_IWDG_Init and before
any PA enable code; if set, calls Emergency_Stop directly. GPIO
init has already forced all PA enables LOW, so the safe-hold path
is reached without a single PA rail going hot.
Hold-loop IWDG refresh kept intentionally: a healthy hold loop does not
need to cycle the MCU, but if the loop itself wedges (stack corruption,
bus fault), refresh stops, IWDG fires, and the persist flag routes the
reset right back into safe-hold.
Added test_mcu_a7_emergency_persist (6 cases) modelling BKPSRAM
persistence vs power-cycle, including a regression check that exercises
the pre-fix "no persistence" boot to confirm it would have re-energized
the PAs. MCU regression now 78/78.
Cooling-fan trip in main.cpp's periodic temperature block was a 25 C dev
stub that latched the fan ON at room temperature on every boot. Replaced
with production thermal control: ON at 70 C, OFF at 60 C. The 10 C
dead-band prevents relay/fan chatter near the threshold; the 70 C ON
point sits below the 75 C SAFE-mode gate in checkSystemHealth() so the
fan engages before the system shuts down.
Driven from the existing `temperature` global (max of 8 sensors,
populated just above by the GAP-3 fix) instead of re-OR'ing the eight
Temperature_N variables — single source of truth, and the diag now
prints the actual peak temperature on each transition.
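The hysteresis rule as a Python sketch, driven by the peak temperature.
Whether the thresholds are inclusive (>= 70 C to turn on, <= 60 C to turn
off) is an assumption here.

```python
FAN_ON_C, FAN_OFF_C = 70.0, 60.0


def update_fan(fan_on: bool, peak_temp_c: float) -> bool:
    """Return the new fan state; inside the 60..70 C dead-band the state holds."""
    if not fan_on and peak_temp_c >= FAN_ON_C:
        return True
    if fan_on and peak_temp_c <= FAN_OFF_C:
        return False
    return fan_on


state, history = False, []
for t in (25.0, 30.0, 65.0, 70.0, 65.0, 61.0, 60.0, 65.0):
    state = update_fan(state, t)
    history.append(state)
print(history)  # [False, False, False, True, True, True, False, False]
```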
Added test_mcu_a1_cooling_hysteresis (9 cases) covering cold-start,
upward crossing, dead-band hold, downward crossing, and a regression
guard at 30 C that would have engaged the fan under the old stub.
MCU regression now 77/77.
matched_filter_processing_chain declared `input wire [5:0] chirp_counter`
but never read it inside the module. matched_filter_multi_segment passed
its own chirp_counter through to that dead port.
Removed the port from the chain and the corresponding hookup at the
multi_segment instantiation site. Five testbenches also referenced the
port (tb_mf_cosim, tb_matched_filter_processing_chain,
tb_rxb_latency_measure, plus the four MF cosim variants that share
tb_mf_cosim); the reg/connection/init lines were dropped, and the
now-stale "Test Group 8:
Chirp Counter Passthrough" was repurposed as a port-removal smoke test
that confirms the chain still produces FFT_SIZE outputs without that
input.
multi_segment.chirp_counter input remains on the port list (it could
plausibly be wired to per-chirp logic in the future); it is now formally
unused but iverilog/Vivado do not flag unused module inputs.
Quick regression: 28/29 PASS (same as baseline; the 1 fail is the known
iverilog/Xilinx-IP RX-NEW-3 gap unchanged by this commit).
runRadarPulseSequence was redeclaring `int m, n, y` at function scope,
which shadowed the file-scope `uint8_t m, n, y` globals at lines
~190-192 that getStatusString reports to the GUI as
BeamPos|Azimuth|ChirpCount. The function's increments updated only the
locals, then discarded them — so telemetry was permanently frozen at
"BeamPos:1|Azimuth:1|ChirpCount:1" no matter how many beam positions
or revolutions had elapsed.
Fix: drop the three local declarations; the body already references
m/n/y by name, so removing the locals lets the writes hit the globals.
A comment documents the pitfall so the locals do not get re-added by
a future cleanup. Numeric ranges are safe (m_max=32, n_max=31,
y_max=50, all fit in uint8_t).
Test: new standalone test_bug16_runradar_shadows_globals.c reproduces
both the buggy (locals shadow globals) and fixed (globals advance)
patterns and asserts the expected post-sweep values
(g_n=16, g_m=1 wraps each iter, g_y=2 after one revolution).
MCU regression: 76/76 (was 75).
The packet-boundary scanner only checked header + footer bytes, so any
payload byte that happened to be 0xAA (or 0xBB) and which lined up with
a 0x55 at offset+10 (or +25) was accepted as a packet. A single corrupt
byte could permanently shift the binning until the next frame_start
re-sync.
Added two structural sentinel checks against fixed bits the FPGA
emitter always drives to known values:
- data byte 9 = {frame_start, 6'b0, cfar_detection} -> bits[6:1]==0
- status byte 1 = high byte of status_words[0] -> 0xFF
Combined with the existing footer check, false-match probability drops
from ~1/256 to ~1/16384 (data) and ~1/65536 (status). Mock generators
already produce conformant bit patterns, so existing parser/mock-read
tests pass unchanged.
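A Python sketch of the two structural checks, assuming byte offsets are
counted from the header byte (so data packets are 11 bytes and status
packets 26). Function names are illustrative, not the radar_protocol.py
API.

```python
DATA_LEN = 11     # 0xAA header ... 0x55 footer at offset +10
STATUS_LEN = 26   # 0xBB header ... 0x55 footer at offset +25


def looks_like_data_packet(buf: bytes, i: int) -> bool:
    if i + DATA_LEN > len(buf):
        return False
    return (
        buf[i] == 0xAA
        and (buf[i + 9] & 0x7E) == 0   # byte 9 = {frame_start, 6'b0, cfar_detection}
        and buf[i + 10] == 0x55
    )


def looks_like_status_packet(buf: bytes, i: int) -> bool:
    if i + STATUS_LEN > len(buf):
        return False
    return buf[i] == 0xBB and buf[i + 1] == 0xFF and buf[i + 25] == 0x55


print(256 * 64, 256 * 256)  # 16384 65536: footer odds times sentinel odds
```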
New tests:
- test_find_boundaries_rejects_false_data_header (forged 0xAA...0x55)
- test_find_boundaries_rejects_false_status_header (forged 0xBB...0x55)
- test_find_boundaries_recovers_after_byte_drop (single-byte loss)
Tests: GUI 96/96 (was 93), test_v7 83/83, MCU 75/75, ruff clean.
No RTL change -- wire format is unchanged; this hardens the parser only.
The same RadarFrame is enqueued for the display consumer and handed to
DataRecorder.record_frame on the producer thread. h5py releases the GIL
during gzip compression, so any in-place mutation by the consumer (or a
future scaling/normalisation step) would tear the on-disk frame.
record_frame now copies all five numpy arrays into local snapshots
before passing them to h5py.create_dataset. Disk integrity no longer
depends on consumer behaviour.
New test test_record_frame_isolates_from_post_call_mutation asserts
that mutating every array in place after record_frame returns leaves
the HDF5 contents untouched.
Tests: GUI 93/93 (was 92), ruff clean repo-wide.
`chirps_mismatch_error` was set in radar_system_top when the host
requested chirps_per_elev != Doppler FFT size, but never wired into the
USB status response — a latent silent failure.
Wired the flag through both USB interfaces (FT601 + FT2232H) into bit
[10] of status word 4 (was reserved). GUI parser exposes it as
StatusResponse.chirps_mismatch.
- usb_data_interface*.v: new status_chirps_mismatch input, packed at [10]
- radar_system_top.v: connect chirps_mismatch_error to both USB instances
- radar_protocol.py + test_GUI_V65_Tk.py: parse new bit, +1 round-trip test
- tb_usb_data_interface.v: drive the new port, update word-4 expectation
Tests: GUI 92/92 (was 91), MCU 75/75, USB TB 91/91, ruff clean repo-wide.
The two remaining FPGA regression failures (Receiver Integration, MF Chain)
are the pre-existing issue of iverilog being unable to link the Xilinx IP,
tracked separately as the open RX-NEW-3 follow-up.
mti_canceller previously armed has_previous and refreshed
prev_chirp_was_long only when range_bin_d1 == NUM_RANGE_BINS - 1.
range_bin_decimator can early-terminate a chirp before reaching the
last bin (overflow guard at range_bin_decimator.v:306, watchdog at
:314). As long as chirps kept early-terminating, has_previous never
armed and MTI stayed muted on every subsequent chirp until reset.
Detect chirp boundary internally using bin-0 arrival after at least
one non-zero bin in the prior chirp. effective_has_previous lifts
has_previous=1 the cycle chirp_boundary fires so the new chirp's
bin-0 is subtracted (read-before-write on prev[0] correctly returns
the previous chirp's bin-0). prev_chirp_was_long now updates on every
range_valid_d1 (no-op within a chirp; OLD value still visible at the
chirp_boundary cycle for the waveform_changed compare). Pass-through
clears saw_nonzero_bin_in_chirp so the first MTI-enabled chirp after
a pass-through run is correctly muted.
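Below is a behavioural Python sketch of the new arming rule that replays the T13 stimulus: a 32-bin partial chirp followed by a full 64-bin chirp. `MtiSketch`, its `legacy` switch and the "muted output = 0" convention are hypothetical modelling choices, not the RTL. The sketch also omits the waveform_changed / prev_chirp_was_long compare and the pass-through path.

```python
class MtiSketch:
    """Behavioural sketch of MTI arming across chirps (not the RTL itself)."""

    def __init__(self, num_bins: int, legacy: bool = False):
        self.num_bins = num_bins
        self.legacy = legacy                  # True: old "arm on last bin" rule
        self.prev = [0] * num_bins            # previous chirp, per range bin
        self.has_previous = False
        self.saw_nonzero_bin_in_chirp = False

    def step(self, bin_idx: int, sample: int) -> int:
        if self.legacy:
            chirp_boundary = False
            effective_has_previous = self.has_previous
        else:
            # bin 0 arriving after any bin > 0 marks the start of a new chirp
            chirp_boundary = bin_idx == 0 and self.saw_nonzero_bin_in_chirp
            effective_has_previous = self.has_previous or chirp_boundary

        # read-before-write: prev[bin_idx] still holds the previous chirp here
        out = sample - self.prev[bin_idx] if effective_has_previous else 0
        self.prev[bin_idx] = sample

        if self.legacy and bin_idx == self.num_bins - 1:
            self.has_previous = True          # old rule: only the last bin arms
        if chirp_boundary:
            self.has_previous = True          # new rule: the boundary arms
        self.saw_nonzero_bin_in_chirp = bin_idx != 0
        return out
```

In the fixed model, the second chirp is subtracted from bin 0 onward. With `legacy=True`, the same stimulus leaves the whole second chirp muted, which is the failure T13 guards against.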
No port changes. tb_mti_canceller T13 added: feed a partial chirp (32 of
64 bins) followed by a full chirp and verify the second chirp is NOT muted
(would fail without the fix). MTI Canceller goes from 40 to 43 checks,
all passing. Local regression: 32/34 PASS (same as baseline; the two
failing tests are the pre-existing RX-NEW-3 FFT-throughput failures).
Replaces the in-house iterative fft_engine.v in the matched-filter chain
with the Pipelined Streaming Xilinx FFT IP, closing RX-NEW-3 (FFT chain
~11x too slow vs PRI budget).
Components:
* ip/xfft_2048_ip/xfft_2048_ip.xci — committed IP definition
(16-bit fixed point, BFP scaling, convergent rounding, natural order,
pipelined-streaming, BRAM data/reorder/phase factors). Vivado
regenerates .dcp / sim-netlist from this on each build.
* scripts/50t/gen_xfft_2048_ip.tcl — IP-Catalog generation script
* scripts/50t/run_xfft_xsim.sh — XSim batch runner for tb_xfft_2048_xsim
* xfft_2048.v — AXI-Stream wrapper. FFT_USE_XILINX_IP define routes to
real LogiCORE for synth/XSim; falls back to fft_engine batched
one-shot for iverilog (unit coverage only).
* fft_engine_axi_bridge.v — exposes legacy fft_engine port surface on
top of the xfft_2048 AXI wrapper, so the chain swap is a 1-line
module-name change.
* matched_filter_processing_chain.v — fft_engine -> fft_engine_axi_bridge
* scripts/50t/build_50t.tcl — read_ip + generate_target + synth_ip;
adds FFT_USE_XILINX_IP to verilog defines.
* tb/tb_xfft_2048_xsim.v — XSim verification (DC, impulse, tone bin 128).
All 5 assertions PASS on remote with the real IP; tuser=0x0a (BLK_EXP=10)
confirms BFP scaling is working.
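BLK_EXP is the shared exponent of the whole output block, so the unscaled spectrum is approximately the raw 16-bit output times 2**BLK_EXP. The sketch below is an idealised single-exponent model built on `numpy.fft`. The real IP scales stage by stage and rounds differently, so its exponent and LSBs can differ. `bfp_fft_model` is a hypothetical helper, and the DC amplitude of 16000 is an illustrative input, not the testbench stimulus.

```python
import numpy as np


def bfp_fft_model(x, out_bits=16):
    """Idealised block-floating-point FFT: one exponent for the whole block.

    Picks the smallest right-shift that fits every real/imag output into a
    signed `out_bits` word, then rounds. Returns (raw_re, raw_im, blk_exp).
    """
    spectrum = np.fft.fft(np.asarray(x, dtype=np.float64))
    peak = max(np.max(np.abs(spectrum.real)), np.max(np.abs(spectrum.imag)))
    limit = 2 ** (out_bits - 1) - 1
    blk_exp = 0
    while peak / 2**blk_exp > limit:
        blk_exp += 1
    scale = 2.0**blk_exp
    raw_re = np.round(spectrum.real / scale).astype(np.int64)
    raw_im = np.round(spectrum.imag / scale).astype(np.int64)
    return raw_re, raw_im, blk_exp


# Illustrative only: a large DC input over 2048 points.
raw_re, raw_im, blk_exp = bfp_fft_model(np.full(2048, 16000))
recovered_dc = int(raw_re[0]) * 2**blk_exp   # undo the block scaling
```

In this model, a DC of 16000 sums to 32,768,000 in bin 0. That needs a 10-bit shift to fit in 16 bits, and multiplying back by 2**10 recovers the exact sum.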
Local iverilog regression: 32/34 PASS — identical to baseline. Same two
RX-NEW-3 failures (Receiver Integration, Matched Filter Chain) — these
only resolve in remote XSim with the real IP, since iverilog uses the
fft_engine fallback inside xfft_2048 (~150K cycles/pass, not the
~2200-cycle Pipelined Streaming throughput). MF cosim 4/4 PASS confirms
the bridge is bit-exact in fallback mode.
Pending: remote XSim of tb_radar_receiver_final to demonstrate Doppler
frames produced within PRI budget; remote synth to confirm DSP/timing
post-IP.
Strip the explicit DSP48E1 instance from comb stage 0 and the
(* use_dsp = "yes" *) attribute from comb stages 1-4. The combs are
gated by data_valid_comb_pipe (fires once every 4 clk_400m cycles
post-decimation), so a multicycle path of 4 -setup / 3 -hold scoped
to the comb registers in xc7a50t_ftg256.xdc gives STA a 10 ns setup
window (4 × 2.5 ns) instead of 2.5 ns, so the fabric carry chains
close the 28-bit subtracts comfortably.
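The 4 -setup / 3 -hold pair follows the usual multicycle-path edge arithmetic. The -setup multiplier moves the setup capture edge out to N periods, and the hold check would follow it to N-1 periods unless -hold pulls it back. The Python sketch below assumes a 2.5 ns clk_400m period. `multicycle_check_edges` is a hypothetical helper, not a Vivado command.

```python
def multicycle_check_edges(period_ns, setup_mult, hold_mult=0):
    """Return (setup_edge_ns, hold_edge_ns) relative to the launch edge.

    -setup N puts the setup capture edge at N periods; by default the hold
    check sits one period before it, and -hold H moves it H periods earlier.
    """
    setup_edge = setup_mult * period_ns
    hold_edge = (setup_mult - 1 - hold_mult) * period_ns
    return setup_edge, hold_edge


CLK_400M_PERIOD_NS = 2.5
with_hold_fix = multicycle_check_edges(CLK_400M_PERIOD_NS, 4, 3)
setup_only = multicycle_check_edges(CLK_400M_PERIOD_NS, 4)
```

Without `-hold 3`, the hold check would land at 7.5 ns, which fast comb paths could not meet. With it, hold returns to the launch edge while setup keeps the full 10 ns.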
Pipeline depth and bit-widths unchanged: the new fabric model mirrors
the prior CREG+AREG+BREG+PREG structure exactly, so data_valid_comb_0_out
alignment and downstream stages 1-4 see bit-identical samples. CIC
behavioral simulation model now lives outside the SIMULATION ifdef
branch (used unconditionally) since there is no longer a synthesis-only
DSP48E1 to replace.
50T post-impl results (Vivado 2025.2):
DSPs: 80 → 70 / 120 (66.7% → 58.3%, freed 10)
LUTs: 22114 / 32600 (67.8%)
BRAM: 55.5 / 75 (74.0%, unchanged)
adc_dco_p WNS: +0.022 ns → +0.906 ns (margin improved)
All clocks meet timing, 0 failing endpoints.
Local regression: 32/34 PASS — same as baseline; the two failures
(Receiver Integration, Matched Filter Chain) are pre-existing
RX-NEW-3 (FFT throughput) and unaffected by this change. Bit-exactness
verified through the DDC chain (NCO→CIC→FIR) and in MF cosim.
Cumulative DSP savings today: 112 → 70 (freed 42), leaving 50 free:
enough headroom for the Xilinx LogiCORE FFT Pipelined Streaming swap
(~33 DSPs for the 3-instance matched-filter chain) with 17 DSPs to spare.
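Here is the DSP budget arithmetic written out in Python. The dictionary labels are hypothetical. The 33-DSP figure is the commit's own estimate for the FFT swap, not a synthesis result.

```python
TOTAL_DSP48E1 = 120            # XC7A50T
USED = {"before FIR fold": 112, "after FIR fold": 80, "after CIC comb": 70}
FFT_SWAP_ESTIMATE = 33         # ~33 DSPs for the 3-instance MF chain (estimate)


def utilisation_pct(used: int) -> float:
    """DSP utilisation as a percentage, rounded like the Vivado report."""
    return round(100 * used / TOTAL_DSP48E1, 1)


freed = USED["before FIR fold"] - USED["after CIC comb"]
free_now = TOTAL_DSP48E1 - USED["after CIC comb"]
spare_after_fft = free_now - FFT_SWAP_ESTIMATE
```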
Re-group the 32-tap symmetric lowpass into 16 (D+A)*B operations using
the DSP48E1 pre-adder, exploiting coeff[k] == coeff[31-k]. Production
silicon (XC7A50T) drops from 112/120 DSPs (93.3%) to 80/120 (66.7%),
freeing the budget needed for the matched-filter FFT swap (RX-NEW-3).
Bit-exact contract preserved at non-saturating signal levels: DC=5000
→ 8847 and 45 MHz tone → ±16 LSB match the unfolded design and the
Python golden model. Throughput unchanged (1 sample/cycle, 100 MSPS);
latency +2 cycles for the pre-adder stage.
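The fold works because a symmetric FIR multiplies x[k] and x[31-k] by the same coefficient, so the two samples can be added first. That is the (D+A)*B shape of the DSP48E1 pre-adder. The Python sketch below uses random symmetric integer coefficients as stand-ins for the real lowpass. In unbounded integer arithmetic the two forms are exactly equal. In hardware, the pre-adder sum of two N-bit samples needs N+1 bits, which is where the non-saturating caveat applies.

```python
import random


def fir_direct(window, coeffs):
    """Reference dot product: one multiply per tap (32 for this filter)."""
    return sum(c * x for c, x in zip(coeffs, window))


def fir_folded(window, coeffs):
    """Symmetric fold: add x[k] + x[N-1-k] first (the (D + A) step), then
    multiply by the shared coefficient (the * B step). N/2 multiplies."""
    n = len(coeffs)
    assert n % 2 == 0 and all(coeffs[k] == coeffs[n - 1 - k] for k in range(n))
    return sum(coeffs[k] * (window[k] + window[n - 1 - k]) for k in range(n // 2))


rng = random.Random(0)
half = [rng.randint(-2048, 2047) for _ in range(16)]    # stand-in coefficients
coeffs = half + half[::-1]                              # coeff[k] == coeff[31 - k]
window = [rng.randint(-32768, 32767) for _ in range(32)]
```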
Saturation thresholds rebuilt via bit concatenation to dodge the
Verilog 32-bit-literal trap (1 <<< 34 silently wraps to 0, which
made the earlier symmetric draft assert positive saturation on all
non-negative accumulator values).
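The failure mode can be reproduced by emulating 32-bit wraparound in Python. This is an emulation, not a Verilog simulation. The draft's threshold is assumed, hypothetically, to have the form `(1 <<< 34) - 1` evaluated in 32 bits. The fixed value is assumed to be 2**34 - 1, built as `{2'b00, {34{1'b1}}}`. Both forms are illustrative guesses, not the committed RTL.

```python
def wrap32_signed(value: int) -> int:
    """Keep only the low 32 bits and read them back as a signed integer."""
    value &= 0xFFFF_FFFF
    return value - (1 << 32) if value & 0x8000_0000 else value


shifted = wrap32_signed(1 << 34)            # bit 34 falls off a 32-bit word: 0
posmax_buggy = wrap32_signed(shifted - 1)   # hypothetical draft form: 0 - 1 = -1
posmax_fixed = int("00" + "1" * 34, 2)      # like {2'b00, {34{1'b1}}}: 2**34 - 1


def saturates_high(acc: int, threshold: int) -> bool:
    """Positive-saturation test against a threshold."""
    return acc > threshold
```

With the wrapped threshold at -1, every non-negative accumulator value reads as saturated. The concatenated constant never passes through a 32-bit intermediate.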
Local regression: 32/34 PASS — same as baseline; the two failures
(Receiver Integration, Matched Filter Chain) are pre-existing
RX-NEW-3 (FFT throughput) and unaffected by this change.
The header had two claims that "valid samples arrive every ~4 cycles" at
the FIR boundary. That is false in the production wiring: the CIC `_4x`
decimator turns clk_400m into a 100 M-pulse-per-second stream, then
cdc_adc_to_processing crosses that into clk_100m where dst_valid asserts
every cycle in steady state. The 4:1 ratio applies between the two clock
domains, not as further sub-sampling inside clk_100m.
This matters because the 32-tap coefficients were designed for the
25 MSPS rate that the incorrect comment described, but the FIR is actually being
driven at 100 MSPS. The cutoff sits 4x higher than intended; existing
tests pass because the 36-bit accumulator silently wraps on large
sustained inputs (see RX-NEW-3 in the project ledger).
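The rate argument comes down to two ratios. The Python sketch below takes the clock frequencies from the signal names (clk_400m, clk_100m). It also uses a hypothetical 5 MHz designed cutoff, because the actual coefficient specification is not given here.

```python
ADC_CLK_HZ = 400e6       # clk_400m (frequency taken from the signal name)
CIC_DECIMATION = 4       # the CIC "_4x" decimator
PROC_CLK_HZ = 100e6      # clk_100m (frequency taken from the signal name)

# Output rate of the CIC, i.e. the sample rate the FIR actually sees.
fir_input_rate_hz = ADC_CLK_HZ / CIC_DECIMATION
# clk_100m cycles per valid sample: 1.0 means dst_valid every cycle.
cycles_per_valid = PROC_CLK_HZ / fir_input_rate_hz


def effective_cutoff_hz(designed_cutoff_hz, design_fs_hz, actual_fs_hz):
    """FIR coefficients fix the cutoff as a fraction of fs, so it scales with fs."""
    return designed_cutoff_hz * actual_fs_hz / design_fs_hz


# Hypothetical 5 MHz cutoff designed for 25 MSPS, driven at 100 MSPS instead.
shifted_cutoff_hz = effective_cutoff_hz(5e6, 25e6, fir_input_rate_hz)
```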
Comment-only commit. No RTL behaviour change. Any future DSP-saving
rework — symmetric pre-adder, 4:1 fold, Xilinx FIR Compiler — needs a
designer call on whether to redesign coefficients for 100 MSPS, add a
real decimation stage to hit 25 MSPS, or keep the current accidental
behaviour.