From 977434a5f6ebeaaf816e4e9a4fed9983ef0ee8a7 Mon Sep 17 00:00:00 2001
From: Jason <83615043+JJassonn69@users.noreply.github.com>
Date: Thu, 23 Apr 2026 09:26:23 +0545
Subject: [PATCH] docs(fpga): correct fir_lowpass.v rate comment + flag
 rate/coeff mismatch
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The header had two claims that "valid samples arrive every ~4 cycles" at
the FIR boundary. That is false in the production wiring: the CIC `_4x`
decimator turns clk_400m into a 100 M-pulse-per-second stream, then
cdc_adc_to_processing crosses that into clk_100m where dst_valid asserts
every cycle in steady state. The 4:1 ratio applies between the two clock
domains, not as further sub-sampling inside clk_100m.

This matters because the 32-tap coefficients were designed for the
25 MSPS rate the wrong comment described, but the FIR is actually being
driven at 100 MSPS. The cutoff sits 4x higher than intended; existing
tests pass because the 36-bit accumulator silently wraps on large
sustained inputs (see RX-NEW-3 in the project ledger).

Comment-only commit. No RTL behaviour change. Any future DSP-saving
rework — symmetric pre-adder, 4:1 fold, Xilinx FIR Compiler — needs a
designer call on whether to redesign coefficients for 100 MSPS, add a
real decimation stage to hit 25 MSPS, or keep the current accidental
behaviour.
---
 9_Firmware/9_2_FPGA/fir_lowpass.v | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/9_Firmware/9_2_FPGA/fir_lowpass.v b/9_Firmware/9_2_FPGA/fir_lowpass.v
index 68d169d..a74acbf 100644
--- a/9_Firmware/9_2_FPGA/fir_lowpass.v
+++ b/9_Firmware/9_2_FPGA/fir_lowpass.v
@@ -40,8 +40,23 @@ parameter ACCUM_WIDTH = 36;
 // Total latency: 9 cycles from data_valid to data_out_valid
 // (was 7 before BREG+MREG addition — +2 cycles for DSP48 pipelining)
 // Throughput: 1 sample per cycle (fully pipelined)
-// FIR runs at 100 MHz on data decimated 4:1 from 400 MHz — valid samples
-// arrive every ~4 cycles, so the 9-cycle latency is transparent.
+//
+// Input rate: 100 MSPS at clk_100m (data_valid asserted EVERY cycle in
+// steady state). The CIC `_4x` decimator drops 4:1 from clk_400m → 100 M
+// pulses/s, then the cdc_adc_to_processing CDC into clk_100m emits one
+// dst_valid per dst_clk cycle (Gray-toggle handshake at matching rate).
+// A previous comment here claimed "samples arrive every ~4 cycles"; that
+// was wrong — the 4:1 ratio applies between clk_400m and clk_100m, not as
+// further sub-sampling within clk_100m.
+//
+// Implication: this 32-tap FIR's cutoff was designed for a 25 MSPS rate
+// (the post-decimation rate that "every 4 cycles" would imply). Running
+// at 100 MSPS shifts the effective cutoff 4× above that. The system's
+// existing tests pass because the 36-bit accumulator silently wraps on
+// large sustained inputs (giving lowpass-like behaviour by accident) —
+// see the build report and the open RX-NEW-3 design question. Any future
+// DSP-saving rework of this module needs a designer call on the
+// rate-vs-coefficient mismatch.
 // ============================================================================
 
 // Filter coefficients (symmetric: coeff[k] == coeff[31-k])
@@ -102,8 +117,8 @@ end
 // eliminating 68 DPIP-1 + 35 DPOP-2 warnings and improving timing.
 //
 // Pipeline impact: +2 cycles latency (BREG + MREG). Total FIR latency
-// goes from 7 to 9 cycles. Still transparent since FIR input arrives
-// every ~4 clocks (100 MHz / 4:1 CIC decimation).
+// goes from 7 to 9 cycles. Transparent relative to downstream processing
+// (Doppler/MTI operate on accumulated frames, not per-sample).
 // ============================================================================
 
 // Registered coefficients (BREG — absorbed into DSP48E1 B-port register)