Add FPGA board cards, DMA+PCIe simulation, Wishbone bus wrapper, timing closure #46

Open
devin-ai-integration[bot] wants to merge 4 commits into master from devin/1776115159-fpga-board-cards

@devin-ai-integration devin-ai-integration Bot commented Apr 13, 2026

Summary

Introduces FPGA board "cards" — JSON files under fpga_cards/ that capture all board-level parameters (fabric resources, BRAM geometry, PCIe bandwidth, power draw, synthesis flags) sourced from vendor datasheets. The benchmark, report, and comparison pipelines now read from an FPGACard dataclass instead of using hardcoded constants and the PCIeModel class.

Additionally adds:

  • Cycle-accurate PCIe DMA simulation — throttles word transfers to match the card's practical PCIe bandwidth
  • Timing closure flagging: synthesis_stats() reports whether nextpnr met the target Fmax
  • Synthesizable Wishbone B4 slave wrapper — an Amaranth Elaboratable that memory-maps TopModule over a standard Wishbone bus, modelling the LiteX SoC integration path
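
The bandwidth throttling in the first bullet comes down to a small amount of arithmetic. The sketch below is not the code in compiler/pcie_dma.py; the function names, the 4-byte word size, and the example clock frequencies are assumptions made for illustration. The 200 MB/s and 2.5 µs figures are the ECP5 card values quoted in this PR, and 1 MB is taken as 10^6 bytes.

```python
import math


def cycles_per_word(clock_mhz: float, bandwidth_mb_s: float, word_bytes: int = 4) -> int:
    """Cycles to spend on each word so throughput stays at or below the link bandwidth."""
    # MB/s divided by MHz gives bytes that the link can move per clock cycle.
    bytes_per_cycle = bandwidth_mb_s / clock_mhz
    return max(1, math.ceil(word_bytes / bytes_per_cycle))


def dma_setup_cycles(clock_mhz: float, latency_us: float) -> int:
    """DMA setup latency expressed in clock cycles (microseconds times MHz)."""
    return math.ceil(latency_us * clock_mhz)


def transfer_cycles(n_words: int, clock_mhz: float, bandwidth_mb_s: float,
                    latency_us: float, word_bytes: int = 4) -> int:
    """Total cycles for one direction: setup latency plus throttled word transfers."""
    per_word = cycles_per_word(clock_mhz, bandwidth_mb_s, word_bytes)
    return dma_setup_cycles(clock_mhz, latency_us) + n_words * per_word
```

Under these assumptions, a design clocked at 100 MHz would spend 2 cycles per 32-bit word on a 200 MB/s link, plus 250 cycles of setup latency per direction.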

New files

  • fpga_cards/lattice_ecp5_45k_cabga381.json — first card, all values from Lattice ECP5 Family Data Sheet (FPGA-DS-02012-3.4) and related app notes
  • tg2hdl/fpga_card.py — frozen FPGACard dataclass, load_card() / load_card_from_path() / list_cards() helpers, and derived methods (pcie_xfer_s, bram_blocks_for_bits, etc.)
  • compiler/pcie_dma.py: simulate_top_with_pcie() runs a second Amaranth simulation that throttles input loading and output readback to match the card's PCIe bandwidth, with per-direction DMA setup latency
  • compiler/wishbone_wrapper.py: WishboneTopWrapper Elaboratable that wraps TopModule as a Wishbone B4 slave with memory-mapped registers: CTRL (0x0000), STATUS (0x0004), CYCLE_CNT (0x0008), input buffers (0x1000+), output buffers (0x8000+). simulate_wishbone() drives TopModule through the bus interface with cycle-accurate 2-cycle Wishbone transactions (strobe + ack)
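
To make the card concept concrete, here is a minimal sketch of a frozen dataclass loaded from JSON with two derived methods. It is not the contents of tg2hdl/fpga_card.py: the class name FPGACardSketch, the field names, the JSON keys, and the method signatures are all hypothetical, and only the method names pcie_xfer_s and bram_blocks_for_bits are taken from the description above. The 18,432-bit block size assumes an 18 Kbit DP16KD block.

```python
import json
import math
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class FPGACardSketch:
    # Hypothetical field names; the real FPGACard schema may differ.
    name: str
    bram_block_bits: int
    bram_blocks: int
    pcie_practical_mb_s: float
    pcie_dma_latency_us: float

    def bram_blocks_for_bits(self, bits: int) -> int:
        """Number of whole BRAM blocks needed to hold `bits` bits."""
        return math.ceil(bits / self.bram_block_bits)

    def pcie_xfer_s(self, n_bytes: int) -> float:
        """Estimated seconds for one transfer: setup latency plus payload time."""
        payload_s = n_bytes / (self.pcie_practical_mb_s * 1e6)
        return self.pcie_dma_latency_us * 1e-6 + payload_s


def load_card_from_text(text: str) -> FPGACardSketch:
    """Build a card from JSON text; unknown or missing keys raise TypeError."""
    return FPGACardSketch(**json.loads(text))


EXAMPLE_JSON = """
{"name": "lattice_ecp5_45k_cabga381",
 "bram_block_bits": 18432,
 "bram_blocks": 108,
 "pcie_practical_mb_s": 200.0,
 "pcie_dma_latency_us": 2.5}
"""

card = load_card_from_text(EXAMPLE_JSON)
```

Freezing the dataclass means a card cannot be mutated after loading, so every pipeline stage sees the same board parameters.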

Modified files

  • tg2hdl/report.py — removes PCIeModel class and FPGA_FAMILY/FPGA_DEVICE/FPGA_PACKAGE constants; benchmark() now accepts card: FPGACard and runs three simulations (ideal, DMA-throttled, Wishbone-wrapped); HTML report includes "DMA + PCIe" and "Wishbone Bus Simulation" timing tables; BenchmarkArtifact gains dma_* and wb_* fields; estimates section generated from _estimates_for_card(card) with Wishbone methodology documented
  • benchmark.py — clock speeds, power values, and scaling estimates read from card instead of a hardcoded multi-FPGA table
  • compare_inference.py — replaces Xilinx-specific _RAMB36_BITS constant with card-derived BRAM block size; synthesis calls pass card=card; display label uses card.synth_toolchain
  • compiler/utils.py: synthesis_stats() accepts optional card: FPGACard; returns new target_mhz and timing_met fields for timing closure flagging; reads nextpnr_binary name from card instead of hardcoding "nextpnr-ecp5"
  • compiler/__init__.py — exports simulate_top_with_pcie, WishboneTopWrapper, simulate_wishbone
  • tg2hdl/__init__.py — exports FPGACard, load_card, load_card_from_path, list_cards; drops PCIeModel export

All FIXME comments related to estimated FPGA values have been removed or replaced with notes referencing the card datasheet source.
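
The target_mhz and timing_met fields added to synthesis_stats() can be pictured with the sketch below. It is an illustration, not the code in compiler/utils.py: the function name timing_closure, the fmax_mhz key, the optional user_target_mhz override, and the use of >= for the comparison are assumptions. The default target follows the PR description, which compares achieved Fmax against the card's typical Fmax.

```python
from typing import Optional


def timing_closure(achieved_fmax_mhz: float,
                   card_typical_fmax_mhz: float,
                   user_target_mhz: Optional[float] = None) -> dict:
    """Return the timing-closure fields for one place-and-route run."""
    # Fall back to the card's typical Fmax when no explicit target is given.
    if user_target_mhz is not None:
        target = user_target_mhz
    else:
        target = card_typical_fmax_mhz
    return {
        "fmax_mhz": achieved_fmax_mhz,
        "target_mhz": target,
        "timing_met": achieved_fmax_mhz >= target,
    }
```

For example, timing_closure(62.5, 50.0) would report timing as met, while timing_closure(62.5, 50.0, user_target_mhz=100.0) would not.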

Review & Testing Checklist for Human

  • Wishbone wrapper correctness: WishboneTopWrapper is a ~300-line Elaboratable that has not been synthesized or tested end-to-end. The FSM, address decoding, and buffer mapping logic should be reviewed carefully. In particular verify that the input buffer sequential layout (0x1000 + offset) and output buffer region (0x8000+) correctly map to TopModule's ext_write_ports and output_rport.
  • Triple simulation cost: benchmark() now runs three full Amaranth simulations (ideal + DMA-throttled + Wishbone-wrapped). There is no flag to skip any of them. Verify this runtime is acceptable or consider adding opt-out parameters.
  • simulate_wishbone() output correctness: The Wishbone simulation should produce identical numerical results to the ideal simulation. Run benchmark() and verify that the Wishbone output buffer contents match the ideal simulation output.
  • Breaking API: benchmark() signature changed from pcie: PCIeModel to card: FPGACard, and PCIeModel is deleted. Verify no external callers or notebooks use the old pcie= kwarg or import PCIeModel.
  • timing_met semantics: synthesis_stats() compares achieved Fmax against card.synth_typical_fmax_mhz (the card's typical achievable frequency, not a user-specified constraint). Decide if this is the right target or if it should be a separate user-configurable parameter.
  • Report HTML output: Run benchmark() end-to-end and open the generated index.html to verify the three timing tables (ideal, DMA, Wishbone) all render correctly with card-derived values.
  • Datasheet accuracy: Spot-check key ECP5 45K specs in the JSON (44,184 LUTs, 108 DP16KD blocks, PCIe Gen1 x1, 2.5 GT/s, 200 MB/s practical BW, 2.5 µs DMA latency, 0.4 W typical power) against Lattice ECP5 datasheet.
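
As an aid for reviewing the address decoding item above, the register map can be written as a small software reference model. This is a sketch under stated assumptions, not the Amaranth logic in WishboneTopWrapper: it assumes byte addressing with 32-bit words, that the input region ends where the output region begins, and that input buffers are packed back to back. The function names are hypothetical; only the base addresses come from this PR.

```python
from typing import List, Tuple

WORD_BYTES = 4  # assumption: 32-bit data words, byte addresses
CTRL_ADDR, STATUS_ADDR, CYCLE_CNT_ADDR = 0x0000, 0x0004, 0x0008
INPUT_BASE, OUTPUT_BASE = 0x1000, 0x8000


def decode(addr: int) -> Tuple[str, int]:
    """Map a byte address to (region name, word offset within that region)."""
    if addr < 0 or addr % WORD_BYTES:
        raise ValueError(f"unaligned or negative address {addr:#x}")
    registers = {CTRL_ADDR: "CTRL", STATUS_ADDR: "STATUS", CYCLE_CNT_ADDR: "CYCLE_CNT"}
    if addr in registers:
        return registers[addr], 0
    if addr >= OUTPUT_BASE:
        return "OUTPUT", (addr - OUTPUT_BASE) // WORD_BYTES
    if addr >= INPUT_BASE:
        return "INPUT", (addr - INPUT_BASE) // WORD_BYTES
    raise ValueError(f"unmapped address {addr:#06x}")


def input_buffer_bases(depths_words: List[int]) -> List[int]:
    """Base byte address of each input buffer when packed sequentially from INPUT_BASE."""
    bases, addr = [], INPUT_BASE
    for depth in depths_words:
        bases.append(addr)
        addr += depth * WORD_BYTES
    if addr > OUTPUT_BASE:
        raise ValueError("input buffers overflow into the output region")
    return bases
```

A model like this could be compared against the wrapper in simulation: each bus write to an address should land in the buffer and offset the model predicts.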

Notes

  • compiler/backend.py has an unrelated FIXME about the analytical cycle model, and benchmark.py retains a FIXME about GPU latency estimates — both are intentionally left as-is since they aren't FPGA board-level estimates.
  • No unit tests were added for FPGACard, load_card(), simulate_top_with_pcie(), WishboneTopWrapper, or simulate_wishbone(). The JSON schema is validated only by the dataclass constructor at load time.
  • synthesis_stats() retains its original device="45k" / package="CABGA381" default arguments for backward compatibility, but all callers in this PR now pass card=card instead.
  • The "mapped" stage in show_hardware() still uses a hardcoded synth_ecp5 Yosys pass — not updated to use the card's yosys_target.
  • synthesis_stats() still returns ECP5-specific key names (dp16kd, mult18, comb, ff). Future non-ECP5 cards may need generic key names.
  • The Wishbone wrapper is designed to plug directly into a LiteX SoC as a Wishbone slave when an ECP5 board is available — no rewrite needed for that transition.
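
The note about ECP5-specific key names suggests a translation layer for future cards. One possible shape is sketched below; the generic names (lut, bram, dsp), the mapping table, and the function normalize_stats are hypothetical and not part of this PR. Only the ECP5 key names dp16kd, mult18, comb, and ff come from the note above.

```python
# Hypothetical mapping from ECP5-specific stat keys to vendor-neutral names.
ECP5_TO_GENERIC = {
    "comb": "lut",     # combinational logic cells
    "ff": "ff",        # flip-flops
    "dp16kd": "bram",  # block RAM primitives
    "mult18": "dsp",   # 18-bit multipliers
}


def normalize_stats(stats: dict, key_map: dict = ECP5_TO_GENERIC) -> dict:
    """Rename known resource keys; leave all other keys untouched."""
    return {key_map.get(key, key): value for key, value in stats.items()}
```

A card could carry its own key_map, so that callers of synthesis_stats() read the same generic keys regardless of the target family.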

Link to Devin session: https://app.devin.ai/sessions/c6cba5b3aae14406a970cb500224341c
Requested by: @Ferryistaken


…imates

- Add fpga_cards/ directory with Lattice ECP5 45K-CABGA381 card JSON
  (all values from official Lattice datasheets)
- Add tg2hdl/fpga_card.py: FPGACard dataclass, load_card(), list_cards()
- Remove PCIeModel class and hardcoded FPGA constants from report.py
- Update benchmark() to accept FPGACard instead of PCIeModel
- Update benchmark.py to read clock/power from FPGA card
- Update compare_inference.py to use card for synthesis params and BRAM
- Replace Xilinx RAMB36 constant with card-derived BRAM block size
- Remove all FIXME comments related to estimated FPGA values
- Export FPGACard and helpers from tg2hdl.__init__

Co-Authored-By: Alessandro Ferrari <alessandro.ferrari.2004@gmail.com>
@devin-ai-integration

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

@netlify

netlify Bot commented Apr 13, 2026

Deploy Preview for tg2hdl ready!

Name Link
🔨 Latest commit a843cd5
🔍 Latest deploy log https://app.netlify.com/projects/tg2hdl/deploys/69dd64bf616dbd00084803c3
😎 Deploy Preview https://deploy-preview-46--tg2hdl.netlify.app

- Add optional card: FPGACard parameter to synthesis_stats()
- When card is provided, device, package, yosys_target, resource types,
  and fpga_family are all read from the card
- Update callers in report.py and compare_inference.py to pass card=card
  instead of device=/package= kwargs
- Backward-compatible: existing callers without a card still work via
  the original device/package string defaults

Co-Authored-By: Alessandro Ferrari <alessandro.ferrari.2004@gmail.com>
devin-ai-integration[bot]

This comment was marked as resolved.

…_binary to card

- compiler/pcie_dma.py: New module with simulate_top_with_pcie() that
  throttles input/output transfers to match card PCIe bandwidth and
  adds per-direction DMA setup latency cycles
- compiler/utils.py: synthesis_stats() now returns target_mhz and
  timing_met fields for timing closure flagging; reads nextpnr binary
  name from card instead of hardcoding
- tg2hdl/report.py: benchmark() runs both ideal and DMA-aware sims;
  HTML report shows DMA timing breakdown table; estimates section
  documents the DMA simulation methodology
- fpga_cards/lattice_ecp5_45k_cabga381.json: Added nextpnr_binary field
- tg2hdl/fpga_card.py: Added synth_nextpnr_binary and synth_toolchain
- compare_inference.py: Uses card.synth_toolchain for display label

Co-Authored-By: Alessandro Ferrari <alessandro.ferrari.2004@gmail.com>
devin-ai-integration Bot changed the title from "Add FPGA board cards as first-class JSON specs, replace hardcoded estimates" to "Add FPGA board cards, DMA+PCIe cycle-accurate simulation, timing closure flagging" on Apr 13, 2026
- compiler/wishbone_wrapper.py: WishboneTopWrapper Elaboratable that
  memory-maps TopModule's input/output buffers and control registers
  over a standard Wishbone B4 bus. Register map: CTRL (0x0000),
  STATUS (0x0004), CYCLE_CNT (0x0008), input region (0x1000+),
  output region (0x8000+). simulate_wishbone() drives TopModule
  through the bus interface with cycle-accurate bus transactions.
- compiler/__init__.py: Export WishboneTopWrapper and simulate_wishbone
- compiler/pcie_dma.py: Remove unused amaranth.hdl imports
- tg2hdl/report.py: benchmark() runs Wishbone simulation alongside
  ideal and DMA sims; HTML report adds Wishbone timing breakdown
  table; BenchmarkArtifact gains wb_* fields; estimates section
  documents Wishbone simulation methodology

Co-Authored-By: Alessandro Ferrari <alessandro.ferrari.2004@gmail.com>
devin-ai-integration Bot changed the title from "Add FPGA board cards, DMA+PCIe cycle-accurate simulation, timing closure flagging" to "Add FPGA board cards, DMA+PCIe simulation, Wishbone bus wrapper, timing closure" on Apr 13, 2026