Summary
An evaluation of the Akita DRAM model (`akita/v4/mem/dram/`) was conducted to assess its suitability for modeling HBM3 memory as used on the AMD MI300A. While the model has a solid architectural foundation (bank state machines, 4-level timing tables, transaction splitting), several critical features are missing or incorrectly modeled for HBM3.
Akita DRAM Model Architecture (Brief)
The model follows a standard academic DRAM controller pipeline:
```
Request → Transaction → SubTransSplitter → SubTransactionQueue (FCFS)
  → CommandCreator (close-page) → CommandQueue (per-rank, round-robin)
  → Channel (Banks[Rank][BankGroup][Bank]) → Bank state machine → Response
```
Supported protocols: DDR3, DDR4, GDDR5, GDDR5X, GDDR6, LPDDR, LPDDR3, LPDDR4, HBM, HBM2, HMC.
Key HBM3 Features Missing or Incorrectly Modeled
1. No HBM3 Protocol Constant (only HBM/HBM2)
The model has no `HBM3` protocol. HBM3 has significant differences from HBM/HBM2:
- Higher data rate (up to 6.4 Gbps/pin; MI300A uses 5.2 Gbps/pin)
- Independent pseudo-channels as a first-class concept
- Different timing parameters and refresh schemes (per-bank refresh is default)
- In-line ECC
Recommendation: Add an `HBM3` protocol constant with appropriate protocol-specific timing behaviors.
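As a rough illustration of what the change amounts to, the sketch below adds an `HBM3` constant next to the existing protocol list, plus a predicate for gating HBM-family behavior. The `Protocol` type, its integer representation, and the `isHBMFamily` helper are assumptions for illustration, not the package's actual declarations.

```go
// Illustrative sketch only: the Protocol type, its integer representation,
// and the isHBMFamily helper are assumptions, not the real dram package API.
package dram

type Protocol int

const (
	DDR3 Protocol = iota
	DDR4
	GDDR5
	GDDR5X
	GDDR6
	LPDDR
	LPDDR3
	LPDDR4
	HBM
	HBM2
	HMC
	HBM3 // proposed addition
)

// isHBMFamily lets HBM-specific timing rules (tPPD, per-bank refresh,
// pseudo-channel handling) key off one predicate instead of listing
// protocols at every call site.
func (p Protocol) isHBMFamily() bool {
	return p == HBM || p == HBM2 || p == HBM3
}
```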
2. No Pseudo-Channel Modeling
HBM3 splits each 64-bit channel into two independent 32-bit pseudo-channels, each with its own bank groups, banks, and row buffers. The Akita model has no concept of pseudo-channels — it treats the channel as a monolithic unit.
Impact: Two requests to different pseudo-channels should proceed independently, but in the model they share timing constraints.
Recommendation: Add pseudo-channel support as a first-class feature in the channel hierarchy, or model each pseudo-channel as a separate DRAM controller instance.
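One low-effort way to get pseudo-channel independence without restructuring the channel hierarchy is the second option above: instantiate one controller per pseudo-channel and steer requests between them by an address bit. The sketch below is a self-contained illustration of that routing; the `Controller` stand-in, the `PseudoChannelRouter` type, and the selection bit are hypothetical and would need to match the real address mapping.

```go
// Hypothetical sketch of the "one controller instance per pseudo-channel"
// approach. Controller stands in for a DRAM controller; PseudoChannelRouter
// and the address bit used for selection are illustrative assumptions.
package main

import "fmt"

type Controller struct{ name string }

func (c *Controller) Handle(addr uint64) {
	fmt.Printf("%s handles 0x%x\n", c.name, addr)
}

// PseudoChannelRouter steers each request to one of the two independent
// pseudo-channels based on a single address bit, so their timing is
// tracked by separate controller instances.
type PseudoChannelRouter struct {
	pc        [2]*Controller
	selectBit uint // address bit that picks the pseudo-channel (assumed)
}

func (r *PseudoChannelRouter) Route(addr uint64) {
	idx := (addr >> r.selectBit) & 1
	r.pc[idx].Handle(addr)
}

func main() {
	r := &PseudoChannelRouter{
		pc:        [2]*Controller{{name: "PC0"}, {name: "PC1"}},
		selectBit: 6, // example: interleave pseudo-channels at 64 B granularity
	}
	r.Route(0x1000) // bit 6 clear -> PC0
	r.Route(0x1040) // bit 6 set   -> PC1
}
```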
3. Single Command Per Cycle Bottleneck
The `issue()` function issues at most ONE command per tick, and `subTransactionQueue.Tick()` also processes at most ONE sub-transaction per tick. Real HBM3 can issue multiple independent commands per cycle (one per pseudo-channel, with multiple banks active simultaneously).
Recommendation: Allow multiple commands to be issued per tick when timing constraints allow. At minimum, one command per pseudo-channel.
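A minimal sketch of the relaxed issue loop is below: instead of stopping after the first command, the tick walks one queue per pseudo-channel and issues at most one command from each. The queue and command types are simplified stand-ins, and the timing-table checks are elided.

```go
// Sketch of relaxing the one-command-per-tick limit: issue up to one command
// per pseudo-channel each tick. The queue/command types are simplified
// stand-ins for the package's real command queue structures.
package main

import "fmt"

type Command struct{ desc string }

type commandQueue struct {
	pending []Command
}

// issueOne pops and "issues" the first pending command. Timing checks are
// elided here; the real model would consult the timing tables first.
func (q *commandQueue) issueOne() bool {
	if len(q.pending) == 0 {
		return false
	}
	cmd := q.pending[0]
	q.pending = q.pending[1:]
	fmt.Println("issued:", cmd.desc)
	return true
}

// tick issues at most one command per pseudo-channel rather than one per
// controller, letting the pseudo-channels make progress independently.
func tick(queues []*commandQueue) (madeProgress bool) {
	for _, q := range queues {
		if q.issueOne() {
			madeProgress = true
		}
	}
	return madeProgress
}

func main() {
	pc0 := &commandQueue{pending: []Command{{"PC0: ACT bank 3"}}}
	pc1 := &commandQueue{pending: []Command{{"PC1: RD bank 7"}}}
	tick([]*commandQueue{pc0, pc1}) // both commands issue in the same tick
}
```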
4. Close-Page Policy Only (No Open-Page)
The model hardcodes `ClosePageCommandCreator`, always generating `ReadPrecharge`/`WritePrecharge` commands. Real HBM3 controllers use open-page or adaptive policies to exploit row buffer locality, which is critical for GPU streaming workloads. The bank state machine already supports open row tracking — only a proper command creator is needed.
Recommendation: Implement an `OpenPageCommandCreator` or `AdaptiveCommandCreator`.
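Since the bank state machine already tracks the open row, the missing piece is roughly the decision logic sketched below: on a row hit emit a plain Read/Write, on a miss precharge (if needed) and activate, and leave the row open afterwards. The `Bank` and `SubTrans` types are simplified stand-ins, not the dram package's real interfaces.

```go
// Sketch of an open-page command creator's core decision. Bank and SubTrans
// are simplified stand-ins; the real creator would implement the same
// interface as ClosePageCommandCreator.
package main

import "fmt"

type Bank struct {
	rowOpen bool
	openRow uint64
}

type SubTrans struct {
	row   uint64
	write bool
}

// createCommands issues a plain Read/Write on a row hit and falls back to
// Precharge + Activate only on a row miss, leaving the row open afterwards.
func createCommands(b *Bank, t SubTrans) []string {
	var cmds []string
	if !b.rowOpen || b.openRow != t.row {
		if b.rowOpen {
			cmds = append(cmds, "Precharge")
		}
		cmds = append(cmds, fmt.Sprintf("Activate row %d", t.row))
		b.rowOpen, b.openRow = true, t.row
	}
	if t.write {
		cmds = append(cmds, "Write") // no auto-precharge: keep the row open
	} else {
		cmds = append(cmds, "Read")
	}
	return cmds
}

func main() {
	b := &Bank{}
	fmt.Println(createCommands(b, SubTrans{row: 12})) // miss: Activate + Read
	fmt.Println(createCommands(b, SubTrans{row: 12})) // hit: Read only
}
```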
5. No Per-Bank Refresh
The model defines refresh commands but no component actually generates them. No refresh controller or scheduler exists. In real HBM3, per-bank refresh (REFpb) is the default mode. Refresh interference can reduce effective bandwidth by 5-15%.
Recommendation: Implement a refresh scheduler. For HBM3, per-bank refresh should be the default mode.
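A refresh scheduler can start as simple as the counter-based sketch below, which walks the banks round-robin and emits a REFpb whenever the per-bank interval elapses. The interval value, the tick interface, and the command string are illustrative assumptions; a real implementation would also block the target bank for tRFCpb and arbitrate against pending commands.

```go
// Sketch of a per-bank refresh scheduler. Values and interfaces are
// illustrative; derive the real interval from tREFI and the bank count.
package main

import "fmt"

type refreshScheduler struct {
	numBanks    int
	tREFIpb     uint64 // per-bank refresh interval in controller cycles (example only)
	cycle       uint64
	nextBank    int    // banks are refreshed round-robin
	nextRefresh uint64 // cycle at which the next REFpb is due
}

// tick emits a REFpb command for the next bank whenever the per-bank
// refresh interval has elapsed.
func (r *refreshScheduler) tick() (cmd string, ok bool) {
	r.cycle++
	if r.cycle < r.nextRefresh {
		return "", false
	}
	cmd = fmt.Sprintf("REFpb bank %d", r.nextBank)
	r.nextBank = (r.nextBank + 1) % r.numBanks
	r.nextRefresh += r.tREFIpb
	return cmd, true
}

func main() {
	s := &refreshScheduler{numBanks: 32, tREFIpb: 250, nextRefresh: 250}
	for i := 0; i < 600; i++ {
		if cmd, ok := s.tick(); ok {
			fmt.Println("cycle", s.cycle, "->", cmd)
		}
	}
}
```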
6. Missing tPPD for HBM
The `tPPD` (precharge-to-precharge delay) is only applied for GDDR and LPDDR4 protocols in the timing generation code, not for HBM. HBM3 requires `tPPD`. This appears to be a bug in the timing table generation.
Recommendation: Enable `tPPD` in the timing tables for HBM/HBM2/HBM3 protocols.
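The fix is essentially to add the HBM cases to whatever protocol check gates tPPD when the timing tables are generated. The sketch below shows the shape of that gate; the function name and constants are illustrative, not the actual code in the package.

```go
// Illustrative gate only: needsTPPD and these constants are stand-ins for
// the real timing-table generation logic that decides to apply tPPD.
package main

import "fmt"

type Protocol int

const (
	GDDR5 Protocol = iota
	GDDR5X
	GDDR6
	LPDDR4
	HBM
	HBM2
	HBM3 // assumes the new constant from the earlier sketch
)

// needsTPPD reports whether the precharge-to-precharge delay should be
// written into the timing tables for a protocol.
func needsTPPD(p Protocol) bool {
	switch p {
	case GDDR5, GDDR5X, GDDR6, LPDDR4: // current behavior
		return true
	case HBM, HBM2, HBM3: // proposed: the HBM family also needs tPPD
		return true
	}
	return false
}

func main() {
	fmt.Println(needsTPPD(HBM2)) // true after the fix
}
```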
Additional Gaps (Lower Priority)
- No bus turnaround delay modeling at the channel level (read↔write switching)
- Command queue is per-rank only — per-bank or per-bank-group queues would improve parallelism
- Address mapping order not configurable through the builder API; no HBM3-optimized defaults
- No power-down state management (states defined but never entered)
Current Workaround
We are using `SimpleBankedMemory` as an interim DRAM model for MI300A timing configuration, with tuned pipeline depth, stage latency, and buffer sizes to approximate the expected bandwidth characteristics. This sidesteps the DRAM model limitations but sacrifices detailed timing accuracy.
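For reference, the knobs being tuned reduce to something like the struct below, together with a crude peak-bandwidth bound used to sanity-check the numbers. The field names and example values are placeholders, not the actual `SimpleBankedMemory` configuration API or the MI300A tuning.

```go
// Placeholder sketch: field names and values are illustrative, not the real
// SimpleBankedMemory API or our MI300A parameters.
package main

import "fmt"

type interimDRAMConfig struct {
	NumBanks       int     // bank-level parallelism
	PipelineDepth  int     // in-flight accesses per bank
	StageLatency   int     // per-stage latency in controller cycles
	BufferSize     int     // request buffer entries
	FreqGHz        float64 // controller clock frequency
	BytesPerAccess int     // data returned per access
}

// peakBandwidthGBs is an upper bound assuming every bank completes one
// access per cycle once its pipeline is full; useful only as a sanity check
// that the tuned parameters can reach the target bandwidth.
func (c interimDRAMConfig) peakBandwidthGBs() float64 {
	return float64(c.NumBanks*c.BytesPerAccess) * c.FreqGHz
}

func main() {
	cfg := interimDRAMConfig{
		NumBanks: 32, PipelineDepth: 8, StageLatency: 2,
		BufferSize: 64, FreqGHz: 1.0, BytesPerAccess: 64,
	}
	fmt.Printf("peak ≈ %.0f GB/s\n", cfg.peakBandwidthGBs())
}
```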
Summary Table
| Feature | Status | HBM3 Need | Severity |
|---------|--------|-----------|----------|
| HBM3 protocol | ❌ Missing | HBM3-specific behavior | 🔴 Critical |
| Pseudo-channels | ❌ Not modeled | Independent 32-bit pseudo-channels | 🔴 Critical |
| Page policy | Close-page only | Open/Close/Adaptive | 🔴 Critical |
| Commands/cycle | 1 (hardcoded) | Multiple per bank/pseudo-ch | 🔴 Critical |
| Refresh | ❌ Not implemented | Per-bank refresh (3.9 μs) | 🔴 Critical |
| tPPD for HBM | ❌ Not applied | Required | 🔴 Critical |
| Bus turnaround | ❌ Not modeled | R/W turnaround penalty | ⚠️ Important |
| Command queue | Per-rank | Per-bank/bank-group | ⚠️ Important |
| Bank state machine | ✅ OK | Open/Closed/SRef | ✅ OK |
| Timing tables | ✅ OK | 4-level hierarchy | ✅ OK |
| Row buffer tracking | ✅ OK | Per-bank open row | ✅ OK |