
Akita DRAM Model Evaluation for HBM3 #252

@syifan

Description


Summary

An evaluation of the Akita DRAM model (akita/v4/mem/dram/) was conducted to assess its suitability for modeling HBM3 memory as used on the AMD MI300A. While the model has a solid architectural foundation (bank state machines, 4-level timing tables, transaction splitting), several critical features are missing or incorrectly modeled for HBM3.

Akita DRAM Model Architecture (Brief)

The model follows a standard academic DRAM controller pipeline:

Request → Transaction → SubTransSplitter → SubTransactionQueue (FCFS)
→ CommandCreator (close-page) → CommandQueue (per-rank, round-robin)
→ Channel (Banks[Rank][BankGroup][Bank]) → Bank state machine → Response

Supported protocols: DDR3, DDR4, GDDR5, GDDR5X, GDDR6, LPDDR, LPDDR3, LPDDR4, HBM, HBM2, HMC.

Key HBM3 Features Missing or Incorrectly Modeled

1. No HBM3 Protocol Constant (only HBM/HBM2)

The model defines no HBM3 protocol constant. HBM3 differs significantly from HBM/HBM2:

  • Higher data rate (up to 6.4 Gbps/pin; MI300A uses 5.2 Gbps/pin)
  • Independent pseudo-channels as a first-class concept
  • Different timing parameters and refresh schemes (per-bank refresh is default)
  • In-line ECC

Recommendation: Add an HBM3 protocol constant with appropriate protocol-specific timing behaviors.
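A minimal sketch of what the addition could look like. The identifiers below are illustrative, not the actual `akita/v4/mem/dram` names:

```go
package main

import "fmt"

// Protocol enumerates supported DRAM standards. The existing model stops at
// HBM2/HMC; HBM3 below is the proposed addition (names are hypothetical).
type Protocol int

const (
	DDR3 Protocol = iota
	DDR4
	GDDR6
	HBM
	HBM2
	HBM3 // proposed: distinct timings, per-bank refresh default, pseudo-channels
)

// isHBMFamily gates HBM-specific timing behavior (e.g. tPPD, pseudo-channels),
// so HBM3 inherits the HBM-family rules while allowing its own overrides.
func isHBMFamily(p Protocol) bool {
	return p == HBM || p == HBM2 || p == HBM3
}

func main() {
	fmt.Println(isHBMFamily(HBM3)) // true: HBM3 shares HBM-family behavior
}
```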

2. No Pseudo-Channel Modeling

HBM3 splits each 128-bit channel into two independent 64-bit pseudo-channels, each with its own command/address bus, bank groups, banks, and independent row buffers. The Akita model has no concept of pseudo-channels — it treats the channel as a monolithic unit.

Impact: Two requests to different pseudo-channels should proceed independently, but in the model they share the channel's timing constraints.

Recommendation: Add pseudo-channel support as a first-class feature in the channel hierarchy, or model each pseudo-channel as a separate DRAM controller instance.
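A sketch of the first option, under the assumption of 64-byte interleaving on address bit 6 (both the types and the interleave bit are illustrative, not Akita's actual layout):

```go
package main

import "fmt"

// Each pseudo-channel tracks its own command-bus availability, so traffic on
// one never delays the other -- the independence the monolithic model loses.
type pseudoChannel struct {
	busyUntil uint64 // cycle at which this PC's command bus frees up
}

type hbm3Channel struct {
	pcs [2]pseudoChannel
}

// pcIndex picks the pseudo-channel from an interleave bit of the address
// (bit 6 here, i.e. 64-byte interleaving -- an assumption for illustration).
func pcIndex(addr uint64) int {
	return int((addr >> 6) & 1)
}

// issue schedules a command on the owning pseudo-channel only and returns
// its start cycle; the sibling pseudo-channel's timing is untouched.
func (c *hbm3Channel) issue(addr, now, busCycles uint64) uint64 {
	pc := &c.pcs[pcIndex(addr)]
	start := now
	if pc.busyUntil > start {
		start = pc.busyUntil
	}
	pc.busyUntil = start + busCycles
	return start
}

func main() {
	ch := &hbm3Channel{}
	// Back-to-back requests to different pseudo-channels start the same cycle
	// instead of serializing.
	fmt.Println(ch.issue(0x000, 10, 4), ch.issue(0x040, 10, 4)) // 10 10
}
```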

3. Single Command Per Cycle Bottleneck

The issue() function issues at most ONE command per tick. The subTransactionQueue.Tick() also processes at most ONE sub-transaction per tick. Real HBM3 can issue multiple independent commands per cycle (one per pseudo-channel, multiple banks active simultaneously).

Recommendation: Allow multiple commands to be issued per tick when timing constraints allow. At minimum, one command per pseudo-channel.
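The essence of the change is removing the early return after the first issued command. A toy sketch (queue type and the "one per pseudo-channel" cap are illustrative):

```go
package main

import "fmt"

// cmdQueue stands in for a per-pseudo-channel command queue.
type cmdQueue struct {
	pending int
}

// tickChannel issues up to one command per pseudo-channel queue this cycle
// and reports how many were issued. The key difference from the current
// issue() is that it keeps scanning instead of returning after one command.
func tickChannel(queues []*cmdQueue) int {
	issued := 0
	for _, q := range queues {
		if q.pending > 0 { // real code would also check timing constraints here
			q.pending--
			issued++
			// no early return: the other pseudo-channels still get a slot
		}
	}
	return issued
}

func main() {
	qs := []*cmdQueue{{pending: 3}, {pending: 1}}
	fmt.Println(tickChannel(qs)) // 2: both pseudo-channels issue
	fmt.Println(tickChannel(qs)) // 1: only the first still has work
}
```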

4. Close-Page Policy Only (No Open-Page)

The model hardcodes ClosePageCommandCreator, always generating ReadPrecharge/WritePrecharge commands. Real HBM3 controllers use open-page or adaptive policies to exploit row buffer locality, which is critical for GPU streaming workloads. The bank state machine already supports open row tracking — only a proper command creator is needed.

Recommendation: Implement an OpenPageCommandCreator or AdaptiveCommandCreator.
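Since the bank state machine already tracks the open row, the new creator only needs to branch on hit/miss/conflict. A sketch with hypothetical identifiers (not the actual akita/v4 command-creator interface):

```go
package main

import "fmt"

// bank mirrors the open-row tracking the existing state machine provides.
type bank struct {
	rowOpen bool
	openRow uint64
}

// commandsFor is the core of an open-page creator: emit a bare Read on a row
// hit and leave the row open, instead of always appending a precharge.
func commandsFor(b *bank, row uint64) []string {
	switch {
	case b.rowOpen && b.openRow == row:
		return []string{"Read"} // row hit: exploit row-buffer locality
	case b.rowOpen:
		b.openRow = row
		return []string{"Precharge", "Activate", "Read"} // row conflict
	default:
		b.rowOpen, b.openRow = true, row
		return []string{"Activate", "Read"} // bank closed: row miss
	}
}

func main() {
	b := &bank{}
	fmt.Println(commandsFor(b, 7)) // [Activate Read]
	fmt.Println(commandsFor(b, 7)) // [Read] -- the hit a close-page policy never sees
	fmt.Println(commandsFor(b, 9)) // [Precharge Activate Read]
}
```

An AdaptiveCommandCreator would wrap the same branch with a predictor that decides whether to leave the row open after each access.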

5. No Per-Bank Refresh

The model defines refresh commands but no component actually generates them. No refresh controller or scheduler exists. In real HBM3, per-bank refresh (REFpb) is the default mode. Refresh interference can reduce effective bandwidth by 5-15%.

Recommendation: Implement a refresh scheduler. For HBM3, per-bank refresh should be the default mode.
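A minimal per-bank refresh scheduler could emit a REFpb to one bank in round-robin order every per-bank refresh interval, so only one bank is blocked at a time. The interval and bank count below are illustrative:

```go
package main

import "fmt"

// refreshScheduler issues REFpb commands round-robin across banks, one per
// tREFIpb interval. Field names are hypothetical, not Akita identifiers.
type refreshScheduler struct {
	tREFIpb  uint64 // per-bank refresh interval, in controller cycles
	numBanks int
	nextBank int
	nextDue  uint64
}

// tick returns the bank index to refresh this cycle, or -1 if none is due.
// The DRAM controller would arbitrate this against demand commands.
func (r *refreshScheduler) tick(now uint64) int {
	if now < r.nextDue {
		return -1
	}
	b := r.nextBank
	r.nextBank = (r.nextBank + 1) % r.numBanks
	r.nextDue = now + r.tREFIpb
	return b
}

func main() {
	rs := &refreshScheduler{tREFIpb: 5, numBanks: 4}
	for cycle := uint64(0); cycle < 12; cycle++ {
		if b := rs.tick(cycle); b >= 0 {
			fmt.Printf("cycle %d: REFpb bank %d\n", cycle, b)
		}
	}
}
```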

6. Missing tPPD for HBM

The tPPD (precharge-to-precharge delay) is only applied for GDDR and LPDDR4 protocols in the timing generation code, not for HBM. HBM3 requires tPPD. This appears to be a bug in the timing table generation.

Recommendation: Enable tPPD in the timing tables for HBM/HBM2/HBM3 protocols.
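The fix amounts to extending the protocol check in the timing-table generation. A simplified sketch of the intent (Akita's actual tables are the 4-level structures described above, not a switch on strings):

```go
package main

import "fmt"

// prechargeToPrechargeDelay models the same-bank-group PRE->PRE entry.
// Today only the GDDR/LPDDR4 arm applies tPPD; the HBM arm is the proposed
// fix. Protocol names and the fallback value of 1 are illustrative.
func prechargeToPrechargeDelay(protocol string, tPPD int) int {
	switch protocol {
	case "GDDR5", "GDDR5X", "GDDR6", "LPDDR4":
		return tPPD // existing behavior
	case "HBM", "HBM2", "HBM3":
		return tPPD // proposed: HBM3 requires tPPD as well
	default:
		return 1 // otherwise back-to-back precharges are only bus-limited
	}
}

func main() {
	fmt.Println(prechargeToPrechargeDelay("HBM3", 2)) // 2 with the fix
	fmt.Println(prechargeToPrechargeDelay("DDR4", 2)) // 1, unchanged
}
```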

Additional Gaps (Lower Priority)

  • No bus turnaround delay modeling at the channel level (read↔write switching)
  • Command queue is per-rank only — per-bank or per-bank-group queues would improve parallelism
  • Address mapping order not configurable through the builder API; no HBM3-optimized defaults
  • No power-down state management (states defined but never entered)
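For the first gap, bus turnaround can be captured with a small amount of state on the channel's data bus. A sketch with illustrative types and penalty values:

```go
package main

import "fmt"

// dataBus tracks the direction of the last burst so that a read<->write
// switch pays an extra tRTW/tWTR-style gap. Names are hypothetical.
type dataBus struct {
	lastWasWrite bool
	freeAt       uint64
}

// schedule returns the start cycle of a burst, adding a turnaround penalty
// whenever the direction flips relative to the previous burst.
func (b *dataBus) schedule(now uint64, isWrite bool, burst, turnaround uint64) uint64 {
	start := now
	if b.freeAt > start {
		start = b.freeAt
	}
	if b.lastWasWrite != isWrite {
		start += turnaround // direction switch: read<->write gap
	}
	b.lastWasWrite = isWrite
	b.freeAt = start + burst
	return start
}

func main() {
	bus := &dataBus{}
	r1 := bus.schedule(0, false, 4, 3) // read starts immediately
	w1 := bus.schedule(0, true, 4, 3)  // write waits for the bus, then +3 turnaround
	fmt.Println(r1, w1)                // 0 7
}
```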

Current Workaround

We are using SimpleBankedMemory as an interim DRAM model for MI300A timing configuration, with tuned pipeline depth, stage latency, and buffer sizes to approximate the expected bandwidth characteristics. This sidesteps the DRAM model limitations but sacrifices detailed timing accuracy.

Summary Table

| Feature | Status | HBM3 Need | Severity |
|---|---|---|---|
| HBM3 protocol | ❌ Missing | HBM3-specific behavior | 🔴 Critical |
| Pseudo-channels | ❌ Not modeled | Independent 64-bit channels | 🔴 Critical |
| Page policy | Close-page only | Open/Close/Adaptive | 🔴 Critical |
| Commands/cycle | 1 (hardcoded) | Multiple per bank/pseudo-ch | 🔴 Critical |
| Refresh | ❌ Not implemented | Per-bank refresh (3.9 μs) | 🔴 Critical |
| tPPD for HBM | ❌ Not applied | Required | 🔴 Critical |
| Bus turnaround | ❌ Not modeled | R/W turnaround penalty | ⚠️ Important |
| Command queue | Per-rank | Per-bank/bank-group | ⚠️ Important |
| Bank state machine | ✅ OK | Open/Closed/SRef | ✅ OK |
| Timing tables | ✅ OK | 4-level hierarchy | ✅ OK |
| Row buffer tracking | ✅ OK | Per-bank open row | ✅ OK |
