Software-Based Fault Injection for Permanent Faults in Deep Learning Accelerators
This repository contains DLAFI, a hardware-aware, software-level fault-injection (FI) framework that models permanent faults in systolic arrays (SAs) with the speed of software-level injection and the accuracy of hardware simulation. The accompanying paper (Accepted in ISSRE 2025) describes the approach, experiments, and results in detail.
DLAFI extracts microarchitectural mapping abstractions of a systolic array (SA) via a small set of hardware microbenchmarks run in RTL simulation (Gemmini). These mapping abstractions are then used to perform hardware-aware fault injection at the LLVM IR level across large ML models, combining accuracy and scale.
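To make the idea concrete, the toy sketch below shows how a permanent stuck-at fault in a single processing element (PE) of an output-stationary SA can be emulated purely in software. It is illustrative only: the SA dimension, the tile-to-PE mapping rule, and all names are assumptions made for the example, not DLAFI's implementation.

```python
# Illustrative sketch only (not DLAFI's code): emulate a permanent stuck-at
# fault in one processing element (PE) of an output-stationary systolic array.
# The mapping rule below (output (i, j) of each SA_DIM x SA_DIM tile is
# produced by PE (i % SA_DIM, j % SA_DIM)) is an assumption for illustration.
import numpy as np

SA_DIM = 16  # assumed systolic array dimension (e.g., a 16x16 mesh)

def matmul_with_stuck_pe(A, B, pe_row, pe_col, stuck_value=0.0):
    """Compute A @ B, then corrupt every output element that the assumed
    mapping assigns to the faulty PE (pe_row, pe_col)."""
    C = A @ B
    rows, cols = C.shape
    for i in range(rows):
        for j in range(cols):
            if i % SA_DIM == pe_row and j % SA_DIM == pe_col:
                C[i, j] = stuck_value  # permanent fault: accumulator stuck
    return C

A = np.random.rand(32, 32).astype(np.float32)
B = np.random.rand(32, 32).astype(np.float32)
C_faulty = matmul_with_stuck_pe(A, B, pe_row=3, pe_col=7)
print("corrupted elements:", int(np.sum(C_faulty != A @ B)))
```

DLAFI applies the same kind of mapping-guided corruption, but at the LLVM IR level of the compiled model, so the injected behavior reflects what the RTL-level microbenchmarks observed on the real SA.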
If you want to understand the theoretical details, experimental setup, and results, see the paper.
/ (repo root)
├─ hw-sim/ # microbenchmarks, generator scripts
├─ llfi-dlafi/ # LLFI integration and SA_programs (SA microbenchmarks)
├─ pytorch-fi/ # application-level PyTorch FI and evaluation scripts
├─ install_script.sh # top-level installer (builds LLVM, ONNX-MLIR, DLAFI)
├─ install_hw_sim_script.sh # installs Chipyard + Gemmini (HW sim support)
├─ quick_run.sh # getting-started script that runs the end-to-end flow
├─ Dockerfile
└─ README.md (this file)
- Docker (recommended and tested) OR an Ubuntu 20.04/22.04 host with sudo privileges
- At least 100 GB of disk space (more is recommended for toolchains and build artifacts)
- Build-time RAM and CPU: building LLVM/ONNX-MLIR benefits from multiple cores; use -j$(nproc) where applicable.
Recommended: run inside the repository's Docker image which already contains most system packages used in the paper's artifact.
Docker build: the repo contains a Dockerfile and install_script.sh to help automate the setup.
- Build the Docker image from the top-level Dockerfile:

  docker build -t dlafi_image .

- Run a container and mount the repo for persistence:
  docker run -it --name dlafi_container dlafi_image

Set these paths in the container's ~/.bashrc before running the install scripts. They are used throughout the scripts and make the workflow portable.
# LLVM (source/build)
export LLVM_SRC=/workspace/llvm-project
export LLVM_DST_ROOT=$LLVM_SRC/build
export MLIR_DIR=$LLVM_DST_ROOT/lib/cmake/mlir
# ONNX-MLIR
export ONNX_MLIR_SRC=/workspace/onnx-mlir
export ONNX_MLIR_BUILD=$ONNX_MLIR_SRC/build
# LLFI/DLAFI
export LLFI_BUILD_ROOT=/workspace/llfi-build
export DLAFI_ROOT=/workspace/DLAFI
# Chipyard (hardware sims)
export CHIPYARD_ROOT=/workspace/chipyard

You can append these lines to ~/.bashrc and source it:
source ~/.bashrc

- Inside the container, run the installer for DLAFI and, optionally, its HW simulation component:
git clone https://github.com/ManiSadati/DLAFI.git $DLAFI_ROOT
cd $DLAFI_ROOT
# Install LLVM, ONNX-MLIR, the PyTorch environment, and DLAFI (may take 1-2 hours)
bash install_script.sh
# (Optional) Install Chipyard + Gemmini for HW sims; note that this takes several hours:
bash install_hw_sim_script.sh

After completing installation, you can execute a minimal DLAFI flow (mapping generation → LLVM-level fault injection → PyTorch-FI evaluation) using the helper script quick_run.sh.
To run only the LLVM‑level FI and compare it with PyTorch‑FI results:
bash quick_run.sh --llvm-fi --pytorch-fi

If you have also installed the hardware simulation components (Chipyard + Gemmini), you can execute the full end-to-end flow:
bash quick_run.sh --all

The script leverages the environment variables defined earlier to coordinate the following phases:
- Hardware-aware mapping generation
  Generates systolic array (SA) mapping abstractions using hardware microbenchmarks. You can also run this manually:

    cd "$DLAFI_ROOT/hw-sim/microbenchmarks"
    python3 benchmark_generator.py \
        --chipyard_dir "$CHIPYARD_ROOT" \
        --kernels matmul

  The --kernels flag can be changed to other supported kernels for additional benchmarks.
- Mapping distribution
  The generated file mappings_output.yaml is automatically copied into each benchmark directory under $DLAFI_ROOT/llfi-dlafi/SA_programs/*/SAinput.yaml. (A helper sketch for doing this step manually appears after this list.)
- LLVM-level fault injection (LLFI)
  Compiles and runs fault injection for each benchmark. You can also do this manually for a specific model:

    cd "$DLAFI_ROOT/llfi-dlafi/SA_programs/shufflenet-v2-10"  # Try other folders for experimenting with different benchmarks
    bash compile.sh
    bash runllfi.sh 1  # Increase the argument to test multiple inputs
- PyTorch-FI evaluation
  Executes the corresponding PyTorch-level fault injection and reports results (a generic illustration of this kind of application-level FI appears at the end of this section):

    cd "$DLAFI_ROOT/pytorch-fi"
    source pytorch-env/bin/activate
    # You can experiment with other models and even load your own pretrained model
    python3 main.py \
        --models shufflenet_v2 \
        --num-images 1 \
        --num-iters 5 \
        --sa-dim 16
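For reference, the mapping-distribution step above can also be reproduced by hand. The snippet below is a hypothetical helper, not part of the repo; in particular, it assumes the generated mappings_output.yaml sits in the microbenchmarks directory, and copies it to each benchmark's SAinput.yaml as described above.

```python
# Hypothetical helper (not shipped with the repo): manually redistribute the
# generated mapping file to every benchmark, mirroring what quick_run.sh does.
# The source location of mappings_output.yaml is an assumption.
import glob
import os
import shutil

dlafi_root = os.environ["DLAFI_ROOT"]
src = os.path.join(dlafi_root, "hw-sim", "microbenchmarks", "mappings_output.yaml")

for bench_dir in sorted(glob.glob(os.path.join(dlafi_root, "llfi-dlafi", "SA_programs", "*"))):
    if os.path.isdir(bench_dir):
        shutil.copy(src, os.path.join(bench_dir, "SAinput.yaml"))
        print("copied mapping to", bench_dir)
```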
These steps collectively reproduce the hardware‑aware and software‑level FI workflow used in the paper.
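As a point of reference for the PyTorch-FI step, the sketch below shows one common way application-level fault injection is done in PyTorch: a forward hook corrupts part of a layer's output on every inference, approximating a permanently faulty PE. It is a minimal illustration with a made-up model and corruption pattern, not the repo's main.py.

```python
# Generic illustration (not the repo's main.py): application-level fault
# injection in PyTorch using a forward hook that corrupts a layer's output,
# approximating a permanently faulty PE. Model and fault pattern are made up.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
).eval()

def stuck_at_zero_hook(module, inputs, output):
    # Zero out a fixed slice of the feature map on every forward pass,
    # mimicking outputs that are routed through a stuck-at-0 PE.
    output[:, 0, ::4, ::4] = 0.0
    return output

handle = model[0].register_forward_hook(stuck_at_zero_hook)
with torch.no_grad():
    logits = model(torch.randn(1, 3, 32, 32))
handle.remove()
print(logits)
```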
Last updated: October 18, 2025