| Website | Documentation | Contribute | Contact | Changelog |
We have built the LLMServingSim website to help you get started with the simulator. Visit llmservingsim.ai for documentation, contribution guides, and contact information.
LLMServingSim is a cycle-level simulator for LLM serving infrastructure. It pairs a Python frontend that mirrors vLLM's continuous-batching scheduler with the ASTRA-Sim C++ analytical network backend, and drives both from per-hardware latency data captured by a vLLM-based layerwise profiler. The result is a unified environment for studying heterogeneous accelerators, disaggregated memory tiers (CPU / CXL / PIM), MoE routing, and multi-instance parallelism (TP / PP / EP / DP) end-to-end.
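The frontend's scheduling logic follows vLLM's iteration-level (continuous) batching, in which requests join and leave the running batch at every token iteration instead of waiting for an entire batch to drain. The Python sketch below is a minimal, self-contained illustration of that policy only; the class and field names are hypothetical and it is not LLMServingSim's actual scheduler code.

from collections import deque
from dataclasses import dataclass

@dataclass
class Request:
    req_id: int
    prompt_len: int      # tokens processed when the request is admitted (prefill)
    max_new_tokens: int  # decode budget
    generated: int = 0   # decode tokens produced so far

class ContinuousBatchScheduler:
    """Toy iteration-level batching: admit and retire requests every step."""

    def __init__(self, max_batch_tokens: int):
        self.max_batch_tokens = max_batch_tokens
        self.waiting: deque[Request] = deque()
        self.running: list[Request] = []

    def add_request(self, req: Request) -> None:
        self.waiting.append(req)

    def step(self) -> list[int]:
        # Each running request contributes one decode token this iteration;
        # admit waiting prompts while their prefill fits the token budget.
        budget = self.max_batch_tokens - len(self.running)
        while self.waiting and self.waiting[0].prompt_len <= budget:
            req = self.waiting.popleft()
            budget -= req.prompt_len
            self.running.append(req)

        # One model iteration: every running request emits one token.
        finished = []
        for req in self.running:
            req.generated += 1
            if req.generated >= req.max_new_tokens:
                finished.append(req.req_id)

        # Finished requests leave immediately, freeing budget for the next
        # iteration; this is the key difference from static (batch-level) batching.
        self.running = [r for r in self.running if r.generated < r.max_new_tokens]
        return finished

# Example: two requests of different lengths share the batch, and the short
# one completes without waiting for the long one to finish.
sched = ContinuousBatchScheduler(max_batch_tokens=4096)
sched.add_request(Request(req_id=0, prompt_len=512, max_new_tokens=8))
sched.add_request(Request(req_id=1, prompt_len=128, max_new_tokens=2))
while sched.waiting or sched.running:
    done = sched.step()
    if done:
        print("finished:", done)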
git clone --recurse-submodules https://github.com/casys-kaist/LLMServingSim.git
cd LLMServingSim
./scripts/docker-sim.sh # launch the simulator container
./scripts/compile.sh # build ASTRA-Sim + Chakra
./serving/run.sh # run the example simulations
For installation details, container choices, configuration layout, CLI flags, and the full set of example workloads, see the documentation.
ISPASS 2026
LLMServingSim 2.0: A Unified Simulator for Heterogeneous and Disaggregated LLM Serving Infrastructure
Jaehong Cho*, Hyunmin Choi*, Guseul Heo, Jongse Park (KAIST) [Paper] (To Appear)
*Equal contribution
CAL 2025
LLMServingSim2.0: A Unified Simulator for Heterogeneous Hardware and Serving Techniques in LLM Infrastructure
Jaehong Cho, Hyunmin Choi, Jongse Park (KAIST) [Paper]
IISWC 2024
LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale
Jaehong Cho, Minsu Kim, Hyunmin Choi, Guseul Heo, Jongse Park (KAIST) [Paper]
If you use LLMServingSim in your research, please cite:
@ARTICLE{11224567,
  author={Cho, Jaehong and Choi, Hyunmin and Park, Jongse},
  journal={IEEE Computer Architecture Letters},
  title={{LLMServingSim2.0: A Unified Simulator for Heterogeneous Hardware and Serving Techniques in LLM Infrastructure}},
  year={2025},
  volume={24},
  number={02},
  pages={361-364},
  doi={10.1109/LCA.2025.3628325},
  ISSN={1556-6064},
  publisher={IEEE Computer Society},
  address={Los Alamitos, CA, USA},
  month=jul
}
@INPROCEEDINGS{10763697,
  author={Cho, Jaehong and Kim, Minsu and Choi, Hyunmin and Heo, Guseul and Park, Jongse},
  booktitle={2024 IEEE International Symposium on Workload Characterization (IISWC)},
  title={{LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale}},
  year={2024},
  pages={15-29},
  doi={10.1109/IISWC63097.2024.00012}
}