Thank you for this great survey.
On page 84, the section states: “Batch management optimization aims to increase the batch size during the decoding stage to enhance arithmetic intensity. A representative method is continuous batching, proposed by vLLM [304].”
The attribution in the last sentence is inaccurate. Continuous batching was first introduced by Orca (OSDI 22), where it is called iteration-level scheduling, not by vLLM. vLLM uses continuous batching as its default scheduling mechanism, but the original idea was proposed earlier by Orca. The BibTeX entry for the Orca paper is:
@inproceedings{280922,
author = {Gyeong-In Yu and Joo Seong Jeong and Geon-Woo Kim and Soojeong Kim and Byung-Gon Chun},
title = {Orca: A Distributed Serving System for {Transformer-Based} Generative Models},
booktitle = {16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22)},
year = {2022},
isbn = {978-1-939133-28-1},
address = {Carlsbad, CA},
pages = {521--538},
url = {https://www.usenix.org/conference/osdi22/presentation/yu},
publisher = {USENIX Association},
month = jul
}