Correction on the Attribution of Continuous Batching #100

@leigao97

Thank you for this great survey.

On page 84, the section states: “Batch management optimization aims to increase the batch size during the decoding stage to enhance arithmetic intensity. A representative method is continuous batching, proposed by vLLM [304].”

The attribution in the last sentence is inaccurate. Continuous batching was first introduced by Orca (OSDI '22), which calls the technique iteration-level scheduling, not by vLLM. The vLLM paper and its implementation use continuous batching as the default scheduling mechanism, but the original idea was proposed earlier by Orca:

@inproceedings {280922,
author = {Gyeong-In Yu and Joo Seong Jeong and Geon-Woo Kim and Soojeong Kim and Byung-Gon Chun},
title = {Orca: A Distributed Serving System for {Transformer-Based} Generative Models},
booktitle = {16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22)},
year = {2022},
isbn = {978-1-939133-28-1},
address = {Carlsbad, CA},
pages = {521--538},
url = {https://www.usenix.org/conference/osdi22/presentation/yu},
publisher = {USENIX Association},
month = jul
}
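
For readers unfamiliar with the distinction, here is a minimal toy sketch of why scheduling at iteration granularity keeps the decoding batch fuller than request-level batching. It is an illustration under simplifying assumptions, not Orca's or vLLM's actual scheduler: it ignores prefill, KV-cache memory limits, and Orca's selective batching, and the function names and the `lengths` / `max_batch` parameters are hypothetical.

```python
from collections import deque


def static_batching_steps(lengths, max_batch):
    """Request-level batching: each batch runs until its longest request ends."""
    steps = 0
    for i in range(0, len(lengths), max_batch):
        steps += max(lengths[i:i + max_batch])
    return steps


def continuous_batching_steps(lengths, max_batch):
    """Iteration-level scheduling: free slots are refilled after every step."""
    waiting = deque(lengths)  # output lengths of requests not yet admitted
    running = []              # remaining tokens for each in-flight request
    steps = 0
    while waiting or running:
        # Admit waiting requests into any free slots before the next step.
        while waiting and len(running) < max_batch:
            running.append(waiting.popleft())
        # One decoding iteration emits one token per running request;
        # requests that just produced their last token leave the batch.
        running = [r - 1 for r in running if r - 1 > 0]
        steps += 1
    return steps


# Hypothetical workload: four requests with different output lengths.
lengths = [2, 8, 3, 5]
print(static_batching_steps(lengths, max_batch=2))      # 13
print(continuous_batching_steps(lengths, max_batch=2))  # 10
```

In this toy workload the short requests release their slots early, so the waiting requests start decoding without waiting for the 8-token request to finish.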
