[Bugfix][V1][P/D]Fix the uneven polling issue in the toy proxy for P2pNcclConnector #21819
Signed-off-by: Abatom <[email protected]>
Code Review
This pull request correctly fixes an uneven polling bug in the example proxy server by changing `dict.pop()` to `dict.get()`. This prevents reordering of registered instances and ensures fair round-robin selection. The change is effective and well-targeted.
However, I've identified a critical issue in the surrounding code that could lead to an `UnboundLocalError` and crash the listener thread if an unexpected message type is received. I've left a detailed comment on this. Addressing this would significantly improve the robustness of the example.
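The kind of failure described above can be illustrated with a minimal sketch. The surrounding proxy code is not shown in this conversation, so the function names, the `"type"` field, and the `"P"`/`"D"` values below are hypothetical stand-ins, not the example's actual code.

```python
def handle_unsafe(data):
    # `node` is only bound inside the two known branches.
    if data["type"] == "P":
        node = "prefill"
    elif data["type"] == "D":
        node = "decode"
    # An unexpected type falls through, so `node` was never assigned
    # and the next line raises UnboundLocalError.
    return node


def handle_safe(data):
    # Bind a default first so every path leaves `node` defined.
    node = None
    if data["type"] == "P":
        node = "prefill"
    elif data["type"] == "D":
        node = "decode"
    return node


if __name__ == "__main__":
    try:
        handle_unsafe({"type": "X"})
    except UnboundLocalError as exc:
        print("unsafe handler failed:", type(exc).__name__)
    print("safe handler returned:", handle_safe({"type": "X"}))
```

In a listener thread, an uncaught exception like this ends the thread, which is why binding a default (or skipping unknown messages explicitly) is the more robust shape.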
examples/online_serving/disaggregated_serving_p2p_nccl_xpyd/disagg_proxy_p2p_nccl_xpyd.py
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
@chaunceyjiang Could you help me review this PR?
OK /assign
I see. Your change ensures that the order always follows the first registration, since subsequent registrations only modify the dict's values and don’t affect the keys’ order.
/LGTM
@DarkLight1337 PTAL.
For a toy example this is a good-enough fix, so LGTM. We will probably need a more formal fix in a future version of the router.
…pNcclConnector (vllm-project#21819) Signed-off-by: Abatom <[email protected]>
…y for P2pNcclConnector (vllm-project#21819)" This reverts commit 9f518af.
Using `pop` alters the original P/D instance order in the dictionary, and the round-robin selection of 1P1D relies on that order, which leads to uneven polling. After replacing `pop` with `get`, the unevenness issue has been fixed.
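The ordering effect can be reproduced with a small, self-contained sketch. It is not the proxy's code: the helper names, the two instance keys, and the heartbeat schedule are assumptions chosen to make the skew visible. It relies only on the documented behavior that Python dicts preserve insertion order, so popping a key and re-inserting it moves it to the end, while assigning to an existing key keeps its position.

```python
def heartbeat_with_pop(instances, addr, stamp):
    # Pop then re-insert: the key moves to the end of the dict.
    instances.pop(addr, None)
    instances[addr] = stamp


def heartbeat_with_get(instances, addr, stamp):
    # Read-only lookup, then overwrite in place: position is preserved.
    instances.get(addr, None)
    instances[addr] = stamp


def simulate(heartbeat, rounds=4):
    # Hypothetical registry of two instances, keyed by address.
    instances = {"p0": 0, "p1": 0}
    addrs = list(instances)
    picks = []
    for count in range(rounds):
        # One instance re-registers (heartbeat) before each request.
        heartbeat(instances, addrs[count % len(addrs)], count)
        # Round-robin selection by position in the dict.
        keys = list(instances)
        picks.append(keys[count % len(keys)])
    return picks


pop_picks = simulate(heartbeat_with_pop)
get_picks = simulate(heartbeat_with_get)

if __name__ == "__main__":
    print("pop:", pop_picks)  # every request lands on p1
    print("get:", get_picks)  # requests alternate p0, p1
```

With this particular schedule the `pop` variant sends every request to the same instance, while the `get` variant alternates evenly. Real heartbeat timing is less regular, so the skew in practice would be partial rather than total.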