feat: enable torch_npu graph mode for Qwen-3 dense with TP support. #325
base: main
Conversation
Summary of Pending Items for this PR

This PR is a work in progress. The following items need to be completed before it is ready for final review:
Force-pushed from 4549991 to 1a5e2f0.
Review thread on xllm/core/kernels/param.h (outdated):
```cpp
// for npu
torch::Tensor seq_lens;  // sequence lengths for the batch
int num_heads;           // number of attention (query) heads
int num_kv_heads;        // number of key/value heads
```
We can get num_heads and num_kv_heads from the shape of query and key.
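A minimal sketch of this suggestion, assuming query is laid out as [num_tokens, num_heads, head_dim] and key as [num_tokens, num_kv_heads, head_dim]; the layout and the toy sizes are assumptions for illustration, not taken from this PR:

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  // Hypothetical layouts, not confirmed by this PR:
  //   query: [num_tokens, num_heads, head_dim]
  //   key:   [num_tokens, num_kv_heads, head_dim]
  auto query = torch::randn({8, 32, 128});  // 8 tokens, 32 query heads
  auto key = torch::randn({8, 4, 128});     // 4 key/value heads (GQA)

  // Both head counts fall out of the tensor shapes, so they would not
  // need to be carried separately in the kernel parameter struct.
  const int64_t num_heads = query.size(1);
  const int64_t num_kv_heads = key.size(1);
  std::cout << num_heads << " " << num_kv_heads << '\n';  // prints: 32 4
  return 0;
}
```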
In the file xllm_2/xllm/core/layers/common/attention.cpp, you've pre-shaped the query and key parameters before passing them. However, different platforms may require slightly different shapes after the view operation. It might be more flexible to pass the original query and key along with the num_heads_ and num_kv_heads_ parameters, and then perform the view operation inside the batch_prefill method. This approach would provide better cross-platform compatibility and clearer parameter handling.
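A hedged sketch of this alternative: batch_prefill receives the unreshaped tensors plus the head counts and performs the view internally. The signature, the pair return type, and the [num_tokens, hidden] input layout are illustrative assumptions, not the actual xllm API:

```cpp
#include <torch/torch.h>
#include <utility>

// Hypothetical signature: the flat [num_tokens, hidden] tensors come in
// together with the head counts, and the view happens inside the method.
std::pair<torch::Tensor, torch::Tensor> batch_prefill(
    const torch::Tensor& query,  // [num_tokens, num_heads * head_dim]
    const torch::Tensor& key,    // [num_tokens, num_kv_heads * head_dim]
    int64_t num_heads,
    int64_t num_kv_heads) {
  const int64_t num_tokens = query.size(0);
  const int64_t head_dim = query.size(1) / num_heads;
  // Each backend can pick the shape it needs at this point, e.g. NPU
  // graph mode may want a different layout than the GPU kernels.
  auto q = query.view({num_tokens, num_heads, head_dim});
  auto k = key.view({num_tokens, num_kv_heads, head_dim});
  // ... backend-specific prefill attention would run on q and k here ...
  return {q, k};
}

int main() {
  auto query = torch::randn({8, 32 * 128});
  auto key = torch::randn({8, 4 * 128});
  auto [q, k] = batch_prefill(query, key, /*num_heads=*/32, /*num_kv_heads=*/4);
  TORCH_CHECK(q.size(1) == 32 && k.size(1) == 4);
  return 0;
}
```

Keeping the view inside batch_prefill lets each backend choose its own shape without the caller carrying platform-specific logic, which is the cross-platform benefit the comment argues for.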
Force-pushed from 34afffc to 97f924a.
…ilation structure.
Force-pushed from 97f924a to f37d69d.
Could you please help review this PR when you have a moment? 🙏 @yq33victor @XuZhang99
No description provided.