[WIP] Enable Ascend NPU Backend with Custom Ops Integration for NF4 Support #1695
base: main
Signed-off-by: SlightwindSec <[email protected]>
Error Summary: Encountered a vector core execution failure on an Ascend 910B3 NPU while running inference with a Qwen-image NF4-quantized model. The NPU reported multiple DDR memory access violations (error code 0x800000) across 12 compute cores, specifically during execution of the dequantize_blockwise_fp32_nf4_1kernel kernel. The system then raised an ACL synchronization error (code 507035) when transferring a tensor to the device (pos_freqs.to(device)).

Technical Breakdown:
- Hardware level: multiple cores (5-15, 20-22) triggered MTE (Memory Transfer Engine) faults, indicating invalid DDR address accesses.
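One way to narrow down an out-of-bounds DDR access like this is to confirm, before the kernel launches, that the input buffers are at least as large as the kernel will index. The sketch below is a hypothetical helper (`check_nf4_dequant_inputs` is not part of bitsandbytes or torch_npu). It assumes the usual blockwise layout: two 4-bit codes per byte and one absmax scale per block, including a partial last block. It does not diagnose the failure above; it only rules out one class of size mismatch.

```python
def check_nf4_dequant_inputs(n, blocksize, packed_len, absmax_len):
    """Hypothetical pre-launch size check for blockwise NF4 dequantization.

    Returns a list of problems; an empty list means the buffer sizes are
    consistent with n elements packed two per byte and one scale per block.
    """
    problems = []
    if n < 0:
        problems.append("element count must be non-negative")
    if blocksize <= 0:
        problems.append("blocksize must be positive")
    if problems:
        return problems
    need_bytes = (n + 1) // 2                 # two 4-bit codes per byte, rounded up
    need_scales = -(-n // blocksize)          # ceil(n / blocksize), counts a partial last block
    if packed_len < need_bytes:
        problems.append(f"packed buffer too small: {packed_len} < {need_bytes}")
    if absmax_len < need_scales:
        problems.append(f"absmax too small: {absmax_len} < {need_scales}")
    return problems
```

For example, 1000 elements with `blocksize=64` need 500 packed bytes and 16 absmax values, because the last block is partial; passing only 15 scales is reported as a problem.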
What does this PR do?
This PR ports the Ascend NPU backend changes from the multi-backend-refactor branch and integrates them with custom ops. It enables building for Ascend and translates the existing kernels and ops into Ascend-compatible operators. Because the high-performance AscendC-based NF4 implementation is still in progress, a temporary PyTorch implementation is used for now. From the user's point of view, the build steps are unchanged.
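To make concrete what the temporary NF4 path has to compute, here is a minimal reference sketch of blockwise NF4 dequantization in plain Python rather than PyTorch, so the indexing is explicit. It is not the PR's implementation. Assumptions: the 16-entry NF4 code book used by bitsandbytes, two 4-bit codes per byte with the high nibble holding the earlier element, and one absmax scale per block of `blocksize` elements. `dequantize_nf4_blockwise` is a hypothetical name.

```python
# NF4 code book (16 normal-float quantiles in [-1, 1]) as used by bitsandbytes.
NF4_CODE = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]


def dequantize_nf4_blockwise(packed, absmax, n, blocksize=64):
    """Hypothetical sketch: decode n values from packed NF4 bytes.

    value[i] = NF4_CODE[code_i] * absmax[i // blocksize]
    """
    out = []
    for i in range(n):
        byte = packed[i // 2]
        # Assumed order: even index -> high nibble, odd index -> low nibble.
        code = (byte >> 4) if i % 2 == 0 else (byte & 0x0F)
        out.append(NF4_CODE[code] * absmax[i // blocksize])
    return out
```

For example, `dequantize_nf4_blockwise(bytes([0x7F, 0x08]), [2.0, 0.5], 4, blocksize=2)` decodes codes 7, 15, 0 and 8 and scales the first block by 2.0 and the second by 0.5, giving `0.0`, `2.0`, `-0.5` and roughly `0.0398`. A vectorized PyTorch version would do the same nibble split with shifts and masks on a uint8 tensor, then index the code book and multiply by the per-block absmax repeated `blocksize` times.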
Collaborators
@ji-huazhong @Ginray @Runningwater23
cc @Titus-von-Koeller @matthewdouglas @amathews-amd @sunway513