Qualcomm AI Engine Direct - fix LPBQ implementation #12663
Conversation
- fix LPBQ and make test case more general
Hi @cccclai, I made a mistake when implementing LPBQ. I've tested the current version against Llama 3.2 1B and get better results now. I will update the related change to llama.py in another PR.
Thanks! Mind sharing a bit more detail about the mistake? I'm trying to follow the code but it's not super clear.
Yes, currently HTP uses 4 bits to store the quantized scales. I didn't clip the values to the correct range (before: 0-255 / after: 1-16). I also double-checked against AIMET's implementation and compared the MSE between per-channel and per-block quantization.
Ah, that makes sense.
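For readers following the discussion, the sketch below illustrates the kind of range fix described above. It is a minimal illustration only: the function name, tensor shapes, and the exact scale decomposition are assumptions made for this example and are not taken from the PR's actual diff, which lives in the Qualcomm quantizer code.

```python
import torch


def quantize_block_scales(
    block_scales: torch.Tensor, bitwidth: int = 4
) -> tuple[torch.Tensor, torch.Tensor]:
    """Illustrative LPBQ-style scale decomposition (hypothetical helper).

    Splits positive per-block scales, shaped (channels, blocks_per_channel),
    into a per-channel float scale and low-precision integer multipliers.
    """
    levels = 2 ** bitwidth  # 16 levels when scales are stored in 4 bits
    # Per-channel scale chosen so the largest block scale maps to the top level.
    per_channel_scale = block_scales.amax(dim=1, keepdim=True) / levels
    # The fix discussed above: clamp to [1, 16] (4-bit range) rather than
    # [0, 255] (8-bit range). A quantized scale of 0 would zero out the block,
    # so the minimum is 1.
    quantized = torch.clamp(
        torch.round(block_scales / per_channel_scale), min=1, max=levels
    )
    return per_channel_scale, quantized
```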
Thanks for the fix
### Summary
- fix LPBQ and make test case more general

### Test plan
```bash
python backends/qualcomm/tests/test_qnn_delegate.py TestQNNQuantizedOperator.test_qnn_backend_conv2d_block -b build-android -s $DEVICE -m SM8750
```