
[ET-VK][q8ta] Add q8ta_linear operator for int8 quantized linear #17565

Open

SS-JIA wants to merge 1 commit into gh/SS-JIA/439/base from gh/SS-JIA/439/head

Conversation

@SS-JIA (Contributor) commented Feb 19, 2026

Stack from ghstack (oldest at bottom):

Add a new q8ta_linear operator that performs fully quantized int8
linear (matmul + bias) with per-tensor activation quantization and
per-channel weight quantization, producing int8 output. This enables
back-to-back quantized linear layers without intermediate
dequantize/quantize steps.
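
For reference, here is a minimal NumPy sketch of the intended numerics. This is illustrative only (the actual implementation is a Vulkan compute shader); the function name and signature are made up, but the quantization scheme matches the description above:

```python
import numpy as np

def q8ta_linear_ref(x_q, x_scale, x_zp, w_q, w_scales, bias, out_scale, out_zp):
    """x_q: int8 [M, K]; w_q: int8 [N, K]; w_scales, bias: float32 [N]."""
    # Integer matmul accumulated in int32, as the shader's int8 dot products do.
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32).T          # [M, N]
    # Zero-point correction via per-channel weight sums, so the activation
    # zero point never enters the inner loop.
    w_sums = w_q.astype(np.int32).sum(axis=1)                    # [N]
    acc -= x_zp * w_sums
    # Rescale: per-tensor activation scale times per-channel weight scale.
    y = acc.astype(np.float32) * (x_scale * w_scales) + bias
    # Requantize to per-tensor int8 output.
    y_q = np.rint(y / out_scale) + out_zp
    return np.clip(y_q, -128, 127).astype(np.int8)
```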

The operator reuses the existing tiled int8 linear GLSL headers
(input/weight tile loading, int8 dot product accumulation, weight
scales/sums/bias loading) and adds output quantization via
quantize_and_pack to produce packed int8 output.
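
As a rough mental model for that output step, here is a Python stand-in (not the GLSL itself; little-endian byte order and a four-wide texel are assumptions):

```python
import numpy as np

def quantize_and_pack(vals4, scale, zp):
    """Quantize four floats to int8 and pack them into one 32-bit word,
    analogous to packing a texel's four bytes on the GPU."""
    q = np.clip(np.rint(np.asarray(vals4, dtype=np.float32) / scale) + zp,
                -128, 127).astype(np.int8)
    assert q.shape == (4,)
    # Two's-complement bytes, little-endian, viewed as one unsigned 32-bit word.
    return int.from_bytes(q.tobytes(), "little")
```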

The fusion pass in quantized_linear.py detects the
q→dq→linear→q pattern (where the output quantize node comes from a
subsequent quantized op's input) and fuses it into a single
q8ta_linear call.
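
The sketch below conveys the shape of that rewrite on a torch.fx graph. It is not the actual pass in quantized_linear.py: the fused op name and the argument plumbing (scales/zero points) are assumptions:

```python
import torch
import torch.fx

def fuse_q8ta_linear(gm: torch.fx.GraphModule) -> torch.fx.GraphModule:
    for node in list(gm.graph.nodes):
        if node.target != torch.ops.aten.linear.default:
            continue
        # Input must come through a dequantize with no other consumers.
        dq = node.args[0]
        if not (isinstance(dq, torch.fx.Node)
                and "dequantize" in str(dq.target) and len(dq.users) == 1):
            continue
        # Output must feed exactly one quantize node (the following
        # quantized op's input quantize).
        users = list(node.users)
        if len(users) != 1:
            continue
        q_out = users[0]
        t = str(q_out.target)
        if "quantize" not in t or "dequantize" in t:
            continue
        with gm.graph.inserting_before(node):
            fused = gm.graph.call_function(
                torch.ops.et_vk.q8ta_linear.default,  # assumed op name
                # A real pass must also thread through scales/zero points.
                args=(dq.args[0], *node.args[1:]),
            )
        q_out.replace_all_uses_with(fused)
        gm.graph.erase_node(q_out)
        gm.graph.erase_node(node)
        gm.graph.erase_node(dq)
    gm.graph.eliminate_dead_code()
    gm.recompile()
    return gm
```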

This diff was authored with Claude.

Differential Revision: [D93768642](https://our.internmc.facebook.com/intern/diff/D93768642/)

@pytorch-bot bot commented Feb 19, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17565

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures

As of commit 3a94e2e with merge base 7b843e4:


This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.


Labels

CLA Signed (managed by the Facebook bot; authors must sign the CLA before a PR can be reviewed), fb-exported, meta-exported
