[Pending][OneDNN] add fp8 block gemm by Yejing-Lai · Pull Request #173 · vllm-project/vllm-xpu-kernels

Yejing-Lai · 2026-03-05T03:15:01Z

Add FP8 block GEMM and unit tests (UT). All UTs pass.

Signed-off-by: Lai, Yejing <yejing.lai@intel.com>

xinyu-intel · 2026-03-05T05:57:34Z

tests/ops/fp8_quant_op.py

+        x: [M, N] float tensor (fp16/fp32)
+        block_m: block rows
+        block_n: block cols
+        fp8_dtype: torch.float8_e4m3fn or e5m2


Remove e5m2 if it is not supported.

xinyu-intel · 2026-03-05T05:59:51Z

tests/test_fp8_gemm_onednn.py

+    weight_fp8, scale_wei_fp8 = fp8_block_quant_2d(weight, group_size,
+                                                   group_size)
+
+    # reference fp16 gemm


xinyu-intel · 2026-03-05T06:07:40Z

tests/test_fp8_gemm_onednn.py

+                                                   group_size)
+
+    # reference fp16 gemm
+    output_ref = torch.matmul(input, weight.t())


What should the reference be here? The purpose is to check the numerical loss rather than the quantization loss. See https://github.com/pytorch/pytorch/blob/4a1bbf44dde11d5ca0c9928a929d18cb3d180181/test/test_scaled_matmul_cuda.py#L308 for the reference.

This follows the comparison method used by the per_tensor/per_channel/per_token UTs in this file (https://github.com/vllm-project/vllm-xpu-kernels/blob/main/tests/test_fp8_gemm_onednn.py#L155). Do we need to update all of these comparison methods?

I see. Please update others in another PR. cc @zufangzhu

[OneDNN] add fp8 block gemm

286685b

Signed-off-by: Lai, Yejing <yejing.lai@intel.com>

xinyu-intel requested changes Mar 5, 2026

Yejing-Lai wants to merge 1 commit into vllm-project:main from Yejing-Lai:lyj/add_block_gemm_xe2

2 participants
