[webgpu] Don't use num_workgroups when use indirect dispatch #26334

qjia7 · 2025-10-17T05:23:07Z

This pull request updates the FlashAttention WebGPU implementation to improve support for indirect dispatch. When indirect dispatch is used, the shader now reads the actual workgroup dimensions from an input buffer instead of relying on built-in variables such as `num_workgroups`, which avoids the duplication overhead in Dawn/WebGPU. See https://source.chromium.org/chromium/chromium/src/+/main:third_party/dawn/src/dawn/native/ComputePassEncoder.cpp;l=275.
This PR fixes the issue that indirect dispatch is slower than normal dispatch for the same program.
With this change, phi4 with graph capture enabled improves from 125 tps to 145 tps on an NV 5080.
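
To illustrate the idea described above, here is a minimal sketch, not the code from this PR. `BuildFlatIndexShader`, the binding names `indirect_buffer` and `output`, and the binding indices are all hypothetical. It assumes the buffer that holds the three `u32` dispatch dimensions is also bound to the shader as a read-only storage buffer, placed last among the bindings, so the indirect variant never references `num_workgroups`.

```cpp
#include <string>

// Hypothetical helper (not the onnxruntime API): builds a tiny WGSL compute
// shader that flattens workgroup_id into a linear index. Only the source of
// the grid dimensions differs between the two variants.
std::string BuildFlatIndexShader(bool use_indirect_dispatch) {
  std::string wgsl;
  if (use_indirect_dispatch) {
    // Assumption: the dispatch dimensions (x, y, z) are readable through a
    // storage binding, declared as the last binding of the program.
    wgsl +=
        "@group(0) @binding(0) var<storage, read_write> output : array<u32>;\n"
        "@group(0) @binding(1) var<storage, read> indirect_buffer : array<u32>;\n"
        "@compute @workgroup_size(1)\n"
        "fn main(@builtin(workgroup_id) workgroup_id : vec3<u32>) {\n"
        "  let grid = vec3<u32>(indirect_buffer[0], indirect_buffer[1],\n"
        "                       indirect_buffer[2]);\n";
  } else {
    // Direct dispatch: keep taking the dimensions from the built-in.
    wgsl +=
        "@group(0) @binding(0) var<storage, read_write> output : array<u32>;\n"
        "@compute @workgroup_size(1)\n"
        "fn main(@builtin(workgroup_id) workgroup_id : vec3<u32>,\n"
        "        @builtin(num_workgroups) grid : vec3<u32>) {\n";
  }
  // Shared body: identical index math regardless of where `grid` came from.
  wgsl +=
      "  let flat = workgroup_id.x + workgroup_id.y * grid.x\n"
      "           + workgroup_id.z * grid.x * grid.y;\n"
      "  output[flat] = flat;\n"
      "}\n";
  return wgsl;
}
```

Calling `BuildFlatIndexShader(true)` yields a shader with no `num_workgroups` reference, which is the property this PR relies on to avoid the extra work in Dawn. The direct-dispatch variant is unchanged.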

onnxruntime/contrib_ops/webgpu/bert/flash_attention.cc

[webgpu] Don't use num_workgroups when use indirect dispatch

767b271

qjia7 marked this pull request as ready for review October 17, 2025 05:48

qjia7 requested review from fs-eire and guschmue October 17, 2025 05:48

fs-eire reviewed Oct 17, 2025

onnxruntime/contrib_ops/webgpu/bert/flash_attention.cc (outdated, resolved)

qjia7 added 3 commits October 17, 2025 14:58

address comments

691a4cb

revert unnecessary changes

f4a4fdc

Make sure the indirect buffer is the last one in program side

90f86e1

qjia7 requested a review from fs-eire October 17, 2025 08:17

guschmue added the ep:WebGPU ort-web webgpu provider label Oct 20, 2025

guschmue approved these changes Oct 20, 2025

fs-eire approved these changes Oct 20, 2025

fs-eire merged commit 8ab27d9 into main Oct 20, 2025
106 of 117 checks passed

fs-eire deleted the num_workgroups branch October 20, 2025 18:06
