Fix: Correct/Improve the triton attention kernel #196
This PR introduces several enhancements to the attention kernel, including the implementation of a backward pass, memory optimization for grouped query attention, and a bug fix.
1. Bug Fix: Incorrect Attention with Query Offset: Fixed a bug where the attention kernel produced incorrect results when the query offset (`start_q`) was non-zero. The kernel's starting loop bound (`lo`) was incorrectly initialized to `start_q`, causing the computation to skip the initial keys in the KV cache (a reference sketch follows this list).
2. GQA Memory Optimization: Previously, the K and V tensors were explicitly expanded using `torch.repeat_interleave`, which materialized large tensors in memory. The kernel now maps each query head to its corresponding KV head by manipulating pointers instead, so no expanded copies are created (sketched after this list).
3. Backward Pass Implementation: Implemented a custom backward pass, making the module fully differentiable and usable for end-to-end model training (see the `autograd.Function` sketch below).
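For reference, here is a minimal pure-PyTorch sketch (not the kernel code itself) of what attention with a non-zero query offset should compute: a query at absolute position `start_q + i` still attends to every cached key at positions `0 .. start_q + i`, which is why the key loop must not start at `start_q`. The function name and shapes are illustrative only.

```python
import torch


def ref_attention_with_offset(q, k, v, start_q: int):
    """Reference (pure PyTorch) attention for a query block starting at
    offset `start_q` inside the KV cache. Shapes are illustrative:
    q is [n_q, d]; k and v are [n_kv, d] with n_kv >= start_q + n_q."""
    n_q, d = q.shape
    n_kv = k.shape[0]
    scores = (q @ k.T) / (d ** 0.5)                    # [n_q, n_kv]
    # Causal mask in absolute positions: query i sits at start_q + i and
    # attends to keys 0 .. start_q + i, so keys before start_q are included.
    q_pos = torch.arange(n_q).unsqueeze(1) + start_q   # [n_q, 1]
    k_pos = torch.arange(n_kv).unsqueeze(0)            # [1, n_kv]
    scores = scores.masked_fill(k_pos > q_pos, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```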
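The grouped-query mapping can be sketched the same way: rather than materializing repeated K/V with `torch.repeat_interleave`, each query head `h` simply reads KV head `h // group_size`, the same index arithmetic the kernel can apply to its K/V base pointers. The function below is illustrative PyTorch, not the Triton implementation.

```python
import torch


def gqa_attention_no_expand(q, k, v):
    """Grouped-query attention without materializing repeated K/V.
    q is [n_q_heads, seq, d]; k and v are [n_kv_heads, seq, d].
    Query head h reads KV head h // group_size, so no expanded copies
    of K or V are ever created."""
    n_q_heads, _, d = q.shape
    n_kv_heads = k.shape[0]
    group_size = n_q_heads // n_kv_heads
    out = torch.empty_like(q)
    for h in range(n_q_heads):
        kv_h = h // group_size                         # no repeat_interleave
        scores = (q[h] @ k[kv_h].T) / (d ** 0.5)       # [seq, seq]
        out[h] = torch.softmax(scores, dim=-1) @ v[kv_h]
    return out
```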
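The backward pass is wired up through a `torch.autograd.Function`. The sketch below shows only the structure (what `forward` saves and which gradients `backward` returns), with plain PyTorch ops standing in for the Triton forward and backward kernels; the class and variable names are placeholders, not the PR's actual symbols.

```python
import torch


class AttentionFn(torch.autograd.Function):
    """Structural sketch of the custom autograd wrapper. Plain PyTorch ops
    stand in for the Triton forward/backward kernels."""

    @staticmethod
    def forward(ctx, q, k, v):
        scale = q.shape[-1] ** -0.5
        p = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
        out = p @ v
        ctx.save_for_backward(q, k, v, p)   # tensors the backward pass needs
        ctx.scale = scale
        return out

    @staticmethod
    def backward(ctx, grad_out):
        q, k, v, p = ctx.saved_tensors
        grad_v = p.transpose(-2, -1) @ grad_out
        grad_p = grad_out @ v.transpose(-2, -1)
        # Softmax backward: dS = P * (dP - sum(dP * P) over the last dim).
        grad_s = p * (grad_p - (grad_p * p).sum(dim=-1, keepdim=True))
        grad_q = grad_s @ k * ctx.scale
        grad_k = grad_s.transpose(-2, -1) @ q * ctx.scale
        return grad_q, grad_k, grad_v
```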
Testing: All `pytest` cases have been updated to validate both the forward and backward passes against a reference PyTorch implementation.
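As a rough illustration of the test shape only, the case below compares forward outputs and input gradients against `torch.nn.functional.scaled_dot_product_attention`. It reuses the `AttentionFn` sketch above as a stand-in for the Triton module, so the names, shapes, and tolerances are assumptions rather than the PR's actual test code.

```python
import pytest
import torch
import torch.nn.functional as F


@pytest.mark.parametrize("seq_len", [16, 64])
def test_attention_matches_reference(seq_len):
    torch.manual_seed(0)
    batch, heads, d = 2, 4, 32
    q = torch.randn(batch, heads, seq_len, d, requires_grad=True)
    k = torch.randn(batch, heads, seq_len, d, requires_grad=True)
    v = torch.randn(batch, heads, seq_len, d, requires_grad=True)
    q_ref, k_ref, v_ref = (t.detach().clone().requires_grad_(True) for t in (q, k, v))

    # Forward: custom path vs. PyTorch reference (both non-causal here).
    out = AttentionFn.apply(q, k, v)
    ref = F.scaled_dot_product_attention(q_ref, k_ref, v_ref)
    torch.testing.assert_close(out, ref, atol=1e-5, rtol=1e-5)

    # Backward: push the same upstream gradient through both paths.
    grad = torch.randn_like(out)
    out.backward(grad)
    ref.backward(grad)
    torch.testing.assert_close(q.grad, q_ref.grad, atol=1e-5, rtol=1e-5)
    torch.testing.assert_close(k.grad, k_ref.grad, atol=1e-5, rtol=1e-5)
    torch.testing.assert_close(v.grad, v_ref.grad, atol=1e-5, rtol=1e-5)
```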