Conversation

yibozhong
This PR fixes an inconsistency in MoBA. When top-k > 1, self-attention is computed with `cu_seqlens_q=self_attn_cu_seqlen`, where `self_attn_cu_seqlen = cu_chunk`, so each query attends only within its own block. When top-k = 1, however, self-attention is computed with `cu_seqlens_q=cu_seqlens`, which behaves like standard full-sequence causal attention. Since the current block is always selected regardless of top-k, the top-k = 1 corner case should also use `cu_chunk` instead of `cu_seqlens`.
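To illustrate why the choice of `cu_seqlens_q` matters, here is a minimal sketch (not MoBA's actual kernel code) that builds the boolean mask a varlen causal-attention kernel implicitly applies for a given `cu_seqlens`. The sequence length, block size, and helper function are hypothetical, chosen only to show that `cu_chunk` yields block-diagonal self-attention while `cu_seqlens` yields full causal attention:

```python
import numpy as np

def causal_mask_from_cu_seqlens(cu_seqlens, total_len):
    # Reconstruct the mask a varlen causal-attention kernel would apply:
    # each query attends only to earlier-or-equal positions inside its
    # own segment, where segments are delimited by cu_seqlens.
    mask = np.zeros((total_len, total_len), dtype=bool)
    for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:]):
        for q in range(start, end):
            mask[q, start:q + 1] = True
    return mask

# Hypothetical setup: one sequence of 8 tokens, block size 4.
seq_len, block = 8, 4
cu_seqlens = [0, seq_len]                       # whole sequence: [0, 8]
cu_chunk = list(range(0, seq_len + 1, block))   # per-block: [0, 4, 8]

full = causal_mask_from_cu_seqlens(cu_seqlens, seq_len)
blockwise = causal_mask_from_cu_seqlens(cu_chunk, seq_len)

# With cu_seqlens, token 5 can attend to token 1 in the previous block,
# but the self-attention branch should cover only the current block,
# which is exactly what cu_chunk produces.
assert full[5, 1]
assert not blockwise[5, 1] and blockwise[5, 4]
```

Under this sketch, passing `cu_seqlens` in the top-k = 1 branch silently widens the self-attention term from the current block to the whole prefix, which is the inconsistency the PR corrects.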
