Conversation

yibozhong
This PR fixes an inconsistency in MoBA. When top-k > 1, self-attention is computed with `cu_seqlens_q=self_attn_cu_seqlen`, where `self_attn_cu_seqlen = cu_chunk`, so each query attends only within its own block. When top-k = 1, however, self-attention is computed with `cu_seqlens_q=cu_seqlens`, which behaves like standard full-sequence causal attention. Since the current block is always selected regardless of top-k, the top-k = 1 corner case should also use `cu_chunk` instead of `cu_seqlens`.
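To illustrate why the choice of `cu_seqlens_q` matters, here is a minimal sketch (not MoBA's actual kernel code) that builds the boolean mask a varlen causal-attention kernel implicitly applies for a given `cu_seqlens`. The sequence length, block size, and helper function are hypothetical, chosen only to show that `cu_chunk` yields block-diagonal self-attention while `cu_seqlens` yields full causal attention:

```python
import numpy as np

def causal_mask_from_cu_seqlens(cu_seqlens, total_len):
    # Reconstruct the mask a varlen causal-attention kernel would apply:
    # each query attends only to earlier-or-equal positions inside its
    # own segment, where segments are delimited by cu_seqlens.
    mask = np.zeros((total_len, total_len), dtype=bool)
    for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:]):
        for q in range(start, end):
            mask[q, start:q + 1] = True
    return mask

# Hypothetical setup: one sequence of 8 tokens, block size 4.
seq_len, block = 8, 4
cu_seqlens = [0, seq_len]                       # whole sequence: [0, 8]
cu_chunk = list(range(0, seq_len + 1, block))   # per-block: [0, 4, 8]

full = causal_mask_from_cu_seqlens(cu_seqlens, seq_len)
blockwise = causal_mask_from_cu_seqlens(cu_chunk, seq_len)

# With cu_seqlens, token 5 can attend to token 1 in the previous block,
# but the self-attention branch should cover only the current block,
# which is exactly what cu_chunk produces.
assert full[5, 1]
assert not blockwise[5, 1] and blockwise[5, 4]
```

Under this sketch, passing `cu_seqlens` in the top-k = 1 branch silently widens the self-attention term from the current block to the whole prefix, which is the inconsistency the PR corrects.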
