@k50112113 k50112113 commented Sep 10, 2025

This PR contains Triton fusion kernel optimizations for launch-bound workloads,

including DS-V3 FP8, LL-70B (rope+kv_cache), and gpt-oss-120B (rope+kv_cache).

For GPT-OSS, the following Triton compiler change is required:

Change TRITON_HIP_PRESHUFFLE_SCALES default by cagrikymk · Pull Request #877 · ROCm/triton

@k50112113
Author

See ROCm/aiter#984

@dllehr-amd dllehr-amd merged commit b7a1826 into 355_wip Sep 11, 2025
3 of 4 checks passed