@k50112113 commented Nov 4, 2025

This PR includes optimizations for DS-R1 FP8 TP8:

  1. Add an a16w8 GEMM for o_proj for decode
  2. Add rocm_aiter_triton_qkv_a_proj_layernorm

This PR depends on ROCm/aiter#1328.
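
For readers unfamiliar with the naming, "a16w8" conventionally means 16-bit activations multiplied by 8-bit weights, with the weights rescaled during or after accumulation. The snippet below is only a rough NumPy reference of that idea, not the kernel from this PR: NumPy has no FP8 dtype, so int8 stands in for the 8-bit weights, and the per-output-channel scale, the helper names `quantize_weights_int8` and `gemm_a16w8_reference`, and the shapes are all assumptions made for illustration.

```python
import numpy as np


def quantize_weights_int8(w):
    """Hypothetical helper: symmetric per-output-channel 8-bit quantization.

    int8 is used here only as a stand-in for the FP8 weights the PR targets.
    """
    # w has shape (K, N); compute one scale per output column.
    max_abs = np.abs(w).max(axis=0, keepdims=True)
    scale = np.where(max_abs > 0, max_abs / 127.0, 1.0).astype(np.float32)
    w_q = np.clip(np.rint(w / scale), -127, 127).astype(np.int8)
    return w_q, scale


def gemm_a16w8_reference(x, w_q, scale):
    """Hypothetical reference: 16-bit activations times 8-bit weights.

    x: (M, K) float16, w_q: (K, N) int8, scale: (1, N) float32.
    """
    # Accumulate in float32, apply the weight scale, then cast back to 16-bit.
    acc = x.astype(np.float32) @ w_q.astype(np.float32)
    return (acc * scale).astype(np.float16)


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64)).astype(np.float16)   # decode-sized batch (assumed)
w = rng.standard_normal((64, 32)).astype(np.float32)  # full-precision weights (assumed)

w_q, scale = quantize_weights_int8(w)
y = gemm_a16w8_reference(x, w_q, scale)

# Unquantized result, for comparison only.
y_full = x.astype(np.float32) @ w
max_err = float(np.max(np.abs(y.astype(np.float32) - y_full)))
```

The activations stay in 16-bit and are never quantized. That is the difference from an a8w8 path, where activations would also be converted to 8-bit before the multiply.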

@k50112113 changed the title from "[355_wip] add a16w8 gemm for DS-R1 for o_proj for decode, add rocm_aiter_triton…" to "[Triton] add a16w8 gemm for DS-R1 for o_proj for decode, add rocm_aiter_triton…" on Nov 4, 2025
