[Roadmap] Kimi-K2 performance enhancement on H20 GPU #8151

### [Proposal] Kimi-K2 performance enhancement on H20 GPU

### Summary
Our current tests show that Kimi K2 performs very poorly under TP16. In the 3500-input / 1500-output token scenario, with an SLO of TTFT < 5 s and TPOT < 50 ms, total throughput per card reaches only 36 tokens/s. This plan therefore aims to quickly improve the performance of Kimi K2 on H20 hardware, fix the bugs found along the way, and document best practices.
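
For readers who want to reproduce the headline number, the sketch below shows one way the SLO check and the per-card figure could be computed from benchmark results. It is a minimal illustration, not the script used for the measurement above: the `RequestStats` type and both helper functions are hypothetical names, and it assumes that "total throughput" means input plus output tokens divided by wall-clock time and by the number of GPUs.

```python
from dataclasses import dataclass

# SLO thresholds taken from the summary above.
TTFT_SLO_S = 5.0     # time to first token, seconds
TPOT_SLO_MS = 50.0   # time per output token, milliseconds


@dataclass
class RequestStats:
    """Per-request measurements (hypothetical container for this sketch)."""
    ttft_s: float
    tpot_ms: float
    input_tokens: int
    output_tokens: int


def meets_slo(r: RequestStats) -> bool:
    """True if the request satisfies both strict SLO bounds."""
    return r.ttft_s < TTFT_SLO_S and r.tpot_ms < TPOT_SLO_MS


def per_gpu_total_throughput(requests, wall_time_s: float, num_gpus: int) -> float:
    """Input + output tokens per second per GPU.

    Assumption: this is one plausible reading of "single card total
    throughput"; the issue does not spell out the exact definition.
    """
    total_tokens = sum(r.input_tokens + r.output_tokens for r in requests)
    return total_tokens / wall_time_s / num_gpus
```

As a usage example with made-up numbers (not measurements), ten 3500/1500 requests completed in 100 s on 16 GPUs would give `per_gpu_total_throughput(reqs, 100.0, 16)` of 31.25 tokens/s per card.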

### Roadmap

- [x] Kimi k2 fuse_moe TP16 triton config on H20 @artetaout https://github.com/sgl-project/sglang/pull/8047
- [x] Kimi k2 fuse_moe TP16 triton config on H20-3e @GaoYusong https://github.com/sgl-project/sglang/pull/8021
- [x] Kimi k2 W4A8 on EP mode on H20 or H20-3e @yangsijia-serena https://github.com/sgl-project/sglang/pull/7762
- [ ] Kimi k2 W4A8 on TP mode on H20 or H20-3e @chenxijun1029  https://github.com/sgl-project/sglang/pull/8118
- [ ] Train the Kimi k2 Eagle3 model @zhangxiaolei123456
- [ ] Kimi k2 W4A8 on EP mode with DeepEP support @ayrnb
   - [ ] https://github.com/sgl-project/sglang/pull/8247
   - [ ] https://github.com/sgl-project/sglang/pull/8311 
- [ ] Kimi k2 support PD Disaggregation on H20-3e @zhangxiaolei123456
    - [x] Bugfix: PD transfer timeout when input length > 8k on H20 https://github.com/sgl-project/sglang/pull/7695
    - [ ] Feature: improve W4A8 performance on EP mode.
    - [ ] Feature: support Flux groupgemm and allreduce fusion.
- [ ] Kimi k2 W4A8 on TP mode support PD Disaggregation on H20 @Layssy
- [ ] Kimi k2 support PD Disaggregation and large scale EP on H20 @HanHan009527
- [ ] Kimi k2 W4A8 support PD Disaggregation and large scale EP on H20 (Prefill on TP, Decode on EP) @zhangxiaolei123456
