
Commit 1241987

Update 2025-09-25-gb200-part-2.md (#210)
1 parent 06ce1c1

1 file changed: +1 −1 lines changed


blog/2025-09-25-gb200-part-2.md

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ date: "September 25, 2025"
 previewImg: /images/blog/gb200_part_2/primary.png
 ---
 
-The GB200 NVL72 is one of the most powerful hardware for deep learning. In this blog post, we share our progress to optimize the inference performance of DeepSeek V3/R1 with FP8 attention, NVFP4 MoE, large-scale expert parallelism, prefill-decode disaggregation, and various other optimizations. When using FP8 attention and NVFP4 MoE, SGLang achieved 26,156 input and 13,386 output tokens per second per GPU for prefill and decode, respectively, on DeepSeek V3/R1 for 2000-token input sequences, which is a 3.8x and 4.8x speedup compared to [H100 settings](https://lmsys.org/blog/2025-05-05-large-scale-ep/). Even with traditional BF16 attention and FP8 MoE, SGLang still achieves 18,471 input and 9,087 output tokens per second. Reproduction instructions can be found [here](https://github.com/sgl-project/sglang/issues/10903).
+The GB200 NVL72 is one of the most powerful hardware for deep learning. In this blog post, we share our progress after our [previous blog post](https://lmsys.org/blog/2025-06-16-gb200-part-1/) to optimize the inference performance of DeepSeek V3/R1 with FP8 attention, NVFP4 MoE, large-scale expert parallelism, prefill-decode disaggregation, and various other optimizations. When using FP8 attention and NVFP4 MoE, SGLang achieved 26,156 input and 13,386 output tokens per second per GPU for prefill and decode, respectively, on DeepSeek V3/R1 for 2000-token input sequences, which is a 3.8x and 4.8x speedup compared to [H100 settings](https://lmsys.org/blog/2025-05-05-large-scale-ep/). Even with traditional BF16 attention and FP8 MoE, SGLang still achieves 18,471 input and 9,087 output tokens per second. Reproduction instructions can be found [here](https://github.com/sgl-project/sglang/issues/10903).
 
 **Highlights**
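
As a quick sanity check on the throughput claims in the changed paragraph, the stated 3.8x and 4.8x speedups imply rough H100 baselines. The sketch below only rearranges the numbers quoted in the diff; treating the speedups as plain per-GPU throughput ratios against the linked H100 setup is an assumption, not something the commit states.

```python
# Back-of-envelope check of the figures in the changed line above.
# Assumption (not stated in the commit): the 3.8x / 4.8x speedups are
# simple per-GPU throughput ratios against the linked H100 setup.

gb200_prefill_tok_s = 26_156  # input tokens/s/GPU (prefill, FP8 attention + NVFP4 MoE)
gb200_decode_tok_s = 13_386   # output tokens/s/GPU (decode)

implied_h100_prefill = gb200_prefill_tok_s / 3.8  # ~6,900 tokens/s/GPU
implied_h100_decode = gb200_decode_tok_s / 4.8    # ~2,800 tokens/s/GPU

print(f"implied H100 prefill baseline: {implied_h100_prefill:,.0f} tok/s/GPU")
print(f"implied H100 decode baseline:  {implied_h100_decode:,.0f} tok/s/GPU")
```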
