blog/2025-09-25-gb200-part-2.md (1 addition & 1 deletion)
@@ -52,7 +52,7 @@ As a remark, the end-to-end performance differences between the high-precision a
### Zoom into Low-precision Kernels
-In this subsection, we examine the effects when changing from standard precision to low precision kernels. More specifically, we consider both the attention kernel and the GEMM kernels. For the latter, we consider the gate-up GEMM in MoE, the down GEMM in MoE, as well as the output projection GEMM which is in attention but is also time-consuming. For simplicity, we only consider one typical case. For simplicity, we only consider one typical case.
+In this subsection, we examine the effects when changing from standard precision to low precision kernels. More specifically, we consider both the attention kernel and the GEMM kernels. For the latter, we consider the gate-up GEMM in MoE, the down GEMM in MoE, as well as the output projection GEMM which is in attention but is also time-consuming. For simplicity, we only consider one typical case.
As can be seen in the figure below, lowering the precision speeds up the related kernels to a great extent. For the case under test, attention is 1.8x faster and GEMM is up to 1.9x faster. Another improvement, which is not visible from the kernel perspective, is the increased number of KV cache tokens, which leads to larger batch sizes and thus improved performance.
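To get a rough feel for what such a per-kernel comparison looks like in practice, here is a minimal microbenchmark sketch. It assumes a recent PyTorch build (2.4+) on an FP8-capable GPU; the GEMM shape and the use of the private `torch._scaled_mm` API are illustrative assumptions only, not the kernels or harness behind the numbers quoted above.

```python
import torch

def bench_ms(fn, warmup=20, iters=100):
    """Average wall time of `fn` in milliseconds, measured with CUDA events."""
    for _ in range(warmup):
        fn()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

# Hypothetical GEMM shape, standing in for one of the MoE / projection GEMMs.
M, K, N = 4096, 7168, 4096

a_bf16 = torch.randn(M, K, device="cuda", dtype=torch.bfloat16)
b_bf16 = torch.randn(K, N, device="cuda", dtype=torch.bfloat16)

# FP8 operands: torch._scaled_mm expects a row-major A, a column-major B,
# and float32 scales (per-tensor scales of 1.0 here, since only speed matters).
a_fp8 = a_bf16.to(torch.float8_e4m3fn)
b_fp8 = b_bf16.t().contiguous().to(torch.float8_e4m3fn).t()
one = torch.ones((), device="cuda", dtype=torch.float32)

t_bf16 = bench_ms(lambda: a_bf16 @ b_bf16)
t_fp8 = bench_ms(lambda: torch._scaled_mm(
    a_fp8, b_fp8, scale_a=one, scale_b=one, out_dtype=torch.bfloat16))

print(f"BF16 GEMM: {t_bf16:.3f} ms | FP8 GEMM: {t_fp8:.3f} ms | "
      f"speedup: {t_bf16 / t_fp8:.2f}x")
```

The observed ratio from such an isolated microbenchmark will not match end-to-end serving numbers exactly, since the latter also reflect the larger KV cache and batch sizes that low precision enables.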