
Commit edb9ff7

Update 2025-09-25-gb200-part-2.md (#213)
1 parent 2fccf4a commit edb9ff7

File tree

1 file changed: +1 −1 lines changed


blog/2025-09-25-gb200-part-2.md

Lines changed: 1 addition & 1 deletion
@@ -52,7 +52,7 @@ As a remark, the end-to-end performance differences between the high-precision a
 
 ### Zoom into Low-precision Kernels
 
-In this subsection, we examine the effects when changing from standard precision to low precision kernels. More specifically, we consider both the attention kernel and the GEMM kernels. For the latter, we consider the gate-up GEMM in MoE, the down GEMM in MoE, as well as the output projection GEMM which is in attention but is also time-consuming. For simplicity, we only consider one typical case. For simplicity, we only consider one typical case.
+In this subsection, we examine the effects when changing from standard precision to low precision kernels. More specifically, we consider both the attention kernel and the GEMM kernels. For the latter, we consider the gate-up GEMM in MoE, the down GEMM in MoE, as well as the output projection GEMM which is in attention but is also time-consuming. For simplicity, we only consider one typical case.
 
 As can be seen in the figure below, lowering the precision speeds up the related kernels to a great extent. For the case under test, attention is 1.8x faster and GEMM is up to 1.9x faster. Another improvement, which is not visible from the kernel perspective, is the increased number of KV cache tokens, which leads to larger batch sizes and thus improved performance.
 
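The unchanged context paragraph quotes kernel-level gains (attention about 1.8x, GEMM up to 1.9x faster). How much of that shows up end to end depends on the share of step time those kernels occupy. Below is a minimal Amdahl-style estimate in Python; the time fractions are illustrative assumptions made for this sketch, not measurements from the post.

```python
# Back-of-envelope estimate of end-to-end speedup from per-kernel speedups.
# The kernel speedups (1.8x attention, up to 1.9x GEMM) come from the paragraph
# above; the time fractions below are assumed for illustration only.

def end_to_end_speedup(fractions, speedups):
    """Amdahl-style estimate: 'fractions' gives each component's share of the
    original step time, 'speedups' the factor applied to that component."""
    new_time = sum(f / speedups.get(name, 1.0) for name, f in fractions.items())
    return 1.0 / new_time

# Assumed breakdown of one decode step (must sum to 1.0) -- hypothetical values.
fractions = {"attention": 0.35, "gemm": 0.45, "other": 0.20}
speedups = {"attention": 1.8, "gemm": 1.9}

print(f"estimated end-to-end speedup: {end_to_end_speedup(fractions, speedups):.2f}x")
# ~1.58x under these assumptions; the larger KV cache (bigger batches) noted in
# the paragraph would add further throughput gains on top of this.
```

Under these assumed fractions the kernel gains alone give roughly a 1.6x step-time improvement, which is consistent with the paragraph's point that the full benefit also comes from the enlarged KV cache rather than the kernels alone.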