blog/2025-09-25-gb200-part-2.md (1 addition & 1 deletion)
@@ -52,7 +52,7 @@ As a remark, the end-to-end performance differences between the high-precision a
### Zoom into Low-precision Kernels
-In this subsection, we examine the effects when changing from standard precision to low precision kernels. More specifically, we consider both the attention kernel and the GEMM kernels. For the latter, we consider the gate-up GEMM in MoE, the down GEMM in MoE, as well as the output projection GEMM which is in attention but is also time-consuming. For simplicity, we only consider one typical case. For simplicity, we only consider one typical case.
+In this subsection, we examine the effects when changing from standard precision to low precision kernels. More specifically, we consider both the attention kernel and the GEMM kernels. For the latter, we consider the gate-up GEMM in MoE, the down GEMM in MoE, as well as the output projection GEMM which is in attention but is also time-consuming. For simplicity, we only consider one typical case.
As can be seen in the figure below, lowering the precision speeds up the related kernels to a great extent. For the case under test, attention is 1.8x faster and GEMM is up to 1.9x faster. Another improvement, which is not visible from the kernel perspective, is the increased number of KV cache tokens, which leads to larger batch sizes and thus improved performance.
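To get a rough feel for what such a per-kernel comparison looks like in practice, here is a minimal microbenchmark sketch. It assumes a recent PyTorch build (2.4+) on an FP8-capable GPU; the GEMM shape and the use of the private `torch._scaled_mm` API are illustrative assumptions only, not the kernels or harness behind the numbers quoted above.

```python
import torch

def bench_ms(fn, warmup=20, iters=100):
    """Average wall time of `fn` in milliseconds, measured with CUDA events."""
    for _ in range(warmup):
        fn()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

# Hypothetical GEMM shape, standing in for one of the MoE / projection GEMMs.
M, K, N = 4096, 7168, 4096

a_bf16 = torch.randn(M, K, device="cuda", dtype=torch.bfloat16)
b_bf16 = torch.randn(K, N, device="cuda", dtype=torch.bfloat16)

# FP8 operands: torch._scaled_mm expects a row-major A, a column-major B,
# and float32 scales (per-tensor scales of 1.0 here, since only speed matters).
a_fp8 = a_bf16.to(torch.float8_e4m3fn)
b_fp8 = b_bf16.t().contiguous().to(torch.float8_e4m3fn).t()
one = torch.ones((), device="cuda", dtype=torch.float32)

t_bf16 = bench_ms(lambda: a_bf16 @ b_bf16)
t_fp8 = bench_ms(lambda: torch._scaled_mm(
    a_fp8, b_fp8, scale_a=one, scale_b=one, out_dtype=torch.bfloat16))

print(f"BF16 GEMM: {t_bf16:.3f} ms | FP8 GEMM: {t_fp8:.3f} ms | "
      f"speedup: {t_bf16 / t_fp8:.2f}x")
```

The observed ratio from such an isolated microbenchmark will not match end-to-end serving numbers exactly, since the latter also reflect the larger KV cache and batch sizes that low precision enables.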