Skip to content

Commit 068105f

Browse files
Update docs
1 parent 4200bdf commit 068105f

File tree

185 files changed

+3330
-35
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

185 files changed

+3330
-35
lines changed

_sources/autoapi/tilelang/intrinsics/index.rst.txt

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,8 @@ Submodules
1414
/autoapi/tilelang/intrinsics/mfma_macro_generator/index
1515
/autoapi/tilelang/intrinsics/mma_layout/index
1616
/autoapi/tilelang/intrinsics/mma_macro_generator/index
17+
/autoapi/tilelang/intrinsics/mma_sm70_layout/index
18+
/autoapi/tilelang/intrinsics/mma_sm70_macro_generator/index
1719
/autoapi/tilelang/intrinsics/tcgen05_macro_generator/index
1820
/autoapi/tilelang/intrinsics/utils/index
1921
/autoapi/tilelang/intrinsics/wgmma_macro_generator/index
Lines changed: 40 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,40 @@
1+
tilelang.intrinsics.mma_sm70_layout
2+
===================================
3+
4+
.. py:module:: tilelang.intrinsics.mma_sm70_layout
5+
6+
7+
Functions
8+
---------
9+
10+
.. autoapisummary::
11+
12+
tilelang.intrinsics.mma_sm70_layout.shared_16x4_to_mma_a_32x4_layout
13+
tilelang.intrinsics.mma_sm70_layout.shared_4x16_to_mma_b_32x4_layout
14+
tilelang.intrinsics.mma_sm70_layout.shared_16x4_to_mma_b_32x4_layout_trans
15+
tilelang.intrinsics.mma_sm70_layout.mma_32x8_to_shared_16x16_layout_fp32
16+
tilelang.intrinsics.mma_sm70_layout.mma_32x8_to_shared_16x16_layout_fp16
17+
tilelang.intrinsics.mma_sm70_layout.mma_load_a_32x4_to_shared_16x4_layout
18+
tilelang.intrinsics.mma_sm70_layout.mma_load_b_32x4_to_shared_16x4_layout_trans
19+
tilelang.intrinsics.mma_sm70_layout.mma_load_b_32x4_to_shared_4x16_layout
20+
21+
22+
Module Contents
23+
---------------
24+
25+
.. py:function:: shared_16x4_to_mma_a_32x4_layout(row, col, rep)
26+
27+
.. py:function:: shared_4x16_to_mma_b_32x4_layout(row, col, rep)
28+
29+
.. py:function:: shared_16x4_to_mma_b_32x4_layout_trans(row, col, rep)
30+
31+
.. py:function:: mma_32x8_to_shared_16x16_layout_fp32(thread_id, local_id)
32+
33+
.. py:function:: mma_32x8_to_shared_16x16_layout_fp16(thread_id, local_id)
34+
35+
.. py:function:: mma_load_a_32x4_to_shared_16x4_layout(thread_id, local_id)
36+
37+
.. py:function:: mma_load_b_32x4_to_shared_16x4_layout_trans(thread_id, local_id)
38+
39+
.. py:function:: mma_load_b_32x4_to_shared_4x16_layout(thread_id, local_id)
40+
Lines changed: 187 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,187 @@
1+
tilelang.intrinsics.mma_sm70_macro_generator
2+
============================================
3+
4+
.. py:module:: tilelang.intrinsics.mma_sm70_macro_generator
5+
6+
7+
Attributes
8+
----------
9+
10+
.. autoapisummary::
11+
12+
tilelang.intrinsics.mma_sm70_macro_generator.lift
13+
14+
15+
Classes
16+
-------
17+
18+
.. autoapisummary::
19+
20+
tilelang.intrinsics.mma_sm70_macro_generator.TensorCoreIntrinEmitter
21+
22+
23+
Module Contents
24+
---------------
25+
26+
.. py:data:: lift
27+
28+
.. py:class:: TensorCoreIntrinEmitter(a_dtype = 'float16', b_dtype = 'float16', accum_dtype = 'float16', a_transposed = False, b_transposed = False, block_row_warps = 2, block_col_warps = 2, warp_row_tiles = 8, warp_col_tiles = 8, chunk = 16, reduce_k = 1, num_elems_per_byte = 1, is_m_first = False, thread_var = None)
29+
30+
To eliminate Python syntax within TIR Macro.
31+
32+
33+
.. py:attribute:: M_DIM
34+
:value: 16
35+
36+
37+
38+
.. py:attribute:: n_dim
39+
:value: 16
40+
41+
42+
43+
.. py:attribute:: WARP_SIZE
44+
:value: 32
45+
46+
47+
48+
.. py:attribute:: HALF_WARP_SIZE
49+
:value: 16
50+
51+
52+
53+
.. py:attribute:: dtype_abbrv
54+
55+
56+
.. py:attribute:: is_m_first
57+
:value: False
58+
59+
60+
61+
.. py:attribute:: a_dtype
62+
:value: 'float16'
63+
64+
65+
66+
.. py:attribute:: b_dtype
67+
:value: 'float16'
68+
69+
70+
71+
.. py:attribute:: accum_dtype
72+
:value: 'float16'
73+
74+
75+
76+
.. py:attribute:: a_transposed
77+
:value: False
78+
79+
80+
81+
.. py:attribute:: b_transposed
82+
:value: False
83+
84+
85+
86+
.. py:attribute:: block_row_warps
87+
:value: 2
88+
89+
90+
91+
.. py:attribute:: block_col_warps
92+
:value: 2
93+
94+
95+
96+
.. py:attribute:: warp_row_tiles
97+
:value: 8
98+
99+
100+
101+
.. py:attribute:: warp_col_tiles
102+
:value: 8
103+
104+
105+
106+
.. py:attribute:: chunk
107+
:value: 16
108+
109+
110+
111+
.. py:attribute:: reduce_k
112+
:value: 1
113+
114+
115+
116+
.. py:attribute:: threads
117+
:value: 128
118+
119+
120+
121+
.. py:attribute:: num_elems_per_byte
122+
:value: 1
123+
124+
125+
126+
.. py:attribute:: thread_var
127+
:value: None
128+
129+
130+
131+
.. py:method:: get_thread_binding()
132+
133+
134+
.. py:method:: get_store_index_map(inverse = False)
135+
136+
137+
.. py:method:: extract_thread_binding(thread_id, is_m_first = None)
138+
139+
is_m_first: True if the thread binding is in the form of (tx, warp_n, warp_m)
140+
which represents [warp_size, block_row_warps (split n), block_col_warps (split m)]
141+
Otherwise, it is in the form of [warp_size, block_col_warps (split m), block_row_warps (split n)]
142+
143+
144+
145+
.. py:method:: ldmatrix_a(A_local_buf, A_shared_buf, ki, rk = 0)
146+
147+
148+
.. py:method:: ldmatrix_b(B_local_buf, B_shared_buf, ki, rk = 0)
149+
150+
151+
.. py:method:: mma(A_local_buf, B_local_buf, C_local_buf, k_inner = 0)
152+
153+
154+
.. py:method:: make_mma_load_layout(local_buf, matrix = 'A')
155+
156+
Create a layout function for storing MMA results into a fragment buffer.
157+
This layout is used in conjunction with `inverse_mma_store_layout` to
158+
map fragment indices to threads and local indices.
159+
160+
:param local_buf: The local buffer representing a fragment of a matrix.
161+
:type local_buf: tir.Buffer
162+
163+
:returns: A fragment object that describes how threads and indices
164+
in `local_buf` are laid out.
165+
:rtype: T.Fragment
166+
167+
:raises AssertionError: If `local_buf` is not detected to be a fragment buffer.
168+
169+
170+
171+
.. py:method:: make_mma_store_layout(local_buf)
172+
173+
Create a layout function for storing MMA results into a fragment buffer.
174+
This layout is used in conjunction with `inverse_mma_store_layout` to
175+
map fragment indices to threads and local indices.
176+
177+
:param local_buf: The local buffer representing a fragment of a matrix.
178+
:type local_buf: tir.Buffer
179+
180+
:returns: A fragment object that describes how threads and indices
181+
in `local_buf` are laid out.
182+
:rtype: T.Fragment
183+
184+
:raises AssertionError: If `local_buf` is not detected to be a fragment buffer.
185+
186+
187+

_sources/autoapi/tilelang/language/builtin/index.rst.txt

Lines changed: 55 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -55,6 +55,7 @@ Functions
5555
tilelang.language.builtin.loop_break
5656
tilelang.language.builtin.cp_async_barrier_noinc
5757
tilelang.language.builtin.tcgen05_mma_arrive
58+
tilelang.language.builtin.ptx_mma_sm70
5859

5960

6061
Module Contents
@@ -520,3 +521,57 @@ Module Contents
520521
:type mbar_ptr: PrimExpr
521522

522523

524+
.. py:function:: ptx_mma_sm70(shape, A_layout, B_layout, A_dtype, B_dtype, C_dtype, multiplicand_a, a_index, multiplicand_b, b_index, accumulator, c_index)
525+
526+
TVM intrinsic for ptx tensor core mma instructions on SM70 (Volta).
527+
528+
This intrinsic provides SM70-specific MMA operations that support m16n16k4 shape
529+
with FP16 inputs and FP16/FP32 accumulation.
530+
531+
:param shape: The shape of mma fragment (e.g., "m16n16k4").
532+
:type shape: str
533+
:param A_layout: The layout of multiplicand fragment A ("row" or "col").
534+
:type A_layout: str
535+
:param B_layout: The layout of multiplicand fragment B ("row" or "col").
536+
:type B_layout: str
537+
:param A_dtype: The data type of multiplicand fragment A (typically "fp16").
538+
:type A_dtype: str
539+
:param B_dtype: The data type of multiplicand fragment B (typically "fp16").
540+
:type B_dtype: str
541+
:param C_dtype: The data type of accumulator fragment C ("fp16" or "fp32").
542+
:type C_dtype: str
543+
:param multiplicand_a: The multiplicand fragment A variable.
544+
:type multiplicand_a: Var
545+
:param a_index: The index of multiplicand fragment A.
546+
:type a_index: Expr
547+
:param multiplicand_b: The multiplicand fragment B variable.
548+
:type multiplicand_b: Var
549+
:param b_index: The index of multiplicand fragment B.
550+
:type b_index: Expr
551+
:param accumulator: The accumulator fragment C variable.
552+
:type accumulator: Var
553+
:param c_index: The index of accumulator fragment C.
554+
:type c_index: Expr
555+
556+
:returns: **call** -- The call expression.
557+
:rtype: PrimExpr
558+
559+
.. rubric:: Examples
560+
561+
>>> T.ptx_mma_sm70(
562+
... "float16",
563+
... "m16n16k4",
564+
... "row",
565+
... "col",
566+
... "fp16",
567+
... "fp16",
568+
... "fp16",
569+
... A_local.data,
570+
... 0,
571+
... B_local.data,
572+
... 0,
573+
... C_local.data,
574+
... 0,
575+
... )
576+
577+

_sources/autoapi/tilelang/layout/swizzle/index.rst.txt

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -15,6 +15,7 @@ Functions
1515
.. autoapisummary::
1616

1717
tilelang.layout.swizzle.make_swizzled_layout
18+
tilelang.layout.swizzle.make_volta_swizzled_layout
1819
tilelang.layout.swizzle.make_wgmma_swizzled_layout
1920
tilelang.layout.swizzle.make_tcgen05mma_swizzled_layout
2021
tilelang.layout.swizzle.make_full_bank_swizzled_layout
@@ -28,6 +29,8 @@ Module Contents
2829

2930
.. py:function:: make_swizzled_layout(buffer, k_major = True, allow_pad = True)
3031
32+
.. py:function:: make_volta_swizzled_layout(buffer, is_a = True, k_inner = True)
33+
3134
.. py:function:: make_wgmma_swizzled_layout(buffer, continuity = None, k_major = True)
3235
3336
.. py:function:: make_tcgen05mma_swizzled_layout(buffer, continuity = None, k_major = True)
Lines changed: 40 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,40 @@
1+
tilelang.tileop.gemm.gemm_mma_sm70
2+
==================================
3+
4+
.. py:module:: tilelang.tileop.gemm.gemm_mma_sm70
5+
6+
7+
Classes
8+
-------
9+
10+
.. autoapisummary::
11+
12+
tilelang.tileop.gemm.gemm_mma_sm70.GemmMMASm70
13+
14+
15+
Module Contents
16+
---------------
17+
18+
.. py:class:: GemmMMASm70
19+
20+
Bases: :py:obj:`tilelang.tileop.gemm.gemm_base.GemmBase`
21+
22+
23+
.. py:method:: infer_layout(target, thread_nums)
24+
25+
26+
.. py:method:: lower(layout_map, target, thread_nums, thread_var)
27+
28+
29+
.. py:method:: is_gemm_ss()
30+
31+
32+
.. py:method:: is_gemm_sr()
33+
34+
35+
.. py:method:: is_gemm_rs()
36+
37+
38+
.. py:method:: is_gemm_rr()
39+
40+

_sources/autoapi/tilelang/tileop/gemm/index.rst.txt

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,7 @@ Submodules
1313
/autoapi/tilelang/tileop/gemm/gemm_base/index
1414
/autoapi/tilelang/tileop/gemm/gemm_mfma/index
1515
/autoapi/tilelang/tileop/gemm/gemm_mma/index
16+
/autoapi/tilelang/tileop/gemm/gemm_mma_sm70/index
1617
/autoapi/tilelang/tileop/gemm/gemm_tcgen05/index
1718
/autoapi/tilelang/tileop/gemm/gemm_wgmma/index
1819

0 commit comments

Comments
 (0)