@@ -14,8 +14,9 @@ params:
1414## Introduction
1515
1616Today HLSL (DXIL) validation enforces a fixed upper limit of 32 KB
17- of group shared memory per thread group for compute, mesh, and amplification
18- shaders. Modern GPU architectures often expose substantially larger physically
17+ of group shared memory per thread group for Compute, and Amplification
18+ Shaders with Mesh shaders being limited to 28 KB. Modern GPU architectures
19+ often expose substantially larger physically
1920available shared memory, and practical algorithms (e.g. large tile / cluster
2021culling, large matrix manipulation, software raster bins, wave-cooperative BVH
2122traversal, etc.) are constrained by the fixed specification limit rather than
@@ -43,8 +44,8 @@ nothing).
4344Introduce two core pieces:
4445
45461 . A runtime API query returning ` MaxGroupSharedMemoryPerGroup ` (in bytes).
46- - This will return a minimum value of 32,768 i.e the default for SM 6.9 and
47- prior
47+ - This will return a value at minimum equal to the existing limits in SM 6.9
48+ and prior i.e. 32k for CS and AS and 28k for Mesh Shaders.
4849 - There is no defined maximum value.
4950 - Values must be 4 byte aligned.
50512 . A new optional entry-point attribute allowing a shader author to declare the
@@ -109,8 +110,8 @@ actual usage exceeds that.
109110
110111### Runtime Validation
111112* If ` GroupSharedLimit ` is omitted, validation will fall back to the original
112- 32k limit. The error message will be update to indicate that the limit may be
113- raised with the caveat that hardware support must be checked.
113+ 32k limit (28k for MS) . The error message will be updated to indicate that the
114+ limit may be raised with the caveat that hardware support must be checked.
114115* If ` GroupSharedLimit ` is present, HLSL validation will ensure the actual
115116static usage is less than that limit. While a shader may pass validation and
116117compile successfully the runtime may reject it if the shared memory usage is
@@ -161,7 +162,7 @@ Validator must:
161162* Sum byte sizes of all groupshared globals (respect alignment / padding like
162163today).
163164* Check attribute presence & argument correctness.
164- * Ensure intrinsic appears only in compute/mesh/amplification and SM >= 6.10.
165+ * Ensure attribute appears only in compute/mesh/amplification and SM >= 6.10.
165166* Emit / retain static usage metadata (existing) for runtime comparison against
166167device capability.
167168
@@ -170,24 +171,24 @@ device capability.
170171#### Capability Bit / Query
171172
172173Add a new feature query (illustrative naming):
173- * D3D12: ` D3D12_FEATURE_DATA_D3D12_OPTIONS_XX::MaxGroupSharedMemoryPerGroup `
174+ * D3D12: ` D3D12_FEATURE_DATA_D3D12_OPTIONS_XX::MaxGroupSharedMemoryPerGroupCSAS `
174175 - Value declares the maximum group shared memory in bytes per thread group
176+ for Compute and Amplification Shaders.
175177 - Must be >= 32,768 and 4 byte aligned
178+ * D3D12: ` D3D12_FEATURE_DATA_D3D12_OPTIONS_XX::MaxGroupSharedMemoryPerGroupMS `
179+ - Value declares the maximum group shared memory in bytes per thread group
180+ for Mesh Shaders.
181+ - Must be >= 28,672 and 4 byte aligned
176182
177183#### Pipeline Compilation / Load
178184* Runtime compares shader static usage versus device capacity.
179185* Failure path mirrors existing shader model mismatch failures.
180186
181- ### Device Capability
182-
183- * When targeting Shader Model 6.10 drivers must return a value for
184- ` MaxGroupSharedMemoryPerGroup ` greater than or equal to 32,768.
185-
186187## Testing
187188
188189Testing matrix axes:
189190* Stages: compute, mesh, amplification.
190- * Capacities: 0 - 32 KB, 48 KB, 64 KB, 96 KB, 128 KB.
191+ * Capacities: 0 - 32/28 KB, 48 KB, 64 KB, 96 KB, 128 KB.
191192* Attribute: absent vs present (below, equal, above static usage; above
192193capacity).
193194
0 commit comments