Commit f740250

JoeCitizen and Jack Elliott authored
[0049] Differentiate Mesh Shaders from Compute and Amplification Shaders for Group Shared usage (#722)
Mesh Shaders currently have a group shared memory limit of 28k vs the 32k of Compute and Amp shaders. The runtime will need to report separate limits to account for this difference.

Addresses issue #721

Co-authored-by: Jack Elliott <[email protected]>
1 parent b2d7514 commit f740250

2 files changed: +18 -14 lines changed

.vscode/settings.json

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+{
+    "editor.rulers": [80]
+}

proposals/0049-variable-groupshared-memory.md

Lines changed: 15 additions & 14 deletions
@@ -14,8 +14,9 @@ params:
 ## Introduction
 
 Today HLSL (DXIL) validation enforces a fixed upper limit of 32 KB
-of group shared memory per thread group for compute, mesh, and amplification
-shaders. Modern GPU architectures often expose substantially larger physically
+of group shared memory per thread group for Compute, and Amplification
+Shaders with Mesh shaders being limited to 28 KB. Modern GPU architectures
+often expose substantially larger physically
 available shared memory, and practical algorithms (e.g. large tile / cluster
 culling, large matrix manipulation, software raster bins, wave-cooperative BVH
 traversal, etc.) are constrained by the fixed specification limit rather than
@@ -43,8 +44,8 @@ nothing).
 Introduce two core pieces:
 
 1. A runtime API query returning `MaxGroupSharedMemoryPerGroup` (in bytes).
-   - This will return a minimum value of 32,768 i.e the default for SM 6.9 and
-     prior
+   - This will return a value at minimum equal to the existing limits in SM 6.9
+     and prior i.e. 32k for CS and AS and 28k for Mesh Shaders.
    - There is no defined maximum value.
    - Values must be 4 byte aligned.
 2. A new optional entry-point attribute allowing a shader author to declare the
@@ -109,8 +110,8 @@ actual usage exceeds that.
 
 ### Runtime Validation
 * If `GroupSharedLimit` is omitted, validation will fall back to the original
-  32k limit. The error message will be update to indicate that the limit may be
-  raised with the caveat that hardware support must be checked.
+  32k limit (28k for MS). The error message will be updated to indicate that the
+  limit may be raised with the caveat that hardware support must be checked.
 * If `GroupSharedLimit` is present, HLSL validation will ensure the actual
   static usage is less than that limit. While a shader may pass validation and
   compile successfully the runtime may reject it if the shared memory usage is
@@ -161,7 +162,7 @@ Validator must:
 * Sum byte sizes of all groupshared globals (respect alignment / padding like
   today).
 * Check attribute presence & argument correctness.
-* Ensure intrinsic appears only in compute/mesh/amplification and SM >= 6.10.
+* Ensure attribute appears only in compute/mesh/amplification and SM >= 6.10.
 * Emit / retain static usage metadata (existing) for runtime comparison against
   device capability.
 
@@ -170,24 +171,24 @@ device capability.
 #### Capability Bit / Query
 
 Add a new feature query (illustrative naming):
-* D3D12: `D3D12_FEATURE_DATA_D3D12_OPTIONS_XX::MaxGroupSharedMemoryPerGroup`
+* D3D12: `D3D12_FEATURE_DATA_D3D12_OPTIONS_XX::MaxGroupSharedMemoryPerGroupCSAS`
   - Value declares the maximum group shared memory in bytes per thread group
+    for Compute and Amplification Shaders.
   - Must be >= 32,768 and 4 byte aligned
+* D3D12: `D3D12_FEATURE_DATA_D3D12_OPTIONS_XX::MaxGroupSharedMemoryPerGroupMS`
+  - Value declares the maximum group shared memory in bytes per thread group
+    for Mesh Shaders.
+  - Must be >= 28,672 and 4 byte aligned
 
 #### Pipeline Compilation / Load
 * Runtime compares shader static usage versus device capacity.
 * Failure path mirrors existing shader model mismatch failures.
 
-### Device Capability
-
-* When targeting Shader Model 6.10 drivers must return a value for
-  `MaxGroupSharedMemoryPerGroup` greater than or equal to 32,768.
-
 ## Testing
 
 Testing matrix axes:
 * Stages: compute, mesh, amplification.
-* Capacities: 0 - 32 KB, 48 KB, 64 KB, 96 KB, 128 KB.
+* Capacities: 0 - 32/28 KB, 48 KB, 64 KB, 96 KB, 128 KB.
 * Attribute: absent vs present (below, equal, above static usage; above
   capacity).
 
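
The per-stage limits this change introduces can be sketched in C++ as follows. This is a minimal illustration, not the validator's or the runtime's actual code: `Stage`, `GroupSharedCaps`, `LegacyLimit`, `CapsAreValid`, `PassesValidation` and `PassesRuntime` are hypothetical names, and only the byte values (32,768 for Compute and Amplification Shaders, 28,672 for Mesh Shaders) and the 4-byte alignment rule come from the proposal text. The sketch also assumes that usage equal to the limit is accepted; the proposal says usage must be "less than" the limit while the test matrix lists an "equal" case, so the exact boundary is an open assumption here.

```cpp
#include <cstdint>

// Hypothetical names throughout; none of these are D3D12 or DXC identifiers.
enum class Stage { Compute, Amplification, Mesh };

// Limits for SM 6.9 and prior, in bytes, as stated in the proposal.
constexpr std::uint32_t kLegacyLimitCSAS = 32 * 1024;  // 32,768
constexpr std::uint32_t kLegacyLimitMS   = 28 * 1024;  // 28,672

// Fallback limit used when no GroupSharedLimit attribute is present.
constexpr std::uint32_t LegacyLimit(Stage s) {
    return s == Stage::Mesh ? kLegacyLimitMS : kLegacyLimitCSAS;
}

// Stand-in for the two proposed query fields (CSAS and MS variants).
struct GroupSharedCaps {
    std::uint32_t maxPerGroupCSAS;
    std::uint32_t maxPerGroupMS;
};

// A reported capacity must be at least the legacy limit and 4-byte aligned.
constexpr bool CapsAreValid(const GroupSharedCaps& c) {
    return c.maxPerGroupCSAS >= kLegacyLimitCSAS && c.maxPerGroupCSAS % 4 == 0 &&
           c.maxPerGroupMS >= kLegacyLimitMS && c.maxPerGroupMS % 4 == 0;
}

// Compile-time check: compare static usage against the declared attribute
// value if present, otherwise against the per-stage legacy limit.
// Assumption: usage equal to the limit passes.
constexpr bool PassesValidation(Stage s, std::uint32_t staticBytes,
                                bool hasAttribute, std::uint32_t attributeLimit) {
    return staticBytes <= (hasAttribute ? attributeLimit : LegacyLimit(s));
}

// Pipeline-creation check: compare static usage against the device capacity
// reported for the shader's stage.
constexpr bool PassesRuntime(Stage s, std::uint32_t staticBytes,
                             const GroupSharedCaps& c) {
    return staticBytes <=
           (s == Stage::Mesh ? c.maxPerGroupMS : c.maxPerGroupCSAS);
}
```

Under these assumptions, a mesh shader using 30,000 bytes with no attribute fails validation, while a compute shader with the same usage passes. Declaring a larger limit lets the mesh shader compile, but it can still be rejected at pipeline creation on a device that reports only the 28,672-byte minimum for mesh shaders.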
