[LoadStoreToLLVM] Refactor the 2D block load lowering. #4615
This needs to wait for the reland of the block store code in PR #4646.
third_party/intel/lib/TritonIntelGPUToLLVM/LoadStoreOpToLLVM.cpp
Pull Request Overview
This PR refactors the 2D block load lowering implementation in the LoadStoreOpToLLVM pass by transitioning from a DPAS-specific approach to using linear layout. The refactoring simplifies the code structure while maintaining functionality for 2D block I/O operations.
Key changes include:
- Replaced complex DPAS-specific calculations with linear layout-based tile size determination
- Simplified load operation generation by using register mapping from linear layout
- Streamlined the code flow and reduced complexity in the load conversion logic
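For readers unfamiliar with the term, a linear layout maps each bit of an input index (for example a register index) to a basis vector of output coordinates and combines the selected bases with XOR. The following is a minimal sketch of that idea only; `Basis` and `applyLayout` are hypothetical names, and the bases in the usage note are made up rather than taken from this PR.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 2D basis vector: the {row, col} offset contributed by one input bit.
using Basis = std::array<int32_t, 2>;

// Apply a linear layout over GF(2): XOR together the bases selected by the
// set bits of `index`. Bit i of `index` selects bases[i].
Basis applyLayout(const std::vector<Basis> &bases, uint32_t index) {
  Basis out = {0, 0};
  for (std::size_t bit = 0; bit < bases.size(); ++bit) {
    if (index & (1u << bit)) {
      out[0] ^= bases[bit][0];
      out[1] ^= bases[bit][1];
    }
  }
  return out;
}
```

With the illustrative bases `{0,1}`, `{0,2}`, `{1,0}`, index 5 (bits 0 and 2 set) maps to row 1, column 1. Deriving tile sizes from such register bases is what the refactoring describes; the real code in the pass may differ substantially from this sketch.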
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| LoadStoreOpToLLVM.cpp | Refactored 2D block load lowering to use linear layout instead of DPAS-specific logic |
| test_block_store.py | Updated test to include block load operations and verify their generation |
auto [tileHeight, tileWidth, numPackedVals, vBlocks, rowDim, colDim,
      regPackedBases] =
[nitpick] Using structured bindings with auto can make the code less readable when the types are not obvious. Consider using explicit variable declarations or adding a comment explaining what getBlockIOTileSize returns.
Suggested change:
- auto [tileHeight, tileWidth, numPackedVals, vBlocks, rowDim, colDim,
-       regPackedBases] =
+ int tileHeight, tileWidth, numPackedVals, vBlocks, rowDim, colDim;
+ std::vector<int> regPackedBases;
+ std::tie(tileHeight, tileWidth, numPackedVals, vBlocks, rowDim, colDim,
+          regPackedBases) =
Copilot uses AI. Check for mistakes.
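The two styles under discussion can be compared in a small standalone sketch. `getTileInfo` is a hypothetical stand-in, not the real `getBlockIOTileSize`, whose actual return type is not shown in this thread, so the element types below are assumptions.

```cpp
#include <tuple>
#include <vector>

// Hypothetical helper returning several values at once.
std::tuple<int, int, std::vector<int>> getTileInfo() {
  return std::make_tuple(8, 16, std::vector<int>{1, 2, 4});
}

// Structured bindings: names are introduced and types deduced in one statement.
int tileAreaWithBindings() {
  auto [tileHeight, tileWidth, bases] = getTileInfo();
  return tileHeight * tileWidth;
}

// std::tie: types are spelled out at the call site, but each one must be
// assignable from the corresponding element of the returned tuple.
int tileAreaWithTie() {
  int tileHeight, tileWidth;
  std::vector<int> bases;
  std::tie(tileHeight, tileWidth, bases) = getTileInfo();
  return tileHeight * tileWidth;
}
```

Both functions compute the same value. The `std::tie` form makes the types visible but requires default-constructible variables and has to be kept in sync with the callee by hand, which is one reason a short comment on the structured binding may be the lighter fix.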
unsigned totalBytesPerRowPerMatrix = tileWidth * packedElemSizeInBits / 8;
vBlocks = std::min(vBlocks, (int)(64 / totalBytesPerRowPerMatrix));
vBlocks = std::min(4, vBlocks);
// HW issue for vblock = 4
This comment is unclear and doesn't explain what the hardware issue is or why vBlocks is set to 1 when it equals 4. Consider adding more context about the specific hardware limitation.
Suggested change:
- // HW issue for vblock = 4
+ // Due to a hardware limitation, configurations where vBlocks equals 4
+ // are not supported. This issue arises because the hardware cannot handle
+ // 2D block loads or stores with this specific configuration. To work around
+ // this limitation, vBlocks is set to 1 when it equals 4.
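The arithmetic of the clamp quoted above can be traced with a standalone sketch. `clampVBlocks` is a hypothetical wrapper, and the final fallback from 4 to 1 is inferred from the review comment rather than copied from the source file.

```cpp
#include <algorithm>

// Sketch of the vBlocks clamp under review. Assumes
// tileWidth * packedElemSizeInBits >= 8, so the divisor below is non-zero.
int clampVBlocks(int vBlocks, int tileWidth, int packedElemSizeInBits) {
  unsigned totalBytesPerRowPerMatrix = tileWidth * packedElemSizeInBits / 8;
  // Limit vBlocks so that vBlocks * bytes-per-row does not exceed 64.
  vBlocks = std::min(vBlocks, (int)(64 / totalBytesPerRowPerMatrix));
  // Never use more than 4 blocks.
  vBlocks = std::min(4, vBlocks);
  // HW issue for vblock = 4: fall back to a single block (assumed behavior,
  // taken from the review comment).
  return vBlocks == 4 ? 1 : vBlocks;
}
```

For example, a tile 16 elements wide with 16-bit packed elements has 32 bytes per row, so a request for 4 blocks is reduced to 2. A tile 8 elements wide with 16-bit elements has 16 bytes per row, so 4 blocks would fit, and the sketch then falls back to 1.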
unsigned opsPerChannel = dpasLayout.getOpsPerChannel();
if ((opsPerChannel == 4 && elemSizeInBits == 8) ||
    (opsPerChannel == 2 && elemSizeInBits == 16)) {
  // Use the VNNI packing format for DotOp B layout.
Remove commented-out code. If this assignment is needed for future reference, consider adding a TODO comment explaining why it's preserved.
Suggested change:
- // Use the VNNI packing format for DotOp B layout.
+ // Use the VNNI packing format for DotOp B layout.
+ // TODO: Retain this line for reference in case packedType needs to be explicitly set to i32_ty in future updates.
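The condition guarding the VNNI path can be isolated as a small predicate. This is only a sketch that mirrors the quoted condition; `usesVnniPacking` is a hypothetical name. In both accepted cases the packed group occupies 32 bits (4 × 8 or 2 × 16), which is consistent with the `i32_ty` mentioned in the suggested TODO.

```cpp
// Mirrors the condition quoted in the snippet under review.
// Returns true when elements are grouped into a 32-bit VNNI-packed value:
// four 8-bit elements or two 16-bit elements per channel.
bool usesVnniPacking(unsigned opsPerChannel, unsigned elemSizeInBits) {
  return (opsPerChannel == 4 && elemSizeInBits == 8) ||
         (opsPerChannel == 2 && elemSizeInBits == 16);
}
```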
assert(maskElems.size() == otherElems.size() &&
       "Invalid size of the masks.");
This assertion compares maskElems.size() with otherElems.size(), but otherElems may be empty when there's no 'other' value. This could cause a false assertion failure when mask is provided but other is not.
Suggested change:
- assert(maskElems.size() == otherElems.size() &&
-        "Invalid size of the masks.");
+ assert((otherElems.empty() || maskElems.size() == otherElems.size()) &&
+        "Invalid size of the masks: maskElems and otherElems sizes do not match.");
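The difference between the original and the relaxed check can be seen in a standalone sketch. `maskAndOtherSizesAgree` is a hypothetical helper that takes plain sizes instead of the pass's element vectors; it illustrates the suggested predicate only and does not confirm whether `otherElems` can actually be empty at that point in the real code.

```cpp
#include <cstddef>

// Relaxed check from the suggestion: an absent 'other' operand (size 0) is
// accepted, otherwise the mask and 'other' element counts must match.
bool maskAndOtherSizesAgree(std::size_t numMaskElems, std::size_t numOtherElems) {
  return numOtherElems == 0 || numMaskElems == numOtherElems;
}
```

A load with 8 mask elements and no `other` value passes this check, whereas the strict equality in the original assertion would reject it.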
Signed-off-by: Lu,Chengjun <[email protected]>
[LoadStoreOpToLLVM] Refactor block load lowering of tt.load with tensor pointer.
Signed-off-by: Lu,Chengjun <[email protected]>
Refactor the 2D block IO lowering for regular pointers by using the linear layout.