[LoadStoreToLLVM] Refactor the 2D block load lowering. #4615

Open. Wants to merge 1 commit into base branch main.
Conversation


@chengjunlu chengjunlu commented Jul 4, 2025

Refactor the 2D block IO lowering for regular pointers by using the linear layout.

@chengjunlu chengjunlu force-pushed the chengjun/refactor_block_io_load branch from 5ac88f5 to 719c526 Compare July 9, 2025 03:31
@chengjunlu

Need to wait for the reland of the block store code in PR #4646.

@chengjunlu chengjunlu force-pushed the chengjun/refactor_block_io_load branch from 719c526 to d725f19 Compare July 9, 2025 03:36
@chengjunlu chengjunlu marked this pull request as draft July 9, 2025 07:29
@chengjunlu chengjunlu force-pushed the chengjun/refactor_block_io_load branch 3 times, most recently from 5689429 to 3cce958 Compare July 24, 2025 03:28
@chengjunlu chengjunlu marked this pull request as ready for review July 24, 2025 03:28
@chengjunlu chengjunlu changed the title from "[DRAFT] Refactor the 2D block load lowering." to "[LoadStoreToLLVM] Refactor the 2D block load lowering." Jul 24, 2025
@chengjunlu chengjunlu force-pushed the chengjun/refactor_block_io_load branch from 3cce958 to 9308fcf Compare July 24, 2025 04:24

@Copilot Copilot AI left a comment


Pull Request Overview

This PR refactors the 2D block load lowering implementation in the LoadStoreOpToLLVM pass by transitioning from a DPAS-specific approach to using linear layout. The refactoring simplifies the code structure while maintaining functionality for 2D block I/O operations.

Key changes include:

  • Replaced complex DPAS-specific calculations with linear layout-based tile size determination
  • Simplified load operation generation by using register mapping from linear layout
  • Streamlined the code flow and reduced complexity in the load conversion logic

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

Changed files:

  • LoadStoreOpToLLVM.cpp: Refactored 2D block load lowering to use linear layout instead of DPAS-specific logic
  • test_block_store.py: Updated test to include block load operations and verify their generation

Comment on lines +1804 to +1805
auto [tileHeight, tileWidth, numPackedVals, vBlocks, rowDim, colDim,
regPackedBases] =
Copilot AI Jul 24, 2025


[nitpick] Using structured bindings with auto can make the code less readable when the types are not obvious. Consider using explicit variable declarations or adding a comment explaining what getBlockIOTileSize returns.

Suggested change
auto [tileHeight, tileWidth, numPackedVals, vBlocks, rowDim, colDim,
regPackedBases] =
int tileHeight, tileWidth, numPackedVals, vBlocks, rowDim, colDim;
std::vector<int> regPackedBases;
std::tie(tileHeight, tileWidth, numPackedVals, vBlocks, rowDim, colDim,
regPackedBases) =


unsigned totalBytesPerRowPerMatrix = tileWidth * packedElemSizeInBits / 8;
vBlocks = std::min(vBlocks, (int)(64 / totalBytesPerRowPerMatrix));
vBlocks = std::min(4, vBlocks);
// HW issue for vblock = 4
Copilot AI Jul 24, 2025


This comment is unclear and doesn't explain what the hardware issue is or why vBlocks is set to 1 when it equals 4. Consider adding more context about the specific hardware limitation.

Suggested change
// HW issue for vblock = 4
// Due to a hardware limitation, configurations where vBlocks equals 4
// are not supported. This issue arises because the hardware cannot handle
// 2D block loads or stores with this specific configuration. To work around
// this limitation, vBlocks is set to 1 when it equals 4.

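To make the clamping in the excerpt concrete, here is a self-contained restatement of the quoted logic as a standalone function. The function name is invented for illustration; the 64-byte-per-row limit is taken from the excerpt, and the vBlocks == 4 fallback follows the workaround described in the review comment.

```cpp
#include <algorithm>

// Standalone restatement of the vBlocks clamping quoted above (sketch only;
// clampVBlocks is a hypothetical helper name, not a function from the PR).
int clampVBlocks(int vBlocks, unsigned tileWidth,
                 unsigned packedElemSizeInBits) {
  // A 2D block message fetches at most 64 bytes per row across v-blocks.
  unsigned totalBytesPerRowPerMatrix = tileWidth * packedElemSizeInBits / 8;
  vBlocks = std::min(vBlocks, (int)(64 / totalBytesPerRowPerMatrix));
  vBlocks = std::min(4, vBlocks);
  // Per the review comment: vBlocks == 4 hits a hardware issue, so fall
  // back to a single v-block in that configuration.
  if (vBlocks == 4)
    vBlocks = 1;
  return vBlocks;
}
```

For a 16-element-wide tile of 16-bit packed elements, each row of one matrix is 32 bytes, so at most two v-blocks fit in the 64-byte row limit.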

unsigned opsPerChannel = dpasLayout.getOpsPerChannel();
if ((opsPerChannel == 4 && elemSizeInBits == 8) ||
(opsPerChannel == 2 && elemSizeInBits == 16)) {
// Use the VNNI packing format for DotOp B layout.
Copilot AI Jul 24, 2025


Remove commented-out code. If this assignment is needed for future reference, consider adding a TODO comment explaining why it's preserved.

Suggested change
// Use the VNNI packing format for DotOp B layout.
// Use the VNNI packing format for DotOp B layout.
// TODO: Retain this line for reference in case packedType needs to be explicitly set to i32_ty in future updates.

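The VNNI packing referred to in the excerpt places several narrow K-dimension elements into one 32-bit lane for the DPAS unit: two 16-bit values when opsPerChannel == 2, or four 8-bit values when opsPerChannel == 4. A minimal sketch of the bit layout follows; the helper names are invented for illustration and do not appear in the PR.

```cpp
#include <cstdint>

// Sketch of VNNI packing for DotOp B operands (helper names are invented).
// Two consecutive 16-bit K elements share one 32-bit lane, lowest K first.
uint32_t vnniPack2x16(uint16_t k0, uint16_t k1) {
  return (uint32_t)k0 | ((uint32_t)k1 << 16);
}

// Four consecutive 8-bit K elements share one 32-bit lane, lowest K first.
uint32_t vnniPack4x8(uint8_t k0, uint8_t k1, uint8_t k2, uint8_t k3) {
  return (uint32_t)k0 | ((uint32_t)k1 << 8) | ((uint32_t)k2 << 16) |
         ((uint32_t)k3 << 24);
}
```

Packing this way lets the hardware read one 32-bit lane per channel and consume opsPerChannel elements of the B operand per dot-product step.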

Comment on lines +2045 to +2046
assert(maskElems.size() == otherElems.size() &&
"Invalid size of the masks.");
Copilot AI Jul 24, 2025


This assertion compares maskElems.size() with otherElems.size(), but otherElems may be empty when there's no 'other' value. This could cause a false assertion failure when mask is provided but other is not.

Suggested change
assert(maskElems.size() == otherElems.size() &&
"Invalid size of the masks.");
assert((otherElems.empty() || maskElems.size() == otherElems.size()) &&
"Invalid size of the masks: maskElems and otherElems sizes do not match.");


Signed-off-by: Lu,Chengjun <[email protected]>

[LoadStoreOpToLLVM] Refactor block load lowering of tt.load with tensor pointer.

Signed-off-by: Lu,Chengjun <[email protected]>
Successfully merging this pull request may close these issues.

Replace the load op to llvm loop infrastructure with loops over linear layout input dimensions