-
Notifications
You must be signed in to change notification settings - Fork 15k
[mlir][ArithToAMDGPU] limit scaling truncf/extf support to gfx950 #155431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[mlir][ArithToAMDGPU] limit scaling truncf/extf support to gfx950 #155431
Conversation
@llvm/pr-subscribers-mlir-gpu @llvm/pr-subscribers-backend-amdgpu Author: Muzammil (Muzammiluddin-Syed-ECE) ChangesThe current chip guard fails to prevent scaling_extf/truncf patterns from being applied on gfx1100 which does not have scaling support. Full diff: https://github.com/llvm/llvm-project/pull/155431.diff 1 Files Affected:
diff --git a/mlir/lib/Conversion/ArithToAMDGPU/ArithToAMDGPU.cpp b/mlir/lib/Conversion/ArithToAMDGPU/ArithToAMDGPU.cpp
index 8230591123661..c0c063b4d4923 100644
--- a/mlir/lib/Conversion/ArithToAMDGPU/ArithToAMDGPU.cpp
+++ b/mlir/lib/Conversion/ArithToAMDGPU/ArithToAMDGPU.cpp
@@ -702,7 +702,7 @@ void mlir::arith::populateArithToAMDGPUConversionPatterns(
if (allowPackedF16Rtz)
patterns.add<TruncfToFloat16RewritePattern>(patterns.getContext(), benefit);
- if (chipset >= kGfx950) {
+ if (chipset == kGfx950) {
patterns.add<ScalingExtFRewritePattern>(patterns.getContext(), benefit);
patterns.add<ScalingTruncFRewritePattern>(patterns.getContext(), benefit);
}
|
841cbb3
to
5464787
Compare
Signed-off-by: Muzammiluddin Syed <[email protected]>
✅ With the latest revision this PR passed the C/C++ code formatter. |
Signed-off-by: Muzammiluddin Syed <[email protected]>
dd20850
to
2f2a400
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing test
Signed-off-by: Muzammiluddin Syed <[email protected]>
@arsenm I've added a minimal test that checks to see that the extf/truncf aren't expanded on gfx1100 for scalar types. |
@@ -241,6 +242,9 @@ func.func @conversion_broadcast(%in: vector<4xf8E5M2>, %scale: f8E8M0FNU) -> vec | |||
|
|||
// ----- | |||
|
|||
// CHECK-GFX1100-LABEL: @conversion_scalar | |||
// CHECK-GFX1100: arith.scaling_extf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the operations involved here. Why does this need target knowledge, to emit something that isn't a target operation? Why isn't there target legalization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There does not exist support for scaled mxfp4 types in this target (gfx1100) so there is no viable legalization IIUC (cc @krzysz00 )
The main reason this change is necessary is to avoid running the scaled extf/truncf rewrite on unsupported chips when calling populateArithToAMDGPUConversionPatterns
(See here).
However, this check does not need to exist in ArithToAMDGPU.cpp
. If preferred, I can create a separate function populateConversionPatterns
in the same vein as how it's done in Math/Transforms
where we use pass in a list of ops which we are interested in expanding, moving the check to the caller rather than having it here.
/// Adds patterns to expand math operations into other more fundamental
/// operations. For example, hyperbolic functions are expanded into expressions
/// using `exp`. If `opMnemonics` is empty then all available patterns will be
/// added, otherwise only the patterns corresponding to ops in `opMnemonics`
/// will be added to the set.
void populateExpansionPatterns(RewritePatternSet &patterns,
ArrayRef<StringRef> opMnemonics = {});
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, these rewrite patterns are specifically for "lower arith
operations to target intrinsics if they exist". If the target intrinsic doesn't exist, these patterns shouldn't run, and later patterns (the "generic" expansion over in ExpandArithOps) will run instead.
So, the main point of the test here is to ensure that we don't emit intrinsics calls that can't be fulfilled, or rewrite things into a more complex form that only makes sense if you're targetting intrinsics (the amdgpu.*
operations are intrinsic wrappers that proved somewhat higher-level APIs)
I think these target checks are fine right where they are, especially since this lowering already has to do a decent number of target checks (for example, which FP8 formats get lowered to intrinsic calls)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
arith.scaling_extf
isn't a target-specific operation, but, if it has an implementation on a given chipset, this pass lowers to that target-specific implementation. Hence the test checking that, on gfx1100, this pass is a noop (because other passes introduce a less efficient lowering)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(If there were an easy, low-dependency way to query the CPU features - that is, if we substantially refactored LLVM - these target checks could be checks on the same flags LLVM uses to test for the ultimate presence of the intrinsics. But that isn't, so chipset checks it is.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^ @arsenm
I'm open to landing through the changes requested flag, since the tests have been added @arsenm ping if you'd still like to hold this up |
The current chip guard fails to prevent scaling_extf/truncf patterns from being applied on gfx1100 which does not have scaling support.