
Conversation

Muzammiluddin-Syed-ECE
Contributor

The current chip guard fails to prevent scaling_extf/truncf patterns from being applied on gfx1100, which does not have scaling support.

@llvmbot
Member

llvmbot commented Aug 26, 2025

@llvm/pr-subscribers-mlir-gpu
@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-backend-amdgpu

Author: Muzammil (Muzammiluddin-Syed-ECE)

Changes

The current chip guard fails to prevent scaling_extf/truncf patterns from being applied on gfx1100, which does not have scaling support.


Full diff: https://github.com/llvm/llvm-project/pull/155431.diff

1 file affected:

  • (modified) mlir/lib/Conversion/ArithToAMDGPU/ArithToAMDGPU.cpp (+1-1)
diff --git a/mlir/lib/Conversion/ArithToAMDGPU/ArithToAMDGPU.cpp b/mlir/lib/Conversion/ArithToAMDGPU/ArithToAMDGPU.cpp
index 8230591123661..c0c063b4d4923 100644
--- a/mlir/lib/Conversion/ArithToAMDGPU/ArithToAMDGPU.cpp
+++ b/mlir/lib/Conversion/ArithToAMDGPU/ArithToAMDGPU.cpp
@@ -702,7 +702,7 @@ void mlir::arith::populateArithToAMDGPUConversionPatterns(
   if (allowPackedF16Rtz)
     patterns.add<TruncfToFloat16RewritePattern>(patterns.getContext(), benefit);
 
-  if (chipset >= kGfx950) {
+  if (chipset == kGfx950) {
     patterns.add<ScalingExtFRewritePattern>(patterns.getContext(), benefit);
     patterns.add<ScalingTruncFRewritePattern>(patterns.getContext(), benefit);
   }
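
For context on why the original guard admitted gfx1100: the chipset comparison orders chips by their version triple, and 11.0.0 compares greater than 9.5.0 even though the gfx11xx family has no scaled-conversion support. Below is a minimal standalone sketch of that ordering; the Chip struct is illustrative only, not the actual mlir::amdgpu::Chipset type.

#include <iostream>
#include <tuple>

// Illustrative stand-in for a chipset version triple (major, minor, stepping).
struct Chip {
  int major, minor, stepping;
  bool operator>=(const Chip &o) const {
    return std::tie(major, minor, stepping) >=
           std::tie(o.major, o.minor, o.stepping);
  }
  bool operator==(const Chip &o) const {
    return std::tie(major, minor, stepping) ==
           std::tie(o.major, o.minor, o.stepping);
  }
};

int main() {
  Chip gfx950{9, 5, 0};   // has the scaled extf/truncf intrinsics
  Chip gfx1100{11, 0, 0}; // does not
  std::cout << (gfx1100 >= gfx950) << "\n"; // 1: the old guard admits gfx1100
  std::cout << (gfx1100 == gfx950) << "\n"; // 0: the new guard rejects it
}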


github-actions bot commented Aug 26, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Signed-off-by: Muzammiluddin Syed <[email protected]>
Contributor

@arsenm arsenm left a comment


Missing test

Signed-off-by: Muzammiluddin Syed <[email protected]>
@Muzammiluddin-Syed-ECE
Contributor Author

Muzammiluddin-Syed-ECE commented Sep 3, 2025

@arsenm I've added a minimal test that checks that the extf/truncf aren't expanded on gfx1100 for scalar types.

@@ -241,6 +242,9 @@ func.func @conversion_broadcast(%in: vector<4xf8E5M2>, %scale: f8E8M0FNU) -> vec

// -----

// CHECK-GFX1100-LABEL: @conversion_scalar
// CHECK-GFX1100: arith.scaling_extf
Contributor


I don't understand the operations involved here. Why does this need target knowledge, to emit something that isn't a target operation? Why isn't there target legalization?

Contributor Author


There does not exist support for scaled mxfp4 types on this target (gfx1100), so there is no viable legalization IIUC (cc @krzysz00).

The main reason this change is necessary is to avoid running the scaled extf/truncf rewrites on unsupported chips when calling populateArithToAMDGPUConversionPatterns (see here).

However, this check does not need to live in ArithToAMDGPU.cpp. If preferred, I can create a separate populateConversionPatterns function in the same vein as Math/Transforms, where we pass in a list of the ops we are interested in expanding, moving the check to the caller rather than keeping it here (see the sketch after the quoted declaration below).

/// Adds patterns to expand math operations into other more fundamental
/// operations. For example, hyperbolic functions are expanded into expressions
/// using `exp`. If `opMnemonics` is empty then all available patterns will be
/// added, otherwise only the patterns corresponding to ops in `opMnemonics`
/// will be added to the set.
void populateExpansionPatterns(RewritePatternSet &patterns,
                               ArrayRef<StringRef> opMnemonics = {});
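
For illustration, a hypothetical sketch of what such a caller-driven entry point could look like for ArithToAMDGPU. Neither this function name nor the opMnemonics filter exists in the tree; only the pattern class names are taken from the diff above, and the usual includes from ArithToAMDGPU.cpp (RewritePatternSet, PatternBenefit, ArrayRef, StringRef, llvm::is_contained) are assumed.

// Hypothetical variant: the caller names the arith ops it wants rewritten,
// and the chipset check moves to the call site. An empty opMnemonics list
// means "add every available pattern".
void populateArithToAMDGPUScalingPatterns(RewritePatternSet &patterns,
                                          ArrayRef<StringRef> opMnemonics,
                                          PatternBenefit benefit) {
  auto wants = [&](StringRef name) {
    return opMnemonics.empty() || llvm::is_contained(opMnemonics, name);
  };
  if (wants("arith.scaling_extf"))
    patterns.add<ScalingExtFRewritePattern>(patterns.getContext(), benefit);
  if (wants("arith.scaling_truncf"))
    patterns.add<ScalingTruncFRewritePattern>(patterns.getContext(), benefit);
}

// The caller would then request these rewrites only on chips that support them:
//   if (chipset == kGfx950)
//     populateArithToAMDGPUScalingPatterns(
//         patterns, {"arith.scaling_extf", "arith.scaling_truncf"}, benefit);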

Contributor


So, these rewrite patterns are specifically for "lower arith operations to target intrinsics if they exist". If the target intrinsic doesn't exist, these patterns shouldn't run, and later patterns (the "generic" expansion over in ExpandArithOps) will run instead.

So, the main point of the test here is to ensure that we don't emit intrinsic calls that can't be fulfilled, or rewrite things into a more complex form that only makes sense if you're targeting intrinsics (the amdgpu.* operations are intrinsic wrappers that provide somewhat higher-level APIs).

I think these target checks are fine right where they are, especially since this lowering already has to do a decent number of target checks (for example, which FP8 formats get lowered to intrinsic calls)
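
To make that policy concrete, a small hedged sketch of the guard shape it implies; hasScaledConversionIntrinsics is a made-up helper, while the pattern class and kGfx950 names come from the diff in this PR.

// Hypothetical helper capturing the policy: add the intrinsic-backed rewrites
// only when the chip has the instructions; otherwise leave arith.scaling_extf /
// arith.scaling_truncf untouched so the later generic expansion lowers them.
static bool hasScaledConversionIntrinsics(Chipset chipset) {
  return chipset == kGfx950; // currently the only chip with these intrinsics
}

void populateScalingPatternsIfSupported(RewritePatternSet &patterns,
                                        Chipset chipset,
                                        PatternBenefit benefit) {
  if (!hasScaledConversionIntrinsics(chipset))
    return; // no-op on e.g. gfx1100; the generic expansion patterns take over
  patterns.add<ScalingExtFRewritePattern, ScalingTruncFRewritePattern>(
      patterns.getContext(), benefit);
}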

Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arith.scaling_extf isn't a target-specific operation, but, if it has an implementation on a given chipset, this pass lowers it to that target-specific implementation. Hence the test checks that, on gfx1100, this pass is a no-op (because other passes introduce a less efficient lowering).

Contributor


(If there were an easy, low-dependency way to query the CPU features - that is, if we substantially refactored LLVM - these target checks could be checks on the same flags LLVM uses to test for the ultimate presence of the intrinsics. But there isn't, so chipset checks it is.)


@krzysz00
Contributor

krzysz00 commented Sep 8, 2025

I'm open to landing this through the changes-requested flag, since the tests have been added.

@arsenm ping if you'd still like to hold this up
