Skip to content

[X86][Inline] Check correct function for target feature check #152515

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

nikic
Copy link
Contributor

@nikic nikic commented Aug 7, 2025

The check for ABI differences for inlined calls involves the caller, the callee and the nested callee. Before inlining, the ABI is determined by the target features of the callee. After inlining it is determined by the caller. The features of the nested callee should never actually matter.

The check for ABI differences for inlined calls involves the
caller, the callee and the nested callee. Before inlining, the
ABI is determined by the target features of the callee. After
inlining it is determined by the caller. The features of the
nested callee should never actually matter.
@llvmbot
Copy link
Member

llvmbot commented Aug 7, 2025

@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-llvm-transforms

Author: Nikita Popov (nikic)

Changes

The check for ABI differences for inlined calls involves the caller, the callee and the nested callee. Before inlining, the ABI is determined by the target features of the callee. After inlining it is determined by the caller. The features of the nested callee should never actually matter.


Full diff: https://github.com/llvm/llvm-project/pull/152515.diff

2 Files Affected:

  • (modified) llvm/lib/Target/X86/X86TargetTransformInfo.cpp (+4-14)
  • (modified) llvm/test/Transforms/Inline/X86/call-abi-compatibility.ll (+32-6)
diff --git a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
index 90791fc5788f4..fdae0ba7c483b 100644
--- a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
+++ b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
@@ -6525,8 +6525,8 @@ bool X86TTIImpl::areInlineCompatible(const Function *Caller,
 
   for (const Instruction &I : instructions(Callee)) {
     if (const auto *CB = dyn_cast<CallBase>(&I)) {
-      // Having more target features is fine for inline ASM.
-      if (CB->isInlineAsm())
+      // Having more target features is fine for inline ASM and intrinsics.
+      if (CB->isInlineAsm() || CB->getIntrinsicID() != Intrinsic::not_intrinsic)
         continue;
 
       SmallVector<Type *, 8> Types;
@@ -6542,19 +6542,9 @@ bool X86TTIImpl::areInlineCompatible(const Function *Caller,
       if (all_of(Types, IsSimpleTy))
         continue;
 
-      if (Function *NestedCallee = CB->getCalledFunction()) {
-        // Assume that intrinsics are always ABI compatible.
-        if (NestedCallee->isIntrinsic())
-          continue;
-
-        // Do a precise compatibility check.
-        if (!areTypesABICompatible(Caller, NestedCallee, Types))
-          return false;
-      } else {
-        // We don't know the target features of the callee,
-        // assume it is incompatible.
+      // Do a precise compatibility check.
+      if (!areTypesABICompatible(Caller, Callee, Types))
         return false;
-      }
     }
   }
   return true;
diff --git a/llvm/test/Transforms/Inline/X86/call-abi-compatibility.ll b/llvm/test/Transforms/Inline/X86/call-abi-compatibility.ll
index 6f582cab2f145..0b6c2ead58cf7 100644
--- a/llvm/test/Transforms/Inline/X86/call-abi-compatibility.ll
+++ b/llvm/test/Transforms/Inline/X86/call-abi-compatibility.ll
@@ -34,8 +34,7 @@ define i64 @callee_not_avx(<4 x i64> %arg) noinline {
   ret i64 %v
 }
 
-; This call also shouldn't be inlined, as we don't know whether callee_unknown
-; is ABI compatible or not.
+; This call also shouldn't be inlined, as caller_not_avx2 is not ABI compatible.
 define void @caller_avx2() "target-features"="+avx" {
 ; CHECK-LABEL: define {{[^@]+}}@caller_avx2
 ; CHECK-SAME: () #[[ATTR0]] {
@@ -55,6 +54,29 @@ define internal void @caller_not_avx2() {
   ret void
 }
 
+; Should be inlined, as caller_avx4 is ABI compatible. The fact that we don't
+; know anything about callee_unknown doesn't matter, as it is the caller that
+; determines the ABI as far as target features are concerned.
+define void @caller_avx6() "target-features"="+avx" {
+; CHECK-LABEL: define {{[^@]+}}@caller_avx6
+; CHECK-SAME: () #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = call i64 @callee_unknown(<4 x i64> <i64 0, i64 1, i64 2, i64 3>)
+; CHECK-NEXT:    ret void
+;
+  call void @caller_avx7()
+  ret void
+}
+
+define void @caller_avx7() "target-features"="+avx" {
+; CHECK-LABEL: define {{[^@]+}}@caller_avx7
+; CHECK-SAME: () #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = call i64 @callee_unknown(<4 x i64> <i64 0, i64 1, i64 2, i64 3>)
+; CHECK-NEXT:    ret void
+;
+  call i64 @callee_unknown(<4 x i64> <i64 0, i64 1, i64 2, i64 3>)
+  ret void
+}
+
 declare i64 @callee_unknown(<4 x i64>)
 
 ; This call should get inlined, because we assume that intrinsics are always
@@ -62,20 +84,24 @@ declare i64 @callee_unknown(<4 x i64>)
 define void @caller_avx3() "target-features"="+avx" {
 ; CHECK-LABEL: define {{[^@]+}}@caller_avx3
 ; CHECK-SAME: () #[[ATTR0]] {
-; CHECK-NEXT:    [[TMP1:%.*]] = call i64 @llvm.some_intrinsic(<4 x i64> <i64 0, i64 1, i64 2, i64 3>)
+; CHECK-NEXT:    [[V_I:%.*]] = load <4 x i64>, ptr @g, align 32
+; CHECK-NEXT:    [[V2_I:%.*]] = call <4 x i64> @llvm.abs.v4i64(<4 x i64> [[V_I]], i1 false)
+; CHECK-NEXT:    store <4 x i64> [[V2_I]], ptr @g, align 32
 ; CHECK-NEXT:    ret void
 ;
   call void @caller_not_avx3()
   ret void
 }
 
+@g = external global <4 x i64>
+
 define internal void @caller_not_avx3() {
-  call i64 @llvm.some_intrinsic(<4 x i64> <i64 0, i64 1, i64 2, i64 3>)
+  %v = load <4 x i64>, ptr @g
+  %v2 = call <4 x i64> @llvm.abs(<4 x i64> %v, i1 false)
+  store <4 x i64> %v2, ptr @g
   ret void
 }
 
-declare i64 @llvm.some_intrinsic(<4 x i64>)
-
 ; This call should get inlined, because only simple types are involved.
 define void @caller_avx4() "target-features"="+avx" {
 ; CHECK-LABEL: define {{[^@]+}}@caller_avx4

@@ -55,27 +54,54 @@ define internal void @caller_not_avx2() {
ret void
}

; Should be inlined, as caller_avx4 is ABI compatible. The fact that we don't
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

caller_avx4 -> caller_avx7 ?

@phoebewang
Copy link
Contributor

I think NestedCallee check is necessary. Once inlined, NestedCallee will be new callees of the caller. We need to make sure there's no ABI differences among them. Checking the currect callee is not enough, because it may happen to not use vector arguments.

@nikic
Copy link
Contributor Author

nikic commented Aug 8, 2025

I think NestedCallee check is necessary. Once inlined, NestedCallee will be new callees of the caller. We need to make sure there's no ABI differences among them. Checking the currect callee is not enough, because it may happen to not use vector arguments.

Situation before inlining:

caller calls callee (uses target-features of caller)
callee calls nested_callee (uses target-features of callee)

Situation after inlining:

caller calls nested_callee (uses target-features of caller)

So the call to nested_callee goes from using the target-features of callee to using the target features of caller. The target features of nested_callee are never used by call lowering.

So I think it's correct to do the ABI compatibility check between caller and callee, not caller and nested_callee.

@phoebewang
Copy link
Contributor

phoebewang commented Aug 8, 2025

I think NestedCallee check is necessary. Once inlined, NestedCallee will be new callees of the caller. We need to make sure there's no ABI differences among them. Checking the currect callee is not enough, because it may happen to not use vector arguments.

Situation before inlining:

caller calls callee (uses target-features of caller)
callee calls nested_callee (uses target-features of callee)

Situation after inlining:

caller calls nested_callee (uses target-features of caller)

So the call to nested_callee goes from using the target-features of callee to using the target features of caller. The target features of nested_callee are never used by call lowering.

So I think it's correct to do the ABI compatibility check between caller and callee, not caller and nested_callee.

Here is an artificial one: https://godbolt.org/z/EhcdWE393. callee2 cannot be inlined to caller2 because of nested_callee2.

@nikic
Copy link
Contributor Author

nikic commented Aug 8, 2025

I think NestedCallee check is necessary. Once inlined, NestedCallee will be new callees of the caller. We need to make sure there's no ABI differences among them. Checking the currect callee is not enough, because it may happen to not use vector arguments.

Situation before inlining:

caller calls callee (uses target-features of caller)
callee calls nested_callee (uses target-features of callee)

Situation after inlining:

caller calls nested_callee (uses target-features of caller)

So the call to nested_callee goes from using the target-features of callee to using the target features of caller. The target features of nested_callee are never used by call lowering.
So I think it's correct to do the ABI compatibility check between caller and callee, not caller and nested_callee.

Here is an artificial one: https://godbolt.org/z/EhcdWE393. callee2 cannot be inlined to caller2 because of nested_callee2.

Right, we can't inline callee2 into caller2 -- but this patch will not inline it either, due to the target feature mismatch between caller2 and callee2.

@phoebewang
Copy link
Contributor

I think NestedCallee check is necessary. Once inlined, NestedCallee will be new callees of the caller. We need to make sure there's no ABI differences among them. Checking the currect callee is not enough, because it may happen to not use vector arguments.

Situation before inlining:

caller calls callee (uses target-features of caller)
callee calls nested_callee (uses target-features of callee)

Situation after inlining:

caller calls nested_callee (uses target-features of caller)

So the call to nested_callee goes from using the target-features of callee to using the target features of caller. The target features of nested_callee are never used by call lowering.
So I think it's correct to do the ABI compatibility check between caller and callee, not caller and nested_callee.

Here is an artificial one: https://godbolt.org/z/EhcdWE393. callee2 cannot be inlined to caller2 because of nested_callee2.

Right, we can't inline callee2 into caller2 -- but this patch will not inline it either, due to the target feature mismatch between caller2 and callee2.

I think we just require callee2 has subset of caller2's target feature?

@nikic
Copy link
Contributor Author

nikic commented Aug 8, 2025

I think NestedCallee check is necessary. Once inlined, NestedCallee will be new callees of the caller. We need to make sure there's no ABI differences among them. Checking the currect callee is not enough, because it may happen to not use vector arguments.

Situation before inlining:

caller calls callee (uses target-features of caller)
callee calls nested_callee (uses target-features of callee)

Situation after inlining:

caller calls nested_callee (uses target-features of caller)

So the call to nested_callee goes from using the target-features of callee to using the target features of caller. The target features of nested_callee are never used by call lowering.
So I think it's correct to do the ABI compatibility check between caller and callee, not caller and nested_callee.

Here is an artificial one: https://godbolt.org/z/EhcdWE393. callee2 cannot be inlined to caller2 because of nested_callee2.

Right, we can't inline callee2 into caller2 -- but this patch will not inline it either, due to the target feature mismatch between caller2 and callee2.

I think we just require callee2 has subset of caller2's target feature?

Generally yes, but due to the presence of nested_callee2 with vector argument, we require an exact match.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants