Add FABS to canCreateUndefOrPoison #149440
Pull Request Overview
This PR adds the FABS instruction to the list of operations that cannot create undef or poison values in the `canCreateUndefOrPoison` function. The floating-point absolute value operation (FABS) cannot produce undef or poison values since it only clears the sign bit of floating-point numbers.
- Adds `ISD::FABS` to the `canCreateUndefOrPoison` function to return false
- Adds comprehensive test cases to verify the behavior across different GPU generations and instruction selection modes
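The rationale above (FABS only clears the sign bit) can be illustrated with a small standalone sketch. This is not LLVM code: `fabsBits` and `fabsViaMask` are hypothetical helper names, and the sketch assumes a 32-bit IEEE-754 `float`. It applies the same `0x7fffffff` mask that appears in the `v_and_b32_e32` check lines of the test, and shows that every defined input bit pattern maps to exactly one output bit pattern.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not an LLVM API): fabs on the raw bits of an f32,
// modeled as clearing bit 31, the sign bit.
inline std::uint32_t fabsBits(std::uint32_t bits) {
  return bits & 0x7fffffffu;
}

// Same operation on a float value. memcpy is used for the bit reinterpretation
// so the sketch stays valid under strict aliasing rules.
inline float fabsViaMask(float x) {
  static_assert(sizeof(float) == sizeof(std::uint32_t),
                "sketch assumes a 32-bit float");
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  bits = fabsBits(bits);
  // The result is fully determined by the input bits; the sign bit is clear.
  assert((bits >> 31) == 0u);
  float result;
  std::memcpy(&result, &bits, sizeof result);
  return result;
}
```

Because the mask is a total function on bit patterns (including NaN payloads), the operation has no input for which it would need to yield an undefined result; a poison operand still yields a poison result, which is a property of the operand rather than of FABS.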
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
File | Description |
---|---|
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp | Adds FABS case to canCreateUndefOrPoison function |
llvm/test/CodeGen/AMDGPU/freeze.ll | Adds test functions demonstrating FABS optimization behavior |
Comments suppressed due to low confidence (3)
llvm/test/CodeGen/AMDGPU/freeze.ll:10453
- The GFX6-GISEL test case shows redundant v_and_b32_e32 instructions (lines 14722-14726) that duplicate the same operations on v0-v3. This suggests the optimization for eliminating redundant FABS operations after freeze is not working correctly for the GlobalISel path.
llvm/test/CodeGen/AMDGPU/freeze.ll:10454
- Similar to GFX6-GISEL, the GFX7-GISEL test case shows duplicated v_and_b32_e32 instructions (lines 14743-14747) indicating the same optimization issue exists across multiple GPU generations in the GlobalISel path.
llvm/test/CodeGen/AMDGPU/freeze.ll:10453
- The GFX8-GISEL test case also shows the same pattern of redundant FABS operations (lines 14766-14769), confirming this optimization issue is consistent across the GlobalISel instruction selection path.
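As a rough illustration of why a "cannot create undef or poison" answer matters for the `fabs(freeze(fabs(A)))` pattern in the `src` test, here is a toy expression rewriter. Everything in it is hypothetical (the `Node` type, `simplify`, and the one-argument `canCreateUndefOrPoison` stand-in); it is not DAGCombiner's code and does not use LLVM's APIs. It assumes two rewrites: a freeze may be moved below an operation that cannot create undef or poison, and `fabs(fabs(x))` folds to `fabs(x)`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy expression tree; hypothetical types, not LLVM's SDNode/SDValue.
enum class Op { Arg, FAbs, Freeze };

struct Node {
  Op op;
  std::string name;            // only meaningful for Op::Arg
  std::unique_ptr<Node> child; // operand of FAbs / Freeze
};
using NodePtr = std::unique_ptr<Node>;

NodePtr arg(std::string n) {
  return NodePtr(new Node{Op::Arg, std::move(n), nullptr});
}
NodePtr unary(Op op, NodePtr c) {
  return NodePtr(new Node{op, "", std::move(c)});
}

// Stand-in for the query the PR extends. Only FAbs is known safe here;
// everything else is answered conservatively.
bool canCreateUndefOrPoison(Op op) {
  return op != Op::FAbs;
}

NodePtr simplify(NodePtr n) {
  assert(n && "expected a node");
  if (n->op == Op::Arg)
    return n;
  n->child = simplify(std::move(n->child));
  // freeze(op(x)) -> op(freeze(x)) when op cannot create undef/poison.
  if (n->op == Op::Freeze && !canCreateUndefOrPoison(n->child->op)) {
    NodePtr inner = std::move(n->child);
    inner->child = simplify(unary(Op::Freeze, std::move(inner->child)));
    return inner;
  }
  // fabs(fabs(x)) -> fabs(x): clearing the sign bit twice is redundant.
  if (n->op == Op::FAbs && n->child->op == Op::FAbs)
    return std::move(n->child);
  return n;
}

std::string toString(const Node &n) {
  switch (n.op) {
  case Op::Arg:    return n.name;
  case Op::FAbs:   return "fabs(" + toString(*n.child) + ")";
  case Op::Freeze: return "freeze(" + toString(*n.child) + ")";
  }
  return "";
}
```

In this toy model, `fabs(freeze(fabs(A)))` simplifies to `fabs(freeze(A))`, a single sign-bit clear per lane. That is consistent with the SDAG check lines for `src`, which contain four `v_and_b32_e32` instructions, while the GISEL check lines contain eight; the change in this PR touches only the SelectionDAG query.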
@llvm/pr-subscribers-llvm-selectiondag @llvm/pr-subscribers-backend-amdgpu
Author: None (Shoreshen)
Changes
FABS will not create undef/poison; add it to the cases in canCreateUndefOrPoison that return false.
Full diff: https://github.com/llvm/llvm-project/pull/149440.diff
2 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 682d93d0abf3f..56c8bb441ddf8 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -5569,6 +5569,7 @@ bool SelectionDAG::canCreateUndefOrPoison(SDValue Op, const APInt &DemandedElts,
case ISD::BUILD_VECTOR:
case ISD::BUILD_PAIR:
case ISD::SPLAT_VECTOR:
+ case ISD::FABS:
return false;
case ISD::ABS:
diff --git a/llvm/test/CodeGen/AMDGPU/freeze.ll b/llvm/test/CodeGen/AMDGPU/freeze.ll
index ac438062ae208..0476bc47e2366 100644
--- a/llvm/test/CodeGen/AMDGPU/freeze.ll
+++ b/llvm/test/CodeGen/AMDGPU/freeze.ll
@@ -14592,5 +14592,241 @@ define void @freeze_v4i1_vcc(ptr addrspace(1) %ptra, ptr addrspace(1) %ptrb) {
store <4 x i1> %freeze, ptr addrspace(1) %ptrb
ret void
}
-;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
-; GFX8-SDAG: {{.*}}
+
+define double @tgt(float %a, double %b, double %c) {
+; GFX6-SDAG-LABEL: tgt:
+; GFX6-SDAG: ; %bb.0:
+; GFX6-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX6-SDAG-NEXT: v_mov_b32_e32 v5, v0
+; GFX6-SDAG-NEXT: v_add_f64 v[0:1], |v[4:5]|, v[1:2]
+; GFX6-SDAG-NEXT: v_add_f64 v[2:3], |v[4:5]|, v[3:4]
+; GFX6-SDAG-NEXT: v_add_f64 v[0:1], v[0:1], v[2:3]
+; GFX6-SDAG-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX6-GISEL-LABEL: tgt:
+; GFX6-GISEL: ; %bb.0:
+; GFX6-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX6-GISEL-NEXT: v_and_b32_e32 v5, 0x7fffffff, v0
+; GFX6-GISEL-NEXT: v_add_f64 v[0:1], v[4:5], v[1:2]
+; GFX6-GISEL-NEXT: v_add_f64 v[2:3], v[4:5], v[3:4]
+; GFX6-GISEL-NEXT: v_add_f64 v[0:1], v[0:1], v[2:3]
+; GFX6-GISEL-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX7-SDAG-LABEL: tgt:
+; GFX7-SDAG: ; %bb.0:
+; GFX7-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX7-SDAG-NEXT: v_mov_b32_e32 v5, v0
+; GFX7-SDAG-NEXT: v_add_f64 v[0:1], |v[4:5]|, v[1:2]
+; GFX7-SDAG-NEXT: v_add_f64 v[2:3], |v[4:5]|, v[3:4]
+; GFX7-SDAG-NEXT: v_add_f64 v[0:1], v[0:1], v[2:3]
+; GFX7-SDAG-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX7-GISEL-LABEL: tgt:
+; GFX7-GISEL: ; %bb.0:
+; GFX7-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX7-GISEL-NEXT: v_and_b32_e32 v5, 0x7fffffff, v0
+; GFX7-GISEL-NEXT: v_add_f64 v[0:1], v[4:5], v[1:2]
+; GFX7-GISEL-NEXT: v_add_f64 v[2:3], v[4:5], v[3:4]
+; GFX7-GISEL-NEXT: v_add_f64 v[0:1], v[0:1], v[2:3]
+; GFX7-GISEL-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX8-SDAG-LABEL: tgt:
+; GFX8-SDAG: ; %bb.0:
+; GFX8-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX8-SDAG-NEXT: v_mov_b32_e32 v5, v0
+; GFX8-SDAG-NEXT: v_add_f64 v[0:1], |v[4:5]|, v[1:2]
+; GFX8-SDAG-NEXT: v_add_f64 v[2:3], |v[4:5]|, v[3:4]
+; GFX8-SDAG-NEXT: v_add_f64 v[0:1], v[0:1], v[2:3]
+; GFX8-SDAG-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX8-GISEL-LABEL: tgt:
+; GFX8-GISEL: ; %bb.0:
+; GFX8-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX8-GISEL-NEXT: v_and_b32_e32 v5, 0x7fffffff, v0
+; GFX8-GISEL-NEXT: v_add_f64 v[0:1], v[4:5], v[1:2]
+; GFX8-GISEL-NEXT: v_add_f64 v[2:3], v[4:5], v[3:4]
+; GFX8-GISEL-NEXT: v_add_f64 v[0:1], v[0:1], v[2:3]
+; GFX8-GISEL-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-GISEL-LABEL: tgt:
+; GFX9-GISEL: ; %bb.0:
+; GFX9-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-GISEL-NEXT: v_and_b32_e32 v5, 0x7fffffff, v0
+; GFX9-GISEL-NEXT: v_add_f64 v[0:1], v[4:5], v[1:2]
+; GFX9-GISEL-NEXT: v_add_f64 v[2:3], v[4:5], v[3:4]
+; GFX9-GISEL-NEXT: v_add_f64 v[0:1], v[0:1], v[2:3]
+; GFX9-GISEL-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX10-SDAG-LABEL: tgt:
+; GFX10-SDAG: ; %bb.0:
+; GFX10-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10-SDAG-NEXT: v_mov_b32_e32 v5, v0
+; GFX10-SDAG-NEXT: v_add_f64 v[0:1], |v[4:5]|, v[1:2]
+; GFX10-SDAG-NEXT: v_add_f64 v[2:3], |v[4:5]|, v[3:4]
+; GFX10-SDAG-NEXT: v_add_f64 v[0:1], v[0:1], v[2:3]
+; GFX10-SDAG-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX10-GISEL-LABEL: tgt:
+; GFX10-GISEL: ; %bb.0:
+; GFX10-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10-GISEL-NEXT: v_and_b32_e32 v5, 0x7fffffff, v0
+; GFX10-GISEL-NEXT: v_add_f64 v[0:1], v[4:5], v[1:2]
+; GFX10-GISEL-NEXT: v_add_f64 v[2:3], v[4:5], v[3:4]
+; GFX10-GISEL-NEXT: v_add_f64 v[0:1], v[0:1], v[2:3]
+; GFX10-GISEL-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-SDAG-LABEL: tgt:
+; GFX11-SDAG: ; %bb.0:
+; GFX11-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-SDAG-NEXT: v_mov_b32_e32 v5, v0
+; GFX11-SDAG-NEXT: v_add_f64 v[0:1], |v[4:5]|, v[1:2]
+; GFX11-SDAG-NEXT: v_add_f64 v[2:3], |v[4:5]|, v[3:4]
+; GFX11-SDAG-NEXT: v_add_f64 v[0:1], v[0:1], v[2:3]
+; GFX11-SDAG-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-GISEL-LABEL: tgt:
+; GFX11-GISEL: ; %bb.0:
+; GFX11-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-GISEL-NEXT: v_and_b32_e32 v5, 0x7fffffff, v0
+; GFX11-GISEL-NEXT: v_add_f64 v[0:1], v[4:5], v[1:2]
+; GFX11-GISEL-NEXT: v_add_f64 v[2:3], v[4:5], v[3:4]
+; GFX11-GISEL-NEXT: v_add_f64 v[0:1], v[0:1], v[2:3]
+; GFX11-GISEL-NEXT: s_setpc_b64 s[30:31]
+ %pv = insertelement <2 x float> poison, float %a, i32 1
+ %d = bitcast <2 x float> %pv to double
+ %r = call double @llvm.fabs.f64(double %d)
+ %fr = freeze double %r
+ %add1 = fadd double %fr, %b
+ %add2 = fadd double %fr, %c
+ %add = fadd double %add1, %add2
+ ret double %add
+}
+
+define <4 x float> @src(<4 x float> %A, <4 x float> %B) {
+; GFX6-SDAG-LABEL: src:
+; GFX6-SDAG: ; %bb.0:
+; GFX6-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX6-SDAG-NEXT: v_and_b32_e32 v0, 0x7fffffff, v0
+; GFX6-SDAG-NEXT: v_and_b32_e32 v1, 0x7fffffff, v1
+; GFX6-SDAG-NEXT: v_and_b32_e32 v2, 0x7fffffff, v2
+; GFX6-SDAG-NEXT: v_and_b32_e32 v3, 0x7fffffff, v3
+; GFX6-SDAG-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX6-GISEL-LABEL: src:
+; GFX6-GISEL: ; %bb.0:
+; GFX6-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX6-GISEL-NEXT: v_and_b32_e32 v0, 0x7fffffff, v0
+; GFX6-GISEL-NEXT: v_and_b32_e32 v1, 0x7fffffff, v1
+; GFX6-GISEL-NEXT: v_and_b32_e32 v2, 0x7fffffff, v2
+; GFX6-GISEL-NEXT: v_and_b32_e32 v3, 0x7fffffff, v3
+; GFX6-GISEL-NEXT: v_and_b32_e32 v0, 0x7fffffff, v0
+; GFX6-GISEL-NEXT: v_and_b32_e32 v1, 0x7fffffff, v1
+; GFX6-GISEL-NEXT: v_and_b32_e32 v2, 0x7fffffff, v2
+; GFX6-GISEL-NEXT: v_and_b32_e32 v3, 0x7fffffff, v3
+; GFX6-GISEL-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX7-SDAG-LABEL: src:
+; GFX7-SDAG: ; %bb.0:
+; GFX7-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX7-SDAG-NEXT: v_and_b32_e32 v0, 0x7fffffff, v0
+; GFX7-SDAG-NEXT: v_and_b32_e32 v1, 0x7fffffff, v1
+; GFX7-SDAG-NEXT: v_and_b32_e32 v2, 0x7fffffff, v2
+; GFX7-SDAG-NEXT: v_and_b32_e32 v3, 0x7fffffff, v3
+; GFX7-SDAG-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX7-GISEL-LABEL: src:
+; GFX7-GISEL: ; %bb.0:
+; GFX7-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX7-GISEL-NEXT: v_and_b32_e32 v0, 0x7fffffff, v0
+; GFX7-GISEL-NEXT: v_and_b32_e32 v1, 0x7fffffff, v1
+; GFX7-GISEL-NEXT: v_and_b32_e32 v2, 0x7fffffff, v2
+; GFX7-GISEL-NEXT: v_and_b32_e32 v3, 0x7fffffff, v3
+; GFX7-GISEL-NEXT: v_and_b32_e32 v0, 0x7fffffff, v0
+; GFX7-GISEL-NEXT: v_and_b32_e32 v1, 0x7fffffff, v1
+; GFX7-GISEL-NEXT: v_and_b32_e32 v2, 0x7fffffff, v2
+; GFX7-GISEL-NEXT: v_and_b32_e32 v3, 0x7fffffff, v3
+; GFX7-GISEL-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX8-SDAG-LABEL: src:
+; GFX8-SDAG: ; %bb.0:
+; GFX8-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX8-SDAG-NEXT: v_and_b32_e32 v0, 0x7fffffff, v0
+; GFX8-SDAG-NEXT: v_and_b32_e32 v1, 0x7fffffff, v1
+; GFX8-SDAG-NEXT: v_and_b32_e32 v2, 0x7fffffff, v2
+; GFX8-SDAG-NEXT: v_and_b32_e32 v3, 0x7fffffff, v3
+; GFX8-SDAG-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX8-GISEL-LABEL: src:
+; GFX8-GISEL: ; %bb.0:
+; GFX8-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX8-GISEL-NEXT: v_and_b32_e32 v0, 0x7fffffff, v0
+; GFX8-GISEL-NEXT: v_and_b32_e32 v1, 0x7fffffff, v1
+; GFX8-GISEL-NEXT: v_and_b32_e32 v2, 0x7fffffff, v2
+; GFX8-GISEL-NEXT: v_and_b32_e32 v3, 0x7fffffff, v3
+; GFX8-GISEL-NEXT: v_and_b32_e32 v0, 0x7fffffff, v0
+; GFX8-GISEL-NEXT: v_and_b32_e32 v1, 0x7fffffff, v1
+; GFX8-GISEL-NEXT: v_and_b32_e32 v2, 0x7fffffff, v2
+; GFX8-GISEL-NEXT: v_and_b32_e32 v3, 0x7fffffff, v3
+; GFX8-GISEL-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-GISEL-LABEL: src:
+; GFX9-GISEL: ; %bb.0:
+; GFX9-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-GISEL-NEXT: v_and_b32_e32 v0, 0x7fffffff, v0
+; GFX9-GISEL-NEXT: v_and_b32_e32 v1, 0x7fffffff, v1
+; GFX9-GISEL-NEXT: v_and_b32_e32 v2, 0x7fffffff, v2
+; GFX9-GISEL-NEXT: v_and_b32_e32 v3, 0x7fffffff, v3
+; GFX9-GISEL-NEXT: v_and_b32_e32 v0, 0x7fffffff, v0
+; GFX9-GISEL-NEXT: v_and_b32_e32 v1, 0x7fffffff, v1
+; GFX9-GISEL-NEXT: v_and_b32_e32 v2, 0x7fffffff, v2
+; GFX9-GISEL-NEXT: v_and_b32_e32 v3, 0x7fffffff, v3
+; GFX9-GISEL-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX10-SDAG-LABEL: src:
+; GFX10-SDAG: ; %bb.0:
+; GFX10-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10-SDAG-NEXT: v_and_b32_e32 v0, 0x7fffffff, v0
+; GFX10-SDAG-NEXT: v_and_b32_e32 v1, 0x7fffffff, v1
+; GFX10-SDAG-NEXT: v_and_b32_e32 v2, 0x7fffffff, v2
+; GFX10-SDAG-NEXT: v_and_b32_e32 v3, 0x7fffffff, v3
+; GFX10-SDAG-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX10-GISEL-LABEL: src:
+; GFX10-GISEL: ; %bb.0:
+; GFX10-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10-GISEL-NEXT: v_and_b32_e32 v0, 0x7fffffff, v0
+; GFX10-GISEL-NEXT: v_and_b32_e32 v1, 0x7fffffff, v1
+; GFX10-GISEL-NEXT: v_and_b32_e32 v2, 0x7fffffff, v2
+; GFX10-GISEL-NEXT: v_and_b32_e32 v3, 0x7fffffff, v3
+; GFX10-GISEL-NEXT: v_and_b32_e32 v0, 0x7fffffff, v0
+; GFX10-GISEL-NEXT: v_and_b32_e32 v1, 0x7fffffff, v1
+; GFX10-GISEL-NEXT: v_and_b32_e32 v2, 0x7fffffff, v2
+; GFX10-GISEL-NEXT: v_and_b32_e32 v3, 0x7fffffff, v3
+; GFX10-GISEL-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-SDAG-LABEL: src:
+; GFX11-SDAG: ; %bb.0:
+; GFX11-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-SDAG-NEXT: v_and_b32_e32 v0, 0x7fffffff, v0
+; GFX11-SDAG-NEXT: v_and_b32_e32 v1, 0x7fffffff, v1
+; GFX11-SDAG-NEXT: v_and_b32_e32 v2, 0x7fffffff, v2
+; GFX11-SDAG-NEXT: v_and_b32_e32 v3, 0x7fffffff, v3
+; GFX11-SDAG-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-GISEL-LABEL: src:
+; GFX11-GISEL: ; %bb.0:
+; GFX11-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-GISEL-NEXT: v_and_b32_e32 v0, 0x7fffffff, v0
+; GFX11-GISEL-NEXT: v_and_b32_e32 v1, 0x7fffffff, v1
+; GFX11-GISEL-NEXT: v_and_b32_e32 v2, 0x7fffffff, v2
+; GFX11-GISEL-NEXT: v_and_b32_e32 v3, 0x7fffffff, v3
+; GFX11-GISEL-NEXT: v_and_b32_e32 v0, 0x7fffffff, v0
+; GFX11-GISEL-NEXT: v_and_b32_e32 v1, 0x7fffffff, v1
+; GFX11-GISEL-NEXT: v_and_b32_e32 v2, 0x7fffffff, v2
+; GFX11-GISEL-NEXT: v_and_b32_e32 v3, 0x7fffffff, v3
+; GFX11-GISEL-NEXT: s_setpc_b64 s[30:31]
+ %A0 = call <4 x float> @llvm.fabs.v4f32(<4 x float> %A)
+ %F1 = freeze <4 x float> %A0
+ %A1 = call <4 x float> @llvm.fabs.v4f32(<4 x float> %F1)
+ ret <4 x float> %A1
+}
llvm/test/CodeGen/AMDGPU/freeze.ll (outdated)
define double @tgt(float %a, double %b, double %c) {
LGTM except function names should be better
LLVM Buildbot has detected a new failure on builder
Full details are available at: https://lab.llvm.org/buildbot/#/builders/123/builds/23628
Here is the relevant piece of the build log for the reference