[DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences #146054


Merged
merged 3 commits into main on Jul 30, 2025

Conversation

Pierre-vh
Contributor

@Pierre-vh Pierre-vh commented Jun 27, 2025

Fold sequences where we extract a number of contiguous bit fields from a value,
merge them into the low bits, and then check whether those low bits are zero.

Usually the `and` would be on the outside (at the leaves) of the expression, but the DAG canonicalizes it into a single `and` at the root of the expression.

I put this in DAGCombiner rather than the target combiner because this is a generic, valid transform that is also fairly niche, so I think there is little risk of a combine loop.

See #136727

@llvmbot
Member

llvmbot commented Jun 27, 2025

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-llvm-selectiondag

Author: Pierre van Houtryve (Pierre-vh)

Changes



Full diff: https://github.com/llvm/llvm-project/pull/146054.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+85-1)
  • (modified) llvm/test/CodeGen/AMDGPU/workitems-intrinsics-opts.ll (+6-28)
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 08dab7c697b99..a189208d3a62e 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -28909,13 +28909,97 @@ SDValue DAGCombiner::SimplifySelectCC(const SDLoc &DL, SDValue N0, SDValue N1,
   return SDValue();
 }
 
+static SDValue matchMergedBFX(SDValue Root, SelectionDAG &DAG,
+                              const TargetLowering &TLI) {
+  // Match a pattern such as:
+  //  (X | (X >> C0) | (X >> C1) | ...) & Mask
+  // This extracts contiguous parts of X and ORs them together before comparing.
+  // We can optimize this so that we directly check (X & SomeMask) instead,
+  // eliminating the shifts.
+
+  EVT VT = Root.getValueType();
+
+  if (Root.getOpcode() != ISD::AND)
+    return SDValue();
+
+  SDValue N0 = Root.getOperand(0);
+  SDValue N1 = Root.getOperand(1);
+
+  if (N0.getOpcode() != ISD::OR || !isa<ConstantSDNode>(N1))
+    return SDValue();
+
+  APInt RootMask = cast<ConstantSDNode>(N1)->getAsAPIntVal();
+  if (!RootMask.isMask())
+    return SDValue();
+
+  SDValue Src;
+  const auto IsSrc = [&](SDValue V) {
+    if (!Src) {
+      Src = V;
+      return true;
+    }
+
+    return Src == V;
+  };
+
+  SmallVector<SDValue> Worklist = {N0};
+  APInt PartsMask(VT.getSizeInBits(), 0);
+  while (!Worklist.empty()) {
+    SDValue V = Worklist.pop_back_val();
+    if (!V.hasOneUse() && Src != V)
+      return SDValue();
+
+    if (V.getOpcode() == ISD::OR) {
+      Worklist.push_back(V.getOperand(0));
+      Worklist.push_back(V.getOperand(1));
+      continue;
+    }
+
+    if (V.getOpcode() == ISD::SRL) {
+      SDValue ShiftSrc = V.getOperand(0);
+      SDValue ShiftAmt = V.getOperand(1);
+
+      if (!IsSrc(ShiftSrc) || !isa<ConstantSDNode>(ShiftAmt))
+        return SDValue();
+
+      PartsMask |= (RootMask << cast<ConstantSDNode>(ShiftAmt)->getAsZExtVal());
+      continue;
+    }
+
+    if (IsSrc(V)) {
+      PartsMask |= RootMask;
+      continue;
+    }
+
+    return SDValue();
+  }
+
+  if (!RootMask.isMask() || !Src)
+    return SDValue();
+
+  SDLoc DL(Root);
+  return DAG.getNode(ISD::AND, DL, VT,
+                     {Src, DAG.getConstant(PartsMask, DL, VT)});
+}
+
 /// This is a stub for TargetLowering::SimplifySetCC.
 SDValue DAGCombiner::SimplifySetCC(EVT VT, SDValue N0, SDValue N1,
                                    ISD::CondCode Cond, const SDLoc &DL,
                                    bool foldBooleans) {
   TargetLowering::DAGCombinerInfo
     DagCombineInfo(DAG, Level, false, this);
-  return TLI.SimplifySetCC(VT, N0, N1, Cond, foldBooleans, DagCombineInfo, DL);
+  if (SDValue C =
+          TLI.SimplifySetCC(VT, N0, N1, Cond, foldBooleans, DagCombineInfo, DL))
+    return C;
+
+  if ((Cond == ISD::SETNE || Cond == ISD::SETEQ) &&
+      N0.getOpcode() == ISD::AND && isNullConstant(N1)) {
+
+    if (SDValue Res = matchMergedBFX(N0, DAG, TLI))
+      return DAG.getSetCC(DL, VT, Res, N1, Cond);
+  }
+
+  return SDValue();
 }
 
 /// Given an ISD::SDIV node expressing a divide by constant, return
diff --git a/llvm/test/CodeGen/AMDGPU/workitems-intrinsics-opts.ll b/llvm/test/CodeGen/AMDGPU/workitems-intrinsics-opts.ll
index 14120680216fc..5a25ec29af481 100644
--- a/llvm/test/CodeGen/AMDGPU/workitems-intrinsics-opts.ll
+++ b/llvm/test/CodeGen/AMDGPU/workitems-intrinsics-opts.ll
@@ -12,11 +12,7 @@ define i1 @workitem_zero() {
 ; DAGISEL-GFX9-LABEL: workitem_zero:
 ; DAGISEL-GFX9:       ; %bb.0: ; %entry
 ; DAGISEL-GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; DAGISEL-GFX9-NEXT:    v_lshrrev_b32_e32 v1, 10, v31
-; DAGISEL-GFX9-NEXT:    v_lshrrev_b32_e32 v0, 20, v31
-; DAGISEL-GFX9-NEXT:    v_or_b32_e32 v1, v31, v1
-; DAGISEL-GFX9-NEXT:    v_or_b32_e32 v0, v1, v0
-; DAGISEL-GFX9-NEXT:    v_and_b32_e32 v0, 0x3ff, v0
+; DAGISEL-GFX9-NEXT:    v_and_b32_e32 v0, 0x3fffffff, v31
 ; DAGISEL-GFX9-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
 ; DAGISEL-GFX9-NEXT:    v_cndmask_b32_e64 v0, 0, 1, vcc
 ; DAGISEL-GFX9-NEXT:    s_setpc_b64 s[30:31]
@@ -24,10 +20,7 @@ define i1 @workitem_zero() {
 ; DAGISEL-GFX942-LABEL: workitem_zero:
 ; DAGISEL-GFX942:       ; %bb.0: ; %entry
 ; DAGISEL-GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; DAGISEL-GFX942-NEXT:    v_lshrrev_b32_e32 v0, 20, v31
-; DAGISEL-GFX942-NEXT:    v_lshrrev_b32_e32 v1, 10, v31
-; DAGISEL-GFX942-NEXT:    v_or3_b32 v0, v31, v1, v0
-; DAGISEL-GFX942-NEXT:    v_and_b32_e32 v0, 0x3ff, v0
+; DAGISEL-GFX942-NEXT:    v_and_b32_e32 v0, 0x3fffffff, v31
 ; DAGISEL-GFX942-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v0
 ; DAGISEL-GFX942-NEXT:    s_nop 1
 ; DAGISEL-GFX942-NEXT:    v_cndmask_b32_e64 v0, 0, 1, vcc
@@ -40,11 +33,7 @@ define i1 @workitem_zero() {
 ; DAGISEL-GFX12-NEXT:    s_wait_samplecnt 0x0
 ; DAGISEL-GFX12-NEXT:    s_wait_bvhcnt 0x0
 ; DAGISEL-GFX12-NEXT:    s_wait_kmcnt 0x0
-; DAGISEL-GFX12-NEXT:    v_lshrrev_b32_e32 v0, 20, v31
-; DAGISEL-GFX12-NEXT:    v_lshrrev_b32_e32 v1, 10, v31
-; DAGISEL-GFX12-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
-; DAGISEL-GFX12-NEXT:    v_or3_b32 v0, v31, v1, v0
-; DAGISEL-GFX12-NEXT:    v_and_b32_e32 v0, 0x3ff, v0
+; DAGISEL-GFX12-NEXT:    v_and_b32_e32 v0, 0x3fffffff, v31
 ; DAGISEL-GFX12-NEXT:    s_delay_alu instid0(VALU_DEP_1)
 ; DAGISEL-GFX12-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
 ; DAGISEL-GFX12-NEXT:    s_wait_alu 0xfffd
@@ -106,11 +95,7 @@ define i1 @workitem_nonzero() {
 ; DAGISEL-GFX9-LABEL: workitem_nonzero:
 ; DAGISEL-GFX9:       ; %bb.0: ; %entry
 ; DAGISEL-GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; DAGISEL-GFX9-NEXT:    v_lshrrev_b32_e32 v1, 10, v31
-; DAGISEL-GFX9-NEXT:    v_lshrrev_b32_e32 v0, 20, v31
-; DAGISEL-GFX9-NEXT:    v_or_b32_e32 v1, v31, v1
-; DAGISEL-GFX9-NEXT:    v_or_b32_e32 v0, v1, v0
-; DAGISEL-GFX9-NEXT:    v_and_b32_e32 v0, 0x3ff, v0
+; DAGISEL-GFX9-NEXT:    v_and_b32_e32 v0, 0x3fffffff, v31
 ; DAGISEL-GFX9-NEXT:    v_cmp_ne_u32_e32 vcc, 0, v0
 ; DAGISEL-GFX9-NEXT:    v_cndmask_b32_e64 v0, 0, 1, vcc
 ; DAGISEL-GFX9-NEXT:    s_setpc_b64 s[30:31]
@@ -118,10 +103,7 @@ define i1 @workitem_nonzero() {
 ; DAGISEL-GFX942-LABEL: workitem_nonzero:
 ; DAGISEL-GFX942:       ; %bb.0: ; %entry
 ; DAGISEL-GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; DAGISEL-GFX942-NEXT:    v_lshrrev_b32_e32 v0, 20, v31
-; DAGISEL-GFX942-NEXT:    v_lshrrev_b32_e32 v1, 10, v31
-; DAGISEL-GFX942-NEXT:    v_or3_b32 v0, v31, v1, v0
-; DAGISEL-GFX942-NEXT:    v_and_b32_e32 v0, 0x3ff, v0
+; DAGISEL-GFX942-NEXT:    v_and_b32_e32 v0, 0x3fffffff, v31
 ; DAGISEL-GFX942-NEXT:    v_cmp_ne_u32_e32 vcc, 0, v0
 ; DAGISEL-GFX942-NEXT:    s_nop 1
 ; DAGISEL-GFX942-NEXT:    v_cndmask_b32_e64 v0, 0, 1, vcc
@@ -134,11 +116,7 @@ define i1 @workitem_nonzero() {
 ; DAGISEL-GFX12-NEXT:    s_wait_samplecnt 0x0
 ; DAGISEL-GFX12-NEXT:    s_wait_bvhcnt 0x0
 ; DAGISEL-GFX12-NEXT:    s_wait_kmcnt 0x0
-; DAGISEL-GFX12-NEXT:    v_lshrrev_b32_e32 v0, 20, v31
-; DAGISEL-GFX12-NEXT:    v_lshrrev_b32_e32 v1, 10, v31
-; DAGISEL-GFX12-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
-; DAGISEL-GFX12-NEXT:    v_or3_b32 v0, v31, v1, v0
-; DAGISEL-GFX12-NEXT:    v_and_b32_e32 v0, 0x3ff, v0
+; DAGISEL-GFX12-NEXT:    v_and_b32_e32 v0, 0x3fffffff, v31
 ; DAGISEL-GFX12-NEXT:    s_delay_alu instid0(VALU_DEP_1)
 ; DAGISEL-GFX12-NEXT:    v_cmp_ne_u32_e32 vcc_lo, 0, v0
 ; DAGISEL-GFX12-NEXT:    s_wait_alu 0xfffd

Comment on lines 28928 to 28929
if (N0.getOpcode() != ISD::OR || !isa<ConstantSDNode>(N1))
return SDValue();
Contributor

I think a lot of this matching code would be much neater with sd_match.

Contributor Author

I don't think so, except maybe for the shift part, and even then it doesn't make the code much shorter. I don't match a tree of nodes, just one node at a time.

Collaborator

@RKSimon RKSimon left a comment


Why DAG and not InstCombine for this?

@Pierre-vh
Contributor Author

Why DAG and not InstCombine for this?

The intrinsics we want to optimize with this aren't lowered yet by the time InstCombine runs.

@Pierre-vh Pierre-vh force-pushed the users/pierre-vh/dag-combine-workitems-intrinsics branch from 7ff0806 to bb21ba4 Compare June 30, 2025 08:27
@jayfoad
Contributor

jayfoad commented Jun 30, 2025

Does this also handle the case where all of the values ORed together are shifted, like (setcc ((x >> c0 | x >> c1 | ...) & mask)) ?

@Pierre-vh Pierre-vh force-pushed the users/pierre-vh/tests-for-id-intrinsic-opts branch from 4eee684 to 2b9f0b5 Compare June 30, 2025 09:14
@Pierre-vh Pierre-vh force-pushed the users/pierre-vh/dag-combine-workitems-intrinsics branch from bb21ba4 to a3ee9d6 Compare June 30, 2025 09:14
@Pierre-vh
Contributor Author

Does this also handle the case where all of the values ORed together are shifted, like (setcc ((x >> c0 | x >> c1 | ...) & mask)) ?

Yes, I added a test for it.

@Pierre-vh Pierre-vh force-pushed the users/pierre-vh/tests-for-id-intrinsic-opts branch from 2b9f0b5 to e08e64f Compare July 16, 2025 08:51
@Pierre-vh Pierre-vh force-pushed the users/pierre-vh/dag-combine-workitems-intrinsics branch from c154703 to 00dc5b5 Compare July 16, 2025 08:51
@Pierre-vh Pierre-vh force-pushed the users/pierre-vh/tests-for-id-intrinsic-opts branch from e08e64f to 267fb0b Compare July 24, 2025 10:35
@Pierre-vh Pierre-vh force-pushed the users/pierre-vh/dag-combine-workitems-intrinsics branch from 00dc5b5 to ed92c03 Compare July 24, 2025 10:35
@Pierre-vh Pierre-vh force-pushed the users/pierre-vh/tests-for-id-intrinsic-opts branch from 267fb0b to 3e84a66 Compare July 29, 2025 10:08
@Pierre-vh Pierre-vh force-pushed the users/pierre-vh/dag-combine-workitems-intrinsics branch from ed92c03 to 2ae6cfd Compare July 29, 2025 10:08
Contributor Author

Pierre-vh commented Jul 29, 2025

Merge activity

  • Jul 29, 10:57 AM UTC: A user started a stack merge that includes this pull request via Graphite.
  • Jul 29, 11:05 AM UTC: The Graphite merge of this pull request was cancelled.

Collaborator

@RKSimon RKSimon left a comment


You can probably tidy this up a great deal by using the sd_match matchers

Base automatically changed from users/pierre-vh/tests-for-id-intrinsic-opts to main July 29, 2025 11:06
Fold sequences where we extract a number of contiguous bit fields from a value,
merge them into the low bits, and then check whether those low bits are zero.

It seems like a strange sequence at first, but it's an idiom used by the
AMDGPU device libs to check workitem IDs.

I put this in DAGCombiner rather than the target combiner because this is a
generic, valid transform that is also fairly niche, so I think there is little
risk of a combine loop.

See #136727
@Pierre-vh Pierre-vh force-pushed the users/pierre-vh/dag-combine-workitems-intrinsics branch from 2ae6cfd to be70e00 Compare July 29, 2025 11:08
@Pierre-vh Pierre-vh merged commit c4b1557 into main Jul 30, 2025
9 checks passed
@Pierre-vh Pierre-vh deleted the users/pierre-vh/dag-combine-workitems-intrinsics branch July 30, 2025 08:27
return SDValue();

auto ShiftAmtVal = cast<ConstantSDNode>(ShiftAmt)->getAsZExtVal();
if (ShiftAmtVal > RootMask.getBitWidth())
Collaborator

@Pierre-vh shouldn't this be ShiftAmtVal >= RootMask.getBitWidth()?

Labels
backend:AMDGPU llvm:SelectionDAG