Skip to content

[DemandedBits] Support non-constant shift amounts #148880

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

karouzakisp
Copy link

@karouzakisp karouzakisp commented Jul 15, 2025

This patch adds support for the shift operators to handle non-constant shift operands.

ashr proof -->https://alive2.llvm.org/ce/z/EN-siK
lshr proof --> https://alive2.llvm.org/ce/z/eeGzyB
shl proof --> https://alive2.llvm.org/ce/z/dpvbkq

This is done by supporting shift operators to handle non
constant shift amount.
Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added the llvm:analysis Includes value tracking, cost tables and constant folding label Jul 15, 2025
@llvmbot
Copy link
Member

llvmbot commented Jul 15, 2025

@llvm/pr-subscribers-llvm-analysis

Author: Panagiotis K (karouzakisp)

Changes

This is part of a larger PR: #148853
To improve the DemandedBits Analysis.

Here we add support to the shift operators to handle non-constant shift operands.


Full diff: https://github.com/llvm/llvm-project/pull/148880.diff

2 Files Affected:

  • (modified) llvm/lib/Analysis/DemandedBits.cpp (+46)
  • (modified) llvm/test/Analysis/DemandedBits/shl.ll (+47-1)
diff --git a/llvm/lib/Analysis/DemandedBits.cpp b/llvm/lib/Analysis/DemandedBits.cpp
index 6694d5cc06c8c..2d30575c19130 100644
--- a/llvm/lib/Analysis/DemandedBits.cpp
+++ b/llvm/lib/Analysis/DemandedBits.cpp
@@ -36,6 +36,7 @@
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/KnownBits.h"
+#include "llvm/Support/MathExtras.h"
 #include "llvm/Support/raw_ostream.h"
 #include <algorithm>
 #include <cstdint>
@@ -183,6 +184,17 @@ void DemandedBits::determineLiveOperandBits(
           AB |= APInt::getHighBitsSet(BitWidth, ShiftAmt+1);
         else if (S->hasNoUnsignedWrap())
           AB |= APInt::getHighBitsSet(BitWidth, ShiftAmt);
+      } else {
+        ComputeKnownBits(BitWidth, UserI->getOperand(1), nullptr);
+        unsigned Min = Known.getMinValue().getLimitedValue(BitWidth - 1);
+        unsigned Max = Known.getMaxValue().getLimitedValue(BitWidth - 1);
+        // similar to Lshr case
+        AB = (AOut.lshr(Min) | AOut.lshr(Max));
+        const auto *S = cast<ShlOperator>(UserI);
+        if (S->hasNoSignedWrap())
+          AB |= APInt::getHighBitsSet(BitWidth, Max + 1);
+        else if (S->hasNoUnsignedWrap())
+          AB |= APInt::getHighBitsSet(BitWidth, Max);
       }
     }
     break;
@@ -197,6 +209,19 @@ void DemandedBits::determineLiveOperandBits(
         // (they must be zero).
         if (cast<LShrOperator>(UserI)->isExact())
           AB |= APInt::getLowBitsSet(BitWidth, ShiftAmt);
+      } else {
+        ComputeKnownBits(BitWidth, UserI->getOperand(1), nullptr);
+        unsigned Min = Known.getMinValue().getLimitedValue(BitWidth - 1);
+        unsigned Max = Known.getMaxValue().getLimitedValue(BitWidth - 1);
+        // Suppose AOut == 0b0000 0011
+        // [min, max] = [1, 3]
+        // shift by 1 we get 0b0000 0110
+        // shift by 2 we get 0b0000 1100
+        // shift by 3 we get 0b0001 1000
+        // we take the or here because need to cover all the above possibilities
+        AB = (AOut.shl(Min) | AOut.shl(Max));
+        if (cast<LShrOperator>(UserI)->isExact())
+          AB |= APInt::getLowBitsSet(BitWidth, Max);
       }
     }
     break;
@@ -217,6 +242,27 @@ void DemandedBits::determineLiveOperandBits(
         // (they must be zero).
         if (cast<AShrOperator>(UserI)->isExact())
           AB |= APInt::getLowBitsSet(BitWidth, ShiftAmt);
+      } else {
+        ComputeKnownBits(BitWidth, UserI->getOperand(1), nullptr);
+        unsigned Min = Known.getMinValue().getLimitedValue(BitWidth - 1);
+        unsigned Max = Known.getMaxValue().getLimitedValue(BitWidth - 1);
+        AB = (AOut.shl(Min) | AOut.shl(Max));
+
+        if (Max) {
+          // Suppose AOut = 0011 1100
+          // [min, max] = [1, 3]
+          // ShiftAmount = 1 : Mask is 1000 0000
+          // ShiftAmount = 2 : Mask is 1100 0000
+          // ShiftAmount = 3 : Mask is 1110 0000
+          // The Mask with Max covers every case in [min, max],
+          // so we are done
+          if ((AOut & APInt::getHighBitsSet(BitWidth, Max)).getBoolValue())
+            AB.setSignBit();
+        }
+        // If the shift is exact, then the low bits are not dead
+        // (they must be zero).
+        if (cast<AShrOperator>(UserI)->isExact())
+          AB |= APInt::getLowBitsSet(BitWidth, Max);
       }
     }
     break;
diff --git a/llvm/test/Analysis/DemandedBits/shl.ll b/llvm/test/Analysis/DemandedBits/shl.ll
index e41f5f4107735..c3313a93c1e85 100644
--- a/llvm/test/Analysis/DemandedBits/shl.ll
+++ b/llvm/test/Analysis/DemandedBits/shl.ll
@@ -57,10 +57,56 @@ define i8 @test_shl(i32 %a, i32 %b) {
 ; CHECK-DAG:  DemandedBits: 0xff for %shl.t = trunc i32 %shl to i8
 ; CHECK-DAG:  DemandedBits: 0xff for %shl in %shl.t = trunc i32 %shl to i8
 ; CHECK-DAG:  DemandedBits: 0xff for %shl = shl i32 %a, %b
-; CHECK-DAG:  DemandedBits: 0xffffffff for %a in %shl = shl i32 %a, %b
+; CHECK-DAG:  DemandedBits: 0xff for %a in %shl = shl i32 %a, %b
 ; CHECK-DAG:  DemandedBits: 0xffffffff for %b in %shl = shl i32 %a, %b
 ;
   %shl = shl i32 %a, %b
   %shl.t = trunc i32 %shl to i8
   ret i8 %shl.t
 }
+
+define i8 @test_shl_var_amount(i32 %a, i32 %b){
+; CHECK-LABEL: 'test_shl_var_amount'
+; CHECK-DAG: DemandedBits: 0xff for   %5 = trunc i32 %4 to i8
+; CHECK-DAG: DemandedBits: 0xff for %4 in   %5 = trunc i32 %4 to i8
+; CHECK-DAG: DemandedBits: 0xff for   %4 = shl i32 %1, %3
+; CHECK-DAG: DemandedBits: 0xff for %1 in   %4 = shl i32 %1, %3
+; CHECK-DAG: DemandedBits: 0xffffffff for %3 in   %4 = shl i32 %1, %3
+; CHECK-DAG: DemandedBits: 0xff for   %2 = trunc i32 %1 to i8
+; CHECK-DAG: DemandedBits: 0xff for %1 in   %2 = trunc i32 %1 to i8
+; CHECK-DAG: DemandedBits: 0xffffffff for   %3 = zext i8 %2 to i32
+; CHECK-DAG: DemandedBits: 0xff for %2 in   %3 = zext i8 %2 to i32
+; CHECK-DAG: DemandedBits: 0xff for   %1 = add nsw i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xff for %a in   %1 = add nsw i32 %a, %b
+; CHECK-DAG: DemandedBits: 0xff for %b in   %1 = add nsw i32 %a, %b
+;
+  %1 = add nsw i32 %a, %b
+  %2 = trunc i32 %1 to i8
+  %3 = zext i8 %2 to i32
+  %4 = shl i32 %1, %3
+  %5 = trunc i32 %4 to i8
+  ret i8 %5
+}
+
+define i8 @test_shl_var_amount_nsw(i32 %a, i32 %b){
+ ; CHECK-LABEL 'test_shl_var_amount_nsw'
+ ; CHECK-DAG: DemandedBits: 0xff for   %5 = trunc i32 %4 to i8
+ ; CHECK-DAG: DemandedBits: 0xff for %4 in   %5 = trunc i32 %4 to i8
+ ; CHECK-DAG: DemandedBits: 0xff for   %4 = shl nsw i32 %1, %3
+ ; CHECK-DAG: DemandedBits: 0xffffffff for %1 in   %4 = shl nsw i32 %1, %3
+ ; CHECK-DAG: DemandedBits: 0xffffffff for %3 in   %4 = shl nsw i32 %1, %3
+ ; CHECK-DAG: DemandedBits: 0xffffffff for   %3 = zext i8 %2 to i32
+ ; CHECK-DAG: DemandedBits: 0xff for %2 in   %3 = zext i8 %2 to i32
+ ; CHECK-DAG: DemandedBits: 0xff for   %2 = trunc i32 %1 to i8
+ ; CHECK-DAG: DemandedBits: 0xff for %1 in   %2 = trunc i32 %1 to i8
+ ; CHECK-DAG: DemandedBits: 0xffffffff for   %1 = add nsw i32 %a, %b
+ ; CHECK-DAG: DemandedBits: 0xffffffff for %a in   %1 = add nsw i32 %a, %b
+ ; CHECK-DAG: DemandedBits: 0xffffffff for %b in   %1 = add nsw i32 %a, %b
+ ;
+  %1 = add nsw i32 %a, %b
+  %2 = trunc i32 %1 to i8
+  %3 = zext i8 %2 to i32
+  %4 = shl nsw i32 %1, %3
+  %5 = trunc i32 %4 to i8
+  ret i8 %5
+}

@karouzakisp
Copy link
Author

@nikic @artagnon @jayfoad Could you please review? Thanks

@artagnon artagnon requested review from artagnon, jayfoad and nikic July 15, 2025 16:22
Copy link
Contributor

@artagnon artagnon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing coverage for lshr and ashr? Could you kindly add tests for them?

@topperc
Copy link
Collaborator

topperc commented Jul 15, 2025

Missing tests for right shifts?

@artagnon
Copy link
Contributor

Kindly note that we only have a squash-and-merge. As a result:

  1. Your commit message should be filled into the PR, including the title. The PR's title and body will be used as the commit message, and the text in your commit will be discarded when landing.
  2. Kindly add additional changes as separate commits, and avoid force-pushing except when a rebase is required.

Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please provide the alive2 proof. See also my previous comment #148853 (review)

@karouzakisp
Copy link
Author

Missing tests for right shifts?

I just added the tests

@karouzakisp karouzakisp changed the title [LLVM] Improve the DemandedBits Analysis [LLVM] Improve the shift operators of the DemandedBits Analysis Jul 15, 2025
@karouzakisp
Copy link
Author

Please provide the alive2 proof. See also my previous comment #148853 (review)

I am not certain which transformation I should verify. Maybe the one on your previous comment?

@artagnon
Copy link
Contributor

Please provide the alive2 proof. See also my previous comment #148853 (review)

I am not certain which transformation I should verify. Maybe the one on your previous comment?

I think what we want verified is the algorithm of the analysis itself, not a particular transformation: if can express the code you wrote for DemandedBits in a language that Alive2 can verify, that would be great (this isn't exactly straight-forward, but @dtcxzyw left some hints). Think about it, and try it out: we'll help out.

@nikic nikic changed the title [LLVM] Improve the shift operators of the DemandedBits Analysis [DemandedBits] Support non-constant shift amounts Jul 16, 2025
Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Miscompilation reproducer: https://alive2.llvm.org/ce/z/bSBzWM

; bin/opt -passes=bdce test.ll -S
define i16 @src(i32 range(i32 0, 2) %x) {
entry:
  %or = or i32 0, 48
  %shl = shl i32 %or, %x
  %trunc = trunc i32 %shl to i16
  ret i16 %trunc
}
define i16 @tgt(i32 range(i32 0, 2) %x) {
entry:
  %shl = shl i32 0, %x
  %trunc = trunc i32 %shl to i16
  ret i16 %trunc
}

@karouzakisp
Copy link
Author

Miscompilation reproducer: https://alive2.llvm.org/ce/z/bSBzWM

; bin/opt -passes=bdce test.ll -S
define i16 @src(i32 range(i32 0, 2) %x) {
entry:
  %or = or i32 0, 48
  %shl = shl i32 %or, %x
  %trunc = trunc i32 %shl to i16
  ret i16 %trunc
}
define i16 @tgt(i32 range(i32 0, 2) %x) {
entry:
  %shl = shl i32 0, %x
  %trunc = trunc i32 %shl to i16
  ret i16 %trunc
}

Fixed, Alive verifications coming soon. Hopefully this week!

@karouzakisp
Copy link
Author

Please provide the alive2 proof. See also my previous comment #148853 (review)

@dtcxzyw Here are the alive2 proofs -->
https://alive2.llvm.org/ce/z/SxgY_5

Please note that since my transformation contains a loop and the Alive syntax doesn't permit loops, I added various ranges.

Please let me know if it's okay.

@artagnon, Please let me know what you think.

@dtcxzyw
Copy link
Member

dtcxzyw commented Jul 19, 2025

my transformation contains a loop and the Alive syntax doesn't permit loops

You can use a smaller integer bitwidth (e.g., i4/i8), then unroll the loop with -src-unroll=8 -tgt-unroll=8.

@karouzakisp
Copy link
Author

my transformation contains a loop and the Alive syntax doesn't permit loops

You can use a smaller integer bitwidth (e.g., i4/i8), then unroll the loop with -src-unroll=8 -tgt-unroll=8.

Thanks for the tip. Here is the updated proof --> https://alive2.llvm.org/ce/z/tCvUT6

@dtcxzyw
Copy link
Member

dtcxzyw commented Jul 20, 2025

my transformation contains a loop and the Alive syntax doesn't permit loops

You can use a smaller integer bitwidth (e.g., i4/i8), then unroll the loop with -src-unroll=8 -tgt-unroll=8.

Thanks for the tip. Here is the updated proof --> https://alive2.llvm.org/ce/z/tCvUT6

In your proof, the range of shamt is not taken into account. Updated: https://alive2.llvm.org/ce/z/n4hgkX
Can you please add proofs for shl nsw/shl nuw/lshr/lshr exact/ashr/ashr exact?
Can you please add proofs for lshr/ashr?
Then you should paste the links into the PR description.

@karouzakisp
Copy link
Author

my transformation contains a loop and the Alive syntax doesn't permit loops

You can use a smaller integer bitwidth (e.g., i4/i8), then unroll the loop with -src-unroll=8 -tgt-unroll=8.

Thanks for the tip. Here is the updated proof --> https://alive2.llvm.org/ce/z/tCvUT6

In your proof, the range of shamt is not taken into account. Updated: https://alive2.llvm.org/ce/z/n4hgkX Can you please add proofs for shl nsw/shl nuw/lshr/lshr exact/ashr/ashr exact? Can you please add proofs for lshr/ashr? Then you should paste the links into the PR description.

Thanks for the help.

I just added the proofs for lshr and ashr. Also, I updated the cpp code.

@karouzakisp
Copy link
Author

I just updated all the proofs. Making changes in the code now, to reduce the complexity to L times log(L)

@karouzakisp karouzakisp requested a review from artagnon July 29, 2025 07:57
@karouzakisp karouzakisp requested a review from topperc July 29, 2025 20:57
Copy link
Contributor

@artagnon artagnon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good overall, with extensive testing, although I'll leave the final sign-off to @dtcxzyw.

@karouzakisp
Copy link
Author

Looks very good overall, with extensive testing, although I'll leave the final sign-off to @dtcxzyw.

Artagnon, thanks for the reviews and the comments.

I just updated the code.

@karouzakisp
Copy link
Author

@dtcxzyw Could you please review? Thanks!

if (S->hasNoSignedWrap())
AB |= APInt::getHighBitsSet(BitWidth, Max + 1);
else if (S->hasNoUnsignedWrap())
AB |= APInt::getHighBitsSet(BitWidth, Max);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much of a real world difference is there between what this patch does and

unsigned ActiveBits = AOut.getActiveBits);
AB = APInt::getLowBitsSet(BitWidth, ActiveBits);
if (S->hasNoSignedWrap())
  AB |= APInt::getHighBitsSet(BitWidth, BitWidth);
else if (S->hasNoUnsignedWrap())
  AB |= APInt::getHighBitsSet(BitWidth, BitWidth-1);

That doesn't require calling computeKnownBits on the shift amount and is consistent with what InstCombineSimplifyDemanded does.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to check on the big benchmarks that @dtcxzyw already ran, because BDCE has very few triggers in the llvm test suite.

Should I create a new PR with that approach?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, I added similar logic from here to InstCombineSimplifyDemanded and observed an improvement in code size of up to 1.0% with the O2 pipeline. This is a work in progress, though, that's why there isn't a pull request yet.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to check on the big benchmarks that @dtcxzyw already ran, because BDCE has very few triggers in the llvm test suite.

Should I create a new PR with that approach?

@dtcxzyw what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course. You don't need to create a new PR, just send me your commit hash.
The bad thing is that llvm-opt-benchmark doesn't support comparing diffs between two PRs, as the diff has been reduced to fit GitHub's limit. But we can compare them by the number of line changes and the statistics diff.

Copy link
Author

@karouzakisp karouzakisp Aug 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dtcxzyw here is the commit hash

91febb9

It's from my branch at

demanded-bit-exp

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@topperc what do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can go ahead with this patch. Thanks for getting the data.

AB = APInt::getZero(BitWidth);
uint64_t LoopRange = Max - Min;
APInt Mask = AOut;
for (unsigned ShiftAmnt = 1; ShiftAmnt <= LoopRange; ShiftAmnt <<= 1) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not for this patch, but what if we know the shift amount is even or odd? We would want to use every other shift amount in the range.

Copy link
Author

@karouzakisp karouzakisp Aug 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you mean if the LoopRange is even or odd? I think the shift amount is always even.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean the shift amount from the instruction itself. computeKnownBits may tell us that the lsb(s) of the shift amount are known to be zero or one. This loop is trying to evaluate all possible shift amounts, but we don't exclude cases that would be impossible based on known values of the lsb(s).

Something like this does exist in the code from KnownBits::shl.

  // Find the common bits from all possible shifts.                              
  unsigned ShiftAmtZeroMask = RHS.Zero.zextOrTrunc(32).getZExtValue();           
  unsigned ShiftAmtOneMask = RHS.One.zextOrTrunc(32).getZExtValue();             
  Known.Zero.setAllBits();                                                       
  Known.One.setAllBits();                                                        
  for (unsigned ShiftAmt = MinShiftAmount; ShiftAmt <= MaxShiftAmount;           
       ++ShiftAmt) {                                                             
    // Skip if the shift amount is impossible.                                   
    if ((ShiftAmtZeroMask & ShiftAmt) != 0 ||                                    
        (ShiftAmtOneMask | ShiftAmt) != ShiftAmt)                                
      continue;                                                                  
    Known = Known.intersectWith(ShiftByConst(LHS, ShiftAmt));                    
    if (Known.isUnknown())                                                       
      break;                                                                     
  }

Copy link
Author

@karouzakisp karouzakisp Aug 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the ShiftAmount of the Shift Inst isn't known,
Then what Min and Max would be?
Min = 0?
Max = BitWidth - 1?
unsigned Min = Known.getMinValue().getLimitedValue(BitWidth - 1); unsigned Max = Known.getMaxValue().getLimitedValue(BitWidth - 1);

Over-approximation fixed

Co-authored-by: Yingwei Zheng <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:analysis Includes value tracking, cost tables and constant folding
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants