Skip to content

[DemandedBits] Support non-constant shift amounts #148880

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: main
Choose a base branch
from
69 changes: 69 additions & 0 deletions llvm/lib/Analysis/DemandedBits.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -76,6 +76,26 @@ void DemandedBits::determineLiveOperandBits(
computeKnownBits(V2, Known2, DL, &AC, UserI, &DT);
}
};
auto GetShiftedRange = [&](uint64_t Min, uint64_t Max, bool ShiftLeft) {
auto ShiftF = [ShiftLeft](const APInt &Mask, unsigned ShiftAmnt) {
return ShiftLeft ? Mask.shl(ShiftAmnt) : Mask.lshr(ShiftAmnt);
};
AB = APInt::getZero(BitWidth);
uint64_t LoopRange = Max - Min;
APInt Mask = AOut;
APInt Shifted = AOut; // AOut | (AOut << 1) | ... | (AOut << (ShiftAmnt - 1)
for (unsigned ShiftAmnt = 1; ShiftAmnt <= LoopRange; ShiftAmnt <<= 1) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not for this patch, but what if we know the shift amount is even or odd? We would want to use every other shift amount in the range.

Copy link
Author

@karouzakisp karouzakisp Aug 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you mean if the LoopRange is even or odd? I think the shift amount is always even.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean the shift amount from the instruction itself. computeKnownBits may tell us that the lsb(s) of the shift amount are known to be zero or one. This loop is trying to evaluate all possible shift amounts, but we don't exclude cases that would be impossible based on known values of the lsb(s).

Something like this does exist in the code from KnownBits::shl.

  // Find the common bits from all possible shifts.                              
  unsigned ShiftAmtZeroMask = RHS.Zero.zextOrTrunc(32).getZExtValue();           
  unsigned ShiftAmtOneMask = RHS.One.zextOrTrunc(32).getZExtValue();             
  Known.Zero.setAllBits();                                                       
  Known.One.setAllBits();                                                        
  for (unsigned ShiftAmt = MinShiftAmount; ShiftAmt <= MaxShiftAmount;           
       ++ShiftAmt) {                                                             
    // Skip if the shift amount is impossible.                                   
    if ((ShiftAmtZeroMask & ShiftAmt) != 0 ||                                    
        (ShiftAmtOneMask | ShiftAmt) != ShiftAmt)                                
      continue;                                                                  
    Known = Known.intersectWith(ShiftByConst(LHS, ShiftAmt));                    
    if (Known.isUnknown())                                                       
      break;                                                                     
  }

Copy link
Author

@karouzakisp karouzakisp Aug 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the ShiftAmount of the Shift Inst isn't known,
Then what Min and Max would be?
Min = 0?
Max = BitWidth - 1?
unsigned Min = Known.getMinValue().getLimitedValue(BitWidth - 1); unsigned Max = Known.getMaxValue().getLimitedValue(BitWidth - 1);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

computeKnownBits may tell us that the lsb(s) of the shift amount are known to be zero or one. This loop is trying to evaluate all possible shift amounts, but we don't exclude cases that would be impossible based on known values of the lsb(s).

Enumerating all possible shift amounts may be expensive (see my previous comment #148880 (comment)). But I am fine to reuse the KnownBits' logic. ~KnownBits::shl/lshr/ashr(AOut, ShAmt).Zeros is exactly what we want.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dtcxzyw The loop is optimized.
for a 128 bit value we only need 2 * 8 + 1 = 17 shifts and 17 ors.
KnownBits::shl/lshr/ashr does an intersection; we don't do an intersection, we do a union. I am not certain that we can employ the same approach as they do since we use a different loop. In their loop, they increment by 1
for (unsigned ShiftAmt = MinShiftAmount; ShiftAmt <= MaxShiftAmount; ++ShiftAmt) {

Instead, we increment by the next power of two
for (unsigned ShiftAmnt = 1; ShiftAmnt <= LoopRange; ShiftAmnt <<= 1) {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

KnownBits::shl/lshr/ashr does an intersection; we don't do an intersection, we do a union.

~Intersection.Zeros is a union.

Copy link
Author

@karouzakisp karouzakisp Aug 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant this
Known = Known.intersectWith(ShiftByConst(LHS, ShiftAmt));
Anyway, since we are using APInts if we switch the loop to do increments by 1, even if we employ the same strategy as the KnownBits technique the time complexity is O(n^2) instead of the O(nlogn) we have now.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@topperc what do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think we should handle the lsb case in this patch. So we can keep the optimized loop for now and change it if/when we implement the lsb case which will require additional benchmark evaluation.

if (LoopRange & ShiftAmnt) {
// Account for (LoopRange - ShiftAmnt, LoopRange]
Mask |= ShiftF(Shifted, LoopRange - ShiftAmnt + 1);
// Clears the low bit.
LoopRange -= ShiftAmnt;
}
// [0, ShiftAmnt) -> [0, ShiftAmnt * 2)
Shifted |= ShiftF(Shifted, ShiftAmnt);
}
AB = ShiftF(Mask, Min);
};

switch (UserI->getOpcode()) {
default: break;
Expand Down Expand Up @@ -183,6 +203,17 @@ void DemandedBits::determineLiveOperandBits(
AB |= APInt::getHighBitsSet(BitWidth, ShiftAmt+1);
else if (S->hasNoUnsignedWrap())
AB |= APInt::getHighBitsSet(BitWidth, ShiftAmt);
} else {
ComputeKnownBits(BitWidth, UserI->getOperand(1), nullptr);
uint64_t Min = Known.getMinValue().getLimitedValue(BitWidth - 1);
uint64_t Max = Known.getMaxValue().getLimitedValue(BitWidth - 1);
// similar to Lshr case
GetShiftedRange(Min, Max, /*ShiftLeft=*/false);
const auto *S = cast<ShlOperator>(UserI);
if (S->hasNoSignedWrap())
AB |= APInt::getHighBitsSet(BitWidth, Max + 1);
else if (S->hasNoUnsignedWrap())
AB |= APInt::getHighBitsSet(BitWidth, Max);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much of a real world difference is there between what this patch does and

unsigned ActiveBits = AOut.getActiveBits);
AB = APInt::getLowBitsSet(BitWidth, ActiveBits);
if (S->hasNoSignedWrap())
  AB |= APInt::getHighBitsSet(BitWidth, BitWidth);
else if (S->hasNoUnsignedWrap())
  AB |= APInt::getHighBitsSet(BitWidth, BitWidth-1);

That doesn't require calling computeKnownBits on the shift amount and is consistent with what InstCombineSimplifyDemanded does.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to check on the big benchmarks that @dtcxzyw already ran, because BDCE has very few triggers in the llvm test suite.

Should I create a new PR with that approach?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, I added similar logic from here to InstCombineSimplifyDemanded and observed an improvement in code size of up to 1.0% with the O2 pipeline. This is a work in progress, though, that's why there isn't a pull request yet.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to check on the big benchmarks that @dtcxzyw already ran, because BDCE has very few triggers in the llvm test suite.

Should I create a new PR with that approach?

@dtcxzyw what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course. You don't need to create a new PR, just send me your commit hash.
The bad thing is that llvm-opt-benchmark doesn't support comparing diffs between two PRs, as the diff has been reduced to fit GitHub's limit. But we can compare them by the number of line changes and the statistics diff.

Copy link
Author

@karouzakisp karouzakisp Aug 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dtcxzyw here is the commit hash

91febb9

It's from my branch at

demanded-bit-exp

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@topperc what do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can go ahead with this patch. Thanks for getting the data.

}
}
break;
Expand All @@ -197,6 +228,24 @@ void DemandedBits::determineLiveOperandBits(
// (they must be zero).
if (cast<LShrOperator>(UserI)->isExact())
AB |= APInt::getLowBitsSet(BitWidth, ShiftAmt);
} else {
ComputeKnownBits(BitWidth, UserI->getOperand(1), nullptr);
uint64_t Min = Known.getMinValue().getLimitedValue(BitWidth - 1);
uint64_t Max = Known.getMaxValue().getLimitedValue(BitWidth - 1);
// Suppose AOut == 0b0000 0001
// [min, max] = [1, 3]
// iteration 1 shift by 1 mask is 0b0000 0011
// iteration 2 shift by 2 mask is 0b0000 1111
// iteration 3, shiftAmnt = 4 > max - min, we stop.
//
// After the iterations we need one more shift by min,
// to move from 0b0000 1111 to --> 0b0001 1110.
// The loop populates the mask relative to (0,...,max-min),
// but we need coverage from (min, max).
// This is why the shift by min is needed.
GetShiftedRange(Min, Max, /*ShiftLeft=*/true);
if (cast<LShrOperator>(UserI)->isExact())
AB |= APInt::getLowBitsSet(BitWidth, Max);
}
}
break;
Expand All @@ -217,6 +266,26 @@ void DemandedBits::determineLiveOperandBits(
// (they must be zero).
if (cast<AShrOperator>(UserI)->isExact())
AB |= APInt::getLowBitsSet(BitWidth, ShiftAmt);
} else {
ComputeKnownBits(BitWidth, UserI->getOperand(1), nullptr);
uint64_t Min = Known.getMinValue().getLimitedValue(BitWidth - 1);
uint64_t Max = Known.getMaxValue().getLimitedValue(BitWidth - 1);
GetShiftedRange(Min, Max, /*ShiftLeft=*/true);
if (Max &&
(AOut & APInt::getHighBitsSet(BitWidth, Max)).getBoolValue()) {
// Suppose AOut = 0011 1100
// [min, max] = [1, 3]
// ShiftAmount = 1 : Mask is 1000 0000
// ShiftAmount = 2 : Mask is 1100 0000
// ShiftAmount = 3 : Mask is 1110 0000
// The Mask with Max covers every case in [min, max],
// so we are done
AB.setSignBit();
}
// If the shift is exact, then the low bits are not dead
// (they must be zero).
if (cast<AShrOperator>(UserI)->isExact())
AB |= APInt::getLowBitsSet(BitWidth, Max);
}
}
break;
Expand Down
198 changes: 198 additions & 0 deletions llvm/test/Analysis/DemandedBits/ashr.ll
Original file line number Diff line number Diff line change
@@ -0,0 +1,198 @@
; RUN: opt -S -disable-output -passes="print<demanded-bits>" < %s 2>&1 | FileCheck %s

define i8 @test_ashr_const_amount_4(i32 %a) {
; CHECK-LABEL: 'test_ashr_const_amount_4'
; CHECK-DAG: DemandedBits: 0xff for %ashr = ashr i32 %a, 4
; CHECK-DAG: DemandedBits: 0xff0 for %a in %ashr = ashr i32 %a, 4
; CHECK-DAG: DemandedBits: 0xffffffff for 4 in %ashr = ashr i32 %a, 4
; CHECK-DAG: DemandedBits: 0xff for %ashr.t = trunc i32 %ashr to i8
; CHECK-DAG: DemandedBits: 0xff for %ashr in %ashr.t = trunc i32 %ashr to i8
;
%ashr = ashr i32 %a, 4
%ashr.t = trunc i32 %ashr to i8
ret i8 %ashr.t
}

define i8 @test_ashr_const_amount_5(i32 %a) {
; CHECK-LABEL: 'test_ashr_const_amount_5'
; CHECK-DAG: DemandedBits: 0xff for %ashr = ashr i32 %a, 5
; CHECK-DAG: DemandedBits: 0x1fe0 for %a in %ashr = ashr i32 %a, 5
; CHECK-DAG: DemandedBits: 0xffffffff for 5 in %ashr = ashr i32 %a, 5
; CHECK-DAG: DemandedBits: 0xff for %ashr.t = trunc i32 %ashr to i8
; CHECK-DAG: DemandedBits: 0xff for %ashr in %ashr.t = trunc i32 %ashr to i8
;
%ashr = ashr i32 %a, 5
%ashr.t = trunc i32 %ashr to i8
ret i8 %ashr.t
}

define i8 @test_ashr_const_amount_8(i32 %a) {
; CHECK-LABEL: 'test_ashr_const_amount_8'
; CHECK-DAG: DemandedBits: 0xff for %ashr = ashr i32 %a, 8
; CHECK-DAG: DemandedBits: 0xff00 for %a in %ashr = ashr i32 %a, 8
; CHECK-DAG: DemandedBits: 0xffffffff for 8 in %ashr = ashr i32 %a, 8
; CHECK-DAG: DemandedBits: 0xff for %ashr.t = trunc i32 %ashr to i8
; CHECK-DAG: DemandedBits: 0xff for %ashr in %ashr.t = trunc i32 %ashr to i8
;
%ashr = ashr i32 %a, 8
%ashr.t = trunc i32 %ashr to i8
ret i8 %ashr.t
}

define i8 @test_ashr_const_amount_9(i32 %a) {

; CHECK-LABEL: 'test_ashr_const_amount_9'
; CHECK-DAG: DemandedBits: 0xff for %ashr.t = trunc i32 %ashr to i8
; CHECK-DAG: DemandedBits: 0xff for %ashr in %ashr.t = trunc i32 %ashr to i8
; CHECK-DAG: DemandedBits: 0xff for %ashr = ashr i32 %a, 8
; CHECK-DAG: DemandedBits: 0xff00 for %a in %ashr = ashr i32 %a, 8
; CHECK-DAG: DemandedBits: 0xffffffff for 8 in %ashr = ashr i32 %a, 8
;
%ashr = ashr i32 %a, 8
%ashr.t = trunc i32 %ashr to i8
ret i8 %ashr.t
}

define i8 @test_ashr(i32 %a, i32 %b) {
; CHECK-LABEL: 'test_ashr'
; CHECK-DAG: DemandedBits: 0xff for %ashr = ashr i32 %a, %b
; CHECK-DAG: DemandedBits: 0xffffffff for %a in %ashr = ashr i32 %a, %b
; CHECK-DAG: DemandedBits: 0xffffffff for %b in %ashr = ashr i32 %a, %b
; CHECK-DAG: DemandedBits: 0xff for %ashr.t = trunc i32 %ashr to i8
; CHECK-DAG: DemandedBits: 0xff for %ashr in %ashr.t = trunc i32 %ashr to i8
;
%ashr = ashr i32 %a, %b
%ashr.t = trunc i32 %ashr to i8
ret i8 %ashr.t
}

define i8 @test_ashr_range_1(i32 %a, i32 %b) {
; CHECK-LABEL: 'test_ashr_range_1'
; CHECK-DAG: DemandedBits: 0xff for %shl.t = trunc i32 %ashr to i8
; CHECK-DAG: DemandedBits: 0xff for %ashr in %shl.t = trunc i32 %ashr to i8
; CHECK-DAG: DemandedBits: 0xffffffff for %b2 = and i32 %b, 3
; CHECK-DAG: DemandedBits: 0x3 for %b in %b2 = and i32 %b, 3
; CHECK-DAG: DemandedBits: 0xffffffff for 3 in %b2 = and i32 %b, 3
; CHECK-DAG: DemandedBits: 0xff for %ashr = ashr i32 %a, %b2
; CHECK-DAG: DemandedBits: 0x7ff for %a in %ashr = ashr i32 %a, %b2
; CHECK-DAG: DemandedBits: 0xffffffff for %b2 in %ashr = ashr i32 %a, %b2
;
%b2 = and i32 %b, 3
%ashr = ashr i32 %a, %b2
%shl.t = trunc i32 %ashr to i8
ret i8 %shl.t
}

define i32 @test_ashr_range_2(i32 %a, i32 %b) {
; CHECK-LABEL: 'test_ashr_range_2'
; CHECK-DAG: DemandedBits: 0xffffffff for %b2 = and i32 %b, 3
; CHECK-DAG: DemandedBits: 0x3 for %b in %b2 = and i32 %b, 3
; CHECK-DAG: DemandedBits: 0xffffffff for 3 in %b2 = and i32 %b, 3
; CHECK-DAG: DemandedBits: 0xffffffff for %ashr = ashr i32 %a, %b2
; CHECK-DAG: DemandedBits: 0xffffffff for %a in %ashr = ashr i32 %a, %b2
; CHECK-DAG: DemandedBits: 0xffffffff for %b2 in %ashr = ashr i32 %a, %b2
;
%b2 = and i32 %b, 3
%ashr = ashr i32 %a, %b2
ret i32 %ashr
}

define i32 @test_ashr_range_3(i32 %a, i32 %b) {
; CHECK-LABEL: 'test_ashr_range_3'
; CHECK-DAG: DemandedBits: 0xffff for %ashr = ashr i32 %a, %b
; CHECK-DAG: DemandedBits: 0xffffffff for %a in %ashr = ashr i32 %a, %b
; CHECK-DAG: DemandedBits: 0xffffffff for %b in %ashr = ashr i32 %a, %b
; CHECK-DAG: DemandedBits: 0xffffffff for %shl = shl i32 %ashr, 16
; CHECK-DAG: DemandedBits: 0xffff for %ashr in %shl = shl i32 %ashr, 16
; CHECK-DAG: DemandedBits: 0xffffffff for 16 in %shl = shl i32 %ashr, 16
;
%ashr = ashr i32 %a, %b
%shl = shl i32 %ashr, 16
ret i32 %shl
}
define i32 @test_ashr_range_4(i32 %a, i32 %b) {
; CHECK-LABEL: 'test_ashr_range_4'
; CHECK-DAG: DemandedBits: 0xffffffff for %shr = lshr i32 %ashr, 8
; CHECK-DAG: DemandedBits: 0xffffff00 for %ashr in %shr = lshr i32 %ashr, 8
; CHECK-DAG: DemandedBits: 0xffffffff for 8 in %shr = lshr i32 %ashr, 8
; CHECK-DAG: DemandedBits: 0xffffff00 for %ashr = ashr i32 %a, %b
; CHECK-DAG: DemandedBits: 0xffffff00 for %a in %ashr = ashr i32 %a, %b
; CHECK-DAG: DemandedBits: 0xffffffff for %b in %ashr = ashr i32 %a, %b
%ashr = ashr i32 %a, %b
%shr = lshr i32 %ashr, 8
ret i32 %shr
}

define i32 @test_ashr_range_5(i32 %a, i32 %b) {
; CHECK-LABEL: 'test_ashr_range_5'
; CHECK-DAG: DemandedBits: 0xffffffff for %2 = and i32 %1, 255
; CHECK-DAG: DemandedBits: 0xff for %1 in %2 = and i32 %1, 255
; CHECK-DAG: DemandedBits: 0xffffffff for 255 in %2 = and i32 %1, 255
; CHECK-DAG: DemandedBits: 0xff for %1 = ashr i32 %a, %b
; CHECK-DAG: DemandedBits: 0xffffffff for %a in %1 = ashr i32 %a, %b
; CHECK-DAG: DemandedBits: 0xffffffff for %b in %1 = ashr i32 %a, %b
;
%1 = ashr i32 %a, %b
%2 = and i32 %1, 255
ret i32 %2
}

define i32 @test_ashr_range_6(i32 %a, i32 %b) {
; CHECK-LABEL: 'test_ashr_range_6'
; CHECK-DAG: DemandedBits: 0xffff0000 for %lshr.1 = ashr i32 %a, %b
; CHECK-DAG: DemandedBits: 0xffff0000 for %a in %lshr.1 = ashr i32 %a, %b
; CHECK-DAG: DemandedBits: 0xffffffff for %b in %lshr.1 = ashr i32 %a, %b
; CHECK-DAG: DemandedBits: 0xffffffff for %lshr.2 = ashr i32 %lshr.1, 16
; CHECK-DAG: DemandedBits: 0xffff0000 for %lshr.1 in %lshr.2 = ashr i32 %lshr.1, 16
; CHECK-DAG: DemandedBits: 0xffffffff for 16 in %lshr.2 = ashr i32 %lshr.1, 16
;
%lshr.1 = ashr i32 %a, %b
%lshr.2 = ashr i32 %lshr.1, 16
ret i32 %lshr.2
}

define i8 @test_ashr_var_amount(i32 %a, i32 %b){
; CHECK-LABEL: 'test_ashr_var_amount'
; CHECK-DAG: DemandedBits: 0xff for %4 = ashr i32 %1, %3
; CHECK-DAG: DemandedBits: 0xffffffff for %1 in %4 = ashr i32 %1, %3
; CHECK-DAG: DemandedBits: 0xffffffff for %3 in %4 = ashr i32 %1, %3
; CHECK-DAG: DemandedBits: 0xff for %2 = trunc i32 %1 to i8
; CHECK-DAG: DemandedBits: 0xff for %1 in %2 = trunc i32 %1 to i8
; CHECK-DAG: DemandedBits: 0xffffffff for %1 = add nsw i32 %a, %b
; CHECK-DAG: DemandedBits: 0xffffffff for %a in %1 = add nsw i32 %a, %b
; CHECK-DAG: DemandedBits: 0xffffffff for %b in %1 = add nsw i32 %a, %b
; CHECK-DAG: DemandedBits: 0xffffffff for %3 = zext i8 %2 to i32
; CHECK-DAG: DemandedBits: 0xff for %2 in %3 = zext i8 %2 to i32
; CHECK-DAG: DemandedBits: 0xff for %5 = trunc i32 %4 to i8
; CHECK-DAG: DemandedBits: 0xff for %4 in %5 = trunc i32 %4 to i8
;
%1 = add nsw i32 %a, %b
%2 = trunc i32 %1 to i8
%3 = zext i8 %2 to i32
%4 = ashr i32 %1, %3
%5 = trunc i32 %4 to i8
ret i8 %5
}

define i8 @test_ashr_var_amount_nsw(i32 %a, i32 %b){
; CHECK-LABEL 'test_ashr_var_amount_nsw'
; CHECK-DAG: DemandedBits: 0xff for %5 = trunc i32 %4 to i8
; CHECK-DAG: DemandedBits: 0xff for %4 in %5 = trunc i32 %4 to i8
; CHECK-DAG: DemandedBits: 0xffffffff for %1 = add nsw i32 %a, %b
; CHECK-DAG: DemandedBits: 0xffffffff for %a in %1 = add nsw i32 %a, %b
; CHECK-DAG: DemandedBits: 0xffffffff for %b in %1 = add nsw i32 %a, %b
; CHECK-DAG: DemandedBits: 0xff for %2 = trunc i32 %1 to i8
; CHECK-DAG: DemandedBits: 0xff for %1 in %2 = trunc i32 %1 to i8
; CHECK-DAG: DemandedBits: 0xffffffff for %3 = zext i8 %2 to i32
; CHECK-DAG: DemandedBits: 0xff for %2 in %3 = zext i8 %2 to i32
; CHECK-DAG: DemandedBits: 0xff for %4 = ashr exact i32 %1, %3
; CHECK-DAG: DemandedBits: 0xffffffff for %1 in %4 = ashr exact i32 %1, %3
; CHECK-DAG: DemandedBits: 0xffffffff for %3 in %4 = ashr exact i32 %1, %3
;
%1 = add nsw i32 %a, %b
%2 = trunc i32 %1 to i8
%3 = zext i8 %2 to i32
%4 = ashr exact i32 %1, %3
%5 = trunc i32 %4 to i8
ret i8 %5
}
Loading