-
Notifications
You must be signed in to change notification settings - Fork 14.7k
[DemandedBits] Support non-constant shift amounts #148880
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
b802455
289fb1c
3a4d65e
4fbabd6
23cea68
0ee0e03
bbb1bd8
c102f64
bde2923
27e434a
e9288bd
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -76,6 +76,26 @@ void DemandedBits::determineLiveOperandBits( | |
computeKnownBits(V2, Known2, DL, &AC, UserI, &DT); | ||
} | ||
}; | ||
auto GetShiftedRange = [&](uint64_t Min, uint64_t Max, bool ShiftLeft) { | ||
auto ShiftF = [ShiftLeft](const APInt &Mask, unsigned ShiftAmnt) { | ||
return ShiftLeft ? Mask.shl(ShiftAmnt) : Mask.lshr(ShiftAmnt); | ||
}; | ||
AB = APInt::getZero(BitWidth); | ||
uint64_t LoopRange = Max - Min; | ||
APInt Mask = AOut; | ||
APInt Shifted = AOut; // AOut | (AOut << 1) | ... | (AOut << (ShiftAmnt - 1) | ||
for (unsigned ShiftAmnt = 1; ShiftAmnt <= LoopRange; ShiftAmnt <<= 1) { | ||
if (LoopRange & ShiftAmnt) { | ||
// Account for (LoopRange - ShiftAmnt, LoopRange] | ||
Mask |= ShiftF(Shifted, LoopRange - ShiftAmnt + 1); | ||
// Clears the low bit. | ||
LoopRange -= ShiftAmnt; | ||
} | ||
// [0, ShiftAmnt) -> [0, ShiftAmnt * 2) | ||
Shifted |= ShiftF(Shifted, ShiftAmnt); | ||
} | ||
AB = ShiftF(Mask, Min); | ||
}; | ||
|
||
switch (UserI->getOpcode()) { | ||
default: break; | ||
|
@@ -183,6 +203,17 @@ void DemandedBits::determineLiveOperandBits( | |
AB |= APInt::getHighBitsSet(BitWidth, ShiftAmt+1); | ||
else if (S->hasNoUnsignedWrap()) | ||
AB |= APInt::getHighBitsSet(BitWidth, ShiftAmt); | ||
} else { | ||
ComputeKnownBits(BitWidth, UserI->getOperand(1), nullptr); | ||
dtcxzyw marked this conversation as resolved.
Show resolved
Hide resolved
|
||
uint64_t Min = Known.getMinValue().getLimitedValue(BitWidth - 1); | ||
uint64_t Max = Known.getMaxValue().getLimitedValue(BitWidth - 1); | ||
// similar to Lshr case | ||
GetShiftedRange(Min, Max, /*ShiftLeft=*/false); | ||
const auto *S = cast<ShlOperator>(UserI); | ||
if (S->hasNoSignedWrap()) | ||
AB |= APInt::getHighBitsSet(BitWidth, Max + 1); | ||
else if (S->hasNoUnsignedWrap()) | ||
AB |= APInt::getHighBitsSet(BitWidth, Max); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How much of a real world difference is there between what this patch does and
That doesn't require calling computeKnownBits on the shift amount and is consistent with what InstCombineSimplifyDemanded does. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We need to check on the big benchmarks that @dtcxzyw already ran, because BDCE has very few triggers in the llvm test suite. Should I create a new PR with that approach? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Additionally, I added similar logic from here to InstCombineSimplifyDemanded and observed an improvement in code size of up to 1.0% with the O2 pipeline. This is a work in progress, though, that's why there isn't a pull request yet. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Of course. You don't need to create a new PR, just send me your commit hash. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Before: dtcxzyw/llvm-opt-benchmark#2638 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @topperc what do you think? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We can go ahead with this patch. Thanks for getting the data. |
||
} | ||
} | ||
break; | ||
|
@@ -197,6 +228,24 @@ void DemandedBits::determineLiveOperandBits( | |
// (they must be zero). | ||
if (cast<LShrOperator>(UserI)->isExact()) | ||
AB |= APInt::getLowBitsSet(BitWidth, ShiftAmt); | ||
} else { | ||
ComputeKnownBits(BitWidth, UserI->getOperand(1), nullptr); | ||
uint64_t Min = Known.getMinValue().getLimitedValue(BitWidth - 1); | ||
uint64_t Max = Known.getMaxValue().getLimitedValue(BitWidth - 1); | ||
// Suppose AOut == 0b0000 0001 | ||
// [min, max] = [1, 3] | ||
// iteration 1 shift by 1 mask is 0b0000 0011 | ||
// iteration 2 shift by 2 mask is 0b0000 1111 | ||
// iteration 3, shiftAmnt = 4 > max - min, we stop. | ||
// | ||
// After the iterations we need one more shift by min, | ||
// to move from 0b0000 1111 to --> 0b0001 1110. | ||
// The loop populates the mask relative to (0,...,max-min), | ||
// but we need coverage from (min, max). | ||
// This is why the shift by min is needed. | ||
GetShiftedRange(Min, Max, /*ShiftLeft=*/true); | ||
if (cast<LShrOperator>(UserI)->isExact()) | ||
AB |= APInt::getLowBitsSet(BitWidth, Max); | ||
} | ||
} | ||
break; | ||
|
@@ -217,6 +266,26 @@ void DemandedBits::determineLiveOperandBits( | |
// (they must be zero). | ||
if (cast<AShrOperator>(UserI)->isExact()) | ||
AB |= APInt::getLowBitsSet(BitWidth, ShiftAmt); | ||
} else { | ||
ComputeKnownBits(BitWidth, UserI->getOperand(1), nullptr); | ||
uint64_t Min = Known.getMinValue().getLimitedValue(BitWidth - 1); | ||
uint64_t Max = Known.getMaxValue().getLimitedValue(BitWidth - 1); | ||
GetShiftedRange(Min, Max, /*ShiftLeft=*/true); | ||
if (Max && | ||
(AOut & APInt::getHighBitsSet(BitWidth, Max)).getBoolValue()) { | ||
// Suppose AOut = 0011 1100 | ||
// [min, max] = [1, 3] | ||
// ShiftAmount = 1 : Mask is 1000 0000 | ||
// ShiftAmount = 2 : Mask is 1100 0000 | ||
// ShiftAmount = 3 : Mask is 1110 0000 | ||
// The Mask with Max covers every case in [min, max], | ||
// so we are done | ||
AB.setSignBit(); | ||
} | ||
// If the shift is exact, then the low bits are not dead | ||
// (they must be zero). | ||
if (cast<AShrOperator>(UserI)->isExact()) | ||
AB |= APInt::getLowBitsSet(BitWidth, Max); | ||
} | ||
} | ||
break; | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,198 @@ | ||
; RUN: opt -S -disable-output -passes="print<demanded-bits>" < %s 2>&1 | FileCheck %s | ||
|
||
define i8 @test_ashr_const_amount_4(i32 %a) { | ||
; CHECK-LABEL: 'test_ashr_const_amount_4' | ||
; CHECK-DAG: DemandedBits: 0xff for %ashr = ashr i32 %a, 4 | ||
; CHECK-DAG: DemandedBits: 0xff0 for %a in %ashr = ashr i32 %a, 4 | ||
; CHECK-DAG: DemandedBits: 0xffffffff for 4 in %ashr = ashr i32 %a, 4 | ||
; CHECK-DAG: DemandedBits: 0xff for %ashr.t = trunc i32 %ashr to i8 | ||
; CHECK-DAG: DemandedBits: 0xff for %ashr in %ashr.t = trunc i32 %ashr to i8 | ||
; | ||
%ashr = ashr i32 %a, 4 | ||
%ashr.t = trunc i32 %ashr to i8 | ||
ret i8 %ashr.t | ||
} | ||
|
||
define i8 @test_ashr_const_amount_5(i32 %a) { | ||
; CHECK-LABEL: 'test_ashr_const_amount_5' | ||
; CHECK-DAG: DemandedBits: 0xff for %ashr = ashr i32 %a, 5 | ||
; CHECK-DAG: DemandedBits: 0x1fe0 for %a in %ashr = ashr i32 %a, 5 | ||
; CHECK-DAG: DemandedBits: 0xffffffff for 5 in %ashr = ashr i32 %a, 5 | ||
; CHECK-DAG: DemandedBits: 0xff for %ashr.t = trunc i32 %ashr to i8 | ||
; CHECK-DAG: DemandedBits: 0xff for %ashr in %ashr.t = trunc i32 %ashr to i8 | ||
; | ||
%ashr = ashr i32 %a, 5 | ||
%ashr.t = trunc i32 %ashr to i8 | ||
ret i8 %ashr.t | ||
} | ||
|
||
define i8 @test_ashr_const_amount_8(i32 %a) { | ||
; CHECK-LABEL: 'test_ashr_const_amount_8' | ||
; CHECK-DAG: DemandedBits: 0xff for %ashr = ashr i32 %a, 8 | ||
; CHECK-DAG: DemandedBits: 0xff00 for %a in %ashr = ashr i32 %a, 8 | ||
; CHECK-DAG: DemandedBits: 0xffffffff for 8 in %ashr = ashr i32 %a, 8 | ||
; CHECK-DAG: DemandedBits: 0xff for %ashr.t = trunc i32 %ashr to i8 | ||
; CHECK-DAG: DemandedBits: 0xff for %ashr in %ashr.t = trunc i32 %ashr to i8 | ||
; | ||
%ashr = ashr i32 %a, 8 | ||
%ashr.t = trunc i32 %ashr to i8 | ||
ret i8 %ashr.t | ||
} | ||
|
||
define i8 @test_ashr_const_amount_9(i32 %a) { | ||
|
||
; CHECK-LABEL: 'test_ashr_const_amount_9' | ||
; CHECK-DAG: DemandedBits: 0xff for %ashr.t = trunc i32 %ashr to i8 | ||
; CHECK-DAG: DemandedBits: 0xff for %ashr in %ashr.t = trunc i32 %ashr to i8 | ||
; CHECK-DAG: DemandedBits: 0xff for %ashr = ashr i32 %a, 8 | ||
; CHECK-DAG: DemandedBits: 0xff00 for %a in %ashr = ashr i32 %a, 8 | ||
; CHECK-DAG: DemandedBits: 0xffffffff for 8 in %ashr = ashr i32 %a, 8 | ||
; | ||
%ashr = ashr i32 %a, 8 | ||
%ashr.t = trunc i32 %ashr to i8 | ||
ret i8 %ashr.t | ||
} | ||
|
||
define i8 @test_ashr(i32 %a, i32 %b) { | ||
; CHECK-LABEL: 'test_ashr' | ||
; CHECK-DAG: DemandedBits: 0xff for %ashr = ashr i32 %a, %b | ||
; CHECK-DAG: DemandedBits: 0xffffffff for %a in %ashr = ashr i32 %a, %b | ||
; CHECK-DAG: DemandedBits: 0xffffffff for %b in %ashr = ashr i32 %a, %b | ||
; CHECK-DAG: DemandedBits: 0xff for %ashr.t = trunc i32 %ashr to i8 | ||
; CHECK-DAG: DemandedBits: 0xff for %ashr in %ashr.t = trunc i32 %ashr to i8 | ||
; | ||
%ashr = ashr i32 %a, %b | ||
%ashr.t = trunc i32 %ashr to i8 | ||
ret i8 %ashr.t | ||
} | ||
|
||
define i8 @test_ashr_range_1(i32 %a, i32 %b) { | ||
; CHECK-LABEL: 'test_ashr_range_1' | ||
; CHECK-DAG: DemandedBits: 0xff for %shl.t = trunc i32 %ashr to i8 | ||
; CHECK-DAG: DemandedBits: 0xff for %ashr in %shl.t = trunc i32 %ashr to i8 | ||
; CHECK-DAG: DemandedBits: 0xffffffff for %b2 = and i32 %b, 3 | ||
; CHECK-DAG: DemandedBits: 0x3 for %b in %b2 = and i32 %b, 3 | ||
; CHECK-DAG: DemandedBits: 0xffffffff for 3 in %b2 = and i32 %b, 3 | ||
; CHECK-DAG: DemandedBits: 0xff for %ashr = ashr i32 %a, %b2 | ||
; CHECK-DAG: DemandedBits: 0x7ff for %a in %ashr = ashr i32 %a, %b2 | ||
; CHECK-DAG: DemandedBits: 0xffffffff for %b2 in %ashr = ashr i32 %a, %b2 | ||
; | ||
%b2 = and i32 %b, 3 | ||
%ashr = ashr i32 %a, %b2 | ||
%shl.t = trunc i32 %ashr to i8 | ||
ret i8 %shl.t | ||
} | ||
|
||
define i32 @test_ashr_range_2(i32 %a, i32 %b) { | ||
; CHECK-LABEL: 'test_ashr_range_2' | ||
; CHECK-DAG: DemandedBits: 0xffffffff for %b2 = and i32 %b, 3 | ||
; CHECK-DAG: DemandedBits: 0x3 for %b in %b2 = and i32 %b, 3 | ||
; CHECK-DAG: DemandedBits: 0xffffffff for 3 in %b2 = and i32 %b, 3 | ||
; CHECK-DAG: DemandedBits: 0xffffffff for %ashr = ashr i32 %a, %b2 | ||
; CHECK-DAG: DemandedBits: 0xffffffff for %a in %ashr = ashr i32 %a, %b2 | ||
; CHECK-DAG: DemandedBits: 0xffffffff for %b2 in %ashr = ashr i32 %a, %b2 | ||
; | ||
%b2 = and i32 %b, 3 | ||
%ashr = ashr i32 %a, %b2 | ||
ret i32 %ashr | ||
} | ||
|
||
define i32 @test_ashr_range_3(i32 %a, i32 %b) { | ||
; CHECK-LABEL: 'test_ashr_range_3' | ||
; CHECK-DAG: DemandedBits: 0xffff for %ashr = ashr i32 %a, %b | ||
; CHECK-DAG: DemandedBits: 0xffffffff for %a in %ashr = ashr i32 %a, %b | ||
; CHECK-DAG: DemandedBits: 0xffffffff for %b in %ashr = ashr i32 %a, %b | ||
; CHECK-DAG: DemandedBits: 0xffffffff for %shl = shl i32 %ashr, 16 | ||
; CHECK-DAG: DemandedBits: 0xffff for %ashr in %shl = shl i32 %ashr, 16 | ||
; CHECK-DAG: DemandedBits: 0xffffffff for 16 in %shl = shl i32 %ashr, 16 | ||
; | ||
%ashr = ashr i32 %a, %b | ||
%shl = shl i32 %ashr, 16 | ||
ret i32 %shl | ||
} | ||
define i32 @test_ashr_range_4(i32 %a, i32 %b) { | ||
; CHECK-LABEL: 'test_ashr_range_4' | ||
; CHECK-DAG: DemandedBits: 0xffffffff for %shr = lshr i32 %ashr, 8 | ||
; CHECK-DAG: DemandedBits: 0xffffff00 for %ashr in %shr = lshr i32 %ashr, 8 | ||
; CHECK-DAG: DemandedBits: 0xffffffff for 8 in %shr = lshr i32 %ashr, 8 | ||
; CHECK-DAG: DemandedBits: 0xffffff00 for %ashr = ashr i32 %a, %b | ||
; CHECK-DAG: DemandedBits: 0xffffff00 for %a in %ashr = ashr i32 %a, %b | ||
; CHECK-DAG: DemandedBits: 0xffffffff for %b in %ashr = ashr i32 %a, %b | ||
%ashr = ashr i32 %a, %b | ||
%shr = lshr i32 %ashr, 8 | ||
ret i32 %shr | ||
} | ||
|
||
define i32 @test_ashr_range_5(i32 %a, i32 %b) { | ||
; CHECK-LABEL: 'test_ashr_range_5' | ||
; CHECK-DAG: DemandedBits: 0xffffffff for %2 = and i32 %1, 255 | ||
; CHECK-DAG: DemandedBits: 0xff for %1 in %2 = and i32 %1, 255 | ||
; CHECK-DAG: DemandedBits: 0xffffffff for 255 in %2 = and i32 %1, 255 | ||
; CHECK-DAG: DemandedBits: 0xff for %1 = ashr i32 %a, %b | ||
; CHECK-DAG: DemandedBits: 0xffffffff for %a in %1 = ashr i32 %a, %b | ||
; CHECK-DAG: DemandedBits: 0xffffffff for %b in %1 = ashr i32 %a, %b | ||
; | ||
%1 = ashr i32 %a, %b | ||
%2 = and i32 %1, 255 | ||
ret i32 %2 | ||
} | ||
|
||
define i32 @test_ashr_range_6(i32 %a, i32 %b) { | ||
; CHECK-LABEL: 'test_ashr_range_6' | ||
; CHECK-DAG: DemandedBits: 0xffff0000 for %lshr.1 = ashr i32 %a, %b | ||
; CHECK-DAG: DemandedBits: 0xffff0000 for %a in %lshr.1 = ashr i32 %a, %b | ||
; CHECK-DAG: DemandedBits: 0xffffffff for %b in %lshr.1 = ashr i32 %a, %b | ||
; CHECK-DAG: DemandedBits: 0xffffffff for %lshr.2 = ashr i32 %lshr.1, 16 | ||
; CHECK-DAG: DemandedBits: 0xffff0000 for %lshr.1 in %lshr.2 = ashr i32 %lshr.1, 16 | ||
; CHECK-DAG: DemandedBits: 0xffffffff for 16 in %lshr.2 = ashr i32 %lshr.1, 16 | ||
; | ||
%lshr.1 = ashr i32 %a, %b | ||
%lshr.2 = ashr i32 %lshr.1, 16 | ||
ret i32 %lshr.2 | ||
} | ||
|
||
define i8 @test_ashr_var_amount(i32 %a, i32 %b){ | ||
; CHECK-LABEL: 'test_ashr_var_amount' | ||
; CHECK-DAG: DemandedBits: 0xff for %4 = ashr i32 %1, %3 | ||
; CHECK-DAG: DemandedBits: 0xffffffff for %1 in %4 = ashr i32 %1, %3 | ||
; CHECK-DAG: DemandedBits: 0xffffffff for %3 in %4 = ashr i32 %1, %3 | ||
; CHECK-DAG: DemandedBits: 0xff for %2 = trunc i32 %1 to i8 | ||
; CHECK-DAG: DemandedBits: 0xff for %1 in %2 = trunc i32 %1 to i8 | ||
; CHECK-DAG: DemandedBits: 0xffffffff for %1 = add nsw i32 %a, %b | ||
; CHECK-DAG: DemandedBits: 0xffffffff for %a in %1 = add nsw i32 %a, %b | ||
; CHECK-DAG: DemandedBits: 0xffffffff for %b in %1 = add nsw i32 %a, %b | ||
; CHECK-DAG: DemandedBits: 0xffffffff for %3 = zext i8 %2 to i32 | ||
; CHECK-DAG: DemandedBits: 0xff for %2 in %3 = zext i8 %2 to i32 | ||
; CHECK-DAG: DemandedBits: 0xff for %5 = trunc i32 %4 to i8 | ||
; CHECK-DAG: DemandedBits: 0xff for %4 in %5 = trunc i32 %4 to i8 | ||
; | ||
%1 = add nsw i32 %a, %b | ||
%2 = trunc i32 %1 to i8 | ||
%3 = zext i8 %2 to i32 | ||
%4 = ashr i32 %1, %3 | ||
%5 = trunc i32 %4 to i8 | ||
ret i8 %5 | ||
} | ||
|
||
define i8 @test_ashr_var_amount_nsw(i32 %a, i32 %b){ | ||
; CHECK-LABEL 'test_ashr_var_amount_nsw' | ||
; CHECK-DAG: DemandedBits: 0xff for %5 = trunc i32 %4 to i8 | ||
; CHECK-DAG: DemandedBits: 0xff for %4 in %5 = trunc i32 %4 to i8 | ||
; CHECK-DAG: DemandedBits: 0xffffffff for %1 = add nsw i32 %a, %b | ||
; CHECK-DAG: DemandedBits: 0xffffffff for %a in %1 = add nsw i32 %a, %b | ||
; CHECK-DAG: DemandedBits: 0xffffffff for %b in %1 = add nsw i32 %a, %b | ||
; CHECK-DAG: DemandedBits: 0xff for %2 = trunc i32 %1 to i8 | ||
; CHECK-DAG: DemandedBits: 0xff for %1 in %2 = trunc i32 %1 to i8 | ||
; CHECK-DAG: DemandedBits: 0xffffffff for %3 = zext i8 %2 to i32 | ||
; CHECK-DAG: DemandedBits: 0xff for %2 in %3 = zext i8 %2 to i32 | ||
; CHECK-DAG: DemandedBits: 0xff for %4 = ashr exact i32 %1, %3 | ||
; CHECK-DAG: DemandedBits: 0xffffffff for %1 in %4 = ashr exact i32 %1, %3 | ||
; CHECK-DAG: DemandedBits: 0xffffffff for %3 in %4 = ashr exact i32 %1, %3 | ||
; | ||
%1 = add nsw i32 %a, %b | ||
%2 = trunc i32 %1 to i8 | ||
%3 = zext i8 %2 to i32 | ||
%4 = ashr exact i32 %1, %3 | ||
%5 = trunc i32 %4 to i8 | ||
ret i8 %5 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not for this patch, but what if we know the shift amount is even or odd? We would want to use every other shift amount in the range.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you mean if the LoopRange is even or odd? I think the shift amount is always even.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean the shift amount from the instruction itself. computeKnownBits may tell us that the lsb(s) of the shift amount are known to be zero or one. This loop is trying to evaluate all possible shift amounts, but we don't exclude cases that would be impossible based on known values of the lsb(s).
Something like this does exist in the code from KnownBits::shl.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the ShiftAmount of the Shift Inst isn't known,
Then what Min and Max would be?
Min = 0
?Max = BitWidth - 1
?unsigned Min = Known.getMinValue().getLimitedValue(BitWidth - 1); unsigned Max = Known.getMaxValue().getLimitedValue(BitWidth - 1);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enumerating all possible shift amounts may be expensive (see my previous comment #148880 (comment)). But I am fine to reuse the KnownBits' logic.
~KnownBits::shl/lshr/ashr(AOut, ShAmt).Zeros
is exactly what we want.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dtcxzyw The loop is optimized.
for a 128 bit value we only need 2 * 8 + 1 = 17 shifts and 17 ors.
KnownBits::shl/lshr/ashr does an intersection; we don't do an intersection, we do a union. I am not certain that we can employ the same approach as they do since we use a different loop. In their loop, they increment by 1
for (unsigned ShiftAmt = MinShiftAmount; ShiftAmt <= MaxShiftAmount; ++ShiftAmt) {
Instead, we increment by the next power of two
for (unsigned ShiftAmnt = 1; ShiftAmnt <= LoopRange; ShiftAmnt <<= 1) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
~Intersection.Zeros
is a union.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant this
Known = Known.intersectWith(ShiftByConst(LHS, ShiftAmt));
Anyway, since we are using APInts if we switch the loop to do increments by 1, even if we employ the same strategy as the KnownBits technique the time complexity is O(n^2) instead of the O(nlogn) we have now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@topperc what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t think we should handle the lsb case in this patch. So we can keep the optimized loop for now and change it if/when we implement the lsb case which will require additional benchmark evaluation.