Commit be2ad30

[LoopPeel] Fix branch weights
For example, `llvm/test/Transforms/LoopUnroll/peel-branch-weights.ll` tests the following LLVM IR:

```
define void @test() {
entry:
  br label %loop

loop:
  %x = call i32 @get.x()
  switch i32 %x, label %loop.latch [
    i32 0, label %loop.latch
    i32 1, label %loop.exit
    i32 2, label %loop.exit
  ], !prof !0

loop.latch:
  br label %loop

loop.exit:
  ret void
}

!0 = !{!"branch_weights", i32 100, i32 200, i32 20, i32 10}
```

Given those branch weights, once any loop iteration is actually reached, the probability of the loop exiting at the iteration's end is (20+10)/(100+200+20+10) = 1/11. That is, each time the loop is entered, its body is expected to execute 11 times on average. `opt -passes='print<block-freq>'` shows that 11 is indeed the frequency of the loop body:

```
block-frequency-info: test
 - entry: float = 1.0, int = 1637672590245888
 - loop: float = 11.0, int = 18014398509481984
 - loop.latch: float = 10.0, int = 16376725919236096
 - loop.exit: float = 1.0, int = 1637672590245888
```

Key Observation: The frequency of reaching any particular iteration is logically less than for the previous iteration exactly because the previous iteration has a non-zero probability of exiting the loop. This observation holds even though every loop iteration, once actually reached, has exactly the same probability of exiting and exactly the same branch weights.

After peeling 2 iterations as in the test, we expect those observations not to change, but without this patch they do. The block frequency becomes 1.0 for the first peeled iteration, 0.90909 for the second, and 7.3636 for the main loop body. Again, a decreasing frequency is expected, but it decreases too much: the total frequency of the original loop body becomes 9.2727 instead of 11.
The branch weights that peeling produces reveal the problem:

```
!0 = !{!"branch_weights", i32 100, i32 200, i32 20, i32 10}
!1 = !{!"branch_weights", i32 90, i32 180, i32 20, i32 10}
!2 = !{!"branch_weights", i32 80, i32 160, i32 20, i32 10}
```

The exit probability is now 1/11 for the first peeled iteration (`!0`), 1/10 for the second (`!1`), and 1/9 for the remaining loop iterations (`!2`). Based on comments in `LoopPeel.cpp`, this behavior seems to have been intended to ensure a decreasing frequency. However, as explained above for the original loop, the frequency already decreases correctly without decreasing the branch weights across iterations.

This patch changes the peeling implementation so that it no longer decreases the branch weights across loop iterations, keeping the exit probability of every iteration the same as in the original loop. The total frequency of the loop body, summed across all its copies, thus remains 11 after peeling.
1 parent 0be3f13 commit be2ad30

2 files changed: +47 -122 lines

llvm/lib/Transforms/Utils/LoopPeel.cpp

Lines changed: 14 additions & 91 deletions
@@ -657,84 +657,6 @@ void llvm::computePeelCount(Loop *L, unsigned LoopSize,
   }
 }
 
-struct WeightInfo {
-  // Weights for current iteration.
-  SmallVector<uint32_t> Weights;
-  // Weights to subtract after each iteration.
-  const SmallVector<uint32_t> SubWeights;
-};
-
-/// Update the branch weights of an exiting block of a peeled-off loop
-/// iteration.
-/// Let F is a weight of the edge to continue (fallthrough) into the loop.
-/// Let E is a weight of the edge to an exit.
-/// F/(F+E) is a probability to go to loop and E/(F+E) is a probability to
-/// go to exit.
-/// Then, Estimated ExitCount = F / E.
-/// For I-th (counting from 0) peeled off iteration we set the weights for
-/// the peeled exit as (EC - I, 1). It gives us reasonable distribution,
-/// The probability to go to exit 1/(EC-I) increases. At the same time
-/// the estimated exit count in the remainder loop reduces by I.
-/// To avoid dealing with division rounding we can just multiple both part
-/// of weights to E and use weight as (F - I * E, E).
-static void updateBranchWeights(Instruction *Term, WeightInfo &Info) {
-  setBranchWeights(*Term, Info.Weights, /*IsExpected=*/false);
-  for (auto [Idx, SubWeight] : enumerate(Info.SubWeights))
-    if (SubWeight != 0)
-      // Don't set the probability of taking the edge from latch to loop header
-      // to less than 1:1 ratio (meaning Weight should not be lower than
-      // SubWeight), as this could significantly reduce the loop's hotness,
-      // which would be incorrect in the case of underestimating the trip count.
-      Info.Weights[Idx] =
-          Info.Weights[Idx] > SubWeight
-              ? std::max(Info.Weights[Idx] - SubWeight, SubWeight)
-              : SubWeight;
-}
-
-/// Initialize the weights for all exiting blocks.
-static void initBranchWeights(DenseMap<Instruction *, WeightInfo> &WeightInfos,
-                              Loop *L) {
-  SmallVector<BasicBlock *> ExitingBlocks;
-  L->getExitingBlocks(ExitingBlocks);
-  for (BasicBlock *ExitingBlock : ExitingBlocks) {
-    Instruction *Term = ExitingBlock->getTerminator();
-    SmallVector<uint32_t> Weights;
-    if (!extractBranchWeights(*Term, Weights))
-      continue;
-
-    // See the comment on updateBranchWeights() for an explanation of what we
-    // do here.
-    uint32_t FallThroughWeights = 0;
-    uint32_t ExitWeights = 0;
-    for (auto [Succ, Weight] : zip(successors(Term), Weights)) {
-      if (L->contains(Succ))
-        FallThroughWeights += Weight;
-      else
-        ExitWeights += Weight;
-    }
-
-    // Don't try to update weights for degenerate case.
-    if (FallThroughWeights == 0)
-      continue;
-
-    SmallVector<uint32_t> SubWeights;
-    for (auto [Succ, Weight] : zip(successors(Term), Weights)) {
-      if (!L->contains(Succ)) {
-        // Exit weights stay the same.
-        SubWeights.push_back(0);
-        continue;
-      }
-
-      // Subtract exit weights on each iteration, distributed across all
-      // fallthrough edges.
-      double W = (double)Weight / (double)FallThroughWeights;
-      SubWeights.push_back((uint32_t)(ExitWeights * W));
-    }
-
-    WeightInfos.insert({Term, {std::move(Weights), std::move(SubWeights)}});
-  }
-}
-
 /// Clones the body of the loop L, putting it between \p InsertTop and \p
 /// InsertBot.
 /// \param IterNumber The serial number of the iteration currently being
@@ -1008,11 +930,6 @@ bool llvm::peelLoop(Loop *L, unsigned PeelCount, LoopInfo *LI,
   Instruction *LatchTerm =
       cast<Instruction>(cast<BasicBlock>(Latch)->getTerminator());
 
-  // If we have branch weight information, we'll want to update it for the
-  // newly created branches.
-  DenseMap<Instruction *, WeightInfo> Weights;
-  initBranchWeights(Weights, L);
-
   // Identify what noalias metadata is inside the loop: if it is inside the
   // loop, the associated metadata must be cloned for each iteration.
   SmallVector<MDNode *, 6> LoopLocalNoAliasDeclScopes;
@@ -1040,10 +957,20 @@ bool llvm::peelLoop(Loop *L, unsigned PeelCount, LoopInfo *LI,
     assert(DT.verify(DominatorTree::VerificationLevel::Fast));
 #endif
 
-    for (auto &[Term, Info] : Weights) {
-      auto *TermCopy = cast<Instruction>(VMap[Term]);
-      updateBranchWeights(TermCopy, Info);
-    }
+    // Do not adjust the branch weights of an exiting block of a peeled-off loop
+    // iteration or of the remaining loop. Before peeling, once any iteration
+    // is actually reached, the probability of the loop exiting at the
+    // iteration's end is exactly the same across all iterations because there's
+    // only one set of branch weights for them all. Peeling does not change
+    // those probabilties, so there's no reason to adjust the branch weights.
+    //
+    // Of course, the probability of *reaching* any particular iteration is
+    // logically less than for the previous iteration exactly if the previous
+    // iteration has a non-zero probability of exiting the loop. In a previous
+    // implementation, that observation was apparently used to justify
+    // decreasing the branch weights across iterations, but all that
+    // accomplishes is corrupting the probabilities relative to the original
+    // loop.
 
     // Remove Loop metadata from the latch branch instruction
     // because it is not the Loop's latch branch anymore.
@@ -1070,10 +997,6 @@ bool llvm::peelLoop(Loop *L, unsigned PeelCount, LoopInfo *LI,
     PHI->setIncomingValueForBlock(NewPreHeader, NewVal);
   }
 
-  for (const auto &[Term, Info] : Weights) {
-    setBranchWeights(*Term, Info.Weights, /*IsExpected=*/false);
-  }
-
   // Update Metadata for count of peeled off iterations.
   unsigned AlreadyPeeled = 0;
   if (auto Peeled = getOptionalIntLoopAttribute(L, PeeledCountMetaData))

llvm/test/Transforms/LoopUnroll/peel-branch-weights.ll

Lines changed: 33 additions & 31 deletions
@@ -15,9 +15,9 @@ define void @test() {
 ; CHECK: loop.peel:
 ; CHECK-NEXT: [[X_PEEL:%.*]] = call i32 @get.x()
 ; CHECK-NEXT: switch i32 [[X_PEEL]], label [[LOOP_LATCH_PEEL:%.*]] [
-; CHECK-NEXT: i32 0, label [[LOOP_LATCH_PEEL]]
-; CHECK-NEXT: i32 1, label [[LOOP_EXIT:%.*]]
-; CHECK-NEXT: i32 2, label [[LOOP_EXIT]]
+; CHECK-NEXT: i32 0, label [[LOOP_LATCH_PEEL]]
+; CHECK-NEXT: i32 1, label [[LOOP_EXIT:%.*]]
+; CHECK-NEXT: i32 2, label [[LOOP_EXIT]]
 ; CHECK-NEXT: ], !prof [[PROF0:![0-9]+]]
 ; CHECK: loop.latch.peel:
 ; CHECK-NEXT: br label [[LOOP_PEEL_NEXT:%.*]]
@@ -26,10 +26,10 @@ define void @test() {
 ; CHECK: loop.peel2:
 ; CHECK-NEXT: [[X_PEEL3:%.*]] = call i32 @get.x()
 ; CHECK-NEXT: switch i32 [[X_PEEL3]], label [[LOOP_LATCH_PEEL4:%.*]] [
-; CHECK-NEXT: i32 0, label [[LOOP_LATCH_PEEL4]]
-; CHECK-NEXT: i32 1, label [[LOOP_EXIT]]
-; CHECK-NEXT: i32 2, label [[LOOP_EXIT]]
-; CHECK-NEXT: ], !prof [[PROF1:![0-9]+]]
+; CHECK-NEXT: i32 0, label [[LOOP_LATCH_PEEL4]]
+; CHECK-NEXT: i32 1, label [[LOOP_EXIT]]
+; CHECK-NEXT: i32 2, label [[LOOP_EXIT]]
+; CHECK-NEXT: ], !prof [[PROF0]]
 ; CHECK: loop.latch.peel4:
 ; CHECK-NEXT: br label [[LOOP_PEEL_NEXT1:%.*]]
 ; CHECK: loop.peel.next1:
@@ -41,31 +41,33 @@ define void @test() {
 ; CHECK: loop:
 ; CHECK-NEXT: [[X:%.*]] = call i32 @get.x()
 ; CHECK-NEXT: switch i32 [[X]], label [[LOOP_LATCH:%.*]] [
-; CHECK-NEXT: i32 0, label [[LOOP_LATCH]]
-; CHECK-NEXT: i32 1, label [[LOOP_EXIT_LOOPEXIT:%.*]]
-; CHECK-NEXT: i32 2, label [[LOOP_EXIT_LOOPEXIT]]
-; CHECK-NEXT: ], !prof [[PROF2:![0-9]+]]
+; CHECK-NEXT: i32 0, label [[LOOP_LATCH]]
+; CHECK-NEXT: i32 1, label [[LOOP_EXIT_LOOPEXIT:%.*]]
+; CHECK-NEXT: i32 2, label [[LOOP_EXIT_LOOPEXIT]]
+; CHECK-NEXT: ], !prof [[PROF0]]
 ; CHECK: loop.latch:
-; CHECK-NEXT: br label [[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK-NEXT: br label [[LOOP]], !llvm.loop [[LOOP1:![0-9]+]]
 ; CHECK: loop.exit.loopexit:
 ; CHECK-NEXT: br label [[LOOP_EXIT]]
 ; CHECK: loop.exit:
 ; CHECK-NEXT: ret void
+;
+; DISABLEADV-LABEL: @test(
+; DISABLEADV-NEXT: entry:
+; DISABLEADV-NEXT: br label [[LOOP:%.*]]
+; DISABLEADV: loop:
+; DISABLEADV-NEXT: [[X:%.*]] = call i32 @get.x()
+; DISABLEADV-NEXT: switch i32 [[X]], label [[LOOP_LATCH:%.*]] [
+; DISABLEADV-NEXT: i32 0, label [[LOOP_LATCH]]
+; DISABLEADV-NEXT: i32 1, label [[LOOP_EXIT:%.*]]
+; DISABLEADV-NEXT: i32 2, label [[LOOP_EXIT]]
+; DISABLEADV-NEXT: ], !prof [[PROF0:![0-9]+]]
+; DISABLEADV: loop.latch:
+; DISABLEADV-NEXT: br label [[LOOP]]
+; DISABLEADV: loop.exit:
+; DISABLEADV-NEXT: ret void
+;
 
-; DISABLEADV-LABEL: @test()
-; DISABLEADV-NEXT: entry:
-; DISABLEADV-NEXT: br label %loop
-; DISABLEADV: loop
-; DISABLEADV-NEXT: %x = call i32 @get.x()
-; DISABLEADV-NEXT: switch i32 %x, label %loop.latch [
-; DISABLEADV-NEXT: i32 0, label %loop.latch
-; DISABLEADV-NEXT: i32 1, label %loop.exit
-; DISABLEADV-NEXT: i32 2, label %loop.exit
-; DISABLEADV-NEXT: ], !prof !0
-; DISABLEADV: loop.latch:
-; DISABLEADV-NEXT: br label %loop
-; DISABLEADV: loop.exit:
-; DISABLEADV-NEXT: ret void
 
 entry:
   br label %loop
@@ -89,9 +91,9 @@ loop.exit:
 
 ;.
 ; CHECK: [[PROF0]] = !{!"branch_weights", i32 100, i32 200, i32 20, i32 10}
-; CHECK: [[PROF1]] = !{!"branch_weights", i32 90, i32 180, i32 20, i32 10}
-; CHECK: [[PROF2]] = !{!"branch_weights", i32 80, i32 160, i32 20, i32 10}
-; CHECK: [[LOOP3]] = distinct !{!3, !4, !5}
-; CHECK: [[META4:![0-9]+]] = !{!"llvm.loop.peeled.count", i32 2}
-; CHECK: [[META5:![0-9]+]] = !{!"llvm.loop.unroll.disable"}
+; CHECK: [[LOOP1]] = distinct !{[[LOOP1]], [[META2:![0-9]+]], [[META3:![0-9]+]]}
+; CHECK: [[META2]] = !{!"llvm.loop.peeled.count", i32 2}
+; CHECK: [[META3]] = !{!"llvm.loop.unroll.disable"}
+;.
+; DISABLEADV: [[PROF0]] = !{!"branch_weights", i32 100, i32 200, i32 20, i32 10}
 ;.
