[LoopPeel] Fix branch weights' effect on block frequencies
For example:
```
declare void @f(i32)
define void @test(i32 %n) {
entry:
br label %do.body
do.body:
%i = phi i32 [ 0, %entry ], [ %inc, %do.body ]
%inc = add i32 %i, 1
call void @f(i32 %i)
%c = icmp sge i32 %inc, %n
br i1 %c, label %do.end, label %do.body, !prof !0
do.end:
ret void
}
!0 = !{!"branch_weights", i32 1, i32 9}
```
Given those branch weights, once any loop iteration is actually
reached, the probability of the loop exiting at the iteration's end is
1/(1+9) = 1/10. That is, the loop exits once per 10 iterations on average and
thus has an estimated trip count of 10. `opt
-passes='print<block-freq>'` shows that 10 is indeed the frequency of
the loop body:
```
Printing analysis results of BFI for function 'test':
block-frequency-info: test
- entry: float = 1.0, int = 1801439852625920
- do.body: float = 10.0, int = 18014398509481984
- do.end: float = 1.0, int = 1801439852625920
```
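As a sanity check on that arithmetic, here is a minimal Python sketch
(not LLVM code; the function name is made up for illustration) that
derives the estimated trip count from a latch's two branch weights,
assuming every reached iteration exits with the same probability:
```python
from fractions import Fraction

def estimated_trip_count(exit_weight, backedge_weight):
    """Mean number of iterations when each reached iteration exits with
    probability exit_weight / (exit_weight + backedge_weight)."""
    exit_prob = Fraction(exit_weight, exit_weight + backedge_weight)
    # The iteration count is geometrically distributed, so its mean is
    # the reciprocal of the per-iteration exit probability.
    return 1 / exit_prob

print(estimated_trip_count(1, 9))  # 10
```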
Key Observation: The frequency of reaching any particular iteration is
less than for the previous iteration because the previous iteration
has a non-zero probability of exiting the loop. This observation
holds even though every loop iteration, once actually reached, has
exactly the same probability of exiting and thus exactly the same
branch weights.
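The observation can be illustrated with a small Python model (an
illustration only, not part of the patch; `reach_frequencies` is a
hypothetical helper). Iteration k is reached only if all k earlier
iterations took the backedge, so its frequency is (9/10)^k, and the
frequencies sum toward the loop body's total frequency of 10:
```python
from fractions import Fraction

def reach_frequencies(exit_weight, backedge_weight, iterations):
    """Frequency of reaching each of the first `iterations` iterations,
    relative to a loop entry frequency of 1."""
    stay = Fraction(backedge_weight, exit_weight + backedge_weight)
    # Reaching iteration k requires taking the backedge k times in a row.
    return [stay ** k for k in range(iterations)]

freqs = reach_frequencies(1, 9, 200)
print([float(f) for f in freqs[:3]])  # first three iterations
print(float(sum(freqs)))              # partial sum, approaches 10
```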
Now we use `opt -unroll-force-peel-count=2 -passes=loop-unroll` to
peel 2 iterations and insert them before the remaining loop. We
expect the key observation above not to change, but it does under the
implementation without this patch. The block frequency becomes 1.0
for the first iteration, 0.9 for the second, and 6.4 for the main loop
body. Again, a decreasing frequency is expected, but it decreases too
much: the total frequency of the original loop body becomes 8.3. The
new branch weights reveal the problem:
```
!0 = !{!"branch_weights", i32 1, i32 9}
!1 = !{!"branch_weights", i32 1, i32 8}
!2 = !{!"branch_weights", i32 1, i32 7}
```
The exit probability is now 1/10 for the first peeled iteration, 1/9
for the second, and 1/8 for the remaining loop iterations. It seems
this behavior was trying to ensure a decreasing block frequency.
However, as in the key observation above for the original loop, that
happens correctly without decreasing the branch weights across
iterations.
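The frequencies quoted above can be reproduced with a short Python
model of the peeled control flow. This is a sketch under the
assumption that each peeled copy and the remaining loop are entered
only via the previous copy's backedge; `peeled_frequencies` is a
hypothetical helper, not an LLVM API:
```python
from fractions import Fraction

def peeled_frequencies(weights):
    """weights: (exit, backedge) pairs, one per peeled iteration, with
    the last pair belonging to the remaining loop. Returns the frequency
    of each peeled copy followed by that of the remaining loop body."""
    reach = Fraction(1)  # frequency of entering the current copy
    freqs = []
    for exit_w, back_w in weights[:-1]:
        freqs.append(reach)
        reach *= Fraction(back_w, exit_w + back_w)
    exit_w, back_w = weights[-1]
    # Remaining loop body: entry frequency times expected trip count.
    freqs.append(reach * Fraction(exit_w + back_w, exit_w))
    return freqs

old = peeled_frequencies([(1, 9), (1, 8), (1, 7)])
print([float(f) for f in old], float(sum(old)))  # [1.0, 0.9, 6.4] 8.3
```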
This patch changes the peeling implementation not to decrease the
branch weights across loop iterations so that the frequency for every
iteration is the same as it was in the original loop. The total
frequency of the loop body, summed across all its occurrences, thus
remains 10 after peeling.
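A quick Python check of that claim, again as an illustrative model
rather than the patch's code (`body_frequencies` is a hypothetical
name): with the original 1:9 weights kept on every copy, the total is
10 regardless of how many iterations are peeled. For a peel count of
2, the model gives 1.0 and 0.9 for the peeled copies and 8.1 for the
remaining loop body.
```python
from fractions import Fraction

def body_frequencies(exit_w, back_w, peel_count):
    """Frequencies when every peeled copy and the remaining loop all
    keep the same (exit_w, back_w) branch weights."""
    stay = Fraction(back_w, exit_w + back_w)
    trip = Fraction(exit_w + back_w, exit_w)
    peeled = [stay ** k for k in range(peel_count)]
    # The remaining loop is entered only if all peeled copies continued.
    remaining = stay ** peel_count * trip
    return peeled, remaining

peeled, remaining = body_frequencies(1, 9, 2)
print([float(f) for f in peeled], float(remaining))
print(sum(peeled) + remaining)  # 10
```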
Unfortunately, that change means a later analysis cannot accurately
estimate the trip count of the remaining loop while examining the
remaining loop in isolation without considering the probability of
actually reaching it. For that purpose, this patch stores the new
trip count as separate metadata named `llvm.loop.estimated_trip_count`
and extends `llvm::getLoopEstimatedTripCount` to prefer it, if
present, over branch weights.
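The lookup order can be modeled in Python as below. This is only a
sketch of the described preference, not the C++ implementation of
`llvm::getLoopEstimatedTripCount`: the dict stands in for a loop's
metadata, the rounding is an assumption, and the stored value 8 is an
illustrative assumption (the original estimate of 10 minus the 2
peeled iterations).
```python
def get_estimated_trip_count(loop_metadata, exit_weight, backedge_weight):
    """Prefer an explicitly stored estimate; otherwise derive one from
    the latch branch weights (hypothetical model, not LLVM's code)."""
    stored = loop_metadata.get("llvm.loop.estimated_trip_count")
    if stored is not None:
        return stored
    # Fallback: reciprocal of the exit probability, rounded (assumed).
    return round((exit_weight + backedge_weight) / exit_weight)

# Remaining loop after peeling 2 iterations, still carrying 1:9 weights.
with_md = get_estimated_trip_count(
    {"llvm.loop.estimated_trip_count": 8}, 1, 9)
without_md = get_estimated_trip_count({}, 1, 9)
print(with_md, without_md)  # 8 10
```
Without the metadata, the remaining loop examined in isolation still
looks like a 10-iteration loop, which is the inaccuracy described
above.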
An alternative fix is for `llvm::getLoopEstimatedTripCount` to
subtract the `llvm.loop.peeled.count` metadata from the trip count
estimated by a loop's branch weights. However, there might be other
loop transformations that still corrupt block frequencies in a similar
manner and require a similar fix. `llvm.loop.estimated_trip_count` is
intended to provide a general way to store estimated trip counts when
branch weights cannot directly store them.
This patch introduces several FIXME comments that need to be addressed
before it can land.