
[LoopPeel] Fix branch weights' effect on block frequencies #128785

Open: wants to merge 20 commits into base branch users/jdenny-ornl/pgo-estimated-trip-count
Commits (20):

- f413520 [LoopPeel] Fix branch weights' effect on block frequencies (jdenny-ornl, Mar 19, 2025)
- f821eeb Run update_test_checks.py on a test (jdenny-ornl, Mar 26, 2025)
- af8ec56 Fix typo (jdenny-ornl, Apr 4, 2025)
- a0264ad Merge branch 'main' into fix-peel-branch-weights (jdenny-ornl, Apr 8, 2025)
- fd29a49 Merge branch 'main' into fix-peel-branch-weights (jdenny-ornl, Apr 9, 2025)
- 6303177 Document new metadata (jdenny-ornl, Apr 10, 2025)
- bbd0e95 Improve LangRef.rst entry (jdenny-ornl, May 1, 2025)
- 715cb0a Merge branch 'main' into fix-peel-branch-weights (jdenny-ornl, May 5, 2025)
- 67fa67d Merge branch 'main' into fix-peel-branch-weights (jdenny-ornl, Jun 10, 2025)
- 37ce859 Update fixmes (jdenny-ornl, Jun 16, 2025)
- 4337dcd Merge branch 'main' into fix-peel-branch-weights (jdenny-ornl, Jun 16, 2025)
- 5193158 Update test for AArch4, which I did not build before (jdenny-ornl, Jun 17, 2025)
- bbd2f22 Merge branch 'main' into fix-peel-branch-weights (jdenny-ornl, Jul 10, 2025)
- b23f467 Run update script on test changed by merge from main (jdenny-ornl, Jul 10, 2025)
- e250cfc Merge branch 'main' into fix-peel-branch-weights (jdenny-ornl, Jul 15, 2025)
- 859b84d Merge branch 'pgo-estimated-trip-count' into fix-peel-branch-weights (jdenny-ornl, Jul 15, 2025)
- 3f6a91a Merge branch 'pgo-estimated-trip-count' into fix-peel-branch-weights (jdenny-ornl, Jul 24, 2025)
- e5a0a26 Update for merge from pgo-estimated-trip-count (jdenny-ornl, Jul 24, 2025)
- c283ebe Merge branch 'pgo-estimated-trip-count' into fix-peel-branch-weights (jdenny-ornl, Jul 24, 2025)
- ecbf6e0 Merge branch 'pgo-estimated-trip-count' into fix-peel-branch-weights (jdenny-ornl, Jul 28, 2025)
29 changes: 29 additions & 0 deletions llvm/docs/LangRef.rst
@@ -7911,6 +7911,35 @@ The attributes in this metadata is added to all followup loops of the
 loop distribution pass. See
 :ref:`Transformation Metadata <transformation-metadata>` for details.
 
+'``llvm.loop.estimated_trip_count``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This metadata records the loop's estimated trip count. The first
+operand is the string ``llvm.loop.estimated_trip_count`` and the
+second operand is an integer specifying the count. For example:
+
+.. code-block:: llvm
+
+   !0 = !{!"llvm.loop.estimated_trip_count", i32 8}
+
+A loop's estimated trip count is an estimate of the average number of
+loop iterations (specifically, the number of times the loop's header
+executes) each time execution reaches the loop. It is usually only an
+estimate based on, for example, profile data. The actual number of
+iterations might vary widely.
+
+The estimated trip count serves as a parameter for various loop
+transformations and typically helps estimate transformation cost. For
+example, it can help determine how many iterations to peel or how
+aggressively to unroll.
+
+If this metadata is not present, such passes compute the estimated
+trip count from any ``branch_weights`` metadata attached to the latch
+block's branch instruction. Thus, this metadata frees loop
+transformations to compute latch branch weights solely for the purpose
+of maintaining accurate block frequencies instead of requiring the
+branch weights to always serve both roles.
+
 '``llvm.licm.disable``' Metadata
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
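The fallback described in the LangRef entry can be illustrated with a minimal C++ sketch. It assumes the rule used by getEstimatedTripCount in LoopUtils.cpp: the exit count is the backedge weight divided by the exit weight, rounded to nearest, and the trip count is one more than that. The helper name tripCountFromWeights is hypothetical. The saturation at UINT_MAX mentioned in LoopUtils.h is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper (not an LLVM API): estimates a loop's trip count from
// the two weights on its latch branch. Saturation at UINT_MAX is omitted.
std::optional<uint64_t> tripCountFromWeights(uint64_t BackedgeWeight,
                                             uint64_t ExitWeight) {
  // A latch that never exits gives no finite estimate.
  if (ExitWeight == 0)
    return std::nullopt;
  // Expected number of backedges taken per loop entry, rounded to nearest.
  uint64_t ExitCount = (BackedgeWeight + ExitWeight / 2) / ExitWeight;
  // The header executes once more than the backedge is taken.
  return ExitCount + 1;
}
```

Consider the new test's latch, !{!"branch_weights", i32 1, i32 9}, which has exit weight 1 and backedge weight 9. For it, tripCountFromWeights(9, 1) yields 10, consistent with the do.body frequency of 10.0 that the test checks before peeling.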
25 changes: 17 additions & 8 deletions llvm/include/llvm/Transforms/Utils/LoopUtils.h
@@ -322,7 +322,8 @@ LLVM_ABI TransformationMode hasLICMVersioningTransformation(const Loop *L);
 LLVM_ABI void addStringMetadataToLoop(Loop *TheLoop, const char *MDString,
                                       unsigned V = 0);
 
-/// Returns a loop's estimated trip count based on branch weight metadata.
+/// Returns a loop's estimated trip count based on
+/// llvm.loop.estimated_trip_count metadata or, if none, branch weight metadata.
 /// In addition if \p EstimatedLoopInvocationWeight is not null it is
 /// initialized with weight of loop's latch leading to the exit.
 /// Returns a valid positive trip count, saturated at UINT_MAX, or std::nullopt
@@ -331,13 +332,21 @@ LLVM_ABI std::optional<unsigned>
 getLoopEstimatedTripCount(Loop *L,
                           unsigned *EstimatedLoopInvocationWeight = nullptr);
 
-/// Set a loop's branch weight metadata to reflect that loop has \p
-/// EstimatedTripCount iterations and \p EstimatedLoopInvocationWeight exits
-/// through latch. Returns true if metadata is successfully updated, false
-/// otherwise. Note that loop must have a latch block which controls loop exit
-/// in order to succeed.
-LLVM_ABI bool setLoopEstimatedTripCount(Loop *L, unsigned EstimatedTripCount,
-                                        unsigned EstimatedLoopInvocationWeight);
+/// Set a loop's llvm.loop.estimated_trip_count metadata and, if \p
+/// EstimatedLoopInvocationWeight, branch weight metadata to reflect that loop
+/// has \p EstimatedTripCount iterations and \p EstimatedLoopInvocationWeight
+/// exit weight through latch. Returns true if metadata is successfully updated,
+/// false otherwise. Note that loop must have a latch block which controls loop
+/// exit in order to succeed.
+///
+/// The use case for not setting branch weight metadata is when the original
+/// branch weight metadata is correct for computing block frequencies but the
+/// trip count has changed due to a loop transformation. The branch weight
+/// metadata cannot be adjusted to reflect the new trip count, so we store the
+/// new trip count separately.
+LLVM_ABI bool setLoopEstimatedTripCount(
+    Loop *L, unsigned EstimatedTripCount,
+    std::optional<unsigned> EstimatedLoopInvocationWeight);
 
 /// Check inner loop (L) backedge count is known to be invariant on all
 /// iterations of its outer loop. If the loop has no parent, this is trivially
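The new contract of setLoopEstimatedTripCount can be shown with a minimal sketch. The metadata is always written, and the latch weights are rewritten only when an invocation weight is supplied. SetterModel is a hypothetical stand-in for a loop's latch profile data, not an LLVM type. The sketch also omits the real function's failure path, which returns false when there is no suitable latch branch.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a loop's latch profile data (not an LLVM type).
struct SetterModel {
  std::optional<unsigned> EstimatedTripCountMD; // llvm.loop.estimated_trip_count
  unsigned BackedgeWeight = 0;                  // latch -> header
  unsigned ExitWeight = 0;                      // latch -> exit
};

// Sketch of the documented contract only; the "no latch branch" check that
// makes the real function return false is omitted.
void setEstimatedTripCount(SetterModel &M, unsigned TripCount,
                           std::optional<unsigned> InvocationWeight) {
  // The trip count is always recorded as metadata.
  M.EstimatedTripCountMD = TripCount;
  // Without an invocation weight, keep the existing branch weights so block
  // frequencies still reflect the original profile.
  if (!InvocationWeight)
    return;
  unsigned Exit = 0, Backedge = 0;
  if (TripCount > 0) {
    Exit = *InvocationWeight;
    Backedge = (TripCount - 1) * Exit;
  }
  M.ExitWeight = Exit;
  M.BackedgeWeight = Backedge;
}
```

Passing std::nullopt matches how LoopPeel.cpp now calls the function after peeling. For example, a loop with 9:1 latch weights and a new estimate of 8 keeps its 9:1 weights and gains the metadata value 8.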
145 changes: 52 additions & 93 deletions llvm/lib/Transforms/Utils/LoopPeel.cpp
@@ -739,84 +739,6 @@ void llvm::computePeelCount(Loop *L, unsigned LoopSize,
   }
 }
 
-struct WeightInfo {
-  // Weights for current iteration.
-  SmallVector<uint32_t> Weights;
-  // Weights to subtract after each iteration.
-  const SmallVector<uint32_t> SubWeights;
-};
-
-/// Update the branch weights of an exiting block of a peeled-off loop
-/// iteration.
-/// Let F is a weight of the edge to continue (fallthrough) into the loop.
-/// Let E is a weight of the edge to an exit.
-/// F/(F+E) is a probability to go to loop and E/(F+E) is a probability to
-/// go to exit.
-/// Then, Estimated ExitCount = F / E.
-/// For I-th (counting from 0) peeled off iteration we set the weights for
-/// the peeled exit as (EC - I, 1). It gives us reasonable distribution,
-/// The probability to go to exit 1/(EC-I) increases. At the same time
-/// the estimated exit count in the remainder loop reduces by I.
-/// To avoid dealing with division rounding we can just multiple both part
-/// of weights to E and use weight as (F - I * E, E).
-static void updateBranchWeights(Instruction *Term, WeightInfo &Info) {
-  setBranchWeights(*Term, Info.Weights, /*IsExpected=*/false);
-  for (auto [Idx, SubWeight] : enumerate(Info.SubWeights))
-    if (SubWeight != 0)
-      // Don't set the probability of taking the edge from latch to loop header
-      // to less than 1:1 ratio (meaning Weight should not be lower than
-      // SubWeight), as this could significantly reduce the loop's hotness,
-      // which would be incorrect in the case of underestimating the trip count.
-      Info.Weights[Idx] =
-          Info.Weights[Idx] > SubWeight
-              ? std::max(Info.Weights[Idx] - SubWeight, SubWeight)
-              : SubWeight;
-}
-
-/// Initialize the weights for all exiting blocks.
-static void initBranchWeights(DenseMap<Instruction *, WeightInfo> &WeightInfos,
-                              Loop *L) {
-  SmallVector<BasicBlock *> ExitingBlocks;
-  L->getExitingBlocks(ExitingBlocks);
-  for (BasicBlock *ExitingBlock : ExitingBlocks) {
-    Instruction *Term = ExitingBlock->getTerminator();
-    SmallVector<uint32_t> Weights;
-    if (!extractBranchWeights(*Term, Weights))
-      continue;
-
-    // See the comment on updateBranchWeights() for an explanation of what we
-    // do here.
-    uint32_t FallThroughWeights = 0;
-    uint32_t ExitWeights = 0;
-    for (auto [Succ, Weight] : zip(successors(Term), Weights)) {
-      if (L->contains(Succ))
-        FallThroughWeights += Weight;
-      else
-        ExitWeights += Weight;
-    }
-
-    // Don't try to update weights for degenerate case.
-    if (FallThroughWeights == 0)
-      continue;
-
-    SmallVector<uint32_t> SubWeights;
-    for (auto [Succ, Weight] : zip(successors(Term), Weights)) {
-      if (!L->contains(Succ)) {
-        // Exit weights stay the same.
-        SubWeights.push_back(0);
-        continue;
-      }
-
-      // Subtract exit weights on each iteration, distributed across all
-      // fallthrough edges.
-      double W = (double)Weight / (double)FallThroughWeights;
-      SubWeights.push_back((uint32_t)(ExitWeights * W));
-    }
-
-    WeightInfos.insert({Term, {std::move(Weights), std::move(SubWeights)}});
-  }
-}
-
 /// Clones the body of the loop L, putting it between \p InsertTop and \p
 /// InsertBot.
 /// \param IterNumber The serial number of the iteration currently being
@@ -1188,11 +1110,6 @@ bool llvm::peelLoop(Loop *L, unsigned PeelCount, bool PeelLast, LoopInfo *LI,
   Instruction *LatchTerm =
       cast<Instruction>(cast<BasicBlock>(Latch)->getTerminator());
 
-  // If we have branch weight information, we'll want to update it for the
-  // newly created branches.
-  DenseMap<Instruction *, WeightInfo> Weights;
-  initBranchWeights(Weights, L);
-
   // Identify what noalias metadata is inside the loop: if it is inside the
   // loop, the associated metadata must be cloned for each iteration.
   SmallVector<MDNode *, 6> LoopLocalNoAliasDeclScopes;
@@ -1238,11 +1155,6 @@ bool llvm::peelLoop(Loop *L, unsigned PeelCount, bool PeelLast, LoopInfo *LI,
     assert(DT.verify(DominatorTree::VerificationLevel::Fast));
 #endif
 
-    for (auto &[Term, Info] : Weights) {
-      auto *TermCopy = cast<Instruction>(VMap[Term]);
-      updateBranchWeights(TermCopy, Info);
-    }
-
     // Remove Loop metadata from the latch branch instruction
     // because it is not the Loop's latch branch anymore.
     auto *LatchTermCopy = cast<Instruction>(VMap[LatchTerm]);
@@ -1282,15 +1194,62 @@ bool llvm::peelLoop(Loop *L, unsigned PeelCount, bool PeelLast, LoopInfo *LI,
     }
   }
 
-  for (const auto &[Term, Info] : Weights) {
-    setBranchWeights(*Term, Info.Weights, /*IsExpected=*/false);
-  }
-
   // Update Metadata for count of peeled off iterations.
   unsigned AlreadyPeeled = 0;
   if (auto Peeled = getOptionalIntLoopAttribute(L, PeeledCountMetaData))
     AlreadyPeeled = *Peeled;
-  addStringMetadataToLoop(L, PeeledCountMetaData, AlreadyPeeled + PeelCount);
+  unsigned TotalPeeled = AlreadyPeeled + PeelCount;
+  addStringMetadataToLoop(L, PeeledCountMetaData, TotalPeeled);
+
+  // Update metadata for the estimated trip count. The original branch weight
+  // metadata is already correct for both the remaining loop and the peeled loop
+  // iterations, so don't adjust it.
+  //
+  // For example, consider what happens when peeling 2 iterations from a loop
+  // with an estimated trip count of 10 and inserting them before the remaining
+  // loop. Each of the peeled iterations and each iteration in the remaining
+  // loop still has the same probability of exiting the *entire original* loop
+  // as it did when in the original loop, and thus it should still have the same
+  // branch weights. The peeled iterations' non-zero probabilities of exiting
+  // already appropriately reduce the probability of reaching the remaining
+  // iterations just as they did in the original loop. Trying to also adjust
+  // the remaining loop's branch weights to reflect its new trip count of 8 will
+  // erroneously further reduce its block frequencies. However, in case an
+  // analysis later needs to determine the trip count of the remaining loop
+  // while examining it in isolation without considering the probability of
+  // actually reaching it, we store the new trip count as separate metadata.
+  //
+  // TODO: getLoopEstimatedTripCount and setLoopEstimatedTripCount skip loops
+  // that don't match the restrictions of getExpectedExitLoopLatchBranch in
+  // LoopUtils.cpp. For example,
+  // llvm/tests/Transforms/LoopUnroll/peel-branch-weights.ll (introduced by
+  // b43a4d0850d5) has multiple exits. Should we try to extend them to handle
+  // such cases? For now, we just don't try to record
+  // llvm.loop.estimated_trip_count for such cases, so the original branch
+  // weights will have to do.
+  if (auto EstimatedTripCount = getLoopEstimatedTripCount(L)) {
+    // FIXME: The previous updateBranchWeights implementation had this
+    // comment:
+    //
+    //   Don't set the probability of taking the edge from latch to loop header
+    //   to less than 1:1 ratio (meaning Weight should not be lower than
+    //   SubWeight), as this could significantly reduce the loop's hotness,
+    //   which would be incorrect in the case of underestimating the trip count.
+    //
+    // See e8d5db206c2f commit log for further discussion. That seems to
+    // suggest that we should avoid ever setting a trip count of < 2 here
+    // (equal chance of continuing and exiting means the loop will likely
+    // continue once and then exit once). Or is keeping the original branch
+    // weights already a sufficient improvement for whatever analysis cares
+    // about this case?
+    unsigned EstimatedTripCountNew = *EstimatedTripCount;
+    if (EstimatedTripCountNew < TotalPeeled) // FIXME: TotalPeeled + 2?
+      EstimatedTripCountNew = 0;             // FIXME: = 2?
+    else
+      EstimatedTripCountNew -= TotalPeeled;
+    setLoopEstimatedTripCount(L, EstimatedTripCountNew,
+                              /*EstimatedLoopInvocationWeight=*/std::nullopt);
+  }
 
   if (Loop *ParentLoop = L->getParentLoop())
     L = ParentLoop;
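The comment above argues that keeping the original latch weights preserves the loop's total block frequency. That arithmetic can be checked with a minimal C++ sketch for the new test, which has latch weights 1:9 (exit probability 0.1) and two peeled iterations. The sketch assumes an idealized frequency model in which a loop's header frequency equals its entry frequency divided by its per-iteration exit probability. This is not LLVM's BlockFrequencyInfo, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical model (not LLVM's BlockFrequencyInfo). The loop is reached with
// frequency 1.0. PeeledExitProbs[I] is the exit probability on the latch copy
// of peeled iteration I; LoopExitProb is the per-iteration exit probability on
// the remaining loop's latch. Returns each peeled iteration's frequency,
// followed by the remaining loop header's frequency.
std::vector<double> peelFrequencies(const std::vector<double> &PeeledExitProbs,
                                    double LoopExitProb) {
  std::vector<double> Freqs;
  double Reach = 1.0; // Frequency with which control reaches the next block.
  for (double P : PeeledExitProbs) {
    Freqs.push_back(Reach);
    Reach *= 1.0 - P;
  }
  // A loop entered with frequency Reach executes its header
  // Reach / LoopExitProb times on average.
  Freqs.push_back(Reach / LoopExitProb);
  return Freqs;
}

// Mirrors the clamp in the patch: the remaining loop's estimated trip count is
// the original estimate minus all peeled iterations, floored at 0.
unsigned remainingTripCount(unsigned TripCount, unsigned TotalPeeled) {
  return TripCount < TotalPeeled ? 0 : TripCount - TotalPeeled;
}
```

With the original weights kept on every latch, peelFrequencies({0.1, 0.1}, 0.1) gives 1.0, 0.9 and 8.1, for a total of 10.0, matching the CHECK-UR lines of peel-branch-weights-freq.ll. The removed updateBranchWeights scheme would have given the two peeled latches weights 1:9 and 1:8 and left 1:7 on the remaining loop. In that case peelFrequencies({1.0 / 10, 1.0 / 9}, 1.0 / 8) gives 1.0, 0.9 and 6.4, a total of only 8.3. remainingTripCount(10, 2) is 8, the value the test expects in llvm.loop.estimated_trip_count.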
16 changes: 13 additions & 3 deletions llvm/lib/Transforms/Utils/LoopUtils.cpp
@@ -54,6 +54,8 @@ using namespace llvm::PatternMatch;
 
 static const char *LLVMLoopDisableNonforced = "llvm.loop.disable_nonforced";
 static const char *LLVMLoopDisableLICM = "llvm.licm.disable";
+static const char *LLVMLoopEstimatedTripCount =
+    "llvm.loop.estimated_trip_count";
 
 bool llvm::formDedicatedExitBlocks(Loop *L, DominatorTree *DT, LoopInfo *LI,
                                    MemorySSAUpdater *MSSAU,
@@ -850,27 +852,35 @@ llvm::getLoopEstimatedTripCount(Loop *L,
             getEstimatedTripCount(LatchBranch, L, ExitWeight)) {
       if (EstimatedLoopInvocationWeight)
         *EstimatedLoopInvocationWeight = ExitWeight;
+      if (auto EstimatedTripCount =
+              getOptionalIntLoopAttribute(L, LLVMLoopEstimatedTripCount))
+        return EstimatedTripCount;
       return *EstTripCount;
     }
   }
   return std::nullopt;
 }
 
-bool llvm::setLoopEstimatedTripCount(Loop *L, unsigned EstimatedTripCount,
-                                     unsigned EstimatedloopInvocationWeight) {
+bool llvm::setLoopEstimatedTripCount(
+    Loop *L, unsigned EstimatedTripCount,
+    std::optional<unsigned> EstimatedloopInvocationWeight) {
   // At the moment, we currently support changing the estimate trip count of
   // the latch branch only. We could extend this API to manipulate estimated
   // trip counts for any exit.
   BranchInst *LatchBranch = getExpectedExitLoopLatchBranch(L);
   if (!LatchBranch)
     return false;
 
+  addStringMetadataToLoop(L, LLVMLoopEstimatedTripCount, EstimatedTripCount);
+  if (!EstimatedloopInvocationWeight)
+    return true;
+
   // Calculate taken and exit weights.
   unsigned LatchExitWeight = 0;
   unsigned BackedgeTakenWeight = 0;
 
   if (EstimatedTripCount > 0) {
-    LatchExitWeight = EstimatedloopInvocationWeight;
+    LatchExitWeight = *EstimatedloopInvocationWeight;
     BackedgeTakenWeight = (EstimatedTripCount - 1) * LatchExitWeight;
   }
 
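In the patched getLoopEstimatedTripCount, llvm.loop.estimated_trip_count is read only inside the branch that has already obtained an estimate from the latch weights. As a result, the metadata overrides the weight-based estimate when the weights are usable, but it is not consulted when the latch has no usable weights. A minimal sketch of that control flow follows, using a hypothetical LatchProfile struct in place of a real Loop. It reuses the round-to-nearest rule from getEstimatedTripCount, with saturation omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model of a loop latch (field names are illustrative only).
struct LatchProfile {
  std::optional<unsigned> EstimatedTripCountMD; // llvm.loop.estimated_trip_count
  std::optional<uint32_t> BackedgeWeight;       // absent without branch_weights
  std::optional<uint32_t> ExitWeight;
};

// Mirrors the precedence in the patched getLoopEstimatedTripCount.
std::optional<unsigned> getEstimatedTripCount(const LatchProfile &P,
                                              unsigned *InvocationWeight) {
  // No usable latch weights: no estimate, and the metadata is not consulted.
  if (!P.BackedgeWeight || !P.ExitWeight || *P.ExitWeight == 0)
    return std::nullopt;
  if (InvocationWeight)
    *InvocationWeight = *P.ExitWeight;
  // Explicit metadata takes precedence over the weight-based estimate.
  if (P.EstimatedTripCountMD)
    return P.EstimatedTripCountMD;
  uint64_t Backedge = *P.BackedgeWeight, Exit = *P.ExitWeight;
  uint64_t ExitCount = (Backedge + Exit / 2) / Exit; // Round to nearest.
  return static_cast<unsigned>(ExitCount + 1);
}
```

For the peeled loop in the new test, the latch keeps weights 1:9 and carries metadata 8. The query therefore returns 8 and still reports an invocation weight of 1. Without the metadata, the same weights would give 10.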
75 changes: 75 additions & 0 deletions llvm/test/Transforms/LoopUnroll/peel-branch-weights-freq.ll
@@ -0,0 +1,75 @@
+; Test branch weight metadata, estimated trip count metadata, and block
+; frequencies after loop peeling.
+
+; RUN: opt < %s -S -passes='print<block-freq>' 2>&1 | \
+; RUN: FileCheck -check-prefix=CHECK %s
+
+; The -implicit-check-not options make sure that no additional labels or calls
+; to @f show up.
+; RUN: opt < %s -S -passes='loop-unroll,print<block-freq>' \
+; RUN: -unroll-force-peel-count=2 2>&1 | \
+; RUN: FileCheck %s -check-prefix=CHECK-UR \
+; RUN: -implicit-check-not='{{^[^ ;]*:}}' \
+; RUN: -implicit-check-not='call void @f'
+
+; CHECK: block-frequency-info: test
+; CHECK: do.body: float = 10.0,
+
+; The sum should still be ~10.
+;
+; CHECK-UR: block-frequency-info: test
+; CHECK-UR: - [[DO_BODY_PEEL:.*]]: float = 1.0,
+; CHECK-UR: - [[DO_BODY_PEEL2:.*]]: float = 0.9,
+; CHECK-UR: - [[DO_BODY:.*]]: float = 8.1,
+
+declare void @f(i32)
+
+define void @test(i32 %n) {
+; CHECK-UR-LABEL: define void @test(
+; CHECK-UR: [[ENTRY:.*]]:
+; CHECK-UR: br label %[[DO_BODY_PEEL_BEGIN:.*]]
+; CHECK-UR: [[DO_BODY_PEEL_BEGIN]]:
+; CHECK-UR: br label %[[DO_BODY_PEEL:.*]]
+; CHECK-UR: [[DO_BODY_PEEL]]:
+; CHECK-UR: call void @f
+; CHECK-UR: br i1 %{{.*}}, label %[[DO_END:.*]], label %[[DO_BODY_PEEL_NEXT:.*]], !prof ![[#PROF:]]
+; CHECK-UR: [[DO_BODY_PEEL_NEXT]]:
+; CHECK-UR: br label %[[DO_BODY_PEEL2:.*]]
+; CHECK-UR: [[DO_BODY_PEEL2]]:
+; CHECK-UR: call void @f
+; CHECK-UR: br i1 %{{.*}}, label %[[DO_END]], label %[[DO_BODY_PEEL_NEXT1:.*]], !prof ![[#PROF]]
+; CHECK-UR: [[DO_BODY_PEEL_NEXT1]]:
+; CHECK-UR: br label %[[DO_BODY_PEEL_NEXT5:.*]]
+; CHECK-UR: [[DO_BODY_PEEL_NEXT5]]:
+; CHECK-UR: br label %[[ENTRY_PEEL_NEWPH:.*]]
+; CHECK-UR: [[ENTRY_PEEL_NEWPH]]:
+; CHECK-UR: br label %[[DO_BODY]]
+; CHECK-UR: [[DO_BODY]]:
+; CHECK-UR: call void @f
+; CHECK-UR: br i1 %{{.*}}, label %[[DO_END_LOOPEXIT:.*]], label %[[DO_BODY]], !prof ![[#PROF]], !llvm.loop ![[#LOOP_UR_LATCH:]]
+; CHECK-UR: [[DO_END_LOOPEXIT]]:
+; CHECK-UR: br label %[[DO_END]]
+; CHECK-UR: [[DO_END]]:
+; CHECK-UR: ret void
+
+entry:
+  br label %do.body
+
+do.body:
+  %i = phi i32 [ 0, %entry ], [ %inc, %do.body ]
+  %inc = add i32 %i, 1
+  call void @f(i32 %i)
+  %c = icmp sge i32 %inc, %n
+  br i1 %c, label %do.end, label %do.body, !prof !0
+
+do.end:
+  ret void
+}
+
+!0 = !{!"branch_weights", i32 1, i32 9}
+
+; CHECK-UR: ![[#PROF]] = !{!"branch_weights", i32 1, i32 9}
+; CHECK-UR: ![[#LOOP_UR_LATCH]] = distinct !{![[#LOOP_UR_LATCH]], ![[#LOOP_UR_PC:]], ![[#LOOP_UR_TC:]], ![[#DISABLE:]]}
+; CHECK-UR: ![[#LOOP_UR_PC]] = !{!"llvm.loop.peeled.count", i32 2}
+; CHECK-UR: ![[#LOOP_UR_TC]] = !{!"llvm.loop.estimated_trip_count", i32 8}
+; CHECK-UR: ![[#DISABLE]] = !{!"llvm.loop.unroll.disable"}