-
Notifications
You must be signed in to change notification settings - Fork 28
[AIE2P] Allow more address chaining and add a profitability check heuristic #451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: aie-public
Are you sure you want to change the base?
Conversation
…ristic This allows chaining when the base pointer is used in other basic blocks but only when it is considered profitable: - This shouldn't happen in a loop as the resulting copy will be more costly - The cost of chaining is incremented for each offset falling outside the load/store immediate ranges. - An experimental threshold is used to determine if chaining is profitable based on the compute cost (e.g half the number of pointer adds to be chained)
50a42df
to
5230bcf
Compare
return true; | ||
} else { | ||
if (MemType.getSizeInBits() <= 32) { | ||
ImmediateRangeMax = TII->getLoadStorePostIncImmediateRange(MemType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: one query to TII is enough.
// assume chaining is always profitable. | ||
if (MemType.isVector()) { | ||
return true; | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove else, it will reduce the indentation.
|
||
// If the immediate range is not set, the pointers aren't used by any | ||
// loads and stores, so we return. | ||
if (!ImmediateRangeSet) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this dead code? This covered by:
llvm_unreachable(
"unreachable: Unsupported immediate range of scalar size ");
Should we reuse that else to just return false?
continue; | ||
} | ||
|
||
const int64_t CurrOffset = OffsetMI->Value.getSExtValue(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could include some description of the idea (also as a future reference).
// assume chaining is always profitable. | ||
if (MemType.isVector()) { | ||
return true; | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can assert MemType.getSizeInBits() <= 32
and remove the next if as well.
int64_t ImmediateRangeMax; | ||
int64_t ImmediateRangeMin; | ||
}; | ||
virtual ImmediateRangeBounds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we simply use std::pair?
@@ -72,6 +75,11 @@ static cl::opt<bool> EnableChainsForVectorLdSt( | |||
"aie-chain-addr-vec-ldst", cl::Hidden, cl::init(true), | |||
cl::desc("Enable ptradd chaining for vector loads and stores.")); | |||
|
|||
cl::opt<int> AddressChainCostLimit( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: can we make this static?
// If the base reg is used in any of the successive MBBs, would introduce a | ||
// COPY and increase reg pressure. We only skip chaining in this case if it | ||
// is considered unprofitable. | ||
if (isRegUsedInSuccessiveMBBs(&MBB, PtrReg) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move isRegUsedInSuccessiveMBBs
inside profitability check?
ChainedCostLimit = AddressChainCostLimit; | ||
} | ||
|
||
if (isRegUsedInSuccessiveMBBs(&MBB, PtrReg)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are checking isRegUsedInSuccessiveMBBs
already right?
assert(Instrs.size() > 1); | ||
|
||
bool InLoop = true; | ||
MachineLoopInfo &MLI = getAnalysis<MachineLoopInfo>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use AIELoopUtils::isSingleMBBLoop
, then no need to include MachineLoopInfo
return false; | ||
} | ||
|
||
auto OffsetMI = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: OffsetValue may be.
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
# | ||
# (c) Copyright 2025 Advanced Micro Devices, Inc. or its affiliates | ||
# RUN: llc -mtriple aie2p -start-before=aie-cluster-base-address -stop-after=postmisched --issue-limit=6 --aie-chain-addr-scl-ldst=true %s -verify-machineinstrs -o - | FileCheck %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arent --issue-limit=6 --aie-chain-addr-scl-ldst=true
the default options?
Offsets.push_back( | ||
(I == 0 || (std::holds_alternative<std::string>(Offsets.back()) && | ||
std::get<std::string>(Offsets.back()) == "Break")) | ||
? CurrOffset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like CurrOffset
is an integer value.
|
||
NewOffset = NextOffset - AccumulatedOffset; | ||
|
||
if (NewOffset < ImmediateRangeMin || NewOffset > ImmediateRangeMax) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check: are we supposed to check AccumulatedOffset
with immediate range right?
return {14, -16}; | ||
else if (MemType.getSizeInBits() <= 32) | ||
return {28, -32}; | ||
else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have memory type for s128 as well right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we gracefully exit, rather than asserting?
@@ -1650,3 +1650,15 @@ unsigned AIE2InstrInfo::getBasicVectorBitSize() const { return 512; } | |||
unsigned AIE2InstrInfo::getMaxVectorBitSize() const { return 1024; } | |||
|
|||
unsigned AIE2InstrInfo::getMaxSupportedLdStIncSize() const { return 512; } | |||
|
|||
AIEBaseInstrInfo::ImmediateRangeBounds | |||
AIE2InstrInfo::getLoadStorePostIncImmediateRange(LLT MemType) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will open up a PR, where i have the Immediate Range checks for all the MemorySizes.
You are missing the vector Sizes checks and the checks for different Loads, such as G_ZEXT_LOAD, which are pitiful small.
[AutoBump] Merge with fixes of 0a17bdf (Oct 15) (14) (Needs ONNX Bump)(Needs downstream changes)(Needs Torch Bump)
This allows chaining when the base pointer is used in other basic blocks but only when it is considered profitable:
Below QoR results for affected kernels, the rest are unchanged:
Stack Size is unchanged in all kernels