[VPlan] Hook IR blocks into VPlan during skeleton creation (NFC) #114292
@@ -2429,6 +2429,30 @@ InnerLoopVectorizer::getOrCreateVectorTripCount(BasicBlock *InsertBlock) {
   return VectorTripCount;
 }
 
+/// Helper to connect both the vector and scalar preheaders to the Plan's
+/// entry. This is used when adjusting \p Plan during skeleton
+/// creation, i.e. adjusting the plan after introducing an initial runtime
+/// check.
+static void connectScalarPreheaderInVPlan(VPlan &Plan) {
+
+  VPBlockBase *VectorPH = Plan.getVectorPreheader();
Suggested change:
VPBlockBase *VectorPH = Plan.getVectorPreheader();
Dropped, thanks!
Suggested change:
- VPBlockBase *PredVPB = Plan.getEntry();
+ VPBlockBase *Entry = Plan.getEntry();
Done, thanks
VPlan starts with createInitialVPlan() producing a disconnected Entry and a predecessor-less vector_PH --> vector_region (with Header --> Latch) --> middle_block --> scalar_PH, plus an optional middle-block --> exit.
Would it be better to connect Entry-->vector_PH at the outset (ah, disregard - this indeed takes place below...), and here connect Entry-->scalar_PH followed by swapSuccessors(Entry)? The use of non-negative indices in connectBlocks() may be confusing, as it not only connects the two blocks but also disconnects them from existing successor and/or predecessor. I.e., replacePredecessor() or replaceSuccessor() may be more appropriate.
Furthermore, TCCheckBlock could then be inserted on this Entry-->vector_PH edge, so that connectScalarPreheaderInVPlan() assimilates introduceCheckBlockInVPlan().
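To make the "connect, then swap" ordering concrete, here is a minimal sketch on a toy graph. `ToyBlock`, `connect`, `swapSuccessors` and `connectScalarAsBypass` are hypothetical stand-ins written for this illustration; they are not the real `VPBlockBase` / `VPBlockUtils` API and only model successor and predecessor lists.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a VPlan block; not the real VPBlockBase.
struct ToyBlock {
  std::string Name;
  std::vector<ToyBlock *> Succs;
  std::vector<ToyBlock *> Preds;
};

// Append To as the last successor of From, and From as a predecessor of To.
// Unlike an index-based connect, this never disconnects existing edges.
inline void connect(ToyBlock &From, ToyBlock &To) {
  From.Succs.push_back(&To);
  To.Preds.push_back(&From);
}

// Swap the two successors of a block that has exactly two of them.
inline void swapSuccessors(ToyBlock &B) {
  assert(B.Succs.size() == 2 && "expected exactly two successors");
  std::swap(B.Succs[0], B.Succs[1]);
}

// Entry already branches to the vector preheader. Add the scalar preheader as
// a second successor, then swap so that it becomes the first successor.
inline void connectScalarAsBypass(ToyBlock &Entry, ToyBlock &ScalarPH) {
  connect(Entry, ScalarPH);
  swapSuccessors(Entry);
}
```

Starting from an `Entry` whose only successor is the vector preheader, calling `connectScalarAsBypass(Entry, ScalarPH)` leaves `Entry` with successors `{ScalarPH, VectorPH}` in that order, without touching the existing Entry-->vector_PH edge.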
Updated to connect + swap, thanks.
> Furthermore, TCCheckBlock could then be inserted on this Entry-->vector_PH edge, so that connectScalarPreheaderInVPlan() assimilates introduceCheckBlockInVPlan().
Not sure how exactly, left as is for now.
Suggested change:
- /// vector preheader and its predecessor, also connecting to the scalar
+ /// vector preheader and its predecessor, also connecting the new block to the scalar
Done, thanks
ScalarPH is expected to be the other successor of PreVectorPH, so could be asserted (or another way to retrieve ScalarPH), although more general w/o this assert?
Added an assert for now, thanks
Suggested change:
- VPBlockUtils::connectBlocks(CheckVPIRBB, ScalarPH);
- VPBlockUtils::insertOnEdge(PreVectorPH, VectorPH, CheckVPIRBB);
+ // Connect ScalarPH before VectorPH (via insert on edge), so that ScalarPH is the first successor of the new check block.
+ VPBlockUtils::connectBlocks(CheckVPIRBB, ScalarPH);
+ VPBlockUtils::insertOnEdge(PreVectorPH, VectorPH, CheckVPIRBB);
Or first insert on edge, so that the check block has the vector loop preheader as its only successor, similar to the initial Entry block, and then connect it to the scalar loop preheader as well, followed by swapping the successors, aligning with the above.
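The second ordering (insert on edge first, then connect and swap) can be sketched on a toy CFG. Everything here is an assumption made for illustration: `Node`, `addEdge`, `insertOnEdge` and `addCheckBlock` are hypothetical helpers that only mimic the idea of splitting an edge; they do not reproduce the actual `VPBlockUtils` implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical CFG node used only for this sketch.
struct Node {
  const char *Name;
  std::vector<Node *> Succs;
  std::vector<Node *> Preds;
};

// Append an edge From->To at the end of both lists.
inline void addEdge(Node &From, Node &To) {
  From.Succs.push_back(&To);
  To.Preds.push_back(&From);
}

// Replace the edge From->To with From->Mid->To, keeping the position of the
// edge in From's successor list and in To's predecessor list.
inline void insertOnEdge(Node &From, Node &To, Node &Mid) {
  auto SuccIt = std::find(From.Succs.begin(), From.Succs.end(), &To);
  auto PredIt = std::find(To.Preds.begin(), To.Preds.end(), &From);
  assert(SuccIt != From.Succs.end() && PredIt != To.Preds.end() &&
         "From and To must be connected");
  *SuccIt = &Mid;
  *PredIt = &Mid;
  Mid.Preds.push_back(&From);
  Mid.Succs.push_back(&To);
}

// Insert the check block on the PreVectorPH->VectorPH edge first, so that the
// vector preheader is its only successor; then connect it to the scalar
// preheader and swap, making the scalar preheader the first successor.
inline void addCheckBlock(Node &PreVectorPH, Node &VectorPH, Node &ScalarPH,
                          Node &Check) {
  insertOnEdge(PreVectorPH, VectorPH, Check);
  addEdge(Check, ScalarPH);
  std::swap(Check.Succs[0], Check.Succs[1]);
}
```

In this model both orderings end with the check block having successors `{ScalarPH, VectorPH}`; the insert-then-swap variant just mirrors how the initial Entry block is handled.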
(Independent): LoopScalarPreHeader is passed as parameter Bypass, while LoopVectorPreHeader is retrieved directly to set TCCheckBlock (before being reset). Would be better to be consistent.
This assert was probably added along with the lines below, but should remain, right?
Yes, this is to check the DT updates from SplitBlock I think
Scalar preheader can be connected in VPlan from the outset, rather than here?
There are a few places that assume the scalar PH has a single predecessor, which would need to be updated. Could pull this in here or adjust as follow-up?
The single predecessor of Scalar PH in VPlan being middle_block, right?
I.e., scalar loop is initially connected as leftover/remainder loop only, and here connected also as alternative/bypass loop - to handle trip counts too small for the vector loop, and potentially other unvectorized cases.
Perhaps connectScalarAsBypassLoopInVPlan() is more accurate than connectScalarPreheaderInVPlan(), given that scalar PH is already connected in VPlan?
Applying this additional connection earlier, is fine as a later follow-up, perhaps w/ a TODO.
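The two roles of the scalar loop described above (remainder after the vector loop, and bypass when the trip count is too small) can be seen in a source-level sketch of the skeleton. This is an illustration only: `VF` and `sumWithSkeleton` are hypothetical, and the inner lane loop merely stands in for a real vector operation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vectorization factor chosen for illustration.
constexpr std::size_t VF = 4;

inline long sumWithSkeleton(const std::vector<int> &A) {
  const std::size_t N = A.size();
  long Sum = 0;
  std::size_t I = 0;

  // Minimum trip count check: with fewer than VF iterations, control bypasses
  // the vector loop and goes straight to the scalar loop.
  if (N >= VF) {
    // Vector loop: processes VF elements per iteration.
    const std::size_t VectorTripCount = N - (N % VF);
    for (; I < VectorTripCount; I += VF)
      for (std::size_t Lane = 0; Lane < VF; ++Lane)
        Sum += A[I + Lane];

    // Middle block: exit directly if no iterations remain.
    if (I == N)
      return Sum;
  }

  // Scalar loop: reached either as bypass (I == 0) or as remainder
  // (I == VectorTripCount).
  for (; I < N; ++I)
    Sum += A[I];
  return Sum;
}
```

With `VF = 4`, an input of length 3 only runs the scalar loop (bypass), length 8 only runs the vector loop, and length 10 runs the vector loop followed by two scalar remainder iterations.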
Thanks, renamed and added TODO to connectScalarAsBypassLoopInVPlan
Worth wrapping LoopVectorPreHeader in a VPIRBB via connectCheckBlockInVPlan() as well, perhaps as a TODO?
Added a TODO for now, thanks!
Moved later to VPlan::execute(), after it replaces VPBBtoVPIRBB.
Yep, can also split off once we are happy
Note: worth renaming getPreheader() <--> getEntry() separately (?)
Undone for now, thanks
Suggested change:
- VPBlockBase *PredVPB = VectorPH->getSinglePredecessor();
+ VPBlockBase *PreVectorPH = VectorPH->getSinglePredecessor();
Updated, thanks!
Does this distinction between having a single successor (VectorPH) and two (VectorPH and ... ScalarPH?) correspond to ForEpilogue or not (forMain)? Worth some explanation, compared to InnerLoopVectorizer::emitIterationCountCheck() which always applies connectScalarPreheaderInVPlan().
Yes ForEpilogue can be checked instead, updated and added a comment, thanks!
Worth connecting scalar preheader from the outset rather than here?
There are a few places that assume the scalar PH has a single predecessor, which would need to be updated. Could pull this in here or adjust as follow-up?
Follow-up would be fine.
Worth introducing TCCheckBlock also when it appears first, rather than "folding" it into Entry?
as in having the entry connect directly to TCCheckBlock?
It's somewhat confusing to wrap TCCheckBlock only below under "else" but not above under "if", given that it is created in both cases. But this seems similar to InnerLoopVectorizer::emitIterationCountCheck().
Updated the check as suggested above and added a comment
This one still needed?
(Review to be continued from here)
Yep this one is still needed for now.
Entry is conceptually an immutable VPIRBB that wraps the original scalar preheader, where SCEVs can be safely expanded, and all runtime checks are added as successors starting with minimal trip count check that is added as its terminal. Does this change when executing the VPlan of an epilog loop, whose Entry is replicated and transformed from old to new?
I don't think it changes conceptually, but here we need to wrap the new entry to be the new node created here, after executing the plan. (There is existing logic to re-use expanded SCEVs from the first execution of the main plan, to avoid duplicated expansions)
All VPlans are initialized with the original scalar preheader as their Entry, but this works only when vectorizing the main loop (w/ or w/o vectorizing the epilog). When vectorizing the epilog loop, Entry no longer serves to host SCEV-expand recipes but should instead refer to the block between Middle and "Vector Epilog PreHeader" (called "Vector Epilogue Trip Count Check", but that's also how the first block is named), as depicted in https://llvm.org/docs/Vectorizers.html#epilogue-vectorization. The documentation of Entry in createInitialVPlan() deserves update?
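As a rough source-level picture of where the epilogue's trip count check sits relative to the main vector loop, consider the following sketch. It is an assumption-laden illustration rather than the vectorizer's actual output: `MainVF`, `EpilogueVF` and `sumWithEpilogue` are hypothetical, and the lane loops only stand in for vector operations.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vectorization factors for the main and epilogue vector loops.
constexpr std::size_t MainVF = 8;
constexpr std::size_t EpilogueVF = 4;

inline long sumWithEpilogue(const std::vector<int> &A) {
  const std::size_t N = A.size();
  long Sum = 0;
  std::size_t I = 0;

  // First iteration count check (in the original scalar preheader): too few
  // iterations even for the epilogue vector loop go straight to scalar code.
  if (N >= EpilogueVF) {
    // Iteration count check of the main vector loop.
    if (N >= MainVF) {
      for (; I + MainVF <= N; I += MainVF)
        for (std::size_t Lane = 0; Lane < MainVF; ++Lane)
          Sum += A[I + Lane];
    }

    // Vector epilogue trip count check: placed after the main loop's middle
    // block and before the epilogue vector preheader. In this sketch it plays
    // the role of the epilogue plan's entry.
    if (N - I >= EpilogueVF) {
      for (; I + EpilogueVF <= N; I += EpilogueVF)
        for (std::size_t Lane = 0; Lane < EpilogueVF; ++Lane)
          Sum += A[I + Lane];
    }
  }

  // Scalar remainder loop.
  for (; I < N; ++I)
    Sum += A[I];
  return Sum;
}
```

For a length of 13 the main loop covers 8 elements, the epilogue vector loop covers 4, and the scalar loop handles the last one; for a length of 5 the main loop is skipped and only the epilogue and scalar loops run.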
Added a comment, thanks!
Could this be done using replaceVPBBWithIRVPBB()?
Better merge NewEntry with OldEntry here, rather than insert NewEntry after OldEntry and let subsequent merging of VPBB pairs merge them, because the latter avoids handling VPIRBB's?
> Could this be done using replaceVPBBWithIRVPBB()?
I've simplified the code here and there should be nothing to merge at this point; I dropped the loop that removed the VPIRInstructions. It may be possible to use replaceVPBBWithIRVPBB, but we still need to retrieve the new Block and update Entry. Left as is for now.
Surely VPIR isn't null, having just created NewEntry from BasicBlock Insert.
Code removed, thanks!
Erasing all of Insert's phi's from NewEntry?
It should be empty, code removed, thanks!
Should OldEntry be deleted?
Updated to remove it, thanks!
@@ -170,9 +170,8 @@ VPBasicBlock *VPBlockBase::getEntryBasicBlock() {
 }
 
 void VPBlockBase::setPlan(VPlan *ParentPlan) {
-  assert(
-      (ParentPlan->getEntry() == this || ParentPlan->getPreheader() == this) &&
-      "Can only set plan on its entry or preheader block.");
+  assert(ParentPlan->getEntry() == this &&
+         "Can only set plan on its entry or preheader block.");
Suggested change:
- "Can only set plan on its entry or preheader block.");
+ "Can only set plan on its entry block.");
Updated, thanks!
Worth noting that this connection is subject to insertion of runtime checks.
Added a comment, thanks!
Need to disconnect in DT only?
Retain the comment wrt DT?
Retained both, as the DT update is required (otherwise the sanity checks during DT updates will trigger).
Suggested change:
- cast<VPBasicBlock>(getVectorLoopRegion()->getSinglePredecessor()),
+ getVectorPreheader(),
?
Updated thanks!
Better placed next to the two replaceVPBBWithIRVPBB() below, and update the comment?
Moved, thanks!
Comment should be updated - code for runtime guards has already been generated (is simply skipped here), but code is generated also after the body (middle, scalar pre-header).
Updated, thanks!
Would Block : RPOT suffice?
Yes, earlier versions skipped the pre-header but that is not needed in the latest version, simplified, thanks
Better cache the result? Can be done as follow-up.
Leave a TODO?
Added thanks!
Simpler to return last successor, provided it ain't a VPIRBB?
Simplified, thanks
dyn_casting to VPBB but calling it IRVPBB?
Intention is to return the single successor if it's not an IRVPBB, as that represents the exit block?
Suggested change:
- if (auto *IRVPBB = dyn_cast<VPBasicBlock>(MiddleVPBB->getSingleSuccessor()))
+ if (isa<VPIRBasicBlock>(MiddleVPBB->getSingleSuccessor())) {
+   // Exit block is IRVPBB, scalar preheader is not.
+   return nullptr;
+ }
+ return cast<VPBasicBlock>(MiddleVPBB->getSingleSuccessor());
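The intent of the suggestion (treat an IR-wrapping successor as the exit block, anything else as the scalar preheader) can be modelled outside LLVM with a small class hierarchy. This is only a sketch: `BlockBase`, `BasicBlockLike`, `IRBasicBlockLike` and `getScalarPreheaderOrNull` are hypothetical names, `dynamic_cast` stands in for LLVM's `isa<>`/`cast<>`, and the explicit null check is an added assumption that the suggested snippet does not make.

```cpp
#include <cassert>

// Hypothetical hierarchy: an "IR" basic block is a special kind of basic
// block, mirroring the relationship the review comment relies on.
struct BlockBase {
  virtual ~BlockBase() = default;
  BlockBase *SingleSucc = nullptr; // single successor, or null if none
};
struct BasicBlockLike : BlockBase {};
struct IRBasicBlockLike : BasicBlockLike {};

// Return the single successor of Middle if it is a plain basic block (the
// scalar preheader in this model), or null if it wraps an IR block (the exit
// block) or does not exist.
inline BasicBlockLike *getScalarPreheaderOrNull(const BlockBase &Middle) {
  BlockBase *Succ = Middle.SingleSucc;
  if (!Succ || dynamic_cast<IRBasicBlockLike *>(Succ))
    return nullptr; // Exit block is an IR block, scalar preheader is not.
  return dynamic_cast<BasicBlockLike *>(Succ);
}
```

The check has to test for the more derived type first: because an IR block is also a basic block in this hierarchy, casting to the base type alone cannot tell the exit block apart from the scalar preheader, which is the confusion the comment points out.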
Updated code as per comment above, thanks
Updated, thanks!