-
Notifications
You must be signed in to change notification settings - Fork 30
SWP-aware WAWRegRewriter. #699
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: aie-public
Are you sure you want to change the base?
Conversation
| // The initial graph will have ordering edges induced by hasSideEffects of the | ||
| // locks/DONE. | ||
| class LockDelays : public ScheduleDAGMutation { | ||
| bool ExactLatencies = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do this:
class LockDelays : public ScheduleDAGMutation {
- bool ExactLatencies = true;
+ bool ExactLatencies;
void apply(ScheduleDAGInstrs *DAG) override {
const auto *TII = static_cast<const AIEBaseInstrInfo *>(DAG->TII);
const int CoreStallCycle = TII->getCoreStallCycleAfterLock();
@@ -243,7 +243,7 @@ class LockDelays : public ScheduleDAGMutation {
}
public:
- LockDelays(bool ExactLatencies) : ExactLatencies(ExactLatencies) {};
+ LockDelays(bool ExactLatencies = true) : ExactLatencies(ExactLatencies) {};
};
We don't need any change in the current instantiation (getPostRAMutationsImpl) of the mutators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that's true. But I have bad feelings about default parameters in constructors. It restricts future constructors which would be ambiguous.
| std::optional<int> MemLat = TII->getMemoryLatency( | ||
| SrcMI.getDesc().getSchedClass(), MI.getDesc().getSchedClass()); | ||
| if (!MemLat.has_value()) { | ||
| int Latency = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CHECK: if we don't need exact latencies, and we don't have them, we use a very optimistic value instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not very optimistic. For a RAW ST -> LD pair, the latency is actually 1, since the LOAD reads memory late.. It's also the default that was assumed before Gaetan made it defined by a target hook.
|
nit: check this commit message: |
| } | ||
|
|
||
| bool AIEWawRegRewriter::renameMBBPhysRegs(const MachineBasicBlock *MBB) { | ||
| // We do this mainly for the postpipeliner, and it will want to see ZOL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we could have some verifier problems by accepting ZOL loops before MachineBlockPlacement, right?
It is better to choose one direction here:
- Implement another hook, like
isPreLayoutZOLBody. - Extend isZOLBody and target machine verifier.
- Clearly state in the comment why we are not relying isZOLBody here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, sometimes we can see some unexpected gains related to PreSWP + WAWRewriter. Are we preventing gains in some pipelined decrement/branch loop here? Could we guard this behavior with a command line option, to be able to run additional tests in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. Under these assumptions, we should perhaps not constrain it to single-block loops at all, because the only advantage we can have from PreSWP loops is the scheduler, and for the scheduler anything might work.
I don't think isPreLayoutZOLBody is very clear concept, it is so tied to the flow.
If you have evidence of the combination of PreSWP, CountedLoops and WAWRegRewriter paying off, I would suggest to just remove the check. My main motivation was that I saw it kick in on loops with calls inside, which is concisely prevented by the ZOL check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or even AIEBaseInstrInfo::isZOLBody(const MachineBasicBlock &MBB, bool OnlyCheckLastInstr = true)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I found evidence in AvgPool. Removed the new check
|
|
||
| static cl::opt<bool> | ||
| LatencyAware("aie-realloc-latencyaware", cl::Hidden, cl::init(true), | ||
| static cl::opt<int> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am trying to figure out the rationale behind this change...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm overloading here, I agree. for tuning, I want to select false, true, or autoselect.
We pass some flags down which control the asserts that are meant to check completeness of the scheduler model for postscheduler's sake. That allows us to create a more approximate ddg in earlier stages. These would throw, mainly on occurrences of PseudoInstructions
Reorder the allocation order of the candidates based on an approximate pipeline schedule.
47d0673 to
61eed42
Compare
The enable flags are now integers, with 0 meaning false, 1 meaning true, anything else meaning auto-select based on LoopClass. This selection avoids running swpaware if latencyaware has run Add some tuning based on LoopClass
61eed42 to
664f9ef
Compare
This adds an alternative to the latency-aware heuristic in WAWRegisterRewriter and selects between the two based on LoopClass.