@andcarminati
Collaborator

@andcarminati andcarminati commented Oct 21, 2025

This work is intended to avoid 2D/3D register spills when possible.

The idea and rationale behind this work is in a previous Draft PR: #442.

To review, I recommend following this PR commit by commit.

Credit also goes to the co-author @krishnamtibrewala.

@andcarminati
Collaborator Author

andcarminati commented Oct 21, 2025

QoR results (charts not preserved): Core_Insn_Count, Core_StackSize_absolute, Core_PMSize_absolute

@krishnamtibrewala
Collaborator

Thank you very much, @andcarminati!

@krishnamtibrewala
Collaborator

Also, do you think the following commit will help?
a681b6e

@andcarminati
Collaborator Author

Also, do you think the following commit will help? a681b6e

Maybe yes! As mentioned before, I prefer to keep just the minimal necessary changes. We can test it afterwards, on top of this PR.

@andcarminati andcarminati force-pushed the andreu.extend.2d3d.allocation branch from 31b7e71 to a27561f Compare October 22, 2025 08:35
@andcarminati andcarminati force-pushed the andreu.extend.2d3d.allocation branch from a27561f to e124649 Compare October 31, 2025 14:07
@andcarminati andcarminati force-pushed the andreu.extend.2d3d.allocation branch 3 times, most recently from 80e6f7c to 4c47705 Compare October 31, 2025 14:36
MI.getMF()->getSubtarget().getInstrInfo());

// We should recognize both cases, with and without splitting. A 2D/3D
// instruction will always be split os splittable.
Collaborator

nit: as

Collaborator Author

Repaired as a fixup in the refactor commit.

MI.eraseFromParent();
// As we don't handle all registers now (selective LI filter),
// We should make sure that all LiveIntervals are correct.
// If we dont't repair, MI will compose the LIs of some registers,
Collaborator

nit: don't

Collaborator

What do you mean by "compose"? Will the dead MI block some LI?

Collaborator Author

@andcarminati andcarminati Nov 3, 2025

A dead MI will have an invalid SlotIndex that will appear in some instruction's LI, which is wrong. The original copy should not appear in any LI.
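
The point about stale slot indexes can be illustrated with a small, self-contained toy model. This is only a sketch: `ToyFunction`, `ToyInterval`, `eraseInstr`, and `repairIntervals` are hypothetical names invented for illustration and are not LLVM types or the helpers used in this PR. Intervals are modeled as plain sets of integer slots.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy stand-ins; none of these are LLVM types.
using Slot = int; // plays the role of a SlotIndex

struct ToyInterval {
  std::set<Slot> Slots; // slots at which the register is defined or used
};

struct ToyFunction {
  std::set<Slot> LiveInstrs;                    // slots of instructions that still exist
  std::map<std::string, ToyInterval> Intervals; // register name -> interval
};

// True if every slot mentioned by any interval belongs to an existing
// instruction.
bool intervalsAreValid(const ToyFunction &F) {
  for (const auto &Entry : F.Intervals)
    for (Slot S : Entry.second.Slots)
      if (!F.LiveInstrs.count(S))
        return false;
  return true;
}

// Erase the instruction at slot S and return the registers whose intervals
// still mention that slot (the "registers to repair").
std::vector<std::string> eraseInstr(ToyFunction &F, Slot S) {
  assert(F.LiveInstrs.count(S) && "instruction must exist");
  F.LiveInstrs.erase(S);
  std::vector<std::string> Stale;
  for (const auto &Entry : F.Intervals)
    if (Entry.second.Slots.count(S))
      Stale.push_back(Entry.first);
  return Stale;
}

// Repair step: drop the erased slot from the affected intervals. A real
// implementation would recompute the intervals instead.
void repairIntervals(ToyFunction &F, const std::vector<std::string> &Regs,
                     Slot S) {
  for (const std::string &Reg : Regs)
    F.Intervals[Reg].Slots.erase(S);
}
```

In this model, calling `eraseInstr` without `repairIntervals` leaves `intervalsAreValid` returning false, which mirrors the situation described in the reply: the erased copy would still show up in some live interval.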

const AIEBaseRegisterInfo &TRI,
std::set<Register> &VisitedVRegs);

SmallSet<int, 8> getRewritableSubRegs(Register Reg,
Collaborator

nit: can we have a comment describing what this function does?

Collaborator Author

Yes, it is a refactor but it is always good to document.

}
Register SrcReg = RegOp.getParent()->getOperand(1).getReg();
if (!VisitedVRegs.count(SrcReg) &&
getRewritableSubRegs(SrcReg, MRI, TRI, VisitedVRegs).empty()) {
Collaborator

Does it ever happen that SrcReg has no SubRegs but DstReg has them, or vice versa?
Can they also have different SubRegs?

Collaborator Author

This part was a refactor, but as we are handling a full copy here, we can expect subregs on both sides.

Collaborator Author

By a full copy, I mean a 2D-to-2D copy or a 3D-to-3D copy.

// We should make sure that all LiveIntervals are correct.
// If we dont't repair, MI will compose the LIs of some registers,
// what is not correct because MI was deleted.
updateLRMAndLIS(RegistersToRepair, VRM, LRM, LIS);
Collaborator

nit: could we get a better name that does not have abbreviations in it, e.g. updateLiveIntervals?
Also, since LIS is the most relevant data structure, put it first, followed by the helper structures LRM and VRM.

andcarminati and others added 2 commits November 7, 2025 01:23
Now we filter by register class and usage. Basically, we exclude here
instructions like copies and non-2D/3D ones.

Co-Authored-By: Krishnam Tibrewala <[email protected]>
andcarminati and others added 7 commits November 7, 2025 03:07
If we don't need a full register, we can expand to individual lanes.

Co-Authored-By: Krishnam Tibrewala <[email protected]>
This avoids cycles in bundles that appear in VirtRegRewriter.
We also update LIs related to src and dst operands of those
expanded copies.

Co-Authored-By: Krishnam Tibrewala <[email protected]>
The goal of this test is to check that we properly insert the undef flag on the def side
of an expanded full copy. On a sub-register def operand, the flag refers to the part of the
register that isn't written. A sub-register def implicitly reads the other parts of the
register being redefined unless the <undef> flag is set, and a missing flag can
force the related register into the live-out set of the predecessor blocks,
causing dominance problems.

Co-Authored-By: Krishnam Tibrewala <[email protected]>
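
The undef-flag rule described in the commit message above can be sketched with a toy model. This is an illustration under stated assumptions, not the PR's code: `ToyLaneCopy`, `expandFullCopy`, and `readsValueFromBefore` are hypothetical names, and the sketch assumes the flag is placed on the first lane copy only, which is the usual convention for a sequence of sub-register defs in LLVM MIR.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a sub-register COPY; not an LLVM type.
struct ToyLaneCopy {
  std::string Dst; // destination super-register
  std::string Src; // source super-register
  int SubRegIdx;   // lane being copied
  bool DefIsUndef; // models the <undef> flag on the sub-register def
};

// Expand a full Dst = COPY Src into one copy per lane.
// Assumption: only the first lane write carries the undef flag, because at
// that point no part of Dst holds a value that needs to be preserved. Later
// lane writes must keep the lanes written before them, so they stay
// read-modify-write defs.
std::vector<ToyLaneCopy> expandFullCopy(const std::string &Dst,
                                        const std::string &Src,
                                        const std::vector<int> &Lanes) {
  std::vector<ToyLaneCopy> Out;
  for (int Lane : Lanes) {
    bool IsFirst = Out.empty();
    Out.push_back({Dst, Src, Lane, IsFirst});
  }
  return Out;
}

// True if the expanded sequence implicitly reads a value of Dst that was
// defined before the sequence, which is what would drag Dst into the
// live-out set of the predecessors.
bool readsValueFromBefore(const std::vector<ToyLaneCopy> &Copies) {
  assert(!Copies.empty() && "expected at least one lane copy");
  return !Copies.front().DefIsUndef;
}
```

With the flag on the first copy, `readsValueFromBefore` returns false for the expanded sequence. Clearing that flag in the model makes it return true, which corresponds to the liveness problem the test guards against.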
@andcarminati andcarminati force-pushed the andreu.extend.2d3d.allocation branch from 4c47705 to a88a541 Compare November 7, 2025 12:36
This will properly handle uses of non-dominating definitions. We also
change the handling of the destination registers in two parts:

*Copy expansion: we replace the original index with the index of the first
lane copy to avoid creating LRs with just one instruction; this
way we keep the LI correct.

*Rewrite: reset dead flags if necessary.

In the test, MachineVerifier is still off, because some lane redefinitions are
considered full register redefinitions, causing some inaccurate
expectations around dead flags. The affected test is an extreme corner case
where individual lanes are handled apart from the original 2D register.

Co-Authored-By: Krishnam Tibrewala <[email protected]>
@andcarminati andcarminati force-pushed the andreu.extend.2d3d.allocation branch from a88a541 to ab8c08d Compare November 7, 2025 13:02

MachineInstr *FirstMI = nullptr;
SmallSet<Register, 8> RegistersToRepair;
bool IsFirstCopy = true;
Collaborator Author

nit: We can use FirstMI as a replacement for IsFirstCopy.
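
A minimal sketch of that nit, assuming the flag is only ever used to detect the first emitted copy. `ToyInstr` and `emitLaneCopies` are hypothetical names for illustration and are not part of the PR.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for MachineInstr; not an LLVM type.
struct ToyInstr {
  int Id;
};

// Walks the emitted copies and runs the "first copy only" action once.
// A null FirstMI carries the same information as IsFirstCopy == true, so the
// separate boolean is not needed.
ToyInstr *emitLaneCopies(std::vector<ToyInstr> &Copies, int &FirstOnlyActions) {
  ToyInstr *FirstMI = nullptr;
  for (ToyInstr &MI : Copies) {
    if (!FirstMI) { // replaces `if (IsFirstCopy)`
      FirstMI = &MI;
      ++FirstOnlyActions;
    }
  }
  return FirstMI;
}
```

Dropping the boolean also removes the risk of the two variables getting out of sync.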

@@ -0,0 +1,172 @@
//===----- AIEUnallocatedSuperRegRewriter.cpp - Constrain tied sub-registers
Collaborator Author

Fix this.
