
Conversation

@martien-de-jong (Collaborator) commented Mar 31, 2025:

The generic G_MEMSET legalizer helper tweaks the alignment of stack objects to make them amenable to vector implementations. However, the vector store it creates does not carry that alignment info in its MMO, so our legalization scalarizes it. After that scalarization the result is no better than the original code using 32-bit stores.

I have disabled this cleverness, which gives good results, including in stack size, on the reduced example that I have added as a regression test.
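
The problem shape is roughly the following (a hypothetical reduction; the actual regression test is in the PR diff, not quoted here):

  // Hypothetical reduction. The generic lowering picks a vector store
  // for the memset and bumps the alignment of 'buf' in MachineFrameInfo,
  // but the derived MMO of the vector store keeps the original alignment,
  // so our legalizer scalarizes the store again.
  extern void use(char *);

  void clear_buf() {
    char buf[32];
    __builtin_memset(buf, 0, sizeof(buf));
    use(buf);
  }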

DRAFT DRAFT DRAFT
This would be a draft PR, but I want to see whether it passes standard CI testing.

@martien-de-jong changed the title from "Avoid broadcat/extract when implementing memset" to "Avoid broadcast/extract when implementing memset" on Mar 31, 2025.
@andcarminati (Collaborator) commented:
I think we should extend the target hook to have something like:

 LLT AIEBaseTargetLowering::getOptimalMemOpLLT(
     const MemOp &Op, const AttributeList &FuncAttributes) const {
 
+  bool AllowVectors = AllowVecRegMemOps && Op.isFixedDstAlign();
   if (Subtarget.isAIE2P()) {
-    if (AllowVecRegMemOps && Op.size() >= 64 && Op.isAligned(Align(64)))
+    if (AllowVectors && Op.size() >= 64 && Op.isAligned(Align(64)))
       return LLT::fixed_vector(16, 32);
   }
+  
   if (Subtarget.isAIE2() || Subtarget.isAIE2P()) {
-    if (AllowVecRegMemOps && Op.size() >= 32 && Op.isAligned(Align(32)))
+    if (AllowVectors && Op.size() >= 32 && Op.isAligned(Align(32)))
       return LLT::fixed_vector(8, 32);
-    if (AllowVecRegMemOps && Op.size() >= 16 && Op.isAligned(Align(16)))
+    if (AllowVectors && Op.size() >= 16 && Op.isAligned(Align(16)))
       return LLT::fixed_vector(4, 32);
     if (Op.size() >= 4 && Op.isAligned(Align(4)))
       return LLT::scalar(32);

It is risky to remove all that code.
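
For reference, the hook with that guard applied would read roughly as follows (a sketch: the quoted diff cuts off before the end of the function, so the final fallback here is an assumption, and AllowVecRegMemOps is the existing flag from the removed lines):

  LLT AIEBaseTargetLowering::getOptimalMemOpLLT(
      const MemOp &Op, const AttributeList &FuncAttributes) const {
    // Only offer vector types when the destination alignment is fixed,
    // i.e. when it is safe for the lowering to rely on it.
    bool AllowVectors = AllowVecRegMemOps && Op.isFixedDstAlign();
    if (Subtarget.isAIE2P()) {
      if (AllowVectors && Op.size() >= 64 && Op.isAligned(Align(64)))
        return LLT::fixed_vector(16, 32);
    }
    if (Subtarget.isAIE2() || Subtarget.isAIE2P()) {
      if (AllowVectors && Op.size() >= 32 && Op.isAligned(Align(32)))
        return LLT::fixed_vector(8, 32);
      if (AllowVectors && Op.size() >= 16 && Op.isAligned(Align(16)))
        return LLT::fixed_vector(4, 32);
      if (Op.size() >= 4 && Op.isAligned(Align(4)))
        return LLT::scalar(32);
    }
    return LLT::scalar(8); // assumed fallback, not part of the quoted diff
  }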

@andcarminati (Collaborator) commented:

Discussed offline: we have an alignment here, we are just spotting the wrong one in the legalizer:

    if (MFI.getObjectAlign(FI) < Alignment)
      MFI.setObjectAlignment(FI, Alignment);
  }
}
@martien-de-jong (Collaborator, Author) commented Apr 3, 2025:

Yes, it is a blunt fix: we cut away an optimization, and PR testing doesn't exercise it.
However, this is not the way. Instead, I propose we change the lowering code so that the appropriate alignment ends up in the MMOs of the store instructions that are generated. We need a less lazy constructor for that derived MMO. I think that should be perfectly acceptable upstream.
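
A minimal sketch of what that could look like at the point where the lowering emits each partial store (names like BaseMMO, NewAlign, DstOff, and StoreTy are illustrative, not the actual patch):

  // Illustrative only: construct the derived MMO with the alignment that
  // was just established for the stack object, rather than inheriting the
  // stale alignment recorded in the base MMO.
  MachineMemOperand *StoreMMO = MF.getMachineMemOperand(
      BaseMMO->getPointerInfo().getWithOffset(DstOff), // pointer info at offset
      BaseMMO->getFlags(),                             // keep the store flags
      StoreTy,                                         // memory type of this store
      commonAlignment(NewAlign, DstOff));              // alignment at this offset
  MIRBuilder.buildStore(Val, StorePtr, *StoreMMO);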

mgehre-amd pushed a commit that referenced this pull request on Aug 21, 2025: [AutoBump] Merge with fixes of d5f0969 (Sep 11) (2)
@martien-de-jong (Collaborator, Author) commented:
This PR has been overtaken by other fixes and improvements.
