
Conversation

@martien-de-jong (Collaborator) commented Mar 31, 2025:

The generic G_MEMSET legalizer helper tweaks the alignment of stack objects to make them amenable to vector implementations. However, the vector store it creates does not carry that alignment info in its MMO, so our legalization scalarizes it. After that scalarization the result is no better than the original code using 32-bit stores.

I have disabled this cleverness, which gives good results, including in stack size, on the reduced example that I have added as a regression test.
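
The problem shape is roughly the following (a hypothetical reduction; the actual regression test is in the PR diff, not quoted here):

  // Hypothetical reduction. The generic lowering picks a vector store
  // for the memset and bumps the alignment of 'buf' in MachineFrameInfo,
  // but the derived MMO of the vector store keeps the original alignment,
  // so our legalizer scalarizes the store again.
  extern void use(char *);

  void clear_buf() {
    char buf[32];
    __builtin_memset(buf, 0, sizeof(buf));
    use(buf);
  }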

DRAFT DRAFT DRAFT
This would be a draft PR, but I want to see whether it passes standard CI testing.

@martien-de-jong changed the title from "Avoid broadcat/extract when implementing memset" to "Avoid broadcast/extract when implementing memset" on Mar 31, 2025.
@andcarminati (Collaborator) commented:
I think we should extend the target hook to have something like:

 LLT AIEBaseTargetLowering::getOptimalMemOpLLT(
     const MemOp &Op, const AttributeList &FuncAttributes) const {
 
+  bool AllowVectors = AllowVecRegMemOps && Op.isFixedDstAlign();
   if (Subtarget.isAIE2P()) {
-    if (AllowVecRegMemOps && Op.size() >= 64 && Op.isAligned(Align(64)))
+    if (AllowVectors && Op.size() >= 64 && Op.isAligned(Align(64)))
       return LLT::fixed_vector(16, 32);
   }
+  
   if (Subtarget.isAIE2() || Subtarget.isAIE2P()) {
-    if (AllowVecRegMemOps && Op.size() >= 32 && Op.isAligned(Align(32)))
+    if (AllowVectors && Op.size() >= 32 && Op.isAligned(Align(32)))
       return LLT::fixed_vector(8, 32);
-    if (AllowVecRegMemOps && Op.size() >= 16 && Op.isAligned(Align(16)))
+    if (AllowVectors && Op.size() >= 16 && Op.isAligned(Align(16)))
       return LLT::fixed_vector(4, 32);
     if (Op.size() >= 4 && Op.isAligned(Align(4)))
       return LLT::scalar(32);

It is risky to remove all that code.
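
For reference, the hook with that guard applied would read roughly as follows (a sketch: the quoted diff cuts off before the end of the function, so the final fallback here is an assumption, and AllowVecRegMemOps is the existing flag from the removed lines):

  LLT AIEBaseTargetLowering::getOptimalMemOpLLT(
      const MemOp &Op, const AttributeList &FuncAttributes) const {
    // Only offer vector types when the destination alignment is fixed,
    // i.e. when it is safe for the lowering to rely on it.
    bool AllowVectors = AllowVecRegMemOps && Op.isFixedDstAlign();
    if (Subtarget.isAIE2P()) {
      if (AllowVectors && Op.size() >= 64 && Op.isAligned(Align(64)))
        return LLT::fixed_vector(16, 32);
    }
    if (Subtarget.isAIE2() || Subtarget.isAIE2P()) {
      if (AllowVectors && Op.size() >= 32 && Op.isAligned(Align(32)))
        return LLT::fixed_vector(8, 32);
      if (AllowVectors && Op.size() >= 16 && Op.isAligned(Align(16)))
        return LLT::fixed_vector(4, 32);
      if (Op.size() >= 4 && Op.isAligned(Align(4)))
        return LLT::scalar(32);
    }
    return LLT::scalar(8); // assumed fallback, not part of the quoted diff
  }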

@andcarminati (Collaborator) commented:

Discussed offline: we have an alignment here, we are just spotting the wrong one in the legalizer:

    if (MFI.getObjectAlign(FI) < Alignment)
      MFI.setObjectAlignment(FI, Alignment);
  }
}
@martien-de-jong (Collaborator, Author) commented Apr 3, 2025:

Yes, it is a blunt fix: we cut away an optimization, and PR testing doesn't exercise it.
However, this is not the way. Instead, I propose we change the lowering code so that the appropriate alignment ends up in the MMOs of the store instructions that are generated. We need a less lazy constructor for that derived MMO. I think that should be perfectly acceptable upstream.
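
A minimal sketch of what that could look like at the point where the lowering emits each partial store (names like BaseMMO, NewAlign, DstOff, and StoreTy are illustrative, not the actual patch):

  // Illustrative only: construct the derived MMO with the alignment that
  // was just established for the stack object, rather than inheriting the
  // stale alignment recorded in the base MMO.
  MachineMemOperand *StoreMMO = MF.getMachineMemOperand(
      BaseMMO->getPointerInfo().getWithOffset(DstOff), // pointer info at offset
      BaseMMO->getFlags(),                             // keep the store flags
      StoreTy,                                         // memory type of this store
      commonAlignment(NewAlign, DstOff));              // alignment at this offset
  MIRBuilder.buildStore(Val, StorePtr, *StoreMMO);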

mgehre-amd pushed a commit that referenced this pull request on Aug 21, 2025: [AutoBump] Merge with fixes of d5f0969 (Sep 11) (2)
@martien-de-jong (Collaborator, Author) commented:
This PR has been overtaken by other fixes and improvements.
