Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h
Original file line number Diff line number Diff line change
Expand Up @@ -489,7 +489,8 @@ class LegalizerHelper {
LLVM_ABI LegalizeResult lowerVectorReduction(MachineInstr &MI);
LLVM_ABI LegalizeResult lowerMemcpyInline(MachineInstr &MI);
LLVM_ABI LegalizeResult lowerMemCpyFamily(MachineInstr &MI,
unsigned MaxLen = 0);
unsigned MaxLen = 0,
bool SkipVolatile = false);
LLVM_ABI LegalizeResult lowerVAArg(MachineInstr &MI);
};

Expand Down
2 changes: 1 addition & 1 deletion llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1704,7 +1704,7 @@ bool CombinerHelper::tryCombineMemCpyFamily(MachineInstr &MI,
MachineIRBuilder HelperBuilder(MI);
GISelObserverWrapper DummyObserver;
LegalizerHelper Helper(HelperBuilder.getMF(), DummyObserver, HelperBuilder);
return Helper.lowerMemCpyFamily(MI, MaxLen) ==
return Helper.lowerMemCpyFamily(MI, MaxLen, /*SkipVolatile=*/true) ==
LegalizerHelper::LegalizeResult::Legalized;
}

Expand Down
7 changes: 4 additions & 3 deletions llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -10066,7 +10066,8 @@ LegalizerHelper::lowerMemmove(MachineInstr &MI, Register Dst, Register Src,
}

LegalizerHelper::LegalizeResult
LegalizerHelper::lowerMemCpyFamily(MachineInstr &MI, unsigned MaxLen) {
LegalizerHelper::lowerMemCpyFamily(MachineInstr &MI, unsigned MaxLen,
bool SkipVolatile) {
const unsigned Opc = MI.getOpcode();
// This combine is fairly complex so it's not written with a separate
// matcher function.
Expand Down Expand Up @@ -10099,8 +10100,8 @@ LegalizerHelper::lowerMemCpyFamily(MachineInstr &MI, unsigned MaxLen) {
}

bool IsVolatile = MemOp->isVolatile();
// Don't try to optimize volatile.
if (IsVolatile)
// Don't try to optimize volatile when not allowed.
if (SkipVolatile && IsVolatile)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not an optimization and this should not be skipped, nor be an option. I would just delete the check altogether

Copy link
Contributor Author

@petechou petechou Jun 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ISTM, when the code was first introduced in 13af1ed, it's part of combiner code and used as an optimization for AArch64 to inline memcpy sequence instead of creating libcall. In a later refactoring change 36527cb, the code was moved to legalizer to allow AMDGPU legalizing memcpy family of intrinsics by lowering them. However, the code is still used by combiner, and the logic to skip volatile was kept for combiner users. So, that's why I think it's used by some targets as an optimization.

The change tries to make the least update to support both legalizer and combiner users. Another alternative could be deleting the check as you suggested, and do the check in combiner before calling legalizer lowering function instead. That might add some duplicate code like checking zero length and volatile though. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, in 36527cb the MaxLen argument of the added LegalizerHelper lowerMemCpyFamily function could be a defualt argument.

LegalizeResult lowerMemCpyFamily(MachineInstr &MI, unsigned MaxLen = 0);

It's probably for the same reason to support both legalizer and combiner users, I guess.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arsenm Could you please provide your recommendations? I would appreciate your input. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arsenm Ping. Thanks.

@mbrkusanin This pr extends 36527cb to support legalizing volatile memcpy family. Appreciate it if you could also provide comments here. Thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arsenm, ping^3.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should always be lowering volatile memcpy operations if possible so the check can be deleted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aemerson @arsenm
Thanks for your comment. Changed to delete the check with an aarch64 lit update. The lit update seems ok as GISEL produces similar sequence as SDAG. Please help review. Thanks.

return UnableToLegalize;

if (MaxLen && KnownLen > MaxLen)
Expand Down
30 changes: 30 additions & 0 deletions llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-memcpy.mir
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For AMDGPU, SelectionDAG can handle volatile memcpy but GlobalISEL would fail to legalize it, so it seems to me GlobalISEL should allow lowering volatile memcpy family.

Original file line number Diff line number Diff line change
Expand Up @@ -31,3 +31,33 @@ body: |
S_ENDPGM 0

...
---
name: memcpy_test_volatile
body: |
bb.0:
liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3

; CHECK-LABEL: name: memcpy_test_volatile
; CHECK: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
; CHECK-NEXT: [[MV:%[0-9]+]]:_(p0) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32)
; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr2
; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s32) = COPY $vgpr3
; CHECK-NEXT: [[MV1:%[0-9]+]]:_(p0) = G_MERGE_VALUES [[COPY2]](s32), [[COPY3]](s32)
; CHECK-NEXT: [[LOAD:%[0-9]+]]:_(s32) = G_LOAD [[MV1]](p0) :: (volatile load (s8))
; CHECK-NEXT: G_STORE [[LOAD]](s32), [[MV]](p0) :: (volatile store (s8))
; CHECK-NEXT: S_ENDPGM 0
%0:_(s32) = COPY $vgpr0
%1:_(s32) = COPY $vgpr1
%2:_(p0) = G_MERGE_VALUES %0:_(s32), %1:_(s32)
%3:_(s32) = COPY $vgpr2
%4:_(s32) = COPY $vgpr3
%5:_(p0) = G_MERGE_VALUES %3:_(s32), %4:_(s32)
%6:_(s32) = G_CONSTANT i32 1
%7:_(s64) = G_ZEXT %6:_(s32)
G_MEMCPY %2:_(p0), %5:_(p0), %7:_(s64), 0 :: (volatile store (s8)), (volatile load (s8))
S_ENDPGM 0

...
30 changes: 30 additions & 0 deletions llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-memcpyinline.mir
Original file line number Diff line number Diff line change
Expand Up @@ -31,3 +31,33 @@ body: |
S_ENDPGM 0

...
---
name: memcpyinline_test_volatile
body: |
bb.0:
liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3

; CHECK-LABEL: name: memcpyinline_test_volatile
; CHECK: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
; CHECK-NEXT: [[MV:%[0-9]+]]:_(p0) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32)
; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr2
; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s32) = COPY $vgpr3
; CHECK-NEXT: [[MV1:%[0-9]+]]:_(p0) = G_MERGE_VALUES [[COPY2]](s32), [[COPY3]](s32)
; CHECK-NEXT: [[LOAD:%[0-9]+]]:_(s32) = G_LOAD [[MV1]](p0) :: (volatile load (s8))
; CHECK-NEXT: G_STORE [[LOAD]](s32), [[MV]](p0) :: (volatile store (s8))
; CHECK-NEXT: S_ENDPGM 0
%0:_(s32) = COPY $vgpr0
%1:_(s32) = COPY $vgpr1
%2:_(p0) = G_MERGE_VALUES %0:_(s32), %1:_(s32)
%3:_(s32) = COPY $vgpr2
%4:_(s32) = COPY $vgpr3
%5:_(p0) = G_MERGE_VALUES %3:_(s32), %4:_(s32)
%6:_(s32) = G_CONSTANT i32 1
%7:_(s64) = G_ZEXT %6:_(s32)
G_MEMCPY_INLINE %2:_(p0), %5:_(p0), %7:_(s64) :: (volatile store (s8)), (volatile load (s8))
S_ENDPGM 0

...
30 changes: 30 additions & 0 deletions llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-memmove.mir
Original file line number Diff line number Diff line change
Expand Up @@ -31,3 +31,33 @@ body: |
S_ENDPGM 0

...
---
name: memmove_test_volatile
body: |
bb.0:
liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3

; CHECK-LABEL: name: memmove_test_volatile
; CHECK: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
; CHECK-NEXT: [[MV:%[0-9]+]]:_(p0) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32)
; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr2
; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s32) = COPY $vgpr3
; CHECK-NEXT: [[MV1:%[0-9]+]]:_(p0) = G_MERGE_VALUES [[COPY2]](s32), [[COPY3]](s32)
; CHECK-NEXT: [[LOAD:%[0-9]+]]:_(s32) = G_LOAD [[MV1]](p0) :: (volatile load (s8))
; CHECK-NEXT: G_STORE [[LOAD]](s32), [[MV]](p0) :: (volatile store (s8))
; CHECK-NEXT: S_ENDPGM 0
%0:_(s32) = COPY $vgpr0
%1:_(s32) = COPY $vgpr1
%2:_(p0) = G_MERGE_VALUES %0:_(s32), %1:_(s32)
%3:_(s32) = COPY $vgpr2
%4:_(s32) = COPY $vgpr3
%5:_(p0) = G_MERGE_VALUES %3:_(s32), %4:_(s32)
%6:_(s32) = G_CONSTANT i32 1
%7:_(s64) = G_ZEXT %6:_(s32)
G_MEMMOVE %2:_(p0), %5:_(p0), %7:_(s64), 0 :: (volatile store (s8)), (volatile load (s8))
S_ENDPGM 0

...
29 changes: 29 additions & 0 deletions llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-memset.mir
Original file line number Diff line number Diff line change
Expand Up @@ -30,3 +30,32 @@ body: |
S_ENDPGM 0

...
---
name: memset_test_volatile
body: |
bb.0:
liveins: $vgpr0, $vgpr1, $vgpr2

; CHECK-LABEL: name: memset_test_volatile
; CHECK: liveins: $vgpr0, $vgpr1, $vgpr2
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
; CHECK-NEXT: [[MV:%[0-9]+]]:_(p0) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32)
; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr2
; CHECK-NEXT: [[TRUNC:%[0-9]+]]:_(s8) = G_TRUNC [[COPY2]](s32)
; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s8) = COPY [[TRUNC]](s8)
; CHECK-NEXT: G_STORE [[COPY2]](s32), [[MV]](p0) :: (volatile store (s8))
; CHECK-NEXT: S_ENDPGM 0
%0:_(s32) = COPY $vgpr0
%1:_(s32) = COPY $vgpr1
%2:_(p0) = G_MERGE_VALUES %0:_(s32), %1:_(s32)
%3:_(s32) = COPY $vgpr2
%4:_(s16) = G_TRUNC %3:_(s32)
%5:_(s8) = G_TRUNC %4:_(s16)
%6:_(s32) = G_CONSTANT i32 1
%7:_(s64) = G_ZEXT %6:_(s32)
G_MEMSET %2:_(p0), %5:_(s8), %7:_(s64), 0 :: (volatile store (s8))
S_ENDPGM 0

...