Skip to content

Treat ';' and '\n' as assembly instruction separators in collectAsmInstrs #149365

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

rlavaee
Copy link
Contributor

@rlavaee rlavaee commented Jul 17, 2025

This also fixes the incorrect treatment of '\n\t' as a separator for asm instructions.

The new implementation also trims comments and whitespaces from the parsed assembly instructions.

@llvmbot
Copy link
Member

llvmbot commented Jul 17, 2025

@llvm/pr-subscribers-llvm-analysis
@llvm/pr-subscribers-llvm-selectiondag
@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-llvm-transforms

Author: Rahman Lavaee (rlavaee)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/149365.diff

3 Files Affected:

  • (modified) llvm/lib/IR/InlineAsm.cpp (+2-2)
  • (modified) llvm/test/CodeGen/X86/inline-asm-function-call-pic.ll (+1-1)
  • (modified) llvm/test/Transforms/Inline/inline-call-with-asm-call.ll (+1-1)
diff --git a/llvm/lib/IR/InlineAsm.cpp b/llvm/lib/IR/InlineAsm.cpp
index 922081468a775..baa814d06e28a 100644
--- a/llvm/lib/IR/InlineAsm.cpp
+++ b/llvm/lib/IR/InlineAsm.cpp
@@ -63,14 +63,14 @@ FunctionType *InlineAsm::getFunctionType() const {
 void InlineAsm::collectAsmStrs(SmallVectorImpl<StringRef> &AsmStrs) const {
   StringRef AsmStr(AsmString);
   AsmStrs.clear();
-
   // TODO: 1) Unify delimiter for inline asm, we also meet other delimiters
   // for example "\0A", ";".
   // 2) Enhance StringRef. Some of the special delimiter ("\0") can't be
   // split in StringRef. Also empty StringRef can not call split (will stuck).
   if (AsmStr.empty())
     return;
-  AsmStr.split(AsmStrs, "\n\t", -1, false);
+  AsmStr.split(AsmStrs, "\n", -1, false);
+  errs() << "Num Asm Strs: " << AsmStrs.size() << "\n";
 }
 
 /// Parse - Analyze the specified string (e.g. "==&{eax}") and fill in the
diff --git a/llvm/test/CodeGen/X86/inline-asm-function-call-pic.ll b/llvm/test/CodeGen/X86/inline-asm-function-call-pic.ll
index d3ca872509ad5..f5bebfb058db6 100644
--- a/llvm/test/CodeGen/X86/inline-asm-function-call-pic.ll
+++ b/llvm/test/CodeGen/X86/inline-asm-function-call-pic.ll
@@ -59,7 +59,7 @@ define void @func() local_unnamed_addr #0 {
 entry:
   %call = tail call i32 @static_func()
 ;; We test call, CALL, and jmp.
-  tail call void asm sideeffect inteldialect "call ${0:P}\0A\09CALL ${1:P}\0A\09jmp ${1:P}\0A\09shr eax, $$0\0A\09shr ebx, $$0\0A\09shr ecx, $$0\0A\09shr edx, $$0\0A\09shr edi, $$0\0A\09shr esi, $$0\0A\09shr ebp, $$0\0A\09shr esp, $$0", "*m,*m,~{eax},~{ebp},~{ebx},~{ecx},~{edi},~{edx},~{flags},~{esi},~{esp},~{dirflag},~{fpsr},~{flags}"(ptr nonnull elementtype(i32 (...)) @static_func, ptr nonnull elementtype(i32 (...)) @extern_func) #0
+  tail call void asm sideeffect inteldialect "call ${0:P}\0ACALL ${1:P}\0Ajmp ${1:P}\0Ashr eax, $$0\0Ashr ebx, $$0\0Ashr ecx, $$0\0Ashr edx, $$0\0Ashr edi, $$0\0Ashr esi, $$0\0Ashr ebp, $$0\0Ashr esp, $$0", "*m,*m,~{eax},~{ebp},~{ebx},~{ecx},~{edi},~{edx},~{flags},~{esi},~{esp},~{dirflag},~{fpsr},~{flags}"(ptr nonnull elementtype(i32 (...)) @static_func, ptr nonnull elementtype(i32 (...)) @extern_func) #0
   ret void
 }
 
diff --git a/llvm/test/Transforms/Inline/inline-call-with-asm-call.ll b/llvm/test/Transforms/Inline/inline-call-with-asm-call.ll
index 7d8121d04996e..a536eca072d42 100644
--- a/llvm/test/Transforms/Inline/inline-call-with-asm-call.ll
+++ b/llvm/test/Transforms/Inline/inline-call-with-asm-call.ll
@@ -27,7 +27,7 @@ define void @caller(i32 %a, i1 %b) #0 {
 ;; destination section and two assembly instructions in the outlined "other"
 ;; section.
 define void @callee(i32 %a, i1 %b) {
-  call void asm sideeffect "s_nop 1\0A\09.pushsection other\0A\09s_nop 2\0A\09s_nop 3\0A\09.popsection\0A\09s_nop 4\0A\09.align 32", ""()
+  call void asm sideeffect "s_nop 1\0A.pushsection other\0A\09s_nop 2\0A\09s_nop 3\0A\09.popsection\0A\09s_nop 4\0A.align 32", ""()
   ret void
 }
 ; CHECK: define void @callee

@llvmbot llvmbot added llvm:SelectionDAG SelectionDAGISel as well llvm:analysis Includes value tracking, cost tables and constant folding labels Jul 17, 2025
@rlavaee rlavaee changed the title Treat "\n" as a separator for asm instructions, instead of "\n\t". Treat ';' and '\n' as assembly instruction separators in collectAsmInstrs Jul 17, 2025
@rlavaee rlavaee requested review from kazutakahirata, mingmingl-llvm and xiangzh1 and removed request for kazutakahirata July 17, 2025 20:18
Copy link
Contributor

@kazutakahirata kazutakahirata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo some minor comments. Thanks!

}
};

SmallVector<StringRef, 4> AsmInstrs;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm inclined to drop , 4 here to ensure return value optimization even when we change how we compute the default number of inline elements in SmallVector.h. Right now, that happens to be 4, so this type happens to be the same as the return type.

Suggested change
SmallVector<StringRef, 4> AsmInstrs;
SmallVector<StringRef> AsmInstrs;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

for (StringRef &AsmLine : AsmLines) {
// First remove the comments. Note it's important to do this before breaking
// by ';' since the comment portion may include that character too.
TrimComments(AsmLine);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I suggest "inlining" TrimComments while using StringRef::split? It's only a couple of lines of code if we use StringRef::split. StringRef::split internally computes second, but that should be optimized away.

Suggested change
TrimComments(AsmLine);
AsmLine = AsmLine.split('#').first;
AsmLine = AsmLine.split("//").first;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great suggestion. I used AsmLine = AsmLine.split('#').first.split("//").first

Comment on lines 92 to 94
if (Trimmed.empty())
continue;
AsmInstrs.push_back(Trimmed);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: May I suggest something like this?

Suggested change
if (Trimmed.empty())
continue;
AsmInstrs.push_back(Trimmed);
if (!Trimmed.empty())
AsmInstrs.push_back(Trimmed);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

AsmStr.split(AsmStrs, "\n\t", -1, false);
// First break the assembly string into lines.
SmallVector<StringRef, 4> AsmLines;
AsmStr.split(AsmLines, '\n', -1, false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: comment the argument name, something like AsmStr.split(AsmLines, '\n', /* MaxSplit= */ -1, /* KeepEmpty= */ false)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, changed to AsmStr.split(AsmLines, '\n') so we can use the default argument values.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sg.

@@ -7,10 +7,9 @@ define ptr @foo(ptr %Ptr) {
%Ptr.addr = alloca ptr, align 8
store ptr %Ptr, ptr %Ptr.addr, align 8
; CHECK: movq %rdi, -8(%rsp)
%1 = tail call ptr asm "mov $1, $0\0A\09lea $2, $0", "=r,p,*m,~{dirflag},~{fpsr},~{flags}"(ptr %Ptr, ptr elementtype(ptr) %Ptr.addr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will \0A\09 get handled by the updated implementation of collectAsmInstrs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\0A is the newline character and \09 is tab. So the tab will be trimmed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The \0A\09 is in the original test case so it'd be good to keep them for test coverage.

To test new cases (; or # some comment in llvm/test/Transforms/Inline/inline-call-with-asm-call.ll), what do you think of adding new functions instead of changing existing ones in place?

@@ -87,7 +87,12 @@ class InlineAsm final : public Value {

StringRef getAsmString() const { return AsmString; }
StringRef getConstraintString() const { return Constraints; }
LLVM_ABI void collectAsmStrs(SmallVectorImpl<StringRef> &AsmStrs) const;

/// collectAsmInstrs - Parses the assembly instruction and collects individual
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// collectAsmInstrs - Parses the assembly instruction and collects individual
/// Parses the assembly instruction and collects individual

collectAsmInstrs

This also fixes the incorrect treatment of '\n\t' as a separator for asm
instructions.
Copy link
Member

@MaskRay MaskRay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would break Mach-O, which uses ; as a comment marker.

See CommentString = ";"; in MCAsmInfo.cpp files. Unfortunately, IR cannot depend on MC..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:X86 llvm:analysis Includes value tracking, cost tables and constant folding llvm:ir llvm:SelectionDAG SelectionDAGISel as well llvm:transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants