Skip to content

Conversation

slinder1
Copy link
Contributor

Finish making LLVM's implementation of DW_LNCT_LLVM_source conform to the final accepted version of DW_LNCT_source from
https://dwarfstd.org/issues/180201.1.html

This is effectively a continuation of a few commits which have moved in this direction already, including:

  • c9cb4fc [DebugInfo] Store optional DIFile::Source as pointer
  • 87e22bd Allow for mixing source/no-source DIFiles in one CU

This patch:

  • Teaches LLParser that there is a distinction between an empty and an absent "source:" field on DIFile.
  • Makes printing the "source:" field in AsmWriter conditional on it being present, instead of being conditional on it being non-empty.
  • Teaches MC to map an empty-but-present source field to "\n" (which is ambiguous, making the source strings "" and "\n" indistinguishable, but that's what the DWARF issue specifies).

Add a test for round-tripping an empty source field through assembler/bitcode.

Extend the test for the actual DWARF generation so it covers all of the cases (absent, present-but-empty, present-and-ambiguously-single-newline, present).

Finish making LLVM's implementation of `DW_LNCT_LLVM_source` conform to the
final accepted version of `DW_LNCT_source` from
https://dwarfstd.org/issues/180201.1.html

This is effectively a continuation of a few commits which have moved in this
direction already, including:

c9cb4fc [DebugInfo] Store optional DIFile::Source as pointer
87e22bd Allow for mixing source/no-source DIFiles in one CU

This patch:

* Teaches LLParser that there is a distinction between an empty and an absent
  "source:" field on DIFile.
* Makes printing the "source:" field in AsmWriter conditional on it
  being present, instead of being conditional on it being non-empty.
* Teaches MC to map an empty-but-present source field to "\n" (which is
  ambiguous, making the source strings "" and "\n" indistinguishable,
  but that's what the DWARF issue specifies).

Add a test for round-tripping an empty source field through assembler/bitcode.

Extend the test for the actual DWARF generation so it covers all of the cases
(absent, present-but-empty, present-and-ambiguously-single-newline, present).
@llvmbot
Copy link
Member

llvmbot commented Jul 16, 2025

@llvm/pr-subscribers-debuginfo

@llvm/pr-subscribers-llvm-ir

Author: Scott Linder (slinder1)

Changes

Finish making LLVM's implementation of DW_LNCT_LLVM_source conform to the final accepted version of DW_LNCT_source from
https://dwarfstd.org/issues/180201.1.html

This is effectively a continuation of a few commits which have moved in this direction already, including:

  • c9cb4fc [DebugInfo] Store optional DIFile::Source as pointer
  • 87e22bd Allow for mixing source/no-source DIFiles in one CU

This patch:

  • Teaches LLParser that there is a distinction between an empty and an absent "source:" field on DIFile.
  • Makes printing the "source:" field in AsmWriter conditional on it being present, instead of being conditional on it being non-empty.
  • Teaches MC to map an empty-but-present source field to "\n" (which is ambiguous, making the source strings "" and "\n" indistinguishable, but that's what the DWARF issue specifies).

Add a test for round-tripping an empty source field through assembler/bitcode.

Extend the test for the actual DWARF generation so it covers all of the cases (absent, present-but-empty, present-and-ambiguously-single-newline, present).


Full diff: https://github.com/llvm/llvm-project/pull/149226.diff

5 Files Affected:

  • (modified) llvm/lib/AsmParser/LLParser.cpp (+21-8)
  • (modified) llvm/lib/IR/AsmWriter.cpp (+3-2)
  • (modified) llvm/lib/MC/MCDwarf.cpp (+9-2)
  • (added) llvm/test/Assembler/difile-empty-source.ll (+12)
  • (modified) llvm/test/DebugInfo/Generic/mixed-source.ll (+44-14)
diff --git a/llvm/lib/AsmParser/LLParser.cpp b/llvm/lib/AsmParser/LLParser.cpp
index b7f6950f679ef..ebdac354179f2 100644
--- a/llvm/lib/AsmParser/LLParser.cpp
+++ b/llvm/lib/AsmParser/LLParser.cpp
@@ -4783,9 +4783,13 @@ struct MDField : public MDFieldImpl<Metadata *> {
 };
 
 struct MDStringField : public MDFieldImpl<MDString *> {
-  bool AllowEmpty;
-  MDStringField(bool AllowEmpty = true)
-      : ImplTy(nullptr), AllowEmpty(AllowEmpty) {}
+  enum class EmptyIs {
+    Null,  //< Allow empty input string, map to nullptr
+    Empty, //< Allow empty input string, map to an empty MDString
+    Error, //< Disallow empty string, map to an error
+  } EmptyIs;
+  MDStringField(enum EmptyIs EmptyIs = EmptyIs::Null)
+      : ImplTy(nullptr), EmptyIs(EmptyIs) {}
 };
 
 struct MDFieldList : public MDFieldImpl<SmallVector<Metadata *, 4>> {
@@ -5257,10 +5261,19 @@ bool LLParser::parseMDField(LocTy Loc, StringRef Name, MDStringField &Result) {
   if (parseStringConstant(S))
     return true;
 
-  if (!Result.AllowEmpty && S.empty())
-    return error(ValueLoc, "'" + Name + "' cannot be empty");
+  if (S.empty()) {
+    switch (Result.EmptyIs) {
+    case MDStringField::EmptyIs::Null:
+      Result.assign(nullptr);
+      return false;
+    case MDStringField::EmptyIs::Empty:
+      break;
+    case MDStringField::EmptyIs::Error:
+      return error(ValueLoc, "'" + Name + "' cannot be empty");
+    }
+  }
 
-  Result.assign(S.empty() ? nullptr : MDString::get(Context, S));
+  Result.assign(MDString::get(Context, S));
   return false;
 }
 
@@ -5778,7 +5791,7 @@ bool LLParser::parseDIFile(MDNode *&Result, bool IsDistinct) {
   REQUIRED(directory, MDStringField, );                                        \
   OPTIONAL(checksumkind, ChecksumKindField, (DIFile::CSK_MD5));                \
   OPTIONAL(checksum, MDStringField, );                                         \
-  OPTIONAL(source, MDStringField, );
+  OPTIONAL(source, MDStringField, (MDStringField::EmptyIs::Empty));
   PARSE_MD_FIELDS();
 #undef VISIT_MD_FIELDS
 
@@ -6062,7 +6075,7 @@ bool LLParser::parseDITemplateValueParameter(MDNode *&Result, bool IsDistinct) {
 ///                         declaration: !4, align: 8)
 bool LLParser::parseDIGlobalVariable(MDNode *&Result, bool IsDistinct) {
 #define VISIT_MD_FIELDS(OPTIONAL, REQUIRED)                                    \
-  OPTIONAL(name, MDStringField, (/* AllowEmpty */ false));                     \
+  OPTIONAL(name, MDStringField, (MDStringField::EmptyIs::Error));              \
   OPTIONAL(scope, MDField, );                                                  \
   OPTIONAL(linkageName, MDStringField, );                                      \
   OPTIONAL(file, MDField, );                                                   \
diff --git a/llvm/lib/IR/AsmWriter.cpp b/llvm/lib/IR/AsmWriter.cpp
index 145ef10f28f35..18d67f0c3a006 100644
--- a/llvm/lib/IR/AsmWriter.cpp
+++ b/llvm/lib/IR/AsmWriter.cpp
@@ -2398,8 +2398,9 @@ static void writeDIFile(raw_ostream &Out, const DIFile *N, AsmWriterContext &) {
   // Print all values for checksum together, or not at all.
   if (N->getChecksum())
     Printer.printChecksum(*N->getChecksum());
-  Printer.printString("source", N->getSource().value_or(StringRef()),
-                      /* ShouldSkipEmpty */ true);
+  if (N->getSource())
+    Printer.printString("source", *N->getSource(),
+                        /* ShouldSkipEmpty */ false);
   Out << ")";
 }
 
diff --git a/llvm/lib/MC/MCDwarf.cpp b/llvm/lib/MC/MCDwarf.cpp
index b1dced7e2cda3..e7c0d37e8f99b 100644
--- a/llvm/lib/MC/MCDwarf.cpp
+++ b/llvm/lib/MC/MCDwarf.cpp
@@ -447,10 +447,17 @@ static void emitOneV5FileEntry(MCStreamer *MCOS, const MCDwarfFile &DwarfFile,
         StringRef(reinterpret_cast<const char *>(Cksum.data()), Cksum.size()));
   }
   if (HasAnySource) {
+    // From https://dwarfstd.org/issues/180201.1.html
+    // * The value is an empty null-terminated string if no source is available
+    StringRef Source = DwarfFile.Source.value_or(StringRef());
+    // * If the source is available but is an empty file then the value is a
+    // null-terminated single "\n".
+    if (DwarfFile.Source && DwarfFile.Source->empty())
+      Source = "\n";
     if (LineStr)
-      LineStr->emitRef(MCOS, DwarfFile.Source.value_or(StringRef()));
+      LineStr->emitRef(MCOS, Source);
     else {
-      MCOS->emitBytes(DwarfFile.Source.value_or(StringRef())); // Source and...
+      MCOS->emitBytes(Source);             // Source and...
       MCOS->emitBytes(StringRef("\0", 1)); // its null terminator.
     }
   }
diff --git a/llvm/test/Assembler/difile-empty-source.ll b/llvm/test/Assembler/difile-empty-source.ll
new file mode 100644
index 0000000000000..11587d8f1afdc
--- /dev/null
+++ b/llvm/test/Assembler/difile-empty-source.ll
@@ -0,0 +1,12 @@
+; RUN: llvm-as < %s | llvm-dis | llvm-as | llvm-dis | FileCheck %s
+; RUN: verify-uselistorder
+
+; CHECK: !DIFile({{.*}}, source: "")
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!2, !3}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C11, file: !1, emissionKind: FullDebug)
+!1 = !DIFile(filename: "-", directory: "/", checksumkind: CSK_MD5, checksum: "d41d8cd98f00b204e9800998ecf8427e", source: "")
+!2 = !{i32 7, !"Dwarf Version", i32 5}
+!3 = !{i32 2, !"Debug Info Version", i32 3}
diff --git a/llvm/test/DebugInfo/Generic/mixed-source.ll b/llvm/test/DebugInfo/Generic/mixed-source.ll
index d5586f8fc5389..ee3598f1fdd13 100644
--- a/llvm/test/DebugInfo/Generic/mixed-source.ll
+++ b/llvm/test/DebugInfo/Generic/mixed-source.ll
@@ -5,36 +5,66 @@
 
 ; CHECK: include_directories[  0] = "dir"
 ; CHECK-NEXT: file_names[  0]:
+; CHECK-NEXT:            name: "main.c"
+; CHECK-NEXT:       dir_index: 0
+; CHECK-NOT:           source:
+; CHECK-NEXT: file_names[  1]:
 ; CHECK-NEXT:            name: "foo.c"
 ; CHECK-NEXT:       dir_index: 0
 ; CHECK-NEXT:          source: "void foo() { }\n"
-; CHECK-NEXT: file_names[  1]:
-; CHECK-NEXT:            name: "bar.h"
+; CHECK-NEXT: file_names[  2]:
+; CHECK-NEXT:            name: "newline.h"
+; CHECK-NEXT:       dir_index: 0
+; CHECK-NEXT:          source: "\n"
+; CHECK-NEXT: file_names[  3]:
+; CHECK-NEXT:            name: "empty.h"
+; CHECK-NEXT:       dir_index: 0
+; CHECK-NEXT:          source: "\n"
+; CHECK-NEXT: file_names[  4]:
+; CHECK-NEXT:            name: "absent.h"
 ; CHECK-NEXT:       dir_index: 0
 ; CHECK-NOT:           source:
 
 ; Test that DIFiles mixing source and no-source within a DICompileUnit works.
 
-define dso_local void @foo() !dbg !5 {
+define dso_local void @foo() !dbg !6 {
   ret void, !dbg !7
 }
 
-define dso_local void @bar() !dbg !6 {
-  ret void, !dbg !8
+define dso_local void @newline() !dbg !9 {
+  ret void, !dbg !10
 }
 
-!llvm.dbg.cu = !{!4}
+define dso_local void @empty() !dbg !12 {
+  ret void, !dbg !13
+}
+
+define dso_local void @absent() !dbg !15 {
+  ret void, !dbg !16
+}
+
+!llvm.dbg.cu = !{!2}
 !llvm.module.flags = !{!0, !1}
 
 !0 = !{i32 2, !"Dwarf Version", i32 5}
 !1 = !{i32 2, !"Debug Info Version", i32 3}
 
-!2 = !DIFile(filename: "foo.c", directory: "dir", source: "void foo() { }\0A")
-!3 = !DIFile(filename: "bar.h", directory: "dir")
+!2 = distinct !DICompileUnit(language: DW_LANG_C99, emissionKind: FullDebug, file: !4)
+!3 = !DISubroutineType(types: !{})
+!4 = !DIFile(filename: "main.c", directory: "dir")
+
+!5 = !DIFile(filename: "foo.c", directory: "dir", source: "void foo() { }\0A")
+!6 = distinct !DISubprogram(name: "foo", file: !5, line: 1, type: !3, scopeLine: 1, spFlags: DISPFlagDefinition, unit: !2)
+!7 = !DILocation(line: 1, scope: !6)
+
+!8 = !DIFile(filename: "newline.h", directory: "dir", source: "\0A")
+!9 = distinct !DISubprogram(name: "newline", file: !8, line: 1, type: !3, scopeLine: 1, spFlags: DISPFlagDefinition, unit: !2)
+!10 = !DILocation(line: 1, scope: !9)
+
+!11 = !DIFile(filename: "empty.h", directory: "dir", source: "")
+!12 = distinct !DISubprogram(name: "empty", file: !11, line: 1, type: !3, scopeLine: 1, spFlags: DISPFlagDefinition, unit: !2)
+!13 = !DILocation(line: 1, scope: !12)
 
-!4 = distinct !DICompileUnit(language: DW_LANG_C99, emissionKind: FullDebug, file: !2)
-!5 = distinct !DISubprogram(name: "foo", file: !2, line: 1, type: !9, scopeLine: 1, spFlags: DISPFlagDefinition, unit: !4)
-!6 = distinct !DISubprogram(name: "bar", file: !3, line: 1, type: !9, scopeLine: 1, spFlags: DISPFlagDefinition, unit: !4)
-!7 = !DILocation(line: 1, scope: !5)
-!8 = !DILocation(line: 1, scope: !6)
-!9 = !DISubroutineType(types: !{})
+!14 = !DIFile(filename: "absent.h", directory: "dir")
+!15 = distinct !DISubprogram(name: "absent", file: !14, line: 1, type: !3, scopeLine: 1, spFlags: DISPFlagDefinition, unit: !2)
+!16 = !DILocation(line: 1, scope: !15)

@llvmbot
Copy link
Member

llvmbot commented Jul 16, 2025

@llvm/pr-subscribers-mc

Author: Scott Linder (slinder1)

Changes

Finish making LLVM's implementation of DW_LNCT_LLVM_source conform to the final accepted version of DW_LNCT_source from
https://dwarfstd.org/issues/180201.1.html

This is effectively a continuation of a few commits which have moved in this direction already, including:

  • c9cb4fc [DebugInfo] Store optional DIFile::Source as pointer
  • 87e22bd Allow for mixing source/no-source DIFiles in one CU

This patch:

  • Teaches LLParser that there is a distinction between an empty and an absent "source:" field on DIFile.
  • Makes printing the "source:" field in AsmWriter conditional on it being present, instead of being conditional on it being non-empty.
  • Teaches MC to map an empty-but-present source field to "\n" (which is ambiguous, making the source strings "" and "\n" indistinguishable, but that's what the DWARF issue specifies).

Add a test for round-tripping an empty source field through assembler/bitcode.

Extend the test for the actual DWARF generation so it covers all of the cases (absent, present-but-empty, present-and-ambiguously-single-newline, present).


Full diff: https://github.com/llvm/llvm-project/pull/149226.diff

5 Files Affected:

  • (modified) llvm/lib/AsmParser/LLParser.cpp (+21-8)
  • (modified) llvm/lib/IR/AsmWriter.cpp (+3-2)
  • (modified) llvm/lib/MC/MCDwarf.cpp (+9-2)
  • (added) llvm/test/Assembler/difile-empty-source.ll (+12)
  • (modified) llvm/test/DebugInfo/Generic/mixed-source.ll (+44-14)
diff --git a/llvm/lib/AsmParser/LLParser.cpp b/llvm/lib/AsmParser/LLParser.cpp
index b7f6950f679ef..ebdac354179f2 100644
--- a/llvm/lib/AsmParser/LLParser.cpp
+++ b/llvm/lib/AsmParser/LLParser.cpp
@@ -4783,9 +4783,13 @@ struct MDField : public MDFieldImpl<Metadata *> {
 };
 
 struct MDStringField : public MDFieldImpl<MDString *> {
-  bool AllowEmpty;
-  MDStringField(bool AllowEmpty = true)
-      : ImplTy(nullptr), AllowEmpty(AllowEmpty) {}
+  enum class EmptyIs {
+    Null,  //< Allow empty input string, map to nullptr
+    Empty, //< Allow empty input string, map to an empty MDString
+    Error, //< Disallow empty string, map to an error
+  } EmptyIs;
+  MDStringField(enum EmptyIs EmptyIs = EmptyIs::Null)
+      : ImplTy(nullptr), EmptyIs(EmptyIs) {}
 };
 
 struct MDFieldList : public MDFieldImpl<SmallVector<Metadata *, 4>> {
@@ -5257,10 +5261,19 @@ bool LLParser::parseMDField(LocTy Loc, StringRef Name, MDStringField &Result) {
   if (parseStringConstant(S))
     return true;
 
-  if (!Result.AllowEmpty && S.empty())
-    return error(ValueLoc, "'" + Name + "' cannot be empty");
+  if (S.empty()) {
+    switch (Result.EmptyIs) {
+    case MDStringField::EmptyIs::Null:
+      Result.assign(nullptr);
+      return false;
+    case MDStringField::EmptyIs::Empty:
+      break;
+    case MDStringField::EmptyIs::Error:
+      return error(ValueLoc, "'" + Name + "' cannot be empty");
+    }
+  }
 
-  Result.assign(S.empty() ? nullptr : MDString::get(Context, S));
+  Result.assign(MDString::get(Context, S));
   return false;
 }
 
@@ -5778,7 +5791,7 @@ bool LLParser::parseDIFile(MDNode *&Result, bool IsDistinct) {
   REQUIRED(directory, MDStringField, );                                        \
   OPTIONAL(checksumkind, ChecksumKindField, (DIFile::CSK_MD5));                \
   OPTIONAL(checksum, MDStringField, );                                         \
-  OPTIONAL(source, MDStringField, );
+  OPTIONAL(source, MDStringField, (MDStringField::EmptyIs::Empty));
   PARSE_MD_FIELDS();
 #undef VISIT_MD_FIELDS
 
@@ -6062,7 +6075,7 @@ bool LLParser::parseDITemplateValueParameter(MDNode *&Result, bool IsDistinct) {
 ///                         declaration: !4, align: 8)
 bool LLParser::parseDIGlobalVariable(MDNode *&Result, bool IsDistinct) {
 #define VISIT_MD_FIELDS(OPTIONAL, REQUIRED)                                    \
-  OPTIONAL(name, MDStringField, (/* AllowEmpty */ false));                     \
+  OPTIONAL(name, MDStringField, (MDStringField::EmptyIs::Error));              \
   OPTIONAL(scope, MDField, );                                                  \
   OPTIONAL(linkageName, MDStringField, );                                      \
   OPTIONAL(file, MDField, );                                                   \
diff --git a/llvm/lib/IR/AsmWriter.cpp b/llvm/lib/IR/AsmWriter.cpp
index 145ef10f28f35..18d67f0c3a006 100644
--- a/llvm/lib/IR/AsmWriter.cpp
+++ b/llvm/lib/IR/AsmWriter.cpp
@@ -2398,8 +2398,9 @@ static void writeDIFile(raw_ostream &Out, const DIFile *N, AsmWriterContext &) {
   // Print all values for checksum together, or not at all.
   if (N->getChecksum())
     Printer.printChecksum(*N->getChecksum());
-  Printer.printString("source", N->getSource().value_or(StringRef()),
-                      /* ShouldSkipEmpty */ true);
+  if (N->getSource())
+    Printer.printString("source", *N->getSource(),
+                        /* ShouldSkipEmpty */ false);
   Out << ")";
 }
 
diff --git a/llvm/lib/MC/MCDwarf.cpp b/llvm/lib/MC/MCDwarf.cpp
index b1dced7e2cda3..e7c0d37e8f99b 100644
--- a/llvm/lib/MC/MCDwarf.cpp
+++ b/llvm/lib/MC/MCDwarf.cpp
@@ -447,10 +447,17 @@ static void emitOneV5FileEntry(MCStreamer *MCOS, const MCDwarfFile &DwarfFile,
         StringRef(reinterpret_cast<const char *>(Cksum.data()), Cksum.size()));
   }
   if (HasAnySource) {
+    // From https://dwarfstd.org/issues/180201.1.html
+    // * The value is an empty null-terminated string if no source is available
+    StringRef Source = DwarfFile.Source.value_or(StringRef());
+    // * If the source is available but is an empty file then the value is a
+    // null-terminated single "\n".
+    if (DwarfFile.Source && DwarfFile.Source->empty())
+      Source = "\n";
     if (LineStr)
-      LineStr->emitRef(MCOS, DwarfFile.Source.value_or(StringRef()));
+      LineStr->emitRef(MCOS, Source);
     else {
-      MCOS->emitBytes(DwarfFile.Source.value_or(StringRef())); // Source and...
+      MCOS->emitBytes(Source);             // Source and...
       MCOS->emitBytes(StringRef("\0", 1)); // its null terminator.
     }
   }
diff --git a/llvm/test/Assembler/difile-empty-source.ll b/llvm/test/Assembler/difile-empty-source.ll
new file mode 100644
index 0000000000000..11587d8f1afdc
--- /dev/null
+++ b/llvm/test/Assembler/difile-empty-source.ll
@@ -0,0 +1,12 @@
+; RUN: llvm-as < %s | llvm-dis | llvm-as | llvm-dis | FileCheck %s
+; RUN: verify-uselistorder
+
+; CHECK: !DIFile({{.*}}, source: "")
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!2, !3}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C11, file: !1, emissionKind: FullDebug)
+!1 = !DIFile(filename: "-", directory: "/", checksumkind: CSK_MD5, checksum: "d41d8cd98f00b204e9800998ecf8427e", source: "")
+!2 = !{i32 7, !"Dwarf Version", i32 5}
+!3 = !{i32 2, !"Debug Info Version", i32 3}
diff --git a/llvm/test/DebugInfo/Generic/mixed-source.ll b/llvm/test/DebugInfo/Generic/mixed-source.ll
index d5586f8fc5389..ee3598f1fdd13 100644
--- a/llvm/test/DebugInfo/Generic/mixed-source.ll
+++ b/llvm/test/DebugInfo/Generic/mixed-source.ll
@@ -5,36 +5,66 @@
 
 ; CHECK: include_directories[  0] = "dir"
 ; CHECK-NEXT: file_names[  0]:
+; CHECK-NEXT:            name: "main.c"
+; CHECK-NEXT:       dir_index: 0
+; CHECK-NOT:           source:
+; CHECK-NEXT: file_names[  1]:
 ; CHECK-NEXT:            name: "foo.c"
 ; CHECK-NEXT:       dir_index: 0
 ; CHECK-NEXT:          source: "void foo() { }\n"
-; CHECK-NEXT: file_names[  1]:
-; CHECK-NEXT:            name: "bar.h"
+; CHECK-NEXT: file_names[  2]:
+; CHECK-NEXT:            name: "newline.h"
+; CHECK-NEXT:       dir_index: 0
+; CHECK-NEXT:          source: "\n"
+; CHECK-NEXT: file_names[  3]:
+; CHECK-NEXT:            name: "empty.h"
+; CHECK-NEXT:       dir_index: 0
+; CHECK-NEXT:          source: "\n"
+; CHECK-NEXT: file_names[  4]:
+; CHECK-NEXT:            name: "absent.h"
 ; CHECK-NEXT:       dir_index: 0
 ; CHECK-NOT:           source:
 
 ; Test that DIFiles mixing source and no-source within a DICompileUnit works.
 
-define dso_local void @foo() !dbg !5 {
+define dso_local void @foo() !dbg !6 {
   ret void, !dbg !7
 }
 
-define dso_local void @bar() !dbg !6 {
-  ret void, !dbg !8
+define dso_local void @newline() !dbg !9 {
+  ret void, !dbg !10
 }
 
-!llvm.dbg.cu = !{!4}
+define dso_local void @empty() !dbg !12 {
+  ret void, !dbg !13
+}
+
+define dso_local void @absent() !dbg !15 {
+  ret void, !dbg !16
+}
+
+!llvm.dbg.cu = !{!2}
 !llvm.module.flags = !{!0, !1}
 
 !0 = !{i32 2, !"Dwarf Version", i32 5}
 !1 = !{i32 2, !"Debug Info Version", i32 3}
 
-!2 = !DIFile(filename: "foo.c", directory: "dir", source: "void foo() { }\0A")
-!3 = !DIFile(filename: "bar.h", directory: "dir")
+!2 = distinct !DICompileUnit(language: DW_LANG_C99, emissionKind: FullDebug, file: !4)
+!3 = !DISubroutineType(types: !{})
+!4 = !DIFile(filename: "main.c", directory: "dir")
+
+!5 = !DIFile(filename: "foo.c", directory: "dir", source: "void foo() { }\0A")
+!6 = distinct !DISubprogram(name: "foo", file: !5, line: 1, type: !3, scopeLine: 1, spFlags: DISPFlagDefinition, unit: !2)
+!7 = !DILocation(line: 1, scope: !6)
+
+!8 = !DIFile(filename: "newline.h", directory: "dir", source: "\0A")
+!9 = distinct !DISubprogram(name: "newline", file: !8, line: 1, type: !3, scopeLine: 1, spFlags: DISPFlagDefinition, unit: !2)
+!10 = !DILocation(line: 1, scope: !9)
+
+!11 = !DIFile(filename: "empty.h", directory: "dir", source: "")
+!12 = distinct !DISubprogram(name: "empty", file: !11, line: 1, type: !3, scopeLine: 1, spFlags: DISPFlagDefinition, unit: !2)
+!13 = !DILocation(line: 1, scope: !12)
 
-!4 = distinct !DICompileUnit(language: DW_LANG_C99, emissionKind: FullDebug, file: !2)
-!5 = distinct !DISubprogram(name: "foo", file: !2, line: 1, type: !9, scopeLine: 1, spFlags: DISPFlagDefinition, unit: !4)
-!6 = distinct !DISubprogram(name: "bar", file: !3, line: 1, type: !9, scopeLine: 1, spFlags: DISPFlagDefinition, unit: !4)
-!7 = !DILocation(line: 1, scope: !5)
-!8 = !DILocation(line: 1, scope: !6)
-!9 = !DISubroutineType(types: !{})
+!14 = !DIFile(filename: "absent.h", directory: "dir")
+!15 = distinct !DISubprogram(name: "absent", file: !14, line: 1, type: !3, scopeLine: 1, spFlags: DISPFlagDefinition, unit: !2)
+!16 = !DILocation(line: 1, scope: !15)

@hahnjo hahnjo removed their request for review July 21, 2025 07:20
@hahnjo
Copy link
Member

hahnjo commented Jul 21, 2025

Not really knowledgeable for reviewing DWARF changes

@jmorse
Copy link
Member

jmorse commented Jul 21, 2025

This broadly LGTM; it'd be great if someone with more context looks at it too.

(The empty-string versus "\n" ambiguity is awkward; but it's also explicit in the spec, so presumably there's a reason for doing it).

@pogo59
Copy link
Collaborator

pogo59 commented Jul 21, 2025

(The empty-string versus "\n" ambiguity is awkward; but it's also explicit in the spec, so presumably there's a reason for doing it).

I remember the topic, although not the precise reasoning; had something to do with how a toolchain might generate source on the fly. Some languages mandate that source ends with a newline, or something along those lines.

Copy link
Collaborator

@pogo59 pogo59 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@slinder1
Copy link
Contributor Author

I agree the ambiguity isn't ideal, but the cases where I could imagine it being an issue seem unlikely to come up in practice.

Thinking about this just now, if one includes a checksum alongside the embedded source it seems like it is actually enough to disambiguate it. The consumer can compare against the "null" hash digest, e.g. printf '' | md5sum - # d41d8cd98f00b204e9800998ecf8427e

@slinder1 slinder1 merged commit 354944d into llvm:main Jul 21, 2025
13 checks passed
@slinder1 slinder1 deleted the difile-source-empty branch July 21, 2025 22:42
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Jul 28, 2025
Finish making LLVM's implementation of `DW_LNCT_LLVM_source` conform to
the final accepted version of `DW_LNCT_source` from
https://dwarfstd.org/issues/180201.1.html

This is effectively a continuation of a few commits which have moved in
this direction already, including:

* c9cb4fc [DebugInfo] Store optional DIFile::Source as pointer
* 87e22bd Allow for mixing source/no-source DIFiles in one CU

This patch:

* Teaches LLParser that there is a distinction between an empty and an
absent "source:" field on DIFile.
* Makes printing the "source:" field in AsmWriter conditional on it
being present, instead of being conditional on it being non-empty.
* Teaches MC to map an empty-but-present source field to "\n" (which is
ambiguous, making the source strings "" and "\n" indistinguishable, but
that's what the DWARF issue specifies).

Add a test for round-tripping an empty source field through
assembler/bitcode.

Extend the test for the actual DWARF generation so it covers all of the
cases (absent, present-but-empty,
present-and-ambiguously-single-newline, present).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
debuginfo llvm:ir llvm:mc Machine (object) code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants