Skip to content

ELF,SystemZ: Don't sort relocations for TLS GD/LD optimization; support CREL #149640

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

MaskRay
Copy link
Member

@MaskRay MaskRay commented Jul 19, 2025

The General dynamic and Local Dynamic TLS models call
__tls_get_offset, which is eliminated after optimization. To avoid an
unneeded PLT entry and its .dynsym entry, #75643 sorted the relocations.

However, sorting relocations does not play well with CREL. Since GNU ld
has a false dependency on __tls_get_offset as well, let's remove the
relocation sorting. Fix #149511

Created using spr 1.3.5-bogner
@llvmbot
Copy link
Member

llvmbot commented Jul 19, 2025

@llvm/pr-subscribers-lld

@llvm/pr-subscribers-lld-elf

Author: Fangrui Song (MaskRay)

Changes

The General dynamic and Local Dynamic TLS models call
__tls_get_offset, which is eliminated after optimization. To avoid an
unneeded PLT entry and its .dynsym entry, #75643 sorted the relocations.

However, sorting relocations does not play well with CREL. Since GNU ld
has a false dependency on __tls_get_offset as well, let's remove the
relocation sorting. Fix #149511


Full diff: https://github.com/llvm/llvm-project/pull/149640.diff

5 Files Affected:

  • (modified) lld/ELF/Arch/SystemZ.cpp (+3-24)
  • (modified) lld/ELF/Relocations.cpp (+1-1)
  • (modified) lld/test/ELF/systemz-plt.s (+1-1)
  • (modified) lld/test/ELF/systemz-tls-gd.s (+25-21)
  • (modified) lld/test/ELF/systemz-tls-ld.s (+3)
diff --git a/lld/ELF/Arch/SystemZ.cpp b/lld/ELF/Arch/SystemZ.cpp
index a9125806c0952..6771ff01dae58 100644
--- a/lld/ELF/Arch/SystemZ.cpp
+++ b/lld/ELF/Arch/SystemZ.cpp
@@ -23,7 +23,6 @@ namespace {
 class SystemZ : public TargetInfo {
 public:
   SystemZ(Ctx &);
-  int getTlsGdRelaxSkip(RelType type) const override;
   RelExpr getRelExpr(RelType type, const Symbol &s,
                      const uint8_t *loc) const override;
   RelType getDynRel(RelType type) const override;
@@ -277,29 +276,6 @@ RelExpr SystemZ::adjustTlsExpr(RelType type, RelExpr expr) const {
   return expr;
 }
 
-int SystemZ::getTlsGdRelaxSkip(RelType type) const {
-  // A __tls_get_offset call instruction is marked with 2 relocations:
-  //
-  //   R_390_TLS_GDCALL / R_390_TLS_LDCALL: marker relocation
-  //   R_390_PLT32DBL: __tls_get_offset
-  //
-  // After the relaxation we no longer call __tls_get_offset and should skip
-  // both relocations to not create a false dependence on __tls_get_offset
-  // being defined.
-  //
-  // Note that this mechanism only works correctly if the R_390_TLS_[GL]DCALL
-  // is seen immediately *before* the R_390_PLT32DBL.  Unfortunately, current
-  // compilers on the platform will typically generate the inverse sequence.
-  // To fix this, we sort relocations by offset in RelocationScanner::scan;
-  // this ensures the correct sequence as the R_390_TLS_[GL]DCALL applies to
-  // the first byte of the brasl instruction, while the R_390_PLT32DBL applies
-  // to its third byte (the relative displacement).
-
-  if (type == R_390_TLS_GDCALL || type == R_390_TLS_LDCALL)
-    return 2;
-  return 1;
-}
-
 void SystemZ::relaxTlsGdToIe(uint8_t *loc, const Relocation &rel,
                              uint64_t val) const {
   // The general-dynamic code sequence for a global `x`:
@@ -320,6 +296,9 @@ void SystemZ::relaxTlsGdToIe(uint8_t *loc, const Relocation &rel,
   // Relaxing to initial-exec entails:
   // 1) Replacing the call by a load from the GOT.
   // 2) Replacing the relocation on the constant LC0 by R_390_TLS_GOTIE64.
+  //
+  // While we no longer call __tls_get_offset after optimization, we still
+  // generate an unused PLT entry. This simple behavior matches GNU ld.
 
   switch (rel.type) {
   case R_390_TLS_GDCALL:
diff --git a/lld/ELF/Relocations.cpp b/lld/ELF/Relocations.cpp
index 4333b032c9d4e..845ce199b62f2 100644
--- a/lld/ELF/Relocations.cpp
+++ b/lld/ELF/Relocations.cpp
@@ -1657,7 +1657,7 @@ void RelocationScanner::scan(Relocs<RelTy> rels) {
   // On SystemZ, all sections need to be sorted by r_offset, to allow TLS
   // relaxation to be handled correctly - see SystemZ::getTlsGdRelaxSkip.
   SmallVector<RelTy, 0> storage;
-  if (isa<EhInputSection>(sec) || ctx.arg.emachine == EM_S390)
+  if (isa<EhInputSection>(sec))
     rels = sortRels(rels, storage);
 
   if constexpr (RelTy::IsCrel) {
diff --git a/lld/test/ELF/systemz-plt.s b/lld/test/ELF/systemz-plt.s
index 717343ce4c4d5..1207f0704db8e 100644
--- a/lld/test/ELF/systemz-plt.s
+++ b/lld/test/ELF/systemz-plt.s
@@ -3,7 +3,7 @@
 
 # RUN: llvm-mc -filetype=obj -triple=s390x-unknown-linux %t1.s -o %t1.o
 # RUN: ld.lld -shared %t1.o -soname=t1.so -o %t1.so
-# RUN: llvm-mc -filetype=obj -triple=s390x-unknown-linux %s -o %t.o
+# RUN: llvm-mc -filetype=obj -triple=s390x-unknown-linux --crel %s -o %t.o
 # RUN: ld.lld %t.o %t1.so -z separate-code -o %t
 # RUN: llvm-readelf -S -s -r -x .got.plt %t | FileCheck %s
 # RUN: llvm-objdump -d %t | FileCheck --check-prefixes=DIS %s
diff --git a/lld/test/ELF/systemz-tls-gd.s b/lld/test/ELF/systemz-tls-gd.s
index 742797e2d62e4..0042127eea153 100644
--- a/lld/test/ELF/systemz-tls-gd.s
+++ b/lld/test/ELF/systemz-tls-gd.s
@@ -1,6 +1,7 @@
 # REQUIRES: systemz
 # RUN: llvm-mc -filetype=obj -triple=s390x-unknown-linux %s -o %t.o
-# RUN: echo '.tbss; .globl b, c; b: .zero 4; c:' | llvm-mc -filetype=obj -triple=s390x-unknown-linux - -o %t1.o
+# RUN: echo '.globl __tls_get_offset; __tls_get_offset:; .tbss; .globl b, c; b: .zero 4; c:' | \
+# RUN:   llvm-mc -filetype=obj -triple=s390x-unknown-linux - -o %t1.o
 # RUN: ld.lld -shared -soname=t1.so %t1.o -o %t1.so
 
 # RUN: ld.lld -shared %t.o %t1.o -o %t.so
@@ -19,12 +20,12 @@
 # RUN: llvm-objdump --section .data.rel.ro --full-contents %t.ie | FileCheck --check-prefix=IE-DATA %s
 
 # GD-REL: Relocation section '.rela.dyn' at offset {{.*}} contains 6 entries:
-# GD-REL:      0000000000002570 0000000200000036 R_390_TLS_DTPMOD 0000000000000008 a + 0
-# GD-REL-NEXT: 0000000000002578 0000000200000037 R_390_TLS_DTPOFF 0000000000000008 a + 0
-# GD-REL-NEXT: 0000000000002580 0000000300000036 R_390_TLS_DTPMOD 000000000000000c b + 0
-# GD-REL-NEXT: 0000000000002588 0000000300000037 R_390_TLS_DTPOFF 000000000000000c b + 0
-# GD-REL-NEXT: 0000000000002590 0000000400000036 R_390_TLS_DTPMOD 0000000000000010 c + 0
-# GD-REL-NEXT: 0000000000002598 0000000400000037 R_390_TLS_DTPOFF 0000000000000010 c + 0
+# GD-REL:      0000000000002570 {{.*}}           R_390_TLS_DTPMOD 0000000000000008 a + 0
+# GD-REL-NEXT: 0000000000002578 {{.*}}           R_390_TLS_DTPOFF 0000000000000008 a + 0
+# GD-REL-NEXT: 0000000000002580 {{.*}}           R_390_TLS_DTPMOD 000000000000000c b + 0
+# GD-REL-NEXT: 0000000000002588 {{.*}}           R_390_TLS_DTPOFF 000000000000000c b + 0
+# GD-REL-NEXT: 0000000000002590 {{.*}}           R_390_TLS_DTPMOD 0000000000000010 c + 0
+# GD-REL-NEXT: 0000000000002598 {{.*}}           R_390_TLS_DTPOFF 0000000000000010 c + 0
 
 ## _GLOBAL_OFFSET_TABLE is at 0x2558
 # GD:      larl    %r12, 0x2558
@@ -80,33 +81,36 @@
 
 
 # IE-REL: Relocation section '.rela.dyn' at offset {{.*}} contains 2 entries:
-# IE-REL:      0000000001002430 0000000200000038 R_390_TLS_TPOFF 0000000000000000 b + 0
-# IE-REL-NEXT: 0000000001002438 0000000300000038 R_390_TLS_TPOFF 0000000000000000 c + 0
+# IE-REL:      0000000001002500 {{.*}}           R_390_TLS_TPOFF 0000000000000000 b + 0
+# IE-REL-NEXT: 0000000001002508 {{.*}}           R_390_TLS_TPOFF 0000000000000000 c + 0
+## Benign false dependency on __tls_get_offset
+# IE-REL: Relocation section '.rela.plt' at offset {{.*}} contains 1
+# IE-REL:                                        R_390_JMP_SLOT  0000000000000000 __tls_get_offset
 
-## _GLOBAL_OFFSET_TABLE is at 0x1002418
-# IE:      larl    %r12, 0x1002418
+## _GLOBAL_OFFSET_TABLE
+# IE:      larl    %r12, 0x10024e8
 
-## TP offset of a is at 0x1002340
-# IE-NEXT: lgrl    %r2, 0x1002340
+## TP offset of a
+# IE-NEXT: lgrl    %r2, 0x10023d0
 # IE-NEXT: jgnop
 # IE-NEXT: lgf     %r2, 0(%r2,%r7)
 
-## GOT offset of the TP offset for b is at 0x1002348
-# IE-NEXT: lgrl    %r2, 0x1002348
+## GOT offset of the TP offset for b
+# IE-NEXT: lgrl    %r2, 0x10023d8
 # IE-NEXT: lg      %r2, 0(%r2,%r12)
 # IE-NEXT: lgf     %r2, 0(%r2,%r7)
 
-## GOT offset of the TP offset for c is at 0x1002350
-# IE-NEXT: lgrl    %r2, 0x1002350
+## GOT offset of the TP offset for c
+# IE-NEXT: lgrl    %r2, 0x10023e0
 # IE-NEXT: lg      %r2, 0(%r2,%r12)
 # IE-NEXT: lgf     %r2, 0(%r2,%r7)
 
 ## TP offsets (a) / GOT offset of TP offsets (b, c)
 # a: -4
-# b: 0x1002430 / 0x18
-# c: 0x1002438 / 0x20
-# IE-DATA:      1002340 ffffffff fffffffc 00000000 00000018
-# IE-DATA-NEXT: 1002350 00000000 00000020
+# b: 0x10023d0 / 0x18
+# c: 0x10023e0 / 0x20
+# IE-DATA:      10023d0 ffffffff fffffffc 00000000 00000018
+# IE-DATA-NEXT: 10023e0 00000000 00000020
 
 
 ear     %r7,%a0
diff --git a/lld/test/ELF/systemz-tls-ld.s b/lld/test/ELF/systemz-tls-ld.s
index ef104b82644ce..fc96f3a558a1f 100644
--- a/lld/test/ELF/systemz-tls-ld.s
+++ b/lld/test/ELF/systemz-tls-ld.s
@@ -91,6 +91,9 @@ lgf     %r1,0(%r1,%r2)
 lgrl    %r1, .LC3
 lgf     %r1,0(%r1,%r2)
 
+.globl __tls_get_offset
+__tls_get_offset:
+
         .section        .data.rel.ro,"aw"
         .align  8
 .LC0:

@MaskRay MaskRay requested a review from uweigand July 19, 2025 06:49
Copy link
Member

@uweigand uweigand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I'd be OK with this. It would still be nicer if this false dependency could be avoided. Not sure what CREL has to do with relocation sorting: is the problem indeed the swap of these two TLS-related relocs, or is the problem that generic sorting also introduces other changes? If the latter, maybe we could just swap those TLS relocs instead of a generic sorting path? Or simply allow a more generic relocation-skipping mechanism that allows skipping the reloc before a TLS reloc as well as the reloc after the TLS reloc?

@@ -1657,7 +1657,7 @@ void RelocationScanner::scan(Relocs<RelTy> rels) {
// On SystemZ, all sections need to be sorted by r_offset, to allow TLS
// relaxation to be handled correctly - see SystemZ::getTlsGdRelaxSkip.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should remove that comment then as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed!

@MaskRay
Copy link
Member Author

MaskRay commented Jul 21, 2025

I guess I'd be OK with this. It would still be nicer if this false dependency could be avoided. Not sure what CREL has to do with relocation sorting: is the problem indeed the swap of these two TLS-related relocs, or is the problem that generic sorting also introduces other changes? If the latter, maybe we could just swap those TLS relocs instead of a generic sorting path? Or simply allow a more generic relocation-skipping mechanism that allows skipping the reloc before a TLS reloc as well as the reloc after the TLS reloc?

CREL does not support random access. To allow the GDCALL/PLT32, we would need to decode them to RELA and sort them first...

@uweigand
Copy link
Member

CREL does not support random access. To allow the GDCALL/PLT32, we would need to decode them to RELA and sort them first...

Ah, I see. But it should be possible to "peek" at the next relocation, right? So we might add something to the main scanOne loop to the effect of: if this reloc is a R_390_PLT32DBL and the next reloc is a R_390_TLS_GDCALL or R_390_TLS_LDCALL which we know we'll able to optimize, then simply ignore this (R_390_PLT32DBL) reloc?

Created using spr 1.3.5-bogner
MaskRay added a commit to MaskRay/llvm-project that referenced this pull request Jul 22, 2025
…rt CREL

The General dynamic and Local Dynamic TLS models call
`__tls_get_offset`, which is eliminated after optimization. To avoid an
unneeded PLT entry and its .dynsym entry, llvm#75643 sorted the relocations.

However, sorting relocations does not play well with CREL. Since GNU ld
has a false dependency on `__tls_get_offset` as well, let's remove the
relocation sorting. Fix llvm#149511

Pull Request: llvm#149640
@MaskRay
Copy link
Member Author

MaskRay commented Jul 22, 2025

CREL does not support random access. To allow the GDCALL/PLT32, we would need to decode them to RELA and sort them first...

Ah, I see. But it should be possible to "peek" at the next relocation, right? So we might add something to the main scanOne loop to the effect of: if this reloc is a R_390_PLT32DBL and the next reloc is a R_390_TLS_GDCALL or R_390_TLS_LDCALL which we know we'll able to optimize, then simply ignore this (R_390_PLT32DBL) reloc?

In RelocationScanner::scanOne, the parameter typename Relocs<RelTy>::const_iterator &i can be changed to support MIPS. It pessimized generic code... I don't want more targets to skip relocations like that or do EM_PPC64 checks for .toc and IBM XL/C TLS workarounds. In the long term scanOne should become a template instantiated by targets, so that these target workarounds do not affect performance of other targets. mold adopts this approach.

I plan not to do anything with with GDCALL/PLT32 skipping because it is a minor thing and GNU ld doesn't skip PLT32DBL...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

s390x does not handle PC offset correctly when using CREL
3 participants