-
Notifications
You must be signed in to change notification settings - Fork 14.5k
[lld-macho] Clean up chained fixup entry emission (NFC) #149483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
These changes are in preparation for arm64e support: - Store inline addends as `uint32_t` instead of `uint8_t`, as arm64e fixup formats store these in more than 8 bits. - Do not assume `DYLD_CHAINED_PTR_64` format. - Construct the bit-fields in the fixup entries by hand instead of using `reinterpret_cast`, as the latter is a code smell and will work incorrectly on big-endian hosts.
@llvm/pr-subscribers-llvm-binary-utilities @llvm/pr-subscribers-lld-macho Author: Daniel Bertalan (BertalanD) ChangesThese changes are in preparation for arm64e support:
Full diff: https://github.com/llvm/llvm-project/pull/149483.diff 4 Files Affected:
diff --git a/lld/MachO/SyntheticSections.cpp b/lld/MachO/SyntheticSections.cpp
index 979a4ee6d8133..b3a10524158d0 100644
--- a/lld/MachO/SyntheticSections.cpp
+++ b/lld/MachO/SyntheticSections.cpp
@@ -345,17 +345,28 @@ void NonLazyPointerSectionBase::addEntry(Symbol *sym) {
void macho::writeChainedRebase(uint8_t *buf, uint64_t targetVA) {
assert(config->emitChainedFixups);
- assert(target->wordSize == 8 && "Only 64-bit platforms are supported");
- auto *rebase = reinterpret_cast<dyld_chained_ptr_64_rebase *>(buf);
- rebase->target = targetVA & 0xf'ffff'ffff;
- rebase->high8 = (targetVA >> 56);
- rebase->reserved = 0;
- rebase->next = 0;
- rebase->bind = 0;
-
- // The fixup format places a 64 GiB limit on the output's size.
- // Should we handle this gracefully?
- uint64_t encodedVA = rebase->target | ((uint64_t)rebase->high8 << 56);
+
+ uint64_t encodedVA = 0;
+ switch (in.chainedFixups->pointerFormat()) {
+ case DYLD_CHAINED_PTR_64: {
+ // struct dyld_chained_ptr_64_rebase {
+ // uint64_t target : 36; // lower 36 bits of target VA
+ // uint64_t high8 : 8; // upper 8 bits of target VA
+ // uint64_t reserved : 7; // set to 0
+ // uint64_t next : 12; // filled in by Writer later
+ // uint64_t bind : 1; // set to 0
+ // };
+
+ uint64_t target36 = targetVA & 0xf'ffff'ffff;
+ uint64_t high8 = targetVA >> 56;
+ write64le(buf, target36 | (high8 << 36));
+ encodedVA = target36 | (high8 << 56);
+ break;
+ }
+ default:
+ llvm_unreachable("unsupported chained fixup pointer format");
+ }
+
if (encodedVA != targetVA)
error("rebase target address 0x" + Twine::utohexstr(targetVA) +
" does not fit into chained fixup. Re-link with -no_fixup_chains");
@@ -363,14 +374,35 @@ void macho::writeChainedRebase(uint8_t *buf, uint64_t targetVA) {
static void writeChainedBind(uint8_t *buf, const Symbol *sym, int64_t addend) {
assert(config->emitChainedFixups);
- assert(target->wordSize == 8 && "Only 64-bit platforms are supported");
- auto *bind = reinterpret_cast<dyld_chained_ptr_64_bind *>(buf);
auto [ordinal, inlineAddend] = in.chainedFixups->getBinding(sym, addend);
- bind->ordinal = ordinal;
- bind->addend = inlineAddend;
- bind->reserved = 0;
- bind->next = 0;
- bind->bind = 1;
+
+ if (!isUInt<24>(ordinal))
+ error("Import ordinal for symbol '" + sym->getName() + "' (" +
+ utohexstr(ordinal) +
+ ") is larger than maximum import count (0xffffff). Re-link with "
+ "-no_fixup_chains");
+
+ ordinal &= 0xffffff;
+
+ switch (in.chainedFixups->pointerFormat()) {
+ case DYLD_CHAINED_PTR_64: {
+ // struct dyld_chained_ptr_64_bind {
+ // uint64_t ordinal : 24; // import ordinal
+ // uint64_t addend : 8;
+ // uint64_t reserved : 19; // set to 0
+ // uint64_t next : 12; // filled in by Writer later
+ // uint64_t bind : 1; // set to 1
+ // };
+
+ assert(isUInt<8>(inlineAddend) &&
+ "large addend should be stored out-of-line");
+
+ write64le(buf, ordinal | (inlineAddend << 24) | (1ULL << 63));
+ break;
+ }
+ default:
+ llvm_unreachable("unsupported chained fixup pointer format");
+ }
}
void macho::writeChainedFixup(uint8_t *buf, const Symbol *sym, int64_t addend) {
@@ -2300,7 +2332,10 @@ void macho::createSyntheticSymbols() {
}
ChainedFixupsSection::ChainedFixupsSection()
- : LinkEditSection(segment_names::linkEdit, section_names::chainFixups) {}
+ : LinkEditSection(segment_names::linkEdit, section_names::chainFixups) {
+ // FIXME: ld64 uses DYLD_CHAINED_PTR_64_OFFSET on macOS 12+.
+ ptrFormat = DYLD_CHAINED_PTR_64;
+}
bool ChainedFixupsSection::isNeeded() const {
assert(config->emitChainedFixups);
@@ -2328,7 +2363,7 @@ void ChainedFixupsSection::addBinding(const Symbol *sym,
}
}
-std::pair<uint32_t, uint8_t>
+std::pair<uint32_t, uint32_t>
ChainedFixupsSection::getBinding(const Symbol *sym, int64_t addend) const {
int64_t outlineAddend = (addend < 0 || addend > 0xFF) ? addend : 0;
auto it = bindings.find({sym, outlineAddend});
@@ -2379,8 +2414,7 @@ size_t ChainedFixupsSection::SegmentInfo::writeTo(uint8_t *buf) const {
auto *segInfo = reinterpret_cast<dyld_chained_starts_in_segment *>(buf);
segInfo->size = getSize();
segInfo->page_size = target->getPageSize();
- // FIXME: Use DYLD_CHAINED_PTR_64_OFFSET on newer OS versions.
- segInfo->pointer_format = DYLD_CHAINED_PTR_64;
+ segInfo->pointer_format = in.chainedFixups->pointerFormat();
segInfo->segment_offset = oseg->addr - in.header->addr;
segInfo->max_valid_pointer = 0; // not used on 64-bit
segInfo->page_count = pageStarts.back().first + 1;
diff --git a/lld/MachO/SyntheticSections.h b/lld/MachO/SyntheticSections.h
index 5796b0790c83a..1370b51381baf 100644
--- a/lld/MachO/SyntheticSections.h
+++ b/lld/MachO/SyntheticSections.h
@@ -800,14 +800,16 @@ class ChainedFixupsSection final : public LinkEditSection {
void setHasNonWeakDefinition() { hasNonWeakDef = true; }
// Returns an (ordinal, inline addend) tuple used by dyld_chained_ptr_64_bind.
- std::pair<uint32_t, uint8_t> getBinding(const Symbol *sym,
- int64_t addend) const;
+ std::pair<uint32_t, uint32_t> getBinding(const Symbol *sym,
+ int64_t addend) const;
const std::vector<Location> &getLocations() const { return locations; }
bool hasWeakBinding() const { return hasWeakBind; }
bool hasNonWeakDefinition() const { return hasNonWeakDef; }
+ llvm::MachO::ChainedPointerFormat pointerFormat() const { return ptrFormat; }
+
private:
// Location::offset initially stores the offset within an InputSection, but
// contains output segment offsets after finalizeContents().
@@ -835,6 +837,7 @@ class ChainedFixupsSection final : public LinkEditSection {
bool hasWeakBind = false;
bool hasNonWeakDef = false;
llvm::MachO::ChainedImportFormat importFormat;
+ llvm::MachO::ChainedPointerFormat ptrFormat;
};
void writeChainedRebase(uint8_t *buf, uint64_t targetVA);
diff --git a/lld/MachO/Writer.cpp b/lld/MachO/Writer.cpp
index f288fadc0d14f..65ba7895239f1 100644
--- a/lld/MachO/Writer.cpp
+++ b/lld/MachO/Writer.cpp
@@ -36,6 +36,7 @@
using namespace llvm;
using namespace llvm::MachO;
+using namespace llvm::support::endian;
using namespace llvm::sys;
using namespace lld;
using namespace lld::macho;
@@ -1283,9 +1284,10 @@ void Writer::buildFixupChains() {
"fixups are unaligned (offset " + Twine(offset) +
" is not a multiple of the stride). Re-link with -no_fixup_chains");
- // The "next" field is in the same location for bind and rebase entries.
- reinterpret_cast<dyld_chained_ptr_64_bind *>(buf + loc[i - 1].offset)
- ->next = offset / stride;
+ // Set the previous fixup's next offset (bits 51-62) to point to this.
+ void *prev = buf + loc[i - 1].offset;
+ write64le(prev, read64le(prev) | ((offset / stride) << 51));
+
++i;
}
}
diff --git a/llvm/include/llvm/BinaryFormat/MachO.h b/llvm/include/llvm/BinaryFormat/MachO.h
index 5dbdfb13d1a5f..0552e18132c2d 100644
--- a/llvm/include/llvm/BinaryFormat/MachO.h
+++ b/llvm/include/llvm/BinaryFormat/MachO.h
@@ -1043,7 +1043,7 @@ enum {
};
// Values for dyld_chained_starts_in_segment::pointer_format.
-enum {
+enum ChainedPointerFormat {
DYLD_CHAINED_PTR_ARM64E = 1,
DYLD_CHAINED_PTR_64 = 2,
DYLD_CHAINED_PTR_32 = 3,
|
@drodriguez fyi |
I looked for usages of I agree that the usages were not endianness-safe and since they use C/C++ bitfield, inherently unsafe with the packing. There's a good point in trying to use those APIs, though, because they are easier to understand. There's https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ADT/Bitfields.h which, if I understand correctly, would let us define those types like the following:
This way the bitfields are only defined in one place, and one doesn't have to calculate the masks and the offsets (and the right operations) in each of the usages. What do you think? |
Thanks for the suggestion! While I do agree with your points about wanting to have a single place where the offsets are defined, I actually find the proposed abstraction a bit harder to follow. And LLD uses manual packing/unpacking for other bitfields exclusively, so we'd stay consistent with the existing code with my approach. I'd prefer to merge this as-is for now, and maybe we could later look into adding |
Hm, I’m thinking about a different change that should go in first, but I haven’t fully decided yet. Marking this PR as a draft for now. |
Do you mean the whole LLD or LLD for Mach-O? (also, that something is done all over the place doesn't mean it is the best thing to keep doing, specially when a really thin abstraction is possible and (IMO) improves the usability of the code).
This was my main concern. The usages seems to be limited to those couple of places, which was why I was proposing changing it at once now, but I understand that the proposal makes it look like nothing is going on, but there's some code going on under the hood (which probably need some docstrings, honestly). One thing with your changes is that One compromise is removing most of the current struct, still use
Allows you to write:
But you can merge as-is, if you prefer. It might become very difficult to change this after more and more changes are introduced, though, so I wanted to get this early feedback out. |
These changes are in preparation for arm64e support:
uint32_t
instead ofuint8_t
, as arm64e fixup formats store these in more than 8 bits.DYLD_CHAINED_PTR_64
format.reinterpret_cast
, as the latter is a code smell and will work incorrectly on big-endian hosts.