
Commit a78a0f8

[X86] Align f128 and i128 to 16 bytes when passing on x86-32 (#138092)
The i386 psABI specifies that `__float128` has 16-byte alignment and must be passed on the stack; however, LLVM currently stores it in a stack slot that has an offset of 4. Add a custom lowering to correct this alignment to 16 bytes.

i386 does not specify an `__int128`, but it seems reasonable to keep the same behavior as `__float128`, so this is changed as well. There also isn't a good way to distinguish whether a set of four registers came from an integer or a float.

The main test demonstrating this change is `store_perturbed` in `llvm/test/CodeGen/X86/i128-fp128-abi.ll`.

Referenced ABI: https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/uploads/14c05f1b1e156e0e46b61bfa7c1df1e2/intel386-psABI-2020-08-07.pdf

Fixes: #77401
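The alignment change described in the commit message comes down to simple offset arithmetic. The sketch below is illustrative only and is not code from the commit: `alignUp` and `slotOffset` are hypothetical helpers, offsets are measured from the start of the outgoing argument area, and that area is assumed to be 16-byte aligned itself.

```cpp
#include <cassert>
#include <cstdint>

// Round Offset up to the next multiple of Alignment (must be a power of two).
uint64_t alignUp(uint64_t Offset, uint64_t Alignment) {
  assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0);
  return (Offset + Alignment - 1) & ~(Alignment - 1);
}

// Hypothetical helper: offset of a 16-byte argument placed after BytesBefore
// bytes of earlier stack arguments, given the alignment applied to its slot.
uint64_t slotOffset(uint64_t BytesBefore, uint64_t SlotAlign) {
  return alignUp(BytesBefore, SlotAlign);
}
```

With a single 4-byte argument ahead of the 128-bit value, `slotOffset(4, 4)` yields 4 (the old placement mentioned in the commit message) and `slotOffset(4, 16)` yields 16 (the placement the psABI alignment requires).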
1 parent e66eabe commit a78a0f8


51 files changed: +5364 -3782 lines changed

llvm/docs/ReleaseNotes.md

Lines changed: 2 additions & 0 deletions
@@ -233,6 +233,8 @@ Changes to the X86 Backend
 --------------------------

 * `fp128` will now use `*f128` libcalls on 32-bit GNU targets as well.
+* On x86-32, `fp128` and `i128` are now passed with the expected 16-byte stack
+  alignment.

 Changes to the OCaml bindings
 -----------------------------

llvm/lib/Target/X86/X86CallingConv.cpp

Lines changed: 32 additions & 0 deletions
@@ -374,5 +374,37 @@ static bool CC_X86_64_I128(unsigned &ValNo, MVT &ValVT, MVT &LocVT,
   return true;
 }

+/// Special handling for i128 and fp128: on x86-32, i128 and fp128 get legalized
+/// as four i32s, but fp128 must be passed on the stack with 16-byte alignment.
+/// Technically only fp128 has a specified ABI, but it makes sense to handle
+/// i128 the same until we hear differently.
+static bool CC_X86_32_I128_FP128(unsigned &ValNo, MVT &ValVT, MVT &LocVT,
+                                 CCValAssign::LocInfo &LocInfo,
+                                 ISD::ArgFlagsTy &ArgFlags, CCState &State) {
+  assert(ValVT == MVT::i32 && "Should have i32 parts");
+  SmallVectorImpl<CCValAssign> &PendingMembers = State.getPendingLocs();
+  PendingMembers.push_back(
+      CCValAssign::getPending(ValNo, ValVT, LocVT, LocInfo));
+
+  if (!ArgFlags.isInConsecutiveRegsLast())
+    return true;
+
+  unsigned NumRegs = PendingMembers.size();
+  assert(NumRegs == 4 && "Should have four parts");
+
+  int64_t Offset = State.AllocateStack(16, Align(16));
+  PendingMembers[0].convertToMem(Offset);
+  PendingMembers[1].convertToMem(Offset + 4);
+  PendingMembers[2].convertToMem(Offset + 8);
+  PendingMembers[3].convertToMem(Offset + 12);
+
+  State.addLoc(PendingMembers[0]);
+  State.addLoc(PendingMembers[1]);
+  State.addLoc(PendingMembers[2]);
+  State.addLoc(PendingMembers[3]);
+  PendingMembers.clear();
+  return true;
+}
+
 // Provides entry points of CC_X86 and RetCC_X86.
 #include "X86GenCallingConv.inc"
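
The control flow of `CC_X86_32_I128_FP128` (queue each i32 part, then place the whole group once the last part arrives) can be modeled without any LLVM headers. The following is a toy sketch, not the LLVM API: `ToyState`, `allocateStack`, and `assignPart` are hypothetical names that only imitate the stack-tracking behavior the handler relies on.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-in for the stack-tracking part of CCState (hypothetical).
struct ToyState {
  uint64_t StackSize = 0;
  std::vector<unsigned> Pending;  // value numbers of parts seen so far
  std::vector<uint64_t> Offsets;  // final stack offset of each placed part

  // Reserve Size bytes at the next multiple of Alignment; return the offset.
  uint64_t allocateStack(uint64_t Size, uint64_t Alignment) {
    assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0);
    uint64_t Offset = (StackSize + Alignment - 1) & ~(Alignment - 1);
    StackSize = Offset + Size;
    return Offset;
  }
};

// Queue one i32 part; when the last part arrives, give the whole group a
// single 16-byte, 16-byte-aligned slot and place the parts 4 bytes apart.
void assignPart(ToyState &State, unsigned ValNo, bool IsLast) {
  State.Pending.push_back(ValNo);
  if (!IsLast)
    return;

  assert(State.Pending.size() == 4 && "expected four i32 parts");
  uint64_t Base = State.allocateStack(16, 16);
  for (uint64_t I = 0; I < 4; ++I)
    State.Offsets.push_back(Base + 4 * I);
  State.Pending.clear();
}
```

If a 4-byte argument is allocated first and four parts are then fed through `assignPart`, the parts land at offsets 16, 20, 24, and 28 in this model, leaving 12 bytes of padding after the first argument.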

llvm/lib/Target/X86/X86CallingConv.td

Lines changed: 5 additions & 0 deletions
@@ -859,6 +859,11 @@ def CC_X86_32_C : CallingConv<[
   // The 'nest' parameter, if any, is passed in ECX.
   CCIfNest<CCAssignToReg<[ECX]>>,

+  // i128 and fp128 need to be passed on the stack with a higher alignment than
+  // their legal types. Handle this with a custom function.
+  CCIfType<[i32],
+           CCIfConsecutiveRegs<CCCustom<"CC_X86_32_I128_FP128">>>,
+
   // On swifttailcc pass swiftself in ECX.
   CCIfCC<"CallingConv::SwiftTail",
          CCIfSwiftSelf<CCIfType<[i32], CCAssignToReg<[ECX]>>>>,

llvm/lib/Target/X86/X86ISelLoweringCall.cpp

Lines changed: 12 additions & 3 deletions
@@ -237,9 +237,18 @@ EVT X86TargetLowering::getSetCCResultType(const DataLayout &DL,
 bool X86TargetLowering::functionArgumentNeedsConsecutiveRegisters(
     Type *Ty, CallingConv::ID CallConv, bool isVarArg,
     const DataLayout &DL) const {
-  // i128 split into i64 needs to be allocated to two consecutive registers,
-  // or spilled to the stack as a whole.
-  return Ty->isIntegerTy(128);
+  // On x86-64 i128 is split into two i64s and needs to be allocated to two
+  // consecutive registers, or spilled to the stack as a whole. On x86-32 i128
+  // is split to four i32s and never actually passed in registers, but we use
+  // the consecutive register mark to match it in TableGen.
+  if (Ty->isIntegerTy(128))
+    return true;
+
+  // On x86-32, fp128 acts the same as i128.
+  if (Subtarget.is32Bit() && Ty->isFP128Ty())
+    return true;
+
+  return false;
 }

 /// Helper for getByValTypeAlignment to determine
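
The updated predicate in `functionArgumentNeedsConsecutiveRegisters` amounts to a small truth table over the argument type and the target width. The sketch below restates it with a hypothetical `ToyType` enum and an `Is32Bit` flag in place of `llvm::Type` and `Subtarget`; it is a simplified model, not the LLVM code.

```cpp
#include <cassert>

// Hypothetical stand-in for the few llvm::Type queries the predicate uses.
enum class ToyType { I32, I64, I128, FP128 };

// Mirrors the decision in the hunk above: i128 is always marked, fp128 only
// on 32-bit targets, everything else never.
bool needsConsecutiveRegs(ToyType Ty, bool Is32Bit) {
  if (Ty == ToyType::I128)
    return true;
  if (Is32Bit && Ty == ToyType::FP128)
    return true;
  return false;
}

// Checks each row of the truth table.
void checkPredicate() {
  assert(needsConsecutiveRegs(ToyType::I128, /*Is32Bit=*/true));
  assert(needsConsecutiveRegs(ToyType::I128, /*Is32Bit=*/false));
  assert(needsConsecutiveRegs(ToyType::FP128, /*Is32Bit=*/true));
  assert(!needsConsecutiveRegs(ToyType::FP128, /*Is32Bit=*/false));
  assert(!needsConsecutiveRegs(ToyType::I64, /*Is32Bit=*/true));
}
```

Only the `fp128` row depends on the target width, which matches the diff: the `i128` behavior is unchanged and the 32-bit `fp128` case is the new addition.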
