-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[SYCL] Add offload wrapping for SYCL kind. #147508
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-binary-utilities @llvm/pr-subscribers-clang Author: Maksim Sabianin (maksimsab) ChangesPatch is 34.84 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/147508.diff 11 Files Affected:
diff --git a/clang/test/Driver/linker-wrapper-image.c b/clang/test/Driver/linker-wrapper-image.c
index c0de56d58196a..67bb21bfe49b4 100644
--- a/clang/test/Driver/linker-wrapper-image.c
+++ b/clang/test/Driver/linker-wrapper-image.c
@@ -1,6 +1,7 @@
// REQUIRES: x86-registered-target
// REQUIRES: nvptx-registered-target
// REQUIRES: amdgpu-registered-target
+// REQUIRES: spirv-registered-target
// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.elf.o
@@ -263,3 +264,37 @@
// HIP: while.end:
// HIP-NEXT: ret void
// HIP-NEXT: }
+
+// RUN: clang-offload-packager -o %t.out --image=file=%t.elf.o,kind=sycl,triple=spirv64-unknown-unknown,arch=generic
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o \
+// RUN: -fembed-offload-object=%t.out
+// RUN: clang-linker-wrapper --print-wrapped-module --dry-run --host-triple=x86_64-unknown-linux-gnu \
+// RUN: --linker-path=/usr/bin/ld %t.o -o a.out 2>&1 | FileCheck %s --check-prefixes=SYCL
+// RUN: clang-linker-wrapper --print-wrapped-module --dry-run --host-triple=x86_64-unknown-linux-gnu -r \
+// RUN: --linker-path=/usr/bin/ld %t.o -o a.out 2>&1 | FileCheck %s --check-prefixes=SYCL
+
+// SYCL: %__sycl.tgt_device_image = type { i16, i8, i8, ptr, ptr, ptr, ptr, ptr, ptr, ptr, ptr, ptr }
+// SYCL-NEXT: %__sycl.tgt_bin_desc = type { i16, i16, ptr, ptr, ptr }
+
+// SYCL: @.sycl_offloading.target.0 = internal unnamed_addr constant [1 x i8] zeroinitializer
+// SYCL-NEXT: @.sycl_offloading.opts.compile.0 = internal unnamed_addr constant [1 x i8] zeroinitializer
+// SYCL-NEXT: @.sycl_offloading.opts.link.0 = internal unnamed_addr constant [1 x i8] zeroinitializer
+// SYCL-NEXT: @.sycl_offloading.0.data = internal unnamed_addr constant [0 x i8] zeroinitializer
+// SYCL-NEXT: @.sycl_offloading.0.info = internal local_unnamed_addr constant [2 x i64] [i64 ptrtoint (ptr @.sycl_offloading.0.data to i64), i64 0], section ".tgtimg", align 16
+// SYCL-NEXT: @llvm.used = appending global [1 x ptr] [ptr @.sycl_offloading.0.info], section "llvm.metadata"
+// SYCL-NEXT: @.sycl_offloading.device_images = internal unnamed_addr constant [1 x %__sycl.tgt_device_image] [%__sycl.tgt_device_image { i16 3, i8 8, i8 0, ptr @.sycl_offloading.target.0, ptr @.sycl_offloading.opts.compile.0, ptr @.sycl_offloading.opts.link.0, ptr @.sycl_offloading.0.data, ptr @.sycl_offloading.0.data, ptr null, ptr null, ptr null, ptr null }]
+// SYCL-NEXT: @.sycl_offloading.descriptor = internal constant %__sycl.tgt_bin_desc { i16 1, i16 1, ptr @.sycl_offloading.device_images, ptr null, ptr null }
+// SYCL-NEXT: @llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @sycl.descriptor_reg, ptr null }]
+// SYCL-NEXT: @llvm.global_dtors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @sycl.descriptor_unreg, ptr null }]
+
+// SYCL: define internal void @sycl.descriptor_reg() section ".text.startup" {
+// SYCL-NEXT: entry:
+// SYCL-NEXT: call void @__sycl_register_lib(ptr @.sycl_offloading.descriptor)
+// SYCL-NEXT: ret void
+// SYCL-NEXT: }
+
+// SYCL: define internal void @sycl.descriptor_unreg() section ".text.startup" {
+// SYCL-NEXT: entry:
+// SYCL-NEXT: call void @__sycl_unregister_lib(ptr @.sycl_offloading.descriptor)
+// SYCL-NEXT: ret void
+// SYCL-NEXT: }
diff --git a/clang/test/Driver/linker-wrapper.c b/clang/test/Driver/linker-wrapper.c
index 80b1a5745a123..5ab8a09660e57 100644
--- a/clang/test/Driver/linker-wrapper.c
+++ b/clang/test/Driver/linker-wrapper.c
@@ -54,7 +54,7 @@ __attribute__((visibility("protected"), used)) int x;
// RUN: clang-offload-packager -o %t.out \
// RUN: --image=file=%t.spirv.bc,kind=sycl,triple=spirv64-unknown-unknown,arch=generic
// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o -fembed-offload-object=%t.out
-// RUN: not clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
+// RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
// RUN: --linker-path=/usr/bin/ld %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=SPIRV-LINK
// SPIRV-LINK: clang{{.*}} -o {{.*}}.img --target=spirv64-unknown-unknown {{.*}}.o --sycl-link -Xlinker -triple=spirv64-unknown-unknown -Xlinker -arch=
diff --git a/clang/tools/clang-linker-wrapper/CMakeLists.txt b/clang/tools/clang-linker-wrapper/CMakeLists.txt
index bf37d8031025e..741e3fbbefb74 100644
--- a/clang/tools/clang-linker-wrapper/CMakeLists.txt
+++ b/clang/tools/clang-linker-wrapper/CMakeLists.txt
@@ -16,6 +16,7 @@ set(LLVM_LINK_COMPONENTS
CodeGen
LTO
FrontendOffloading
+ FrontendSYCL
)
set(LLVM_TARGET_DEFINITIONS LinkerWrapperOpts.td)
diff --git a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
index 0f1fa8b329fd6..9a466d6e69c31 100644
--- a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
+++ b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
@@ -22,6 +22,7 @@
#include "llvm/CodeGen/CommandFlags.h"
#include "llvm/Frontend/Offloading/OffloadWrapper.h"
#include "llvm/Frontend/Offloading/Utility.h"
+#include "llvm/Frontend/SYCL/OffloadWrapper.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DiagnosticPrinter.h"
#include "llvm/IR/Module.h"
@@ -711,6 +712,13 @@ wrapDeviceImages(ArrayRef<std::unique_ptr<MemoryBuffer>> Buffers,
M, BuffersToWrap.front(), offloading::getOffloadEntryArray(M)))
return std::move(Err);
break;
+ case OFK_SYCL: {
+ offloading::sycl::SYCLWrappingOptions WrappingOptions;
+ if (Error Err = offloading::sycl::wrapSYCLBinaries(M, BuffersToWrap,
+ WrappingOptions))
+ return Err;
+ break;
+ }
default:
return createStringError(getOffloadKindName(Kind) +
" wrapping is not supported");
@@ -748,6 +756,36 @@ bundleOpenMP(ArrayRef<OffloadingImage> Images) {
return std::move(Buffers);
}
+Expected<SmallVector<std::unique_ptr<MemoryBuffer>>>
+bundleSYCL(ArrayRef<OffloadingImage> Images) {
+ SmallVector<std::unique_ptr<MemoryBuffer>> Buffers;
+ if (DryRun) {
+ // In dry-run mode there is an empty input which is insufficient for
+ // the testing. Therefore, we insert a stub value.
+ OffloadBinary::OffloadingImage Image;
+ Image.TheOffloadKind = OffloadKind::OFK_SYCL;
+ Image.Image = MemoryBuffer::getMemBufferCopy("");
+ SmallString<0> SerializedImage = OffloadBinary::write(Image);
+ Buffers.emplace_back(MemoryBuffer::getMemBufferCopy(SerializedImage));
+ return Buffers;
+ }
+
+ for (const OffloadingImage &TheImage : Images) {
+ SmallVector<OffloadFile> OffloadBinaries;
+ if (Error E = extractOffloadBinaries(*TheImage.Image, OffloadBinaries))
+ return E;
+
+ for (const OffloadFile &File : OffloadBinaries) {
+ const OffloadBinary &Binary = *File.getBinary();
+ SmallString<0> SerializedImage =
+ OffloadBinary::write(Binary.getOffloadingImage());
+ Buffers.emplace_back(MemoryBuffer::getMemBufferCopy(SerializedImage));
+ }
+ }
+
+ return Buffers;
+}
+
Expected<SmallVector<std::unique_ptr<MemoryBuffer>>>
bundleCuda(ArrayRef<OffloadingImage> Images, const ArgList &Args) {
SmallVector<std::pair<StringRef, StringRef>, 4> InputFiles;
@@ -800,8 +838,9 @@ bundleLinkedOutput(ArrayRef<OffloadingImage> Images, const ArgList &Args,
llvm::TimeTraceScope TimeScope("Bundle linked output");
switch (Kind) {
case OFK_OpenMP:
- case OFK_SYCL:
return bundleOpenMP(Images);
+ case OFK_SYCL:
+ return bundleSYCL(Images);
case OFK_Cuda:
return bundleCuda(Images, Args);
case OFK_HIP:
diff --git a/llvm/include/llvm/Frontend/SYCL/OffloadWrapper.h b/llvm/include/llvm/Frontend/SYCL/OffloadWrapper.h
new file mode 100644
index 0000000000000..f89411c86984d
--- /dev/null
+++ b/llvm/include/llvm/Frontend/SYCL/OffloadWrapper.h
@@ -0,0 +1,44 @@
+//===----- OffloadWrapper.h -------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_FRONTEND_SYCL_OFFLOAD_WRAPPER_H
+#define LLVM_FRONTEND_SYCL_OFFLOAD_WRAPPER_H
+
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/Object/OffloadBinary.h"
+
+#include <string>
+
+namespace llvm {
+
+class Module;
+
+namespace offloading {
+namespace sycl {
+
+struct SYCLWrappingOptions {
+ // target/compiler specific options what are suggested to use to "compile"
+ // program at runtime.
+ std::string CompileOptions;
+ // Target/Compiler specific options that are suggested to use to "link"
+ // program at runtime.
+ std::string LinkOptions;
+};
+
+/// Wraps OffloadBinaries in the given \p Buffers into the module \p M
+/// as global symbols and registers the images with the SYCL Runtime.
+/// \param Options Settings that allows to turn on optional data and settings.
+llvm::Error
+wrapSYCLBinaries(llvm::Module &M, llvm::ArrayRef<llvm::ArrayRef<char>> Buffers,
+ SYCLWrappingOptions Options = SYCLWrappingOptions());
+
+} // namespace sycl
+} // namespace offloading
+} // namespace llvm
+
+#endif // LLVM_FRONTEND_SYCL_OFFLOAD_WRAPPER_H
diff --git a/llvm/include/llvm/Object/OffloadBinary.h b/llvm/include/llvm/Object/OffloadBinary.h
index b5c845fa8eb70..9d137db834f08 100644
--- a/llvm/include/llvm/Object/OffloadBinary.h
+++ b/llvm/include/llvm/Object/OffloadBinary.h
@@ -48,6 +48,7 @@ enum ImageKind : uint16_t {
IMG_Cubin,
IMG_Fatbinary,
IMG_PTX,
+ IMG_SPIRV,
IMG_LAST,
};
@@ -70,9 +71,9 @@ class OffloadBinary : public Binary {
/// The offloading metadata that will be serialized to a memory buffer.
struct OffloadingImage {
- ImageKind TheImageKind;
- OffloadKind TheOffloadKind;
- uint32_t Flags;
+ ImageKind TheImageKind = ImageKind::IMG_None;
+ OffloadKind TheOffloadKind = OffloadKind::OFK_None;
+ uint32_t Flags = 0;
MapVector<StringRef, StringRef> StringData;
std::unique_ptr<MemoryBuffer> Image;
};
@@ -84,6 +85,8 @@ class OffloadBinary : public Binary {
/// Serialize the contents of \p File to a binary buffer to be read later.
LLVM_ABI static SmallString<0> write(const OffloadingImage &);
+ OffloadingImage getOffloadingImage() const;
+
static uint64_t getAlignment() { return 8; }
ImageKind getImageKind() const { return TheEntry->TheImageKind; }
diff --git a/llvm/lib/Frontend/CMakeLists.txt b/llvm/lib/Frontend/CMakeLists.txt
index 3b31e6f8dec96..6c4b8362c04fd 100644
--- a/llvm/lib/Frontend/CMakeLists.txt
+++ b/llvm/lib/Frontend/CMakeLists.txt
@@ -5,3 +5,4 @@ add_subdirectory(HLSL)
add_subdirectory(OpenACC)
add_subdirectory(OpenMP)
add_subdirectory(Offloading)
+add_subdirectory(SYCL)
diff --git a/llvm/lib/Frontend/SYCL/CMakeLists.txt b/llvm/lib/Frontend/SYCL/CMakeLists.txt
new file mode 100644
index 0000000000000..355ae5f7955a8
--- /dev/null
+++ b/llvm/lib/Frontend/SYCL/CMakeLists.txt
@@ -0,0 +1,14 @@
+add_llvm_component_library(LLVMFrontendSYCL
+ OffloadWrapper.cpp
+
+ ADDITIONAL_HEADER_DIRS
+ ${LLVM_MAIN_INCLUDE_DIR}/llvm/Frontend
+ ${LLVM_MAIN_INCLUDE_DIR}/llvm/Frontend/SYCL
+
+ LINK_COMPONENTS
+ Core
+ FrontendOffloading
+ Object
+ Support
+ TransformUtils
+ )
diff --git a/llvm/lib/Frontend/SYCL/OffloadWrapper.cpp b/llvm/lib/Frontend/SYCL/OffloadWrapper.cpp
new file mode 100644
index 0000000000000..c0d160c39e93d
--- /dev/null
+++ b/llvm/lib/Frontend/SYCL/OffloadWrapper.cpp
@@ -0,0 +1,513 @@
+//===- SYCLOffloadWrapper.cpp -----------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Frontend/SYCL/OffloadWrapper.h"
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/ADT/Twine.h"
+#include "llvm/Frontend/Offloading/Utility.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/DerivedTypes.h"
+#include "llvm/IR/GlobalVariable.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/LLVMContext.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/Type.h"
+#include "llvm/Object/OffloadBinary.h"
+#include "llvm/Support/Error.h"
+#include "llvm/Support/ErrorHandling.h"
+#include "llvm/Support/LineIterator.h"
+#include "llvm/Support/MemoryBufferRef.h"
+#include "llvm/Transforms/Utils/ModuleUtils.h"
+
+#include <memory>
+#include <string>
+#include <utility>
+
+using namespace llvm;
+using namespace llvm::object;
+using namespace llvm::offloading;
+using namespace llvm::offloading::sycl;
+
+using OffloadingImage = OffloadBinary::OffloadingImage;
+
+namespace {
+
+/// Wrapper helper class that creates all LLVM IRs wrapping given images.
+struct Wrapper {
+ Module &M;
+ LLVMContext &C;
+ SYCLWrappingOptions Options;
+
+ StructType *EntryTy = nullptr;
+ StructType *SyclDeviceImageTy = nullptr;
+ StructType *SyclBinDescTy = nullptr;
+
+ Wrapper(Module &M, const SYCLWrappingOptions &Options)
+ : M(M), C(M.getContext()), Options(Options) {
+
+ EntryTy = offloading::getEntryTy(M);
+ SyclDeviceImageTy = getSyclDeviceImageTy();
+ SyclBinDescTy = getSyclBinDescTy();
+ }
+
+ IntegerType *getSizeTTy() {
+ switch (M.getDataLayout().getPointerSize()) {
+ case 4:
+ return Type::getInt32Ty(C);
+ case 8:
+ return Type::getInt64Ty(C);
+ }
+ llvm_unreachable("unsupported pointer type size");
+ }
+
+ SmallVector<Constant *, 2> getSizetConstPair(size_t First, size_t Second) {
+ IntegerType *SizeTTy = getSizeTTy();
+ return SmallVector<Constant *, 2>{ConstantInt::get(SizeTTy, First),
+ ConstantInt::get(SizeTTy, Second)};
+ }
+
+ /// Note: Properties aren't supported and the support is going
+ /// to be added later.
+ /// Creates a structure corresponding to:
+ /// SYCL specific image descriptor type.
+ /// \code
+ /// struct __sycl.tgt_device_image {
+ /// // version of this structure - for backward compatibility;
+ /// // all modifications which change order/type/offsets of existing fields
+ /// // should increment the version.
+ /// uint16_t Version;
+ /// // the kind of offload model the image employs.
+ /// uint8_t OffloadKind;
+ /// // format of the image data - SPIRV, LLVMIR bitcode, etc
+ /// uint8_t Format;
+ /// // null-terminated string representation of the device's target
+ /// // architecture
+ /// const char *Arch;
+ /// // a null-terminated string; target- and compiler-specific options
+ /// // which are suggested to use to "compile" program at runtime
+ /// const char *CompileOptions;
+ /// // a null-terminated string; target- and compiler-specific options
+ /// // which are suggested to use to "link" program at runtime
+ /// const char *LinkOptions;
+ /// // Pointer to the device binary image start
+ /// void *ImageStart;
+ /// // Pointer to the device binary image end
+ /// void *ImageEnd;
+ /// // the entry table
+ /// __tgt_offload_entry *EntriesBegin;
+ /// __tgt_offload_entry *EntriesEnd;
+ /// const char *PropertiesBegin;
+ /// const char *PropertiesEnd;
+ /// };
+ /// \endcode
+ StructType *getSyclDeviceImageTy() {
+ return StructType::create(
+ {
+ Type::getInt16Ty(C), // Version
+ Type::getInt8Ty(C), // OffloadKind
+ Type::getInt8Ty(C), // Format
+ PointerType::getUnqual(C), // Arch
+ PointerType::getUnqual(C), // CompileOptions
+ PointerType::getUnqual(C), // LinkOptions
+ PointerType::getUnqual(C), // ImageStart
+ PointerType::getUnqual(C), // ImageEnd
+ PointerType::getUnqual(C), // EntriesBegin
+ PointerType::getUnqual(C), // EntriesEnd
+ PointerType::getUnqual(C), // PropertiesBegin
+ PointerType::getUnqual(C) // PropertiesEnd
+ },
+ "__sycl.tgt_device_image");
+ }
+
+ /// Creates a structure for SYCL specific binary descriptor type. Corresponds
+ /// to:
+ ///
+ /// \code
+ /// struct __sycl.tgt_bin_desc {
+ /// // version of this structure - for backward compatibility;
+ /// // all modifications which change order/type/offsets of existing fields
+ /// // should increment the version.
+ /// uint16_t Version;
+ /// uint16_t NumDeviceImages;
+ /// __sycl.tgt_device_image *DeviceImages;
+ /// // the offload entry table
+ /// __tgt_offload_entry *HostEntriesBegin;
+ /// __tgt_offload_entry *HostEntriesEnd;
+ /// };
+ /// \endcode
+ StructType *getSyclBinDescTy() {
+ return StructType::create(
+ {Type::getInt16Ty(C), Type::getInt16Ty(C), PointerType::getUnqual(C),
+ PointerType::getUnqual(C), PointerType::getUnqual(C)},
+ "__sycl.tgt_bin_desc");
+ }
+
+ /// Adds a global readonly variable that is initialized by given
+ /// \p Initializer to the module.
+ GlobalVariable *addGlobalArrayVariable(const Twine &Name,
+ ArrayRef<char> Initializer,
+ const Twine &Section = "") {
+ auto *Arr = ConstantDataArray::get(M.getContext(), Initializer);
+ auto *Var = new GlobalVariable(M, Arr->getType(), /*isConstant*/ true,
+ GlobalVariable::InternalLinkage, Arr, Name);
+ Var->setUnnamedAddr(GlobalValue::UnnamedAddr::Global);
+
+ SmallVector<char, 32> NameBuf;
+ auto SectionName = Section.toStringRef(NameBuf);
+ if (!SectionName.empty())
+ Var->setSection(SectionName);
+ return Var;
+ }
+
+ /// Adds given \p Buf as a global variable into the module.
+ /// \returns Pair of pointers that point at the beginning and the end of the
+ /// variable.
+ std::pair<Constant *, Constant *>
+ addArrayToModule(ArrayRef<char> Buf, const Twine &Name,
+ const Twine &Section = "") {
+ auto *Var = addGlobalArrayVariable(Name, Buf, Section);
+ auto *ImageB = ConstantExpr::getGetElementPtr(Var->getValueType(), Var,
+ getSizetConstPair(0, 0));
+ auto *ImageE = ConstantExpr::getGetElementPtr(
+ Var->getValueType(), Var, getSizetConstPair(0, Buf.size()));
+ return std::make_pair(ImageB, ImageE);
+ }
+
+ /// Adds given \p Data as constant byte array in the module.
+ /// \returns Constant pointer to the added data. The pointer type does not
+ /// carry size information.
+ Constant *addRawDataToModule(ArrayRef<char> Data, const Twine &Name) {
+ auto *Var = addGlobalArrayVariable(Name, Data);
+ auto *DataPtr = ConstantExpr::getGetElementPtr(Var->getValueType(), Var,
+ getSizetConstPair(0, 0));
+ return DataPtr;
+ }
+
+ /// Creates a global variable of const char* type and creates an
+ /// initializer that initializes it with \p Str.
+ ///
+ /// \returns Link-time constant pointer (constant expr) to that
+ /// variable.
+ Constant *addStringToModule(StringRef Str, const Twine &Name) {
+ auto *Arr = ConstantDataArray::getString(C, Str);
+ auto *Var = new GlobalVariable(M, Arr->getType(), /*isConstant*/ true,
+ GlobalVariable::InternalLinkage, Arr, Name);
+ Var->setUnnamedAddr(GlobalValue::UnnamedAddr::Global);
+ auto *Zero = ConstantInt::get(getSizeTTy(), 0);
+ Constant *ZeroZero[] = {Zero, Zero};
+ return ConstantExpr::getGetElementPtr(Var->getValueType(), Var, ZeroZero);
+ }
+
+ /// Creates a global variable of array of structs and initializes
+ /// it with the given values in \p ArrayData.
+ ///
+ /// \returns Pair of Constants that point at array content.
+ /// If \p ArrayData is empty then a returned pair contains nullptrs.
+ std::pair<Constant *, Constant *>
+ addStructArrayToModule(ArrayRef<Constant *> ArrayData, Type *ElemTy) {
+ if (ArrayData.empty()) {
+ auto *PtrTy = llvm::PointerType::getUnqual(ElemTy->getContext());
+ auto *NullPtr = Constant::getNullValue(PtrTy)...
[truncated]
|
Hi @jdoerfert @jhuber6 ! We are adding offload wrapping for SYCL in this PR. Do you have any suggestions for whom I should request for a code review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need a separate directory for the offload wrapper?
// SYCL-NEXT: @.sycl_offloading.0.info = internal local_unnamed_addr constant [2 x i64] [i64 ptrtoint (ptr @.sycl_offloading.0.data to i64), i64 0], section ".tgtimg", align 16 | ||
// SYCL-NEXT: @llvm.used = appending global [1 x ptr] [ptr @.sycl_offloading.0.info], section "llvm.metadata" | ||
// SYCL-NEXT: @.sycl_offloading.device_images = internal unnamed_addr constant [1 x %__sycl.tgt_device_image] [%__sycl.tgt_device_image { i16 3, i8 8, i8 0, ptr @.sycl_offloading.target.0, ptr @.sycl_offloading.opts.compile.0, ptr @.sycl_offloading.opts.link.0, ptr @.sycl_offloading.0.data, ptr @.sycl_offloading.0.data, ptr null, ptr null, ptr null, ptr null }] | ||
// SYCL-NEXT: @.sycl_offloading.descriptor = internal constant %__sycl.tgt_bin_desc { i16 1, i16 1, ptr @.sycl_offloading.device_images, ptr null, ptr null } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this just copied from OpenMP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RUN
lines are copied from OpenMP and adjusted to SYCL case. Output's checks are copied from the clang-linker-wrapper
's output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I'm just wondering why everything's called tgt_bin_desc
and similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially, we customized old clang-offload-wrapper
and then we extracted SYCL logic in a separate place. That is how these names stuck with us.
@jhuber6 |
OpenMP and HIP don't look similar either but I figured it was easier to just put them all together because they're basically exporting the same interface. |
I've merged SYCL wrapper with all others. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this stuff not common with the other targets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see a divergence with offloading::getOffloadEntryArray
. Other targets don't set char *SymbolName
for some reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate a bit more? Other targets definitely set the symbol name when they call emitOffloadingEntry
. We shouldn't be duplicating functionality here. Is there something preventing you from just updating the other function with whatever you need?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Entries could be emitted by emitOffloadEntry
. Then I should create an array that refers to these entries.
I don't see the appropriate other function that does that. The most closest one is offloading::getOffloadEntryArray
but this function creates an empty array if I get it right. Therefore, I would need to modify that function in a strange way.
If you have other opinion regarding getOffloadEntryArray
please let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getOffloadEntryArray
definitely does not create an empty array. It creates a pair of symbols that will let you iterate the offloading entries emitted in each TU once the linker combines them into a single section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, our compilation flow consists of steps:
Link all modules together in Module LM.
Then split LM into several parts SM_1, ..., SM_N.
Could you explain this a bit further? Trivially you could just put some metadata in the offloading entries and skip over them as-needed in your wrapping code. I fail to see how that's a blocker, though it might be slightly annoying if you split them out later. I don't have the full view of how you choose to do your splitting. Another option, and one I do for relocatable links, is to just rename the section.
Also, the mentioned Mac OS case concerns me.
There is a way to do it in Mac OS, I just haven't gotten around to writing it, there's no incentive since GPU offloading on Mac-OS is effectively dead as far as I'm aware.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's consider the following example.
One TU has 2 kernels (A, B) and the user requests a split by kernels.
In unified approach we would have the output from the Codegen like the following:
kernel A() {...}
kernel B() {...}
@entry_A = %tgt_offload_entry {...} ; info about kernel A
@entry_B = %tgt_offload_entry {...} ; info about kernel B.
Then we have to split these kernels using splitModuleTransitiveFromEntryPoints
added in #131347. In order to correctly move these entries, we have to add such SYCL specific handling functionality in splitModuleTransitiveFromEntryPoints
, which is very discouraged according to my understanding. That is only my concern about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully understand the SYCL workflow unfortunately. If we are splitting the module (serveral IR blobs which may or may not contain the requested function) we could possibly just try each entry for each image. It's N^2 technically but I feel like these lists will be small enough that it won't blow up. I.e.
for image in images:
for entry in entries:
if (entry.name in image.symbols)
register(entry, image);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Joseph, just for your information. We are currently experimenting/discussing internally on how to better address your feedback. We'll get back to you...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jhuber6, we replaced duplicated code with usage of emitOffloadingEntry
and updated name and description of addOffloadEntriesToModule
(now it is initOffloadEntriesPerImage
) to avoid confusion with getOffloadEntryArray
, because initially similar names created perception that the functions are doing the same, but in reality, they are different.
We need to initialize offload entries per image, because, unlike other targets, each SYCL device image contains its own set of symbols, which may contain different symbols than other images. The reason is per-kernel device code split, which solves the following problems:
- SYCL 2020 spec requirement: “An implementation may not raise a compile time diagnostic or a run time exception merely due to speculative compilation of a kernel for a device when the application does not actually submit the kernel to that device.” To satisfy this requirement, we split kernels into different images based on the features required.
- Compilation overhead. There are use cases, when module contains 1000s of kernels. Providing only specific symbols in each of the images allows to compile/run 1 specific kernel/TU, so no need to JIT compile all 1000s.
Does that address all your concerns for this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking into this and removing at least some of the code duplication. So, let me make sure I'm understanding this correctly. The other offloading targets have a big array of entries that apply to every image embedded in the device. The clang compiler emits these while it's compiling each TU itself.
SYCL on the other hand seems to do this all at the end and generates these arrays as-needed. Images are only needed for registration, so I'm still wondering if there isn't some way to just indicate which entries are to be used for which kernels. If I were doing this on the other targets it would just be a symbol lookup.
Could we at least put these in the same section as the other target's offloading entries? I'm also not entirely sure what the heuristic is for determining which symbols go where.
llvm::StringRef Suffix = "", | ||
bool EmitSurfacesAndTextures = true); | ||
|
||
struct SYCLWrappingOptions { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the members description this structure holds options for the runtime compiler (aka JIT compiler). I think it's better to reflect in the structure name.
struct SYCLWrappingOptions { | |
struct SYCLJITOptions { |
Or maybe even just "JITOptions". These are specific to the device code format rather than a programming model. If we use some intermediate representation (e.g. SPIR-V), which just be compiled to the executable format, these options are useful to configure the runtime compilation step.
@sarnex, does OpenMP offload to SPIR-V compiler allows setting JIT compiler options? If so, I think this structure should have a neutral name like "JITOptions" and re-used by different offloading modes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's supported and the generated device ELF contains it but I didn't hook it up to anything from the driver yet, see here.
Now Offload Wrapper handles the binary blob which contains sycl images. Also, the issue with duplicated entry names strings is resolved. SYCLWrappingOptions are renamed to SYCLJITOptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, also fix CI, if it is caused by your changes.
Rollback unittest to resolve CI. Clean leftovers and unify names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is generally fine from my side, I'll leave it to the Intel people to understand the specifics.
ImgInfoVar->setAlignment( | ||
MaybeAlign(M.getDataLayout().getTypeStoreSize(IntPtrTy) * 2u)); | ||
ImgInfoVar->setUnnamedAddr(GlobalValue::UnnamedAddr::Local); | ||
ImgInfoVar->setSection(".tgtimg"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Significance of this section?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, this is unnecessary. I will take one more look for other useless pieces like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this whole function unnecessary. So I removed that.
@maksimsab Can you ping me if this is really ready for review? Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, all my comments can be addressed in a separate pr or as part of the next iteration of this change. if you are ready otherwise we can merge this before replying to any of my comments
// Target/compiler specific options that are suggested to use to "compile" | ||
// program at runtime. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit that can be addressed in follow-up pr, this comment might be clearer, these options arent really a suggestion, they must be passed to the gpu driver
// Target/compiler specific options that are suggested to use to "compile" | |
// program at runtime. | |
// Target/compiler specific options that are passed to the device compiler | |
// at runtime. |
/// \param Options Data that needs to be encoded for the later use in a runtime. | ||
LLVM_ABI llvm::Error | ||
wrapSYCLBinaries(llvm::Module &M, llvm::ArrayRef<char> Buffer, | ||
SYCLJITOptions Options = SYCLJITOptions()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit that can be addressed in follow-up: are there any cases where a caller wouldnt want to pass SYCLJitOptions? AOT mode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, AOT mode is a case when it is too late to pass any options.
} | ||
|
||
/// SYCLWrapper helper class that creates all LLVM IRs wrapping given images. | ||
struct SYCLWrapper { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
struct SYCLWrapper { | |
class SYCLWrapper { |
/// // null-terminated string representation of the device's target | ||
/// // architecture | ||
/// const char *Arch; | ||
/// // a null-terminated string; target- and compiler-specific options |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same nit about the suggested
descirption here, if we change the comment lets change both places
/// const char *PropertiesBegin; | ||
/// const char *PropertiesEnd; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i dont really understand why these are const char*
but i think we are planning to rework these anyway so i guess it doesnt matter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I needed some "pointer type" here for now either const char*
or const void*
. Later, it would be changed to a pointer to appropriate structure.
// Note: Intel DPC++ compiler had 2 versions of this structure | ||
// and clang++ has a third different structure. To avoid ABI incompatibility | ||
// between generated device images the Version here starts from 3. | ||
constexpr uint16_t DeviceImageStructVersion = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason we use uint16
for this? is it what we use in our fork?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only can guess that the author was trying to save some space.
initOffloadEntriesPerImage(OB.getString("symbols"), OffloadKindTag); | ||
Constant *WrappedBinary = ConstantStruct::get( | ||
SyclDeviceImageTy, Version, OffloadKindConstant, ImageKindConstant, | ||
TripleConstant, CompileOptions, LinkOptions, Binary.first, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the first
and second
usages are kinda unclear here, if they all follow some rule could we add a comment, and if not, could we add an inline comment for each one explaining what it is?
/// extern const char *CompileOptions = "..."; | ||
/// ... | ||
/// __attribute__((visibility("hidden"))) | ||
/// extern const char *LinkOptions = "..."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here these look like these is one set of compile/link opts for all images, but thats not necessarily true, for example we allow different device optimization options per-TU (as in a.cpp with O0 and b.cpp with O3). we can deal with this later though
GlobalVariable *Desc = W.createFatbinDesc(OffloadFiles); | ||
if (!Desc) | ||
return createStringError(inconvertibleErrorCode(), | ||
"No binary descriptors created."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe this is more useful to the user?
"No binary descriptors created."); | |
"Error creating binary descriptors."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, both options don't follow LLVM's guideline because we don't add any useful piece of information.
/// \param Flags Flags associated with the entry. | ||
/// \param Data Extra data storage associated with the entry. | ||
/// \param SectionName The section this entry will be placed at. | ||
/// \param AuxAddr An extra pointer if needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we explain what this means more? when do we need it
@sarnex Hi. It is ready. |
Once CI passes I will merge |
This patch adds an Offload Wrapper for the SYCL kind. This is an essential step for SYCL offloading and the compilation flow. The usage of offload wrapping is added to the clang-linker-wrapper tool. Modifications: Implemented `bundleSYCL()` function to handle SYCL image bundling. Implemented `wrapSYCLBinaries()` function that is invoked from clang-linker-wrapper. SYCL Offload Wrapping uses specific data structures such as `__sycl.tgt_device_image` and `__sycl.tgt_bin_desc`. Each SYCL image maintains its own symbol table (unlike shared global tables in other targets). Therefore, symbols are encoded explicitly during the offload wrapping. Also, images refer to their own Offloading Entries arrays unlike other targets. The proposed `__sycl.tgt_device_image` uses Version 3 to differentiate from images generated by Intel DPC++. The structure proposed in this patch doesn't have fields deprecated in DPC++.
This patch adds an Offload Wrapper for the SYCL kind. This is an essential step for SYCL offloading and the compilation flow. The usage of offload wrapping is added to the clang-linker-wrapper tool.
Modifications:
Implemented
bundleSYCL()
function to handle SYCL image bundling.Implemented
wrapSYCLBinaries()
function that is invoked from clang-linker-wrapper.SYCL Offload Wrapping uses specific data structures such as
__sycl.tgt_device_image
and__sycl.tgt_bin_desc
. Each SYCL image maintains its own symbol table (unlike shared global tables in other targets). Therefore, symbols are encoded explicitly during the offload wrapping. Also, images refer to their own Offloading Entries arrays unlike other targets.The proposed
__sycl.tgt_device_image
uses Version 3 to differentiate from images generated by Intel DPC++. The structure proposed in this patch doesn't have fields deprecated in DPC++.